TY - JOUR
T1 - Ensemble learning for predicting mortality rates affected by air quality
AU - Dewi, Kartika C.
AU - Mustika, Widya F.
AU - Murfi, H.
N1 - Publisher Copyright: © Published under licence by IOP Publishing Ltd.
PY - 2019/5/17
Y1 - 2019/5/17
N2 - Machine learning methods are very helpful for solving various problems, especially those involving large data. Predicting the mortality rate is one such problem: the rate fluctuates depending on the factors that influence it, one of which is air quality. Two methods that can be used to predict the mortality rate of a population are Random Forest and Extreme Gradient Boosting (XGBoost), both of which are ensemble methods with decision trees as the base model. Missing values in the data lead to low accuracy. In this paper, we discuss how to handle missing values and compare the accuracy of the ensemble methods we used to predict the mortality rate. The simulation results show that missing values in the data are best handled by removing them (Drop NaN). The Mean Square Error (MSE) values produced by the Random Forest and XGBoost methods are 0.007239 ± (1.699 × 10⁻⁷) and 0.04019, respectively. Based on these MSE values, Random Forest gives better accuracy than XGBoost in predicting mortality rates affected by air quality.
AB - Machine learning methods are very helpful for solving various problems, especially those involving large data. Predicting the mortality rate is one such problem: the rate fluctuates depending on the factors that influence it, one of which is air quality. Two methods that can be used to predict the mortality rate of a population are Random Forest and Extreme Gradient Boosting (XGBoost), both of which are ensemble methods with decision trees as the base model. Missing values in the data lead to low accuracy. In this paper, we discuss how to handle missing values and compare the accuracy of the ensemble methods we used to predict the mortality rate. The simulation results show that missing values in the data are best handled by removing them (Drop NaN). The Mean Square Error (MSE) values produced by the Random Forest and XGBoost methods are 0.007239 ± (1.699 × 10⁻⁷) and 0.04019, respectively. Based on these MSE values, Random Forest gives better accuracy than XGBoost in predicting mortality rates affected by air quality.
UR - http://www.scopus.com/inward/record.url?scp=85066318967&partnerID=8YFLogxK
U2 - 10.1088/1742-6596/1192/1/012021
DO - 10.1088/1742-6596/1192/1/012021
M3 - Conference article
AN - SCOPUS:85066318967
SN - 1742-6588
VL - 1192
JO - Journal of Physics: Conference Series
JF - Journal of Physics: Conference Series
IS - 1
M1 - 012021
T2 - 2nd International Conference on Data and Information Science, ICoDIS 2018
Y2 - 15 November 2018 through 16 November 2018
ER -