Ensemble learning for predicting mortality rates affected by air quality

Kartika C. Dewi, Widya F. Mustika, H. Murfi

Research output: Contribution to journal › Conference article › peer-review

7 Citations (Scopus)


Machine learning methods are helpful for solving a wide range of problems, especially those involving large data sets. The mortality rate is one such problem: it fluctuates depending on the factors that influence it, one of which is air quality. Two methods that can be used to predict the mortality rate of a population are Random Forest and Extreme Gradient Boosting (XGBoost), both ensemble methods that use decision trees as the base model. Missing values in the data lower the accuracy of the predictions. In this paper, we discuss how to handle missing values and compare the accuracy of the ensemble methods used to predict the mortality rate. The simulation results show that missing values are best handled by removing them (Drop NaN). The Mean Square Error (MSE) values produced by Random Forest and XGBoost are 0.007239 ± (1.699 × 10⁻⁷) and 0.04019, respectively. Based on these MSE values, Random Forest predicts the mortality rate affected by air quality more accurately than XGBoost.
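
The two steps the abstract describes, removing records with missing values and comparing models by MSE, can be illustrated with a minimal sketch. It is not the authors' code: the toy data, the helper names `drop_nan_rows` and `mean_squared_error`, and the two prediction lists are all hypothetical, and the predictions simply stand in for the output of two fitted models such as Random Forest and XGBoost.

```python
import math


def drop_nan_rows(rows):
    """Keep only rows in which no value is NaN (the "Drop NaN" strategy)."""
    return [row for row in rows if not any(math.isnan(v) for v in row)]


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy records: (pollutant level, mortality rate). Not from the paper.
rows = [
    (0.2, 0.10),
    (float("nan"), 0.30),  # missing feature value, dropped
    (0.6, 0.25),
    (0.8, float("nan")),   # missing target value, dropped
    (1.0, 0.40),
]
clean = drop_nan_rows(rows)
y_true = [mortality for _, mortality in clean]

# Placeholder predictions standing in for two fitted ensemble models.
pred_a = [0.12, 0.24, 0.37]
pred_b = [0.20, 0.35, 0.30]

mse_a = mean_squared_error(y_true, pred_a)
mse_b = mean_squared_error(y_true, pred_b)

# The model with the lower MSE is the more accurate one on this data.
better = "model A" if mse_a < mse_b else "model B"
print(f"MSE A = {mse_a:.6f}, MSE B = {mse_b:.6f}, better: {better}")
```

A lower MSE means predictions lie closer to the observed values, which is the basis on which the abstract ranks Random Forest above XGBoost.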

Original language: English
Article number: 012021
Journal: Journal of Physics: Conference Series
Issue number: 1
Publication status: Published - 17 May 2019
Event: 2nd International Conference on Data and Information Science, ICoDIS 2018 - Bandung, Indonesia
Duration: 15 Nov 2018 → 16 Nov 2018

