TY - GEN
T1 - Predicting the status of water pumps using data mining approach
AU - Darmatasia,
AU - Arymurthy, Aniati Murni
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2017/3/6
Y1 - 2017/3/6
N2 - Data mining can be used to discover knowledge by analyzing patterns or correlations among fields in large databases. A data mining approach was used to find patterns in data from the Tanzania Ministry of Water and to predict the current and future status of water pumps in Tanzania. The proposed data mining method is XGBoost (eXtreme Gradient Boosting). XGBoost implements the concept of gradient tree boosting and is designed to be fast, accurate, efficient, flexible, and portable. In addition, Recursive Feature Elimination (RFE) is proposed to select the important features of the data in order to obtain an accurate model. The best accuracy was achieved using 27 input factors selected by RFE, with XGBoost as the learning model; the resulting accuracy was 80.38%. The knowledge discovered through data mining can be used by the government to improve inspection planning and maintenance, and to identify which factors can cause damage to water pumps, helping to ensure the availability of potable water in Tanzania. A data mining approach is more cost-effective and less time-consuming than manual inspection.
AB - Data mining can be used to discover knowledge by analyzing patterns or correlations among fields in large databases. A data mining approach was used to find patterns in data from the Tanzania Ministry of Water and to predict the current and future status of water pumps in Tanzania. The proposed data mining method is XGBoost (eXtreme Gradient Boosting). XGBoost implements the concept of gradient tree boosting and is designed to be fast, accurate, efficient, flexible, and portable. In addition, Recursive Feature Elimination (RFE) is proposed to select the important features of the data in order to obtain an accurate model. The best accuracy was achieved using 27 input factors selected by RFE, with XGBoost as the learning model; the resulting accuracy was 80.38%. The knowledge discovered through data mining can be used by the government to improve inspection planning and maintenance, and to identify which factors can cause damage to water pumps, helping to ensure the availability of potable water in Tanzania. A data mining approach is more cost-effective and less time-consuming than manual inspection.
KW - Data Mining
KW - Tree Algorithm
KW - Water Pumps
KW - XGBoost
UR - http://www.scopus.com/inward/record.url?scp=85017031692&partnerID=8YFLogxK
U2 - 10.1109/IWBIS.2016.7872890
DO - 10.1109/IWBIS.2016.7872890
M3 - Conference contribution
AN - SCOPUS:85017031692
T3 - 2016 International Workshop on Big Data and Information Security, IWBIS 2016
SP - 57
EP - 63
BT - 2016 International Workshop on Big Data and Information Security, IWBIS 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 International Workshop on Big Data and Information Security, IWBIS 2016
Y2 - 18 October 2016 through 19 October 2016
ER -