TY - JOUR
T1 - Customer Churn Prediction Using Histogram Augmentation Technique and XGBoost Model with Bayesian Optimization
AU - Pradana, Doni
AU - Purnamasari, Prima Dewi
N1 - Publisher Copyright:
© ICIC International 2024.
PY - 2024/1
Y1 - 2024/1
N2 - Customer churn is a significant issue in many sectors, such as the telecommunication sector. Therefore, telecommunication companies need to recognize churn risk as early as possible. Data from the IBM telco customer churn dataset was selected as a case study. One of the common challenges in classification problems is an imbalanced dataset, which will likely fail to predict the minority class. Oversampling with Histogram Augmentation Technique (HAT) is proposed in this study for handling the imbalanced class data. An ensemble learning of gradient boost machine learning techniques, namely XGBoost, was used in this study. In addition, we used Bayesian Optimization (BO) to find the best hyperparameter of the model. The experimental result shows that the accuracy of HAT-XGBoost-BO is 0.88 and the F1-score is 0.85, outperforming the XGBoost, HAT-XGBoost, and SMOTE-XGBoost models.
AB - Customer churn is a significant issue in many sectors, such as the telecommunication sector. Therefore, telecommunication companies need to recognize churn risk as early as possible. Data from the IBM telco customer churn dataset was selected as a case study. One of the common challenges in classification problems is an imbalanced dataset, which will likely fail to predict the minority class. Oversampling with Histogram Augmentation Technique (HAT) is proposed in this study for handling the imbalanced class data. An ensemble learning of gradient boost machine learning techniques, namely XGBoost, was used in this study. In addition, we used Bayesian Optimization (BO) to find the best hyperparameter of the model. The experimental result shows that the accuracy of HAT-XGBoost-BO is 0.88 and the F1-score is 0.85, outperforming the XGBoost, HAT-XGBoost, and SMOTE-XGBoost models.
KW - Augmentation
KW - Bayesian optimization
KW - Customer churn
KW - HAT
KW - Imbalanced dataset
KW - XGBoost
UR - http://www.scopus.com/inward/record.url?scp=85181149406&partnerID=8YFLogxK
U2 - 10.24507/icicel.18.01.87
DO - 10.24507/icicel.18.01.87
M3 - Article
AN - SCOPUS:85181149406
SN - 1881-803X
VL - 18
SP - 87
EP - 95
JO - ICIC Express Letters
JF - ICIC Express Letters
IS - 1
ER -