Performance Evaluation XGBoost in Handling Missing Value on Classification of Hepatocellular Carcinoma Gene Expression Data

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Missing values occur when an observation has no recorded value for a feature, which results in a loss of information. One way to deal with missing values is to delete the observations that contain them. However, even when only a small proportion of values is missing, deletion can discard important information from the data. This study evaluated the performance of the XGBoost method in handling missing values for a classification problem on Hepatocellular Carcinoma gene expression data. The data set, obtained from the National Center for Biotechnology Information website, has 40 observations and 54675 features. The researchers randomly removed 5%, 10%, 15%, and 20% of the total data to compare model performance with and without imputation. The imputation methods used were mean imputation and k-nearest neighbor (KNN) imputation. Model performance was measured using cross-validation and confusion matrix evaluation procedures, and hyperparameters were tuned using grid search to find the best parameters. In general, handling missing values with mean imputation gave better performance than KNN imputation or no imputation for classifying the Hepatocellular Carcinoma gene expression data. With 20% missing values, mean imputation produced the highest performance, with 100% specificity, 100% sensitivity, 100% accuracy, 100% precision, and 100% MCC on the training data, and 88% sensitivity, 100% specificity, 100% precision, 94% accuracy, and 89% MCC on the testing data. The XGBoost machine learning model can handle missing values in a data set without imputation, but applying an imputation method can improve classification performance on Hepatocellular Carcinoma gene expression data.
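The evaluation measures named in the abstract (sensitivity, specificity, precision, accuracy, and MCC) can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' code. The function names and the label vectors are hypothetical and chosen only for illustration; they are not data from the study.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute the five measures from the confusion matrix cells."""
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": safe_div(tp, tp + fn),  # true positive rate
        "specificity": safe_div(tn, tn + fp),  # true negative rate
        "precision": safe_div(tp, tp + fp),
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "mcc": safe_div(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical labels (NOT from the study): 8 positives, 8 negatives,
# with one positive sample misclassified as negative.
y_true = [1] * 8 + [0] * 8
y_pred = [1] * 7 + [0] + [0] * 8

metrics = classification_metrics(*confusion_counts(y_true, y_pred))
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

MCC ranges from -1 to 1, so reporting it as a percentage, as the abstract does, corresponds to multiplying the coefficient by 100.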

Original language: English
Title of host publication: ICICoS 2020 - Proceeding
Subtitle of host publication: 4th International Conference on Informatics and Computational Sciences
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728195261
Publication status: Published - 10 Nov 2020
Event: 4th International Conference on Informatics and Computational Sciences, ICICoS 2020 - Semarang, Indonesia
Duration: 10 Nov 2020 → 11 Nov 2020

Publication series

Name: ICICoS 2020 - Proceeding: 4th International Conference on Informatics and Computational Sciences

Conference

Conference: 4th International Conference on Informatics and Computational Sciences, ICICoS 2020
Country: Indonesia
City: Semarang
Period: 10/11/20 → 11/11/20

Keywords

  • cross-validation
  • grid search
  • KNN imputation
  • mean imputation
  • XGBoost
