TY - JOUR
T1 - BREAST CANCER MOLECULAR SUBTYPES CLASSIFICATION WITH NAÏVE BAYES CLASSIFIER
T2 - A CASE STUDY IN INDONESIAN BREAST CANCER PATIENTS
AU - Abdullah, Sarini
AU - Fauziyyah, Nabilla
AU - Nurrohmah, Siti
AU - Rachman, Andhika
N1 - Publisher Copyright: © ISOSS Publications.
PY - 2021/12
Y1 - 2021/12
N2 - Naïve Bayes Classifier (NBC) is one of the most popular machine learning methods because it combines simplicity with high overall accuracy. Previous studies have given a number of reasons why this method is suitable for analysing medical data. While abundant literature covers the application of NBC to medical data, little to none covers its application to medical data in Indonesia, especially Indonesian breast cancer data. The main goal of this study is to apply NBC to classify 101 breast cancer patients at a private hospital in Indonesia into five classes of molecular subtypes, assess its accuracy, and compare its performance with another popular classification model, Decision Tree (DT). Results showed that NBC outperformed DT, correctly predicting the classes of most patients in the test set with an overall accuracy of 85.7%. NBC also correctly identified all patients in the human epidermal growth factor receptor 2 (HER2)-Enriched and Triple Negative subtypes. Another important result is that oestrogen receptor (ER), progesterone receptor (PR), and HER2 statuses are the dominant factors in differentiating subtypes among patients, consistent with the official guide made by the breast cancer expert panel. Several empirical results regarding NBC were also found: it does not need a large data set to produce high overall accuracy, and it could still predict the classes of most patients correctly even in the presence of missing and noisy data.
KW - posterior class probability
KW - Bayes rule
KW - breast cancer subtypes
KW - classification
KW - decision tree
UR - http://www.scopus.com/inward/record.url?scp=85175310669&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85175310669
SN - 1930-6792
VL - 16
SP - 1
EP - 17
JO - Journal of Applied Probability and Statistics
JF - Journal of Applied Probability and Statistics
IS - 3
ER -