TY - GEN
T1 - Predicting the molecular structure relationship and the biological activity of DPP-4 inhibitor using deep neural network with catboost method as feature selection
AU - Hamzah, Haris
AU - Bustamam, Alhadi
AU - Yanuar, Arry
AU - Sarwinda, Devvi
N1 - Funding Information:
This research is funded by the Tesis Magister Grant 2020 from Kementerian Riset dan Teknologi/Badan Riset dan Inovasi Nasional, Indonesia, under contract number NKB-481/UN2.RST/HKP.05.00/2020. We also thank the anonymous reviewers of this paper for their constructive comments and helpful suggestions.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/10/17
Y1 - 2020/10/17
AB - Dipeptidyl peptidase 4 (DPP-4) is a drug target for type-2 diabetes mellitus (T2DM). The DPP-4 enzyme catalyzes the degradation of incretin hormones such as glucagon-like peptide-1 (GLP-1) and gastric inhibitory peptide (GIP), which results in decreased insulin synthesis. DPP-4 inhibitors are promising drugs for T2DM because they block the DPP-4 enzyme and thereby prevent it from inactivating GLP-1 and GIP. Unfortunately, DPP-4 inhibitors have some adverse effects, such as nausea, headache, nasopharyngitis, and skin reactions, so the medical field still seeks new DPP-4 inhibitors with minimal adverse effects. The dataset in this study comprises 1773 DPP-4 inhibitor structures (1185 active compounds and 588 inactive compounds), encoded using topological fingerprints as descriptors. Because the dataset is class-imbalanced, we apply the SMOTE oversampling technique. A deep neural network (DNN) is proposed for classifying DPP-4 inhibitors and is optimized using the Adam optimizer and dropout regularization. In addition, we introduce CatBoost as a feature selection method. The DNN combined with ECFP-6 fingerprints and feature selection retaining features that account for 90% of the cumulative importance value produces the highest MCC, 0.810, with sensitivity, specificity, and accuracy of 0.927, 0.881, and 0.906, respectively.
KW - Adam
KW - CatBoost
KW - Circular fingerprints
KW - Deep neural network
KW - Dipeptidyl peptidase-IV
UR - http://www.scopus.com/inward/record.url?scp=85099752440&partnerID=8YFLogxK
U2 - 10.1109/ICACSIS51025.2020.9263204
DO - 10.1109/ICACSIS51025.2020.9263204
M3 - Conference contribution
AN - SCOPUS:85099752440
T3 - 2020 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2020
SP - 101
EP - 108
BT - 2020 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 12th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2020
Y2 - 17 October 2020 through 18 October 2020
ER -