TY - JOUR
T1 - COMPUTATIONAL QSAR-BASED MACHINE LEARNING APPROACH FOR PREDICTING ACTIVITY OF SGLT2 INHIBITORS USING THE KNIME PLATFORM
AU - Illahi, Adha Dastu
AU - Hertono, Gatot Fatwanto
AU - Yanuar, Arry
N1 - Publisher Copyright: © 2025 The Authors.
PY - 2025/1/1
Y1 - 2025/1/1
N2 - Objective: This study aims to identify optimal predictive models and key molecular fragments by preparing a dataset and using machine learning techniques within the Konstanz Information Miner (KNIME) platform. Methods: The human sodium-glucose cotransporter 2 (SGLT2) target dataset was obtained from the ChEMBL database and refined by removing salts, incomplete or incorrect data, and duplicates. The compounds were classified as active or inactive, and fingerprints and descriptors were calculated. Christian Borgelt's Molecular Substructure Miner (MoSS) was employed to identify frequent molecular fragments. Following data partitioning, various classification and regression machine learning (ML)-based Quantitative Structure-Activity Relationship (QSAR) models were developed and evaluated using several metrics, including sensitivity and Mean Squared Error (MSE). Results: In QSAR classification, the Support Vector Machine (SVM) model performed best, with an accuracy of 81.66%, while in QSAR regression, the Extreme Gradient Boosting (XGB) model achieved the best coefficient of determination (R²) and Mean Absolute Error (MAE), at 0.69 and 0.47, respectively. The frequent molecular fragments identified highlighted characteristics common to active SGLT2 inhibitors. Conclusion: These QSAR models indicate that machine learning methods can be used effectively for virtual prediction of SGLT2 inhibitor activity, thereby expediting the drug discovery process.
AB - Objective: This study aims to identify optimal predictive models and key molecular fragments by preparing a dataset and using machine learning techniques within the Konstanz Information Miner (KNIME) platform. Methods: The human sodium-glucose cotransporter 2 (SGLT2) target dataset was obtained from the ChEMBL database and refined by removing salts, incomplete or incorrect data, and duplicates. The compounds were classified as active or inactive, and fingerprints and descriptors were calculated. Christian Borgelt's Molecular Substructure Miner (MoSS) was employed to identify frequent molecular fragments. Following data partitioning, various classification and regression machine learning (ML)-based Quantitative Structure-Activity Relationship (QSAR) models were developed and evaluated using several metrics, including sensitivity and Mean Squared Error (MSE). Results: In QSAR classification, the Support Vector Machine (SVM) model performed best, with an accuracy of 81.66%, while in QSAR regression, the Extreme Gradient Boosting (XGB) model achieved the best coefficient of determination (R²) and Mean Absolute Error (MAE), at 0.69 and 0.47, respectively. The frequent molecular fragments identified highlighted characteristics common to active SGLT2 inhibitors. Conclusion: These QSAR models indicate that machine learning methods can be used effectively for virtual prediction of SGLT2 inhibitor activity, thereby expediting the drug discovery process.
KW - Artificial intelligence
KW - In silico
KW - KNIME
KW - Machine learning
KW - QSAR
KW - SGLT2 inhibitor
UR - http://www.scopus.com/inward/record.url?scp=85214659325&partnerID=8YFLogxK
U2 - 10.22159/ijap.2025v17i1.51726
DO - 10.22159/ijap.2025v17i1.51726
M3 - Article
AN - SCOPUS:85214659325
SN - 0975-7058
VL - 17
SP - 328
EP - 333
JO - International Journal of Applied Pharmaceutics
JF - International Journal of Applied Pharmaceutics
IS - 1
ER -