TY - JOUR
T1 - Dimensional Reduction of QSAR Features Using a Machine Learning Approach on the SARS-Cov-2 Inhibitor Database
AU - Yanuar, Arry
PY - 2022/12/22
Y1 - 2022/12/22
N2 - Quantitative Structure-Activity Relationship (QSAR) is a method that relates the chemical structure of a molecule to its biochemical, pharmaceutical, and biological activities. Building a good QSAR model requires features that characterize a molecule's chemical constituents, such as chemical descriptors and fingerprints. When a data set contains a large number of such features, many of them unnecessary or redundant, dimensionality reduction can alleviate the problem by mapping the high-dimensional original space to a low-dimensional intrinsic space. Dimensionality reduction techniques fall into two categories: feature extraction and feature selection. Dimensionality reduction can serve as the first step in running a machine learning (ML)-based QSAR virtual screening model on a data set of SARS-CoV-2 inhibitor drugs, with the aim of finding new COVID-19 treatments through drug repurposing. Feature extraction and feature selection are crucial for determining which feature sets should be used in a given QSAR classification task to produce reliable virtual screening results. In this work, chemical descriptors and chemical fingerprints were extracted from the SARS-CoV-2 inhibitor drug database using a simple, fast, and accurate method. The total number of selected features is 12,122. The techniques employed are PCA, missing values, and Random Forest. XGBoost Tree Ensemble, Naive Bayes, Support Vector Machine, Random Forest, and Deep Learning (Artificial Neural Network/Multilayer Perceptron) classifiers were used for QSAR classification modeling on the training and test data. Feature selection with the Random Forest approach, applied to all chemical descriptor and chemical fingerprint features and combined with the XGBoost classifier, gives the best results (accuracy of 0.845 and AUC of 0.904).
The selected feature sets contain 233 features for the regression QSAR approach and 273 features for the feature selection-based QSAR classification approach. The features selected with the Random Forest approach can then be used for QSAR-based virtual screening of candidate drugs for COVID-19 therapy.
KW - QSAR
KW - PCA
KW - Missing values
KW - Random forest
KW - SARS-Cov-2
UR - https://jppipa.unram.ac.id/index.php/jppipa
U2 - 10.29303/jppipa.v8i6.2432
DO - 10.29303/jppipa.v8i6.2432
M3 - Article
SN - 2460-2582
VL - 8
SP - 3095
EP - 3101
JO - Jurnal Penelitian Pendidikan IPA
JF - Jurnal Penelitian Pendidikan IPA
IS - 6
ER -