TY - GEN
T1 - Prediction of protein tertiary structure using pre-Trained self-supervised learning based on transformer
AU - Kurniawan, Alif
AU - Jatmiko, Wisnu
AU - Hertadi, Rukman
AU - Habibie, Novian
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/10/17
Y1 - 2020/10/17
N2 - Information on 3D protein structure plays an important role in various fields, including health, biotechnology, and biomaterials. Knowledge of the 3D structure of a protein helps in understanding its interactions with other molecules, such as drug molecules and other effector molecules. 3D protein structures are generally determined experimentally using X-ray diffraction and Nuclear Magnetic Resonance (NMR) methods. These experimental methods require a relatively long time and considerable expertise to solve a structure. To date, not all protein structures have been determined, because the level of complexity varies from one protein to another. One alternative approach is machine learning that leverages evolutionary information together with deep learning methods. Such approaches can improve prediction accuracy compared to conventional approaches. However, the accuracy achieved is still influenced by the number of homologous proteins in the database. Therefore, this study proposes to replace the extraction of evolutionary information from Multiple Sequence Alignment (MSA) with Transformer-based self-supervised pre-training. To test this change, experiments were carried out on 3D protein structure prediction models based on Long Short-Term Memory (LSTM) and the Universal Transformer. The proposed method reduces the distance Root-Mean-Square Deviation (dRMSD) by 0.561 Angstrom and the Root-Mean-Square Error (RMSE) of the torsion angles by 0.11 degrees on the Universal Transformer predictor. A t-test shows that the decrease in both indicators is statistically significant. Therefore, pre-trained representations can serve as a substitute for evolutionary information, but only for the Universal Transformer predictor.
AB - Information on 3D protein structure plays an important role in various fields, including health, biotechnology, and biomaterials. Knowledge of the 3D structure of a protein helps in understanding its interactions with other molecules, such as drug molecules and other effector molecules. 3D protein structures are generally determined experimentally using X-ray diffraction and Nuclear Magnetic Resonance (NMR) methods. These experimental methods require a relatively long time and considerable expertise to solve a structure. To date, not all protein structures have been determined, because the level of complexity varies from one protein to another. One alternative approach is machine learning that leverages evolutionary information together with deep learning methods. Such approaches can improve prediction accuracy compared to conventional approaches. However, the accuracy achieved is still influenced by the number of homologous proteins in the database. Therefore, this study proposes to replace the extraction of evolutionary information from Multiple Sequence Alignment (MSA) with Transformer-based self-supervised pre-training. To test this change, experiments were carried out on 3D protein structure prediction models based on Long Short-Term Memory (LSTM) and the Universal Transformer. The proposed method reduces the distance Root-Mean-Square Deviation (dRMSD) by 0.561 Angstrom and the Root-Mean-Square Error (RMSE) of the torsion angles by 0.11 degrees on the Universal Transformer predictor. A t-test shows that the decrease in both indicators is statistically significant. Therefore, pre-trained representations can serve as a substitute for evolutionary information, but only for the Universal Transformer predictor.
KW - Protein Structure Prediction
KW - Self-Supervised Pre-Trained
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85097619830&partnerID=8YFLogxK
U2 - 10.1109/IWBIS50925.2020.9255624
DO - 10.1109/IWBIS50925.2020.9255624
M3 - Conference contribution
AN - SCOPUS:85097619830
T3 - 2020 International Workshop on Big Data and Information Security, IWBIS 2020
SP - 73
EP - 78
BT - 2020 International Workshop on Big Data and Information Security, IWBIS 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th International Workshop on Big Data and Information Security, IWBIS 2020
Y2 - 17 October 2020 through 18 October 2020
ER -