TY - JOUR
T1 - Academic expert finding using BERT pre-trained language model
AU - Mannix, Ilma Alpha
AU - Yulianti, Evi
N1 - Publisher Copyright:
© 2024, Universitas Ahmad Dahlan. All rights reserved.
PY - 2024/5
Y1 - 2024/5
N2 - Academic expert finding has numerous benefits, such as identifying paper reviewers, supporting research collaboration, and enhancing knowledge transfer. For research collaboration in particular, researchers tend to seek collaborators who share similar backgrounds or the same native language. Despite its importance, academic expert finding remains relatively unexplored in the context of the Indonesian language. Recent studies have primarily relied on static word embedding techniques such as Word2Vec to match documents with relevant expertise areas. However, Word2Vec cannot capture the varying meanings of words in different contexts. To address this research gap, this study employs Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art contextual embedding model. This paper examines the effectiveness of BERT on the task of academic expert finding. The proposed model consists of three variations of BERT, namely IndoBERT (Indonesian BERT), mBERT (Multilingual BERT), and SciBERT (Scientific BERT), which are compared to a static embedding baseline using Word2Vec. Two approaches were employed to rank experts with the BERT variations: feature-based and fine-tuning. We found that the IndoBERT model outperforms the baseline by 6–9% with the feature-based approach and by 10–18% with the fine-tuning approach. Our results also show that the fine-tuning approach performs better than the feature-based approach, with an improvement of 1–5%. In conclusion, by using IndoBERT, this research demonstrates improved effectiveness in academic expert finding for the Indonesian language.
AB - Academic expert finding has numerous benefits, such as identifying paper reviewers, supporting research collaboration, and enhancing knowledge transfer. For research collaboration in particular, researchers tend to seek collaborators who share similar backgrounds or the same native language. Despite its importance, academic expert finding remains relatively unexplored in the context of the Indonesian language. Recent studies have primarily relied on static word embedding techniques such as Word2Vec to match documents with relevant expertise areas. However, Word2Vec cannot capture the varying meanings of words in different contexts. To address this research gap, this study employs Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art contextual embedding model. This paper examines the effectiveness of BERT on the task of academic expert finding. The proposed model consists of three variations of BERT, namely IndoBERT (Indonesian BERT), mBERT (Multilingual BERT), and SciBERT (Scientific BERT), which are compared to a static embedding baseline using Word2Vec. Two approaches were employed to rank experts with the BERT variations: feature-based and fine-tuning. We found that the IndoBERT model outperforms the baseline by 6–9% with the feature-based approach and by 10–18% with the fine-tuning approach. Our results also show that the fine-tuning approach performs better than the feature-based approach, with an improvement of 1–5%. In conclusion, by using IndoBERT, this research demonstrates improved effectiveness in academic expert finding for the Indonesian language.
KW - Academic expert finding
KW - BERT
KW - Contextual embedding
KW - Static embedding
KW - Word2Vec
UR - http://www.scopus.com/inward/record.url?scp=85201661369&partnerID=8YFLogxK
U2 - 10.26555/ijain.v10i2.1497
DO - 10.26555/ijain.v10i2.1497
M3 - Article
AN - SCOPUS:85201661369
SN - 2442-6571
VL - 10
SP - 280
EP - 295
JO - International Journal of Advances in Intelligent Informatics
JF - International Journal of Advances in Intelligent Informatics
IS - 2
ER -