TY - GEN
T1 - Building Indonesian Dependency Parser Using Cross-lingual Transfer Learning
AU - Maulana, Andhika Yusup
AU - Alfina, Ika
AU - Azizah, Kurniawati
N1 - Funding Information:
This work is supported by the Faculty of Computer Science, Universitas Indonesia.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - In recent years, cross-lingual transfer learning has shown a positive trend across NLP tasks. This research aims to develop a dependency parser for Indonesian using cross-lingual transfer learning. The dependency parser uses a Transformer as the encoder layer and a deep biaffine attention decoder as the decoder layer. The model is trained with a transfer learning approach, pretraining on a source language and fine-tuning on our target language. We chose four languages as the source domain for comparison: French, Italian, Slovenian, and English. Our proposed approach improves the performance of the dependency parser model for Indonesian as the target domain on both same-domain and cross-domain testing. Compared to the baseline model, our best model increases UAS by up to 4.31% and LAS by up to 4.46%. Among the chosen source-language dependency treebanks, French and Italian, which were selected based on LangRank output, perform better than the languages selected based on other criteria. French, which has the highest LangRank score, performs best in cross-lingual transfer learning for the dependency parser model.
AB - In recent years, cross-lingual transfer learning has shown a positive trend across NLP tasks. This research aims to develop a dependency parser for Indonesian using cross-lingual transfer learning. The dependency parser uses a Transformer as the encoder layer and a deep biaffine attention decoder as the decoder layer. The model is trained with a transfer learning approach, pretraining on a source language and fine-tuning on our target language. We chose four languages as the source domain for comparison: French, Italian, Slovenian, and English. Our proposed approach improves the performance of the dependency parser model for Indonesian as the target domain on both same-domain and cross-domain testing. Compared to the baseline model, our best model increases UAS by up to 4.31% and LAS by up to 4.46%. Among the chosen source-language dependency treebanks, French and Italian, which were selected based on LangRank output, perform better than the languages selected based on other criteria. French, which has the highest LangRank score, performs best in cross-lingual transfer learning for the dependency parser model.
KW - cross-domain
KW - cross-lingual transfer learning
KW - dependency parser
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85143969618&partnerID=8YFLogxK
U2 - 10.1109/IALP57159.2022.9961296
DO - 10.1109/IALP57159.2022.9961296
M3 - Conference contribution
AN - SCOPUS:85143969618
T3 - 2022 International Conference on Asian Language Processing, IALP 2022
SP - 488
EP - 493
BT - 2022 International Conference on Asian Language Processing, IALP 2022
A2 - Tong, Rong
A2 - Lu, Yanfeng
A2 - Dong, Minghui
A2 - Gong, Wengao
A2 - Li, Haizhou
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 International Conference on Asian Language Processing, IALP 2022
Y2 - 27 October 2022 through 28 October 2022
ER -