TY - JOUR
T1 - Japanese Short Answer Grading for Japanese Language Learners Using the Contextual Representation of BERT
AU - Luhurkinanti, Dyah Lalita
AU - Purnamasari, Prima Dewi
AU - Tsunakawa, Takashi
AU - Ratna, Anak Agung Putri
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2025
Y1 - 2025
N2 - The automation of short answer grading in examinations aims to help teachers grade more efficiently and fairly. The Japanese SIMPLE-O system grades Japanese language learners' short answers using a dataset from a real examination. Bidirectional encoder representations from transformers (BERT), which has shown strong performance on natural language processing (NLP) tasks, is applied to grade answers without fine-tuning because of the small amount of available data. Two experiments are conducted in this study: the first grades answers by similarity, while the second classifies them as either correct or incorrect. Five BERT models are tested in the system, and two additional Sentence-BERT (SBERT) and RoBERTa models are tested on the similarity task. The best Pearson's correlation for similarity-based grading is obtained with Tohoku BERT Base. Hiragana-kanji conversion improves the correlation to 0.615 for BERT and 0.593 for SBERT but yields little improvement for RoBERTa. In the binary classification experiments, all models achieve an accuracy above 90%, with Tohoku BERT Large performing best. Even without fine-tuning, BERT can serve as an embedding method for binary classification with high accuracy.
AB - The automation of short answer grading in examinations aims to help teachers grade more efficiently and fairly. The Japanese SIMPLE-O system grades Japanese language learners' short answers using a dataset from a real examination. Bidirectional encoder representations from transformers (BERT), which has shown strong performance on natural language processing (NLP) tasks, is applied to grade answers without fine-tuning because of the small amount of available data. Two experiments are conducted in this study: the first grades answers by similarity, while the second classifies them as either correct or incorrect. Five BERT models are tested in the system, and two additional Sentence-BERT (SBERT) and RoBERTa models are tested on the similarity task. The best Pearson's correlation for similarity-based grading is obtained with Tohoku BERT Base. Hiragana-kanji conversion improves the correlation to 0.615 for BERT and 0.593 for SBERT but yields little improvement for RoBERTa. In the binary classification experiments, all models achieve an accuracy above 90%, with Tohoku BERT Large performing best. Even without fine-tuning, BERT can serve as an embedding method for binary classification with high accuracy.
KW - Automated short answer grading
KW - BERT
KW - contextual embeddings
KW - deep learning
KW - SBERT
UR - http://www.scopus.com/inward/record.url?scp=85216383700&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2025.3532659
DO - 10.1109/ACCESS.2025.3532659
M3 - Article
AN - SCOPUS:85216383700
SN - 2169-3536
VL - 13
SP - 17195
EP - 17207
JO - IEEE Access
JF - IEEE Access
ER -