Abstract
Recognizing Textual Entailment (RTE) is a Natural Language Processing task that determines whether one sentence (the text) semantically entails another sentence (the hypothesis). In this paper, we extracted 35 features from Indonesian text-hypothesis pairs and trained models on them. An ablation study was conducted to analyze the contribution of the features to the RTE model. The experiments showed that, with Support Vector Machine (SVM) and Logistic Regression classifiers, the token-based features contribute positively to model performance. The best model in our experiments is the SVM, which achieved an F1-score of 79.65%. Although it trails the state-of-the-art BERT model by 5 points of accuracy, the SVM classifier needs 31 fewer hours of training time.
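The ablation procedure mentioned in the abstract can be illustrated with a minimal sketch: train on all features, then retrain with one feature group removed and record the change in F1. Everything below is an assumption made for illustration. The group names (`token`, `other`), the toy data, and the helper functions are hypothetical, and a small pure-Python logistic regression stands in for the library SVM and Logistic Regression classifiers so that the example has no dependencies. It is not the authors' implementation.

```python
import math


def train_logreg(X, y, epochs=200, lr=0.1):
    """Fit a plain logistic regression with batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob. minus label
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(model, X):
    """Label 1 (entailment) when the linear score is positive, else 0."""
    w, b = model
    return [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0 for xi in X]


def f1_score(y_true, y_pred):
    """Binary F1 for the positive class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select(X, cols):
    """Keep only the given feature columns."""
    return [[row[j] for j in cols] for row in X]


def ablation(X_train, y_train, X_test, y_test, groups):
    """Return the full-model F1 and, per group, the F1 lost when it is removed."""
    all_cols = sorted(c for cols in groups.values() for c in cols)

    def score(cols):
        model = train_logreg(select(X_train, cols), y_train)
        return f1_score(y_test, predict(model, select(X_test, cols)))

    full = score(all_cols)
    deltas = {}
    for name, cols in groups.items():
        kept = [c for c in all_cols if c not in cols]
        deltas[name] = full - score(kept)  # positive: the group helped
    return full, deltas


# Synthetic toy data (NOT from the paper): columns 0-1 play the role of
# token-based features, column 2 of some other feature group.
X = [[0.9, 0.8, 0.3], [0.1, 0.2, 0.4], [0.8, 0.7, 0.6], [0.2, 0.1, 0.5],
     [0.7, 0.9, 0.2], [0.3, 0.2, 0.7], [0.9, 0.6, 0.5], [0.1, 0.3, 0.1]]
y = [1, 0, 1, 0, 1, 0, 1, 0]
groups = {"token": [0, 1], "other": [2]}  # hypothetical feature groups

full_f1, deltas = ablation(X[:6], y[:6], X[6:], y[6:], groups)
print("full F1:", full_f1)
print("F1 drop per removed group:", deltas)
```

A positive entry in `deltas` means that removing the group lowered F1, which is the sense in which a feature group "contributes positively". With real data the same loop would be run on a held-out split or with cross-validation rather than on a handful of rows.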
Original language | English |
---|---|
Pages (from-to) | 148-155 |
Number of pages | 8 |
Journal | Procedia CIRP |
Volume | 189 |
Publication status | Published - 2021 |
Event | 5th International Conference on Artificial Intelligence in Computational Linguistics, ACLing 2021 - Virtual, Online, United Arab Emirates. Duration: 4 Jun 2021 → 5 Jun 2021 |
Keywords
- Ablation study
- Feature
- Indonesia
- Text classification
- Natural Language Inference
- Recognizing Textual Entailment