Semi-supervised Textual Entailment on Indonesian Wikipedia Data

Ken Nabila Setya, Rahmad Mahendra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Citations (Scopus)

Abstract

Recognizing Textual Entailment (RTE) is a Natural Language Processing task that aims to identify whether an entailment relation holds between two texts. Textual entailment has been studied in a variety of languages, but rarely for Indonesian. The purpose of the work presented in this paper is to conduct RTE experiments on an Indonesian-language dataset. Since manual data creation is costly and time-consuming, we choose a semi-supervised machine learning approach. We apply the co-training algorithm to enlarge a small amount of annotated data, called seeds. With this method, human effort is needed only to annotate the seeds. All data come from Wikipedia, and the entailment pairs are extracted from its revision history. Evaluated on 1,857 sentence pairs labelled with entailment information, our approach achieves an accuracy of 76%.

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 19th International Conference, CICLing 2018, Revised Selected Papers
Editors: Alexander Gelbukh
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 416-427
Number of pages: 12
ISBN (Print): 9783031237928
Publication status: Published - 2023
Event: 19th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2018 - Hanoi, Viet Nam
Duration: 18 Mar 2018 to 24 Mar 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13396 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 19th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2018
Country/Territory: Viet Nam
City: Hanoi
Period: 18/03/18 to 24/03/18

Keywords

  • Co-training
  • Indonesian language
  • LSTM
  • Semi-supervised
  • Textual entailment
  • Wikipedia revision history
