Word Segmentation for Javanese Character Using Dictionary, SVM, and CRF

Dipta Tanaya, Mirna Adriani

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Dictionary-based word segmentation for Javanese characters still cannot overcome the problems of ambiguity, derivational word segmentation, and unknown word identification. We propose a new approach to Javanese character segmentation: a machine-learning-based method using CRF and SVM with a feature set built from the categorization of Javanese characters, and its combination with the dictionary-based method. Several experiments show that the combination of the dictionary-based method and CRF performs better than word segmentation using CRF alone, SVM, or traditional dictionary-based word segmentation.
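As a rough illustration of the ideas in the abstract, the sketch below shows greedy forward maximum matching against a dictionary, and how its output could be turned into a per-character feature for a sequence labeller such as a CRF. It is not the authors' implementation: the function names, the toy Latin-letter dictionary, the B/I tagging scheme, and the use of `unicodedata.category` as a stand-in for the paper's Javanese character categorization are all assumptions made for this example.

```python
import unicodedata

# Hypothetical toy dictionary; Latin letters stand in for Javanese characters.
TOY_DICT = {"ab", "abc", "cd", "d"}


def max_match(text, dictionary):
    """Greedy forward maximum matching: take the longest dictionary word at each position."""
    longest = max([1] + [len(w) for w in dictionary])
    words, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when nothing matches (unknown word).
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def boundary_tags(words):
    """Turn a segmentation into per-character tags: B = word-initial, I = word-internal."""
    tags = []
    for word in words:
        tags.extend(["B"] + ["I"] * (len(word) - 1))
    return tags


def char_features(text, dict_tags, i, categorize=unicodedata.category):
    """Features for character i; `categorize` is a placeholder for a script-specific categorizer."""
    return {
        "char": text[i],
        "category": categorize(text[i]),
        "prev": text[i - 1] if i > 0 else "<s>",
        "next": text[i + 1] if i + 1 < len(text) else "</s>",
        "dict_tag": dict_tags[i],  # the dictionary segmenter's guess, used as a feature
    }


# Greedy matching picks "abc" + "d", although "ab" + "cd" is also a valid split (ambiguity).
words = max_match("abcd", TOY_DICT)
dict_tags = boundary_tags(words)
features = [char_features("abcd", dict_tags, i) for i in range(len("abcd"))]
```

Feeding the dictionary segmenter's tag to the labeller as one feature among several is one plausible way to combine the two methods; the paper itself should be consulted for the combination it actually uses.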

Original language: English
Title of host publication: Proceedings of the 2018 International Conference on Asian Language Processing, IALP 2018
Editors: Minghui Dong, Fariska Z. Ruskanda, Herry Sujaini, Ade Romadhony, Moch. Bijaksana, Elvira Nurfadhilah, Lyla Ruslana Aini, Arif Bijaksana Putra Negara
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 240-243
Number of pages: 4
ISBN (Electronic): 9781728111766
DOIs
Publication status: Published - 28 Jan 2019
Event: 22nd International Conference on Asian Language Processing, IALP 2018 - Bandung, Indonesia
Duration: 15 Nov 2018 to 17 Nov 2018

Publication series

Name: Proceedings of the 2018 International Conference on Asian Language Processing, IALP 2018

Conference

Conference: 22nd International Conference on Asian Language Processing, IALP 2018
Country: Indonesia
City: Bandung
Period: 15/11/18 to 17/11/18

Keywords

  • conditional random field
  • Javanese character
  • support vector machine
  • word segmentation
