Word Segmentation for Javanese Character Using Dictionary, SVM, and CRF

Dipta Tanaya, Mirna Adriani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

Dictionary-based word segmentation for Javanese characters still cannot resolve ambiguity, segment derivational words, or identify unknown words. We propose a new approach to Javanese character segmentation: a machine-learning method using CRF and SVM with a feature set based on the categorization of Javanese characters, and its combination with the dictionary-based method. Several experiments show that the combination of the dictionary-based method and CRF performs better than word segmentation using CRF alone, SVM alone, or the traditional dictionary-based method.
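To illustrate the two dictionary-based limitations named in the abstract (ambiguity and unknown words), here is a minimal sketch of forward maximum matching, a common dictionary baseline. The abstract does not say which dictionary algorithm the authors used, so this choice is an assumption. The function name `max_match`, the toy `DICTIONARY`, and the English stand-in strings are hypothetical and are not taken from the paper.

```python
# Toy dictionary; English strings stand in for unspaced Javanese character
# sequences. None of these entries come from the paper.
DICTIONARY = {"the", "there", "rein", "in", "here"}


def max_match(text, dictionary):
    """Greedy forward maximum matching (an assumed baseline, not the paper's)."""
    max_len = max(map(len, dictionary))
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                break
        else:
            # No dictionary entry starts here: emit one character as unknown.
            j = i + 1
        tokens.append(text[i:j])
        i = j
    return tokens


# Ambiguity: "the" + "rein" is also a valid reading, but the greedy rule
# always commits to the longest prefix.
print(max_match("therein", DICTIONARY))  # ['there', 'in']

# Unknown material: "x" is not in the dictionary and falls out as a fragment.
print(max_match("thexin", DICTIONARY))   # ['the', 'x', 'in']
```

A sequence labeller such as a CRF or SVM instead decides for each character whether a word boundary follows it, using features of the surrounding characters, which is why it can handle strings the dictionary does not cover.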

Original language: English
Title of host publication: Proceedings of the 2018 International Conference on Asian Language Processing, IALP 2018
Editors: Minghui Dong, Moch. Bijaksana, Herry Sujaini, Arif Bijaksana Putra Negara, Ade Romadhony, Fariska Z. Ruskanda, Elvira Nurfadhilah, Lyla Ruslana Aini
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 240-243
Number of pages: 4
ISBN (Electronic): 9781728111766
DOIs
Publication status: Published - 2 Jul 2018
Event: 22nd International Conference on Asian Language Processing, IALP 2018 - Bandung, Indonesia
Duration: 15 Nov 2018 to 17 Nov 2018

Publication series

Name: Proceedings of the 2018 International Conference on Asian Language Processing, IALP 2018

Conference

Conference: 22nd International Conference on Asian Language Processing, IALP 2018
Country/Territory: Indonesia
City: Bandung
Period: 15/11/18 to 17/11/18

Keywords

  • Javanese character
  • conditional random field
  • support vector machine
  • word segmentation
