TY - GEN
T1 - Word Segmentation for Javanese Character Using Dictionary, SVM, and CRF
AU - Tanaya, Dipta
AU - Adriani, Mirna
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
N2 - Dictionary-based word segmentation for Javanese character still cannot overcome the problems of ambiguity, derivational word segmentation, and unknown word identification. We propose a new approach for Javanese character segmentation: a machine-learning-based method using CRF and SVM with a feature set built on the categorization of Javanese characters, and its combination with the dictionary-based method. Several experiments show that the combination of the dictionary-based method and CRF performs better than word segmentation using CRF, SVM, or the traditional dictionary-based method on their own.
AB - Dictionary-based word segmentation for Javanese character still cannot overcome the problems of ambiguity, derivational word segmentation, and unknown word identification. We propose a new approach for Javanese character segmentation: a machine-learning-based method using CRF and SVM with a feature set built on the categorization of Javanese characters, and its combination with the dictionary-based method. Several experiments show that the combination of the dictionary-based method and CRF performs better than word segmentation using CRF, SVM, or the traditional dictionary-based method on their own.
KW - Javanese character
KW - conditional random field
KW - support vector machine
KW - word segmentation
UR - http://www.scopus.com/inward/record.url?scp=85062811125&partnerID=8YFLogxK
U2 - 10.1109/IALP.2018.8629215
DO - 10.1109/IALP.2018.8629215
M3 - Conference contribution
AN - SCOPUS:85062811125
T3 - Proceedings of the 2018 International Conference on Asian Language Processing, IALP 2018
SP - 240
EP - 243
BT - Proceedings of the 2018 International Conference on Asian Language Processing, IALP 2018
A2 - Dong, Minghui
A2 - Bijaksana, Moch.
A2 - Sujaini, Herry
A2 - Negara, Arif Bijaksana Putra
A2 - Romadhony, Ade
A2 - Ruskanda, Fariska Z.
A2 - Nurfadhilah, Elvira
A2 - Aini, Lyla Ruslana
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 22nd International Conference on Asian Language Processing, IALP 2018
Y2 - 15 November 2018 through 17 November 2018
ER -