Improving stemming algorithm using morphological rules

Titin Winarti, Djati Kirani, E. T.P. Lussiana, Sunny Arief Sudiro

Research output: Contribution to journal, Article, peer-review

2 Citations (Scopus)

Abstract

Stemming, which strips affixes from words to recover their root, has applications in text search, machine translation, document summarization, and text classification. For example, Indonesian stemming reduces the words "kebaikan", "perbaikan", "memperbaiki" and "sebaikbaiknya" to their common morphological root "baik". In text search, stemming lets a query for "player" find documents containing any word with the stem "play". Stemming is especially important for Indonesian, because words carry prefixes, suffixes, infixes, and confixes that make it difficult to match related words. This research proposed a more accurate stemmer, called the CAT stemming algorithm, which allows more than one candidate result and more than one affix combination per word. Its results do not depend on the order in which the morphological rules are applied: all rules were checked, and every resulting word was kept in a candidate list. To keep the stemmer efficient, two word lists (vocabularies) were used: a list of words that have more than one candidate, and a list of root words that serves as a reference for the candidates. The final stem was then selected from the candidates using several rules. In the experiments, the proposed approach achieved higher accuracy than the two best-known Indonesian stemmers it was compared against.
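The core idea described above (apply every affix rule regardless of order, keep all results as candidates, then filter them against a root-word list and pick one) can be illustrated with a minimal Python sketch. This is not the CAT implementation: the affix inventories `PREFIXES` and `SUFFIXES`, the toy dictionary `ROOT_WORDS`, the reduplication check, and the "prefer the longest dictionary candidate" selection rule are all hypothetical simplifications chosen for illustration, and the functions `strip_affixes`, `candidates`, and `stem` are invented names.

```python
from __future__ import annotations

from itertools import product

# Hypothetical, heavily simplified affix inventories for illustration only;
# they are NOT the morphological rule set used by the CAT stemmer.
PREFIXES = ["", "ke", "per", "mem", "memper", "se", "di", "ber"]
SUFFIXES = ["", "an", "i", "kan", "nya"]

# Hypothetical toy root-word list standing in for the root-word vocabulary.
ROOT_WORDS = {"baik", "main", "ajar"}


def strip_affixes(word: str, prefix: str, suffix: str) -> str | None:
    """Remove one prefix and one suffix if both match; otherwise return None."""
    if len(prefix) + len(suffix) >= len(word):
        return None  # nothing would be left of the word
    if word.startswith(prefix) and word.endswith(suffix):
        return word[len(prefix):len(word) - len(suffix)]
    return None


def candidates(word: str) -> set[str]:
    """Apply every prefix/suffix pair, independent of rule order, and keep all results."""
    found = set()
    for prefix, suffix in product(PREFIXES, SUFFIXES):
        core = strip_affixes(word, prefix, suffix)
        if core is None:
            continue
        found.add(core)
        # Assumed reduplication rule: a form such as "baikbaik" also yields "baik".
        half = len(core) // 2
        if len(core) % 2 == 0 and core[:half] == core[half:]:
            found.add(core[:half])
    return found


def stem(word: str) -> str:
    """Select the final stem from the candidate list using a hypothetical rule."""
    in_dictionary = [c for c in candidates(word) if c in ROOT_WORDS]
    if not in_dictionary:
        return word  # no dictionary match: leave the word unchanged
    # Assumed selection rule: prefer the longest dictionary candidate.
    return max(in_dictionary, key=lambda c: (len(c), c))


if __name__ == "__main__":
    for w in ["kebaikan", "perbaikan", "memperbaiki", "sebaikbaiknya"]:
        print(w, "->", stem(w))
```

Keeping all candidates is what makes the result order-independent in this sketch. For "kebaikan", a single fixed-order pass that tried the suffix "-kan" before "-an" would end at "bai", whereas the candidate list contains both "bai" and "baik" and the root-word list resolves the choice to "baik".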

Original language: English
Pages (from-to): 1758-1764
Number of pages: 7
Journal: International Journal on Advanced Science, Engineering and Information Technology
Volume: 7
Issue number: 5
Publication status: Published - 1 Jan 2017

Keywords

  • Information retrieval
  • Morphological rule
  • Stemming
