Spoken Language Identification with Phonotactics Methods on Minangkabau, Sundanese, and Javanese Languages

Nur Endah Safitri, Amalia Zahra, Mirna Adriani

Research output: Contribution to journal › Conference article › peer-review

19 Citations (Scopus)

Abstract

Research on spoken language identification (spoken LID) for local languages helps extend the reach of technology to speakers of those languages and contributes to their preservation. In this paper, we report our work on identifying spoken data in three local Indonesian languages: Minangkabau, Sundanese, and Javanese. Statistical phonotactic models are built to map speech signals to the language used by the speaker. We use two phonotactic methods: Phone Recognition followed by Language Modelling (PRLM) and Parallel Phone Recognition followed by Language Modelling (PPRLM). The PRLM method achieves the highest accuracy with phone recognizers trained on English and Russian, with average accuracies of 77.42% and 75.94%, respectively.
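
As a rough illustration of the two methods named in the abstract, the sketch below scores phone sequences with one phone n-gram model per target language (PRLM) and combines scores from several recognizers (PPRLM). It is not the authors' implementation. It assumes a phone recognizer has already turned speech into phone symbols, and the bigram order, add-one smoothing, length normalisation, and all names (`PhoneBigramLM`, `prlm_identify`, `pprlm_identify`) are hypothetical choices made for the example.

```python
import math
from collections import Counter


class PhoneBigramLM:
    """Add-one smoothed bigram model over phone symbols (illustrative only)."""

    def __init__(self, sequences):
        self.bigrams = Counter()
        self.contexts = Counter()
        self.vocab = {"</s>"}
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            self.vocab.update(seq)
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_prob(self, seq):
        """Log-likelihood of a phone sequence under this language's model."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen phones
        return sum(
            math.log((self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + v))
            for prev, cur in zip(padded, padded[1:])
        )


def prlm_identify(models, phones):
    """PRLM: one recognizer's phone string, scored by every language model."""
    return max(models, key=lambda lang: models[lang].log_prob(phones))


def pprlm_identify(models_by_recognizer, phones_by_recognizer):
    """PPRLM: sum length-normalised scores across parallel recognizers."""
    languages = list(next(iter(models_by_recognizer.values())))

    def score(lang):
        return sum(
            models_by_recognizer[rec][lang].log_prob(phones) / (len(phones) + 1)
            for rec, phones in phones_by_recognizer.items()
        )

    return max(languages, key=score)


# Toy phone strings, invented for the example; not data from the paper.
toy_train = {
    "lang_a": [["a", "b", "a", "b"], ["a", "b", "a"]],
    "lang_b": [["b", "b", "a", "a"], ["b", "b", "a"]],
}
toy_models = {lang: PhoneBigramLM(seqs) for lang, seqs in toy_train.items()}
```

In a PRLM setup, `toy_models` would be trained on the output of a single phone recognizer (for example one trained on English), while PPRLM keeps a separate set of language models per recognizer and fuses their scores.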

Original language: English
Pages (from-to): 182-187
Number of pages: 6
Journal: Procedia Computer Science
Volume: 81
Publication status: Published - 2016
Event: 5th Workshop on Spoken Language Technologies for Under-resourced languages, SLTU 2016 - Yogyakarta, Indonesia
Duration: 9 May 2016 to 12 May 2016

Keywords

  • phonotactic methods
  • spoken language identification
