Research on spoken language identification (spoken LID) for local languages helps extend the reach of technology to speakers of those languages. This research also contributes to the preservation of local languages. In this paper, we report our work on identifying spoken data in three local Indonesian languages: Minangkabau, Sundanese, and Javanese. We build statistical phonotactic models that map a speech signal to the language spoken by the speaker. We use two phonotactic methods, namely Phone Recognition followed by Language Modelling (PRLM) and Parallel Phone Recognition followed by Language Modelling (PPRLM). The PRLM method achieves the highest accuracy, with average accuracies of 77.42% and 75.94% when using phone recognizers trained on English and Russian, respectively.
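The difference between the two methods can be illustrated with a minimal Python sketch. It assumes the speech has already been decoded into phone sequences by a phone recognizer (not shown), and it uses add-one smoothed phone bigram models as the per-language language models. The n-gram order, smoothing, and score fusion used in the paper are not stated in this abstract, so these are assumptions. For PPRLM the sketch simply sums log-likelihoods across recognizers, which is one common fusion choice. All names (`PhoneBigramLM`, `train_prlm`, `prlm_identify`, `train_pprlm`, `pprlm_identify`) and the toy phone strings are hypothetical.

```python
import math
from collections import Counter, defaultdict


class PhoneBigramLM:
    """Add-one smoothed phone bigram model (illustrative choice, not the paper's)."""

    def __init__(self, sequences):
        self.bigrams = defaultdict(Counter)  # previous phone -> counts of next symbol
        self.vocab = {"</s>"}                # possible next symbols (phones + end marker)
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            self.vocab.update(seq)
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[prev][cur] += 1

    def log_prob(self, seq):
        """Log-likelihood of a phone sequence under this language's model."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        v = len(self.vocab)
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            counts = self.bigrams.get(prev, Counter())
            total += math.log((counts[cur] + 1) / (sum(counts.values()) + v))
        return total


def train_prlm(training):
    """PRLM: one phone recognizer; training maps language -> decoded phone sequences."""
    return {lang: PhoneBigramLM(seqs) for lang, seqs in training.items()}


def prlm_identify(models, phones):
    """Pick the language whose phone LM gives the decoded utterance the highest score."""
    return max(models, key=lambda lang: models[lang].log_prob(phones))


def train_pprlm(training_by_recognizer):
    """PPRLM: one set of per-language LMs for each phone recognizer run in parallel."""
    return {rec: train_prlm(training) for rec, training in training_by_recognizer.items()}


def pprlm_identify(models_by_recognizer, phones_by_recognizer):
    """Fuse recognizers by summing log-likelihoods (an assumed fusion rule)."""
    languages = list(next(iter(models_by_recognizer.values())))

    def fused_score(lang):
        return sum(
            models[lang].log_prob(phones_by_recognizer[rec])
            for rec, models in models_by_recognizer.items()
        )

    return max(languages, key=fused_score)


if __name__ == "__main__":
    # Toy, invented phone strings standing in for phone-recognizer output.
    eng = {"lang_a": [["a", "b", "a", "b"], ["a", "b"]],
           "lang_b": [["c", "d", "c", "d"], ["c", "d"]]}
    rus = {"lang_a": [["x", "y"]], "lang_b": [["z", "w"]]}

    prlm = train_prlm(eng)
    print(prlm_identify(prlm, ["a", "b", "a"]))  # lang_a

    pprlm = train_pprlm({"eng": eng, "rus": rus})
    print(pprlm_identify(pprlm, {"eng": ["a", "b", "a"], "rus": ["x", "y"]}))  # lang_a
```

In PRLM a single recognizer's phone inventory has to serve all target languages, which is why the choice of recognizer language (English versus Russian in the results above) affects accuracy; PPRLM hedges against that choice by running several recognizers and combining their language-model scores.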
| Field | Value |
|---|---|
| Number of pages | 6 |
| Journal | Procedia Computer Science |
| Publication status | Published - 1 Jan 2016 |
| Event | 5th Workshop on Spoken Language Technologies for Under-resourced Languages, SLTU 2016, Yogyakarta, Indonesia |
| Duration | 9 May 2016 → 12 May 2016 |
- phonotactic methods
- spoken language identification