Developing a Singlish Neural Language Model using ELECTRA

Galangkangin Gotera, Radityo Eko Prasojo, Yugo Kartono Isal

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Citations (Scopus)

Abstract

We develop and benchmark a pretrained neural language model for Singlish. To this end, we build a novel 3 GB Singlish free-text dataset collected from various Singaporean websites. We then use ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately) to train a transformer-based Singlish language model. We choose ELECTRA for its resource efficiency, which makes the work easier to reproduce. We further build two Singlish text classification datasets, one for sentiment analysis and one for language identification. We use both datasets to fine-tune our ELECTRA model and benchmark the results against other available pretrained models in English and Singlish. Our experiments show that our Singlish ELECTRA model is competitive with the best open-source models we found, despite being pretrained in significantly less time. We publicly release the benchmarking dataset.
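As a rough illustration of the "token replacement" objective named in the ELECTRA acronym, the sketch below builds the per-token labels a discriminator would be trained to predict. It is not the authors' code: the function name `corrupt_tokens`, the toy vocabulary, and the use of uniform random sampling in place of a learned generator network are all assumptions made for illustration.

```python
import random


def corrupt_tokens(tokens, vocab, replace_prob=0.15, seed=0):
    """Return (corrupted, labels) for a replaced-token-detection example.

    Hypothetical helper: a real ELECTRA setup samples replacements from a
    small masked language model; here we sample uniformly from `vocab`.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            new_tok = rng.choice(vocab)  # stand-in for a generator sample
        else:
            new_tok = tok
        corrupted.append(new_tok)
        # A position counts as "replaced" only if the token actually changed;
        # a sample that happens to equal the original is labeled original (0).
        labels.append(int(new_tok != tok))
    return corrupted, labels


if __name__ == "__main__":
    sentence = ["wah", "this", "one", "very", "shiok", "lah"]
    toy_vocab = ["can", "cannot", "lor", "makan", "sian"]
    corrupted, labels = corrupt_tokens(sentence, toy_vocab, replace_prob=0.5)
    print(corrupted)
    print(labels)
```

Because every position receives a label, the discriminator gets a training signal from the whole input rather than only from a masked subset, which is commonly cited as the source of ELECTRA's efficiency.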

Original language: English
Title of host publication: Proceedings - ICACSIS 2022
Subtitle of host publication: 14th International Conference on Advanced Computer Science and Information Systems
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 235-240
Number of pages: 6
ISBN (Electronic): 9781665489362
Publication status: Published - 2022
Event: 14th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2022 - Virtual, Online, Indonesia
Duration: 1 Oct 2022 to 3 Oct 2022

Publication series

Name: Proceedings - ICACSIS 2022: 14th International Conference on Advanced Computer Science and Information Systems

Conference

Conference: 14th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2022
Country/Territory: Indonesia
City: Virtual, Online
Period: 1/10/22 to 3/10/22

Keywords

  • benchmarking dataset
  • ELECTRA
  • language model pretraining
  • Singlish
