TY - GEN
T1 - Abusive Language and Hate Speech Detection for Indonesian-Local Language in Social Media Text
AU - Putri, Shofianina Dwi Ananda
AU - Ibrohim, Muhammad Okky
AU - Budi, Indra
N1 - Funding Information:
The authors acknowledge the PUTI Doktor research grant NKB-3226/UN2.RST/HKP.05.00/2020 from Directorate Research and Community Services, Universitas Indonesia.
Publisher Copyright:
© 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - In social media, people are free to express their feelings and thoughts. However, people can also use abusive language and hate speech to insult or humiliate individuals or groups on social media platforms such as Twitter. Various detection methods have been developed to control the spread of abusive language and hate speech in Indonesia, but detection still focuses on monolingual text. As a country with various ethnicities and cultures, Indonesia also has a variety of local languages. This study examines abusive language and hate speech detection on Twitter text that also contains five local languages: Javanese, Sundanese, Madurese, Minangkabau, and Musi. In this work, we present a preliminary evaluation to identify the best-performing machine learning methods for detecting abusive language and hate speech on Twitter for each local language. We use several machine learning algorithms, namely Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest Decision Tree (RFDT), as classifiers, and TF-IDF-weighted word n-grams and character n-grams as features. The experiments use 5-fold cross-validation and are evaluated by F1-score. The SVM classifier with word n-gram features shows the best F1-score for each dataset.
AB - In social media, people are free to express their feelings and thoughts. However, people can also use abusive language and hate speech to insult or humiliate individuals or groups on social media platforms such as Twitter. Various detection methods have been developed to control the spread of abusive language and hate speech in Indonesia, but detection still focuses on monolingual text. As a country with various ethnicities and cultures, Indonesia also has a variety of local languages. This study examines abusive language and hate speech detection on Twitter text that also contains five local languages: Javanese, Sundanese, Madurese, Minangkabau, and Musi. In this work, we present a preliminary evaluation to identify the best-performing machine learning methods for detecting abusive language and hate speech on Twitter for each local language. We use several machine learning algorithms, namely Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest Decision Tree (RFDT), as classifiers, and TF-IDF-weighted word n-grams and character n-grams as features. The experiments use 5-fold cross-validation and are evaluated by F1-score. The SVM classifier with word n-gram features shows the best F1-score for each dataset.
KW - Abusive
KW - Hate speech
KW - Indonesian local languages
KW - Machine learning
KW - Twitter
UR - http://www.scopus.com/inward/record.url?scp=85111468836&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-79757-7_9
DO - 10.1007/978-3-030-79757-7_9
M3 - Conference contribution
AN - SCOPUS:85111468836
SN - 9783030797560
T3 - Lecture Notes in Networks and Systems
SP - 88
EP - 98
BT - Recent Advances in Information and Communication Technology 2021 - Proceedings of the 17th International Conference on Computing and Information Technology, IC2IT 2021
A2 - Meesad, Phayung
A2 - Sodsee, Sunantha
A2 - Jitsakul, Watchareewan
A2 - Tangwannawit, Sakchai
PB - Springer Science and Business Media Deutschland GmbH
T2 - 17th International Conference on Computing and Information Technology, IC2IT 2021
Y2 - 13 May 2021 through 14 May 2021
ER -