TY - GEN
T1 - Abusive language and hate speech detection for Javanese and Sundanese languages in tweets
T2 - 2021 11th International Workshop on Computer Science and Engineering, WCSE 2021
AU - Putri, Shofianina Dwi Ananda
AU - Ibrohim, Muhammad Okky
AU - Budi, Indra
N1 - Funding Information:
This work was supported by the PUTI Prosiding research grant NKB-3486/UN2.RST/HKP.05.00/2020 from the Directorate of Research and Community Services, Universitas Indonesia.
Publisher Copyright:
© 2021 11th International Workshop on Computer Science and Engineering, WCSE 2021. All Rights Reserved.
PY - 2021
Y1 - 2021
AB - Indonesia’s demography, as an archipelago with many tribes and local languages, adds variety to the communication styles of its people. Every region in Indonesia has its own distinct culture, accents, and languages. These demographic conditions can influence the characteristics of the language used on social media such as Twitter, where Indonesians can be found using their own local languages to communicate and express their thoughts in tweets. Research on identifying hate speech and abusive language has become an attractive and growing topic; however, research on Indonesian local languages is still rare. This paper analyzes machine learning approaches such as Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest Decision Tree (RFDT) for detecting hate speech and abusive language in Sundanese and Javanese, two Indonesian local languages. The classifiers were combined with several term-weighting features, such as word n-grams and character n-grams. The experiments were evaluated using the F-measure, which exceeded 60% for both local languages.
KW - Abusive
KW - Hate speech
KW - Indonesian local language
KW - Javanese
KW - Sundanese
KW - Twitter
UR - http://www.scopus.com/inward/record.url?scp=85114203618&partnerID=8YFLogxK
U2 - 10.18178/wcse.2021.02.011
DO - 10.18178/wcse.2021.02.011
M3 - Conference contribution
AN - SCOPUS:85114203618
T3 - 2021 11th International Workshop on Computer Science and Engineering, WCSE 2021
SP - 461
EP - 465
BT - 2021 11th International Workshop on Computer Science and Engineering, WCSE 2021
PB - International Workshop on Computer Science and Engineering (WCSE)
Y2 - 19 June 2021 through 21 June 2021
ER -