Abstract
Freedom of speech on social media in Indonesia makes the spread of hate speech and abusive language inevitable. Without proper handling, this will lead to social disharmony between individuals and communities. Identifying hate speech and abusive language in Indonesian-language tweets is challenging. Because semantic features such as word embeddings capture the meaning of a sentence, they can be relied on to interpret tweets that contain hateful and abusive words. In this study, a word embedding (word2vec) feature and its combinations with part-of-speech and/or emoji features were used to identify hate speech and abusive language on Twitter in the Indonesian language. Combinations of unigrams with part-of-speech and/or emoji features were also evaluated. The classification algorithms used were Support Vector Machine, Random Forest Decision Tree, and Logistic Regression. The combination of unigram, part-of-speech, and emoji features obtained the highest accuracy, 79.85%, with an F-measure of 87.51%.
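As a rough illustration of how unigram, part-of-speech, and emoji features might be combined into a single feature set, the following minimal sketch builds one sparse feature dictionary per tweet. It is not the authors' implementation: the function name `extract_features`, the toy lexicon `TOY_POS`, and the emoji pattern `EMOJI_RE` are all assumptions made for this example, and a real system would use a trained Indonesian part-of-speech tagger rather than a lookup table.

```python
import re
from collections import Counter

# Hypothetical toy POS lexicon (assumption): stands in for a trained
# Indonesian part-of-speech tagger, which this sketch does not include.
TOY_POS = {"kamu": "PRON", "bodoh": "ADJ", "sekali": "ADV"}

# Rough emoji matcher over common emoji code-point blocks (assumption,
# not an exhaustive definition of what counts as an emoji).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def extract_features(tweet):
    """Return one sparse dict mixing unigram, POS-tag, and emoji counts."""
    text = tweet.lower()
    emojis = EMOJI_RE.findall(text)
    tokens = re.findall(r"[a-z]+", text)  # crude ASCII word tokenizer

    features = Counter()
    for tok in tokens:
        features["uni=" + tok] += 1                      # unigram feature
        features["pos=" + TOY_POS.get(tok, "UNK")] += 1  # POS-tag feature
    for emo in emojis:
        features["emoji=" + emo] += 1                    # emoji feature
    return dict(features)


if __name__ == "__main__":
    print(extract_features("Kamu bodoh sekali \U0001F621\U0001F621"))
```

A dictionary of this shape could then be converted to a numeric matrix (for example with scikit-learn's `DictVectorizer`) and passed to any of the classifiers named in the abstract. That step is omitted here to keep the sketch dependency-free.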
Original language | English |
---|---|
Title of host publication | AISS '19 |
Subtitle of host publication | Proceedings of the International Conference on Advanced Information Science and System |
Publisher | Association for Computing Machinery (ACM) |
ISBN (Electronic) | 9781450372916 |
Publication status | Published - 15 Nov 2019 |
Event | 2019 International Conference on Advanced Information Science and System, AISS 2019 - Singapore, Singapore. Duration: 15 Nov 2019 → 17 Nov 2019 |
Conference
Conference | 2019 International Conference on Advanced Information Science and System, AISS 2019 |
---|---|
Country/Territory | Singapore |
City | Singapore |
Period | 15/11/19 → 17/11/19 |
Keywords
- Abusive Language
- Hate Speech
- Machine Learning