TY - GEN
T1 - Hate speech detection in the Indonesian language
T2 - 9th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2017
AU - Alfina, Ika
AU - Mulia, Rio
AU - Fanany, Mohamad Ivan
AU - Ekanata, Yudo
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/7/2
Y1 - 2017/7/2
AB - The objective of our work is to detect hate speech in the Indonesian language. As far as we know, research on this subject is still very rare. The only study we found created a dataset for hate speech against religion, but the quality of that dataset is inadequate. Our research aimed to create a new dataset that covers hate speech in general, including hatred for religion, race, ethnicity, and gender. In addition, we conducted a preliminary study using a machine learning approach, which so far is the most frequently used approach for classifying text. We compared the performance of several features and machine learning algorithms for hate speech detection. The features extracted were word n-grams with n=1 and n=2, character n-grams with n=3 and n=4, and negative sentiment. Classification was performed using Naïve Bayes, Support Vector Machine, Bayesian Logistic Regression, and Random Forest Decision Tree. An F-measure of 93.5% was achieved when using the word n-gram feature with the Random Forest Decision Tree algorithm. Results also show that the word n-gram feature outperformed the character n-gram feature.
KW - building dataset
KW - classification
KW - hate speech detection
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85047165416&partnerID=8YFLogxK
U2 - 10.1109/ICACSIS.2017.8355039
DO - 10.1109/ICACSIS.2017.8355039
M3 - Conference contribution
AN - SCOPUS:85047165416
T3 - 2017 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2017
SP - 233
EP - 237
BT - 2017 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2017
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 28 October 2017 through 29 October 2017
ER -