TY - GEN
T1 - Hate Code Detection in Indonesian Tweets using Machine Learning Approach
T2 - 8th International Conference on Information and Communication Technology, ICoICT 2020
AU - Elisabeth, Damayanti
AU - Budi, Indra
AU - Ibrohim, Muhammad Okky
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/6
Y1 - 2020/6
AB - Social media has a side effect: freedom of speech can turn into freedom to hate. People spread hate speech in creative ways to evade hate speech detectors, and implicit intent is often expressed through codes. These codes serve to disguise the targets of the hate speech. This paper presents an implementation of hate code detection for Indonesian tweets using machine learning and a classification explainer. First, we developed a ground-truth dataset of hate codes. We generated hate codes from two scenarios, i.e., hate codes from hate speech classification and hate codes from hate code classification. We used Logistic Regression (LR), Naive Bayes (NB), and Random Forest Decision Tree (RFDT) as our classifiers, with TF-IDF and word bigrams as features. The codes take the form of words and phrases. The best f-measure, 94.90%, was obtained from hate code classification using Logistic Regression with abusive code elimination. This score indicates that the model can detect all tweets that contain no hate codes. For tweets annotated as containing hate codes, the f-measure for recognizing all of the hate codes is 28.23%, and the recall is 56.91%.
KW - hate code
KW - hate code detection
KW - hate speech
KW - multi-label classification
UR - http://www.scopus.com/inward/record.url?scp=85090997232&partnerID=8YFLogxK
U2 - 10.1109/ICoICT49345.2020.9166251
DO - 10.1109/ICoICT49345.2020.9166251
M3 - Conference contribution
AN - SCOPUS:85090997232
T3 - 2020 8th International Conference on Information and Communication Technology, ICoICT 2020
BT - 2020 8th International Conference on Information and Communication Technology, ICoICT 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 24 June 2020 through 26 June 2020
ER -