TY - GEN
T1 - Hate speech identification using the hate codes for Indonesian tweets
AU - Pratiwi, Nur Indah
AU - Budi, Indra
AU - Jiwanggi, Meganingrum Arista
N1 - Funding Information:
We gratefully thank Universitas Indonesia for the International Publication Grants (Hibah PIT-9) No. NKB-0010/UN2.R3.1/HKP.05.00/2019.
Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/7/19
Y1 - 2019/7/19
N2 - Hate speech has become a major source of negativity across social media. As social media platforms have become aware of this issue, they have gradually introduced regulations to limit its spread, e.g., by automatically blocking or suspending accounts or posts that contain hate speech. However, social media users have become more creative in expressing hate speech. To evade these regulations, users often use special codes to interact with each other. This study aims to use these hate codes to identify hate speech in social media data. We used Indonesian tweets as the dataset and Logistic Regression, Support Vector Machine, Naïve Bayes, and Random Forest Decision Tree as the classifiers. The highest F-measure for hate speech identification was 80.71%, obtained by using the hate code feature with Logistic Regression as the classifier.
KW - Classification
KW - Hate code
KW - Hate speech
KW - Twitter
UR - http://www.scopus.com/inward/record.url?scp=85072801330&partnerID=8YFLogxK
U2 - 10.1145/3352411.3352432
DO - 10.1145/3352411.3352432
M3 - Conference contribution
AN - SCOPUS:85072801330
T3 - ACM International Conference Proceeding Series
SP - 128
EP - 133
BT - Proceedings of the 2019 2nd International Conference on Data Science and Information Technology, DSIT 2019
PB - Association for Computing Machinery
T2 - 2nd International Conference on Data Science and Information Technology, DSIT 2019
Y2 - 19 July 2019 through 21 July 2019
ER -