TY - GEN
T1 - Sexual Violence Classification as Hate Speech using Indonesian Tweet
AU - Ramadhan, Muammar Notareza
AU - Budi, Indra
AU - Santoso, Aris Budi
AU - Suryono, Ryan Randy
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Hate speech is an act of communication, carried out either directly or through the media by groups or individuals, with the aim of provoking, inciting, or insulting another group or individual. A total of 3,640 hate speech cases have spread across various social media platforms. There were also 677 cases of online gender-based violence (KBGO), dominated by sexual violence spread through online media. Our research aims to produce the best classification model with high accuracy by comparing several combinations of machine learning techniques. We collected 9,035 Twitter user opinions to be used as a dataset. Of the 6,089 opinions that were successfully annotated, 5,102 were classified as non-hate speech and 987 as hate speech. We propose an SVM classification model with TF-IDF (unigram) as the feature extraction method and oversampling methods such as ROS and SMOTE to address the imbalanced dataset problem and improve classification performance. The SVM classification model reaches the best accuracy, 0.942, with an F1-score of 0.940.
KW - Hate speech
KW - machine learning
KW - oversampling
KW - twitter
UR - http://www.scopus.com/inward/record.url?scp=85143200260&partnerID=8YFLogxK
U2 - 10.1109/ISITDI55734.2022.9944482
DO - 10.1109/ISITDI55734.2022.9944482
M3 - Conference contribution
AN - SCOPUS:85143200260
T3 - Proceeding - 2022 International Symposium on Information Technology and Digital Innovation: Technology Innovation During Pandemic, ISITDI 2022
SP - 114
EP - 120
BT - Proceeding - 2022 International Symposium on Information Technology and Digital Innovation
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 International Symposium on Information Technology and Digital Innovation, ISITDI 2022
Y2 - 27 July 2022 through 28 July 2022
ER -