TY - GEN
T1 - Classification of pornographic content on Twitter using support vector machine and Naive Bayes
AU - Izzah, Nur
AU - Budi, Indra
AU - Louvan, Samuel
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/6/27
Y1 - 2018/6/27
AB - The Internet has many benefits, among them gaining knowledge and obtaining the latest information. However, the Internet can be used by anyone and can contain any kind of information, including negative content such as pornography, radicalism, racial intolerance, violence, fraud, gambling, security and drugs. This content causes the number of child victims of pornography on social media to increase every year. A system that detects pornographic content on social media is therefore needed. This study aims to determine the best model for detecting pornographic content. Model selection considers unigram and bigram features, the classification algorithm, and k-fold cross-validation. The classification algorithms used are Support Vector Machine and Naive Bayes. The highest F1-score, 91.14%, is achieved by the model that combines Support Vector Machine, most-common-word features, and a combination of unigrams and bigrams.
KW - K-fold cross validation
KW - classification
KW - naive bayes
KW - pornography
KW - social media
KW - support vector machine
KW - text mining
UR - http://www.scopus.com/inward/record.url?scp=85050230197&partnerID=8YFLogxK
U2 - 10.1109/CATA.2018.8398674
DO - 10.1109/CATA.2018.8398674
M3 - Conference contribution
AN - SCOPUS:85050230197
T3 - 2018 4th International Conference on Computer and Technology Applications, ICCTA 2018
SP - 156
EP - 160
BT - 2018 4th International Conference on Computer and Technology Applications, ICCTA 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th International Conference on Computer and Technology Applications, ICCTA 2018
Y2 - 3 May 2018 through 5 May 2018
ER -