TY - GEN
T1 - Mobile application review classification for the Indonesian language using machine learning approach
AU - Ekanata, Yudo
AU - Budi, Indra
N1 - Funding Information:
We gratefully thank Universitas Indonesia for financial support through the International Publication Grants for Student Thesis (HIBAH PITTA).
Publisher Copyright:
© 2018 IEEE.
PY - 2018/6/27
Y1 - 2018/6/27
N2 - A mobile app can receive thousands of user reviews, so app developers need a lot of time to sort through them and find the information that matters for further development. This study therefore aims to classify mobile application user reviews automatically, using a machine learning approach. The features extracted from each user review are unigrams, bigrams, star rating, review length, and the ratio of the number of words with positive and negative sentiment. For classification, we used Naïve Bayes, Support Vector Machine, Logistic Regression, and Decision Tree. The experimental results show that Logistic Regression gives the best F-measure, 85%, when combined with unigrams plus sentence length and sentiment score. Unigrams proved to be the most important feature, since the additional features (sentence length and sentiment score) increased the F-measure by only about 1%. Bigrams and star rating have a negative impact on classifier performance.
AB - A mobile app can receive thousands of user reviews, so app developers need a lot of time to sort through them and find the information that matters for further development. This study therefore aims to classify mobile application user reviews automatically, using a machine learning approach. The features extracted from each user review are unigrams, bigrams, star rating, review length, and the ratio of the number of words with positive and negative sentiment. For classification, we used Naïve Bayes, Support Vector Machine, Logistic Regression, and Decision Tree. The experimental results show that Logistic Regression gives the best F-measure, 85%, when combined with unigrams plus sentence length and sentiment score. Unigrams proved to be the most important feature, since the additional features (sentence length and sentiment score) increased the F-measure by only about 1%. Bigrams and star rating have a negative impact on classifier performance.
KW - app review
KW - classification
KW - machine learning
KW - text mining
UR - http://www.scopus.com/inward/record.url?scp=85050213588&partnerID=8YFLogxK
U2 - 10.1109/CATA.2018.8398667
DO - 10.1109/CATA.2018.8398667
M3 - Conference contribution
AN - SCOPUS:85050213588
T3 - 2018 4th International Conference on Computer and Technology Applications, ICCTA 2018
SP - 117
EP - 121
BT - 2018 4th International Conference on Computer and Technology Applications, ICCTA 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th International Conference on Computer and Technology Applications, ICCTA 2018
Y2 - 3 May 2018 through 5 May 2018
ER -