TY - GEN
T1 - Internet addiction and mental health prediction using ensemble learning based on web browsing history
AU - Purwandari, Betty
AU - Wibawa, Wayan Surya
AU - Fitriah, Nilam
AU - Christia, Mellia
AU - Bintari, Dini Rahma
N1 - Funding Information:
This paper’s publication was supported by the PIT9 grant from Universitas Indonesia under the contract number NKB-0006/UN2.R3.1/HKP.05.00/2019. It was performed under the Faculty of Computer Science in collaboration with the Faculty of Psychology.
Publisher Copyright:
© 2020 Association for Computing Machinery.
PY - 2020/1/12
Y1 - 2020/1/12
N2 - The widespread prevalence of Web browsing may lead to Internet Addiction Disorder (IAD), which negatively affects Web users' general health. Young people who are very active online are prone to IAD, which harms their academic performance and social lives. The earlier the detection, the better the treatment. Therefore, this pilot study aimed to predict IAD among the youth to encourage early treatment. The sample included 30 undergraduate students at Universitas Indonesia (UI). Their Web browsing histories were recorded from their laptops for five weeks and analyzed using a support vector machine (SVM) with a radial basis function (RBF) kernel as the machine learning method for prediction. The results were subsequently compared with those of ensemble learning methods, namely random forest (RF) and gradient boosting (GB). The predictions were then matched against respondents' responses to the Internet Addiction Test (IAT) questionnaire, which measures IAD levels. Respondents' general health data were collected with the 12-item General Health Questionnaire (GHQ-12). Features were extracted from the Web browsing histories to classify activities into five types: information retrieval (IR), instant messaging (IM), social networking services (SNS), leisure, and online shopping (OS). The extracted features served as input to classify participants' IAD status, and the results were compared with their IAD results from the IAT questionnaire. Machine learning was also employed to classify the input into respondents' general health (GH) status, which was matched with their responses to the GHQ-12 questionnaire. The findings revealed that, employing SVM, the prediction accuracies were 66.67% for the IAD status and 65.17% for the GH status. The precisions for predicting IAD and GH were 63.33% and 44.33%, according to RF. Moreover, the accuracies were 63.33% and 67.17%, according to GB. Results indicated that RF decreased prediction accuracies, whereas GB differed only slightly from SVM. For each classifier, IAD status was predicted more accurately than GH status. An alternative way to improve the outcomes is to obtain data from the Internet firewall instead of the Web browsing history on users' laptops. The firewall can provide richer and more realistic records of Web access, collected from any device connected to the university's computer networks. However, this requires consent from the participants and from the authority managing the infrastructure. If each class has a balanced number of examples, we plan to add more features and employ other types of ensemble learning for higher accuracy. Furthermore, performing a multiclass prediction can demonstrate specific IAD severity levels and the class of mental health status, i.e., anxiety and depression.
KW - Data mining
KW - Ensemble learning
KW - Internet addiction
KW - Mental health
KW - Support Vector Machine
KW - Web behavior
UR - http://www.scopus.com/inward/record.url?scp=85082003982&partnerID=8YFLogxK
U2 - 10.1145/3378936.3378947
DO - 10.1145/3378936.3378947
M3 - Conference contribution
AN - SCOPUS:85082003982
T3 - ACM International Conference Proceeding Series
SP - 155
EP - 159
BT - Proceedings of the 2020 3rd International Conference on Software Engineering and Information Management, ICSIM 2020 - Workshop 2020 the 3rd International Conference on Big Data and Smart Computing, ICBDSC 2020
PB - Association for Computing Machinery
T2 - 3rd International Conference on Software Engineering and Information Management, ICSIM 2020 - and its Workshop 2020 the 3rd International Conference on Big Data and Smart Computing, ICBDSC 2020
Y2 - 12 January 2020 through 15 January 2020
ER -