TY - JOUR
T1 - Early Prediction of Students’ Academic Achievement: Categorical Data from Fully Online Learning on Machine-Learning Classification Algorithms
AU - Santoso, Harry Budi
PY - 2021
Y1 - 2021
N2 - One of the challenges in predicting students' academic achievement in fully online learning is defining the dataset used as a predictor. Accordingly, in this study, we define the dataset as categorical data derived from the demographic profiles, activities, and learning habits of fully online learning students at Universitas Terbuka (UT). The main objective is to predict the academic achievement of fully online learning students early, using categorical data as features, and to identify the most relevant features/predictors. We apply several machine learning (ML) classification algorithms to make early predictions of student academic achievement. The study uses 75,136,349 UT-LMS log records, combined with the demographic profiles of 101,617 undergraduate students in fully online learning. The datasets were converted into categorical data to minimize the noise arising from large datasets. The study found that the factors influencing students' academic achievement are online learning activities related to access day and study time, and the student's profession profile. Most students accessed the UT-LMS on Mondays, in the evening. The evaluations and experiments showed that the random forest algorithm achieved 85.03% accuracy when the dataset was balanced with SMOTE, ordinal data were encoded with a label encoder, and nominal data were encoded with a one-hot encoder. The findings can assist lecturers in designing instructional strategies to improve students' academic achievement. Furthermore, the principal novel contribution of this study is its approach to exploring the UT-LMS log data and student demographic data and defining them as a categorical dataset for machine-learning classification algorithms. The process of categorizing the datasets in this study is more of an art than a science, but this research can form the basis for similar research that applies other scientific principles of analysis, so that subsequent studies can achieve better accuracy.
KW - learning management system
KW - fully online learning
KW - academic achievement
KW - machine learning
UR - http://jonuns.com/index.php/journal/article/view/713
M3 - Article
SN - 1674-2974
VL - 48
JO - Hunan Daxue Xuebao/Journal of Hunan University Natural Sciences
JF - Hunan Daxue Xuebao/Journal of Hunan University Natural Sciences
IS - 9
ER -