TY - JOUR
T1 - Multi-label text classification of Indonesian customer reviews using bidirectional encoder representations from transformers language model
AU - Nissa, Nuzulul Khairu
AU - Yulianti, Evi
N1 - Funding Information: This research was funded by the Directorate of Research and Development, Universitas Indonesia, under Hibah PUTI Pascasarjana 2022 (Grant No. NKB-103/UN2.RST/HKP.05.00/2022).
N1 - Publisher Copyright: © 2023 Institute of Advanced Engineering and Science. All rights reserved.
PY - 2023/10
Y1 - 2023/10
AB - Customer reviews are a critical resource for supporting decision-making in various industries. To understand how customers perceive each aspect of a product, we can first identify all aspects discussed in the customer reviews by performing multi-label text classification. In this work, we evaluate the effectiveness of two proposed strategies that use a bidirectional encoder representations from transformers (BERT) language model pre-trained on the Indonesian language, referred to as IndoBERT, to perform multi-label text classification. First, IndoBERT is used as the feature representation and combined with convolutional neural network-extreme gradient boosting (CNN-XGBoost). Second, IndoBERT is used as both the feature representation and the classifier to solve the classification task directly. Additional analysis compares our results with those obtained using the multilingual BERT model. According to our experimental results, our first model, which uses IndoBERT as the feature representation, significantly outperforms some baselines. Our second model, which uses IndoBERT as both feature representation and classifier, significantly improves on the effectiveness of our first model. In summary, our proposed models improve on the Word2Vec-CNN-XGBoost baseline by 19.19% and 6.17% in terms of accuracy and F1 score, respectively.
KW - Convolutional neural network
KW - Customer review
KW - IndoBERT
KW - Multi-label text classification
KW - Word2Vec
UR - http://www.scopus.com/inward/record.url?scp=85163984009&partnerID=8YFLogxK
U2 - 10.11591/ijece.v13i5.pp5641-5652
DO - 10.11591/ijece.v13i5.pp5641-5652
M3 - Article
AN - SCOPUS:85163984009
SN - 2088-8708
VL - 13
SP - 5641
EP - 5652
JO - International Journal of Electrical and Computer Engineering
JF - International Journal of Electrical and Computer Engineering
IS - 5
ER -