TY - JOUR
T1 - We Know You Are Living in Bali
T2 - Location Prediction of Twitter Users Using BERT Language Model
AU - Simanjuntak, Lihardo Faisal
AU - Mahendra, Rahmad
AU - Yulianti, Evi
N1 - Funding Information:
Funding: The APC was funded by Universitas Indonesia, under Hibah PPI Q1 2021 (Grant No. NKB-546/UN2.RST/HKP.05.00/2021).
Publisher Copyright:
© 2022 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2022/9
Y1 - 2022/9
AB - Twitter user location data provide essential information that can be used for various purposes. However, user location is not easy to identify because many profiles omit this information, or users enter data that do not correspond to their actual locations. Several related works have attempted to predict location from English-language tweets. In this study, we attempted to predict the location of users from Indonesian tweets. We utilized machine learning approaches, i.e., long short-term memory (LSTM) and bidirectional encoder representations from transformers (BERT), to infer Twitter users’ home locations using the profile display name, user description, and user tweets. By concatenating the display name, description, and aggregated tweets, the model achieved its best accuracy of 0.77. The IndoBERT model outperformed several baseline models.
KW - BERT
KW - Indonesian
KW - location
KW - prediction
KW - Twitter
UR - http://www.scopus.com/inward/record.url?scp=85134036779&partnerID=8YFLogxK
U2 - 10.3390/bdcc6030077
DO - 10.3390/bdcc6030077
M3 - Article
AN - SCOPUS:85134036779
SN - 2504-2289
VL - 6
JO - Big Data and Cognitive Computing
JF - Big Data and Cognitive Computing
IS - 3
M1 - 77
ER -