Abstract
Congested roads and daily traffic jams cause traffic disturbances. A traffic monitoring system using closed-circuit television (CCTV) has been implemented, but the information gathered is still limited for public use. This research focuses on utilizing Twitter data to monitor traffic and road conditions. Traffic-related information is extracted from social media using text mining approach. The methods include Tweet classification for filtering relevant data, location information extraction, and geocoding in order to convert text-based location into coordinate information that can be deployed into Geographic Information System. We test several supervised classification algorithms in this study, i.e., Naïve Bayes, Random Forest, Logistic Regression, and Support Vector Machine. We experiment with Bag Of Words (BOW) and Term Frequency - Inverse Document Frequency (TF-IDF) as the feature representation. The location information is extracted using Named Entity Recognition (NER) and Part-Of-Speech (POS) Tagger. The geocoding is implemented using the ArcPy library. The best model for Tweet relevance classification is the Logistic Regression classifier with the feature combination of unigram and char n-gram, achieving an F1-score of 93%. The NER-based location extractor obtains an F1-score of 54% with a precision of 96%. The geocoding success rate for extracting the location information is 68%. In addition, a web-based visualization is also implemented in order to display traffic information using the spatial interface.
Original language | English |
---|---|
Article number | 65 |
Journal | Journal of Big Data |
Volume | 9 |
Issue number | 1 |
DOIs | |
Publication status | Published - Dec 2022 |
Keywords
- Geocoding
- Information extraction
- Road condition
- Text classification
- Text mining
- Traffic situation