TY - JOUR
T1 - Adjusted TextRank for keyword extraction in petrochemical project correspondence documents
AU - Atmoko, Indri
AU - Yulianti, Evi
AU - Jiwanggi, Meganingrum Arista
N1 - Publisher Copyright: © 2024 Institute of Advanced Engineering and Science. All rights reserved.
PY - 2024/8
Y1 - 2024/8
N2 - A large petrochemical construction project is typically executed by multiple parties, all bound by a contractual agreement. During the execution phase, issues may arise because the work details are not clearly specified in that agreement. These issues are formally communicated and documented through written correspondence letters. By identifying important keywords within these letters, a comprehensive narrative of the project, including its associated issues, can be reconstructed and analyzed. In this research, we introduce an adjusted TextRank algorithm that integrates external features from the Indonesian FastText language model and term frequency-inverse document frequency (TF-IDF) scores to identify important keywords within a dataset of correspondence letters from petrochemical projects. The adjustments refine phrase detection, the estimation of semantic relationships between words, and part-of-speech (POS) identification for words or phrases. Our results show that the proposed adjustments improve F1 scores by 24.1% and 25% over the baseline standard TextRank and standard TF-IDF, respectively.
AB - A large petrochemical construction project is typically executed by multiple parties, all bound by a contractual agreement. During the execution phase, issues may arise because the work details are not clearly specified in that agreement. These issues are formally communicated and documented through written correspondence letters. By identifying important keywords within these letters, a comprehensive narrative of the project, including its associated issues, can be reconstructed and analyzed. In this research, we introduce an adjusted TextRank algorithm that integrates external features from the Indonesian FastText language model and term frequency-inverse document frequency (TF-IDF) scores to identify important keywords within a dataset of correspondence letters from petrochemical projects. The adjustments refine phrase detection, the estimation of semantic relationships between words, and part-of-speech (POS) identification for words or phrases. Our results show that the proposed adjustments improve F1 scores by 24.1% and 25% over the baseline standard TextRank and standard TF-IDF, respectively.
KW - Bahasa Indonesia
KW - Keyword extraction
KW - Phrase detection
KW - Project management
KW - TextRank
UR - http://www.scopus.com/inward/record.url?scp=85195186292&partnerID=8YFLogxK
U2 - 10.11591/ijeecs.v35.i2.pp1171-1180
DO - 10.11591/ijeecs.v35.i2.pp1171-1180
M3 - Article
AN - SCOPUS:85195186292
SN - 2502-4752
VL - 35
SP - 1171
EP - 1180
JO - Indonesian Journal of Electrical Engineering and Computer Science
JF - Indonesian Journal of Electrical Engineering and Computer Science
IS - 2
ER -