TY - JOUR
T1 - Determining subject headings of documents using information retrieval models
AU - Yulianti, Evi
AU - Rahadianti, Laksmita
N1 - Funding Information:
This research is supported by the PUTI Q3 grant number NKB-4379/UN2.RST/HKP.05.00/2020 from Universitas Indonesia.
Publisher Copyright:
© 2021 Institute of Advanced Engineering and Science. All rights reserved.
PY - 2021/8
Y1 - 2021/8
N2 - A subject heading is a controlled-vocabulary term that describes the topic of a document and is important for finding and organizing library resources. Assigning appropriate subject headings to a document, however, is a time-consuming process. We therefore conduct a novel study on the effectiveness of information retrieval models, i.e., the language model (LM) and the vector space model (VSM), for automatically generating a ranked list of relevant subject headings, with the aim of giving librarians recommendations that help them determine subject headings effectively and efficiently. Our results show that a large share of our queries (up to 61%) have relevant subject headings among the ten top-ranked recommendations and that, on average, the first relevant subject heading appears at an early position (3rd rank). This indicates that document retrieval methods can help the subject heading assignment process. LM and VSM show comparable performance, except when the search unit is the title, where VSM outperforms LM by 8-22%. Our further analysis identifies three faculty pairs with potential for research collaboration, as their students' theses often have overlapping subject headings: i) economy and business-social and political sciences, ii) nursing-public health, and iii) medicine-public health.
AB - A subject heading is a controlled-vocabulary term that describes the topic of a document and is important for finding and organizing library resources. Assigning appropriate subject headings to a document, however, is a time-consuming process. We therefore conduct a novel study on the effectiveness of information retrieval models, i.e., the language model (LM) and the vector space model (VSM), for automatically generating a ranked list of relevant subject headings, with the aim of giving librarians recommendations that help them determine subject headings effectively and efficiently. Our results show that a large share of our queries (up to 61%) have relevant subject headings among the ten top-ranked recommendations and that, on average, the first relevant subject heading appears at an early position (3rd rank). This indicates that document retrieval methods can help the subject heading assignment process. LM and VSM show comparable performance, except when the search unit is the title, where VSM outperforms LM by 8-22%. Our further analysis identifies three faculty pairs with potential for research collaboration, as their students' theses often have overlapping subject headings: i) economy and business-social and political sciences, ii) nursing-public health, and iii) medicine-public health.
KW - Document retrieval
KW - Information retrieval
KW - Language model
KW - Subject heading
KW - Vector space model
UR - http://www.scopus.com/inward/record.url?scp=85112106885&partnerID=8YFLogxK
U2 - 10.11591/ijeecs.v23.i2.pp1049-1058
DO - 10.11591/ijeecs.v23.i2.pp1049-1058
M3 - Article
AN - SCOPUS:85112106885
SN - 2502-4752
VL - 23
SP - 1049
EP - 1058
JO - Indonesian Journal of Electrical Engineering and Computer Science
JF - Indonesian Journal of Electrical Engineering and Computer Science
IS - 2
ER -