TY - JOUR
T1 - Towards product attributes extraction in Indonesian e-commerce platform
AU - Rif’at, Muhammad
AU - Mahendra, Rahmad
AU - Budi, Indra
AU - Wibowo, Haryo Akbarianto
N1 - Funding Information:
The authors gratefully acknowledge the support of the PITTA UI Grant Contract No. 406/UN2.R3.1/HKP.05.00/2017. The first author was also partially funded by Bukalapak.
Publisher Copyright:
© 2018 Instituto Politecnico Nacional. All rights reserved.
PY - 2018/1/1
Y1 - 2018/1/1
AB - Product attribute extraction is an important task in the e-commerce domain. Extracting pairs of attribute labels and values from free-text product descriptions is useful for many tasks, such as product matching, product categorization, faceted product search, and product recommendation. In this paper, we present a study of attribute extraction from Indonesian e-commerce product titles. We annotate 1,721 product titles with 16 attribute labels. We apply a supervised learning technique using the Conditional Random Fields (CRF) algorithm. We propose a combination of lexical, word embedding, and dictionary features to learn the attributes using a joint extraction model. Our model achieves F1-measures of 47.30% and 68.49% for full-match and partial-match evaluation, respectively. Based on the experiments, we find that extracting a larger and more diverse set of attributes simultaneously does not necessarily give worse results than extracting fewer attributes.
KW - Attributes extraction
KW - E-commerce
KW - Indonesian language
KW - Named entity recognition
KW - Product title
UR - http://www.scopus.com/inward/record.url?scp=85069546940&partnerID=8YFLogxK
U2 - 10.13053/CyS-22-4-3073
DO - 10.13053/CyS-22-4-3073
M3 - Article
AN - SCOPUS:85069546940
SN - 1405-5546
VL - 22
SP - 1367
EP - 1375
JO - Computación y Sistemas
JF - Computación y Sistemas
IS - 4
ER -