TY - GEN
T1 - Transforming Table to Knowledge Graph using A Rule-based Pipeline
AU - Yulianti, Gries
AU - Krisnadhi, Adila Alfa
AU - Hilman, Muhammad Hafizhuddin
N1 - Funding Information:
The authors acknowledge the support of Universitas Indonesia through Hibah PITTA B 2019 “Pemrosesan Tabular Data, Teks, dan Data berbasis Graf untuk Perolehan Informasi dan Pengayaan Basis Pengetahuan”, contract number NKB-0508/UN2.R3.1/HKP.05.00/2019.
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - The usability of data in the millions of relational tables on the web can be improved by transforming them into a knowledge graph (KG). Unlike relational tables, which have a fixed number of columns and no explicit links between the entities they contain, KGs impose no column restrictions, and each entity and relation is uniquely identified by a Uniform Resource Identifier (URI), which enables interlinking between entities. A complete transformation requires mapping table cells to KG entities, table columns to KG properties (i.e., binary relations), and the whole table to a KG class. The last of these allows the subject entity of each row (e.g., countries in a country table) to be explicitly assigned an appropriate category. Unfortunately, most existing transformation methods focus only on mapping table cells to KG entities; the remaining two tasks are rarely addressed. In this work, we propose a table-to-KG transformation pipeline that accomplishes all three tasks. Our approach differs from T2K, the only existing transformation approach that performs all three tasks, in that T2K employs supervised learning models, whereas our pipeline consists of a number of heuristics that do not depend on ground truths in the T2D gold standard dataset. In cell-to-entity mapping, the proposed method outperforms T2K and achieves performance comparable to TabEAno, an unsupervised approach specialized for this task. In the other two tasks, although our method does not outperform T2K, its unsupervised nature means that it does not critically depend on gold standard data.
AB - The usability of data in the millions of relational tables on the web can be improved by transforming them into a knowledge graph (KG). Unlike relational tables, which have a fixed number of columns and no explicit links between the entities they contain, KGs impose no column restrictions, and each entity and relation is uniquely identified by a Uniform Resource Identifier (URI), which enables interlinking between entities. A complete transformation requires mapping table cells to KG entities, table columns to KG properties (i.e., binary relations), and the whole table to a KG class. The last of these allows the subject entity of each row (e.g., countries in a country table) to be explicitly assigned an appropriate category. Unfortunately, most existing transformation methods focus only on mapping table cells to KG entities; the remaining two tasks are rarely addressed. In this work, we propose a table-to-KG transformation pipeline that accomplishes all three tasks. Our approach differs from T2K, the only existing transformation approach that performs all three tasks, in that T2K employs supervised learning models, whereas our pipeline consists of a number of heuristics that do not depend on ground truths in the T2D gold standard dataset. In cell-to-entity mapping, the proposed method outperforms T2K and achieves performance comparable to TabEAno, an unsupervised approach specialized for this task. In the other two tasks, although our method does not outperform T2K, its unsupervised nature means that it does not critically depend on gold standard data.
KW - cell to entity
KW - column to property
KW - knowledge graph
KW - T2D gold standard
KW - table
KW - table to class
KW - transformation
UR - http://www.scopus.com/inward/record.url?scp=85123843805&partnerID=8YFLogxK
U2 - 10.1109/ICACSIS53237.2021.9631350
DO - 10.1109/ICACSIS53237.2021.9631350
M3 - Conference contribution
AN - SCOPUS:85123843805
T3 - 2021 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2021
BT - 2021 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 13th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2021
Y2 - 23 October 2021 through 26 October 2021
ER -