TY - GEN
T1 - Enriching knowledge bases with counting quantifiers
AU - Mirza, Paramita
AU - Razniewski, Simon
AU - Darari, Fariz
AU - Weikum, Gerhard
N1 - Publisher Copyright:
© Springer Nature Switzerland AG 2018.
PY - 2018
Y1 - 2018
AB - Information extraction traditionally focuses on extracting relations between identifiable entities. Yet, texts often also contain counting information, stating that a subject is in a specific relation with a number of objects, without mentioning the objects themselves, for example, “California is divided into 58 counties”. Such counting quantifiers can help in a variety of tasks such as query answering or knowledge base curation, but are neglected by prior work. This paper develops CINEX, the first full-fledged system for extracting counting information from text. We employ distant supervision using fact counts from a knowledge base as training seeds, and develop novel techniques to address three challenges: (i) non-maximal training seeds due to the incompleteness of knowledge bases, (ii) sparse and skewed observations in text sources, and (iii) high diversity of linguistic patterns. Experiments with five human-evaluated relations show that CINEX can achieve 60% average precision for extracting counting information. In a large-scale experiment, we demonstrate the potential for knowledge base enrichment by applying CINEX to 2,474 frequent relations in Wikidata. CINEX can assert the existence of 2.5M facts for 110 distinct relations, which is 28% more than the existing Wikidata facts for these relations.
UR - http://www.scopus.com/inward/record.url?scp=85054848519&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-00671-6_11
DO - 10.1007/978-3-030-00671-6_11
M3 - Conference contribution
AN - SCOPUS:85054848519
SN - 9783030006709
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 179
EP - 197
BT - The Semantic Web – ISWC 2018 - 17th International Semantic Web Conference, 2018, Proceedings
A2 - Suárez-Figueroa, Mari Carmen
A2 - Presutti, Valentina
A2 - Kaffee, Lucie-Aimee
A2 - Simperl, Elena
A2 - Sabou, Marta
A2 - Vrandecic, Denny
A2 - Celino, Irene
A2 - Bontcheva, Kalina
PB - Springer Verlag
T2 - 17th International Semantic Web Conference, ISWC 2018
Y2 - 8 October 2018 through 12 October 2018
ER -