TY - GEN
T1 - Extracting disease-symptom relationships from health question and answer forum
AU - Halim, Christian
AU - Wicaksono, Alfan Farizki
AU - Adriani, Mirna
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/7/2
Y1 - 2017/7/2
N2 - In this paper, we address the problem of automatically extracting disease-symptom relationships from health question-answer forums, motivated by their usefulness for medical question answering systems. We divide the main task into two subtasks, since they pose different challenges: (1) disease-symptom extraction across sentences, and (2) disease-symptom extraction within a sentence. For both subtasks, we employ a machine learning approach that leverages several hand-crafted features, such as syntactic features (i.e., information from part-of-speech tags) and pre-trained word vectors. We formulate the problem as a binary classification task, in which we classify the 'indicating' relation between a pair of Symptom and Disease entities. To evaluate performance, we also collected and annotated a corpus containing 463 pairs of question-answer threads from several Indonesian health consultation websites. Our experiments show that, as expected, the first subtask is more difficult than the second. For the first subtask, disease-symptom relation extraction achieved an F1 measure of only 36%, while the second subtask reached 76%. To the best of our knowledge, this is the first work addressing this relation extraction task for both the 'across' and 'within' sentence settings, especially in Indonesia.
AB - In this paper, we address the problem of automatically extracting disease-symptom relationships from health question-answer forums, motivated by their usefulness for medical question answering systems. We divide the main task into two subtasks, since they pose different challenges: (1) disease-symptom extraction across sentences, and (2) disease-symptom extraction within a sentence. For both subtasks, we employ a machine learning approach that leverages several hand-crafted features, such as syntactic features (i.e., information from part-of-speech tags) and pre-trained word vectors. We formulate the problem as a binary classification task, in which we classify the 'indicating' relation between a pair of Symptom and Disease entities. To evaluate performance, we also collected and annotated a corpus containing 463 pairs of question-answer threads from several Indonesian health consultation websites. Our experiments show that, as expected, the first subtask is more difficult than the second. For the first subtask, disease-symptom relation extraction achieved an F1 measure of only 36%, while the second subtask reached 76%. To the best of our knowledge, this is the first work addressing this relation extraction task for both the 'across' and 'within' sentence settings, especially in Indonesia.
KW - Indonesian Language
KW - Information Extraction
KW - Machine Learning
KW - Natural Language
KW - Question Answering System
KW - Relation Extraction
UR - http://www.scopus.com/inward/record.url?scp=85046675030&partnerID=8YFLogxK
U2 - 10.1109/IALP.2017.8300552
DO - 10.1109/IALP.2017.8300552
M3 - Conference contribution
AN - SCOPUS:85046675030
T3 - Proceedings of the 2017 International Conference on Asian Language Processing, IALP 2017
SP - 87
EP - 90
BT - Proceedings of the 2017 International Conference on Asian Language Processing, IALP 2017
A2 - Tong, Rong
A2 - Zhang, Yue
A2 - Lu, Yanfeng
A2 - Dong, Minghui
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 21st International Conference on Asian Language Processing, IALP 2017
Y2 - 5 December 2017 through 7 December 2017
ER -