TY - JOUR
T1 - Discovering high quality answers in community question answering archives using a hierarchy of classifiers
AU - Toba, Hapnes
AU - Ming, Zhao Yan
AU - Adriani, Mirna
AU - Chua, Tat Seng
N1 - Funding Information: This research is supported by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office, and by the I-MHERE Project of the Faculty of Computer Science, Universitas Indonesia, IBRD Loan No. 4789-IND & IDA Credit No. 4077-IND, Directorate General of Higher Education, Ministry of Education and Culture, Republic of Indonesia.
PY - 2014/3/10
Y1 - 2014/3/10
AB - In community-based question answering (CQA) services, where answers are generated by humans, users may expect better answers than those produced by an automatic question answering system. However, the user-generated answers in CQA archives are not always of high quality. Most existing work on answer quality prediction uses the same model for all answers, even though each answer is intrinsically different. Yet modeling each individual QA pair differently is not feasible in practice. To balance efficiency and accuracy, we propose a hybrid hierarchy-of-classifiers framework to model the QA pairs. First, we analyze the question type to guide the selection of the appropriate answer quality model. Second, we use the information from question analysis to predict the expected answer features, and we train type-based quality classifiers that are hierarchically aggregated into an overall answer quality score. We also propose a number of novel features that are effective in distinguishing the quality of answers. We tested the framework on a dataset of about 50,000 QA pairs from Yahoo! Answers. The results show that our proposed framework is effective in identifying high-quality answers. Moreover, further analysis shows that our framework classifies low-quality answers more accurately than a single-classifier approach.
KW - Answer quality
KW - Hierarchical supervised learning
KW - Question answering system
KW - Question type analysis
KW - User generated content
UR - http://www.scopus.com/inward/record.url?scp=84891836269&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2013.10.030
DO - 10.1016/j.ins.2013.10.030
M3 - Article
AN - SCOPUS:84891836269
SN - 0020-0255
VL - 261
SP - 101
EP - 115
JO - Information Sciences
JF - Information Sciences
ER -