TY - GEN
T1 - Hierarchical attention network with XGBoost for recognizing insufficiently supported argument
AU - Suhartono, Derwin
AU - Gema, Aryo Pradipta
AU - Winton, Suhendro
AU - David, Theodorus
AU - Fanany, Mohamad Ivan
AU - Arymurthy, Aniati Murni
N1 - Publisher Copyright:
© Springer International Publishing AG 2017.
PY - 2017
Y1 - 2017
AB - In this paper, we present an empirical analysis of the Hierarchical Attention Network (HAN) as a feature extractor used in conjunction with eXtreme Gradient Boosting (XGBoost) as the classifier to recognize insufficiently supported arguments, using a publicly available dataset. Besides HAN + XGBoost, we performed experiments with several other deep learning models, such as Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), bidirectional LSTM, and bidirectional GRU. All results with the best hyper-parameters are presented. We report three key findings: (1) Shallow models work significantly better than deep models when only a small dataset is available. (2) The attention mechanism improves the deep models' results: on average, it improves the Area Under the Receiver Operating Characteristic Curve (ROC-AUC) score of the Recurrent Neural Network (RNN) by a margin of 18.94%, and the hierarchical attention network gives a ROC-AUC score 2.25% higher than the non-hierarchical one. (3) Using XGBoost as a replacement for the last fully connected layer improves the macro F1 score by 5.26%. Overall, our best setting achieves a 1.88% improvement over the state-of-the-art result.
KW - Deep learning
KW - Hierarchical Attention Network
KW - Insufficiently supported argument
KW - Shallow learning
KW - XGBoost
UR - https://www.scopus.com/pages/publications/85034266613
U2 - 10.1007/978-3-319-69456-6_15
DO - 10.1007/978-3-319-69456-6_15
M3 - Conference contribution
AN - SCOPUS:85034266613
SN - 9783319694559
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 174
EP - 188
BT - Multi-disciplinary Trends in Artificial Intelligence - 11th International Workshop, MIWAI 2017, Proceedings
A2 - Phon-Amnuaisuk, Somnuk
A2 - Ang, Swee-Peng
A2 - Lee, Soo-Young
PB - Springer Verlag
T2 - 11th Multi-disciplinary International Workshop on Artificial Intelligence, MIWAI 2017
Y2 - 20 November 2017 through 22 November 2017
ER -