TY - GEN
T1 - Visual question answering for Monas tourism object using deep learning
AU - Siregar, Ahmad Hasan
AU - Chahyati, Dina
N1 - Funding Information:
This research is funded by PTUPT (Penelitian Terapan Unggulan Perguruan Tinggi) from University of Indonesia.
Publisher Copyright:
© 2020 IEEE.
Copyright:
Copyright 2021 Elsevier B.V., All rights reserved.
PY - 2020/10/17
Y1 - 2020/10/17
N2 - Visual Question Answering (VQA) is a machine learning task: given an image and a natural-language question about that image, the system must answer the question. No public VQA dataset is currently available in Bahasa Indonesia. To address this gap, this research compiles a Monas VQA dataset whose questions are written in Bahasa Indonesia and whose images depict Monas, a national memorial monument of Indonesia, as the image-specific context. This research also proposes methods for solving VQA that use a CNN for image embedding; techniques from the NLP field for sentence embedding, e.g., Bag-of-Words, fastText, BERT, and BiLSTM; and multimodal machine learning to let the two embeddings interact with each other. The best-performing model achieves 68.39% accuracy, and an analysis of the impact of the architecture is presented.
AB - Visual Question Answering (VQA) is a machine learning task: given an image and a natural-language question about that image, the system must answer the question. No public VQA dataset is currently available in Bahasa Indonesia. To address this gap, this research compiles a Monas VQA dataset whose questions are written in Bahasa Indonesia and whose images depict Monas, a national memorial monument of Indonesia, as the image-specific context. This research also proposes methods for solving VQA that use a CNN for image embedding; techniques from the NLP field for sentence embedding, e.g., Bag-of-Words, fastText, BERT, and BiLSTM; and multimodal machine learning to let the two embeddings interact with each other. The best-performing model achieves 68.39% accuracy, and an analysis of the impact of the architecture is presented.
KW - Bahasa Indonesia
KW - Dataset building
KW - Embedding
KW - Multimodal machine learning
KW - Visual Question Answering (VQA)
UR - http://www.scopus.com/inward/record.url?scp=85099738995&partnerID=8YFLogxK
U2 - 10.1109/ICACSIS51025.2020.9263149
DO - 10.1109/ICACSIS51025.2020.9263149
M3 - Conference contribution
AN - SCOPUS:85099738995
T3 - 2020 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2020
SP - 381
EP - 386
BT - 2020 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 12th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2020
Y2 - 17 October 2020 through 18 October 2020
ER -