Item type | SIG Technical Reports(1)
Release date | 2018-09-05
Title | A Comparative Study of Deep Learning Approaches for Visual Question Classification in Community QA
Language | eng
Keywords (scheme: Other) | Question Answering / Retrieval
Resource type identifier | http://purl.org/coar/resource_type/c_18gh
Resource type | technical report
Author affiliations | Waseda University; Carnegie Mellon University; Yahoo Japan Corporation; Yahoo Japan Corporation; Yahoo Japan Corporation; Waseda University
Authors | Hsin-Wen Liu; Avikalp Srivastava; Sumio Fujita; Toru Shimizu; Riku Togashi; Tetsuya Sakai
Abstract | Tasks that take not only text but also images as input, such as Visual Question Answering (VQA), have received growing attention and become an active research field in recent years. In this study, we consider the task of Visual Question Classification (VQC), where a question containing both text and an image must be classified into one of the predefined categories of a Community Question Answering (CQA) site. Our experiments use real data from Yahoo Chiebukuro, a major Japanese CQA site. To our knowledge, this work is the first to systematically compare deep learning approaches to VQC for CQA. Our study shows that the model using HieText for text representation, ResNet50 for image representation, and Multimodal Compact Bilinear pooling to combine the two representations achieves the highest performance on the VQC task.
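The best-performing model in the abstract fuses a HieText text representation with ResNet50 image features via Multimodal Compact Bilinear (MCB) pooling. For readers unfamiliar with MCB, below is a minimal PyTorch sketch of the general fusion technique applied to pre-computed feature vectors; it is not the authors' implementation. The text dimension (1024), the MCB output dimension (8000), and the number of categories (16) are placeholder assumptions, not values from the report; 2048 is the standard pooled ResNet50 feature size, and HieText itself is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CountSketch(nn.Module):
    """Random count-sketch projection used inside compact bilinear pooling."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.out_dim = out_dim
        # Fixed (non-trainable) random hash buckets and +/-1 signs.
        self.register_buffer("h", torch.randint(out_dim, (in_dim,)))
        self.register_buffer("s", torch.randint(0, 2, (in_dim,)).float() * 2 - 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) -> (batch, out_dim)
        out = x.new_zeros(x.size(0), self.out_dim)
        out.index_add_(1, self.h, x * self.s)
        return out


class MCBPooling(nn.Module):
    """Multimodal Compact Bilinear pooling of a text and an image vector.

    Sketch both inputs, then convolve the sketches via FFT: the element-wise
    product in the frequency domain approximates the outer (bilinear) product.
    """

    def __init__(self, text_dim: int, image_dim: int, mcb_dim: int = 8000):
        super().__init__()
        self.mcb_dim = mcb_dim
        self.sketch_text = CountSketch(text_dim, mcb_dim)
        self.sketch_image = CountSketch(image_dim, mcb_dim)

    def forward(self, text_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        ft = torch.fft.rfft(self.sketch_text(text_feat), dim=1)
        fi = torch.fft.rfft(self.sketch_image(image_feat), dim=1)
        fused = torch.fft.irfft(ft * fi, n=self.mcb_dim, dim=1)
        # Signed square root and L2 normalization, standard after MCB.
        fused = torch.sign(fused) * torch.sqrt(fused.abs() + 1e-8)
        return F.normalize(fused, dim=1)


class VQCHead(nn.Module):
    """Classifies a question into a CQA category from the fused features."""

    def __init__(self, text_dim: int, image_dim: int, n_classes: int, mcb_dim: int = 8000):
        super().__init__()
        self.pool = MCBPooling(text_dim, image_dim, mcb_dim)
        self.fc = nn.Linear(mcb_dim, n_classes)

    def forward(self, text_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        return self.fc(self.pool(text_feat, image_feat))


# Toy usage with placeholder dimensions: a hypothetical 1024-d text encoding
# and the 2048-d pooled ResNet50 feature, classified into 16 dummy categories.
model = VQCHead(text_dim=1024, image_dim=2048, n_classes=16)
logits = model(torch.randn(4, 1024), torch.randn(4, 2048))
print(logits.shape)  # torch.Size([4, 16])
```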
Bibliographic record ID (NCID) | AN10112482
Bibliographic information | IPSJ SIG Technical Report: Database Systems (DBS), Vol. 2018-DBS-167, No. 17, pp. 1-6, issued 2018-09-05
ISSN | 2188-871X
Notice | SIG Technical Reports are non-refereed and hence may later appear in journals, conferences, symposia, etc.
Publisher | Information Processing Society of Japan (情報処理学会)