Item type: SIG Technical Reports(1)
Publication date: 2019-08-22
Title: Double Attention-based Multimodal Neural Machine Translation with Semantic Image Region
Language: eng
Resource type identifier: http://purl.org/coar/resource_type/c_18gh
Resource type: technical report
Author affiliation: Tokyo Metropolitan University
Author affiliation: Tokyo Metropolitan University
Author affiliation: Osaka University
Author affiliation: Osaka University
Author names:
Yuting Zhao
Mamoru Komachi
Tomoyuki Kajiwara
Chenhui Chu
Abstract:
Current work on multimodal neural machine translation (MNMT) has mostly focused on the effect of combining visual and textual modalities to improve translation performance. However, it has been suggested that the visual modality is only marginally beneficial. Because conventional visual attention mechanisms select visual features from equal-sized grids of an image generated by a convolutional neural network, features of grids that are unrelated to the image content may have only a limited effect in aligning visual concepts with the corresponding textual objects. In contrast, we propose to apply semantic image regions to MNMT, integrating visual and textual features through two separate attention mechanisms (double attention) to improve target token prediction. On the Multi30k dataset, our approach achieves improvements of 0.5 and 0.9 BLEU points on the English-German and English-French translation tasks, respectively, compared with the baseline double attention-based MNMT.
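The abstract describes the paper's core idea: a double attention decoder that attends separately to the textual encoder states and to semantic image region features (e.g., regions proposed by an object detector, rather than uniform CNN grids), combining the two context vectors at each target-token prediction step. Below is a minimal PyTorch sketch of one such decoder step; the class name, dimensions, and GRU-based state update are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DoubleAttentionDecoderStep(nn.Module):
    """One decoder step with two separate attention mechanisms:
    one over textual encoder states, one over semantic image regions.
    A sketch of the general double attention idea, not the paper's exact model."""

    def __init__(self, hidden_dim, text_dim, region_dim):
        super().__init__()
        # Attention over source-sentence encoder states.
        self.text_attn = nn.MultiheadAttention(
            hidden_dim, num_heads=1, kdim=text_dim, vdim=text_dim, batch_first=True)
        # Attention over object-detector region features (semantic image regions).
        self.region_attn = nn.MultiheadAttention(
            hidden_dim, num_heads=1, kdim=region_dim, vdim=region_dim, batch_first=True)
        # Update the decoder state from both context vectors.
        self.gru = nn.GRUCell(hidden_dim * 2, hidden_dim)

    def forward(self, dec_state, text_states, region_feats):
        # dec_state:    (batch, hidden_dim)            current decoder hidden state
        # text_states:  (batch, src_len, text_dim)     textual encoder outputs
        # region_feats: (batch, n_regions, region_dim) image region features
        query = dec_state.unsqueeze(1)  # (batch, 1, hidden_dim)
        text_ctx, _ = self.text_attn(query, text_states, text_states)
        region_ctx, _ = self.region_attn(query, region_feats, region_feats)
        # Concatenate both modality contexts and advance the decoder state,
        # which is then used to predict the next target token.
        ctx = torch.cat([text_ctx.squeeze(1), region_ctx.squeeze(1)], dim=-1)
        return self.gru(ctx, dec_state)
```

Because the two attentions are computed independently, each target token can align simultaneously with source words and with detected image regions, which is what lets region-level (rather than grid-level) visual features contribute to translation.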
Bibliographic record ID (NCID): AN10115061
Bibliographic information: IPSJ SIG Technical Report, Natural Language Processing (NL), Vol. 2019-NL-241, No. 18, pp. 1-7, issued 2019-08-22
ISSN: 2188-8779
Notice: SIG Technical Reports are non-refereed and may therefore later appear in journals, conferences, symposia, etc.
Publisher: Information Processing Society of Japan (情報処理学会)