Why Videos Do Not Guide Translations in Video-guided Machine Translation? An Empirical Evaluation of Video-guided Machine Translation Dataset
https://ipsj.ixsq.nii.ac.jp/records/217664
License: Copyright (c) 2022 by the Information Processing Society of Japan
Access: Open access

Field | Value
---|---
Item type | Trans(1)
Release date | 2022-04-07
Title (en) | Why Videos Do Not Guide Translations in Video-guided Machine Translation? An Empirical Evaluation of Video-guided Machine Translation Dataset
Language | eng
Keywords | [Research Paper] natural language processing, multimodal machine translation, video-guided machine translation, machine translation
Subject scheme | Other
Resource type identifier | http://purl.org/coar/resource_type/c_6501
Resource type | journal article
Author affiliations | Tokyo Institute of Technology; Tokyo Metropolitan University; Tokyo Metropolitan University; Tokyo Institute of Technology
Authors | Zhishen Yang; Tosho Hirasawa; Mamoru Komachi; Naoaki Okazaki

Abstract

Video-guided machine translation (VMT) is a type of multimodal machine translation that uses information from videos to guide translation. However, in the VMT 2020 challenge, adding videos only marginally improved the performance of VMT models compared to their text-only baselines. In this study, we systematically analyze why videos did not guide translation. Specifically, we evaluate the models in input degradation and visual sensitivity experiments and compare the results with a human evaluation on VATEX, the dataset used in the VMT 2020 challenge. The results indicate that the short and straightforward video descriptions in VATEX are sufficient to perform the translations, which renders the videos redundant in the process. Based on our findings, we provide suggestions on the design of future VMT datasets. Code and human-evaluated data are publicly available for future research.

This is a preprint of an article intended for publication in the Journal of Information Processing (JIP). This preprint should not be cited. This article should be cited as: Journal of Information Processing Vol.30 (2022) (online).
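The input degradation experiments the abstract mentions can be illustrated with a small probe: translate the same test set with the full input, with the video features blanked out, and with the source text masked, then compare corpus BLEU. Below is a minimal sketch, assuming a user-supplied `translate(sources, video_feats)` callable and the `sacrebleu` package; the interface and names are hypothetical stand-ins, not the authors' released code.

```python
# Minimal sketch of an input-degradation probe (hypothetical interface;
# not the paper's released code). If zeroing out the video features
# barely changes BLEU while masking the source text collapses it, the
# model is effectively translating from text alone.
from typing import Callable, List, Sequence

import sacrebleu  # pip install sacrebleu


def degradation_probe(
    translate: Callable[[Sequence[str], Sequence[List[float]]], List[str]],
    sources: Sequence[str],
    video_feats: Sequence[List[float]],
    references: Sequence[str],
) -> dict:
    """Score the same model under full and degraded inputs."""

    def bleu(hypotheses: List[str]) -> float:
        # sacrebleu expects a list of reference streams.
        return sacrebleu.corpus_bleu(hypotheses, [list(references)]).score

    blank_video = [[0.0] * len(v) for v in video_feats]  # degrade video input
    masked_text = ["<mask>"] * len(sources)              # degrade text input

    return {
        "full_input": bleu(translate(sources, video_feats)),
        "video_blanked": bleu(translate(sources, blank_video)),
        "text_masked": bleu(translate(masked_text, video_feats)),
    }
```

A large drop for `text_masked` combined with a near-zero drop for `video_blanked` would indicate that the text carries essentially all of the signal, consistent with the abstract's finding that the VATEX videos are redundant for translation.
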
Field | Value
---|---
Bibliographic record ID (NCID) | AA11464847
Bibliographic information | IPSJ Transactions on Databases (TOD), Vol. 15, No. 2, published 2022-04-07
ISSN | 1882-7799
Publisher | Information Processing Society of Japan