Item type | SIG Technical Reports(1)
Publication date | 2019-05-30
Title | A Large Scale Dataset for Cross Modal Action Understanding
Title language | en
Keyword (subject scheme: Other) | Action recognition (行動認識)
Resource type identifier | http://purl.org/coar/resource_type/c_18gh
Resource type | technical report
Author affiliation | Hitachi, Ltd. Research & Development Group
Author affiliation | Hong Kong University of Science and Technology / Hitachi
Author affiliation | Hitachi, Ltd. Research & Development Group
Author affiliation | Hitachi, Ltd. Research & Development Group
Author affiliation | Hitachi, Ltd. Research & Development Group
Author affiliation | Hitachi, Ltd. Research & Development Group
Author | Quan Kong
Author | Ziming Wu
Author | Ziwei Deng
Author | Martin Klinkigt
Author | Bin Tong
Author | Tomokazu Murakami
Abstract (description type: Other) |
In recent years, many vision-based multimodal datasets have been proposed for human action understanding. Apart from RGB, most of them provide only one additional modality, such as depth. Unlike vision modalities, however, body-worn sensors or passive sensing can avoid the failure of action understanding in cases of occlusion. Among the state-of-the-art benchmarks, no standard large-scale dataset exists in which different types of modalities are integrated. To address this disadvantage of vision-based modalities, this paper introduces a new large-scale benchmark recorded from 20 distinct subjects with seven different types of modalities: RGB videos, keypoints, acceleration, gyroscope, orientation, Wi-Fi, and pressure signals. The dataset consists of more than 36k video clips for 37 action classes covering a wide range of daily-life activities, such as desktop-related and check-in-based ones, in four distinct scenarios. On the basis of our dataset, we propose a novel multi-modality distillation model with an attention mechanism that appropriately utilizes both RGB-based and sensor-based modalities. The proposed model significantly improves the performance of action recognition by up to 8% compared to models that do not use sensor-based modalities. The experimental results confirm the effectiveness of our model on cross-subject, -view, -scene, and -session evaluation criteria. We believe that this new large-scale multimodal dataset will contribute to the community of multimodal-based action understanding.
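The abstract lists seven recorded modalities (RGB videos, keypoints, acceleration, gyroscope, orientation, Wi-Fi, pressure), 37 action classes, 20 subjects, and four scenarios, but this record gives no file or schema details. As a purely illustrative aid, the following is a minimal sketch of how a single clip of such a dataset could be represented; every field name, shape, and type below is an assumption, not something stated in the report.

```python
# Hypothetical container for one clip of a seven-modality action dataset.
# Only the modality list, the 37 classes, the 20 subjects, and the four
# scenarios come from the abstract; shapes and field names are assumptions.
from dataclasses import dataclass
import numpy as np


@dataclass
class MultimodalClip:
    rgb: np.ndarray           # (num_frames, height, width, 3) video frames
    keypoints: np.ndarray     # (num_frames, num_joints, 2) body keypoints
    acceleration: np.ndarray  # (num_samples, 3) accelerometer readings
    gyroscope: np.ndarray     # (num_samples, 3) gyroscope readings
    orientation: np.ndarray   # (num_samples, 3) orientation readings
    wifi: np.ndarray          # (num_samples, num_channels) Wi-Fi signal
    pressure: np.ndarray      # (num_samples,) pressure signal
    label: int                # action class index, 0..36
    subject_id: int           # one of the 20 recorded subjects
    scenario: str             # one of the four recording scenarios
```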
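The abstract also describes a multi-modality distillation model with an attention mechanism in which sensor-based modalities support RGB-based recognition, but it does not include implementation details. The sketch below illustrates that general pattern under our own assumptions: per-modality sensor encoders, an attention-weighted fusion producing a sensor "teacher" classifier, and a standard distillation loss that an RGB "student" network could be trained with. Module names, dimensions, and hyperparameters are hypothetical and not taken from the report.

```python
# Illustrative sketch of attention-based fusion of sensor modalities and a
# distillation loss for an RGB student; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionFusion(nn.Module):
    """Weights per-modality features with learned attention scores and sums them."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_modalities, feat_dim)
        attn = torch.softmax(self.score(feats), dim=1)  # (batch, num_modalities, 1)
        return (attn * feats).sum(dim=1)                # (batch, feat_dim)


class SensorTeacher(nn.Module):
    """Encodes each sensor modality, fuses the features by attention, classifies."""

    def __init__(self, modality_dims, feat_dim: int, num_classes: int):
        super().__init__()
        self.encoders = nn.ModuleList([nn.Linear(d, feat_dim) for d in modality_dims])
        self.fusion = AttentionFusion(feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, sensor_inputs):
        # sensor_inputs: list of (batch, modality_dim) tensors, one per modality
        feats = torch.stack(
            [F.relu(enc(x)) for enc, x in zip(self.encoders, sensor_inputs)], dim=1
        )
        return self.classifier(self.fusion(feats))


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Cross-entropy on the labels plus KL divergence toward the teacher's outputs."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_student = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

In a setup like this the sensor teacher is only needed during training, so the RGB student can still be deployed with cameras alone; whether the report's model follows this exact training scheme is not stated in this record.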
Bibliographic record ID (NCID) | AA11838947
Bibliographic information | 研究報告ユビキタスコンピューティングシステム(UBI) (IPSJ SIG Technical Report on Ubiquitous Computing Systems), Vol. 2019-UBI-62, No. 13, pp. 1-9, issued 2019-05-30
ISSN | 2188-8698
Notice | SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.
Publisher (ja) | 情報処理学会 (Information Processing Society of Japan)