Title: Human Action Recognition with Two-stream 3D BagNet
Original title: Two-stream 3D BagNet による人物行動認識
Language: jpn
Session: Session 2
Type: Technical Report
URI: http://id.nii.ac.jp/1001/00204340/
Download: https://ipsj.ixsq.nii.ac.jp/ej/?action=repository_action_common_download&item_id=204435&item_no=1&attribute_id=1&file_no=1
Rights: Copyright (c) 2020 by the Institute of Electronics, Information and Communication Engineers. This SIG report is only available to those in membership of the SIG.
Authors: 内田, 準也; ワン, ユ; 加藤, ジェーン
Affiliation: Graduate School of Information Science and Engineering, Ritsumeikan University

Abstract: We propose the Two-stream 3D BagNet for human action recognition. The architecture builds on the ideas of Two-stream I3D and BagNet and is intended to combine recognition accuracy with model interpretability. Starting from 3D ResNet18, we design 3D BagNet 9, 17, and 33, whose receptive fields are kept small by removing padding and replacing most 3*3*3 filters with 1*1*1 filters. For each model, we train a Spatial Net that takes RGB images as input and a Temporal Net that takes optical flow as input, and we examine both the recognition performance and the evidence behind each model's predictions. We then fuse the Spatial Net and the Temporal Net of the same 3D BagNet in a two-stream manner. Experiments confirm that this fusion significantly improves recognition performance while preserving the interpretability of the model.

NCID: AA11131797
Journal: Research Reports: Computer Vision and Image Media (CVIM) (研究報告コンピュータビジョンとイメージメディア(CVIM))
Numbering: 2020-CVIM-222, 315
Dates: 2020-05-07; 2020-04-21
ISSN: 2188-8701
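
The abstract's point that replacing 3*3*3 filters with 1*1*1 filters keeps the receptive field small can be illustrated with the standard receptive-field recurrence for stacked convolutions. The following is a minimal Python sketch, not the authors' code: the function name `receptive_field` and both layer stacks are hypothetical and do not reproduce the actual 3D BagNet 9, 17, or 33 configurations. It tracks one axis only, so the same calculation applies to each spatial and temporal dimension separately.

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Receptive field along one axis for a stack of (kernel, stride) layers."""
    rf = 1    # a single input element sees itself
    jump = 1  # distance in input units between adjacent outputs of the current layer
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (kernel - 1) jumps
        jump *= stride             # striding spreads later layers further apart
    return rf


# Hypothetical 8-layer stacks, stride 1 throughout (illustration only).
all_3 = [(3, 1)] * 8                   # every layer uses a size-3 kernel
mostly_1 = [(3, 1)] * 2 + [(1, 1)] * 6  # only two layers keep a size-3 kernel

print(receptive_field(all_3))     # 17
print(receptive_field(mostly_1))  # 5
```

A size-1 kernel contributes nothing to the recurrence, so only the layers that keep a size-3 kernel determine how much of the input each output feature can see.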