@techreport{oai:ipsj.ixsq.nii.ac.jp:00226724,
  author = {徐, 宸飛 and 岡留, 有哉 and 石黒, 浩 and 中村, 泰 and Chenfei, Xu and Yuya, Okadome and Hiroshi, Ishiguro and Yutaka, Nakamura},
  issue  = {69},
  month  = {Jun},
  note   = {In recent years, advances in navigation technology have produced robots that move robustly in real space, but no mechanism has been established for moving in coordination with the motion of surrounding people in environments shared with humans. We therefore model state changes during locomotion, including human motion, as the prediction of future video frames, and are investigating the feasibility of generative models of video frames trained on first-person video data collected while a person walks. This report describes the properties of the generated data under different network architectures and loss functions. Current research on robot navigation mainly emphasizes responsive behaviors, which are often insufficiently intelligent if robots are to be better integrated into human society. To address this issue, we model changes of state, including other people's motion, using a deep generative model. For Variational Autoencoders, factors such as hyperparameters, the dimensionality of the latent space, and the loss function directly influence generation quality, yet the concrete effects of these factors remain understudied. This paper therefore investigates the problem in depth. In particular, we built a VAE-based generative model with a 3D ConvNet for a video reconstruction task, then summarized and compared model performance across combinations of these influencing factors and examined the disentangled representations they learn. A prediction model based on a Variational Autoencoder with Arbitrary Conditioning (VAEAC) was also evaluated. Our results demonstrate how the performance of the generative models changes, offering a useful reference for the video prediction problem.},
  title  = {A Study on generating First-Person Video in moving daily space using variational auto encoders},
  year   = {2023}
}