Human-Robot Interaction through Multi-modal Semantic Understanding
https://ipsj.ixsq.nii.ac.jp/records/218464
License | Copyright (c) 2022 by the Information Processing Society of Japan
Access | Open Access
Item type | SIG Technical Reports(1)
Release date | 2022-06-10
Title | Human-Robot Interaction through Multi-modal Semantic Understanding
Title language | en
Language | eng
Keyword (subject scheme: Other) | Invited talk (招待講演)
Resource type identifier | http://purl.org/coar/resource_type/c_18gh
Resource type | technical report
Author affiliation | Mitsubishi Electric Research Laboratories
Author | Chiori Hori
Abstract (description type: Other) | Science fiction television and movies have portrayed humanoid robots with human-like capabilities to recognize their surroundings and the context of a situation. While computers have recently become much more capable at many perceptual tasks, they are not yet ready to take the place of a human in many situations. The recent artificial intelligence (AI) boom and the intelligent use of data acquired from various sensors have certainly accelerated the development of the technologies needed to realize these advanced human-like capabilities in machines. We have developed a new AI system, called Scene-Aware Interaction, that enables machines to translate their perception and understanding of a scene into natural language and to respond to it, so that they can interact more effectively with humans. To develop such a machine, we proposed the Audio-Visual Scene-Aware Dialog (AVSD) task, collected an AVSD dataset, developed AVSD technologies, and hosted the AVSD challenge track three times at the Dialog System Technology Challenges (DSTC). We tested the performance of answer generation and of temporal reasoning, the latter by finding evidence in the video that supports each answer. This paper introduces a new system that extends our AV-transformer-based system with attentional multimodal fusion, joint student-teacher learning (JSTL), and model-combination techniques, achieving state-of-the-art performance on the AVSD datasets for DSTC7, DSTC8, and DSTC10. We also applied the Scene-Aware Interaction technology to a car navigation system that recognizes contextual objects and events based on multimodal sensing information, such as images and video captured with cameras, audio recorded with microphones, and localization information measured with LiDAR.
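The attentional multimodal fusion mentioned in the abstract can be illustrated with a minimal sketch, assuming PyTorch; the module and all names below are hypothetical illustrations, not the authors' implementation. The idea shown: text-side queries attend separately to the audio and video feature streams, and a learned gate weights the two attended summaries per query token.

```python
# Minimal sketch of attentional multimodal fusion (hypothetical; not the
# authors' implementation). Text embeddings attend separately to audio and
# video features; a learned gate mixes the two attended summaries.
import torch
import torch.nn as nn

class AttentionalFusion(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn_audio = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_video = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Produces two mixing weights (audio vs. video) per text position.
        self.gate = nn.Linear(2 * d_model, 2)

    def forward(self, text, audio, video):
        # text:  (B, Lt, d)  dialog/question embeddings
        # audio: (B, La, d)  audio-feature sequence
        # video: (B, Lv, d)  video-feature sequence
        a, _ = self.attn_audio(text, audio, audio)   # text attends to audio
        v, _ = self.attn_video(text, video, video)   # text attends to video
        w = torch.softmax(self.gate(torch.cat([a, v], dim=-1)), dim=-1)
        # Convex combination of the attended modality summaries.
        return w[..., :1] * a + w[..., 1:] * v

fusion = AttentionalFusion()
out = fusion(torch.randn(2, 10, 512),   # text
             torch.randn(2, 50, 512),   # audio
             torch.randn(2, 40, 512))   # video
print(out.shape)  # torch.Size([2, 10, 512])
```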
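Similarly, the joint student-teacher learning (JSTL) the abstract refers to can be sketched as a combined objective; this is again a hypothetical sketch under common distillation assumptions, not the paper's exact formulation. Both networks are trained on the labels jointly, while the student additionally matches the teacher's softened output distribution.

```python
# Hypothetical JSTL-style loss: teacher and student learn jointly from
# labels, and the student also mimics the teacher via a temperature-scaled
# KL term. Not the authors' exact formulation.
import torch
import torch.nn.functional as F

def jstl_loss(student_logits, teacher_logits, targets, alpha=0.5, tau=2.0):
    ce_student = F.cross_entropy(student_logits, targets)
    ce_teacher = F.cross_entropy(teacher_logits, targets)
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1).detach(),  # no grad to teacher here
        reduction="batchmean",
    ) * tau * tau
    return ce_student + ce_teacher + alpha * kd
```

In an AVSD-style setting, one common arrangement would be a teacher that sees all modalities and a student that sees a reduced set, so the KL term transfers multimodal knowledge to the student; whether the paper uses exactly this split is an assumption here.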
NCID | AN10442647
Bibliographic information | IPSJ SIG Technical Reports: Spoken Language Processing (SLP), Vol. 2022-SLP-142, No. 5, pp. 1-7, published 2022-06-10
ISSN | 2188-8663
Notice | SIG Technical Reports are non-refereed and hence may later appear in journals, conference proceedings, symposia, etc.
Publisher | Information Processing Society of Japan (情報処理学会)