Construction of Audio-Visual Speech Corpus using Motion Capture System

四倉, 達夫; 森島, 繁生; 中村, 哲; Tatsuo, Yotsukura; Shigeo, Morishima; Satoshi, Nakamura

https://ipsj.ixsq.nii.ac.jp/records/36806

Name / File	License	Action
IPSJ-HI04110004.pdf (1.1 MB)	Copyright (c) 2004 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2004-09-10

Title

モーションキャプチャシステムを用いたマルチモーダル音声コーパスの構築

Title (English)

Construction of Audio-Visual Speech Corpus using Motion Capture System

Language

jpn

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

ATR Spoken Language Translation Research Laboratory

Author affiliation

ATR Spoken Language Translation Research Laboratory / School of Science and Engineering, Waseda University

Author affiliation

ATR Spoken Language Translation Research Laboratory

Author name

四倉, 達夫

Author name (English)

Tatsuo, Yotsukura

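Abstract

Description type

Other

Description

This report describes how we built a multimodal speech corpus containing speech, facial video, and the positions and displacements of the facial organs during utterance, and how the data were processed. The utterance texts were the ATR Japanese phoneme-balanced sentences, and the corpus consists of the utterances of one female speaker. Displacements were measured with an optical motion capture system: by placing many markers on the speaker's face, we recorded detailed, high-accuracy 3D facial position data that cannot be obtained from facial image information alone. To compute the motion of the facial organs alone, we further propose a method that uses an affine transformation to remove head motion and keep only the facial position information. Finally, so that the measured displacements can easily be reproduced as speech animation on a computer, we describe a method for assigning the facial-organ motion to a face object composed of a mesh.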

Abstract (English)

Description type

Other

Description

In this paper, we describe the construction and processing of a multimodal speech corpus that contains speech data, facial video data, and the positions and movements of facial organs. One female speaker uttered the ATR Japanese phoneme-balanced sentences. Facial movements were measured with an optical motion capture system. We captured high-resolution 3D data by placing many markers on the speaker's face. Furthermore, to compute the displacements of the facial organs alone, we propose a method that removes head movements using an affine transformation. Finally, so that facial animation can be produced easily from this motion data, we describe a technique for assigning the measured movements to a facial polygon model.
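
The head-motion removal mentioned in the abstract can be illustrated with a minimal sketch. The record does not say which markers serve as the rigid reference or how the transformation is estimated, so everything below is an assumption: a least-squares 3D affine fit on a hypothetical set of reference markers (`ref_idx`, imagined as markers on parts of the head that do not deform during speech), which is then applied to every marker in the frame. The function names `fit_affine` and `remove_head_motion` are illustrative and do not come from the report.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 3D affine map (A, t) such that dst ~= src @ A.T + t.

    src, dst: (n, 3) arrays of corresponding points, n >= 4 and not coplanar.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Homogeneous coordinates: each row is [x, y, z, 1].
    X = np.hstack([src, np.ones((len(src), 1))])
    # Solve X @ M ~= dst for M of shape (4, 3).
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return M[:3].T, M[3]

def remove_head_motion(frame, ref_idx, neutral):
    """Map one frame of markers into the head pose of the neutral frame.

    frame, neutral: (n_markers, 3) marker positions.
    ref_idx: indices of the (assumed) rigid reference markers.
    """
    frame = np.asarray(frame, dtype=float)
    neutral = np.asarray(neutral, dtype=float)
    # Estimate the head transform from the reference markers only,
    # then apply it to all markers in the frame.
    A, t = fit_affine(frame[ref_idx], neutral[ref_idx])
    return frame @ A.T + t

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    neutral = rng.normal(size=(6, 3))        # synthetic neutral-pose markers
    ref_idx = [0, 1, 2, 3]                   # hypothetical rigid markers
    local = np.zeros_like(neutral)
    local[4] = [0.0, -0.5, 0.1]              # synthetic lip displacement
    c, s = np.cos(0.3), np.sin(0.3)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    frame = (neutral + local) @ R.T + np.array([1.0, 2.0, 3.0])
    stabilized = remove_head_motion(frame, ref_idx, neutral)
    # Residual motion relative to the neutral pose, head motion removed.
    print(np.round(stabilized - neutral, 6))
```

Subtracting the neutral frame from the stabilized frame leaves only the displacement that is not explained by the reference markers, which in this synthetic example is the displacement given to marker 4. With real data the reference markers are noisy, so the fit is approximate rather than exact.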

Bibliographic record ID

Source identifier type

NCID

Source identifier

AA1221543X

Bibliographic information

IPSJ SIG Technical Reports: Human-Computer Interaction (HCI)

Vol. 2004, No. 90 (2004-HI-110), pp. 19-24, issued 2004-09-10

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan
