A Gesture-to-Speech Conversion System for Vowels and Nasal Consonants Based on Space Mapping

Aki Kunikoshi (國越晶); Yu Qiao (喬宇); Daisuke Saito (齋藤大輔); Nobuaki Minematsu (峯松信明); Keikichi Hirose (広瀬啓吉)

https://ipsj.ixsq.nii.ac.jp/records/83947

Name / File	License
IPSJ-JNL5309026.pdf (1.7 MB)	Copyright (c) 2012 by the Information Processing Society of Japan
Open access

Item type

Journal(1)

Publication date

2012-09-15

Title

空間写像に基づく母音と鼻子音を対象としたジェスチャ－音声変換システム

Title (English)

A Hand-to-Speech Conversion System for Vowels and Nasals Based on Space Mapping

Language

jpn

Keywords

Subject scheme

Other

Subject

[General paper] speech production, hand motion, media conversion, arrangement of gestures and vowels

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Author affiliation

The University of Tokyo

Author affiliation

Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences

Author affiliation

The University of Tokyo

Author affiliation

The University of Tokyo

Author affiliation

The University of Tokyo

Author name

Aki Kunikoshi (國越晶)

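Abstract (translated from Japanese)

Description type

Other

Description

Synthesis methods that do not go through characters or symbols, typified by articulatory speech synthesis, have attracted attention for their effectiveness in generating smooth synthetic speech based on the continuity of motion and in controlling its speaking rate. However, most applications of these methods are designed to exploit the characteristics of a particular input device, so their methodology is not easy to apply to other media or devices. In this study, we generalize the process of generating speech from body motion without restricting it to a specific body part, and treat it as a process that takes the motion of non-speech media information as input and outputs speech, that is, as a mapping problem between different media. We then propose a media-independent methodology that applies the statistical space-mapping construction methods widely used in recent years in the field of voice conversion. As one example, this paper considers speech output from hand motion. In this approach, the question is which hand posture (hereafter, gesture) to assign to which sound. We have previously reported the effectiveness of this approach, and an appropriate gesture selection method, for continuous speech generation of the five Japanese vowels from gesture input. In this paper, we focus on nasals as consonants and propose a method for estimating the gestures to assign to them: nasal speech is fed into a speech-to-gesture conversion system (the inverse of the target system) built from vowel data in which gestures and sounds are time-synchronized. Listening experiments showed that the gestures estimated by the speech-to-gesture conversion system yield a gesture-to-speech system that outputs more natural speech than the quasi-optimal gestures selected from a set of gesture candidates.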

Abstract (English)

Description type

Other

Description

Synthesis methods that do not require symbol inputs, such as articulatory synthesis, are useful in continuous speech synthesis and pitch control based on dynamic body motion, in which there are no inherent symbols. Conventional applications based on these methods, however, depend strongly on their input media because they are designed to exploit the specific characteristics of those media. Once an application is built for one medium, therefore, its methodology is difficult to apply to another. Considering this point, we treat speech generation from body motion as a mapping problem between different media, from non-acoustic media to speech, and propose a media-independent methodology. As one example of our methodology, media conversion from hand motion to speech is discussed. In recent years, GMM-based statistical mapping techniques have become widely used for voice conversion. Using similar techniques, we have developed a speech generation system that maps gesture space to vowel space and converts hand motions to vowel transitions. In this paper, we extend the system to nasal sound generation. To derive the gestures for nasals, a Speech-to-Hand conversion system was developed using the parallel data for vowels. Subjective evaluations showed that our proposed method is effective in generating more natural speech than the quasi-optimal design among a given gesture candidate set.
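The abstract refers to GMM-based statistical mapping between a gesture space and a speech space. The following is only a minimal sketch of that general idea, not the authors' implementation. It assumes scalar features (real gesture and spectral features are multi-dimensional), fits one joint Gaussian per labelled vowel as a supervised stand-in for EM training, and uses hypothetical names (`fit_joint_gmm`, `convert`) and made-up toy data. The conversion is the conditional expectation E[y | x] under the joint mixture.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_component(pairs):
    """Estimate joint Gaussian statistics from time-aligned (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    vxx = sum((x - mx) ** 2 for x, _ in pairs) / n + 1e-6  # small variance floor
    cxy = sum((x - mx) * (y - my) for x, y in pairs) / n
    return {"n": n, "mx": mx, "my": my, "vxx": vxx, "cxy": cxy}


def fit_joint_gmm(labelled_pairs):
    """Assumption: one mixture component per label instead of EM training."""
    comps = [fit_component(pairs) for pairs in labelled_pairs.values()]
    total = sum(c["n"] for c in comps)
    for c in comps:
        c["w"] = c["n"] / total  # mixture weight from relative frequency
    return comps


def convert(x, comps):
    """Map a source feature x to E[y | x] under the joint mixture."""
    likes = [c["w"] * gaussian_pdf(x, c["mx"], c["vxx"]) for c in comps]
    norm = sum(likes)
    if norm == 0.0:  # x is far from every component: fall back to the priors
        likes = [c["w"] for c in comps]
        norm = sum(likes)
    y = 0.0
    for like, c in zip(likes, comps):
        posterior = like / norm
        # Per-component linear regression from x to y
        y += posterior * (c["my"] + c["cxy"] / c["vxx"] * (x - c["mx"]))
    return y


# Made-up toy data: (gesture feature, speech feature) pairs for two vowels
data = {
    "a": [(0.0, 10.0), (0.2, 10.4), (0.4, 10.8)],
    "i": [(5.0, 2.0), (5.2, 1.8), (5.4, 1.6)],
}
gmm = fit_joint_gmm(data)
print(convert(0.2, gmm), convert(2.7, gmm), convert(5.2, gmm))
```

Because the mapping is a posterior-weighted blend of per-component regressions, an input lying between two trained gestures produces an intermediate output. Swapping the roles of x and y when fitting gives the inverse direction, which is the kind of speech-to-gesture mapping the abstract describes for deriving nasal gestures.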

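Bibliographic record ID

Source identifier type

NCID

Source identifier

AN00116647

Bibliographic information

IPSJ Journal (情報処理学会論文誌)

Vol. 53, No. 9, pp. 2291-2301, issued 2012-09-15

ISSN

Source identifier type

ISSN

Source identifier

1882-7764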
