HMM-Based Audio-Visual Text-to-Speech Synthesis: An Image-Based Approach (HMMに基づいた視聴覚テキスト音声合成－画像ベースアプローチ)

Shinji Sako (酒向 慎司); Keiichi Tokuda (徳田 恵一); Takashi Masuko (益子 貴史); Takao Kobayashi (小林 隆夫); Tadashi Kitamura (北村 正)

https://ipsj.ixsq.nii.ac.jp/records/11558

Name / File	License	Action
IPSJ-JNL4307018.pdf (1.1 MB)	Copyright (c) 2002 by the Information Processing Society of Japan
Open access

Item type

Journal(1)

Publication date

2002-07-15

Title

HMMに基づいた視聴覚テキスト音声合成－画像ベースアプローチ (HMM-based audio-visual text-to-speech synthesis: an image-based approach)

Title (English)

HMM-Based Audio-visual Speech Synthesis-Pixel-based Approach

Language

jpn

Keywords

Subject scheme

Other

Subject

Special issue: Spoken Language Information Processing and Its Applications

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Other title

Speech synthesis and conversion and their applications

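Author affiliation

Graduate School of Engineering, Nagoya Institute of Technology

Author affiliation

Graduate School of Engineering, Nagoya Institute of Technology

Author affiliation

Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology

Author affiliation

Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology

Author affiliation

Graduate School of Engineering, Nagoya Institute of Technology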

Author affiliation (English)

Department of Computer Science, Nagoya Institute of Technology

Author affiliation (English)

Department of Computer Science, Nagoya Institute of Technology

Author affiliation (English)

Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology

Author affiliation (English)

Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology

Author affiliation (English)

Department of Computer Science, Nagoya Institute of Technology

Author name

酒向慎司

Author name (English)

Shinji, Sako

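Abstract

Description type

Other

Description

We propose a system based on hidden Markov models (HMMs) that generates lip image sequences close to real images from arbitrary input text. The HMM-based speech synthesis method we proposed previously has already yielded a high-quality text-to-speech system; here we apply the same framework to lip image generation using an image-based approach. We show that this makes it possible to generate synchronized speech and lip image sequences from text. A distinctive feature of the method is that the dimensionality of the lip parameters is reduced using eigenlips obtained by principal component analysis. In the synthesis system, the optimal lip parameter sequence is obtained from the concatenated lip-image HMMs under a maximum-likelihood criterion. Because this step takes into account dynamic features (lip motion) as well as static features (lip shape), it produces a continuously varying lip parameter sequence, from which smoothly changing lip image sequences can be synthesized.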
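The likelihood-maximization step described in the abstract can be illustrated with a minimal sketch. It assumes a single lip parameter, diagonal Gaussian state outputs, a state sequence that has already been fixed, and a central-difference delta window; none of these details are given in this record, and the function names `build_window_matrix` and `generate_trajectory` are hypothetical. Under those assumptions, the most likely static trajectory `c` solves the linear system `(W^T P W) c = W^T P mu`, where `W` maps static values to stacked static and delta features and `P` is the diagonal precision matrix.

```python
import numpy as np

def build_window_matrix(T):
    """Return W (2T x T) such that W @ c stacks static values and their deltas."""
    static = np.eye(T)
    delta = np.zeros((T, T))
    for t in range(T):
        # Assumed window: delta_t = 0.5 * (c[t+1] - c[t-1]), indices clamped at the edges
        delta[t, min(t + 1, T - 1)] += 0.5
        delta[t, max(t - 1, 0)] -= 0.5
    return np.vstack([static, delta])

def generate_trajectory(mu_static, var_static, mu_delta, var_delta):
    """Maximum-likelihood static trajectory given per-frame Gaussian means and variances."""
    T = len(mu_static)
    W = build_window_matrix(T)
    mu = np.concatenate([mu_static, mu_delta])
    prec = 1.0 / np.concatenate([var_static, var_delta])  # diagonal inverse covariance
    A = W.T @ (prec[:, None] * W)  # W^T P W
    b = W.T @ (prec * mu)          # W^T P mu
    return np.linalg.solve(A, b)

# Usage: a step in the static means is smoothed by the delta constraint.
mu_static = np.array([0.0] * 5 + [1.0] * 5)
trajectory = generate_trajectory(mu_static, np.ones(10), np.zeros(10), 0.1 * np.ones(10))
```

When the delta variances are made very large, the delta terms carry almost no weight and the result falls back to the static means, which is the piecewise-constant trajectory that the dynamic features are meant to avoid.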

Abstract (English)

Description type

Other

Description

This paper describes a technique for text-to-audio-visual speech synthesis based on hidden Markov models (HMMs), in which lip image sequences are modeled using a pixel-based approach. To reduce the dimensionality of the visual speech feature space, we obtain a set of orthogonal vectors (eigenlips) by principal component analysis (PCA), and use a subset of the PCA coefficients and their dynamic features as visual speech parameters. Auditory and visual speech parameters are modeled by HMMs separately, and lip movements are synchronized with auditory speech by using the phoneme boundaries of the auditory speech when synthesizing lip image sequences. We confirmed that the generated auditory speech and lip image sequences are realistic and naturally synchronized.
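The eigenlip parameterization mentioned in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: it assumes grayscale lip frames in a NumPy array, computes PCA through an SVD of the mean-centered pixel matrix, and uses a simple central-difference delta. The function names (`fit_eigenlips`, `project`, `add_delta`, `reconstruct`) are hypothetical.

```python
import numpy as np

def fit_eigenlips(frames, n_components):
    """frames: (T, H, W) grayscale lip images. Returns the mean image and eigenlip basis."""
    X = frames.reshape(len(frames), -1).astype(np.float64)
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centered pixel data
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(frames, mean, basis):
    """Map each frame to its PCA coefficients (the static visual parameters)."""
    X = frames.reshape(len(frames), -1).astype(np.float64)
    return (X - mean) @ basis.T

def add_delta(coeffs):
    """Append assumed dynamic features: delta_t = 0.5 * (c[t+1] - c[t-1]), edges repeated."""
    padded = np.pad(coeffs, ((1, 1), (0, 0)), mode="edge")
    delta = 0.5 * (padded[2:] - padded[:-2])
    return np.hstack([coeffs, delta])

def reconstruct(coeffs, mean, basis, shape):
    """Map PCA coefficients back to images of the given (H, W) shape."""
    return (coeffs @ basis + mean).reshape((-1,) + shape)

# Usage with synthetic data standing in for real lip images.
rng = np.random.default_rng(0)
frames = rng.random((20, 8, 6))
mean, basis = fit_eigenlips(frames, n_components=5)
features = add_delta(project(frames, mean, basis))  # static + delta, shape (20, 10)
```

Keeping only the leading components trades reconstruction detail for a compact feature vector, which is what makes per-frame Gaussian modeling of pixel data tractable.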

Bibliographic record ID

Source identifier type

NCID

Source identifier

AN00116647

Bibliographic information

IPSJ Journal (情報処理学会論文誌)

Vol. 43, No. 7, pp. 2169-2176, issued 2002-07-15

ISSN

Source identifier type

ISSN

Source identifier

1882-7764
