Speech synthesis as a statistical machine learning problem (統計的機械学習問題としての音声合成)

Keiichi Tokuda (徳田恵一)

https://ipsj.ixsq.nii.ac.jp/records/91797

Name / File	License
IPSJ-MUS13099002.pdf (357.0 kB)	Copyright (c) 2013 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2013-05-04

Title

Speech synthesis as a statistical machine learning problem (original Japanese title: 統計的機械学習問題としての音声合成)

Language

jpn

Keywords

Subject scheme

Other

Subject

[Invited talk] Speech recognition and synthesis

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

Nagoya Institute of Technology (名古屋工業大学)

Author name

Keiichi Tokuda (徳田恵一)

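Abstract (translated from the Japanese)

Description type

Other

Description

Text-to-speech synthesis (hereafter, speech synthesis), the problem of generating a speech waveform from arbitrary text, has often been treated as a "messy" problem. This talk discusses how the speech synthesis problem can be formulated within the framework of statistical machine learning. The basic problem of speech synthesis can be stated as follows: there is a speech database, that is, a set of pairs of speech waveforms and their corresponding texts. Given an arbitrary text separate from these, find the corresponding speech waveform. Speech synthesis systems to date have been realized by decomposing the problem into tractable subproblems such as text analysis, acoustic modeling, and speech feature extraction/waveform generation. One of these subproblems is "statistical parametric speech synthesis", which is called "HMM-based speech synthesis" when hidden Markov models are used as the statistical model. By integrating these subproblems and viewing them as a single probabilistic model, it becomes possible to solve the basic problem above directly. The talk also discusses future prospects for speech synthesis research based on the statistical approach.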

Abstract (English)

Description type

Other

Description

Text-to-speech synthesis, a technique for generating natural-sounding artificial speech from a given text, is often regarded as a messy problem. In this talk, I discuss how the problem of speech synthesis can be formulated in a unified statistical machine learning framework. The basic problem of speech synthesis can be stated as follows: we have a speech database, i.e., a set of speech waveforms and their corresponding texts. Given a text to be synthesized, what is the speech waveform corresponding to that text? Conventionally, text-to-speech systems have been realized by decomposing the problem into subproblems such as text processing, acoustic modeling, and speech feature extraction/waveform reconstruction. One of these subproblems is statistical parametric speech synthesis, which is called "HMM-based speech synthesis" when hidden Markov models (HMMs) are used as the statistical models. By combining all these processes and regarding them as one probabilistic model, we can understand the whole speech synthesis process in a unified statistical framework. I also discuss future challenges and directions in speech synthesis research.
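
The basic problem stated in the abstract can be written compactly as a predictive distribution. The sketch below is one possible reading, not notation taken from this record: the symbols $\mathcal{X}$ (database waveforms), $\mathcal{W}$ (database texts), $w$ (text to synthesize), $x$ (waveform to generate), $l$ (linguistic labels), $o$ (acoustic features), and $\lambda$ (model parameters) are all assumptions introduced for illustration, as are the conditional independence assumptions noted in the comments.

```latex
% Basic problem: generate x for a new text w, given the database (X, W).
\begin{align}
  x &\sim p(x \mid w, \mathcal{X}, \mathcal{W}) \\
  % Assumption: given the parameters \lambda and the text w,
  % x is independent of the database.
  p(x \mid w, \mathcal{X}, \mathcal{W})
    &= \int p(x \mid w, \lambda)\, p(\lambda \mid \mathcal{X}, \mathcal{W})\, d\lambda \\
  % Assumption: x depends on w only through labels l and features o.
  p(x \mid w, \lambda)
    &= \sum_{l} \int p(x \mid o)\, p(o \mid l, \lambda)\, P(l \mid w)\, do
\end{align}

% A conventional pipeline replaces each sum or integral with a point estimate:
\begin{align}
  \hat{\lambda} &= \arg\max_{\lambda}\, p(\lambda \mid \mathcal{X}, \mathcal{W})
    && \text{(training)} \\
  \hat{l} &= \arg\max_{l}\, P(l \mid w)
    && \text{(text analysis)} \\
  \hat{o} &= \arg\max_{o}\, p(o \mid \hat{l}, \hat{\lambda})
    && \text{(acoustic modeling / parameter generation)} \\
  \hat{x} &\sim p(x \mid \hat{o})
    && \text{(waveform reconstruction)}
\end{align}
```

Read this way, each subproblem named in the abstract corresponds to one factor of the integrand, and treating the factors as a single probabilistic model amounts to working with the first equation rather than with the chain of point estimates.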

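Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10438388

Bibliographic information

研究報告音楽情報科学（MUS） (SIG Technical Reports: Music Information Science (MUS))

Vol. 2013-MUS-99, No. 2, p. 1-1, issued 2013-05-04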

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan (情報処理学会)

Versions

Ver.1

2025-01-21 15:15:06.182923
