Deep Auto-encoder based Low-dimensional Feature Extraction using FFT Spectral Envelopes in Statistical Parametric Speech Synthesis

高木 信二 (Shinji Takaki); 山岸 順一 (Junichi Yamagishi)

https://ipsj.ixsq.nii.ac.jp/records/146190

Name / File	License	Action
IPSJ-SLP15109018.pdf (2.2 MB)	Copyright (c) 2015 by the Institute of Electronics, Information and Communication Engineers. This SIG report is only available to members of the SIG.
SLP: members: ¥0, DLIB: members: ¥0

Item type

SIG Technical Reports(1)

Publication date

2015-11-25

Title

統計的パラメトリック音声合成のための FFT スペクトルからの Deep Auto-encoder に基づく低次元音響特徴量抽出

Title (English)

Deep Auto-encoder based Low-dimensional Feature Extraction using FFT Spectral Envelopes in Statistical Parametric Speech Synthesis

Language

jpn

Keywords

Subject scheme

Other

Subject

Speech synthesis (音声合成)

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

National Institute of Informatics (国立情報学研究所)
National Institute of Informatics (国立情報学研究所)

Author names

高木 信二 (Shinji Takaki)
山岸 順一 (Junichi Yamagishi)

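Abstract (translated from the Japanese)

Description type

Other

Description

In statistical parametric speech synthesis systems, a speech analysis module such as the STRAIGHT spectral analyzer is often used to estimate accurate and stable spectral envelopes, and low-dimensional features extracted from those envelopes are then used to build the acoustic model. However, if the goal of speech synthesis is taken to be the reproduction of the speech waveform rather than the accurate extraction, modeling, and prediction of spectral envelopes, another possible direction is to use the speech waveform itself, or an input closer to the original signal, and to reduce the error with respect to the speech waveform. In this paper, we investigate using a deep auto-encoder to extract low-dimensional acoustic features from FFT spectra, which are closer to the original signal, for statistical parametric speech synthesis. In text-to-speech synthesis experiments, we built and compared seven text-to-speech systems that combine different spectral estimation methods (STRAIGHT, WORLD, FFT), low-dimensional feature extraction methods (mel-cepstral analysis, deep auto-encoder), and acoustic models (HMM, DNN).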

Abstract (English)

Description type

Other

Description

In state-of-the-art statistical parametric speech synthesis systems, a speech analysis module, e.g., STRAIGHT spectral analysis, is generally used to obtain accurate and stable spectral envelopes, and low-dimensional acoustic features extracted from the obtained spectral envelopes are then used for training acoustic models. However, the spectral envelope estimation algorithm used in such a speech analysis module includes various processing steps derived from human knowledge. In this paper, we investigate deep auto-encoder based, non-linear, data-driven, and unsupervised low-dimensional feature extraction using FFT spectral envelopes for statistical parametric speech synthesis. Experimental results show that a text-to-speech synthesis system using deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes is indeed a promising approach.
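
To make the idea in the abstract concrete, the following is a minimal sketch of auto-encoder based low-dimensional feature extraction from FFT spectra. It is not the authors' implementation: the synthetic test signal, the frame length and hop, the layer sizes (129-64-16-64-129), the tanh activations, the learning rate, and all function names are assumptions chosen only for illustration, and the network is trained with plain full-batch gradient descent rather than any procedure described in the report.

```python
import numpy as np

def log_fft_spectra(signal, frame_len=256, hop=128):
    """Frame the signal, apply a Hann window, return log-magnitude FFT spectra."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)

def init_layers(sizes, rng):
    """Small random weights and zero biases for consecutive layer sizes."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    """Return the activations of every layer (tanh hidden layers, linear output)."""
    acts = [x]
    for k, (w, b) in enumerate(layers):
        z = acts[-1] @ w + b
        acts.append(z if k == len(layers) - 1 else np.tanh(z))
    return acts

def train_autoencoder(x, sizes, epochs=300, lr=0.05, seed=0):
    """Full-batch gradient descent on the mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    layers = init_layers(sizes, rng)
    losses = []
    for _ in range(epochs):
        acts = forward(layers, x)
        err = acts[-1] - x
        losses.append(float(np.mean(err ** 2)))
        grad = 2.0 * err / err.size              # d(loss)/d(output)
        for k in reversed(range(len(layers))):
            w, b = layers[k]
            if k != len(layers) - 1:
                grad = grad * (1.0 - acts[k + 1] ** 2)   # tanh derivative
            grad_w = acts[k].T @ grad
            grad_b = grad.sum(axis=0)
            grad = grad @ w.T                    # propagate before updating w
            layers[k] = (w - lr * grad_w, b - lr * grad_b)
    return layers, losses

def encode(layers, x):
    """Bottleneck activations, i.e. the low-dimensional features."""
    return forward(layers, x)[len(layers) // 2]

# Synthetic voiced-like signal (assumption): 10 harmonics with a slowly varying F0.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
f0 = 120.0 + 20.0 * np.sin(2 * np.pi * 2.0 * t)
phase = 2 * np.pi * np.cumsum(f0) / fs
signal = sum(np.sin(h * phase) / h for h in range(1, 11))
signal = signal + 0.01 * rng.normal(size=t.size)

spectra = log_fft_spectra(signal)
# Standardize each frequency bin so the squared error is comparable across bins.
x = (spectra - spectra.mean(axis=0)) / (spectra.std(axis=0) + 1e-8)
layers, losses = train_autoencoder(x, [x.shape[1], 64, 16, 64, x.shape[1]])
features = encode(layers, x)
print(spectra.shape, features.shape)
```

In this sketch each 129-bin log spectrum is compressed to a 16-dimensional bottleneck vector, which plays the role that mel-cepstral coefficients play in a conventional pipeline; the decoder half maps such a vector back to a spectrum.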

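Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10442647

Bibliographic information

研究報告音声言語情報処理（SLP） (SIG Technical Reports: Spoken Language Processing)

Vol. 2015-SLP-109, No. 18, pp. 1-6, issued 2015-11-25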

ISSN

Source identifier type

ISSN

Source identifier

2188-8663

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan (情報処理学会)
