Emotion manipulation for unit-selection-based speech synthesis using deep neural network

大谷, 大和; 松永, 悟之; 平井, 啓之

https://ipsj.ixsq.nii.ac.jp/records/197892

Name / File	License
IPSJ-SLP19127039.pdf (1.3 MB)	Copyright (c) 2019 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2019-06-15

Title

Emotion manipulation for unit-selection-based speech synthesis using deep neural network

Language

jpn

Keywords

Subject scheme

Other

Subject

Poster Session 1

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

株式会社エーアイ (AI, Inc.)

Author affiliation

株式会社エーアイ (AI, Inc.)

Author affiliation

株式会社エーアイ (AI, Inc.)

Author names

大谷, 大和
松永, 悟之
平井, 啓之

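Abstract (translated from the Japanese)

Description type

Other

Description

This report describes an emotion control method for unit-selection-based (waveform-concatenation) emotional speech synthesis using deep learning. Conventional unit-selection-based emotional speech synthesis has two problems: 1) because mixing at the level of individual units is difficult, intermediate emotional expressions are poorly represented, and 2) because the emotion type of the units is switched according to the input emotion intensity, the change in voice quality with emotion is discontinuous. To solve these problems, the proposed method uses a deep neural network (DNN) to predict the differential spectrum between emotional and neutral speech from the spectral features of neutral speech and the emotion intensity, and convolves this differential with the neutral units to generate emotional units of the desired intensity. In addition, so that the DNN can predict differential spectral features that correspond to the input emotion intensity, differential spectral features corresponding to each emotion intensity are generated by data augmentation and used for training, which embeds the desired control rule in the DNN. Experimental evaluation confirmed that the proposed method achieves smoother emotion control than the conventional method.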

Abstract (English)

Description type

Other

Description

This paper describes a novel emotion manipulation method for unit-selection-based speech synthesis (USS) using a deep neural network (DNN). Our conventional unit-selection-based emotional speech synthesis (USES) has two weaknesses: 1) it is poor at mixed emotional expressions because it is difficult to generate interpolated units, and 2) variations of emotional voice quality are discontinuous because the emotional unit set is switched based on the input emotion intensity. To solve these problems, the proposed method uses the DNN to predict spectral differentials between emotional and neutral speech from input emotional intensities and neutral spectral features. The emotional units are then generated by convolving the neutral units with the predicted spectral differentials. Moreover, in order to generate spectral differentials that correspond to the input emotional intensities, we introduce a data augmentation technique into DNN training. Experimental results show that the proposed method achieves smoother manipulation of emotional intensity than the conventional USES.
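
The abstract gives no implementation details, so the following is only a minimal Python sketch of the two ideas it names: building intensity-dependent training targets by data augmentation, and applying a predicted differential to a neutral unit. Everything specific here is an assumption made for illustration and is not taken from the report: the features are assumed to be per-frame log-magnitude spectra, the target differential is assumed to scale linearly with the emotion intensity, and the function names `augment_differentials` and `apply_differential` are hypothetical. The DNN itself is omitted.

```python
import numpy as np


def augment_differentials(neutral_logspec, emotional_logspec, intensities):
    """Build (input, target) training pairs for several emotion intensities.

    ASSUMPTION (not from the report): the target differential scales
    linearly with the intensity alpha, so alpha = 0 maps to "no change"
    and alpha = 1 maps to the full emotional spectrum.
    Both arrays have shape (frames, bins) and are time-aligned.
    """
    diff = emotional_logspec - neutral_logspec  # full-intensity differential
    inputs, targets = [], []
    for alpha in intensities:
        # Network input: neutral features with the intensity appended
        # as one extra column per frame.
        alpha_col = np.full((neutral_logspec.shape[0], 1), alpha)
        inputs.append(np.hstack([neutral_logspec, alpha_col]))
        # Network target: the differential scaled to this intensity.
        targets.append(alpha * diff)
    return np.vstack(inputs), np.vstack(targets)


def apply_differential(neutral_logspec, predicted_diff):
    """Apply a predicted differential to neutral features.

    Adding log-magnitude spectra multiplies the magnitude spectra, which
    corresponds to filtering (convolving) the neutral waveform in the
    time domain.
    """
    return neutral_logspec + predicted_diff


# Toy data: 2 frames, 2 frequency bins.
neutral = np.array([[0.0, 1.0], [2.0, 3.0]])
emotional = np.array([[1.0, 1.0], [2.0, 5.0]])
X, Y = augment_differentials(neutral, emotional, [0.0, 0.5, 1.0])
print(X.shape, Y.shape)
```

With targets built this way, a regression model trained on `(X, Y)` would see the same neutral frames paired with several intensities, which is one plausible way to make the predicted differential vary smoothly with the intensity input.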

Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10442647

Bibliographic information

IPSJ SIG Technical Reports: Spoken Language Processing (SLP)

Vol. 2019-SLP-127, No. 39, pp. 1-6, issued 2019-06-15

ISSN

Source identifier type

ISSN

Source identifier

2188-8663

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan
