Item type: SIG Technical Reports(1)
Publication date: 2017-12-14
Title: Analyzing the impact of including listener perception annotations in RNN-based emotional speech synthesis
Language: en
Keywords (subject scheme: Other): Poster session
Resource type: technical report (identifier: http://purl.org/coar/resource_type/c_18gh)
Author affiliations:
National Institute of Informatics
National Institute of Informatics
National Institute of Informatics
National Institute of Informatics / The University of Edinburgh
Authors:
Jaime Lorenzo-Trueba
Gustav Eje Henter
Shinji Takaki
Junichi Yamagishi
Abstract (description type: Other):
This paper investigates simultaneous modeling of multiple emotions in DNN-based expressive speech synthesis, and how to represent emotional labels, such as emotional class and strength, for this task. Our goal is to answer two questions: First, what is the best way to annotate speech data with multiple emotions? Second, how should the emotional information be represented as labels for supervised DNN training? We evaluate on a large-scale corpus of emotional speech from a professional actress, additionally annotated with perceived emotional labels from crowd-sourced listeners. By comparing DNN-based speech synthesizers that use different emotional representations, we assess the impact of these representations and design decisions on human emotion recognition rates.
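The abstract leaves the exact label representation unspecified. As a purely illustrative sketch (not the paper's actual encoding), one common way to condition a DNN/RNN acoustic model on emotion is to append a one-hot emotion class plus a scalar strength score to each frame's linguistic features; the emotion set and the [0, 1] strength scale below are assumptions.

```python
# Hypothetical sketch: encoding an emotional class plus a perceived-strength
# score as a conditioning vector appended to per-frame linguistic features.
# The label set and scale are illustrative assumptions, not the paper's design.

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # assumed label set

def emotion_label_vector(emotion: str, strength: float) -> list:
    """One-hot emotion class concatenated with a scalar strength in [0, 1]."""
    if emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {emotion}")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    one_hot = [1.0 if e == emotion else 0.0 for e in EMOTIONS]
    return one_hot + [strength]

def condition_frames(linguistic_frames, emotion, strength):
    """Append the same utterance-level emotion vector to every frame."""
    label = emotion_label_vector(emotion, strength)
    return [list(frame) + label for frame in linguistic_frames]

frames = [[0.2, 0.5], [0.1, 0.9]]  # toy per-frame linguistic features
conditioned = condition_frames(frames, "happy", 0.8)
# each conditioned frame gains 5 extra dims: 4 one-hot + 1 strength
```

Whether the strength score comes from the actor's intended intensity or from aggregated listener ratings is exactly the kind of annotation choice the paper compares.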
Journal record ID (NCID): AN10442647
Bibliographic information: IPSJ SIG Technical Report on Spoken Language Processing (SLP), Vol. 2017-SLP-119, No. 8, pp. 1-2, published 2017-12-14
ISSN: 2188-8663
Notice: SIG Technical Reports are non-refereed and may therefore later be published in journals, at conferences, symposia, etc.
Publisher (language: ja): Information Processing Society of Japan (情報処理学会)