Item type |
SIG Technical Reports(1) |
Publication date |
2018-02-13 |
Title |
Recent Advances in our Neural Parametric Singing Synthesizer |
Language |
eng |
Resource type identifier |
http://purl.org/coar/resource_type/c_18gh |
Resource type |
technical report |
Author affiliation |
Universitat Pompeu Fabra |
Author affiliation |
Universitat Pompeu Fabra |
Authors |
Jordi Bonada
Merlijn Blaauw
|
Abstract |
We recently presented a new model for singing synthesis based on a modified version of the WaveNet architecture. Instead of modeling the raw waveform, we model features produced by a parametric vocoder that separates the influence of pitch and timbre. This makes it convenient to modify pitch to match any target melody, facilitates training on more modest dataset sizes, and significantly reduces training and generation times. Nonetheless, compared to modeling the waveform directly, effectively handling higher-dimensional outputs, multiple feature streams, and regularization becomes more important with our approach. We include additional components for predicting F0 and phonetic timings from a musical score with lyrics. These expression-related features are learned together with timbral features from a single set of natural songs. Here we describe our recent advances on multi-singer and multiple-voice-quality models. |
Bibliographic record ID |
Source identifier type |
NCID |
Source identifier |
AN10442647 |
Bibliographic information |
研究報告音声言語情報処理(SLP)
Vol. 2018-SLP-120,
No. 4,
pp. 1-2,
Issue date 2018-02-13
|
ISSN |
2188-8663 |
Notice |
SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc. |
Publisher |
情報処理学会 (Information Processing Society of Japan) |