Voice Conversion by Projection onto Speaker-Space Bases Using Deep Neural Networks

橋本, 哲弥; 柏木, 陽佑; 齋藤, 大輔; 峯松, 信明; Tetsuya, Hashimoto; Yousuke, Kashiwagi; Daisuke, Saito; Nobuaki, Minematsu

https://ipsj.ixsq.nii.ac.jp/records/146173

Name / File	License	Action
IPSJ-SLP15109001.pdf (415.7 kB)	Copyright (c) 2015 by the Institute of Electronics, Information and Communication Engineers This SIG report is only available to those in membership of the SIG.
SLP: members ¥0, DLIB: members ¥0

Item type

SIG Technical Reports(1)

Publication date

2015-11-25

Title

Voice Conversion by Projection onto Speaker-Space Bases Using Deep Neural Networks

Title (English)

Voice Conversion based on Projection to Speaker Space Bases constructed by Deep Neural Network

Language

jpn

Keywords

Subject scheme

Other

Subject

Voice conversion

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

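Author affiliation

Graduate School of Engineering, The University of Tokyo

Author affiliation

Graduate School of Engineering, The University of Tokyo

Author affiliation

Graduate School of Information Science and Technology, The University of Tokyo

Author affiliation

Graduate School of Engineering, The University of Tokyo

Author affiliation (English)

Grad. School of Engineering, The Univ. of Tokyo

Author affiliation (English)

Grad. School of Engineering, The Univ. of Tokyo

Author affiliation (English)

Grad. School of Information Science and Technology, The Univ. of Tokyo

Author affiliation (English)

Grad. School of Engineering, The Univ. of Tokyo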

Author names

橋本, 哲弥
柏木, 陽佑
齋藤, 大輔
峯松, 信明

Author names (English)

Tetsuya, Hashimoto
Yousuke, Kashiwagi
Daisuke, Saito
Nobuaki, Minematsu

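Abstract

Description type

Other

Description

Aiming at flexible voice conversion in which arbitrary speakers can serve as both input and output, this study proposes a conversion method that combines Deep Neural Networks (DNNs) with the Eigenvoice GMM (EVGMM) framework. First, an EVGMM is trained on a large-scale multi-speaker corpus to obtain a set of basis vectors spanning the speaker space of the GMM. In an EVGMM, the GMM mean vectors of a target speaker are determined by multiplying this set of basis vectors by a weight vector specific to that speaker. In the proposed method, 1-of-K vectors are used as the weight vectors, which approximates the GMMs of the basis speakers that span the speaker space. With these approximated GMMs, the features of each speaker in the large-scale corpus can be decomposed into the features of the basis speakers. Using these, DNNs are trained separately for two conversions: "from a given speaker's features to the basis speakers' features" and "from the basis speakers' features to the target speaker's features." In an objective evaluation of conversion accuracy for unseen speakers as a function of the amount of adaptation data, the proposed method achieved higher accuracy than the conventional EVGMM.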
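The EVGMM step described in this abstract (a speaker's GMM mean vectors obtained by weighting a set of speaker-space basis vectors, with 1-of-K weights picking out individual basis speakers) can be illustrated with a small NumPy sketch. It assumes the common EVGMM parameterization in which each mixture's target mean is a bias vector plus a basis matrix times the speaker's weight vector; the bias term, the function names, the array shapes, and the toy sizes are assumptions for illustration, not details taken from the report.

```python
import numpy as np

def evgmm_target_means(bias, bases, weight):
    """Return the target-side GMM mean vectors for one speaker.

    Assumed EVGMM form: mu_m = B_m @ w + b_m for every mixture m.
    bias:   (M, D)    speaker-independent bias vectors b_m
    bases:  (M, D, K) per-mixture basis matrices B_m (K basis vectors each)
    weight: (K,)      speaker-specific weight vector w
    """
    return np.einsum("mdk,k->md", bases, weight) + bias

def one_of_k(k, num_bases):
    """Hypothetical helper: 1-of-K weight vector selecting the k-th basis."""
    w = np.zeros(num_bases)
    w[k] = 1.0
    return w

rng = np.random.default_rng(0)
M, D, K = 4, 3, 5  # mixtures, feature dimension, number of bases (toy sizes)
bias = rng.normal(size=(M, D))
bases = rng.normal(size=(M, D, K))

# Ordinary EVGMM: an arbitrary speaker is a continuous weight vector.
mu_speaker = evgmm_target_means(bias, bases, rng.normal(size=K))

# 1-of-K weight: the GMM of the k-th "basis speaker", whose means are
# simply the bias plus the k-th basis vector of each mixture.
mu_basis_2 = evgmm_target_means(bias, bases, one_of_k(2, K))
assert np.allclose(mu_basis_2, bias + bases[:, :, 2])
```

With a continuous weight vector the sketch yields an arbitrary speaker's means; with a 1-of-K vector it reduces to the bias plus a single basis vector, which is the sense in which each basis can be treated as a "basis speaker".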

Abstract (English)

Description type

Other

Description

This paper describes a novel approach to constructing a voice conversion (VC) system using deep neural networks (DNNs) and Eigenvoice GMMs (EVGMMs), with the final goal of realizing conversion to arbitrary speakers. First, we train an EVGMM with multiple parallel datasets consisting of utterance pairs of a single speaker (the reference speaker) and many pre-stored speakers, and construct the bases of a speaker space from GMM supervectors. In the proposed method, 1-of-K coding is used in place of the speaker-dependent weight parameters of the EVGMM in order to decompose input features into components corresponding to each basis of the speaker space. Conversion then proceeds in two steps: input features are converted into their components in each basis of the speaker space, and these components are converted into the target speaker's features. Both steps are implemented with DNNs. An objective evaluation demonstrates that the proposed architecture improves the performance of target-speaker-open voice conversion (conversion to speakers not seen in training) over the conventional EVGMM.
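The abstracts do not say exactly how input features are split into per-basis components. One plausible reading, sketched here purely as an assumption, is to convert each input frame with the GMM of every basis speaker using the standard joint-density GMM minimum mean-square-error (MMSE) mapping, where mixture weights, source-side means, and covariances are shared across basis speakers and only the target means change. All function and parameter names are hypothetical.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log density of a multivariate normal N(x; mean, cov)."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))

def mmse_convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """MMSE conversion of one frame x with a joint-density GMM (assumed form).

    weights: (M,)   mu_x: (M, Dx)   mu_y: (M, Dy)
    cov_xx:  (M, Dx, Dx)            cov_yx: (M, Dy, Dx)
    """
    log_post = np.array([np.log(weights[m]) + log_gauss(x, mu_x[m], cov_xx[m])
                         for m in range(len(weights))])
    post = np.exp(log_post - log_post.max())
    post /= post.sum()  # mixture posteriors P(m | x)
    y = np.zeros(mu_y.shape[1])
    for m in range(len(weights)):
        # Conditional mean of y given x under mixture m, weighted by P(m | x).
        y += post[m] * (mu_y[m] + cov_yx[m] @ np.linalg.solve(cov_xx[m], x - mu_x[m]))
    return y

def decompose(x, weights, mu_x, basis_mu_y, cov_xx, cov_yx):
    """Convert x with each basis-speaker GMM; returns a (K, Dy) array.

    basis_mu_y: (K, M, Dy) target means of the K basis-speaker GMMs
    (e.g. bias plus the k-th basis vector, as obtained from 1-of-K weights).
    """
    return np.stack([mmse_convert(x, weights, mu_x, mu_y_k, cov_xx, cov_yx)
                     for mu_y_k in basis_mu_y])

rng = np.random.default_rng(1)
M, D, K = 2, 3, 4  # mixtures, feature dimension, number of basis speakers (toy)
weights = np.array([0.5, 0.5])
mu_x = rng.normal(size=(M, D))
cov_xx = np.stack([np.eye(D)] * M)
cov_yx = 0.1 * np.stack([np.eye(D)] * M)
basis_mu_y = rng.normal(size=(K, M, D))
components = decompose(rng.normal(size=D), weights, mu_x, basis_mu_y, cov_xx, cov_yx)
print(components.shape)  # (4, 3)
```

The resulting (K, Dy) array is the kind of per-basis target a first DNN could be trained to predict from x; the sketch does not model the DNNs themselves.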

Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10442647

Bibliographic information

IPSJ SIG Technical Report: Spoken Language Processing (SLP)

Vol. 2015-SLP-109, No. 1, pp. 1-6, issued 2015-11-25

ISSN

Source identifier type

ISSN

Source identifier

2188-8663

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan
