A Vocoder-free Any-to-Many Voice Conversion using Pre-trained vq-wav2vec

越塚, 毅; 大村, 英史; 桂田, 浩一; Takeshi, Koshizuka; Hidefumi, Ohmura; Kouichi, Katsurada

https://ipsj.ixsq.nii.ac.jp/records/209777

Name / File	License
IPSJ-SLP21136039.pdf (1.8 MB)	Copyright (c) 2021 by the Institute of Electronics, Information and Communication Engineers This SIG report is only available to those in membership of the SIG.
SLP: members: ¥0, DLIB: members: ¥0

Item type

SIG Technical Reports(1)

Publication date

2021-02-24

Title

事前学習したvq-wav2vecの音声特徴表現を用いたボコーダフリーのAny-to-Many音声変換

Title (English)

A Vocoder-free Any-to-Many Voice Conversion using Pre-trained vq-wav2vec

Language

jpn

Keyword

Subject scheme

Other

Subject

SP2

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

Faculty of Science and Engineering, Tokyo University of Science (all three authors)

Author names

越塚, 毅
大村, 英史
桂田, 浩一

Author names (English)

Takeshi, Koshizuka
Hidefumi, Ohmura
Kouichi, Katsurada

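Abstract (translated from Japanese)

Description type

Other

Description

Voice conversion is a technique that converts only non-linguistic information, such as speaker identity, while preserving the linguistic information of the input speech. Many systems consist of an encoder that removes speaker identity from the speech and a decoder that adds the information of a different speaker. This report proposes a vocoder-free any-to-many voice conversion model that uses a pre-trained vq-wav2vec as the encoder. In addition to pre-training the encoder, the proposed model also pre-trains a decoder with a structure similar to RNN_MS, which makes voice conversion possible from a small amount of training data. Methods that reduce the amount of training data by pre-training both the encoder and the decoder have already been proposed, but the proposed model differs in that it targets any-to-many voice conversion and pre-trains the decoder on a voice conversion task. An evaluation showed that the model achieves good conversion accuracy. It also confirmed that new target speakers can be added without degrading conversion accuracy for target speakers that have already been trained.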

Abstract (English)

Description type

Other

Description

Voice conversion (VC) is a technique that converts speaker-dependent non-linguistic information into that of another speaker while retaining the linguistic information of the input speech. A typical VC system is composed of two modules: an encoder that removes speaker individuality from the speech, and a decoder that adds another speaker's individuality to the synthesized speech. In this paper, we propose a vocoder-free any-to-many voice conversion model that uses the pre-trained vq-wav2vec as its encoder. By pre-training an RNN_MS-like decoder in addition to the encoder, our model can convert speech using only a small amount of training data. It differs from the previous approach that also pre-trains both the encoder and the decoder in two respects: our target is any-to-many voice conversion, and the decoder is pre-trained on the voice conversion task. The experimental results show good conversion performance. We also confirmed that new target speakers can be added to the system without degrading conversion performance for the already trained target speakers.
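
The encoder/decoder split described in the abstract can be illustrated with a toy sketch. This is not the authors' model: the class names, the two-dimensional "frames", the codebook, and the per-speaker offsets are all hypothetical stand-ins chosen to keep the example in plain Python. It only shows the data flow under those assumptions: a frozen encoder maps frames to discrete codes with no speaker label, a decoder combines those codes with a chosen target speaker, and registering a new target speaker leaves the output for existing speakers unchanged.

```python
# Toy sketch only: every name, size, and value here is a hypothetical stand-in.

def nearest_code(frame, codebook):
    """Return the index of the codebook vector closest to `frame` (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((f - c) ** 2 for f, c in zip(frame, codebook[i])),
    )


class ToyEncoder:
    """Stand-in for a frozen, pre-trained quantizing encoder: frames -> codes."""

    def __init__(self, codebook):
        self.codebook = codebook

    def encode(self, frames):
        # The codes carry no speaker label; any source speaker is accepted.
        return [nearest_code(frame, self.codebook) for frame in frames]


class ToyDecoder:
    """Stand-in for the decoder: (codes, target speaker) -> output values."""

    def __init__(self, code_values):
        self.code_values = code_values  # one scalar per code ("content")
        self.speaker_offsets = {}       # one scalar per target speaker

    def add_speaker(self, name, offset):
        # Adding a speaker only adds a new entry; existing entries are untouched.
        self.speaker_offsets[name] = offset

    def decode(self, codes, speaker):
        offset = self.speaker_offsets[speaker]
        return [self.code_values[c] + offset for c in codes]


encoder = ToyEncoder(codebook=[[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
decoder = ToyDecoder(code_values=[0.1, 0.5, 0.9])
decoder.add_speaker("spk_a", 0.0)
decoder.add_speaker("spk_b", 10.0)

frames = [[0.1, -0.1], [0.9, 1.2], [1.8, 0.1]]  # made-up input frames
codes = encoder.encode(frames)

before = decoder.decode(codes, "spk_a")
decoder.add_speaker("spk_c", 20.0)              # register a new target speaker
after = decoder.decode(codes, "spk_a")
```

In this toy, `before` and `after` are identical by construction because each speaker owns a separate entry. The report states the corresponding property for its model as an experimental finding, and the mechanism it uses may differ from this sketch.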

Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10442647

Bibliographic information

SIG Technical Reports: Spoken Language Processing (SLP)

Vol. 2021-SLP-136, No. 39, pp. 1-6, issued 2021-02-24

ISSN

Source identifier type

ISSN

Source identifier

2188-8663

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

情報処理学会 (Information Processing Society of Japan)
