Data augmentation using non-learning-based bandwidth extension for automatic speaker verification based on deep-learning

宮本, 春奈; 塩田, さやか; 貴家, 仁志

https://ipsj.ixsq.nii.ac.jp/records/202989

Name / File	License	Action
IPSJ-SLP20131005.pdf (863.9 kB)	Copyright (c) 2020 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2020-02-06

Title (Japanese original)

深層学習に基づく話者照合システムのための非学習型帯域拡張法を用いたデータ拡張

Title (English)

Data augmentation using non-learning-based bandwidth extension for automatic speaker verification based on deep-learning

Language

jpn

Keywords

Subject scheme

Other

Subject

General presentations 1

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

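Author affiliation

Tokyo Metropolitan University

Author affiliation

Tokyo Metropolitan University

Author affiliation

Tokyo Metropolitan University

Author names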

宮本, 春奈
塩田, さやか
貴家, 仁志

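Abstract

Description type

Other

Description

In this paper, we propose data augmentation for deep-learning-based speaker verification systems using wideband (WB) speech generated by a non-learning-based bandwidth extension method. Training a speaker verification system based on x-vectors, one of the approaches that use deep neural networks (DNNs), requires a large amount of data. The National Institute of Standards and Technology (NIST) provides many narrowband (NB) speech databases for speaker verification, but few WB speech databases are publicly available. Methods that improve the performance of x-vector-based speaker verification systems by mixing noise-added or bandwidth-extended data into model training have been reported, including data augmentation with DNN-based bandwidth extension. However, the high-band portion generated by the DNN-based bandwidth extension method carries little information, and although the method requires a large amount of training data, its quality differed little from that of non-learning-based bandwidth extension methods. The authors have previously reported that applying non-learning-based bandwidth extension to NB speech is effective for machine learning. In this paper, we therefore evaluate the performance of an x-vector-based speaker verification system when speech obtained by applying non-learning-based bandwidth extension to NB speech data is used as augmentation data. The experimental results show that the system with data augmentation achieved an error reduction rate of 22.7% compared with the system without data augmentation.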
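The record does not state which error metric the 22.7% figure refers to or how it is computed. As a reading aid only, and assuming the usual relative definition (the symbols $E_{\mathrm{base}}$ and $E_{\mathrm{aug}}$ are introduced here for illustration and do not appear in the record), an error reduction rate can be written as:

```latex
\[
  \mathrm{ERR} \;=\; \frac{E_{\mathrm{base}} - E_{\mathrm{aug}}}{E_{\mathrm{base}}} \times 100\,\%,
  \qquad
  \mathrm{ERR} = 22.7\,\% \;\Longrightarrow\; E_{\mathrm{aug}} = (1 - 0.227)\,E_{\mathrm{base}} = 0.773\,E_{\mathrm{base}} .
\]
```

Under that assumption, $E_{\mathrm{base}}$ is the error of the system trained without augmentation and $E_{\mathrm{aug}}$ is the error of the system trained with the bandwidth-extended data.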

Abstract (English)

Description type

Other

Description

In this research, we propose a data augmentation scheme that uses wideband (WB) speech generated by non-learning-based bandwidth extension (BWE) methods for deep-learning-based automatic speaker verification (ASV). Deep neural network (DNN)-based ASV systems require a large amount of training data. The National Institute of Standards and Technology (NIST) provides many narrowband (NB) speech databases; however, only a few WB speech databases are available for ASV. Several methods that apply data augmentation by adding noise or by BWE have been proposed for DNN-based ASV systems, and one of them uses a DNN-based BWE method. However, although the DNN-based BWE method requires a large amount of training data, the quality of the speech it generates is almost the same as that of speech generated by non-learning-based BWE methods. The authors have previously reported that applying non-learning-based BWE methods to NB speech is effective for machine learning systems. Therefore, in this study, we evaluated the performance of an x-vector-based ASV system that adopts non-learning-based BWE methods for data augmentation. Experimental results showed that the proposed system achieved an error reduction of 22.7% compared with our baseline system.
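The abstract does not say which non-learning-based BWE method the authors used. To illustrate what "non-learning-based" can mean, the sketch below uses spectral folding, a classic training-free technique chosen here as a stand-in and not taken from the report. The function name `spectral_folding_bwe` and the parameter `hb_gain` are hypothetical; the code assumes NB speech sampled at some rate that is doubled on output (for example 8 kHz to 16 kHz).

```python
import numpy as np


def spectral_folding_bwe(nb: np.ndarray, hb_gain: float = 0.3) -> np.ndarray:
    """Double the sampling rate of a narrowband signal by spectral folding.

    Hypothetical helper for illustration; not the method from the report.
    """
    nb = np.asarray(nb, dtype=np.float64)
    n = len(nb)

    # Insert a zero after every sample. This doubles the sampling rate and
    # leaves a mirror image of the low band in the new high band.
    up = np.zeros(2 * n)
    up[::2] = nb

    spec = np.fft.rfft(up)   # n + 1 frequency bins for 2n real samples
    half = n // 2            # bin of the original (narrowband) Nyquist frequency

    # Zero insertion halves the amplitude, so scale the low band back by 2.
    spec[: half + 1] *= 2.0
    # Keep the mirrored high band, attenuated by the assumed gain hb_gain.
    spec[half + 1 :] *= 2.0 * hb_gain

    return np.fft.irfft(spec, n=2 * n)


if __name__ == "__main__":
    fs_nb = 8000                              # assumed narrowband sampling rate
    t = np.arange(800) / fs_nb
    tone = np.sin(2 * np.pi * 1000 * t)       # synthetic 1 kHz test tone
    wb = spectral_folding_bwe(tone)
    print(len(tone), "->", len(wb), "samples")
```

In an augmentation setting, each NB utterance would be passed through such a function and the result added to the training pool. How the report combines original and extended data is not described in this record, so that step is left out.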

Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10442647

Bibliographic information

研究報告音声言語情報処理（SLP） (IPSJ SIG Technical Reports, Spoken Language Processing)

Vol. 2020-SLP-131, No. 5, pp. 1-6, issued 2020-02-06

ISSN

Source identifier type

ISSN

Source identifier

2188-8663

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan (情報処理学会)
