Voice Activity Detection Using DNN Adaptation with Auxiliary Speech Features (補助音声特徴量によるDNN適応を用いた音声区間検出)

太刀岡, 勇気; Yuuki, Tachioka

https://ipsj.ixsq.nii.ac.jp/records/195512

Name / File	License
IPSJ-JNL6004016.pdf (714.8 kB)	Copyright (c) 2019 by the Information Processing Society of Japan
Open access

Item type

Journal(1)

Publication date

2019-04-15

Title

補助音声特徴量によるDNN適応を用いた音声区間検出

Title (English)

Voice Activity Detection Using DNN Adaptation with Auxiliary Speech Features

Language

jpn

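Keywords

Subject scheme

Other

Subject

[Regular paper] DNN-based voice activity detection, non-negative matrix factorization, speech recognition, auxiliary features

Resource type

Resource type identifier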

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Author affiliation

株式会社デンソーアイティーラボラトリ

Author affiliation (English)

Denso IT Laboratory

Author name

太刀岡, 勇気

Author name (English)

Yuuki, Tachioka

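Abstract (translated from the Japanese)

Description type

Other

Description

Voice activity detection is an essential pre-processing step when performing speech recognition in noisy environments. Power-based methods are commonly used for voice activity detection. However, because their performance degrades markedly under high noise, methods that take the shape of the spectrum into account have been proposed in recent years. In particular, methods based on deep neural networks (DNNs) are known to perform well. In the fields of speech recognition and speech enhancement, auxiliary features are used to adapt a DNN to the target environment and thereby improve its performance. To further improve the performance of DNN-based voice activity detection, this paper proposes two features based on speech modeling, together with their combination. The first uses the activations of non-negative matrix factorization; the second uses the acoustic scores of a speech recognition acoustic model. Experiments on voice activity detection in noise showed that the DNN-based method outperforms the conventional method, and that the two auxiliary features are effective in terms of both frame-level voice activity detection accuracy and speech recognition word accuracy.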

Abstract (English)

Description type

Other

Description

Voice activity detection (VAD) is an essential pre-processing step for automatic speech recognition (ASR) in noisy environments. Power-based methods are widely used; however, because these methods are susceptible to noise, methods that consider the shape of the spectrum have recently been proposed. In particular, deep neural network (DNN) based methods have outperformed previous methods. In the fields of ASR and speech enhancement, auxiliary features are used to adapt DNNs to a target environment and thereby improve their performance. To further improve the performance of DNN-based VAD, this paper proposes two types of auxiliary features based on speech modeling, together with their combination. The first is the activations of non-negative matrix factorization, and the second is the acoustic scores of ASR acoustic models. Experimental results on noisy VAD tasks demonstrated that DNN-based methods outperformed one of the most effective conventional methods and that both auxiliary features improved performance in terms of both frame-level VAD accuracy and ASR word accuracy.
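The first auxiliary feature named in the abstract, non-negative matrix factorization (NMF) activations appended to the DNN input, might be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the record does not specify the NMF cost function, the basis training, or the feature dimensions, so the Euclidean multiplicative update, the fixed pre-trained basis `W`, and the function names `nmf_activations` and `augment_with_activations` are all hypothetical choices made here.

```python
import numpy as np


def nmf_activations(V, W, n_iter=100, eps=1e-12, seed=0):
    """Estimate non-negative activations H such that V ~= W @ H.

    V: (n_bins, n_frames) non-negative spectrogram.
    W: (n_bins, n_bases) non-negative basis, assumed pre-trained and kept fixed.
    Uses the multiplicative update for the Euclidean cost (an assumption;
    the paper may use a different divergence).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # positive initial guess
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        # Element-wise update keeps H non-negative.
        H *= WtV / (WtW @ H + eps)
    return H


def augment_with_activations(V, W, n_iter=100):
    """Stack per-frame NMF activations under the spectral features.

    The result, of shape (n_bins + n_bases, n_frames), is what a VAD DNN
    would receive as input in this sketch.
    """
    H = nmf_activations(V, W, n_iter=n_iter)
    return np.concatenate([V, H], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.random((8, 3))          # toy basis: 8 frequency bins, 3 bases
    V = W @ rng.random((3, 5))      # toy spectrogram: 5 frames
    X = augment_with_activations(V, W)
    print(X.shape)                  # (11, 5)
```

The second auxiliary feature (ASR acoustic scores) could be appended in the same way by concatenating one more block of per-frame values along the feature axis.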

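Bibliographic record ID

Source identifier type

NCID

Source identifier

AN00116647

Bibliographic information

IPSJ Journal (情報処理学会論文誌)

Vol. 60, No. 4, pp. 1162-1170, issued 2019-04-15

ISSN

Source identifier type

ISSN

Source identifier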

1882-7764
