LSTM-based linear prediction filter estimation for reverberant speech recognition

Yusuke Kida; Toru Taniguchi

https://ipsj.ixsq.nii.ac.jp/records/176411

Name / File	License	Action
IPSJ-SLP16114025.pdf (582.8 kB)	Copyright (c) 2016 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2016-12-13

Title (original Japanese)

LSTMを用いた線形予測フィルタの推定に基づく残響下音声認識

Title (English)

LSTM-based linear prediction filter estimation for reverberant speech recognition

Language

jpn

Keywords

Subject scheme

Other

Subject

Speech analysis, feature extraction

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

Corporate Research & Development Center, Toshiba Corporation

Authors

Yusuke Kida (木田 祐介)
Toru Taniguchi (谷口 徹)

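Abstract (translated from the Japanese)

Description type

Other

Description

DNNs (Deep Neural Networks) have dramatically improved speech recognition accuracy, but recognizing distant speech uttered far from the microphone remains a major challenge. The main causes of degraded accuracy are known to be the drop in SNR (Signal-to-Noise Ratio) due to sound-pressure attenuation and the reverberation caused by sound reflecting off floors, walls, and ceilings, and various countermeasures have been proposed. This report proposes a new DNN-based dereverberation technique. Unlike conventional approaches, which train a DNN to learn a mapping directly from reverberation-distorted features to clean features, the proposed method trains a DNN that estimates the coefficients of a linear prediction filter and suppresses reverberation using the filter output by that DNN. To model reverberation accurately, the network is built with LSTM (Long Short-Term Memory), which is well suited to modeling long time-series patterns. Evaluated on the single-microphone task of the REVERB challenge, an international competition held in 2014, the proposed method reduced the word error rate on real recordings from 29.7% to 25.3% while keeping the processing latency to 10 ms.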

Abstract (English)

Description type

Other

Description

The performance of automatic speech recognition (ASR) systems has been drastically improved by DNNs (Deep Neural Networks). However, distant ASR remains a challenging problem. Its difficulty is caused mainly by two factors: the decrease in SNR (Signal-to-Noise Ratio) due to sound attenuation, and the reverberation created when sound reflects off walls, floors, and ceilings. In this paper, we propose a novel DNN-based dereverberation method. Unlike conventional DNN-based approaches, which directly train mapping functions from corrupted features to clean features, the proposed method trains a DNN that estimates the coefficients of a linear prediction filter and then dereverberates using the filter output by the trained DNN. To model reverberation accurately, the proposed method uses LSTM (Long Short-Term Memory), which is appropriate for modeling time-series patterns. Experiments were performed on the REVERB challenge task, an international competition held in 2014. The proposed method reduced the WER (Word Error Rate) from 29.7% to 25.3% with a short latency of 10 ms.
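
The filtering step described in the abstract can be illustrated with a minimal sketch. It assumes a delayed linear prediction filter applied to a single one-dimensional sequence. The function name `apply_linear_prediction_filter`, the `delay` parameter, the filter order, and the toy signal are all hypothetical and are not taken from the report. In the proposed method the coefficients would be estimated by the LSTM, whereas here they are simply supplied by hand.

```python
import numpy as np


def apply_linear_prediction_filter(x, coeffs, delay=1):
    """Subtract a linear prediction of x[t] built from past samples.

    x      : 1-D reverberant sequence (the signal domain is an assumption).
    coeffs : 1-D array of filter taps; given here, not estimated by an LSTM.
    delay  : hypothetical prediction delay (>= 1), in samples or frames.
    """
    x = np.asarray(x, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    y = x.copy()
    for t in range(len(x)):
        for k, g in enumerate(coeffs):
            past = t - delay - k
            if past >= 0:
                # Remove the part of x[t] predicted from x[t - delay - k].
                y[t] -= g * x[past]
    return y


# Toy example: an impulse smeared by a first-order echo with gain 0.5.
clean = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
reverberant = np.zeros_like(clean)
for t in range(len(clean)):
    echo = 0.5 * reverberant[t - 1] if t > 0 else 0.0
    reverberant[t] = clean[t] + echo

dereverberated = apply_linear_prediction_filter(reverberant, coeffs=[0.5], delay=1)
```

In this toy case the single matching tap removes the echo exactly. Real reverberation would call for longer filters whose coefficients are estimated from the observed signal, which is the role the abstract assigns to the LSTM.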

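Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10442647

Bibliographic information

IPSJ SIG Technical Reports, Spoken Language Processing (SLP)

Vol. 2016-SLP-114, No. 25, pp. 1-6, issued 2016-12-13

ISSN

Source identifier type

ISSN

Source identifier

2188-8663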

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan
