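Feature-space Adaptation with a Weighted Sum of Multiple Transformation Matrices Based on Regression Tree for Automatic Speech Recognition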

Hiroki Kanagawa; Yuuki Tachioka; Shinji Watanabe; Jun Ishii

https://ipsj.ixsq.nii.ac.jp/records/183616

Name / File	License	Action
IPSJ-JNL5809018.pdf (1.1 MB)	Copyright (c) 2017 by the Information Processing Society of Japan
Open access

Item type

Journal(1)

Publication date

2017-09-15

Title (original, Japanese)

音声認識のための回帰木に基づく複数の変換行列の重み付けによる特徴量空間の適応

Title (English)

Feature-space Adaptation with a Weighted Sum of Multiple Transformation Matrices Based on Regression Tree for Automatic Speech Recognition

Language

jpn

Keywords

Subject scheme

Other

Subject

[Regular Paper] automatic speech recognition, adaptation, feature transformation, deep neural network

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Author affiliations

Information Technology R&D Center, Mitsubishi Electric Corporation
Information Technology R&D Center, Mitsubishi Electric Corporation
Mitsubishi Electric Research Laboratories
Information Technology R&D Center, Mitsubishi Electric Corporation

Author names

Hiroki Kanagawa (金川 裕紀)
Yuuki Tachioka (太刀岡 勇気)
Shinji Watanabe (渡部 晋治)
Jun Ishii (石井 純)

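Abstract (translated from Japanese)

Description type

Other

Description

Adaptation is important in speech recognition. Feature-space adaptation (fMLLR) is realized by multiplying the feature vector sequence by a single transformation matrix, so it can be implemented as feature preprocessing that is independent of decoding. It can therefore be applied to deep neural network (DNN) acoustic models as well as to Gaussian mixture model (GMM) acoustic models. Model-space adaptation, on the other hand, uses multiple transformation matrices based on a regression tree and can adapt more accurately than fMLLR, which uses a single transformation matrix. This approach has two problems, however. First, adaptation and decoding must share the same generative acoustic model (e.g., a GMM), so it cannot be applied to DNN acoustic models. Second, as the number of transformation matrices grows, their estimation tends to overfit. This paper proposes a method that uses first-pass state alignment information to associate multiple transformation matrices with each frame, and then transforms the features with a matrix expressed as a weighted linear sum of those matrices. To address the second problem, it further proposes feature-space structural maximum a posteriori linear regression (fSMAPLR), which estimates the transformation matrices by MAP estimation with structural prior distributions. In experiments, the proposed fSMAPLR outperformed fMLLR.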

Abstract (English)

Description type

Other

Description

In automatic speech recognition, adaptation is important. Feature-space maximum-likelihood linear regression (fMLLR) transforms acoustic features into adapted ones by multiplying them by a single transformation matrix. This property allows efficient adaptation to be performed as preprocessing, independent of the decoding process, so this type of adaptation can be applied to deep neural networks (DNNs). On the other hand, model-space adaptations (i.e., CMLLR) improve on the performance of fMLLR because they can use multiple transformation matrices based on a regression tree. However, model-space adaptations have two problems. First, they cannot be applied to DNNs because adaptation and decoding must share the same generative model, e.g., a Gaussian mixture model (GMM). Second, the transformation matrices tend to be overfitted when the number of transformation matrices is large. This paper proposes to use multiple transformation matrices within a feature-space adaptation framework. The proposed method first estimates multiple transformation matrices in the GMM framework according to the first-pass decoding results and the alignments, and then takes a weighted sum of these matrices to obtain a single feature transformation matrix for each frame. In addition, to address the second problem, we propose feature-space structural maximum a posteriori linear regression (fSMAPLR), which introduces hierarchical prior distributions to regularize the MAP estimation. Experimental results show that the proposed fSMAPLR outperformed fMLLR.
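
The frame-by-frame weighted sum described in the abstract can be illustrated with a minimal sketch. It assumes each transform is an affine matrix W = [A | b] applied to the extended feature vector [x; 1], and that per-frame weights over the regression classes are already available (for example, derived from first-pass state alignments). The abstract does not specify how the weights are computed or normalised, so the function names, the normalisation, and the toy values below are all hypothetical and are not the authors' implementation.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def affine(W: Matrix, x: Sequence[float]) -> Vector:
    """Apply an affine transform W = [A | b] (D x (D+1)) to a D-dim frame x."""
    ext = list(x) + [1.0]  # extended feature vector [x; 1]
    return [sum(w * v for w, v in zip(row, ext)) for row in W]


def weighted_transform(transforms: Sequence[Matrix],
                       weights: Sequence[float]) -> Matrix:
    """Combine per-class transforms into one matrix as a weighted sum.

    Assumption: weights are normalised to sum to 1 before combining.
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    rows, cols = len(transforms[0]), len(transforms[0][0])
    return [
        [sum(g * T[i][j] for g, T in zip(norm, transforms)) for j in range(cols)]
        for i in range(rows)
    ]


def adapt(features: Sequence[Sequence[float]],
          frame_weights: Sequence[Sequence[float]],
          transforms: Sequence[Matrix]) -> List[Vector]:
    """Transform each frame with its own weighted-sum matrix."""
    return [
        affine(weighted_transform(transforms, w), x)
        for x, w in zip(features, frame_weights)
    ]


# Two hypothetical regression classes for 2-dim features:
# an identity transform, and "scale by 2, add a bias of 1".
W_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_scaled = [[2.0, 0.0, 1.0], [0.0, 2.0, 1.0]]

features = [[1.0, 2.0], [3.0, 4.0]]
# Frame 0 uses class 0 only; frame 1 mixes both classes equally.
frame_weights = [[1.0, 0.0], [0.5, 0.5]]

adapted = adapt(features, frame_weights, [W_identity, W_scaled])
print(adapted)  # [[1.0, 2.0], [5.0, 6.5]]
```

Because the combined matrix is applied to the features rather than to the model parameters, the adapted frames can be passed to any acoustic model, which is the property the abstract relies on for DNNs. With a single transform and a weight of 1, the sketch reduces to ordinary single-matrix fMLLR.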

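Bibliographic record ID

Source identifier type

NCID

Source identifier

AN00116647

Bibliographic information

情報処理学会論文誌 (IPSJ Journal)

Vol. 58, No. 9, pp. 1555-1564, issued 2017-09-15

ISSN

Source identifier type

ISSN

Source identifier

1882-7764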
