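Language Model replacement method for end-to-end speech recognition which excludes implicit linguistic information (End-to-end音声認識モデルにおける暗黙的言語情報の置換法)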

森, 大輝; 太田, 健吾; 西村, 良太; 小川, 厚徳; 北岡, 教英; Daiki, Mori; Kengo, Ohta; Ryota, Nishimura; Atsunori, Ogawa; Norihide, Kitaoka

https://ipsj.ixsq.nii.ac.jp/records/211592

Name / File	License	Action
IPSJ-SLP21137017.pdf (835.3 kB)	Copyright (c) 2021 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2021-06-11

Title

End-to-end音声認識モデルにおける暗黙的言語情報の置換法

Title (English)

Language Model replacement method for end-to-end speech recognition which excludes implicit linguistic information

Language

jpn

Keywords

Subject scheme

Other

Subject

General presentation

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

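Author affiliations

Toyohashi University of Technology
National Institute of Technology, Anan College
Tokushima University
Nippon Telegraph and Telephone Corporation
Toyohashi University of Technology

Author names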

森, 大輝
太田, 健吾
西村, 良太
小川, 厚徳
北岡, 教英

Author names (English)

Daiki, Mori
Kengo, Ohta
Ryota, Nishimura
Atsunori, Ogawa
Norihide, Kitaoka

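Abstract

Description type

Other

Description

In recent years, end-to-end speech recognition has attracted attention because it is faster and simpler than conventional DNN-HMM speech recognition. It has also been reported that recognition accuracy improves when it is combined with a language model trained on a large amount of text data. In this paper, we propose Language Model Replacement, a new language model integration method that builds on Shallow Fusion, the commonly used method for integrating a speech recognition model with a language model. The proposed method uses a pre-trained speech recognition model and a pre-trained language model. Based on Bayes' rule, it makes it possible to replace the linguistic information implicitly contained in the speech recognition model. In our experiments, the linguistic information inside a speech recognition model trained on academic presentation speech data was replaced by a language model trained on simulated public speech text data. In the simulated public speech domain, the CER of the proposed method was 1.3 points better than that of Shallow Fusion.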
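
As a rough illustration of the idea in the abstract, the sketch below contrasts Shallow Fusion with one possible Bayes-rule reading of Language Model Replacement. The abstract does not give the scoring formula, so the subtraction of a source-domain language model score, the weights, the function names, and the log-probabilities are all assumptions made for illustration, not the authors' implementation. The reasoning assumed here is that p(y|x) is proportional to p(x|y) p_src(y), so dividing by p_src(y) and multiplying by p_tgt(y) swaps the implicit source-domain prior for a target-domain one.

```python
# Hypothetical sketch: every name, weight, and number here is illustrative.

def shallow_fusion_score(h, lm_weight=0.3):
    """Shallow Fusion: ASR log-probability plus a weighted external LM term."""
    return h["asr"] + lm_weight * h["tgt_lm"]


def lm_replacement_score(h, src_weight=0.3, tgt_weight=0.3):
    """Assumed replacement rule in the log domain.

    Subtract a source-domain LM score (a stand-in for the linguistic
    information implicit in the ASR model) and add the target-domain LM score.
    """
    return h["asr"] - src_weight * h["src_lm"] + tgt_weight * h["tgt_lm"]


def pick_best(hypotheses, score_fn):
    """Return the name of the hypothesis with the highest score."""
    return max(hypotheses, key=lambda name: score_fn(hypotheses[name]))


# Made-up log-probabilities for two competing hypotheses.
# hyp_a is favoured by the source-domain LM, hyp_b by the target-domain LM.
HYPOTHESES = {
    "hyp_a": {"asr": -3.0, "src_lm": -1.0, "tgt_lm": -5.0},
    "hyp_b": {"asr": -4.0, "src_lm": -6.0, "tgt_lm": -3.0},
}

if __name__ == "__main__":
    print("Shallow Fusion picks:", pick_best(HYPOTHESES, shallow_fusion_score))
    print("LM replacement picks:", pick_best(HYPOTHESES, lm_replacement_score))
```

With these toy numbers, Shallow Fusion keeps the hypothesis that the source-domain prior favours, while the assumed replacement rule switches to the one preferred by the target-domain language model. In practice the weights would be tuned on held-out data.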

Abstract (English)

Description type

Other

Description

Recently, end-to-end speech recognition has attracted much attention because it is faster and simpler than conventional DNN-HMM speech recognition. It has also been reported that recognition performance improves when a language model trained on a large amount of text data is employed. Based on these observations, we propose a new language model integration method, which we call Language Model Replacement. The proposed method uses a pre-trained automatic speech recognition (ASR) model and a pre-trained language model. In contrast to the Shallow Fusion method, the proposed method can, based on Bayes' rule, replace the linguistic information implicit in the ASR model with that of an independently trained language model. In our experiments, the linguistic information that the ASR model learned implicitly from the Japanese-language Academic Presentation Speech corpus is replaced with a language model trained on the Japanese-language Simulated Public Speech corpus. We then compare ASR performance on Japanese speech recognition tasks using the Character Error Rate (CER). The proposed Language Model Replacement method achieved a CER 1.3 points lower than the Shallow Fusion method.
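
Since the comparison is reported in terms of CER, a minimal sketch of how a character error rate is commonly computed may help: the character-level Levenshtein distance between reference and hypothesis, divided by the reference length. The function names and the example strings are illustrative assumptions, and the report's actual scoring tool and text normalization are not stated in this record.

```python
def edit_distance(ref, hyp):
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of four (illustrative strings).
    print(cer("音声認識", "音声認織"))
```

Working at the character level avoids the need for word segmentation, which is one reason CER is a common metric for Japanese speech recognition.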

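Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10442647

Bibliographic information

研究報告音声言語情報処理（SLP） (IPSJ SIG Technical Reports: Spoken Language Processing)

Vol. 2021-SLP-137, No. 17, pp. 1-6, issued 2021-06-11

ISSN

Source identifier type

ISSN

Source identifier

2188-8663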

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan (情報処理学会)

Versions

Ver.1

2025-01-19 17:44:50.103151
