Automatic Lecture Transcription by Exploiting Slide Information for Language Model Adaptation (スライド情報を用いた言語モデル適応による講義音声認識)

Tatsuya Kawahara (河原 達也); Yusuke Nemoto (根本 雄介); Norihiro Katsumaru (勝丸 徳浩); Yuya Akita (秋田 祐哉)

https://ipsj.ixsq.nii.ac.jp/records/9266

Name / File	License
IPSJ-JNL5002004.pdf (322.5 kB)	Copyright (c) 2009 by the Information Processing Society of Japan
Open access

Item type

Journal(1)

Publication date

2009-02-15

Title

スライド情報を用いた言語モデル適応による講義音声認識

Title (English)

Automatic Lecture Transcription by Exploiting Slide Information for Language Model Adaptation

Language

jpn

Keyword

Subject scheme

Other

Subject

Special Issue: Spoken Document Processing

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Other title

Language model, topic estimation

Author affiliation

Graduate School of Informatics, Kyoto University (listed for each of the four authors)

Author names

河原 達也; 根本 雄介; 勝丸 徳浩; 秋田 祐哉

Author names (English)

Tatsuya Kawahara; Yusuke Nemoto; Norihiro Katsumaru; Yuya Akita

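Abstract

Description type

Other

Description

We propose a method that improves speech recognition accuracy by dynamically adapting the language model using information from the slides used in lectures at universities and similar settings. First, using the text of all slides of the lecture, the N-gram model is adapted to the lecture topic by PLSA (Probabilistic Latent Semantic Analysis). Next, using the individual slide that corresponds to each utterance, a cache model boosts the probabilities of words appearing on that slide, and the recognition results are rescored. In speech recognition experiments on technical seminars and regular lectures held at Kyoto University, combining global adaptation by PLSA with local adaptation by the cache model yielded a significant improvement in recognition accuracy. The improvement was especially large for keyword detection, where an accuracy (F-measure) close to 80% was achieved even for university lectures.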
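The local adaptation step in this abstract (a cache model built from the current slide, used to rescore recognition hypotheses) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the linear interpolation weight `lam`, the unigram table standing in for the base N-gram model, the probability floor, and all toy words and scores are assumptions introduced here.

```python
import math
from collections import Counter

def cache_probs(slide_words):
    """Relative frequency of each word on the current slide (the cache)."""
    counts = Counter(slide_words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def lm_logprob(words, base_lm, cache, lam, floor=1e-6):
    """Log-probability under a linear mix of the base model and the cache."""
    return sum(
        math.log((1 - lam) * base_lm.get(w, floor) + lam * cache.get(w, 0.0))
        for w in words
    )

def rescore(nbest, base_lm, slide_words, lam=0.3, lm_weight=1.0):
    """Return the best (words, acoustic_logscore) pair after rescoring."""
    cache = cache_probs(slide_words)
    return max(
        nbest,
        key=lambda h: h[1] + lm_weight * lm_logprob(h[0], base_lm, cache, lam),
    )

# Toy data (hypothetical): the base model slightly prefers "adoption".
base_lm = {"language": 0.01, "model": 0.01, "adoption": 0.002, "adaptation": 0.001}
slide_words = ["language", "model", "adaptation", "plsa"]
nbest = [
    (["language", "model", "adoption"], -10.0),
    (["language", "model", "adaptation"], -10.2),
]

best_without_cache = rescore(nbest, base_lm, slide_words, lam=0.0)
best_with_cache = rescore(nbest, base_lm, slide_words, lam=0.3)
```

With the cache switched off (`lam=0.0`) the toy base model keeps the hypothesis ending in "adoption"; with the slide cache mixed in, the hypothesis containing the slide word "adaptation" is selected instead.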

Abstract (English)

Description type

Other

Description

We investigate several language model adaptation methods that exploit presentation slide information for automatic lecture transcription. First, N-gram probabilities are re-scaled with lecture-dependent unigram probabilities estimated by PLSA (Probabilistic Latent Semantic Analysis) using all slides of the lecture. Then, N-best hypotheses of the initial speech recognition results are re-scored using word probabilities enhanced with a cache model built from the slide corresponding to each utterance. Experimental evaluations on real lectures show that the proposed method, which combines the global and local slide information, achieves a significant improvement in recognition accuracy, especially in the detection rate of content keywords.
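The global adaptation step can be illustrated with a common form of unigram rescaling, in which each N-gram probability is multiplied by the ratio of the lecture-dependent unigram to the baseline unigram and then renormalized. The sketch below assumes that form; the exponent `beta`, the two-topic PLSA parameters, and the toy vocabulary are hypothetical values chosen for illustration, and the paper's exact formulation may differ.

```python
def plsa_unigram(topic_word, topic_weights):
    """P(w | lecture) = sum over topics z of P(w | z) * P(z | lecture)."""
    vocab = topic_word[0].keys()
    return {
        w: sum(pz * tw[w] for tw, pz in zip(topic_word, topic_weights))
        for w in vocab
    }

def rescale(ngram_probs, base_unigram, lecture_unigram, beta=0.5):
    """Rescale P(w | h) for one fixed history h, then renormalize to sum to 1."""
    scaled = {
        w: p * (lecture_unigram[w] / base_unigram[w]) ** beta
        for w, p in ngram_probs.items()
    }
    total = sum(scaled.values())
    return {w: s / total for w, s in scaled.items()}

# Toy PLSA parameters (hypothetical): two topics over a three-word vocabulary.
topic_word = [
    {"the": 0.6, "model": 0.3, "acoustic": 0.1},
    {"the": 0.4, "model": 0.2, "acoustic": 0.4},
]
topic_weights = [0.25, 0.75]  # P(z | lecture), assumed estimated from the slides

base_unigram = {"the": 0.6, "model": 0.3, "acoustic": 0.1}
ngram_probs = {"the": 0.5, "model": 0.3, "acoustic": 0.2}  # P(w | h) for one h

lecture_unigram = plsa_unigram(topic_word, topic_weights)
adapted = rescale(ngram_probs, base_unigram, lecture_unigram)
```

In this toy setting the word "acoustic" is more frequent in the lecture-dependent unigram than in the baseline, so its adapted N-gram probability rises while that of "the" falls.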

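Bibliographic record ID

Source identifier type

NCID

Source identifier

AN00116647

Bibliographic information

IPSJ Journal (情報処理学会論文誌)

Vol. 50, No. 2, pp. 469-476, issued 2009-02-15

ISSN

Source identifier type

ISSN

Source identifier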

1882-7764
