Condensation of Speech Recognition Results for Real-Time Lecture Captioning

Kensho Ota; Yuya Akita; Tatsuya Kawahara

https://ipsj.ixsq.nii.ac.jp/records/169883

Name / File	License	Action
IPSJ-SLP16112012.pdf (482.5 kB)	Copyright (c) 2016 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2016-07-21

Title

講演のリアルタイム字幕付与のための音声認識結果の簡約

Title (English)

Condensation of Speech Recognition Results for Real-Time Lecture Captioning

Language

jpn

Keywords

Subject scheme

Other

Subject

Language model; condensation

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

Graduate School of Informatics, Kyoto University
Graduate School of Economics, Kyoto University
Graduate School of Informatics, Kyoto University

Author

Ota, Kensho (大田 健翔)
Akita, Yuya (秋田 祐哉)
Kawahara, Tatsuya (河原 達也)

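Abstract

Description type

Other

Description

This study addresses real-time captioning of lectures using automatic speech recognition, as a means of information accessibility for hearing-impaired people. When spoken language is transcribed by speech recognition, redundant words and phrases are also output as recognition results, which increases the character count and makes the text hard to read. We therefore examine a condensation process that removes redundant words and phrases while preserving the meaning of each sentence. Specifically, words are classified into those needed to understand the lecture content (content words) and those that are not (dependent words); in principle, the latter are deleted and only the former are kept and presented as captions. Because some cases do not fit this principle, the content words to be deleted are determined from the ratio of annotation frequencies, and the dependent words to be restored are determined by three methods: the ratio of annotation frequencies, comparison of linguistic likelihood using an N-gram model, and machine learning. When condensation was applied to transcripts of lecture speech, sentences were compressed with an accuracy of 78% and a compression rate of 64%.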
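
The default rule and its frequency-based exceptions can be illustrated with a small Python sketch. This is not the authors' implementation: the tags `"C"` (content word) and `"D"` (dependent word), the 0.5 threshold, the toy annotations, and the definition of compression rate as kept words divided by original words are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter

def deletion_ratios(annotated):
    """Estimate, per (word, tag), how often annotators deleted the word.

    annotated: iterable of (word, tag, kept) triples taken from
    hand-condensed text (hypothetical data format).
    """
    seen, deleted = Counter(), Counter()
    for word, tag, kept in annotated:
        seen[(word, tag)] += 1
        if not kept:
            deleted[(word, tag)] += 1
    return {key: deleted[key] / seen[key] for key in seen}

def condense(tokens, ratios, threshold=0.5):
    """Keep content words and drop dependent words by default.

    When a (word, tag) pair was seen in the annotations, its deletion
    ratio overrides the default (assumed decision rule).
    """
    kept_words = []
    for word, tag in tokens:
        ratio = ratios.get((word, tag))
        delete = (tag == "D") if ratio is None else ratio > threshold
        if not delete:
            kept_words.append(word)
    return kept_words

def compression_rate(original, condensed):
    """Fraction of words that remain (assumed definition)."""
    return len(condensed) / len(original)

# Toy annotations: "ちょっと" is always deleted, "の" is usually kept.
annotated = [
    ("ちょっと", "C", False), ("ちょっと", "C", False),
    ("の", "D", True), ("の", "D", True), ("の", "D", False),
]
tokens = [("ちょっと", "C"), ("音声", "C"), ("認識", "C"),
          ("の", "D"), ("結果", "C"), ("です", "D")]

ratios = deletion_ratios(annotated)
caption = condense(tokens, ratios)
print(caption)                                      # ['音声', '認識', 'の', '結果']
print(round(compression_rate(tokens, caption), 2))  # 0.67
```

In this toy run the content word "ちょっと" is deleted and the dependent word "の" is restored, which are the two kinds of exception the abstract mentions; "です" has no annotation evidence and falls back to the default deletion.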

Abstract (English)

Description type

Other

Description

We have been investigating a real-time captioning framework that uses automatic speech recognition (ASR) technology for hearing-impaired audiences. Since an ASR system transcribes all of the speech input, including redundant spoken expressions, the resulting captions are very long and thus hard to read and understand. To solve this problem, we propose a “condensation” method, which removes as many unnecessary expressions from the ASR results as possible while retaining the key meaning of the utterances. Specifically, each word in the ASR results is classified as either a content word or a dependent word. As a rule, the latter are deleted, while the former are retained for captions. However, there are exceptions to this principle, so we further introduce a refinement process. Redundant content words to be deleted are determined using occurrence counts in annotated training data. For the recovery of dependent words, we investigate three methods: occurrence counts in annotated training data, a linguistic likelihood measure calculated by an N-gram language model, and a machine learning framework. In an experiment on real lecture transcriptions, a word-based compression rate of 64% and an accuracy of 78% were obtained.
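
One way to picture the N-gram likelihood comparison for recovering dependent words is the Python sketch below. It is only an illustration under stated assumptions: the add-one-smoothed bigram model, the per-transition length normalization, the toy training sentences, and the helper names `BigramLM` and `should_restore` are hypothetical and are not taken from the report.

```python
import math
from collections import Counter

class BigramLM:
    """Tiny add-one-smoothed bigram model (illustrative only)."""

    def __init__(self, sentences):
        self.history = Counter()   # counts of each word used as a history
        self.bigrams = Counter()   # counts of (previous word, word) pairs
        vocab = {"</s>"}
        for sent in sentences:
            padded = ["<s>"] + sent + ["</s>"]
            self.history.update(padded[:-1])
            self.bigrams.update(zip(padded, padded[1:]))
            vocab.update(sent)
        self.vocab_size = len(vocab)

    def avg_logprob(self, sent):
        """Average log-probability per transition, so that sequences
        of different lengths can be compared (assumed normalization)."""
        padded = ["<s>"] + sent + ["</s>"]
        total = 0.0
        for prev, word in zip(padded, padded[1:]):
            num = self.bigrams[(prev, word)] + 1
            den = self.history[prev] + self.vocab_size
            total += math.log(num / den)
        return total / (len(padded) - 1)

def should_restore(lm, left, word, right):
    """Restore a deleted dependent word if the sequence containing it
    scores higher than the sequence without it."""
    return lm.avg_logprob(left + [word] + right) > lm.avg_logprob(left + right)

# Toy training data standing in for well-formed caption text.
lm = BigramLM([
    ["音声", "認識", "の", "結果"],
    ["認識", "の", "結果"],
    ["音声", "認識"],
])

print(should_restore(lm, ["認識"], "の", ["結果"]))  # True
print(should_restore(lm, ["音声"], "の", ["認識"]))  # False
```

Without some form of length normalization the longer candidate would be penalized simply for having more transitions, which is why the sketch compares averages; the report may handle this differently.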

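Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10442647

Bibliographic information

研究報告音声言語情報処理（SLP） (IPSJ SIG Technical Reports, Spoken Language Processing)

Vol. 2016-SLP-112, No. 12, pp. 1-6, issued 2016-07-21

ISSN

Source identifier type

ISSN

Source identifier

2188-8663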

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan (情報処理学会)
