Large Vocabulary Continuous Speech Recognition Using a Phonetic Tied-Mixture Model

Akinobu Lee (李晃伸); Tatsuya Kawahara (河原達也); Kazuya Takeda (武田一哉); Kiyohiro Shikano (鹿野清宏)

https://ipsj.ixsq.nii.ac.jp/records/57553

Name / File	License	Action
IPSJ-SLP99029008.pdf (585.4 kB)	Copyright (c) 1999 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

1999-12-20

Title

Phonetic Tied-Mixtureモデルを用いた大語彙連続音声認識 (Large Vocabulary Continuous Speech Recognition Using a Phonetic Tied-Mixture Model)

Title (English)

Phonetic Tied-Mixture Model for LVCSR

Language

jpn

Resource type

technical report

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Author affiliations

Graduate School of Informatics, Kyoto University
Graduate School of Informatics, Kyoto University
Graduate School of Engineering, Nagoya University
Graduate School of Information Science, Nara Institute of Science and Technology

Authors

Akinobu Lee (李晃伸), Tatsuya Kawahara (河原達也), Kazuya Takeda (武田一哉), Kiyohiro Shikano (鹿野清宏)

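Abstract (translated from the Japanese)

Description type

Other

Description

We propose a new phonetic tied-mixture (PTM) model for large vocabulary continuous speech recognition. The model is synthesized by assigning the set of several dozen Gaussians held by each state of a monophone model to the corresponding triphone states, which share those Gaussians and differ only in their mixture weights. It represents the acoustic space more efficiently than ordinary state-tied triphones, and it is easier to train than conventional tied-mixture models, which require a huge codebook. Evaluated on the JNAS 20,000-word newspaper read-speech task, it achieved a word error rate of 7.0%, equal to the best triphone result, with fewer parameters. In terms of processing efficiency, accuracy hardly degraded even when the Gaussians used for acoustic score computation were cut down to the top 3%. Several Gaussian pruning methods were proposed and compared, and the acoustic likelihood computation was ultimately reduced to about one fifth.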
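The construction described in this abstract can be written compactly. The following is only a sketch in notation introduced here, not taken from the report: $j$ is a shared triphone state, $m(j)$ is the monophone state it is derived from, $K$ is the number of Gaussians per monophone state, and $w_{jk}$ are the weights specific to state $j$.

```latex
b_j(\mathbf{o}) \;=\; \sum_{k=1}^{K} w_{jk}\,
  \mathcal{N}\!\left(\mathbf{o};\ \boldsymbol{\mu}_{m(j),k},\ \boldsymbol{\Sigma}_{m(j),k}\right),
\qquad
\sum_{k=1}^{K} w_{jk} = 1,\quad w_{jk} \ge 0 .
```

Under this reading, all triphone states derived from the same monophone state evaluate the same $K$ Gaussians, so each Gaussian needs to be scored only once per frame, and only the weights $w_{jk}$ grow with the number of triphone states.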

Abstract (English)

Description type

Other

Description

A phonetic tied-mixture (PTM) model for efficient large vocabulary continuous speech recognition is presented. It is synthesized from context-independent phone models with 64 mixture components per state by assigning different mixture weights according to the shared states of triphones. The mixtures are then re-estimated for optimization. The model achieves a word error rate of 7.0% on 20k-word dictation of a newspaper corpus, which is comparable to the best figure obtained by triphone models of much higher resolution. Compared with conventional PTMs that share Gaussians across all states, the proposed model is easily trained and reliably estimated. Furthermore, the model enables the decoder to perform efficient Gaussian pruning. Computing only two of the 64 components was found to cause no loss of accuracy. Several pruning methods are proposed and compared, and the best one reduces the computation to about 20%.
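The abstract does not spell out the pruning methods, so the following is only a minimal sketch of the general idea under stated assumptions: diagonal covariances, a per-state codebook shared by triphone states, and a "safe" cutoff that abandons a Gaussian once its partial log score drops below the current k-th best. All function names (`diag_gauss_logpdf`, `ptm_state_loglik`, `top_k_safe_pruning`) are hypothetical and are not the authors' implementation.

```python
import numpy as np


def diag_gauss_logpdf(x, means, variances):
    """Log density of x under each diagonal-covariance Gaussian (one per row)."""
    const = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    return const - 0.5 * np.sum((x - means) ** 2 / variances, axis=1)


def ptm_state_loglik(x, means, variances, log_weights, top_k=None):
    """Log-likelihood of one triphone state that reuses a shared codebook.

    means, variances: (K, D) codebook of the corresponding monophone state.
    log_weights: (K,) log mixture weights specific to this triphone state.
    top_k: if given, only the top_k best-scoring Gaussians contribute.
    """
    g = diag_gauss_logpdf(x, means, variances)
    if top_k is not None and top_k < len(g):
        keep = np.argpartition(g, -top_k)[-top_k:]
        g, log_weights = g[keep], log_weights[keep]
    return float(np.logaddexp.reduce(g + log_weights))


def top_k_safe_pruning(x, means, variances, k):
    """Find the k best Gaussians while skipping work on hopeless ones.

    The score starts at the normalization constant, and every dimension
    subtracts a non-negative term, so the partial score never increases.
    Once it falls below the current k-th best full score, the Gaussian
    cannot enter the top k and the remaining dimensions are skipped.
    Returns (indices of the k best, number of dimension terms evaluated).
    """
    K, D = means.shape
    const = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    best = []  # (score, index) pairs, sorted best first, at most k entries
    evaluated = 0
    for i in range(K):
        threshold = best[-1][0] if len(best) == k else -np.inf
        score = const[i]
        pruned = False
        for d in range(D):
            score -= 0.5 * (x[d] - means[i, d]) ** 2 / variances[i, d]
            evaluated += 1
            if score < threshold:
                pruned = True
                break
        if not pruned:
            best.append((score, i))
            best.sort(reverse=True)
            best = best[:k]
    return [i for _, i in best], evaluated
```

With `k=2` and a 64-component codebook, this mirrors the "two out of 64 components" setting in the abstract. How much computation is actually saved depends on the data and on the order in which Gaussians are visited, so the sketch makes no claim about reproducing the reported 20% figure.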

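Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10442647

Bibliographic information

IPSJ SIG Technical Reports: Spoken Language Processing (SLP)

Vol. 1999, No. 108 (1999-SLP-029), pp. 43-48, issued 1999-12-20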

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan
