@techreport{oai:ipsj.ixsq.nii.ac.jp:00057553,
  author = {李, 晃伸 and 河原, 達也 and 武田, 一哉 and 鹿野, 清宏 and Lee, Akinobu and Kawahara, Tatsuya and Takeda, Kazuya and Shikano, Kiyohiro},
  issue = {108(1999-SLP-029)},
  month = dec,
  note = {We propose a new phonetic tied-mixture (PTM) model for large vocabulary continuous speech recognition. The model is synthesized by assigning the set of several dozen Gaussians held by each state of a monophone model to the corresponding states of the triphones. These states share the Gaussians and differ only in their mixture weights. The model represents the acoustic space more efficiently than ordinary state-tied triphones and is easier to train than conventional tied-mixture models, which require a huge codebook. On the JNAS 20k-word newspaper-article reading task, it achieved a word error rate of 7.0\%, matching the best triphone performance, with fewer parameters. In terms of processing efficiency, accuracy hardly degraded even when the Gaussians used for acoustic score computation were reduced to the top 3\%. Several Gaussian pruning methods were proposed and compared, and the acoustic likelihood computation was ultimately reduced to about one fifth. A phonetic tied-mixture (PTM) model for efficient large vocabulary continuous speech recognition is presented. It is synthesized from context-independent phone models with 64 mixture components per state by assigning different mixture weights according to the shared states of triphones. The mixtures are then re-estimated for optimization. The model achieves a word error rate of 7.0\% on 20k-word dictation of a newspaper corpus, which is comparable to the best figure obtained by triphone models of much higher resolution. Compared with conventional PTMs, which share Gaussians among all states, the proposed model is easier to train and can be estimated more reliably. Furthermore, the model enables the decoder to perform efficient Gaussian pruning: computing only two of the 64 components causes no loss of accuracy. Several pruning methods are proposed and compared, and the best one reduces the computation to about 20\%.},
  title = {{Phonetic Tied-Mixture}モデルを用いた大語彙連続音声認識 [Large Vocabulary Continuous Speech Recognition Using Phonetic Tied-Mixture Models]},
  year = {1999}
}