A Study of a Spoken Dialogue System Robust to Input Speech Variations Using a Massively Parallel Computer

中川 竜太 (Ryuta Nakagawa); 岩野 公司 (Koji Iwano); 古井 貞煕 (Sadaoki Furui)


https://ipsj.ixsq.nii.ac.jp/records/56933

File	License
IPSJ-SLP05059016.pdf (921.4 kB)	Copyright (c) 2005 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2005-12-22

Title

A Study of a Spoken Dialogue System Robust to Input Speech Variations Using a Massively Parallel Computer

Title (English)

Spoken dialogue system robust against speech variations based on massively parallel computing

Language

jpn

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliations

Tokyo Institute of Technology (all three authors)

Author name

中川, 竜太 (Ryuta NAKAGAWA)

Abstract (translated from Japanese)

Description type

Other

Description

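Among the variations in input speech, the robustness of speech recognition improves when suitable models are prepared in advance for variations that can be predicted, and when models are incrementally adapted to variations that are difficult to predict. To combine these computationally expensive methods and apply them to a spoken dialogue system, which requires real-time processing, we investigate the use of a massively parallel computer. In this paper, we implemented on a GRID an architecture that runs multiple speech recognizers concurrently, selects among the resulting recognition hypotheses on the basis of likelihood, and performs adaptation in the background. In a restaurant search task, we used multiple language models representing linguistic variations of the input speech due to utterance content (topic and utterance category), and multiple speaker-dependent acoustic models representing acoustic variations due to speaker differences. In evaluation experiments using pre-recorded dialogue speech, the constructed system, driving 75 speech recognition nodes and 15 speaker adaptation nodes, reduced the keyword recognition error rate by 25.5% compared with a conventional system using a single acoustic model and a single language model.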
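The selection step described above (several recognizers run concurrently and the hypothesis with the highest likelihood is kept) can be sketched as follows. This is a minimal illustration, not the authors' implementation: threads stand in for GRID recognition nodes, scores are plain log-likelihoods, and `Hypothesis`, `recognize_parallel`, `toy_recognize`, and the model names `spk_A`, `spk_B`, `shop`, `menu` are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import product
from typing import Callable, Sequence


@dataclass(frozen=True)
class Hypothesis:
    text: str
    log_likelihood: float
    acoustic_model: str
    language_model: str


# Hypothetical decoder interface: (audio, acoustic model, language model) -> Hypothesis.
Recognizer = Callable[[bytes, str, str], Hypothesis]


def recognize_parallel(
    audio: bytes,
    acoustic_models: Sequence[str],
    language_models: Sequence[str],
    recognize: Recognizer,
    max_workers: int = 8,
) -> Hypothesis:
    """Decode with every (acoustic model, language model) pair concurrently
    and keep the hypothesis with the highest log-likelihood."""
    pairs = list(product(acoustic_models, language_models))
    # Threads stand in for separate recognition nodes on a GRID.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        hyps = list(pool.map(lambda pair: recognize(audio, *pair), pairs))
    return max(hyps, key=lambda h: h.log_likelihood)


def toy_recognize(audio: bytes, am: str, lm: str) -> Hypothesis:
    # Made-up scores in place of real decoder output.
    am_score = {"spk_A": -120.0, "spk_B": -95.0}[am]
    lm_score = {"shop": -10.0, "menu": -30.0}[lm]
    return Hypothesis(f"<{am}+{lm}>", am_score + lm_score, am, lm)


best = recognize_parallel(b"", ["spk_A", "spk_B"], ["shop", "menu"], toy_recognize)
print(best.acoustic_model, best.language_model, best.log_likelihood)  # spk_B shop -105.0
```

Because every acoustic/language model pair is decoded, the work grows with the product of the two model counts, which is the kind of load that motivates spreading decoding across many nodes.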

Abstract (English)

Description type

Other

Description

The robustness of speech recognition can be improved by preparing models suited to acoustic and linguistic variations that can be predicted in advance, and by incrementally adapting the models to variations that are difficult to predict. To combine these computationally demanding methods and apply them to spoken dialogue systems, which require real-time processing, this paper investigates the use of a massively parallel computer. An architecture that drives multiple speech recognizers in parallel, selects the recognition result with the maximum likelihood, and runs adaptation processes in the background has been implemented on a GRID computing system. In a restaurant information retrieval task, the system used multiple language models representing linguistic variations of the input speech according to utterance content (topics/utterance categories) and multiple speaker-dependent acoustic models representing speaker variations. Evaluation experiments using pre-recorded dialogue utterances show that, when 75 recognition nodes and 15 speaker-adaptation nodes are driven, the proposed system reduces the keyword recognition error rate by 25.5% compared with a conventional system using a single acoustic model and a single language model.
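The idea of running adaptation in the background can be illustrated with a producer/consumer sketch: the dialogue loop hands each utterance to a worker and continues without waiting, while the worker updates per-speaker statistics. This is a toy under loud assumptions: `BackgroundAdapter` is hypothetical, a single thread stands in for the adaptation nodes, and the "model" is just a running mean of feature values, not any real acoustic-model adaptation method.

```python
from __future__ import annotations

import queue
import threading
from collections import defaultdict


class BackgroundAdapter:
    """Hypothetical stand-in for speaker-adaptation nodes: jobs are queued
    and a worker thread updates a per-speaker running mean."""

    def __init__(self) -> None:
        self._jobs: queue.Queue[tuple[str, list[float]] | None] = queue.Queue()
        self._lock = threading.Lock()
        self._sums: dict[str, float] = defaultdict(float)
        self._counts: dict[str, int] = defaultdict(int)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, speaker: str, features: list[float]) -> None:
        # Non-blocking: recognition does not wait for adaptation to finish.
        self._jobs.put((speaker, features))

    def _run(self) -> None:
        while True:
            job = self._jobs.get()
            if job is None:  # sentinel: stop the worker
                break
            speaker, feats = job
            with self._lock:
                self._sums[speaker] += sum(feats)
                self._counts[speaker] += len(feats)

    def model(self, speaker: str) -> float | None:
        # Snapshot of the current "adapted" parameter, or None if unseen.
        with self._lock:
            n = self._counts.get(speaker, 0)
            return self._sums[speaker] / n if n else None

    def close(self) -> None:
        # FIFO queue + single worker: all earlier jobs finish before the sentinel.
        self._jobs.put(None)
        self._worker.join()


adapter = BackgroundAdapter()
adapter.submit("spk_A", [1.0, 2.0, 3.0])
adapter.submit("spk_A", [4.0])
adapter.close()
print(adapter.model("spk_A"))  # 2.5
```

Reading a snapshot through a lock lets the recognizers keep using the latest available parameters while updates continue, at the cost of recognition sometimes using slightly stale models.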

Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10442647

Bibliographic information

IPSJ SIG Technical Report: Spoken Language Processing (SLP)

Vol. 2005, No. 127 (2005-SLP-059), pp. 91-96, issued 2005-12-22

Notice

SIG Technical Reports are non-refereed and may therefore later appear in journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan

