@techreport{oai:ipsj.ixsq.nii.ac.jp:00069905,
  author = {Tamura, Satoshi and Miyajima, Chiyomi and Kitaoka, Norihide and Takeda, Kazuya and Yamada, Takeshi and Takiguchi, Tetsuya and Tsuge, Satoru and Yamamoto, Kazumasa and Nishiura, Takanobu and Nakayama, Masato and Denda, Yuki and Fujimoto, Masakiyo and Matsuda, Shigeki and Ogawa, Tetsuji and Kuroiwa, Shingo and Nakamura, Satoshi},
  title  = {雑音下マルチモーダル音声認識評価基盤CENSREC-1-AVの構築 (Construction of CENSREC-1-AV, an Evaluation Framework for Noisy Multimodal Speech Recognition)},
  year   = {2010},
  month  = {Jul},
  issue  = {7},
  note   = {This paper introduces CENSREC-1-AV, an evaluation framework for multimodal (audio-visual) speech recognition. CENSREC-1-AV provides an audio-visual speech database and a baseline multimodal recognition system. Speech was recorded in a clean condition for training, and in-car driving noise was added to produce the test data. For the visual data, color and near-infrared videos were recorded, and test images simulating in-car driving conditions were generated by gamma correction. The baseline system adopts MFCCs as acoustic features and eigenface or optical-flow features as visual features, and performs recognition with multi-stream HMMs.}
}
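
% Background note on the multi-stream HMMs mentioned in the abstract: a minimal
% sketch of the usual stream-weighted state output probability for a two-stream
% (audio/visual) model. The notation (stream weights \lambda_A, \lambda_V and
% per-stream densities b_j^A, b_j^V) is assumed here and is not taken from the
% report itself.
\[
  b_j(\mathbf{o}_t) \;=\;
    \bigl[ b_j^{A}(\mathbf{o}_t^{A}) \bigr]^{\lambda_A}
    \bigl[ b_j^{V}(\mathbf{o}_t^{V}) \bigr]^{\lambda_V},
  \qquad \lambda_A + \lambda_V = 1, \quad \lambda_A, \lambda_V \ge 0,
\]
% where \mathbf{o}_t^{A} would be the MFCC vector, \mathbf{o}_t^{V} the eigenface
% or optical-flow vector, and each per-stream likelihood is typically a Gaussian
% mixture. The stream weights let the recognizer favor the more reliable modality,
% e.g., shifting weight toward the visual stream when acoustic noise is high.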