{"metadata":{"_oai":{"id":"oai:ipsj.ixsq.nii.ac.jp:00216620","sets":["1164:5159:10869:10870"]},"path":["10870"],"owner":"44499","recid":"216620","title":["Hybrid RNN-T/Attention 構造を用いたストリーミング型End-to-End 音声認識モデルと内部言語モデル統合の検討"],"pubdate":{"attribute_name":"公開日","attribute_value":"2022-02-22"},"_buckets":{"deposit":"4e974b09-aa1e-41c3-ba21-0eb66b1ec9b0"},"_deposit":{"id":"216620","pid":{"type":"depid","value":"216620","revision_id":0},"owners":[44499],"status":"published","created_by":44499},"item_title":"Hybrid RNN-T/Attention 構造を用いたストリーミング型End-to-End 音声認識モデルと内部言語モデル統合の検討","author_link":["559320","559321","559310","559323","559318","559315","559316","559311","559326","559319","559309","559312","559314","559325","559322","559317","559324","559313"],"item_titles":{"attribute_name":"タイトル","attribute_value_mlt":[{"subitem_title":"Hybrid RNN-T/Attention 構造を用いたストリーミング型End-to-End 音声認識モデルと内部言語モデル統合の検討"},{"subitem_title":"A Study on Hybrid RNN-T/Attention-based Streaming ASR with Triggered Chunkwise Attention and Dual Internal Language Model Integration","subitem_title_language":"en"}]},"item_keyword":{"attribute_name":"キーワード","attribute_value_mlt":[{"subitem_subject":"SP2 ","subitem_subject_scheme":"Other"}]},"item_type_id":"4","publish_date":"2022-02-22","item_4_text_3":{"attribute_name":"著者所属","attribute_value_mlt":[{"subitem_text_value":"日本電信電話株式会社／東京工業大学"},{"subitem_text_value":"日本電信電話株式会社"},{"subitem_text_value":"日本電信電話株式会社"},{"subitem_text_value":"日本電信電話株式会社"},{"subitem_text_value":"日本電信電話株式会社"},{"subitem_text_value":"日本電信電話株式会社"},{"subitem_text_value":"日本電信電話株式会社"},{"subitem_text_value":"日本電信電話株式会社"},{"subitem_text_value":"東京工業大学"}]},"item_4_text_4":{"attribute_name":"著者所属(英)","attribute_value_mlt":[{"subitem_text_value":"NTT Corporation / Tokyo Institute of Technology","subitem_text_language":"en"},{"subitem_text_value":"NTT Corporation","subitem_text_language":"en"},{"subitem_text_value":"NTT Corporation","subitem_text_language":"en"},{"subitem_text_value":"NTT 
Corporation","subitem_text_language":"en"},{"subitem_text_value":"NTT Corporation","subitem_text_language":"en"},{"subitem_text_value":"NTT Corporation","subitem_text_language":"en"},{"subitem_text_value":"NTT Corporation","subitem_text_language":"en"},{"subitem_text_value":"NTT Corporation","subitem_text_language":"en"},{"subitem_text_value":"Tokyo Institute of Technology","subitem_text_language":"en"}]},"item_language":{"attribute_name":"言語","attribute_value_mlt":[{"subitem_language":"jpn"}]},"item_publisher":{"attribute_name":"出版者","attribute_value_mlt":[{"subitem_publisher":"情報処理学会","subitem_publisher_language":"ja"}]},"publish_status":"0","weko_shared_id":-1,"item_file_price":{"attribute_name":"Billing file","attribute_type":"file","attribute_value_mlt":[{"url":{"url":"https://ipsj.ixsq.nii.ac.jp/record/216620/files/IPSJ-SLP22140019.pdf","label":"IPSJ-SLP22140019.pdf"},"format":"application/pdf","billing":["billing_file"],"filename":"IPSJ-SLP22140019.pdf","filesize":[{"value":"2.0 MB"}],"mimetype":"application/pdf","priceinfo":[{"tax":["include_tax"],"price":"0","billingrole":"22"},{"tax":["include_tax"],"price":"0","billingrole":"44"}],"accessrole":"open_login","version_id":"cc7ff7a0-8701-4ec9-8d10-6559b2050b58","displaytype":"detail","licensetype":"license_note","license_note":"Copyright (c) 2022 by the Institute of Electronics, Information and Communication Engineers This SIG report is only available to those in membership of the SIG."}]},"item_4_creator_5":{"attribute_name":"著者名","attribute_type":"creator","attribute_value_mlt":[{"creatorNames":[{"creatorName":"森谷, 崇史"}],"nameIdentifiers":[{}]},{"creatorNames":[{"creatorName":"芦原, 孝典"}],"nameIdentifiers":[{}]},{"creatorNames":[{"creatorName":"安藤, 厚志"}],"nameIdentifiers":[{}]},{"creatorNames":[{"creatorName":"佐藤, 宏"}],"nameIdentifiers":[{}]},{"creatorNames":[{"creatorName":"田中, 智大"}],"nameIdentifiers":[{}]},{"creatorNames":[{"creatorName":"松浦, 
孝平"}],"nameIdentifiers":[{}]},{"creatorNames":[{"creatorName":"増村, 亮"}],"nameIdentifiers":[{}]},{"creatorNames":[{"creatorName":"デルクロア, マーク"}],"nameIdentifiers":[{}]},{"creatorNames":[{"creatorName":"篠崎, 隆宏"}],"nameIdentifiers":[{}]}]},"item_4_creator_6":{"attribute_name":"著者名(英)","attribute_type":"creator","attribute_value_mlt":[{"creatorNames":[{"creatorName":"Takafumi, Moriya","creatorNameLang":"en"}],"nameIdentifiers":[{}]},{"creatorNames":[{"creatorName":"Takanori, Ashihara","creatorNameLang":"en"}],"nameIdentifiers":[{}]},{"creatorNames":[{"creatorName":"Atsushi, Ando","creatorNameLang":"en"}],"nameIdentifiers":[{}]},{"creatorNames":[{"creatorName":"Hiroshi, Sato","creatorNameLang":"en"}],"nameIdentifiers":[{}]},{"creatorNames":[{"creatorName":"Tomohiro, Tanaka","creatorNameLang":"en"}],"nameIdentifiers":[{}]},{"creatorNames":[{"creatorName":"Kohei, Matsuura","creatorNameLang":"en"}],"nameIdentifiers":[{}]},{"creatorNames":[{"creatorName":"Ryo, Masumura","creatorNameLang":"en"}],"nameIdentifiers":[{}]},{"creatorNames":[{"creatorName":"Marc, Delcroix","creatorNameLang":"en"}],"nameIdentifiers":[{}]},{"creatorNames":[{"creatorName":"Takahiro, Shinozaki","creatorNameLang":"en"}],"nameIdentifiers":[{}]}]},"item_4_source_id_9":{"attribute_name":"書誌レコードID","attribute_value_mlt":[{"subitem_source_identifier":"AN10442647","subitem_source_identifier_type":"NCID"}]},"item_4_textarea_12":{"attribute_name":"Notice","attribute_value_mlt":[{"subitem_textarea_value":"SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc."}]},"item_resource_type":{"attribute_name":"資源タイプ","attribute_value_mlt":[{"resourceuri":"http://purl.org/coar/resource_type/c_18gh","resourcetype":"technical 
report"}]},"item_4_source_id_11":{"attribute_name":"ISSN","attribute_value_mlt":[{"subitem_source_identifier":"2188-8663","subitem_source_identifier_type":"ISSN"}]},"item_4_description_7":{"attribute_name":"論文抄録","attribute_value_mlt":[{"subitem_description":"本研究ではストリーミング音声認識における Recurrent neural network-transducer（RNN-T）と Attention-based decoder（AD）を組み合わせた Hybrid RNN-T/Attention モデルの改善手法について述べる．一般に AD は注意重みの計算に始端から終端までの入力音声情報が必要なためストリーミング動作が困難であった．そこで我々は先行研究として始端から各 trigger の位置までの音響特徴量を用いて注意重みを計算する Triggered attention-based decoder（TAD）と組み合わせることでストリーミング動作可能な Hybrid RNN-T/Attention モデルを提案した．しかしながら従来の TAD ではストリーミング処理を可能としたが，計算量やメモリ消費量に課題があった．本研究では認識精度を保ちながら計算コストが削減可能な Triggered chunkwise attention-based decoder（TCAD）を用いた Hybrid RNN-T/Attention モデルを提案する．また，本研究ではさらなる認識精度の改善に向けて Hybrid RNN-T/Attention モデルが持つ 2 種類の内部言語モデルを用いた言語モデルの統合方法についても検討を行なう．","subitem_description_type":"Other"}]},"item_4_description_8":{"attribute_name":"論文抄録(英)","attribute_value_mlt":[{"subitem_description":"In this paper we propose improvements to our recently proposed hybrid RNN-T/Attention architecture, which consists of a shared encoder followed by recurrent neural network-transducer (RNN-T) and triggered attention-based decoder (TAD) heads. The use of triggered attention enables the attention-based decoder (AD) to operate in a streaming manner. When a trigger point is detected by RNN-T, TAD uses the context from the start-of-speech up to that trigger point to compute the attention weights. Consequently, the computation cost and memory consumption increase quadratically with the duration of the utterance because all input features must be stored and used to re-compute the attention weights. In this paper, we use a short context of a few frames prior to each trigger point for attention weight computation, resulting in reduced computation and memory costs. We call the proposed framework triggered chunkwise AD (TCAD). 
We also investigate the effectiveness of an internal language model (ILM) estimation approach using both ILMs of RNN-T and TCAD heads for improving RNN-T performance. ","subitem_description_type":"Other"}]},"item_4_biblio_info_10":{"attribute_name":"書誌情報","attribute_value_mlt":[{"bibliographicPageEnd":"6","bibliographic_titles":[{"bibliographic_title":"研究報告音声言語情報処理（SLP）"}],"bibliographicPageStart":"1","bibliographicIssueDates":{"bibliographicIssueDate":"2022-02-22","bibliographicIssueDateType":"Issued"},"bibliographicIssueNumber":"19","bibliographicVolumeNumber":"2022-SLP-140"}]},"relation_version_is_last":true,"weko_creator_id":"44499"},"id":216620,"updated":"2025-01-19T15:47:29.989200+00:00","links":{},"created":"2025-01-19T01:17:08.640283+00:00"}