Automatic speech recognition model simultaneously recognizes linguistic information and verbal/non-verbal phenomena

塩根, 凪人; 若林, 佑幸; 北岡, 教英; Nagito, Shione; Yukoh, Wakabayashi; Norihide, Kitaoka

https://ipsj.ixsq.nii.ac.jp/records/226432

Name / File	License	Action
IPSJ-SLP23147061.pdf (947.6 kB)	Copyright (c) 2023 by the Institute of Electronics, Information and Communication Engineers. This SIG report is only available to those in membership of the SIG.
SLP: members ¥0, DLIB: members ¥0

Item type

SIG Technical Reports(1)

Publication date

2023-06-16

Title

言語情報と言語・非言語現象を同時認識する音声認識モデルの構築

Title (English)

Automatic speech recognition model simultaneously recognizes linguistic information and verbal/non-verbal phenomena

Language

jpn

Keywords

Subject scheme

Other

Subject

General presentation

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

Toyohashi University of Technology (豊橋技術科学大学), all authors

Author names

塩根, 凪人
若林, 佑幸
北岡, 教英

Author names (English)

Nagito, Shione
Yukoh, Wakabayashi
Norihide, Kitaoka

Abstract

Speech recognition technology has advanced in recent years, but most systems recognize only linguistic information and cannot recognize verbal/non-verbal (VNV) phenomena. In this study, we propose a speech recognition model that recognizes many types of VNV phenomena simultaneously with linguistic information. The VNV phenomena to be recognized are fillers, laughter, rising intonation in questions, end of utterance, speech errors, word fragments (words broken off midway), low-voice speech, utterances related to the flow of conversation, and utterances in dialects or foreign languages. We also investigated how the position of the tags that mark VNV phenomena affects speech recognition. Experiments on the Corpus of Everyday Japanese Conversation (CEJC) showed that the model that outputs each VNV tag before the linguistic information performed best in terms of character error rate. We also found that recognizing VNV phenomena simultaneously improves speech recognition accuracy.
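The tag-placement comparison and the character error rate (CER) mentioned in the abstract can be illustrated with a minimal sketch. The record does not give the tag format or the scoring procedure, so everything here is an assumption for illustration: the tag names (`F` for filler, `L` for laughter), the `<...>` markup, the helper names `render`, `strip_tags`, and `cer`, and the choice to compute CER on text with tags stripped.

```python
import re
from typing import List, Optional, Tuple

# A segment is (text, tag); tag is None for plain speech.
# Tag names such as "F" (filler) are hypothetical placeholders.
Segment = Tuple[str, Optional[str]]


def render(segments: List[Segment], position: str = "before") -> str:
    """Build a target transcript with each tag placed before or after its text."""
    if position not in ("before", "after"):
        raise ValueError("position must be 'before' or 'after'")
    parts = []
    for text, tag in segments:
        if tag is None:
            parts.append(text)
        elif position == "before":
            parts.append(f"<{tag}>{text}")
        else:
            parts.append(f"{text}<{tag}>")
    return "".join(parts)


def strip_tags(text: str) -> str:
    """Remove every <...> tag, leaving only the linguistic content."""
    return re.sub(r"<[^<>]+>", "", text)


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    segments = [("えーと", "F"), ("今日は晴れです", None)]
    print(render(segments, "before"))
    print(render(segments, "after"))
    print(cer(strip_tags(render(segments, "before")), "えーと今日は雨です"))
```

Training one model on `render(..., "before")` targets and another on `render(..., "after")` targets, then scoring both with `cer`, mirrors the kind of comparison the abstract describes, though the authors' actual pipeline may differ.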

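Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10442647

Bibliographic information

研究報告音声言語情報処理（SLP） (IPSJ SIG Technical Reports, Spoken Language Processing)

Vol. 2023-SLP-147, No. 61, pp. 1-5, issued 2023-06-16

ISSN

Source identifier type

ISSN

Source identifier

2188-8663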

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan (情報処理学会)

Versions

Ver.1

2025-01-19 12:28:39.198637
