Investigation of objective intelligibility metrics based on speech foundation models for Clarity Prediction Challenge 2

Yamamoto, Katsuhiko (山本 克彦)

https://ipsj.ixsq.nii.ac.jp/records/232545

Name / File	License
IPSJ-SLP24151075.pdf (1.0 MB)	Copyright (c) 2024 by the Institute of Electronics, Information and Communication Engineers. This SIG report is only available to those in membership of the SIG.
Price: SLP members ¥0, DLIB members ¥0

Item type

SIG Technical Reports(1)

Publication date

2024-02-22

Title

Investigation of objective intelligibility metrics based on speech foundation models for Clarity Prediction Challenge 2

Language

Japanese (jpn)

Keywords

Subject scheme

Other

Subject

Poster Session 3 EA/SIP

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

AI Lab, CyberAgent, Inc.

Author

Yamamoto, Katsuhiko (山本 克彦)

Abstract

Description type

Other

Description

Speech foundation models (SFMs), such as those built on the encoder layers of Whisper, have been suggested to separate speech signals from noise. The Clarity Prediction Challenge 2 (CPC2) is a competition on predicting speech intelligibility (SI) for listeners with hearing loss. In CPC2, first place went to the SFM-based objective intelligibility metric (SFM-OIM), an SFM-based method that uses no reference signal. SFM-OIM extends the network architecture of Whisper-AT, which was originally proposed for environmental sound classification, to SI prediction for hearing-impaired listeners. This report presents a reimplementation of SFM-OIM that uses features extracted from Whisper and evaluates its reproducibility on the CPC2 dataset. It also reports results obtained when the Whisper model and the batch size are changed.
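The abstract describes a learned predictor built on features taken from several Whisper encoder layers. The sketch below illustrates that general idea. It is a deliberately simplified, hypothetical stand-in: it is neither the Whisper-AT network nor the SFM-OIM model evaluated in the report. The head averages each layer's output over time and mixes the layers with softmax weights. It then applies a linear projection and a sigmoid. All names, parameters, and toy feature values here are assumptions. Expressing the output as a percentage is also an assumption about the target scale. In practice the per-layer features would come from a Whisper encoder, and the parameters would be trained on listener intelligibility scores.

```python
import math
from typing import List, Sequence

# Hypothetical layout: one encoder layer's output as T frames x D dims.
Matrix = List[List[float]]


def mean_over_time(frames: Matrix) -> List[float]:
    """Average a (T x D) layer output over its T frames, giving a length-D vector."""
    t = len(frames)
    return [sum(frame[d] for frame in frames) / t for d in range(len(frames[0]))]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax, used here to weight the encoder layers."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def predict_intelligibility(layers: List[Matrix],
                            layer_logits: Sequence[float],
                            weights: Sequence[float],
                            bias: float) -> float:
    """Hypothetical simplified head (not SFM-OIM): time pooling, softmax
    layer weighting, linear projection, sigmoid; returns a score in (0, 100)."""
    pooled = [mean_over_time(layer) for layer in layers]  # L vectors of length D
    alphas = softmax(layer_logits)                        # L weights summing to 1
    dim = len(pooled[0])
    mixed = [sum(a * p[d] for a, p in zip(alphas, pooled)) for d in range(dim)]
    z = sum(w * x for w, x in zip(weights, mixed)) + bias
    return 100.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Toy input: 2 layers x 2 frames x 2 dims (made-up numbers, not Whisper output).
    toy_layers = [[[1.0, 0.0], [3.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]]]
    print(predict_intelligibility(toy_layers, [0.0, 0.0], [1.0, -2.0], 0.0))  # 50.0
```

With equal layer logits, the two layers contribute equally. The toy weights then cancel the pooled features to z = 0, so the sigmoid yields exactly 50%. Softmax layer weights are a common, lightweight way to let a model learn which encoder depth is most informative.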

Bibliographic record ID

Identifier type

NCID

Identifier

AN10442647

Bibliographic information

IPSJ SIG Technical Report: Spoken Language Processing (SLP)

Vol. 2024-SLP-151, No. 75, pp. 1-6, published 2024-02-22

ISSN

Identifier type

ISSN

Identifier

2188-8663

Notice

SIG Technical Reports are non-refereed and may therefore later appear in journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan (IPSJ)


Versions

Ver.1

2025-01-19 10:24:43.302723
