Electrolaryngeal Speech Enhancement Through Strong Linguistic Encoding Methods

Lester Phillip Violeta; Wen-Chin Huang; Ding Ma; Ryuichi Yamamoto; Kazuhiro Kobayashi; Tomoki Toda

https://ipsj.ixsq.nii.ac.jp/records/228438

Name / File	License	Action
IPSJ-SLP23148008.pdf (853.7 kB)	Copyright (c) 2023 by the Institute of Electronics, Information and Communication Engineers. This SIG report is available only to members of the SIG.
SLP members: ¥0, DLIB members: ¥0

Item type

SIG Technical Reports(1)

Publication date

2023-10-07

Title (Japanese)

言語表現による喉頭摘出者のための音声強調システム

Title (English)

Electrolaryngeal Speech Enhancement Through Strong Linguistic Encoding Methods

Language

eng

Keywords

Subject scheme

Other

Subject

Speech

Resource type

technical report

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Author affiliations

Department of Intelligent Systems, Graduate School of Informatics, Nagoya University
Department of Intelligent Systems, Graduate School of Informatics, Nagoya University
Department of Intelligent Systems, Graduate School of Informatics, Nagoya University
Department of Intelligent Systems, Graduate School of Informatics, Nagoya University / TARVO, Inc.
Information Technology Center, Nagoya University

Authors

Lester Phillip Violeta
Wen-Chin Huang
Ding Ma
Ryuichi Yamamoto (山本 龍一)
Kazuhiro Kobayashi (小林 和弘)
Tomoki Toda (戸田 智基)

Abstract

Description type

Other

Description

Although pretraining and fine-tuning approaches have proven to work well in speech intelligibility enhancement, mismatches between the datasets used in each stage, such as a speech type mismatch or a speaker mismatch, can degrade the conversion performance of this framework. We propose a linguistic encoder robust enough to project both electrolaryngeal (EL) and typical speech into the same latent space while still extracting accurate linguistic information, creating a unified representation that reduces the speech type mismatch. Furthermore, we introduce HuBERT output features into the proposed framework to reduce the speaker mismatch. Such a framework makes it possible to effectively use a large-scale parallel dataset during pretraining. Compared to the conventional framework using mel-spectrogram input and output features, the proposed framework enables the model to synthesize more intelligible and natural-sounding speech, as shown by a significant 16% improvement in character error rate and a 0.83 improvement in naturalness score.

Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10442647

Bibliographic information

研究報告音声言語情報処理（SLP） (IPSJ SIG Technical Reports: Spoken Language Processing)

Vol. 2023-SLP-148, No. 8, pp. 1-6, issued 2023-10-07

ISSN

Source identifier type

ISSN

Source identifier

2188-8663

Notice

SIG Technical Reports are not refereed and hence may later appear in journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan (情報処理学会)

Versions

Ver.1

2025-01-19 11:51:07.237953
