Item type |
SIG Technical Reports(1) |
Publication date |
2023-06-16 |
Title |
|
|
Language |
en |
|
Title |
Sensitivity to Phonemic Contrasts and Insensitivity to Non-phonemic Contrasts of Various Speech Representations Examined for Pronunciation Assessment |
Language |
|
|
Language |
eng |
Keywords |
|
|
Subject scheme |
Other |
|
Subject |
General presentation |
Resource type |
|
|
Resource type identifier |
http://purl.org/coar/resource_type/c_18gh |
|
Resource type |
technical report |
Author affiliation |
|
|
|
The University of Tokyo |
Author affiliation |
|
|
|
The University of Tokyo |
Author affiliation |
|
|
|
The University of Tokyo |
Author affiliation |
|
|
|
The University of Tokyo |
Author affiliation |
|
|
|
The University of Tokyo |
Author name |
Haitong, Sun
Yingxiang, Gao
Yusuke, Shozui
Tong, Ma
Nobuaki, Minematsu
|
Abstract |
|
|
Description type |
Other |
|
Description |
To assess the segmental aspect of L2 speech produced by various types of learners, researchers and teachers need speech representations that satisfy two conditions: they must capture phonemic contrasts accurately and ignore non-phonemic contrasts adequately. Acoustically, phonemic contrasts are often characterized by spectral envelopes, but so are many non-phonemic contrasts, e.g. speaker contrasts. Purely acoustic representations such as MFCC therefore cannot satisfy both conditions. Recently, phonetic posteriorgrams, built from phoneme posterior probabilities estimated by the DNN-based acoustic models of ASR, have often been used for L2 assessment. More recently, various self-supervised representations such as wav2vec2 and WavLM have been proposed. In this study, a simple and adequate metric is set up to examine sensitivity to phonemic contrasts and insensitivity to non-phonemic contrasts, and various pretrained models are compared with it. |
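Bibliographic record ID |
|
|
Source identifier type |
NCID |
|
Source identifier |
AN10438388 |
Bibliographic information |
SIG Technical Reports: Music Information Science (MUS)
Vol. 2023-MUS-137,
No. 12,
p. 1-5,
Issue date 2023-06-16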
|
ISSN |
|
|
Source identifier type |
ISSN |
|
Source identifier |
2188-8752 |
Notice |
|
|
|
SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc. |
Publisher |
|
|
Language |
ja |
|
Publisher |
Information Processing Society of Japan (情報処理学会) |