Item type |
SIG Technical Reports(1) |
Publication date |
2022-11-22 |
Title |
Domain and Language Adaptation of Large-scale Pretrained Model for Speech Recognition of Low-resource Language |
Language |
en |
Title |
Domain and Language Adaptation of Large-scale Pretrained Model for Speech Recognition of Low-resource Language |
Language |
eng |
Keywords |
Subject Scheme |
Other |
Subject |
Speech recognition(2) |
Resource type |
Resource type identifier |
http://purl.org/coar/resource_type/c_18gh |
Resource type |
technical report |
Author affiliation |
Graduate School of Informatics, Kyoto University |
Author affiliation |
National Institute of Information and Communications Technology (NICT) |
Author affiliation |
Graduate School of Informatics, Kyoto University |
Author affiliation |
Graduate School of Informatics, Kyoto University |
Author affiliation (English) |
en |
Graduate School of Informatics, Kyoto University |
Author affiliation (English) |
en |
National Institute of Information and Communications Technology (NICT) |
Author affiliation (English) |
en |
Graduate School of Informatics, Kyoto University |
Author affiliation (English) |
en |
Graduate School of Informatics, Kyoto University |
Author name |
Kak, Soky
Sheng, Li
Chenhui, Chu
Tatsuya, Kawahara |
Author name (English) |
Kak, Soky
Sheng, Li
Chenhui, Chu
Tatsuya, Kawahara |
Abstract |
Description type |
Other |
Description |
Self-supervised learning (SSL) models are effective for automatic speech recognition (ASR). However, due to their huge parameter size, fine-tuning for ASR usually requires about 10 hours of data, and such an amount of training data is unavailable for some low-resource languages. Moreover, the SSL pre-trained models were initially trained mainly on European languages, so they may need to be adapted to other domains or languages. To address these challenges, we propose a two-step adaptation method: (1) domain adaptation, which fine-tunes the pre-trained model on in-domain multilingual datasets, and (2) language adaptation, which fine-tunes it on datasets of the same language but from different domains. We then investigate the effectiveness of adaptation with only one hour of labeled target data for the ASR task. Experiments on the Extraordinary Chambers in the Courts of Cambodia dataset show that conducting domain adaptation first, followed by language adaptation, is the most effective method, reducing the CER of the baseline by 6.15% and 7.75% on the test and validation sets, respectively. |
Abstract (English) |
Description type |
Other |
Description |
Self-supervised learning (SSL) models are effective for automatic speech recognition (ASR). However, due to their huge parameter size, fine-tuning for ASR usually requires about 10 hours of data, and such an amount of training data is unavailable for some low-resource languages. Moreover, the SSL pre-trained models were initially trained mainly on European languages, so they may need to be adapted to other domains or languages. To address these challenges, we propose a two-step adaptation method: (1) domain adaptation, which fine-tunes the pre-trained model on in-domain multilingual datasets, and (2) language adaptation, which fine-tunes it on datasets of the same language but from different domains. We then investigate the effectiveness of adaptation with only one hour of labeled target data for the ASR task. Experiments on the Extraordinary Chambers in the Courts of Cambodia dataset show that conducting domain adaptation first, followed by language adaptation, is the most effective method, reducing the CER of the baseline by 6.15% and 7.75% on the test and validation sets, respectively. |
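Note: the sketch below is a rough illustration of the two-step recipe described in the abstract, showing how sequential fine-tuning stages could be chained in PyTorch. It is not the authors' code: the toy encoder, the generic fine_tune() helper, and the dataset names in the comments are hypothetical stand-ins for the SSL pretrained model and for the in-domain multilingual, same-language out-of-domain, and one-hour target datasets.

```python
# Minimal sketch (assumptions noted above): two-step adaptation of an SSL
# pretrained model for low-resource ASR. The encoder below is a toy placeholder,
# not a real wav2vec 2.0-style model, and the datasets are hypothetical.
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset


def fine_tune(model: nn.Module, dataset: Dataset,
              epochs: int = 1, lr: float = 1e-4) -> nn.Module:
    """Generic CTC fine-tuning loop standing in for one adaptation stage."""
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
    model.train()
    for _ in range(epochs):
        for speech, speech_lens, labels, label_lens in loader:
            # (batch, time, feat) -> (time, batch, vocab) log-probabilities
            log_probs = model(speech).log_softmax(dim=-1).transpose(0, 1)
            loss = ctc_loss(log_probs, labels, speech_lens, label_lens)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model


# Step 0: start from a large-scale SSL pretrained model (toy stand-in here).
model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 64))

# Step 1 (domain adaptation): fine-tune on in-domain *multilingual* data,
# i.e. speech from the target domain in non-target languages.
# model = fine_tune(model, in_domain_multilingual_data)

# Step 2 (language adaptation): fine-tune on *target-language* data drawn
# from other domains.
# model = fine_tune(model, target_language_out_of_domain_data)

# Final step: fine-tune on the ~1 hour of labeled target-domain,
# target-language data and evaluate CER on the validation/test sets.
# model = fine_tune(model, one_hour_target_data)
```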
Bibliographic record ID |
Source identifier type |
NCID |
Source identifier |
AN10442647 |
Bibliographic information |
SIG Technical Report on Spoken Language Processing (SLP),
Vol. 2022-SLP-144,
No. 25,
pp. 1-5,
Published 2022-11-22 |
ISSN |
Source identifier type |
ISSN |
Source identifier |
2188-8663 |
Notice |
SIG Technical Reports are non-refereed and hence may later appear in journals, conferences, symposia, etc. |
Publisher |
Language |
ja |
Publisher |
Information Processing Society of Japan (IPSJ) |