Active Utterance Collection for Efficient NLU Model Training in Dialog Systems
https://ipsj.ixsq.nii.ac.jp/records/2005280
| Name / File | License | Action |
|---|---|---|
| Available for download from October 30, 2027 | Copyright (c) 2025 by the Information Processing Society of Japan | Non-member: ¥0, IPSJ member: ¥0, DBS member: ¥0, IFAT member: ¥0, DLIB member: ¥0 |
| Item type | Trans(1) |
|---|---|
| Publication date | 2025-10-30 |
| Title (ja) | Active Utterance Collection for Efficient NLU Model Training in Dialog Systems |
| Title (en) | Active Utterance Collection for Efficient NLU Model Training in Dialog Systems |
| Language | eng |
| Keywords (Subject scheme: Other) | [Research Paper] natural language understanding, data collection, dialog systems, synthetic data generation |
| Resource type identifier | http://purl.org/coar/resource_type/c_6501 |
| Resource type | journal article |
| Author affiliation | University of Tsukuba |
| Author affiliation | University of Tsukuba |
| Author affiliation (en) | University of Tsukuba |
| Author affiliation (en) | University of Tsukuba |
| Author name | Rui Yang; Kei Wakabayashi |
| Author name (en) | Rui Yang; Kei Wakabayashi |
| Abstract (Description type: Other) | The development of natural language understanding (NLU) models for dialog systems necessitates the collection of a large volume of user utterances as training data, which requires significant human effort. To improve the efficiency of data collection, we develop a novel active utterance collection framework that leverages dialog scenes, which are the states of the dialog manager in the system, to actively control the data collection process. The key idea of the proposed method is to identify dialog scenes where the current NLU model performs worse and collect more data instances in those scenes to efficiently improve the model's performance. To estimate the performance of the NLU model on each dialog scene, we propose two strategies to generate validation data, including a method that uses large language models (LLMs). Empirical evaluations on the Schema-Guided Dialog dataset indicate that the proposed method can improve the efficiency of data collection in scenarios where a substantial labeled validation dataset is available. However, its efficacy diminishes in settings with practical constraints that limit the availability of validation data. These findings underscore the potential of the proposed approach, which opens new avenues for future research in practical methods for enhancing the efficiency of data collection in dialog system development. (This is a preprint of an article intended for publication in the Journal of Information Processing (JIP). This preprint should not be cited. This article should be cited as: Journal of Information Processing Vol.33 (2025) (online).) |
| Bibliographic record ID (NCID) | AA11464847 |
| Bibliographic information | IPSJ Transactions on Databases (TOD), Vol. 18, No. 4, issue date 2025-10-30 |
| ISSN | 1882-7799 |
| Publisher | Information Processing Society of Japan |
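The abstract describes the collection loop only in prose: estimate the NLU model's performance per dialog scene from validation data (optionally LLM-generated), then collect more utterances for the weakest scene. Below is a minimal, self-contained sketch of that idea under stated assumptions; it is not the authors' implementation, and every name (`SCENES`, `generate_validation_utterances`, `estimate_scene_accuracy`, `collect_utterances`, `train_nlu`) is a hypothetical stand-in with simulated behavior.

```python
"""Minimal sketch (assumptions, not the paper's code) of scene-based
active utterance collection: pick the dialog scene where the current
NLU model is estimated to perform worst and gather more data there."""

import random

# Hypothetical dialog scenes (states of the dialog manager).
SCENES = ["ask_restaurant", "confirm_booking", "request_time"]


def generate_validation_utterances(scene, n=20):
    """Stand-in for the LLM-based validation-data strategy: conceptually,
    prompt an LLM to produce labeled utterances for the given scene."""
    return [f"synthetic utterance {i} for {scene}" for i in range(n)]


def estimate_scene_accuracy(model, utterances):
    """Stand-in for evaluating the current NLU model on one scene's
    validation utterances; a random score simulates the estimate."""
    return random.random()


def collect_utterances(scene, batch_size):
    """Stand-in for asking annotators for new utterances in a scene."""
    return [f"collected utterance {i} in {scene}" for i in range(batch_size)]


def train_nlu(train_data):
    """Stand-in for (re)training the NLU model on all collected data."""
    return {"num_examples": sum(len(v) for v in train_data.values())}


def active_collection(budget=100, batch_size=10):
    train_data = {scene: [] for scene in SCENES}
    valid_data = {scene: generate_validation_utterances(scene) for scene in SCENES}

    collected = 0
    while collected < budget:
        model = train_nlu(train_data)
        # Estimate per-scene performance and target the weakest scene.
        scores = {s: estimate_scene_accuracy(model, valid_data[s]) for s in SCENES}
        worst = min(scores, key=scores.get)
        new = collect_utterances(worst, batch_size)
        train_data[worst].extend(new)
        collected += len(new)

    return train_nlu(train_data)


if __name__ == "__main__":
    print(active_collection())
```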