Update Reward Function based on Accumulated Data
https://ipsj.ixsq.nii.ac.jp/records/214957
Copyright (c) 2021 by the Information Processing Society of Japan
| Item type | National Convention(1) |
|---|---|
| Publication date | 2021-03-04 |
| Title | Update Reward Function based on Accumulated Data |
| Language | eng |
| Subject scheme | Other |
| Subject | Artificial Intelligence and Cognitive Science |
| Resource type identifier | http://purl.org/coar/resource_type/c_5794 |
| Resource type | conference paper |
| Author affiliation | University of Southampton |
| Author affiliation | BOSCH |
| Author names | 中澤, 耕平; 中里, 研一 |
| Abstract (description type: Other) | Efficient reward functions can shorten the training time for reinforcement learning, but they can also restrict exploration of the solution space. Here, a sub-reward function is studied using the curling problem, in which the goal is to stop a stone launched at a constant velocity by exerting various opposing forces. In our procedure, the accumulated data (position and velocity) are classified into groups according to the final reward, and the sub-reward is updated based on these groups. Consequently, the optimised reward function successfully yields quick commands without the programmer intentionally restricting the agent's exploration of the solution space. Finally, we discuss practical situations for our method. |
| Source identifier type | NCID |
| Source identifier | AN00349328 |
| Bibliographic information | 第83回全国大会講演論文集 (Proceedings of the 83rd National Convention), vol. 2021, no. 1, pp. 309-310, issued 2021-03-04 |
| Publisher language | ja |
| Publisher | 情報処理学会 (Information Processing Society of Japan) |
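
The procedure outlined in the abstract (accumulate position and velocity data, group it by final reward, update a sub-reward from the groups) might be sketched roughly as follows. This is an illustration only and is not taken from the paper: the 1-D stone dynamics, the constants `TARGET`, `V0`, `DT` and `FORCES`, the grid discretisation in `cell`, and the rank-based scoring in `update_sub_reward` are all assumptions made for the example.

```python
import random
from collections import defaultdict

# All constants below are hypothetical values chosen for illustration.
TARGET = 10.0                   # position where the stone should stop
V0 = 2.0                        # constant launch velocity
DT = 0.5                        # simulation time step
FORCES = (0.0, 0.2, 0.4, 0.8)   # available opposing forces (unit mass)


def run_episode(policy, max_steps=200):
    """Simulate one launch; return the visited (position, velocity)
    states and the final reward of the episode."""
    x, v = 0.0, V0
    states = []
    for _ in range(max_steps):
        states.append((x, v))
        force = policy(x, v)
        v = max(0.0, v - force * DT)  # an opposing force only decelerates
        x += v * DT
        if v == 0.0:
            break
    final_reward = -abs(TARGET - x)   # stopping closer to the target is better
    return states, final_reward


def cell(x, v, dx=1.0, dv=0.25):
    """Discretise a (position, velocity) pair into a grid cell."""
    return (int(x // dx), int(v // dv))


def update_sub_reward(episodes, n_groups=3):
    """Split episodes into n_groups (n_groups >= 2) by final reward, then
    score each visited cell by the mean group rank, scaled to [0, 1]."""
    ranked = sorted(episodes, key=lambda ep: ep[1])  # worst episode first
    ranks = defaultdict(list)
    for i, (states, _) in enumerate(ranked):
        group = i * n_groups // len(ranked)  # 0 = worst, n_groups - 1 = best
        for x, v in states:
            ranks[cell(x, v)].append(group)
    return {c: sum(g) / len(g) / (n_groups - 1) for c, g in ranks.items()}


# Accumulate data with a random policy, then derive the sub-reward table.
rng = random.Random(0)
episodes = [run_episode(lambda x, v: rng.choice(FORCES)) for _ in range(50)]
sub_reward = update_sub_reward(episodes)
```

In this sketch the sub-reward is derived only from accumulated outcomes, so no hand-written shaping term steers the agent toward a particular braking strategy. A learning agent could add `sub_reward.get(cell(x, v), 0.0)` to its per-step reward and recompute the table as more episodes accumulate.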