Item type |
SIG Technical Reports(1) |
Publication date |
2021-02-22 |
Title |
Generating Intrinsic Rewards by Random Recurrent Network Distillation |

Language |
eng |
Resource type identifier |
http://purl.org/coar/resource_type/c_18gh |

Resource type |
technical report |
Author affiliation |
Department of Computer Science, Graduate School of Engineering, Nagoya Institute of Technology |

Author affiliation |
Department of Computer Science, Graduate School of Engineering, Nagoya Institute of Technology |

Author affiliation |
Department of Clinical Engineering, College of Life and Health Sciences, Chubu University |

Author affiliation |
Department of Computer Science, Graduate School of Engineering, Nagoya Institute of Technology |

Author affiliation |
Department of Computer Science, Graduate School of Engineering, Nagoya Institute of Technology |
Authors |
Zefeng Xu
Koichi Moriyama
Tohgoroh Matsui
Atsuko Mutoh
Nobuhiro Inuzuka
|
Abstract |

Description type |
Other |

Description |
Exploration in sparse-reward environments poses significant challenges for many reinforcement learning algorithms. Rather than relying solely on extrinsic rewards provided by environments, many state-of-the-art methods generate intrinsic rewards to encourage the agent to explore the environment. However, we found that existing models fall short in environments where the agent must visit the same state more than once. We therefore improve an existing model and propose a novel type of intrinsic exploration bonus that rewards the agent when a new sequence is discovered. The intrinsic reward is the error of a recurrent neural network predicting features of the sequences given by a fixed, randomly initialized recurrent neural network. Our approach performs well in some Atari games where conditions must be fulfilled to develop stories. |
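Note |
The record contains no code; the following is a minimal sketch of the mechanism the abstract describes (a trained recurrent predictor distilling a fixed random recurrent target, with prediction error as the intrinsic reward). It assumes PyTorch and GRU encoders, and the names (RecurrentFeatureNet, intrinsic_reward) and dimensions are illustrative, not taken from the paper.

import torch
import torch.nn as nn

class RecurrentFeatureNet(nn.Module):
    """GRU encoder mapping a sequence of observations to a feature vector."""
    def __init__(self, obs_dim, hidden_dim=64, feat_dim=32):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, feat_dim)

    def forward(self, obs_seq):
        # obs_seq: (batch, seq_len, obs_dim)
        _, h_n = self.gru(obs_seq)        # h_n: (1, batch, hidden_dim)
        return self.head(h_n.squeeze(0))  # (batch, feat_dim)

obs_dim = 16
target = RecurrentFeatureNet(obs_dim)     # fixed, randomly initialized network
predictor = RecurrentFeatureNet(obs_dim)  # trained to imitate the target
for p in target.parameters():
    p.requires_grad_(False)
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(obs_seq):
    """Prediction error on a sequence: large for novel sequences."""
    with torch.no_grad():
        target_feat = target(obs_seq)
    pred_feat = predictor(obs_seq)
    error = (pred_feat - target_feat).pow(2).mean(dim=-1)  # per-sequence MSE
    # Train the predictor on the same sequences, so rewards for
    # frequently seen sequences shrink over time.
    optimizer.zero_grad()
    error.mean().backward()
    optimizer.step()
    return error.detach()

# Example: intrinsic rewards for a batch of 4 random 10-step sequences
print(intrinsic_reward(torch.randn(4, 10, obs_dim)))

Because novelty is measured over sequences rather than single states, revisiting the same state in a new order can still yield a bonus, which is the behavior the abstract motivates. |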
Bibliographic record ID |

Source identifier type |
NCID |

Source identifier |
AN10505667 |
Bibliographic information |
IPSJ SIG Technical Report: Mathematical Modeling and Problem Solving (MPS)
Vol. 2021-MPS-132,
No. 15,
pp. 1-6,
Issued 2021-02-22
|
ISSN |
|
|
Source identifier type |
ISSN |
|
Source identifier |
2188-8833 |
Notice |
|
|
|
SIG Technical Reports are non-refereed and may therefore later appear in journals, conferences, symposia, etc. |
Publisher |
Information Processing Society of Japan (情報処理学会) |