Exploiting Reset Functions for Efficient Policy Learning on Simulators (リセット機能を活用したシミュレータにおける効率的な方策学習)

Taisei Hashimoto (橋本 大世); Yoshimasa Tsuruoka (鶴岡 慶雅)

https://ipsj.ixsq.nii.ac.jp/records/213449

Name / File	License
IPSJ-GPWS2021028.pdf (1.8 MB)	Copyright (c) 2021 by the Information Processing Society of Japan
Open access

Item type

Symposium(1)

Publication date

2021-11-06

Title

リセット機能を活用したシミュレータにおける効率的な方策学習

Title (English)

Exploiting Reset Functions for Efficient Policy Learning on Simulators

Language

jpn

Keywords (subject scheme: Other)

Reinforcement learning (強化学習)
Sample efficiency (サンプル効率)
Reset function (リセット機能)

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

Author affiliation

Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo (listed for each of the two authors)

Authors

Taisei Hashimoto (橋本 大世)
Yoshimasa Tsuruoka (鶴岡 慶雅)

Abstract

Description type

Other

Description

In reinforcement learning, policies are commonly trained on a simulator, because an agent can collect data faster and more safely there than in a real environment. Because reinforcement learning proceeds by trial and error, it generally requires a large amount of data, and training often takes a long time even on a simulator. Improving the sample efficiency of policy learning on simulators is therefore crucial for practical applications of reinforcement learning. Although a large body of work aims to improve sample efficiency, few studies exploit the characteristics specific to learning on a simulator. In this study, we investigate an approach that makes policy learning more efficient by utilizing the reset function of a simulator. Specifically, we improve learning efficiency by quickly finding trajectories with high cumulative rewards. To this end, we propose a criterion for selecting the states to reset to and a method for avoiding unnecessary data collection. In experiments, we quantitatively verified the effectiveness of the proposed method on CartPole, a classical task, and on the video game tasks Pong and Boxing. We also conducted a qualitative analysis of the behavior of the proposed method.
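
The abstract does not state the actual reset-state criterion or how unnecessary data collection is avoided, so the following is only an illustrative sketch of the general idea, not the authors' method. Everything in it is an assumption made for the example: the toy `ChainSim` simulator, its `reset(state)` signature, the greedy "highest return so far" selection rule, and the use of short rollouts to limit the samples spent per reset.

```python
import random


class ChainSim:
    """Toy deterministic simulator: a walk on positions 0..goal."""

    def __init__(self, goal=10, horizon=100):
        self.goal = goal
        self.horizon = horizon
        self.pos = 0
        self.t = 0

    def get_state(self):
        return (self.pos, self.t)

    def reset(self, state=None):
        # Hypothetical reset function: restore a saved state, or start fresh.
        self.pos, self.t = state if state is not None else (0, 0)
        return self.get_state()

    def step(self, action):
        # action is -1 or +1; the reward is the change in position.
        new_pos = max(0, min(self.goal, self.pos + action))
        reward = float(new_pos - self.pos)
        self.pos = new_pos
        self.t += 1
        done = self.pos == self.goal or self.t >= self.horizon
        return self.get_state(), reward, done


def search_with_resets(sim, iterations=200, rollout_len=5, seed=0):
    """Random exploration that restarts from the most promising saved state."""
    rng = random.Random(seed)
    archive = {sim.reset(): 0.0}  # saved state -> return collected on the way there
    best_return = 0.0
    for _ in range(iterations):
        # Assumed criterion: highest return so far, ties broken by earliest time step.
        start = max(archive, key=lambda s: (archive[s], -s[1]))
        sim.reset(start)
        ret = archive[start]
        for _ in range(rollout_len):  # short rollouts cap the samples spent per reset
            state, reward, done = sim.step(rng.choice([-1, 1]))
            ret += reward
            best_return = max(best_return, ret)
            if done:
                break
            archive[state] = max(archive.get(state, ret), ret)
    return best_return
```

Calling `search_with_resets(ChainSim())` returns the best cumulative reward found. In this toy setting, restarting from saved high-return states lets purely random actions reach the goal, which plain rollouts from the initial state would only rarely do within the same budget. A real setup would replace `ChainSim` with a simulator that can save and restore its internal state.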

Bibliographic information

Proceedings of the Game Programming Workshop 2021 (ゲームプログラミングワークショップ2021論文集)

Vol. 2021, pp. 152-159, issued 2021-11-06

Publisher

Information Processing Society of Japan (情報処理学会)

Versions

Ver.1

2025-01-19 17:09:28.179061
