Memory-Efficient Reinforcement Learning with Priority based on Surprise and On-policyness

Ryosuke Unno (海野 良介); Yoshimasa Tsuruoka (鶴岡 慶雅)

https://ipsj.ixsq.nii.ac.jp/records/222020

Name / File	License	Action
IPSJ-GPWS2022037.pdf (2.6 MB)	Copyright (c) 2022 by the Information Processing Society of Japan
Open access

Item type

Symposium(1)

Publication date

2022-11-04

Title

Surprise とOn-policyness に基づく優先度による省メモリな強化学習

Title (English)

Memory-Efficient Reinforcement Learning with Priority based on Surprise and On-policyness

Language

jpn

Keywords

Subject scheme

Other

Subjects

Reinforcement learning (強化学習)
Replay buffer (リプレイバッファ)
Atari

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

Author affiliation

Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo

Authors

Ryosuke Unno (海野 良介)
Yoshimasa Tsuruoka (鶴岡 慶雅)

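Abstract (translated from Japanese)

Description type

Other

Description

In off-policy reinforcement learning, an agent collects transition data from the environment and stores it in a replay buffer for use in parameter updates. When environment observations are given as images, storing these transitions consumes a large amount of memory. This memory consumption is a particular problem when reinforcement learning methods are applied in settings with limited computational resources. In this work, we propose a method that computes a learning priority for each transition and discards transitions judged to be relatively unimportant first, thereby reducing the memory consumed by the buffer. The priority is evaluated using the surprise and the on-policyness of each transition. Surprise represents how novel the data is to the model and quantifies the information that can be obtained from it; on-policyness represents whether the data corresponds to the current policy of the model. We show experimentally that, in environments with image observations, the method substantially reduces the memory consumed by the replay buffer without degrading performance.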

Abstract (English)

Description type

Other

Description

In off-policy reinforcement learning, an agent collects transition data (also known as experience tuples) from the environment and stores them in a replay buffer for subsequent parameter updates. Storing these tuples consumes a large amount of memory when the environment observations are images. This memory consumption is especially problematic when reinforcement learning methods are applied in settings with limited computational resources. In this paper, we introduce a method that estimates the importance of each experience tuple with a simple metric and prunes the relatively unimportant ones, reducing the overall memory consumed by the buffer. To measure the importance of experiences, we use surprise and on-policyness. Surprise quantifies the information the model can gain from an experience, and on-policyness ensures that the experience is relevant to the current policy. In our experiments, we show empirically that our method can significantly reduce the memory consumed by the replay buffer without degrading performance in vision-based environments.
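
The pruning idea in the abstract can be illustrated with a minimal sketch. This record does not give the paper's formulas, so everything specific here is an assumption made for illustration: the `PruningReplayBuffer` class, the `priority` function, the transition layout, and in particular the rule that combines the two scores as a product. In practice surprise might be approximated by something like an absolute TD error, and on-policyness by the probability the current policy assigns to the stored action, but those are also assumptions rather than the authors' definitions.

```python
import random
from typing import Any, Callable, List, Tuple

# Hypothetical transition layout: (obs, action, reward, next_obs, done).
Transition = Tuple[Any, int, float, Any, bool]


def priority(surprise: float, on_policyness: float) -> float:
    """Assumed combination rule (not taken from the paper): a transition is
    worth keeping only if it is both surprising and still relevant to the
    current policy, so the two scores are multiplied."""
    return surprise * on_policyness


class PruningReplayBuffer:
    """Fixed-capacity buffer that evicts the lowest-priority transition
    instead of the oldest one, as a plain FIFO buffer would."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items: List[Transition] = []
        self.priorities: List[float] = []

    def add(self, transition: Transition, prio: float) -> None:
        """Store a transition; if over capacity, drop the least important one
        (which may be the transition that was just added)."""
        self.items.append(transition)
        self.priorities.append(prio)
        if len(self.items) > self.capacity:
            self._evict_lowest()

    def rescore(self, score_fn: Callable[[Transition], float]) -> None:
        """Recompute all priorities, e.g. after the policy has been updated."""
        self.priorities = [score_fn(t) for t in self.items]

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        """Uniformly sample a batch without replacement."""
        return rng.sample(self.items, min(batch_size, len(self.items)))

    def _evict_lowest(self) -> None:
        # Linear search for the minimum priority, then swap-remove.
        idx = min(range(len(self.priorities)), key=self.priorities.__getitem__)
        self.items[idx] = self.items[-1]
        self.priorities[idx] = self.priorities[-1]
        self.items.pop()
        self.priorities.pop()


buf = PruningReplayBuffer(capacity=2)
buf.add(("s0", 0, 0.0, "s1", False), priority(surprise=0.9, on_policyness=0.8))
buf.add(("s1", 1, 0.0, "s2", False), priority(surprise=0.1, on_policyness=0.5))
buf.add(("s2", 0, 1.0, "s3", True), priority(surprise=0.7, on_policyness=0.9))
# The middle transition has the lowest priority and is the one discarded.
```

The sketch keeps eviction deliberately simple with a linear scan; a heap or similar structure would be a natural choice for large buffers, at the cost of more bookkeeping when priorities are rescored.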

Bibliographic information

Proceedings of the Game Programming Workshop 2022 (ゲームプログラミングワークショップ2022論文集)

Vol. 2022, pp. 235-242, issue date 2022-11-04

Publisher

Information Processing Society of Japan (情報処理学会)
