Unsupervised Reinforcement Learning for Partially Observable Environments Using External Memories

Mitsuhiko Nakamoto (中本 光彦); Yoshimasa Tsuruoka (鶴岡 慶雅)


https://ipsj.ixsq.nii.ac.jp/records/213450

Files

IPSJ-GPWS2021029.pdf (2.5 MB), license: Copyright (c) 2021 by the Information Processing Society of Japan, open access

Item type

Symposium(1)

Publication date

2021-11-06

Title

Unsupervised Reinforcement Learning for Partially Observable Environments Using External Memories

Language

jpn

Keywords

Deep reinforcement learning; partially observable environments; unsupervised reinforcement learning; external memory

Subject scheme

Other

Resource type

conference paper (http://purl.org/coar/resource_type/c_5794)

Author affiliations

Department of Information and Communication Engineering, Faculty of Engineering, The University of Tokyo
Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo

Authors

Mitsuhiko Nakamoto (中本 光彦)
Yoshimasa Tsuruoka (鶴岡 慶雅)

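Abstract (Japanese, translated)

Applying deep reinforcement learning in partially observable environments is difficult, and designing an appropriate reward function for complex tasks is also known to be hard. To address these problems, this work proposes an unsupervised reinforcement learning algorithm for partially observable environments. To cope with partial observability, the agent is given an external memory mechanism, and instead of using an external reward, the work proposes an intrinsic reward based on mutual information. The proposed intrinsic reward lets the agent learn a useful memory while preferentially exploring regions of the state space where its observations are highly limited. In the experiments, a HalfCheetah agent with only limited observations learned to run forward and backward without using any external reward.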

Abstract (English)

Deep reinforcement learning (RL) is difficult when the environment is partially observable and has no reward function. In this paper, we propose an unsupervised RL algorithm to tackle these problems. We provide the agent with external memory to deal with partial observability, and propose a novel mutual-information-based intrinsic reward for unsupervised exploration. The proposed intrinsic reward encourages the agent to explore parts of the state space where observability is severely limited while at the same time obtaining an informative memory. In the experiments, our algorithm enables a HalfCheetah agent to run forward and backward with limited observations and without receiving any external rewards.
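The record does not state the form of the proposed reward. For orientation only, here is a minimal sketch of a widely used mutual-information-based intrinsic reward from unsupervised skill discovery (the DIAYN-style bound r = log q(z | x) − log p(z)), assuming a discrete skill z with a uniform prior over K skills and a discriminator q that sees an input x. Treating x as the current observation plus an external-memory readout, and the function names, are illustrative assumptions; this is not the reward proposed in the paper.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over discriminator logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mi_intrinsic_reward(disc_logits: Sequence[float], skill: int) -> float:
    """DIAYN-style reward r = log q(z | x) - log p(z), uniform prior p(z) = 1/K.

    disc_logits: hypothetical discriminator outputs over K skills, assumed to be
        computed from the current observation and an external-memory readout.
    skill: index of the skill z the agent is currently executing.
    """
    k = len(disc_logits)
    if not 0 <= skill < k:
        raise ValueError("skill index out of range")
    log_q = log_softmax(disc_logits)[skill]  # log q(z | x)
    log_p = -math.log(k)  # log p(z) under a uniform prior
    return log_q - log_p


if __name__ == "__main__":
    print(mi_intrinsic_reward([0.0, 0.0], 0))     # uninformative discriminator
    print(mi_intrinsic_reward([10.0, -10.0], 0))  # confident and correct
    print(mi_intrinsic_reward([10.0, -10.0], 1))  # confident but wrong
```

With equal logits the reward is 0; as the discriminator grows confident in the executed skill it approaches log K, so under this assumed formulation the agent is rewarded for reaching observation and memory combinations that identify which skill it is running.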

Bibliographic information

Proceedings of the Game Programming Workshop 2021, vol. 2021, pp. 160-165, issued 2021-11-06

Publisher

Information Processing Society of Japan
