Update Reward Function based on Accumulated Data

中澤, 耕平; 中里, 研一

https://ipsj.ixsq.nii.ac.jp/records/214957

Name / File	License	Action
IPSJ-Z83-6P-03.pdf (391.1 kB)	Copyright (c) 2021 by the Information Processing Society of Japan

Item type

National Convention(1)

Publication date

2021-03-04

Title

Update Reward Function based on Accumulated Data

Language

eng

Keywords

Subject scheme

Other

Subject

Artificial Intelligence and Cognitive Science

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

Author affiliation

University of Southampton

Author affiliation

BOSCH

Author name

中澤, 耕平
中里, 研一

Abstract

Description type

Other

Description

Efficient reward functions can shorten the training time for reinforcement learning, but they can also restrict exploration of the solution space. Here, we consider a sub-reward function using the Curling problem, in which the aim is to stop a stone launched at a constant velocity by exerting various opposing forces. In our procedure, the accumulated data (position and velocity) are classified into groups according to the final reward, and the sub-reward is updated based on these groups. Consequently, the optimised reward function successfully executes quick commands without the programmer intentionally limiting the agent's exploration of the solution space. Finally, we discuss practical situations for our method.
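
The abstract does not spell out how the grouping or the update is done, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes two groups split by a hypothetical `threshold` on the final reward, a hypothetical grid discretisation of (position, velocity), and a simple visit-count score as the sub-reward. All function names and bin sizes are illustrative.

```python
from collections import defaultdict


def bin_state(position, velocity, pos_step=0.5, vel_step=0.25):
    """Map a (position, velocity) sample to a grid cell (bin sizes are assumptions)."""
    return (int(position // pos_step), int(velocity // vel_step))


def update_sub_reward(episodes, threshold):
    """Build a sub-reward table from accumulated episodes.

    episodes: list of (trajectory, final_reward) pairs, where trajectory is a
    list of (position, velocity) samples. Episodes are split into two groups
    by final reward. Each cell scores +1 per visit from the high-reward group
    and -1 per visit from the low-reward group, normalised by total visits.
    """
    score = defaultdict(float)
    visits = defaultdict(int)
    for trajectory, final_reward in episodes:
        sign = 1.0 if final_reward >= threshold else -1.0
        for position, velocity in trajectory:
            cell = bin_state(position, velocity)
            score[cell] += sign
            visits[cell] += 1
    return {cell: score[cell] / visits[cell] for cell in visits}


def sub_reward(table, position, velocity):
    """Look up the sub-reward for a state; cells never visited return 0."""
    return table.get(bin_state(position, velocity), 0.0)


# Two toy episodes: one ending with a high final reward, one with a low one.
episodes = [
    ([(0.1, 1.0), (0.6, 0.5)], 1.0),
    ([(0.1, 1.0), (1.2, 1.0)], 0.0),
]
table = update_sub_reward(episodes, threshold=0.5)
```

In this sketch, states seen only in high-reward episodes receive a positive sub-reward, states seen only in low-reward episodes receive a negative one, and unvisited states stay neutral, so nothing is ruled out in advance by hand.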

Bibliographic record ID

Source identifier type

NCID

Source identifier

AN00349328

Bibliographic information

Proceedings of the 83rd National Convention

Vol. 2021, No. 1, pp. 309-310, issue date 2021-03-04

Publisher

Language

Publisher

Information Processing Society of Japan
