Shogi816K: A Diverse Collection of Starting Positions for Reinforcement Learning in Shogi (将棋81万：強化学習のための多様性を持った将棋初期局面集)

Yosuke Demura (出村 洋介); Tomoyuki Kaneko (金子 知適)

https://ipsj.ixsq.nii.ac.jp/records/229354

Name / File	License	Action
IPSJ-GPWS2023021.pdf (429.9 kB)	Copyright (c) 2023 by the Information Processing Society of Japan
Open access

Item type

Symposium(1)

Publication date

2023-11-10

Title

将棋81万：強化学習のための多様性を持った将棋初期局面集

Title (English)

Shogi816K: A Diverse Collection of Starting Positions for Reinforcement Learning in Shogi

Language

jpn

Keyword

Subject scheme

Other

Subject

Board games

Keyword

Subject scheme

Other

Subject

Shogi

Keyword

Subject scheme

Other

Subject

Chess 960

Keyword

Subject scheme

Other

Subject

Reinforcement learning

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

Author affiliation

Graduate School of Arts and Sciences, The University of Tokyo (東京大学大学院総合文化研究科)

Author affiliation

Graduate School of Arts and Sciences, The University of Tokyo (東京大学大学院総合文化研究科)

Author name

Yosuke Demura (出村 洋介)
Tomoyuki Kaneko (金子 知適)

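Abstract (translated from the Japanese)

Description type

Other

Description

Diversity and unbiasedness of experience improve the performance and robustness of reinforcement learning agents, but achieving them without a large computational cost can be difficult. In many chess-like games and in Othello, the initial state (such as the initial piece placement) is fixed and unique, so agents trained with AlphaZero-style reinforcement learning tend to experience similar episodes and game records. To address this problem, this paper proposes Shogi816K, an extension of the shogi starting position, and experimentally evaluates its effectiveness in shogi. Like Chess 960 [1], Shogi816K is a collection of shogi starting positions created by randomly shuffling the initial piece placement under certain constraints. We trained various agents with Gumbel AlphaZero on ten million self-play games and showed that agents first pre-trained on Shogi816K and then adapted to standard shogi achieve better average performance and robustness across the various opening strategies seen in human play than agents trained on standard shogi alone.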

Abstract (English)

Description type

Other

Description

Diversity and unbiasedness of experience improve the performance and robustness of reinforcement learning agents, but they are sometimes difficult to achieve without incurring significant cost. Many chess variants and Othello are typical domains in which agents experience similar episodes (or game records) under AlphaZero-style reinforcement learning, because a single fixed opening state restricts the legal moves. In this paper, we address the problem by carefully augmenting the opening positions: we propose Shogi816K and empirically evaluate its effectiveness in shogi, a Japanese chess variant. As in Chess 960, also known as Fischer Random Chess [1], Shogi816K randomizes the pieces in the opening position under reasonable restrictions. We trained various agents with Gumbel AlphaZero on ten million game records and showed that agents first pre-trained on Shogi816K and later adapted to standard shogi achieved better average performance and robustness across the various opening variations seen in human play than agents trained only on standard shogi.
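The abstract compares the construction to Chess 960, so the general idea of "shuffle the starting pieces under restrictions" can be illustrated with that game. The sketch below enumerates Chess 960 back ranks using its two well-known rules (bishops on opposite-colored squares, king between the rooks). It is only an analogy: this record does not state the Shogi816K constraints, and the function names here are hypothetical, not taken from the paper.

```python
import random
from itertools import permutations


def is_valid_chess960(rank):
    """Check the two Chess 960 restrictions on a back rank of 8 piece letters."""
    bishops = [i for i, p in enumerate(rank) if p == "B"]
    rooks = [i for i, p in enumerate(rank) if p == "R"]
    king = rank.index("K")
    # Bishops must stand on squares of opposite color (different file parity).
    opposite_colors = bishops[0] % 2 != bishops[1] % 2
    # The king must stand somewhere between the two rooks.
    king_between_rooks = rooks[0] < king < rooks[1]
    return opposite_colors and king_between_rooks


def all_chess960_back_ranks():
    """Enumerate every distinct back rank that satisfies the restrictions."""
    unique = set(permutations("RNBQKBNR"))  # set() removes duplicate orderings
    return sorted("".join(r) for r in unique if is_valid_chess960(r))


def sample_start(rng):
    """Draw one starting back rank uniformly at random (illustrative helper)."""
    return rng.choice(all_chess960_back_ranks())


if __name__ == "__main__":
    ranks = all_chess960_back_ranks()
    print(len(ranks))                      # 960 positions, hence the name
    print(sample_start(random.Random(0)))  # one randomized starting rank
```

In a self-play setting, a helper like `sample_start` would be called once per game so that each episode begins from a different position; the standard arrangement `RNBQKBNR` remains one of the 960 candidates.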

Bibliographic information

Proceedings of the Game Programming Workshop 2023 (ゲームプログラミングワークショップ2023論文集)

Vol. 2023, pp. 111-118, issued 2023-11-10

Publisher

Information Processing Society of Japan (情報処理学会)
