A Survey on AlphaZero Algorithm through Dobutsu Shogi (どうぶつしょうぎを用いたAlphaZeroの手法の調査)

Taichi Nakayashiki (中屋敷 太一); Tomoyuki Kaneko (金子 知適)

https://ipsj.ixsq.nii.ac.jp/records/199977

Name / File	License	Action
IPSJ-GPWS2019014.pdf (1.2 MB)	Copyright (c) 2019 by the Information Processing Society of Japan
Open access

Item type

Symposium(1)

Publication date

2019-11-01

Title

A Survey on AlphaZero Algorithm through Dobutsu Shogi (original Japanese title: どうぶつしょうぎを用いたAlphaZeroの手法の調査)

Language

jpn

Resource type

conference paper

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Author affiliations

Graduate School of Arts and Sciences, The University of Tokyo
Interfaculty Initiative in Information Studies, The University of Tokyo

Authors

Taichi Nakayashiki (中屋敷 太一)
Tomoyuki Kaneko (金子 知適)

Abstract

Description type

Other

Description

AlphaZero showed that a single algorithm can produce strong players for each of three games: shogi (Japanese chess), chess, and Go. However, it is difficult to analyze theoretically how much training yields how much playing strength, so a player's strength can only be measured experimentally. In this paper, we use Dobutsu Shogi, a game that has already been completely solved, to measure how accurately neural networks trained with the AlphaZero method judge positions, by comparing them with the solved results. We also run the experiments with neural networks of different sizes to measure the effect of network size. In addition, we perform supervised learning on the solved results to compare performance as a function of network size alone. Finally, we briefly investigate how the behavior of Monte-Carlo Tree Search, the search algorithm AlphaZero uses to decide its moves, differs depending on its hyperparameter. The experiments show that larger neural networks perform better under supervised learning, whereas this is not necessarily the case when they are used with the AlphaZero method. They also show that the search behavior changes substantially depending on the Monte-Carlo Tree Search hyperparameter.

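Bibliographic information

Proceedings of the Game Programming Workshop 2019 (ゲームプログラミングワークショップ2019論文集)

Vol. 2019, pp. 86-93, issued 2019-11-01

Publisher

Information Processing Society of Japan (情報処理学会)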

Versions

Ver.1

2025-01-19 21:29:19.753739

Cite as

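Taichi Nakayashiki, Tomoyuki Kaneko, 2019: A Survey on AlphaZero Algorithm through Dobutsu Shogi. Information Processing Society of Japan, pp. 86–93.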
