Building and evaluating neural networks for policy functions of mahjong (麻雀のポリシー関数に適したネットワークモデルの構築と評価)

清水, 大志; 田中, 哲朗; Taishi, Shimizu; Tetsuro, Tanaka

https://ipsj.ixsq.nii.ac.jp/records/199990

Name / File	License
IPSJ-GPWS2019027.pdf (1.1 MB)	Copyright (c) 2019 by the Information Processing Society of Japan
Open access

Item type

Symposium(1)

Publication date

2019-11-01

Title

麻雀のポリシー関数に適したネットワークモデルの構築と評価

Title (English)

Building and evaluating neural networks for policy functions of mahjong

Language

jpn

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

Author affiliation

東京大学大学院総合文化研究科

Author affiliation

東京大学情報基盤センター

Author affiliation (English)

Graduate School of Arts and Sciences, The University of Tokyo

Author affiliation (English)

Information Technology Center, The University of Tokyo

Author name

清水, 大志
田中, 哲朗

Author name (English)

Taishi, Shimizu
Tetsuro, Tanaka

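Abstract

Description type

Other

Description

AlphaGo Zero achieved strength far exceeding that of top human Go players using only reinforcement learning through self-play, with almost no game-specific features in its input. Following this success, attempts have been made in other games to build strong players by training, through reinforcement learning, neural networks whose inputs use as few game-specific features as possible. Self-play reinforcement learning requires experiments on a large number of machines. In this study, we propose a method in which, before applying reinforcement learning to a game, a small game that shares the properties of that game is learned by supervised learning in order to find a suitable network model. Because supervised learning on a small game finishes quickly, an automatic hyperparameter optimization tool can be used to select a suitable model from among various network models. We evaluated neural network models by supervised learning on a mini-game that retains the characteristics of mahjong as a game while allowing the theoretically best move to be computed. The proposed model achieved a higher accuracy than the models of previous studies. We applied reinforcement learning to the highly rated model, but the resulting accuracy was low.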

Abstract (English)

Description type

Other

Description

With the success of AlphaGo Zero, which achieved strength far exceeding that of the top Go players using only reinforcement learning by self-play with almost no game-specific input features, many people have been attempting to create strong players for other games by training neural networks that use as few game-specific input features as possible. Reinforcement learning with self-play requires experiments using a large amount of computation. We propose a method for finding a suitable network model by training, with supervised learning, on a small game that has the characteristics of the target game before using reinforcement learning. Since supervised learning for a small game can be completed in a short time, we can select a suitable model from various network models using automatic hyperparameter optimization tools. In this study, we evaluated neural network models by supervised learning on mini-games that retain the characteristics of mahjong while allowing the theoretically best move to be computed. We call this variation of the game mini-mahjong. The proposed model achieves higher accuracy than previously proposed models. We applied reinforcement learning to this highly rated model, but the accuracy was low.

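Bibliographic information

ゲームプログラミングワークショップ2019論文集 (Game Programming Workshop 2019 Proceedings)

Vol. 2019, p. 165-171, issue date 2019-11-01

Publisher

情報処理学会 (Information Processing Society of Japan)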

Versions

Ver.1

2025-01-19 21:29:05.229737