Building mahjong player using deep reinforcement learning

清水 大志 (Taishi Shimizu); 田中 哲朗 (Tetsuro Tanaka)

https://ipsj.ixsq.nii.ac.jp/records/207668

Name / File	License
IPSJ-GPWS2020024.pdf (9.1 MB)	Copyright (c) 2020 by the Information Processing Society of Japan
Open access

Item type

Symposium(1)

Publication date

2020-11-06

Title

深層強化学習を用いた麻雀プレイヤの構築

Title (English)

Building mahjong player using deep reinforcement learning

Language

jpn

Keywords (subject scheme: Other)

Mahjong
Reinforcement learning
Suzume-Jong

Resource type

conference paper

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Author affiliations

Graduate School of Arts and Sciences, The University of Tokyo
Information Technology Center, The University of Tokyo

Authors

清水 大志 (Taishi Shimizu)
田中 哲朗 (Tetsuro Tanaka)

Abstract

This research aims to build a computer mahjong player that surpasses human strength while using as little human knowledge as possible. As a first step, we explore ways to make reinforcement learning more efficient using Suzume-Jong, a simplified form of mahjong. Suzume-Jong reduces the hand size and the number of tile types used in ordinary mahjong and simplifies the rules. Reinforcement learning for a multiplayer game requires the other players to be supplied as part of the environment, as in single-agent reinforcement learning. As opponents, we used a single-player Suzume-Jong player that considers only its own hand and discards the tile with the highest expected sum of discounted rewards. Training against these opponents produced a player whose strength approaches theirs. We also tried using the value predicted by Global Reward Prediction, proposed in Super Phoenix, as the reward, aiming to minimize the average rank after all rounds rather than to maximize the score of each round. This has not yet improved the average rank.
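The opponent described in the abstract picks the discard with the highest expected sum of discounted rewards. The record does not give the authors' implementation, so the following is only a minimal sketch of that selection rule under stated assumptions: the tile names, the `rollouts` structure (sampled reward sequences per candidate discard), and the discount factor `gamma` are all hypothetical.

```python
from typing import Dict, List, Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over one reward sequence."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


def best_discard(rollouts: Dict[str, List[Sequence[float]]],
                 gamma: float = 0.9) -> str:
    """Return the candidate tile with the highest mean discounted return.

    `rollouts` maps each candidate discard (hypothetical tile names) to
    sampled reward sequences observed after discarding that tile.
    """
    def expected(tile: str) -> float:
        samples = rollouts[tile]
        return sum(discounted_return(s, gamma) for s in samples) / len(samples)

    return max(rollouts, key=expected)


# Hypothetical example: a late large reward versus an early small one.
example = {"1s": [[0.0, 0.0, 10.0]], "2s": [[0.0, 5.0]]}
choice = best_discard(example, gamma=0.9)
```

With a discount factor close to 1 the later, larger reward dominates; with a small discount factor the earlier reward is preferred, which is why the choice of `gamma` matters for this kind of player.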

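Bibliographic information

Game Programming Workshop 2020 Proceedings (ゲームプログラミングワークショップ2020論文集)

Vol. 2020, pp. 147-154, issued 2020-11-06

Publisher

Information Processing Society of Japan (情報処理学会)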
