Applying Policy Gradient method and Neural Fitted Q Iteration for Hanafuda Koi-Koi game player

Naoyuki Sato (佐藤 直之); Kokolo Ikeda (池田 心)

https://ipsj.ixsq.nii.ac.jp/records/183837

Name / File	License
IPSJ-GPWS2017010.pdf (4.7 MB)	Copyright (c) 2017 by the Information Processing Society of Japan
Open access

Item type

Symposium(1)

Publication date

2017-11-03

Title

花札のこいこいにおける方策勾配法とNeural Fitted Q Iterationの適用

Title (English)

Applying Policy Gradient method and Neural Fitted Q Iteration for Hanafuda Koi-Koi game player

Language

jpn

Keywords

Strong AI
Imperfect information game
Hanafuda
Reinforcement learning
Policy gradient method
Deep Q network

Subject scheme

Other

Resource type

conference paper

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Authors

Naoyuki Sato (佐藤 直之), Japan Advanced Institute of Science and Technology (北陸先端科学技術大学院大学)
Kokolo Ikeda (池田 心), Japan Advanced Institute of Science and Technology (北陸先端科学技術大学院大学)

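Abstract

Description type

Other

Description

Koi-Koi, a game played with Hanafuda cards, is a two-player, alternating-move, zero-sum game with imperfect information. It is played by many people across various media, yet it has been studied little, and we know of no report of an artificial player that rivals strong human players. We therefore tried to implement a strong Koi-Koi player using two reinforcement learning methods: the policy gradient method and Neural Fitted Q Iteration. Each method used an artificial neural network, with 268 low-level board features as input, to estimate state-action values, and learned suitable parameters through repeated games against a simple rule-based artificial player. As a result, the average score taken from the opponent was -0.3 points for the policy gradient player and 0.5 points for the Neural Fitted Q Iteration player.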

Abstract (English)

Description type

Other

Description

Koi-Koi, played with Hanafuda playing cards, is a traditional Japanese card game classified as a two-player, turn-based, zero-sum game with imperfect information. Few research articles focus on this game even though it is popular in Japan. We therefore tried to build a strong Koi-Koi player by applying two reinforcement learning methods: the policy gradient method and neural fitted Q iteration. Each player played against an artificial opponent that we constructed, which makes its decisions in a simple rule-based manner. Over 1,000 games, the policy gradient player gained -0.3 points per game and the neural fitted Q iteration player gained 0.5 points per game on average.
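
The abstract names neural fitted Q iteration but this record gives no algorithmic detail. As a rough illustration only, the sketch below shows one iteration of the general fitted Q scheme on a fixed batch of recorded transitions: compute bootstrapped targets with the current approximator, then refit the approximator to those targets by supervised regression. This is not the authors' implementation. The function names, the number of actions, the discount factor, the learning rate, and the use of a linear model in place of the paper's neural network are all assumptions made for illustration; only the input size of 268 features comes from the abstract.

```python
N_FEATURES = 268      # input size stated in the abstract
N_ACTIONS = 4         # hypothetical; the record does not give the action count
GAMMA = 1.0           # hypothetical discount factor
LEARNING_RATE = 0.01  # hypothetical step size


def q_value(weights, state, action):
    """Linear stand-in for the paper's neural network: Q(s, a) = w_a . s."""
    return sum(w * x for w, x in zip(weights[action], state))


def fitted_q_targets(weights, batch):
    """Bootstrapped targets r + gamma * max_a' Q(s', a') for a fixed batch."""
    targets = []
    for state, action, reward, next_state, terminal in batch:
        # No bootstrapping from the next state once the game has ended.
        bootstrap = 0.0 if terminal else max(
            q_value(weights, next_state, a) for a in range(N_ACTIONS))
        targets.append(reward + GAMMA * bootstrap)
    return targets


def fit(weights, batch, targets, epochs=50):
    """Regress Q(s, a) onto the frozen targets with plain SGD (updates weights in place)."""
    for _ in range(epochs):
        for (state, action, _, _, _), target in zip(batch, targets):
            error = q_value(weights, state, action) - target
            row = weights[action]
            for i, x in enumerate(state):
                row[i] -= LEARNING_RATE * error * x
    return weights


def fitted_q_iteration(weights, batch):
    """One outer iteration: freeze the targets, then refit the approximator."""
    targets = fitted_q_targets(weights, batch)
    return fit(weights, batch, targets)
```

In this sketch each transition is a tuple `(state, action, reward, next_state, terminal)` with states given as lists of 268 numbers. The targets are computed once per outer iteration and held fixed during the regression, which is what distinguishes the fitted, batch-style scheme from updating the value estimate after every single move.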

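Bibliographic information

Proceedings of the Game Programming Workshop 2017 (ゲームプログラミングワークショップ2017論文集)

Vol. 2017, pp. 64-71, issued 2017-11-03

Publisher

Information Processing Society of Japan (情報処理学会)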
