Title: An Experimental Evaluation of PUCT Algorithm with Convolutional Neural Network Evaluation Functions
Authors: Lucien Troillet, Kiminori Matsuzaki (Kochi University of Technology)
Type: Technical Report (English)
Permalink: http://id.nii.ac.jp/1001/00204864/
Full text: https://ipsj.ixsq.nii.ac.jp/ej/?action=repository_action_common_download&item_id=204960&item_no=1&attribute_id=1&file_no=1
Copyright (c) 2020 by the Information Processing Society of Japan

Abstract: One of the most successful Monte-Carlo tree search (MCTS) applications is AlphaGo and its successors, in which neural network evaluation functions are combined with a variant of MCTS called PUCT (Predictor + UCB applied to trees). However, further investigation is still required into how various factors (e.g., evaluation functions, reinforcement learning, the number of simulations) affect its performance. Toward this goal, our previous work examined the impact of pattern-based (linear) evaluation functions on PUCT outcomes. In this paper, we analyze the PUCT algorithm with evaluation functions based on (non-linear) neural networks. We developed several convolutional neural networks of varying quality that attempt to replicate the evaluation values of Zebra (an open-source champion-level player). Through experiments feeding these networks to the PUCT algorithm, we find that PUCT still consistently performs better than the evaluation function alone. Its behavior also appears to be independent of whether the evaluation function is linear or non-linear.

Published in: IPSJ SIG Technical Report, Game Informatics (GI), Vol. 2020-GI-44, No. 6, pp. 1-8, ISSN 2188-8736 (NCID AA11362144), issued 2020-06-18.
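For context on the search rule the abstract refers to: PUCT scores each child of a tree node by its mean value plus an exploration bonus weighted by the evaluation network's prior probability. The sketch below is a minimal illustration of that selection formula only; the field names, the constant `c_puct`, and the dict-based node representation are illustrative assumptions, not the paper's implementation.

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick a child by the PUCT rule: Q(a) + c_puct * P(a) * sqrt(N_parent) / (1 + N(a)).

    Each child is a dict with (illustrative) keys:
      "p" - prior probability from the evaluation network,
      "n" - visit count,
      "w" - total accumulated value from previous visits.
    """
    n_parent = sum(ch["n"] for ch in children)

    def score(ch):
        # Mean value Q; unvisited children default to 0.
        q = ch["w"] / ch["n"] if ch["n"] > 0 else 0.0
        # Exploration bonus U, scaled by the prior and the parent's visits.
        u = c_puct * ch["p"] * math.sqrt(n_parent) / (1 + ch["n"])
        return q + u

    return max(children, key=score)
```

With enough parent visits, a high-prior unvisited child outscores a moderately valued, heavily visited one, which is how the prior steers early exploration.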