http://swrc.ontoware.org/ontology#TechnicalReport
An Experimental Evaluation of PUCT Algorithm with Convolutional Neural Network Evaluation Functions
en
Kochi University of Technology
Kochi University of Technology
Lucien Troillet
Kiminori Matsuzaki
One of the most successful applications of Monte-Carlo tree search (MCTS) is AlphaGo and its successors, in which neural network evaluation functions are combined with a variant of MCTS called PUCT (Predictor + UCB applied to trees). However, how various factors (e.g., evaluation functions, reinforcement learning, and the number of simulations) affect its performance still requires further investigation. To address this question, our previous work examined the impact of pattern-based (linear) evaluation functions on PUCT outcomes. In this paper, we analyze the PUCT algorithm with evaluation functions based on (non-linear) neural networks. We developed several convolutional neural networks of varying quality that attempt to replicate the evaluation values of Zebra (an open-source, champion-level player). Through experiments feeding these networks to the PUCT algorithm, we find that PUCT still consistently performs better than the evaluation function alone. Its behavior also appears to be independent of whether a linear or a non-linear evaluation function is used.
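The abstract names PUCT but does not state its selection rule. As a rough illustration only, the sketch below uses the AlphaGo Zero form of the rule, which may differ in details from the variant evaluated in this report. At each node it selects the move maximizing Q(s, a) + c_puct · P(s, a) · sqrt(N(s)) / (1 + N(s, a)), where P(s, a) is the prior supplied by the evaluation network. The `Edge` class, `puct_select`, the move labels, and the default `c_puct = 1.5` are hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Statistics for one move a from a node s (hypothetical structure)."""
    prior: float            # P(s, a): move prior from the evaluation network
    visits: int = 0         # N(s, a): number of simulations through this move
    value_sum: float = 0.0  # W(s, a): sum of backed-up values, assumed to be
                            # from the perspective of the player to move at s

    @property
    def q(self) -> float:
        # Mean action value Q(s, a); unvisited moves default to 0 here,
        # which is only one of several conventions in use.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(edges: dict[str, Edge], c_puct: float = 1.5) -> str:
    """Return the move maximizing Q(s, a) + U(s, a) (AlphaGo Zero-style PUCT)."""
    total_visits = sum(e.visits for e in edges.values())  # N(s)
    sqrt_total = math.sqrt(total_visits)

    def score(edge: Edge) -> float:
        # Exploration bonus U(s, a): large for high-prior, rarely visited moves.
        u = c_puct * edge.prior * sqrt_total / (1 + edge.visits)
        return edge.q + u

    return max(edges, key=lambda move: score(edges[move]))


if __name__ == "__main__":
    edges = {
        "d3": Edge(prior=0.5, visits=10, value_sum=4.0),
        "c4": Edge(prior=0.3, visits=2, value_sum=1.2),
        "f5": Edge(prior=0.2),
    }
    print(puct_select(edges))               # balances value and exploration
    print(puct_select(edges, c_puct=10.0))  # stronger exploration
```

With the default constant, the rule picks "c4", whose mean value (0.6) is highest and which still has a sizeable exploration bonus; raising `c_puct` to 10 shifts the choice to the unvisited move "f5". The constant therefore controls how much the search trusts the network prior over the values it has already observed.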
AA11362144
Research Reports: Game Informatics (GI)
2020-GI-44
6
1-8
2020-06-20
2188-8736