Improving Action Branching for Deep Reinforcement Learning with A Multi-dimensional Hybrid Action Space

Laige Peng, Yoshimasa Tsuruoka (The University of Tokyo)
Conference Paper, Proceedings of the Game Programming Workshop 2019, pp. 80-85, published 2019-11-01
Permanent link: http://id.nii.ac.jp/1001/00199883/
Full text: https://ipsj.ixsq.nii.ac.jp/ej/?action=repository_action_common_download&item_id=199976&item_no=1&attribute_id=1&file_no=1
Copyright (c) 2019 by the Information Processing Society of Japan

Abstract: Recent deep reinforcement learning methods address the complexity of state spaces and have achieved great success in various video games. Deep Q-Network (DQN)-like algorithms are efficient in environments with discrete action spaces, while policy-based algorithms perform well in environments with continuous action spaces. However, it is difficult to apply these algorithms to a complicated multi-dimensional hybrid action space in which both discrete and continuous action dimensions coexist. We propose to combine the action branching architecture proposed by Tavakoli et al. [1] with the proximal policy optimization (PPO) algorithm to address this problem. Our method keeps the continuous action space intact, achieves better performance than a dueling double DQN model that discretizes the continuous action space, and shows better compatibility with human demonstration data.
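To make the core idea concrete, the sketch below shows one way a branched policy network for a hybrid action space could look when paired with PPO: a shared trunk feeds separate heads per action dimension, a categorical head for the discrete sub-action and a Gaussian head for the continuous one, and the joint log-probability used in the PPO ratio is the sum over branches. This is a minimal illustration in PyTorch under stated assumptions (layer sizes, distribution choices, and the independence of branches are our assumptions), not the authors' exact architecture.

```python
# Minimal sketch of a branched policy for a hybrid (discrete + continuous)
# action space, in the spirit of combining action branching with PPO.
# All names and sizes here are illustrative assumptions, not the paper's
# exact model.
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal


class BranchedHybridPolicy(nn.Module):
    def __init__(self, obs_dim, n_discrete, cont_dim, hidden=128):
        super().__init__()
        # Shared trunk: one state representation feeds every branch.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        # One branch per action dimension: a categorical head for the
        # discrete sub-action, a Gaussian head for the continuous one.
        self.discrete_head = nn.Linear(hidden, n_discrete)
        self.cont_mean = nn.Linear(hidden, cont_dim)
        self.cont_log_std = nn.Parameter(torch.zeros(cont_dim))
        # Value head for the PPO critic.
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, obs):
        h = self.trunk(obs)
        dist_d = Categorical(logits=self.discrete_head(h))
        dist_c = Normal(self.cont_mean(h), self.cont_log_std.exp())
        return dist_d, dist_c, self.value_head(h).squeeze(-1)

    def log_prob(self, obs, act_d, act_c):
        # Branches are treated as independent, so the joint log-probability
        # (used in the PPO importance ratio) is the sum over branches.
        dist_d, dist_c, value = self.forward(obs)
        lp = dist_d.log_prob(act_d) + dist_c.log_prob(act_c).sum(-1)
        return lp, value


# Usage: sample a hybrid action and evaluate it for a PPO update.
policy = BranchedHybridPolicy(obs_dim=8, n_discrete=4, cont_dim=2)
obs = torch.randn(16, 8)
dist_d, dist_c, _ = policy(obs)
a_d, a_c = dist_d.sample(), dist_c.sample()
log_prob, value = policy.log_prob(obs, a_d, a_c)
```

Keeping the continuous branch as a Gaussian head is what lets this design avoid the discretization that a dueling double DQN baseline would require; only the discrete dimensions go through a categorical distribution.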