| Item type |
SIG Technical Reports(1) |
| Publication date |
2019-07-22 |
| Title |
Branching Deep Q-Network Agent for Joint Replenishment Policy |
| Language |
eng |
| Resource type |
Resource type identifier |
http://purl.org/coar/resource_type/c_18gh |
Resource type |
technical report |
| Author affiliation |
Graduate School of Engineering, University of Tokyo |
| Author affiliation |
Graduate School of Engineering, University of Tokyo |
| Author affiliation |
Graduate School of Engineering, University of Tokyo |
| Author name |
Hiroshi, Suetsugu
Yoshiaki, Narusue
Hiroyuki, Morikawa
|
| Abstract |
Content description type |
Other |
Content description |
This study proposes a reinforcement learning approach to finding a near-optimal dynamic ordering policy for a multi-product inventory system with non-stationary demand. The distinguishing feature of multi-product inventory systems is the need to coordinate orders across products in order to reduce total cost. Markov decision process (MDP) formulations have been used to obtain optimal policies; however, the curse of dimensionality makes them intractable for large numbers of products. For larger numbers of products, the literature has proposed heuristic algorithms that assume stationary demand. In this study, we propose an extended Q-learning agent with function approximation, called the branching deep Q-network (DQN) with reward allocation, which is based on the branching double DQN. Our numerical experiments show that the proposed agent learns a coordinated ordering policy without any knowledge of the other products' decisions and outperforms a non-coordinated, forecast-based economic order policy. |
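As a rough illustration of the two ingredients the abstract names (a branching double DQN and reward allocation), the sketch below splits one period's cost into per-product rewards and then computes a double DQN target for each product's branch. Everything specific here is an assumption, not the report's method: the cost model (a shared major ordering cost plus per-product minor ordering, holding, and shortage costs), the rule that splits the major cost equally among the products that ordered, and the helper names `allocate_rewards` and `branch_targets` are all hypothetical.

```python
from typing import List, Sequence


def allocate_rewards(
    major_cost: float,
    minor_costs: Sequence[float],
    holding_costs: Sequence[float],
    shortage_costs: Sequence[float],
    ordered: Sequence[bool],
) -> List[float]:
    """Split one period's total cost into per-product rewards (negative costs).

    Hypothetical allocation rule: each product bears its own holding,
    shortage and minor ordering costs, and the shared major ordering cost
    is divided equally among the products that placed an order.
    """
    n_orders = sum(1 for placed in ordered if placed)
    rewards = []
    for i, placed in enumerate(ordered):
        cost = holding_costs[i] + shortage_costs[i]
        if placed:
            # Only reached when n_orders >= 1, so the division is safe.
            cost += minor_costs[i] + major_cost / n_orders
        rewards.append(-cost)
    return rewards


def branch_targets(
    branch_rewards: Sequence[float],
    next_q_online: Sequence[Sequence[float]],
    next_q_target: Sequence[Sequence[float]],
    gamma: float,
    done: bool,
) -> List[float]:
    """Double DQN target for each branch (assumed: one branch per product).

    next_q_online[d][a] / next_q_target[d][a]: Q-value of order option a for
    product d in the next state, from the online / target network.
    """
    targets = []
    for d, r_d in enumerate(branch_rewards):
        if done:
            # Terminal transition: no bootstrapping.
            targets.append(r_d)
            continue
        online = next_q_online[d]
        # The online network selects the greedy order option for this branch...
        a_star = max(range(len(online)), key=online.__getitem__)
        # ...and the target network evaluates that option (double DQN).
        targets.append(r_d + gamma * next_q_target[d][a_star])
    return targets


# Example: two products, and only product 0 places an order this period.
rewards = allocate_rewards(10.0, [1.0, 2.0], [0.5, 0.3], [0.0, 1.0], [True, False])
targets = branch_targets(
    rewards,
    next_q_online=[[1.0, 3.0, 2.0], [0.5, 0.2]],
    next_q_target=[[10.0, 20.0, 30.0], [4.0, 8.0]],
    gamma=0.9,
    done=False,
)
print(rewards, targets)
```

Under this hypothetical rule, the per-product rewards always sum to the negative total cost of the period. Each branch can therefore be trained on its own reward and its own next-state action values, without access to the other products' decisions.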
| Bibliographic record ID |
Source identifier type |
NCID |
Source identifier |
AN10505667 |
| Bibliographic information |
SIG Technical Report on Mathematical Modeling and Problem Solving (MPS)
Vol. 2019-MPS-124,
No. 2,
pp. 1-4,
Date of issue: 2019-07-22
|
| ISSN |
Source identifier type |
ISSN |
Source identifier |
2188-8833 |
| Notice |
SIG Technical Reports are non-refereed and may therefore later appear in journals, conferences, symposia, etc. |
| Publisher |
Language |
ja |
Publisher |
Information Processing Society of Japan |