Item type | Trans(1)
Publication date | 2019-05-21
Title | Formal Approach to Editing a Tensorflow Computational Graph for Large Model Support
Language | en
Subject scheme | Other
Subject | Unrefereed Presentation Abstract
Resource type identifier | http://purl.org/coar/resource_type/c_6501
Resource type | journal article
Author affiliation | IBM Research - Tokyo (all authors)
Authors | Tung D. Le, Haruki Imai, Yasushi Negishi, Kiyokuni Kawachiya
Abstract | Deep neural networks are becoming larger, and their training consumes a huge amount of memory. While accelerators such as GPUs are well suited to training neural networks, they have limited memory. Meanwhile, host memory, which is about 32 times larger than GPU memory, is not fully utilized during training. Moreover, modern IBM machines for AI are integrated with NVLinks that provide very fast connections between CPUs and GPUs. This motivates us to propose a new method that fully utilizes host memory as well as NVLinks to support training very large models. In this presentation, we present a formal method for rewriting the computational graph of a neural network, in which swap-out and swap-in operations are inserted into the graph to temporarily store intermediate results in CPU memory. In particular, we first revise the concept of a computational graph in TensorFlow by defining a concrete semantics for variables in a graph. We then formally show how to derive swap-out and swap-in operations from an existing graph, and finally present rules to optimize the graph. To show the advantage of our method, we trained a neural network, 3DUNet, for detecting brain tumors. We used an IBM Power8 machine coupled with an NVIDIA Tesla P100 GPU (16 GB memory); the Power8 is directly connected to the GPU by 80 GB/s duplex links (NVLinks). We were able to train 3DUNet using four 3D images of size 192³ per mini-batch, whereas vanilla TensorFlow 1.8 was only able to train it using one 3D image of size 144³ per mini-batch.
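The following is a minimal sketch of the swap-out/swap-in idea described in the abstract, assuming TensorFlow 1.x graph mode (the experiments used TensorFlow 1.8). It is illustrative only, not the authors' rewriting rules; the helper name insert_swap and the trigger_op parameter are hypothetical.

    import tensorflow as tf  # assumes TensorFlow 1.x graph-mode APIs

    def insert_swap(tensor, trigger_op):
        # Hypothetical helper for illustration: swap an intermediate result
        # out to host memory and back to the GPU on demand.
        with tf.device('/cpu:0'):
            # Swap-out: copy the intermediate result to host (CPU) memory so
            # the GPU copy can be freed.
            swapped_out = tf.identity(tensor)
        with tf.device('/gpu:0'):
            # Swap-in: copy the value back to the GPU, delayed by a control
            # dependency until `trigger_op` (an op running just before the
            # consumer that needs the value again, e.g. in the backward pass)
            # has executed.
            with tf.control_dependencies([trigger_op]):
                swapped_in = tf.identity(swapped_out)
        return swapped_in

A graph rewriter in this style would then redirect later consumers of the original tensor (typically backward-pass operations) to read the swapped-in copy instead, so the forward result does not have to stay resident in GPU memory.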
Bibliographic record ID (NCID) | AA11464814
Bibliographic information | IPSJ Transactions on Programming (PRO), Vol. 12, No. 2, p. 17, published 2019-05-21
ISSN | 1882-7802
Publisher | Information Processing Society of Japan (情報処理学会)