Item type |
SIG Technical Reports(1) |
Publication date |
2021-07-13 |
Title |
ABCI 2.0: Advances in Open AI Computing Infrastructure at AIST |
Title language |
en |
Language |
eng |
Keywords |
Subject scheme |
Other |
Subject |
HPC systems |
Resource type |
Resource type identifier |
http://purl.org/coar/resource_type/c_18gh |
Resource type |
technical report |
Author affiliation |
National Institute of Advanced Industrial Science and Technology (AIST) |
Author affiliation |
National Institute of Advanced Industrial Science and Technology (AIST) |
Author affiliation |
National Institute of Advanced Industrial Science and Technology (AIST) |
Author affiliation |
National Institute of Advanced Industrial Science and Technology (AIST) |
Author affiliation |
National Institute of Advanced Industrial Science and Technology (AIST) |
Authors |
Shinichiro, Takizawa
Yusuke, Tanimura
Hidemoto, Nakada
Ryousei, Takano
Hirotaka, Ogawa
|
Abstract |
Description type |
Other |
Description |
ABCI is the world's first large-scale Open AI Computing Infrastructure for both developing AI technologies and bridging them into industry, operated by AIST, Japan, since August 2018. It delivers 19.88 petaflops of HPL performance and trains the ResNet-50 model in 70 seconds in MLPerf Training v0.6. Last November, we achieved the world's fastest results for CosmoFlow and DeepCAM in the MLPerf HPC benchmarks. ABCI was the fastest supercomputer in Japan until Fugaku made its spectacular debut. However, ABCI soon ran short of computing capacity and I/O performance because of the rapid growth in its usage, which forced us to make a major upgrade. In this upgrade, we added 120 compute nodes and a storage system with a capacity of 11 PBytes. We named the whole system, comprising both the existing ABCI and the newly added equipment, ABCI 2.0. ABCI 2.0 provides the same software environment that ABCI provided, so existing ABCI users can easily use the new equipment in much the same way they used ABCI. We compared the performance of the existing and new compute nodes and found that the new nodes had 4.1 times higher performance than the existing nodes when training the ResNet-50 model using PyTorch. We expect the new nodes to contribute substantially to increasing system throughput. |
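Bibliographic record ID |
Source identifier type |
NCID |
Source identifier |
AN10463942 |
Bibliographic information |
SIG Technical Reports: High Performance Computing (HPC)
Vol. 2021-HPC-180,
No. 18,
p. 1-8,
Issue date 2021-07-13
|
ISSN |
Source identifier type |
ISSN |
Source identifier |
2188-8841 |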
Notice |
SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc. |
Publisher |
Language |
ja |
Publisher |
Information Processing Society of Japan (情報処理学会) |