Item type |
SIG Technical Reports(1) |
Publication date |
2017-07-19 |
Title |
|
|
Title |
A study on Network Structure and Parameter Exchange Method in large-scale Cluster for Machine Learning |
Language |
en |
Keywords |
|
|
Subject Scheme |
Other |
|
Subject |
Learning methods |
Resource Type |
|
|
Resource Type Identifier |
http://purl.org/coar/resource_type/c_18gh |
|
Resource Type |
technical report |
Author Affiliation |
|
|
|
University of Tsukuba/National Institute of Advanced Industrial Science and Technology |
Author Affiliation |
|
|
|
University of Tsukuba/National Institute of Advanced Industrial Science and Technology |
Author Affiliation |
|
|
|
National Institute of Advanced Industrial Science and Technology/University of Tsukuba |
Author Affiliation |
|
|
|
National Institute of Advanced Industrial Science and Technology/University of Tsukuba |
Author Name |
Duo Zhang
Mingxi Li
Yusuke Tanimura
Hidemoto Nakada
|
Abstract |
|
|
Description Type |
Other |
|
Description |
For modern machine learning systems, including deep learning systems, parallelization is inevitable, since they are required to process massive amounts of training data. One active topic in this field is data-parallel learning, in which multiple nodes cooperate by periodically exchanging parameters or gradients. In this paper, we focus on the network resource requirements of this kind of application. We investigate a 3-layered Clos network and an omega network, in addition to the 2-layered fat-tree network on which we have already reported. As parameter exchange methods, we tested a direct parameter exchange method and a centralized-server method. We evaluated these three types of networks with SimGrid, a simulator for distributed environments, and confirmed that, with suitable parameter exchange methods, performance can be maintained even at higher oversubscription factors. |
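
To make the contrast between the two parameter exchange methods named in the abstract concrete, here is a minimal, hypothetical Python sketch (none of this is the authors' code; the paper itself used the SimGrid simulator). It only counts the point-to-point messages each method generates per exchange round, assuming a naive all-to-all layout for the direct method, since per-round traffic is what interacts with the network's oversubscription factor.

    def direct_exchange_messages(n_workers: int) -> int:
        # Direct method (assumed naive all-to-all here): every worker sends
        # its parameters to every other worker, so n * (n - 1) messages.
        return n_workers * (n_workers - 1)

    def centralized_server_messages(n_workers: int) -> int:
        # Centralized-server method: each worker sends its parameters to the
        # server and receives the averaged result back, so 2 * n messages,
        # all of which converge on the single server link.
        return 2 * n_workers

    # Toy comparison across cluster sizes.
    for n in (4, 16, 64):
        print(n, direct_exchange_messages(n), centralized_server_messages(n))

The direct method's message count grows quadratically but spreads traffic across many links, while the centralized server exchanges far fewer messages that all share one bottleneck link; which pattern tolerates a higher oversubscription factor depends on the topology, which is what the paper evaluates.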
Bibliographic Record ID |
|
|
Source Identifier Type |
NCID |
|
Source Identifier |
AN10096105 |
Bibliographic Information |
IPSJ SIG Technical Report: System Architecture (ARC)
Vol. 2017-ARC-227,
No. 26,
pp. 1-6,
Issued 2017-07-19
|
ISSN |
|
|
Source Identifier Type |
ISSN |
|
Source Identifier |
2188-8574 |
Notice |
|
|
|
SIG Technical Reports are non-refereed; their contents may therefore later appear in journals, conference proceedings, symposia, etc. |
Publisher |
|
|
Language |
ja |
|
Publisher |
Information Processing Society of Japan |