Item type |
SIG Technical Reports(1) |
Publication date |
2019-12-11 |
Title |
|
|
Title |
A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs |
Language |
en |
Language |
|
|
Language |
eng |
Keywords |
|
|
Subject scheme |
Other |
|
Subject |
GPU |
Resource type |
|
|
Resource type identifier |
http://purl.org/coar/resource_type/c_18gh |
|
Resource type |
technical report |
Author affiliation |
|
|
|
Tokyo Institute of Technology, Dept. of Mathematical and Computing Science |
Author affiliation |
|
|
|
AIST-Tokyo Tech RWBC-OIL National Institute of Advanced Industrial Science and Technology |
Author affiliation |
|
|
|
Tokyo Institute of Technology, Dept. of Mathematical and Computing Science |
Author affiliation |
|
|
|
RIKEN Center for Computational Science |
Author name |
Lingqi, Zhang
Mohamed, Wahib
Haoyu, Zhang
Satoshi, Matsuoka
|
Abstract |
|
|
Description type |
Other |
|
Description |
GPUs are playing an increasingly important role in general-purpose computing. Many algorithms require synchronization at different levels of granularity within a single GPU. Additionally, the emergence of dense GPU nodes calls for multi-GPU synchronization. Nvidia's latest CUDA provides a variety of synchronization methods. Until now, there has been no full understanding of the characteristics of those synchronization methods. This work explores important undocumented features and provides an in-depth analysis of the performance considerations and pitfalls of the state-of-the-art synchronization methods for Nvidia GPUs. The analysis should be useful when making design choices for applications, libraries, and frameworks running in single- and/or multi-GPU environments. We provide a case study of the commonly used reduction operator to illustrate how the knowledge gained in our analysis can be applied. We also describe our micro-benchmarks and measurement methods. |
Bibliographic record ID |
|
|
Source identifier type |
NCID |
|
Source identifier |
AN10463942 |
Bibliographic information |
SIG Technical Reports: High Performance Computing (HPC)
Vol. 2019-HPC-172,
No. 14,
p. 1-10,
Issue date 2019-12-11
|
ISSN |
|
|
Source identifier type |
ISSN |
|
Source identifier |
2188-8841 |
Notice |
|
|
|
SIG Technical Reports are non-refereed and hence may later appear in journals, conferences, symposia, etc. |
Publisher |
|
|
Language |
ja |
|
Publisher |
Information Processing Society of Japan (情報処理学会) |