Simple Memory Machine Models for GPUs
https://ipsj.ixsq.nii.ac.jp/records/80976
License | Access |
---|---|
Copyright (c) 2012 by the Information Processing Society of Japan | Open access |

Item type | SIG Technical Reports(1) | |||||||
---|---|---|---|---|---|---|---|---|
Publication date | 2012-03-07 | |||||||
Title | ||||||||
Title | Simple Memory Machine Models for GPUs | |||||||
Language | en | |||||||
Language | ||||||||
Language | eng | |||||||
Resource type | ||||||||
Resource type identifier | http://purl.org/coar/resource_type/c_18gh | |||||||
Resource type | technical report | |||||||
Author affiliation | ||||||||
Hiroshima University | ||||||||
Author affiliation (English) | ||||||||
en | ||||||||
Hiroshima University | ||||||||
Author name | Koji, Nakano | |||||||
Author name (English) | Koji, Nakano | |||||||
Abstract | ||||||||
Description type | Other | |||||||
Description | The main contribution of this paper is to introduce two parallel memory machines, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). Unlike well-studied theoretical parallel computational models such as PRAMs, these parallel memory machines are practical and capture the essential features of memory access on NVIDIA GPUs. As a first step toward developing algorithmic techniques for the DMM and the UMM, we evaluate the computing time for contiguous access and stride access to memory on these models. We also present parallel algorithms to transpose a two-dimensional array on these models. Since the computing time of our transposing algorithms on the DMM and the UMM is equal to the sum of the lower bounds obtained from the memory bandwidth limitation and the latency overhead, they are optimal from the theoretical point of view. | |||||||
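Bibliographic record ID | ||||||||
Source identifier type | NCID | |||||||
Source identifier | AN1009593X | |||||||
Bibliographic information | IPSJ SIG Technical Reports: Algorithms (AL) (研究報告アルゴリズム(AL)), vol. 2012-AL-139, no. 9, pp. 1-8, issued 2012-03-07 | |||||||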
Notice | ||||||||
SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc. | ||||||||
Publisher | ||||||||
Language | ja | |||||||
Publisher | Information Processing Society of Japan (情報処理学会)
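
The abstract's comparison of contiguous and stride access can be illustrated with a small sketch. This record does not define the two models, so the cost rules below are assumptions about the DMM and the UMM: addresses are interleaved over `w` banks (address `a` sits in bank `a mod w`) and a warp's requests to the same bank are serialized on the DMM, while on the UMM a warp needs one step per distinct group of `w` consecutive addresses. The function names, the warp width `w = 4`, and the three access patterns are hypothetical; latency is ignored and the addresses within a warp are assumed to be distinct.

```python
from collections import Counter


def dmm_steps(addresses, w):
    """Steps for one warp under the assumed DMM rule.

    Requests that fall in the same bank (address mod w) are serialized,
    so the cost is the largest number of requests hitting any one bank.
    """
    per_bank = Counter(a % w for a in addresses)
    return max(per_bank.values())


def umm_steps(addresses, w):
    """Steps for one warp under the assumed UMM rule.

    One step is needed per distinct address group, where a group is a
    block of w consecutive addresses (address // w).
    """
    return len({a // w for a in addresses})


w = 4  # hypothetical warp width and number of banks
patterns = {
    "contiguous": [0, 1, 2, 3],   # w consecutive addresses
    "stride": [0, 4, 8, 12],      # stride of w: one column of a w x w array
    "diagonal": [0, 5, 10, 15],   # stride of w + 1: the main diagonal
}

for name, addrs in patterns.items():
    print(name, dmm_steps(addrs, w), umm_steps(addrs, w))
```

Under these assumptions the contiguous pattern costs one step on both models, the stride pattern costs `w` steps on both, and the diagonal pattern costs one step on the DMM but `w` steps on the UMM. This is only meant to show why the access pattern matters for the running time, not to reproduce the paper's analysis.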