Improving Data Transfer to Shared Memory in an Automatic Optimization Method for a GPGPU Programming Framework

Tomoharu Kamiya (神谷智晴), Takanori Maruyama (丸山剛寛), Kazuhiko Ohno (大野和彦)

https://ipsj.ixsq.nii.ac.jp/records/98692

Name / File	License	Access
IPSJ-HPC14143006.pdf (1.0 MB)	Copyright (c) 2014 by the Information Processing Society of Japan	Open access

Item type

SIG Technical Reports(1)

Publication date

2014-02-24

Title (Japanese)

GPGPU処理系の自動最適化手法におけるシェアードメモリへのデータ転送方法の改良

Title (English)

An Improved of Transferming Data of Shared Memory in GPGPU Programming Framework

Language

jpn (Japanese)

Keywords

Subject scheme

Other

Subject

Accelerators and Memory Systems

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

Graduate School of Engineering, Mie University (all three authors)

Author affiliation (English)

Mie University (all three authors)

Author name

神谷, 智晴

Author name (English)

Kamiya, Tomoharu

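Abstract

Description type

Other

Description

In recent years, GPGPU, the execution of general-purpose computation on GPUs, has attracted attention. CUDA, the current mainstream development environment, lets programs be written in a high-level language, but the programmer must still keep the GPU's complex memory hierarchy in mind. To address this, we have proposed MESI-CUDA, which allows programming against a simple memory model. However, the code generated by the current MESI-CUDA implementation is not sufficiently optimized and can take longer to run than hand-optimized CUDA code. For example, besides global memory, a GPU has multiple shared memories, which are small but have short access latency, and hand optimization explicitly decides which data goes to which kind of memory. The existing MESI-CUDA implementation, however, uses only global memory. We are therefore developing a method that automatically generates code using shared memory in MESI-CUDA. In this work, we improve the part of our previous method that transfers data to shared memory. When transferring data to shared memory, the stored data is swapped according to the threads currently executing, which improves shared-memory utilization. In addition, instead of only partitioning the data and storing each part in a separate shared memory, boundary regions can now be stored redundantly in more than one shared memory. This makes it possible to optimize programs that the previous method could not handle.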
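The abstract describes partitioning data across shared memories while also storing the boundary regions redundantly. The record contains no code, so the following is only a minimal hand-written CUDA sketch of that pattern, not MESI-CUDA's generated output. It assumes a 1D three-point sum stencil with zero padding; `TILE`, `RADIUS`, `stencil3_shared`, and `stencil3` are hypothetical names. Each block stages its own tile plus a one-element halo on each side, so the elements at every tile boundary end up in two blocks' shared memory.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 256;   // threads per block (hypothetical choice)
constexpr int RADIUS = 1;   // halo width for a 3-point stencil (assumption)

// Abort on any CUDA runtime error.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// out[i] = in[i-1] + in[i] + in[i+1], out-of-range neighbours count as 0.
// Must be launched with blockDim.x == TILE.
__global__ void stencil3_shared(const float* in, float* out, int n) {
    __shared__ float tile[TILE + 2 * RADIUS];
    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int l = threadIdx.x + RADIUS;                   // index inside the tile

    // Every thread stages its own element (0 past the end of the array).
    tile[l] = (g < n) ? in[g] : 0.0f;
    // The first RADIUS threads also stage the left and right halo. These
    // boundary elements are loaded by the neighbouring block as well,
    // i.e. they are stored redundantly.
    if (threadIdx.x < RADIUS) {
        int left = g - RADIUS;
        int right = g + blockDim.x;
        tile[l - RADIUS] = (left >= 0 && left < n) ? in[left] : 0.0f;
        tile[l + blockDim.x] = (right < n) ? in[right] : 0.0f;
    }
    __syncthreads();  // the whole tile, halo included, is now visible

    if (g < n) {
        float s = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k) s += tile[l + k];
        out[g] = s;
    }
}

// Host wrapper: copies in, launches one block per tile, copies out back.
std::vector<float> stencil3(const std::vector<float>& h_in) {
    int n = static_cast<int>(h_in.size());
    assert(n > 0);
    size_t bytes = n * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    check(cudaMalloc(&d_in, bytes));
    check(cudaMalloc(&d_out, bytes));
    check(cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice));
    int blocks = (n + TILE - 1) / TILE;
    stencil3_shared<<<blocks, TILE>>>(d_in, d_out, n);
    check(cudaGetLastError());
    std::vector<float> h_out(n);
    check(cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(d_in));
    check(cudaFree(d_out));
    return h_out;
}
```

Without the halo, a thread at the edge of a tile would have to fall back to global memory for its neighbour, or the partitioning would simply be impossible for this access pattern; duplicating `2 * RADIUS` elements per block is the price of keeping every read in shared memory.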

Abstract (English)

Description type

Other

Description

The performance of Graphics Processing Units (GPUs) is improving rapidly. General Purpose computation on Graphics Processing Units (GPGPU) is therefore expected to become an important method for high-performance computing. Major development environments such as CUDA enable GPU programming in C, but the user must handle the complicated memory architecture. We are therefore developing a new programming framework named MESI-CUDA, which provides a simple memory architecture model and automatically generates low-level CUDA code. The current implementation of MESI-CUDA may generate inefficient code compared with hand-optimized CUDA programs, because the auto-generated code uses only the global memory of the GPU. In this research, we improve our previous method of transferring data to shared memory. Changing the data stored in shared memory according to the threads being executed improves shared-memory utilization. We also propose storing not only partitioned data but also the data on partition boundaries redundantly. Together, these make it possible to optimize programs that our previous method could not.
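The abstract states that changing the data held in shared memory according to the executing threads improves utilization. One way to picture this, offered only as a hand-written sketch and not as the authors' method, is a small grid that walks over many tiles and refills each block's shared buffer with whichever tile its threads are about to process. The per-tile sum workload and the names `TILE`, `tile_sums`, and `tile_sums_host` are hypothetical.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 256;  // threads per block = elements per tile; must be a power of two

// Abort on any CUDA runtime error.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// sums[t] = sum of in[t*TILE .. t*TILE + TILE - 1] (elements past n count as 0).
// A grid smaller than num_tiles walks over the tiles; on each iteration the
// block overwrites its shared buffer with the tile its threads now work on.
__global__ void tile_sums(const float* in, float* sums, int n, int num_tiles) {
    __shared__ float buf[TILE];
    // The trip count depends only on blockIdx/gridDim, so every thread of a
    // block executes the same iterations and the barriers below are safe.
    for (int t = blockIdx.x; t < num_tiles; t += gridDim.x) {
        int g = t * TILE + threadIdx.x;
        buf[threadIdx.x] = (g < n) ? in[g] : 0.0f;  // replace the previous tile's data
        __syncthreads();
        // Tree reduction performed entirely in shared memory.
        for (int stride = TILE / 2; stride > 0; stride /= 2) {
            if (threadIdx.x < stride) buf[threadIdx.x] += buf[threadIdx.x + stride];
            __syncthreads();
        }
        if (threadIdx.x == 0) sums[t] = buf[0];
        __syncthreads();  // conservative: finish with this tile before the next load
    }
}

// Host wrapper: copies in, launches `grid` blocks of TILE threads, copies sums back.
std::vector<float> tile_sums_host(const std::vector<float>& h_in, int grid) {
    int n = static_cast<int>(h_in.size());
    assert(n > 0 && grid > 0);
    int num_tiles = (n + TILE - 1) / TILE;
    float *d_in = nullptr, *d_sums = nullptr;
    check(cudaMalloc(&d_in, n * sizeof(float)));
    check(cudaMalloc(&d_sums, num_tiles * sizeof(float)));
    check(cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice));
    tile_sums<<<grid, TILE>>>(d_in, d_sums, n, num_tiles);
    check(cudaGetLastError());
    std::vector<float> h_sums(num_tiles);
    check(cudaMemcpy(h_sums.data(), d_sums, num_tiles * sizeof(float), cudaMemcpyDeviceToHost));
    check(cudaFree(d_in));
    check(cudaFree(d_sums));
    return h_sums;
}
```

For example, `tile_sums_host(in, 2)` on 1,000 elements runs two blocks over four tiles, so each block refills the same 256-float buffer twice instead of needing space for the whole array.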

Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10463942

Bibliographic information

研究報告ハイパフォーマンスコンピューティング（HPC） (IPSJ SIG Technical Report on High Performance Computing)

Vol. 2014-HPC-143, No. 6, pp. 1-10, issued 2014-02-24

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

情報処理学会 (Information Processing Society of Japan)

Versions

Ver.1

2025-01-21 12:20:47.224306
