Application for Sparse Matrix Computation of a Memory Accelerator with Gather Functions

田邊, 昇; 小郷, 絢子; 小川, 裕佳; 高田, 雅美; 城, 和貴; Noboru, Tanabe; Junko, Kogou; Yuka, Ogawa; Masami, Takata; Kazuki, Joe

https://ipsj.ixsq.nii.ac.jp/records/80215

Name / File	License
IPSJ-HPCS2012007.pdf (1.0 MB)	Copyright (c) 2012 by the Information Processing Society of Japan
Open access

Item type

Symposium(1)

Publication date

2012-01-17

Title

Application for Sparse Matrix Computation of a Memory Accelerator with Gather Functions

Title (original Japanese)

Gather機能を有するメモリアクセラレータの疎行列計算への応用

Language

jpn

Keywords

Subject scheme

Other

Subject

Sparse matrix operations

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

Author affiliations

Toshiba Corporation
Nara Women's University
Nara Women's University
Nara Women's University
Nara Women's University

Author name

田邊, 昇

Author name (English)

Noboru, Tanabe

Abstract

Description type

Other

Description

In this paper, we propose a parallel processing scheme for sparse matrix-vector products that are so large that the vector does not fit in device memory. The scheme uses a system in which GPUs access large-capacity functional memories with a gather function, called Memory Accelerators. The proposed algorithm, the "Fold method", folds long rows at appropriate fold points, which improves load balance and increases parallelism. Using the transpose of the matrix it produces gives an access order suited to GPUs. We evaluate the performance of the proposed scheme with the University of Florida Sparse Matrix Collection. Because indirect accesses are converted into direct accesses, single-GPU performance improves by up to 4.1 times over existing work. There is also no risk of overflowing the GPU's cache. Because the configuration can completely eliminate one-to-one communication between GPUs, scalability is guaranteed: the effective parallel performance is the number of nodes multiplied by the single-GPU performance, which is limited by the burst transfer bandwidth of the interface to the functional memory.
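The following is a minimal Python sketch of the idea the abstract describes, not the authors' implementation. It assumes a CSR input held in plain Python lists and a fixed fold width, whereas the paper chooses fold points by a rule that this record does not specify. The names `fold_csr`, `gather`, and `spmv_folded` are hypothetical. The `gather` step stands in for the Memory Accelerator: it turns the indirect reads `x[col]` into a dense stream that the multiply loop then reads directly.

```python
def fold_csr(row_ptr, col_idx, vals, width):
    """Split every CSR row into segments of at most `width` nonzeros.

    Assumption: a fixed `width` replaces the paper's fold-point selection.
    Returns the owning row of each segment plus slot-major (transposed) arrays.
    """
    owner, seg_cols, seg_vals = [], [], []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        for lo in range(start, end, width):
            hi = min(lo + width, end)
            pad = width - (hi - lo)
            owner.append(row)
            # Padding uses column 0 with value 0.0, so it adds nothing to y.
            seg_cols.append(col_idx[lo:hi] + [0] * pad)
            seg_vals.append(vals[lo:hi] + [0.0] * pad)
    # Transpose: entry [k][s] is slot k of segment s, so neighbouring
    # segments (one GPU thread each) read neighbouring addresses.
    cols_t = [list(slot) for slot in zip(*seg_cols)]
    vals_t = [list(slot) for slot in zip(*seg_vals)]
    return owner, cols_t, vals_t


def gather(x, cols_t):
    """Stand-in for the gather function: resolve x[col] ahead of the multiply."""
    return [[x[c] for c in slot] for slot in cols_t]


def spmv_folded(n_rows, owner, vals_t, gathered_t):
    """Multiply using only direct accesses, then add each segment to its row."""
    y = [0.0] * n_rows
    for s, row in enumerate(owner):
        partial = sum(vals_t[k][s] * gathered_t[k][s] for k in range(len(vals_t)))
        y[row] += partial
    return y


# 3x4 example: row 0 is "long" (4 nonzeros) and is folded into two segments.
row_ptr = [0, 4, 5, 7]
col_idx = [0, 1, 2, 3, 1, 0, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x = [1.0, 2.0, 3.0, 4.0]

owner, cols_t, vals_t = fold_csr(row_ptr, col_idx, vals, width=2)
y = spmv_folded(3, owner, vals_t, gather(x, cols_t))
print(y)  # [30.0, 10.0, 34.0]
```

In this sketch every segment does the same amount of work regardless of how long its original row was, which is the load-balancing effect the abstract attributes to folding. The final accumulation into `y[row]` is sequential here; a GPU version would need a reduction or atomic add for rows that were folded.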

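Bibliographic information

Proceedings of the High Performance Computing and Computational Science Symposium (ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集)

Vol. 2012, pp. 32-41, issue date 2012-01-17

Publisher

Information Processing Society of Japan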
