Design and Implementation of LU Decomposition on the Programmable GPU

Manabu Matsui; Fumihiko Ino; Kenichi Hagihara

https://ipsj.ixsq.nii.ac.jp/records/18377

Name / File	License	Action
IPSJ-TACS4612013.pdf (382.4 kB)	Copyright (c) 2005 by the Information Processing Society of Japan
Open access

Item type

Trans(1)

Publication date

2005-08-15

Title

プログラマブルGPUにおけるLU分解の設計と実装

Title (English)

Design and Implementation of LU Decomposition on the Programmable GPU

Language

jpn

Keywords

Subject scheme

Other

Subject

GPU applications

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Author affiliation

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University; presently with IBM Japan Systems Engineering Co., Ltd.

Author affiliation

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Author affiliation

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Author name

松井, 学

Author name (English)

Manabu, Matsui

Abstract

The graphics processing unit (GPU) is a single-chip processor designed to accelerate rendering tasks for interactive visualization. To analyze the behavior of the programmable GPU, this paper describes the design and implementation of LU decomposition as an example of numerical computation. We implemented and evaluated several methods for a) loop processing, b) branch processing, and c) vector processing. The experiments yield four findings: 1) for iterations with dependencies, a switching method based on render textures avoids copies within the video random access memory (VRAM) and halves the LU decomposition time; 2) the CPU and the GPU trade off against each other in branch-processing efficiency, and CPU-based branching is more efficient for matrices larger than 512×512; 3) the floating-point efficiency of this implementation is just under 30%, and, as Fatahalian et al. point out for matrix multiplication, LU decomposition also needs a high-bandwidth on-GPU cache to draw out the GPU's arithmetic performance; and 4) the GPU's decomposition results never match those obtained on a CPU, mainly because the rounding errors of the divisions in the decomposition accumulate as the decomposition proceeds.
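Findings 1) and 4) can be illustrated on the CPU with a minimal sketch. It is not the authors' GPU code: the function names (`lu_ping_pong`, `to_f32`, `reconstruct`) are hypothetical, pivoting is omitted, and every element is rewritten on each pass for simplicity. Two buffers swap roles after each elimination step, which is a plain-Python analogy to switching between render textures instead of copying results back inside VRAM. An optional rounding hook emulates IEEE 754 binary32 arithmetic, which is only an approximation of how a 2005-era GPU rounded.

```python
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_ping_pong(a, rnd=lambda x: x):
    """Right-looking LU decomposition without pivoting (illustrative only).

    Returns one n x n matrix: U on and above the diagonal, and the
    multipliers of L (unit diagonal implied) below it. `rnd` is applied
    after every arithmetic operation to emulate a narrower float format.
    """
    n = len(a)
    src = [[rnd(float(v)) for v in row] for row in a]  # buffer being read
    dst = [[0.0] * n for _ in range(n)]                # buffer being written
    for k in range(n - 1):
        pivot = src[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        # One "rendering pass": each output element depends only on src.
        for i in range(n):
            for j in range(n):
                if i > k and j >= k:
                    m = rnd(src[i][k] / pivot)          # multiplier l_ik
                    if j == k:
                        dst[i][j] = m
                    else:
                        dst[i][j] = rnd(src[i][j] - rnd(m * src[k][j]))
                else:
                    dst[i][j] = src[i][j]               # already final
        src, dst = dst, src  # swap roles instead of copying dst back
    return src


def reconstruct(lu):
    """Multiply the packed L and U factors back together (in binary64)."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for t in range(min(i, j) + 1):
                l_it = 1.0 if t == i else lu[i][t]
                s += l_it * lu[t][j]
            out[i][j] = s
    return out


def max_abs_diff(x, y):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(p - q) for rx, ry in zip(x, y) for p, q in zip(rx, ry))


if __name__ == "__main__":
    a = [[3.0, 1.0, 2.0], [1.0, 4.0, 1.0], [2.0, 1.0, 5.0]]
    lu64 = lu_ping_pong(a)          # binary64 throughout
    lu32 = lu_ping_pong(a, to_f32)  # emulated binary32
    print("max |LU64 - LU32| =", max_abs_diff(lu64, lu32))
```

The two runs factor the same matrix but produce different factors, because the quotient 1/3 already rounds differently in the two formats and that difference propagates into later steps. This is a small-scale analogy to the mismatch between GPU and CPU results reported above, not a reproduction of the paper's measurements.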

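Bibliographic record ID

Source identifier type

NCID

Source identifier

AA11833852

Bibliographic information

IPSJ Transactions on Advanced Computing Systems (ACS)

Vol. 46, No. SIG12(ACS11), pp. 129-139, issued 2005-08-15

ISSN

Source identifier type

ISSN

Source identifier

1882-7829

Publisher

Information Processing Society of Japan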
