Implementation and Evaluation of Triple and Quadruple Precision Floating-point Operations on GPUs

Daichi Mukunoki; Daisuke Takahashi


https://ipsj.ixsq.nii.ac.jp/records/89938

Name / File	License	Action
IPSJ-TACS0601007.pdf (1.0 MB)	Copyright (c) 2013 by the Information Processing Society of Japan
Open access

Item type

Trans(1)

Publication date

2013-01-31

Title

Implementation and Evaluation of Triple and Quadruple Precision Floating-point Operations on GPUs

Original title (Japanese)

GPUにおける3倍・4倍精度浮動小数点演算の実現と性能評価

Language

jpn

Keywords

Subject scheme

Other

Subject

[Numerical computation] triple precision, quadruple precision, GPU, BLAS

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Author affiliation

Graduate School of Systems and Information Engineering, University of Tsukuba

Author affiliation

Faculty of Engineering, Information and Systems, University of Tsukuba

Author names

Daichi Mukunoki (椋木 大地)
Daisuke Takahashi (高橋 大介)

Abstract

Description type

Other

Description

We implemented triple and quadruple precision floating-point operations on GPUs and, as an application to linear algebra, implemented and evaluated AXPY, GEMV, and GEMM, representative Level 1-3 routines of the Basic Linear Algebra Subprograms (BLAS). For quadruple precision, we used Double-Double (DD) type quadruple precision operations (DD-operations). For triple precision, we propose two new formats, Double+Single (D+S) and Double+Int (D+I), and implemented a method that performs triple precision operations by using DD-operations for the internal computation. On an NVIDIA Tesla M2090, the triple and quadruple precision AXPY and GEMV are memory-bound, so their execution times scale with the data size and are approximately 3x and 4x those of the single precision routines, respectively. The proposed triple precision operations therefore have a speed advantage over quadruple precision operations in cases where DD-operations on triple precision data are memory-bound. When quadruple precision is not required but double precision is insufficient, we expect triple precision to be more effective than quadruple precision, particularly in environments such as GPU clusters, where the bandwidth of PCI Express and the network tends to become the performance bottleneck.
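The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' GPU implementation: the function names, the "sloppy" DD addition variant, and the 12-byte packing used for the D+S format (a double high word plus a low word rounded to single precision) are assumptions made here for illustration, since this record does not give the exact layout. The byte counts at the end show why memory-bound routines would take roughly 3x and 4x the single precision time.

```python
import struct

def two_sum(a, b):
    """Error-free addition (Knuth): returns (s, e) with s = fl(a + b), a + b = s + e."""
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)
    return s, e

def quick_two_sum(a, b):
    """Error-free addition that assumes |a| >= |b|."""
    s = a + b
    e = b - (s - a)
    return s, e

def dd_add(x, y):
    """Add two double-double values given as (hi, lo) pairs ("sloppy" variant)."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return quick_two_sum(s, e)

def ds_store(x):
    """Hypothetical D+S packing: hi kept as a double, lo rounded to a single."""
    return struct.pack("<df", x[0], x[1])  # 8 + 4 = 12 bytes, no padding

def ds_load(buf):
    """Unpack a hypothetical D+S value back into a (hi, lo) pair for DD arithmetic."""
    hi, lo = struct.unpack("<df", buf)
    return (hi, lo)

# Storage per element; DD is two doubles, D+S follows the assumption above.
BYTES_PER_ELEMENT = {"single": 4, "double": 8, "D+S": 12, "DD": 16}

if __name__ == "__main__":
    a = (1.0, 2.0 ** -60)   # 1 + 2^-60, not representable in one double
    b = (1.0, 2.0 ** -70)
    total = dd_add(a, b)    # computed with DD arithmetic
    packed = ds_store(total)  # stored in the 12-byte triple precision form
    print(total, len(packed), ds_load(packed))
    for name in ("D+S", "DD"):
        ratio = BYTES_PER_ELEMENT[name] / BYTES_PER_ELEMENT["single"]
        print(name, "moves", ratio, "x the bytes of single precision")
```

In this sketch the arithmetic always runs in DD, and only the stored form is shortened, which mirrors the abstract's statement that the triple precision operations use DD-operations internally. Rounding the low word to single precision keeps fewer significand bits than DD, which is the trade made for the smaller memory footprint.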

Bibliographic record ID

Source identifier type

NCID

Source identifier

AA11833852

Bibliographic information

IPSJ Transactions on Advanced Computing Systems (ACS)

Vol. 6, No. 1, pp. 66-77, issued 2013-01-31

ISSN

Source identifier type

ISSN

Source identifier

1882-7829

Publisher

Information Processing Society of Japan
