Evaluation of the EM-X Parallel Computer Using a Matrix Operation Benchmark

坂根, 広史; 児玉, 祐悦; 佐藤三久; 山名, 早人; 坂井, 修一; 山口, 喜教; Hirofumi, Sakane; Yuetsu, Kodama; Mitsuhisa, Sato; Hayato, Yamana; Shuichi, Sakai; Yoshinori, Yamaguchi

https://ipsj.ixsq.nii.ac.jp/records/24031

Name / File	License	Action
IPSJ-ARC96119041.pdf (581.8 kB)	Copyright (c) 1996 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

1996-08-27

Title

Evaluation of the EM-X Parallel Computer Using a Matrix Operation Benchmark

Title (English)

Performance Evaluation for a Matrix Operation Benchmark on EM-X Multiprocessor

Language

jpn

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

Electrotechnical Laboratory

Author affiliation

Electrotechnical Laboratory

Author affiliation

Real World Computing Partnership

Author affiliation

Electrotechnical Laboratory

Author affiliation

Electrotechnical Laboratory

Author affiliation

Electrotechnical Laboratory

Author name

坂根, 広史

Author name (English)

Hirofumi, Sakane

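Abstract

Description type

Other

Description

We parallelized and implemented the LINPACK benchmark on the distributed-memory parallel computer EM-X, and evaluated its floating-point performance on a matrix problem in which regular, coarse-grained computation and communication patterns appear. For the parallelization, we examined the relationship between the pivot-column broadcast algorithm and load balancing, as well as the overlap of broadcast communication with column-elimination computation. For the sequentially executed part of the innermost loop, we further accelerated the fast code we reported previously by using multi-column simultaneous elimination to make effective use of registers, producing code whose theoretical peak performance is four instructions per element operation. We then investigated how these effective speed-up techniques relate to the characteristics specific to the EM-X. On an 80-PE configuration, we measured 354.2 Mflop/s for the LINPACK benchmark of order 1000 and 601.5 Mflop/s for order 5000.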
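The multi-column simultaneous elimination mentioned in the abstract can be illustrated with a small sketch. The report's actual inner loop is hand-tuned EM-X code and is not reproduced here. The version below is only an assumed, sequential Python rendering of the general idea: two trailing columns are updated in one pass, so each pivot-column multiplier is loaded once and reused. The function names `lu_two_column` and `solve`, the column-major list layout, and the blocking factor of two are all illustrative assumptions.

```python
def lu_two_column(a):
    """In-place LU factorization with partial pivoting.

    a is column-major: a[j][i] is the element in row i, column j.
    Returns the list of pivot row indices (one per elimination step).
    """
    n = len(a)
    piv = []
    for k in range(n):
        col_k = a[k]
        # Partial pivoting: largest-magnitude entry in column k, rows k..n-1.
        p = max(range(k, n), key=lambda i: abs(col_k[i]))
        piv.append(p)
        if col_k[p] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        if p != k:
            for col in a:  # swap rows k and p across every column
                col[k], col[p] = col[p], col[k]
        # Scale the pivot column so it holds the multipliers.
        inv = 1.0 / col_k[k]
        for i in range(k + 1, n):
            col_k[i] *= inv
        # Eliminate two trailing columns per pass (assumed blocking factor).
        j = k + 1
        while j + 1 < n:
            c0, c1 = a[j], a[j + 1]
            t0, t1 = c0[k], c1[k]
            for i in range(k + 1, n):
                m = col_k[i]      # multiplier read once, used twice
                c0[i] -= m * t0
                c1[i] -= m * t1
            j += 2
        if j < n:                 # one leftover column
            c0 = a[j]
            t0 = c0[k]
            for i in range(k + 1, n):
                c0[i] -= col_k[i] * t0
    return piv


def solve(a, piv, b):
    """Solve A x = b using the factors produced by lu_two_column."""
    n = len(a)
    x = list(b)
    for k in range(n):            # apply the recorded row swaps to b
        x[k], x[piv[k]] = x[piv[k]], x[k]
    for k in range(n):            # forward substitution (unit lower factor)
        for i in range(k + 1, n):
            x[i] -= a[k][i] * x[k]
    for k in range(n - 1, -1, -1):  # back substitution (upper factor)
        x[k] /= a[k][k]
        for i in range(k):
            x[i] -= a[k][i] * x[k]
    return x


# Usage: columns of the matrix [[2, 1, 1], [4, 3, 3], [8, 7, 9]].
cols = [[2.0, 4.0, 8.0], [1.0, 3.0, 7.0], [1.0, 3.0, 9.0]]
pivots = lu_two_column(cols)
solution = solve(cols, pivots, [7.0, 19.0, 49.0])
```

In a parallel setting such as the one the abstract describes, the scaled pivot column would be broadcast to the processing elements that own the trailing columns. This sketch leaves that step out and shows only the local update.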

Abstract (English)

Description type

Other

Description

In this paper, we discuss an implementation of the LINPACK benchmark parallelized on the EM-X multiprocessor and evaluate its performance, focusing on floating-point operations in which a regular, repetitive pattern occurs. It is important to overlap communication and calculation as much as possible and to consider the relationship between the broadcast algorithms and load balancing. By exploiting the potential for reducing the number of memory accesses and adopting the multi-column simultaneous elimination technique, we also further accelerated the innermost-loop code that we had previously reported as an optimization for a single processor. We demonstrate that the parallelized LINPACK benchmark on the 80-PE system achieves 354.2 Mflop/s for a matrix of order 1000 and 601.5 Mflop/s for a matrix of order 5000.
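To put the reported rates in context, the short calculation below converts them into an implied solve time and a per-PE rate. It assumes the conventional LINPACK nominal operation count of 2n^3/3 + 2n^2. The abstract does not state which count the authors used, so the derived times are estimates only. The helper names `linpack_flops` and `summarize` are illustrative.

```python
def linpack_flops(n):
    """Nominal LINPACK operation count for an n x n system (assumed convention)."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


PES = 80                                 # processing elements, from the abstract
REPORTED = {1000: 354.2, 5000: 601.5}    # matrix order -> Mflop/s, from the abstract


def summarize(n, mflops, pes=PES):
    """Return (implied seconds, Mflop/s per PE) for one reported result."""
    seconds = linpack_flops(n) / (mflops * 1e6)
    per_pe = mflops / pes
    return seconds, per_pe


for order, rate in REPORTED.items():
    seconds, per_pe = summarize(order, rate)
    print(f"n={order}: about {seconds:.2f} s implied, {per_pe:.2f} Mflop/s per PE")
```

The per-PE figure depends only on the numbers in the abstract. The implied time also depends on the assumed operation count.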

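Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10096105

Bibliographic information

IPSJ SIG Technical Reports: Computer Architecture (ARC)

Vol. 1996, No. 80 (1996-ARC-119), pp. 239-244, issued 1996-08-27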

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan
