Towards Optimizations of FMM on CPU-GPU Heterogeneous Environments using Dynamic Task Scheduling Runtimes

Fukuda, Keisuke; Maruyama, Naoya; Matsuoka, Satoshi

https://ipsj.ixsq.nii.ac.jp/records/79293

Name / File	License
IPSJ-ARC11197028.pdf (458.1 kB)	Copyright (c) 2011 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2011-11-21

Title

動的タスクスケジューリングによるCPU/GPUヘテロジニアス環境でのFMMの最適化

Title (English)

Towards Optimizations of FMM on CPU-GPU Heterogeneous Environments using Dynamic Task Scheduling Runtimes

Language

jpn

Keywords

Subject scheme

Other

Subject

GPU optimization

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliations

Tokyo Institute of Technology

Tokyo Institute of Technology / JST/CREST

Tokyo Institute of Technology / JST/CREST

Author name

Fukuda, Keisuke

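Abstract

Description type

Other

Description

The fast multipole method (FMM) is an algorithm that approximately solves the N-body problem in O(N) time, and it has attracted attention in recent years because it scales better than other N-body algorithms. However, FMM consists of several computational phases with different computational characteristics and dependencies, and it is not obvious how to execute them efficiently on a system with multiple heterogeneous processors. In this report, following our earlier presentation [18], we port several phases of kifmm3d, an FMM implementation, to the GPU and examine the resulting speedup and the implementation issues. Building on those implementations, we then develop and examine an initial implementation that uses StarPU [5], a dynamic task scheduling system, to make effective use of processor resources in a heterogeneous CPU/GPU environment. This work yielded insight into a range of technical problems and open questions, including defects related to the launch and speed of CUDA threads and considerations in phase splitting and in the implementation of filters.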

Abstract (English)

Description type

Other

Description

FMM is an O(N) approximate algorithm for N-body problems and is recognized as more scalable and promising than other N-body computation methods. Effectively utilizing heterogeneous systems in FMM, however, is a challenging issue because FMM consists of several phases with different performance characteristics that call for careful load balancing for optimal performance. This paper extends our previous work [18], which partially ported the CPU implementation of kifmm3d to CUDA, and presents a complete CUDA implementation. To exploit heterogeneous processing elements, we further extend the implementation with StarPU, which allows dynamic task scheduling on CPU-GPU heterogeneous environments. In trying to achieve good load balancing, we found several technical issues and challenges, such as failing CUDA kernel invocations, phase splitting, and the implementation of filters.
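To make the load-balancing problem in the abstract concrete, the following is a minimal sketch of dynamic task assignment across one CPU worker and one GPU worker. It is not StarPU's API and not the authors' implementation: the `Worker` class, the `schedule` function, the task names, and all cost figures are hypothetical, and the greedy earliest-finish-time rule is only one simple policy. Task dependencies and CPU-GPU data transfers are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    kind: str                  # "cpu" or "gpu"
    free_at: float = 0.0       # time at which this worker becomes idle
    assigned: list = field(default_factory=list)


def schedule(tasks, workers):
    """Greedy earliest-finish-time assignment (illustrative policy only).

    tasks: list of (task_name, {"cpu": cost, "gpu": cost}) in submission order.
    Returns the makespan, i.e. the time at which the last worker finishes.
    """
    for task_name, cost in tasks:
        # Choose the worker that would complete this task soonest,
        # given its current backlog and its cost for this task type.
        best = min(workers, key=lambda w: w.free_at + cost[w.kind])
        best.free_at += cost[best.kind]
        best.assigned.append(task_name)
    return max(w.free_at for w in workers)


# Hypothetical cost estimates in arbitrary time units: "dense" tasks favor
# the GPU, "irregular" tasks favor the CPU.
tasks = [
    ("dense_0", {"cpu": 8.0, "gpu": 1.0}),
    ("dense_1", {"cpu": 8.0, "gpu": 1.0}),
    ("irregular_0", {"cpu": 2.0, "gpu": 3.0}),
    ("irregular_1", {"cpu": 2.0, "gpu": 3.0}),
]
workers = [Worker("cpu0", "cpu"), Worker("gpu0", "gpu")]
makespan = schedule(tasks, workers)

for w in workers:
    print(w.name, w.assigned, w.free_at)
print("makespan:", makespan)
```

With these made-up costs the dense tasks land on the GPU and the irregular tasks on the CPU, so both workers stay busy. A real runtime would also have to account for dependencies between phases and for data movement, which this sketch leaves out.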

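Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10096105

Bibliographic information

研究報告計算機アーキテクチャ（ARC） [SIG Technical Reports: Computer Architecture (ARC)]

Vol. 2011-ARC-197, No. 28, pp. 1-9, issued 2011-11-21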

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan
