Optimizing Local Buffers for FU Array Accelerator (演算器アレイ型アクセラレータにおけるローカルバッファの最適化)

下岡, 俊介; 吉村, 和浩; 中田, 尚; 中島, 康彦; Shunsuke, Shitaoka; Kazuhiro, Yoshimura; Takashi, Nakada; Yasuhiko, Nakashima

https://ipsj.ixsq.nii.ac.jp/records/75385

Name / File	License	Action
IPSJ-ARC11196018.pdf (272.6 kB)	Copyright (c) 2011 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2011-07-20

Title (Japanese)

演算器アレイ型アクセラレータにおけるローカルバッファの最適化

Title (English)

Optimizing Local Buffers for FU Array Accelerator

Language

jpn

Keywords

Subject scheme

Other

Subject

Vector / array (ベクトル・アレイ)

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

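Author affiliation

奈良先端科学技術大学院大学

Author affiliation

奈良先端科学技術大学院大学

Author affiliation

奈良先端科学技術大学院大学

Author affiliation

奈良先端科学技術大学院大学

Author affiliation (English)

Nara Institute of Science and Technology

Author affiliation (English)

Nara Institute of Science and Technology

Author affiliation (English)

Nara Institute of Science and Technology

Author affiliation (English)

Nara Institute of Science and Technology

Author name

下岡, 俊介

Author name (English)

Shunsuke, Shitaoka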

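Abstract (translated from Japanese)

Description type

Other

Description

We have proposed LAPP, a functional-unit array accelerator that combines high power efficiency with binary compatibility. LAPP maps the innermost loop of a program written in existing VLIW instructions onto the functional-unit array and executes it at high speed. During accelerated execution, a mapped load instruction is executed by calculating its address and comparing tags, then reading the data from the matching way of the local buffer. Because each load instruction corresponds one-to-one with one way of the local buffer, the address-calculation and tag-comparison logic in the load/store unit of each stage can be reduced, and the local buffer can be optimized to hold only the values that are actually used. In this paper, we propose a method that optimizes the local buffer by associating each mapped load instruction with a way of the local buffer, reducing the address calculation and tag comparison in the load/store unit of each stage, and replacing the buffer with registers. We implemented the proposed method in HDL and evaluated its circuit area and delay. The evaluation showed that, for local buffers of up to 64 words, the load latency can reliably be reduced from 2 to 1.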

Abstract (English)

Description type

Other

Description

Our previously proposed FU (functional unit) array accelerator, LAPP, can achieve extremely high energy efficiency by fully exploiting parallelism between inner-loop iterations and using the minimum necessary units to perform the calculation. For acceleration, an n-way local buffer is attached to each array stage to supply data efficiently to the arrayed FUs. A load instruction is executed by indexing with the calculated address and comparing the request address with the tags of the indexed lines in all ways of the local buffer. To eliminate this unnecessary comparison, this paper proposes an optimization that removes the address calculation and tag comparison. We evaluated the circuit area and delay of the local buffers under the proposed method using an HDL implementation. The results indicate that the load latency can be reduced from 2 cycles to 1 cycle for local buffers that contain 64 words per way.
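
The difference between the two load paths described in the abstract can be illustrated with a small software model. This is only a sketch and not the authors' HDL design: the 4-byte word size, the one-word line granularity, the names `Way`, `split`, `load_with_tag_compare`, `load_from_assigned_way`, and `way_of_load`, and the idea that the optimized path receives a ready-made word index are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

WORDS_PER_WAY = 64   # evaluation point mentioned in the abstract
WORD_BYTES = 4       # assumption: 32-bit words


def split(addr):
    """Split a word-aligned byte address into (index, tag)."""
    word = addr // WORD_BYTES
    return word % WORDS_PER_WAY, word // WORDS_PER_WAY


@dataclass
class Way:
    """One way of a local buffer: one tag and one data word per line."""
    tags: list = field(default_factory=lambda: [None] * WORDS_PER_WAY)
    data: list = field(default_factory=lambda: [0] * WORDS_PER_WAY)

    def fill(self, addr, value):
        index, tag = split(addr)
        self.tags[index] = tag
        self.data[index] = value


def load_with_tag_compare(ways, base, offset):
    """Baseline path: calculate the address, index all ways, compare tags."""
    addr = base + offset                  # address calculation
    index, tag = split(addr)
    for way in ways:                      # done in parallel in hardware
        if way.tags[index] == tag:        # tag comparison
            return way.data[index]
    raise LookupError("local buffer miss")


def load_from_assigned_way(ways, way_of_load, load_id, index):
    """Optimized path (hypothetical model): the way is fixed per load
    instruction when the loop is mapped, so no tag comparison is needed."""
    return ways[way_of_load[load_id]].data[index]


if __name__ == "__main__":
    ways = [Way(), Way()]
    ways[0].fill(0x1000, 11)
    ways[1].fill(0x2004, 22)
    way_of_load = {"ld_a": 0, "ld_b": 1}  # hypothetical one-to-one mapping
    index, _ = split(0x2004)
    print(load_with_tag_compare(ways, 0x2000, 4))
    print(load_from_assigned_way(ways, way_of_load, "ld_b", index))
```

Both calls read the same word; the second one skips the adder and the per-way comparators, which is the kind of logic the abstract reports removing to shorten the load latency.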

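Bibliographic record ID

Source identifier type

NCID

Source identifier

AN10096105

Bibliographic information

研究報告計算機アーキテクチャ（ARC） (IPSJ SIG Technical Reports, Computer Architecture (ARC))

Vol. 2011-ARC-196, No. 18, pp. 1-6, issued 2011-07-20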

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Language

Publisher

Information Processing Society of Japan (情報処理学会)
