Acceleration of 8-bit integer CNN by reducing the size of external memory access from GPU

島村 光太郎 (Kotaro Shimamura); 鈴木 基也 (Motonari Suzuki)

https://ipsj.ixsq.nii.ac.jp/records/238258

Name / File	License	Action
IPSJ-DAS2024034.pdf (1.2 MB). Available for download from August 21, 2026.	Copyright (c) 2024 by the Information Processing Society of Japan
Non-members: ¥660, IPSJ members: ¥330, SLDM members: ¥0, DLIB members: ¥0

Item type

Symposium(1)

Publication date

2024-08-21

Title

8ビット整数化したCNNのGPUからの外部メモリアクセス容量削減による高速化手法

Title (English)

Acceleration of 8-bit integer CNN by reducing the size of external memory access from GPU

Language

jpn

Keywords

Subject scheme

Other

Subject

Machine learning

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

Author affiliation

(株)日立製作所 (Hitachi Ltd.)
(株)日立製作所 (Hitachi Ltd.)

Author name

島村, 光太郎 (Kotaro Shimamura)
鈴木, 基也 (Motonari Suzuki)

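Abstract

Description type

Other

Description

Convolutional neural networks (CNNs) are increasingly being put to practical use in tasks such as image and speech recognition. Because CNN processing is computationally intensive, many existing studies accelerate it with dedicated hardware; however, owing to development cost and limited flexibility, CNNs are also often processed in software on a GPU. On GPUs, a common way to speed up processing is to use narrow data types such as 16-bit floating-point numbers or 8-bit integers, yet the effective performance of CNN processing remains orders of magnitude below the peak performance of the arithmetic units. This paper proposes a method that reduces the volume of external memory access from the GPU, with the aim of reducing the processing time of YOLOv4, an object-detection CNN, when it is quantized to 8-bit integers. With 8-bit quantization, the convolutions can be computed in 8-bit integers, but the activation function contains an exponential function and must be computed in floating point, which increases the data size. The proposed method converts the results of the activation function to 8-bit integers before storing them in external memory, thereby reducing the volume of external memory access. In an evaluation using part of YOLOv4, the processing time was reduced by 22% compared with storing the results in external memory as floating-point numbers.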

Abstract (English)

Description type

Other

Description

CNNs (Convolutional Neural Networks) have been put into practical use in areas such as image recognition and voice recognition. Executing a CNN requires a massive amount of calculation, which has led to many studies on hardware accelerators for CNNs. On the other hand, because of the large development cost and the lack of flexibility of hardware accelerators, there are many applications in which the CNN is executed by software on a GPU. When executing CNNs on GPUs, small data types such as 16-bit floating-point numbers and 8-bit integers are often used to accelerate the calculation. Many GPUs achieve much higher peak performance for those data types than for larger data types, but the effective performance of the CNN calculation is much lower than the peak performance. In order to accelerate an 8-bit integer object detection CNN (YOLOv4), this paper proposes a method to reduce the size of external memory access from the GPU. When quantizing YOLOv4 to 8-bit integers, the convolution calculation can be executed in 8-bit integers. The activation function, however, has to be executed in floating point because it contains an exponential function, which increases the data size. In the proposed method, the results of the activation function are converted to 8-bit integers before being stored in the external memory, which reduces the size of the external memory access. Evaluation on a part of YOLOv4 resulted in a 22% reduction in execution time.
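
The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not show; it only illustrates, under stated assumptions, converting activation results to 8-bit integers before they are stored. The assumptions are: Mish is used as the exponential-containing activation (YOLOv4 uses Mish, but the abstract does not name the function), quantization is symmetric with a single per-tensor scale, and the accumulator values and scales are hypothetical. A real GPU version would perform these steps inside a kernel before writing to external memory.

```python
import math


def mish(x: float) -> float:
    """Mish activation, x * tanh(softplus(x)); softplus uses exp, so this runs in floating point."""
    return x * math.tanh(math.log1p(math.exp(x)))


def quantize_int8(values, scale):
    """Symmetric quantization: divide by scale, round to nearest, clamp to [-128, 127]."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Recover approximate floating-point values from int8 codes."""
    return [q * scale for q in q_values]


# Hypothetical int32 accumulators produced by an 8-bit integer convolution.
acc = [-900, -120, 0, 75, 640]
acc_scale = 0.01  # hypothetical scale mapping accumulators to real values

# The activation is evaluated in floating point.
act = [mish(a * acc_scale) for a in acc]

# Convert the activation results to int8 BEFORE they are stored.
out_scale = max(abs(v) for v in act) / 127  # assumed per-tensor scale
q_act = quantize_int8(act, out_scale)

# Storage cost of the intermediate tensor: 4 bytes per float32 vs 1 byte per int8.
bytes_fp32 = 4 * len(act)
bytes_int8 = 1 * len(q_act)
print(q_act, bytes_fp32, bytes_int8)
```

Under these assumptions the stored tensor is four times smaller than a 32-bit floating-point one, at the cost of a rounding error of at most half a quantization step per element. The 22% execution-time reduction reported in the abstract is the authors' measured result and is not reproduced by this sketch.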

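Bibliographic information

DAシンポジウム2024論文集 (DA Symposium 2024 Proceedings)

Vol. 2024, pp. 217-222, issue date 2024-08-21

Publisher

情報処理学会 (Information Processing Society of Japan)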

Versions

Ver.1

2025-01-19 08:37:18.700041
