Item type |
SIG Technical Reports(1) |
Publication date |
2023-11-28 |
Title |
An Efficient Sparse Matrix Storage Format for Sparse Matrix-Vector Multiplication and Sparse Matrix-Transpose-Vector Multiplication on GPUs |
Language |
en |
Keywords |
Subject scheme |
Other |
Subject |
Accelerator |
Resource type identifier |
http://purl.org/coar/resource_type/c_18gh |
Resource type |
technical report |
Author affiliation |
Japan Advanced Institute of Science and Technology |
Author affiliation |
Japan Advanced Institute of Science and Technology |
Author name |
Ryohei Izawa; Yasushi Inoguchi |
Abstract |
Description type |
Other |
Description |
The utilization of sparse matrix storage formats is widespread across various fields, including scientific computing, machine learning, and statistics. Within these domains, there is a need to perform Sparse Matrix-Vector Multiplication (SpMV) and Sparse Matrix-Transpose-Vector Multiplication (SpMVT) iteratively within a single application. However, executing SpMV and SpMVT on GPUs using existing sparse matrix storage formats presents challenges in terms of memory usage, memory access, and load balancing. In our study, we present a novel sparse matrix storage format named GCSB, designed specifically to optimize SpMV and SpMVT operations on GPUs through advanced memory compression techniques. Building on the existing CSB format, which supports CPU-based SpMV and SpMVT, we extend it to the GPU environment. This adaptation enables faster execution of SpMV and SpMVT than CSR by effectively utilizing the L1 cache and ensuring load balancing, while keeping the theoretical memory usage equivalent to that of CSR. Through our experiments, we demonstrate that GCSB matches the theoretical memory usage of CSR while outperforming CSR in speed on various matrices sourced from the University of Florida Sparse Matrix Collection. GCSB achieves a speedup of up to 1.47x on TITAN RTX and 2.75x on A100. Furthermore, we show that GCSB reduces L1 cache miss counts by strategically grouping and rearranging non-zero elements. Additionally, we conduct a qualitative assessment, confirming that GCSB performs particularly well when non-zero elements are widely dispersed throughout the matrix and the proportion of non-zero elements within the matrix is relatively high. |
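For readers skimming this record, the hedged sketch below illustrates the CSR baseline that the abstract compares GCSB against: a one-thread-per-row SpMV kernel, and an SpMVT kernel that, when only the CSR arrays are kept, must scatter into the output vector with atomics. It is a minimal illustration of the memory-access and load-balancing issues mentioned above, not the GCSB kernels described in the report; all names and parameters here are assumptions.

// Illustrative CUDA sketch (editor's hedged example): baseline CSR kernels for
// SpMV (y = A x) and SpMVT (y = A^T x) computed directly from the CSR arrays.
// Not the authors' GCSB implementation; kernel names, block size, and the
// atomic-scatter SpMVT strategy are assumptions for illustration only.
#include <cuda_runtime.h>

// One thread per row: each thread reads one row segment of val/col_idx and
// gathers from x. Rows with many non-zeros cause load imbalance.
__global__ void csr_spmv(int n_rows,
                         const int *row_ptr, const int *col_idx,
                         const double *val, const double *x, double *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        double sum = 0.0;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += val[j] * x[col_idx[j]];   // irregular gather from x
        y[row] = sum;
    }
}

// Transpose product on the same CSR arrays: every non-zero scatters into
// y[col_idx[j]], so atomics are required (double-precision atomicAdd needs
// compute capability >= 6.0; both TITAN RTX and A100 qualify). The scattered,
// atomic writes illustrate the memory-access problem the abstract refers to.
__global__ void csr_spmvt(int n_rows,
                          const int *row_ptr, const int *col_idx,
                          const double *val, const double *x, double *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        double xi = x[row];
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            atomicAdd(&y[col_idx[j]], val[j] * xi);
    }
}

// Example launch (y for SpMVT must be zero-initialized first):
//   int threads = 256, blocks = (n_rows + threads - 1) / threads;
//   csr_spmv <<<blocks, threads>>>(n_rows, d_row_ptr, d_col_idx, d_val, d_x, d_y);
//   csr_spmvt<<<blocks, threads>>>(n_rows, d_row_ptr, d_col_idx, d_val, d_x, d_yt);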
Bibliographic record ID |
Source identifier type |
NCID |
Source identifier |
AN10463942 |
Bibliographic information |
研究報告ハイパフォーマンスコンピューティング(HPC) (IPSJ SIG Technical Reports on High Performance Computing),
Vol. 2023-HPC-192,
No. 35,
pp. 1-6,
Issue date 2023-11-28 |
ISSN |
Source identifier type |
ISSN |
Source identifier |
2188-8841 |
Notice |
SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc. |
Publisher |
情報処理学会 (Information Processing Society of Japan) |