http://swrc.ontoware.org/ontology#Article
Performance Evaluation of Golub-Kahan-Lanczos Algorithm with Reorthogonalization by Classical Gram-Schmidt Algorithm and OpenMP
en
[オリジナル論文] subset computation of singular pairs, Golub-Kahan-Lanczos algorithm with reorthogonalization, classical Gram-Schmidt algorithm with reorthogonalization, OpenMP, sharedmemory multi-core processing
Research Group of Information and Communication Technology for Life, Nara Women's University
Graduate School of Informatics, Kyoto University
Graduate School of Informatics, Kyoto University
Graduate School of Informatics, Kyoto University
Graduate School of Informatics, Kyoto University
Graduate School of Informatics, Kyoto University
Masami Takata
Hiroyuki Ishigami
Kinji Kimura
Yuki Fujii
Hiroki Tanaka
Yoshimasa Nakamura
The Golub-Kahan-Lanczos algorithm with reorthogonalization (GKLR algorithm) is an algorithm for computing a subset of singular triplets of large-scale sparse matrices. The reorthogonalization tends to become a bottleneck of the execution time, as the iteration number of the GKLR algorithm increases. In this paper, OpenMP-based parallel implementation of the classical Gram-Schmidt algorithm with reorthogonalization (OMP-CGS2 algorithm) is introduced. The OMP-CGS2 algorithm has the advantage of data reusability and is expected to achieve higher performance of the reorthogonalization computations on shared-memory multi-core processors with large caches than the conventional reorthogonalization algorithms. Numerical experiments on shared-memory multi-core processors show that the OMP-CGS2 algorithm accelerates the GKLR algorithm more effectively for computing a subset of singular triplets of some matrices than the conventional reorthogonalization algorithms. In addition, we discuss the cache utilization in the OMP-CGS2 algorithm and a condition that the OMP-CGS2 algorithm achieves higher performance than the CGS2 algorithm.
The Golub-Kahan-Lanczos algorithm with reorthogonalization (GKLR algorithm) is an algorithm for computing a subset of singular triplets of large-scale sparse matrices. The reorthogonalization tends to become a bottleneck of the execution time, as the iteration number of the GKLR algorithm increases. In this paper, OpenMP-based parallel implementation of the classical Gram-Schmidt algorithm with reorthogonalization (OMP-CGS2 algorithm) is introduced. The OMP-CGS2 algorithm has the advantage of data reusability and is expected to achieve higher performance of the reorthogonalization computations on shared-memory multi-core processors with large caches than the conventional reorthogonalization algorithms. Numerical experiments on shared-memory multi-core processors show that the OMP-CGS2 algorithm accelerates the GKLR algorithm more effectively for computing a subset of singular triplets of some matrices than the conventional reorthogonalization algorithms. In addition, we discuss the cache utilization in the OMP-CGS2 algorithm and a condition that the OMP-CGS2 algorithm achieves higher performance than the CGS2 algorithm.
AA11464803
情報処理学会論文誌数理モデル化と応用（TOM）
9
2
1-8
2016-08-10
1882-7780