Title: LU factorization on Cypress GPU
Language: eng
Keyword: Architecture
Permalink: http://id.nii.ac.jp/1001/00107706/
Type: Conference Paper
Full text: https://ipsj.ixsq.nii.ac.jp/ej/?action=repository_action_common_download&item_id=107730&item_no=1&attribute_id=1&file_no=1
Rights: Copyright (c) 2011 by the Information Processing Society of Japan
Authors: 酒井智哉, 松本和也, 中里直人, Stanislav Sedukhin (all University of Aizu)

Abstract:
LU factorization is an important part of many practical problems that are based on solving systems of linear equations. We present performance results for LU factorization on the Cypress GPU architecture. A Cypress GPU can compute 320 fused multiply-add (FMA) operations per cycle in double-precision floating point. The clock frequency of the fastest Cypress GPU is 850 MHz, i.e., its peak performance is 544 Gflop/s (one FMA operation counts as 2 flops). Most of the computation in LU factorization is spent in General Matrix Multiply (GEMM). Our implementation of GEMM on the Cypress GPU runs at close to 80% of the peak. Our current implementation of LU factorization achieves 379 Gflop/s (69% of the peak), which is the fastest among existing one-chip GPU implementations.

Source: Proceedings of the 73rd National Convention (第73回全国大会講演論文集), NCID AN00349328, vol. 2011, no. 1, pp. 205-206
Dates: 2011-03-02, 2014-12-17
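
The peak-performance arithmetic in the abstract, and its remark that LU factorization is dominated by GEMM, can be checked with a small sketch. This is not the authors' GPU code: the function names `peak_gflops` and `blocked_lu` are hypothetical, and the factorization is a plain right-looking blocked LU without pivoting, chosen only to show where the matrix-multiply work sits (production LU codes use partial pivoting).

```python
def peak_gflops(fma_per_cycle, clock_mhz, flops_per_fma=2):
    """Peak rate in Gflop/s: FMA operations per cycle * flops per FMA * clock in GHz."""
    return fma_per_cycle * flops_per_fma * clock_mhz / 1000.0


def blocked_lu(a, nb):
    """In-place right-looking blocked LU without pivoting (illustrative only).

    `a` is a square list-of-lists matrix and `nb` is the block size.
    On return, the strict lower triangle of `a` holds L (unit diagonal
    implied) and the upper triangle holds U.
    Returns (flops in the GEMM trailing update, total flops).
    """
    n = len(a)
    gemm_flops = 0
    total_flops = 0
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # Panel factorization: unblocked LU on columns k..e-1, rows k..n-1.
        for j in range(k, e):
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]
                total_flops += 1
                for c in range(j + 1, e):
                    a[i][c] -= a[i][j] * a[j][c]
                    total_flops += 2
        # Row-block update: forward substitution with the unit lower
        # triangular diagonal block to obtain U12.
        for j in range(k, e):
            for i in range(j + 1, e):
                for c in range(e, n):
                    a[i][c] -= a[i][j] * a[j][c]
                    total_flops += 2
        # Trailing update A22 -= L21 * U12: this is the GEMM step.
        for i in range(e, n):
            for c in range(e, n):
                s = 0.0
                for j in range(k, e):
                    s += a[i][j] * a[j][c]
                a[i][c] -= s
                gemm_flops += 2 * (e - k)
                total_flops += 2 * (e - k)
    return gemm_flops, total_flops


# Figures quoted in the abstract: 320 FMA/cycle at 850 MHz.
peak = peak_gflops(320, 850)  # 544.0
lu_fraction_of_peak = 379 / peak  # measured LU rate relative to peak
```

Calling `blocked_lu` on a diagonally dominant test matrix returns the two flop counters, and their ratio shows the share of arithmetic that lands in the GEMM-shaped trailing update. That share grows as the matrix gets larger relative to the block size, which is why the speed of GEMM largely sets the speed of the whole factorization.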