Performance Improvement Techniques in Tightly Coupled Multicore Architectures for Single-Thread Applications
https://ipsj.ixsq.nii.ac.jp/records/189999
Copyright (c) 2018 by the Information Processing Society of Japan
Open access
Item type | Journal(1) |
---|---|
Publication date | 2018-06-15 |
Title | Performance Improvement Techniques in Tightly Coupled Multicore Architectures for Single-Thread Applications |
Title language | en |
Language | eng |
Subject scheme | Other |
Subject | [Regular Paper] multicore, single-thread performance |
Resource type identifier | http://purl.org/coar/resource_type/c_6501 |
Resource type | journal article |
Author affiliation | Department of Electrical Engineering and Computer Science, Nagoya University / Presently with Okuma Corporation |
Author affiliation | Department of Information and Communication Engineering, Nagoya University |
Author affiliation | Department of Information and Communication Engineering, Nagoya University |
Authors | Keita, Doi; Ryota, Shioya; Hideki, Ando |
Abstract | |
Description type | Other |
Description | Current multicore processors achieve high throughput by executing multiple independent programs in parallel. However, it is difficult to utilize multiple cores effectively to reduce the execution time of a single program. This is due to a variety of problems, including slow inter-thread communication and high-overhead thread creation. Dramatic improvements in the single-core architecture have reached their limit; thus, it is necessary to effectively use multiple cores to reduce single-program execution time. Tightly coupled multicore architectures provide a potential solution because of their very low-latency inter-thread communication and very lightweight thread creation. One such multicore architecture called SKY has been proposed. SKY has shown its effectiveness in multithreaded execution of a single program, but several problems must be overcome before further performance improvements can be achieved. The problems this paper focuses on are as follows: 1) The SKY compiler partitions programs at a basic block level, but does not explore the inside of basic blocks. This misses an opportunity to find good partitioning. 2) The SKY processor always sequentializes a new thread if the forking core in which it is supposed to be created is busy. However, this is not necessarily a good decision. 3) If the execution of register communication instructions among cores is delayed, the other register communication instructions can be delayed, causing the following thread execution to stall. This situation occurs when the instruction window of a core becomes full. To address these problems, we propose the following three software and hardware techniques: 1) Instruction-level thread partitioning: the compiler explores the inside of basic blocks to find a better program partition. 2) Selective thread creation: the hardware selectively sequentializes or waits for the creation of a new thread to achieve better performance. 3) Automatic register communication: register communication is automatically performed by a small amount of hardware support instead of using instruction window resources. We evaluated the performance of SKY using SPEC2000 benchmark programs. Results on four cores show that the proposed techniques improved performance by 4% and 26% on average (maximum of 11% and 206%) for SPECint2000 and SPECfp2000 programs, respectively, compared with the case where the proposed techniques are not applied. As a result, performance improvements of 1.21 and 1.93 times on average (maximum of 1.52 and 3.30 times) were achieved, respectively, compared with the performance of a single core. |
Note | This is a preprint of an article intended for publication in the Journal of Information Processing (JIP). This preprint should not be cited. This article should be cited as: Journal of Information Processing Vol.26(2018) (online) DOI http://dx.doi.org/10.2197/ipsjjip.26.445 |
Bibliographic record ID | |
Source identifier type | NCID |
Source identifier | AN00116647 |
Bibliographic information | IPSJ Journal (情報処理学会論文誌), Vol. 59, No. 6, issued 2018-06-15 |
ISSN | |
Source identifier type | ISSN |
Source identifier | 1882-7764 |