# Prediction Switching for Photonic Network-on-chip

CISSE AHMADOU DIT ADI,<sup>†1</sup> HIROKI MATSUTANI,<sup>†2</sup> MICHIHIRO KOIBUCHI,<sup>†3</sup> SAYAKA AKIOKA<sup>†1</sup> and TSUTOMU YOSHINAGA<sup>†1</sup>

Scalable network-on-chip (NoC) infrastructure faces critical challenges in meeting the enormous bandwidth, low-latency, and low-power requirements of future multiprocessors. With the development of nanophotonics, optical interconnection has been proposed as a potential replacement for electric NoCs, whose electric components consume a large amount of power. However, buffering optical data remains difficult to implement at the chip level. In this paper, we discuss a hybrid photonic NoC (HPNoC) that exploits optical properties to improve bandwidth and uses an electric network to set up circuits, thereby avoiding the buffering of optical data. The crucial performance factor of the HPNoC is the time needed to set up an optical circuit through the electric NoC. We propose to apply prediction switching to the electric NoC to reduce the path setup time of the HPNoC. Estimation results show that, even for small data sizes (14 bytes), the HPNoC with prediction switching achieves lower latency than the original HPNoC. The HPNoC also achieves lower latency than a conventional electric NoC, even for small data transfers.

## 1. Introduction

The growing number of cores on a single chip increases the complexity of CMPs that use an electric network-on-chip (NoC), and it will become difficult to achieve higher bandwidth and low latency within the available power budget.<sup>1)</sup>

The development of nanophotonics technology and 3D stacking with TSVs (through-silicon vias) has made it possible to integrate photonic elements such as modulators, receivers, waveguides, and switches using standard CMOS processes. However, using optical interconnects at the chip level still poses a number of challenges; in particular, buffering and processing optical data in flight remain difficult. Despite this difficulty, many researchers consider optical interconnection a candidate for future on-chip interconnects. D. Vantrease et al. proposed a 3D many-core architecture that uses nanophotonic communication for both inter-core and off-stack communication<sup>2)</sup>, and A. Shacham et al. proposed a hybrid NoC design that combines a photonic transmission layer with an electronic control layer, showing that high bandwidth can be achieved at drastically lower power consumption than with a fully electric NoC<sup>3)</sup>. Given the current state of the art, electric packet switching overlaid with circuit-switched optical interconnection is a promising way to improve bandwidth at low power. As in any circuit-switched network, however, path setup latency is critical to overall communication performance.

In this paper, we discuss a hybrid photonic NoC (HPNoC) that exploits optical properties to improve bandwidth and uses an electric network to set up circuits, thereby avoiding the buffering of optical data. We propose to apply prediction switching to the electric NoC to reduce the path setup time of the optical NoC.

The rest of this paper is organized as follows. Section 2 describes existing optical NoCs. Section 3 describes the HPNoC and its prediction switching, and Section 4 estimates the resulting performance gain. Section 5 presents related work. Section 6 concludes the paper and outlines our future research directions.

## 2. Optical Network-on-Chip

Optical networks have been widely used in off-chip interconnection networks, such as wide area networks (WANs) and local area networks (LANs). Borrowing from these architectures, recent advances in nanophotonics technology have made it possible to develop optical devices (*waveguides, micro-resonators*) for building optical interconnects between IPs at the chip level. Several studies have shown that, with these devices, highly reliable and scalable interconnects at higher bit rates will be possible in future complex SoC and CMP designs. Optical NoCs offer lower power consumption and higher bandwidth than electric NoCs<sup>3)</sup>. Unlike in electric NoCs, however, packet switching on optical NoCs is complex and quite expensive to design, because of the opto-electrical and electro-optical conversions needed at each node along the path.

<sup>†1</sup> University of Electro-Communications

<sup>†2</sup> University of Tokyo

<sup>†3</sup> National Institute of Informatics

Thus, the hybrid photonic NoC (HPNoC), which uses an electric NoC to set up paths on a fully photonic NoC, has received wide attention as a future high-bandwidth NoC for CMPs. The optical properties improve the bandwidth, while the electric network sets up the optical circuit, avoiding the buffering of optical data at intermediate routers.

## 3. Prediction Switching

In this section, we focus on the HPNoC architecture and propose to apply prediction switching to it in order to reduce the circuit setup latency of the optical NoC.

### 3.1 Behavior of Hybrid Photonic NoC (HPNoC)

The HPNoC consists of a circuit-switched optical NoC and an electric control NoC that sets up the paths of the optical circuits. Figure 1 shows a $4 \times 4$ mesh HPNoC topology.

At the beginning of a communication, a setup packet is sent from the source to the destination over the electric control network to establish a path. An acknowledgment signal is then sent back to the source to notify it that the path has been established, and the data transfer takes place. When no more data need to be sent, a teardown packet is sent by the destination node to release the circuit. The end-to-end latency of this type of communication can be roughly estimated as follows:

$$latency_{end-to-end} = latency_{path \ setup} + latency_{data \ transfer} \tag{1}$$

where $latency_{path\ setup}$ is the latency to set up the circuit and $latency_{data\ transfer}$ is the latency of the data transfer. In the HPNoC, the path setup latency sometimes dominates the overall latency<sup>9)</sup>, so reducing it with prediction switching can strongly improve overall performance.
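As a rough illustration of Eq. (1), the following Python sketch models one circuit-switched transfer as a setup phase, an acknowledgment phase, and a data phase. The class name `CircuitTransfer`, its fields, and the 20 ns setup and acknowledgment times are hypothetical values chosen only to show how the setup term can dominate for small payloads; the 960 Gbps optical bandwidth is the value used in Section 4, and the teardown is left out because Eq. (1) does not include it.

```python
from dataclasses import dataclass


@dataclass
class CircuitTransfer:
    """Hypothetical phase timings of one HPNoC circuit-switched transfer."""
    setup_ns: float         # setup packet: source -> destination on the electric NoC
    ack_ns: float           # acknowledgment: destination -> source
    data_bits: int          # payload size in bits
    optical_bw_gbps: float  # optical bandwidth; bits / Gbps yields ns

    def path_setup_latency(self) -> float:
        # The source may start sending only after the acknowledgment returns.
        return self.setup_ns + self.ack_ns

    def data_transfer_latency(self) -> float:
        return self.data_bits / self.optical_bw_gbps

    def end_to_end_latency(self) -> float:
        # Eq. (1): path setup latency + data transfer latency (teardown excluded).
        return self.path_setup_latency() + self.data_transfer_latency()

    def setup_fraction(self) -> float:
        return self.path_setup_latency() / self.end_to_end_latency()


example = CircuitTransfer(setup_ns=20.0, ack_ns=20.0, data_bits=14 * 8,
                          optical_bw_gbps=960.0)
print(f"end-to-end: {example.end_to_end_latency():.3f} ns, "
      f"setup share: {example.setup_fraction():.1%}")
```

With a 14-byte payload, almost all of the assumed end-to-end time is spent setting up the path, which is the term prediction switching targets.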

### 3.2 Prediction Switching Architecture

To reduce the communication latency of interconnection networks, various router architectures have recently been developed. A speculative router performs different pipeline stages of a packet transfer speculatively in parallel<sup>4)</sup>. The look-ahead routing technique removes the control dependency between routing computation and virtual-channel allocation so that they can be performed in parallel<sup>5)</sup>. In addition, aggressive low-latency router architectures that bypass one or more pipeline stages for specific packet transfers have been proposed<sup>6),7)</sup>.

Fig. 1 A 4×4 mesh HPNoC

As yet another low-latency router architecture, we have proposed the prediction router, which predicts the output channel to be used by the next packet transfer and speculatively completes switch arbitration. In prediction routers, incoming packets are transferred without waiting for routing computation and switch arbitration if the prediction hits<sup>8)</sup>. As shown in Figure 2, a prediction router adds a predictor, which forecasts the next packet's output port before the packet arrives, to an original router consisting of input/output ports, a control circuit, and a crossbar switch.

A conventional router divides packet processing into several pipeline stages, such as routing computation (RC), output virtual-channel allocation (VA), switch allocation (SA), and switch traversal (ST). For ease of understanding, we explain the prediction mechanism using the behavior of a conventional 4-stage router. Figure 3 shows the pipeline diagrams for transferring a packet through the 4-stage router.

With the prediction mechanism, the RC, VA, and SA stages can be processed before a packet arrives at the input buffers. The prediction mechanism then performs the prediction switch traversal (PST) stage as soon as the packet is received, hiding the delays of the RC, VA, and SA stages. The prediction mechanism drastically reduces the packet delay at a router: a 1-stage flit transfer is achieved if the prediction succeeds. Even if the prediction fails, the packet incurs no miss penalty, and the normal 4-stage flit transfer is performed. Prediction switching can therefore be applied to the electric control network of the HPNoC to reduce the circuit setup latency; the following section estimates the performance gain.

Fig. 3 Pipeline stages of conventional and prediction router
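The hit/miss behavior described above can be sketched as a toy timing model of one router input. This is not the authors' hardware design; the class `PredictionRouterModel`, the port names, and the stage counts (1 stage for PST, 4 stages for RC/VA/SA/ST) are illustrative assumptions taken from the 4-stage description in this section.

```python
from typing import Optional

CONVENTIONAL_STAGES = 4  # RC, VA, SA, ST when the prediction misses or is absent
PREDICTED_STAGES = 1     # PST only, when the prediction hits


class PredictionRouterModel:
    """Toy timing model of one input port of a prediction router (hypothetical)."""

    def __init__(self) -> None:
        self.predicted_port: Optional[str] = None

    def predict(self, port: str) -> None:
        # The predictor chooses an output port before the packet arrives;
        # RC, VA, and SA for that port are assumed to complete in the meantime.
        self.predicted_port = port

    def on_arrival(self, actual_port: str) -> int:
        """Return the pipeline stages the head flit spends in this router."""
        hit = self.predicted_port == actual_port
        self.predicted_port = None  # the prediction is consumed by this packet
        # A miss falls back to the normal pipeline, so there is no extra penalty.
        return PREDICTED_STAGES if hit else CONVENTIONAL_STAGES


router = PredictionRouterModel()
router.predict("east")
print(router.on_arrival("east"))   # hit -> 1
router.predict("east")
print(router.on_arrival("north"))  # miss -> 4
print(router.on_arrival("north"))  # no prediction -> 4
```

The key design point the model captures is that a miss costs the same as the conventional pipeline, so prediction can only shorten, never lengthen, the per-router delay.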

## 4. Performance Estimation

In this section, we briefly compare the HPNoC and conventional purely electric NoCs, with and without prediction switching, in terms of latency, throughput, and zero-load (ideal) latency for various data sizes.

### 4.1 Performance Estimation Environment

A flit-level simulator written in C++ was used to measure throughput and latency. Each router has three, four, or five ports, and a single processing element (PE) is connected to each router. The PEs inject packets independently of each other.

In the electric NoCs, a 16-bit packet header is reserved, consisting of the source (4 bits), destination (4 bits), and length (8 bits) fields.
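The 16-bit header layout can be illustrated by packing the three fields into one word. The field widths come from the text; the bit order (source in the top four bits, length in the bottom eight) and the helper names `pack_header` and `unpack_header` are assumptions for illustration. Note that a 4-bit field can distinguish at most 16 node IDs and an 8-bit field at most 255 length units.

```python
SRC_BITS, DST_BITS, LEN_BITS = 4, 4, 8  # field widths from the text (16 bits total)


def pack_header(src: int, dst: int, length: int) -> int:
    """Pack source, destination, and length into one 16-bit header word.

    The bit order (source in the most significant bits) is an assumption.
    """
    for name, value, bits in (("src", src, SRC_BITS),
                              ("dst", dst, DST_BITS),
                              ("length", length, LEN_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (src << (DST_BITS + LEN_BITS)) | (dst << LEN_BITS) | length


def unpack_header(word: int) -> tuple[int, int, int]:
    """Recover (src, dst, length) from a header word built by pack_header."""
    src = (word >> (DST_BITS + LEN_BITS)) & ((1 << SRC_BITS) - 1)
    dst = (word >> LEN_BITS) & ((1 << DST_BITS) - 1)
    length = word & ((1 << LEN_BITS) - 1)
    return src, dst, length


header = pack_header(src=3, dst=12, length=129)
print(hex(header), unpack_header(header))
```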

In the prediction router, pipeline stages are skipped only when the prediction hits, so the choice of prediction algorithm is the primary concern for reducing communication latency. In the evaluation, we use static straight (SS), a simple and practical prediction algorithm optimized for dimension-order routing on meshes and tori. It assumes that every incoming packet continues along the same dimension. For example, with dimension-order routing on a 2-D mesh, the SS predictor fails at most twice per packet traversal: once when the packet turns from the x-dimension to the y-dimension, and once at ejection to the destination core. Therefore, packets that travel long distances achieve higher prediction hit rates, whereas communication locality lowers the hit rate of the SS predictor.
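The at-most-two-misses property of SS can be checked with a short Python sketch. It assumes XY dimension-order routing on a mesh, models each router only by the output port it selects, and does not model the source router, where the packet is injected by the local PE and there is no incoming direction to continue; the function names are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int]

# Output-port directions; "LOCAL" denotes ejection to the attached core.
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def xy_route(src: Coord, dst: Coord) -> List[Tuple[Coord, str]]:
    """Return (router, output_port) for each router on an X-then-Y route."""
    (x, y), (dx, dy) = src, dst
    hops: List[Tuple[Coord, str]] = []
    while (x, y) != (dx, dy):
        if x != dx:
            port = "E" if dx > x else "W"
        else:
            port = "N" if dy > y else "S"
        hops.append(((x, y), port))
        x, y = x + STEP[port][0], y + STEP[port][1]
    hops.append(((dx, dy), "LOCAL"))
    return hops


def static_straight(src: Coord, dst: Coord) -> Tuple[int, int]:
    """Count SS (hits, misses) at every router after the source."""
    hops = xy_route(src, dst)
    hits = misses = 0
    for (_, incoming), (_, actual) in zip(hops, hops[1:]):
        # The packet arrived travelling in direction `incoming`;
        # SS predicts that it leaves through the port that continues straight.
        if actual == incoming:
            hits += 1
        else:
            misses += 1
    return hits, misses


print(static_straight((0, 0), (7, 7)))  # long route on an 8x8 mesh
print(static_straight((0, 0), (1, 0)))  # neighbouring node
```

The corner-to-corner route on an 8×8 mesh gives 12 hits and 2 misses, while a one-hop neighbour gives no hits at all, which matches the locality argument above.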

For a fair comparison between the HPNoC and the conventional electric NoC, we set the packet length in the electric NoC as follows:

$$Packet\_length = (Data\_size + 16)/Bit\_width \tag{2}$$

where *Bit\_width* is 16 in the HPNoC, and either 16 or 64 in the conventional NoC.
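Eq. (2) can be evaluated for the three data sizes used in Figs. 4 to 6. This sketch assumes that *Data\_size* is expressed in bits (it is added to the 16 header bits) and that the result is rounded up to a whole number of flits; both the rounding and the function name `packet_length_flits` are assumptions.

```python
import math

HEADER_BITS = 16  # source (4) + destination (4) + length (8)


def packet_length_flits(data_size_bytes: int, bit_width: int) -> int:
    """Eq. (2) with Data_size in bits, rounded up to a whole number of flits."""
    data_bits = data_size_bytes * 8
    return math.ceil((data_bits + HEADER_BITS) / bit_width)


for size in (2, 14, 256):  # data sizes of Figs. 4 to 6
    print(size, "bytes:",
          packet_length_flits(size, 16), "flits at 16 bits,",
          packet_length_flits(size, 64), "flits at 64 bits")
```

For example, a 14-byte payload needs 8 flits on 16-bit links but only 2 flits on 64-bit links.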

Each host injects packets at the same synchronized interval, producing bursty traffic like that of most scientific applications. We employ a 3-cycle router that combines the VA and SA stages of the 4-cycle router shown in Figure 3, since Matsutani et al.<sup>8)</sup> reported that the 4-cycle and 3-cycle routers operate at the same frequency. The other parameters are listed in Table 1.

Table 1 Simulation parameters

| Parameter | Value |
|-----------|-------|
| | 1 |
| Switching | Wormhole |
| Topology | $8 \times 8$ mesh |
| Routing | Dimension-order routing |
| Traffic pattern | Uniform |

Since HPNoCs and optical NoCs are rapidly developing technologies, there are various implementations. Here, we roughly estimate the latency of the optical NoC by assuming that its circuit is established as soon as a single-flit setup packet arrives at the destination over the electric control NoC, without ACK/NACK flow control. Once the connection is established, the data are transferred directly to the destination without buffering in the optical NoC.

Since the flit-level simulator outputs the accepted traffic (flits/cycle/node) and the latency (cycles), we estimate the latency and the accepted traffic from the simulation results as follows:

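$$lat_{HPNoC}\,(\mathrm{ns}) = sim_{lat} / 0.5\,(\mathrm{GHz}) + data\_size / 960\,(\mathrm{Gbps}) \tag{3}$$

$$lat_{el}\,(\mathrm{ns}) = sim_{lat} / 0.532\,(\mathrm{GHz}) \tag{4}$$

$$Trf_{HPNoC}\,(\mathrm{Gbps/node}) = sim_{trf} \times 0.5\,(\mathrm{GHz}) \times data\_size \tag{5}$$

$$Trf_{el}\,(\mathrm{Gbps/node}) = sim_{trf} \times (payload/pkt\_size) \times 0.532\,(\mathrm{GHz}) \times bitwidth \tag{6}$$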

The 0.5 GHz clock frequency of the prediction router is taken from our previously published results. The conventional router runs at a 6.4% higher frequency (0.532 GHz), as reported in<sup>8)</sup>. The 960 Gbps bandwidth of the optical NoC is borrowed from<sup>10)</sup>.
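The conversions in Eqs. (3) to (6) can be written as small Python helpers. We read the cycle-to-time conversion as dividing the simulated cycle count by the clock frequency (cycles / GHz = ns), take `data_size` in bits, and express the payload fraction of Eq. (6) as a ratio of bits; these readings, the function names, and the sample simulator outputs below are assumptions for illustration.

```python
F_PRED_GHZ = 0.5           # prediction router clock frequency (from the text)
F_CONV_GHZ = 0.5 * 1.064   # conventional router: 6.4% higher, i.e. 0.532 GHz
OPTICAL_BW_GBPS = 960.0    # optical NoC bandwidth (from the text)


def lat_hpnoc_ns(sim_lat_cycles: float, data_bits: float,
                 f_ghz: float = F_PRED_GHZ) -> float:
    # Eq. (3): setup time on the electric control NoC plus optical serialization.
    return sim_lat_cycles / f_ghz + data_bits / OPTICAL_BW_GBPS


def lat_el_ns(sim_lat_cycles: float, f_ghz: float = F_CONV_GHZ) -> float:
    # Eq. (4): cycles divided by GHz gives nanoseconds.
    return sim_lat_cycles / f_ghz


def trf_hpnoc_gbps(sim_trf: float, data_bits: float,
                   f_ghz: float = F_PRED_GHZ) -> float:
    # Eq. (5): each accepted single-flit setup packet releases data_bits of optical data.
    return sim_trf * f_ghz * data_bits


def trf_el_gbps(sim_trf: float, payload_bits: float, pkt_bits: float,
                bit_width: int, f_ghz: float = F_CONV_GHZ) -> float:
    # Eq. (6): only the payload share of each accepted flit counts as useful traffic.
    return sim_trf * (payload_bits / pkt_bits) * f_ghz * bit_width


# Hypothetical simulator outputs for a 14-byte (112-bit) transfer.
print(f"{lat_hpnoc_ns(20, 112):.3f} ns, {lat_el_ns(20):.3f} ns")
print(f"{trf_hpnoc_gbps(0.1, 112):.2f} Gbps/node, "
      f"{trf_el_gbps(0.5, 112, 128, 16):.3f} Gbps/node")
```

The optional `f_ghz` parameter makes it easy to apply the same conversion to a variant that uses the other router frequency.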

### 4.2 Analysis Results

#### 4.2.1 Latency and Throughput



Fig. 4 Throughput and latency, two-byte data transfer

Figures 4, 5, and 6 show the estimated latency and throughput, where throughput is the maximum accepted traffic. "Conv, 16 bit" denotes an electric NoC with 16-bit-wide links, while "Hybrid (pred)" denotes the HPNoC that employs prediction routers.

These results show that the prediction router strongly improves throughput and latency in both the electric NoC and the HPNoC. In particular, even for small data transfers (two bytes), the HPNoC drastically increases throughput, although the bandwidth of each optical link cannot be fully used. Employing the prediction router in the electric NoC also increases its throughput.

The HPNoC achieves lower latency than the electric NoC by making the best use of its high-bandwidth optical links. These figures also show that, although widening the bit-width of the electric NoC links increases its bandwidth, the improvement is much smaller than that achieved by the HPNoC.

Fig. 5 Throughput and latency, 14-byte data transfer

Fig. 6 Throughput and latency, 256-byte data transfer

#### 4.2.2 Zero-load Latency

Assuming that a packet of L flits, including a single header flit, traverses h wormhole routers, its zero-load latency is calculated as

$$T_0^{orig} = T_{lt}(h-1) + (T_{rc} + T_{vsa} + T_{st})\,h + data\_size/BW \tag{7}$$

where $T_{rc}$, $T_{vsa}$, $T_{st}$, and $T_{lt}$ are the latencies of the RC, VSA (combined VA and SA), ST, and link traversal (LT) stages, respectively.
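A minimal Python sketch of Eq. (7) follows. It assumes all stage latencies are given in nanoseconds, takes one stage as one cycle at 0.5 GHz (2 ns), and uses a hypothetical route of h = 6 routers with a 14-byte payload at 960 Gbps; the function name and these parameter values are illustrative only.

```python
def zero_load_latency_orig(h: int, t_lt: float, t_rc: float, t_vsa: float,
                           t_st: float, data_bits: float, bw_gbps: float) -> float:
    """Eq. (7): zero-load latency through h conventional routers.

    Stage latencies are in ns; data_bits / bw_gbps is the serialization time in ns.
    """
    if h < 1:
        raise ValueError("a packet traverses at least one router")
    links = t_lt * (h - 1)               # h routers are joined by h - 1 links
    routers = (t_rc + t_vsa + t_st) * h  # full RC, VSA, ST pipeline at every router
    serialization = data_bits / bw_gbps
    return links + routers + serialization


CYCLE_NS = 2.0  # assumed: one stage = one cycle at 0.5 GHz
t0 = zero_load_latency_orig(h=6, t_lt=CYCLE_NS, t_rc=CYCLE_NS, t_vsa=CYCLE_NS,
                            t_st=CYCLE_NS, data_bits=14 * 8, bw_gbps=960.0)
print(f"{t0:.3f} ns")
```

Under these assumptions the router pipelines (36 ns) dominate the link (10 ns) and serialization (about 0.12 ns) terms.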

The prediction router skips the RC and VSA stages only when the prediction hits. Therefore, its zero-load latency is calculated as

$$T_0^{pred} = T_{lt}(h-1) + T_{pst}\,h\,P_{hit} + (T_{rc} + T_{vsa} + T_{st})\,h\,(1 - P_{hit}) + data\_size/BW \tag{8}$$

where $T_{pst}$ is the latency of the prediction switch traversal (PST) stage and $P_{hit}$ is the prediction hit probability. For the HPNoC, we let L be 1 in this rough estimation.
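Eq. (8) can be sketched the same way. The sketch assumes the same hypothetical values as a worked example of Eq. (7) would use (2 ns per stage, h = 6, 14 bytes at 960 Gbps) and sets $T_{pst}$ to one stage; sweeping $P_{hit}$ shows how the per-router cost moves between the full pipeline and the PST stage. The function name and all numeric values are assumptions.

```python
def zero_load_latency_pred(h: int, t_lt: float, t_rc: float, t_vsa: float,
                           t_st: float, t_pst: float, p_hit: float,
                           data_bits: float, bw_gbps: float) -> float:
    """Eq. (8): expected zero-load latency through h prediction routers (ns)."""
    if h < 1:
        raise ValueError("a packet traverses at least one router")
    if not 0.0 <= p_hit <= 1.0:
        raise ValueError("p_hit must be a probability")
    links = t_lt * (h - 1)
    hit_part = t_pst * h * p_hit                          # PST only on a hit
    miss_part = (t_rc + t_vsa + t_st) * h * (1.0 - p_hit)  # full pipeline on a miss
    return links + hit_part + miss_part + data_bits / bw_gbps


CYCLE_NS = 2.0  # assumed: one stage = one cycle at 0.5 GHz
for p in (0.0, 0.5, 1.0):
    t = zero_load_latency_pred(h=6, t_lt=CYCLE_NS, t_rc=CYCLE_NS, t_vsa=CYCLE_NS,
                               t_st=CYCLE_NS, t_pst=CYCLE_NS, p_hit=p,
                               data_bits=14 * 8, bw_gbps=960.0)
    print(f"P_hit={p:.1f}: {t:.3f} ns")
```

With $P_{hit} = 0$ the expression reduces exactly to Eq. (7), and $P_{hit} = 1$ corresponds to the perfect (oracle) prediction case.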

Figure 7 shows the zero-load latencies of the electric NoC (Conv), the HPNoC (Hybrid), and the HPNoC with perfect (oracle) prediction (Hybrid (ideal)). The figure demonstrates that prediction routers in the electric NoCs drastically improve the latency of all NoCs across various data sizes. HPNoCs and other photonic NoCs are usually valued for their extremely high bandwidth in large data transfers, such as megabytes. However, these results show that the HPNoC is also an efficient NoC infrastructure in terms of latency, and that it should be used even for small data sizes.



## 5. Related Work

Traditionally, circuit switching has been used for low-latency bulk data transfers. It allocates the crossbar switches along the path to the destination node before data flits are actually sent out. Once the crossbars are allocated, every data flit can be transferred with minimal latency (e.g., 1 cycle per hop). However, the setup operation that allocates the crossbars along the route incurs significant latency, which sometimes outweighs the benefit. Our proposed prediction switching can considerably reduce this path setup latency.

To exploit circuit switching efficiently, hybrid router architectures that have both wormhole and circuit switches have been proposed<sup>11),12)</sup>. In<sup>12)</sup>, the wormhole and circuit switches share the same physical data links. To remove the setup latency of circuit switching, the data flits are piggybacked immediately behind the setup flit. If no unused circuit planes are available, the data flits leave the circuit and use the packet-switched pipeline until they reach their destination. In this router, both the circuit-switched and packet-switched pipelines are implemented in every physical port, in addition to a tiny setup network for circuit switching.

An optical dimension-order router (ODOR) has been proposed in<sup>13)</sup>. ODOR exploits dimension-order routing (for example, under YX routing, a packet arriving from the X direction cannot turn into the Y dimension) to reduce the cost of the optical router, using fewer microresonators and waveguides than the architectures proposed in<sup>2),14)</sup>. This power saving suits our proposed prediction switching for the HPNoC, since it can offset the small power overhead that the predictor adds compared with a conventional router.

## 6. Conclusions

With the development of nanophotonics, optical interconnection has been proposed as a potential replacement for electric NoCs, whose electric components consume a large amount of power. However, buffering and processing optical data remain difficult to implement at the chip level.

In this paper, we discussed a hybrid photonic NoC (HPNoC) that exploits optical properties to improve bandwidth and uses an electric network to set up circuits, thereby avoiding the buffering of optical data. The crucial performance factor of the HPNoC is the time needed to set up an optical circuit through the electric NoC.

Reducing the latency of the electric NoC can lead to considerable gains in the overall throughput and latency of the HPNoC. Even for small data transfers, such as 14 bytes, the HPNoC drastically increases throughput, although the bandwidth of each optical link cannot be fully used. Employing the prediction router in the electric NoC also increases its throughput.

In future work, we plan to develop a more accurate simulation of the HPNoC with prediction switching. We have also identified several interesting areas of investigation for future optical NoC design, such as dense wavelength-division multiplexing (DWDM).

## Acknowledgments

This work was partially supported by the NII Joint Research Fund and by Grants-in-Aid for Scientific Research of the Japan Society for the Promotion of Science (JSPS), No. 19500040 (C).

## References

- 1) Nikkei Microdevices Magazine, No. 288, p. 90, June 2009.
- 2) Dana Vantrease et al.: "Corona: System Implications of Emerging Nanophotonic Technology", *Proceedings of the IEEE International Symposium on Computer Architecture (ISCA-35)*, pp. 153-164, 2008.
- 3) A. Shacham, K. Bergman, and L. P. Carloni: "The Case for Low-Power Photonic Networks on Chip", *Proceedings of the Design Automation Conference*, June 2007.
- 4) Li-Shiuan Peh and William J. Dally: "A Delay Model and Speculative Architecture for Pipelined Routers", *Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA)*, pp. 255-266, 2001.
- 5) W. J. Dally and B. Towles: *Principles and Practices of Interconnection Networks*, Morgan Kaufmann Publishers, p. 550, 2003.
- 6) C. Izu, R. Beivide, and C. Jesshope: "Mad-Postman: A Look-Ahead Message Propagation Method for Static Bidimensional Meshes", *Proceedings of the Euromicro Workshop on Parallel and Distributed Processing (PDP)*, pp. 117-124, 1994.
- 7) Amit Kumar, Li-Shiuan Peh, Partha Kundu, and Niraj K. Jha: "Express Virtual Channels: Towards the Ideal Interconnection Fabric", *Proceedings of the International Symposium on Computer Architecture (ISCA)*, pp. 150-161, 2007.
- 8) Hiroki Matsutani, Michihiro Koibuchi, Hideharu Amano, and Tsutomu Yoshinaga: "Prediction Router: Yet Another Low Latency On-Chip Router Architecture", *Proceedings of the 15th IEEE International Symposium on High-Performance Computer Architecture (HPCA 2009)*, pp. 367-378, 2009.
- 9) Assaf Shacham et al.: "Photonic NoC for DMA Communications in Multiprocessors", *Hot Interconnects*, 2007.
- 10) Assaf Shacham, Keren Bergman, and Luca P. Carloni: "On the Design of a Photonic Network-on-Chip", *Proceedings of the International Symposium on Networks-on-Chip (NOCS)*, 2007.
- 11) José Duato, Pedro López, Federico Silla, and Sudhakar Yalamanchili: "A High Performance Router Architecture for Interconnection Networks", *Proceedings of the International Conference on Parallel Processing (ICPP'96)*, pp. 61-68, 1996.
- 12) Natalie Enright Jerger, Li-Shiuan Peh, and Mikko Lipasti: "Circuit-Switched Coherence", *Proceedings of the International Symposium on Networks-on-Chip (NOCS'08)*, pp. 193-202, 2008.
- 13) Huaxi Gu, Jiang Xu, and Zhen Wang: "ODOR: A Microresonator-based High-performance Low-cost Router for Optical Networks-on-Chip", *Proceedings of the 6th IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis*, pp. 203-208, 2008.
- 14) M. Briere, B. Girodias, et al.: "System Level Assessment of an Optical NoC in an MPSoC Platform", *Design, Automation & Test in Europe Conference & Exhibition (DATE)*, 2007.