# **Bus Serialization for Reducing Power Consumption**

NAOYA HATTA,<sup>†</sup> NIKO DEMUS BARLI,<sup>††</sup> CHITAKA IWAMA,<sup>†</sup> LUONG DINH HUNG,<sup>†</sup> DAISUKE TASHIRO,<sup>†</sup> SHUICHI SAKAI<sup>†</sup> and HIDEHIKO TANAKA<sup>†††</sup>

Shared-bus chip multiprocessors require buses with long wires, and the share of total power consumed in wires grows with device scaling. In this paper, we advocate the use of bus serialization to reduce bus power consumption. Bus serialization decreases the number of wires and increases the pitch between them. The wider pitch decreases the coupling capacitance of the wires and consequently reduces bus power consumption. Evaluation results indicate that our technique can reduce bus power consumption by 30% in a 45nm process technology.

# 1. Introduction

Power reduction has emerged as one of the most important issues in recent VLSI design. As gate length shrinks, interconnects have an increasing impact on total power consumption. This trend is even stronger in Chip Multiprocessors (CMPs) and Systems on Chip (SoCs), which require many long interconnects. For example, in an SoC with 4 ARM processors, 10 - 15% of the power is consumed by the interconnects [5].

In this paper, we propose and investigate a bus serialization technique for reducing on-chip bus power consumption without area or throughput penalties. The concept of our proposal is to reduce the coupling capacitance between adjacent wires. A conventional parallel bus is replaced with several serial buses. Adopting serial buses allows fewer wires and more spacing between wires in the same chip area. This reduces the coupling capacitance and consequently the bus power consumption.

This paper describes the details and quantitative effects of bus serialization, together with its advantages and disadvantages.

The rest of this paper is organized as follows. Section 2 presents the details of our proposal. Section 3 reports the evaluation results. Section 4 describes related work and explains our contributions. Finally, Section 5 concludes this paper.

† University of Tokyo

†† Texas Instruments Japan

††† Institute of Information Security

#### 2. Bus Serialization

#### 2.1 Concept

Bus power consumption P is generally calculated using the following formula.

$$P = a f W C V^2.$$

In the formula, a is the switching activity, f is the bus frequency, W is the number of wires, C is the bus capacitance, and V is the voltage swing. The formula indicates that bus power consumption can be reduced by reducing the bus capacitance. In deep submicron technologies in particular, coupling capacitance dominates the bus capacitance. Consequently, reducing coupling capacitance is effective in reducing bus power consumption.

Based on this observation, we propose the bus serialization technique, which introduces serial buses to reduce power consumption. Introducing serial buses decreases the number of wires and permits wider spacing between wires in the same area. The wider spacing decreases the coupling capacitance. Therefore, bus serialization can reduce bus power consumption.

In addition, bus serialization permits a higher bus frequency. The wider wire pitch leaves room for improving the bus capacitance as well as the bus resistance. If the extra pitch is spent on wire spacing, the coupling capacitance decreases. If it is spent on wire width instead, the bus resistance decreases and the load capacitance increases. Bus frequency is approximately inversely proportional to the product of the total capacitance and the resistance. Therefore, both low power consumption and high frequency can be achieved by optimizing the wire width.

#### 2.2 Basic Structure

Figure 1 shows the basic structure of a serialized bus. The total bus width is the product of the number of wires M and the serialization degree N. Serialization decreases the number of wires from  $M \cdot N$  to M. Each serial bus transfers N bits of data per transaction. A serializer and a deserializer convert parallel data to serial data and vice versa.

Fig. 1 Circuit Structure of Serialized Bus.

Fig. 2 Bus Layout Design.

The power consumption of the conventional bus  $P_C$  and that of the serialized bus  $P_S$  are given as follows.

$$P_C = af(M \cdot N)CV^2,$$

$$P_S = a(f \cdot N)M(C/\alpha)V^2,$$

where  $\alpha$  is the factor by which serialization reduces the bus capacitance. Since  $P_S = P_C/\alpha$ , the serialized bus reduces power consumption by the capacitance reduction factor  $\alpha$  without reducing throughput.
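The relation between the two power formulas can be checked numerically. The following is a minimal sketch; all numeric values are hypothetical and chosen only for illustration, and the same switching activity a is assumed for both buses, as in the formulas.

```python
def bus_power(a, f, wires, c, v):
    """Dynamic bus power P = a * f * W * C * V^2 (formula of Section 2.1)."""
    return a * f * wires * c * v ** 2

# Hypothetical example values (assumptions, not taken from the evaluation).
a = 0.25        # switching activity
f = 200e6       # conventional bus frequency [Hz]
M, N = 32, 2    # number of serial wires and serialization degree
C = 1e-12       # bus capacitance of the conventional bus [F]
V = 1.0         # voltage swing [V]
alpha = 1.5     # capacitance reduction factor gained from the wider pitch

p_c = bus_power(a, f, M * N, C, V)          # conventional: M*N wires at f
p_s = bus_power(a, f * N, M, C / alpha, V)  # serialized: M wires at f*N
ratio = p_s / p_c
print(f"P_C = {p_c:.4e} W, P_S = {p_s:.4e} W, P_S / P_C = {ratio:.3f}")
```

Whatever values are chosen for a, f, M, N, C, and V, the ratio depends only on alpha, which is why the capacitance reduction is the quantity to optimize.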

### 2.3 Layout Design Optimization

As mentioned above, the extra spacing allowed by bus serialization can reduce the bus capacitance or resistance. In this section, we propose a methodology for determining the optimum wire width and spacing.

We define the following parameters.

N : Serialization degree.

 $W_S$ : Wire width (serialized bus).

 $S_{S}$ : Wire spacing (serialized bus).

We assume that the following parameters are defined by a bus specification and a metal configuration.

 $M \cdot N$ : Total bus width.

 $W_C$ : Wire width (conventional bus).

 $S_C$ : Wire spacing (conventional bus).

 $f_C$ : Bus frequency (conventional bus).

The following parameters can be calculated from the previous parameters.

C : Bus capacitance.

R : Bus resistance.

 $f_S$ : Bus frequency (serialized bus).

Figure 2 shows the extra spacing gained by bus serialization.  $L_C$  is the wire pitch of the conventional (fully parallel) bus. Bus serialization increases the wire pitch from  $L_C$  to  $L_S$ . For the same wiring area, the wider pitch  $L_S$  can be used to increase the wire spacing  $S_S$  or the wire width  $W_S$ .



An Example Where Serialized Bus Increases Power



Fig. 4 The Circuit Example of Differential Data Transfer.

These can be reduced to following.

Consumption.

Fig. 3

 $W_S + S_S = (W_C + S_C) \cdot N.$  (1) This is the constraint for area.

To maintain the same throughput, the bus frequency of a serialized bus must be N times as high as that of the conventional bus. Therefore, the following inequality is the constraint for bus frequency.

$$f_S > f_C \cdot N. \tag{2}$$

In this paper, we use the formula developed by Kawaguchi and Sakurai [3] for calculating bus frequency, and the capacitance model developed by Chern et al. [1] for calculating bus capacitance.

The optimum  $W_S$  and  $S_S$  are the values that minimize *C* while satisfying Equations 1 and 2.
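The search described above can be sketched as a simple sweep over the wire width. The capacitance, resistance, and frequency functions below are a toy model in normalized units, assumed only for this illustration; they are not the models of Chern et al. [1] or Kawaguchi and Sakurai [3]. The names `Wire`, `optimize`, `THICKNESS`, and `HEIGHT`, and the use of the conventional width as the minimum width, are likewise assumptions.

```python
from dataclasses import dataclass

@dataclass
class Wire:
    width: float    # wire width (normalized)
    spacing: float  # spacing to each neighbouring wire (normalized)

# Toy per-unit-length model (an assumption of this sketch).
THICKNESS = 2.0   # wire thickness
HEIGHT = 2.0      # dielectric height above the ground plane

def capacitance(w):
    ground = w.width / HEIGHT               # parallel-plate term to ground
    coupling = 2.0 * THICKNESS / w.spacing  # coupling to two neighbours
    return ground + coupling

def resistance(w):
    return 1.0 / (w.width * THICKNESS)

def frequency(w):
    # Frequency taken as inversely proportional to R * C (Section 2.1).
    return 1.0 / (resistance(w) * capacitance(w))

def optimize(conv, n, steps=1000):
    """Return the layout with minimum C that satisfies Eq. (1) and (2)."""
    pitch = (conv.width + conv.spacing) * n   # Equation (1): fixed area
    f_required = frequency(conv) * n          # Equation (2): same throughput
    best = None
    for i in range(steps + 1):
        # Sweep the width from the conventional width (assumed minimum)
        # until the spacing shrinks to the conventional spacing.
        width = conv.width + (pitch - conv.width - conv.spacing) * i / steps
        cand = Wire(width, pitch - width)
        if frequency(cand) <= f_required:
            continue                          # violates Inequality (2)
        if best is None or capacitance(cand) < capacitance(best):
            best = cand
    return best

conv = Wire(width=1.0, spacing=1.0)
best = optimize(conv, n=2)
print(best, capacitance(best) / capacitance(conv))
```

In this toy model the capacitance grows monotonically with width, so the optimum lies at the narrowest feasible wire; with a more realistic model the optimum may lie inside the feasible range.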

# 2.4 Differential Data Transfer

Although bus serialization reduces bus capacitance, it may also increase power consumption. Figure 3 shows an example of such a case. When the bits transferred in a clock cycle are the same as the bits in the previous cycle, the conventional bus consumes no power. However, unless those bits are all 0 or all 1, the serialized bus still consumes power, because each wire toggles while it sends differing bits one after another. Such bit patterns appear frequently on an address bus.
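The effect can be illustrated by counting wire toggles for a repeated word. This is a minimal sketch: the assignment of bits to serial wires (wire k sends bits k·N to k·N+N-1 in order) and the all-zero initial wire state are assumptions of the example, and the count reflects switching activity only, not the lower capacitance of each serialized wire.

```python
def to_bits(word, width):
    """Return the bits of `word`, least significant bit first."""
    return [(word >> i) & 1 for i in range(width)]

def parallel_transitions(words, width):
    """Count wire toggles when every bit has its own wire."""
    prev = [0] * width
    toggles = 0
    for word in words:
        bits = to_bits(word, width)
        toggles += sum(p != b for p, b in zip(prev, bits))
        prev = bits
    return toggles

def serial_transitions(words, width, n):
    """Count wire toggles when each wire sends n bits in sequence.

    Wire k is assumed to send bits k*n .. k*n+n-1 of each word.
    """
    wires = width // n
    prev = [0] * wires
    toggles = 0
    for word in words:
        bits = to_bits(word, width)
        for step in range(n):
            for k in range(wires):
                bit = bits[k * n + step]
                toggles += int(prev[k] != bit)
                prev[k] = bit
    return toggles

words = [0b0110] * 4  # the same 4-bit word sent four times
print(parallel_transitions(words, 4))   # 2
print(serial_transitions(words, 4, 2))  # 15
```

The parallel bus toggles only when the first word is driven, whereas each serial wire alternates between 0 and 1 in every transaction.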

One possible solution is the following. The cause of the problem is that the present bits are similar to the previous bits, so the problem can be resolved by transforming the bits. Differential data transfer is a technique that transfers only the difference between the present bits and the previous bits. With this technique, when the bit pattern is sequential, many bits become 0 and the power consumption is reduced.
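As an illustration, the difference can be taken as a bitwise XOR with the previous word. This choice is an assumption of the sketch; the text does not specify how the difference is computed, and an arithmetic difference would be another option. The example addresses are hypothetical.

```python
def encode(words):
    """Replace each word by its XOR difference from the previous word."""
    prev = 0
    out = []
    for word in words:
        out.append(word ^ prev)
        prev = word
    return out

def decode(diffs):
    """Recover the original words from the XOR differences."""
    prev = 0
    out = []
    for diff in diffs:
        prev ^= diff
        out.append(prev)
    return out

# Hypothetical sequential block addresses with a 64-byte stride.
addresses = [0x1000 + 0x40 * i for i in range(4)]
diffs = encode(addresses)
print([hex(d) for d in diffs])  # ['0x1000', '0x40', '0xc0', '0x40']
assert decode(diffs) == addresses
```

After the first word, the transferred values contain only one or two 1 bits, so the serialized wires carry mostly 0 and toggle rarely.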

Figure 4 shows an example circuit for differential data transfer.

Table 1 Processor Model.

| Parameter | Value |
|---|---|
| issue width | 4 |
| data cache | 16KB, 2-way, 64-byte block |
| instruction cache | 16KB, 2-way, 64-byte block |
| L2 cache | ideal |

## 2.5 Disadvantages of Bus Serialization

Possible disadvantages of our proposal are the additional power of peripheral circuits and clock skew. The technique requires a serializer, a deserializer, and an extra clock line. If the power consumption of these circuits is larger than the power saved by our proposal, the technique is not effective, so this overhead must be taken into account. However, in deep sub-micron technologies, the power consumption of peripheral circuits is probably not critical. According to our estimation in Section 3.3.3, when the serialization degree is low (for example, N = 2), the power of these circuits is about 2.4% of the conventional bus power consumption. This is not critical because the power reduction by our proposal is 27% - 34% of the conventional bus power.

On the other hand, the problem of clock skew must always be considered. The margin for clock skew is inversely proportional to the serialization degree N. We do not investigate this issue in this paper.

## 3. Evaluation

## 3.1 Setup

In this section, we evaluate the effects of our proposal. We assume the following bus specification.

Total bus width : 64 bits

Serialization degree : 2

Bus length : 5 mm

We assume wire configurations derived from the International Technology Roadmap for Semiconductors 2002 Update [7], and use eight applications from the SPEC95int benchmark suite to estimate the data dependency of bus power. We assume a processor with the cache configuration shown in Table 1, and simulate 10 - 25 million bus transactions for each benchmark.

For the estimation, we use the bit patterns transferred between the L1 cache and the L2 cache, namely the load addresses, load data, store addresses, and store data of the SPEC95int benchmarks.

# 3.2 Capacitance Analysis

In this section, we estimate the effect of our proposal on bus capacitance, using the layout design methodology proposed in Section 2.3. Figure 5 shows the relation between bus capacitance C, bus resistance R, and bus throughput T in 90nm technology. In Figure 5, the part of the throughput line in the meshed area satisfies Inequality 2. The circled point shows the wire width where the bus capacitance is minimized; this wire width is therefore optimum from the power viewpoint.

Fig. 5 Layout Optimization (90nm process, serialization degree = 2).

Fig. 6 Capacitance Ratio of Serialized Bus to Conventional Bus.

We find the optimum width and capacitance for each technology by the same approach. Figure 6 shows the minimized bus capacitance achieved by our proposal in each technology. It indicates that our proposal becomes more effective as gate length shrinks, because coupling capacitance becomes more dominant as wire spacing decreases.

#### 3.3 Power Analysis

#### 3.3.1 Power Reduction

Bus power consumption can be calculated from bus capacitance and bit patterns transferred by the bus.

Figure 7 shows the power consumption ratio of the serialized bus to the conventional bus for each benchmark. Figure 8 shows the average power consumption in each technology.

The results in Figure 7 indicate that there is a significant difference between the address bus and the data bus, and that our proposal is effective when it is adopted for the data bus. Figure 8 shows the same tendency, and the effectiveness of our proposal increases as gate length shrinks.

Fig. 7 Power Consumption in Each Benchmark (45nm process).

Fig. 8 Average Power Consumption in Each Process.

Fig. 9 Differential Data Transfer: Power Consumption in Each Benchmark (45nm process).

#### 3.3.2 Differential Data Transfer

As mentioned in Section 2.4, when the bit pattern is sequential, our proposal does not decrease bus power. Figures 7 and 8 suggest that the address bit pattern is sequential.

Figures 9 and 10 show the power consumption with differential data transfer. They indicate that differential data transfer is effective for the address bus.

Figure 11 compares the serialized bus with the serialized bus with differential data transfer. According to the figure, differential data transfer is not effective for the data bus. Therefore, the unmodified serialized bus is suitable for the data bus, and the serialized bus with differential data transfer is suitable for the address bus.

Fig. 10 Differential Transfer: Average Power Consumption in Each Process.

Fig. 11 Compare Serialized Bus to Serialized Bus with Differential Data Transfer.

Fig. 12 Circuits of Serialized Bus for SPICE Simulation.

Fig. 13 Circuits of Conventional Bus for SPICE Simulation.

# 3.3.3 Power of Peripheral Circuits

The circuit structure of the serialized bus is shown in Figure 1. In this section, we assume the specific circuits shown in Figures 12 and 13 and estimate their power by SPICE simulation. Transistors in the serializer, deserializer, and D flip-flop (DFF) have the same gate width (the basic width), and the widths of the transistors in the buffer are ×2, ×4, and ×8 of the basic width. We assume that the wire capacitance is 1pF in both the conventional bus and the serialized bus.

Fig. 14 Current of Peripheral Circuits.

Table 2 Delay of Peripheral Circuits.

| | Delay |
|---|---|
| Conventional Bus | 0.17ns |
| Serialized Bus | 0.15ns |

Figure 14 shows the additional power of the peripheral circuits in a 180nm process. In the figure, *Peripherals* means the serializer, deserializer, and DFF in Figures 12 and 13, and *Wire* means the power consumed in the buffer. Our proposal does increase the power of the peripheral circuits, but the additional power is only 2.4% of the conventional bus power consumption.

# 3.4 Timing Analysis

Our proposal needs a serializer and a deserializer, and these additional circuits may cause additional delay. In this section, we estimate the additional delay by SPICE simulation. The circuits for the SPICE simulation are shown in Figures 12 and 13. We define the delay of the peripheral circuits as the interval from the rising edge of the clock to the rising edge of the buffer output.

Simulation results are shown in Table 2. According to the results, the serialized bus is faster than the conventional bus. This is because the output inverter of the serializer drives one inverter, while the output inverter of the DFF drives two inverters. Here, the output inverter means the inverter of the last stage in each block.

The delays depend on the circuit structure and the gate widths. However, the results indicate that the additional delay caused by bus serialization is not critical.

#### 3.5 Area Analysis

In this section, we estimate the cost of our proposal in terms of area. We evaluate the area of the circuits by the number of transistors, using the gate width as a weighting factor for the buffers in Figures 12 and 13. For example, a ×2 inverter is counted as two inverters. Figure 15 shows the number of transistors in the peripheral circuits. Although the serialized bus needs a serializer and a deserializer, it requires fewer buffers than the conventional bus. In addition, the serializer and deserializer add little area compared with the conventional DFF. Therefore, the serialized bus can be implemented in an area that is 7% smaller than that of the conventional bus.

Fig. 15 Area of Peripheral Circuits.

## 4. Related Work

Generally, there are two approaches to reducing on-chip bus power consumption: signal transition density reduction and effective capacitance reduction. Signal transition density reduction minimizes the signal transitions on the bus by data encoding schemes. Bus-invert coding [9] and code-book encoding [2] have been proposed as data encodings for this purpose. In addition, adaptive code-book encoding that focuses on the coupling capacitance between wires [4] has also been proposed.

Effective capacitance reduction minimizes the effective capacitance of wires by layout optimization. Coupling-driven bus ordering [8] and non-uniform wire placement [6] have been proposed. The former reduces effective capacitance by reordering bus wires; the bus order is determined by a heuristic algorithm. The latter applies non-uniformly spaced wire placement to the address bus; the non-uniform spacing is also determined by a heuristic algorithm. The effectiveness of both techniques depends on the predictability of the bit pattern.

Our proposal is also a capacitance reduction technique. However, our proposal reduces the capacitance of all wires. Therefore, it can be applied to buses whose data is not predictable.

# 5. Conclusion

We first pointed out the importance of reducing bus power consumption. As gate length shrinks, the power consumption of interconnects has more impact on total power consumption. In particular, buses generally consist of long wires that have large capacitance, and coupling capacitance between wires is dominant in deep sub-micron processes.

We proposed a bus serialization technique for reducing bus power consumption without decreasing throughput. Our proposal focuses on reducing coupling capacitance and introduces on-chip serial buses.

In this paper, we evaluated our proposal assuming a 64-bit bus with a serialization degree of 2 and a wire length of 5mm. The evaluation results showed that the power reduction achieved by our proposal depends on the data transferred by the bus. According to the results, bus power consumption decreases to 66% of that of the conventional bus when the serialized bus is adopted as the data bus. Moreover, when the serialized bus is adopted as the address bus, bus power consumption decreases to 73% with differential data transfer.

We evaluated the additional costs of our proposal in a 180nm process. Our proposal needs a serializer, a deserializer, and an extra clock line. However, the additional delay of these circuits is negligible, and the additional power consumption is 2.4% of that of the conventional bus. This overhead is small compared with the power reduction of 27% - 34% achieved by our proposal. The layout size of the peripheral circuits decreases by 7% compared with the conventional bus. We did not evaluate the additional costs of differential data transfer; this is future work.

#### Acknowledgement

This research is partially supported by Grant-in-Aid for Fundamental Scientific Research B(2) #13480077 and B(2) #1630013 from Ministry of Education, Culture, Sports, Science and Technology Japan, Semiconductor Technology Academic Research Center (STARC) Japan, CREST project of Japan Science and Technology Corporation, by 21st century COE project of Japan Society for the Promotion of Science, and VLSI Design and Education Center(VDEC), the University of Tokyo in collaboration with Cadence Design Systems, Inc, Hitachi Ltd, Mentor Graphics, Inc, and Synopsys, Inc.

# References

- [1] J.-H. Chern, J. Huang, L. Arledge, P.-C. Li, and P. Yang. Multilevel Metal Capacitance Models For CAD Design Synthesis Systems. *IEEE Electron Device Letters*, 13(1):32–34, 1992.
- [2] M. Ikeda and K. Asada. Bus Data Coding with Zero Suppression for Low Power Chip Interface. In *Proc. of 1996 International Workshop on Logic and Architecture Synthesis*, pages 267–274, 1996.
- [3] H. Kawaguchi and T. Sakurai. Delay and Noise Formulas for Capacitively Coupled Distributed RC Lines. In *Proc. of 1998 Asia South Pacific Design Automation Conference*, pages 35–43, 1998.
- [4] S. Komatsu, M. Ikeda, and K. Asada. Bus Data Encoding with Coupling-driven Adaptive Code-book Method for Low Power Data Transmission. In *Proc. of 2001 European Solid-State Circuits Conference*, 2001.
- [5] M. Loghi, M. Poncino, and L. Benini. Cycle-Accurate Power Analysis for Multiprocessor System-on-a-Chip. In *Proc. of 2004 ACM Great Lakes Symposium on VLSI*, pages 401–406, 2004.
- [6] L. Macchiarulo, E. Macii, and M. Poncino. Wire Placement for Crosstalk Energy Minimization in Address Buses. In *Proc. of 2002 Design, Automation and Test in Europe*, pages 158–162, 2002.
- [7] Semiconductor Industry Association. International Technology Roadmap for Semiconductors 2002 Update. http://public.itrs.net, 2002.
- [8] Y. Shin and T. Sakurai. Coupling-Driven Bus Design for Low-Power Application-Specific Systems. In *Proc. of 2001 Design Automation Conference*, pages 750–753, 2001.
- [9] M. R. Stan and W. P. Burleson. Bus-Invert Coding for Low-Power I/O. *IEEE Transactions on Very Large Scale Integration Systems*, 3(1):49–58, 1995.