# Low-Power VLSI Implementation by NMOS 4-Phase Dynamic Logic

BAO-YU SONG,<sup>†</sup> MAKOTO FURUIE,<sup>†</sup> YUKIHIRO YOSHIDA,<sup>†</sup> TAKAO ONOYE<sup>††</sup> and ISAO SHIRAKAWA<sup>†</sup>

An nMOS 4-phase dynamic logic scheme is described, which is intended mainly to achieve low-power consumption. In this scheme, the short-circuit current of a logic gate is eliminated, and moreover, the capacitive load of the gate is reduced to almost half as compared with the corresponding CMOS gate, resulting in enhancing the power reduction and shortening the gate delay. A new layout concept of *Array Cell* (AC) is introduced, which contains  $(M \times N)+2$  transistors to construct a logic gate, and is used for the basic logic component in the nMOS 4-phase dynamic logic scheme. The regular structure of the AC contributes much toward the reduction of total layout area. Moreover, a clock generator dedicated to generating four types of clock signals is devised for reducing the complexity of clock distribution. A number of experimental results of logic modules are also shown to demonstrate that not only the low-power dissipation but also the high density can be attained.

## 1. Introduction

The CMOS (Complementary MOS) logic scheme is widely used. A CMOS gate has the distinctive structure in which a pair of nMOS and pMOS logic blocks is connected in series between Vdd and Vss. Hence, while the output of the gate is steady at the "0" or "1" logic level, no short-circuit current flows from Vdd to Vss, which achieves low power dissipation<sup>1)</sup>. While the output is in transition between the "0" and "1" logic levels, however, a short-circuit current flows from Vdd to Vss, which makes the power dissipation increase sharply as the clock frequency grows<sup>2)</sup>.

Recently, extensive efforts have been attempted to reduce power dissipation by means of the pass-transistor logic, which may be regarded as a promising solution<sup>3),4)</sup>. Nevertheless, this logic tends to lack the robustness against downsizing and voltage scaling<sup>5)</sup>. Hence, there still remains considerable room for investigating novel logic schemes which can attain much more power reduction.

Motivated by this tendency, we exploit the 4-phase dynamic logic, which was originally introduced in Refs. 6)-8), to achieve low-power consumption. This logic scheme uses four types of clock signals together with four types of logic gates in such a way that each logic module is constructed of a number of different types of logic gates, and the same clock signal is fed simultaneously to the top and bottom terminals of each gate so as to eliminate the short-circuit current from the top terminal to the bottom one.

It should be added here that the pMOS 4-phase dynamic logic was once widely used for calculators in the 1970s, when only the pMOS technology was available. However, an efficient layout model as well as a concise mechanism for generating four types of clock signals is indispensable for making the best use of this logic scheme in the very deep submicron era.

Section 2 describes features of the nMOS 4-phase dynamic logic in comparison with the static CMOS and the dynamic domino CMOS logic. Section 3 introduces an *Array Cell* (AC) architecture for constructing logic gates, and Section 4 proposes a clock generator dedicated to this nMOS 4-phase dynamic logic. Several experimental results are shown in Section 5 to demonstrate the performance of this logic scheme. The conclusion is summarized in Section 6.

## 2. NMOS 4-Phase Dynamic Logic

The 4-phase dynamic logic can be implemented in different ways in terms of the behaviors of four types of clock signals and the structures of four types of logic gates<sup>6),7)</sup>, although there is no crucial difference among them on circuit operations. A typical configuration of this logic scheme is shown in **Fig. 1**.

A mathematical model of the logic simulation was proposed for the 4-phase MOS dynamic logic in Ref. 6), the behavior of the four types of gates as well as the connection rule among them was considered in Ref. 7), and the transient analysis of the 4-phase MOS switching circuits was executed in Ref. 8). Nevertheless, the potential of practicability of the 4-phase dynamic logic has not yet been discussed in comparison with the static CMOS or the domino CMOS logic.

Fig. 1 NMOS 4-phase dynamic logic scheme. (a) Four types of logic gates, (b) Four different types of clock signals, (c) Priority relations among different types of logic gates.

<sup>†</sup> Department of Information Systems Engineering, Osaka University

<sup>††</sup> Department of Communications and Computer Engineering, Kyoto University

We first show distinctive features of the nMOS 4-phase dynamic logic in contrast with the static CMOS logic.

- (1) As can be seen from Fig. 1 (a), the top and bottom terminals of each 4-phase logic gate are driven by the same clock signal, and hence neither an input signal nor a control signal causes the short-circuit current which occurs in a CMOS gate, resulting in very small power dissipation even at a high clock frequency.
- (2) The output of an nMOS 4-phase gate is connected only to nMOS transistors of the subsequent gates, whereas the output of a CMOS gate is connected to both nMOS and pMOS transistors of each subsequent gate. Hence the capacitive load of the output of the 4-phase gate is about half of that of the corresponding CMOS gate.
- (3) The number of transistors necessary for a k-input 4-phase logic gate is k + 2, whereas that for the CMOS logic gate equivalent to it is 2k.
- (4) The 4-phase dynamic logic uses ratio-less nMOS transistors, which makes the layout of gates simple and concise.

Consequently, we can see that, in comparison with the static CMOS logic scheme, the nMOS 4-phase dynamic logic scheme can achieve much power saving as well as area efficiency.
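The transistor counts in item (3) can be compared with a short sketch. The function names are hypothetical and introduced here only for illustration; the formulas k + 2 and 2k are the ones stated in the text.

```python
def four_phase_transistors(k):
    """Transistors in a k-input nMOS 4-phase gate: k logic transistors
    plus the 2 transistors for precharge and gate control."""
    if k < 1:
        raise ValueError("a gate needs at least one input")
    return k + 2


def static_cmos_transistors(k):
    """Transistors in the equivalent k-input static CMOS gate
    (one nMOS and one pMOS transistor per input)."""
    if k < 1:
        raise ValueError("a gate needs at least one input")
    return 2 * k


for k in (2, 3, 4, 8):
    print(k, four_phase_transistors(k), static_cmos_transistors(k))
```

Since k + 2 < 2k only when k > 2, the saving in transistor count appears for gates with three or more inputs and grows with the number of inputs.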

We now touch on the domino CMOS logic in contrast with the nMOS 4-phase dynamic logic. In the domino CMOS logic a gate function is realized with the use of only one logic block, similarly to that of the 4-phase dynamic logic, where the number of transistors necessary for a k-input gate is k+2. However, extra transistors may be required to avoid charge redistribution and to stabilize the circuit operation. Thus, the total number of transistors necessary for a k-input gate can exceed k+2.

Although an inverted signal can be generated in the domino CMOS logic, circuit instability is incurred when both a gate output signal and its inverted signal are input to the same gate. Therefore, the domino CMOS logic may dissipate more power for specific logic functions<sup>9)</sup>.

In addition, the use of both pMOS and nMOS transistors in the domino CMOS logic increases the layout complexity, as compared with the 4-phase dynamic logic which employs only nMOS ratio-less transistors.

Fig. 2 Circuit structures of 1-bit full adder.

The behavior of the 4-phase dynamic logic is exemplified with the use of a 1-bit full adder as depicted in **Fig. 2** (a). As can be seen from Fig. 2 (a), the 1-bit full adder of the 4-phase dynamic logic is constructed of three gates of type 1 and two gates of type 2. Supposing that signals a, b, and cin are conveyed from the gates of type 4 to these three gates of type 1, the behavior of the adder is summarized as follows: (1) the three gates of type 1 are precharged and evaluated in phases 3 and 1, respectively, converting signals a, b, and cin to $\overline{a}$, $\overline{b}$, and $\overline{cin}$, respectively, and (2) the two gates of type 2 are precharged and evaluated in phases 1 and 2, respectively, producing output signals *sout* and *cout* of the 1-bit full adder.
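The two-step behavior of the adder can be mimicked at the logic level with a minimal sketch. It models each dynamic gate as a precharged node that is discharged when its nMOS network conducts. The pull-down networks chosen for *sout* and *cout* below are assumptions made for illustration and are not taken from Fig. 2 (a); the function names are likewise hypothetical.

```python
from itertools import product


def dynamic_gate(pull_down_conducts):
    """Precharged output node: stays at 1 unless the nMOS network
    conducts during evaluation and discharges it to 0."""
    return 0 if pull_down_conducts else 1


def full_adder_4phase(a, b, cin):
    # Step (1): three type-1 gates evaluate in phase 1 and invert the inputs.
    na = dynamic_gate(bool(a))
    nb = dynamic_gate(bool(b))
    ncin = dynamic_gate(bool(cin))

    # Step (2): two type-2 gates evaluate in phase 2.
    # ASSUMPTION: these networks are illustrative, not the authors' circuit.
    # The sout network conducts when an even number of inputs are 1.
    sum_is_zero = ((na and nb and ncin) or (a and b and ncin)
                   or (a and nb and cin) or (na and b and cin))
    # The cout network conducts when at least two inputs are 0.
    carry_is_zero = (na and nb) or (na and ncin) or (nb and ncin)

    sout = dynamic_gate(bool(sum_is_zero))
    cout = dynamic_gate(bool(carry_is_zero))
    return sout, cout


for a, b, cin in product((0, 1), repeat=3):
    print(a, b, cin, full_adder_4phase(a, b, cin))
```

Because a precharged gate is inverting, the type-2 networks are written for the complements of the desired outputs, which is why the inverted inputs from the type-1 gates are needed.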

It should be added here that gates of types 2 and 4 can constitute the basic gates of the so-called dual-phase dynamic logic, in which highly complicated logic structures have to be pursued by inserting gates of types 1 and 3 between basic gates of types 2 and 4, which can reduce the total number of gates.

To see the difference of these four logic schemes, circuit structures of the 1-bit full adder realized by means of the static CMOS logic, domino CMOS logic, and pass-transistor logic are also depicted in Fig. 2 (b), (c), and (d), respectively.

## 3. Array Cell Architecture for Layout

It is generally supposed that a dynamic logic gate is accompanied with multiple signal inputs, and hence it may suffer from a long gate delay, which can degrade the whole circuit performance. Thus we have estimated the gate delay for the 4-phase dynamic gates with an HSPICE simulator using $0.35\,\mu\text{m}$ technology. **Figure 3** shows a part of the simulation results of the gate delay for 4-phase NAND gates versus static CMOS NAND gates with the same capacitive load, from which we can see that the delay of a 4-phase NAND gate is small to such an extent that it can be connected to 1.5 times as many inputs as a CMOS NAND gate.

Since the delay of a 4-phase logic gate is much shorter than that of a CMOS logic gate, and the output of the gate is available at least in the next clock phase, we can realize a logic module with fewer gates than in the CMOS logic.

Fig. 3 Delay simulation for NAND gates.

Fig. 4 Logic block model with $M \times N$ transistor array.

Fig. 5 Simulation results for gate delay.

For the ease of layout synthesis, let a logic block be constructed in an $M \times N$ transistor array as illustrated in **Fig. 4** (a), where it should be noticed that transistors are arrayed without interconnection. Now, consider a trivial logic function which is realized by a regular interconnection as shown in Fig. 4 (b). The HSPICE delay simulation has been attempted for this trivial function with different values of M and N, at 3.3 V and 1.8 V supply voltages, and with the nMOS threshold voltage of 0.55 V and 0.4 V, respectively, where it is assumed that the capacitive load for each gate is 40 fF, or in other words, each gate has the driving ability of 8 nMOS transistors. **Figure 5** depicts the simulation results, from which we can see that

- (i) 144 transistors $(M \times N = 12 \times 12)$ can be used for the operation frequency of 50 MHz at 3.3 V supply voltage,
- (ii) 60 transistors $(M \times N = 6 \times 10)$ can be used for the operation frequency of 100 MHz at both 3.3 V and 1.8 V supply voltages, and
- (iii) 36 transistors $(M \times N = 6 \times 6)$ can be used for the operation frequency of 200 MHz at 3.3 V supply voltage.



Fig. 6 Layout model for proposed Array Cell.



Fig. 7 Carry generator of 4-bit adder.

Consequently, an optimal realization of a given logic function can be attained according to the above statements (i), (ii), and (iii).
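As a minimal sketch, statements (i), (ii), and (iii) can be kept as a lookup table when choosing the size of a logic block. The table holds only the operating points reported above; the names `REPORTED_LIMITS` and `max_logic_block` are hypothetical, and no interpolation between operating points is attempted.

```python
# (frequency in MHz, supply voltage in V) -> (M, N) reported as usable
REPORTED_LIMITS = {
    (50, 3.3): (12, 12),   # statement (i)
    (100, 3.3): (6, 10),   # statement (ii)
    (100, 1.8): (6, 10),   # statement (ii)
    (200, 3.3): (6, 6),    # statement (iii)
}


def max_logic_block(freq_mhz, vdd):
    """Return (M, N, M*N) for a reported operating point,
    or None when the text gives no figure for it."""
    size = REPORTED_LIMITS.get((freq_mhz, vdd))
    if size is None:
        return None
    m, n = size
    return m, n, m * n


print(max_logic_block(100, 1.8))
print(max_logic_block(200, 1.8))
```

Returning `None` for an unlisted operating point keeps the sketch from suggesting limits that were not simulated.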

Now, let us define a new layout concept of an Array Cell (AC) by a layout cell which is composed of a two-dimensional array of  $M \times (N+1)$ transistors for a logic block and two transistors for precharge and gate control, as illustrated in Fig. 6, where the number M of columns and the number N+1 of rows denote the width and the height of the AC, respectively. For the layout standardization of a layout macro in our 4-phase dynamic logic scheme, let us henceforth fix the width M of each AC, as depicted in Fig. 6, in the same way as in the standard-cell approach. In addition, one-dimensional layout compaction can be applied to each AC on the basis of diffusion abutment and deletion of unused transistors so as to shorten the height of the AC.

Fig. 8 8-bit round shift circuit. (a) Block diagram, (b) Gates organization, (c) Array structure for y0.

Fig. 9 Clock generation and distribution method. (a) Global routing for synchronous signals and local routing for four clock signals, (b) Gate diagram for each part of clock generator.

Fig. 10 Waveforms of basic clock signal (BC), two synchronous signals (SS1 and SS2), and four clock signals.

For example, a 4-phase dynamic gate for the carry generator of a 4-bit adder depicted in **Fig. 7** (a) can be constructed by a $(6 \times 5)+2$ AC as shown in Fig. 7 (b), where 6 and 5 denote the width and height of the AC, respectively, and 2 indicates the two transistors used for precharge and gate control. In the case of constructing an 8-bit round shift circuit (see **Fig. 8**), each output can be implemented by a $(6 \times 6)+2$ AC.
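The AC sizes quoted for the two examples can be turned into transistor counts with a one-line helper. The function name is hypothetical, and the count is the size of the uncompacted array, that is, before any unused transistors are deleted.

```python
def array_cell_transistors(width, height):
    """Transistors in a (width x height)+2 Array Cell: the logic-block
    array plus the 2 transistors for precharge and gate control."""
    if width < 1 or height < 1:
        raise ValueError("width and height must be positive")
    return width * height + 2


print(array_cell_transistors(6, 5))  # carry generator of the 4-bit adder
print(array_cell_transistors(6, 6))  # one output of the 8-bit round shift circuit
```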

## 4. Clock Generator

Each logic gate in the nMOS 4-phase dynamic logic is driven by one of the four types of clock signals as indicated in Fig. 1, instead of Vdd in the CMOS logic. Therefore, a clock generator can be implemented dedicatedly for providing such four types of clock signals to each logic gate.

To cope with the strict requirements on distributing these four clock signals, a clock distribution method is devised in which the clock generator is composed of two parts, SS and CS, as shown in **Fig. 9** (a). The SS part, which is located at the center of the chip, synthesizes two synchronous signals. These two synchronous signals are then fed to the CSs, which are distributed over the whole chip. Each CS simultaneously generates four types of non-overlapping clock signals and keeps the local synchronization of logic gates.

Figure 9 (b) depicts a gate structure of a clock generator which can be implemented by the static CMOS logic. A basic clock signal, as exemplified in **Fig. 10**, which is discussed later, is used as the input signal to *SS*.

To enhance the driving ability and the noise margin of each clock signal, large inverters are placed at the last stage of CS.
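The non-overlap property of the four clock signals can be illustrated with an idealized sketch. It assumes, purely for illustration, that the two synchronous signals behave as quadrature square waves and that each phase is a one-hot decode of the pair; the actual gate structure of CS in Fig. 9 (b) may differ, and the margin fields between phases are not modeled. The function name `four_phase_clocks` is hypothetical.

```python
def four_phase_clocks(ss1, ss2):
    """Decode two binary signals into four one-hot phase signals.
    ASSUMPTION: this decode is illustrative, not the circuit of Fig. 9 (b)."""
    return (
        int(ss1 and not ss2),        # phase 1
        int(ss1 and ss2),            # phase 2
        int((not ss1) and ss2),      # phase 3
        int((not ss1) and not ss2),  # phase 4
    )


# ASSUMPTION: SS1 and SS2 sampled at quarter-period steps, in quadrature.
SS1 = [1, 1, 0, 0]
SS2 = [0, 1, 1, 0]

for step, (s1, s2) in enumerate(zip(SS1, SS2)):
    print(step, four_phase_clocks(s1, s2))
```

Because exactly one decoded output is high for every combination of the two inputs, no two phases can be active at the same time in this model.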

Table 1 Circuit simulation results at 100 MHz.

| 3.3 V, 100 MHz |      | 4-phase | sCMOS | dCMOS | PT    |
|----------------|------|---------|-------|-------|-------|
| MUX2           | trs  | 12      | 12    | 12    | 10    |
|                | area | 248.0   | 374.5 |       |       |
|                | pow  | 31.3    | 56.6  | 104.0 | 31.6  |
|                | (ex) | 35.0    | 60.3  |       |       |
| FA             | trs  | 28      | 28    | 31    | 32    |
|                | area | 597.1   | 654.5 |       |       |
|                | pow  | 88.9    | 111.2 | 200.1 | 106.1 |
|                | (ex) | 99.9    | 117.4 |       |       |
| nand3          | trs  | 5       | 6     | 5     | 14    |
|                | pow  | 12.2    | 17.2  | 33.4  | 32.9  |
| XOR            | trs  | 9       | 10    | 9     | 10    |
|                | pow  | 28.7    | 39.8  | 75.6  | 46.8  |

sCMOS: static CMOS, dCMOS: domino CMOS, PT: pass-transistor, trs: number of transistors, area: layout area ($\mu\text{m}^2$), pow: power consumption ($\mu$W), (ex): power consumption with capacitances extracted from the layout ($\mu$W).

## 5. Implementation Results

Experiments have been attempted for a number of logic modules in the nMOS 4-phase dynamic logic, static CMOS logic, domino CMOS logic, and pass-transistor logic, by using  $0.35 \,\mu\text{m}$  triple-metal technology with the pMOS and the nMOS threshold voltages of -0.7 V and 0.55 V, respectively.

To see the performance of the nMOS 4-phase dynamic logic scheme, we have implemented a 2-input multiplexer (MUX2), a 1-bit full adder (FA; see Fig. 2), an exclusive OR (XOR), and a 3-input NAND (nand3).

The detailed features are summarized in Table 1, where simulations of power consumption with extracted capacitances through layouts for MUX2 and FA are also executed with the use of the Cadence Layout Tools (Virtuoso, ver.4.4). Table 1 demonstrates that even with the use of an optimal static CMOS structure, the 4-phase dynamic logic surpasses all the other logics in power dissipation. Moreover, supposing that a large functional unit is to be constructed, the static CMOS logic scheme may need pipeline registers for performing the pipeline process. On the contrary, the 4-phase logic scheme is controlled essentially by a 'dynamic' mechanism so that none of these pipeline registers are necessary, and hence it turns out that we can omit the power dissipation which might be necessary additionally in such pipelining.
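The comparison drawn from Table 1 can be checked with a short sketch that computes the fractional power saving of the 4-phase logic against each of the other styles. The figures are the "pow" rows of Table 1 in $\mu$W; the names `POWER_UW` and `saving_vs` are hypothetical.

```python
# "pow" rows of Table 1, in microwatts
POWER_UW = {
    "MUX2": {"4-phase": 31.3, "sCMOS": 56.6, "dCMOS": 104.0, "PT": 31.6},
    "FA": {"4-phase": 88.9, "sCMOS": 111.2, "dCMOS": 200.1, "PT": 106.1},
    "nand3": {"4-phase": 12.2, "sCMOS": 17.2, "dCMOS": 33.4, "PT": 32.9},
    "XOR": {"4-phase": 28.7, "sCMOS": 39.8, "dCMOS": 75.6, "PT": 46.8},
}

OTHER_STYLES = ("sCMOS", "dCMOS", "PT")


def saving_vs(module, other):
    """Fractional power saving of the 4-phase gate relative to another style."""
    row = POWER_UW[module]
    return 1.0 - row["4-phase"] / row[other]


for module in POWER_UW:
    percent = {s: round(100 * saving_vs(module, s), 1) for s in OTHER_STYLES}
    print(module, percent)
```

Every saving comes out positive, although the margin over the pass-transistor MUX2 is below one percent.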

To enhance the performance evaluation, we have also simulated the frequency characteristics of power consumption for a 4-bit adder, an FA, an XOR, a 3-input NOR, and a 3-input NAND. **Figure 11** shows the statistics of the power ratios of these modules in the 4-phase dynamic logic versus the static CMOS logic, which indicate that the former can reduce the power dissipation of the latter by 30–40% at frequencies of 20–200 MHz.

Fig. 11 Power consumption ratio of nMOS dynamic logic to CMOS logic.

Fig. 12 Layout patterns for one of the outputs of 8-bit round shift circuit. (a) Original layout patterns, (b) Final layout patterns.

To demonstrate the performance of the AC architecture, we have implemented one of the outputs of an 8-bit round shift circuit by means of the AC of Fig. 8 (c). Given the original layout patterns of **Fig. 12** (a), in which transistors are simply placed, the final layout patterns are obtained as depicted in Fig. 12 (b), where an area reduction of 29.5% is observed.

To see the applicability of ACs to more complicated logic functions, we have implemented the whole of an 8-bit round shift circuit with the use of ACs. **Figure 13** (a) shows the layout patterns obtained by using ACs, whereas Fig. 13 (b) depicts those by the static CMOS logic.

Fig. 13 Layout patterns of 8-bit round shift circuit. (a) ACs of 4-phase logic, (b) CMOS logic.

Table 2 Experimental results for 8-bit round shift circuit.

| 100 MHz, 3.3 V          | 4-phase | CMOS   | ratio |
|-------------------------|---------|--------|-------|
| power (mW)              | 2.723   | 3.946  | 0.69  |
| #trs.                   | 305     | 476    | 0.64  |
| area ($\mu\text{m}^2$)  | 16,418  | 21,561 | 0.76  |

We have also simulated the power consumption of the 4-phase dynamic logic and the static CMOS logic with the use of an 8-bit round shift circuit. The experimental results are summarized in **Table 2**, from which we can see that the power consumption of the 4-phase dynamic logic is about 30% smaller than that of the static CMOS logic. It can be readily verified that the AC architecture for the 4-phase logic can effectively reduce not only the number of transistors but also the power dissipation, in comparison with the static CMOS logic.
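The ratio column of Table 2 can be reproduced from the 4-phase and CMOS columns with a short sketch. The names `TABLE2` and `ratios` are hypothetical.

```python
# Table 2: (4-phase value, static CMOS value)
TABLE2 = {
    "power (mW)": (2.723, 3.946),
    "#trs.": (305, 476),
    "area (um^2)": (16418, 21561),
}


def ratios(table):
    """Ratio of the 4-phase figure to the CMOS figure, to two decimals."""
    return {name: round(four_phase / cmos, 2)
            for name, (four_phase, cmos) in table.items()}


print(ratios(TABLE2))
```

A power ratio of 0.69 corresponds to the saving of about 30% quoted in the text.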

A clock generator has been implemented for the operation frequency of 100 MHz so as to investigate the power consumption and layout size, where the capacitive load for each clock signal is assumed to be 200 fF, which can drive 40 nMOS transistors, i.e., 20 ACs.
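The fan-out figures above follow from simple arithmetic, sketched below. The load of 5 fF per transistor is inferred from the earlier assumption that 40 fF corresponds to 8 nMOS transistors, and two clock-loaded transistors per AC is inferred from the ratio of 40 transistors to 20 ACs; both are assumptions of this sketch, and the function name is hypothetical.

```python
def clock_fanout(load_ff, ff_per_transistor=5.0, clocked_per_ac=2):
    """Return (transistors, ACs) that a clock signal with the given
    capacitive load budget can drive.
    ASSUMPTION: 5 fF per transistor (40 fF / 8) and 2 transistors per AC."""
    transistors = int(load_ff // ff_per_transistor)
    return transistors, transistors // clocked_per_ac


print(clock_fanout(200.0))
```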

Layout patterns attained for SS and CS are shown in **Figs. 14** (a) and (b), respectively. Detailed features of the clock generator are shown in **Table 3**. In addition, the waveforms of the four clock signals generated by this clock generator have been obtained through an HSPICE simulator, as shown in Fig. 10, from which we can see that margin fields between phases 2 and 3 and between phases 4 and 1 are provided as a measure for non-overlap. It should be added that although a CS has been tentatively designed to drive 20 ACs, the driving ability of a CS should be raised as the size of the logic increases.

Fig. 14 Layout patterns of clock generator. (a) SS, (b) CS.

Table 3 Experimental results for clock generator.

| 100 MHz, $0.35\,\mu\text{m}$ | SS    | CS    |
|------------------------------|-------|-------|
| power (mW)                   | 1.803 | 1.217 |
| #trs.                        | 28    | 68    |
| area ($\mu\text{m}^2$)       | 1,022 | 1,896 |

## 6. Conclusion

This paper has described the nMOS 4-phase dynamic logic scheme, which is intended mainly for reducing power dissipation. An AC architecture dedicated to this logic scheme has been devised, by which we can implement a given logic function with fewer transistors than required for the CMOS logic, resulting in low-power consumption and high density. Moreover, a clock generator dedicated to generating the four clock signals has been devised to mitigate the complexity of clock lines. Considering that the capacitive loads related to the clocks are variable and state-dependent, which may unbalance the corresponding clocks and hence incur clock skew, we insert large inverters connected to the four clock signals in the CS so as to attain a sufficiently fast switching time, which contributes much toward the elimination of clock skew.

Experimental results for several functional modules demonstrate that our nMOS 4-phase dynamic logic scheme can be a viable candidate for low-power logic design.

Finally, it should be added that according to our experiments the 4-phase dynamic logic may demonstrate the practicability especially for functional modules operating at frequencies up to 200 MHz.

Development is continuing further on sophisticated CAD tools not only for logic synthesis but also for layout synthesis, dedicated to our nMOS 4-phase dynamic logic scheme of exploiting ACs.

### Acknowledgments

The authors wish to thank N. Kubo, M. Osaka, K. Yoshida, T. Yoshimura and R. Miyama of SHARP Corporation for their valuable discussions and technical support.

### References

- 1) Weste, N.H.E. and Eshraghian, K.: Principles of CMOS VLSI Design: A System Perspective, (2nd Ed.), Addison-Wesley (1993).
- 2) Veendrick, H.J.M.: Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits, *IEEE J. Solid-State Circuits*, Vol.SC-19, No.4, pp.468–473 (1984).
- 3) Yano, K., Sasaki, Y., Rikino, K. and Seki, K.: Top-down pass-transistor logic design, *ibid*, Vol.31, No.6, pp.792–803 (1996).
- 4) Parameswar, A., Hara, H. and Sakurai, T.: A swing restored pass-transistor logic-based multiply and accumulate circuit for multimedia applications, *ibid*, Vol.31, No.6, pp.805–809 (1996).
- 5) Zimmermann, R. and Fichtner, W.: Low-power logic styles: CMOS versus pass-transistor logic, *ibid*, Vol.32, No.7, pp.1079–1090 (1997).
- 6) Yen, Y.T.: A mathematical model characterizing four-phase MOS circuits for logic simulation, *IEEE Trans. Computers*, Vol.C-17, No.9, pp.822–826 (1968).
- 7) Asija, S.P.: Four-phase logic is practical, *Electronic Design*, pp.160–163 (1977).
- 8) Yen, Y.T.: Transient analysis of four-phase MOS switching circuits, *IEEE J. Solid-State Circuits*, Vol.SC-3, No.1, pp.1–5 (1968).
- 9) Friedman, V. and Liu, S.: Dynamic logic CMOS circuits, *ibid*, Vol.SC-19, No.2, pp.263–266 (1984).

(Received September 20, 1999) (Accepted February 4, 2000)



**Bao-Yu Song** received the B.E. and M.E. degrees in information systems engineering from Osaka University, Osaka, Japan, in 1996 and 1998, respectively. She is currently working toward the Ph.D. degree in Information Systems Engineering. Her research activities are related to low-power circuit techniques on VLSI design. She is a Member of IEEE, IPSJ, and IEICE.



**Makoto Furuie** received the B.E. and M.E. degrees in information systems engineering from Osaka University, Osaka, Japan, in 1998 and 2000, respectively. He is currently working toward the Ph.D. degree in Information Systems Engineering. His research interests include Computer-Aided Design of VLSI Circuits. He is a student member of IEEE and IEICE.



**Yukihiro Yoshida** received the B.E. and M.E. degrees, both in electrical engineering, from Doshisha University in 1963 and 1965, respectively. He joined Sharp Corporation in 1965, where he was promoted to Division General Manager of the IC Development Center in 1991. He is currently a student of the Doctorate Course in the Department of Information Systems Engineering, Osaka University. He has been engaged in research and development mainly on MOS LSI design for electronic calculators, scientific calculators, word processors, and personal computers, on PDA design, and on system ASIC design. His research interests include VLSI implementation and low-power consumption technology of VLSI design. He is a Member of IEEE.



**Takao Onoye** received the B.E. and M.E. degrees in Electronic Engineering, and the Dr. Eng. degree in Information Systems Engineering, all from Osaka University, Japan, in 1991, 1993, and 1997, respectively. He joined the Department of Information Systems Engineering, Osaka University in 1993 as a research associate, where he was promoted to a lecturer in 1998. Meanwhile, he was with the ICS Department, University of California, Irvine, as a visiting associate researcher in 1997–1998. Presently, he is an Associate Professor in the Department of Communications and Computer Engineering, Kyoto University. Since 1998, he has also served as a principal research scientist of Synthesis Corporation. His research interests include low-power architecture, VLSI design, and implementation of multimedia processing systems. Dr. Onoye is a member of IEEE, ACM, IPSJ, and ITE of Japan.



**Isao Shirakawa** received the B.E., M.E., and Ph.D. degrees all in electronic engineering from Osaka University in 1963, 1965, and 1968, respectively. He joined the Department of Electronic Engineering, Faculty of Engineering, Osaka University in 1968, where he was promoted to a professor in 1987, and he moved to the Department of Information Systems Engineering in 1990. Meanwhile, he was with the Electronics Research Lab., University of California at Berkeley, as a visiting scholar in 1974–1975. He has been engaged in education and research mainly on basic circuit theory, logic design, applied graph theory, CAD algorithms for VLSI, and VLSI implementation for signal processing. He is a member of IPSJ, SICE, ACM, and so forth, and a Fellow of IEEE. His main IEICE/IEEE activities are as follows: Trustee of IEICE Editorial Board (1996–1997), Vice President of IEEE CAS Society (1995–1996), General Chair of ASP-DAC '97, Program Chair of ASP-DAC '95, Program Chair of APCCAS '92, Program Co-Chair of ISCAS '91, etc.