RC-002

# A Novel Low Power FPGA Architecture

Ce Li, Yiping Dong†, Takahiro Watanabe‡

#### 1. Abstract

An FPGA is easy to use in manufacturing because of its merits, namely performance, short time to market, low cost, high reliability, and long-term maintainability. However, as FPGA sizes grow and process technology moves deeper into the submicron regime, FPGA power consumption limits their use in mobile products, so many low power methods and FPGA architectures have been studied. In this paper, a novel FPGA architecture with sophisticated power gating is proposed. Each Logic Element (LE) or Clustered Logic Block (CLB) in the FPGA can be powered off individually, controlled by internal logic signals. This method can therefore dynamically save the power of LEs or CLBs that are unused by the circuit after download, or that have entered sleep mode, without any control signals from outside the FPGA. This makes the proposed method very useful for reducing FPGA leakage power, especially in commercial mobile chips. Moreover, besides reducing FPGA leakage, the architecture can be used to emulate ASIC chips before tape-out.

#### 2. Introduction

Recently, FPGAs (Field-Programmable Gate Arrays) have become widely used in commercial designs because of their short development time and flexibility. Many ASIC companies also use FPGAs to emulate their designs, checking design quality and debugging before tape-out, which reduces non-recurring engineering (NRE) cost [1]. The weak point of FPGAs, however, is their power consumption per function, which limits their application in mobile devices.

The basic units of an FPGA are the LE/CLB [2]/LAB (Logic Array Block [3]), the Input Output Block (IOB), the Connection Box, the Wire Channel and the Switch Box. An FPGA implements the complete logic by dividing the design into small pieces, each of which can be implemented by an LE composed of a 4-input look-up table (LUT) and a flip-flop, as shown in Fig. 1.



Fig.1 4-input LE architecture.

When developing a commercial product, we complete the design first and then select an appropriate FPGA chip whose capacity is slightly larger than the design requires. However, the blank parts of the chip that do not implement the design still consume power. For the logic mapped into the FPGA, power cannot be reduced by clock gating, because the FPGA uses a global clock. Other well-known power saving methods such as PSO (Power Shut Off) cannot be used in current commercial FPGA chips, since these chips have no logic to retain the values in their SRAM; if they are powered off, their function is lost. We therefore focus on a new architecture that reduces FPGA power consumption. It can not only power off unused CLBs, but also control their power dynamically. Moreover, it can emulate ASIC designs that have power domains.

- † Information Processing Society of Japan (IPSJ)
- ‡ Institute of Electronics, Information and Communication Engineers (IEICE)

The remainder of this paper is organized as follows. Section 3 describes the LE architecture using the power gating method. Section 4 proposes the new CLB architecture. Section 5 presents the software that supports designs for this new FPGA architecture, and Section 6 gives the experimental results. Finally, Section 7 concludes the paper.

#### 3. Power Gated LE

In the power gating method, a PMOS and/or NMOS transistor is added as a header or footer of the circuit, and the circuit power can be switched off by controlling these switch transistors. A pair of high-VTH power switches in series (a header and a footer) is usually used for power gating, but it causes a larger IR voltage drop in the supply, which increases the gate delays in the design [5]. To avoid this drawback, only a PMOS switch is added in our architecture, so that VDD is switched while VSS is supplied directly to the entire chip. This may be the most appropriate switch choice if external power gating will also be used on the chip [5].
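To make the IR-drop argument concrete, here is a first-order estimate. It rests on an illustrative assumption: each conducting switch is treated as a linear on-resistance, $R_{\text{on},P}$ for the header and $R_{\text{on},N}$ for the footer, carrying the block current $I$. The supply voltage actually seen by the gated logic is then

$$
V_{\text{eff}}^{\text{header+footer}} = V_{DD} - I\,\bigl(R_{\text{on},P} + R_{\text{on},N}\bigr),
\qquad
V_{\text{eff}}^{\text{header only}} = V_{DD} - I\,R_{\text{on},P}.
$$

Dropping the footer removes the $I\,R_{\text{on},N}$ term. For the same switch sizing, the gated logic therefore sees a higher effective supply and suffers less added gate delay.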

Previous research has focused on power reduction methods such as power gating, Dual-VTH/VDD and Micro-VDD-Hopping [4]. These methods can be applied to the whole FPGA chip by controlling a CMOS switch for every four CLBs, but the control signals and their routing are not clearly described in [4]. If these control signals were generated by a circuit inside the FPGA and then sent to power control chips outside it, circuitry outside the chip would have to control the power or select the power supplies, wasting IO pads and IO blocks on the control signals. We therefore propose a new power gating architecture inside the FPGA whose control signals also come from the internal logic.

Fig. 2 illustrates the proposed power gating architecture, using a 3-input SRAM-based LE as an example. The LE consists of a 3-input LUT and a DFF. S[7:0] are the SRAM configuration registers, whose data are not lost while the LE's supply is gated, because they are powered by the ungated VDD (the FPGA supply). S9 is a Disable Sleep Mode Register (DSMR), which controls the gated supply of the whole LE, VDD\_SW.

- (1) When S9 (DSMR) is set to one, sleep mode is disabled. The logic value at node M in Fig. 2 is always zero regardless of SLPEN, so the PMOS conducts VDD and both the LUT and the flip-flop are powered on. In this state the LE functions normally: the value of S[7:0] is selected by input[2:0] as in an ordinary 3-input LUT. S8 configures the LE as combinational or sequential logic.
- (2) When S9 is set to zero, sleep mode is enabled.
  - (2-1) If SLPEN is one, the logic value at M is one, the PMOS cuts off the power supply, and the logic block surrounded by the broken line is powered off. The LE enters sleep mode.
  - (2-2) If SLPEN is zero, the value at M is zero, so the 3-input LE is powered on as in normal mode.

For this feature, SLPEN can be connected to the output of any LE or to a primary input pin inside the FPGA.



Fig. 2 Power Gating Architecture.

Because the LUT logic is lost in sleep mode, the output of a power gated LE must be isolated from its load during sleep to avoid leakage and functional problems. The SRAMs, which are powered by VDD, always keep their values. This is the difference from the architectures reported in [6][7]: if the SRAM registers were switched off, for example during dual-VDD selection, the LUT would not function correctly on returning from sleep mode. Similarly, if the FPGA loses power, it must be configured again. Some papers [4][5][8] describe architectures that retain register values or primary output values while the power is gated, because the DFF or the retention logic has two power rails. The architecture described in this paper does not support this feature, because many systems, such as PCs, usually reset the power gated logic after resuming from sleep mode. Therefore, all the transistors of the flip-flop are connected to VDD\_SW, which is gated by the PMOS.
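The LE behavior described above (the DSMR/SLPEN modes, output isolation during sleep, and loss of flip-flop state) can be captured in a small behavioral model. The Python sketch below illustrates that behavior under stated assumptions and is not the circuit of Fig. 2. The class name `PowerGatedLE3` is hypothetical. Node M is modeled as (NOT DSMR) AND SLPEN, which is what the three cases above imply. input[2] is assumed to be the most significant LUT select bit. An isolated output is represented by `None` because the clamp value is not specified. The flip-flop state is simply cleared while the LE is unpowered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PowerGatedLE3:
    """Hypothetical behavioral model of a 3-input power-gated LE (not the Fig. 2 netlist)."""

    s: List[int]                 # S[7:0] LUT contents, s[0] = S0; kept on ungated VDD
    s8_sequential: bool = False  # S8: True selects the registered (DFF) output
    s9_dsmr: int = 1             # S9 (DSMR): 1 = sleep disabled, 0 = sleep enabled
    ff: int = 0                  # DFF state; lives on VDD_SW, so it is lost in sleep

    def node_m(self, slpen: int) -> int:
        # M = (NOT DSMR) AND SLPEN; M drives the gate of the header PMOS.
        return int(not self.s9_dsmr and bool(slpen))

    def powered(self, slpen: int) -> bool:
        # The PMOS conducts, i.e. VDD_SW is on, only while M is 0.
        return self.node_m(slpen) == 0

    def lut(self, inputs: int) -> int:
        # input[2:0] read as an unsigned integer selects one SRAM bit (input[2] = MSB assumed).
        return self.s[inputs & 0b111]

    def clock(self, inputs: int, slpen: int) -> None:
        # One clock edge. While unpowered, the state is modeled as reset to 0.
        self.ff = self.lut(inputs) if self.powered(slpen) else 0

    def output(self, inputs: int, slpen: int) -> Optional[int]:
        if not self.powered(slpen):
            return None  # output isolated from its load; clamp value not modeled
        return self.ff if self.s8_sequential else self.lut(inputs)
```

For example, configuring `s` as a 3-input AND (`[0] * 7 + [1]`) with `s9_dsmr=1` gives a normal AND gate whatever SLPEN does. With `s9_dsmr=0`, driving SLPEN high isolates the output and discards the registered value.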

The DFF clock in each LE should also be gated in sleep mode to save dynamic power. However, the benefit of clock gating each individual LE is very small, so clock gating is implemented at the CLB level, as described in the next section.

The configuration of this power gating architecture depends mainly on the design. In other words, the DSMR value of each LE must be determined before the design is downloaded into the FPGA. LEs that have no sleep mode, or that are unused, are mapped with their DSMR set to one and their SLPEN inputs left undriven. For LEs that can sleep, the DSMR is configured to zero, and during routing their SLPEN inputs are connected to the driver specified in the RTL design. Section 5 gives more information about the EDA flow.
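The pre-download configuration rule above can be expressed as a small function. The sketch below is a hypothetical Python illustration, not part of the authors' flow. The `MappedLE` record and its fields (`used`, `sleep_capable`, `slpen_driver`) are invented names. Treating a sleep-capable LE with no SLPEN driver as an error is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class MappedLE:
    """Hypothetical record for one LE after mapping."""

    name: str
    used: bool
    sleep_capable: bool
    slpen_driver: Optional[str] = None  # RTL net that should drive SLPEN


def assign_power_config(les: List[MappedLE]) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Return the DSMR bit of every LE and the SLPEN net to route for sleep LEs."""
    dsmr: Dict[str, int] = {}
    slpen_routes: Dict[str, str] = {}
    for le in les:
        if le.used and le.sleep_capable:
            if le.slpen_driver is None:
                raise ValueError(f"{le.name}: sleep-capable LE has no SLPEN driver")
            dsmr[le.name] = 0                        # sleep mode enabled
            slpen_routes[le.name] = le.slpen_driver  # router connects SLPEN here
        else:
            dsmr[le.name] = 1                        # sleep disabled, SLPEN undriven
    return dsmr, slpen_routes
```

The two returned maps correspond to the two outputs the text describes: configuration bits written at download time, and SLPEN connections handed to the router.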

#### 4. Power Gated CLB Architecture

Although the proposed LE architecture has the advantages that each LE can be powered off individually and that on-chip gating is much faster than off-chip power rail gating, a further design consideration is required: this fine-grain power gating incurs significant area overhead, because a power switch and related gates are added to every LE. In coarse-grain power gating, by contrast, a whole block of gates is powered off by a shared collection of switch cells.

In fact, almost all commercial FPGAs use a two-level hierarchy [9]. The first level consists of k-input LEs, each of which can implement combinational or sequential logic. The second level is called the CLB or LAB. A CLB is made of n basic LEs (i.e., n LUTs and n FFs), along with local interconnect that allows the n cluster outputs to be routed back to the LUT inputs. The number of logic block inputs, i, can be less than the total number of LUT inputs in the CLB, k\*n, and the local interconnect also allows each of the i inputs to be routed to any of the k\*n LUT inputs. This clustering increases the density and speed of the FPGA [1].



Fig. 3 The FPGA architecture based on the PCHM.

Building on the proposed LE architecture, the DSMR, the SLPEN signal and isolation gates are added at the CLB level. That is, the n pairs of DSMR and SLPEN signals in the LEs of a CLB are replaced by a single pair at the CLB level. The FPGA architecture based on the PCHM (Power Control Hard-Macro) is shown in Fig. 3.

Fig. 4 illustrates the internal logic of the PCHM of a CLB. The DSMR located in the PCHM enables and disables sleep mode for the related CLB. Four DFFs generate the power-on and power-off sequences; the number of DFFs can be changed to meet different power sequence requirements. When SLPEN is asserted while the DSMR is low (sleep mode enabled), the PCHM first gates the clock before it reaches the LE level. Second, it isolates the CLB outputs by setting ISOL low to avoid sneak current. Then, the DFFs in the CLB are reset when CLBRST (CLB reset) goes low. Finally, the CLB power is gated by switching off VDD\_SW. The switching fabric typically consists of a large number of PMOS switches to provide enough current, so the area of these switches becomes sizeable if there are many LEs in the CLB.

The power-on sequence, shown in Fig. 5, runs in the opposite order after SLPEN is pulled low. The sleep time, i.e., the period during which SLPEN is high, depends on the actual state of the design in the FPGA, because SLPEN is driven by internal logic.

When sleep mode is disabled by setting the DSMR to one, the VDD\_5DFF supply in the PCHM is power gated, which saves the power consumption of the PCHM itself. In this case, none of the power gating functions is affected by SLPEN. Fig. 6 illustrates the power sequence when sleep mode is disabled, which is not affected by the PCHM.
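The power-off and power-on ordering described above can be illustrated with a behavioral model. This Python sketch is an illustration built on assumptions and is not the PCHM logic of Fig. 4. It assumes one control signal changes per clock cycle. The clock enable and switch control are given the hypothetical names `clk_en` and `vdd_sw`; ISOL and CLBRST are the active-low signals named in the text. DSMR = 1 is modeled simply by ignoring SLPEN. If SLPEN falls in the middle of the power-off sequence, the sketch undoes only the steps already applied, in reverse order.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PchmOutputs:
    clk_en: int  # 1 = CLB clock running, 0 = clock gated (hypothetical name)
    isol: int    # ISOL, active low: 0 = CLB outputs isolated
    clbrst: int  # CLBRST, active low: 0 = CLB flip-flops held in reset
    vdd_sw: int  # 1 = header switches on, 0 = CLB supply gated (hypothetical name)


AWAKE = PchmOutputs(clk_en=1, isol=1, clbrst=1, vdd_sw=1)

# Power-off order from the text: gate clock, isolate, reset, cut power.
POWER_OFF_STEPS = [("clk_en", 0), ("isol", 0), ("clbrst", 0), ("vdd_sw", 0)]
# Power-on runs the same steps in the opposite order with the values restored.
POWER_ON_STEPS = [(sig, 1) for sig, _ in reversed(POWER_OFF_STEPS)]


class PchmSequencer:
    """Behavioral sketch of the PCHM sequencing for one CLB (one step per clock)."""

    def __init__(self, dsmr: int) -> None:
        self.dsmr = dsmr      # 1 = sleep mode disabled, 0 = enabled
        self.state = AWAKE
        self._applied = 0     # how many power-off steps are currently in effect

    def tick(self, slpen: int) -> PchmOutputs:
        if self.dsmr == 1:
            # Sleep mode disabled: SLPEN has no effect and the CLB stays powered.
            return self.state
        if slpen and self._applied < len(POWER_OFF_STEPS):
            sig, val = POWER_OFF_STEPS[self._applied]
            self._applied += 1
        elif not slpen and self._applied > 0:
            # Undo the most recently applied power-off step first.
            sig, val = POWER_ON_STEPS[len(POWER_OFF_STEPS) - self._applied]
            self._applied -= 1
        else:
            return self.state
        self.state = replace(self.state, **{sig: val})
        return self.state
```

Holding SLPEN high for four cycles walks the CLB from `AWAKE` to fully gated, and holding it low for four cycles walks it back. Extending `POWER_OFF_STEPS` mirrors the remark that the number of DFFs can be changed for different power sequences.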

Three parameters, {k, n, i}, describe a CLB [1]. Suitable values for these parameters are given in Section 6, after comparing different values on the benchmarks.

Fig. 4 The Power Gated Control Logic.

Since the power of a CLB is gated by its header PMOS switches, all logic mapped into one CLB must belong to the same power domain. Otherwise, when VDD\_SW is gated, the logic of the non-sleep domain in that CLB would be affected as well. The CLB-based FPGA also contains switch boxes, connection boxes and wire channels, and the CLB inputs and outputs are distributed over all four sides of each CLB (left, right, top and bottom). These attributes are specified in the architecture file for VPR.

Fig. 6 Power sequence when sleep mode is disabled.

#### 5. Software Flow

Commercial FPGA vendors provide extensive software that covers everything from RTL design to downloading the configuration, but most of it does not support third-party architectures. Therefore, a software flow had to be developed for the proposed architecture.

Figure 7 illustrates the CAD flow we use, based on ODIN\_II [10], ABC [11], V-PACK and VPR [9][12][13]. Together they form a non-commercial FPGA design flow covering the front end through the back end. First, ODIN\_II synthesizes the Verilog hardware description language (HDL) design into a flattened structural netlist in .blif format [14]. The netlist consists of logic and black boxes (if needed). Then, ABC optimizes the netlist using And-Inverter Graphs, maps it to LUTs, and applies innovative algorithms for sequential synthesis and verification. The output of ABC is also a .blif netlist, consisting of LUTs and flip-flops. We need to separate the sleep modules and the non-sleep modules in the Verilog design, and restore the SLPEN signals for each sleep module before running ODIN\_II, because ABC otherwise removes these ports, which have no load, during optimization. Therefore, the dark block in Fig. 7 is newly added to the flow. We developed a tool that picks out, from the whole logic design, the modules whose inputs are marked with the comment "//power gating signal". The V-PACK program packs the LUTs and flip-flops into CLBs, each containing one or more LEs. After V-PACK generates its output in .net format, another tool combines the .net files of the sleep and non-sleep modules into one .net file, including the connections of the SLPEN control signals. VPR then places the circuit with a simulated annealing algorithm [13] and routes it. The output of VPR describes the circuit placement, total logic area, routing area and routing information, and various statistics, such as the minimum number of tracks per channel required to route successfully and the total wire length [12].
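The module-selection step of the new block in Fig. 7 can be sketched as follows. This is not the authors' tool; it is a hypothetical Python illustration with several simplifying assumptions. The marker comment `//power gating signal` must appear on the same line as the input declaration. Only simple `input` declarations are handled (optionally with `wire` and a bit range). Comments or strings that happen to contain `module` or `endmodule` are not handled. The function name `find_sleep_modules` is invented for this example.

```python
import re
from typing import Dict, List

# A module body: from "module <name>" up to the next "endmodule".
MODULE_RE = re.compile(r"\bmodule\s+(\w+)(.*?)\bendmodule\b", re.S)
# An input declaration followed on the same line by the marker comment.
PG_INPUT_RE = re.compile(
    r"\binput\b(?:\s+wire)?(?:\s*\[[^\]]*\])?\s*(\w+)\s*[,;)]?\s*//\s*power gating signal"
)


def find_sleep_modules(verilog_src: str) -> Dict[str, List[str]]:
    """Map each module that declares a marked input to its power gating signal names."""
    result: Dict[str, List[str]] = {}
    for m in MODULE_RE.finditer(verilog_src):
        name, body = m.group(1), m.group(2)
        signals = PG_INPUT_RE.findall(body)
        if signals:
            result[name] = signals
    return result
```

The keys of the returned dictionary are the candidate sleep modules to split off before ODIN\_II. The listed signal names are the SLPEN ports whose connections must be restored when the .net files are combined. A production tool would need a real Verilog parser rather than regular expressions.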

#### 6. Experiments

We performed experiments on the 20 largest MCNC benchmarks. Fig. 8 shows the relationship between the arithmetic mean of the total area and the CLB size n. When n = 1, the total area is larger: there is only one LE in each CLB and no local interconnect inside the CLB, so much of the area is consumed by the connection blocks and switch blocks between CLBs. Moreover, the ratio of PCHMs to LEs is 1:1, and each PCHM contains at least five flip-flops and related logic gates. When n is slightly larger, this ratio drops to one half or less, so the PCHM area becomes less significant in the total area, and the benefit of the local interconnect inside the CLB also appears. However, as n grows larger, the complexity of the interconnect inside the CLB cancels this benefit. In conclusion, this architecture achieves the highest utilization ratio when a CLB contains three LEs (n = 3). Following [1], the input parameter i is set to 2n+2, so i = 8 in our architecture.
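The arithmetic behind these choices can be checked with a few lines of Python. The sketch is illustrative and its function names are hypothetical. `cluster_inputs` uses the rule of thumb i = (k/2)(n+1) commonly attributed to Betz and Rose, which reduces to the 2n+2 used here when k = 4. Taking k = 4 is an assumption based on the 4-input LE of Fig. 1. The PCHM helpers simply amortize one PCHM, with at least five flip-flops, over the n LEs of a CLB.

```python
from fractions import Fraction


def cluster_inputs(n: int, k: int = 4) -> int:
    # Rule of thumb i = (k/2)(n+1), rounded down for odd k; for k = 4 this is 2n+2.
    return k * (n + 1) // 2


def pchm_per_le(n: int) -> Fraction:
    # One PCHM serves one CLB of n LEs.
    return Fraction(1, n)


def min_pchm_ffs_per_le(n: int, ffs_per_pchm: int = 5) -> Fraction:
    # Each PCHM has at least five flip-flops, amortized over the n LEs of its CLB.
    return Fraction(ffs_per_pchm, n)


for n in range(1, 5):
    print(n, cluster_inputs(n), pchm_per_le(n), min_pchm_ffs_per_le(n))
```

The PCHM-to-LE ratio falls from 1 at n = 1 to 1/2 at n = 2 and 1/3 at n = 3, which is the "half or less" reduction the text refers to.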

#### 7. Conclusion

We have proposed a new FPGA power gating architecture in which CLBs can be switched off by internal logic. A software flow is also presented to support the evaluation of the new architecture.



Fig. 7 Software Flow.

Naturally, an FPGA with PCHMs is larger than one without them. Note, however, that the power-off transistors account for 16.7% of the total transistor count in the PCHM FPGA architecture, as shown in Fig. 9.

An FPGA with this architecture could also be used for functional emulation of the sleep states of portable electronic devices, or even notebook PCs. Future work will focus on the relationship between timing and the CLB parameters.



Fig. 8 Relationship between area and CLB size



Fig. 9 Power-off percentage in the PCHM FPGA architecture.

#### Acknowledgements

Valuable discussions with the members of the ASIC-DA Lab at Waseda University helped make this low power architecture idea clearer and more thorough.

#### References

- [1] V. Betz, J. Rose, A. Marquardt, "Architecture and CAD for Deep-Submicron FPGAs," Kluwer Academic Publishers, 1999.
- [2] V. Betz, J. Rose, "Cluster-Based Logic Blocks for FPGAs: Area-Efficiency vs. Input Sharing and Size," CICC, 1997, pp. 551-554.
- [3] Altera Corporation, "FPGA Architecture," White Paper, July 2006.
- [4] C. Q. Tran, H. Kawaguchi and T. Sakurai, "95% Leakage-Reduced FPGA using Zigzag Power-gating, Dual-VTH/VDD and Micro-VDD-Hopping," ASSCC, Nov. 2005, pp. 149-152.
- [5] M. Keating, D. Flynn, R. Aitken, A. Gibbons and K. Shi, "Low Power Methodology Manual," Springer, 2007.
- [6] A. Gayasen, K. Lee, N. Vijaykrishnan, M. Kandemir, M. J. Irwin and T. Tuan, "A Dual-VDD Low Power FPGA Architecture," Lecture Notes in Computer Science, 2004, pp. 145-157.
- [7] J. H. Anderson, F. N. Najm, "A Novel Low-Power FPGA Routing Switch," Proceedings of the IEEE 2004 CICC, 2004, pp. 719-722.
- [8] Y. Shin, H. O. Kim, "Cell-Based Semicustom Design of Zigzag Power Gating Circuits," 8th International Symposium on Quality Electronic Design, 2007, pp. 527-532.
- [9] J. Luu, I. Kuon, P. Jamieson, T. Campbell, A. Ye, M. Fang, and J. Rose, "VPR 5.0: FPGA CAD and Architecture Exploration Tools with Single-Driver Routing, Heterogeneity and Process Scaling," in FPGA '09, ACM Symposium on FPGAs, February 2009, pp. 133-142.
- [10] P. Jamieson and J. Rose, "A Verilog RTL Synthesis Tool For Heterogeneous FPGAs," in 2005 Int'l Conference on Field Programmable Logic and Applications (FPL'05), Tampere, Finland, August 2005, pp. 305-310.
- [11] Berkeley Logic Synthesis and Verification Group, "ABC: A System for Sequential Synthesis and Verification," http://www.eecs.berkeley.edu/~alanmi/abc/abc.htm
- [12] V. Betz, T. Campbell, W. M. Fang, et al., "VPR and T-VPack User's Manual," http://www.eecg.utoronto.ca/vpr/
- [13] V. Betz and J. Rose, "VPR: A New Packing, Placement and Routing Tool for FPGA Research," in 7th International Workshop on Field-Programmable Logic, London, August 1997, pp. 213-222.
- [14] University of California, Berkeley, "Berkeley Logic Interchange Format (BLIF)," http://www1.cs.columbia.edu/~cs4861/s07-sis/blif/index.html