## Regular Paper

# Logic Optimization of Asynchronous Speed-Independent Circuits Using Transduction Methods 

Hiroshi Saito, ${ }^{\dagger}$ Hiroshi Nakamura, ${ }^{\dagger}$ Masahiro Fujita ${ }^{\dagger \dagger}$ and Takashi Nanya ${ }^{\dagger}$


#### Abstract

In this paper, we present an optimization method of asynchronous speed-independent circuits based on transduction methods. Although transduction methods are well used for the optimization of multi-level combinational circuits, the direct application to speed-independent circuits may leads to a malfunction of circuits because the property of hazard-freeness guaranteed by circuits is broken. Therefore, we discuss how to optimize speed-independent circuits by using transduction methods without leading any hazardous behavior. For instance, we extend the gate substitution algorithm in transduction methods. Finally, we evaluate the efficiency of the extended gate substitution algorithm by applying it to several benchmark circuits.


## 1. Introduction

Signal transition graphs (STGs) ${ }^{2)}$ are well used to specify the behavior of asynchronous circuits. Logic synthesis tool Petrify ${ }^{3)}$ accepts an STG and produces a hazard-free asynchronous speed-independent (SI) circuit. The SI circuit can operate correctly under arbitrary gate delay and zero wire delay ${ }^{6)}$.

Although Petrify is well established for the synthesis of SI circuits, global optimization using the relationships among the logic functions of gates is restricted to gate substitution after logic decomposition ${ }^{5}$. Namely, if a function is decomposed, Petrify exploits whether the gate corresponding to the decomposed function can substitute for other gates. However, it implies that there is no chance of optimization if no function is decomposed because of the appearances of hazardous behavior. Because the logic function of each non-input (output or internal) signal is derived separately from the binary encoded state graph of a given STG, the failure of decomposition remains redundancy in the synthesized circuit such that the identical function appears on the logic network of different signals.

Thus, in this paper, we present an alternative approach to optimize SI circuits globally based on transduction methods ${ }^{4), 8}$. Transduction methods are well used for the optimization of multi-level combinational circuits. By calculating permissible functions which represent

[^0]don't care space in a given circuit, transduction methods optimize circuits by merging common gates, substituting gates, or pruning gates so that the total number of gates and/or connections are minimized.

In this paper, we extend the gate substitution algorithm ${ }^{8)}$ to optimize SI circuits based on the standard- $C$ architecture ${ }^{5)}$. Because the gate substitution algorithm does not concern the property of hazard-freeness guaranteed in SI circuits, there is a possibility that the property is broken during transformation. Therefore, to optimize SI circuits correctly, the gate substitution algorithm is extended in the following two aspects. First, we extend the traditional calculation method of permissible functions. The key point is that the assignment of don't cares is carried out based on monotonous cover conditions ${ }^{5)}$ which guarantee hazard-freeness in SI circuits based on the standard-C architecture. Second, we introduce valid conditions of substitution to preserve the property of hazard-freeness. Although these conditions are also based on monotonous cover conditions, we modify them so that the evaluation of monotonous cover conditions is carried out on calculated logic functions and permissible functions.

Figure 1 shows how our proposed extended gate substitution algorithm works. At first, the original STG is synthesized to generate a SI circuit based on the standard-C architecture and the new STG corresponding to the synthesized circuit. The new STG is different from the original one in that it includes the behavior of not only input/output signals


Fig. 2 VME bus controller: (a) Interface, (b) Timing diagram of a read cycle, (c) STG, (d) SG, (e) Karnaugh map for signal $D$, (f) Standard-C architecture for signal $D$.


Fig. 1 Optimization flow.
but also internal signals inserted by Petrify. Then, by using draw_astg command distributed in Petrify package, the binary encoded state graph is translated from the new STG. After both the SI circuit and the state graph are generated as inputs of the extended gate substitution algorithm, permissible functions are calculated throughout the circuit. Finally, the circuit is optimized without violating the property of hazard-freeness.

The rest of this paper is organized as follows. The basic notions for SI circuits are shown in Section 2. In Section 3, we describe a traditional calculation method of permissible functions at first, and then we describe how to calculate permissible functions in SI circuits. In Section 4, we describe conditions of valid substitution and the extended gate substitution al-
gorithm. Finally, we show experimental results in Section 5 and conclude this work in Section 6.

## 2. Basic Notions of Asynchronous SI Circuits

### 2.1 Signal Transition Graph

Figure 2 (a) shows the interface of a VME bus controller. The controller controls the flow of data between Bus and Device via a Data Transceiver. There are two input signals $D r$ and $D w$ for bus side which indicate the beginning of a bus read and write cycle. After the completion of cycles, these signals are acknowledged by output signal Dack. On the other hand, there is a pair of handshake signals $L D$ and $L D a c k$ for device side. Data Transceiver is controlled by signal $D$. The timing diagram of a read cycle is shown in Fig. 2 (b).
Signal transition graphs (STGs) ${ }^{2)}$ known as interpreted Petri Nets ${ }^{7}$ ) are well used to specify the behavior of asynchronous circuits. The STG shown in Fig. 2 (c) represents the behavior of the VME bus controller, which is generated by Petrify ${ }^{3)}$ after the logic circuit is synthesized. The STG has one internal signal csc which is inserted during the synthesis. Note all signals in an STG are partitioned into the set of input signals and the set of non-input (internal and output) signals.
Rising transitions of signal $a$ are labeled with " $a+$ " and falling transitions with " $a-$ ". We also use the notation $a *$ if the sign of the tran-
sition is not specific. Multiple transitions for a signal are distinguished by an index $i$ (e.g., $D_{1}+$ and $D_{2}+$ in Fig. $2(\mathrm{c})$ ). Places (denoted by circles) can be marked with tokens (black dots). The set of all places currently marked is called a marking. On the other hand, all arcs in STGs represent the ordering relation between vertices.

A signal transition is enabled if all its input places contain a token. After the firing of an enabled transition, one token is removed from every input place and added to every output place of the transition. For example, in the initial marking $\{p 13, p 18\}$ of the STG in Fig. 2 (c), transitions of input signals, $D w+$ and $D r+$, are enabled. In that marking, however, only one of them can be fired because there is only one token in place $p 13$ (i.e., the removal of the token by the firing of one transition will disable the other). The choice is decided by the environment. If transition $D w+$ is fired, a new marking, $\{p 12, p 18\}$, where $D_{2}+$ becomes enabled is achieved. In that case, transition $D r+$ is not enabled any more (i.e., transition $D r+$ is disabled).

### 2.2 State Graph

By analyzing all of the reachable markings in a given STG, one can generate the state graph (SG) (i.e., each marking in the STG corresponds to a state in the SG). In SGs, vertices labeled with a vector of signal values represent binary encoded states and arcs between pairs of vertices labeled with a signal transition represent the ordering relation between states. Figure $2(\mathrm{~d})$ is the SG corresponding to the STG in Fig. 2 (c). In state 0'0'00000 of the SG where all of the signals have value 0 , transitions $D r+$ and $D w+$ are enabled. After the firing of $D w+$, the corresponding circuit achieves to state 01000'00 where $D+$ is enabled. Note that if a transition is enabled in a state, the corresponding signal value of the state is labeled by symbol '.

### 2.2.1 Speed-Independence Property

A synthesized asynchronous SI circuit is hazard-free if the SG satisfies the speedindependence property ${ }^{6)}$. The property consists of the following two conditions:

- no non-input signal transition can be disabled by another signal transition
- no input signal transition can be disabled by a non-input signal transition
The former ensures that there are no glitches, known as hazards, at the output of gates, while the latter ensures that there are no hazards at


Fig. 3 Violation of speed-independence.
the inputs of circuits. According to the speedindependence property, only transitions for input signals can be disabled by transitions for different input signals (i.e., a choice by environment as in Dr+ and Dw+ of Fig. 2 (c)).

Figure 3 shows a violation of the speedindependence property. Initially, in state $11^{\prime} 0$ ', the rising transition of non-input signal $c, c+$, and the falling transition of input signal $b, b-$, are enabled. If $b-$ happens before $c+$, the enabled transition $c+$ is disabled in state 100 (i.e., $c+$ cannot be fired any more). This implies a hazard at the output of the AND gate.
Note an SG must satisfy not only the speedindependence property but also other properties to be implementable as a hazard-free SI circuit. These are described in the literature ${ }^{6}$.

### 2.2.2 Next-State Function

The next-state function on a non-input signal is defined as follows ${ }^{5}$. . It maps the binary code of each SG state $s$ into:

- 1 if the signal has value $0^{\prime}$ or 1 in the binary code of $s$
- 0 if the signal has value $1^{\prime}$ or 0 in the binary code of $s$
-     * (don't care) for all binary codes that do not correspond to any reachable SG state The next-state function of signal $D$ in the SG of Fig. 2 (d) is represented in Fig. 2 (e) with the Karnaugh map.


### 2.2.3 Excitation Region and Quiescent Region

Given a signal $a$, we can classify the states of the SG into the following sets ${ }^{5)}$ :

- Positive and negative excitation regions (ERs)
- Positive and negative quiescent regions (QRs)
A maximally connected set of states in which a signal transition $a_{i} *$ is enabled is called an ER for the transition $a_{i} *$ (denoted by $\operatorname{ER}\left(a_{i} *\right)$ ). The QR of a signal transition $a_{i} *$ (denoted by $\left.\mathrm{QR}\left(a_{i} *\right)\right)$, with the corresponding $\operatorname{ER}\left(a_{i} *\right)$, is a maximal set of states $s$ reachable from $\operatorname{ER}\left(a_{i} *\right)$ such that $a *$ is stable (not enabled) in $s$ and $s$ is not reachable from any other $\operatorname{ER}\left(a_{j} *\right)(i \neq j)$ without going through $\operatorname{ER}\left(a_{i} *\right)$. Examples of

ER and QR for transition $D_{1}+$ are shown in Fig. 2 (d).

### 2.3 Standard-C architecture

In this work, we focus on the optimization of asynchronous SI circuits based on the standardC architecture. In the standard- C architecture, the logic circuit of a non-input signal consists of three-level logics as shown in Fig. 2 (f). The first-level AND-OR gates called monotonous covers (MCs) correspond to a signal transition in a given STG. An AND-OR gate represents not only a complex gate but also an AND or OR gate with multiple input inverters. They are configured by covering SG states based on the following MC conditions. The second-level OR gates gather MCs for all rising or falling transitions of a signal to organize a set or reset function of the signal. The third-level C-element corresponds to the output of a non-input signal. The C-element is equal to the function, $a=a \cdot(b+c)+b \cdot c$, and behaves as follows. The output signal $a$ is equal to 1 if both inputs $b$ and $c$ are equal to 1 and 0 if both $b$ and $c$ are equal to 0 . Otherwise, it keeps the previous value of signal $a$.

The definition of MC conditions are as follows ${ }^{5)}$.

Definition $2.1 C\left(a_{i} *\right)$ denotes one of the first-level AND-OR gates in the standard-C architecture. $C\left(a_{i} *\right)$ is a correct monotonic polyterm cover for the $\operatorname{ER}\left(a_{i} *\right)$ if the following three conditions are satisfied:
(1) Cover condition: $C\left(a_{i} *\right)$ covers all states of $\operatorname{ER}\left(a_{i} *\right)$, i.e., $C\left(a_{i} *\right)$ is logic 1 in all states of $\operatorname{ER}\left(a_{i} *\right)$
(2) One-hot condition: $C\left(a_{i} *\right)$ does not cover any state outside of $\operatorname{ER}\left(a_{i} *\right) \cup$ $\mathrm{QR}\left(a_{i} *\right)$, i.e., $C\left(a_{i} *\right)$ is logic 0 in all states outside of $\operatorname{ER}\left(a_{i} *\right) \cup \operatorname{QR}\left(a_{i} *\right)$
(3) Monotonicity condition: $C\left(a_{i} *\right)$ can change the value from 1 to 0 at most once along any state sequence within $\operatorname{QR}\left(a_{i} *\right)$
In the standard-C architecture, the speedindependence property must be satisfied not only for the transitions of input and non-input signals but also for the transitions of all MCs. The MC conditions satisfy this requirement ${ }^{5}$.

Let us explain the reason briefly. Suppose that the synthesized circuit enters a state in $\operatorname{ER}\left(a_{i}+\right)$. The cover condition ensures that $C\left(a_{i}+\right)$ changes the logic value from 0 to 1 in that state (i.e., the rising transition of $C\left(a_{i}+\right)$ is not disabled). The one-hot condition guarantees that no other gates $C\left(a_{j}+\right)(i \neq j)$ in the
set function of signal $a$ and no gates $C\left(a_{i}-\right)$ in the reset function of signal $a$ can be at 1 in that moment. Therefore, $C\left(a_{i}+\right)$ is the only gate to change the value of signal $a$ to 1 through the OR gate and the C-element (i.e., the enabled transition $a_{i}+$ is not disabled). When signal $a$ changes its value from 0 to 1 , the circuit enters a state in $\mathrm{QR}\left(a_{i}+\right)$. According to the last condition, $C\left(a_{i}+\right)$ will be reset the value from 1 to 0 (i.e., the falling transition of $C\left(a_{i}+\right)$ is not disabled) and this change is propagated to signal $a$ when it will go low (i.e., $a_{i}-$ ). As a result, we can find out that there is no disabled transition in the circuit (i.e., the speed-independence property is satisfied).
The generation of MCs is explained by considering the case of transition $D_{1}+$ in Fig. $2(\mathrm{~d}) . \operatorname{ER}\left(D_{1}+\right)$ consists of state $10100^{\prime} 00$ and $\operatorname{QR}\left(D_{1}+\right)$ consists of states 1010'111 and 1'011111. First, cube $D r \cdot \overline{D w} \cdot L D a c k \cdot \overline{D a c k}$. $\bar{D} \cdot L D \cdot c s c$ is generated to cover all the states of $\operatorname{ER}\left(D_{1}+\right)$. Then, the cube is expanded to Dr. LDack $\cdot$ csc without violating the one-hot and monotonicity conditions. This cube corresponds to the MC for transition $D_{1}+$, respectively (see Fig. 2 (e) and Fig. 2 (f)). Note the cube also covers unreachable states denoted as don't cares.

## 3. Calculation Methods of Permissible Functions

### 3.1 A Traditional Calculation Method of Permissible Functions

Before to explain a traditional calculation method of permissible functions, we define permissible functions as follows ${ }^{8)}$ :

Definition 3.1 If no non-input signals change after replacing the function realized at a gate or net to a function $f$, the function $f$ is called a permissible function for the gate or net.

Here, a net represents the wire for a signal or a fanout of a gate (a net is denoted as $n_{i}(i=$ $1,2, \ldots$ ) in Fig. $4(\mathrm{a})$ ).
Permissible functions are calculated for each net and gate based on the following two steps ${ }^{8)}$ :
(1) The logic function of each gate and net is calculated from the nets of input signals by assigning truth values.
(2) The permissible functions of each gate and net are calculated from the nets of non-input signals by assigning don't cares.


Fig. 4 Calculation of permissible functions.
By assigning truth values to the nets of input signals, the logic function of each gate and net is calculated from the nets of input signals to the nets of non-input signals. $2^{n}$ truth values are assigned for each net of input signals if there are $n$ input signals. For example in Fig. 4 (a), because there are three input signals, eight truth values are assigned to each net of input signals. A vector of these values is referred as the logic function of a gate or net. For example, the logic function of the net of input signal $b$ in Fig. 4 (a) is represented as $f\left(n_{3}\right)=(00110011)$.

After logic functions are calculated throughout a given circuit as in Fig. 4 (a), the permissible functions of each gate and net are calculated from the nets of non-input signals to the nets of input signals. Initially, the calculated logic functions of the nets of non-input signals are assigned as the permissible functions of the nets of non-input signals together with external don't cares (Fig. 4 (c)). Then, the permissible functions for all internal gates and nets are calculated by assigning don't cares. In the case of an OR gate, all of the inputs must be 0 if the output is equal to 0 . However, if the output is equal to 1 , one of its inputs must be 1 , while the others can be either 0 or 1 (i.e., don't care, denoted by $*$ ). Based on the same consideration, we can calculate the permissible functions of other types of gates. Note if a gate or net has multiple fanout, the permissible functions of the gate or net are represented as the common set of the permissible functions for each fanout.

Figure 4 (d) shows the calculated permissible functions for the circuit in Fig. 4 (a). As same as a logic function, the permissible functions for a gate or net are denoted as a vector of truth values with don't cares. For example, $G\left(n_{3}\right)=$ $\left(00^{*} 100 * 1\right)$ represents a set of permissible functions of net $n_{3}$ (i.e., (00110011), (00110001), (00010001), and (00010011)). The permissible
functions of gate $g_{1}$ and net $n_{4}, G\left(g_{1}\right)=G\left(n_{4}\right)$ $=(00010001)$, are the common set of $G\left(n_{5}\right)=$ $\left(000^{*} 000^{*}\right)$ and $G\left(n_{6}\right)=(00010001)$ because gate $g_{1}$ and net $n_{4}$ has two fanout (i.e., nets $n_{5}$ and $n_{6}$ ).

During the calculation of permissible functions, a gate or net may have several sets of permissible functions. For example in Fig. 4 (d), ( $* * * * 000^{*}$ ) is another set of permissible functions for net $n_{5}$ (the first three elements are different from the previous one). The difference comes from the selection of don't care assignment to $n_{1}$ and $n_{5}$ where the output of gate $g_{2}$ is equal to 0 . However, since to calculate all of the sets requires a lot of computation time, we select only one set based on priority. A set of permissible functions selected by priority is called Compatible Set of Permissible Functions (CSPF) ${ }^{8)}$.
For the selection of don't care assignment, we simply follow the same manner described in the paper ${ }^{8)}$. According to the paper, priority is assigned to each input of gates. After the assignment of priority, don't cares are assigned to the inputs which have higher priority, while don't care is not assigned to the input which has the lowest priority. Priority is assigned as follows:

- If an input is a fanout of a gate and the gate is connected to the net of a non-input signal, the lowest priority is assigned to the input because we cannot remove the gate connected to the non-input signal.
- If an input is a fanout of a gate which has multiple fanout, lower priority is assigned to the input because the reduction of the gate affects many parts of the circuit.
For example, we assign priority to the inputs of gate $g_{2}$ (i.e., nets $n_{1}$ and $n_{5}$ ) in Fig. 4 (d). Because $n_{5}$ is a fanout of gate $g_{1}$ which has multiple fanout ( $n_{5}$ and $n_{6}$ ), we assign higher priority not to $n_{5}$ but to $n_{1}$ (i.e., $G\left(n_{1}\right)$ has many don't cares in the elements).


### 3.2 A Calculation Method in Asynchronous SI Circuits

To calculate permissible functions (i.e., CSPF) in asynchronous SI circuits, we extend the calculation method described in Section 3.1. The reasons are as follows:

- The target of the calculation method described in Section 3.1 is combinational circuits. However, SI circuits derived by Petrify represent sequential machines which have feedback loops.
- The calculation method described in Sec-

(a) $\underbrace{\mathrm{n}_{10}}_{\mathrm{g}_{5} \mathrm{CSC}}$

(c)

Fig. 5 Calculation of permissible functions in asynchronous SI circuits.
tion 3.1 does not concern the possibility of hazards when we assign don't cares. However, since SI circuits preserve the property of hazard-freeness (i.e., it is guaranteed by Def. 2.1 in the standard-C architecture), we must carefully assign don't cares to reduce the possibility of hazardous behavior.
For the extension, we introduce the following three requirements:
(1) Removal of all feedback loops
(2) Assignment of truth values from SG states
(3) Assignment of don't cares based on Def. 2.1
Because SI circuits represent sequential machines, they contain feedback loops. To prevent iterative calculation of logic functions and permissible functions caused by these loops, we must cut all of the loops before the calculation of logic functions (see Fig. 5 (a)). After the cut of feedback loops, the calculation of logic functions and CSPF in SI circuits is considered as the cases of multi-level combinational circuits.

Since SI circuits are synthesized from an SG, truth values for each net of input signals are directly assigned from the values of SG states. As
a result, we can save memory space and computation time because of the avoidance of calculation in unreachable states. For example in Fig. $5(\mathrm{~b})$, since $D r$ is $1, L D a c k$ is $1, D w$ is 0 , and $c s c$ is 1 in state $s 1$, these values are assigned to the nets of input signals such that the first value of $f\left(n_{1}\right)$ is $0, f\left(n_{2}\right)$ is $1, f\left(n_{3}\right)$ is 0 , and $f\left(n_{11}\right)$ is 1 . After all state values are assigned to the nets of input signals, the logic functions of gates and nets are calculated as described in Section 3.1. Note each value in the logic function of non-input signals represents the next state value with respect to the current state.

For the assignment of don't cares, we refer to MC conditions described in Def. 2.1. According to the cover condition and the one-hot condition in Def. 2.1, an MC $C\left(a_{i} *\right)$ must be logic 1 in all states of $\operatorname{ER}\left(a_{i} *\right)$ and logic 0 in all states outside of $\operatorname{ER}\left(a_{i} *\right) \cup \operatorname{QR}\left(a_{i} *\right)$. Therefore, we should not assign don't cares to the values of the logic function in $C\left(a_{i} *\right)$ (i.e., the first-level AND-OR gate in the standard-C architecture) if the values are related to those states. In other words, don't cares are assigned only to the values related to the states of $\operatorname{QR}\left(a_{i} *\right)$.

In the case of the set function of signal $a$ (i.e., the second-level OR gate), don't cares are assigned to the values related to the states of all $\mathrm{QR}\left(a_{i}+\right)$ (i.e, states in $\mathrm{QR}\left(a_{1}+\right) \cup \mathrm{QR}\left(a_{2}+\right) \cup$ $\left.\ldots \cup \operatorname{QR}\left(a_{n}+\right)(i=1 \ldots n)\right)$. On the other hand, in the case of the third-level C-element, no don't cares are assigned to all the values of the logic function.

For example in Fig. 5 (c), we have never assigned don't cares for all the values of the logic function of signal $D$ (i.e., $f\left(g_{4}\right)$ ). For the set function of signal $D$ (i.e., $f\left(g_{3}\right)$ ), don't cares are assigned to the values related to states $s 2$, $s 3, s 7, s 8$, and $s 9$ because these states are in QR $\left(D_{1}+\right) \cup \mathrm{QR}\left(D_{2}+\right)$. Finally, for MCs of $D_{1}+$ and $D_{2}+$ (i.e., $f\left(g_{1}\right)$ and $f\left(g_{2}\right)$ ), we assign don't cares to the values related to $\mathrm{QR}\left(D_{1}+\right.$ ) (for $f\left(g_{1}\right)$ ) and $\operatorname{QR}\left(D_{2}+\right.$ ) (for $f\left(g_{2}\right)$ ).

By introducing these requirements, we can calculate CSPF in SI circuits. Figure 5 (c) shows the calculated CSPFs with respect to the circuit in Fig. 5 (a).

## 4. Transduction Methods in Asynchronous SI Circuits

In transduction methods described in the literature ${ }^{8)}$, we concentrate on the gate substitution algorithm. In the literature, the algorithm was applied to the optimization of multi-level combinational circuits. However, because the gate substitution algorithm does not concern the property of hazard-freeness guaranteed in SI circuits, there is a possibility that the property is broken after substitution if we apply it to SI circuits directly. Therefore, to apply it to SI circuits, we must extend the algorithm so that any substitution does not lead to hazardous behavior.

### 4.1 Conditions of Valid Gate Substitution

To preserve hazard-freeness after substitution, we propose the conditions described in Proposition 4.1. In the proposition, gate $g_{2}$ can substitute for gate $g_{1}$ only when gate $g_{2}$ satisfies the MC conditions (Def.2.1) satisfied by gate $g_{1}$. This is because in SI circuits based on the standard-C architecture, the hazardfreeness of gates is guaranteed by the MC conditions. Therefore, if $g_{2}$ satisfies the MC conditions satisfied by $g_{1}$, the substitution does not lead to any hazardous behavior.

Different from the conditions in Def.2.1 which are evaluated on SG, the conditions described in Prop. 4.1 are evaluated on calculated
logic functions and CSPFs. Before describing the proposition, we introduce several terminologies for calculated logic functions and CSPFs:

- $f(g)$ - the logic function of gate $g$
- $f_{s}(g)$ - the value of $f(g)$ in state $s$
- $G(g)$ - the CSPF of gate $g$
- $G_{s}(g)$ - the value of $G(g)$ in state $s$
- pred $_{s}$ - the set of all direct predecessor states of state $s$
The substitution of gate $g_{2}$ for gate $g_{1}$ is validated by the following proposition. $g_{1}$ and $g_{2}$ correspond to the MCs (i.e., first-level AND-OR gates) for the i-th transition of $a *$ and the j -th transition of $b *$ (i.e., $a_{i} *$ and $b_{j} *$ ).

Proposition 4.1 The substitution of gate $g_{2}$ for gate $g_{1}$ is hazard-free if the following conditions are satisfied.
(1) $\forall$ state $s: G_{s}\left(g_{1}\right)=1 \Rightarrow f_{s}\left(g_{2}\right)=1$
(2) $\forall$ state $s: G_{s}\left(g_{1}\right)=0 \Rightarrow f_{s}\left(g_{2}\right)=0$
(3) $\forall$ state $s: s \in \operatorname{QR}\left(a_{i} *\right) \wedge f_{s}\left(g_{2}\right)=1 \Rightarrow$ $f_{s^{\prime}}\left(g_{2}\right)=1$ in all states $s^{\prime}$ of preds
As the proof of Prop.4.1, we show that $g_{2}$ satisfies MC conditions not only for transition $b_{j} *$ but also for transition $a_{i} *$.
Proof: The first condition in Prop. 4.1 claims that $g_{2}$ satisfies the cover condition in Def. 2.1 for both transitions $a_{i} *$ and $b_{j}$. According to the calculation of CSPFs described in Section 3.2, no don't care is assigned to $f_{s}\left(g_{1}\right)$ if $s$ is a state in $\operatorname{ER}\left(a_{i} *\right)$. Therefore, $G_{s}\left(g_{1}\right)$ is equal to 1 in state $s$. If $f_{s}\left(g_{2}\right)$ is equal to 1 in all states $s$ where $G_{s}\left(g_{1}\right)$ is equal to 1 , it means that $g_{2}$ covers all states in not only $\operatorname{ER}\left(b_{j} *\right)$ but also $\mathrm{ER}\left(a_{i} *\right)$. In these states, the rising transition of $g_{2}$ is fired (i.e., not disabled).

The second condition claims that $g_{2}$ satisfies the one-hot condition in Def. 2.1 for both transitions $a_{i} *$ and $b_{j} *$. As same as the first condition, no don't care is assigned to $f_{s}\left(g_{1}\right)$ if $s$ is a state outside of $\operatorname{ER}\left(a_{i} *\right) \cup \operatorname{QR}\left(a_{i} *\right)$. Therefore, $G_{s}\left(g_{1}\right)$ is equal to 0 in state $s$. If $f_{s}\left(g_{2}\right)$ is equal to 0 in all states $s$ where $G_{s}\left(g_{1}\right)$ is equal to 0 , it means that $g_{2}$ does not cover not only the states outside of $\operatorname{ER}\left(b_{j} *\right) \cup \mathrm{QR}\left(b_{j} *\right)$ but also the states outside of $\operatorname{ER}\left(a_{i} *\right) \cup \operatorname{QR}\left(a_{i} *\right) . g_{2}$ is the only gate to change the value of signals $a$ and $b$ when $a_{i} *$ and $b_{j} *$ are enabled.

The third condition claims that $g_{2}$ satisfies the monotonicity condition in Def. 2.1 for both transitions $a_{i} *$ and $b_{j} * . s \in \operatorname{QR}\left(a_{i} *\right) \wedge f_{s}\left(g_{2}\right)=$ 1 means that state $s$ is in $\operatorname{QR}\left(a_{i} *\right)$ and $f_{s}\left(g_{2}\right)$ is equal to 1 . Under this situation, if all direct predecessor states $s^{\prime}$ (i.e., $s^{\prime}$ is a state in $p r e d_{s}$ ) are covered by $g_{2}$, there is no additional rising
transition between $s^{\prime}$ and $s$. It preserves that only a falling transition of $g_{2}$ is fired in both $\operatorname{QR}\left(a_{i} *\right)$ and $\operatorname{QR}\left(b_{j} *\right)$.

Before substitution, Prop. 4.1 must be evaluated to validate hazard-freeness. If one of them is not satisfied, the substitution is prevented because it leads to hazardous behavior.
The substitution for a second-level OR gate is also carried out by using Prop.4.1. For example, if the first-level MC (denoted as $g_{2}$ ) of transition $a_{1}+$ substitutes for the second-level OR gate (denoted as $g_{1}$ ) of transitions $b_{j}+$ ( $j=1,2 \ldots$ ), $g_{2}$ must satisfy Prop. 4.1 for all MCs of $b_{j}+$. On the other hand, a gate $g$ which substitutes for a C-element must have the same logic function as the C-element because no don't cares are assigned to the C-element (see Section 3.2).

### 4.2 Extended Gate Substitution Algorithm

To apply the gate substitution algorithm to asynchronous SI circuits based on the standardC architecture, we extend it so that substitution is realized only when Prop. 4.1 is satisfied. Figure 6 shows a pseudo code of our extended gate substitution algorithm. It evaluates the possibilities of substitution for all pairs of gates. Currently, there is no particular way to decide the order of substitution. However, since the order may affect the quality of optimization, we are going to consider it in our future work.

In our extended gate substitution algorithm, the evaluation of Prop. 4.1 is classified into three cases by the statements of calculated logic functions and CSPFs.

Case 1. $f\left(g_{1}\right)=f\left(g_{2}\right)$. If the logic functions of two gates $g_{1}$ and $g_{2}$ are equivalent, $g_{2}$ substitutes for $g_{1}$ without checking Prop. 4.1 because it implies that all conditions of Prop. 4.1 are satisfied. The substitution of $g_{1}$ for $g_{2}$ is also possible.

Case 2. $f\left(g_{2}\right) \subset G\left(g_{1}\right)\left(\right.$ or $\left.f\left(g_{1}\right) \subset G\left(g_{2}\right)\right)$. Suppose $f\left(g_{2}\right) \subset G\left(g_{1}\right)$. In this case, our algorithm evaluates whether $g_{2}$ substitutes for $g_{1}$ or not. $f\left(g_{2}\right) \subset G\left(g_{1}\right)$ means that in all states $s$ where $G_{s}\left(g_{1}\right)$ is 0 or $1 f_{s}\left(g_{2}\right)$ has the same value. Since it means that the first two conditions of Prop. 4.1 are satisfied, only the last condition of Prop. 4.1 must be checked before the substitution. We call the check of Prop. 4.1 in Case 2 as Case2_check.

Similarly, in the case of $f\left(g_{1}\right) \subset G\left(g_{2}\right)$, our algorithm evaluates whether $g_{1}$ substitutes for $g_{2}$ or not.

Input: An initial SI circuit and the corresponding SG
Output: An optimized SI circuit

```
begin
calculate CSPF \(G\left(g_{i}\right)\) for all \(i\) using SG
while network is changed do
    foreach pair of gates, \(g_{i}\) and \(g_{j}(i \neq j)\) do
        /* Case1 */
        if \(f\left(g_{i}\right)=f\left(g_{j}\right)\)
            disconnect all connections to \(g_{j}\)
            connect \(g_{i}\) to all fanout gates of \(g_{j}\)
                recalculate CSPF \(G\left(g_{i}\right)\) for all \(i\)
                break
        /* Case2 */
        if \(G\left(g_{i}\right) \supset f\left(g_{j}\right)\) and Case2_check of \(g_{j}\) wrt \(g_{i}\)
            disconnect all connections to \(g_{i}\)
            connect \(g_{j}\) to all fanout gates of \(g_{i}\)
            recalculate CSPF \(G\left(g_{i}\right)\) for all \(i\)
            break
        if \(G\left(g_{j}\right) \supset f\left(g_{i}\right)\) and Case2_check of \(g_{i}\) wrt \(g_{j}\)
            disconnect all connections to \(g_{j}\)
            connect \(g_{i}\) to all fanout gates of \(g_{j}\)
            recalculate CSPF \(G\left(g_{i}\right)\) for all \(i\)
            break
        /* Case 3 */
        \(N e w G=G\left(g_{i}\right) \cap G\left(g_{j}\right)\)
        if \(N e w G \neq \emptyset\)
            explore a set of new gates \(g\) such that \(N e w G \supset f(g)\)
            if \(g \in\) New exists and Case3_check of \(g\) wrt \(g_{i}\) and \(g_{j}\)
                disconnect all connections to \(g_{i}, g_{j}\)
                connect \(g\) to all fanout gates of \(g_{i}, g_{j}\)
                recalculate CSPF \(G\left(g_{i}\right)\) for all \(i\)
                break
    endforeach
endwhile
end
```

Fig. 6 Extended gate substitution algorithm.
Case 3. Other Cases. In all other cases, we calculate the conjunction of $G\left(g_{1}\right)$ and $G\left(g_{2}\right)$ denoted by New $G$ which represents the common set of both $G\left(g_{1}\right)$ and $G\left(g_{2}\right)$. If $N e w G$ is not empty, a set of new gates $g$ whose $f(g)$ is included in $N e w G$ (i.e., New $G \supset f(g)$ ) is explored. Such a set is denoted as New. If a gate $g$ in New satisfies all of the conditions of Prop. 4.1 with respect to $g_{1}$ and $g_{2}$, the newly cleated gate $g$ substitutes for both $g_{1}$ and $g_{2}$. Different from Case2_check, we call the check of Prop. 4.1 in Case3 as Case3_check.

The exploration of a new gate $g$ is explained by using an example in Fig. 7. In this example, $G\left(g_{1}\right)=\left({ }^{* *} 10\right), G\left(g_{2}\right)=\left({ }^{*} 1^{*} 0\right)$, and $N e w G=$ (*110) $\left(G\left(g_{1}\right) \cap G\left(g_{2}\right)\right)$. Since the inputs of the new gate $g$ is configured by the inputs of both $g_{1}$ and $g_{2}$ (i.e., $n_{1}$ and $n_{2}$ ), we explore the new gate $g$ by the combination of $n_{1}$ and $n_{2}$ such that $f(g)$ is equal to either (1110) or (0110). In this case, because $f\left(n_{1}\right)=(0101)$ and $f\left(n_{2}\right)=$ (0011), we can identify that $f(g)$ will be (1110) when we take NAND of $x 1$ and $x 2$. Therefore, a NAND gate $g$ is created in New.

### 4.2.1 Example

To demonstrate how our extended gate substitution algorithm works, we apply it to an example circuit.


Fig. 7 Creation of a new gate $g$.


Fig. 8 Example of gate substitution.
Figure 8 (a) shows the SG of this example and Fig. 8 (b) shows a part of the corresponding SI circuit. In this example, we focus on the substitution of gate $g_{2}$ for gate $g_{3}$ at first. Because of $G\left(g_{3}\right)$ ว $f\left(g_{2}\right)$ (i.e., $G\left(g_{3}\right)=\left(11^{* * *} 0000000000000\right) \supset$ $\left.f\left(g_{2}\right)=(111100000000000000)\right)$, the substitution of gate $g_{2}$ for $g_{3}$ is evaluated based on Case2_check.

In Case2_check, only the last condition of Prop. 4.1 is evaluated. For the last condition of Prop. 4.1, states $s$ in $\mathrm{QR}(c s c-)$ and $f_{s}\left(g_{2}\right)=1$ are enumerated. States $s 3$ and $s 4$ correspond to such states. In these states, we check whether all of the direct predecessor states of $s 3$ and $s 4$ are covered by $g_{2}$ or not. In states $s 1, s 2$, and $s 4$ which are the states of pred $_{s 3}$ and pred $_{s 4}$ $\left(\right.$ pred $_{s 3}=\{s 1, s 4\}$ and pred $\left._{s 4}=\{s 2\}\right)$, since $g_{2}$ covers these states (i.e., $f_{s 1}\left(g_{2}\right), f_{s 2}\left(g_{2}\right)$, and $\left.f_{s 4}\left(g_{2}\right)=1\right)$, the last condition of Prop. 4.1 is satisfied (see Fig. 8 (d)) As a result, the substitution of gate $g_{2}$ for gate $g_{3}$ is carried out without leading any hazardous behavior as in Fig. 8 (e).

As another case, to demonstrate how our algorithm blocks an invalid substitution, we consider the substitution of gate $g_{3}$ for gate $g_{2}$. In this case, we can see a violation of the sec-
ond condition of Prop. 4.1 in state $s 5$. In state $s 5$ which is outside of $\operatorname{ER}($ aout +$) \cup \mathrm{QR}($ aout +$)$, both $f_{s 5}\left(g_{2}\right)$ and $G_{s 5}\left(g_{2}\right)$ are equal to 0 . However, because $f_{s 5}\left(g_{3}\right)$ is equal to 1 in state $s 5$, there exists an additional rising transition of aout+ in that state. Since this transition is not specified in the given SG (i.e., aout+ is not enabled in state $s 5$ ), we find out that this is a hazard. Because of the violation of Prop.4.1, our algorithm prevents the substitution.

## 5. Experimental Results

To evaluate our extended gate substitution algorithm, we applied it to several benchmark circuits derived from asynchronous logic synthesis tool Petrify. For this purpose, we implemented our algorithm in JAVA. The experiment was carried out on an Windows 2000 machine which has a Pentium III processor ( 800 MHz ) and a 320 Mbyte memory.

Table 1 shows the results of optimization after our algorithm was applied. The first column shows the name of circuits. The second column shows the number of signals in the original STGs. The third column represents the number of newly inserted signals (i.e., internal signals) by Petrify because of logic decomposition. The fourth column represents the number of states in SGs after the new signals are inserted. The fifth and sixth ones show the number of gates and literals in the original SI circuits obtained by Petrify. Note gate substitution by using newly inserted signals was carried out on circuits nak-pa, slave, and sfir. The seventh and eighth ones show the results after our algorithm was applied to the original SI circuits. The final column shows the calculation time taken by our algorithm.

The experimental results show that the area reduction ratio by our algorithm is about $14 \%$ for both the number of literals and gates. In these optimized circuits, there exist circuits which were not optimized by Petrify (exp and $p c c$ in this experiment) and circuits which were already optimized by Petrify (nak-pa, slave, and sfir). Therefore, by using our algorithm as a post-optimizer of Petrify, we may get more optimum circuits.

In the following, we describe the current limitations of our algorithm. In addition to them, we briefly show the future direction of this work.

- Our algorithm can handle circuits which have 20,000 states. However, explicitly as-

Table 1 Optimization result.

| name | signals | inserted | states | Original SI |  | Optimized SI |  | time |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | (STG) | signals |  | gates | lits. | gates | lits. | (sec.) |
| nak-pa | 10 | 1 | 60 | 7 | 19 | 6 | 17 | 0.19 |
| slave | 14 | 3 | 311 | 11 | 45 | 9 | 36 | 1.45 |
| sfir | 13 | 2 | 238 | 11 | 41 | 10 | 38 | 0.94 |
| exp | 5 | 0 | 16 | 5 | 19 | 4 | 14 | 0.11 |
| pcc | 16 | 0 | 2240 | 9 | 27 | 8 | 25 | 3.37 |

signing these states to input signals to calculate CSPFs faces to computational difficulties if the number of states is increased. Therefore, to solve this problem, an implicit method by using binary decision diagram (BDD) ${ }^{1)}$ will be considered in our future work.

- Our algorithm based on Prop. 4.1 is applicable only for the standard-C architecture. To apply our algorithm to other implementations of SI circuits, we need to modify the assignment method of don't cares and the conditions of valid substitution based on the implementability conditions of such implementations.


## 6. Conclusions

Because in Petrify global optimization is restricted to gate substitution after logic decomposition, redundancy may be remained in the synthesized asynchronous SI circuits. To solve this problem, we proposed an alternative optimization method for SI circuits based on transduction methods. Because the property of hazard-freeness must be satisfied in optimized circuits, the gate substitution algorithm in transduction methods was extended by introducing a calculation method of permissible functions and valid conditions of substitution. The extended gate substitution algorithm was implemented in JAVA.

The experimental results showed that the area reduction ratio by our algorithm was about $14 \%$ for both the number of literals and gates. In addition, the results showed the usefulness of our algorithm in that it could optimize the circuits which were not optimized by Petrify and the circuits which were already optimized by Petrify. From this reason, by using our algorithm as a post-optimizer of Petrify, we may get more optimum circuits.

As future work, to optimize large circuits efficiently, we are going to extend our algorithm by using BDD. Another work is to extend our algorithm so that other implementation styles of SI circuits can be optimized.

Acknowledgments We would like to thank Dr. Alex Kondratyev (Cadence Barkley Lab.) and Prof. Jordi Cortadella (Universitat Politecnica de Catalunya) for their useful comments. This work was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research (B)13480076 and Special Coordination Fund for Science and Technology.

## References

1) Bryant, R.E.: Graph-based algorithms for boolean function manipulation, IEEE Trans. Comput., Vol.C-35, No.8, pp.677-691 (Aug. 1986).
2) Chu, T.A.: Synthesis of Self-Timed VLSI Circuits from Graph-Theoretic Specifications, Ph.D. thesis, MIT, Boston (1987).
3) Cortadella, J., Kishinevsky, M., Kondratyev, A., Lavagnumber, L. and Yakovlev, A.: Petrify: a tool for manipulating concurrent specifications and synthesis of asynchronumberus controllers. IEICE Trans. Inf. Syst., Vol.E80D, No.3, pp.314-325 (March 1997).
4) Futita, M.: A logic synthesis system with multi-level logic circuit minimization mechanism based on transduction methods, IPSJ Trans., Vol.30, No.5, pp.613-623 (May 1989).
5) Kondratyev, A., Cortadella, J., Kishinevsky, M., Lavagnumber, L. and Yakovlev, A.: Logic decomposition of speed-independent circuits, Proc. IEEE, Vol.87, No.2, pp.347-362 (Feb. 1999).
6) Muller, D.E. and Bartky, W.S.: A theory of asynchronous circuits, Proc. International Symposium on the Theory of Switching, pp.204243, Harvard University Press (Apr. 1959).
7) Murata, T.: Petri nets: Properties, analysis and applications, IEEE Press, Vol.77, pp.541580 (Apr. 1989).
8) Muroga, S., Kambayashi, Y., Lai, H.C. and Culliney, J.N.: The transduction methoddesign of logic networks based on permissible functions, IEEE Trans. Comput., Vol.38, No.10, pp.1404-1424 (Oct. 1989).
(Received October 15, 2003)
(Accepted March 5, 2004)


Hiroshi Saito received the B.S and M.S degrees in computer science from the University of Aizu in 1998 and 2000. In 2003, he received the Ph.D. degree in engineering from the University of Tokyo. He is currently a research associate of Research Center for Advanced Science and Technology at the University of Tokyo. His research interests include the computer aided design of asynchronous systems. He is a member of the IEEE and IEICE.


Hiroshi Nakamura received the B.E., M.E., and Ph.D. degree in Electrical Engineering from the University of Tokyo in 1985, 1987, and 1990 respectively. From 1990 to 1996, he was a faculty of Institute of Information Sciences and Engineering at University of Tsukuba, where he was a member of CP-PACS project. He is currently an Associate Professor of Research Center for Advanced Science and Technology at the University of Tokyo. His research interests include computer architecture, high-performance and dependable computing, VLSI design, and bioinformatics. He received the Best Paper Award from IPSJ in 1994 and Sakai Special Researcher Award from IPSJ in 2002. He is a member of the IEEE, the ACM, and the IPSJ.


Masahiro Fujita received the B.S. degree in electrical engineering in 1980, and the M.S. and Ph.D. degrees in information engineering from the University of Tokyo, Tokyo, Japan, in 1982 and 1985, respectively. From 1985 to 1993, he was a Research Scientist with Fujitsu Laboratories, Kawasaki, Japan. From 1994 to 1999, he was the Director of the Advanced Computer-Aided Design Research Group, Fujitsu Laboratories of America, Sunny-vale, CA. He is currently a Professor in the Department of Electrical Engineering, the University of Tokyo, Tokyo, Japan. He has been on program committees for many conferences dealing with digital design and is an Associate Editor of Formal Methods on Systems Design. His primary research interest is in the computer-aided design of digital systems. Dr. Fujita received the Sakai Award from IPSJ in 1984.


Takashi Nanya received the B.E. and M.E. degrees in mathematical engineering and information physics from the University of Tokyo, Japan, in 1969 and 1971, respectively, and the Ph.D. degree in electrical engineering from the Tokyo Institute of Technology, Tokyo, Japan, in 1978. He was with the NEC Central Research Laboratories from 1971 to 1981, and on the faculty of Tokyo Institute of Technology from 1981 to 1996. In 1996, he joined the University of Tokyo where he is a professor. From 2001 to 2004, he served as the director of the Research Center for Advanced Science and Technology. His research interests include dependable computing, VLSI design and asynchronous computing. He received the IEICE Best Paper Award in 1987, the Okawa Publication Award in 1994, and ASP-DAC Best Paper Award in 1998. He has been serving as a vice-chair of IFIP-TC10 "Computer Systems Technology", and IFIP WG10.4 "Dependable Computing and Fault Tolerance". He is a fellow of IEEE and IEICE.


[^0]:    $\dagger$ Research Center for Advanced Science and Technology, The University of Tokyo
    $\dagger \dagger$ Department of Electrical Engineering, The University of Tokyo

