## Regular Paper

# Performance Evaluation of Signed-Digit Architecture for Weighted-to-Residue and Residue-to-Weighted Number Converters with Moduli Set ( $2^{n}-1,2^{n}, 2^{n}+1$ ) 

Shuangching Chen ${ }^{\dagger}$ and Shugang Wei ${ }^{\dagger}$


#### Abstract

High-speed signed-digit (SD) architectures for weighted-to-residue (WTOR) and residue-toweighted (RTOW) number conversions with the moduli set $\left(2^{n}, 2^{n}-1,2^{n}+1\right)$ are proposed. The complexity of the conversion has been greatly reduced by using compact forms for the multiplicative inverse and the properties of modular arithmetic. The simple relationships of WTOR and RTOW number conversions result in simpler hardware requirements for the converters. The primary advantages of our method are that our conversions use the modulo $m$ signed-digit adder (MSDA) only and that the constructions are simple. We also investigate the modular arithmetic between binary and SD number representation using circuit designs and a simulation, and the results show the importance of SD architectures for WTOR and RTOW number conversions. Compared to other converters, our methods are fast and the execution times are independent of the word length. We also propose a high-speed method for converting an SD number to a binary number representation.


## 1. Introduction

In a binary number system, the carry propagation during addition limits the speed of arithmetic operation. A well-known property of the residue number system (RNS) is that the $i$ th residue digit of the sum, difference, and product of the operands is exclusively dependent on the $i$ th digits ${ }^{1), 2}$. Digital signal processing hardware based on RNS is currently considered important for high-speed and low-cost hardware realization ${ }^{3}$. However, residue number arithmetic cannot be used for some applications, because RNS does not have weights in the residue digits. Moreover, fast conversions between the residue numbers and weighted numbers are required. Residue number arithmetic with the moduli set ( $2^{n}-1,2^{n}, 2^{n}+1$ ) has been widely used, because residue addition can be implemented using a binary adder ${ }^{4,5}$ ). When the ordinary binary number system is used in residue arithmetic, the binary arithmetic speed which is bounded by the size of the numbers is the prncipal cause of timing problems.

The redundant binary number representation was first introduced by Avizienis in $1961{ }^{6}$ ). A radix-two signed-digit (SD) number system ${ }^{6), 7)}$, which has a set of $\{-1,0,1\}$ and no need for a separate sign digit, is well known to offer advantages in an arithmetic circuit imple-

[^0]mentation without the carry propagation. Such number representation systems possess sufficient redundancy to allow for the annihilation of carry or borrow chains, and hence result in fast propagation-free addition and subtraction. A novel residue arithmetic hardware algorithm using a radix-two SD number representation has been proposed to implement the modulo $m$ multiplication for the symmetric $\mathrm{RNS}^{8), 9)}$. The key to increasing the computation speed of such multiplication is to perform fast modular additions with a large modulus. The advantages of fast multiplication and addition have been clearly demonstrated, although output decoding is difficult when comparing the magnitudes of two residue numbers.

The weighted-to-residue (WTOR) and re-sidue-to-weighted (RTOW) number conversions are the crucial steps for any successful RNS application. For general moduli sets, residue-to-binary number conversions are based on the Chinese remainder theorem (CRT) or mixed radix conversion (MRC). However, a direct implementation of CRT is unprofitable because it is based on the modulo $M$ operation where $M$ is large. A number of residue-to-binary number converters have been proposed ${ }^{10) \sim 12)}$ for the moduli set $\left(2^{n}-1,2^{n}, 2^{n}+1\right)$. The algorithm of Andraos et al. ${ }^{10)}$ uses multiplicative inverses to simplify the CRT and that of Piestrak ${ }^{11)}$ is an improved algorithm. Arithmetic modulo $2^{2 n}-1$ is required, and the implementation units of carry save adders (CSAs) and
multiplexers improve its speed. The approach of Conway et al. ${ }^{12 \text { ) }}$ uses several coefficients to establish the dynamic range but the CRT equation becomes more complex.

The mixed radix conversion technique is another method of RNS-to-binary number conversion. The classical Szabo-Tanaka ${ }^{1)}$ procedure consists of a MRC and an additional adjustment that is slow and requires $n$ computational steps to get the result, where $n$ is the number of residue system moduli. Several MRC approaches have been developed ${ }^{13) \sim 16)}$. The method of Huang ${ }^{14)}$ uses look-up tables only at the first stage of the implementation, followed by regular standard binary hardware. To get mixed radix coefficients, the MRC method requires $n$ modulo additions and $n$ single input table look-up stages, where the modulo addition can be replaced using 2's complement addition followed by a correction factor on the adder output. The moduli set $\left(2^{n}-1,2^{n}, 2^{n}+1\right)$ is a recent variation of the well-known MRC technique ${ }^{16)}$. The reason for taking the moduli set $\left(2^{n}-1,2^{n}, 2^{n}+1\right)$ is that the residue adder can be efficiently performed in an end-around-carry adder. In addition, the multiplicative inverses of the set result in simplified expressions for the conversion throughout the mathematical relationships. However, the carry propagation will arise inside the mixed radix digits during addition/subtraction and the speed of the arithmetic is limited. In the residue-to-weighted number conversion, the crucial work is using an efficient multiplier and treating the subtraction. In this paper, the SD number system is newly introduced to avoid the carry propagation and simplify the arithmetic for number conversions. We also introduce the idea of multiplicative inverses of the most popular special moduli set $\left(2^{n}-1,2^{n}, 2^{n}+1\right)$. Therefore, a multiplier is not needed for our conversion because the embedded multiplications can be replaced by simple shift-left operations. Moreover, we also find that the timing depends on the order of the moduli $2^{n}-1,2^{n}$ and $2^{n}+1$. The design results show that the proposed residue number arithmetic circuits are faster than those based on binary arithmetic.

In the following section, we give the definition and several properties of a redundant modular representation for RNS, by which an efficient arithmetic algorithm with radix-two SD numbers can be constructed. In Section 3 , novel weighted-to-residue and residue-
to-weighted number conversion algorithms are proposed. The conversion process depends on simple mathematical relationships without multiplications. Section 4 shows the hardware design and the simulation results, including the performance of our proposed conversions. Finally, Section 5 concludes this work.

## 2. Residue Number System and Residue Arithmetic with SD Number Representation

We consider a residue number system that has a set of relatively prime moduli, $\left\{2^{n}, 2^{n}-\right.$ $\left.1,2^{n}+1\right\}$. A residue digit with respect to a modulus $m_{i}$ is represented by the number set:

$$
\begin{equation*}
l_{m_{i}}=\left\{0, \cdots,\left(m_{i}-1\right)\right\} . \tag{1}
\end{equation*}
$$

Let $M=\prod_{i=1}^{3} m_{i}=2^{n}\left(2^{n}-1\right)\left(2^{n}+1\right)$. Szabo et al. ${ }^{1)}$ proved that if $0 \leq A<M$, then the integer $A$ has a one-to-one correspondence to its RNS representation. The integer $A$ is uniquely represented by a 3 -tuple $\left(A_{1}, A_{2}, A_{3}\right)$, where

$$
\begin{equation*}
A_{i}=|A|_{m_{i}}=A-\left[A / m_{i}\right] \times m_{i} \tag{2}
\end{equation*}
$$

for $i=1,2,3$. In the above equation, $\left[A / m_{i}\right]$ is the integer part, and each residue digit is defined as the remainder of least magnitude when $A$ is divided by $m_{i}$.

A residue number $x$ can be represented by an $n$-digit radix-two SD number representation as follows:

$$
\begin{equation*}
x=x_{n-1} 2^{n-1}+x_{n-2} 2^{n-2}+\cdots+x_{0} \tag{3}
\end{equation*}
$$

where $x_{i} \in\{-1,0,1\}$, which can be denoted as $x=\left(x_{n-1}, x_{n-2}, \cdots, x_{0}\right)_{S D}$. In the SD number representation, $x$ has a value in the range of $\left[-\left(2^{n}-1\right), 2^{n}-1\right]$. However, it is difficult to know if $x$ is in $l_{m}$, because of the redundancy of the SD number representation. (The subscript $i$ is omitted.)

To simplify the manipulation of the modular operation in an SD number representation, we apply the definition that each residue digit has the following redundant residue number set:

$$
\begin{align*}
L_{m}= & \left\{-\left(2^{n}-1\right), \cdots, 0, \cdots,(m-1)\right. \\
& \left.\cdots,\left(2^{n}-1\right)\right\} \tag{4}
\end{align*}
$$

Thus, $x$ must be in $L_{m}$ when it is expressed in an $n$-digit SD number representation. Obviously,

$$
\begin{aligned}
-x & =-\left(x_{n-1}, x_{n-2}, \cdots, x_{0}\right)_{S D} \\
& =\left(-x_{n-1},-x_{n-2}, \cdots,-x_{0}\right)_{S D}
\end{aligned}
$$

is also in $L_{m}$.
[Definition 1] Let $X$ be an integer and $m$ be a modulus. Then $x=\langle X\rangle_{m}$ is defined as an integer in $L_{m}$. When $|X|_{m} \neq 0, x$ has one of two possible values:

$$
\begin{equation*}
x=\langle X\rangle_{m}=|X|_{m} \tag{5}
\end{equation*}
$$

and

$$
\begin{equation*}
x=\langle X\rangle_{m}=|X|_{m}-\operatorname{sign}\left(|X|_{m}\right) \times m, \tag{6}
\end{equation*}
$$

where

$$
\operatorname{sign}(x)= \begin{cases}-1 & x<0 \\ 1 & x \geq 0\end{cases}
$$

When $|X|_{m}=0$ and $m=2^{n}-1, x$ has three possible values, that is, $-m, 0$ and $m$.
The integer set $l_{m}$ in (1) is a partial set of $L_{m}$. The intermediate results calculated in $L_{m}$ are used for fast residue arithmetic. If necessary for a final result, they can be converted into $l_{m}$. Thus, the addition, subtraction and multiplication of 3-tuples $A=\left(A_{1}, A_{2}, A_{3}\right)$ and $B=\left(B_{1}, B_{2}, B_{3}\right)$ in an RNS can be represented as follows:

$$
\begin{align*}
A \pm B= & \left(\left\langle A_{1} \pm B_{1}\right\rangle_{m_{1}},\left\langle A_{2} \pm B_{2}\right\rangle_{m_{2}},\right. \\
& \left.\left\langle A_{3} \pm B_{3}\right\rangle_{m_{3}}\right),  \tag{7}\\
A \times B= & \left(\left\langle A_{1} \times B_{1}\right\rangle_{m_{1}},\left\langle A_{2} \times B_{2}\right\rangle_{m_{2}},\right. \\
& \left\langle A_{3} \times B_{3}\right\rangle_{m_{3}}, . \tag{8}
\end{align*}
$$

Obviously, the following properties exist in the redundant modular representation.

## [Property 1]

Let $a$ and $b$ be integers. Then
(a) $\operatorname{abs}\left(\langle a\rangle_{m}\right) \leq 2^{n}-1$,
(b) $\langle a+b\rangle_{m} \equiv\left\langle\langle a\rangle_{m}+\langle b\rangle_{m}\right\rangle_{m}$,
(c) $\langle a \times b\rangle_{m} \equiv\left\langle\langle a\rangle_{m} \times\langle b\rangle_{m}\right\rangle_{m}$,
and
(d) $\langle-a\rangle_{m} \equiv-\langle a\rangle_{m}$,
where $a b s(x)$ is the absolute value of x and $\equiv$ indicates binary congruent modulo $m$.

## 3. Mixed Radix Number System

The basic structures of our converters are shown in Fig. 1. A good way to approach this paper is to consider that the RNS with the SD number representation is integrated with the weighted number system by WTOR and


Fig. 1 Architecture of arithmetic operation.

RTOW number conversions.
Mixed radix conversion is fairly simple in principle. It would be usable in its given form in a residue computation system, because the residue computation system would be equipped to perform the arithmetic operation modulo $m_{i}$, not modulo $M$ as required by the Chinese remainder theorem.

A number $X$ as input or output in the architecture for moduli set $\left\{m_{1}, m_{2}, m_{3}\right\}$ can be expressed in mixed radix form:

$$
\begin{equation*}
X=a_{3}\left(w_{3}\right)+a_{2}\left(w_{2}\right)+a_{1}\left(w_{1}\right) \tag{9}
\end{equation*}
$$

where $0 \leq a_{i}<m_{i}$ for $\mathrm{i}=1,2,3$. The term $a_{i}$ is mixed and $w_{1}=1, w_{2}=m_{1}$ and $w_{3}=$ $m_{2} m_{1}$ are radices. The mixed radix representation of $X$ is denoted by $\left(a_{3}, a_{2}, a_{1}\right)$, where the digits are listed in order of decreasing significance. The moduli $\left(2^{n}-1,2^{n}, 2^{n}+1\right)$ residue number system has gained popularity because a modulo $m$ addition in this system can be efficiently performed in an end-aroundcarry adder. For this reason, this type of RNS can be expected to play an increasing role in an RNS with weighted-to-residue and residue-to-weighted number conversions. Let $m_{3}=2^{n}, m_{2}=2^{n}-1$ and $m_{1}=2^{n}+1$ then $w_{1}=1, w_{2}=2^{n}+1$ and $w_{3}=\left(2^{2 n}-1\right)$. Notice that in Fig. 1, the input and output of the binary number system must be in the legitimate range $\left[0, m_{3} m_{2} m_{1}-1\right]$.

We represent numbers redundantly based on the redundant binary representation. This mainly has the following advantages over a binary number system.
(1) Parallel operation

The carry propagation that arises in the binary number system can be avoided using the redundant number representation.
(2) Fast subtraction

Subtraction can be done using addition with invers addend. In the redundant number representation, the NOT logical operation can be implemented by replacing $Y$ with $-Y$, and we do not need any gates for the NOT logical operation.
(3) Word length

According to the SD number representation, all the residues in the moduli set $\left(2^{n}, 2^{n}-1,2^{n}+1\right)$ are within $n$-digits, so the SD number representation is more efficient than the binary, which needs $(n+1)$-bit binary numbers for the moduli $\left(2^{n}+1\right)$.

Table 1 Rules for adding binary SD numbers.

|  | $a b s\left(x_{i}\right)=a b s\left(y_{i}\right)$ | $a b s\left(x_{i}\right) \neq a b s\left(y_{i}\right)$ |  |  |
| :---: | :---: | :---: | :---: | :---: |
|  |  | $\left(x_{i}+y_{i}\right) \times\left(x_{i-1}+y_{i-1}\right) \leq 0$ | $\left(x_{i}+y_{i}\right) \times\left(x_{i-1}+y_{i-1}\right)>0$ |  |
| $w_{i}$ | 0 | $x_{i}+y_{i}$ | $-\left(x_{i}+y_{i}\right)$ |  |
| $c_{i}$ | $\left(x_{i}+y_{i}\right) / 2$ | 0 | $x_{i}+y_{i}$ |  |

### 3.1 Weighted-to-Residue Number Conversion

The proposed method for conversion from the weighted number system to the residue number system is for a specific moduli set $m_{3}=$ $2^{n}, m_{2}=2^{n}-1, m_{1}=2^{n}+1$. Let $B$ be a $3 n$-digit number, represented as (9), and given by

$$
\begin{equation*}
B=a_{3} \times\left(m_{2} m_{1}\right)+a_{2} \times\left(m_{1}\right)+a_{1} \tag{10}
\end{equation*}
$$

where $0 \leq B<2^{n}\left(2^{n}-1\right)\left(2^{n}+1\right)$. Let $b_{1}=B \bmod 2^{n}+1, b_{2}=B \bmod 2^{n}-1$ and $b_{3}=B \bmod 2^{n}$, where $b_{1}, b_{2}$ and $b_{3}$ are residue digits for the moduli $2^{n}+1,2^{n}-1$ and $2^{n}$. Evidently, $b_{1}=a_{1}$. Then, $b_{2}$ and $b_{3}$ are converted as follows:

We can implement the above number conversion by using binary arithmetic method. In the definition of the SD number for RNS, the legitimate range of $b_{i}$ is $\left[-\left(m_{i}-1\right),\left(m_{i}-1\right)\right]$. We use the modulo operation $\langle\cdot\rangle_{m}$ to deal with $|\cdot|_{m}$ using the SD number representation so that we do not consider whether the converted result is $\geq 0$. Therefore, the conversion can achieve high speed. Moreover, the $b_{i}$ s are directly applicable to next-generation residue SD architecture.

## Algorithm A

Let $a_{1}, a_{2}$ and $a_{3}$ be $n$-digit mixed radix numbers for modulo $2^{n}+1,2^{n}-1$ and $2^{n}$, and let $b_{1}, b_{2}$ and $b_{3}$ be $n$-digit residue numbers. The $a_{i}$ and $b_{i}$ are represented as SD numbers for $i=1,2,3$.
(1) procedure for $b_{1}$

$$
b_{1}=a_{1} ;
$$

(2) procedure for $b_{2}$
(2A) $Z 1=\left\langle 2 a_{2}\right\rangle_{2^{n}-1}$;
(2B) $b_{2}=\left\langle Z 1+a_{1}\right\rangle_{2^{n}-1}$;


Fig. 2 Modulo $m$ signed-digit adder (MSDA).
(3) procedure for $b_{3}$
(3A) $R 1=\left\langle a_{2}+a_{1}\right\rangle_{2^{n}}$;
(3B) $R 2=\left\langle-a_{3}\right\rangle_{2^{n}}$;
(3C) $b_{3}=\langle R 2+R 1\rangle_{2^{n}}$;
In (2A), we perform a modular doubling. We first shift $a_{2}$ to the left and re-insert it to the least significant digit (LSD). In (3B) and (3C), we use one modulo $m$ signed-digit Addition (MSDA) by replacing the addend with $-a_{3}$. Therefore, we need three MSDAs ${ }^{8)}$ for (2B), (3A) and (3C). In the above algorithm, $b_{1}, b_{2}$ and $b_{3}$ can be performed in parallel.

The proposed converters are constructed by using MSDAs. A circuit diagram of an $n$-digit MSDA with $n$ SD full adders (SDFAs) is shown in Fig.2. One SDFA consists of one ADD1 and one ADD2. The ADD1 generates the intermediate sum and the intermediate carry, and the ADD2 sums the low intermediate carry and the intermediate sum. Let $c_{i}$ and $w_{i}$ be the carry and the intermediate sum of the $i$ th digit position, respectively. Their values are set as shown in Table 1 with respect to the values of $x_{i}, y_{i}, x_{i-1}$ and $y_{i-1}$. Thus, the modulo $m$ addition can be performed in parallel without the carry propagation ${ }^{8)}$. We use $\otimes$ to mean a 1-by-1 multiplier.
[Example 1] We now illustrate the conversion procedure for $n=3$ and $B=411=$ $a_{3}\left(m_{2} m_{1}\right)+a_{2}\left(m_{1}\right)+a_{1}$ where $m_{3}=2^{3}, m_{2}=$ $2^{3}-1$ and $m_{1}=2^{3}+1$. Thus, the inputs, the mixed radix digits, are $a_{3}=6=$ (110), $a_{2}=3=(011)$ and $a_{1}=6=(110)$. Encoding the binary representation to the SD number representation is not difficult if a sign bit is inserted for all binary bits. In the following procedure, the addition is replaced with an MSDA.

Hence $b_{1}=(110)$, and

$$
\begin{aligned}
b_{2} & =\left\langle\langle 2(011)\rangle_{2^{3}-1}+(110)\right\rangle_{2^{3}-1} \\
& =\langle(110)+(110)\rangle_{2^{3}-1} \\
& =(101) \\
b_{3} & =\left\langle(\overline{1} \overline{1} 0)+\langle(011)+(110)\rangle_{2^{3}}\right. \\
& =\langle(\overline{1} \overline{1} 0)+(001)\rangle_{2^{3}} \\
& =(1 \overline{1} 1)
\end{aligned}
$$

where $\overline{1}=-1$. Thus, the output is ((1111), (101), (110)).

### 3.2 Arithmetic of Residue Number System

We now consider the arithmetic of RNS. A number in RNS is represented by the residue of each modulus, and arithmetic operations are accomplished based on each modulus. Suppose that two numbers, $B$ and $D$, are represented as $B=\left(b_{3}, b_{2}, b_{1}\right)$ and $D=\left(d_{3}, d_{2}, d_{1}\right)$ in RNS for moduli $m_{3}=2^{n}, m_{2}=2^{n}-1$ and $m_{1}=2^{n}+1$. The arithmetic of $B$ and $D$ in RNS satisfies Eqs. (7) and (8).

Because the moduli are independent of each other, there is no carry propagation among them. All residue digits in the binary system are positive by the definition of the mod operation. If $x_{i}=b_{i}-d_{i}<0$, then $x_{i}=m_{i}+\left(b_{i}-d_{i}\right)$. However, in the SD number system, by Definition 1 , we do not consider whether $x_{i}$ is $\geq 0$ or $<0$, and this enables high speed. Moreover, no carry propagation arises during the addition inside the residue number digits.
[Example 2] We provide an example of the arithmetic of residue numbers with SD number representation. Let $B=((1011),(1010)$, $(10 \overline{1} 1))$ and $D=((0110),(01 \overline{1} 0),(0101))$ for the moduli $2^{4}, 2^{4}-1$ and $2^{4}+1$. Consider $B+D$ in RNS. We use three MSDAs to calculate it, and this procedure is shown in Fig. 3. Note that $c_{-1}$ is evaluated using the end-around-carry. The result is $((0001),(\overline{1} 101),(\overline{1} 1 \overline{1} \overline{1}))$.

These characteristics allow RNS computations with SD number representations to be complemented more quickly.


Fig. 3 MSDA in RNS.

### 3.3 Residue-to-Weighted Number Conversion

Next, we consider the conversion of the residue number to the weighted format with respect to the moduli set $\left(2^{n}, 2^{n}-1,2^{n}+1\right)$. To recover the residue number representation of $X$, the basic formula of MRC is applied.

Let $X$ correspond to the residue $\left(x_{3}, x_{2}, x_{1}\right)$ for moduli $\left(m_{3}, m_{2}, m_{1}\right)=\left(2^{n}, 2^{n}-1,2^{n}+1\right)$. The converter computes the number $X$ from the 3 -tuple $\left(x_{3}, x_{2}, x_{1}\right)$, and the mixed radix notation is as follows:

$$
\begin{align*}
a_{1}= & x_{1}  \tag{13}\\
a_{2}= & \left|\left|\frac{1}{m_{1}}\right|_{m_{2}} \times\left|x_{2}-a_{1}\right|_{m_{2}}\right|_{m_{2}}  \tag{14}\\
a_{3}= & \left|\left|\frac{1}{m_{2}}\right|_{m_{3}} \times\right|\left(\left|\frac{1}{m_{1}}\right|_{m_{3}} \times\left|x_{3}-a_{1}\right|_{m_{3}}\right. \\
& \left.-a_{2}\right)\left.\left.\right|_{m_{3}}\right|_{m_{3}} \tag{15}
\end{align*}
$$

Modulo $m$ multiplications and modulo $m$ subtractions are needed to compute $a_{2}$ and $a_{3}$. The above number conversion is usually implemented using binary number arithmetic ${ }^{16)}$. As pointed out earlier, in the SD number system, we do not need any gates for the NOT logical operation, so the modulo $m$ subtraction is easily implemented by replacing $Y$ with $-Y$ in the modulo $m$ addition. In the following, we introduce compact forms of the multiplicative inverse for the moduli set $\left(2^{n}-1,2^{n}, 2^{n}+1\right)$. The modulo $m$ multiplication can be implemented by shifting/inverting the multiplicand. The proof is shown in Property 2.
[Property 2] Let $m_{3}=2^{n}, m_{2}=2^{n}-1$ and $m_{1}=2^{n}+1$. The multiplicative inverses are as follows:

$$
\begin{align*}
& \left\langle\frac{1}{2^{n}-1}\right\rangle_{2^{n}}=-1  \tag{16}\\
& \left\langle\frac{1}{2^{n}+1}\right\rangle_{2^{n}}=1  \tag{17}\\
& \left\langle\frac{1}{2^{n}+1}\right\rangle_{2^{n}-1}=2^{n-1} . \tag{18}
\end{align*}
$$

The proofs of (16), (17), (18) are based on the fact that $|x| \frac{1}{x} \|_{m_{j}}=1$.
Proof of (16):

$$
\begin{aligned}
\left|\left(2^{n}-1\right) \cdot(-1)\right|_{2^{n}} & =\left|-2^{n}+1\right|_{2^{n}} \\
& =1
\end{aligned}
$$

Proof of (17):

$$
\left|\left(2^{n}+1\right) \cdot(1)\right|_{2^{n}}=\left|2^{n}+1\right|_{2^{n}}
$$

Proof of (18):

$$
\begin{aligned}
\mid\left(2^{n}+1\right) & \left.\cdot\left(2^{n-1}\right)\right|_{2^{n}-1} \\
& =\left|\left(2^{n}-1\right)\left(2^{n-1}\right)+2^{n}\right|_{2^{n}-1} \\
& =\left|2^{n}\right|_{2^{n}-1} \\
& =\left|\left(2^{n}-1\right)+1\right|_{2^{n}-1} \\
& =1
\end{aligned}
$$

The most efficient order of the moduli is $m_{3}=2^{n}, m_{2}=2^{n}-1$ and $m_{1}=2^{n}+1$. The reason is as follows. Because the longest delay path is for calculating $a_{3}$ in the order $m_{3}=2^{n}, m_{2}=2^{n}-1, m_{1}=2^{n}+1$, the multiplication needed for $a_{3}$ is $\left|\frac{1}{m_{1}}\right|_{m_{3}}=1$, whose multiplication is not required, and $\left|\frac{1}{m_{3}}\right|_{m_{3}}=-1$, whose multiplicative inverses are all sign digits. In SD number representation, the inversion procedure is simple.

In the following procedure, we use operation $\langle\cdot\rangle_{m}$ instead of $|\cdot|_{m}$. Substituting Eqs. (16), (17) and (18) into Eqs. (14) and (15) yields

$$
\begin{align*}
a_{2} & =\left\langle 2^{n-1} \times\left\langle x_{2}-a_{1}\right\rangle_{m_{2}}\right\rangle_{m_{2}}  \tag{19}\\
a_{3} & \left.=\left\langle(-1) \times\left\langle\left\langle x_{3}-a_{1}\right\rangle_{m_{3}}-a_{2}\right)\right\rangle_{m_{3}}\right\rangle_{m_{3}} \\
& =\left\langle-\left\langle\left(x_{3}-a_{1}\right)\right\rangle_{m_{3}}+\left(a_{2}\right)\right\rangle_{m_{3}} .(20 \tag{20}
\end{align*}
$$

The coefficients imply simple shift-left operations. Note that $-\left(m_{1}-1\right) \leq a_{1} \leq m_{1}-1$, $-\left(m_{2}-1\right) \leq a_{2} \leq m_{2}-1$, and $-\left(m_{3}-1\right) \leq$ $a_{3} \leq m_{3}-1$.
[Example 3] Suppose that $X=43$ and that its residue representation is $((011),(01 \overline{1}),(111))$ for moduli $2^{3}, 2^{3}-1$, and $2^{3}+1$. The proposed residue-to-weighted number conversion, shown in Fig. 4, derives the mixed radix digit $a_{1}=$ $(111)=7, a_{2}=(0 \overline{1} \overline{1})=-3$ and $a_{3}=(01 \overline{1})=$ 1. According to $(9), 1(63)-3(9)+7=43=X$.

## Algorithm B

Let $x_{1}, x_{2}$ and $x_{3}$ be the residue numbers for modulo $2^{n}+1,2^{n}-1$ and $2^{n}$, and let $a_{1}, a_{2}$ and $a_{3}$ be mixed radix digits.
(1) procedure for $a_{1}$ $a_{1}=x_{1} ;$
(2) procedure for $a_{2}$ (2A) $Z 1=\left\langle x_{2}-a_{1}\right\rangle_{m_{2}}$; (2B) $a_{2}=\left\langle 2^{n-1} Z 1\right\rangle_{m_{2}}$;
(3) procedure for $a_{3}$
(3A) $R 1=\left\langle x_{3}-a_{1}\right\rangle_{m_{3}}$;
(3B) $R 2=\langle-R 1\rangle_{m_{3}}$;


Fig. 4 Example of residue-to-weighted number conversion.
(3C) $R 3=\left\langle R 2+a_{2}\right\rangle_{m_{3}}$;
In (2A), (3A) and (3C), we use MSDA by changing the addends to $-a_{1}$ and $-R 1$. In (2B), the most significant $(n-1)$ positions will shift to the left and are re-inserted into the LSD.

### 3.4 Conversion to Binary Numbers

Because one-to-one correspondence between input and output is needed, we convert the $T=a_{3}\left(2^{2 n}-1\right)+a_{2}\left(2^{n}+1\right)+a_{1}$ into $l_{m}$. First, we convert $a_{1}, a_{2}$ and $a_{3}$ to the 2 's complement $(n+1)$-bit binary numbers $a_{1}^{+}, a_{1}^{-}, a_{2}^{+}, a_{2}^{-}, a_{3}^{+}$ and $a_{3}^{-}$, where the $a_{i}^{+}$are the positive digits and the $a_{i}^{-}$are the negative digits for $a_{i}$ $(i=1,2,3)$. Then, we use three $n$-bit prefix adders to calculate $S A=a_{1}^{+}+a_{1}^{-}, S B=$ $a_{2}^{+}+a_{2}^{-}$and $S C=a_{3}^{+}+a_{3}^{-}$in parallel, where $S A, S B$ and $S C$ are 2's $(n+1)$-bit complement numbers. The prefix adder is based on the Kogge-Stone tree structure ${ }^{18)}$ which uses the associative operator ${ }^{\prime} o$ ' defined in Ref. ${ }^{19)}$ to implement the carry computation. Note that $S=S C\left(2^{2 n}-1\right)+S B\left(2^{n}+1\right)+S A$, and that $S$ is an ordinary binary number. If $S<0$, then we add $M$ to $S$ and $S$ will be returned to $l_{m}$, where $M=2^{n}\left(2^{n}-1\right)\left(2^{n}+1\right)$. Therefore, $S+M=\left(2^{n}+S C\right)\left(2^{2 n}-1\right)+S B\left(2^{n}+1\right)+S A$. We can consider $2^{n}+S C$ to mean "add ' 1 ' to the $(n+1)$ th position of $\mathrm{SC}^{\prime \prime}$. Thus, the output is within the legitimate range $[0, M-1]$ and it is also a binary number representation. Notice that SC is an $(n+1)$-bit ordinary binary representation, and SB and SA are 2's $(n+1)$-bit complement ordinary binary representations.

## 4. Hardware Design and Performance Evaluation

In binary logic, a single memory space is all that is needed for a digit (bit), while in ternary logic the sign of the digit requires an extra space. In hardware implementation, an SD digit $x$ is encoded as a 2-bit binary code defined as $x=\left[x^{s}, x^{a}\right]$, where $x^{s}$ is the sign and $x^{a}$ is the absolute value. This demands more hardware resources, but it also reallocates space, which affects the overall speed. We use a hardware description language, VHDL, to design the residue arithmetic circuits for the implementation of the proposed converters. Then, we performed a simulation under the conditions of $1-\mu \mathrm{m}$ CMOS gate array technology. To compare the performance of the proposed circuits and conventional ones, we designed a binary architecture using the same technology by using a synthesis tool called Design Complier from Synopsys. The implementations are relative between binary number arithmetic and SD number arithmetic. Therefore, the performance comparison results may have the same relative rates under advanced design technology such as $0.3-\mu \mathrm{m}$ CMOS gate array technology.

### 4.1 Proposed Architecture

The proposed weighted-to-residue number converter consists of three MSDAs, which are used for (2B), (3A) and (3C) in Algorithm A, and one shift block. The WTOR converter is shown in Fig. 5. The MSDA consists of $n$ identical sub-blocks SDFAs, as shown in Fig. 2 and the shift block is for shifting the most significant digit to the left and re-inserting it into the LSD corresponding to (2A) in Algorithm A. In Fig. 5, $\ominus$, which corresponds to (3B) in Algorithm A, means to replace $y_{i}$ with $-y_{i}$. Note that $b_{1}, b_{2}$ and $b_{3}$ can be implemented in parallel by the proposed architecture.

The proposed residue-to-weighted number converter, which consists of three MSDAs and one shift block, is shown in Fig. 6. The three MSDAs are for steps (2A), (3A) and (3C) in Algorithm B. The shift block which corresponds to (2B) in Algorithm B, is for shifting the most significant ( $n-1$ ) positions to the left and reinserting them into the LSD.

The proposed WTOR and RTOW number converters are very simple, and only three MSDAs are required for both.


Fig. 5 Weighted-to-residue number converter.


Fig. 6 Residue-to-weighted number converter.

### 4.2 Performance Evaluation and Comparison

Our aim is to enable high-speed conversion, so drawbacks in terms of area are not problems. Because a residue number computing system includes a number of residue number arithmetic circuits such as adders and multipliers (see Fig. 1), we list the evaluations of residue adders and multipliers not only in their binary number representations but also in SD number representations. The performance of the MSDA with the gate-level design is shown in Table 2. The data in the first column is the dynamic range for different choices of $n$. The simulation results show that the delay time of MSDA is independent of $n$. Kalampoukas, et al. ${ }^{20)}$ gives a good way to deal with modulo $\left(2^{n}-1\right)$ adders using parallel-prefix structure ${ }^{19)}$. Their model
assumes that each gate, excluding exclusiveOR, counts as one elementary gate or both area and delay. Their delay time ${ }^{20)}$ is $2 \log _{2} n+3$ and their chip area is $3 n \log _{2} n+4 n$. However, this technique cannot be applied to the modulo $\left(2^{n}+1\right)$ adder because the end-around-carry is negative. On the other hand, MSDA can deal with the modulo $\left(2^{n}+1\right)$ adder in a manner similar to the modulo $\left(2^{n}-1\right)$ adder, because the SD number representation is redundant. When the modulus $2^{n}-1$ is large, MSDA has more advantages than Kalampoukas, et al. ${ }^{20)}$. The relationship is shown in Fig. 7.

Next, we compare the modular SD multiplier with the binary multiplier. Efstathiou, et al. ${ }^{17)}$ gives an efficient binary modulo $2^{n}-1$ multiplier. A Booth-code method was used to reduce the numbers of modulo $2^{n}-1$ partial products and thus the chip area is smaller. In the modulo $2^{n}+1$ multiplier, we calculate $(n \times n)$-bit multiplication, and the result is $\bmod \left(2^{n}+1\right)^{5)}$. The modular SD multiplier is designed as a binary tree structure ${ }^{8)}$ to increase speed. The multiplier in RNS is shown in Table 3. The SD multiplier is faster than the binary multiplier.

Tables 2 and 3 show that SD number architectures are faster than binary architectures in RNS arithmetic. Therefore, the high-speed con-

Table 2 Performance of modulo $2^{n}-1$ adder.

| $n$ | area (gates) |  | delay (ns) |  |
| :--- | :---: | :---: | :---: | :---: |
|  | $\mathrm{PA}^{20}$ ) | SD number | $\left.\mathrm{PA}^{20}\right)$ | SD number |
| 4 | 60 | 120 | 4.95 | 5.46 |
| 8 | 136 | 240 | 5.7 | 5.46 |
| 16 | 378 | 480 | 8.82 | 5.46 |


verters architectures for WTOR and RTOW with SD number representation, as shown in Fig. 1, are important. The tables show that a residue number arithmetic system with the SD number arithmetic can have higher performance than that with the ordinary binary arithmetic.

Table 4 shows a comparison of the performance of WTOR number converters with the architecture shown in Fig. 5. Because the longest delay path is dependent on the delay time of two MSDAs when $n$ is large, the proposed WTOR number converter is faster than binary number converters.

The methods used for the RNS to binary conversion are based on CRT or MRC. The Andraos-Ahmad algorithm, which introduces compact forms of multiplicative inverses to simplify CRT, is a well-known technique. The algorithm uses four adders, two of which operate in parallel, to convert the moduli $\left(2^{n}+1,2^{n}, 2^{n}-\right.$ 1) residue number into their binary equivalent. Recently, Piestrak ${ }^{11)}$ suggested a simplification of the Andraos-Ahmad technique: the value of the $-r_{1}$ modulo $2^{2 n-1}$ of Andraos, et al. ${ }^{10)}$ can be easily obtained by manipulating $r_{1}$. Piestrak ${ }^{11)}$ proposed two methods. The first method, referred to as the cost-effective (CE) version, uses two $2 n$-bit CSAs and one $2 n$ CPA with an end-around-carry to calculate the $A+B+C-r_{1}$ of Andraos, et al. ${ }^{10)}$. The other method, which is referred to as the high-speed (HS) version, uses two $2 n$-bit CSAs and two parallel $2 n$-bit CPAs followed by a multiplexer. Our technique is comparable to Piestrak's technique.

Now, we evaluate the proposed RTOW converter design.

Table 4 Performance of weighted-to-residue number converter.

| $n$ | area (gates) |  | delay (ns) |  |
| :--- | :---: | :---: | :---: | :---: |
|  | binary | SD number | binary | SD number |
| 4 | 116 | 368 | 12.17 | 10.41 |
| 8 | 265 | 736 | 21.31 | 10.41 |
| 16 | 627 | 1,472 | 40.67 | 10.41 |

Fig. 7 Comparison of adders.
Table 3 Performance of parallel modulo $2^{n} \pm 1$ multipliers.

| area (gates) | delay (ns) |  |  |  |  |  |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
|  | binary <br> $\left(2^{n}+1\right)$ | binary ${ }^{17)}$ <br> $\left(2^{n}-1\right)$ | SD number <br> $\left(2^{n} \pm 1\right)$ | binary $)^{5)}$ <br> $\left(2^{n}+1\right)$ | binary 17) <br> $\left(2^{n}-1\right)$ | SD number <br> $\left(2^{n} \pm 1\right)$ |
|  | 177 | 136 | 474 | 19.72 | 19.72 | 16.94 |
| 8 | 748 | 456 | 2,106 | 42.03 | 37.03 | 22.92 |
| 16 | 3,226 | 2,103 | 8,826 | 91.85 | 73.75 | 29.94 |

Table 5 Performance of the residue-to-weighted number converter, with a comparison of cost-effective and high-speed methods.

| $n$ | area (gates) |  |  |  | delay (ns) |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | CRT |  | MRC |  | CRT |  | MRC |  |
|  | CE ${ }^{11)}$ | HS ${ }^{11)}$ | binary ${ }^{16)}$ | SD number | CE ${ }^{11)}$ | HS ${ }^{11)}$ | binary ${ }^{16)}$ | SD number |
| 4 | 240 | 345 | 147 | 376 | 20.06 | 15.25 | 18.73 | 13.75 |
| 8 | 480 | 689 | 329 | 752 | 32.38 | 22.56 | 38.81 | 13.75 |
| 16 | 960 | 1,377 | 727 | 1,504 | 57.02 | 37.19 | 74.62 | 13.75 |

Table 6 Performance of prefix adder.

| n | area (gates) | delay $(\mathrm{ns})$ |
| :--- | :---: | :---: |
| 4 | 48 | 5.79 |
| 8 | 119 | 7.75 |
| 16 | 284 | 9.83 |

$$
\begin{align*}
& \text { Area }_{R T O W}=3 A_{M S D A}+A_{\text {shift }}  \tag{21}\\
& \text { Delay }_{R T O W}=2 \triangle_{M S D A} \tag{22}
\end{align*}
$$

where $A_{M S D A}$ and $A_{\text {shift }}$ are the areas of the $M S D A$ and the shifter in Fig. 6, and the delay of the MSDA is $\triangle_{M S D A}$. The execution time of the MSDA is $\mathrm{O}(1)$, which is independent of $n$. We find the best order is $m_{3}=2^{n}-1, m_{2}=2^{n}$ and $2^{n}+1^{16)}$. Their longest delay path is in calculating $a_{3}$, and the inversion can be easily replaced using 1's complement representation. A comparison between the proposed converter and CRT ${ }^{11)}$ is provided in Table 5. The results show that our converter is very fast and that the delay time is independent of $n$.

To convert SD number representation to binary number representation, we use prefix adders, as mentioned in Section 3.4. The performance of one prefix adder is illustrated in Table 6, which shows that the conversion is fast. The binary number representation is a 2 's complement representation, but it is $\geq 0$. The 2's complement representation is suitable for our converter.

## 5. Conclusion

We presented simple weighted-to-residue and residue-to-weighted number converters for moduli $2^{n}, 2^{n}-1$ and $2^{n}+1$. Our method requires only addition for WTOR and RTOW number conversion. The proposed converters have many advantages, which have been demonstrated throughout this paper using examples and analysis. Our simulation shows that the performance of our converters is comparable to that of binary architectures and that the proposed schemes are high-speed architectures. The RNS arithmetic operation based on the SD number system has been shown to be
more efficient than the binary number system. In addition, the experimental results show that the proposed WTOR and RTOW number converters, which can be easily implemented on hardware and make the calculation less complicated and more efficient, are high-speed architectures. Thus, our method is more efficient and less complicated than the binary number system in the arithmetic operation. We also provided a method for converting an SD number to a binary number representation.

## References

1) Szabo, N.S. and Tanaka, R.I.: Residue Arithmetic and Its Applications to Computer Technology, New York, McGraw-Hill (1967).
2) Paliouras V. and Stouraitis T.: Novel highradix residue number system architectures, IEEE Trans. circuits and systems II., Vol.47, No.10, pp.1059-1073 (Oct. 2000).
3) Sonderstrand, M.A., Jendins, W.K., Junllien, G.A. and Taylor, F.J.: Residue Number System Arithmetic: Modern Applications in Digital Signal Processing, IEEE Press, New York (1986).
4) Skavantzos, A. and Rao, P.B.: New multipliers modulo $2^{n}-1$, IEEE Trans. Comput., Vol.41, No.8, pp.957-961 (Aug. 1992).
5) Hiasat, A.: New memoryless mod $\left(2^{n} \pm 1\right)$ residue multiplier, Electronics Letters, Vol.28, No.3, pp.314-315 (Jan. 1992).
6) Avizienis, A.: Signed-digit number representations for fast parallel arithmetic, IRE Trans. Elect. Comput., EC-10, pp.389-400 (Sep. 1961).
7) Parhami, B.: Carry-free addition of recoding binary signed-digit numbers, IEEE Trans.comput., Vol.37, No.11, pp.1470-1476 (Nov. 1988).
8) Wei, S. and Shimizu, K.: A novel residue arithmetic hardware algorithm using a signed-digit number representation, IEICE Trans. Inf. \& Syst., Vol.E83-D, No.12, pp.2056-2064 (Dec. 2000).
9) Wei, S. and Shimizu, K.: Compact residue arithmetic multiplier based on the radix4 signed-digit multiple-valued arithmetic circuits, IEICE Trans.Electron., Vol.E82-C, No.9, pp.1647-1645 (Sep. 1999).
10) Andraos, S. and Ahmad, H.: A new efficient memoryless residue to binary converter, IEEE Trans. Circuits Syst., Vol.CAS-35, pp.14411444 (Nov. 1988).
11) Piestrak, S.J.: A high-speed realization of a residue to binary number system converter, IEEE Trans. Circuits Syst.II, Vol.42, No.10, pp.661-663 (Oct. 1995).
12) Conway, R. and Neison, J.: Fast converter for 3 moduli RNS using new property of CRT, IEEE Trans. Comput., Vol.48, No.8, pp.852860 (Aug. 1999).
13) Baraniecka, A. and Jullien, G.: On decoding techniques for residue number system realizations of digital signal processing hardware, IEEE Trans. Circuits Syst., Vol.CAS-25, pp.935-936 (Nov. 1978).
14) Huang, C.: A fully parallel mixed-radix conversion algorithm for residue number applications, IEEE Trans. Comput., Vol.C-32, pp.398402 (Apr. 1983).
15) Chakraborti, N., Soundararajan, J. and Reddy, A.: An implementation of mixed-radix conversion for residue number applications, IEEE Trans. Comput., Vol.C-35, pp.762-764 (Aug. 1986).
16) Ananda Mohan, P.V.: Evaluation of fast conversion techniques for binary-residue number system, IEEE Trans. Circuits Syst.-I, Vol.45, No.10, pp.1107-1109 (Oct. 1998).
17) Efstathiou, C., Vergos, H.T. and Nikolos, D.: Modified booth modulo $2^{n}-1$ multipliers, IEEE Trans. Comput., Vol.53, No.3, pp.370374 (Mar. 2004).
18) Kogge, P.M. and Stone, H.S.: A parallel algorithm for the efficient solution fo a general class of recurrence equations, IEEE Trans. Comput., Vol.22, No.8, pp.783-791 (Aug. 1973).
19) Brent, R.P. and Kung, H.T.: A regular layout for parallel adders, IEEE Trans. Comput., Vol.31, No.3, pp.260-264 (Mar. 1982).
20) Kalampoukas, L., Nikolos, D., Efstathiou, C., Vergos, H.T. and Kalamatianos, J.: High-speed parallel-prefix modulo $2^{n}-1$ adders, IEEE Trans. Comput. Vol.49, No.7, pp.673-680 (July 2000).
(Received September 15, 2005)
(Accepted March 2, 2006)
(Online version of this article can be found in the IPSJ Digital Courier, Vol.2, pp.328-337.)


Shuangching Chen was born in Kaohsiung, Taiwan on July 18, 1972. He received the B.E. degree in Applied Mathematic from Feng Chia University, Taiwan, Republic of China in 1994, and the M.E. degree in Computer Science from Gunma University, Kiryu, Japan in 2002. He is currently a doctoral student at the Department of Computer Science, Gunma University. His research interests include parallel computer architecture, residue architecture, VLSI design and digital signal processing. He is a student member of IPSJ and IEICE.


Shugang Wei was born in Harbin, China on September 19, 1957. He received the B.E. degree in Radio Engineering from the Harbin Institute of Technology, Harbin, China, the M.E. degree in Computer Science from Gunma University, Kiryu, Japan and the Dr. Eng. degree in Electronic Engineering from Tohoku University, Sendai, Japan, in 1982, 1987, and 1990, respectively. He was an Assistant Professor with the Department of Radio Engineering, Harbin Institute of Technology from 1982 to 1984. In 1990 he joined Matsushita Communication Industrial Co., Ltd., Yokohama, Japan. At present he is an Associate Professor in the Department of Computer Science, Faculty of Engineering, Gunma University. His research interests include logic design, high-speed arithmetic circuits, VLSI systems and digital audio signal processing. Dr. Wei is a member of the Acoustical Society of Japan, IEICE and IEEE.


[^0]:    $\dagger$ Department of Computer Science, Gunma University

