# A FRAMEWORK FOR THE DESIGN OF ERROR-AWARE POWEREFFICIENT FIXED-WIDTH BOOTH MULTIPLIERS 

Min-An Song, Lan-Da Van*, Chih-Chyau Yang*, Shih-Chieh Chiu, and Sy-Yen Kuo<br>Department of Electrical Engineering, National Taiwan University, Taiwan<br>*Chip Implementation Center (CIC), National Applied Research Laboratories, Taiwan<br>E-mail: sykuo@cc.ee.ntu.edu.tw and ldvan@cic.org.tw


#### Abstract

In this paper, a framework of designing a low-error and power-efficient two's-complement fixed-width Booth multiplier that receives two $n$-bit numbers and produces an $n$-bit product is proposed. The design methodology of the framework involving four steps results in one better errorcompensation bias. The better error-compensation bias can be mapped to a simple low-error fixed-width Booth multiplier with a little penalty of power consumption. For the benchmark of $8 \times 8$ multipliers, the simulation results show that a reduction of $82.04 \%$ average error compared to that using the direct-truncated fixed-width Booth multiplier can be obtained. Moreover, the power consumption can be saved $40.68 \%$ compared to that of full-precision Booth multiplier design.


## 1. Introduction

During the past decade, digital signal processing (DSP) kernels stand the test for widely multimediacommunication applications. Among many DSP kernels such as digital filters [1] and wavelet transform, it is desirable to maintain constant output word length and low power operations. The Baugh-Wooley based fixed-width multipliers have been widely studied in [1-5]. Recently, the Booth based fixed-width multipliers have been mainly attracted and actively researched [8-11]. The modified Booth algorithm proposed by the MacSorley [6] in which a triplet of bits is scanned at a time. Area saving of a fixedwidth Booth multiplier can be achieved by directly truncating $n$ least significant columns and preserving $n$ most significant columns. In this paper, we are motivated to propose a systematic framework for low-error powerefficient Booth multipliers. The methodology of the framework includes the following steps in order: 1) Propose an error-compensation bias with a new binary thresholding for a fixed value of $w ; 2$ ) simulate the value of $K$ and error performance of the proposed errorcompensation bias using our generalized index, and then select the best index having lower error and satisfying the same value of $K$ for limited width $n$; 3) verify the realizable error compensation bias by statistical techniques. 4) construct a low-error fixed-width Booth multiplier structure. According to our proposed framework, while $w=2$, the proposed fixed-width Booth multiplier possesses lower error than those in $[8,10,11]$ at the expense of slightly increased power consumption with respect to each value of $w$. Thus, the proposed architecture is of erroraware and power-efficient. The organization of this paper
is as follows. The modified Booth algorithm is concisely reviewed in Section 2. In Section 3, we propose a framework to design a better error-compensation bias and present the simulation results for limited width $n$. The improved error-compensation bias can be mapped to a new structure with respect to each value of $w$. The performance as well as implementation results of the proposed structure are described in Section 4. Finally, brief remarks in Section 5 conclude the presentation.

## 2. Modified Booth Multiplier

Considering the multiplication of two 2 's-complement integers with $n$-bit multiplicand $A$ and $n$-bit multiplier $B$ as

$$
\begin{equation*}
P=A B=\sum_{i=0}^{2 n-1} P_{i} 2^{i} \tag{1}
\end{equation*}
$$

Assume $n$ is even and the $n$-bit multiplier $B$ can be rewritten as

$$
\begin{gather*}
B=\sum_{\mathrm{i}=0}^{(n-2) / 2}\left(b_{2 i-1}+b_{2 i}-2 b_{2 i+1}\right) 2^{2 i},  \tag{2}\\
P=A B=\sum_{\mathrm{i}=0}^{(n-2) / 2}\left(b_{2 i-1}+b_{2 i}-2 b_{2 i+1}\right) A 2^{2 i}=\sum_{i=0}^{(n-2) / 2} S_{i}, \tag{3}
\end{gather*}
$$

where $S_{i}=\left(b_{2 i-1}+b_{2 i}-2 b_{2 i+1}\right) A 2^{2 i}$, and it is known that the scanning of triplets begins from $b_{-1}$ to the MSB with one-bit overlapping.
We define the following notation

$$
\begin{equation*}
S_{i}=S_{i, n-1} 2^{2 i+n-1}+S_{i, n-2} 2^{2 i+n-2}+\ldots+S_{i, 0} 2^{2 i} \tag{4}
\end{equation*}
$$

where $S_{i, j}$ represents the bit product of the $i$-th row. According to the sign-generate sign extension scheme [7], for an 8 by 8 multiplier, the sign of the final result can be expressed as

$$
\begin{align*}
S & =\left(S_{0,7} \sum_{\mathrm{j}=8}^{15} 2^{j}\right) 2^{0}+\left(S_{1,7} \sum_{\mathrm{j}=8}^{13} 2^{j}\right) 2^{2}+\left(S_{2,7} \sum_{\mathrm{j}=8}^{11} 2^{j}\right) 2^{4}+\left(S_{3,7} \sum_{\mathrm{j}=8}^{9} 2^{j}\right) 2^{6} \\
& =\left(2^{9}+\overline{S_{0,7}} 2^{8}\right)+\left(2^{11}+\overline{S_{1,7}} 2^{10}\right)+\left(2^{13}+\overline{S_{2,7}} 2^{12}\right) \\
& +\left(2^{15}+\overline{S_{3,7}} 2^{14}\right)+2^{8} \tag{5}
\end{align*}
$$

where $S$ is the final sign result. From Eq. (5), the partial products of the Booth algorithm only need to add two
elements $\left(1, \overline{S_{i, 7}}\right)$ for each row and add an extra ' 1 ' in the $2^{8}$-weighting column. Thus, the sign-generate sign extension scheme can reduce many redundant full adders compared to the conventional sign extension method. It is worth mentioning that we can obtain various fixed-width Booth multipliers based on keeping $n+w$ most significant columns of the subproduct array, where $w$ is a nonnegative integer between 0 and $n-1$.

## 3. Framework of Fixed-Width Booth Multiplier Design

### 3.1 Framework Formulation

The $2 n$-bit product for $n$ by $n$ 's-complement multiplication can be divided into two sections as

$$
\begin{equation*}
P=A B=M P+L P \tag{6}
\end{equation*}
$$

The most accurate truncation product is given by

$$
\begin{gather*}
P \cong M P+\sigma_{\text {Temp }} \times 2^{n},  \tag{7}\\
\sigma_{\text {Temp }}=\left[L P / 2^{n}\right]_{r} . \tag{8}
\end{gather*}
$$

Without loss of generality, for $n=8$, Eq. (8) can be written as
$\sigma_{\text {Temp }}=\left[\begin{array}{l}\frac{1}{2}\left(S_{3,1}+S_{2,3}+S_{1,5}+S_{0,7}\right)+\frac{1}{2^{2}}\left(S_{3,0}+\ldots+S_{0,6}\right. \\ \left.+\operatorname{Ctrl}_{3}[2]\right)+\ldots+\frac{1}{2^{7}} S_{0,1}+\frac{1}{2^{8}}\left(S_{0,0}+\operatorname{Ctrl}_{0}[2]\right)\end{array}\right]_{r}$.

Then we define the main-part and remain-part error terms as

$$
\begin{align*}
& E_{\text {main }}=S_{3,1}+S_{2,3}+S_{1,5}+S_{0,7}  \tag{10}\\
& E_{\text {remain }}=\frac{1}{2}\left(S_{3,0}+S_{2,2}+S_{1,4}+S_{0,6}+\text { trl }_{3}[2]\right) \\
& +\ldots+\frac{1}{2^{7}}\left(S_{0,0}+C \text { trl } l_{0}[2]\right) \tag{11}
\end{align*}
$$

Thus, we can rewrite Eq. (9) as

$$
\begin{equation*}
\sigma_{\text {Temp }}=\left[\frac{1}{2}\left(E_{\text {main }}+E_{\text {remain }}\right)\right]_{r} . \tag{12}
\end{equation*}
$$

It is convenient to perform exhaustive simulation if we define the useful index. Here the useful index for an 8 x 8 multiplier is defined as
$\left.\theta_{\text {index,w }}\left(q_{3}, q_{2}, q_{1}, q_{0}\right)=<S_{3,1-w}\right\rangle^{q_{3}}+\left\langle S_{2,3-w}\right\rangle^{q_{2}}$
$+\left\langle S_{1,5-w}>^{q_{1}}+\left\langle S_{0,7-w}>{ }^{q_{0}}\right.\right.$
where the binary parameters $q_{3}, q_{2}, q_{1}$ and $q_{0} \in\{0,1\}$. Note that if the value of the second index of subscription is less than zero, $\left\langle S_{i, j}\right\rangle^{q_{i}}$ is equal to zero. In Eq. (13), the operator

$$
<T>^{q_{i}}=\left\{\begin{array}{l}
T, \text { if } q_{i}=0  \tag{14}\\
\bar{T},, \text { if } q_{i}=1
\end{array},\right.
$$

in which $\bar{T}$ is the complement of binary $T$. Furthermore, $\theta_{\text {index,w }}\left(q_{3}, q_{2}, q_{1}, q_{0}\right)$ is referred to as $\theta_{Q, w}$, where

$$
\begin{equation*}
Q=q_{3} \times 2^{3}+q_{2} \times 2^{2}+q_{1} \times 2^{1}+q_{0} \times 2^{0} \tag{15}
\end{equation*}
$$

For instance, the value of $Q$ has a range from 0 to 15 for $w=0$ and 1 . In similar behavior, for $w=2$ and 3 , the value of $Q$ is located at the range from 0 to 7 . Since the reduction and rounding errors do not own the same most significant column, we adopt S-Ss' method [4] to concurrently treat reduction and rounding error. By applying Eqs. (13) to (15) into Eq. (12), we get

$$
\begin{align*}
& \sigma_{\text {Temp }}=\left[\begin{array}{l}
\frac{1}{2} E_{\text {main }}+\frac{1}{2} E_{\text {remain }}+\frac{1}{2^{w}} \theta_{Q, w}-E_{\text {reduct }, w} \\
+E_{\text {reduct }, w}-\frac{1}{2^{w}} \theta_{Q, w}
\end{array}\right]_{r} \\
& \cong\left\lfloor\begin{array}{l}
\left(\frac{1}{2} E_{\text {main }}+\frac{1}{2} E_{\text {remain }}+\frac{1}{2^{w}} \theta_{Q, w}-E_{\text {reduct }, w}\right) \\
+[K]_{r} / 2^{w}
\end{array}\right], \tag{16}
\end{align*}
$$

where

$$
\begin{align*}
& K=2^{w}\left(E_{\text {reduct }, w}-\frac{1}{2^{w}} \theta_{Q, w}+E_{\text {round }, w}\right),  \tag{17}\\
& E_{\text {reduct }, w}=\frac{1}{2^{w+1}}\left(S_{3,1-w}+S_{2,3-w}+S_{1,5-w}+S_{0,7-w}\right) \\
& +\ldots+\frac{1}{2^{n}} S_{0,0}+\left(\frac{1}{2^{2}} \operatorname{Ctrl}_{3,1-w}[2]+\frac{1}{2^{4}} \operatorname{Ctrl}_{2,3-w}[2]\right.  \tag{18}\\
& \left.+\frac{1}{2^{6}} \operatorname{Ctrl}_{1,5-w}[2]+\frac{1}{2^{8}} \operatorname{Ctrl}_{0,7-w}[2]\right), \\
& E_{\text {round }, w}=2^{-1}\left(1-2^{-w}\right) . \tag{19}
\end{align*}
$$

Herein, the second index of the control signal in Eq. (18) determines whether the control signal exists. In case the value of the second index is less than zero, the control signal can be neglected. In [10, 11], although we have provided two specific fixed-width Booth multiplier designs for $w=0$ and $w=1$, respectively, we have not yet proposed a systematic framework for $w \geq 1$. Next, a framework analysis is proposed in the following statements.

### 3.2 Framework Analysis

According to the preceding framework, we can complete the analysis through the four design steps. In the first step, so as to design a realizable error-compensation bias, one type of binary thresholding for the errorcompensation bias can be modeled as
Type 1:
$\sigma_{T y p e l, Q, w \geq 1}=\left\{\begin{array}{l}\left\lfloor\frac{1}{2}\left(E_{\text {main }}+E_{\text {remain }}\right)+\frac{1}{2^{w}} \theta_{Q, w}-E_{\text {reduct }, w}+\left[K_{1}\right]_{r} / 2^{w}\right], \text { if } \theta_{Q, w \geq 1}=0 \\ \left\lfloor\frac{1}{2}\left(E_{\text {main }}+E_{\text {remain }}\right)+\frac{1}{2^{w}} \theta_{Q, w}-E_{\text {reduct }, w}+\left[K_{2}\right]_{r} / 2^{w}\right\rfloor, \text { if } \theta_{Q, w \geq 1}>0\end{array}\right.$
where $K_{1}$ and $K_{2}$ are defined as those of [1] but for $w \geq 1$. The restriction of the value of $K$ can be modified as $\left[K_{i}\right]_{r} \in\left\{0,1,2^{w-1}-1,2^{w-1}\right\}$ for $i=1$, and 2 . Since the analysis for $w=1$ has been addressed in [11], herein, we focus on the investigation of $w=2$. In similar behavior, the analysis for other values of $w$ can be easily followed. Corresponding to the rounding values of $K_{1}$ and $K_{2}$, we simulate average error by exhaustive search simulation. Considering the goal of lower error and the restriction of $K$, we can observe that the specific index, $\theta_{Q=0, w=2}$, achieves superior error performance. Thus, the simple error-compensation bias of the proposed fixed-width Booth multiplier under $w=2$ can be described as:

where $\theta_{Q=0, w=2}=S_{2,1}+S_{1,3}+S_{0,5}$. So far, the second step is achieved. Here, due to the limited page space, we ignore the 3 rd step. Finally, Eq. (21) can be mapped to a new proposed low-error fixed-width $8 \times 8$ Booth multiplier with $\theta_{Q=0, w=2}$ as shown in Fig. 1 and the detailed implementation issues are exposed in the next section.


Fig. 1. Proposed low-error fixed-width $8 \times 8$ Booth multiplier with $\theta_{Q=0, w=2}$.

## 4. Performance Comparisons and Implementation Issues

In this section, we first simulate error performance in terms of maximum error, average error, and variance of error as listed in Table 1 for J-T-T's fixed-width Booth multiplier [8], C-L-P's fixed-width Booth multiplier [9],
and the proposed fixed-width Booth multiplier. It is clearly seen that the error performance of this work and two previous proposed fixed-width multipliers can be superior to that of J-T-T's fixed-width multiplier. Most importantly, the proposed Type 1 structure with $\theta_{Q=0, w=2}$ has the best error performance among the existing fixedwidth Booth multipliers.

About the implementation, the cell-based design flow with Artisan standard cell library is adopted and the proposed fixed-width multiplier has been implemented in UMC 0.18 um CMOS process. The Synopsys Design Compiler is used to synthesize the RTL design of the proposed multiplier and the Synopsys Apollo is adopted for placement and routing ( $\mathrm{P} \& \mathrm{R}$ ). The active chip area of the final layout as shown in Fig. 2 is 77.2 um by 72.6 um. In the worst case, the power consumption measured via the Synopsys Nanosim is 0.805 mW at an average operation rate of 100 MHz . The critical delay time obtained from the static timing analysis (STA) in Synopsys Apollo is 5.93 ns calculated in worst-case condition. Table 2 summarizes chip characteristics. Moreover, in order to clarify whether the structure owns the feature of power efficiency, we re-synthesis and resimulate other six multipliers for comparison as listed in Table 3. The power-consumption of Type 1 fixed-width multiplier with $\theta_{Q=0, w=2}$ is less than that of C-L-P's fixedwidth multiplier. That is, the proposed Type 1 fixed-width multiplier possesses error-aware and power-efficiency features.

Table 1: Comparison Results of Three Kinds of Errors among Different Booth Multipliers

| Multiplier | Width | Maximu m Error | Average Error | Variance of Error |
| :---: | :---: | :---: | :---: | :---: |
| DirectTruncated Multiplier | 4 | 32 | 10.88 | 66.43 |
|  | 6 | 192 | 70.50 | 874.76 |
|  | 8 | 1024 | 384.25 | 18479.23 |
| J-T-T's multiplier[8] | 4 | 15 | 4.69 | 26.87 |
|  | 6 | 85 | 23.24 | 435.88 |
|  | 8 | 443 | 107.10 | 7703.31 |
| C-L-P's multiplier [9] | 4 | 11 | 3.95 | 15.36 |
|  | 6 | 56 | 17.80 | 159.60 |
|  | 8 | 224 | 73.00 | 2018.40 |
| $\begin{gathered} \text { Type } 2 \\ \text { with } \\ \mathrm{Q}=0, w=0 \end{gathered}$ | 4 | 16 | 4.59 | 23.45 |
|  | 6 | 85 | 21.60 | 390.77 |
|  | 8 | 443 | 103.12 | 6489.46 |
| Type 1 with$\mathrm{Q}=0, w=1$ | 4 | 12 | 4.01 | 18.78 |
|  | 6 | 62 | 18.17 | 218.39 |
|  | 8 | 298 | 79.69 | 2534.44 |
| $\begin{gathered} \text { Type } 1 \\ \text { with } \\ \mathrm{Q}=0, w=2 \end{gathered}$ | 4 | 8 | 3.28 | 9.18 |
|  | 6 | 53 | 15.25 | 110.67 |
|  | 8 | 218 | 69.00 | 1799.43 |



Fig. 2. Type 1 fixed-width Booth multiplier layout with $\theta_{Q=0, w=2}$.

Table 2: Chip Characteristics

| Multiplier \& Multiplicand Word <br> Length (n) | 8 bits |
| :--- | :--- |
| Product Word Length (n) | 8 bits |
| Critical Delay Time | 5.93 ns |
| Power Supply | 1.8 V |
| Power Consumption @ 100 MHz | 0.805 mW |
| Active Chip Area | $77.2 \mathrm{um} \mathrm{x} \mathrm{72.6um}$ |
| Process Technology | UMC 0.18 um CMOS |

Table 3: Comparison Results of Power Consumption among Different Booth Multipliers for $n=8$

| Multiplier | Area (um x um) | Power $(\mathrm{mW})$ <br> @ 100 MHz |
| :---: | :---: | :---: |
| Full Precision <br> Multiplier | $91.1 \times 87.7$ | 1.357 |
| Direct- <br> Truncated <br> Multiplier | $60.7 \times 57.4$ | 0.393 |
| J-T-T's <br> multiplier [8] | $67.3 \times 62.5$ | 0.517 |
| C-L-P's <br> multiplier [9] | $79.9 \times 77.6$ | 0.909 |
| Type 2 with <br> Q=0, $w=0$ | $68.0 \times 67.5$ | 0.550 |
| Type 1 with <br> $\mathrm{Q}=0, w=1$ | $73.3 \times 72.6$ | 0.656 |
| Type 1 with <br> $\mathrm{Q}=0, w=2$ | $77.2 \times 72.6$ | 0.805 |

## 5. Conclusions

This paper develops a systematic methodology for designing one low-error and power-efficient fixed-width Booth multiplier. By properly choosing binary thresholding and the generalized index, we can derive one better error-compensation bias to improve the truncation error. Furthermore, the error-compensation bias can be easily constructed as a lower-error fixed-width Booth multiplier at the expense of slightly increased power consumption. It is very suitable for VLSI digital signal processing applications where the accuracy, power, speed, and area issues are crucial.

## References

[1] L. D. Van, S. S. Wang, and W. S. Feng, "Design of the lower-error fixed-width multiplier and its application", IEEE Trans. Circuits Syst. II, vol. 47, pp. 1112-1118, Oct. 2000.
[2] J. M. Jou, S. R. Kuang, and R. D. Chen, "Design of low-error fixed-width multiplier for DSP applications," IEEE Trans. Circuits Syst. II, vol. CAS46, no. 6, pp. 836-842, Jun. 1999.
[3] Y. P. Lim, "Single-precision multiplier with reduced circuit complexity for signal processing applications", IEEE Tran. Comput., vol. 41, no. 10, pp. 1333-1336, Oct. 1992.
[4] M. J. Schulte and E.E. Swartzlander, Jr. "Truncated multiplication with correction constant", VLSI Signal Processing, VI New York: IEEE Press, 1993, pp. 338396.
[5] E. E. Swartzlander, Jr., "Truncated multiplication with approximate rounding," in Proc. 33rd Asilomar Conference on Signals, Systems, and Computers, 1999, vol. 2, pp. 1480-1483.
[6] O. L. MacSorley, "High-speed arithmetic in binary computer", Proc. IRE, vol. 49, pp. 67-91, 1961.
[7] E. de Angel and E. E. Swartzlander, Jr., "Low power parallel multipliers," IEEE VLSI Signal Processing IX, 1996, pp. 199-208.
[8] S. J. Jou, M. H. Tsai, and Y. L. Tsao, "Low-error reduced-width multipliers for DSP applications," IEEE Trans. Circuits Syst. I, vol. 50, pp. 1470-1474, Nov. 2003.
[9] K. J. Cho, K. C Lee, and K. K. Parhi, "Design of Low-Error Fixed_Width Modified Booth Multiplier," IEEE Trans. VLSI Systems, vol. 12, no. 5, pp.522-530, May, 2004.
[10] M. A. Song, L. D. Van, T. C. Huang, and S. Y. Kuo, "A low-error and area-time efficient fixed-width Booth's multiplier," IEEE Int. Midwest Symp. Circuits Syst., Dec. 2003, accepted, Egypt.
[11] M. A. Song, L. D. Van, T. C. Huang and S. Y. Kuo, "A generalized methodology for low-error and areatime efficient fixed-width Booth multipliers," IEEE Int. Midwest Symp. Circuits Syst., Jul. 2004, vol. 1, pp. 9-12, Japan.

