# A Generalized Methodology for Lower-Error Area-Efficient Fixed-Width Multipliers Lan-Da Van and Sung-Huang Lee

Chip Implementation Center (CIC), National Science Council,

No. 1, Prosperity Rd. 1, Science-Based Industrial Park, Hsinchu, Taiwan, R.O.C.

E-mail: <u>ldvan@cic.edu.tw</u> and <u>shlee@cic.edu.tw</u>

### Abstract

In this paper, we extend our generalized methodology for designing lower-error area-efficient fixed-width two's-complement multipliers that receive two s-bit numbers and produce an s-bit product. The generalized methodology involving four steps results in several better error-compensation biases. These better error-compensation biases can be easily mapped to lower-error fixed-width multipliers suitable for VLSI realization.

## 1. Introduction

Low-error, small-area, and low-power fixed -width (also referred to as single precision) multipliers [2-8] are the most important processing element for digital computing engines such as digital filters, wavelet transform [7]. The fixed-width multipliers derived from the Baugh-Wooley [1] multiplier produce s -bit output product with s -bit multiplier and s -bit multiplicand. Area saving of a fixed-width multiplier can be achieved either by directly truncating s least significant product-bits and preserving s most significant product-bits or by other efficient methods. By the former method, significant truncation errors composed of reduction and rounding errors would be introduced since no error compensation is considered. Thus, the latter schemes investigate issues on low truncation error and small area. Lim [2] first utilized statistical techniques to estimate the error-compensation bias. However, in his analysis the reduction and rounding errors are separately treated such that this scheme does not lead to an accurate enough error-compensation bias. Schulte et al. [3] improved the error-compensation bias to be more accurate and practical since the reduction and rounding errors are concurrently treated. The above two schemes are based on keeping s + w columns of the subproduct array, where w is a nonnegative integer between 0 and s-1. While w equals zero, these two schemes are equivalent to the work of Kidambi et al [4]. Nevertheless, the three analyses and structures cannot provide an adaptive error-compensation bias. Later, King and Swartzlander [5, 6] analyzed an adaptive error-compensation bias (also referred to as variable correction) under keeping s + w columns and proposed an s -bit fixed-width multiplier. Corresponding to J-K-Cs' index, Jou et al. [7], independently, provided another adaptive error-compensation bias (also referred to as carry-generating circuit) to improve truncation error at w = 0. In [8], how to choose the index and whether other binary thresholding and structures exist have been partially explored and pointed out at w = 0. This work is intended to extend our generalized methodology to design several lower-error area-efficient fixed-width multipliers. This generalized methodology includes the following steps in order: 1) Propose an error-compensation bias with a new binary thresholding for a fixed value of w; 2) simulate the value of K and error performance of the proposed error-compensation bias using our generalized index, and then select the best index having lower error and satisfying the same value of K for small width s; 3) construct a lower-error multiplier structure; and 4) verify the realizable error compensation bias by statistical techniques for large width s. While  $w \ge 1$ ,

the new multiplier also operates lower error than those in [5, 6] at the expense of slightly increased area-ratio with respect to each value of w. Thus, the proposed lower-error multipliers are area-efficient.

This paper is organized as follows. In Section 2, we propose several better error-compensation biases and simulate the results for small width s. The improved error-compensation bias can be mapped to a new structure with respect to each value of w. The performance discussions of the proposed fixed-width multipliers are described in Section 3. Finally, brief statements conclude the presentation.

# 2. Design of Fixed-Width Multipliers

Considering two two's-complement integer operands, an s-bit multiplicand X and an s-bit multiplier Y can be respectively represented by

$$X = -x_{s-1} 2^{s-1} + \sum_{i=0}^{s-2} x_i 2^i , \qquad (1)$$

$$Y = -y_{s-1}2^{s-1} + \sum_{j=0}^{s-2} y_j 2^j, \qquad (2)$$

where  $x_i$ ,  $y_j \in \{0, 1\}$ . The 2*s*-bit product  $P_{Standard}$  can be written as

$$P_{Standard} = X \times Y$$
  
=  $x_{s-1}y_{s-1}2^{2s-2} + \sum_{i=0}^{s-2}\sum_{j=0}^{s-2}x_iy_j2^{i+j} + 2^{s-1}(-2^{s-1} + \sum_{j=0}^{s-2}\overline{x_{s-1}y_j}2^j + 1)$   
+  $2^{s-1}(-2^{s-1} + \sum_{i=0}^{s-2}\overline{y_{s-1}x_i}2^i + 1)$ . (3)

Eq. (3) represents the Baugh-Wooley algorithm. Fig. 1 reveals the subproduct array for  $s \times s$  two's-complement multiplication. By partitioning the subproducts into two sections, Eq. (3) can be rewritten as

$$P_{Standard} = MP + LP = \sum_{i=0}^{2^{s-1}} P_i 2^i , \qquad (4)$$

where  $P_i \in \{0, 1\}$ . It is known that the most accurate truncated product is theoretically given by

$$P_{Standard} \cong MP + \sigma_{Temp} \times 2^{s} , \qquad (5)$$
  
and  
$$\sigma_{Temp} = \left[LP/2^{s}\right]_{r} = \left[\frac{1}{2}(\overline{x_{s-1}y_{0}} + x_{s-2}y_{1} + \dots + x_{1}y_{s-2} + \overline{x_{0}y_{s-1}}) + \frac{1}{2^{s}}(x_{s-2}y_{0} + \dots + x_{0}y_{s-2}) + \dots + \frac{1}{2^{s-1}}(x_{1}y_{0} + x_{0}y_{1}) + \frac{1}{2^{s}}x_{0}y_{0}\right]_{r} , \quad (6)$$

where  $[\bullet]_r$  denotes the rounding integer of  $\bullet$ . Then we define the following terms as

$$E_{main} = \overline{x_{s-1}y_0} + x_{s-2}y_1 + x_{s-3}y_2 \dots + x_1y_{s-2} + \overline{x_0}y_{s-1} , \qquad (7)$$

$$E_{remain} = \frac{1}{2} (x_{s-2}y_0 + x_{s-3}y_1 + \dots + x_0y_{s-2}) + \dots + \frac{1}{2^{s-1}} x_0y_0.$$
 (8)

Using Eqs. (7) and (8), we can rewrite Eq. (6) as

$$\sigma_{Temp} = \left\lfloor \frac{1}{2} (E_{main} + E_{remain}) \right\rfloor_{r}.$$
(9)

To explore the influence of the index in the proposed binary thresholding, we define a generalized index,  $\theta_{index,w}$ , as

$$\theta_{index,w}(q_{s-1-w}, q_{s-2-w}, ..., q_0) = \langle x_{s-1-w} y_0 \rangle^{q_{s-1-w}} + \langle x_{s-2-w} y_1 \rangle^{q_{s-2-w}} + ... + \langle x_0 y_{s-1-w} \rangle^{q_0},$$
(10)

where w means to keep s + w columns in the subproduct array and the binary parameters  $q_{s-1-w}, q_{s-2-w}, ...,$  and  $q_0 \in \{0,1\}$ . The operator

$$\langle T \rangle^{q_i} = \begin{cases} T &, \text{ if } q_i = 0\\ \overline{T} &, \text{ if } q_i = 1 \end{cases}$$
(11)

in which  $\overline{T}$  is the complement of the binary number T. Instead, we call the index  $\theta_{index,w}(q_{s-1-w}, q_{s-2-w}, ..., q_0)$  as  $\theta_{Q,w}$ , where

$$Q = q_{s-1-w} \times 2^{s-1-w} + q_{s-2-w} \times 2^{s-2-w} + \dots + q_0 \times 2^0.$$
(12)

Note that Q has a range varying from 0 to  $2^{s-w} - 1$ . For evaluating the resulting performance, let  $\varepsilon$ ,  $\varepsilon$  and  $\upsilon$  be the absolute error between products of the standard multiplier and truncated multiplier, the average error, and the variance of error, respectively. In the following, we divide the content into two subsections.

# **2.1** Realizable error-compensation bias keeping s columns (w = 0)

Since the derivation results in this subsection are the same as what proposed in [8, 9], we only present the simple error-compensation bias applying the generalized methodology in Type 2 binary thresholding as

$$\sigma_{Type2,Q=2^{s-1}+1,w=0} = \begin{cases} x_{s-2}y_1 + x_{s-3}y_2 + \dots + x_1y_{s-2} + 1 & \text{if } \theta_{Q=2^{s-1}+1,w=0} < s \\ x_{s-2}y_1 + x_{s-3}y_2 + \dots + x_1y_{s-2} & \text{, if } \theta_{Q=2^{s-1}+1,w=0} = s \end{cases}$$
where  $\theta_{Q=2^{s-1}+1,w=0} = x_{1,2}$  (13)

where  $\theta_{Q=2^{s-1}+1,w=0} = x_{s-1}y_0 + x_{s-2}y_1 + \dots + x_0y_{s-1}$ . Subsequently, we focus on developing a newly fixed-width multiplier

quently, we focus on developing a newly fixed-width multiplier in the next subsection.

# **2.2** Realizable error-compensation bias keeping more than *s* columns ( $w \ge 1$ )

In [2, 3], they show that lower truncation error can be obtained if larger s + w columns are kept in hardware. However, more area cost could be increased. Since the reduction and rounding errors do not own the same weight position, we adopt S-Ss' method [3] to concurrently treat reduction and rounding errors. Eq. (9) can be rewritten as

$$\sigma_{Temp} = \left[\frac{1}{2}E_{main} + \frac{1}{2}E_{remain} + \frac{1}{2^{w}}\theta_{Q,w} - E_{reduct,w} + E_{reduct,w} - \frac{1}{2^{w}}\theta_{Q,w}\right]$$
$$\cong \left[\left(\frac{1}{2}E_{main} + \frac{1}{2}E_{remain} + \frac{1}{2^{w}}\theta_{Q,w} - E_{reduct,w}\right) + \left[K\right]_{r} / 2^{w}\right], (14)$$
where

$$K = 2^{w} \left( E_{reduct,w} - \frac{1}{2^{w}} \theta_{\mathcal{Q},w} + E_{round,w} \right), \tag{15}$$

$$E_{reduct,w} = \frac{1}{2^{w+1}} (x_{s-1-w} y_0 + x_{s-2-w} y_1 + \dots + x_0 y_{s-1-w}) + \dots + \frac{1}{2^s} x_0 y_0, (16)$$

$$E_{round,w} = 2^{-1} (1 - 2^{-w}), \qquad (17)$$

and  $\lfloor \bullet \rfloor$  denotes the maximum integer equal to or less than  $\bullet$ . Note that the least significant weigh of K must be limited to the s + w weighting position. Concurrently treating method for reduction and rounding errors of Eq. (14) can be directly referred to derivation of [3]. In the first step, to design a realizable error-compensation bias, two types of binary thresholding for the error-compensation bias can be changed to

Type 1:

$$\sigma_{T_{ype1,Q,w\geq 1}} = \begin{cases} \left\lfloor \frac{1}{2} (E_{main} + E_{remain}) + \frac{1}{2^{w}} \theta_{Q,w} - E_{reduct,w} + [K_{1}]_{r} / 2^{w} \right\rfloor, \text{ if } \theta_{Q,w} = 0 \\ \left\lfloor \frac{1}{2} (E_{main} + E_{remain}) + \frac{1}{2^{w}} \theta_{Q,w} - E_{reduct,w} + [K_{2}]_{r} / 2^{w} \right\rfloor, \text{ if } \theta_{Q,w} > 0 \end{cases}$$

$$(18)$$

Type 2:

$$\sigma_{Type2,Q,w\geq 1} = \begin{cases} \left\lfloor \frac{1}{2} (E_{main} + E_{remain}) + \frac{1}{2^{w}} \theta_{Q,w} - E_{reduct,w} + [K_{3}]_{r} / 2^{w} \right\rfloor, \text{ if } \theta_{Q,w} < s \\ \left\lfloor \frac{1}{2} (E_{main} + E_{remain}) + \frac{1}{2^{w}} \theta_{Q,w} - E_{reduct,w} + [K_{4}]_{r} / 2^{w} \right\rfloor, \text{ if } \theta_{Q,w} = s \end{cases}$$
(19)

where  $K_1, K_2, K_3$  and  $K_4$  are defined as those of [8] but for  $w \ge 1$ . The restriction of the value of K can be modified as  $[K_i]_{w} \in \{0, 1, 2^{w-1} - 1, 2^{w-1}\}$  for i = 1, 2, 3 and 4. For  $w \ge 1$ , since using the same simulation procedures as mentioned in [8], we only introduce the analysis for w = 1 and construct the structure. In Type 1 binary thresholding, by exhaustive search we can find that one good index,  $\theta_{O=0,w=1}$  for s=6 satisfies the restriction of  $K_1$  and  $K_2$  as shown in Fig. 2. The average error and variance of error resulted from the index,  $\theta_{O=0,w=1}$ , are shown in Figs. 3 and 4, respectively. After exhaustive simulation from s = 4 to 12, we observe that the specific index,  $\theta_{O=0,w=1}$ achieves better error performance where the chosen index satisfies  $[K_1]_r = 1$  and  $[K_2]_r = 0$ . On the other hand, for Type 2 binary thresholding, the error simulation in terms of average errors and variance of errors are larger than what we find error resulted from the best index in Type 1 thresholding, so we ignore the discussion in Type 2 binary thresholding. So far, the second step is achieved. Hence, a new lower error fixed-width multiplier under w = 1 can be described and simplified as:

$$\sigma_{Type1,Q=0,w=1} = \begin{cases} \left\lfloor \frac{1}{2} (E_{main} + \theta_{Q=0,w=1}) + \frac{1}{2} \right\rfloor, \text{ if } \theta_{Q=0,w=1} = 0\\ \left\lfloor \frac{1}{2} (E_{main} + \theta_{Q=0,w=1}) + 0 \right\rfloor, \text{ if } \theta_{Q=0,w=1} > 0 \end{cases}$$
(20)

where  $\theta_{Q=0,w=1} = x_{s-2}y_0 + x_{s-3}y_1 + ... + x_0y_{s-2}$ . In the third step, Eq. (20) can be mapped to the structure as shown in Fig. 5(a) for s = 8, where A, ND, and FA denote AND gate, NAND gate, and a full adder, respectively, and other main symbolic cells are depicted in Fig. 5(b). In the fourth step, verifying the realizable error-compensation bias by statistical techniques for larger width can be easily modified from the derivation results in [8]; thus, we ignore the presentation here. At other values of w, we can follow the same procedures to evaluate K, error performance and form the structure for small width s. From simulation results,  $\theta_{Q=0,w}$  in Type 1 binary thresholding is still the best index for  $w \ge 1$ .

#### 3. Performance Discussion

In this section, the performance is evaluated in terms of

maximum error  $\varepsilon_{\max}$ , average error  $\overline{\varepsilon}$ , variance of error  $\upsilon$ and area ratio R defined in [8, 9]. It is obvious that a fixed-width multiplier is more accurate if  $\varepsilon_{max}$ ,  $\overline{\varepsilon}$ , and  $\upsilon$  are smaller. Tables 1, 2 and 3 show simulated results for various fixed-width multipliers at different width s. The comparison results show that our proposed fixed-width multipliers are more accurate than others [2-7] for different values of w. The better performance is achieved due to the fact that we derive better error-compensation biases to reduce truncation error. The area-ratio in Table 4 shows that the proposed multiplier has nearly the same area-ratio as that of [4-7] at each value of w so that the lower-error fixed-width multiplier is area-efficient. Next, we apply the proposed fixed-width multiplier to the 35-tap FIR filter for speech processing. For convenience of comparison of various fixed-width multipliers, we take 1000 samples for the consonant part and vowel part of "Chicken". We are concerned with whether the filtered waveform is accurate via our proposed fixed-width multiplier, so the correct standard output is required. We use error-free output as a standard, which is used to compare the accuracy performances of fixed-width multipliers. From comparison results obtained with four fixed-width multipliers as shown in Fig. 6 for speech processing application, we observed that the Type 1 multiplier with  $\theta_{O^{=0,w=1}}$  shows better error per-

formance in the consonant and vowel parts.

#### 4. Conclusion

This paper develops a generalized methodology for designing lower-error area-efficient fixed-width multipliers. By properly choosing the binary thresholding and generalized indices, we devise several better error-compensation biases to reduce the maximum error, average error and variance of error. From area-ratio comparison, these new lower-error fixed-width multipliers are shown to be area-efficient for VLSI implementation. Finally, we successfully apply the proposed fixed-width multipliers to speech processing application and obtain satisfactory results.

## References

- C. R. Baugh and B. A. Wooley, "A two's complement parallel array multiplication algorithm," *IEEE Trans. Comput.*, vol. C-22, no. 12, pp. 1045-1047, Dec. 1973.
- [2] Y. P. Lim, "Single-precision multiplier with reduced circuit complexity for signal processing applications," *IEEE Trans. Comput.*, vol. 41, no. 10, pp. 1333-1336, Oct. 1992.
- [3] M. J. Schulte and E. E. Swartzlander, Jr., "Truncated multiplication with correction constant," *VLSI Signal Processing*, *VI*, New York: IEEE Press, 1993, pp. 388-396.
- [4] S. S. Kidambi, F. El-Guibaly, and A. Antoniou, "Area-efficient multipliers for digital signal processing applications," *IEEE Trans. Circuits Syst. II*, vol. CAS-43, no. 2, pp. 90-94, Feb. 1996.
- [5] E. J. King and E. E. Swartzlander, Jr., "Data-dependent truncation scheme for parallel multipliers," in *Proc. 31st Asilomar Conference on Signals, Systems, and Computers*, 1997, vol. 2, pp. 1178-1182, Pacific Grove, CA.
- [6] E. E. Swartzlander, Jr., "Truncated multiplication with approximate rounding," in *Proc. 33rd Asilomar Conference on Signals, Systems, and Computers*, 1999, vol. 2, pp. 1480-1483.
- [7] J. M. Jou, S. R. Kuang, and R. D. Chen, "Design of low-error fixed-width multiplier for DSP applications," *IEEE Trans. Circuits Syst. II*, vol. CAS-46, no. 6, pp. 836-842, Jun. 1999.

- [8] L. D. Van, S. S. Wang, and W. S. Feng, "Design of the lower-error fixed-width multiplier and its application", *IEEE Trans. Circuits Syst. II*, vol. 47, pp. 1112-1118, Oct. 2000.
- [9] L. D. Van, Design of Efficient VLSI Architectures: Multiplier, 2-D Digital Filter, and Adaptive Digital Filter, Ph. D. dissertation, Dept. of the Electrical Engineering, National Taiwan University, Taipei, Taiwan, R.O.C., 2001.



Fig. 1. Subproduct array of  $s \times s$  two's-complement multiplication.



Fig. 2. Values of  $K_1$  and  $K_2$  versus different  $\theta_{Q,w=1}$  in Type 1 binary thresholding for s = 6.



Fig. 3. Average errors by full-search simulation versus different  $\theta_{Q, w=1}$  in Type 1 binary threshoding for s = 6.



Fig. 4. Variance of errors by full-search simulation versus different  $\theta_{Q,w=1}$  in Type 1 binary threshoding for s = 6.



Fig. 5. (a) Proposed lower-error fixed-width 8×8 two's complement multiplier with  $~\theta_{\mathcal{Q}^{=0,w=1}}$  , and (b) NFA gate, AFA gate, and logic diagrams of cell 4 and cell 5.



Fig. 5. (Continued.)



Fig. 6. Comparison results of error signals obtained with four kinds of fixed-width multipliers.

| Table 1: Comparison Results of Maximum Error, $\varepsilon_{max}$      |       |        |          |        |         |
|------------------------------------------------------------------------|-------|--------|----------|--------|---------|
| Multiplier                                                             | s=4   | s=6    | s=8      | s=10   | s=12    |
| K-G-As' Structure                                                      | 33    | 193    | 1281     | 6145   | 32769   |
| J-K-Cs' Structure                                                      | 21    | 107    | 515      | 2403   | 10979   |
| Type 2 with $Q = 2^{s-1} + 1$ , $w = 0$                                | 17    | 89     | 441      | 2105   | 9785    |
| K-Ss' Structure with $w = 1$                                           | 11    | 55     | 263      | 1223   | 5575    |
| Type 1 with $Q = 0$ , $w = 1$                                          | 9     | 49     | 237      | 1117   | 5149    |
| K-Ss' Structure with $w = 2$                                           | 9     | 41     | 185      | 825    | 3641    |
| Type 1 with $Q = 0$ , $w = 2$                                          | 8     | 37     | 171      | 771    | 3427    |
| Table 2: Comparison Results of Average Error, $\overline{\varepsilon}$ |       |        |          |        |         |
| Multiplier                                                             | s=4   | s=6    | s=8      | s=10   | s=12    |
| K-G-As' Structure                                                      | 6.96  | 41.01  | 188.29   | 906.40 | 3842.06 |
| J-K-Cs' Structure                                                      | 7.20  | 37.27  | 170.46   | 736.62 | 3065.25 |
| Type 2 with $Q = 2^{s-1} + 1, w = 0$                                   | 5.17  | 24.07  | 105.96   | 456.14 | 1907.36 |
| K-Ss' Structure with $w = 1$                                           | 4.00  | 17.58  | 73.91    | 307.11 | 1245.14 |
| Type 1 with $Q = 0$ , $w = 1$                                          | 3.77  | 16.29  | 69.15    | 292.27 | 1205.49 |
| K-Ss' Structure with $w = 2$                                           | 3.78  | 16.11  | 65.94    | 267.25 | 1067.47 |
| Type 1 with $Q = 0$ , $w = 2$                                          | 3.75  | 15.81  | 64.67    | 262.99 | 1055.75 |
| Table 3: Comparison Results of Variance of Error, $v$                  |       |        |          |        |         |
| Multiplier                                                             | s=4   | s=6    | s=8      | s=10   | s=12    |
| K-G-As' Structure                                                      | 39.80 | 788.45 | 22959.01 | 416043 | 9204493 |
| J-K-Cs' Structure                                                      | 28.24 | 537.70 | 10158.54 | 190805 | 3417020 |
| Type 2 with $Q = 2^{s-1} + 1$ , $w = 0$                                | 17.63 | 320.65 | 6031.32  | 112079 | 1973508 |
| K-Ss' Structure with $w = 1$                                           | 9.25  | 149.26 | 2597.87  | 45748  | 787538  |
| Type 1 with $Q = 0$ , $w = 1$                                          | 7.32  | 110.32 | 2060.25  | 39345  | 723507  |
| K-Ss' Structure with $w = 2$                                           | 7.45  | 104.74 | 1672.61  | 27741  | 446760  |
| Type 1 with $Q = 0$ , $w = 2$                                          | 7.19  | 95.15  | 1512.79  | 25626  | 420685  |
| Table 4: Comparison Results of Area Ratio, R                           |       |        |          |        |         |
| Multiplier                                                             | s=4   | s=6    | s=8      | s=10   | s=12    |
| K-G-As' Structure                                                      | 0.555 | 0.536  | 0.527    | 0.522  | 0.518   |
| J-K-Cs' Structure                                                      | 0.608 | 0.569  | 0.550    | 0.540  | 0.533   |
| Type 2 with $Q = 2^{s-1} + 1, w = 0$                                   | 0.608 | 0.569  | 0.550    | 0.540  | 0.533   |
| K-Ss' Structure with $w = 1$                                           | 0.863 | 0.733  | 0.671    | 0.635  | 0.612   |
| Type 1 with $Q = 0$ , $w = 1$                                          | 0.885 | 0.747  | 0.682    | 0.644  | 0.619   |
| K-Ss' Structure with $w = 2$                                           | 1.277 | 0.994  | 0.862    | 0.786  | 0.736   |
| Type 1 with $Q = 0$ $w = 2$                                            | 1.292 | 1.004  | 0.871    | 0.793  | 0.742   |