Low-power parallel multiplier with column bypassing
M.-C. Wen, S.-J. Wang and Y.-N. Lin
A low-power parallel multiplier design is proposed in which some columns
of the multiplier array can be turned off whenever their outputs are known.
The design maintains the original array structure without introducing
extra boundary cells, as was the case in previous designs. Experimental
results show that it saves up to roughly 10% of power for random inputs.
Higher power reduction can be achieved if the operands contain
more 0’s than 1’s.
Introduction: Multiplication is an essential arithmetic operation for
common DSP applications, such as filtering and fast Fourier transform
(FFT). To achieve high execution speed, parallel array multipliers are
widely used. These multipliers tend to consume most of the power in
DSP computations, and thus power-efficient multipliers are very
important for the design of low-power DSP systems.
CMOS is currently the dominant technology in digital VLSI. Two
components contribute to the power dissipation in CMOS circuits. The
static dissipation is due to leakage current, while dynamic power
dissipation is due to switching transient current as well as charging and
discharging of load capacitances. Since the amount of leakage current is
usually small, the major source of power dissipation in CMOS circuits is
the dynamic power dissipation. Dynamic power dissipation appears only
when a CMOS gate switches from one stable state to another. Thus, the
power consumption can be reduced if one can reduce the switching
activity of a given logic circuit without changing its function.
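As a rough illustration of the statement above, the dynamic power of a CMOS gate is commonly approximated by the textbook expression below. The symbols are not taken from this Letter and are introduced here only for illustration: $\alpha$ is the switching activity (the probability that the gate output makes a power-consuming 0-to-1 transition in a clock cycle), $C_L$ is the load capacitance, $V_{DD}$ is the supply voltage and $f$ is the clock frequency. The switching transient (short-circuit) component is neglected in this sketch.

```latex
P_{\mathrm{dyn}} \approx \alpha \, C_L \, V_{DD}^{2} \, f
```

With $C_L$, $V_{DD}$ and $f$ fixed, a lower $\alpha$ gives a proportionally lower dynamic power, which is why reducing switching activity is an effective way to save power.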
Many low-power multiplier designs can be found in the literature. A
straightforward approach is to design a full adder (FA) that consumes
less power [1]. Power reduction can also be achieved through structural
modification. For example, rows of partial products can be ignored [2].
Parallel multiplier: Consider the multiplication of two unsigned n-bit
numbers, where $A = a_{n-1} a_{n-2} \ldots a_0$ is the multiplicand and
$B = b_{n-1} b_{n-2} \ldots b_0$ is the multiplier. The product
$P = p_{2n-1} p_{2n-2} \ldots p_0$ can be written as follows:
$$P = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} (a_i \cdot b_j) \, 2^{i+j}$$
An array implementation, known as the Braun multiplier [3], is
shown in Fig. 1. The Baugh-Wooley multiplier uses the same array
structure to handle 2’s complement multiplication, with some of the
partial products replaced by their complements. The multiplier array
consists of (n − 1) rows of carry-save adders (CSA), in which each row
contains (n − 1) FA cells. Each FA in the CSA array has two outputs:
the sum bit goes down, while the carry bit goes to the lower-left FA. An
FA in the first row has only two valid inputs, and its third input bit is
set to 0. Therefore, it can be replaced by a two-input half-adder.
The last row is a ripple adder for carry propagation. In this Letter, we
propose a low-power design for this multiplier.
Fig. 1 4 × 4 Braun multiplier
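The array described above can be sketched at the bit level as follows. This is a minimal behavioural model, not the authors' circuit: the function names are hypothetical, and the indexing (row i handles the partial products of b_{i+1}, column 0 is the rightmost cell) is an assumption chosen to match the row 0 description used later in the proof of Theorem 1.

```python
def full_adder(x, y, z):
    """Add three bits and return (sum_bit, carry_bit)."""
    total = x + y + z
    return total & 1, total >> 1


def braun_multiply(a, b, n):
    """Multiply two unsigned n-bit integers (n >= 2) with a Braun-style array."""
    abit = [(a >> k) & 1 for k in range(n)]
    bbit = [(b >> k) & 1 for k in range(n)]
    p = [0] * (2 * n)
    p[0] = abit[0] & bbit[0]

    # Inputs of row 0: the sum inputs are the partial products a_{j+1} b_0 and
    # the carry inputs are 0, so the row 0 cells act as half adders.
    sum_in = [abit[j + 1] & bbit[0] for j in range(n - 1)]
    carry_in = [0] * (n - 1)

    for i in range(n - 1):          # (n - 1) carry-save rows
        sums, carries = [], []
        for j in range(n - 1):      # (n - 1) cells per row
            s, c = full_adder(abit[j] & bbit[i + 1], sum_in[j], carry_in[j])
            sums.append(s)
            carries.append(c)
        p[i + 1] = sums[0]          # rightmost sum is a finished product bit
        # Column j of the next row receives the sum of column j + 1; the
        # leftmost column receives the partial product a_{n-1} b_{i+1}.
        sum_in = sums[1:] + [abit[n - 1] & bbit[i + 1]]
        carry_in = carries          # carries stay in the same column

    # Last row: ripple adder for carry propagation.
    ripple = 0
    for k in range(n - 1):
        p[n + k], ripple = full_adder(sum_in[k], carry_in[k], ripple)
    p[2 * n - 1] = ripple
    return sum(bit << k for k, bit in enumerate(p))
```

For example, `braun_multiply(0b1010, 0b1111, 4)` reproduces the ordinary product of 10 and 15.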
Low-power multipliers with row-bypassing: A low-power multiplier
design may disable the operations in some rows to save power [2]. If bit
$b_j$ is 0, all partial products $a_i b_j$, 0 ≤ i ≤ n − 1, are zero. Therefore,
the additions in the corresponding row in Fig. 1 can be bypassed. The
row-bypassing multiplier is shown in Fig. 2. Each cell in the CSA array is
augmented with three tri-state gates and two multiplexers. For example,
let $b_2$ be 0 in Fig. 2. In this case, the CSA in the second row
(enclosed in the circle) can be bypassed, and the outputs from the first
row are fed directly to the third-row CSA. However, since the rightmost
FA in the second row is disabled, it does not execute the addition, and
thus the output is not correct. To remedy this problem, an extra circuit
must be added; these elements are located in the triangular area in Fig. 2.
Fig. 2 4 × 4 Braun multiplier with row-bypassing
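The row-selection rule can be illustrated with a short sketch. It only identifies which CSA rows have all-zero partial products; it does not model the correcting circuit of [2]. The function name is hypothetical, and the numbering (rows 0 to n − 2, with row i adding the partial products of b_{i+1}) is an assumption consistent with the example in which b_2 = 0 bypasses the second row.

```python
def bypassable_rows(b, n):
    """Return the CSA rows (0 .. n-2) whose added partial products are all zero.

    Row i adds the partial products a_j * b_{i+1}, so it can be bypassed
    when bit b_{i+1} of the multiplier is 0.
    """
    return [i for i in range(n - 1) if not (b >> (i + 1)) & 1]
```

With B = 1011 (so that b_2 = 0), `bypassable_rows(0b1011, 4)` reports only row index 1, i.e. the second row.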
Proposed method: Instead of bypassing rows of full adders, we
propose a multiplier design in which columns of adders are bypassed.
In this approach, the operations in a column can be disabled if the
corresponding bit in the multiplicand is 0. There are two advantages
to this approach. First, it eliminates the extra correcting circuit as
shown in Fig. 2. Secondly, the modified FA is simpler than that used
in the row-bypassing multiplier.
Assume that we execute 1010 × 1111 in Fig. 1. It can be verified that,
for FAs in the first and third diagonals, two out of the three input bits are
0: the ‘carry’ bit from the upper-right FA, and the partial product $a_i b_j$
(note that $a_0 = a_2 = 0$). As a result, the output carry bit of such an FA is
0, and the output sum bit is simply equal to the third bit, which is the
‘sum’ output of its upper FA. The following theorem shows that this is
true in general. Therefore, when $a_i$ is 0, the operations in the
corresponding diagonal can be disabled, since all the outputs are known. We
refer to the FAs in a diagonal in Fig. 1 as a column. Let $FA_{i,j}$ be the full
adder located in row i and column j, 0 ≤ i, j ≤ n − 2, of the (n − 1) × (n − 1)
array shown in Fig. 1. $FA_{0,0}$ is the adder at the upper-right corner. The
following theorem establishes the reason for column bypassing.
Theorem 1: When $a_j = 0$, the output of a column j adder cell $FA_{i,j}$ can
be specified as follows. 1. The output carry bit is 0. 2. The output sum
bit is equal to the output sum bit of $FA_{i-1,j+1}$.
Proof: We prove this theorem by induction on the row index.
1. Consider row 0. In row 0, there are only two bits to be added:
adder $FA_{0,j}$ computes $a_j b_1 + a_{j+1} b_0$. If $a_j = 0$, the output
carry bit must be zero, and the output sum bit is equal to $a_{j+1} b_0$.
2. Assume that the theorem holds for row i.
3. In row i + 1, the inputs of $FA_{i+1,j}$ are the carry bit from $FA_{i,j}$, the
sum bit from $FA_{i,j+1}$, and the partial product $a_j b_{i+2}$. Since $a_j = 0$,
the partial product is 0, and by the induction hypothesis the carry bit from
$FA_{i,j}$ is also 0. Two out of the three inputs are therefore 0, so the output
carry bit is 0 and the output sum bit is equal to the sum bit sent by $FA_{i,j+1}$.
According to Theorem 1, when $a_j = 0$, the operations in column j can
be skipped, and the full adders can be disabled since their outputs are
known.
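Theorem 1 can also be checked exhaustively for small n. The sketch below is an illustration rather than part of the original Letter: it walks through the carry-save rows of a behavioural Braun array (same assumed indexing as the proof, with row 0 adding $a_j b_1$ and $a_{j+1} b_0$) and confirms that every cell in a column with $a_j = 0$ produces a zero carry and simply forwards its incoming sum bit. The function name is hypothetical.

```python
from itertools import product


def theorem1_holds(n):
    """Check Theorem 1 for every pair of unsigned n-bit operands (n >= 2)."""
    for a, b in product(range(1 << n), repeat=2):
        abit = [(a >> k) & 1 for k in range(n)]
        bbit = [(b >> k) & 1 for k in range(n)]
        # Row 0 inputs: partial products a_{j+1} b_0 as sums, zero carries.
        sum_in = [abit[j + 1] & bbit[0] for j in range(n - 1)]
        carry_in = [0] * (n - 1)
        for i in range(n - 1):
            sums, carries = [], []
            for j in range(n - 1):
                total = (abit[j] & bbit[i + 1]) + sum_in[j] + carry_in[j]
                s, c = total & 1, total >> 1
                # Claim: a_j = 0 implies carry 0 and sum equal to the sum input.
                if abit[j] == 0 and (c != 0 or s != sum_in[j]):
                    return False
                sums.append(s)
                carries.append(c)
            # Column j of the next row reads the sum of column j + 1.
            sum_in = sums[1:] + [abit[n - 1] & bbit[i + 1]]
            carry_in = carries
    return True
```

Calling `theorem1_holds(4)` covers all 256 operand pairs of the 4 × 4 case.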
Fig. 3 4 × 4 column-bypassing multiplier
ELECTRONICS LETTERS 12th May 2005 Vol. 41 No. 10
Multiplier design: The column-bypassing multiplier is shown in
Fig. 3. Note that we only need two tri-state gates and one multiplexer
in a modified adder cell. If $a_j = 0$, the FA is disabled. We do not
need a tri-state gate for the carry input ($C_{i-1,j}$), for the following
reason. In a Braun multiplier, each FA in the first row (i.e. row 0) has
only two inputs. Therefore, when $a_j = 0$, both inputs of $FA_{0,j}$ are
disabled, and its output carry bit does not change. Consequently, all
three inputs of $FA_{1,j}$ are fixed, which prevents its output from
changing. At the bottom of the CSA array, we need to set the carry
outputs to 0. Otherwise, the corresponding FAs may not produce
the correct outputs, since their inputs are disabled. This is done by
adding an AND gate at the outputs of the last-row CSA adders.
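The behaviour described above can be sketched as a bit-level model. This is an illustrative sketch under stated assumptions, not the authors' circuit: the function name and the `stale` parameter are hypothetical, a disabled cell is modelled as holding an arbitrary old carry value, and the AND gate at the bottom of the array is assumed to mask each last-row carry with $a_j$. The model also counts how many full adders actually operate.

```python
def column_bypass_multiply(a, b, n, stale=1):
    """Return (product, active_cells) for unsigned n-bit operands (n >= 2).

    `stale` is the value a disabled cell is assumed to hold on its carry
    output; the product must not depend on it.
    """
    abit = [(a >> k) & 1 for k in range(n)]
    bbit = [(b >> k) & 1 for k in range(n)]
    p = [0] * (2 * n)
    p[0] = abit[0] & bbit[0]
    sum_in = [abit[j + 1] & bbit[0] for j in range(n - 1)]
    carry_in = [0] * (n - 1)
    active = 0

    for i in range(n - 1):
        sums, carries = [], []
        for j in range(n - 1):
            if abit[j]:
                # Enabled cell: a normal full adder (a_j b_{i+1} = b_{i+1}).
                total = bbit[i + 1] + sum_in[j] + carry_in[j]
                sums.append(total & 1)
                carries.append(total >> 1)
                active += 1
            else:
                # Disabled cell: the multiplexer forwards the incoming sum,
                # and the carry output keeps an old, meaningless value.
                sums.append(sum_in[j])
                carries.append(stale)
        p[i + 1] = sums[0]
        sum_in = sums[1:] + [abit[n - 1] & bbit[i + 1]]
        carry_in = carries

    # AND gates below the last CSA row force the carries of bypassed
    # columns to 0 before they enter the ripple adder.
    carry_in = [c & abit[j] for j, c in enumerate(carry_in)]

    ripple = 0
    for k in range(n - 1):
        total = sum_in[k] + carry_in[k] + ripple
        p[n + k], ripple = total & 1, total >> 1
    p[2 * n - 1] = ripple
    return sum(bit << k for k, bit in enumerate(p)), active
```

For 1010 × 1111, only column 1 is enabled, so 3 of the 9 CSA cells operate. The masking step is what makes the result independent of `stale`; without it, a stale carry from a bypassed column could corrupt the ripple adder.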
Results: To evaluate the performance of this low-power multiplier, we
implemented the design in TSMC 0.35 μm technology. We compare
it with a normal Braun multiplier and the row-bypassing multiplier [2].
Table 1 gives the power consumption of the three designs. In this
experiment, the input patterns are assumed to be random, i.e. the
probabilities of 0 and 1 are both 0.5. The power is estimated by running
HSPICE. Note that this is a relatively pessimistic estimate: if the
operands are sparse (i.e. they contain more 0’s than 1’s), the power
saving will be greater. Our results show that the row-bypassing
multipliers actually consume more power than the Braun multiplier,
possibly due to the extra logic. Our design consumes less power
in all cases, and the reduction increases with the multiplier size, from
0.6% for 4 × 4 to 10.7% for 16 × 16. The areas of the three designs are
listed in Table 2. In our design, the area overhead is roughly 20% (16% to
24%), while the area overheads of the row-bypassing multipliers are more
than 40%.
Table 1: Power (mW)
                  Size
Multiplier type   4 × 4    (%)     8 × 8   (%)     16 × 16   (%)
Braun             0.4325   100     2.31    100     8.01      100
[2]               0.5537   128     2.76    119     8.26      103
Proposed          0.4298   99.4    2.25    97.4    7.15      89.3
Table 2: Area (μm²)
                  Size
Multiplier type   4 × 4    (%)     8 × 8   (%)     16 × 16   (%)
Braun             8672     100     33286   100     131040    100
[2]               13692    158     48991   147     185367    141
Proposed          10063    116     40236   121     162131    124
Conclusion: We have presented a new low-power parallel multiplier
design, which disables the operations in columns of full adders.
Compared with row-bypassing, this technique achieves higher
power reduction with lower hardware overhead.
© IEE 2005
2 February 2005
Electronics Letters online no: 20050464
doi: 10.1049/el:20050464
M.-C. Wen, S.-J. Wang and Y.-N. Lin (Department of Computer
Science, National Chung-Hsing University, 250 Kuo-Kuan Road,
Taichung 40227, Taiwan)
E-mail: sjwang@cs.nchu.edu.tw
References
1 Wu, A.: ‘High performance adder cell for low power pipelined multiplier’. Proc. IEEE Int. Symp. on Circuits and Systems, May 1996, Vol. 4, pp. 57–60
2 Ohban, J., Moshnyaga, V.G., and Inoue, K.: ‘Multiplier energy reduction through bypassing of partial products’. Proc. Asia-Pacific Conf. on Circuits and Systems, 2002, Vol. 2, pp. 13–17
3 Abu-Khater, I.S., Bellaouar, A., and Elmasry, M.: ‘Circuit techniques for CMOS low-power high-performance multipliers’, IEEE J. Solid-State Circuits, 1996, 31, (10), pp. 1535–1546