IMPACT-2009
A High-Speed, Hierarchical 16×16 Array of Array Multiplier Design
Abhijit Asati (1) and Chandrashekhar (2)
(1) EEE Group, BITS, Pilani, India, [email protected]
(2) CEERI, Pilani, India, [email protected]
Abstract: Array multipliers are preferred for smaller operand sizes because of their simpler VLSI implementation, in spite of their linear time complexity. Tree multipliers have a time complexity of O(log n) but are less suitable for VLSI implementation since, being less regular, they require a larger total routing length, which may degrade their performance. Some hybrid architectures, called array of array multipliers, have intermediate performance. These multipliers have a better time complexity than array multipliers and therefore become an obvious choice for higher performance multiplier designs of moderate operand sizes. In this paper a 16×16 unsigned array of array multiplier circuit is designed with a hierarchical structure and implemented using conventional CMOS logic in the 0.6 µm N-well CMOS process (SCN_SUBM, lambda = 0.3) of MOSIS. The proposed multiplier implementation shows a large reduction in propagation delay and average power consumption (at 20 MHz) compared to the 16-bit Booth encoded Wallace tree multiplier by F Jalil [3]. The total transistor count, maximum instantaneous power, leakage power, core area, total routing length and number of vias are also presented.
I. INTRODUCTION
The multiplier is a fundamental building block in standard and ASIC digital signal processors. Multiplication is used in many neural computing and DSP applications such as instrumentation and measurement, communications, audio and video processing, graphics, image enhancement, 3-D rendering, navigation, radar, GPS, and control applications such as robotics, machine vision and guidance. It is mainly used to implement algorithms such as frequency domain filtering (FIR and IIR), frequency-time transformations (FFT) and correlation. Most DSP tasks require real-time processing, so the multiplier must perform these tasks quickly while minimizing cost and power. Multiplication algorithms differ in how the partial products are generated and added [1].
The array multiplier has linear time complexity, i.e. O(n), so its delay degrades for larger operand sizes. It also has poor space complexity, O(n²), since it requires approximately n² cells to produce the product. Therefore, as the operand size grows, the circuit takes more area and power [2], [5], [6]. A radix-m Booth encoding, where m = 2^n, reduces the number of partial product rows by a factor of n. Booth radix-4 (m = 4 = 2^2) encoding reduces the number of partial product rows by a factor of two [3]. Since the number of partial product rows is halved, the hardware required to generate the partial products is reduced to n²/2 cells [2]. In Wallace tree multipliers the ripple effect is reduced, so they produce products in far less time. The time complexity is reduced to O(log n), but a larger routing area is required than for regular array multipliers, making them less suitable for VLSI implementation [2]. The hardware reduction of the Booth encoding scheme can be combined with accelerated Wallace tree accumulation of the partial products to obtain a time complexity of O(log n), which is well suited to multipliers with large operand sizes [2], [3]. In the sub-micron/deep sub-micron era, for multipliers of moderate operand sizes, the performance of tree-based architectures may degrade because of their larger routing lengths, and some hybrid architectures perform better; gate-level analysis of these architectures shows moderate area and delay. These multiplier architectures have moderate area requirements and time
complexity of O(√N) [4]. In this paper we present a hierarchical implementation of a 16×16 multiplier using the array of array technique. The multiplier circuit is implemented in the 0.6 µm N-well CMOS process (SCN_SUBM, lambda = 0.3) of MOSIS using conventional CMOS logic. Simulation results are compared with the Booth encoded Wallace tree multiplier of [3]. Section II explains the design of a 2×2 multiplier, Section III describes the hierarchical design of a 4×4 multiplier, and Section IV describes the hierarchical design of the 8×8 and 16×16 multipliers. Physical implementation and results are described in Section V. Section VI concludes the paper.
978-1-4244-3604-0/09/$25.00 ©2009 IEEE
Authorized licensed use limited to: K.S. Institute of Technology. Downloaded on November 3, 2009 at 01:45 from IEEE Xplore. Restrictions apply.
II. DESIGN OF A 2×2 MULTIPLIER
In this architecture the 2×2 unsigned multiplier is used as the basic building block in the hierarchical design of larger multipliers. The truth table for a 2×2 combinational multiplier is shown in Table I. Solving the truth table with K-maps yields equation (1), from which a 2×2 combinational circuit can be realized.
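$$
\begin{aligned}
P_0 &= A_0 B_0 \\
P_1 &= A_0 B_1 (\overline{B_0} + \overline{A_1}) + A_1 B_0 (\overline{B_1} + \overline{A_0}) \\
P_2 &= A_1 B_1 (\overline{A_0} + \overline{B_0}) \\
P_3 &= A_1 A_0 B_1 B_0
\end{aligned}
\tag{1}
$$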
TABLE I. TRUTH TABLE OF 2×2 MULTIPLIER
A1 A0 B1 B0   P3 P2 P1 P0
0  0  0  0    0  0  0  0
0  0  0  1    0  0  0  0
0  0  1  0    0  0  0  0
0  0  1  1    0  0  0  0
0  1  0  0    0  0  0  0
0  1  0  1    0  0  0  1
0  1  1  0    0  0  1  0
0  1  1  1    0  0  1  1
1  0  0  0    0  0  0  0
1  0  0  1    0  0  1  0
1  0  1  0    0  1  0  0
1  0  1  1    0  1  1  0
1  1  0  0    0  0  0  0
1  1  0  1    0  0  1  1
1  1  1  0    0  1  1  0
1  1  1  1    1  0  0  1
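
As a quick check of the 2×2 block, the following Python sketch evaluates the four product-bit equations and compares them with ordinary integer multiplication for all 16 input combinations. It assumes equation (1) reads P0 = A0·B0, P1 = A0·B1·(B0' + A1') + A1·B0·(B1' + A0'), P2 = A1·B1·(A0' + B0') and P3 = A1·A0·B1·B0, where ' denotes complement; the function names `mul2x2` and `check_truth_table` are illustrative and not from the paper.

```python
from itertools import product

def mul2x2(a1, a0, b1, b0):
    """Gate-level 2x2 unsigned multiplier; inputs and outputs are 0/1 bits."""
    inv = lambda x: 1 - x  # logical NOT on a single bit
    p0 = a0 & b0
    p1 = (a0 & b1 & (inv(b0) | inv(a1))) | (a1 & b0 & (inv(b1) | inv(a0)))
    p2 = a1 & b1 & (inv(a0) | inv(b0))
    p3 = a1 & a0 & b1 & b0
    return p3, p2, p1, p0

def check_truth_table():
    """Return True if the equations match A*B for every 2-bit A and B."""
    for a1, a0, b1, b0 in product((0, 1), repeat=4):
        p3, p2, p1, p0 = mul2x2(a1, a0, b1, b0)
        value = 8 * p3 + 4 * p2 + 2 * p1 + p0
        if value != (2 * a1 + a0) * (2 * b1 + b0):
            return False
    return True
```

Calling `check_truth_table()` exercises every row of Table I, so it doubles as a regression check if the equations are later re-minimized.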
III. DESIGN OF A 4×4 MULTIPLIER
The first step in the design of the 4-bit multiplier is to find the combinations of input bit pairs that can be expressed in terms of the 2×2 multiplier. Each input bit-pair is handled by a separate 2×2 combinational multiplier, producing 4 partial product rows. The design procedure for the 4×4 combinational multiplier is shown in Table II, while Fig. 1 shows the schematic of a 4×4 combinational multiplier built from 2×2 combinational multipliers. The partial product rows are then added optimally using 5-bit full adder cells to generate the final product bits.
TABLE II. DESIGN OF A 4-BIT MULTIPLIER USING 2×2 COMBINATIONAL MULTIPLIER
Operand A:  A3 A2 = Group II,  A1 A0 = Group I
Operand B:  B3 B2 = Group IV,  B1 B0 = Group III
Pair        Partial product rows
I × III                             PP3   PP2   PP1   PP0
II × III                PP7   PP6   PP5   PP4
I × IV                  PP11  PP10  PP9   PP8
II × IV     PP15  PP14  PP13  PP12
Sum         P7    P6    P5    P4    P3    P2    P1    P0
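
The composition in Table II can be sketched behaviourally in Python. This is only an illustration under stated assumptions: the 2×2 block is modelled as an integer product of two 2-bit values, and the partial product rows are summed with ordinary integer addition rather than the 5-bit full adder cells used in the actual circuit. The names `mul2x2` and `mul4x4` are hypothetical.

```python
def mul2x2(a, b):
    """Stand-in for the 2x2 block: product of two 2-bit values (0..3)."""
    assert 0 <= a < 4 and 0 <= b < 4
    return a * b

def mul4x4(a, b):
    """4x4 unsigned product assembled from four 2x2 partial product rows."""
    a_lo, a_hi = a & 0b11, (a >> 2) & 0b11  # Group I, Group II
    b_lo, b_hi = b & 0b11, (b >> 2) & 0b11  # Group III, Group IV
    row0 = mul2x2(a_lo, b_lo)  # PP3..PP0,   weight 2^0
    row1 = mul2x2(a_hi, b_lo)  # PP7..PP4,   weight 2^2
    row2 = mul2x2(a_lo, b_hi)  # PP11..PP8,  weight 2^2
    row3 = mul2x2(a_hi, b_hi)  # PP15..PP12, weight 2^4
    return row0 + (row1 << 2) + (row2 << 2) + (row3 << 4)
```

For example, `mul4x4(13, 11)` splits 13 into groups (3, 1) and 11 into (2, 3) and recombines the four 2×2 products into 143.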
IV. DESIGN OF AN 8×8 MULTIPLIER AND 16×16 MULTIPLIER
In the design of the 8×8 multiplier, the first step is to find the combinations of input bit pairs that can be expressed in terms of the 4×4 multiplier. Each input bit-pair is handled by a separate 4×4 combinational multiplier, which has already been designed using 2×2 multipliers as explained in Section III. These separate 4×4 combinational multipliers produce 4 partial product rows, which are then added optimally to generate the final product bits of the 8×8 multiplier, as shown in Fig. 2. Similarly, in the design of the 16×16 multiplier the first step is to find the combinations of input bit pairs that can be expressed in terms of the 8×8 multiplier. Each input bit-pair is handled by a separate 8×8 combinational multiplier (whose design has already been discussed) to produce 4 partial product rows. These partial product rows are then added optimally to generate the final product bits of the 16×16 multiplier, as shown in Fig. 3.
At each level of the hierarchy, only four partial product rows have to be handled. Accumulation of the partial product rows at each level is therefore much simpler than in other multiplier architectures.
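
The same four-row composition applies at every level, which the recursive Python sketch below tries to make explicit. It is a behavioural model only, assuming the operand width n is a power of two and using integer addition in place of the adder cells; `hier_mul` and `count_2x2_blocks` are illustrative names, not from the paper.

```python
def hier_mul(a, b, n):
    """Unsigned n x n product built from four (n/2) x (n/2) products.

    n must be a power of two with n >= 2; the recursion bottoms out
    at the 2x2 block.
    """
    if n == 2:
        return (a & 3) * (b & 3)  # stand-in for the 2x2 block
    h = n // 2
    mask = (1 << h) - 1
    a_lo, a_hi = a & mask, (a >> h) & mask
    b_lo, b_hi = b & mask, (b >> h) & mask
    ll = hier_mul(a_lo, b_lo, h)  # weight 2^0
    hl = hier_mul(a_hi, b_lo, h)  # weight 2^h
    lh = hier_mul(a_lo, b_hi, h)  # weight 2^h
    hh = hier_mul(a_hi, b_hi, h)  # weight 2^n
    # Only four partial product rows are accumulated at this level.
    return ll + ((hl + lh) << h) + (hh << n)

def count_2x2_blocks(n):
    """Number of 2x2 blocks in an n x n hierarchical multiplier."""
    return 1 if n == 2 else 4 * count_2x2_blocks(n // 2)
```

Under this model a 16×16 multiplier has three levels of hierarchy above the base block and contains 4³ = 64 instances of the 2×2 multiplier.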
Figure 1. A 4×4 combinational multiplier
Figure 2. An 8×8 combinational multiplier
Figure 3. A 16×16 combinational multiplier