
IMPACT-2009

A High-Speed, Hierarchical 16×16 Array of Array Multiplier Design

Abhijit Asati¹ and Chandrashekhar²

¹EEE Group, BITS, Pilani, India, [email protected]
²CEERI, Pilani, India, [email protected]

Abstract: Array multipliers are preferred for smaller operand sizes because of their simpler VLSI implementation, in spite of their linear time complexity. Tree multipliers have a time complexity of O(log n) but are less suitable for VLSI implementation: being less regular, they require a larger total routing length, which may degrade their performance. Hybrid architectures, called array of array multipliers, offer intermediate performance. Their time complexity is better than that of array multipliers, which makes them an obvious choice for higher-performance multiplier designs of moderate operand sizes. In this paper a 16×16 unsigned array of array multiplier circuit is designed with a hierarchical structure and implemented using conventional CMOS logic in the 0.6 µm N-well CMOS process (SCN_SUBM, lambda = 0.3) of MOSIS. The proposed multiplier implementation shows a large reduction in propagation delay and average power consumption (at 20 MHz) compared to the 16-bit Booth-encoded Wallace tree multiplier of F. Jalil [3]. The total transistor count, maximum instantaneous power, leakage power, core area, total routing length and number of vias are also presented.

I. INTRODUCTION

The multiplier is a fundamental building block of both standard digital signal processors and ASIC digital signal processors used for digital signal processing. Multiplication is used in many neural computing and DSP applications, such as instrumentation and measurement, communications, audio and video processing, graphics, image enhancement, 3-D rendering, navigation, radar, GPS, and control applications like robotics, machine vision and guidance. It is mainly used to implement algorithms such as filtering (FIR and IIR), frequency-time transformations (FFT) and correlation. Most DSP tasks require real-time processing, so the multiplier must perform them quickly while minimizing cost and power. Multiplication algorithms differ in how they generate the partial products and how they add them [1].
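To make "partial product generation" and "partial product addition" concrete, the sketch below models a plain unsigned array multiplier at the bit level. Each of the n rows is the multiplicand ANDed with one multiplier bit and shifted, so the rows need n AND gates each (n² in total), and the rows are then summed one after another. This is a behavioural sketch under the assumption of unsigned n-bit operands; the function names are hypothetical and the model says nothing about gate delay or layout.

```python
def partial_product_rows(a: int, b: int, n: int) -> list[int]:
    """Return the n shifted partial-product rows of an unsigned n x n array multiplier.

    Row i is multiplicand `a` ANDed bit-by-bit with multiplier bit b_i and
    shifted left by i positions (n AND gates per row, n * n in total).
    """
    rows = []
    for i in range(n):
        b_i = (b >> i) & 1
        row = 0
        for j in range(n):
            a_j = (a >> j) & 1
            row |= (a_j & b_i) << (i + j)
        rows.append(row)
    return rows


def array_multiply(a: int, b: int, n: int) -> int:
    """Accumulate the partial-product rows one by one, as an array multiplier does row by row."""
    acc = 0
    for row in partial_product_rows(a, b, n):
        acc += row
    return acc


if __name__ == "__main__":
    rows = partial_product_rows(0b1011, 0b0110, 4)
    print(len(rows), array_multiply(0b1011, 0b0110, 4))  # 4 66  (11 * 6 = 66)
```

The row-by-row accumulation is what gives the array multiplier its linear delay: each row must be added before the next can complete.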

The array multiplier has linear time complexity, O(n), so its delay grows for multipliers with larger operand sizes. It also has a poor space complexity of O(n²), since it requires approximately n² cells to produce the product. Therefore, as the operand size grows, the circuit takes more area and power [2], [5], [6]. Radix-m Booth encoding, where m = 2^k, reduces the number of partial product rows by a factor of k. Booth radix-4 (m = 4 = 2²) encoding therefore reduces the number of partial product rows by a factor of two [3]. Since the number of partial product rows is halved, the hardware required to generate the partial products is reduced to about n²/2 cells [2]. In Wallace tree multipliers the ripple effect is reduced, so they produce the product in far less time. The time complexity falls to O(log n), but a larger routing area is required than for regular array multipliers, which makes them less suitable for VLSI implementation [2]. The hardware reduction of Booth encoding can be combined with accelerated Wallace tree accumulation of the partial products to obtain a time complexity of O(log n); such designs are well suited to multipliers with large operand sizes [2], [3]. In the sub-micron and deep sub-micron era, for multipliers of moderate operand sizes, tree-based architectures may lose performance because of their longer routing lengths, and some hybrid architectures perform better, since gate-level analysis shows them to have moderate area and delay. These hybrid architectures have moderate area requirements and a time complexity of O(√N) [4].

In this paper we present a hierarchical implementation of a 16×16 multiplier design using the array of array technique. The VLSI implementation of the multiplier circuit uses the 0.6 µm N-well CMOS process (SCN_SUBM, lambda = 0.3) of MOSIS with conventional CMOS logic. Simulation results are compared with the Booth-encoded Wallace tree multiplier of [3]. Section II explains the design of a 2×2 multiplier, Section III describes the hierarchical design of a 4×4 multiplier, and Section IV describes the hierarchical design of the 8×8 and 16×16 multipliers. Physical implementation and results are described in Section V. Section VI concludes the paper.
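The claim that radix-4 Booth encoding halves the number of partial product rows can be illustrated with a short behavioural sketch of standard modified (radix-4) Booth recoding. It assumes an n-bit two's-complement multiplier with n even; the function names are hypothetical, and this is a textbook recoding, not a model of the specific circuit in [3]. Each overlapping bit triplet becomes one digit in {-2, -1, 0, 1, 2}, so each partial product row is 0, ±x or ±2x.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Radix-4 (modified) Booth recoding of an n-bit two's-complement value y, n even.

    Returns n // 2 digits d_k in {-2, -1, 0, 1, 2}, least significant first,
    such that y == sum(d_k * 4**k).
    """
    assert n % 2 == 0, "this sketch assumes an even operand width"
    bits = y & ((1 << n) - 1)  # n-bit two's-complement bit pattern of y
    digits = []
    prev = 0                   # implicit bit y_{-1} = 0
    for k in range(0, n, 2):
        lo = (bits >> k) & 1
        hi = (bits >> (k + 1)) & 1
        # digit from the overlapping triplet (y_{k+1}, y_k, y_{k-1})
        digits.append(-2 * hi + lo + prev)
        prev = hi
    return digits


def booth_multiply(x: int, y: int, n: int) -> int:
    """Multiply x by the n-bit two's-complement value y with one row per Booth digit."""
    # each row is 0, +/-x or +/-2x, weighted by 4**k (a left shift by 2k bits)
    rows = [d * x * 4**k for k, d in enumerate(booth_radix4_digits(y, n))]
    return sum(rows)


if __name__ == "__main__":
    print(len(booth_radix4_digits(-12345, 16)), booth_multiply(1000, -12345, 16))
    # 8 -12345000  (8 rows instead of 16 for a 16-bit multiplier)
```

For an unsigned operand the value must first be zero-extended (for example to n + 2 bits, keeping the width even), which adds one more digit: a 16-bit unsigned multiplier then needs 9 rows rather than 8.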

978-1-4244-3604-0/09/$25.00 © 2009 IEEE

II. DESIGN OF A 2×2 MULTIPLIER

In this architecture the 2×2 unsigned multiplier is used as the basic building block in the hierarchical design of a multiplier with a larger bit size. The truth table for a 2×2 combinational multiplier is shown in Table I. The truth table can be solved using a K-map, which yields equation (1). A 2×2 combinational circuit can be realized using these equations.

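$$
\begin{aligned}
P_0 &= A_0 B_0\\
P_1 &= A_1 B_0\,(\overline{A_0} + \overline{B_1}) + A_0 B_1\,(\overline{A_1} + \overline{B_0})\\
P_2 &= A_1 B_1\,(\overline{A_0} + \overline{B_0})\\
P_3 &= A_1 A_0 B_1 B_0
\end{aligned}
\tag{1}
$$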

TABLE I. TRUTH TABLE OF 2×2 MULTIPLIER

A1 A0 B1 B0 | P3 P2 P1 P0
 0  0  0  0 |  0  0  0  0
 0  0  0  1 |  0  0  0  0
 0  0  1  0 |  0  0  0  0
 0  0  1  1 |  0  0  0  0
 0  1  0  0 |  0  0  0  0
 0  1  0  1 |  0  0  0  1
 0  1  1  0 |  0  0  1  0
 0  1  1  1 |  0  0  1  1
 1  0  0  0 |  0  0  0  0
 1  0  0  1 |  0  0  1  0
 1  0  1  0 |  0  1  0  0
 1  0  1  1 |  0  1  1  0
 1  1  0  0 |  0  0  0  0
 1  1  0  1 |  0  0  1  1
 1  1  1  0 |  0  1  1  0
 1  1  1  1 |  1  0  0  1
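Equation (1) can be cross-checked against Table I by evaluating the sum-of-products expressions for all 16 input combinations and comparing with ordinary integer multiplication. The sketch below assumes bits are modelled as 0/1 integers and complements as 1 - x; the function names are hypothetical.

```python
from itertools import product


def mul2x2(a1: int, a0: int, b1: int, b0: int) -> tuple[int, int, int, int]:
    """Gate-level 2x2 unsigned multiplier following equation (1); returns (P3, P2, P1, P0)."""
    na1, na0, nb1, nb0 = 1 - a1, 1 - a0, 1 - b1, 1 - b0  # complemented inputs
    p0 = a0 & b0
    p1 = (a1 & b0 & (na0 | nb1)) | (a0 & b1 & (na1 | nb0))
    p2 = a1 & b1 & (na0 | nb0)
    p3 = a1 & a0 & b1 & b0
    return p3, p2, p1, p0


def check_truth_table() -> int:
    """Compare equation (1) with integer multiplication for all 16 rows of Table I."""
    rows_checked = 0
    # product() yields (A1, A0, B1, B0) in the same order as the rows of Table I
    for a1, a0, b1, b0 in product((0, 1), repeat=4):
        p3, p2, p1, p0 = mul2x2(a1, a0, b1, b0)
        value = (p3 << 3) | (p2 << 2) | (p1 << 1) | p0
        assert value == (2 * a1 + a0) * (2 * b1 + b0)
        rows_checked += 1
    return rows_checked


if __name__ == "__main__":
    print(check_truth_table())  # 16
```

Note that P1 is simply A1B0 XOR A0B1 written in sum-of-products form, which is why complemented inputs appear in equation (1).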

III. DESIGN OF A 4×4 MULTIPLIER

The first step in the design of the 4-bit multiplier is to find the different combinations of input bit pairs, expressed in terms of the 2×2 multiplier. Each input bit-pair is handled by a separate 2×2 combinational multiplier to produce 4 partial product rows. The design procedure for the 4×4 combinational multiplier is shown in Table II, while Fig. 1 shows the schematic of a 4×4 combinational multiplier designed using 2×2 combinational multipliers. The partial product rows are then added optimally, using 5-bit full adder cells, to generate the final product bits.

TABLE II. DESIGN OF A 4-BIT MULTIPLIER USING 2×2 COMBINATIONAL MULTIPLIER

Operand A: A3 A2 (Group II), A1 A0 (Group I)
Operand B: B3 B2 (Group IV), B1 B0 (Group III)

Pair      | Partial product rows (aligned by bit position)
I × III   |                     PP3  PP2  PP1  PP0
II × III  |           PP7  PP6  PP5  PP4
I × IV    |           PP11 PP10 PP9  PP8
II × IV   | PP15 PP14 PP13 PP12
Sum       | P7   P6   P5   P4   P3   P2   P1   P0
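The pairing in Table II can be expressed as a small behavioural model: split each 4-bit operand into its high and low bit-pairs, form the four 2×2 products, and weight each 4-bit partial product row by the bit position of its operand pair before summing. In this sketch the 2×2 block is assumed to be a behavioural stand-in (a * b) for the gate-level circuit of equation (1), and the final addition uses Python integers rather than the full adder cells of the actual design; the function names are hypothetical.

```python
def mul2x2(a: int, b: int) -> int:
    """2x2 unsigned multiplier block (behavioural stand-in for the gate-level circuit)."""
    assert 0 <= a < 4 and 0 <= b < 4
    return a * b


def mul4x4(a: int, b: int) -> int:
    """4x4 unsigned multiplier built from four 2x2 blocks, following the pairing of Table II."""
    assert 0 <= a < 16 and 0 <= b < 16
    a_lo, a_hi = a & 0b11, a >> 2  # Group I = A1 A0, Group II = A3 A2
    b_lo, b_hi = b & 0b11, b >> 2  # Group III = B1 B0, Group IV = B3 B2
    pp_i_iii = mul2x2(a_lo, b_lo)   # I  x III -> PP3..PP0,   weight 2^0
    pp_ii_iii = mul2x2(a_hi, b_lo)  # II x III -> PP7..PP4,   weight 2^2
    pp_i_iv = mul2x2(a_lo, b_hi)    # I  x IV  -> PP11..PP8,  weight 2^2
    pp_ii_iv = mul2x2(a_hi, b_hi)   # II x IV  -> PP15..PP12, weight 2^4
    return pp_i_iii + (pp_ii_iii << 2) + (pp_i_iv << 2) + (pp_ii_iv << 4)


if __name__ == "__main__":
    print(mul4x4(15, 15))  # 225
```

Because the two middle rows share the same weight, only four rows at three distinct offsets have to be added, which keeps the adder structure at this level small.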

IV. DESIGN OF AN 8×8 MULTIPLIER AND A 16×16 MULTIPLIER

In the design of the 8×8 multiplier, the first step is to find the different combinations of input bit pairs, expressed in terms of the 4×4 multiplier. Each input bit-pair is handled by a separate 4×4 combinational multiplier, which has already been designed using 2×2 multipliers as explained in Section III. These four 4×4 combinational multipliers produce 4 partial product rows, which are then added optimally to generate the final product bits of the 8×8 multiplier, as shown in Fig. 2. Similarly, in the design of the 16×16 multiplier, the first step is to find the different combinations of input bit pairs, expressed in terms of the 8×8 multiplier. Each input bit-pair is handled by a separate 8×8 combinational multiplier (designed as described above) to produce 4 partial product rows. These partial product rows are then added optimally to generate the final product bits of the 16×16 multiplier, as shown in Fig. 3.

At each level of the hierarchy only four partial product rows have to be handled. Accumulating the partial product rows at each level is therefore much simpler than in other multiplier architectures.
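The same splitting step applies at every level, so the whole 2×2 to 4×4 to 8×8 to 16×16 hierarchy can be sketched recursively: an n×n product is built from four (n/2)×(n/2) products, giving exactly four partial product rows per level. The sketch below assumes unsigned operands and n a power of two; it uses Python integer addition in place of the optimized adders of the real design, and the function name is hypothetical. It also counts how many 2×2 leaf multipliers the hierarchy instantiates.

```python
def hier_multiply(a: int, b: int, n: int) -> tuple[int, int]:
    """Unsigned n x n product via the array-of-array hierarchy (n a power of two, n >= 2).

    Returns (product, number of 2x2 leaf multipliers used).
    """
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)
    if n == 2:
        return a * b, 1  # leaf: one 2x2 combinational multiplier
    h = n // 2
    mask = (1 << h) - 1
    a_lo, a_hi = a & mask, a >> h
    b_lo, b_hi = b & mask, b >> h
    # four (n/2) x (n/2) multipliers -> four partial product rows at this level
    ll, c0 = hier_multiply(a_lo, b_lo, h)
    hl, c1 = hier_multiply(a_hi, b_lo, h)
    lh, c2 = hier_multiply(a_lo, b_hi, h)
    hh, c3 = hier_multiply(a_hi, b_hi, h)
    # weights 2^0, 2^h, 2^h and 2^n, as in the 4x4 case one level down
    product = ll + ((hl + lh) << h) + (hh << n)
    return product, c0 + c1 + c2 + c3


if __name__ == "__main__":
    print(hier_multiply(0xFFFF, 0xFFFF, 16))  # (4294836225, 64)
```

Under these assumptions a 16×16 multiplier uses 4³ = 64 leaf 2×2 blocks, and the recursion depth (three levels above the 2×2 leaves) is what the hierarchical accumulation has to traverse.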

Figure 1. A 4×4 combinational multiplier
