
    IMPACT-2009

    A High-Speed, Hierarchical 16×16 Array of Array Multiplier Design

    Abhijit Asati¹ and Chandrashekhar²

    ¹EEE Group, BITS, Pilani, India; ²CEERI, Pilani, India

    Abstract. Array multipliers are preferred for small operand sizes because of their simpler VLSI implementation, in spite of their linear time complexity. Tree multipliers have a time complexity of O(log n) but are less suitable for VLSI implementation: being less regular, they require a larger total routing length, which may degrade their performance. Hybrid architectures called array of array multipliers offer intermediate performance. Their time complexity is better than that of array multipliers, which makes them an obvious choice for high-performance multipliers of moderate operand size. In this paper, a 16×16 unsigned array of array multiplier circuit is designed with a hierarchical structure and implemented in conventional CMOS logic using the MOSIS 0.6 µm N-well CMOS process (SCN_SUBM, lambda=0.3). Compared with the 16-bit Booth-encoded Wallace tree multiplier of F. Jalil [3], the proposed implementation shows a large reduction in propagation delay and in average power consumption (at 20 MHz). The total transistor count, maximum instantaneous power, leakage power, core area, total routing length and number of vias are also presented.

    I. INTRODUCTION

    The multiplier is a fundamental building block of both standard digital signal processors and ASIC digital signal processors. Multiplication is used in many neural computing and DSP applications: instrumentation and measurement, communications, audio and video processing, graphics, image enhancement, 3-D rendering, navigation, radar, GPS, and control applications such as robotics, machine vision and guidance. It is mainly used to implement algorithms such as frequency-domain filtering (FIR and IIR), frequency-time transformations (FFT) and correlation. Most DSP tasks require real-time processing, so multiplication must be performed quickly while minimizing cost and power. Multiplication algorithms differ in how they generate and add partial products [1].

    The array multiplier has linear time complexity, O(n), so its delay grows with operand size. It also has poor space complexity, O(n²), since it requires approximately n² cells to form the product; as the operand size grows, the circuit therefore occupies more area and consumes more power [2], [5], [6]. Radix-m Booth encoding, where m = 2ᵏ, reduces the number of partial-product rows by a factor of k. Radix-4 Booth encoding (m = 4 = 2²) therefore halves the number of partial-product rows [3], which reduces the hardware required to generate the partial products to about n²/2 cells [2]. Wallace tree multipliers reduce the ripple effect and therefore produce the product in far less time: their time complexity is O(log n). However, they require a larger routing area than regular array multipliers, which makes them less suitable for VLSI implementation [2]. The hardware reduction of Booth encoding can be combined with accelerated Wallace tree accumulation of the partial products to obtain O(log n) time complexity, which is well suited to multipliers with large operand sizes [2], [3].

    In the sub-micron and deep sub-micron era, tree-based architectures for multipliers of moderate operand size may lose performance because of their longer routing lengths. Some hybrid architectures perform better there, since gate-level analysis shows that they have moderate area and delay. These architectures have moderate area requirements and a time complexity of O(√N) [4].

    In this paper we present a hierarchical implementation of a 16×16 multiplier using the array of array technique. The multiplier circuit is implemented in conventional CMOS logic using the MOSIS 0.6 µm N-well CMOS process (SCN_SUBM, lambda=0.3), and the simulation results are compared with the Booth-encoded Wallace tree multiplier of [3]. Section II explains the design of a 2×2 multiplier, Section III describes the hierarchical design of a 4×4 multiplier, and Section IV describes the hierarchical design of the 8×8 and 16×16 multipliers. The physical implementation and results are described in Section V, and Section VI concludes the paper.
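The Booth argument in the introduction can be made concrete with a short Python sketch of radix-4 (modified) Booth recoding. It assumes two's-complement operands and an even width n; the function names are illustrative and are not taken from [3]. The sketch only shows why the number of partial-product rows falls from n to n/2 (each row is 0, ±x or ±2x, shifted by two bit positions per digit); it does not model the Wallace tree accumulation.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Radix-4 Booth recoding of the n-bit two's-complement pattern of y (n even).

    Returns n // 2 digits in {-2, -1, 0, 1, 2}, least significant first, with
    sum(d * 4**i) equal to the signed value of y.
    """
    if n % 2:
        raise ValueError("n must be even")
    # Python's >> on negative ints sign-extends, so this yields two's-complement bits.
    bits = [(y >> i) & 1 for i in range(n)]
    digits, prev = [], 0  # prev holds y_{2i-1}; y_{-1} is taken as 0
    for i in range(0, n, 2):
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
        prev = bits[i + 1]
    return digits


def booth_multiply(x: int, y: int, n: int) -> tuple[int, int]:
    """Multiply signed n-bit x and y from radix-4 Booth partial-product rows.

    Returns (product, number_of_partial_product_rows).
    """
    digits = booth_radix4_digits(y, n)
    # Each row selects 0, +/-x or +/-2x and is weighted by 4**i (a shift by 2*i bits).
    rows = [(d * x) << (2 * i) for i, d in enumerate(digits)]
    return sum(rows), len(rows)
```

For example, `booth_multiply(-7, 13, 8)` returns `(-91, 4)`: four partial-product rows instead of the eight that an 8-bit array multiplier would form.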

    II. DESIGN OF A 2×2 MULTIPLIER

    In this architecture, the 2×2 unsigned multiplier is the basic building block of the hierarchical design of larger multipliers. The truth table of a 2×2 combinational multiplier is shown in Table I. Solving it with Karnaugh maps yields equations (1), from which the 2×2 combinational circuit can be realized directly.

    978-1-4244-3604-0/09/$25.00 ©2009 IEEE 161

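    $$
    \begin{aligned}
    P_3 &= A_1 A_0 B_1 B_0 \\
    P_2 &= A_1 B_1 (\bar{A}_0 + \bar{B}_0) \\
    P_1 &= A_0 B_1 (\bar{A}_1 + \bar{B}_0) + A_1 B_0 (\bar{A}_0 + \bar{B}_1) \\
    P_0 &= A_0 B_0
    \end{aligned} \tag{1}
    $$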

    TABLE I. TRUTH TABLE OF 2×2 MULTIPLIER

    A1 A0 B1 B0 | P3 P2 P1 P0
     0  0  0  0 |  0  0  0  0
     0  0  0  1 |  0  0  0  0
     0  0  1  0 |  0  0  0  0
     0  0  1  1 |  0  0  0  0
     0  1  0  0 |  0  0  0  0
     0  1  0  1 |  0  0  0  1
     0  1  1  0 |  0  0  1  0
     0  1  1  1 |  0  0  1  1
     1  0  0  0 |  0  0  0  0
     1  0  0  1 |  0  0  1  0
     1  0  1  0 |  0  1  0  0
     1  0  1  1 |  0  1  1  0
     1  1  0  0 |  0  0  0  0
     1  1  0  1 |  0  0  1  1
     1  1  1  0 |  0  1  1  0
     1  1  1  1 |  1  0  0  1
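As a quick check that equations (1) and Table I agree, the Python sketch below evaluates the equations for all sixteen input combinations, compares each result with ordinary integer multiplication, and regenerates the table rows in the same order. The function names are illustrative rather than taken from the paper, and the code models only the Boolean logic, not the CMOS circuit.

```python
from itertools import product


def mul2x2_bits(a1: int, a0: int, b1: int, b0: int) -> tuple[int, int, int, int]:
    """Evaluate equations (1); inputs and outputs are single bits (0 or 1)."""
    na1, na0, nb1, nb0 = a1 ^ 1, a0 ^ 1, b1 ^ 1, b0 ^ 1  # complemented inputs
    p3 = a1 & a0 & b1 & b0
    p2 = a1 & b1 & (na0 | nb0)
    p1 = (a0 & b1 & (na1 | nb0)) | (a1 & b0 & (na0 | nb1))
    p0 = a0 & b0
    return p3, p2, p1, p0


def truth_table() -> list[tuple[int, ...]]:
    """Rows of Table I in the same order (A1 A0 B1 B0 counting up in binary)."""
    rows = []
    for a1, a0, b1, b0 in product((0, 1), repeat=4):
        p3, p2, p1, p0 = mul2x2_bits(a1, a0, b1, b0)
        # Cross-check the Boolean equations against integer multiplication.
        assert 8 * p3 + 4 * p2 + 2 * p1 + p0 == (2 * a1 + a0) * (2 * b1 + b0)
        rows.append((a1, a0, b1, b0, p3, p2, p1, p0))
    return rows


if __name__ == "__main__":
    for row in truth_table():
        print(*row)
```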

    III. DESIGN OF A 4×4 MULTIPLIER

    The first step in designing the 4-bit multiplier is to split each operand into bit pairs and find all combinations of an A pair with a B pair, so that each combination is a 2×2 multiplication. Each bit-pair combination is handled by a separate 2×2 combinational multiplier, producing four partial-product rows. The design procedure for the 4×4 combinational multiplier is shown in Table II, and Fig. 1 shows its schematic built from 2×2 combinational multipliers. The partial-product rows are then added optimally using 5-bit full adder cells to generate the final product bits.

    TABLE II. DESIGN OF A 4-BIT MULTIPLIER USING 2×2 COMBINATIONAL MULTIPLIERS

    Operand A:  A3 A2 (Group II)   A1 A0 (Group I)
    Operand B:  B3 B2 (Group IV)   B1 B0 (Group III)

    Pair        Partial-product rows (aligned by bit weight)
    I × III                         PP3  PP2  PP1  PP0
    II × III              PP7  PP6  PP5  PP4
    I × IV                PP11 PP10 PP9  PP8
    II × IV     PP15 PP14 PP13 PP12
    Sum         P7   P6   P5   P4   P3   P2   P1   P0
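The grouping in Table II amounts to the identity A·B = A_I·B_III + 2²(A_II·B_III + A_I·B_IV) + 2⁴(A_II·B_IV), where A_I = A1A0, A_II = A3A2, B_III = B1B0 and B_IV = B3B2 are the 2-bit groups. A minimal behavioural sketch in Python, assuming a 2×2 block that implements equations (1); plain integer addition stands in for the full-adder network of Fig. 1, and the function names are hypothetical.

```python
def mul2x2(a: int, b: int) -> int:
    """2x2 unsigned block: equations (1) applied to 2-bit operands a and b."""
    a1, a0 = (a >> 1) & 1, a & 1
    b1, b0 = (b >> 1) & 1, b & 1
    p3 = a1 & a0 & b1 & b0
    p2 = a1 & b1 & ((a0 ^ 1) | (b0 ^ 1))
    p1 = (a0 & b1 & ((a1 ^ 1) | (b0 ^ 1))) | (a1 & b0 & ((a0 ^ 1) | (b1 ^ 1)))
    p0 = a0 & b0
    return (p3 << 3) | (p2 << 2) | (p1 << 1) | p0


def mul4x4(a: int, b: int) -> int:
    """4x4 unsigned multiplier following Table II: four 2x2 blocks, shifted and summed."""
    a_grp1, a_grp2 = a & 0b11, (a >> 2) & 0b11   # Group I = A1A0, Group II = A3A2
    b_grp3, b_grp4 = b & 0b11, (b >> 2) & 0b11   # Group III = B1B0, Group IV = B3B2
    pp_1_3 = mul2x2(a_grp1, b_grp3)   # PP3..PP0,   weight 2^0
    pp_2_3 = mul2x2(a_grp2, b_grp3)   # PP7..PP4,   weight 2^2
    pp_1_4 = mul2x2(a_grp1, b_grp4)   # PP11..PP8,  weight 2^2
    pp_2_4 = mul2x2(a_grp2, b_grp4)   # PP15..PP12, weight 2^4
    # Integer addition stands in for the adder cells that sum the four rows.
    return pp_1_3 + ((pp_2_3 + pp_1_4) << 2) + (pp_2_4 << 4)
```

Checking `mul4x4(a, b) == a * b` over all 256 input pairs confirms that the four shifted rows reproduce the full 8-bit product.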

    IV. DESIGN OF AN 8×8 MULTIPLIER AND A 16×16 MULTIPLIER

    In the design of the 8×8 multiplier, the first step is to split each operand into 4-bit groups and find all combinations of an A group with a B group, so that each combination is a 4×4 multiplication. Each combination is handled by a separate 4×4 combinational multiplier, itself built from 2×2 multipliers as explained in Section III. These four 4×4 combinational multipliers produce four partial-product rows, which are then added optimally to generate the final product bits of the 8×8 multiplier, as shown in Fig. 2. Similarly, the 16×16 multiplier splits each operand into 8-bit groups, and each combination is handled by a separate 8×8 combinational multiplier (designed as described above) to produce four partial-product rows. These rows are then added optimally to generate the final product bits of the 16×16 multiplier, as shown in Fig. 3.

    At each level of the hierarchy, only four partial-product rows have to be accumulated. Accumulating the partial products at each level is therefore much simpler than in other multiplier architectures.
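The 8×8 and 16×16 constructions apply the same split recursively, so the whole hierarchy can be sketched as one recursive function. This is a behavioural Python model under stated assumptions: the names are hypothetical, the 2×2 leaf implements equations (1), and integer addition stands in for the adder network at each level. The leaf-block count is derived from the recursion ((n/2)² blocks for an n×n multiplier) and is not a figure reported in the paper.

```python
def mul2x2(a: int, b: int) -> int:
    """2x2 unsigned leaf block: equations (1) applied to 2-bit operands."""
    a1, a0 = (a >> 1) & 1, a & 1
    b1, b0 = (b >> 1) & 1, b & 1
    p3 = a1 & a0 & b1 & b0
    p2 = a1 & b1 & ((a0 ^ 1) | (b0 ^ 1))
    p1 = (a0 & b1 & ((a1 ^ 1) | (b0 ^ 1))) | (a1 & b0 & ((a0 ^ 1) | (b1 ^ 1)))
    p0 = a0 & b0
    return (p3 << 3) | (p2 << 2) | (p1 << 1) | p0


def mul_hier(a: int, b: int, n: int) -> int:
    """Hierarchical n x n unsigned multiplier, n a power of two (n >= 2).

    Each level splits both operands into low/high halves, forms four
    (n/2) x (n/2) products and adds the four shifted partial-product rows.
    """
    if n == 2:
        return mul2x2(a, b)
    h = n // 2
    mask = (1 << h) - 1
    a_lo, a_hi = a & mask, (a >> h) & mask
    b_lo, b_hi = b & mask, (b >> h) & mask
    rows = [
        mul_hier(a_lo, b_lo, h),              # weight 2^0
        mul_hier(a_hi, b_lo, h) << h,         # weight 2^h
        mul_hier(a_lo, b_hi, h) << h,         # weight 2^h
        mul_hier(a_hi, b_hi, h) << (2 * h),   # weight 2^(2h)
    ]
    # Only four rows are summed at every level of the hierarchy.
    return sum(rows)


def count_2x2_blocks(n: int) -> int:
    """Number of 2x2 leaf blocks used by an n x n hierarchical multiplier."""
    return 1 if n == 2 else 4 * count_2x2_blocks(n // 2)
```

For instance, `mul_hier(0xFFFF, 0xFFFF, 16)` returns `0xFFFE0001` (4294836225), and `count_2x2_blocks(16)` gives 64 leaf blocks.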

    Figure 1. A 4×4 combinational multiplier


    Figure 2. An 8×8 combinational multiplier

    Figure 3. A 16×16 combinational multiplier
