final report


Upload: katrenapriya

Post on 11-Sep-2014


Page 1: Final Report

ABSTRACT

In high-performance VLSI circuits, on-chip power density plays a dominant role under both static and dynamic conditions because of shrinking device features. The consumed power is usually dissipated as heat, which affects the performance and reliability of the chip. A complex multiplier is an arithmetic circuit used extensively in DSP and communication applications such as FFTs and digital filters. For fast circuit implementation, a parallel multiplier is preferred. For large bit-width multiplications, a large number of adders is required to perform the partial product addition.

       Compressors are used to compress the partial product addition stages. Higher-order compressors reduce the vertical critical paths in a parallel multiplier, resulting in a better speed-power product for the multiplier circuit. This thesis presents a novel scheme for a 16*16-bit multiplier using thirteen different types of compressors. The scheme is optimized for both low power and high speed relative to previously reported schemes. It follows a low power multiplier design methodology that counts only the number of 1’s in the partial products.


CONTENTS

1. INTRODUCTION

1.1 Introduction

1.2 Complex Number

1.2.1 Operations of Complex Numbers

1.3 Organization of Thesis

2. SURVEY OF COMPLEX MULTIPLICATION

2.1 General rule of Complex Multiplication

2.2 Cases of Multiplication

2.3 Types of Complex Multiplication

2.3.1 Complex Multiplication for Area Efficient

2.3.2 Multiplication of Complex Number using a low power parallel multiplier

2.4 Related Research

2.4.1 Braun Multiplier

2.4.2 Baugh-Wooley Multiplier

2.4.3 Multiplier using Bypassing circuitry

2.4.4 Multiplier using Adder-Subtractor Unit (ASU)

2.5 Signed Number Multiplication

2.5.1 Representation of Negative Numbers

2.5.2 Booth’s Recoding Algorithm

2.5.3 Basic Technique of Booth’s Recoding Algorithm for Radix-2 and Radix-4

3. MULTIPLIER UNIT

3.1 Partial Product Generator

3.2 Different Order Compressors

3.2.1 Adder as Counter


3.2.2 Compressor Logic

3.3 Parallel Adders

3.4 Architecture of Multiplier using Compressors

4. PROPOSED COMPLEX MULTIPLIER

4.1 Unsigned Multiplication

4.2 Signed Multiplication

4.2.1 Modified Technique Recoding Algorithm for Radix-2 and Radix-4

4.2.2 Modified Booth’s Recoding Unit

4.3 Compressors and Adders

5. RESULTS AND DISCUSSION

5.1 Behavioral Simulation

5.2 Synthesis Report

5.3 Power Calculation

5.4 Layout

6. CONCLUSION AND FUTURE WORK

6.1 Conclusion

6.2 Future work

7. REFERENCES


LIST OF FIGURES

Figure 2.1. OBC-DA based Complex Multiplier structure

Figure 2.2. 4x4 Braun Multiplier

Figure 2.3. 4*4 Bypass Multiplier

Figure 2.4. 4*4 ASU Multiplier

Figure 2.5. Adder Subtractor Unit

Figure 2.6. Smart Adder (SA)

Figure 3.1. Internal Block Diagram of 16*16 Basic Multiplier

Figure 3.2. Partial Product Generator (4 Bit)

Figure 3.3. Half Adder

Figure 3.4. Full Adder

Figure 3.5. Block Diagram of 4:3 Compressor

Figure 3.6. Block Diagram of 5:3 Compressor

Figure 3.7. Block Diagram of 6:3 Compressor

Figure 3.8. Block Diagram of 7:3 Compressor

Figure 3.9. Block Diagram of 8:4 Compressor

Figure 3.10. Block Diagram of 9:4 Compressor

Figure 3.11. Block Diagram of 10:4 Compressor

Figure 3.12. Block Diagram of 11:4 Compressor

Figure 3.13. Block Diagram of 12:4 Compressor

Figure 3.14. Block Diagram of 13:4 Compressor

Figure 3.15. Block Diagram of 14:4 Compressor

Figure 3.16. Block Diagram of 15:4 Compressor

Figure 3.17. Block Diagram of 16:5 Compressor

Figure 3.18. Block Diagram of Parallel Adder

Figure 3.19. Architecture of 8*8 Multiplier using Compressors


Figure 4.1. Block Diagram of Unsigned Complex Multiplier

Figure 4.2. Combinational Logic for intermediate sign

Figure 4.3. Combinational Circuit for output sign Real Part and Imaginary Part

Figure 4.4. Modified Complex Multiplier Block Diagram

Figure 4.5. Block Diagram of Modified Booth’s Recoding unit Multiplier

Figure 4.6. Addition scheme for Radix-2

Figure 4.7. Architecture of 8*8 Signed Multiplier for Radix-2

Figure 4.8. Addition scheme for Radix-4

Figure 4.9. Architecture of 8*8 Signed Multiplier for Radix-4

LIST OF TABLES

Table 3.1. Half Adder as a Counter

Table 3.2. Full Adder as a Counter

Table 4.1. Booth’s Recoding algorithm Radix-2

Table 4.2. Booth’s Recoding algorithm Radix-4

Table 4.3. Modified Booth’s Recoding Algorithm Radix-2

Table 4.4. Modified Booth’s Recoding Algorithm Radix-4


Chapter 1.

Introduction

The electronics industry has achieved phenomenal growth over the last two decades, mainly due to rapid advances in integration technologies and large-scale system design; in short, due to the advent of VLSI. The number of applications of integrated circuits in high-performance computing, telecommunications and consumer electronics has been rising steadily and at a very fast pace. The increasing demand for portable computing and communication electronics, as well as other applications, has necessitated longer battery life, lower weight and lower power consumption. To satisfy these requirements, research on low power/low voltage design techniques is under way. Since power is now one of the design decision variables, the expanded design space required for low power has further increased the complexity of an already non-trivial task. Low power design basically involves two concomitant tasks: power estimation and analysis, and power minimization. These tasks need to be carried out at each level of the design hierarchy, namely the behavioral, architectural, logic, circuit and physical levels [1].

In this survey of the current state of the field, many of the salient power estimation and minimization techniques proposed for low power VLSI design are reviewed. For each design level, an overview of several power estimation and minimization approaches and of the CAD tools that support them is provided. Finally, the future research issues that must be addressed to make low power design a success are discussed. In the majority of digital signal processing (DSP) applications, the critical operations are multiplication and accumulation. Real-time signal processing requires a high-speed, high-throughput multiplier unit that consumes little power, which is key to achieving a high-performance DSP system. The purpose of this work is the design and implementation of a low power multiplier unit that uses a block-enabling technique to save power [2].


1.1 Introduction

Device sizes continue to scale down in accordance with Moore's Law. The sources of energy consumption on a CMOS chip can be classified as static and dynamic power dissipation. The dominant component of energy consumption in CMOS is dynamic power, caused by the switching of the circuit. A first-order approximation of the dynamic power consumption of CMOS circuitry is given by the formula:

$P = C V^{2} f$

where P is the power, C is the effective switched capacitance, V is the supply voltage and f is the frequency of operation. This power is dissipated by charging and discharging the node capacitances found at the output of every logic gate. Power management is the careful planning of the power budget for every subsystem of a VLSI chip, which is especially important for today's complex systems. The most important and successful use of power management is to deactivate a portion of the circuit when its computation is not required [3].
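As a rough numerical illustration of this first-order formula, the Python sketch below evaluates P = C·V²·f. The function name `dynamic_power` and the sample values (10 pF, 1.2 V, 100 MHz) are hypothetical and do not come from this design. The point is only that P scales linearly with the switched capacitance C and quadratically with the supply voltage V.

```python
def dynamic_power(c_eff: float, vdd: float, freq: float) -> float:
    """First-order CMOS dynamic power P = C * V^2 * f.

    c_eff: effective switched capacitance in farads
    vdd:   supply voltage in volts
    freq:  operating frequency in hertz
    Returns the power in watts.
    """
    return c_eff * vdd ** 2 * freq


if __name__ == "__main__":
    base = dynamic_power(10e-12, 1.2, 100e6)    # hypothetical values, about 1.44 mW
    half_v = dynamic_power(10e-12, 0.6, 100e6)  # halving V quarters P
    half_c = dynamic_power(5e-12, 1.2, 100e6)   # halving C halves P
    print(f"{base * 1e3:.2f} mW, V/2 ratio {half_v / base:.2f}, C/2 ratio {half_c / base:.2f}")
```

In this formula, reducing switching activity or gate count lowers the effective C. The present work targets this term.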

Every low-to-high logic transition in a digital circuit incurs a change of voltage and draws energy from the power supply. A designer at the technology and architecture levels can try to minimize the variables in this equation to minimize the overall energy consumption. However, power minimization is often a complex process of trade-offs between speed, area and power consumption. The current work proposes reducing the dynamic switching power of a 16*16 complex multiplier by using higher-order compressors, which reduce both the switching activity and the gate count.

Multipliers spend a large amount of power and delay in the partial product addition. At this stage, most multipliers are designed with different kinds of adders that can add two or three bits, or at most four bits when 4-2 compressors are used. For higher-order multiplications, a huge number of adders or compressors is needed to perform the partial product addition. The binary counter property has been merged with the compressor property to develop higher-order compressors [3] [5].
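The counter view of a compressor can be sketched in Python. This is an illustrative behavioural model, not the gate-level compressors of this thesis. The hypothetical function `compress` treats an n:k compressor as a counter whose k = ⌈log2(n+1)⌉ outputs encode the number of 1s among its n inputs. For n = 3 this is the full adder, and it reproduces the 4:3, 7:3, 8:4, 15:4 and 16:5 sizes in the list of figures. The hypothetical `multiply_with_compressors` applies such counters column by column until every column holds at most two bits. It then adds the two remaining rows with an ordinary addition, which stands in for the final parallel adder.

```python
from collections import defaultdict


def compress(bits):
    """Counter-style n:k compressor (behavioural model).

    Returns the number of 1s among the n input bits as k = n.bit_length()
    output bits, least significant first (3:2, 4:3, 7:3, 8:4, 15:4, 16:5).
    """
    k = len(bits).bit_length()
    ones = sum(bits)
    return [(ones >> i) & 1 for i in range(k)]


def multiply_with_compressors(a, b, n):
    """Unsigned n x n multiplication by column-wise counting of partial product bits."""
    # Partial product bit a_j AND b_i has weight 2^(i + j): store it in column i + j
    columns = defaultdict(list)
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((a >> j) & 1) & ((b >> i) & 1))

    # Compress every column holding 3 or more bits; output bit i of a counter in
    # column w moves to column w + i. Repeat until no column exceeds 2 bits.
    while any(len(bits) > 2 for bits in columns.values()):
        reduced = defaultdict(list)
        for w, bits in columns.items():
            if len(bits) > 2:
                for i, bit in enumerate(compress(bits)):
                    reduced[w + i].append(bit)
            else:
                reduced[w].extend(bits)
        columns = reduced

    # Two rows remain; an ordinary addition stands in for the final parallel adder
    row0 = sum(bits[0] << w for w, bits in columns.items() if len(bits) > 0)
    row1 = sum(bits[1] << w for w, bits in columns.items() if len(bits) > 1)
    return row0 + row1


if __name__ == "__main__":
    print(compress([1, 1, 1, 1, 1, 1, 1]))         # 7:3 counter -> [1, 1, 1]
    print(multiply_with_compressors(200, 123, 8))  # 24600
```

Each pass strictly reduces the total number of bits while any column has three or more, so the loop always terminates.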

1.2 Complex Number:-

A complex number is a number comprising a real part and an imaginary part. It can be written in the form a + bi, where a and b are real numbers and i is the standard imaginary unit with the property i² = −1. To construct a complex number, we associate with each real number a second real number; a complex number is then an ordered pair of real numbers (a, b).

Complex numbers were first conceived and defined to find solutions to cubic equations. The solution of a general cubic equation in radicals (without trigonometric functions) may require intermediate calculations containing the square roots of negative numbers, even when the final solutions are real numbers. This ultimately led to the fundamental theorem of algebra, which shows that, with complex numbers, a solution exists for every polynomial equation of degree one or higher. Complex numbers thus form an algebraically closed field, in which every polynomial equation has a root.

Complex numbers are usually written in the form a + bi, where a and b are real numbers and i is the imaginary unit, which has the property i² = −1. The real number a is called the real part of the complex number, and the real number b is its imaginary part. For example, 3 + 2i is a complex number with real part 3 and imaginary part 2. If z = a + bi, the real part a is denoted by Re(z) and the imaginary part b by Im(z).

The complex numbers (C) are regarded as an extension of the real numbers (R) by considering every real number as a complex number with an imaginary part of zero: the real number a is identified with the complex number a + 0i. Complex numbers with a real part of zero (Re(z) = 0) are called imaginary numbers. Instead of writing 0 + bi, such an imaginary number is usually written simply as bi. If b equals 1, the number is written as i instead of 0 + 1i or 1i.

Two complex numbers are equal if and only if their real parts are equal and their imaginary parts are equal. In other words, if two complex numbers are written as a + bi and c + di with a, b, c and d real, they are equal if and only if a = c and b = d. [4] [5]

1.2.1 Operations of Complex Numbers:-

Complex numbers are added, subtracted, multiplied and divided by formally applying the associative, commutative and distributive laws of algebra, together with the equation i² = −1. Here, i stands for √−1 (the square root of −1); in other words, i is a number whose square is −1.

i) Addition:- (a+bi) + (c+di) = (a+c) + (b+d)i

ii) Subtraction:- (a+bi) - (c+di) = (a-c) + (b-d)i

iii) Multiplication:- (a+bi).(c+di) = (ac-bd) + (ad+bc)i

iv) Division:- (a+bi) / (c+di) = [(ac+bd) + (bc-ad)i] / (c²+d²), for c+di ≠ 0
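The four operations listed above can be checked with a small Python sketch that represents a complex number as a (real, imaginary) pair. The helper names `c_add`, `c_sub`, `c_mul` and `c_div` are hypothetical. `Fraction` keeps division exact so the c² + d² denominator stays visible, and the results can be compared with Python's built-in `complex` type.

```python
from fractions import Fraction


def c_add(x, y):
    """(a + bi) + (c + di) = (a + c) + (b + d)i"""
    return x[0] + y[0], x[1] + y[1]


def c_sub(x, y):
    """(a + bi) - (c + di) = (a - c) + (b - d)i"""
    return x[0] - y[0], x[1] - y[1]


def c_mul(x, y):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i"""
    a, b = x
    c, d = y
    return a * c - b * d, a * d + b * c


def c_div(x, y):
    """(a + bi)/(c + di) = [(ac + bd) + (bc - ad)i] / (c^2 + d^2), exact."""
    a, b = x
    c, d = y
    den = c * c + d * d
    if den == 0:
        raise ZeroDivisionError("division by 0 + 0i")
    return Fraction(a * c + b * d, den), Fraction(b * c - a * d, den)


if __name__ == "__main__":
    print(c_mul((3, 2), (1, -1)))                    # (5, -1)
    print(c_mul((3, 2), (1, -1)) == (5, -1))         # True
    print(complex(3, 2) * complex(1, -1))            # (5-1j)
```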

1.3 Organization of Thesis:-

Chapter 2, “Survey of Complex Multiplication”, explains the general rules, cases and types of complex multiplication.

Chapter 3 explains the basic “Multiplier Unit” using the compressor technique: how the partial products are generated, the compressor technique, and the parallel adder that produces the final product.

Chapter 4, “Proposed Complex Multiplier”, explains both unsigned and signed number multiplication.

Chapter 5, “Results and Discussion”, presents the behavioral simulation, synthesis and power calculation results for every multiplier.

Chapter 6, “Conclusion and Future Work”, gives the conclusions of the thesis and directions for future work.

-:References:-

[1] Virage Logic Corporation, “Power Reduction Techniques for Ultra-Low-Power Solutions”.

[2] R. M. Badghare, S. K. Mangal, R. B. Deshmukh, R. M. Patrikar (2009), “Design of Low Power Parallel Multiplier”, Journal of Low Power Electronics, Vol. 5, No. 1, April 2009, pp. 31-39.

[3] A. Dandapat, S. Ghosal, P. Sarkar, D. Mukhopadhyay (2009), “A 1.2-ns 16×16-Bit Binary Multiplier Using High Speed Compressors”, International Journal of Electrical, Computer, and Systems Engineering, 2009, pp. 234-239.

[4] J. B. Conway (1986), Functions of One Complex Variable I, Springer, ISBN 0-387-90328-3.

[5] K. Z. Pekmestzi (1989), “Complex Number Multipliers”, IEE Proceedings (Computers and Digital Technology), Vol. 136, No. 1, 1989, pp. 70-75.


Chapter 2.

Survey of Complex Multiplication

In many real-time DSP applications, high performance is a prime target.

However, achieving this may be done at the expense of area, power dissipation and

accuracy. Attempts have been made to use alternative number systems to optimize the

realization of arithmetic blocks, maintaining high performance without incurring

prohibitive area and power increases [1].

Fourier transforms play an important role in many digital signal processing applications, including speech, signal and image processing. However, direct computation of the Discrete Fourier Transform (DFT) requires on the order of N² operations, where N is the transform size. The Fast Fourier Transform (FFT) reduces this to the order of N log N operations, and parallel-pipelined FFTs are preferred for both high throughput and low power consumption.
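Every DFT or FFT term involves a complex multiplication, so the operation count drives the number of complex multipliers needed. The Python sketch below counts complex multiplications for a direct DFT and for a recursive radix-2 decimation-in-time FFT. The helpers `dft` and `fft` are hypothetical, and N is assumed to be a power of two. The direct DFT needs N² multiplications, while this FFT needs (N/2)·log2 N. That FFT count includes trivial twiddle factors such as 1, which real hardware could skip.

```python
import cmath


def dft(x):
    """Direct DFT; returns (spectrum, number of complex multiplications)."""
    n = len(x)
    out, mults = [], 0
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
            mults += 1
        out.append(acc)
    return out, mults


def fft(x):
    """Recursive radix-2 DIT FFT; returns (spectrum, complex multiplications)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x), 0
    even, m_even = fft(x[0::2])
    odd, m_odd = fft(x[1::2])
    out = [0j] * n
    mults = m_even + m_odd
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # one twiddle multiplication
        mults += 1
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out, mults


if __name__ == "__main__":
    signal = [complex(i, -i) for i in range(8)]
    print(dft(signal)[1], fft(signal)[1])   # 64 12
```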

2.1 General rule of Complex Multiplication:-

Consider two complex numbers, (a+bi) and (c+di). Then

(a+bi).(c+di) = (ac-bd) + (ad+bc)i

where (ac-bd) is the real part and (ad+bc) is the imaginary part of the complex product.

Remember that (ac–bd), the real part of the product, is the product of the real

parts minus the product of the imaginary parts, but (ad + bc), the imaginary part of the

product, is the sum of the two products of one real part and the other imaginary part.


For Z = a + bi, the non-negative real number

$|Z| = \sqrt{a^2 + b^2}$

is called the modulus of Z.

2.2 Cases of Multiplication:-

i) Multiplication of Complex Number with Real Number:-

In the above formula for multiplication, if d is zero, then you get a formula for

multiplying a complex number a+bi and a real number c together:

(a+bi).c = ac + bc i.

In other words, both parts of the complex number are multiplied by the real number. For example, multiplying (1+2i) by 3 gives:-

(1+2i).3 = 3+6i

Geometrically, doubling a complex number doubles its distance from the origin, 0. Similarly, multiplying a complex number z by 1/2 gives the point halfway between 0 and z. Multiplication by 2 can be thought of as a transformation that stretches the complex plane C by a factor of 2 away from 0, and multiplication by 1/2 as a transformation that squeezes C toward 0.

ii) Multiplication of Complex Number with Imaginary Number:-

In the above formula for multiplication, if c is zero, you get a formula for multiplying a complex number a+bi by an imaginary number di:

(a+bi).di = -bd + (ad)i.

In other words, the two parts swap places, the new real part changes sign, and both are scaled by d; geometrically, multiplication by i is a rotation by 90°. For example, multiplying (1+2i) by 3i gives:-

(1+2i).3i = -6+3i


2.3 Types of Complex Multiplication

2.3.1 Complex Multiplication for Area Efficient:-

i) Complex Multiplication using LNS [2]:-

To lower the area, i.e. to reduce the hardware cost of realizing a complex multiplier, the Logarithmic Number System (LNS) can be used, as explained below. The LNS-based complex multiplier employs a correction algorithm. A conventional complex multiplier is composed of four real multipliers, one adder and one subtractor. Attempts have been made to optimize the realization of the complex multiplier by reducing the number of multipliers and accumulating the partial products; however, the wider the input, the more partial product layers must be added to compute the result. To solve this problem, one can use the LNS to realize the multiplication, as shown in the following equations:

$X_o = AC - BD = \log^{-1}(\log A + \log C) - \log^{-1}(\log B + \log D)$

$Y_o = BC + AD = \log^{-1}(\log B + \log C) + \log^{-1}(\log A + \log D)$

The corresponding figure shows the complex multiplier block diagram, which is composed of logarithmic and anti-logarithmic converters and N-bit adders. This method can significantly reduce the hardware needed to build a multiplier.

LNS provides a simple technique to compute multiplication at the cost of reduced

precision. This approach has limited accuracy.
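Here is a minimal Python sketch of the LNS idea under stated assumptions. Each nonzero real is stored as a sign flag plus log2 of its magnitude, quantized to a hypothetical `FRAC_BITS = 6` fractional bits. A separate sign is needed because the logarithm of a negative number is undefined. Zero, which needs a special flag in a real LNS, is not modelled. Each real product becomes an addition in the log domain, and the complex product follows the X_o and Y_o equations above. The quantized logs show the limited accuracy. The exact result of (3 + 2j)(5 + 7j) is 1 + 31j, but the real part here comes out noticeably off. It is a small difference of two large, slightly rounded products.

```python
import math

FRAC_BITS = 6  # hypothetical number of fractional bits in the log-domain word


def to_lns(x):
    """Encode a nonzero real number as (sign, quantized log2|x|)."""
    if x == 0:
        raise ValueError("zero needs a special flag in LNS; not modelled here")
    return x < 0, round(math.log2(abs(x)) * (1 << FRAC_BITS))


def from_lns(negative, log_q):
    """Anti-logarithm: convert (sign, quantized log2) back to a real number."""
    magnitude = 2.0 ** (log_q / (1 << FRAC_BITS))
    return -magnitude if negative else magnitude


def lns_mul(x, y):
    """Real multiplication as an addition of logarithms; the signs are XORed."""
    sx, lx = to_lns(x)
    sy, ly = to_lns(y)
    return from_lns(sx != sy, lx + ly)


def lns_complex_mul(a, b, c, d):
    """(a + jb)(c + jd) with four LNS products, one subtraction and one addition."""
    real = lns_mul(a, c) - lns_mul(b, d)   # X_o = AC - BD
    imag = lns_mul(b, c) + lns_mul(a, d)   # Y_o = BC + AD
    return real, imag


if __name__ == "__main__":
    print(lns_complex_mul(3, 2, 5, 7))   # close to, but not exactly, (1, 31)
```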

ii) Complex Multiplier using OBC and DA [3] :-


A well-known area-efficient method to implement a complex multiplier combines Offset Binary Coding (OBC) with Distributed Arithmetic (DA). The structure of the complex multiplier using OBC-DA is shown below:-

Figure 2.1. OBC-DA based Complex Multiplier structure[3]

It is formed by the following modules:

a) Two registers that each store a W-bit word (-(cR-cI) and -(cR+cI)), whose outputs are connected to two multiplexers controlled by an XOR of the input bits.

b) Two shift-accumulators (SA) that add and shift the multiplexer outputs.

In this structure a subtraction can happen in any cycle of the computation, unlike the previous structure, where it happens only during the last cycle. The extra-bit slice is a bit-serial adder that is needed to complete the two’s complement in any cycle. Another difference is that SA2 includes hardware for loading the offset value (Ao) into the carry registers.

2.3.2 Multiplication of Complex Number using a low power parallel multiplier:-


The conventional technique of complex multiplication is given as

(A + Bj) . (C + Dj) = (AC - BD) + (AD + BC)j

It requires four multiplications and two adders (one adder and one subtractor). This technique uses a different way of realizing complex multiplication that reduces the complexity of the circuit. The canonical form of the resulting circuits makes them well suited for VLSI realization. Besides circuit reduction, the hardware or software that controls the realization of the algorithms is simplified, especially when either of these includes only complex operations, as in an FFT. Each complex bit takes four possible values; consequently, it must be represented by two bits. This representation allows algorithms for operations on complex numbers to be developed and described at the bit level. Naturally, these algorithms and the corresponding circuits are very similar to those for real numbers in two’s complement form.

Complex parallel multiplication is the most critical operation to realize. The parallel multiplier includes specialized hardware circuitry designed to perform complex multiplication at high speed. It requires significantly less die area than conventional designs, which results in reduced manufacturing cost and reduced power consumption [4].

2.4 Related Research:-

In FPGA designs, power is reduced mainly by reducing switching activity, which lowers the dynamic power. In general, dynamic power is the power consumed while the clock is running and the external inputs are switching. Switching activity can be reduced at various levels of the design flow, and architectural decisions made in the early design phases have the greatest impact. For signals with high switching activity, delay balancing and reducing the number of logic levels are among the most efficient techniques to tackle the power penalty. An obvious way to reduce switching activity is to shut down the idle parts of the circuit that are not in operation.

A general M x N parallel multiplier operates by computing the partial products in parallel and then shifting and accumulating them. Switching activity is poorly correlated with the input coefficient. In particular, power dissipation can be minimized by reducing the switching activity of the components used in the design: if the kth bit of the coefficient is zero, the kth row of adders need not be activated. However, this type of multiplier does not reduce switching, since its adders still switch unnecessarily even when the kth bit is zero.

2.4.1 Braun Multiplier[4][5] :-

Figure 2.2 4x4 Braun Multiplier

The figure above shows the structure of a 4*4 Braun multiplier. An n*n-bit Braun multiplier requires n(n-1) adders and n² AND gates. In this technique, each partial product is added to the previous sum of partial products by a row of adders. The carry-out signals are shifted one bit to the left and then added to the sum of the first adder, which adds the partial product bits. The shifting of the carry-out bits to the left is done by the carry-save adders. As carry bits are passed diagonally downward to the next adder stage, there is no horizontal carry propagation in the first four rows; instead, each carry bit is “saved” for the subsequent adder stage.

The drawback of the Braun multiplier is that the number of components required to build it increases quadratically with the number of bits, which makes it inefficient for wide operands. The delay of the Braun multiplier depends on the full adder cell and on the final adder in the last row. In this multiplier array, a full adder with balanced carry and sum delays is desirable because both the sum and the carry are in the critical path.
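To make the adder count concrete, here is a bit-level Python model of an unsigned n*n Braun array. It is a behavioural sketch under assumptions. The hypothetical `braun_multiply` uses n-1 carry-save rows of n-1 full adders each, with carries passed diagonally to the next row as described above. A final row of n-1 ripple-carry full adders follows, so the model counts (n-1)² + (n-1) = n(n-1) full adders and n² AND gates. Cells whose carry input is always 0 are still counted as full adders, although hardware would often use half adders there.

```python
def full_adder(a, b, c):
    """One full adder cell: returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def braun_multiply(a, b, n):
    """Unsigned n x n Braun array: returns (product, full adders, AND gates)."""
    pp = [[((a >> j) & 1) & ((b >> i) & 1) for j in range(n)] for i in range(n)]
    p = [0] * (2 * n)
    p[0] = pp[0][0]
    fa_count = 0
    # sums[j] / carries[j] feed the full adder in column j of the next row
    sums = [pp[0][j + 1] for j in range(n - 1)]
    carries = [0] * (n - 1)
    for i in range(1, n):                      # carry-save rows
        new_sums, new_carries = [], []
        for j in range(n - 1):
            s, c = full_adder(pp[i][j], sums[j], carries[j])
            fa_count += 1
            new_sums.append(s)
            new_carries.append(c)              # carry passes diagonally down
        p[i] = new_sums[0]
        # remaining sums shift one column; the leftmost partial product enters unadded
        sums = new_sums[1:] + [pp[i][n - 1]]
        carries = new_carries
    carry = 0
    for j in range(n - 1):                     # final ripple-carry row
        p[n + j], carry = full_adder(sums[j], carries[j], carry)
        fa_count += 1
    p[2 * n - 1] = carry
    return sum(bit << k for k, bit in enumerate(p)), fa_count, n * n


if __name__ == "__main__":
    print(braun_multiply(15, 15, 4))   # (225, 12, 16)
```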

2.4.2 Baugh-Wooley Multiplier[6]:-

The Baugh-Wooley multiplier is used for both unsigned and signed number multiplication, with signed operands represented in 2’s complement form. The partial products are adjusted so that the negative signs are moved to the last step, which in turn maximizes the regularity of the multiplication array. The Baugh-Wooley multiplier operates on signed operands in 2’s complement representation in such a way that the signs of all partial products are positive.

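To reiterate, the numerical values of two n-bit 2’s complement numbers X and Y are

$X = -x_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} x_i 2^i, \qquad Y = -y_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} y_j 2^j$

and their product can be obtained from the following product terms, each made of one AND gate:

$P = x_{n-1} y_{n-1} 2^{2n-2} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} x_i y_j 2^{i+j} + \sum_{i=0}^{n-2} \overline{x_i y_{n-1}}\, 2^{i+n-1} + \sum_{j=0}^{n-2} \overline{x_{n-1} y_j}\, 2^{j+n-1} + 2^{n} - 2^{2n-1}$

Variables with bars denote prior inversions. Inverters are connected before the inputs of the full adders or the AND gates, as required by the algorithm. Each column represents the addition in accordance with the respective weight of the product terms.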
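In its common modified form, the Baugh-Wooley product of n-bit two's complement X and Y has four kinds of terms. There are the AND terms x_i·y_j (i, j < n-1) and x_(n-1)·y_(n-1), and the inverted terms NOT(x_i·y_(n-1)) and NOT(x_(n-1)·y_j) at weights 2^(i+n-1) and 2^(j+n-1). A constant 2^n - 2^(2n-1) completes the sum. The hypothetical Python function `baugh_wooley` below builds the product from exactly these terms. It adds the constant as 2^n + 2^(2n-1) modulo 2^(2n), then reads the 2n-bit result as a two's complement number. This is an arithmetic check of the identity, not a netlist of the adder array.

```python
def baugh_wooley(x, y, n):
    """Modified Baugh-Wooley product of two n-bit (n >= 2) two's complement integers."""
    mask = (1 << n) - 1
    xb = [((x & mask) >> i) & 1 for i in range(n)]
    yb = [((y & mask) >> j) & 1 for j in range(n)]

    acc = (xb[n - 1] & yb[n - 1]) << (2 * n - 2)          # sign x sign term
    for i in range(n - 1):
        for j in range(n - 1):
            acc += (xb[i] & yb[j]) << (i + j)                # ordinary AND terms
    for k in range(n - 1):
        acc += (1 - (xb[k] & yb[n - 1])) << (k + n - 1)      # inverted x_k * y_(n-1)
        acc += (1 - (xb[n - 1] & yb[k])) << (k + n - 1)      # inverted x_(n-1) * y_k
    # Constant 2^n - 2^(2n-1), written as 2^n + 2^(2n-1) modulo 2^(2n)
    acc += (1 << n) + (1 << (2 * n - 1))
    acc &= (1 << (2 * n)) - 1
    if acc >> (2 * n - 1):                                   # read as two's complement
        acc -= 1 << (2 * n)
    return acc


if __name__ == "__main__":
    print(baugh_wooley(-5, 2, 4))   # -10
```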

2.4.3 Multiplier using Bypassing circuitry:-

In this technique, the main idea is based on the observation that most modern multipliers produce a large number of signal transitions while adding zero partial products. If any bit of the multiplier is zero, the corresponding row of adders need not be activated, since the corresponding partial product is zero. The adders of these multipliers, however, still sum the zero partial products and, as a result, exhibit redundant signal switching. The increased activity of the internal nodes results in unnecessary power dissipation [7] [8].


To disable these adder rows, the partial sum of the previous adder row is bypassed to the next adder row. This removes the unnecessary transitions by passing the inputs directly to the outputs when the corresponding partial product is zero. Multiplexers at the outputs of the full adders select the bypassed input, instead of the adder output, for the next stage when the partial product is zero.

Figure 2.3 4*4 Bypass Multiplier

The tri-state buffers placed at the inputs of the adder cells disable signal transitions in the bypassed adder cells. The output carry bits c are passed downwards instead of to the right [9].
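Here is a behavioural Python sketch of row bypassing under a simplifying assumption. The hypothetical `bypass_multiply` treats each multiplier bit as one row. When that bit is zero, the row passes the accumulated sum through instead of adding a zero partial product. It returns the product together with the number of rows that actually performed an addition. That count is a rough proxy for the switching the bypass logic avoids. The multiplexers and tri-state buffers themselves are not modelled.

```python
def bypass_multiply(a, b, n):
    """Shift-and-add multiplier with row bypassing.

    Returns (product, active_rows): row i adds a << i only when bit i of the
    multiplier b is 1; otherwise the row is bypassed and the sum passes through.
    """
    acc = 0
    active_rows = 0
    for i in range(n):
        if (b >> i) & 1:
            acc += a << i          # row i active: add the shifted multiplicand
            active_rows += 1
        # else: row i bypassed, acc is passed on unchanged
    return acc, active_rows


if __name__ == "__main__":
    print(bypass_multiply(13, 0b1000, 4))   # (104, 1): three of four rows bypassed
    print(bypass_multiply(13, 0b1111, 4))   # (195, 4): no row can be bypassed
```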

2.4.4 Multiplier using Adder-Subtractor Unit(ASU)[4] :-

In this technique, higher power reduction can be achieved if the operand contains more 0’s than 1’s. This approach proposes a Binary / Booth Recoding Unit that forces the operand to have more zeros. The advantage is that if the operand contains many successive 1’s, the Binary / Booth Recoding Unit converts these 1’s into zeros. The Adder-Subtractor Unit also removes the extra circuitry needed for 2’s complement addition. The use of a look-up table is a further advantage of this design.

The switching activity of the components used in the design depends on the input bit coefficient: if the input bit coefficient is zero, the corresponding row or column of adders need not be activated. The more zeros the operand contains, the higher the power reduction. A Binary / Booth Recoding Unit is therefore proposed to force the operand to have more zeros.

Figure 2.4 4*4 ASU Multiplier [4]

Figure 2.4 shows the 4x4 low power ASU multiplier structure. This technique becomes very useful for wider multiplicands, especially when there are many successive ones. Each ASU works as an adder or a subtractor depending on the sign bit in the sign register: for a recoded digit of -1 the ASU works as a subtractor, and for 0 and 1 it works as an adder. The great advantage of this technique is that no extra addition circuitry is needed to add sign-extension bits when the multiplicand bit is -1. In the upper row of the architecture, the sign bits must be ANDed with b0, because when sj = 1 and b0 = 0 the output would otherwise be wrong. At the bottom, the ASU works as a half adder or half subtractor depending on the sign bits. For wider multiplicands, the smart adder chain continues.


Figure 2.5 Adder Subtractor Unit [1]

Figure 2.6 Smart Adder (SA)

The modified full adder-subtractor unit is constructed as shown in Figure 2.5. If aj is zero, the FA is disabled. Here sj is the sign bit of the operand. The structure of the smart adder is shown in Figure 2.6.
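The adder/subtractor behaviour described above can be sketched with the textbook ripple-carry adder-subtractor. Each bit of y is XORed with a control bit `sub`, and `sub` is also fed in as the carry-in. The same full adders therefore compute either x + y or x + (one's complement of y) + 1 = x - y. The function name `add_sub` is hypothetical. The sketch does not reproduce the exact ASU or smart adder cells of [4]; in particular, disabling the FA when aj is zero is not modelled.

```python
def add_sub(x, y, sub, width):
    """Ripple-carry adder-subtractor on width-bit words.

    sub = 0 computes x + y, sub = 1 computes x - y, both modulo 2**width.
    """
    carry = sub                          # carry-in = 1 supplies the "+1" of two's complement
    result = 0
    for i in range(width):
        xi = (x >> i) & 1
        yi = ((y >> i) & 1) ^ sub        # conditional inversion of y
        s = xi ^ yi ^ carry              # full adder sum
        carry = (xi & yi) | (xi & carry) | (yi & carry)
        result |= s << i
    return result


if __name__ == "__main__":
    print(add_sub(9, 5, 0, 4), add_sub(9, 5, 1, 4))   # 14 4
```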

2.5 Signed Number Multiplication:-

In unsigned multiplication, the user has to input each number as well as its sign, so the complete multiplier needs more hardware and more switching operations; hence the switching power, i.e. the dynamic power, is higher for unsigned multiplication.

In signed multiplication, the user enters signed numbers directly, so there is no need to enter a separate sign bit for each of the four numbers. The only difference between signed and unsigned numbers is the range of values. As described in section 3.1, the range of an n-bit unsigned number is from 0 to 2ⁿ - 1, so the range of an n-bit signed number is from -2ⁿ⁻¹ to +(2ⁿ⁻¹ - 1).

2.5.1 Representation of Negative Numbers:-


For fixed-point numbers in a radix-r system, we have to decide how negative numbers are represented. Two different forms are commonly used:-

1. Sign and Magnitude Representation.

2. Complement Representation.

1. Sign and Magnitude Representation:-

In this form of representation, the sign and the magnitude are represented separately. The first digit is the sign bit and the remaining (n-1) bits are the magnitude. In the binary case, ‘0’ represents positive and ‘1’ represents negative. In the non-binary case, the values 0 and (r-1) are assigned to the sign digit of positive and negative numbers, respectively. In the binary case all 2ⁿ sequences are utilized: the 2ⁿ⁻¹ sequences from 00...0 to 01...1 represent positive numbers (and +0), while the remaining 2ⁿ⁻¹ sequences from 10...0 to 11...1 represent negative numbers (and -0). A major disadvantage of the sign-magnitude representation is that the operation to be performed may depend on the signs of the operands. For example, when adding a positive number X and a negative number -Y, we need to perform the calculation X+(-Y). If Y>X, the final result should be -(Y-X). For that we have to compute (Y-X), i.e. switch the order of the operands and perform subtraction rather than addition, and then attach the minus sign.

Example:- +7 is 111 in binary; with a 0 in front it becomes 00000111 in an 8-bit representation. -9 is 1001 (+9) with a 1 in front, i.e. 10001001 in an 8-bit representation.
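The sign-magnitude format and the sign-dependent addition described above can be sketched in Python. The helper names are hypothetical, and bit patterns are strings for readability. `sign_magnitude_add` has to compare signs and magnitudes before it can decide whether to add or subtract, which is exactly the disadvantage noted above.

```python
def to_sign_magnitude(value, n):
    """Encode value as an n-bit sign-magnitude string (MSB = sign)."""
    magnitude = abs(value)
    if magnitude >= 1 << (n - 1):
        raise ValueError("magnitude does not fit in n - 1 bits")
    sign = "1" if value < 0 else "0"
    return sign + format(magnitude, "0{}b".format(n - 1))


def from_sign_magnitude(bits):
    """Decode a sign-magnitude string; '10...0' (negative zero) decodes to 0."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


def sign_magnitude_add(x_bits, y_bits):
    """Add two n-bit sign-magnitude numbers; the operation depends on the signs."""
    n = len(x_bits)
    sx, mx = x_bits[0], int(x_bits[1:], 2)
    sy, my = y_bits[0], int(y_bits[1:], 2)
    if sx == sy:                       # same signs: add magnitudes, keep the sign
        sign, magnitude = sx, mx + my
    elif mx >= my:                     # different signs: subtract smaller from larger
        sign, magnitude = sx, mx - my
    else:                              # result takes the sign of the larger magnitude
        sign, magnitude = sy, my - mx
    if magnitude >= 1 << (n - 1):
        raise OverflowError("result does not fit in n - 1 magnitude bits")
    if magnitude == 0:
        sign = "0"                     # normalise negative zero to +0
    return sign + format(magnitude, "0{}b".format(n - 1))


if __name__ == "__main__":
    print(to_sign_magnitude(7, 8), to_sign_magnitude(-9, 8))   # 00000111 10001001
    print(sign_magnitude_add("00000111", "10001001"))          # 10000010, i.e. -2
```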

2. Complement Representation:-

In complement representation, binary numbers are represented in two’s complement form, which is the most widely used method of representation. Positive numbers are represented in the same way as in the signed-magnitude method, i.e. as a binary number with ‘0’ as the sign bit. To obtain the representation of a negative number, convert all 0’s to 1’s and all 1’s to 0’s in the corresponding positive number, and then add ‘1’. When reading a number in 2’s complement form, if it starts with ‘0’ it is a positive number, and if it starts with ‘1’ it is a negative number.


If the number is negative, taking its 2’s complement gives its magnitude in ordinary binary. For example, take 1101: its 2’s complement is 0011. Since the number starts with ‘1’ it is negative, and 0011 is the binary representation of positive 3, so the number is -3. Other negative numbers are represented in 2’s complement in the same way.

Adding +5 and -5 in decimal gives ‘0’. In 4-bit 2’s complement form, +5 is 0101 and -5 is 1011. Adding these two numbers gives 10000; discarding the carry leaves 0000, i.e. ‘0’.
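The two's complement rules above, and the n-bit range -2ⁿ⁻¹ to 2ⁿ⁻¹ - 1, can be sketched in Python with bit patterns as strings. The helper names are hypothetical. `negate` implements invert-and-add-one, and `add_discard_carry` adds two patterns modulo 2ⁿ, which mimics discarding the carry out of the most significant bit.

```python
def to_twos_complement(value, n):
    """Encode value as an n-bit two's complement string."""
    low, high = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not low <= value <= high:
        raise ValueError("value outside the n-bit range [-2^(n-1), 2^(n-1) - 1]")
    return format(value & ((1 << n) - 1), "0{}b".format(n))


def from_twos_complement(bits):
    """Decode: a leading '1' means the pattern represents (unsigned value) - 2^n."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value


def negate(bits):
    """Two's complement negation: invert every bit, then add 1 (carry discarded)."""
    n = len(bits)
    inverted = "".join("1" if b == "0" else "0" for b in bits)
    return format((int(inverted, 2) + 1) % (1 << n), "0{}b".format(n))


def add_discard_carry(x_bits, y_bits):
    """Add two n-bit patterns and discard the carry out of the MSB."""
    n = len(x_bits)
    return format((int(x_bits, 2) + int(y_bits, 2)) % (1 << n), "0{}b".format(n))


if __name__ == "__main__":
    print(negate("1101"), from_twos_complement("1101"))   # 0011 -3
    print(add_discard_carry("0101", "1011"))              # 0000
```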

In this signed multiplication, we have modified the complex multiplication strategy. Normally, a complex multiplier needs four multipliers and two adder/subtractor blocks, but the modified strategy requires three multipliers and five adders/subtractors.

For the complex multiplication of two numbers

(a+jb).(c+jd)

we get:

Real Part:- (c-d).b + c.(a-b)

Imaginary Part:- (c+d).a - c.(a-b)

Expanding, the real part is cb - db + ca - cb = ac - bd and the imaginary part is ca + da - ca + cb = ad + bc, as required. Only three multiplications are needed because c.(a-b) is common to both results. Since a multiplier costs far more area and power than an adder, trading one multiplication for three extra additions is expected to save power compared with the previous method of complex multiplication.
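Here is a small Python check of this identity, comparing the conventional four-multiplier form with the three-multiplier form above. The function names are hypothetical and the operands are plain integers. In hardware, the pre-computed terms a-b, c-d and c+d each need one extra bit of width.

```python
def complex_mult_4(a, b, c, d):
    """Conventional form: 4 real multiplications, 1 subtraction, 1 addition."""
    return a * c - b * d, a * d + b * c


def complex_mult_3(a, b, c, d):
    """Modified form: 3 real multiplications, 5 additions/subtractions.

    real = (c - d)*b + c*(a - b)
    imag = (c + d)*a - c*(a - b)
    """
    common = c * (a - b)          # shared product, computed only once
    real = (c - d) * b + common
    imag = (c + d) * a - common
    return real, imag


if __name__ == "__main__":
    rng = range(-8, 8)            # exhaustive check over 4-bit signed operands
    assert all(complex_mult_3(a, b, c, d) == complex_mult_4(a, b, c, d)
               for a in rng for b in rng for c in rng for d in rng)
    print(complex_mult_3(1, 2, 3, 0), complex_mult_3(1, 2, 0, 3))   # (3, 6) (-6, 3)
```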

2.5.2 Booth’s Recoding Algorithm:-

Parallel multiplication using the basic Booth’s recoding algorithm is based on the fact that partial products can be generated for groups of consecutive 0’s and 1’s; this is called Booth’s recoding. Booth’s recoding algorithm is used to generate partial products efficiently. These partial products always have more bits than the input operands, and their width usually depends on the radix used for recoding. The generated partial products are added by compressors, as explained in section 3.2. Because this scheme produces fewer partial products, it reduces power and area.

There are two variants of the algorithm, Radix-2 and Radix-4, for generating efficient partial products. We first explain the basic technique of Booth’s recoding algorithm and then the modified Booth’s recoding technique, for both the Radix-2 and Radix-4 algorithms.

2.5.3 Basic Technique of Booth’s Recoding Algorithm for Radix-2 and Radix-4:-

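Booth proposed a recoding algorithm for high-speed multiplication that reduces the number of partial products. The algorithm rests on the observation above: a string of consecutive 1’s in the multiplier can be handled by one subtraction at the beginning of the string and one addition just after its end, instead of one addition per 1. To perform the multiplication A*B, where

A = an, an-1, …, a0 is the multiplier

B = bn, bn-1, …, b0 is the multiplicand

we check two consecutive bits of A at a time (three bits at a time for Radix-4):-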

Ai | Ai-1 | Y | Comments | Explanation

0 | 0 | 0 | Middle of 0’s | String of 0’s, shift only

0 | 1 | 1.B | End of 1’s | Add and shift

1 | 0 | -1.B | Beginning of 1’s | Add and shift

1 | 1 | 0 | Middle of 1’s | String of 1’s, shift only

Table 2.1. Booth’s Recoding algorithm Radix-2

Ai+1 | Ai | Ai-1 | Y | Comments | Explanation

0 | 0 | 0 | 0 | String of zeros | Two-bit shift only

0 | 0 | 1 | 1.B | End of 1’s | Add and two-bit shift

0 | 1 | 0 | 1.B | A single 1 | Add and two-bit shift

0 | 1 | 1 | 2.B | End of 1’s | Add and two-bit shift

1 | 0 | 0 | -2.B | Beginning of 1’s | Add and two-bit shift

1 | 0 | 1 | -1.B | A single 0 | Add and two-bit shift

1 | 1 | 0 | -1.B | Beginning of 1’s | Add and two-bit shift

1 | 1 | 1 | 0 | String of ones | Two-bit shift only

Table 2.2. Booth’s Recoding algorithm Radix-4

Let us take an example:-

Radix-2:-

Suppose the multiplier A has the value -5 and the multiplicand B has the value +2. Then

B=> 0010 (+2)

A=> 1011 (-5)

Using Table 2.1, we append an implicit ‘0’ to the right of the LSB of the multiplier A and scan overlapping pairs (Ai, Ai-1), starting at the LSB and moving toward the MSB. The partial products are:-

i) For 10 we perform -1.B, i.e. the 2’s complement of B, 1110.

ii) For 11 we put all 0’s, i.e. 0000.

iii) For 01 we perform 1.B, i.e. the value of B, 0010.

iv) For 10 we again perform -1.B, i.e. 1110.

Each partial product is shifted one position to the left relative to the previous one. Extra bits, called correction bits, are added to the partial products so that they all have the same width.
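The Radix-2 recoding of Table 2.1 reduces to one rule: the digit for position i is A(i-1) - A(i), with an implicit A(-1) = 0. The Python sketch below (hypothetical function names, 4-bit words assumed) reproduces partial products i) to iv) for A = -5, B = +2 and checks that their shifted sum equals the product.

```python
def booth_radix2_digits(multiplier: int, width: int) -> list[int]:
    """Radix-2 Booth digits of a two's complement multiplier, LSB first.

    For each pair (Ai, Ai-1) of Table 2.1: 01 -> +1, 10 -> -1, 00/11 -> 0,
    with an implicit Ai-1 = 0 to the right of the LSB.
    """
    bits = multiplier & ((1 << width) - 1)   # width-bit two's complement pattern
    digits = []
    prev = 0                                 # implicit bit right of the LSB
    for i in range(width):
        cur = (bits >> i) & 1
        digits.append(prev - cur)
        prev = cur
    return digits


def booth_radix2_multiply(multiplier: int, multiplicand: int, width: int) -> int:
    """Add the partial products digit * B, each shifted left by i positions."""
    digits = booth_radix2_digits(multiplier, width)
    return sum(d * multiplicand * 2**i for i, d in enumerate(digits))


A, B, W = -5, 2, 4
for i, d in enumerate(booth_radix2_digits(A, W)):
    # Partial product as a 4-bit pattern, before shifting and correction bits
    print(f"pair {i}: Y = {d:+d}.B -> {(d * B) & 0xF:04b}")
print(booth_radix2_multiply(A, B, W))    # -10
```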

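Radix-4:-

A=> -5 => 1 1 1 1 1 0 1 1

B=> 46 => 0 0 1 0 1 1 1 0

Using Table 2.2, the multiplier A is scanned three bits at a time (consecutive groups overlap by one bit, with an implicit ‘0’ to the right of the LSB), and one partial product is generated for each group, shifted two positions relative to the previous one.

In the basic technique of Booth’s algorithm described above, the vertical length of the partial products is large, so more adders are required and both power and area increase.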
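For comparison, Table 2.2 can be written as digit = -2·A(i+1) + A(i) + A(i-1), taken for i = 0, 2, 4, … with A(-1) = 0. The Python sketch below applies this to the Radix-4 example above (A = -5, B = 46, 8-bit words) and shows that Radix-4 needs only four partial products for an 8-bit multiplier. Names are hypothetical, and the sketch works on integer values rather than sign-extended bit rows.

```python
def booth_radix4_digits(multiplier: int, width: int) -> list[int]:
    """Radix-4 Booth digits (Table 2.2), least significant group first.

    Group k examines (A(2k+1), A(2k), A(2k-1)) with A(-1) = 0 and yields
    -2*A(2k+1) + A(2k) + A(2k-1), i.e. one of -2, -1, 0, +1, +2.
    Assumes an even width.
    """
    bits = multiplier & ((1 << width) - 1)

    def bit(i: int) -> int:
        return 0 if i < 0 else (bits >> i) & 1

    return [-2 * bit(2 * k + 1) + bit(2 * k) + bit(2 * k - 1) for k in range(width // 2)]


A, B, W = -5, 46, 8                      # A = 11111011, B = 00101110
digits = booth_radix4_digits(A, W)
partials = [d * B * 4**k for k, d in enumerate(digits)]
print(digits)          # [-1, -1, 0, 0]
print(partials)        # [-46, -184, 0, 0]
print(sum(partials))   # -230
```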

-:References:-

[1] E. D. Solomentsev (2001), "Complex number", in M. Hazewinkel (ed.), Encyclopaedia of Mathematics, Springer, ISBN 978-1556080104.

[2] Man Yan Kong, J. M. P. Langlois, D. Al-Khalili (2008), “Efficient FPGA implementation of complex multipliers using the logarithmic number system”, Proc. IEEE International Symposium on Circuits and Systems (ISCAS 2008), pp. 3154-3157.

[3] A. P. Pascual, J. Valls, M. M. Peiro (1999), “Efficient complex-number multipliers mapped on FPGA”, Proc. 6th IEEE International Conference on Electronics, Circuits and Systems (ICECS '99).

[4] R. M. Badghare, S. K. Mangal, R. B. Deshmukh, R. M. Patrikar (2009), “Design of Low Power Parallel Multiplier”, Journal of Low Power Electronics, vol. 5, no. 1, April 2009, pp. 31-39.

[5] C. M. Jones, S. S. Dlay, R. G. Naguib (Oct. 1996), “Berger check prediction for concurrent error detection in the Braun array multiplier”, Proc. 3rd IEEE International Conference on Electronics, Circuits, and Systems (ICECS '96), vol. 1, pp. 81-84.

[6] C. R. Baugh, B. A. Wooley (Dec. 1973), “A two’s complement parallel array multiplication algorithm”, IEEE Transactions on Computers, vol. C-22, pp. 1045-1047.

[7] Ko-Chi Kuo, Chi-Wen Chou (2006), “Low Power Multiplier with Bypassing and Tree Structure”, Proc. IEEE Asia Pacific Conference on Circuits and Systems (APCCAS 2006), 4-7 Dec. 2006, pp. 602-605.

[8] J. Ohban, V. G. Moshnyaga, K. Inoue (2002), “Multiplier energy reduction through bypassing of partial products”, Proc. Asia-Pacific Conference on Circuits and Systems, vol. 2, pp. 13-17.

[9] Ming-Chen Wen, Sying-Jyan Wang, Yen-Nan Lin (2005), “Low Power Parallel Multiplier with Column Bypassing”, Electronics Letters, vol. 41, no. 10, 12 May 2005, pp. 581-583.

Chapter 3.

Multiplier Unit

As explained in the previous chapters on the various techniques for complex multipliers, a complex multiplier is implemented using more than one basic multiplier; the conventional way of implementing complex multiplication requires four basic multipliers. To make the complex multiplier a low-power unit, these basic multipliers are designed using the compressor technique. If the basic multiplier is designed for low power, the complex multiplier also becomes a low-power unit.

Figure 3.1 Internal Block Diagram of 16*16 Basic Multiplier[2]

The figure above shows the internal block diagram of the basic multiplier. It consists of three stages:-

i) Partial Product Generator

ii) Different Order Compressors

iii) Parallel Adder

The three blocks used for multiplication are described below.

3.1 Partial Product Generator:-

In an unsigned multiplier, partial products are generated and then added to produce the result. Let ‘A’ and ‘B’ be two n-bit unsigned numbers whose product ‘Z’ is 2n bits wide. The partial products are first generated using the ‘AND’ operation; an n-bit by n-bit multiplication generates n*n partial-product bits.

Let us take two 16-bit numbers, A15-A0, called the multiplicand, and B15-B0, called the multiplier, as the inputs of the multiplier. Partial products are generated by ANDing each bit of ‘A’ with each bit of ‘B’, so 16*16 = 256 partial products are generated. Each bit of the multiplicand is ANDed with every bit of the multiplier: a0 is ANDed with b0-b15, producing the sixteen partial products m00-m015 of the first row. Similarly, for the other 15 rows, a1-a15 are ANDed with b0-b15 to produce the remaining 240 partial products.
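The AND-array described above can be modeled in a few lines of Python. This sketch (hypothetical names, behavioral only) generates the 256 partial products of a 16*16 multiplication, counts how many fall in each column, and confirms that adding every bit at its weight 2^(i+k) gives the product.

```python
def partial_products(a: int, b: int, n: int) -> list[list[int]]:
    """AND-array: row i holds m[i][k] = a_i AND b_k, which has weight 2**(i+k)."""
    return [[((a >> i) & 1) & ((b >> k) & 1) for k in range(n)] for i in range(n)]


def column_heights(n: int) -> list[int]:
    """How many partial-product bits fall in each column (weight 2**col)."""
    heights = [0] * (2 * n - 1)
    for i in range(n):
        for k in range(n):
            heights[i + k] += 1
    return heights


def sum_partial_products(pp: list[list[int]]) -> int:
    """Adding every bit at its weight gives the unsigned product."""
    return sum(bit << (i + k) for i, row in enumerate(pp) for k, bit in enumerate(row))


pp = partial_products(0xBEEF, 0x1234, 16)
print(sum(len(row) for row in pp))                    # 256 partial products
print(max(column_heights(16)))                        # 16 bits in the tallest column
print(sum_partial_products(pp) == 0xBEEF * 0x1234)    # True
print(column_heights(4))                              # [1, 2, 3, 4, 3, 2, 1]
```

The column heights are what the compressors of Section 3.2 have to reduce.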

Figure 3.2. Partial Product Generator (4-Bit)

The diagram above shows the partial product generator for 4-bit inputs. Each multiplicand bit a0-a3 is ANDed with each multiplier bit b0-b3 (a0 with b0-b3 gives the first row, and so on), producing sixteen partial products m00-m33. These partial products are fed to the inputs of the compressors, which compress the partial-product stages until only two rows remain.

3.2 Different Order Compressors[1][3][4]:-

After generation, the partial products are fed to the inputs of the compressors. Compressors are used to reduce the number of partial-product stages of the multiplier. The main operation of a compressor is to count the number of 1’s. The generated partial products are divided into vertical groups (columns); each group counts its number of 1’s, and this count is passed on to the next stage.

3.2.1 Adder as Counter:-

An adder circuit, whether a full adder or a half adder, can be used as a counter that counts the number of 1’s.

Figure 3.3. Half Adder

Figure 3.4. Full Adder

A B Carry Sum

0 0 0 0

0 1 0 1

1 0 0 1

1 1 1 0


Table 3.1. Half Adder as a Counter

A B C Carry Sum

0 0 0 0 0

0 0 1 0 1

0 1 0 0 1

0 1 1 1 0

1 0 0 0 1

1 0 1 1 0

1 1 0 1 0

1 1 1 1 1

Table 3.2. Full Adder as a Counter[2]

The tables above show the half adder and the full adder used as counters of 1’s. For inputs A, B and C, the outputs Carry and Sum together give the number of 1’s in binary form, with Carry as the Most Significant Bit and Sum as the Least Significant Bit. Because the full adder takes three inputs and generates two outputs, it compresses three bits into two and is called a 3:2 compressor.

Similarly, on the basis of this logic, other types of compressors with more inputs, called higher order compressors, can be built. These compressors count the number of 1’s among a larger number of inputs, so they can be used as the vertical length of the partial products increases.
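The counting property in Tables 3.1 and 3.2 (2·Carry + Sum equals the number of 1’s) can be checked exhaustively. A minimal Python sketch, using the usual majority/parity equations for the full adder; function names are hypothetical:

```python
from itertools import product


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (carry, sum); 2*carry + sum is the number of 1's in a, b."""
    return a & b, a ^ b


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (carry, sum); 2*carry + sum is the number of 1's in a, b, c."""
    carry = (a & b) | (a & c) | (b & c)   # majority of the three inputs
    total = a ^ b ^ c                     # parity of the three inputs
    return carry, total


# Reproduce Tables 3.1 and 3.2 and check the counting property of every row.
for a, b in product((0, 1), repeat=2):
    carry, s = half_adder(a, b)
    assert 2 * carry + s == a + b
    print(a, b, carry, s)
for a, b, c in product((0, 1), repeat=3):
    carry, s = full_adder(a, b, c)
    assert 2 * carry + s == a + b + c
    print(a, b, c, carry, s)
```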

3.2.2 Compressor Logic:-

The different compressor logics are based on the counter concept of the full adder. A compressor can be defined as a single-bit adder circuit that has more inputs than the three of a full adder, and fewer outputs than inputs. A full adder has two outputs, so it can count up to three (11). Similarly, a circuit with three outputs can count up to a maximum value of seven (111).

Compressors with four, five, six and seven inputs produce three outputs, which count up to a maximum value of seven (111). Compressors with eight to fifteen inputs produce four outputs, which count up to a maximum value of fifteen (1111). In general, a compressor needs enough outputs to represent its number of inputs in binary. So, each compressor is built according to the number of inputs it has and the count value it must generate. The different compressor logics are described below with their block diagrams:-

1) 4:3 Compressor:-

Figure 3.5. Block Diagram of 4:3 Compressor

The figure above shows the block diagram of the 4:3 compressor. It has four inputs and three outputs, and consists of two half adders and one parallel adder. If all four inputs are 1, it gives the maximum count value, 100. The output bits are denoted j, (j+1) and (j+2), where the (j+2)th bit is the MSB and the jth bit is the LSB.

2) 5:3 Compressor:-

Figure 3.6. Block Diagram of 5:3 Compressor

The figure above shows the block diagram of the 5:3 compressor. It has five inputs and three outputs, and consists of one half adder, one full adder and one parallel adder. Its maximum count value is 101. The output bits are denoted j, (j+1) and (j+2), where the (j+2)th bit is the MSB and the jth bit is the LSB.

3) 6:3 Compressor:-

Figure 3.7. Block Diagram of 6:3 Compressor

The figure above shows the block diagram of the 6:3 compressor. It has six inputs and three outputs, and consists of two full adders and one parallel adder. Its maximum count value is 110. The output bits are denoted j, (j+1) and (j+2), where the (j+2)th bit is the MSB and the jth bit is the LSB.

4) 7:3 Compressor:-

Figure 3.8. Block Diagram of 7:3 Compressor

The figure above shows the block diagram of the 7:3 compressor. It has seven inputs and three outputs, and consists of one 4:3 compressor, one full adder and one parallel adder. Its maximum count value is 111. The output bits are denoted j, (j+1) and (j+2), where the (j+2)th bit is the MSB and the jth bit is the LSB.

5) 8:4 Compressor:-

Figure 3.9. Block Diagram of 8:4 Compressor

The figure above shows the block diagram of the 8:4 compressor. It has eight inputs and four outputs, and consists of one 5:3 compressor, one full adder and one parallel adder. Its maximum count value is 1000. The output bits are denoted j, (j+1), (j+2) and (j+3), where the (j+3)th bit is the MSB and the jth bit is the LSB.

6) 9:4 Compressor:-

Figure 3.10. Block Diagram of 9:4 Compressor

The figure above shows the block diagram of the 9:4 compressor. It has nine inputs and four outputs, and consists of one 6:3 compressor, one full adder and one parallel adder. Its maximum count value is 1001. The output bits are denoted j, (j+1), (j+2) and (j+3), where the (j+3)th bit is the MSB and the jth bit is the LSB.

7) 10:4 Compressor:-

Figure 3.11. Block Diagram of 10:4 Compressor

The figure above shows the block diagram of the 10:4 compressor. It has ten inputs and four outputs, and consists of one 7:3 compressor, one full adder and one parallel adder. Its maximum count value is 1010. The output bits are denoted j, (j+1), (j+2) and (j+3), where the (j+3)th bit is the MSB and the jth bit is the LSB.

8) 11:4 Compressor:-

Figure 3.12. Block Diagram of 11:4 Compressor

The figure above shows the block diagram of the 11:4 compressor. It has eleven inputs and four outputs, and consists of one 7:3 compressor, one 4:3 compressor and one parallel adder. Its maximum count value is 1011. The output bits are denoted j, (j+1), (j+2) and (j+3), where the (j+3)th bit is the MSB and the jth bit is the LSB.

9) 12:4 Compressor:-

Figure 3.13. Block Diagram of 12:4 Compressor

The figure above shows the block diagram of the 12:4 compressor. It has twelve inputs and four outputs, and consists of one 7:3 compressor, one 5:3 compressor and one three-bit parallel adder. Its maximum count value is 1100. The output bits are denoted j, (j+1), (j+2) and (j+3), where the (j+3)th bit is the MSB and the jth bit is the LSB.

10) 13:4 Compressor:-

Figure 3.14. Block Diagram of 13:4 Compressor

The figure above shows the block diagram of the 13:4 compressor. It has thirteen inputs and four outputs, and consists of one 7:3 compressor, one 6:3 compressor and one three-bit parallel adder. Its maximum count value is 1101. The output bits are denoted j, (j+1), (j+2) and (j+3), where the (j+3)th bit is the MSB and the jth bit is the LSB.

11) 14:4 Compressor:-

Figure 3.15. Block Diagram of 14:4 Compressor

The figure above shows the block diagram of the 14:4 compressor. It has fourteen inputs and four outputs, and consists of two 7:3 compressors and one three-bit parallel adder. Its maximum count value is 1110. The output bits are denoted j, (j+1), (j+2) and (j+3), where the (j+3)th bit is the MSB and the jth bit is the LSB.

12) 15:4 Compressor:-

Figure 3.16. Block Diagram of 15:4 Compressor

The figure above shows the block diagram of the 15:4 compressor. It has fifteen inputs and four outputs, and consists of one 8:4 compressor, one 7:3 compressor and one three-bit parallel adder. Its maximum count value is 1111. The output bits are denoted j, (j+1), (j+2) and (j+3), where the (j+3)th bit is the MSB and the jth bit is the LSB.

13) 16:5 Compressor:-

Figure 3.17. Block Diagram of 16:5 Compressor

The figure above shows the block diagram of the 16:5 compressor. It has sixteen inputs and five outputs, and consists of two 8:4 compressors and one four-bit parallel adder. Its maximum count value is 10000. The output bits are denoted j, (j+1), (j+2), (j+3) and (j+4), where the (j+4)th bit is the MSB and the jth bit is the LSB.

These different order compressors are used to reduce the partial-product stages. Because they only count the number of 1’s, compressors also reduce the switching activity. The generated partial products are divided vertically among compressors of different orders.
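The compositions listed above can be checked with a behavioral Python model in which each compressor is built from the sub-blocks named in its description (for example, the 7:3 compressor from a 4:3 compressor and a full adder) and the parallel adder is modeled as plain integer addition. This is a sketch under those assumptions, not the gate-level design; the COMPOSITION table and function names are hypothetical. Outputs are listed LSB first, i.e. bit j first.

```python
from itertools import product

# Sub-blocks of each compressor as listed in Section 3.2.2, given as
# (inputs of first block, inputs of second block). A 2-input block is a
# half adder, a 3-input block a full adder, larger ones are compressors.
COMPOSITION = {
    4: (2, 2), 5: (2, 3), 6: (3, 3), 7: (4, 3),
    8: (5, 3), 9: (6, 3), 10: (7, 3), 11: (7, 4),
    12: (7, 5), 13: (7, 6), 14: (7, 7), 15: (8, 7), 16: (8, 8),
}


def to_bits(value: int, width: int) -> list[int]:
    """LSB-first list: index 0 is output j, index width-1 is the MSB."""
    return [(value >> k) & 1 for k in range(width)]


def from_bits(bits: list[int]) -> int:
    return sum(bit << k for k, bit in enumerate(bits))


def compressor(inputs: list[int]) -> list[int]:
    """Count the 1's in `inputs`, built recursively from the sub-blocks above."""
    n = len(inputs)
    if n == 2:                                   # half adder: [sum, carry]
        a, b = inputs
        return [a ^ b, a & b]
    if n == 3:                                   # full adder: [sum, carry]
        a, b, c = inputs
        return [a ^ b ^ c, (a & b) | (a & c) | (b & c)]
    first, _ = COMPOSITION[n]
    low = compressor(inputs[:first])
    high = compressor(inputs[first:])
    # Parallel adder (behavioral) joins the two partial counts;
    # n.bit_length() gives 3 outputs for 4-7 inputs, 4 for 8-15, 5 for 16.
    return to_bits(from_bits(low) + from_bits(high), n.bit_length())


print(compressor([1] * 4))    # [0, 0, 1]        -> count 100
print(compressor([1] * 16))   # [0, 0, 0, 0, 1]  -> count 10000
# Exhaustive check of the 7:3 compressor against a direct count of 1's.
print(all(from_bits(compressor(list(x))) == sum(x) for x in product((0, 1), repeat=7)))  # True
```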

3.3 Parallel Adders:-

Figure 3.18. Block Diagram of Parallel Adder

The figure above shows the block diagram of the parallel adder. It consists of cascaded full adders, and the number of adders depends on the length of the output: for N*N multiplication, 2N full adders are used. Here, the Cout of each full adder is connected to the Cin of the next adjacent full adder. The concept of this parallel adder is derived from the carry look-ahead adder. The output of the parallel adder is the final output of the multiplier.
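A behavioral Python sketch of the cascaded chain described above, with each Cout wired to the next Cin (names are hypothetical). A carry look-ahead realization would produce the same sum bits and differ only in how the carries are computed.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, cout) of one full-adder cell."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def parallel_adder(x: int, y: int, width: int) -> int:
    """`width` cascaded full adders; Cout of cell k drives Cin of cell k+1."""
    carry = 0
    result = 0
    for k in range(width):
        s, carry = full_adder((x >> k) & 1, (y >> k) & 1, carry)
        result |= s << k
    # Any carry out of the last cell is dropped. For the two final rows of an
    # N*N product it is always 0, because the product fits in 2N bits.
    return result


print(parallel_adder(0b1011, 0b0110, 8))   # 17
```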

3.4 Architecture of Multiplier Using Compressor:-

The following figure shows the architecture of an 8*8 multiplier using different order compressors.

Figure 3.19. Architecture of 8*8 Multiplier using Compressors[2]

As shown in the figure above, the partial products are added in four stages. Adders and compressors of different orders are used to minimize the number of stages, and they are chosen carefully so that the minimum number of outputs is generated. Consider column number eight, where eight bits are added at the first stage. These eight bits are added using an 8:4 compressor, which generates four outputs and thereby reduces the number of bits for the next stage.

The outputs of each compressor from 4:3 to 7:3 occupy bit positions j, (j+1) and (j+2), where the jth bit is the LSB and the (j+2)th bit is the MSB. Compressors from 8:4 to 15:4 have outputs at bit positions j, (j+1), (j+2) and (j+3), where the jth bit is the LSB and the (j+3)th bit is the MSB. The 16:5 compressor has outputs at bit positions j, (j+1), (j+2), (j+3) and (j+4), where the jth bit is the LSB and the (j+4)th bit is the MSB.

For example, for the 4:3 compressor in column number four, its jth output goes to column four, its (j+1)th output goes to column five and its (j+2)th output goes to column six. Similarly, for the 8:4 compressor in column eight, its jth output goes to column eight, its (j+1)th output to column nine, its (j+2)th output to column ten and its last, (j+3)th, output to column eleven. In this way the compressors reduce the vertical critical path rapidly.

At the next stage, wherever a column still has more than two bits, compressors of the matching order are used again to reduce the vertical critical path. Compressors are applied stage by stage until every column has only two bits, and these two rows are then added in parallel as explained in section 3.3.
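The reduction procedure above can be sketched in Python. This model uses a simplified greedy rule that is an assumption, not the hand-placed arrangement of Figure 3.19: at every stage each column with three or more bits feeds a single counter of matching order, whose output bit k lands in column c+k, and columns with at most two bits pass through. Because of this simplification, the number of stages it reports need not match the four stages of the figure. Function names are hypothetical.

```python
import random


def reduce_columns(columns: list[list[int]]) -> tuple[list[list[int]], int]:
    """Compress columns stage by stage until each holds at most two bits.

    Every column with h >= 3 bits feeds one h-input counter (a full adder for
    h = 3, a 4:3 ... 16:5 compressor otherwise), modeled by counting its 1's.
    Output bit k of the counter in column c goes to column c + k.
    """
    stages = 0
    while any(len(col) > 2 for col in columns):
        nxt: list[list[int]] = [[] for _ in range(len(columns) + 4)]
        for c, col in enumerate(columns):
            h = len(col)
            if h <= 2:
                nxt[c].extend(col)                 # pass through to next stage
                continue
            if h > 16:
                raise ValueError("only counters up to 16:5 are modeled")
            count = sum(col)
            for k in range(h.bit_length()):       # outputs j, j+1, ...
                nxt[c + k].append((count >> k) & 1)
        while nxt and not nxt[-1]:
            nxt.pop()                              # trim empty high columns
        columns = nxt
        stages += 1
    return columns, stages


def compressor_multiply(a: int, b: int, n: int) -> tuple[int, int]:
    """n*n unsigned multiply: AND array, compressor stages, final two-row add."""
    columns: list[list[int]] = [[] for _ in range(2 * n - 1)]
    for i in range(n):
        for k in range(n):
            columns[i + k].append(((a >> i) & 1) & ((b >> k) & 1))
    columns, stages = reduce_columns(columns)
    # At most two bits per column remain: form two rows and add them
    # (behavioral stand-in for the parallel adder of Section 3.3).
    row0 = sum(col[0] << c for c, col in enumerate(columns) if len(col) >= 1)
    row1 = sum(col[1] << c for c, col in enumerate(columns) if len(col) >= 2)
    return row0 + row1, stages


rng = random.Random(0)
for n in (8, 16):
    a, b = rng.randrange(1 << n), rng.randrange(1 << n)
    product_value, stages = compressor_multiply(a, b, n)
    print(n, product_value == a * b, stages)
```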

-:References:-

[1] C. H. Chang, J. Gu, M. Zhang (2004), “Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits”, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 51, no. 10, Oct. 2004, pp. 1985-1997.

[2] A. Dandapat, S. Ghosal, P. Sarkar, D. Mukhopadhyay (2009), “A 1.2-ns 16×16-Bit Binary Multiplier Using High Speed Compressors”, International Journal of Electrical, Computer, and Systems Engineering, 2009, pp. 234-239.

[3] J. Gu, C. H. Chang (2003), “Ultra low voltage low power 4-2 compressor for high speed multiplications”, Proc. International Symposium on Circuits and Systems (ISCAS '03), vol. 5, May 2003, pp. 321-324.

[4] K. Prasad, K. K. Parhi (2001), “Low power 4-2 and 5-2 compressor”, Proc. 35th Asilomar Conference on Signals, Systems and Computers, vol. 1, 2001, pp. 129-133.

Chapter 4.

Proposed Complex Multiplier

In this chapter, we propose a new complex multiplier for both unsigned and signed complex multiplication.

4.1 Unsigned Multiplication:-

As we saw in the general rule of complex multiplication, multiplying two complex numbers requires four multipliers and three adders/subtractors. The range of an n-bit unsigned number is 0 to 2ⁿ-1. Because the numbers are unsigned, a separate sign must be entered for each of the four real numbers; the real and imaginary parts of the result are obtained as magnitudes, and combinational logic generates the sign outputs of the real and imaginary parts.

Figure 4.1. Block Diagram of Unsigned Complex Multiplier

As shown in Figure 4.1, four real numbers ‘a’, ‘b’, ‘c’ and ‘d’ are entered together with their signs ‘sa’, ‘sb’, ‘sc’ and ‘sd’. After multiplying the real numbers using four multipliers and combining the products with a 32-bit Add/Sub block, we obtain the output “rr”, which is the real part, and “ri”, which is the imaginary part of the result of the complex multiplication. Similarly, to get the sign of the result for both the real and imaginary parts, combinational logic is applied to the sign inputs, giving the output sign “ssr” for the real part and “ssi” for the imaginary part.

As explained in Chapter 2, the multiplication of two complex numbers is

(a+bi).(c+di) = (ac-bd) + (ad+bc)i

Because the sign of each number is entered separately, a combinational circuit is needed to produce the sign of the real part (sr) and of the imaginary part (si). Let the term “ac” be represented as ‘e’, “bd” as ‘f’, “ad” as ‘g’ and “bc” as ‘h’, with signs se, sf, sg and sh. These signs are generated using XOR operations:

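se = sa xor sc (sign of ac).

sf = sb xnor sd (sign of the subtracted term -bd; XNOR inverts the product sign because bd is subtracted).

sg = sa xor sd (sign of ad).

sh = sb xor sc (sign of bc).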

Figure 4.2. Combinational Logic for intermediate sign

Using conditions on se, sf, sg and sh, the final sign results are generated, i.e. “sr” for the real part and “si” for the imaginary part. A 2:1 Mux is used to generate each output sign value; ‘0’ represents a positive value and ‘1’ represents a negative value.

Figure 4.3. Combinational Circuit for output sign Real Part and Imaginary Part
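The sign handling of Figures 4.2 and 4.3 can be modeled behaviorally with sign-magnitude arithmetic. The exact mux conditions appear only in Figure 4.3, so the selection rule in `combine` below is an assumption: equal signs add the magnitudes, different signs subtract the smaller from the larger and take the sign of the larger. The intermediate signs follow the term definitions e = ac, f = bd, g = ad, h = bc. Function names are hypothetical; rr, ri, ssr and ssi mirror the output names of Figure 4.1.

```python
def combine(mag_x: int, sign_x: int, mag_y: int, sign_y: int) -> tuple[int, int]:
    """Add two sign-magnitude terms and return (magnitude, sign).

    Assumed mux rule: equal signs -> add magnitudes, keep the common sign;
    different signs -> subtract the smaller magnitude from the larger and
    select the sign of the larger term.
    """
    if sign_x == sign_y:
        return mag_x + mag_y, sign_x
    if mag_x >= mag_y:
        return mag_x - mag_y, sign_x
    return mag_y - mag_x, sign_y


def unsigned_complex_mult(a: int, sa: int, b: int, sb: int,
                          c: int, sc: int, d: int, sd: int) -> tuple[int, int, int, int]:
    """Magnitudes a, b, c, d with sign bits sa..sd (0 = positive, 1 = negative).

    Returns (rr, ssr, ri, ssi): real magnitude and sign, imaginary magnitude and sign.
    """
    e, f, g, h = a * c, b * d, a * d, b * c
    se = sa ^ sc              # sign of ac
    sf = 1 - (sb ^ sd)        # xnor: sign of the subtracted term -bd
    sg = sa ^ sd              # sign of ad
    sh = sb ^ sc              # sign of bc
    rr, ssr = combine(e, se, f, sf)
    ri, ssi = combine(g, sg, h, sh)
    return rr, ssr, ri, ssi


# (3 - 2j)(5 + 4j) = 23 + 2j
print(unsigned_complex_mult(3, 0, 2, 1, 5, 0, 4, 0))   # (23, 0, 2, 0)
```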

4.2 Signed Multiplication:-

Figure 4.4. Modified Complex Multiplier Block Diagram.

The block diagram above shows the modified complex multiplier, which consists of three multipliers and three adder/subtractor units. It requires one multiplier fewer than the previous technique, so it consumes less power. To perform signed multiplication, Booth’s radix algorithm is used. Booth’s radix algorithm generates fewer partial products than the normal multiplication algorithm, which reduces the switching activity of the multiplier and hence its power. It is based on the fact that a single partial product can be generated for a group of consecutive zeros or ones, which is called Booth’s recoding.

4.2.1 Modified Technique Recoding Algorithm for Radix-2 and Radix-4[1][2]:-

Parallel multiplication using the basic Booth recoding technique was explained in the previous section. Since that technique requires many adders, it requires more power and area. In the proposed multiplier design, we reduce the number of adders required for partial-product addition, and hence the vertical length of the partial products. This technique mainly reduces the correction bits, without compromising the correctness of the multiplication of 2’s complement numbers. A multiplexer-based Booth recoding scheme is used to reduce the length and width of the partial products.

With this change in scheme, the recoded partial products are always one bit longer than the input bit length for the Radix-2 scheme. Similarly, in the Radix-4 scheme, the recoded partial products are always two bits longer than the input bit length. These additional bits act as correction bits to obtain the correct value of the product. Also, in the hardware realization of Booth’s recoding scheme, the extra select line used at the time of recoding can be removed; this extra select line makes the multiplexer larger. Because the implicit bit appended to the right of the LSB is always ‘0’, we observed that if this extra bit is not considered in the hardware realization, the size of one multiplexer can be reduced. So, in Radix-2 the LSB alone decides the first partial product, and in Radix-4 the two LSBs decide the first partial product. These partial products are then added using the proposed array of adders to obtain the correct multiplication output. The working of this design is explained in the following sections.

Figure 4.5 Block Diagram of Modified Booth’s Recoding unit Multiplier[1]

To achieve signed-number multiplication, the partial products are generated using the Modified Booth’s Recoding Unit multiplication block. After the new partial products are generated, they are added using compressors and a parallel adder. The Modified Booth’s Recoding Unit of the multiplier is explained below.

4.2.2 Modified Booth’s Recoding Unit[3]:-

Partial products are generated using the Modified Booth’s Recoding Unit block. Using the same concept as the generation of partial products for the basic Booth’s recoding algorithm in the previous section, we generate partial products for the modified Booth’s recoding algorithm whose length exceeds the input bit length by one for the Radix-2 scheme and by two for the Radix-4 scheme.

This modified technique is explained below:-

Radix-2 Method:-

As we saw in Table 2.1, the output partial products are added and shifted according to the input sequence. Here, multiplexers are used to build the recoding unit. The select lines of the multiplexers are the input bits of the multiplier, and the outputs follow the modified table shown below, where NOT B(n-1) denotes the complement of the sign bit B(n-1):-

Ai | Ai-1 | Y | Explanation

0 | 0 | 0 | All 0’s

0 | 1 | 1.B | [ B(n-1) , B ]

1 | 0 | -1.B | [ NOT B(n-1) , (-B) ]

1 | 1 | 0 | All 0’s

Table 4.3 Modified Booth’s Recoding Algorithm Radix 2

This can be explained with a simple example:-

Suppose B => 1100 (-4)

A => 1010 (-6)

Using the table above, we obtain the following recoded partial products:-

PP0 => 0 0 0 0 0

PP1 => 0 0 1 0 0

PP2 => 1 1 1 0 0

PP3 => 0 0 1 0 0

Weighting PPi by 2^i and adding gives 0 + 8 - 16 + 32 = 24 = (-6)(-4).

Here, in the modified Booth’s recoding algorithm, one extra bit is added above the MSB of each partial product, as shown in the table. The hardware realization of this recoding unit is based on multiplexers and includes a 2’s complement unit. At the time of recoding, an extra bit ‘0’ is assumed to the right of the LSB of the input bit sequence, and this extra ‘0’ decides the first partial product according to the table above. We observed that in the hardware realization the LSB alone is sufficient to generate this partial product; because of this, the corresponding multiplexer becomes 2x1 rather than 4x1, while the other multiplexers remain the same as dictated by their select lines in the recoding scheme. Multiplexers are therefore the key hardware of the Booth’s recoding unit.
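The recoded rows of the example can be regenerated with a short Python sketch. It assumes each row is the (n+1)-bit two’s complement of digit·B with digit = A(i-1) - A(i); for the example above it reproduces PP0 to PP3 exactly. It models the values, not the multiplexer netlist, and the function names are hypothetical.

```python
def modified_booth_radix2_rows(multiplier: int, multiplicand: int, n: int) -> list[str]:
    """Recoded rows PP0..PP(n-1), each (n+1) bits wide, MSB first.

    Row i is the (n+1)-bit two's complement of digit_i * B, where
    digit_i = A(i-1) - A(i) as in Table 4.3. Row 0 depends on the LSB alone,
    because the implicit bit to its right is always 0.
    """
    width = n + 1
    mask = (1 << width) - 1
    rows = []
    for i in range(n):
        a_i = (multiplier >> i) & 1
        a_prev = (multiplier >> (i - 1)) & 1 if i > 0 else 0
        digit = a_prev - a_i
        rows.append(format((digit * multiplicand) & mask, f"0{width}b"))
    return rows


def rows_to_product(rows: list[str]) -> int:
    """Interpret each row as two's complement and add it with weight 2**i."""
    total = 0
    for i, row in enumerate(rows):
        value = int(row, 2)
        if row[0] == "1":
            value -= 1 << len(row)
        total += value << i
    return total


rows = modified_booth_radix2_rows(-6, -4, 4)   # A = 1010, B = 1100
for i, row in enumerate(rows):
    print(f"PP{i} => {' '.join(row)}")
print(rows_to_product(rows))                   # 24
```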

Radix-4 Method:-

The Radix-4 scheme is similar to the Radix-2 scheme above and is also used to reduce the number of partial products, so it is very useful for fast multiplication of long input bit sequences. Here, the partial products obtained from the recoding unit are always 2 bits longer than the input, so if the inputs are n bits, each partial product is (n+2) bits long.

Ai+1 | Ai | Ai-1 | Y | Explanation

0 | 0 | 0 | 0 | All 0’s

0 | 0 | 1 | 1.A | [A(n), A(n), A]

0 | 1 | 0 | 1.A | [A(n), A(n), A]

0 | 1 | 1 | 2.A | [A(n), A, 0]

1 | 0 | 0 | -2.A | [NOT A(n-1), -A, 0]

1 | 0 | 1 | -1.A | [NOT A(n-1), NOT A(n-1), -A]

1 | 1 | 0 | -1.A | [NOT A(n-1), NOT A(n-1), -A]

1 | 1 | 1 | 0 | All 0’s

Table 4.4 Modified Booth’s Recoding Algorithm Radix-4

The table above shows how the partial products are generated from the input bit sequence. Here, two extra bits are generated according to the input bits; these are correction bits needed to obtain the correct product. The MSBs of the partial products must be added carefully, so a new structure of adder array is introduced. This modification removes the large number of correction bits that would otherwise require more adders and hence more higher order compressors.
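A matching Python sketch for Radix-4, assuming each row is the (n+2)-bit two’s complement of digit·(multiplicand) with weight 4^k; the parameter `multiplicand` plays the role of A in Table 4.4. For an 8*8 multiplier it yields four 10-bit rows, consistent with the row count and width given for Figure 4.9. It reuses the Radix-4 example values of Section 2.5.3 (multiplier -5, multiplicand 46); names are hypothetical.

```python
def modified_booth_radix4_rows(multiplier: int, multiplicand: int, n: int) -> list[int]:
    """Radix-4 recoded rows, each an (n+2)-bit two's complement pattern.

    Group k uses bits (2k+1, 2k, 2k-1) of the multiplier with bit -1 = 0,
    giving digit = -2*b(2k+1) + b(2k) + b(2k-1); row k holds
    digit * multiplicand and carries weight 4**k. Assumes n is even.
    """
    width = n + 2
    mask = (1 << width) - 1

    def bit(i: int) -> int:
        return 0 if i < 0 else (multiplier >> i) & 1

    rows = []
    for k in range(n // 2):
        digit = -2 * bit(2 * k + 1) + bit(2 * k) + bit(2 * k - 1)
        rows.append((digit * multiplicand) & mask)
    return rows


def radix4_rows_to_product(rows: list[int], n: int) -> int:
    """Sign-interpret each (n+2)-bit row and add it with weight 4**k."""
    width = n + 2
    total = 0
    for k, row in enumerate(rows):
        value = row - (1 << width) if row >> (width - 1) else row
        total += value * 4**k
    return total


rows = modified_booth_radix4_rows(-5, 46, 8)    # 8*8: four rows of 10 bits
for k, row in enumerate(rows):
    print(f"row {k}: {row:010b}")
print(radix4_rows_to_product(rows, 8))          # -230
```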

4.3 Compressors and Adders:-

Recoding and Addition scheme for Radix-2 and Radix-4 for a four-bit input sequence [4] [5]:-

Figure 4.6 Addition scheme for Radix-2

The figure above shows the addition scheme for Radix-2, which has five-bit partial products. These partial products are added using the compressor scheme explained previously. Here, the value of m(0)(4) is added diagonally, i.e. it is added to the diagonal bit, which is the MSB of the second partial product and is also a correction bit. So, m(0)(4) is added to m(1)(4) and the result replaces m(1)(4). Similarly, this new MSB of the second partial-product row is added to the old MSB of the third partial product to obtain the new MSB of the third partial product, as shown in the figure above. After the new values of the correction bits are obtained, these bits are added using compressors.

Figure 4.7 Architecture of 8*8 Signed Multiplier for Radix-2 [5]

The figure above shows the architecture of the 8*8 signed multiplier for the Radix-2 scheme, where the partial products are generated using the Modified Booth’s Recoding Unit. Here, each row of partial products is 9 bits wide. In the first stage, the partial products are divided into vertical blocks, which are half adders, full adders and compressors of different orders: vertical blocks of 2 bits are half adders and vertical blocks of 3 bits are full adders. The outputs of these adders and compressors are arranged as explained in Chapter 3. The horizontal blocks are parallel adders used to generate the final multiplication result.

Figure 4.8 Addition scheme for Radix-4

The figure above shows the addition scheme for Radix-4, in which each partial product has six bits: the four LSBs come from the input sequence and the two MSBs are correction bits. Here, the MSB of the first row of partial products is added to both MSBs of the second row. In the modified Radix-4 scheme, the total number of partial-product rows is half that of the normal partial-product scheme. For example, a 4*4 multiplier has only two partial-product rows including the correction bits, i.e. half the rows of the original scheme, as shown in the figure above. Similarly, for wider multipliers using the Radix-4 scheme, the number of partial-product rows is half that of the original, which results in less switching activity and hence less power.

Figure 4.9 Architecture of 8*8 Signed Multiplier for Radix-4

The figure above shows the architecture of the 8*8 signed multiplier for the Radix-4 scheme, where the partial products are generated using the Modified Booth’s Recoding Unit. In this scheme, each partial product is 10 bits wide, i.e. two extra bits per row, as explained in the table of the Radix-4 scheme. The main advantage of the Radix-4 scheme is that the number of partial-product rows is half that of the Radix-2 method: in the 8*8 multiplier there are four partial-product rows, so fewer compressors are required, which means less switching activity and lower power.

-:References:-

[1] D. A. Pucknell, K. Eshraghian, Basic VLSI Design, Prentice-Hall, ISBN 81-203-0986-3.

[2] Israel Koren, Computer Arithmetic Algorithms, A. K. Peters Ltd., ISBN 1568811608.

[3] A. D. Booth (1951), “A signed binary multiplication technique”, Quarterly Journal of Mechanics and Applied Mathematics, vol. IV, pt. 2, 1951.

[4] C. H. Chang, J. Gu, M. Zhang (2004), “Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits”, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 51, no. 10, Oct. 2004, pp. 1985-1997.

[5] A. Dandapat, S. Ghosal, P. Sarkar, D. Mukhopadhyay (2009), “A 1.2-ns 16×16-Bit Binary Multiplier Using High Speed Compressors”, International Journal of Electrical, Computer, and Systems Engineering, 2009, pp. 234-239.


Chapter 5.

Results and Discussion

5.1 Behavioral Simulation

5.2 Synthesis Report

5.3 Power Calculation

5.4 Layout

This chapter presents the results for the different blocks used to implement the complex multiplier: simulation results, synthesis reports and power calculations of the different blocks. The power of each design is calculated by applying 100 random inputs. The test bench is written in VHDL and uses textio: inputs are read from an input file called infile and outputs are written to an output file called outfile. All designs below were simulated using ModelSim XE III 6.2g and synthesized using Xilinx ISE Project Navigator 9.1i, and power was calculated using the Xilinx XPower tool. Power was also calculated using the ASIC Encounter synthesis tool.

5.1 Behavioral Simulation:-

i) Unsigned Basic Multiplier 16*16:-

Figure 5.1 Behavioral Simulation of Unsigned 16*16 Basic Multiplier

The figure above shows the simulation of the 16*16 unsigned multiplier. The inputs ‘a’ and ‘b’ are 16 bits each, and ‘z’ is the 32-bit output. Because this is an unsigned multiplier, the input range is 0 to 65535, and negative numbers are not considered. As shown in the simulation, when both ‘a’ and ‘b’ are set to unsigned 7, i.e. “0000000000000111” in binary, the output ‘z’ is 49. Now consider the maximum 16-bit unsigned value, 65535, which consists of all 1’s (“1111111111111111”). For this input the output ‘z’ is 4294836225, the largest product a 16*16 unsigned multiplier can produce.
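The two test cases above can be checked against a bit-level reference model. The minimal Python sketch below, with hypothetical function names, forms the 16 partial-product rows of an unsigned array multiplier and adds them. Row i is ‘a’ ANDed with bit i of ‘b’, shifted left by i. The compressor tree in the design reduces these same rows, but this model only sums them and says nothing about delay or power.

```python
def partial_products(a: int, b: int, n: int = 16) -> list[int]:
    """Rows of an n x n unsigned array multiplier: row i = (a AND b[i]) << i."""
    return [(a if (b >> i) & 1 else 0) << i for i in range(n)]


def unsigned_multiply(a: int, b: int, n: int = 16) -> int:
    if not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError(f"inputs must be unsigned {n}-bit values")
    return sum(partial_products(a, b, n))


for a, b in [(7, 7), (65535, 65535)]:
    z = unsigned_multiply(a, b)
    print(a, b, z, format(z, "032b"))
```

The largest product, 4294836225 = (2^16 - 1)^2 = 0xFFFE0001, fits in the 32-bit output ‘z’.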

ii) Unsigned Complex Multiplier 16*16:-

Figure 5.2 Behavioral Simulation of 16*16 Unsigned Complex Multiplier.

The figure above shows the waveform of the 16*16 complex multiplier for unsigned numbers. There are four 16-bit inputs: ‘a’, ‘b’, ‘c’ and ‘d’. Because the inputs are unsigned magnitudes, the sign of each number must be entered separately: ‘sa’ for input ‘a’, ‘sb’ for input ‘b’, ‘sc’ for input ‘c’ and ‘sd’ for input ‘d’.

The output of the complex multiplier follows the block diagram of unsigned complex multiplication explained in Section 4.1, and the waveform above illustrates its operation.
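Mathematically, the complex product is (a + ib)(c + id) = (ac - bd) + i(ad + bc). In a sign-magnitude scheme, each real product can be formed from the magnitudes by an unsigned multiplier, and its sign is the XOR of the two operands' sign bits. The Python sketch below shows that arithmetic under two assumptions. First, ‘a’ and ‘c’ are the real parts and ‘b’ and ‘d’ the imaginary parts, as in the (1+2i)(3+4i) example for the signed multiplier. Second, a sign bit of 1 means negative. It is a hypothetical reference model, not the datapath of Section 4.1.

```python
def sm_product(m1: int, s1: int, m2: int, s2: int) -> int:
    """Signed value of a sign-magnitude product: |p| = m1*m2, sign = s1 XOR s2."""
    magnitude = m1 * m2            # what an unsigned 16*16 multiplier computes
    return -magnitude if s1 ^ s2 else magnitude


def complex_multiply_sm(a, sa, b, sb, c, sc, d, sd):
    """(a + ib)(c + id) with unsigned 16-bit magnitudes and separate sign bits.

    Assumption: a sign bit of 1 marks a negative operand.
    Returns (real, imaginary) as ordinary Python integers.
    """
    for m in (a, b, c, d):
        if not 0 <= m < (1 << 16):
            raise ValueError("magnitudes must be unsigned 16-bit values")
    real = sm_product(a, sa, c, sc) - sm_product(b, sb, d, sd)   # ac - bd
    imag = sm_product(a, sa, d, sd) + sm_product(b, sb, c, sc)   # ad + bc
    return real, imag


print(complex_multiply_sm(1, 0, 2, 0, 3, 0, 4, 0))   # (-5, 10)
print(complex_multiply_sm(1, 0, 2, 1, 3, 0, 4, 0))   # (11, -2), i.e. (1 - 2i)(3 + 4i)
```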

iii) Signed Multiplier 16*16:-

a) Radix-2:-

Figure 5.3 Behavioral Simulation of 16*16 Basic Signed Multiplier

The figure above shows the behavioral simulation of the 16*16 basic signed multiplier. In this scheme the inputs ‘a’ and ‘b’ are entered as signed values. The inputs are 16 bits wide, while the output ‘x’ is 32 bits wide. The input range is -32768 to +32767, and since this is a signed multiplier, both positive and negative numbers are handled.

Unlike the unsigned scheme, no separate sign input is needed: negative numbers are entered in 2’s complement form. Suppose ‘a’ and ‘b’ are 7 and -7 respectively. ‘a’ is positive, so it is entered as “0000000000000111”. ‘b’ is negative, so -7 is entered in 2’s complement form as “1111111111111001”. The result ‘z’ is -49, which in 32-bit 2’s complement form is “11111111111111111111111111001111” (its lower 16 bits are “1111111111001111”).
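The encodings in this example can be reproduced with two small helpers. The Python sketch below uses hypothetical names and models only the expected values, not the multiplier hardware. `to_twos` rejects values outside the width's signed range, which for 16 bits is the -32768 to +32767 range stated above.

```python
def to_twos(value: int, width: int) -> str:
    """Bit string of value in width-bit two's complement."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value & ((1 << width) - 1), f"0{width}b")


def from_twos(bits: str) -> int:
    """Signed value of a two's-complement bit string (MSB is the sign bit)."""
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value


a_bits = to_twos(7, 16)    # "0000000000000111"
b_bits = to_twos(-7, 16)   # "1111111111111001"
z_bits = to_twos(from_twos(a_bits) * from_twos(b_bits), 32)
print(z_bits, from_twos(z_bits))   # 11111111111111111111111111001111 -49
```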

b) Radix-4:-

The Radix-4 design gives the same simulation results as the Radix-2 scheme. The two schemes differ only in their synthesis reports.
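The identical simulation results follow from the arithmetic. Both schemes sum partial products that add up to exactly a*b; radix-4 simply uses half as many rows. The Python sketch below assumes 16-bit two's-complement operands and uses hypothetical names. It builds the 16 radix-2 rows, with the sign bit of ‘b’ weighted by -2^15, and the 8 radix-4 (modified Booth) rows, then checks both against a*b on a set of sample operands.

```python
WIDTH = 16


def radix2_rows(a: int, b: int, n: int = WIDTH) -> list[int]:
    """Signed radix-2 rows: bit i of b has weight 2**i, except the sign bit (-2**(n-1))."""
    bits = b & ((1 << n) - 1)
    rows = [a << i if (bits >> i) & 1 else 0 for i in range(n - 1)]
    rows.append(-(a << (n - 1)) if (bits >> (n - 1)) & 1 else 0)
    return rows


def radix4_rows(a: int, b: int, n: int = WIDTH) -> list[int]:
    """Modified Booth rows: digit d = -2*b[i+1] + b[i] + b[i-1], weight 4**(i/2)."""
    bits = (b & ((1 << n) - 1)) << 1      # append the implicit b[-1] = 0
    rows = []
    for i in range(0, n, 2):
        window = (bits >> i) & 0b111      # b[i+1], b[i], b[i-1]
        d = -2 * (window >> 2) + ((window >> 1) & 1) + (window & 1)
        rows.append(d * a * 4 ** (i // 2))
    return rows


samples = [-32768, -12345, -7, -1, 0, 1, 7, 12345, 32767]
for a in samples:
    for b in samples:
        assert sum(radix2_rows(a, b)) == sum(radix4_rows(a, b)) == a * b
print(len(radix2_rows(7, -7)), len(radix4_rows(7, -7)))   # 16 8
```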

iv) Signed 16*16 Complex Multiplier:-

a) Radix-2:-

Figure 5.4 Behavioral Simulation of 16*16 Complex Signed Multiplier

The figure above shows the behavioral simulation of the 16*16 signed complex multiplier. In this scheme the inputs ‘a’, ‘b’, ‘c’ and ‘d’ are entered directly as positive or negative (2’s complement) values, so no separate sign bits are needed.

The number range and format are the same as discussed in the previous section. Consider the first example, where a=1, b=2, c=3 and d=4. All four numbers are positive, so they are entered as ordinary weighted binary values. The product is (1+2i)(3+4i) = (1*3 - 2*4) + (1*4 + 2*3)i = -5+10i, so the real part is -5 and the imaginary part is +10. In 32-bit binary, -5 is “11111111111111111111111111111011” in 2’s complement form and +10 is “00000000000000000000000000001010”.
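The 32-bit encodings of a signed complex product can be checked with a short reference model. The Python sketch below uses hypothetical function names and assumes ‘a’ and ‘c’ are real parts and ‘b’ and ‘d’ imaginary parts. It computes (a + ib)(c + id) with four real products, one subtraction and one addition, and prints each part in 32-bit two's complement form.

```python
def to_twos(value: int, width: int = 32) -> str:
    """width-bit two's-complement bit string of value."""
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value & ((1 << width) - 1), f"0{width}b")


def complex_multiply(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """(a + ib)(c + id) for signed 16-bit a..d: four real products, one subtract, one add."""
    for v in (a, b, c, d):
        if not -32768 <= v <= 32767:
            raise ValueError("inputs must be signed 16-bit values")
    return a * c - b * d, a * d + b * c


real, imag = complex_multiply(1, 2, 3, 4)
print(real, to_twos(real))   # -5 11111111111111111111111111111011
print(imag, to_twos(imag))   # 10 00000000000000000000000000001010
```

`to_twos` raises ValueError rather than silently wrapping when a result does not fit in 32 bits.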

b) Radix-4:-

The behavioral simulation of the Radix-4 complex multiplier is the same as that of the Radix-2 scheme.

5.2 Synthesis Report:-

i) Unsigned Basic Multiplier 16*16:-

Design Summary:-

a) Xilinx FPGA xc2vp20-5ff1152:-

Logic Utilization:

Number of 4 input LUTs: 714 out of 18,560 3%

Logic Distribution:

Number of occupied Slices: 405 out of 9,280 4%

Number of Slices containing only related logic: 405 out of 405 100%

Number of Slices containing unrelated logic: 0 out of 405 0%

Total Number of 4 input LUTs: 714 out of 18,560 3%

Number of bonded IOBs: 64 out of 564 11%

Total equivalent gate count for design: 4,287

Combinational Path Delay:- 34.009 ns

ii) Unsigned Complex Multiplier 16*16:-

Design Summary:-

Xilinx FPGA xc2vp20-5ff1152:-

Logic Utilization:

Number of Slice Latches: 2 out of 18,560 1%

Number of 4 input LUTs: 3,422 out of 18,560 18%

Logic Distribution:

Number of occupied Slices: 1,891 out of 9,280 20%

Number of Slices containing only related logic: 1,891 out of 1,891 100%

Number of Slices containing unrelated logic: 0 out of 1,891 0%

Total Number of 4 input LUTs: 3,422 out of 18,560 18%

Number of bonded IOBs: 136 out of 564 24%

IOB Latches: 66

Total equivalent gate count for design: 21,760

Combinational Path Delay:- 41.271 ns

iii) Signed Basic Multiplier 16*16 Radix-2:-

Design Summary

Xilinx FPGA xc2vp20-5ff1152:-

Logic Utilization:

Number of 4 input LUTs: 811 out of 18,560 4%

Logic Distribution:

Number of occupied Slices: 468 out of 9,280 5%

Number of Slices containing only related logic: 468 out of 468 100%

Number of Slices containing unrelated logic: 0 out of 468 0%

Total Number of 4 input LUTs: 812 out of 18,560 4%

Number used as logic: 811

Number used as a route-thru: 1

Number of bonded IOBs: 64 out of 564 11%

Total equivalent gate count for design: 4,980

Combinational Path Delay:- 35.432 ns

iv) Signed Basic Multiplier 16*16 Radix-4:-

Design Summary

Xilinx FPGA xc2vp20-5ff1152:-

Logic Utilization:

Number of 4 input LUTs: 705 out of 18,560 3%

Logic Distribution:

Number of occupied Slices: 392 out of 9,280 4%

Number of Slices containing only related logic: 392 out of 392 100%

Number of Slices containing unrelated logic: 0 out of 392 0%

Total Number of 4 input LUTs: 707 out of 18,560 3%

Number used as logic: 705

Number used as a route-thru: 2

Number of bonded IOBs: 63 out of 564 11%

Total equivalent gate count for design: 4,422

Combinational Path Delay:- 35.858 ns

v) Signed 16*16 Complex Multiplier Radix-2:-

Design Summary:-

Xilinx FPGA xc2vp20-5ff1152:-

Logic Utilization:

Number of 4 input LUTs: 3,903 out of 18,560 21%

Logic Distribution:

Number of occupied Slices: 2,238 out of 9,280 24%

Number of Slices containing only related logic: 2,238 out of 2,238 100%

Number of Slices containing unrelated logic: 0 out of 2,238 0%

Total Number of 4 input LUTs: 3,908 out of 18,560 21%

Number used as logic: 3,903

Number used as a route-thru: 5

Number of bonded IOBs: 126 out of 564 22%

Total equivalent gate count for design: 24,231

Combinational Path Delay:- 58.181 ns

vi) Signed 16*16 Complex Multiplier Radix-4:-

Design Summary:-

Xilinx FPGA xc2vp20-5ff1152:-

Logic Utilization:

Number of 4 input LUTs: 3,195 out of 18,560 17%

Logic Distribution:

Number of occupied Slices: 1,758 out of 9,280 18%

Number of Slices containing only related logic: 1,758 out of 1,758 100%

Number of Slices containing unrelated logic: 0 out of 1,758 0%

Total Number of 4 input LUTs: 3,200 out of 18,560 17%

Number used as logic: 3,195

Number used as a route-thru: 5

Number of bonded IOBs: 126 out of 564 22%

Total equivalent gate count for design: 20,301

Combinational Path Delay:- 57.847 ns

5.3 Power Calculation:-

i) Unsigned Basic Multiplier 16*16:-

a) Xilinx FPGA xc2vp20-5ff1152:-

Dynamic Power:- 52.68 mW

Static Power:- 540.72 mW

Power-Delay Product:- 1.79 nJ

b) ASIC Encounter Synthesis:-

Number of Cells:- 668 out of 549815

Dynamic Power:- 18.97 mW

ii) Unsigned Complex Multiplier 16*16:-

a) Xilinx FPGA xc2vp20-5ff1152:-

Dynamic Power:- 6486.61 mW

Static Power:- 7248.75 mW

Power-Delay Product:- 267.7 nJ

iii) Signed Basic Multiplier 16*16 Radix-2:-

a) Xilinx FPGA xc2vp20-5ff1152:-

Dynamic Power:- 87.34 mW

Static Power:- 554.68 mW

Power-Delay Product:- 3.09 nJ

b) ASIC Encounter Synthesis:-

Number of Cells:- 2818 out of 75981

Dynamic Power:- 3.84 mW

iv) Signed Basic Multiplier 16*16 Radix-4:-

a) Xilinx FPGA xc2vp20-5ff1152:-

Dynamic Power:- 81.21 mW

Static Power:- 464.07 mW

Power-Delay Product:- 2.9 nJ

b) ASIC Encounter Synthesis:-

Number of Cells:- 653 out of 17774

Dynamic Power:- 2.83 mW

v) Signed Complex Multiplier 16*16 Radix-2:-

a) Xilinx FPGA xc2vp20-5ff1152:-

Dynamic Power:- 80.78 mW

Static Power:- 951.67 mW

Power-Delay Product:- 4.69 nJ

b) ASIC Encounter Synthesis:-

Number of Cells:- 3509 out of 115564

Dynamic Power:- 25.63 mW

vi) Signed Complex Multiplier 16*16 Radix-4:-

a) Xilinx FPGA xc2vp20-5ff1152:-

Dynamic Power:- 80.78 mW

Static Power:- 951.67 mW

Power-Delay Product:- 4.69 nJ

b) ASIC Encounter Synthesis:-

Number of Cells:- 1621 out of 46147

Dynamic Power:- 10.48 mW
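For the basic multipliers and the unsigned complex multiplier, the reported FPGA power-delay products match, to within rounding, the dynamic power multiplied by the combinational path delay from Section 5.2. For example, 52.68 mW × 34.009 ns ≈ 1.79 nJ. The Python sketch below assumes that definition, with static power excluded, and recomputes the product for every FPGA result. The table layout and function name are illustrative assumptions; the numbers are those reported in Sections 5.2 and 5.3.

```python
# (dynamic power in mW, combinational path delay in ns) from Sections 5.2 and 5.3
FPGA_RESULTS = {
    "Unsigned Basic Multiplier 16*16": (52.68, 34.009),
    "Unsigned Complex Multiplier 16*16": (6486.61, 41.271),
    "Signed Basic Multiplier 16*16 Radix-2": (87.34, 35.432),
    "Signed Basic Multiplier 16*16 Radix-4": (81.21, 35.858),
    "Signed Complex Multiplier 16*16 Radix-2": (80.78, 58.181),
    "Signed Complex Multiplier 16*16 Radix-4": (80.78, 57.847),
}


def power_delay_product_nj(power_mw: float, delay_ns: float) -> float:
    """Energy per operation estimate: mW * ns = pJ, so divide by 1000 for nJ."""
    return power_mw * delay_ns / 1000.0


for name, (p_mw, t_ns) in FPGA_RESULTS.items():
    print(f"{name:42s} {power_delay_product_nj(p_mw, t_ns):8.2f} nJ")
```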

5.4 Layout:-

Signed Complex Multiplier 16*16:-

Chapter 6.

Conclusion and Future Work

6.1 Conclusion

6.2 Future Work

This chapter summarizes the conclusions of the design and outlines future work.

6.1 Conclusion:-

A parallel complex multiplier using compressors of different orders has been presented. Compressors are used to reduce the switching activity and propagation delay of the multipliers. They also shorten the vertical critical path and hence reduce the number of partial-product reduction stages. Optimal use of all thirteen types of compressors improves both the speed and the power performance of the multiplier. Because both delay and power are reduced, the power-delay product is reduced as well.

Results are obtained for both FPGA and ASIC implementations. The FPGA used in our design is the xc2vp20-5ff1152, on which the synthesis reports and power figures for all multipliers were obtained. For the ASIC design, the Encounter synthesis tool was used to obtain the hardware information and power of all multipliers. It is found that the signed multipliers have less area and lower power than the unsigned multipliers.

6.2 Future Work:-

Complex multipliers of greater bit width can be implemented using these compressors. Higher-order compressors can be designed to reduce the vertical height of the partial-product columns in wider multipliers, so that lower power can be achieved. These complex multipliers can also be used to implement FFT/IFFT designs for DSP applications.

SUMMARY:-

To evaluate the performance of the low-power complex multiplier using the compressor technique, we implemented all the designs on the Xilinx xc2vp20-5ff1152 FPGA. We compare the proposed compressor-based complex multiplier with the multipliers from the literature explained in Chapter 2. The table below shows the dynamic power, combinational path delay and speed-power product of all the multipliers.
