final project report

Design and Verification of 8X8 Vedic Multiplier using 90nm CMOS Process Technology

Huan Wang

Department of Electrical and Computer Engineering

University of Massachusetts

Lowell, MA 01854, USA

[email protected]

Riddhi Shah

Department of Electrical and Computer Engineering

University of Massachusetts

Lowell, MA 01854, USA

[email protected]

Abstract—A previous paper described a modified carry select adder (CSA) in Verilog. We port this design to the transistor level. This CSA is considered to be the fastest adder among conventional adder configurations. A multiplier is a very important element in almost all processors and contributes substantially to the total power consumption of the system. The novelty is the efficient use of a Vedic algorithm (sutra) that considerably reduces the number of computational steps compared with the traditional method. The schematic for this multiplier is designed in Cadence and then verified in Virtuoso using a 90nm CMOS technology library. Finally, we describe an ideal multiplier in Verilog and use it to verify our transistor-level design. The paper presents a systematic design methodology for this improved-performance digital multiplier based on Vedic mathematics.

Keywords— Multiplier, Vedic Multiplier, Ripple Carry Adder

I. INTRODUCTION

The multiplier is one of the most important structures in any modern processor. A binary multiplier is an electronic circuit used in digital circuits to multiply two binary numbers. A variety of computer arithmetic techniques can be used to implement a digital multiplier. Most techniques involve computing a set of partial products and then summing the partial products together [1]. This process resembles long multiplication on decimal integers, but has been modified here for application to a binary number system. As more transistors per chip became available due to larger-scale integration, it became possible to put enough adders on a single chip to sum all the partial products at once, rather than reuse a single adder to handle each partial product one at a time.

Because common digital signal processing algorithms spend most of their time multiplying, processors devote a lot of chip area to making multiplication as fast as possible. Hence the non-conventional yet very efficient Vedic mathematics is used here to build a high-performance multiplier. Vedic mathematics deals mainly with various Vedic mathematical formulae and their applications for carrying out large arithmetical operations easily [2]. Power consumption and speed are the metrics compared against existing digital multiplier designs.

II. VEDIC MULTIPLICATION ALGORITHM

A. The Vedic Sutras

Vedic algorithms are divided into 16 sutras (algorithms) covering the various branches of mathematics [3], two of which are for multiplication:

1. Nikhilam Navatashcaramam Dashatah – All from 9 and the last from 10.

2. Urdhva-Tiryagbhyam – Vertically and crosswise.

This paper is based on the Urdhva-Tiryagbhyam (UT) sutra of Vedic multiplication, which is the most general method for multiplication. Here the sutra is applied to binary multiplication to build the digital multiplier. It is also called the “Vertically and Crosswise” method of multiplication. An illustration of this multiplication algorithm is shown in Fig-1 below. In digital hardware, a Vedic multiplier is expected to be more power efficient and faster because fewer steps are required for multiplication. There are also nearly no limitations attached to this multiplication algorithm.

B. Example for a General Multiplicand using Vedic Mathematics

Fig-1 shows the generalized line diagram for the UT algorithm. The algorithm can be used in all cases, such as decimal or binary multiplicands [4]. All the multiplications are done in vertical and crosswise directions, requiring only 7 steps to multiply two 4-bit numbers.

Fig-1 Line diagram for UT algorithm
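The vertical and crosswise steps can be sketched in a few lines of Python. This is only an illustrative model, not the circuit described in this paper: the function names (urdhva_tiryagbhyam, to_digits, from_digits) and the least-significant-digit-first ordering are assumptions made for the example. Step k adds every cross product whose digit positions sum to k, so two n-digit operands need 2n - 1 steps, which gives the 7 steps quoted above for n = 4.

```python
def urdhva_tiryagbhyam(a_digits, b_digits, base=10):
    """Multiply two equal-length digit lists (least significant digit first)."""
    n = len(a_digits)
    assert len(b_digits) == n, "operands must have the same number of digits"
    result = []
    carry = 0
    # 2n - 1 steps: step k sums all cross products a[i] * b[k - i].
    for k in range(2 * n - 1):
        column = carry
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            column += a_digits[i] * b_digits[k - i]
        result.append(column % base)   # digit kept at position k
        carry = column // base         # carry forwarded to step k + 1
    # Any remaining carry becomes the most significant digits.
    while carry:
        result.append(carry % base)
        carry //= base
    return result


def to_digits(value, n, base=10):
    """Split value into n digits, least significant first."""
    return [(value // base**i) % base for i in range(n)]


def from_digits(digits, base=10):
    """Inverse of to_digits."""
    return sum(d * base**i for i, d in enumerate(digits))


if __name__ == "__main__":
    product = urdhva_tiryagbhyam(to_digits(1234, 4), to_digits(5678, 4))
    print(from_digits(product) == 1234 * 5678)
```

The same routine covers the binary case by passing base=2, which is the form used for the hardware multiplier.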

III. VLSI TECHNOLOGY USING CMOS LOGIC

Large integrated circuits can be constructed using CMOS logic with very low static power consumption. The increasing demand for low-power very large scale integration (VLSI) can be addressed at different design levels, such as the architectural, circuit, layout, and process technology levels. At the circuit design level, considerable potential for power savings exists through the proper choice of a logic style for implementing combinational circuits. This is because all the important parameters governing power dissipation (switching capacitance, transition activity, and short-circuit currents) are strongly influenced by the chosen logic style. Depending on the application, the kind of circuit to be implemented, and the design technique used, different performance aspects become important. In the past, parameters such as high speed, small area, and low cost were the major areas of concern, whereas power considerations are now gaining the attention of the VLSI design community. In recent years, the growth of personal computing devices (portable computers and real-time audio- and video-based multimedia applications) and wireless communication systems has made power dissipation a most critical design parameter [5]. In the absence of low-power design techniques, such applications generally suffer from very short battery life, while packaging and cooling them would be very difficult, which leads to an unavoidable increase in the cost of the product.

In a multiplier, reliability is strongly affected by power consumption. Usually, high power dissipation implies high-temperature operation, which in turn tends to induce several failure mechanisms in the system [6]. Power dissipation is the most critical parameter for portability and mobility, and it is classified into dynamic and static power dissipation. Dynamic power dissipation occurs when the circuit is operational, while static power dissipation becomes an issue when the circuit is inactive or in a power-down mode. There are three major sources of power dissipation in digital CMOS circuits, which are summarized in equation (1):

$P_{avg} = P_{switching} + P_{short\text{-}circuit} + P_{leakage}$ (1)

The first term represents the switching component of power. The second term is due to the direct-path short-circuit current, which arises when both the NMOS and PMOS transistors are simultaneously active, conducting current directly from supply to ground. Finally, leakage current, which can arise from substrate injection and sub-threshold effects, is primarily determined by fabrication technology considerations. The switching power dissipation in CMOS digital integrated circuits is a strong function of the power supply voltage (Vdd), so reducing Vdd emerges as a very effective means of limiting the power consumption. However, the saving in power dissipation comes at a significant cost in terms of increased circuit delay. Since the exact analysis of propagation delay is quite complex, a simple first-order derivation can be used to show the relation between power supply voltage and delay time [7].

$T_d = \dfrac{C_L \, V_{dd}}{K \, (V_{dd} - V_{th})^{\alpha}}$ (2)
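The trade-off between supply voltage and delay can be explored numerically. The sketch below reads equation (2) in the usual alpha-power form, Td = CL * Vdd / (K * (Vdd - Vth)^alpha). Every parameter value in it (load capacitance, K, threshold voltage, alpha, and the 1.2 V reference supply) is a placeholder chosen for illustration, not a value measured for the 90nm library used in this work.

```python
def gate_delay(vdd, c_load=10e-15, k=1e-4, vth=0.3, alpha=1.3):
    """First-order delay from equation (2).

    All default parameters are illustrative assumptions, not process data.
    """
    if vdd <= vth:
        raise ValueError("the model is only meaningful for vdd > vth")
    return c_load * vdd / (k * (vdd - vth) ** alpha)


def relative_delay(vdd, vdd_ref=1.2, **params):
    """Delay at vdd normalized to the delay at the reference supply."""
    return gate_delay(vdd, **params) / gate_delay(vdd_ref, **params)


if __name__ == "__main__":
    for vdd in (1.2, 1.0, 0.8, 0.6):
        print(f"Vdd = {vdd:.1f} V -> relative delay {relative_delay(vdd):.2f}")
```

With these placeholder values the normalized delay grows as Vdd is lowered toward Vth, which is the cost in circuit delay that the text describes.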

IV. MODIFIED MULTIPLIER ARCHITECTURE

The architectures for the 2×2, 4×4, and 8×8 bit modules are discussed in this section. The technique used throughout is the UT (Vertically and Crosswise) sutra.

A. 2X2 Vedic Multiplier Design

To show how it works, consider two 2-bit numbers, A = a1a0 and B = b1b0. First, the least significant bit (LSB) of the final product is obtained vertically as the product of the LSBs of A and B, which is a0b0. The second step takes the products in a crosswise manner: the LSB of the first number A (the multiplicand) is multiplied with the next higher bit of the second number B (the multiplier), the next higher bit of A is multiplied with the LSB of B, and the two products are added. The output generated is one carry bit and one bit used in the result, as shown below. The next step takes the product of the two most significant bits (MSBs) and adds the previously obtained carry to it. The sum bit obtained is the third bit of the final result, and the final carry is the fourth bit [8].

s0 = a0b0 (3)

c1s1 = a1b0+ a0b1 (4)

c2s2 = c1 + a1b1 (5)

The result of the 2X2 multiplier is c2s2s1s0. The 2X2 multiplier is built from two half adders. The figures below show the schematic designs of the half adder and the 2X2 multiplier in Cadence.

Fig-3 Half Adder Block Design

Fig-4 2X2 Vedic Multiplier Block Design

Fig-6 Simulation Result for 2X2 Vedic Multiplier
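The 2X2 equations can be checked with a small bit-level model. This is a behavioural sketch in Python, not the Cadence schematic. The names half_adder and vedic_2x2 are chosen for the example, and the AND operations stand in for the gates that form the partial products.

```python
def half_adder(x, y):
    """Return (sum, carry) for two single bits."""
    return x ^ y, x & y


def vedic_2x2(a, b):
    """2-bit by 2-bit multiply using the vertical and crosswise steps."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    s0 = a0 & b0                            # vertical: s0 = a0b0
    s1, c1 = half_adder(a1 & b0, a0 & b1)   # crosswise: c1s1 = a1b0 + a0b1
    s2, c2 = half_adder(c1, a1 & b1)        # vertical: c2s2 = c1 + a1b1
    return (c2 << 3) | (s2 << 2) | (s1 << 1) | s0


if __name__ == "__main__":
    ok = all(vedic_2x2(a, b) == a * b for a in range(4) for b in range(4))
    print("all 16 input pairs match:", ok)
```

Because there are only 16 input combinations, the exhaustive loop at the end is enough to confirm the equations.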

B. 4X4 Vedic Multiplier Design

This part introduces how the 4X4 multiplier works. Assume we have two numbers, A = A3A2A1A0 and B = B3B2B1B0. The procedure can be seen in the block design (Fig-9) below. The final product is C6S6S5S4S3S2S1S0. The partial products are calculated in parallel, so the delay is greatly reduced as the number of bits increases. The least significant bit (LSB) S0 is obtained directly by multiplying the LSBs of the multiplier and the multiplicand [8]. The following equations show how the multiplier carries out the algorithm.

S0 = A0B0 (6)

C1S1 = A1B0 + A0B1 (7)

C2S2 = C1 + A0B2 + A2B0 + A1B1 (8)

C3S3 = C2 + A0B3 + A3B0 + A1B2 + A2B1 (9)

C4S4 = C3 + A1B3 + A3B1 + A2B2 (10)

C5S5 = C4 + A3B2 + A2B3 (11)

C6S6 = C5 + A3B3 (12)
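Equations (6) to (12) can be evaluated column by column. The Python sketch below is an illustrative model with assumed names (vedic_4x4_columns, vedic_4x4). It returns the (Ck, Sk) pair of every column so that the carries can be inspected. In this model Sk is the low bit of the column sum and Ck is everything above it, so an intermediate carry can be wider than one bit when many partial products fall in the same column.

```python
def vedic_4x4_columns(a, b):
    """Return [(C0, S0), ..., (C6, S6)] following equations (6) to (12)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    columns = []
    carry = 0
    for k in range(7):
        # Sum the previous carry and every partial product AiBj with i + j == k.
        total = carry + sum(
            a_bits[i] & b_bits[k - i] for i in range(4) if 0 <= k - i < 4
        )
        carry = total >> 1          # Ck, forwarded to column k + 1
        columns.append((carry, total & 1))
    return columns


def vedic_4x4(a, b):
    """Assemble the product C6 S6 S5 S4 S3 S2 S1 S0."""
    columns = vedic_4x4_columns(a, b)
    product = sum(s << k for k, (_, s) in enumerate(columns))
    return product | (columns[6][0] << 7)


if __name__ == "__main__":
    ok = all(vedic_4x4(a, b) == a * b for a in range(16) for b in range(16))
    print("all 256 input pairs match:", ok)
    print("largest carry for 15 x 15:", max(c for c, _ in vedic_4x4_columns(15, 15)))
```

The multi-bit carries are one reason a hardware realization needs adders between the columns rather than single gates.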

Fig-7 Full Adder Block Design

Fig-8 4-bit Ripple Carry Adder

Fig-9 4X4 Multiplier Block Design

Fig-10 Full Adder Simulation Result

Fig-11 4X4 Vedic Multiplier Simulation Result

In the ripple carry adder arrangement, the carry generated by the first ripple carry adder is passed on to the next ripple carry adder, and the second ripple carry adder has two zero inputs. The arrangement of the ripple carry adders in Fig-9 reduces the computational time and therefore the delay.
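A ripple carry adder chains full adders so that each stage waits for the carry of the previous one. The Python model below is a behavioural sketch with assumed names (full_adder, ripple_carry_add). It illustrates the carry chain only and says nothing about the transistor-level timing of the adders in Fig-7 and Fig-8.

```python
def full_adder(x, y, cin):
    """Return (sum, carry_out) for three single bits."""
    s = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return s, cout


def ripple_carry_add(a, b, width=4, cin=0):
    """Add two width-bit numbers; return (sum mod 2**width, carry_out)."""
    total = 0
    carry = cin
    for i in range(width):
        # The carry from bit i ripples into bit i + 1.
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


if __name__ == "__main__":
    total, carry = ripple_carry_add(0b1011, 0b0110)
    print(f"sum = {total:04b}, carry out = {carry}")
```

The carry output of one adder can be fed to the cin argument of the next call, which mirrors how the carry is passed from one ripple carry adder to the next.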

C. 8X8 Vedic Multiplier Design

This part discusses the 8X8 Vedic multiplier design. Assume we have two numbers, A = a7a6a5a4a3a2a1a0 and B = b7b6b5b4b3b2b1b0. The procedure is explained by the following design figures. The final product is S15S14S13S12S11S10S9S8S7S6S5S4S3S2S1S0. The partial products are calculated in parallel, so the delay is greatly reduced as the number of bits increases. The least significant bit (LSB) S0 is obtained directly by multiplying the LSBs of the multiplier and the multiplicand. The multiplication follows the steps shown in the line diagram (Fig-1). At each step a result bit (Sn) and a carry (Cn) are obtained, and the carry of each stage is forwarded to the next stage until the process completes [8].

Fig-12 8X8 Vedic Multiplier Block Design

Fig-13 8X8 Vedic Multiplier Simulation Result.

Consider the 8x8 block design shown above (Fig-12). It contains four 4x4 Vedic multiplier modules and three 8-bit modified carry select adders. The 8-bit modified carry select adders add two 8-bit operands at the intermediate stages of the multiplier. The carry generated by the first modified carry select adder is passed on to the next modified carry select adder, and the second modified carry select adder has four zero inputs. This arrangement of the modified carry select adders reduces the computational time and therefore the delay [8].
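The way four 4x4 products combine into an 8x8 product can be sketched as follows. This Python model is an assumption-laden illustration: the function name vedic_8x8, the mul4 parameter, and the exact order of the three additions are chosen for the example and are not taken from Fig-12. Plain integer addition stands in for the modified carry select adders, so the sketch shows the data flow only, not the adder structure or its delay.

```python
def vedic_8x8(a, b, mul4=lambda x, y: x * y):
    """8-bit by 8-bit multiply built from four 4x4 products and three adds.

    mul4 is a placeholder for a 4x4 Vedic multiplier module.
    """
    a_lo, a_hi = a & 0xF, (a >> 4) & 0xF
    b_lo, b_hi = b & 0xF, (b >> 4) & 0xF

    # Four 4x4 partial products, each 8 bits wide.
    p_ll = mul4(a_lo, b_lo)
    p_hl = mul4(a_hi, b_lo)
    p_lh = mul4(a_lo, b_hi)
    p_hh = mul4(a_hi, b_hi)

    # Adder 1: sum of the two cross products (the carry out is kept).
    cross = p_hl + p_lh
    # Adder 2: add the upper nibble of p_ll, zero-extended with four zeros.
    mid = cross + (p_ll >> 4)
    # Adder 3: add the upper part of mid to the high product.
    high = p_hh + (mid >> 4)

    # Result: high byte, middle nibble, low nibble.
    return (high << 8) | ((mid & 0xF) << 4) | (p_ll & 0xF)


if __name__ == "__main__":
    ok = all(vedic_8x8(a, b) == a * b for a in range(256) for b in range(256))
    print("all 65536 input pairs match:", ok)
```

Passing a bit-level 4x4 model as mul4 instead of the default lambda would exercise the full hierarchy, at the price of a slower exhaustive check.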

V. VERIFICATION

We designed the 2X2, 4X4, and 8X8 multipliers in Verilog HDL and simulated them in ModelSim to verify our results. We also built an ideal block design in Verilog HDL and simulated it in Cadence for comparison. In addition, we compared the design with a traditional Booth multiplier written in Verilog HDL (the code can be found in the Appendix).

Fig-14 4X4 Vedic Multiplier Simulation Result in ModelSim

Fig-15 8X8 Vedic Multiplier Simulation Result in ModelSim

Fig-16 8X8 Booth Multiplier Simulation Result in ModelSim

VI. SIMULATION RESULT ANALYSIS

1) For the 4X4 Vedic Multiplier:

Measurement Result:

Pavg = 0.01371 W

Processor Time Required = 4.46 seconds

2) For the 8X8 Vedic Multiplier:

Measurement Result:

Pavg = 0.0939 W

Processor Time Required = 13.97 seconds

Both results are for a transition time of 100 ns. Compared with the results obtained in [9], the power consumption and the processor time required are found to be much lower. The power consumption from the gate-level analysis in [9] for a 4-bit multiplier is 0.45 W, whereas the transistor-level analysis in this paper gives about 13.7 mW. The power consumption of the 8-bit multiplier structure, built here from four 4-bit multipliers, is about 93 mW. The processor time required in the gate-level analysis in [9] is 6.42 seconds for the 4-bit multiplier, against the 4.43 seconds obtained for the Vedic multiplier designed above using CMOS VLSI technology. The number of computational steps is also reduced, and less hardware is required than with conventional methods, which enhances the performance of the overall system.

VII. CONCLUSION

This paper presents an efficient Vedic multiplier design using VLSI technology. Almost 80% power reduction at 1.2 volts can be achieved using this Vedic multiplier compared with its earlier counterparts that use gate-level analysis or conventional methods of multiplication. The processor time is reduced from 6.42 seconds to 4.43 seconds for the 4-bit Vedic multiplier, and the computational complexity is also lower because it requires fewer steps than conventional multiplication methods. As a real-world application, the multiplier is implemented to find the determinant of a 2 X 2 matrix, which uses two 8-bit multipliers and takes the difference of the two products using two's complement.
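The determinant application can be modelled in a few lines. The Python sketch below assumes unsigned 8-bit matrix entries and a 17-bit result (16 product bits plus a sign bit). Both choices, along with the names det2x2 and mul8, are assumptions made for the example, since the source does not give the datapath widths.

```python
def det2x2(a, b, c, d, mul8=lambda x, y: x * y):
    """Determinant of [[a, b], [c, d]] for unsigned 8-bit entries.

    mul8 is a placeholder for an 8x8 multiplier module.
    """
    width = 17                      # assumed: 16-bit products plus a sign bit
    mask = (1 << width) - 1
    ad = mul8(a, d)
    bc = mul8(b, c)
    # Two's complement subtraction: ad - bc == ad + (~bc + 1) modulo 2**width.
    diff = (ad + ((~bc + 1) & mask)) & mask
    # Reinterpret the 17-bit pattern as a signed value.
    if diff >> (width - 1):
        return diff - (1 << width)
    return diff


if __name__ == "__main__":
    print(det2x2(3, 8, 4, 6) == 3 * 6 - 8 * 4)
```

The extra bit is needed because the difference of two 16-bit products can be negative.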

This design is ported from a previous design that used ideal blocks in Verilog HDL. Porting it to the transistor level caused many problems with delay time, which affects the later logic stages. That is why we could not finish the 16X16 Vedic multiplier: the delay is so severe that we cannot get correct logic outputs. We redesign the full adder using pass-transistor logic (PTL), which should be much faster and save more power, and a carry skip adder should be added to reduce the delay caused by the ripple carry adder. Regarding power consumption, because the multiplier uses a large number of MOSFETs, the transistors' switching characteristics also need to be kept in mind, and buffers will be required at various nodes inside the circuit to avoid voltage drops [10]. The design algorithm and the results show that this Vedic multiplier requires less area and consumes less power than conventional multipliers.

VIII. FUTURE WORKS

1. Investigate more efficient full-adder designs and add a carry skip adder to reduce the delay of the ripple carry adder.

2. Design built-in self-test circuitry for hardware verification.

ACKNOWLEDGMENT

I sincerely thank my partner Riddhi and Prof. Martin Margala for their help in completing this project, with special thanks to Rajitha Gullapalli for her help with the Verilog design.

REFERENCES

[1] Kai Hwang, Computer Arithmetic: Principles, Architecture And Design. New York: John Wiley & Sons, 1979

[2] Honey Durga Tiwari, Ganzorig Gankhuyag, Chan Mo Kim, Yong Beom Cho, “Multiplier design based on ancient Indian Vedic Mathematics,” 2008 International SoC Design Conference, pp. 65-68.

[3] Parth Mehta, Dhanashri Gawali, “Conventional versus Vedic mathematical method for Hardware implementation of a multiplier,” Department of ETC, Maharashtra Academy of Engg., Alandi(D), Pune, India, 2009.

[4] Vedic Mathematics [Online]. Available: http://www.hinduism.co.za/vedic.htm.

[5] J.D. Lee, Y.J. Yoon, K.H. Lee, B.-G. Park, “Application of dynamic pass-transistor logic to an 8-bit multiplier,” J. Kor. Phys. Soc. 38 (3) (2001) 220–223.

[6] Sung Mo Kang, Yusuf Leblebici, “CMOS Digital Integrated Circuits,” Third Edition, 2003.

[7] R. Jacob Baker, Harry W. Li, David E. Boyce, “CMOS: Circuit Design, Layout, and Simulation,” Third Edition, 2011.

[8] Bhavani Prasad.Y, Ganesh Chokkakula, Srikanth Reddy.P and Samhitha.N.R, “Design of Low Power and High Speed Modified Carry Select Adder for 16 bit Vedic Multiplier,” ICICES2014, ISBN No. 978-1-4799-3834-6/14.

[9] Laxman P. Thakre, Suresh Balpande, Umesh Akare, Sudhir Lande, “Performance Evaluation and Synthesis of Multiplier used in FFT operation using Conventional and Vedic algorithms,” Third International Conference on Emerging Trends in Engineering and Technology, pp. 614-619, IEEE, 2010.

[10] Kang, S., “Accurate simulation of power dissipation in circuits,” IEEE Journal of Solid-State Circuits, vol. 21, pp. 889-891, 1986.
