final project report
Design and verification of 8X8 Vedic Multiplier using
90nm CMOS Process Technology
Huan Wang
Department of Electrical and Computer Engineering
University of Massachusetts
Lowell, MA 01854, USA
Riddhi Shah
Department of Electrical and Computer Engineering
University of Massachusetts
Lowell, MA 01854, USA
Abstract—A previous paper presented a modified carry select adder (CSA) in Verilog. We transplant this design to the transistor level. This CSA is considered the fastest among the common adder configurations. A multiplier is a very important element in almost all processors and contributes substantially to the total power consumption of the system. The novelty of this work is the efficient use of the Vedic algorithm (sutras), which reduces the number of computational steps considerably compared with the traditional method. The schematic for this multiplier is designed in Cadence and then verified in Virtuoso using a 90nm CMOS technology library. Finally, we design an ideal multiplier in Verilog to verify our transistor-level design against it. This paper presents a systematic design methodology for this improved-performance digital multiplier based on Vedic mathematics.
Keywords— Multiplier, Vedic Multiplier, Ripple Carry Adder
I. INTRODUCTION
The multiplier is one of the most important structures in any modern processor. A binary multiplier is an electronic circuit used in digital electronics to multiply two binary numbers. A variety of computer arithmetic techniques can be used to implement a digital multiplier. Most techniques involve computing a set of partial products and then summing the partial products together [1]. This process is similar to long multiplication on decimal integers, but is modified here for application to a binary number system. As more transistors per chip became available due to larger-scale integration, it became possible to put enough adders on a single chip to sum all the partial products at once, rather than reuse a single adder to handle each partial product one at a time.
Because common digital signal processing algorithms spend most of their time multiplying, processors devote a lot of chip area to making multiplication as fast as possible. Hence Vedic mathematics, a non-conventional yet very efficient approach, is used here to build a high-performance multiplier. Vedic mathematics deals mainly with various Vedic mathematical formulae and their applications for carrying out large arithmetical operations easily [2]. Power consumption and speed are the metrics compared against existing digital multiplier designs.
II. VEDIC MULTIPLICATION ALGORITHM
A. The Vedic Sutras
Vedic mathematics is organized into 16 sutras (algorithms) covering various branches of mathematics [3], of which two apply to multiplication:
1. Nikhilam Navatashcaramam Dashatah – All from 9 and the last from 10.
2. Urdhva-Tiryagbhyam – Vertically and crosswise.
This paper is based on the Urdhva-Tiryagbhyam (UT) sutra of Vedic multiplication, which is the most general method for multiplication. Here the sutra is applied to binary multiplication to build the digital multiplier. It is also called the "Vertically and Crosswise" method of multiplication. An illustration of this multiplication algorithm is shown in Fig-1 below. In digital hardware, a Vedic multiplier is more power-efficient and faster because fewer steps are required for multiplication. There are also almost no limitations attached to this multiplication algorithm.
B. Example for general Multiplicand using Vedic Mathematics
Fig-1 shows the generalized line diagram for the UT algorithm. The algorithm can be used in all cases, such as decimal or binary operands [4]. All the multiplications are done in the vertical and crosswise directions, requiring only 7 steps to multiply two 4-bit numbers.
Fig-1 Line diagram for UT algorithm
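The vertically and crosswise procedure can be sketched in software as follows. This is a minimal illustration rather than the hardware design of this paper: the function names `urdhva_tiryagbhyam` and `to_int`, and the choice of digit lists stored least significant digit first, are assumptions made for the example. Step k collects every product whose digit positions add up to k, so two 4-digit operands need 7 steps.

```python
def urdhva_tiryagbhyam(a_digits, b_digits, base=10):
    """Multiply two equal-length digit lists (least significant digit first)."""
    n = len(a_digits)
    assert len(b_digits) == n, "operands must have the same number of digits"
    result = []
    carry = 0
    # Step k gathers every crosswise product a[i] * b[j] with i + j == k.
    for k in range(2 * n - 1):
        column = carry
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            column += a_digits[i] * b_digits[k - i]
        result.append(column % base)   # digit kept at position k
        carry = column // base         # carry forwarded to the next step
    # Any remaining carry supplies the most significant digit(s).
    while carry:
        result.append(carry % base)
        carry //= base
    return result


def to_int(digits, base=10):
    """Convert a digit list (least significant digit first) to an integer."""
    return sum(d * base ** i for i, d in enumerate(digits))


decimal_product = to_int(urdhva_tiryagbhyam([4, 3, 2, 1], [8, 7, 6, 5]))
binary_product = to_int(urdhva_tiryagbhyam([1, 0, 1, 1], [1, 1, 0, 1], base=2), base=2)
```

The same routine handles the decimal operands 1234 and 5678 and the binary operands 1101 and 1011 by changing only `base`, which mirrors the claim that the sutra applies to any number system.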
III. VLSI TECHNOLOGY USING CMOS LOGIC
Large integrated circuits can be constructed using CMOS logic with very low static power consumption. The increasing demand for low-power very large scale integration (VLSI) can be addressed at different design levels, such as the architectural, circuit, layout, and process technology levels. At the circuit design level, considerable potential for power savings exists through proper choice of a logic style for implementing combinational circuits. This is because all the important parameters governing power dissipation (switching capacitance, transition activity, and short-circuit currents) are strongly influenced by the chosen logic style. Depending on the application, the kind of circuit to be implemented, and the design technique used, different performance aspects become important. In the past, parameters such as high speed, small area, and low cost were the major areas of concern, whereas power considerations are now gaining the attention of the scientific community associated with VLSI design.
In recent years, the growth of personal computing devices (portable computers and real-time audio- and video-based multimedia applications) and wireless communication systems has made power dissipation a most critical design parameter [5]. In the absence of low-power design techniques, such applications generally suffer from very short battery life, while packaging and cooling them is very difficult, which leads to an unavoidable increase in the cost of the product. In multiplication, reliability is strongly affected by power consumption. Usually, high power dissipation implies high-temperature operation, which in turn tends to induce several failure mechanisms in the system [6]. Power dissipation is the most critical parameter for portability and mobility, and it is classified into dynamic and static power dissipation. Dynamic power dissipation occurs when the circuit is operational, while static power dissipation becomes an issue when the circuit is inactive or in a power-down mode. There are three major sources of power dissipation in digital CMOS circuits, which are summarized in equation (1):
$P_{avg} = P_{switching} + P_{short\text{-}circuit} + P_{leakage}$ (1)
The first term represents the switching component of power. The second term is due to the direct-path short-circuit current, $I_{sc}$, which arises when both the NMOS and PMOS transistors are simultaneously active, conducting current directly from supply to ground. Finally, leakage current, which can arise from substrate injection and sub-threshold effects, is primarily determined by fabrication technology considerations. The switching power dissipation in CMOS digital integrated circuits is a strong function of the power supply voltage ($V_{dd}$). Therefore, reduction of $V_{dd}$ emerges as a very effective means of limiting the power consumption. However, the saving in power dissipation comes at a significant cost in terms of increased circuit delay. Since the exact analysis of propagation delay is quite complex, a simple first-order derivation can be used to show the relation between power supply and delay time [7].
$T_d = \frac{C_L V_{dd}}{K (V_{dd} - V_{th})^{\alpha}}$ (2)
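The trade-off between supply voltage, delay, and switching power can be made concrete with a small numerical sketch. It reads equation (2) in the usual first-order form, delay = C_L * Vdd / (K * (Vdd - Vth)^alpha), and uses the standard dynamic power estimate of activity * C_L * Vdd^2 * f for the switching term. All default parameter values below (load capacitance, K, threshold voltage, alpha, frequency, activity factor) are hypothetical numbers chosen for illustration, not values measured in this work.

```python
def gate_delay(vdd, c_load=10e-15, k=1e-4, vth=0.3, alpha=1.3):
    """First-order delay: C_L * Vdd / (K * (Vdd - Vth) ** alpha).

    All default parameter values are illustrative assumptions.
    """
    if vdd <= vth:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return c_load * vdd / (k * (vdd - vth) ** alpha)


def switching_power(vdd, c_load=10e-15, freq=100e6, activity=0.5):
    """Standard dynamic power estimate: activity * C_L * Vdd^2 * f."""
    return activity * c_load * vdd ** 2 * freq


# Lowering the supply from 1.2 V to 0.9 V saves switching power but adds delay.
power_ratio = switching_power(0.9) / switching_power(1.2)
delay_ratio = gate_delay(0.9) / gate_delay(1.2)
```

With these assumed numbers, `power_ratio` falls below 1 while `delay_ratio` rises above 1, which is the trade-off described in the text.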
IV. MODIFIED MULTIPLIER ARCHITECTURE
The architectures of the 2×2, 4×4, and 8×8 bit modules are discussed in this section. The technique used throughout is the UT (Vertically and Crosswise) sutra.
A. 2X2 Vedic Multiplier Design
The operation is as follows. Consider two 2-bit numbers, A = a1a0 and B = b1b0. First, the least significant bit (LSB) of the final product is obtained vertically, as the product of the LSBs of A and B, a0b0. Second, the products are taken crosswise: the LSB of the multiplicand A is multiplied with the next higher bit of the multiplier B, the LSB of B is multiplied with the next higher bit of A, and the two products are added. This produces one carry bit and one sum bit that is used in the result, as shown below. The final step takes the product of the two most significant bits (MSBs) and adds the previously obtained carry. The resulting sum bit is the third bit of the final result, and the final carry is the fourth bit [8].
s0 = a0b0 (3)
c1s1 = a1b0 + a0b1 (4)
c2s2 = c1 + a1b1 (5)
The result of the 2X2 multiplier is c2s2s1s0. The 2X2 multiplier is composed of two half adders, together with the AND gates that form the bit products. The figures below show the schematic designs of the half adder and the 2X2 multiplier in Cadence.
Fig-3 Half Adder Block Design
Fig-4 2X2 Vedic Multiplier Block Design
Fig-6 Simulation Result for 2X2 Vedic Multiplier
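The 2X2 equations above map directly onto two half adders, which can be checked with a short behavioral model. This is a sketch for illustration only; the function names `half_adder` and `vedic_mul_2x2` are assumptions, and the model describes logic behavior, not the transistor-level schematic.

```python
def half_adder(x, y):
    """Return (sum, carry) for two input bits."""
    return x ^ y, x & y


def vedic_mul_2x2(a, b):
    """Multiply two 2-bit numbers using the vertical and crosswise steps."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    s0 = a0 & b0                             # vertical: LSB product
    s1, c1 = half_adder(a1 & b0, a0 & b1)    # crosswise products
    s2, c2 = half_adder(c1, a1 & b1)         # vertical: MSB product plus carry
    return (c2 << 3) | (s2 << 2) | (s1 << 1) | s0


# Exhaustive check over all 16 input combinations.
all_match = all(vedic_mul_2x2(a, b) == a * b for a in range(4) for b in range(4))
```

Because there are only 16 input combinations, the exhaustive comparison against ordinary integer multiplication is a complete functional check of the model.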
B. 4X4 Vedic Multiplier Design
This part introduces how the 4X4 multiplier works. Assume two numbers, A = a3a2a1a0 and B = b3b2b1b0. The procedure is shown in the block design figure below. The final product is c6s6s5s4s3s2s1s0. The partial products are calculated in parallel, so the delay is greatly reduced as the number of bits increases. The least significant bit (LSB) S0 is obtained simply by multiplying the LSBs of the multiplier and the multiplicand [8]. The following equations show how the multiplier carries out the algorithm.
S0 = A0B0 (6)
C1S1 = A1B0 + A0B1 (7)
C2S2 = C1 + A0B2 + A2B0 + A1B1 (8)
C3S3 = C2 + A0B3 + A3B0 + A1B2 + A2B1 (9)
C4S4 = C3 + A1B3 + A3B1 + A2B2 (10)
C5S5 = C4 + A3B2 + A2B3 (11)
C6S6 = C5 + A3B3 (12)
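Equations (6) to (12) can be evaluated column by column, as in the following behavioral sketch. The function name `vedic_mul_4x4` is an assumption for the example. One point the equations leave implicit is that the intermediate carries can be wider than one bit (for instance, the column for S3 can sum to more than 3), so the model keeps each carry as an integer rather than a single bit.

```python
def vedic_mul_4x4(a, b):
    """Multiply two 4-bit numbers by evaluating the seven column sums."""
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    product = 0
    carry = 0
    for k in range(7):
        # Column k adds the incoming carry and every A[i]B[j] with i + j == k.
        column = carry + sum(A[i] & B[k - i] for i in range(4) if 0 <= k - i < 4)
        product |= (column & 1) << k   # S_k
        carry = column >> 1            # C_k, possibly more than one bit wide
    return product | (carry << 7)      # final carry is the most significant bit


all_match = all(vedic_mul_4x4(a, b) == a * b for a in range(16) for b in range(16))
```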
Fig-7 Full Adder Block Design
Fig-8 4-bit Ripple Carry Adder
Fig-9 4X4 Multiplier Block Design
Fig-10 Full Adder Simulation Result
Fig-11 4X4 Vedic Multiplier Simulation Result
In the ripple carry adder arrangement, the carry generated by the first ripple carry adder is passed on to the next ripple carry adder, and the second ripple carry adder has two zero inputs. The arrangement of the ripple carry adders in Fig-9 reduces the computation time and hence the delay.
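A ripple carry adder chains full adders so that each stage's carry-out feeds the next stage's carry-in. The behavioral sketch below illustrates that structure; the function names `full_adder` and `ripple_carry_add` are assumptions, and the model says nothing about the timing of the transistor-level design.

```python
def full_adder(x, y, cin):
    """Return (sum, carry_out) for two input bits and a carry-in bit."""
    s = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return s, cout


def ripple_carry_add(a, b, width=4, cin=0):
    """Add two width-bit numbers; the carry ripples from bit 0 upward."""
    carry = cin
    total = 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry   # (width-bit sum, carry out of the last stage)


sum_bits, carry_out = ripple_carry_add(0b1011, 0b0110)
```

Because the last stage must wait for every earlier carry, the delay grows with the word width, which is the motivation given later for considering a carry-skip adder.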
C. 8X8 Vedic Multiplier Design
This part discusses the 8X8 Vedic multiplier design. Assume two numbers, A = a7a6a5a4a3a2a1a0 and B = b7b6b5b4b3b2b1b0. The procedure is explained by the following design figures. The final product is S15S14S13S12S11S10S9S8S7S6S5S4S3S2S1S0. The partial products are calculated in parallel, so the delay is greatly reduced as the number of bits increases. The least significant bit (LSB) S0 is obtained simply by multiplying the LSBs of the multiplier and the multiplicand. The multiplication follows the steps shown in the line diagram in Fig-1. At each step a result bit (Sn) and a carry (Cn) are obtained, and the carry of each stage is forwarded to the next stage until all steps are complete [8].
Fig-12 8X8 Vedic Multiplier Block Design
Fig-13 8X8 Vedic Multiplier Simulation Result
In the 8X8 block design shown above, there are four 4X4 Vedic multiplier modules and three 8-bit modified carry select adders. Each 8-bit modified carry select adder adds two 8-bit operands at an intermediate stage of the multiplier. The carry generated by the first modified carry select adder is passed on to the next modified carry select adder, and the second modified carry select adder has four zero inputs. This arrangement of the modified carry select adders, shown in the block diagram, reduces the computation time and hence the delay [8].
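The composition of four 4X4 modules and three 8-bit adders can be sketched behaviorally as follows. The exact wiring of the carries is not spelled out in the text, so the connection used here (merging the carries of the first two adders into bit 4 of the third adder's second operand) is one plausible arrangement and an assumption of this example. The helper names `mul4`, `add8`, and `vedic_mul_8x8` are hypothetical; `mul4` stands in for a 4X4 Vedic module, and `add8` models only the input/output behavior of an 8-bit adder, not the modified carry select structure.

```python
def mul4(a, b):
    """Stand-in for a 4X4 Vedic multiplier module (behavior only)."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b


def add8(x, y):
    """Behavioral 8-bit adder: returns (8-bit sum, carry out)."""
    total = x + y
    return total & 0xFF, total >> 8


def vedic_mul_8x8(a, b):
    """Multiply two 8-bit numbers from four 4X4 products and three adders."""
    al, ah = a & 0xF, a >> 4
    bl, bh = b & 0xF, b >> 4
    p0, p1, p2, p3 = mul4(al, bl), mul4(ah, bl), mul4(al, bh), mul4(ah, bh)
    s1, c1 = add8(p1, p2)        # adder 1: the two cross products
    s2, c2 = add8(s1, p0 >> 4)   # adder 2: second operand has four zero MSBs
    # Adder 3: upper product plus the upper half of s2 and the merged carries.
    # c1 and c2 cannot both be 1, because p1 + p2 + (p0 >> 4) is below 512.
    s3, _ = add8(p3, ((c1 | c2) << 4) | (s2 >> 4))
    return (s3 << 8) | ((s2 & 0xF) << 4) | (p0 & 0xF)


all_match = all(vedic_mul_8x8(a, b) == a * b for a in range(256) for b in range(256))
```

The four zero inputs of the second adder appear here as the upper four bits of `p0 >> 4`, which matches the description of the second adder in the text.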
V. VERIFICATION
We designed the 2X2, 4X4, and 8X8 multipliers in Verilog HDL and simulated them in ModelSim to verify our results. We also built the ideal block design in Verilog HDL and simulated it in Cadence for comparison. In addition, we compared against a traditional Booth multiplier designed in Verilog HDL (the code can be found in the Appendix).
Fig-14 4X4 Vedic Multiplier Simulation Result in ModelSim
Fig-15 8X8 Vedic Multiplier Simulation Result in ModelSim
Fig-16 8X8 Booth Multiplier Simulation Result in ModelSim
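The idea of checking a design against an ideal model can be illustrated in software with an exhaustive comparison. This is only an analogy to the ModelSim and Cadence flow described above, not the test bench used in this work: the names `find_mismatches` and `shift_add_mul` are hypothetical, and `shift_add_mul` is a simple stand-in for a device under test.

```python
def find_mismatches(multiplier, width, reference=lambda a, b: a * b):
    """Compare a multiplier model against a reference for all inputs."""
    limit = 1 << width
    mismatches = []
    for a in range(limit):
        for b in range(limit):
            got, expected = multiplier(a, b), reference(a, b)
            if got != expected:
                mismatches.append((a, b, got, expected))
    return mismatches


def shift_add_mul(a, b, width=8):
    """Example device under test: plain shift-and-add multiplication."""
    product = 0
    for i in range(width):
        if (b >> i) & 1:
            product += a << i
    return product


good = find_mismatches(shift_add_mul, 4)
# A deliberately broken model that drops the top product bit.
bad = find_mismatches(lambda a, b: (a * b) & 0x7F, 4)
```

An empty list means the model agrees with the reference on every input; each entry otherwise records the operands, the observed result, and the expected result.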
VI. SIMULATION RESULT ANALYSIS
1) For the 4X4 Vedic Multiplier:
Measurement Result:
Pavg = 0.01371 W
Processor Time Required = 4.46 seconds
2) For the 8X8 Vedic Multiplier:
Measurement Result:
Pavg = 0.0939 W
Processor Time Required = 13.97 seconds
Both results are for a transition time of 100 ns. Compared with the results reported in [9], the power consumption and the processor time required are considerably lower. The power consumption from the gate-level analysis in [9] for a 4-bit multiplier is 0.45 W, whereas the transistor-level analysis in this paper gives around 3 mW. The power consumption of the 8-bit multiplier structure here, built from four 4-bit multipliers, is around 93 mW. The processor time required in the gate-level analysis in [9] is 6.42 seconds for the 4-bit multiplier, against 4.43 seconds for the Vedic multiplier designed above using CMOS VLSI technology. The number of computational steps is also reduced, and less hardware is required than with conventional methods, which enhances the performance of the overall system.
VII. CONCLUSION
This paper presents an efficient Vedic multiplier design using VLSI technology. Almost 80% power reduction at 1.2 volts can be achieved using this Vedic multiplier compared with earlier counterparts based on gate-level analysis or conventional multiplication methods. The processor time is reduced from 6.42 seconds to 4.43 seconds for the 4-bit Vedic multiplier, and the computational complexity is also lower, as it requires fewer steps than conventional multiplication methods. As a real-world application, the multiplier is implemented to find the determinant of a 2 X 2 matrix, using two 8-bit multipliers and taking the difference of the two products using two's complement.
The design is transplanted from a previous design that used ideal block designs in Verilog HDL. Transplanting this design to the transistor level caused many problems with delay time, which affects the later-stage logic. This is why we could not finish the 16X16 Vedic multiplier: the delay is so severe that the correct logic output cannot be obtained. The full adder should be redesigned using pass-transistor logic (PTL), which would be much faster and more power-efficient, and a carry-skip adder should be added to reduce the delay caused by the ripple carry adder. Regarding power consumption, because the multiplier uses a large number of MOSFETs, the transistors' switching characteristics also need to be kept in mind, and buffers are required at various nodes inside the circuit to avoid voltage drop [10]. The design algorithm and the results show that this Vedic multiplier requires less area and consumes less power than conventional multipliers.
VIII. FUTURE WORKS
1. Investigate more efficient full-adder designs and add a carry-skip adder to reduce the delay of the ripple carry adder.
2. Design built-in self-test circuitry for verification in hardware.
ACKNOWLEDGMENT
I sincerely thank my partner Riddhi and Prof. Martin Margala,
for their help in completing this project. And special thanks to
Rajitha Gullapalli for her help in the Verilog Design part.
REFERENCES
[1] Kai Hwang, Computer Arithmetic: Principles, Architecture and Design. New York: John Wiley & Sons, 1979.
[2] Honey Durga Tiwari, Ganzorig Gankhuyag, Chan Mo Kim, Yong Beom Cho, "Multiplier design based on ancient Indian Vedic Mathematics," 2008 International SoC Design Conference, pp. 65-68.
[3] Parth Mehta, Dhanashri Gawali, "Conventional versus Vedic mathematical method for Hardware implementation of a multiplier," Department of ETC, Maharashtra Academy of Engg., Alandi (D), Pune, India, 2009.
[4] Vedic Mathematics [Online]. Available: http://www.hinduism.co.za/vedic.htm.
[5] J.D. Lee, Y.J. Yoony, K.H. Leez, B.-G. Park, "Application of dynamic pass-transistor logic to an 8-bit multiplier," J. Kor. Phys. Soc. 38 (3) (2001) 220–223.
[6] Sung Mo Kang, Yusuf Leblebici, CMOS Digital Integrated Circuits, Third Edition, 2003.
[7] R. Jacob Baker, Harry W. Li, David E. Boyce, CMOS: Circuit Design Layout And Simulation, Third Edition, 2011.
[8] Bhavani Prasad.Y, Ganesh Chokkakula, Srikanth Reddy.P and Samhitha.N.R, "Design of Low Power and High Speed Modified Carry Select Adder for 16 bit Vedic Multiplier," ICICES2014, ISBN No. 978-1-4799-3834-6/14.
[9] Laxman P. Thakre, Suresh Balpande, Umesh Akare, Sudhir Lande, "Performance Evaluation and Synthesis of Multiplier used in FFT operation using Conventional and Vedic algorithms," Third International Conference on Emerging Trends in Engineering and Technology, pp. 614-619, IEEE, 2010.
[10] Kang, S., "Accurate simulation of power dissipation in circuits," IEEE Journal of Solid-State Circuits, vol. 21, pp. 889-891, 1986.