The Future of ALU Design to Increase Performance
and Density While Reducing Power Consumption
Bartholomew M. McDowell
Department of Electrical and Computer Engineering
University of Central Florida
Orlando, FL 32816-2362
Abstract—Research on ALU designs reveals a consistent pattern: there is a need for faster designs that consume less power and take up less space. Improvements in current designs are leveling off relative to Moore's law, which means chip technology will no longer advance at the rate it has over the past 50 years. There are, however, new designs in logic, pipelining, and architecture that allow increased performance without a major penalty in speed or power consumption. This paper compares designs that are currently under research in the field of chip design.
Keywords— Feedback Switch Logic, ALU, Power Consumption,
Density, Quantum-dot Cellular Automata, Pipelining, Probabilistic
Domain Transformation
I. INTRODUCTION
ALU design has steadily improved with the use of smaller chips that consume power at a much lower rate and process data at a much higher speed. This paper discusses a few methods that are being researched and implemented in designs. There is no single perfect solution to the future needs of computing: some ideas offer much higher throughput but take much more power to run. The end user will decide whether higher speed or lower power consumption is the priority.
The first design we examine is Feedback Switch Logic (FSL), which offers high speed and low power [1]. Pipelining the ALU's architecture allows the next instruction to be fetched at the same moment the current arithmetic is being performed [2]. Altera's NIOS 2.0 soft processor also demonstrates this process [9].
Quantum-dot cellular automata (QCA) can be used to build a highly robust full adder from square cells that contain quantum dots. When the cells are aligned correctly, electrons can tunnel between the dots within each cell. This small size makes QCA attractive for keeping chips small in a wider range of applications [3].
The multiplier-less discrete convolver uses algorithms encoded with probabilistic methods: the input values are translated into random samples, which are then decoded using probability to generate the output [4].
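The exact encoding used in [4] is not described here, so the Python sketch below swaps in a simple, hypothetical scheme that captures the general idea of replacing per-tap multiplications with random sampling. It assumes a non-negative kernel, treats the normalized kernel as a probability distribution over tap indices, and averages the selected input samples. The function names are illustrative only and are not taken from [4].

```python
from __future__ import annotations

import random


def exact_convolution(x: list[float], h: list[float], n: int) -> float:
    # Reference result: y[n] = sum over k of h[k] * x[n - k]; x is zero outside its range.
    return sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))


def sampled_convolution(x: list[float], h: list[float], n: int,
                        num_samples: int, rng: random.Random) -> float:
    # Hypothetical scheme (assumption, not the design of [4]): tap k is drawn
    # with probability h[k] / sum(h), and the matching input samples are
    # simply accumulated, so no per-tap multiplication is performed.
    total = sum(h)
    taps = rng.choices(range(len(h)), weights=[hk / total for hk in h], k=num_samples)
    acc = sum(x[n - k] for k in taps if 0 <= n - k < len(x))
    # The sample mean estimates y[n] / sum(h); one final scale recovers y[n].
    return total * acc / num_samples
```

The estimate converges to the exact convolution as `num_samples` grows, which trades accuracy against the number of random samples rather than against multiplier hardware.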
II. LITERATURE REVIEW
Feedback Switch Logic uses a clockless design that produces both the output and its complement on a single side of the gate. Figure 1 shows the gate design of the FSL, in which a transistor is shared between separate networks. This design offers speed similar to dynamic logic while retaining switching behavior similar to static logic, without the use of a clock signal. The resulting design is faster than a standard CMOS design, but it does consume more power.
Ripple carry adders are the most common adders in ALU designs. A Kogge-Stone adder instead uses a method called recursive doubling, which splits the function into two subfunctions of equal complexity and completes the arithmetic on two different processors [5]. The Kogge-Stone adder is generally considered one of the fastest adder designs. Its drawbacks are that it takes more area, because it breaks up the computation, and it requires more power to do so. Compared with standard CMOS logic, however, power consumption increases by only about 6% while delay decreases by 14%. These results are displayed in Figure 2.
Figure 2: Power consumption and delay of CMOS vs. FSL.
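To make the recursive-doubling idea concrete, the Python sketch below models a ripple-carry adder and a Kogge-Stone parallel-prefix adder at the bit level. It is a behavioral illustration under simplifying assumptions (unsigned operands, a carry-in of 0, and the carry-out discarded), not the circuit evaluated in [1] or [5], and the function names are hypothetical. The ripple-carry loop passes the carry through all 32 bit positions in sequence. The Kogge-Stone prefix network instead doubles the span of each generate/propagate pair at every stage, so it finishes in log2(32) = 5 stages.

```python
from __future__ import annotations


def ripple_carry_add(a: int, b: int, width: int = 32) -> int:
    """Bit-serial model of a ripple-carry adder (carry-in 0, carry-out dropped)."""
    carry = 0
    result = 0
    for i in range(width):  # the carry must pass through every bit in turn
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return result


def kogge_stone_add(a: int, b: int, width: int = 32) -> tuple[int, int]:
    """Parallel-prefix (Kogge-Stone) model; returns (sum, number of prefix stages)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]  # generate bits
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]  # propagate bits
    G, P = g[:], p[:]
    dist, stages = 1, 0
    while dist < width:
        # Recursive doubling: each position combines with the group `dist` bits
        # below it, so the span covered by (G[i], P[i]) doubles every stage.
        # Both lists are rebuilt from the previous stage, modeling parallel updates.
        G, P = (
            [G[i] | (P[i] & G[i - dist]) if i >= dist else G[i] for i in range(width)],
            [P[i] & P[i - dist] if i >= dist else P[i] for i in range(width)],
        )
        dist *= 2
        stages += 1
    # G[i] is now the carry out of bit i, i.e. the carry into bit i + 1.
    result = p[0]
    for i in range(1, width):
        result |= (p[i] ^ G[i - 1]) << i
    return result, stages
```

Both functions return the same sum. The difference is depth: 5 prefix stages versus a 32-position carry chain. The extra area and power come from the many (G, P) combining cells the prefix network needs at every stage.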
Pipelining fetches the next instruction while the processor is performing the current arithmetic. This method can increase speed while also decreasing power consumption. Consider a three-stage process of operand fetch, execute, and operand write-back: with pipelining, this three-stage operation can be accomplished in two clock cycles. Pipelining is comparable to an assembly line in a manufacturing plant, where multiple products are assembled at the same time at different stages while all of them move down the line at the same rate. Pipelining can also remove as many redundant operations as possible from a process, streamlining the operation [6].
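The assembly-line analogy can be sketched in Python with an idealized pipeline model. The stage names come from the three-stage example above, but the model itself is an assumption: it ignores hazards, stalls, and branches. It also does not reproduce the specific schemes of [2] or [6], and the function names are hypothetical. It shows that once the pipeline is full, a new operation completes every cycle.

```python
from __future__ import annotations

# Stage names taken from the three-stage example in the text.
STAGES = ("operand-fetch", "execute", "operand-write-back")


def unpipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    # Each instruction occupies the whole datapath until it finishes.
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    # Ideal pipeline: n_stages cycles to fill, then one completion per cycle.
    return 0 if n_instructions == 0 else n_stages + n_instructions - 1


def pipeline_schedule(n_instructions: int,
                      stages: tuple[str, ...] = STAGES) -> list[dict[str, int | None]]:
    """Which instruction (by index) occupies each stage on each clock cycle."""
    schedule = []
    for cycle in range(pipelined_cycles(n_instructions, len(stages))):
        # Instruction i enters stage s on cycle i + s, like a product on an assembly line.
        schedule.append({
            name: (cycle - s if 0 <= cycle - s < n_instructions else None)
            for s, name in enumerate(stages)
        })
    return schedule
```

For three instructions, the schedule spans 5 cycles instead of 9. In cycle 2, all three stages are busy, holding instructions 2, 1, and 0.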
CMOS transistors are reliable, but they are becoming harder and harder to shrink. One method of decreasing chip size while increasing logic density is quantum-dot cellular automata (QCA). QCA circuits are composed of small cells containing quantum dots, in which two electrons (or holes) settle at diagonally opposite corners of the cell. Each cell therefore has two stable states, representing a zero or a one. When cells are arranged in a line, neighboring cells interact with one another. This interaction involves no current flow, yet it can transmit binary information from one end of the line to the other [7].
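QCA logic is commonly built from two primitives: the three-input majority gate and the inverter. A well-known full-adder construction uses three majority gates and two inverters. The Python sketch below checks that construction at the logic level only. It does not model cell layout, clocking, or the specific fault-tolerant design of [3], and the function names are hypothetical.

```python
from __future__ import annotations


def majority(a: int, b: int, c: int) -> int:
    # Three-input majority gate: the output follows the value at least two inputs share.
    return (a & b) | (b & c) | (a & c)


def inverter(a: int) -> int:
    return 1 - a


def qca_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Logic-level full adder built only from majority gates and inverters."""
    cout = majority(a, b, cin)
    s = majority(majority(a, b, inverter(cin)), inverter(cout), cin)
    return s, cout
```

Because the carry-out is a single majority gate, it is shared with the sum logic. This reuse is one reason majority-based adders can be compact.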
III. DATA ANALYSIS
Figure 1: FSL logic design
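The delay and power trade-off reported for [1] can be summarized with a power-delay product, which approximates the energy per addition. The Python sketch below computes it from the Table I figures for the three FSL adders. It compares the adder types against one another, not FSL against CMOS as in Figure 2. Table I gives a unit (μW) only for the RCA power value, so μW is assumed for the KSA and HCA values as well.

```python
from __future__ import annotations

# FSL adder results for reference [1], copied from Table I. Assumption: the KSA
# and HCA power values are in μW, like the RCA value.
DELAY_PS = {"RCA": 480.2, "KSA": 349.5, "HCA": 436.4}
POWER_UW = {"RCA": 555.1, "KSA": 658.57, "HCA": 638.44}


def percent_change(new: float, baseline: float) -> float:
    return 100.0 * (new - baseline) / baseline


def power_delay_product_fj(adder: str) -> float:
    # 1 ps * 1 μW = 1e-18 J, so dividing by 1000 expresses the product in fJ.
    return DELAY_PS[adder] * POWER_UW[adder] / 1000.0


for name in DELAY_PS:
    print(f"{name}: delay {percent_change(DELAY_PS[name], DELAY_PS['RCA']):+.1f}% vs RCA, "
          f"power {percent_change(POWER_UW[name], POWER_UW['RCA']):+.1f}% vs RCA, "
          f"PDP {power_delay_product_fj(name):.1f} fJ")
```

By this rough metric, the KSA draws about 18.6% more power than the RCA but finishes about 27.2% sooner. Its energy per addition (about 230 fJ versus about 267 fJ for the RCA) is therefore the lowest of the three.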
IV. CONCLUSION
This paper has surveyed several ALU design methods suited to varying demands. Feedback switch logic can achieve a large speedup with only a small increase in power consumption. Pipelining helps an ALU perform arithmetic at a steadier rate at the cost of chip size. QCA is a highly robust framework for full adders that can run at very low power and occupy very little space. However, its fabrication process cannot yet be completely controlled, and deposition defects may occur. There are methods for creating smaller, faster, and less power-hungry designs; users must determine which suits their needs. As Moore's law levels off, chip designers will either have to change their designs or develop a new technology. CMOS designs are sufficient for now, but as computing needs expand in the 21st century, new methods will have to be developed to keep up with them.
REFERENCES
[1] P. Prakash and A. K. Saxena, "Design of Low Power High Speed ALU Using Feedback Switch Logic," in Proc. International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom '09), pp. 899-902, Oct. 27-28, 2009.
[2] P. Trivedi and R. P. Tripathi, "Design & analysis of 16 bit RISC processor using low power pipelining," in Proc. International Conference on Computing, Communication & Automation (ICCCA), pp. 1294-1297, May 15-16, 2015.
[3] A. Roohi, R. F. DeMara, and N. Khoshavi, "Design and Evaluation of an Ultra-Area-Efficient Fault-Tolerant QCA Full Adder," Microelectronics Journal, vol. 46, no. 6, pp. 531-542, June 2015.
[4] M. Alawad, Y. Bai, R. F. DeMara, and M. Lin, "Energy-Efficient Multiplier-Less Discrete Convolver through Probabilistic Domain Transformation," in Proc. 22nd ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '14), pp. 185-188, Monterey, CA, USA, Feb. 27-28, 2014.
[5] P. M. Kogge and H. S. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations," IEEE Transactions on Computers, vol. C-22, no. 8, pp. 786-793, 1973.
[6] I. Finlayson et al., "Improving processor efficiency by statically pipelining instructions," ACM, vol. 48, no. 5, 2013.
[7] S. Salehi and R. F. DeMara, "Energy and Area Analysis of a Floating-Point Unit in 15nm CMOS Process Technology," in Proc. IEEE SoutheastCon 2015 (SECon-2015), Fort Lauderdale, FL, Apr. 9-12, 2015.
[8] B. Chatterjee, M. Sachdev, and R. Krishnamurthy, "A CPL-based dual supply 32-bit ALU for sub 180nm CMOS technologies," in Proc. International Symposium on Low Power Electronics and Design (ISLPED '04), ACM, pp. 248-251, New York, NY, USA, 2004.
[9] P. Metzgen, "A high performance 32-bit ALU for programmable logic," in Proc. ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays (FPGA '04), ACM, pp. 61-70, New York, NY, USA, 2004.
[10] L. Gopal, N. S. Mohd Mahayadin, A. K. Chowdhury, A. A. Gopalai, and A. K. Singh, "Design and synthesis of reversible arithmetic and logic unit (ALU)," in Proc. International Conference on Computer, Communications, and Control Technology (I4CT), pp. 289-293, Sept. 2-4, 2014.
[11] J. Di, J. S. Yuan, and R. F. DeMara, "Improving Power-awareness of Pipelined Array Multipliers using 2-Dimensional Pipeline Gating and its Application to FIR Design," Integration, the VLSI Journal, vol. 39, no. 2, pp. 90-112, Mar. 2006.
[12] C. S. Lent and B. Isaksen, "Clocked molecular quantum-dot cellular automata," IEEE Transactions on Electron Devices, vol. 50, no. 9, pp. 1890-1896, 2003.
N. Imran, R. F. DeMara, J. Lee, and J. Huang, "Self-adapting Resource Escalation for Resilient Signal Processing Architectures," Journal of Signal Processing Systems (Springer), vol. 77, no. 3, pp. 257-280, Dec. 2014.
[13] J. Di, J. S. Yuan, and R. DeMara, "High Throughput Power-aware FIR Filter Design based on Fine-grain Pipeline Multipliers and Adders," in
TABLE I. Comparison of the surveyed ALU, arithmetic, and floating-point designs
Architecture | Datapath width or operand bits | Adder | Multiplier | Floating point | ITRS technology node (nm), area, or chip model | Energy/power consumption (W or J, else "low" or "high")
(The Adder, Multiplier, and Floating point columns give the time for the operation or the design type.)
Low Power High Speed ALU Using Feedback Switch Logic [1] | 32 bits (operands) | 480.2 ps (RCA), 349.5 ps (KSA), 436.4 ps (HCA); FSL | N/A | N/A | 90 nm | 555.1 μW (RCA), 658.57 (KSA), 638.44 (HCA)
RISC Processor Using Low Power Pipelining [2] | 16 bits | N/A | 2/3 the clock cycles of a standard multiplier | N/A | 28 nm | 220 mW
Ultra-Area-Efficient Fault-Tolerant QCA Full Adder [3] | 1 bit (operands) | Ultra-area-efficient fault-tolerant QCA full adder | N/A | N/A | 18 nm^2 (cell area) | low
Energy-Efficient Multiplier-Less Discrete Convolver through Probabilistic Domain Transformation [4] | 128 bits (operands) | N/A | 4.09 μs; energy-efficient multiplier | N/A | Virtex-6 FPGA (XC6VLX550T) (chip model) | 166.63 nJ
Energy and Area Analysis of a Floating-Point Unit [7] | 32 bits (operands) | N/A | N/A | IEEE-754 single precision | 45 nm and 15 nm (ITRS node) | 2.048 mW (45 nm), 0.6340 mW (15 nm)
CPL-Based Dual Supply 32-bit ALU for Sub-180nm CMOS [8] | 32 bits | 180-500 ps | N/A | N/A | 65-180 nm | ~18-24% savings in power consumption over
High Performance 32-bit ALU for Programmable Logic (NIOS 2.0) [9] | 32 bits | 70% reduction over NIOS 1.1 | N/A | N/A | 50% size reduction over NIOS 1.1 | N/A
Reversible ALU Designs [10] | 16 bits | 7.39 ns | N/A | N/A | N/A | N/A
Finite Impulse Response (FIR) [11] | 16 bits | 1250 MHz | N/A | N/A | N/A | N/A
Finite Impulse Response Filter [13] | 16 bits | N/A | N/A | N/A | 240 nm CMOS | N/A