the future of alu design to increase performance and ...cal.ucf.edu/3801_reports_fall_2015/future...

The Future of ALU Design to Increase Performance

and Density While Reducing Power Consumption

Bartholomew M. McDowell

Department of Electrical and Computer Engineering

University of Central Florida

Orlando, FL 32816-2362

[email protected]

Abstract: Research on ALU design shows a consistent pattern: the need for faster designs that consume less power and take up less space. Current designs are falling behind the pace set by Moore's law, which means chip technology will no longer improve at the rate it has over the past 50 years. There are, however, new designs in logic, pipelining, and architecture that allow increased performance without giving up a major portion of speed or power efficiency. We compare designs that are currently under research in the field of chip design.

Keywords: Feedback Switch Logic, ALU, Power Consumption, Density, Quantum-dot Cellular Automata, Pipelining, Probabilistic Domain Transformation

I. INTRODUCTION

ALU design has steadily improved through smaller chips that consume far less power and process data at much higher speeds. This paper discusses several methods that are being researched and implemented in current designs. No single solution meets all the future needs of computing: some approaches offer much higher throughput but require much more power to run. The end user's priorities determine whether higher speed or lower power consumption matters more.

The first design we examine is Feedback Switch Logic (FSL), which offers high speed and low power [1].

Pipelining allows the ALU architecture to fetch the next instruction at the same moment the arithmetic for the current one is being performed. Altera's NIOS 2.0 soft processor demonstrates this process [2].

Quantum-dot cellular automata (QCA) can implement a highly robust full adder from square cells that interact when aligned correctly, with electrons tunneling between the quantum dots inside each cell. The small cell size is highly desirable because it keeps chips small enough for more applications [3].

The multiplier-less discrete convolver uses algorithms based on probabilistic encoding: the input values are translated into random samples, and the output is then decoded from the statistics of those samples [4].
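To make the idea of encoding inputs as random samples concrete, the following minimal sketch relies on a general probabilistic identity: the distribution of the sum of two independent random variables is the convolution of their distributions. It is an illustration under that assumption, not the convolver architecture of [4]. The function names, the sample count, and the restriction to non-negative inputs are all hypothetical choices made for this example.

```python
import random
from collections import Counter


def exact_convolution(x, h):
    """Reference result using explicit multiplications (for comparison only)."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out


def sampled_convolution(x, h, n_samples=100_000, seed=0):
    """Estimate the convolution of two non-negative sequences by sampling.

    Each sequence is treated as an unnormalized probability distribution over
    its indices. Index pairs are drawn at random and added, and a histogram of
    the sums approximates the convolution. No per-term products x[i] * h[j]
    are formed; only one final scale factor is applied when decoding.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    ix = rng.choices(range(len(x)), weights=x, k=n_samples)
    ih = rng.choices(range(len(h)), weights=h, k=n_samples)
    counts = Counter(i + j for i, j in zip(ix, ih))
    scale = sum(x) * sum(h) / n_samples  # decode counts back to magnitudes
    return [counts.get(k, 0) * scale for k in range(len(x) + len(h) - 1)]


if __name__ == "__main__":
    x, h = [1, 2, 3], [1, 1]
    print("exact   :", exact_convolution(x, h))
    print("sampled :", sampled_convolution(x, h))
```

The estimate is approximate and its accuracy improves with the number of samples, which reflects the trade between precision and hardware cost that motivates this style of design.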

II. LITERATURE REVIEW

Feedback Switch Logic uses a clockless design that produces the output and its complement on a single side of the gate. Figure 1 shows the FSL gate design. The transistor is shared between separate networks. This design offers speed similar to dynamic logic and switching behavior similar to static logic, without the use of a clock signal. The resulting speed is faster than a standard CMOS design, but it does consume more power.

Ripple-carry adders are typically the most common adders in ALU designs. A Kogge-Stone adder instead uses a method called recursive doubling, which splits the function into two equally complex subfunctions and completes the arithmetic on two different processors [5]. The Kogge-Stone adder is widely considered one of the fastest adder designs. Its drawbacks are that it takes more space, because it breaks up the computation, and requires more power to do so. The power consumption is higher, but only by about 6% over standard CMOS logic, while the delay decreases by 14%. These results are displayed in Figure 2.
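The difference between the two carry schemes can be sketched in software. The following is a bit-level model, not a hardware description. It assumes the usual generate/propagate formulation of a parallel-prefix adder, and the function names and the 32-bit default width are illustrative.

```python
def ripple_carry_add(a: int, b: int, width: int = 32) -> int:
    """Add bit by bit; each carry waits for the previous one (width steps)."""
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result  # the final carry-out is discarded (modulo 2**width)


def kogge_stone_add(a: int, b: int, width: int = 32) -> int:
    """Parallel-prefix addition using recursive doubling of carry spans."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    p = a ^ b        # per-bit propagate
    g = a & b        # per-bit generate
    half_sum = p     # sum bits before carries are applied
    dist = 1
    while dist < width:
        # Merge each span with the span `dist` bits below it; the span
        # covered by every bit position doubles on each pass.
        g = (g | (p & (g << dist))) & mask
        p = (p & (p << dist)) & mask
        dist *= 2
    carries = (g << 1) & mask  # carry into bit i is the generate of bits 0..i-1
    return half_sum ^ carries


def prefix_levels(width: int) -> int:
    """Number of doubling passes the loop above performs for a given width."""
    levels, dist = 0, 1
    while dist < width:
        levels += 1
        dist *= 2
    return levels


if __name__ == "__main__":
    print(hex(kogge_stone_add(0x12345678, 0x0FEDCBA8)))
    print("prefix levels for 32 bits:", prefix_levels(32))
```

In this model the ripple-carry loop runs once per bit, while the prefix loop runs once per doubling of the span. That is the source of the speed advantage, paid for with the extra logic evaluated in every pass.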

Figure 2: Power consumption and delay of CMOS vs. FSL.

Pipelining is the method of fetching instructions while the processor is performing arithmetic. It can increase speed and decrease power consumption at the same time. Consider a three-stage process such as operand fetch, execute, and operand write-back: with pipelining, this three-stage operation can be accomplished in two clock cycles. Pipelining is comparable to an assembly line in an assembly plant, where multiple products are put together at the same time at different stages while all move down the line at the same rate. This can remove as many redundant operations as possible, streamlining the process [6].
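The assembly-line analogy can be illustrated with a small cycle-count model. This sketch assumes an idealized pipeline in which every stage takes exactly one clock cycle and there are no stalls or hazards. It is not a model of the processor in [2] or [6], and the function names are hypothetical.

```python
STAGES = ("operand-fetch", "execute", "operand-write-back")


def sequential_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Cycles needed when each instruction finishes before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Cycles needed when a new instruction enters the pipeline every cycle."""
    if n_instructions == 0:
        return 0
    # The first instruction fills the pipeline; each later one adds one cycle.
    return n_stages + n_instructions - 1


def schedule(n_instructions: int) -> list:
    """For each clock cycle, map every busy stage to the instruction in it."""
    timeline = []
    for cycle in range(pipelined_cycles(n_instructions)):
        row = {}
        for stage_index, stage in enumerate(STAGES):
            instruction = cycle - stage_index
            if 0 <= instruction < n_instructions:
                row[stage] = instruction
        timeline.append(row)
    return timeline


if __name__ == "__main__":
    n = 4
    print("sequential:", sequential_cycles(n), "cycles")
    print("pipelined :", pipelined_cycles(n), "cycles")
    for cycle, row in enumerate(schedule(n)):
        print(cycle, row)
```

Under these assumptions, the benefit shows up as throughput: once the pipeline is full, one instruction completes per cycle even though each instruction still passes through all three stages.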

CMOS transistors are reliable, but their size is becoming harder and harder to reduce. One method of decreasing chip size while increasing logic density is quantum-dot cellular automata (QCA). QCA circuits are composed of small cells containing quantum dots, in which two electrons or holes sit at opposite corners of the cell. Each cell has two states, representing a zero or a one. When these cells are arranged in a line, neighboring cells interact. This interaction uses no current and can transmit binary information from one end of the line to the other [7].
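At the logic level, QCA circuits are commonly built from a three-input majority gate and an inverter rather than from AND and OR gates directly. The sketch below models those two primitives as Boolean functions and composes a one-bit full adder from them. This is a textbook majority-gate arrangement used here as an assumption. It is not the specific cell layout of the adder in [3], and it does not model cell physics or clocking.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Three-input majority vote, the basic QCA logic primitive."""
    return (a & b) | (b & c) | (a & c)


def inverter(a: int) -> int:
    """Logical NOT on a single bit."""
    return a ^ 1


def and_gate(a: int, b: int) -> int:
    """AND is a majority gate with one input fixed to 0."""
    return majority(a, b, 0)


def or_gate(a: int, b: int) -> int:
    """OR is a majority gate with one input fixed to 1."""
    return majority(a, b, 1)


def full_adder(a: int, b: int, carry_in: int) -> tuple:
    """One-bit full adder built from three majority gates and two inverters."""
    carry_out = majority(a, b, carry_in)
    total = majority(inverter(carry_out), carry_in,
                     majority(a, b, inverter(carry_in)))
    return total, carry_out


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, "->", full_adder(a, b, c))
```

Counting gates this way gives a rough sense of why QCA adders can be compact: the whole adder needs only three majority gates and two inverters in this arrangement.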

III. DATA ANALYSIS

Figure 1: FSL logic design

IV. CONCLUSION

This paper has shown different ALU design methods for varying demands. Feedback Switch Logic can achieve a significant speedup for a small increase in power consumption. Pipelining helps an ALU perform arithmetic at a steadier rate at the cost of chip size. QCA is a highly robust full-adder framework that can run at very low power and take up very little space, but the design process cannot be completely controlled and deposition defects may occur. There are methods for creating smaller, faster, and less power-hungry designs; users must determine what suits their needs. Moore's law is starting to level off, which means chip designers will have to either change their designs or come up with a new technology. CMOS designs are sufficient for now, but with the expanding needs of the 21st century, new methods will have to be developed to keep up with demand.

REFERENCES

[1] P. Prakash and A. K. Saxena, "Design of Low Power High Speed ALU Using Feedback Switch Logic," in Proceedings of the International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom '09), pp. 899-902, Oct. 27-28, 2009.

[2] P. Trivedi and R. P. Tripathi, "Design & Analysis of 16 bit RISC Processor Using Low Power Pipelining," in Proceedings of the International Conference on Computing, Communication & Automation (ICCCA), pp. 1294-1297, May 15-16, 2015.

[3] A. Roohi, R. F. DeMara, and N. Khoshavi, "Design and Evaluation of an Ultra-Area-Efficient Fault-Tolerant QCA Full Adder," Microelectronics Journal, Vol. 46, No. 6, pp. 531-542, June 2015.

[4] M. Alawad, Y. Bai, R. F. DeMara, and M. Lin, "Energy-Efficient Multiplier-Less Discrete Convolver through Probabilistic Domain Transformation," in Proceedings of 22nd ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA-14), pp. 185-188, Monterey, California, USA, February 27-28, 2014.

[5] P. M. Kogge and H. S. Stone, "A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations," IEEE Transactions on Computers, Vol. C-22, No. 8, pp. 786-793, 1973.

[6] Finlayson, Ian, et al. Improving processor efficiency by statically pipelining instructions. Vol. 48. No. 5. ACM, 2013.

[7] S. Salehi, and R. F. DeMara, "Energy and Area Analysis of a Floating-Point Unit in 15nm CMOS Process Technology," in Proceedings of IEEE SoutheastCon 2015 (SECon-2015), Fort Lauderdale, FL, April 9 - 12, 2015.

[8] Bhaskar Chatterjee, Manoj Sachdev, and Ram Krishnamurthy, “A CPL-based dual supply 32-bit ALU for sub 180nm CMOS technologies,” In Proceedings of the international symposium on Low power electronics and design (ISLPED '04). ACM, pp. 248-251, New York, NY, USA, 2004

[9] Paul Metzgen, “A high performance 32-bit ALU for programmable logic,” In Proceedings of the ACM/SIGDA 12th international symposium on Field programmable gate arrays (FPGA '04). ACM, pp. 61-70, New York, NY, USA ,2004.

[10] L. Gopal, N. S. Mohd Mahayadin, A. K. Chowdhury, A. A. Gopalai, and A. K. Singh, "Design and Synthesis of Reversible Arithmetic and Logic Unit (ALU)," in Proceedings of the International Conference on Computer, Communications, and Control Technology (I4CT), pp. 289-293, Sept. 2-4, 2014.

[11] J. Di, J. S. Yuan, and R. F. DeMara, "Improving Power-awareness of Pipelined Array Multipliers using 2-Dimensional Pipeline Gating and its Application to FIR Design," Integration, the VLSI Journal, Vol. 39, No. 2, pp. 90-112, March 2006.

[12] C. S. Lent and B. Isaksen, "Clocked Molecular Quantum-dot Cellular Automata," IEEE Transactions on Electron Devices, Vol. 50, No. 9, pp. 1890-1896, 2003.

N. Imran, R. F. DeMara, J. Lee, and J. Huang, "Self-adapting Resource Escalation for Resilient Signal Processing Architectures," The Springer Journal of Signal Processing Systems (JSPS), Vol. 77, No. 3, pp. 257-280, December 2014.

[13] J. Di, J. S. Yuan, and R. DeMara, "High Throughput Power-aware FIR Filter Design based on Fine-grain Pipeline Multipliers and Adders," in

TABLE I. COMPARISON OF THE SURVEYED ALU AND FLOATING-POINT DESIGNS

The three middle columns (Adder, Multiplier, Floating Point) give the time for operation or the design type.

| ALU or Floating Point Architecture Name | Datapath width or operand bits | Adder | Multiplier | Floating Point | ITRS technology node (nm), area, or model of chip used | Energy/power consumption (W or J), else "low" or "high" |
|---|---|---|---|---|---|---|
| Low Power High Speed ALU Using Feedback Switch Logic [1] | 32 bits (operands) | 480.2 ps (RCA), 349.5 ps (KSA), 436.4 ps (HCA), FSL | N/A | N/A | 90 nm | 555.1 μW (RCA), 658.57 (KSA), 638.44 (HCA) |
| RISC Processor Using Low Power Pipelining [2] | 16 bits | N/A | 2/3 clock cycles of standard multiplier | N/A | 28 nm | 220 mW |
| Ultra-area-efficient fault-tolerant QCA full adder [3] | 1 bit (operands) | Ultra-area-efficient fault-tolerant QCA full adder | N/A | N/A | 18 nm^2 (cell area) | low |
| Energy-Efficient Multiplier-Less Discrete Convolver through Probabilistic Domain Transformation [4] | 128 bits (operands) | N/A | 4.09 μs, energy-efficient multiplier | N/A | Virtex 6 FPGA devices (XC6VLX550t) (model of chip used) | 166.63 nJ |
| Energy and Area Analysis of a Floating-Point Unit [7] | 32 bits (operands) | N/A | N/A | IEEE-754 single precision | 45 nm and 15 nm (ITRS node) | 2.048 mW (45 nm), 0.6340 mW (15 nm) |
| CPL based Dual 32-bit ALU for Sub 180nm CMOS [8] | 32 bits | 180 ps-500 ps | N/A | N/A | 65 nm-180 nm | ~18-24% savings in power consumption over |
| High Performance 32-bit ALU for Programmable Logic (NIOS 2.0) [9] | 32 bits | 70% reduction over NIOS 1.1 | N/A | N/A | 50% size reduction over NIOS 1.1 | N/A |
| Reversible ALU designs [10] | 16 bits | 7.39 ns | N/A | N/A | N/A | N/A |
| Finite Impulse Response (FIR) [11] | 16 bits | 1250 MHz | N/A | N/A | N/A | N/A |
| Finite Impulse Response Filter [13] | 16 bits | N/A | N/A | N/A | 240 nm CMOS | N/A |
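One way to weigh the speed and power figures in Table I against each other is the power-delay product, which approximates the energy spent per operation. The sketch below applies it to the three adder variants in the FSL row [1]. It rests on an assumption that should be checked against the original paper: the table states a unit (μW) only for the RCA power figure, and the KSA and HCA figures are assumed here to be in μW as well. The function and variable names are illustrative.

```python
# (delay in ps, power in uW) for each adder in the FSL row of Table I [1].
# ASSUMPTION: the KSA and HCA power values are microwatts; the table gives
# an explicit unit only for the RCA value.
FSL_ADDERS = {
    "RCA": (480.2, 555.1),
    "KSA": (349.5, 658.57),
    "HCA": (436.4, 638.44),
}


def power_delay_product_fj(delay_ps: float, power_uw: float) -> float:
    """Return the power-delay product in femtojoules.

    1 ps * 1 uW = 1e-12 s * 1e-6 W = 1e-18 J, so dividing by 1000
    converts the raw product to femtojoules (1e-15 J).
    """
    return delay_ps * power_uw / 1000.0


def rank_by_pdp(adders: dict) -> list:
    """Adder names sorted from lowest to highest power-delay product."""
    return sorted(adders, key=lambda name: power_delay_product_fj(*adders[name]))


if __name__ == "__main__":
    for name in rank_by_pdp(FSL_ADDERS):
        pdp = power_delay_product_fj(*FSL_ADDERS[name])
        print(f"{name}: {pdp:.1f} fJ")
```

Under the stated unit assumption, the Kogge-Stone variant has the lowest product of the three, because its shorter delay outweighs its higher power draw. A metric like this lets the speed-versus-power choice discussed in the conclusion be made explicit for a given application.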
