fpu project report

CARLETON UNIVERSITY

ELEC 4907

Design of a 32-bit RISC Microprocessor

with Floating Point Unit

Design of a Floating Point Unit

Author: Adam Parsons

Supervisor: M. Shams

S/N: 100653270

April 5, 2010

Department of Electronics

2009-2010

Microprocessor Design April 5, 2010

Abstract

This fourth-year project presents and examines the design of a microprocessor. The project is to design a 32-bit RISC microprocessor with a floating point unit. The design presented includes contributions from Zain Zia, Chaiya See-toh, and Adam Parsons.

This report covers professional engineering practices and project management techniques, but it centers mainly on the microprocessor and its design. It provides background information on the microprocessor and its importance to today’s society.

The more technical portion of the report focuses on the Floating Point Unit, which can be viewed as a coprocessor to the microprocessor that was designed. It begins by explaining how a microprocessor operates, followed by a more in-depth study of how a floating point unit is designed and operated.

Finally, the results of the successful digital design testing are presented and explained, along with suggested improvements and further optimization techniques.



Acknowledgements

My immediate thanks go to Maitham Shams (project supervisor) for his constant guidance. Under his instruction for this project, I have gained valuable skills that can be applied in the workplace.

I would also like to thank my group members Chaiyas See-toh and Zain Zia, without whom this project would not have been completed. Their patience and dedication to hard work made this project a success, and they were a true pleasure to work with.

I would also like to thank all those I have met in my many years at Carleton University. You have all kept me on the right track, constantly reminding me of things I had forgotten. I would also like to thank the creators of ASICWORLD.com and AJDESIGNER.com, for without their guidance I would be lost in the language of Verilog and floating point calculations.

Most of all, I would like to thank my parents, who patiently stood by me through all my years of study, even though they don’t always understand what I am supposed to be learning.

April 2010

Adam Parsons



This is for those who are patient.

We’re here for the long haul.



Table of Contents

Abstract
Acknowledgements
Table of Figures
Table of Equations
Table of Tables
List of Abbreviations
1.0 Introduction
1.1 Purpose
1.1.1 Motivation
1.1.2 Applications
1.2 Report Overview
2.0 Health and Safety
2.1 Engineering Professionalism
2.2 Project Management
3.0 Project Overview
3.1 Design Specifications
3.2 Design Methodology
4.0 Background of Floating Point Representation
4.1 Floating Point Unit
4.2 Addition and Subtraction
4.2.1 Addition
4.2.2 Subtraction
4.3 Multiplication and Division
4.3.1 Multiplier
4.3.2 Division
4.4 Float to Integer
4.5 Integer to Float
4.6 Power Approximation
4.7 Square-Root
4.8 Floating Point Control Unit
5.0 Digital Testing
5.1 Structural Analysis
5.2 Timing Analysis
5.3 Implementation
6.0 Concluding Remarks
6.1 Summary of Project Accomplishments
6.2 Considerations for Future Work
References
Appendix A: Verilog Design Code
Addition Module
Subtraction Module
Normalization Module
24-bit Addition Module
Multiplication Module
Division Module
Floating Point to Integer Conversion Module
Integer to Floating Point Conversion Module
Power Module
Square Root Module
Control Module
Appendix B: Digital Testing Results
Standard Case Waveforms
Corner Case Tables



Table of Figures

Figure 1: Project Schedule
Figure 2: Processor Overview
Figure 3: Workload Partitioning Chart
Figure 4: Floating Point Binary
Figure 5: Floating Point Block Diagram
Figure 6: Addition/Subtraction Module
Figure 7: Carry Look-Ahead Adder
Figure 8: Two's Complement
Figure 9: Multiplier and Divider Module
Figure 10: Multiplication Algorithm
Figure 11: Multiplication Block Diagram
Figure 12: Division Block Diagram
Figure 13: Division Algorithm
Figure 14: Float to Integer Block
Figure 15: Integer to Float Diagram
Figure 16: Log2 vs IEEE Estimate
Figure 17: Power Unit
Figure 18: Square Root Unit
Figure 19: Floating Point Control Unit
Figure 20: ALTERA DE2 Implementation

Table of Equations

Equation 1
Equation 2
Equation 3
Equation 4
Equation 5
Equation 6
Equation 7
Equation 8
Equation 9
Equation 10



Table of Tables

Table 1: IEEE-754 Special Representations
Table 2: Log Estimate Error
Table 3: Log Estimate Error Correction
Table 4: Standard Test Case
Table 5: Special Test Cases
Table 6: Fast Timing Analysis
Table 7: Slow Timing Analysis

List of Abbreviations

CPU Central Processing Unit

RISC Reduced Instruction Set Computer

FPGA Field Programmable Gate Array

OPCODE Operation Code

ALU Arithmetic Logic Unit

FPU Floating Point Unit

MIPS Microprocessor without Interlocked Pipeline Stages

NaN Not a Number

INF Infinity

FMAX Maximum Frequency

TCO Clock-to-Output Time

TH Hold Time

TSU Clock Setup Time



Chapter 1

1.0 Introduction

The purpose of this report is to present and examine the design of a microprocessor. The project is to design a 32-bit RISC microprocessor with a floating point unit. The design presented includes contributions from Zain Zia, Chaiya See-toh, and Adam Parsons.

1.1 Purpose

Microprocessors are extremely small electronic devices built on an integrated circuit. They are the cornerstone upon which today’s automated systems are built. Most notably, the microprocessor is used in the common computer, whether a PC or a Mac. There are many more applications in the modern world, and there is often a microprocessor designed specifically for a given task. Their uses range from household devices such as washing machines and mobile phones to the automatic check-in booths in airports.

1.1.1 Motivation

As the microprocessor becomes more integrated into every aspect of daily life, it becomes more important to understand the design and implementation of the device. This understanding allows for the improvements and optimizations needed to maintain a competitive marketplace and a constant progression of modern technology. Modern applications require microprocessors that are faster and more precise, and that are designed with minimal hardware.

1.1.2 Applications

The 32-bit RISC microprocessor with floating point unit is a more specialized device, but it still has a wide range of possible implementations. It can store and manipulate large data sets and handle the real number calculations that may be necessary in the field. These applications tend to be math-intensive operations, such as data processing.

Its more specialized functionality provides faster and more accurate results for these calculations than a general microprocessor. Because of this specialization, it is well suited to being implemented as part of a multi-core processing set. This particular processor can be implemented within web controllers, graphics processors, and mobile GPS devices.

1.2 Report Overview

Chapter 2 outlines the engineering project as a whole. This ranges from the Health and Safety concerns involved with designing a microprocessor to the procedures taken to ensure that the respective Health and Safety requirements are met. It also addresses the engineering professionalism pertaining to the project, through project management, workload partitioning, and workplace synergy.

Chapter 3 introduces the more technical aspects of the microprocessor and its design. This chapter gives an overview of the project, providing background information on the microprocessor, the design specifications, and the partitioning of the microprocessor components among the project members.

The main technical topic of this report is presented in Chapter 4, which provides in-depth technical details of the floating point unit. The individual modules of the device are explained, along with the algorithms and optimizations that were used to produce a high-performing floating point unit.

In Chapter 5, the results from the digital design testing are displayed and analyzed. This chapter also explains the performance analysis and the performance restrictions of the floating point unit.

Chapter 6 concludes the report by summarizing the project’s work and accomplishments, and the possible applications of the 32-bit RISC microprocessor with floating point unit, or of the floating point coprocessor on its own. This chapter also proposes future improvements to the processor.



Chapter 2

2.0 Health and Safety

Microprocessors are relatively safe devices to operate, but within the computer design lab it is still important to follow and respect general health and safety principles, as regulated by the Carleton University Health-And-Safety document. Some of the relevant health and safety principles from the document include:

• usage of personal protective equipment at all times,

• using the equipment only for its designed purpose,

• keeping the lab supervisor informed of any unsafe condition,

• keeping track of the location and correct use of safety equipment,

• determining potential hazards and appropriate safety precautions before beginning new operations.

As the microprocessor was implemented and tested on the ALTERA DE2 Development Board, extra precautions were considered to ensure a safe work environment. The following measures ensure that the board operates within its normal operating conditions while maintaining the health and safety of all project members.


• Automatic testing was incorporated to check the integrity of the following units before the first execution: the system’s Memory Units (RAM and ROM), the Input and Output signal processing circuitry, the Arithmetic Logic Unit (ALU), the Control Unit, and the Registers.

• Software was developed that monitors electrical parameters such as current and voltage in the circuit at predetermined time intervals. When a fault is sensed, it sends a signal to the board, which halts further execution and terminates the program. This circuitry continually tests for proper supply voltage to the microprocessor.

• Overcurrent is an abnormal current greater than the full-load value of the circuit. It can occur due to short circuits or overload currents in any unit.

• Overload is an overcurrent that persists long enough to cause dangerous overheating. It can occur during long start-up times, during multiple restarts in a short interval, and if the normal duty cycle of the processor is exceeded.

• An alarm signal is generated by the board and program execution is halted if an overload occurs.

• The board was implemented so that a failure to execute the program disconnects the voltage source to prevent any unintended current leakage.

• An asynchronous reset signal was designed for the microprocessor as a manual override to reset all units in case of a danger of overload.

• The microprocessor is designed so that the algorithm cannot be altered by anyone except the designers themselves.

2.1 Engineering Professionalism

To meet the requirements for professionalism in engineering, all engineers must abide by the Professional Engineers Act (PEA) and the Professional Engineers Ontario (PEO) Code of Ethics. As engineering is a self-regulated profession with a rigorous code of ethics, it is of utmost importance that we follow the principles of fairness, integrity and honesty.

During the project design there were minimal ethical dilemmas from a professional standpoint. As the project work was fairly separate for each individual, there were never any conflicting points of view, since we all trusted each other to be working to the best of our respective abilities. Professional conduct was maintained at all times, as the only reasonable way for this project to be completed was for each group member to work without impeding the work flow of the other group members.

The only major difficulty was meeting the specific deadlines previously outlined in the project proposal. The proposal may have produced an unreasonable timeline for the group to keep pace with. This may have been caused by our minimal communication outside of our weekly meetings, although consistent contact was maintained through email to keep each other up to date with status reports and questions regarding project difficulties or confusion.

Although the development of a microprocessor offers reduced chances for unprofessional behavior, none occurred that truly impeded the quality of work or the professional decisions that had to be made for the completion of the project. Each group member’s professional responsibility aided in meeting each member’s individually designated goals. It also enabled the achievement of the group’s goal, which was to successfully design a microprocessor.

2.2 Project Management

Several project management techniques were used to coordinate, manage and perform the project.

Weekly group meetings with Prof. M. Shams kept the objectives and progress of the design project clear. In these meetings we could clarify any individual misconceptions about the design with the supervisor. This portion of the project management was fairly relaxed, which is important so that members do not feel intimidated by the supervisor. The relatively loose supervision encouraged the group members to communicate with each other, instead of working completely autonomously with very little knowledge of each other’s involvement in the project.

Open communication (via email and phone) was encouraged to enable a clear flow of design concepts and ideas. This also promoted the project’s success: whenever a group member arrived at a difficult design decision or faced any other difficulty, the other group members were able to assist.

The ability to perform the project is not something that truly falls under the group’s project management. This ability rests heavily upon each individual group member, since the software required to complete the project is available in several laboratories within the Department of Electronics at Carleton University, and a free web version of the program was also available for use at home. The performance expectations were clearly set out in the initial project proposal, as shown in Figure 1 below.

Figure 1: Project Schedule

The partitioning of the project workload was decided during one of the initial group meetings supervised by M. Shams. Each portion was selected by, or negotiated among, the individual group members so as to encourage each individual to work in the field that sparks the most personal interest, which would in turn increase workflow productivity.



Chapter 3

3.0 Project Overview

Before discussing the more technical side of the design of a 32-bit RISC microprocessor with floating point unit, it is important to have a clear overview of the components of a microprocessor. A simple microprocessor is built from five basic integrated blocks, as shown in Figure 2. These are:

● Inputs/Outputs

● Memory

● Datapath

● Control Unit

● Arithmetic Logic Unit

Figure 2: Processor Overview


Figure 2 shows the organization of the microprocessor, which is broadly consistent across all types of processors. Every processor performs the same basic functions of fetching, decoding and executing instructions, which require all five of these blocks.

The processor receives instructions from the Memory, which is responsible for storing the instruction sets as well as the data sets. The flow of data between the Memory and the processor follows the implementation of the Datapath. The Datapath routes the instruction signals between the Control Unit, the Memory, and the Input/Output devices. This routing of data is regulated by the Control Unit’s output signals, which also branch to the Input/Output devices. The input and output devices usually consist of hardware such as a keyboard or a graphics display.

3.1 Design Specifications

The microprocessor design requires the implementation of a memory and a register unit, which temporarily store data within the microprocessor. The memory was given a specified size of 512 x 32 bits, and each register in the microprocessor is specified to be 32 bits wide. The standard set of instruction classes to be performed by the microprocessor was also specified. A description of these classes follows.



• R-type Instruction – Arithmetic instructions (addition, subtraction, multiplication and division of two operands) and logical instructions (comparison of two operands).

• Branch Instruction – Compares two operands for equality and, if they are equal, jumps to the provided Memory address.

• Load Instruction – Loads a data word from Memory into one of the specified registers in the processor.

• Store Instruction – Stores a data word from a specified register into the specified Memory address.



3.2 Design Methodology

The microprocessor was designed using the Verilog Hardware Description Language (Verilog-HDL). This allows the designer to work within a single language for the design, testing, and synthesis of the overall microprocessor. The Quartus II software was used to compile and simulate the Verilog-HDL code, as it connects easily to the ALTERA DE2 development boards on which the design must be implemented.

The design was partitioned into three distinct portions, as mentioned in Section 2.2 and shown in Figure 3.

Figure 3: Workload Partitioning Chart

The design follows the Von Neumann Architecture, which follows the standard fetch-decode-execute pattern of microprocessors. This architecture allows the instructions and data to be stored within the same memory. It was chosen due to its highly optimized instruction set, high-performance implementations, programmability (programs are easy to express) and reduction in the required hardware. It achieves this by sharing the functional units while also implementing pipelining, and as a result a smaller chip with lower operating power can be fabricated.
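The shared-memory fetch-decode-execute cycle described above can be illustrated with a small software model. The Python sketch below is illustrative only: the tuple instruction format, the opcode names (`add`, `sub`, `slt`, `beq`, `lw`, `sw`, `halt`) and the eight-entry register file are hypothetical and do not reflect the project’s actual Verilog instruction encoding. Only the 512-word memory and the four instruction classes come from Section 3.1.

```python
MEM_WORDS = 512          # memory size from the design specification (512 x 32 bits)
MASK32 = 0xFFFFFFFF      # keep register values within 32 bits
NUM_REGS = 8             # hypothetical register count for this sketch


def run(memory, max_steps=1000):
    """Fetch-decode-execute loop over one memory shared by code and data.

    Instructions are hypothetical tuples, e.g. ("add", rd, rs, rt);
    data words are plain integers stored in the same list.
    """
    regs = [0] * NUM_REGS
    pc = 0
    for _ in range(max_steps):
        instr = memory[pc]           # FETCH from the shared (Von Neumann) memory
        op = instr[0]                # DECODE: the opcode selects the instruction class
        pc += 1
        if op in ("add", "sub", "slt"):          # R-type: arithmetic or comparison
            _, rd, rs, rt = instr
            a, b = regs[rs], regs[rt]
            if op == "add":
                regs[rd] = (a + b) & MASK32
            elif op == "sub":
                regs[rd] = (a - b) & MASK32
            else:
                regs[rd] = 1 if a < b else 0     # unsigned comparison in this sketch
        elif op == "beq":                        # Branch: jump if operands are equal
            _, rs, rt, target = instr
            if regs[rs] == regs[rt]:
                pc = target
        elif op == "lw":                         # Load: memory word -> register
            _, rd, addr = instr
            regs[rd] = memory[addr]
        elif op == "sw":                         # Store: register -> memory word
            _, rs, addr = instr
            memory[addr] = regs[rs]
        elif op == "halt":
            return regs
        else:
            raise ValueError(f"unknown opcode {op!r} at address {pc - 1}")
    raise RuntimeError("step limit reached without halt")


# Program and data share the same 512-word memory.
memory = [0] * MEM_WORDS
memory[0:7] = [
    ("lw", 1, 100),        # r1 = mem[100]
    ("lw", 2, 101),        # r2 = mem[101]
    ("add", 3, 1, 2),      # r3 = r1 + r2
    ("sw", 3, 102),        # mem[102] = r3
    ("beq", 0, 0, 6),      # r0 == r0, so always jump to address 6
    ("sw", 0, 102),        # skipped by the branch
    ("halt",),
]
memory[100], memory[101] = 7, 35
regs = run(memory)
print(memory[102])         # 42
```

Representing instructions as tuples keeps the sketch focused on the control flow; a hardware decoder would instead slice the opcode and register fields out of each 32-bit word.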



Chapter 4

4.0 Background of Floating Point Representation

Many basic microprocessors can handle only integer manipulations and are unable to perform real number arithmetic. Real number manipulation allows processors to handle rational numbers, as well as approximations of irrational numbers. This is very important for data analysis and for the manipulation of various signals within Digital Signal Processing (DSP) devices.

An important part of handling real numbers is scientific notation, a way of expressing real numbers that may be too large or too small to be conveniently written in decimal notation. This notation is presented as

$$[\text{fraction}] \times 10^{[\text{exponent}]}$$

that is,

$$[\text{real}] \times 10^{[\text{integer}]}$$

More often than not, scientific notation is expressed in its normalized format, in which exactly one nonzero digit of the real number appears to the left of the decimal point. This allows for easy comparison of the magnitudes of two numbers, as their orders of magnitude are expressed solely by the exponent.



Examples of real numbers:

$11/5 = 2.2_{\text{ten}}$

$\pi \approx 3.141593_{\text{ten}}$

$5.73_{\text{ten}} \times 10^{-4}$ (normalized scientific notation)

$235.9722 \times 10^{8}$ (scientific notation)

The floating point representation of real binary values allows microprocessors to manipulate real numbers. This notation deals with the fractions created by real numbers through the placement of binary points¹ as well as scientific notation.

Examples of real binary numbers:

$110111.11_{\text{two}} = 55.75_{\text{ten}}$

$1011_{\text{two}} \times 2^{3}$ (scientific notation)

$1.0001_{\text{two}} \times 2^{-7}$ (normalized scientific notation)
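Normalizing a binary number amounts to moving the binary point until a single 1 remains to its left and counting how many places it moved. The following is a minimal Python sketch, assuming unsigned binary strings as input; the function names `parse_binary` and `normalize_binary` are illustrative and not part of the project’s design. It reproduces the examples above, for instance 110111.11 (binary) = 55.75 and 0.00000010001 (binary) = 1.0001 x 2^-7.

```python
from fractions import Fraction


def parse_binary(text):
    """Exact value of an unsigned binary string such as '110111.11'."""
    whole, _, frac = text.partition(".")
    value = Fraction(int(whole or "0", 2))
    for place, bit in enumerate(frac, start=1):
        if bit == "1":
            value += Fraction(1, 2 ** place)   # bit at place k weighs 2^-k
    return value


def normalize_binary(text):
    """Return (fraction_bits, exponent) with value = 1.fraction_bits x 2^exponent.

    Raises ValueError for zero, which has no normalized form.
    """
    point = text.index(".") if "." in text else len(text)
    digits = text.replace(".", "")
    first_one = digits.index("1")             # leading 1 becomes the bit left of the point
    exponent = point - first_one - 1          # places the binary point moved
    fraction_bits = digits[first_one + 1:].rstrip("0")
    return fraction_bits, exponent


for number in ("110111.11", "0.00000010001"):
    bits, exp = normalize_binary(number)
    print(f"{number} = 1.{bits} x 2^{exp} = {float(parse_binary(number))}")
```

Exact `Fraction` arithmetic is used so that the check is not itself affected by floating point rounding.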

There are different formats for handling floating point binary, such as the MIPS and IEEE-754 standards. In the design of a floating point unit, both require specific sizes for the exponent and the fraction. The sizes of the exponent and fraction (the fraction is commonly referred to as the mantissa) are determined by the size of the fixed word. A larger exponent is ideal for a large range of numbers, while a larger fraction allows for a more precise representation of the numbers within the reduced range. For a 32-bit word neither of these is much of a problem, as there is a relatively large range with significant precision.

¹ The binary point is the binary equivalent of a decimal point, as we are now working in binary notation instead of decimal notation.

The MIPS floating point representation was designed by MIPS Technologies:

$$(-1)^{\text{sign}} \times [\text{fraction}] \times 2^{[\text{exponent}]}$$

With the 32-bit MIPS representation, floating point binary is expressed as:

Bit 31: S (sign)
Bits 30-23: EXPONENT [8 bits]
Bits 22-0: FRACTION [23 bits]

Figure 4: Floating Point Binary

This format uses 23 bits to express the fraction and 8 bits to express the exponent. The exponent is stored with a bias of 127 (the true exponent is the stored value minus 127), so the 8-bit stored values 0 to 255 correspond to exponents from -127 to +128.

MIPS may not have many limitations, but it is not the best representation of floating point numbers for binary computing. A more commonly used standard is the IEEE-754 representation of floating point binary:

$$(-1)^{\text{sign}} \times (1 + [\text{fraction}]) \times 2^{[\text{exponent}]}$$


17

It still uses the 32-bit format expressed like the MIPS, but the format assumes that the

fraction is constantly normalized, which enables the most significant bit to be implied.

This hidden bit allows for the fraction to actually be 24-bits instead of 23-bits long.
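The field layout and the hidden bit can be checked with a short Python model. This is a sketch that assumes Python's standard `struct` module produces the single-precision bit pattern. `decode_ieee754` is a hypothetical helper, and it handles normalized numbers only (not zero, subnormals, Inf or NaN).

```python
import struct

def decode_ieee754(x: float):
    """Split a value into IEEE-754 single-precision fields and rebuild it from them."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]   # raw 32-bit pattern
    sign = bits >> 31                                     # bit 31
    stored_exp = (bits >> 23) & 0xFF                      # bits 30-23, biased by 127
    fraction = bits & 0x7FFFFF                            # bits 22-0
    # Normalized numbers only: restore the hidden leading 1 and remove the bias
    value = (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (stored_exp - 127)
    return sign, stored_exp, fraction, value

print(decode_ieee754(5.0))    # (0, 129, 2097152, 5.0)
print(decode_ieee754(-0.75))  # (1, 126, 4194304, -0.75)
```

For 5.0 (binary 1.01 × 2^2) the stored exponent is 129, that is 2 + 127. Only the bits after the hidden 1 (.01) appear in the fraction field.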

This format is preferred over the MIPS format mainly because it reserves special representations for certain values, such as infinity (Inf) and Not-a-Number (NaN). These let exceptional results be represented without raising interrupts.

Value | Exponent | Fraction | Exponent and fraction bits
Zero | 0 | 0 | 00000000 00000000000000000000000
Signaling NaN | 255 | nonzero | 11111111 00000000000000000000001
Quiet NaN | 255 | nonzero | 11111111 10000000000000000000000
Infinity | 255 | 0 | 11111111 00000000000000000000000

Table 1: IEEE-754 Special Representations

These special representations do not cover overflow and underflow exceptions. Overflow occurs when an exponent is too large to be represented. Underflow occurs when a negative exponent is too large in magnitude to be represented.
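The special encodings of Table 1 and the overflow and underflow behaviour can be observed from Python. This is again a sketch that assumes the standard `struct` module performs the float32 conversion; `from_bits` and `to_bits` are hypothetical helper names.

```python
import math
import struct

def from_bits(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE-754 single-precision value."""
    return struct.unpack('>f', struct.pack('>I', bits))[0]

def to_bits(x: float) -> int:
    """Encode x as float32 bits; struct raises OverflowError if x is too large."""
    return struct.unpack('>I', struct.pack('>f', x))[0]

# Special encodings from Table 1 (sign bit 0)
print(from_bits(0x00000000))              # 0.0  (exponent 0, fraction 0)
print(from_bits(0x7F800000))              # inf  (exponent 255, fraction 0)
print(math.isnan(from_bits(0x7FC00000)))  # True (quiet NaN: fraction MSB set)
print(math.isnan(from_bits(0x7F800001)))  # True (signaling NaN: nonzero fraction, MSB clear)

# Underflow: the exponent is too negative, so the value collapses to zero
print(to_bits(1e-50))                     # 0

# Overflow: the exponent is too large for the 8-bit field
try:
    to_bits(1e39)
except OverflowError as err:
    print("overflow:", err)
```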


4.1 Floating Point Unit

The floating point unit designed in this project uses the IEEE-754 format for design optimization. The unit performs the standard ALU operations, as well as a few extra operations that are specific to the floating point format. These operations include:

● Addition

● Subtraction

● Multiplication

● Division

● Power

● Square Root

● Floating Point to Integer

● Integer to Floating Point

Many of the algorithms used throughout the design of the floating point unit are based on basic arithmetic that can be done by hand.

Figure 5: Floating Point Block Diagram

4.2 Addition and Subtraction

The addition and subtraction modules follow very similar algorithms, as it is easy to switch between the two functions. The two functions were nevertheless kept as separate modules rather than combined into one. This makes a pipelined implementation easier, so that multiple instructions can be in progress before an earlier operation completes.

The two algorithms follow the same basic initial steps:

Step 1:

Compare the exponents of the two numbers and shift the significand of the smaller number to the right until the exponents match.

The shift gives both numbers the same exponent. Their significands can then be added with a basic arithmetic adder/subtractor such as one taken from an ALU.

Step 2:

Add or subtract the significands.

The addition or subtraction module is called according to the instruction being executed.

Step 3:

Normalize the sum by shifting right or left.

Normalization corrects the sum for overflow (a carry out of the leading bit) or underflow (leading zeros after a subtraction). This is required because every floating point number is kept normalized, which keeps the arithmetic algorithms consistent.

Step 4:

Round the significand.

Rounding the significand increases accuracy. However, it was judged that rounding would slow the device too much, given the already high accuracy of a 23-bit mantissa. Truncation is performed instead, to maintain the high speed at which the unit operates.
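The four steps can be modelled bit-accurately in software. The Python sketch below is an illustration, not the project's Verilog. It makes simplifying assumptions: both operands are positive and normalized, and there is no exponent overflow. The names `split_f32`, `join_f32` and `fp_add_truncate` are hypothetical. The sketch aligns, adds, normalizes and truncates as described above.

```python
import struct

def split_f32(x: float):
    """Return (biased exponent, 24-bit significand with the hidden 1) of a normalized float32."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    return (bits >> 23) & 0xFF, (bits & 0x7FFFFF) | (1 << 23)

def join_f32(exp: int, sig: int) -> float:
    """Build a positive float32 from a biased exponent and a normalized 24-bit significand."""
    bits = (exp << 23) | (sig & 0x7FFFFF)          # the hidden 1 is dropped again
    return struct.unpack('>f', struct.pack('>I', bits))[0]

def fp_add_truncate(a: float, b: float) -> float:
    """Add two positive, normalized float32 values using the four steps above."""
    ea, sa = split_f32(a)
    eb, sb = split_f32(b)
    if ea < eb:                                    # make a the operand with the larger exponent
        ea, sa, eb, sb = eb, sb, ea, sa
    sb >>= ea - eb                                 # Step 1: align; shifted-out bits are lost
    total, exp = sa + sb, ea                       # Step 2: add significands (up to 25 bits)
    if total >> 24:                                # Step 3: a carry out means shift right once
        total >>= 1                                # Step 4: the dropped bit is truncated, not rounded
        exp += 1
    return join_f32(exp, total)

print(fp_add_truncate(5.0, 0.75))   # 5.75
print(fp_add_truncate(1.5, 1.5))    # 3.0
```

Adding two positive normalized significands can overflow but can never lose the leading 1, so only the right-shift case of Step 3 appears here. A subtraction model would also need the left-shift case.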

Figure 6: Addition/Subtraction Module


4.2.1 Addition

For simplicity, the significands could be added with a basic ripple-carry adder. However, a Carry Look-Ahead Adder (CLA) produces results faster. It calculates “generate” and “propagate” signals for each bit, so the carries within a group do not have to wait for a carry to ripple through. The generate signal is formed by passing the two input bits through an AND gate; it indicates that a bit position produces a carry on its own. The propagate signal is formed in parallel by passing the two input bits through an OR gate (the Appendix A code uses XOR, which also serves for the sum bit). It indicates whether an incoming carry will be passed along.

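In this project a 24-bit CLA was used to increase the speed of the function. In Appendix A it is built from six 4-bit look-ahead blocks, with the carry out of each block feeding the next.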
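The carry equations above can be checked exhaustively in software. This Python sketch uses hypothetical function names. It mirrors the structure of the Appendix A `fourbitadder` and `bitadder` modules, with propagate formed by XOR.

```python
def cla4(a: int, b: int, cin: int):
    """4-bit carry look-ahead block: returns (4-bit sum, carry out)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]   # generate:  a_i AND b_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]   # propagate: a_i XOR b_i
    # Every carry is a flat sum of products of g, p and cin, so none waits on a ripple
    c1 = g[0] | (p[0] & cin)
    c2 = g[1] | (g[0] & p[1]) | (p[0] & p[1] & cin)
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (p[0] & p[1] & p[2] & cin)
    c4 = (g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3])
          | (g[0] & p[1] & p[2] & p[3]) | (p[0] & p[1] & p[2] & p[3] & cin))
    carries = [cin, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (p[i] ^ carries[i]) << i   # sum bit = propagate XOR incoming carry
    return total, c4

def add24(a: int, b: int, cin: int = 0) -> int:
    """24-bit adder built from six 4-bit CLA blocks; carries pass from block to block."""
    result, carry = 0, cin
    for block in range(6):
        s, carry = cla4((a >> 4 * block) & 0xF, (b >> 4 * block) & 0xF, carry)
        result |= s << (4 * block)
    return result | (carry << 24)           # 25-bit result: bit 24 is the final carry

print(hex(add24(0xA00000, 0x180000)))   # 0xb80000
print(hex(add24(0xFFFFFF, 0x000001)))   # 0x1000000
```

Inside a block no carry waits on another carry. Between blocks the carry still passes from one block to the next, which is the compromise the six-block structure makes.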

Figure 7: Carry Look-Ahead Adder


4.2.2 Subtraction

The subtraction of the significands uses the same CLA as the addition module. As the difference between addition and subtraction is minimal, it was straightforward to turn the addition module into a subtraction module.

The only technical change from addition to subtraction is that the significand of the subtrahend is converted into a negative value through two’s complement manipulation (invert every bit, then add one).
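The same subtraction-by-addition idea can be sketched in Python for 24-bit significands. `twos_complement24` and `sub24` are hypothetical names, and this is a model rather than the project's Verilog. The negation must be applied to the whole 24-bit word at once.

```python
MASK24 = (1 << 24) - 1

def twos_complement24(x: int) -> int:
    """Negate a 24-bit value: invert every bit, then add one (modulo 2**24)."""
    return (~x + 1) & MASK24

def sub24(a: int, b: int):
    """Compute a - b with an adder by adding the two's complement of b.

    Returns (24-bit difference, carry out); for nonzero b the carry out is 1 exactly when a >= b.
    """
    total = a + twos_complement24(b)
    return total & MASK24, total >> 24

diff, carry = sub24(0xA00000, 0x180000)   # aligned significands of 5 and 0.75
print(hex(diff), carry)                   # 0x880000 1
```

The result 0x880000 is binary 1.0001 scaled by 2^23, which is the significand of 4.25.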

Figure 8: Two's Complement


4.3 Multiplication and Division

As with the addition and subtraction modules, the multiplication and division modules follow similar steps when dealing with floating point notation.

Step 1: Addition/Subtraction of Exponents Without Bias

The exponents are added (multiplication) or subtracted (division), just as if this were done by hand. The bias of 127 is removed from each exponent and added back once to the result, so that it is not counted twice.

Step 2: Manipulation of Significands

The significands are multiplied or divided at this stage, by calling a separate module that performs the specified operation.

Step 3: Check if Normalized and for Overflow

Binary multiplication produces a result as wide as the sum of its input widths. The product (or quotient) must therefore be checked to see whether it is still normalized, and the exponent must be checked for overflow.

Step 4: Rounding or Truncation

Truncation was chosen because the mantissa is already large, and for the sake of speed. Rounding was deemed unnecessary for a floating point number that already holds such precision.

Step 5: Set the Sign

The sign is set by passing the two sign bits through an XOR gate to produce the appropriate value.
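The following is a bit-level Python sketch of the five steps, for normalized operands with truncation. The names `split_f32`, `join_f32` and `fp_mul_truncate` are hypothetical. Raising `OverflowError` stands in for the hardware overflow flag.

```python
import struct

BIAS = 127

def split_f32(x: float):
    """Return (sign, biased exponent, 24-bit significand with hidden 1) of a normalized float32."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, (bits & 0x7FFFFF) | (1 << 23)

def join_f32(sign: int, exp: int, sig: int) -> float:
    """Pack sign, biased exponent and a normalized 24-bit significand into a float32."""
    bits = (sign << 31) | (exp << 23) | (sig & 0x7FFFFF)
    return struct.unpack('>f', struct.pack('>I', bits))[0]

def fp_mul_truncate(a: float, b: float) -> float:
    """Multiply two normalized float32 values with the five steps above (truncating)."""
    sa, ea, ma = split_f32(a)
    sb, eb, mb = split_f32(b)
    exp = (ea - BIAS) + (eb - BIAS) + BIAS   # Step 1: add unbiased exponents, re-apply bias once
    product = ma * mb                        # Step 2: 24 x 24 -> up to 48-bit product
    if product >> 47:                        # Step 3: 1.x * 1.y lies in [1, 4);
        exp += 1                             #   if it is >= 2, renormalize by one place
        sig = product >> 24
    else:
        sig = product >> 23
    if not 0 < exp < 255:                    # exponent overflow/underflow check
        raise OverflowError("exponent out of range")
    # Step 4: the low-order bits were simply shifted away (truncation)
    # Step 5: the sign is the XOR of the input signs
    return join_f32(sa ^ sb, exp, sig)

print(fp_mul_truncate(5.0, 0.75))   # 3.75
print(fp_mul_truncate(-2.0, 3.0))   # -6.0
```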

Figure 9: Multiplier and Divider Module


4.3.1 Multiplier

There are several algorithms for multiplication. The “rolled out” binary multiplier was used because, like the addition and subtraction modules, it is the most approachable and the easiest to understand and explain.

A simple binary multiplier performs a shift and an addition for each bit of the multiplier. This can be implemented as a loop to conserve area on the chip. The result, however, is a synchronous circuit that needs 24 clock edges (one per bit of the 24-bit significand) to complete.

The “rolled out” version implements the same basic algorithm without the synchronous loop: each stage is laid out in hardware, so the product is produced in far fewer than 24 clock edges. This structure also makes it easier to add pipelining circuitry, so that several multiplications can be in progress at once.

Step 1:

Check multiplier bit [n].

Step 2:

If multiplier bit [n] is 1, add the multiplicand to the product and place the result in the product register.

Step 3:

Shift the multiplicand left by 1 bit.

Step 4:

Shift the multiplier right by 1 bit.

Step 5:

Check whether the loop has stepped through every multiplier bit; if not, step to the next bit (n+1) and repeat.
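The loop above maps directly onto a few lines of Python. This sketch models the iterative version; the name `shift_add_multiply` is hypothetical. Because the multiplier is shifted right on every pass, the bit being checked is always its least significant bit. The rolled-out hardware corresponds to copying the loop body 24 times as separate combinational stages.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, width: int = 24) -> int:
    """Unsigned shift-and-add multiplication, one multiplier bit per pass (Steps 1-5)."""
    product = 0
    for _ in range(width):           # Step 5: repeat once for every multiplier bit
        if multiplier & 1:           # Step 1: check the current multiplier bit
            product += multiplicand  # Step 2: add the multiplicand into the product register
        multiplicand <<= 1           # Step 3: shift the multiplicand left by 1 bit
        multiplier >>= 1             # Step 4: shift the multiplier right by 1 bit
    return product

print(shift_add_multiply(0b101, 0b11, width=4))   # 15
```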


Figure 11: Multiplication Block Diagram

4.3.2 Division

The division algorithm is similar in structure to the multiplication algorithm (a shift and an add or subtract for each bit) and can be implemented in a very similar manner. It differs from the implemented multiplier in that it was kept as an iterative loop rather than rolled out.

Step 1: Subtract the divisor from the remainder and check the sign of the new remainder.

Step 2a: If the remainder is greater than or equal to zero, shift the quotient left by 1 bit and set the new LSB to one.

Step 2b: If the remainder is less than zero, shift the quotient left by 1 bit, set the new LSB to zero, and restore the remainder by adding the divisor back.

Step 3: Shift the divisor right by 1 bit.

Step 4: Check whether the loop has stepped through every remainder bit; if not, step to the next bit (n+1) and repeat.
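The following is a minimal Python model of this restoring algorithm for unsigned integers. The name `restoring_divide` is hypothetical, and the plain integer layout is an assumption; it does not reproduce the register layout of the Appendix A divider.

```python
def restoring_divide(dividend: int, divisor: int, width: int):
    """Unsigned restoring division for dividend < 2**width; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    remainder = dividend
    shifted = divisor << (width - 1)   # divisor aligned under the dividend's top bit
    quotient = 0
    for _ in range(width):
        remainder -= shifted           # Step 1: trial subtraction, then check the sign
        if remainder >= 0:
            quotient = (quotient << 1) | 1   # Step 2a: keep it, new LSB = 1
        else:
            remainder += shifted             # Step 2b: restore, new LSB = 0
            quotient <<= 1
        shifted >>= 1                  # Step 3: shift the divisor right by 1 bit
    return quotient, remainder         # Step 4: the loop ran once per quotient bit

print(restoring_divide(200, 7, width=8))   # (28, 4)
```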

Figure 12: Division Block Diagram


Figure 13: Division Algorithm

The loop was kept because the rolled-out multiplier had already been built, so a looped divider would provide a useful comparison during simulation and timing analysis.

4.4 Float to Integer

The float to integer unit was designed mainly for use within the Power Module. It separates the 23-bit fraction into an integer, a numerator and a denominator.

It does this by placing the fraction into a shift register twice as large as an integer register (2 × 32 bits), to maximize the size of the integers that can be produced. To produce an integer, the exponent must be zero. The large register is therefore shifted left or right according to the value of the exponent until the exponent is zero. If the exponent is too large for the shift register to handle, the register is shifted to the far right or far left and the exponent is adjusted accordingly.

Figure 14: Float to Integer Block


The numerator and denominator are formed by stepping through the bottom segment (32 bits) of the shift register while counting the value of the bits. As the bits are counted they follow the equations below, where $V$ is the accumulated value of the counted bits:

$V = \sum_{0}^{x} \frac{1}{2^{x}}$

Equation 1

$\text{fraction}(x) = \frac{\text{Numerator}}{\text{Denominator}} = \begin{cases} \frac{\text{Numerator}}{\text{Denominator}}, & x = 0 \\ V + \frac{\text{Numerator}}{\text{Denominator}}, & x = 1 \end{cases}$

Equation 2
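The following Python sketch shows the same split into an integer, numerator and denominator. It is an illustration under stated assumptions, not the project's converter:

- It works on arbitrary-precision integers rather than a 2 × 32-bit shift register.
- It keeps the fractional bits over a power-of-two denominator instead of counting bits as in Equations 1 and 2.
- It reduces the fraction with `math.gcd`, which the original design does not mention.

`float_to_int_parts` is a hypothetical name.

```python
import struct
from math import gcd

def float_to_int_parts(x: float):
    """Split a positive, normalized float32 into (integer, numerator, denominator)."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    exp = ((bits >> 23) & 0xFF) - 127         # unbiased exponent
    sig = (bits & 0x7FFFFF) | (1 << 23)       # 1.fraction scaled by 2**23
    frac_bits = 23 - exp                      # bits that remain right of the binary point
    if frac_bits <= 0:                        # large exponent: shifting left leaves no fraction
        return sig << -frac_bits, 0, 1
    integer = sig >> frac_bits                # bits left of the binary point
    numerator = sig & ((1 << frac_bits) - 1)  # bits right of the binary point
    denominator = 1 << frac_bits              # their combined weight is numerator / 2**frac_bits
    common = gcd(numerator, denominator)      # reduce the fraction, e.g. 6/8 -> 3/4
    return integer, numerator // common, denominator // common

print(float_to_int_parts(5.75))   # (5, 3, 4)
print(float_to_int_parts(0.75))   # (0, 3, 4)
```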

4.5 Integer to Float

Figure 15: Integer to Float Diagram

The Integer to Float Module accepts its input in signed binary integer format and normalizes the integer, which gives it an exponent of its own. The importance of normalization was discussed previously in Section 4.0.
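The following is a Python sketch of this normalization. The name `int_to_float32` is hypothetical. The sketch assumes that extra low-order bits are truncated, matching the truncation used elsewhere in the unit.

```python
import struct

def int_to_float32(n: int) -> float:
    """Convert a signed 32-bit integer to float32 by normalizing it (extra low bits truncated)."""
    if n == 0:
        return 0.0
    sign = 1 if n < 0 else 0
    magnitude = -n if sign else n
    msb = magnitude.bit_length() - 1          # position of the leading 1 = unbiased exponent
    if msb > 23:
        sig = magnitude >> (msb - 23)         # more than 24 significant bits: truncate
    else:
        sig = magnitude << (23 - msb)         # move the leading 1 up to bit 23
    bits = (sign << 31) | ((msb + 127) << 23) | (sig & 0x7FFFFF)
    return struct.unpack('>f', struct.pack('>I', bits))[0]

print(int_to_float32(5))            # 5.0
print(int_to_float32(-6))           # -6.0
print(int_to_float32(2**31 - 1))    # 2147483520.0
```

The last example shows the effect of truncation. 2^31 − 1 has 31 significant bits, so its lowest 7 bits are dropped; rounding to nearest would instead give 2^31.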


4.6 Power Approximation

The Power Module of the FPU was initially going to use a recursive algorithm, but the resulting looping algorithm raised more issues than were acceptable for determining the power of a floating point number.

The first issue was that the loop was, in fact, a loop: its running time grows with the exponent. Floating point representation handles very large real numbers, so it would be unwise to loop for extremely large numbers with large exponents. The loop method would be far too slow for floating point representation.

The second issue was the difficulty of raising a number to a real power (for example $2.5^{23.194}$). The looped algorithm had initially only dealt with integer exponents, and applying it to real exponents made the problem much more difficult.

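The first issue was addressed by changing the Power Module into a Power Approximation Module. This module uses the IEEE-754 binary representation of a 32-bit floating point number to estimate $\log_2(X)$.

Consider a positive X with true exponent $e$ and stored fraction field $f$. Its bit pattern, read as an integer, is $X_{integer} = (e + 127) \cdot 2^{23} + f$. Dividing by $2^{23}$ and removing the bias leaves $e + f/2^{23}$. Since $\log_2(1 + m) \approx m$ for $0 \le m < 1$, this is close to $\log_2(X) = e + \log_2(1 + f/2^{23})$:

$\log_2(X) \approx \frac{X_{integer}}{2^{23}} - 127$

Equation 3

This approximation method is fairly accurate for its speed.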
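Equation 3 can be tried directly in Python and compared against `math.log2`. The helper names are hypothetical, and `struct` is used to read the float32 bit pattern as X_integer.

```python
import math
import struct

def float_bits(x: float) -> int:
    """X_integer: the float32 bit pattern of x read as an unsigned integer."""
    return struct.unpack('>I', struct.pack('>f', x))[0]

def log2_estimate(x: float) -> float:
    """Equation 3: log2(x) ~ X_integer / 2**23 - 127 (positive, normalized x)."""
    return float_bits(x) / 2 ** 23 - 127

for x in (1.0, 2.0, 5.0, 10.0):
    print(x, log2_estimate(x), round(math.log2(x), 4))
# 1.0 0.0 0.0
# 2.0 1.0 1.0
# 5.0 2.25 2.3219
# 10.0 3.25 3.3219
```

The estimate is exact at powers of two and never exceeds the true value, because $\log_2(1 + m) \ge m$ on $0 \le m < 1$.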


Figure 16: Log2 vs IEEE Estimate

However, when the logarithmic value is manipulated further, much of its precision is lost compared with the actual value.

Real | Estimate | “Lossy” Estimate
$X = 5$ | $X = 5$, $X_{integer} = 1084227584$ | 1084227584
$Y = \log_2(X) = 2.3219$ | $Y = X_{integer}/2^{23} - 127 = 2.25$ | 2
$Z = 2Y = 4.6439$ | $Z = 2Y = 4.5$ | 4
$2^{Z} = 25$ | $(Z + 127) \times 2^{23} = 1103101952$ | 1065353220
 | $X_{Float} = 16$ | 1

Table 2: Log Estimate Error

This issue can be resolved by scaling up $X_{integer}$ before passing it through the logarithmic estimate function, so that the fractional part of the logarithm is not lost. In this implementation of the algorithm, $X_{integer}$ was shifted by two decimal places (multiplied by 100). The results can be seen in the table below.

Real | Estimate | “Lossy” Estimate
$X = 5$ | $X_{integer} \times 100 = 108422758400$ | 108422758400
$Y = \log_2(X) = 2.3219$ | $Y = X_{integer}/2^{23} - 127 \times 100 = 225$ | 225
$Z = 2Y = 4.6439$ | $Z = 2Y = 450$ | 450
$2^{Z} = 25$ | $(Z + 127 \times 100) \times 2^{23} / 100 = 1103101952$ | 1103101952
 | $X_{Float} = 24$ | 24

Table 3: Log Estimate Error Correction

This change greatly increases the accuracy of the power module's estimate. The accuracy can be improved further by shifting the initial $X_{integer}$ by several more places.
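The effect of scaling can be reproduced with integer-only arithmetic in Python. This sketch uses hypothetical helper names and assumes floor division for every integer division. X is squared as 2^(2·log2 X). With no scaling, the logarithm of 5 truncates to 2 and the result is 16.0. Scaling by 100 keeps 2.25 and gives 24.0, against an exact value of 25.

```python
import struct

def float_bits(x: float) -> int:
    """Read the float32 bit pattern of x as an unsigned integer."""
    return struct.unpack('>I', struct.pack('>f', x))[0]

def bits_float(b: int) -> float:
    """Interpret a 32-bit pattern as a float32 value."""
    return struct.unpack('>f', struct.pack('>I', b))[0]

def square_via_log_estimate(x: float, scale: int) -> float:
    """Estimate x**2 as 2**(2*log2(x)) with integer-only arithmetic.

    X_integer is multiplied by `scale` before the division by 2**23, so that the
    fractional part of the logarithm survives integer (floor) division.
    """
    y = (float_bits(x) * scale) // 2 ** 23 - 127 * scale     # scaled Equation 3
    z = 2 * y                                                # still scaled by `scale`
    return bits_float((z + 127 * scale) * 2 ** 23 // scale)  # back to a float32 pattern

print(square_via_log_estimate(5.0, scale=1))     # 16.0 (log truncated to 2)
print(square_via_log_estimate(5.0, scale=100))   # 24.0 (log kept as 2.25)
```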

The second issue was resolved by using the Float to Integer Converter Module. This module converts the binary real exponent into a more easily manipulated integer format:

$\left(\text{Integer} + \frac{\text{Numerator}}{\text{Denominator}}\right) \times 10^{\text{Exponent}}$

Equation 4

With the logarithmic estimate available, building a power module reduces to integer multiplication and division. In the equations below, $Y_{real} = Y_{integer} + \frac{Y_{numerator}}{Y_{denominator}}$ is the exponent as split by the Float to Integer Converter.

Example:

$\text{Power} = 2^{\,Y_{real} \cdot \log_2[X_{integer}]}$

Equation 5

$\text{Fraction} = \frac{Y_{numerator}}{Y_{denominator}} \cdot \log_2[X_{integer}]$

Equation 6

$\text{Integer} = Y_{integer} \cdot \log_2[X_{integer}]$

Equation 7

$\text{Power} = 2^{\text{Integer}} \times 2^{\text{Fraction}}$

Equation 8
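Equations 6 to 8 can be sketched in Python on top of the Equation 3 estimate. In this illustration the exponent is passed in already split into integer, numerator and denominator; in the hardware that split comes from the Float to Integer Converter. All function names are hypothetical, and results are assumed to stay in the normal float32 range.

```python
import struct

def float_bits(x: float) -> int:
    """Read the float32 bit pattern of x as an unsigned integer."""
    return struct.unpack('>I', struct.pack('>f', x))[0]

def bits_float(b: int) -> float:
    """Interpret a 32-bit pattern as a float32 value."""
    return struct.unpack('>f', struct.pack('>I', b))[0]

def log2_estimate(x: float) -> float:
    return float_bits(x) / 2 ** 23 - 127          # Equation 3

def exp2_estimate(y: float) -> float:
    return bits_float(int((y + 127) * 2 ** 23))   # inverse of Equation 3

def power_estimate(x: float, y_integer: int, y_numerator: int, y_denominator: int) -> float:
    """Estimate x**y for y = y_integer + y_numerator / y_denominator."""
    log_x = log2_estimate(x)
    integer_part = y_integer * log_x                         # Equation 7
    fraction_part = y_numerator * log_x / y_denominator      # Equation 6
    # 2**Integer * 2**Fraction == 2**(Integer + Fraction)   (Equation 8)
    return exp2_estimate(integer_part + fraction_part)

print(power_estimate(2.0, 3, 0, 1))   # 8.0
print(power_estimate(4.0, 0, 1, 2))   # 2.0
print(power_estimate(5.0, 0, 3, 4))   # 3.375 (exact value 3.3437...)
```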

These calculations are arranged in the block diagram in Figure 17, which shows the flow of the individual steps in the power approximation module.

Figure 17: Power Unit

The power block cannot handle exponents outside the range -4.2950e+009 to +4.2950e+009 (approximately ±2^32), as these numbers are too large for the algorithm to operate properly.

4.7 Square-Root

There are several iterative methods (e.g. Newton’s Method) for estimating the square root of a binary real number. The issue, once again, is that these methods take several iterations. For this reason, the Square-Root Module uses the same logarithmic approximation as the Power Module.

This is much faster than the Power Module, as it does not rely on the Float to Integer Converter. It simply follows the formulas:

$\text{half\_log} = \frac{1}{2} \cdot \log_2[X_{integer}]$

Equation 9

$\text{squareroot} = 2^{\text{half\_log}}$

Equation 10
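The following is a Python sketch of Equations 9 and 10, with hypothetical helper names. Substituting Equation 3 and its inverse shows that the whole computation reduces to $(X_{integer} + 127 \cdot 2^{23})/2$: one integer addition and a right shift, shown as `sqrt_estimate_int`. The two functions agree for positive normal inputs.

```python
import math
import struct

def float_bits(x: float) -> int:
    """Read the float32 bit pattern of x as an unsigned integer."""
    return struct.unpack('>I', struct.pack('>f', x))[0]

def bits_float(b: int) -> float:
    """Interpret a 32-bit pattern as a float32 value."""
    return struct.unpack('>f', struct.pack('>I', b))[0]

def sqrt_estimate(x: float) -> float:
    """Equations 9-10: half_log = log2(x) / 2, then squareroot = 2**half_log."""
    half_log = (float_bits(x) / 2 ** 23 - 127) / 2       # Equation 9 with the Equation 3 estimate
    return bits_float(int((half_log + 127) * 2 ** 23))   # Equation 10 via the inverse estimate

def sqrt_estimate_int(x: float) -> float:
    """The same computation with one integer add and a shift: (X_integer + 127*2**23) / 2."""
    return bits_float((float_bits(x) + (127 << 23)) >> 1)

for x in (4.0, 5.0, 16.0):
    print(x, sqrt_estimate(x), sqrt_estimate_int(x), math.sqrt(x))
# 4.0 2.0 2.0 2.0
# 5.0 2.25 2.25 2.23606797749979
# 16.0 4.0 4.0 4.0
```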


Figure 18: Squareroot Unit

4.8 Floating Point Control Unit

The Floating Point Control Unit is the most vital portion of the coprocessor, as it is responsible for organizing the coprocessor's various operations. It does this by handling only six opcode signals, each representing the specific module called to produce an output value. The control module handles the input instructions and checks for special cases. Although the IEEE-754 floating point representation was designed to handle certain special cases, it was deemed better to err on the side of caution.

Figure 19: Floating Point Control Unit

The exceptions the control unit is designed to catch are cases in which the inputs or outputs are clearly zeros, NaNs or infinities (INFs).

For example:

Input × Zero = Zero

Input + Inf = Inf

Input / Zero = NaN

After checking for special cases, the control unit calls the individual module that corresponds to the opcode received.
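The following is a Python sketch of this pre-check, applying the rules in the order shown. The opcode strings and the function name are hypothetical. Propagating a NaN input is an added assumption beyond the three listed rules.

```python
import math
from typing import Optional

def check_special_cases(op: str, a: float, b: float) -> Optional[float]:
    """Return the forced result for a special case, or None to call the arithmetic module."""
    if math.isnan(a) or math.isnan(b):
        return math.nan                          # assumed: a NaN input gives a NaN output
    if op == "mul" and (a == 0.0 or b == 0.0):
        return 0.0                               # Input x Zero = Zero
    if op == "add" and (math.isinf(a) or math.isinf(b)):
        return a if math.isinf(a) else b         # Input + Inf = Inf
    if op == "div" and b == 0.0:
        return math.nan                          # Input / Zero = NaN (this unit's convention)
    return None                                  # no special case: dispatch on the opcode

print(check_special_cases("mul", 3.0, 0.0))        # 0.0
print(check_special_cases("add", 2.0, math.inf))   # inf
print(check_special_cases("div", 5.0, 0.0))        # nan
print(check_special_cases("add", 1.0, 2.0))        # None
```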


Chapter 5

5.0 Digital Testing

Once the complete coprocessor was designed, overall digital testing began. Two types of digital design testing were performed on the design: structural analysis and timing analysis.

5.1 Structural Analysis

Structural testing is a form of testing in which specific inputs are applied to the circuit. These inputs gauge the range of the design and detect flaws within it. This differs from functional testing: in structural testing the design is known, so points along the designated test paths can be probed.

The first test case, shown in the table below, is a standard case that lies comfortably within the operating range of the floating point unit's parameters. It shows that the floating point unit operates properly under reasonable conditions.

Standard case: (Input1 = 5, Input2 = 0.75)

Operation | Real Value | Floating Point Value | “FPU” Value
A | 5 | 0_10000001_01000000000000000000000 | 5
B | 0.75 | 0_01111110_10000000000000000000000 | 0.75
Add | 5.75 | 0_10000001_01110000000000000000000 | 5.75
Sub | 4.25 | 0_10000001_00010000000000000000000 | 4.25
Mul | 3.75 | 0_10000000_11100000000000000000000 | 3.75
Div | 6.6667 | 0_10000001_10101010101010101010100 | 6.6667
Pow | 3.3437 | 0_10000000_10101110000101000111101 | 3.3599
SQRT | 2.2361 | 0_10000000_00011110101110000101000 | 2.2399

Table 4: Standard Test Case
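The Floating Point Value column can be regenerated in software when checking results. This Python sketch prints the float32 encoding of a value in the same sign_exponent_fraction layout; the name `fp32_fields` is hypothetical. For the exactly representable rows (A, B, Add, Sub, Mul) it should reproduce the table's bit strings.

```python
import struct

def fp32_fields(x: float) -> str:
    """Float32 encoding of x written as sign_exponent_fraction, as in Table 4."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    s = f"{bits:032b}"
    return f"{s[0]}_{s[1:9]}_{s[9:]}"

for value in (5.0, 0.75, 5.75, 4.25, 3.75):
    print(value, fp32_fields(value))
```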

More specific cases were also used to test the corners of the design. Results for a few of these cases are shown in the table below:

Case | A | B | ADD | SUB | MUL | DIV | POWER
Real Value | 5 | 5 | 10 | 0 | 25 | 1 | 3125
FPU Value | - | - | 10 | 0 | 25 | 1 | 2560
Real Value | 5 | -5 | 0 | 10 | -25 | -1 | 3.1605e-018
FPU Value | - | - | 0 | 10 | -25 | -1 | NaN
Real Value | 5 | 0 | 5 | 5 | 0 | NaN | 1
FPU Value | - | - | 5 | 5 | 0 | NaN | 1
Real Value | 5 | Inf | Inf | -Inf | Inf | 0 | Inf
FPU Value | - | - | Inf | -Inf | Inf | 0 | Inf

Table 5: Special Test Cases

Several more cases were tested, with the results given in Appendix B. These cases exercise the corners of the design, ranging from the smallest numbers the FPU should be able to handle all the way to the largest.

5.2 Timing Analysis

Two versions of timing analysis were applied to the digital design. The first, shown in Table 6, is the Fast Model Timing Analyzer; the second, shown in Table 7, is the Slow Model Timing Analyzer. The fast timing model uses the best-case timing model of the fastest device to analyze and report the fastest delays in the design. The slow timing model uses the worst-case timing characteristics of the design.

Worst-case tsu: 4.702 ns (from opcode[0] to subB[30])
Worst-case tco: 11.824 ns (from mulA[30] to valueout[30])
Worst-case tpd: 10.560 ns (from B[31] to valueout[29])
Worst-case th: 4.808 ns (from A[0] to mulA[0])
Worst-case Minimum tco: 4.231 ns (from floatmul:floatmulA|e[25] to valueout[3])
Worst-case Minimum tpd: 4.286 ns (from opcode[1] to valueout[7])
Fast Model Clock Setup 'clk': 4.88 MHz, period = 204.804 ns (from Power:power|float2int:float2pow|denominator[29] to Power:power|normFr[0])

Table 6: Fast Timing Analysis

The maximum operating frequency under the Fast Timing Model is a slow 4.88 MHz. Under the Slow Timing Model it is an even slower 2.21 MHz. Table 6 shows that the clock setup path runs through the Float to Integer Module used within the Power Module. That module is by far the slowest, and it limits the maximum operating frequency of the device.

Worst-case tsu: 9.168 ns (from opcode[0] to subB[30])
Worst-case tco: 24.432 ns (from mulB[24] to valueout[30])
Worst-case tpd: 20.806 ns (from B[31] to valueout[29])
Worst-case th: 9.778 ns (from A[0] to mulA[0])
Slow Model Clock Setup 'clk': 2.21 MHz, period = 453.352 ns (from Power:power|float2int:float2pow|numerator[18] to Power:power|normFr[0])

Table 7: Slow Timing Analysis

The Slow Model Analysis was then repeated without the Power Module’s float to integer converter, producing a maximum frequency of 88.75 MHz, with a Fast Timing Analysis fmax of 199.80 MHz. The slowest clock setup path was then in the Subtractor Module, which must convert its operand to two’s complement before it passes through the binary adder. Removing the two’s complement would turn the subtractor into another floating point adder. Out of curiosity it was tried anyway, and it gave impressive speed improvements. The Slow Model Analyzer reported an fmax of 144.7 MHz, while the Fast Model Analyzer reported more than double that, at 320.82 MHz.

5.3 Implementation

The coprocessor was implemented on the ALTERA DE2 development board, as shown in Figure 20. Because the board provides few inputs, it was impractical to build a complex scheme for setting the device's input values during live testing. Instead, a set of preset inputs was assigned for the purpose of presentation.

Figure 20: Altera DE2 Implementation

The switches set the opcode, and thus the operation to be performed by the device. The push buttons served as the device's reset input, used whenever a new opcode was to be entered on the board. The outputs were displayed both on the small LCD and on the 18 LEDs located above the switches.

The LCD was too small for the floating point unit to display large real numbers, so the 18 LEDs displayed the output in floating point binary format. The eight green LEDs showed the exponent value, while the rest displayed a truncated version of the mantissa.

Chapter 6

6.0 Concluding Remarks

This concluding chapter briefly reviews the project and emphasizes a few key points that developed during the course of the year.

6.1 Summary of Project Accomplishments

The coprocessor was successfully designed and implemented on the ALTERA DE2 Development board, using 32-bit data registers.

The addition and subtraction modules used the fastest of the basic binary addition algorithms, the carry look-ahead adder. The multiplication module is structured so that it can be pipelined, while the divider uses a slower looping algorithm. The Power module used IEEE logarithmic estimation to improve performance, but it was slowed down considerably by the Float to Integer Converter that it requires.

The digital design was tested and analyzed to optimize its performance characteristics. There were a few small bugs, but the floating point unit successfully passed the rigorous digital device testing. It is, however, noticeably slow compared with commercial FPUs, which operate at speeds around 250MHz.

6.2 Considerations for Future Work

There are still many possibilities for a faster Floating Point Unit coprocessor. Improving the fmax of the Float to Integer Module would increase the speed by a factor of at least four. With a faster two’s complement in the Subtractor, the worst-case maximum operating frequency would be somewhat close to the standard operating frequency of commercial FPUs.

The multiplier is ready to be pipelined. Further tests are required to see how well the coprocessor would combine with the regular 32-bit RISC microprocessor.

References

[1] Carleton University, “Laboratory Health and Safety Manual”. [Online]. Available at: http://www.doe.carleton.ca/undergrads/health-and-safety.pdf [Accessed: March 28 2010].

[2] D. A. Patterson and J. L. Hennessy, Computer Organization and Design, 3rd Ed. San Francisco: Morgan Kaufmann Publishers.

[3] Carleton University, “Microprocessor Systems”, ELEC 4601. [Online]. Available at: http://www.doe.carleton.ca/~shams/ELEC4601/Course_Notes.pdf [Accessed: Oct 17 2009].

[4] Carleton University, “Digital Design Flow”, ELEC 4706. [Online]. Available at: http://www.doe.carleton.ca/courses/ELEC4706/protected/class%20material/08-09-10%20LECTURES [Accessed: Oct 13 2009].

[5] Carleton University, “Binary Manipulation”, SYSC 3006. [Online]. Available at: http://www.sce.carleton.ca/courses/sysc-3006/f09/Part3-BinaryManipulations.pdf [Accessed: Oct 12 2009].

[6] ASIC WORLD, “Verilog Tutorials”, Deepak Kumar Tala [Online]. Available at: http://www.asic-world.com/verilog/veritut.html [Accessed: Sept 25 2009].

[7] D. Goldberg, “What Every Computer Scientist Should Know About Floating-Point Arithmetic”, 1991. [Online]. Available at: http://delivery.acm.org/10.1145/110000/103163/p5-goldberg.pdf [Accessed: Oct 5 2009].

Appendix A: Verilog Design Code

Addition Module

module adder (A,B,OUT,clk,rst,overflow,start,finish);
  input clk,rst,start;
  input [31:0] A,B;
  output [31:0] OUT;
  output finish,overflow;
  reg [7:0] expLarge,diffreg;
  reg [23:0] shift,noshift,out;
  reg sign,snorm;
  wire signA,signB,expoverflow,fnorm;
  wire [24:0] addoutput,normLarge;
  wire [7:0] expA,expB,diff,expNorm;
  wire [23:0] fractionA,fractionB,normout,shiftout;

  assign fractionA[22:0]=A[22:0];
  assign fractionB[22:0]=B[22:0];
  assign fractionA[23]=1;
  assign fractionB[23]=1;
  assign expA=A[30:23];
  assign expB=B[30:23];
  assign diff=diffreg;
  assign signA=A[31];
  assign signB=B[31];

  //1.0 ALU Difference and shift
  SHIFTR8 SHIFT8(shift[23:0],shiftout[23:0],diff);

  always@(posedge clk or posedge rst) begin
    if(rst) begin
      shift<=24'b0; noshift<=24'b0; expLarge<=8'b0;
      diffreg<=8'b0; sign<=1'b0; snorm<=1'b0;
    end
    else if(start) begin
      if(expA==expB) begin
        shift<=fractionA; noshift<=fractionB; expLarge<=expA;
        diffreg<=8'b0; sign<=signA; snorm<=1'b1;
      end
      else if(expA>expB) begin
        shift<=fractionB; noshift<=fractionA; expLarge<=expA;
        diffreg<=expA-expB; sign<=signA; snorm<=1'b1;
      end
      else if(expB>expA) begin
        shift<=fractionA; noshift<=fractionB; expLarge<=expB;
        diffreg<=expB-expA; sign<=signB; snorm<=1'b1;
      end
      else begin
        shift<=shift; noshift<=noshift; expLarge<=expLarge;
        diffreg<=diffreg; sign<=sign; snorm<=1'b0;
      end
    end
    else begin
      shift<=shift; noshift<=noshift; expLarge<=expLarge;
      diffreg<=diffreg; sign<=sign; snorm<=1'b0;
    end
  end

  // Add Significands
  bitadder add(noshift,shiftout,1'b0,addoutput);

  // Normalize
  normalizer addnorm(expLarge,addoutput,expNorm,normLarge,clk,rst,expoverflow,snorm,fnorm);

  // check for overflow?
  assign overflow=expoverflow;
  // output exponent
  assign OUT[30:23]=expNorm;
  // output truncated mantissa
  assign OUT[22:0]=normLarge[22:0];
  // output sign
  assign OUT[31]=sign;
  assign finish=fnorm;
endmodule

Subtraction Module

module subtractor (A,B,OUT,clk,rst,overflow,start,finish);
  input clk,rst,start;
  input [31:0] A,B;
  output [31:0] OUT;
  output overflow,finish;
  reg [7:0] expLarge,diffreg;
  reg [23:0] shift,noshift,out;
  reg sign,snorm;
  wire signA,signB,expoverflow,fnorm;
  wire [24:0] suboutput,normLarge;
  wire [7:0] expA,expB,diff,expNorm;
  wire [23:0] fractionA,fractionB,normout,shiftout,shiftout1;

  assign fractionA[22:0]=A[22:0];
  assign fractionA[23]=1;
  assign fractionB[22:0]=B[22:0];
  assign fractionB[23]=1;
  assign expA=A[30:23];
  assign expB=B[30:23];
  assign diff=diffreg;
  assign signA=A[31];
  assign signB=B[31];

  //1.0 ALU Difference and shift
  SHIFTR8 SHIFT8sub(shift[23:0],shiftout[23:0],diff);

  always@(posedge clk or posedge rst) begin
    if(rst) begin
      shift<=24'b0; noshift<=24'b0; expLarge<=8'b0;
      diffreg<=8'b0; sign<=1'b0; snorm<=1'b0;
    end
    else if(start) begin
      if(expA==expB) begin
        shift<=fractionA; noshift<=fractionB; expLarge<=expA;
        diffreg<=8'b0; sign<=1'b0; snorm<=1'b1;
      end
      else if(expA>expB) begin
        shift<=fractionB; noshift<=fractionA; expLarge<=expA;
        diffreg<=expA-expB; sign<=1'b0; snorm<=1'b1;
      end
      else if(expB>expA) begin
        shift<=fractionA; noshift<=fractionB; expLarge<=expB;
        diffreg<=expB-expA; sign<=1'b1; snorm<=1'b1;
      end
      else begin
        shift<=shift; noshift<=noshift; expLarge<=expLarge;
        diffreg<=diffreg; sign<=sign; snorm<=1'b0;
      end
    end
    else begin
      shift<=shift; noshift<=noshift; expLarge<=expLarge;
      diffreg<=diffreg; sign<=sign; snorm<=1'b0;
    end
  end

  //2.0 Add Significands
  // this is the slowest part (by about 100MHz); the inversion is to blame
  // two's complement of the aligned operand: invert all 24 bits, then add one
  wire [23:0] negtemp;
  assign negtemp=~shiftout+1'b1;
  bitadder sub(noshift,negtemp,1'b0,suboutput);

  // Normalize
  normalizer addnorm(expLarge,suboutput,expNorm,normLarge,clk,rst,expoverflow,snorm,fnorm);

  // check for overflow?
  assign overflow=expoverflow;
  // output exponent
  assign OUT[30:23]=expNorm;
  // output truncated mantissa
  assign OUT[22:0]=normLarge[22:0];
  // output sign
  assign OUT[31]=sign;
  assign finish=fnorm;
endmodule

Normalization Module

module normalizer(expin,in,expout,out,clk,rst,overflow,start,finish);
  input clk,rst,start;
  input [7:0] expin;
  input [24:0] in;
  output [23:0] out;
  output [7:0] expout;
  output finish,overflow;
  reg active,first;
  reg [24:0] regF,fregF;
  reg [8:0] regE,fregE;

  always@(posedge clk or posedge rst) begin
    if(rst) begin
      regF<=25'b0; regE<=9'b0; fregF<=25'b0; fregE<=9'b0;
      active<=1'b0; first<=1'b0;
    end
    else if(start) begin
      if(!first) begin
        // first cycle: load the unnormalized significand and its exponent
        fregF<=fregF; fregE<=fregE;
        regF<=in[24:0];
        regE<={1'b0,expin[7:0]};
        active<=1'b1; first<=1'b1;
      end
      else if(regF[24]==1'b1) begin
        // carry out of the significand: shift right
        regF<=regF>>1'b1;
        regE<=regE+1'b1; // Increment Exponent
        active<=1'b1; first<=1'b1;
      end
      else if(regF[23]==1'b0 && regF[24]==1'b0) begin
        // leading 1 below bit 23: shift left
        regF<=regF<<1'b1;
        regE<=regE-1'b1; // Decrement Exponent
        active<=1'b1; first<=1'b1;
      end
      else begin
        // normalized: latch the result
        regE<=regE; regF<=regF;
        fregE<=regE; fregF<=regF;
        active<=1'b0; first<=1'b1;
      end
    end
    else begin
      regE<=regE; regF<=regF; fregE<=fregE; fregF<=fregF;
      active<=1'b0; first<=1'b0;
    end
  end

  assign out=fregF[23:0];
  assign expout=fregE[7:0];
  // exponent carried (or borrowed) out of its 8-bit range
  assign overflow=fregE[8];
  assign finish=~active;
endmodule

24-bit Addition Module

module bitadder(addinA,addinB,carryin,sum);
  input [23:0] addinA,addinB;
  input carryin;
  output [24:0] sum;
  wire carryout1,carryout2,carryout3,carryout4,carryout5,carryout6;
  wire [3:0] sum1,sum2,sum3,sum4,sum5,sum6;

  fourbitadder adder1(addinA[3:0],addinB[3:0],carryin,sum1,carryout1);
  fourbitadder adder2(addinA[7:4],addinB[7:4],carryout1,sum2,carryout2);
  fourbitadder adder3(addinA[11:8],addinB[11:8],carryout2,sum3,carryout3);
  fourbitadder adder4(addinA[15:12],addinB[15:12],carryout3,sum4,carryout4);
  fourbitadder adder5(addinA[19:16],addinB[19:16],carryout4,sum5,carryout5);
  fourbitadder adder6(addinA[23:20],addinB[23:20],carryout5,sum6,carryout6);

  assign sum[24] = carryout6;
  assign sum[23:20] = sum6;
  assign sum[19:16] = sum5;
  assign sum[15:12] = sum4;
  assign sum[11:8] = sum3;
  assign sum[7:4] = sum2;
  assign sum[3:0] = sum1;
  assign test=addinA+addinB;
endmodule

4-bit Addition Module

module fourbitadder(addinA,addinB,carryin,sum,carryout);
  input [3:0] addinA,addinB;
  input carryin;
  output [3:0] sum;
  output carryout;
  wire [3:0] generation,propagation;
  wire [2:0] carrybit;

  assign generation = addinA&addinB;   // g_i = a_i AND b_i
  assign propagation = addinA^addinB;  // p_i = a_i XOR b_i
  assign sum[0] = propagation[0]^carryin;
  assign carrybit[0] = generation[0]|(propagation[0]&carryin);
  assign carrybit[1] = generation[1]|(generation[0]&propagation[1])|(propagation[0]&propagation[1]&carryin);
  assign carrybit[2] = generation[2]|(generation[1]&propagation[2])|(generation[0]&propagation[1]&propagation[2])|(propagation[0]&propagation[1]&propagation[2]&carryin);
  assign sum[3:1] = propagation[3:1]^carrybit[2:0];
  // carry out of the block, passed to the next 4-bit block
  assign carryout = generation[3]|(propagation[3]&carrybit[2]);
endmodule

Multiplication Module

module floatmul(A,B,OUT,clk,rst,overflow,start,finish);
  input clk,rst,start;
  input [31:0] A,B;
  output [31:0] OUT;
  output finish,overflow;
  reg active;
  reg [47:0] Mplier,Mcand,product,d,e;
  reg [7:0] counter;
  integer i;
  wire [23:0] fractionA,fractionB;
  wire [7:0] expA,expB;
  wire [8:0] expsum;
  assign expA=A[30:23]-127;
  assign expB=B[30:23]-127;
  assign fractionA={1'b1,A[22:0]};
  assign fractionB={1'b1,B[22:0]};
  // add the exponents with the bias removed, then re-apply the bias once
  assign expsum = ((A[30:23]-127)+(B[30:23]-127))+127;
  // check for overflow
  assign overflow = expsum[8];
  // multiply the significands
  always@(posedge clk) begin
    if(rst) begin
      d=0; e=0; active=1'b0;
    end else if(start) begin
      active=1'b1;
      // first step combines the partial products for fractionA[0] and fractionA[1]
      d={({32{fractionA[1]}}&fractionB)&({32{fractionA[0]}}&fractionB),
         ({32{fractionA[1]}}&fractionB)^({32{fractionA[0]}}&fractionB)};
      e[0]=d[0];
      // one step per bit fractionA[2] to fractionA[23]; the step for bit i writes e[i-1]
      for(i=2; i<=23; i=i+1) begin
        d={({32{fractionA[i]}}&fractionB)&d[32:1],({32{fractionA[i]}}&fractionB)^d[32:1]};
        e[i-1]=d[0];
      end
      // repeat the fractionA[23] step once more (N+1 iterations)
      d={({32{fractionA[23]}}&fractionB)&d[32:1],({32{fractionA[23]}}&fractionB)^d[32:1]};
      e[22]=d[0];
      e[47:23]=d;
      active=1'b0;
    end else begin
      d=0; e=0; active=1'b0;
    end
  end
  // truncation: output the mantissa
  assign OUT[22:0]=e[45:22]; //e[45:23];//46:22
  // output the exponent
  assign OUT[30:23]=expsum[7:0];
  // set the sign by XORing the input signs
  assign OUT[31]={A[31] ^ B[31]};
  assign finish=~active;
endmodule
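When checking the significand product of floatmul against expected values, a behavioural reference model can help. The sketch below is hypothetical: the module `fmul_ref` and its ports are not part of this design, and it uses Verilog's `*` operator instead of the unrolled AND/XOR chain. It assumes both operands are normal numbers, truncates instead of rounding, and ignores zero, Inf, NaN and exponent overflow or underflow.

```verilog
// Hypothetical reference model (not the report's floatmul).
// Assumes normal operands; truncates the product; no special-case handling.
module fmul_ref (
  input  [31:0] a,
  input  [31:0] b,
  output [31:0] y
);
  // restore the hidden leading 1 of each significand
  wire [23:0] ma = {1'b1, a[22:0]};
  wire [23:0] mb = {1'b1, b[22:0]};
  // 24 x 24 -> 48-bit product; the product of significands lies in [1, 4)
  wire [47:0] prod = ma * mb;
  // product >= 2.0 when bit 47 is set, which needs a one-bit normalising shift
  wire        norm = prod[47];
  // add the biased exponents, remove one copy of the bias, adjust for the shift
  wire [9:0]  e_sum = a[30:23] + b[30:23] - 10'd127 + norm;
  // drop the leading 1 and keep the next 23 bits (truncation)
  wire [22:0] frac = norm ? prod[46:24] : prod[45:23];
  assign y = {a[31] ^ b[31], e_sum[7:0], frac};
endmodule
```

For example, 0x40000000 (2.0) times 0x40400000 (3.0) gives 0x40C00000 (6.0), and 0x3FC00000 (1.5) squared gives 0x40100000 (2.25), the second case exercising the normalising shift.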



Division Module

module floatdiv(A,B,OUT,clk,rst,overflow,start,finish);
  input clk,rst,start;
  input [31:0] A,B;
  output [31:0] OUT;
  output overflow,finish;
  wire [7:0] expA,expB;
  wire [8:0] expsub;
  assign expA=A[30:23]-127;
  assign expB=B[30:23]-127;
  reg active;
  reg [46:0] remainder,divisorreg;
  reg [23:0] quotientreg,outreg;
  reg [7:0] counter;
  // subtract the exponents with the bias removed, then re-apply the bias
  assign expsub =((A[30:23]-127)-(B[30:23]-127))+127;
  // check for overflow
  assign overflow = expsub[8];
  // the divider starts here
  always@(posedge clk or posedge rst) begin
    if(rst) begin
      remainder<={22'b0,1'b1,A[22:0]};
      quotientreg<=24'b0;
      divisorreg<={1'b1,B[22:0],23'b0};
      counter<=8'b0;
      active<=1'b0;
      outreg<=24'b0;
    end else if(start) begin
      if(counter<25) begin
        remainder<=remainder-divisorreg;
        if(remainder[46]) begin
          // shift quotient to the left
          quotientreg<={quotientreg[22:0],1'b0};
        end else begin // restore if less than zero
          remainder<=remainder+divisorreg;
          // shift quotient to the left
          quotientreg<={quotientreg[22:0],1'b1};
        end
        // shift divisor to the right
        divisorreg<={1'b0,divisorreg[46:1]};
        counter<=counter+1'b1;
        active<=1'b1;
        outreg<=outreg;
      end else begin
        quotientreg<=quotientreg;
        divisorreg<=divisorreg;
        remainder<=remainder;
        counter<=counter;
        active<=1'b0;
        outreg<=quotientreg;
      end
    end else begin
      quotientreg<=quotientreg;
      outreg<=outreg;
      divisorreg<=divisorreg;
      remainder<=remainder;
      counter<=counter;
      active<=1'b0;
    end
  end
  assign OUT[30:23]=expsub[7:0];
  assign finish=~active;
  assign OUT[22:0]=outreg[22:0];
  // set the sign by XORing the input signs
  assign OUT[31]={A[31] ^ B[31]};
endmodule
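A single-cycle behavioural model is a convenient way to generate expected quotients for the iterative divider. The sketch below is hypothetical (`fdiv_ref` is not part of the design) and swaps the shift-and-subtract loop for Verilog's `/` operator. It assumes normal, non-zero operands, truncates instead of rounding, and ignores the special cases that the Control module handles.

```verilog
// Hypothetical reference model (not the report's floatdiv).
// Assumes normal, non-zero operands; truncates the quotient.
module fdiv_ref (
  input  [31:0] a,
  input  [31:0] b,
  output [31:0] y
);
  wire [23:0] ma = {1'b1, a[22:0]};
  wire [23:0] mb = {1'b1, b[22:0]};
  // ma/mb lies in (0.5, 2), so q = ma * 2^24 / mb lies in (2^23, 2^25)
  wire [47:0] q = {ma, 24'b0} / mb;
  // bit 24 set means the ratio of significands is >= 1.0
  wire        big = q[24];
  // subtract the biased exponents, re-add the bias, subtract 1 when ma < mb
  wire [9:0]  e_diff = a[30:23] - b[30:23] + 10'd127 - {9'b0, ~big};
  // drop the leading 1 and keep the next 23 bits (truncation)
  wire [22:0] frac = big ? q[23:1] : q[22:0];
  assign y = {a[31] ^ b[31], e_diff[7:0], frac};
endmodule
```

With this model 6.0 / 3.0 (0x40C00000 / 0x40400000) gives 0x40000000 (2.0), and 1.0 / 3.0 gives the truncated pattern 0x3EAAAAAA rather than the round-to-nearest 0x3EAAAAAB.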



Floating Point to Integer Conversion Module

module float2int(IN,clk,rst,integerOUT,numerator,denominator,sign,INTexp,start,finish);
  input [31:0] IN; // the float
  input clk,rst,start;
  output [31:0] integerOUT,numerator,denominator;
  output sign,finish;
  output [7:0] INTexp;
  wire signed [7:0] diff;
  wire unsigned [7:0] expIN;
  wire [63:0] fraction;
  wire [31:0] fractionIN,integerIN,integerOUT;
  reg active;
  reg [31:0] bincount,denominator,numerator;
  reg [63:0] fractionshift;
  reg [7:0] counter,intexp;
  // hidden 1 at bit 32, the 23 mantissa bits just below it
  assign fraction[31:9]=IN[22:0];
  assign fraction[32]=1;
  // remaining bits of fraction are zero
  assign fraction[63:33]=31'b0;
  assign fraction[8:0]=9'b0;
  assign expIN=IN[30:23];
  assign diff=expIN-127;
  // normalize the exponent
  assign integerIN=fractionshift[63:32];
  assign fractionIN=fractionshift[31:0];
  assign integerOUT=fractionshift[63:32]; // the integer part
  assign sign=IN[31]; // the positive/negative sign
  assign INTexp=intexp-127;
  assign finish=~active;
  // shift the significand into integer and fraction parts
  always@(posedge rst or posedge clk) begin
    if(rst) begin
      fractionshift<=64'b0;
      intexp<=8'b0;
    end else if(expIN<=159 && expIN>=95) begin
      if(expIN<127) begin
        fractionshift<=fraction>>(-diff);
        intexp<=expIN+(-diff);
      end else begin
        fractionshift<=fraction<<diff;
        intexp<=expIN-diff;
      end
    end else if(expIN>159) begin // for a large integer
      fractionshift<=fraction<<31;
      intexp<=expIN-5'b11111; // decrement exponent
    end else if(expIN<95) begin // for a small fraction
      fractionshift<=fraction>>31;
      intexp<=expIN+5'b11111; // increment exponent
    end else begin
      fractionshift<=fraction;
      intexp<=intexp;
    end
  end
  // find the integer numerator and denominator of the fractional part
  // by summing binary fractions, e.g. 1/2+1/4+1/8 = 0.875 = 7/8
  always@(posedge clk or posedge rst) begin
    if(rst) begin
      counter<=32;
      bincount<=1;
      numerator<=1'b0;
      denominator<=1'b1;
      active<=1'b0;
    end else if(start) begin
      if(counter>0) begin
        counter<=counter-1'b1;
        bincount<=bincount*2'b10;
        active<=1'b1;
        if(fractionIN[counter]) begin // cross-multiplying
          denominator<=bincount*denominator;
          numerator<=bincount*numerator+denominator;
        end else begin
          numerator<=numerator;
          denominator<=denominator;
        end
      end else begin
        counter<=counter;
        bincount<=bincount;
        numerator<=numerator;
        denominator<=denominator;
        active<=1'b0;
      end
    end
  end
endmodule
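The second always block in float2int rebuilds the fractional part as an integer ratio. The identity behind it, and the cross-multiplying update applied for each set fraction bit, are written out below; $n/d$ stands for the running numerator/denominator pair and $c$ for the power of two held in `bincount` (these symbols are introduced only for this illustration). The worked example reproduces the 0.111 (binary) = 7/8 case from the code comment.

```latex
0.b_1 b_2 \ldots b_k{}_{(2)} = \sum_{j=1}^{k} \frac{b_j}{2^{j}},
\qquad
\frac{n}{d} + \frac{1}{c} = \frac{c\,n + d}{c\,d}

% worked example: 0.111_2, adding 1/c for c = 2, 4, 8
\frac{0}{1}
\;\to\; \frac{2\cdot 0 + 1}{2\cdot 1} = \frac{1}{2}
\;\to\; \frac{4\cdot 1 + 2}{4\cdot 2} = \frac{6}{8}
\;\to\; \frac{8\cdot 6 + 8}{8\cdot 8} = \frac{56}{64} = \frac{7}{8} = 0.875
```

In this form the ratio is never reduced, so the denominator is multiplied by $c$ at every set bit and can overflow a 32-bit register after only a handful of fraction bits.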



Integer to Floating Point Conversion Module

module INT2FLOAT(in,out,clk,rst,start,finish);
  input clk,rst,start;
  input [31:0] in;
  output [31:0] out;
  output finish;
  reg [64:0] shiftreg,fshiftreg;
  reg [7:0] shiftexp,fshiftexp;
  reg active,first,sign;
  always@(posedge clk or posedge rst) begin
    if(rst) begin
      shiftreg<=65'b0;
      shiftexp<=8'b10111111; // 191 = 127 + 64 (alternative: 159 = 8'b10011111)
      active<=1'b0;
      first<=1'b1;
      fshiftreg<=65'b0;
      fshiftexp<=8'b10001110;
      sign<=1'b0;
    end else if(start) begin
      if(first) begin
        if(in[31]) begin // if negative, take the two's-complement magnitude
          shiftreg[31:0]<=~in[31:0]+1'b1;
          sign<=1'b1;
        end else begin
          shiftreg[31:0]<=in[31:0];
          sign<=1'b0;
        end
        shiftexp<=shiftexp;
        fshiftreg<=fshiftreg;
        fshiftexp<=fshiftexp;
        active<=1'b1;
        first<=1'b0;
      end else if(!shiftreg[64]) begin
        // shift left until the leading 1 reaches bit 64
        shiftreg<=shiftreg<<1'b1;
        shiftexp<=shiftexp-1'b1;
        fshiftreg<=fshiftreg;
        fshiftexp<=fshiftexp;
        active<=1'b1;
        first<=1'b0;
        sign<=sign;
      end else begin
        shiftreg<=shiftreg;
        shiftexp<=shiftexp;
        fshiftreg<=shiftreg;
        fshiftexp<=shiftexp;
        active<=1'b0;
        first<=1'b0;
        sign<=sign;
      end
    end else begin
      shiftreg<=shiftreg;
      shiftexp<=shiftexp;
      fshiftreg<=fshiftreg;
      fshiftexp<=fshiftexp;
      active<=active;
      first<=first;
      sign<=sign;
    end
  end
  assign out={sign,fshiftexp[7:0],fshiftreg[63:41]};
  assign finish=~active;
endmodule
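INT2FLOAT normalises by shifting one bit per clock until the leading 1 reaches bit 64, starting the exponent at 191 (127 + 64), so an input whose leading 1 sits at bit k finishes with exponent 127 + k. The same normalisation can be done combinationally by locating the leading 1 directly. The sketch below is hypothetical (`int2float_ref` is not part of the design); it truncates the bits below the 23-bit fraction instead of rounding and maps an input of 0 to +0.0, a case the iterative module does not terminate on.

```verilog
// Hypothetical combinational sketch (not the report's INT2FLOAT).
// Truncates the bits below the 23-bit fraction instead of rounding.
module int2float_ref (
  input      [31:0] in,
  output reg [31:0] out
);
  reg [31:0] mag;      // magnitude of the two's-complement input
  reg [31:0] shifted;  // magnitude with its leading 1 moved to bit 31
  reg [4:0]  k;        // bit position of the leading 1
  integer    i;

  always @* begin
    mag = in[31] ? (~in + 32'd1) : in;
    k = 5'd0;
    for (i = 0; i < 32; i = i + 1)
      if (mag[i]) k = i;               // last match is the most significant 1
    shifted = mag << (31 - k);
    if (mag == 32'd0)
      out = 32'd0;                     // zero has no leading 1
    else
      out = {in[31], 8'd127 + {3'b0, k}, shifted[30:8]};
  end
endmodule
```

For example, 5 becomes 0x40A00000, -6 becomes 0xC0C00000, and 16777217 (2^24 + 1) becomes 0x4B800000 (16777216), showing the truncation of the lowest bit.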



Power Module

module Power(A,B,OUT,clk,rst,start,finish);
  input clk,rst,start;
  input [31:0] A,B;
  output finish;
  output [31:0] OUT;
  wire [63:0] log,mullog,mullog2;
  wire [31:0] integerOUT,numerator,denominator,OUTmul;
  wire [7:0] expPow;
  wire ffloat;
  reg [63:0] normInt,normFr;
  reg [31:0] check,checkout;
  reg active;
  float2int float2pow(B,clk,rst,integerOUT,numerator,denominator,sign,expPow,start,ffloat);
  assign log=(A*100)/8388608-127*100; // convert to log: A/(2^23)-127, scaled by 100
  assign mullog=(numerator*log/denominator); // apply the fractional part of the power B
  assign mullog2=log*integerOUT; // separately apply the integer part
  // check for invalid inputs
  always@(posedge clk or posedge rst) begin
    if(rst) begin
      normInt<=63'b0;
      normFr<=63'b0;
      active<=1'b1;
    end else if(ffloat) begin
      if(A<=10'd1065353216) begin // if negative or zero
        normInt<=32'b1111111100000000000000000000000; // NaN
        normFr<=32'b0;
        active<=1'b0;
      end else if(numerator==0) begin
        normFr<=32'b0;
        normInt<=((mullog2+127*100)*8388608)/100;
        active<=1'b0;
      end else if(integerOUT==0) begin
        normInt<=32'b0;
        normFr<=((mullog+127*100)*8388608)/100;
        active<=1'b0;
      end else begin // because it can't do log2(0)
        normInt<=((mullog2+127*100)*8388608)/100; // convert from log: (log+127)*(2^23)
        normFr<=((mullog+127*100)*8388608)/100;
        active<=1'b0;
      end
    end else begin
      normInt<=normInt;
      normFr<=normInt;
      active<=1'b1;
    end
  end
  always@(posedge clk or posedge rst) begin
    if(rst) begin
      check<=32'b0;
      checkout<=32'b0;
    end else if(!active) begin
      if(B[31]==1'b1) begin
        check<=normFr+normInt;
        checkout<={check[31],(~check[30:23]+1'b1),check[22:0]};
      end else begin
        check<=check;
        checkout<=normFr+normInt;
      end
    end else begin
      check<=check;
      checkout<=checkout;
    end
  end
  assign OUT=checkout;
  assign finish=~active;
endmodule
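The Power module treats the raw bit pattern of its input as an approximate base-2 logarithm. Writing the bit pattern of a positive normal number $x$ as $I = E \cdot 2^{23} + M$, where $E$ is the biased exponent field and $M$ the 23-bit fraction field (symbols introduced here for illustration), the relations behind `log=(A*100)/8388608-127*100` and the conversion back to a bit pattern are:

```latex
x = \left(1 + \frac{M}{2^{23}}\right) 2^{E-127},
\qquad
\log_2 x = (E - 127) + \log_2\!\left(1 + \frac{M}{2^{23}}\right)
\approx (E - 127) + \frac{M}{2^{23}} = \frac{I}{2^{23}} - 127

x^{y} = 2^{\,y \log_2 x}
\quad\Rightarrow\quad
I_{\text{out}} \approx \left(y \log_2 x + 127\right) 2^{23}
```

For example, $x = 4.0$ has $I$ = 0x40800000 = 1082130432, and $I/2^{23} - 127 = 129 - 127 = 2 = \log_2 4$ exactly. Between powers of two the linear term under-estimates $\log_2(1+m)$ by at most about 0.086. In the module both sides are multiplied by 100 so that two decimal digits survive the integer arithmetic.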



Square Root Module

module SQRT(A,OUT,clk,rst);
  input [31:0] A;
  input clk,rst;
  output [31:0] OUT;
  wire [63:0] logrt,mulrt;
  reg [31:0] normIntr;
  assign logrt=(A*100)/8388608-127*100; // convert to log: A/(2^23)-127, scaled by 100
  assign mulrt=logrt/2; // apply the root
  always@(posedge clk or posedge rst) begin
    if(rst)
      normIntr<=0;
    else if(A<=10'd1065353216) // if negative or zero
      normIntr<=32'b1111111100000000000000000000000; // NaN
    else
      normIntr<=((mulrt+127*100)*8388608)/100; // convert from log: (log+127)*(2^23)
  end
  assign OUT=normIntr;
endmodule
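The SQRT module reads the bit pattern $I$ of its input as an approximate $\log_2$ (the `logrt` line computes $I/2^{23} - 127$), halves it, and converts back with $(\cdot + 127)\,2^{23}$. Folding those three steps together gives $I_{\text{out}} \approx I/2 + 127 \cdot 2^{22}$, a single shift and add with no ×100 scaling. The sketch below is hypothetical (`sqrt_approx` is not part of the design) and assumes a positive, normal input with no refinement step.

```verilog
// Hypothetical sketch: approximate square root directly from the bit pattern.
// Assumes a positive, normal input; no NaN/zero handling, no refinement.
module sqrt_approx (
  input  [31:0] a,
  output [31:0] y
);
  // halve the approximate log (a >> 1) and re-add half of the bias:
  // 127 * 2^22 = 32'h1FC00000
  assign y = (a >> 1) + 32'h1FC00000;
endmodule
```

Powers of four come out exactly (4.0 maps to 2.0, 16.0 to 4.0), whereas 2.0 maps to 0x3FC00000 = 1.5 instead of 1.414..., which shows the error of the linear log approximation.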



Control Module

module Control(opcode,A,B,clk,rst,valueout);
  input clk,rst;
  input [2:0] opcode;
  input [31:0] A,B;
  output [31:0] valueout;
  reg [31:0] OUT;
  reg [31:0] addA,addB,subA,subB,divA,divB,mulA,mulB,powA,powB,sqrtA;
  reg sdiv,spow,sadd,ssub,smul,ssqrt,finish;
  wire [31:0] addOUT,subOUT,OUTdiv,OUTmul,OUTpow,root;
  // declare constants
  wire [31:0] Inf,NaN,Zero,One;
  wire /*fpow,fdiv,fadd,fsub,fmul,fsqrt,*/addof,subof,mulof,divof;
  assign Inf=32'h7F800000;  // +infinity
  assign NaN=32'h7FC00000;  // quiet NaN
  assign One=32'h3F800000;  // 1.0
  assign Zero=32'h00000000; // +0.0
  adder addition(addA,addB,addOUT,clk,rst,addof,sadd,fadd); // A+B
  subtractor subtraction(subA,subB,subOUT,clk,rst,subof,ssub,fsub); // A-B
  floatmul floatmulA(mulA,mulB,OUTmul,clk,rst,mulof,smul,fmul); // A*B
  floatdiv floatdivA(divA,divB,OUTdiv,clk,rst,divof,sdiv,fdiv); // A/B
  Power power(powA,powB,OUTpow,clk,rst,spow,fpow); // A^B
  SQRT squareroot(sqrtA,root,clk,rst);
  // check for zero, NaN and Inf inputs (special cases)
  always@(posedge clk or posedge opcode) begin
    // opcode case statements
    case(opcode)
      0: begin
        sdiv<=1'b0; spow<=1'b0; sadd<=1'b0; ssub<=1'b0; smul<=1'b0; ssqrt<=1'b0;
        addA<=1'b0; addB<=1'b0; subA<=1'b0; subB<=1'b0; divA<=1'b0; divB<=1'b0;
        mulA<=1'b0; mulB<=1'b0; powA<=1'b0; powB<=1'b0; sqrtA<=1'b0;
        OUT<=NaN;
      end
      // For the Adder ==========================================
      1: begin
        if(A[30:0]==Zero[30:0])
          OUT<=B;
        else if(B[30:0]==Zero[30:0])
          OUT<=A;
        else if(A==Inf || B==Inf)
          OUT<=Inf[30:0];
        else if(A[30:0]==B[30:0] && A[31]!=B[31]) // A+(-A) or (-A)+A
          OUT<=Zero;
        else if(A[31]==1'b1 && B[31]==1'b0) begin // -A+B = B-A
          subB<={1'b0,A[30:0]};
          subA<=B;
          ssub<=1'b1;
          OUT<=subOUT;
        end else if(A[31]==1'b0 && B[31]==1'b1) begin // A+-B = A-B
          subB<={1'b0,B[30:0]};
          subA<=A;
          ssub<=1'b1;
          OUT<=subOUT;
        end else if(A[31]==1'b1 && B[31]==1'b1) begin // -A + -B = -(A+B)
          addB<={1'b0,B[30:0]};
          addA<={1'b0,A[30:0]};
          OUT<={1'b1,addOUT[30:0]};
          sadd<=1'b1;
        end else begin
          addA<=A;
          addB<=B;
          sadd<=1'b1;
          OUT<=addOUT;
        end
      end
      // For the Subtractor =====================================
      2: begin
        if(A==B)
          OUT<=Zero; // just make it zero
        else if(A[30:0]==Zero[30:0])
          OUT<={~B[31],B[30:0]};
        else if(B[30:0]==Zero[30:0])
          OUT<=A;
        else if(A[31]==1'b1 && B[31]==1'b0) begin // -A - B = -(B+A)
          addA<={1'b0,A[30:0]};
          addB<=B;
          sadd<=1'b1;
          OUT<={1'b1,addOUT[30:0]};
        end else if(A[31]==1'b0 && B[31]==1'b1) begin // A - -B = A+B
          addB<={1'b0,B[30:0]};
          addA<=A;
          sadd<=1'b1;
          OUT<={1'b0,addOUT[30:0]};
        end else if(A[31]==1'b1 && B[31]==1'b1) begin // -A - -B = B-A
          subA<={1'b0,B[30:0]};
          subB<={1'b0,A[30:0]};
          ssub<=1'b1;
          OUT<=subOUT;
        end else begin
          subA<=A;
          subB<=B;
          ssub<=1'b1;
          OUT<=subOUT;
        end
      end
      // For the Multiplier =====================================
      3: begin
        if(A[30:0]==Zero[30:0] || B[30:0]==Zero[30:0]) // A*Zero
          OUT<=Zero; // Zero
        else if(A[30:0]==Inf[30:0] || B[30:0]==Inf[30:0]) // A*Inf
          OUT<={{A[31]^B[31]},Inf[30:0]}; // Inf
        else begin
          mulA<=A;
          mulB<=B;
          smul<=1'b1;
          OUT<=OUTmul;
        end
      end
      // For the Divider ========================================
      4: begin // varieties of Zero, NaN or Inf
        if(A[30:0]==Zero[30:0])
          OUT<=Zero; // Zero/B = Zero
        else if(B[30:0]==Zero[30:0])
          OUT<={{A[31]^B[31]},Inf[30:0]}; // A/Zero = Inf
        else if(B[30:0]==Inf[30:0])
          OUT<=Zero; // A/Inf = Zero
        else if(A[30:0]==B[30:0])
          OUT[31:0]<={{A[31]^B[31]},One[30:0]}; // A/A = One
        else begin
          divA<=A;
          divB<=B;
          sdiv<=1'b1;
          OUT<=OUTdiv;
        end
      end
      // For the Power ==========================================
      5: begin
        if(A[31])
          OUT<=NaN;
        else if(A[30:0]==Zero[30:0]) // +/- Zero
          OUT<=Zero;
        if(B[30:0]==Zero[30:0])
          OUT<=One;
        else begin
          powA<=A;
          powB<=B;
          spow<=1'b1;
          if(fpow)
            OUT<=OUTpow;
        end
      end
      // For the Square Root ====================================
      6: begin
        if(A[31])
          OUT<=NaN;
        else if(A[30:0]==Zero[30:0]) // +/- Zero
          OUT<=Zero;
        else begin
          sqrtA<=A;
          OUT<=root;
        end
      end
      // Default Case ===========================================
      default: begin
        sdiv<=1'b0; spow<=1'b0; sadd<=1'b0; ssub<=1'b0; smul<=1'b0; ssqrt<=1'b0;
        addA<=1'b0; addB<=1'b0; subA<=1'b0; subB<=1'b0; divA<=1'b0; divB<=1'b0;
        mulA<=1'b0; mulB<=1'b0; powA<=1'b0; powB<=1'b0; sqrtA<=1'b0;
        OUT<=NaN;
      end
    endcase
  end
  // drive the output value
  assign valueout=OUT;
endmodule
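The Control module screens operands for zero, infinity and NaN before dispatching them, by comparing bits 30:0 against constants. The same screening can be expressed directly on the exponent and fraction fields. The sketch below is hypothetical (`fp_classify` is not part of the design) and additionally flags denormals, which the Control module does not distinguish.

```verilog
// Hypothetical sketch: classify a single-precision operand by its fields.
module fp_classify (
  input  [31:0] x,
  output        is_zero,
  output        is_denorm,
  output        is_inf,
  output        is_nan
);
  wire exp_max   = &x[30:23];   // exponent field all ones
  wire exp_zero  = ~|x[30:23];  // exponent field all zeros
  wire frac_zero = ~|x[22:0];   // fraction field all zeros
  assign is_zero   = exp_zero & frac_zero;   // +0 or -0
  assign is_denorm = exp_zero & ~frac_zero;  // subnormal
  assign is_inf    = exp_max  & frac_zero;   // +Inf or -Inf
  assign is_nan    = exp_max  & ~frac_zero;  // any NaN
endmodule
```

For example, 0x80000000 (-0.0) sets only `is_zero`, 0xFF800000 (-Inf) sets only `is_inf`, and 0x7FC00000 sets only `is_nan`; an ordinary value such as 0x3F800000 (1.0) sets none of the flags.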



Appendix B: Digital Testing Results

Standard Case Waveforms

Addition

Subtraction

Multiplication

Division

Power



Square-root



Corner Case Tables

        Real Value      Floating Point Value                 “FPU” Value
A       SMALLEST        0_00000001_00000000000000000000000    5.8774717e-39
B       5               0_10000001_01000000000000000000000    5
Add     5               0_10000001_01000000000000000000000    5
Sub     5               0_10000001_01000000000000000000000    5
Mul     2.9387e-038     0_00000011_11000000000000000000000    8.2284604e-38
Div     1.1755e-039     0_11111111_11001100110011001100101    INF
Pow*    7.0138e-192     0_01110011_10000101000111101011100    3.7109374e-4
SQRT    2.6484e-096     0_01000000_00000000000000000000000    1.0842021e-19

        Real Value      Floating Point Value                 “FPU” Value
A       LARGEST         0_11111110_11111111111111111111110    3.4028232e+38
B       5               0_10000001_01000000000000000000000    5
Add                     0_11111110_11111111111111111111110    3.4028232e+38
Sub                     1_11111110_11111111111111111111110    3.4028232e+38
Mul     1.7014e+039     0_00000000_01111111111111111111100*   Overflow(INF)
Div     6.8056e+037     0_11111100_11001100110011001100100    7.6563520e+37
Pow     4.5624e+192     1_01111101_11110011001100110011001    -4.8749998e-1
SQRT    2.1360e+096     0_10111110_11111101011100001010001    1.8354509e+19

        Real Value      Floating Point Value                 “FPU” Value
A       -SMALLEST       1_00000001_00000000000000000000000    -5.8774717e-39
B       5               0_10000001_01000000000000000000000    5
Add     5               0_10000001_01000000000000000000000    5
Sub     5               1_10000001_01000000000000000000000    5
Mul     -2.9387e-038    1_00000011_01000000000000000000000    -8.2284604e-38
Div     -1.1755e-039    1_11111111_11001100110011001100101    -INF
Pow     -7.0138e-192    1_10001000_00000000000000000000000    -5.1200000e+2
SQRT    NaN             0_11111111_10000000000000000000000    NaN

        Real Value      Floating Point Value                 “FPU” Value
A       -LARGEST        1_11111110_11111111111111111111110    -3.4028232e+38
B       5               0_10000001_01000000000000000000000    5
Add     -3.4028232e+38  1_11111110_11111111111111111111110    -3.4028232e+38
Sub     -3.4028232e+38  1_11111110_11111111111111111111110    -3.4028232e+38
Mul     -1.7014e+039    1_00000000_01111111111111111111100*   Overflow(INF)
Div     -6.8056e+037    1_11111100_11001100110011001100100    7.6563520e+37
Pow     4.5624e+192     0_01111101_11110011001100110011001    4.8749998e-1
SQRT    NaN             0_11111111_10000000000000000000000    NaN

* Note: these corner-case operands are beyond the range that the power unit's algorithm can handle.
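The Floating Point Value column lists each result as sign_exponent_fraction. For a normal number the three fields decode as shown below, where $s$, $e$ and $f$ denote the sign bit, the 8-bit exponent field and the 23-bit fraction field read as an unsigned integer (symbols introduced here); the operand B = 5 is decoded as a worked example.

```latex
v = (-1)^{s} \left(1 + \frac{f}{2^{23}}\right) 2^{\,e - 127}

% B = 0_10000001_01000000000000000000000
s = 0,\quad e = 10000001_2 = 129,\quad \frac{f}{2^{23}} = 0.01_2 = 0.25
\;\Rightarrow\; v = (+1)(1.25)\,2^{129-127} = 5
```

An exponent field of all ones is reserved: it encodes infinity when the fraction is zero and NaN otherwise, which is how the 0_11111111_10000000000000000000000 entries in the SQRT rows are read.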
