meljun cortes floating point representation

7/29/2019 MELJUN CORTES Floating Point Representation


Lesson 3 - 1

Year 1

CS113/0401/v1

LESSON 3

FLOATING POINT REPRESENTATION

A better alternative to fixed-point representation is the floating-point representation.

Capable of holding, in a single-length word, a greater range of numbers.

It also uses the same form for coping with integers, mixed numbers or fractions, but at a cost of reduced accuracy.

The floating-point form is suitable for handling quantities of far higher value than usual, or whose values are very small.



FLOATING POINT NOTATION

Scientific Notation (Decimal)

$57429 = 5.7429 \times 10^{4}$

Mantissa: 5.7429

Exponent: 4

Mantissa always 1 or greater, but less than 10

$1 \le m < 10$

Exponent always integer

N.B.: Mantissa can also be negative



NORMALISED FLOATING POINT FORM (DECIMAL)

Divide mantissa by 10, increase exponent by 1

Mantissa now always fraction, 0.1 or bigger

$0.1 \le M < 1$

Example:

$0.93 \times 10^{8}$

$0.9 \times 10^{-28}$

Normalisation gets rid of mixed numbers

Better for computers
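The normalisation rule above can be tried out with a short Python sketch. It assumes the number is supplied as a decimal string and relies on the standard `decimal` module; the function name `normalise_decimal` is only illustrative and is not part of the lesson.

```python
from decimal import Decimal

def normalise_decimal(value: str):
    """Return (mantissa, exponent) with 0.1 <= |mantissa| < 1.

    Illustrative helper: value == mantissa * 10**exponent.
    """
    d = Decimal(value)
    if d == 0:
        raise ValueError("zero has no normalised form")
    # adjusted() is the exponent of the leading digit in scientific
    # notation (4 for 57429); one more step moves the point in front of it.
    exponent = d.adjusted() + 1
    mantissa = d.scaleb(-exponent)   # shift the decimal point only
    return mantissa, exponent

print(normalise_decimal("57429"))    # mantissa 0.57429, exponent 5
print(normalise_decimal("0.009"))    # mantissa 0.9, exponent -2
```

Working on `Decimal` values rather than binary floats keeps the digits of the mantissa exactly as written, which makes the shift of the decimal point easy to see.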



NORMALISED FLOATING POINT FORM (BINARY)

Same principles as decimal

Mantissa always fraction, 0.1 (binary) or bigger

$0.1_2 \le M < 1$

Exponent is positive or negative integer

N.B.: Mantissa needs a sign bit, but doesn't use two's complement
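As a rough illustration of the binary normalised form, Python's standard `math.frexp` splits a number into a mantissa between 0.5 and 1 (that is, 0.1 binary or bigger) and an integer power of two. The helper name `normalise_binary` and the default of 4 mantissa bits are assumptions made for this sketch.

```python
import math

def normalise_binary(x: float, bits: int = 4):
    """Return (mantissa_string, exponent) for x = mantissa * 2**exponent.

    The mantissa is written as a binary fraction truncated to `bits` places.
    """
    m, e = math.frexp(x)          # 0.5 <= |m| < 1 for any non-zero x
    sign = "-" if m < 0 else ""
    m = abs(m)
    digits = ""
    for _ in range(bits):         # peel off one binary digit per step
        m *= 2
        bit = int(m)
        digits += str(bit)
        m -= bit
    return f"{sign}0.{digits}", e

print(normalise_binary(22.0))     # ('0.1011', 5)
print(normalise_binary(0.25))     # ('0.1000', -1)
```

The sign is kept separately from the digits here, in the spirit of a sign bit rather than a two's complement mantissa.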



FLOATING POINT ARITHMETIC

Also called Real Arithmetic

In most practical cases, computer calculations are done in real arithmetic

"Real" is a word with a special mathematical meaning



FLOATING POINT ADDING (1)

Two possibilities:

Exponents are the same

Exponents are different

Need different methods

Same exponents

Add mantissas

$(0.1011 \times 2^{5}) + (0.1001 \times 2^{5})$

$= 1.0100 \times 2^{5}$

For normalised floating point, the mantissa is too big, therefore must normalise

$= 0.1010 \times 2^{6}$

Could lose accuracy here



FLOATING POINT ADDING (2)

Different exponents

$(0.1001 \times 2^{3}) + (0.1110 \times 2^{5})$

Increase smaller exponent and adjust mantissa

$(0.001001 \times 2^{5}) + (0.1110 \times 2^{5})$

Add as before

$= 1.00000 \times 2^{5}$, ROUNDED from 1.000001 (lost accuracy)

Normalise and truncate

$= 0.10000 \times 2^{6}$

Could lose accuracy here
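The addition steps (align the exponents, add the mantissas, normalise, truncate) can be sketched in Python as follows. This is only an illustration under stated assumptions: mantissas are non-negative exact fractions, the stored mantissa has 4 bits, and the names `normalise` and `fp_add` are invented for the example.

```python
from fractions import Fraction
import math

def normalise(m: Fraction, e: int, bits: int = 4):
    """Bring m into 0.1 (binary) <= m < 1 and truncate it to `bits` places."""
    if m == 0:
        return Fraction(0), 0
    while m >= 1:                 # mantissa too big: halve it, raise exponent
        m /= 2
        e += 1
    while m < Fraction(1, 2):     # mantissa too small: double it, lower exponent
        m *= 2
        e -= 1
    scale = 2 ** bits
    return Fraction(math.trunc(m * scale), scale), e

def fp_add(m1: Fraction, e1: int, m2: Fraction, e2: int, bits: int = 4):
    """Add m1 * 2**e1 and m2 * 2**e2 (assumes non-negative mantissas)."""
    e = max(e1, e2)               # raise the smaller exponent to the larger
    a = m1 / 2 ** (e - e1)        # ...and shrink its mantissa to match
    b = m2 / 2 ** (e - e2)
    return normalise(a + b, e, bits)

# 0.1011 x 2^5 + 0.1001 x 2^5
print(fp_add(Fraction(0b1011, 16), 5, Fraction(0b1001, 16), 5))  # (Fraction(5, 8), 6)
# 0.1001 x 2^3 + 0.1110 x 2^5
print(fp_add(Fraction(0b1001, 16), 3, Fraction(0b1110, 16), 5))  # (Fraction(1, 2), 6)
```

Here 5/8 is 0.1010 in binary and 1/2 is 0.1000, so both results agree with the worked examples. In the second case the exact sum is 65/64, and the final truncation is where accuracy is lost.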



FLOATING POINT SUBTRACTION

Similar to addition

$(0.1110 \times 2^{7}) - (0.1100 \times 2^{7})$

$= 0.0010 \times 2^{7}$

$= 0.1000 \times 2^{5}$ (Normalised), introduced inaccuracy

$(0.1001 \times 2^{8}) - (0.1000 \times 2^{5})$

$= (0.1001 \times 2^{8}) - (0.0001000 \times 2^{8})$, accuracy lost

$= 0.1000 \times 2^{8}$
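Subtraction can be sketched with integer mantissas to mimic a fixed-width register. In this hypothetical `fp_sub`, the integer `0b1110` stands for the mantissa 0.1110, both inputs are assumed to be non-negative 4-bit mantissas, and bits shifted out during alignment are simply dropped.

```python
def fp_sub(m1: int, e1: int, m2: int, e2: int, bits: int = 4):
    """Compute m1 * 2**e1 - m2 * 2**e2 with `bits`-bit integer mantissas.

    Returns (mantissa, exponent); the mantissa is negative when the
    result is negative. Assumes 0 <= m1, m2 < 2**bits.
    """
    # Align: shift the mantissa with the smaller exponent to the right.
    # Bits that fall off the end are lost (this is where accuracy goes).
    if e1 >= e2:
        m2 >>= e1 - e2
        e = e1
    else:
        m1 >>= e2 - e1
        e = e2
    diff = m1 - m2
    if diff == 0:
        return 0, 0
    sign = -1 if diff < 0 else 1
    diff = abs(diff)
    # Normalise: shift left until the leading mantissa bit is 1.
    while diff < (1 << (bits - 1)):
        diff <<= 1
        e -= 1
    return sign * diff, e

print(fp_sub(0b1110, 7, 0b1100, 7))   # (8, 5), i.e. 0.1000 x 2^5
print(fp_sub(0b1001, 8, 0b1000, 5))   # (8, 8), i.e. 0.1000 x 2^8
```

The zeros shifted in from the right during normalisation are padding, not measured digits, which is one way to read the "introduced inaccuracy" remark.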



FLOATING POINT MULTIPLICATION

Multiply the mantissas and add the exponents

$(0.1101 \times 2^{6}) \times (0.1010 \times 2^{4})$

$= (0.01101 + 0.0001101) \times 2^{10}$

$= 0.1000001 \times 2^{10}$

$= 0.1000 \times 2^{10}$ (truncated)

$(0.1001 \times 2^{7}) \times (0.1111 \times 2^{-11})$

$= 0.10000111 \times 2^{-4}$

$= 0.1000 \times 2^{-4}$ (truncated)
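The multiplication rule can be checked with a small Python sketch. It assumes non-negative 4-bit integer mantissas (`0b1101` stands for 0.1101) and truncation of the double-length product; `fp_mul` is a made-up name for the example.

```python
def fp_mul(m1: int, e1: int, m2: int, e2: int, bits: int = 4):
    """Multiply m1 * 2**e1 by m2 * 2**e2 using `bits`-bit integer mantissas."""
    prod = m1 * m2            # double-length product, 2*bits binary places
    e = e1 + e2               # exponents are added
    if prod == 0:
        return 0, 0
    # Normalise so the leading bit of the 2*bits-wide product is 1.
    while prod < (1 << (2 * bits - 1)):
        prod <<= 1
        e -= 1
    return prod >> bits, e    # keep the top `bits` bits (truncate)

print(fp_mul(0b1101, 6, 0b1010, 4))     # (8, 10), i.e. 0.1000 x 2^10
print(fp_mul(0b1001, 7, 0b1111, -11))   # (8, -4), i.e. 0.1000 x 2^-4
```

In both cases the full product (0.10000010 and 0.10000111) has more bits than the mantissa can hold, so the low-order bits are discarded.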



FLOATING POINT DIVISION

Divide the mantissas and subtract the exponents

$(0.1111 \times 2^{8}) \div (0.1000 \times 2^{-4})$

$= 1.111 \times 2^{12}$

$= 0.1111 \times 2^{13}$ (Normalised)

$(0.1011 \times 2^{7}) \div (0.1101 \times 2^{4})$

$= 0.1101 \times 2^{3}$ (truncated)
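A corresponding sketch for division is given below. It assumes positive 4-bit integer mantissas (`0b1111` stands for 0.1111), uses exact fractions for the quotient before truncating, and the name `fp_div` is purely illustrative.

```python
from fractions import Fraction
import math

def fp_div(m1: int, e1: int, m2: int, e2: int, bits: int = 4):
    """Divide m1 * 2**e1 by m2 * 2**e2 using `bits`-bit integer mantissas."""
    if m2 == 0:
        raise ZeroDivisionError("divisor mantissa is zero")
    if m1 == 0:
        return 0, 0
    q = Fraction(m1, m2)          # ratio of mantissas (the scale factors cancel)
    e = e1 - e2                   # exponents are subtracted
    while q >= 1:                 # quotient too big: halve it, raise exponent
        q /= 2
        e += 1
    while q < Fraction(1, 2):     # quotient too small: double it, lower exponent
        q *= 2
        e -= 1
    return math.trunc(q * 2 ** bits), e   # truncate to `bits` places

print(fp_div(0b1111, 8, 0b1000, -4))   # (15, 13), i.e. 0.1111 x 2^13
print(fp_div(0b1011, 7, 0b1101, 4))    # (13, 3),  i.e. 0.1101 x 2^3
```

The second quotient, 11/13, has no finite binary expansion, so some accuracy is always lost when it is cut to four bits.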



FLOATING POINT STORAGE (1)

$m \times B^{E}$, where m is the mantissa, B the base and E the exponent

Mantissa can be in

Sign modulus

2's complement

Exponent can be in

2's complement

Excess $2^{n-1}$, where n is the no. of storage bits for exponent



STORING NEGATIVE MANTISSA

Sign Bit

0 is assigned if the mantissa is positive

1 is assigned if the mantissa is negative

Two's Complement Method

The two's complement of the mantissa is used if it is negative.



STORING NEGATIVE EXPONENT

Two's complement form

The exponent is stored in its two's complement form if it is negative, and so we do not need to allocate a separate space to hold the sign of the exponent, although we must always do so for that of the mantissa.

Excess $2^{n-1}$ (where n is the number of bits assigned for the exponent)

In this method, the value of $2^{n-1}$ is added to the actual exponent, whether positive or negative, to give the stored exponent.

Stored Exponent = True Exponent + $2^{n-1}$

OR

True Exponent = Stored Exponent - $2^{n-1}$
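The two excess formulas translate directly into code. The sketch below assumes an n-bit exponent field that can hold true exponents from $-2^{n-1}$ to $2^{n-1}-1$; the function names `to_excess` and `from_excess` are illustrative only.

```python
def to_excess(true_exponent: int, n: int) -> int:
    """Stored exponent = true exponent + 2**(n-1)."""
    bias = 2 ** (n - 1)
    if not -bias <= true_exponent < bias:
        raise ValueError("exponent does not fit in the field")
    return true_exponent + bias

def from_excess(stored_exponent: int, n: int) -> int:
    """True exponent = stored exponent - 2**(n-1)."""
    return stored_exponent - 2 ** (n - 1)

# With a six-bit exponent field the excess is 2**5 = 32.
print(to_excess(-4, 6))     # 28
print(to_excess(13, 6))     # 45
print(from_excess(28, 6))   # -4
```

Because the stored value is never negative, no separate sign is needed for the exponent, and stored exponents compare in the same order as the true ones.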



STORE FLOATING POINT FORMAT

Format A

Sign modulus mantissa & 2's complement exponent

Floating point numbers are stored using 16 bits. The first bit is the mantissa sign bit, the next 9 are the normalised mantissa and the final six bits are the exponent in 2's complement.

Format B

2's complement mantissa & 2's complement exponent

Floating point numbers are stored using 16 bits. The ten-bit 2's complement mantissa is followed by a six-bit 2's complement exponent.



STORE FLOATING POINT FORMAT

Format C

Sign modulus mantissa & excess $2^{n-1}$ method

The first bit is the mantissa sign bit, the next 9 bits are the normalised mantissa, and the final six bits are the exponent in excess $2^{n-1}$ form.

Format D

2's complement mantissa & excess $2^{n-1}$ method

Given 16 bits of storage. The first 10 bits are the 2's complement mantissa, followed by 6 bits of exponent in excess $2^{n-1}$ form.
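To make the 16-bit layout concrete, here is a minimal Python sketch of Format C (sign bit, 9-bit normalised mantissa, 6-bit excess-32 exponent). Several details are assumptions, not taken from the lesson: the mantissa is truncated rather than rounded, zero is rejected because the slides do not say how it is stored, and the names `pack_format_c` and `unpack_format_c` are hypothetical.

```python
import math

def pack_format_c(x: float) -> int:
    """Pack x into a 16-bit word: 1 sign bit, 9 mantissa bits, 6 exponent bits."""
    if x == 0:
        raise ValueError("zero has no normalised form (storage not specified)")
    m, e = math.frexp(x)                  # 0.5 <= |m| < 1, x == m * 2**e
    if not -32 <= e <= 31:
        raise OverflowError("exponent does not fit in six bits")
    sign = 1 if m < 0 else 0
    mantissa = int(abs(m) * 2 ** 9)       # truncate to 9 binary places
    return (sign << 15) | (mantissa << 6) | (e + 32)   # excess 2**(6-1)

def unpack_format_c(word: int) -> float:
    """Recover the value stored by pack_format_c."""
    sign = word >> 15
    mantissa = (word >> 6) & 0x1FF        # middle 9 bits
    exponent = (word & 0x3F) - 32         # low 6 bits, remove the excess
    value = math.ldexp(mantissa / 2 ** 9, exponent)
    return -value if sign else value

word = pack_format_c(22.0)                # 22 = 0.1011 x 2^5
print(format(word, "016b"))               # 0101100000100101
print(unpack_format_c(word))              # 22.0
```

Reading the printed word from the left: `0` is the sign, `101100000` is the mantissa 0.1011 padded to nine bits, and `100101` is 37, the true exponent 5 plus the excess 32.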
