representing fractions – fixed point the problem: how to represent fractions with finite number of...

Post on 15-Jan-2016

219 Views

Category:

Documents

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Representing fractions – Fixed point

The problem:

How to represent fractions with finite number of bits ?

Representing fractions – Fixed point

a1a2a3a4a5a6a7a8a9a10A number with 10 bits

Representing fractions – Fixed point

a1a2a3a4a5a6a7a8a9a10A number with 10 bits

a1a2a3a4a5a6a7a8.a9a10

Fixing the point

Representing fractions – Fixed point

Number Fraction

Signed (2 complement)

-27…27-1 Quanta of 0.25

Unsigned 0…28-1 Quanta of 0.25

Range of representation:

Fixed point : the problem

Cannot represent wide ranges of numbers.

In scientific applications.

Representing Fractions – Floating point

10 1 * 101

-1.23 * 10-2

Base (radix) - r

-0.123

Representing Fractions – Floating point

10 1 * 101

-1.23 * 10-1

exponent

-0.123

Representing Fractions – Floating point

10 1 * 101

-1.23 * 10-1

Number (Mantissa)

-0.123

Representing Fractions – Floating point

10

-0.123

(-1)0*1 * 101

(-1)1*1.23 * 10-1

Sign bit

Problem of uniqueness

0.1

100*10-4

0.001*102

Representation is not Unique

Problem of uniqueness - Normalization

0.61

610*10-4

0.0061*102

Standardization6.1*10-1

One digit to the Left of the point

Normalized Binary Floating point

D = (-1)a0 * (1.a1a2a3…)*2b1b2b3…

a0b1b2…bna1a2a3…am

String of bits

Floating point - Questions Representing the (signed) exponent

How to represent zero? And Nan, infinity ?

How to add, subtract and multiply?

Rounding Errors.

Floating point – Representing the exponent

How to represent singed number ?

Sign bit

2-Complement

Floating point – Representing the exponent

How to represent singed number ?

Sign bit

2-Complement

Neither

Floating point – Representing the exponent

We want the exponent to be binary ordered:

0000 < 0001 < …. < 1000 < … < 1111

Floating point – Representing the exponent

Number = Number - B

000…0001

111…1110

emin

emax

Usually B = 2n-1-1

We define the following sizes like this:

Floating point – Representing zero,NAN, ±

Exponent Mantissa Represent

e = emin-1 M = 0 ±0

e = emin-1 M ≠ 0 0.M*2emin

emin ≤ e ≤ emax

1.M*2e

e = emax+1 M = 0 ±

e = emax+1 M ≠ 0 NaN

IEEE754 special values

Denormalized number

normalized number

IEEE 754

Parameter

Single Double

Mantissa 24 53

emax 127 1023

emin -126 -1022

Exponent width

8 11

Format width

32 64

(Including the signBit)

What is NaN (not a number)

Operation NaN produced by

+ + (- )

* 0 * / 0/0 , /

Partial list

Operation

X±NaN

X*NaN

X/NaN

Infinity

Provide a safe was to continue calculation when overflow is encountered.

Calculations with Floating Point numbers

Addition: Equalize the exponents

(smallerlarger exponent)

Sum the mantissa

Renormalize if necessary

Calculations with Floating Point numbers

Example (in base 10):

91 9.10*101

|E| = 1 , |M| = 3

9.7 9.70*100

Calculations with Floating Point numbers

9.10*101

+ 9.70*100

Not The same Order.

Calculations with Floating Point numbers

9.10*101

+ 9.70*100

9.10*101

+ 0.97*101

10.7*101

renormalize

1.07*102

Calculations with Floating Point numbers

Example II (in base 10):

91 9.10*101

|E| = 1 , |M| = 3

9.75 9.75*100

Calculations with Floating Point numbers

9.10*101

+ 9.75*100

Not The same Order.

Calculations with Floating Point numbers

9.10*101

+ 9.75*100

9.10 *101

+ 0.975*101

10.75*101renormalize

1.07*102

5 (rounding error)

Rounding Errors

The Problem: Squeezing infinite many real numbers

into a finite number of bits

Measuring Rounding Errors

Units in last place (Ulps)

Relative Error

Measuring Rounding Errors – ULP

error = |d.dddd – (z/re)|*rp-1

If d.dddd*re represent z

p digits

Measuring Rounding Errors – ULP

Example I: r = 10 , p = 3

The number 3.14*10-2 represents 0.0314159

Error = 0.159

Measuring Rounding Errors – ULP

What is the maximum ULP if the rounding is toward the nearest number?

0.5 ULP

Measuring Rounding Errors – Relative Error

Relative error = |d.dddd*re – z|/z

If d.dddd*re represent z

p digits

Measuring Rounding Errors – Relative errors

Example I: r = 10 , p = 3

The number 3.14*10-2 represents 0.0314159

Relative Error ~ 0.0005

top related