representing fractions – fixed point the problem: how to represent fractions with finite number of...

35
Representing fractions Fixed point The problem: How to represent fractions with finite number of bits ?

Upload: todd-hoover

Post on 15-Jan-2016

218 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Representing fractions – Fixed point

The problem:

How to represent fractions with finite number of bits ?

Page 2: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Representing fractions – Fixed point

a1a2a3a4a5a6a7a8a9a10A number with 10 bits

Page 3: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Representing fractions – Fixed point

a1a2a3a4a5a6a7a8a9a10A number with 10 bits

a1a2a3a4a5a6a7a8.a9a10

Fixing the point

Page 4: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Representing fractions – Fixed point

Number Fraction

Signed (2 complement)

-27…27-1 Quanta of 0.25

Unsigned 0…28-1 Quanta of 0.25

Range of representation:

Page 5: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Fixed point : the problem

Cannot represent wide ranges of numbers.

In scientific applications.

Page 6: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Representing Fractions – Floating point

10 1 * 101

-1.23 * 10-2

Base (radix) - r

-0.123

Page 7: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Representing Fractions – Floating point

10 1 * 101

-1.23 * 10-1

exponent

-0.123

Page 8: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Representing Fractions – Floating point

10 1 * 101

-1.23 * 10-1

Number (Mantissa)

-0.123

Page 9: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Representing Fractions – Floating point

10

-0.123

(-1)0*1 * 101

(-1)1*1.23 * 10-1

Sign bit

Page 10: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Problem of uniqueness

0.1

100*10-4

0.001*102

Representation is not Unique

Page 11: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Problem of uniqueness - Normalization

0.61

610*10-4

0.0061*102

Standardization6.1*10-1

One digit to the Left of the point

Page 12: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Normalized Binary Floating point

D = (-1)a0 * (1.a1a2a3…)*2b1b2b3…

a0b1b2…bna1a2a3…am

String of bits

Page 13: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Floating point - Questions Representing the (signed) exponent

How to represent zero? And Nan, infinity ?

How to add, subtract and multiply?

Rounding Errors.

Page 14: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Floating point – Representing the exponent

How to represent singed number ?

Sign bit

2-Complement

Page 15: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Floating point – Representing the exponent

How to represent singed number ?

Sign bit

2-Complement

Neither

Page 16: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Floating point – Representing the exponent

We want the exponent to be binary ordered:

0000 < 0001 < …. < 1000 < … < 1111

Page 17: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Floating point – Representing the exponent

Number = Number - B

000…0001

111…1110

emin

emax

Usually B = 2n-1-1

We define the following sizes like this:

Page 18: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Floating point – Representing zero,NAN, ±

Exponent Mantissa Represent

e = emin-1 M = 0 ±0

e = emin-1 M ≠ 0 0.M*2emin

emin ≤ e ≤ emax

1.M*2e

e = emax+1 M = 0 ±

e = emax+1 M ≠ 0 NaN

IEEE754 special values

Denormalized number

normalized number

Page 19: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

IEEE 754

Parameter

Single Double

Mantissa 24 53

emax 127 1023

emin -126 -1022

Exponent width

8 11

Format width

32 64

(Including the signBit)

Page 20: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

What is NaN (not a number)

Operation NaN produced by

+ + (- )

* 0 * / 0/0 , /

Partial list

Operation

X±NaN

X*NaN

X/NaN

Page 21: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Infinity

Provide a safe was to continue calculation when overflow is encountered.

Page 22: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Calculations with Floating Point numbers

Addition: Equalize the exponents

(smallerlarger exponent)

Sum the mantissa

Renormalize if necessary

Page 23: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Calculations with Floating Point numbers

Example (in base 10):

91 9.10*101

|E| = 1 , |M| = 3

9.7 9.70*100

Page 24: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Calculations with Floating Point numbers

9.10*101

+ 9.70*100

Not The same Order.

Page 25: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Calculations with Floating Point numbers

9.10*101

+ 9.70*100

9.10*101

+ 0.97*101

10.7*101

renormalize

1.07*102

Page 26: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Calculations with Floating Point numbers

Example II (in base 10):

91 9.10*101

|E| = 1 , |M| = 3

9.75 9.75*100

Page 27: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Calculations with Floating Point numbers

9.10*101

+ 9.75*100

Not The same Order.

Page 28: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Calculations with Floating Point numbers

9.10*101

+ 9.75*100

9.10 *101

+ 0.975*101

10.75*101renormalize

1.07*102

5 (rounding error)

Page 29: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Rounding Errors

The Problem: Squeezing infinite many real numbers

into a finite number of bits

Page 30: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Measuring Rounding Errors

Units in last place (Ulps)

Relative Error

Page 31: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Measuring Rounding Errors – ULP

error = |d.dddd – (z/re)|*rp-1

If d.dddd*re represent z

p digits

Page 32: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Measuring Rounding Errors – ULP

Example I: r = 10 , p = 3

The number 3.14*10-2 represents 0.0314159

Error = 0.159

Page 33: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Measuring Rounding Errors – ULP

What is the maximum ULP if the rounding is toward the nearest number?

0.5 ULP

Page 34: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Measuring Rounding Errors – Relative Error

Relative error = |d.dddd*re – z|/z

If d.dddd*re represent z

p digits

Page 35: Representing fractions – Fixed point The problem: How to represent fractions with finite number of bits ?

Measuring Rounding Errors – Relative errors

Example I: r = 10 , p = 3

The number 3.14*10-2 represents 0.0314159

Relative Error ~ 0.0005