number representation part 2 floating point representations rounding

Number Representation Part 2 Floating Point Representations Rounding Representation of the Galois Field elements ECE 645: Lecture 5

Upload: belle

Post on 04-Jan-2016



Page 1: Number Representation Part 2 Floating Point Representations Rounding

Number Representation

Part 2: Floating Point Representations

Rounding

Representation of the Galois Field Elements

ECE 645: Lecture 5


Required Reading

Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design
Chapter 17, Floating-Point Representations
Section 17.5, Rounding Schemes

Rounding Algorithms 101, http://www.eetimes.com/document.asp?doc_id=1274515


Floating Point Representations


The ANSI/IEEE standard floating-point number representation formats

Short (32-bit) format: Sign (1 bit) | Exponent (8 bits, bias = 127, –126 to 127) | Significand (23 bits for fractional part, plus hidden 1 in integer part)

Long (64-bit) format: Sign (1 bit) | Exponent (11 bits, bias = 1023, –1022 to 1023) | Significand (52 bits for fractional part, plus hidden 1 in integer part)

Originally IEEE 754-1985; superseded by the IEEE 754-2008 standard.
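The field layout above can be checked directly on real values. The sketch below is a minimal illustration, assuming a Python interpreter with IEEE 754 floats; `ieee_fields` and the `FORMATS` table are hypothetical names, and `struct` is used only to reinterpret the encoded bytes as an unsigned integer.

```python
import struct

# Hypothetical table: (float pack code, unsigned int code, exponent bits, fraction bits)
FORMATS = {
    "short": (">f", ">I", 8, 23),    # 32-bit: 1 sign + 8 exponent + 23 fraction bits
    "long": (">d", ">Q", 11, 52),    # 64-bit: 1 sign + 11 exponent + 52 fraction bits
}

def ieee_fields(x, fmt="short"):
    """Return (sign, biased exponent, fraction field) of x encoded in the given format."""
    float_code, int_code, e_bits, f_bits = FORMATS[fmt]
    # Reinterpret the encoded bytes as an unsigned integer to get the raw bit pattern.
    (bits,) = struct.unpack(int_code, struct.pack(float_code, x))
    sign = bits >> (e_bits + f_bits)
    exponent = (bits >> f_bits) & ((1 << e_bits) - 1)
    fraction = bits & ((1 << f_bits) - 1)     # the leading 1 is hidden, not stored
    return sign, exponent, fraction

# -6.5 = -1.101 (binary) x 2^2, so the unbiased exponent is 2 in both formats.
for fmt, bias in (("short", 127), ("long", 1023)):
    s, e, f = ieee_fields(-6.5, fmt)
    print(fmt, s, e - bias, bin(f))
```

Subtracting the bias (127 or 1023) recovers the true exponent; the stored fraction holds only the bits after the hidden 1.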


Table 17.1 Some features of the ANSI/IEEE standard floating-point number representation formats


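Exponent Encoding

Exponent encoding in 8 bits for the single/short (32-bit) ANSI/IEEE format:

Hex code:        00         01    …  7E   7F   80   …  FE    FF
Decimal code:    0          1     …  126  127  128  …  254   255
Exponent value:  (special)  –126  …  –1   0    +1   …  +127  (special)

Codes 01 to FE represent normalized numbers $1.f \times 2^e$, where $e$ = code – 127.

Code 00 with f = 0: representation of 0; with f ≠ 0: representation of denormals, $0.f \times 2^{-126}$.

Code FF with f = 0: representation of ±∞; with f ≠ 0: representation of NaNs.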
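The encoding rules can be turned into a small decoder. This is a sketch, assuming 32-bit patterns given as Python integers; `decode_float32` and `via_struct` are hypothetical helper names. The second function asks the platform to decode the same pattern so the two results can be compared.

```python
import struct

BIAS = 127  # single-precision exponent bias

def decode_float32(bits):
    """Classify a 32-bit pattern using the exponent encoding and return (kind, value)."""
    sign = -1.0 if bits >> 31 else 1.0
    code = (bits >> 23) & 0xFF          # 8-bit exponent code
    f = bits & 0x7FFFFF                 # 23-bit fraction field
    frac = f / 2**23                    # numeric value of 0.f
    if code == 0x00:
        if f == 0:
            return "zero", sign * 0.0
        return "denormal", sign * frac * 2.0**-126            # 0.f x 2^-126
    if code == 0xFF:
        if f == 0:
            return "infinity", sign * float("inf")
        return "NaN", float("nan")
    return "normal", sign * (1.0 + frac) * 2.0**(code - BIAS)  # 1.f x 2^e

def via_struct(bits):
    """Cross-check: let the platform decode the same bit pattern."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]

for pattern in (0x3F800000, 0xC0D00000, 0x00000001, 0x00400000, 0x7F800000):
    print(hex(pattern), decode_float32(pattern), via_struct(pattern))
```

Note that the smallest denormal, pattern 0x00000001, equals $2^{-23} \times 2^{-126} = 2^{-149}$.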


Fig. 17.4 Denormals in the IEEE single-precision format.


New IEEE 754-2008 Standard: Basic Formats


New IEEE 754-2008 Standard: Binary Interchange Formats


Requirements for Arithmetic

Results of the 4 basic arithmetic operations (+, –, ×, ÷) as well as square-rooting must match those obtained if all intermediate computations were infinitely precise

That is, a floating-point arithmetic operation should introduce no more imprecision than the error attributable to the final rounding of a result that has no exact representation (this is the best possible)

Example: $(1 + 2^{-1}) \times (1 + 2^{-23})$

Exact result: $1 + 2^{-1} + 2^{-23} + 2^{-24}$

Rounded result: $1 + 2^{-1} + 2^{-22}$, Error = ½ ulp
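The example can be reproduced numerically. This sketch assumes that Python's binary64 floats hold the exact product (it needs only 25 significant bits) and that packing with `struct`'s `"f"` code rounds to single precision with round to nearest, ties to even, which is what CPython does on IEEE 754 platforms. `to_float32` is a hypothetical helper name.

```python
import struct

def to_float32(x):
    """Round a Python float (binary64) to the nearest single-precision value."""
    # Packing with the "f" format code converts to binary32; we assume the
    # default rounding mode (round to nearest, ties to even) is in effect.
    return struct.unpack("f", struct.pack("f", x))[0]

exact = (1 + 2**-1) * (1 + 2**-23)   # 1 + 2^-1 + 2^-23 + 2^-24, exact in binary64
rounded = to_float32(exact)
ulp = 2**-23                          # ulp of single-precision numbers in [1, 2)
print(rounded == 1 + 2**-1 + 2**-22, (rounded - exact) / ulp)
```

The exact product lies exactly halfway between two single-precision neighbors, so the tie is broken toward the one whose last fraction bit is 0, giving an error of exactly ½ ulp.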


Rounding 101


Rounding Modes

The IEEE 754-2008 standard includes five rounding modes:

Default:

Round to nearest, ties to even (rtne)

Other (user-selectable) modes:

Round to nearest, ties away from 0 (rtna)

Round toward zero (inward)

Round toward +∞ (upward)

Round toward -∞ (downward)
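The five modes can be tried out with Python's standard `decimal` module, which names the same rounding directions. This is a sketch only: `decimal` rounds decimal numbers rather than binary ones, but the behavior of each direction on ties and on negative values is the same. `MODES` and `round_to_integer` are hypothetical names.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN, ROUND_HALF_UP)

# decimal-module constants matching the five rounding directions
MODES = {
    "rtne": ROUND_HALF_EVEN,   # round to nearest, ties to even (default)
    "rtna": ROUND_HALF_UP,     # round to nearest, ties away from 0
    "zero": ROUND_DOWN,        # round toward zero (inward)
    "up": ROUND_CEILING,       # round toward +infinity
    "down": ROUND_FLOOR,       # round toward -infinity
}

def round_to_integer(value, mode):
    """Round a decimal string to an integer under one of the five modes."""
    return int(Decimal(value).quantize(Decimal(1), rounding=MODES[mode]))

for v in ("2.5", "-2.5", "3.5", "-2.25"):
    print(v, {m: round_to_integer(v, m) for m in MODES})
```

Beware that `decimal`'s `ROUND_HALF_UP` means ties away from zero, not ties toward +∞.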


Rounding

• Rounding occurs when we want to approximate a more precise number (i.e. more fractional bits L) with a less precise number (i.e. fewer fractional bits L')

• Example 1:
  • old: 000110.11010001 (K=6, L=8)
  • new: 000110.11 (K'=6, L'=2)

• Example 2:
  • old: 000110.11010001 (K=6, L=8)
  • new: 000111. (K'=6, L'=0)

• The following pages show rounding from L > 0 fractional bits to L' = 0 bits, but the same reasoning holds for any L' < L

• Usually, the number of integral bits is kept the same: K' = K


Rounding Equation

• y = round(x)

$\underbrace{x_{k-1}x_{k-2}\ldots x_1x_0}_{\text{whole part}}\,.\,\underbrace{x_{-1}x_{-2}\ldots x_{-l}}_{\text{fractional part}} \;\xrightarrow{\text{round}}\; y_{k-1}y_{k-2}\ldots y_1y_0$


Rounding Techniques

• There are different rounding techniques:
  • 1) truncation
    • results in round towards zero in signed magnitude
    • results in round towards -∞ in two's complement
  • 2) round to nearest number
  • 3) round to nearest even number (or odd number)
  • 4) round towards +∞
• Other rounding techniques:
  • 5) jamming or von Neumann
  • 6) ROM rounding
• The error of each technique differs depending on the representation of numbers, i.e., signed magnitude versus two's complement
  • Error = round(x) - x


1) Truncation

• Truncation in signed magnitude results in a number chop(x) that is never larger in magnitude than x. This is called round towards zero or inward rounding.
  • 011.10 (3.5)₁₀ → 011 (3)₁₀, Error = -0.5
  • 111.10 (-3.5)₁₀ → 111 (-3)₁₀, Error = +0.5
• Truncation in two's complement results in a number chop(x) that is never larger than x. This is called round towards -∞ or downward-directed rounding.
  • 011.10 (3.5)₁₀ → 011 (3)₁₀, Error = -0.5
  • 100.10 (-3.5)₁₀ → 100 (-4)₁₀, Error = -0.5

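The simplest possible rounding scheme: chopping or truncation

$x_{k-1}x_{k-2}\ldots x_1x_0\,.\,x_{-1}x_{-2}\ldots x_{-l} \;\xrightarrow{\text{trunc}}\; x_{k-1}x_{k-2}\ldots x_1x_0$

ulp (unit in the last place): the weight of $x_0$, the least significant retained bit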
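A minimal sketch of chopping, assuming a fixed-point number is stored as an ordinary Python integer equal to its value times $2^L$ (so 011.10 with L = 2 is stored as 14). The helpers `chop_twos_complement` and `chop_sign_magnitude` are hypothetical names; they reproduce the four examples above.

```python
from fractions import Fraction

def chop_twos_complement(x, drop):
    """Discard `drop` fractional bits of a two's-complement value.

    x is the fixed-point number scaled to an integer (value * 2^L). Python's >>
    on negative ints is an arithmetic shift, i.e. floor division by 2^drop,
    which is what dropping the low bits of a two's-complement word does.
    """
    return x >> drop

def chop_sign_magnitude(x, drop):
    """Discard `drop` fractional bits of the magnitude; the sign is kept."""
    magnitude = abs(x) >> drop
    return -magnitude if x < 0 else magnitude

L = 2  # fractional bits in the examples above
for value in (Fraction(7, 2), Fraction(-7, 2)):        # 3.5 and -3.5
    scaled = int(value * 2**L)
    for name, chop in (("two's complement", chop_twos_complement),
                       ("sign magnitude", chop_sign_magnitude)):
        result = chop(scaled, L)
        print(value, name, result, "error", result - value)
```

The only difference between the two functions is the treatment of negative inputs: the arithmetic shift rounds towards -∞, while chopping the magnitude rounds towards zero.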


Truncation Function Graph: chop(x)

Fig. 17.5 Truncation or chopping of a signed-magnitude number (same as round toward 0).

Fig. 17.6 Truncation or chopping of a 2’s-complement number (same as round to -∞).

[Figures: chop(x) versus x for -4 ≤ x ≤ 4, one panel per caption above]


Bias in two's complement truncation

X (binary) | X (decimal) | chop(x) (binary) | chop(x) (decimal) | Error (decimal)
011.00 | 3 | 011 | 3 | 0
011.01 | 3.25 | 011 | 3 | -0.25
011.10 | 3.5 | 011 | 3 | -0.5
011.11 | 3.75 | 011 | 3 | -0.75
100.01 | -3.75 | 100 | -4 | -0.25
100.10 | -3.5 | 100 | -4 | -0.5
100.11 | -3.25 | 100 | -4 | -0.75
101.00 | -3 | 101 | -3 | 0

• Assuming all combinations of positive and negative values of x are equally likely, the average error is -0.375

• In general, average error = $-(2^{-L'} - 2^{-L})/2$, where L' = new number of fractional bits
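The general formula can be checked by brute force. This sketch assumes every K.L-bit two's-complement pattern is equally likely and models chopping as an arithmetic right shift of the scaled integer; `average_chop_error` and `predicted` are hypothetical names.

```python
from fractions import Fraction

def average_chop_error(K, L, L_new):
    """Mean of chop(x) - x over every K.L-bit two's-complement value x."""
    drop = L - L_new
    total = Fraction(0)
    count = 0
    for scaled in range(-2**(K + L - 1), 2**(K + L - 1)):    # all bit patterns
        x = Fraction(scaled, 2**L)
        chopped = Fraction(scaled >> drop, 2**L_new)          # arithmetic shift = chop
        total += chopped - x
        count += 1
    return total / count

def predicted(L, L_new):
    """The closed form -(2^-L' - 2^-L) / 2."""
    return -(Fraction(1, 2**L_new) - Fraction(1, 2**L)) / 2

print(average_chop_error(3, 2, 0), predicted(2, 0))
```

Because every pattern of discarded bits occurs equally often, the average depends only on L and L', not on K or on the sign of x.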


Implementing truncation in hardware

• Easy: just ignore (i.e., truncate) the fractional digits from position L'+1 to L

$x_{k-1}x_{k-2}\ldots x_1x_0\,.\,x_{-1}x_{-2}\ldots x_{-L}$

$= y_{k-1}y_{k-2}\ldots y_1y_0\,.$ ignore (i.e., truncate) the rest


2) Round to nearest number

• Rounding to the nearest number is what we normally think of when we say "round"
  • 010.01 (2.25)₁₀ → 010 (2)₁₀, Error = -0.25
  • 010.11 (2.75)₁₀ → 011 (3)₁₀, Error = +0.25
  • 010.00 (2.00)₁₀ → 010 (2)₁₀, Error = +0.00
  • 010.10 (2.5)₁₀ → 011 (3)₁₀, Error = +0.5 [round-half-up (arithmetic rounding)]
  • 010.10 (2.5)₁₀ → 010 (2)₁₀, Error = -0.5 [round-half-down]


Round-half-up: dealing with negative numbers

• Rounding to the nearest number for negative two's-complement values:
  • 101.11 (-2.25)₁₀ → 110 (-2)₁₀, Error = +0.25
  • 101.01 (-2.75)₁₀ → 101 (-3)₁₀, Error = -0.25
  • 110.00 (-2.00)₁₀ → 110 (-2)₁₀, Error = +0.00
  • 101.10 (-2.5)₁₀ → 110 (-2)₁₀, Error = +0.5 [asymmetric implementation]
  • 101.10 (-2.5)₁₀ → 101 (-3)₁₀, Error = -0.5 [symmetric implementation]
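The two treatments of the midway value can be sketched on scaled integers (value times $2^L$, with at least one bit dropped). `rtn_asymmetric` and `rtn_symmetric` are hypothetical names: the first adds half of the new ulp and chops, and the second does the same to the magnitude and then restores the sign.

```python
def rtn_asymmetric(x, drop):
    """Round-half-up: add half of the new ulp, then chop. Ties go toward +infinity."""
    return (x + (1 << (drop - 1))) >> drop

def rtn_symmetric(x, drop):
    """Apply round-half-up to the magnitude and restore the sign (ties away from 0)."""
    magnitude = (abs(x) + (1 << (drop - 1))) >> drop
    return -magnitude if x < 0 else magnitude

L = 2
for scaled in (-9, -10, -11, 9, 10, 11):   # -2.25, -2.5, -2.75, 2.25, 2.5, 2.75
    print(scaled / 2**L, rtn_asymmetric(scaled, L), rtn_symmetric(scaled, L))
```

The two versions differ only at the tie -2.5, where the asymmetric one returns -2 and the symmetric one returns -3.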


Round to Nearest Function Graph: rtn(x), Round-half-up version

[Figures: rtn(x) versus x for -4 ≤ x ≤ 4; left panel: asymmetric implementation, right panel: symmetric implementation]


Bias in two's complement round to nearest: round-half-up, asymmetric implementation

X (binary) | X (decimal) | rtn(x) (binary) | rtn(x) (decimal) | Error (decimal)
010.00 | 2 | 010 | 2 | 0
010.01 | 2.25 | 010 | 2 | -0.25
010.10 | 2.5 | 011 | 3 | +0.5
010.11 | 2.75 | 011 | 3 | +0.25
101.01 | -2.75 | 101 | -3 | -0.25
101.10 | -2.5 | 110 | -2 | +0.5
101.11 | -2.25 | 110 | -2 | +0.25
110.00 | -2 | 110 | -2 | 0

• Assuming all combinations of positive and negative values of x are equally likely, the average error is +0.125

• Smaller average error than truncation, but the error is still not symmetric
• The midway values, i.e., exactly 2.5 or -2.5, always lead to a positive error bias
• There is also the problem that overflow can occur if only K' = K integral bits are allocated
  • Example: rtn(011.10) overflows, because 3.5 rounds to 4, which does not fit in 3-bit two's complement
  • This overflow only occurs for positive numbers near the maximum positive value, not for negative numbers


Implementing round to nearest (rtn) in hardware: round-half-up, asymmetric implementation

• Two methods

• Method 1: Add '1' in the position one digit to the right of the new LSB (i.e., digit L'+1) and keep only L' fractional bits

  $x_{k-1}x_{k-2}\ldots x_1x_0\,.\,x_{-1}x_{-2}\ldots x_{-L}$
  + 1 (in position $-1$)
  $= y_{k-1}y_{k-2}\ldots y_1y_0\,.\,y_{-1}\ldots$; ignore (i.e., truncate) the rest

• Method 2: Add the value of the digit one position to the right of the new LSB (i.e., digit L'+1) into the new LSB (i.e., digit L') and keep only L' fractional bits

  $x_{k-1}x_{k-2}\ldots x_1x_0$
  + $x_{-1}$
  $= y_{k-1}y_{k-2}\ldots y_1y_0$; ignore (i.e., truncate) the rest
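Both methods can be written on scaled integers (value times $2^L$, dropping at least one bit) to confirm that they always agree. `rtn_method1` and `rtn_method2` are hypothetical names. Method 2 avoids adding a constant to the full-width word; it only adds a single bit into the already truncated result.

```python
def rtn_method1(x, drop):
    """Method 1: add a 1 one position below the new LSB, then truncate."""
    return (x + (1 << (drop - 1))) >> drop

def rtn_method2(x, drop):
    """Method 2: truncate, then add the first discarded bit into the new LSB."""
    first_discarded = (x >> (drop - 1)) & 1
    return (x >> drop) + first_discarded

for x in (9, 10, 11, 14, -10):   # 2.25, 2.5, 2.75, 3.5, -2.5 with L = 2
    print(x / 4, rtn_method1(x, 2), rtn_method2(x, 2))
```

Both methods return 4 for 3.5, which is the overflow case noted earlier when only K' = 3 integral bits are kept.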


Round to Nearest Even Function Graph: rtne(x)

• To solve the problem with the midway value, we round to the nearest even number (or, alternatively, to the nearest odd number)

Fig. 17.8 Rounding to the nearest even number.

[Figure: rtne(x) versus x for -4 ≤ x ≤ 4]

Fig. 17.9 R* rounding or rounding to the nearest odd number.

[Figure: R*(x) versus x for -4 ≤ x ≤ 4]


Bias in two's complement round to nearest even (rtne)

• Average error is now 0 (ignoring the overflow)
• Cost: more hardware

X (binary) | X (decimal) | rtne(x) (binary) | rtne(x) (decimal) | Error (decimal)
000.10 | 0.5 | 000 | 0 | -0.5
001.10 | 1.5 | 010 | 2 | +0.5
010.10 | 2.5 | 010 | 2 | -0.5
011.10 | 3.5 | 0100 (overflow) | 4 | +0.5
100.10 | -3.5 | 100 | -4 | -0.5
101.10 | -2.5 | 110 | -2 | +0.5
110.10 | -1.5 | 110 | -2 | -0.5
111.10 | -0.5 | 000 | 0 | +0.5
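A sketch of round to nearest even on scaled two's-complement integers (value times $2^L$, at least one bit dropped). `rtne` is a hypothetical name. It chops, inspects the discarded bits, and breaks an exact tie toward the even result. Overflow detection for a fixed K is left out.

```python
from fractions import Fraction

def rtne(x, drop):
    """Round to nearest, ties to even, for a two's-complement value scaled by 2^drop."""
    q = x >> drop                      # chop (floor)
    r = x - (q << drop)                # discarded bits, 0 <= r < 2^drop
    half = 1 << (drop - 1)
    if r > half or (r == half and q & 1):   # above the midpoint, or a tie with odd q
        q += 1
    return q

# The eight midway values from the table, scaled by 2^2.
for scaled in (2, 6, 10, 14, -14, -10, -6, -2):
    result = rtne(scaled, 2)
    print(Fraction(scaled, 4), result, "error", result - Fraction(scaled, 4))
```

Python's own `round()` on a `Fraction` also rounds ties to even, which makes a convenient reference for checking a sketch like this one.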


4) Rounding towards infinity

• We may need computation errors to be in a known direction
  • Example: in computing upper bounds, larger results are acceptable, but results that are smaller than the correct values could invalidate the upper bound
  • Use upward-directed rounding (round toward +∞)
  • up(x) is always larger than or equal to x
• Similarly, for lower bounds, use downward-directed rounding (round toward -∞)
  • down(x) is always smaller than or equal to x
  • We have already seen that round toward -∞ in two's complement can be implemented by truncation
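A sketch of the two directed modes on scaled two's-complement integers. `down` and `up` are hypothetical names. `down` is plain truncation, as stated above, and `up` reuses it through the identity ceil(v) = -floor(-v).

```python
def down(x, drop):
    """Round toward -infinity: in two's complement this is plain truncation (floor)."""
    return x >> drop

def up(x, drop):
    """Round toward +infinity (ceiling), via ceil(v) = -floor(-v)."""
    return -((-x) >> drop)

for scaled in (9, 8, -9, -8):    # 2.25, 2.0, -2.25, -2.0 with L = 2
    print(scaled / 4, down(scaled, 2), up(scaled, 2))
```

Computing a lower bound with `down` and an upper bound with `up` keeps each bound on the safe side of the exact value.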


Rounding Toward Infinity Function Graph: up(x) and down(x)

[Figures: up(x) and down(x) versus x]

down(x) can be implemented by chop(x) in two's complement


Two's Complement Round to Zero

• Two's complement round to zero (inward rounding) also exists

[Figure: inward(x) versus x for -4 ≤ x ≤ 4]
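In two's complement, rounding toward zero needs a sign test, because plain chopping already rounds toward zero for non-negative values but toward -∞ for negative ones. A minimal sketch on scaled integers; `inward` is a hypothetical name.

```python
def inward(x, drop):
    """Round toward zero for a two's-complement value scaled by 2^drop."""
    if x >= 0:
        return x >> drop          # chopping is already toward zero for x >= 0
    return -((-x) >> drop)        # for x < 0, round up (toward +infinity) instead

for scaled in (14, -14, 9, -9):   # 3.5, -3.5, 2.25, -2.25 with L = 2
    print(scaled / 4, inward(scaled, 2))
```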


Other Methods

• Note that in two's complement, round to nearest (rtn) involves an addition, which may have a carry propagating from the LSB to the MSB
  • Rounding may take as long as an addition
• The adder chain can be broken using the following two techniques:
  • Jamming or von Neumann rounding
  • ROM-based rounding


5) Jamming or von Neumann

[Figure: jam(x) versus x for -4 ≤ x ≤ 4]

Chop and force the LSB of the result to 1

Simplicity of chopping, with the near-symmetry of ordinary rounding

Max error is comparable to that of chopping (double that of rounding)
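A sketch of jamming on scaled two's-complement integers, together with a brute-force look at its error. `jam` and `error_stats` are hypothetical names, and every K.L-bit pattern is assumed equally likely. No addition is involved, so there is no carry chain.

```python
from fractions import Fraction

def jam(x, drop):
    """Von Neumann rounding: chop (arithmetic shift) and force the result's LSB to 1."""
    return (x >> drop) | 1

def error_stats(K, L):
    """Mean and maximum absolute error of jam over every K.L-bit two's-complement value."""
    errors = [Fraction(jam(s, L)) - Fraction(s, 2**L)
              for s in range(-2**(K + L - 1), 2**(K + L - 1))]
    return sum(errors) / len(errors), max(abs(e) for e in errors)

print(error_stats(3, 2))
```

For K = 3 and L = 2, the worst-case error is a full ulp (for example, 2.0 becomes 3), whereas the average error is only 1/8 ulp, much smaller than the -3/8 of plain truncation.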


6) ROM Rounding

Fig. 17.11 ROM rounding with an 8 × 2 table.

[Figure: ROM(x) versus x for -4 ≤ x ≤ 4]

Example: Rounding with a 32 × 4 table

$x_{k-1}\ldots x_4\,x_3x_2x_1x_0\,.\,x_{-1}x_{-2}\ldots x_{-l} \;\xrightarrow{\text{ROM}}\; x_{k-1}\ldots x_4\,y_3y_2y_1y_0$

ROM address: $x_3x_2x_1x_0x_{-1}$; ROM data: $y_3y_2y_1y_0$

Rounding result is the same as that of the round to nearest scheme in 31 of the 32 possible cases, but a larger error is introduced when $x_3 = x_2 = x_1 = x_0 = x_{-1} = 1$
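A software model of the 32 × 4 ROM, assuming it stores round-half-up results and saturates at 1111 for the one address whose result would need a carry into $x_4$. `build_rom`, `rom_round`, and `rtn` are hypothetical names, and values are again scaled integers (value times $2^L$, with L ≥ 1).

```python
def build_rom():
    """Model of a 32 x 4 rounding ROM: address x3 x2 x1 x0 x-1 -> data y3 y2 y1 y0."""
    rom = []
    for address in range(32):
        low4, half = address >> 1, address & 1   # x3..x0 and x-1
        # Round to nearest (half up) within 4 bits; 1111 followed by 1 would
        # carry into x4, which the ROM cannot do, so the entry saturates at 1111.
        rom.append(min(low4 + half, 0b1111))
    return rom

ROM = build_rom()

def rom_round(x, L):
    """Round a value scaled by 2^L to an integer using only the ROM (no carry chain)."""
    high = x >> (L + 4)                   # x_{k-1} ... x4 pass through unchanged
    address = (x >> (L - 1)) & 0b11111    # x3 x2 x1 x0 x-1
    return (high << 4) | ROM[address]

def rtn(x, L):
    """Reference: round half up with a full carry-propagating addition."""
    return (x + (1 << (L - 1))) >> L

same = sum(ROM[a] == (a >> 1) + (a & 1) for a in range(32))
print(same, "of 32 table entries agree with round to nearest")
```

The single disagreeing entry is the address 11111, where the ROM returns 1111 instead of 10000 (for example, 15.5 is rounded to 15 rather than 16).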


Representation of the Galois Field Elements


Evariste Galois (1811-1832)


Studied the problem of finding algebraic solutions for general equations of degree 5, e.g.,

$f(x) = a_5x^5 + a_4x^4 + a_3x^3 + a_2x^2 + a_1x + a_0 = 0$

Answered definitively the question of which specific equations of a given degree have algebraic solutions.

On the way, he developed group theory, one of the most important branches of modern mathematics.


1829 Galois submits his results for the first time to the French Academy of Sciences

Reviewer 1: Augustin-Louis Cauchy – forgot or lost the communication.

1830 Galois submits the revised version of his manuscript, hoping to enter the competition for the Grand Prize in mathematics

Reviewer 2: Joseph Fourier – died shortly after receiving the manuscript.

1831 Third submission to the French Academy of Sciences

Reviewer 3: Simeon-Denis Poisson – did not understand the manuscript and rejected it.


May 1832 Galois provoked into a duel

The night before the duel he wrote a letter to his friend containing the summary of his discoveries.

The letter ended with a plea: “Eventually there will be, I hope, some people who will find it profitable to decipher this mess.”

May 30, 1832 Galois was grievously wounded in the duel and died in the hospital the following day.

1843 Galois manuscript rediscovered by Joseph Liouville

1846 Galois manuscript published for the first time in a mathematical journal.


Field

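A set F and two operations, typically denoted by (but not necessarily equivalent to) + and *

The set F and the definitions of these two operations must fulfill special conditions (the field axioms): both operations are associative and commutative, * distributes over +, there are identity elements 0 for + and 1 for * (with 0 ≠ 1), every element has an additive inverse, and every nonzero element has a multiplicative inverse.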


Examples of fields

Infinite fields

{ R = set of real numbers, + addition of real numbers, * multiplication of real numbers }

Finite fields

{ set Zp = {0, 1, 2, …, p-1} with p prime, + (mod p): addition modulo p, * (mod p): multiplication modulo p }
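The requirement that p be prime can be checked by brute force. For any n > 1, the set {0, …, n-1} with + and * mod n already satisfies every field axiom except possibly the existence of multiplicative inverses, so that is the only property the sketch tests. `is_field_Zn` is a hypothetical name. The three-argument `pow(a, -1, p)` used for the inverses requires Python 3.8 or later.

```python
def is_field_Zn(n):
    """Is ({0, ..., n-1}, + mod n, * mod n) a field?

    For n > 1 this is always a commutative ring with 0 != 1, so the only
    axiom that can fail is the existence of multiplicative inverses.
    """
    return n > 1 and all(any(a * b % n == 1 for b in range(1, n))
                         for a in range(1, n))

print([n for n in range(2, 20) if is_field_Zn(n)])
print({a: pow(a, -1, 7) for a in range(1, 7)})   # inverses in Z_7 (Python 3.8+)
```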


Finite Fields = Galois Fields

GF(p^m): p – prime, p^m – number of elements in the field

Most significant special cases:

• GF(p): arithmetic operations present in many libraries
• GF(2^m):
  • Polynomial basis representation: fast in hardware
  • Normal basis representation: fast squaring


Quotient and remainder

Given integers a and n, n > 0,

$\exists!\; q, r \in \mathbb{Z}$ such that $a = q\,n + r$ and $0 \le r < n$

q – quotient

r – remainder (of a divided by n)

$q = \lfloor a/n \rfloor = a \text{ div } n$

$r = a - q\,n = a - \lfloor a/n \rfloor\, n = a \bmod n$
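The definition maps directly onto Python's floor division. This is a sketch; `quotient_remainder` is a hypothetical name. For n > 0, Python's `//` and `%` already follow the convention above, with the remainder always in [0, n).

```python
def quotient_remainder(a, n):
    """Return (q, r) with a = q*n + r and 0 <= r < n, for n > 0."""
    if n <= 0:
        raise ValueError("n must be positive")
    q = a // n          # floor division: q = floor(a / n)
    r = a - q * n       # equals a % n for n > 0
    return q, r

print(quotient_remainder(17, 5), quotient_remainder(-17, 5))
```

Languages whose division truncates toward zero (C and Java, for example) give -17 % 5 = -2 rather than 3, so a negative remainder must be corrected by adding n to match this definition.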


32 mod 5 =

-32 mod 5 =


Integers congruent modulo n

Two integers a and b are congruent modulo n (equivalent modulo n), written $a \equiv b \pmod{n}$, iff

$a \bmod n = b \bmod n$

or

$a = b + k\,n$, $k \in \mathbb{Z}$

or

$n \mid a - b$
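The three definitions can be compared exhaustively over a small range. This is a sketch; `congruent` and `check_definitions` are hypothetical names, and the search bound for k is chosen so that it covers every possible difference a - b in the range.

```python
def congruent(a, b, n):
    """a ≡ b (mod n), using the first definition: equal remainders."""
    return a % n == b % n

def check_definitions(n, span=30):
    """Confirm the three definitions agree for all a, b in [-span, span]."""
    k_max = 2 * span // n + 1               # large enough, since |a - b| <= 2 * span
    for a in range(-span, span + 1):
        for b in range(-span, span + 1):
            by_remainder = congruent(a, b, n)
            by_multiple = any(a == b + k * n for k in range(-k_max, k_max + 1))
            by_divisibility = (a - b) % n == 0      # n | a - b
            if not (by_remainder == by_multiple == by_divisibility):
                return False
    return True

print(check_definitions(5), congruent(18, 42, 8), congruent(3, 7, 8))
```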


Laws of modular arithmetic


Rules of addition, subtraction and multiplication modulo n

$(a + b) \bmod n = ((a \bmod n) + (b \bmod n)) \bmod n$

$(a - b) \bmod n = ((a \bmod n) - (b \bmod n)) \bmod n$

$(a \cdot b) \bmod n = ((a \bmod n) \cdot (b \bmod n)) \bmod n$
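These rules are what allow reducing after every step, so that intermediate values never grow beyond about $n^2$. The sketch below checks the three rules on a grid of values and then uses them to compute a long product modulo n. `rules_hold` and `product_mod` are hypothetical names.

```python
from itertools import product
from math import factorial

def rules_hold(n, values=range(-20, 21)):
    """Check the three reduction rules for every pair (a, b) drawn from `values`."""
    for a, b in product(values, repeat=2):
        if (a + b) % n != ((a % n) + (b % n)) % n:
            return False
        if (a - b) % n != ((a % n) - (b % n)) % n:
            return False
        if (a * b) % n != ((a % n) * (b % n)) % n:
            return False
    return True

def product_mod(factors, n):
    """Multiply modulo n, reducing after every step so intermediates stay below n * n."""
    acc = 1
    for f in factors:
        acc = (acc * (f % n)) % n
    return acc

print(rules_hold(26), product_mod(range(1, 21), 23) == factorial(20) % 23)
```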


9 · 13 mod 5 =

25 · 25 mod 26 =


Laws of modular arithmetic

Regular addition: $a + b = a + c$ iff $b = c$

Modular addition: $a + b \equiv a + c \pmod{n}$ iff $b \equiv c \pmod{n}$

Regular multiplication: if $a \cdot b = a \cdot c$ and $a \ne 0$, then $b = c$

Modular multiplication: if $a \cdot b \equiv a \cdot c \pmod{n}$ and $\gcd(a, n) = 1$, then $b \equiv c \pmod{n}$


Modular Multiplication: Example

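$18 \equiv 42 \pmod{8}$, i.e., $6 \cdot 3 \equiv 6 \cdot 7 \pmod{8}$

but $3 \not\equiv 7 \pmod{8}$, because $\gcd(6, 8) = 2 \ne 1$

x:          0 1 2 3 4 5 6 7
6·x mod 8:  0 6 4 2 0 6 4 2

x:          0 1 2 3 4 5 6 7
5·x mod 8:  0 5 2 7 4 1 6 3

Multiplication by 6 maps different values of x to the same result, so the factor 6 cannot be cancelled; multiplication by 5 (gcd(5, 8) = 1) is a permutation of {0, …, 7}, so the factor 5 can be cancelled.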
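The two tables, and the link between cancellation and gcd(a, n) = 1, can be generated programmatically. This is a sketch; `times_table` and `cancellation_holds` are hypothetical names. Cancellation by a works exactly when x → a·x mod n is one-to-one.

```python
from math import gcd

def times_table(a, n):
    """Row of a*x mod n for x = 0 .. n-1."""
    return [a * x % n for x in range(n)]

def cancellation_holds(a, n):
    """True iff a*b ≡ a*c (mod n) always implies b ≡ c (mod n),
    i.e. x -> a*x mod n is one-to-one on {0, ..., n-1}."""
    return len(set(times_table(a, n))) == n

print(times_table(6, 8))
print(times_table(5, 8))
print({a: (cancellation_holds(a, 8), gcd(a, 8)) for a in range(1, 8)})
```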



Elements of the Galois Field GF(2^m)

Binary representation (used for storing and processing in computer systems):

$A = (a_{m-1}, a_{m-2}, \ldots, a_2, a_1, a_0)$, $a_i \in \{0, 1\}$

Polynomial representation (used for the definition of basic arithmetic operations):

$A(x) = \sum_{i=0}^{m-1} a_i x^i = a_{m-1}x^{m-1} + a_{m-2}x^{m-2} + \ldots + a_2x^2 + a_1x + a_0$

where $\cdot$ denotes multiplication and + denotes addition modulo 2 (XOR)


Addition and Multiplication in the Galois Field GF(2^m)

Inputs

$A = (a_{m-1}, a_{m-2}, \ldots, a_2, a_1, a_0)$

$B = (b_{m-1}, b_{m-2}, \ldots, b_2, b_1, b_0)$

$a_i, b_i \in \{0, 1\}$

Output

$C = (c_{m-1}, c_{m-2}, \ldots, c_2, c_1, c_0)$, $c_i \in \{0, 1\}$


Addition in the Galois Field GF(2^m)

$A \leftrightarrow A(x)$, $B \leftrightarrow B(x)$, $C \leftrightarrow C(x)$

$C(x) = A(x) + B(x) = (a_{m-1}+b_{m-1})x^{m-1} + (a_{m-2}+b_{m-2})x^{m-2} + \ldots + (a_2+b_2)x^2 + (a_1+b_1)x + (a_0+b_0) = c_{m-1}x^{m-1} + c_{m-2}x^{m-2} + \ldots + c_2x^2 + c_1x + c_0$

where + denotes addition modulo 2 (XOR):

$c_i = a_i + b_i = a_i \text{ XOR } b_i$

C = A XOR B
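A sketch of GF(2^m) addition, assuming elements are stored as Python integers whose bit i is the coefficient $a_i$. `gf2m_add` and `gf2m_add_coefficientwise` are hypothetical names. The second function spells out $c_i = (a_i + b_i) \bmod 2$ bit by bit and must agree with the single XOR.

```python
def gf2m_add(a, b):
    """Add two GF(2^m) elements stored as bit vectors: one XOR, no carries."""
    return a ^ b

def gf2m_add_coefficientwise(a, b, m):
    """Same sum, computed as c_i = (a_i + b_i) mod 2 for i = 0 .. m-1."""
    c = 0
    for i in range(m):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        c |= ((ai + bi) % 2) << i
    return c

print(bin(gf2m_add(0b1011, 0b0110)))   # (x^3 + x + 1) + (x^2 + x)
```

Since every element is its own additive inverse (A XOR A = 0), subtraction in GF(2^m) is the same operation as addition.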


Multiplication in the Galois Field GF(2^m)

$A \leftrightarrow A(x)$, $B \leftrightarrow B(x)$, $C \leftrightarrow C(x)$

$C(x) = A(x) \cdot B(x) \bmod P(x) = c_{m-1}x^{m-1} + c_{m-2}x^{m-2} + \ldots + c_2x^2 + c_1x + c_0$

P(x) – irreducible polynomial of degree m

$P(x) = p_m x^m + p_{m-1}x^{m-1} + \ldots + p_2x^2 + p_1x + p_0$
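A shift-and-add sketch of $A(x) \cdot B(x) \bmod P(x)$ in polynomial basis. `gf2m_mul` is a hypothetical name, and elements are integers below $2^m$ with bit i holding the coefficient of $x^i$. The irreducible polynomials used below are examples: $x^4 + x + 1$ for GF(2^4), and $x^8 + x^4 + x^3 + x + 1$ (0x11B), the polynomial AES uses for GF(2^8).

```python
def gf2m_mul(a, b, p, m):
    """Multiply A(x)*B(x) mod P(x) in GF(2^m) with shift-and-add.

    a, b: field elements as m-bit integers (bit i = coefficient of x^i)
    p:    irreducible P(x) of degree m, including the x^m term
    """
    result = 0
    for i in range(m):                  # scan B(x) from x^0 upward
        if (b >> i) & 1:
            result ^= a                 # add A(x) * x^i (already reduced mod P(x))
        a <<= 1                         # A(x) <- A(x) * x
        if (a >> m) & 1:                # degree reached m: subtract (XOR) P(x)
            a ^= p
    return result

print(bin(gf2m_mul(0b0010, 0b1000, 0b10011, 4)))   # x * x^3 = x^4 = x + 1
print(hex(gf2m_mul(0x57, 0x83, 0x11B, 8)))         # the {57} * {83} example from FIPS-197
```

Each step only shifts, tests one bit, and XORs, which is why the polynomial basis maps well onto simple hardware.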
