lecture 12: integer arithmetic and floating point cs 2011 fall 2014, dr. rozier

Lecture 12: Integer Arithmetic and Floating Point

CS 2011

Fall 2014, Dr. Rozier

FULL ADDER SOLUTIONS
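The full-adder solutions were worked on the slides as figures. For reference, a one-bit full adder can be modeled with bitwise operators. This assumes the usual equations sum = a ⊕ b ⊕ c_in and c_out = ab + c_in(a ⊕ b); the function name `full_adder` is just an illustrative choice.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out) for bits a, b, cin in {0, 1}."""
    s = a ^ b ^ cin                   # sum is 1 when an odd number of inputs are 1
    cout = (a & b) | (cin & (a ^ b))  # carry out when at least two inputs are 1
    return s, cout

# Print the full truth table: a b cin -> sum cout
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            print(a, b, cin, "->", *full_adder(a, b, cin))
```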

INTEGER ARITHMETIC

Putting Together Multiple Bits
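Chaining full adders so that each carry-out feeds the next carry-in gives a ripple-carry adder. Below is a minimal sketch, assuming 8-bit unsigned operands; the width `n` and the name `ripple_carry_add` are hypothetical. The loop mirrors how the carry ripples from bit 0 upward, which is why the delay grows linearly with the word size.

```python
def full_adder(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(x, y, n=8, cin=0):
    """Add two n-bit unsigned ints with a chain of n full adders, LSB first.
    Returns (n-bit sum, final carry out)."""
    result = 0
    carry = cin
    for i in range(n):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_adder(a, b, carry)  # this carry feeds bit i + 1
        result |= s << i
    return result, carry

print(ripple_carry_add(200, 100))  # (44, 1): 300 = 256 + 44
```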

Making it Faster

Carry Look Ahead Adder
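A carry-lookahead adder avoids waiting for the ripple. For each bit it forms a generate signal g_i = a_i b_i and a propagate signal p_i = a_i ⊕ b_i, then expands c_(i+1) = g_i + p_i c_i so that every carry depends only on the g's, the p's, and c_0. The sketch below (a hypothetical 4-bit `cla_add`) computes each carry from that flattened sum of products rather than from the previous carry. The Python loops run one after another, but in hardware the product terms are evaluated in parallel.

```python
def cla_add(x, y, n=4, c0=0):
    """n-bit carry-lookahead addition; returns (n-bit sum, carry out)."""
    g = [(x >> i) & (y >> i) & 1 for i in range(n)]    # generate: g_i = a_i AND b_i
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]  # propagate: p_i = a_i XOR b_i
    carries = [c0]
    for i in range(n):
        # Flattened c_(i+1) = g_i OR (p_i AND c_i): uses only g, p and c0,
        # never the previously computed carry.
        c = c0
        for k in range(i + 1):
            c &= p[k]                 # c0 propagated through bits 0..i
        for j in range(i + 1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]          # g_j propagated through bits j+1..i
            c |= term
        carries.append(c)
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i  # s_i = p_i XOR c_i
    return total, carries[n]

print(cla_add(9, 7))  # (0, 1): 9 + 7 = 16
```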

Making it Even Faster

Carry-Select Adder
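A carry-select adder splits the word into blocks and computes each block's sum twice, once assuming a carry-in of 0 and once assuming 1. When the real carry arrives, it only has to select one result through a multiplexer. This sketch assumes 8-bit words and 4-bit blocks; `add_block` is a hypothetical stand-in for a small ripple-carry block and simply uses Python's `+`.

```python
def add_block(a, b, width, cin):
    """Stand-in for one width-bit ripple-carry block (uses + for brevity)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width

def carry_select_add(x, y, n=8, k=4):
    """n-bit carry-select adder built from k-bit blocks (assumes k divides n)."""
    mask = (1 << k) - 1
    result, carry = 0, 0
    for lo in range(0, n, k):
        a, b = (x >> lo) & mask, (y >> lo) & mask
        # Hardware computes both candidate results at the same time...
        s0, c0 = add_block(a, b, k, 0)
        s1, c1 = add_block(a, b, k, 1)
        # ...so the incoming carry only drives a multiplexer.
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << lo
    return result, carry

print(carry_select_add(200, 100))  # (44, 1)
```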

Kogge-Stone Adder
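The Kogge-Stone adder is a parallel-prefix adder. It combines (generate, propagate) pairs over spans that double in length at each level, using (G, P) ∘ (G′, P′) = (G + P·G′, P·P′), so all carries are known after log₂ n levels. Below is a sketch for n-bit words (n a power of two, carry-in 0) that handles every bit position at once with shifts and masks; the name `kogge_stone_add` is hypothetical.

```python
def kogge_stone_add(x, y, n=8):
    """n-bit Kogge-Stone addition (n a power of two, carry-in 0); returns (sum, carry out)."""
    mask = (1 << n) - 1
    g = x & y            # per-bit generate
    p = x ^ y            # per-bit propagate
    G, P = g, p          # group signals: bit i starts as the span i..i
    d = 1
    while d < n:         # log2(n) levels; spans double in length each level
        # Combine the span ending at bit i with the span ending at bit i - d
        G = G | (P & (G << d))
        P = P & (P << d)
        d *= 2
    # Bit i of G now means "bits 0..i generate a carry", so the carry into bit i is G bit i-1
    carries = (G << 1) & mask
    return (p ^ carries) & mask, (G >> (n - 1)) & 1

print(kogge_stone_add(200, 100))  # (44, 1)
```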

How do we get subtraction?

X       B2T(X)  B2U(X)
0000      0       0
0001      1       1
0010      2       2
0011      3       3
0100      4       4
0101      5       5
0110      6       6
0111      7       7
1000     –8       8
1001     –7       9
1010     –6      10
1011     –5      11
1100     –4      12
1101     –3      13
1110     –2      14
1111     –1      15
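The two columns follow from the two encodings. B2U weights bit i by 2^i, while B2T gives the most significant bit the negative weight -2^(w-1). The short sketch below regenerates the 4-bit table; `b2u` and `b2t` are illustrative names, and bit strings are assumed to have at least two bits.

```python
W = 4  # word size used in the table

def b2u(bits):
    """Unsigned: bit i has weight 2^i."""
    return int(bits, 2)

def b2t(bits):
    """Two's complement: the top bit has weight -2^(w-1), the rest are unsigned."""
    w = len(bits)
    return -int(bits[0]) * 2 ** (w - 1) + int(bits[1:], 2)

for u in range(2 ** W):
    x = format(u, f"0{W}b")
    print(x, b2t(x), b2u(x))
```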


  1 0 0 1 0 1 1 1    x
+ 0 1 1 0 1 0 0 0    ~x
  1 1 1 1 1 1 1 1    –1

So x + ~x = –1, and therefore –x = ~x + 1.
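Because ~x + 1 = -x, subtraction needs no separate circuit: a - b = a + ~b + 1. The hardware inverts b and feeds a carry-in of 1 into the ordinary adder. Below is a sketch assuming 8-bit words, with Python integers masked to emulate wraparound; `negate`, `subtract`, and `to_signed` are hypothetical helpers.

```python
N = 8                    # assumed word size
MASK = (1 << N) - 1      # keeps results to N bits, like fixed-width hardware

def to_signed(u):
    """Interpret an N-bit pattern as two's complement (B2T)."""
    return u - (1 << N) if u & (1 << (N - 1)) else u

def negate(x):
    """-x = ~x + 1, truncated to N bits."""
    return (~x + 1) & MASK

def subtract(a, b):
    """a - b using the adder: a + ~b with a carry-in of 1."""
    return (a + (~b & MASK) + 1) & MASK

x = 0b10010111
print(format((x + (~x & MASK)) & MASK, "08b"))  # 11111111, i.e. -1
print(to_signed(x), to_signed(negate(x)))       # -105 105
print(to_signed(subtract(3, 5)))                # -2
```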

FLOATING POINT

Carnegie Mellon

Fractional binary numbers

• What is 1011.101₂?

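Figure: bits $b_i\,b_{i-1} \cdots b_2\,b_1\,b_0\,.\,b_{-1}\,b_{-2}\,b_{-3} \cdots b_{-j}$, with weights $2^i, 2^{i-1}, \dots, 4, 2, 1$ to the left of the binary point and $1/2, 1/4, 1/8, \dots, 2^{-j}$ to the right.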

Fractional Binary Numbers

• Representation
  – Bits to right of “binary point” represent fractional powers of 2
  – Represents rational number: $\sum_{k=-j}^{i} b_k \times 2^k$
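Applying the formula answers the earlier question about 1011.101₂. The integer bits give 8 + 2 + 1 = 11 and the fraction bits give 1/2 + 1/8 = 5/8. Below is a small sketch using exact rational arithmetic; `frac_bin_value` is an illustrative name.

```python
from fractions import Fraction

def frac_bin_value(s):
    """Exact value of a binary string with a binary point, e.g. '1011.101'."""
    int_part, _, frac_part = s.partition(".")
    value = Fraction(0)
    # Bits left of the point have weights 2^0, 2^1, 2^2, ... (reading right to left)
    for k, bit in enumerate(reversed(int_part)):
        value += int(bit) * Fraction(2) ** k
    # Bits right of the point have weights 2^-1, 2^-2, ...
    for j, bit in enumerate(frac_part, start=1):
        value += int(bit) * Fraction(1, 2 ** j)
    return value

print(frac_bin_value("1011.101"))  # 93/8, i.e. 11 5/8
```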

Fractional Binary Numbers: Examples

Value     Representation
5 3/4     101.11₂
2 7/8     010.111₂
1 7/16    001.0111₂

• Observations
  – Divide by 2 by shifting right
  – Multiply by 2 by shifting left
  – Numbers of form 0.111111…₂ are just below 1.0
    – $1/2 + 1/4 + 1/8 + \dots + 1/2^i + \dots \to 1.0$
    – Use notation 1.0 – ε
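The shifting observations can be seen with a fixed-point encoding. For illustration only, assume a value is stored as an integer scaled by 2^4, so 4 bits sit to the right of the binary point. Each right shift halves the value, walking from 5 3/4 to 2 7/8 to 1 7/16. `FRAC_BITS` and `show` are hypothetical names.

```python
FRAC_BITS = 4  # assumed fixed-point format: 4 bits right of the binary point

def show(raw):
    """Render a raw fixed-point integer as an 8-bit pattern with a binary point."""
    s = format(raw, "08b")
    return s[:-FRAC_BITS] + "." + s[-FRAC_BITS:]

x = 0b1011100          # 0101.1100_2 = 5 3/4
for _ in range(3):
    print(show(x), x / 2 ** FRAC_BITS)
    x >>= 1            # shifting right divides by 2
# 0101.1100 5.75
# 0010.1110 2.875
# 0001.0111 1.4375

# 0000.1111 is one unit (2^-4) below 1.0: the "1.0 - epsilon" pattern
print(show(0b1111), 1 - 0b1111 / 2 ** FRAC_BITS)  # 0000.1111 0.0625
```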


Representable Numbers

• Limitation
  – Can only exactly represent numbers of the form $x/2^k$
  – Other rational numbers have repeating bit representations

• Value     Representation
  – 1/3     0.0101010101[01]…₂
  – 1/5     0.001100110011[0011]…₂
  – 1/10    0.0001100110011[0011]…₂
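The repeating patterns come from the long-division view of binary fractions: doubling the remainder produces the next bit. The sketch below generates the first bits of 1/3, 1/5 and 1/10 exactly; `binary_fraction_digits` is a hypothetical helper. It then uses Python's `fractions.Fraction` to show that the double nearest 0.1 is really a fraction of the form x/2^k.

```python
from fractions import Fraction

def binary_fraction_digits(num, den, count):
    """First `count` bits after the binary point of num/den (0 <= num < den)."""
    bits = []
    r = Fraction(num, den)
    for _ in range(count):
        r *= 2
        bit = int(r >= 1)  # the integer part produced by doubling is the next bit
        bits.append(str(bit))
        r -= bit
    return "".join(bits)

for d in (3, 5, 10):
    print(f"1/{d} = 0.{binary_fraction_digits(1, d, 16)}...")

# A double keeps only a finite prefix, so 0.1 is stored as a nearby x/2^k
print(Fraction(0.1) == Fraction(1, 10))       # False
print(Fraction(0.1).denominator == 2 ** 55)   # True
```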

Floating Point Standard

• Defined by IEEE Std 754-1985
• Developed in response to divergence of representations
  – Portability issues for scientific code
• Now almost universally adopted
• Two representations
  – Single precision (32-bit)
  – Double precision (64-bit)

IEEE Floating-Point Format

• S: sign bit (0 non-negative, 1 negative)
• Normalize significand: 1.0 ≤ |significand| < 2.0
  – Always has a leading pre-binary-point 1 bit, so there is no need to represent it explicitly (hidden bit)
  – Significand is Fraction with the “1.” restored
• Exponent: excess representation: actual exponent + Bias
  – Ensures exponent is unsigned
  – Single: Bias = 127; Double: Bias = 1023

S | Exponent | Fraction
  – S: 1 bit
  – Exponent: single 8 bits, double 11 bits
  – Fraction: single 23 bits, double 52 bits

$x = (-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$
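The field layout can be checked by reinterpreting a 32-bit float's bits. This sketch uses the standard `struct` module to pack a value as big-endian single precision and unpack it as an unsigned integer, then applies the formula with Bias = 127. The helper names are hypothetical, and the reconstruction only holds for normalized numbers (Exponent field neither all 0s nor all 1s).

```python
import struct

BIAS = 127  # single precision

def decode_float32(x):
    """Split x (rounded to single precision) into its S, Exponent, Fraction fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent
    fraction = bits & 0x7FFFFF       # 23 fraction bits (hidden 1 not stored)
    return s, exponent, fraction

def value_normalized(s, exponent, fraction):
    """(-1)^S * (1 + Fraction) * 2^(Exponent - Bias), for normalized numbers only."""
    return (-1) ** s * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - BIAS)

fields = decode_float32(-0.75)      # -0.75 = -1.1_2 x 2^-1
print(fields)                       # (1, 126, 4194304)
print(value_normalized(*fields))    # -0.75
```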

Floating-Point Addition

• Consider a 4-digit decimal example
  – $9.999 \times 10^{1} + 1.610 \times 10^{-1}$
• 1. Align decimal points
  – Shift number with smaller exponent
  – $9.999 \times 10^{1} + 0.016 \times 10^{1}$
• 2. Add significands
  – $9.999 \times 10^{1} + 0.016 \times 10^{1} = 10.015 \times 10^{1}$
• 3. Normalize result & check for over/underflow
  – $1.0015 \times 10^{2}$
• 4. Round and renormalize if necessary
  – $1.002 \times 10^{2}$
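Python's `decimal` module can stand in for a 4-digit decimal floating-point unit if the context precision is set to 4. One difference from the step-by-step version is that `Context.add` rounds the exact sum once instead of dropping digits during alignment. It still lands on the same 1.002 × 10^2 here.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

four_digits = Context(prec=4, rounding=ROUND_HALF_EVEN)  # 4 significant decimal digits

a = Decimal("9.999E1")    # 9.999 x 10^1
b = Decimal("1.610E-1")   # 1.610 x 10^-1
exact = a + b             # default context keeps 28 digits, so this sum is exact
rounded = four_digits.add(a, b)  # exact sum rounded to 4 digits
print(exact, rounded)     # 100.1510 100.2
```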

Floating-Point Addition

• Now consider a 4-digit binary example
  – $1.000_2 \times 2^{-1} + (-1.110_2 \times 2^{-2})$, i.e. $0.5 + (-0.4375)$
• 1. Align binary points
  – Shift number with smaller exponent
  – $1.000_2 \times 2^{-1} + (-0.111_2 \times 2^{-1})$
• 2. Add significands
  – $1.000_2 \times 2^{-1} + (-0.111_2 \times 2^{-1}) = 0.001_2 \times 2^{-1}$
• 3. Normalize result & check for over/underflow
  – $1.000_2 \times 2^{-4}$, with no over/underflow
• 4. Round and renormalize if necessary
  – $1.000_2 \times 2^{-4}$ (no change) $= 0.0625$
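The four steps can be sketched on a toy binary format with a 4-bit significand (hidden 1 included) stored as an integer, so 1.000₂ is `0b1000`. This is an illustrative model, not IEEE 754. The names `fp_add` and `value` are hypothetical, there is no exponent bias or range check, a few extra guard bits stand in for IEEE's guard/round/sticky bits, and ties round up rather than to even.

```python
from fractions import Fraction

P = 4  # significand bits, including the hidden leading 1 (toy format)
G = 3  # extra guard bits kept during the operation (simplification: no sticky bit)

def value(sign, sig, exp):
    """Exact value of (-1)^sign * (sig / 2^(P-1)) * 2^exp."""
    v = Fraction(sig, 2 ** (P - 1)) * Fraction(2) ** exp
    return -v if sign else v

def fp_add(a, b):
    (s1, m1, e1), (s2, m2, e2) = a, b
    # Step 1: align binary points by shifting the smaller-exponent operand right
    if e1 < e2:
        (s1, m1, e1), (s2, m2, e2) = (s2, m2, e2), (s1, m1, e1)
    m1 <<= G
    m2 = (m2 << G) >> (e1 - e2)
    # Step 2: add the signed significands
    total = (-m1 if s1 else m1) + (-m2 if s2 else m2)
    if total == 0:
        return (0, 0, 0)
    sign, mag, exp = (1 if total < 0 else 0), abs(total), e1
    # Step 3: normalize so that 2^(P-1+G) <= mag < 2^(P+G)
    while mag >= 2 ** (P + G):
        mag >>= 1
        exp += 1
    while mag < 2 ** (P - 1 + G):
        mag <<= 1
        exp -= 1
    # Step 4: round to P bits (nearest, ties up), renormalize if rounding carried out
    sig = (mag + (1 << (G - 1))) >> G
    if sig == 2 ** P:
        sig >>= 1
        exp += 1
    return (sign, sig, exp)

a = (0, 0b1000, -1)   # 1.000_2 x 2^-1 = 0.5
b = (1, 0b1110, -2)   # -1.110_2 x 2^-2 = -0.4375
r = fp_add(a, b)
print(r, value(*r))   # (0, 8, -4) 1/16
```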

FP Adder Hardware

• Much more complex than integer adder
• Doing it in one clock cycle would take too long
  – Much longer than integer operations
  – Slower clock would penalize all instructions
• FP adder usually takes several cycles
  – Can be pipelined

FP Adder Hardware

Figure: FP adder datapath, with its hardware blocks labeled Step 1 through Step 4 of the addition algorithm.

FP Arithmetic Hardware

• FP multiplier is of similar complexity to FP adder
  – But uses a multiplier for significands instead of an adder
• FP arithmetic hardware usually does
  – Addition, subtraction, multiplication, division, reciprocal, square root
  – FP ↔ integer conversion
• Operations usually take several cycles
  – Can be pipelined
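An FP multiplier follows the same outline, with a multiplier in place of the significand adder: XOR the signs, add the exponents, multiply the significands, then normalize and round. The sketch below uses the same kind of toy format (4-bit significand with the hidden 1, unbiased exponent, ties rounding up); `fp_mul` and `value` are hypothetical names. In a biased format, adding two biased exponents counts the Bias twice, so one Bias would be subtracted.

```python
from fractions import Fraction

P = 4  # toy significand width, including the hidden 1

def value(sign, sig, exp):
    """Exact value of (-1)^sign * (sig / 2^(P-1)) * 2^exp."""
    v = Fraction(sig, 2 ** (P - 1)) * Fraction(2) ** exp
    return -v if sign else v

def fp_mul(a, b):
    (s1, m1, e1), (s2, m2, e2) = a, b
    sign = s1 ^ s2   # sign of the product
    exp = e1 + e2    # exponents add (unbiased in this toy format)
    prod = m1 * m2   # integer multiply: 2^(2P-2) <= prod < 2^(2P)
    shift = P - 1
    if prod >= 2 ** (2 * P - 1):  # significand product in [2, 4): normalize
        shift += 1
        exp += 1
    sig = (prod + (1 << (shift - 1))) >> shift  # round to nearest, ties up
    if sig == 2 ** P:  # rounding carried out: renormalize
        sig >>= 1
        exp += 1
    return (sign, sig, exp)

r = fp_mul((0, 0b1000, -1), (1, 0b1110, -2))  # 0.5 x -0.4375
print(r, value(*r))  # (1, 14, -3) -7/32
```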

Floating Point

• Floating-point arithmetic is handled by a floating-point unit (FPU).

Pentium FDIV Bug

• Intel’s Pentium (P5)
  – Professor Thomas Nicely noticed inconsistencies in calculations when adding Pentiums to his cluster
  – Floating-point division operations didn’t quite come out right: off by 61 parts per million

Pentium FDIV Bug

• Intel acknowledged the flaw but claimed it wasn’t serious: it wouldn’t affect most users.

• Byte magazine estimated that only 1 in 9 billion floating-point operations would suffer the error.

Pentium FDIV Bug

• Total cost to Intel?

$450 million

WRAP UP

For next time

• Read Chapter 4.1-4.4

For next time

• Read Chapter 3 • Sections 3.1 – 3.5
