floating point number

Floating Point Numbers

In the decimal system, a decimal point (radix point) separates the whole-number part from the fractional part.

Examples:

37.25 (whole = 37, fraction = .25)

123.567

10.12345678

For example, 37.25 can be analyzed as:

10^1    10^0    10^-1    10^-2
Tens    Units   Tenths   Hundredths
3       7       2        5

37.25 = 3 x 10 + 7 x 1 + 2 x 1/10 + 5 x 1/100

Binary Equivalent

The binary equivalent of a floating point number can be computed by converting the whole part and the fractional part separately.

Whole part: subtraction or division method
Fractional part: subtraction or multiplication method

In the binary representation of a floating point number, the column values are as follows:

… 2^6  2^5  2^4  2^3  2^2  2^1  2^0  .  2^-1  2^-2  2^-3  2^-4 …
… 64   32   16   8    4    2    1    .  1/2   1/4   1/8   1/16 …
… 64   32   16   8    4    2    1    .  .5    .25   .125  .0625 …
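The column values above can be checked mechanically. The following is a minimal sketch, assuming digits 0-9 and a base of at most 10; the name `positional_value` is chosen here for illustration and does not come from the slides. It uses exact fractions so that no rounding enters the sum.

```python
from fractions import Fraction


def positional_value(digits, base):
    """Sum each digit times its column value (hypothetical helper)."""
    whole, _, fraction = digits.partition(".")
    total = Fraction(0)
    # Columns left of the radix point: base^0, base^1, base^2, ...
    for position, digit in enumerate(reversed(whole)):
        total += int(digit) * Fraction(base) ** position
    # Columns right of the radix point: base^-1, base^-2, ...
    for position, digit in enumerate(fraction, start=1):
        total += int(digit) * Fraction(base) ** -position
    return total


print(float(positional_value("37.25", 10)))     # 37.25
print(float(positional_value("100101.01", 2)))  # 37.25
```

Both calls give the same value because the two digit strings name the same number in different bases.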

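Finding the Binary Equivalent of the Fraction Part

Converting .25 using the multiplication method.

Step 1: Multiply the fraction by 2 repeatedly until the fraction becomes 0. After each multiplication, note the whole part (0 or 1) and continue with the fractional part only.

.25 x 2 = 0.5   (whole part 0)
.5  x 2 = 1.0   (whole part 1, fraction is now 0)

Step 2: Collect the whole parts in the order they were produced and place them after the radix point.

64  32  16  8  4  2  1  .  .5  .25  .125  .0625
                        .  0   1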
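The multiplication method can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original slides: the function name and the `max_bits` cut-off are assumptions, the latter added because many decimal fractions (0.1, for example) never reach 0 in binary.

```python
def fraction_to_binary(fraction, max_bits=23):
    """Multiplication method for a fraction in [0, 1) (hypothetical helper)."""
    bits = []
    while fraction != 0 and len(bits) < max_bits:
        fraction *= 2
        whole = int(fraction)   # whole part of the product: 0 or 1
        bits.append(str(whole))
        fraction -= whole       # continue with the fractional part only
    return "".join(bits)


print(fraction_to_binary(0.25))     # 01
print(fraction_to_binary(0.15625))  # 00101
```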

Converting .25 using the subtraction method.

Step 1: Write the positional powers of two and the column values for the fractional part.

.  2^-1  2^-2  2^-3  2^-4   2^-5
.  1/2   1/4   1/8   1/16   1/32
.  .5    .25   .125  .0625  .03125

Step 2: Subtract the column values from left to right. Place a 0 in a column if its value cannot be subtracted from what remains, or a 1 if it can, until the fraction becomes .0.

  .25     (.5 cannot be subtracted: bit 0)
- .25     (.25 can be subtracted: bit 1)
  .0

Column values:  .  .5  .25  .125  .0625
Bits:           .  0   1
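The subtraction method translates directly into a loop over the column values. The sketch below is illustrative only; the function name and the `columns` limit are assumptions introduced here.

```python
def fraction_to_binary_by_subtraction(fraction, columns=23):
    """Subtraction method for a fraction in [0, 1) (hypothetical helper)."""
    bits = []
    column_value = 0.5                 # first column right of the radix point
    for _ in range(columns):
        if fraction == 0:
            break                      # nothing left to subtract
        if column_value <= fraction:
            bits.append("1")           # the column value can be subtracted
            fraction -= column_value
        else:
            bits.append("0")           # the column value is too large
        column_value /= 2              # move one column to the right
    return "".join(bits)


print(fraction_to_binary_by_subtraction(0.25))     # 01
print(fraction_to_binary_by_subtraction(0.15625))  # 00101
```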

Binary Equivalent of a Floating Point Number

Given 37.25, convert 37 and .25 using the subtraction method.

2^6  2^5  2^4  2^3  2^2  2^1  2^0  .  2^-1  2^-2  2^-3  2^-4
64   32   16   8    4    2    1    .  .5    .25   .125  .0625
     1    0    0    1    0    1    .  0     1

Whole part:      Fraction part:
   37               .25
 - 32             - .25
    5               .0
 -  4
    1
 -  1
    0

37.25_10 = 100101.01_2
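For comparison, here is a sketch that converts both parts of a non-negative number, using the division method for the whole part and the multiplication method for the fraction. The function names and the `max_fraction_bits` limit are assumptions made for this example.

```python
def whole_to_binary(whole):
    """Division method: collect remainders of repeated division by 2."""
    if whole == 0:
        return "0"
    bits = []
    while whole > 0:
        bits.append(str(whole % 2))  # the remainder is the next bit, right to left
        whole //= 2
    return "".join(reversed(bits))


def to_binary_string(value, max_fraction_bits=23):
    """Binary digit string of a non-negative number (hypothetical helper)."""
    whole = int(value)
    fraction = value - whole
    fraction_bits = []
    while fraction != 0 and len(fraction_bits) < max_fraction_bits:
        fraction *= 2
        bit = int(fraction)
        fraction_bits.append(str(bit))
        fraction -= bit
    return whole_to_binary(whole) + "." + ("".join(fraction_bits) or "0")


print(to_binary_string(37.25))  # 100101.01
```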

So What Is the Problem?

Given the following binary representations:

37.25_10 = 100101.01_2

7.625_10 = 111.101_2

0.3125_10 = 0.0101_2

How can we represent the whole and fraction parts of the binary representation in 4 bytes?

Solution Is Normalization

Every binary number, except the one corresponding to the number zero, can be normalized by choosing the exponent so that the radix point falls to the right of the leftmost 1 bit.

37.25_10 = 100101.01_2 = 1.0010101 x 2^5

7.625_10 = 111.101_2 = 1.11101 x 2^2

0.3125_10 = 0.0101_2 = 1.01 x 2^-2
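Normalization is just a matter of locating the leftmost 1 and counting how far the radix point has to move. A minimal sketch, working on binary digit strings; the `normalize` helper is hypothetical and not from the slides:

```python
def normalize(binary):
    """Return (normalized digits, exponent) for a binary digit string."""
    whole, _, fraction = binary.partition(".")
    digits = whole + fraction
    first_one = digits.find("1")
    if first_one == -1:
        raise ValueError("zero cannot be normalized")
    # Places the radix point moves: positive means it moved left.
    exponent = len(whole) - 1 - first_one
    after_point = digits[first_one + 1:].rstrip("0")
    return "1." + (after_point or "0"), exponent


print(normalize("100101.01"))  # ('1.0010101', 5)
print(normalize("111.101"))    # ('1.11101', 2)
print(normalize("0.0101"))     # ('1.01', -2)
```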

So What Happened?

After normalizing, the numbers now have different mantissas and exponents.

37.25_10 = 100101.01_2 = 1.0010101 x 2^5

7.625_10 = 111.101_2 = 1.11101 x 2^2

0.3125_10 = 0.0101_2 = 1.01 x 2^-2

IEEE Floating Point Representation

Floating point numbers can be represented by binary codes by dividing them into three parts: the sign, the exponent, and the mantissa.

Bits:   1      2 to 9      10 to 32
Field:  Sign   Exponent    Mantissa

The first, or leftmost, field of our floating point representation is the sign bit: 0 for a positive number, 1 for a negative number.

The second field of the floating point number is the exponent. Since we must be able to represent both positive and negative exponents, we use a convention in which a fixed value, known as the bias (127), is added to the exponent to determine its stored representation. An exponent of 5 is therefore stored as 127 + 5, or 132; an exponent of -5 is stored as 127 + (-5), or 122.

The biased exponent, the value actually stored, ranges from 0 through 255. This is the range of values that can be represented by 8-bit unsigned binary numbers.
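The bias arithmetic is small enough to show in full. The helper names below are assumptions made for this sketch; only the bias of 127 and the 8-bit field width come from the text.

```python
BIAS = 127  # bias for the 8-bit exponent field, as stated above


def bias_exponent(exponent):
    """Return the stored 8-bit pattern for an exponent (hypothetical helper)."""
    biased = exponent + BIAS
    if not 0 <= biased <= 255:
        raise ValueError("biased exponent does not fit in 8 bits")
    return format(biased, "08b")


def unbias_exponent(bits):
    """Recover the exponent from its stored 8-bit pattern."""
    return int(bits, 2) - BIAS


print(bias_exponent(5))             # 10000100  (132)
print(bias_exponent(-5))            # 01111010  (122)
print(unbias_exponent("01111101"))  # -2
```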

The mantissa is the set of 0's and 1's to the right of the radix point of the normalized binary number (a number is normalized when the digit to the left of the radix point is 1). Example: in 1.00101 x 2^3, the mantissa is 00101.

The mantissa is stored in a 23-bit field. The leading 1 to the left of the radix point is not stored.

Converting decimal floating point values to stored IEEE standard values.

Example: Find the IEEE FP representation of 40.15625.

Step 1. Compute the binary equivalent of the whole part and the fractional part (convert 40 and .15625 to their binary equivalents).

Whole part:        Fraction part:
   40                 .15625
 - 32               - .12500
    8                 .03125
 -  8               - .03125
    0                 .0

Result: 101000     Result: .00101

So: 40.15625_10 = 101000.00101_2


Step 2. Normalize the number by moving the radix point to the right of the leftmost one.

101000.00101 = 1.0100000101 x 2^5


Step 3. Convert the exponent to a biased exponent.

127 + 5 = 132

==> 132_10 = 10000100_2


Step 4. Store the results from above.

Sign   Exponent (from step 3)   Mantissa (from step 2)
0      10000100                 01000001010...0
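The four steps can be strung together as follows. This is a sketch under stated assumptions rather than a full IEEE 754 encoder: the function name is made up for this example, it handles only non-zero values whose exponent fits the normal range, and it truncates the mantissa to 23 bits instead of rounding.

```python
def encode_single(value):
    """Return (sign, exponent, mantissa) bit strings (hypothetical helper)."""
    if value == 0:
        raise ValueError("zero has no normalized form")
    sign = "1" if value < 0 else "0"
    magnitude = abs(value)

    # Steps 1 and 2: scale until the value has the form 1.xxxx, counting shifts.
    exponent = 0
    while magnitude >= 2:
        magnitude /= 2
        exponent += 1
    while magnitude < 1:
        magnitude *= 2
        exponent -= 1

    # Step 3: add the bias of 127 and write the result in 8 bits.
    exponent_bits = format(exponent + 127, "08b")

    # Step 4: drop the leading 1 and expand the rest with the
    # multiplication method, always producing 23 bits (right padded with 0).
    fraction = magnitude - 1
    mantissa_bits = []
    for _ in range(23):
        fraction *= 2
        bit = int(fraction)
        mantissa_bits.append(str(bit))
        fraction -= bit
    return sign, exponent_bits, "".join(mantissa_bits)


sign, exponent_bits, mantissa_bits = encode_single(40.15625)
print(sign, exponent_bits, mantissa_bits)
assert mantissa_bits == "0100000101".ljust(23, "0")
```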

Convert 40.15625 to IEEE 32-bit format

Step 1a. Find the binary value of the whole number.

40 - 32 - 8 = 0

32  16  8  4  2  1
1   0   1  0  0  0

Step 1b. Find the binary value of the fraction.

.15625 - .125 - .03125 = 0

.5  .25  .125  .0625  .03125
0   0    1     0      1

Result: 101000.00101

Step 2a. Move the radix point to the right of the leftmost 1. The number of places moved (5) is the exponent.

Step 2b. Normalize by writing the shifted number multiplied by 2 raised to that exponent.

101000.00101 = 1.0100000101 x 2^5

Step 3a. Convert the exponent to a biased exponent by adding the bias of 127 to the exponent.

127 + 5 = 132_10

Step 3b. Convert the biased exponent to its binary value.

132 - 128 - 4 = 0

128  64  32  16  8  4  2  1
1    0   0   0   0  1  0  0

Step 4a. Piece together the values.

Sign   Exponent   Normalized value
0      10000100   1.0100000101

Step 4b. Move the digits to the mantissa field without the leading 1.

Sign   Exponent   Mantissa
0      10000100   0100000101

Step 4c. Left pad the exponent to 8 bits if needed and right pad the mantissa with zeros to 23 bits to get the IEEE binary representation.

Sign   Exponent   Mantissa
0      10000100   01000001010…0


Ex: Find the IEEE FP representation of -24.75.

Step 1. Compute the binary equivalent of the whole part and the fractional part.

Whole part:        Fraction part:
   24                 .75
 - 16               - .50
    8                 .25
 -  8               - .25
    0                 .0

Result: 11000      Result: .11

So: -24.75_10 = -11000.11_2


Step 2. Normalize the number by moving the radix point to the right of the leftmost one.

-11000.11 = -1.100011 x 2^4


Step 3. Convert the exponent to a biased exponent.

127 + 4 = 131

==> 131_10 = 10000011_2

Step 4. Store the results from above.

Sign   Exponent   Mantissa
1      10000011   1000110...0
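Hand-worked results like these can be cross-checked against the machine's own 32-bit encoding using Python's standard `struct` module. The helper name `float_to_fields` is an assumption for this sketch; `struct.pack` with the `">f"` format produces the big-endian single-precision bytes.

```python
import struct


def float_to_fields(value):
    """Split the 32-bit encoding of value into (sign, exponent, mantissa)."""
    raw = struct.pack(">f", value)                       # 4 bytes, big-endian
    bits = format(int.from_bytes(raw, "big"), "032b")    # 32-character bit string
    return bits[0], bits[1:9], bits[9:]


sign, exponent, mantissa = float_to_fields(-24.75)
print(sign, exponent, mantissa)

# Matches the hand-worked answer: 1 10000011 100011 followed by zeros.
assert (sign, exponent) == ("1", "10000011")
assert mantissa == "100011".ljust(23, "0")
```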

Converting from IEEE format to decimal floating point values

Do the steps in reverse order. In reversing the normalization step, move the radix point the number of digits equal to the exponent: if the exponent is positive, move it to the right; if it is negative, move it to the left.


Ex: Convert the following 32-bit binary number to its decimal floating point equivalent.

Sign   Exponent   Mantissa
a. 1   01111101   010..0


Step 1: Extract the exponent (unbias the exponent)

biased exponent = 01111101_2 = 125_10

exponent: 125 - 127 = -2

Step 2: Write the normalized number in the form 1.(mantissa) x 2^(exponent), with the sign in front

-1.01 x 2^-2

Step 3: Write the binary number (denormalize the value from step 2)

-0.0101_2

Step 4: Convert the binary number to its decimal equivalent (add the column values)

-0.0101_2 = -(0.25 + 0.0625) = -0.3125_10
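The reverse conversion can be sketched the same way. The function below is illustrative: its name is an assumption, and it covers only normalized numbers (exponent field neither all 0's nor all 1's), which is the case the steps above describe.

```python
def decode_single(sign, exponent, mantissa):
    """Decimal value of a normalized IEEE single given as bit strings."""
    # Step 1: unbias the exponent.
    exp = int(exponent, 2) - 127
    # Step 2: rebuild 1.mantissa by adding the column values .5, .25, ...
    significand = 1.0
    column_value = 0.5
    for bit in mantissa:
        if bit == "1":
            significand += column_value
        column_value /= 2
    # Steps 3 and 4: apply the exponent, then the sign.
    value = significand * 2.0 ** exp
    return -value if sign == "1" else value


print(decode_single("1", "01111101", "01"))  # -0.3125
```

Trailing zeros in the mantissa contribute nothing to the sum, so the short string "01" stands in for the full 23-bit field.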


Ex: Convert the following 32-bit binary number to its decimal floating point equivalent.

Sign   Exponent   Mantissa
b. 0   10000011   1101010..0


Step 1: Extract the exponent (unbias the exponent)

biased exponent = 10000011_2 = 131_10

exponent: 131 - 127 = 4

Step 2: Write the normalized number in the form 1.(mantissa) x 2^(exponent)

1.110101 x 2^4

Step 3: Write the binary number (denormalize the value from step 2)

11101.01_2

Step 4: Convert the binary number to its decimal equivalent (add the column values)

11101.01_2 = 16 + 8 + 4 + 1 + 0.25 = 29.25_10
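As a cross-check on decodings done by hand, the three fields can be packed into 4 bytes and interpreted by Python's standard `struct` module. The helper name `fields_to_float` is an assumption made for this sketch.

```python
import struct


def fields_to_float(sign, exponent, mantissa):
    """Interpret sign, exponent, and mantissa bit strings as a 32-bit float."""
    bits = sign + exponent + mantissa.ljust(23, "0")  # right pad the mantissa
    raw = int(bits, 2).to_bytes(4, "big")             # 32 bits as 4 bytes
    (value,) = struct.unpack(">f", raw)
    return value


print(fields_to_float("0", "10000011", "110101"))  # 29.25
print(fields_to_float("1", "01111101", "01"))      # -0.3125
```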

Check Your Work

Convert 0 10000100 01000001010…0 back to decimal.

Sign   Exponent   Mantissa
0      10000100   01000001010…0

Step 1a. Determine the decimal value of the binary exponent.

128  64  32  16  8  4  2  1
1    0   0   0   0  1  0  0

128 + 4 = 132

Step 1b. Unbias the number by subtracting 127 from the decimal value to determine the exponent.

132 - 127 = 5

Step 2. Set up the format 1.(mantissa) x 2^(exponent).

1.0100000101 x 2^5

Step 3a. Denormalize: move the radix point 5 places to the right, because the exponent is +5.

101000.00101_2

Step 3b. Convert the binary number to its decimal equivalent.

Step 4a. Find the whole-number part.

32  16  8  4  2  1
1   0   1  0  0  0

32 + 8 = 40

Step 4b. Find the fractional part.

.5  .25  .125  .0625  .03125
0   0    1     0      1

.125 + .03125 = .15625

Step 4c. Add the values together. Make sure to include the sign if the value is negative.

40 + .15625 = 40.15625
