floating point number
Floating Point Numbers
In the decimal system, a decimal point (radix point) separates the whole part of a number from its fractional part.
Examples:
37.25 (whole part = 37, fractional part = .25)
123.567
10.12345678
For example, 37.25 can be analyzed as:
10^1    10^0    10^-1    10^-2
Tens    Units   Tenths   Hundredths
3       7       2        5
37.25 = 3 x 10 + 7 x 1 + 2 x 1/10 + 5 x 1/100
Binary Equivalent
The binary equivalent of a floating point number can be found by converting the whole part and the fractional part separately.
Whole part: subtraction or division method
Fractional part: subtraction or multiplication method
In the binary representation of a floating point number, the column values are as follows:
… 2^6 2^5 2^4 2^3 2^2 2^1 2^0 . 2^-1 2^-2 2^-3 2^-4 …
… 64  32  16  8   4   2   1   . 1/2  1/4  1/8  1/16 …
… 64  32  16  8   4   2   1   . .5   .25  .125 .0625 …
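Finding Binary Equivalent of fraction part
Converting .25 using the multiplication method.
Step 1: Multiply the fraction by 2, setting aside the whole part of each product, until the fraction becomes 0.
.25 x 2 = 0.5 (whole part 0, fraction .5)
.5 x 2 = 1.0 (whole part 1, fraction .0)
Step 2: Collect the whole parts in the order they were produced and place them after the radix point.
64 32 16 8 4 2 1 . .5 .25 .125 .0625
                 . 0  1
So .25_10 = .01_2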
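The multiplication method can be sketched in a few lines of Python. This is only an illustration: the function name fraction_to_binary and the max_bits limit are assumptions of this sketch, not part of the lecture. The limit is there because some fractions (0.1, for example) never reach 0.

```python
from fractions import Fraction

def fraction_to_binary(text, max_bits=32):
    """Return the binary digits after the radix point for a fraction 0 <= x < 1.

    `text` is a decimal string such as "0.25". Exact rational arithmetic
    is used so that each doubling step is free of rounding error.
    """
    frac = Fraction(text)
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2              # Step 1: multiply the fraction by 2
        whole = int(frac)      # the whole part of the product is the next bit
        bits.append(str(whole))
        frac -= whole          # keep only the fraction for the next round
    return "".join(bits)       # Step 2: collected whole parts, in order

print(fraction_to_binary("0.25"))     # 01
print(fraction_to_binary("0.15625"))  # 00101
```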
Converting .25 using the subtraction method.
Step 1: Write the positional powers of two and the column values for the fractional part.
. 2^-1 2^-2 2^-3 2^-4  2^-5
. 1/2  1/4  1/8  1/16  1/32
. .5   .25  .125 .0625 .03125
Step 2: Working from left to right, try to subtract each column value from the fraction. Place a 1 if the value can be subtracted or a 0 if it cannot, and continue until the fraction becomes .0.
.5 cannot be subtracted from .25: place a 0
.25 - .25 = .0: place a 1
2 1 . .5 .25 .125 .0625
    . 0  1
So .25_10 = .01_2
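As a rough sketch, the subtraction method looks like this in Python. The function name and the columns parameter (how many column values to try) are assumptions made for this illustration.

```python
from fractions import Fraction

def fraction_to_binary_by_subtraction(text, columns=5):
    """Subtract the column values .5, .25, .125, ... from left to right."""
    remaining = Fraction(text)
    bits = []
    for k in range(1, columns + 1):
        column = Fraction(1, 2 ** k)   # column value 2^-k
        if column <= remaining:
            bits.append("1")           # the value can be subtracted
            remaining -= column
        else:
            bits.append("0")           # the value cannot be subtracted
        if remaining == 0:
            break                      # stop once the fraction is .0
    return "".join(bits)

print(fraction_to_binary_by_subtraction("0.25"))     # 01
print(fraction_to_binary_by_subtraction("0.15625"))  # 00101
```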
Binary Equivalent of FP number
Given 37.25, convert 37 and .25 using the subtraction method.
2^5 2^4 2^3 2^2 2^1 2^0 . 2^-1 2^-2
32  16  8   4   2   1   . .5   .25
1   0   0   1   0   1   . 0    1
Whole part: 37 - 32 = 5, 5 - 4 = 1, 1 - 1 = 0
Fractional part: .25 - .25 = .0
37.25_10 = 100101.01_2
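The whole part can also be converted with the division method mentioned earlier. The sketch below combines it with the multiplication method for the fraction. The names whole_to_binary and to_binary are hypothetical, and the sketch assumes a non-negative decimal string as input.

```python
from fractions import Fraction

def whole_to_binary(n):
    """Division method: divide by 2 repeatedly; the remainders, read in
    reverse order, are the binary digits."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        n, remainder = divmod(n, 2)
        bits.append(str(remainder))
    return "".join(reversed(bits))

def to_binary(text, max_fraction_bits=23):
    """Convert a non-negative decimal string such as "37.25" to binary."""
    value = Fraction(text)
    whole = int(value)
    frac = value - whole
    frac_bits = []
    while frac != 0 and len(frac_bits) < max_fraction_bits:
        frac *= 2                  # multiplication method for the fraction
        bit = int(frac)
        frac_bits.append(str(bit))
        frac -= bit
    return whole_to_binary(whole) + "." + "".join(frac_bits)

print(to_binary("37.25"))   # 100101.01
print(to_binary("7.625"))   # 111.101
print(to_binary("0.3125"))  # 0.0101
```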
So what is the Problem?
Given the following binary representations:
37.25_10 = 100101.01_2
7.625_10 = 111.101_2
0.3125_10 = 0.0101_2
How can we represent both the whole part and the fractional part of the binary representation in 4 bytes?
Solution is Normalization
Every binary number, except the one corresponding to the number zero, can be normalized by choosing the exponent so that the radix point falls to the right of the leftmost 1 bit.
37.25_10 = 100101.01_2 = 1.0010101 x 2^5
7.625_10 = 111.101_2 = 1.11101 x 2^2
0.3125_10 = 0.0101_2 = 1.01 x 2^-2
So what Happened?
After normalizing, the numbers now have different mantissas and exponents.
37.25_10 = 100101.01_2 = 1.0010101 x 2^5
7.625_10 = 111.101_2 = 1.11101 x 2^2
0.3125_10 = 0.0101_2 = 1.01 x 2^-2
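Normalization amounts to counting how far the radix point moves. The following sketch works directly on a positive binary string. The function name normalize and its return format are assumptions chosen for illustration.

```python
def normalize(binary):
    """Normalize a positive binary string such as "100101.01".

    Returns (normalized, exponent) so that the value equals
    normalized x 2^exponent, with exactly one 1 left of the radix point.
    """
    whole, _, frac = binary.partition(".")
    digits = whole + frac
    first_one = digits.find("1")
    if first_one == -1:
        raise ValueError("zero cannot be normalized")
    # Number of places the radix point moves left (positive) or right (negative).
    exponent = len(whole) - 1 - first_one
    after_point = digits[first_one + 1:].rstrip("0")
    return "1." + (after_point or "0"), exponent

print(normalize("100101.01"))  # ('1.0010101', 5)
print(normalize("111.101"))    # ('1.11101', 2)
print(normalize("0.0101"))     # ('1.01', -2)
```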
IEEE Floating Point Representation
Floating point numbers can be represented by binary codes by dividing them into three parts: the sign, the exponent, and the mantissa.
[Figure: 32-bit layout. Bit 1: sign; bits 2-9: exponent; bits 10-32: mantissa.]
The first, or leftmost, field of our floating point representation is the sign bit: 0 for a positive number, 1 for a negative number.
The second field of the floating point number is the exponent. Since we must be able to represent both positive and negative exponents, we use a convention known as a bias: the value 127 is added to the exponent before it is stored. An exponent of 5 is therefore stored as 127 + 5, or 132; an exponent of -5 is stored as 127 + (-5), or 122.
The biased exponent, the value actually stored, ranges from 0 through 255. This is the range of values that can be represented by 8-bit unsigned binary numbers.
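The bias arithmetic is simple enough to check in code. In this sketch the helper names bias_exponent and unbias_exponent are made up for the example; it only enforces the 0 through 255 range described above.

```python
BIAS = 127  # bias used for the 8-bit exponent field

def bias_exponent(exponent):
    """Return the stored 8-bit field for a true exponent."""
    biased = exponent + BIAS
    if not 0 <= biased <= 255:
        raise ValueError("exponent does not fit in 8 bits")
    return format(biased, "08b")

def unbias_exponent(bits):
    """Return the true exponent for a stored 8-bit field."""
    return int(bits, 2) - BIAS

print(bias_exponent(5))             # 10000100  (132)
print(bias_exponent(-5))            # 01111010  (122)
print(unbias_exponent("01111101"))  # -2
```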
The mantissa is the set of 0's and 1's to the right of the radix point of the normalized binary number (normalized means the digit to the left of the radix point is 1). Ex: in 1.00101 x 2^3, the mantissa is 00101.
The mantissa is stored in a 23-bit field, so it is padded on the right with 0's to fill 23 bits. The leading 1 to the left of the radix point is not stored.
Converting decimal floating point values to stored IEEE standard values.
Example: Find the IEEE FP representation of 40.15625.
Step 1. Compute the binary equivalent of the whole part and the fractional part (convert 40 and .15625 to their binary equivalents).
Whole part: 40 - 32 = 8, 8 - 8 = 0. Result: 101000
Fractional part: .15625 - .12500 = .03125, .03125 - .03125 = .0. Result: .00101
So: 40.15625_10 = 101000.00101_2
Step 2. Normalize the number by moving the radix point to the right of the leftmost one.
101000.00101 = 1.0100000101 x 2^5
Step 3. Convert the exponent to a biased exponent.
127 + 5 = 132
==> 132_10 = 10000100_2
Step 4. Store the results from above.
Sign   Exponent (from step 3)   Mantissa (from step 2)
0      10000100                 01000001010 .. 0
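One way to check a hand conversion is to ask the machine for the stored bits. The sketch below uses Python's standard struct module to pack a value as a 32-bit float and split the bits into the three fields. The function name ieee_fields is an assumption of this example.

```python
import struct

def ieee_fields(value):
    """Return (sign, exponent, mantissa) bit strings of a 32-bit float."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes
    # as an unsigned integer so the bits can be formatted.
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]

sign, exponent, mantissa = ieee_fields(40.15625)
print(sign, exponent, mantissa)
```

The three printed fields should match the Sign, Exponent, and Mantissa columns of Step 4, with the mantissa padded on the right to 23 bits.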
Convert 40.15625 to IEEE 32-bit format
Step 1a. Find the binary value of the whole number.
40 - 32 - 8 = 0
Binary value of 40:
32 16 8 4 2 1
1  0  1 0 0 0
Step 1b. Find the binary value of the fractional number.
.15625 - .125 - .03125 = 0
Binary value of .15625:
.5 .25 .125 .0625 .03125
0  0   1    0     1
Result: 101000.00101
Step 2a. Move the radix point to the right of the leftmost 1 to come up with the exponent. The point moves 5 places to the left, so the exponent is 5.
Step 2b. Normalize by writing the number as a value of the form 1.xxx multiplied by 2 raised to that exponent.
101000.00101 = 1.0100000101 x 2^5
Step 3a. Convert the exponent to a biased exponent by adding the bias of 127 to the exponent.
127 + 5 = 132_10
Step 3b. Convert the biased exponent to its binary value.
132 - 128 - 4 = 0
Binary value of 132:
128 64 32 16 8 4 2 1
1   0  0  0  0 1 0 0
Step 4a. Piece together the values.
Step 4b. Move to the mantissa without the leading 1.
Sign Exponent Mantissa
0    10000100 0100000101
Step 4c. Left pad the exponent to 8 bits and right pad the mantissa to 23 bits to determine the IEEE binary representation.
Sign Exponent Mantissa
0    10000100 01000001010…0
Ex: Find the IEEE FP representation of -24.75.
Step 1. Compute the binary equivalent of the whole part and the fractional part.
Whole part: 24 - 16 = 8, 8 - 8 = 0. Result: 11000
Fractional part: .75 - .50 = .25, .25 - .25 = .0. Result: .11
So: -24.75_10 = -11000.11_2
Step 2. Normalize the number by moving the radix point to the right of the leftmost one.
-11000.11 = -1.100011 x 2^4
Step 3. Convert the exponent to a biased exponent.
127 + 4 = 131
==> 131_10 = 10000011_2
Step 4. Store the results from above.
Sign   Exponent   Mantissa
1      10000011   1000110..0
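The four steps can also be followed arithmetically instead of on bit strings. The sketch below is one possible arrangement, not the lecture's own procedure: it normalizes by halving or doubling the magnitude, and it truncates the mantissa to 23 bits rather than rounding. The name encode_single is hypothetical, and the sketch assumes a nonzero value whose biased exponent fits in 8 bits.

```python
from fractions import Fraction

def encode_single(text):
    """Return (sign, exponent, mantissa) bit strings for a decimal string."""
    value = Fraction(text)
    if value == 0:
        raise ValueError("zero has no normalized form")
    sign = "1" if value < 0 else "0"
    magnitude = abs(value)

    # Step 2: normalize so that 1 <= magnitude < 2, counting the shifts.
    exponent = 0
    while magnitude >= 2:
        magnitude /= 2
        exponent += 1
    while magnitude < 1:
        magnitude *= 2
        exponent -= 1

    # Step 4b: drop the leading 1, then generate 23 mantissa bits
    # with the multiplication method (truncating, not rounding).
    frac = magnitude - 1
    mantissa = []
    for _ in range(23):
        frac *= 2
        bit = int(frac)
        mantissa.append(str(bit))
        frac -= bit

    # Step 3: add the bias of 127 and write the result in 8 bits.
    return sign, format(exponent + 127, "08b"), "".join(mantissa)

print(encode_single("-24.75")[:2])  # ('1', '10000011')
```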
Converting from IEEE format to the decimal floating point values.
Do the steps in reverse order.
In reversing the normalization step, move the radix point the number of digits equal to the exponent: if the exponent is positive, move it to the right; if it is negative, move it to the left.
Ex: Convert the following 32-bit binary number to its decimal floating point equivalent.
   Sign Exponent Mantissa
a. 1    01111101 010..0
Step 1. Extract the exponent (unbias the exponent).
biased exponent = 01111101_2 = 125
exponent: 125 - 127 = -2
Step 2. Write the normalized number in the form 1.mantissa x 2^exponent. The sign bit is 1, so the number is negative.
-1.01 x 2^-2
Step 3. Write the binary number (denormalize the value from step 2).
-0.0101_2
Step 4. Convert the binary number to its FP equivalent (add the column values).
-0.0101_2 = -(0.25 + 0.0625) = -0.3125
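The reverse direction can be sketched the same way. Here decode_single is a hypothetical helper that takes the three fields as bit strings; the mantissa may be given without its trailing zeros. It handles only normalized numbers, as in the lecture.

```python
def decode_single(sign, exponent, mantissa):
    """Rebuild the decimal value from sign, exponent and mantissa bits."""
    exp = int(exponent, 2) - 127          # Step 1: unbias the exponent
    significand = 1.0                     # Step 2: restore the leading 1
    for position, bit in enumerate(mantissa, start=1):
        if bit == "1":
            significand += 2.0 ** -position   # add column value 2^-position
    value = significand * 2.0 ** exp      # Step 3: apply the exponent
    return -value if sign == "1" else value

print(decode_single("1", "01111101", "01"))  # -0.3125
```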
Ex: Convert the following 32-bit binary number to its decimal floating point equivalent.
Sign Exponent Mantissa
0    10000011 1101010..0
Step 1. Extract the exponent (unbias the exponent).
biased exponent = 10000011_2 = 131
exponent: 131 - 127 = 4
Step 2. Write the normalized number in the form 1.mantissa x 2^exponent.
1.110101 x 2^4
Step 3. Write the binary number (denormalize the value from step 2).
11101.01_2
Step 4. Convert the binary number to its FP equivalent (add the column values).
11101.01_2 = 16 + 8 + 4 + 1 + 0.25 = 29.25_10
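A hand decoding can be cross-checked by letting the machine interpret the same 32 bits. This sketch uses the standard struct module; the function name bits_to_float is an assumption, and a short mantissa is padded on the right with zeros to 23 bits.

```python
import struct

def bits_to_float(sign, exponent, mantissa):
    """Interpret sign + exponent + mantissa as a 32-bit float."""
    bits = sign + exponent + mantissa.ljust(23, "0")  # right pad the mantissa
    if len(bits) != 32:
        raise ValueError("expected 1 + 8 + 23 bits")
    raw = int(bits, 2).to_bytes(4, "big")
    (value,) = struct.unpack(">f", raw)
    return value

print(bits_to_float("0", "10000011", "110101"))  # 29.25
```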
Check your work
Convert 0 10000100 01000001010…0 back to decimal.
Sign Exponent Mantissa
0    10000100 01000001010…0
Step 1a. Determine the decimal value of the binary exponent.
128 64 32 16 8 4 2 1
1   0  0  0  0 1 0 0
128 + 4 = 132
Step 1b. Unbias the number by subtracting 127 from the decimal number to determine the exponent.
132 - 127 = 5
Step 2. Set up the format 1.mantissa x 2^exponent.
1.0100000101 x 2^5
Step 3a. Denormalize by moving the radix point 5 places to the right, as given by the exponent.
101000.00101_2
Step 3b. Convert the binary number to its FP equivalent.
Step 4a. Find the whole number.
32 16 8 4 2 1
1  0  1 0 0 0
32 + 8 = 40
Step 4b. Find the fractional number.
.5 .25 .125 .0625 .03125
0  0   1    0     1
.125 + .03125 = .15625
Step 4c. Add together the values. Make sure to include the sign if it is a negative value.
40.15625
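The last step, adding column values, can be written as a short function over a binary string. The name binary_to_decimal is hypothetical; the sketch accepts an optional leading minus sign.

```python
def binary_to_decimal(binary):
    """Add the column values of every 1 bit in a string like "101000.00101"."""
    negative = binary.startswith("-")
    whole, _, frac = binary.lstrip("-").partition(".")
    total = 0.0
    # Columns left of the radix point: 1, 2, 4, 8, ...
    for power, bit in enumerate(reversed(whole)):
        if bit == "1":
            total += 2.0 ** power
    # Columns right of the radix point: .5, .25, .125, ...
    for position, bit in enumerate(frac, start=1):
        if bit == "1":
            total += 2.0 ** -position
    return -total if negative else total

print(binary_to_decimal("101000.00101"))  # 40.15625
print(binary_to_decimal("-0.0101"))       # -0.3125
```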