ieee floating point

2

Click here to load reader

Upload: jason-thomas

Post on 12-Jul-2015

89 views

Category:

Education


0 download

TRANSCRIPT

Page 1: IEEE Floating Point

IEEE 754 Standard for Floating Point Representation of Real Numbers.

There are four pieces of info to be represented: Sign of the number (Always the high order bit; 0=positive, 1=negative.) Magnitude of the number (Stored in binary with leading "1" understood. See below.) Sign of the exponent (Stored as an offset "bias" on value of exponent. See below.) Magnitude of the exponent. (Stored as unsigned binary added to offset bias.)

I. Float. (32 bits) Binary template: SEEE EEEE EMMM MMMM ... MMMM where S = sign bit, E = exponent bits (8 of these), M = mantissa bits (23 of these).

Process to represent: A. Convert decimal number to binary. B. Move radix point to 1.xxx... * 2^exp representation. C. Now, do the work: i. S. Assign the sign bit. positive --> 0, negative --> 1.

ii. Assign the mantissa. This is the fractional part of the value of the number. --ignore the leading "1". It is always 1 in this scientific notation, so there is no need to store it. The system will re-insert it later when the number is used. --form the 23 MMM...MMM bits from the first 23 of the remaining "xxx..." bits above. If there are not 23 of them, fill out with trailing zeros.

iii. The exponent "exp" may be positive or negative. Float is stored with excess-127 notation. That is, you ADD 127 to your exponent and store as a pure (unsigned) binary. When the number is re-created for use later, 127 will be subtracted from the exponent. This method saves having to use 2's complement for negative exponents.

iv: Combine in recipe SEEEEEEEEMMMM...MMM format.

Example: Find the representation of decimal -10.5 in float IEEE-754 representation. A. Convert the number: -10.5 decimal --> -1010.1 binary B. Move radix to get scientific notation: -1010.1 binary --> -1.0101 * 2^(+3) (binary) C. Now, do the work: i. Sign bit will be 1 since the number is negative.

ii. Mantissa. (101010000...0) Ignore leading one: 01010000...0 Use first 23 bits: 01010000000000000000000 -->This is mantissa.

Page 2: IEEE Floating Point

iii. Exponent is +3. ADD excess-127 offset: 127 + (+3) = 130 decimal = 10000010 binary. These are the 8 bits for the exponent: 10000010.

D. Combine in SEEEEEEEEMMMMMMMMMMMMMMMMMMMMMMM format:

11000001001010000000000000000000

To make it easier to read, group in 4's and convert each group to hex:

original 11000001001010000000000000000000 binary grouped: 1100 0001 0010 1000 0000 0000 0000 0000 binary C 1 2 8 0 0 0 0 hex or, C128 0000 hex

II. double. The process to convert to "double" is similar to "float": 1. a total of 64 bits are used to represent the number. 2. The sign bit still is one digit, always the high order digit. 3. The mantissa is stored with 52 bits instead of 23. This gives doubles greater precision. 4. The exponent is stored with 11 bits instead of 8. This gives doubles a wider range of representable numbers.

Example: Decimal +10.5 --> 0100 0000 0010 0101 0000 ... 0000 binary (64 bits) 4025 0000 0000 0000 hex