floating point variables of different lengths

49
Floating point variables of different lengths

Upload: carlow

Post on 05-Jan-2016

36 views

Category:

Documents


3 download

DESCRIPTION

Floating point variables of different lengths. Trade-off: accuracy vs. memory space. Recall that the computer can combine adjacent bytes in the RAM memory to form larger memory cells. Trade-off: accuracy vs. memory space (cont.). Effect of combining memory cells:  . - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Floating point variables of  different lengths

Floating point variables of different lengths

Page 2: Floating point variables of  different lengths

Trade-off: accuracy vs. memory space

• Recall that the computer can combine adjacent bytes in the RAM memory to form larger memory cells

Page 3: Floating point variables of  different lengths

Trade-off: accuracy vs. memory space (cont.)

• Effect of combining memory cells:  

Combination  # bits in memory cell   Capability 

1 byte 8 bits 28 = 256 possible patterns

2 bytes 16 bits 216 = 65536 possible patterns

4 bytes 32 bits 232 = 4294967296 possible patterns

Page 4: Floating point variables of  different lengths

Trade-off: accuracy vs. memory space (cont.)

• Trade-off between accuracy and memory usage

• To obtain a higher accuracy (= more significant digit of accuracy), we need to combine more memory cells

Page 5: Floating point variables of  different lengths

Another trade-off: speed

• Fact:

• Arithmetic expressions that uses higher accuracy will take longer time to compute

(It take longer time to multiple 15 digit numbers than 2 digit numbers)

Page 6: Floating point variables of  different lengths

Another trade-off: speed (cont.)

• Trade-off between accuracy and speed

• When a Java program needs to use higher accurate numbers, it will not only use more memory, but it will also take a longer time to complete.

Page 7: Floating point variables of  different lengths

Floating point numbers in Java

• Java provides 2 different sizes of floating point numbers (and variables):

• Single precision floating numbers (has lower accuracy and uses less memory)

• Double precision floating numbers (has more accuracy and uses more memory)

Page 8: Floating point variables of  different lengths

Floating point numbers in Java (cont.)

• This offer programmers more flexibility:

• Computations that have low accuracy requirements can use a single precision to save memory space and run faster

• Computations that have high accuracy requirements must use a larger size to attain the required accuracy

Page 9: Floating point variables of  different lengths

Floating point numbers in Java (cont.)

• Single precision floating point variable:

• uses 4 consecutive bytes of memory as a single 32 bit memory cell

• A single precision floating point variable can represent a floating point number:

• in range of from −1038 to 1038

• and with about 7 decimal digits accuracy

Page 10: Floating point variables of  different lengths

Floating point numbers in Java (cont.)

• A double precision floating point variable is a variable that:

• uses 8 consecutive bytes of memory as a single 64 bit memory cell

• A double precision floating point variable can represent a floating point number:

• in range of from −10308 to 10308

• and with about 15 decimal digits accuracy

Page 11: Floating point variables of  different lengths

Defining single and double precision floating point variables

• We have already learned how to define single precision floating point variables:

• The syntax used to define single precision floating point variables is similar to the one used to define double precision floating point variables

The only difference is that we need to use a different keyword to denote single precision floating point variables

double variableName ;

Page 12: Floating point variables of  different lengths

Defining single and double precision floating point variables (cont.)

• Syntax to define single precision floating point variables:

float variableName ;

Page 13: Floating point variables of  different lengths

Warning: float and double are considered as different types

• What determine a type:

• Because single and double precision floating point numbers uses different encoding methods, Java considers them as different types

• Each data type uses a unique data encoding method

Page 14: Floating point variables of  different lengths

Warning: float and double are considered as different types (cont.)

• You can use the tool below (next slide) to experience how a single precision floating point number is encoded:

• Type in a decimal number in the field named Decimal representation

• Press enter to see the 32 bits pattern used to encode the decimal number

Page 15: Floating point variables of  different lengths

Warning: float and double are considered as different types (cont.)

• Check the website for Decimal representation.

• The row of tiny squares are the 32 bits

• Checked square represents a bit 1 and unchecked square represents a bit 0

Page 16: Floating point variables of  different lengths

Warning: float and double are considered as different types (cont.)

• The following webpage let you compare the different encoding used in single and double precision:

http://mathcs.emory/~cheung/Courses/170/Syllabus/04/IEEE-float/IEEE-754-encoding.html

• Usage: Enter a decimal number in the Decimal Floating-Point field and press Not Rounded

Page 17: Floating point variables of  different lengths

Converting (casting) to a single or a double precision representation

• The computer has built-in machine instructions to convert between different encodings

• The Java programming language provides access the computer's conversion operations through a number of conversion operators

• Computer jargon:

• Casting operation = a type conversion operation

Page 18: Floating point variables of  different lengths

Converting (casting) to a single or a double precision representation (cont.)

• Java's Casting operators for single and double precision floating point numbers:

(float) --- convert to the single precision floating point representation

(double) --- convert to the double precision floating point representation

Page 19: Floating point variables of  different lengths

Converting (casting) to a single or a double precision representation (cont.)

• Example 1: conversion sequence float double float⇒ ⇒

public class Casting01 { public static void main(String[] args) { float x; // Define single precision floating point double y; // Define double precision floating point x = 3.1415927f; // f denotes "float" y = (double) x; // **** convert to double representation

Page 20: Floating point variables of  different lengths

Converting (casting) to a single or a double precision representation (cont.)

System.out.print("Original single precision x = "); System.out.println(x); System.out.print("Converted double precision y = "); System.out.println(y); x = (float) y; // **** convert to float representation System.out.print("Re-converted single precision x = "); System.out.println(x); } }

Page 21: Floating point variables of  different lengths

Converting (casting) to a single or a double precision representation (cont.)

Output:

Original single precision x = 3.1415927

Converted double precision y = 3.1415927410125732

Re-converted single precision x = 3.1415927

Page 22: Floating point variables of  different lengths

Converting (casting) to a single or a double precision representation (cont.)

• Notes:

• The trailing letter f in "3.1415927f" denotes a float typed number

(Yes, even numbers are typed in Java)

• Notice that accuracy of variable x was preserved after the second conversion

Page 23: Floating point variables of  different lengths

Converting (casting) to a single or a double precision representation (cont.)

• Example: conversion sequence double float double⇒ ⇒

public class Casting01 { public static void main(String[] args) { float x; // Define single precision floating point double y; // Define double precision floating point x = 3.1415927f; // f denotes "float" y = (double) x; // **** convert to double representation

Page 24: Floating point variables of  different lengths

Converting (casting) to a single or a double precision representation (cont.)

System.out.print("Original single precision x = "); System.out.println(x); System.out.print("Converted double precision y = "); System.out.println(y); x = (float) y; // **** convert to float representation System.out.print("Re-converted single precision x = "); System.out.println(x); } }

Page 25: Floating point variables of  different lengths

Converting (casting) to a single or a double precision representation (cont.)

• Output:

Original double precision x = 3.14159265358979

Converted single precision y = 3.1415927

Re-converted double precision x = 3.1415927410125732

Page 26: Floating point variables of  different lengths

Converting (casting) to a single or a double precision representation (cont.)

• Notes:

• A decimal number without a trailing "f" has the data type double

• Notice that we have lost many digits of accuracy in the float double⇒ conversion !!!

Page 27: Floating point variables of  different lengths

Priority level of the casting operators

• I want to emphasize that:

are operators

• (double)  

• (float)

Page 28: Floating point variables of  different lengths

Priority level of the casting operators (cont.)

• The casting operators in Java are unary operators (i.e., has 1 operand)

Analogy:

Unary negation operator Casting operator is a unary operator−x   (negates the value in variable x)   (float)x   (converts the value in variable x)  

Page 29: Floating point variables of  different lengths

Priority level of the casting operators (cont.)

• Operators in Java has a priority level

Priority level of casting operators:

Operator   Priority   Note  

(   ....   ) Highest

(float)   (double)   − Higher Unary operator, e.g.: (float) 3.0

*   / High   Binary operator, e.g.: 4 * 5

+   - Lowest   Binary operator, e.g.: 4 + 5

Page 30: Floating point variables of  different lengths

Priority level of the casting operators (cont.)

• When operators of different priority appear in one single arithmetic expression, then the operator with the highest priority is executed first.

• It's the same as what you have learned in Elementary School...

Page 31: Floating point variables of  different lengths

Phenomenon: lost of accuracy

• In the previous 2 examples, we have observed the following phenomenon:

Page 32: Floating point variables of  different lengths

Phenomenon: lost of accuracy (cont.)

• Observation:

• When we convert a float to a double and then back to a float, there is no loss in accuracy

• When we convert a double to a float and then back to a float, there is a high loss in accuracy

Page 33: Floating point variables of  different lengths

Phenomenon: lost of accuracy (cont.)

• We can understand why we lose accuracy if we depict the conversion process as follows:

Page 34: Floating point variables of  different lengths

Phenomenon: lost of accuracy (cont.)

• Explanation:

• The float ⇒ double ⇒ float steps pass through a widening conversion and retain accuracy

• The double ⇒ float ⇒ double steps pass through a narrowing conversion and lost accuracy

Page 35: Floating point variables of  different lengths

Overflow condition

• When converting a higher accuracy type to a lower accuracy type, you may cause an overflow condition

• Overflow:

• Each data type can represent a certain range of values

• Overflow = storing a out of range value into a variable

Page 36: Floating point variables of  different lengths

Overflow condition

• Example:

• A double typed variable can store a value in the range of −10308 ... 10308

• A float typed variable can only store a value in the range of −1038 ... 1038

Page 37: Floating point variables of  different lengths

Overflow condition (cont.)

• Here is a Java program with an overflow condition:

public class Overflow1 { public static void main(String[] args) { double d; // range: -10^(308) .. 10^(308) float f; // range: -10^(38) .. 10^(38) d = 3.1415e100; // In range of "double", out of range of "float" f = (float) d; // Overflow !!! System.out.print("d = "); System.out.println(d); System.out.print("f = "); System.out.println(f); } }

Page 38: Floating point variables of  different lengths

Overflow condition (cont.)

• Output of this program:

d = 3.1415E100

f = Infinity

Page 39: Floating point variables of  different lengths

Overflow condition (cont.)

• Conclusion:

• When you convert a values from a higher accuracy type to a lower accuracy type, you may cause a significant loss of information

Page 40: Floating point variables of  different lengths

Safe and unsafe conversion operations

• Safe and unsafe conversions:

• Safe conversion = a conversion from one representation (encoding) to another representation (encoding) where there is no loss in accuracy

• Unsafe conversion = a conversion from one representation (encoding) to another representation (encoding) where there is significant loss in accuracy

Page 41: Floating point variables of  different lengths

Safe and unsafe conversion operations (cont.)

• We saw in the previous example that:

• double float⇒ is a safe conversion

• float double⇒ is a unsafe conversion

Page 42: Floating point variables of  different lengths

Expressions containing values of different types

• It is common to use different data types in the same Java program

• Little known fact about a computer:

• A computer can only operate on data of the same data type

In other words:

• A computer can only add two double typed values

• A computer can only subtract two double double values

• And so on.

Page 43: Floating point variables of  different lengths

Expressions containing values of different types (cont.)

• A computer does not have an instruction to add (or subtract) a double typed value and a float typed value

(This has to do with the encoding method used for different types of data)

• A computer can only add two float typed values

• A computer can only subtract two float double values

• And so on.

Page 44: Floating point variables of  different lengths

Expressions containing values of different types (cont.)

• Operations on different types of value:

• In order to perform any operation on two values of differing types, the computer must:

1. Convert one of the types into the other type

2. Perform the operation on the value (now of the same type

Page 45: Floating point variables of  different lengths

Automatic conversion performed in Java

• It is extremely inconvenient for programmers to have to write conversion operations when values of different types are used in the same expression

• Java makes writing programs less painful by providing a number of automatic conversions

The automatic conversions are all safe conversion

Page 46: Floating point variables of  different lengths

Automatic conversion performed in Java (cont.)

• Java conversion rule:

• lower type OP higher type     ⇒ higher type OP higher type

• When values and/or variables of different types are used in an arithmetic expression (OP), the values and/or variables of the less accurate type is automatically converted to the more accurate type

• The operation (OP) is then performed on 2 values of the more accurate type

• The result of the operation is also of the more accurate type

Page 47: Floating point variables of  different lengths

Automatic conversion performed in Java (cont.)

• Assignment statement:   type1 = type2 ;

• If type1 is a higher accuracy type than type2, then the type2 value is automatically converted to type1 before the assignment statement is executed.

(Because the conversion was safe)

• If type1 is a lower accuracy type than type2, then the assignment statement is not allowed You will need to use an casting operator to make the assignment statement valid.

Page 48: Floating point variables of  different lengths

Automatic conversion performed in Java (cont.)

• Examples:

float x; double y;

y = x; ===> higher accuracy type = lower accuracy type 1. the float value in x is converted to a double 2. the (converted) value is assigned to y

x = y; ===> lower accuracy type = higher accuracy type This assignment is NOT allowed (see rules above) (This is because the conversion is unsafe)

x = (float) y; ===> 1. The casting operator (float) converts the double value into a float value

Page 49: Floating point variables of  different lengths

Automatic conversion performed in Java (cont.)

2. The (converted) float value is assigned to x

y = x + y; ===> x + y 1. the float value is x is converted to a double 2. then + is performed on 2 double values

y = double result 3. The result is double and can be assigned to y (because y is a double typed variable)

x = x + y; ===> x + y 1. the float value is x is converted to a double 2. then + is performed on 2 double values

x = double result 3. The result is double and cannot be assigned to x (because x is a float typed variable