isbn 0-321-33025-0 chapter 5 names, bindings, type checking, and scopes


Post on 19-Dec-2015


ISBN 0-321-33025-0

Chapter 5

Names, Bindings, Type Checking, and Scopes

1-2

Chapter 5 Topics

• Introduction
• Names
• Variables
• The Concept of Binding
• Type Checking
• Strong Typing
• Type Compatibility
• Scope and Lifetime
• Referencing Environments
• Named Constants

1-3

Imperative Languages

• Imperative languages are abstractions of the von Neumann architecture:
– Memory
• stores both instructions and data
– Processor
• provides operations for modifying the contents of the memory

1-4

Memory Cells and Variables

• The abstractions in a language for the memory cells of the machine are variables.
– In some cases, the characteristics of the abstractions are very close to the characteristics of the cells;
• an example of this is an integer variable, which is usually represented directly in one or more bytes of memory.
– In other cases, the abstractions are far removed from the organization of the hardware memory,
• as with a three-dimensional array, which requires a software mapping function to support the abstraction.

1-5

Attributes of Variables

• A variable can be characterized by a collection of properties, or attributes, the most important of which is type, a fundamental concept in programming languages.

• The design of the data types of a language requires that a variety of issues be considered.
– Among the most important of these issues are the scope and lifetime of variables and type equivalence.
• Related to the first two are the issues of type checking and initialization.

1-6

C-based Languages

• In this book, the author uses the phrase C-based languages to refer to C, C++, Java, and C#.

1-7

Name

• One of the fundamental attributes of variables is the name, which has broader use than simply for variables.

• A name is a string of characters used to identify some entity in a program.

1-8

Other Usage of the Name Attribute of Variables

• Names are also associated with labels, subprograms, formal parameters, and other program constructs.

• The term identifier is often used interchangeably with name.

1-9

Design Issues for Names

– Are names case sensitive?
– Are special words reserved words or keywords?

1-10

Length of Names

• Length
– If too short, they cannot be connotative.
– Length examples:
• FORTRAN I: maximum 6
• C89:
– no length limitation on its internal names
» only the first 31 are significant
– external names (defined outside functions and handled by linkers)
» are restricted to 6 characters.
• C# and Java: no limit, and all are significant
• C++: no limit, but implementers often impose one
– They do this so the symbol table in which identifiers are stored during compilation need not be too large, and also to simplify the maintenance of that table.

1-11

Name Forms

• Names in most programming languages have the same form: a letter followed by a string consisting of letters, digits, and underscore characters (_).
– In the 1970s and 1980s, underscore characters were widely used to form names.
• E.g. my_stack

– Nowadays, in the C-based languages, underscore form names are largely replaced by camel notation.

• E.g. myStack

1-12

Embedded Spaces in Names

• In versions of Fortran prior to Fortran 90, names could have embedded spaces, which were ignored.
– For example, the following two names were equivalent:

Sum Of Salaries

SumOfSalaries

1-13

Case Sensitivity

• In many languages, notably the C-based languages, uppercase and lowercase letters in names are distinct.
– For example, the following three names are distinct in C++: rose, ROSE, and Rose.
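Here is a short C++ sketch of the rose/ROSE/Rose example. The function name case_sensitivity_demo is hypothetical. All three declarations compile side by side because the compiler treats them as three different names.

```cpp
// Hypothetical demo: three distinct variables whose names differ only in case.
int case_sensitivity_demo() {
    int rose = 1;    // all lowercase
    int ROSE = 10;   // all uppercase: a different variable
    int Rose = 100;  // capitalized: yet another variable
    return rose + ROSE + Rose;  // each name refers to its own memory cell
}
```

The call returns 111, so no name shadows or replaces another. This is also why the readability drawback below arises: the names look alike but denote different entities.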

1-14

Drawbacks of Case Sensitivity

• detriment to readability
– Names that look very similar in fact denote different entities.
– Case sensitivity violates the design principle that language constructs that look the same should have the same meaning.
• detriment to writability
– The need to remember specific case usage makes it more difficult to write correct programs.

1-15

Special Words

• Special words in programming languages are used
– to make programs more readable by naming actions to be performed.
– to separate the syntactic entities of programs.
• In most languages, special words are classified as reserved words, but in some they are only keywords.
– P.S.: In program code examples in this book, special words are presented in boldface.

1-16

Keywords

• A keyword is a word of a programming language that is special only in certain contexts.

1-17

Example of Keywords

• Fortran is one of the languages whose special words are keywords.
– In Fortran, the word Real, when found at the beginning of a statement and followed by a name, is considered a keyword that indicates the statement is a declarative statement.
– However, if the word Real is followed by the assignment operator, it is considered a variable name.
– These two uses are illustrated in the following:
Real Apple
Real = 3.4

• Fortran compilers and Fortran program readers must recognize the difference between names and special words by context.
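C++ has a small analogue of Fortran's context-dependent keywords. Since C++11, final and override are "identifiers with special meaning" rather than reserved words. They act as keywords only in specific positions of a class or member declaration, and elsewhere they can be used as ordinary names, while words such as int or while cannot. Here is a minimal sketch; all class and function names in it are hypothetical.

```cpp
struct Base {
    virtual ~Base() = default;
    virtual int value() const { return 1; }
};

// In these positions 'final' and 'override' act as keywords.
struct Derived final : Base {
    int value() const override { return 2; }
};

// Elsewhere the same words are ordinary identifiers, told apart by context.
int contextual_keyword_demo() {
    int final = 3;
    int override = 4;
    Derived d;
    const Base& b = d;
    return b.value() + final + override;  // 2 + 3 + 4
}
```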

1-18

Reserved Words

• A reserved word is a special word of a programming language that can NOT be used as a name.

1-19

Advantages of Reserved Words

• As a language design choice, reserved words are better than keywords because the ability to redefine keywords can lead to readability problems.

1-20

Drawback Example of Keywords

• In Fortran, one could have the statements
Integer Real
Real Integer
which declare the program variable Real to be of Integer type and the variable Integer to be of Real type.

• In addition to the strange appearance of these declaration statements, the appearance of Real and Integer as variable names elsewhere in the program could be misleading to program readers.

1-21

Variables

• A program variable is an abstraction of a computer memory cell or collection of cells.

• Variables can be characterized as a sextuple of attributes:
– Name
– Address
– Value
– Type
– Lifetime
– Scope

1-22

Benefits of Using Variables

• The move from machine languages to assembly languages was largely one of replacing absolute numeric memory addresses with names, making programs far more readable and thus easier to write and maintain.

• This step also provided an escape from the problem of manual absolute addressing, because the translator that converted the names to actual addresses also chose those addresses.

1-23

Address

• The address of a variable is the machine memory address with which it is associated.

• In many languages, it is possible for the same variable name to be associated with different addresses at different places and at different times in the program.

1-24

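How Parameters and Local Variables Are Represented in an Object File?

a C function:

void abc(int aa)
{
    int bb;
    bb = aa;
    :
    :
}

equivalent assembly code:

abc:
    function prologue
    *(%ebp-4)=*(%ebp+8)
    function epilogue

[Figure: abc's stack frame, from higher to lower addresses: aa, return address, previous frame point, bb. ebp points to the saved previous frame point, so aa is at %ebp+8 and bb is at %ebp-4.]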

P.S.: function prologue and function epilogue are added by a compiler

1-25

The Same Names in Different Functions Are Associated with Different Addresses

• A program can have two subprograms, sub1 and sub2, each of which defines a variable that uses the same name, say sum.

• Because these two variables are independent of each other, a reference to sum in sub1 is unrelated to a reference to sum in sub2.

1-26

The Same Names in Different Executions May Be Associated with Different Addresses

• If a subprogram has a local variable that is allocated from the run-time stack when the subprogram is called, different calls may result in that variable having different addresses.
– These are in a sense different instantiations of the same variable.
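The following C++ sketch shows one local name bound to several addresses. The function record_addresses is hypothetical. Each nested call creates a new instance of sum on the run-time stack while the outer instances are still alive. The addresses are recorded as std::uintptr_t values so that they can be compared safely after the calls return.

```cpp
#include <cstdint>

// Each activation allocates its own 'sum' on the run-time stack and
// records that instance's address (as an integer) in out[depth].
void record_addresses(int depth, int max_depth, std::uintptr_t out[]) {
    int sum = depth;  // a new instantiation of 'sum' for every call
    out[depth] = reinterpret_cast<std::uintptr_t>(&sum);
    if (depth + 1 < max_depth) {
        // The outer 'sum' is still alive during this call,
        // so the inner 'sum' must occupy a different cell.
        record_addresses(depth + 1, max_depth, out);
    }
}
```

Calling record_addresses(0, 4, addrs) fills addrs with four pairwise different values, one for each active instantiation of sum.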

1-27

Memory Allocation of Local Variables

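#include <stdio.h>

void H(int b);

void G(int a)
{
    int i = 0;
    H(3);
add_g:              /* the return address of the call to H */
    i++;
}

void H(int b)
{
    char c[100];
    int i = 0;
    int ch;         /* int, so that EOF can be told apart from a character */
    while (i < 100 && (ch = getchar()) != EOF)
        c[i++] = (char)ch;
}

[Figure: the run-time stack, from the high address down to the low address: G's stack frame (holding its local i), then H's stack frame, which holds the parameter b, the return address add_g, the address of G's frame point, the local i, and the array c (c[0] to c[99]).]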

1-28

L-value

• The address of a variable is sometimes called its L-value, because that is what is required when a variable appears in the left side of an assignment statement.

1-29

Aliases

• It is possible to have multiple variables that have the same address.

• When more than one variable name can be used to access a single memory location, the names are called aliases.

1-30

Disadvantages of Aliases

• Aliasing is a hindrance to readability because it allows a variable to have its value changed by an assignment to a different variable.
– For example, if variables total and sum are aliases, any change to total also changes sum and vice versa.
– A reader of the program must always remember that total and sum are different names for the same memory cell.
– Because there can be any number of aliases in a program, this is very difficult in practice.

• Aliasing also makes program verification more difficult.

1-31

Ways to Create Aliases

• Aliases can be created in programs in several different ways.

• C and C++: union types.
• Two pointer variables are aliases when they point to the same memory location.
• Reference variables: a C++ reference variable and the variable it is bound to are aliases.
• When a C++ pointer is set to point at a named variable, the pointer, when dereferenced, and the variable's name are aliases.
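Here is a minimal C++ sketch of the last three cases. All names in it are hypothetical. A reference variable and two pointers all give access to the same cell as total, so an assignment through any one of them changes what the others see.

```cpp
// 'total' is an ordinary named variable; 'sum', '*p', and '*q' below all
// name the same memory cell, so they are aliases of one another.
int alias_demo() {
    int total = 0;
    int& sum = total;  // C++ reference variable: another name for 'total'
    int* p = &total;   // pointer set to point at the named variable
    int* q = p;        // a second pointer to the same cell
    sum = 5;           // assignment through 'sum' changes 'total'
    *q += 1;           // and so does assignment through 'q'
    return *p;         // reads the one shared cell: 6
}
```

This is the readability problem described earlier: a reader of the line sum = 5 must know that it also changes total, *p, and *q.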

1-32

Type

• The type of a variable determines
– the range of values the variable can store and
– the set of operations that are defined for values of the type.
• For example, the type int in Java specifies
– a value range of -2147483648 to 2147483647 and
– arithmetic operations for addition, subtraction, multiplication, division, and modulus.

1-33

Value

• The value of a variable is the contents of the memory cell or cells associated with the variable.

1-34

Abstract Cells

• It is convenient to think of computer memory in terms of abstract cells, rather than physical cells.

• The physical cells, or individually addressable units, of most contemporary computer memories are byte-sized, with a byte usually being eight bits in length.
– This size is too small for most program variables.

• We define an abstract memory cell to have the size required by the variable with which it is associated.

1-35

Example

• Although floating-point values may occupy four physical bytes in a particular implementation of a particular language, we think of a floating-point value as occupying a single abstract memory cell.

• We consider the value of each simple nonstructured type to occupy a single abstract cell. Henceforth, when we use the term memory cell, we mean abstract memory cell.

1-36

r-value

• A variable's value is sometimes called its r-value because it is what is required when the variable is used on the right side of an assignment statement.

• To access the r-value, the L-value must be determined first.
– Such determinations are not always simple.

• For example, scoping rules can greatly complicate matters, as is discussed in Section 5.8.

1-37

Binding and Binding Time

• In a general sense, a binding is an association, such as between an attribute and an entity or between an operation and a symbol.

• The time at which a binding takes place is called binding time.

1-38

Possible Binding Time

• Bindings can take place at:
– language design time,
– language implementation time,
– compile time,
– link time,
– load time,
– run time.

1-39

Different Language Entities Have Different Binding Times for Different Attributes

• The asterisk symbol (*) is usually bound to the multiplication operation at language design time.

• A data type, such as int in C, is bound to a range of possible values at language implementation time.

• A variable in a Java program is bound to a particular data type at compile time.

• A variable may be bound to a storage cell when the program is loaded into memory.
– e.g., global variables and static variables.
– That same binding does not happen until run time in some cases, as with variables declared in Java methods.
• A call to a library subprogram is bound to the subprogram code at link time.
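Here is a C++ sketch that makes the implementation-time binding of int concrete. The constant names are hypothetical. std::numeric_limits reports the range that a particular implementation chose for int. The fixed-width std::int32_t, where provided, has the same range as Java's int.

```cpp
#include <cstdint>
#include <limits>

// The range of plain 'int' is fixed by each C++ implementation
// (the standard only guarantees a width of at least 16 bits).
constexpr long long int_min = std::numeric_limits<int>::min();
constexpr long long int_max = std::numeric_limits<int>::max();

// std::int32_t, where provided, always spans the same range as Java's int.
constexpr long long i32_min = std::numeric_limits<std::int32_t>::min();
constexpr long long i32_max = std::numeric_limits<std::int32_t>::max();
```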

1-40

Example

• Consider the following C assignment statement:
count = count + 5;
• Some of the bindings and their binding times for the parts of this assignment statement are as follows:
– The type of count is bound at compile time.
– The set of possible values of count is bound at compiler design time/language implementation time.
– The meaning of the operator symbol + is bound at compile time, when the types of its operands have been determined.
• The symbol + may have different uses in a programming language, such as addition of integers or addition of floating-point numbers.
– The value of count is bound at execution time with this assignment.
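The same statement can be written as a small C++ sketch. The function add_five is hypothetical. The compiler checks the static_assert lines before the program runs, which shows that the type of count and the meaning of + are fixed at compile time. The new value of count exists only after the assignment executes.

```cpp
#include <type_traits>

int count = 10;  // this declaration binds the type of 'count' at compile time

// '+' is resolved at compile time from the operand types.
static_assert(std::is_same<decltype(count + 5), int>::value,
              "int + int selects integer addition");
static_assert(std::is_same<decltype(count + 5.0), double>::value,
              "int + double selects floating-point addition");

// The value of 'count' is bound only when this assignment executes.
void add_five() {
    count = count + 5;
}
```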

1-41

Static Binding

• A binding is static if it first occurs before run time and remains unchanged throughout program execution.

1-42

Dynamic Binding

• If a binding first occurs during run time or can change in the course of program execution, it is called dynamic.

1-43

Type Bindings

• Before a variable can be referenced in a program, it must be bound to a data type.

• Two important aspects of type bindings are
– how the type is specified.
– when the binding takes place.

• Types can be specified statically through some form of
– explicit declaration
– implicit declaration

• Both explicit and implicit declarations create static bindings to types.

1-44

Explicit Declarations

• An explicit declaration is a statement in a program that lists variable names and specifies that they are a particular type.
– Most programming languages designed since the mid-1960s require explicit declarations of ALL variables (Perl, JavaScript, Ruby, and ML are some exceptions).

1-45

Implicit Declarations

• An implicit declaration is a means of associating variables with types through default conventions instead of declaration statements.
– In this case, the FIRST appearance of a variable name in a program constitutes its implicit declaration.
– Several widely used languages whose initial designs were done before the late 1960s, notably Fortran, PL/I, and BASIC, have implicit declarations.

1-46

Implicit Declaration Example

• In Fortran, an identifier that appears in a program that is not explicitly declared is implicitly declared according to the following convention: – If the identifier begins with one of the letters I, J, K, L, M, or N, or their lowercase versions, it is implicitly declared to be Integer type.

– In all other cases, it is implicitly declared to be Real type.

1-47

Drawbacks of Implicit Declarations

• Although they are a minor convenience to programmers, implicit declarations can be detrimental to reliability because they prevent the compilation process from detecting some typographical and programmer errors.
– For example, in Fortran, variables that are accidentally left undeclared by the programmer are given default types and unexpected attributes, which could cause subtle errors that are difficult to diagnose.

1-48

Disable Implicit Declarations in Fortran

• Many Fortran programmers now include the declaration
Implicit none
in their programs. This declaration instructs the compiler not to implicitly declare any variables.

1-49

Method to Avoid Implicit Declarations

• Some of the problems with implicit declarations can be avoided by requiring names for specific types to begin with particular special characters. For example, in Perl,
– any name that begins with $ is a scalar, which can store either a string or a numeric value.
– any name beginning with @ is an array.
– any name beginning with % is a hash structure.
– The above rules create different name spaces for different type variables. In this scenario, the names @apple and %apple are unrelated, because each is from a different name space.
– Furthermore, a program reader always knows the type of a variable when reading its name.

1-50

Declarations and Definitions

• In C and C++, one must sometimes distinguish between declarations and definitions.

• Declarations specify types and other attributes but do not cause allocation of storage.

• Definitions specify attributes and cause storage allocation.

1-51

Number of Declarations and Definitions

• For a specific name, a C program can have ANY number of compatible declarations, but only a SINGLE definition.

• One purpose of variable declarations in C is to provide the type of a variable defined external to a function but used in the function. It tells the compiler the type of a variable and that it is defined elsewhere.

1-52

Function Definition and Function Prototype

• The idea in previous slides carries over to the functions in C and C++, where prototypes declare names and interfaces, but not the code of functions.

• Function definitions, on the other hand, are complete.
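Here is a single-file C++ sketch of these rules. The names counter and next_value are hypothetical. In a real program, the extern declaration would usually sit in a header included by other files, and the one definition would live in a single source file.

```cpp
// Declarations: state the type of 'counter' without allocating storage.
// Any number of compatible declarations is allowed.
extern int counter;
extern int counter;

// Function prototype: declares the name and interface, not the code.
int next_value();

// Definition: allocates storage for 'counter' (only one is allowed).
int counter = 0;

// Function definition: supplies the complete code.
int next_value() {
    return ++counter;
}
```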

1-53

Dynamic Type Binding

• With dynamic type binding:
– the type is not specified by a declaration statement, nor can it be determined by the spelling of its name.
– the variable is bound to a type when it is assigned a value in an assignment statement.

• When the assignment statement is executed, the variable being assigned is bound to the type of the value of the expression on the right side of the assignment.

1-54

The Primary Advantage of Dynamic Variable Type Binding

• A great deal of programming flexibility.

1-55

Creation of Generic Programs

• A program to process a list of data in a language that uses dynamic type binding can be written as a generic program, meaning that it is capable of dealing with data of any numeric type.

• Whatever type data is input will be acceptable, because the variables in which the data is to be stored can be bound to the correct type when the data is assigned to the variables after input.

• By contrast, because of static binding of types, one cannot write a C++ or Java program to process a list of data without knowing the type of that data.

1-56

Example of Dynamic Binding

• In PHP and JavaScript, the binding of a variable to a type is dynamic.
– For example, a JavaScript script may contain the following statement:
list = [10.2, 3.5]
Regardless of the previous type of the variable named list, this assignment causes it to become a single-dimensioned array of numeric elements of length 2.
– If the statement list = 47 followed the assignment above, list would become a numeric scalar variable.

1-57

Dynamic Binding Is Less Reliable in Error Detection

• Dynamic type binding causes programs to be less reliable, because the error detection capability of the compiler is diminished relative to a compiler for a language with static type bindings.

1-58

Dynamic Binding Results in Weak Type-related Error Detection

• Dynamic type binding allows any variable to be assigned a value of any type.

• Incorrect types of right sides of assignments are not detected as errors; rather the type of the left side is simply changed to the incorrect type.

1-59

Example of Drawbacks of Dynamic Type Binding

• Suppose
– that in a particular JavaScript program, i and x are currently storing scalar numeric values, and y is currently storing an array.
– that the program needs the assignment statement
i = x;
but because of a keying error, it has the assignment statement
i = y;

• In JavaScript (or any other language that uses dynamic type binding), the interpreter detects no error in this statement: i is simply changed to an array. But later uses of i will expect it to be a scalar, and correct results will be impossible.

• In a language with static type binding, the compiler would detect the error in the assignment i = y, and the program would not get to execution.
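C++ binds types statically, so the JavaScript scenario cannot be reproduced directly. As an approximation only, std::variant can model a dynamically typed variable: assignment changes the stored type, and a run-time tag acts as the type descriptor. In this sketch, Value and twice are hypothetical names. The mistaken i = y is accepted silently, and the error surfaces only later, when i is used as a scalar.

```cpp
#include <variant>
#include <vector>

// Hypothetical model of a dynamically typed variable: the variant's
// active alternative plays the role of the run-time type descriptor.
using Value = std::variant<double, std::vector<double>>;

// A later use of 'i' that expects a scalar. The check happens here, at run
// time: std::get throws std::bad_variant_access if 'i' now holds an array.
double twice(const Value& i) {
    return 2 * std::get<double>(i);
}
```

Assigning y to i compiles and runs without complaint. Only the later call twice(i) throws, which mirrors the late detection described above.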

1-60

Disadvantages of Dynamic Binding in terms of Cost

• Perhaps the greatest disadvantage of dynamic type binding is cost. The cost of implementing dynamic attribute binding is considerable, particularly in execution time.

• Type checking must be done at run time.

• Furthermore, every variable must have a run-time descriptor associated with it to maintain the current type.

• The storage used for the value of a variable must be of varying size, because different type values require different amounts of storage.

1-61

Implementation Concerns

• Languages that have dynamic type binding for variables are usually implemented using pure interpreters rather than compilers.

• Computers do not have machine instructions whose operand types are not known at compile time.
– Therefore, a compiler cannot build machine instructions for the expression A + B if the types of A and B are not known at compile time.

• Pure interpretation typically takes at least ten times as long as executing equivalent machine code.

1-62

Type Inference

• ML is a programming language that supports both functional and imperative programming (Milner et al., 1990).

• ML employs an interesting type inference mechanism, in which the types of most expressions can be determined without requiring the programmer to specify the types of the variables.

1-63

General Syntax of an ML Function

fun function_name(formal parameters) = expression;

• The value of the expression is returned by the function.

1-64

Example (1)

• The function declaration

fun circumf(r) = 3.14159 * r * r;

specifies a function that takes a floating-point argument ( real in ML) and produces a floating-point result.

• The types are inferred from the type of the constant in the expression.

1-65

Example (2)

• Likewise, in the function
fun times10(x) = 10 * x;
the argument and functional value are inferred to be of type int.

1-66

Example (3)

• Consider the following ML function:

fun square(x) = x * x;

– ML determines the type of both the parameter and the return value from the * operator in the function definition. Because this is an arithmetic operator, the type of the parameter and the function are assumed to be numeric.

– In ML, the default numeric type is int. So, it is inferred that the type of the parameter and the return value of square is int.

1-67

Example (4)

• If square were called with a floating-point value, as in

square(2.75);

it would cause an error, because ML does not coerce real values to int type.

1-68

Example (5)

• If we wanted square to accept real parameters, it could be rewritten as

fun square(x) : real = x * x;

• Because ML does not allow overloaded functions, this version could not coexist with the earlier int version.

1-69

Allocation and Deallocation of Memory Cells

• The memory cell to which a variable is bound somehow must be taken from a pool of available memory. This process is called allocation.

• Deallocation is the process of placing a memory cell that has been unbound from a variable back into the pool of available memory.

1-70

The Lifetime of a Variable

• The lifetime of a variable is the time during which the variable is bound to a specific memory location.

• So the lifetime of a variable begins when it is bound to a specific cell and ends when it is unbound from that cell.

1-71

Categories of Scalar Variables

1-72

Categories of Scalar Variables

• It is convenient to separate scalar (unstructured) variables into four categories, according to their lifetimes:
– static,
– stack-dynamic,
– explicit heap-dynamic,
– implicit heap-dynamic.

1-73

Static Variables

• Static variables are those that are bound to memory cells before program execution begins and remain bound to those same memory cells until program execution terminates.

1-74

Applications of Static Variables

• Globally accessible variables are often used throughout the execution of a program, thus making it necessary to have them bound to the same storage during that execution.

• Sometimes it is convenient to have variables that are declared in subprograms be history-sensitive, that is, have them retain values between separate executions of the subprogram.
– This is a characteristic of a variable that is statically bound to storage.

1-75

Advantages of Static Variables

• Another advantage of static variables is efficiency.
– All addressing of static variables can be direct; other kinds of variables often require indirect addressing, which is slower.
– Furthermore, no run-time overhead is incurred for allocation and deallocation of static variables, although this time is often negligible.

1-76

Disadvantages of Static Variables

• reduced flexibility
– In a language that has only variables that are statically bound to storage, recursive subprograms cannot be supported.
• storage cannot be shared among variables
– For example, suppose a program has two subprograms, both of which require large unrelated arrays. Further suppose that the two subprograms are never active at the same time. If the arrays are static, they cannot share the same storage.

1-77

Example

• C and C++ allow programmers to include the static specifier on a variable definition in a function, making the variables it defines static.
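Here is a minimal C++ sketch of a history-sensitive function. The name count_calls is hypothetical. The static local is bound to storage once and keeps its value across calls.

```cpp
// 'calls' is static: bound to one memory cell for the whole execution,
// so it retains its value between separate calls of the function.
int count_calls() {
    static int calls = 0;  // initialized once, not on every call
    return ++calls;
}
```

Successive calls return 1, 2, 3, and so on. A stack-dynamic local would restart from 0 on every call.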

1-78

Elaboration of the Declaration Statements of Stack-Dynamic Variables

• Stack-dynamic variables are those whose storage bindings are created when their declaration statements are elaborated, but whose types are statically bound.

• Elaboration of such a declaration refers to the storage allocation and binding process indicated by the declaration, which takes place when execution reaches the code to which the declaration is attached.

• Elaboration occurs during run time.

1-79

Memory Allocation of Stack-Dynamic Variables Occurs during Run Time

[Figure: the same stack-frame diagram and G/H code as in "Memory Allocation of Local Variables" above. H's stack frame, including c[0] to c[99] and its local i, is allocated on the run-time stack only when G calls H during execution.]

1-80

Example

• The variable declarations that appear at the beginning of a Java method are elaborated when the method is called.

• The variables defined by those declarations are deallocated when the method completes its execution.

1-81

The Location That Stores Stack-Dynamic Variables

• As their name indicates, stack-dynamic variables are allocated from the run-time stack.

1-82

Storage Binding of a Variable May Occur before Its Declaration

• Some languages, for example C and Java, allow variable declarations to occur anywhere a statement can appear.

• In some implementations of these languages, all of the stack-dynamic variables declared in a function or method (not including those declared in nested blocks) may be bound to storage at the beginning of execution of the function or method, even though the declarations of some of these variables do not appear at the beginning.

1-83

Stack-Dynamic Variables and Recursive Programs

• To be useful, at least in most cases, recursive subprograms require some form of dynamic local storage so that each active copy of the recursive subprogram has its own version of the local variables.

• These needs are conveniently met by stack-dynamic variables.
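The difference can be seen in a small C++ sketch; both function names are hypothetical. The first version keeps its local in a stack-dynamic variable, so each activation has its own copy. The second imitates a language with only static storage by keeping the local in a single static cell. The inner recursive calls overwrite that cell, and the result is wrong.

```cpp
// Stack-dynamic version: every activation gets its own 'n_here'.
long long factorial(int n) {
    long long n_here = n;
    if (n <= 1) {
        return 1;
    }
    return n_here * factorial(n - 1);
}

// Hypothetical broken version: a single static cell shared by all
// activations, as in a language with only static local storage.
long long factorial_static(int n) {
    static long long saved;  // one cell for every activation
    saved = n;
    if (n <= 1) {
        return 1;
    }
    long long rest = factorial_static(n - 1);  // inner calls overwrite 'saved'
    return saved * rest;                       // 'saved' now holds 1, not n
}
```

factorial(5) returns 120, while factorial_static(5) returns 1. By the time each outer call performs its multiplication, saved has already been overwritten with 1.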

1-84

Memory Sharing

• The introduction of stack-dynamic variables allows all subprograms to share the same memory space for their locals.

1-85

Disadvantages of Stack-Dynamic Variables

• the run-time overhead of allocation and deallocation
– however, the overhead is not significant, because all of the stack-dynamic variables that are declared at the beginning of a subprogram are allocated and deallocated together.
• slower accesses
– indirect addressing is required.
• subprograms cannot be history sensitive.

1-86

Examples of Stack-Dynamic Variables

• In Java, C++ and C#, local variables defined in methods are by default stack-dynamic.

• In Pascal and Ada, all non-heap variables defined in subprograms are stack-dynamic.

1-87

Explicit Heap-Dynamic Variables

• Explicit heap-dynamic variables are nameless (abstract) memory cells that are allocated and deallocated by explicit run-time instructions specified by the programmer.

• These variables, which are allocated from and deallocated to the heap, can only be referenced through pointers or reference variables.
– The pointer or reference variable that is used to access an explicit heap-dynamic variable is created as any other scalar variable.

1-88

Properties of a Heap

• The heap is a collection of storage cells whose organization is highly disorganized because of the unpredictability of its use.

1-89

Creating an Explicit Heap-Dynamic Variable

• An explicit heap-dynamic variable is created:
– by an operator (for example, in Ada and C++), or
– by a call to a system subprogram provided for that purpose (for example, malloc() in C).

1-90

Allocation Operator in C++

• In C++, the allocation operator, named new, uses a type name as its operand.

• When executed, an explicit heap-dynamic variable of the operand type is created and a pointer to it is returned.
– Because an explicit heap-dynamic variable is bound to a type at compile time, that binding is static.
– However, such variables are bound to storage at the time they are created, which is during run time.

1-91

Deleting Heap-Dynamic Variables

• In addition to a subprogram or operator for creating explicit heap-dynamic variables, some languages include a means of destroying them.

1-92

Example of Explicit Heap-dynamic Variables

What follows is a C++ code segment:

int *intnode;       // create a pointer
...
intnode = new int;  // create the heap-dynamic variable
delete intnode;     // deallocate the heap-dynamic variable
                    // to which intnode points

• In this example, an explicit heap-dynamic variable of int type is created by the new operator.

• This variable can then be referenced through the pointer, intnode.

• Later, the variable is deallocated by the delete operator.

1-93

Java Objects

• In Java, all data except the primitive scalars are objects.

• Java objects are explicit heap-dynamic and are accessed through reference variables.

• Java has no way of explicitly destroying a heap-dynamic variable; rather, implicit garbage collection is used.

1-94

Applications of Explicit Heap-Dynamic Variables

• Explicit heap-dynamic variables are often used for dynamic structures, such as linked lists and trees, that need to grow and/or shrink during execution.

• Such structures can be built conveniently using pointers or references and explicit heap-dynamic variables.
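Here is a C++ sketch of such a structure. Node, push_front, length, and destroy are hypothetical names. Each node is an explicit heap-dynamic variable: it is created by new, reachable only through pointers, and released with delete.

```cpp
// A singly linked list whose nodes are explicit heap-dynamic variables.
struct Node {
    int value;
    Node* next;
};

// Allocate a new node on the heap and link it in front; returns the new head.
Node* push_front(Node* head, int value) {
    return new Node{value, head};
}

// Count the nodes reachable from 'head'.
int length(const Node* head) {
    int n = 0;
    for (const Node* p = head; p != nullptr; p = p->next) {
        ++n;
    }
    return n;
}

// Deallocate every node explicitly; omitting this would leak the storage.
void destroy(Node* head) {
    while (head != nullptr) {
        Node* next = head->next;
        delete head;
        head = next;
    }
}
```

The list can grow one node at a time during execution. The programmer is responsible for the matching destroy call, which illustrates the difficulty and cost listed on the next slide.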

1-95

Disadvantages of Explicit Heap-Dynamic Variables

• the difficulty of using pointer and reference variables correctly.

• the cost of references to the variables, allocations, and deallocations.

• the complexity of storage management implementation.

1-96

Implicit Heap-dynamic Variables

• Implicit heap-dynamic variables are bound to heap storage only when they are assigned values.

• In fact, all their attributes are bound every time they are assigned.

1-97

Example

• For example, a JavaScript script may contain the following statement to assign a value to the implicit heap-dynamic variable list:
list = [10.2, 3.5]
Regardless of the previous type of the variable named list, this assignment causes it to become a single-dimensioned array of numeric elements of length 2.
– If the statement list = 47 followed the assignment above, list would become a numeric scalar variable.

1-98

Advantages

• The advantage of such variables is that they have the highest degree of flexibility, allowing highly generic code to be written.

1-99

Disadvantages

• the run-time overhead of maintaining all the dynamic attributes, which could include array subscript types and ranges, among others.

• the loss of some error detection by the compiler, as discussed in Section 5.4.2.2.

1-100

Type Checking

1-101

Generalize the Concepts of Functions and Assignment Statements

• Subprograms are thought of as operators – their parameters are their operands.

• The assignment symbol is thought of as a binary operator
– with its target variable and its expression being the operands.

1-102

Type Checking

• Type checking is the activity of ensuring that the operands of an operator are of COMPATIBLE types.

1-103

Compatible Types

• A compatible type is one that either
– is legal for the operator, or
– is allowed under language rules to be implicitly converted by compiler-generated code (or the interpreter) to a legal type.

• This automatic conversion is called a coercion.

1-104

Example

• If an int variable and a float variable are added in Java, the value of the int variable is coerced to float and a floating-point add is done.
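C applies the same coercion through its usual arithmetic conversions: when one operand of + is float and the other is int, the int operand is converted to float. The short sketch below makes the result type visible with the C11 _Generic selection; the IS_FLOAT macro and the add_int_float function are hypothetical helpers for this illustration only.

```c
#include <assert.h>

/* 1 if the static type of expr is float, else 0. C11 _Generic selects on
   the type of its controlling expression, which is not evaluated. */
#define IS_FLOAT(expr) _Generic((expr), float: 1, default: 0)

/* int + float: i is coerced to float and a floating-point add is done. */
float add_int_float(int i, float f)
{
    return i + f;
}

void example_coercion(void)
{
    int i = 3;
    float f = 0.5f;
    assert(IS_FLOAT(i + f));             /* the sum has type float */
    assert(!IS_FLOAT(i + i));            /* int + int stays int */
    assert(add_int_float(i, f) == 3.5f); /* 3.0f + 0.5f is exact */
}
```

No diagnostic is required for i + f: the operands are compatible because the language rules allow the int to be converted, which is what the definition above calls a coercion.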

1-105

Type Errors

• A type error is the application of an operator to an operand of an inappropriate type.

1-106

Example of Type Errors

• In the original version of C, if an int value was passed to a function that expected a float value, a type error would occur and go undetected (because compilers for that language did not check the types of parameters).
– A related security problem: integer signedness attacks.

1-107

Example of an Integer Signedness Attack

#include <string.h>

static char data[256];

int store_data(char *buf, int len)
{
    if (len > 256)
        return -1;
    memcpy(data, buf, len);  /* len is implicitly converted to size_t here */
    return 0;
}

P.S.: memcpy takes its length parameter as a size_t, which is an unsigned integer type. A negative len passes the len > 256 check, is then converted to size_t, loses its negative sign, and wraps around to a very large positive number, causing memcpy() to read past the bounds of buf (and write past the end of data).
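Two possible repairs of store_data are sketched below, as an illustration rather than the book's own fix. One makes the length unsigned from the start, so there is no negative value to slip past the bounds check; the other keeps the signed parameter but rejects negative values before the implicit conversion to size_t. The names store_data_unsigned, store_data_signed, buffer and BUF_SIZE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 256

static char buffer[BUF_SIZE];

/* Hypothetical fix 1: make the length unsigned from the start.
   A caller passing -1 now gets SIZE_MAX, which fails the bounds check. */
int store_data_unsigned(const char *buf, size_t len)
{
    if (len > sizeof buffer)
        return -1;
    memcpy(buffer, buf, len);
    return 0;
}

/* Hypothetical fix 2: keep the signed parameter but reject negative
   values before len is converted to size_t for memcpy. */
int store_data_signed(const char *buf, int len)
{
    if (len < 0 || len > BUF_SIZE)
        return -1;
    memcpy(buffer, buf, (size_t)len);
    return 0;
}

void example_store_data(void)
{
    char src[BUF_SIZE] = "hello";
    assert(store_data_unsigned(src, 6) == 0);
    assert(store_data_unsigned(src, (size_t)-1) == -1);  /* huge length */
    assert(store_data_signed(src, 6) == 0);
    assert(store_data_signed(src, -1) == -1);            /* negative length */
}
```

Either way, the comparison is done in the same signedness as the value memcpy eventually receives, which closes the gap the P.S. describes.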

1-108

Static Type Checking

• If all bindings of variables to types are static in a language, then type checking can nearly always be done statically.

1-109

Dynamic Type Checking

• Dynamic type binding requires type checking at run time, which is called dynamic type checking.
– Some languages, such as JavaScript and PHP, because of their dynamic type binding, allow only dynamic type checking.

1-110

Pros and Cons of Static Type Checking

• It is better to detect errors at compile time than at run time, because the earlier correction is usually less costly.

• The penalty for static checking is reduced programmer flexibility.

1-111

Type Checking for Memory Cells That Can Store Values of Different Types

• Type checking is complicated when a language allows a memory cell to store values of different types at different times during execution.
– Such memory cells can be created with Ada variant records, Fortran EQUIVALENCE, and C and C++ unions.

1-112

Type Checking for Variables That Can Store Values of Different Types Must Be Dynamic

• For variables that can store values of different types, type checking, if done, MUST be dynamic and requires the run-time system to maintain the type of the current value of such memory cells.

• So even though all variables are statically bound to types in languages such as C++, not all type errors can be detected by static type checking.
– For example, a statically bound C++ variable may be of a union type.
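C itself keeps no record of which union member was stored last, so it cannot do this dynamic check. A programmer can approximate it by storing a tag next to the union and checking it before every read, similar in spirit to the discriminant of an Ada variant record but written by hand. The sketch below assumes only two possible types; struct tagged, set_int, set_float and get_float are hypothetical names.

```c
#include <assert.h>

/* Which member of the union currently holds a value. */
enum value_tag { TAG_INT, TAG_FLOAT };

/* A discriminated ("tagged") union: the tag is the run-time record of the
   current value's type that a dynamic type check needs. C does not
   maintain or check it; the functions below do so by hand. */
struct tagged {
    enum value_tag tag;
    union {
        int i;
        float f;
    } u;
};

void set_int(struct tagged *t, int v)
{
    t->tag = TAG_INT;
    t->u.i = v;
}

void set_float(struct tagged *t, float v)
{
    t->tag = TAG_FLOAT;
    t->u.f = v;
}

/* Dynamic type check before use: returns 0 and stores the value in *out
   only if a float is currently stored; returns -1 (a detected type error)
   otherwise. */
int get_float(const struct tagged *t, float *out)
{
    if (t->tag != TAG_FLOAT)
        return -1;
    *out = t->u.f;
    return 0;
}

void example_tagged_union(void)
{
    struct tagged t;
    float x = 0.0f;
    int rc;

    set_int(&t, 123);
    rc = get_float(&t, &x);
    assert(rc == -1);              /* int read as float: caught at run time */

    set_float(&t, 1.5f);
    rc = get_float(&t, &x);
    assert(rc == 0 && x == 1.5f);
}
```

The check costs a stored tag and a comparison on every access, and it only works if all code goes through these functions; nothing stops direct access to t.u.f.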

1-113

Strongly Typed Programming Languages

• A programming language is strongly typed if type errors are always detected.

• The above requires that the types of all operands can be determined either at compile time or at run time.

1-114

The Importance of Strongly Typed Languages

• The importance of strong typing lies in its ability to detect ALL misuses of variables that result in type errors.

• A strongly typed language also allows the detection, at run time, of uses of the incorrect type values in variables that can store values of more than one type.

1-115

Fortran 95 Is Not Strongly Typed

• In Fortran 95 the use of Equivalence between variables of different types allows a variable of one type to refer to a value of a different type, without the system being able to check the type of the value when one of the Equivalenced variables is referenced or assigned.

1-116

Explanation: Fortran 95 Is Not Strongly Typed

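Integer A
Real R
Equivalence (A, R)
A = 123

(Figure: A and R are two names for the same memory cell, which now holds 123.)

• The cell holds the integer representation of 123, not the floating-point representation of 123.0; a reference to R nevertheless interprets those bits as a real value, so a type error occurs that the system cannot detect.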

1-117

C and C++ Are Not Strongly Typed Languages

• C and C++ are not strongly typed languages because
– both allow functions for which parameters are not type checked.
– Furthermore, the union types of these languages are not type checked.
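A C union permits the same unchecked aliasing as the Fortran EQUIVALENCE example above. The sketch stores 123 through an int member and reads it back through a float member; C allows this read and simply reinterprets the bytes (C++ instead treats reading an inactive member as undefined behavior). It assumes int and float have the same size, checked with a C11 _Static_assert, and that float is IEEE 754, in which case the value read is a tiny subnormal number rather than 123.0. The names union equiv and reinterpret_int_as_float are hypothetical.

```c
#include <assert.h>

/* Assumption made explicit: the two members occupy the same number of
   bytes, as they do on common platforms. */
_Static_assert(sizeof(int) == sizeof(float), "int and float differ in size");

/* Analogue of Fortran's EQUIVALENCE (A, R): both members name one memory
   cell, and nothing records which member was stored last. */
union equiv {
    int a;
    float r;
};

/* Store through the int member, read through the float member. No type
   check happens; the int's bit pattern is simply reinterpreted. */
float reinterpret_int_as_float(int value)
{
    union equiv cell;
    cell.a = value;
    return cell.r;
}

void example_equivalence(void)
{
    float r = reinterpret_int_as_float(123);
    /* With IEEE 754 floats these bits encode a tiny subnormal number. */
    assert(r != 123.0f);
}
```

Neither the compiler nor the run-time system reports anything here, which is why unions count against strong typing in C and C++.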

1-118

Coercion Rules vs. Type Checking

• The coercion rules of a language have an important effect on the value of type checking.
– For example, expressions are strongly typed in Java.
– However, an arithmetic operator with one floating-point operand and one integer operand is legal.
– The value of the integer operand is coerced to floating-point, and a floating-point operation takes place.

• Even though the above is usually what the programmer intends, the coercion also results in a loss of part of the reason for strong typing: error detection (see next slide).

1-119

The Value of Strong Typing Is Weakened by Coercion

• Suppose a program written in a strongly typed language had the int variables a and b and the float variable d.

• Now, if a programmer meant to type a + b but mistakenly typed a + d, the compiler would not detect the error; the value of a would simply be coerced to float.

• Such coercions weaken the value of strong typing.

1-120

Coercion Reduces Reliability

• Languages with a great deal of coercion, like Fortran, C, and C++, are less reliable at detecting type errors than those with little coercion, such as Ada.

• Java and C# have half as many assignment type coercions as C++, so their error detection is better than that of C++, but still not nearly as effective as that of Ada.