6. semantic analysis

44
6. Semantic Analysis

Upload: gerek

Post on 23-Feb-2016

92 views

Category:

Documents


0 download

DESCRIPTION

6. Semantic Analysis. Semantic Analysis Phase Purpose : compute additional information needed for compilation that is beyond the capabilities of Context-Free Grammars and Standard Parsing Algorithms - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: 6. Semantic Analysis

6. Semantic Analysis

Page 2: 6. Semantic Analysis

• Semantic Analysis Phase– Purpose: compute additional information needed for

compilation that is beyond the capabilities of Context-Free Grammars and Standard Parsing Algorithms

– Static semantic analysis: Take place prior to execution (Such as building a symbol table performing type inference and type checking)

• Classification– Analysis of a program required by the rules of the

programming language to establish its correctness and to guarantee proper execution

– Analysis performed by a compiler to enhance the efficiency of execution of the translated program

Page 3: 6. Semantic Analysis

• Description of the static semantic analysis – Attribute grammar

• identify attributes of language entities that must be computed and to write attribute equations or semantic rules that express how the computation of such attributes is related to the grammar rules of the language.

– Which is most useful for languages that obey the principle of Syntax-Directed Semantics

– Abstract syntax as represented by an abstract syntax tree

Page 4: 6. Semantic Analysis

Contents

6.1 Attributes and Attribute Grammars 6.3 The Symbol Table6.4 Data Types and Type Checking

Page 5: 6. Semantic Analysis

6.1 Attributes and Attribute Grammars

Page 6: 6. Semantic Analysis

• Attributes – Any property of a programming language construct such as

• The data type of a variable• The value of an expression• The location of a variable in memory• The object code of a procedure• The number of significant digits in a number

• Binding of the attribute– The process of computing an attribute and associating its

computed value with the language construct• Binding time

– The time during the compilation/execution process when the binding of an attribute occurs

– Based on the difference of the binding time, attributes is divided into Static attributes (be bound prior to execution) and Dynamic attributes (be bound during execution)

Page 7: 6. Semantic Analysis

• Example: The binding time and significance during compilation of the attributes. – Attribute computations are extremely varied– Type checker

• A Type Checker is a semantic analyzer that computes the data type attribute of all language entities for which data types are defined and verifies that these types conform to the type rules of the language.

• In a language like C or Pascal, is an important part of semantic analysis; .

• While in a language like LISP , data types are dynamic, LISP compiler must generate code to compute types and perform type checking during program execution.

– The values of expressions• Usually dynamic and the be computed during execution; • But sometime can also be evaluated during compilation (constant

folding).

Page 8: 6. Semantic Analysis

6.1.1 Attribute Grammars

Page 9: 6. Semantic Analysis

• In syntax-directed semantics, attributes are associated directly with the grammar symbols of the language.

• X.a means the value of ‘a’ associated to ‘X’– X is a grammar symbol and a is an attribute associated to X

• Syntax-directed semantics: – Attributes are associated directly with the grammar

symbols of the language. – Given a collection of attributes a1, …, ak, it implies that for

each grammar rule X0X1X2…Xn (X0 is a nonterminal), – The values of the attributes Xi.aj of each grammar symbol Xi

are related to the values of the attributes of the other symbols in the rule.

Page 10: 6. Semantic Analysis

• An attribute grammar for attributes a1, a2, … ,ak is the collection of all attribute equations or semantic rules of the following form, – for all the grammar rules of the language.

Xi.aj = fij(X0.a1,…,X0.ak, …,X1.al,….. X1.ak …, Xn-1.a1, …Xn.ak)• Where fij is a mathematical function of its arguments

• Typically, attribute grammars are written in tabular form as follows:Grammar Rule Semantic RulesRule 1 Associated attribute equations ...Rule n Associated attribute equation

Page 11: 6. Semantic Analysis

Example 6.1 consider the following simple grammar for unsigned numbers:Number number digit | digitDigit 0|1|2|3|4|5|6|7|8|9

The most significant attribute: numeric value (write as val), and the responding attribute grammar is as follows:

Grammar Rule Semantic Rules Number1number2 digit number1.val = number2.val*10+digit.val Numberdigit number.val= digit.val digit0 digit.val = 0 digit1 digit.val = 1 digit2 digit.val = 2 digit3 digit.val = 3 digit4 digit.val = 4 digit5 digit.val = 5 digit6 digit.val = 6 digit7 digit.val = 7 digit8 digit.val = 8 digit9 digit.val = 9 table 6.1

Page 12: 6. Semantic Analysis

The parse tree showing attribute computations for the number 345 is given as follows

number (val = 34*10+5 =345) number digit (val = 3*10+4=34) (val = 5)

number digit 5 (val=3) (val=4)

digit 4 3 Fig 6.1

Page 13: 6. Semantic Analysis

Example 6.2 consider the following grammar for simple integer arithmetic expressions:

Exp exp + term | exp-term | termTerm term*factor | factorFactor (exp)| number

The principal attribute of an exp (or term or factor) is its numeric value (write as val) and the attribute equations for the val attribute are given as follows

Grammar Rule Semantic Rules exp1exp2+term exp1.val=exp2.val+term.val exp1exp2-term exp1.val=exp2.val-term.val exp1 term exp1.val= term.val term1term2*factor term1.val=term2.val*factor.val termfactor term.val=factor.val factor(exp) factor.val=exp.val factornumber factor.val=number.val

table 6.2

Page 14: 6. Semantic Analysis

Given the expression (34-3)*42 , the computations implied by this attribute grammar by attaching equations to nodes in a parse tree is as follows

Fig6.2

Exp

(val = 1302)

term

(val = 31*42=1302)

term

(val = 31) factor

(val = 42)

Exp

( (val = 34-3=31) )

Exp

(val = 34) term

(val = 3)

factor

(val = 31) number

(val = 42)

term

(val = 34) factor

(val =3)

factor

(val = 34) number

(val = 3)

number

(val = 34)

-

*

Page 15: 6. Semantic Analysis

Example 6.3 consider the following simple grammar of variable declarations in a C-like syntax:

Decl type var-listTypeint | floatVar-listid, var-list |id

Define a data type attribute for the variables given by the identifiers in a declaration and write equations expressing how the data type attribute is related to the type of the declaration as follows: (We use the name dtype to distinguish the attribute from the nonterminal type)

Grammar Rule Semantic Rules decltype var-list var-list.dtype = type.dtype typeint type.dtype = integer type float type.dtype = real var-list1id,var-list2 id.dtype = var-list1.dtype var-list2.dtype= var-list1.dtype var-listid id.dtype = var-list.dtype

table 6.3

Page 16: 6. Semantic Analysis

Note that there is no equation involving the dtype of the nonterminal decl. It is not necessary for the value of an attribute to be specified for all grammar symbols Parse tree for the string float x,y showing the dtype attribute as specified by the attribute grammar above is as follows:

Fig6.3

Attribute grammars may involve several interdependent attributes.

decl

Type (dtype = real)

Var-list (dtype = real)

Var-list (dtype = real)

Id (x)

(dtype = real) Id (y)

(dtype = real)

float ,

Page 17: 6. Semantic Analysis

Example 6.4 consider the following grammar, where numbers may be octal or decimal, suppose this is indicated by a one-character suffix o(for octal) or d(for decimal):

Based-num num basecharBasechar o|dNum num digit | digitDigit 0|1|2|3|4|5|6|7|8|9

In this case num and digit require a new attribute base, which is used to compute the val attribute. The attribute grammar for base and val is given as follows.

Page 18: 6. Semantic Analysis

Grammar Rule Semantic Rules Based-numnum basechar Based-num.val = num.val Based-num.base = basechar.base Basechar o Basechar.base = 8 Basechar d Basechar.base = 10 Num1num2 digit num1.val = If digit.val = error or num2.val = error

Then error Else num2.val*num1.base+digit.val Num2.base = num1.base Digit.base = num1.base Num digit num.val = digit.val Digit.base = num.base Digit 0 digit.val = 0 Digit 1 digit.val = 1 … … Digit 7 digit.val = 7 Digit 8 digit.val = if digit.base = 8 then error else 8 Digit 9 digit.val = if digit.base = 8 then error else 9

Tab 6.4

Page 19: 6. Semantic Analysis

Fig6.4

num

(val = 3)

(base = 8)

digit

(val = 3)

(base = 8)

3

Based-num

(val = 229)

num

(val = 28*8+5=229)

(base = 8)

basechar

(base = 8)

num

(val = 3*8+4=28)

(base=8)

digit

(val = 4)

(base = 8)

o

4

digit

(val = 5)

(base = 8)

5

345o

Page 20: 6. Semantic Analysis

6.3 Symbol Table

Page 21: 6. Semantic Analysis

• The symbol table is the major inherited attribute in a compiler and, after syntax tree, forms the major data structure.

• The operations of the Symbol table areinsert: is used to store the information provided by name

declarations when processing these declarations.

lookup: is needed to retrieve the information associated to a name when that name is used to associate code

delete: is needed to remove the information provided by a declaration when that declaration no longer applies

Page 22: 6. Semantic Analysis

The structure of the symbol table• The symbol table in a compiler is a typical dictionary

data structure include • linear lists :constant time insert operations, can be

chosen if speed is not a concern

• various search tree structures( binary search trees, AVL trees, B trees):do not provide best case efficiency and complexity of delete operations

• hash tables: best choice because all three operations can be performed at constant time.

Page 23: 6. Semantic Analysis

• The best scheme for compiler construction is to open addressing, called separate chaining.

• In this method each bucket is actually a linear list, and collisions are resolved by inserting the new item into the bucket list.

Page 24: 6. Semantic Analysis

• separate chaining.– The size of the linear list ranges from few hundred to

over a thousand– Size of the bucket array should be chosen to be a prime

number.• Hash function for a symbol table

– Convert the character string in to an integer between 0…..size-1

• Convert each character in to a non negative integer– C language uses ASCII values – Pascal uses ord function

• Combine all the integers in to a single integer– Programmers choice

• Resulting integer should be scaled to the range of 0……size-1

Page 25: 6. Semantic Analysis

• separate chaining.– Mod function is used to scale a number to fall in to the

range of 0…size-1.– Combine all the integers in to a single integer

• One simple method is to consider first few characters or first middle and last characters and ignore the rest

– Inadequate: as programmers can use variables like temp1,temp2… and will cause frequent collisions

• Chosen method should reduce the collisions• One good method is to multiplying factor(some constant

number )– If ci is the numeric value of the (i+1)th character and hi is the partial is

the partial hash value computed at the (i)th step then » h(i+1) = hi+c» This can be generalized to » H=( n-1c1+ n-2c2+……. cn-1+cn)mod size

Page 26: 6. Semantic Analysis

• separate chaining– The choice of the constant in this formula has

significant effect on the out come • A reasonable choice for the constant is a number which is

power of 2 (like 2,4,8,16,32,64,128…)

Page 27: 6. Semantic Analysis

• Declarations.– There is no specific standard to create a symbol table for all

compilers.• Symbol table will vary from compiler to compiler and it depends on

– How each compiler Calls insert and delete operations– How the variables are declared in each compiler.– How long the compiler wants the symbol table to exist

– The behavior of the symbol table heavily depends on the properties of declarations of the language being translated.

– There are 4 basic kinds of declarations • Constant ,

– Const int SIZE=199• type

– Structures and unions • variable

– Variable declarations :int a,b[100]

Page 28: 6. Semantic Analysis

• Declarations.– procedure/function

• Are constant declaration of procedure/function type (similar to other variables)

• These variables are explicit in nature (needs to be declared before use)

• In few compliers they are implicit in nature (need not be declared before use)

– Fortran,basic

– We can create one single symbol table for all above 4 declarations

• Constant declarations(value binding) associate values to names.– Saves them in the symbol table and their value will never change in

symbol table– In the execution phase the compiler will change these constant

variables names to values by picking them from the symbol table

Page 29: 6. Semantic Analysis

• Declarations.– We can create one single symbol table for all

above 4 declarations– Type declarations, Variable declaration and

function/procedure declarations also will be saved in the symbol table with minute difference between each other

Page 30: 6. Semantic Analysis

• Scope rules and Block structure:

• In the above program there are five blocks • First block : Variables int i,j and the function f has

scope for entire program(global).• Second block: the variable size has one scope• Third block: variables i and temp has different scope

Page 31: 6. Semantic Analysis

• Scope rules and Block structure:

• In the above program there are five blocks • Third :variables i and temp has different scope

» Variable i is of type char but in global scope it is of type int» Until we come out of this 3rd block (function f) the type of i will

be char. When we come out of 3rd block again variable i will be of type int.

» Compiler is giving more priority to local variables.• When we come out of the block the variable i of type char

will be deleted and again variable i of type int will come in to picture.

Page 32: 6. Semantic Analysis

• Scope rules and Block structure:

• In the above program there are five blocks • Third :variables i and temp has different scope

– When processing this 3rd block the symbol table will look like

Page 33: 6. Semantic Analysis

• Scope rules and Block structure:

• Variable i is saved twice with both types• Variable i of type char is saved first, so when try

to look up for variable i, it will return the char variable only (but not int variable)

• When we come out of the block, i (char) and temp (char) will be deleted from the symbol table

Page 34: 6. Semantic Analysis

• Scope rules and Block structure:

• In the above program there are five blocks • Fourth block :variables j of type double will be added to

the table and the symbol table looks like

Page 35: 6. Semantic Analysis

• Scope rules and Block structure:

• In the above program there are five blocks • At the end of the function f block all the variables

declared in it are deleted and the symbol table will be

Page 36: 6. Semantic Analysis

• Scope rules and Block structure: • One more type of scope declaration is using different

symbol tables for each block (as seen in the previous slides) and the practical implementation will look like

Page 37: 6. Semantic Analysis

37

class Foo { int value; int test() { int b = 3; return value + b; } void setValue(int c) { value = c; { int d = c; c = c + d; value = c; } }}

class Bar { int value; void setValue(int c) {

value = c; }}

scope of value

scope of b

scope of c

scope of c scope of value

scope of d

Symbol table example

block1

Page 38: 6. Semantic Analysis

38

class Foo { int value; int test() { int b = 3; return value + b; } void setValue(int c) { value = c; { int d = c; c = c + d; value = c; } }}

class Bar { int value; void setValue(int c) {

value = c; }}

Symbol table example

Symbol Kind Type Properties

value field int …

test method -> int

setValue method int -> void

(Foo)

Symbol Kind Type Properties

b var int …

(test)Symbol Kind Type Properties

c var int …

(setValue)

Symbol Kind Type Properties

d var int …

(block1)

Page 39: 6. Semantic Analysis

39

Symbol Kind Type Properties

value field int …

test method -> int

setValue method int -> void

Symbol Kind Type Properties

b var int …

Symbol Kind Type Properties

c var int …

Symbol Kind Type Properties

d var int …

(Foo)

(test)

…Symbol table example cont.

(setValue)

(block1)

Page 40: 6. Semantic Analysis

40

Checking scope rulesSymbol Kind Type Properties

value field int …

test method -> int

setValue method int -> void

Symbol Kind Type Properties

b var int …

Symbol Kind Type Properties

c var int …

Symbol Kind Type Properties

d var int …

(Foo)

(test) (setValue)

(block1)void setValue(int c) { value = c; { int d = c; c = c + d; value = c; }}

lookup(value)

Page 41: 6. Semantic Analysis

41

Symbol Kind Type Properties

value field int …

test method -> int

setValue method int -> void

Symbol Kind Type Properties

b var int …

Symbol Kind Type Properties

c var int …

Symbol Kind Type Properties

d var int …

(Foo)

(test) (setValue)

(block1)void setValue(int c) { value = c; { int d = c; c = c + d; myValue = c; }}

lookup(myValue)

Error !undefined

symbol

Catching semantic errors

Page 42: 6. Semantic Analysis

6.4 Data Types and Type Checking

Page 43: 6. Semantic Analysis

Principal Task of CompilerType inference

The computation and maintenance of information on data types

Type checking • Type checking uses logical rules to reason about the behavior of a

program at run time. Specifically, it ensures that the types of the operands match the type expected by an operator. For example, the && operator in Java expects its two operands to be booleans; the result is also of type boolean.

The use of the information to ensure that each part of the program makes sense under the type rules of the language.

These two tasks are related, performed together, and referred as type checking.

Page 44: 6. Semantic Analysis

Data Type

• Data type information may be either static or dynamic, or a combination of two.

Ex: In LISP type inference and type checking will be done during the execution(dynamic)

Ex: In Pascal, C and Ada type information is primarily static, and is used as the principal mechanism for checking the correctness of the program before execution(static)