Chap. 8, Declaration Processing and Symbol Tables

J. H. Wang

Dec. 13, 2011

Outline

• Constructing a Symbol Table
• Block-Structured Languages and Scopes
• Basic Implementation Techniques
• Advanced Features
• Declaration Processing Fundamentals
• Variable and Type Declarations
• Class and Method Declarations
• An Introduction to Type Checking

Constructing a Symbol Table

• We walk (make a pass over) the AST for two purposes
  – To process symbol declarations
  – To connect each symbol reference with its declaration

• An AST node is enriched with a reference to the name’s entry in the symbol table

Static Scoping

• Static scope: the scope of an identifier includes its defining block as well as any contained blocks that do not contain a declaration for the identifier

• Global scope: a name space shared by all compilation units

• Scopes might be opened and closed by braces ({ } as in C and Java), or by reserved keywords (begin and end as in Ada, Algol)

A Symbol Table Interface

• Methods
  – OpenScope()
  – CloseScope()
  – EnterSymbol(name, type)
  – RetrieveSymbol(name)
  – DeclaredLocally(name)
• Ex.
  – (Fig. 8.2) Code to build the symbol table for the AST in Fig. 8.1
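A minimal sketch of how the five interface methods might drive a pass over an AST. It is not the code of Fig. 8.2: the tuple-based AST shape, the `build` function, and the use of Python's `collections.ChainMap` are all assumptions made for illustration.

```python
from collections import ChainMap


class SymbolTable:
    """Scoped table built on ChainMap: one mapping per open scope."""

    def __init__(self):
        self.scopes = ChainMap()              # starts with the outermost scope

    def open_scope(self):
        self.scopes = self.scopes.new_child()

    def close_scope(self):
        self.scopes = self.scopes.parents

    def enter_symbol(self, name, type_):
        if self.declared_locally(name):
            raise KeyError(f"duplicate declaration of {name}")
        self.scopes[name] = {"name": name, "type": type_}

    def retrieve_symbol(self, name):
        return self.scopes.get(name)          # innermost declaration, or None

    def declared_locally(self, name):
        return name in self.scopes.maps[0]


def build(node, table):
    """Walk a hypothetical AST: ('block', [children]), ('decl', name, type), ('ref', name)."""
    kind = node[0]
    if kind == "block":
        table.open_scope()
        out = ("block", [build(child, table) for child in node[1]])
        table.close_scope()
        return out
    if kind == "decl":
        table.enter_symbol(node[1], node[2])
        return node
    if kind == "ref":
        # Enrich the reference with its symbol table entry (None if undeclared).
        return ("ref", node[1], table.retrieve_symbol(node[1]))
    raise ValueError(kind)


ast = ("block", [
    ("decl", "x", "int"),
    ("block", [("decl", "x", "float"), ("ref", "x")]),
    ("ref", "x"),
])
annotated = build(ast, SymbolTable())
inner_ref = annotated[1][1][1][1]   # reference inside the nested block
outer_ref = annotated[1][2]         # reference in the outer block
```

In this sketch the inner reference is linked to the `float` declaration and the outer one to the `int` declaration.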

Block-Structured Languages and Scopes

• Block-structured languages: languages that allow nested name scopes
  – Concepts introduced by Algol 60

• Handling Scopes
  – Current scope: the innermost context
  – Open scopes (or currently active scopes): the current scope and its surrounding scopes
  – Closed scopes: all other scopes

• Some common visibility rules
  – Accessible names are those in the current scope and in all other open scopes
  – If a name is declared in more than one scope, then a reference to the name is resolved to the innermost declaration
  – New declarations can be made only in the current scope

• Global scope
  – Extern: in C
  – Public static: in Java
• Compilation-unit scope: in C and C++
  – Declared outside of all methods
• Package-level scope: in Java
• In C, every function definition is available in the global scope, unless it has the static attribute
• In C++ and Java, names declared within a class are available to all methods in the class
  – Protected members are available to the class’s subclasses
• Names declared within a statement-block are available to all contained blocks, unless they are redeclared in an inner scope

One Symbol Table or Many?

• Two common approaches to implementing block-structured symbol tables
  – A symbol table associated with each scope
  – Or a single, global table

An Individual Table for Each Scope

• Because name scopes are opened and closed in a last-in first-out (LIFO) manner, a stack is an appropriate data structure for a search
  – The innermost scope appears at the top of the stack
  – OpenScope(): pushes a new symbol table
  – CloseScope(): pops the top symbol table
  – (Fig. 8.3)

• Disadvantage
  – Need to search for a name in a number of symbol tables
  – Cost depends on the number of nonlocal references and the depth of nesting
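The stack-of-tables approach and its lookup cost can be sketched as follows. The class name `ScopeStack` and the convention of returning the number of tables searched are assumptions for illustration, not part of the interface on the slides.

```python
class ScopeStack:
    """One dictionary per scope; the innermost scope is at the top of the stack."""

    def __init__(self):
        self.tables = [{}]          # bottom of the stack is the outermost scope

    def open_scope(self):
        self.tables.append({})      # push a new symbol table

    def close_scope(self):
        self.tables.pop()           # discard the innermost symbol table

    def enter_symbol(self, name, type_):
        self.tables[-1][name] = type_

    def retrieve_symbol(self, name):
        """Return (type, number of tables searched), innermost scope first."""
        for searched, table in enumerate(reversed(self.tables), start=1):
            if name in table:
                return table[name], searched
        return None, len(self.tables)


st = ScopeStack()
st.enter_symbol("g", "int")         # declared in the outermost scope
for _ in range(3):
    st.open_scope()                 # three nested scopes
st.enter_symbol("local", "float")

local_lookup = st.retrieve_symbol("local")   # found in the first table searched
global_lookup = st.retrieve_symbol("g")      # found only after searching all four tables
```

A nonlocal reference made at nesting depth d may touch up to d + 1 tables here, which is the cost the disadvantage above refers to.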

One Symbol Table

• All names in the same table
  – Uniquely identified by the scope name or depth
• RetrieveSymbol() need not chain through scope tables to locate a name
• More details in Sec. 8.3.3
  – (Fig. 8.8)

Basic Implementation Techniques

• Entering and Finding Names
• The Name Space
• An Efficient Symbol Table Implementation

Entering and Finding Names

• Examine the time needed to insert symbols, retrieve symbols, and maintain scopes
  – In particular, we pay attention to the cost of retrieving symbols
  – Names can be declared no more than once in each scope, but are typically referenced multiple times

• Various approaches
  – Unordered list
  – Ordered list
  – Binary search trees
  – Balanced trees
  – Hash tables

Unordered List

• Simplest
  – Array
  – Linked list or resizable array
• All symbols in a given scope appear adjacently
  – Insertion: fast
  – Retrieval: linear scan

• Impractically slow

Ordered List

• Binary search: O(log n)
  – How to organize the ordered lists for a name in multiple scopes?
    • An ordered list of stacks (Fig. 8.4)
  – RetrieveSymbol() locates stacks using binary search
  – CloseScope() examines each stack and pops those stacks whose top symbol is declared in the abandoned scope
    • To avoid such checking, we maintain a separate linking of symbol table entries that are declared at the same scope level (Sec. 8.3.3)
• Fast retrieval, but expensive insertion
  – Advantageous when the space of symbols is known
    • Reserved keywords
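The ordered list of stacks can be sketched with Python's `bisect` module. The class name `OrderedListTable` and the `(depth, type)` stack entries are assumptions for illustration; Fig. 8.4 may organize the data differently.

```python
import bisect


class OrderedListTable:
    def __init__(self):
        self.names = []     # identifier names, kept sorted
        self.stacks = []    # stacks[i] holds (depth, type) for names[i], innermost last
        self.depth = 0

    def open_scope(self):
        self.depth += 1

    def close_scope(self):
        # Examine every stack and pop entries declared in the abandoned scope.
        for stack in self.stacks:
            if stack and stack[-1][0] == self.depth:
                stack.pop()
        self.depth -= 1

    def enter_symbol(self, name, type_):
        i = bisect.bisect_left(self.names, name)
        if i == len(self.names) or self.names[i] != name:
            self.names.insert(i, name)      # O(n) shift: the expensive insertion
            self.stacks.insert(i, [])
        self.stacks[i].append((self.depth, type_))

    def retrieve_symbol(self, name):
        i = bisect.bisect_left(self.names, name)    # O(log n) binary search
        if i < len(self.names) and self.names[i] == name and self.stacks[i]:
            return self.stacks[i][-1][1]
        return None


t = OrderedListTable()
t.enter_symbol("x", "int")
t.enter_symbol("a", "char")
t.open_scope()
t.enter_symbol("x", "float")
inner = t.retrieve_symbol("x")      # innermost declaration
t.close_scope()
outer = t.retrieve_symbol("x")      # outer declaration is visible again
```

Note that `close_scope` in this sketch walks every stack, which is the checking that the separate same-level linking of Sec. 8.3.3 is meant to avoid.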

Binary Search Trees

• Insert, search: O(log n), given random inputs
  – Average-case performance does not necessarily hold for symbol tables
  – Programmers do not choose identifiers at random!
• Advantage
  – Simple, widely known implementation
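To see why the average case may not hold, the sketch below inserts 200 systematically named identifiers into an unbalanced binary search tree, once in declaration order and once shuffled. The naming pattern `tmp000`, `tmp001`, ... and the list-based node layout are assumptions chosen for illustration.

```python
import random


def insert(tree, key):
    """Insert key into an unbalanced BST stored as [key, left, right] lists."""
    if tree is None:
        return [key, None, None]
    node = tree
    while True:
        side = 1 if key < node[0] else 2    # 1 = left child, 2 = right child
        if node[side] is None:
            node[side] = [key, None, None]
            return tree
        node = node[side]


def height(tree):
    """Number of nodes on the longest root-to-leaf path, computed iteratively."""
    best, stack = 0, [(tree, 1)]
    while stack:
        node, depth = stack.pop()
        if node is not None:
            best = max(best, depth)
            stack.append((node[1], depth + 1))
            stack.append((node[2], depth + 1))
    return best


def build(keys):
    tree = None
    for key in keys:
        tree = insert(tree, key)
    return tree


names = [f"tmp{i:03d}" for i in range(200)]    # already in sorted order
shuffled = names[:]
random.Random(0).shuffle(shuffled)

sorted_height = height(build(names))        # degenerates into a linked list
shuffled_height = height(build(shuffled))   # far shallower than the sorted case
```

With names entered in sorted order every node has only a right child, so a search costs O(n) rather than O(log n).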

Balanced Trees

• The worst-case scenario for binary search trees can be avoided if the tree is balanced
  – E.g.: red-black trees, splay trees
  – Insert, search: O(log n)

Hash Tables

• Most common, due to their excellent performance
  – Insert, search: O(1), given
    • A sufficiently large table
    • A good hash function
    • Appropriate collision-handling techniques
  – (Sec. 8.3.3)

The Name Space

• Properties to consider
  – The name of a symbol does not change during compilation
  – Symbol names persist throughout compilation
  – Great variance in the length of identifier names
  – Unless an ordered list is maintained, comparisons of symbol names involve only equality and inequality
• In favor of one logical name space (Fig. 8.5)
• Names are inserted, but never deleted
• Two fields
  – Origin
  – Length
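One way to picture a single logical name space is a growing character buffer in which each name is identified by its origin and length. The sketch below is an assumption about how such a structure could look, not a transcription of Fig. 8.5; in particular the `index` dictionary used to avoid storing a name twice is a convenience of the sketch.

```python
class NameSpace:
    def __init__(self):
        self.buffer = []    # characters of all names; appended to, never deleted
        self.index = {}     # name -> (origin, length), so each name is stored once

    def intern(self, name):
        """Store name if it is new and return its (origin, length) pair."""
        if name not in self.index:
            self.index[name] = (len(self.buffer), len(name))
            self.buffer.extend(name)
        return self.index[name]

    def text(self, ref):
        """Recover the spelling of a name from its (origin, length) pair."""
        origin, length = ref
        return "".join(self.buffer[origin:origin + length])


ns = NameSpace()
a = ns.intern("i")
b = ns.intern("averageTemperature")
c = ns.intern("i")                  # same name, same (origin, length) pair
```

Because a name is stored once and never moves, two occurrences can be tested for equality by comparing their pairs, and short and long identifiers share the same storage without fixed-size slots.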

An Efficient Symbol Table Implementation

• A symbol table entry containing
  – Name
  – Type
  – Hash
  – Var
  – Level
  – Depth
• Two index structures
  – Hash table
  – Scope display
    • Symbols at the same level
  – (Fig. 8.7) & (Fig. 8.8)
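A hedged sketch of a single table indexed by both a hash table and a scope display. The reading of the fields is an assumption: `var` is taken to link to the hidden outer declaration of the same name, `level` to chain the symbols declared in the same scope, and `depth` to record the nesting depth. A Python `dict` stands in for the hash table, so the `hash` collision chain is not modeled.

```python
class Entry:
    def __init__(self, name, type_, depth, var, level):
        self.name, self.type, self.depth = name, type_, depth
        self.var = var        # hidden outer declaration of the same name, if any
        self.level = level    # next entry declared in the same scope


class SingleTable:
    def __init__(self):
        self.hash_table = {}            # name -> innermost visible Entry
        self.scope_display = [None]     # scope_display[d] -> head of the chain at depth d
        self.depth = 0

    def open_scope(self):
        self.depth += 1
        self.scope_display.append(None)

    def close_scope(self):
        entry = self.scope_display.pop()
        while entry is not None:        # visit only symbols of the closing scope
            if entry.var is not None:
                self.hash_table[entry.name] = entry.var   # restore outer declaration
            else:
                del self.hash_table[entry.name]
            entry = entry.level
        self.depth -= 1

    def enter_symbol(self, name, type_):
        old = self.hash_table.get(name)
        if old is not None and old.depth == self.depth:
            raise KeyError(f"duplicate declaration of {name}")
        entry = Entry(name, type_, self.depth, old, self.scope_display[self.depth])
        self.scope_display[self.depth] = entry
        self.hash_table[name] = entry

    def retrieve_symbol(self, name):
        return self.hash_table.get(name)    # one lookup, no chaining through scopes


t = SingleTable()
t.enter_symbol("x", "int")
t.open_scope()
t.enter_symbol("x", "float")
t.enter_symbol("y", "char")
inner_type = t.retrieve_symbol("x").type
t.close_scope()
outer_type = t.retrieve_symbol("x").type
y_after = t.retrieve_symbol("y")
```

Retrieval is a single hash lookup regardless of nesting depth, and closing a scope touches only the entries threaded on that scope's level chain.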

Advanced Features

• Extensions of the simple symbol table framework to accommodate advanced features of modern programming languages
  – Name augmentation (overloading)
  – Name hiding and promotion
  – Modification of search rules

Records and Typenames

• Aggregate data structures
  – struct, record
• E.g. a.b.c.d
  – C, Ada, Pascal: completely specifying the containers and the field
  – COBOL, PL/I: intermediate containers can be omitted if the reference can be unambiguously resolved
    » a.c, c.d: more difficult to read
• Can be nested arbitrarily deeply
  – Tree
• typedef: alias for a type
  – Symbol table

Overloading and Type Hierarchies

• Method overloading is allowed in object-oriented languages such as C++ and Java
  – If each definition has a unique type signature
    • Number and types of the parameters and return type
      – E.g.: print(int), print(String)
  – We view a method definition not only in terms of its name but also its type signature
    • Either encode the type signature of a method along with its name
      – E.g.: M(int): void
    • Or record a method along with a list of its overloaded definitions

• Operator overloading: allowed in C++, Ada

• Ada allows literals to be overloaded
  – E.g.: diamond in two different enumeration types: as a playing card, and as a gem

• Pascal, Fortran allow the same symbol to represent the invocation of a method and the value of the method’s result– Two entries in the symbol table

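• C: the same name can be used as a local variable, a struct name, and a label
  – E.g.: (in Ex. 17)
        int main(void) {
            struct xxx { int a, b; } c;   /* xxx as a struct name */
            int xxx;                      /* xxx as a local variable */
        xxx: c.a = 1;                     /* xxx as a label */
            return 0;
        }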

• Type extension through subclassing allowed in Java, C++
  – resize(Shape) vs. resize(Rectangle)
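The two ways of recording overloaded methods mentioned above (encoding the signature into the name, or keeping a list of definitions per name) can be sketched together. The `mangle` format follows the M(int): void example; the `MethodTable` class and its exact-match `resolve` rule are assumptions for illustration and ignore conversions and subclassing.

```python
def mangle(name, param_types, return_type):
    """Encode a method's signature along with its name, e.g. 'M(int): void'."""
    return f"{name}({', '.join(param_types)}): {return_type}"


class MethodTable:
    def __init__(self):
        self.by_signature = {}   # encoded name -> (name, parameter types, return type)
        self.by_name = {}        # plain name -> list of its overloaded definitions

    def declare(self, name, param_types, return_type):
        key = mangle(name, param_types, return_type)
        if key in self.by_signature:
            raise KeyError(f"duplicate definition of {key}")
        self.by_signature[key] = (name, tuple(param_types), return_type)
        self.by_name.setdefault(name, []).append(key)

    def resolve(self, name, arg_types):
        """Pick the overload whose parameter types match exactly (no conversions)."""
        for key in self.by_name.get(name, []):
            if self.by_signature[key][1] == tuple(arg_types):
                return key
        return None


mt = MethodTable()
mt.declare("print", ["int"], "void")
mt.declare("print", ["String"], "void")
chosen = mt.resolve("print", ["String"])
unknown = mt.resolve("print", ["float"])
```

Handling a call such as resize(Rectangle) against a definition resize(Shape) would additionally require walking the type hierarchy, which this sketch leaves out.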

Implicit Declarations

• In some languages, the appearance of a name in a certain context serves to declare the name as well
  – E.g.: labels in C
  – In Fortran: the type of an undeclared variable is inferred from the identifier’s first letter
  – In Ada: a loop index is implicitly declared to be of the same type as the range specifier
  – A new scope is opened for the loop so that the loop index cannot clash with an existing variable
    • E.g. for (int i=1; i<10; i++) { … }
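As a small illustration of implicit declaration, the sketch below applies Fortran's default first-letter rule (names beginning with I through N are INTEGER, all others REAL) the first time a name is looked up. The helper name `lookup_or_declare` and the plain dictionary used as the table are assumptions for illustration.

```python
def implicit_type(name):
    """Fortran's default rule: names starting with I through N are INTEGER, others REAL."""
    return "INTEGER" if name[0].upper() in "IJKLMN" else "REAL"


def lookup_or_declare(table, name):
    """Return the type of name, declaring it implicitly on first appearance."""
    if name not in table:
        table[name] = implicit_type(name)
    return table[name]


table = {}
count_type = lookup_or_declare(table, "count")   # starts with C
index_type = lookup_or_declare(table, "index")   # starts with I
```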

Export and Import Directives

• Export: some local scope names are to become visible outside that scope
  – Typically associated with modularization features such as Ada packages, C++ classes, C compilation units, and Java classes
    • Java: public attribute, String class in java.lang package
    • C: all methods are known outside unless the static attribute is specified
• In a large software system, the space of global names can become polluted and disorganized
  – C, C++: Header files
  – Java: import
  – Ada: use

Altered Search Rules

• To alter the way in which symbols are found in the symbol table
  – In Pascal: with R do S
    • First try to resolve an identifier as a field of the record R
    • Advantageous if R is a complex name
    • Can usually generate efficient code
• Forward reference in recursive data structures or methods
  – A portion of the program will reference a definition that has not yet been processed
  – Must be announced in some languages
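The altered search rule of Pascal's with R do S can be sketched as a lookup that tries the fields of the enclosing with records before the ordinary scopes. The `resolve` function and its list-based arguments are assumptions for illustration.

```python
def resolve(name, with_records, scopes):
    """Resolve name: innermost enclosing `with` record first, then the normal scopes."""
    for record_name, fields in reversed(with_records):
        if name in fields:
            return f"{record_name}.{name}"     # the identifier is a field of the record
    for scope in reversed(scopes):             # ordinary innermost-first search
        if name in scope:
            return name
    return None


scopes = [{"r": "record", "x": "integer", "total": "integer"}]
with_records = [("r", {"x": "real", "y": "real"})]

inside_with_x = resolve("x", with_records, scopes)          # field of r hides variable x
inside_with_total = resolve("total", with_records, scopes)  # falls through to the scopes
outside_with_x = resolve("x", [], scopes)                   # no with in effect
```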

Symbol Table Summary

• The symbol table organization in this chapter efficiently represents scope-declared symbols in a block-structured language

• Most languages include rules for symbol promotion to a global scope

• Issues such as inheritance, overloading, and aggregate data types must be considered

Declaration Processing Fundamentals

• Attributes in the symbol table
  – Internal representations of declarations
  – Identifiers are used in many different ways in a modern programming language
    • Variables, constants, types, procedures, classes, and fields
    • Not every identifier has the same set of attributes
  – We need a data structure to store the variety of information
    • Using a struct that contains a tag, and a union for each possible value of the tag
    • Using an object-based approach: an Attributes class and appropriate subclasses
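The object-based alternative can be sketched with a base `Attributes` class and one subclass per kind of identifier. Only the name `Attributes` comes from the slides; the subclass names, their fields, and the `describe` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Attributes:
    name: str


@dataclass
class VariableAttributes(Attributes):
    type_name: str


@dataclass
class ConstantAttributes(Attributes):
    type_name: str
    value: object


@dataclass
class ProcedureAttributes(Attributes):
    param_types: list = field(default_factory=list)
    return_type: str = "void"


def describe(attrs):
    """Dispatch on the subclass, much as a tag would select a union member."""
    if isinstance(attrs, ConstantAttributes):
        return f"constant {attrs.name}: {attrs.type_name} = {attrs.value}"
    if isinstance(attrs, VariableAttributes):
        return f"variable {attrs.name}: {attrs.type_name}"
    if isinstance(attrs, ProcedureAttributes):
        return f"procedure {attrs.name}({', '.join(attrs.param_types)}): {attrs.return_type}"
    return f"identifier {attrs.name}"


symbols = {
    "pi": ConstantAttributes("pi", "float", 3.14159),
    "count": VariableAttributes("count", "int"),
    "max": ProcedureAttributes("max", ["int", "int"], "int"),
}
```

Each subclass carries only the fields that make sense for its kind of identifier, so a variable has no parameter list and a procedure has no constant value.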

Type Descriptor Structures

Type Checking Using an Abstract Syntax Tree

• Using the visitor pattern (in Chap. 7)
  – SemanticsVisitor: a subclass of Visitor
    • The top-level visitor for processing declarations and doing semantic checking on the AST nodes
  – TopDeclVisitor
    • A specialized visitor invoked by SemanticsVisitor for processing declarations
  – TypeVisitor
    • A specialized visitor used to handle an identifier that represents a type or a syntactic form that defines a type (such as an array)

Variable and Type Declarations

• Simple variable declarations
  – A type name and a list of identifiers
    • (Fig. 8.12)
    • Visitor actions: (Fig. 8.13)
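A rough sketch of what a visitor action for a simple variable declaration might do: check the type name, then enter each identifier unless it is already declared in the current scope. This is not the code of Fig. 8.13; the `VarDecl` node, the `visit_var_decl` method, and the error strings are hypothetical, and only the class name `TopDeclVisitor` is taken from the slides.

```python
class VarDecl:
    """Hypothetical AST node: a type name and a list of identifiers."""

    def __init__(self, type_name, identifiers):
        self.type_name = type_name
        self.identifiers = identifiers

    def accept(self, visitor):
        return visitor.visit_var_decl(self)


class TopDeclVisitor:
    def __init__(self, known_types):
        self.known_types = known_types   # type names that are currently visible
        self.scope = {}                  # the current scope only, for brevity
        self.errors = []

    def visit_var_decl(self, node):
        if node.type_name in self.known_types:
            declared_type = node.type_name
        else:
            self.errors.append(f"unknown type {node.type_name}")
            declared_type = "error"      # placeholder type so checking can continue
        for ident in node.identifiers:
            if ident in self.scope:
                self.errors.append(f"{ident} already declared")
            else:
                self.scope[ident] = declared_type


visitor = TopDeclVisitor({"int", "float"})
VarDecl("int", ["a", "b"]).accept(visitor)
VarDecl("float", ["a"]).accept(visitor)      # redeclaration in the same scope
```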

Visit Method

Handling Type Names

Type Declarations

• A name and a description of the type to be associated with it
  – (Fig. 8.15)
  – Visit method: (Fig. 8.16)

Thanks for Your Attention!
