Chap. 8, Declaration Processing and Symbol Tables
J. H. Wang
Dec. 13, 2011
Outline
• Constructing a Symbol Table
• Block-Structured Languages and Scopes
• Basic Implementation Techniques
• Advanced Features
• Declaration Processing Fundamentals
• Variable and Type Declarations
• Class and Method Declarations
• An Introduction to Type Checking
Constructing a Symbol Table
• We walk (make a pass over) the AST for two purposes
  – To process symbol declarations
  – To connect each symbol reference with its declaration
• An AST node is enriched with a reference to the name’s entry in the symbol table
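The two purposes of the walk can be illustrated with a minimal sketch. The node classes (`Decl`, `Ref`, `Block`) and the `walk` function are hypothetical names invented for this example, and Python's `collections.ChainMap` stands in for the symbol table; it is not the book's implementation.

```python
from collections import ChainMap
from dataclasses import dataclass, field

@dataclass
class Decl:                 # declaration node: introduces a name
    name: str
    type: str

@dataclass
class Ref:                  # reference node: uses a name
    name: str
    entry: object = None    # filled in by the walk

@dataclass
class Block:                # a nested scope containing child nodes
    body: list = field(default_factory=list)

def walk(node, table, errors):
    """Process declarations and bind references in one pass."""
    if isinstance(node, Block):
        inner = table.new_child()          # open a scope
        for child in node.body:
            walk(child, inner, errors)
        # `inner` is dropped on return, which closes the scope
    elif isinstance(node, Decl):
        if node.name in table.maps[0]:     # already declared in this scope
            errors.append(f"duplicate declaration of {node.name}")
        else:
            table[node.name] = node        # enter the symbol
    elif isinstance(node, Ref):
        node.entry = table.get(node.name)  # innermost visible declaration
        if node.entry is None:
            errors.append(f"undeclared name {node.name}")

outer_x = Decl("x", "int")
inner_x = Decl("x", "float")
r1, r2, r3 = Ref("x"), Ref("x"), Ref("y")
ast = Block([outer_x, Block([inner_x, r1]), r2, r3])
errors = []
walk(ast, ChainMap(), errors)
```

After the walk, each `Ref` node carries a reference to its declaration in `entry`, which is the "enriched AST node" described above.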
Static Scoping
• Static scope: the scope of an identifier includes its defining block as well as any contained blocks that do not contain a declaration for the identifier
• Global scope: a name space shared by all compilation units
• Scopes might be opened and closed by braces ({ } as in C and Java), or by reserved keywords (begin and end as in Ada, Algol)
A Symbol Table Interface
• Methods
  – OpenScope()
  – CloseScope()
  – EnterSymbol(name, type)
  – RetrieveSymbol(name)
  – DeclaredLocally(name)
• Ex.
  – (Fig. 8.2) Code to build the symbol table for the AST in Fig. 8.1
Block-Structured Languages and Scopes
• Block-structured languages: languages that allow nested name scopes
  – Concepts introduced by Algol 60
• Handling Scopes
  – Current scope: the innermost context
  – Open scopes (or currently active scopes): the current scope and its surrounding scopes
  – Closed scopes: all other scopes
• Some common visibility rules
  – Accessible names are those in the current scope and in all other open scopes
  – If a name is declared in more than one scope, then a reference to the name is resolved to the innermost declaration
  – New declarations can be made only in the current scope
• Global scope
  – Extern: in C
  – Public static: in Java
• Compilation-unit scope: in C and C++
  – Declared outside of all methods
• Package-level scope: in Java
• Every function definition is available in the global scope, unless it has the static attribute
• In C++ and Java, names declared within a class are available to all methods in the class
  – Protected members are available to the class’s subclasses
• Names declared within a statement-block are available to all contained blocks, unless they are redeclared in an inner scope
One Symbol Table or Many?
• Two common approaches to implementing block-structured symbol tables
  – A symbol table associated with each scope
  – Or a single, global table
An Individual Table for Each Scope
• Because name scopes are opened and closed in a last-in first-out (LIFO) manner, a stack is an appropriate data structure for a search
  – The innermost scope appears at the top of the stack
  – OpenScope(): pushes a new symbol table
  – CloseScope(): pops the top symbol table
  – (Fig. 8.3)
• Disadvantage
  – May need to search for a name in a number of symbol tables
  – Cost depends on the number of nonlocal references and the depth of nesting
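A minimal sketch of the table-per-scope approach follows, assuming each scope's table is a plain Python dict. The class name `ScopeStackTable`, the snake_case method names (mirroring the interface methods listed earlier), and the `probes` counter are all assumptions made for this illustration.

```python
class ScopeStackTable:
    def __init__(self):
        self.scopes = [{}]      # global scope at the bottom of the stack
        self.probes = 0         # number of per-scope tables searched so far

    def open_scope(self):
        self.scopes.append({})  # push a new, empty table

    def close_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot close the global scope")
        self.scopes.pop()

    def declared_locally(self, name):
        return name in self.scopes[-1]

    def enter_symbol(self, name, type_):
        if self.declared_locally(name):
            raise KeyError(f"{name} already declared in this scope")
        self.scopes[-1][name] = type_

    def retrieve_symbol(self, name):
        # Search from the innermost scope outward.
        for scope in reversed(self.scopes):
            self.probes += 1
            if name in scope:
                return scope[name]
        return None

table = ScopeStackTable()
table.enter_symbol("g", "int")
table.open_scope()
table.enter_symbol("x", "float")
table.open_scope()
table.enter_symbol("x", "bool")
local = table.retrieve_symbol("x")       # found in the innermost table
probes_local = table.probes
nonlocal_ = table.retrieve_symbol("g")   # every open scope is searched
probes_nonlocal = table.probes - probes_local
table.close_scope()
after_close = table.retrieve_symbol("x")
```

The probe counts make the disadvantage concrete: the local reference touches one table, while the reference to the global name touches one table per open scope.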
One Symbol Table
• All names in the same table
  – Uniquely identified by the scope name or depth
• RetrieveSymbol() need not chain through scope tables to locate a name
• More details in Sec. 8.3.3
  – (Fig. 8.8)
Basic Implementation Techniques
• Entering and Finding Names
• The Name Space
• An Efficient Symbol Table Implementation
Entering and Finding Names
• Examine the time needed to insert symbols, retrieve symbols, and maintain scopes
  – In particular, we pay attention to the cost of retrieving symbols
  – Names can be declared no more than once in each scope, but typically referenced multiple times
• Various approaches
  – Unordered list
  – Ordered list
  – Binary search trees
  – Balanced trees
  – Hash tables
Unordered List
• Simplest
  – Array
  – Linked list or resizable array
• All symbols in a given scope appear adjacently
  – Insertion: fast
  – Retrieval: linear scan
• Impractically slow
Ordered List
• Binary search: O(log n)
  – How to organize the ordered lists for a name in multiple scopes?
    • An ordered list of stacks (Fig. 8.4)
  – RetrieveSymbol() locates stacks using binary search
  – CloseScope() examines each stack and pops those stacks whose top symbol is declared in the abandoned scope
    • To avoid such checking, we maintain a separate linking of symbol table entries that are declared at the same scope level (Sec. 8.3.3)
• Fast retrieval, but expensive insertion
  – Advantageous when the space of symbols is known
    • Reserved keywords
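The ordered list of stacks can be sketched with Python's standard `bisect` module. The class `OrderedListTable` and its layout (two parallel lists) are assumptions for this illustration, and duplicate declarations within one scope are not checked.

```python
import bisect

class OrderedListTable:
    def __init__(self):
        self.names = []     # sorted list of names
        self.stacks = []    # stacks[i] holds (depth, type) pairs for names[i]
        self.depth = 0

    def open_scope(self):
        self.depth += 1

    def close_scope(self):
        # Examine every stack; pop entries declared in the abandoned scope.
        for stack in self.stacks:
            if stack and stack[-1][0] == self.depth:
                stack.pop()
        self.depth -= 1

    def enter_symbol(self, name, type_):
        i = bisect.bisect_left(self.names, name)
        if i == len(self.names) or self.names[i] != name:
            self.names.insert(i, name)      # O(n) shift: the expensive part
            self.stacks.insert(i, [])
        self.stacks[i].append((self.depth, type_))

    def retrieve_symbol(self, name):
        i = bisect.bisect_left(self.names, name)    # O(log n) binary search
        if i < len(self.names) and self.names[i] == name and self.stacks[i]:
            return self.stacks[i][-1][1]
        return None

t = OrderedListTable()
t.enter_symbol("x", "int")
t.enter_symbol("a", "char")
t.open_scope()
t.enter_symbol("x", "float")
inner = t.retrieve_symbol("x")
t.close_scope()
outer = t.retrieve_symbol("x")
```

Note that `close_scope` visits every stack, which is the checking cost that a separate per-level linking is meant to avoid.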
Binary Search Trees
• Insert, search: O(log n), given random inputs
  – Average-case performance does not necessarily hold for symbol tables
  – Programmers do not choose identifiers at random!
• Advantage
  – Simple, widely known implementation
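A small experiment can show why non-random identifiers matter. The sketch below builds an unbalanced binary search tree from nested dicts; the helper names and the `tmp000`, `tmp001`, ... naming pattern are made up for illustration, standing in for identifiers that a programmer or tool declares in sorted order.

```python
import random

def insert(tree, key):
    """Insert key into an unbalanced BST of nested dicts; return the root."""
    if tree is None:
        return {"key": key, "left": None, "right": None}
    node = tree
    while True:
        side = "left" if key < node["key"] else "right"
        if node[side] is None:
            node[side] = {"key": key, "left": None, "right": None}
            return tree
        node = node[side]

def height(tree):
    # Iterative traversal, so degenerate trees do not hit the recursion limit.
    best, stack = 0, [(tree, 1)]
    while stack:
        node, depth = stack.pop()
        if node is not None:
            best = max(best, depth)
            stack.append((node["left"], depth + 1))
            stack.append((node["right"], depth + 1))
    return best

def build(keys):
    tree = None
    for key in keys:
        tree = insert(tree, key)
    return tree

names = [f"tmp{i:03d}" for i in range(200)]   # declared in sorted order
sorted_height = height(build(names))

shuffled = names[:]
random.Random(0).shuffle(shuffled)            # same names, random order
shuffled_height = height(build(shuffled))
```

With sorted insertion the tree degenerates into a linked list whose height equals the number of names, so a search costs O(n) rather than O(log n).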
Balanced Trees
• The worst-case scenario for binary trees can be avoided if the tree is balanced
  – E.g.: red-black trees, splay trees
  – Insert, search: O(log n) (amortized, in the case of splay trees)
Hash Tables
• Most common, due to their excellent performance
  – Insert, search: O(1) expected time, given
    • A sufficiently large table
    • A good hash function
    • Appropriate collision-handling techniques
  – (Sec. 8.3.3)
The Name Space
• Properties to consider
  – The name of a symbol does not change during compilation
  – Symbol names persist throughout compilation
  – Great variance in the length of identifier names
  – Unless an ordered list is maintained, comparisons of symbol names involve only equality and inequality
• In favor of one logical name space (Fig. 8.5)
• Names are inserted, but never deleted
• Two fields
  – Origin
  – Length
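One way to picture a single logical name space is a growing character buffer in which each name is identified by its origin and length. The `NameSpace` class below is a hypothetical sketch; in particular, the `index` dict used to detect previously seen names is a shortcut for this example and not part of the two-field scheme itself.

```python
class NameSpace:
    def __init__(self):
        self.buffer = []    # characters of all names, stored end to end
        self.index = {}     # name -> (origin, length); shortcut for this sketch

    def intern(self, name):
        """Return (origin, length) for name, appending it if unseen."""
        if name not in self.index:
            origin = len(self.buffer)
            self.buffer.extend(name)        # names are appended, never deleted
            self.index[name] = (origin, len(name))
        return self.index[name]

    def text(self, ref):
        origin, length = ref
        return "".join(self.buffer[origin:origin + length])

ns = NameSpace()
a = ns.intern("i")
b = ns.intern("averyLongIdentifierName")
c = ns.intern("i")      # same name, so the same (origin, length) pair
```

Because every occurrence of a name maps to the same pair, name comparison reduces to comparing two small integers, regardless of identifier length.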
An Efficient Symbol Table Implementation
• A symbol table entry containing
  – Name
  – Type
  – Hash
  – Var
  – Level
  – Depth
• Two index structures
  – Hash table
  – Scope display
    • Symbols at the same level
  – (Fig. 8.7) & (Fig. 8.8)
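The following is a sketch of a single table with a scope display, under stated assumptions: `var` is taken to link an entry to the next outer declaration of the same name, and `level` to link entries declared in the same scope. The explicit hash-chain link (the Hash field) is omitted because a Python dict plays the role of the hash table. The class and method names are invented for the example and do not reproduce Fig. 8.7 or Fig. 8.8.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    name: str
    type: str
    depth: int
    var: Optional["Entry"] = None     # next outer declaration of the same name
    level: Optional["Entry"] = None   # next symbol declared in the same scope

class SingleTable:
    def __init__(self):
        self.table = {}               # name -> innermost active Entry
        self.depth = 0
        self.scope_display = [None]   # scope_display[d] heads the level list

    def open_scope(self):
        self.depth += 1
        self.scope_display.append(None)

    def close_scope(self):
        if self.depth == 0:
            raise RuntimeError("cannot close the global scope")
        sym = self.scope_display.pop()
        while sym is not None:        # only this scope's symbols are visited
            if sym.var is None:
                del self.table[sym.name]
            else:
                self.table[sym.name] = sym.var   # restore hidden declaration
            sym = sym.level
        self.depth -= 1

    def retrieve_symbol(self, name):
        return self.table.get(name)   # no chaining through scopes

    def declared_locally(self, name):
        sym = self.table.get(name)
        return sym is not None and sym.depth == self.depth

    def enter_symbol(self, name, type_):
        old = self.table.get(name)
        if old is not None and old.depth == self.depth:
            raise KeyError(f"duplicate declaration of {name}")
        new = Entry(name, type_, self.depth, var=old,
                    level=self.scope_display[self.depth])
        self.scope_display[self.depth] = new
        self.table[name] = new
        return new

t = SingleTable()
t.enter_symbol("x", "int")
t.open_scope()
t.enter_symbol("x", "float")
t.enter_symbol("y", "bool")
inner_x = t.retrieve_symbol("x")
t.close_scope()
outer_x = t.retrieve_symbol("x")
```

Retrieval is a single lookup, and closing a scope costs time proportional to the number of symbols declared in that scope rather than to the size of the whole table.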
Advanced Features
• Extensions of the simple symbol table framework to accommodate advanced features of modern programming languages
  – Name augmentation (overloading)
  – Name hiding and promotion
  – Modification of search rules
Records and Typenames
• Aggregate data structures
  – struct, record
• E.g. a.b.c.d
  – C, Ada, Pascal: completely specifying the containers and the field
  – COBOL, PL/I: intermediate containers can be omitted if the reference can be unambiguously resolved
    » a.c, c.d: more difficult to read
• Can be nested arbitrarily deeply
  – Tree
• typedef: alias for a type
  – Symbol table
Overloading and Type Hierarchies
• Method overloading allowed in object-oriented languages such as C++ and Java
  – If each definition has a unique type signature
    • Number and types of the parameters and return type
    • E.g.: print(int), print(String)
  – To view a method definition not only in terms of its name but also its type signature
    • To encode the type signature of a method along with its name
      – E.g.: M(int): void
    • To record a method along with a list of its overloaded definitions
• Operator overloading: allowed in C++, Ada
• Ada allows literals to be overloaded
  – E.g.: diamond in two different enumeration types: as a playing card, and as a gem
• Pascal, Fortran allow the same symbol to represent the invocation of a method and the value of the method’s result
  – Two entries in the symbol table
• C: the same name as a local variable, a struct name, and a label
  – E.g.: (in Ex. 17)
    int main(void) {
        struct xxx { int a, b; } c;   /* xxx as a struct tag */
        int xxx;                      /* xxx as a local variable */
    xxx: c.a = 1;                     /* xxx as a label */
        return 0;
    }
• Type extension through subclassing allowed in Java, C++
  – resize(Shape) vs. resize(Rectangle)
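Both ways of handling overloaded methods mentioned in this section (encoding the signature with the name, and recording a list of overloaded definitions per name) can be sketched together. `MethodTable`, `encode`, and `resolve` are hypothetical names. The sketch assumes overloads are distinguished by parameter types only and that resolution requires an exact match, so it does not model subclass-based choices such as resize(Shape) vs. resize(Rectangle).

```python
from collections import defaultdict

def encode(name, param_types, return_type):
    """Encode a method's type signature along with its name."""
    return f"{name}({','.join(param_types)}): {return_type}"

class MethodTable:
    def __init__(self):
        self.overloads = defaultdict(list)   # name -> [(params, return type)]

    def declare(self, name, param_types, return_type):
        params = tuple(param_types)
        for existing, _ in self.overloads[name]:
            if existing == params:
                raise KeyError(f"{name}({','.join(params)}) already declared")
        self.overloads[name].append((params, return_type))

    def resolve(self, name, arg_types):
        """Exact-match resolution only; no conversions or subtyping."""
        for params, ret in self.overloads.get(name, []):
            if params == tuple(arg_types):
                return encode(name, params, ret)
        return None

m = MethodTable()
m.declare("print", ["int"], "void")
m.declare("print", ["String"], "void")
chosen = m.resolve("print", ["String"])
missing = m.resolve("print", ["float"])
```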
Implicit Declarations
• In some languages, the appearance of a name in a certain context serves to declare the name as well
  – E.g.: labels in C
  – In Fortran: inferred from the identifier’s first letter
  – In Ada: an index is implicitly declared to be of the same type as the range specifier
  – A new scope is opened for the loop so that the loop index cannot clash with an existing variable
    • E.g. for (int i=1; i<10; i++) { … }
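The Fortran case can be sketched briefly. Under Fortran's default implicit typing rule, names beginning with I through N are INTEGER and all others are REAL. The function names and the dict-based table below are assumptions made for the illustration.

```python
def implicit_type(name):
    """Default Fortran rule: first letter I through N means INTEGER, else REAL."""
    return "INTEGER" if name[0].upper() in "IJKLMN" else "REAL"

def retrieve_or_declare(table, name):
    """Look a name up; on its first appearance, declare it implicitly."""
    if name not in table:
        table[name] = implicit_type(name)
    return table[name]

symbols = {"X": "INTEGER"}      # an explicit declaration overrides the rule
kinds = [retrieve_or_declare(symbols, n) for n in ("COUNT", "I", "X", "NMAX")]
```

The point for the symbol table is that a failed lookup need not be an error: in such languages it can trigger an insertion instead.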
Export and Import Directives
• Export: some local scope names are to become visible outside that scope
  – Typically associated with modularization features such as Ada packages, C++ classes, C compilation units, and Java classes
    • Java: public attribute, String class in java.lang package
    • C: all methods are known outside unless the static attribute is specified
• In a large software system, the space of global names can become polluted and disorganized
  – C, C++: Header files
  – Java: import
  – Ada: use
Altered Search Rules
• To alter the way in which symbols are found in the symbol table
  – In Pascal: with R do S
    • First try to resolve an identifier as a field of the record R
    • Advantageous if R is a complex name
    • Can usually generate efficient code
• Forward reference in recursive data structures or methods
  – A portion of the program will reference a definition that has not yet been processed
  – Must be announced in some languages
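The altered search rule for Pascal's with R do S can be sketched as a lookup that consults the record's fields before the ordinary scopes. The `resolve` function, the data layout, and the record name `pts[i].origin` are hypothetical and chosen only for this example.

```python
def resolve(name, with_records, scopes):
    """Resolve a name inside nested `with` statements.

    with_records: (record name, field table) pairs, innermost last.
    scopes: ordinary scope tables, innermost last.
    """
    for record_name, fields in reversed(with_records):
        if name in fields:                  # a field of the with-record wins
            return f"{record_name}.{name}", fields[name]
    for scope in reversed(scopes):          # otherwise, the usual search
        if name in scope:
            return name, scope[name]
    return None

scopes = [{"x": "integer", "total": "real"}]
point = ("pts[i].origin", {"x": "real", "y": "real"})

inside = resolve("x", [point], scopes)        # resolved as a field of the record
outside = resolve("x", [], scopes)            # ordinary lookup
fallback = resolve("total", [point], scopes)  # not a field, so scopes are used
```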
Symbol Table Summary
• The symbol table organization in this chapter efficiently represents scope-declared symbols in a block-structured language
• Most languages include rules for symbol promotion to a global scope
• Issues such as inheritance, overloading, and aggregate data types must be considered
Declaration Processing Fundamentals
• Attributes in the symbol table
  – Internal representations of declarations
  – Identifiers are used in many different ways in a modern programming language
    • Variables, constants, types, procedures, classes, and fields
    • Not every identifier will have the same set of attributes
  – We need a data structure to store the variety of information
    • Using a struct that contains a tag, and a union for each possible value of the tag
    • Using an object-based approach: an Attributes class and appropriate subclasses
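The object-based approach might look like the following sketch. Only the base class name `Attributes` comes from the text; the subclasses, their fields, and the `describe` helper are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Attributes:
    """Base class: information common to every kind of identifier."""
    name: str

@dataclass
class VariableAttributes(Attributes):
    type: str

@dataclass
class ConstantAttributes(Attributes):
    type: str
    value: object

@dataclass
class TypeAttributes(Attributes):
    descriptor: str     # stands in for a type descriptor structure

def describe(attrs):
    # The subclass plays the role of the tag in the struct-and-union approach.
    if isinstance(attrs, ConstantAttributes):
        return f"constant {attrs.name}: {attrs.type} = {attrs.value}"
    if isinstance(attrs, VariableAttributes):
        return f"variable {attrs.name}: {attrs.type}"
    if isinstance(attrs, TypeAttributes):
        return f"type {attrs.name} = {attrs.descriptor}"
    return f"identifier {attrs.name}"

table = {
    "n": VariableAttributes("n", "int"),
    "pi": ConstantAttributes("pi", "float", 3.14159),
    "Vec": TypeAttributes("Vec", "array[10] of float"),
}
lines = [describe(a) for a in table.values()]
```

Each kind of identifier carries only the fields that make sense for it, which is the motivation for not using one flat record for all symbols.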
Type Descriptor Structures
Type Checking Using an Abstract Syntax Tree
• Using the visitor pattern (in Chap. 7)
  – SemanticsVisitor: a subclass of Visitor
    • The top-level visitor for processing declarations and doing semantic checking on the AST nodes
  – TopDeclVisitor
    • A specialized visitor invoked by SemanticsVisitor for processing declarations
  – TypeVisitor
    • A specialized visitor used to handle an identifier that represents a type or a syntactic form that defines a type (such as an array)
Variable and Type Declarations
• Simple variable declarations
  – A type name and a list of identifiers
    • (Fig. 8.12)
    • Visitor actions: (Fig. 8.13)
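As a rough sketch of what processing a simple variable declaration involves (not a reproduction of Fig. 8.13), the function below checks the type name, then enters each identifier unless it is already declared in the current scope. The function name, the dict-based scope, the `known_types` set, and the `ERROR_TYPE` marker are all hypothetical.

```python
ERROR_TYPE = "<error>"

def visit_variable_decl(type_name, id_list, scope, known_types, errors):
    """Process `type_name id1, id2, ...` against the current scope (a dict)."""
    var_type = type_name if type_name in known_types else ERROR_TYPE
    if var_type == ERROR_TYPE:
        errors.append(f"{type_name} is not a type name")
    for ident in id_list:
        if ident in scope:                      # corresponds to DeclaredLocally
            errors.append(f"{ident} is already declared")
        else:                                   # corresponds to EnterSymbol
            scope[ident] = {"kind": "variable", "type": var_type}
    return var_type

scope, errors = {}, []
types = {"int", "float"}
visit_variable_decl("int", ["a", "b"], scope, types, errors)
visit_variable_decl("float", ["b", "c"], scope, types, errors)
```

The second declaration reports `b` as a duplicate but still enters `c`, so one error does not stop processing of the rest of the identifier list.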
Visit Method
Handling Type Names
Type Declarations
• A name and a description of the type to be associated with it
  – (Fig. 8.15)
  – Visit method: (Fig. 8.16)
Thanks for Your Attention!