Chapter 2. Design of a Simple Compiler
J. H. Wang
Sep. 21, 2015
Outline
• An Informal Definition of the ac Language
• Formal Definition of ac
• Phases of a Simple Compiler
• Scanning
• Parsing
• Abstract Syntax Trees
• Semantic Analysis
• Code Generation
Introduction
• An overview of the compilation process by considering a simple language
• A quick overview of a compiler’s phases and their associated data structures
An Informal Definition of the ac Language
• ac: adding calculator
• Types
  – integer
  – float: allows 5 fractional digits after the decimal point
  – Automatic type conversion from integer to float
• Keywords
  – f: float
  – i: integer
  – p: print
• Variables
  – 23 names from the lowercase Roman alphabet, excluding the three reserved keywords f, i, and p
• Target of translation: dc (desk calculator)
  – Reverse Polish notation (RPN)
An Example ac Program
• Example ac program:
    f b
    i a
    a = 5
    b = a + 3.2
    p b
    $
• Corresponding dc code:
    5
    sa
    la
    3.2
    +
    sb
    lb
    p
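To see why the dc code above ends up printing the value of b, here is a minimal sketch of a stack machine that understands only the commands used in the example (numbers, `+`, `-`, `sX`, `lX`, `p`). The helper name `run_dc` is hypothetical, and real dc uses arbitrary-precision decimal arithmetic rather than Python floats, so this is an illustration of the RPN idea only.

```python
def run_dc(lines):
    """Interpret a tiny subset of dc: numbers, +, -, sX (store), lX (load), p (print)."""
    stack = []       # operand stack
    registers = {}   # single-letter registers such as 'a' and 'b'
    printed = []     # values that 'p' would print
    for line in lines:
        cmd = line.strip()
        if not cmd:
            continue
        if len(cmd) == 2 and cmd[0] == "s":
            registers[cmd[1]] = stack.pop()      # pop the top of the stack into a register
        elif len(cmd) == 2 and cmd[0] == "l":
            stack.append(registers[cmd[1]])      # push a copy of the register's value
        elif cmd in ("+", "-"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if cmd == "+" else left - right)
        elif cmd == "p":
            printed.append(stack[-1])            # print the top of the stack without popping
        else:
            stack.append(float(cmd) if "." in cmd else int(cmd))
    return printed


program = ["5", "sa", "la", "3.2", "+", "sb", "lb", "p"]
print(run_dc(program))
```

Each operand is pushed before its operator, which is what makes the notation "reverse Polish": `la 3.2 +` pushes a, pushes 3.2, then adds the two.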
Formal Definition of ac
• Syntax specification: context-free grammar (CFG)
  – (Chap. 4)
• Token specification: regular expressions
  – (Sec. 3.2)
Syntax Specification
• CFG:
  – A set of productions or rewriting rules
  – E.g.: Stmt → id assign Val Expr
               | print id
  – Two kinds of symbols
    • Terminals: cannot be rewritten
      – E.g.: id, assign, print
      – Empty or null string: λ
      – End of input stream or file: $
    • Nonterminals:
      – Start symbol: Prog
      – E.g.: Val, Expr
  – Left-hand side (LHS)
  – Right-hand side (RHS)
• Starting from the start symbol
• Choosing some nonterminal symbol and finding a production for it
• Replacing it with the string of symbols on the RHS
• Any string of terminals that can be produced: syntactically valid
• Otherwise: syntax error
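The derivation steps above can be sketched in a few lines of Python. Only the Stmt productions, the start symbol Prog, and the nonterminals Val and Expr are named in this transcript; the remaining productions in `GRAMMAR` are assumptions filled in so that the example runs, and always rewriting the leftmost nonterminal is just one possible choice. The names `GRAMMAR` and `derive` are hypothetical.

```python
# Each nonterminal maps to its list of right-hand sides; an empty list stands for λ.
# ASSUMPTION: everything except the Stmt productions is filled in for illustration.
GRAMMAR = {
    "Prog":  [["Dcls", "Stmts", "$"]],
    "Dcls":  [["Dcl", "Dcls"], []],
    "Dcl":   [["floatdcl", "id"], ["intdcl", "id"]],
    "Stmts": [["Stmt", "Stmts"], []],
    "Stmt":  [["id", "assign", "Val", "Expr"], ["print", "id"]],
    "Expr":  [["plus", "Val", "Expr"], ["minus", "Val", "Expr"], []],
    "Val":   [["id"], ["inum"], ["fnum"]],
}


def derive(choices, start="Prog"):
    """Rewrite the leftmost nonterminal once per entry in choices.

    choices[i] is the index of the production to apply at step i.
    Returns every intermediate form, ending with the final one.
    """
    form = [start]
    steps = [list(form)]
    for choice in choices:
        # Find the leftmost symbol that still has productions (a nonterminal).
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append(list(form))
    return steps


for step in derive([0, 1, 0, 1, 1]):
    print(" ".join(step) or "λ")
```

The last form printed contains only terminals (`print id $`), so under these assumed productions the token sequence is syntactically valid.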
Token Specification
Phases of a Simple Compiler
• Scanner: source ac program -> tokens
  – Chap. 3
• Parser: tokens -> abstract syntax tree (AST)
  – Chap. 5 & 6
• Symbol table: created from AST
  – Chap. 8
• Semantic analysis: AST decoration
• Translation: by traversing AST
Scanning
• To translate a stream of characters into a stream of tokens
  – Automatic construction of scanners: Chap. 3
  – Token:
    • Type: membership in the terminal alphabet
    • Semantic value: additional information
  – For most programming languages, the scanner’s job is not so easy
    • +, ++
    • //, “, \”
    • Variable-length tokens
[Figure: pseudocode for SCANNER, built on PEEK, ADVANCE, SCANDIGITS, and LEXICALERROR]
[Figure: pseudocode for SCANDIGITS, built on PEEK and ADVANCE]
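As a rough illustration of the scanning step, the sketch below turns ac source text into (type, value) tokens by peeking at the current character and advancing past it. It is not the pseudocode from the figures: the token type names (`floatdcl`, `intdcl`, `inum`, `fnum`, and so on) and the function names are assumptions, and the limit of 5 fractional digits is not enforced.

```python
from collections import namedtuple

Token = namedtuple("Token", ["type", "value"])

# ASSUMPTION: token type names are illustrative, not taken from the slides.
KEYWORDS = {"f": "floatdcl", "i": "intdcl", "p": "print"}
OPERATORS = {"=": "assign", "+": "plus", "-": "minus"}
DIGITS = "0123456789"


class LexicalError(Exception):
    pass


def scan_digits(text, pos):
    """Scan an integer or float literal starting at pos; return (token, new position)."""
    start = pos
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    if pos < len(text) and text[pos] == ".":
        pos += 1
        if pos >= len(text) or text[pos] not in DIGITS:
            raise LexicalError("expected a digit after the decimal point")
        while pos < len(text) and text[pos] in DIGITS:
            pos += 1
        return Token("fnum", text[start:pos]), pos
    return Token("inum", text[start:pos]), pos


def scan(text):
    """Return the list of tokens in text, followed by an end-of-input '$' token."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]                      # peek at the next character
        if ch.isspace():
            pos += 1                        # advance past blanks
        elif ch in DIGITS:
            token, pos = scan_digits(text, pos)
            tokens.append(token)
        elif ch in KEYWORDS:
            tokens.append(Token(KEYWORDS[ch], ch))
            pos += 1
        elif ch in OPERATORS:
            tokens.append(Token(OPERATORS[ch], ch))
            pos += 1
        elif "a" <= ch <= "z":
            tokens.append(Token("id", ch))  # any other lowercase letter is a variable
            pos += 1
        else:
            raise LexicalError(f"unexpected character {ch!r}")
    tokens.append(Token("$", ""))
    return tokens


print([t.type for t in scan("f b i a a = 5 b = a + 3.2 p b")])
```

The token's type is what the parser looks at; the value (for example `3.2` or the identifier name) is the semantic value carried along for later phases.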
Parsing
• To determine if the stream of tokens conforms to the language’s grammar (Chap. 4, 5, 6)
  – E.g.: Are these valid statements?
    • b = a + 3.2
    • p b
  – For ac, a simple parsing technique called recursive descent is used
    • “Mutually recursive parsing routines that descend through a derivation tree”
    • Each nonterminal has an associated parsing procedure for determining if the token stream contains a sequence of tokens derivable from that nonterminal
Predicting a Parsing Procedure
• Examine the next input token to predict which production should be applied
  – E.g.:
    • Stmt → id assign Val Expr
    • Stmt → print id
  – Predict set
    • {id} [1]
    • {print} [6]
[Figure: pseudocode for the STMT parsing procedure, built on PEEK, MATCH, VAL, EXPR, and ERROR]
• Consider the productions for Stmts
  – Stmts → Stmt Stmts
  – Stmts → λ
• The predict sets
  – {id, print} [8]
  – {$} [11]
[Figure: pseudocode for the STMTS parsing procedure, built on PEEK, STMT, STMTS, and ERROR]
Implementing the Production
• When a terminal is encountered, a call to MATCH() is placed
• For each nonterminal, the corresponding procedure will be called
• For the symbol λ, no code is executed
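The three rules above translate almost mechanically into code. The following is a minimal sketch of a recursive-descent recognizer for Stmts and Stmt, working on a list of token type names that ends with `$`. The Stmt and Stmts productions and their predict sets come from the slides; the Val and Expr productions, the class and method names, and the `accepts` helper are assumptions made for illustration.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, token_types):
        self.tokens = list(token_types)
        self.pos = 0

    def peek(self):
        # Treat running off the end as end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "$"

    def match(self, expected):
        # A terminal on the RHS becomes a call to match().
        if self.peek() != expected:
            raise ParseError(f"expected {expected}, found {self.peek()}")
        self.pos += 1

    def stmts(self):
        if self.peek() in ("id", "print"):   # predict Stmts -> Stmt Stmts
            self.stmt()                      # a nonterminal becomes a procedure call
            self.stmts()
        elif self.peek() == "$":             # predict Stmts -> λ: no code is executed
            pass
        else:
            raise ParseError(f"unexpected token {self.peek()}")

    def stmt(self):
        if self.peek() == "id":              # predict Stmt -> id assign Val Expr
            self.match("id")
            self.match("assign")
            self.val()
            self.expr()
        elif self.peek() == "print":         # predict Stmt -> print id
            self.match("print")
            self.match("id")
        else:
            raise ParseError(f"unexpected token {self.peek()}")

    def val(self):
        # ASSUMPTION: Val -> id | inum | fnum
        if self.peek() in ("id", "inum", "fnum"):
            self.match(self.peek())
        else:
            raise ParseError(f"expected a value, found {self.peek()}")

    def expr(self):
        # ASSUMPTION: Expr -> plus Val Expr | minus Val Expr | λ
        if self.peek() in ("plus", "minus"):
            self.match(self.peek())
            self.val()
            self.expr()
        # Otherwise Expr -> λ; errors surface in the caller's next match or predict.


def accepts(token_types):
    """Return True if the tokens form zero or more statements followed by '$'."""
    parser = Parser(token_types)
    try:
        parser.stmts()
        parser.match("$")
        return True
    except ParseError:
        return False


print(accepts(["id", "assign", "id", "plus", "fnum", "print", "id", "$"]))
```

The call above corresponds to the token stream for `b = a + 3.2` followed by `p b`.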
Abstract Syntax Trees
• Some aspects of compilation are difficult to perform during syntax analysis
  – Some aspects of a language cannot be specified in a CFG
    • E.g.: symbol usage consistency with type declaration
  – Context sensitive
    • In Java: x.y.z
      – Package x, class y, static field z
      – Variable x, field y, another field z
    • Operator overloading
      – +: numerical addition or appending of strings
  – Separation into phases makes the compiler much easier to write and maintain
• Parse trees are large and unnecessarily detailed (Fig. 2.4)
  – Abstract syntax tree (AST) (Fig. 2.9)
    • Inessential punctuation and delimiters are not included
  – A common intermediate representation for all phases after syntax analysis
    • Declarations need not be in source form
    • Order of executable statements explicitly represented
    • Assignment statement must retain identifier and expression
    • Nodes representing computation: operation and operands
    • Print statement must retain name of identifier
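One way to picture these requirements is as a small set of node classes. The sketch below borrows the node names that appear later in the slides (SymDeclaring, SymReferencing, Computing, Assigning, Printing); the `Const` and `Program` classes and all field names are assumptions, and the AST built at the end is a hand-written guess at what the example program would produce, not a copy of Fig. 2.9.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class SymDeclaring:          # declaration: only the type and the name survive
    type: str
    id: str


@dataclass
class SymReferencing:        # use of a variable
    id: str


@dataclass
class Const:                 # hypothetical node for integer and float literals
    type: str
    value: str


@dataclass
class Computing:             # computation: operation and its two operands
    op: str
    c1: Any
    c2: Any


@dataclass
class Assigning:             # assignment: target identifier and expression
    target: SymReferencing
    value: Any


@dataclass
class Printing:              # print statement: only the identifier name
    id: str


@dataclass
class Program:               # statement order is the list order
    decls: List[SymDeclaring]
    stmts: List[Any]


# Hand-built AST for:  f b   i a   a = 5   b = a + 3.2   p b
program = Program(
    decls=[SymDeclaring("float", "b"), SymDeclaring("integer", "a")],
    stmts=[
        Assigning(SymReferencing("a"), Const("integer", "5")),
        Assigning(SymReferencing("b"),
                  Computing("+", SymReferencing("a"), Const("float", "3.2"))),
        Printing("b"),
    ],
)
print(program.stmts[1])
```

Note that no node records the `=` sign, the keywords, or the end marker: that punctuation matters to the parser but not to later phases.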
Semantic Analysis
• Example processing includes:
  – Declarations and name scopes are processed to construct a symbol table
  – Type consistency
  – Make type-dependent behavior explicit
Symbol Tables
• To record all identifiers and their types
  – 23 entries for 23 distinct identifiers in ac (Fig. 2.11)
    • Type info.: integer, float, unused (null)
    • Attributes: scope, storage class, protection properties
  – Symbol table construction (Fig. 2.10)
    • Symbol declaration nodes call VISIT(SymDeclaring n)
    • ENTERSYMBOL checks that the given symbol has not been previously declared
[Figure: pseudocode for symbol table construction: VISIT, ENTERSYMBOL, and LOOKUPSYMBOL, using GETTYPE, GETID, and ERROR]
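A minimal sketch of such a symbol table, assuming a plain dictionary with one slot per legal identifier and `None` standing for "unused". The class and function names are hypothetical, and declarations are passed as simple (type, identifier) pairs in place of SymDeclaring nodes.

```python
class SymbolError(Exception):
    pass


class SymbolTable:
    def __init__(self):
        # The 23 legal ac identifiers: lowercase letters except f, i, and p.
        self.types = {name: None for name in "abcdeghjklmnoqrstuvwxyz"}

    def enter_symbol(self, name, type_name):
        """Record a declaration, rejecting illegal names and redeclarations."""
        if name not in self.types:
            raise SymbolError(f"{name!r} is not a legal identifier")
        if self.types[name] is not None:
            raise SymbolError(f"duplicate declaration of {name!r}")
        self.types[name] = type_name

    def lookup_symbol(self, name):
        """Return 'integer', 'float', or None if the name was never declared."""
        return self.types.get(name)


def build_symbol_table(declarations):
    """Visit each (type, identifier) declaration in order and enter it."""
    table = SymbolTable()
    for type_name, name in declarations:
        table.enter_symbol(name, type_name)
    return table


table = build_symbol_table([("float", "b"), ("integer", "a")])
print(table.lookup_symbol("b"), table.lookup_symbol("a"), table.lookup_symbol("c"))
```

Because ac has a fixed, tiny set of identifiers and a single scope, a flat table is enough here; languages with nested scopes need more structure.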
Type Checking
• Only two types in ac
  – Integer
  – Float
• Type hierarchy
  – Float wider than integer
  – Automatic widening (or casting)
    • integer -> float
Type Analysis
[Figure: pseudocode for type analysis: VISIT procedures using CONSISTENT, CONVERT, and LOOKUPSYMBOL]
[Figure: pseudocode for CONSISTENT, GENERALIZE, and CONVERT, with ERROR for an illegal conversion]
• Type checking
  – Constants and symbol references: simply set the node’s type based on the node’s contents
  – Computation nodes: CONSISTENT(n.c1, n.c2)
  – Assignment operation: CONVERT(n.c2, n.c1.type)
• CONSISTENT()
  – GENERALIZE(): determines the least general type that encompasses both operand types
  – CONVERT(): checks whether conversion is necessary
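The interplay of the three helpers can be sketched as follows. This is an illustration under assumptions, not the pseudocode from the figures: the `Node` class and its fields are hypothetical, and `convert` returns a new wrapper node instead of rewriting the tree in place.

```python
from dataclasses import dataclass


@dataclass
class Node:
    kind: str              # "const", "ref", "compute", or "convert" (illustrative)
    type: str = None       # "integer" or "float"
    children: tuple = ()


def generalize(t1, t2):
    """Return the least general type that covers both operand types."""
    return "float" if "float" in (t1, t2) else "integer"


def convert(node, target_type):
    """Return node unchanged, or wrapped in an integer-to-float conversion."""
    if node.type == "float" and target_type == "integer":
        raise TypeError("illegal type conversion: float to integer")
    if node.type == "integer" and target_type == "float":
        return Node("convert", "float", (node,))
    return node            # types already agree, so no conversion is needed


def consistent(c1, c2):
    """Make both operands agree; return (common type, new c1, new c2)."""
    common = generalize(c1.type, c2.type)
    return common, convert(c1, common), convert(c2, common)


a = Node("ref", "integer")       # the variable a, declared integer
lit = Node("const", "float")     # the literal 3.2
common, left, right = consistent(a, lit)
print(common, left.kind, right.kind)
```

For `b = a + 3.2`, the integer operand is the one that gets wrapped, which is how the automatic widening from integer to float becomes explicit in the tree. Assigning a float expression to an integer variable goes through the same `convert` check and is rejected.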
Code Generation
• The formulation of target-machine instructions that faithfully represent the semantics of the source program
  – Chap. 11 & 13
  – dc: stack machine model
  – Code generation proceeds by traversing the AST, starting at its root
    • VISIT (Computing n): +, -
    • VISIT (Assigning n): =
    • VISIT (SymReferencing n)
    • VISIT (Printing n)
    • VISIT (Converting n)
[Figure: pseudocode for code generation: VISIT procedures that call CODEGEN and EMIT]
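As a rough sketch of the traversal, the function below walks a tuple-encoded AST and appends dc commands to a list. The tuple encoding and the function name are assumptions made to keep the example short, and the conversion case emits nothing of its own, so any precision-setting commands the original pseudocode may emit are omitted here.

```python
def codegen(node, out):
    """Append dc commands for node to out (post-order traversal) and return out."""
    kind = node[0]
    if kind == "program":                 # ("program", [stmt, ...])
        for stmt in node[1]:
            codegen(stmt, out)
    elif kind == "assign":                # ("assign", id, expr)
        codegen(node[2], out)             # leave the value on the stack
        out.append("s" + node[1])         # pop it into the register named id
    elif kind == "compute":               # ("compute", op, left, right)
        codegen(node[2], out)
        codegen(node[3], out)
        out.append(node[1])               # operator follows its operands (RPN)
    elif kind == "ref":                   # ("ref", id)
        out.append("l" + node[1])         # push the register's value
    elif kind == "const":                 # ("const", text)
        out.append(node[1])
    elif kind == "convert":               # ("convert", expr)
        codegen(node[1], out)             # ASSUMPTION: no extra dc command emitted
    elif kind == "print":                 # ("print", id)
        out.append("l" + node[1])
        out.append("p")
    else:
        raise ValueError(f"unknown node kind {kind!r}")
    return out


# Hand-built AST for:  a = 5   b = a + 3.2   p b
ast = ("program", [
    ("assign", "a", ("const", "5")),
    ("assign", "b", ("compute", "+", ("convert", ("ref", "a")), ("const", "3.2"))),
    ("print", "b"),
])
print(codegen(ast, []))  # ['5', 'sa', 'la', '3.2', '+', 'sb', 'lb', 'p']
```

The output matches the dc code shown for the example program near the start of the chapter. Declarations generate no code at all, since their only role is to populate the symbol table.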
End of Chapter 2