Compiler Design, Spring 2018
3.0 Frontend
Thomas R. Gross
Computer Science Department, ETH Zurich, Switzerland

Admin issues
§ Recitation sessions take place only when announced
  § In the lecture / on the course website / on the mailing list
§ No recitation session this week
§ Next recitation session
  § March 15, 2018 @ 15:00
  § ETF E1 (tentative)

Compiler model
[Figure: Source program → “Front-end” → IR → “Back-end” → ASM file, with an Optimizer working on the IR]
§ Question: How do we build the IR (tree)?

Overview
§ 3.1 Introduction
§ 3.2 Lexical analysis
§ 3.3 “Top down” parsing
§ 3.4 “Bottom up” parsing

3.1 Introduction
§ The frontend is responsible for turning the input program into IR
  § Input: usually a string of ASCII or Unicode characters
  § IR: as required by later stages of the compiler
§ The frontend is divided into
  § Lexical analysis: deals with reading the input program
    § Also known as scanning
    § Scanner, lexer
  § Syntactic analysis: understands the structure of the input program
    § Also known as parsing
    § Parser

3.1 Introduction (cont’d)
§ Good news: syntactic and lexical analysis are well understood
  § Good theory and books, e.g., Aho et al., Chapters 2 (in part), 3, and 4
  § Good tools
§ Bad news: even good tools may be painful to use
  § Good == powerful
  § Many options
  § Still can’t handle all possible languages
  § May give cryptic error messages

3.1 Introduction (cont’d)
§ Need to understand the theory to use the tools
  § The same theory that allows building the tools
§ Tools made hand-crafted frontends obsolete
§ Frontend tools are also used in other domains

Languages
§ The frontend processes the input program
  § Need a way to describe what input is allowed
§ Formal languages
  § Well-researched area
  § First part of compilers supported by tools
§ In this lecture: brief review
  § Aho et al. cover the topic in more depth
  § Focus on essentials
    § (Speed is an issue in real life)
    § Theory behind the tools

Languages: Grammar
§ Grammars provide a set of rules to generate “strings”
§ A grammar consists of
  § Terminals: a, b, c, …
  § Non-terminals: X, Y, Z, …
  § A set of productions
  § A start symbol: S
§ Some terminology
  § Terminal symbols: sometimes called characters or tokens
  § Non-terminal symbols: also called syntactic variables
  § String: sequence of symbols from some alphabet
    § Other terms: word, sentence

Productions
§ General form
  § Left-hand side → Right-hand side
  § LHS → RHS (for short)
  § LHS, RHS: strings over the alphabets of terminal and non-terminal symbols
§ Example: Grammar G1
  S → aBa
  S → aXa
  Xb → Xbc | c
  Ba → aBa | b
§ How does a grammar generate a language (known as L(G))?
  § Using the grammar G1 as an example

L(G)
§ From production to derivation
Given
  § w: a word over (T ∪ NT)
  § α, β, γ: words over (T ∪ NT) (α, β, γ may be empty) such that w = α β γ, and P a production β → δ,
we say that w’ = α δ γ is derived from w, i.e., w ⇒ w’.
§ Example derivation (with G1)
  § S ⇒ aBa ⇒ aaBa ⇒ aab (applying S → aBa, then Ba → aBa, then Ba → b)
§ L(G1) = { aⁿb | n ≥ 1 }
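The derivation relation w ⇒ w’ can be checked mechanically. The following Python sketch is illustrative only: it assumes every grammar symbol is a single character (true for G1), and the names `G1`, `one_step`, and `is_derivation` are hypothetical helpers, not part of any tool.

```python
# G1 as (LHS, RHS) pairs; since G1 is not context-free, an LHS may be longer than one symbol.
G1 = [
    ("S", "aBa"),
    ("S", "aXa"),
    ("Xb", "Xbc"),
    ("Xb", "c"),
    ("Ba", "aBa"),
    ("Ba", "b"),
]


def one_step(w, productions):
    """All w' with w => w': split w = alpha beta gamma with beta -> delta a production,
    and return alpha delta gamma for every such split."""
    results = set()
    for lhs, rhs in productions:
        start = w.find(lhs)
        while start != -1:
            alpha, gamma = w[:start], w[start + len(lhs):]
            results.add(alpha + rhs + gamma)
            start = w.find(lhs, start + 1)  # also try later (possibly overlapping) occurrences
    return results


def is_derivation(forms, productions):
    """True if each consecutive pair of sentential forms is exactly one derivation step."""
    return all(b in one_step(a, productions) for a, b in zip(forms, forms[1:]))


print(is_derivation(["S", "aBa", "aaBa", "aab"], G1))  # True
print(is_derivation(["S", "aBa", "aab"], G1))          # False: aBa => aab is not one step
```

Because G1 is not context-free, `one_step` matches arbitrary substrings (Xb, Ba) rather than single non-terminals.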

L(G)
§ L(G) = set of strings w such that
  § w consists only of symbols from the set of terminals
  § There exists a sequence of productions P1, P2, …, Pn such that S ⇒ RHS1 (by P1) ⇒ … (by Pi) … ⇒ w (by Pn)
  § In other words: there exists a derivation S ⇒ … ⇒ w using P1, …, Pn (or S ⇒* w for short)
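The definition suggests a (bounded) way to list words of L(G): search breadth-first over sentential forms starting at S and keep those that contain only terminals. A Python sketch for G1, assuming uppercase letters are non-terminals and lowercase letters are terminals; `words_up_to` and the length bound `max_len` are hypothetical. This is a bounded search, not a decision procedure: for unrestricted grammars like G1, membership is undecidable in general.

```python
from collections import deque

G1 = [("S", "aBa"), ("S", "aXa"), ("Xb", "Xbc"), ("Xb", "c"), ("Ba", "aBa"), ("Ba", "b")]


def one_step(w, productions):
    """All w' with w => w' (one production applied at one position)."""
    results = set()
    for lhs, rhs in productions:
        start = w.find(lhs)
        while start != -1:
            results.add(w[:start] + rhs + w[start + len(lhs):])
            start = w.find(lhs, start + 1)
    return results


def words_up_to(start, productions, max_len):
    """Terminal strings w with start =>* w, found by breadth-first search
    over sentential forms of length <= max_len."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        w = queue.popleft()
        if w.islower():  # no uppercase symbol left: only terminals, so w is in L(G)
            words.add(w)
            continue
        for nxt in one_step(w, productions):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return words


print(sorted(words_up_to("S", G1, 6), key=len))
# ['ab', 'aab', 'aaab', 'aaaab']
```

The branch S ⇒ aXa never produces a word: no production applies to aXa, which is exactly the dead end discussed next.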

Productions, 2nd look
§ No constraints on LHS, RHS
  § Some RHS could be a dead-end street:
    S → aXa
    Xb → …
    (No production of G1 applies to aXa, so this derivation is stuck before it reaches a string of terminals.)
§ Remove dead-end streets
  § Updated grammar G1’
    S → aBa
    Ba → aBa | b

Productions, 3rd look
§ We care about L(G): prune productions that do not contribute
§ Restrictions on LHS
  § Only a single non-terminal is allowed on the left-hand side
  § For example: A → α
  § “Context-free” grammar or Type-2 grammar
§ Context-free grammars are important
  § Efficient analysis techniques are known
  § From now on only context-free grammars unless noted
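For context-free grammars, one standard way to prune productions that do not contribute is to compute the generating non-terminals (those that derive some terminal string) by a fixpoint iteration, then drop every production that mentions a non-generating one. A Python sketch; the grammar below is a hypothetical context-free example, not G1 (G1 is not context-free), and `generating`/`prune` are illustrative names.

```python
def generating(productions, nonterminals):
    """Fixpoint: A is generating if some A -> RHS has only terminals or generating NTs."""
    gen = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs not in gen and all(s not in nonterminals or s in gen for s in rhs):
                gen.add(lhs)
                changed = True
    return gen


def prune(productions, nonterminals):
    """Keep only productions whose LHS and all RHS non-terminals are generating."""
    gen = generating(productions, nonterminals)
    return [
        (lhs, rhs)
        for lhs, rhs in productions
        if lhs in gen and all(s not in nonterminals or s in gen for s in rhs)
    ]


# Hypothetical grammar: X can never finish, so S -> a X a is a dead end.
NT = {"S", "B", "X"}
P = [
    ("S", ["a", "B", "a"]),
    ("S", ["a", "X", "a"]),
    ("B", ["a", "B"]),
    ("B", ["b"]),
    ("X", ["X", "b"]),
]
print(prune(P, NT))
# [('S', ['a', 'B', 'a']), ('B', ['a', 'B']), ('B', ['b'])]
```

A full cleanup usually also removes non-terminals that are unreachable from S; that second pass is omitted here.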

Regular and linear grammars
§ Linear grammar: context-free, at most one NT in the RHS
  § Left-linear grammar: linear, NT appears at the left end of the RHS
  § Right-linear grammar: linear, NT appears at the right end of the RHS
§ Regular grammar: either right-linear or left-linear
§ Regular grammars generate regular languages
  § Could also be described by regular expressions
  § Can be recognized by a deterministic finite automaton (DFA)
  § Type-3 grammar
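The definitions above translate directly into checks on the productions. A Python sketch, assuming productions are (LHS, RHS) pairs of symbol lists and the non-terminal set is given explicitly; the function names and the example grammars `RIGHT` and `EXPR` are hypothetical (`EXPR` is a fragment of the expression grammar used later, with Id treated as a terminal).

```python
def nt_positions(rhs, nonterminals):
    """Indices of the non-terminals in a right-hand side."""
    return [i for i, sym in enumerate(rhs) if sym in nonterminals]


def is_context_free(productions, nonterminals):
    # Type 2: every LHS is exactly one non-terminal.
    return all(len(lhs) == 1 and lhs[0] in nonterminals for lhs, _ in productions)


def is_linear(productions, nonterminals):
    # Context-free with at most one NT in each RHS.
    return is_context_free(productions, nonterminals) and all(
        len(nt_positions(rhs, nonterminals)) <= 1 for _, rhs in productions
    )


def is_right_linear(productions, nonterminals):
    # Linear, and the NT (if any) is the last symbol of the RHS.
    return is_linear(productions, nonterminals) and all(
        nt_positions(rhs, nonterminals) in ([], [len(rhs) - 1]) for _, rhs in productions
    )


def is_left_linear(productions, nonterminals):
    # Linear, and the NT (if any) is the first symbol of the RHS.
    return is_linear(productions, nonterminals) and all(
        nt_positions(rhs, nonterminals) in ([], [0]) for _, rhs in productions
    )


def is_regular(productions, nonterminals):
    # Type 3: the whole grammar is right-linear or left-linear.
    return is_right_linear(productions, nonterminals) or is_left_linear(productions, nonterminals)


G1 = [(["S"], ["a", "B", "a"]), (["S"], ["a", "X", "a"]), (["X", "b"], ["X", "b", "c"]),
      (["X", "b"], ["c"]), (["B", "a"], ["a", "B", "a"]), (["B", "a"], ["b"])]
RIGHT = [(["A"], ["a", "A"]), (["A"], ["b"])]                            # A -> a A | b
EXPR = [(["E"], ["E", "Op", "E"]), (["E"], ["Id"]), (["Op"], ["+"])]

print(is_context_free(G1, {"S", "B", "X"}))                                # False
print(is_regular(RIGHT, {"A"}))                                            # True
print(is_context_free(EXPR, {"E", "Op"}), is_linear(EXPR, {"E", "Op"}))    # True False
```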

Special cases
§ ∅: a language (but not an interesting one)
§ ε: the empty string
  § Must use a symbol so that we can see it
  § Can be the RHS
    § A → ε

3.1 Introduction
§ So far: brief summary of grammars
§ Using multiple grammars to save work
§ Properties of derivations
§ Parse trees
§ Properties of grammars
  § Detect ambiguity
  § Avoid ambiguity

3.1.1 Example grammar G2
§ Start symbol: S
§ Terminals: { a, b, …, z, 0, 1, …, 9, +, -, *, /, (, ) }
§ Non-terminals: { S, E, Op, Id, L, M, N }
§ Productions
  S → E
  E → E Op E | - E | ( E ) | Id
  Op → + | - | * | /
  Id → L M
  L → a | b | … | z
  M → L M | N M | ε
  N → 0 | 1 | … | 9
Note: the ε-production allows us to make M “disappear”:
S ⇒ E ⇒ Id ⇒ L M ⇒ L L M ⇒ a L M ⇒ ap M ⇒ ap
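The Id productions encode “one letter, then any number of letters or digits”: each use of M → L M or M → N M adds one character, and M → ε ends the identifier. A Python sketch that builds a left-most derivation from Id for a given word (the slide’s derivation above expands M before L, which yields the same string); `id_derivation` is a hypothetical name.

```python
import string

LETTERS = set(string.ascii_lowercase)  # L -> a | b | ... | z
DIGITS = set(string.digits)            # N -> 0 | 1 | ... | 9


def id_derivation(word):
    """Left-most derivation Id =>* word in G2 as a list of sentential forms,
    or None if word cannot be derived from Id."""
    if not word or word[0] not in LETTERS:
        return None  # Id -> L M: the first character must come from L
    if any(ch not in LETTERS and ch not in DIGITS for ch in word[1:]):
        return None  # M only produces letters and digits
    forms = ["Id", "L M"]                # Id -> L M
    done = word[0]
    forms.append(f"{done} M")            # L -> first letter
    for ch in word[1:]:
        cls = "L" if ch in LETTERS else "N"
        forms.append(f"{done} {cls} M")  # M -> L M or M -> N M
        done += ch
        forms.append(f"{done} M")        # L -> ch or N -> ch
    forms.append(done)                   # M -> epsilon makes M disappear
    return forms


print(" ⇒ ".join(id_derivation("ap")))
# Id ⇒ L M ⇒ a M ⇒ a L M ⇒ ap M ⇒ ap
print(" ⇒ ".join(id_derivation("a3")))
# Id ⇒ L M ⇒ a M ⇒ a N M ⇒ a3 M ⇒ a3
print(id_derivation("3a"))
# None
```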

Parsing
§ Given G and a word w ∈ T*, we want to know: “w ∈ L(G)?”
§ Analysis problem
  § Answer is either YES or NO
  § ap ∈ L(G2)
  § ap + bp ∈ L(G2)
  § ap++ ∉ L(G2)
§ For YES we need to find a sequence of productions so that S ⇒ … ⇒ w
  § (or S ⇒* w for short)
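For G2 the YES/NO question can be answered by a small recursive recognizer. Because E → E Op E is left-recursive, this Python sketch uses an equivalent form of the E productions, E → P { Op P } and P → - P | ( E ) | Id, which generates the same strings. Two assumptions: blanks in the examples are only for readability and are removed first, and the name `in_L_G2` is hypothetical. It decides membership only; it does not choose among the possible parse trees.

```python
import string

LETTERS = set(string.ascii_lowercase)
DIGITS = set(string.digits)
OPS = set("+-*/")


def in_L_G2(text):
    """Decide text in L(G2) (YES = True, NO = False).

    Uses the equivalent productions  E -> P { Op P },  P -> - P | ( E ) | Id.
    """
    s = text.replace(" ", "")  # assumption: blanks carry no meaning
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else ""

    def parse_p():
        nonlocal pos
        ch = peek()
        if ch == "-":                 # P -> - P
            pos += 1
            return parse_p()
        if ch == "(":                 # P -> ( E )
            pos += 1
            if not parse_e() or peek() != ")":
                return False
            pos += 1
            return True
        if ch in LETTERS:             # P -> Id, with Id -> L M
            pos += 1
            while peek() in LETTERS or peek() in DIGITS:  # M -> L M | N M | epsilon
                pos += 1
            return True
        return False

    def parse_e():
        nonlocal pos
        if not parse_p():
            return False
        while peek() in OPS:          # repeated E Op E
            pos += 1
            if not parse_p():
                return False
        return True

    return parse_e() and pos == len(s)


for w in ["ap", "ap + bp", "ap++"]:
    print(w, "YES" if in_L_G2(w) else "NO")
# ap YES
# ap + bp YES
# ap++ NO
```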

w = a3 + b
§ Derivation
S ⇒ E ⇒ E Op E ⇒ E Op Id ⇒ E + Id ⇒ Id + Id ⇒ Id + L M ⇒ Id + L ⇒ Id + b ⇒ L M + b ⇒ a M + b ⇒ a N M + b ⇒ a3 M + b ⇒ a3 + b

Comments
§ If a string w contains multiple non-terminals, we have a choice when expanding w ⇒ w’
  § Grammars that are context-free and without useless non-terminals must have a production for each non-terminal in w
§ Assume A, B ∈ NT, and A → α, B → β are productions P1, P2
  § w = δ A τ B γ
  § Choice #1: w1 = δ α τ B γ
  § Choice #2: w2 = δ A τ β γ
  § (Both w ⇒ w1 and w ⇒ w2 are possible)

More comments
§ Question: Does the choice influence L(G)?
  § Or: is (w1 ⇒* x ∈ L(G)) ⇔ (w2 ⇒* x ∈ L(G))?
§ Answer: the choice does not matter for context-free grammars
§ How do we decide which production to pick?
  § Everything worked out in the example
    § We always picked the right production
    § Found w = a3 + b
  § More on this later

More comments
§ Part of the derivation is pretty boring
  § Do we care about the exact steps to generate the identifier “a3”?
  § Details are not always important
§ Can we find a better way to deal with this aspect?
  § Better: simpler
  § Better: maybe also more efficient

Regular expressions
§ Idea: use a regular expression to capture the “uninteresting” part of a grammar
  § Here: the exact rules for identifier names
  § Replace this part of grammar G2:
    …
    Id → L M
    L → a | b | … | z
    M → L M | N M | ε
    N → 0 | 1 | … | 9
§ Regular expressions describe regular languages, which are recognized by finite state machines
  § Either a deterministic finite automaton (DFA)
  § Or a nondeterministic finite automaton (NFA)
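As a concrete instance, the identifier rule L { L | N }* is recognized by a two-state DFA. A table-driven Python sketch; the state names and `char_class` are hypothetical, and the comparison uses Python’s `re` module, where the same rule is written `[a-z][a-z0-9]*`.

```python
import re


def char_class(ch):
    """Map a character to the symbol classes of the regular definition."""
    if "a" <= ch <= "z":
        return "L"
    if "0" <= ch <= "9":
        return "N"
    return "other"


# Transition table for Id: L { L | N }*. A missing entry means "reject".
DELTA = {
    ("start", "L"): "in_id",
    ("in_id", "L"): "in_id",
    ("in_id", "N"): "in_id",
}
ACCEPTING = {"in_id"}


def dfa_accepts(word):
    state = "start"
    for ch in word:
        state = DELTA.get((state, char_class(ch)))
        if state is None:  # implicit dead state
            return False
    return state in ACCEPTING


# The same language written as a Python regular expression.
ID_RE = re.compile(r"[a-z][a-z0-9]*")

for w in ["a3", "ap", "3a", "a_b", ""]:
    assert dfa_accepts(w) == (ID_RE.fullmatch(w) is not None)
    print(repr(w), dfa_accepts(w))
```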

Token
§ Idea: introduce a grammar symbol that represents a string described by a regular expression
  § A terminal for the grammar
  § The rules that generate the string are given by the regular expression
§ When looking for a derivation, identify strings that can be described by a regular expression
  § “Token”
  § Example: a3 + b has the tokens Id (“a3”) + Id (“b”)
§ Chunks of the input stream
  § More in 3.2 Lexical analysis

Examples
§ a3 + b … really … Id(“a3”) + Id(“b”)
§ z * u + x … really … Id(“z”) * Id(“u”) + Id(“x”)
  § S ⇒* Id * Id + Id in G2
§ Treat terminals the same way
  § Id(“z”) Term(“*”) Id(“u”) Term(“+”) Id(“x”)
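Splitting the input into such tokens is the job of the scanner (3.2). A minimal Python sketch using the standard `re` module with one named group per token class; the class names Id and Term follow the slide, while `TOKEN_RE`, `tokenize`, and `show` are hypothetical. It skips blanks and raises an error on any character it cannot classify.

```python
import re

# One alternative per token class; \s* skips blanks between tokens.
TOKEN_RE = re.compile(r"\s*(?:(?P<Id>[a-z][a-z0-9]*)|(?P<Term>[-+*/()]))")


def tokenize(text):
    """Split text into (kind, lexeme) pairs, where kind is "Id" or "Term"."""
    tokens = []
    pos = 0
    end = len(text.rstrip())
    while pos < end:
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"cannot tokenize input at position {pos}")
        kind = m.lastgroup  # name of the alternative that matched
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens


def show(tokens):
    return " ".join(f'{kind}("{lexeme}")' for kind, lexeme in tokens)


print(show(tokenize("a3 + b")))
# Id("a3") Term("+") Id("b")
print(show(tokenize("z * u + x")))
# Id("z") Term("*") Id("u") Term("+") Id("x")
```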

3.1.2 Simplified grammar G3
§ Start symbol: S
§ Terminals: { a, b, …, z, +, -, *, /, (, ), Id }
§ Non-terminals: { S, E, Op }
§ Productions and regular definitions
  S → E
  E → E Op E | - E | ( E ) | Id
  Op → + | - | * | /
  Id: L { L | N }*   (regular expression)
  L = { a | b | c | … | z }
  N = { 0 | 1 | 2 | … | 9 }

More simplifications?
§ Can grammar G3 be simplified even further?
§ Are there other productions we can replace with a regular expression?
§ Productions
  S → E
  E → E Op E | - E | ( E ) | Id
  Id: L { L | N }*, L = { a | b | c | … | z }, N = { 0 | 1 | 2 | … | 9 }
  Op → + | - | * | /
§ Could treat Op the same way
  § Op: { + | - | * | / }

Simplified grammar G4
§ Start symbol: S
§ Terminals: { a, b, …, z, +, -, *, /, (, ), Id }
§ Non-terminals: { S, E, Op }
§ Productions and regular definitions
  S → E                (1)
  E → E Op E           (2)
    | - E              (3)
    | ( E )            (4)
    | Id               (5)
  Op → + | - | * | /   (6)
  Id: L { L | N }*, L = { a | b | c | … | z }, N = { 0 | 1 | 2 | … | 9 }

w = a3 + b
Please take a piece of paper and find a derivation for “a3 + b”.
(Raise your hand when you are done.)
Compare your solution with your neighbor’s solution.
§ Do you start with the same production?
§ Do you use the same production in the 2nd step?

w = a3 + b
§ Some example derivations
  § Derivation #1
    S ⇒ E ⇒ E Op E ⇒ Id Op E ⇒ Id + E ⇒ Id + Id
  § Derivation #2
    S ⇒ E ⇒ E Op E ⇒ E Op Id ⇒ E + Id ⇒ Id + Id
  § More?

Looking at derivations
§ a3 + b ∈ L(G4), i.e., a3 + b is a legal program
  § At least according to grammar G4
§ Are we done?
  § More analysis is needed
  § Looking at derivations helps start the analysis
  § Derivations may provide information on structure

Questions for derivations
§ Does the order of applying productions matter?
§ Are derivations unique?
§ How do we compare derivations?

Choice of non-terminal in a derivation step
§ Given w = δ A τ B γ (with A, B ∈ NT, and A → α, B → β productions)
§ Two choices
  § w ⇒ δ α τ B γ
  § w ⇒ δ A τ β γ

Many options
§ In Derivation #1, the left-most non-terminal is always picked for replacement
  § “Left-most” derivation
§ In Derivation #2, the right-most non-terminal is always picked for replacement
  § “Right-most” derivation
§ No influence on L(G)
  § But useful to distinguish “different” derivations
  § Intuitively: different derivations might convey different “meaning”
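Left-most and right-most derivations differ only in which non-terminal gets rewritten. The Python sketch below encodes G4 (splitting production (6) into the labels `6+`, `6-`, `6*`, `6/` is our own convention) and replays the same production sequence in both modes; the function `derive` is hypothetical. Applied to the sequence (1), (2), (5), (6+), (5), it reproduces Derivation #1 and Derivation #2 exactly.

```python
# G4 productions, numbered as on the G4 slide; (6) is split into one entry per alternative.
G4 = {
    "1": ("S", ["E"]),
    "2": ("E", ["E", "Op", "E"]),
    "3": ("E", ["-", "E"]),
    "4": ("E", ["(", "E", ")"]),
    "5": ("E", ["Id"]),
    "6+": ("Op", ["+"]),
    "6-": ("Op", ["-"]),
    "6*": ("Op", ["*"]),
    "6/": ("Op", ["/"]),
}
NONTERMINALS = {"S", "E", "Op"}


def derive(labels, leftmost=True):
    """Apply the productions in order, always rewriting the left-most
    (or right-most) non-terminal; return the derivation as a string."""
    form = ["S"]
    steps = ["S"]
    for label in labels:
        lhs, rhs = G4[label]
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no non-terminal left to rewrite")
        j = positions[0] if leftmost else positions[-1]
        if form[j] != lhs:
            raise ValueError(f"production {label} cannot rewrite {form[j]}")
        form = form[:j] + rhs + form[j + 1:]
        steps.append(" ".join(form))
    return " ⇒ ".join(steps)


labels = ["1", "2", "5", "6+", "5"]
print(derive(labels, leftmost=True))
# S ⇒ E ⇒ E Op E ⇒ Id Op E ⇒ Id + E ⇒ Id + Id
print(derive(labels, leftmost=False))
# S ⇒ E ⇒ E Op E ⇒ E Op Id ⇒ E + Id ⇒ Id + Id
```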

Derivations
§ Given a grammar G with productions Pi
§ Consider two derivations D1 = Pa Pb Pc … Pn and D2 = P’a P’b P’c … P’n
  § Pj, P’k productions, applied as intended
§ Are D1 and D2 the same?
  § Again (intuitively): do they “mean” the same?

Derivations
§ Question: Are D1 and D2 both right-most derivations (or both left-most derivations)?
  § YES: then they are the same exactly when Pm = P’m for all 1 ≤ m ≤ n
  § NO: we can’t easily compare them
    § More later (parse trees)
§ Looking only at right-most (or only at left-most) derivations allows us to compare derivations
  § Different derivations don’t always matter
  § … but sometimes they do (more later)

Parse tree
§ Want to identify the structure expressed by a derivation
  § Compare two derivations that are not both right-most (or both left-most) derivations
§ Summary of a derivation
  § Ignores the order in which productions are applied
  § Leaves: terminals
  § Interior nodes: application of a production

Parse tree construction
§ How do we construct a parse tree? By induction
§ Given a derivation α1 ⇒ α2 ⇒ … ⇒ αi ⇒ αi+1 ⇒ … ⇒ αn
§ Step 1: construct the tree for α1
  § Really the tree for A = α1
  § Single node labeled A
§ Step i: assume the tree for α1 ⇒ α2 ⇒ … ⇒ αi is already constructed
  § αi = X1 X2 … Xj … Xk
  § Assume Xj → β = Y1 … Ym leads to αi ⇒ αi+1
  § Take the tree built for α1 ⇒ α2 ⇒ … ⇒ αi
  § Find the j-th leaf in this tree (counting only non-ε leaves); it is labeled Xj
  § Add m new children (all leaves) to it, labeled Y1 … Ym
  § Special case: m = 0, i.e., β = ε
    § Add one child with label ε
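The inductive construction can be sketched directly in Python. Assumptions: each derivation step is given as (j, Xj, [Y1, …, Ym]) with j counted from 0 (the slide counts from 1), and the class `Node` and function `build_parse_tree` are hypothetical. Trees print as nested brackets, [X …] being a node X with the listed children. Building the tree from Derivation #1 and from Derivation #2 gives the same result.

```python
from __future__ import annotations

from dataclasses import dataclass, field

EPS = "ε"


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)

    def frontier(self) -> list[Node]:
        """Current leaves from left to right, skipping epsilon leaves; their labels spell alpha_i."""
        if not self.children:
            return [] if self.label == EPS else [self]
        return [leaf for child in self.children for leaf in child.frontier()]

    def __str__(self) -> str:
        if not self.children:
            return self.label
        return f"[{self.label} " + " ".join(str(c) for c in self.children) + "]"


def build_parse_tree(start, steps):
    """steps: one (j, Xj, [Y1, ..., Ym]) per derivation step, j counted from 0."""
    root = Node(start)                 # Step 1: single node labeled A
    for j, lhs, rhs in steps:
        leaf = root.frontier()[j]      # the j-th leaf is labeled Xj
        if leaf.label != lhs:
            raise ValueError(f"leaf {j} is {leaf.label}, not {lhs}")
        # Add m children labeled Y1 ... Ym, or a single epsilon child when m = 0.
        leaf.children = [Node(sym) for sym in rhs] or [Node(EPS)]
    return root


# Derivation #1 (left-most) and Derivation #2 (right-most) for Id + Id in G4.
d1 = [(0, "S", ["E"]), (0, "E", ["E", "Op", "E"]), (0, "E", ["Id"]),
      (1, "Op", ["+"]), (2, "E", ["Id"])]
d2 = [(0, "S", ["E"]), (0, "E", ["E", "Op", "E"]), (2, "E", ["Id"]),
      (1, "Op", ["+"]), (0, "E", ["Id"])]

t1 = build_parse_tree("S", d1)
t2 = build_parse_tree("S", d2)
print(t1)        # [S [E [E Id] [Op +] [E Id]]]
print(t1 == t2)  # True
```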

Example: Constructing a parse tree
§ Derivation #1 (left-most derivation)
  S ⇒ E ⇒ E Op E ⇒ Id Op E ⇒ Id + E ⇒ Id + Id
§ The tree after each step (bracket notation: [X …] is a node X with the listed children)
  § Start: S
  § S → E: [S E]
  § E → E Op E: [S [E E Op E]]
  § E → Id: [S [E [E Id] Op E]]
  § Op → +: [S [E [E Id] [Op +] E]]
  § E → Id: [S [E [E Id] [Op +] [E Id]]]

Example: Constructing a parse tree (cont’d)
§ Derivation #2 (right-most derivation)
  S ⇒ E ⇒ E Op E ⇒ E Op Id ⇒ E + Id ⇒ Id + Id
  Resulting tree ([X …] is a node X with the listed children): [S [E [E Id] [Op +] [E Id]]]
§ Same tree!
  § The parse tree summarizes the derivation (you can find the productions used)
  § It makes no statement about whether a right-most or left-most derivation was used

a + b * c
Talk to your neighbor and find a derivation for “a + b * c” (hint: right-most or left-most).
Construct the parse tree for your derivation.
Compare your tree with the result obtained by your neighbor team.

Derivations for a + b * c
§ Tree #1 ([X …] is a node X with the listed children): [S [E [E Id] [Op +] [E [E Id] [Op *] [E Id]]]]
§ Tree #2: [S [E [E [E Id] [Op +] [E Id]] [Op *] [E Id]]]
Note: Each tree can be obtained using both a left-most and a right-most derivation.

Derivations and parse trees
§ Derivations with different parse trees
  § For the same string w
§ What was intended by the programmer?
  § Tree #1 means: a + (b * c)
  § Tree #2 means: (a + b) * c
§ Should we allow grammars with different parse trees for w?
  § Probably not for programming languages (if derivations capture structure)

Different parse trees
§ There are grammars that allow more than one right-most derivation for some w ∈ L(G)
  § (Or more than one left-most derivation)
§ Different right-most (left-most) derivations result in different parse trees
  § They capture different structure
§ Example (right-most)
  § Derivation #1: S ⇒ E ⇒ E Op E ⇒ E Op E Op E ⇒ E Op E Op Id ⇒ E Op E * Id ⇒ E Op Id * Id ⇒ E + Id * Id ⇒ Id + Id * Id
  § Derivation #2: S ⇒ E ⇒ E Op E ⇒ E Op Id ⇒ E * Id ⇒ E Op E * Id ⇒ E Op Id * Id ⇒ E + Id * Id ⇒ Id + Id * Id
  § Derivation #1 expands the right operand of the first E Op E into E Op E; Derivation #2 expands the left operand

Derivations and parse trees
§ Tree #1 (from Derivation #1; [X …] is a node X with the listed children): [S [E [E Id] [Op +] [E [E Id] [Op *] [E Id]]]], i.e., a + (b * c)
§ Tree #2 (from Derivation #2): [S [E [E [E Id] [Op +] [E Id]] [Op *] [E Id]]], i.e., (a + b) * c

3.1.3 Ambiguity
§ A grammar that allows more than one parse tree for at least one w ∈ L(G) is called ambiguous
§ Note: ambiguity is a property of the grammar
  § Later we give an unambiguous grammar for expressions
§ We need to compare parse trees (and derivations)
  § Comparing derivations is easy if only left-most (or only right-most) derivations are used
§ Alternative definition: a grammar that allows more than one right-most (or more than one left-most) derivation for at least one w ∈ L(G) is called ambiguous
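Ambiguity can be made visible by counting parse trees. The Python sketch below counts the trees with which E derives a given token sequence under the expression productions of G4 (S → E adds no choices); tokens are list elements, with Id standing for any identifier. It is a memoized count over sub-ranges, and `count_parse_trees` is a hypothetical name. Any result greater than 1 exhibits a word with several parse trees.

```python
from functools import lru_cache

OPS = {"+", "-", "*", "/"}


def count_parse_trees(tokens):
    """Number of parse trees for E =>* tokens under
    E -> E Op E | - E | ( E ) | Id."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Parse trees for E deriving toks[i:j].
        if i >= j:
            return 0
        if j - i == 1:
            return 1 if toks[i] == "Id" else 0      # E -> Id
        total = 0
        if toks[i] == "-":
            total += count(i + 1, j)                 # E -> - E
        if toks[i] == "(" and toks[j - 1] == ")":
            total += count(i + 1, j - 1)             # E -> ( E )
        for k in range(i + 1, j - 1):                # E -> E Op E with Op at position k
            if toks[k] in OPS:
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(toks))


print(count_parse_trees("Id + Id * Id".split()))       # 2
print(count_parse_trees("Id + Id".split()))            # 1
print(count_parse_trees("Id + Id + Id + Id".split()))  # 5
```

The two trees for Id + Id * Id are exactly Tree #1 and Tree #2 above; with more operators the count grows quickly.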
![Page 58: w03 01-front-end-overview 18 · 3.1 Introduction §Frontend responsible to turn input program into IR §Input: Usually a string of ASCII or Unicode characters §IR: As required by](https://reader033.vdocuments.net/reader033/viewer/2022050220/5f655eaaae0b096ba81348f6/html5/thumbnails/58.jpg)
Problems w/ ambiguity
§ Compiler does not know how to interpret “a + b * c”§ Is it Tree #1? I.e., (a + b) * c§ Or is it Tree #2? I.e., a + (b * c)
§ What can we do?
Addressing ambiguity
§ Change the grammar
  § See later for a better grammar
  § May not always be possible
§ Change the language
  § Add rules stating that “*” binds more strongly than “+”
§ Precedence
  § Resolves conflicts
§ Bad idea: let the compiler (writer) decide
  § Or let the user worry
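One common way to implement a rule such as “*” binds more strongly than “+” is precedence climbing, a parsing technique that consults an operator-precedence table instead of encoding the levels in the grammar. The minimal Python sketch below assumes the precedence values, the tuple-shaped trees, and left associativity for both operators; malformed input is only partly checked.

```python
# Assumed precedence levels: a higher number binds more strongly.
PRECEDENCE = {"+": 1, "*": 2}


def parse(tokens):
    """Precedence-climbing parse of `operand (op operand)*`.

    Returns nested tuples (op, left, right); both operators are treated as
    left-associative. Any token not in PRECEDENCE counts as an operand.
    """
    pos = 0

    def parse_expr(min_prec):
        nonlocal pos
        left = tokens[pos]  # an operand
        pos += 1
        # Keep absorbing operators that bind at least as strongly as min_prec.
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            # The right operand may only contain operators that bind more strongly.
            right = parse_expr(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left

    tree = parse_expr(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


def show(tree):
    """Render a tree with explicit parentheses."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return f"({show(left)} {op} {show(right)})"
    return tree


print(show(parse("a + b * c".split())))  # (a + (b * c))
print(show(parse("a * b + c".split())))  # ((a * b) + c)
```

The grammar stays ambiguous; the table alone decides which of the two trees the parser builds.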
Another example
§ “If” statement: two forms
  § if (Condition) then (Body)
  § if (Condition) then (Body) else (Body)
Another example – G5
§ Start symbol: S
§ Productions:
  S → stmt-list S | stmt-list
  stmt-list → … | if-stmt
  if-stmt → if cond-expr then S | if cond-expr then S else S
  cond-expr → …
§ Other statements (assignment, function call, …) and expression details are omitted (abbreviated as Body and Cond)
Please construct with your neighbor a program fragment that shows that this grammar is ambiguous.
(Find an example with two parse trees)
Another example
if (Cond) then if (Cond) then (Body) else (Body)
§ Body: some other statements
§ Cond: some condition expression
What did the programmer mean?
§ Reading 1: if (Cond) then [if (Cond) then (Body) else (Body)], i.e., the else belongs to the inner if
§ Reading 2: if (Cond) then [if (Cond) then (Body)] else (Body), i.e., the else belongs to the outer if
Grammar G6
§ S → SS | (S) | () | ε
§ Is G6 ambiguous?
§ What is L(G6)? Find two right-most (left-most) derivations for some w.
§ Find a grammar G6’ such that L(G6) = L(G6’) and G6’ is not ambiguous.
Ambiguous languages
§ Ambiguity is a property of the grammar
§ One word is enough to show ambiguity
§ How do you show that a grammar is not ambiguous?
  § Proof (for one grammar)
  § Some kinds of grammars are guaranteed to be unambiguous; we will look at those in compiler design
§ Unfortunately, there are languages that are inherently ambiguous
  § Every grammar that generates such a language is ambiguous
  § Such languages exist even among Type-2 (context-free) languages
Transition from parse tree to IR
§ Parse tree
  § Sometimes called concrete syntax tree
  § Interior nodes represent non-terminals
§ Our tree-based IR: abstract syntax tree
  § Interior nodes represent programming constructs
  § Non-terminals are not (directly) preserved
  § Structure is close to that of the parse tree
§ Building the IR: during the derivation or in a separate transformation step
Parse tree vs IR
§ Concrete syntax tree (parse tree) for a7 + b: [S [E [E [Id a7]] [Op +] [E [Id b]]]]
§ Abstract syntax tree (IR): [+ [VAR a7] [VAR b]]
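The separate transformation step can be a small recursive function. In this sketch, parse-tree nodes are tuples (label, child, …), Id leaves carry their lexeme as ("Id", "a7"), and AST nodes are ("+", left, right) or ("VAR", name); these shapes are assumptions for illustration, not the course's IR classes. S and the unit step E → Id disappear, and E → E Op E becomes an operator node, matching the a7 + b example.

```python
# Concrete syntax tree for "a7 + b" (hypothetical tuple encoding: (label, *children)).
PARSE_TREE = ("S", ("E", ("E", ("Id", "a7")), ("Op", "+"), ("E", ("Id", "b"))))


def to_ast(node):
    """Map a parse-tree node to an AST node; non-terminals are not kept."""
    label, *children = node
    if label == "S":  # S -> E: the start symbol leaves no trace
        return to_ast(children[0])
    if label == "E" and len(children) == 1:  # E -> Id: skip the unit step
        return to_ast(children[0])
    if label == "E":  # E -> E Op E: becomes an operator node
        left, op, right = children
        return (op[1], to_ast(left), to_ast(right))  # op is ("Op", "+")
    if label == "Id":  # identifier leaf: keep only the name
        return ("VAR", children[0])
    raise ValueError(f"unexpected node {label!r}")


print(to_ast(PARSE_TREE))  # ('+', ('VAR', 'a7'), ('VAR', 'b'))
```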
Parsing
§ Given G and a word w ∈ T*, we want to know: is w ∈ L(G)?
  § An analysis problem
§ The answer is either YES or NO
§ w ∈ L(G) if and only if we can find a sequence of productions such that S ⇒* w
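The characterization can be turned into a (very inefficient) recognizer: search breadth-first through left-most derivations until the sentential form equals w. The Python sketch assumes the expression grammar S → E, E → E Op E | Id, Op → + | *, which has no ε-productions; that assumption is what justifies pruning forms longer than w and forms whose leading terminals disagree with w. It only illustrates S ⇒* w; it is not a practical parser.

```python
from collections import deque

# Assumed grammar: S -> E ; E -> E Op E | Id ; Op -> + | *  (no ε-productions)
GRAMMAR = {
    "S": [("E",)],
    "E": [("E", "Op", "E"), ("Id",)],
    "Op": [("+",), ("*",)],
}


def terminal_prefix(form):
    """Terminals in front of the left-most non-terminal of a sentential form."""
    prefix = []
    for sym in form:
        if sym in GRAMMAR:
            break
        prefix.append(sym)
    return tuple(prefix)


def find_derivation(w, start="S"):
    """Return a left-most derivation start ⇒* w as a list of forms, or None."""
    target = tuple(w)
    queue = deque([[(start,)]])  # each entry is a derivation so far
    seen = {(start,)}
    while queue:
        path = queue.popleft()
        form = path[-1]
        if form == target:
            return path
        nts = [k for k, sym in enumerate(form) if sym in GRAMMAR]
        if not nts:
            continue  # only terminals left, but not w
        i = nts[0]  # left-most non-terminal
        for rhs in GRAMMAR[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            prefix = terminal_prefix(new)
            # Without ε-productions a form never shrinks, and its terminal
            # prefix is fixed for the rest of a left-most derivation.
            if len(new) > len(target) or target[:len(prefix)] != prefix:
                continue
            if new not in seen:
                seen.add(new)
                queue.append(path + [new])
    return None


w = ["Id", "+", "Id", "*", "Id"]
for form in find_derivation(w):
    print(" ".join(form))
```

A result of None answers NO; any returned list of sentential forms is a witness for YES.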
Summary
§ The frontend performs two tasks
  § Break the input into tokens
  § Check that the sequence of tokens is legal input
    § Find a derivation S ⇒* w
§ Goal: produce the IR
  § Parse trees capture derivations
    § They carry the information about structure that is needed for the IR
  § Our IR is tree-based, so the step from the parse tree to the IR tree is not large