![Page 1: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/1.jpg)
Lexical and Syntax Analysis
Top-Down Parsing
![Page 2: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/2.jpg)
String of characters (easy for humans to write and understand)
→ Lexemes identified
→ String of tokens
→ Data structure (easy for programs to transform)
![Page 3: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/3.jpg)
Syntax
A syntax is a set of rules defining the valid strings of a language, often specified by a context-free grammar.
For example, a grammar E for arithmetic expressions:
e → x | y | e + e | e – e | e * e | ( e )
![Page 4: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/4.jpg)
Derivations
A derivation is a proof that some string conforms to a grammar.
A leftmost derivation:
e ⇒ e + e ⇒ x + e ⇒ x + ( e ) ⇒ x + ( e * e ) ⇒ x + ( y * e ) ⇒ x + ( y * x )
![Page 5: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/5.jpg)
Derivations
A rightmost derivation:
e ⇒ e + e ⇒ e + ( e ) ⇒ e + ( e * e ) ⇒ e + ( e * x ) ⇒ e + ( y * x ) ⇒ x + ( y * x )
Many ways to derive the same string: many ways to write the same proof.
![Page 6: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/6.jpg)
Parse tree: motivation
Also a proof that a given input is valid according to the grammar. But a parse tree:
is more concise: we don’t write out the sentence every time a non-terminal is expanded.
abstracts over the order in which rules are applied.
![Page 7: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/7.jpg)
Parse tree: intuition
If non-terminal n has a production
n → X Y Z
where X, Y, and Z are terminals or non-terminals, then a parse tree may have an interior node labelled n with three children labelled X, Y, and Z.
n
X Y Z
![Page 8: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/8.jpg)
Parse tree: definition
A parse tree is a tree in which:
the root is labelled by the start symbol;
each leaf is labelled by a terminal symbol, or 𝜀;
each interior node is labelled by a non-terminal;
if n is a non-terminal labelling an interior node whose children are X1, X2, ⋯, Xk then there must exist a production n → X1 X2 ⋯ Xk.
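As a rough illustration of this definition, a parse tree can be stored as a tree of labelled nodes. The sketch below is not part of the original slides: the names `Node`, `mk`, `yield` and `example_tree` are hypothetical, and the tree built is one possible parse tree for the string x + y * x under grammar E. Reading the leaves from left to right should give back the input string.

```c
#include <stdlib.h>
#include <string.h>

/* A parse-tree node: a label and up to three children (enough for grammar E). */
typedef struct Node {
    char label;              /* 'e' for the non-terminal, otherwise a terminal */
    int nkids;               /* 0 for a leaf */
    struct Node *kids[3];
} Node;

/* Make a node; children are given left to right, unused ones are NULL.
   (No malloc failure check, to keep the sketch short.) */
Node *mk(char label, Node *a, Node *b, Node *c) {
    Node *n = malloc(sizeof *n);
    n->label = label;
    n->kids[0] = a; n->kids[1] = b; n->kids[2] = c;
    n->nkids = (a != NULL) + (b != NULL) + (c != NULL);
    return n;
}

/* Append the labels of the leaves, left to right, to the string out. */
void yield(const Node *n, char *out) {
    if (n->nkids == 0) {
        size_t len = strlen(out);
        out[len] = n->label;
        out[len + 1] = '\0';
        return;
    }
    for (int i = 0; i < n->nkids; i++)
        yield(n->kids[i], out);
}

/* The tree for x + y * x in which * is grouped below +. */
Node *example_tree(void) {
    Node *x1 = mk('e', mk('x', NULL, NULL, NULL), NULL, NULL);
    Node *y  = mk('e', mk('y', NULL, NULL, NULL), NULL, NULL);
    Node *x2 = mk('e', mk('x', NULL, NULL, NULL), NULL, NULL);
    Node *prod = mk('e', y, mk('*', NULL, NULL, NULL), x2);
    return mk('e', x1, mk('+', NULL, NULL, NULL), prod);
}
```

Calling `yield(example_tree(), buf)` on an empty buffer `buf` fills it with the leaves of the tree in order. Every interior node here is labelled e and matches one of the productions e → e + e, e → e * e, e → x or e → y.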
![Page 9: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/9.jpg)
Example 1
Example input string:
x + y * x
A resulting parse tree according to grammar E:
          e
        / | \
       e  +  e
       |   / | \
       x  e  *  e
          |     |
          y     x
![Page 10: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/10.jpg)
Example 2
The following is not a parse tree according to grammar E.
          e
        / | \
       x  +  e
           / | \
          e  *  e
          |     |
          y     x
Why? Because e → x + e is not a production in grammar E.
![Page 11: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/11.jpg)
Grammar notation
Non-terminals are underlined.
Rather than writing
e → x
e → e + e
we may write:
e → x | e + e
(Also, symbols → and ::= will be used interchangeably.)
![Page 12: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/12.jpg)
Syntax Analysis
String of symbols → Parse tree
A parse tree is:
1. A proof that a given input is valid according to the grammar;
2. A data structure that is convenient for compilers to process.
(Syntax analysis may also report that the input string is invalid.)
![Page 13: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/13.jpg)
Ambiguity
If some string has more than one parse tree then the grammar is ambiguous. For example, the string x+y*x has two parse trees:
Tree 1, grouping x+(y*x):
          e
        / | \
       e  +  e
       |   / | \
       x  e  *  e
          |     |
          y     x
Tree 2, grouping (x+y)*x:
             e
           / | \
          e  *  e
        / | \   |
       e  +  e  x
       |     |
       x     y
![Page 14: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/14.jpg)
Operator precedence
Different parse trees often have different meanings, so we usually want unambiguous grammars.
Conventionally, * has a higher precedence (binds tighter) than +, so there is only one interpretation of x+y*x, namely x+(y*x).
![Page 15: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/15.jpg)
Operator associativity
Even with precedence rules, ambiguity remains, e.g. x-x-x-x.
Binary operators are either:
left-associative;
right-associative;
non-associative.
Conventionally, - is left-associative, so there is only one interpretation of x-x-x-x, namely ((x-x)-x)-x.
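To see why the grouping matters, here is a small worked example. The concrete numbers are an assumption chosen purely for illustration (they do not appear in the slides), and the function names are hypothetical. Each pair of functions evaluates the same string of symbols under two different parse trees.

```c
/* x - x - x - x with the values 8, 4, 2, 1 substituted for illustration. */
int minus_left(void)  { return ((8 - 4) - 2) - 1; }   /* left-associative grouping  */
int minus_right(void) { return 8 - (4 - (2 - 1)); }   /* right-associative grouping */

/* x + y * x with x = 2 and y = 3. */
int times_binds_tighter(void) { return 2 + (3 * 2); } /* x + (y * x) */
int plus_binds_tighter(void)  { return (2 + 3) * 2; } /* (x + y) * x */
```

The two groupings of the subtraction give 1 and 5, and the two groupings of the mixed expression give 8 and 10, so a compiler has to commit to one parse tree per string.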
![Page 16: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/16.jpg)
Ambiguity removal
Example input:
e → x | y | e + e | e – e | e * e | ( e )
All operators are left-associative, and * binds tighter than + and –.
![Page 17: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/17.jpg)
Ambiguity removal
Example output:
e → e + e1
| e – e1
| e1
e1 → e1 * e2
| e2
e2 → ( e ) | x | y
Note: ignoring bracketed expressions, e1 disallows + and –, and e2 disallows +, – and *.
![Page 18: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/18.jpg)
Disallowed parse trees
After disambiguation, there are no parse trees corresponding to the following originals:
LHS of * cannot contain a +.
             e
           / | \
          e  *  e
        / | \   |
       e  +  e  x
       |     |
       x     y
RHS of + cannot contain a -.
          e
        / | \
       e  +  e
       |   / | \
       x  e  -  e
          |     |
          y     x
![Page 19: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/19.jpg)
Ambiguity removal: step-by-step
Given a non-terminal e which involves operators at n levels of precedence:
Step 1: introduce n+1 new non-terminals, e0 ⋯ en.
![Page 20: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/20.jpg)
Let op denote an operator with precedence i.
Step 2a: replace each production
e → e op e
with
e_i → e_i op e_{i+1}
| e_{i+1}
if op is left-associative, or
e_i → e_{i+1} op e_i
| e_{i+1}
if op is right-associative.
![Page 21: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/21.jpg)
Step 2b: replace each production
e → op e
with
e_i → op e_i
| e_{i+1}
Step 2c: replace each production
e → e op
with
e_i → e_i op
| e_{i+1}
![Page 22: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/22.jpg)
Construct the precedence table:
| Operator | Precedence |
| --- | --- |
| +, – | 0 |
| * | 1 |
Grammar E after step 2 becomes:
e0 → e0 + e1
| e0 – e1
| e1
e1 → e1 * e2
| e2
e → ( e ) | x | y
![Page 23: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/23.jpg)
Step 3: replace each production
e → ⋯
with
e_n → ⋯
After step 3:
e0 → e0 + e1
| e0 – e1
| e1
e1 → e1 * e2
| e2
e2 → ( e ) | x | y
![Page 24: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/24.jpg)
Step 4: replace all occurrences of e0 with e.
After step 4:
e → e + e1
| e – e1
| e1
e1 → e1 * e2
| e2
e2 → ( e ) | x | y
![Page 25: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/25.jpg)
Exercise 1
Consider the following ambiguous grammar for logical propositions.
p → 0 (Zero)
| 1 (One)
| ~ p (Negation)
| p + p (Disjunction)
| p * p (Conjunction)
Now let + and * be right-associative, and let the operators in increasing order of binding strength be +, *, ~.
Give an unambiguous grammar for logical propositions.
![Page 26: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/26.jpg)
Exercise 2
Which of the following grammars are ambiguous?
s → if b then s | if b then s else s | skip
e → + e e | – e e | x
b → 0 b 1 | 0 1
![Page 27: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/27.jpg)
Homework exercise
Consider the following ambiguous grammar G.
s → if b then s | if b then s else s | skip
Give an unambiguous grammar that accepts the same language as G.
![Page 28: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/28.jpg)
Summary so far
The syntax of a language is often specified by a context-free grammar.
Derivations and parse trees are proofs.
Parse trees lead to a concise definition of ambiguity.
Unambiguous grammars can be constructed using rules of precedence and associativity.
![Page 29: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/29.jpg)
PART 2: TOP-DOWN PARSING
• Recursive-Descent
• Backtracking
• Left-Factoring
• Predictive Parsing
• Left-Recursion Removal
• First and Follow Sets
• Parsing tables and LL(1)
![Page 30: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/30.jpg)
Top-down parsing
Top-down: begin with the start symbol and expand non-terminals, succeeding when the input string is matched.
A good strategy for writing parsers:
1. Implement a syntax checker to accept or reject input strings.
2. Modify the checker to construct a parse tree – straightforward.
![Page 31: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/31.jpg)
RECURSIVE DESCENT
A popular top-down parsing technique.
![Page 32: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/32.jpg)
Recursive descent
A recursive descent parser consists of a set of functions, one for each non-terminal.
The function for non-terminal n returns true if some prefix of the input string can be derived from n, and false otherwise.
![Page 33: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/33.jpg)
Consuming the input
We assume a global variable next points to the input string.
    char* next;
Consume c from input if possible.
    int eat(char c) {
      if (*next == c) { next++; return 1; }
      return 0;
    }
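A short usage sketch may help show how eat moves through the input. The helper `eat_all` is hypothetical and not part of the slides. It is only there to show that a failed eat leaves next where it was, while each successful eat advances it by one character.

```c
char *next;   /* points at the next unread input character */

/* Consume c from the input if it is the next character. */
int eat(char c) {
    if (*next == c) { next++; return 1; }
    return 0;
}

/* Hypothetical helper: eat the characters of want one by one and
   return how many were consumed before the first mismatch. */
int eat_all(const char *want) {
    int n = 0;
    while (*want != '\0' && eat(*want)) { want++; n++; }
    return n;
}
```

With next pointing at the string x+y, the call eat('y') fails and leaves next unchanged, eat('x') succeeds and moves next to the +, and eat_all("+y") then consumes the remaining two characters.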
![Page 34: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/34.jpg)
Recursive descent
For each non-terminal N, introduce:
    int N() {
      char* save = next;
      for each N → X1 X2 ⋯ Xn
        if (parse(X1) && parse(X2) && ⋯ && parse(Xn)) return 1;
        else next = save;   /* backtrack */
      return 0;
    }
Let parse(X) denote
X() if X is a non-terminal
eat(X) if X is a terminal
![Page 35: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/35.jpg)
Exercise 4
Consider the following grammar G with start symbol e.
e → ( e + e ) | ( e * e ) | v
v → x | y
Using recursive descent, write a syntax checker for grammar G.
![Page 36: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/36.jpg)
Answer (part 1)
    int e() {
      char* save = next;
      if (eat('(') && e() && eat('+') && e() && eat(')')) return 1;
      else next = save;
      if (eat('(') && e() && eat('*') && e() && eat(')')) return 1;
      else next = save;
      if (v()) return 1;
      else next = save;
      return 0;
    }
![Page 37: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/37.jpg)
Answer (part 2)
    int v() {
      char* save = next;
      if (eat('x')) return 1;
      else next = save;
      if (eat('y')) return 1;
      else next = save;
      return 0;
    }
![Page 38: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/38.jpg)
Exercise 5
How many function calls are made by the recursive descent parser to parse the following strings?
(x*x)
((x*x)*x)
(((x*x)*x)*x)
(See animation of backtracking.)
![Page 39: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/39.jpg)
Answer
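| Input string | Length | Calls |
| --- | --- | --- |
| (x*x) | 5 | 21 |
| ((x*x)*x) | 9 | 53 |
| (((x*x)*x)*x) | 13 | 117 |
For these inputs the number of calls more than doubles with each extra level of nesting (53 = 2 × 21 + 11 and 117 = 2 × 53 + 11), so it grows exponentially with the nesting depth, and hence with the length of such input strings, rather than quadratically.
Lesson: backtracking is expensive!
(Plot: function calls against string length.)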
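The counts in the table can be reproduced by instrumenting the checker from Exercise 4. The sketch below assumes that a "function call" means any call to eat, e or v. The counter `calls` and the wrapper `count_calls` are hypothetical names introduced only for this illustration.

```c
char *next;
long calls;   /* hypothetical counter, bumped on entry to eat, e and v */

int eat(char c) {
    calls++;
    if (*next == c) { next++; return 1; }
    return 0;
}

int e(void);

/* v -> x | y */
int v(void) {
    calls++;
    char *save = next;
    if (eat('x')) return 1;
    next = save;
    if (eat('y')) return 1;
    next = save;
    return 0;
}

/* e -> ( e + e ) | ( e * e ) | v, trying the alternatives in that order */
int e(void) {
    calls++;
    char *save = next;
    if (eat('(') && e() && eat('+') && e() && eat(')')) return 1;
    next = save;
    if (eat('(') && e() && eat('*') && e() && eat(')')) return 1;
    next = save;
    if (v()) return 1;
    next = save;
    return 0;
}

/* Parse input from the start and report how many calls were made. */
long count_calls(char *input) {
    next = input;
    calls = 0;
    e();
    return calls;
}
```

On the three strings in the table this sketch makes 21, 53 and 117 calls. The reason for the growth is visible in e: for an input of the form (A*x), the sub-expression A is parsed once inside the + alternative, which then fails at the operator, and a second time inside the * alternative.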
![Page 40: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/40.jpg)
LEFT FACTORING
Reducing backtracking!
![Page 41: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/41.jpg)
Left factoring
When two productions for a non-terminal share a common prefix, expensive backtracking can be avoided by left-factoring the grammar.
Idea: Introduce a new non-terminal that accepts each of the different suffixes.
![Page 42: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/42.jpg)
Example 3
Left-factoring grammar G by introducing non-terminal r:
e → ( e r | v
r → + e ) | * e )
v → x | y
Here ( e is the common prefix of the first two productions of e, and + e ) and * e ) are the different suffixes, now accepted by r.
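A recursive descent checker for the left-factored grammar might look as follows. This is a sketch that applies the same backtracking scheme as before to the new grammar. The wrapper `check`, which also insists that the whole input is consumed, is a hypothetical addition and not part of the slides.

```c
char *next;

int eat(char c) {
    if (*next == c) { next++; return 1; }
    return 0;
}

int e(void);
int r(void);
int v(void);

/* e -> ( e r | v : the common prefix "( e" is now parsed only once */
int e(void) {
    char *save = next;
    if (eat('(') && e() && r()) return 1;
    next = save;
    if (v()) return 1;
    next = save;
    return 0;
}

/* r -> + e ) | * e ) : the two different suffixes */
int r(void) {
    char *save = next;
    if (eat('+') && e() && eat(')')) return 1;
    next = save;
    if (eat('*') && e() && eat(')')) return 1;
    next = save;
    return 0;
}

/* v -> x | y */
int v(void) {
    char *save = next;
    if (eat('x')) return 1;
    next = save;
    if (eat('y')) return 1;
    next = save;
    return 0;
}

/* Hypothetical wrapper: accept only if e() succeeds and no input is left. */
int check(char *input) {
    next = input;
    return e() && *next == '\0';
}
```

When the operator turns out to be * rather than +, only the single failed eat('+') in r is wasted. The already parsed sub-expression is not parsed again.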
![Page 43: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/43.jpg)
Effect of left-factoring
Input string Length Calls
(x*x) 5 13
((x*x)*x) 9 22
(((x*x)*x)*x) 13 31
Number of calls is now linear in the length of input string.
Lesson: left-factoring a grammar reduces backtracking.
String length
Fun
ctio
n c
alls
![Page 44: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/44.jpg)
PREDICTIVE PARSING
Eliminating backtracking!
![Page 45: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/45.jpg)
Predictive parsing
Idea: know which production of a non-terminal to choose based solely on the next input symbol.
Advantage: very efficient since it eliminates all backtracking.
Disadvantage: not all grammars can be parsed in this way. (But many useful ones can.)
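As a small illustration of the idea, the left-factored grammar from Example 3 happens to allow this: every alternative of e, r and v starts with a different terminal, so one look at *next is enough to pick a production. The sketch below is an assumption of this write-up rather than code from the slides, and the wrapper `check` is a hypothetical name.

```c
char *next;

int eat(char c) {
    if (*next == c) { next++; return 1; }
    return 0;
}

int e(void);

/* v -> x | y */
int v(void) {
    switch (*next) {
    case 'x': return eat('x');
    case 'y': return eat('y');
    default:  return 0;
    }
}

/* r -> + e ) | * e ) */
int r(void) {
    switch (*next) {
    case '+': return eat('+') && e() && eat(')');
    case '*': return eat('*') && e() && eat(')');
    default:  return 0;
    }
}

/* e -> ( e r | v */
int e(void) {
    switch (*next) {
    case '(': return eat('(') && e() && r();
    case 'x':
    case 'y': return v();
    default:  return 0;
    }
}

/* Hypothetical wrapper: accept only if the whole input is consumed. */
int check(char *input) {
    next = input;
    return e() && *next == '\0';
}
```

There is no save variable and no resetting of next: once a production has been chosen, a later failure means the input is invalid, not that another alternative should be tried.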
![Page 46: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/46.jpg)
Running example
The following grammar H will be used as a running example to demonstrate predictive parsing.
e → e + e | e * e | ( e ) | x | y
Example string:
x+y*(y+x)
![Page 47: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/47.jpg)
Removing ambiguity
Since + and * are left-associative and * binds tighter than +, we can derive an unambiguous variant of H.
e → e + t | t
t → t * f | f
f → ( e ) | x | y
![Page 48: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/48.jpg)
Left recursion
Problem: left-recursive grammars cause recursive descent parsers to loop forever.
    int e() {
      char* save = next;
      if (e() && eat('+') && t()) return 1;   /* call to self without consuming any input */
      next = save;
      if (t()) return 1;
      next = save;
      return 0;
    }
![Page 49: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/49.jpg)
Eliminating left recursion
Let 𝛼 denote any sequence of grammar symbols.
Rule 1: n → 𝛼 ⟹ n → 𝛼 n' (where 𝛼 does not begin with n)
Rule 2: n → n 𝛼 ⟹ n' → 𝛼 n'
Rule 3: introduce the new production n' → 𝜀
![Page 50: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/50.jpg)
Eliminating left recursion
Example before:
e → e + v | v
v → x | y
and after:
e → v e'
e' → 𝜀 | + v e'
v → x | y
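The grammar after the transformation can be checked by recursive descent without looping. The following is a sketch, not code from the slides: the function name `e1` stands for e', and the + v e' alternative is tried before the 𝜀 alternative so that as much input as possible is consumed.

```c
char *next;

int eat(char c) {
    if (*next == c) { next++; return 1; }
    return 0;
}

/* v -> x | y (eat consumes nothing on failure, so no save is needed) */
int v(void) { return eat('x') || eat('y'); }

/* e' -> + v e' | epsilon */
int e1(void) {
    char *save = next;
    if (eat('+') && v() && e1()) return 1;
    next = save;     /* backtrack, then fall through to the epsilon alternative */
    return 1;        /* epsilon matches the empty prefix */
}

/* e -> v e' : v must consume a character before e1 is reached */
int e(void) { return v() && e1(); }
```

On the input x+y+x the function e consumes everything. On x+ it succeeds after consuming only the x, because the attempt to match + v e' fails and e1 falls back to 𝜀. Every recursive call is now preceded by a successful eat, which is why the recursion terminates.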
![Page 51: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/51.jpg)
Example 4
Running example, after eliminating left-recursion.
e → t e'
e' → + t e' | 𝜀
t → f t'
t' → * f t' | 𝜀
f → ( e ) | x | y
![Page 52: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/52.jpg)
first and follow sets
Predictive parsers are built using the first and follow sets of each non-terminal in a grammar.
![Page 53: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/53.jpg)
Definition of first sets
Let 𝛼 denote any sequence of grammar symbols.
If 𝛼 can derive a string beginning with terminal a then a ∊ first(𝛼).
If 𝛼 can derive 𝜀 then 𝜀 ∊ first(𝛼).
![Page 54: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/54.jpg)
Computing first sets
If a is a terminal then a ∊ first(a 𝛼).
The empty string 𝜀 ∊ first(𝜀).
If X1X2⋯Xn is a sequence of grammar symbols
and ∃i · terminal a ∊ first(Xi)
and ∀j < i · 𝜀 ∊ first(Xj)
then a ∊ first(X1X2⋯Xn).
If 𝜀 ∊ first(Xi) for every i then 𝜀 ∊ first(X1X2⋯Xn).
If n → 𝛼 is a production then everything in first(𝛼) is in first(n).
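These rules can be applied repeatedly until no set changes. The sketch below does this for the running example after left-recursion removal. The encoding is an assumption made for this illustration only: each symbol is one character, with E, A, T, B, F standing for e, e', t, t', f, lower-case letters and punctuation as terminals, and '#' standing for 𝜀. All function and variable names are hypothetical.

```c
#include <string.h>

#define EPS '#'   /* stands for epsilon in this sketch */

/* Productions written as "N:rhs"; an empty rhs is the epsilon production. */
const char *prods[] = { "E:TA", "A:+TA", "A:", "T:FB", "B:*FB", "B:",
                        "F:(E)", "F:x", "F:y" };
#define NPRODS ((int)(sizeof prods / sizeof prods[0]))

char first[128][128];   /* first[n][a] != 0 iff a is in first(n) */

int is_nonterminal(char c) { return c >= 'A' && c <= 'Z'; }

/* Add a to first(n); return 1 if the set changed. */
int add(char n, char a) {
    if (first[(int)n][(int)a]) return 0;
    first[(int)n][(int)a] = 1;
    return 1;
}

/* Iterate over all productions until no first set changes. */
void compute_first(void) {
    int changed = 1;
    memset(first, 0, sizeof first);
    while (changed) {
        changed = 0;
        for (int p = 0; p < NPRODS; p++) {
            char n = prods[p][0];
            const char *rhs = prods[p] + 2;
            int nullable = 1;   /* all symbols seen so far can derive epsilon */
            for (; *rhs != '\0' && nullable; rhs++) {
                if (!is_nonterminal(*rhs)) {
                    changed |= add(n, *rhs);          /* rhs starts with a terminal */
                    nullable = 0;
                } else {
                    for (int a = 0; a < 128; a++)     /* copy first(X) minus epsilon */
                        if (a != EPS && first[(int)*rhs][a])
                            changed |= add(n, (char)a);
                    nullable = first[(int)*rhs][EPS];
                }
            }
            if (nullable) changed |= add(n, EPS);     /* whole rhs can vanish */
        }
    }
}

/* 1 if a is in first(n) after compute_first(). */
int in_first(char n, char a) { return first[(int)n][(int)a]; }
```

After `compute_first()`, a query such as `in_first('A', '+')` reports whether + is in first(e'). The loop stops because each pass either adds at least one element to some set or changes nothing, and the sets are finite.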
![Page 55: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/55.jpg)
Exercise 6
Given the grammar
e → ( e + e ) | ( e * e ) | v
v → x | 𝜀
give all members of the sets:
first( v )
first( e )
first( v e )
![Page 56: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/56.jpg)
Exercise 7
What are the first sets for each non-terminal in the following grammar?
e → t e'
e' → + t e' | 𝜀
t → f t'
t' → * f t' | 𝜀
f → ( e ) | x | y
![Page 57: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/57.jpg)
Answer
first( f ) = { '(', 'x', 'y' }
first( t' ) = { '*', 𝜀 }
first( t ) = { '(', 'x', 'y' }
first( e' ) = { '+', 𝜀 }
first( e ) = { '(', 'x', 'y' }
![Page 58: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/58.jpg)
Definition of follow sets
Let 𝛼 and 𝛽 denote any sequence of grammar symbols.
Terminal a ∊ follow(n) if the start symbol of the grammar can derive a string of grammar symbols in which a immediately follows n.
The set follow(n) never contains 𝜀.
![Page 59: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/59.jpg)
End markers
In predictive parsing, it is useful to mark the end of the input string with a $ symbol.
((x*x)*x)$
$ plays the same role as the terminating '\0' of a C string.
![Page 60: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/60.jpg)
Computing follow sets
If s is the start symbol of the grammar then $ ∊ follow(s).
If n → 𝛼 x 𝛽 then everything in first(𝛽) except 𝜀 is in follow(x).
If n → 𝛼 x
or n → 𝛼 x 𝛽 and 𝜀 ∊ first(𝛽)
then everything in follow(n) is in follow(x).
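The three rules can again be iterated until nothing changes. The sketch below does so for the running example after left-recursion removal, taking the first sets of the non-terminals as given rather than recomputing them. As before, the one-character encoding (E, A, T, B, F for e, e', t, t', f and '#' for 𝜀) and all names are assumptions made for this illustration.

```c
#include <string.h>

#define EPS '#'   /* stands for epsilon in this sketch */

const char *prods[] = { "E:TA", "A:+TA", "A:", "T:FB", "B:*FB", "B:",
                        "F:(E)", "F:x", "F:y" };
#define NPRODS ((int)(sizeof prods / sizeof prods[0]))

char follow[128][128];   /* follow[n][a] != 0 iff a is in follow(n) */

int is_nonterminal(char c) { return c >= 'A' && c <= 'Z'; }

/* first sets of the non-terminals, assumed already known. */
const char *first_of(char n) {
    switch (n) {
    case 'A': return "+#";
    case 'B': return "*#";
    default:  return "(xy";   /* E, T and F */
    }
}

/* Add a to follow(n); return 1 if the set changed. */
int add(char n, char a) {
    if (follow[(int)n][(int)a]) return 0;
    follow[(int)n][(int)a] = 1;
    return 1;
}

void compute_follow(char start) {
    int changed = 1;
    memset(follow, 0, sizeof follow);
    follow[(int)start]['$'] = 1;                 /* $ follows the start symbol */
    while (changed) {
        changed = 0;
        for (int p = 0; p < NPRODS; p++) {
            char n = prods[p][0];
            const char *rhs = prods[p] + 2;
            for (int i = 0; rhs[i] != '\0'; i++) {
                if (!is_nonterminal(rhs[i])) continue;
                char x = rhs[i];
                int rest_nullable = 1;           /* can the symbols after x vanish? */
                for (int j = i + 1; rhs[j] != '\0' && rest_nullable; j++) {
                    if (!is_nonterminal(rhs[j])) {
                        changed |= add(x, rhs[j]);
                        rest_nullable = 0;
                    } else {
                        rest_nullable = 0;
                        for (const char *f = first_of(rhs[j]); *f != '\0'; f++) {
                            if (*f == EPS) rest_nullable = 1;
                            else changed |= add(x, *f);
                        }
                    }
                }
                if (rest_nullable)               /* follow(n) is included in follow(x) */
                    for (int a = 0; a < 128; a++)
                        if (follow[(int)n][a]) changed |= add(x, (char)a);
            }
        }
    }
}

/* 1 if a is in follow(n) after compute_follow(). */
int in_follow(char n, char a) { return follow[(int)n][(int)a]; }
```

After `compute_follow('E')`, a query such as `in_follow('F', '*')` reports whether * can immediately follow f. Hard-coding the first sets keeps the sketch short. A fuller version would compute them from the grammar.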
![Page 61: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/61.jpg)
Exercise
Given the grammar
e → ( e + e ) | ( e * e ) | v
v → x | 𝜀
give all members of the sets:
follow( e )
follow( v )
![Page 62: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/62.jpg)
Exercise 8
What are the follow sets for each non-terminal in the following grammar?
e → t e'
e' → + t e' | 𝜀
t → f t'
t' → * f t' | 𝜀
f → ( e ) | x | y
![Page 63: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/63.jpg)
Answer
follow( e' ) = { $, ')' }
follow( e ) = { $, ')' }
follow( t' ) = { '+', $, ')' }
follow( t ) = { '+', $, ')' }
follow( f ) = { '*', '+', ')', $ }
![Page 64: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/64.jpg)
Predictive parsing table
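For each non-terminal n, a parse table T defines which production of n should be chosen, based on the next input symbol a. Rows are non-terminals, columns are terminals, and each cell holds a production.
| | ( | + | ... |
| --- | --- | --- | --- |
| e | e → ( e r | | |
| r | | r → + e ) | |
| v | | | |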
![Page 65: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/65.jpg)
Predictive parsing table
    for each production n → 𝛼
      for each a ∊ first(𝛼)
        add n → 𝛼 to T[n , a]
      if 𝜀 ∊ first(𝛼) then
        for each b ∊ follow(n)
          add n → 𝛼 to T[n , b]
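The table-filling loop can be sketched in C for the running example after left-recursion removal. The sketch takes the first and follow sets of the non-terminals as given. The one-character encoding (E, A, T, B, F for e, e', t, t', f and '#' for 𝜀), the `conflicts` counter and all function names are assumptions made for this illustration.

```c
#define EPS '#'   /* stands for epsilon in this sketch */

const char *prods[] = { "E:TA", "A:+TA", "A:", "T:FB", "B:*FB", "B:",
                        "F:(E)", "F:x", "F:y" };
#define NPRODS ((int)(sizeof prods / sizeof prods[0]))

int table[128][128];   /* table[n][a]: index into prods, or -1 if empty */
int conflicts;         /* number of cells that received a second production */

int is_nonterminal(char c) { return c >= 'A' && c <= 'Z'; }

/* first and follow sets of the non-terminals, assumed already known. */
const char *first_of(char n) {
    switch (n) {
    case 'A': return "+#";
    case 'B': return "*#";
    default:  return "(xy";    /* E, T and F */
    }
}
const char *follow_of(char n) {
    switch (n) {
    case 'E': case 'A': return "$)";
    case 'T': case 'B': return "+$)";
    default:            return "*+)$";   /* F */
    }
}

/* Add production p to T[n, a], noting a conflict if the cell is taken. */
void put(char n, char a, int p) {
    int *cell = &table[(int)n][(int)a];
    if (*cell != -1 && *cell != p) conflicts++;
    *cell = p;
}

void build_table(void) {
    for (int n = 0; n < 128; n++)
        for (int a = 0; a < 128; a++)
            table[n][a] = -1;
    conflicts = 0;
    for (int p = 0; p < NPRODS; p++) {
        char n = prods[p][0];
        const char *rhs = prods[p] + 2;
        int nullable = 1;             /* does epsilon belong to first(rhs)? */
        for (; *rhs != '\0' && nullable; rhs++) {
            if (!is_nonterminal(*rhs)) {
                put(n, *rhs, p);
                nullable = 0;
            } else {
                nullable = 0;
                for (const char *f = first_of(*rhs); *f != '\0'; f++) {
                    if (*f == EPS) nullable = 1;
                    else put(n, *f, p);
                }
            }
        }
        if (nullable)                 /* epsilon case: use follow(n) */
            for (const char *b = follow_of(n); *b != '\0'; b++)
                put(n, *b, p);
    }
}
```

After `build_table()`, `table['A']['+']` holds the index of e' → + t e', and `table['A'][')']` holds the index of e' → 𝜀. A value of zero in `conflicts` means that no cell received two different productions.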
![Page 66: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/66.jpg)
Exercise 9
Construct a predictive parsing table for the following grammar.
e → t e'
e' → + t e' | 𝜀
t → f t'
t' → * f t' | 𝜀
f → ( e ) | x | y
![Page 67: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/67.jpg)
LL(1) grammars
If each cell in the parse table contains at most one entry, then a non-backtracking parser can be constructed and the grammar is said to be LL(1).
First L: left-to-right scanning of the input.
Second L: a leftmost derivation is constructed.
The (1): using one input symbol of look-ahead to decide which grammar production to choose.
![Page 68: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/68.jpg)
Exercise 10
Write a syntax checker for the grammar of Exercise 9, utilising the predictive parsing table.
int e() { ... }
It should return a non-zero value if some prefix of the string pointed to by next conforms to the grammar, otherwise it should return zero.
![Page 69: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/69.jpg)
Answer (part 1)
int e1(), t(), t1(), f();  /* forward declarations for the mutually recursive functions */

int e() {
    if (*next == 'x') return t() && e1();
    if (*next == 'y') return t() && e1();
    if (*next == '(') return t() && e1();
    return 0;
}

int e1() {
    if (*next == '+') return eat('+') && t() && e1();
    if (*next == ')') return 1;
    if (*next == '\0') return 1;
    return 0;
}
![Page 70: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/70.jpg)
Answer (part 2)
int t() {
    if (*next == 'x') return f() && t1();
    if (*next == 'y') return f() && t1();
    if (*next == '(') return f() && t1();
    return 0;
}

int t1() {
    if (*next == '+') return 1;
    if (*next == '*') return eat('*') && f() && t1();
    if (*next == ')') return 1;
    if (*next == '\0') return 1;
    return 0;
}
![Page 71: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/71.jpg)
Answer (part 3)
int f() {
    if (*next == 'x') return eat('x');
    if (*next == 'y') return eat('y');
    if (*next == '(') return eat('(') && e() && eat(')');
    return 0;
}
(Notice how backtracking is not required.)
![Page 72: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/72.jpg)
Predictive parsing algorithm
Let s be a stack, initially containing the end marker $ with the start symbol of the grammar on top of it, and let next point to the input string (terminated by $).
while (top(s) != $)
    if (top(s) is a terminal) {
        if (top(s) == *next) { pop(s); next++; }
        else error();
    }
    else if (T[top(s), *next] == X → Y1⋯Yn) {
        pop(s);
        push(s, Yn⋯Y1);  /* Y1 on top */
    }
    else error();  /* empty table entry */
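A minimal C sketch of this loop for the expression grammar of Exercise 9 is given below. Several details are assumptions made for illustration: non-terminals are encoded as the characters 'E', 'P', 'T', 'Q', 'F' (for e, e', t, t', f), the table is written as a hypothetical `lookup` function rather than an array, the terminating '\0' of the input plays the role of $, the input contains no spaces, and the stack has a fixed size. Unlike the recursive checker of Exercise 10, this version accepts only if the whole input is consumed.

```c
#include <assert.h>
#include <string.h>

/* Membership test that never matches the string terminator. */
static int in(const char *set, char a) {
    return a != '\0' && strchr(set, a) != NULL;
}

/* T[n, a]: the right-hand side to expand n with on look-ahead a,
   or NULL for an empty cell. "" encodes epsilon. */
static const char *lookup(char n, char a) {
    switch (n) {
    case 'E': return in("(xy", a) ? "TP" : NULL;
    case 'P': if (a == '+') return "+TP";
              return (a == ')' || a == '\0') ? "" : NULL;
    case 'T': return in("(xy", a) ? "FQ" : NULL;
    case 'Q': if (a == '*') return "*FQ";
              return (a == '+' || a == ')' || a == '\0') ? "" : NULL;
    case 'F': if (a == '(') return "(E)";
              if (a == 'x') return "x";
              if (a == 'y') return "y";
              return NULL;
    }
    return NULL;
}

/* Returns 1 if the whole input can be derived from e, 0 otherwise. */
int parse(const char *next) {
    char stack[256];
    int top = 0;
    stack[top++] = '$';  /* bottom-of-stack marker */
    stack[top++] = 'E';  /* start symbol */
    while (stack[top - 1] != '$') {
        char x = stack[top - 1];
        if (!in("EPTQF", x)) {             /* top(s) is a terminal */
            if (x != *next) return 0;      /* mismatch: error */
            top--;
            next++;
        } else {
            const char *rhs = lookup(x, *next);
            size_t n;
            if (rhs == NULL) return 0;     /* empty table entry: error */
            top--;
            n = strlen(rhs);
            assert(top + n <= sizeof stack);         /* sketch: fixed-size stack */
            while (n > 0) stack[top++] = rhs[--n];   /* push reversed, Y1 on top */
        }
    }
    return *next == '\0';  /* accept only if the input is exhausted */
}
```

For example, `parse("x+x*y")` returns 1, whereas `parse("x+")` and `parse("x)")` return 0.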
![Page 73: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/73.jpg)
Exercise 11
Give the steps that a predictive parser takes to parse the following input.
x + x * y
For each step (loop iteration), show the input stream, the stack, and the parser action.
![Page 74: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/74.jpg)
Acknowledgements
Plus Stanford University lecture notes by Maggie Johnson and Julie Zelenski.
![Page 75: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/75.jpg)
APPENDIX
![Page 76: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/76.jpg)
Context-free grammars
Have four components:
1. A set of terminal symbols.
2. A set of non-terminal symbols.
3. A set of productions (or rules) of the form:
n → X1⋯Xn
where n is a non-terminal and X1⋯Xn is any sequence of terminals, non-terminals, and 𝜀.
4. The start symbol (one of the non-terminals).
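One possible way to hold these four components in a C program is sketched below. The struct and function names are hypothetical, and the sketch assumes every symbol is a single character, with the empty string standing for 𝜀. The `well_formed` function simply checks the constraints stated in the definition.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical representation: every symbol is a single character. */
struct production {
    char lhs;         /* the non-terminal n */
    const char *rhs;  /* X1...Xn; the empty string encodes epsilon */
};

struct grammar {
    const char *terminals;           /* 1. terminal symbols */
    const char *nonterminals;        /* 2. non-terminal symbols */
    const struct production *prods;  /* 3. productions */
    size_t nprods;
    char start;                      /* 4. start symbol */
};

static int member(const char *set, char c) {
    assert(set != NULL);
    return c != '\0' && strchr(set, c) != NULL;
}

/* Returns 1 if the start symbol and every left-hand side are non-terminals
   and every right-hand side uses only declared symbols, 0 otherwise. */
int well_formed(const struct grammar *g) {
    size_t i;
    const char *x;
    if (!member(g->nonterminals, g->start)) return 0;
    for (i = 0; i < g->nprods; i++) {
        if (!member(g->nonterminals, g->prods[i].lhs)) return 0;
        for (x = g->prods[i].rhs; *x; x++)
            if (!member(g->terminals, *x) && !member(g->nonterminals, *x))
                return 0;
    }
    return 1;
}

/* Example: e -> x | e + e, with 'E' standing for e. */
static const struct production example_prods[] = { {'E', "x"}, {'E', "E+E"} };
static const struct grammar example = { "x+", "E", example_prods, 2, 'E' };
```

Here `well_formed(&example)` returns 1; a production mentioning an undeclared symbol would make it return 0.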
![Page 77: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/77.jpg)
Notation
Non-terminals are underlined.
Rather than writing
e → x
e → e + e
we may write:
e → x | e + e
(Also, the symbols → and ::= will be used interchangeably.)
![Page 78: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/78.jpg)
Why context-free?
Regular ⊂ Context Free ⊂ Context Sensitive ⊂ Unrestricted
Context-free grammars offer a nice balance between expressive power and efficiency of parsing.
![Page 79: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/79.jpg)
Chomsky hierarchy
| Grammar | Valid productions |
|---|---|
| Unrestricted | 𝛼 → 𝛽 |
| Context-Sensitive | 𝛼 x γ → 𝛼 𝛽 γ |
| Context-Free | x → 𝛽 |
| Regular | x → t, x → t z, x → 𝜀 |
Let t range over terminals, x and z over non-terminals, and 𝛼, 𝛽 and γ over sequences of terminals, non-terminals, and 𝜀.
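As a rough illustration, the forms in the table can be checked mechanically for a single production. The sketch below is hypothetical throughout: it assumes upper-case letters are non-terminals, every other character is a terminal, and the empty string stands for 𝜀. It follows the table literally, so 𝛽 may be empty, and it reports the most restrictive row that the production matches. The class of a whole grammar depends on all of its productions.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum grammar_class { REGULAR, CONTEXT_FREE, CONTEXT_SENSITIVE, UNRESTRICTED };

/* Assumed convention: upper-case letters are non-terminals,
   every other character is a terminal. */
static int is_nt(char c) { return isupper((unsigned char)c) != 0; }
static int is_t(char c)  { return c != '\0' && !is_nt(c); }

/* Most restrictive row of the table whose form lhs -> rhs matches. */
enum grammar_class classify(const char *lhs, const char *rhs) {
    size_t ll = strlen(lhs), rl = strlen(rhs), i;
    assert(ll > 0);  /* a production needs a non-empty left-hand side */
    if (ll == 1 && is_nt(lhs[0])) {
        if (rl == 0) return REGULAR;                                   /* x -> epsilon */
        if (rl == 1 && is_t(rhs[0])) return REGULAR;                   /* x -> t */
        if (rl == 2 && is_t(rhs[0]) && is_nt(rhs[1])) return REGULAR;  /* x -> t z */
        return CONTEXT_FREE;                                           /* x -> beta */
    }
    /* Look for a split lhs = alpha x gamma with rhs = alpha beta gamma. */
    for (i = 0; i < ll; i++) {
        size_t gl = ll - i - 1;  /* length of gamma */
        if (is_nt(lhs[i]) && rl >= i + gl &&
            strncmp(lhs, rhs, i) == 0 &&
            strcmp(lhs + i + 1, rhs + rl - gl) == 0)
            return CONTEXT_SENSITIVE;
    }
    return UNRESTRICTED;
}
```

For instance, `classify("E", "E+E")` yields `CONTEXT_FREE`, while `classify("S", "aS")` yields `REGULAR`.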
![Page 80: Lexical and Syntax Analysis · Lexical and Syntax Analysis Top-Down Parsing . Data structure Easy for programs to transform String of characters Easy for humans to write and understand](https://reader035.vdocuments.net/reader035/viewer/2022063008/5fbe6bec945744342233ac57/html5/thumbnails/80.jpg)
Backus-Naur Form
BNF is a standard ASCII notation for specification of context-free grammars whose terminals are ASCII characters. For example:
<exp> ::= <exp> "+" <exp> | <exp> "-" <exp> | <var>
<var> ::= "x" | "y"
The BNF notation can itself be specified in BNF.