midterm exam advice and hints
DESCRIPTION
Midterm Exam Advice and Hints. Prof. Steven A. Demurjian, Sr. Computer Science & Engineering Department The University of Connecticut 191 Auditorium Road, Box U-155 Storrs, CT 06269-3155. [email protected] http://www.engr.uconn.edu/~steve (860) 486 - 4818. Dr. Robert LaBarre - PowerPoint PPT PresentationTRANSCRIPT
MTE.1
CSE4100
Midterm Exam Advice and HintsMidterm Exam Advice and Hints
Prof. Steven A. Demurjian, Sr. Computer Science & Engineering Department
The University of Connecticut191 Auditorium Road, Box U-155
Storrs, CT [email protected]
http://www.engr.uconn.edu/~steve(860) 486 - 4818
Dr. Robert LaBarreUnited Technologies Research Center
411 Silver LaneE. Hartford, CT 06018
MTE.2
CSE4100
Core MaterialCore Material
Chapter 1: Introduction to CompilersChapter 1: Introduction to Compilers Basic Compiler Ideas and Concepts The “Big Picture”
Chapter 2: A Simple One-Pass CompilerChapter 2: A Simple One-Pass Compiler A Look at All Phases of Compilation Process From Lexical Analysis Thru Code Generation
FOCUS: Chapter 3: Lexical AnalysisFOCUS: Chapter 3: Lexical Analysis Specifying/Recognizing Tokens Patterns (Regular Expressions) and Lexemes Regular Expressions and DFA/NFA Algorithms for
Regular Expression to NFA NFA to DFA
MTE.3
CSE4100
Core MaterialCore Material
FOCUS Chapter 4: Syntax Analysis FOCUS Chapter 4: Syntax Analysis Context Free Grammar: Defs and Concepts
Derivations, Specification, Languages Writing Grammars Ambiguity, Left Recursion, Left Factoring,
Removing epsilon Moves Algorithm for Left Recursion Removal
Top-Down Parsing Recursive Descent and Predictive Parsing First and Follow Calculation Constructing LL(1) Parsing Table Ambiguity and Error Handling
Lex and Yacc will not be Tested!Lex and Yacc will not be Tested!
MTE.4
CSE4100
Hints for Taking ExamHints for Taking Exam
Read the Questions Carefully!Read the Questions Carefully! Ask Questions if you are Confused!Ask Questions if you are Confused! Answer Questions in Any OrderAnswer Questions in Any Order
Organized to fit on minimum number of pages Answer “Easiest” questions for you!
Assess Points per Time UnitAssess Points per Time Unit 75 minutes = 75 points 30 minutes = 30 points; 20 minutes = 20 points
Don't Be Afraid to Not Answer a QuestionDon't Be Afraid to Not Answer a Question 60% Correct for 100 Points = 60 Points 90% Correct For 80 Points = 72 Points
Partial Credit is the NormPartial Credit is the Norm
MTE.5
CSE4100
Possible QuestionsPossible Questions
Open Notes and Open BookOpen Notes and Open Book 5 to 6 Total Multi-Part Questions5 to 6 Total Multi-Part Questions Possibilities… Possibilities…
Constructive and Algorithm Questions Writing and Using Grammar Understanding Significance and Relevance of
Concepts Know your Algorithms and Constructs Know your Algorithms and Constructs
(Regular Expressions, NFA, DFA, CFG)(Regular Expressions, NFA, DFA, CFG) Show All Work to Receive Partial (Any) CreditShow All Work to Receive Partial (Any) Credit Do Not Jump to Final AnswerDo Not Jump to Final Answer Avoid Run-on ExplanationsAvoid Run-on Explanations
MTE.6
CSE4100
Chapter 3 Excerpted MaterialChapter 3 Excerpted Material Introducing Basic Terminology
Token Sample Lexemes Informal Description of Pattern
const
if
relation
id
num
literal
const
if
<, <=, =, < >, >, >=
pi, count, D2
3.1416, 0, 6.02E23
“core dumped”
const
if
< or <= or = or < > or >= or >
letter followed by letters and digits
any numeric constant
any characters between “ and “ except “
Classifies Pattern
Actual values are critical. Info is :
1. Stored in symbol table2. Returned to parser
MTE.7
CSE4100
Language ConceptsLanguage Concepts
A language, L, is simply any set of strings over a fixed alphabet.
Alphabet Languages
{0,1} {0,10,100,1000,100000…}
{0,1,00,11,000,111,…}
{a,b,c} {abc,aabbcc,aaabbbccc,…}
{A, … ,Z} {TEE,FORE,BALL,…}
{FOR,WHILE,GOTO,…}
{A,…,Z,a,…,z,0,…9, { All legal PASCAL progs}
+,-,…,<,>,…} { All grammatically correct
English sentences }
Special Languages: - EMPTY LANGUAGE
- contains string only
MTE.8
CSE4100
Formal Language Operations
OPERATION DEFINITION
union of L and M written L M
concatenation of L and M written LM
Kleene closure of L written L*
positive closure of L written L+
L M = {s | s is in L or s is in M}
LM = {st | s is in L and t is in M}
L+=
0i
iL
L* denotes “zero or more concatenations of “ L
L*=
1i
iL
L+ denotes “one or more concatenations of “ L
MTE.9
CSE4100
Formal Language OperationsExamples
L = {A, B, C, D } D = {1, 2, 3}
L D = {A, B, C, D, 1, 2, 3 }
LD = {A1, A2, A3, B1, B2, B3, C1, C2, C3, D1, D2, D3 }
L2 = { AA, AB, AC, AD, BA, BB, BC, BD, CA, … DD}
L4 = L2 L2 = ??
L* = { All possible strings of L plus }
L+ = L* -
L (L D ) = ??
L (L D )* = ??
MTE.10
CSE4100
A A Regular Expression Regular Expression is a Set of Rules / is a Set of Rules /
Techniques for Constructing Sequences of Techniques for Constructing Sequences of
Symbols (Strings) From an Alphabet.Symbols (Strings) From an Alphabet.
Let Let Be an Alphabet, r a Regular Expression Be an Alphabet, r a Regular Expression
Then L(r) is the Language That is Characterized Then L(r) is the Language That is Characterized
by the Rules of Rby the Rules of R
Language & Regular Expressions
MTE.11
CSE4100
precedence
Rules for Specifying Regular Expressions:Rules for Specifying Regular Expressions:
1. is a regular expression denoting { }
2. If a is in , a is a regular expression that denotes {a}
3. Let r and s be regular expressions with languages L(r) and L(s). Then
(a) (r) | (s) is a regular expression L(r) L(s)
(b) (r)(s) is a regular expression L(r) L(s)
(c) (r)* is a regular expression (L(r))*
(d) (r) is a regular expression L(r)
All are Left-Associative.
MTE.12
CSE4100
EXAMPLES of Regular Expressions
L = {A, B, C, D } D = {1, 2, 3}
A | B | C | D = L
(A | B | C | D ) (A | B | C | D ) = L2
(A | B | C | D )* = L*
(A | B | C | D ) ((A | B | C | D ) | ( 1 | 2 | 3 )) = L (L D)
MTE.13
CSE4100
Algebraic Properties of Algebraic Properties of Regular ExpressionsRegular Expressions
AXIOM DESCRIPTION
r | s = s | r
r | (s | t) = (r | s) | t
(r s) t = r (s t)
r = rr = r
r* = ( r | )*
r ( s | t ) = r s | r t( s | t ) r = s r | t r
r** = r*
| is commutative
| is associative
concatenation is associative
concatenation distributes over |
relation between * and
Is the identity element for concatenation
* is idempotent
MTE.14
CSE4100
Automata & Language TheoryAutomata & Language Theory
TerminologyTerminology FSA
A recognizer that takes an input string and determines whether it’s a valid string of the language.
Non-Deterministic FSA (NFA) Has several alternative actions for the same input symbol
Deterministic FSA (DFA) Has at most 1 action for any given input symbol
Bottom LineBottom Line expressive power(NFA) == expressive power(DFA) Conversion can be automated
MTE.15
CSE4100
Finite Automata & Language TheoryFinite Automata & Language Theory
Finite Automata : A recognizer that takes an input string & determines whether it’s a valid sentence of the language
Non-Deterministic : Has more than one alternative action for the same input symbol. Can’t utilize algorithm !
Deterministic : Has at most one action for a given input symbol.
Both types are used to recognize regular expressions.
MTE.16
CSE4100
NFAs & DFAsNFAs & DFAs
Non-Deterministic Finite Automata (NFAs) easily represent regular expression, but are somewhat less precise.
Deterministic Finite Automata (DFAs) require more complexity to represent regular expressions, but offer more precision.
We’ll discuss both plus conversion algorithms, i.e., NFA DFA and DFA NFA
MTE.17
CSE4100
Non-Deterministic Finite AutomataNon-Deterministic Finite Automata
An NFA is a mathematical model that consists of :
• S, a set of states
• , the symbols of the input alphabet
• move, a transition function.
• move(state, symbol) state
• move : S S
• A state, s0 S, the start state
• F S, a set of final or accepting states.
MTE.18
CSE4100
Example NFAExample NFA
S = { 0, 1, 2, 3 }
s0 = 0
F = { 3 }
= { a, b }
start0 3b21 ba
a
b
What Language is defined ?
What is the Transition Table ?
state
i n p u t
0
1
2
a b
{ 0, 1 }
-- { 2 }
-- { 3 }
{ 0 }
(null) moves possible
ji
Switch state but do not use any input symbol
MTE.19
CSE4100
Epsilon-TransitionsEpsilon-Transitions
Given the regular expression: (a (b*c)) | (a (b |c+)?)Given the regular expression: (a (b*c)) | (a (b |c+)?) Find a transition diagram NFA that recognizes
it. Solution ?Solution ?
MTE.20
CSE4100
Deterministic Finite Automata Deterministic Finite Automata
A DFA is an NFA with the following restrictions:
• moves are not allowed
• For every state s S, there is one and only one path from s for every input symbol a .
Since transition tables don’t have any alternative options, DFAs are easily simulated via an algorithm.
s s0
c nextchar;while c eof do s move(s,c); c nextchar;end;if s is in F then return “yes” else return “no”
MTE.21
CSE4100
Example - DFAExample - DFA
start0 3b21 ba
a
b
start0 3b21 ba
b
ab
aa
What Language is Accepted?
Recall the original NFA:
MTE.22
CSE4100
Regular Expression to NFA ConstructionRegular Expression to NFA Construction
We now focus on transforming a Reg. Expr. to an NFA
This construction allows us to take:
• Regular Expressions (which describe tokens)
• To an NFA (to characterize language)
• To a DFA (which can be computerized)
The construction process is componentwise
Builds NFA from components of the regular expression in a special order with particular techniques.
NOTE: Construction is syntax-directed translation, i.e., syntax of regular expression is determining factor for NFA construction and structure.
MTE.23
CSE4100
Motivation: Construct NFA For:Motivation: Construct NFA For:
start i f
astart 0 1
b A Bastart 0 1
bstart A B
:
a :
b:
ab:
| ab :
a*
( | ab )* :
MTE.24
CSE4100
Construction Algorithm : R.E. Construction Algorithm : R.E. NFA NFA
Construction Process :
1st : Identify subexpressions of the regular expression
symbols
r | s
rs
r*
2nd : Characterize “pieces” of NFA for each subexpression
MTE.25
CSE4100
Piecing Together NFAsPiecing Together NFAs
2. For a in the regular expression, construct NFA
astart i f L(a)
1. For in the regular expression, construct NFA
L()start i f
MTE.26
CSE4100
Piecing Together NFAs – continued(1)Piecing Together NFAs – continued(1)
where i and f are new start / final states, and -moves are introduced from i to the old start states of N(s) and N(t) as well as from all of their final states to f.
3.(a) If s, t are regular expressions, N(s), N(t) their NFAs s|t has NFA:
start i f
N(s)
N(t)
L(s) L(t)
MTE.27
CSE4100
Piecing Together NFAs – continued(2)Piecing Together NFAs – continued(2)
3.(b) If s, t are regular expressions, N(s), N(t) their NFAs st (concatenation) has NFA:
starti fN(s) N(t) L(s) L(t)
Alternative:
overlap
N(s)start i fN(t)
where i is the start state of N(s) (or new under the alternative) and f is the final state of N(t) (or new). Overlap maps final states of N(s) to start state of N(t).
MTE.28
CSE4100
Piecing Together NFAs – continued(3)Piecing Together NFAs – continued(3)
fN(s)start i
where : i is new start state and f is new final state
-move i to f (to accept null string)
-moves i to old start, old final(s) to f
-move old final to old start (WHY?)
3.(c) If s is a regular expressions, N(s) its NFA, s* (Kleene star) has NFA:
MTE.29
CSE4100
Properties of Construction Properties of Construction
1. N(r) has at most 2*(#symbols + #operators) of r
2. N(r) has exactly one start and one accepting state
3. Each state of N(r) has at most one outgoing edge
a and at most two outgoing ’s
4. BE CAREFUL to assign unique names to all states !
Let r be a regular expression, with NFA N(r), then
MTE.30
CSE4100
Detailed ExampleDetailed Example
r13
r12r5
r3 r11r4
r9
r10
r8r7
r6
r0
r1 r2
b
*c
a a
|
( )
b
|
*
c
See example 3.16 in textbook for (a | b)*abb2nd Example - (ab*c) | (a(b|c*))
Parse Tree for this regular expression:
What is the NFA? Let’s construct it !
MTE.31
CSE4100
Detailed Example – Construction(1)Detailed Example – Construction(1)
r3: a
r0: b
r2: c
b
r1:
r4 : r1 r2b
c
r5 : r3 r4
b
a c
MTE.32
CSE4100
Detailed Example – Construction(2)Detailed Example – Construction(2)
r11: a
r7: b
r6: c
c
r9 : r7 | r8
b
r10 : r9
c
r8:
c
r12 : r11 r10
b
a
MTE.33
CSE4100
Detailed Example – Final StepDetailed Example – Final Step
r13 : r5 | r12
b
a c
c
b
a
1
6543
8
2
10
9 12 13 14
11
15
7
16
17
MTE.34
CSE4100
Conversion : NFA Conversion : NFA DFA Algorithm DFA Algorithm
• Algorithm Constructs a Transition Table for DFA from NFA
• Each state in DFA corresponds to a SET of states of the NFA
• Why does this occur ?
• moves
• non-determinism
Both require us to characterize multiple situations that occur for accepting the same string.
(Recall : Same input can have multiple paths in NFA)
• Key Issue : Reconciling AMBIGUITY !
MTE.35
CSE4100
Converting NFA to DFA – 1Converting NFA to DFA – 1stst Look Look
0 85
4
7
3
6
2
1
ba
c
From State 0, Where can we move without consuming any input ?
This forms a new state: 0,1,2,6,8 What transitions are defined for this new state ?
MTE.36
CSE4100
The Resulting DFAThe Resulting DFA
Which States are FINAL States ?
1, 2, 5, 6, 7, 81, 2, 4, 5, 6, 8
0, 1, 2, 6, 8 3
c
ba
a
a
c
c
DC
AB
c
baa
a
c
c
How do we handle alphabet symbols not defined for A, B, C, D ?
MTE.37
CSE4100
Algorithm ConceptsAlgorithm Concepts
NFA N = ( S, , s0, F, MOVE )
-Closure(S) : s S
: set of states in S that are reachable
from s via -moves of N that originate
from s.
-Closure of T : T S
: NFA states reachable from all t T
on -moves only.
move(T,a) : T S, a : Set of states to which there is a
transition on input a from some t T
These 3 operations are utilized by algorithms / techniques to facilitate the conversion process.
No input is consumed
MTE.38
CSE4100
Illustrating Conversion – An ExampleIllustrating Conversion – An Example
First we calculate: -closure(0) (i.e., state 0)
-closure(0) = {0, 1, 2, 4, 7} (all states reachable from 0 on -moves)Let A={0, 1, 2, 4, 7} be a state of new DFA, D.
0 1
2 3
54
6 7 8 9
10
a
a
b
b
b
start
Start with NFA: (a | b)*abb
MTE.39
CSE4100
Chapter 4 Excerpted MaterialChapter 4 Excerpted MaterialContext Free GrammarsContext Free Grammars
Definition: A Context Free Grammar, CFG, is described by T, NT, S, PR, where:
T: Terminals / tokens of the language
NT: Non-terminals to denote sets of strings generatable by the grammar & in the language
S: Start symbol, SNT, which defines all strings of the language
PR: Production rules to indicate how T and NT are combines to generate valid strings of the language.
PR: NT (T | NT)*
Like a Regular Expression / DFA / NFA, a Context Free Grammar is a mathematical model !
MTE.40
CSE4100
Context Free Grammars : A First LookContext Free Grammars : A First Look
assign_stmt id := expr ;
expr term operator term
term id
term real
term integer
operator +
operator -
What do “BLUE” symbols represent?
Derivation: A sequence of grammar rule applications and substitutions that transform a starting non-term into a collection of terminals / tokens.
Simply stated: Grammars / production rules allow us to “rewrite” and “identify” correct syntax.
What do “BLACK” symbols represent?
MTE.41
CSE4100
How is Grammar Used ?How is Grammar Used ?
Given the rules on the previous slide, suppose
id := real + int;
is input. Is it syntactically correct? How do we know?
expr is represented as: expr term operator term
Is this accurate / complete?
expr expr operator term
expr term
How does this affect the derivations which are possible?
MTE.42
CSE4100
Grammar ConceptsGrammar Concepts
A step in a derivation is zero or one action that replaces a NT with the RHS of a production rule.
EXAMPLE: E -E (the means “derives” in one step) using the production rule: E -E
EXAMPLE: E E A E E * E E * ( E )
DEFINITION: derives in one step
derives in one step
derives in zero steps
+
*
EXAMPLES: A if A is a production rule
1 2 … n 1 n ; for all
If and then
* *
**
MTE.43
CSE4100
lm
Leftmost and Rightmost DerivationsLeftmost and Rightmost Derivations
Leftmost: Replace the leftmost non-terminal symbol
E E A E id A E id * E id * id
Rightmost: Replace the leftmost non-terminal symbol
E E A E E A id E * id id * idrm
lmlmlmlm
rmrmrm
Important Notes: A
If A , what’s true about ?
If A , what’s true about ?rm
Derivations: Actions to parse input can be represented pictorially in a parse tree.
MTE.44
CSE4100
Examples of LM / RM DerivationsExamples of LM / RM Derivations
E E A E | ( E ) | -E | id
A + | - | * | / |
A leftmost derivation of : id + id * id
A rightmost derivation of : id + id * id
MTE.45
CSE4100
Derivations & Parse TreeDerivations & Parse Tree
E
EE A
E
EE A
*
E
EE A
id *
E
EE A
id id*
E * E
E E A E
id * E
id * id
MTE.46
CSE4100
Parse Trees and DerivationsParse Trees and Derivations
Consider the expression grammar:
E E+E | E*E | (E) | -E | id
Leftmost derivations of id + id * id
E E + E E + E id + E
E
EE +
id
E
EE *
id + E id + E * E
E
EE +
id
E
EE +
MTE.47
CSE4100
Removing AmbiguityRemoving Ambiguity
Take Original Grammar:
stmt if expr then stmt
| if expr then stmt else stmt
| other (any other statement)
Revise to remove ambiguity:
stmt matched_stmt | unmatched_stmt
matched_stmt if expr then matched_stmt else matched_stmt | other
unmatched_stmt if expr then stmt
| if expr then matched_stmt else unmatched_stmt
How does this grammar work ?
MTE.48
CSE4100
Resolving Difficulties : Left RecursionResolving Difficulties : Left Recursion
A left recursive grammar has rules that support the derivation : A A, for some .+
Top-Down parsing can’t reconcile this type of grammar, since it could consistently make choice which wouldn’t allow termination.
A A A A … etc. A A |
Take left recursive grammar:
A A | To the following:
A’ A’
A’ A’ |
MTE.49
CSE4100
Why is Left Recursion a Problem ?Why is Left Recursion a Problem ?
Consider: E E + T | T T T * F | F F ( E ) | id
Derive : id + id + id
E E + T
How can left recursion be removed ?
E E + T | T What does this generate?
E E + T T + T
E E + T E + T + T T + T + T
How does this build strings ?
What does each string have to start with ?
MTE.50
CSE4100
Resolving Difficulties : Left Recursion (2)Resolving Difficulties : Left Recursion (2)
Informal Discussion:
Take all productions for A and order as:
A A1 | A2 | … | Am | 1 | 2 | … | n
Where no i begins with A.
Now apply concepts of previous slide:
A 1A’ | 2A’ | … | nA’
A’ 1A’ | 2A’ | … | m A’ | For our example: E E + T | T
T T * F | F
F ( E ) | id
E TE’ E’ + TE’ | T FT’
T’ * FT’ | F ( E ) | id
MTE.51
CSE4100
Resolving Difficulties : Left Recursion (3)Resolving Difficulties : Left Recursion (3)
Problem: If left recursion is two-or-more levels deep, this isn’t enough
S Aa | b
A Ac | Sd | S Aa Sda
Algorithm:Input: Grammar G with no cycles or -productions
Output: An equivalent grammar with no left recursion
1. Arrange the non-terminals in some order A1,A2,…An
2. for i := 1 to n do begin
for j := 1 to i – 1 do begin
replace each production of the form Ai Aj
by the productions Ai 1 | 2 | … | k
where Aj 1|2|…|k are all current Aj productions; end
eliminate the immediate left recursion among Ai productions end
MTE.52
CSE4100
Using the AlgorithmUsing the Algorithm
Apply the algorithm to: A1 A2a | b
A2 A2c | A1d |
i = 1
For A1 there is no left recursion
i = 2
for j=1 to 1 do
Take productions: A2 A1 and replace with
A2 1 | 2 | … | k |
where A1 1 | 2 | … | k are A1 productions
in our case A2 A1d becomes A2 A2ad | bdWhat’s left: A1 A2a | b
A2 A2 c | A2 ad | bd | Are we done ?
MTE.53
CSE4100
Using the Algorithm (2)Using the Algorithm (2)
No ! We must still remove A2 left recursion !
A1 A2a | b
A2 A2 c | A2 ad | bd |
Recall:
A A1 | A2 | … | Am | 1 | 2 | … | n
A 1A’ | 2A’ | … | nA’
A’ 1A’ | 2A’ | … | m A’ |
Apply to above case. What do you get ?
MTE.54
CSE4100
Removing Difficulties : Removing Difficulties : -Moves-Moves
Very Simple: A and B uAv implies add B uv to the grammar G.
Why does this work ?
E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id
Examples:
A1 A2 a | b
A2 bd A2’ | A2’
A2’ c A2’ | bd A2’ |
MTE.55
CSE4100
Removing Difficulties : CyclesRemoving Difficulties : Cycles
How would cycles be removed ?
S SS | ( S ) |
Has: S SS S
S
MTE.56
CSE4100
Removing Difficulties : Left FactoringRemoving Difficulties : Left Factoring
Problem : Uncertain which of 2 rules to choose:
stmt if expr then stmt else stmt
| if expr then stmt
When do you know which one is valid ?
What’s the general form of stmt ?
A 1 | 2 : if expr then stmt
1: else stmt 2 :
Transform to:
A A’
A’ 1 | 2
EXAMPLE:
stmt if expr then stmt rest
rest else stmt |
MTE.57
CSE4100
Resolving Grammar Problems : Resolving Grammar Problems : Another LookAnother Look
• Forcing Matching Within Grammar
- if-then-else else must go with most recent if
stmt if expr then stmt | if expr then stmt else stmt | other
stmt matched_stmt | unmatched_stmt
matched_stmt if expr then matched_stmt else matched_stmt | other
unmatched_stmt if expr then stmt
| if expr then matched_stmt else unmatched_stmt
Leads to:
MTE.58
CSE4100
Resolving Grammar Problems : Resolving Grammar Problems : Another Look (2)Another Look (2)
• Left Factoring - Know correct choice in advance !
where_queries where are_does declarations of symbol
| where are_does “search_option” appear
Rules given in Assignment or
where_queries where rest_where
rest_where are declarations of symbol
| does “search_option” appear
MTE.59
CSE4100
Resolving Grammar Problems : Resolving Grammar Problems : Another Look (3)Another Look (3)
• Left Recursion - Avoid infinite loop when choosing rules
A A | A A’
A’ A’ | EXAMPLE:
E E + T | T
T T * F | F
F ( E ) | id
E TE’ E’ + TE’ | T FT’
T’ * FT’ | F ( E ) | id
If we use the original production rules on input: id + id (with leftmost):
E E + T E + T + T E + T + T + T T + T + T + … + T
If we use the transformed production rules on input: id + id (with leftmost):
E TE’ T + TE’ id + TE’ id + id E’ id + id
*
not assured
**
MTE.60
CSE4100
Overall Top-Down AlgorithmOverall Top-Down Algorithm
Data structuresData structures A stack Holds the symbols to treat A lookahead window to choose a prediction
AlgorithmAlgorithm Startup
Initialize stack to start symbol (a non-terminal) Initialize lookahead window at start of token stream
Recursive Process Find out if front of window and top of stack match If match
– Consume the symbols
No match– Pop / Select / Push
MTE.61
CSE4100
Lookahead Size and LanguagesLookahead Size and Languages
With k tokens of lookaheadWith k tokens of lookahead Some languages can be parsed (this way) Some languages cannot be parsed
What about k+1 tokens ?What about k+1 tokens ? More languages can be parsed.... So the set of languages recognized with k is a
subset of the set of languages recognized with k+1!
We have a hierarchy!
Still...Still... In practice LL(1) should be enough.
MTE.62
CSE4100
Top-Down ParsingTop-Down Parsing
Identify a leftmost derivation for an input string Why ?
By always replacing the leftmost non-terminal symbol via a production rule, we are guaranteed of developing a parse tree in a left-to-right fashion consistent with scanning the input.
A aBc adDc adec (scan a, scan d, scan e, scan c - accept!)
Recursive-descent parsing concepts Predictive parsing
Recursive / Brute force technique non-recursive / table driven
Error recovery Implementation
MTE.63
CSE4100
Non-Recursive / Table DrivenNon-Recursive / Table Driven
Empty stack symbol
a + b $
Y
X
$
Z
Input
Predictive Parsing Program
Stack Output
Parsing Table M[A,a]
(String + terminator)
NT + T symbols of CFG What actions parser
should take based on stack / input
General parser behavior: X : top of stack a : current input
1. When X=a = $ halt, accept, success
2. When X=a $ , POP X off stack, advance input, go to 1.
3. When X is a non-terminal, examine M[X,a]
if it is an error call recovery routineif M[X,a] = {X UVW}, POP X, PUSH W,V,UDO NOT expend any input
MTE.64
CSE4100
Algorithm for Non-Recursive ParsingAlgorithm for Non-Recursive Parsing
Set ip to point to the first symbol of w$;
repeat
let X be the top stack symbol and a the symbol pointed to by ip;
if X is terminal or $ then
if X=a then
pop X from the stack and advance ip
else error()
else /* X is a non-terminal */
if M[X,a] = XY1Y2…Yk then begin
pop X from stack;
push Yk, Yk-1, … , Y1 onto stack, with Y1 on top
output the production XY1Y2…Yk
end
else error()
until X=$ /* stack is empty */
Input pointer
May also execute other code based on the production used
MTE.65
CSE4100
ExampleExample
E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id
Our well-worn example !
Table M
Non-terminal
INPUT SYMBOL
id + * ( ) $
E
E’
T
T’
F
ETE’
TFT’
Fid
E’+TE’
T’ T’*FT’
F(E)
TFT’
ETE’
T’
E’ E’
T’
MTE.66
CSE4100
Trace of ExampleTrace of Example
Expend Input
$E
$E’T$E’T’F$E’T’id$E’T’$E’$E’T+$E’T$E’T’F$E’T’id$E’T’$E’T’F*$E’T’F$E’T’id$E’T’$E’$
id + id * id$
id + id * id$id + id * id$id + id * id$
+ id * id$+ id * id$+ id * id$
id * id$id * id$id * id$
* id$* id$
id$id$
$$$
E TE’T FT’F id
T’ E’ +TE’
T FT’F id
T’ *FT’
F id
T’ E’
STACK INPUT OUTPUT
MTE.67
CSE4100
Leftmost Derivation for the ExampleLeftmost Derivation for the Example
The leftmost derivation for the example is as follows:
E TE’ FT’E’ id T’E’ id E’ id + TE’ id + FT’E’
id + id T’E’ id + id * FT’E’ id + id * id T’E’
id + id * id E’ id + id * id
MTE.68
CSE4100
What’s the Missing Puzzle Piece ?What’s the Missing Puzzle Piece ?
Constructing the Parsing Table M !
1st : Calculate First & Follow for Grammar
2nd: Apply Construction Algorithm for Parsing Table
Conceptual Perspective:
First: Let be a string of grammar symbols. First() are the first terminals that can appear in in any possible derivation. NOTE: If , then is First( ).
Follow: Let A be a non-terminal. Follow(A) is the set of terminals that can appear directly to the right of A in some sentential form. (S Aa, for some and ). NOTE: If S A, then $ is Follow(A).
*
* *
MTE.69
CSE4100
Computing First(X) : Computing First(X) : All Grammar SymbolsAll Grammar Symbols
1. If X is a terminal, First(X) = {X}
2. If X is a production rule, add to First(X)
3. If X is a non-terminal, and X Y1Y2…Yk is a production rule
Place First(Y1) in First(X)
if Y1 , Place First(Y2) in First(X)
if Y2 , Place First(Y3) in First(X)
…
if Yk-1 , Place First(Yk) in First(X)
NOTE: As soon as Yi , Stop.
May repeat 1, 2, and 3, above for each Yj
*
*
*
*
MTE.70
CSE4100
Computing First(X) : Computing First(X) : All Grammar Symbols - continuedAll Grammar Symbols - continued
Informally, suppose we want to compute
First(X1 X2 … Xn ) = First (X1) “+”
First(X2) if is in First(X1) “+”
First(X3) if is in First(X2) “+”
…
First(Xn) if is in First(Xn-1)
Note 1: Only add to First(X1 X2 … Xn) if is in First(Xi) for all i
Note 2: For First(X1), if X1 Z1 Z2 … Zm , then we need to compute First(Z1 Z2 … Zm) !
MTE.71
CSE4100
ExampleExample
Computing First for: E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id
First(E)
First(TE’)
First(T)
First(T) “+” First(E’)
First(F)
First((E)) “+” First(id)
First(F) “+” First(T’)
“(“ and “id”
Not First(E’) since T
Not First(T’) since F
*
*
Overall: First(E) = { ( , id } = First(F)
First(E’) = { + , } First(T’) = { * , }
First(T) First(F) = { ( , id }
MTE.72
CSE4100
Example 2Example 2
Given the production rules:
S i E t SS’ | a
S’ eS |
E b
Verify that
First(S) = { i, a }
First(S’) = { e, }
First(E) = { b }
MTE.73
CSE4100
Computing Follow(A) : Computing Follow(A) : All Non-TerminalsAll Non-Terminals
1. Place $ in Follow(S), where S is the start symbol and $ signals end of input
2. If there is a production A B, then everything in First() is in Follow(B) except for .
3. If A B is a production, or A B and (First() contains ), then everything in Follow(A) is in Follow(B)
(Whatever followed A must follow B, since nothing follows B from the production rule)
*
We’ll calculate Follow for two grammars.
MTE.74
CSE4100
ExampleExample
Compute Follow for: E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id
• Follow(E) - contains $ since E is the start symbol. Also, since F (E) then First(“)”) is in Follow(E). Thus Follow(E) = { ) , $ }
• Follow(E’) : E TE’ implies Follow(E) is in Follow(E’), and Follow(E’) = { ) , $ }
• Follow(T) : E TE’ implies put in First(E’). Since E’ , put in Follow(E). Since E’ +TE’ , Put in First(E’), and since E’ , put in Follow(E’). Thus Follow(T) = { +, ), $ }.
• Follow(T’)
• Follow(F)You do these !
**
MTE.75
CSE4100
Computing Follow : 2Computing Follow : 2ndnd Example Example
S i E t SS’ | a
S’ eS |
E b
First(S) = { i, a }
First(S’) = { e, }
First(E) = { b }
Recall:
Follow(S) – Contains $, since S is start symbol
Since S i E t SS’ , put in First(S’) – not
Since S’ , Put in Follow(S)
Since S’ eS, put in Follow(S’) So…. Follow(S) = { e, $ }
Follow(S’) = Follow(S) HOW?
Follow(E) = { t }
*
MTE.76
CSE4100
Motivation Behind First & FollowMotivation Behind First & Follow
First:
Follow:
Is used to indicate the relationship between non-terminals (in the stack) and input symbols (in input stream)
Example: If A , and a is in First(), then when a=input, replace with .
( a is one of first symbols of , so when A is on the stack and a is input, POP A and PUSH .
Is used when First has a conflict, to resolve choices. When or , then what follows A dictates the next choice to be made.
Example: If A , and b is in Follow(A ), then when a , and if b is an input character, then we expand A with , which will eventually expand to , of which b follows!
( Above . Here First( ) contains .)
*
*
*
MTE.77
CSE4100
Constructing Parsing TableConstructing Parsing Table
Algorithm:
1. Repeat Steps 2 & 3 for each rule A
2. Terminal a in First()? Add A to M[A, a ]
3.1 in First()? Add A to M[A, a ] for all terminals b in Follow(A).
3.2 in First() and $ in Follow(A)? Add A to M[A, $ ]
4. All undefined entries are errors.
MTE.78
CSE4100
Constructing Parsing Table - ExampleConstructing Parsing Table - Example
E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id
First(E,F,T) = { (, id }
First(E’) = { +, }First(T’) = { *, }
Follow(E,E’) = { ), $}
Follow(F) = { *, +, ), }Follow(T,T’) = { +, ), }
Expression Example: E TE’ : First(TE’) = First(T) = { (, id }
M[E, ( ] : E TE’
M[E, id ] : E TE’
(by rule 2) E’ +TE’ : First(+TE’) = + : M[E’, +] : E’ +TE’
(by rule 3) E’ : in First( ) T’ : in First( )
by rule 2
M[E’, )] : E’ (3.1) M[T’, +] : T’ (3.1)
M[E’, $] : E’ (3.2) M[T’, )] : T’ (3.1)
(Due to Follow(E’) M[T’, $] : T’ (3.2)
MTE.79
CSE4100
Constructing Parsing Table – Example 2Constructing Parsing Table – Example 2
S i E t SS’ | a
S’ eS |
E b
First(S) = { i, a }
First(S’) = { e, }
First(E) = { b }
Follow(S) = { e, $ }
Follow(S’) = { e, $ }
Follow(E) = { t }
S i E t SS’ S a E b
First(i E t SS’)={i} First(a) = {a} First(b) = {b}
S’ eS S First(eS) = {e} First() = {} Follow(S’) = { e, $ }
INPUT SYMBOL
a $tie
S
S’
E
bNon-terminal
S a S iEtSS’
S
E b
S’ S’ eS
MTE.80
CSE4100
ExampleExample
Step 1Step 1 Compute
Follow First
T → F T’T’→ * F T’
→ F → ( E )
→ Id
S → E $E → T E’ E’→ + T E’
→
MTE.81
CSE4100
ExampleExample
Step 2Step 2 Build the parser table
Step 3Step 3 Input: Id + Id * Id $
Input SymbolsInput Symbols
NTNT IdId ++ ** (( )) $$
SS EE → → TE’TE’ EE → → TE’TE’
EE EE → → TE’TE’ EE → →TE’TE’
E’E’ E’E’ →+ →+TE’TE’ E’E’ → → E’E’ → → TT TT → → FT’FT’ TT → →FT’FT’
T’T’ T’T’ → → T’T’ → →**FT’FT’ T’T’ → → T’T’ → →
FF FF → → IdId FF → → ( (EE))
Pa
rse
r T
ab
le
T → F T’T’→ * F T’
→ F → ( E )
→ Id
S → E $E → T E’ E’→ + T E’
→
MTE.82
CSE4100
Motivation Behind First & FollowMotivation Behind First & Follow
First:
Follow:
Is used to indicate the relationship between non-terminals (in the stack) and input symbols (in input stream)
Example: If A , and a is in First(), then when a=input, replace with .
( a is one of first symbols of , so when A is on the stack and a is input, POP A and PUSH .
Is used when First has a conflict, to resolve choices. When or , then what follows A dictates the next choice to be made.
Example: If A , and b is in Follow(A ), then when a , and if b is an input character, then we expand A with , which will eventually expand to , of which b follows!
( Above . Here First( ) contains .)
*
*
*
MTE.83
CSE4100
Constructing Parsing TableConstructing Parsing Table
Algorithm:
1. Repeat Steps 2 & 3 for each rule A
2. Terminal a in First()? Add A to M[A, a ]
3.1 in First()? Add A to M[A, a ] for all terminals b in Follow(A).
3.2 in First() and $ in Follow(A)? Add A to M[A, $ ]
4. All undefined entries are errors.
MTE.84
CSE4100
Constructing Parsing Table - ExampleConstructing Parsing Table - Example
E TE’ E’ + TE’ | T FT’ T’ * FT’ | F ( E ) | id
First(E,F,T) = { (, id }
First(E’) = { +, }First(T’) = { *, }
Follow(E,E’) = { ), $}
Follow(F) = { *, +, ), }Follow(T,T’) = { +, ), }
Expression Example: E TE’ : First(TE’) = First(T) = { (, id }
M[E, ( ] : E TE’
M[E, id ] : E TE’
(by rule 2) E’ +TE’ : First(+TE’) = + : M[E’, +] : E’ +TE’
(by rule 3) E’ : in First( ) T’ : in First( )
by rule 2
M[E’, )] : E’ (3.1) M[T’, +] : T’ (3.1)
M[E’, $] : E’ (3.2) M[T’, )] : T’ (3.1)
(Due to Follow(E’) M[T’, $] : T’ (3.2)
MTE.85
CSE4100
Constructing Parsing Table – Example 2Constructing Parsing Table – Example 2
S i E t SS’ | a
S’ eS |
E b
First(S) = { i, a }
First(S’) = { e, }
First(E) = { b }
Follow(S) = { e, $ }
Follow(S’) = { e, $ }
Follow(E) = { t }
S i E t SS’ S a E b
First(i E t SS’)={i} First(a) = {a} First(b) = {b}
S’ eS S First(eS) = {e} First() = {} Follow(S’) = { e, $ }
INPUT SYMBOL
a $tie
S
S’
E
bNon-terminal
S a S iEtSS’
S
E b
S’ S’ eS
MTE.86
CSE4100
ExampleExample
Step 1Step 1 Compute
Follow First
T → F T’T’→ * F T’
→ F → ( E )
→ Id
S → E $E → T E’ E’→ + T E’
→
MTE.87
CSE4100
ExampleExample
Step 2Step 2 Build the parser table
Step 3Step 3 Input: Id + Id * Id $
Input SymbolsInput Symbols
NTNT IdId ++ ** (( )) $$
SS EE → → TE’TE’ EE → → TE’TE’
EE EE → → TE’TE’ EE → →TE’TE’
E’E’ E’E’ →+ →+TE’TE’ E’E’ → → E’E’ → → TT TT → → FT’FT’ TT → →FT’FT’
T’T’ T’T’ → → T’T’ → →**FT’FT’ T’T’ → → T’T’ → →
FF FF → → IdId FF → → ( (EE))
Pa
rse
r T
ab
le
T → F T’T’→ * F T’
→ F → ( E )
→ Id
S → E $E → T E’ E’→ + T E’
→
MTE.88
CSE4100
LL(1) GrammarsLL(1) Grammars
L : Scan input from Left to Right
L : Construct a Leftmost Derivation
1 : Use “1” input symbol as lookahead in conjunction with stack to decide on the parsing action
LL(1) grammars have no multiply-defined entries in the parsing table.
Properties of LL(1) grammars:
• Grammar can’t be ambiguous or left recursive• Grammar is LL(1) when A 1. & do not derive strings starting with the same terminal a 2. Either or can derive , but not both.
Note: It may not be possible for a grammar to be manipulated into an LL(1) grammar
MTE.89
CSE4100
Error RecoveryError Recovery
a + b $
Y
X
$
Z
Input
Predictive Parsing Program
Stack Output
Parsing Table M[A,a]
When Do Errors Occur? Recall Predictive Parser Function:
1. If X is a terminal and it doesn’t match input.
2. If M[ X, Input ] is empty – No allowable actions
Consider two recovery techniques:
A. Panic Mode
B. Phase-level Recovery
MTE.90
CSE4100
Panic Mode RecoveryPanic Mode Recovery
Augment parsing table with action that attempts to realign / synchronize token stream with the expected input.
Suppose : A on top of stack doesn’t mesh with current input symbol
1. Use Follow(A) to remove input tokens – sync (discard)
2. Use First(A) to determine when to restart parsing
3. Incorporate higher level language concepts (begin/end, while, repeat/until) to sync actions we don’t skip tokens unnecessarily.
Other actions:
4. When A , use it to manipulate stack to postpone error detection
5. Use non-matching terminal on stack as token that is inserted into input.
MTE.91
CSE4100
Revised Parsing Table / ExampleRevised Parsing Table / Example
synch
synch
synch
Non-terminal
INPUT SYMBOL
id + * ( ) $
E
E’
T
T’
F
ETE’
TFT’
Fid
E’+TE’
T’ T’*FT’
F(E)
TFT’
ETE’
T’
E’ E’
T’
synch
synch synch
synch
synch
synch
From Follow sets. Pop stack entry – T or NT
Skip input symbol
MTE.92
CSE4100
Skip & SynchSkip & Synch
MeaningMeaning Skip
Discard input symbol
SynchPop top of stack
MessagesMessages Constructed based on lookahead an non-terminal
ExampleExample NT = F Lookahead = + Expecting a FACTOR. Got + for a Term. So a
factor is missing.
MTE.93
CSE4100
Revised Parsing Table / Example(2)Revised Parsing Table / Example(2)
$E$E$E’T$E’T’F$E’T’id$E’T’$E’T’F*$E’T’F$E’T’$E’$E’T+$E’T$E’T’F$E’T’id$E’T’$E’$
) id * + id$
id * + id$ id * + id$ id * + id$
id * + id$ * + id$ * + id$
+ id$ + id$
+ id$ + id$
id$ id$ id$
$ $ $
STACK INPUT Remark
error, M[F,+] = synch
id is in First(E)
F has been popped
error, skip )
MTE.94
CSE4100
Phase-Level RecoveryPhase-Level Recovery
Fill in blanks entries of parsing table with error Fill in blanks entries of parsing table with error handling routineshandling routines
These routines These routines modify stack and / or input stream issue error message
Problems:Problems: Modifying stack has to be done with care, so as
to not create possibility of derivations that aren’t in language
Infinite loops must be avoided Can be used in conjunction with panic mode to Can be used in conjunction with panic mode to
have more complete error handlinghave more complete error handling