Parsing (Giuseppe Attardi, Università di Pisa)
Post on 20-Dec-2015
Parsing
Giuseppe Attardi
Università di Pisa
Parsing
Calculate grammatical structure of program, like diagramming sentences, where:
Tokens = “words”
Programs = “sentences”
For further information: Aho, Sethi, Ullman, “Compilers: Principles, Techniques, and Tools” (a.k.a. the “Dragon Book”)
Outline of coverage
Context-free grammars
Parsing
– Tabular Parsing Methods
– One pass
  • Top-down
  • Bottom-up
Yacc
Parser: extracts grammatical structure of program
[Figure: parse tree]
function-def
  name: main
  arguments
  stmt-list
    stmt
      expression
        expression: variable cout
        operator: <<
        expression: string “hello, world\n”
Context-free languages
Grammatical structure defined by context-free grammar
statement → labeled-statement | expression-statement | compound-statement
labeled-statement → ident : statement | case constant-expression : statement
compound-statement → { declaration-list statement-list }
[Figure labels: terminal, non-terminal]
“Context-free” = only one non-terminal in left-part
Parse trees
Parse tree = tree labeled with grammar symbols, such that:
If node is labeled A, and its children are labeled x₁...xₙ, then there is a production A → x₁...xₙ
“Parse tree from A” = root labeled with A
“Complete parse tree” = all leaves labeled with tokens
Parse trees and sentences
Frontier of tree = labels on leaves (in left-to-right order)
Frontier of tree from S is a sentential form
Frontier of a complete tree from S is a sentence
[Figure: parse tree in which L has children L ; E and the inner L derives E, then a; its “frontier” is a ; E]
Example
G: L → L ; E | E
   E → a | b
Syntax trees from start symbol (L):
[Figure: three syntax trees rooted at L, one for each sentential form below]
Sentential forms: a    a;E    a;b;b
Derivations
Alternate definition of sentence:
Given α, β in V*, say α ⇒ β is a derivation step if α = α′Aα″ and β = α′γα″, where A → γ is a production
α is a sentential form iff there exists a derivation (sequence of derivation steps) S ⇒ … ⇒ α (alternatively, we say that S ⇒* α)
Two definitions are equivalent, but note that there are many derivations corresponding to each parse tree
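The remark that many derivations correspond to one parse tree can be checked on a small case. The following Python sketch uses the grammar L → L ; E | E, E → a | b with one character per symbol; the names `GRAMMAR`, `steps` and `count_derivations` are invented for this illustration and are not part of the slides.

```python
# Illustrative sketch: derivation steps for a small grammar.
# Keys of GRAMMAR are the non-terminals; every other character is a terminal.
GRAMMAR = {"L": ["L;E", "E"], "E": ["a", "b"]}

def steps(form):
    """Return every sentential form reachable from `form` in one derivation step."""
    result = []
    for pos, symbol in enumerate(form):
        for rhs in GRAMMAR.get(symbol, []):
            # Replace the non-terminal at `pos` by one of its right-hand sides.
            result.append(form[:pos] + rhs + form[pos + 1:])
    return result

def count_derivations(form, target):
    """Count the distinct derivations (sequences of steps) from `form` to `target`."""
    if form == target:
        return 1
    if len(form) > len(target):   # no production here shrinks a form, so prune
        return 0
    return sum(count_derivations(nxt, target) for nxt in steps(form))

print(steps("L;E"))                   # ['L;E;E', 'E;E', 'L;a', 'L;b']
print(count_derivations("L", "a;b"))  # 3
```

The three derivations of a;b differ only in the order in which the non-terminals are expanded, so they all describe the same parse tree.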
Another example
H: L → E ; L | E
   E → a | b
[Figure: three syntax trees rooted at L, built from the productions of H]
Ambiguity
For some purposes, it is important to know whether a sentence can have more than one parse tree
A grammar is ambiguous if there is a sentence with more than one parse tree
Example: E → E+E | E*E | id
[Figure: two different parse trees for the same sentence, which contains three id operands, one + and one *]
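To make the definition concrete, the following Python sketch counts the parse trees of a token list under E → E+E | E*E | id. The function name `count_parse_trees` and the token spelling are assumptions made for this example only.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E+E | E*E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways to derive tokens[lo:hi] from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                # tokens[mid] is the operator at the root of this subtree.
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["id"]))                        # 1
print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2
```

A count above 1 for any sentence is enough to show that the grammar is ambiguous.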
Notes
If e then if b then d else f
{ int x; y = 0; }
A.b.c = d;
Id -> s | s.id
E -> E + T -> E + T + T -> T + T + T -> id + T + T -> id + T * id + T -> id + id * id + T -> id + id * id + id
Ambiguity
Ambiguity is a function of the grammar rather than the language
Certain ambiguous grammars may have equivalent unambiguous ones
Grammar Transformations
Grammars can be transformed without affecting the language generated
Three transformations are discussed next:
– Eliminating Ambiguity
– Eliminating Left Recursion (i.e. productions of the form A → Aα)
– Left Factoring
Eliminating Ambiguity
Sometimes an ambiguous grammar can be rewritten to eliminate ambiguity
For example, expressions involving additions and products can be written as follows:
E → E + T | T
T → T * id | id
The language generated by this grammar is the same as that generated by the grammar in slide “Ambiguity”. Both generate id(+id|*id)*
However, this grammar is not ambiguous
Eliminating Ambiguity (Cont.)
One advantage of this grammar is that it represents the precedence between operators. In the parsing tree, products appear nested within additions
[Figure: parse tree of an expression with one + and one *, in which the product is nested inside an operand of the addition]
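The nesting of products inside additions can be observed with a small hand-written parser for E → E + T | T, T → T * id | id. This Python sketch is only an illustration: it replaces the left-recursive productions by loops, and the names `parse_expression`, `parse_expr`, `parse_term` and `expect_id` are invented here.

```python
def parse_expression(tokens):
    """Build a tree for E -> E + T | T,  T -> T * id | id from a list of tokens."""
    pos = 0

    def expect_id():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "id":
            raise SyntaxError(f"expected id at position {pos}")
        pos += 1
        return "id"

    def parse_term():
        nonlocal pos
        tree = expect_id()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            tree = ("*", tree, expect_id())      # T -> T * id, left-associative
        return tree

    def parse_expr():
        nonlocal pos
        tree = parse_term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            tree = ("+", tree, parse_term())     # E -> E + T, left-associative
        return tree

    tree = parse_expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree

print(parse_expression(["id", "+", "id", "*", "id"]))
# ('+', 'id', ('*', 'id', 'id'))
```

Whichever side of the + the product is written on, it ends up as a subtree of the addition.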
Eliminating Ambiguity (Cont.)
An example of ambiguity in a programming language is the dangling else
Consider
S → if … then S else S | if … then S | …
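Eliminating Ambiguity (Cont.)
When there are two nested ifs and only one else...
[Figure: two parse trees for the same statement. In the first, the root is “if … then S else S” and its inner S is “if … then S”, so the else belongs to the outer if. In the second, the root is “if … then S” and its inner S is “if … then S else S”, so the else belongs to the inner if.]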
Eliminating Ambiguity (Cont.)
In most languages (including C++ and Java), each else is assumed to belong to the nearest if that is not already matched by an else. This association is expressed in the following (unambiguous) grammar:
S → Matched | Unmatched
Matched → if … then Matched else Matched | …
Unmatched → if … then S | if … then Matched else Unmatched
Eliminating Ambiguity (Cont.)
Ambiguity is a property of the grammar
It is undecidable whether a context-free grammar is ambiguous
The proof is done by reduction from Post’s correspondence problem
Although there is no general algorithm, it is possible to isolate certain constructs in productions which lead to ambiguous grammars
Eliminating Ambiguity (Cont.)
For example, a grammar containing the production A → AA | α would be ambiguous, because the substring ααα has two parses:
[Figure: two parse trees for ααα, one grouping it as (αα)α and the other as α(αα)]
This ambiguity disappears if we use the productions A → AB | B and B → α
or the productions A → BA | B and B → α.
Eliminating Ambiguity (Cont.)
Examples of ambiguous productions:
A → AαA
A → αA | Aβ
A → αA | αAβA
A CF language is inherently ambiguous if it has no unambiguous CFG
– An example of such a language is L = {aⁱbʲcᵐ | i=j or j=m} which can be generated by the grammar:
S → AB | DC
A → aA | ε
C → cC | ε
B → bBc | ε
D → aDb | ε
Elimination of Left Recursion
A grammar is left recursive if it has a nonterminal A and a derivation A ⇒+ Aα for some string α
– Top-down parsing methods cannot handle left-recursive grammars, so a transformation to eliminate left recursion is needed
Immediate left recursion (productions of the form A → Aα) can be easily eliminated:
1. Group the A-productions as
   A → Aα₁ | Aα₂ | … | Aαₘ | β₁ | β₂ | … | βₙ
   where no βᵢ begins with A
2. Replace the A-productions by
   A → β₁A′ | β₂A′ | … | βₙA′
   A′ → α₁A′ | α₂A′ | … | αₘA′ | ε
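The two steps above can be sketched in Python as follows. The representation is an assumption of this example: each right-hand side is a tuple of symbols, the empty tuple stands for ε, and the new nonterminal is named by appending an apostrophe, which is assumed not to clash with an existing symbol.

```python
def eliminate_immediate_left_recursion(nonterminal, productions):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without immediate left recursion.

    `productions` is a list of right-hand sides (tuples of symbols); () is epsilon.
    Returns a dict mapping each nonterminal to its new list of right-hand sides.
    """
    # Step 1: split into the alpha_i (tails of the recursive productions) and the beta_i.
    alphas = [rhs[1:] for rhs in productions if rhs[:1] == (nonterminal,)]
    betas = [rhs for rhs in productions if rhs[:1] != (nonterminal,)]
    if not alphas:
        return {nonterminal: list(productions)}   # nothing to eliminate
    # Step 2: A -> beta_i A'   and   A' -> alpha_i A' | epsilon
    new = nonterminal + "'"
    return {
        nonterminal: [beta + (new,) for beta in betas],
        new: [alpha + (new,) for alpha in alphas] + [()],
    }

print(eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))
# {'E': [('T', "E'")], "E'": [('+', 'T', "E'"), ()]}
```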
Elimination of Left Recursion (Cont.)
The previous transformation, however, does not eliminate left recursion involving two or more steps
For example, consider the grammar
S → Aa | b
A → Ac | Sd | ε
S is left-recursive because S ⇒ Aa ⇒ Sda, but it is not immediately left recursive
Elimination of Left Recursion (Cont.)
Algorithm. Eliminate left recursion
Arrange nonterminals in some order A₁, A₂, …, Aₙ
for i = 1 to n {
  for j = 1 to i - 1 {
    replace each production of the form Aᵢ → Aⱼγ
    by the production Aᵢ → δ₁γ | δ₂γ | … | δₖγ
    where Aⱼ → δ₁ | δ₂ | … | δₖ are all the current Aⱼ-productions
  }
  eliminate the immediate left recursion among the Aᵢ-productions
}
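A possible Python rendering of this algorithm is sketched below, under the same assumptions as before: right-hand sides are tuples of symbols, () stands for ε, and new nonterminals are named with an apostrophe. The function name `eliminate_left_recursion` is invented for this example, and no attempt is made to handle grammars with cycles.

```python
def eliminate_left_recursion(grammar, order):
    """grammar: dict nonterminal -> list of right-hand sides (tuples); () is epsilon.
    order: the nonterminals A1, ..., An in the chosen order."""
    g = {a: list(rhss) for a, rhss in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for rhs in g[ai]:
                if rhs[:1] == (aj,):
                    # Ai -> Aj gamma becomes Ai -> delta gamma for every Aj -> delta.
                    expanded.extend(delta + rhs[1:] for delta in g[aj])
                else:
                    expanded.append(rhs)
            g[ai] = expanded
        # Eliminate the immediate left recursion among the Ai-productions.
        alphas = [rhs[1:] for rhs in g[ai] if rhs[:1] == (ai,)]
        betas = [rhs for rhs in g[ai] if rhs[:1] != (ai,)]
        if alphas:
            new = ai + "'"
            g[ai] = [beta + (new,) for beta in betas]
            g[new] = [alpha + (new,) for alpha in alphas] + [()]
    return g

# The grammar S -> Aa | b,  A -> Ac | Sd | epsilon, with the order S, A.
result = eliminate_left_recursion(
    {"S": [("A", "a"), ("b",)], "A": [("A", "c"), ("S", "d"), ()]},
    ["S", "A"],
)
for left, rhss in result.items():
    print(left, "->", " | ".join(" ".join(rhs) or "epsilon" for rhs in rhss))
```

For this input the S-productions are unchanged, and A becomes A → b d A′ | A′ with A′ → c A′ | a d A′ | ε.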
Elimination of Left Recursion (Cont.)
To show that the previous algorithm actually works, notice that iteration i only changes productions with Aᵢ on the left-hand side. And m > i in all productions of the form Aᵢ → Aₘα
Induction proof:
– Clearly true for i = 1
– If it is true for all i < k, then when the outer loop is executed for i = k, the inner loop will remove all productions Aᵢ → Aₘα with m < i
– Finally, with the elimination of self recursion, m in the Aᵢ → Aₘα productions is forced to be > i
At the end of the algorithm, all derivations of the form Aᵢ ⇒+ Aₘα will have m > i and therefore left recursion would not be possible
Left Factoring
Left factoring helps transform a grammar for predictive parsing
For example, if we have the two productions
S → if … then S else S | if … then S
on seeing the input token if, we cannot immediately tell which production to choose to expand S
In general, if we have A → αβ₁ | αβ₂ and the input begins with α, we do not know (without looking further) which production to use to expand A
Left Factoring (Cont.)
However, we may defer the decision by expanding A to αA′
Then after seeing the input derived from α, we may expand A′ to β₁ or to β₂
Left-factored, the original productions become
A → αA′
A′ → β₁ | β₂
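One round of left factoring might be sketched in Python as below. Right-hand sides are tuples of symbols and () stands for ε; the helper names `common_prefix` and `left_factor`, and the placeholder symbols `c` (a condition) and `other` (any other statement) in the usage example, are assumptions of this illustration.

```python
def common_prefix(seqs):
    """Longest prefix shared by all tuples in `seqs`."""
    prefix = seqs[0]
    for s in seqs[1:]:
        n = 0
        while n < min(len(prefix), len(s)) and prefix[n] == s[n]:
            n += 1
        prefix = prefix[:n]
    return prefix

def left_factor(nonterminal, productions):
    """Turn A -> alpha b1 | alpha b2 into A -> alpha A', A' -> b1 | b2 (one round)."""
    groups = {}
    for rhs in productions:
        groups.setdefault(rhs[:1], []).append(rhs)   # group by first symbol
    result = {nonterminal: []}
    fresh = nonterminal
    for first, group in groups.items():
        if len(group) == 1 or first == ():
            result[nonterminal].extend(group)        # nothing to factor
            continue
        alpha = common_prefix(group)
        fresh += "'"                                 # a new nonterminal per group
        result[nonterminal].append(alpha + (fresh,))
        result[fresh] = [rhs[len(alpha):] for rhs in group]
    return result

print(left_factor("S", [
    ("if", "c", "then", "S", "else", "S"),
    ("if", "c", "then", "S"),
    ("other",),
]))
# {'S': [('if', 'c', 'then', 'S', "S'"), ('other',)], "S'": [('else', 'S'), ()]}
```

After factoring, the choice between the two if-productions is postponed until the token following the inner statement has been seen.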
Non-Context-Free Language Constructs
Examples of non-context-free languages are:
– L1 = {wcw | w is of the form (a|b)*}
– L2 = {aⁿbᵐcⁿdᵐ | n ≥ 1 and m ≥ 1}
– L3 = {aⁿbⁿcⁿ | n ≥ 0}
Languages similar to these that are context free
– L′1 = {wcwᴿ | w is of the form (a|b)*} (wᴿ stands for w reversed)
  This language is generated by the grammar
  S → aSa | bSb | c
– L′2 = {aⁿbᵐcᵐdⁿ | n ≥ 1 and m ≥ 1}
  This language is generated by the grammar
  S → aSd | aAd
  A → bAc | bc
Non-Context-Free Language Constructs (Cont.)
L″2 = {aⁿbⁿcᵐdᵐ | n ≥ 1 and m ≥ 1} is generated by the grammar
S → AB
A → aAb | ab
B → cBd | cd
L′3 = {aⁿbⁿ | n ≥ 1} is generated by the grammar
S → aSb | ab
This language is not definable by any regular expression
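As an illustration, the grammar S → aSb | ab translates directly into a recursive recognizer for L′3. The function name `in_anbn` is an assumption of this Python sketch; the depth of the recursion plays the role of the unbounded counter that a finite automaton lacks.

```python
def in_anbn(s):
    """Return True iff the string s is derivable from S -> a S b | a b."""
    if s == "ab":                      # S -> a b
        return True
    # S -> a S b: strip one 'a' in front and one 'b' at the end, then recurse.
    return len(s) > 2 and s[0] == "a" and s[-1] == "b" and in_anbn(s[1:-1])

print(in_anbn("aaabbb"))   # True
print(in_anbn("aabbb"))    # False
```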
Non-Context-Free Language Constructs (Cont.)
Suppose we could construct a DFSM D accepting L′3.
D must have a finite number of states, say k.
Consider the sequence of states s₀, s₁, s₂, …, sₖ entered by D having read ε, a, aa, …, aᵏ.
Since D only has k states, two of the states in the sequence have to be equal. Say, sᵢ = sⱼ (i ≠ j).
From sᵢ, a sequence of i bs leads to an accepting (final) state. Therefore, the same sequence of i bs will also lead to an accepting state from sⱼ. Therefore D would accept aʲbⁱ, which means that the language accepted by D is not identical to L′3. A contradiction.
Parsing
The parsing problem is: Given string of tokens w, find a parse tree whose frontier is w. (Equivalently, find a derivation of w from the start symbol)
A parser for a grammar G reads a list of tokens and finds a parse tree if they form a sentence (or reports an error otherwise)
Two classes of algorithms for parsing:
– Top-down
– Bottom-up
Parser generators
A parser generator is a program that reads a grammar and produces a parser
The best known parser generator is yacc. It produces bottom-up parsers
Most parser generators - including yacc - do not work for every CFG; they accept a restricted class of CFG’s that can be parsed efficiently using the method employed by that parser generator
Top-down parsing
Starting from parse tree containing just S, build tree down toward input. Expand left-most non-terminal.
Algorithm: (next slide)
Top-down parsing (cont.)
Let input = a₁a₂...aₙ
current sentential form (csf) = S
loop {
  suppose csf = a₁…aₖAα
  based on aₖ₊₁…, choose production A → β
  csf becomes a₁…aₖβα
}
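The loop above can be tried out on the grammar H: L → E ; L | E, E → a | b. In this Python sketch the names `NONTERMINALS`, `choose` and `top_down` are invented, each symbol is one character, and `choose` is hard-wired for H; it peeks at two input symbols for L, since a single symbol does not decide between the two L-productions.

```python
NONTERMINALS = {"L", "E"}

def choose(nonterminal, rest):
    """Pick the right-hand side for `nonterminal` given the remaining input `rest`."""
    if nonterminal == "L":
        # L -> E ; L if a ';' follows the next symbol, otherwise L -> E.
        return "E;L" if rest[1:2] == ";" else "E"
    if rest[:1] in ("a", "b"):
        return rest[0]                     # E -> a  or  E -> b
    raise SyntaxError(f"unexpected input {rest!r}")

def top_down(tokens, start="L"):
    """Return the sentential forms of the leftmost derivation of `tokens`."""
    csf = start
    forms = [csf]
    k = 0                                  # csf[:k] has been matched against tokens[:k]
    while k < len(csf):
        symbol = csf[k]
        if symbol in NONTERMINALS:         # leftmost non-terminal: expand it
            beta = choose(symbol, tokens[k:])
            csf = csf[:k] + beta + csf[k + 1:]
            forms.append(csf)
        elif tokens[k:k + 1] == symbol:    # terminal: must match the input
            k += 1
        else:
            raise SyntaxError(f"expected {symbol!r} at position {k}")
    if k != len(tokens):
        raise SyntaxError("input continues past the end of the sentence")
    return forms

print(top_down("a;b"))   # ['L', 'E;L', 'a;L', 'a;E', 'a;b']
```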
Top-down parsing example
Grammar: H: L → E ; L | E
            E → a | b
Input: a;b
Parse tree           Sentential form   Input
L                    L                 a;b
L(E ; L)             E;L               a;b
L(E(a) ; L)          a;L               a;b
Top-down parsing example (cont.)
Parse tree           Sentential form   Input
L(E(a) ; L(E))       a;E               a;b
L(E(a) ; L(E(b)))    a;b               a;b
LL(1) parsing
Efficient form of top-down parsing
Use only first symbol of remaining input (aₖ₊₁) to choose next production. That is, employ a function M: N × Σ → P in “choose production” step of algorithm.
When this is possible, grammar is called LL(1)
LL(1) examples
Example 1:
H: L → E ; L | E
   E → a | b
Given input a;b, so next symbol is a.
Which production to use? Can’t tell.
H not LL(1)
LL(1) examples
Example 2:
Exp → Term Exp′
Exp′ → $ | + Exp
Term → id
(Use $ for “end-of-input” symbol.)
Grammar is LL(1): Exp and Term have only one production; Exp′ has two productions but only one is applicable at any time.
Nonrecursive predictive parsing
Maintain a stack explicitly, rather than implicitly via recursive calls
Key problem during predictive parsing: determining the production to be applied for a non-terminal
Nonrecursive predictive parsing
Algorithm. Nonrecursive predictive parsing
Set ip to point to the first symbol of w$.
repeat
  Let X be the top of the stack symbol and a the symbol pointed to by ip
  if X is a terminal or $ then
    if X == a then
      pop X from the stack and advance ip
    else error()
  else // X is a nonterminal
    if M[X,a] == X → Y₁ Y₂ … Yₖ then
      pop X from the stack
      push Yₖ, Yₖ₋₁, …, Y₁ onto the stack with Y₁ on top
      (push nothing if Y₁ Y₂ … Yₖ is ε)
      output the production X → Y₁ Y₂ … Yₖ
    else error()
until X == $
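A table-driven version of this algorithm could look like the Python sketch below. The table `TABLE` is written by hand for a variant of the small expression grammar (Exp → Term Exp′, Exp′ → + Exp | ε, Term → id), in which the end marker $ is kept out of the productions; that variant, the initial stack holding $ below the start symbol, and the names `TABLE`, `NONTERMINALS` and `predictive_parse` are assumptions of this example.

```python
# M[X, a]: right-hand side to use for nonterminal X when the next input symbol is a.
TABLE = {
    ("Exp", "id"): ["Term", "Exp'"],
    ("Exp'", "+"): ["+", "Exp"],
    ("Exp'", "$"): [],                 # Exp' -> epsilon
    ("Term", "id"): ["id"],
}
NONTERMINALS = {"Exp", "Exp'", "Term"}

def predictive_parse(tokens, start="Exp"):
    """Return the productions used, in order, or raise SyntaxError."""
    w = list(tokens) + ["$"]
    ip = 0
    stack = ["$", start]               # the top of the stack is the end of the list
    output = []
    while True:
        x, a = stack[-1], w[ip]
        if x not in NONTERMINALS:      # X is a terminal or $
            if x != a:
                raise SyntaxError(f"expected {x!r}, found {a!r}")
            if x == "$":
                return output          # stack and input are both exhausted
            stack.pop()
            ip += 1
        else:                          # X is a nonterminal
            rhs = TABLE.get((x, a))
            if rhs is None:
                raise SyntaxError(f"no production for {x} on {a!r}")
            stack.pop()
            stack.extend(reversed(rhs))    # push Yk ... Y1, so Y1 ends up on top
            output.append((x, tuple(rhs)))

for left, rhs in predictive_parse(["id", "+", "id"]):
    print(left, "->", " ".join(rhs) or "epsilon")
```

The productions are printed in the order of a leftmost derivation, which is the order in which the algorithm outputs them.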
LL(1) grammars
No left recursion
A → Aα: If this production is chosen, parse makes no progress.
No common prefixes
A → αβ | αγ
Can fix by “left factoring”:
A → αA′
A′ → β | γ
LL(1) grammars (cont.)

No ambiguity
The precise definition requires that the production to choose be unique (the “choose” function M is very hard to calculate otherwise).
![Page 45: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/45.jpg)
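Top-down Parsing

[Figure: two snapshots of a parse. First, with input tokens <t0, t1, …, ti, …>, the tree is the start symbol L (the root of the parse tree) with children E0 … En. Later, with remaining input tokens <ti, …>, the tree has been extended below E0 … En.]

From left to right, “grow” the parse tree downwards.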
![Page 46: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/46.jpg)
Checking LL(1)-ness

For any sequence of grammar symbols α, define the set FIRST(α) to be

FIRST(α) = { a | α ⇒* aβ for some β }
![Page 47: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/47.jpg)
LL(1) definition

Define: Grammar G = (N, Σ, P, S) is LL(1) iff whenever there are two left-most derivations (in which the leftmost non-terminal is always expanded first)

S ⇒* wAγ ⇒ wαγ ⇒* wtx
S ⇒* wAγ ⇒ wβγ ⇒* wty

it follows that α = β.

In other words, given
1. a string wAγ in V* and
2. t, the first terminal symbol to be derived from Aγ
there is at most one production that can be applied to A to yield a derivation of any terminal string beginning with wt.

FIRST sets can often be calculated by inspection.
![Page 48: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/48.jpg)
FIRST Sets

Exp → Term Exp’
Exp’ → $ | + Exp
Term → id

(Use $ for “end-of-input” symbol)

FIRST($) = {$}
FIRST(+ Exp) = {+}
FIRST($) ∩ FIRST(+ Exp) = {}
⇒ grammar is LL(1)
![Page 49: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/49.jpg)
FIRST Sets

L → E ; L | E
E → a | b

FIRST(E ; L) = {a, b} = FIRST(E)
FIRST(E ; L) ∩ FIRST(E) ≠ {}
⇒ grammar not LL(1).
![Page 50: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/50.jpg)
Computing FIRST Sets

Algorithm. Compute FIRST(X) for all grammar symbols X

    forall X ∈ V do FIRST(X) = {}
    forall X ∈ Σ (X is a terminal) do FIRST(X) = {X}
    forall productions X → ε do FIRST(X) = FIRST(X) ∪ {ε}
    repeat
        c: forall productions X → Y1 Y2 … Yk do
            forall i ∈ [1,k] do
                FIRST(X) = FIRST(X) ∪ (FIRST(Yi) - {ε})
                if ε ∉ FIRST(Yi) then continue c
            FIRST(X) = FIRST(X) ∪ {ε}
    until no more terminals or ε are added to any FIRST set
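A possible Java rendering of this fixed-point computation is sketched below. The class name `FirstSets`, the array encoding of productions, and the marker string `"eps"` for ε are assumptions made for the example (in particular, no terminal may itself be called `eps`). Unlike the pseudocode, ε-productions are not pre-seeded: a production with an empty right-hand side simply falls through the inner loop and adds ε.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FirstSets {
    static final String EPS = "eps"; // stands for the empty string

    // Each production is an array {X, Y1, ..., Yk}; {X} alone encodes X -> epsilon.
    static Map<String, Set<String>> compute(List<String[]> productions) {
        Set<String> nonterminals = new HashSet<>();
        for (String[] p : productions) nonterminals.add(p[0]);

        Map<String, Set<String>> first = new HashMap<>();
        for (String[] p : productions) {
            for (String symbol : p) {
                Set<String> set = first.computeIfAbsent(symbol, k -> new TreeSet<>());
                if (!nonterminals.contains(symbol)) set.add(symbol); // FIRST(a) = {a}
            }
        }

        boolean changed = true;
        while (changed) {                       // repeat until nothing is added
            changed = false;
            for (String[] p : productions) {
                Set<String> firstX = first.get(p[0]);
                boolean allNullable = true;
                for (int i = 1; i < p.length; i++) {
                    Set<String> firstY = first.get(p[i]);
                    // Copy before iterating: firstY may be the same set as firstX.
                    for (String s : new ArrayList<>(firstY)) {
                        if (!s.equals(EPS) && firstX.add(s)) changed = true;
                    }
                    if (!firstY.contains(EPS)) { allNullable = false; break; }
                }
                // Every Yi can vanish (or k == 0): X can derive epsilon.
                if (allNullable && firstX.add(EPS)) changed = true;
            }
        }
        return first;
    }
}
```

For the grammar A → T x | T y, T → w | ε, this yields FIRST(T) = {eps, w} and FIRST(A) = {w, x, y}.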
![Page 51: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/51.jpg)
FIRST Sets of Strings of Symbols

FIRST(X1X2…Xn) is the union of the non-ε symbols of FIRST(X1) and of all FIRST(Xi) such that ε ∈ FIRST(Xk) for k = 1, 2, …, i-1

FIRST(X1X2…Xn) contains ε iff ε ∈ FIRST(Xk) for k = 1, 2, …, n
![Page 52: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/52.jpg)
FIRST Sets do not Suffice

Given the productions
A → T x
A → T y
T → w
T → ε

T → w should be applied when the next input token is w.
T → ε should be applied whenever the next terminal is either x or y.
![Page 53: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/53.jpg)
FOLLOW Sets

For any nonterminal X, define the set FOLLOW(X) as

FOLLOW(X) = { a | S ⇒* αXaβ }
![Page 54: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/54.jpg)
Computing the FOLLOW Set

Algorithm. Compute FOLLOW(X) for all nonterminals X

    FOLLOW(S) = {$}
    forall productions A → αBβ do
        FOLLOW(B) = FOLLOW(B) ∪ (FIRST(β) - {ε})
    repeat
        forall productions A → αB or A → αBβ with ε ∈ FIRST(β) do
            FOLLOW(B) = FOLLOW(B) ∪ FOLLOW(A)
    until all FOLLOW sets remain the same
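The following Java sketch shows one way to implement this. It assumes the FIRST sets of all symbols are already available and passed in as a map; the names `FollowSets`, `firstOf` and the `"eps"` marker are hypothetical. For brevity, the two phases of the pseudocode are merged into a single fixed-point loop, which computes the same sets.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FollowSets {
    static final String EPS = "eps"; // stands for the empty string

    // FIRST of the symbol string p[from], p[from+1], ... (may be empty).
    static Set<String> firstOf(String[] p, int from, Map<String, Set<String>> first) {
        Set<String> result = new TreeSet<>();
        for (int i = from; i < p.length; i++) {
            Set<String> f = first.get(p[i]);
            for (String s : f) if (!s.equals(EPS)) result.add(s);
            if (!f.contains(EPS)) return result;
        }
        result.add(EPS);                 // the whole string can vanish
        return result;
    }

    // Each production is an array {A, Y1, ..., Yk}; {A} alone encodes A -> epsilon.
    static Map<String, Set<String>> compute(List<String[]> productions, String start,
                                            Map<String, Set<String>> first) {
        Map<String, Set<String>> follow = new HashMap<>();
        for (String[] p : productions) follow.putIfAbsent(p[0], new TreeSet<>());
        follow.get(start).add("$");      // FOLLOW(S) = {$}

        boolean changed = true;
        while (changed) {                // until all FOLLOW sets remain the same
            changed = false;
            for (String[] p : productions) {
                for (int i = 1; i < p.length; i++) {
                    Set<String> followB = follow.get(p[i]);
                    if (followB == null) continue;            // p[i] is a terminal
                    Set<String> beta = firstOf(p, i + 1, first);
                    for (String s : beta) {
                        if (!s.equals(EPS) && followB.add(s)) changed = true;
                    }
                    if (beta.contains(EPS)) {                 // B can end the production
                        for (String s : new ArrayList<>(follow.get(p[0]))) {
                            if (followB.add(s)) changed = true;
                        }
                    }
                }
            }
        }
        return follow;
    }
}
```

For the grammar A → T x | T y, T → w | ε with start symbol A, the result is FOLLOW(A) = {$} and FOLLOW(T) = {x, y}, which is exactly the information needed to decide when T → ε applies.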
![Page 55: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/55.jpg)
Construction of a predictive parsing table

Algorithm. Construction of a predictive parsing table

    M[:,:] = {}
    forall productions A → α do
        forall a ∈ FIRST(α) do
            M[A,a] = M[A,a] ∪ {A → α}
        if ε ∈ FIRST(α) then
            forall b ∈ FOLLOW(A) do
                M[A,b] = M[A,b] ∪ {A → α}
    Make all empty entries of M be error
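As a sketch of the table construction in Java, the code below takes precomputed FIRST and FOLLOW sets as inputs. All names (`LL1Table`, `build`, `isLL1`, the `"eps"` marker) are hypothetical. Each table cell is kept as a list of productions, so that a cell with more than one entry exposes a conflict, meaning the grammar is not LL(1); absent cells play the role of error entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class LL1Table {
    static final String EPS = "eps"; // stands for the empty string

    // FIRST of the right-hand side p[1..] of a production {A, Y1, ..., Yk}.
    static Set<String> firstOfRhs(String[] p, Map<String, Set<String>> first) {
        Set<String> result = new TreeSet<>();
        for (int i = 1; i < p.length; i++) {
            Set<String> f = first.get(p[i]);
            for (String s : f) if (!s.equals(EPS)) result.add(s);
            if (!f.contains(EPS)) return result;
        }
        result.add(EPS);
        return result;
    }

    // M[A][a] = productions predicted for nonterminal A on lookahead a.
    static Map<String, Map<String, List<String>>> build(List<String[]> productions,
            Map<String, Set<String>> first, Map<String, Set<String>> follow) {
        Map<String, Map<String, List<String>>> m = new TreeMap<>();
        for (String[] p : productions) {
            String lhs = p[0];
            String text = lhs + " -> "
                + (p.length == 1 ? EPS : String.join(" ", Arrays.copyOfRange(p, 1, p.length)));
            Set<String> lookaheads = firstOfRhs(p, first);
            // If the right-hand side can vanish, predict on FOLLOW(A) as well.
            if (lookaheads.remove(EPS)) lookaheads.addAll(follow.get(lhs));
            for (String a : lookaheads) {
                m.computeIfAbsent(lhs, k -> new TreeMap<>())
                 .computeIfAbsent(a, k -> new ArrayList<>())
                 .add(text);
            }
        }
        return m;
    }

    // The grammar is LL(1) exactly when no cell holds two productions.
    static boolean isLL1(Map<String, Map<String, List<String>>> m) {
        for (Map<String, List<String>> row : m.values())
            for (List<String> cell : row.values())
                if (cell.size() > 1) return false;
        return true;
    }
}
```

Applied to L → E ; L | E, E → a | b, both L-productions land in M[L, a] and M[L, b], so `isLL1` returns false, matching the earlier FIRST-set argument.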
![Page 56: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/56.jpg)
Another Definition of LL(1)

Define: Grammar G is LL(1) if for every A ∈ N with productions A → α1 | … | αn

FIRST(αi FOLLOW(A)) ∩ FIRST(αj FOLLOW(A)) = {} for all i ≠ j
![Page 57: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/57.jpg)
Regular Languages

Definition. A regular grammar is one whose productions are all of the type:
– A → aB
– A → a

A Regular Expression is either:
– a
– R1 | R2
– R1 R2
– R*
![Page 58: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/58.jpg)
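Nondeterministic Finite State Automaton

[Figure: NFA with states 0, 1, 2, 3 and start state 0. Transitions are labeled a and b (apparently self-loops on state 0), and a path labeled a, b, b leads from state 0 through states 1 and 2 to state 3.]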
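To make the nondeterminism concrete, here is a small Java sketch that simulates an NFA by tracking the set of states it could be in. It assumes the automaton in the figure is the textbook NFA for (a|b)*abb, with state 3 accepting; the transition function and the names `NfaDemo`, `step` and `accepts` are written for this example only.

```java
import java.util.HashSet;
import java.util.Set;

class NfaDemo {
    // Assumed transition relation: successors of a state on an input character.
    static int[] step(int state, char c) {
        switch (state) {
            case 0:  // loops on a and b, and may guess that "abb" starts here
                return c == 'a' ? new int[]{0, 1} : c == 'b' ? new int[]{0} : new int[]{};
            case 1:
                return c == 'b' ? new int[]{2} : new int[]{};
            case 2:
                return c == 'b' ? new int[]{3} : new int[]{};
            default: // state 3 has no outgoing transitions
                return new int[]{};
        }
    }

    // Follows all possible runs at once by keeping a set of current states.
    static boolean accepts(String input) {
        Set<Integer> current = new HashSet<>();
        current.add(0);                               // start state
        for (char c : input.toCharArray()) {
            Set<Integer> next = new HashSet<>();
            for (int s : current)
                for (int t : step(s, c)) next.add(t);
            current = next;
        }
        return current.contains(3);                   // accepting state
    }
}
```

The sets of states visited during such a simulation are exactly the states of the equivalent deterministic automaton obtained by the subset construction.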
![Page 59: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/59.jpg)
Regular Languages

Theorem. The classes of languages
– Generated by a regular grammar
– Expressed by a regular expression
– Recognized by a NDFS automaton
– Recognized by a DFS automaton
coincide.
![Page 60: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/60.jpg)
Deterministic Finite Automaton

[Figure: DFA for a scanner. State names: START, NUM, OPERATOR, KEYWORD. Transition labels: space, tab, new line; digit; letter; =, +, -, /, (, ); $.]

Legend:
– circle: state
– double circle: accept state
– arrow: transition
– bold, cap labels: state names
– lower case labels: transition characters
![Page 61: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/61.jpg)
Scanner code

    state := start
    loop
        if no input character buffered then read one, and add it to the accumulated token
        case state of
            start:
                case input_char of
                    A..Z, a..z : state := id
                    0..9 : state := num
                    else ...
                end
            id:
                case input_char of
                    A..Z, a..z : state := id
                    0..9 : state := id
                    else ...
                end
            num:
                case input_char of
                    0..9: ...
                    ...
                    else ...
                end
            ...
        end;
    end;
![Page 62: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/62.jpg)
Table-driven DFA

| input | 0-start | 1-num | 2-id | 3-operator | 4-keyword |
|---|---|---|---|---|---|
| white space | 0 | exit | exit | exit | exit |
| letter | 2 | error | 2 | exit | error |
| digit | 1 | 1 | 2 | exit | error |
| operator | 3 | exit | exit | exit | exit |
| $ | 4 | error | error | exit | 4 |
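A scanner driven by this table might look like the Java sketch below. The table entries are copied from the slide; everything else is an assumption made for the example: the class and method names, the encoding of `exit` and `error` as negative constants, the reading of `exit` as "emit the current token and rescan the character from the start state", and the operator characters (taken from the automaton on the earlier slide).

```java
import java.util.ArrayList;
import java.util.List;

class TableScanner {
    static final int EXIT = -1, ERROR = -2;
    static final String[] STATE_NAMES = {"start", "num", "id", "operator", "keyword"};

    // Rows: character class; columns: current state (0-start .. 4-keyword).
    static final int[][] NEXT = {
        /* white space */ {0, EXIT,  EXIT,  EXIT, EXIT},
        /* letter      */ {2, ERROR, 2,     EXIT, ERROR},
        /* digit       */ {1, 1,     2,     EXIT, ERROR},
        /* operator    */ {3, EXIT,  EXIT,  EXIT, EXIT},
        /* $           */ {4, ERROR, ERROR, EXIT, 4},
    };

    static int charClass(char c) {
        if (c == ' ' || c == '\t' || c == '\n') return 0;
        if (Character.isLetter(c)) return 1;
        if (Character.isDigit(c)) return 2;
        if ("=+-/()".indexOf(c) >= 0) return 3;
        if (c == '$') return 4;
        throw new IllegalArgumentException("unexpected character: " + c);
    }

    // Returns tokens formatted as "kind:lexeme".
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder lexeme = new StringBuilder();
        int state = 0, i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int next = NEXT[charClass(c)][state];
            if (next == ERROR)
                throw new IllegalArgumentException(
                    "unexpected '" + c + "' in state " + STATE_NAMES[state]);
            if (next == EXIT) {               // token complete; c is not consumed
                tokens.add(STATE_NAMES[state] + ":" + lexeme);
                lexeme.setLength(0);
                state = 0;
                continue;
            }
            if (next != 0) lexeme.append(c);  // white space in the start state is skipped
            state = next;
            i++;
        }
        if (state != 0) tokens.add(STATE_NAMES[state] + ":" + lexeme); // flush last token
        return tokens;
    }
}
```

Compared with the nested `case` version, the control flow here is fixed and all language-specific knowledge sits in the `NEXT` table, which is why scanner generators emit tables of this kind.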
![Page 63: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/63.jpg)
Language Classes

[Figure: nested language classes, innermost to outermost: RL [DFA=NFA], LL(1), LR(1), CFL [NPA], CSL, L0.]
![Page 64: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/64.jpg)
Question

Are regular expressions, as provided by Perl or other languages, sufficient for parsing nested structures, e.g. XML files?
![Page 65: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/65.jpg)
Recursive Descent Parser

stat → var = expr ;
expr → term [+ expr]
term → factor [* factor]
factor → ( expr ) | var | constant
var → identifier
![Page 66: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/66.jpg)
Scanner

    import java.io.IOException;
    import java.io.Reader;
    import java.io.StreamTokenizer;

    public class Scanner {
        private StreamTokenizer input;
        private Type lastToken;

        public enum Type { INVALID_CHAR, NO_TOKEN, PLUS,
            // etc. for remaining tokens, then:
            EOF
        };

        public Scanner(Reader r) {
            input = new StreamTokenizer(r);
            input.resetSyntax();
            input.eolIsSignificant(false);
            input.wordChars('a', 'z');
            input.wordChars('A', 'Z');
            input.ordinaryChar('+');
            input.ordinaryChar('*');
            input.ordinaryChar('=');
            input.ordinaryChar('(');
            input.ordinaryChar(')');
            input.whitespaceChars('\u0000', ' ');
        }
![Page 67: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/67.jpg)
Scanner

        // Returns the type of the next token (a Type, not an int).
        public Type nextToken() {
            Type token;
            try {
                switch (input.nextToken()) {
                case StreamTokenizer.TT_EOF:
                    token = Type.EOF;
                    break;
                case StreamTokenizer.TT_WORD:
                    if (input.sval.equalsIgnoreCase("false"))
                        token = Type.FALSE;
                    else if (input.sval.equalsIgnoreCase("true"))
                        token = Type.TRUE;
                    else
                        token = Type.VARIABLE;
                    break;
                case '+':
                    token = Type.PLUS;
                    break;
                // etc. (the remaining cases must include a default,
                // so that token is always assigned)
                }
            } catch (IOException ex) { token = Type.EOF; }
            return token;
        }
    }
![Page 68: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/68.jpg)
Parser

    import java.io.Reader;

    public class Parser {
        // LexicalAnalyzer is presumably the scanner class shown on the preceding slides.
        private LexicalAnalyzer lexer;
        private Type token;

        public Statement parse(Reader r) throws SyntaxException {
            lexer = new LexicalAnalyzer(r);
            nextToken();    // assigns token
            Statement stat = stat();
            expect(LexicalAnalyzer.EOF);
            return stat;
        }
![Page 69: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/69.jpg)
Statement

        // stat ::= variable '=' expr ';'
        private Statement stat() throws SyntaxException {
            Expr var = variable();
            expect(LexicalAnalyzer.ASSIGN);
            Expr exp = expr();
            Statement stat = new Statement(var, exp);
            expect(LexicalAnalyzer.SEMICOLON);
            return stat;
        }
![Page 70: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/70.jpg)
Expr

        // expr ::= term ['+' expr]
        private Expr expr() throws SyntaxException {
            Expr exp = term();
            while (token == LexicalAnalyzer.PLUS) {
                nextToken();
                exp = new Exp(exp, expr());
            }
            return exp;
        }
![Page 71: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/71.jpg)
Term

        // term ::= factor ['*' term]
        private Expr term() throws SyntaxException {
            Expr exp = factor();
            // Rest of body: left as an exercise.
        }
![Page 72: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/72.jpg)
Factor

        // factor ::= ( expr ) | var
        private Expr factor() throws SyntaxException {
            Expr exp = null;
            if (token == LexicalAnalyzer.LEFT_PAREN) {
                nextToken();
                exp = expr();
                expect(LexicalAnalyzer.RIGHT_PAREN);
            } else {
                exp = variable();
            }
            return exp;
        }
![Page 73: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/73.jpg)
Variable

        // variable ::= identifier
        private Expr variable() throws SyntaxException {
            if (token != LexicalAnalyzer.ID) { // throw SyntaxException...
            }
            Expr exp = new Variable(lexer.getString());
            nextToken();
            return exp;
        }
![Page 74: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/74.jpg)
Constant

        private Expr constantExpression() throws SyntaxException {
            Expr exp = null;
            // Handle the various cases for constant
            // expressions: left as an exercise.
            return exp;
        }
![Page 75: Parsing Giuseppe Attardi Università di Pisa. Parsing Calculate grammatical structure of program, like diagramming sentences, where: Tokens = “words” Tokens](https://reader036.vdocuments.net/reader036/viewer/2022062714/56649d485503460f94a23b45/html5/thumbnails/75.jpg)
Utilities

        private void expect(Type t) throws SyntaxException {
            if (token != t) { // throw SyntaxException...
            }
            nextToken();
        }

        private void nextToken() {
            token = lexer.nextToken();
        }
    }
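For readers who want something that runs end to end, the sketch below condenses the same recursive-descent scheme into one self-contained class. It is not the parser from the slides: the class `MiniParser` is hypothetical, tokens are plain strings produced by a regular expression (characters it does not recognize are silently skipped), and each method returns a prefix-notation string instead of building `Expr` and `Statement` objects, since those classes are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MiniParser {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    MiniParser(String source) {
        // Identifiers, integer constants, and the single-character tokens.
        Matcher m = Pattern.compile("[A-Za-z]+|[0-9]+|[=+*();]").matcher(source);
        while (m.find()) tokens.add(m.group());
        tokens.add("<EOF>");
    }

    static String parse(String source) { return new MiniParser(source).stat(); }

    private String token() { return tokens.get(pos); }

    private void expect(String t) {
        if (!token().equals(t))
            throw new IllegalStateException("expected " + t + " but found " + token());
        pos++;
    }

    // stat -> var = expr ;
    private String stat() {
        String v = var();
        expect("=");
        String e = expr();
        expect(";");
        expect("<EOF>");
        return "(= " + v + " " + e + ")";
    }

    // expr -> term [+ expr]
    private String expr() {
        String left = term();
        if (token().equals("+")) { pos++; return "(+ " + left + " " + expr() + ")"; }
        return left;
    }

    // term -> factor [* factor]
    private String term() {
        String left = factor();
        if (token().equals("*")) { pos++; return "(* " + left + " " + factor() + ")"; }
        return left;
    }

    // factor -> ( expr ) | var | constant
    private String factor() {
        if (token().equals("(")) { pos++; String e = expr(); expect(")"); return e; }
        if (Character.isDigit(token().charAt(0))) return tokens.get(pos++);
        return var();
    }

    // var -> identifier
    private String var() {
        if (!Character.isLetter(token().charAt(0)))
            throw new IllegalStateException("identifier expected but found " + token());
        return tokens.get(pos++);
    }
}
```

For instance, `MiniParser.parse("x = a + b * (c + 2);")` yields `(= x (+ a (* b (+ c 2))))`. There is one method per nonterminal, and the single token of lookahead in `token()` is enough to pick the production, which is what makes the grammar suitable for this technique.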