Post on 22-Dec-2015
1
Chapter 3
Context-Free Grammars and Parsing
2
Parsing: Syntax Analysis
Parsing decides which parts of the incoming token stream should be grouped together.
The output of parsing is some representation of a parse tree.
The intermediate code generator transforms the parse tree into an intermediate language.
3
Comparisons between r.e. (regular expressions) and c.f.g. (context-free grammars)
          describes                          using
r.e.      tokens                             F.A. to test a valid token
c.f.g.    programming language constructs    P.F.A. to test a valid program (sentence)
4
Features of programming languages
contents:
- declarations
- sequential statements
- iterative statements
- conditional statements
5
features:
- declare/state recursively & repeatedly
- hierarchical specification
  e.g., compound statement -> statement -> expression -> id
- nested structures
- similarity
6
Description of programming languages
Syntax Diagrams (See Sec. 3.5.2)
Context Free Grammars (CFG)
Context Free Grammar (in BNF):
exp -> exp addop term | term
addop -> + | -
term -> term mulop factor | factor
mulop -> *
factor -> ( exp ) | number
9
History
- In 1956, context-free grammars (the formalism that BNF, Backus Naur Form, expresses) were introduced by Chomsky for the description of natural language.
- Algol uses BNF to describe its language.
- The Syntactic Specification of Programming Languages - CFG (a BNF description)
10
Capabilities of Context-free grammars
- They give a precise syntactic specification of programming languages.
- A parser can be constructed automatically from a CFG.
- The syntactic entities specified in a CFG can be used for translation into object code.
- They are useful for describing nested structures such as balanced parentheses, matching begin-end's, corresponding if-then-else, etc.
11
Def. of context free grammars
- A CFG is a 4-tuple (V, T, P, S), where
  V - a finite set of variables (non-terminals)
  T - a finite set of terminal symbols (tokens)
  P - a finite set of productions (or grammar rules)
  S - a start symbol
  and V ∩ T = ∅, S ∈ V
  Productions are of the form A -> α, where A ∈ V, α ∈ (V+T)*
- CFG generates CFL (Context Free Languages)
12
An Example
G = ({E}, {+, *, (, ), id}, P, E)
P: { E -> E + E
E -> E * E
E -> ( E )
E -> id }
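The 4-tuple above can be written down directly as data. The following is a minimal sketch, assuming a hypothetical encoding (Python sets for V and T, a list of (left side, right side) pairs for P); the helper name is_well_formed is illustrative and not part of the slides.

```python
# Hypothetical encoding of G = (V, T, P, S); all names here are illustrative.
V = {"E"}
T = {"+", "*", "(", ")", "id"}
P = [
    ("E", ["E", "+", "E"]),
    ("E", ["E", "*", "E"]),
    ("E", ["(", "E", ")"]),
    ("E", ["id"]),
]
S = "E"


def is_well_formed(V, T, P, S):
    """Check that V and T are disjoint, S is in V, and every production
    has the form A -> alpha with A in V and alpha in (V+T)*."""
    if (V & T) or S not in V:
        return False
    symbols = V | T
    return all(lhs in V and all(sym in symbols for sym in rhs)
               for lhs, rhs in P)


print(is_well_formed(V, T, P, S))
```

The check mirrors the side conditions of the definition only; it says nothing about whether the grammar is ambiguous.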
13
Rules from F.A.(r.e.) to CFG
1. For each state there is a nonterminal symbol.
2. If state A has a transition to state B on symbol a, introduce A -> aB.
3. If A goes to B on input ε, introduce A -> B.
4. If A is an accepting state, introduce A -> ε.
5. Make the start state of the NFA be the start symbol of the grammar.
14
Examples
(1) r.e.: (a|b)(a|b|0|1)*
    c.f.g.: S -> aA | bA
            A -> aA | bA | 0A | 1A | ε
(2) r.e.: (a|b)*abb
    c.f.g.: S -> aS | bS | aA
            A -> bB
            B -> bC
            C -> ε
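The conversion rules can be sketched in a few lines. This is an illustrative sketch only: the function name nfa_to_grammar and the transition-list encoding are assumptions, and the NFA given for (a|b)*abb is the one implied by the grammar in example (2).

```python
EPS = ""  # stands for the empty string (epsilon); an assumed encoding


def nfa_to_grammar(transitions, accepting, start):
    """transitions: list of (state, symbol, next_state); symbol may be EPS.
    Each state doubles as a nonterminal (rule 1)."""
    productions = []
    for a, sym, b in transitions:
        # Rule 2: A -> aB.  Rule 3: A -> B for an epsilon move.
        productions.append((a, [b] if sym == EPS else [sym, b]))
    for a in sorted(accepting):
        productions.append((a, []))  # Rule 4: A -> epsilon
    return start, productions        # Rule 5: start state is the start symbol


# NFA for (a|b)*abb with states S, A, B, C, where C is accepting.
nfa = [("S", "a", "S"), ("S", "b", "S"), ("S", "a", "A"),
       ("A", "b", "B"), ("B", "b", "C")]
start, prods = nfa_to_grammar(nfa, {"C"}, "S")
for lhs, rhs in prods:
    print(lhs, "->", " ".join(rhs) or "epsilon")
```

The printed productions correspond one to one with the grammar of example (2).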
15
Why don’t we use c.f.g. to replace r.e. ?
- r.e. => easy & clear description for tokens.
- r.e. => efficient token recognizer.
- modularizing the components
16
Derivations (How does a CFG define a language?)
Definitions:
  directly derive:                =>    (e.g., A => α, α ∈ (V+T)*)
  derive in zero or more steps:   =>*
  derive in one or more steps:    =>+
  derive in i steps:              =>i
  sentential form: α such that S =>* α, α ∈ (V+T)*
  sentence: w such that S =>+ w, w ∈ T*
  language: { w | S =>+ w, w ∈ T* }
  leftmost derivations
  rightmost derivations
Example sentence: (number-number)*number
G = ( {exp, op}, {+, -, *, (, ), number}, P, exp )
P : { exp -> exp op exp | ( exp ) | number
      op -> + | - | * }
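A leftmost derivation of (number-number)*number in this grammar can be replayed step by step. The sketch below is illustrative: the helper leftmost_step and the list encoding of sentential forms are assumptions, and the particular sequence of productions is one possible leftmost derivation.

```python
NONTERMINALS = {"exp", "op"}


def leftmost_step(form, lhs, rhs):
    """Replace the leftmost nonterminal of a sentential form by rhs.
    Raises if the leftmost nonterminal is not lhs."""
    for i, sym in enumerate(form):
        if sym in NONTERMINALS:
            if sym != lhs:
                raise ValueError(f"leftmost nonterminal is {sym}, not {lhs}")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no nonterminal left to expand")


# One leftmost derivation of ( number - number ) * number.
steps = [
    ("exp", ["exp", "op", "exp"]),
    ("exp", ["(", "exp", ")"]),
    ("exp", ["exp", "op", "exp"]),
    ("exp", ["number"]),
    ("op", ["-"]),
    ("exp", ["number"]),
    ("op", ["*"]),
    ("exp", ["number"]),
]

form = ["exp"]
print(" ".join(form))
for lhs, rhs in steps:
    form = leftmost_step(form, lhs, rhs)
    print("=>", " ".join(form))
```

Every intermediate line printed is a sentential form; only the last one, which contains terminals alone, is a sentence.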
18
19
Parse trees
=> a graphical representation for derivations.
(Note the difference between parse tree and syntax tree.)
=> Often the parse tree is produced in only a figurative sense; in reality, the parse tree exists only as a sequence of actions made by stepping through the tree construction process.
20
Ambiguity
Ambiguous Grammars
- Def.: A context-free grammar that can produce more than one parse tree for some sentence.
- The ways to disambiguate a grammar:
  (1) specify the intention (e.g. associativity and precedence for arithmetic operators, etc.)
  (2) rewrite the grammar to incorporate the intention into the grammar itself.
For (1)
  Precedence: negate > exponent (↑) > * / > + -
  Associativity: exponent ==> right associativity; others ==> left associativity
  In yacc, a “specification rule” is used to solve the problem of (1), e.g., the alignment order, the special syntax, default value (refer to the yacc manual for the disambiguating rules)
For (2)
  1. introduce one nonterminal for each precedence level.
22
Example 1
E -> E + E | E - E | E * E | E / E | E ↑ E | ( E ) | - E | id
is ambiguous (↑ is the exponent operator with right associativity.)
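Parse tree 1:
        E
      / | \
     E  +  E
     |   / | \
    id  E  *  E
        |     |
        id    id

Parse tree 2:
          E
        / | \
       E  *  E
     / | \   |
    E  +  E  id
    |     |
    id    id

More than one parse tree for the sentence id + id * id

Syntax tree 1:
      +
     / \
   id   *
       / \
     id   id

Syntax tree 2:
        *
       / \
      +   id
     / \
   id   id

More than one syntax tree for the sentence id + id * id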
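The ambiguity can be checked mechanically by enumerating every tree for the sentence. The sketch below is an illustration under a simplifying assumption: it covers only the sub-grammar E -> E + E | E * E | id, and the function name parse_trees and the nested-tuple encoding of trees are hypothetical.

```python
from functools import lru_cache


def parse_trees(tokens):
    """Enumerate all trees of tokens for E -> E + E | E * E | id.
    A tree is "id" or a tuple (left, operator, right)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(i, j):
        # All trees that derive tokens[i:j] from E.
        result = []
        if j - i == 1 and tokens[i] == "id":
            result.append("id")
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # Try tokens[k] as the operator at the root.
                for left in trees(i, k):
                    for right in trees(k + 1, j):
                        result.append((left, tokens[k], right))
        return tuple(result)

    return trees(0, len(tokens))


for tree in parse_trees(["id", "+", "id", "*", "id"]):
    print(tree)
```

Two trees are printed for id + id * id, one with + at the root and one with * at the root, matching the two trees drawn for this sentence.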
25
The corresponding grammar shown below is unambiguous
element -> ( expression ) | id   /* the contents of parentheses are evaluated first */
primary -> - primary | element
factor -> primary ↑ factor | primary   /* has right associativity */
term -> term * factor | term / factor | factor
expression -> expression + term | expression - term | term
Ex: id + id * id
Parse tree (children are indented under their parent):
expression
  expression
    term
      factor
        primary
          element
            id
  +
  term
    term
      factor
        primary
          element
            id
    *
    factor
      primary
        element
          id
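One function per nonterminal gives a small parser for this layered grammar. The following is a sketch under stated assumptions, not the method of the slides: the left-recursive rules for term and expression are replaced by loops, the exponent operator is written ^, any token that is not an operator counts as an id, and the syntax tree is encoded as nested tuples.

```python
OPERATORS = {"+", "-", "*", "/", "^", "(", ")"}


def parse(tokens):
    """Build a syntax tree (nested tuples) for a list of tokens."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def element():      # element -> ( expression ) | id
        if peek() == "(":
            take()
            node = expression()
            if peek() != ")":
                raise SyntaxError("expected )")
            take()
            return node
        if peek() is None or peek() in OPERATORS:
            raise SyntaxError("expected an operand")
        return take()

    def primary():      # primary -> - primary | element
        if peek() == "-":
            take()
            return ("neg", primary())
        return element()

    def factor():       # factor -> primary ^ factor | primary
        left = primary()
        if peek() == "^":
            take()
            return ("^", left, factor())    # recursion: right associative
        return left

    def term():         # term -> term * factor | term / factor | factor
        node = factor()
        while peek() in ("*", "/"):
            op = take()
            node = (op, node, factor())     # loop: left associative
        return node

    def expression():   # expression -> expression + term | expression - term | term
        node = term()
        while peek() in ("+", "-"):
            op = take()
            node = (op, node, term())
        return node

    tree = expression()
    if pos != len(tokens):
        raise SyntaxError("unexpected token")
    return tree


print(parse(["id", "+", "id", "*", "id"]))
```

Because each precedence level has its own function, id + id * id has only one possible tree, with * nested below +.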
29
Example 2
stat -> IF cond THEN stat
| IF cond THEN stat ELSE stat
| other stat
is an ambiguous grammar
The sentence: if c1 then if c2 then s2 else s3

Parse tree 1 (the else belongs to the inner if):
stat
  IF cond (c1) THEN stat
    IF cond (c2) THEN stat (s2) ELSE stat (s3)

Parse tree 2 (the else belongs to the outer if):
stat
  IF cond (c1) THEN stat ELSE stat (s3)
    IF cond (c2) THEN stat (s2)

Dangling else problem
31
The corresponding grammar shown below is unambiguous.
stat -> matched-stat | unmatched-stat
matched-stat -> if cond then matched-stat else matched-stat | other-stat
unmatched-stat -> if cond then stat | if cond then matched-stat else unmatched-stat
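The reading that this grammar enforces is that each else pairs with the nearest unmatched if. A minimal sketch of a parser that produces the same reading is given below; it is an assumption-laden illustration (the function name parse_stat, the tuple encoding, and the treatment of every non-keyword token as a condition or other-stat are all hypothetical), and it assumes well-formed input.

```python
def parse_stat(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (tree, next_pos).
    An else is consumed by the innermost if that is still open."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "if":
        cond = tokens[pos + 1]
        if tokens[pos + 2] != "then":
            raise SyntaxError("expected then")
        then_part, pos = parse_stat(tokens, pos + 3)
        if pos < len(tokens) and tokens[pos] == "else":
            else_part, pos = parse_stat(tokens, pos + 1)
            return ("if", cond, then_part, else_part), pos
        return ("if", cond, then_part), pos
    return tokens[pos], pos + 1  # any other token is treated as other-stat


tokens = "if c1 then if c2 then s2 else s3".split()
tree, end = parse_stat(tokens)
print(tree)
```

In the resulting tree the else branch s3 sits inside the inner if on c2, and the outer if on c1 has no else branch.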
33
Non-context free language constructs
L = { wcw | w is in (a|b)* }
L = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 }
L = { a^n b^n c^n | n ≥ 0 }
34
Basic Parsing Techniques
1. How to check if an input string is a sentence of a given grammar?
(check the syntax -- not only used in the programming language)
2. How to construct a parse tree for the input string, if desired?
35
Method          classic approach       modern approach
1. top-down     recursive descent      LL parsing (produce leftmost derivation)
2. bottom-up    operator precedence    LR parsing (shift-reduce parsing; produce rightmost derivation in reverse order)
36
An Example (for LR Parsing)
S -> aABe
A -> Abc | b
B -> d
w = abbcde
Rightmost derivation: S =>rm aABe =>rm aAde =>rm aAbcde =>rm abbcde
LR parsing: abbcde ==> aAbcde ==> aAde ==> aABe ==> S
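The sequence of reductions can be replayed to confirm that it is the rightmost derivation read backwards. This sketch does not implement an LR parser: the handle positions are supplied by hand (an LR parser would find them from its tables), and the helper name reduce_at is hypothetical.

```python
def reduce_at(form, start, rhs, lhs):
    """Replace the handle rhs found at index start by the nonterminal lhs."""
    if form[start:start + len(rhs)] != rhs:
        raise ValueError(f"handle {rhs} not found at position {start}")
    return form[:start] + lhs + form[start + len(rhs):]


form = "abbcde"
# (position, handle, nonterminal), chosen by hand for this input.
reductions = [(1, "b", "A"), (1, "Abc", "A"), (2, "d", "B"), (0, "aABe", "S")]

trace = [form]
for start, rhs, lhs in reductions:
    form = reduce_at(form, start, rhs, lhs)
    trace.append(form)
print(" ==> ".join(trace))
```

Note that after the first reduction the second b is not reduced to A on its own; the handle at that point is Abc, which is why handle selection needs more than matching any right-hand side.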
37
38
39
Assignment #3a
1. Do exercises 3.3, 3.5, 3.24, 3.25
2. Use the grammar in BNF of the TINY language in Fig. 3.6 to derive, step by step, the sequence of tokens of the program in Fig. 3.8. (for practice only)