review of cfgs and parsing - university of waterloo
TRANSCRIPT
Winter 2014 Costas Busch - RPI 1
Review of CFGs and Parsing – I Context-free Languages and Grammars
Winter 2014 Costas Busch - RPI 2
Regular Languages
}0:{ ≥nba nn }{ Rww
**ba *)( ba +
Context-Free Languages
Winter 2014 Costas Busch - RPI 3
Context-Free Languages
Pushdown Automata
Context-Free Grammars
stack
automaton
Winter 2014 Costas Busch - RPI 4
Context-Free Grammars
Winter 2014 Costas Busch - RPI 5
Formal Definitions
( )PSTVG ,,,=
Set of Variables/ Non-terminal Set of
terminal symbols
Start Symbol
Set of productions
Grammar:
Winter 2014 Costas Busch - RPI 6
Context-Free Grammar:
All productions in are of the form
sA →
String of Non-terminal and terminals
),,,( PSTVG =
P
Non-terminal
Winter 2014 Costas Busch - RPI 7
λ|aSbS →
( )PSTVG ,,,=
}{SV =},{ baT =
},{ λ→→= SaSbSP
variables terminals
productions
start variable
Example of Context-Free Grammar
Winter 2014 Costas Busch - RPI 8
For a grammar with start variable G S
String of terminals or
*},:{)(*
TwwSwGL ∈⇒=
λ
Language of a Grammar:
Winter 2014 Costas Busch - RPI 9
Example
λ→
→
SaSbSGrammar:
Non-terminals
Sequence of terminals and non-terminals
The right side may be λ
Winter 2014 Costas Busch - RPI 10
Grammar: Derivation of string :
λ→
→
SaSbS
abaSbS ⇒⇒
ab
aSbS→ λ→S
Winter 2014 Costas Busch - RPI 11
Grammar: Derivation of string :
aabbaaSbbaSbS ⇒⇒⇒
aSbS→ λ→S
aabb
λ→
→
SaSbS
Winter 2014 Costas Busch - RPI 12
λ→
→
SaSbS
}0:{ ≥= nbaL nn
Grammar:
Language of the grammar:
Winter 2014 Costas Busch - RPI 13
We write: Instead of:
aaabbbS*⇒
aaabbbaaaSbbbaaSbbaSbS ⇒⇒⇒⇒
for zero or more derivation steps
A Convenient Notation
Winter 2014 Costas Busch - RPI 14
nww*
1 ⇒
nwwww ⇒⇒⇒⇒ 321
in zero or more derivation steps
In general we write:
If:
ww*⇒Trivially:
Winter 2014 Costas Busch - RPI 15
A language is context-free if there is a context-free grammar with
LG
)(GLL =
Context-Free Language:
Winter 2014 Costas Busch - RPI 16
since context-free grammar :
}0:{ ≥= nbaL nn
G
Example:
is a context-free language
generates LGL =)(
λ|aSbS →
Winter 2014 Costas Busch - RPI 17
λ||bSbaSaS →
abbaabSbaaSaS ⇒⇒⇒
Context-free grammar : G
Example derivations:
abaabaabaSabaabSbaaSaS ⇒⇒⇒⇒
=)(GL }*},{:{ bawwwR ∈
Palindromes of even length
Another Example
Winter 2014 Costas Busch - RPI 18
λ|| SSaSbS→
ababSaSbSSSS ⇒⇒⇒⇒
Context-free grammar : G
Example derivations:
abababaSbabSaSbSSSS ⇒⇒⇒⇒⇒
}prefixanyin )()( and
),()(:{
vvnvn
wnwnw
ba
ba≥
==)(GL
() ((( ))) (( ))
Describes matched parentheses: )b (, ==a
Another Example
Winter 2014 Costas Busch - RPI 19
Derivation Order and
Derivation Trees
Winter 2014 Costas Busch - RPI 20
Derivation Order Consider the following example grammar with 5 productions:
ABS→.1λ→
→
AaaAA
.3
.2λ→
→
BBbB
.5
.4
Winter 2014 Costas Busch - RPI 21
aabaaBbaaBaaABABS54321⇒⇒⇒⇒⇒
Leftmost derivation order of string : aab
At each step, we substitute the leftmost variable
ABS→.1λ→
→
AaaAA
.3
.2λ→
→
BBbB
.5
.4
Winter 2014 Costas Busch - RPI 22
aabaaAbAbABbABS32541⇒⇒⇒⇒⇒
Rightmost derivation order of string : aab
At each step, we substitute the rightmost variable
ABS→.1λ→
→
AaaAA
.3
.2λ→
→
BBbB
.5
.4
Winter 2014 Costas Busch - RPI 23
aabaaAbAbABbABS32541⇒⇒⇒⇒⇒
Rightmost derivation of : aab
aabaaBbaaBaaABABS54321⇒⇒⇒⇒⇒
Leftmost derivation of : aab
ABS→.1λ→
→
AaaAA
.3
.2λ→
→
BBbB
.5
.4
Winter 2014 Costas Busch - RPI 24
Derivation Trees Consider the same example grammar:
aabaaBbaaABbaaABABS ⇒⇒⇒⇒⇒
And a derivation of : aab
ABS→ λ|aaAA→ λ|BbB→
Winter 2014 Costas Busch - RPI 25
ABS⇒S
BA
ABS→ λ|aaAA→ λ|BbB→
yield AB
Winter 2014 Costas Busch - RPI 26
aaABABS ⇒⇒
a a A
S
BA
ABS→ λ|aaAA→ λ|BbB→
yield aaAB
Winter 2014 Costas Busch - RPI 27
aaABbaaABABS ⇒⇒⇒S
BA
a a A B b
ABS→ λ|aaAA→ λ|BbB→
yield aaABb
Winter 2014 Costas Busch - RPI 28
aaBbaaABbaaABABS ⇒⇒⇒⇒S
BA
a a A B b
λ
ABS→ λ|aaAA→ λ|BbB→
yield aaBbBbaa =λ
Winter 2014 Costas Busch - RPI 29
aabaaBbaaABbaaABABS ⇒⇒⇒⇒⇒
yield aabbaa =λλ
S
BA
a a A B b
λ λ
Derivation Tree
ABS→ λ|aaAA→ λ|BbB→
(parse tree)
Winter 2014 Costas Busch - RPI 30
aabaaBbaaBaaABABS ⇒⇒⇒⇒⇒
aabaaAbAbABbABS ⇒⇒⇒⇒⇒S
BA
a a A B b
λ λ
Give same derivation tree
Sometimes, derivation order doesn’t matter Leftmost derivation:
Rightmost derivation:
Winter 2014 Costas Busch - RPI 31
Ambiguity
Winter 2014 Costas Busch - RPI 32
Grammar for mathematical expressions
))(()( aaaaaaa +∗++∗+
Example strings:
Denotes any number
aEEEEEE |)(|| ∗+→
Winter 2014 Costas Busch - RPI 33
A leftmost derivation for aaa ∗+
E
EE
EE
+
a
a a
∗
aaaEaaEEaEaEEE*+⇒∗+⇒
∗+⇒+⇒+⇒
aEEEEEE |)(|| ∗+→
Winter 2014 Costas Busch - RPI 34
E
EE
+
a a
∗
EE a
aaaEaaEEaEEEEEE
∗+⇒∗+⇒
∗+⇒∗+⇒∗⇒
Another leftmost derivation for
aEEEEEE |)(|| ∗+→
aaa ∗+
Winter 2014 Costas Busch - RPI 35
aaa ∗+ E
EE
+
a a
∗
EE a
E
EE
EE
+
a
a a
∗
Two derivation trees for
aEEEEEE |)(|| ∗+→
Winter 2014 Costas Busch - RPI 36
E
EE
+
∗
EE
E
EE
EE
+
∗2
2 2 2 2
2
222 ∗+=∗+ aaa
take 2=a
Winter 2014 Costas Busch - RPI 37
E
EE
+
∗
EE
E
EE
EE
+
∗
6222 =∗+
2
2 2 2 2
2
8222 =∗+
4
2 2
2
6
2 2
24
8
Good Tree Bad Tree
Compute expression result using the tree
Winter 2014 Costas Busch - RPI 38
Two different derivation trees may cause problems in applications which use the derivation trees:
• Evaluating expressions
• In general, in compilers for programming languages
Winter 2014 Costas Busch - RPI 39
Ambiguous Grammar:
A context-free grammar is ambiguous if there is a string which has:
two different derivation trees or two leftmost derivations
G)(GLw∈
(Two different derivation trees give two different leftmost derivations and vice-versa)
Winter 2014 Costas Busch - RPI 40
E
EE
+
a a
∗
EE a
E
EE
EE
+
a
a a
∗
string aaa ∗+ has two derivation trees
aEEEEEE |)(|| ∗+→
this grammar is ambiguous since
Example:
Winter 2014 Costas Busch - RPI 41
string aaa ∗+ has two leftmost derivations
aaaEaaEEaEEEEEE
∗+⇒∗+⇒
∗+⇒∗+⇒∗⇒
aaaEaaEEaEaEEE*+⇒∗+⇒
∗+⇒+⇒+⇒
aEEEEEE |)(|| ∗+→
this grammar is ambiguous also because
Winter 2014 Costas Busch - RPI 42
IF_STMT if EXPR then STMT →| if EXPR then STMT else STMT
Another ambiguous grammar:
Variables Terminals
Very common piece of grammar in programming languages
Winter 2014 Costas Busch - RPI 43
If expr1 then if expr2 then stmt1 else stmt2
IF_STMT
expr1 then
else if expr2 then
STMT
stmt1
if
IF_STMT
expr1 then else
if expr2 then
STMT stmt2 if
stmt1
stmt2
Two derivation trees
Winter 2014 Costas Busch - RPI 44
In general, ambiguity is bad and we want to remove it
Sometimes it is possible to find a non-ambiguous grammar for a language
But, in general we cannot do so
Winter 2014 Costas Busch - RPI 45
aEEE
EEEEEE
→
→
∗→
+→
)(aEFFFTTTTEE
|)(||
→
∗→
+→
Ambiguous Grammar Non-Ambiguous
Grammar
Equivalent
generates the same language
A successful example:
Winter 2014 Costas Busch - RPI 46
aaaFaaFFaFTaTaTFTTTEE
∗+⇒∗+⇒∗+⇒
∗+⇒+⇒+⇒+⇒+⇒
E
E T+
T ∗ F
F
a
T
F
a
a
aEFFFTTTTEE
|)(||
→
∗→
+→
Unique derivation tree for aaa ∗+
Winter 2014 Costas Busch - RPI 47
}{}{ mmnmnn cbacbaL ∪=
0, ≥mn
An un-successful example:
every grammar that generates this language is ambiguous
L is inherently ambiguous:
Winter 2014 Costas Busch - RPI 48
}{}{ mmnmnn cbacbaL ∪=
λ||11
aAbAAcSS
→
→
λ||22
bBcBBaSS
→
→21 | SSS→
Example (ambiguous) grammar for : L
Professor Alex Aiken Lecture #5 (Modified by Professor Vijay
Ganesh)
49
Dealing with Ambiguity There are several ways to handle ambiguity Most direct method is to rewrite grammar
unambiguously Enforces precedence of * over +
' '
'
E E E | EE id E | id | (E) E | (E)
→ +
ʹ′ ʹ′→ ∗ ∗
50
Ambiguity No general techniques for handling ambiguity Impossible to convert automatically an
ambiguous grammar to an unambiguous one Used with care, ambiguity can simplify the
grammar. Sometimes allows more natural definitions. However, we still need to eventually eliminate ambiguity. How?
51
Precedence and Associativity Declarations Instead of rewriting the grammar
Use the more natural (ambiguous) grammar Along with disambiguating declarations
Most parser-generators allow precedence
and associativity declarations to disambiguate grammars
52
Associativity Declarations • Ambiguous • two parse trees of int + int + int
• Left associativity declaration: %left +
E
E
E E
E +
int +
int int
E
E
E E
E +
int +
int int
53
Precedence Declarations • Consider the string int + int * int • Precedence declarations: %left + %left *
E
E
E E
E +
int *
int int
E
E
E E
E *
int +
int int
Winter 2014 Costas Busch - RPI 54
Properties of
Context-Free languages
Winter 2014 Costas Busch - RPI 55
Context-free languages are closed under: Union
1L is context free
2L is context free 21 LL ∪
is context-free
Union
Winter 2014 Costas Busch - RPI 56
Example
λ|11 baSS →
λ|| 222 bbSaaSS →
Union
}{1nnbaL =
}{2RwwL =
21 | SSS→}{}{ Rnn wwbaL ∪=
Language Grammar
Winter 2014 Costas Busch - RPI 57
In general:
The grammar of the union has new start variable and additional production 21 | SSS→
For context-free languages with context-free grammars and start variables
21, LL
21, GG21, SS
21 LL ∪S
Winter 2014 Costas Busch - RPI 58
Context-free languages are closed under: Concatenation
1L is context free
2L is context free 21LL
is context-free
Concatenation
Winter 2014 Costas Busch - RPI 59
Example
λ|11 baSS →
λ|| 222 bbSaaSS →
Concatenation
}{1nnbaL =
}{2RwwL =
21SSS→}}{{ Rnn wwbaL =
Language Grammar
Winter 2014 Costas Busch - RPI 60
In general:
The grammar of the concatenation has new start variable and additional production 21SSS→
For context-free languages with context-free grammars and start variables
21, LL
21, GG21, SS
21LLS
Winter 2014 Costas Busch - RPI 61
Context-free languages are closed under: Star-operation
L is context free *L is context-free
Star Operation
Winter 2014 Costas Busch - RPI 62
λ|aSbS→}{ nnbaL =
λ|11 SSS →*}{ nnbaL =
Example
Language Grammar
Star Operation
Winter 2014 Costas Busch - RPI 63
In general:
The grammar of the star operation has new start variable and additional production
For context-free language with context-free grammar and start variable
LGS
*L1S
λ|11 SSS →
Winter 2014 Costas Busch - RPI 64
Context-free Languages don’t have the following
Properties
Winter 2014 Costas Busch - RPI 65
Context-free languages are not closed under: intersection
1L is context free
2L is context free
21 LL ∩
not necessarily context-free
Intersection
Winter 2014 Costas Busch - RPI 66
Example
}{1mnn cbaL =
λ
λ
||
cCCaAbAACS
→
→
→
Context-free:
}{2mmn cbaL =
λ
λ
||
bBcBaAAABS
→
→
→Context-free:
}{21nnn cbaLL =∩ NOT context-free
Intersection
Winter 2014 Costas Busch - RPI 67
Context-free languages are not closed under: complement
L is context free L not necessarily context-free.
Complement
I.e., the complement of a context-free language may or may not be context-free.
Winter 2014 Costas Busch - RPI 68
}{2121nnn cbaLLLL =∩=∪
NOT context-free
Example
}{1mnn cbaL =
λ
λ
||
cCCaAbAACS
→
→
→
Context-free:
}{2mmn cbaL =
λ
λ
||
bBcBaAAABS
→
→
→Context-free:
Complement
Winter 2014 Costas Busch - RPI 69
Intersection of
Context-free languages and
Regular Languages
Winter 2014 Costas Busch - RPI 70
The intersection of a context-free language and a regular language is a context-free language
1L context free
2L regular 21 LL ∩
context-free
Winter 2014 Costas Busch - RPI 71
Applications of
Regular Closure
Winter 2014 Costas Busch - RPI 72
The intersection of a context-free language and a regular language is a context-free language
1L context free
2L regular 21 LL ∩
context-free
Regular Closure
Winter 2014 Costas Busch - RPI 73
An Application of Regular Closure
Prove that: }0,100:{ ≥≠= nnbaL nn
is context-free
Winter 2014 Costas Busch - RPI 74
}0:{ ≥nba nn
We know:
is context-free
Winter 2014 Costas Busch - RPI 75
}{ 1001001 baL = is regular
}{}){( 100100*1 babaL −+= is regular
We also know:
Winter 2014 Costas Busch - RPI 76
}{}){( 100100*1 babaL −+=
regular }{ nnba
context-free
1}{ Lba nn ∩ context-free
LnnbaLba nnnn =≥≠=∩ }0,100:{}{ 1is context-free
(regular closure)
Winter 2014 Costas Busch - RPI 77
Another Application of Regular Closure
Prove that: }:{ cba nnnwL ===
is not context-free
Winter 2014 Costas Busch - RPI 78
}:{ cba nnnwL ===
}{*}**{ nnn cbacbaL =∩
context-free regular context-free
If is context-free
Then
Impossible!!!
Therefore, is not context free L
(regular closure)