Unit 9
More Pushdown Automata
Context-free Languages
Pumping Lemma for CFL
Reading: Sipser, chapter 2.3
Properties of PDAs
• An NFA can distinguish between only a bounded number of characterizations of the input read so far, determined by its number of states |Q|.
• A PDA can distinguish between an unlimited number of characterizations.
• A PDA can recognize non-regular languages because its stack can ‘count’.
• A PDA can count ‘more than once’, but it cannot mix counters: only one counter (the one on top of the stack) is active at any time.
Example: $L = \{a^i b^i c^j d^j \mid i, j \ge 0\}$
Construct a PDA to recognize L.
We will use the empty stack model.
The basic idea:
• Push to the stack an A for each a, pop an A for each b.
• Push to the stack a C for each c, pop a C for each d.
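The basic idea above can be sketched in Python as follows. This is a minimal simulation under stated assumptions, not a formal PDA: the function name `accepts_aibicjdj` is hypothetical, the finite control is modelled by a `phase` index that only allows the blocks in the order a, b, c, d, and the word is accepted when the input is consumed with an empty stack.

```python
def accepts_aibicjdj(word: str) -> bool:
    """Simulate a PDA for L = {a^i b^i c^j d^j | i, j >= 0} (empty-stack acceptance).

    `phase` plays the role of the PDA's finite control: it remembers which
    block (a, b, c or d) we are in, so blocks cannot appear out of order.
    """
    order = "abcd"
    phase = 0
    stack: list[str] = []
    for ch in word:
        if ch not in order or order.index(ch) < phase:
            return False                      # unknown symbol or blocks out of order
        phase = order.index(ch)
        if ch == "a":
            stack.append("A")                 # push an A for each a
        elif ch == "b":
            if not stack or stack[-1] != "A":
                return False                  # more b's than a's
            stack.pop()                       # pop an A for each b
        elif ch == "c":
            if stack and stack[-1] == "A":
                return False                  # some a's were never matched by b's
            stack.append("C")                 # push a C for each c
        else:
            if not stack or stack[-1] != "C":
                return False                  # more d's than c's
            stack.pop()                       # pop a C for each d
    return not stack                          # accept iff the stack is empty
```

For example, `aabbcd` is accepted, while `aabcd` is rejected because an A is still on the stack when the first c arrives. The two counters are used one after the other, never at the same time.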
Properties of PDAs
• We had two ways to describe regular languages: regular expressions (syntactic) and DFA / NFA (computational).
• How about context-free languages? Context-free grammars, CFG (syntactic), and pushdown automata, PDA (computational).
CFG = PDA
Theorem: A language is context-free iff
some pushdown automaton recognizes it.
Proof:
• CFL $\Rightarrow$ PDA: we show that if L is a CFL then some PDA recognizes it.
• PDA $\Rightarrow$ CFL: we show that if a PDA recognizes L then L is a CFL.
From CFG to PDA
• Proof idea: use the PDA to simulate leftmost derivations.
• Leftmost derivation: a derivation of a string is a leftmost derivation if at every step the leftmost remaining variable is the one replaced.
• We use the stack to store the suffix of the current sentential form that has not been matched against the input so far.
• Any terminal symbols appearing before the leftmost variable are matched against the input right away.
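To make "leftmost derivation" concrete, here is a small Python sketch that applies a sequence of chosen right-hand sides, each time to the leftmost variable. The function name `leftmost_derivation` is hypothetical; symbols are single characters, and the multiplication sign is written as the letter x, as in the input 5 + 3 x 2 used with the grammar $E \to E \times E \mid E + E$, $E \to 0 \mid \cdots \mid 9$ on the next slide. The caller is trusted to pick right-hand sides that are valid rules.

```python
def leftmost_derivation(start: str, choices: list[str], variables: set[str]) -> list[str]:
    """Apply each chosen right-hand side to the leftmost variable of the current form.

    Symbols are single characters; choices[k] is the right-hand side used at step k
    (not checked against the grammar). Returns the list of sentential forms.
    """
    forms = [start]
    for rhs in choices:
        form = forms[-1]
        positions = [i for i, sym in enumerate(form) if sym in variables]
        if not positions:
            raise ValueError(f"no variable left in {form!r}")
        i = positions[0]                      # leftmost remaining variable
        forms.append(form[:i] + rhs + form[i + 1:])
    return forms
```

Calling `leftmost_derivation("E", ["ExE", "E+E", "5", "3", "2"], {"E"})` produces the forms E, ExE, E+ExE, 5+ExE, 5+3xE, 5+3x2, which is the leftmost derivation shown for the same parse tree.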
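Different derivations for the same parse tree
CFG: $E \to E \times E \mid E + E$, $E \to 0 \mid 1 \mid 2 \mid \cdots \mid 9$
Input: 5 + 3 x 2
Leftmost derivation: $E \Rightarrow E \times E \Rightarrow E + E \times E \Rightarrow 5 + E \times E \Rightarrow 5 + 3 \times E \Rightarrow 5 + 3 \times 2$
Rightmost derivation: $E \Rightarrow E \times E \Rightarrow E \times 2 \Rightarrow E + E \times 2 \Rightarrow E + 3 \times 2 \Rightarrow 5 + 3 \times 2$
[Figure: the common parse tree; the root E has children E, x, E, and its left child E has children E, +, E.]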
Example run of the PDA on the input 5 + 3 x 2 (top of the stack written on the left):
remaining input | stack | step
5 + 3 x 2 | $E\,\$$ | starting configuration
5 + 3 x 2 | $E \times E\,\$$ | replace E by $E \times E$
5 + 3 x 2 | $E + E \times E\,\$$ | replace E by $E + E$
5 + 3 x 2 | $5 + E \times E\,\$$ | replace E by 5
3 x 2 | $E \times E\,\$$ | match 5, match +
3 x 2 | $3 \times E\,\$$ | replace E by 3
2 | $E\,\$$ | match 3, match x
2 | $2\,\$$ | replace E by 2
(empty) | $\$$ | match 2; the marker is on top, go to the accept state
The string ‘5 + 3 x 2’ is accepted
Informally:
1. Place the marker symbol $ and the start variable
S on the stack.
2. Repeat the following steps:
– If the top of the stack is a variable A:
Choose a rule $A \to \alpha_1 \cdots \alpha_k$ and substitute A with $\alpha_1 \cdots \alpha_k$
– If the top of the stack is a terminal a:
Read next input symbol and compare to a
If they don’t match, reject (die)
– If top of stack is $, go to accept state
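The informal procedure above can be sketched as a breadth-first search over the PDA's configurations (input position, stack contents), exploring every nondeterministic choice of rule. This is a hedged sketch: the function name `cfg_pda_accepts` is hypothetical, grammar symbols are single characters with variables as the dictionary keys, and the grammar is assumed to have no ε-rules. Under that assumption every stack symbol other than the bottom marker must produce at least one input symbol, so configurations with more stack symbols than remaining input can be pruned, which keeps the search finite.

```python
from collections import deque


def cfg_pda_accepts(rules: dict[str, list[str]], start: str, word: str) -> bool:
    """Breadth-first simulation of the CFG-to-PDA construction.

    rules maps each variable to its right-hand sides (assumed epsilon-free).
    The stack is a string whose first character is the top; '$' is the marker.
    """
    initial = (0, start + "$")                # push S$ on the empty stack
    seen = {initial}
    queue = deque([initial])
    while queue:
        pos, stack = queue.popleft()
        top, rest = stack[0], stack[1:]
        if top == "$":
            if pos == len(word):
                return True                   # marker on top, input consumed: accept
            continue
        if top in rules:                      # variable on top: try every rule
            successors = [(pos, rhs + rest) for rhs in rules[top]]
        elif pos < len(word) and word[pos] == top:
            successors = [(pos + 1, rest)]    # terminal on top: match the next input symbol
        else:
            successors = []                   # mismatch: this branch dies
        for nxt in successors:
            # prune: each non-marker stack symbol needs at least one input symbol
            if len(nxt[1]) - 1 <= len(word) - nxt[0] and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

With `rules = {"E": ["ExE", "E+E"] + list("0123456789")}` (x standing for the multiplication sign), `cfg_pda_accepts(rules, "E", "5+3x2")` returns True, following the run traced above, while `"5+"` is rejected.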
From CFG to PDA
• For a given CFG $G=(V,\Sigma,S,R)$,
we construct a PDA $P=(Q,\Sigma,\Gamma,\delta,q_0,F)$ where:
– $Q=\{q_{start}, q_{loop}, q_{accept}\}$
– $\Gamma = V \cup \Sigma \cup \{\$\}$
– $q_0 = q_{start}$
– $F = \{q_{accept}\}$
From CFG to PDA
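• We define $\delta$ as follows (shorthand notation):
– $\delta(q_{start},\varepsilon,\varepsilon)=\{(q_{loop},S\$)\}$
– $\delta(q_{loop},\varepsilon,A)=\{(q_{loop},\alpha_1\cdots\alpha_k) \mid A\to\alpha_1\cdots\alpha_k \in R\}$ for each variable A
– $\delta(q_{loop},a,a)=\{(q_{loop},\varepsilon)\}$ for each $a\in\Sigma$
– $\delta(q_{loop},\varepsilon,\$)=\{(q_{accept},\varepsilon)\}$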
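The definition of $\delta$ above can be written down mechanically. The sketch below is an illustration under assumptions: the function name `cfg_to_pda_delta` and the state names "q_start", "q_loop", "q_accept" are hypothetical, the empty string '' stands for ε, and pushed strings are written top of stack first, matching the shorthand $S\$$.

```python
def cfg_to_pda_delta(variables: set[str], terminals: set[str],
                     rules: dict[str, list[str]], start: str) -> dict:
    """Transition relation of the three-state PDA built from a CFG.

    Keys are (state, input symbol or '', popped symbol or ''); values are sets of
    (next state, pushed string), where '' means epsilon.
    """
    delta: dict[tuple[str, str, str], set[tuple[str, str]]] = {}

    def add(key: tuple[str, str, str], value: tuple[str, str]) -> None:
        delta.setdefault(key, set()).add(value)

    add(("q_start", "", ""), ("q_loop", start + "$"))         # push S$
    for head in variables:
        for rhs in rules.get(head, []):
            add(("q_loop", "", head), ("q_loop", rhs))         # expand a variable
    for a in terminals:
        add(("q_loop", a, a), ("q_loop", ""))                  # match a terminal
    add(("q_loop", "", "$"), ("q_accept", ""))                 # pop $ and accept
    return delta
```

For the grammar of the next example, $S \to aTb \mid b$, $T \to Ta \mid \varepsilon$, the call `cfg_to_pda_delta({"S", "T"}, {"a", "b"}, {"S": ["aTb", "b"], "T": ["Ta", ""]}, "S")` yields exactly the transitions listed for that example.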
From CFG to PDA
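State diagram:
– $q_{start} \to q_{loop}$ on $\varepsilon,\varepsilon \to S\$$
– self-loops on $q_{loop}$: $\{\varepsilon, A \to \alpha_1\cdots\alpha_k \mid \text{for rules } A \to \alpha_1\cdots\alpha_k\}$ and $\{a, a \to \varepsilon \mid \text{for all } a \in \Sigma\}$
– $q_{loop} \to q_{accept}$ on $\varepsilon, \$ \to \varepsilon$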
• Construct a PDA for the following CFG G:
$S \to aTb \mid b$, $T \to Ta \mid \varepsilon$ (here $L(G) = a^*b$)
Example:
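– $q_{start} \to q_{loop}$ on $\varepsilon,\varepsilon \to S\$$
– self-loops on $q_{loop}$: $\varepsilon, S \to aTb$; $\varepsilon, S \to b$; $\varepsilon, T \to Ta$; $\varepsilon, T \to \varepsilon$; $a, a \to \varepsilon$; $b, b \to \varepsilon$
– $q_{loop} \to q_{accept}$ on $\varepsilon, \$ \to \varepsilon$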
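The claim $L(G) = a^*b$ can be sanity-checked for short words by enumerating derivations. The sketch below assumes a linear grammar (every right-hand side has at most one variable, which holds for $S \to aTb \mid b$, $T \to Ta \mid \varepsilon$), so each sentential form has at most one variable and bounding the number of terminals keeps the search finite. The function name `generate_linear` is hypothetical, and this check covers only words up to the given length.

```python
def generate_linear(rules: dict[str, list[str]], start: str, max_len: int) -> set[str]:
    """Terminal strings of length <= max_len derivable from start.

    Assumes a linear grammar with single-character symbols; variables are the
    keys of `rules`. Forms with more than max_len terminals are discarded,
    since terminals are never erased.
    """
    seen = {start}
    stack = [start]
    words: set[str] = set()
    while stack:
        form = stack.pop()
        positions = [i for i, sym in enumerate(form) if sym in rules]
        if not positions:
            words.add(form)                   # no variable left: a generated word
            continue
        i = positions[0]
        for rhs in rules[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            terminals = sum(1 for sym in new if sym not in rules)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                stack.append(new)
    return words
```

With `max_len = 4` this returns {b, ab, aab, aaab}, i.e. the words of $a^*b$ of length at most 4.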
From PDA to CFG
• First, we simplify the PDA:
– It has a single accept state qf
– $ is always popped exactly before accepting
– Each transition is either a push, or a pop, but
not both
[Figure: context-free grammar and pushdown automaton; the direction CFG to PDA is marked as done (✓).]
From PDA to CFG
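• Single accept state $q_f$: add a new state $q_f$ with an $\varepsilon,\varepsilon\to\varepsilon$ transition from every original accept state to it, and make $q_f$ the only accept state.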
From PDA to CFG
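• $ is always popped exactly before accepting: before entering $q_f$, empty the stack with the self-loops $\{\varepsilon, A\to\varepsilon \mid A\in\Gamma, A\ne\$\}$, then pop the marker with $\varepsilon,\$\to\varepsilon$ on the way into $q_f$.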
From PDA to CFG
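• Each transition is either a push or a pop, but not both:
– a transition $\sigma, a \to b$ (pop a and push b) is replaced by $\sigma, a \to \varepsilon$ followed by $\varepsilon, \varepsilon \to b$, through a new intermediate state;
– a transition $\sigma, \varepsilon \to \varepsilon$ (neither push nor pop) is replaced by $\sigma, \varepsilon \to z$ followed by $\varepsilon, z \to \varepsilon$, where z is a new stack symbol.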
From PDA to CFG
• For any word w accepted by a PDA $P=(Q,\Sigma,\Gamma,\delta,q_0,q_f)$, the run starts at $q_0$ with an empty stack and ends at $q_f$ with an empty stack.
• Definition: for any two states $p,q\in Q$ we define $L_{p,q}$ to be the set of words w such that, if we start at p with an empty stack and run on w, we can end at q with an empty stack.
• For each $L_{p,q}$ we define a variable $A_{p,q}$ such that $L_{p,q} = \{w \mid A_{p,q} \Rightarrow^* w\}$.
• Note that $L(P) = L_{q_0,q_f}$.
From PDA to CFG
• Consider a word $w\in L_{p,q}$.
• While running w on P, the stack is empty at p and at q, but what happens in the middle?
• Two possibilities:
– Option 1: The stack is also empty somewhere in the middle.
– Option 2: The stack is never empty in the middle.
[Figure: stack height over the run from p to q; in Option 1 it returns to zero at an intermediate state r.]
From PDA to CFG
Option 1: The stack is also empty in the middle
• If the stack becomes empty at some state r, then the word $w\in L_{p,q}$ is the concatenation of a word from $L_{p,r}$ and a word from $L_{r,q}$; thus $L_{p,r}L_{r,q}\subseteq L_{p,q}$.
• In the CFG we express this by a rule: $A_{p,q} \to A_{p,r}A_{r,q}$
[Figure: the run from p to q split at r; the first part is generated by $A_{p,r}$, the second by $A_{r,q}$.]
From PDA to CFG
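Option 2: The stack is never empty in the middle
• The symbol that is pushed when leaving p is the same symbol that is popped when arriving at q.
• Thus, if at p we read a symbol a, push t and move to r, and at state s we read a symbol b, pop t and move to q, then $a\,L_{r,s}\,b \subseteq L_{p,q}$, and in the CFG we have $A_{p,q} \to aA_{r,s}b$.
[Figure: the run from p to q; a is read from p to r, b is read from s to q, and the middle part from r to s is generated by $A_{r,s}$.]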
From PDA to CFG
Let $P=(Q,\Sigma,\Gamma,\delta,q_0,q_f)$ be a given PDA.
We construct a CFG $G=(V,\Sigma,S,R)$ as follows*:
• $V = \{A_{p,q} \mid p,q\in Q\}$
• $S = A_{q_0,q_f}$
• R is a set of rules constructed as follows:
* Proof of correctness and further reading are in the supplementary material on the course web page.
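From PDA to CFG
• Add the following rules to R:
1. For each $p,q,r,s\in Q$, $t\in\Gamma$ and $a,b\in\Sigma_\varepsilon$: if $(r,t)\in\delta(p,a,\varepsilon)$ and $(q,\varepsilon)\in\delta(s,b,t)$, add the rule $A_{p,q}\to aA_{r,s}b$.
2. For each $p,q,r\in Q$, add the rule $A_{p,q}\to A_{p,r}A_{r,q}$.
3. For each $p\in Q$, add the rule $A_{p,p}\to\varepsilon$.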
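[Figure: rule 1, $p \xrightarrow{a,\varepsilon\to t} r$ (push t) and $s \xrightarrow{b,t\to\varepsilon} q$ (pop t); rule 2, a run from p through r to q; rule 3, a single state p.]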
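Example:
[Figure: a PDA P with $q_s \xrightarrow{\varepsilon,\varepsilon\to\$} q_0$, a self-loop $0,\varepsilon\to A$ on $q_0$, $q_0 \xrightarrow{\#,\varepsilon\to\varepsilon} q_1$, a self-loop $1,A\to\varepsilon$ on $q_1$, and $q_1 \xrightarrow{\varepsilon,\$\to\varepsilon} q_2$, where $q_2$ is the accept state.]
[Figure: the simplified PDA, in which $q_0 \xrightarrow{\#,\varepsilon\to\varepsilon} q_1$ is replaced by $q_0 \xrightarrow{\#,\varepsilon\to z} q_3 \xrightarrow{\varepsilon,z\to\varepsilon} q_1$, so that every transition is either a push or a pop.]
$L(P)=\{0^n\#1^n \mid n\ge 0\}$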
Example:
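start variable: $A_{S,2}$ (states $q_s, q_0, q_1, q_2, q_3$ are written S, 0, 1, 2, 3)
productions:
• rule 1 (matching push/pop pairs): $A_{S,2}\to A_{0,1}$, $A_{0,1}\to 0A_{0,1}1$, $A_{0,1}\to\#A_{3,3}$
• rule 2 (splitting at an intermediate state): $A_{S,S}\to A_{S,S}A_{S,S}$, $A_{S,S}\to A_{S,0}A_{0,S}$, $A_{S,S}\to A_{S,1}A_{1,S}$, $A_{S,S}\to A_{S,2}A_{2,S}$, $A_{S,1}\to A_{S,S}A_{S,1}$, $A_{S,1}\to A_{S,0}A_{0,1}$, $A_{S,1}\to A_{S,1}A_{1,1}$, ...
• rule 3: $A_{S,S}\to\varepsilon$, $A_{0,0}\to\varepsilon$, $A_{1,1}\to\varepsilon$, $A_{2,2}\to\varepsilon$, $A_{3,3}\to\varepsilon$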
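The three rule families can be generated mechanically from the transition list of a simplified PDA. The sketch below is hypothetical in its names and encoding: `pda_to_cfg` takes transitions as tuples (p, a, pop, push, q) with '' for ε, assumes the PDA is already simplified (single accept state, stack emptied before accepting, every transition pushes or pops exactly one symbol), and writes $A_{p,q}$ as the string 'A_' + p + q, which assumes single-character state names. The example data encode the simplified PDA for $0^n\#1^n$.

```python
from itertools import product


def pda_to_cfg(states, transitions, start, accept):
    """Rules of the PDA-to-CFG construction as (head, body) pairs.

    transitions: tuples (p, a, pop, push, q); '' stands for epsilon.
    A body is a tuple of symbols; the empty tuple is an epsilon-rule.
    """
    def var(p, q):
        return f"A_{p}{q}"

    rules = set()
    pushes = [t for t in transitions if t[3] and not t[2]]
    pops = [t for t in transitions if t[2] and not t[3]]
    # Rule 1: a push of t (p -> r, reading a) matched by a pop of t (s -> q, reading b).
    for (p, a, _, t, r) in pushes:
        for (s, b, t2, _, q) in pops:
            if t == t2:
                body = tuple(sym for sym in (a, var(r, s), b) if sym)
                rules.add((var(p, q), body))
    # Rule 2: the stack is empty at an intermediate state r.
    for p, q, r in product(states, repeat=3):
        rules.add((var(p, q), (var(p, r), var(r, q))))
    # Rule 3: A_{p,p} -> epsilon.
    for p in states:
        rules.add((var(p, p), ()))
    return var(start, accept), rules


EXAMPLE_STATES = ["S", "0", "1", "2", "3"]
EXAMPLE_TRANSITIONS = [
    ("S", "", "", "$", "0"),   # push $
    ("0", "0", "", "A", "0"),  # read 0, push A
    ("0", "#", "", "z", "3"),  # read #, push z
    ("3", "", "z", "", "1"),   # pop z
    ("1", "1", "A", "", "1"),  # read 1, pop A
    ("1", "", "$", "", "2"),   # pop $
]
```

Running `pda_to_cfg(EXAMPLE_STATES, EXAMPLE_TRANSITIONS, "S", "2")` gives the start variable `A_S2` and, among the rule-1 productions, exactly `A_S2 → A_01`, `A_01 → 0 A_01 1` and `A_01 → # A_33`, as listed above; rule 2 alone contributes $5^3 = 125$ productions, which is why the slide elides most of them.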
CFG = PDA
• We have shown that a language is context-free iff some pushdown automaton recognizes it.
• In particular, all regular languages can be generated by CFGs and so can be recognized by PDAs.
• The class of languages accepted by nondeterministic PDAs is strictly larger than the class accepted by deterministic PDAs (DPDA).
The Context-free Languages
[Figure: the regular languages drawn inside the context-free languages.]
Non context-free Languages
• Consider the language $L=\{a^ib^ic^i \mid i\ge 0\}$.
• When trying to build a pushdown automaton that recognizes L, we can compare the number of a's with the number of b's or with the number of c's, but not both.
• If we compared the number of a's to the number of b's, then we cannot compare the c's with either of them, because at this stage the stack (our counter) is empty.
Non context-free Languages
• So some languages seem not to be context-free.
• The question is: which?
• For many such languages this can be shown using the pumping lemma for context-free languages.
The Pumping Lemma - background
• Let L be a CFL and let G be a simple grammar (no unit rules and no ε-rules) generating it.
• Let $w\in L$ be a long enough word (we will say later what "long enough" means).
• The parse tree of w contains a long path from S to some leaf (terminal).
• On this long path some variable R must repeat (remember, w is long).
The Pumping Lemma - background
• Divide w into uvxyz
according to the parse
tree, as in the figure.
• Each occurrence of R
has a subtree under it.
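[Figure: parse tree with root S and two nested occurrences of R on one path; the leaves read u v x y z, the upper R generates vxy and the lower R generates x.]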
The Pumping Lemma - background
• The upper occurrence of R has a larger subtree
and generates vxy.
• The lower occurrence of R has a smaller
subtree and generates only x.
• Both subtrees are generated by the same
variable R.
• That means if we substitute one for the other we
will still obtain valid parse trees.
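Replacing the smaller subtree by the larger one repeatedly generates the string $uv^ixy^iz$ for each $i>0$.
Replacing the larger subtree by the smaller one generates the string uxz, that is, $uv^ixy^iz$ with $i=0$.
Therefore, for all $i\ge 0$, $w_i = uv^ixy^iz$ is also in L.
[Figure: the parse tree after replacing the lower R subtree by a copy of the upper one (yield $uv^2xy^2z$), and after replacing the upper R subtree by the lower one (yield uxz).]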
The Pumping Length
• That means that every CFL has a special value called the pumping length, such that every string in the language of length at least the pumping length can be "pumped".
• The string can be divided into 5 parts $w=uvxyz$.
• The second and fourth parts can be pumped to produce additional words in L:
• for all $k\ge 0$, $w_k = uv^kxy^kz$ can also be generated by the grammar.
Pumping Lemma for CFL
Lemma: Let L be a context-free language.
There is a positive integer p (the pumping length) such that every string $w\in L$ with $|w|\ge p$ can be divided into five pieces $w=uvxyz$ satisfying the following conditions:
1. $|vy|>0$
2. $|vxy|\le p$
3. for each $i\ge 0$, $uv^ixy^iz\in L$
Proof - value of p
• First we find out the value of p.
• Let G be a CFG for the CFL L.
• Let b be the maximum number of symbols in the right-hand side of any rule in G.
• So in any parse tree of G a node has at most b children.
• So if the height of a parse tree for $w\in L$ is h, then $|w|\le b^h$ (that is, $h\ge\log_b|w|$).
[Figure: a node A with children $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_b$.]
Proof - value of p
• Let |V| be the number of variables in G.
• We set $p = b^{|V|+2}$ (so $\log_b p = |V|+2$).
• Then for any string of length at least p the parse tree requires height at least $|V|+2$ (note that $b>1$, since there are no unit rules).
• Given a string $w\in L$ such that $|w|\ge p$, a longest path in its parse tree contains at least $|V|+1$ variables plus a terminal (height $\ge |V|+2$). Since G has only |V| variables, at least one of them repeats on this path.
• W.l.o.g. assume this variable is R.
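The choice of p can be computed directly from a grammar. The sketch below uses hypothetical function names and follows the slide's convention $p = b^{|V|+2}$; `min_height` returns the smallest h with $b^h \ge n$, the least height a parse tree for a word of length n can have, and it assumes $b \ge 2$ as noted above.

```python
def pumping_length(rules: dict[str, list[str]]) -> int:
    """p = b ** (|V| + 2), with b the longest right-hand side and V the keys of `rules`."""
    b = max(len(rhs) for rhss in rules.values() for rhs in rhss)
    return b ** (len(rules) + 2)


def min_height(b: int, n: int) -> int:
    """Smallest h with b ** h >= n (a parse tree for a word of length n is at least this tall)."""
    if b < 2:
        raise ValueError("b must be at least 2")
    h = 0
    while b ** h < n:
        h += 1
    return h
```

For the earlier grammar $E \to E \times E \mid E + E$, $E \to 0 \mid \cdots \mid 9$ (written with x for the multiplication sign), b = 3 and |V| = 1, so p = 27, and `min_height(3, 27)` is 3, which equals |V| + 2.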
Proof – condition 1
• To prove condition 1 ($|vy|>0$) we have to show that it is impossible for both v and y to be ε.
• We use a grammar without unit rules (and without ε-rules).
• The only way to have $v=y=\varepsilon$ is for the upper R to derive the lower R with nothing beside it, which requires a unit rule (such as $R\to R$). Contradiction.
• So condition 1 is satisfied.
Proof – condition 2
• To prove condition 2 ($|vxy|\le p$) we check the height of the subtree rooted at the upper R, which is the subtree that generates vxy.
• Its height is at most $|V|+2$ (R was selected as a variable that has two occurrences within the bottom $|V|+1$ levels of the parse tree).
• So it can generate a string of length at most $b^{|V|+2}$.
• Since $p=b^{|V|+2}$, condition 2 is satisfied.
Proof – condition 3
• By the subtree replacement argument above, replacing the smaller R subtree by the larger one repeatedly generates $uv^ixy^iz$ for each $i>0$, and replacing the larger by the smaller generates $uxz = uv^0xy^0z$.
• Therefore, for all $i\ge 0$, $w_i=uv^ixy^iz$ is also in L, and condition 3 is satisfied.
Usage of the lemma
• We use the pumping lemma to prove that a language is not context-free.
General Structure:
• Assume (by contradiction) that L is context-free and therefore satisfies the lemma.
• Let p be the pumping length for L.
• Select a word $w\in L$ such that $|w|\ge p$.
• Show that for any partition of w into five parts uvxyz such that $|vy|>0$ and $|vxy|\le p$, there exists an i such that $w_i = uv^ixy^iz\notin L$.
• Contradiction!
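The "for any partition" step can be checked by brute force for a fixed, small p. This is only a sanity check, not a proof: the lemma quantifies over the unknown pumping length, and the code tries specific values. The function names are hypothetical; `decompositions` enumerates every split allowed by conditions 1 and 2, and `every_split_fails` asks whether each split leaves L for some i in `pumps` (i = 2 by default, as in the example that follows).

```python
from typing import Callable, Iterator


def decompositions(w: str, p: int) -> Iterator[tuple[str, str, str, str, str]]:
    """All splits w = u v x y z with |vy| > 0 and |vxy| <= p."""
    n = len(w)
    for i in range(n + 1):                      # vxy starts at position i
        for j in range(i, min(n, i + p) + 1):   # vxy = w[i:j], so |vxy| <= p
            seg = w[i:j]
            for a in range(len(seg) + 1):
                for b in range(a, len(seg) + 1):
                    v, x, y = seg[:a], seg[a:b], seg[b:]
                    if v or y:                  # condition |vy| > 0
                        yield w[:i], v, x, y, w[j:]


def every_split_fails(w: str, p: int, member: Callable[[str], bool], pumps=(2,)) -> bool:
    """True if every allowed split of w leaves the language for some i in `pumps`."""
    return all(
        any(not member(u + v * i + x + y * i + z) for i in pumps)
        for (u, v, x, y, z) in decompositions(w, p)
    )


def in_anbncn(s: str) -> bool:
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


def in_anbn(s: str) -> bool:
    n = len(s) // 2
    return s == "a" * n + "b" * n
```

For $w=a^pb^pc^p$ every split fails for the small values of p tried, matching the argument below. For the context-free language $\{a^nb^n\}$ the check returns False on $a^3b^3$ with p = 3, because the split v = a, x = ε, y = b in the middle pumps correctly.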
Usage of the lemma - Example
• We use the pumping lemma to prove that the language $L=\{a^nb^nc^n \mid n\ge 0\}$ is not context-free.
Proof:
• Assume, by contradiction, that L is context-free and thus satisfies the lemma.
• Let p be the pumping length for L.
• We select the string $w=a^pb^pc^p$. Since $w\in L$ and $|w|\ge p$, it can be pumped.
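$L=\{a^nb^nc^n \mid n\ge 0\}$
• Divide w into five parts uvxyz such that $|vy|>0$ and $|vxy|\le p$.
• There are two cases:
1. v and y are both homogeneous (each contains only one type of symbol).
2. v or y is heterogeneous (contains more than one type of symbol).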
[Figure: $w = a^pb^pc^p = uvxyz$ with example positions of v and y. Case 1: v and y are homogeneous. Case 2: v or y is heterogeneous.]
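$L=\{a^nb^nc^n \mid n\ge 0\}$
Case 1: v and y each contain only one type of symbol. Since $|vxy|\le p$, the substring vxy cannot touch both the a's and the c's, so v and y together contain at most two of the three symbols. By choosing i=2 we get $w_2=uv^2xy^2z$, in which the number of occurrences of one or two symbols increased while the number of the third symbol remained unchanged. So $w_2$ cannot contain the same number of a's, b's and c's, and $w_2\notin L$.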
$L=\{a^nb^nc^n \mid n\ge 0\}$
Case 2: v or y contains two types of symbols (it cannot contain three, because condition 2 holds). If we choose i=2, then the order of the symbols in $v^2$ or $y^2$ is destroyed (for example, v = ab gives $v^2 = abab$), so $w_2=uv^2xy^2z\notin L$.
Conclusion: The assumption that L is context-free is false, and $L=\{a^nb^nc^n \mid n\ge 0\}$ is not a CFL.
Pumping Lemma – another example
Let $L=\{ww \mid w\in\{0,1\}^*\}$
[ $00, 110110\in L$; $010, 0010\notin L$ ]
Prove that L is not context-free
Proof: In class.
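As a quick check of the definition and of the bracketed examples, membership in $\{ww\}$ only requires an even length and two equal halves. The function name `in_ww` is hypothetical; this decides membership directly and is not a PDA.

```python
def in_ww(s: str) -> bool:
    """Membership in L = {ww | w in {0,1}*}: a binary string made of two equal halves."""
    half, odd = divmod(len(s), 2)
    return not odd and set(s) <= {"0", "1"} and s[:half] == s[half:]
```

It accepts 00 and 110110 and rejects 010 (odd length) and 0010 (halves 00 and 10 differ).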
Intersection 1
CFLs are not closed under intersection.
Proof: by a counterexample.
$L_1=\{a^nb^nc^k \mid n,k>0\}$
$L_2=\{a^kb^nc^n \mid n,k>0\}$
Both $L_1$ and $L_2$ are CFLs, but $L_1\cap L_2=\{a^jb^jc^j \mid j>0\}$ is not a CFL.
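The equality $L_1\cap L_2=\{a^jb^jc^j \mid j>0\}$ can be spot-checked for short strings. The sketch below uses hypothetical membership predicates based on Python's `re.fullmatch` (they decide membership directly and are not PDAs) and enumerates all strings over {a, b, c} up to a given length.

```python
import re
from itertools import product


def in_L1(s: str) -> bool:
    """a^n b^n c^k with n, k > 0."""
    m = re.fullmatch(r"(a+)(b+)(c+)", s)
    return m is not None and len(m.group(1)) == len(m.group(2))


def in_L2(s: str) -> bool:
    """a^k b^n c^n with n, k > 0."""
    m = re.fullmatch(r"(a+)(b+)(c+)", s)
    return m is not None and len(m.group(2)) == len(m.group(3))


def intersection_up_to(max_len: int) -> set[str]:
    """All strings over {a, b, c} of length <= max_len that are in both L1 and L2."""
    return {
        s
        for n in range(max_len + 1)
        for s in ("".join(t) for t in product("abc", repeat=n))
        if in_L1(s) and in_L2(s)
    }
```

Up to length 9 the intersection is exactly {abc, aabbcc, aaabbbccc}: equal a's and b's from $L_1$ together with equal b's and c's from $L_2$ force all three counts to agree.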
Intersection 2
• The intersection of a CFL and a regular language is a CFL (CFLs are closed under intersection with regular languages).
Proof:
Build a product automaton of a PDA and a DFA.
Note that the resulting automaton is a PDA.
[Figure: the input is fed to both the DFA and the PDA (which owns the stack); accept/reject is the AND of their answers.]
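The product construction can be sketched as a transformation on transition sets. This is an illustration under assumptions: the function name `product_pda` and the encodings are hypothetical, PDA transitions are tuples (p, a, pop, push, p2) with '' for ε, the DFA is a dictionary from (state, symbol) to state, and both machines accept by final state. On an input symbol both components move; on an ε-move only the PDA moves and the DFA component stays put. The stack belongs to the PDA component alone, so the result is again a PDA.

```python
def product_pda(pda_delta, pda_start, pda_accept, dfa_delta, dfa_states, dfa_start, dfa_accept):
    """Product of a PDA and a DFA, returned as (transitions, start state, accept states).

    Product states are pairs (PDA state, DFA state). A pair is accepting when both
    components are accepting, which implements the AND in the figure.
    """
    delta = set()
    for (p, a, pop, push, p2) in pda_delta:
        for d in dfa_states:
            if a == "":
                delta.add(((p, d), a, pop, push, (p2, d)))          # epsilon move: DFA waits
            elif (d, a) in dfa_delta:
                delta.add(((p, d), a, pop, push, (p2, dfa_delta[(d, a)])))  # both move on a
    start = (pda_start, dfa_start)
    accept = {(f, g) for f in pda_accept for g in dfa_accept}
    return delta, start, accept
```

Each PDA transition is copied once per DFA state (if the DFA has a move on that symbol), so the product has at most |δ_PDA| · |Q_DFA| transitions.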
Complement
Theorem: CFLs are not closed under complement.
Proof: by contradiction.
• Assume that CFLs are closed under complement, and take two CFLs $L_1$ and $L_2$. Then $\sim L_1$ and $\sim L_2$ are CFLs ($\sim L$ denotes the complement of L).
• CFLs are closed under union, so $\sim L_1\cup\sim L_2$ is a CFL.
• Using our assumption again, $\sim(\sim L_1\cup\sim L_2)$ is a CFL as well.
• But $\sim(\sim L_1\cup\sim L_2) = L_1\cap L_2$, so the intersection of any two CFLs would be a CFL, contradicting the fact that CFLs are not closed under intersection.