A Pumping Lemma forand Closure Proper ties ofConte xt-Free Langua ges
Martin Franzle
Informatics and Mathematical Modelling
The Technical University of Denmark
Context-free languages III – p.1/20
What you’ ll learn
1. A pumping lemma for CFLs:
� The lemma� Its use in proving languages non-CFL
� Insight in the limits of CFLs
2. Closure proper ties of CFLs:
� The operation of substitution — generalizinghomomorphisms
� Closure under union, concatenation, Kleene closure,homomorphism� Non-closure under intersection, complement, difference� Closure under intersection with a regular language� Closure under inverse homomorphism
Context-free languages III – p.2/20
A pumping lemma for CFLs
Context-free languages III – p.3/20
Rationale
� As CFGs build strings of arbitrary length – if the languagedefined is not finite — from a finite set of rules, it is reasonableto expect that all sufficiently long words of the language havesome kind of repeatable pattern.
� Obviously, this pattern can be more complex than that occurringin regular languages (why?)
Program:
1. Identify the kind of pattern;
2. derive a method for proving languages non-contextfree.
Context-free languages III – p.4/20
Rationale
� As CFGs build strings of arbitrary length – if the languagedefined is not finite — from a finite set of rules, it is reasonableto expect that all sufficiently long words of the language havesome kind of repeatable pattern.
� Obviously, this pattern can be more complex than that occurringin regular languages (why?)
Program:
1. Identify the kind of pattern;
2. derive a method for proving languages non-contextfree.
Context-free languages III – p.4/20
Idea
Any parse-tree deeper than the number of variables in the CFG musthave some variable occur at least twice on its deepest path:
S
V
V
S
V
V
S
V
V
This provides “pluggable” subtrees!
Context-free languages III – p.5/20
Idea
Any parse-tree deeper than the number of variables in the CFG musthave some variable occur at least twice on its deepest path:
S
V
V
S
V
V
S
V
V
This provides “pluggable” subtrees!
Context-free languages III – p.5/20
Idea
Any parse-tree deeper than the number of variables in the CFG musthave some variable occur at least twice on its deepest path:
S
V
V
S
V
V
S
V
V
This provides “pluggable” subtrees!
Context-free languages III – p.5/20
Pumping it
Parse tree
can be shrunk
u v w x y
S
V
V
y
w
u
S
V
Caution: To really change yield of the parse tree, we have tomake sure that or .
Context-free languages III – p.6/20
Pumping it
Parse tree can be shrunk
u v w x y
S
V
V
y
w
u
S
V
Caution: To really change yield of the parse tree, we have tomake sure that or .
Context-free languages III – p.6/20
Pumping it
Parse tree can be shrunk
u v w x y
S
V
V
y
w
u
S
V
Caution: To really change yield of the parse tree, we have tomake sure that � ��� � or � ��� �.
Context-free languages III – p.6/20
Pumping it
Parse tree
can be expanded
u v w x y
S
V
V
u v
v w x
x y
S
V
V
V
Again, change of yield depends on .
Context-free languages III – p.7/20
Pumping it
Parse tree can be expanded � � �
u v w x y
S
V
V
u v
v w x
x y
S
V
V
V
Again, change of yield depends on .
Context-free languages III – p.7/20
Pumping it
Parse tree can be expanded � � �
u v w x y
S
V
V
u v
v w x
x y
S
V
V
V
Again, change of yield depends on � � ��� �.Context-free languages III – p.7/20
Ensuring � or � �
can be done by using Chomsky-normal-form (CNF) grammars:
� they don’t contain nullable symbols,
� the production used for the upper
in the parse tree must be ofthe form � �� ��as it did not directly derive a terminal
the lower
must be derived from either
�� or
�
� in the first case, � ��� � as
� is non-nullable,� in the second case, � ��� � as
� is non-nullable.
Context-free languages III – p.8/20
The pumping lemma for CFLs
Thm: For any CFL
�
there is constant � �
IN such that any string� � �
with
� � ��� � can be split into � � � �� � � with1.
� � � � ��� �,
i.e. we find the pumpable portion in a “short” substring,
2. � � ��� �,
i.e. at least one of and is non-trivial,
3.
�� �
IN � � ! � � ! � � �
,
i.e. and can be simultaneously “pumped”.
Context-free languages III – p.9/20
The pumping lemma for CFLs
Thm: For any CFL
�
there is constant � �
IN such that any string� � �
with
� � ��� � can be split into � � � �� � � with1.
� � � � ��� �,i.e. we find the pumpable portion in a “short” substring,
2. � � ��� �,i.e. at least one of � and � is non-trivial,
3.
�� �
IN � � ! � � ! � � �
,i.e. � and � can be simultaneously “pumped”.
Context-free languages III – p.9/20
An auxiliar y lemma on CNF grammar s
Lem: If
" � # %$ &$ ' $ ( )
is a CNF grammar and � � &*
is the yield of aparse tree of
"
of depth � then
� � ��� +,.- �
.
Prf: By complete induction on :
Induction hypothesis: For all it is true that the yield of any parsetree of depth satisfies , if is terminal.
Induction step: Parse tree depth . We deduce from the inductionhypothesis that for all terminal yields of parse trees of depth .
Case 1: . There is no parse tree of depth having a terminal yield.Thus, the conjecture is trivially true.Case 2: . As is of CNF and is terminal, can only be derivedby a production with . Hence, .
Case 3: . As tree depth , the root of the parse tree uses aproduction, which is of the form because is of CNF. As theoverall tree depth is , the subtrees rooted at and have depth ,s.t. the induction hypothesis applies to them.Thus, holds for the yields of these subtrees. Now,
such that .
Context-free languages III – p.10/20
An auxiliar y lemma on CNF grammar s
Lem: If
" � # %$ &$ ' $ ( )
is a CNF grammar and � � &*
is the yield of aparse tree of
"
of depth � then
� � ��� +,.- �
.
Prf: By complete induction on /:
Induction hypothesis: For all it is true that the yield of any parsetree of depth satisfies , if is terminal.
Induction step: Parse tree depth . We deduce from the inductionhypothesis that for all terminal yields of parse trees of depth .
Case 1: . There is no parse tree of depth having a terminal yield.Thus, the conjecture is trivially true.Case 2: . As is of CNF and is terminal, can only be derivedby a production with . Hence, .
Case 3: . As tree depth , the root of the parse tree uses aproduction, which is of the form because is of CNF. As theoverall tree depth is , the subtrees rooted at and have depth ,s.t. the induction hypothesis applies to them.Thus, holds for the yields of these subtrees. Now,
such that .
Context-free languages III – p.10/20
An auxiliar y lemma on CNF grammar s
Lem: If
" � # %$ &$ ' $ ( )
is a CNF grammar and � � &*
is the yield of aparse tree of
"
of depth � then
� � ��� +,.- �
.
Prf: By complete induction on /:Induction hypothesis: For all / 0 1
it is true that the yield 2 of any parsetree of depth / satisfies
3 2 34 5687 9
, if 2 is terminal.
Induction step: Parse tree depth . We deduce from the inductionhypothesis that for all terminal yields of parse trees of depth .
Case 1: . There is no parse tree of depth having a terminal yield.Thus, the conjecture is trivially true.Case 2: . As is of CNF and is terminal, can only be derivedby a production with . Hence, .
Case 3: . As tree depth , the root of the parse tree uses aproduction, which is of the form because is of CNF. As theoverall tree depth is , the subtrees rooted at and have depth ,s.t. the induction hypothesis applies to them.Thus, holds for the yields of these subtrees. Now,
such that .
Context-free languages III – p.10/20
An auxiliar y lemma on CNF grammar s
Lem: If
" � # %$ &$ ' $ ( )
is a CNF grammar and � � &*
is the yield of aparse tree of
"
of depth � then
� � ��� +,.- �
.
Prf: By complete induction on /:Induction hypothesis: For all / 0 1
it is true that the yield 2 of any parsetree of depth / satisfies
3 2 34 5687 9
, if 2 is terminal.
Induction step: Parse tree depth /: 1
. We deduce from the inductionhypothesis that
3 2 34 5 ;7 9
for all terminal yields of parse trees of depth
1
.
Case 1: . There is no parse tree of depth having a terminal yield.Thus, the conjecture is trivially true.Case 2: . As is of CNF and is terminal, can only be derivedby a production with . Hence, .
Case 3: . As tree depth , the root of the parse tree uses aproduction, which is of the form because is of CNF. As theoverall tree depth is , the subtrees rooted at and have depth ,s.t. the induction hypothesis applies to them.Thus, holds for the yields of these subtrees. Now,
such that .
Context-free languages III – p.10/20
An auxiliar y lemma on CNF grammar s
Lem: If
" � # %$ &$ ' $ ( )
is a CNF grammar and � � &*
is the yield of aparse tree of
"
of depth � then
� � ��� +,.- �
.
Prf: By complete induction on /:Induction hypothesis: For all / 0 1
it is true that the yield 2 of any parsetree of depth / satisfies
3 2 34 5687 9
, if 2 is terminal.
Induction step: Parse tree depth /: 1
. We deduce from the inductionhypothesis that
3 2 34 5 ;7 9
for all terminal yields of parse trees of depth
1
.�
Case 1:
1: <
. There is no parse tree of depth
<
having a terminal yield.Thus, the conjecture is trivially true.
Case 2: . As is of CNF and is terminal, can only be derivedby a production with . Hence, .
Case 3: . As tree depth , the root of the parse tree uses aproduction, which is of the form because is of CNF. As theoverall tree depth is , the subtrees rooted at and have depth ,s.t. the induction hypothesis applies to them.Thus, holds for the yields of these subtrees. Now,
such that .
Context-free languages III – p.10/20
An auxiliar y lemma on CNF grammar s
Lem: If
" � # %$ &$ ' $ ( )
is a CNF grammar and � � &*
is the yield of aparse tree of
"
of depth � then
� � ��� +,.- �
.
Prf: By complete induction on /:Induction hypothesis: For all / 0 1
it is true that the yield 2 of any parsetree of depth / satisfies
3 2 34 5687 9
, if 2 is terminal.
Induction step: Parse tree depth /: 1
. We deduce from the inductionhypothesis that
3 2 34 5 ;7 9
for all terminal yields of parse trees of depth
1
.�
Case 1:
1: <
. There is no parse tree of depth
<
having a terminal yield.Thus, the conjecture is trivially true.�
Case 2:
1: =
. As
>
is of CNF and 2 is terminal, 2 can only be derivedby a production
?A@ B with B C D. Hence,
3 2 3 : = : 5 E : 5 ;7 9
.
Case 3: . As tree depth , the root of the parse tree uses aproduction, which is of the form because is of CNF. As theoverall tree depth is , the subtrees rooted at and have depth ,s.t. the induction hypothesis applies to them.Thus, holds for the yields of these subtrees. Now,
such that .
Context-free languages III – p.10/20
An auxiliar y lemma on CNF grammar s
Lem: If
" � # %$ &$ ' $ ( )
is a CNF grammar and � � &*
is the yield of aparse tree of
"
of depth � then
� � ��� +,.- �
.
Prf: By complete induction on /:Induction hypothesis: For all / 0 1
it is true that the yield 2 of any parsetree of depth / satisfies
3 2 34 5687 9
, if 2 is terminal.
Induction step: Parse tree depth /: 1
. We deduce from the inductionhypothesis that
3 2 34 5 ;7 9
for all terminal yields of parse trees of depth
1
.�
Case 1:
1: <
. There is no parse tree of depth
<
having a terminal yield.Thus, the conjecture is trivially true.�
Case 2:
1: =
. As
>
is of CNF and 2 is terminal, 2 can only be derivedby a production
?A@ B with B C D. Hence,
3 2 3 : = : 5 E : 5 ;7 9
.�
Case 3:
1GF =
. As tree depth1F =
, the root of the parse tree uses aproduction, which is of the form
?A@ H 9 HJI because
>
is of CNF. As theoverall tree depth is
1, the subtrees rooted at
H 9 and
HI have depth 0 1
,s.t. the induction hypothesis applies to them.Thus,
3 2 9 34 5 ;7 ILK 3 2 I 3
holds for the yields 28M of these subtrees. Now,2 : 2 9 2 I such that
3 2 3 : 3 2 9 38N 3 2 I 34 5 ;7 I N 5 ;7 I : 5 ;7 9
.Context-free languages III – p.10/20
Pumping lemma as a proof schemeGiven a language
�
deemed to be non-CFL, proceed as follows:
1. Take arbitrary � �
IN,
(Selection of is not under your control — you have to accept any .)
2. provide a construction of � depending on �,
(You select .)
3. arbitrarily break � into � �� � � subject to the constraints(a)
� � � � ��� �,(b) � � ��� �,
(Selection of is not under your control — you have to accept anysplit that satisfies the two constraints.)
4. pick
� �
IN depending on � $ � $ � $ � $ � and � such that� � ! � � ! � �� �
.
(You select .)
This constitutes a proof of
�
being not context-free.Context-free languages III – p.11/20
Pumping lemma as a proof schemeGiven a language
�
deemed to be non-CFL, proceed as follows:
1. Take arbitrary � �
IN,(Selection of / is not under your control — you have to accept any /.)
2. provide a construction of � depending on �,(You select O.)
3. arbitrarily break � into � �� � � subject to the constraints(a)
� � � � ��� �,(b) � � ��� �,(Selection of PRQ S Q 2 Q T Q U is not under your control — you have to accept anysplit that satisfies the two constraints.)
4. pick
� �
IN depending on � $ � $ � $ � $ � and � such that� � ! � � ! � �� �
.(You select
V
.)
This constitutes a proof of
�
being not context-free.Context-free languages III – p.11/20
Closure proper ties of CFLs
Context-free languages III – p.12/20
Substitutions
Idea: Generalize the notion of homomorphism by substituting a fullCFL (instead of just a word) for each terminal symbol.
Def: Given a set
&
of terminals, a substitution for
&is a mappingWX & �
CFL.
Given � � &*
and a substitution W on
&,
W # � ) def� Y � � � � � � � �[Z]\ Z � � ! � W # � ! ) for all
� � � � �^ $
i.e. W # � )
is the concatenation of the languages W # � � ) $ W # � � ) $ � � � .Given
�_ &*
and a substitution W on
&
,W # � ) def�
\ ` aW # � ) �
Context-free languages III – p.13/20
Closure under substitution
Thm: If
�
is a CFL over alphabet
&
and W is a substitution for&
thenW # � )
is a CFL.
Prf: Essentially, we take a CFG
>
for
b
and replace each terminal B by the startsymbols of a CFG for c d B e .
Let and be a CFG for for each. Then generates , where
is the disjoint union of and all ’s,
,
consists of1. the productions from , but with each terminal being replaced
by (actually a homomorphism on the productions),
2. .
Context-free languages III – p.14/20
Closure under substitution
Thm: If
�
is a CFL over alphabet
&
and W is a substitution for&
thenW # � )
is a CFL.
Prf: Essentially, we take a CFG
>
for
b
and replace each terminal B by the startsymbols of a CFG for c d B e .Let
> : d HQ DQ fQ ? e
and
>hg : d Hg Q Dg Q fg Q ?g e
be a CFG for c d B e for eachB C D
. Then
> i : d H i Q D i Q f i Q ? e
generates c d b e, where� H i
is the disjoint union of
H
and all
Hg ’s,� D i : jg k l Dg ,� f i
consists of1. the productions from
f
, but with each terminal B C D
being replacedby
?g (actually a homomorphism on the productions),
2.
jg k l fg .
Context-free languages III – p.14/20
Implications of substitution closureCor: The CFLs are closed under� union,� concatenation,� Kleene closure (
*
) and positive closure (
m
),� homomorphism.
Prf: For the first three items, we state regular (and thus context-free) languagesplus a substitution that maps them into the desired language:1. ,2. ,3. and , respectively.
Applying the substitution and to these RLs, we obtain
1. ,2. ,3. and .Homomorphism, finally, is a special case of substitution.Hence, substitution closure implies all these closure properties.
Context-free languages III – p.15/20
Implications of substitution closureCor: The CFLs are closed under� union,� concatenation,� Kleene closure (
*
) and positive closure (
m
),� homomorphism.
Prf: For the first three items, we state regular (and thus context-free) languagesplus a substitution that maps them into the desired language:1. BN n
,2. B n
,3. B o and B B o , respectively.
Applying the substitution c d B e : band c d n e : p
to these RLs, we obtain
1.
bq p
,2.
b p
,3.
b o
and
b r
.
Homomorphism, finally, is a special case of substitution.Hence, substitution closure implies all these closure properties.
Context-free languages III – p.15/20
Implications of substitution closureCor: The CFLs are closed under� union,� concatenation,� Kleene closure (
*
) and positive closure (
m
),� homomorphism.
Prf: For the first three items, we state regular (and thus context-free) languagesplus a substitution that maps them into the desired language:1. BN n
,2. B n
,3. B o and B B o , respectively.
Applying the substitution c d B e : band c d n e : p
to these RLs, we obtain
1.
bq p
,2.
b p
,3.
b o
and
b r
.Homomorphism, finally, is a special case of substitution.Hence, substitution closure implies all these closure properties.
Context-free languages III – p.15/20
Non-c losure under inter section
Thm: The CFLs are not closed under intersection.
N.B.: This is in contrast to the RLs, which are closed underintersection.
Prf: By contraposition:p : s B6 n 6]t u 3 / Q v C
IN
w
and
x : s B6 n ut u 3 / Q v CIN
w
are CFLs. If theCFLs were closed under intersection then
s B6 n 6t 6 3 / C
IN
w: py x
werea CFL, which it is not.
Context-free languages III – p.16/20
Implications of non-c losure under inter section
Cor: The CFLs are not closed under� complement,� difference.
Prf: By contraposition:
If the CFLs were closed under complement, then they were also closed
under intersection, as
py x : p q x
.
If the CFLs were closed under difference, then they were also closed undercomplement, as
D o
is a CFL and
b : D o z b.
Context-free languages III – p.17/20
Closure under reversal
Thm: The CFLs are closed under reversal of words.
Prf: Take a CFG for the language and reverse the bodies of all productions.
Context-free languages III – p.18/20
Closure under inter section with RL
Thm: The CFLs are closed under intersection with a regularlanguage.I.e.,
�{ |
is a CFL if
�
is a CFL and
|
is an RL.
Prf: Let
f : d}R~ Q � Q �Q �~ Q �~ Q � E Q �~ e
be a PDA acceptingb
by final state and let� : d} g Q � Q �J� Q � � Q �� e
be a DFA for
�
.The PDA
} i : d} ~ � } g Q � Q �Q � Q d �~ Q � � e Q � E Q �~ � �g ewith
� d d � Q � e Q B Q � e def: d d � i Q � i e Q � e���������d � i Q � e C �~ d � Q B Q � e
� � i : ���� d � Q B e
then accepts
by �
by final state.
Cor: If is a CFL and is an RL, then is a CFL.
Prf: . Thus, is a CFL because of closure of the CFLs underintersection with RLs and because of closure of the RLs under complement.
Context-free languages III – p.19/20
Closure under inter section with RL
Thm: The CFLs are closed under intersection with a regularlanguage.I.e.,
�{ |
is a CFL if
�
is a CFL and
|
is an RL.
Prf: Let
f : d}R~ Q � Q �Q �~ Q �~ Q � E Q �~ e
be a PDA acceptingb
by final state and let� : d} g Q � Q �J� Q � � Q �� e
be a DFA for
�
.The PDA
} i : d} ~ � } g Q � Q �Q � Q d �~ Q � � e Q � E Q �~ � �g ewith
� d d � Q � e Q B Q � e def: d d � i Q � i e Q � e���������d � i Q � e C �~ d � Q B Q � e
� � i : ���� d � Q B e
then accepts
by �
by final state.
Cor: If
�
is a CFL and
|is an RL, then
� � |
is a CFL.
Prf:
b z � : by �
. Thus,b z �
is a CFL because of closure of the CFLs underintersection with RLs and because of closure of the RLs under complement.
Context-free languages III – p.19/20
Closure under inverse homomorphismThm: The CFLs are closed under reverse homomorphism.
Prf: We add a length v B T s 3� d B e 3 3 B C D w
buffer to a PDA for the CFL:
Accept/reject
Bufferh(a)a h
statePDA
Stack
�
Buffer is part of finite state set,�
whenever it is empty, it can befilled with
� dnext input
e
, therebykeeping the stack and theoriginal PDA’s state,�
when buffer is not empty, theoriginal PDA’s non- � moves canbe performed on the frontmostelement of the buffer, therebyremoving that element from thebuffer,�
the original PDA’s � moves canalways be performed, leaving thebuffer intact.
Context-free languages III – p.20/20