non-regular languages and the pumping lemma · 2015-10-02 · use the pumping lemma for regular...
Non-Regular Languages and The Pumping Lemma
Foundations of Computer Science Theory
Regular Languages
• For the regular languages, we have seen that there is a “circle of conversions” from one representation to another:
[Figure: circle of conversions among RE, DFA, NFA, and ε-NFA]
Properties of Language Classes
• A language class is a set of languages – Example: the regular languages
• Language classes have two important kinds of properties: 1. Closure properties 2. Decision properties
Closure Properties
• Recall that a closure property of a language class says that given any languages in the class, an operation (e.g., union) produces another language in the same class
• The regular languages are closed under union, concatenation, the Kleene star, intersection, difference, complement, and reversal
Decision Properties
• A decision property for a class of languages is an algorithm that takes a formal description of a language (e.g., an NFA or a regular expression) and determines whether or not some property holds
• For example, given a specific language L, we could ask, “Is the language L empty?” or “Is the language L finite?”
Why Decision Properties?
• If we think of a language as representing a certain protocol for processing data (i.e., a way of solving computational problems), then decision properties can tell us a lot about the behavior of the protocol
  – For example, “Is the language finite?” could correspond to “Is there a way to solve the problem in a finite number of steps?”
  – “Is the language empty?” could correspond to “Is there any way to solve the problem?”
The Emptiness Problem
• Our first decision property for regular languages is the question “Given a regular language, does the language contain any string at all?”
• Algorithm:
  – Create an NFA for the language
  – Compute the set of states reachable from the start state
  – If at least one final state is reachable, then the answer is “yes”, otherwise the answer is “no”
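The reachability test above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the dictionary encoding of the transition function and the example NFA are assumptions made for the sketch.

```python
from collections import deque

def is_empty(delta, start, finals):
    """Return True if no final state is reachable from the start state.

    delta maps (state, symbol) to a set of next states (an assumed encoding).
    The symbol is ignored, so ε-moves can be stored under the symbol "".
    """
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in finals:
            return False  # a final state is reachable: the language is non-empty
        for (source, _symbol), targets in delta.items():
            if source == state:
                for target in targets - seen:
                    seen.add(target)
                    queue.append(target)
    return True

# Hypothetical 3-state NFA: A -0-> B, B -0-> B, B -0-> C
delta = {("A", "0"): {"B"}, ("B", "0"): {"B", "C"}}
print(is_empty(delta, "A", {"C"}))  # False: C is reachable from A
print(is_empty(delta, "A", {"D"}))  # True: D is never reached
```

Note that the input string never matters here: emptiness only depends on the graph structure of the automaton.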
The Membership Problem
• The membership problem asks, “Is string w in regular language L?”
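• Algorithm:
  – Create an NFA (or DFA) for the language
  – Simulate the action of the NFA on the sequence of input symbols forming w
  – If the simulation can end in a final state, w is in L; otherwise it is not
[Figure: DFA with start state A and states A, B, C, with transitions labeled 0, 1, and 0,1]
Here’s a DFA for all strings without consecutive 1’s. Test membership on the input string 01011: not accepted, because the string contains 11.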
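The simulation step can be sketched as follows. The transition table is an assumed reconstruction of the DFA in the figure (A is the start state, B means "the last symbol was 1", C is a dead state); the exact layout on the slide may differ.

```python
def dfa_accepts(delta, start, finals, w):
    """Run a DFA on w and report whether it ends in a final state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

# Assumed reconstruction of the "no consecutive 1's" DFA
NO_11 = {
    ("A", "0"): "A", ("A", "1"): "B",
    ("B", "0"): "A", ("B", "1"): "C",
    ("C", "0"): "C", ("C", "1"): "C",  # C is a trap state
}
FINALS = {"A", "B"}

print(dfa_accepts(NO_11, "A", FINALS, "01011"))  # False: path is A, A, B, A, B, C
print(dfa_accepts(NO_11, "A", FINALS, "01010"))  # True
```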
• The finiteness problem asks, “Is a given regular language finite?”
• Algorithm:
  – Create an NFA for the language
  – If the NFA has n states, and the NFA accepts only strings of length strictly less than n, then the language is finite
  – Otherwise, the NFA accepts some string of length n or more, and the language is infinite
The Finiteness Problem
[Figure: NFA with start state A and states A, B, C, with two transitions labeled 0]
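One way to make the finiteness test concrete is sketched below. It relies on a standard fact that the slides do not state: an n-state NFA accepts infinitely many strings exactly when it accepts some string whose length is between n and 2n − 1. The two example automata are assumptions (the first is a guess at the figure: A -0-> B -0-> C), and ε-moves are not handled.

```python
def accepts_some_string_of_length(delta, start, finals, alphabet, length):
    """True if the NFA (no ε-moves) accepts at least one string of this length."""
    current = {start}
    for _ in range(length):
        current = {t for s in current for a in alphabet
                   for t in delta.get((s, a), ())}
    return bool(current & finals)

def is_finite(states, delta, start, finals, alphabet):
    """Finite iff no accepted string has length in [n, 2n - 1], n = number of states."""
    n = len(states)
    return not any(
        accepts_some_string_of_length(delta, start, finals, alphabet, m)
        for m in range(n, 2 * n)
    )

STATES = {"A", "B", "C"}
ONLY_00 = {("A", "0"): {"B"}, ("B", "0"): {"C"}}           # accepts just "00"
TWO_OR_MORE = {("A", "0"): {"B"}, ("B", "0"): {"B", "C"}}  # loop on B

print(is_finite(STATES, ONLY_00, "A", {"C"}, "0"))      # True
print(is_finite(STATES, TWO_OR_MORE, "A", {"C"}, "0"))  # False
```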
• If a regular language is infinite, it means that repetition is allowed when generating strings
• To define an infinite language we could use either: – The Kleene star (such as in a regular expression), or – Loops on states (such as in an NFA)
• Recall that an NFA for a regular language (finite or infinite) always has a finite number of states
• Algorithm:
  – Construct an NFA for the language
  – Test the NFA on strings that would force the NFA to go through a loop if one exists
The Infiniteness Problem
• If an n-state NFA accepts a string of length n or longer, then there must be a state that appears at least twice on the path from the start state to a final state
  – This means that there must be a loop in the NFA, because there are at least n+1 states visited along this particular path
The Infiniteness Problem
Here’s an NFA for strings of consecutive 0’s that have at least two 0’s. It has 3 states. String 000, with length 3, is accepted. To accept this string, the NFA visits 3 + 1 = 4 states (A, B, B, C). State B appears twice (loop!). Notice that string 00 is also accepted, as is string 00000…
[Figure: NFA with start state A and states A, B, C, with three transitions labeled 0, including a loop on B]
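Let w = xyz be a string accepted by an NFA, where x takes the NFA from the start state to some state q, y ≠ ε (i.e., |y| ≥ 1) takes it around a loop from q back to q, and z takes it from q to a final state.
[Figure: state q entered by x, with a loop labeled y, and left by z]
Then x y^i z is in the language for all i ≥ 0.
This statement implies that if we can find such a w (where y is not ε), then there are an infinite number of strings in L (i.e., L is infinite).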
The Infiniteness Problem
Theorem: Let M = (Q, Σ, δ, s, F) be any NFA. If M accepts any string of length |Q| or greater, then the regular language recognized by M is infinite.
Proof: M starts in the start state, and each time M reads an input character it moves to a next state. So, in processing a string of length n along an accepting path, M visits a total of n + 1 states (counting repeats). If n + 1 > |Q|, i.e., n ≥ |Q|, then, by the pigeonhole principle, some state must get more than one visit. The part of the path between two visits to that state is a loop on an accepting path. The loop can be traversed any number of times, and each extra traversal yields a longer accepted string, so M accepts infinitely many strings.
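The pigeonhole step in the proof can be illustrated with a short sketch. The function name and the list-of-states encoding of an accepting path are assumptions for this example; the path A, B, B, C for the string 000 is the one from the earlier slide.

```python
def split_at_first_loop(path, w):
    """Split w into (x, y, z) at the first repeated state on an accepting path.

    path[i] is the state reached after reading w[:i], so len(path) == len(w) + 1.
    Returns None if no state repeats (the string is too short to force a loop).
    """
    first_seen = {}
    for position, state in enumerate(path):
        if state in first_seen:
            start = first_seen[state]
            return w[:start], w[start:position], w[position:]
        first_seen[state] = position
    return None

x, y, z = split_at_first_loop(["A", "B", "B", "C"], "000")
print((x, y, z))                           # ('0', '0', '0')
print([x + y * i + z for i in range(4)])   # ['00', '000', '0000', '00000']
```

Every string in the second list follows the same path with the loop on B taken i times, so all of them are accepted.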
The Infiniteness Problem
Theorem: There is a countably infinite number of regular languages.
Proof: The number of regular languages is bounded above by the number of possible finite automata. Given an alphabet, we could enumerate all possible NFAs (start with those with one state, then those with two states, then three states, etc.). Thus, the number of NFAs is countably infinite. Every regular language is recognized by at least one NFA, so there are no more regular languages than there are NFAs. There are also infinitely many regular languages (for example, {a}, {aa}, {aaa}, … are all regular and all different), so the regular languages must also be countably infinite.
How Many Languages are Regular?
Theorem: There is an uncountably infinite number of non-regular languages.
Proof: A language is a set of strings over a non-empty finite alphabet Σ. Thus, the set of all languages is the set of all sets of strings (the power set of Σ*). We have already proven (using the diagonalization technique) that the power set of any countably infinite set is uncountably infinite. Only countably many of these languages are regular, and removing a countable subset from an uncountable set leaves an uncountable set. Therefore, there is an uncountably infinite number of languages that are not regular.
So there must be many more non-regular languages than there are regular ones.
How Many Languages are Not Regular?
How Many Languages Are There?
• Every finite language is regular
• Some infinite languages are regular:
  − a*b*
  − {w ∈ {a, b}* : every a is immediately followed by b}
• Some infinite languages are not regular:
  − {a^n b^n : n ≥ 0}
  − {w ∈ {a, b}* : every a has a matching b somewhere, and the number of b’s is at least as great as the number of a’s}
Is a Language Regular?
Showing that a Language is Not Regular
• Recall that every regular language can be recognized by some finite automaton
• Recall also that finite automata can only use a finite amount of memory to record essential properties of the language
Question: What is the longest string that a 5-state NFA can accept without going through any loops?
Showing that a Language is Not Regular
Question: If an NFA with n states accepts any string of length ≥ n, how many strings does it accept?
For example, let L = bab*ab
w = babab is a string in this language; split it as x = ba, y = b, z = ab.
Therefore, every string in xy*z must also be in L.
So L includes: baab, babab, babbab, babbbbbbbbbbab
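The example can be checked mechanically with Python's `re` module. This is only a sanity check on the split x = ba, y = b, z = ab for the regular expression bab*ab, not a general pumping procedure.

```python
import re

pattern = re.compile(r"bab*ab")
x, y, z = "ba", "b", "ab"

# Pump y zero, one, two, ... times and confirm each result is still in L
pumped = [x + y * i + z for i in range(6)]
print(pumped[:3])                                # ['baab', 'babab', 'babbab']
print(all(pattern.fullmatch(s) for s in pumped)) # True
```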
Showing that a Language is Not Regular
• To show that a language is not regular, we use the pumping lemma for regular languages
  – In an NFA, “long” strings require that some states must be visited more than once (i.e., the NFA must contain at least one loop)
• Long strings can be “pumped” and still be accepted
  – If a language contains at least one long string that cannot be pumped, then the language is not regular
Showing that a Language is Not Regular
For every regular language L, there is an integer k, such that for every string w in L of length ≥ k, we can write w = xyz such that:
1. |xy| ≤ k
2. |y| > 0
3. For all i ≥ 0, x y^i z is in L
[Slide annotations: “Number of states in NFA” (for k) and “First cycle in NFA” (for the split)]
The Pumping Lemma
• Recall our earlier claim that {a^n b^n : n > 0} is not a regular language
• Proof by contradiction:
  – Suppose it is regular; then there must be an associated pumping length k
  – Pick a string in L of length at least k: let’s pick w = a^k b^k
  – The lemma gives a split w = xyz with |xy| ≤ k and y ≠ ε
  – Since |xy| ≤ k, both x and y must consist of only a’s
  – Pump y up by choosing i = 2
• Then xyyz should be in L, but it is not because this string has more a’s than b’s
• For example, if w = aaabbb, y = a, and i = 2, then aaaabbb should also be in L, but it is not because it has more a’s than b’s
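The argument can be explored by brute force for one concrete value of k. The sketch below tries every split w = xyz with |xy| ≤ k and y ≠ ε and keeps the splits that survive pumping for i = 0..3. The function names and the bound on i are assumptions of the sketch; an empty result for a single k illustrates the proof but does not replace it.

```python
def in_anbn(s):
    """Membership test for {a^n b^n : n > 0}."""
    n = len(s) // 2
    return n > 0 and s == "a" * n + "b" * n

def surviving_splits(w, k, in_lang, max_i=3):
    """Return all splits (x, y, z) with |xy| <= k, y nonempty, that pump for i = 0..max_i."""
    survivors = []
    for xy_len in range(1, min(k, len(w)) + 1):
        for x_len in range(xy_len):
            x, y, z = w[:x_len], w[x_len:xy_len], w[xy_len:]
            if all(in_lang(x + y * i + z) for i in range(max_i + 1)):
                survivors.append((x, y, z))
    return survivors

k = 4
w = "a" * k + "b" * k
print(surviving_splits(w, k, in_anbn))  # []
```

Every candidate y lies inside the block of a’s, so pumping changes the number of a’s but not the number of b’s, and no split survives.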
The Pumping Lemma
• What if we made a different choice for w?
  – We can still prove that L = {a^n b^n : n > 0} is not regular
  – If L were regular, then there would exist some k such that any string w in L, where |w| ≥ k, must satisfy the conditions of the lemma
  – Let w = a^(k/2) b^(k/2) (assume k is even; otherwise round k/2 up)
  – Since |w| ≥ k and w is in L, w must satisfy the conditions of the pumping lemma:
    • There must exist an x, y, and z such that w = xyz, |xy| ≤ k, y ≠ ε, and ∀i ≥ 0 (x y^i z is in L)
    • We now show that no such y can exist
The Pumping Lemma
Divide w into two regions:
aaaaa…..aaaaaa | bbbbb…..bbbbbb
       1       |       2
There are 3 places where y could occur: in the a region, in the b region, or across the boundary between the a’s and the b’s.
The Pumping Lemma
• Case 1: y = a^p for some p ≥ 1. Pumping in one extra copy of y gives a^(k/2+p) b^(k/2). This string is not in L, since it has more a’s than b’s.
• Case 2: y = b^p for some p ≥ 1. Pumping in one extra copy of y gives a^(k/2) b^(k/2+p). This string is not in L, since it has more b’s than a’s.
• Case 3: y = a^p b^q for some p, q ≥ 1. Pumping in one extra copy of y gives a^(k/2) b^q a^p b^(k/2), which has a b in front of an a, and so is not in L.
Therefore, since there exists at least one long string in L for which there is no way to split w into xyz such that the required properties are preserved, L is not regular.
The Pumping Lemma
• If L is regular, then every “long” string in L is pumpable
• To show that L is not regular, find one long string in L that isn’t pumpable
• Thus, to use the pumping lemma to show that a language L is not regular, we must:
  1. Choose a string w in L, where |w| ≥ k. Since we do not know what k is, we must state w in terms of k.
  2. Separate the possibilities for y into a set of equivalence classes.
  3. For each such class of possible y values where |xy| ≤ k and y ≠ ε, choose a value for i such that x y^i z is not in L.
The Pumping Lemma - Summary
• Prove that L = {a b^n c^n : n ≥ 0} is not regular
• If L were regular, then there would exist some k such that any string w in L, where |w| ≥ k, must satisfy the conditions of the pumping lemma
• Let w = a b^k c^k, where y occurs in the first k characters of w
• Since |w| = 2k + 1 and w is in L, w must satisfy the conditions of the pumping lemma
  – There must exist an x, y, and z such that w = xyz, |xy| ≤ k, y ≠ ε, and ∀i ≥ 0 (x y^i z is in L)
  – We now show that no such x, y, and z exist
Equal b’s and c’s is Not Regular
• Recall that w = a b^k c^k, where y occurs in the first k characters of w
• If y includes the initial a, pump in one extra copy
  – The resulting string is not in L because it contains more than one a
• If y does not include the initial a, then it must be b^p, where 0 < p < k, so pump in one extra copy
  – The resulting string is not in L because it contains more b’s than c’s
• Therefore L is not regular
Equal b’s and c’s is Not Regular
Balanced Parentheses is Not Regular
• Prove that L = {w ∈ {(,)}* : the parentheses are balanced} is not regular
• If L were regular, then there would exist some k such that any string w in L, where |w| ≥ k, must satisfy the conditions of the pumping lemma
• Let w = (^k )^k, i.e., k opening parentheses followed by k closing parentheses
• Since |w| = 2k and w is in L, w must satisfy the conditions of the pumping lemma
  – There must exist an x, y, and z such that w = xyz, |xy| ≤ k, y ≠ ε, and ∀i ≥ 0 (x y^i z is in L)
  – We now show that no such x, y, and z exist
Balanced Parentheses is Not Regular
• Since |xy| ≤ k, y must occur within the first k characters and so y = (^p for some p ≥ 1
• Since y ≠ ε, p must be greater than 0
• Let i = 2 (in other words, pump in one extra copy of y)
• The resulting string is (^(k+p) )^k
• This string must also be in L, but it is not since it has more (’s than )’s
Even Palindromes is Not Regular
• Prove that L = {w w^R : w ∈ {a, b}*} is not regular
• If L were regular, then there would exist some k such that any string w in L, where |w| ≥ k, must satisfy the conditions of the pumping lemma
• Let w = a^k b^k b^k a^k
• Since |w| = 4k and w is in L, w must satisfy the conditions of the pumping lemma
  – There must exist an x, y, and z such that w = xyz, |xy| ≤ k, y ≠ ε, and ∀i ≥ 0 (x y^i z is in L)
  – We now show that no such x, y, and z exist
Even Palindromes is Not Regular
• Since |xy| ≤ k, y must occur within the first k characters and so y = a^p for some p
• Since y ≠ ε, p must be greater than 0
• Let i = 2 (in other words, pump in one extra copy of y)
• The resulting string is a^(k+p) b^k b^k a^k
• If p is odd, then this string is not in L because all strings in L have even length
• If p is even, then it is at least 2, so the first half of the string has more a’s than the second half so it is not in L
• Therefore, L is not regular
More a’s Than b’s is Not Regular
• Prove that L = {a^n b^m : n > m} is not regular
• If L were regular, then there would exist some k such that any string w in L, where |w| ≥ k, must satisfy the conditions of the pumping lemma
• Let w = a^(k+1) b^k
• Since |w| = 2k + 1 and w is in L, w must satisfy the conditions of the pumping lemma
  – There must exist an x, y, and z such that w = xyz, |xy| ≤ k, y ≠ ε, and ∀i ≥ 0 (x y^i z is in L)
  – We now show that no such x, y, and z exist
More a’s Than b’s is Not Regular
• Since |xy| ≤ k, y must occur within the first k characters and so y = a^p for some p
• Since y ≠ ε, p must be greater than 0
• Notice that with our choice of w, there are already more a’s than b’s, as required by the definition of L
• If we pump in, there will be even more a’s and the resulting string will still be in L
• But if we set i = 0 (i.e., pump out) the resulting string will be a^(k+1-p) b^k
• Since p > 0, k + 1 - p ≤ k, so the resulting string no longer has more a’s than b’s and so it is not in L
• Since there exists at least one long string in L that fails to satisfy the conditions of the pumping lemma, L is not regular
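The contrast between pumping in and pumping out can be seen on a small instance. In this sketch k = 5 is an arbitrary choice and x is taken to be empty, which is harmless here because the pumped string depends only on p = |y|.

```python
def in_more_as_than_bs(s):
    """Membership test for {a^n b^m : n > m}."""
    n = len(s) - len(s.lstrip("a"))  # length of the leading block of a's
    rest = s[n:]
    return set(rest) <= {"b"} and n > len(rest)

k = 5
w = "a" * (k + 1) + "b" * k

for p in range(1, k + 1):
    y, z = "a" * p, w[p:]                  # x is empty in this sketch
    pumped_in = y * 2 + z                  # i = 2: still more a's than b's
    pumped_out = z                         # i = 0: a^(k+1-p) b^k
    print(p, in_more_as_than_bs(pumped_in), in_more_as_than_bs(pumped_out))
    # every line prints: p True False
```

Pumping in never leaves L for this w, which is why the proof has to use i = 0.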
Prime Number of a’s is Not Regular
• Prove that L = {a^n : n is prime} is not regular
• If L were regular, then there would exist some k such that any string w in L, where |w| ≥ k, must satisfy the conditions of the pumping lemma
• Let w = a^j, where j is the smallest prime number greater than k + 1 (e.g., w = aaaaa, k = 3, j = 5)
• Since |w| > k and w is in L, w must satisfy the conditions of the pumping lemma
  – There must exist an x, y, and z such that w = xyz, |xy| ≤ k, y ≠ ε, and ∀i ≥ 0 (x y^i z is in L)
  – We now show that no such x, y, and z exist
Prime Number of a’s is Not Regular
• Since |xy| ≤ k, y must occur within the first k characters and so y = a^p for some p ≥ 1
• Recall that the pumping lemma requires that ∀i ≥ 0 (x y^i z is in L), therefore ∀i ≥ 0 (a^(|x| + i⋅|y| + |z|) must also be in L)
• This means that |x| + i⋅|y| + |z| must also be prime
• Let i = |x| + |z|
• Then |x| + i⋅|y| + |z| = |x| + (|x| + |z|)⋅|y| + |z| = (|x| + |z|)⋅(1 + |y|)
• Both factors are at least 2: 1 + |y| ≥ 2 because y ≠ ε, and |x| + |z| = j - |y| ≥ j - k ≥ 2 because j > k + 1
This is composite (i.e., not prime). So for at least one value of i, the resulting string is not in L. Therefore, L is not regular.
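The factorization can be checked numerically for small pumping lengths. The helper names below are assumptions for this sketch, and the primality test is plain trial division, which is enough for the sizes involved.

```python
def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def smallest_prime_above(m):
    n = m + 1
    while not is_prime(n):
        n += 1
    return n

def pumping_breaks_primality(k):
    """For w = a^j, check every split with |xy| <= k using i = |x| + |z|."""
    j = smallest_prime_above(k + 1)
    for xy_len in range(1, k + 1):
        for x_len in range(xy_len):
            y_len, z_len = xy_len - x_len, j - xy_len
            i = x_len + z_len
            pumped_len = x_len + i * y_len + z_len
            if pumped_len != (x_len + z_len) * (1 + y_len) or is_prime(pumped_len):
                return False
    return True

print(smallest_prime_above(3 + 1))                           # 5, as in the k = 3 example
print(all(pumping_breaks_primality(k) for k in range(1, 30)))  # True
```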
Choosing Values
• When we use the pumping lemma to prove that a language is not regular, we have two choices to make: w and i
• Heuristic for choosing w:
  – Choose a w that is in the part of L that makes it non-regular
  – Choose a w that is only barely in L
  – Choose a w with as homogeneous as possible an initial region of length at least k
• Heuristic for choosing i:
  – Try letting i be either 0 or 2
  – If that doesn’t work, try analyzing L to see if there is some other value that will work
Sometimes Pumping Does Not Help
• Consider: L = {a^i b^j c^k : i, j, k ≥ 0, and if i = 1 then j = k}
• We could try to use the pumping lemma, after re-writing this language as: L = {a^i b^j c^k : i, j, k ≥ 0, and either (i ≠ 1) or (j = k)}
• What are the possible values for i (here, the number of a’s)?
  – If i = 0 then: if j ≠ 0, let y be b; otherwise let y be c
    • Pump in or out, and i will still be 0 and thus not equal to 1, so the resulting string is in L
  – If i = 1 then: let y be a
    • Pump in or out, then i will no longer be equal to 1, so the resulting string is in L
  – If i = 2 then: let y be aa
    • Pump in or out, and i cannot equal 1, so the resulting string is in L
  – If i > 2 then: let y be a
    • Pump out once or in any number of times, and i cannot equal 1, so the resulting string is in L
• In every case a valid y exists, so every long string in L can be pumped and the pumping lemma alone does not show that L is not regular
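The case analysis above can be checked exhaustively on small strings. In this sketch the variable names n_a, n_b, n_c stand for the exponents i, j, k of the slide, `times` is the pump count, x is always empty, and the bounds are arbitrary; all of these are choices made for the illustration.

```python
import re
from itertools import product

def in_L(s):
    """Membership in {a^i b^j c^k : if i = 1 then j = k}."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    if m is None:
        return False
    n_a, n_b, n_c = (len(group) for group in m.groups())
    return n_a != 1 or n_b == n_c

def choose_y(w):
    """Pick y as on the slide: 'aa' when there are exactly two a's, else the first symbol."""
    n_a = len(w) - len(w.lstrip("a"))
    return "aa" if n_a == 2 else w[0]

def always_pumpable(max_count=4, max_times=4):
    for n_a, n_b, n_c in product(range(max_count + 1), repeat=3):
        w = "a" * n_a + "b" * n_b + "c" * n_c
        if len(w) < 2 or not in_L(w):
            continue
        y = choose_y(w)
        z = w[len(y):]
        if not all(in_L(y * times + z) for times in range(max_times + 1)):
            return False
    return True

print(always_pumpable())  # True
```

Since |y| ≤ 2 in every case, these splits already satisfy |xy| ≤ k for any pumping length k ≥ 2.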
Sometimes Pumping Does Not Help
Use of Closure Properties
• Sometimes the only (or easiest) way to prove that a language is not regular is to use the closure properties for regular languages, either alone or in conjunction with the pumping lemma
• The fact that regular languages are closed under intersection is particularly useful
• Recall that we proved that L = {a b^n c^n : n ≥ 0} is not regular
• Let Lnew = {a^i b^j c^k : i, j, k ≥ 0, and if i = 1 then j = k}
  – Notice that L = Lnew ∩ ab*c*
• If Lnew were regular, then L would also be regular because regular languages are closed under intersection
• But we have already proven that L is not regular, therefore Lnew is not regular
Closure Under Intersection
• Recall that we already proved L = {a^n b^n : n ≥ 0} is not regular
• Let Lnew = the set of strings over {a, b} with an equal number of a’s and b’s
  – Notice that L = Lnew ∩ a*b*
• If Lnew were regular, then L would also be regular because regular languages are closed under intersection
• But we have already proven that L is not regular, therefore Lnew is not regular
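The identity L = Lnew ∩ a*b* used above can be sanity-checked on all short strings. This sketch only confirms the set equality up to a length bound (8 here, an arbitrary choice); the closure argument itself is what carries the proof.

```python
import re
from itertools import product

def equal_counts(s):
    """Lnew: same number of a's and b's."""
    return s.count("a") == s.count("b")

def in_a_star_b_star(s):
    return re.fullmatch(r"a*b*", s) is not None

def in_anbn(s):
    """L = {a^n b^n : n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

mismatches = [
    "".join(chars)
    for length in range(9)
    for chars in product("ab", repeat=length)
    if (equal_counts("".join(chars)) and in_a_star_b_star("".join(chars)))
    != in_anbn("".join(chars))
]
print(mismatches)  # []
```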
Closure Under Intersection
• Let L = {a^i b^j : i, j ≥ 0, and i ≠ j}
• It seems unlikely that L is regular since any machine that would accept it would have to be able to count the a’s and the b’s
• We could try to use the pumping lemma to prove that L is not regular
  – For example, let w = a^(k+1) b^k
  – But then y could be, for example, aa, which can be pumped up or pumped down and the string would still be in L
Closure Under Complement
• Suppose we let w = a^k b^(k+k!)
• Then y = a^p for some non-zero p
• Let i = (k!/p) + 1 (in other words, pump in (k!/p) additional copies of y)
  – Note that k!/p must be an integer because p ≤ k
• The number of a’s in the resulting string is k + (k!/p)p = k + k!
• Therefore, the resulting string is a^(k+k!) b^(k+k!), which has an equal number of a’s and b’s, so it is not in L
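The arithmetic behind the choice i = (k!/p) + 1 is easy to confirm for a small k. The value k = 5 is an arbitrary choice for this sketch; the counts are computed directly rather than by building the (very long) strings.

```python
from math import factorial

k = 5
num_b = k + factorial(k)            # w = a^k b^(k + k!)

for p in range(1, k + 1):           # y = a^p with 1 <= p <= k
    assert factorial(k) % p == 0    # k!/p is an integer because p <= k
    i = factorial(k) // p + 1
    num_a = (k - p) + i * p         # a's outside y, plus i copies of y
    print(p, i, num_a == num_b)     # the last column is always True
```

For every possible p the pumped string has exactly as many a’s as b’s, so it falls outside L = {a^i b^j : i ≠ j}.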
Closure Under Complement
• Closure under complement provides an easier way to show that L = {a^i b^j : i, j ≥ 0, and i ≠ j} is not regular
• If L were regular, then ¬L would also be regular
• ¬L = {a^n b^n : n ≥ 0} ∪ {strings of a’s and b’s that do not have all a’s in front of all b’s}
• If ¬L is regular, then ¬L ∩ a*b* = {a^n b^n : n ≥ 0} must also be regular, but we have shown that {a^n b^n : n ≥ 0} is not, so neither ¬L nor L is regular
Closure Under Reverse
• Let L = {a^i b^j c^k : i, j, k ≥ 0, and if i = 1 then j = k}
• Then L^R = {c^k b^j a^i : i, j, k ≥ 0, and if i = 1 then j = k}
• If L were regular, then L^R would be also
• Let w = c^k b^k a (with k now the pumping length); y must occur in the first k characters of w, so y = c^p, where 0 < p ≤ k
• Pump y out once (i.e., take x y^0 z = xz): the resulting string contains a single a, so the number of b’s and c’s must be equal for the string to be in L^R
• But there are fewer c’s than b’s, so the resulting string is not in L^R, therefore L^R is not regular and neither is L