non-regular languages and the pumping lemma · 2015-10-02 · use the pumping lemma for regular...

Non-Regular Languages and The Pumping Lemma

Foundations of Computer Science Theory

Regular Languages

• For the regular languages, we have seen that there is a “circle of conversions” from one representation to another:

[Diagram: RE, DFA, NFA, and ε-NFA arranged in a circle of conversions]

Properties of Language Classes

• A language class is a set of languages – Example: the regular languages

• Language classes have two important kinds of properties: 1. Closure properties 2. Decision properties

Closure Properties

• Recall that a closure property of a language class says that given any languages in the class, an operation (e.g., union) produces another language in the same class

• The regular languages are closed under union, concatenation, the Kleene star, intersection, difference, complement, and reversal

Decision Properties

• A decision property for a class of languages is an algorithm that takes a formal description of a language (e.g., an NFA or a regular expression) and determines whether or not some property holds

• For example, given a specific language L, we could ask, “Is the language L empty?” or “Is the language L finite?”

Why Decision Properties?

• If we think of a language as representing a certain protocol for processing data (i.e., a way of solving computational problems), then decision properties can tell us a lot about the behavior of the protocol
– For example, “Is the language finite?” could correspond to “Is there a way to solve the problem in a finite number of steps?”
– “Is the language empty?” could correspond to “Is there any way to solve the problem?”

The Emptiness Problem

• Our first decision property for regular languages is the question “Given a regular language, does the language contain any string at all?”

• Algorithm:
– Create an NFA for the language
– Compute the set of states reachable from the start state
– If at least one final state is reachable, then the answer is “yes”, otherwise the answer is “no”
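The reachability step of the emptiness algorithm can be sketched as a breadth-first search. This is a minimal sketch, not a definitive implementation: the function name `is_nonempty`, the dictionary encoding of the transition function, and the example NFA are all assumptions made for illustration.

```python
from collections import deque

def is_nonempty(start, finals, delta):
    """Return True if some final state is reachable from the start state.

    delta maps (state, symbol) -> set of next states (assumed encoding).
    """
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in finals:
            return True
        for (src, _symbol), targets in delta.items():
            if src == state:
                for nxt in targets:
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return False

# Hypothetical NFA: q0 -a-> q1 -b-> q2
delta = {("q0", "a"): {"q1"}, ("q1", "b"): {"q2"}}
print(is_nonempty("q0", {"q2"}, delta))  # q2 is reachable, so the language is not empty
print(is_nonempty("q0", {"q3"}, delta))  # q3 is unreachable, so the language is empty
```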

The Membership Problem

• The membership problem asks, “Is string w in regular language L?”

• Algorithm:
– Create an NFA (or DFA) for the language
– Simulate the action of the NFA on the sequence of input symbols forming w

[Figure: DFA with states A (start), B, and C, and transitions labeled 0 and 1]

Here’s a DFA for all strings without consecutive 1’s. Test membership on the input string 01011: it is not accepted.
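The simulation step can be sketched in a few lines. The transition table below is an assumption: it is one natural three-state DFA for "no consecutive 1's" (A is the start state, B means "the last symbol was 1", C is a dead state), reconstructed rather than copied from the figure.

```python
# Assumed transition table for a DFA accepting strings without consecutive 1's.
DELTA = {
    ("A", "0"): "A", ("A", "1"): "B",
    ("B", "0"): "A", ("B", "1"): "C",
    ("C", "0"): "C", ("C", "1"): "C",
}
ACCEPTING = {"A", "B"}

def accepts(w, start="A"):
    """Follow one transition per input symbol, then check the final state."""
    state = start
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state in ACCEPTING

print(accepts("01011"))  # False
print(accepts("01010"))  # True
```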

The Finiteness Problem

• The finiteness problem asks, “Is a given regular language finite?”

• Algorithm:
– Create an NFA for the language
– If the NFA has n states, and the NFA accepts only strings of length strictly less than n, then the language is finite
– Otherwise (the NFA accepts some string of length n or longer), the language is infinite

[Figure: NFA with states A (start), B, and C and two transitions labeled 0]
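One way to make the finiteness test effective is sketched below. It relies on a standard refinement that the slides do not state: for an n-state NFA without ε-moves, it is enough to look for accepted strings of length n through 2n - 1. The function name, the encoding of the NFA, and both example automata are assumptions.

```python
def is_finite(states, start, finals, delta):
    """delta maps (state, symbol) -> set of next states; no ε-moves assumed."""
    n = len(states)
    current = {start}                      # states reachable by strings of length 0
    for length in range(1, 2 * n):
        # States reachable by some string of exactly this length.
        current = {t
                   for (s, _symbol), targets in delta.items() if s in current
                   for t in targets}
        if length >= n and current & finals:
            return False                   # accepts a string of length >= n
    return True

STATES = {"A", "B", "C"}
chain = {("A", "0"): {"B"}, ("B", "0"): {"C"}}         # accepts only 00
looped = {("A", "0"): {"B"}, ("B", "0"): {"B", "C"}}   # accepts 00, 000, 0000, ...
print(is_finite(STATES, "A", {"C"}, chain))   # True
print(is_finite(STATES, "A", {"C"}, looped))  # False
```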

The Infiniteness Problem

• If a regular language is infinite, it means that repetition is allowed when generating strings

• To define an infinite language we could use either:
– The Kleene star (such as in a regular expression), or
– Loops on states (such as in an NFA)

• Recall that an NFA for a regular language (finite or infinite) always has a finite number of states

• Algorithm:
– Construct an NFA for the language
– Test the NFA on strings that would force the NFA to go through a loop if one exists

• If an n-state NFA accepts a string of length n or longer, then there must be a state that appears at least twice on the path from the start state to a final state
– This means that there must be a loop in the NFA, because there are at least n+1 states visited along this particular path


Here’s an NFA for strings of consecutive 0’s that have at least two 0’s. It has 3 states. String 000, with length 3, is accepted. To accept this string, the NFA visits 3 + 1 = 4 states (A, B, B, C). State B appears twice (loop!). Notice that string 00 is also accepted, as is string 00000…

[Figure: NFA with states A (start), B, and C and three transitions labeled 0]
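The path A, B, B, C can be reproduced with a small search. The transitions are an assumption consistent with the description above (A to B, a loop on B, and B to C, all on 0, with C final); the helper name `accepting_path` is hypothetical.

```python
# Assumed transitions: A -0-> B, B -0-> B (loop), B -0-> C; C is the only final state.
DELTA = {("A", "0"): {"B"}, ("B", "0"): {"B", "C"}}
START, FINALS = "A", {"C"}

def accepting_path(w, state=START):
    """Return one list of visited states along an accepting run on w, or None."""
    if not w:
        return [state] if state in FINALS else None
    for nxt in sorted(DELTA.get((state, w[0]), ())):
        rest = accepting_path(w[1:], nxt)
        if rest is not None:
            return [state] + rest
    return None

path = accepting_path("000")
repeated = {s for s in path if path.count(s) > 1}
print(path)      # ['A', 'B', 'B', 'C']
print(repeated)  # {'B'}
```

A string of length 3 forces 4 state visits in a 3-state machine, so the repeated state is exactly what the pigeonhole principle predicts.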

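Let w = xyz be a string accepted by an NFA, with substrings x, y, and z, and y ≠ ε, i.e., |y| ≥ 1. Suppose the NFA is in the same state q before and after reading y, so that y traces a loop on q.

[Figure: x leads into state q, y loops on q, and z leads out of q]

Then xy^i z is in the language for all i ≥ 0.

This statement implies that if we can find such a w (where y is not ε) then there are an infinite number of strings in L (i.e., L is infinite).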


Theorem: Let M = (Q, Σ, δ, s, F) be any NFA. If M accepts any string of length |Q| or greater, then the regular language recognized by M is infinite.

Proof: M starts in the start state and each time M reads an input character, it moves to a state. So, in processing a string of length n, M visits a total of n + 1 states (counting repeats). If n + 1 > |Q|, then, by the pigeonhole principle, some state must get more than one visit. So, if n ≥ |Q|, then M must visit at least one state more than once on its accepting path. This implies that the path contains a loop, and the loop can be traversed any number of times. Each number of traversals yields a different accepted string, so M accepts infinitely many strings.


How Many Languages are Regular?

Theorem: There is a countably infinite number of regular languages.

Proof: The upper bound on the number of regular languages is the number of possible finite automata. Given an alphabet, we could enumerate all possible NFAs (start with those with one state, then those with two states, then three states, etc.). Thus, the number of NFAs is countably infinite. Since every regular language is recognized by at least one NFA, there are no more regular languages than there are NFAs. There are also infinitely many regular languages (for example, {a}, {aa}, {aaa}, … are all regular and distinct), so the regular languages must be countably infinite.

How Many Languages are Not Regular?

Theorem: There is an uncountably infinite number of non-regular languages.

Proof: A language is a set of strings over a non-empty finite alphabet Σ. Thus, the set of all languages is the set of all sets of strings (the power set of Σ*). The set of all strings Σ* is countably infinite, and we have already proven (using the diagonalization technique) that the power set of any countably infinite set is uncountably infinite. Since only countably many of these languages are regular, the remaining languages, which are not regular, must be uncountably infinite in number.

So there must be many more non-regular languages than there are regular ones

How Many Languages Are There?

Is a Language Regular?

• Every finite language is regular

• Some infinite languages are regular:
− a*b*
− {w ∈ {a, b}* : every a is immediately followed by b}

• Some infinite languages are not regular:
− {a^n b^n : n ≥ 0}
− {w ∈ {a, b}* : every a has a matching b somewhere, and the number of b’s is at least as great as the number of a’s}

Showing that a Language is Not Regular

• Recall that every regular language can be recognized by some finite automaton

• Recall also that finite automata can only use a finite amount of memory to record essential properties of the language

Question: What is the longest string that a 5-state NFA can accept without going through any loops?


Question: If an NFA with n states accepts any string of length ≥ n, how many strings does it accept?

For example, let L = bab*ab

w = babab is a string in this language, with x = ba, y = b, and z = ab.

Therefore, every string in xy*z must also be in L.

So L includes: baab, babab, babbab, babbbbbbbbbbab
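The example can be checked mechanically with Python's `re` module. This is only an illustration of pumping the middle b; the variable names are chosen here and the range of i is arbitrary.

```python
import re

L = re.compile(r"bab*ab")
x, y, z = "ba", "b", "ab"

# Pump y from zero to ten times and check that each result is still in L.
pumped = [x + y * i + z for i in range(11)]
print(pumped[:3])                            # ['baab', 'babab', 'babbab']
print(all(L.fullmatch(s) for s in pumped))   # True
```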


• To show that a language is not regular, we use the pumping lemma for regular languages
– In an NFA, “long” strings require that some states must be visited more than once (i.e., the NFA must contain at least one loop)

• Long strings can be “pumped” and still be accepted
– If a language contains at least one long string that cannot be pumped, then the language is not regular


The Pumping Lemma

For every regular language L, there is an integer k, such that for every string w in L of length ≥ k, we can write w = xyz such that:

1. |xy| ≤ k
2. |y| > 0
3. For all i ≥ 0, xy^i z is in L

Here k corresponds to the number of states in an NFA for L, and y corresponds to the first cycle in the NFA on the path that accepts w.

• Recall our earlier claim that {a^n b^n : n > 0} is not a regular language

• Proof by contradiction:
– Suppose it is regular, then there must be an associated k as in the pumping lemma
– Pick a string in L of length at least k: let’s pick w = a^k b^k
– Write w = xyz with |xy| ≤ k and y ≠ ε
– Since |xy| ≤ k, both x and y must consist of only a’s
– Pump y up by choosing i = 2

• Then xyyz should be in L, but it is not because this string has more a’s than b’s

• For example, if k = 3, w = aaabbb, y = a, and i = 2, then aaaabbb should also be in L, but it is not because it has more a’s than b’s
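The argument for a^k b^k can be checked by brute force for a small k. The sketch below enumerates every split w = xyz with |xy| ≤ k and |y| ≥ 1 and keeps those that survive pumping for a few values of i. The value k = 5, the bound `max_i`, and the function names are assumptions made for the illustration; an empty result only confirms the proof for this one k.

```python
def in_L(s):
    """Membership test for {a^n b^n : n > 0}."""
    n = len(s) // 2
    return n > 0 and s == "a" * n + "b" * n

def pumpable_splits(w, k, max_i=3):
    """Splits w = xyz with |xy| <= k, |y| >= 1 that stay in L for i = 0..max_i."""
    good = []
    for xy_len in range(1, min(k, len(w)) + 1):
        for x_len in range(xy_len):
            x, y, z = w[:x_len], w[x_len:xy_len], w[xy_len:]
            if all(in_L(x + y * i + z) for i in range(max_i + 1)):
                good.append((x, y, z))
    return good

k = 5                        # hypothetical pumping length
w = "a" * k + "b" * k
print(pumpable_splits(w, k))  # []
```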

• What if we made a different choice for w?
– We can still prove that L = {a^n b^n : n > 0} is not regular
– If L were regular, then there would exist some k such that any string w in L, where |w| ≥ k, must satisfy the conditions of the lemma
– Let w = a^{k/2} b^{k/2} (assuming, for simplicity, that k is even)
– Since |w| ≥ k and w is in L, w must satisfy the conditions of the pumping lemma: there must exist an x, y, and z such that w = xyz, |xy| ≤ k, y ≠ ε, and ∀i ≥ 0 (xy^i z is in L)

• We now show that no such y can exist

Divide w into two regions, region 1 (the a’s) and region 2 (the b’s):

aaaaa…..aaaaaa | bbbbb…..bbbbbb

There are 3 places where y could occur: in the a region, in the b region, or across the boundary between the a’s and the b’s.

In each case, pump in one extra copy of y (i = 2):

• Case 1: y = a^p for some p ≥ 1. The resulting string is a^{k/2+p} b^{k/2}. This string is not in L, since it has more a’s than b’s.

• Case 2: y = b^p for some p ≥ 1. The resulting string is a^{k/2} b^{k/2+p}. This string is not in L, since it has more b’s than a’s.

• Case 3: y = a^p b^q for some p, q ≥ 1. The resulting string is a^{k/2} b^q a^p b^{k/2}, which has interleaved a’s and b’s, and so is not in L.

Therefore, since there exists at least one long string in L for which there is no way to split w into xyz such that the required properties are preserved, L is not regular.

The Pumping Lemma - Summary

• If L is regular, then every “long” string in L is pumpable

• To show that L is not regular, find one long string in L that isn’t pumpable

• Thus, to use the pumping lemma to show that a language L is not regular, we must:
1. Choose a string w in L, where |w| ≥ k. Since we do not know what k is, we must state w in terms of k.
2. Separate the possibilities for y into a set of equivalence classes.
3. For each such class of possible y values where |xy| ≤ k and y ≠ ε, choose a value for i such that xy^i z is not in L.

Equal b’s and c’s is Not Regular

• Prove that L = {ab^n c^n : n ≥ 0} is not regular

• If L were regular, then there would exist some k such that any string w in L, where |w| ≥ k, must satisfy the conditions of the pumping lemma

• Let w = ab^k c^k, where y occurs in the first k characters of w

• Since |w| = 2k + 1 and w is in L, w must satisfy the conditions of the pumping lemma
– There must exist an x, y, and z such that w = xyz, |xy| ≤ k, y ≠ ε, and ∀i ≥ 0 (xy^i z is in L)
– We now show that no such x, y, and z exist

• Recall that w = ab^k c^k, where y occurs in the first k characters of w

• If y includes the initial a, pump in one extra copy
– The resulting string is not in L because it contains more than one a

• If y does not include the initial a, then it must be b^p, where 0 < p < k, so pump in one extra copy
– The resulting string is not in L because it contains more b’s than c’s

• Therefore L is not regular

Balanced Parentheses is Not Regular

• Prove that L = {w ∈ {(,)}* : the parentheses are balanced} is not regular

• If L were regular, then there would exist some k such that any string w, where |w| ≥ k, must satisfy the conditions of the pumping lemma

• Let w = (^k )^k (k left parentheses followed by k right parentheses)

• Since |w| = 2k and w is in L, w must satisfy the conditions of the pumping lemma
– There must exist an x, y, and z such that w = xyz, |xy| ≤ k, y ≠ ε, and ∀i ≥ 0 (xy^i z is in L)

• Since |xy| ≤ k, y must occur within the first k characters and so y = (^p for some p

• Since y ≠ ε, p must be greater than 0

• Let i = 2 (in other words, pump in one extra copy of y)

• The resulting string is (^{k+p} )^k

• This string must also be in L, but it is not since it has more (’s than )’s

• Therefore, L is not regular

Even Palindromes is Not Regular

• Prove that L = {ww^R : w ∈ {a, b}*} is not regular

• If L were regular, then there would exist some k such that any string w in L, where |w| ≥ k, must satisfy the conditions of the pumping lemma

• Let w = a^k b^k b^k a^k

• Since |w| = 4k and w is in L, w must satisfy the conditions of the pumping lemma
– There must exist an x, y, and z such that w = xyz, |xy| ≤ k, y ≠ ε, and ∀i ≥ 0 (xy^i z is in L)

• Since |xy| ≤ k, y must occur within the first k characters and so y = a^p for some p

• Since y ≠ ε, p must be greater than 0

• Let i = 2 (in other words, pump in one extra copy of y)

• The resulting string is a^{k+p} b^k b^k a^k

• If p is odd, then this string is not in L because all strings in L have even length

• If p is even, then it is at least 2, so the first half of the string has more a’s than the second half so it is not in L

• Therefore, L is not regular
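Both parity cases of the palindrome proof can be checked for a small pumping length. This is a sketch under assumptions: k = 4 is an arbitrary illustrative value, and `in_L` is a direct membership test written for this example.

```python
def in_L(s):
    """Membership test for {w w^R : w in {a, b}*}, i.e. even-length palindromes."""
    return len(s) % 2 == 0 and s == s[::-1]

k = 4  # hypothetical pumping length
w = "a" * k + "b" * (2 * k) + "a" * k
print(in_L(w))  # True

# y = a^p lies inside the leading a's, so x y y z = a^(k+p) b^(2k) a^k.
for p in range(1, k + 1):
    pumped = "a" * (k + p) + "b" * (2 * k) + "a" * k
    print(p, "odd" if p % 2 else "even", in_L(pumped))
```

Every pumped string is rejected: odd p fails the length test, and even p fails the palindrome test.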

More a’s Than b’s is Not Regular

• Prove that L = {a^n b^m : n > m} is not regular

• If L were regular, then there would exist some k such that any string w in L, where |w| ≥ k, must satisfy the conditions of the pumping lemma

• Let w = a^{k+1} b^k

• Since |w| = 2k + 1 and w is in L, w must satisfy the conditions of the pumping lemma
– There must exist an x, y, and z such that w = xyz, |xy| ≤ k, y ≠ ε, and ∀i ≥ 0 (xy^i z is in L)

• Since |xy| ≤ k, y must occur within the first k characters and so y = a^p for some p

• Since y ≠ ε, p must be greater than 0

• Notice that with our choice of w, there are already more a’s than b’s, as required by the definition of L

• If we pump in, there will be even more a’s and the resulting string will still be in L

• But if we set i = 0 (i.e., pump out) the resulting string will be a^{k+1-p} b^k

• Since p > 0, k + 1 - p ≤ k, so the resulting string no longer has more a’s than b’s and so it is not in L

• Since there exists at least one long string in L that fails to satisfy the conditions of the pumping lemma, L is not regular
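The difference between pumping in and pumping out is easy to see on a concrete instance. The sketch below assumes k = 4 and takes x = ε for simplicity (the position of y inside the a's does not change the counts); `in_L` is a membership test written for this example.

```python
def in_L(s):
    """Membership test for {a^n b^m : n > m}."""
    n = len(s) - len(s.lstrip("a"))   # number of leading a's
    rest = s[n:]
    return set(rest) <= {"b"} and n > len(rest)

k = 4  # hypothetical pumping length
w = "a" * (k + 1) + "b" * k
for p in range(1, k + 1):
    y, z = "a" * p, w[p:]             # x is empty, y = a^p
    print(p, in_L(y * 2 + z), in_L(z))  # pump in (i = 2) vs. pump out (i = 0)
```

For every p the pumped-in string stays in L while the pumped-out string does not, which is why i = 0 is the useful choice here.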

Prime Number of a’s is Not Regular

• Prove that L = {a^n : n is prime} is not regular

• If L were regular, then there would exist some k such that any string w in L, where |w| ≥ k, must satisfy the conditions of the pumping lemma

• Let w = a^j, where j is the smallest prime number greater than k + 1 (e.g., if k = 3 then j = 5 and w = aaaaa)

• Since |w| > k and w is in L, w must satisfy the conditions of the pumping lemma
– There must exist an x, y, and z such that w = xyz, |xy| ≤ k, y ≠ ε, and ∀i ≥ 0 (xy^i z is in L)

• Since |xy| ≤ k, y must occur within the first k characters and so y = a^p for some p

• Recall that the pumping lemma requires that ∀i ≥ 0 (xy^i z is in L), therefore ∀i ≥ 0 (a^{|x| + i⋅|y| + |z|} must also be in L)

• This means that |x| + i⋅|y| + |z| must also be prime

• Let i = |x| + |z|

• Then |x| + i⋅|y| + |z| = |x| + (|x| + |z|)⋅|y| + |z| = (|x| + |z|)⋅(1 + |y|)

• Both factors are at least 2: |y| ≥ 1, and |x| + |z| = j - |y| ≥ j - k > 1 because |y| ≤ k and j > k + 1. So the product is composite (i.e., not prime)

• So for at least one value of i, the resulting string is not in L. Therefore, L is not regular.
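The choice i = |x| + |z| can be checked numerically. The sketch below tries every legal pair of lengths |x| and |y| for a range of small pumping lengths and confirms that the pumped length is never prime. The helper names and the range of k values are assumptions for illustration.

```python
def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def smallest_prime_above(m):
    n = m + 1
    while not is_prime(n):
        n += 1
    return n

def pumped_lengths(k):
    """Length of x y^i z with i = |x| + |z|, for every split with |xy| <= k, |y| >= 1."""
    j = smallest_prime_above(k + 1)          # |w| = j
    lengths = []
    for y_len in range(1, k + 1):
        for x_len in range(0, k - y_len + 1):
            z_len = j - x_len - y_len
            i = x_len + z_len
            lengths.append(x_len + i * y_len + z_len)
    return lengths

print(smallest_prime_above(3 + 1))  # 5, matching the k = 3 example
print(all(not is_prime(n) for k in range(1, 30) for n in pumped_lengths(k)))  # True
```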

Choosing Values

• When we use the pumping lemma to prove that a language is not regular, we have two choices to make: w and i

• Heuristic for choosing w:
– Choose a w that is in the part of L that makes it non-regular
– Choose a w that is only barely in L
– Choose a w with as homogeneous as possible an initial region of length at least k

• Heuristic for choosing i:
– Try letting i be either 0 or 2
– If that doesn’t work, try analyzing L to see if there is some other value that will work

Sometimes Pumping Does Not Help

• Consider: L = {a^i b^j c^k : i, j, k ≥ 0, and if i = 1 then j = k}

• We could try to use the pumping lemma, after re-writing this language as: L = {a^i b^j c^k : i, j, k ≥ 0, and either (i ≠ 1) or (j = k)}

• What are the possible values for i?
– If i = 0 then: if j ≠ 0, let y be b; otherwise let y be c. Pump in or out, and i will still be 0 and thus not equal to 1, so the resulting string is in L
– If i = 1 then: let y be a. Pump in or out, then i will no longer be equal to 1, so the resulting string is in L
– If i = 2 then: let y be aa. Pump in or out, and i cannot equal 1, so the resulting string is in L
– If i > 2 then: let y be a. Pump out once or in any number of times, and i cannot equal 1, so the resulting string is in L

• In every case the string can be pumped, so the pumping lemma alone cannot show that L is not regular
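The case analysis can be spot-checked by brute force. The sketch below assumes a pumping length of 2 (called `p` here to avoid clashing with the exponent k), only considers strings with at most four of each letter, and only pumps up to `max_i` times, so it is evidence for the claim rather than a proof.

```python
import re

def in_L(s):
    """Membership test for {a^i b^j c^k : if i = 1 then j = k}."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    if m is None:
        return False
    i, j, k = (len(g) for g in m.groups())
    return i != 1 or j == k

def has_pumpable_split(w, p, max_i=4):
    """True if some split w = xyz with |xy| <= p, |y| >= 1 stays in L for i = 0..max_i."""
    for xy_len in range(1, min(p, len(w)) + 1):
        for x_len in range(xy_len):
            x, y, z = w[:x_len], w[x_len:xy_len], w[xy_len:]
            if all(in_L(x + y * n + z) for n in range(max_i + 1)):
                return True
    return False

p = 2  # hypothetical pumping length
words = ["a" * i + "b" * j + "c" * k
         for i in range(5) for j in range(5) for k in range(5)]
long_words = [w for w in words if in_L(w) and len(w) >= p]
print(all(has_pumpable_split(w, p) for w in long_words))  # True
```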

Use of Closure Properties

• Sometimes the only (or easiest) way to prove that a language is not regular is to use the closure properties for regular languages, either alone or in conjunction with the pumping lemma

• The fact that regular languages are closed under intersection is particularly useful

Closure Under Intersection

• Recall that we proved that L = {ab^n c^n : n ≥ 0} is not regular

• Let Lnew = {a^i b^j c^k : i, j, k ≥ 0, and if i = 1 then j = k}
– Notice that L = Lnew ∩ ab*c*

• If Lnew were regular, then L would also be regular because ab*c* is regular and regular languages are closed under intersection

• But we have already proven that L is not regular, therefore Lnew is not regular

• Recall that we already proved L = {a^n b^n : n ≥ 0} is not regular

• Let Lnew = the set of strings with an equal number of a’s and b’s
– Notice that L = Lnew ∩ a*b*

• If Lnew were regular, then L would also be regular because a*b* is regular and regular languages are closed under intersection

• But we have already proven that L is not regular, therefore Lnew is not regular
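The identity L = Lnew ∩ a*b* for the equal-count language can be sanity-checked on all short strings. The function names and the length bound of 8 are assumptions for this illustration; the check does not replace the set-theoretic argument.

```python
import re
from itertools import product

def equal_counts(s):         # Lnew: equal number of a's and b's
    return s.count("a") == s.count("b")

def in_a_star_b_star(s):     # the regular language a*b*
    return re.fullmatch(r"a*b*", s) is not None

def in_anbn(s):              # L = {a^n b^n : n >= 0}
    n = len(s) // 2
    return s == "a" * n + "b" * n

# Every string over {a, b} of length 0 through 8.
strings = ["".join(t) for n in range(9) for t in product("ab", repeat=n)]
print(all(in_anbn(s) == (equal_counts(s) and in_a_star_b_star(s)) for s in strings))  # True
```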

Closure Under Complement

• Let L = {a^i b^j : i, j ≥ 0, and i ≠ j}

• It seems unlikely that L is regular since any machine that would accept it would have to be able to count the a’s and the b’s

• We could try to use the pumping lemma to prove that L is not regular
– For example, let w = a^{k+1} b^k
– But for some splits (e.g., y = aa) the string can be pumped up or pumped down and still be in L, so this w does not lead to a contradiction


• Suppose we let w = a^k b^{k+k!}

• Then y = a^p for some non-zero p

• Let i = (k!/p) + 1 (in other words, pump in (k!/p) additional copies of y)
– Note that k!/p must be an integer because p ≤ k

• The number of a’s in the resulting string is k + (k!/p)p = k + k!

• Therefore, the resulting string is a^{k+k!} b^{k+k!}, which has an equal number of a’s and b’s, so it is not in L
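The arithmetic behind the k! trick can be verified for a small pumping length. In this sketch k = 5 is an arbitrary illustrative value and `pumped_a_count` is a hypothetical helper that counts the a's after pumping.

```python
from math import factorial

def pumped_a_count(k, p):
    """Number of a's in x y^i z for w = a^k b^(k + k!), y = a^p, i = k!/p + 1."""
    i = factorial(k) // p + 1       # exact division, because p <= k divides k!
    return k + (i - 1) * p

k = 5  # hypothetical pumping length
print([pumped_a_count(k, p) == k + factorial(k) for p in range(1, k + 1)])
# [True, True, True, True, True]
```

Whatever p the split uses, the pumped string ends up with exactly k + k! a's, matching the number of b's.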


• Closure under complement provides an easier way to show that L = {a^i b^j : i, j ≥ 0, and i ≠ j} is not regular

• If L were regular, then ¬L would also be regular

• ¬L = {a^n b^n : n ≥ 0} ∪ {strings of a’s and b’s that do not have all a’s in front of all b’s}

• If ¬L is regular, then ¬L ∩ a*b* = {a^n b^n : n ≥ 0} must also be regular, but we have shown that {a^n b^n : n ≥ 0} is not, therefore neither ¬L nor L is regular

Closure Under Reverse

• Let L = {a^i b^j c^k : i, j, k ≥ 0, and if i = 1 then j = k}

• Then L^R = {c^k b^j a^i : i, j, k ≥ 0, and if i = 1 then j = k}

• If L were regular, then L^R would be also

• Let w = c^k b^k a (where k is now the pumping length), and y must occur in the first k characters of w, so y = c^p, where 0 < p ≤ k

• Pump out once (i.e., take the string xz): the resulting string contains a single a, and the number of b’s and c’s must be equal for the string to be in L^R

• But there are fewer c’s than b’s, so the resulting string is not in L^R, therefore L^R is not regular and neither is L
