regular languages - computer sciencecobweb.cs.uga.edu/~potter/theory/2.1_regular_languages.pdf · i...

Regular Languages

CSCI 2670

LaTex Help From Dr. Frederick W Maier

Fall 2014

CSCI 2670 Regular Languages

Strings and Languages

Definition

An alphabet is a nonempty, finite set of objects (symbols).

I Σ and Γ are usually used to indicate alphabets.

I For instance, Σ1 = {a, b, c, d, e, f}, Σ2 = {0, 1}.

Definition

A string over Σ is any finite sequence of symbols from Σ. The empty string ε (sometimes λ) is the string consisting of no symbols. If w = a1a2 . . . an, n ≥ 0 and each ai ∈ Σ, is a string over Σ, then

I the length |w| of w is n. The length of ε is 0.

I w^R = an . . . a2a1 is the reverse of w.

I a substring u of w is any consecutive sequence of 0 or more symbols of w.

I 0, 101, and 0101111 are strings over Σ = {0, 1}.

I |0| = 1, |101| = 3, and |0101111| = 7.


Strings and Languages

Definition

Let x = x1 . . . xn and y = y1 . . . ym be strings over some alphabet. Then xy = x1 . . . xn y1 . . . ym is the concatenation of x and y.

I Σ = {0, 1}, x = 01, y = 001.

I xy = 01001, yx = 00101.

Definition

If w is a string and k ∈ N, then w^k is the concatenation of k copies of w.

I If w = 001, then w^3 = 001001001.

I w^0 = ε for any string w.
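The string operations defined so far can be tried out directly. The sketch below assumes strings over Σ are modelled as ordinary Python `str` values; the helper names `power`, `reverse`, and `is_substring` are illustrative and not part of the course material.

```python
# Strings over an alphabet, modelled as Python str values (an assumption of this sketch).
x, y = "01", "001"

concat_xy = x + y   # the concatenation xy
concat_yx = y + x   # the concatenation yx


def power(w: str, k: int) -> str:
    """Return w^k, the concatenation of k copies of w (w^0 is the empty string)."""
    return w * k


def reverse(w: str) -> str:
    """Return w^R, the symbols of w in reverse order."""
    return w[::-1]


def is_substring(u: str, w: str) -> bool:
    """Return True if u occurs as a consecutive run of symbols inside w."""
    return u in w


print(concat_xy, concat_yx)  # 01001 00101
print(power("001", 3))       # 001001001
print(len("0101111"))        # 7
```

Note that concatenation is not commutative: xy and yx differ here.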


Σ∗ and Languages

Definition

Let Σ be some alphabet. Σ∗ is the set of strings defined as follows:

I Basis: ε ∈ Σ∗.

I Recursion: If w ∈ Σ∗ and a ∈ Σ, then wa ∈ Σ∗.

I Closure: Nothing but strings in the basis or formed by a finite number of applications of the above rule are members of Σ∗.

Definition

A language L over an alphabet Σ is any subset of Σ∗.

In other words, a language is a set of strings.
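Since Σ∗ is infinite, a program can only enumerate it lazily. The following is a minimal sketch that lists Σ∗ shortest-first; the generator name `sigma_star` is an assumption made for illustration.

```python
from itertools import count, islice, product


def sigma_star(alphabet):
    """Yield every string over `alphabet`, shortest first (the set is infinite)."""
    symbols = sorted(alphabet)
    for n in count(0):                          # n = 0 yields the empty string
        for symbols_tuple in product(symbols, repeat=n):
            yield "".join(symbols_tuple)


first = list(islice(sigma_star({"0", "1"}), 7))
print(first)  # ['', '0', '1', '00', '01', '10', '11']

# A language is any subset of Σ*. Here: the strings of length at most 3 that end in 1.
ends_in_1 = [w for w in islice(sigma_star({"0", "1"}), 15) if w.endswith("1")]
```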


Regular Operations

Definition

Let A and B be languages. The regular operations are:

I Union A ∪ B: {w | w ∈ A or w ∈ B}.

I Concatenation A ◦ B: {uv | u ∈ A and v ∈ B}.

I (Kleene) Star A∗: {w1w2 . . . wn | n ≥ 0 and each wi ∈ A}.

I Observe that for any language A, ε ∈ A∗.

I If Σ is a set of symbols, then Σ∗ is just the set of finite strings made from symbols of Σ.

I We often write AB rather than A ◦ B.


Regular Operations

Example

Let A = {good, bad} and B = {boy, girl}.

I A ∪ B: {good, bad, boy, girl}.

I A ◦ B: {goodboy, badboy, goodgirl, badgirl}.

I A∗: {ε, good, bad, goodgood, goodbad, badgood, badbad, . . .}.

I B∗: {ε, boy, girl, boyboy, boygirl, girlboy, girlgirl, . . .}.
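For finite languages, the regular operations can be computed directly with Python sets. This is only a sketch: A∗ is infinite, so the hypothetical helper `star_up_to` returns just the strings built from at most n members of A.

```python
def union(A, B):
    """A ∪ B."""
    return A | B


def concat(A, B):
    """A ◦ B: every u from A followed by every v from B."""
    return {u + v for u in A for v in B}


def star_up_to(A, n):
    """Finite slice of A*: concatenations of at most n members of A."""
    result = {""}      # zero members: the empty string
    layer = {""}
    for _ in range(n):
        layer = concat(layer, A)   # strings built from exactly one more member
        result |= layer
    return result


A = {"good", "bad"}
B = {"boy", "girl"}

print(sorted(concat(A, B)))       # ['badboy', 'badgirl', 'goodboy', 'goodgirl']
print(len(star_up_to(A, 2)))      # 7
```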


Regular Operations

I There are several equivalent definitions for regular languages.

I They can be defined recursively via regular operations.

Let Σ be an alphabet.

I Basis:

  I ∅ is a regular language over Σ.
  I {ε} is a regular language over Σ.
  I For each a ∈ Σ, {a} is a regular language over Σ.

I Recursion:

  I If A and B are regular languages over Σ, A ∪ B is a regular language over Σ.
  I If A and B are regular languages over Σ, A ◦ B is a regular language over Σ.
  I If A is a regular language over Σ, A∗ is a regular language over Σ.

I Closure: Only languages formed via finite applications of the above rules are regular languages over Σ.

I The above will work. However, instead we will define regular languages via automata.


Finite Automata (FA)

I A simple model of computation

I Read an input string from tape

I Determine if the input string is in a language

I Determine if the answer for the problem is “YES” or “NO” for the given input on the tape


How Does a FA Work?

I At the beginning,

  I the FA is in the start state (initial state)
  I its tape head points at the first cell

I For each move, the FA

  I reads the symbol under its tape head
  I changes its state (according to the transition function) to the next state determined by the symbol read from the tape and its current state
  I moves its tape head to the right one cell


How Does a FA Work?

I a FA stops

I when it reads all symbols on the tape

I Then, it gives an answer if the input string is in the specific language:

  I Answer “YES” if its last state is an accept state
  I Answer “NO” if its last state is not an accept state


Deterministic Finite Automata (Informal)

I Regular languages can be defined using deterministic finite automata (DFAs).

I Informally, a DFA M consists of a set of states q0, q1, . . . qn.

  I q0 is called the start state.
  I Some subset of q0, q1, . . . qn comprises the accept states.

I DFA M reads an input string w = w1 . . . wm, m ≥ 0:

  I M operates in discrete steps.
  I M occupies exactly one state at any given time.
  I M begins in state q0.
  I M reads w one character at a time, moving left to right.
  I A transition function, together with the current state and character, determines M’s next state.

I If M reaches the end of w in an accept state, then M accepts string w. Otherwise, it rejects it.

I The language L(M) of M is the set of strings M accepts.

I A language is regular if and only if there is some DFA that accepts it.


State Diagrams

Figure: An example DFA M1.

I A state diagram, a directed graph, is often used to represent DFAs.

I Nodes represent states.

I Nodes circled twice represent accept states.

I An edge with no starting node indicates the start state.

I Labelled edges represent the transition function.

I An edge from qi to qj labelled a means that if M is in state qi and reads an a, it should move to state qj.

Here, L(M1), the language of M1, is the set of strings that have at least one 1 and in which the last 1 is followed by an even number of 0s.


Deterministic Finite Automata (Formal)

Definition

I A deterministic finite automaton (DFA) is a 5-tuple (Q, Σ, δ, q0, F):

  I Q is a finite, nonempty set of states.
  I Σ is a finite, nonempty set (the alphabet).
  I δ : Q × Σ → Q is a total function, the transition function.
  I q0 ∈ Q is the start state.
  I F ⊆ Q is the set of accept states.

I Observe that Q and Σ must be finite and nonempty.

I q0 must be in Q.

I δ is a total function and so maps every pair (q, a) of state q and symbol a to some state q′.

I F might be empty.


Deterministic Finite Automata

The above DFA can be formally defined as follows:

I Q = {q1, q2, q3}.

I Σ = {0, 1}.

I δ is given by the table:

  δ    0    1
  q1   q1   q2
  q2   q3   q2
  q3   q2   q2

I q0 = q1.

I F = {q2}
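A 5-tuple like this can be run mechanically. The sketch below stores the transition table as a Python dictionary and steps through the input one symbol at a time; the function name `dfa_accepts` and the `M1_*` constants are illustrative, and the table is taken to describe M1 from the state diagram.

```python
def dfa_accepts(delta, start, accept, w):
    """Run a DFA on w and report whether it halts in an accept state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]   # delta is total, so the lookup always succeeds
    return state in accept


# Transition table from the slide: (state, symbol) -> next state.
M1_DELTA = {
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q3", ("q2", "1"): "q2",
    ("q3", "0"): "q2", ("q3", "1"): "q2",
}
M1_START = "q1"
M1_ACCEPT = {"q2"}

print(dfa_accepts(M1_DELTA, M1_START, M1_ACCEPT, "100"))   # True
print(dfa_accepts(M1_DELTA, M1_START, M1_ACCEPT, "10"))    # False
```

The string 100 is accepted because its last 1 is followed by two 0s; 10 is rejected because the last 1 is followed by a single 0.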



The above DFA M2 can be formally defined as follows:

I Q = {q1, q2}.

I Σ = {0, 1}.

I δ is given by the table:

  δ    0    1
  q1   q1   q2
  q2   q1   q2

I q0 = q1.

I F = {q2}

L(M2) = {w | w ends in a 1}.



The above DFA M3 can be formally defined as follows:

I Q = {q1, q2}.

I Σ = {0, 1}.

I δ is given by the table:

  δ    0    1
  q1   q1   q2
  q2   q1   q2

I q0 = q1.

I F = {q1}

L(M3) = {w | w ends in a 0} ∪ {ε}.



DFA M4 can be formally defined as follows:

I Q = {s, q1, q2, r1, r2}.

I Σ = {a, b}.

I δ is given by the table:

  δ    a    b
  s    q1   r1
  q1   q1   q2
  q2   q1   q2
  r1   r2   r1
  r2   r2   r1

I q0 = s.

I F = {q1, r1}

L(M4) = {w |w begins and ends in a} ∪ {w |w begins and ends in b}.


Language Acceptance/Recognition for DFAs (Formal)

Definition

I Let M = (Q, Σ, δ, q0, F) be a DFA.

I Let w = w1 . . . wn be a string such that each wi ∈ Σ.

I M accepts w if and only if there exists a sequence r0, r1, . . ., rn of states of Q such that

  I r0 = q0.
  I For each 0 ≤ i < n, δ(r_i, w_{i+1}) = r_{i+1}.
  I rn ∈ F.

I Otherwise, M rejects w .

Definition

If M is a DFA, then M recognizes language L if L = {w |M accepts w}.

Definition

Language L is regular if there exists some DFA M such that M recognizes L.


How to Construct a DFA?

I Determine what a DFA needs to memorize in order to recognize strings in the language

I Hint: the property of the strings in the language

I Determine how many states are required to memorize what we want

I Accept state(s) memorize the property of the string in the language

I Find out how the thing we memorize is changed once the next input symbol is read

I From this change, we get the transition function


Example: Constructing a DFA

I Suppose that Σ = {0, 1} and the language consists of all strings with an odd number of 1s. Construct a DFA to accept this language.

I How about all strings with an even number of 1s?

I Construct a DFA to accept the language that consists of strings representing binary numbers divisible by 3.

  I Decide what a DFA needs to memorize
  I How many states do we need
  I Construct the transition diagram
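One possible answer to these exercises, offered as a sketch rather than the intended solution: for the parity language the DFA memorizes whether the number of 1s seen so far is even or odd, and for divisibility by 3 it memorizes the value read so far modulo 3 (reading bit b turns remainder r into (2r + b) mod 3). The state names and the choice to treat ε as the number 0 are assumptions of this example.

```python
def run_dfa(delta, start, accept, w):
    """Run a DFA given as a dict (state, symbol) -> state."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in accept


# Odd number of 1s: two states remember the parity of the 1s read so far.
ODD_DELTA = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}

# Divisible by 3: three states remember the value read so far, modulo 3.
# Appending bit b to a number with remainder r gives remainder (2r + b) mod 3.
DIV3_DELTA = {(r, str(b)): (2 * r + b) % 3 for r in range(3) for b in (0, 1)}


def odd_ones(w):
    return run_dfa(ODD_DELTA, "even", {"odd"}, w)


def divisible_by_3(w):
    # Assumption: the empty string is read as 0 and therefore accepted.
    return run_dfa(DIV3_DELTA, 0, {0}, w)


print(odd_ones("1011"))        # True  (three 1s)
print(divisible_by_3("110"))   # True  (110 is 6 in binary)
print(divisible_by_3("111"))   # False (111 is 7 in binary)
```

For the even-number-of-1s variant, the same transition table works with the accept set changed to {"even"}.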


Example: Constructing a DFA

1. Suppose that Σ = {0, 1} and the language consists of all strings that have 00 or 11 as substrings. Construct a DFA to accept this language.

2. Suppose that Σ = {0, 1} and the language consists of all strings that have 00 and 11 as substrings. Construct a DFA to accept this language.

I Decide what a DFA needs to memorize

I How many states do we need

I Construct the transition diagram


Regular Operations: Closure

Definition

Recall the regular operations:

I Union A ∪ B: {w | w ∈ A or w ∈ B}.

I Concatenation A ◦ B: {uv | u ∈ A and v ∈ B}.

I (Kleene) Star A∗: {w1w2 . . . wn | n ≥ 0 and each wi ∈ A}.

I These can be used to define regular languages, but we will use a DFA-based account as our primitive.

I We will use this account to show that the set of regular languages is closed under the regular operations.

Let A1 and A2 be any languages defined over alphabet Σ.

I If A1 and A2 are regular languages, then A1 ∪ A2 is regular.

I If A1 and A2 are regular, then A1 ◦ A2 is regular.

I If A1 is regular, then A1∗ is regular.


Closure of regular languages under union

Theorem

If A1 and A2 are regular languages, then A1 ∪ A2 is regular.

Proof.

Wlog, we may assume that A1 and A2 are defined over the same alphabet Σ. Since A1 and A2 are both regular, there exist DFAs M1 and M2 such that L(M1) = A1 and L(M2) = A2.

I M1 = (Q1, Σ, δ1, q1, F1)

I M2 = (Q2, Σ, δ2, q2, F2)

The proof is by construction. We construct a DFA M to recognize A1 ∪ A2. Specifically, M = (Q, Σ, δ, q0, F), where

I Q = {(r1, r2) | r1 ∈ Q1 and r2 ∈ Q2};

I q0 = (q1, q2);

I F = {(r1, r2) | r1 ∈ F1 or r2 ∈ F2};

I For each (r1, r2) ∈ Q and each a ∈ Σ, δ((r1, r2), a) = (δ1(r1, a), δ2(r2, a)).
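The product construction above translates almost line for line into code. In this sketch a DFA is a tuple (Q, Σ, δ, q0, F) with δ stored as a dictionary; the names `union_dfa`, `accepts`, `ENDS_IN_1`, and `ODD_ONES` are illustrative. `ENDS_IN_1` reuses the table of M2, while `ODD_ONES` is a hypothetical second machine added only for the demonstration.

```python
from itertools import product


def union_dfa(M1, M2):
    """Product construction: a DFA recognizing L(M1) ∪ L(M2)."""
    Q1, sigma, d1, q1, F1 = M1
    Q2, _, d2, q2, F2 = M2
    Q = set(product(Q1, Q2))                        # pairs (r1, r2)
    delta = {((r1, r2), a): (d1[(r1, a)], d2[(r2, a)])
             for (r1, r2) in Q for a in sigma}      # run both machines in lockstep
    F = {(r1, r2) for (r1, r2) in Q if r1 in F1 or r2 in F2}
    return Q, sigma, delta, (q1, q2), F


def accepts(M, w):
    """Run DFA M on w."""
    _, _, delta, state, F = M
    for a in w:
        state = delta[(state, a)]
    return state in F


# M2 from the slides: strings ending in 1.
ENDS_IN_1 = ({"q1", "q2"}, "01",
             {("q1", "0"): "q1", ("q1", "1"): "q2",
              ("q2", "0"): "q1", ("q2", "1"): "q2"},
             "q1", {"q2"})

# Hypothetical second DFA: strings with an odd number of 1s.
ODD_ONES = ({"e", "o"}, "01",
            {("e", "0"): "e", ("e", "1"): "o",
             ("o", "0"): "o", ("o", "1"): "e"},
            "e", {"o"})

M = union_dfa(ENDS_IN_1, ODD_ONES)
print(len(M[0]))           # 4 states, one per pair
print(accepts(M, "10"))    # True  (odd number of 1s)
print(accepts(M, "110"))   # False (ends in 0, even number of 1s)
```

Changing `or` to `and` in the definition of F would give a DFA for the intersection instead.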



Proof, Cont.

We must prove the construction works, that is, L(M) = A1 ∪ A2.

(LR) Suppose M accepts string w = w1w2 . . . wn. Then by definition there exists a sequence of states (r0, s0), (r1, s1), . . . (rn, sn) such that

I (r0, s0) = q0;

I for each 0 ≤ i < n, δ((r_i, s_i), w_{i+1}) = (r_{i+1}, s_{i+1});

I (rn, sn) ∈ F.

By construction of M, r0 = q1 and s0 = q2, and since (rn, sn) ∈ F, it must be that either rn ∈ F1 or sn ∈ F2. We may assume that it is the former; the case sn ∈ F2 is symmetric, with M2 in place of M1. Similarly, by the construction of δ, for each 0 ≤ i < n, if δ((r_i, s_i), w_{i+1}) = (r_{i+1}, s_{i+1}) then δ1(r_i, w_{i+1}) = r_{i+1}. Given all of this, the sequence r0 . . . rn satisfies all of the requirements needed to show that M1 accepts w. From this, w ∈ A1 and consequently w ∈ A1 ∪ A2.



Proof, Cont.

For the other direction, we show that if w ∈ L(M1), then w ∈ L(M); the argument for w ∈ L(M2) is symmetric.

(RL) Suppose M1 accepts string w = w1w2 . . . wn. Then by definition there exists a sequence of states r0, r1, . . . rn such that

I r0 = q1;

I for each 0 ≤ i < n, δ1(r_i, w_{i+1}) = r_{i+1};

I rn ∈ F1.

Construct the following sequence (r0, s0), (r1, s1), . . . (rn, sn), such that

I (r0, s0) = (q1, q2);

I for each 0 ≤ i < n, (r_{i+1}, s_{i+1}) = (δ1(r_i, w_{i+1}), δ2(s_i, w_{i+1})).



Proof, Cont.

Observe that:

I (r0, s0) = q0;

I for each 0 ≤ i < n, (r_{i+1}, s_{i+1}) = (δ1(r_i, w_{i+1}), δ2(s_i, w_{i+1})) = δ((r_i, s_i), w_{i+1});

I (rn, sn) ∈ F, because rn ∈ F1.

As such, the sequence (r0, s0), (r1, s1), . . . (rn, sn) satisfies all of the requirements needed to show that M accepts w, and so w ∈ L(M).


Nondeterministic Finite Automata (NFAs)

I The behavior of DFAs is completely deterministic. The next state of the machine is determined completely by its current state and the symbol to be read from input.

I Nondeterministic finite automata (NFAs) eliminate this determinism.

I With NFAs, there may be a choice of next state.

I Though NFAs in a sense generalize DFAs (all DFAs are NFAs but not vice versa), the two computational models are equivalent.

  I NFAs and DFAs both accept exactly the regular languages.
  I A language is regular iff there is a DFA that accepts it.
  I A language is regular iff there is an NFA that accepts it.

I To prove that A1 ◦ A2 and A1∗ are regular languages (provided A1, A2 are regular), we will use NFAs.


Nondeterministic Finite Automata

In an NFA N

I For any state q and symbol a, q might have 0, 1, or > 1 transitions.

I ε is allowed in a transition (N switches states but consumes no input).

I In a DFA there is only one way to process an input string w .

I In an NFA, there might be multiple possible ways of processing it.

I There might be multiple computation paths.

I If there is a choice of the next state, you may think of the path as branching off in multiple directions.

I The NFA accepts a string w if any of these possible computation paths ends in an accept state.



NFA N2 accepts the language of bit-strings w such that w ends in 100, 101, 110, or 111.

I Often, it is easier to design and understand NFAs than DFAs.

I Though DFAs and NFAs are equivalent in computational power, the DFA to accept a given language might have many more states than an NFA that accepts it.


Nondeterministic Finite Automata (Formal)

Definition

If Σ is an alphabet, then Σε = Σ ∪ {ε}.

I A nondeterministic finite automaton (NFA) is a 5-tuple (Q, Σ, δ, q0, F):

  I Q is a finite, nonempty set of states.
  I Σ is a finite, nonempty set (the alphabet).
  I δ : Q × Σε → P(Q) is a total function, the transition function.
  I q0 ∈ Q is the start state.
  I F ⊆ Q is the set of accept states.

I Observe that the transition function differs from that for DFAs.

I In a DFA, δ maps a pair (q, a) to a single state q′.

I In an NFA, δ maps a pair (q, a) to a set of states.

I Also, in an NFA, the domain of δ is Q × Σε and not Q × Σ.


Language Acceptance/Recognition for NFAs (Formal)

Definition

I Let N = (Q, Σ, δ, q0, F) be an NFA.

I Let w be a string over Σ that can be written as w = w1 . . . wn with each wi ∈ Σε (some wi may be ε, which accounts for ε-transitions).

I N accepts w if and only if there exists a sequence r0, r1, . . ., rn of states of Q such that

  I r0 = q0.
  I For each 0 ≤ i < n, r_{i+1} ∈ δ(r_i, w_{i+1}).
  I rn ∈ F.

I Otherwise, N rejects w.

Definition

If N is an NFA, then N recognizes language L if L = {w |N accepts w}.

Note that the “next” state r_{i+1} in the sequence is one of the set of states indicated by δ(r_i, w_{i+1}).



The above NFA N1 can be formally defined as follows:

I Q = {q1, q2, q3, q4}.

I Σ = {0, 1}.

I δ is given by the table:

  δ    0      1          ε
  q1   {q1}   {q1, q2}   ∅
  q2   {q3}   ∅          {q3}
  q3   ∅      {q4}       ∅
  q4   {q4}   {q4}       ∅

I q0 = q1.

I F = {q4}
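An NFA can be simulated by tracking the set of states it could currently occupy. The sketch below encodes the table for N1, using the empty string `""` to stand for ε and treating missing dictionary entries as ∅; both conventions, and the names `eps_closure` and `nfa_accepts`, are assumptions of this example.

```python
def eps_closure(delta, states):
    """All states reachable from `states` along zero or more ε-edges."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), set()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def nfa_accepts(delta, start, accept, w):
    """Track every state the NFA could be in after each input symbol."""
    current = eps_closure(delta, {start})
    for a in w:
        moved = set()
        for q in current:
            moved |= delta.get((q, a), set())
        current = eps_closure(delta, moved)
    return bool(current & accept)   # accept if any computation path ends in F


# N1 from the table above; "" marks the ε column, absent keys mean ∅.
N1_DELTA = {
    ("q1", "0"): {"q1"}, ("q1", "1"): {"q1", "q2"},
    ("q2", "0"): {"q3"}, ("q2", ""): {"q3"},
    ("q3", "1"): {"q4"},
    ("q4", "0"): {"q4"}, ("q4", "1"): {"q4"},
}

print(nfa_accepts(N1_DELTA, "q1", {"q4"}, "0101"))   # True
print(nfa_accepts(N1_DELTA, "q1", {"q4"}, "1001"))   # False
```

Reading the table, N1 reaches q4 exactly when the input contains 11 or 101 as a substring, which is what the two calls illustrate.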


Equivalence between NFAs and DFAs

I Every DFA is an NFA.

Theorem

Every NFA has an equivalent DFA.

Let N = (Q, Σ, δ, q0, F) be an NFA. We construct an equivalent DFA M = (Q′, Σ, δ′, q′0, F′) as follows:

I Q′ = P(Q) (the powerset of Q).

I q′0 = {q0}.

I F′ = {R | R ∈ Q′ and there is a q ∈ R such that q ∈ F}.

I For any R ∈ Q′ and a ∈ Σ, δ′(R, a) = {q | q ∈ δ(r, a) for some r ∈ R}.

I N processes a string in multiple parallel computation paths.

I N must still operate in discrete steps, however.

I At any step, N will “occupy” some subset S of states of Q.

I The states of M encode these sets of states.

I Consider reading string w1w2 . . ..

I M begins in state {q0}. M transitions to {q | N has an edge from q0 to q labelled w1}.

Equivalence between NFAs and DFAs

I The previous construction did not account for ε-edges.

I Consider the following alterations:

I E(R) = {q | q is reachable from some state of R by following zero or more ε-edges}.

I For any R ∈ Q′ and a ∈ Σ, δ′(R, a) = {q | q ∈ E(δ(r, a)) for some r ∈ R}.

I q′0 = E({q0}).

I These are sufficient to construct a DFA M equivalent to arbitrary NFA N.

Note that Sipser does not provide a proof that the construction works. Instead he states that it “obviously works correctly”.

Given the equivalence, we obtain the following.

Corollary

A language is regular if and only if it is recognized by some NFA.


Example: NFA to DFA

NFA N4 has three states. The DFA M made from it has 8 states. Construct M.


Example: NFA to DFA

NFA N4 has three states. The DFA M made from it has 8 states.

  δM          a           b
  ∅           ∅           ∅
  {1}         ∅           {2}
  {2}         {2, 3}      {3}
  {3}         {1, 3}      ∅
  {1, 2}      {2, 3}      {2, 3}
  {1, 3}      {1, 3}      {2}
  {2, 3}      {1, 2, 3}   {3}
  {1, 2, 3}   {1, 2, 3}   {2, 3}

FM = {{1}, {1, 2}, {1, 3}, {1, 2, 3}}. q0 for M is {1, 3}.
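The subset construction with ε-closure can be checked against this table. The diagram of N4 did not survive in this text, so the transitions in `N4_DELTA` below are an assumption: they are the edges of the corresponding example in Sipser's textbook and are consistent with every row of the table. The function names are illustrative, and `""` stands for ε.

```python
from itertools import combinations


def closure(delta, states):
    """E(R): states reachable from R along zero or more ε-edges ("" marks ε)."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)


def subset_construction(Q, sigma, delta, q0, F):
    """Build the DFA whose states are all subsets of Q."""
    subsets = [frozenset(c)
               for n in range(len(Q) + 1)
               for c in combinations(sorted(Q), n)]
    d = {}
    for R in subsets:
        for a in sigma:
            moved = set()
            for r in R:
                moved |= set(delta.get((r, a), ()))
            d[(R, a)] = closure(delta, moved)      # δ'(R, a)
    start = closure(delta, {q0})                   # q0' = E({q0})
    accept = {R for R in subsets if R & F}         # subsets containing an accept state
    return subsets, d, start, accept


# Assumed edges of N4 (start state 1, accept state 1); not given in the text itself.
N4_DELTA = {
    (1, "b"): {2}, (1, ""): {3},
    (2, "a"): {2, 3}, (2, "b"): {3},
    (3, "a"): {1},
}

subsets, dM, startM, acceptM = subset_construction({1, 2, 3}, "ab", N4_DELTA, 1, {1})
print(len(subsets))       # 8
print(sorted(startM))     # [1, 3]
```

Building all 2^|Q| subsets mirrors the definition Q′ = P(Q); a practical implementation would usually generate only the subsets reachable from the start state.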


Example: NFA to DFA

DFA M.


Example: NFA to DFA

DFA M has been simplified by removing unreachable nodes.


Closure of regular languages under concatenation

I Given that NFAs delineate the class of regular languages, we will use them to show that regular languages are closed under concatenation.

Theorem

If A1 and A2 are regular languages, then A1 ◦ A2 is regular.

Since A1 and A2 are both regular, there exist NFAs N1 and N2 such that L(N1) = A1 and L(N2) = A2.

I N1 = (Q1, Σ, δ1, q1, F1)

I N2 = (Q2, Σ, δ2, q2, F2)


Closure of regular languages under concatenation

We construct an NFA N = (Q, Σ, δ, q0, F) to recognize A1 ◦ A2:

I Q = Q1 ∪ Q2.

I q0 = q1.

I F = F2.

I For each q ∈ Q and a ∈ Σε:

  I δ(q, a) = δ1(q, a) if q ∈ Q1 and q ∉ F1.
  I δ(q, a) = δ1(q, a) ∪ {q2} if q ∈ F1 and a = ε.
  I δ(q, a) = δ1(q, a) if q ∈ F1 and a ≠ ε.
  I δ(q, a) = δ2(q, a) if q ∈ Q2.
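In code, the concatenation construction amounts to merging the two transition tables and adding an ε-edge from every accept state of N1 to the start state of N2. This sketch assumes the two state sets are disjoint, represents an NFA as (Q, δ, q0, F) with `""` for ε, and uses two hypothetical machines, `ZEROS` (one or more 0s) and `ONES` (one or more 1s), purely for illustration.

```python
def concat_nfa(N1, N2):
    """NFA for L(N1) ◦ L(N2); assumes the state sets of N1 and N2 are disjoint."""
    Q1, d1, q1, F1 = N1
    Q2, d2, q2, F2 = N2
    delta = {key: set(targets) for key, targets in {**d1, **d2}.items()}
    for q in F1:
        # ε-edge from each accept state of N1 to the start state of N2.
        delta.setdefault((q, ""), set()).add(q2)
    return Q1 | Q2, delta, q1, set(F2)


def accepts(N, w):
    """Simulate NFA N on w by tracking the set of reachable states."""
    _, delta, q0, F = N

    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for r in delta.get((q, ""), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = closure({q0})
    for a in w:
        current = closure({r for q in current for r in delta.get((q, a), ())})
    return bool(current & F)


# Hypothetical example machines: one or more 0s, and one or more 1s.
ZEROS = ({"a0", "a1"}, {("a0", "0"): {"a1"}, ("a1", "0"): {"a1"}}, "a0", {"a1"})
ONES = ({"b0", "b1"}, {("b0", "1"): {"b1"}, ("b1", "1"): {"b1"}}, "b0", {"b1"})

N = concat_nfa(ZEROS, ONES)
print(accepts(N, "0011"))   # True
print(accepts(N, "0101"))   # False
```

The accept states of N1 stop being accepting in N, which is why the string 00 alone is rejected.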


Closure of regular languages under star

Theorem

If A is a regular language, then A∗ is regular.

Let N1 = (Q1, Σ, δ1, q1, F1) be an NFA such that L(N1) = A. We construct an NFA N = (Q, Σ, δ, q0, F) to recognize A∗:

I Q = Q1 ∪ {q0}, where q0 ∉ Q1 is a new start state.

I F = F1 ∪ {q0}.

I For each q ∈ Q and a ∈ Σε:

  I δ(q, a) = ∅ if q = q0 and a ≠ ε.
  I δ(q, a) = {q1} if q = q0 and a = ε.
  I δ(q, a) = δ1(q, a) if q ∈ Q1 and q ∉ F1.
  I δ(q, a) = δ1(q, a) if q ∈ F1 and a ≠ ε.
  I δ(q, a) = δ1(q, a) ∪ {q1} if q ∈ F1 and a = ε.

Note again that Sipser does not provide proofs that these constructions work.
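The star construction can be sketched the same way: add a fresh accepting start state with an ε-edge to the old start state, and an ε-edge from every old accept state back to the old start state. The representation (Q, δ, q0, F) with `""` for ε, the default name of the new state, and the example machine `AB` (accepting only the string ab) are assumptions of this sketch.

```python
def star_nfa(N1, new_start="q0_new"):
    """NFA for L(N1)*; `new_start` must not already be a state of N1."""
    Q1, d1, q1, F1 = N1
    if new_start in Q1:
        raise ValueError("new start state must be fresh")
    delta = {key: set(targets) for key, targets in d1.items()}
    delta[(new_start, "")] = {q1}                  # new start state jumps to the old one
    for q in F1:
        delta.setdefault((q, ""), set()).add(q1)   # loop back after each member of A
    return Q1 | {new_start}, delta, new_start, set(F1) | {new_start}


def accepts(N, w):
    """Simulate NFA N on w by tracking the set of reachable states."""
    _, delta, q0, F = N

    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for r in delta.get((q, ""), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = closure({q0})
    for a in w:
        current = closure({r for q in current for r in delta.get((q, a), ())})
    return bool(current & F)


# Hypothetical example machine accepting exactly the string "ab".
AB = ({"s0", "s1", "s2"}, {("s0", "a"): {"s1"}, ("s1", "b"): {"s2"}}, "s0", {"s2"})

STAR = star_nfa(AB)
print(accepts(STAR, ""))       # True
print(accepts(STAR, "abab"))   # True
print(accepts(STAR, "aba"))    # False
```

The fresh start state is what lets ε be accepted without also accepting other strings that merely pass through the old start state.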

