finite automata & regular languages

Finite Automata & Regular Languages

Sipser, Chapter 1

Deterministic Finite Automata
A DFA, or deterministic finite automaton, M is a 5-tuple M = (Q, Σ, δ, q0, F), where:
  Q is a finite set of states of M
  Σ is the finite input alphabet of M
  δ: Q × Σ → Q is the state transition function
  q0 is the start state of M
  F ⊆ Q is the set of accepting states or final states of M

DFA Example
State diagram
(Figure: state diagram of M with states q0 and q1 and transitions on 0 and 1.)
Q = { q0, q1 }, Σ = { 0, 1 }, F = { q1 }

State table
        0     1
  q0    q0    q1
  q1    q1    q0

State table & state transition function
State table
        0     1
  q0    q0    q1
  q1    q1    q0

State transition function
  δ(q0, 0) = q0, δ(q0, 1) = q1
  δ(q1, 0) = q1, δ(q1, 1) = q0

State transitions
If q, q' ∈ Q, s ∈ Σ, and δ(q, s) = q', then we say that q' is an s-successor of q, or that there is a transition from q to q' on input s, and we write q --s--> q'.

Example: since δ(q0, 1) = q1, there is a transition from q0 to q1 on input 1, and we write q0 --1--> q1.

State sequences
If a string of input symbols w = s0 s1 s2 … s[k-1] takes M from initial state q0 to state qk, namely
  q0 --s0--> q1 --s1--> q2 --s2--> q3 … --s[k-1]--> qk
then we say that qk is a w-successor of q0, and write q0 --w--> qk. Also, q0 q1 q2 … qk is called an admissible state sequence for w.
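The admissible state sequence can be computed by following the transition function one symbol at a time. The sketch below uses the two-state example DFA; the names DELTA and state_sequence are hypothetical and chosen only for this illustration.

```python
# Transition function of the two-state example DFA, stored as a dict
# keyed by (state, symbol). DELTA and state_sequence are illustrative names.
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q1", ("q1", "1"): "q0",
}


def state_sequence(delta, start, w):
    """Return the list [q0, q1, ..., qk] of states visited while reading w."""
    states = [start]
    for s in w:
        # The next state is the s-successor of the current state.
        states.append(delta[(states[-1], s)])
    return states


print(state_sequence(DELTA, "q0", "1101"))
```

The last entry of the returned list is the w-successor of the start state; for the empty string the list contains only the start state.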

Strings accepted by a DFA
Let M = (Q, Σ, δ, q0, F) be a DFA, and let w = s0 s1 s2 … s[k-1] ∈ Σ* be a string over the alphabet Σ. Then M accepts w if there exists an admissible state sequence q0 q1 q2 … qk for w, starting at initial state q0 and ending with state qk, where qk ∈ F. That is, M accepts input string w if M ends up in one of the final states.

Language recognized by a DFA
The language L(M) that is recognized by a DFA M = (Q, Σ, δ, q0, F) is the set of all strings accepted by M. That is,
  L(M) = { w ∈ Σ* | M accepts w }
       = { w ∈ Σ* | q0 --w--> qk, qk ∈ F }.

Example: For the previous DFA, L(M) is the set of all strings of 0s and 1s with odd parity, that is, with an odd number of 1s.
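The odd-parity claim can be spot-checked by brute force. The following sketch runs the two-state DFA on every binary string up to a chosen length and compares the result with a direct count of 1s. The function names and the length bound are assumptions of this illustration, and the check is a sanity test, not a proof.

```python
from itertools import product

# Two-state example DFA: start state q0, final state q1.
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q1", ("q1", "1"): "q0",
}
START = "q0"
FINAL = {"q1"}


def accepts(w):
    """Run the DFA on w and report whether it ends in a final state."""
    q = START
    for s in w:
        q = DELTA[(q, s)]
    return q in FINAL


def matches_parity(max_len):
    """Check accepts(w) == 'w has an odd number of 1s' for all |w| <= max_len."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            w = "".join(letters)
            if accepts(w) != (w.count("1") % 2 == 1):
                return False
    return True


print(matches_parity(8))
```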

DFA Example 2
Recognizer for 11*01*
(Figure: state diagram with states A, B, C, and trap state D, which loops on 0,1.)

DFA Example 2
M = (Q, Σ, δ, q0, F), L(M) = 11*01*
Q = { q0=A, B, C, D }, Σ = { 0, 1 }, F = { C }

        0    1
  A     D    B
  B     C    B
  C     D    C
  D     D    D
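As a sanity check on the table, the DFA can be compared against the regular expression 11*01* on all short binary strings. This sketch uses Python's re module as a stand-in matcher; TABLE, accepts, and agrees_up_to are hypothetical names for this illustration.

```python
import re
from itertools import product

# State table of DFA Example 2; D is the trap state.
TABLE = {
    "A": {"0": "D", "1": "B"},
    "B": {"0": "C", "1": "B"},
    "C": {"0": "D", "1": "C"},
    "D": {"0": "D", "1": "D"},
}
PATTERN = re.compile(r"11*01*")


def accepts(w, start="A", finals=frozenset({"C"})):
    """Run the table-driven DFA on w."""
    q = start
    for s in w:
        q = TABLE[q][s]
    return q in finals


def agrees_up_to(max_len):
    """True if the DFA and the regex agree on every string with |w| <= max_len."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            w = "".join(letters)
            if accepts(w) != (PATTERN.fullmatch(w) is not None):
                return False
    return True


print(agrees_up_to(10))
```

Agreement on short strings does not prove L(M) = 11*01*, but a single disagreement would expose an error in the table.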

DFA Example 3
Modulo 3 counter
(Figure: state diagram with states A, B, C and transitions labeled 0, 1, 2, and R.)

DFA Example 3
M = (Q, Σ, δ, q0, F)
Q = { q0=A, B, C }, Σ = { 0, 1, 2, R }, F = { A }

        0    1    2    R
  A     A    B    C    A
  B     B    C    A    A
  C     C    A    B    A
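One way to read this table, taken here as an assumption, is that states A, B, and C stand for the running sum of the input digits modulo 3 (0, 1, and 2 respectively), and that R resets the sum to 0. The sketch below checks that reading against the table; all names are hypothetical.

```python
from itertools import product

# State table of DFA Example 3.
TABLE = {
    "A": {"0": "A", "1": "B", "2": "C", "R": "A"},
    "B": {"0": "B", "1": "C", "2": "A", "R": "A"},
    "C": {"0": "C", "1": "A", "2": "B", "R": "A"},
}


def final_state(w, start="A"):
    """State reached by the DFA after reading w."""
    q = start
    for s in w:
        q = TABLE[q][s]
    return q


def residue_since_reset(w):
    """Sum of the digits read since the last R, modulo 3."""
    total = 0
    for s in w:
        total = 0 if s == "R" else (total + int(s)) % 3
    return total


def interpretation_holds(max_len):
    """Check that state A/B/C matches residue 0/1/2 for all |w| <= max_len."""
    for n in range(max_len + 1):
        for letters in product("012R", repeat=n):
            w = "".join(letters)
            if final_state(w) != "ABC"[residue_since_reset(w)]:
                return False
    return True


print(interpretation_holds(6))
```

Under this reading, M accepts exactly the strings whose digit sum since the last reset is divisible by 3, because A is the only final state.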

Regular Languages
A language L ⊆ Σ* is called regular if there exists a DFA M such that L(M) = L.

Earlier, we defined a language L ⊆ Σ* as regular if there exists a T3 or regular (left-linear or right-linear) grammar G such that L(G) = L. We shall prove that these two definitions are equivalent.

Operations on Regular Languages
Let A and B be regular languages:
  Union: A ∪ B = { x | x ∈ A or x ∈ B }
  Concatenation: AB = { xy | x ∈ A and y ∈ B }
  Kleene Closure (A-star): A* = { x1x2x3 ... xk | k ≥ 0 and each xi ∈ A }

Examples of regular operations
A = { good, bad }, B = { boy, girl }
  A ∪ B = { good, bad, boy, girl }
  AB = { goodboy, goodgirl, badboy, badgirl }
  A* = { λ, good, bad, goodgood, goodbad, badgood, badbad, … }
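The three operations can be tried directly on finite sets of strings. In the sketch below, the empty string plays the role of λ, and because A* is infinite, the helper star_up_to (a hypothetical name) only builds strings made of at most k factors.

```python
A = {"good", "bad"}
B = {"boy", "girl"}


def concat(X, Y):
    """Concatenation XY = { xy | x in X and y in Y }."""
    return {x + y for x in X for y in Y}


def star_up_to(X, k):
    """Finite part of X*: strings x1 x2 ... xj with 0 <= j <= k and each xi in X."""
    result = {""}   # j = 0 contributes the empty string
    layer = {""}
    for _ in range(k):
        layer = concat(layer, X)   # strings with exactly one more factor
        result |= layer
    return result


print(sorted(A | B))
print(sorted(concat(A, B)))
print(sorted(star_up_to(A, 2), key=lambda w: (len(w), w)))
```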

Closure under Union
If A and B are regular languages, then their union, A ∪ B, is a regular language.

Union Machine M(A ∪ B)
(Figure: M(A) with states q0, q1 ∈ F, q2 ∈ F and M(B) with states p0, p1 ∈ F, p2 ∈ F, together with an additional state r0.)
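A common way to build a union machine, assumed here to be what the figure depicts, is to add a new start state r0 with λ-transitions to the start states of M(A) and M(B), keeping the final states of both. The sketch below simulates such a machine; the two component machines (odd number of 1s, and strings ending in 0) and all function names are hypothetical choices for this illustration.

```python
LAMBDA = ""  # label used for λ-transitions in this sketch (an assumption)


def closure(states, delta):
    """All states reachable from `states` using only λ-transitions."""
    seen = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, LAMBDA), set()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def accepts(machine, w):
    """Simulate a machine (delta, start, finals) with set-valued transitions."""
    delta, start, finals = machine
    current = closure({start}, delta)
    for s in w:
        moved = set()
        for q in current:
            moved |= delta.get((q, s), set())
        current = closure(moved, delta)
    return bool(current & finals)


def union_machine(m_a, m_b, new_start="r0"):
    """New start state with λ-transitions into both machines.

    State names of the two machines are assumed to be disjoint.
    """
    delta_a, start_a, finals_a = m_a
    delta_b, start_b, finals_b = m_b
    delta = {**delta_a, **delta_b}
    delta[(new_start, LAMBDA)] = {start_a, start_b}
    return delta, new_start, finals_a | finals_b


# M(A): odd number of 1s. M(B): strings ending in 0.
M_A = ({("q0", "0"): {"q0"}, ("q0", "1"): {"q1"},
        ("q1", "0"): {"q1"}, ("q1", "1"): {"q0"}}, "q0", {"q1"})
M_B = ({("p0", "0"): {"p1"}, ("p0", "1"): {"p0"},
        ("p1", "0"): {"p1"}, ("p1", "1"): {"p0"}}, "p0", {"p1"})
M_UNION = union_machine(M_A, M_B)

print([w for w in ["", "1", "11", "110"] if accepts(M_UNION, w)])
```

The result is an NFA rather than a DFA; it can be turned into a DFA with the NFA-to-DFA conversion.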

Closure under Concatenation
If A and B are regular languages, then their concatenation, AB, is a regular language.

Concatenation Machine M(AB)

Closure under Kleene Star
If A is a regular language, then the Kleene closure of A, A*, is also a regular language.

Kleene Closure Machine M(A*)

NFAs: Nondeterministic Finite Automata
  Presence of lambda transitions.
  May have more than one initial state.
  On input a, state q may have no transition out.
  On input a, state q may have more than one transition out.

NFAs
A nondeterministic finite automaton M is a five-tuple M = (Q, Σ, R, I, F), where
  Q is a finite set of states
  Σ is the (finite) input alphabet
  R is the transition relation, R ⊆ Q × (Σ ∪ {λ}) × Q
  I ⊆ Q is the set of initial states
  F ⊆ Q is the set of final states

Example NFAs
  NFA that recognizes the language 0*1 ∪ 1*0
  NFA that recognizes the language (0 ∪ 1)*11(0 ∪ 1)*

Converting NFAs to DFAs
Given an NFA, M = (Q, Σ, R, I, F), build a DFA, M' = (Q', Σ, δ, S0, F'), as follows.
  The states S0, S1, S2, … of M' are sets of states of M.
  The initial state of M' is obtained by putting together all the initial states of M and all states reachable from those by λ transitions, and calling this set S0, the initial state of M'.

Converting NFAs to DFAs
  For each state Sk already in Q' in M', and for each input symbol a ∈ Σ, put together into a set Sj all states of M reachable from each state in Sk on input a. This set Sj may or may not already be in Q'. It may also be the empty set ∅. Add to δ the transition from Sk to Sj on input a.
  Since there can only be a finite number of subsets of states of M, this procedure will stop after a finite number of steps.

Example conversions
Convert the NFA for the language (0 ∪ 1)*00 ∪ (0 ∪ 1)*11 to a DFA.
(Figure: NFA state diagram with states A, B, C and D, E, F and transitions on 0 and 1.)

State transition table of NFA
        0      1      λ
  A     A,B    A      -
  B     C      -      -
  C     -      -      -
  D     D      D,E    -
  E     -      F      -
  F     -      -      -

State table of DFA
              0          1
  A,D         A,B,D      A,D,E
  A,B,D       A,B,C,D    A,D,E
  A,D,E       A,B,D      A,D,E,F
  A,B,C,D     A,B,C,D    A,D,E
  A,D,E,F     A,B,D      A,D,E,F
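The conversion procedure can be sketched in a few lines and run on this example NFA. Two points are assumptions of the sketch rather than statements from the slides: the initial states are taken to be A and D and the final states C and F (read off from the language), and F' is taken to be every subset that contains at least one final state of M, which is the usual choice. All names are hypothetical.

```python
# Example NFA, with transitions stored as sets of successor states.
NFA_DELTA = {
    ("A", "0"): {"A", "B"}, ("A", "1"): {"A"},
    ("B", "0"): {"C"},
    ("D", "0"): {"D"}, ("D", "1"): {"D", "E"},
    ("E", "1"): {"F"},
}
INITIAL = {"A", "D"}   # assumed initial states
FINAL = {"C", "F"}     # assumed final states
ALPHABET = "01"
LAMBDA = "λ"           # this example NFA has no λ-transitions


def closure(states):
    """Add every state reachable by λ-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in NFA_DELTA.get((q, LAMBDA), set()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def subset_construction():
    """Return (S0, state table, final states) of the DFA."""
    s0 = frozenset(closure(INITIAL))
    table = {}
    todo = [s0]
    while todo:
        sk = todo.pop()
        if sk in table:
            continue            # this subset was already processed
        table[sk] = {}
        for a in ALPHABET:
            moved = set()
            for q in sk:
                moved |= NFA_DELTA.get((q, a), set())
            sj = frozenset(closure(moved))
            table[sk][a] = sj
            todo.append(sj)
    finals = {s for s in table if s & FINAL}
    return s0, table, finals


def name(subset):
    """Render a subset such as {'A', 'D'} as the label 'AD'."""
    return "".join(sorted(subset))


S0, TABLE, FINALS = subset_construction()
for subset in sorted(TABLE, key=lambda s: (len(s), name(s))):
    print(name(subset), name(TABLE[subset]["0"]), name(TABLE[subset]["1"]))
```

Only subsets reachable from S0 are generated, which is why the result has five states rather than all 64 subsets of { A, B, C, D, E, F }.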

State diagram of DFA
(Figure: state diagram with states AD, ABD, ADE, ABCD, and ADEF and transitions on 0 and 1.)

Regular Expressions (r.e.)
  If a ∈ Σ, then the set a = {a} is a r.e.
  The set λ = {λ} is a r.e.
  The set ∅ = { } is a r.e.
  If R and S are r.e., then (R ∪ S) is a r.e.
  If R and S are r.e., then (RS) is a r.e.
  If R is a r.e., then (R)* is a r.e.
  Any r.e. is obtained by a finite number of applications of the above rules.
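Because the definition is inductive, the language of a r.e. can be computed by recursion on its structure. The sketch below encodes a r.e. as nested tuples (a hypothetical encoding chosen for this illustration) and returns only the strings of length at most n, since the full language may be infinite.

```python
def language(r, n):
    """Set of strings of length <= n denoted by the r.e. `r`.

    Encoding (an assumption of this sketch):
      ("sym", a), ("lambda",), ("empty",),
      ("union", R, S), ("concat", R, S), ("star", R)
    """
    kind = r[0]
    if kind == "sym":
        return {r[1]} if len(r[1]) <= n else set()
    if kind == "lambda":
        return {""}
    if kind == "empty":
        return set()
    if kind == "union":
        return language(r[1], n) | language(r[2], n)
    if kind == "concat":
        left, right = language(r[1], n), language(r[2], n)
        return {x + y for x in left for y in right if len(x + y) <= n}
    if kind == "star":
        base = language(r[1], n) - {""}   # the empty factor adds nothing new
        result, frontier = {""}, {""}
        while frontier:
            # Extend the newest strings by one more factor, keeping new ones only.
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown r.e. node: {kind!r}")


# (0 ∪ 1)*11(0 ∪ 1)*, written with the tuple encoding.
ANY = ("star", ("union", ("sym", "0"), ("sym", "1")))
ONE_ONE = ("concat", ("sym", "1"), ("sym", "1"))
CONTAINS_11 = ("concat", ("concat", ANY, ONE_ONE), ANY)

print(sorted(language(CONTAINS_11, 3)))
```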

REs and Regular Languages
R.E.s are shorthand notation for regular languages.

Regex: REs in Unix
  [a-f], [^a-f]
  R*, R+, R?
  {R}
  RS
  R|S
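Most of these operators can be tried in Python's re module, used here as a convenient stand-in for the Unix tools (the exact dialect varies between tools). The helper name full is hypothetical, and the {R} entry is not illustrated.

```python
import re


def full(pattern, text):
    """True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


examples = [
    (r"[a-f]", "c"),     # character class: one letter from a to f
    (r"[^a-f]", "g"),    # negated class: any one character not in a to f
    (r"a*", ""),         # R*: zero or more repetitions
    (r"a+", "aaa"),      # R+: one or more repetitions
    (r"ab?", "a"),       # R?: optional R
    (r"ab", "ab"),       # RS: concatenation
    (r"ab|cd", "cd"),    # R|S: alternation
]
for pattern, text in examples:
    print(pattern, repr(text), full(pattern, text))
```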

Minimization of DFAs
  Subset construction (Myhill-Nerode Theorem)

NFAs, DFAs, & Lexical Analyzer Generators
  Sec 3.6: Finite Automata, Aho, Sethi, Ullman, “Compilers: P.T.T”
  Sec 3.7: From REs to NFAs (Thompson’s Construction)
  Sec 3.8: Design of a Lexical Analyzer Generator
  Sec 3.9: Optimization of DFA-based Lexical Analyzers
