chapter 3 regular expressions and languages


Page 1: Chapter 3      Regular Expressions and Languages


Chapter 3 Regular Expressions and Languages

Giza Pyramids, Egypt


Outline
3.1 Regular Expressions
3.2 Finite Automata & Regular Expressions
3.3 Applications of RE’s
3.4 Algebraic Laws for RE’s


3.1 Regular Expressions

Use of regular expressions
– A regular expression is a kind of generator for languages.
– It offers a “declarative” way of expressing strings of symbols.
– It defines all and only the regular languages (a theorem).


Applications of regular expressions
– Used as commands for finding strings in Web browsers or text-formatting systems (such as the UNIX grep command).
– Used in lexical-analyzer generators (such as Lex or Flex). A lexical analyzer breaks a source program into “tokens” (keywords, identifiers, signs, …).
– The grep command searches files or standard input globally for lines matching a given regular expression and prints them to the program's standard output.


Operators of regular expressions
– Review of three operations on languages $L$ and $M$:
Union: $L \cup M = \{x \mid x \in L \text{ or } x \in M\}$
Concatenation: $LM = \{xy \mid x \in L,\ y \in M\}$
– Example: $L^0 = \{\varepsilon\}$, $L^1 = L$, $L^2 = LL$, …
Closure (or star, or Kleene closure): $L^* = L^0 \cup L^1 \cup L^2 \cup \cdots$
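The three operations can be tried out on small finite languages. The following is a minimal Python sketch, assuming a language is modeled as a Python set of strings and ε as the empty string `""`. Because $L^*$ is infinite whenever $L$ contains a nonempty string, the helper `star_upto` keeps only the strings up to a chosen length; all function names here are hypothetical.

```python
def union(L, M):
    """L ∪ M: the strings that are in L or in M."""
    return L | M


def concat(L, M):
    """LM: every string x of L followed by every string y of M."""
    return {x + y for x in L for y in M}


def power(L, i):
    """L^i, with L^0 = {ε} (ε is the empty string "")."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result


def star_upto(L, max_len):
    """The strings of L* = L^0 ∪ L^1 ∪ L^2 ∪ ... of length <= max_len."""
    result = {""}            # L^0
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more factor from L; drop long ones.
        frontier = {w for w in concat(frontier, L) if len(w) <= max_len} - result
        result |= frontier
    return result


print(sorted(power({"0", "1"}, 2)))   # L^2 for L = {0, 1}
print(star_upto(set(), 5))            # the closure of the empty language
```

The truncation by length is a design choice for display only; it says nothing about whether the full language is finite.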


Example 3.1 (languages)
– $\emptyset^* = \{\varepsilon\}$ because $\emptyset^0 = \{\varepsilon\}$ and $\emptyset^i = \emptyset$ for every $i \ge 1$.
– If $L = \{0, 1\}$, then $L^0 = \{\varepsilon\}$, $L^1 = L$, $L^2 = \{00, 01, 10, 11\}$, …
– If $L$ is the set of all strings of 0’s, i.e., $\{0^n \mid n \ge 0\}$ (including $\varepsilon$), then it can be proved that $L^*$ is $L$ itself (see the textbook for the proof).


3.1.2 Building Regular Expressions
– Recursive definition of a regular expression (RE) $E$ and the language it defines, $L(E)$:
Basis:
– The constants $\varepsilon$ and $\emptyset$ are RE’s, defining the languages $\{\varepsilon\}$ and $\emptyset$, respectively: $L(\varepsilon) = \{\varepsilon\}$, $L(\emptyset) = \emptyset$.
– If $a$ is a symbol, then $\mathbf{a}$ is an RE, defining the language $\{a\}$: $L(\mathbf{a}) = \{a\}$. (Note: the RE $\mathbf{a}$ is written in boldface.)
– A variable like $L$ (capitalized and italic) represents an arbitrary language.


Recursive definition of an RE (cont’d):
Induction: given two RE’s $E$ and $F$,
– $E + F$ is an RE such that $L(E + F) = L(E) \cup L(F)$ (union);
– $EF$ is an RE such that $L(EF) = L(E)L(F)$ (concatenation);
– $E^*$ is an RE such that $L(E^*) = (L(E))^*$ (closure);
– $(E)$ is an RE such that $L((E)) = L(E)$ (parenthesization).
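The recursive definition translates directly into a recursive function over a syntax tree. This is a minimal Python sketch, assuming hypothetical node classes `Empty`, `Eps`, `Sym`, `Plus`, `Cat`, and `Star`, one per basis or induction case; parenthesization needs no node of its own because the tree already records the grouping. Since $L(E)$ may be infinite, `lang(r, n)` returns only the strings of length at most `n`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:        # the RE ∅
    pass


@dataclass(frozen=True)
class Eps:          # the RE ε
    pass


@dataclass(frozen=True)
class Sym:          # a single symbol a
    a: str


@dataclass(frozen=True)
class Plus:         # E + F
    e: "RE"
    f: "RE"


@dataclass(frozen=True)
class Cat:          # EF
    e: "RE"
    f: "RE"


@dataclass(frozen=True)
class Star:         # E*
    e: "RE"


RE = Union[Empty, Eps, Sym, Plus, Cat, Star]


def lang(r: RE, n: int) -> set:
    """The strings of L(r) of length <= n, following the recursive definition."""
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Sym):
        return {r.a} if len(r.a) <= n else set()
    if isinstance(r, Plus):                      # L(E + F) = L(E) ∪ L(F)
        return lang(r.e, n) | lang(r.f, n)
    if isinstance(r, Cat):                       # L(EF) = L(E)L(F)
        return {x + y for x in lang(r.e, n) for y in lang(r.f, n) if len(x + y) <= n}
    if isinstance(r, Star):                      # L(E*) = (L(E))*
        base = lang(r.e, n)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not an RE node: {r!r}")


print(sorted(lang(Cat(Sym("0"), Star(Sym("1"))), 3)))   # strings of 01* up to length 3
```

The supplemental examples that follow can be reproduced by building their trees and calling `lang`.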


Examples (supplemental) (1/4)
– RE $F = 1$ “expresses” the language $L(1) = \{1\}$.
– RE $E = 1^*$
Language expressed by $E$:
$L = L(E) = L(1^*) = (L(1))^* = (\{1\})^*$ (closure of a language)
$= \{\varepsilon, 1, 11, 111, 1111, \ldots\}$
$= \{1^n \mid n \ge 0\}$


Examples (supplemental) (2/4)
– RE $G = 01^*$
Language expressed by $G$:
$L = L(G) = L(01^*) = L(0)L(1^*)$ (concatenation)
$= \{0\}\{\varepsilon, 1, 11, 111, 1111, \ldots\}$
$= \{0, 01, 011, 0111, \ldots\}$
$= \{01^n \mid n \ge 0\}$


Examples (supplemental) (3/4)
– RE $H = 1 + 01^*$
Language expressed by $H$:
$L = L(H) = L(1 + 01^*) = L(1) \cup L(01^*)$
$= \{1\} \cup \{0, 01, 011, 0111, \ldots\}$
$= \{1, 0, 01, 011, 0111, \ldots\}$
$= \{1\} \cup \{01^n \mid n \ge 0\}$


Examples (supplemental) (4/4)
– RE $K = \varepsilon + a^*$
Language expressed by $K$:
$L = L(K) = L(\varepsilon + a^*) = L(\varepsilon) \cup L(a^*)$
$= \{\varepsilon\} \cup \{\varepsilon, a, aa, aaa, \ldots\}$
$= \{\varepsilon, a, aa, aaa, \ldots\}$
$= L(a^*)$
That is, we have the following RE equalities:
$\varepsilon + a^* = a^* = a^* + \varepsilon$


Example 3.2
– An RE defining the language of strings of alternating 0’s and 1’s (including the empty string) is either of the two below:
$(01)^* + (10)^* + 0(10)^* + 1(01)^*$ (the four terms cover the strings of the forms 0…1, 1…0, 0…0, and 1…1, respectively)
$(\varepsilon + 1)(01)^*(\varepsilon + 0)$
(Why? See the textbook.)
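One way to gain confidence that the two RE’s of Example 3.2 define the same language is to compare them on every short string. This is a minimal Python sketch using the standard `re` module, assuming Python's syntax as a stand-in for the RE notation: `|` plays the role of `+`, `(?:...)` groups without capturing, and `X?` means $\varepsilon + X$ (the UNIX operator described in 3.3.1). A finite check like this is evidence, not a proof.

```python
import re
from itertools import product

RE1 = r"(?:01)*|(?:10)*|0(?:10)*|1(?:01)*"   # (01)* + (10)* + 0(10)* + 1(01)*
RE2 = r"1?(?:01)*0?"                        # (ε + 1)(01)*(ε + 0)


def alternating(w):
    """True if no two adjacent symbols of w are equal (true for the empty string)."""
    return all(a != b for a, b in zip(w, w[1:]))


def same_language_upto(n):
    """Check RE1, RE2, and the alternating property agree on all 0/1 strings of length <= n."""
    for length in range(n + 1):
        for chars in product("01", repeat=length):
            w = "".join(chars)
            in1 = re.fullmatch(RE1, w) is not None
            in2 = re.fullmatch(RE2, w) is not None
            if not (in1 == in2 == alternating(w)):
                return False
    return True


print(same_language_upto(8))
```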


3.1.3 Precedence of RE operators
– Precedence:
Highest: $*$ (closure)
Next: $\cdot$ (concatenation), left to right
Last: $+$ (union), left to right
Use parentheses anywhere to resolve ambiguity.


– Example 3.3: three ways to group $01^* + 1$, each defining a different language:
$(0(1^*)) + 1$, given by the precedence rules above (the same as $01^* + 1$)
$(01)^* + 1$ (another meaning)
$0(1^* + 1)$ (a third meaning)
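The three groupings really are different languages, which a few test strings make visible. This is a minimal Python sketch, assuming the standard `re` module as a stand-in for RE notation (`|` for `+`, `(?:...)` for grouping); the dictionary keys are just labels. Python's `re` happens to use the same precedence (star, then concatenation, then `|`), so the ungrouped pattern `01*|1` behaves like the first reading.

```python
import re

# The three groupings of 01* + 1, written in Python's re syntax.
readings = {
    "(0(1*)) + 1": r"(?:0(?:1*))|1",   # what the precedence rules give
    "(01)* + 1": r"(?:01)*|1",
    "0(1* + 1)": r"0(?:1*|1)",
}


def accepts(pattern, w):
    """True if the whole string w matches pattern."""
    return re.fullmatch(pattern, w) is not None


for w in ["", "1", "0111", "0101"]:
    print(repr(w), {name: accepts(p, w) for name, p in readings.items()})
```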


3.2 FA’s & RE’s

Theorems to be proved:
– Every language defined by a DFA is also defined by an RE.
– Every language defined by an RE is also defined by an ε-NFA.


Relations of the theorems (the yellow lines in the figure are the ones to be proved):
[Figure: diagram relating ε-NFA, NFA, DFA, and RE]


3.2.1 From DFA’s to RE’s
– Theorem 3.4: If $L = L(A)$ for some DFA $A$, then there is an RE $R$ such that $L = L(R)$.
Proof. We progressively construct RE’s of a certain form, $R_{ij}^{(k)}$, that define larger and larger sets of strings, until the entire set of accepted strings (i.e., the language $L(A)$) is obtained. Assume the states are $\{1, 2, \ldots, n\}$, with 1 the start state.


Meaning of $R_{ij}^{(k)}$:
– The RE $R_{ij}^{(k)}$ denotes the set of strings $w$ such that
$w$ is the label of a path from state $i$ to state $j$ in DFA $A$, and
the path has no intermediate node whose number is larger than $k$.


Meaning of $R_{ij}^{(k)}$ (cont’d):
– To construct $R_{ij}^{(k)}$, we use induction, starting at $k = 0$ and stopping at $k = n$ (the largest state number). When $k = n$, $i = 1$, and $j$ is an accepting state, $R_{ij}^{(k)}$ defines the set of strings accepted by DFA $A$ along paths from the start state to that accepting state.


Meaning of $R_{ij}^{(k)}$ (cont’d):
Basis:
– When $k = 0$, since all state numbers are $\ge 1$, there is no intermediate state on a path from $i$ to $j$, leaving two cases:
(1) an arc (a transition) from $i$ to $j$;
(2) a path of length 0, from $i$ to $i$ itself.


Meaning of $R_{ij}^{(k)}$ (cont’d):
Basis (cont’d):
– If $i \ne j$, only (1) is possible, leading to three cases:
no symbol labels a transition from $i$ to $j$: $R_{ij}^{(0)} = \emptyset$
one symbol $a$ labels the transition: $R_{ij}^{(0)} = a$
multiple symbols $a_1, a_2, \ldots, a_m$ label the transition: $R_{ij}^{(0)} = a_1 + a_2 + \cdots + a_m$


Meaning of $R_{ij}^{(k)}$ (supplemental):
Basis (cont’d), $i \ne j$:
$R_{ij}^{(0)} = \emptyset$ (no arc from $q_i$ to $q_j$)
$R_{ij}^{(0)} = a$ (an arc labeled $a$ from $q_i$ to $q_j$)
$R_{ij}^{(0)} = a_1 + a_2 + \cdots + a_m$ (an arc labeled $a_1 + \cdots + a_m$ from $q_i$ to $q_j$)


Meaning of $R_{ij}^{(k)}$ (cont’d):
Basis (cont’d):
– If $i = j$, the length-0 path of case (2) from $i$ to itself also exists, so $\varepsilon$ is added to each of the three arc cases:
no symbol labels a transition from $i$ to itself: $R_{ii}^{(0)} = \varepsilon$
one symbol $a$ labels the transition: $R_{ii}^{(0)} = \varepsilon + a$
multiple symbols $a_1, a_2, \ldots, a_m$ label the transition: $R_{ii}^{(0)} = \varepsilon + a_1 + a_2 + \cdots + a_m$


Meaning of $R_{ij}^{(k)}$ (supplemental):
Basis (cont’d), $i = j$:
$R_{ii}^{(0)} = \varepsilon$ (no loop on $q_i$)
$R_{ii}^{(0)} = \varepsilon + a$ (a loop labeled $a$ on $q_i$)
$R_{ii}^{(0)} = \varepsilon + a_1 + a_2 + \cdots + a_m$ (a loop labeled $a_1 + \cdots + a_m$ on $q_i$)


Induction (to compute $R_{ij}^{(k)}$):
– Suppose there is a path from $i$ to $j$ that goes through no state numbered higher than $k$. Then two cases should be considered:
(1) the path does not go through state $k$ at all: it is described by $R_{ij}^{(k-1)}$;
(2) the path goes through state $k$ at least once; then it may be broken into three pieces:
– from $i$ to $k$ without passing through $k$: $R_{ik}^{(k-1)}$;
– from $k$ back to $k$ itself, zero or more times: $\left(R_{kk}^{(k-1)}\right)^*$;
– from $k$ to $j$ without passing through $k$: $R_{kj}^{(k-1)}$.


Illustration of the paths represented by $R_{ij}^{(k)}$:
[Figure: a path from $i$ to $k$ ($R_{ik}^{(k-1)}$), circulating at $k$ zero or more times ($(R_{kk}^{(k-1)})^*$), then from $k$ to $j$ ($R_{kj}^{(k-1)}$)]


Induction (cont’d):
– The three pieces are concatenated to give $R_{ik}^{(k-1)}\left(R_{kk}^{(k-1)}\right)^* R_{kj}^{(k-1)}$.
– Combining (1) and (2), we get the RE defining “all the paths from $i$ to $j$ that go through no state higher than $k$” as
$R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)}\left(R_{kk}^{(k-1)}\right)^* R_{kj}^{(k-1)}$.


Induction (cont’d):
– Construct the $R_{ij}^{(k)}$ in increasing order of $k$ until $k = n$; at the last level, only $i = 1$ is needed.
– If $F = \{j_1, j_2, \ldots, j_m\}$ is the set of final states, the union below, taken over all accepting states, is the result:
$R_{1j_1}^{(n)} + R_{1j_2}^{(n)} + \cdots + R_{1j_m}^{(n)}$
(End of proof of theorem)
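The construction of the proof can be run mechanically. This is a minimal Python sketch of it, assuming RE’s are produced as Python `re` pattern strings, with `None` standing for $\emptyset$ and `""` for $\varepsilon$; the helpers `plus`, `cat`, `star`, and `dfa_to_re` are hypothetical names. The helpers apply only the simplifications $\emptyset R = R\emptyset = \emptyset$, $\emptyset + R = R$, and $\varepsilon R = R\varepsilon = R$, so the output is much longer than a hand-simplified answer, but it should define the same language.

```python
import re
from itertools import product

EMPTY = None  # the RE ∅; every other value is a Python regex string ("" is ε)


def plus(r, s):
    """R + S, simplified with ∅ + R = R + ∅ = R."""
    if r is EMPTY:
        return s
    if s is EMPTY:
        return r
    if r == "" and s == "":
        return ""
    if r == "":
        return f"(?:{s})?"          # ε + S
    if s == "":
        return f"(?:{r})?"          # R + ε
    return f"(?:{r}|{s})"


def cat(r, s):
    """RS, simplified with ∅R = R∅ = ∅ and εR = Rε = R."""
    if r is EMPTY or s is EMPTY:
        return EMPTY
    if r == "":
        return s
    if s == "":
        return r
    return f"(?:{r})(?:{s})"


def star(r):
    """R*, using ∅* = ε* = ε."""
    if r is EMPTY or r == "":
        return ""
    return f"(?:{r})*"


def dfa_to_re(n, delta, finals):
    """States are 1..n with 1 the start state; delta maps (state, symbol) -> state."""
    states = range(1, n + 1)
    # Basis (k = 0): symbols on arcs from i to j, plus ε when i == j.
    R = {}
    for i in states:
        for j in states:
            r = EMPTY
            for (p, a), q in sorted(delta.items()):
                if p == i and q == j:
                    r = plus(r, re.escape(a))
            R[i, j] = plus("", r) if i == j else r
    # Induction: R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1).
    for k in states:
        R = {(i, j): plus(R[i, j], cat(R[i, k], cat(star(R[k, k]), R[k, j])))
             for i in states for j in states}
    # The language is the union of R_1j^(n) over the accepting states j.
    result = EMPTY
    for j in sorted(finals):
        result = plus(result, R[1, j])
    return result


def strings_upto(alphabet, n):
    """Every string over alphabet of length <= n."""
    for length in range(n + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


# Example 3.5 (below): state 1 loops on 1 and moves to 2 on 0; state 2 loops on 0 and 1.
delta = {(1, "1"): 1, (1, "0"): 2, (2, "0"): 2, (2, "1"): 2}
regex = dfa_to_re(2, delta, {2})
print(regex)
```

For this DFA, the accepted strings are exactly those containing at least one 0, which is what the hand computation of Example 3.5 arrives at as well.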


Example 3.5
– Convert the following DFA into an RE.
[Figure: DFA with start state 1 and accepting state 2; state 1 has a self-loop labeled 1 and an arc labeled 0 to state 2; state 2 has a self-loop labeled 0, 1]
– The $R_{ij}^{(0)}$ may be constructed as follows (details below):
$R_{11}^{(0)} = \varepsilon + 1$
$R_{12}^{(0)} = 0$
$R_{21}^{(0)} = \emptyset$
$R_{22}^{(0)} = \varepsilon + 0 + 1$


Example 3.5 (cont’d)
– $R_{11}^{(0)} = \varepsilon + 1$ because $\delta(1, 1) = 1$ (a loop), plus the length-0 path from state 1 back to itself
– $R_{12}^{(0)} = 0$ because $\delta(1, 0) = 2$ (going out to state 2)
– $R_{21}^{(0)} = \emptyset$ because there is no arc from state 2 to state 1
– $R_{22}^{(0)} = \varepsilon + 0 + 1$ because $\delta(2, 0) = 2$ and $\delta(2, 1) = 2$, plus the length-0 path from state 2 back to itself



Example 3.5 (cont’d)
– We could then compute all $R_{ij}^{(k)}$ for $k = 1$ and $k = 2$.
– However, to save time, we may instead compute only the necessary terms of $R_{ij}^{(k)}$, working backward from the final states.



Example 3.5 (cont’d)
– There is only one final state, 2, so we only have to compute
$R_{12}^{(2)} = R_{12}^{(1)} + R_{12}^{(1)}\left(R_{22}^{(1)}\right)^* R_{22}^{(1)}$.
– We only have to compute $R_{12}^{(1)}$ and $R_{22}^{(1)}$, without computing $R_{21}^{(1)}$ and $R_{11}^{(1)}$.
– To compute each of these terms, we need some RE equalities to simplify intermediate results.


Some equalities ($R$ is an RE; $a$ may likewise stand for any RE):
1. $\emptyset R = R\emptyset = \emptyset$ ($\emptyset$ is the annihilator for concatenation)
2. $\emptyset + R = R + \emptyset = R$ ($\emptyset$ is the identity for union)
3. $\varepsilon R = R\varepsilon = R$ ($\varepsilon$ is the identity for concatenation)
4. $(\varepsilon + a)^* = a^* = (a + \varepsilon)^*$
5. $(\varepsilon + a)a^* = a^* + aa^* = a^* + a^+ = a^*$
6. $a^*(\varepsilon + a) = a^* + a^*a = a^* + a^+ = a^*$
(all provable by easy deduction)
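The equalities involving $\varepsilon$ and closure can be spot-checked by brute force. This is a minimal Python sketch, assuming the standard `re` module as a stand-in for RE notation (`X?` for $\varepsilon + X$, `|` for `+`); the helper names are hypothetical, and agreement on strings up to a fixed length is only evidence, not the deduction the slide refers to. The equalities with $\emptyset$ are left out because Python patterns have no direct literal for $\emptyset$.

```python
import re
from itertools import product


def equivalent(p, q, alphabet="01", n=6):
    """True if regexes p and q agree on every string over alphabet of length <= n."""
    for length in range(n + 1):
        for chars in product(alphabet, repeat=length):
            w = "".join(chars)
            if (re.fullmatch(p, w) is None) != (re.fullmatch(q, w) is None):
                return False
    return True


def check_laws(a):
    """Check the closure equalities for an RE a given as a Python regex."""
    A = f"(?:{a})"
    eps_or_a = f"{A}?"      # ε + a
    a_star = f"{A}*"        # a*
    return {
        "(ε + a)* = a*": equivalent(f"(?:{eps_or_a})*", a_star),
        "(ε + a)a* = a*": equivalent(eps_or_a + a_star, a_star),
        "a*(ε + a) = a*": equivalent(a_star + eps_or_a, a_star),
        "ε + a* = a*": equivalent(f"(?:{a_star})?", a_star),
    }


for a in ["1", "0|1", "01"]:
    print(a, check_laws(a))
```

The same helper also catches a false "law", for instance that $(0 + 1)^*$ and $0^*1^*$ differ.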


To compute $R_{12}^{(2)} = R_{12}^{(1)} + R_{12}^{(1)}\left(R_{22}^{(1)}\right)^* R_{22}^{(1)}$:
– $R_{12}^{(1)} = R_{12}^{(0)} + R_{11}^{(0)}\left(R_{11}^{(0)}\right)^* R_{12}^{(0)}$
$= 0 + (\varepsilon + 1)(\varepsilon + 1)^*0$ (by substitution)
$= 0 + (\varepsilon + 1)1^*0$ (by 4. above)
$= 0 + 1^*0$ (by 5.)
$= (\varepsilon + 1^*)0$ (by the distributive law)
$= 1^*0$ (since $\varepsilon + a^* = a^*$)


To compute $R_{12}^{(2)} = R_{12}^{(1)} + R_{12}^{(1)}\left(R_{22}^{(1)}\right)^* R_{22}^{(1)}$ (cont’d):
– $R_{22}^{(1)} = R_{22}^{(0)} + R_{21}^{(0)}\left(R_{11}^{(0)}\right)^* R_{12}^{(0)}$
$= (\varepsilon + 0 + 1) + \emptyset(\varepsilon + 1)^*0$ (by substitution)
$= (\varepsilon + 0 + 1) + \emptyset$ (by 1.)
$= \varepsilon + 0 + 1$ (by 2.)


To compute $R_{12}^{(2)} = R_{12}^{(1)} + R_{12}^{(1)}\left(R_{22}^{(1)}\right)^* R_{22}^{(1)}$ (cont’d):
– Finally, $R_{12}^{(2)}$
$= 1^*0 + 1^*0(\varepsilon + 0 + 1)^*(\varepsilon + 0 + 1)$ (by substitution)
$= 1^*0 + 1^*0(0 + 1)^*(\varepsilon + 0 + 1)$ (by 4., with $a = 0 + 1$)
$= 1^*0 + 1^*0(0 + 1)^*$ (by 6.)
$= 1^*0(\varepsilon + (0 + 1)^*)$ (by the distributive law)
$= 1^*0(0 + 1)^*$ (since $\varepsilon + a^* = a^*$)


Check the correctness of the final result:
$R_{12}^{(2)} = 1^*0(0 + 1)^*$
This is correct, as can be seen by looking at the diagram directly: any number of 1’s, then a 0 (moving to state 2), then any string of 0’s and 1’s, i.e., exactly the strings containing at least one 0.
The above method also works for NFA’s and ε-NFA’s.



3.2.2 Converting DFA’s to RE’s by Eliminating States (another way)
– Step 1: regard the symbols on the arcs as RE’s.
– Step 2: eliminate states one at a time using the following conversion: to eliminate a state $s$ with self-loop $S$, for each predecessor $q_1$ (arc $Q_1$ from $q_1$ to $s$) and each successor $q_2$ (arc $P_1$ from $s$ to $q_2$), replace the arc $R_{11}$ from $q_1$ to $q_2$ by $R_{11} + Q_1S^*P_1$ (a missing arc counts as $\emptyset$).
– Step 3: collect the RE’s for all the final states.
(For a complete diagram of this, see the textbook.)
[Fig. 3.7 (partial): state $s$ with self-loop $S$, an arc $Q_1$ from $q_1$ to $s$, an arc $P_1$ from $s$ to $q_2$, and an arc $R_{11}$ from $q_1$ to $q_2$]
[Fig. 3.8 (partial): after eliminating $s$, the arc from $q_1$ to $q_2$ is labeled $R_{11} + Q_1S^*P_1$]


Details of Step 3:
(1) For each final state $q$, eliminate, as above, all states except $q$ and the start state $q_0$.
(2) If $q \ne q_0$, a 2-state automaton is left, as follows:
[Fig. 3.9: start state $q_0$ with self-loop $R$, an arc $S$ from $q_0$ to $q$, an arc $T$ from $q$ back to $q_0$, and a self-loop $U$ on $q$]
The corresponding RE is $(R + SU^*T)^*SU^*$ (provable by the first method).


(3) If $q = q_0$, perform one more state elimination so that every state except $q_0$ is removed, leaving only the start state $q_0$, as follows (see the example below):
[Fig. 3.10: start state $q_0$ with self-loop $R$]
The corresponding RE is $R^*$.
(4) Collect (take the sum of) the results derived as above for each final state to get the final result.


An example of Case (3) above (supplemental)
– Regard $q_0$ as two separate states (playing the roles of both $q_1$ and $q_2$), treat the other state $q$ as $s$, and apply Figs. 3.7 and 3.8 to eliminate $q$:
[Figure: start state $q_0$ with self-loop $X$, an arc $Y$ from $q_0$ to $q$, a self-loop $V$ on $q$, and an arc $Z$ from $q$ back to $q_0$]
In terms of Fig. 3.7: $S = V$, $R_{11} = X$, $Q_1 = Y$, $P_1 = Z$.
After the elimination (Fig. 3.8), $q_0$ has the self-loop $R_{11} + Q_1S^*P_1 = X + YV^*Z$.


An example of Case (3) above (supplemental) (cont’d)
– Use the result $R_{11} + Q_1S^*P_1 = X + YV^*Z$ as $R$ in Fig. 3.10:
[Figure: start state $q_0$ with self-loop $R = X + YV^*Z$]
– The final result is $R^* = (X + YV^*Z)^*$.
– This will be used in your homework.
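Steps 1 to 3 can be automated. This is a minimal Python sketch of state elimination following the per-final-state procedure above, assuming arc labels are Python `re` pattern strings, `None` stands for $\emptyset$, and `""` for $\varepsilon$; the function names and the edge encoding are hypothetical. Missing arcs are treated as $\emptyset$, and only the $\emptyset$/$\varepsilon$ equalities are used for simplification, so the output is verbose but should be equivalent. The same procedure applies to NFA’s, as Example 3.6 does.

```python
import re
from itertools import product

EMPTY = None  # the RE ∅; other values are Python regex strings ("" is ε)


def plus(r, s):
    if r is EMPTY:
        return s
    if s is EMPTY:
        return r
    if r == "" or s == "":
        other = s if r == "" else r
        return "" if other == "" else f"(?:{other})?"   # ε + R
    return f"(?:{r}|{s})"


def cat(r, s):
    if r is EMPTY or s is EMPTY:
        return EMPTY
    if r == "" or s == "":
        return s if r == "" else r                      # εR = Rε = R
    return f"(?:{r})(?:{s})"


def star(r):
    return "" if r is EMPTY or r == "" else f"(?:{r})*"


def eliminate(edges, states, s):
    """Remove state s: each arc q1 -> q2 becomes R11 + Q1 S* P1 (Figs. 3.7 and 3.8)."""
    S = edges.get((s, s), EMPTY)
    rest = [q for q in states if q != s]
    new = {}
    for q1 in rest:
        for q2 in rest:
            through_s = cat(edges.get((q1, s), EMPTY), cat(star(S), edges.get((s, q2), EMPTY)))
            new[q1, q2] = plus(edges.get((q1, q2), EMPTY), through_s)
    return new, rest


def state_elimination(states, edges, start, finals):
    result = EMPTY
    for q in finals:
        e, left = dict(edges), list(states)
        for s in states:                     # Step 3 (1): keep only the start state and q
            if s not in (start, q):
                e, left = eliminate(e, left, s)
        if q == start:                       # case (3): one state with loop R gives R*
            piece = star(e.get((start, start), EMPTY))
        else:                                # case (2): (R + S U* T)* S U*
            R, S = e.get((start, start), EMPTY), e.get((start, q), EMPTY)
            T, U = e.get((q, start), EMPTY), e.get((q, q), EMPTY)
            piece = cat(star(plus(R, cat(S, cat(star(U), T)))), cat(S, star(U)))
        result = plus(result, piece)         # case (4): sum over the final states
    return result


def strings_upto(alphabet, n):
    for length in range(n + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


# Example 3.5 as labeled arcs, and Example 3.6 after Step 1 (finals C and D).
EDGES_35 = {(1, 1): "1", (1, 2): "0", (2, 2): "(?:0|1)"}
EDGES_36 = {("A", "A"): "(?:0|1)", ("A", "B"): "1", ("B", "C"): "(?:0|1)", ("C", "D"): "(?:0|1)"}
regex_36 = state_elimination(["A", "B", "C", "D"], EDGES_36, "A", ["C", "D"])
print(regex_36)
```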


Example 3.5 revisited
– Apply the formula derived previously for the 2-state automaton (Fig. 3.9) directly, with $q_1$ as the start state, $q_2$ as the final state, $R = 1$, $S = 0$, $T = \emptyset$, and $U = 0 + 1$:
$(R + SU^*T)^*SU^* = (1 + 0(0 + 1)^*\emptyset)^*0(0 + 1)^* = 1^*0(0 + 1)^*$ (correct!)
[Figure: the DFA of Example 3.5 beside the 2-state automaton of Fig. 3.9, with states labeled $q_1$ and $q_2$]


Example 3.6
– Step 1: regarding all symbols on the arcs as RE’s, the automaton
[Figure: start state A with self-loop 0, 1; an arc 1 from A to B; an arc 0, 1 from B to C; an arc 0, 1 from C to D; C and D are the final states]
becomes
[Figure: the same automaton with each label 0, 1 replaced by the RE 0 + 1]


Example 3.6 (cont’d)
– Step 2: to remove B, apply the conversion of Figs. 3.7 and 3.8 with
$s = B$, $q_1 = A$, $q_2 = C$, $S = \emptyset$, $Q_1 = 1$, $P_1 = 0 + 1$, $R_{11} = \emptyset$,
so $R_{11} + Q_1S^*P_1 = \emptyset + 1\emptyset^*(0 + 1) = 1\varepsilon(0 + 1) = 1(0 + 1)$.


Example 3.6 (cont’d)
– For final state D, we have to remove C as well, with
$s = C$, $q_1 = A$, $q_2 = D$, $S = \emptyset$, $Q_1 = 1(0 + 1)$, $P_1 = 0 + 1$, $R_{11} = \emptyset$,
so $R_{11} + Q_1S^*P_1 = \emptyset + 1(0 + 1)\emptyset^*(0 + 1) = 1(0 + 1)(0 + 1)$.
[Figure: the automaton after removing B: A with self-loop 0 + 1, an arc 1(0 + 1) from A to C, and an arc 0 + 1 from C to D]


Example 3.6 (cont’d)
– By the 2-state conversion (Fig. 3.9), with $R = 0 + 1$, $q_1 = A$, $q_2 = D$, $S = 1(0 + 1)(0 + 1)$, $T = \emptyset$, and $U = \emptyset$, we get
$(R + SU^*T)^*SU^* = (0 + 1 + 1(0 + 1)(0 + 1)\emptyset^*\emptyset)^*\,1(0 + 1)(0 + 1)\emptyset^* = (0 + 1)^*1(0 + 1)(0 + 1)$
[Figure: A with self-loop 0 + 1 and an arc 1(0 + 1)(0 + 1) from A to D]


Example 3.6 (cont’d)
– For the other final state C, start from the following automaton:
[Figure: A with self-loop 0 + 1, an arc 1(0 + 1) from A to C, and an arc 0 + 1 from C to D]
We have to eliminate D, using the conversion of Figs. 3.7 and 3.8.


Example 3.6 (cont’d)
– Since D has no successor (and we now want the paths ending at the final state C), deleting D has no effect on the other parts, resulting in the following automaton:
[Figure: A with self-loop 0 + 1 and an arc 1(0 + 1) from A to C]
By the 2-state conversion (Fig. 3.9), with $R = 0 + 1$, $S = 1(0 + 1)$, $T = \emptyset$, and $U = \emptyset$, we get
$(R + SU^*T)^*SU^* = (0 + 1 + 1(0 + 1)\emptyset^*\emptyset)^*\,1(0 + 1)\emptyset^* = (0 + 1)^*1(0 + 1)$


Example 3.6 (cont’d)
– The final result is the sum of the two previous derivation results:
$(0 + 1)^*1(0 + 1) + (0 + 1)^*1(0 + 1)(0 + 1)$
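The hand-derived result can be compared against a direct simulation of the automaton. This is a minimal Python sketch, assuming the automaton of Example 3.6 is encoded as a transition map from (state, symbol) to a set of next states, read off the figure description above, and simulated by tracking the set of current states; the names are hypothetical. The derived RE is written for Python's `re`, with `|` for `+`.

```python
import re
from itertools import product

# Example 3.6 (assumed encoding): (state, symbol) -> set of next states.
NFA = {
    ("A", "0"): {"A"}, ("A", "1"): {"A", "B"},
    ("B", "0"): {"C"}, ("B", "1"): {"C"},
    ("C", "0"): {"D"}, ("C", "1"): {"D"},
}
START, FINALS = "A", {"C", "D"}

# (0 + 1)*1(0 + 1) + (0 + 1)*1(0 + 1)(0 + 1)
DERIVED = r"(?:0|1)*1(?:0|1)|(?:0|1)*1(?:0|1)(?:0|1)"


def nfa_accepts(w):
    """Simulate the NFA by keeping the set of all states reachable so far."""
    current = {START}
    for c in w:
        current = set().union(*(NFA.get((q, c), set()) for q in current))
    return bool(current & FINALS)


def agree_upto(n):
    """Compare the NFA and the derived RE on every 0/1 string of length <= n."""
    words = ("".join(p) for length in range(n + 1) for p in product("01", repeat=length))
    return all(nfa_accepts(w) == (re.fullmatch(DERIVED, w) is not None) for w in words)


print(agree_upto(8))
```

Both describe the strings whose second- or third-to-last symbol is 1.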


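3.2.3 Converting RE’s to Automata
– Theorem 3.7: Every language defined by an RE is also defined by an FA.
Proof. By induction on the structure of the RE, we build an ε-NFA with a single start state and a single accepting state.
Basis. There are three cases, as shown below:
[Figure: RE = $\varepsilon$: a start state with an ε-arc to an accepting state]
[Figure: RE = $\emptyset$: a start state and an accepting state with no arc between them]
[Figure: RE = $a$: a start state with an arc labeled $a$ to an accepting state]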


Induction. Three cases must be considered.
(1) RE = $R + S$
[Figure: a new start state with ε-arcs to the start states of the automata for $R$ and $S$; the accepting states of both have ε-arcs to a new accepting state]


(2) RE = $RS$
[Figure: the accepting state of the automaton for $R$ is joined by an ε-arc to the start state of the automaton for $S$]


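(3) RE = $R^*$
[Figure: a new start state with ε-arcs to the start state of the automaton for $R$ and to a new accepting state; the accepting state of $R$ has ε-arcs back to the start state of $R$ and to the new accepting state]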
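The basis and induction cases above can be coded almost one-to-one; this construction is commonly called Thompson's construction. This is a minimal Python sketch, assuming a hypothetical tuple encoding of RE’s (`("empty",)`, `("eps",)`, `("sym", a)`, `("plus", R, S)`, `("cat", R, S)`, `("star", R)`) and an ε-NFA represented as (start, accept, arcs) with `None` as the ε label. Acceptance is checked by following ε-closures, so no conversion to a DFA is needed for testing.

```python
from itertools import count


def thompson(tree, counter=None):
    """Build an ε-NFA (start, accept, arcs); an arc is (p, label, q), label None meaning ε."""
    if counter is None:
        counter = count()
    kind = tree[0]
    if kind in ("empty", "eps", "sym"):            # basis: two fresh states
        s, f = next(counter), next(counter)
        if kind == "empty":
            return s, f, []
        label = None if kind == "eps" else tree[1]
        return s, f, [(s, label, f)]
    if kind == "star":                             # induction case (3)
        s1, f1, a1 = thompson(tree[1], counter)
        s, f = next(counter), next(counter)
        return s, f, a1 + [(s, None, s1), (s, None, f), (f1, None, s1), (f1, None, f)]
    s1, f1, a1 = thompson(tree[1], counter)
    s2, f2, a2 = thompson(tree[2], counter)
    if kind == "cat":                              # induction case (2)
        return s1, f2, a1 + a2 + [(f1, None, s2)]
    if kind == "plus":                             # induction case (1)
        s, f = next(counter), next(counter)
        return s, f, a1 + a2 + [(s, None, s1), (s, None, s2), (f1, None, f), (f2, None, f)]
    raise ValueError(f"unknown node {kind!r}")


def eclose(states, arcs):
    """All states reachable from `states` by ε-arcs alone."""
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for x, label, q in arcs:
            if x == p and label is None and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def accepts(machine, w):
    start, accept, arcs = machine
    current = eclose({start}, arcs)
    for c in w:
        current = eclose({q for p, label, q in arcs if p in current and label == c}, arcs)
    return accept in current


# Example 3.8 (below): (0 + 1)*1(0 + 1)
bit = ("plus", ("sym", "0"), ("sym", "1"))
example_3_8 = ("cat", ("cat", ("star", bit), ("sym", "1")), bit)
machine = thompson(example_3_8)
print(accepts(machine, "0011"), accepts(machine, "100"))
```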


– Example 3.8 (see Fig. 3.18 in the textbook): convert the RE $(0 + 1)^*1(0 + 1)$ into an ε-NFA.
(a) $0 + 1$
[Figure: the basis automata for 0 and 1 combined by case (1)]


(b) $(0 + 1)^*$
[Figure: the automaton of (a) wrapped by case (3) for closure]


(c) $(0 + 1)^*1(0 + 1)$
Connect each two consecutive parts by an ε-transition (case (2)).
[Figure: the automata for $(0 + 1)^*$, 1, and $0 + 1$ joined in sequence by ε-arcs]


3.3 Applications of RE’s

Two examples of uses of RE’s:
– Lexical analysis
– Text search

3.3.1 RE’s in UNIX
– The RE’s used in UNIX are extended versions of RE’s; some extensions (such as back-references to previously matched text) even allow non-regular languages to be recognized.


– Rules for character classes:
The symbol . (dot) stands for any character.
[a1a2…ak] stands for $a_1 + a_2 + \cdots + a_k$.
[a1-ak] stands for [a1a2…ak], the range of characters from $a_1$ to $a_k$.
e.g., [0-9] stands for [0123456789], i.e., $0 + 1 + \cdots + 9$
[A-Z] stands for $A + B + \cdots + Z$
[A-Za-z0-9] stands for the set of all letters and digits
[+.0-9] stands for the characters used in forming signed decimal numbers
Special notations, e.g., [:digit:] = [0-9], [:alpha:] = [A-Za-z], [:alnum:] = [A-Za-z0-9]


– Operators used in UNIX:

| is used for union, like + in RE’s.

? means “zero or one of”: R? stands for $\epsilon + R$

+ means “one or more of”: R+ stands for $RR^*$ $(= R^+)$

{n} means “n copies of”: R{5} stands for $RRRRR$ $(= R^5)$

– * is still used in UNIX, with its usual meaning “zero or more of”.
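The operator definitions can be checked mechanically on a finite sample of strings. This is a sketch under stated assumptions: Python's `re` stands in for UNIX RE syntax, `same_language` is a hypothetical helper, and the sample R = (ab|b) is arbitrary. Agreement on all strings up to a fixed length is evidence, not a proof.

```python
import itertools
import re


def same_language(p, q, alphabet="ab", max_len=6):
    """True if p and q fully match the same strings of length <= max_len (finite check only)."""
    for n in range(max_len + 1):
        for t in itertools.product(alphabet, repeat=n):
            w = "".join(t)
            if bool(re.fullmatch(p, w)) != bool(re.fullmatch(q, w)):
                return False
    return True


R = "(ab|b)"  # an arbitrary sample R
print(same_language(f"{R}?", f"(|{R})"))    # R? = ε + R
print(same_language(f"{R}+", f"{R}{R}*"))   # R+ = RR*
print(same_language(f"{R}{{3}}", R * 3))    # R{3} = RRR
```

The empty first alternative in `(|...)` plays the role of ε.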


3.3.2 Lexical analysis

– Example recalled (from Chapter 1)

'[A-Z][a-z]*[ ][A-Z][A-Z]'

means the following RE

(A+B+…+Z)(a+b+…+z)*_(A+B+…+Z)(A+B+…+Z)

where _ means a blank.

The above can be used to represent city-and-state pairs like Ithaca NY, Buffalo NY, …
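The Chapter 1 pattern can be tried directly with Python's `re` module. This sketch assumes that `fullmatch` is used, so the whole string must fit the pattern, and it writes the blank as [ ] as the slide does.

```python
import re

# The Chapter 1 pattern, with [ ] for the blank.
CITY_STATE = re.compile(r"[A-Z][a-z]*[ ][A-Z][A-Z]")

for s in ["Ithaca NY", "Buffalo NY", "New York NY", "ithaca NY"]:
    print(s, bool(CITY_STATE.fullmatch(s)))
```

"New York NY" is rejected because the pattern allows only a one-word city name. This is one reason such patterns are only a starting point.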


– Each rule of a UNIX lex or flex specification has the form:

UNIX-style RE {code to execute when the RE matches}

– Examples

else {return(ELSE);}

[A-Za-z][A-Za-z0-9]* {code to enter the found identifier in the symbol table; return(ID);}

>= {return(GE);}

…
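The behavior of such a rule list can be imitated in a few lines of Python. This is a sketch, not lex itself. It follows lex's conflict resolution: the longest match wins, and on a tie the rule listed first wins. That is why "else" becomes ELSE but "elsewhere" becomes ID. The token names come from the slide; the whitespace rule and the `enter_id` symbol-table action are hypothetical.

```python
import re


def enter_id(text, table):
    """Hypothetical symbol-table action: record the identifier, then return ID."""
    table.setdefault(text, len(table))
    return "ID"


# (pattern, action) pairs in the same order as the lex rules above.
# The whitespace rule is an added assumption, not part of the slide.
RULES = [
    (re.compile(r"else"), lambda text, table: "ELSE"),
    (re.compile(r"[A-Za-z][A-Za-z0-9]*"), enter_id),
    (re.compile(r">="), lambda text, table: "GE"),
    (re.compile(r"[ \t\n]+"), lambda text, table: None),  # skip blanks
]


def tokenize(src):
    tokens, table, pos = [], {}, 0
    while pos < len(src):
        best_len, best_action = 0, None
        for pattern, action in RULES:
            m = pattern.match(src, pos)
            # Longest match wins; on a tie, the earlier rule is kept.
            if m and len(m.group()) > best_len:
                best_len, best_action = len(m.group()), action
        if best_action is None:
            raise SyntaxError(f"no rule matches at position {pos}")
        token = best_action(src[pos:pos + best_len], table)
        if token is not None:
            tokens.append(token)
        pos += best_len
    return tokens, table


print(tokenize("else elsewhere >= x1"))
# (['ELSE', 'ID', 'GE', 'ID'], {'elsewhere': 0, 'x1': 1})
```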


3.3.3 Finding Patterns in Text

– We can use UNIX RE’s to search for patterns in Web pages.

– Example: a UNIX RE for street addresses (incomplete)

'[0-9]+[A-Z]?[ ][A-Z][a-z]*([ ][A-Z][a-z]*)*[ ](Street|St\.|Avenue|Ave\.|Road|Rd\.)'

e.g., 123A Main Street, 20 Ta Hsueh Rd., …

– Notes:
1. The textbook is inconsistent here; the blanks in its RE should be written as [ ] (see p. 4 and p. 113 of the textbook).
2. The backslash distinguishes a literal dot from the dot that means “any character”.
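A search over running text, rather than a whole-string match, can be sketched with Python's `re.finditer`. This assumes Python's syntax agrees with the UNIX RE above for these constructs. The sample sentence is made up for illustration.

```python
import re

STREET = re.compile(
    r"[0-9]+[A-Z]?[ ][A-Z][a-z]*([ ][A-Z][a-z]*)*[ ](Street|St\.|Avenue|Ave\.|Road|Rd\.)"
)

text = "Visit us at 123A Main Street or at 20 Ta Hsueh Rd., Hsinchu."
print([m.group() for m in STREET.finditer(text)])
# ['123A Main Street', '20 Ta Hsueh Rd.']
```

Backtracking matters here. The repeated group ([ ][A-Z][a-z]*)* first swallows "Rd" as if it were part of the street name, then gives it back so that "Rd." can match the final alternative.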


3.4 Algebraic Laws for RE’s

Purpose:
– To derive “high-level” algebraic laws for equivalent RE’s.

Two RE’s are said to be equivalent if the languages they define are identical.

The RE’s to be discussed contain variables, instead of just constants like $\epsilon$, 0, 1, a, 01, …


3.4.1 Associativity & Commutativity

– Assume L, M, and N are RE’s (variables).

– Commutative law for union: $L + M = M + L$

– Associative law for union: $(L + M) + N = L + (M + N)$

– Associative law for concatenation: $(LM)N = L(MN)$

(Note: there is no commutative law for concatenation; e.g., $01 \neq 10$.)
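These laws can be spot-checked by modeling languages as Python sets of strings. In this sketch, strings longer than a fixed bound `MAX_LEN` are dropped so that every set stays finite. The sample languages are arbitrary stand-ins for L, M, and N, and agreement on samples is evidence, not a proof.

```python
MAX_LEN = 6  # keep only strings of length <= MAX_LEN (a finite sample, not a proof)


def concat(A, B):
    """Concatenation of two languages, truncated to MAX_LEN."""
    return {x + y for x in A for y in B if len(x + y) <= MAX_LEN}


# Sample languages standing in for the variables L, M, N (an arbitrary choice).
L, M, N = {"0", "01"}, {"", "1"}, {"10"}

print(L | M == M | L)                                      # L + M = M + L
print((L | M) | N == L | (M | N))                          # (L + M) + N = L + (M + N)
print(concat(concat(L, M), N) == concat(L, concat(M, N)))  # (LM)N = L(MN)
print(concat({"0"}, {"1"}) == concat({"1"}, {"0"}))        # {'01'} vs {'10'}: False
```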


3.4.2 Identities and Annihilators

– $\emptyset$ is the identity for union: $\emptyset + L = L + \emptyset = L$

– U is the annihilator for union: $U + L = L + U = U$

– $\epsilon$ is the identity for concatenation: $\epsilon L = L\epsilon = L$

– $\emptyset$ is the annihilator for concatenation: $\emptyset L = L\emptyset = \emptyset$
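The same finite-set model can illustrate the identities and annihilators. In this sketch, ∅ is the empty set, ε is {""}, and U is approximated by all strings over {0, 1} up to `MAX_LEN`. That approximation only behaves like U when the sample L uses the same alphabet and length bound.

```python
from itertools import product

MAX_LEN = 4
SIGMA = "01"
# All strings over SIGMA up to MAX_LEN: a truncated stand-in for U.
U = {"".join(t) for n in range(MAX_LEN + 1) for t in product(SIGMA, repeat=n)}
EMPTY, EPS = set(), {""}


def concat(A, B):
    """Concatenation of two languages, truncated to MAX_LEN."""
    return {x + y for x in A for y in B if len(x + y) <= MAX_LEN}


L = {"", "0", "011"}  # an arbitrary sample language inside U

print(EMPTY | L == L | EMPTY == L)                   # ∅ + L = L + ∅ = L
print(U | L == L | U == U)                           # U + L = L + U = U
print(concat(EPS, L) == concat(L, EPS) == L)         # εL = Lε = L
print(concat(EMPTY, L) == concat(L, EMPTY) == EMPTY) # ∅L = L∅ = ∅
```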


3.4.3 Distributive Laws

– Left distributive law of concatenation over union: $L(M + N) = LM + LN$

– Right distributive law of concatenation over union: $(M + N)L = ML + NL$

Note: U denotes the universal language (the set of all strings over the alphabet), used in 3.4.2.
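Both distributive laws can be spot-checked in the same set model. This is a sketch with arbitrary sample languages. Union is Python's `|`, and concatenation is truncated at `MAX_LEN`.

```python
MAX_LEN = 6  # strings longer than this are dropped (finite sample, not a proof)


def concat(A, B):
    """Concatenation of two languages, truncated to MAX_LEN."""
    return {x + y for x in A for y in B if len(x + y) <= MAX_LEN}


L, M, N = {"", "0"}, {"1", "10"}, {"11"}  # arbitrary sample languages

print(concat(L, M | N) == concat(L, M) | concat(L, N))  # L(M + N) = LM + LN
print(concat(M | N, L) == concat(M, L) | concat(N, L))  # (M + N)L = ML + NL
```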


3.4.4 The Idempotent Law

– Idempotent law for union: $L + L = L$

Note: an operation is “idempotent” if applying it to a value and itself returns that same value.


3.4.5 Laws Involving Closures

– $(L^*)^* = L^*$

– $\emptyset^* = \epsilon$ and $\epsilon^* = \epsilon$

– $L^+ = L^*L = LL^*$

– $(L + M)^* = (L^*M^*)^*$

– $L^* = L^+ + \epsilon$ (easy)

– $L? = \epsilon + L$ (the definition of ? given earlier)

(For proofs, see the textbook.)
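The closure laws can be spot-checked by adding a truncated Kleene star to the set model. In this sketch, `star` keeps appending one more factor from A until no new string of length at most `MAX_LEN` appears. The sample languages are arbitrary, and agreement is evidence only.

```python
MAX_LEN = 6  # strings longer than this are dropped (finite sample, not a proof)


def concat(A, B):
    """Concatenation of two languages, truncated to MAX_LEN."""
    return {x + y for x in A for y in B if len(x + y) <= MAX_LEN}


def star(A):
    """Kleene closure truncated to MAX_LEN: keep adding one factor from A until nothing new appears."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, A) - result
        result |= frontier
    return result


def plus(A):
    return concat(A, star(A))


L, M = {"0", "11"}, {"1"}  # arbitrary sample languages
EMPTY, EPS = set(), {""}

print(star(star(L)) == star(L))                             # (L*)* = L*
print(star(EMPTY) == EPS and star(EPS) == EPS)              # ∅* = ε, ε* = ε
print(plus(L) == concat(star(L), L) == concat(L, star(L)))  # L+ = L*L = LL*
print(star(L | M) == star(concat(star(L), star(M))))        # (L + M)* = (L*M*)*
print(star(L) == plus(L) | EPS)                             # L* = L+ + ε
```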


3.4.6 & 3.4.7 Discovering Laws for RE’s and A Test for an RE Algebraic Law

– It can be proved that $(L + M)^* = (L^*M^*)^*$ is true iff $(a + b)^* = (a^*b^*)^*$ is true.


– That is, replace each variable in an RE equality with a distinct single symbol, and check whether the resulting concrete RE equality is true; if so, the original RE equality is also true.

Proof. By Theorems 3.13 and 3.14. For details, see the textbook. (iff = if and only if)
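The concretization step of the test can be sketched with Python's `re`. Substitute L := a and M := b, then compare the two concrete REs on every string over {a, b} up to a length bound. The helper `same_language` is hypothetical. Its exhaustive comparison is only a finite check, whereas the theorem concerns equality of the full concrete languages. The second line shows how a false "law" is exposed by a short counterexample.

```python
import itertools
import re


def same_language(p, q, alphabet="ab", max_len=8):
    """Compare two concrete REs on every string up to max_len (a finite check, not a proof)."""
    for n in range(max_len + 1):
        for t in itertools.product(alphabet, repeat=n):
            w = "".join(t)
            if bool(re.fullmatch(p, w)) != bool(re.fullmatch(q, w)):
                return False
    return True


# (L + M)* = (L*M*)*  ->  concrete version with L := a, M := b
print(same_language("(a|b)*", "(a*b*)*"))  # True on all strings tested
# (L + M)* = L* + M* is not a law: its concrete version already fails on "ab"
print(same_language("(a|b)*", "a*|b*"))    # False
```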


3.4.7a Some RE Equalities (supplemental)

– $\emptyset^* = \epsilon^* = \epsilon$

– $rr^* = r^*r$

– $r^* = r^*r^* = (r^*)^* = r^* + r^*$

– $r^* = \epsilon + rr^* = \epsilon + r^*r = \epsilon + r^* = (\epsilon + r)^* = (\epsilon + r)r^*$

– $r^* = (r + r^2 + \dots + r^k)^*$ $(k \ge 1)$

(For proofs, see the text and exercises of Chapter 6 in my Chinese textbook.)
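Several of these equalities can be spot-checked with a compound choice for r instead of a single symbol. This sketch assumes the arbitrary sample r = (ab|b) and Python's `re` syntax, in which an empty alternative `(|...)` plays the role of ε. The check covers strings up to length 7 only.

```python
import itertools
import re


def same_language(p, q, alphabet="ab", max_len=7):
    """Compare two concrete REs on every string up to max_len (a finite check, not a proof)."""
    for n in range(max_len + 1):
        for t in itertools.product(alphabet, repeat=n):
            w = "".join(t)
            if bool(re.fullmatch(p, w)) != bool(re.fullmatch(q, w)):
                return False
    return True


r = "(ab|b)"  # an arbitrary concrete choice for the variable r
pairs = [
    (f"{r}{r}*", f"{r}*{r}"),               # rr* = r*r
    (f"{r}*", f"{r}*{r}*"),                 # r* = r*r*
    (f"{r}*", f"({r}*)*"),                  # r* = (r*)*
    (f"{r}*", f"(|{r}{r}*)"),               # r* = ε + rr*
    (f"{r}*", f"(|{r}){r}*"),               # r* = (ε + r)r*
    (f"{r}*", f"({r}|{r}{r}|{r}{r}{r})*"),  # r* = (r + r^2 + r^3)*, k = 3
]
print(all(same_language(p, q) for p, q in pairs))  # True
```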


– $r^* = \epsilon + r + r^2 + \dots + r^{k-1} + r^k r^*$ $(k \ge 1)$

– $(p + q)^* = (p^* + q^*)^* = (p^*q^*)^* = p^*(qp^*)^* = (p^*q)^*p^*$

– $(pq)^*p = p(qp)^*$

– $(p^*q)^* = \epsilon + (p + q)^*q$

– $(pq^*)^* = \epsilon + p(p + q)^*$

(For proofs, see the text and exercises of Chapter 6 in my Chinese textbook.)
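Following the test of 3.4.7, each two-variable equality can be concretized with p := a and q := b and then compared on all short strings. This is a sketch: Python's `re` stands in for RE notation, `(|...)` plays the role of ε, and agreement up to length 8 is evidence rather than the proof the test calls for.

```python
import itertools
import re


def same_language(p, q, alphabet="ab", max_len=8):
    """Compare two concrete REs on every string up to max_len (a finite check, not a proof)."""
    for n in range(max_len + 1):
        for t in itertools.product(alphabet, repeat=n):
            w = "".join(t)
            if bool(re.fullmatch(p, w)) != bool(re.fullmatch(q, w)):
                return False
    return True


# Concretized with p := a, q := b.
laws = {
    "(p+q)* = (p*+q*)*": ("(a|b)*", "(a*|b*)*"),
    "(p+q)* = (p*q*)*": ("(a|b)*", "(a*b*)*"),
    "(p+q)* = p*(qp*)*": ("(a|b)*", "a*(ba*)*"),
    "(p+q)* = (p*q)*p*": ("(a|b)*", "(a*b)*a*"),
    "(pq)*p = p(qp)*": ("(ab)*a", "a(ba)*"),
    "(p*q)* = ε + (p+q)*q": ("(a*b)*", "(|(a|b)*b)"),
    "(pq*)* = ε + p(p+q)*": ("(ab*)*", "(|a(a|b)*)"),
}
for name, (p, q) in laws.items():
    print(name, same_language(p, q))
```

In the last two rows, (p*q)* is the set of strings that are empty or end in q, and (pq*)* is the set of strings that are empty or begin with p. Reading the concrete versions this way is often the quickest sanity check.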