regular expressions chapter 6 1. regular languages regular language regular expression finite state...

62
Regular Expressions Chapter 6 1

Upload: gregory-norman

Post on 05-Jan-2016

255 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Regular Expressions

Chapter 61

Page 2: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Regular Languages

Regular Language

Regular Expression

Finite State Machine

L

Accepts

2

Page 3: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Regular Expressions

The regular expressions over an alphabet are all and only the strings that can be obtained as follows:

1. is a regular expression.2. is a regular expression.3. Every element of is a regular expression.4. If , are regular expressions, then so is .5. If , are regular expressions, then so is .6. If is a regular expression, then so is *.7. is a regular expression, then so is +.8. If is a regular expression, then so is ().

3

Page 4: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Regular Expression Examples

If = {a, b}, the following are regular expressions:

a(a b)*abba

4

Page 5: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Regular Expressions Define Languages

Define L, a semantic interpretation function for regular expressions:

1. L() = .2. L() = {}.3. L(c), where c = {c}.4. L() = L() L(). 5. L( ) = L() L(). 6. L(*) = (L())*. 7. L(+) = L(*) = L() (L())*. If L() is equal to , then

L(+) is also equal to . Otherwise L(+) is the language that is formed by concatenating together one or more strings drawn from L().

8. L(()) = L(). 5

Page 6: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

The Role of the Rules

• Rules 1, 3, 4, 5, and 6 give the language its power to define sets.

• Rule 8 has as its only role grouping other operators. • Rules 2 and 7 appear to add functionality to the

regular expression language, but they don’t.

2. is a regular expression.

7. is a regular expression, then so is +.

6

Page 7: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Analyzing a Regular Expression

L((a b)*b)

= L((a b)*) L(b)

= (L((a b)))* L(b)

= (L(a) L(b))* L(b)

= ({a} {b})* {b}

= {a, b}* {b}.

7

Page 8: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Examples

L( a*b* ) =

L( (a b)* ) =

L( (a b)*a*b* ) =

L( (a b)*abba(a b)* ) =

8

Page 9: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Going the Other Way

L = {w {a, b}*: |w| is even}

9

Page 10: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Going the Other Way

L = {w {a, b}*: |w| is even}

((a b) (a b))*

(aa ab ba bb)*

10

Page 11: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Going the Other Way

L = {w {a, b}*: |w| is even}

((a b) (a b))*

(aa ab ba bb)*

L = {w {a, b}*: w contains an odd number of a’s}

11

Page 12: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Going the Other Way

L = {w {a, b}*: |w| is even}

((a b) (a b))*

(aa ab ba bb)*

L = {w {a, b}*: w contains an odd number of a’s}

b* (ab*ab*)* a b*

b* a b* (ab*ab*)*

12

Page 13: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

More Regular Expression Examples

L ( (aa*) ) =

L ( (a )* ) =

L = {w {a, b}*: there is no more than one b in w}

L = {w {a, b}* : no two consecutive letters in w are the same}

13

Page 14: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Common Idioms

( ) optional

(a b)* *, where = {a, b}

14

Page 15: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Operator Precedence in Regular Expressions

Regular ArithmeticExpressions Expressions

Highest Kleene star exponentiation

concatenation multiplication

Lowest union addition

a b* c d* x y2 + i j2

15

Page 16: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

The Details Matter

a* b* (a b)*

(ab)* a*b*

16

Page 17: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Kleene’s Theorem

Finite state machines and regular expressions define the same class of languages. To prove this, we must show:

Theorem: Any language that can be defined with a regular expression can be accepted by some FSM and so is regular.

Theorem: Every regular language (i.e., every language that can be accepted by some DFSM) can be defined with a regular expression.

To prove A = B, we have to prove:

1. A B and 2. B A

17

Page 18: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

For Every Regular Expression There is a Corresponding FSM

We’ll show this by construction. An FSM for:

:

18

Page 19: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

For Every Regular Expression There is a Corresponding FSM

We’ll show this by construction. An FSM for:

:

19

Page 20: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

For Every Regular Expression There is a Corresponding FSM

We’ll show this by construction. An FSM for:

:

A single element c of :

20

Page 21: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

For Every Regular Expression There is a Corresponding FSM

We’ll show this by construction. An FSM for:

:

A single element c of :

21

Page 22: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

For Every Regular Expression There is a Corresponding FSM

We’ll show this by construction. An FSM for:

:

A single element c of :

(*):

22

Page 23: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

For Every Regular Expression There is a Corresponding FSM

We’ll show this by construction. An FSM for:

:

A single element c of :

(*):

23

Page 24: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Union

If is the regular expression and if both L() and L() are regular:

24

Page 25: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Union

S3

25

Page 26: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Concatenation

If is the regular expression and if both L() and L() are regular:

26

Page 27: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Concatenation

27

Page 28: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Kleene Star

If is the regular expression * and if L() is regular:

28

Page 29: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Kleene Star

S2

29

Page 30: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

An Example

(b ab)*

An FSM for b An FSM for a An FSM for b

An FSM for ab:

30

Page 31: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

An Example

(b ab)*

An FSM for (b ab):

31

Page 32: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

An Example

(b ab)*

An FSM for (b ab)*:

32

Page 33: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

The Algorithm regextofsm

regextofsm(: regular expression) =

Beginning with the primitive subexpressions of and working outwards until an FSM for all of has been built do:

Construct an FSM as described above.

33

Page 34: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

For Every FSM There is a Corresponding Regular Expression

We’ll show this by construction.

The key idea is that we’ll allow arbitrary regular expressions to label the transitions of an FSM.

34

Page 35: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

A Simple Example

Let M be:

Suppose we rip out state 2:

35

Page 36: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

The Algorithm fsmtoregexheuristic fsmtoregexheuristic(M: FSM) = 1. Remove unreachable states from M. 2. If M has no accepting states then return . 3. If the start state of M is part of a loop, create a new start state s and connect s to M’s start state via an -transition. 4. If there is more than one accepting state of M or there are any transitions out of any of them, create a new accepting state and connect each of M’s accepting states to it via an -transition.

The old accepting states no longer accept. 5. If M has only one state then return . 6. Until only the start state and the accepting state remain do:

6.1 Select rip (not s or an accepting state). 6.2 Remove rip from M. 6.3 *Modify the transitions among the remaining states so M accepts the same strings.

7. Return the regular expression that labels the one remaining transition from the start state to the accepting state. 36

Page 37: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

An Example

1. Create a new initial state and a new, unique accepting state, neither of which is part of a loop.

37

Page 38: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

An Example, Continued

2. Remove states and arcs and replace with arcs labelled with larger and larger regular expressions.

It’s to create a source and a sink!

38

Page 39: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

An Example, Continued

Remove state 3:

39

Page 40: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

An Example, Continued

Remove state 2:

40

Page 41: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

An Example, Continued

Remove state 1:

41

The goal is to keep the source and the sink only!

Page 42: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

It’s not always easy

M =

42

Try removing state [2]!

Page 43: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Further Modifications to M Before We Start We require that, from every state other than the accepting state there must be exactly one transition to every state (including itself) except the start state. And into every state other than the start state there must be exactly one transition from every state (including itself) except the accepting state.

1. If there is more than one transition between states p and q, collapse them into a single transition:

becomes:

43

Page 44: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Further Modifications to M Before We Start

2. If any of the required transitions are missing, add them:

becomes:

44

Page 45: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Ripping Out States3. Choose a state. Rip it out. Restore functionality.

Suppose we rip state 2.

45

Page 46: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

What Happens When We Rip?

Consider any pair of states p and q. Once we remove rip, how can M get from p to q?

● It can still take the transition that went directly from p to q, or ● It can take the transition from p to rip. Then, it can take the transition from rip back to itself zero or more times. Then it can take the transition from rip to q.

46

Page 47: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Defining R(p, q)

After removing rip, the new regular expression that should label the transition from p to q is:

R(p, q) /* Go directly from p to q /* or

R(p, rip) /* Go from p to rip, thenR(rip, rip)* /* Go from rip back to itself

any number of times, thenR(rip, q) /* Go from rip to q

Without the comments, we have:

R = R(p, q) R(p, rip) R(rip, rip)* R(rip, q)

47

Page 48: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Returning to Our Example

R = R(p, q) R(p, rip) R(rip, rip)* R(p, rip)

Let rip be state 2. Then:

R (1, 3) = R(1, 3) R(1, rip)R(rip, rip)*R(rip, 3) = R(1, 3) R(1, 2)R(2, 2)*R(2, 3)

= a b* a= ab*a 48

Page 49: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Returning to Our Example

49

R (4, 4) = R(4, 3) R(4, 2)R(2, 2)*R(2, 4)

= b b* =

1 4

3

bb*aab*a

b

R (4, 3) = R(4, 3) R(4, 2)R(2, 2)*R(2, 3)

= b b* a = bb*a

R (1, 4) = R(1, 4) R(1, 2)R(2, 2)*R(2, 4)

= b a b* = b

Rip state 4:R (1, 3) = R(1, 3) R(1, 4)R(4, 4)*R(4, 3)

= ab*a b * bb*a = ab*a b bb*a = ab*a bbb*a

Page 50: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

The Algorithm fsmtoregex

fsmtoregex(M: FSM) = 1. M = standardize(M: FSM). 2. Return buildregex(M).

standardize(M: FSM) = 1. Remove unreachable states from M. 2. If necessary, create a new start state. 3. If necessary, create a new accepting state. 4. If there is more than one transition between states p

and q, collapse them. 5. If any transitions are missing, create them with label

.

50

Page 51: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

The Algorithm fsmtoregex

buildregex(M: FSM) = 1. If M has no accepting states then return . 2. If M has only one state, then return . 3. Until only the start and accepting states remain do:

3.1 Select some state rip of M. 3.2 For every transition from p to q, if both p and q are not rip then do

Compute the new label R for the transition from p to q:

R (p, q) = R(p, q) R(p, rip) R(rip,

rip)* R(rip, q)

3.3 Remove rip and all transitions into and out of it. 4. Return the regular expression that labels the transition from the start state to the accepting state.

51

The case of p = q should also be considered!

Page 52: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Regular Expression or FSM

• Kleene’s Theorem– Regular expression FSM

• Q: when to use regular expression and when to use FSM to describe a regular language?

• Order– Regular expression: must specify the order in

which a sequence of symbols must occur• Phone number, email address, etc.

– FSM: order doesn’t matter• Vending machine, parity checking, etc.

• Sometimes it’s easier to do it one way, sometimes the other.

52

Page 53: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Sometimes Writing Regular Expressions is Easy

• No two consecutive letters are the same– (b )(ab)*(a ) or (a )(ba)*(b )

• Floating point number– ( + -)D+( .D+)( (E( + -)D+))

where

D = (0 1 2 3 4 5 6 7 8 9)

53

Page 54: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Sometimes Building a DFSM is Easy

A Special Case of Pattern Matching

Suppose that we want to match a pattern that is composed of a set of keywords. Then we can write a regular expression of the form:

(* (k1 k2 … kn) *)+

We can use regextofsm to build an FSM. But …

We can instead use buildkeywordFSM.

54

Page 55: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Recognize {cat, bat, cab}

The single keyword cat:

55

Page 56: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

{cat, bat, cab}

Adding bat and cab:

56

Page 57: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

{cat, bat, cab}

Adding transitions to recover after a path dies:

57

Page 58: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Using Regular Expressions in the Real World

Matching numbers

Matching IP addresses

Scanning valid email address

Determining legal password

Finding doubled words (e.g., “the the” in word processor)

Identifying spam

58

Page 59: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Using Substitution

Building a chatbot:

On input:

<phrase1> is <phrase2>

the chatbot will reply:

Why is <phrase1> <phrase2>?

Example: <user> The food there is awful<chatbot> Why is the food there awful?

59

Page 60: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Simplifying Regular Expressions

Regular expression as sets: ● Union is commutative: = ● Union is associative: ( ) = ( ) ● is the identity for union: = = ● Union is idempotent: = ● If B A, A B = A

Concatenation: ● Concatenation is associative: () = () ● is the identity for concatenation: = = ● is a zero for concatenation: = =

Concatenation distributes over union: ● ( ) = ( ) ( ) ● ( ) = ( ) ( ) 60

Page 61: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Simplifying Regular Expressions

Kleene star: ● * = ● * = ● (*)* = * ● ** = * ● If L(*) L(*) then ** = * ● ( )* = (**)* ● If L() L(*) then ( )* = *

61

Page 62: Regular Expressions Chapter 6 1. Regular Languages Regular Language Regular Expression Finite State Machine L Accepts 2

Example

● ((a* U )* aa)(b bb)* b* ((a b)* b* ab)*= ((a* )* aa)(b bb)* b* ((a b)* b* ab)*= (a* aa)(b bb)* b* ((a b)* b* ab)*= a* (b bb)* b* ((a b)* b* ab)*= a* b* b* ((a b)* b* ab)*= a* b* ((a b)* b* ab)*= a* b* ((a b)* ab)*= a* b* (a b)*= a* (a b)*= (a b)*

62