computability regular expressions. languages defined by regular expresses = regular languages...

17
Computability Regular expressions. Languages defined by regular expresses = Regular languages (languages recognized by FSM). Applications. Pumping lemma. Homework: Postings? Watch videos.

Upload: ella-bruce

Post on 06-Jan-2018

235 views

Category:

Documents


0 download

DESCRIPTION

Regular expression definition. For given alphabet: the empty string is a regular expression the empty set is a regular expression any single letter is a regular expression For any regular expressions R and S, then –R union S is a regular expession –R concatenate S is a regular expression –R star is a regular expression

TRANSCRIPT

Computability

Regular expressions. Languages defined by regular expresses = Regular languages

(languages recognized by FSM). Applications. Pumping lemma.

Homework: Postings? Watch videos.

Regular expressions, cont.

• Want to prove relationship.• Two directions.

– Regular expressions are defined by an inductive process so need to show that steps of induction can lead to definitions by FSM (or NDFSM)

– Take ANY FSM and show a way to define its regular expression. Will use [yet another type of] machine. Use induction.

Regular expression definition.

For given alphabet:• the empty string is a regular expression• the empty set is a regular expression• any single letter is a regular expression• For any regular expressions R and S, then

– R union S is a regular expession– R concatenate S is a regular expression– R star is a regular expression

Building the FSMs for any regular expression

There is a FSM for any the empty string, any single letter, the empty set.

[Though the Sipser text did not mention this, I believe it is necessary to do an induction on the length of the expression…]

If there is a regular expression R that is defined by R1 union R2, then we can define a FSM for R using the FSMs for R1 and R2.

Similarly, for concatenation and for star.

Opposite direction• Given a FSM, produce the regular expression that

describes the same language.• Use (yet another) new machine: generalized

nondeterministic finite state machine:– arrows from state to state will be regular expressions– start state has no incoming arrows; outgoing to every state– single accepting state with no outgoing arrows; incoming from

every state– start state different from accepting state– there are arrows between all the other states, but note: there are

some arrows labeled with the empty set– GNFSM accepts a string by going from state to state using

transition labels as regular expressions and ending in the single accepting state.

Process of proof• Convert any FSM to a GNFSM

– add a new start state and arrow labeled with the empty string from it to the original start state

– add a new real accepting state, and add labels with the empty string going to it from all old accepting states

– if there are more than one arrow from any state S to any state T, replace by one new arrow with the union of the labels

– add arrows labeled with empty set between any pair of states without arrows

– CLAIM: this GNFSM accepts the same languages as the FSM.• Convert any GNFSM to a GNFSM with exactly 2 states,

the arrow from start to accepting is the regular expression

Break for joke(s)

• Theorem: All horses have an infinite number of legs.

• Lemma: All horses are of the same color.

• Homework: make a posting of explanation of problem with the lemma.

Induction proof

• Take any GNFSM. It must have at least two states.– If it has just 2 states, then we are done.– if it has >2 states, show we can reduce it by

one state.

Inductive step• Select a state, call it Q, different from starting and

accepting state. We will remove Q from the machine and/but modify all other arcs to include what Q did.

• For any two states Si and Sj in the remaining states, – let R1 by the label from Si to Q– R2 by the label from Q to itself, – R3 be the label from Q to Sj and – R4 be the (original) label from Si to Sj.

• Then the new label from Si to Sj isR1R2*R3 ⊔ R4

• Claim this new GNFSM accepts the same strings as the former one and has 1 fewer states.

Pumping lemma• The term pumping is for pumping out new

strings.• If A is a regular language, there is a number p

such that if a string s in A of length at least p, then we can divide s into 3 parts: s = xyz, so that– xyiz is in A, for i>=0. That is, repeat the y substring– |y| (length of y) >0– |xy| < pNote: this means that if there are strings of arbitrarity

large size (length), then we can take one and pump out others….

Proof of pumping lemma• There are [only] a finite number of states.• Let p be the number of states of the FSM for A.• Assume there is a string of length at least p. This

string must have visited at least one state more than one time.– pigeonhole principle– call one such state Q

• Let x be the string up to Q, y be the string that comes back to Q, z be the rest of the string.

• xyz satisfies the conditions.

Examples of non-regular languages

• B = {0i1i | i>=0}. B is strings with the a set number of 0s followed by the same number of 1s– Apply pumping lemma. Take a string and

what could it look like?

• C = {w | w has equal number of 0s and 1s}– Apply pumping lemma to strings of form 0i1i to

get strings with different number of 0s and 1s

Reflection

• FSM only have finite number of states.• The next type of machine (push down

automata) also has only a finite number of states BUT will have a way of saving information.

• Turing machines have a finite number of states BUT allow for writing (and reading) on the tape.

Applications

• Virtual dog: http://faculty.purchase.edu/jeanine.meyer/pet3.html – states corresponded to images– nearly or not quite FSM since I did keep a

variable weight

Lexical scanning• First step in compiling or interpreting is to go from string

of characters to string of lexical units.– units delimited by spaces AND by definitionssum = old + total*12.50 ;

• Want to reduce this string of 26 characters to 8 typed lexical units:– name(sum), operator(=), name(old), operator(+), name(total),

operator(*), number(12.50), statement-ender(;)

• Construct a FSM-like machine that has accepting states for each of name, operator, number, statement-ender, and scans strings of alphanumeric characters plus operators.

Homework

• Continue to look out for computability OR complexity in the news– What is status of the P and NP proof?

• Watch videos• Determine problem in my proof of the 'all

horses are of the same color' lemma.• Finish lexical scanning exercise

Next class

• push down automata and context-free languages