
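Theorem 2. Let A, B and C be sets of strings. Then (A∩B)C ⊆ AC∩BC.

Proof. To prove the subset relation we need to show that for any string w, w ∈ (A∩B)C ⇒ w ∈ AC∩BC.

Take w ∈ (A∩B)C. Then w = xy, where x ∈ A∩B and y ∈ C. Since x ∈ A, w ∈ AC; since x ∈ B, w ∈ BC. Hence w ∈ AC∩BC.

Why not prove AC∩BC ⊆ (A∩B)C as well?

Let's try. Take an arbitrary w ∈ AC∩BC ⇒ w ∈ AC and w ∈ BC ⇒

(∃x, y: w = xy, x ∈ A and y ∈ C) and (∃u, v: w = uv, u ∈ B and v ∈ C).

Can we conclude from xy = uv that x = u?

No, because the same string abc may come from a·bc and from ab·c.

Example. A = {a}, B = {ab}, C = {c, bc}.

Then A∩B = ∅, so (A∩B)C = ∅.

AC = {ac, abc}, BC = {abc, abbc}.

abc ∈ AC∩BC, but abc ∉ (A∩B)C = ∅. So in general the reverse inclusion AC∩BC ⊆ (A∩B)C does not hold.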
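The counterexample can be checked mechanically. This is a minimal Python sketch that models finite languages as Python sets of strings. The helper name `concat` is illustrative.

```python
def concat(X: set[str], Y: set[str]) -> set[str]:
    """Concatenation XY = {xy | x in X, y in Y} of two finite languages."""
    return {x + y for x in X for y in Y}

A, B, C = {"a"}, {"ab"}, {"c", "bc"}

left = concat(A & B, C)               # (A ∩ B)C
right = concat(A, C) & concat(B, C)   # AC ∩ BC

print(left)    # set()
print(right)   # {'abc'}
assert left <= right                  # Theorem 2: (A ∩ B)C ⊆ AC ∩ BC
assert "abc" in right and "abc" not in left   # the reverse inclusion fails
```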

Using set operations to specify languages.

• The specification of a language requires an unambiguous description of the strings that belong to the language.

• Set notations can be used for precise definitions of languages. Consider a few examples of set notations for languages:

1) The language L1 over {a, b} consists of the strings containing the substring bb:

L1 = {a, b}*{bb}{a, b}*

The set {a, b}* permits any number of a's and b's to precede and follow the occurrence of bb.

2) The language L2 consists of all strings that begin with aa and end with bb:

L2 = {aa}{a, b}*{bb}.

3) The language L3 consists of all strings that begin with aa or end with bb:

L3 = {aa}{a, b}* ∪ {a, b}*{bb}.

4) The set of even-length strings:

L4 = {aa, ab, bb, ba}*.
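The set notations for L1 to L4 can be sanity-checked by brute force. This is a minimal Python sketch. It assumes that every language is truncated to strings of length at most MAX, so agreement is evidence for each description, not a proof of it. The names `concat`, `star`, `ALL` and `AB` are hypothetical helpers. The word "or" in the description of L3 becomes a set union.

```python
from itertools import product

MAX = 6  # compare only strings of length <= MAX (a bounded check, not a proof)

def concat(X, Y):
    """Concatenation XY, truncated to strings of length <= MAX."""
    return {x + y for x in X for y in Y if len(x + y) <= MAX}

def star(X):
    """Kleene star X*, truncated to strings of length <= MAX."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, X) - result
        result |= frontier
    return result

ALL = {"".join(p) for n in range(MAX + 1) for p in product("ab", repeat=n)}
AB = star({"a", "b"})                              # {a, b}*

L1 = concat(concat(AB, {"bb"}), AB)                # {a, b}*{bb}{a, b}*
L2 = concat(concat({"aa"}, AB), {"bb"})            # {aa}{a, b}*{bb}
L3 = concat({"aa"}, AB) | concat(AB, {"bb"})       # {aa}{a, b}* ∪ {a, b}*{bb}
L4 = star({"aa", "ab", "bb", "ba"})                # {aa, ab, bb, ba}*

assert L1 == {w for w in ALL if "bb" in w}
assert L2 == {w for w in ALL if len(w) >= 4 and w.startswith("aa") and w.endswith("bb")}
assert L3 == {w for w in ALL if w.startswith("aa") or w.endswith("bb")}
assert L4 == {w for w in ALL if len(w) % 2 == 0}
```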

Regular Languages

Regular languages are the simplest languages: they are built from a few basic languages by a restricted set of operations.

Definition. Let Σ be an alphabet. A regular language over Σ is defined recursively as follows:

i) Basis: ∅, {λ} and {a}, for every a ∈ Σ, are regular languages.

ii) Recursive step: If X and Y are regular languages, then X ∪ Y, XY and X* are regular languages.

iii) Closure: X is a regular language over Σ only if it can be obtained from the basis elements by a finite number of applications of the recursive step.

Example. Show that L = {ab}{a, b}*{ba} is a regular language.

Consider all steps:

{a}, {b} are regular by the basis.

{ab} = {a}{b} is regular as a concatenation of regular languages.

{ba} = {b}{a} is regular as a concatenation of regular languages.

{a} ∪ {b} = {a, b} is regular as a union of regular languages.

{a, b}* is regular as the Kleene closure of a regular language.

{ab}{a, b}*{ba} is regular as a concatenation of regular languages.

All finite languages are regular. Infinite languages may or may not be regular.
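The steps of the example can be replayed in code. This Python sketch applies one operation of the recursive step at a time. It assumes that infinite languages are cut off at length MAX, and all names are hypothetical.

```python
MAX = 7  # truncate every language to strings of length <= MAX

def union(X, Y):
    return X | Y

def concat(X, Y):
    return {x + y for x in X for y in Y if len(x + y) <= MAX}

def star(X):
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, X) - result
        result |= frontier
    return result

# Basis: {a} and {b} are regular.
a_lang, b_lang = {"a"}, {"b"}
# Recursive steps, one operation at a time:
ab_lang = concat(a_lang, b_lang)          # {ab}
ba_lang = concat(b_lang, a_lang)          # {ba}
a_or_b = union(a_lang, b_lang)            # {a, b}
any_ab = star(a_or_b)                     # {a, b}*
L = concat(concat(ab_lang, any_ab), ba_lang)   # {ab}{a, b}*{ba}

print(sorted(L, key=lambda w: (len(w), w))[:5])
# ['abba', 'ababa', 'abbba', 'abaaba', 'ababba']
```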

Regular languages are often described by algebraic expressions called regular expressions. Regular expressions are used to abbreviate the specification of regular languages.

Definition. Let Σ be an alphabet. A regular expression over Σ is defined recursively as follows:

i) Basis: ∅, λ and a, for every a ∈ Σ, are regular expressions.

ii) Recursive step: Let u and v be regular expressions over Σ. Then (u+v), uv and u* are regular expressions.

iii) Closure: u is a regular expression over Σ only if it can be obtained from the basis elements by a finite number of applications of the recursive step.

Examples of regular expressions over the alphabet Σ = {a, b}: ∅, λ, a, b, λ+a, b*, a+ba, (a+b)a, ab*, a*+b*, etc.

With each regular expression E we associate a regular language L(E) according to the following rules:

L(∅) = ∅, L(λ) = {λ}, L(a) = {a}, L(R+S) = L(R) ∪ L(S), L(RS) = L(R)L(S), L(R*) = L(R)*
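The rules for L(E) translate directly into a recursive function. This is a Python sketch. It assumes a regular expression is encoded as a nested tuple, and the tags "empty", "lambda", "sym", "+", "cat" and "*" are a made-up encoding. Because L(E) may be infinite, the result is truncated to strings of length at most MAX.

```python
MAX = 4  # L(E) may be infinite, so keep only strings of length <= MAX

def lang(e):
    """Language L(E) of a regular expression given as a nested tuple."""
    op = e[0]
    if op == "empty":                 # L(∅) = ∅
        return set()
    if op == "lambda":                # L(λ) = {λ}
        return {""}
    if op == "sym":                   # L(a) = {a}
        return {e[1]}
    if op == "+":                     # L(R+S) = L(R) ∪ L(S)
        return lang(e[1]) | lang(e[2])
    if op == "cat":                   # L(RS) = L(R)L(S)
        return {x + y for x in lang(e[1]) for y in lang(e[2]) if len(x + y) <= MAX}
    if op == "*":                     # L(R*) = L(R)*
        base, result, frontier = lang(e[1]), {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base if len(x + y) <= MAX} - result
            result |= frontier
        return result
    raise ValueError(f"unknown operator {op!r}")

# a + bc* over {a, b, c}
E = ("+", ("sym", "a"), ("cat", ("sym", "b"), ("*", ("sym", "c"))))
print(sorted(lang(E), key=lambda w: (len(w), w)))   # ['a', 'b', 'bc', 'bcc', 'bccc']
```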

Example. Let's find the language of the regular expression a+bc* over Σ = {a, b, c}.

L(a+bc*) = L(a) ∪ L(bc*)

= {a} ∪ L(b)L(c*)

= {a} ∪ {b}{c}*

So the language described by the expression a+bc* consists of the string a and the strings that start with one b followed by any number of c's:

L = {a, b, bc, bcc, bccc, …}

Describe the language for each of the following regular expressions.

1) a+b

2) a+bc

3) ab*+c

4) ab*+bc*

5) a*bc*+ac

L1={a, b}

L2={a, bc}

L3={c, a, ab, abb, …}

L4={a, b, ab, abb, …, bc, bcc, …}

L5={ac, b, ab, bc, abc, aabc, …}
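The answers can be checked by listing the short strings of each language. This sketch uses Python's standard `re` module as a stand-in for the textbook notation, assuming the textbook's + is written | in `re` syntax. `shortest_strings` is a hypothetical helper.

```python
import re
from itertools import product

# Textbook expression -> the same expression in Python re syntax ("+" becomes "|").
exercises = {
    "a+b": "a|b",
    "a+bc": "a|bc",
    "ab*+c": "ab*|c",
    "ab*+bc*": "ab*|bc*",
    "a*bc*+ac": "a*bc*|ac",
}

def shortest_strings(pattern, alphabet="abc", max_len=3):
    """All strings over the alphabet of length <= max_len that fully match pattern."""
    words = ("".join(p) for n in range(max_len + 1) for p in product(alphabet, repeat=n))
    return sorted((w for w in words if re.fullmatch(pattern, w)), key=lambda w: (len(w), w))

for textbook, py in exercises.items():
    print(textbook, shortest_strings(py))
```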


• Distinct regular expressions may represent the same language: a+b and b+a represent the same language {a, b}.

• Two expressions R and S that represent the same language, L(R) = L(S), are considered equal. For example, a + a* = a*, because L(a + a*) = L(a*) = {λ, a, aa, aaa, …}.

Examples. Simplifying regular expressions:

• λ + ab + abab(ab)* = (ab)*, since both describe L = {λ, ab, abab, ababab, …}.

• aa(b*+a) + a(ab*+aa) = aa(b*+a), since a(ab*+aa) = aab* + aaa = aa(b*+a).

• a(a+b)* + aa(a+b)* + aaa(a+b)* = a(a+b)*

To prove it we can show that aa(a+b)* ⊆ a(a+b)(a+b)* ⊆ a(a+b)* and aaa(a+b)* ⊆ a(a+b)*. Then use A ⊆ B ⇒ A ∪ B = B.
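Equalities like these can be tested, though not proved, by comparing two expressions on all short strings. This Python sketch uses `re.fullmatch`. As an assumption of this encoding, λ is written as an empty alternative, so "|ab" stands for λ+ab. `same_language_up_to` is a hypothetical helper.

```python
import re
from itertools import product

def same_language_up_to(p, q, alphabet="ab", max_len=8):
    """Compare two regexes (Python re syntax) on every string of length <= max_len.
    Agreement is evidence, not a proof, that L(p) = L(q)."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            w = "".join(t)
            if (re.fullmatch(p, w) is None) != (re.fullmatch(q, w) is None):
                return False
    return True

assert same_language_up_to("a|a*", "a*")                          # a + a* = a*
assert same_language_up_to("|ab|abab(ab)*", "(ab)*")               # λ + ab + abab(ab)* = (ab)*
assert same_language_up_to("aa(b*|a)|a(ab*|aa)", "aa(b*|a)")
assert same_language_up_to("a(a|b)*|aa(a|b)*|aaa(a|b)*", "a(a|b)*")
```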

Properties of Regular Expressions

1) '+' properties: R+T = T+R, R+∅ = ∅+R = R, R+R = R, (R+S)+T = R+(S+T).

These follow from the corresponding properties of set union:

L(R) ∪ L(T) = L(T) ∪ L(R)

L(R) ∪ ∅ = L(R), L(R) ∪ L(R) = L(R)

(L(R) ∪ L(S)) ∪ L(T) = L(R) ∪ (L(S) ∪ L(T))

2) '·' (concatenation) properties of regular expressions: ∅R = R∅ = ∅, λR = Rλ = R, (RS)T = R(ST).

3) Distributive properties of regular expressions: R(S+T) = RS+RT, (S+T)R = SR+TR.

4) Closure properties:

∅* = λ* = λ

R* = R*R* = (R*)* = R + R*

R* = λ + R* = (λ + R*)* = (λ + R)R* = λ + RR*

R* = (R + … + Rᵏ)* for any k ≥ 1

R* = λ + R + R² + … + Rᵏ⁻¹ + RᵏR* for any k ≥ 1

R*R = RR*

(R+S)* = (R* + S*)* = (R*S*)* = (R*S)*R* = R*(SR*)*

R(SR)* = (RS)*R

(R*S)* = λ + (R+S)*S

(RS*)* = λ + R(R+S)*
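These identities can be spot-checked the same way for concrete choices of R and S. In this Python sketch, R = ab and S = b+aa are arbitrary sample expressions. The identities are claimed for all R and S, so agreement up to length 8 is only evidence. The last entry is the equality ba*(baa*)* = b(a+ba)* that is proved later in these notes. λ is written as an empty alternative in `re` syntax.

```python
import re
from itertools import product

def same_language_up_to(p: str, q: str, alphabet: str = "ab", max_len: int = 8) -> bool:
    """True if patterns p and q (Python re syntax) agree on every string of length <= max_len."""
    words = ("".join(t) for n in range(max_len + 1) for t in product(alphabet, repeat=n))
    return all((re.fullmatch(p, w) is None) == (re.fullmatch(q, w) is None) for w in words)

# Hypothetical sample expressions; the identities should hold for any R and S.
R, S = "(ab)", "(b|aa)"

identities = {
    "R* = R*R*":               (f"{R}*", f"{R}*{R}*"),
    "R*R = RR*":               (f"{R}*{R}", f"{R}{R}*"),
    "(R+S)* = (R*S)*R*":       (f"({R}|{S})*", f"({R}*{S})*{R}*"),
    "(R+S)* = R*(SR*)*":       (f"({R}|{S})*", f"{R}*({S}{R}*)*"),
    "R(SR)* = (RS)*R":         (f"{R}({S}{R})*", f"({R}{S})*{R}"),
    "(R*S)* = λ+(R+S)*S":      (f"({R}*{S})*", f"|({R}|{S})*{S}"),
    "(RS*)* = λ+R(R+S)*":      (f"({R}{S}*)*", f"|{R}({R}|{S})*"),
    "ba*(baa*)* = b(a+ba)*":   ("ba*(baa*)*", "b(a|ba)*"),
}

for name, (lhs, rhs) in identities.items():
    print(f"{name}: {same_language_up_to(lhs, rhs)}")   # each line is expected to end with True
```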

Each of these properties can be proved. For example, let's prove that R* = R*R*.

Proof. We need to prove two inclusions: i) R* ⊆ R*R* and ii) R*R* ⊆ R*.

i) To prove R* ⊆ R*R*, it is sufficient to note that for any expression S (read: for any set of strings described by the expression S), S ⊆ SR*, because λ ∈ R*. Taking S = R* gives R* ⊆ R*R*.

ii) To prove R*R* ⊆ R*, let's take an arbitrary string w ∈ R*R* ……(1) and prove that w ∈ R*.

(1) ⇒ w = uv, where u ∈ R* and v ∈ R* ……(2)

(2) ⇒ u ∈ Rⁿ and v ∈ Rᵐ for some integers n, m ≥ 0. Then w = uv ∈ Rⁿ⁺ᵐ ⊆ R*.

So we have proved both subset relations, i.e. the two regular expressions are equal, R* = R*R*.

Let's prove one more property of regular expressions, (R+S)*=(R*S)*R*.

Proof. We understand the equality of two expressions as the equality of the two sets of strings denoted by these expressions. So we are going to prove two subset relations: i) (R+S)* ⊆ (R*S)*R* and ii) (R*S)*R* ⊆ (R+S)*.

i) Take an arbitrary string w ∈ (R+S)*.

⇒ w ∈ (R+S)ⁿ for some integer n ≥ 0.

⇒ w = u₁u₂…uₙ, where uᵢ ∈ R+S for i = 1, 2, …, n.

uᵢ ∈ R+S ⇒ uᵢ ∈ R or uᵢ ∈ S.

Denote any substring uᵢ ∈ R by r and any substring uᵢ ∈ S by s.

Then the string w is a sequence of substrings r and s, for example w = rrrssrsssrsrr. Cutting it after every s gives w = (rrrs)(s)(rs)(s)(s)(rs)(rr) = v₁v₂v₃v₄v₅v₆t, where each vⱼ ∈ R*S and t ∈ R*.

In general, w = v₁v₂…vₖt, where each vⱼ ∈ R*S and t ∈ R*. So any string w ∈ (R+S)* belongs to (R*S)*R*.
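The grouping step in this proof is mechanical: cut the word after every s. This is a small Python sketch. As an encoding assumption, a word over the letters r and s stands for the sequence u₁u₂…uₙ. The function name is illustrative.

```python
def split_into_blocks(labels: str) -> tuple[list[str], str]:
    """Split a word over {r, s} as v1 v2 ... vk t, with every vj in r*s and t in r*.

    'r' marks a factor u_i from R and 's' marks a factor u_i from S.
    """
    blocks, current = [], ""
    for c in labels:
        current += c
        if c == "s":              # every s closes one block from R*S
            blocks.append(current)
            current = ""
    return blocks, current        # the leftover r's form the final factor t in R*

print(split_into_blocks("rrrssrsssrsrr"))
# (['rrrs', 's', 'rs', 's', 's', 'rs'], 'rr')
```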

In the same way we can prove ii) (R*S)*R* ⊆ (R+S)* (left as an exercise).

Example. Using properties of regular expressions, prove the equality ba*(baa*)* = b(a + ba)*.

We can prove the equality by using the property (R+S)* = R*(SR*)*.

Take R = a and S = ba. Then (a+ba)* = (R+S)* = R*(SR*)* = a*(baa*)*.

Concatenating b on the left gives b(a+ba)* = ba*(baa*)*.

Then we can prove a(a+b)* + aa(a+b)* + aaa(a+b)* = a(a+b)*

by proving aa(a+b)* ⊆ a(a+b)* and aaa(a+b)* ⊆ a(a+b)*.

We can also establish simple rules that can be used in proofs by 'double inclusion'. Let A, B and C be sets of strings. Then:

1) A ⊆ B ⇒ AC ⊆ BC

2) A ⊆ B ⇒ A* ⊆ B*

3) λ ∈ B ⇒ A ⊆ AB and A ⊆ BA

a ⊆ a+b ⇒ a(a+b)* ⊆ (a+b)(a+b)* ⊆ (a+b)*(a+b)* = (a+b)*

⇒ aa(a+b)* ⊆ a(a+b)* (concatenating a on the left)

⇒ aaa(a+b)* ⊆ aa(a+b)* ⊆ a(a+b)*

Example. Prove that (λ + a + b*a)*b* = (a + b)*.

We can show two subset relations: i) (λ + a + b*a)*b* ⊆ (a + b)* and ii) (a + b)* ⊆ (λ + a + b*a)*b*.

i) (λ + a + b*a)*b* ⊆ (a + b)*:

b ⊆ a+b ⇒ b* ⊆ (a+b)*

⇒ b*a ⊆ (a+b)*(a+b)* = (a+b)* (since also a ⊆ (a+b)*)

a ⊆ a+b ⊆ (a+b)* and λ ∈ (a+b)* ⇒ (λ + a + b*a) ⊆ (a+b)*

⇒ (λ + a + b*a)* ⊆ ((a+b)*)* = (a+b)*

Finally, (λ + a + b*a)*b* ⊆ (a+b)*(a+b)* = (a + b)*.

ii) (a + b)* ⊆ (λ + a + b*a)*b*:

(a + b)* = (b*a)*b* by the rule (R+S)* = (R*S)*R* with R = b, S = a

⊆ (λ + a + b*a)*b*, since b*a ⊆ (λ + a + b*a).

Deterministic Finite Automata (DFA)

A DFA is a recognizer for regular languages. DFAs model the behavior of real computing devices that are designed to distinguish correct input over a given alphabet.

[Figure: a DFA as a recognizer for L ⊆ Σ*. An input w ∈ Σ* enters the DFA, which answers Accept (w ∈ L) or Reject (w ∉ L).]

This abstract machine (DFA) is a device that reads an input string one symbol at a time and decides whether the string belongs to the language or not (accept or reject).

A DFA can be depicted as a directed graph, where vertices represent states and each edge is labeled by an input symbol and dictates how the machine changes its state on reading this symbol.

A DFA includes:
• an alphabet Σ
• a finite nonempty set of "states" Q
• a transition function δ, defined for each state and each symbol
• a start state q0
• a set of accepting states

Example. Construct a DFA that recognizes the regular language over the alphabet {a, b} described by the regular expression ab*a.

So we need to find a DFA that is able to distinguish between strings that belong to L(ab*a) and strings that do not.

[Figure: state diagram of the DFA: start state q0 reading the input w, state q1, accepting state q3, and a "sink state" q2 with a loop on a, b.]

Transition function δ:

δ(q0, a) = q1, δ(q0, b) = q2

δ(q1, a) = q3, δ(q1, b) = q1

δ(q2, a) = q2, δ(q2, b) = q2

δ(q3, a) = q2, δ(q3, b) = q2
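The transition function can be run directly as a lookup table. This is a minimal Python sketch. The names `DELTA`, `START`, `ACCEPTING` and `accepts` are illustrative, and a symbol outside Σ = {a, b} raises `KeyError`.

```python
# Transition function δ of the DFA for L(ab*a); q2 is the sink state.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q2",
    ("q1", "a"): "q3", ("q1", "b"): "q1",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
    ("q3", "a"): "q2", ("q3", "b"): "q2",
}
START, ACCEPTING = "q0", {"q3"}

def accepts(w: str) -> bool:
    """Run the DFA on w, one symbol at a time, and report whether it ends in an accepting state."""
    state = START
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state in ACCEPTING

for w in ["aa", "aba", "abbba", "abbaa", "ab", ""]:
    print(repr(w), accepts(w))
```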


DFA for L(ab*a)

The DFA consists of:

• Alphabet Σ = {a, b}

• Set of states Q = {q0, q1, q2, q3}, including q0 (the start state) and q3 (the accepting state)

• Transition function δ(qᵢ, aₖ) that assigns the next state on reading any symbol aₖ ∈ Σ in each state qᵢ ∈ Q


Assume w = abbaa is the input to the DFA, which starts in state q0.

While reading the input, the DFA goes through a sequence of states:

reading a: δ(q0, a) = q1

reading b: δ(q1, b) = q1

reading b: δ(q1, b) = q1

reading a: δ(q1, a) = q3

reading a: δ(q3, a) = q2

A configuration is a pair of a state and the remaining input, (qᵢ, w):

(q0, abbaa) ⊢ (q1, bbaa) ⊢ (q1, baa) ⊢ (q1, aa) ⊢ (q3, a) ⊢ (q2, λ)

A string is accepted by a DFA if and only if, on reading this string, the DFA comes to a configuration (qₐ, λ), where qₐ is an accepting state. Here the DFA ends in (q2, λ) and q2 is not accepting, so abbaa ∉ L(ab*a).
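The sequence of configurations can be generated by the same table-driven simulation. In this Python sketch, the function name `configurations` is hypothetical, and λ is printed for an empty remaining input.

```python
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q2",
    ("q1", "a"): "q3", ("q1", "b"): "q1",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
    ("q3", "a"): "q2", ("q3", "b"): "q2",
}

def configurations(w: str, start: str = "q0") -> list[tuple[str, str]]:
    """Configurations (state, remaining input) visited while reading w, starting from (start, w)."""
    state, trace = start, [(start, w)]
    for i, symbol in enumerate(w):
        state = DELTA[(state, symbol)]
        trace.append((state, w[i + 1:]))
    return trace

print(" ⊢ ".join(f"({q}, {rest or 'λ'})" for q, rest in configurations("abbaa")))
# (q0, abbaa) ⊢ (q1, bbaa) ⊢ (q1, baa) ⊢ (q1, aa) ⊢ (q3, a) ⊢ (q2, λ)
```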

The string is accepted (recognized to be in the language) if the DFA comes to an accepting state after reading the whole input string.

Instead of the transition function δ(qᵢ, aₖ), we can give the equivalent transition table:

δ     a     b
q0    q1    q2
q1    q3    q1
q2    q2    q2
q3    q2    q2

Inductive proofs on strings.

Usually induction is done on the length of a string, |w| = n, or on the number of repetitions of some pattern.

Prove that the regular expression R = (ab+b)*(λ+a) describes the language L ⊆ {a, b}* consisting of all strings that do not contain aa.

Proof. To prove the equality of two sets of strings, L and R, we can prove two subset relations, R ⊆ L and L ⊆ R.

i) R ⊆ L: we need to prove that for any string w, w ∈ R ⇒ w ∈ L.

Assume w ∈ R = (ab+b)*(λ+a) ⇒ w ∈ (ab+b)ⁿ(λ+a) for some n ≥ 0.

Prove by induction on n ≥ 0 that for any w, w ∈ (ab+b)ⁿ(λ+a) ⇒ w ∈ L.

Basis. n = 0: w ∈ (λ+a), so either w = λ or w = a. In both cases w ∈ L, because w does not contain aa.

IH. Assume that for n = k, k ≥ 0, any string s ∈ (ab+b)ᵏ(λ+a) belongs to L.

IS. We need to prove that any string w ∈ (ab+b)ᵏ⁺¹(λ+a) belongs to L.

w ∈ (ab+b)ᵏ⁺¹(λ+a) ⇒ w = xs, where x ∈ ab+b and s ∈ (ab+b)ᵏ(λ+a), so either w = abs or w = bs. In both cases w does not contain aa: s does not contain aa by IH, and the prefix ab or b ends with b, so no aa appears where the prefix meets s.

ii) Take any w ∈ L and prove that w ∈ R = (ab+b)*(λ+a).

Let's prove it by induction on the length |w| = n ≥ 0.

Basis. n = 0: w = λ ∈ R = (ab+b)*(λ+a).

IH. Assume that for n = k, k ≥ 0, any string v ∈ L with length |v| ≤ k belongs to R.

IS. We need to prove that any string from L with length k+1 belongs to R.

Take w ∈ L with |w| = k+1. We consider two cases: 1) w = as or 2) w = bs. In the first case, since w contains no aa, there are two possibilities. Either s = λ, and then w = a ∈ (λ+a) ⊆ R. Or s = bu, where u ∈ L (u contains no aa because w contains none) and |u| = k−1 < k. Then u ∈ R by IH, i.e. u ∈ (ab+b)*(λ+a).

Then w = abu ∈ ab(ab+b)*(λ+a) ⊆ (ab+b)*(λ+a).

In the second case, w = bs, where s ∈ L (because w contains no aa) and |s| = k, so s ∈ R = (ab+b)*(λ+a) by IH.

Then w ∈ b(ab+b)*(λ+a) ⊆ (ab+b)*(λ+a).
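Both inclusions can be cross-checked by brute force on short strings. This Python sketch writes (ab+b)*(λ+a) in `re` syntax as (ab|b)*(|a), which is an encoding assumption. It compares that expression with the condition "does not contain aa" for all strings up to length 10. This supports the inductive proof but does not replace it.

```python
import re
from itertools import product

R = "(ab|b)*(|a)"   # (ab+b)*(λ+a); the empty alternative "(|a)" plays the role of λ+a

for n in range(11):
    for t in product("ab", repeat=n):
        w = "".join(t)
        in_R = re.fullmatch(R, w) is not None
        in_L = "aa" not in w
        assert in_R == in_L, w
print("R and L agree on all strings of length <= 10")
```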

Summary. A language L is a set of strings over some alphabet Σ.

Set notations can be used to specify which strings belong to a language.

A language is regular only if it can be obtained from ∅, {λ} and {a}, a ∈ Σ, by a finite number of unions, concatenations and Kleene stars.

To describe a regular language we can use:
• set operations
• regular expressions
• a DFA that recognizes the language.

Each of these specifications must be unambiguous (but need not be unique).

Example. Find a DFA that recognizes the language L ⊆ {a, b}*, Σ = {a, b}, consisting of all strings that include the substring aba.

Set notation: L = {a, b}*{aba}{a, b}*

Regular expression: (a+b)*aba(a+b)*

[Figure: state diagram of the DFA with start state q0 and accepting state q3. Its transitions are listed below.]

Q = {q0, q1, q2, q3}, Σ = {a, b}, start state q0, accepting state q3.

Transition function δ: Q × Σ → Q

δ(q0, a) = q1, δ(q0, b) = q0

δ(q1, a) = q1, δ(q1, b) = q2

δ(q2, a) = q3, δ(q2, b) = q0

δ(q3, a) = q3, δ(q3, b) = q3

Configurations for the input string w = aaaabab:

(q0, aaaabab) ⊢ (q1, aaabab) ⊢ (q1, aabab) ⊢ (q1, abab)

⊢ (q1, bab) ⊢ (q2, ab) ⊢ (q3, b) ⊢ (q3, λ)

Since q3 is an accepting state, aaaabab ∈ L.
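This is a Python sketch of the DFA above. The names `DELTA`, `run` and `accepts` are illustrative. The sketch also compares the DFA with the definition of L, "contains aba", on every string up to length 10.

```python
from itertools import product

DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q3", ("q2", "b"): "q0",
    ("q3", "a"): "q3", ("q3", "b"): "q3",
}

def run(w: str, state: str = "q0") -> str:
    """Return the state reached after reading all of w."""
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state

def accepts(w: str) -> bool:
    return run(w) == "q3"

# The DFA should agree with the definition of L on all short strings.
for n in range(11):
    for t in product("ab", repeat=n):
        w = "".join(t)
        assert accepts(w) == ("aba" in w), w

print(accepts("aaaabab"))   # True
```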

What language L ⊆ {a, b}* is recognized by the following DFA? Give a regular expression describing the language.

[Figure: DFA over {a, b} with states 0, 1, 2 and 3.]

L = λ + aa*b + aa*baa*b + aa*baa*baa*b + …

= λ + aa*b + (aa*b)² + (aa*b)³ + …

= (aa*b)*

L is the set of strings over {a, b} that are either λ or end with b, and in which every occurrence of b is immediately preceded by an a.
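The regular expression can be compared with a word description by brute force. In this Python sketch, the description is encoded as the hypothetical predicate `described`: λ, or a string that ends in b and in which every b is immediately preceded by a.

```python
import re
from itertools import product

def described(w: str) -> bool:
    """λ, or a string ending in b in which every b is immediately preceded by a."""
    if w == "":
        return True
    return w.endswith("b") and all(i > 0 and w[i - 1] == "a" for i, c in enumerate(w) if c == "b")

for n in range(11):
    for t in product("ab", repeat=n):
        w = "".join(t)
        assert (re.fullmatch("(aa*b)*", w) is not None) == described(w), w
```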

Recursive functions on strings

An example of a recursive function on strings is the reversal of a string.

Let f(w) be the reversal of a string w ∈ Σ*. It can be defined recursively as follows:

Basis. If |w| = 0, i.e. w = λ, then f(w) = f(λ) = λ.

Recursive step. If |w| > 0, then w = ua for some a ∈ Σ and u ∈ Σ*,

and f(w) = f(ua) = af(u).

Let's find the reversal of w = abc by using this recursive definition.

f(abc) = cf(ab) = cbf(a) = cbf(λa) = cbaf(λ) = cbaλ = cba
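The recursive definition of f maps directly onto a recursive Python function. This is a sketch, and the name f follows the text. Python's default recursion limit (about 1000 frames) makes it unsuitable for long strings, where `w[::-1]` is the idiomatic reversal.

```python
def f(w: str) -> str:
    """Reversal, following the recursive definition:
    f(λ) = λ, and f(ua) = a f(u) for a symbol a and a string u."""
    if w == "":              # basis: |w| = 0
        return ""
    u, a = w[:-1], w[-1]     # recursive step: w = ua
    return a + f(u)

print(f("abc"))   # cba
```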

We can prove by induction that for any w, f(w) gives the reverse of w, i.e. f(a₁a₂a₃…aₙ) = aₙaₙ₋₁…a₁.

What will be the induction parameter n?

Proof by induction on n = |w| ≥ 0.

Basis. n = |w| = 0, w = λ: f(λ) = λ by the basis of the recursive definition.

IH. Assume that f(a₁a₂a₃…aₙ) = aₙaₙ₋₁…a₁ for n = k, k ≥ 0. In other words, we assume that for any string w with |w| = k, f(a₁a₂…aₖ) = aₖaₖ₋₁…a₁.

IS. We need to prove that the property holds for n = k+1, i.e. for any string w with |w| = k+1, f(a₁…aₖaₖ₊₁) = aₖ₊₁aₖ…a₁.

f(a₁a₂…aₖaₖ₊₁) = f(uaₖ₊₁) = aₖ₊₁f(u) by the recursive step, where u = a₁…aₖ and |u| = k,

= aₖ₊₁aₖ…a₁ by IH.

Example. Prove by induction that f(f(w)) = w.

First we need to prove the following lemma.

Lemma. If |w| ≥ 1, then w = bv for some b ∈ Σ and v ∈ Σ*, and f(w) = f(bv) = f(v)b.

In words, we want to prove that at each recursive step we can also take off the first letter instead of the last one, e.g. f(abc) = f(bc)a, as well as f(abc) = cf(ab).

Proof by induction on |v| ≥ 0 that f(bv) = f(v)b.

Basis. |v| = 0, v = λ: f(bλ) = f(λb) = bf(λ) = bλ = λb = f(λ)b.

IH. Assume that f(bv) = f(v)b when |v| = k, for some k ≥ 0.

IS. We need to prove it for |v| = k+1. Since |v| = k+1 ≥ 1, we have v = ua for some a ∈ Σ and u ∈ Σ* with |u| = k.

f(bv) = f(bua) = af(bu) (by the recursive step) = af(u)b (by IH for |u| = k) = f(ua)b (by the recursive step, af(u) = f(ua))

= f(v)b (since ua = v).

Proof by induction that f(f(w)) = w. Induction on n = |w| ≥ 0.

Basis. n = 0, w = λ.

f(f(λ)) = f(λ) = λ by the basis of the recursive definition of f(w).

IH. Assume that f(f(u)) = u for n = |u| = k, for some k ≥ 0.

IS. We need to prove f(f(w)) = w for n = |w| = k+1.

f(f(w)) = f(f(ua)), where w = ua for some a ∈ Σ, u ∈ Σ*, |u| = k,

= f(af(u)) by the recursive step

= f(f(u))a by the Lemma

= ua because f(f(u)) = u by IH

= w since w = ua.
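Both the Lemma and the theorem can be spot-checked on all short strings. This Python sketch reuses the recursive definition of f. Exhaustive testing up to length 7 is evidence, not a proof.

```python
from itertools import product

def f(w: str) -> str:
    """Reversal via the recursive step f(ua) = a f(u), with f(λ) = λ."""
    return "" if w == "" else w[-1] + f(w[:-1])

for n in range(8):
    for t in product("ab", repeat=n):
        w = "".join(t)
        assert f(f(w)) == w                 # the theorem: f(f(w)) = w
        if len(w) >= 1:
            b, v = w[0], w[1:]              # w = bv
            assert f(w) == f(v) + b         # the Lemma: f(bv) = f(v)b
        assert f(w) == w[::-1]              # agrees with Python's slice reversal
```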

What languages are not regular?

Consider the language L ⊆ {a, b}*, L = {aⁿbⁿ | n ≥ 0}. Is it regular?

A regular language can be obtained from ∅, {λ}, {a} and {b} by a finite number of applications of ∪, concatenation and the *-operation.

Since L is infinite, we can get it only by using the *-operation.

But there is no way to get any number of a's followed by an equal number of b's by *-operations:

{ab}* = {λ, ab, abab, ababab, abababab, …}, which is not what we need.

{a}*{b}* = {aⁿbᵐ | n, m ≥ 0}, which also contains strings with n ≠ m.

L = {a}⁰{b}⁰ ∪ {a}¹{b}¹ ∪ {a}²{b}² ∪ …, so we could obtain it only from concatenations and unions, but this way we would need infinitely many of them.

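A DFA that recognizes aⁿbⁿ needs a separate state for each number of a's read so far, and since n is unbounded, no finite number of states suffices.

Such a DFA would be valid for the finite language {aⁿbⁿ} when n is limited.

[Figure: DFA with states 0, 1, 2, 3, 4, 5 and transitions labeled a and b.]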
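When n is bounded, the construction can be written down explicitly. This Python sketch builds a DFA for {aⁿbⁿ | 0 ≤ n ≤ N}. The state encoding and the function names are assumptions for illustration. The DFA has 2N + 2 reachable states, so the number of states grows with the bound N.

```python
def make_dfa(N: int):
    """DFA for the finite language {a^n b^n | 0 <= n <= N}.

    States: ("A", i) after reading i a's, ("B", j) while j more b's are expected,
    and "sink". Reachable: N + 1 states of type A, N of type B, plus the sink.
    """
    def delta(state, symbol):
        if state == "sink":
            return "sink"
        phase, k = state
        if phase == "A":
            if symbol == "a":
                return ("A", k + 1) if k < N else "sink"   # too many a's
            return ("B", k - 1) if k >= 1 else "sink"      # first b
        if symbol == "b" and k >= 1:                         # phase "B": only b's may follow
            return ("B", k - 1)
        return "sink"

    start = ("A", 0)
    accepting = {("A", 0), ("B", 0)}    # λ, or every a matched by a b
    return delta, start, accepting

def accepts(w: str, N: int) -> bool:
    delta, state, accepting = make_dfa(N)
    for symbol in w:
        state = delta(state, symbol)
    return state in accepting

print([w for w in ["", "ab", "aabb", "aaabbb", "aaaabbbb", "aab", "abab"] if accepts(w, 3)])
# ['', 'ab', 'aabb', 'aaabbb']
```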

Recursive definition of a language

Example. Consider the following definition of L ⊆ Σ*, where Σ = {a, b}.

Basis. λ ∈ L.

Recursive step. If x ∈ L, then axb ∈ L.

Prove that for any string w ∈ L the number of a's is equal to the number of b's, i.e. |w|a = |w|b.

Any w ∈ L is obtained by a finite number n ≥ 0 of applications of the recursive step. Prove by induction on n ≥ 0 that |w|a = |w|b.

Basis. n = 0, w = λ: |λ|a = 0 = |λ|b.

IH. Assume that for some k ≥ 0, if u is obtained by n = k applications of the recursive step, then |u|a = |u|b.

IS. Consider a string w that is obtained by k+1 recursive steps from the basis.

This means that w = aub for some u ∈ L, where u is obtained in k steps.

w = aub implies that |w|a = |u|a + 1 and |w|b = |u|b + 1.

So we have |w|a = |u|a + 1 = |u|b + 1 (by IH, |u|a = |u|b)

= |w|b.

We can also prove that L′ = {aⁿbⁿ | n ≥ 0} = L.
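The recursive step turns one string into one string, so exactly one string of L is obtained with n applications. This Python sketch lists these strings and checks |w|a = |w|b. It also checks w = aⁿbⁿ for small n. The function name `generate` is hypothetical.

```python
def generate(max_steps: int) -> dict[int, str]:
    """The string of L obtained by n applications of the recursive step, for n = 0..max_steps."""
    w, result = "", {0: ""}       # basis: λ ∈ L
    for n in range(1, max_steps + 1):
        w = "a" + w + "b"         # recursive step: x ∈ L ⇒ axb ∈ L
        result[n] = w
    return result

for n, w in generate(5).items():
    assert w.count("a") == w.count("b")     # |w|a = |w|b
    assert w == "a" * n + "b" * n           # w = a^n b^n, consistent with L ⊆ L′
```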
