Regular Expressions · 9/11/2020
Regular Expressions
CS 154, Omer Reingold
Regular Expressions: Computation as simple, logical description
A totally different way of thinking about computation: What is the complexity of describing the strings in the language?
Inductive Definition of Regexp
Let Σ be an alphabet. We define the regular expressions over Σ inductively:
For all σ ∈ Σ, σ is a regexp
ε is a regexp
∅ is a regexp
If R1 and R2 are both regexps, then
(R1R2), (R1 + R2), and (R1)* are regexps
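The inductive definition maps directly onto an algebraic data type: one constructor per rule. Below is a minimal Python sketch, assuming single-character symbols; the class names `Sym`, `Eps`, `Empty`, `Cat`, `Alt`, `Star` and the function `show` are hypothetical, not from the slides.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Sym:          # σ, for a single symbol σ ∈ Σ
    c: str

@dataclass(frozen=True)
class Eps:          # ε
    pass

@dataclass(frozen=True)
class Empty:        # ∅
    pass

@dataclass(frozen=True)
class Cat:          # (R1 R2)
    r1: "Regex"
    r2: "Regex"

@dataclass(frozen=True)
class Alt:          # (R1 + R2)
    r1: "Regex"
    r2: "Regex"

@dataclass(frozen=True)
class Star:         # (R1)*
    r: "Regex"

Regex = Union[Sym, Eps, Empty, Cat, Alt, Star]

def show(r: Regex) -> str:
    """Render r fully parenthesized, exactly as the inductive rules build it."""
    if isinstance(r, Sym):
        return r.c
    if isinstance(r, Eps):
        return "ε"
    if isinstance(r, Empty):
        return "∅"
    if isinstance(r, Cat):
        return f"({show(r.r1)}{show(r.r2)})"
    if isinstance(r, Alt):
        return f"({show(r.r1)} + {show(r.r2)})"
    if isinstance(r, Star):
        return f"({show(r.r)})*"
    raise TypeError(f"not a regexp: {r!r}")

# (01)*0, built bottom-up using only the rules above
example = Cat(Star(Cat(Sym("0"), Sym("1"))), Sym("0"))
print(show(example))   # (((01))*0)
```

Because every value is built by one of the six rules, any function on regexps can be written as a structural recursion with one case per rule, which is exactly the shape of the induction proofs later on.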
Precedence Order:
* then · then +
(Example: R1*R2 + R3 = ((R1*)·R2) + R3)
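The precedence order is what lets us drop parentheses. A recursive-descent parser makes it concrete: one function per precedence level, with the lowest level (+) calling the next (concatenation), which calls the highest (*). This is a sketch assuming single-character symbols and `+` for union; `parse` is a hypothetical helper that returns the fully parenthesized form.

```python
def parse(s: str) -> str:
    """Parse a regexp over single-character symbols and return it fully
    parenthesized, using the precedence * (highest), then concatenation, then +."""
    s = s.replace(" ", "")
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def expr():                      # lowest precedence: R + R + ...
        nonlocal pos
        left = term()
        while peek() == "+":
            pos += 1
            left = f"({left} + {term()})"
        return left

    def term():                      # middle precedence: concatenation
        left = factor()
        while peek() is not None and peek() not in "+)":
            left = f"({left}{factor()})"
        return left

    def factor():                    # highest precedence: postfix *
        nonlocal pos
        base = atom()
        while peek() == "*":
            pos += 1
            base = f"({base})*"
        return base

    def atom():
        nonlocal pos
        c = peek()
        if c is None or c in "+)*":
            raise ValueError(f"unexpected {c!r} at position {pos}")
        pos += 1
        if c == "(":
            inner = expr()
            if peek() != ")":
                raise ValueError("missing ')'")
            pos += 1
            return inner
        return c                     # a symbol, ε, or ∅

    result = expr()
    if pos != len(s):
        raise ValueError(f"trailing input at position {pos}")
    return result

# The slide's example R1*R2 + R3, with R1 = a, R2 = b, R3 = c
print(parse("a*b + c"))   # (((a)*b) + c)
```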
Definition: Regexps Represent Languages
The regexp σ ∈ Σ represents the language {σ}
The regexp ε represents {ε}
The regexp ∅ represents ∅
If R1 and R2 are regular expressions representing L1 and L2, then:
(R1R2) represents L1·L2
(R1 + R2) represents L1 ∪ L2
(R1)* represents L1*
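The semantic rules can be run directly if we only ask for the strings of length at most n (L(R) itself may be infinite). A minimal sketch; the nested-tuple encoding of regexps and the function name `lang` are assumptions made for illustration.

```python
def lang(r, n):
    """Return the strings of length <= n in L(r), following the semantic rules.
    Regexps are nested tuples (an assumed encoding): ('sym', c), ('eps',),
    ('empty',), ('cat', r1, r2), ('alt', r1, r2), ('star', r1)."""
    kind = r[0]
    if kind == "sym":                       # {σ}
        return {r[1]} if n >= 1 else set()
    if kind == "eps":                       # {ε}
        return {""}
    if kind == "empty":                     # ∅
        return set()
    if kind == "alt":                       # L1 ∪ L2
        return lang(r[1], n) | lang(r[2], n)
    if kind == "cat":                       # L1 · L2
        left, right = lang(r[1], n), lang(r[2], n)
        return {x + y for x in left for y in right if len(x + y) <= n}
    if kind == "star":                      # L1* = {ε} ∪ L1 ∪ L1L1 ∪ ...
        base = lang(r[1], n) - {""}         # ε pieces add nothing new
        result, frontier = {""}, {""}
        while frontier:                     # extend newly found strings by one more piece
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown regexp {r!r}")

zero_one_star_zero = ("cat", ("star", ("cat", ("sym", "0"), ("sym", "1"))), ("sym", "0"))
print(sorted(lang(zero_one_star_zero, 5), key=len))   # ['0', '010', '01010']
```

The length bound keeps every case finite; in particular the star loop stops once no new string of length at most n can be formed.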
Regexps Represent Languages
For every regexp R, define L(R) to be the language that R represents
A string w ∈ Σ* is accepted by R (or, w matches R) if w ∈ L(R)
Examples: 0, 010, and 01010 match (01)*0
110101110100100 matches (0+1)*0
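These matches can be checked with Python's standard `re` module, with one translation: the slides write union as +, while Python writes it as | (in Python, + means "one or more"). `re.fullmatch` requires the whole string to match, which is the definition w ∈ L(R); `re.search` or `re.match` would accept partial matches.

```python
import re

# Slide notation (0+1) becomes (0|1) in Python's re syntax.
checks = [
    ("(01)*0", ["0", "010", "01010"]),
    ("(0|1)*0", ["110101110100100"]),
]
for pattern, words in checks:
    for w in words:
        print(pattern, w, re.fullmatch(pattern, w) is not None)   # each line ends in True
```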
Assume Σ = {0,1}
{ w | w has exactly a single 1 }
0*10*
{ w | w contains 001 }
(0+1)*001(0+1)*
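A quick sanity check (not a proof) is to compare each set-builder description with its regexp on every binary string up to some length. The helper `strings` and the cutoff of 10 are arbitrary choices for this sketch; union is written | for Python's `re`.

```python
import re
from itertools import product

def strings(max_len):
    """All binary strings of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

# (membership test from the set-builder definition, equivalent regexp)
cases = [
    (lambda w: w.count("1") == 1, "0*10*"),
    (lambda w: "001" in w,        "(0|1)*001(0|1)*"),
]
for spec, pattern in cases:
    agree = all(spec(w) == (re.fullmatch(pattern, w) is not None) for w in strings(10))
    print(pattern, agree)   # True for both
```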
Assume Σ = {0,1}
What language does the regexp ∅* represent?
{ε}
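The answer follows from unfolding the star. The one subtle step is that zero concatenations give {ε} (by convention L⁰ = {ε} for every language L), while k ≥ 1 concatenations of ∅ give ∅, since there is no string to pick from ∅:

```latex
\begin{aligned}
L(\emptyset^*) &= \bigcup_{k \ge 0} L(\emptyset)^k
               = \emptyset^0 \cup \emptyset^1 \cup \emptyset^2 \cup \cdots \\
               &= \{\varepsilon\} \cup \emptyset \cup \emptyset \cup \cdots
               = \{\varepsilon\}
\end{aligned}
```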
Assume Σ = {0,1}
{ w | w has length ≥ 3 and its 3rd symbol is 0 }
(0+1)(0+1)0(0+1)*
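The same brute-force style of check works here. Note that the slide counts positions from 1, so "3rd symbol" is index 2 in Python; the function name `third_is_zero` is hypothetical.

```python
import re
from itertools import product

def third_is_zero(w: str) -> bool:
    # Positions are 1-indexed on the slide, so the 3rd symbol is w[2].
    return len(w) >= 3 and w[2] == "0"

pattern = "(0|1)(0|1)0(0|1)*"   # slide's (0+1)(0+1)0(0+1)*, with + written as |
words = ("".join(b) for n in range(9) for b in product("01", repeat=n))
print(all(third_is_zero(w) == bool(re.fullmatch(pattern, w)) for w in words))   # True
```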
Assume Σ = {0,1}
{ w | every odd position in w is a 1 }
(1(0 + 1))*(1 + ε)
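Here the (1 + ε) factor handles strings of odd length, whose last symbol sits in an odd position. Python's `re` has no ε literal, so this sketch writes (1 + ε) as the optional group `(1)?`; `odd_positions_are_one` is a hypothetical helper.

```python
import re
from itertools import product

def odd_positions_are_one(w: str) -> bool:
    # 1-indexed odd positions 1, 3, 5, ... are 0-indexed 0, 2, 4, ...
    return all(c == "1" for c in w[::2])

pattern = "(1(0|1))*(1)?"   # (1(0+1))*(1+ε)
ok = all(
    odd_positions_are_one(w) == bool(re.fullmatch(pattern, w))
    for n in range(11)
    for w in map("".join, product("01", repeat=n))
)
print(ok)   # True
```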
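Assume Σ = {0,1}
{ w | w has an equal number of occurrences of 01 and 10 }
Claim: A string w has equal numbers of occurrences of 01 and 10 ⟺ w starts and ends with the same bit.
(Reading w left to right, occurrences of 01 and 10 must alternate, so their counts differ by at most one and are equal exactly when the first and last bits agree.)
Hence the language is { w | w = 1, w = 0, or w = ε, or w starts with a 0 and ends with a 0, or w starts with a 1 and ends with a 1 }
1 + 0 + ε + 0(0+1)*0 + 1(0+1)*1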
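The claim is easy to test exhaustively for short strings. This sketch counts occurrences of 01 and 10, compares against "same first and last bit", and also compares against the regexp, where Python's empty alternative (`||`) plays the role of ε. Names are hypothetical, and agreement up to length 12 is evidence, not a proof.

```python
import re
from itertools import product

def count_pairs(w: str, pat: str) -> int:
    """Number of positions i where w[i:i+2] equals the 2-symbol pattern."""
    return sum(1 for i in range(len(w) - 1) if w[i:i + 2] == pat)

def same_ends(w: str) -> bool:
    return len(w) == 0 or w[0] == w[-1]

# 1 + 0 + ε + 0(0+1)*0 + 1(0+1)*1, with the empty alternative standing for ε
pattern = "1|0||0(0|1)*0|1(0|1)*1"
for n in range(13):
    for w in map("".join, product("01", repeat=n)):
        balanced = count_pairs(w, "01") == count_pairs(w, "10")
        assert balanced == same_ends(w) == bool(re.fullmatch(pattern, w)), w
print("claim and regexp agree on all strings up to length 12")
```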
L can be represented by some regexp ⟺ L is regular
DFAs ≡ NFAs ≡ Regular Expressions!
L can be represented by some regexp ⟹ L is regular
Given any regexp R, we will construct an NFA N s.t. N accepts exactly the strings accepted by R
Proof by induction on the length of the regexp R
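Base Cases (R has length 1):
R = σ for some σ ∈ Σ
R = ε
R = ∅
Each base-case language ({σ}, {ε}, or ∅) is accepted by an NFA with at most two states.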
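The three base-case machines are small enough to write down directly. This sketch uses a hypothetical `NFA` class with ε-closure simulation; the empty string `""` as the ε label and the function names are assumptions of the sketch, not notation from the slides.

```python
from dataclasses import dataclass, field

EPS = ""   # label used for ε-transitions (an assumption of this sketch)

@dataclass
class NFA:
    start: int
    accepts: set
    delta: dict = field(default_factory=dict)   # (state, label) -> set of states

    def closure(self, states):
        """All states reachable from `states` using only ε-transitions."""
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in self.delta.get((q, EPS), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    def accepts_word(self, w: str) -> bool:
        current = self.closure({self.start})
        for c in w:
            step = set()
            for q in current:
                step |= self.delta.get((q, c), set())
            current = self.closure(step)
        return bool(current & self.accepts)

def nfa_symbol(sigma: str) -> NFA:   # R = σ: start --σ--> accept
    return NFA(start=0, accepts={1}, delta={(0, sigma): {1}})

def nfa_epsilon() -> NFA:            # R = ε: the start state itself accepts
    return NFA(start=0, accepts={0})

def nfa_empty() -> NFA:              # R = ∅: no accept states at all
    return NFA(start=0, accepts=set())

print(nfa_symbol("0").accepts_word("0"), nfa_epsilon().accepts_word(""), nfa_empty().accepts_word(""))
# True True False
```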
Induction Step: Consider a regexp R of length k > 1, and suppose every regexp of length < k represents some regular language.
Three possibilities for R:
R = R1 + R2
R = R1R2
R = (R1)*
Case R = R1 + R2: By induction (both have length < k), R1 and R2 represent some regular languages, L1 and L2.
But L(R) = L(R1 + R2) = L1 ∪ L2, so L(R) is regular, by the union theorem!
Case R = R1R2: By induction (both have length < k), R1 and R2 represent some regular languages, L1 and L2.
But L(R) = L(R1·R2) = L1·L2, so L(R) is regular, by the concatenation theorem.
Case R = (R1)*: By induction (R1 has length < k), R1 represents some regular language L1.
But L(R) = L(R1*) = L1*, so L(R) is regular, by the star theorem.
Therefore: If L is represented by a regexp, then L is regular.
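The proof cites the union, concatenation, and star theorems; those are usually proved by explicit ε-NFA constructions, and chaining them along the structure of R yields an NFA for R directly (this is essentially Thompson's construction). The sketch below follows the standard constructions; the tuple encoding of regexps and the names `build` and `accepts` are assumptions for illustration.

```python
from itertools import count

EPS = None   # label for ε-transitions (an assumption of this sketch)

def build(r, fresh):
    """Return (start, accept_states, edges) for tuple-encoded regexp r.
    Encoding (assumed): ('sym', c), ('eps',), ('empty',), ('cat', r1, r2),
    ('alt', r1, r2), ('star', r1). `fresh` yields unused state ids."""
    kind = r[0]
    if kind == "sym":                      # base case: s --c--> t
        s, t = next(fresh), next(fresh)
        return s, {t}, [(s, r[1], t)]
    if kind == "eps":                      # base case: accepting start state
        s = next(fresh)
        return s, {s}, []
    if kind == "empty":                    # base case: no accept states
        return next(fresh), set(), []
    if kind == "alt":                      # union: new start with ε-arcs to both starts
        s1, a1, e1 = build(r[1], fresh)
        s2, a2, e2 = build(r[2], fresh)
        s = next(fresh)
        return s, a1 | a2, e1 + e2 + [(s, EPS, s1), (s, EPS, s2)]
    if kind == "cat":                      # concatenation: ε from N1's accepts to N2's start
        s1, a1, e1 = build(r[1], fresh)
        s2, a2, e2 = build(r[2], fresh)
        return s1, a2, e1 + e2 + [(a, EPS, s2) for a in a1]
    if kind == "star":                     # star: new accepting start, ε back to N1's start
        s1, a1, e1 = build(r[1], fresh)
        s = next(fresh)
        return s, a1 | {s}, e1 + [(s, EPS, s1)] + [(a, EPS, s1) for a in a1]
    raise ValueError(f"unknown regexp {r!r}")

def accepts(r, w):
    """Build the NFA for r and simulate it on w."""
    start, finals, edges = build(r, count())

    def closure(states):
        todo, seen = list(states), set(states)
        while todo:
            q = todo.pop()
            for p, label, t in edges:
                if p == q and label is EPS and t not in seen:
                    seen.add(t)
                    todo.append(t)
        return seen

    current = closure({start})
    for c in w:
        current = closure({t for p, label, t in edges if p in current and label == c})
    return bool(current & finals)

# (1(0+1))*
r = ("star", ("cat", ("sym", "1"), ("alt", ("sym", "0"), ("sym", "1"))))
print([w for w in ["", "1", "10", "11", "01", "1011", "101"] if accepts(r, w)])
# ['', '10', '11', '1011']
```

Each case only wires together the machines returned by its recursive calls, mirroring how each case of the induction uses only the hypothesis for R1 and R2.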
Give an NFA that accepts the language represented by (1(0 + 1))*
[Figure: NFA for the regular expression (1(0+1))*, with arcs labeled 1, 0,1, and ε]
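For comparison, a hand-simplified machine for (1(0+1))* needs only two states and no ε-arcs. This is a hypothetical sketch, not necessarily the machine drawn on the slide; it is checked against Python's `re` on all short strings.

```python
import re
from itertools import product

# Hypothetical two-state machine for (1(0+1))*:
#   "A" is both the start and the accept state; reading 1 goes to "B";
#   from "B", reading 0 or 1 returns to "A".
delta = {("A", "1"): {"B"}, ("B", "0"): {"A"}, ("B", "1"): {"A"}}
start, accept = "A", {"A"}

def run(w: str) -> bool:
    current = {start}
    for c in w:
        current = set().union(*(delta.get((q, c), set()) for q in current))
    return bool(current & accept)

words = ["".join(b) for n in range(9) for b in product("01", repeat=n)]
print(all(run(w) == bool(re.fullmatch("(1(0|1))*", w)) for w in words))   # True
```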
L can be represented by a regexp ⟸ L is a regular language
Idea: Transform an NFA for L into a regular expression by removing states and re-labeling the arcs with regular expressions
Generalized NFAs (GNFA)
Arcs are labeled with regular expressions: rather than reading in just one letter of the string on a step, a GNFA can read in an entire substring that matches the label of the arc it follows
Generalized NFA (GNFA)
This GNFA recognizes L(a*b(cb)*a)
Is aaabcbcba accepted or rejected?
Is bba accepted or rejected?
Is bcba accepted or rejected?
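Since the GNFA recognizes exactly L(a*b(cb)*a), the three questions can be answered by testing membership in that language. The regexp uses only concatenation and star, so it is valid Python `re` syntax as written.

```python
import re

verdicts = {w: bool(re.fullmatch(r"a*b(cb)*a", w)) for w in ["aaabcbcba", "bba", "bcba"]}
for w, ok in verdicts.items():
    print(w, "accepted" if ok else "rejected")
# aaabcbcba accepted   (aaa · b · cbcb · a)
# bba rejected         (after the first b, "ba" does not match (cb)*a)
# bcba accepted        (ε · b · cb · a)
```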
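Converting an NFA into a GNFA: add a unique start state and a unique accept state (the new start state has an ε-arc to the old start state, and every old accept state has an ε-arc to the new accept state)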
While the machine has more than 2 states:
Pick an internal state, rip it out, and re-label the arrows with regexps to account for paths through the missing state
[Figure: example of ripping out one state of an NFA]
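In general: when ripping out state q2 from the GNFA G, for every pair of remaining states q1, q3, the arc from q1 to q3 is re-labeled
R(q1,q2)R(q2,q2)*R(q2,q3) + R(q1,q3)
where R(p,q) denotes the regexp labeling the arc from p to q.
[Figure: states q1, q2, q3 of G with arcs labeled R(q1,q2), R(q2,q2), R(q2,q3), and R(q1,q3)]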
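The rip-out loop and the relabeling rule translate into a short state-elimination routine. In this sketch (function name and representation are assumptions), a GNFA is a dict from (p, q) to a regexp string, a missing key means the arc is ∅, union is written +, and subexpressions are parenthesized conservatively; ε factors are dropped when concatenating.

```python
def eliminate(states, start, accept, R):
    """Rip out internal states one at a time (state elimination).

    R maps (p, q) to the regexp string labeling the arc p -> q; a missing key
    means the arc is ∅. `start` and `accept` are the unique start and accept
    states added before elimination begins. Union is written as +."""
    R = dict(R)
    remaining = list(states)

    def wrap(x: str) -> str:            # parenthesize anything longer than one symbol
        return x if len(x) == 1 else f"({x})"

    for q2 in [q for q in states if q not in (start, accept)]:
        remaining.remove(q2)
        loop = R.get((q2, q2))
        star = f"{wrap(loop)}*" if loop not in (None, "ε") else ""
        for q1 in remaining:
            for q3 in remaining:
                into, out = R.get((q1, q2)), R.get((q2, q3))
                if into is None or out is None:
                    continue            # no path q1 -> q2 -> q3
                # R(q1,q2) R(q2,q2)* R(q2,q3), dropping ε factors
                parts = [p for p in (wrap(into), star, wrap(out)) if p not in ("", "ε")]
                through = "".join(parts) or "ε"
                old = R.get((q1, q3))
                R[(q1, q3)] = through if old is None else f"{through} + {old}"
    return R.get((start, accept), "∅")

# Example GNFA: q1 loops on a and goes to q2 on b; q2 loops on a,b;
# q0 (new start) --ε--> q1 and q2 --ε--> q3 (new accept).
arcs = {("q0", "q1"): "ε", ("q1", "q1"): "a", ("q1", "q2"): "b",
        ("q2", "q2"): "a+b", ("q2", "q3"): "ε"}
print(eliminate(["q0", "q1", "q2", "q3"], "q0", "q3", arcs))   # (a*b)(a+b)*
```

Removing q1 first produces the intermediate arc q0 → q2 labeled a*b; removing q2 then yields the final label.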
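Example: an NFA N with start state q1 and accept state q2; q1 loops on a and goes to q2 on b, and q2 loops on a,b.
Add a new start state q0 with an ε-arc to q1, and a new accept state q3 with an ε-arc from q2.
Rip out q1: the arc from q0 to q2 becomes a*b.
Rip out q2: the arc from q0 to q3 becomes (a*b)(a+b)*.
R(q0,q3) = (a*b)(a+b)* represents L(N)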
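As a sanity check on the example, this sketch simulates N directly (transitions as described above; the names are hypothetical) and compares it with (a*b)(a+b)* on all strings over {a, b} up to length 8. Both describe the strings containing at least one b.

```python
import re
from itertools import product

# N: q1 (start) loops on a and moves to q2 on b; q2 (accept) loops on a, b.
delta = {("q1", "a"): {"q1"}, ("q1", "b"): {"q2"},
         ("q2", "a"): {"q2"}, ("q2", "b"): {"q2"}}

def n_accepts(w: str) -> bool:
    current = {"q1"}
    for c in w:
        current = set().union(*(delta.get((q, c), set()) for q in current))
    return "q2" in current

words = ["".join(t) for n in range(9) for t in product("ab", repeat=n)]
# (a*b)(a+b)* in Python syntax: union + becomes |
print(all(n_accepts(w) == bool(re.fullmatch("(a*b)(a|b)*", w)) for w in words))   # True
```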
[Figure: DFAs, NFAs, and Regular Expressions all characterize the Regular Languages, with DFAs serving as the DEFINITION]
Parting thought: Regular Languages can be defined by their closure properties