finite automata and regular expressions haniel...
TRANSCRIPT
CS:4330 Theory of ComputationSpring 2018
Regular LanguagesFinite Automata and Regular Expressions
Haniel Barbosa
Readings for this lecture
Chapter 1 of [Sipser 1996], 3rd edition. Sections 1.1 and 1.3.
Abstraction of Problems
B Data: abstracted as a word in a given alphabet
I Σ: alphabet, a finite, non-empty set of symbolsI Σ∗: all the words of finite length built up using Σ
B Conditions: abstracted as a set of words, called language
I Any subset L⊆ Σ∗
B Unknown: Implicitly a Boolean variable: true if a word is in the language,false otherwise
I Given w ∈ Σ∗ and L⊆ Σ∗, does w ∈ L?
1 / 14
Finite Automata
B The simplest computational model is called a finite state machine or a finiteautomaton
B Representations:
I GraphicalI TabularI Mathematical
2 / 14
Computation of a Finite Automaton
B The automaton receives the input symbols one by one from left to right,changing the “active” state
B After reading each symbol, the “active state” moves from one state toanother along the transition that has that symbol as its label
B When the last symbol of the input is read the automaton produces theoutput: accept if the “active state” is in an accept state, or reject otherwise
3 / 14
Applications
B Finite automata are popular in parser construction of compilers
B Finite automata and their probabilistic counterparts, Markov chains, areuseful tools for pattern recognitionExample: speech processing and optical character recognition
B Markov chains have been used to model and predict price changes infinancial applications
4 / 14
Formal Definition of Finite Automata
A finite automaton is a 5-tuple (Q, Σ, δ, q0, F) in which:
B Q is a finite set called the states
B Σ is a finite set called the alphabet
B δ : Q×Σ→ Q is the transition function
B q0 ∈ Q is the start state, also called initial stat
B F ⊆ Q is the set of accepted states, also called the final states
5 / 14
Language Recognized by an Automaton
B If L is the set of all strings that a finite automaton M accepts, we say that Lis the language of the machine M and write L (M) = L
B An automaton may accept several strings, but it always recognizes only onelanguage
B If a machine accepts no strings, it still recognizes one language, namely theempty language ∅
6 / 14
Formal Definition of Acceptance
Let M = (Q, Σ, δ, q0, F) be a finite automaton and w = a0 . . .an be a string over Σ
Then M accepts w if a sequence of states r0, . . . , rn exists in Q such that:
1. r0 = q0
2. δ(ri, ai+1) = ri+1 for i = 0,1, . . . ,n−1
3. rn ∈ F
Condition (1) says where the machine starts
Condition (2) says that the machine goes from state to state according to its transitionfunction δ
Condition (3) says when the machine accepts its input: if it ends up in an accept state
7 / 14
Regular Languages
We say that a finite automaton M recognizes the language L ifL = {w |M accepts w}
Definition: A language is called regular language if there exists a finiteautomaton that recognizes it.
8 / 14
Designing Finite Automata
B Whether it be of automaton or artwork, design is a creative process.Consequently it cannot be reduced to a simple recipe or formula.
B The approach:
I Identify the finite pieces of information you need to solve the problem.These are the states
I Identify the condition (alphabet) to change from one state to anotherI Identify the start and final statesI Add missing transitions
9 / 14
Operations on Regular Languages
Let A and B be languages. We define regular operations union, concatenation,and start as follows
B Union: A∪B = {x | x ∈ A∨ x ∈ B}B Concatenation: A◦B = {xy | x ∈ A∧ y ∈ B}B Star: A∗ = {x1 . . .xk | k ≥ 0∧ xi ∈ A, 0≤ i≤ k}
Note:
1. ε ∈ A∗, no matter what A is
2. A+ denotes A◦A∗
10 / 14
Regular Expressions
A regular expression (RE in short) is a string of symbols that describes a regularlanguage.
B Three base cases:
I For any a ∈ Σ, a is a regular expression denoting the language {a}I ε is a regular expression denoting the language {ε}I ∅ is a regular expression denoting the language ∅;
B Three recursive cases: If r1 and r2 are regular expressions denotinglanguages L1 and L2, respectively, then
I Union: r1∪ r2 denotes L1∪L2I Concatenation: r1r2 denotes L1 ◦L2I Star: r∗1 denotes L∗1
11 / 14
Some useful notation
Let r be a regular expression:
B The string r+ represents rr∗, and it also holds that r+∪{ε}= r∗
B The string rk represents rr . . .r︸ ︷︷ ︸k times
B Recall that the symbol Σ represents the alphabet {a1, . . . , ak}
B As with automata, the language represented by r is denoted L (r)
12 / 14
Precedence Rules
B The star (∗) operation has the highest precedence
B The concatenation (◦) operation is second on precedence order
B The union (∪) or (+) operation is the least preferred
B Parenthesis can be omitted using these rules
13 / 14
Examples
0∗10∗ =
{w | w contains a single 1}Σ∗1Σ∗ = {w | w has at least a single 1}Σ∗(101)Σ∗ = {w | w contains 101 as a substring}1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ = {w | w is of even length}
0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ − ∅∅∗ − {ε}
14 / 14
Examples
0∗10∗ = {w | w contains a single 1}Σ∗1Σ∗ =
{w | w has at least a single 1}Σ∗(101)Σ∗ = {w | w contains 101 as a substring}1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ = {w | w is of even length}
0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ − ∅∅∗ − {ε}
14 / 14
Examples
0∗10∗ = {w | w contains a single 1}Σ∗1Σ∗ = {w | w has at least a single 1}Σ∗(101)Σ∗ =
{w | w contains 101 as a substring}1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ = {w | w is of even length}
0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ − ∅∅∗ − {ε}
14 / 14
Examples
0∗10∗ = {w | w contains a single 1}Σ∗1Σ∗ = {w | w has at least a single 1}Σ∗(101)Σ∗ = {w | w contains 101 as a substring}1∗(01+)∗ =
{w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ = {w | w is of even length}
0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ − ∅∅∗ − {ε}
14 / 14
Examples
0∗10∗ = {w | w contains a single 1}Σ∗1Σ∗ = {w | w has at least a single 1}Σ∗(101)Σ∗ = {w | w contains 101 as a substring}1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ =
{w | w is of even length}
0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ − ∅∅∗ − {ε}
14 / 14
Examples
0∗10∗ = {w | w contains a single 1}Σ∗1Σ∗ = {w | w has at least a single 1}Σ∗(101)Σ∗ = {w | w contains 101 as a substring}1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ = {w | w is of even length}
0Σ∗0∪1Σ∗1∪0∪1 −
all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ − ∅∅∗ − {ε}
14 / 14
Examples
0∗10∗ = {w | w contains a single 1}Σ∗1Σ∗ = {w | w has at least a single 1}Σ∗(101)Σ∗ = {w | w contains 101 as a substring}1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ = {w | w is of even length}
0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ −
all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ − ∅∅∗ − {ε}
14 / 14
Examples
0∗10∗ = {w | w contains a single 1}Σ∗1Σ∗ = {w | w has at least a single 1}Σ∗(101)Σ∗ = {w | w contains 101 as a substring}1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ = {w | w is of even length}
0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ −
∅∅∗ − {ε}
14 / 14
Examples
0∗10∗ = {w | w contains a single 1}Σ∗1Σ∗ = {w | w has at least a single 1}Σ∗(101)Σ∗ = {w | w contains 101 as a substring}1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ = {w | w is of even length}
0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ − ∅∅∗ −
{ε}
14 / 14
Examples
0∗10∗ = {w | w contains a single 1}Σ∗1Σ∗ = {w | w has at least a single 1}Σ∗(101)Σ∗ = {w | w contains 101 as a substring}1∗(01+)∗ = {w | every 0 in w is followed by at least a single 1}(ΣΣ)∗ = {w | w is of even length}
0Σ∗0∪1Σ∗1∪0∪1 − all words starting and ending with the same letter(0∪ ε)1∗ = 01∗∪1∗ − all strings of forms 1,11,1 . . .1 and 0,1,11,1 . . .1R∅ − ∅∅∗ − {ε}
14 / 14