Mathematical Foundations of Computer Science
Chapter 3: Regular Languages and Regular Grammars
Languages
A language (over an alphabet Σ) is any subset of the set of all possible strings over Σ. The set of all possible strings is written as Σ*.
Example: Σ = {a, b, c}
Σ* = {λ, a, b, c, ab, ac, ba, bc, ca, aaa, …}
One language might be the set of strings of length less than or equal to 2:
L = {λ, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc}
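The example language can be listed mechanically. The following is a minimal sketch; the helper name `strings_up_to` is made up for illustration, and the empty Python string `""` stands in for λ.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """Return all strings over `alphabet` of length 0..max_len, shortest first."""
    result = []
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            result.append("".join(symbols))
    return result

SIGMA = ["a", "b", "c"]
L = strings_up_to(SIGMA, 2)   # "" plays the role of λ
print(len(L), L)
```

There are 1 + 3 + 9 = 13 strings, matching the listing of L above.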
Regular Languages
A regular language (over an alphabet Σ) is any language for which there exists a finite automaton that recognizes it.
Mathematical Models of Computation
This course studies a variety of mathematical models corresponding to notions of computation.
The finite automaton was our first example.
The finite automaton is an example of an automaton model.
There are other models as well.
Mathematical Models of Computation
Another important model is that of a grammar.
We will shortly look at regular grammars.
But first, a digression:
Regular Expressions
A regular expression is a mathematical model for describing a particular type of language.
Regular expressions are kind of like arithmetic expressions.
The regular expression is defined recursively.
Regular Expressions
Given an alphabet Σ:
∅, λ, and each a ∈ Σ are all regular expressions (∅ denotes the empty set, λ the empty string).
If r1 and r2 are regular expressions, then so are r1 + r2, r1 · r2, r1*, and (r1).
• Note: we usually write r1 · r2 as r1 r2.
These are the only things that are regular expressions.
Regular Expressions
Meaning:
∅ represents the empty language
λ represents the language {λ}
a represents the language {a}
r1 + r2 represents the language L(r1) ∪ L(r2)
r1 r2 represents L(r1) L(r2)
r1* represents (L(r1))*
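The meaning rules translate almost line for line into a recursive function. The following is a sketch under assumed conventions, not anything from the book: a regular expression is encoded as a nested tuple such as `("union", r1, r2)`, and `lang(r, n)` returns only the strings of L(r) with length at most n, so that the result stays finite.

```python
def lang(r, n):
    """Strings of length <= n in L(r), following the recursive definition."""
    kind = r[0]
    if kind == "empty":                      # ∅
        return set()
    if kind == "lambda":                     # λ
        return {""}
    if kind == "sym":                        # a single symbol a
        return {r[1]} if n >= 1 else set()
    if kind == "union":                      # r1 + r2
        return lang(r[1], n) | lang(r[2], n)
    if kind == "concat":                     # r1 r2
        return {u + v for u in lang(r[1], n) for v in lang(r[2], n)
                if len(u + v) <= n}
    if kind == "star":                       # r1*
        base = lang(r[1], n) - {""}
        result, frontier = {""}, {""}
        while frontier:                      # keep appending until nothing new fits
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")

A, B = ("sym", "a"), ("sym", "b")
r = ("concat", ("star", A), ("union", A, B))          # a*(a + b)
print(sorted(lang(r, 3), key=lambda w: (len(w), w)))
```

The tuple encoding is only one possible design; it was chosen here because each branch of `lang` mirrors one line of the definition.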
Regular Expressions
Example 1:
What does a*(a + b) represent?
It represents zero or more a's followed by either an a or a b.
{a, b, aa, ab, aaa, aab, aaaa, aaab, …}
Regular Expressions
Example 2:
What does (a + b)*(a + bb) represent?
It represents zero or more symbols, each of which can be an a or a b, followed by either a or bb.
{a, bb, aa, abb, ba, bbb, aaa, aabb, aba, abbb, baa, babb, bba, bbbb, …}
Regular Expressions
Example 3:
What does (aa)*(bb)*b represent?
All strings over {a, b} that start with an even number of a's which are then followed by an odd number of b's.
It's important to understand the underlying meaning of a regular expression.
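One way to build intuition for Examples 1 to 3 is to list the short strings each expression matches. The sketch below uses Python's `re` module as a stand-in for the textbook notation: `|` is written where the slides write `+`, and `re.fullmatch` is used so the whole string must match. The helper name `matches` is made up for illustration.

```python
import re
from itertools import product

def matches(pattern, alphabet, max_len):
    """All strings over `alphabet` of length <= max_len that fully match `pattern`."""
    out = []
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            w = "".join(symbols)
            if re.fullmatch(pattern, w):
                out.append(w)
    return out

ex1 = matches(r"a*(a|b)", "ab", 4)        # a*(a + b)
ex2 = matches(r"(a|b)*(a|bb)", "ab", 3)   # (a + b)*(a + bb)
ex3 = matches(r"(aa)*(bb)*b", "ab", 5)    # (aa)*(bb)*b
print(ex1)
print(ex2)
print(ex3)
```

Every string printed for Example 3 has an even (possibly zero) number of a's followed by an odd number of b's, as the description says.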
Regular Expressions
Example 4:
Find a regular expression for strings of 0's and 1's which have at least one pair of consecutive 0's.
Each such string must have a 00 somewhere in it.
It could have any string in front of it and any string after it, as long as it's there!
Any string is represented by (0 + 1)*
Answer: (0 + 1)*00(0 + 1)*
Regular Expressions
Example:
Find a regular expression for strings of 0's and 1's which have no pairs of consecutive 0's.
• It's a repetition of pieces, each of which is either a 1 or a 0 followed by at least one 1.
• (1 + 011*)*
• or equivalently, (1 + 01)*
• But such strings can't end in a 0.
• So we add (0 + λ) to the end to allow for this.
• (1 + 01)*(0 + λ)
This is only one of many possible answers.
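The reasoning can be checked by brute force on short strings. This is a sketch, not a proof: it uses Python's `re` syntax as a stand-in for the slide notation (`|` for `+`, and `0?` for (0 + λ)), and compares each expression against the plain test "the string does not contain 00".

```python
import re
from itertools import product

def all_binary(max_len):
    """Yield every string of 0's and 1's of length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            yield "".join(symbols)

candidate = re.compile(r"(1|01)*")     # (1 + 01)*
answer = re.compile(r"(1|01)*0?")      # (1 + 01)*(0 + λ)

# Strings without 00 that the first attempt fails to match
missed = [w for w in all_binary(4)
          if "00" not in w and not candidate.fullmatch(w)]

# Strings on which the final answer disagrees with the "no 00" test
wrong = [w for w in all_binary(10)
         if ("00" not in w) != bool(answer.fullmatch(w))]

print(missed)
print(wrong)
```

Every string in `missed` ends in 0, which is exactly the gap that the trailing (0 + λ) closes, and `wrong` comes out empty for all strings up to length 10.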
Regular Expressions
Why are they called regular expressions?
Because, as it turns out, the set of languages they describe is that of the regular languages.
That means that regular expressions are just another model for the same thing as finite automata.
Regular Expressions
Homework: Chapter 3, Section 1
• Problems 1-11, 17, 18
Regular Expressions and Regular Languages
As we have said, regular expressions and finite automata are really different ways of expressing the same thing.
Let's see why.
Given a regular expression, how can we build an equivalent finite automaton?
(We won't bother going the other way, although it can be done.)
Regular Expressions and Regular Languages
Clearly there are simple finite automata corresponding to the simple regular expressions:
[Figure: automata for the simple regular expressions; surviving arc labels: λ, a]
Note that each of these has an initial state and one accepting state.
Regular Expressions and Regular Languages
On the previous slide, we saw that the simplest regular expressions can be represented by a finite automaton with an initial state (duh!) and one isolated accepting state.
Regular Expressions and Regular Languages
We can build more complex automata for more complex regular expressions using this model.
Regular Expressions and Regular Languages
Here's how we build an nfa for r1 + r2:
[Figure: nfa for r1 + r2, joining the nfa's for r1 and r2 in parallel with four λ-transitions]
Regular Expressions and Regular Languages
Here's how we build an nfa for r1 r2:
[Figure: nfa for r1 r2, chaining the nfa's for r1 and r2 with λ-transitions]
Regular Expressions and Regular Languages
Here's how we build an nfa for (r1)*:
[Figure: nfa for (r1)*, wrapping the nfa for r1 with λ-transitions]
Note: the last state added is not in the book. For safety, I add it so that only one arc goes into the final state.
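The three constructions can be sketched in code. This is an illustration under assumptions, not the book's or the slides' exact wiring: each nfa fragment is a pair (initial state, single accepting state), λ is represented by the empty string, and the star construction uses a common textbook arrangement with four λ-arcs rather than the extra-state variant mentioned in the note. All function names are made up for this sketch.

```python
from collections import defaultdict
from itertools import count

LAMBDA = ""                   # label used for λ-transitions
_ids = count()
delta = defaultdict(set)      # (state, label) -> set of next states

def symbol(a):
    """nfa for the single symbol a."""
    s, f = next(_ids), next(_ids)
    delta[(s, a)].add(f)
    return s, f

def union(n1, n2):
    """nfa for r1 + r2: new start and final, four λ-arcs."""
    s, f = next(_ids), next(_ids)
    for start_i, final_i in (n1, n2):
        delta[(s, LAMBDA)].add(start_i)
        delta[(final_i, LAMBDA)].add(f)
    return s, f

def concat(n1, n2):
    """nfa for r1 r2: run n1, then n2."""
    s, f = next(_ids), next(_ids)
    delta[(s, LAMBDA)].add(n1[0])
    delta[(n1[1], LAMBDA)].add(n2[0])
    delta[(n2[1], LAMBDA)].add(f)
    return s, f

def star(n):
    """nfa for (r1)*: skip n entirely, or run it one or more times."""
    s, f = next(_ids), next(_ids)
    delta[(s, LAMBDA)].update({n[0], f})
    delta[(n[1], LAMBDA)].update({n[0], f})
    return s, f

def closure(states):
    """All states reachable from `states` using only λ-arcs."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p in delta.get((q, LAMBDA), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def accepts(nfa, w):
    start, final = nfa
    current = closure({start})
    for ch in w:
        moved = {p for q in current for p in delta.get((q, ch), ())}
        current = closure(moved)
    return final in current

# a*(a + b), built bottom-up from fresh symbol fragments
nfa = concat(star(symbol("a")), union(symbol("a"), symbol("b")))
print([w for w in ["", "a", "b", "aaab", "ba"] if accepts(nfa, w)])
```

Keeping exactly one initial and one accepting state per fragment is what lets the pieces snap together without inspecting their insides.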
Building an nfa from a regular expression
Example: Consider the regular expression (a + bb)(a + b)*(bb)
[Figure: nfa for (a + bb)(a + b)*(bb), assembled from the constructions above, with arcs labeled a, b, and λ]
Sometimes we just get tired and take an obvious shortcut.
Building a regular expression from a finite automaton
The book goes on to show that it works the other way around as well: we can find a corresponding regular expression for any finite automaton.
It's fairly easy in some cases and you can "just do it."
However, it's generally complicated and not worth the bother of studying.
You are not responsible for this material.
Building a regular expression from a finite automaton
[Figure: finite automaton with arcs labeled a; a, b; and c]
The above automaton clearly corresponds to a*(a + b)c*
Regular Expressions and nfa's
Homework: Chapter 3, Section 2
• Problems 1-5
Regular Grammars
Review: A grammar is a quadruple G = (V, T, S, P) where
V is a finite set of variables
T is a finite set of symbols, called terminals
S is in V and is called the start symbol
P is a finite set of productions, which are rules of the form α → β
• where α and β are strings consisting of terminals and variables.
Regular Grammars
A grammar is said to be right-linear if every production in P is of the form
A → xB or
A → x
where A and B are variables (perhaps the same, perhaps the start symbol S) in V
and x is any string of terminal symbols (including the empty string λ)
Regular Grammars
An alternate (and better) definition of a right-linear grammar says that every production in P is of the form
A → aB or
A → a or
S → λ (to allow λ to be in the language)
where A and B are variables (perhaps the same, but B can't be S) in V
and a is any terminal symbol
Regular Grammars
The reasons I prefer the second definition (although I accept the first one, which happens to be used in the book) are:
It's easier to work with in proving things.
It's the much more common definition.
Regular Grammars
A grammar is said to be left-linear if every production in P is of the form
A → Bx or
A → x
where A and B are variables (perhaps the same, perhaps the start symbol S) in V
and x is any string of terminal symbols (including the empty string λ)
Regular Grammars
The alternate definition of a left-linear grammar says that every production in P is of the form
A → Ba or
A → a or
S → λ
where A and B are variables (perhaps the same, but B can't be S) in V
and a is any terminal symbol
Regular Grammars
Any left-linear or right-linear grammar is called a regular grammar.
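The first (book) definitions are easy to check mechanically. The sketch below assumes a made-up encoding: a grammar is a list of (left side, right side) pairs, uppercase letters are variables, lowercase letters are terminals, and the empty string stands for λ. The two test grammars are the ones used in Example 1 and Example 3 of these slides.

```python
def is_right_linear(productions):
    """Every production is A -> xB or A -> x, with x a string of terminals."""
    for lhs, rhs in productions:
        if not (len(lhs) == 1 and lhs.isupper()):
            return False
        # Drop one trailing variable, if present; the rest must be terminals.
        x = rhs[:-1] if rhs[-1:].isupper() else rhs
        if any(c.isupper() for c in x):
            return False
    return True

def is_left_linear(productions):
    """Every production is A -> Bx or A -> x, with x a string of terminals."""
    for lhs, rhs in productions:
        if not (len(lhs) == 1 and lhs.isupper()):
            return False
        # Drop one leading variable, if present; the rest must be terminals.
        x = rhs[1:] if rhs[:1].isupper() else rhs
        if any(c.isupper() for c in x):
            return False
    return True

def is_regular(productions):
    # The whole grammar must be left-linear, or the whole grammar right-linear.
    return is_left_linear(productions) or is_right_linear(productions)

example1 = [("S", "abS"), ("S", "a")]
example3 = [("S", "A"), ("A", "aB"), ("A", ""), ("B", "Ab")]
print(is_regular(example1), is_regular(example3))
```

Note that `is_regular` tests the grammar as a whole: a grammar whose productions are each individually linear, but in different directions, is rejected.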
Regular Grammars
For brevity, we often write a set of productions such as
A → x1
A → x2
A → x3
as
A → x1 | x2 | x3
Regular Grammars
A derivation in grammar G is any sequence of strings over V ∪ T, connected with ⇒,
starting with S and ending with a string containing no variables,
where each subsequent string is obtained by applying a production in P.
S ⇒ x1 ⇒ x2 ⇒ x3 ⇒ . . . ⇒ xn
abbreviated as: S ⇒* xn
Regular Grammars
S ⇒ x1 ⇒ x2 ⇒ x3 ⇒ . . . ⇒ xn
abbreviated as: S ⇒* xn
We say that xn is a sentence of the language generated by G, L(G).
We say that the other x's are sentential forms.
Regular Grammars
L(G) = {w | w ∈ T* and S ⇒* w}
We call L(G) the language generated by G.
L(G) is the set of all sentences of grammar G.
Example 1
S → abS | a
is an example of a right-linear grammar.
Can you figure out what language it generates?
L = {w ∈ {a, b}* | w is λ or consists of alternating a's and b's, beginning with an a and ending with a b} {a}
L((ab)*a)
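The claim can be checked on short strings by carrying out derivations mechanically. This is a sketch with assumed conventions: productions are (left side, right side) pairs, uppercase letters are variables, and the search expands the leftmost variable of each sentential form, pruning forms with more than `max_len` terminals. It is meant for small regular grammars like this one, not for arbitrary grammars.

```python
import re
from collections import deque
from itertools import product

def sentences(productions, start="S", max_len=7):
    """Sentences of length <= max_len, found by breadth-first derivation."""
    found, seen = set(), {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if the form is a sentence.
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None:
            found.add(form)
            continue
        for lhs, rhs in productions:
            if lhs == form[i]:
                new = form[:i] + rhs + form[i + 1:]
                terminals = sum(1 for c in new if c.islower())
                if terminals <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
    return found

G1 = [("S", "abS"), ("S", "a")]          # S → abS | a
generated = sentences(G1)

# All strings over {a, b} up to length 7 that match (ab)*a
expected = {"".join(t) for n in range(8) for t in product("ab", repeat=n)
            if re.fullmatch(r"(ab)*a", "".join(t))}
print(sorted(generated, key=len), generated == expected)
```

Agreement up to a length bound is evidence, not a proof, that L(G) = L((ab)*a).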
Example 2
S → Aab
A → Aab | Ba
B → a
is an example of a left-linear grammar.
Can you figure out what language it generates?
L = {w ∈ {a, b}* | w is aa followed by at least one set of alternating ab's}
L(aaab(ab)*)
Example 3
Consider the grammar
S → A
A → aB | λ
B → Ab
This grammar is NOT regular.
No "mixing and matching" left-linear and right-linear productions.
Regular Grammars and nfa's
It's not hard to show that regular grammars generate and nfa's accept the same class of languages: the regular languages!
It's a long proof, where we must show that
any finite automaton has a corresponding left- or right-linear grammar,
and any regular grammar has a corresponding nfa.
We won't bother with the details.
Regular Grammars and nfa's
We get a feel for this by example.
Let
S → aA
A → abS | b
[Figure: nfa with states S and A and arcs labeled a, b, b, a]
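The idea behind the example can be sketched in code: each variable becomes a state, a production A → xB becomes a path labeled x from A to B, and a production A → x becomes a path into a single accepting state. The names `grammar_to_nfa`, `FINAL`, and the intermediate states `q1`, `q2`, … are made up for this sketch, and the encoding (uppercase letters for variables, "" for λ) is an assumption.

```python
def grammar_to_nfa(productions):
    """Transitions of an nfa for a right-linear grammar, plus its accepting state."""
    delta = {}                      # (state, symbol) -> set of next states
    final = "FINAL"
    fresh = 0
    for lhs, rhs in productions:
        if rhs[-1:].isupper():      # A -> xB
            x, target = rhs[:-1], rhs[-1]
        else:                       # A -> x
            x, target = rhs, final
        state = lhs
        for k, ch in enumerate(x):
            if k == len(x) - 1:
                nxt = target
            else:                   # intermediate state inside a multi-symbol x
                fresh += 1
                nxt = f"q{fresh}"
            delta.setdefault((state, ch), set()).add(nxt)
            state = nxt
        if not x:                   # x = λ: a λ-transition
            delta.setdefault((lhs, ""), set()).add(target)
    return delta, final

def accepts(delta, final, w, start="S"):
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for p in delta.get((q, ""), ()):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen
    current = closure({start})
    for ch in w:
        current = closure({p for q in current for p in delta.get((q, ch), ())})
    return final in current

G = [("S", "aA"), ("A", "abS"), ("A", "b")]     # S → aA, A → abS | b
delta, final = grammar_to_nfa(G)
print([w for w in ["ab", "aabab", "aab", "abab"] if accepts(delta, final, w)])
```

For this grammar the construction yields four arcs: S to A on a, A to an intermediate state on a, that state back to S on b, and A to the accepting state on b.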
Regular Grammars and Regular Expressions
Example: L(aab*a)
We can easily construct a regular grammar for this expression:
S → aA
A → aB
B → bB
B → a
Regular Languages
[Figure: regular languages, described equivalently by regular expressions, regular grammars, and finite automata]
Regular Languages
Homework: Chapter 3, Section 3
Problems