Mathematical Foundations of Computer Science, Chapter 3: Regular Languages and Regular Grammars


Upload: beatrice-byrd

Post on 18-Jan-2018



Page 1: Mathematical Foundations of Computer Science Chapter 3: Regular Languages and Regular Grammars


Page 2: Languages

A language (over an alphabet Σ) is any subset of the set of all possible strings over Σ. The set of all possible strings is written as Σ*.

Example: Σ = {a, b, c}
Σ* = {λ, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, …}
One language might be the set of strings of length less than or equal to 2:
L = {λ, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc}
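As a quick check of this example, the sketch below lists every string over Σ = {a, b, c} of length 0, 1 and 2. The helper name `strings_up_to` is hypothetical, and the empty Python string "" stands in for λ.

```python
from itertools import product

def strings_up_to(sigma, n):
    """Return all strings over sigma of length 0..n, shortest first."""
    result = []
    for k in range(n + 1):
        # product(sigma, repeat=k) yields every k-tuple of symbols in lexicographic order;
        # for k = 0 it yields one empty tuple, i.e. the empty string λ.
        result.extend("".join(t) for t in product(sigma, repeat=k))
    return result

L = strings_up_to(["a", "b", "c"], 2)
print(L)       # ['', 'a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
print(len(L))  # 13
```

Σ* itself is infinite; only a finite slice of it (here, strings of length at most 2) can be listed this way.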

Page 3: Regular Languages

A regular language (over an alphabet Σ) is any language for which there exists a finite automaton that recognizes it.

Page 4: Mathematical Models of Computation

This course studies a variety of mathematical models corresponding to notions of computation.

The finite automaton was our first example. The finite automaton is an example of an automaton model. There are other models as well.

Page 5: Mathematical Models of Computation

Another important model is that of a grammar.

We will shortly look at regular grammars. But first, a digression:

Page 6: Regular Expressions

A regular expression is a mathematical model for describing a particular type of language.

Regular expressions are much like arithmetic expressions: they are built from basic expressions combined with operators.

Regular expressions are defined recursively.

Page 7: Regular Expressions

Given an alphabet Σ:
• ∅ (the empty set), λ (the empty string), and a for each a ∈ Σ are all regular expressions.
• If r1 and r2 are regular expressions, then so are r1 + r2, r1 · r2, r1* and (r1).
• Note: we usually write r1 · r2 as r1 r2.
• These are the only things that are regular expressions.

Page 8: Regular Expressions

Meaning:
• ∅ represents the empty language.
• λ represents the language {λ}.
• a represents the language {a}.
• r1 + r2 represents the language L(r1) ∪ L(r2).
• r1 r2 represents L(r1) L(r2), the concatenation of the two languages.
• r1* represents (L(r1))*.
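The recursive definition and its meaning translate almost line for line into a recursive function. The sketch below is one possible encoding, assuming hypothetical classes Empty, Lam, Sym, Union, Concat and Star for ∅, λ, a, r1 + r2, r1 r2 and r1*. Since L(r) may be infinite, `lang(r, n)` (also a hypothetical name) returns only the strings of L(r) of length at most n. Parentheses need no class of their own, because the nesting of the objects already records the grouping.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Empty:      # ∅
    pass

@dataclass(frozen=True)
class Lam:        # λ
    pass

@dataclass(frozen=True)
class Sym:        # a single symbol a ∈ Σ
    a: str

@dataclass(frozen=True)
class Union:      # r1 + r2
    r1: object
    r2: object

@dataclass(frozen=True)
class Concat:     # r1 r2
    r1: object
    r2: object

@dataclass(frozen=True)
class Star:       # r1*
    r1: object

def lang(r, n):
    """The strings of L(r) whose length is at most n."""
    if isinstance(r, Empty):
        return set()                                   # L(∅) = ∅
    if isinstance(r, Lam):
        return {""}                                    # L(λ) = {λ}
    if isinstance(r, Sym):
        return {r.a} if n >= 1 else set()              # L(a) = {a}
    if isinstance(r, Union):
        return lang(r.r1, n) | lang(r.r2, n)           # L(r1) ∪ L(r2)
    if isinstance(r, Concat):                          # L(r1) L(r2)
        return {x + y for x in lang(r.r1, n) for y in lang(r.r2, n) if len(x + y) <= n}
    if isinstance(r, Star):                            # (L(r1))*
        base = lang(r.r1, n) - {""}                    # λ adds nothing new to a repetition
        result, frontier = {""}, {""}
        while frontier:                                # append one more piece until nothing new fits
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")

r = Concat(Star(Sym("a")), Union(Sym("a"), Sym("b")))   # a*(a + b)
print(sorted(lang(r, 3), key=lambda w: (len(w), w)))    # ['a', 'b', 'aa', 'ab', 'aaa', 'aab']
```

Note that the sketch also reproduces the less obvious base cases: ∅* denotes {λ}, and concatenating anything with ∅ gives ∅.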

Page 9: Regular Expressions

Example 1:

What does a*(a + b) represent?
It represents zero or more a's followed by either an a or a b.
{a, b, aa, ab, aaa, aab, aaaa, aaab, …}
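One way to double-check a listing like this is to enumerate short strings and test each one against the expression. The sketch below uses Python's standard `re` module, where union is written | instead of +, so a*(a + b) becomes a*(a|b); the helper name `members` is hypothetical.

```python
import re
from itertools import product

# Python's re syntax writes union as "|" rather than "+": a*(a + b) becomes a*(a|b).
pattern = re.compile(r"a*(a|b)")

def members(pattern, sigma, n):
    """All strings over sigma of length <= n that the whole pattern matches, shortest first."""
    out = []
    for k in range(n + 1):
        for t in product(sigma, repeat=k):
            s = "".join(t)
            if pattern.fullmatch(s):   # fullmatch: the entire string, not just a prefix
                out.append(s)
    return out

print(members(pattern, "ab", 4))
# ['a', 'b', 'aa', 'ab', 'aaa', 'aab', 'aaaa', 'aaab']
```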

Page 10: Regular Expressions

Example 2:

What does (a + b)*(a + bb) represent?
It represents zero or more symbols, each of which can be an a or a b, followed by either a or bb.
{a, bb, aa, abb, ba, bbb, aaa, aabb, aba, abbb, baa, babb, bba, bbbb, …}

Page 11: Regular Expressions

Example 3:

What does (aa)*(bb)*b represent?
All strings over {a, b} that start with an even number of a's which are then followed by an odd number of b's.

It's important to understand the underlying meaning of a regular expression.
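The description can also be tested mechanically. The sketch below compares `re.fullmatch` for (aa)*(bb)*b against a direct test of "an even number of a's, then an odd number of b's" on every string over {a, b} of length at most 8. Agreement on short strings is evidence rather than a proof; `described` is a hypothetical helper.

```python
import re
from itertools import product

regex = re.compile(r"(aa)*(bb)*b")   # concatenation is juxtaposition, as on the slides

def described(s):
    """The slide's description: an even number of a's, then an odd number of b's."""
    i = len(s) - len(s.lstrip("a"))  # length of the leading block of a's
    rest = s[i:]
    return i % 2 == 0 and set(rest) <= {"b"} and len(rest) % 2 == 1

mismatches = []
for k in range(9):
    for t in product("ab", repeat=k):
        s = "".join(t)
        if bool(regex.fullmatch(s)) != described(s):
            mismatches.append(s)
print(mismatches)  # [] : both agree on every string of length <= 8
```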

Page 12: Regular Expressions

Example 4:

Find a regular expression for strings of 0's and 1's which have at least one pair of consecutive 0's.

Each such string must have a 00 somewhere in it. It could have any string in front of it and any string after it, as long as the 00 is there!
Any string is represented by (0 + 1)*.
Answer: (0 + 1)*00(0 + 1)*
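A brute-force comparison with the defining property ("00 occurs somewhere in the string") is a cheap sanity check for the answer. The sketch below writes (0 + 1)*00(0 + 1)* in Python's `re` syntax and tests all binary strings of length at most 10.

```python
import re
from itertools import product

regex = re.compile(r"(0|1)*00(0|1)*")   # (0 + 1)*00(0 + 1)* in Python's re syntax

bad = []
for k in range(11):
    for t in product("01", repeat=k):
        s = "".join(t)
        # Compare the regular expression with the direct definition: "00" is a substring.
        if bool(regex.fullmatch(s)) != ("00" in s):
            bad.append(s)
print(bad)  # []
```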

Page 13: Regular Expressions

Example:

Find a regular expression for strings of 0's and 1's which have no pairs of consecutive 0's.

• It's a repetition of pieces, each of which is either a 1 or a 0 followed by at least one 1.
• (1 + 011*)*
• or equivalently, (1 + 01)*
• But such strings can't end in a 0.

Page 14: Regular Expressions

Example:

Find a regular expression for strings of 0's and 1's which have no pairs of consecutive 0's.

• (1 + 011*)*
• (1 + 01)*
• But such strings can't end in a 0.
• So we add (0 + λ) to the end to allow for this.
• (1 + 01)*(0 + λ)

This is only one of many possible answers.
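The answer, and the longer (1 + 011*)* variant with the same (0 + λ) fix, can be checked against the property "no 00 anywhere". In Python's `re` syntax, (0 + λ) means "an optional 0" and is written 0?.

```python
import re
from itertools import product

answer = re.compile(r"(1|01)*0?")     # (1 + 01)*(0 + λ)
longer = re.compile(r"(1|011*)*0?")   # (1 + 011*)*(0 + λ)

bad = []
for k in range(11):
    for t in product("01", repeat=k):
        s = "".join(t)
        expected = "00" not in s      # the property the expression should capture
        if bool(answer.fullmatch(s)) != expected or bool(longer.fullmatch(s)) != expected:
            bad.append(s)
print(bad)  # []
```

Dropping the trailing 0? from either pattern makes the check fail on strings such as "0" and "10", which is exactly the problem the (0 + λ) term fixes.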

Page 15: Regular Expressions

Why are they called regular expressions? Because, as it turns out, the set of languages they describe is exactly the set of regular languages.

That means that regular expressions are just another model for the same thing as finite automata.

Page 16: Regular Expressions

Homework:

Chapter 3, Section 1
• Problems 1-11, 17, 18

Page 17: Regular Expressions and Regular Languages

As we have said, regular expressions and finite automata are really different ways of expressing the same thing.

Let's see why. Given a regular expression, how can we build an equivalent finite automaton?
(We won't bother going the other way, although it can be done.)

Page 18: Regular Expressions and Regular Languages

Clearly there are simple finite automata corresponding to the simple regular expressions:

[Figure: small nfa's for the simple regular expressions, with transitions labeled λ and a]

Note that each of these has an initial state and one accepting state.

Page 19: Regular Expressions and Regular Languages

On the previous slide, we saw that the simplest regular expressions can be represented by a finite automaton with an initial state (duh!) and one isolated accepting state:

Page 20: Regular Expressions and Regular Languages

We can build more complex automata for more complex regular expressions using this model:

Page 21: Regular Expressions and Regular Languages

Here's how we build an nfa for r1 + r2:

[Figure: nfa for r1 + r2: a new initial state with λ-transitions into the nfa's for r1 and r2, whose accepting states have λ-transitions to a new accepting state]

Page 22: Regular Expressions and Regular Languages

Here's how we build an nfa for r1 r2:

[Figure: nfa for r1 r2: the nfa for r1 followed by the nfa for r2, joined by λ-transitions]

Page 23: Regular Expressions and Regular Languages

Here's how we build an nfa for (r1)*:

[Figure: nfa for (r1)*: the nfa for r1 with added states and λ-transitions that allow r1 to be skipped or repeated]

Note: the last state added is not in the book. For safety, I add it so that only one arc goes into the final state.
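The constructions on the last few slides are usually known as Thompson's construction. The sketch below is one way to code them, assuming a hypothetical nested-tuple encoding of regular expressions. It follows the standard textbook version, which may differ in small details from the figures (for example in the exact number of λ-transitions), and, in the spirit of the note above, it gives (r1)* its own new accepting state. The function `accepts` simulates the resulting nfa by tracking the set of states reachable so far, taking the λ-closure after every step.

```python
from itertools import count

LAMBDA = None  # label used for λ-transitions

def thompson(r):
    """Build an nfa (start, accept, edges) for a regular expression given as a nested tuple.

    Hypothetical encoding: ("empty",) ("lam",) ("sym", "a") ("+", r1, r2) (".", r1, r2) ("*", r1).
    edges is a list of (from_state, label, to_state); label LAMBDA means λ.
    """
    ids = count()
    edges = []

    def build(node):
        kind = node[0]
        if kind in ("empty", "lam", "sym"):
            s, f = next(ids), next(ids)        # one initial state, one accepting state
            if kind == "lam":
                edges.append((s, LAMBDA, f))
            elif kind == "sym":
                edges.append((s, node[1], f))
            return s, f                        # "empty": no edge, so nothing is accepted
        if kind == "+":
            s1, f1 = build(node[1])
            s2, f2 = build(node[2])
            s, f = next(ids), next(ids)
            edges.extend([(s, LAMBDA, s1), (s, LAMBDA, s2),      # choose r1 or r2
                          (f1, LAMBDA, f), (f2, LAMBDA, f)])     # both lead to one final state
            return s, f
        if kind == ".":
            s1, f1 = build(node[1])
            s2, f2 = build(node[2])
            edges.append((f1, LAMBDA, s2))     # r1's accepting state leads into r2
            return s1, f2
        if kind == "*":
            s1, f1 = build(node[1])
            s, f = next(ids), next(ids)
            edges.extend([(s, LAMBDA, s1), (s, LAMBDA, f),       # enter r1, or skip it
                          (f1, LAMBDA, s1), (f1, LAMBDA, f)])    # repeat r1, or finish
            return s, f
        raise ValueError(f"unknown node {node!r}")

    start, accept = build(r)
    return start, accept, edges

def accepts(nfa, w):
    """Simulate the nfa on w using sets of states."""
    start, accept, edges = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for p, label, t in edges:
                if p == q and label is LAMBDA and t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    current = closure({start})
    for ch in w:
        current = closure({t for p, label, t in edges if p in current and label == ch})
    return accept in current

# (a + bb)(a + b)*(bb)
a_or_bb = ("+", ("sym", "a"), (".", ("sym", "b"), ("sym", "b")))
a_or_b_star = ("*", ("+", ("sym", "a"), ("sym", "b")))
bb = (".", ("sym", "b"), ("sym", "b"))
nfa = thompson((".", (".", a_or_bb, a_or_b_star), bb))
print([w for w in ["abb", "bbbb", "ababb", "ab", "bb", "abba"] if accepts(nfa, w)])
# ['abb', 'bbbb', 'ababb']
```

Each construction step adds at most two states and four λ-transitions, so the nfa grows linearly with the size of the expression.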

Page 24: Building an nfa from a regular expression

Example: Consider the regular expression (a + bb)(a + b)*(bb)

[Figure: nfa for (a + bb)(a + b)*(bb), assembled from the constructions above with many λ-transitions. Annotation: "sometimes we just get tired and take an obvious shortcut"]

Page 25: Building a regular expression from a finite automaton

The book goes on to show that it works the other way around as well: we can find a corresponding regular expression for any finite automaton.

It's fairly easy in some cases, and you can "just do it."

However, it's generally complicated and not worth the bother of studying.

You are not responsible for this material.

Page 26: Building a regular expression from a finite automaton

[Figure: a finite automaton with transitions labeled a; a, b; and c]

The above automaton clearly corresponds to

a*(a + b)c*
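The figure itself did not survive, so the sketch below rests on an assumed reading reconstructed from its edge labels: an initial state with an a-loop, a transition on a or b to an accepting state, and a c-loop on that accepting state. Under that assumption, simulating the automaton and comparing it with a*(a|b)c* in Python's `re` syntax shows the two agree on all short strings over {a, b, c}.

```python
import re
from itertools import product

# Assumed reading of the figure (a guess from its edge labels):
# state 0 is initial and loops on a; a or b leads from 0 to state 1;
# state 1 is accepting and loops on c. On input a, state 0 has two choices.
delta = {(0, "a"): {0, 1}, (0, "b"): {1}, (1, "c"): {1}}
START, ACCEPTING = 0, {1}

def nfa_accepts(w):
    current = {START}
    for ch in w:
        # all states reachable from any current state on symbol ch
        current = set().union(*(delta.get((q, ch), set()) for q in current))
    return bool(current & ACCEPTING)

bad = []
for k in range(7):
    for t in product("abc", repeat=k):
        s = "".join(t)
        if nfa_accepts(s) != bool(re.fullmatch(r"a*(a|b)c*", s)):
            bad.append(s)
print(bad)  # []
```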

Page 27: Regular Expressions and nfa's

Homework: Chapter 3, Section 2
• Problems 1-5

Page 28: Regular Grammars

Review: A grammar is a quadruple G = (V, T, S, P) where
• V is a finite set of variables
• T is a finite set of symbols, called terminals
• S is in V and is called the start symbol
• P is a finite set of productions, which are rules of the form α → β
• where α and β are strings consisting of terminals and variables.

Page 29: Regular Grammars

A grammar is said to be right-linear if every production in P is of the form A → xB or A → x, where A and B are variables (perhaps the same, perhaps the start symbol S) in V and x is any string of terminal symbols (including the empty string λ).

Page 30: Regular Grammars

An alternate (and better) definition of a right-linear grammar says that every production in P is of the form A → aB or A → a or S → λ (to allow λ to be in the language), where A and B are variables (perhaps the same, but B can't be S) in V and a is any terminal symbol.

Page 31: Regular Grammars

The reason I prefer the second definition (although I accept the first one, which happens to be used in the book) is:
• It's easier to work with in proving things.
• It's the much more common definition.

Page 32: Regular Grammars

A grammar is said to be left-linear if every production in P is of the form A → Bx or A → x, where A and B are variables (perhaps the same, perhaps the start symbol S) in V and x is any string of terminal symbols (including the empty string λ).

Page 33: Regular Grammars

The alternate definition of a left-linear grammar says that every production in P is of the form A → Ba or A → a or S → λ, where A and B are variables (perhaps the same, but B can't be S) in V and a is any terminal symbol.

Page 34: Regular Grammars

Any left-linear or right-linear grammar is called a regular grammar.
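The book's definitions (A → xB or A → x for right-linear, A → Bx or A → x for left-linear) are easy to check mechanically. In this sketch, productions are (left side, right side) pairs, variables are single uppercase letters, terminals are lowercase letters, and "" stands for λ; this encoding and the function names are hypothetical. A grammar counts as regular only if all of its productions are right-linear or all of them are left-linear.

```python
def variable_positions(rhs):
    """Indices of the variables (uppercase letters) in a right-hand side."""
    return [i for i, c in enumerate(rhs) if c.isupper()]

def is_right_linear(productions):
    # Every production is A -> x or A -> xB: at most one variable, and only at the end.
    return all(
        len(lhs) == 1 and lhs.isupper()
        and variable_positions(rhs) in ([], [len(rhs) - 1])
        for lhs, rhs in productions
    )

def is_left_linear(productions):
    # Every production is A -> x or A -> Bx: at most one variable, and only at the front.
    return all(
        len(lhs) == 1 and lhs.isupper()
        and variable_positions(rhs) in ([], [0])
        for lhs, rhs in productions
    )

def is_regular(productions):
    # All right-linear or all left-linear; a mixture of the two does not count.
    return is_right_linear(productions) or is_left_linear(productions)

g1 = [("S", "abS"), ("S", "a")]                          # S → abS | a
g2 = [("S", "A"), ("A", "aB"), ("A", ""), ("B", "Ab")]   # S → A, A → aB | λ, B → Ab
print(is_right_linear(g1), is_left_linear(g1), is_regular(g1))  # True False True
print(is_right_linear(g2), is_left_linear(g2), is_regular(g2))  # False False False
```

In g2 each production on its own is either right-linear or left-linear, but A → aB and B → Ab point in opposite directions, so the grammar as a whole is neither.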

Page 35: Regular Grammars

For brevity, we often write a set of productions such as
A → x1
A → x2
A → x3
as
A → x1 | x2 | x3

Page 36: Regular Grammars

A derivation in grammar G is any sequence of strings over V ∪ T, connected with ⇒, starting with S and ending with a string containing no variables, where each subsequent string is obtained by applying a production in P:
S ⇒ x1 ⇒ x2 ⇒ x3 ⇒ . . . ⇒ xn
abbreviated as: S ⇒* xn

Page 37: Regular Grammars

S ⇒ x1 ⇒ x2 ⇒ x3 ⇒ . . . ⇒ xn
abbreviated as: S ⇒* xn

We say that xn is a sentence of the language generated by G, L(G).

We say that the other x's are sentential forms.

Page 38: Regular Grammars

L(G) = {w | w ∈ T* and S ⇒* w}

We call L(G) the language generated by G.

L(G) is the set of all sentences of grammar G.
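The definition of L(G) suggests a brute-force way to list its short sentences: start from S, apply one production at a time (each application is one ⇒ step), and keep the strings that contain no variables. The sketch below assumes a hypothetical encoding in which variables are single uppercase letters, terminals are lowercase letters, and "" stands for λ. It always rewrites the leftmost variable and drops sentential forms with more than max_len terminals, which is safe because no production ever removes a terminal. For a regular grammar every sentential form has at most one variable, so the search is finite.

```python
from collections import deque

def sentences(productions, start="S", max_len=5):
    """Sentences w of L(G) with |w| <= max_len, found by breadth-first leftmost derivations."""
    found, seen = set(), {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None:                              # no variables left: form is a sentence
            found.add(form)
            continue
        for rhs in productions.get(form[i], []):
            nxt = form[:i] + rhs + form[i + 1:]    # one derivation step: form ⇒ nxt
            if sum(c.islower() for c in nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(found, key=lambda w: (len(w), w))

print(sentences({"S": ["abS", "a"]}, max_len=5))   # ['a', 'aba', 'ababa']
```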

Page 39: Example 1

S → abS | a is an example of a right-linear grammar.

Can you figure out what language it generates?
L = {w ∈ {a, b}* | w consists of alternating a's and b's, and begins and ends with an a}

L((ab)*a)

Page 40: Example 2

S → Aab
A → Aab | Ba
B → a
is an example of a left-linear grammar.

Can you figure out what language it generates?
L = {w ∈ {a, b}* | w is aa followed by one or more copies of ab}

L(aaab(ab)*)
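Both example answers can be checked by generating every sentence up to a length bound and comparing with the claimed regular expression. The sketch below assumes a hypothetical encoding (uppercase letters are variables, lowercase letters are terminals) and hypothetical helper names. Example 2 is encoded with A → Ba, the left-linear form; the check also confirms that writing that production as A → aB yields exactly the same sentences.

```python
import re
from collections import deque
from itertools import product

def sentences(productions, start="S", max_len=8):
    """Sentences of L(G) with at most max_len symbols, via breadth-first leftmost derivations."""
    found, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None:                              # no variables left: a sentence
            found.add(form)
            continue
        for rhs in productions[form[i]]:
            nxt = form[:i] + rhs + form[i + 1:]
            # terminals are never erased, so forms with too many of them can be dropped
            if sum(c.islower() for c in nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found

def matches(pattern, max_len=8):
    """Strings over {a, b} of length <= max_len that the whole pattern matches."""
    words = ("".join(t) for k in range(max_len + 1) for t in product("ab", repeat=k))
    return {w for w in words if re.fullmatch(pattern, w)}

ex1 = {"S": ["abS", "a"]}                                  # S → abS | a
ex2 = {"S": ["Aab"], "A": ["Aab", "Ba"], "B": ["a"]}       # S → Aab, A → Aab | Ba, B → a
ex2_alt = {"S": ["Aab"], "A": ["Aab", "aB"], "B": ["a"]}   # same, but with A → aB

print(sentences(ex1) == matches(r"(ab)*a"))      # True
print(sentences(ex2) == matches(r"aaab(ab)*"))   # True
print(sentences(ex2_alt) == sentences(ex2))      # True
```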

Page 41: Example 3

Consider the grammar
S → A
A → aB | λ
B → Ab

This grammar is NOT regular: A → aB is right-linear, but B → Ab is left-linear. No "mixing and matching" left- and right-recursive productions. (In fact this grammar generates {a^n b^n | n ≥ 0}, which is not a regular language.)

Page 42: Regular Grammars and nfa's

It's not hard to show that regular grammars generate, and nfa's accept, the same class of languages: the regular languages!

It's a long proof, where we must show that any finite automaton has a corresponding left- or right-linear grammar, and any regular grammar has a corresponding nfa.

We won't bother with the details.

Page 43: Regular Grammars and nfa's

We get a feel for this by example. Let
S → aA
A → abS | b

[Figure: nfa for this grammar, with states S and A and transitions labeled a and b]
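The idea behind this example can be sketched in code: each variable becomes a state, a production A → xB becomes a path spelling x from state A to state B (through new intermediate states), and A → x becomes a path to a new accepting state. This is one standard construction, not necessarily the book's; the encoding (uppercase variables, lowercase terminals) and the function names are hypothetical, and for brevity every right-hand side must contain at least one terminal. Since S ⇒ aA ⇒ ab and S ⇒ aA ⇒ aabS, this grammar generates (aab)*ab, which the check below uses.

```python
import re
from itertools import count, product

def grammar_to_nfa(productions, start="S"):
    """Sketch: nfa (start, accepting states, transitions) for a right-linear grammar."""
    final = "final"                      # new accepting state (never a single-letter variable)
    fresh = count()
    delta = {}                           # (state, symbol) -> set of next states
    for lhs, rhss in productions.items():
        for rhs in rhss:
            if rhs and rhs[-1].isupper():
                word, target = rhs[:-1], rhs[-1]       # A -> xB
            else:
                word, target = rhs, final              # A -> x
            if not word:
                raise ValueError(f"{lhs} -> {rhs!r}: no terminals (not handled here)")
            state = lhs
            for k, ch in enumerate(word):
                # the last symbol of x reaches the target; earlier ones reach new states
                nxt = target if k == len(word) - 1 else f"q{next(fresh)}"
                delta.setdefault((state, ch), set()).add(nxt)
                state = nxt
    return start, {final}, delta

def nfa_accepts(nfa, w):
    start, accepting, delta = nfa
    current = {start}
    for ch in w:
        current = set().union(*(delta.get((q, ch), set()) for q in current))
    return bool(current & accepting)

nfa = grammar_to_nfa({"S": ["aA"], "A": ["abS", "b"]})   # S → aA, A → abS | b
bad = [
    s
    for s in ("".join(t) for k in range(10) for t in product("ab", repeat=k))
    if nfa_accepts(nfa, s) != bool(re.fullmatch(r"(aab)*ab", s))
]
print(bad)  # []
```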

Page 44: Regular Grammars and Regular Expressions

Example: L(aab*a)

We can easily construct a regular grammar for this expression:
S → aA
A → aB
B → bB
B → a
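Because this grammar uses only productions of the form A → aB and A → a, its sentences can be generated by a short recursion that emits the terminals of one production per step. The sketch below assumes uppercase letters are variables and lowercase letters are terminals; the helper name `generate` is hypothetical. It compares the sentences of length at most 6 with the strings matched by aab*a.

```python
import re
from itertools import product

def generate(productions, var, max_len):
    """Terminal strings of length <= max_len derivable from var.

    Assumes right-linear productions A -> xB or A -> x with x a non-empty terminal
    string, so every recursive call has a strictly smaller length budget.
    """
    out = set()
    for rhs in productions[var]:
        if rhs[-1].isupper():                          # A -> xB
            prefix, nxt = rhs[:-1], rhs[-1]
            if len(prefix) <= max_len:
                out |= {prefix + w for w in generate(productions, nxt, max_len - len(prefix))}
        elif len(rhs) <= max_len:                      # A -> x
            out.add(rhs)
    return out

g = {"S": ["aA"], "A": ["aB"], "B": ["bB", "a"]}      # S → aA, A → aB, B → bB | a
words = generate(g, "S", 6)
print(sorted(words, key=len))   # ['aaa', 'aaba', 'aabba', 'aabbba']

expected = {
    w for k in range(7) for w in map("".join, product("ab", repeat=k))
    if re.fullmatch(r"aab*a", w)
}
print(words == expected)        # True
```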

Page 45: Regular Languages

[Figure: regular expressions, regular grammars, and finite automata as three equivalent descriptions of the regular languages]

Page 46: Regular Languages

Homework: Chapter 3, Section 3 Problems