cs 3813: introduction to formal languages and automata

34
CS 3813: Introduction to Formal Languages and Automata Chapter 1 Introduction to the Theory of Computation These class notes are based on material from our textbook, An Introduction to Formal Languages and Automata, 3 rd ed., by Peter Linz, published by Jones and Bartlett Publishers, Inc., Sudbury, MA, 2001. They are intended for classroom use only and are not a substitute for reading

Upload: jaafar

Post on 06-Jan-2016

45 views

Category:

Documents


2 download

DESCRIPTION

CS 3813: Introduction to Formal Languages and Automata. Chapter 1 Introduction to the Theory of Computation - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: CS 3813: Introduction to Formal Languages and Automata

CS 3813: Introduction to Formal Languages and Automata

Chapter 1

Introduction to the Theory of Computation

These class notes are based on material from our textbook, An Introduction to Formal Languages and Automata, 3rd ed., by Peter Linz, published by Jones and Bartlett Publishers, Inc., Sudbury, MA, 2001. They are intended for classroom use only and are not a substitute for reading the textbook.

Page 2: CS 3813: Introduction to Formal Languages and Automata

Acknowledgement

Unless otherwise credited, all class notes on this website are based on the required textbook for this course, Linz, Peter. An Introduction to Formal Languages and Automata, 3rd ed. Sudbury, Mass.: Jones and Bartlett Publishers, 2001.

These notes are intended solely for the use of the students in the CS 3813 class at Mississippi State University.

Please assume any errors to be mine, and not the author of the textbook.

Page 3: CS 3813: Introduction to Formal Languages and Automata

Topic of course

• What are the fundamental capabilities and limitations of computers?

• To answer this, we will study abstract mathematical models of computers

• These mathematical models abstract away many of the details of computers to allow us to focus on the essential aspects of computation

• It allows us to develop a mathematical theory of computation

Page 4: CS 3813: Introduction to Formal Languages and Automata

Review of set theory

Can specify a set in two ways: - list of elements: A = {6, 12, 28} - characteristic property: B = {x | x is a positive, even integer}

Set membership: 12 A, 9 ASet inclusion: A B (A is a subset of B)

A B (A is a proper subset of B)Set operations: union: A {9, 12} = {6, 9, 12, 28} intersection: A {9, 12} = {12} difference: A - {9, 12} = {6, 28}

Page 5: CS 3813: Introduction to Formal Languages and Automata

Set theory (continued)

Another set operation, called “taking the complement of a set”,assumes a universal set.

Let U = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} be the universal set.Let A = {2, 4, 6, 8}Then = U - A = {0, 1, 3, 5, 7, 9}

The empty set:

A

Page 6: CS 3813: Introduction to Formal Languages and Automata

Set theory (continued)

The cardinality of a set is the number of elements in a set.Let S = {2, 4, 6}Then |S| = 3

The powerset of S, represented by 2S, is the set of all subsets of S.2S = {{}, {2}, {4}, {6},{2,4}, {2,6}, {4,6}, {2,4,6}}

The number of elements in a powerset is |2S| = 2|S|

Page 7: CS 3813: Introduction to Formal Languages and Automata

What does the title of this course mean?

• Formal language– a subset of the set of all possible strings from from a set of

symbols

– example: the set of all syntactically correct C programs

• Automata– abstract, mathematical model of computer

– examples: finite automata, pushdown automata Turing machine, RAM, PRAM, many others

• Let’s consider each of these in turn

Page 8: CS 3813: Introduction to Formal Languages and Automata

Alphabet = finite set of symbols or characters examples: = {a,b}, binary, ASCII

String = finite sequence of symbols from an alphabet examples: aab, bbaba, also computer programs A formal language is a set of strings over an alphabetExamples of formal languages over alphabet = {a, b}:

L1 = {aa, aba, aababa, aa} L2 = {all strings containing just two a’s and any number of b’s}

A formal language can be finite or infinite.

Formal language

Page 9: CS 3813: Introduction to Formal Languages and Automata

We often use string variables; u = aab, v = bbaba

Operations on strings length: |u| = 3 reversal: uR = baa concatenation: uv = aabbbaba

The empty string, denoted , has some special properties:

| | = 0 ww w

Formal languages (continued)

Page 10: CS 3813: Introduction to Formal Languages and Automata

If w is a string, then wn stands for the string obtainedby repeating w n times.w0 =

}

L0 = {}L1 = L

Formal languages (continued)

Page 11: CS 3813: Introduction to Formal Languages and Automata

Operations on languagesSet operations:

L1 L2 = {x | x L1 or x L2} is unionL1 L2 = {x | x L1 and x L2} is intersectionL1 L2 = {x | x L1 and x L2} is difference = * - L is complementL1 L2 = (L1 - L2) (L2 - L1) is “symmetric difference”

String operations:LR = {wR | w L} is “reverse of language”L1L2 = {xy | x L1, y L2} is “concatenation of languages”L* = {x = x1…xk | k 0 and x1, …, xk L} =

L0 L1 L2 . . . . is “Kleene star” or "star closure"L+ = L1 L2 . . . . is positive closure

L

Page 12: CS 3813: Introduction to Formal Languages and Automata

Some review questions

• What is {, 01, 001} {, 00, 10}?

• What is the concatenation of {0, 11, 010} and {, 10, 010}?

• What are the 5 shortest strings in the language {0i1i | i 0}?

• What is the powerset {a, b, ab}?

Page 13: CS 3813: Introduction to Formal Languages and Automata

Important example of a formal language

• alphabet: ASCII symbols• string: a particular C++ program• formal language: set of all legal C++ programs

Page 14: CS 3813: Introduction to Formal Languages and Automata

Grammars

A grammar G is defined as a quadruple:

G = (V, T, S, P)

Where V is a finite set of objects called variables

T is a finite set of objects called terminal symbols

S V is a special symbol called the Start symbol

P is a finite set of productions or "production rules"

Sets V and T are nonempty and disjoint

Page 15: CS 3813: Introduction to Formal Languages and Automata

Grammars

Production rules have the form:x y

where x is an element of (V T)+ and y is in (V T)*

Given a string of the formw = uxv

and a production rulex y

we can apply the rule, replacing x with y, giving z = uyv

We can then say that w z

Read as "w derives z", or "z is derived from w"

Page 16: CS 3813: Introduction to Formal Languages and Automata

Grammars

If u v, v w, w x, x y, and y z, then we say: * u z

This says that u derives z in an unspecified number of steps.

Along the way, we may generate strings which contain variables as well as terminals. These are called sentential forms.

Page 17: CS 3813: Introduction to Formal Languages and Automata

Grammars

What is the relationship between a language and a grammar?

Let G = (V, T, S, P)

The set *

L(G) = {w T* : S w}is the language generated by G.

Page 18: CS 3813: Introduction to Formal Languages and Automata

Grammars

Consider the grammar G = (V, T, S, P), where:

V = {S}

T = {a, b}

S = S,

P = S aSb

S

Page 19: CS 3813: Introduction to Formal Languages and Automata

Grammars

What are some of the strings in this language?

S aSb ab

S aSb aaSbb aabb

S aSb aaSbb aaaSbbb aaabbb

It is easy to see that the language generated by this grammar is:

L(G) = {anbn : n 0}

(See proof on pp. 21-22 in Linz)

Page 20: CS 3813: Introduction to Formal Languages and Automata

GrammarsLet's go the other way, from a description of a language to a grammar that generates it. Find a grammar that generates:

L = {anbn+1 : n 0}

So the strings of this language will be:b (0 a's and 1 b)abb (1 a and 2 b's)aabbb (2 a's and 3 b's) . . .

In order to generate a string with no a's and 1 b, you might want to write rules for the grammar that say:

S ab a

But you can't do this; a is a terminal, and you can't change a terminal, only variables

Page 21: CS 3813: Introduction to Formal Languages and Automata

GrammarsSo, instead of:

S ab a

we create another variable, A (we often use capital letters to stand for variables), to use in place of the terminal, a:

S Ab A

Now you might think that we can use another S rule here to generate the other part of the string, the anbn part

S aSbBut you can't, because that will generate ab, aabb, etc.Note, however, that if we use A in place of S, that will solve our problem:

A aAb

Page 22: CS 3813: Introduction to Formal Languages and Automata

GrammarsSo, here are our rules:

S AbA aAbA

The S Ab rule creates a single b terminal on the right, preceded by other strings (including possibly the empty string) on the left.

The A rule allows the single b string to be generated.

The A aAb rule and the A rule allows ab, aabb, aaabbb, etc. to be generated on the left side of the string.

Page 23: CS 3813: Introduction to Formal Languages and Automata

Language-recognition problem

• There are many types of computational problem. We will focus on the simplest, called the “language-recognition problem.”

• Given a string, determine whether it belongs to a language or not. (Practical application for compilers: Is this a valid C++ program?)

• We study simple models of computation called “automata,” and measure their computational power in terms of the class of languages they can recognize.

Page 24: CS 3813: Introduction to Formal Languages and Automata

Automata

• Input File

• Control Unit (with finite states)

• Temporary Storage

• Output

Page 25: CS 3813: Introduction to Formal Languages and Automata

“Computer” or Turing machine(Alan Turing 1936)

0

1

2

3

X 0 X B 0

Finite-statecontrol

Infinite tape or “memory”

Read/write head

Page 26: CS 3813: Introduction to Formal Languages and Automata

Finite automata• Developed in 1940’s and 1950’s for neural net models of

brain and computer hardware design• Finite memory!• Many applications:

– text-editing software: search and replace– many forms of pattern-recognition (including use in WWW

search engines)– compilers: recognizing keywords (lexical analysis)– sequential circuit design– software specification and design– communications protocols

Page 27: CS 3813: Introduction to Formal Languages and Automata

Pushdown automata

• Noam Chomsky’s work in 1950’s and 1960’s on grammars for natural languages

• infinite memory, organized as a stack

• Applications:– compilers: parsing computer programs– programming language design

Page 28: CS 3813: Introduction to Formal Languages and Automata

Computational power

FSA

PDA

LBA

TM

Page 29: CS 3813: Introduction to Formal Languages and Automata

Automata, languages, and grammars

• In this course, we will study the relationship between automata, languages, and grammars

• Recall that a formal language is a set of strings over a finite alphabet

• Automata are used to recognize languages

• Grammars are used to generate languages

• (All these concepts fit together)

Page 30: CS 3813: Introduction to Formal Languages and Automata

Classification of automata, languages, and grammars

Automata Language GrammarTuring machine Recursively

enumerableRecursively enumerable

Linear-bounded automaton

Context sensitive Context sensitive

Nondeterministic push-down automaton

Context free Context free

Finite-state automaton

regular regular

Page 31: CS 3813: Introduction to Formal Languages and Automata

Besides developing a theory of classes of languages and automata, we will study the limits of computation. We will consider the following two important questions:– What problems are impossible for a computer

to solve?– What problems are too difficult for a computer

to solve in practice (although possible to solve in principle)?

Page 32: CS 3813: Introduction to Formal Languages and Automata

Uncomputable (undecidable) problems

• Many well-defined (and apparently simple) problems cannot be solved by any computer

• Examples:– For any program x, does x have an infinite loop?– For any two programs x and y, do these two

programs have the same input/output behavior?– For any program x, does x meet its specification?

(i.e., does it have any bugs?)

Page 33: CS 3813: Introduction to Formal Languages and Automata

Intractable problems• We will learn how to mathematically characterize the

difficulty of computational problems.• There is a class of problems that can be solved in a

reasonable amount of time and another class that cannot (What good is it for a problem to be solvable, if it cannot be solved in the lifetime of the universe?)

• The field of cryptography, for example, relies on the fact that the computational problem of “breaking a code” is intractable

Page 34: CS 3813: Introduction to Formal Languages and Automata

Why study the theory of computing?

• Core mathematics of CS (has not changed in over 30 years)

• Many applications, especially in design of compilers and programming languages

• Important to be able to recognize uncomputable and intractable problems

• Need to know this in order to be a computer scientist, not simply a computer programmer