meljun cortes automata8

26
CSC 3130: Automata theory and formal languages Context-free languages MELJUN P. CORTES, MELJUN P. CORTES, MBA,MPA,BSCS,ACS MBA,MPA,BSCS,ACS MELJUN CORTES MELJUN CORTES

Upload: meljun-cortes

Post on 15-Jul-2015

59 views

Category:

Technology


3 download

TRANSCRIPT

Page 1: MELJUN CORTES automata8

CSC 3130: Automata theory and formal languages

Context-free languages

MELJUN P. CORTES, MELJUN P. CORTES, MBA,MPA,BSCS,ACSMBA,MPA,BSCS,ACS

MELJUN CORTESMELJUN CORTES

Page 2: MELJUN CORTES automata8

Context-free grammar

• This is an a different model for describing languages

• The language is specified by productions (substitution rules) that tell how strings can be obtained, e.g.

• Using these rules, we can derive strings like this:

A → 0A1A → BB → #

A, B are variables0, 1, # are terminalsA is the start variable

A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111

Page 3: MELJUN CORTES automata8

Some natural examples

• Context-free grammars were first used for natural languages

a girl with a flower likes the boy

SENTENCE

NOUN-PHRASE VERB-PHRASE

ART NOUN PREP ART NOUN VERB ART NOUN

CMPLX-NOUNCMPLX-NOUN CMPLX-NOUN

PREP-PHRASE

CMPLX-VERB

NOUN-PHRASE

Page 4: MELJUN CORTES automata8

Natural languages

• We can describe (some fragments) of the English language by a context-free grammar:

SENTENCE → NOUN-PHRASE VERB-PHRASENOUN-PHRASE → CMPLX-NOUNNOUN-PHRASE → CMPLX-NOUN PREP-PHRASEVERB-PHRASE → CMPLX-VERBVERB-PHRASE → CMPLX-VERB PREP-PHRASEPREP-PHRASE → PREP CMPLX-NOUNCMPLX-NOUN → ARTICLE NOUNCMPLX-VERB → VERB NOUN-PHRASECMPLX-VERB → VERB

ARTICLE → aARTICLE → theNOUN → boyNOUN → girlNOUN → flowerVERB → likesVERB → touchesVERB → seesPREP → with

variables: SENTENCE, NOUN-PHRASE, …

terminals: a, the, boy, girl, flower, likes, touches, sees, with

start variable: SENTENCE

Page 5: MELJUN CORTES automata8

Programming languages

• Context-free grammars are also used to describe (parts of) programming languages

• For instance, expressions like (2 + 3) * 5 or 3 + 8 + 2 * 7 can be described by the CFG

<expr> → <expr> + <expr>

<expr> → <expr> * <expr>

<expr> → (<expr>)

<expr> → 0

<expr> → 1

<expr> → 9

Variables: <expr>

Terminals: +, *, (, ), 0, 1, …, 9

Page 6: MELJUN CORTES automata8

Motivation for studying CFGs

• Context-free grammars are essential for understanding the meaning of computer programs

• They are used in compilers

code: (2 + 3) * 5

meaning: “add 2 and 3, and then multiply by 5”

Page 7: MELJUN CORTES automata8

Definition of context-free grammar

• A context-free grammar (CFG) is a 4-tuple (V, T, P, S) where– V is a finite set of variables or non-terminals– T is a finite set of terminals (V ∩T = ∅)– P is a set of productions or substitution rules of the form

where A is a symbol in V and α is a string over V ∪ T– S is a variable in V called the start variable

A → α

Page 8: MELJUN CORTES automata8

Shorthand notation for productions

• When we have multiple productions with the same variable on the left like

we can write this in shorthand as

E → E + E

E → E * E

E → (E)

E → N

E → E + E | E * E | (E) | 0 | 1

N → 0N | 1N | 0 | 1

Variables: E, N

Terminals: +, *, (, ), 0, 1

Start variable: E

N → 0N

N → 1N

N → 0

N → 1

Page 9: MELJUN CORTES automata8

Derivation

• A derivation is a sequential application of productions:

E

deri

vatio

n

⇒ E * E

⇒ (E) * E

⇒ (E) * N

⇒ (E + E ) * N

⇒ (E + E ) * 1

⇒ (E + N) * 1

⇒ (N + N) * 1

⇒ (N + 1N) * 1

⇒ (N + 10) * 1

⇒ (1 + 10) * 1

α ⇒ βmeans β can be obtainedfrom α with one production

α ⇒ β*

means β can be obtainedfrom α after zero or moreproductions

Page 10: MELJUN CORTES automata8

Language of a CFG

• The language of a CFG (V, T, P, S) is the set of all strings containing only terminals that can be derived from the start variable S

• This is a language over the alphabet T

• A language L is context-free if it is the language of some CFG

L = {ω | ω ∈ T* and S ⇒ ω }*

Page 11: MELJUN CORTES automata8

Example 1

• Is the string 00#11 in L?

• How about 00#111, 00#0#1#11?

• What is the language of this CFG?

A → 0A1 | BB → #

variables: A, Bterminals: 0, 1, # start variable: A

L = {0n#1n: n ≥ 0}

Page 12: MELJUN CORTES automata8

Example 2

• Give derivations of (), (()())

• How about ())?

S → SS | (S) | εconvention: variables in uppercase, terminals in lowercase, start variable first

S ⇒ (S) (rule 2) ⇒ () (rule 3)

S ⇒ (S) (rule 2)⇒ (SS) (rule 1)⇒ ((S)S) (rule 2)⇒ ((S)(S)) (rule 2)⇒ (()(S)) (rule 3)⇒ (()()) (rule 3)

Page 13: MELJUN CORTES automata8

Examples: Designing CFGs

• Write a CFG for the following languages– Linear equations over x, y, z, like:

x + 5y – z = 911x – y = 2

– Numbers without leading zeros, e.g., 109, 0 but not 019

– The language L = {anbncmdm | n ≥ 0, m ≥ 0}

– The language L = {anbmcmdn | n ≥ 0, m ≥ 0}

Page 14: MELJUN CORTES automata8

Context-free versus regular

• Write a CFG for the language (0 + 1)*111

• Can you do so for every regular language?

• Proof:

S → A111A → ε | 0A | 1A

Every regular language is context-free

regularexpression

DFANFA

Page 15: MELJUN CORTES automata8

From regular to context-free

regular expression

∅εa (alphabet symbol)

E1 + E2

CFG

E1E2

E1*

grammar with no rules

S → εS → a

S → S1 | S2

S → S1S2

S → SS1 | ε

In all cases, S becomes the new start symbol

Page 16: MELJUN CORTES automata8

Context-free versus regular

• Is every context-free language regular?

• No! We already saw some examples:

• This language is context-free but not regular

A → 0A1 | BB → #

L = {0n#1n: n ≥ 0}

Page 17: MELJUN CORTES automata8

Parse tree

• Derivations can also be represented using parse trees

E ⇒ E + E ⇒ V + E ⇒ x + E ⇒ x + (E) ⇒ x + (E − E) ⇒ x + (V − E) ⇒ x + (y − E) ⇒ x + (y − V) ⇒ x + (y − z)

E

E E+

V ( E )

E − E

V V

x

y z

E → E + E | E - E | (E) | V V → x | y | z

Page 18: MELJUN CORTES automata8

Definition of parse tree

• A parse tree for a CFG G is an ordered tree with labels on the nodes such that– Every internal node is labeled by a variable– Every leaf is labeled by a terminal or ε– Leaves labeled by ε have no siblings– If a node is labeled A and has children A1, …, Ak from left to

right, then the rule

is a production in G.

A → A1…Ak

Page 19: MELJUN CORTES automata8

Left derivation

• Always derive the leftmost variable first:

• Corresponds to a left-to-right traversal of parse tree

E ⇒ E + E ⇒ V + E ⇒ x + E ⇒ x + (E) ⇒ x + (E − E) ⇒ x + (V − E) ⇒ x + (y − E) ⇒ x + (y − V) ⇒ x + (y − z)

E

E E+

V ( E )

E − E

V V

x

y z

Page 20: MELJUN CORTES automata8

Ambiguity

• A grammar is ambiguous if some strings have more than one parse tree

• Example: E → E + E | E − E | (E) | V V → x | y | z

x + y + z

E

E E+

E E+V

V Vx

y z

E

E E+

E E+ V

V V

x y

z

Page 21: MELJUN CORTES automata8

Why ambiguity matters

• The parse tree represents the intended meaning:

x + y + z

E

E E+

E E+V

V Vx

y z

E

E E+

E E+ V

V V

x y

z

“first add y and z, and then add this to x”

“first add x and y, and then add z to this”

Page 22: MELJUN CORTES automata8

Why ambiguity matters

• Suppose we also had multiplication:E → E + E | E − E | E × E | (E) | V V → x | y | z

x × y + z

E

E E*

E E+V

V Vx

y z

E

E E+

E E× V

V V

x y

z

“first x × y, then + z”“first y + z, then x ×”

Page 23: MELJUN CORTES automata8

Disambiguation

• Sometimes we can rewrite the grammar to remove the ambiguity

• Rewrite grammar so × cannot be broken by +:

E → E + E | E − E | E × E | (E) | V V → x | y | z

E → T | E + T | E − TT → F | T × FF → (E) | VV → x | y | z

T stands for term: x * (y + z)F stands for factor: x, (y + z)

A term always splits into factors

A factor is either a variable or a parenthesized expression

Page 24: MELJUN CORTES automata8

Disambiguation

• Example

x × y + z

E → T | E + T | E − TT → F | T × FF → (E) | VV → x | y | z

V V V

F

T

FT

E

E

T

F

Page 25: MELJUN CORTES automata8

Disambiguation

• Can we always disambiguate a grammar?

• No, for two reasons– There exists an inherently ambiguous context-free L:

Every CFG for this language is ambiguous – There is no general procedure that can tell if a grammar is

ambiguous

• However, grammars used in programming languages can typically be disambiguated

Page 26: MELJUN CORTES automata8

Another Example

• Is ab, baba, abbbaa in L?

• How about a, bba?

• What is the language of this CFG?

• Is the CFG ambiguous?

S → aB | bAA → a | aS | bAAB → b | bS | aBB