
Upload: dinhnga

Post on 15-Mar-2019


Context-Free Grammars and Languages

BBM 401 - Automata Theory and Formal Languages 1

Context-Free Grammars and Languages

• We have seen that many languages cannot be regular. Thus we need to consider larger classes of languages.

• Context-free languages (CFLs) have played a central role in the study of natural languages and in compilers.

• Context-free grammars (CFGs) are used to define context-free languages (CFLs).

• We will talk about:
– CFLs
– CFGs
– Derivations
– Parse trees
– Pushdown automata
– Closure properties of CFLs

An Example of CFG

• Consider the language of palindromes: Lpal = { w ∈ Σ* : w = w^R }, where w^R denotes the reverse of w.
• Some members of Lpal: abba, bob, ses, tat.
• Lpal is NOT regular; this can be shown with the pumping lemma.
• Context-free grammars (CFGs) are a formal mechanism for defining context-free languages (CFLs).
• Lpal is a context-free language.

An Example of CFG - Lpal

• Let Σ = {0,1} be the alphabet for Lpal. For example, 0110, 010, 0000, 1111 ∈ Lpal.
• A CFG for Lpal can be defined as:
1: P → ε
2: P → 0
3: P → 1
4: P → 0P0
5: P → 1P1
• 0 and 1 are terminals.
• P is a variable (or nonterminal).
• In this grammar, P is also the start symbol.
• 1-5 are productions (or rules).

Formal Definition of CFG's

• A context-free grammar is a quadruple

G = (V, T, P, S)

where
• V is a finite set of variables (nonterminals).
• T is a finite set of terminals.
• P is a finite set of productions of the form A → α, where A is a variable and α ∈ (V ∪ T)*.
• S is a designated variable called the start symbol.

An Example of CFG - Gpal

Gpal = ( {P}, {0,1}, {P → ε, P → 0, P → 1, P → 0P0, P → 1P1}, P )

• Sometimes we group productions with the same head:

P → ε | 0 | 1 | 0P0 | 1P1
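The quadruple G = (V, T, P, S) can be written down directly as data. The following is a minimal sketch, assuming single-character symbols and using the empty string to stand for ε; the dictionary layout and the names `G_PAL` and `is_well_formed` are illustrative choices, not part of the lecture.

```python
# Hypothetical encoding of G = (V, T, P, S); all names are illustrative.
EPSILON = ""  # the empty string stands for the empty body ε

G_PAL = {
    "variables": {"P"},
    "terminals": {"0", "1"},
    # Productions grouped by head, as in P → ε | 0 | 1 | 0P0 | 1P1.
    "productions": {"P": [EPSILON, "0", "1", "0P0", "1P1"]},
    "start": "P",
}


def is_well_formed(grammar):
    """Check the conditions of the formal definition of a CFG."""
    symbols = grammar["variables"] | grammar["terminals"]
    # The start symbol must be a variable.
    if grammar["start"] not in grammar["variables"]:
        return False
    # Every head is a variable; every body is a string over V ∪ T.
    return all(
        head in grammar["variables"] and all(ch in symbols for ch in body)
        for head, bodies in grammar["productions"].items()
        for body in bodies
    )
```

Grouping bodies under their head mirrors the compact notation with "|" and makes it easy to look up every production of a given variable.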

Derivation

• We expand the start symbol using one of its productions (i.e., using a production whose head is the start symbol).

• We further expand the resulting string by replacing one of the variables by the body of one of its productions, and so on, until we derive a string consisting entirely of terminals.

• The language of the grammar is all strings of terminals that we can obtain in this way.

• This use of grammars is called derivation.


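Derivation

• Let G = (V, T, P, S) be a CFG, let A be a variable, let α, β ∈ (V ∪ T)*, and let A → γ be a production in P.
• Then we write αAβ => αγβ and say that αAβ derives αγβ in one step.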

Derivation Sequence

• We may extend the => relation to represent zero, one, or many derivation steps.
• We use =>* to denote "zero or more steps" of a derivation sequence.

Basis:
– For any string α of terminals and variables, we say α =>* α.
– That is, any string derives itself.

Induction:
– If α =>* β and β => γ, then α =>* γ.
– That is, if α can become β in zero or more steps, and one more step takes β to γ, then α can become γ.

Derivation Sequence

In other words, α =>* β means that there is a sequence of strings γ1, γ2, …, γn for some n ≥ 1 such that

1. α = γ1,
2. β = γn, and
3. for i = 1, 2, …, n-1, we have γi => γi+1.
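The relations => and =>* can be explored mechanically. The sketch below uses Gpal with single-character symbols and the empty string for ε; the names `one_step` and `derives` and the length bound `max_len` are assumptions made for this illustration. The search prunes long sentential forms, so it is only a bounded check, not a general decision procedure.

```python
from collections import deque

# Productions of Gpal; "" stands for ε.
PRODUCTIONS = {"P": ["", "0", "1", "0P0", "1P1"]}


def one_step(form, productions=PRODUCTIONS):
    """Return every string β such that form => β."""
    results = set()
    for i, symbol in enumerate(form):
        # Only variables have productions; terminals are skipped.
        for body in productions.get(symbol, []):
            results.add(form[:i] + body + form[i + 1:])
    return results


def derives(alpha, beta, productions=PRODUCTIONS, max_len=None):
    """Bounded breadth-first check of alpha =>* beta.

    Sentential forms longer than max_len are discarded, which keeps the
    search finite. For Gpal a bound of len(beta) + 1 loses nothing.
    """
    if max_len is None:
        max_len = len(beta) + 1
    seen, queue = {alpha}, deque([alpha])
    while queue:
        form = queue.popleft()
        if form == beta:  # basis: zero steps, every string derives itself
            return True
        for nxt in one_step(form, productions):  # induction: one more step
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

For instance, `derives("P", "0101010")` and `derives("0P0", "0P0")` both hold, while `derives("P", "01")` does not, since 01 is not a palindrome.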

Derivation Sequence – Example1

• P => 0P0 => 01P10 => 010P010 => 0101010 is a derivation sequence.
• P derives 0101010; equivalently, 0101010 is derived from P.
• That is, P =>* 0101010, and also
– P =>* 010P010
– 0P0 =>* 010P010
– 0P0 =>* 0P0

Derivation Sequence – Example2

Grammar:
S → ASB | c
A → ε | aA
B → ε | bB

Derivation sequences for acb:
S => ASB => aASB => aSB => acB => acbB => acb
S => ASB => ASbB => ASb => Acb => aAcb => acb
S => ASB => AcB => aAcB => aAcbB => acbB => acb

We may select any nonterminal (variable) of the string for replacement in each derivation step.

Leftmost and Rightmost Derivations

• A leftmost derivation always replaces the leftmost variable by one of its rule bodies. Each such step is written =>lm.

S =>lm ASB =>lm aASB =>lm aSB =>lm acB =>lm acbB =>lm acb

• A rightmost derivation always replaces the rightmost variable by one of its rule bodies. Each such step is written =>rm.

S =>rm ASB =>rm ASbB =>rm ASb =>rm Acb =>rm aAcb =>rm acb
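Whether a given derivation sequence is leftmost or rightmost can be checked step by step. The following sketch assumes the grammar S → ASB | c, A → ε | aA, B → ε | bB with single-character symbols and "" for ε; the function names are hypothetical.

```python
# Grammar of the acb example; "" stands for ε.
PRODUCTIONS = {"S": ["ASB", "c"], "A": ["aA", ""], "B": ["bB", ""]}


def _rewrites_to(before, i, after, productions):
    """True if replacing the variable at position i yields `after`."""
    return any(before[:i] + body + before[i + 1:] == after
               for body in productions[before[i]])


def is_leftmost_step(before, after, productions=PRODUCTIONS):
    """True if before => after rewrites the leftmost variable."""
    for i, symbol in enumerate(before):
        if symbol in productions:
            return _rewrites_to(before, i, after, productions)
    return False  # no variable left to rewrite


def is_rightmost_step(before, after, productions=PRODUCTIONS):
    """True if before => after rewrites the rightmost variable."""
    for i in range(len(before) - 1, -1, -1):
        if before[i] in productions:
            return _rewrites_to(before, i, after, productions)
    return False


def check(sequence, step):
    """Apply `step` to every consecutive pair of sentential forms."""
    return all(step(x, y) for x, y in zip(sequence, sequence[1:]))


LEFTMOST = ["S", "ASB", "aASB", "aSB", "acB", "acbB", "acb"]
RIGHTMOST = ["S", "ASB", "ASbB", "ASb", "Acb", "aAcb", "acb"]
MIXED = ["S", "ASB", "AcB", "aAcB", "aAcbB", "acbB", "acb"]
```

Here `check(LEFTMOST, is_leftmost_step)` and `check(RIGHTMOST, is_rightmost_step)` succeed, while `MIXED` is a valid derivation that is neither leftmost nor rightmost because its second step rewrites the middle variable.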

The Language of a Grammar

• If G = (V, T, P, S) is a CFG, then the language of G is

L(G) = { w ∈ T* : S =>* w }

• That is, L(G) is the set of strings of terminals (strings in T*) that are derivable from S.
• If G is a CFG, we call L(G) a context-free language.
• Ex: L(Gpal) is a context-free language.
• For each CFL there is a CFG, and each CFG generates a CFL.
• Every regular language is a CFL.
• In fact, the regular languages are a proper subset of the context-free languages: Lpal is context-free but not regular.
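For a small grammar, the short strings of L(G) can be listed by exhaustively expanding sentential forms. This is only a sketch: `language_up_to` and its `slack` parameter are names invented here, symbols are single characters, "" stands for ε, and the pruning rule is an assumption that is safe for Gpal (whose sentential forms contain at most one variable) but not for every grammar.

```python
from collections import deque


def language_up_to(productions, start, max_len, slack=1):
    """Terminal strings w with start =>* w and len(w) <= max_len.

    Sentential forms longer than max_len + slack are pruned. This is
    complete only if no derivation of a short string needs a longer
    intermediate form, which holds for Gpal with slack=1.
    """
    words, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        positions = [i for i, s in enumerate(form) if s in productions]
        if not positions:  # no variables left: a string of terminals
            if len(form) <= max_len:
                words.add(form)
            continue
        i = positions[0]  # expanding only the leftmost variable suffices
        for body in productions[form[i]]:
            nxt = form[:i] + body + form[i + 1:]
            if len(nxt) <= max_len + slack and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return words


G_PAL = {"P": ["", "0", "1", "0P0", "1P1"]}
SHORT_WORDS = language_up_to(G_PAL, "P", 3)
```

For Gpal and `max_len=3` this yields the nine strings ε, 0, 1, 00, 11, 000, 010, 101, 111.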

Theorem 5.7 - Proof

Theorem 5.7: L(Gpal) = { w ∈ {0,1}* | w = w^R }

Proof: (⊇ direction) If w ∈ Lpal then w ∈ L(Gpal), i.e., Gpal can generate w.
• Suppose w = w^R.
• We show by induction on |w| that w ∈ L(Gpal).

Basis:
• |w| = 0 or |w| = 1.
• Then w is ε, 0, or 1.
• Since P → ε, P → 0 and P → 1 are productions of Gpal, we can conclude that P =>* w in all base cases.

Theorem 5.7 – Proof (⊇ Direction)

Induction:
• Suppose |w| ≥ 2.
• Since w = w^R, w begins and ends with the same symbol, so w = 0x0 or w = 1x1, where x = x^R.

Case 1:
– If w = 0x0, by the inductive hypothesis we know that P =>* x.
– Then, by the structure of the grammar, P => 0P0 =>* 0x0, where 0x0 = w.

Case 2:
– If w = 1x1, by the inductive hypothesis we know that P =>* x.
– Then, by the structure of the grammar, P => 1P1 =>* 1x1, where 1x1 = w.

Theorem 5.7 – Proof (⊆ Direction)

Proof: (⊆ direction)
• We assume that w ∈ L(Gpal), and we must show that w = w^R.
• Since w ∈ L(Gpal), we have P =>* w.
• We do an induction on the length of the derivation P =>* w.

Basis:
• The derivation P =>* w is done in one step.
• Then w must be ε, 0, or 1, which are all palindromes.

Theorem 5.7 – Proof (⊆ Direction)

Induction:
• Let n ≥ 2, i.e., the derivation takes n steps.
• The derivation must be
– P => 0P0 =>* 0x0 = w, or
– P => 1P1 =>* 1x1 = w
since n ≥ 2, and the productions P → 0P0 and P → 1P1 are the only productions whose use allows additional steps of a derivation.
• Note that in either case, P =>* x in n-1 steps.
• By the inductive hypothesis, we know that x is a palindrome.
• But if so, then 0x0 and 1x1 are also palindromes.
• We conclude that w is a palindrome, which completes the proof.
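The case analysis of the proof translates into a short recursive membership test, which can be compared against the palindrome condition on all short strings. This is a sanity check under stated assumptions, not a replacement for the proof; the names `generated_by_gpal` and `all_strings` and the length bound 8 are choices made for this sketch.

```python
from itertools import product


def generated_by_gpal(w):
    """Decide P =>* w by following the cases used in the proof."""
    if w in ("", "0", "1"):  # one-step derivations: P → ε | 0 | 1
        return True
    # Longer derivations must start with P → 0P0 or P → 1P1.
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "01":
        return generated_by_gpal(w[1:-1])
    return False


def all_strings(max_len, alphabet="01"):
    """Yield every string over `alphabet` of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


# Strings on which "generated by Gpal" and "is a palindrome" disagree.
MISMATCHES = [w for w in all_strings(8)
              if generated_by_gpal(w) != (w == w[::-1])]
```

An empty `MISMATCHES` list is what Theorem 5.7 predicts for every length bound.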

Sentential Forms

• Let G = (V, T, P, S) be a CFG, and α ∈ (V ∪ T)*.
• If S =>* α, we say that α is a sentential form.
• If S =>*lm α, we say that α is a left-sentential form.
• If S =>*rm α, we say that α is a right-sentential form.
• L(G) consists of those sentential forms that are in T*.

Parse Trees

• If w ∈ L(G) for some CFG G, then w has a parse tree, which tells us the (syntactic) structure of w.
• Parse trees are an alternative representation to derivations.
• There can be several parse trees for the same string.
• Ideally there should be only one parse tree (the "true" structure) for each string, i.e., the grammar should be unambiguous.
• Unfortunately, we cannot always remove the ambiguity.
– We may remove the ambiguity for some CFGs.

Constructing Parse Trees

• Let G = (V, T, P, S) be a CFG.

• A tree is a parse tree for G if:
1. Each interior node is labeled by a variable in V.
2. Each leaf is labeled by a symbol in V ∪ T ∪ {ε}. Any ε-labeled leaf is the only child of its parent.
3. If an interior node is labeled A, and its children (from left to right) are labeled X1, X2, …, Xk, then A → X1X2…Xk ∈ P.
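The three conditions can be checked recursively on a tree. The sketch below represents a node as a pair (label, children), uses Gpal with "" standing for the body ε, and writes the ε-leaf with the label "ε"; this representation and the name `is_parse_tree` are assumptions made for illustration.

```python
# Productions and terminals of Gpal; "" stands for the body ε.
PRODUCTIONS = {"P": ["", "0", "1", "0P0", "1P1"]}
TERMINALS = {"0", "1"}


def is_parse_tree(node, productions=PRODUCTIONS, terminals=TERMINALS):
    """Check conditions 1-3 for the tree rooted at `node`."""
    label, children = node
    if not children:
        # Condition 2: a leaf carries a symbol from V ∪ T ∪ {ε}.
        return label in productions or label in terminals or label == "ε"
    if label not in productions:
        return False  # Condition 1: interior nodes are variables.
    labels = [child[0] for child in children]
    if "ε" in labels:
        # An ε-leaf must be the only child; it stands for the empty body.
        body = "" if len(labels) == 1 else None
    else:
        body = "".join(labels)
    # Condition 3: the children spell out the body of a production.
    return (body in productions[label]
            and all(is_parse_tree(c, productions, terminals)
                    for c in children))


TREE_010 = ("P", [("0", []), ("P", [("1", [])]), ("0", [])])
BAD_TREE = ("P", [("0", []), ("1", [])])  # P → 01 is not a production
```

`TREE_010` passes the check, while `BAD_TREE` fails condition 3.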

Parse Tree - Example

Grammar:
S → ASB | c
A → ε | aA
B → ε | bB

A derivation sequence of acb:
S => ASB => aASB => aSB => acB => acbB => acb

Parse tree of acb:

          S
        / | \
      A   S   B
     / \  |   | \
    a   A c   b  B
        |        |
        ε        ε

The Yield of a Parse Tree

• The yield of a parse tree is the string of leaves from left to right.

• The important parse trees are those where:
1. The yield is a terminal string.
2. The root is labeled by the start symbol.
• We shall see that the set of yields of these important parse trees is the language of the grammar.
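Computing the yield is a left-to-right traversal of the leaves. The sketch below encodes the parse tree of acb as nested (label, children) pairs, with "ε" labeling a leaf that contributes the empty string; the representation and the names `leaf`, `ACB_TREE` and `tree_yield` are illustrative assumptions.

```python
def leaf(label):
    """A leaf is a node with no children."""
    return (label, [])


# Parse tree of acb for S → ASB | c, A → ε | aA, B → ε | bB.
ACB_TREE = ("S", [
    ("A", [leaf("a"), ("A", [leaf("ε")])]),
    ("S", [leaf("c")]),
    ("B", [leaf("b"), ("B", [leaf("ε")])]),
])


def tree_yield(node):
    """Concatenate the leaf labels from left to right."""
    label, children = node
    if not children:
        return "" if label == "ε" else label  # ε contributes nothing
    return "".join(tree_yield(child) for child in children)
```

The root of `ACB_TREE` is the start symbol and `tree_yield(ACB_TREE)` is the terminal string acb, so this is one of the important parse trees described above.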

Derivations and Parse Trees

Theorem: If there is a derivation of a string w, we can construct its parse tree.

Proof: Induction on the length of the derivation.

Theorem: If there is a parse tree of a string w, we can construct its derivation (leftmost, rightmost).

Proof: Induction on the height of the parse tree.
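One direction of this correspondence can be made concrete: reading a leftmost derivation off a parse tree. The sketch below is iterative rather than the inductive construction used in the proof, and it assumes nodes are (label, children) pairs, leaves labeled "ε" stand for the empty string, and every leaf is a terminal or ε. The names `render` and `leftmost_derivation` are hypothetical.

```python
def render(nodes):
    """Spell out a list of tree nodes as a sentential form."""
    return "".join("" if label == "ε" else label for label, _ in nodes)


def leftmost_derivation(root):
    """List the sentential forms of the leftmost derivation of a tree."""
    frontier = [root]  # current sentential form, kept as tree nodes
    forms = [render(frontier)]
    while True:
        # Position of the leftmost interior node, if one remains.
        idx = next((i for i, (_, kids) in enumerate(frontier) if kids), None)
        if idx is None:
            return forms
        # One derivation step: replace that node by its children.
        frontier[idx:idx + 1] = frontier[idx][1]
        forms.append(render(frontier))


ACB_TREE = ("S", [
    ("A", [("a", []), ("A", [("ε", [])])]),
    ("S", [("c", [])]),
    ("B", [("b", []), ("B", [("ε", [])])]),
])
DERIVATION = leftmost_derivation(ACB_TREE)
```

For the parse tree of acb this reproduces S, ASB, aASB, aSB, acB, acbB, acb. Scanning the frontier from the right instead would give the rightmost derivation.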

Ambiguity

• A grammar that produces more than one parse tree for some sentence is called an ambiguous grammar.

For example, the sentence id+id*id has two leftmost derivations, each with its own parse tree:

E => E+E => id+E => id+E*E => id+id*E => id+id*id

      E
    / | \
   E  +  E
   |   / | \
  id  E  *  E
      |     |
     id    id

E => E*E => E+E*E => id+E*E => id+id*E => id+id*id

        E
      / | \
     E  *  E
   / | \   |
  E  +  E  id
  |     |
 id    id

Ambiguity – Operator Precedence


Leftmost derivations and Ambiguity

• In general, there can be a single parse tree for a string w but many derivations for w. This alone does not mean that the grammar is ambiguous.
• But if there are several leftmost derivations for a string w, then there are several parse trees for that string (the grammar is ambiguous).
• Several rightmost derivations likewise imply several parse trees.
• For any CFG G, a terminal string w has two distinct parse trees if and only if w has two distinct leftmost derivations from the start symbol.
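Ambiguity of a particular sentence can be detected by counting its parse trees, which by the last statement equals the number of its leftmost derivations. The sketch below assumes the expression grammar E → E+E | E*E | id, inferred from the productions used in the derivations of id+id*id, and takes the sentence as a list of tokens. The function name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E → E+E | E*E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of parse trees deriving tokens[i:j] from E."""
        total = 1 if tokens[i:j] == ("id",) else 0  # E → id
        # Try every operator position k as the root: E → E op E.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


TREES = count_parse_trees(["id", "+", "id", "*", "id"])
```

A count greater than 1 for some sentence shows that the grammar is ambiguous. A count of 1 for one sentence says nothing about the others.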

Inherent Ambiguity

• A CFL L is inherently ambiguous if all grammars for L are ambiguous.


Inherent Ambiguity

• Let's look at parsing the string aabbccdd.

It can be shown that every grammar for L behaves like the one above. The language L is inherently ambiguous.