Chapter 11. Ambiguity of Context-free Grammars


Upload: kims3515354178

Post on 18-Nov-2014


Page 1: Chapter 11. Ambiguity of Context-free Grammars

1

Models of Language Generation: Grammars


11. Ambiguity of Context-free Grammars

Ambiguity in a language occurs either when a symbol or an expression has more than one meaning (e.g., story), or when an expression can be (grammatically) parsed in two different ways. The former is called lexical (or semantic) ambiguity, and the latter syntactic (or structural) ambiguity. For example, in natural language, the sentence "A man entered the room with a picture" can be interpreted (i.e., parsed) with two different grammatical structures, as follows.

[Figure: two parse trees for "A man entered the room with a picture", differing in whether the phrase "with a picture" attaches to "entered" or to "the room".]

This sentence is syntactically ambiguous: with no further information, it is impossible to know which way it should be interpreted. In formal languages, given a grammar G and a sentence x (i.e., a string, in formal language jargon), parsing shows how x can be derived by the grammar. If x can be derived in two different ways, that is, if x has two different parse trees, grammar G is ambiguous. Parsing is one of the main functions of a compiler for a programming language. In this chapter we study syntactic ambiguity.


Dear Dad & Dear Son

Dear Dad,
$chool i$ really great. I am making lot$ of friend$ and $tudying very hard. With all my $tuff, I $imply can't think of anything I need, $o if you would like, you can ju$t $end me a card, a$ I would love to hear from you.
Love,
Your $on

The Reply:
Dear Son,
I kNOw that astroNOmy, ecoNOmics, and oceaNOgraphy are eNOugh to keep even an hoNOr student busy. Do NOt forget that the pursuit of kNOwledge is a NOble task, and you can never study eNOugh.
Love,
Dad
- Adrea -

Break Time

11.1 Parse Tree 290
11.2 Parse Tree and Ambiguity 293
11.3 Eliminating Ambiguity of an Ambiguous CFG 295
    Using parentheses
    Fixing the order of rule applications
    Eliminating redundant rules
    Setting up precedence and associativity
Rumination 305
Exercises 306


11.1 Parse Tree


In formal languages, the syntax (i.e., structure) of a string generated by a grammar depends on the rules applied as well as the order in which they are applied. Syntax provides critical information for the compiler to translate the string (i.e., a program) into object code. Thus, if a string can be derived in two different ways, it is impossible to give it a unique translation.

A parse tree is an efficient data structure for representing the syntax of a string derived by the grammar and for translating it in the compiler. For example, figure (b) below shows a parse tree for the string p∨q∧r generated by grammar G in figure (a).

(a) G: S → S∨S | S∧S | ¬S | A
       A → p | q | r

(b) [Figure: parse tree for p∨q∧r; the root applies S → S∨S, and its right child applies S → S∧S.]


Given a derivation (i.e., a sequence of rules applied) for a string w, the parse tree for w with respect to the derivation is constructed as follows. First, create the root node, labeled with the start symbol S. Then, for each leaf node of the current tree labeled with a nonterminal, say A, recursively expand the tree as follows. Suppose that A → α is the rule applied next to derive w. For each symbol X appearing in α, in order from left to right, add a child node labeled X to A. This procedure repeats until the tree has no leaf node labeled with a nonterminal left to expand. Reading all the leaf nodes of the final tree from left to right gives the string w. This string of terminal symbols is called the yield of the parse tree.
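The construction above can be sketched in Python. This is a minimal sketch under stated assumptions: a derivation is given as a list of (nonterminal, right side) pairs applied in leftmost order, every symbol is one character, the ASCII characters |, & and ~ stand in for ∨, ∧ and ¬, and `NONTERMINALS`, `Node`, `build_tree` and `tree_yield` are illustrative names, not taken from the text.

```python
from dataclasses import dataclass, field

# Assumed: the nonterminals of grammar G; every other character is a terminal.
NONTERMINALS = {"S", "A"}


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def build_tree(derivation, start="S"):
    """Build the parse tree for a leftmost derivation.

    derivation: list of (lhs, rhs) pairs, rhs being a string of symbols.
    """
    root = Node(start)
    frontier = [root]  # unexpanded nonterminal leaves, leftmost first
    for lhs, rhs in derivation:
        if not frontier:
            raise ValueError(f"no nonterminal left to expand with {lhs} -> {rhs}")
        node = frontier.pop(0)
        if node.label != lhs:
            raise ValueError(f"{lhs} -> {rhs} cannot expand leftmost {node.label}")
        # one child per symbol of the right side, left to right
        node.children = [Node(sym) for sym in rhs]
        # the new nonterminal children are now the leftmost unexpanded leaves
        frontier = [c for c in node.children if c.label in NONTERMINALS] + frontier
    if frontier:
        raise ValueError("derivation stops before all nonterminals are expanded")
    return root


def tree_yield(node):
    """Read the terminal leaves left to right."""
    if node.label not in NONTERMINALS:
        return node.label
    return "".join(tree_yield(c) for c in node.children)


# Leftmost derivation of p|q&r corresponding to parse tree (b)
steps = [("S", "S|S"), ("S", "A"), ("A", "p"), ("S", "S&S"),
         ("S", "A"), ("A", "q"), ("S", "A"), ("A", "r")]
print(tree_yield(build_tree(steps)))  # p|q&r
```

Keeping the unexpanded nonterminal leaves in a left-to-right list is what ties each rule in the derivation to the correct node.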


In general, given source code in a programming environment, the compiler constructs a parse tree of the code and then, traversing the tree bottom up, left to right, generates object code (a machine language program). For example, suppose that for the string p∨q∧r the compiler has generated the parse tree in figure (b).

The compiler generates machine language instructions that access the values of variables q and r, compute q∧r and store the result (usually in a register), access the value of p, and finally execute the OR operation with the stored result of q∧r.

Because the object code is generated by traversing the tree bottom up, left to right, the order of the logical operations depends on the tree. In this example, the compiler generates object code that evaluates the logical expression p∨q∧r in the order p∨(q∧r).
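The bottom-up, left-to-right traversal amounts to a postorder walk. The sketch below assumes a simplified expression tree (nested tuples rather than the full parse tree with S and A nodes) and a hypothetical stack machine with LOAD, AND and OR instructions; neither comes from the text. Unlike the prose, it loads p first, because it visits children strictly left to right, but AND is still executed before OR.

```python
def gen_code(tree):
    """Emit instructions for a hypothetical stack machine by visiting
    the tree bottom up, left to right (postorder).

    tree: a variable name (str), or a tuple (operator, operand, ...).
    """
    code = []

    def visit(t):
        if isinstance(t, str):
            code.append(f"LOAD {t}")  # push the value of a variable
            return
        op, *operands = t
        for sub in operands:          # children first, left to right
            visit(sub)
        code.append(op)               # then the operator at this node

    visit(tree)
    return code


# p OR (q AND r), the reading given by parse tree (b)
print(gen_code(("OR", "p", ("AND", "q", "r"))))
# ['LOAD p', 'LOAD q', 'LOAD r', 'AND', 'OR']
```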

[Figure: (a) grammar G; (b) parse tree for p∨q∧r with S → S∨S at the root and S → S∧S in its right subtree.]


(a) G: S → S∨S | S∧S | ¬S | A
       A → p | q | r

[Figure: (b) parse tree for p∨q∧r with S → S∨S at the root, read as p∨(q∧r); (c) parse tree for p∨q∧r with S → S∧S at the root, read as (p∨q)∧r.]

11.2 Parse Tree and Ambiguity


Figures (b) and (c) show two different parse trees yielding the same string p∨q∧r; in other words, grammar G in figure (a) can derive this string in two ways. Consequently, the expression p∨q∧r can be evaluated in two different ways, p∨(q∧r) and (p∨q)∧r. This implies that, for grammar G, the precedence between the two logical operators ∨ (OR) and ∧ (AND) is ambiguous.
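The two readings can be enumerated mechanically. This sketch assumes single-character tokens, with | for ∨ and & for ∧, and for brevity covers only the rules S → S∨S | S∧S | A (the ¬S rule is omitted); `readings` is an illustrative name.

```python
from functools import lru_cache


def readings(expr):
    """Return every fully parenthesized reading of expr under the
    ambiguous rules S -> S|S | S&S | A, A -> p | q | r."""

    @lru_cache(maxsize=None)
    def trees(i, j):
        # all parse trees (as strings) whose yield is expr[i:j]
        if j - i == 1:
            return (expr[i],) if expr[i] in "pqr" else ()
        out = []
        for k in range(i + 1, j - 1):
            if expr[k] in "|&":  # S -> S op S, with op at position k
                for left in trees(i, k):
                    for right in trees(k + 1, j):
                        out.append(f"({left} {expr[k]} {right})")
        return tuple(out)

    return list(trees(0, len(expr)))


print(readings("p|q&r"))  # ['(p | (q & r))', '((p | q) & r)']
```

Each reading corresponds to one parse tree, so a result with more than one entry is a witness of ambiguity for that string.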


As we saw in the previous example, the existence of two parse trees yielding the same string is a problematic property of a CFG, called ambiguity, which should be eliminated. In real applications, we cannot expect a correct result from a program written in a language defined by an ambiguous grammar.

Definition (Ambiguity): A CFG G is ambiguous if there is a string x ∈ L(G) for which there are two different parse trees yielding x.

Unfortunately, it is an unsolvable problem to decide whether an arbitrary CFG is ambiguous. Also, there is no algorithm that, given an arbitrary ambiguous CFG, converts it to an unambiguous grammar. However, for certain restricted constructs, these problems can be solved. In the following section we present several techniques, with examples.
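Because ambiguity is undecidable in general, a program can only search for a witness. The sketch below counts parse trees with a memoized split over substrings and tries every string up to a length bound; it assumes no ε-rules and no cycles of unit rules, and `count_trees` and `find_ambiguous` are hypothetical names. A result of None says nothing about longer strings. Run on grammar G (with |, &, ~ for ∨, ∧, ¬), it reports ~p|p, which can be read as ¬(p∨p) or as (¬p)∨p.

```python
from functools import lru_cache
from itertools import product


def count_trees(grammar, start, s):
    """Count parse trees for s. grammar maps each nonterminal to a list of
    right sides (tuples of symbols); any other symbol is a terminal.
    Assumes no epsilon-rules and no cycles of unit rules (A =>+ A)."""

    @lru_cache(maxsize=None)
    def seq(symbols, i, j):
        # number of ways the symbol sequence derives s[i:j]
        if not symbols:
            return 1 if i == j else 0
        first, rest = symbols[0], symbols[1:]
        # each symbol covers at least one character (no epsilon-rules)
        return sum(sym(first, i, k) * seq(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    @lru_cache(maxsize=None)
    def sym(x, i, j):
        if x not in grammar:  # terminal: must match exactly one character
            return 1 if j == i + 1 and s[i] == x else 0
        return sum(seq(tuple(rhs), i, j) for rhs in grammar[x])

    return sym(start, 0, len(s))


def find_ambiguous(grammar, start, alphabet, max_len):
    """Return the first string (by length, then alphabet order) with two or
    more parse trees, with its count, or None if none exists up to max_len."""
    for n in range(1, max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            count = count_trees(grammar, start, s)
            if count >= 2:
                return s, count
    return None


G = {
    "S": [("S", "|", "S"), ("S", "&", "S"), ("~", "S"), ("A",)],
    "A": [("p",), ("q",), ("r",)],
}
print(find_ambiguous(G, "S", "pqr|&~", 5))  # ('~p|p', 2)
```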


11.3 Eliminating Ambiguity of a CFG

(1) Binding with parentheses. Example: We know that the CFG G1 below is ambiguous because there are two parse trees yielding the same string p∨q∧r. The ambiguity occurs because G1 can generate this string by applying S → S∨S followed by S → S∧S, or vice versa, as shown in figures (a) and (b), respectively.

G1: S → S∨S | S∧S | ¬S | A
    A → p | q | r

[Figure: (a) parse tree for p∨q∧r with S → S∨S at the root; (b) parse tree for p∨q∧r with S → S∧S at the root.]


Ambiguous G1: S → S∨S | S∧S | ¬S | A
              A → p | q | r

Unambiguous G2: S → (S∨S) | (S∧S) | ¬S | A
                A → p | q | r

[Figure: parse trees under G2 with yields (a) (p∨(q∧r)) and (b) ((p∨q)∧r).]

This ambiguity can be eliminated by parenthesizing the right sides of those two rules, as in G2. The parentheses make the yields of the two parse trees different, as shown in figures (a) and (b).
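Why the parentheses help can be seen in a recursive-descent parser for G2: every choice is forced by the next character, so no string can have two parse trees. A minimal sketch, assuming | for ∨, & for ∧, ~ for ¬ and no spaces; `parse_g2` is an illustrative name.

```python
def parse_g2(text):
    """Parse a string of G2 (S -> (S|S) | (S&S) | ~S | A, A -> p | q | r)
    and return its single parse tree as nested tuples."""
    pos = 0

    def next_char():
        nonlocal pos
        if pos >= len(text):
            raise SyntaxError("unexpected end of input")
        ch = text[pos]
        pos += 1
        return ch

    def S():
        ch = next_char()
        if ch == "(":                  # S -> (S op S)
            left = S()
            op = next_char()
            if op not in "|&":
                raise SyntaxError(f"expected | or & at {pos - 1}")
            right = S()
            if next_char() != ")":
                raise SyntaxError(f"expected ) at {pos - 1}")
            return (op, left, right)
        if ch == "~":                  # S -> ~S
            return ("~", S())
        if ch in "pqr":                # S -> A, A -> p | q | r
            return ch
        raise SyntaxError(f"unexpected {ch!r} at {pos - 1}")

    tree = S()
    if pos != len(text):
        raise SyntaxError(f"trailing input at {pos}")
    return tree


print(parse_g2("(p|(q&r))"))  # ('|', 'p', ('&', 'q', 'r'))
```

The unparenthesized string p|q&r is not in L(G2) at all, so the parser rejects it; this is the change of language that the next paragraph describes as the drawback of the technique.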


The parenthesizing technique is simple, but it has a serious drawback: it alters the language by adding new terminal symbols, the parentheses. Nevertheless, it is a popular technique in programming languages, which often use other delimiters instead of parentheses, for example, "begin" and "end" in Pascal, and the braces '{' and '}' in C and C++.


(2) Fixing the order of applying rules.

Example 1. The language generated by CFG G3 below is {b^i c b^j | i, j ≥ 0}. This grammar is ambiguous because, for example, the string bcb can be derived by generating the left b first and then the right b, or vice versa, as shown below.

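[Figure: two parse trees for bcb under G3; one applies S → bS first and then S → Sb, the other applies S → Sb first and then S → bS.]

G3: S → bS | Sb | c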


Ambiguous G3: S → bS | Sb | c

Unambiguous G4: S → bS | A
                A → Ab | c

We can simply modify grammar G3 into G4 so that the b's on the left, if any, are always generated first. Figure (b) shows the only parse tree for the string bcb in G4. Grammar G4 is unambiguous.

[Figure: (a) the two parse trees for bcb under G3; (b) the only parse tree for bcb under G4, built by S → bS, S → A, A → Ab, and A → c.]
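The argument for G4 can be made concrete: for a string b^i c b^j, every rule choice in a leftmost derivation is forced, so there is only one derivation and hence one parse tree. The sketch below, with the illustrative name `derive_g4`, constructs that derivation; it illustrates the forced order rather than proving uniqueness.

```python
def derive_g4(i, j):
    """Leftmost derivation of b^i c b^j in G4 (S -> bS | A, A -> Ab | c),
    returned as the list of sentential forms.

    Each step is forced: while left b's remain only S -> bS fits, then
    S -> A, then A -> Ab once per right b, and finally A -> c.
    """
    form = "S"
    forms = [form]

    def apply(lhs, rhs):
        nonlocal form
        form = form.replace(lhs, rhs, 1)  # each form has exactly one nonterminal
        forms.append(form)

    for _ in range(i):
        apply("S", "bS")   # left b's first
    apply("S", "A")
    for _ in range(j):
        apply("A", "Ab")   # then right b's
    apply("A", "c")
    return forms


print(" => ".join(derive_g4(1, 1)))  # S => bS => bA => bAb => bcb
```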


Example 2. Using the technique of fixing the order of derivation, the ambiguous CFG G1 that we have examined can be converted to an unambiguous grammar G5 shown in figure (a). Notice that this grammar generates the operators left to right in the order they appear in the string.

Ambiguous G1: S → S∨S | S∧S | ¬S | A
              A → p | q | r

(a) Unambiguous G5: S → A∨S | A∧S | ¬S | A
                    A → p | q | r

[Figure (b): a parse tree under G5 for an expression over the variables p, q, q, p.]
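In G5 the next character always determines which rule applies, so a recursive-descent parser never has to guess. A minimal sketch, assuming | for ∨, & for ∧ and ~ for ¬; `parse_g5` is an illustrative name. Note that G5 removes the ambiguity but does not give ∧ a higher precedence than ∨: operators are simply grouped from the right, so p∧q∨r is read as p∧(q∨r). Precedence is the subject of technique (4).

```python
def parse_g5(text):
    """Recursive-descent parser for G5, returning a fully parenthesized string.
    G5: S -> A|S | A&S | ~S | A,  A -> p | q | r
    """
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def take(allowed):
        nonlocal pos
        ch = peek()
        if ch is None or ch not in allowed:
            raise SyntaxError(f"expected one of {allowed!r} at {pos}, got {ch!r}")
        pos += 1
        return ch

    def S():
        if peek() == "~":             # S -> ~S
            take("~")
            return f"~{S()}"
        a = take("pqr")               # the A in S -> A|S, A&S, or A
        if peek() in ("|", "&"):      # S -> A|S or S -> A&S
            op = take("|&")
            return f"({a} {op} {S()})"
        return a                      # S -> A

    result = S()
    if pos != len(text):
        raise SyntaxError(f"unexpected {peek()!r} at {pos}")
    return result


print(parse_g5("p|q&r"))  # (p | (q & r))
print(parse_g5("p&q|r"))  # (p & (q | r))
```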


(3) Eliminating redundant rules

The CFG G8 below is ambiguous because it can generate the empty string ε in two ways, through B or through D. Applying the technique for minimizing the number of ε-production rules, we can convert it to an unambiguous grammar G9.

Ambiguous G8: S → B | D
              B → bBc | ε
              D → dDe | ε

Unambiguous G9: S → B | D | ε
                B → bBc | bc
                D → dDe | de

The CFG G6 below is ambiguous because it can generate ab either through B or through D. We can simply delete one of the two rules that produce ab and make the grammar unambiguous (see G7).

Ambiguous G6: S → B | D
              B → ab | b
              D → ab | d

Unambiguous G7: S → B | D
                B → ab | b
                D → d
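Whether deleting a rule changed the language can be checked directly for these small grammars. This sketch, with the hypothetical name `yield_counts`, lists each string of G6 and G7 together with its number of parse trees; it works only for non-recursive grammars such as G6 and G7, not for G8 or G9.

```python
from collections import Counter
from itertools import product


def yield_counts(grammar, symbol):
    """Map each terminal string derivable from `symbol` to its number of
    parse trees. Works only for non-recursive grammars (finite languages)."""
    if symbol not in grammar:                 # terminal symbol
        return Counter({symbol: 1})
    result = Counter()
    for rhs in grammar[symbol]:
        parts = [yield_counts(grammar, s).items() for s in rhs]
        for combo in product(*parts):         # one subtree choice per symbol
            word = "".join(w for w, _ in combo)
            trees = 1
            for _, n in combo:
                trees *= n
            result[word] += trees
    return result


G6 = {"S": [("B",), ("D",)], "B": [("a", "b"), ("b",)], "D": [("a", "b"), ("d",)]}
G7 = {"S": [("B",), ("D",)], "B": [("a", "b"), ("b",)], "D": [("d",)]}
print(yield_counts(G6, "S"))  # ab has two parse trees
print(yield_counts(G7, "S"))  # same strings, one parse tree each
```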


(4) Implementing operator precedence and associativity

Operator precedence and associativity are important rules for evaluating mathematical expressions. In programming languages, the rules are defined by the grammar. As we know, multiplication (*) and division (/) are given higher precedence than addition (+) and subtraction (-). The assignment operator (=) is given the lowest. Operator = is right associative, and all the others are left associative. According to this order of precedence and associativity, the mathematical expression in figure (a) will be evaluated as shown in figure (b).

(a) a = b + c * d – e / f

(b) a = ((b + (c * d)) – (e / f))

In (b), the operators =, +, *, –, and /, in the order they appear, are evaluated 5th, 3rd, 1st, 4th, and 2nd, respectively.
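The grouping in (b) can be reproduced with precedence climbing, a standard parsing technique that differs from the grammar-based method of this section but encodes the same precedence and associativity rules in a table. A minimal sketch, assuming space-separated tokens, single-identifier operands, ASCII - for the minus sign, and the illustrative names `PREC`, `RIGHT_ASSOC` and `parenthesize`.

```python
# Assumed precedence table: a larger number binds tighter.
PREC = {"=": 1, "+": 2, "-": 2, "*": 3, "/": 3}
RIGHT_ASSOC = {"="}  # every other operator is left associative


def parenthesize(tokens):
    """Fully parenthesize an infix expression by precedence climbing."""
    pos = 0

    def parse(min_prec):
        nonlocal pos
        left = tokens[pos]  # an operand (identifier)
        pos += 1
        while pos < len(tokens) and PREC[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            # A right-associative operator lets an operator of equal
            # precedence bind on its right; a left-associative one does not.
            right = parse(PREC[op] if op in RIGHT_ASSOC else PREC[op] + 1)
            left = f"({left} {op} {right})"
        return left

    return parse(1)


print(parenthesize("a = b + c * d - e / f".split()))
# (a = ((b + (c * d)) - (e / f)))
```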


(a) Ambiguous G1: S → S∨S | S∧S | ¬S | A
                  A → p | q | r

    Unambiguous G10: S → D∨S | D
                     D → C∧D | C
                     C → ¬C | A
                     A → p | q | r

(b) [Figure: parse tree under G10 for ¬p∨q∨r∧p.]

Example. Assume that the logical operators ¬, ∧, and ∨ are right associative, with precedence in that order (i.e., ¬ is at the top, followed by ∧, with ∨ at the bottom). The ambiguous CFG G1 (repeated above) can be modified into an unambiguous grammar G10 by building the precedence and associativity into the production rules.

Notice that every OR (∨) operator in the string is generated by S → D∨S before any other operator. Then AND (∧) operators, if any, are generated by D → C∧D, and finally NOT (¬) operators by C → ¬C. Also notice that from D there is no way to derive ∨, and from C neither ∨ nor ∧ can be derived.
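Because each level of G10 can produce only its own operator, G10 maps directly onto a recursive-descent parser with one function per nonterminal. A minimal sketch, assuming | for ∨, & for ∧ and ~ for ¬; `parse_g10` is an illustrative name. The right-recursive rules S → D∨S and D → C∧D become calls that recurse on the right, which yields right associativity.

```python
def parse_g10(text):
    """Recursive-descent parser for G10, returning a fully parenthesized string.
    G10: S -> D|S | D,  D -> C&D | C,  C -> ~C | A,  A -> p | q | r
    """
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def take(allowed):
        nonlocal pos
        ch = peek()
        if ch is None or ch not in allowed:
            raise SyntaxError(f"expected one of {allowed!r} at {pos}, got {ch!r}")
        pos += 1
        return ch

    def S():                       # S -> D|S | D
        left = D()
        if peek() == "|":
            take("|")
            return f"({left} | {S()})"
        return left

    def D():                       # D -> C&D | C  (D cannot produce |)
        left = C()
        if peek() == "&":
            take("&")
            return f"({left} & {D()})"
        return left

    def C():                       # C -> ~C | A  (C produces neither | nor &)
        if peek() == "~":
            take("~")
            return f"(~{C()})"
        return take("pqr")         # A -> p | q | r

    result = S()
    if pos != len(text):
        raise SyntaxError(f"unexpected {peek()!r} at {pos}")
    return result


print(parse_g10("~p|q|r&p"))  # ((~p) | (q | (r & p)))
```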


This fixed order of rule applications makes it easy for the compiler to generate object code that evaluates the string (i.e., a logical expression) according to the operator precedence and associativity.

For example, because NOT (¬) operators are generated last, they appear in the lowest subtrees of the parse tree, next to the leaves. Hence the compiler, traversing the tree bottom up, left to right, generates the instructions that execute the NOT (¬) operators before the instructions that execute the other operators. Similarly, we can see how the compiler generates the instructions that execute AND (∧) operators before those for OR (∨) operators.


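(a) ¬p ∨ q ∨ r ∧ p is evaluated as ((¬p) ∨ (q ∨ (r ∧ p))); the operators ¬, ∨, ∨, and ∧, in the order they appear, are executed 1st, 4th, 3rd, and 2nd, respectively.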


Now, for associativity, notice that identical operators are generated left to right in the order they appear in the string. For example, all OR (∨) operators are derived by applying S → D∨S recursively. Thus the compiler, traversing the parse tree bottom up, left to right, generates object code that evaluates the logical expression with right associativity.

If we want to implement left associativity for an operator of the language, we can simply reverse the right side of the rule that derives it. For example, using S → S∨D | D instead of S → D∨S | D makes the operator ∨ left associative.
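The left-recursive rule S → S∨D cannot be coded as a recursive call that starts with S, so recursive-descent parsers usually turn it into a loop that folds each new operand into the tree built so far. A minimal sketch, assuming | for ∨ and, to stay short, that D derives only a single variable p, q or r (AND and NOT omitted); `parse_left_or` is an illustrative name.

```python
def parse_left_or(text):
    """Left-associative OR: S -> S|D | D, with D simplified to p, q, or r.

    S -> S|D is read as D ('|' D)*: each new operand is attached to the
    tree built so far, which groups the operators from the left."""

    def variable(i):
        if i >= len(text) or text[i] not in "pqr":
            raise SyntaxError(f"expected p, q, or r at {i}")
        return text[i]

    tree = variable(0)                           # S -> D
    i = 1
    while i < len(text):
        if text[i] != "|":
            raise SyntaxError(f"expected | at {i}")
        tree = f"({tree} | {variable(i + 1)})"   # S -> S|D
        i += 2
    return tree


print(parse_left_or("p|q|r"))  # ((p | q) | r)
```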


We have seen that a simple CFG can be analyzed to decide whether it is ambiguous and, if it is, converted to an unambiguous one. However, as mentioned before, most problems concerning the ambiguity of CFGs in general are very difficult or even unsolvable. Here are some interesting facts.

Rumination (1): ambiguity


• There are some languages that can only be generated by ambiguous context-free grammars. Such languages are called inherently ambiguous. The following language L is an example.

L = {a^n b^n c^m | m, n ≥ 1} ∪ {a^n b^m c^m | m, n ≥ 1}

• It is an unsolvable problem to tell whether an arbitrary CFG is ambiguous or not.

• It is an unsolvable problem to convert an arbitrary ambiguous CFG, whose language is not inherently ambiguous, to an unambiguous CFG.
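The source of the trouble in L can be seen concretely. Take the natural grammar for L formed as the union of one grammar per part (for example S → S1 | S2, where S1 and S2 are hypothetical nonterminals generating the two parts). Any string a^n b^n c^n lies in both parts, so it has one parse tree through S1 and another through S2. The sketch below, with the illustrative name `halves`, only checks membership in the two parts; it does not prove the much stronger fact that every CFG for L is ambiguous.

```python
import re


def halves(s):
    """Return which parts of L contain s:
    part 1 = {a^n b^n c^m | m, n >= 1}, part 2 = {a^n b^m c^m | m, n >= 1}."""
    match = re.fullmatch(r"(a+)(b+)(c+)", s)
    if match is None:
        return set()
    na, nb, nc = (len(g) for g in match.groups())
    parts = set()
    if na == nb:
        parts.add(1)
    if nb == nc:
        parts.add(2)
    return parts


for w in ["aabbc", "abbcc", "aabbcc"]:
    print(w, halves(w))  # only aabbcc lies in both parts
```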

Today’s Quote

Be slow in choosing a friend, slower in changing. - Benjamin Franklin -

Break Time


Exercises

11.2 (a) Show that the following CFG is ambiguous.

G: S → S + S | S * S | T
   T → a | b

(b) In the CFG G above, let + and * be the addition and multiplication operators, respectively, and a and b be integer variables. Convert G to a CFG G’ that satisfies the following conditions.

(i) L(G) = L(G’)
(ii) Operator * has higher precedence than operator +.
(iii) Both operators are left associative.

You should also present a parse tree showing that your grammar meets the required order of operator precedence and associativity.

11.1 Show that each of the following CFGs is ambiguous, and convert it to an unambiguous CFG.

(a) S → Sa | A | a
    A → a | bA

(b) S → AB | ε
    A → a | ε
    B → b | ε


11.3 The syntax flow graph below defines a simplified <If-statement> of the Pascal programming language. Following the convention, symbols in a circle or oval are terminals, and the words in rectangles correspond to nonterminal symbols.

(a) Transform the flow graph to a CFG G and show that G is ambiguous.

(b) Convert grammar G to an unambiguous CFG and explain in detail how you got your answer.

[Figure: syntax flow graphs for <IF-statement>, with IF ( <bool> ) then <statement> and else <statement>; for <bool>, with c and d; and for <statement>, with <IF-statement>, a + b, and a - b.]

This problem is concerned with the ambiguity of the if-statement, which appears often in the text. Question (a) is easy, but question (b) is challenging.
