Ambiguity, LL(1) Grammars and Table-driven Parsing

Problems with Grammars

• Not all grammars are usable!
– Ambiguous
– Unproductive non-terminals
– Unreachable rules

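Ambiguous Grammar

Φ = { E → D | ( E ) | E + E | E - E | E * E | E / E ,
      D → 0 | 1 | … | 9 }

G is ambiguous if there exists a sentence S in L(G) such that there are two different parse trees for S.

[Figure: two parse trees for 1 + 2 * 3. In the first, the root is E + E and the right operand expands to E * E, giving 1 + (2 * 3). In the second, the root is E * E and the left operand expands to E + E, giving (1 + 2) * 3.]

Multiple meanings:
– Precedence: (1+2)*3 ≠ 1+(2*3)
– Associativity: (1-2)-3 ≠ 1-(2-3)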
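The definition of ambiguity can be checked by brute force on short inputs. The sketch below is only an illustration: it covers the rules E → D and E → E op E of the grammar above (parentheses are left out), and the helper name `parses` is hypothetical. It enumerates every way of splitting the token list at an operator, which corresponds to every parse tree.

```python
import operator

# Binary operators of the ambiguous grammar E -> D | E op E.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def parses(tokens):
    """Return (parenthesized text, value) for every parse tree of tokens."""
    if len(tokens) == 1:
        # E -> D: a single digit is the only one-token sentence.
        return [(tokens[0], int(tokens[0]))] if tokens[0].isdigit() else []
    results = []
    # E -> E op E: any operator position may be the root of the tree.
    for i in range(1, len(tokens) - 1):
        if tokens[i] in OPS:
            for ltext, lval in parses(tokens[:i]):
                for rtext, rval in parses(tokens[i + 1:]):
                    text = f"({ltext}{tokens[i]}{rtext})"
                    results.append((text, OPS[tokens[i]](lval, rval)))
    return results

for text, value in parses(list("1+2*3")):
    print(text, "=", value)
```

Two results for one sentence means two parse trees, so the grammar is ambiguous; here the two trees also evaluate to different values.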

Fixing Precedence Ambiguity

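Original grammar:

Φ = { E → D | ( E ) | E + E | E - E | E * E | E / E ,
      D → 0 | 1 | … | 9 }

With one non-terminal per precedence level:

E → T | E + T | E - T
T → F | T * F | T / F
F → D | ( E )
D → 0 | 1 | … | 9

[Figure: parse tree for 1 + 2 * 3 under the new grammar. The root is E + T, and the * node appears lower in the tree, inside the T subtree.]

Observe:
– Operators lower in the parse tree are executed first
– Operators executed first have higher precedence

Fix: Introduce a new non-terminal symbol for each precedence level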
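One way to see the precedence levels at work is a small recursive-descent evaluator with one function per non-terminal (E, T, F). This is a sketch under assumptions not in the slides: the left-recursive rules are implemented as loops, division is integer division, tokens are single characters, and the name `evaluate` is hypothetical.

```python
def evaluate(text):
    """Evaluate text using E -> T | E+T | E-T, T -> F | T*F | T/F, F -> D | (E)."""
    tokens = list(text.replace(" ", ""))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def expr():    # E level: lowest precedence, so it sits highest in the tree
        value = term()
        while peek() in ("+", "-"):
            value = value + term() if take() == "+" else value - term()
        return value

    def term():    # T level: * and / bind tighter than + and -
        value = factor()
        while peek() in ("*", "/"):
            value = value * factor() if take() == "*" else value // factor()
        return value

    def factor():  # F level: a digit or a parenthesized expression
        tok = take()
        if tok == "(":
            value = expr()
            if take() != ")":
                raise SyntaxError("expected )")
            return value
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return result

print(evaluate("1+2*3"), evaluate("(1+2)*3"))
```

Because `term` is called from inside `expr`, every multiplication is finished before the surrounding addition, which mirrors "operators lower in the parse tree are executed first".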

Adding the Power Operator

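Before:

E → T | E + T | E - T
T → F | T * F | T / F
F → D | ( E )
D → 0 | 1 | … | 9

With a new precedence level P for the power operator (written ^ here):

E → T | E + T | E - T
T → P | T * P | T / P
P → F | F ^ P
F → D | ( E )
D → 0 | 1 | … | 9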

Fixing Associative Ambiguity

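Left recursion / Left associativity

E → D | E - D

[Figure: parse tree for 3 - 2 - 1. The tree grows down the left side, so the sentence groups as (3 - 2) - 1.]

Right recursion / Right associativity

E → D | D ^ E

[Figure: parse tree for 2 ^ 3 ^ 2. The tree grows down the right side, so the sentence groups as 2 ^ (3 ^ 2).]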
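The effect of recursion direction on grouping can be sketched as a left fold versus a right fold over the operands. The function names are hypothetical, and the choice of subtraction for the left-recursive rule and exponentiation for the right-recursive rule is an assumption made for illustration.

```python
import operator
from functools import reduce

def eval_left(operands, op):
    """Left recursion E -> D | E op D: group from the left, ((a op b) op c)."""
    return reduce(op, operands)

def eval_right(operands, op):
    """Right recursion E -> D | D op E: group from the right, (a op (b op c))."""
    head, *rest = operands
    return head if not rest else op(head, eval_right(rest, op))

print(eval_left([3, 2, 1], operator.sub))   # (3 - 2) - 1
print(eval_right([2, 3, 2], operator.pow))  # 2 ** (3 ** 2)
```

Swapping the two functions changes the results (3 - (2 - 1) and (2 ** 3) ** 2), which is why the grammar has to fix the recursion direction for each operator.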

Unreachable Rules

Φ = { S → aABb ,
      A → a | aA ,
      B → b | bBD ,
      C → cD ,
      D → e }

1. Initialize the set of reachable non-terminals R with the start symbol.

2. For each round, if R includes the lhs of a production rule, add the non-terminals in the rhs to R.

3. Loop on #2 until there are no changes to R.

4. The unreachable rules are those whose lhs is a non-terminal in VN minus R.

Initialize: R = {S}

Round 1: R = {S, A, B}

Round 2: R = {S, A, B, D}

Round 3: R = {S, A, B, D}

Done: no change: VN – {S, A, B, D} = {C}

Least fixed-point algorithm
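The four steps can be sketched as a fixed-point loop. The representation is an assumption: the grammar is a dict from each non-terminal to its right-hand sides, every symbol is a single character, and the names `reachable` and `RULES` are hypothetical. The loop updates R in place, so it may converge in fewer passes than the round-by-round trace, but it reaches the same set.

```python
def reachable(rules, start):
    """Return the set R of non-terminals reachable from the start symbol."""
    reached = {start}                      # step 1: R = {start}
    changed = True
    while changed:                         # step 3: loop until R is stable
        changed = False
        for lhs, alternatives in rules.items():
            if lhs not in reached:
                continue
            for rhs in alternatives:       # step 2: add rhs non-terminals
                for symbol in rhs:
                    if symbol in rules and symbol not in reached:
                        reached.add(symbol)
                        changed = True
    return reached

RULES = {"S": ["aABb"], "A": ["a", "aA"], "B": ["b", "bBD"],
         "C": ["cD"], "D": ["e"]}

r = reachable(RULES, "S")
print(sorted(r), "unreachable:", sorted(set(RULES) - r))  # step 4: VN - R
```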

Unproductive Non-terminals

Φ = { S → aABb ,
      A → bC ,
      B → d | dB ,
      C → eC }

1. Start with the set of terminals T.

2. For each round, if T covers a rhs of a production rule, add the lhs to T.

3. Loop on #2 until there are no changes to T.

4. The alphabet of terminals and non-terminals, V, minus T is the set of unproductive non-terminals.

Least fixed-point algorithm

Initialize: T = {a, b, d, e}

Round 1: T = {a, b, d, e, B}

Round 2: T = {a, b, d, e, B}

Done: no change: {a, b, d, e, A, B, C, S} – T = {A, C, S}

C never produces a string made up only of terminals:
C ⇒ eC ⇒ eeC ⇒ … ⇒ eⁿC

A is also unproductive because it always produces C:
A ⇒ bC ⇒ beC ⇒ … ⇒ beⁿC

S is also unproductive because it always produces A:
S ⇒ aABb ⇒ aAbb ⇒ abCbb ⇒ …
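The same fixed-point idea can be sketched for productivity. The assumptions are a dict-of-strings grammar with single-character symbols, and the names `productive` and `RULES` are hypothetical.

```python
def productive(rules, terminals):
    """Return T: the terminals plus every non-terminal that can derive a terminal string."""
    covered = set(terminals)               # step 1: start with the terminals
    changed = True
    while changed:                         # step 3: loop until T is stable
        changed = False
        for lhs, alternatives in rules.items():
            if lhs in covered:
                continue
            # step 2: T covers a rhs if every symbol of that rhs is already in T
            if any(all(symbol in covered for symbol in rhs) for rhs in alternatives):
                covered.add(lhs)
                changed = True
    return covered

RULES = {"S": ["aABb"], "A": ["bC"], "B": ["d", "dB"], "C": ["eC"]}

t = productive(RULES, "abde")
print("unproductive:", sorted(set(RULES) - t))  # step 4: V - T
```

Only B ever enters T, through B → d, so S, A and C are reported as unproductive, matching the trace above.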

Top-down Parsing with Backtracking

E → N | OEE
O → + | - | * | /
N → 0 | 1 | 2 | 3 | 4

Prefix expressions associate an operator with the next two operands.
E.g., *+324 = (3+2)*4, *2+34 = 2*(3+4)

[Figure: search tree of a backtracking top-down parse of *+342, trying the alternatives for E, O and N in turn.]
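A backtracking top-down recognizer for this prefix grammar can be sketched with a generator that tries each alternative in order and falls through to the next one on failure. The names `GRAMMAR`, `derive` and `accepts` are hypothetical, symbols are single characters, and the approach assumes a grammar without left recursion (a left-recursive grammar would make the search loop forever).

```python
GRAMMAR = {"E": ["N", "OEE"],
           "O": ["+", "-", "*", "/"],
           "N": ["0", "1", "2", "3", "4"]}

def derive(symbols, text, pos):
    """Yield every input position reachable after matching symbols from pos."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Non-terminal: try each alternative; exhausting one means backtracking.
        for alternative in GRAMMAR[head]:
            yield from derive(alternative + rest, text, pos)
    elif pos < len(text) and text[pos] == head:
        # Terminal: it must match the current input character.
        yield from derive(rest, text, pos + 1)

def accepts(text):
    """True if some derivation from E consumes the whole input."""
    return any(end == len(text) for end in derive("E", text, 0))

print(accepts("*+342"), accepts("2*3"))
```

The recognizer may expand many alternatives that fail on the very first character, which is the inefficiency that predictive parsing removes.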

LL(1) Parsers

• Problem:
– Never know what production to try (and very inefficient)
• Solution:
– LL parser: parses input from Left to right, and constructs a Leftmost derivation of the sentence
– LL(k) parser uses k tokens of look-ahead
• LL(1) parsers:
– Somewhat restrictive, BUT
– Only need current non-terminal and next token to make parsing decision
• LL(1) parsers require LL(1) grammars

Simple LL(1) Grammars

All rules have the form:

A → a₁α₁ | a₂α₂ | … | aₙαₙ

where
– aᵢ (1 ≤ i ≤ n) is a terminal
– aᵢ ≠ aⱼ for i ≠ j
– αᵢ (1 ≤ i ≤ n) is a sequence of terminals and non-terminals, or is empty

Creating Simple LL(1) Grammars

• Why is this not a simple LL(1) grammar?

E → N | OEE
O → + | - | * | /
N → 0 | 1 | 2 | 3 | 4

(Both alternatives for E begin with a non-terminal, not a terminal.)

• How can we change it to simple LL(1)?
• By making all production rules of the form:

A → a₁α₁ | a₂α₂ | … | aₙαₙ

• Thus,

E → 0 | 1 | 2 | 3 | 4 | +EE | -EE | *EE | /EE

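LL(1) Parsing

E → (1)0 | (2)1 | (3)2 | (4)3 | (5)4 | (6)+EE | (7)-EE | (8)*EE | (9)/EE

[Figure: two predictive parses. For input * + 2 3 4, E expands by rule 8 (*EE), the first inner E by rule 6 (+EE), and the operands 2, 3 and 4 by rules 3, 4 and 5: Success! For input 2 * 3, E expands to 2 and the rest of the input cannot be matched: Fail!]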

Simple LL(1) Parse Table

A parse table is defined as follows:

(V ∪ {#}) × (VT ∪ {#}) → {(α, i), pop, accept, error}

where
– α is the right side of production number i
– # marks the end of the input string (# ∉ V)

If A ∈ (V ∪ {#}) is the symbol on top of the stack and a ∈ (VT ∪ {#}) is the current input symbol, then:

ACTION(A, a) =
  pop       if A = a for a ∈ VT
  accept    if A = # and a = #
  (aα, i)   which means “pop, then push aα and output i” (A → aα is the ith production)
  error     otherwise

Simple LL(1) Parse Table Example

E → (1)0 | (2)1 | (3)2 | (4)3 | (5)+EE | (6)*EE

Rows: stack symbol in V ∪ {#}. Columns: input symbol in VT ∪ {#}.

      0       1       2       3       +         *         #
E     (0,1)   (1,2)   (2,3)   (3,4)   (+EE,5)   (*EE,6)
0     pop
1             pop
2                     pop
3                             pop
+                                     pop
*                                               pop
#                                                         accept

All blank entries are error

Parse Table Execution: *+123

              0       1       2       3       +         *         #
E             (0,1)   (1,2)   (2,3)   (3,4)   (+EE,5)   (*EE,6)
0,1,2,3,+,*   pop     pop     pop     pop     pop       pop
#                                                                 accept

The Input column shows the input that remains to be read.

Action                                  Stack    Input     Output
Initialize                              E#       *+123#
ACTION(E,*) = Replace [E,*EE], Out 6    *EE#     *+123#    6
ACTION(*,*) = pop(*,*)                  EE#      +123#     6
ACTION(E,+) = Replace [E,+EE], Out 5    +EEE#    +123#     65
ACTION(+,+) = pop(+,+)                  EEE#     123#      65
ACTION(E,1) = Replace [E,1], Out 2      1EE#     123#      652
ACTION(1,1) = pop(1,1)                  EE#      23#       652
ACTION(E,2) = Replace [E,2], Out 3      2E#      23#       6523
ACTION(2,2) = pop(2,2)                  E#       3#        6523
ACTION(E,3) = Replace [E,3], Out 4      3#       3#        65234
ACTION(3,3) = pop(3,3)                  #        #         65234
ACTION(#,#) = accept                    Done!
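The execution of the simple LL(1) table can be sketched as a stack machine. This is an illustration under assumptions: single-character symbols, the six-rule example grammar hard-coded in a hypothetical `RULES` dict, and a hypothetical function name `parse`. Since every right-hand side starts with a terminal, the table entry for a rule is keyed directly by that terminal.

```python
# Production number -> (left side, right side) for the example grammar.
RULES = {1: ("E", "0"), 2: ("E", "1"), 3: ("E", "2"),
         4: ("E", "3"), 5: ("E", "+EE"), 6: ("E", "*EE")}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}

# ACTION(A, a) = (a alpha, i): keyed by the leading terminal of each rule.
TABLE = {(lhs, rhs[0]): (rhs, number) for number, (lhs, rhs) in RULES.items()}

def parse(text):
    """Return the list of production numbers output while parsing text."""
    stack = ["#", "E"]                  # top of the stack is the end of the list
    tokens = text + "#"                 # '#' marks the end of the input
    pos, output = 0, []
    while True:
        top, look = stack[-1], tokens[pos]
        if top == "#" and look == "#":
            return output               # accept
        if (top, look) in TABLE:
            rhs, number = TABLE[(top, look)]
            stack.pop()
            stack.extend(reversed(rhs)) # push rhs so its first symbol is on top
            output.append(number)
        elif top == look and top != "#" and top not in NONTERMINALS:
            stack.pop()                 # pop: terminal matches the input
            pos += 1
        else:
            raise SyntaxError(f"error at ACTION({top},{look})")

print(parse("*+123"))
```

For `*+123` the output is the sequence 6, 5, 2, 3, 4 from the trace above; an infix input such as `2*3` ends in an error entry.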

Relaxing Simple LL(1) Restrictions

• Consider the following grammar

E → (1)N | (2)OEE
O → (3)+ | (4)*
N → (5)0 | (6)1 | (7)2 | (8)3

• Not simple LL(1): rules (1) & (2)
• However:
– N leads only to {0, 1, 2, 3}
– O leads only to {+, *}
– {0, 1, 2, 3} ∩ {+, *} = ∅
• We can distinguish between rules (1) and (2):
– If we see 0, 1, 2, or 3, we choose (1)
– If we see + or *, we choose (2)

LL(1) Grammars

• For any α, define

FIRST(α) = { a | α ⇒* aβ and a ∈ VT }

• A grammar is LL(1) if for all rules of the form

A → α₁ | α₂ | … | αₙ

then,

FIRST(αᵢ) ∩ FIRST(αⱼ) = ∅ for i ≠ j
(i.e., the sets FIRST(α₁), FIRST(α₂), …, and FIRST(αₙ) are pairwise disjoint)
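The FIRST definition and the pairwise-disjointness test can be sketched as follows. The sketch assumes single-character symbols and a grammar with no empty right-hand sides, so FIRST of a string depends only on its first symbol; the names `first`, `is_ll1` and `GRAMMAR` are hypothetical.

```python
GRAMMAR = {"E": ["N", "OEE"], "O": ["+", "*"], "N": ["0", "1", "2", "3"]}

def first(alpha, grammar, seen=frozenset()):
    """FIRST(alpha) for a non-empty symbol string, assuming no empty right sides."""
    head = alpha[0]
    if head not in grammar:
        return {head}                   # a terminal is its own FIRST set
    if head in seen:
        return set()                    # already expanding head: nothing new
    result = set()
    for rhs in grammar[head]:
        result |= first(rhs, grammar, seen | {head})
    return result

def is_ll1(grammar):
    """Check that the FIRST sets of each rule's alternatives are pairwise disjoint."""
    for alternatives in grammar.values():
        sets = [first(rhs, grammar) for rhs in alternatives]
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                if sets[i] & sets[j]:
                    return False
    return True

print(sorted(first("OEE", GRAMMAR)), sorted(first("N", GRAMMAR)), is_ll1(GRAMMAR))
```

A grammar such as E → D | E+E fails the same check, because both alternatives can start with a digit.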

LL(1) Parse Table

E → (1)N | (2)OEE
O → (3)+ | (4)*
N → (5)0 | (6)1 | (7)2 | (8)3

For (A, a), we select (α, i) if a ∈ FIRST(α) and α is the right hand side of rule i.

Rows: stack symbol in V ∪ {#}. Columns: input symbol in VT ∪ {#}.

      +         *         0       1       2       3       #
E     (OEE,2)   (OEE,2)   (N,1)   (N,1)   (N,1)   (N,1)
O     (+,3)     (*,4)
N                         (0,5)   (1,6)   (2,7)   (3,8)
+     pop
*               pop
0                         pop
1                                 pop
2                                         pop
3                                                 pop
#                                                         accept

Parse Table Execution Revisited: *+123

              +         *         0       1       2       3       #
E             (OEE,2)   (OEE,2)   (N,1)   (N,1)   (N,1)   (N,1)
O             (+,3)     (*,4)
N                                 (0,5)   (1,6)   (2,7)   (3,8)
+,*,0,1,2,3   pop       pop       pop     pop     pop     pop
#                                                                 accept

The Input column shows the input that remains to be read.

Action                                  Stack    Input     Output
Initialize                              E#       *+123#
ACTION(E,*) = Replace [E,OEE], Out 2    OEE#     *+123#    2
ACTION(O,*) = Replace [O,*], Out 4      *EE#     *+123#    24
ACTION(*,*) = pop(*,*)                  EE#      +123#     24
ACTION(E,+) = Replace [E,OEE], Out 2    OEEE#    +123#     242
ACTION(O,+) = Replace [O,+], Out 3      +EEE#    +123#     2423
ACTION(+,+) = pop(+,+)                  EEE#     123#      2423
ACTION(E,1) = Replace [E,N], Out 1      NEE#     123#      24231
ACTION(N,1) = Replace [N,1], Out 6      1EE#     123#      242316
ACTION(1,1) = pop(1,1)                  EE#      23#       242316
ACTION(E,2) = Replace [E,N], Out 1      NE#      23#       2423161
ACTION(N,2) = Replace [N,2], Out 7      2E#      23#       24231617
ACTION(2,2) = pop(2,2)                  E#       3#        24231617
ACTION(E,3) = Replace [E,N], Out 1      N#       3#        242316171
ACTION(N,3) = Replace [N,3], Out 8      3#       3#        2423161718
ACTION(3,3) = pop(3,3)                  #        #         2423161718
ACTION(#,#) = accept                    Done!
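Building the table from FIRST sets and then running it can be sketched in one piece. The assumptions are single-character symbols, no empty right-hand sides, and no left recursion (the simple recursive `first` would not terminate otherwise); the names `RULES`, `first`, `build_table` and `parse` are hypothetical.

```python
# Rules in numbered order: rule i is RULES[i - 1].
RULES = [("E", "N"), ("E", "OEE"), ("O", "+"), ("O", "*"),
         ("N", "0"), ("N", "1"), ("N", "2"), ("N", "3")]
NONTERMINALS = {lhs for lhs, _ in RULES}

def first(alpha):
    """FIRST(alpha), assuming no empty right sides and no left recursion."""
    head = alpha[0]
    if head not in NONTERMINALS:
        return {head}
    return set().union(*(first(rhs) for lhs, rhs in RULES if lhs == head))

def build_table():
    """For (A, a), select (alpha, i) if a is in FIRST(alpha) and alpha is the rhs of rule i."""
    table = {}
    for number, (lhs, rhs) in enumerate(RULES, start=1):
        for terminal in first(rhs):
            if (lhs, terminal) in table:
                raise ValueError("grammar is not LL(1)")
            table[(lhs, terminal)] = (rhs, number)
    return table

def parse(text):
    """Return the production numbers output while parsing text."""
    table = build_table()
    stack, tokens, pos, output = ["#", "E"], text + "#", 0, []
    while True:
        top, look = stack[-1], tokens[pos]
        if top == "#" and look == "#":
            return output                   # accept
        if (top, look) in table:
            rhs, number = table[(top, look)]
            stack.pop()
            stack.extend(reversed(rhs))     # first rhs symbol ends up on top
            output.append(number)
        elif top == look and top != "#" and top not in NONTERMINALS:
            stack.pop()                     # pop: terminal matches the input
            pos += 1
        else:
            raise SyntaxError(f"error at ACTION({top},{look})")

print(parse("*+123"))
```

For `*+123` this yields the production sequence 2 4 2 3 1 6 1 7 1 8 shown in the trace.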

What does 2 4 2 3 1 6 1 7 1 8 mean?

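E → (1)N | (2)OEE
O → (3)+ | (4)*
N → (5)0 | (6)1 | (7)2 | (8)3

[Figure: parse tree for *+123, each node labelled with the production applied]

E (2)OEE
  O (4)*
  E (2)OEE
    O (3)+
    E (1)N
      N (6)1
    E (1)N
      N (7)2
  E (1)N
    N (8)3

2 4 2 3 1 6 1 7 1 8 defines a parse tree via a preorder traversal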
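The preorder reading can be sketched by rebuilding the tree from the production sequence: each number expands the leftmost non-terminal that has not been expanded yet. The tuple-based tree representation and the names `build_tree` and `leaves` are hypothetical choices for illustration.

```python
RULES = {1: ("E", "N"), 2: ("E", "OEE"), 3: ("O", "+"), 4: ("O", "*"),
         5: ("N", "0"), 6: ("N", "1"), 7: ("N", "2"), 8: ("N", "3")}
NONTERMINALS = {"E", "O", "N"}

def build_tree(productions, start="E"):
    """Rebuild the parse tree from a preorder list of production numbers."""
    numbers = iter(productions)

    def expand(symbol):
        if symbol not in NONTERMINALS:
            return symbol                   # terminals are leaves
        lhs, rhs = RULES[next(numbers)]     # next production in preorder
        if lhs != symbol:
            raise ValueError(f"production for {lhs} used where {symbol} expected")
        # Children are expanded left to right, which is the preorder walk.
        return (symbol, [expand(child) for child in rhs])

    return expand(start)

def leaves(tree):
    """Concatenate the leaves of a tree, giving back the parsed sentence."""
    if isinstance(tree, str):
        return tree
    return "".join(leaves(child) for child in tree[1])

tree = build_tree([2, 4, 2, 3, 1, 6, 1, 7, 1, 8])
print(leaves(tree))
```

Reading the leaves of the rebuilt tree from left to right gives back the input `*+123`, so the number sequence carries the same information as the tree.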
