
INF5110 Ch. 5: Bottom-up parsing

Part 1

12/2-2015 Stein Krogdahl

Ifi, UiO

Mandatory assignment 1: Will occur sometime during the week from 16/2


Bottom-up parsing

[Figure: parse tree for the sentence t1 … t7, with root S and inner nodes A and B]

[Figure: table for LR parsing, with one row per state and one column per token or nonterminal]

The methods listed below are, in the given order, able to handle more and more complicated grammars. Each «state» is represented as a row in the table above.

- LR(0): Can handle only very simple grammars. Requires about 300 states for a standard programming language. Included only as an introduction to SLR(1) and LALR(1).

- SLR(1): Can handle most standard grammars for standard programming languages. The same number of states as the LR(0) method. This method will be our main focus.

- LALR(1): Can handle a few more grammars than SLR(1). Again the same number of states as the LR(0) method. We will look at the ideas behind this method.

- LR(1): Can handle all grammars that in any way can be handled by looking at only the next token. The number of states will be around 3000.

Automated tools that, from a BNF grammar, can deliver a parser:

- YACC, Bison, CUP (all of them use LALR(1) techniques)


Data structure for LR-parsing

S’ → S
S → A B t7 | ....
A → t4 t5 | t1 B |
B → t2 t3 | A t6 | ....

Add a new «outermost» production (S’ → S). Thus, the new start symbol S’ will never occur in the right hand side of a production.

Assume that the grammar is unambiguous, that we are given a correct sentence «t1 t2 … t7» and that we know the parse tree for this sentence:

[Figure: parse tree for t1 … t7, with root S’ above S and inner nodes A and B]

LR-parsing:

• Have a stack representing what we have read

• Make a «reduction» of a subtree when «it occurs» at the top of the stack


More about LR-parsing

S’ → S (new outermost production)
S → A B t7 | ....
A → t4 t5 | t1 B |
B → t2 t3 | A t6 | ....

And we assume that we know the parse tree for this sentence.

[Figure: the same parse tree for t1 … t7, with root S’ above S]

• We have a stack representing what is read

• Make a «reduction» of a subtree when it appears at the top of the stack.

• A reduction is to replace this subtree with the non-terminal that produced it.

• A reduction: To use a production backwards

Start situation: stack: $ input: t1 t2 t3 t4 t5 t6 t7 $

End situation: stack: $ S’ input: $


The principles of LR-parsing, and the shift and reduce operations

S’ → S
S → A B t7 | ....
A → t4 t5 | t1 B |
B → t2 t3 | A t6 | ....

stack            input
$                t1 t2 t3 t4 t5 t6 t7 $
$ t1             t2 t3 t4 t5 t6 t7 $
$ t1 t2          t3 t4 t5 t6 t7 $
$ t1 t2 t3       t4 t5 t6 t7 $
$ t1 B           t4 t5 t6 t7 $
$ A              t4 t5 t6 t7 $
$ A t4           t5 t6 t7 $
$ A t4 t5        t6 t7 $
$ A A            t6 t7 $
$ A A t6         t7 $
$ A B            t7 $
$ A B t7         $
$ S              $

[Figure: the parse tree for t1 … t7, with root S’ above S]

• There are two types of steps:

• Shift: Move the next input symbol over to the top of the stack.

• Reduction: Remove the symbols of the rightmost subtree from the stack, and replace them with the nonterminal at the root of the subtree.

If you know the parse tree it is easy to perform these steps correctly.

BUT: How can we do this without knowing:

- The full syntax tree

- The rest of the input
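The two step types can be replayed mechanically when the parse tree (and thus the right sequence of steps) is already known. The following is a minimal Python sketch under that assumption: the function name `replay` and the encoding of a reduction as a `(left side, right side)` pair are illustrative choices, not something taken from the slides, and only the productions spelled out above are used.

```python
def replay(tokens, actions):
    """Run a given list of shift/reduce steps; return every (stack, input) pair."""
    stack, rest = ["$"], list(tokens) + ["$"]
    trace = [(list(stack), list(rest))]
    for action in actions:
        if action == "shift":
            # Shift: move the next input symbol over to the top of the stack.
            stack.append(rest.pop(0))
        else:
            # Reduction: use the production lhs -> rhs backwards.
            lhs, rhs = action
            cut = len(stack) - len(rhs)
            assert stack[cut:] == rhs, "right-hand side is not on top of the stack"
            del stack[cut:]
            stack.append(lhs)
        trace.append((list(stack), list(rest)))
    return trace


TOKENS = ["t1", "t2", "t3", "t4", "t5", "t6", "t7"]
# The steps are read off the known parse tree (hypothetical encoding).
ACTIONS = [
    "shift", "shift", "shift", ("B", ["t2", "t3"]),
    ("A", ["t1", "B"]),
    "shift", "shift", ("A", ["t4", "t5"]),
    "shift", ("B", ["A", "t6"]),
    "shift", ("S", ["A", "B", "t7"]),
]

trace = replay(TOKENS, ACTIONS)
for stack, rest in trace:
    print(" ".join(stack).ljust(16), " ".join(rest))
```

The sketch only checks that each reduction is legal; choosing the steps without knowing the tree is exactly the problem the rest of the chapter addresses.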


For comparison: Top-down and bottom-up parsing

[Figure: a top-down parsing situation in a tree with nodes S, A, B and C over the input. The part of the input before the token is analysed and found OK. At the token we must decide which alternative of A to use; the following input will be matched against the rest of A.]

• In top-down (e.g. recursive descent) parsing we «produce» the syntax tree aiming to get the given sentence, while using the productions of the grammar.

• During bottom-up (e.g. LR) parsing we «reduce» the input according to productions, while aiming at the start symbol.


Example showing LR-parsing (when we know the tree!)

[Figure: parse tree for «n + n» with root E’, next to the parsing steps and the corresponding right derivation (but the example is not good!)]

In our textbook, but we do not stress this:

• The next reduction that should be made is called the ”handle” of the situation.

• ”stack + input” will always form a sentential form occurring during a right derivation. These sentential forms occur in the opposite order of that in which they occur in the right derivation.


Another example: LR-parsing from grammars with empty productions

[Figure: parse tree for «( )» with root S’ above S, where two S nodes derive ε, next to the parsing steps]

NB: S → ε will appear a little strange: During a reduction with this production a nonterminal will show up at the top of the stack «from nowhere» (at the two places marked in the figure).


A typical situation during LR-parsing

[Figure: the stack «$ s1 s2 s3 s4 ....... sk», followed by the token and the rest of the input «t1 t2 t3 ...... tn $», drawn under a parse tree with nodes S’, S, A and B. The stack is a reduced version of the processed input: all subtrees over the processed input are reduced. Nodes C and D are marked in the tree; a marker between two nodes means that they are the same node.]

After a shift, the next reduction to be made is a reduction with the production:

C → t1

Then, after two shifts, we will make a reduction with the production:

D → t2 t3

Then, what’s next?


A plan for solving «The LR-problem»: «When to do a shift and when to do a reduction (with what production)»

(Not everything on the slides is found in our book. Only what’s in the book is curriculum)

We look at all possible stacks that can occur during LR-parsing of a sentence in L(G). We consider them as a new language Stacks(G), over the following alphabet:

{terminals} ∪ {non-terminals}

Stacks(G) = {string s | s may occur as the stack during LR-parsing of a sentence in L(G)}

This language turns out to be regular, and can be described by an NFA where all states are accepting.

These states are identified by ”items” of the form: A → X Y . Z

The possible state transitions of the NFA can also be described in a rather straightforward way.

We will turn this NFA into a DFA in the standard way (subset construction from ch. 2)

Each state of this DFA will be a subset of the NFA states, and thus be a set of items.


The rest of: The plan for solving «The LR-problem»:

The new language Stacks(G) turns out to be regular, as it can be described by an NFA

In this NFA all states are accepting

The states of the NFA are all the possible ”items” of the grammar:

The position of the dot (e.g. in «A → X Y . Z» ) will play a central role in the later analyses.

We will turn this NFA into a DFA in the standard way (subset construction from ch. 2)

Thus, each state of this DFA will consist of a set of items, and these items will indicate the possible «local situations» of the parsing (see: «The main proposition for LR-parsing»).


The NFA: The states and the transitions

This is not fully explained in our book, and is not curriculum

Given a grammar G, a randomly chosen sentence in L(G), and a randomly chosen point during LR parsing of this sentence:

[Figure: the parse tree from S’ down through S, A and B to the current position, with the stack («$ ...») to the left and the original input («... $») along the bottom. A marked line along the left edge of the tree is the content of the stack. A marker between two nodes means that they are the same nodes.]

We will describe an NFA that will accept exactly all such «lines», seen as a sequence of symbols. This sequence is the content of the stack.

Note: No subtrees remain under the left side branches of the remaining tree. They are all reduced to their root symbol.


The state transitions of the NFA

This is not fully explained in our book, and is not curriculum

[Figure: the same type of «random situation» as before (stack, original input, and the parse tree from S’ down through S, A and B), next to the NFA transitions it corresponds to, drawn with the symbols A, X, Y, α, η and β. X can be either a non-terminal or a terminal: a transition on X moves the dot past X. If X is a non-terminal: an ε-transition leads to an item for X with the dot in front of its right-hand side β.]

We have to show:

(1) For any such situation, the left edge of the tree must be accepted by the NFA

(2) For any path through the NFA, we can set up a parse tree that has this path as its left edge.


Example: The NFA that describes all possible stacks

More precisely: The LR(0)-NFA

E’ → E
E → E + n
E → n

Items (LR(0)-items):

E’ → .E
E’ → E.
E → .E + n
E → E. + n
E → E + .n
E → E + n.
E → .n
E → n.

[Figure: parse tree for «n + n» with root E’, and the LR(0)-NFA over these items, starting in E’ → .E, with transitions labelled E, n, + and ε. The same NFA is also shown in a slightly more orderly form.]
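The items and the NFA transitions for this small grammar can also be enumerated by a program. The sketch below assumes the usual construction (a transition on X moves the dot past X, and an ε-transition leads from a dot in front of a nonterminal to the initial items of that nonterminal); the representation of an item as a pair (production index, dot position) and the function names are illustrative assumptions.

```python
# Grammar from the slide: E' -> E, E -> E + n, E -> n
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "n")),
    ("E", ("n",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def items(grammar):
    """All LR(0) items, each written as (production index, dot position)."""
    return [(p, dot)
            for p, (_, rhs) in enumerate(grammar)
            for dot in range(len(rhs) + 1)]


def show(item):
    """Readable form of an item, e.g. 'E -> E . + n'."""
    p, dot = item
    lhs, rhs = GRAMMAR[p]
    return f"{lhs} -> {' '.join(list(rhs[:dot]) + ['.'] + list(rhs[dot:]))}"


def nfa_transitions(grammar):
    """List of (from item, label, to item); the label None stands for ε."""
    edges = []
    for p, dot in items(grammar):
        rhs = grammar[p][1]
        if dot == len(rhs):
            continue                                 # complete item: no way out
        x = rhs[dot]
        edges.append(((p, dot), x, (p, dot + 1)))    # move the dot past X
        if x in NONTERMINALS:
            for q, (lhs, _) in enumerate(grammar):
                if lhs == x:                         # ε to every item X -> . β
                    edges.append(((p, dot), None, (q, 0)))
    return edges


edges = nfa_transitions(GRAMMAR)
for source, label, target in edges:
    print(f"{show(source):14} --{label or 'ε'}--> {show(target)}")
```

All items are accepting states, so the NFA is fully described by this edge list together with the start item E’ → .E.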


Again: The LR(0)-NFA (slightly more ordered than in the textbook):

[Figure: the LR(0)-NFA for E’ → E, E → E + n | n, as above]

LR(0)-DFA, made by the «subset construction» (Ch. 2):

[Figure: the LR(0)-DFA. The start state consists of E’ → .E together with its closure, E → .E + n and E → .n. None of the other states will have any closure.]

”E” on the stack ends in state 1. The two possible situations can then be exemplified by:

[Figure: two parse-tree fragments, one for the item E → E . + n and one for the item E’ → E .]


How to construct the LR(0)-DFA directly from the grammar.

(Straightforward use of the «subset construction»)

Closure of a set I of items:

If: A → α • B γ is an item in I, B is a non-terminal, and B → β1 | β2 | ...

Then these items should also be included in I:

B → • β1
B → • β2
...

The start state of the LR(0)-DFA is:

S’ → • S
+ closure

State transition for symbol X from state I (X is a terminal or a nonterminal; they are here treated the same way):

State I:                          New state:
........
A1 → α1 • X β1       --X-->       A1 → α1 X • β1
........                          A2 → α2 X • β2
A2 → α2 • X β2                    + closure
........

Make sure that all items of the form «A → α • X β» are included.
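The closure and state-transition rules above translate almost directly into code. The following is a minimal Python sketch for the grammar E’ → E, E → E + n | n; items are represented as pairs (production index, dot position), and the names `closure`, `goto` and `build_dfa` are illustrative assumptions rather than notation from the slides.

```python
GRAMMAR = [("E'", ("E",)), ("E", ("E", "+", "n")), ("E", ("n",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(item_set):
    """For every A -> α . B γ in the set, also include all B -> . β."""
    result, work = set(item_set), list(item_set)
    while work:
        p, dot = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)


def goto(state, x):
    """Move the dot past X in all items A -> α . X β, then take the closure."""
    moved = {(p, dot + 1) for p, dot in state
             if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == x}
    return closure(moved)


def build_dfa():
    """Return (states, edges); edges maps (state number, symbol) to a state number."""
    states, edges = [closure({(0, 0)})], {}    # start state: S' -> . S + closure
    for state in states:                       # the list grows while we iterate
        symbols = []
        for p, dot in sorted(state):           # fixed order gives stable numbering
            rhs = GRAMMAR[p][1]
            if dot < len(rhs) and rhs[dot] not in symbols:
                symbols.append(rhs[dot])
        for x in symbols:                      # terminals and nonterminals alike
            target = goto(state, x)
            if target not in states:
                states.append(target)
            edges[(states.index(state), x)] = states.index(target)
    return states, edges


states, edges = build_dfa()
print(len(states), "states")
print(edges)
```

With this traversal order the state reached on E from the start state gets number 1, which agrees with the remark that «E» on the stack ends in state 1; the remaining numbers are a consequence of the traversal order chosen here.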


Another example: How to construct the LR(0)-DFA directly from the grammar

Note that S → ε gives only one item, which is: S → •

(Not like this: S → • ε, and S → ε •)

[Figures: the items, the LR(0)-NFA and the LR(0)-DFA of this grammar]


What is the ”top state” telling us?

The top state: The DFA-state that we arrive at when the stack is fed through the DFA.

The «Main proposition for LR-parsing»: The items in the top-state will indicate all possible «local situations» that can occur in an LR parse, with the current stack.

If the item X → α • β is a member of the top-state, then the situation may be as shown in the figure:

[Figure: parse tree from S’ down through S, B and A to a node X with children α and β. α ends at the top of the stack; β covers the rest of the input, starting at the token.]


When is shift a possibility?

Assume here that the top state s contains the item X → α • a β, where «a» may be either a terminal or a non-terminal.

[Figure: part of the DFA, where state s, containing X → α • a β, has a transition on a to state t, containing X → α a • β. To the right: parse tree from S’ down to a node X with children α, a and β; α ends at the top of the stack, and a β covers the rest of the input, starting at the token.]

This tells us that the situation may be as in the figure. Thus:

• Shift is a possible operation

• Also: If shift is the correct operation and «a» is a terminal symbol equal to the token symbol, then the state after the shift will be t.


When is a reduction (with a given production) a possibility?

Assume that the top state is s, and that this state has the item A → γ • (this is called a complete item).

[Figure: part of the DFA, where state u has a transition on A to state t, and state s contains A → γ •. To the right: parse tree from S’ down to a node A with γ below it; γ lies at the top of the stack, followed by the token and the rest of the input.]

This indicates that the situation locally may be as in the figure, and that the next step might be a reduction with A → γ.

NEW: We remember the states between the stack symbols!

Current stack: ... u γ s (with states between the symbols of γ)

New stack: ... u A t

The reduction step: Pop off what corresponds to γ (and the states in between), push A as the new top symbol, and find the new top state t from u and A.
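The reduction step with states stored between the stack symbols can be sketched in a few lines of Python. The sketch uses the grammar E’ → E, E → E + n | n; apart from state 1 (reached on «E»), the state numbers in the `GOTO` dictionary are an assumed numbering of its LR(0)-DFA, and the function name `reduce_step` is hypothetical.

```python
# Transitions of the LR(0)-DFA for E' -> E, E -> E + n | n (assumed numbering).
GOTO = {(0, "E"): 1, (0, "n"): 2, (1, "+"): 3, (3, "n"): 4}


def reduce_step(stack, lhs, rhs):
    """Pop γ with the states in between, push A, and find the new top state."""
    remaining = stack[: len(stack) - 2 * len(rhs)]   # one state per popped symbol
    u = remaining[-1]                                # the state uncovered by the pop
    t = GOTO[(u, lhs)]                               # new top state from u and A
    return remaining + [lhs, t]


# The stack alternates between states and symbols: 0 E 1 + 3 n 4
stack = [0, "E", 1, "+", 3, "n", 4]
new_stack = reduce_step(stack, "E", ["E", "+", "n"])
print(new_stack)
```

Because the uncovered state u is already on the stack, the new top state is found with a single lookup instead of feeding the whole stack through the DFA again.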


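Our old example, and LR(0) grammars

[Figure: the LR(0)-DFA of the grammar E’ → E, E → E + n | n, and the parse tree for «n + n $» with root E’. Two situations with E on the stack are marked: in one we should shift, in the other we should reduce with E’ → E.]

If the stack is «E», the top state is 1, and we can either shift or reduce with E’ → E.

Def. of LR(0)-grammar: «The top state uniquely decides the next step»

Thus: The above grammar is not LR(0), because of state 1 (stack = $ E)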


Are these example grammars LR(0)?

Grammar is LR(0) iff: For every state, only one action is possible

- When shift: Many shift items may occur. Shift is one action

- When reduction: It must also be clear with which production

The example grammars:

[Figure: the two example grammars, the grammar for E from the earlier slides and the grammar A’ → A, A → ( A ) | a]

We have already looked at the first one: Not LR(0)!

The second grammar gives the following LR(0)-DFA:

State   Possible actions
0       Only shift is possible
1       Only red. possible, with A’ → A
2       Only red. possible, with A → a
3       Only shift is possible
4       Only shift is possible
5       Only red. possible, with A → (A)

Yes! This grammar is LR(0)!
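The test «for every state, only one action is possible» can be carried out by a program once the LR(0)-DFA states are built. The Python sketch below is one possible way to do it, assuming items are pairs (production index, dot position); the function names are illustrative, and the state order it produces need not match the numbering on the slides.

```python
def build_states(grammar):
    """LR(0)-DFA states as frozensets of (production index, dot position)."""
    nonterminals = {lhs for lhs, _ in grammar}

    def closure(item_set):
        result, work = set(item_set), list(item_set)
        while work:
            p, dot = work.pop()
            rhs = grammar[p][1]
            if dot < len(rhs) and rhs[dot] in nonterminals:
                for q, (lhs, _) in enumerate(grammar):
                    if lhs == rhs[dot] and (q, 0) not in result:
                        result.add((q, 0))
                        work.append((q, 0))
        return frozenset(result)

    states = [closure({(0, 0)})]
    for state in states:                       # the list grows while we iterate
        symbols = {grammar[p][1][dot] for p, dot in state if dot < len(grammar[p][1])}
        for x in sorted(symbols):
            target = closure({(p, dot + 1) for p, dot in state
                              if dot < len(grammar[p][1]) and grammar[p][1][dot] == x})
            if target not in states:
                states.append(target)
    return states


def lr0_conflicts(grammar):
    """States where the top state does not uniquely decide the next step."""
    nonterminals = {lhs for lhs, _ in grammar}
    bad = []
    for state in build_states(grammar):
        complete = [p for p, dot in state if dot == len(grammar[p][1])]
        can_shift = any(dot < len(grammar[p][1]) and grammar[p][1][dot] not in nonterminals
                        for p, dot in state)
        # Many shift items are fine (shift is one action); a reduction must be alone.
        if len(complete) > 1 or (complete and can_shift):
            bad.append(state)
    return bad


E_GRAMMAR = [("E'", ("E",)), ("E", ("E", "+", "n")), ("E", ("n",))]
A_GRAMMAR = [("A'", ("A",)), ("A", ("(", "A", ")")), ("A", ("a",))]
print("E grammar is LR(0):", not lr0_conflicts(E_GRAMMAR))
print("A grammar is LR(0):", not lr0_conflicts(A_GRAMMAR))
```

For the first grammar the only offending state is the one containing both E’ → E . and E → E . + n.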


Parsing table for an LR(0) grammar

This table structure is slightly different from the one used for SLR(1), LALR(1) and LR(1) grammars. Therefore this table structure is not important (not curriculum).

[Table: the LR(0) parsing table of the grammar, including a Goto part]

Parsing of the sentence: ((a))

[Figure: parse tree for «( ( a ) )» with root A’ and three nested A nodes, next to the parsing steps]

In the last step we should reduce with A’ → A, and as the input is empty we are finished, and the sentence is correct.

If a reduction brings us to state 0 or 3, the Goto part will tell us which state pushing A will give.


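Parsing of erroneous sentences

Grammar: A → ( A ) | a

Sentence: ( ( a )

Stack                       Input
$ 0                         ( ( a ) $
$ 0 ( 3                     ( a ) $
$ 0 ( 3 ( 3                 a ) $
$ 0 ( 3 ( 3 a 2             ) $
$ 0 ( 3 ( 3 A 4             ) $
$ 0 ( 3 ( 3 A 4 ) 5         $
$ 0 ( 3 A 4                 $

Sentence: ( )

Stack                       Input
$ 0                         ( ) $
$ 0 ( 3                     ) $

In both cases the parser stops with an error in the last line, because the next input symbol cannot legally be shifted.

Important invariant for LR-parsing in general:

We are never allowed to shift something illegal onto the stack!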
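How a table-driven LR(0) parser detects such errors can be sketched as follows. The tables are written out by hand for A’ → A, A → ( A ) | a; the state numbers follow the action list given earlier (0, 3 and 4 shift; 1, 2 and 5 reduce), while the concrete dictionary layout and the function name `parse` are assumptions made for this sketch.

```python
# Hand-written LR(0) tables for A' -> A, A -> ( A ) | a.
SHIFT = {(0, "("): 3, (0, "a"): 2, (3, "("): 3, (3, "a"): 2, (4, ")"): 5}
GOTO = {(0, "A"): 1, (3, "A"): 4}
REDUCE = {1: ("A'", 1), 2: ("A", 1), 5: ("A", 3)}   # state -> (left side, length of right side)


def parse(tokens):
    """Return (accepted, trace); each trace entry is (stack, rest of input)."""
    stack, rest = [0], list(tokens) + ["$"]
    trace = [(list(stack), list(rest))]
    while True:
        state = stack[-1]
        if state in REDUCE:
            lhs, n = REDUCE[state]
            if lhs == "A'":
                return rest == ["$"], trace      # finished only if the input is empty
            del stack[len(stack) - 2 * n:]       # pop the symbols and the states between
            stack += [lhs, GOTO[(stack[-1], lhs)]]
        elif (state, rest[0]) in SHIFT:
            stack += [rest[0], SHIFT[(state, rest[0])]]
            rest.pop(0)
        else:
            return False, trace                  # never shift something illegal
        trace.append((list(stack), list(rest)))


for sentence in ["((a))", "((a)", "()"]:
    accepted, trace = parse(list(sentence))
    print(sentence, "->", "correct" if accepted else "error", "after", len(trace) - 1, "steps")
```

The error is reported at the first point where the current token has no shift entry in the top state, so nothing illegal ever reaches the stack.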


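A (slightly erroneous) formulation of the LR(0) requirement: If the following rules give an unambiguous algorithm, the grammar is LR(0):

[Table of rules, garbled in extraction. Recoverable parts: s is a DFA state (the top state); after a shift of X the new top state is t, where the DFA has a transition from s to t on X; after a reduction to A that uncovers state u, the new top state is t, where the DFA has a transition from u to t on A; termination; a remark that something can occur in many states.]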


The next grammar: Is it LR(0)? No, because of the states 0, 2 and 4

[Figure: the LR(0)-DFA of this grammar]


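Old grammar that is not LR(0): How can we make a choice in state 1?

[Figure: the LR(0)-DFA of the grammar, and the parse tree for «n + n $» with root E’. Two situations with E on the stack are marked: in one we should shift, in the other we should reduce with E’ → E.]

Solution: We look at the next input symbol, in the token-variable!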


SLR(1) grammars. The SLR(1) algorithm

• Very few grammars are LR(0)

• By looking at the Follow-set we can obtain a much stronger algorithm

• Will still use the LR(0)-DFA

• The table structure will be a little different. The tables will have one column for each terminal and one for each non-terminal.

A state with two complete items:

...
A → α.
...
B → β.

LR(0): We have an unsolvable red./red. conflict

SLR(1): If Follow(A) ∩ Follow(B) = ∅ then we can solve the conflict by looking at the next input symbol:

• If token ∈ Follow(A): reduce with A → α

• If token ∈ Follow(B): reduce with B → β

• But the «…» may indicate other possibilities!


SLR(1) grammars, SLR(1) algorithms

A state with one complete item and several shift items:

...
A → α.
...
B1 → β1 . b1 γ1      (transition on b1)
...
B2 → β2 . b2 γ2      (transition on b2)

LR(0): We have an unsolvable shift/red. conflict

SLR(1): If Follow(A) ∩ {b1, b2, …} = ∅ then we can solve the conflict by looking at the next input symbol (in token) as follows:

• If token ∈ Follow(A): Reduce with A → α. Nonterminal A will decide the new top state

• If token = b1, b2, …: shift. The input will decide the new top state

• But the «…» may indicate other possibilities!


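Is this grammar SLR(1)?

The SLR(1) requirement in the book: For all DFA states s, we must have:

1. For any item A → α . X β in s, where X is a terminal, there is no complete item B → γ . in s with X ∈ Follow(B). (Would otherwise have a shift/red. conflict on input X.)

2. For any two complete items A → α . and B → β . in s, Follow(A) ∩ Follow(B) = ∅. (Would otherwise have a red./red. conflict when input is in this set.)

Recall: «Complete item» = it has the dot at the end

[Figure: the LR(0)-DFA of the grammar E’ → E, E → E + n | n, with annotations: should shift for n; the state with E’ → E . and E → E . + n has a shift/reduce conflict in LR(0), but not in SLR(1)]

Follow(E’) = { $ }

Thus:

• Shift for ’+’

• Reduce for ’$’, with E’ → E (which is accept)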
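The SLR(1) check can be automated by combining the LR(0)-DFA states with the Follow sets. The Python sketch below is one way to do this under stated assumptions: items are pairs (production index, dot position), Follow is computed with a standard fixed-point iteration, and all function names are illustrative rather than taken from the book.

```python
def build_states(grammar):
    """LR(0)-DFA states as frozensets of (production index, dot position)."""
    nts = {lhs for lhs, _ in grammar}

    def closure(item_set):
        result, work = set(item_set), list(item_set)
        while work:
            p, dot = work.pop()
            rhs = grammar[p][1]
            if dot < len(rhs) and rhs[dot] in nts:
                for q, (lhs, _) in enumerate(grammar):
                    if lhs == rhs[dot] and (q, 0) not in result:
                        result.add((q, 0))
                        work.append((q, 0))
        return frozenset(result)

    states = [closure({(0, 0)})]
    for state in states:                       # the list grows while we iterate
        symbols = {grammar[p][1][dot] for p, dot in state if dot < len(grammar[p][1])}
        for x in sorted(symbols):
            target = closure({(p, dot + 1) for p, dot in state
                              if dot < len(grammar[p][1]) and grammar[p][1][dot] == x})
            if target not in states:
                states.append(target)
    return states


def follow_sets(grammar, start):
    """Follow sets by fixed-point iteration; '$' marks the end of the input."""
    nts = {lhs for lhs, _ in grammar}
    nullable = set()
    first = {a: set() for a in nts}
    follow = {a: set() for a in nts}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            before = (len(nullable), len(first[lhs]))
            all_nullable = True
            for x in rhs:                      # First(lhs) and nullable
                first[lhs] |= first[x] if x in nts else {x}
                if x not in nullable:
                    all_nullable = False
                    break
            if all_nullable:
                nullable.add(lhs)
            changed |= before != (len(nullable), len(first[lhs]))
            trailer = set(follow[lhs])         # what may follow the rest of rhs
            for x in reversed(rhs):
                if x in nts:
                    size = len(follow[x])
                    follow[x] |= trailer
                    changed |= size != len(follow[x])
                    trailer = (trailer | first[x]) if x in nullable else set(first[x])
                else:
                    trailer = {x}
    return follow


def slr1_conflicts(grammar, start):
    """(state number, left side) for every complete item that breaks the requirement."""
    nts = {lhs for lhs, _ in grammar}
    follow = follow_sets(grammar, start)
    conflicts = []
    for number, state in enumerate(build_states(grammar)):
        # Terminals that can be shifted in this state.
        taken = {grammar[p][1][dot] for p, dot in state
                 if dot < len(grammar[p][1]) and grammar[p][1][dot] not in nts}
        for p, dot in sorted(state):
            if dot == len(grammar[p][1]):      # complete item
                lhs = grammar[p][0]
                if follow[lhs] & taken:        # shift/red. or red./red. conflict
                    conflicts.append((number, lhs))
                taken |= follow[lhs]
    return conflicts


GRAMMAR = [("E'", ("E",)), ("E", ("E", "+", "n")), ("E", ("n",))]
print(follow_sets(GRAMMAR, "E'"))
print("SLR(1):", not slr1_conflicts(GRAMMAR, "E'"))
```

For this grammar the state with E’ → E . and E → E . + n passes the check because Follow(E’) = { $ } does not contain «+».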


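A (slightly erroneous) formulation of the SLR(1) requirement: If the following rules give an unambiguous algorithm, the grammar is SLR(1):

[Table of rules, garbled in extraction. Recoverable parts: after a shift of X the new top state is t, where the DFA has a transition from s to t on X; after a reduction to A that uncovers state u, the new top state is t, where the DFA has a transition from u to t on A. One part of the rules is marked: «This is new compared to LR(0).»]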


Parsing table for an SLR(1)-grammar

The SLR(1) requirement in another way: This table must be unambiguous!

[Table: the SLR(1) parsing table of the grammar, with an empty entry annotated: ’n’ not in Follow(E)]


Parsing for an SLR(1)-grammar

Parsing of the sentence: n + n + n

[Figure: parse tree for «n + n + n» with root E’ and three nested E nodes, next to the parsing steps]

May also look at erroneous sentences: «+ n $», «n n $», «n + $»
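The parsing of «n + n + n» and of the erroneous sentences can be tried out with a small table-driven sketch. The table below was derived by hand for E’ → E, E → E + n | n using Follow(E’) = { $ } and Follow(E) = { +, $ }; the state numbers (0 to 4) and the tuple encoding of the actions are assumptions made for this illustration.

```python
# Hand-derived SLR(1) table; one entry per (state, token), empty entries are errors.
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept",),                       # reduce with E' -> E
    (2, "+"): ("reduce", "E", 1), (2, "$"): ("reduce", "E", 1),   # E -> n
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", "E", 3), (4, "$"): ("reduce", "E", 3),   # E -> E + n
}
GOTO = {(0, "E"): 1}


def parse(tokens):
    """Return (accepted, list of the steps taken)."""
    stack, rest = [0], list(tokens) + ["$"]
    steps = []
    while True:
        action = ACTION.get((stack[-1], rest[0]))    # top state and token decide
        if action is None:
            return False, steps                      # empty table entry: error
        steps.append(action[0])
        if action[0] == "accept":
            return True, steps
        if action[0] == "shift":
            stack += [rest.pop(0), action[1]]
        else:
            _, lhs, n = action
            del stack[len(stack) - 2 * n:]           # pop symbols and states
            stack += [lhs, GOTO[(stack[-1], lhs)]]


for sentence in ["n + n + n", "+ n", "n n", "n +"]:
    accepted, steps = parse(sentence.split())
    print(f"{sentence:10}", "correct" if accepted else "error", steps)
```

The sentence «n n» fails right after the first shift, because the entry for state 2 and token n is empty: ’n’ is not in Follow(E).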


Is this grammar SLR(1)?

Follow(S) = { ), $ }


SLR(k) – Possible to come up with a theory for this, but it is probably not used in any tools.