Parsing

Upload: shelby-fry

Post on 31-Dec-2015


DESCRIPTION

Parsing. Top-down vs. bottom-up. Top-down parsing: recursive-descent parsing, LL(1) parsing, LL(1) parsing algorithm, First and Follow sets, constructing the LL(1) parsing table, error recovery. Bottom-up parsing: shift-reduce parsers, LR(0) parsing, LR(0) items, finite automata of items.

TRANSCRIPT

Page 1: Parsing

Parsing

Page 2: Parsing

2301373 Chapter 4 Parsing 2

Outline

Top-down vs. bottom-up
Top-down parsing
  Recursive-descent parsing
  LL(1) parsing
    LL(1) parsing algorithm
    First and Follow sets
    Constructing LL(1) parsing table
    Error recovery
Bottom-up parsing
  Shift-reduce parsers
  LR(0) parsing
    LR(0) items
    Finite automata of items
    LR(0) parsing algorithm
    LR(0) grammar
  SLR(1) parsing
    SLR(1) parsing algorithm
    SLR(1) grammar
    Parsing conflict

Page 3: Parsing

Introduction

Parsing is a process that constructs a syntactic structure (i.e. a parse tree) from the stream of tokens.
We have already learned how to describe the syntactic structure of a language using a (context-free) grammar.
So, does a parser only need to do this?

[Figure: Stream of tokens + Context-free grammar -> Parser -> Parse tree]

Page 4: Parsing

Top-Down Parsing vs. Bottom-Up Parsing

Top-down parsing
  A parse tree is created from root to leaves.
  The traversal of parse trees is a preorder traversal.
  Traces a leftmost derivation.
  Two types:
    Backtracking parser: try different structures and backtrack if one does not match the input.
    Predictive parser: guess the structure of the parse tree from the next input.

Bottom-up parsing
  A parse tree is created from leaves to root.
  The traversal of parse trees is a reversal of postorder traversal.
  Traces a rightmost derivation.
  More powerful than top-down parsing.

Page 5: Parsing

Parse Trees and Derivations

Top-down parsing (leftmost derivation):
E => E + E => id + E => id + E * E => id + id * E => id + id * id

Bottom-up parsing (rightmost derivation, traced in reverse):
E => E + E => E + E * E => E + E * id => E + id * id => id + id * id

[Figure: parse trees for id + id * id, built top-down and bottom-up]

Page 6: Parsing

Top-down Parsing

What does a parser need to decide?
  Which production rule is to be used at each point in time?
How to guess?
What is the guess based on?
  What is the next token?
    Reserved word if, open parenthesis, etc.
  What is the structure to be built?
    If statement, expression, etc.

Page 7: Parsing

Top-down Parsing

Why is it difficult?
Cannot decide until later
  Next token: if
  Structure to be built: St
    St -> MatchedSt | UnmatchedSt
    UnmatchedSt -> if (E) St | if (E) MatchedSt else UnmatchedSt
    MatchedSt -> if (E) MatchedSt else MatchedSt | ...
Production with empty string
  Next token: id
  Structure to be built: par
    par -> parList | ε
    parList -> exp , parList | exp

Page 8: Parsing

Recursive-Descent

Write one procedure for each set of productions with the same nonterminal on the LHS.
Each procedure recognizes a structure described by a nonterminal.
A procedure calls other procedures if it needs to recognize other structures.
A procedure calls the match procedure if it needs to recognize a terminal.

Page 9: Parsing

Recursive-Descent: Example

Grammar:
  E -> E O F | F
  O -> + | -
  F -> ( E ) | id

procedure F
{ switch token
  { case (: match('('); E; match(')');
    case id: match(id);
    default: error;
  }
}

For this grammar:
  We cannot decide which rule to use for E, and
  if we choose E -> E O F, it leads to infinitely recursive loops:

procedure E
{ E; O; F; }

Rewrite the grammar into EBNF:
  E ::= F {O F}
  O ::= + | -
  F ::= ( E ) | id

procedure E
{ F;
  while (token=+ or token=-)
  { O; F; }
}
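The EBNF version of the grammar maps almost directly onto code. Below is a minimal Python sketch, not taken from the slides: the `Parser` class, the `ParseError` exception, the tuple-shaped tree, and the `"$"` end marker are illustrative assumptions; only `match`, `E`, `O`, and `F` mirror the procedures named above.

```python
class ParseError(Exception):
    """Raised when the token stream does not match the grammar."""


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of input (assumed convention)
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # The token is consumed only when it is the expected one.
        if self.token != expected:
            raise ParseError(f"expected {expected!r}, found {self.token!r}")
        self.pos += 1

    def E(self):
        # E ::= F {O F}
        node = self.F()
        while self.token in ("+", "-"):
            op = self.O()
            right = self.F()
            node = (op, node, right)  # left-associative tree
        return node

    def O(self):
        # O ::= + | -
        op = self.token
        if op not in ("+", "-"):
            raise ParseError(f"expected an operator, found {op!r}")
        self.match(op)
        return op

    def F(self):
        # F ::= ( E ) | id
        if self.token == "(":
            self.match("(")
            node = self.E()
            self.match(")")
            return node
        if self.token == "id":
            self.match("id")
            return "id"
        raise ParseError(f"unexpected token {self.token!r}")


def parse(tokens):
    parser = Parser(tokens)
    tree = parser.E()
    parser.match("$")  # the whole input must be consumed
    return tree
```

For example, `parse(["id", "+", "(", "id", "-", "id", ")"])` builds a nested tuple with `"+"` at the root, while `parse(["id", "+"])` raises `ParseError`.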

Page 10: Parsing

Match procedure

procedure match(expTok)
{ if (token==expTok)
  then getToken
  else error
}

The token is not consumed until getToken is executed.

Page 11: Parsing

Problems in Recursive-Descent

Difficult to convert grammars into EBNF
Cannot decide which production to use at each point
Cannot decide when to use an ε-production A -> ε

Page 12: Parsing

LL(1) Parsing

LL(1)
  Read input from left to right (L)
  Simulate leftmost derivation (L)
  1 lookahead symbol
Use a stack to simulate leftmost derivation
  Part of the sentential form produced in the leftmost derivation is stored in the stack.
  Top of stack is the leftmost nonterminal symbol in the fragment of sentential form.

Page 13: Parsing

Concept of LL(1) Parsing

Simulate leftmost derivation of the input.
Keep part of the sentential form in the stack.
  If the symbol on the top of stack is a terminal, try to match it with the next input token and pop it off the stack.
  If the symbol on the top of stack is a nonterminal X, replace it with Y if we have a production rule X -> Y.
    Which production will be chosen if there are both X -> Y and X -> Z?

Page 14: Parsing

Example of LL(1) Parsing

Grammar:
  E -> T X
  X -> A T X | ε
  A -> + | -
  T -> F N
  N -> M F N | ε
  M -> *
  F -> ( E ) | n

Input: ( n + ( n ) ) * n $

[Figure: animation of the parsing stack, starting from $ E and ending with "Finished"]

Leftmost derivation simulated by the stack:
E => TX => FNX => (E)NX => (TX)NX => (FNX)NX => (nNX)NX => (nX)NX => (nATX)NX => (n+TX)NX => (n+FNX)NX => (n+(E)NX)NX => (n+(TX)NX)NX => (n+(FNX)NX)NX => (n+(nNX)NX)NX => (n+(nX)NX)NX => (n+(n)NX)NX => (n+(n)X)NX => (n+(n))NX => (n+(n))MFNX => (n+(n))*FNX => (n+(n))*nNX => (n+(n))*nX => (n+(n))*n

Page 15: Parsing

LL(1) Parsing Algorithm

Push $ and then the start symbol onto the stack
REPEAT until Accept or Error
  SWITCH (top of stack, next token)
    CASE (terminal a, a):
      Pop stack; get next token
    CASE (nonterminal A, token a):
      IF the parsing table entry M[A, a] is not empty THEN
        Get A -> X1 X2 ... Xn from the parsing table entry M[A, a]
        Pop stack; push Xn ... X2 X1 onto the stack in that order (X1 ends up on top)
      ELSE Error
    CASE ($, $): Accept
    OTHER: Error
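The algorithm above can be sketched as a small table-driven driver. This Python version is only an illustration: the rule numbers and table entries are copied from the expression grammar used later in the deck for table construction, while the names `RULES`, `TABLE`, and `ll1_parse`, and the choice to return the list of applied rule numbers, are assumptions made for the example.

```python
# Numbered productions: rule number -> (left-hand side, right-hand side).
RULES = {
    1: ("exp", ["term", "exp'"]),
    2: ("exp'", ["addop", "term", "exp'"]),
    3: ("exp'", []),                      # exp' -> ε
    4: ("addop", ["+"]),
    5: ("addop", ["-"]),
    6: ("term", ["factor", "term'"]),
    7: ("term'", ["mulop", "factor", "term'"]),
    8: ("term'", []),                     # term' -> ε
    9: ("mulop", ["*"]),
    10: ("factor", ["(", "exp", ")"]),
    11: ("factor", ["n"]),
}

# Parsing table M[A, a] -> rule number.
TABLE = {
    ("exp", "("): 1, ("exp", "n"): 1,
    ("exp'", "+"): 2, ("exp'", "-"): 2, ("exp'", ")"): 3, ("exp'", "$"): 3,
    ("addop", "+"): 4, ("addop", "-"): 5,
    ("term", "("): 6, ("term", "n"): 6,
    ("term'", "*"): 7, ("term'", ")"): 8, ("term'", "+"): 8,
    ("term'", "-"): 8, ("term'", "$"): 8,
    ("mulop", "*"): 9,
    ("factor", "("): 10, ("factor", "n"): 11,
}

NONTERMINALS = {lhs for lhs, _ in RULES.values()}


def ll1_parse(tokens, start="exp"):
    """Return the rule numbers of the leftmost derivation, or raise SyntaxError."""
    inp = list(tokens) + ["$"]
    stack = ["$", start]
    pos = 0
    used = []
    while True:
        top, tok = stack[-1], inp[pos]
        if top == "$" and tok == "$":
            return used                               # accept
        if top in NONTERMINALS:
            rule = TABLE.get((top, tok))
            if rule is None:
                raise SyntaxError(f"no rule for ({top}, {tok})")
            stack.pop()
            stack.extend(reversed(RULES[rule][1]))    # X1 ends up on top
            used.append(rule)
        elif top == tok:
            stack.pop()                               # match a terminal
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, found {tok!r}")
```

Calling `ll1_parse(["n", "+", "n", "*", "n"])` returns the sequence of rules applied, which is exactly a leftmost derivation of the input.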

Page 16: Parsing

LL(1) Parsing Table

If the nonterminal N is on the top of stack and the next token is t, which production rule to use?
Choose a rule N -> X such that
  X =>* tY, or
  X =>* ε and S =>* W N t Y

[Figure: parse-tree fragments showing t derived from X under N, and t following N when X derives the empty string]

Page 17: Parsing

First Set

Let X be ε or be in V or T.
First(X) is the set of the first terminals in any sentential form derived from X.
  If X is a terminal or ε, then First(X) = {X}.
  If X is a nonterminal and X -> X1 X2 ... Xn is a rule, then
    First(X1) - {ε} is a subset of First(X)
    First(Xi) - {ε} is a subset of First(X) if for all j < i, First(Xj) contains ε
    ε is in First(X) if for all j ≤ n, First(Xj) contains ε

Page 18: Parsing

Examples of First Set

Grammar 1:
  exp -> exp addop term | term
  addop -> + | -
  term -> term mulop factor | factor
  mulop -> *
  factor -> ( exp ) | num

  First(addop) = {+, -}
  First(mulop) = {*}
  First(factor) = {(, num}
  First(term) = {(, num}
  First(exp) = {(, num}

Grammar 2:
  st -> ifst | other
  ifst -> if ( exp ) st elsepart
  elsepart -> else st | ε
  exp -> 0 | 1

  First(exp) = {0, 1}
  First(elsepart) = {else, ε}
  First(ifst) = {if}
  First(st) = {if, other}

Page 19: Parsing

Algorithm for finding First(A)

For all terminals a, First(a) = {a}
For all nonterminals A, First(A) := {}
While there are changes to any First(A)
  For each rule A -> X1 X2 ... Xn
    For each Xi in {X1, X2, …, Xn}
      If for all j < i First(Xj) contains ε, Then
        add First(Xi) - {ε} to First(A)
    If ε is in First(X1), First(X2), ..., and First(Xn)
      Then add ε to First(A)

Definition being implemented:
  If A is a terminal or ε, then First(A) = {A}.
  If A is a nonterminal, then for each rule A -> X1 X2 ... Xn, First(A) contains First(X1) - {ε}.
  If also for some i < n, First(X1), First(X2), ..., and First(Xi) contain ε, then First(A) contains First(Xi+1) - {ε}.
  If First(X1), First(X2), ..., and First(Xn) contain ε, then First(A) also contains ε.
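The fixed-point loop above can be sketched in Python as follows. This is an illustrative version, not the slides' own code: the dictionary representation of the grammar, the `EPS` marker, and the function name `first_sets` are assumptions. An empty right-hand side `[]` stands for an ε-production.

```python
EPS = "ε"


def first_sets(grammar):
    """grammar maps each nonterminal to a list of right-hand sides (lists of symbols)."""
    first = {a: set() for a in grammar}

    def first_of(symbol):
        # A terminal's First set is the terminal itself.
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:                      # repeat until no First set grows
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[a])
                all_nullable = True
                for x in rhs:
                    fx = first_of(x)
                    first[a] |= fx - {EPS}
                    if EPS not in fx:   # later symbols cannot start the string
                        all_nullable = False
                        break
                if all_nullable:        # every Xi can derive ε (or rhs is empty)
                    first[a].add(EPS)
                if len(first[a]) != before:
                    changed = True
    return first


# The expression grammar with left recursion removed.
GRAMMAR = {
    "exp": [["term", "exp'"]],
    "exp'": [["addop", "term", "exp'"], []],
    "addop": [["+"], ["-"]],
    "term": [["factor", "term'"]],
    "term'": [["mulop", "factor", "term'"], []],
    "mulop": [["*"]],
    "factor": [["(", "exp", ")"], ["num"]],
}
```

Running `first_sets(GRAMMAR)` takes a few passes because First(factor) has to propagate to First(term) and then to First(exp) before the loop reaches a fixed point.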

Page 20: Parsing

Finding First Set: An Example

Grammar:
  exp -> term exp'
  exp' -> addop term exp' | ε
  addop -> + | -
  term -> factor term'
  term' -> mulop factor term' | ε
  mulop -> *
  factor -> ( exp ) | num

Nonterminal  First
exp          { (, num }
exp'         { +, -, ε }
addop        { +, - }
term         { (, num }
term'        { *, ε }
mulop        { * }
factor       { (, num }

Page 21: Parsing

Follow Set

Let $ denote the end of input tokens.
  If A is the start symbol, then $ is in Follow(A).
  If there is a rule B -> X A Y, then First(Y) - {ε} is in Follow(A).
  If there is a production B -> X A Y and ε is in First(Y), then Follow(A) contains Follow(B).

Page 22: Parsing

Algorithm for Finding Follow(A)

Follow(S) = {$}
FOR each A in V - {S}
  Follow(A) = {}
WHILE change is made to some Follow sets
  FOR each production A -> X1 X2 ... Xn,
    FOR each nonterminal Xi
      Add First(Xi+1 Xi+2 ... Xn) - {ε} into Follow(Xi).
      (NOTE: If i = n, Xi+1 Xi+2 ... Xn = ε)
      IF ε is in First(Xi+1 Xi+2 ... Xn) THEN
        Add Follow(A) to Follow(Xi)

Definition being implemented:
  If A is the start symbol, then $ is in Follow(A).
  If there is a rule A -> Y X Z, then First(Z) - {ε} is in Follow(X).
  If there is a production B -> X A Y and ε is in First(Y), then Follow(A) contains Follow(B).
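A possible Python rendering of the Follow algorithm is sketched below. It assumes the First sets are already known and passed in as a dictionary; the names `first_of_string` and `follow_sets`, the `EPS` and `END` markers, and the grammar encoding are illustrative choices rather than anything defined in the slides.

```python
EPS, END = "ε", "$"


def first_of_string(symbols, first):
    """First set of a sequence of symbols X1 X2 ... Xk (ε when the sequence is empty)."""
    result = set()
    for x in symbols:
        fx = first.get(x, {x})          # a terminal's First set is itself
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)                     # every symbol was nullable
    return result


def follow_sets(grammar, first, start):
    follow = {a: set() for a in grammar}
    follow[start].add(END)
    changed = True
    while changed:                      # repeat until no Follow set grows
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                for i, x in enumerate(rhs):
                    if x not in grammar:
                        continue        # only nonterminals have Follow sets
                    rest = first_of_string(rhs[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:     # what follows A can also follow Xi
                        new |= follow[a]
                    if not new <= follow[x]:
                        follow[x] |= new
                        changed = True
    return follow


GRAMMAR = {
    "exp": [["term", "exp'"]],
    "exp'": [["addop", "term", "exp'"], []],
    "addop": [["+"], ["-"]],
    "term": [["factor", "term'"]],
    "term'": [["mulop", "factor", "term'"], []],
    "mulop": [["*"]],
    "factor": [["(", "exp", ")"], ["num"]],
}
FIRST = {
    "exp": {"(", "num"}, "exp'": {"+", "-", EPS}, "addop": {"+", "-"},
    "term": {"(", "num"}, "term'": {"*", EPS}, "mulop": {"*"},
    "factor": {"(", "num"},
}
```

With these inputs, `follow_sets(GRAMMAR, FIRST, "exp")` reproduces the Follow column of the worked example, for instance Follow(factor) = {*, +, -, ), $}.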

Page 23: Parsing

Finding Follow Set: An Example

Grammar:
  exp -> term exp'
  exp' -> addop term exp' | ε
  addop -> + | -
  term -> factor term'
  term' -> mulop factor term' | ε
  mulop -> *
  factor -> ( exp ) | num

Nonterminal  First          Follow
exp          { (, num }     { $, ) }
exp'         { +, -, ε }    { $, ) }
addop        { +, - }       { (, num }
term         { (, num }     { +, -, ), $ }
term'        { *, ε }       { +, -, ), $ }
mulop        { * }          { (, num }
factor       { (, num }     { *, +, -, ), $ }

Page 24: Parsing

Constructing LL(1) Parsing Tables

FOR each nonterminal A and a production A -> X
  FOR each token a in First(X)
    Add A -> X to M(A, a)
  IF ε is in First(X) THEN
    FOR each element a in Follow(A)
      Add A -> X to M(A, a)
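The construction above can be sketched in Python. The First and Follow sets are written out by hand for the expression grammar; `build_ll1_table`, `first_of_string`, and the data layout are hypothetical names chosen for this sketch. Each table entry is kept as a list so that a non-LL(1) grammar would show up as an entry with more than one rule.

```python
EPS = "ε"

RULES = {
    1: ("exp", ["term", "exp'"]),
    2: ("exp'", ["addop", "term", "exp'"]),
    3: ("exp'", []),
    4: ("addop", ["+"]),
    5: ("addop", ["-"]),
    6: ("term", ["factor", "term'"]),
    7: ("term'", ["mulop", "factor", "term'"]),
    8: ("term'", []),
    9: ("mulop", ["*"]),
    10: ("factor", ["(", "exp", ")"]),
    11: ("factor", ["num"]),
}
FIRST = {
    "exp": {"(", "num"}, "exp'": {"+", "-", EPS}, "addop": {"+", "-"},
    "term": {"(", "num"}, "term'": {"*", EPS}, "mulop": {"*"},
    "factor": {"(", "num"},
}
FOLLOW = {
    "exp": {"$", ")"}, "exp'": {"$", ")"}, "addop": {"(", "num"},
    "term": {"+", "-", ")", "$"}, "term'": {"+", "-", ")", "$"},
    "mulop": {"(", "num"}, "factor": {"*", "+", "-", ")", "$"},
}


def first_of_string(symbols, first):
    """First set of a right-hand side; contains ε if the whole string is nullable."""
    result = set()
    for x in symbols:
        fx = first.get(x, {x})          # terminals map to themselves
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)
    return result


def build_ll1_table(rules, first, follow):
    table = {}
    for number, (a, rhs) in rules.items():
        fx = first_of_string(rhs, first)
        targets = fx - {EPS}
        if EPS in fx:                   # A -> X can vanish: use Follow(A)
            targets |= follow[a]
        for token in targets:
            table.setdefault((a, token), []).append(number)
    return table
```

A grammar is LL(1) exactly when every list in the resulting table has length one, which can be checked with `all(len(v) == 1 for v in table.values())`.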

Page 25: Parsing

Example: Constructing LL(1) Parsing Table

Nonterminal  First          Follow
exp          { (, num }     { $, ) }
exp'         { +, -, ε }    { $, ) }
addop        { +, - }       { (, num }
term         { (, num }     { +, -, ), $ }
term'        { *, ε }       { +, -, ), $ }
mulop        { * }          { (, num }
factor       { (, num }     { *, +, -, ), $ }

Productions:
  1  exp -> term exp'
  2  exp' -> addop term exp'
  3  exp' -> ε
  4  addop -> +
  5  addop -> -
  6  term -> factor term'
  7  term' -> mulop factor term'
  8  term' -> ε
  9  mulop -> *
  10 factor -> ( exp )
  11 factor -> num

Parsing table (n stands for num):
          (    )    +    -    *    n    $
exp       1                        1
exp'           3    2    2              3
addop               4    5
term      6                        6
term'          8    8    8    7         8
mulop                         9
factor    10                       11

Page 26: Parsing

LL(1) Grammar

A grammar is an LL(1) grammar if its LL(1) parsing table has at most one production in each table entry.

Page 27: Parsing

LL(1) Parsing Table for non-LL(1) Grammar

Productions:
  1 exp -> exp addop term
  2 exp -> term
  3 term -> term mulop factor
  4 term -> factor
  5 factor -> ( exp )
  6 factor -> num
  7 addop -> +
  8 addop -> -
  9 mulop -> *

First(exp) = { (, num }
First(term) = { (, num }
First(factor) = { (, num }
First(addop) = { +, - }
First(mulop) = { * }

          (    )    +    -    *    num  $
exp       1,2                      1,2
term      3,4                      3,4
factor    5                        6
addop               7    8
mulop                         9

Page 28: Parsing

Causes of Non-LL(1)

What causes a grammar to be non-LL(1)?
  Left recursion
  Left factor

Page 29: Parsing

Left Recursion

Immediate left recursion
  A -> A X | Y                                          (A = Y X*)
  A -> A X1 | A X2 | … | A Xn | Y1 | Y2 | ... | Ym      (A = {Y1, Y2, …, Ym} {X1, X2, …, Xn}*)
  Can be removed very easily
    A -> Y A', A' -> X A' | ε
    A -> Y1 A' | Y2 A' | ... | Ym A', A' -> X1 A' | X2 A' | … | Xn A' | ε

General left recursion
  A => X =>* A Y
  Can be removed when there is no empty-string production and no cycle in the grammar

Page 30: Parsing

Removal of Immediate Left Recursion

Grammar:
  exp -> exp + term | exp - term | term
  term -> term * factor | factor
  factor -> ( exp ) | num

Remove left recursion:
  exp -> term exp'
  exp' -> + term exp' | - term exp' | ε
  term -> factor term'
  term' -> * factor term' | ε
  factor -> ( exp ) | num

exp = term (± term)*
term = factor (* factor)*
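The rewriting rule for immediate left recursion is mechanical enough to sketch in a few lines of Python. The function name `remove_immediate_left_recursion`, the list-of-symbols encoding, and the convention of naming the new nonterminal by appending a prime are assumptions for illustration. The sketch also assumes the grammar has no rule of the form A -> A.

```python
def remove_immediate_left_recursion(a, alternatives):
    """Rewrite A -> A X1 | ... | A Xn | Y1 | ... | Ym without left recursion.

    alternatives is the list of right-hand sides of A, each a list of symbols.
    Returns a dict mapping nonterminals to their new right-hand sides.
    """
    # The Xi: what follows A in the left-recursive alternatives.
    recursive = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == a]
    # The Yj: alternatives that do not start with A.
    others = [rhs for rhs in alternatives if not rhs or rhs[0] != a]
    if not recursive:
        return {a: alternatives}        # nothing to do
    a_prime = a + "'"
    return {
        a: [y + [a_prime] for y in others],                    # A  -> Yj A'
        a_prime: [x + [a_prime] for x in recursive] + [[]],    # A' -> Xi A' | ε
    }
```

For instance, applying it to `exp` with the alternatives `exp + term`, `exp - term`, and `term` yields `exp -> term exp'` and `exp' -> + term exp' | - term exp' | ε`, where the empty list stands for ε.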

Page 31: Parsing

General Left Recursion

Bad News!
  Can only be removed when there is no empty-string production and no cycle in the grammar.
Good News!!!!
  Never seen in grammars of any programming languages

Page 32: Parsing

Left Factoring

Left factor causes non-LL(1)
  Given A -> X Y | X Z, both A -> X Y and A -> X Z can be chosen when A is on top of stack and a token in First(X) is the next token.

A -> X Y | X Z
can be left-factored as
A -> X A' and A' -> Y | Z

Page 33: Parsing

Example of Left Factor

ifSt -> if ( exp ) st else st | if ( exp ) st
can be left-factored as
  ifSt -> if ( exp ) st elsePart
  elsePart -> else st | ε

seq -> st ; seq | st
can be left-factored as
  seq -> st seq'
  seq' -> ; seq | ε
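Left factoring can be sketched as pulling out the longest prefix shared by the alternatives. The Python function below is a simplified illustration: `left_factor` is a hypothetical name, it assumes all the given alternatives share the prefix, and it names the new nonterminal by appending a prime rather than choosing a descriptive name such as elsePart.

```python
def left_factor(a, alternatives):
    """Factor the common prefix out of A -> X Y | X Z, giving A -> X A', A' -> Y | Z.

    alternatives is a list of right-hand sides (lists of symbols);
    an empty list in the result stands for ε.
    """
    prefix = []
    # zip stops at the shortest alternative; walk the columns while all agree.
    for column in zip(*alternatives):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    if not prefix or len(alternatives) < 2:
        return {a: alternatives}        # nothing to factor
    a_prime = a + "'"
    return {
        a: [prefix + [a_prime]],
        a_prime: [rhs[len(prefix):] for rhs in alternatives],
    }
```

Applied to `seq` with alternatives `st ; seq` and `st`, this gives `seq -> st seq'` and `seq' -> ; seq | ε`, matching the second example above.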

Page 34: Parsing

Bottom-up Parsing

Use an explicit stack to perform a parse
Simulate rightmost derivation (R) from left (L) to right, thus called LR parsing
More powerful than top-down parsing
  Left recursion does not cause a problem
Two actions
  Shift: take the next input token onto the stack
  Reduce: replace a string B on top of the stack by a nonterminal A, given a production A -> B

Page 35: Parsing

Example of Shift-reduce Parsing

Grammar:
  S' -> S
  S -> (S)S | ε

Reverse of rightmost derivation, from left to right:
  1  ( ( ) )
  2  ( ( ) )
  3  ( ( ) )
  4  ( ( S ) )
  5  ( ( S ) )
  6  ( ( S ) S )
  7  ( S )
  8  ( S )
  9  ( S ) S
  10 S' => S

Parsing actions:
Stack          Input        Action
$              ( ( ) ) $    shift
$ (            ( ) ) $      shift
$ ( (          ) ) $        reduce S -> ε
$ ( ( S        ) ) $        shift
$ ( ( S )      ) $          reduce S -> ε
$ ( ( S ) S    ) $          reduce S -> ( S ) S
$ ( S          ) $          shift
$ ( S )        $            reduce S -> ε
$ ( S ) S      $            reduce S -> ( S ) S
$ S            $            accept

Page 36: Parsing

Example of Shift-reduce Parsing

[Figure: the same grammar, derivation, and parsing actions as on the previous page, with annotations marking the viable prefix and the handle]

Page 37: Parsing

Terminologies

Right sentential form: a sentential form in a rightmost derivation
  Examples: ( S ) S, ( ( S ) S )
Viable prefix: a sequence of symbols on the parsing stack
  Examples: ( S ) S, ( S ), ( S, (
            ( ( S ) S, ( ( S ), ( ( S, ( (, (
Handle: a right sentential form + the position where a reduction can be performed + the production used for the reduction
  Examples: ( S ) S . with S -> ε
            ( S ) S . with S -> ε
            ( ( S ) S . ) with S -> ( S ) S
LR(0) item: a production with a distinguished position in its RHS
  Examples: S -> ( S ) S .
            S -> ( S ) . S
            S -> ( S . ) S
            S -> ( . S ) S
            S -> . ( S ) S

Page 38: Parsing

Shift-reduce parsers

There are two possible actions: shift and reduce
Parsing is completed when the input stream is empty and the stack contains only the start symbol
The grammar must be augmented
  a new start symbol S' is added
  a production S' -> S is added
  This makes sure that parsing is finished when S' is on top of the stack, because S' never appears on the RHS of any production.

Page 39: Parsing

LR(0) parsing

Keep track of what is left to be done in the parsing process by using finite automata of items
An item A -> w . B y means:
  A -> w B y may be used for the reduction in the future,
  at this time, we know we have already constructed w in the parsing process,
  if B is constructed next, we get the new item A -> w B . y

Page 40: Parsing

LR(0) items

LR(0) item
  A production with a distinguished position in the RHS
Initial item
  Item with the distinguished position at the leftmost end of the production
Complete item
  Item with the distinguished position at the rightmost end of the production
Closure item of x
  Item x together with items which can be reached from x via ε-transitions
Kernel item
  Original item, not including closure items

Page 41: Parsing

Finite automata of items

Grammar:
  S' -> S
  S -> (S)S
  S -> ε

Items:
  S' -> .S     S' -> S.
  S -> .(S)S   S -> (.S)S   S -> (S.)S   S -> (S).S   S -> (S)S.
  S -> .

[Figure: NFA whose states are the items above, with transitions labeled S, ( and ), plus ε-transitions]

Page 42: Parsing

DFA of LR(0) Items

[Figure: the NFA of items converted into a DFA whose states are sets of items, with transitions labeled S, ( and )]

DFA states:
  { S' -> .S, S -> .(S)S, S -> . }
  { S' -> S. }
  { S -> (.S)S, S -> .(S)S, S -> . }
  { S -> (S.)S }
  { S -> (S).S, S -> .(S)S, S -> . }
  { S -> (S)S. }
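The DFA states can also be computed directly from the grammar with the usual closure and goto operations. The Python sketch below is an illustration under assumed conventions: an item is a tuple `(lhs, rhs, dot)`, and the names `closure`, `goto`, and `lr0_states` are chosen for this example rather than taken from the slides.

```python
# Augmented grammar: S' -> S, S -> (S)S | ε  (an empty list stands for ε).
GRAMMAR = {
    "S'": [["S"]],
    "S": [["(", "S", ")", "S"], []],
}


def closure(items, grammar):
    """Add every initial item B -> .z reachable when the dot is in front of B."""
    result = set(items)
    work = list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            for alt in grammar[rhs[dot]]:
                item = (rhs[dot], tuple(alt), 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(state, symbol, grammar):
    """Move the dot over symbol in every item that allows it, then close."""
    moved = {(lhs, rhs, dot + 1)
             for lhs, rhs, dot in state
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved, grammar)


def lr0_states(grammar, start_item):
    """Collect all DFA states reachable from the closure of the start item."""
    start = closure({start_item}, grammar)
    states, work = {start}, [start]
    while work:
        state = work.pop()
        symbols = {rhs[dot] for _, rhs, dot in state if dot < len(rhs)}
        for symbol in symbols:
            target = goto(state, symbol, grammar)
            if target not in states:
                states.add(target)
                work.append(target)
    return states
```

For the grammar above, `lr0_states(GRAMMAR, ("S'", ("S",), 0))` yields six states, the same six item sets listed for the DFA.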

Page 43: Parsing

LR(0) parsing algorithm

Item in state                  Token   Action
A -> x.By (B is a terminal)    B       shift B and push the state s containing A -> xB.y
A -> x.By (B is a terminal)    not B   error
A -> x.                        -       reduce with A -> x (i.e. pop x, back up to the state s on top of the stack) and push A with the new state d(s,A)
S' -> S.                       none    accept
S' -> S.                       any     error

Page 44: Parsing

LR(0) Parsing Table

State  Action  Rule        (   a   )   A
0      shift               3   2       1
1      reduce  A' -> A
2      reduce  A -> a
3      shift               3   2       4
4      shift                       5
5      reduce  A -> (A)

DFA states:
  0: A' -> .A, A -> .(A), A -> .a
  1: A' -> A.
  2: A -> a.
  3: A -> (.A), A -> .(A), A -> .a
  4: A -> (A.)
  5: A -> (A).

[Figure: DFA of LR(0) items for this grammar, with transitions labeled A, a, ( and )]

Page 45: Parsing

Example of LR(0) Parsing

State  Action  Rule        (   a   )   A
0      shift               3   2       1
1      reduce  A' -> A
2      reduce  A -> a
3      shift               3   2       4
4      shift                       5
5      reduce  A -> (A)

Stack          Input        Action
$0             ( ( a ) ) $  shift
$0(3           ( a ) ) $    shift
$0(3(3         a ) ) $      shift
$0(3(3a2       ) ) $        reduce
$0(3(3A4       ) ) $        shift
$0(3(3A4)5     ) $          reduce
$0(3A4         ) $          shift
$0(3A4)5       $            reduce
$0A1           $            accept
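The trace above can be reproduced with a short driver. This Python sketch hard-codes the table for the grammar A' -> A, A -> (A) | a; the split into `SHIFT`, `GOTO`, and `REDUCE` dictionaries and the function name `lr0_parse` are assumptions made for the example. Only state numbers are kept on the stack, since the grammar symbols are implied by the states.

```python
# Table for A' -> A, A -> (A) | a, written as three dictionaries.
SHIFT = {0: {"(": 3, "a": 2}, 3: {"(": 3, "a": 2}, 4: {")": 5}}
GOTO = {0: {"A": 1}, 3: {"A": 4}}
# Reduce states: state -> (left-hand side, length of right-hand side).
REDUCE = {1: ("A'", 1), 2: ("A", 1), 5: ("A", 3)}


def lr0_parse(tokens):
    """Return the list of actions taken, or raise SyntaxError."""
    inp = list(tokens) + ["$"]
    states = [0]
    pos = 0
    actions = []
    while True:
        state = states[-1]
        if state in REDUCE:
            lhs, length = REDUCE[state]
            if lhs == "A'":                       # complete item A' -> A.
                if inp[pos] != "$":
                    raise SyntaxError("input remains after a complete parse")
                actions.append("accept")
                return actions
            del states[-length:]                  # pop the right-hand side
            states.append(GOTO[states[-1]][lhs])  # push the state d(s, A)
            actions.append("reduce")
        else:
            token = inp[pos]
            if token not in SHIFT[state]:
                raise SyntaxError(f"unexpected token {token!r}")
            states.append(SHIFT[state][token])
            pos += 1
            actions.append("shift")
```

Running `lr0_parse(["(", "(", "a", ")", ")"])` produces the same sequence of shift, reduce, and accept actions as the Action column of the trace.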

Page 46: Parsing

Non-LR(0) Grammar

DFA states:
  0: S' -> .S, S -> .(S)S, S -> .
  1: S' -> S.
  2: S -> (.S)S, S -> .(S)S, S -> .
  3: S -> (S.)S
  4: S -> (S).S, S -> .(S)S, S -> .
  5: S -> (S)S.

[Figure: DFA of LR(0) items for S' -> S, S -> (S)S | ε, with transitions labeled S, ( and )]

Conflict
  Shift-reduce conflict
    A state contains a complete item A -> x. and a shift item A -> x.By
  Reduce-reduce conflict
    A state contains more than one complete item.

A grammar is an LR(0) grammar if there is no conflict in the grammar.

Page 47: Parsing

SLR(1) parsing

Simple LR with 1 lookahead symbol
Examine the next token before deciding to shift or reduce
  If the next token is the token expected in an item, then it can be shifted onto the stack.
  If a complete item A -> x. is constructed and the next token is in Follow(A), then reduction can be done using A -> x.
  Otherwise, an error occurs.
Can avoid conflicts

Page 48: Parsing

SLR(1) parsing algorithm

Item in state                  Token              Action
A -> x.By (B is a terminal)    B                  shift B and push the state s containing A -> xB.y
A -> x.By (B is a terminal)    not B              error
A -> x.                        in Follow(A)       reduce with A -> x (i.e. pop x, back up to the state s on top of the stack) and push A with the new state d(s,A)
A -> x.                        not in Follow(A)   error
S' -> S.                       none               accept
S' -> S.                       any                error

Page 49: Parsing

SLR(1) grammar

Conflict
  Shift-reduce conflict
    A state contains a shift item A -> x.Wy such that W is a terminal, and a complete item B -> z. such that W is in Follow(B).
  Reduce-reduce conflict
    A state contains more than one complete item with some common element in their Follow sets.

A grammar is an SLR(1) grammar if there is no conflict in the grammar.

Page 50: Parsing

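SLR(1) Parsing Table

Grammar:
  A' -> A
  A -> (A) | a

DFA states:
  0: A' -> .A, A -> .(A), A -> .a
  1: A' -> A.
  2: A -> a.
  3: A -> (.A), A -> .(A), A -> .a
  4: A -> (A.)
  5: A -> (A).

[Figure: DFA of LR(0) items for this grammar, with transitions labeled A, a, ( and )]

Reduce entries apply to the tokens in Follow(A) = { ), $ }.

State  (    a    )    $    A
0      S3   S2             1
1                     AC
2                R2   R2
3      S3   S2             4
4                S5
5                R1   R1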

Page 51: Parsing

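SLR(1) Grammar not LR(0)

Grammar:
  S' -> S
  S -> (S)S | ε

DFA states:
  0: S' -> .S, S -> .(S)S, S -> .
  1: S' -> S.
  2: S -> (.S)S, S -> .(S)S, S -> .
  3: S -> (S.)S
  4: S -> (S).S, S -> .(S)S, S -> .
  5: S -> (S)S.

[Figure: DFA of LR(0) items for this grammar, with transitions labeled S, ( and )]

State  (    )    $    S
0      S2   R2   R2   1
1                AC
2      S2   R2   R2   3
3           S4
4      S2   R2   R2   5
5           R1   R1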
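To see the lookahead at work, the SLR(1) table for S -> (S)S | ε can be run by a small driver. The Python below is a sketch under assumed names (`ACTION`, `GOTO`, `RULES`, `slr1_parse`); the entries are transcribed from the table above, with rule 1 being S -> (S)S and rule 2 being S -> ε.

```python
# Action table: state -> {token: action}; ("s", n) shifts to state n,
# ("r", k) reduces with rule k, ("acc",) accepts.
_SHIFT_OR_EMPTY = {"(": ("s", 2), ")": ("r", 2), "$": ("r", 2)}
ACTION = {
    0: _SHIFT_OR_EMPTY,
    1: {"$": ("acc",)},
    2: _SHIFT_OR_EMPTY,
    3: {")": ("s", 4)},
    4: _SHIFT_OR_EMPTY,
    5: {")": ("r", 1), "$": ("r", 1)},
}
GOTO = {0: 1, 2: 3, 4: 5}          # goto on S, the only nonterminal
RULES = {1: 4, 2: 0}               # rule number -> length of right-hand side


def slr1_parse(tokens):
    """Return True if the token sequence is a balanced-parentheses string."""
    inp = list(tokens) + ["$"]
    states = [0]
    pos = 0
    while True:
        action = ACTION[states[-1]].get(inp[pos])
        if action is None:
            return False                       # empty table entry: error
        if action[0] == "acc":
            return True
        if action[0] == "s":
            states.append(action[1])
            pos += 1
        else:
            length = RULES[action[1]]
            if length:                         # nothing to pop for S -> ε
                del states[-length:]
            states.append(GOTO[states[-1]])
```

The guard around `del` matters: reducing with S -> ε pops nothing, and an unguarded `del states[-0:]` would clear the whole stack.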

Page 52: Parsing

Disambiguating Rules for Parsing Conflict

Shift-reduce conflict
  Prefer shift over reduce
    In the case of nested if statements, preferring shift over reduce implies the most-closely-nested rule for the dangling else
Reduce-reduce conflict
  Error in design

Page 53: Parsing

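Dangling Else

Productions:
  1 S -> I
  2 S -> other
  3 I -> if S
  4 I -> if S else S

State  if   else  other  $     S    I
0      S4         S3           1    2
1                        ACC
2           R1           R1
3           R2           R2
4      S4         S3           5    2
5           S6           R3
6      S4         S3           7    2
7           R4           R4

DFA states:
  0: S' -> .S, S -> .I, S -> .other, I -> .if S, I -> .if S else S
  1: S' -> S.
  2: S -> I.
  3: S -> other.
  4: I -> if .S, I -> if .S else S, S -> .I, S -> .other, I -> .if S, I -> .if S else S
  5: I -> if S., I -> if S. else S
  6: I -> if S else .S, S -> .I, S -> .other, I -> .if S, I -> .if S else S
  7: I -> if S else S.

[Figure: DFA of items for the if-statement grammar, with transitions labeled S, I, if, else and other]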