top down vs. bottom up parsing - department of computer ...€¦ · bottom-up parsers: build the...

1

1 2

Parsin

g

•A

gram

mar d

escribes th

e string

s of to

ken

s that are

syn

tactically leg

al in a P

L

•A

recog

niser sim

ply

accepts o

r rejects string

s.

•A

gen

erator p

rod

uces sen

tences in

the lan

gu

age d

escribed

by

the g

ramm

ar

•A

parser co

nstru

ct a deriv

ation

or p

arse tree for a sen

tence (if

possib

le)

•T

wo co

mm

on ty

pes o

f parsers:

–b

otto

m-u

p o

r data d

riven

–to

p-d

ow

n o

r hyp

oth

esis driv

en

•A

recursive d

escent p

arser is a w

ay to

imp

lemen

t a top

-do

wn

parser th

at is particu

larly sim

ple.

3

To

p d

ow

n v

s. botto

m u

p p

arsin

g

•T

he p

arsing

pro

blem

is to co

nn

ect the ro

ot n

od

e S

with

the tree leav

es, the in

pu

t

•T

op

-dow

n p

arsers: starts co

nstru

cting

the p

arse tree at th

e top

(roo

t) of th

e parse tree an

d m

ov

e d

ow

n to

ward

s the leav

es. Easy

to im

plem

ent

by h

and, b

ut w

ork

with

restricted g

ramm

ars. ex

amp

les:

-P

redictiv

e parsers (e.g

., LL

(k))

•B

otto

m-u

p p

arsers: b

uild

the n

od

es on

the b

otto

m o

f the

parse tree first. S

uitab

le for au

tom

atic parser g

eneratio

n,

han

dle a larg

er class of g

ramm

ars. exam

ples:

–sh

ift-reduce p

arser (or L

R(k) p

arsers)

•B

oth

are gen

eral techn

iqu

es that can

be m

ade to

wo

rk fo

r all lan

gu

ages (b

ut n

ot all g

ramm

ars!).

S

A =

1 +

3 *

4 / 5

4

To

p d

ow

n v

s. botto

m u

p p

arsin

g

•B

oth

are gen

eral techn

iqu

es that can

be m

ade to

wo

rk

for all lan

gu

ages (b

ut n

ot all g

ramm

ars!).

•R

ecall that a g

iven

lang

uag

e can b

e describ

ed b

y

several g

ramm

ars.

•B

oth

of th

ese gram

mars d

escribe th

e same lan

gu

age

E -> E + Num

E -> Num

E -> Num + E

E -> Num

•T

he first o

ne, w

ith it’s left recu

rsion

, causes

pro

blem

s for to

p d

ow

n p

arsers.

•F

or a g

iven

parsin

g tech

niq

ue, w

e may

hav

e to

transfo

rm th

e gram

mar to

wo

rk w

ith it.

2

5

•H

ow

hard

is the p

arsing task

?

•P

arsing an

arbitrary

Contex

t Free G

ramm

ar is O(n

3), e.g., it

can tak

e time p

rop

ortio

nal th

e cub

e of th

e nu

mb

er of sy

mb

ols

in th

e inp

ut. T

his is b

ad!

•If w

e constrain

the g

ramm

ar som

ewhat, w

e can alw

ays p

arse

in lin

ear time. T

his is g

oo

d!

•L

inear-tim

e parsin

g

–L

L p

arsers

•R

ecog

nize L

L g

ramm

ar

•U

se a top-d

ow

n strateg

y

–L

R p

arsers

•R

ecog

nize L

R g

ramm

ar

•U

se a bo

ttom

-up

strategy

Parsin

g co

mp

lexity

•L

L(n

) : Left to

righ

t, L

eftmost d

erivatio

n,

look

ah

ead

at m

ost n

sy

mb

ols.

•L

R(n

) : Left to

righ

t, R

igh

t deriv

atio

n,

look

ah

ead

at m

ost n

sy

mb

ols.

6

Top

Dow

n P

arsin

g M

ethod

s

•S

imp

lest meth

od

is a full-b

acku

p, recu

rsive descen

t

parser

•O

ften u

sed fo

r parsin

g sim

ple lan

guag

es

•W

rite recursiv

e recog

nizers (su

bro

utin

es) for each

gram

mar ru

le

–If ru

les succeed

s perfo

rm so

me actio

n (i.e., b

uild

a

tree no

de, em

it cod

e, etc.)

–If ru

le fails, return

failure. C

aller may

try an

oth

er

cho

ice or fail

–O

n failu

re it “back

s up

”

7

Top

Dow

n P

arsin

g M

ethod

s

•P

roblem

s

–W

hen

goin

g fo

rward

, the p

arser consu

mes to

ken

s

from

the in

pu

t, so w

hat h

app

ens if w

e hav

e to b

ack

up

?

–A

lgo

rithm

s that u

se back

up

tend

to b

e, in g

eneral,

inefficien

t

–G

ramm

ar rules w

hich

are left-recursiv

e lead to

no

n-

termin

ation

!

8

Recu

rsive D

ecent P

arsin

g E

xam

ple

Exa

mp

le: F

or th

e g

ram

ma

r:

<term> -> <factor> {(*|/)<factor>}*

We co

uld

use th

e follo

win

g recu

rsive

descen

t parsin

g su

bp

rog

ram (th

is on

e is w

ritten in

C)

void term() {

factor(); /* parse first factor*/

while (next_token == ast_code ||

next_token == slash_code) {

lexical(); /* get next token */

factor(); /* parse next factor */

}

}

3

9

Pro

blem

s

•S

om

e gram

mars cau

se pro

blem

s for to

p

do

wn

parsers.

•T

op

do

wn

parsers d

o n

ot w

ork

with

left-recu

rsive g

ramm

ars.

–E

.g., o

ne w

ith a ru

le like: E

-> E

+ T

–W

e can tran

sform

a left-recursiv

e gram

mar in

to

on

e wh

ich is n

ot.

•A

top

do

wn

gram

mar can

limit b

acktrack

ing

if it on

ly h

as on

e rule p

er no

n-term

inal

–T

he tech

niq

ue o

f rule facto

ring

can b

e used

to

elimin

ate mu

ltiple ru

les for a n

on

-termin

al.

10

Left-recu

rsive g

ram

ma

rs

•A

gram

mar is left recu

rsive if it h

as rules lik

e X -> X

•O

r if it has in

direct left recu

rsion

, as in

X -> A

A -> X

•Q

: Wh

y is th

is a pro

blem

?

–A

: it can lead

to n

on

-termin

ating

recursio

n!

•C

onsid

er

E -> E + Num

E -> Num

•W

e can m

anu

ally o

r auto

matically

rewrite a g

ramm

ar to

remo

ve left-recu

rsion

, mak

ing

it suitab

le for a to

p-d

ow

n

parser.

11

Elim

inatio

n o

f Left R

ecursio

n

•C

on

sider th

e left-recursiv

e

gram

mar

S

S

S ->

•S

gen

erates strings

…

•R

ewrite u

sing rig

ht-

recursio

n

S

S’

S’

S’|

•C

on

cretely

T -> T + id

T-> id

•T

gen

erates strings

id

id+

id

id+

id+

id

…

•R

ewrite u

sing

righ

t-recu

rsion

T -> id T’

T’ -> id T’

T’ ->

12

More E

limin

atio

n o

f Left-R

ecursio

n

•In

gen

eral

S

S

1 | … | S

n |

1 | … |

m

•A

ll string

s deriv

ed fro

m S

start with

on

e of

1 ,

…,

m an

d co

ntin

ue w

ith sev

eral instan

ces of

1 ,

…,

n

•R

ewrite as

S

1 S

’ | … |

m S

’

S’

1 S

’ | … |

n S

’ |

4

13

Gen

eral L

eft Recu

rsion

•T

he g

ramm

ar

S

A

|

A

S

is also left-recu

rsive b

ecause

S

+ S

wh

ere + m

eans “can

be rew

ritten in

on

e or

mo

re steps”

•T

his in

direct left-recu

rsion

can also

be

auto

matically

elimin

ated

14

Su

mm

ary

of R

ecursiv

e Descen

t

•S

imp

le and

gen

eral parsin

g strateg

y

–L

eft-recursio

n m

ust b

e elimin

ated first

–…

bu

t that can

be d

on

e auto

matically

•U

np

op

ular b

ecause o

f back

trackin

g

–T

ho

ug

ht to

be to

o in

efficient

•In

practice, b

acktrack

ing

is elimin

ated b

y

restricting

the g

ramm

ar, allow

ing

us to

successfu

lly p

redict w

hich

rule to

use.

15

Pred

ictive P

arser

•A

pred

ictive p

arser u

ses info

rmatio

n fro

m th

e first term

ina

l symb

ol o

f each ex

pressio

n to

decid

e w

hich

pro

du

ction

to u

se.

•A

pred

ictive p

arser is also k

no

wn

as an L

L(k

) p

arser becau

se it do

es a Left-to

-righ

t pa

rse, a L

eftmo

st-deriva

tion

, and

k-sym

bo

l loo

ka

hea

d.

•A

gram

mar in

wh

ich it is p

ossib

le to d

ecide w

hich

p

rod

uctio

n to

use ex

amin

ing

on

ly th

e first tok

en (as

in th

e prev

iou

s exam

ple) are called

LL

(1)

•L

L(1

) gram

mars are w

idely

used

in p

ractice.

–T

he sy

ntax

of a P

L can

be ad

justed

to en

able it to

be

describ

ed w

ith an

LL

(1) g

ramm

ar.

16

Pred

ictive P

arser

Ex

amp

le: con

sider th

e gram

mar

S

if E th

en S

else S

S

beg

in S

L

S

prin

t E

L

end

L

; S L

E

num

= n

um

An S

expressio

n starts eith

er with

an IF

, BE

GIN

, or P

RIN

T to

ken

,

and

an L

expressio

n start w

ith an

E

ND

or a S

EM

ICO

LO

N to

ken

,

and

an E

expressio

n h

as only

one

pro

du

ction

.

5

17

Rem

emb

er…

•G

iven

a gram

mar an

d a strin

g in

the lan

gu

age

defin

ed b

y th

e gram

mar…

•T

here m

ay b

e mo

re than

on

e way

to d

erive th

e

string

leadin

g to

the sam

e parse tree

–it ju

st dep

end

s on

the o

rder in

wh

ich y

ou

app

ly th

e rules

–an

d w

hat p

arts of th

e string

yo

u ch

oo

se to rew

rite nex

t

•A

ll of th

e deriv

ation

s are valid

•T

o sim

plify

the p

rob

lem an

d th

e algo

rithm

s, we

often

focu

s on

on

e of

–A

leftmo

st deriv

ation

–A

righ

tmo

st deriv

ation

18

LL

(k) a

nd

LR

(k) p

arsers

•T

wo

imp

ortan

t classes of p

arsers are called L

L(k

) parsers an

d

LR

(k) p

arsers.

•T

he n

ame L

L(k

) mean

s:

–L

- Left-to

-righ

t scann

ing

of th

e inp

ut

–L

- Co

nstru

cting

leftmo

st deriva

tion

–k

– m

ax n

um

ber o

f inp

ut sy

mb

ols n

eeded

to select a p

arser action

•T

he n

ame L

R(k

) mean

s:

–L

- Left-to

-righ

t scann

ing

of th

e inp

ut

–R

- Co

nstru

cting

righ

tmo

st deriva

tion in

reverse

–k

– m

ax n

um

ber o

f inp

ut sy

mb

ols n

eeded

to select a p

arser action

•S

o, a L

R(1

) parser n

ever n

eeds to

“loo

k ah

ead” m

ore th

an o

ne in

pu

t

tok

en to

kn

ow

wh

at parser p

rod

uctio

n to

app

ly n

ext.

19

Pred

ictive P

arsin

g a

nd

Left F

acto

ring

•C

on

sider th

e gram

mar

E

T + E

E

T

T

int

T

int * T

T

( E )

•H

ard to

pred

ict becau

se

–F

or T

, two p

roductio

ns start w

ith in

t

–F

or E

, it is no

t clear ho

w to

pred

ict wh

ich ru

le to u

se

•A

gram

mar m

ust b

e left-facto

red b

efore u

se for p

redictiv

e

parsin

g

•L

eft-factorin

g in

vo

lves rew

riting

the ru

les so th

at, if a no

n-

termin

al has m

ore th

an o

ne ru

le, each b

egin

s with

a

termin

al.

20

Left-F

acto

ring E

xam

ple

E

T + E

E

T

T

int

T

int * T

T

( E )

E

T X

X

+ E

X

T

( E )

T

int Y

Y

* T

Y

Add n

ew n

on-term

inals to

factor o

ut co

mm

on

prefix

es of ru

les

6

21

Left F

acto

ring

•C

on

sider a ru

le of th

e form

A ->

a B1 | a B

2 | a B

3 | …

a Bn

•A

top

do

wn

parser g

enerated

from

this g

ramm

ar is no

t efficien

t as it requ

ires back

trackin

g.

•T

o av

oid

this p

rob

lem w

e left factor th

e gram

mar.

–co

llect all pro

du

ction

s with

the sam

e left han

d sid

e and

b

egin

with

the sam

e sym

bo

ls on

the rig

ht h

and

side

–co

mb

ine th

e com

mo

n strin

gs in

to a sin

gle p

rod

uctio

n an

d

then

app

end

a new

no

n-term

inal sy

mb

ol to

the en

d o

f this

new

pro

ductio

n

–create n

ew p

roductio

ns u

sing th

is new

non-term

inal fo

r each

of th

e suffix

es to th

e com

mo

n p

rod

uctio

n.

•A

fter left factorin

g th

e abo

ve g

ramm

ar is transfo

rmed

into

:

A –

> a A

1

A1 ->

B1 | B

2 | B

3 …

Bn

22

Usin

g P

arsin

g T

ab

les

•L

L(1

) mean

s that fo

r each n

on

-termin

al and

tok

en th

ere is o

nly

on

e pro

du

ction

•C

an b

e specified

via 2

D tab

les

–O

ne d

imen

sion

for cu

rrent n

on

-termin

al to ex

pan

d

–O

ne d

imen

sion fo

r nex

t token

–A

table en

try co

ntain

s on

e pro

du

ction

•M

etho

d sim

ilar to recu

rsive d

escent, ex

cept

–F

or each

no

n-term

inal S

–W

e loo

k at th

e nex

t tok

en a

–A

nd ch

ose th

e pro

ductio

n sh

ow

n at [S

,a]

•W

e use a stack

to k

eep track

of p

end

ing

no

n-term

inals

•W

e reject wh

en w

e enco

un

ter an erro

r state

•W

e accept w

hen

we en

cou

nter en

d-o

f-inp

ut

23

LL

(1) P

arsin

g T

ab

le Exam

ple

Left-fa

ctored

gra

mm

ar

E

T X

X

+ E |

T

( E ) | int Y

Y

* T |

int

* +

( )

$

E T

X

T X

X

+ E

T in

t Y

( E )

Y

* T

Th

e LL

(1) p

arsin

g ta

ble

24

LL

(1) P

arsin

g T

ab

le Exam

ple

•Co

nsid

er the [E

, int] en

try

–“W

hen

curren

t no

n-term

inal is E

and

nex

t inp

ut is in

t, use p

rod

uctio

n E

T X

–T

his p

rod

uctio

n can

gen

erate an int in

the first p

lace

•Consid

er the [Y

, +] en

try

–“W

hen

curren

t no

n-term

inal is Y

and

curren

t tok

en is +

, get rid

of Y

”

–Y

can b

e follo

wed

by

+ o

nly

in a d

erivatio

n w

here Y

•Co

nsid

er the [E

,*] en

try

–B

lank

entries in

dicate erro

r situatio

ns

–“T

here is n

o w

ay to

deriv

e a string

starting

with

* fro

m n

on

-termin

al E”

int

* +

( )

$

E T

X

T X

X

+ E

T in

t Y

( E )

Y

* T

7

25

LL

(1) P

arsin

g A

lgorith

m

initia

lize s

tack =

<S $

> a

nd n

ext

repeat

case s

tack o

f <

X, re

st>

: if T[X

,*next] =

Y1 …

Yn

then s

tack

<Y

1 … Y

n rest>

; e

lse e

rror ();

<t, re

st>

: if t ==

*next +

+

then s

tack

<re

st>

; e

lse e

rror ();

until s

tack =

= <

>

(1) n

ext p

oin

ts to th

e nex

t inp

ut to

ken

;

(2) X

match

es som

e no

n-term

inal;

(3) t m

atches so

me term

inal.

wh

ere:

26

LL

(1) P

arsin

g E

xa

mp

le

Stack Input Action

E $ int * int $ pop();push(T X)

T X $ int * int $ pop();push(int Y)

int Y X $ int * int $ pop();next++

Y X $ * int $ pop();push(* T)

* T X $ * int $ pop();next++

T X $ int $ pop();push(int Y)

int Y X $ int $ pop();next++;

Y X $ $

X $ $

$ $ ACCEPT!

int

* +

( )

$

E T

X

T X

X

+ E

T in

t Y

( E )

Y

* T

27

Con

structin

g P

arsin

g T

ab

les

•L

L(1

) lang

uag

es are tho

se defin

ed b

y a p

arsing

tab

le for th

e LL

(1) alg

orith

m

•N

o tab

le entry

can b

e mu

ltiply

defin

ed

•W

e wan

t to g

enerate p

arsing

tables fro

m C

FG

•If A

, wh

ere in th

e line o

f A w

e place

?

•In

the co

lum

n o

f t wh

ere t can start a strin

g

deriv

ed fro

m

–

* t

–W

e say th

at t F

irst()

•In

the co

lum

n o

f t if is

and

t can fo

llow

an A

–S

*

A t

–W

e say t

Follo

w(A

) 28

Com

pu

ting F

irst Sets

Defin

ition

: First(X

) = {

t | X

* t}

{ | X

*

}

Alg

orith

m sk

etch (see b

oo

k fo

r details):

1.

for all term

inals t d

o F

irst(t) {

t }

2.

for each

pro

du

ction X

do F

irst(X)

{

}

3.

if X

A1 …

An

and

F

irst(Ai ), 1

i

n d

o

•ad

d F

irst() to

First(X

)

4.

for each

X

A1 …

An s.t.

F

irst(Ai ), 1

i

n d

o

•ad

d

to F

irst(X)

5.

repeat step

s 4 &

5 u

ntil n

o F

irst set can b

e gro

wn

8

29

First S

ets. Exam

ple

•R

ecall the g

ramm

ar E

T

X X

+

E |

T

( E ) | in

t Y Y

*

T |

•F

irst sets

First( ( ) =

{ ( }

First( T

) = {

int, ( }

First( ) ) =

{ ) }

First( E

) = {

int, ( }

First( in

t) = {

int }

First( X

) = {

+,

}

First( +

) = {

+ }

First( Y

) = {

*,

}

First( *

) = {

* }

30

Com

pu

ting F

ollo

w S

ets

•D

efinitio

n:

Fo

llow

(X) =

{ t | S

*

X t

}

•In

tuitio

n

–If S

is the start sy

mb

ol th

en $

F

ollo

w(S

)

–If X

A

B th

en F

irst(B)

Fo

llow

(A) an

d

Fo

llow

(X)

Fo

llow

(B)

–A

lso if B

*

then

Fo

llow

(X)

Fo

llow

(A)

31

Com

pu

ting F

ollo

w S

ets

Alg

orith

m sk

etch:

1.

Fo

llow

(S)

{ $

}

2.

Fo

r each p

rod

uctio

n A

X

•ad

d F

irst() - {

} to

Fo

llow

(X)

3.

Fo

r each A

X

wh

ere

First(

)

•ad

d F

ollo

w(A

) to F

ollo

w(X

)

•rep

eat step(s) _

__

un

til no

Fo

llow

set gro

ws

32

Follo

w S

ets. Exam

ple

•R

ecall the g

ramm

ar

E

T X

X

+ E

|

T

( E ) | in

t Y Y

*

T |

•F

ollo

w sets

Fo

llow

( + ) =

{ in

t, ( } F

ollo

w( *

) = {

int, ( }

Fo

llow

( ( ) = {

int, ( }

Fo

llow

( E ) =

{), $

}

Fo

llow

( X ) =

{$

, ) } F

ollo

w( T

) = {

+, ) , $

}

Fo

llow

( ) ) = {

+, ) , $

} F

ollo

w( Y

) = {

+, ) , $

}

Fo

llow

( int) =

{*, +

, ) , $}

9

33

Con

structin

g L

L(1

) Parsin

g T

ab

les

•C

on

struct a p

arsing

table T

for C

FG

G

•F

or each

pro

du

ction

A

in

G d

o:

–F

or each

termin

al t F

irst() d

o

•T

[A, t] =

–If

F

irst(), fo

r each t

Fo

llow

(A) d

o

•T

[A, t] =

–If

F

irst() an

d $

F

ollo

w(A

) do

•T

[A, $

] =

34

Notes o

n L

L(1

) Parsin

g T

ab

les

•If an

y en

try is m

ultip

ly d

efined

then

G is n

ot

LL

(1)

–If G

is amb

iguo

us

–If G

is left recursiv

e

–If G

is no

t left-factored

•M

ost p

rog

ramm

ing

lang

uag

e gram

mars are n

ot

LL

(1)

•T

here are to

ols th

at bu

ild L

L(1

) tables

Botto

m-u

p P

arsin

g

•Y

AC

C u

ses bo

ttom

up

parsin

g. T

here are tw

o

imp

ortan

t op

eration

s that b

otto

m-u

p p

arsers use.

Th

ey are n

amely

shift an

d red

uce.

–(In

abstract term

s, we d

o a sim

ulatio

n o

f a Push

Dow

n

Au

tom

ata as a finite state au

tom

ata.)

•In

pu

t: giv

en strin

g to

be p

arsed an

d th

e set of

pro

du

ction

s.

•G

oal: T

race a righ

tmo

st deriv

ation

in rev

erse by

starting

with

the in

pu

t string

and

wo

rkin

g

back

ward

s to th

e start sym

bo

l.

Alg

orith

m

1. S

tart with

an em

pty

stack an

d a fu

ll inp

ut b

uffer. (T

he strin

g to

be

parsed

is in th

e input b

uffer.)

2. R

epeat u

ntil th

e inp

ut b

uffer is em

pty

and

the stack

con

tains th

e start

sym

bol.

a. S

hift zero

or m

ore in

put sy

mbols o

nto

the stack

from

input b

uffer

un

til a han

dle (b

eta) is fou

nd

on

top

of th

e stack. If n

o h

and

le is fou

nd

repo

rt syn

tax erro

r and

exit.

b

. Red

uce h

and

le to th

e no

nterm

inal A

. (Th

ere is a pro

du

ction

A ->

beta

)

3. A

ccept in

pu

t string

and

return

som

e represen

tation

of th

e deriv

ation

sequ

ence fo

un

d (e.g

.., parse tree)

•T

he fo

ur k

ey o

peratio

ns in

botto

m-u

p p

arsing are sh

ift, reduce, accep

t

and erro

r.

•B

otto

m-u

p p

arsing is also

referred to

as shift-red

uce p

arsing.

•Im

po

rtant th

ing

to n

ote is to

kn

ow

wh

en to

shift an

d w

hen

to red

uce an

d

to w

hich

redu

ce.

10

STACK

INPUT BUFFER

ACTION

$

num1+num2*num3$

shift

$num1

+num2*num3$

reduc

$F

+num2*num3$

reduc

$T

+num2*num3$

reduc

$E

+num2*num3$

shift

$E+

num2*num3$

shift

$E+num2

*num3$

reduc

$E+F

*num3$

reduc

$E+T

*num3$

shift

E+T*

num3$

shift

E+T*num3

$

reduc

E+T*F

$

reduc

E+T

$

reduc

E

$

accept

Exam

ple o

f Botto

m-u

p P

arsin

g

E ->

E+

T

| T

| E-T

T ->

T*

F

| F

| T/F

F ->

(E)

| id

| -E

nu

m

top down vs. bottom up parsing - department of computer ...€¦ · bottom-up parsers: build the...

Documents