top down vs. bottom up parsing - department of computer ...€¦ · bottom-up parsers: build the...
TRANSCRIPT
1
1 2
Parsin
g
•A
gram
mar d
escribes th
e string
s of to
ken
s that are
syn
tactically leg
al in a P
L
•A
recog
niser sim
ply
accepts o
r rejects string
s.
•A
gen
erator p
rod
uces sen
tences in
the lan
gu
age d
escribed
by
the g
ramm
ar
•A
parser co
nstru
ct a deriv
ation
or p
arse tree for a sen
tence (if
possib
le)
•T
wo co
mm
on ty
pes o
f parsers:
–b
otto
m-u
p o
r data d
riven
–to
p-d
ow
n o
r hyp
oth
esis driv
en
•A
recursive d
escent p
arser is a w
ay to
imp
lemen
t a top
-do
wn
parser th
at is particu
larly sim
ple.
3
To
p d
ow
n v
s. botto
m u
p p
arsin
g
•T
he p
arsing
pro
blem
is to co
nn
ect the ro
ot n
od
e S
with
the tree leav
es, the in
pu
t
•T
op
-dow
n p
arsers: starts co
nstru
cting
the p
arse tree at th
e top
(roo
t) of th
e parse tree an
d m
ov
e d
ow
n to
ward
s the leav
es. Easy
to im
plem
ent
by h
and, b
ut w
ork
with
restricted g
ramm
ars. ex
amp
les:
-P
redictiv
e parsers (e.g
., LL
(k))
•B
otto
m-u
p p
arsers: b
uild
the n
od
es on
the b
otto
m o
f the
parse tree first. S
uitab
le for au
tom
atic parser g
eneratio
n,
han
dle a larg
er class of g
ramm
ars. exam
ples:
–sh
ift-reduce p
arser (or L
R(k) p
arsers)
•B
oth
are gen
eral techn
iqu
es that can
be m
ade to
wo
rk fo
r all lan
gu
ages (b
ut n
ot all g
ramm
ars!).
S
A =
1 +
3 *
4 / 5
4
To
p d
ow
n v
s. botto
m u
p p
arsin
g
•B
oth
are gen
eral techn
iqu
es that can
be m
ade to
wo
rk
for all lan
gu
ages (b
ut n
ot all g
ramm
ars!).
•R
ecall that a g
iven
lang
uag
e can b
e describ
ed b
y
several g
ramm
ars.
•B
oth
of th
ese gram
mars d
escribe th
e same lan
gu
age
E -> E + Num
E -> Num
E -> Num + E
E -> Num
•T
he first o
ne, w
ith it’s left recu
rsion
, causes
pro
blem
s for to
p d
ow
n p
arsers.
•F
or a g
iven
parsin
g tech
niq
ue, w
e may
hav
e to
transfo
rm th
e gram
mar to
wo
rk w
ith it.
2
5
•H
ow
hard
is the p
arsing task
?
•P
arsing an
arbitrary
Contex
t Free G
ramm
ar is O(n
3), e.g., it
can tak
e time p
rop
ortio
nal th
e cub
e of th
e nu
mb
er of sy
mb
ols
in th
e inp
ut. T
his is b
ad!
•If w
e constrain
the g
ramm
ar som
ewhat, w
e can alw
ays p
arse
in lin
ear time. T
his is g
oo
d!
•L
inear-tim
e parsin
g
–L
L p
arsers
•R
ecog
nize L
L g
ramm
ar
•U
se a top-d
ow
n strateg
y
–L
R p
arsers
•R
ecog
nize L
R g
ramm
ar
•U
se a bo
ttom
-up
strategy
Parsin
g co
mp
lexity
•L
L(n
) : Left to
righ
t, L
eftmost d
erivatio
n,
look
ah
ead
at m
ost n
sy
mb
ols.
•L
R(n
) : Left to
righ
t, R
igh
t deriv
atio
n,
look
ah
ead
at m
ost n
sy
mb
ols.
6
Top
Dow
n P
arsin
g M
ethod
s
•S
imp
lest meth
od
is a full-b
acku
p, recu
rsive descen
t
parser
•O
ften u
sed fo
r parsin
g sim
ple lan
guag
es
•W
rite recursiv
e recog
nizers (su
bro
utin
es) for each
gram
mar ru
le
–If ru
les succeed
s perfo
rm so
me actio
n (i.e., b
uild
a
tree no
de, em
it cod
e, etc.)
–If ru
le fails, return
failure. C
aller may
try an
oth
er
cho
ice or fail
–O
n failu
re it “back
s up
”
7
Top
Dow
n P
arsin
g M
ethod
s
•P
roblem
s
–W
hen
goin
g fo
rward
, the p
arser consu
mes to
ken
s
from
the in
pu
t, so w
hat h
app
ens if w
e hav
e to b
ack
up
?
–A
lgo
rithm
s that u
se back
up
tend
to b
e, in g
eneral,
inefficien
t
–G
ramm
ar rules w
hich
are left-recursiv
e lead to
no
n-
termin
ation
!
8
Recu
rsive D
ecent P
arsin
g E
xam
ple
Exa
mp
le: F
or th
e g
ram
ma
r:
<term> -> <factor> {(*|/)<factor>}*
We co
uld
use th
e follo
win
g recu
rsive
descen
t parsin
g su
bp
rog
ram (th
is on
e is w
ritten in
C)
void term() {
factor(); /* parse first factor*/
while (next_token == ast_code ||
next_token == slash_code) {
lexical(); /* get next token */
factor(); /* parse next factor */
}
}
3
9
Pro
blem
s
•S
om
e gram
mars cau
se pro
blem
s for to
p
do
wn
parsers.
•T
op
do
wn
parsers d
o n
ot w
ork
with
left-recu
rsive g
ramm
ars.
–E
.g., o
ne w
ith a ru
le like: E
-> E
+ T
–W
e can tran
sform
a left-recursiv
e gram
mar in
to
on
e wh
ich is n
ot.
•A
top
do
wn
gram
mar can
limit b
acktrack
ing
if it on
ly h
as on
e rule p
er no
n-term
inal
–T
he tech
niq
ue o
f rule facto
ring
can b
e used
to
elimin
ate mu
ltiple ru
les for a n
on
-termin
al.
10
Left-recu
rsive g
ram
ma
rs
•A
gram
mar is left recu
rsive if it h
as rules lik
e X -> X
•O
r if it has in
direct left recu
rsion
, as in
X -> A
A -> X
•Q
: Wh
y is th
is a pro
blem
?
–A
: it can lead
to n
on
-termin
ating
recursio
n!
•C
onsid
er
E -> E + Num
E -> Num
•W
e can m
anu
ally o
r auto
matically
rewrite a g
ramm
ar to
remo
ve left-recu
rsion
, mak
ing
it suitab
le for a to
p-d
ow
n
parser.
11
Elim
inatio
n o
f Left R
ecursio
n
•C
on
sider th
e left-recursiv
e
gram
mar
S
S
S ->
•S
gen
erates strings
…
•R
ewrite u
sing rig
ht-
recursio
n
S
S’
S’
S’|
•C
on
cretely
T -> T + id
T-> id
•T
gen
erates strings
id
id+
id
id+
id+
id
…
•R
ewrite u
sing
righ
t-recu
rsion
T -> id T’
T’ -> id T’
T’ ->
12
More E
limin
atio
n o
f Left-R
ecursio
n
•In
gen
eral
S
S
1 | … | S
n |
1 | … |
m
•A
ll string
s deriv
ed fro
m S
start with
on
e of
1 ,
…,
m an
d co
ntin
ue w
ith sev
eral instan
ces of
1 ,
…,
n
•R
ewrite as
S
1 S
’ | … |
m S
’
S’
1 S
’ | … |
n S
’ |
4
13
Gen
eral L
eft Recu
rsion
•T
he g
ramm
ar
S
A
|
A
S
is also left-recu
rsive b
ecause
S
+ S
wh
ere + m
eans “can
be rew
ritten in
on
e or
mo
re steps”
•T
his in
direct left-recu
rsion
can also
be
auto
matically
elimin
ated
14
Su
mm
ary
of R
ecursiv
e Descen
t
•S
imp
le and
gen
eral parsin
g strateg
y
–L
eft-recursio
n m
ust b
e elimin
ated first
–…
bu
t that can
be d
on
e auto
matically
•U
np
op
ular b
ecause o
f back
trackin
g
–T
ho
ug
ht to
be to
o in
efficient
•In
practice, b
acktrack
ing
is elimin
ated b
y
restricting
the g
ramm
ar, allow
ing
us to
successfu
lly p
redict w
hich
rule to
use.
15
Pred
ictive P
arser
•A
pred
ictive p
arser u
ses info
rmatio
n fro
m th
e first term
ina
l symb
ol o
f each ex
pressio
n to
decid
e w
hich
pro
du
ction
to u
se.
•A
pred
ictive p
arser is also k
no
wn
as an L
L(k
) p
arser becau
se it do
es a Left-to
-righ
t pa
rse, a L
eftmo
st-deriva
tion
, and
k-sym
bo
l loo
ka
hea
d.
•A
gram
mar in
wh
ich it is p
ossib
le to d
ecide w
hich
p
rod
uctio
n to
use ex
amin
ing
on
ly th
e first tok
en (as
in th
e prev
iou
s exam
ple) are called
LL
(1)
•L
L(1
) gram
mars are w
idely
used
in p
ractice.
–T
he sy
ntax
of a P
L can
be ad
justed
to en
able it to
be
describ
ed w
ith an
LL
(1) g
ramm
ar.
16
Pred
ictive P
arser
Ex
amp
le: con
sider th
e gram
mar
S
if E th
en S
else S
S
beg
in S
L
S
prin
t E
L
end
L
; S L
E
num
= n
um
An S
expressio
n starts eith
er with
an IF
, BE
GIN
, or P
RIN
T to
ken
,
and
an L
expressio
n start w
ith an
E
ND
or a S
EM
ICO
LO
N to
ken
,
and
an E
expressio
n h
as only
one
pro
du
ction
.
5
17
Rem
emb
er…
•G
iven
a gram
mar an
d a strin
g in
the lan
gu
age
defin
ed b
y th
e gram
mar…
•T
here m
ay b
e mo
re than
on
e way
to d
erive th
e
string
leadin
g to
the sam
e parse tree
–it ju
st dep
end
s on
the o
rder in
wh
ich y
ou
app
ly th
e rules
–an
d w
hat p
arts of th
e string
yo
u ch
oo
se to rew
rite nex
t
•A
ll of th
e deriv
ation
s are valid
•T
o sim
plify
the p
rob
lem an
d th
e algo
rithm
s, we
often
focu
s on
on
e of
–A
leftmo
st deriv
ation
–A
righ
tmo
st deriv
ation
18
LL
(k) a
nd
LR
(k) p
arsers
•T
wo
imp
ortan
t classes of p
arsers are called L
L(k
) parsers an
d
LR
(k) p
arsers.
•T
he n
ame L
L(k
) mean
s:
–L
- Left-to
-righ
t scann
ing
of th
e inp
ut
–L
- Co
nstru
cting
leftmo
st deriva
tion
–k
– m
ax n
um
ber o
f inp
ut sy
mb
ols n
eeded
to select a p
arser action
•T
he n
ame L
R(k
) mean
s:
–L
- Left-to
-righ
t scann
ing
of th
e inp
ut
–R
- Co
nstru
cting
righ
tmo
st deriva
tion in
reverse
–k
– m
ax n
um
ber o
f inp
ut sy
mb
ols n
eeded
to select a p
arser action
•S
o, a L
R(1
) parser n
ever n
eeds to
“loo
k ah
ead” m
ore th
an o
ne in
pu
t
tok
en to
kn
ow
wh
at parser p
rod
uctio
n to
app
ly n
ext.
19
Pred
ictive P
arsin
g a
nd
Left F
acto
ring
•C
on
sider th
e gram
mar
E
T + E
E
T
T
int
T
int * T
T
( E )
•H
ard to
pred
ict becau
se
–F
or T
, two p
roductio
ns start w
ith in
t
–F
or E
, it is no
t clear ho
w to
pred
ict wh
ich ru
le to u
se
•A
gram
mar m
ust b
e left-facto
red b
efore u
se for p
redictiv
e
parsin
g
•L
eft-factorin
g in
vo
lves rew
riting
the ru
les so th
at, if a no
n-
termin
al has m
ore th
an o
ne ru
le, each b
egin
s with
a
termin
al.
20
Left-F
acto
ring E
xam
ple
E
T + E
E
T
T
int
T
int * T
T
( E )
E
T X
X
+ E
X
T
( E )
T
int Y
Y
* T
Y
Add n
ew n
on-term
inals to
factor o
ut co
mm
on
prefix
es of ru
les
6
21
Left F
acto
ring
•C
on
sider a ru
le of th
e form
A ->
a B1 | a B
2 | a B
3 | …
a Bn
•A
top
do
wn
parser g
enerated
from
this g
ramm
ar is no
t efficien
t as it requ
ires back
trackin
g.
•T
o av
oid
this p
rob
lem w
e left factor th
e gram
mar.
–co
llect all pro
du
ction
s with
the sam
e left han
d sid
e and
b
egin
with
the sam
e sym
bo
ls on
the rig
ht h
and
side
–co
mb
ine th
e com
mo
n strin
gs in
to a sin
gle p
rod
uctio
n an
d
then
app
end
a new
no
n-term
inal sy
mb
ol to
the en
d o
f this
new
pro
ductio
n
–create n
ew p
roductio
ns u
sing th
is new
non-term
inal fo
r each
of th
e suffix
es to th
e com
mo
n p
rod
uctio
n.
•A
fter left factorin
g th
e abo
ve g
ramm
ar is transfo
rmed
into
:
A –
> a A
1
A1 ->
B1 | B
2 | B
3 …
Bn
22
Usin
g P
arsin
g T
ab
les
•L
L(1
) mean
s that fo
r each n
on
-termin
al and
tok
en th
ere is o
nly
on
e pro
du
ction
•C
an b
e specified
via 2
D tab
les
–O
ne d
imen
sion
for cu
rrent n
on
-termin
al to ex
pan
d
–O
ne d
imen
sion fo
r nex
t token
–A
table en
try co
ntain
s on
e pro
du
ction
•M
etho
d sim
ilar to recu
rsive d
escent, ex
cept
–F
or each
no
n-term
inal S
–W
e loo
k at th
e nex
t tok
en a
–A
nd ch
ose th
e pro
ductio
n sh
ow
n at [S
,a]
•W
e use a stack
to k
eep track
of p
end
ing
no
n-term
inals
•W
e reject wh
en w
e enco
un
ter an erro
r state
•W
e accept w
hen
we en
cou
nter en
d-o
f-inp
ut
23
LL
(1) P
arsin
g T
ab
le Exam
ple
Left-fa
ctored
gra
mm
ar
E
T X
X
+ E |
T
( E ) | int Y
Y
* T |
int
* +
( )
$
E T
X
T X
X
+ E
T in
t Y
( E )
Y
* T
Th
e LL
(1) p
arsin
g ta
ble
24
LL
(1) P
arsin
g T
ab
le Exam
ple
•Co
nsid
er the [E
, int] en
try
–“W
hen
curren
t no
n-term
inal is E
and
nex
t inp
ut is in
t, use p
rod
uctio
n E
T X
–T
his p
rod
uctio
n can
gen
erate an int in
the first p
lace
•Consid
er the [Y
, +] en
try
–“W
hen
curren
t no
n-term
inal is Y
and
curren
t tok
en is +
, get rid
of Y
”
–Y
can b
e follo
wed
by
+ o
nly
in a d
erivatio
n w
here Y
•Co
nsid
er the [E
,*] en
try
–B
lank
entries in
dicate erro
r situatio
ns
–“T
here is n
o w
ay to
deriv
e a string
starting
with
* fro
m n
on
-termin
al E”
int
* +
( )
$
E T
X
T X
X
+ E
T in
t Y
( E )
Y
* T
7
25
LL
(1) P
arsin
g A
lgorith
m
initia
lize s
tack =
<S $
> a
nd n
ext
repeat
case s
tack o
f <
X, re
st>
: if T[X
,*next] =
Y1 …
Yn
then s
tack
<Y
1 … Y
n rest>
; e
lse e
rror ();
<t, re
st>
: if t ==
*next +
+
then s
tack
<re
st>
; e
lse e
rror ();
until s
tack =
= <
>
(1) n
ext p
oin
ts to th
e nex
t inp
ut to
ken
;
(2) X
match
es som
e no
n-term
inal;
(3) t m
atches so
me term
inal.
wh
ere:
26
LL
(1) P
arsin
g E
xa
mp
le
Stack Input Action
E $ int * int $ pop();push(T X)
T X $ int * int $ pop();push(int Y)
int Y X $ int * int $ pop();next++
Y X $ * int $ pop();push(* T)
* T X $ * int $ pop();next++
T X $ int $ pop();push(int Y)
int Y X $ int $ pop();next++;
Y X $ $
X $ $
$ $ ACCEPT!
int
* +
( )
$
E T
X
T X
X
+ E
T in
t Y
( E )
Y
* T
27
Con
structin
g P
arsin
g T
ab
les
•L
L(1
) lang
uag
es are tho
se defin
ed b
y a p
arsing
tab
le for th
e LL
(1) alg
orith
m
•N
o tab
le entry
can b
e mu
ltiply
defin
ed
•W
e wan
t to g
enerate p
arsing
tables fro
m C
FG
•If A
, wh
ere in th
e line o
f A w
e place
?
•In
the co
lum
n o
f t wh
ere t can start a strin
g
deriv
ed fro
m
–
* t
–W
e say th
at t F
irst()
•In
the co
lum
n o
f t if is
and
t can fo
llow
an A
–S
*
A t
–W
e say t
Follo
w(A
) 28
Com
pu
ting F
irst Sets
Defin
ition
: First(X
) = {
t | X
* t}
{ | X
*
}
Alg
orith
m sk
etch (see b
oo
k fo
r details):
1.
for all term
inals t d
o F
irst(t) {
t }
2.
for each
pro
du
ction X
do F
irst(X)
{
}
3.
if X
A1 …
An
and
F
irst(Ai ), 1
i
n d
o
•ad
d F
irst() to
First(X
)
4.
for each
X
A1 …
An s.t.
F
irst(Ai ), 1
i
n d
o
•ad
d
to F
irst(X)
5.
repeat step
s 4 &
5 u
ntil n
o F
irst set can b
e gro
wn
8
29
First S
ets. Exam
ple
•R
ecall the g
ramm
ar E
T
X X
+
E |
T
( E ) | in
t Y Y
*
T |
•F
irst sets
First( ( ) =
{ ( }
First( T
) = {
int, ( }
First( ) ) =
{ ) }
First( E
) = {
int, ( }
First( in
t) = {
int }
First( X
) = {
+,
}
First( +
) = {
+ }
First( Y
) = {
*,
}
First( *
) = {
* }
30
Com
pu
ting F
ollo
w S
ets
•D
efinitio
n:
Fo
llow
(X) =
{ t | S
*
X t
}
•In
tuitio
n
–If S
is the start sy
mb
ol th
en $
F
ollo
w(S
)
–If X
A
B th
en F
irst(B)
Fo
llow
(A) an
d
Fo
llow
(X)
Fo
llow
(B)
–A
lso if B
*
then
Fo
llow
(X)
Fo
llow
(A)
31
Com
pu
ting F
ollo
w S
ets
Alg
orith
m sk
etch:
1.
Fo
llow
(S)
{ $
}
2.
Fo
r each p
rod
uctio
n A
X
•ad
d F
irst() - {
} to
Fo
llow
(X)
3.
Fo
r each A
X
wh
ere
First(
)
•ad
d F
ollo
w(A
) to F
ollo
w(X
)
•rep
eat step(s) _
__
un
til no
Fo
llow
set gro
ws
32
Follo
w S
ets. Exam
ple
•R
ecall the g
ramm
ar
E
T X
X
+ E
|
T
( E ) | in
t Y Y
*
T |
•F
ollo
w sets
Fo
llow
( + ) =
{ in
t, ( } F
ollo
w( *
) = {
int, ( }
Fo
llow
( ( ) = {
int, ( }
Fo
llow
( E ) =
{), $
}
Fo
llow
( X ) =
{$
, ) } F
ollo
w( T
) = {
+, ) , $
}
Fo
llow
( ) ) = {
+, ) , $
} F
ollo
w( Y
) = {
+, ) , $
}
Fo
llow
( int) =
{*, +
, ) , $}
9
33
Con
structin
g L
L(1
) Parsin
g T
ab
les
•C
on
struct a p
arsing
table T
for C
FG
G
•F
or each
pro
du
ction
A
in
G d
o:
–F
or each
termin
al t F
irst() d
o
•T
[A, t] =
–If
F
irst(), fo
r each t
Fo
llow
(A) d
o
•T
[A, t] =
–If
F
irst() an
d $
F
ollo
w(A
) do
•T
[A, $
] =
34
Notes o
n L
L(1
) Parsin
g T
ab
les
•If an
y en
try is m
ultip
ly d
efined
then
G is n
ot
LL
(1)
–If G
is amb
iguo
us
–If G
is left recursiv
e
–If G
is no
t left-factored
•M
ost p
rog
ramm
ing
lang
uag
e gram
mars are n
ot
LL
(1)
•T
here are to
ols th
at bu
ild L
L(1
) tables
Botto
m-u
p P
arsin
g
•Y
AC
C u
ses bo
ttom
up
parsin
g. T
here are tw
o
imp
ortan
t op
eration
s that b
otto
m-u
p p
arsers use.
Th
ey are n
amely
shift an
d red
uce.
–(In
abstract term
s, we d
o a sim
ulatio
n o
f a Push
Dow
n
Au
tom
ata as a finite state au
tom
ata.)
•In
pu
t: giv
en strin
g to
be p
arsed an
d th
e set of
pro
du
ction
s.
•G
oal: T
race a righ
tmo
st deriv
ation
in rev
erse by
starting
with
the in
pu
t string
and
wo
rkin
g
back
ward
s to th
e start sym
bo
l.
Alg
orith
m
1. S
tart with
an em
pty
stack an
d a fu
ll inp
ut b
uffer. (T
he strin
g to
be
parsed
is in th
e input b
uffer.)
2. R
epeat u
ntil th
e inp
ut b
uffer is em
pty
and
the stack
con
tains th
e start
sym
bol.
a. S
hift zero
or m
ore in
put sy
mbols o
nto
the stack
from
input b
uffer
un
til a han
dle (b
eta) is fou
nd
on
top
of th
e stack. If n
o h
and
le is fou
nd
repo
rt syn
tax erro
r and
exit.
b
. Red
uce h
and
le to th
e no
nterm
inal A
. (Th
ere is a pro
du
ction
A ->
beta
)
3. A
ccept in
pu
t string
and
return
som
e represen
tation
of th
e deriv
ation
sequ
ence fo
un
d (e.g
.., parse tree)
•T
he fo
ur k
ey o
peratio
ns in
botto
m-u
p p
arsing are sh
ift, reduce, accep
t
and erro
r.
•B
otto
m-u
p p
arsing is also
referred to
as shift-red
uce p
arsing.
•Im
po
rtant th
ing
to n
ote is to
kn
ow
wh
en to
shift an
d w
hen
to red
uce an
d
to w
hich
redu
ce.
10
STACK
INPUT BUFFER
ACTION
$
num1+num2*num3$
shift
$num1
+num2*num3$
reduc
$F
+num2*num3$
reduc
$T
+num2*num3$
reduc
$E
+num2*num3$
shift
$E+
num2*num3$
shift
$E+num2
*num3$
reduc
$E+F
*num3$
reduc
$E+T
*num3$
shift
E+T*
num3$
shift
E+T*num3
$
reduc
E+T*F
$
reduc
E+T
$
reduc
E
$
accept
Exam
ple o
f Botto
m-u
p P
arsin
g
E ->
E+
T
| T
| E-T
T ->
T*
F
| F
| T/F
F ->
(E)
| id
| -E
nu
m