Learning Accurate, Compact, and Interpretable Tree Annotation
Recent Advances in Parsing Technology
WS 2011/2012
Saarland University in Saarbrücken
Miloš Ercegovčević
Outline
Introduction: EM algorithm
Latent Grammars: motivation; learning latent PCFGs
Split-Merge Adaptation
Efficient inference with Latent Grammars: pruning in multilevel coarse-to-fine parsing; parse selection
Introduction: EM Algorithm
Iterative algorithm for finding MLE or MAP estimates of parameters in statistical models.
X – observed data; Z – set of latent variables; Θ – vector of unknown parameters.
Complete-data likelihood: $L(\Theta; X, Z) = p(X, Z \mid \Theta)$
MLE of the marginal likelihood: $L(\Theta; X) = p(X \mid \Theta) = \sum_{Z} p(X, Z \mid \Theta)$
However, this quantity is intractable to maximize directly: we know neither Z nor Θ.
Introduction: EM Algorithm
Find the MLE of the marginal likelihood by iteratively applying two steps:
Expectation step (E-step): compute the expected complete-data log-likelihood under the posterior of Z given the current parameters:
$Q(\Theta \mid \Theta^{(t)}) = E_{Z \mid X, \Theta^{(t)}}\left[\log L(\Theta; X, Z)\right]$
Maximization step (M-step): find the parameters that maximize this quantity:
$\Theta^{(t+1)} = \arg\max_{\Theta} Q(\Theta \mid \Theta^{(t)})$
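To make the two steps concrete, here is a minimal EM sketch for a toy model that is not from the slides: a mixture of two biased coins, where Z is the hidden coin choice per sequence and Θ holds the two head probabilities (mixture weights assumed uniform for brevity).

```python
# Minimal EM sketch for a two-coin mixture (toy example, uniform mixture
# weights assumed). Each row of flips is one sequence drawn from one coin.
import numpy as np

def em_two_coins(flips, n_iters=20, theta=(0.6, 0.4)):
    theta = np.array(theta, dtype=float)
    n_flips = flips.shape[1]
    for _ in range(n_iters):
        heads = flips.sum(axis=1)
        # E-step: posterior P(Z = coin | sequence, theta) for each sequence.
        lik = theta ** heads[:, None] * (1 - theta) ** (n_flips - heads)[:, None]
        post = lik / lik.sum(axis=1, keepdims=True)
        # M-step: theta maximizing Q = expected heads / expected flips per coin.
        theta = (post * heads[:, None]).sum(axis=0) / (post.sum(axis=0) * n_flips)
    return theta

flips = np.array([[1]*8 + [0]*2, [1]*3 + [0]*7, [1]*9 + [0], [1]*2 + [0]*8])
print(em_two_coins(flips))  # approaches the two coins' head rates
```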
Latent PCFG
Standard coarse treebank tree: baseline parser F1 72.6.
Parent-annotated trees [Johnson '98], [Klein & Manning '03]: F1 86.3.
Latent PCFG
Head-lexicalized trees [Collins '99, Charniak '00]: F1 88.6.
Latent PCFG
Automatically clustered categories with F1 86.7 [Matsuzaki et al. ’05]
Same number of subcategories for all categories
Latent PCFG
At each step, split each category in two; after 6 iterations there are 64 subcategories. EM for each refined grammar is initialized with the results of the smaller grammar.
Learning Latent PCFG
Induce subcategories with EM, like forward-backward for HMMs, but with the brackets fixed by the treebank tree.
[Figure: parse tree with latent subcategories X1-X7 over the sentence "He was right.", trained with forward/backward-style passes.]
Learning Latent Grammar: Inside-Outside Probabilities
$P_{IN}(A_x, r, t) = \sum_{y,z} \beta(A_x \to B_y C_z)\, P_{IN}(B_y, r, s)\, P_{IN}(C_z, s, t)$
$P_{OUT}(B_y, r, s) = \sum_{x,z} \beta(A_x \to B_y C_z)\, P_{OUT}(A_x, r, t)\, P_{IN}(C_z, s, t)$
$P_{OUT}(C_z, s, t) = \sum_{x,y} \beta(A_x \to B_y C_z)\, P_{OUT}(A_x, r, t)\, P_{IN}(B_y, r, s)$
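A minimal sketch of the inside pass from the first recursion. The data structures are hypothetical, not the authors' code: `grammar` maps (A, B, C) to an array beta[x, y, z] over subcategories, and `lexicon` maps (A, word) to a vector of emission probabilities over x.

```python
# Sketch of the inside pass for a binarized latent PCFG (hypothetical
# structures, see lead-in).
import numpy as np
from collections import defaultdict

def inside(words, grammar, lexicon, n_sub):
    n = len(words)
    IN = defaultdict(lambda: np.zeros(n_sub))   # (r, t, A) -> P_IN(A_x, r, t)
    for r, w in enumerate(words):               # length-1 spans from the lexicon
        for (A, word), probs in lexicon.items():
            if word == w:
                IN[(r, r + 1, A)] = probs.copy()
    for length in range(2, n + 1):
        for r in range(n - length + 1):
            t = r + length
            for s in range(r + 1, t):           # split point, summed over
                for (A, B, C), beta in grammar.items():
                    # P_IN(A_x,r,t) += sum_{y,z} beta[x,y,z] P_IN(B_y,r,s) P_IN(C_z,s,t)
                    IN[(r, t, A)] = IN[(r, t, A)] + np.einsum(
                        'xyz,y,z->x', beta, IN[(r, s, B)], IN[(s, t, C)])
    return IN
```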
Learning Latent Grammar: EM
Expectation step (E-step): compute posterior probabilities of annotated rules over each span:
$P\big((r, s, t, A_x \to B_y C_z) \mid w, T\big) \propto P_{OUT}(A_x, r, t)\, \beta(A_x \to B_y C_z)\, P_{IN}(B_y, r, s)\, P_{IN}(C_z, s, t)$
Maximization step (M-step): re-estimate rule probabilities by relative frequency over the expected counts:
$\beta(A_x \to B_y C_z) := \frac{\#\{A_x \to B_y C_z\}}{\sum_{y',z'} \#\{A_x \to B_{y'} C_{z'}\}}$
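A sketch of the M-step as relative-frequency estimation, under the assumption that the E-step has accumulated fractional rule counts into a dict keyed by (A, x, B, y, C, z); the structure is hypothetical.

```python
# Sketch of the M-step: relative frequencies over fractional counts
# (hypothetical structure: counts maps (A, x, B, y, C, z) -> expected count).
from collections import defaultdict

def m_step(counts):
    parent_totals = defaultdict(float)
    for (A, x, B, y, C, z), c in counts.items():
        parent_totals[(A, x)] += c              # denominator: all rules of A_x
    return {rule: c / parent_totals[(rule[0], rule[1])]
            for rule, c in counts.items()}      # beta(A_x -> B_y C_z)
```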
Latent Grammar: Adaptive Splitting
Want to split categories only where the data demands it, without loss in accuracy.
Solution: split everything, then merge splits back according to their loss in likelihood:
$\text{loss} = \frac{\text{data likelihood with split reversed}}{\text{data likelihood with split}}$
Latent Grammar: Adaptive Splitting
The likelihood of the data for tree T and sentence w, marginalized at any node spanning (r, t) with label A:
$P(w, T) = \sum_x P_{OUT}(A_x, r, t)\, P_{IN}(A_x, r, t)$
Then for two annotations $A_1, A_2$ the overall loss of merging them can be estimated as:
$\Delta_{ANNOTATION}(A_1, A_2) = \prod_i \prod_{n \in T_i} \frac{P_n(w_i, T_i)}{P(w_i, T_i)}$
where $P_n(w_i, T_i)$ is the likelihood with the split reversed at node n only.
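A sketch of how this ratio might be evaluated at a single occurrence of the split pair, assuming inside/outside scores and the relative frequencies p1, p2 of A_1 and A_2 are available; the helper and its signature are hypothetical, following the approximation above.

```python
# Hypothetical helper: contribution of one node n to Delta_ANNOTATION(A_1, A_2).
# in1/out1, in2/out2 are inside/outside scores of A_1 and A_2 at the node;
# p1, p2 are their relative frequencies; p_wt is the full likelihood P(w, T).
def merge_loss_at_node(in1, out1, in2, out2, p1, p2, p_wt):
    kept = in1 * out1 + in2 * out2                  # with the split kept
    merged = (p1 * in1 + p2 * in2) * (out1 + out2)  # with the split reversed
    # P_n(w, T) replaces only this node's contribution inside P(w, T);
    # the product of these ratios over all occurrences estimates the loss.
    return (p_wt - kept + merged) / p_wt
```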
Number of Phrasal Subcategories
[Figure: bar chart (0-40) of learned subcategory counts per phrasal category. NP, VP, and PP receive the most subcategories; categories such as X and NAC receive among the fewest.]
Number of Lexical Subcategories
[Figure: bar chart (0-70) of learned subcategory counts per part-of-speech tag. Ambiguous tags such as IN, DT, RB, and the VBx tags are split heavily, as are the nominal tags NN, NNS, NNP, and the adjective tag JJ.]
Latent Grammar: Results
Parser                   F1, ≤ 40 words   F1, all words
Klein & Manning '03      86.3             85.7
Matsuzaki et al. '05     86.7             86.1
Collins '99              88.6             88.2
Charniak & Johnson '05   90.1             89.6
Petrov et al. '06        90.2             89.7
Efficient Inference with Latent Grammars
The latent grammar reaches a 91.2 F1 score on the WSJ dev set (1,600 sentences), but parsing takes 1,621 minutes: more than a minute per sentence. For use in real-world applications this is too slow.
Two ways to improve inference: hierarchical pruning and parse selection.
Intermediate Grammars
[Figure: learning proceeds from the X-Bar grammar G0 through a sequence of split grammars G1, G2, ..., G6 = G; e.g. DT is refined into DT1, DT2, then DT1-DT4, then DT1-DT8.]
Projected Grammars
[Figure: the refined grammar G is projected onto a sequence of coarser grammars π0(G), π1(G), ..., π5(G); π0(G) sits at the X-Bar = G0 level learned from the treebank.]
Rules in G:
S1 → NP1 VP1 0.20   S1 → NP1 VP2 0.12   S1 → NP2 VP1 0.02   S1 → NP2 VP2 0.03
S2 → NP1 VP1 0.11   S2 → NP1 VP2 0.05   S2 → NP2 VP1 0.08   S2 → NP2 VP2 0.12
Corresponding rule in π(G): S → NP VP 0.56 (from the infinite tree distribution induced by G).
Estimating Grammars
The rule probabilities of each projection πi(G) are estimated from expected rule counts under the (infinite) tree distribution that G induces.
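A sketch of the weighted-sum form of this projection for the example rule, under the simplifying assumption that each refined parent S_x is weighted by its expected relative frequency under the tree distribution; the weights below are made up, so the result will not reproduce the slides' 0.56.

```python
# Sketch of projecting refined rule probabilities onto a coarse rule
# (hypothetical weights; see lead-in).
def project_rule(refined_rules, parent_weight):
    # refined_rules: {(x, y, z): beta(S_x -> NP_y VP_z)}
    return sum(parent_weight[x] * prob
               for (x, y, z), prob in refined_rules.items())

refined = {(1, 1, 1): 0.20, (1, 1, 2): 0.12, (1, 2, 1): 0.02, (1, 2, 2): 0.03,
           (2, 1, 1): 0.11, (2, 1, 2): 0.05, (2, 2, 1): 0.08, (2, 2, 2): 0.12}
print(project_rule(refined, parent_weight={1: 0.5, 2: 0.5}))  # made-up weights
```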
Hierarchical Pruning
Consider a span in the chart at each level of refinement:
coarse:         … QP NP VP …
split in two:   … QP1 QP2 NP1 NP2 VP1 VP2 …
split in four:  … QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 …
split in eight: … and so on …
Subcategories whose posterior probability under the coarser grammar falls below a threshold are pruned from all finer levels.
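A sketch of one coarse-to-fine pruning step. The structures are hypothetical: `posterior(cell, sym)` would return the posterior of symbol `sym` in a chart cell under the coarser grammar, and `refinements` would map each coarse symbol to its split successors in the next grammar.

```python
# Sketch of one coarse-to-fine pruning step (hypothetical structures,
# see lead-in).
def prune_chart(cells, symbols, posterior, refinements, threshold=1e-4):
    allowed = {}
    for cell in cells:
        keep = []
        for sym in symbols:
            if posterior(cell, sym) >= threshold:
                keep.extend(refinements[sym])   # refine only surviving symbols
        allowed[cell] = keep                    # pruned symbols stay pruned
    return allowed
```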
Parse Selection
Given a sentence w and a split PCFG G, select the parse that minimizes the expected loss under our beliefs:
$T^* = \arg\min_{T} \sum_{T_P} P(T_P \mid w, G)\, L(T, T_P)$
Intractable: we cannot enumerate all candidate trees T.
Parse Selection
Possible solutions:
take the best derivation
generate n-best parses and re-rank them
sample derivations from the grammar
select the minimum-risk candidate based on a loss function of posterior marginals:
$q(A \to B\,C, i, k, j) = \frac{r(A \to B\,C, i, k, j)}{P_{IN}(root, 0, n)}$
$T^*_G = \arg\max_{T} \prod_{e \in T} q(e)$
where $r(A \to B\,C, i, k, j)$ is the posterior (expected) count of rule $A \to B\,C$ over span $(i, j)$ with split point $k$.
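A sketch of the dynamic program that maximizes the product of rule posteriors, assuming q has been precomputed from the inside/outside scores as above; the structures (`q`, `lex_q`) are hypothetical.

```python
# Sketch of max-rule-product selection. Hypothetical structures: q maps
# (A, B, C, i, k, j) -> posterior rule score as defined above, and lex_q
# maps (A, i) -> posterior score of tag A over word i.
def best_tree(n, symbols, q, lex_q):
    best = {}                                   # (i, j, A) -> (score, backpointer)
    for i in range(n):
        for A in symbols:
            best[(i, i + 1, A)] = (lex_q.get((A, i), 0.0), None)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for A in symbols:
                best[(i, j, A)] = (0.0, None)
                for k in range(i + 1, j):
                    for B in symbols:
                        for C in symbols:
                            score = (q.get((A, B, C, i, k, j), 0.0)
                                     * best[(i, k, B)][0] * best[(k, j, C)][0])
                            if score > best[(i, j, A)][0]:
                                best[(i, j, A)] = (score, (k, B, C))
    return best  # read the tree off the backpointers from (0, n, root)
```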
Results
Thank You!
References
S. Petrov, L. Barrett, R. Thibaux, and D. Klein. Learning Accurate, Compact, and Interpretable Tree Annotation. COLING-ACL 2006 slides.
S. Petrov and D. Klein. Improved Inference for Unlexicalized Parsing. NAACL 2007 slides.
S. Petrov, L. Barrett, R. Thibaux, and D. Klein. 2006. Learning accurate, compact, and interpretable tree annotation. In COLING-ACL '06, pages 433–440.
S. Petrov and D. Klein. 2007. Improved inference for unlexicalized parsing. In NAACL '07.
T. Matsuzaki, Y. Miyao, and J. Tsujii. 2005. Probabilistic CFG with latent annotations. In ACL '05, pages 75–82.