Learning Accurate, Compact, and Interpretable Tree Annotation
Recent Advances in Parsing Technology
WS 2011/2012
Saarland University in Saarbrücken
Miloš Ercegovčević
Outline
Introduction: EM algorithm
Latent Grammars: motivation; learning latent PCFGs
Split-Merge Adaptation
Efficient inference with Latent Grammars: pruning in multilevel coarse-to-fine parsing; parse selection
Introduction: EM Algorithm
Iterative algorithm for finding MLE or MAP estimates of parameters in statistical models.
X – observed data; Z – set of latent variables; Θ – vector of unknown parameters.
Complete-data likelihood: $L(\Theta; X, Z) = p(X, Z \mid \Theta)$
MLE of the marginal likelihood: $L(\Theta; X) = p(X \mid \Theta) = \sum_{Z} p(X, Z \mid \Theta)$
However, this quantity is intractable to maximize directly: we know neither Z nor Θ.
Introduction: EM Algorithm
Find the MLE of the marginal likelihood by iteratively applying two steps:
Expectation step (E-step): compute the expected complete-data log-likelihood under the posterior of Z given the current parameters:
$Q(\Theta \mid \Theta^{(t)}) = E_{Z \mid X, \Theta^{(t)}}\left[\log L(\Theta; X, Z)\right]$
Maximization step (M-step): find the parameters that maximize this quantity:
$\Theta^{(t+1)} = \arg\max_{\Theta} Q(\Theta \mid \Theta^{(t)})$
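To make the two steps concrete, here is a minimal EM sketch for a toy model that is not from the slides: a mixture of two biased coins, where Z is the hidden coin choice per sequence and Θ holds the two head probabilities (mixture weights assumed uniform for brevity).

```python
# Minimal EM sketch for a two-coin mixture (toy example, uniform mixture
# weights assumed). Each row of flips is one sequence drawn from one coin.
import numpy as np

def em_two_coins(flips, n_iters=20, theta=(0.6, 0.4)):
    theta = np.array(theta, dtype=float)
    n_flips = flips.shape[1]
    for _ in range(n_iters):
        heads = flips.sum(axis=1)
        # E-step: posterior P(Z = coin | sequence, theta) for each sequence.
        lik = theta ** heads[:, None] * (1 - theta) ** (n_flips - heads)[:, None]
        post = lik / lik.sum(axis=1, keepdims=True)
        # M-step: theta maximizing Q = expected heads / expected flips per coin.
        theta = (post * heads[:, None]).sum(axis=0) / (post.sum(axis=0) * n_flips)
    return theta

flips = np.array([[1]*8 + [0]*2, [1]*3 + [0]*7, [1]*9 + [0], [1]*2 + [0]*8])
print(em_two_coins(flips))  # approaches the two coins' head rates
```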
Latent PCFG
Standard coarse treebank tree: baseline parser F1 72.6.
Parent-annotated trees [Johnson '98], [Klein & Manning '03]: F1 86.3.
Latent PCFG
Head-lexicalized trees [Collins '99, Charniak '00]: F1 88.6.
Latent PCFG
Automatically clustered categories with F1 86.7 [Matsuzaki et al. ’05]
Same number of subcategories for all categories
Latent PCFG
At each step, split each category in two; after 6 iterations there are 64 subcategories. EM for each refined grammar is initialized with the results of the smaller grammar.
Learning Latent PCFG
Induce subcategories with EM, like forward-backward for HMMs, but with the brackets fixed by the treebank tree.
[Figure: parse tree with latent subcategories X1-X7 over the sentence "He was right.", trained with forward/backward-style passes.]
Learning Latent Grammar: Inside-Outside Probabilities
$P_{IN}(A_x, r, t) = \sum_{y,z} \beta(A_x \to B_y C_z)\, P_{IN}(B_y, r, s)\, P_{IN}(C_z, s, t)$
$P_{OUT}(B_y, r, s) = \sum_{x,z} \beta(A_x \to B_y C_z)\, P_{OUT}(A_x, r, t)\, P_{IN}(C_z, s, t)$
$P_{OUT}(C_z, s, t) = \sum_{x,y} \beta(A_x \to B_y C_z)\, P_{OUT}(A_x, r, t)\, P_{IN}(B_y, r, s)$
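A minimal sketch of the inside pass from the first recursion. The data structures are hypothetical, not the authors' code: `grammar` maps (A, B, C) to an array beta[x, y, z] over subcategories, and `lexicon` maps (A, word) to a vector of emission probabilities over x.

```python
# Sketch of the inside pass for a binarized latent PCFG (hypothetical
# structures, see lead-in).
import numpy as np
from collections import defaultdict

def inside(words, grammar, lexicon, n_sub):
    n = len(words)
    IN = defaultdict(lambda: np.zeros(n_sub))   # (r, t, A) -> P_IN(A_x, r, t)
    for r, w in enumerate(words):               # length-1 spans from the lexicon
        for (A, word), probs in lexicon.items():
            if word == w:
                IN[(r, r + 1, A)] = probs.copy()
    for length in range(2, n + 1):
        for r in range(n - length + 1):
            t = r + length
            for s in range(r + 1, t):           # split point, summed over
                for (A, B, C), beta in grammar.items():
                    # P_IN(A_x,r,t) += sum_{y,z} beta[x,y,z] P_IN(B_y,r,s) P_IN(C_z,s,t)
                    IN[(r, t, A)] = IN[(r, t, A)] + np.einsum(
                        'xyz,y,z->x', beta, IN[(r, s, B)], IN[(s, t, C)])
    return IN
```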
Learning Latent Grammar: EM
Expectation step (E-step): compute posterior probabilities of annotated rules over each span:
$P\big((r, s, t, A_x \to B_y C_z) \mid w, T\big) \propto P_{OUT}(A_x, r, t)\, \beta(A_x \to B_y C_z)\, P_{IN}(B_y, r, s)\, P_{IN}(C_z, s, t)$
Maximization step (M-step): re-estimate rule probabilities by relative frequency over the expected counts:
$\beta(A_x \to B_y C_z) := \frac{\#\{A_x \to B_y C_z\}}{\sum_{y',z'} \#\{A_x \to B_{y'} C_{z'}\}}$
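A sketch of the M-step as relative-frequency estimation, under the assumption that the E-step has accumulated fractional rule counts into a dict keyed by (A, x, B, y, C, z); the structure is hypothetical.

```python
# Sketch of the M-step: relative frequencies over fractional counts
# (hypothetical structure: counts maps (A, x, B, y, C, z) -> expected count).
from collections import defaultdict

def m_step(counts):
    parent_totals = defaultdict(float)
    for (A, x, B, y, C, z), c in counts.items():
        parent_totals[(A, x)] += c              # denominator: all rules of A_x
    return {rule: c / parent_totals[(rule[0], rule[1])]
            for rule, c in counts.items()}      # beta(A_x -> B_y C_z)
```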
Latent Grammar: Adaptive Splitting
Want to split categories only where the data demands it, without loss in accuracy.
Solution: split everything, then merge splits back according to their loss in likelihood:
$\text{loss} = \frac{\text{data likelihood with split reversed}}{\text{data likelihood with split}}$
Latent Grammar: Adaptive Splitting
The likelihood of the data for tree T and sentence w, marginalized at any node spanning (r, t) with label A:
$P(w, T) = \sum_x P_{OUT}(A_x, r, t)\, P_{IN}(A_x, r, t)$
Then for two annotations $A_1, A_2$ the overall loss of merging them can be estimated as:
$\Delta_{ANNOTATION}(A_1, A_2) = \prod_i \prod_{n \in T_i} \frac{P_n(w_i, T_i)}{P(w_i, T_i)}$
where $P_n(w_i, T_i)$ is the likelihood with the split reversed at node n only.
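A sketch of how this ratio might be evaluated at a single occurrence of the split pair, assuming inside/outside scores and the relative frequencies p1, p2 of A_1 and A_2 are available; the helper and its signature are hypothetical, following the approximation above.

```python
# Hypothetical helper: contribution of one node n to Delta_ANNOTATION(A_1, A_2).
# in1/out1, in2/out2 are inside/outside scores of A_1 and A_2 at the node;
# p1, p2 are their relative frequencies; p_wt is the full likelihood P(w, T).
def merge_loss_at_node(in1, out1, in2, out2, p1, p2, p_wt):
    kept = in1 * out1 + in2 * out2                  # with the split kept
    merged = (p1 * in1 + p2 * in2) * (out1 + out2)  # with the split reversed
    # P_n(w, T) replaces only this node's contribution inside P(w, T);
    # the product of these ratios over all occurrences estimates the loss.
    return (p_wt - kept + merged) / p_wt
```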
Number of Phrasal Subcategories
[Figure: bar chart (0-40) of learned subcategory counts per phrasal category. NP, VP, and PP receive the most subcategories; categories such as X and NAC receive among the fewest.]
Number of Lexical Subcategories
[Figure: bar chart (0-70) of learned subcategory counts per part-of-speech tag. Ambiguous tags such as IN, DT, RB, and the VBx tags are split heavily, as are the nominal tags NN, NNS, NNP, and the adjective tag JJ.]
Latent Grammar: Results
Parser                   F1, ≤ 40 words   F1, all words
Klein & Manning '03      86.3             85.7
Matsuzaki et al. '05     86.7             86.1
Collins '99              88.6             88.2
Charniak & Johnson '05   90.1             89.6
Petrov et al. '06        90.2             89.7
Efficient Inference with Latent Grammars
The latent grammar reaches a 91.2 F1 score on the WSJ dev set (1,600 sentences), but parsing takes 1,621 minutes: more than a minute per sentence. For use in real-world applications this is too slow.
Two ways to improve inference: hierarchical pruning and parse selection.
Intermediate Grammars
[Figure: learning proceeds from the X-Bar grammar G0 through a sequence of split grammars G1, G2, ..., G6 = G; e.g. DT is refined into DT1, DT2, then DT1-DT4, then DT1-DT8.]
Projected Grammars
[Figure: the refined grammar G is projected onto a sequence of coarser grammars π0(G), π1(G), ..., π5(G); π0(G) sits at the X-Bar = G0 level learned from the treebank.]
Rules in G:
S1 → NP1 VP1 0.20   S1 → NP1 VP2 0.12   S1 → NP2 VP1 0.02   S1 → NP2 VP2 0.03
S2 → NP1 VP1 0.11   S2 → NP1 VP2 0.05   S2 → NP2 VP1 0.08   S2 → NP2 VP2 0.12
Corresponding rule in π(G): S → NP VP 0.56 (from the infinite tree distribution induced by G).
Estimating Grammars
The rule probabilities of each projection πi(G) are estimated from expected rule counts under the (infinite) tree distribution that G induces.
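A sketch of the weighted-sum form of this projection for the example rule, under the simplifying assumption that each refined parent S_x is weighted by its expected relative frequency under the tree distribution; the weights below are made up, so the result will not reproduce the slides' 0.56.

```python
# Sketch of projecting refined rule probabilities onto a coarse rule
# (hypothetical weights; see lead-in).
def project_rule(refined_rules, parent_weight):
    # refined_rules: {(x, y, z): beta(S_x -> NP_y VP_z)}
    return sum(parent_weight[x] * prob
               for (x, y, z), prob in refined_rules.items())

refined = {(1, 1, 1): 0.20, (1, 1, 2): 0.12, (1, 2, 1): 0.02, (1, 2, 2): 0.03,
           (2, 1, 1): 0.11, (2, 1, 2): 0.05, (2, 2, 1): 0.08, (2, 2, 2): 0.12}
print(project_rule(refined, parent_weight={1: 0.5, 2: 0.5}))  # made-up weights
```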
Hierarchical Pruning
Consider a span in the chart at each level of refinement:
coarse:         … QP NP VP …
split in two:   … QP1 QP2 NP1 NP2 VP1 VP2 …
split in four:  … QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 …
split in eight: … and so on …
Subcategories whose posterior probability under the coarser grammar falls below a threshold are pruned from all finer levels.
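A sketch of one coarse-to-fine pruning step. The structures are hypothetical: `posterior(cell, sym)` would return the posterior of symbol `sym` in a chart cell under the coarser grammar, and `refinements` would map each coarse symbol to its split successors in the next grammar.

```python
# Sketch of one coarse-to-fine pruning step (hypothetical structures,
# see lead-in).
def prune_chart(cells, symbols, posterior, refinements, threshold=1e-4):
    allowed = {}
    for cell in cells:
        keep = []
        for sym in symbols:
            if posterior(cell, sym) >= threshold:
                keep.extend(refinements[sym])   # refine only surviving symbols
        allowed[cell] = keep                    # pruned symbols stay pruned
    return allowed
```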
Parse Selection
Given a sentence w and a split PCFG G, select the parse that minimizes the expected loss under our beliefs:
$T^* = \arg\min_{T} \sum_{T_P} P(T_P \mid w, G)\, L(T, T_P)$
Intractable: we cannot enumerate all candidate trees T.
Parse Selection
Possible solutions:
take the best derivation
generate n-best parses and re-rank them
sample derivations from the grammar
select the minimum-risk candidate based on a loss function of posterior marginals:
$q(A \to B\,C, i, k, j) = \frac{r(A \to B\,C, i, k, j)}{P_{IN}(root, 0, n)}$
$T^*_G = \arg\max_{T} \prod_{e \in T} q(e)$
where $r(A \to B\,C, i, k, j)$ is the posterior (expected) count of rule $A \to B\,C$ over span $(i, j)$ with split point $k$.
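A sketch of the dynamic program that maximizes the product of rule posteriors, assuming q has been precomputed from the inside/outside scores as above; the structures (`q`, `lex_q`) are hypothetical.

```python
# Sketch of max-rule-product selection. Hypothetical structures: q maps
# (A, B, C, i, k, j) -> posterior rule score as defined above, and lex_q
# maps (A, i) -> posterior score of tag A over word i.
def best_tree(n, symbols, q, lex_q):
    best = {}                                   # (i, j, A) -> (score, backpointer)
    for i in range(n):
        for A in symbols:
            best[(i, i + 1, A)] = (lex_q.get((A, i), 0.0), None)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for A in symbols:
                best[(i, j, A)] = (0.0, None)
                for k in range(i + 1, j):
                    for B in symbols:
                        for C in symbols:
                            score = (q.get((A, B, C, i, k, j), 0.0)
                                     * best[(i, k, B)][0] * best[(k, j, C)][0])
                            if score > best[(i, j, A)][0]:
                                best[(i, j, A)] = (score, (k, B, C))
    return best  # read the tree off the backpointers from (0, n, root)
```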
Results
Thank You!
References
S. Petrov, L. Barrett, R. Thibaux, and D. Klein. Learning Accurate, Compact, and Interpretable Tree Annotation. COLING-ACL 2006 slides.
S. Petrov and D. Klein. Improved Inference for Unlexicalized Parsing. NAACL 2007 slides.
S. Petrov, L. Barrett, R. Thibaux, and D. Klein. 2006. Learning accurate, compact, and interpretable tree annotation. In COLING-ACL '06, pages 433–440.
S. Petrov and D. Klein. 2007. Improved inference for unlexicalized parsing. In NAACL '07.
T. Matsuzaki, Y. Miyao, and J. Tsujii. 2005. Probabilistic CFG with latent annotations. In ACL '05, pages 75–82.