Learning Accurate, Compact, and Interpretable Tree Annotation
Slav Petrov, Leon Barrett, Romain Thibaux, Dan Klein
The Game of Designing a Grammar

Annotation refines base treebank symbols to improve the statistical fit of the grammar:
- Parent annotation [Johnson '98]
- Head lexicalization [Collins '99, Charniak '00]
- Automatic clustering?
Previous Work: Manual Annotation [Klein & Manning '03]

Manually split categories:
- NP: subject vs. object
- DT: determiners vs. demonstratives
- IN: sentential vs. prepositional

Advantages: fairly compact grammar; linguistically motivated.
Disadvantages: performance levels off; requires manual annotation effort.

| Model                  | F1   |
|------------------------|------|
| Naïve Treebank Grammar | 72.6 |
| Klein & Manning '03    | 86.3 |
Previous Work: Automatic Annotation Induction [Matsuzaki et al. '05, Prescher '05]

Label all nodes with latent variables; the same number k of subcategories for every category.

Advantages: automatically learned.
Disadvantages: the grammar gets too large; most categories are oversplit while others are undersplit.

| Model                | F1   |
|----------------------|------|
| Klein & Manning '03  | 86.3 |
| Matsuzaki et al. '05 | 86.7 |
Previous work is complementary

| Property                      | Manual Annotation   | Automatic Annotation   | This Work |
|-------------------------------|---------------------|------------------------|-----------|
| Allocates splits where needed | yes                 | no (splits uniformly)  | yes       |
| Automatically learned         | no (very tedious)   | yes                    | yes       |
| Compact grammar               | yes                 | no (large grammar)     | yes       |
| Captures many features        | no (misses features)| yes                    | yes       |
Learning Latent Annotations

EM algorithm (forward and backward passes, just like Forward-Backward for HMMs):
- Brackets are known
- Base categories are known
- Only induce subcategories

[Tree figure: latent variables X1 ... X7 annotating the parse of "He was right."]
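The split step that this EM training builds on can be sketched in a few lines. This is a minimal illustration under my own assumptions, not the authors' implementation: each subcategory's rule distribution is duplicated and slightly perturbed, so that EM can break the symmetry between the two copies during retraining.

```python
import random

def split_category(rule_probs, noise=0.01, rng=None):
    """Split one latent subcategory into two.

    rule_probs: dict mapping rule -> probability (sums to 1).
    Returns two slightly perturbed, renormalized copies; the random
    noise lets EM differentiate the two new subcategories.
    """
    rng = rng or random.Random(0)
    copies = []
    for _ in range(2):
        perturbed = {r: p * (1 + rng.uniform(-noise, noise))
                     for r, p in rule_probs.items()}
        z = sum(perturbed.values())  # renormalize to a distribution
        copies.append({r: p / z for r, p in perturbed.items()})
    return copies

# Example: split a toy DT subcategory in two.
left, right = split_category({"DT -> the": 0.6, "DT -> a": 0.4})
```

Without the noise, the two copies would be identical and EM could never pull them apart; the perturbation size is an arbitrary choice here.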
Overview

[Chart: parsing accuracy (F1, 65 to 90) vs. total number of grammar symbols (50 to 1650), for k = 2, 4, 8, 16 subcategories per category; pushing k higher runs into the limit of computational resources.]

Coming up:
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
Refinement of the DT tag

[Figure: DT split flat into four subcategories DT-1, DT-2, DT-3, DT-4.]

Hierarchical refinement of the DT tag

[Figure: DT split in two, then each half split again, reaching the same subcategories through a binary hierarchy.]
Hierarchical Estimation Results

[Chart: parsing accuracy (F1, 74 to 90) vs. total number of grammar symbols (100 to 1700); hierarchical training reaches higher accuracy than flat training at comparable grammar sizes.]

| Model                 | F1   |
|-----------------------|------|
| Baseline              | 87.3 |
| Hierarchical Training | 88.4 |
Refinement of the "," tag

Splitting all categories the same amount is wasteful:

[Figure: the "," tag split into many near-identical subcategories.]

The DT tag revisited

Oversplit?
Adaptive Splitting

Want to split complex categories more. Idea: split everything, then roll back the splits that were least useful.

Evaluate the loss in likelihood from removing each split:

    loss = (data likelihood with the split reversed) / (data likelihood with the split)

There is no loss in accuracy when 50% of the splits are reversed.
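Scoring a candidate merge can be sketched as follows. This is my own minimal illustration, not the paper's implementation: the two subcategories' expected counts are pooled into one distribution, and the drop in log-likelihood of those counts under the pooled distribution approximates the cost of reversing the split.

```python
import math

def merge_loss(counts_a, counts_b):
    """Approximate log-likelihood loss from merging two subcategories.

    counts_a, counts_b: dict mapping outcome -> expected count.
    Returns (log-likelihood kept separate) - (log-likelihood merged),
    which is >= 0; small values mark splits that are safe to roll back.
    """
    def normalize(counts):
        z = sum(counts.values())
        return {o: c / z for o, c in counts.items()}

    def loglik(counts, probs):
        return sum(c * math.log(probs[o]) for o, c in counts.items() if c > 0)

    merged = {o: counts_a.get(o, 0) + counts_b.get(o, 0)
              for o in set(counts_a) | set(counts_b)}
    separate = loglik(counts_a, normalize(counts_a)) \
             + loglik(counts_b, normalize(counts_b))
    pooled = loglik(merged, normalize(merged))
    return separate - pooled

# A split that learned nothing costs almost nothing to reverse:
useless = merge_loss({"the": 60, "a": 40}, {"the": 60, "a": 40})
# A split that separated two distinct usages costs a lot:
useful = merge_loss({"the": 90, "this": 10}, {"the": 10, "this": 90})
```

Splits are then ranked by this loss and the cheapest half rolled back, matching the 50% figure above.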
Adaptive Splitting Results

[Chart: parsing accuracy (F1, 74 to 90) vs. total number of grammar symbols (100 to 1700), comparing flat training, hierarchical training, and 50% merging; merging reaches higher accuracy with fewer grammar symbols.]

| Model            | F1   |
|------------------|------|
| Previous         | 88.4 |
| With 50% Merging | 89.5 |
Number of Phrasal Subcategories

[Bar chart: number of subcategories (0 to 40) learned for each phrasal category, ordered NP, VP, PP, ADVP, S, ADJP, SBAR, QP, WHNP, PRN, NX, SINV, PRT, WHPP, SQ, CONJP, FRAG, NAC, UCP, WHADVP, INTJ, SBARQ, RRC, WHADJP, X, ROOT, LST. NP, VP, and PP receive the most subcategories; X and NAC receive very few.]
Number of Lexical Subcategories

[Bar chart: number of subcategories (0 to 70) learned for each part-of-speech tag, roughly in decreasing order: NNP, JJ, NNS, NN, VBN, RB, VBG, VB, VBD, CD, IN, VBZ, VBP, DT, NNPS, CC, JJR, JJS, and the remaining tags down to TO, ",", and POS. Highlighted: IN, DT, RB, and the VBx verb tags are heavily split, as are the open-class NN, NNS, NNP, and JJ; closed-class tags such as TO, ",", and POS receive few splits.]
Smoothing

Heavy splitting can lead to overfitting. Idea: smoothing allows us to pool statistics across subcategories.

Linear Smoothing
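Linear smoothing can be sketched as interpolating each subcategory's rule distribution with the average over its sibling subcategories. This is a minimal illustration; the function name and the alpha value are my own choices (in practice the weight would be tuned on held-out data), not the paper's exact formulation.

```python
def linear_smooth(subcat_probs, alpha=0.1):
    """Linearly smooth rule probabilities across the subcategories
    of one base category.

    subcat_probs: list of dicts, one per subcategory, each mapping
    rule -> probability. Each distribution is pulled toward the mean
    of all siblings with weight alpha, pooling statistics so rarely
    seen subcategories do not overfit.
    """
    rules = set().union(*subcat_probs)
    n = len(subcat_probs)
    mean = {r: sum(p.get(r, 0.0) for p in subcat_probs) / n for r in rules}
    return [{r: (1 - alpha) * p.get(r, 0.0) + alpha * mean[r] for r in rules}
            for p in subcat_probs]

# Example: two subcategories of one tag, pulled slightly together.
smoothed = linear_smooth([{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}])
```

Since the mean of proper distributions is itself a distribution, each smoothed distribution still sums to one.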
Result Overview

[Chart: parsing accuracy (F1, 74 to 90) vs. total number of grammar symbols (100 to 1100), comparing flat training, hierarchical training, 50% merging, and 50% merging with smoothing; each refinement improves accuracy at a given grammar size.]

| Model          | F1   |
|----------------|------|
| Previous       | 89.5 |
| With Smoothing | 90.7 |
Final Results

| Parser                 | F1, ≤ 40 words | F1, all words |
|------------------------|----------------|---------------|
| Klein & Manning '03    | 86.3           | 85.7          |
| Matsuzaki et al. '05   | 86.7           | 86.1          |
| Collins '99            | 88.6           | 88.2          |
| Charniak & Johnson '05 | 90.1           | 89.6          |
| This Work              | 90.2           | 89.7          |
Linguistic Candy

Proper nouns (NNP):

| Subcategory | Examples                 |
|-------------|--------------------------|
| NNP-14      | Oct. Nov. Sept.          |
| NNP-12      | John Robert James        |
| NNP-2       | J. E. L.                 |
| NNP-1       | Bush Noriega Peters      |
| NNP-15      | New San Wall             |
| NNP-3       | York Francisco Street    |

Personal pronouns (PRP):

| Subcategory | Examples    |
|-------------|-------------|
| PRP-0       | It He I     |
| PRP-1       | it he they  |
| PRP-2       | it them him |

Relative adverbs (RBR):

| Subcategory | Examples              |
|-------------|-----------------------|
| RBR-0       | further lower higher  |
| RBR-1       | more less More        |
| RBR-2       | earlier Earlier later |

Cardinal numbers (CD):

| Subcategory | Examples                 |
|-------------|--------------------------|
| CD-7        | one two Three            |
| CD-4        | 1989 1990 1988           |
| CD-11       | million billion trillion |
| CD-0        | 1 50 100                 |
| CD-3        | 1 30 31                  |
| CD-9        | 78 58 34                 |
Conclusions

New ideas: hierarchical training, adaptive splitting, parameter smoothing.

State-of-the-art parsing performance: improves from the X-Bar initializer's 63.4 F1 to 90.2 F1.

Linguistically interesting grammars to sift through.
Other things we tried

- X-Bar vs. structurally annotated grammar: the X-Bar grammar starts at lower performance but provides more flexibility.
- Better smoothing: we tried different (hierarchical) smoothing methods; all worked about the same.
- (Linguistically) constraining rewrite possibilities between subcategories: hurts performance. EM automatically learns that most subcategory combinations are meaningless: ≥ 90% of the possible rewrites have 0 probability.