Log-Linear Models for History-Based Parsing
Michael Collins, Columbia University
Log-Linear Taggers: Summary
- The input sentence is $w_{[1:n]} = w_1 \ldots w_n$
- Each tag sequence $t_{[1:n]}$ has a conditional probability

$$
\begin{aligned}
p(t_{[1:n]} \mid w_{[1:n]}) &= \prod_{j=1}^{n} p(t_j \mid w_1 \ldots w_n, t_1 \ldots t_{j-1}) && \text{(chain rule)} \\
&= \prod_{j=1}^{n} p(t_j \mid w_1 \ldots w_n, t_{j-2}, t_{j-1}) && \text{(independence assumptions)}
\end{aligned}
$$

- Estimate $p(t_j \mid w_1 \ldots w_n, t_{j-2}, t_{j-1})$ using log-linear models
- Use the Viterbi algorithm to compute $\arg\max_{t_{[1:n]}} \log p(t_{[1:n]} \mid w_{[1:n]})$
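As a rough illustration of the last bullet, here is a minimal Viterbi sketch for a trigram tagger. The function `local_log_prob` and the toy model `toy_local_log_prob` are hypothetical stand-ins for a trained log-linear model; they are not part of the slides.

```python
import math

def viterbi(words, tags, local_log_prob):
    """Best tag sequence under a trigram history-based tagger.

    local_log_prob(words, j, u, v, t) is a hypothetical callable returning
    log p(t_j = t | words, t_{j-2} = u, t_{j-1} = v); "*" pads the start.
    """
    n = len(words)
    pi = {("*", "*"): 0.0}      # best log-prob of a prefix ending in tags (u, v)
    backpointers = []           # backpointers[j][(v, t)] = best u at position j
    for j in range(n):
        new_pi, bp = {}, {}
        for (u, v), score in pi.items():
            for t in tags:
                s = score + local_log_prob(words, j, u, v, t)
                if (v, t) not in new_pi or s > new_pi[(v, t)]:
                    new_pi[(v, t)] = s
                    bp[(v, t)] = u
        pi = new_pi
        backpointers.append(bp)
    (u, v), best = max(pi.items(), key=lambda kv: kv[1])
    seq = [u, v]
    for j in range(n - 1, 1, -1):   # follow backpointers right to left
        seq.insert(0, backpointers[j][(seq[0], seq[1])])
    return seq[len(seq) - n:], best  # drop any "*" padding

def toy_local_log_prob(words, j, u, v, t):
    # Toy stand-in for a trained model: ignores the tag history entirely.
    preferred = {"the": "DT", "lawyer": "NN", "questioned": "Vt"}
    return math.log(0.8 if preferred.get(words[j]) == t else 0.1)

best_tags, best_score = viterbi(
    ["the", "lawyer", "questioned"], ["DT", "NN", "Vt"], toy_local_log_prob
)
```

The dynamic program only needs the last two tags as its state, which is exactly what the independence assumption above provides.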
A General Approach:
(Conditional) History-Based Models
- We've shown how to define $p(t_{[1:n]} \mid w_{[1:n]})$ where $t_{[1:n]}$ is a tag sequence
- How do we define $p(T \mid S)$ if $T$ is a parse tree (or another structure)? (We use the notation $S = w_{[1:n]}$)
A General Approach:
(Conditional) History-Based Models
- Step 1: represent a tree as a sequence of decisions $d_1 \ldots d_m$

$$T = \langle d_1, d_2, \ldots, d_m \rangle$$

  ($m$ is not necessarily the length of the sentence)
- Step 2: the probability of a tree is

$$p(T \mid S) = \prod_{i=1}^{m} p(d_i \mid d_1 \ldots d_{i-1}, S)$$

- Step 3: use a log-linear model to estimate $p(d_i \mid d_1 \ldots d_{i-1}, S)$
- Step 4: search? (The answer, which we'll get to later: beam or heuristic search)
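Step 2 can be sketched in a few lines. The names `tree_log_prob` and `uniform_prob` are assumptions made for illustration; `uniform_prob` is a placeholder for the log-linear model of Step 3 and simply returns 0.5 for every decision.

```python
import math

def tree_log_prob(decisions, sentence, decision_prob):
    """log p(T | S) = sum over i of log p(d_i | d_1..d_{i-1}, S)."""
    total = 0.0
    for i, d in enumerate(decisions):
        history = (tuple(decisions[:i]), tuple(sentence))  # <d_1..d_{i-1}, S>
        total += math.log(decision_prob(history, d))
    return total

def uniform_prob(history, d):
    # Placeholder model (an assumption): every decision gets probability 0.5.
    return 0.5

sentence = ["the", "lawyer"]
decisions = ["DT", "NN", "Start(NP)", "Join(NP)"]   # m = 4 decisions, n = 2 words
log_p = tree_log_prob(decisions, sentence, uniform_prob)
```

Working in log space turns the product into a sum and avoids underflow on long decision sequences.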
An Example Tree
[S(questioned)
  [NP(lawyer) [DT the] [NN lawyer]]
  [VP(questioned)
    [Vt questioned]
    [NP(witness) [DT the] [NN witness]]
    [PP(about)
      [IN about]
      [NP(revolver) [DT the] [NN revolver]]]]]
Ratnaparkhi’s Parser: Three Layers of Structure
1. Part-of-speech tags
2. Chunks
3. Remaining structure
Layer 1: Part-of-Speech Tags
[DT the] [NN lawyer] [Vt questioned] [DT the] [NN witness] [IN about] [DT the] [NN revolver]

- Step 1: represent a tree as a sequence of decisions $d_1 \ldots d_m$

$$T = \langle d_1, d_2, \ldots, d_m \rangle$$

- The first $n$ decisions are tagging decisions:

$$\langle d_1 \ldots d_n \rangle = \langle \text{DT, NN, Vt, DT, NN, IN, DT, NN} \rangle$$
Layer 2: Chunks
[NP [DT the] [NN lawyer]] [Vt questioned] [NP [DT the] [NN witness]] [IN about] [NP [DT the] [NN revolver]]

Chunks are defined as any phrase where all children are part-of-speech tags.
(Other common chunks are ADJP, QP)
Layer 2: Chunks
[Start(NP) [DT the]] [Join(NP) [NN lawyer]] [Other [Vt questioned]] [Start(NP) [DT the]] [Join(NP) [NN witness]] [Other [IN about]] [Start(NP) [DT the]] [Join(NP) [NN revolver]]

- Step 1: represent a tree as a sequence of decisions $d_1 \ldots d_m$

$$T = \langle d_1, d_2, \ldots, d_m \rangle$$

- The first $n$ decisions are tagging decisions; the next $n$ decisions are chunk tagging decisions:

$$
\begin{aligned}
\langle d_1 \ldots d_{2n} \rangle = \langle\, & \text{DT, NN, Vt, DT, NN, IN, DT, NN,} \\
& \text{Start(NP), Join(NP), Other, Start(NP), Join(NP),} \\
& \text{Other, Start(NP), Join(NP)} \,\rangle
\end{aligned}
$$
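The chunk layer is an ordinary tagging problem once chunks are encoded as Start/Join/Other labels. A minimal sketch of that encoding follows; the function `chunk_tags` and the (start, end, label) span format are assumptions made for illustration.

```python
def chunk_tags(n, chunks):
    """Encode chunk spans as one Start(X)/Join(X)/Other tag per word.

    chunks is a list of (start, end, label) spans over word positions
    0..n-1, with end exclusive (a hypothetical input format).
    """
    tags = ["Other"] * n
    for start, end, label in chunks:
        tags[start] = f"Start({label})"        # first word opens the chunk
        for k in range(start + 1, end):
            tags[k] = f"Join({label})"         # remaining words continue it
    return tags

pos = ["DT", "NN", "Vt", "DT", "NN", "IN", "DT", "NN"]
chunks = [(0, 2, "NP"), (3, 5, "NP"), (6, 8, "NP")]
decisions = pos + chunk_tags(len(pos), chunks)   # d_1 ... d_2n
```

Applied to the example sentence, this reproduces the 2n = 16 decisions listed above.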
Layer 3: Remaining Structure
Alternate Between Two Classes of Actions:
- Join(X) or Start(X), where X is a label (NP, S, VP, etc.)
- Check=YES or Check=NO

Meaning of these actions:
- Start(X) starts a new constituent with label X (always acts on the leftmost constituent with no Start or Join label above it)
- Join(X) continues a constituent with label X (always acts on the leftmost constituent with no Start or Join label above it)
- Check=NO does nothing
- Check=YES takes the previous Join or Start action and converts it into a completed constituent
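The action semantics above can be sketched as a small interpreter. This is an illustrative reading of the slide, not Ratnaparkhi's implementation: trees are nested tuples, the function name `apply_layer3` is an assumption, and the action sequence is assumed to be well formed. The example input is the chunked sentence and the action sequence that the following slides step through.

```python
def apply_layer3(forest, actions):
    """Apply alternating Start/Join and Check actions to a list of trees."""
    items = [[None, t] for t in forest]            # [annotation, tree] pairs
    for action in actions:
        if action == "Check=NO":
            continue                               # does nothing
        if action == "Check=YES":
            # The previous Start/Join is the rightmost annotated item.
            end = max(i for i, (a, _) in enumerate(items) if a is not None)
            start = end
            while items[start][0][0] != "Start":   # walk back to the opening Start(X)
                start -= 1
            label = items[start][0][1]
            children = tuple(t for _, t in items[start:end + 1])
            # Replace the run with one completed, unannotated constituent.
            items[start:end + 1] = [[None, (label,) + children]]
            continue
        kind, label = action[:-1].split("(")       # "Join(VP)" -> ("Join", "VP")
        i = next(i for i, (a, _) in enumerate(items) if a is None)
        items[i][0] = (kind, label)                # annotate leftmost free constituent
    return [t for _, t in items]

np1 = ("NP", ("DT", "the"), ("NN", "lawyer"))
np2 = ("NP", ("DT", "the"), ("NN", "witness"))
np3 = ("NP", ("DT", "the"), ("NN", "revolver"))
forest = [np1, ("Vt", "questioned"), np2, ("IN", "about"), np3]
actions = ["Start(S)", "Check=NO", "Start(VP)", "Check=NO", "Join(VP)", "Check=NO",
           "Start(PP)", "Check=NO", "Join(PP)", "Check=YES", "Join(VP)", "Check=YES",
           "Join(S)", "Check=YES"]
result = apply_layer3(forest, actions)
```

Because a completed constituent loses its annotation, it becomes the leftmost free constituent again and can be joined into its parent by the next action.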
[NP [DT the] [NN lawyer]] [Vt questioned] [NP [DT the] [NN witness]] [IN about] [NP [DT the] [NN revolver]]
[Start(S) [NP [DT the] [NN lawyer]]] [Vt questioned] [NP [DT the] [NN witness]] [IN about] [NP [DT the] [NN revolver]]
[Start(S) [NP [DT the] [NN lawyer]]] [Vt questioned] [NP [DT the] [NN witness]] [IN about] [NP [DT the] [NN revolver]]
Check=NO
[Start(S) [NP [DT the] [NN lawyer]]] [Start(VP) [Vt questioned]] [NP [DT the] [NN witness]] [IN about] [NP [DT the] [NN revolver]]
[Start(S) [NP [DT the] [NN lawyer]]] [Start(VP) [Vt questioned]] [NP [DT the] [NN witness]] [IN about] [NP [DT the] [NN revolver]]
Check=NO
[Start(S) [NP [DT the] [NN lawyer]]] [Start(VP) [Vt questioned]] [Join(VP) [NP [DT the] [NN witness]]] [IN about] [NP [DT the] [NN revolver]]
[Start(S) [NP [DT the] [NN lawyer]]] [Start(VP) [Vt questioned]] [Join(VP) [NP [DT the] [NN witness]]] [IN about] [NP [DT the] [NN revolver]]
Check=NO
[Start(S) [NP [DT the] [NN lawyer]]] [Start(VP) [Vt questioned]] [Join(VP) [NP [DT the] [NN witness]]] [Start(PP) [IN about]] [NP [DT the] [NN revolver]]
[Start(S) [NP [DT the] [NN lawyer]]] [Start(VP) [Vt questioned]] [Join(VP) [NP [DT the] [NN witness]]] [Start(PP) [IN about]] [NP [DT the] [NN revolver]]
Check=NO
[Start(S) [NP [DT the] [NN lawyer]]] [Start(VP) [Vt questioned]] [Join(VP) [NP [DT the] [NN witness]]] [Start(PP) [IN about]] [Join(PP) [NP [DT the] [NN revolver]]]
[Start(S) [NP [DT the] [NN lawyer]]] [Start(VP) [Vt questioned]] [Join(VP) [NP [DT the] [NN witness]]] [PP [IN about] [NP [DT the] [NN revolver]]]
Check=YES
[Start(S) [NP [DT the] [NN lawyer]]] [Start(VP) [Vt questioned]] [Join(VP) [NP [DT the] [NN witness]]] [Join(VP) [PP [IN about] [NP [DT the] [NN revolver]]]]
[Start(S) [NP [DT the] [NN lawyer]]] [VP [Vt questioned] [NP [DT the] [NN witness]] [PP [IN about] [NP [DT the] [NN revolver]]]]
Check=YES
[Start(S) [NP [DT the] [NN lawyer]]] [Join(S) [VP [Vt questioned] [NP [DT the] [NN witness]] [PP [IN about] [NP [DT the] [NN revolver]]]]]
[S [NP [DT the] [NN lawyer]] [VP [Vt questioned] [NP [DT the] [NN witness]] [PP [IN about] [NP [DT the] [NN revolver]]]]]
Check=YES
The Final Sequence of Decisions

$$
\begin{aligned}
\langle d_1 \ldots d_m \rangle = \langle\, & \text{DT, NN, Vt, DT, NN, IN, DT, NN,} \\
& \text{Start(NP), Join(NP), Other, Start(NP), Join(NP), Other, Start(NP), Join(NP),} \\
& \text{Start(S), Check=NO, Start(VP), Check=NO, Join(VP), Check=NO,} \\
& \text{Start(PP), Check=NO, Join(PP), Check=YES, Join(VP), Check=YES,} \\
& \text{Join(S), Check=YES} \,\rangle
\end{aligned}
$$
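To make the three-layer layout of this sequence concrete, the short sketch below splits it by position. The variable names are assumptions made for illustration; the decisions themselves are copied from the slide.

```python
words = "the lawyer questioned the witness about the revolver".split()
decisions = [
    "DT", "NN", "Vt", "DT", "NN", "IN", "DT", "NN",
    "Start(NP)", "Join(NP)", "Other", "Start(NP)", "Join(NP)",
    "Other", "Start(NP)", "Join(NP)",
    "Start(S)", "Check=NO", "Start(VP)", "Check=NO",
    "Join(VP)", "Check=NO", "Start(PP)", "Check=NO",
    "Join(PP)", "Check=YES", "Join(VP)", "Check=YES",
    "Join(S)", "Check=YES",
]

n = len(words)
tagging = decisions[:n]            # layer 1: one POS tag per word
chunking = decisions[n:2 * n]      # layer 2: one chunk tag per word
layer3 = decisions[2 * n:]         # layer 3: alternating build and check actions
builds, checks = layer3[0::2], layer3[1::2]
m = len(decisions)
```

Here n = 8 but m = 30, which illustrates the earlier remark that m is not necessarily the length of the sentence. Each Check=YES corresponds to one constituent completed in layer 3 (PP, VP, S).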
A General Approach:
(Conditional) History-Based Models
- Step 1: represent a tree as a sequence of decisions $d_1 \ldots d_m$

$$T = \langle d_1, d_2, \ldots, d_m \rangle$$

  ($m$ is not necessarily the length of the sentence)
- Step 2: the probability of a tree is

$$p(T \mid S) = \prod_{i=1}^{m} p(d_i \mid d_1 \ldots d_{i-1}, S)$$

- Step 3: use a log-linear model to estimate $p(d_i \mid d_1 \ldots d_{i-1}, S)$
- Step 4: search? (The answer, which we'll get to later: beam or heuristic search)
Applying a Log-Linear Model
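- Step 3: use a log-linear model to estimate $p(d_i \mid d_1 \ldots d_{i-1}, S)$
- A reminder:

$$p(d_i \mid d_1 \ldots d_{i-1}, S) = \frac{e^{f(\langle d_1 \ldots d_{i-1}, S \rangle, d_i) \cdot v}}{\sum_{d \in \mathcal{A}} e^{f(\langle d_1 \ldots d_{i-1}, S \rangle, d) \cdot v}}$$

where:
- $\langle d_1 \ldots d_{i-1}, S \rangle$ is the history
- $d_i$ is the outcome
- $f$ maps a history/outcome pair to a feature vector
- $v$ is a parameter vector
- $\mathcal{A}$ is the set of possible actions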
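The formula can be sketched directly in code. In this illustration the feature vector is a sparse dictionary from feature name to value, and `toy_features` and the weights in `v` are invented for the example; they are not the features used in the lecture.

```python
import math

def log_linear_prob(history, outcome, actions, f, v):
    """p(outcome | history) = exp(f(history, outcome).v) / sum_d exp(f(history, d).v)."""
    def score(d):
        # Sparse dot product f(history, d) . v
        return sum(v.get(name, 0.0) * value for name, value in f(history, d).items())
    scores = {d: score(d) for d in actions}
    top = max(scores.values())                  # subtract the max for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[outcome] - top) / z

def toy_features(history, d):
    # Hypothetical features: the previous decision paired with the candidate outcome.
    decisions, sentence = history
    prev = decisions[-1] if decisions else "<none>"
    return {f"prev={prev},d={d}": 1.0}

v = {"prev=Start(PP),d=Check=NO": 1.0}          # made-up parameter vector
history = (("Start(PP)",), ("about", "the", "revolver"))
actions = ["Check=NO", "Check=YES"]
p_no = log_linear_prob(history, "Check=NO", actions, toy_features, v)
p_yes = log_linear_prob(history, "Check=YES", actions, toy_features, v)
```

With these weights the two scores are 1 and 0, so p_no equals e / (e + 1) and the two probabilities sum to one.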
Applying a Log-Linear Model
- Step 3: use a log-linear model to estimate

$$p(d_i \mid d_1 \ldots d_{i-1}, S) = \frac{e^{f(\langle d_1 \ldots d_{i-1}, S \rangle, d_i) \cdot v}}{\sum_{d \in \mathcal{A}} e^{f(\langle d_1 \ldots d_{i-1}, S \rangle, d) \cdot v}}$$

- The big question: how do we define $f$?
- Ratnaparkhi's method defines $f$ differently depending on whether the next decision is:
  - A tagging decision (same features as before for POS tagging!)
  - A chunking decision
  - A start/join decision after chunking
  - A check=no/check=yes decision
Layer 3: Join or Start
- Looks at the head word, constituent (or POS) label, and start/join annotation of the n'th tree relative to the decision, where n = -2, -1
- Looks at the head word and constituent (or POS) label of the n'th tree relative to the decision, where n = 0, 1, 2
- Looks at bigram features of the above for (-1, 0) and (0, 1)
- Looks at trigram features of the above for (-2, -1, 0), (-1, 0, 1) and (0, 1, 2)
- The above features with all combinations of head words excluded
- Various punctuation features
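A heavily simplified sketch of such feature templates follows. It covers only the unigram and bigram templates, backs off by dropping either all head words or none (rather than every combination), and omits the trigram and punctuation features. The function name, the dictionary layout of each tree, and the feature-string format are all assumptions made for illustration.

```python
def join_start_features(trees, i, action):
    """Feature strings for a Start/Join decision on trees[i].

    Each tree is a dict with "head", "label" and "annotation" keys
    (a hypothetical representation); trees left of i already carry
    a Start/Join annotation.
    """
    def describe(n, with_head):
        k = i + n
        if not 0 <= k < len(trees):
            return f"{n}:<pad>"                      # position falls outside the sentence
        t = trees[k]
        parts = ([t["head"]] if with_head else []) + [t["label"]]
        if n < 0:
            parts.append(t["annotation"])            # only left context has annotations
        return f"{n}:" + "/".join(parts)

    contexts = set()
    for with_head in (True, False):                  # with and without head words
        for n in (-2, -1, 0, 1, 2):
            contexts.add(describe(n, with_head))
        for a, b in ((-1, 0), (0, 1)):
            contexts.add(describe(a, with_head) + "+" + describe(b, with_head))
    return {f"{c}&{action}" for c in contexts}       # pair each context with the outcome

state = [
    {"head": "lawyer", "label": "NP", "annotation": "Start(S)"},
    {"head": "questioned", "label": "Vt", "annotation": None},
    {"head": "witness", "label": "NP", "annotation": None},
    {"head": "about", "label": "IN", "annotation": None},
    {"head": "revolver", "label": "NP", "annotation": None},
]
features = join_start_features(state, 1, "Start(VP)")
```

The example state is the point in the walkthrough just before Start(VP) is chosen for the verb.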
Layer 3: Check=NO or Check=YES
- A variety of questions concerning the proposed constituent
The Search Problem
- In POS tagging, we could use the Viterbi algorithm because

$$p(t_j \mid w_1 \ldots w_n, j, t_1 \ldots t_{j-1}) = p(t_j \mid w_1 \ldots w_n, j, t_{j-2}, t_{j-1})$$

- Now: decision $d_i$ could depend on arbitrary decisions in the "past" ⇒ no chance for dynamic programming
- Instead, Ratnaparkhi uses a beam search method
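A generic beam search over decision sequences might look like the sketch below. It is not claimed to match Ratnaparkhi's exact procedure; the callables `next_actions`, `decision_prob` and `is_complete`, and the toy tagging-only model used to exercise them, are assumptions made for illustration.

```python
import math

def beam_search(sentence, next_actions, decision_prob, is_complete, beam_size=3):
    """Keep only the beam_size most probable partial decision sequences per step."""
    beam = [(0.0, ())]                  # (log-probability, decisions so far)
    finished = []
    while beam:
        candidates = []
        for logp, decisions in beam:
            history = (decisions, tuple(sentence))
            for d in next_actions(history):
                new_logp = logp + math.log(decision_prob(history, d))
                candidates.append((new_logp, decisions + (d,)))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = []
        for cand in candidates[:beam_size]:          # prune everything below the beam
            (finished if is_complete(cand[1], sentence) else beam).append(cand)
    return max(finished, key=lambda c: c[0]) if finished else None

# Toy model covering tagging decisions only (hypothetical stand-in for a trained model).
TAGS = ("DT", "NN", "Vt")
PREFERRED = {"the": "DT", "lawyer": "NN", "questioned": "Vt"}

def toy_next_actions(history):
    decisions, sentence = history
    return TAGS if len(decisions) < len(sentence) else ()

def toy_prob(history, d):
    decisions, sentence = history
    return 0.8 if PREFERRED.get(sentence[len(decisions)]) == d else 0.1

def toy_complete(decisions, sentence):
    return len(decisions) == len(sentence)

best = beam_search(["the", "lawyer", "questioned"], toy_next_actions,
                   toy_prob, toy_complete, beam_size=2)
```

Unlike Viterbi, this search is approximate: a sequence pruned early cannot be recovered later, which is the price of letting each decision condition on the full history.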