Conditional Random Fields - DFKI
Conditional Random Fields
Dietrich Klakow
Overview
• Sequence Labeling
• Bayesian Networks
• Markov Random Fields
• Conditional Random Fields
• Software example
Sequence Labeling Tasks
Sequence: a sentence
Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
POS Labels
Pierre        NNP
Vinken        NNP
,             ,
61            CD
years         NNS
old           JJ
,             ,
will          MD
join          VB
the           DT
board         NN
as            IN
a             DT
nonexecutive  JJ
director      NN
Nov.          NNP
29            CD
.             .
Chunking
Task: find phrase boundaries:
Pierre        B-NP
Vinken        I-NP
,             O
61            B-NP
years         I-NP
old           B-ADJP
,             O
will          B-VP
join          I-VP
the           B-NP
board         I-NP
as            B-PP
a             B-NP
nonexecutive  I-NP
director      I-NP
Nov.          B-NP
29            I-NP
.             O
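The B-/I-/O labels encode phrase boundaries: B- opens a chunk, I- continues it, and O marks a token outside any chunk. As a minimal sketch of how the boundaries can be read back from the labels (the helper name `bio_to_chunks` is an illustration, not part of the slides):

```python
def bio_to_chunks(tokens, tags):
    """Group tokens into (chunk_type, phrase) pairs from B-/I-/O tags."""
    chunks = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        # A chunk starts at B-, or at an I- whose type differs from the open chunk.
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if tag == "O" or starts_new:
            if current_tokens:  # close the chunk that was open
                chunks.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if tag != "O":
            current_type = tag[2:]
            current_tokens.append(token)
    if current_tokens:
        chunks.append((current_type, " ".join(current_tokens)))
    return chunks


tokens = ("Pierre Vinken , 61 years old , will join the board "
          "as a nonexecutive director Nov. 29 .").split()
tags = ("B-NP I-NP O B-NP I-NP B-ADJP O B-VP I-VP B-NP I-NP "
        "B-PP B-NP I-NP I-NP B-NP I-NP O").split()

chunks = bio_to_chunks(tokens, tags)
for chunk_type, phrase in chunks:
    print(f"{chunk_type:5s} {phrase}")
```

Because every token receives exactly one label, chunking becomes a sequence labeling task of the same shape as POS tagging.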
Named Entity Tagging
Pierre        B-PERSON
Vinken        I-PERSON
,             O
61            B-DATE:AGE
years         I-DATE:AGE
old           I-DATE:AGE
,             O
will          O
join          O
the           O
board         B-ORG_DESC:OTHER
as            O
a             O
nonexecutive  O
director      B-PER_DESC
Nov.          B-DATE:DATE
29            I-DATE:DATE
.             O
Supertagging
Pierre        N/N
Vinken        N
,             ,
61            N/N
years         N
old           (S[adj]\NP)\NP
,             ,
will          (S[dcl]\NP)/(S[b]\NP)
join          ((S[b]\NP)/PP)/NP
the           NP[nb]/N
board         N
as            PP/NP
a             NP[nb]/N
nonexecutive  N/N
director      N
Nov.          ((S\NP)\(S\NP))/N[num]
29            N[num]
.             .
Hidden Markov Model
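HMM: just an Application of a Bayes Classifier
$$(\hat{\pi}_1, \hat{\pi}_2, \ldots, \hat{\pi}_N) = \operatorname*{argmax}_{\pi_1, \pi_2, \ldots, \pi_N} \left[ P(x_1, x_2, \ldots, x_N, \pi_1, \pi_2, \ldots, \pi_N) \right]$$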
Decomposition of Probabilities
$$P(x_1, x_2, \ldots, x_N, \pi_1, \pi_2, \ldots, \pi_N) = \prod_{i=1}^{N} P(x_i \mid \pi_i)\, P(\pi_i \mid \pi_{i-1})$$
$P(\pi_i \mid \pi_{i-1})$: transition probability
$P(x_i \mid \pi_i)$: emission probability
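A minimal sketch of this decomposition, assuming a toy model with two labels and two words. All probabilities below are hypothetical, and the `START` entry stands in for the label before the first position. The `viterbi` function finds the argmax over label sequences by dynamic programming instead of enumerating all sequences.

```python
# Hypothetical toy parameters (not from the slides): two labels, tiny vocabulary.
labels = ["DT", "NN"]
transition = {  # P(pi_i | pi_{i-1}); "START" plays the role of pi_0
    "START": {"DT": 0.8, "NN": 0.2},
    "DT": {"DT": 0.1, "NN": 0.9},
    "NN": {"DT": 0.4, "NN": 0.6},
}
emission = {  # P(x_i | pi_i)
    "DT": {"the": 0.9, "board": 0.1},
    "NN": {"the": 0.1, "board": 0.9},
}


def joint_probability(words, tags):
    """P(x_1..x_N, pi_1..pi_N) = prod_i P(x_i | pi_i) * P(pi_i | pi_{i-1})."""
    prob, prev = 1.0, "START"
    for word, tag in zip(words, tags):
        prob *= emission[tag][word] * transition[prev][tag]
        prev = tag
    return prob


def viterbi(words):
    """Return the label sequence that maximises the joint probability."""
    # best[tag] = (probability of the best path ending in tag, that path)
    best = {t: (transition["START"][t] * emission[t][words[0]], [t])
            for t in labels}
    for word in words[1:]:
        new_best = {}
        for t in labels:
            prev = max(labels, key=lambda p: best[p][0] * transition[p][t])
            score = best[prev][0] * transition[prev][t] * emission[t][word]
            new_best[t] = (score, best[prev][1] + [t])
        best = new_best
    return max(best.values(), key=lambda entry: entry[0])[1]


words = ["the", "board"]
best_tags = viterbi(words)
print(best_tags, joint_probability(words, best_tags))
```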
Graphical view HMM
Observation sequence: x1, x2, x3, ..., xN
Label sequence: π1, π2, π3, ..., πN
Criticism
• HMMs model only limited dependencies
→ come up with more flexible models
→ come up with graphical description
Bayesian Networks
Example for Bayesian Network
Corresponding joint distribution:
$$P(C, S, R, W) = P(W \mid S, R)\, P(S \mid C)\, P(R \mid C)\, P(C)$$
From Russell and Norvig (1995), AI: A Modern Approach
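A minimal sketch of evaluating this factorization, assuming all four variables are binary. The conditional probability tables are hypothetical placeholders; only the structure of the product comes from the slide.

```python
from itertools import product

# Hypothetical conditional probability tables; only the factorisation
# P(C,S,R,W) = P(W|S,R) P(S|C) P(R|C) P(C) comes from the slide.
p_c = {True: 0.5, False: 0.5}                  # P(C)
p_s_given_c = {True: 0.1, False: 0.5}          # P(S=True | C)
p_r_given_c = {True: 0.8, False: 0.2}          # P(R=True | C)
p_w_given_sr = {(True, True): 0.99, (True, False): 0.9,
                (False, True): 0.9, (False, False): 0.0}  # P(W=True | S, R)


def bernoulli(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(c, s, r, w):
    """One factor per node, each conditioned on the node's parents."""
    return (p_c[c]
            * bernoulli(p_s_given_c[c], s)
            * bernoulli(p_r_given_c[c], r)
            * bernoulli(p_w_given_sr[(s, r)], w))


# Summing the joint over all 16 assignments must give 1.
total = sum(joint(*values) for values in product([True, False], repeat=4))
print(total)
```

No explicit normalization is needed here, because every factor is itself a normalized conditional distribution.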
Naïve Bayes
$$\prod_{i=1}^{D} P(x_i \mid z)$$
Observations x1, ..., xD are assumed to be conditionally independent given the class z
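As an illustration of the product above, here is a small sketch that evaluates it for a list of observations and picks the most probable class. The class names, vocabulary, and probabilities are hypothetical.

```python
import math

# Hypothetical per-class probabilities P(x_i | z) and priors P(z).
likelihood = {
    "sports": {"goal": 0.6, "vote": 0.1, "team": 0.3},
    "politics": {"goal": 0.1, "vote": 0.7, "team": 0.2},
}
prior = {"sports": 0.5, "politics": 0.5}


def class_conditional(observations, z):
    """prod_i P(x_i | z), accumulated in log space for numerical stability."""
    return math.exp(sum(math.log(likelihood[z][x]) for x in observations))


def classify(observations):
    """Pick the class z that maximises P(z) * prod_i P(x_i | z)."""
    return max(prior, key=lambda z: prior[z] * class_conditional(observations, z))


doc = ["goal", "team", "goal"]
print(classify(doc))
```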
Markov Random Fields
• Undirected graphical model
• New term:
• clique in an undirected graph:
• Set of nodes such that every node is connected to every other node
• maximal clique: a clique to which no further node can be added without destroying the clique property
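Both definitions can be checked by brute force on a small graph. The sketch below uses a hypothetical four-node graph (not the one from the slides) and enumerates all node subsets, which is only feasible for very small graphs.

```python
from itertools import combinations

# Hypothetical undirected graph, given as a set of edges.
nodes = ["x1", "x2", "x3", "x4"]
edges = {frozenset(e) for e in
         [("x1", "x2"), ("x1", "x3"), ("x2", "x3"), ("x3", "x4")]}


def is_clique(subset):
    """Every pair of nodes in the subset must be connected."""
    return all(frozenset(pair) in edges for pair in combinations(subset, 2))


def maximal_cliques():
    cliques = [set(s) for k in range(1, len(nodes) + 1)
               for s in combinations(nodes, k) if is_clique(s)]
    # A clique is maximal if no other clique strictly contains it.
    return [c for c in cliques if not any(c < other for other in cliques)]


result = maximal_cliques()
print(result)
```

In this graph, {x1, x2} is a clique but not a maximal one, because x3 can still be added.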
Example
cliques: green and blue
maximal clique: blue
Factorization
Joint distribution described by graph:
$$p(x) = \frac{1}{Z} \prod_{C \in C_M} \Psi_C(x_C)$$
Normalization:
$$Z = \sum_{x} \prod_{C \in C_M} \Psi_C(x_C)$$
$x = (x_1, \ldots, x_N)$: all nodes
$x_C$: nodes in clique C
$C_M$: set of all maximal cliques
$\Psi_C(x_C)$: potential function ($\Psi_C(x_C) \geq 0$)
Z is sometimes called the partition function
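A minimal sketch of the factorization and its normalization, assuming a hypothetical chain of three binary nodes with maximal cliques {0, 1} and {1, 2}. The potential is chosen arbitrarily; any non-negative function would do.

```python
from itertools import product

# Hypothetical MRF over three binary nodes with maximal cliques {0, 1} and {1, 2}.
cliques = [(0, 1), (1, 2)]


def potential(values):
    """Non-negative potential that favours equal values within a clique."""
    return 2.0 if values[0] == values[1] else 1.0


def unnormalized(x):
    """Product of the clique potentials for one full assignment x."""
    score = 1.0
    for clique in cliques:
        score *= potential(tuple(x[i] for i in clique))
    return score


states = list(product([0, 1], repeat=3))
Z = sum(unnormalized(x) for x in states)   # partition function: sum over all x


def p(x):
    return unnormalized(x) / Z


print(Z, p((0, 0, 0)))
```

Unlike the Bayesian network case, the potentials are not probabilities, so the sum over all assignments in Z is what turns the product into a distribution. That sum grows exponentially with the number of nodes.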
Example
Graph with nodes x1, x2, x3, x4, x5 (figure not reproduced)
What are the maximal cliques?
Write down the joint probability described by this graph
→ white board
Energy Function
Define
$$\Psi_C(x_C) = e^{-E(x_C)}$$
Insert into joint distribution
$$p(x) = \frac{1}{Z} e^{-\sum_{C \in C_M} E(x_C)}$$
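The intermediate step, written out under the factorization from the previous slide: a product of exponentials becomes the exponential of a sum.

```latex
\begin{aligned}
p(x) &= \frac{1}{Z} \prod_{C \in C_M} \Psi_C(x_C) \\
     &= \frac{1}{Z} \prod_{C \in C_M} e^{-E(x_C)} \\
     &= \frac{1}{Z} \exp\Bigl(-\sum_{C \in C_M} E(x_C)\Bigr)
\end{aligned}
```

Since $e^{-E} > 0$ for any real energy, the condition $\Psi_C(x_C) \geq 0$ holds automatically, and low total energy corresponds to high probability.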
Conditional Random Fields
Definition
Markov random field where each random variable y_i is conditioned on the complete input sequence x1, ..., xn
x = (x1, ..., xn)
y = (y1, ..., yn)
(Figure: chain of label nodes y1, y2, y3, ..., yn-1, yn together with the input node x)
Distribution
$$p(y \mid x) = \frac{1}{Z(x)} \exp\left( -\sum_{i=1}^{n} \sum_{j=1}^{N} \lambda_j f_j(y_{i-1}, y_i, x, i) \right)$$
$\lambda_j$: parameters to be trained
$f_j(y_{i-1}, y_i, x, i)$: feature function (see maximum entropy models)
Example feature functions
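Modeling transitions:
$$f_1(y_{i-1}, y_i, x, i) = \begin{cases} 1 & \text{if } y_{i-1} = \text{IN and } y_i = \text{NNP} \\ 0 & \text{else} \end{cases}$$
Modeling emissions:
$$f_2(y_{i-1}, y_i, x, i) = \begin{cases} 1 & \text{if } y_i = \text{NNP and } x_i = \text{September} \\ 0 & \text{else} \end{cases}$$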
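The two feature functions translate almost literally into code. The sketch below plugs them into the CRF distribution and computes p(y|x) by enumerating all label sequences, which is only practical for tiny inputs. The label set, the weights, the example sentence, and the use of a `START` label for y_0 are assumptions made for illustration. Positions are 0-based in the code.

```python
import math
from itertools import product

LABELS = ["IN", "NNP", "CD"]   # hypothetical small label set


def f1(y_prev, y_cur, x, i):
    """Transition feature: previous label IN, current label NNP."""
    return 1.0 if y_prev == "IN" and y_cur == "NNP" else 0.0


def f2(y_prev, y_cur, x, i):
    """Emission feature: current label NNP and current word 'September'."""
    return 1.0 if y_cur == "NNP" and x[i] == "September" else 0.0


features = [f1, f2]
# Hypothetical weights. With a minus sign in the exponent,
# a negative lambda_j makes the matching configuration more probable.
lambdas = [-1.0, -2.0]


def energy(y, x):
    """sum_i sum_j lambda_j * f_j(y_{i-1}, y_i, x, i), with y_0 = START."""
    padded = ["START"] + list(y)
    return sum(lam * f(padded[i], padded[i + 1], x, i)
               for i in range(len(x))
               for lam, f in zip(lambdas, features))


def conditional_probability(y, x):
    """p(y|x) = exp(-energy) / Z(x), with Z(x) summed by brute force."""
    z = sum(math.exp(-energy(cand, x))
            for cand in product(LABELS, repeat=len(x)))
    return math.exp(-energy(y, x)) / z


x = ["in", "September", "1991"]
print(conditional_probability(("IN", "NNP", "CD"), x))
```

Note that both features receive the whole input x and the position i, so a feature may inspect any word of the sentence, not only the current one. This is the extra flexibility compared with HMM emission probabilities.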
Training
• Like in maximum entropy models:
Generalized iterative scaling
• Convergence:
The log-likelihood log p(y|x) is a concave function of the parameters λj
→ no local maxima other than the global maximum
Convergence is slow
Improved algorithms exist
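To make the training objective concrete, the sketch below maximizes the log-likelihood of a single labeled sentence. It uses plain gradient ascent rather than generalized iterative scaling, purely because it is shorter to write down. It assumes the distribution has the form p(y|x) = exp(-sum_j lambda_j F_j(y, x)) / Z(x), for which the gradient with respect to lambda_j is the expected feature count minus the observed one. The data, features, step size, and iteration count are hypothetical.

```python
import math
from itertools import product

LABELS = ["IN", "NNP"]
x = ["in", "September"]
y_gold = ("IN", "NNP")          # hypothetical training label sequence
ALL_Y = list(product(LABELS, repeat=len(x)))


def feature_counts(y, x):
    """F_j(y, x) = sum_i f_j(y_{i-1}, y_i, x, i) for two example features."""
    padded = ["START"] + list(y)
    transition = sum(1.0 for i in range(len(x))
                     if padded[i] == "IN" and padded[i + 1] == "NNP")
    emission = sum(1.0 for i in range(len(x))
                   if padded[i + 1] == "NNP" and x[i] == "September")
    return [transition, emission]


def score(lambdas, y):
    """Exponent of the model: -sum_j lambda_j * F_j(y, x)."""
    return -sum(l * f for l, f in zip(lambdas, feature_counts(y, x)))


def log_likelihood(lambdas):
    log_z = math.log(sum(math.exp(score(lambdas, y)) for y in ALL_Y))
    return score(lambdas, y_gold) - log_z


def gradient(lambdas):
    """d log p / d lambda_j = E_p[F_j] - F_j(y_gold, x)."""
    weights = [math.exp(score(lambdas, y)) for y in ALL_Y]
    z = sum(weights)
    expected = [sum(w * feature_counts(y, x)[j] for w, y in zip(weights, ALL_Y)) / z
                for j in range(2)]
    observed = feature_counts(y_gold, x)
    return [e - o for e, o in zip(expected, observed)]


lambdas = [0.0, 0.0]
before = log_likelihood(lambdas)
for _ in range(50):  # gradient ascent with a fixed step size of 0.5
    lambdas = [l + 0.5 * g for l, g in zip(lambdas, gradient(lambdas))]
after = log_likelihood(lambdas)
print(before, after, lambdas)
```

The expectation is computed here by enumerating every label sequence. Doing this efficiently for realistic sentence lengths is what makes CRF training expensive.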
Decoding: Auxiliary Matrix
Define additional start symbol $y_0 = \text{START}$ and stop symbol $y_{n+1} = \text{STOP}$
Define matrix $M^i(x)$ such that
$$\left[ M^i(x) \right]_{y_{i-1} y_i} = M^i_{y_{i-1} y_i}(x) = \exp\left( -\sum_{j=1}^{N} \lambda_j f_j(y_{i-1}, y_i, x, i) \right)$$
Reformulate Probability
With that definition we have
$$p(y \mid x) = \frac{1}{Z(x)} \prod_{i=1}^{n+1} M^i_{y_{i-1} y_i}(x)$$
with
$$Z(x) = \sum_{y_1} \sum_{y_2} \sum_{y_3} \cdots \sum_{y_n} M^1_{y_0 y_1}(x)\, M^2_{y_1 y_2}(x) \cdots M^{n+1}_{y_n y_{n+1}}(x)$$
Use Matrix Properties
Use matrix product
$$Z(x) = \left[ M^1(x)\, M^2(x) \cdots M^{n+1}(x) \right]_{y_0 = \text{START},\; y_{n+1} = \text{STOP}}$$
with
$$\left[ M^1(x)\, M^2(x) \right]_{y_0 y_2} = \sum_{y_1} M^1_{y_0 y_1}(x)\, M^2_{y_1 y_2}(x)$$
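The sketch below checks the matrix-product identity on a tiny example by comparing it with the sum over all label sequences. The labels, weights, and features are hypothetical. One further assumption: so that all matrices share one index set, entries that would place START or STOP anywhere except at y_0 and y_{n+1} are set to zero. Plain Python lists are used instead of a linear-algebra library to keep the example self-contained.

```python
import math
from itertools import product

LABELS = ["IN", "NNP"]
STATES = ["START"] + LABELS + ["STOP"]   # shared index set for all matrices
lambdas = [-1.0, -2.0]                   # hypothetical weights


def features(y_prev, y_cur, x, i):
    """Two example features; i is 1-based and position n+1 has no word."""
    word = x[i - 1] if i <= len(x) else None
    return [1.0 if y_prev == "IN" and y_cur == "NNP" else 0.0,
            1.0 if y_cur == "NNP" and word == "September" else 0.0]


def weighted_sum(y_prev, y_cur, x, i):
    return sum(l * f for l, f in zip(lambdas, features(y_prev, y_cur, x, i)))


def allowed(y_prev, y_cur, i, n):
    """Assumption: START only at position 0, STOP only at position n+1."""
    prev_ok = (y_prev == "START") if i == 1 else (y_prev in LABELS)
    cur_ok = (y_cur == "STOP") if i == n + 1 else (y_cur in LABELS)
    return prev_ok and cur_ok


def matrix(i, x):
    """[M^i(x)]_{a,b} = exp(-sum_j lambda_j f_j(a, b, x, i)), or 0 if not allowed."""
    n = len(x)
    return [[math.exp(-weighted_sum(a, b, x, i)) if allowed(a, b, i, n) else 0.0
             for b in STATES] for a in STATES]


def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]


def partition_by_matrices(x):
    """Z(x) as the (START, STOP) entry of M^1(x) M^2(x) ... M^{n+1}(x)."""
    prod_m = matrix(1, x)
    for i in range(2, len(x) + 2):
        prod_m = matmul(prod_m, matrix(i, x))
    return prod_m[STATES.index("START")][STATES.index("STOP")]


def partition_by_enumeration(x):
    """Z(x) as an explicit sum over all label sequences y_1..y_n."""
    n, total = len(x), 0.0
    for y in product(LABELS, repeat=n):
        padded = ["START"] + list(y) + ["STOP"]
        energy = sum(weighted_sum(padded[i - 1], padded[i], x, i)
                     for i in range(1, n + 2))
        total += math.exp(-energy)
    return total


x = ["in", "September"]
print(partition_by_matrices(x), partition_by_enumeration(x))
```

The enumeration visits every label sequence, so its cost grows exponentially with the sentence length, while the matrix product needs only n matrix multiplications. The same chain structure is what keeps decoding fast.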
Software
CRF++
• See http://crfpp.sourceforge.net/
Summary
• Sequence labeling problems
• CRFs are
• flexible
• expensive to train
• fast to decode