Max-Margin Weight Learning for Markov Logic Networks
Tuyen N. Huynh and Raymond J. Mooney
Machine Learning GroupDepartment of Computer ScienceThe University of Texas at Austin
ECML-PKDD-2009, Bled, Slovenia
Motivation
Markov Logic Networks (MLNs), which combine probability and first-order logic, are an expressive formalism that subsumes other SRL models
All of the existing training methods for MLNs learn a model that produces good predictive probabilities
Motivation (cont.)
In many applications, the actual goal is to optimize some application-specific performance measure such as classification accuracy, F1 score, etc.
Max-margin training methods, especially Structural Support Vector Machines (SVMs), provide a framework for optimizing these application-specific measures
Goal: train MLNs under the max-margin framework
Outline
Background: MLNs, Structural SVMs
Max-Margin Markov Logic Networks: Formulation, LP-relaxation MPE inference
Experiments
Future work
Summary
Background
Markov Logic Networks (MLNs) [Richardson & Domingos, 2006]
An MLN is a weighted set of first-order formulas:
0.25 HasWord("assignment",p) => PageClass(Course,p)
0.19 PageClass(Course,p1) ^ Linked(p1,p2) => PageClass(Faculty,p2)
Larger weight indicates stronger belief that the clause should hold
Probability of a possible world (a truth assignment to all ground atoms) x:

P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big)

where w_i is the weight of formula i and n_i(x) is the number of true groundings of formula i in x
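The formula above can be made concrete with a tiny sketch. Everything here is illustrative: the domain has just two ground atoms and one hypothetical formula, and the partition function Z is computed by brute-force enumeration, which is only feasible for toy domains.

```python
import math
from itertools import product

def unnormalized_score(weights, counts):
    """exp(sum_i w_i * n_i(x)) for one world, given precomputed counts n_i(x)."""
    return math.exp(sum(w * n for w, n in zip(weights, counts)))

def world_probability(weights, count_fns, world, all_worlds):
    """P(X = x): normalize over every possible world (tractable only for tiny domains)."""
    score = lambda x: unnormalized_score(weights, [n(x) for n in count_fns])
    z = sum(score(x) for x in all_worlds)          # partition function Z
    return score(world) / z

# Toy domain: two ground atoms A and B; one formula "A => B" with weight 1.5.
count_fns = [lambda x: int((not x[0]) or x[1])]    # number of true groundings of A => B
worlds = list(product([False, True], repeat=2))    # all 4 truth assignments
p = world_probability([1.5], count_fns, (True, True), worlds)
```

Worlds satisfying the formula get an exp(1.5) boost relative to the one world that falsifies it, which is exactly the "larger weight, stronger belief" behavior described above.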
Inference in MLNs
MAP/MPE inference: find the most likely state of a set of query atoms given the evidence:

y_{MAP} = \arg\max_{y \in Y} P(y \mid x)

MaxWalkSAT algorithm [Kautz et al., 1997]
Cutting Plane Inference algorithm [Riedel, 2008]
Computing the marginal conditional probability of a set of query atoms, P(y|x):
MC-SAT algorithm [Poon & Domingos, 2006]
Lifted first-order belief propagation [Singla & Domingos, 2008]
Existing weight learning methods in MLNs
Generative: maximize the Pseudo-Log Likelihood [Richardson & Domingos, 2006]
Discriminative: maximize the Conditional Log Likelihood (CLL) [Singla & Domingos, 2005], [Lowd & Domingos, 2007], [Huynh & Mooney, 2008]
Generic Structural SVMs [Tsochantaridis et al., 2004]
Learn a discriminant function f: X × Y → R:

f(x, y; w) = w^T \Phi(x, y)

Predict for a given input x:

h(x; w) = \arg\max_{y \in Y} w^T \Phi(x, y)

Maximize the separation margin:

\delta(x, y; w) = w^T \Phi(x, y) - \max_{y' \in Y \setminus y} w^T \Phi(x, y')

Can be formulated as a quadratic optimization problem
Generic Structural SVMs (cont.)
[Joachims et al., 2009] proposed the 1-slack formulation of the Structural SVM:

\min_{w, \xi \ge 0} \; \frac{1}{2} w^T w + C \xi

\text{s.t. } \forall (\bar{y}_1, \ldots, \bar{y}_n) \in Y^n: \;
\frac{1}{n} w^T \sum_{i=1}^{n} \big[ \Phi(x_i, y_i) - \Phi(x_i, \bar{y}_i) \big]
\;\ge\; \frac{1}{n} \sum_{i=1}^{n} \Delta(y_i, \bar{y}_i) - \xi

This makes the original cutting-plane algorithm [Tsochantaridis et al., 2004] faster and more scalable
Cutting plane algorithm for solving the structural SVMs
Structural SVM problem: exponential number of constraints, but most are dominated by a small set of "important" constraints
Cutting plane algorithm: repeatedly finds the next most violated constraint, until no new constraint can be found
*Slide credit: Yisong Yue
Applying the generic structural SVMs to a new problem
Representation: Φ(x, y)
Loss function: Δ(y, y')
Algorithms to compute:
Prediction:

\hat{y} = \arg\max_{y \in Y} w^T \Phi(x, y)

Most violated constraint: separation oracle [Tsochantaridis et al., 2004] or loss-augmented inference [Taskar et al., 2005]:

\hat{y} = \arg\max_{\bar{y} \in Y} \big\{ w^T \Phi(x, \bar{y}) + \Delta(y, \bar{y}) \big\}
Max-Margin Markov Logic Networks
Maximize the ratio:

\frac{P(y \mid x)}{P(\hat{y} \mid x)}
= \frac{\exp\big( \sum_i w_i n_i(x, y) \big)}{\exp\big( \sum_i w_i n_i(x, \hat{y}) \big)},
\quad \hat{y} = \arg\max_{y' \in Y \setminus y} P(y' \mid x)

Formulation
Equivalent to maximizing the separation margin:

\delta(x, y; w) = w^T n(x, y) - w^T n(x, \hat{y})
= w^T n(x, y) - \max_{y' \in Y \setminus y} w^T n(x, y')

Can be formulated as a 1-slack Structural SVM

Problems need to be solved
Joint feature: Φ(x, y) = n(x, y)
MPE inference:

\hat{y} = \arg\max_{y' \in Y} w^T n(x, y')

Loss-augmented MPE inference:

\hat{y} = \arg\max_{y' \in Y} \big\{ w^T n(x, y') + \Delta(y, y') \big\}

Problem: exact MPE inference in MLNs is intractable
Solution: approximate inference via relaxation methods [Finley et al., 2008]
Relaxation MPE inference for MLNs
Much work on approximating Weighted MAX-SAT via Linear Programming (LP) relaxation [Goemans and Williamson, 1994], [Asano and Williamson, 2002], [Asano, 2006]:
Convert the problem into an Integer Linear Programming (ILP) problem
Relax the integer constraints to linear constraints
Round the LP solution by some randomized procedure
Assumes the weights are finite and positive
Relaxation MPE inference for MLNs (cont.)
Translate the MPE inference in a ground MLN into an Integer Linear Programming (ILP) problem:
Convert all the ground clauses into clausal form
Assign a binary variable y_i to each unknown ground atom and a binary variable z_j to each non-deterministic ground clause
Translate each ground clause into linear constraints on the y_i's and z_j's
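The translation steps above can be sketched as follows. This is a minimal illustration, not the authors' code: clauses are assumed to arrive already in clausal form as lists of positive and negative atom names, and any negative-weight clause is assumed to have been replaced by its negation beforehand so that all soft weights are positive.

```python
def clause_constraint(pos_atoms, neg_atoms):
    """Linear form  sum_{i in pos} y_i + sum_{i in neg} (1 - y_i)  of one
    ground clause, returned as ({atom: coefficient}, constant)."""
    coeffs = {}
    for a in pos_atoms:
        coeffs[a] = coeffs.get(a, 0) + 1
    for a in neg_atoms:
        coeffs[a] = coeffs.get(a, 0) - 1
    return coeffs, len(neg_atoms)

def ground_mln_to_ilp(soft_clauses, hard_clauses):
    """soft_clauses: list of (weight, pos_atoms, neg_atoms);
    hard_clauses: list of (pos_atoms, neg_atoms).
    Returns (objective, constraints): objective maps each auxiliary z_j to
    its weight (to be maximized); each constraint is (coeffs, constant,
    lower_bound), meaning  sum(coeffs * vars) + constant >= lower_bound."""
    objective, constraints = {}, []
    for j, (w, pos, neg) in enumerate(soft_clauses):
        zj = "z%d" % (j + 1)
        objective[zj] = w                       # maximize sum_j w_j z_j
        coeffs, const = clause_constraint(pos, neg)
        coeffs[zj] = -1                         # clause form >= z_j
        constraints.append((coeffs, const, 0))
    for pos, neg in hard_clauses:
        coeffs, const = clause_constraint(pos, neg)
        constraints.append((coeffs, const, 1))  # hard clause must hold
    return objective, constraints

# The slide's field-labeling example, with the -1 weighted clause negated:
objective, constraints = ground_mln_to_ilp(
    [(3.0, ["y1"], []), (0.5, ["y1", "y2"], []), (1.0, [], ["y3", "y2"])],
    [([], ["y1", "y3"]), ([], ["y1", "y2"]), ([], ["y3", "y2"])])
```

A unit clause such as the first one gets its own z variable here; since its weight is positive, the maximization drives z_1 = y_1, so the result is equivalent to putting the weight directly on y_1.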
Relaxation MPE inference for MLNs (cont.)
Ground MLN:
3    InField(B1,Fauthor,P01)
0.5  InField(B1,Fauthor,P01) v InField(B1,Fvenue,P01)
-1   InField(B1,Ftitle,P01) v InField(B1,Fvenue,P01)
!InField(B1,Fauthor,P01) v !InField(B1,Ftitle,P01).
!InField(B1,Fauthor,P01) v !InField(B1,Fvenue,P01).
!InField(B1,Ftitle,P01) v !InField(B1,Fvenue,P01).

Translated ILP problem (y1 = InField(B1,Fauthor,P01), y2 = InField(B1,Fvenue,P01), y3 = InField(B1,Ftitle,P01); the negative-weight clause is replaced by its negation with weight 1):

\max \; 3 y_1 + 0.5 z_1 + z_2

\text{s.t. } y_1 + y_2 \ge z_1, \quad 1 - y_3 \ge z_2, \quad 1 - y_2 \ge z_2,

(1 - y_1) + (1 - y_3) \ge 1, \quad (1 - y_1) + (1 - y_2) \ge 1, \quad (1 - y_3) + (1 - y_2) \ge 1,

y_i, z_j \in \{0, 1\}
Relaxation MPE inference for MLNs (cont.)
LP-relaxation: relax the integer constraints {0,1} to linear constraints [0,1]
Adapt the ROUNDUP procedure [Boros and Hammer, 2002] to round the solution of the LP problem: pick a non-integral component and round it in each step
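The rounding step can be sketched as below. This simplifies the actual ROUNDUP bookkeeping of Boros and Hammer: it assumes the LP solution is a dict mapping atoms to values in [0,1] and that a caller-supplied `objective` function (a hypothetical name) scores any partially rounded assignment; each fractional component is greedily set to whichever of {0, 1} scores better.

```python
def round_lp_solution(solution, objective, tol=1e-6):
    """Greedy ROUNDUP-style rounding: repeatedly pick one non-integral
    component and fix it to the better of its two integral values."""
    rounded = dict(solution)
    while True:
        frac = [v for v, x in rounded.items() if tol < x < 1 - tol]
        if not frac:
            break                       # every component is already 0 or 1
        v = frac[0]                     # pick a non-integral component
        best = max((0.0, 1.0),
                   key=lambda b: objective({**rounded, v: b}))
        rounded[v] = best               # round it in this step
    return rounded

# Toy usage with an illustrative linear objective 3*y1 - y2:
sol = round_lp_solution({"y1": 0.6, "y2": 0.5},
                        lambda a: 3 * a["y1"] - a["y2"])
```

For a linear objective this greedy choice never decreases the objective value relative to the fractional point, which is the property the randomized rounding analyses rely on.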
Loss-augmented LP-relaxation MPE inference
Represent the loss function as a linear function of the y_i's, e.g. the Hamming loss (y_T denotes the true assignment):

\Delta_{\text{Hamming}}(y_T, y) = \sum_{i: \, y_{Ti} = 1} (1 - y_i) + \sum_{i: \, y_{Ti} = 0} y_i

Add the loss term to the objective of the LP-relaxation: the problem is still an LP and can be solved by the previous algorithm
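Because the Hamming loss is linear in the y_i, folding it into the LP objective is just a coefficient shift per atom. A minimal sketch, with illustrative names, assuming the objective is a dict of variable coefficients to be maximized:

```python
def augment_with_hamming_loss(objective, y_true):
    """objective: dict var -> coefficient (to maximize);
    y_true: dict atom -> 0/1 gold value.
    Returns (augmented_objective, constant_term) implementing
    Delta(y_true, y) = sum_{i: true=1} (1 - y_i) + sum_{i: true=0} y_i."""
    augmented = dict(objective)
    constant = 0.0
    for atom, truth in y_true.items():
        if truth == 1:
            augmented[atom] = augmented.get(atom, 0.0) - 1.0  # from (1 - y_i)
            constant += 1.0
        else:
            augmented[atom] = augmented.get(atom, 0.0) + 1.0  # from y_i
    return augmented, constant

aug, const = augment_with_hamming_loss({"y1": 3.0}, {"y1": 1, "y2": 0})
```

The constant term does not affect the argmax, so loss-augmented MPE inference reuses the exact same LP machinery as plain MPE inference.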
Experiments
Collective multi-label webpage classification
WebKB dataset [Craven and Slattery, 2001] [Lowd and Domingos, 2007]
4,165 web pages and 10,935 web links of 4 departments
Each page is labeled with a subset of 7 categories: Course, Department, Faculty, Person, Professor, Research Project, Student
MLN [Lowd and Domingos, 2007]:
Has(+word,page) → PageClass(+class,page)
¬Has(+word,page) → PageClass(+class,page)
PageClass(+c1,p1) ^ Linked(p1,p2) → PageClass(+c2,p2)
Collective multi-label webpage classification (cont.)
Largest ground MLN for one department: 8,876 query atoms, 174,594 ground clauses
Citation segmentation
Citeseer dataset [Lawrence et.al., 1999] [Poon and Domingos, 2007]
1,563 citations, divided into 4 research topics
Each citation is segmented into 3 fields: Author, Title, Venue
Used the simplest MLN in [Poon and Domingos, 2007]
Largest ground MLN for one topic: 37,692 query atoms, 131,573 ground clauses
Experimental setup
4-fold cross-validation
Metric: F1 score
Compare against the Preconditioned Scaled Conjugate Gradient (PSCG) algorithm
Train with 5 different values of C: 1, 10, 100, 1000, 10000, and test with the one that performs best on the training set
Use Mosek to solve the QP and LP problems
F1 scores on WebKB
Where does the improvement come from?
PSCG-LPRelax: run the new LP-relaxation MPE algorithm on the model learnt by PSCG-MCSAT
MM-Hamming-MCSAT: run the MCSAT inference on the model learnt by MM-Hamming-LPRelax
F1 scores on WebKB (cont.)
F1 scores on Citeseer
Sensitivity to the tuning parameter
Future work
Approximation algorithms for optimizing other application-specific loss functions
More efficient inference algorithms
Online max-margin weight learning: 1-best MIRA [Crammer et al., 2005]
More experiments on structured prediction, and comparison to other existing models
Summary
All existing discriminative weight learners for MLNs try to optimize the CLL
Proposed a max-margin approach to weight learning in MLNs, which can optimize application specific measures
Developed a new LP-relaxation MPE inference for MLNs
The max-margin weight learner achieves better, or equally good but more stable, performance compared to PSCG
Thank you!
Questions?