online max-margin weight learning with markov logic networks tuyen n. huynh and raymond j. mooney...
TRANSCRIPT
![Page 1: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/1.jpg)
Online Max-Margin Weight Learning
with Markov Logic Networks
Tuyen N. Huynh and Raymond J. Mooney
Machine Learning GroupDepartment of Computer ScienceThe University of Texas at Austin
Star AI 2010, July 12, 2010
![Page 2: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/2.jpg)
Outline
2
Motivation Background
Markov Logic Networks Primal-dual framework
New online learning algorithm for structured prediction
Experiments Citation segmentation Search query disambiguation
Conclusion
![Page 3: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/3.jpg)
Motivation
Most of the existing weight learning for MLNs are in the batch setting. Need to run inference over all the training
examples in each iteration Usually take a few hundred iterations to
converge Cannot fit all the training examples in the
memory
Conventional solution: online learning
3
![Page 4: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/4.jpg)
Background
4
![Page 5: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/5.jpg)
An MLN is a weighted set of first-order formulas
Larger weight indicates stronger belief that the clause should hold
Probability of a possible world (a truth assignment to all ground atoms) x:
Markov Logic Networks (MLNs)
iii xnw
ZxXP )(exp
1)(
Weight of formula i No. of true groundings of formula i in x
[Richardson & Domingos, 2006]
2.5 Center(i,c) => InField(Ftitle,i,c)1.2 InField(f,i,c) ^ Next(j,i) ^ ¬HasPunc(c,i)=> InField(f,j,c)
5
![Page 6: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/6.jpg)
Existing discriminative weight learning methods for MLNs
maximize the Conditional Log Likelihood (CLL) [Singla & Domingos, 2005], [Lowd & Domingos, 2007], [Huynh & Mooney, 2008]
maximize the margin, the log ratio between the probability of the correct label and the closest incorrect one [Huynh & Mooney, 2009]
6
![Page 7: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/7.jpg)
Online learning
7
Regret = R(T) =P T
t=1 lt(wt) ¡ minw2W
Regret = R(T) =TX
t=1
ct(wt) ¡ minw2W
(8)Regret = R(T) =TX
t=1
ct(wt) ¡ minw2W
TX
t=1
ct(w) (1)
Regret = R(T) =P T
t=1 ct(wt) ¡ minw2WP T
t=1 ct(w)
![Page 8: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/8.jpg)
A general and latest framework for deriving low-regret online algorithms
Rewriting the regret bound as an optimization problem (called the primal problem), then considering the dual problem of the primal one
A condition that guarantees the increase in the dual objective in each step
Incremental-Dual-Ascent (IDA) algorithms. For example: subgradient methods
Primal-dual framework [Shalev-Shwartz et al., 2006]
8
![Page 9: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/9.jpg)
Primal-dual framework (cont.)
9
Proposed a new class of IDA algorithms called Coordinate-Dual-Ascent (CDA) algorithm: The CDA update rule only optimizes the
dual w.r.t the last dual variable A closed-form solution of CDA update rule
CDA algorithms have the same cost as subgradient methods but increase the dual objective more in each step converging to the optimal value faster
![Page 10: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/10.jpg)
Primal-dual framework (cont.)
10
![Page 11: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/11.jpg)
CDA algorithms for max-margin structured prediction
11
![Page 12: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/12.jpg)
Max-margin structured prediction
12
),();,( yxwwyxf T
),(maxarg);( yxwwxh T
Yy
)',(max),();,(\
yxwyxwwyx T
yYy
T
![Page 13: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/13.jpg)
Steps for deriving new CDA algorithms
13
1. Define the regularization and loss functions
2. Find the conjugate functions3. Derive a closed-form solution for the
CDA update rule
![Page 14: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/14.jpg)
1. Define the regularization and loss functions
14
Label loss function
![Page 15: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/15.jpg)
1. Define the regularization and loss functions (cont.)
15
![Page 16: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/16.jpg)
2. Find the conjugate functions
16
![Page 17: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/17.jpg)
2. Find the conjugate functions (cont.)
17
![Page 18: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/18.jpg)
18
Optimization problem:
Solution:
3. Closed-form solution for the CDA update rule
![Page 19: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/19.jpg)
CDA algorithms for max-margin structured prediction
19
![Page 20: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/20.jpg)
Experiments
20
![Page 21: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/21.jpg)
Citation segmentation
21
Citeseer dataset [Lawrence et.al., 1999] [Poon and Domingos, 2007]
1,563 citations, divided into 4 research topics
Each citation is segmented into 3 fields: Author, Title, Venue
Used the simplest MLN in [Poon and Domingos, 2007] Similar to a linear chain CRF: Next(j,i) ^ !HasPunc(c,i) ^ InField(c,+f,i) => InField(c,+f,j)
![Page 22: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/22.jpg)
Experimental setup
Systems compared: MM: the max-margin weight learner for
MLNs in batch setting [Huynh & Mooney, 2009]
1-best MIRA [Crammer et al., 2005]
Subgradient [Ratliff et al., 2007]
CDA1/PA1 CDA2
22
![Page 23: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/23.jpg)
Experimental setup (cont.)
4-fold cross-validation Metric:
CiteSeer: micro-average F1 at the token level
Used exact MPE inference (Integer Linear Programming) for all online algorithms and approximate MPE inference (LP-relaxation) for the batch one.
Used Hamming loss as the label loss function
23
![Page 24: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/24.jpg)
Average F1
24
![Page 25: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/25.jpg)
Average training time in minutes
25
![Page 26: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/26.jpg)
Microsoft web search query dataset
26
Used the clean-up dataset created by Mihalkova & Mooney [2009]
Has thousands of search sessions where an ambiguous queries was asked
Goal: disambiguate search query based on previous related search sessions
Used 3 MLNs proposed in [Mihalkova & Mooney, 2009]
![Page 27: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/27.jpg)
Experimental setup
Systems compared: Contrastive Divergence (CD) [Hinton 2002]:
used in [Mihalkova & Mooney, 2009] 1-best MIRA Subgradient CDA1/PA1 CDA2
Metric: Mean Average Precision (MAP): how close
the relevant results are to the top of the rankings
27
![Page 28: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/28.jpg)
MAP scores
28
![Page 29: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/29.jpg)
Conclusion
29
Derived CDA algorithms for max-margin structured prediction Have same computational cost as existing
online algorithms but increase the dual objective more
Experimental results on two real-world problems show that the new algorithms generally achieve better accuracy and also have more consistent performance.
![Page 30: Online Max-Margin Weight Learning with Markov Logic Networks Tuyen N. Huynh and Raymond J. Mooney Machine Learning Group Department of Computer Science](https://reader035.vdocuments.net/reader035/viewer/2022070305/55144ed5550346494e8b501b/html5/thumbnails/30.jpg)
Thank you!
30
Questions?