
Page 1:

Learning the Structure of Markov Logic Networks

Stanley Kok & Pedro Domingos
Dept. of Computer Science and Eng.
University of Washington

Page 2:

Overview
- Motivation
- Background
- Structure Learning Algorithm
- Experiments
- Future Work & Conclusion

Page 3:

Motivation
Statistical Relational Learning (SRL) combines the benefits of:
- Statistical learning: uses probability to handle uncertainty in a robust and principled way
- Relational learning: models domains with multiple relations

Page 4:

Motivation
- Many SRL approaches combine a logical language and Bayesian networks, e.g., Probabilistic Relational Models [Friedman et al., 1999]
- The need to avoid cycles in Bayesian networks causes many difficulties [Taskar et al., 2002]
- This led to the use of Markov networks instead

Page 5:

Motivation
- Relational Markov Networks [Taskar et al., 2002]: conjunctive database queries + Markov networks; require space exponential in the size of the cliques
- Markov Logic Networks [Richardson & Domingos, 2004]: first-order logic + Markov networks; compactly represent large cliques; did not learn structure (used an external ILP system)

Page 6:

Motivation
- Relational Markov Networks [Taskar et al., 2002]: conjunctive database queries + Markov networks; require space exponential in the size of the cliques
- Markov Logic Networks [Richardson & Domingos, 2004]: first-order logic + Markov networks; compactly represent large cliques; did not learn structure (used an external ILP system)
- This paper develops a fast algorithm that learns MLN structure: the most powerful SRL learner to date

Page 7:

Overview
- Motivation
- Background
- Structure Learning Algorithm
- Experiments
- Future Work & Conclusion

Page 8:

Markov Logic Networks
- A first-order KB is a set of hard constraints: if a world violates even one formula, it has zero probability
- MLNs soften the constraints: it is OK to violate formulas; the fewer formulas a world violates, the more probable it is
- Each formula is given a weight that reflects how strong a constraint it is

Page 9:

MLN Definition
- A Markov Logic Network (MLN) is a set of pairs (F, w), where F is a formula in first-order logic and w is a real number
- Together with a finite set of constants, it defines a Markov network with:
  - One node for each grounding of each predicate in the MLN
  - One feature for each grounding of each formula F in the MLN, with the corresponding weight w

Page 10:

Ground Markov Network

Formula (weight 2.7): AdvisedBy(S,P) ⇒ Student(S) ∧ Professor(P)
Constants: STAN, PEDRO

Ground nodes: Student(STAN), Professor(PEDRO), AdvisedBy(STAN,PEDRO), Professor(STAN), Student(PEDRO), AdvisedBy(PEDRO,STAN), AdvisedBy(STAN,STAN), AdvisedBy(PEDRO,PEDRO)
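As a concrete illustration of the grounding step, the following minimal Python sketch (not from the paper; the predicate names and arities are just the example above) enumerates one node per grounding of each predicate over the given constants:

```python
# Minimal sketch (not from the paper): enumerate the ground atoms, i.e. one
# node per grounding of each predicate over the given constants.
from itertools import product

constants = ["STAN", "PEDRO"]
predicates = {"Student": 1, "Professor": 1, "AdvisedBy": 2}  # name -> arity

ground_atoms = [
    f"{name}({','.join(args)})"
    for name, arity in predicates.items()
    for args in product(constants, repeat=arity)
]
# -> Student(STAN), Student(PEDRO), Professor(STAN), Professor(PEDRO),
#    AdvisedBy(STAN,STAN), AdvisedBy(STAN,PEDRO), AdvisedBy(PEDRO,STAN),
#    AdvisedBy(PEDRO,PEDRO)
```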

Page 11-15:

MLN Model

    P(X = x) = (1/Z) exp( Σ_i w_i n_i(x) )

- x: vector of value assignments to the ground predicates
- Z: partition function; sums over all possible value assignments to the ground predicates
- w_i: weight of the i-th formula
- n_i(x): number of true groundings of the i-th formula
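For intuition, here is a toy brute-force rendering of this distribution. It is only a sketch: real MLN implementations never enumerate the partition function, and the helper signature (`count_fns`) is an assumption made for illustration.

```python
# Toy brute-force sketch of the MLN distribution P(X = x) = (1/Z) exp(sum_i w_i n_i(x)).
# Feasible only for tiny domains; real implementations never enumerate Z.
import math
from itertools import product

def mln_prob(x, weights, count_fns):
    """x: tuple of 0/1 values for the ground atoms.
    count_fns[i]: function mapping a world to n_i(world),
    the number of true groundings of formula i."""
    def score(world):
        return math.exp(sum(w * n(world) for w, n in zip(weights, count_fns)))
    z = sum(score(world) for world in product((0, 1), repeat=len(x)))  # partition function
    return score(x) / z
```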

Page 16-18:

MLN Weight Learning
- The likelihood is a concave function of the weights
- Quasi-Newton methods can find the optimal weights, e.g., L-BFGS [Liu & Nocedal, 1989]
- But this is SLOW: evaluating the likelihood and its gradient involves a #P-complete counting/inference problem

Page 19-20:

MLN Weight Learning
- Richardson & Domingos (R&D) instead optimized pseudo-likelihood [Besag, 1975], which conditions each ground predicate only on its Markov blanket and so avoids intractable inference
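A hedged sketch of pseudo-likelihood weight learning with L-BFGS, assuming the per-ground-atom formula counts have already been precomputed; the `counts_true`/`counts_false` arrays are hypothetical inputs, not the authors' data structures:

```python
# Sketch of pseudo-likelihood weight learning via L-BFGS.
# counts_true[l, i]: # true groundings of formula i with ground atom l at its
# data value; counts_false[l, i]: same with atom l flipped (hypothetical inputs).
import numpy as np
from scipy.optimize import minimize

def neg_pll(w, counts_true, counts_false):
    s_true = counts_true @ w                      # one score per ground atom
    s_false = counts_false @ w
    # log P(X_l = x_l | Markov blanket) = s_true - log(e^s_true + e^s_false)
    return -(s_true - np.logaddexp(s_true, s_false)).sum()

def learn_weights(counts_true, counts_false, num_formulas):
    res = minimize(neg_pll, np.zeros(num_formulas),
                   args=(counts_true, counts_false), method="L-BFGS-B")
    return res.x
```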

Page 21:

MLN Structure Learning
- R&D "learned" MLN structure in two disjoint steps:
  - Learn first-order clauses with an off-the-shelf ILP system (CLAUDIEN [De Raedt & Dehaspe, 1997])
  - Learn clause weights by optimizing pseudo-likelihood
- This is unlikely to give the best results, because CLAUDIEN finds clauses that hold with some accuracy/frequency in the data; it does not find clauses that maximize the data's (pseudo-)likelihood

Page 22:

Overview
- Motivation
- Background
- Structure Learning Algorithm
- Experiments
- Future Work & Conclusion

Page 23:

MLN Structure Learning
- This paper develops an algorithm that:
  - Learns first-order clauses by directly optimizing pseudo-likelihood
  - Is fast enough to be practical
  - Performs better than R&D, pure ILP, purely KB, and purely probabilistic approaches

Page 24:

Structure Learning Algorithm

High-level algorithm:

    REPEAT
        MLN ← MLN ∪ FindBestClauses(MLN)
    UNTIL FindBestClauses(MLN) returns NULL

    FindBestClauses(MLN)
        Create candidate clauses
        FOR EACH candidate clause c
            Compute increase in evaluation measure of adding c to MLN
        RETURN k clauses with greatest increase
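A Python rendering of this loop, treating an MLN simply as a list of weighted clauses; `create_candidates` and `wpll_gain` stand in for the components described on the following slides:

```python
# Sketch of the top-level loop; an "MLN" here is just a list of clauses.
# create_candidates and wpll_gain are placeholders (see the next slides).
def find_best_clauses(mln, data, k):
    candidates = create_candidates(mln)
    scored = [(wpll_gain(mln, c, data), c) for c in candidates]
    scored = [(g, c) for g, c in scored if g > 0]          # keep only improvements
    scored.sort(key=lambda gc: gc[0], reverse=True)
    return [c for _, c in scored[:k]]

def learn_structure(mln, data, k):
    while True:
        best = find_best_clauses(mln, data, k)
        if not best:                                        # FindBestClauses returns NULL
            return mln
        mln = mln + best                                    # add clauses, then relearn weights
```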

Page 25:

Structure Learning
- Evaluation measure
- Clause construction operators
- Search strategies
- Speedup techniques

Page 26:

Evaluation Measure
- R&D used pseudo-log-likelihood:

      log P*_w(X = x) = Σ_l log P_w(X_l = x_l | MB_x(X_l))

  (sum over all ground predicates X_l, each conditioned on its Markov blanket MB_x(X_l))
- This gives undue weight to predicates with a large number of groundings

Page 27-30:

Evaluation Measure
- Weighted pseudo-log-likelihood (WPLL):

      log P*_w(X = x) = Σ_{r ∈ R} c_r Σ_{k=1}^{g_r} log P_w(X_{r,k} = x_{r,k} | MB_x(X_{r,k}))

  - c_r: weight given to predicate r
  - the inner sum is over the g_r groundings of predicate r
  - each log P_w(...) term is a CLL: the conditional log-likelihood of a ground predicate given its Markov blanket
- Plus a Gaussian weight prior and a structure prior
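A small sketch of the WPLL computation, assuming the per-grounding CLLs have already been computed and grouped by first-order predicate, and using one natural choice, c_r = 1/g_r, which gives every first-order predicate equal total weight:

```python
# Sketch of WPLL given per-grounding conditional log-likelihoods (CLLs),
# grouped by first-order predicate.
def wpll(cll_by_predicate):
    """cll_by_predicate: dict mapping predicate name -> list of CLLs,
    one per grounding of that predicate."""
    total = 0.0
    for r, clls in cll_by_predicate.items():
        c_r = 1.0 / len(clls)      # weight given to predicate r (here: 1 / #groundings)
        total += c_r * sum(clls)
    return total
```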

Page 31:

Clause Construction Operators
- Add a literal (negative or positive)
- Remove a literal
- Flip the signs of literals
- Limit the number of distinct variables to restrict the search space
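An illustrative sketch of these operators over clauses represented as lists of signed literals; `num_distinct_vars` is a hypothetical helper that enforces the variable limit:

```python
# Illustrative clause-construction operators over clauses represented as lists
# of (sign, literal) pairs. num_distinct_vars is a hypothetical helper that
# counts the distinct variables appearing in a clause.
def neighbors(clause, literal_pool, max_vars):
    out = []
    for lit in literal_pool:                        # add a literal, either sign
        for signed in (("+", lit), ("-", lit)):
            new = clause + [signed]
            if num_distinct_vars(new) <= max_vars:  # restrict the search space
                out.append(new)
    for i in range(len(clause)):                    # remove a literal
        out.append(clause[:i] + clause[i + 1:])
    for i, (sign, lit) in enumerate(clause):        # flip a literal's sign
        flipped = ("-" if sign == "+" else "+", lit)
        out.append(clause[:i] + [flipped] + clause[i + 1:])
    return out
```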

Page 32:

Beam Search
- Same strategy as used in ILP and rule induction
- Repeatedly find the single best clause

Page 33:

Shortest-First Search (SFS)
1. Start from an empty or hand-coded MLN
2. FOR L ← 1 TO MAX_LENGTH
3.     Apply each literal addition and deletion to each clause to create clauses of length L
4.     Repeatedly add the K best clauses of length L to the MLN until no clause of length L improves the WPLL

Similar to Della Pietra et al. (1997), McCallum (2003)
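A hedged sketch of the SFS loop; `expand_to_length` and `wpll_gain` are placeholders for the clause construction operators and the WPLL gain computation described earlier:

```python
# Sketch of shortest-first search: for each clause length L, repeatedly add the
# K best improving clauses of that length until none improves the WPLL.
def shortest_first_search(mln, data, K, max_length):
    for length in range(1, max_length + 1):
        while True:
            candidates = expand_to_length(mln, length)      # literal additions/deletions
            scored = sorted(((wpll_gain(mln, c, data), c) for c in candidates),
                            key=lambda gc: gc[0], reverse=True)
            best = [c for g, c in scored[:K] if g > 0]
            if not best:
                break
            mln = mln + best                                # add clauses, relearn weights
    return mln
```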

Page 34-37:

Speedup Techniques

    FindBestClauses(MLN)
        Create candidate clauses
        FOR EACH candidate clause c
            Compute increase in WPLL (using L-BFGS) of adding c to MLN
        RETURN k clauses with greatest increase

Bottlenecks:
- SLOW: many candidate clauses
- SLOW: many CLLs to compute
- SLOW: each CLL involves a #P-complete counting problem
- NOT THAT FAST: the L-BFGS optimization itself

Page 38-43:

Speedup Techniques
- Clause sampling
- Predicate sampling
- Avoid redundancy
- Loose convergence thresholds
- Ignore unrelated clauses
- Weight thresholding
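As a rough illustration of the first two techniques, the sketch below estimates a clause's number of true groundings from a uniform subsample and evaluates the WPLL over a subsample of ground predicates; the uniform-sampling scheme is an assumption for illustration, not necessarily the exact scheme used in the paper:

```python
# Illustration of subsampling speedups (uniform sampling assumed).
import random

def estimate_true_groundings(groundings, is_true, sample_size):
    """Estimate a clause's # of true groundings from a random subsample."""
    if not groundings:
        return 0.0
    sample = random.sample(groundings, min(sample_size, len(groundings)))
    return len(groundings) * sum(is_true(g) for g in sample) / len(sample)

def sample_ground_predicates(ground_preds, sample_size):
    """Evaluate the WPLL over a random subset of ground predicates."""
    return random.sample(ground_preds, min(sample_size, len(ground_preds)))
```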

Page 44:

Overview
- Motivation
- Background
- Structure Learning Algorithm
- Experiments
- Future Work & Conclusion

Page 45:

Experiments
- UW-CSE domain
  - 22 predicates, e.g., AdvisedBy(X,Y), Student(X), etc.
  - 10 types, e.g., Person, Course, Quarter, etc.
  - # ground predicates ≈ 4 million; # true ground predicates ≈ 3000
  - Handcrafted KB with 94 formulas, e.g., "Each student has at most one advisor", "If a student is an author of a paper, so is her advisor"
- Cora domain
  - Computer science research papers
  - Collective deduplication of author, venue, title

Page 46:

Systems
- MLN(SLB): structure learning with beam search
- MLN(SLS): structure learning with SFS

Page 47:

Systems
- MLN(SLB), MLN(SLS)
- KB: hand-coded KB
- CL: CLAUDIEN
- FO: FOIL
- AL: Aleph

Page 48:

Systems
- MLN(SLB), MLN(SLS)
- KB, CL, FO, AL
- MLN(KB), MLN(CL), MLN(FO), MLN(AL): the corresponding clause sets with MLN weight learning

Page 49:

Systems
- MLN(SLB), MLN(SLS)
- KB, CL, FO, AL and MLN(KB), MLN(CL), MLN(FO), MLN(AL)
- NB: Naïve Bayes
- BN: Bayesian networks

Page 50:

Methodology
- UW-CSE domain
  - DB divided into 5 areas: AI, Graphics, Languages, Systems, Theory
  - Leave-one-out testing by area
- Measured:
  - Average CLL of the ground predicates
  - Average area under the precision-recall curve of the ground predicates (AUC)
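A minimal sketch (assuming predicted probabilities and 0/1 truth values are available for the test-area ground predicates) of how these two measures can be computed, here using scikit-learn for the precision-recall curve:

```python
# Sketch of the two reported measures, given ground-truth labels (0/1) and
# predicted probabilities for the test ground predicates.
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

def average_cll(y_true, p_pred, eps=1e-6):
    y = np.asarray(y_true)
    p = np.clip(np.asarray(p_pred), eps, 1 - eps)
    return float(np.mean(np.where(y == 1, np.log(p), np.log(1 - p))))

def pr_auc(y_true, p_pred):
    precision, recall, _ = precision_recall_curve(y_true, p_pred)
    return auc(recall, precision)
```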

Page 51-54:

Results: UW-CSE

[Bar charts (UW-CSE) of AUC and CLL for MLN(SLS), MLN(SLB), MLN(CL), MLN(FO), MLN(AL), MLN(KB), CL, FO, AL, KB. AUC values shown: 0.533, 0.472, 0.306, 0.140, 0.148, 0.429, 0.170, 0.131, 0.117, 0.266. CLL values shown: -0.061, -0.088, -0.151, -0.208, -0.223, -0.142, -0.574, -0.661, -0.579, -0.812.]


Page 55:

Results: UW-CSE

[Bar charts (UW-CSE) of AUC and CLL for MLN(SLS), MLN(SLB), NB, BN. AUC values shown: 0.533, 0.472, 0.390, 0.397. CLL values shown: -0.061, -0.088, -0.370, -0.166.]

Page 56:

Timing
- MLN(SLS) on UW-CSE
- Cluster of 15 dual-CPU 2.8 GHz Pentium 4 machines
- Without speedups: did not finish in 24 hrs
- With speedups: 5.3 hrs

Page 57:

Lesion Study
- Disable one speedup technique at a time; SFS; UW-CSE (one fold)
- [Bar chart, runtime in hours. Conditions: all speedups, no clause sampling, no predicate sampling, don't avoid redundancy, no loose convergence threshold, no weight thresholding. Values shown: 4.0, 21.6, 8.4, 6.5, 4.1, 24.8.]

Page 58:

Overview
- Motivation
- Background
- Structure Learning Algorithm
- Experiments
- Future Work & Conclusion

Page 59:

Future Work
- Speed up counting of the number of true groundings of a clause
- Probabilistically bound the loss in accuracy due to subsampling
- Probabilistic predicate discovery

Page 60:

Conclusion
- Markov logic networks: a powerful combination of first-order logic and probability
- Richardson & Domingos (2004) did not learn MLN structure
- We develop an algorithm that automatically learns both first-order clauses and their weights
- We develop speedup techniques to make our algorithm fast enough to be practical
- We show experimentally that our algorithm outperforms:
  - Richardson & Domingos
  - Pure ILP
  - Purely KB approaches
  - Purely probabilistic approaches
- (For software, email: [email protected])