
Page 1: Learning the Structure of Markov Logic Networks

Stanley Kok

Page 2: Overview

- Introduction: CLAUDIEN, CRFs
- Algorithm
  - Evaluation Measure
  - Clause Construction
  - Search Strategies
  - Speedup Techniques
- Experiments

Page 3: Introduction

- Richardson & Domingos (2004) learned MLN structure in two disjoint steps:
  - Learn first-order (FO) clauses with an off-the-shelf ILP system (CLAUDIEN)
  - Learn clause weights by optimizing pseudo-likelihood
- This work develops an algorithm that:
  - Learns FO clauses by directly optimizing pseudo-likelihood
  - Is fast enough
  - Learns better structure than R&D, pure ILP, purely probabilistic, and purely KB approaches

Page 4: CLAUDIEN

- CLAUsal DIscovery ENgine
- Starts with the trivially false clause: true ⇒ false
- Repeatedly refines current clauses by adding literals
- Adds clauses that satisfy minimum accuracy and coverage to the KB

Example refinement lattice over literals m, f, h:

true ⇒ false
m ⇒ false   f ⇒ false   h ⇒ false
m ∧ f ⇒ false   m ⇒ h   m ⇒ f   m ∧ h ⇒ false   f ⇒ h   f ⇒ m   f ∧ h ⇒ false   h ⇒ f   h ⇒ m
h ⇒ m ∨ f

Page 5: CLAUDIEN Language Bias

- Language bias ≡ clause template
- Can refine a handcrafted KB. Example:
  - Professor(P) ⇐ AdvisedBy(S,P) in the KB
  - dlab_template('1-2:[Professor(P),Student(S)] <- AdvisedBy(S,P)')
  - yields Professor(P) ∨ Student(S) ⇐ AdvisedBy(S,P)

Page 6: Conditional Random Fields

- Markov networks used to compute P(y|x) (McCallum 2003)
- Model:

  P(y \mid x) = \frac{1}{Z_x} \exp\Big( \sum_k \lambda_k f_k(x, y) \Big)

- Features f_k, e.g. "current word is capitalized and next word is Inc"

[Figure: linear-chain CRF with label nodes y1, y2, ..., yn over observations x1, x2, ..., xn; example "IBM hired Alice ..." with labels Org, Person, Misc]

Page 7: CRF Feature Induction

- Set of atomic features (word=the, capitalized, etc.)
- Start from an empty CRF
- While the convergence criterion is not met:
  - Create a list of new features consisting of:
    - Atomic features
    - Binary conjunctions of atomic features
    - Conjunctions of atomic features with features already in the model
  - Evaluate the gain in P(y|x) of adding each feature to the model
  - Add the best K features to the model (100s-1000s of features)
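
As a concrete (if simplified) picture of that loop, here is a minimal Python sketch; the encoding of features as tuples of atomic tests and the `gain` scorer (improvement in P(y|x) from adding a feature) are illustrative assumptions, not McCallum's actual implementation.

```python
# A sketch of greedy CRF feature induction; `gain(model, f)` is a
# hypothetical scorer for the improvement in P(y|x) from adding feature f.
from itertools import combinations

def induce_features(atoms, gain, K=100, max_rounds=20):
    model = []                                   # start from an empty CRF
    for _ in range(max_rounds):
        # Candidates: atoms, atom-atom conjunctions, atom-model conjunctions.
        candidates = [(a,) for a in atoms]
        candidates += [(a, b) for a, b in combinations(atoms, 2)]
        candidates += [f + (a,) for f in model for a in atoms]
        scored = sorted(candidates, key=lambda f: gain(model, f), reverse=True)
        best = [f for f in scored[:K] if gain(model, f) > 0 and f not in model]
        if not best:                             # convergence criterion
            break
        model.extend(best)                       # add the best K features
    return model
```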

Page 8: Algorithm

High-level algorithm:

Repeat
  Clauses <- FindBestClauses(MLN)
  Add Clauses to MLN
Until Clauses = ∅

FindBestClauses(MLN)
  Search for, and create, candidate clauses
  For each candidate clause c
    Compute gain in evaluation measure of adding c to MLN
  Return k clauses with highest gain
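
A minimal Python sketch of this outer loop, with `create_candidates` and `gain` passed in as stand-ins for the clause-construction and evaluation-measure machinery covered on the following slides:

```python
# Sketch of the outer structure-learning loop; `create_candidates(clauses)`
# and `gain(clauses, c)` stand in for clause construction and the gain in
# the evaluation measure from adding clause c to the MLN.
def learn_structure(clauses, create_candidates, gain, k=10):
    while True:
        best = find_best_clauses(clauses, create_candidates, gain, k)
        if not best:                     # Clauses = empty set: terminate
            return clauses
        clauses = clauses + best         # add the clauses to the MLN

def find_best_clauses(clauses, create_candidates, gain, k):
    candidates = create_candidates(clauses)
    scored = sorted(((gain(clauses, c), c) for c in candidates),
                    key=lambda t: t[0], reverse=True)
    return [c for g, c in scored[:k] if g > 0]
```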

Page 9: Evaluation Measure

- Ideally use the log-likelihood, but it is slow: the gradient requires expected counts, i.e. inference over the model
- Recall:

  Value: \log P_w(X = x) = \sum_i w_i n_i(x) - \log Z_w

  Gradient: \frac{\partial}{\partial w_i} \log P_w(X = x) = n_i(x) - E_w[n_i(x)]

  where n_i(x) is the number of true groundings of formula i in x

Page 10: Evaluation Measure

- Use pseudo-log-likelihood (R&D 2004) instead:

  \log P^*_w(X = x) = \sum_{l=1}^{n} \log P_w\big(X_l = x_l \mid MB_x(X_l)\big)

  (a sum over all ground predicates X_l, each conditioned on its Markov blanket MB_x(X_l))

- But this gives undue weight to predicates with a large number of groundings: e.g. a predicate with millions of groundings dominates one with only thousands

Page 11: Evaluation Measure

- Use the weighted pseudo-log-likelihood (WPLL):

  \mathrm{WPLL}(x) = \sum_{r \in R} c_r \sum_{k=1}^{g_r} \log P_w\big(X_{r,k} = x_{r,k} \mid MB_x(X_{r,k})\big)

  where R is the set of first-order predicates, g_r is the number of groundings of predicate r, and c_r = 1/g_r

- With c_r = 1/g_r, every first-order predicate contributes equally, regardless of its number of groundings
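
A minimal sketch of the reweighting in code, assuming a `cll(ground_atom)` helper that returns a ground atom's conditional log-likelihood given its Markov blanket:

```python
# WPLL sketch: each first-order predicate contributes the average CLL of its
# groundings (c_r = 1/g_r), so predicates with many groundings cannot
# dominate the score.
def wpll(groundings_by_predicate, cll):
    total = 0.0
    for pred, groundings in groundings_by_predicate.items():
        c_r = 1.0 / len(groundings)
        total += c_r * sum(cll(g) for g in groundings)
    return total
```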

Page 12: Algorithm

High-level algorithm:

Repeat
  Clauses <- FindBestClauses(MLN)
  Add Clauses to MLN
Until Clauses = ∅

FindBestClauses(MLN)
  Search for, and create, candidate clauses
  For each candidate clause c
    Compute gain in evaluation measure of adding c to MLN
  Return k clauses with highest gain

Page 13: Clause Construction

- Add a literal (negative/positive)
  - All possible ways the new literal's variables can be shared with those of the clause
  - e.g. !Student(S) v AdvBy(S,P)
- Remove a literal (when refining an MLN)
  - Removes spurious conditions from rules
  - e.g. !Student(S) v !YrInPgm(S,5) v TA(S,C) v TmpAdvBy(S,P)
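
A minimal sketch of the "add a literal" operator; the tuple encoding of literals, the fresh-variable pool, and the requirement that the new literal share at least one variable with the clause are illustrative assumptions:

```python
# Enumerate ways a new literal's variables can be shared with the clause;
# literals are (predicate, args, positive) tuples.
from itertools import product

def add_literal(clause_vars, predicate, arity, max_vars=5):
    fresh = [v for v in "ABCDEFGH" if v not in clause_vars]
    pool = sorted(clause_vars) + fresh[:max_vars - len(clause_vars)]
    for args in product(pool, repeat=arity):
        if set(args) & clause_vars:          # share >= 1 variable (assumption)
            for positive in (True, False):   # negative/positive literal
                yield (predicate, args, positive)

# e.g. extensions of !Student(S) with AdvBy/2: AdvBy(S,A), !AdvBy(S,S), ...
candidates = list(add_literal({"S"}, "AdvBy", 2))
```

The `max_vars` cap mirrors the limit on distinct variables mentioned on the next slide.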

Page 14: Clause Construction

- Flip signs of literals (when refining an MLN)
  - Moves literals that are on the wrong side of an implication
  - e.g. !CseQtr(C1,Q1) v !CseQtr(C2,Q2) v !SameCse(C1,C2) v !SameQtr(Q1,Q2)
  - Done at the beginning of the algorithm; expensive, so optional
- Limit the # of distinct variables to restrict the search space

Page 15: Algorithm

High-level algorithm:

Repeat
  Clauses <- FindBestClauses(MLN)
  Add Clauses to MLN
Until Clauses = ∅

FindBestClauses(MLN)
  Search for, and create, candidate clauses
  For each candidate clause c
    Compute gain in evaluation measure of adding c to MLN
  Return k clauses with highest gain

Page 16: Search Strategies

Shortest-first search (SFS), on a candidate set of length-2 clauses (e.g. !AdvBy(S,P) v Stu(S)):

1. Find the gain of each clause
2. Sort the clauses by gain
3. Return the top 5 with positive gain
4. Add the 5 clauses to the MLN (wt1, !AdvBy(S,P); wt2, clause2; ...)
5. Retrain the weights of the MLN

Then repeat steps 1-2 on the candidate set ... (Yikes! All length-2 clauses have gains ≤ 0)

Page 17: Shortest-First Search

When all length-2 clauses have gains ≤ 0:

a. Extend the 20 length-2 clauses with the highest gains, e.g. !AdvBy(S,P) v Stu(S) to !AdvBy(S,P) v Stu(S) v Prof(P)
b. Form a new candidate set
c. Keep the 1000 clauses with the highest gains

MLN: wt1, !AdvBy(S,P); wt2, clause2; ...

Page 18: Shortest-First Search

Shortest-first search (SFS):

- Repeat the process
- Extend all length-2 clauses before length-3 ones

MLN: wt1, clause1; wt2, clause2; ...

How do you refine a non-empty MLN?

Page 19: SFS - MLN Refinement

a. Extend the 20 length-2 clauses with the highest gains
b. Extend the length-2 clauses in the MLN
c. Remove a predicate from the length-4 clauses in the MLN
d. Flip signs of the length-3 clauses in the MLN (optional)
e. The clauses from b, c, d replace the original clauses in the MLN

MLN: wt1, !AdvBy(S,P); wt2, clause2; ...; wtA, clauseA; wtB, clauseB
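
A condensed Python sketch of the SFS loop; the constants 5, 20, and 1000 match the slides, while `gain`, `extend`, and `retrain` are stand-ins for the WPLL gain, single-literal extension, and weight relearning:

```python
# Shortest-first search sketch: exhaust one clause length before the next.
def sfs(candidates, mln, gain, extend, retrain,
        top_k=5, n_extend=20, pool_cap=1000):
    while candidates:
        ranked = sorted(candidates, key=lambda c: gain(mln, c), reverse=True)
        best = [c for c in ranked[:top_k] if gain(mln, c) > 0]
        if best:
            mln.extend(best)                 # add top clauses, retrain weights
            retrain(mln)
            candidates = [c for c in ranked if c not in best]
        else:
            # All clauses of this length have gain <= 0: extend the 20 best,
            # form a new candidate set, keep the 1000 highest-gain clauses.
            longer = [e for c in ranked[:n_extend] for e in extend(c)]
            candidates = sorted(longer, key=lambda c: gain(mln, c),
                                reverse=True)[:pool_cap]
    return mln
```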

Page 20: Search Strategies

Beam search:

1. Keep a beam of the 5 clauses with the highest gains
2. Track the best clause
3. Stop when the best clause does not change over two consecutive iterations

MLN: wt1, clause1; wt2, clause2; ...; wtA, clauseA; wtB, clauseB

How do you refine a non-empty MLN?
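
A condensed sketch of this beam-search loop, assuming clauses are hashable (e.g. tuples) and with `gain` and `extend` again as stand-ins:

```python
# Beam-search sketch: keep the 5 highest-gain clauses, stop when the best
# clause is unchanged for two consecutive iterations.
def beam_search(seeds, mln, gain, extend, beam_width=5, patience=2):
    beam = sorted(seeds, key=lambda c: gain(mln, c), reverse=True)[:beam_width]
    best, unchanged = beam[0], 0
    while unchanged < patience:
        pool = beam + [e for c in beam for e in extend(c)]
        beam = sorted(set(pool), key=lambda c: gain(mln, c),
                      reverse=True)[:beam_width]
        if beam[0] == best:
            unchanged += 1
        else:
            best, unchanged = beam[0], 0
    return best
```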

Page 21: Algorithm

High-level algorithm:

Repeat
  Clauses <- FindBestClauses(MLN)
  Add Clauses to MLN
Until Clauses = ∅

FindBestClauses(MLN)
  Search for, and create, candidate clauses
  For each candidate clause c
    Compute gain in evaluation measure of adding c to MLN
  Return k clauses with highest gain

Page 22: Difference from CRF Feature Induction

CRF feature induction (McCallum 2003):
- Set of atomic features (word=the, capitalized, etc.)
- Start from an empty CRF
- While the convergence criterion is not met:
  - Create a list of new features: atomic features, binary conjunctions of atomic features, conjunctions of atomic features with features already in the model
  - Evaluate the gain in P(y|x) of adding each feature to the model
  - Add the best K features to the model (100s-1000s of features)

Differences in our algorithm:
- We can refine a non-empty MLN
- We use pseudo-likelihood, with different optimizations
- Applicable to arbitrary Markov networks (not only linear chains)
- Maintain a separate candidate set
- Add the best ≈10s of clauses to the model
- Flexible enough to fit into different search algorithms

Page 23: Overview

- Introduction: CLAUDIEN, CRFs
- Algorithm
  - Evaluation Measure
  - Clause Construction
  - Search Strategies
- Speedup Techniques
- Experiments

Page 24: Speedup Techniques

Recall:

FindBestClauses(MLN)
  Search for, and create, candidate clauses
  For each candidate clause c
    Compute gain in WPLL of adding c to MLN
  Return k clauses with highest gain

- LearnWeights(MLN+c) optimizes the WPLL with L-BFGS
- L-BFGS computes the value and gradient of the WPLL
- There are many candidate clauses, so it is important to compute the WPLL and its gradient efficiently

Page 25: Speedup Techniques

- WPLL is a sum of CLLs (conditional log-likelihoods), one per ground predicate:

  \mathrm{WPLL}(x) = \sum_{r \in R} c_r \sum_{k=1}^{g_r} \log P_w\big(X_{r,k} = x_{r,k} \mid MB_x(X_{r,k})\big)

- When computing a ground predicate's CLL, ignore clauses in which its predicate does not appear (e.g. predicate l does not appear in clause 1)

Page 26: Speedup Techniques

- A ground predicate's CLL is affected only by the clauses that contain it
- Most clause weights do not change significantly, so most CLLs do not change much; we do not have to recompute all CLLs
- Store the WPLL and the CLLs
  - Recompute a CLL only if the weights affecting it change beyond some threshold
  - Subtract the old CLLs from, and add the new CLLs to, the WPLL
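
A minimal sketch of this caching scheme; the indexing helpers (`clauses_of`, `compute_cll`) are assumptions, and the per-predicate 1/g_r weighting is omitted for brevity:

```python
# CLL-cache sketch: store each ground atom's CLL; recompute only when the
# weight of some clause containing it moved past a threshold, then patch the
# WPLL by subtracting the old CLL and adding the new one.
def update_wpll(atoms, clauses_of, compute_cll, cache, wpll,
                old_w, new_w, threshold=1e-3):
    for a in atoms:
        moved = any(abs(new_w[i] - old_w[i]) > threshold
                    for i in clauses_of(a))
        if moved or a not in cache:
            new_cll = compute_cll(a, new_w)
            wpll += new_cll - cache.get(a, 0.0)   # subtract old, add new
            cache[a] = new_cll
    return wpll
```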

Page 27: Speedup Techniques

- WPLL is a sum over all ground predicates
- Estimate the WPLL by uniformly sampling the groundings of each FO predicate
  - Sample x% of the # of groundings, subject to a minimum and maximum
  - Extrapolate from the average
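
A minimal sketch of the subsampled estimate; `cll` is assumed as before, and with c_r = 1/g_r each predicate simply contributes its mean sampled CLL:

```python
# Estimate each predicate's mean CLL from a uniform sample of its groundings.
import random

def estimated_wpll(groundings_by_pred, cll, frac=0.05, lo=100, hi=10000):
    total = 0.0
    for pred, groundings in groundings_by_pred.items():
        n = len(groundings)
        k = min(max(int(frac * n), lo), min(hi, n))  # x% subject to min, max
        sample = random.sample(groundings, k)
        mean_cll = sum(cll(g) for g in sample) / k
        # With c_r = 1/g_r, predicate r contributes its mean CLL directly.
        total += mean_cll
    return total
```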

Page 28: Speedup Techniques

- The WPLL and its gradient require computing the # of true groundings of a clause, a #P-complete problem
- Karp & Luby (1983)'s Monte Carlo algorithm
  - Gives an estimate that is within ε of the true value with probability 1 - δ
  - Draws samples of the clause's groundings
- Found that the estimate converges faster than the algorithm specifies
  - Use a convergence test (DeGroot & Schervish 2002) after every 100 samples
  - Earlier termination
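
A sketch of Monte Carlo counting with an early-stopping check every 100 samples. Note this is a plain uniform sampler with a simple standard-error test, not the full Karp & Luby scheme or the paper's exact convergence test; `random_grounding` and `clause_true` are assumptions:

```python
# Estimate the # of true groundings of a clause by uniform sampling, with a
# convergence check every 100 samples for early termination.
import math, random

def estimate_true_groundings(n_total, random_grounding, clause_true,
                             rel_eps=0.05, max_samples=100_000):
    true_count, p = 0, 0.0
    for n in range(1, max_samples + 1):
        true_count += clause_true(random_grounding())
        if n % 100 == 0:                      # convergence test every 100 samples
            p = true_count / n
            if p > 0 and math.sqrt(p * (1 - p) / n) < rel_eps * p:
                break                         # relative precision reached
    return p * n_total                        # extrapolate to all groundings
```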

Page 29: Speedup Techniques

- L-BFGS is used to learn the clause weights that optimize the WPLL
- Two parameters:
  - Maximum number of iterations
  - Convergence threshold
- Use a smaller maximum # of iterations and a looser convergence threshold when evaluating a candidate clause's gain
  - Faster termination
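
A sketch of the two settings using SciPy's L-BFGS-B (the paper does not use SciPy; `neg_wpll`, returning the negated WPLL and its gradient, is an assumption):

```python
# Rough fits (for candidate scoring) use few iterations and a loose
# tolerance; the final fit uses tight settings.
from scipy.optimize import minimize

def fit_weights(neg_wpll, w0, rough=False):
    opts = ({"maxiter": 10, "ftol": 1e-3} if rough
            else {"maxiter": 1000, "ftol": 1e-9})
    return minimize(neg_wpll, w0, jac=True,
                    method="L-BFGS-B", options=opts).x
```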

Page 30: Speedup Techniques

- Lexicographic ordering on clauses
  - Avoids redundant computation for clauses that are syntactically identical
  - Does not detect semantically identical but syntactically different clauses (an NP-complete problem)
- Cache new clauses
  - Avoids recomputation
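
A minimal sketch of the lexicographic cache key; literals are assumed to be comparable tuples:

```python
# Sorting a clause's literals gives a canonical form, so syntactically
# identical clauses collide in the cache; semantically identical but
# syntactically different clauses are (deliberately) not detected.
def canonical_key(clause):
    return tuple(sorted(clause))

cache = {}

def cached_gain(clause, compute_gain):
    key = canonical_key(clause)
    if key not in cache:           # evaluate each distinct clause only once
        cache[key] = compute_gain(clause)
    return cache[key]
```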

Page 31: Speedup Techniques

Also used R&D (2004) techniques for the WPLL gradient:
- Ignore predicates that do not appear in the i-th formula
- Ignore ground formulas whose truth value is unaffected by changing the truth value of any single literal
- The # of true groundings of a clause is computed once and cached

Page 32: Overview

- Introduction: CLAUDIEN, CRFs
- Algorithm
  - Evaluation Measure
  - Clause Construction
  - Search Strategies
  - Speedup Techniques
- Experiments

Page 33: Experiments

UW-CSE domain:
- 22 predicates, e.g. AdvisedBy, Professor, etc.
- 10 types, e.g. Person, Course, Quarter, etc.
- Total # of ground predicates: about 4 million
- # of true ground predicates (in DB): 3212
- Handcrafted KB with 94 formulas, e.g.:
  - Each student has at most one advisor
  - If a student is an author of a paper, so is her advisor

Page 34: Experiments

Cora domain:
- 1295 citations to 112 CS research papers
- Author, Venue, Title, Year fields
- 5 predicates, viz. SameCitation, SameAuthor, SameVenue, SameTitle, SameYear
- Evidence predicates, e.g. WordsInCommonInTitle20%(title1, title2)
- Total # of ground predicates: about 5 million
- # of true ground predicates (in DB): 378,589
- Handcrafted KB with 26 clauses, e.g.:
  - If two citations are the same, then they have the same authors, titles, etc., and vice versa
  - If two titles have many words in common, then they are the same, etc.

Page 35: Systems

- MLN(KB): weight-learning applied to the handcrafted KB
- MLN(CL): structure-learning with CLAUDIEN; then weight-learning
- MLN(KB+CL): structure-learning with CLAUDIEN, using the handcrafted KB as its language bias; then weight-learning
- MLN(SLB): structure-learning with beam search, starting from an empty MLN
- MLN(KB+SLB): ditto, starting from the handcrafted KB
- MLN(SLB+KB): structure-learning with beam search, starting from an empty MLN, allowing handcrafted clauses to be added in a first search step
- MLN(SLS): structure-learning with SFS, starting from an empty MLN

Page 36: Systems

- CL: CLAUDIEN alone
- KB: handcrafted KB alone
- KB+CL: CLAUDIEN with the KB as its language bias
- NB: naive Bayes
- BN: Bayesian networks

Page 37: Methodology

- UW-CSE domain
  - DB divided into 5 areas: AI, graphics, languages, systems, theory
  - Leave-one-out testing by area
- Cora domain
  - 5 different train-test splits
- Measured:
  - Average CLL of the predicates
  - Average area under the precision-recall curve of the predicates (AUC)

Page 38: Results

MLN(SLS) and MLN(SLB) are better than MLN(CL), MLN(KB), CL, KB, NB, and BN.

[Bar charts: AUC and CLL (negated) by system, UW-CSE domain]

Page 39: Results

MLN(SLS) and MLN(SLB) are better than MLN(CL), MLN(KB), CL, KB, NB, and BN.

[Bar charts: AUC and CLL (negated) by system, Cora domain]

Page 40: Results

MLN(SLB+KB) is better than MLN(KB+CL) and KB+CL.

[Bar charts: AUC and CLL (negated) by system, UW-CSE domain]

Page 41: Results

MLN(SLB+KB) is better than MLN(KB+CL) and KB+CL.

[Bar charts: AUC and CLL (negated) by system, Cora domain]

Page 42: Results

MLN(<system>) does better than the corresponding <system>.

[Bar charts: AUC and CLL (negated) by system, UW-CSE domain]

Page 43: Results

MLN(<system>) does better than the corresponding <system>.

[Bar charts: AUC and CLL (negated) by system, Cora domain]

Page 44: Results

- MLN(SLS) on UW-CSE, on a cluster of 15 dual-CPU 2.8 GHz Pentium 4 machines:
  - With speedups: 5.3 hrs
  - Without speedups: did not finish running in 24 hrs
- MLN(SLB) on UW-CSE, on a single 2.8 GHz Pentium 4 machine:
  - With speedups: 8.8 hrs
  - Without speedups: 13.7 hrs

Page 45: Future Work

- Speeding up the counting of the # of true groundings of a clause
- Probabilistically bounding the loss in accuracy due to subsampling
- Probabilistic predicate discovery

Page 46: Conclusion

Developed an algorithm that:
- Learns FO clauses by directly optimizing pseudo-likelihood
- Is fast enough
- Learns better structure than R&D, pure ILP, purely probabilistic, and purely KB approaches