
Page 1

Learning Relationships Defined by Linear Combinations of Constrained Random Walks

William W. Cohen
Machine Learning Department and Language Technologies Institute
School of Computer Science, Carnegie Mellon University

Joint work with:
Ni Lao, Language Technologies Institute
Tom Mitchell, Machine Learning Department

Page 2

Motivation: The simple and the complex

•  In computer science there is a tension between:
   –  the elegant, simple, and general
   –  the messy, complex, and problem-specific
•  Graphs are:
   –  Simple: so they are easy to analyze and store
   –  General: so
      •  they appear in many contexts
      •  they are often a natural representation of important aspects of information
   –  Well-understood: for instance,
      •  standard techniques like PPR/RWR exist for estimating the similarity of two nodes in a graph

Page 3

Motivation: The simple and the complex

•  The real world is complex…
•  … learning is a way to incorporate that complexity in our models without sacrificing elegance and generality

Page 4

Motivation: The simple and the complex

•  Graphs are:
   –  Simple: so they are easy to analyze and store
   –  General
   –  Well-understood: for instance,
      •  standard techniques like PPR/RWR exist for estimating the similarity of two nodes in a graph
•  In this talk:
   –  Learning similarity-like relationships in graphs, based on RWR/PPR
   –  Several distinct applications
   –  Focus on recent work – [Lao & Cohen EMNLP 2010; Lao & Cohen KDD 2010; Lao, Mitchell & Cohen EMNLP 2011] vs. [Minkov, Ng & Cohen SIGIR 2006; Cohen & Minkov BMC Bioinformatics 2006; Minkov & Cohen EMNLP 2008; etc.]

Page 5

Similarity Queries on Graphs

1) Given type t* and node x in G, find y : T(y)=t* and y~x.
2) Given type t* and node set X, find y : T(y)=t* and y~X.

•  Nearest-neighbor classification (a toy sketch follows below):
   –  G contains feature nodes and instance nodes
   –  A link (x,f) means feature f is true for instance x
   –  x* is a query instance; y~x* means y is likely of the same class as x*
•  Information retrieval:
   –  G contains word nodes and document nodes
   –  A link (w,d) means word w is in document d
   –  X is a set of keywords; y~X means y is likely to be relevant to X
•  Database retrieval:
   –  G encodes a database
   –  … ?
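To make the nearest-neighbor reduction concrete, here is a minimal sketch on toy data (the instance and feature names are hypothetical, not from the talk): instances link to the features that hold for them, and candidates y~x are scored by a uniform two-step walk instance → feature → instance.

```python
# Toy sketch: nearest-neighbor classification as a graph-similarity query.
# Hypothetical instances/features; scores y ~ x via a 2-step walk x -> f -> y.
from collections import defaultdict

inst_to_feat = {               # instance -> features that are true for it
    "x1": {"f_red", "f_round"},
    "x2": {"f_red", "f_square"},
    "x3": {"f_blue", "f_round"},
}
feat_to_inst = defaultdict(set)
for inst, feats in inst_to_feat.items():
    for f in feats:
        feat_to_inst[f].add(inst)

def similar_instances(x):
    """Score candidates y ~ x by a uniform two-step random walk x -> f -> y."""
    scores = defaultdict(float)
    feats = inst_to_feat[x]
    for f in feats:                       # step 1: pick a feature of x uniformly
        nbrs = feat_to_inst[f]
        for y in nbrs:                    # step 2: pick an instance having f uniformly
            scores[y] += (1.0 / len(feats)) * (1.0 / len(nbrs))
    scores.pop(x, None)                   # drop the query instance itself
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(similar_instances("x1"))            # x2 and x3 each share one feature with x1
```

The same scoring loop, run over a word-document graph instead of a feature-instance graph, gives the information-retrieval reading of y~X.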

Page 6

BANKS: Browsing and Keyword Search

•  Database is modeled as a graph
   –  Nodes = tuples
   –  Edges = references between tuples
      •  edges are directed and indicate foreign keys, inclusion dependencies, …

[Aditya et al, VLDB 2002]

(Example graph, not reproduced: a paper node “MultiQuery Optimization” connected by writes/author edges to the author nodes “S. Sudarshan” and “Prasan Roy”.)

Page 7

Query: {“sudarshan”, “roy”}   Answer: subtree from graph

(Same example graph: the paper “MultiQuery Optimization” connected by writes/author edges to “S. Sudarshan” and “Prasan Roy”.)

Page 8

Query: “sudarshan”, “roy”   Answer: subtree from graph

y : paper(y) & y~“sudarshan”   AND   y : paper(y) & y~“roy”

Page 9

Page 10

Similarity Queries on Graphs

1) Given type t* and node x in G, find y : T(y)=t* and y~x.
2) Given type t* and node set X, find y : T(y)=t* and y~X.

•  Nearest-neighbor classification
•  Information retrieval
•  Database retrieval
•  Evaluation: specific families of tasks for scientific publications:
   –  Citation recommendation for a paper: given the title, year, …, of paper p, what papers should be cited by p?
   –  Expert-finding: given keywords, genes, …, suggest a possible author
   –  “Entity recommendation”: given title, author, year, …, predict entities mentioned in a paper (e.g. gene-protein entities) – can improve NER
   –  Literature recommendation: given a researcher and a year, suggest papers to read that year
•  Inference in a DB of automatically-extracted facts

Core tasks in CS

Page 11

Outline

•  Motivation for Learning Similarity in Graphs
•  A Baseline Similarity Metric
•  Some Literature-related Tasks
•  The Path Ranking Algorithm (Learning Method)
   –  Motivation
   –  Details
•  Results: BioLiterature tasks
•  Results: KB Inference tasks

Page 12

Defining Similarity on Graphs: PPR/RWR

Given type t* and node x, find y : T(y)=t* and y~x.

•  Similarity defined by a “damped” version of PageRank
•  Similarity between nodes x and y:
   –  “Random surfer model”: from a node z,
      •  with probability α, teleport back to x (“reset”)
      •  else pick y′ uniformly from { y′ : z → y′ }
      •  repeat from node y′ …
   –  Similarity x~y = Pr( surfer is at y | reset is always to x )
•  Intuitively, x~y is the sum of the weights of all paths from x to y, where the weight of a path decreases exponentially with its length
•  Can easily extend to a “query” set X = {x1, …, xk}  (a minimal power-iteration sketch follows below)

[Personalized PageRank 1999]
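As a minimal sketch of the random-surfer definition above (toy adjacency lists, not the talk's data; the reset probability α and the iteration count are arbitrary choices), RWR/PPR can be approximated by power iteration:

```python
# Minimal RWR/PPR sketch by power iteration on a toy graph (hypothetical data).
# pi[y] approximates Pr(surfer is at y | reset is always to x), i.e. x ~ y.
ALPHA = 0.2  # reset (teleport) probability

graph = {            # node -> outgoing neighbors
    "x": ["a", "b"],
    "a": ["x", "b"],
    "b": ["x"],
}

def rwr(graph, x, alpha=ALPHA, iters=50):
    nodes = list(graph)
    pi = {v: (1.0 if v == x else 0.0) for v in nodes}
    for _ in range(iters):
        new = {v: (alpha if v == x else 0.0) for v in nodes}
        for z, nbrs in graph.items():
            for y in nbrs:
                # with probability (1 - alpha) follow a uniformly chosen out-edge
                new[y] += (1.0 - alpha) * pi[z] / len(nbrs)
        pi = new
    return pi

print(rwr(graph, "x"))   # larger values = more similar to x
```

Extending to a query set X just means resetting to a node drawn uniformly from X instead of always to x.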

Page 13

Some BioLiterature Retrieval Tasks

•  Data used in this study:
   –  Yeast: 0.2M nodes, 5.5M links
   –  Fly: 0.8M nodes, 3.5M links
   –  E.g. the fly graph (schema figure not reproduced)

Page 14

Learning Proximity Measures for BioLiterature Retrieval Tasks

•  Tasks:
   –  Gene recommendation: author, year → gene
   –  Reference recommendation: words, year → paper
   –  Expert-finding: words, genes → author
   –  Literature recommendation: author, [papers read in the past]
•  Baseline method:
   –  Typed RWR proximity methods
•  Baseline learning method:
   –  parameterize Prob(walk edge | edge label = L) and tune the parameters for each label L (somehow…)

(Schema figure not reproduced; it labels graph edges with per-label walk probabilities, e.g. P(L=cite) = a, P(write) = b, P(NE) = c, P(bindTo) = P(express) = d.)

Page 15

Path-based vs Edge-label based learning

•  Learning one parameter per edge label is limited because the context in which an edge label appears is ignored
   –  E.g. (observed from real data – task: find papers to read), two paths shown graphically in the original slide:
      •  “Don't read about genes I’ve already read about”
      •  “Do read papers from my favorite authors”
•  Instead, we will learn path-specific parameters
•  Paths will be interpreted as constrained random walks that give a similarity-like weight to every reachable node:
   –  Step 0: D0 = {a} – start at author a
   –  Step 1: D1: uniform over all papers p read by a
   –  Step 2: D2: authors a′ of papers in D1, weighted by the number of papers in D1 published by a′
   –  Step 3: D3: papers p′ published by a′, weighted by …
   –  …


Page 17

A Limitation of RWR Learning Methods

•  Learning one parameter per edge label is limited because the context in which an edge label appears is ignored
   –  E.g. (observed from real data – task: find papers to read), paths shown graphically in the original slide:
      •  “Don't read about genes I’ve already read about”
      •  “Do read papers from my favorite authors”
      •  “Do read about the genes I’m working on”
      •  “Don't read papers from my own lab”
•  Instead, we will learn path-specific parameters


Page 19

Path Constrained Random Walks as the Basis of a Proximity Measure

•  Our work (Lao & Cohen, ECML 2010):
   –  learn a weighted combination of simple “path experts”, each of which corresponds to a particular labeled path through the graph
•  Citation recommendation – an example:
   –  In the TREC-CHEM Prior Art Search Task, researchers found that it is more effective to first find patents about the topic, then aggregate their citations
   –  Our proposed model can discover this kind of retrieval scheme and assign proper weights to combine them, e.g. as weighted paths (path diagram not reproduced)

Page 20

Definitions

•  A graph G = (T, R, X, E) is:
   –  a set of entity types T = {T} and a set of relations R = {R}
   –  a set of entities (nodes) X = {x}, where each node x has a type from T
   –  a set of edges e = (x, y), where each edge has a relation label from R
•  A path P = (R1, …, Rn) is a sequence of relations
•  Path Constrained Random Walk:
   –  Given a query set S of “source” nodes
   –  Distribution D0 at time 0 is uniform over s in S
   –  Distribution Dt at time t > 0 is formed by:
      •  pick x from Dt-1
      •  pick y uniformly from all nodes related to x by an edge labeled Rt
   –  Notation: fP(s,t) = Prob(s→t; P)  (see the sketch below)
   –  In our examples the type of t will be determined by Rn
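A minimal sketch of the path-constrained random walk, on a hypothetical toy schema (the relation names and data below are made up for illustration): D_t is obtained from D_{t-1} by following only edges labeled R_t, uniformly over each node's R_t-neighbors.

```python
# Minimal PCRW sketch on a toy, hypothetical schema.
from collections import defaultdict

# edges[label][x] = nodes y such that edge (x, y) carries relation `label`
edges = {
    "read":      {"alice": ["paper1", "paper2"]},
    "writtenBy": {"paper1": ["bob"], "paper2": ["bob", "carol"]},
    "wrote":     {"bob": ["paper1", "paper2", "paper3"], "carol": ["paper2", "paper4"]},
}

def pcrw(sources, path):
    """Return the final distribution D_n, i.e. f_P(s, t) for the query set `sources`."""
    dist = {s: 1.0 / len(sources) for s in sources}        # D_0: uniform over S
    for rel in path:
        new = defaultdict(float)
        for x, p in dist.items():
            nbrs = edges.get(rel, {}).get(x, [])
            for y in nbrs:                                  # uniform over R_t-neighbors of x
                new[y] += p / len(nbrs)
        dist = dict(new)                                    # mass at dead ends is dropped
    return dist

# "papers written by authors of papers that alice has read"
print(pcrw({"alice"}, ["read", "writtenBy", "wrote"]))
```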

Page 21

Path Ranking Algorithm (PRA)

•  A PRA model scores a source–target node pair by a linear function of their path features (see the reconstruction below), where P is the set of all relation paths with length ≤ L (with support on the data, in some cases – see [Lao and Cohen EMNLP 2011])
•  For a relation R and a set of node pairs {(si, ti)}, we construct a training dataset D = {(xi, yi)}, where xi is a vector of all the path features for (si, ti), and yi indicates whether R(si, ti) is true or not
•  θ is estimated using L1/L2-regularized logistic regression

[Lao & Cohen, ECML 2010]
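The scoring function itself appeared as an equation image in the original slides; a reconstruction consistent with the definitions above (one weight θ_P per relation path, with f_P(s,t) as defined on the previous slide) is:

```latex
\mathrm{score}(s,t) \;=\; \sum_{P \in \mathcal{P}} \theta_P \, f_P(s,t)
```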

Page 22

Supervised PCRW Retrieval Model

•  A retrieval model ranks target entities by linearly combining the distributions of different paths
•  This model can be optimized by maximizing the probability of the observed relevance
   –  Given a set of training data D = {(q(m), A(m), y(m))}, y(m)(e) = 1/0

Page 23

Parameter Estimation (Details)

•  Given a set of training data:
   –  D = {(q(m), A(m), y(m))}, m = 1…M, y(m)(e) = 1/0
•  We can define a regularized objective function (sketched below)
•  Use the average log-likelihood as the objective om(θ):
   –  P(m) is the index set of relevant entities
   –  N(m) is the index set of irrelevant entities (how to choose them will be discussed later)
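The objective was shown as equations in the original slides; a reconstruction from the surrounding definitions (the exact placement of the regularization constants is an assumption) is:

```latex
O(\theta) \;=\; \sum_{m=1}^{M} o_m(\theta) \;-\; \lambda_1 \lVert\theta\rVert_1 \;-\; \frac{\lambda_2}{2}\,\lVert\theta\rVert_2^2
\qquad
o_m(\theta) \;=\; \frac{1}{|P^{(m)}|}\sum_{e \in P^{(m)}} \ln p^{(m)}_e
 \;+\; \frac{1}{|N^{(m)}|}\sum_{e \in N^{(m)}} \ln\!\bigl(1 - p^{(m)}_e\bigr)
\qquad
p^{(m)}_e = \sigma\!\bigl(\theta^{\top}\mathbf{x}^{(m)}_e\bigr)
```

where x(m)_e collects the path features fP(q(m), e) for candidate entity e.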

Page 24

Parameter Estimation (Details)

•  Selecting the negative entity set N(m):
   –  Few positive entities vs. thousands (or millions) of negative entities?
   –  First sort all the negative entities with an initial model (uniform weight 1.0)
   –  Then take the negative entities at the k(k+1)/2-th positions, for k = 1, 2, …  (see the sketch below)
•  The gradient (equation not reproduced)
•  Use orthant-wise L-BFGS (Andrew & Gao, 2007) to estimate θ:
   –  efficient, and can deal with L1 regularization
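A minimal sketch of the negative-selection step, under the stated reading of “the k(k+1)/2-th position”: negatives ranked highly by the initial model are sampled densely, and low-ranked ones sparsely.

```python
# Minimal sketch: pick negatives at 1-based positions k(k+1)/2 = 1, 3, 6, 10, ...
# from the list of negative entities ranked by an initial (uniform-weight) model.
def select_negatives(ranked_negatives):
    selected, k, pos = [], 1, 1
    while pos <= len(ranked_negatives):
        selected.append(ranked_negatives[pos - 1])
        k += 1
        pos = k * (k + 1) // 2
    return selected

print(select_negatives([f"e{i}" for i in range(1, 30)]))
# -> entities at positions 1, 3, 6, 10, 15, 21, 28
```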

Page 25

L2 Regularization

•  Improves retrieval quality
   –  on the citation recommendation task (plot not reproduced)

Page 26

L1 Regularization

•  Does not improve retrieval quality…

Page 27

L1 Regularization

•  … but can help reduce the number of features

Page 28

Extension 1: Query-Independent Paths

•  PageRank (and other query-independent rankings):
   –  assign an importance score (query-independent) to each web page
   –  later combined with a relevance score (query-dependent)
•  We generalize PageRank to heterogeneous graphs:
   –  We add to each query a special entity e0 of a special type T0
   –  T0 is related to all other entity types, and each type is related to all instances of that type
   –  This defines a set of PageRank-like, query-independent relation paths
   –  Compute f(*→t; P) offline for efficiency
•  Example (diagram not reproduced): query-independent paths that pick out well-cited papers and productive authors, via the “all papers” and “all authors” nodes

Page 29

Extension 2: Entity-specific rankings

•  There are entity-specific characteristics which cannot be captured by a general model
   –  Some items are interesting to users because of features not captured in the data
   –  To model this, assume the identity of the entity matters
   –  Introduce new features f(s→t; Ps,t) to account for jumping from s to t, and new features f(*→t; P*,t)
   –  At each gradient step, add a few new features of this sort with the highest gradient, and count on regularization to avoid overfitting

Page 30

Extension 3: Speeding up random walks

•  Prior work on speeding up personalized PageRank/RWR:
   –  Pre-computing components (e.g. Jeh & Widom 2003)
   –  Sampling-based approaches (e.g. Fogaras et al, 2005)
   –  Pre-clustering data (e.g. Tong et al 2006)
   –  Pruning approaches (e.g. Andersen et al, 2006)
•  We use a hybrid sampling/pruning approach (“weighted particle filtering” + “low-variance sampling”) – see the sketch below
   –  Same approximation used at training and test time
   –  Speedups of up to 10-100x with little loss (sometimes some gain!) in performance

[Lao and Cohen, KDD 2010]
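As a rough illustration of the pruning idea only (not the actual KDD 2010 algorithm), a walk distribution can be kept sparse by dropping tiny weights, capping the number of retained “particles”, and renormalizing after each step:

```python
# Rough illustration only: bound the support of a walk distribution so each
# step of a (path-constrained) walk touches a limited number of nodes.
def prune(dist, max_particles=1000, min_weight=1e-4):
    kept = {n: w for n, w in dist.items() if w >= min_weight}
    if len(kept) > max_particles:
        kept = dict(sorted(kept.items(), key=lambda kv: -kv[1])[:max_particles])
    total = sum(kept.values())
    return {n: w / total for n, w in kept.items()} if total > 0 else {}

print(prune({"a": 0.5, "b": 0.3, "c": 1e-6}))   # "c" is dropped; a, b renormalized
```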

Page 31

Ext. 2: Popular Entities

•  For a task with query type T0 and target type Tq:
   –  introduce a bias θe for each entity e in IE(Tq)
   –  introduce a bias θe′,e for each entity pair (e′, e) where e is in IE(Tq) and e′ is in IE(T0)
•  These biases are then added to the score (equation and matrix form not reproduced; a reconstruction is sketched below)
•  Efficiency consideration:
   –  only add to the model the top J parameters (measured by |∂O(θ)/∂θe|) at each L-BFGS iteration
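The biased score appeared as an equation (plus a matrix form) in the original slides; a plausible reconstruction, treating the per-entity and per-pair biases as extra terms added to the PRA score, is:

```latex
\mathrm{score}(s,t) \;=\; \sum_{P \in \mathcal{P}} \theta_P\, f_P(s,t) \;+\; \theta_t \;+\; \theta_{s,t}
```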

Page 32

Experiment Setup for BioLiterature

•  Data sources for bio-informatics:
   –  PubMed: on-line archive of over 18 million biological abstracts
   –  PubMed Central (PMC): full-text copies of over 1 million of these papers
   –  Saccharomyces Genome Database (SGD): a database for yeast
   –  Flymine: a database for fruit flies
•  Tasks:
   –  Gene recommendation: author, year → gene
   –  Venue recommendation: genes, title words → journal
   –  Reference recommendation: title words, year → paper
   –  Expert-finding: title words, genes → author
•  Data split:
   –  2000 training, 2000 tuning, 2000 test
•  Time-variant graph:
   –  each edge is tagged with a time stamp (year)
   –  during the random walk, only consider edges that are earlier than the query (see the sketch below)

Page 33

BioLiterature: Some Results

•  Compare the MAP of PRA to:
   –  the RWR model
   –  query-independent paths (qip)
   –  popular-entity biases (pop)

(Results table not reproduced.) Except the cells marked †, all improvements are statistically significant at p < 0.05 using a paired t-test.

Page 34

Example Path Features and their Weights

•  A PRA+qip+pop model trained for the citation recommendation task on the yeast data (the learned paths and weights were shown in a table, not reproduced; its annotations follow):
   1) papers co-cited with on-topic papers
   6) approx. standard IR retrieval
   7,8) papers cited during the past two years
   9) well-cited papers
   10,11) key early papers about specific genes
   12,13) papers published during the past two years
   14) old papers

Page 35

Outline

•  Motivation for Learning Similarity in Graphs
•  A Baseline Similarity Metric
•  Some Literature-related Tasks
•  The Path Ranking Algorithm (Learning Method)
   –  Motivation
   –  Details
•  Results: BioLiterature tasks
•  Results: KB Inference tasks

[Lao, Mitchell, Cohen, EMNLP 2011]

Page 36

Large Scale Knowledge-Bases

•  Large-scale collections of automatically extracted knowledge:
   –  KnowItAll (Univ. Washington): 0.5B facts extracted from 0.1B web pages
   –  DBpedia (Univ. Leipzig): 3.5M entities, 0.7B facts extracted from Wikipedia
   –  YAGO (Max-Planck-Institute): 2M entities, 20M facts extracted from Wikipedia and WordNet
   –  FreeBase: 20M entities, 0.3B links, integrated from different data sources and human judgments
   –  NELL (Never-Ending Language Learning, CMU): 0.85M facts extracted from 0.5B webpages

Page 37

Inference in Noisy Knowledge Bases

•  Challenges:
   –  Robustness: extracted knowledge is incomplete and noisy
   –  Scalability: the size of the knowledge base is large

Page 38

The NELL Case Study

•  Never-Ending Language Learning: “a never-ending learning system that operates 24 hours per day, for years, to continuously improve its ability to read (extract structured facts from) the web” (Carlson et al., 2010)
•  Closed-domain, semi-supervised extraction
•  Combines multiple strategies: morphological patterns, textual context, HTML patterns, logical inference
•  Example beliefs (screenshot not reproduced)

Page 39

A Link Prediction Task

•  We consider 48 relations for which the NELL database has more than 100 instances
•  We create two link prediction tasks for each relation, e.g.:
   –  AthletePlaysInLeague(HinesWard, ?)
   –  AthletePlaysInLeague(?, NFL)
•  The actual nodes y known to satisfy R(x, ?) are treated as labeled positive examples, and all other nodes are treated as negative examples

Page 40

Current NELL method (baseline)

•  FOIL (Quinlan and Cameron-Jones, 1993) is a learning algorithm similar to decision trees, but for relational domains
•  NELL implements two assumptions for efficient learning:
   –  the predicates are functional – e.g. an athlete plays in at most one league
   –  only find clauses that correspond to bounded-length paths of binary relations – relational pathfinding (Richards & Mooney, 1992)


Page 41

Current NELL method (baseline)

•  FOL is not great for handling uncertainty
   –  FOIL can only combine rules with disjunctions, and therefore cannot leverage low-accuracy rules
   –  E.g. rules for teamPlaysSports: high accuracy but low recall (example rules not reproduced)

Page 42

Experiments – Cross-Validation on KB Data (for parameter setting, etc.)

(Results chart not reproduced.) RWR: Random Walk with Restart (PPR). †Paired t-tests give p-values of 7×10⁻³, 9×10⁻⁴, 9×10⁻⁸, and 4×10⁻⁴.

Page 43

Example Paths

(Path table not reproduced; one annotation notes paths that find synonyms of the query team.)

Page 44

Evaluation by Mechanical Turk

•  There are many test queries per predicate:
   –  all entities of a predicate’s domain/range, e.g. WorksFor(person, organization)
   –  on average, 7,000 test queries for each functional predicate and 13,000 for each non-functional predicate
•  Sampled evaluation:
   –  we only evaluate the top-ranked result for each query
   –  we sort the queries for each predicate according to the scores of their top-ranked results, and then evaluate precision at the top 10, 100, and 1000 queries
•  Each belief is voted on by 5 workers:
   –  workers are given assertions like “Hines Ward plays for the team Steelers”, as well as Google search links for each entity

Page 45

Evaluation by Mechanical Turk

•  On the 8 functional predicates where N-FOIL can successfully learn:
   –  PRA is comparable to N-FOIL for p@10, but has significantly better p@100
•  On 8 randomly sampled non-functional (one-to-many) predicates:
   –  slightly lower accuracy than on the functional predicates

                               N-FOIL                      PRA
  Task                         #Rules     p@10   p@100     #Paths   p@10   p@100
  Functional predicates        2.1(+37)   0.76   0.380     43       0.79   0.668
  Non-functional predicates    ----       ----   ----      92       0.65   0.620

PRA: Path Ranking Algorithm

Page 46

Outline

•  Motivation for Learning Similarity in Graphs
•  A Baseline Similarity Metric
•  Some Literature-related Tasks
•  The Path Ranking Algorithm (Learning Method)
   –  Motivation
   –  Details
•  Results: BioLiterature tasks
•  Results: KB Inference tasks

[Lao, Mitchell, Cohen, EMNLP 2011]

Page 47

Outline

•  Motivation for Learning Similarity in Graphs
•  A Baseline Similarity Metric
•  Some Literature-related Tasks
•  The Path Ranking Algorithm (Learning Method)
   –  Motivation
   –  Details
•  Results: BioLiterature tasks
•  Results: KB Inference tasks
•  Conclusions

Page 48

Summary/Conclusion

•  Learning is the way to make a clean, elegant formulation of a task work in the messy, complicated real world
•  Learning how to navigate graphs is a significant, core task that models:
   –  recommendation, expert-finding, …
   –  information retrieval
   –  inference in KBs
   –  …
•  It includes significant, core learning problems:
   –  regularization/search over a huge feature space
   –  discovery: long paths, lexicalized paths, …
   –  incorporating knowledge of graph structure …
   –  …

Page 49


•  Thanks to:
   –  The dedicated and persistent
   –  NSF grant IIS-0811562
   –  NIH grant R01GM081293
   –  Gifts from Google
   –  MLG Organizers!