Learning Probabilistic Models of Link Structure

Oregon State University – CS539 PRMs Learning Probabilistic Models of Link Structure Getoor, Friedman, Koller, Taskar

Page 1: Learning Probabilistic Models of Link Structure

Learning Probabilistic Models of Link Structure

Getoor, Friedman, Koller, Taskar

Page 2: Learning Probabilistic Models of Link Structure

Example Application: WebKB

Classify each web page as course, student, professor, project, or none, using:
- Words on the web page
- Links from other web pages (and the class of those pages, recursively)
- Words in the "anchor text" of the link on the other page: <a href="url">anchor text</a>

Web pages were obtained from Cornell, Texas, Washington, and Wisconsin.

Page 3: Learning Probabilistic Models of Link Structure

Example Application: CORA

Classify documents according to topic (7 levels) using:
- Words in the document
- Papers cited by the document
- Papers citing the document

Page 4: Learning Probabilistic Models of Link Structure

Standard PRM

parents(Doc.class) = {MODE(Doc.citers.class),MODE(Doc.cited.class)}

[Figure: Document objects, each with class and words attributes; the classes of a document's citers and of its cited documents are each aggregated with MODE.]
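The MODE aggregation in the parent set above can be sketched as follows. This is a minimal illustration, not the authors' code: the document ids, class labels, and the helper names `mode` and `parent_values` are all hypothetical.

```python
from collections import Counter

def mode(values):
    """Return the most common value, or None for an empty collection."""
    counts = Counter(values)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical toy data: document id -> class label.
doc_class = {
    "d1": "Theory", "d2": "Theory", "d3": "Learning",
    "d4": "Graphics", "d5": "Learning",
}
# Hypothetical fixed citation structure for one document, d0.
citers = {"d0": ["d1", "d2", "d3"]}   # documents that cite d0
cited = {"d0": ["d3", "d4", "d5"]}    # documents that d0 cites

def parent_values(doc):
    """Values of the two parents of doc.class: MODE over citers, MODE over cited."""
    return (
        mode(doc_class[d] for d in citers.get(doc, [])),
        mode(doc_class[d] for d in cited.get(doc, [])),
    )

print(parent_values("d0"))
```

Whatever the number of linked documents, Doc.class sees only two aggregated values, which keeps the CPT small.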

Page 5: Learning Probabilistic Models of Link Structure

Problem: The Citation Structure is Fixed

The existence (or non-existence) of a link cannot serve as evidence about a document's class.

Individual linked papers influence the class only through the MODE.

Page 6: Learning Probabilistic Models of Link Structure

Possible Solution: Link Uncertainty

Model the existence of links as random variables

Create a Link instance for each pair of possibly-linked objects
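A minimal sketch of this construction, under assumptions: the paper ids are hypothetical, every ordered pair of distinct papers counts as "possibly linked", and self-citations are excluded. Each potential link gets an Exists variable, which is observed as true or false in the training data.

```python
from itertools import permutations

# Hypothetical paper ids.
papers = ["p1", "p2", "p3"]

# One potential Cites instance per ordered (citing, cited) pair.
# Assumption: a paper cannot cite itself.
potential_links = list(permutations(papers, 2))

# Hypothetical observed citations.
observed = {("p1", "p2"), ("p3", "p2")}

# The Exists variable of each potential link.
exists = {link: (link in observed) for link in potential_links}

for link, value in exists.items():
    print(link, value)
```

With n papers this creates n(n-1) Exists variables, so the absence of a link becomes evidence just as its presence does.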

Page 7: Learning Probabilistic Models of Link Structure

Unrolled Network

[Figure: unrolled network with three Document objects (class, words) and three Cites objects, each with an Exists variable.]

Page 8: Learning Probabilistic Models of Link Structure

Getoor’s Diagram

- Entity classes (Paper)
- Relation classes (Cites)
- Technically, every instance has an Exists variable, which is true for all Entity instances.

Page 9: Learning Probabilistic Models of Link Structure

Semantics

- P is the basic CPT
- P* will be the equivalent unrolled CPT
- Require that an object does not exist if any of the objects it points to does not exist
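One way to read the constraint above is as a wrapper around the basic CPT. The sketch below is an interpretation of the slide, not the authors' definition; the function name `p_star_exists` and its arguments are hypothetical.

```python
def p_star_exists(p_exists, referenced_exist):
    """Sketch of P*(x.Exists = true | parents).

    p_exists: the basic CPT value P(x.Exists = true | parents).
    referenced_exist: Exists values of the objects that x points to.
    """
    # An object cannot exist if any object it points to does not exist.
    if not all(referenced_exist):
        return 0.0
    # Otherwise fall back to the basic CPT.
    return p_exists

# A Cites link between two existing papers keeps its CPT value.
print(p_star_exists(0.3, [True, True]))
# A Cites link that points to a non-existent paper cannot exist.
print(p_star_exists(0.3, [True, False]))
```

An entity that points to no other object passes an empty list, so the basic CPT value is returned unchanged.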

Page 10: Learning Probabilistic Models of Link Structure

WebKB Network

Page 11: Learning Probabilistic Models of Link Structure

Experimental Results

Cora and WebKB

Page 12: Learning Probabilistic Models of Link Structure

WebKB with various features

Page 13: Learning Probabilistic Models of Link Structure

A Second Approach: Reference Uncertainty

- Treat reference attributes as random variables
- Each reference attribute takes as its value an object of the indicated class
- Citation
  - Citing: reference attribute, value is a Paper
  - Cited: reference attribute, value is a Paper

Page 14: Learning Probabilistic Models of Link Structure

Problems

- How many citation objects exist? Consequently, how many reference random variables exist?
- How do we represent P(Citation.cites | …)?
  - Citation.cites could take on thousands of possible values
  - Huge conditional probability table
  - Costly inference at run time

Page 15: Learning Probabilistic Models of Link Structure

Solutions

Problem 1: How many citations?

- Fix the number of Citation objects
- This gives the "object skeleton"

Page 16: Learning Probabilistic Models of Link Structure

Problem 2: Too many potential values for a reference attribute

- Attach to each reference attribute a set of partition attributes
- The reference attribute chooses a partition
- A Paper is then chosen uniformly at random from the partition

[Figure: a Citation object with Citing and Cited reference attributes; the Paper objects are grouped into Theory, Graphics, and Learning partitions.]
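The two-stage choice described on this slide can be sketched as below. Everything concrete here is an assumption made for illustration: the paper ids, the partition sizes, the probabilities in `partition_probs`, and the function name `sample_reference`.

```python
import random

# Hypothetical partitions of Paper objects by topic.
partitions = {
    "Theory": ["t1", "t2", "t3"],
    "Graphics": ["g1", "g2"],
    "Learning": ["l1", "l2", "l3", "l4"],
}
# Hypothetical distribution over partitions (one entry per partition,
# rather than one entry per paper).
partition_probs = {"Theory": 0.2, "Graphics": 0.1, "Learning": 0.7}

def sample_reference(rng):
    """Choose a partition, then a Paper uniformly at random within it."""
    names = list(partition_probs)
    weights = [partition_probs[name] for name in names]
    chosen = rng.choices(names, weights=weights, k=1)[0]
    paper = rng.choice(partitions[chosen])
    return chosen, paper

rng = random.Random(0)
partition, paper = sample_reference(rng)
print(partition, paper)
```

The distribution that has to be learned has three entries here instead of nine, and its size depends on the number of partitions rather than on the number of papers.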

Page 17: Learning Probabilistic Models of Link Structure

Representing Constraints Between Citing and Cited Papers

Parents(Cites.Cited) = {Cites.Citing.Topic}
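As a rough sketch of what this dependency buys, suppose the cited paper's partition is chosen conditionally on the citing paper's topic and the paper is then drawn uniformly from that partition. The table `selector_cpd`, the sizes in `partition_size`, and the function `p_cited` are hypothetical values and names chosen for illustration.

```python
# Hypothetical CPD: P(partition of cited paper | Citing.Topic).
selector_cpd = {
    "Theory":   {"Theory": 0.80, "Graphics": 0.05, "Learning": 0.15},
    "Graphics": {"Theory": 0.10, "Graphics": 0.80, "Learning": 0.10},
    "Learning": {"Theory": 0.25, "Graphics": 0.05, "Learning": 0.70},
}
# Hypothetical number of papers in each partition.
partition_size = {"Theory": 3, "Graphics": 2, "Learning": 4}

def p_cited(paper_topic, citing_topic):
    """P(Cited = one specific paper | Citing.Topic), assuming a uniform
    choice among the papers inside the selected partition."""
    return selector_cpd[citing_topic][paper_topic] / partition_size[paper_topic]

# Probability that a Theory paper cites one particular Theory paper.
print(p_cited("Theory", "Theory"))
```

Under these assumed numbers, a Theory paper is far more likely to cite any given Theory paper than any given Graphics paper, which is the kind of constraint the parent set expresses.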

Page 18: Learning Probabilistic Models of Link Structure

Details

Each reference attribute has a selector attribute S that chooses the partition.

[Figure: a Citation object with selector attributes S_Citing and S_Cited and reference attributes Citing and Cited; the Paper objects are grouped into Learning, Theory, and Graphics partitions.]

Page 19: Learning Probabilistic Models of Link Structure

Class-level Dependency Graph

Five types of edges:
- Type I: edges within a single object
- Type II: edges between objects
- Type III: edges from every reference attribute along a reference path to the attribute that depends on a parent reached through that path
- Type IV: edges from every partition attribute to the selector attributes that use those partition attributes to choose a partition
- Type V: edges from selector attributes to their corresponding reference attributes

Page 20: Learning Probabilistic Models of Link Structure

Movie Theater Example

- Type I: Genre → Popularity
- Type II: Shows.Movie.Genre → Shows.Profit; Shows.Theater.Type → S_Movie
- Type III: Movie → Profit; Theater → S_Movie
- Type IV: Genre → S_Movie
- Type V: S_Theater → Theater; S_Movie → Movie
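The edges listed above can be collected into a small graph and checked for cycles, which is a common sanity check on a dependency graph. The sketch below makes assumptions: the class prefixes on the node names (Movie., Theater., Shows.) are filled in for illustration, and `is_acyclic` is a hypothetical helper using Kahn's algorithm.

```python
from collections import defaultdict, deque

# Edges of the movie theater example, grouped by type.
# Assumption: node names are qualified with their class for clarity.
edges = {
    "I":   [("Movie.Genre", "Movie.Popularity")],
    "II":  [("Movie.Genre", "Shows.Profit"),
            ("Theater.Type", "Shows.S_Movie")],
    "III": [("Shows.Movie", "Shows.Profit"),
            ("Shows.Theater", "Shows.S_Movie")],
    "IV":  [("Movie.Genre", "Shows.S_Movie")],
    "V":   [("Shows.S_Theater", "Shows.Theater"),
            ("Shows.S_Movie", "Shows.Movie")],
}

def is_acyclic(edge_list):
    """Kahn's algorithm: True if every node can be topologically ordered."""
    indegree = defaultdict(int)
    children = defaultdict(list)
    nodes = set()
    for src, dst in edge_list:
        children[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return visited == len(nodes)

all_edges = [edge for group in edges.values() for edge in group]
print(len(all_edges), is_acyclic(all_edges))
```

Adding a hypothetical edge from Shows.Profit back to Shows.S_Theater would close a loop through Theater, S_Movie, and Movie, and the check would then fail.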

Page 21: Learning Probabilistic Models of Link Structure

Unrolled Graph?

The Unrolled Graph can have a huge number of edges

Are learning and inference really feasible?

Page 22: Learning Probabilistic Models of Link Structure

Homework Exercise

Construct the dependency graph for the citation example

Construct an unrolled network for a reference uncertainty example