Oregon State University – CS 539 PRMs
Learning Probabilistic Models of Link Structure
Getoor, Friedman, Koller, Taskar

Example Application: WebKB
Classify each web page as course, student, professor, project, or none, using:
  Words on the web page
  Links from other web pages (and the class of those pages, recursively)
  Words in the “anchor text” from the other page: <a href="url">anchor text</a>
Web pages obtained from Cornell, Texas, Washington, and Wisconsin

Example Application: CORA
Classify documents according to topic (7 levels) using:
  Words in the document
  Papers cited by the document
  Papers citing the document

Standard PRM
parents(Doc.class) = {MODE(Doc.citers.class), MODE(Doc.cited.class)}
[Diagram: Document objects, each with class and words attributes; the classes of a document's citers and of its cited documents are each aggregated by MODE]
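
As a minimal sketch of the MODE aggregation used in the parent set above, the Python below takes the most common class among a document's citers and among its cited papers. The helper name `mode`, the tie-breaking rule (the value seen first wins), the `None` result for an empty set, and the sample class labels are assumptions for illustration and do not come from the slides.

```python
from collections import Counter

def mode(values):
    """Return the most common value, or None for an empty collection.

    Ties go to the value seen first (an assumption of this sketch).
    """
    values = list(values)
    if not values:
        return None
    return Counter(values).most_common(1)[0][0]

# Hypothetical class labels of the papers linked to one document.
citers_classes = ["Theory", "Learning", "Learning"]
cited_classes = ["Graphics"]

# parents(Doc.class) = {MODE(Doc.citers.class), MODE(Doc.cited.class)}
parent_values = (mode(citers_classes), mode(cited_classes))
print(parent_values)
```

Only the two aggregated values reach Doc.class, so the individual linked papers and the number of links are invisible to the model.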

Problem: The Citation Structure is Fixed
The existence (or non-existence) of a link cannot serve as evidence.
Individually linked papers influence the class only through the MODE.

Possible Solution: Link Uncertainty
Model the existence of links as random variables
Create a Link instance for each pair of possibly-linked objects
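
One possible way to picture this construction is sketched below in Python: a link object is created for every ordered pair of distinct documents, and each carries an Exists value. The class name `PotentialLink`, the document ids, the observed citations, and the choice to exclude self-citations are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass
class PotentialLink:
    citing: str
    cited: str
    exists: bool  # value of the Exists variable (observed in training data)

docs = ["d1", "d2", "d3"]                 # hypothetical document ids
observed = {("d1", "d2"), ("d3", "d2")}   # hypothetical observed citations

# One link instance per ordered pair of distinct documents.
links = [
    PotentialLink(a, b, (a, b) in observed)
    for a, b in permutations(docs, 2)
]

print(len(links))                          # n * (n - 1) potential links
print(sum(link.exists for link in links))  # how many of them actually exist
```

The number of potential links grows quadratically with the number of documents, which is the main cost of this representation.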

Unrolled Network
[Diagram: three Document objects (class, words) connected by three Cites objects, each with an Exists variable]

Getoor’s Diagram
Entity classes (Paper)
Relation classes (Cites)
Technically, every instance has an Exists variable, which is true for all Entity instances.

Semantics
P is the basic CPT
P* will be the equivalent unrolled CPT
Require that an object does not exist if any of the objects it points to do not exist
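
One reading of the constraint above, sketched in Python under stated assumptions: P* agrees with the basic CPT P when every object the link points to exists, and otherwise gives existence probability 0. The function name `p_star_exists` and the value 0.1 are hypothetical.

```python
def p_star_exists(p_exists, referenced_exist):
    """Probability that a link object exists in the unrolled model.

    p_exists: probability from the basic CPT P, given the link's other parents.
    referenced_exist: Exists values of the objects this link points to.
    """
    if not all(referenced_exist):
        return 0.0  # cannot exist if something it points to does not exist
    return p_exists

print(p_star_exists(0.1, [True, True]))
print(p_star_exists(0.1, [True, False]))
```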

WebKB Network

Experimental Results
Cora and WebKB

WebKB with various features

A Second Approach: Reference Uncertainty
Treat reference attributes as random variables
Each reference attribute takes as its value an object of the indicated class
Citation
  Citing: reference attribute, value is a Paper
  Cited: reference attribute, value is a Paper

Problems
How many citation objects exist? Consequently, how many reference random variables exist?
How do we represent P(Citation.cites | …)?
  Citation.cites could take on thousands of possible values.
  Huge conditional probability table
  Costly inference at run time

Solutions
Problem 1: How many citations?
Fix the number of Citation objects
This gives the “object skeleton”

Problem 2: Too many potential values for a reference attribute
Attach to each reference attribute a set of partition attributes
The reference attribute chooses a partition
A Paper is then chosen uniformly at random from the partition
[Diagram: a Citation object whose Citing and Cited attributes each point into Paper objects grouped into Theory, Graphics, and Learning partitions]
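
The two-step choice described above can be sketched as follows: pick a partition, then pick a paper uniformly inside it, so that the probability of a specific paper is P(partition) divided by the partition size. The paper ids, partition contents, and probabilities are made-up values for illustration.

```python
import random

# Hypothetical partitions of Paper objects by topic.
partitions = {
    "Theory": ["p1", "p2", "p3"],
    "Graphics": ["p4"],
    "Learning": ["p5", "p6"],
}
# Hypothetical distribution over partitions for one reference attribute.
p_partition = {"Theory": 0.5, "Graphics": 0.2, "Learning": 0.3}

def paper_probability(paper):
    """P(reference = paper) = P(partition of paper) / |partition|."""
    for name, members in partitions.items():
        if paper in members:
            return p_partition[name] / len(members)
    return 0.0

def sample_reference(rng):
    """Choose a partition, then a paper uniformly at random within it."""
    names = list(p_partition)
    weights = [p_partition[n] for n in names]
    name = rng.choices(names, weights=weights)[0]
    return rng.choice(partitions[name])

print(paper_probability("p5"))
print(sample_reference(random.Random(0)))
```

With this scheme the model stores one parameter per partition rather than one per paper.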

Representing Constraints Between Citing and Cited Papers
Parents(Cites.Cited) = {Cites.Citing.Topic}
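
A dependency like the one above could be stored as a small table with one row per topic of the citing paper, each row being a distribution over the topic partition from which the cited paper is drawn. The sketch below assumes three topics and invented probabilities; the names `cpt_cited_given_citing_topic` and `p_cited_partition` are hypothetical.

```python
# Hypothetical CPT: row = topic of the citing paper,
# column = topic partition from which the cited paper is drawn.
cpt_cited_given_citing_topic = {
    "Theory":   {"Theory": 0.8, "Graphics": 0.1, "Learning": 0.1},
    "Graphics": {"Theory": 0.1, "Graphics": 0.8, "Learning": 0.1},
    "Learning": {"Theory": 0.2, "Graphics": 0.1, "Learning": 0.7},
}

def p_cited_partition(citing_topic, cited_partition):
    """P(Cited falls in cited_partition | Citing.Topic = citing_topic)."""
    return cpt_cited_given_citing_topic[citing_topic][cited_partition]

# Each row must be a probability distribution.
for topic, row in cpt_cited_given_citing_topic.items():
    assert abs(sum(row.values()) - 1.0) < 1e-9, topic

print(p_cited_partition("Learning", "Theory"))
```

The table size depends on the number of topics, not on the number of papers.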

Details
Each reference attribute has a selector attribute S that chooses the partition.
[Diagram: a Citation object with selector attributes Sciting and Scited, which determine Citing and Cited by choosing among Paper partitions labeled Learning, Theory, and Graphics]

Class-level Dependency Graph
Five types of edges
  Type I: edges within a single object
  Type II: edges between objects
  Type III: edges from every reference attribute along a reference path to the attribute whose parent is reached through that path
  Type IV: edges from every partition attribute to the selector attributes that use it to choose a partition
  Type V: edges from selector attributes to their corresponding reference attributes

Movie Theater Example
Type I: Genre -> Popularity
Type II: Shows.Movie.Genre -> Shows.Profit; Shows.Theater.Type -> SMovie
Type III: Movie -> Profit; Theater -> SMovie
Type IV: Genre -> SMovie
Type V: STheater -> Theater; SMovie -> Movie
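
As a rough sketch, the movie theater edges can be put into a graph and checked for cycles with Python's standard `graphlib` module. Reading each pair on the slide as parent -> child, and qualifying the attributes with the class names Movie, Theater, and Shows, are assumptions of this example.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# (parent, child, edge type); node names are an assumed reading of the slide.
edges = [
    ("Movie.Genre", "Movie.Popularity", "I"),
    ("Movie.Genre", "Shows.Profit", "II"),
    ("Theater.Type", "Shows.S_Movie", "II"),
    ("Shows.Movie", "Shows.Profit", "III"),
    ("Shows.Theater", "Shows.S_Movie", "III"),
    ("Movie.Genre", "Shows.S_Movie", "IV"),
    ("Shows.S_Theater", "Shows.Theater", "V"),
    ("Shows.S_Movie", "Shows.Movie", "V"),
]

# TopologicalSorter expects a mapping from each node to its predecessors.
predecessors = {}
for parent, child, _ in edges:
    predecessors.setdefault(child, set()).add(parent)
    predecessors.setdefault(parent, set())

# static_order() raises graphlib.CycleError if the graph has a cycle.
order = list(TopologicalSorter(predecessors).static_order())
print(order)
```

If `static_order()` completes without raising, the class-level graph under this reading is acyclic and the printed list is one valid ordering of the attributes.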

Unrolled Graph?
The unrolled graph can have a huge number of edges
Are learning and inference really feasible?

Homework Exercise
Construct the dependency graph for the citation example
Construct an unrolled network for a reference uncertainty example