SimRank: A Measure of Structural-Context Similarity. Advisor: Dr. Hsu. Graduate: Sheng-Hsuan Wang. Authors: Glen Jeh, Jennifer Widom.

Upload: sabina-ward

Post on 14-Dec-2015


Page 1:

SimRank : A Measure of Structural-Context Similarity

Advisor: Dr. Hsu

Graduate: Sheng-Hsuan Wang

Authors: Glen Jeh, Jennifer Widom

Page 2:

Outline

Motivation
Objective
Introduction
Basic Graph Model
SimRank
Random Surfer-Pairs Model
Future Work
Personal Opinion

Page 3:

Motivation

The problem of measuring “similarity” of objects arises in many applications.

Page 4:

Objective

A general approach, applicable in any domain with object-to-object relationships.

Two objects are similar if they are related to similar objects.

Page 5:

Introduction

Page 6:

Basic Graph Model

We model objects and relationships as a directed graph G=(V,E).

For a node v in the graph, we denote by I(v) and O(v) the sets of in-neighbors and out-neighbors of v, respectively.
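
The graph model above can be sketched in a few lines of Python. This is only an illustration: the function name `build_neighbors` and the four-edge example graph are assumptions made here, not part of the original slides.

```python
def build_neighbors(edges):
    """Return (in_nbrs, out_nbrs) for a directed graph given as (u, v) edge pairs.

    in_nbrs[v] plays the role of I(v) and out_nbrs[v] the role of O(v).
    """
    nodes = {n for edge in edges for n in edge}
    in_nbrs = {v: [] for v in nodes}
    out_nbrs = {v: [] for v in nodes}
    for u, v in edges:
        out_nbrs[u].append(v)  # v is an out-neighbor of u
        in_nbrs[v].append(u)   # u is an in-neighbor of v
    return in_nbrs, out_nbrs


# Hypothetical example graph: c -> a, c -> b, a -> d, b -> d
edges = [("c", "a"), ("c", "b"), ("a", "d"), ("b", "d")]
I, O = build_neighbors(edges)
print(I["d"])  # ['a', 'b']
print(O["c"])  # ['a', 'b']
print(I["c"])  # []
```

Storing both directions up front is a convenience: the basic SimRank equation only needs I(v), while the bipartite variant also uses O(v).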

Page 7:

SimRank

Basic SimRank Equation

If a = b then s(a,b) is defined to be 1. Otherwise,

\[ s(a,b) = \frac{C}{|I(a)|\,|I(b)|} \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s\bigl(I_i(a), I_j(b)\bigr) \tag{1} \]

where C is a constant between 0 and 1. Set s(a,b) = 0 when I(a) = ∅ or I(b) = ∅.

Page 8:

SimRank

Bipartite SimRank

Two types of objects.

Example: shopping graph G.

Page 9:

SimRank

Page 10:

SimRank

Let s(A,B) denote the similarity between persons A and B, for A ≠ B:

\[ s(A,B) = \frac{C_1}{|O(A)|\,|O(B)|} \sum_{i=1}^{|O(A)|} \sum_{j=1}^{|O(B)|} s\bigl(O_i(A), O_j(B)\bigr) \tag{2} \]

Let s(c,d) denote the similarity between items c and d, for c ≠ d:

\[ s(c,d) = \frac{C_2}{|I(c)|\,|I(d)|} \sum_{i=1}^{|I(c)|} \sum_{j=1}^{|I(d)|} s\bigl(I_i(c), I_j(d)\bigr) \tag{3} \]
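
Equations (2) and (3) feed each other: person similarity is computed from item similarity and vice versa. A minimal sketch of iterating them together is given below. The function names, the fixed iteration count K, the choice C1 = C2 = 0.8 and the two-person shopping data are all assumptions for illustration, not values taken from the slides.

```python
def _update(keys, neighbors, other_scores, C):
    """One application of equation (2) or (3) over all pairs of `keys`."""
    new = {}
    for x in keys:
        for y in keys:
            if x == y:
                new[(x, y)] = 1.0
            elif not neighbors[x] or not neighbors[y]:
                new[(x, y)] = 0.0  # an empty neighbor set gives similarity 0
            else:
                total = sum(other_scores[(i, j)]
                            for i in neighbors[x] for j in neighbors[y])
                new[(x, y)] = C * total / (len(neighbors[x]) * len(neighbors[y]))
    return new


def bipartite_simrank(purchases, C1=0.8, C2=0.8, K=5):
    """purchases maps each person A to the list of items bought, i.e. O(A)."""
    people = list(purchases)
    items = sorted({c for bought in purchases.values() for c in bought})
    # buyers[c] is I(c): the persons who bought item c.
    buyers = {c: [p for p in people if c in purchases[p]] for c in items}
    sp = {(A, B): float(A == B) for A in people for B in people}
    si = {(c, d): float(c == d) for c in items for d in items}
    for _ in range(K):
        # Both updates read the scores from the previous iteration.
        sp, si = _update(people, purchases, si, C1), _update(items, buyers, sp, C2)
    return sp, si


# Hypothetical shopping graph with two persons and four items.
purchases = {"A": ["sugar", "frosting", "eggs"],
             "B": ["frosting", "eggs", "flour"]}
sp, si = bipartite_simrank(purchases, K=5)
print(sp[("A", "B")], si[("sugar", "flour")])
```

In this example sugar and flour share no buyer, so their similarity is 0 after one iteration and only becomes positive once the similarity between A and B has propagated.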

Page 11:

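SimRank

Computing SimRank - Naive Method

R_0(a,b) is a lower bound on s(a,b):

\[ R_0(a,b) = \begin{cases} 0 & \text{if } a \neq b \\ 1 & \text{if } a = b \end{cases} \]

To compute R_{k+1}(a,b) from R_k(*,*):

\[ R_{k+1}(a,b) = \frac{C}{|I(a)|\,|I(b)|} \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} R_k\bigl(I_i(a), I_j(b)\bigr) \tag{4} \]

for a ≠ b, and R_{k+1}(a,b) = 1 for a = b.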
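
The iteration in equation (4) can be sketched directly in Python. This is a minimal, unoptimized illustration: the function name `simrank_naive`, the dictionary-of-pairs storage, the defaults C = 0.8 and K = 5, and the six-node example graph are assumptions made here.

```python
from itertools import product


def simrank_naive(in_neighbors, C=0.8, K=5):
    """Run K iterations of equation (4); in_neighbors[v] is the list I(v)."""
    nodes = list(in_neighbors)
    # R_0(a, b) = 1 if a == b, else 0.
    R = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(K):
        R_next = {}
        for a, b in product(nodes, repeat=2):
            Ia, Ib = in_neighbors[a], in_neighbors[b]
            if a == b:
                R_next[(a, b)] = 1.0
            elif not Ia or not Ib:
                R_next[(a, b)] = 0.0  # s(a, b) = 0 when I(a) or I(b) is empty
            else:
                total = sum(R[(i, j)] for i in Ia for j in Ib)
                R_next[(a, b)] = C * total / (len(Ia) * len(Ib))
        R = R_next
    return R


# Hypothetical graph: c -> a, c -> b, a -> d, b -> d, a -> e, b -> f
in_neighbors = {"c": [], "a": ["c"], "b": ["c"],
                "d": ["a", "b"], "e": ["a"], "f": ["b"]}
scores = simrank_naive(in_neighbors, C=0.8, K=5)
print(scores[("a", "b")])  # a and b share their only in-neighbor c
print(scores[("e", "f")])  # similar only through the similarity of a and b
```

Keeping two full score tables (`R` and `R_next`) matches the slide's description of computing R_{k+1} from R_k; an in-place update would mix iterations.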

Page 12:

SimRank

The space required is simply O(n^2) to store the results R_k.

The time required is O(K n^2 d_2), where

K: the number of iterations

d_2: the average of |I(a)||I(b)| over all node pairs (a,b).

Page 13:

SimRank

Computing SimRank - Pruning

Set the similarity between two nodes far apart to be 0.

Consider node-pairs only for nodes which are near each other.

Page 14:

SimRank

If we consider only node-pairs within radius r of each other, and a node has on average d_r such neighbors, then there will be n d_r node-pairs.

The time and space complexities become O(K n d_r d_2) and O(n d_r) respectively.
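
One way to pick the node-pairs kept by pruning is a breadth-first search of depth r from every node. The sketch below is an assumption-laden illustration: it measures "near" as distance in the underlying undirected graph, which the slides do not specify, and the function name `pairs_within_radius` and the path-graph example are made up here.

```python
from collections import deque


def pairs_within_radius(in_neighbors, r):
    """Return the unordered node pairs whose undirected distance is at most r."""
    # Build an undirected adjacency structure from the in-neighbor lists.
    adj = {v: set() for v in in_neighbors}
    for v, preds in in_neighbors.items():
        for u in preds:
            adj.setdefault(u, set()).add(v)
            adj[v].add(u)
    pairs = set()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            if dist[x] == r:
                continue  # do not expand beyond radius r
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        for target in dist:
            if target != source:
                pairs.add(frozenset((source, target)))
    return pairs


# Hypothetical path graph: a -> b -> c -> d -> e
path = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["d"]}
near = pairs_within_radius(path, r=2)
print(len(near))  # 7 of the 10 possible pairs survive pruning
```

Scores would then be stored and updated only for the pairs in `near`, with every other pair treated as having similarity 0.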

Page 15:

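Random Surfer-Pair Model

Expected Distance

Let H be any strongly connected graph.

Let u, v be any two nodes in H. We define the expected distance d(u,v) from u to v as

\[ d(u,v) = \sum_{t : u \rightsquigarrow v} P[t]\, l(t) \tag{5} \]

where the sum is over all tours t that start at u and end at v without touching v earlier, P[t] is the probability of traveling tour t, and l(t) is its length.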

Page 16:

Random Surfer-Pair Model

Expected Meeting Distance (EMD)

\[ m(a,b) = \sum_{t : (a,b) \rightsquigarrow (x,x)} P[t]\, l(t) \tag{6} \]

Page 17:

Random Surfer-Pair Model

Expected-f Meeting Distance

To circumvent the “infinite EMD” problem.

To map all distances to a finite interval.

Exponential function f(z) = c^z, where c ∈ (0,1) is a constant.

\[ s(a,b) = \sum_{t : (a,b) \rightsquigarrow (x,x)} P[t]\, c^{l(t)} \tag{7} \]

Page 18:

Random Surfer-Pair Model

Equivalence to SimRank

Page 19:

Random Surfer-Pair Model

Theorem. The SimRank score, with parameter C, between two nodes is their expected-f meeting distance traveling back-edges, for f(z) = c^z with c = C.
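
The theorem can be checked empirically on a small graph by simulating two surfers that walk backwards along edges in lock-step and averaging c^l over the first meeting times l. The sketch below is a Monte Carlo approximation under stated assumptions: the function name, the sample count, the step cutoff `max_steps`, the fixed seed and the six-node graph are all chosen here for illustration.

```python
import random


def estimate_expected_f_meeting(in_neighbors, a, b, c=0.8,
                                samples=20000, max_steps=50, seed=0):
    """Estimate the expected-f meeting distance of (a, b) for f(z) = c**z."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x, y = a, b
        for step in range(max_steps + 1):
            if x == y:
                total += c ** step  # first meeting after `step` backward steps
                break
            if not in_neighbors[x] or not in_neighbors[y]:
                break  # a surfer has no back-edge to follow; the pair never meets
            # Each surfer independently follows a random back-edge.
            x = rng.choice(in_neighbors[x])
            y = rng.choice(in_neighbors[y])
    return total / samples


# Hypothetical graph: c -> a, c -> b, a -> d, b -> d, a -> e, b -> f
in_neighbors = {"c": [], "a": ["c"], "b": ["c"],
                "d": ["a", "b"], "e": ["a"], "f": ["b"]}
estimate = estimate_expected_f_meeting(in_neighbors, "d", "e", c=0.8)
print(estimate)  # should be close to 0.72
```

For this graph the SimRank equation with C = 0.8 can be solved by hand: s(a,b) = 0.8 and s(d,e) = (0.8/2)(1 + 0.8) = 0.72, which is the value the simulation approaches. Walks that never meet contribute 0, which is how the exponential f sidesteps the infinite-EMD problem.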

Page 20:

Future Work

Divide and conquer, then merge: divide a corpus into chunks…

Ternary (or more) relationships.

Page 21:

Personal Opinion

We believe that the intuition behind SimRank can be applied in many domains that are based on object-to-object relationships.