Spectral Clustering · wcohen/10-605/unsup-on-graphs.pdf


Post on 02-Aug-2020


Page 1: Spectral Clusteringwcohen/10-605/unsup-on-graphs.pdf · Spectral Clustering: Graph = Matrix W*v 1 = v 2 “propogatesweights from neighbors” W × v = lv: v is an eigenvecto r with

Page 2:

Spectral Clustering

Page 3:

[Figure; labels: “spectral k-means”, “after transformation”]

Page 4:

Page 5:

Spectral Clustering: Graph = Matrix

[Figure: a graph on nodes A-J, shown next to its adjacency matrix (rows and columns A-J, with a 1 for each link)]

Page 6:

Spectral Clustering: Graph = Matrix
Transitively Closed Components = “Blocks”

[Figure: the graph on nodes A-J and its adjacency matrix, with rows and columns in cluster order and “_” on the diagonal]

Of course we can’t see the “blocks” unless the nodes are sorted by cluster…

Sometimes called a block-stochastic matrix:
• each node has a latent “block”
• fixed probability qi for links between elements of block i
• fixed probability q0 for links between elements of different blocks
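The block-stochastic model described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `sample_block_graph`, the block assignment, and the probabilities are all hypothetical, and links are taken to be undirected.

```python
import random

def sample_block_graph(blocks, q_in, q_out, seed=0):
    """Sample a symmetric 0/1 adjacency matrix from a block model.

    blocks[i] is the latent block of node i. Two nodes in the same block b
    are linked with probability q_in[b]; nodes in different blocks are
    linked with probability q_out (the slide's q0).
    """
    rng = random.Random(seed)
    n = len(blocks)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = q_in[blocks[i]] if blocks[i] == blocks[j] else q_out
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1  # undirected link
    return adj

# Hypothetical example: 10 nodes in three blocks, already sorted by block,
# so the dense "blocks" show up along the diagonal when printed.
blocks = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
adj = sample_block_graph(blocks, q_in=[0.9, 0.9, 0.9], q_out=0.05)
for row in adj:
    print(" ".join("1" if x else "_" for x in row))
```

Shuffling the node order before printing hides the blocks, which is the point made on the slide.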

Page 7:

Spectral Clustering: Graph = Matrix
Vector = Node → Weight

[Figure: the graph on nodes A-J, its adjacency matrix M, and a vector v that assigns a weight to each node]

v: A = 3, B = 2, C = 3 (entries for D-J are not shown)

Page 8:

Spectral Clustering: Graph = Matrix
M*v1 = v2 “propagates weights from neighbors”

[Figure: the graph on nodes A-J and its adjacency matrix M]

v1: A = 3, B = 2, C = 3 (entries for D-J are not shown)

M * v1 = v2:
A: 2*1 + 3*1 + 0*1
B: 3*1 + 3*1
C: 3*1 + 2*1
(entries for D-J are not shown)
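A minimal sketch of the propagation step M*v1 = v2. The edge list is hypothetical: it was chosen only so that the sums for A, B, and C match the ones on the slide, and it is not the slide's full graph.

```python
# Hypothetical undirected graph (an assumption, not the slide's exact graph).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("D", "E"), ("D", "F")]
nodes = ["A", "B", "C", "D", "E", "F"]

# Adjacency lists play the role of the rows of M.
neighbors = {n: [] for n in nodes}
for u, v in edges:
    neighbors[u].append(v)
    neighbors[v].append(u)

def propagate(neighbors, v):
    """Return M*v: each node's new weight is the sum of its neighbors' weights."""
    return {x: sum(v[y] for y in neighbors[x]) for x in neighbors}

v1 = {"A": 3, "B": 2, "C": 3, "D": 0, "E": 0, "F": 0}
v2 = propagate(neighbors, v1)
print(v2)  # A gets 2 + 3 + 0, B gets 3 + 3, C gets 3 + 2
```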

Page 9:

Spectral Clustering: Graph = Matrix
W*v1 = v2 “propagates weights from neighbors”

W: normalized so columns sum to 1

[Figure: the graph on nodes A-J and the matrix W, which has the same nonzero pattern as the adjacency matrix but with entries of .5 or .3]

v1: A = 3, B = 2, C = 3 (entries for D-J are not shown)

W * v1 = v2:
A: 2*.5 + 3*.5 + 0*.3
B: 3*.3 + 3*.5
C: 3*.33 + 2*.5
(entries for D-J are not shown)
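The normalized version can be sketched the same way. The graph below is again a hypothetical stand-in, picked so that the rows for A, B, and C reproduce the slide's arithmetic. W[x][y] is taken to be 1/degree(y), which is one way to make every column sum to 1.

```python
# Hypothetical undirected graph (an assumption, not the slide's exact graph).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("D", "E"), ("D", "F")]
nodes = ["A", "B", "C", "D", "E", "F"]

neighbors = {n: [] for n in nodes}
for u, v in edges:
    neighbors[u].append(v)
    neighbors[v].append(u)

# W[x][y] = 1/degree(y) if y is a neighbor of x, so each column y sums to 1.
W = {x: {y: 1.0 / len(neighbors[y]) for y in neighbors[x]} for x in nodes}

def propagate(W, v):
    """Return W*v for a sparse matrix stored as {row: {col: weight}}."""
    return {x: sum(w * v[y] for y, w in W[x].items()) for x in W}

v1 = {"A": 3.0, "B": 2.0, "C": 3.0, "D": 0.0, "E": 0.0, "F": 0.0}
v2 = propagate(W, v1)

column_sums = {y: sum(W[x].get(y, 0.0) for x in nodes) for y in nodes}
print(v2)
print(column_sums)
```

Because the columns sum to 1, the total weight is unchanged by the step: the entries of v2 add up to the same total as the entries of v1.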

Page 10:

Spectral Clustering: Graph = Matrix
W*v1 = v2 “propagates weights from neighbors”

W × v = λv: v is an eigenvector with eigenvalue λ

Q: How do I pick v to be an eigenvector for a block-stochastic matrix?

Page 11:

Spectral Clustering: Graph = Matrix
W*v1 = v2 “propagates weights from neighbors”

W × v = λv: v is an eigenvector with eigenvalue λ

How do I pick v to be an eigenvector for a block-stochastic matrix?

Page 12:

Spectral Clustering: Graph = Matrix
W*v1 = v2 “propagates weights from neighbors”

W × v = λv: v is an eigenvector with eigenvalue λ

[Figure from Shi & Meila, 2002: eigenvalues λ1, λ2, λ3, λ4, λ5,6,7,… with the “eigengap” marked; eigenvectors labeled e1, e2, e3]

Page 13:

Spectral Clustering: Graph = Matrix
W*v1 = v2 “propagates weights from neighbors”

W × v = λv: v is an eigenvector with eigenvalue λ

[Figure from Shi & Meila, 2002: scatter plot of nodes in eigenvector coordinates (axes labeled e1, e2, e3; ticks from -0.4 to 0.4); the points are marked x, y, and z]

Page 14:

Spectral Clustering: Graph = Matrix
W*v1 = v2 “propagates weights from neighbors”

W × v = λv: v is an eigenvector with eigenvalue λ

If W is connected but roughly block diagonal with k blocks then
• the top eigenvector is a constant vector
• the next k eigenvectors are roughly piecewise constant with “pieces” corresponding to blocks

Page 15:

Spectral Clustering: Graph = Matrix
W*v1 = v2 “propagates weights from neighbors”

W × v = λv: v is an eigenvector with eigenvalue λ

If W is connected but roughly block diagonal with k blocks then
• the “top” eigenvector is a constant vector
• the next k eigenvectors are roughly piecewise constant with “pieces” corresponding to blocks

Spectral clustering:
• Find the top k+1 eigenvectors v1,…,vk+1
• Discard the “top” one
• Replace every node a with k-dimensional vector xa = <v2(a),…,vk+1(a)>
• Cluster with k-means
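The recipe above can be sketched with NumPy as follows. Several choices here are assumptions rather than anything stated on the slides: W is taken to be D^-1 A, its eigenvectors are obtained through the symmetric matrix D^-1/2 A D^-1/2 (which has the same eigenvalues), k-means is a plain Lloyd's loop with a deterministic farthest-point start, and the toy affinity matrix is made up. The slide uses one value k for both the number of eigenvectors and the number of clusters; the sketch keeps them as separate parameters (`n_vecs`, `k`) because on this toy graph only two of the non-constant eigenvectors are piecewise constant.

```python
import numpy as np

def spectral_embedding(A, n_vecs):
    """Return an (n, n_vecs) embedding from the top eigenvectors of W = D^-1 A.

    Eigenvectors are computed from S = D^-1/2 A D^-1/2 and mapped back
    with D^-1/2; the top (constant) eigenvector is discarded.
    """
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    S = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(S)        # eigenvalues in ascending order
    top = vecs[:, -(n_vecs + 1):]         # top n_vecs + 1 eigenvectors
    top = top[:, :-1]                     # discard the "top" one
    return top * d_inv_sqrt[:, None]      # eigenvectors of D^-1 A

def kmeans(X, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# Hypothetical affinity matrix: three blocks of three nodes, strong links
# (1.0) inside a block and weak links (0.05) between blocks.
blocks = np.repeat([0, 1, 2], 3)
A = np.where(blocks[:, None] == blocks[None, :], 1.0, 0.05)
np.fill_diagonal(A, 0.0)

X = spectral_embedding(A, n_vecs=2)
labels = kmeans(X, k=3)
print(labels)
```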

Page 16:

Back to the 2-D examples

• Create a graph.
• Each 2D point is a vertex
• Add edges connecting all points in the 2-D initial space to all other points
• Weight of edge is distance

[Figure: 2-D points labeled b, g, and r]

Page 17:

Spectral Clustering: Pros and Cons
• Elegant, and well-founded mathematically
• Tends to avoid local minima
– Optimal solution to relaxed version of mincut problem (Normalized cut, aka NCut)
• Works quite well when relations are approximately transitive (like similarity, social connections)
• Expensive for very large datasets
– Computing eigenvectors is the bottleneck
– Approximate eigenvector computation not always useful
• Noisy datasets sometimes cause problems
– Picking number of eigenvectors and k is tricky
– “Informative” eigenvectors need not be in top few
– Performance can drop suddenly from good to terrible

Page 18:

Experimental results: best-case assignment of class labels to clusters

Page 19:

Spectral clustering as Optimization

Page 20:

Spectral Clustering: Graph = Matrix
W*v1 = v2 “propagates weights from neighbors”

W × v = λv: v is an eigenvector with eigenvalue λ

• A = adjacency matrix
• D = diagonal normalizer for A (# outlinks)
• W = A D^-1

• smallest eigenvecs of D-A are largest eigenvecs of A
• smallest eigenvecs of I-W are largest eigenvecs of W

Suppose each y(i) = +1 or -1:
• Then y is a cluster indicator that splits the nodes into two
• what is y^T(D-A)y ?

Page 21:

$$
\begin{aligned}
y^T (D - A)\, y &= y^T D y - y^T A y = \sum_i d_i y_i^2 - \sum_{i,j} a_{ij} y_i y_j \\
&= \frac{1}{2}\Big[\, 2\sum_i d_i y_i^2 - 2\sum_{i,j} a_{ij} y_i y_j \Big] \\
&= \frac{1}{2}\Big[\, \sum_i \Big(\sum_j a_{ij}\Big) y_i^2 + \sum_j \Big(\sum_i a_{ij}\Big) y_j^2 - 2\sum_{i,j} a_{ij} y_i y_j \Big] \\
&= \frac{1}{2}\Big[\, \sum_{i,j} a_{ij} y_i^2 + \sum_{i,j} a_{ij} y_j^2 - 2\sum_{i,j} a_{ij} y_i y_j \Big] \\
&= \frac{1}{2} \sum_{i,j} a_{ij} (y_i - y_j)^2
\end{aligned}
$$

(The third line uses d_i = Σ_j a_ij and the symmetry of A.)

= size of CUT(y), up to a constant factor: for y(i) = +1 or -1, (y_i - y_j)^2 is 4 when i and j are on opposite sides of the cut and 0 otherwise

$$ y^T (I - W)\, y = \text{size of NCUT}(y) $$

NCUT: roughly minimize ratio of transitions between classes vs transitions within classes

Same as smoothness criterion used in SSL but now there are no seeds
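A quick numerical check of the identity above, on a hypothetical six-node undirected graph with an arbitrary split (both are assumptions made for illustration). It compares the quadratic form, the sum of squared differences, and the number of cut edges.

```python
# Hypothetical undirected graph and split, for illustration only.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("D", "E"), ("D", "F")]
nodes = ["A", "B", "C", "D", "E", "F"]
idx = {name: i for i, name in enumerate(nodes)}
n = len(nodes)

A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[idx[u]][idx[v]] = A[idx[v]][idx[u]] = 1
deg = [sum(row) for row in A]

y = [+1, +1, +1, -1, -1, -1]  # {A, B, C} versus {D, E, F}

# Left-hand side: y^T (D - A) y = sum_i d_i y_i^2 - sum_ij a_ij y_i y_j
lhs = sum(deg[i] * y[i] ** 2 for i in range(n)) - sum(
    A[i][j] * y[i] * y[j] for i in range(n) for j in range(n))

# Right-hand side: (1/2) sum_ij a_ij (y_i - y_j)^2
rhs = sum(A[i][j] * (y[i] - y[j]) ** 2 for i in range(n) for j in range(n)) / 2

# Number of edges whose endpoints fall on opposite sides of the split.
cut = sum(1 for u, v in edges if y[idx[u]] != y[idx[v]])
print(lhs, rhs, cut)
```

On this graph only the edge between A and D crosses the split, so both sides of the identity equal four times the cut size.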

Page 22:

Spectral Clustering: Graph = Matrix
W*v1 = v2 “propagates weights from neighbors”

W × v = λv: v is an eigenvector with eigenvalue λ

• smallest eigenvecs of D-A are largest eigenvecs of A
• smallest eigenvecs of I-W are largest eigenvecs of W

Suppose each y(i) = +1 or -1:
• Then y is a cluster indicator that cuts the nodes into two
• what is y^T(D-A)y ? The cost of the graph cut defined by y
• what is y^T(I-W)y ? Also a cost of a graph cut defined by y
• How to minimize it?
• Turns out: to minimize y^T X y / (y^T y) find smallest eigenvector of X
• But: this will not be +1/-1, so it’s a “relaxed” solution
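The “turns out” step can be spelled out with a standard linear-algebra argument. It assumes X is symmetric, with orthonormal eigenvectors e_1, …, e_n and eigenvalues λ_1 ≤ … ≤ λ_n (this indexing is chosen here for the sketch and differs from the slides' “top eigenvector” ordering). Writing y = Σ_i c_i e_i:

$$
\frac{y^T X y}{y^T y} = \frac{\sum_i \lambda_i c_i^2}{\sum_i c_i^2} \;\ge\; \lambda_1,
\qquad \text{with equality when } y = e_1 .
$$

For X = D - A with A symmetric, (D - A)\,\mathbf{1} = 0, so the smallest eigenvalue is 0 and its eigenvector is the constant vector, which puts every node on the same side. Restricting to vectors orthogonal to it gives the next eigenvector:

$$
\min_{y \ne 0,\; y^T \mathbf{1} = 0} \frac{y^T (D - A)\, y}{y^T y} = \lambda_2 ,
\qquad \text{attained at } y = e_2 .
$$

This matches the earlier recipe of discarding the constant eigenvector. One common way to turn the relaxed solution into a +1/-1 indicator is to threshold its entries, for example by sign.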

Page 23:

Spectral Clustering: Graph = Matrix
W*v1 = v2 “propagates weights from neighbors”

W × v = λv: v is an eigenvector with eigenvalue λ

[Figure from Shi & Meila, 2002: eigenvalues λ1, λ2, λ3, λ4, λ5,6,7,… with the “eigengap” marked; eigenvectors labeled e1, e2, e3]

Page 24:

Another way to think about spectral clustering…

• Most normal people think about spectral clustering like that - as relaxed optimization
• …me, not so much
• I like the connection to “averaging”… because… it leads to a nice fast variation of spectral clustering called power iteration clustering

Page 25:

Unsupervised Learning on Graphs with Power Iteration Clustering

Page 26:

Spectral Clustering: Graph = Matrix

[Figure: a graph on nodes A-J, shown next to its adjacency matrix (rows and columns A-J, with a 1 for each link)]

Page 27:

Spectral Clustering: Graph = Matrix
Transitively Closed Components = “Blocks”

[Figure: the graph on nodes A-J and its adjacency matrix, with rows and columns in cluster order and “_” on the diagonal]

Of course we can’t see the “blocks” unless the nodes are sorted by cluster…

Sometimes called a block-stochastic matrix:
• each node has a latent “block”
• fixed probability qi for links between elements of block i
• fixed probability q0 for links between elements of different blocks

Page 28:

Spectral Clustering: Graph = Matrix
Vector = Node → Weight

[Figure: the graph on nodes A-J, its adjacency matrix M, and a vector v that assigns a weight to each node]

v: A = 3, B = 2, C = 3 (entries for D-J are not shown)

Page 29:

Spectral Clustering: Graph = Matrix
M*v1 = v2 “propagates weights from neighbors”

[Figure: the graph on nodes A-J and its adjacency matrix M]

v1: A = 3, B = 2, C = 3 (entries for D-J are not shown)

M * v1 = v2:
A: 2*1 + 3*1 + 0*1
B: 3*1 + 3*1
C: 3*1 + 2*1
(entries for D-J are not shown)

Page 30:

Spectral Clustering: Graph = Matrix
W*v1 = v2 “propagates weights from neighbors”

W: normalized so columns sum to 1

[Figure: the graph on nodes A-J and the matrix W, which has the same nonzero pattern as the adjacency matrix but with entries of .5 or .3]

v1: A = 3, B = 2, C = 3 (entries for D-J are not shown)

W * v1 = v2:
A: 2*.5 + 3*.5 + 0*.3
B: 3*.3 + 3*.5
C: 3*.33 + 2*.5
(entries for D-J are not shown)

Page 31:

Repeated averaging with neighbors as a clustering method

• Pick a vector v0 (maybe at random)
• Compute v1 = W v0
– i.e., replace v0[x] with weighted average of v0[y] for the neighbors y of x
• Plot v1[x] for each x
• Repeat for v2, v3, …
• Variants widely used for semi-supervised learning
– HF/CoEM/wvRN - average + clamping of labels for nodes with known labels
• Without clamping, will converge to constant vt
• What are the dynamics of this process?
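The dynamics can be explored with a small sketch. It assumes W is row-normalized (W = D^-1 A), so that each step really is a weighted average, and it uses a made-up affinity matrix with two blocks. After a few steps the values are nearly constant inside each block while the blocks still differ; after many steps everything flattens to one constant.

```python
def row_normalize(A):
    """Return W = D^-1 A, so that W*v replaces v[x] by a weighted average."""
    return [[a / sum(row) for a in row] for row in A]

def average_step(W, v):
    """One step of repeated averaging: v <- W v."""
    return [sum(w * vy for w, vy in zip(row, v)) for row in W]

# Hypothetical affinity matrix: two blocks of three nodes, strong links (1.0)
# inside a block and weak links (0.05) between blocks.
blocks = [0, 0, 0, 1, 1, 1]
A = [[0.0 if i == j else (1.0 if blocks[i] == blocks[j] else 0.05)
      for j in range(6)] for i in range(6)]
W = row_normalize(A)

v = [0.0, 3.0, 1.0, 4.0, 2.0, 5.0]  # arbitrary start, not sorted by block
history = {0: v}
for t in range(1, 301):
    v = average_step(W, v)
    history[t] = v

for t in (0, 10, 300):
    print(t, [round(x, 4) for x in history[t]])
```

The intermediate stage, where the vector is piecewise constant but not yet constant, is the part that carries the cluster information.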

Page 32:

Repeated averaging with neighbors on a sample problem…

• Create a graph, connecting all points in the 2-D initial space to all other points
• Weighted by distance
• Run power iteration (averaging) for 10 steps
• Plot node id x vs v10(x)
• nodes are ordered and colored by actual cluster number

[Figure: 2-D points labeled b, g, and r; plot with node ids grouped as blue, green, red]

Page 33:

Repeated averaging with neighbors on a sample problem…

[Figure: three panels, each with node ids grouped as blue, green, red; labels “smaller” and “larger”]

Page 34:

Repeated averaging with neighbors on a sample problem…

[Figure: five panels, each with node ids grouped as blue, green, red]

Page 35:

Repeated averaging with neighbors on a sample problem…

[Figure; label: “very small”]

Page 36:

PIC: Power Iteration Clustering
run power iteration (repeated averaging w/ neighbors) with early stopping

– V0: random start, or “degree matrix” D, or …
– Easy to implement and efficient
– Very easily parallelized
– Experimentally, often better than traditional spectral methods
– Surprising since the embedded space is 1-dimensional!
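A minimal sketch of the idea, not the authors' implementation. The assumptions are: W = D^-1 A, the vector is rescaled to unit L1 norm after each step, iteration stops when the per-step change itself stops changing (one reading of “early stopping”; the threshold `eps` is arbitrary), and the final 1-D values are split by a two-cluster k-means started at the minimum and maximum. The affinity matrix and start vector are made up.

```python
def pic_embedding(A, v0, eps=1e-5, max_iter=100):
    """Power iteration on W = D^-1 A, stopped early.

    Stops when the size of the update (delta) changes by less than eps
    between consecutive steps, rather than when v itself has converged.
    Returns the 1-D embedding and the number of steps taken.
    """
    W = [[a / sum(row) for a in row] for row in A]
    total = sum(abs(x) for x in v0)
    v = [x / total for x in v0]
    prev_delta = None
    for t in range(1, max_iter + 1):
        new_v = [sum(w * x for w, x in zip(row, v)) for row in W]
        norm = sum(abs(x) for x in new_v)
        new_v = [x / norm for x in new_v]
        delta = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if prev_delta is not None and abs(delta - prev_delta) < eps:
            break
        prev_delta = delta
    return v, t

def kmeans_1d(values, n_iter=50):
    """Two-cluster k-means on scalars, initialized at the min and max."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
                  for x in values]
        for c in (0, 1):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

# Hypothetical affinity matrix: two blocks of three nodes.
blocks = [0, 0, 0, 1, 1, 1]
A = [[0.0 if i == j else (1.0 if blocks[i] == blocks[j] else 0.05)
      for j in range(6)] for i in range(6)]

v, steps = pic_embedding(A, v0=[0.0, 3.0, 1.0, 4.0, 2.0, 5.0])
labels = kmeans_1d(v)
print(steps, labels)
```

Running k-means directly on the start vector would group the nodes by their arbitrary initial values; after the early-stopped iteration the 1-D values separate by block instead.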

Page 37:

Experiments

• “Network” problems: natural graph structure
– PolBooks: 105 political books, 3 classes, linked by copurchaser
– UMBCBlog: 404 political blogs, 2 classes, blogroll links
– AGBlog: 1222 political blogs, 2 classes, blogroll links
• “Manifold” problems: cosine distance between classification instances
– Iris: 150 flowers, 3 classes
– PenDigits01,17: 200 handwritten digits, 2 classes (0-1 or 1-7)
– 20ngA: 200 docs, misc.forsale vs soc.religion.christian
– 20ngB: 400 docs, misc.forsale vs soc.religion.christian
– 20ngC: 20ngB + 200 docs from talk.politics.guns
– 20ngD: 20ngC + 200 docs from rec.sport.baseball

Page 38:

Experimental results: best-case assignment of class labels to clusters

Page 39:

Page 40:

Experiments: run time and scalability

[Figure; axis label: Time in millisec]

Page 41:

More experiments

Varying the number of clusters for PIC and PIC4 (starts with random 4-d point rather than a random 1-d point).

Page 42:

More experiments

More “real” network datasets from various domains

Page 43:

More experiments

Page 44:

Page 45:

LEARNING ON GRAPHS FOR NON-GRAPH DATASETS

Page 46:

Why I’m talking about graphs
• Lots of large data is graphs
– Facebook, Twitter, citation data, and other social networks
– The web, the blogosphere, the semantic web, Freebase, Wikipedia, Twitter, and other information networks
– Text corpora (like RCV1), large datasets with discrete feature values, and other bipartite networks
  • nodes = documents or words
  • links connect document → word or word → document
– Computer networks, biological networks (proteins, ecosystems, brains, …), …
– Heterogeneous networks with multiple types of nodes
  • people, groups, documents

Page 47:

Simplest Case: Bi-partite Graphs

[Figure: graph whose node labels include “proposal”, “CMU”, “CALO”, “graph”, and “William”]

Page 48:

Motivation: Experimental Datasets Used for PIC are…

• “Network” problems: natural graph structure
– PolBooks: 105 political books, 3 classes, linked by copurchaser
– UMBCBlog: 404 political blogs, 2 classes, blogroll links
– AGBlog: 1222 political blogs, 2 classes, blogroll links
– Also: Zachary’s karate club, citation networks, ...
• “Manifold” problems: cosine distance between all pairs of classification instances
– Iris: 150 flowers, 3 classes
– PenDigits01,17: 200 handwritten digits, 2 classes (0-1 or 1-7)
– 20ngA: 200 docs, misc.forsale vs soc.religion.christian
– 20ngB: 400 docs, misc.forsale vs soc.religion.christian
– …

Gets expensive fast

Page 49:

Spectral Clustering: Graph = Matrix
A*v1 = v2 “propagates weights from neighbors”

[Figure: the graph on nodes A-J and its adjacency matrix A]

v1: A = 3, B = 2, C = 3 (entries for D-J are not shown)

A * v1 = v2:
A: 2*1 + 3*1 + 0*1
B: 3*1 + 3*1
C: 3*1 + 2*1
(entries for D-J are not shown)

Page 50:

Spectral Clustering: Graph = Matrix
W*v1 = v2 “propagates weights from neighbors”

W: normalized so columns sum to 1

W = D^-1 * A, with D[i,i] = degree(i), so D^-1[i,i] = 1/degree(i)

[Figure: the graph on nodes A-J and the matrix W, which has the same nonzero pattern as the adjacency matrix but with entries of .5 or .3]

v1: A = 3, B = 2, C = 3 (entries for D-J are not shown)

W * v1 = v2:
A: 2*.5 + 3*.5 + 0*.3
B: 3*.3 + 3*.5
C: 3*.33 + 2*.5
(entries for D-J are not shown)

Page 51:

Lazy computation of distances and normalizers

• Recall PIC’s update is
– v_t = W * v_{t-1} = D^-1 A * v_{t-1}
– …where D is the [diagonal] degree matrix: D = A * 1
• My favorite distance metric for text is length-normalized TFIDF:
– Def’n: A(i,j) = <vi,vj> / (||vi|| * ||vj||)
– Let N(i,i) = ||vi|| … and N(i,j) = 0 for i != j
– Let F(i,k) = TFIDF weight of word wk in document vi
– Then: A = N^-1 F^T F N^-1

<u,v> = inner product
||u|| is L2-norm
1 is a column vector of 1’s

Page 52:

Lazy computation of distances and normalizers

• Recall PIC’s update is
– v_t = W * v_{t-1} = D^-1 A * v_{t-1}
– …where D is the [diagonal] degree matrix: D = A * 1
– Let F(i,k) = TFIDF weight of word wk in document vi
– Compute N(i,i) = ||vi|| … and N(i,j) = 0 for i != j
– Don’t compute A = N^-1 F^T F N^-1
– Let D(i,i) = N^-1 F^T F N^-1 * 1 where 1 is an all-1’s vector
  • Computed as D = N^-1 (F^T (F (N^-1 * 1))) for efficiency
– New update:
  • v_t = D^-1 A * v_{t-1} = D^-1 N^-1 F^T F N^-1 * v_{t-1}

Equivalent to using TFIDF/cosine on all pairs of examples but requires only sparse matrices
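The lazy update can be sketched as below. The helper names and the tiny matrix F are hypothetical, and F is held as dense nested lists only to keep the example short (the slide's point is that F can stay sparse). One labeled assumption about layout: F is stored here with one row per document and one column per word, so the document-by-document product is written F F^T; the slide's F^T F corresponds to the transposed layout.

```python
import math

def matvec(F, x):
    """F x, where F is documents x words and x has one entry per word."""
    return [sum(f * xk for f, xk in zip(row, x)) for row in F]

def rmatvec(F, y):
    """F^T y, where y has one entry per document."""
    out = [0.0] * len(F[0])
    for row, yi in zip(F, y):
        for k, f in enumerate(row):
            out[k] += f * yi
    return out

def lazy_affinity_times(F, norms, v):
    """Compute A v with A = N^-1 F F^T N^-1 without ever forming A."""
    scaled = [vi / ni for vi, ni in zip(v, norms)]   # N^-1 v
    inner = matvec(F, rmatvec(F, scaled))            # F (F^T (N^-1 v))
    return [x / ni for x, ni in zip(inner, norms)]   # N^-1 (...)

def pic_update(F, v):
    """One step v_t = D^-1 A v_{t-1}, with D = diag(A 1) computed lazily."""
    norms = [math.sqrt(sum(f * f for f in row)) for row in F]
    degree = lazy_affinity_times(F, norms, [1.0] * len(F))
    Av = lazy_affinity_times(F, norms, v)
    return [a / d for a, d in zip(Av, degree)]

# Hypothetical TFIDF weights: 3 documents, 4 words.
F = [[1.0, 2.0, 0.0, 0.0],
     [0.0, 1.0, 1.0, 0.0],
     [0.0, 0.0, 3.0, 4.0]]
v_next = pic_update(F, [0.2, 0.3, 0.5])
print(v_next)
```

Each update costs a few passes over the nonzeros of F instead of a pass over all document pairs, which is why the full cosine matrix never has to be materialized.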

Page 53:

Experimental results

• RCV1 text classification dataset
– 800k+ newswire stories
– Category labels from industry vocabulary
– Took single-label documents and categories with at least 500 instances
– Result: 193,844 documents, 103 categories
• Generated 100 random category pairs
– Each is all documents from two categories
– Range in size and difficulty
– Pick category 1, with m1 examples
– Pick category 2 such that 0.5m1 < m2 < 2m1

Page 54:

Results

• NCUTevd: Ncut with exact eigenvectors
• NCUTiram: Implicitly restarted Arnoldi method
• No stat. signif. diffs between NCUTevd and PIC

Page 55:

Results

Page 56:

Results

Page 57:

Results

Page 58:

Results
• Linear run-time implies constant number of iterations
• Number of iterations to “acceleration-convergence” is hard to analyze:
– Faster than a single complete run of power iteration to convergence
– On our datasets
  • 10-20 iterations is typical
  • 30-35 is exceptional

Page 59:

Page 60:

PIC AND SSL

Page 61:

PIC: Power Iteration Clustering
run power iteration (repeated averaging w/ neighbors) with early stopping

Page 62:

Harmonic Functions / CoEM / wvRN

then replace v_{t+1}(i) with seed values +1/-1 for labeled data

for 5-10 iterations

Classify data using final values from v

Page 63:

Implicit Manifolds on the NELL datasets

[Figure: graph whose node labels include Paris, San Francisco, Austin, Pittsburgh, Seattle, anxiety, denial, selfishness, arrogance, “live in arg1”, “mayor of arg1”, “arg1 is home of”, and “traits such as arg1”; regions are marked Nodes “near” seeds and Nodes “far from” seeds]

Page 64:

Using the Manifold Trick for SSL

Page 65:

Using the Manifold Trick for SSL

Page 66:

Using the Manifold Trick for SSL

Page 67:

Using the Manifold Trick for SSL

Page 68:

Using the Manifold Trick for SSL

Page 69:

Using the Manifold Trick for SSL
A smoothing trick:

Page 70:

Using the Manifold Trick for SSL