DeepWalk: Online Learning of Social Representations¹
Presented by Carlos Oliver for COMP 766
January 20, 2020
¹ Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. "DeepWalk: Online Learning of Social Representations." Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2014.
Motivation
▶ Can we predict the label of a node given the graph?
▶ Problem: the labels are not i.i.d., so traditional methods can't be used; structured learning models (MRFs, graph kernels, etc.) are needed.
Idea
▶ Separate the labels from the underlying structure.
▶ Existing approaches:
  ▶ Graph statistics (neighbourhood overlap, ...)
  ▶ Spectral clustering
▶ Limitations: these often require the full graph to compute, or domain-specific knowledge.
Relationship between social graphs and natural language
▶ Modeling the co-occurrence of vertices gives us information similar to measuring the co-occurrence of words.
▶ Seeing words co-occur tells us about the structure of the language (or, here, the graph).
Random walks ∼ sentences
▶ Since co-occurrence tells us about structural context, sample truncated random walks starting from each node (a minimal sampler is sketched below).
▶ Bonus: this is a natural way to split up the graph, i.e., we don't need the whole graph to get a node's embedding.
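A minimal sketch of such a sampler in Python, assuming the graph is an adjacency-list dict; the names (adj, random_walk, walk_length) are illustrative, not the paper's code:

    import random

    def random_walk(adj, start, walk_length, rng=random):
        # adj: dict mapping each node to a list of its neighbours.
        # Returns one truncated random walk rooted at `start`.
        walk = [start]
        for _ in range(walk_length - 1):
            neighbours = adj[walk[-1]]
            if not neighbours:  # dead end: stop the walk early
                break
            walk.append(rng.choice(neighbours))
        return walk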
Learning objective
▶ Given a random walk (v_1, ..., v_i), we update the representation Φ(v_i) ∈ ℝ^d to maximize the likelihood of the walk:

  \arg\min_{\Phi} \, -\log P(v_1, \ldots, v_{i-1} \mid \Phi(v_i))

▶ Probabilities are given by a multi-label classifier that maps Φ(v_i) → V^k, where k is the length of the walk (see the factorization sketched below).
▶ Nodes with similar Φ will have similar local graph 'structure'.
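To make this joint probability tractable, the paper relaxes it with an independence assumption, so the walk likelihood factorizes into per-vertex terms, roughly:

  P(v_1, \ldots, v_{i-1} \mid \Phi(v_i)) = \prod_{j=1}^{i-1} P(v_j \mid \Phi(v_i))

Each factor then becomes a single prediction problem over the vertex vocabulary.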
Speedups: SkipGram
▶ Skip-gram lets us split the walk into sliding windows of size w and update the nodes in each window using SGD (see the sketch below).
▶ \arg\min_{\Phi} \, -\log P(v_1, \ldots, v_{i-1} \mid \Phi(v_i)) becomes
▶ \arg\min_{\Phi} \, -\log P(v_{i-w}, \ldots, v_{i-1}, v_{i+1}, \ldots, v_{i+w} \mid \Phi(v_i))
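A sketch of the windowing in Python; skipgram_pairs is a hypothetical helper (the real SkipGram folds the SGD step into the same pass):

    def skipgram_pairs(walk, w):
        # Slide a window of radius w over the walk and yield
        # (center, context) pairs; each pair then gets one SGD
        # step on -log P(context | Phi(center)).
        for i, center in enumerate(walk):
            lo, hi = max(0, i - w), min(len(walk), i + w + 1)
            for j in range(lo, hi):
                if j != i:
                    yield center, walk[j]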
Speedups: Hierarchical Softmax
▶ Hierarchical softmax reduces the cost of the softmax normalization from O(|V|) to O(log |V|) by building a tree of binary classifiers (a path-probability sketch follows).
▶ If (b_1, ..., b_{⌈log |V|⌉}) is the sequence of tree nodes on the path from the root to the leaf for vertex v_k, then

  P(v_k \mid \Phi(v_j)) = \prod_{l=1}^{\lceil \log |V| \rceil} P(b_l \mid \Phi(v_j))
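A sketch of the path probability in Python, assuming each internal tree node holds a vector ψ and acts as a logistic binary classifier; the path encoding and names are assumptions, not the paper's code:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def path_probability(phi_v, path):
        # path: list of (psi, go_right) pairs along the root-to-leaf
        # path for the target vertex; each step is a binary decision.
        p = 1.0
        for psi, go_right in path:
            p_right = sigmoid(np.dot(phi_v, psi))
            p *= p_right if go_right else (1.0 - p_right)
        return p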
Training loop
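The training loop follows the paper's Algorithm 1: several passes over shuffled vertices, one truncated walk per vertex per pass, then Skip-gram updates on each walk. A rough Python sketch, reusing the random_walk and skipgram_pairs helpers above (sgd_step is a hypothetical callback):

    import random

    def deepwalk(adj, passes, walk_length, window, sgd_step):
        # One truncated walk per vertex per pass; shuffling the
        # vertex order speeds up SGD convergence.
        nodes = list(adj)
        for _ in range(passes):
            random.shuffle(nodes)
            for v in nodes:
                walk = random_walk(adj, v, walk_length)
                for center, context in skipgram_pairs(walk, window):
                    sgd_step(center, context)  # one update per pair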
Evaluation
▶ Task: multi-label node classification on large social graphs.
▶ Evaluation: micro- and macro-averaged F1 scores (illustrated below).
  ▶ The F1 score is the harmonic mean of precision and recall.
  ▶ Macro-F1 is the arithmetic mean of the per-class F1 scores.
  ▶ Micro-F1 pools the counts of correct labels over all samples and classes before computing F1.
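A quick illustration of the two averages with scikit-learn, on made-up multi-label indicator matrices:

    import numpy as np
    from sklearn.metrics import f1_score

    # Toy data: rows are nodes, columns are binary label indicators.
    y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
    y_pred = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 0]])

    print(f1_score(y_true, y_pred, average="macro"))  # mean of per-class F1
    print(f1_score(y_true, y_pred, average="micro"))  # F1 from pooled counts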
Parameter Sensitivity
Summary
▶ Modelling random walks as sentences in a language gives us:
  ▶ Continuous & low-dimensional representations → robustness
  ▶ Online training → walks naturally partition the graph
  ▶ A scalable implementation → SkipGram & hierarchical softmax
Limitations
▶ The representations themselves are left unexplored.
▶ The number of parameters grows ∝ the number of nodes in the graph.
▶ Similarity is only defined between nodes of a single graph.