statistical properties of network community structure
DESCRIPTION
Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research. Statistical properties of network community structure. Network communities. Communities: Sets of nodes with lots of connections inside and few to outside (the rest of the network) Assumption: - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/1.jpg)
Statistical properties of network community structureJure Leskovec, CMUKevin Lang, Anirban Dasgupta and Michael MahoneyYahoo! Research
![Page 2: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/2.jpg)
2
Network communities
Communities: Sets of nodes with lots
of connections inside and few to outside (the rest of the network)
Assumption: Networks are
(hierarchically) composed of communities (modules)
Communities, clusters, groups,
modules
Our question:Are large
networks really like this?
![Page 3: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/3.jpg)
3
Community score (quality) How community like is a
set of nodes?
Need a natural intuitive measure
Conductance (normalized cut)Φ(S) = # edges cut / # edges inside
Small Φ(S) corresponds to more community-like sets of nodes
S
S’
![Page 4: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/4.jpg)
4
Community score (quality)
Score: Φ(S) = # edges cut / # edges inside
What is “best”
community of 5 nodes?
![Page 5: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/5.jpg)
5
Community score (quality)
Score: Φ(S) = # edges cut / # edges inside
Bad communit
yΦ=5/6 = 0.83
What is “best”
community of 5 nodes?
![Page 6: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/6.jpg)
6
Community score (quality)
Score: Φ(S) = # edges cut / # edges inside
Better communit
y
Φ=5/7 = 0.7
Bad communit
y
Φ=2/5 = 0.4
What is “best”
community of 5 nodes?
![Page 7: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/7.jpg)
7
Community score (quality)
Score: Φ(S) = # edges cut / # edges inside
Better communit
y
Φ=5/7 = 0.7
Bad communit
y
Φ=2/5 = 0.4
Best communit
yΦ=2/8 = 0.25
What is “best”
community of 5 nodes?
![Page 8: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/8.jpg)
8
Network Community Profile Plot We define:
Network community profile (NCP) plotPlot the score of best community of size k
Search over all subsets of size k and find best: Φ(k=5) = 0.25
NCP plot is intractable to compute Use approximation algorithm
![Page 9: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/9.jpg)
9
NCP plot: Small Social Network Dolphin social network
Two communities of dolphins
NCP plotNetwork
![Page 10: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/10.jpg)
10
NCP plot: Zachary’s karate club
Zachary’s university karate club social network During the study club split into 2 The split (squares vs. circles) corresponds to cut B
NCP plotNetwork
![Page 11: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/11.jpg)
11
NCP plot: Network Science Collaborations between scientists in Networks
NCP plotNetwork
![Page 12: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/12.jpg)
12
Geometric and Hierarchical graphs
Hierarchical network
Geometric (grid-like) network – Small social
networks– Geometric and– Hierarchical networkhave downward NCP plot
![Page 13: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/13.jpg)
13
Our work: Large networks Previously researchers examined community
structure of small networks (~100 nodes) We examined more than 70 different large
social and information networks
Large real-world networks look very
different!
![Page 14: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/14.jpg)
14
Example of our findings Typical example:
General relativity collaboration network (4,158 nodes, 13,422 edges)
![Page 15: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/15.jpg)
15
Com
mun
ity sc
ore
Community size
NCP: LiveJournal (N=5M, E=42M)
Better and better
communities
Best communities get worse and worse
Best community
has 100 nodes
![Page 16: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/16.jpg)
16
Explanation: Downward part Whiskers are responsible for
downward slope of NCP plot
Whisker is a set of nodes connected to the network by a single
edge
NCP plot
Largest whisker
![Page 17: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/17.jpg)
17
Explanation: Upward part Each new edge inside the
community costs moreNCP plot
Φ=2/4 = 0.5Φ=8/6 = 1.3
Φ=64/14 = 4.5
Each node has twice as many
children
Φ=1/3 = 0.33
![Page 18: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/18.jpg)
18
Suggested network structure
Network structure: Core-
periphery (jellyfish, octopus)
Whiskers are responsible for
good communities
Denser and denser
core of the network
Core contains 60%
node and 80% edges
![Page 19: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/19.jpg)
19
Caveat: Bag of whiskersWhat if we allow
cuts that give disconnected communities?
•Cut all whiskers •Compose communities out of whiskers•How good “community” do we get?
![Page 20: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/20.jpg)
20
Communities made of whiskers
Com
mun
ity sc
ore
Community size
We get better community scores when
composing disconnected sets of
whiskersConnected communitie
sBag of whiskers
![Page 21: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/21.jpg)
21
Comparison to rewire network Take a real network G Rewire edges for a long time We obtain a random graph with same degree
distribution as the real network G
![Page 22: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/22.jpg)
22
Comparison to a rewired network
Rewired network: random
network with same degree distribution
![Page 23: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/23.jpg)
23
What is a good model?
What is a good model that explains such network structure?
None of the existing models work
Pref. attachment Small World Geometric Pref. Attachment
FlatDown and Flat
Flat and Down
![Page 24: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/24.jpg)
24
Forest Fire model works Forest Fire: connections spread like a fire
New node joins the network Selects a seed node Connects to some of its neighbors Continue recursively
As community grows it
blends into the core of
the network
![Page 25: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/25.jpg)
25
Forest Fire NCP plot
rewired
network
Bag of whisk
ers
![Page 26: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/26.jpg)
26
Conclusion and connections Whiskers:
Largest whisker has ~100 nodes Independent of network size Dunbar number: a person can maintain social
relationship to at most 150 people Bond vs. identity communities
Core: Core has little structure (hard to cut) Still more structure than the random network
![Page 27: Statistical properties of network community structure](https://reader036.vdocuments.net/reader036/viewer/2022062812/56816307550346895dd38588/html5/thumbnails/27.jpg)
27
Conclusion and connections NCP plot is a way to analyze network
community structure Our results agree with previous work on small
networks (that are commonly used for testing community finding algorithms)
But large networks are different Large networks
Whiskers + Core structure Small well isolated communities blend into the core
of the networks as they grow