COcoNuTS: Structure detection (2016. 4. 15.)


COcoNuTS

Overview

MethodsHierarchy by aggregation

Hierarchy by division

Hierarchy by shuffling

Spectral methods

Hierarchies & MissingLinks

Overlapping communities

Link-based methods

General structuredetection

References

................1 of 76

Structure detection methods

Complex Networks | @networksvox

CSYS/MATH 303, Spring, 2016

Prof. Peter Dodds | @peterdodds

Dept. of Mathematics & Statistics | Vermont Complex Systems Center

Vermont Advanced Computing Core | University of Vermont

Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.


Outline

Overview

Methods

  Hierarchy by aggregation

  Hierarchy by division

  Hierarchy by shuffling

  Spectral methods

  Hierarchies & Missing Links

  Overlapping communities

  Link-based methods

  General structure detection

References


Structure detection

▲ Zachary’s karate club [18, 12]

The issue: how do we elucidate the internal structure of large networks across many scales?

Possible substructures: hierarchies, cliques, rings, …

Plus: All combinations of substructures.

Much focus on hierarchies...


“Community detection in graphs”, Santo Fortunato, Physics Reports, 486, 75–174, 2010. [6]

doi:10.1016/j.physrep.2009.11.002

Author affiliation: Complex Networks and Systems Lagrange Laboratory, ISI Foundation, Viale S. Severo 65, 10133, Torino, I, Italy

Article history: Accepted 5 November 2009; available online 4 December 2009. Editor: I. Procaccia.

Keywords: Graphs; Clusters; Statistical physics

Abstract: The modern science of networks has brought significant advances to our understanding of complex systems. One of the most relevant features of graphs representing real systems is community structure, or clustering, i.e. the organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters. Such clusters, or communities, can be considered as fairly independent compartments of a graph, playing a similar role like, e.g., the tissues or the organs in the human body. Detecting communities is of great importance in sociology, biology and computer science, disciplines where systems are often represented as graphs. This problem is very hard and not yet satisfactorily solved, despite the huge effort of a large interdisciplinary community of scientists working on it over the past few years. We will attempt a thorough exposition of the topic, from the definition of the main elements of the problem, to the presentation of most methods developed, with a special focus on techniques designed by statistical physicists, from the discussion of crucial issues like the significance of clustering and how methods should be tested and compared against each other, to the description of applications to real networks.

Contents (as listed on the article's first page, with journal page numbers):

1. Introduction (76)

2. Communities in real-world networks (78)

3. Elements of community detection (82)

  3.1. Computational complexity (83)

  3.2. Communities (83)

    3.2.1. Basics (83)

    3.2.2. Local definitions (84)

    3.2.3. Global definitions (85)

    3.2.4. Definitions based on vertex similarity (86)

  3.3. Partitions (87)

    3.3.1. Basics (87)

    3.3.2. Quality functions: Modularity (88)

4. Traditional methods (90)

  4.1. Graph partitioning (90)

  4.2. Hierarchical clustering (93)

  4.3. Partitional clustering (93)

  4.4. Spectral clustering (94)

5. Divisive algorithms (96)

  5.1. The algorithm of Girvan and Newman (97)

  5.2. Other methods (99)


Hierarchy by aggregation, bottom up:

Idea: Extract hierarchical classification scheme for $N$ objects by an agglomeration process.

Need a measure of distance between all pairs of objects.

Example: Ward’s method [?]

Procedure:

1. Order pair-based distances.

2. Sequentially add links between nodes based on closeness.

3. Use additional criteria to determine when clusters are meaningful.

Clusters gradually emerge, likely with clusters inside of clusters.

Call above property Modularity.

Works well for data sets where a distance between all objects can be specified (e.g., Aussie Rules [9]).
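The three-step procedure above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: it uses single-linkage merging with Euclidean distance (not Ward's method), the function name `agglomerate` is hypothetical, and the "additional criterion" of step 3 is simply a target number of clusters.

```python
from itertools import combinations


def agglomerate(points, n_clusters):
    """Single-linkage agglomeration (an assumed, simple stand-in for Ward's method).

    Returns a cluster label per point and the list of links that were added.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    # Step 1: order pair-based distances.
    pairs = sorted(
        combinations(range(len(points)), 2),
        key=lambda ij: dist(points[ij[0]], points[ij[1]]),
    )

    clusters = len(points)
    links = []
    # Step 2: sequentially add links between the closest objects.
    for i, j in pairs:
        # Step 3 (assumed criterion): stop once the target cluster count is reached.
        if clusters <= n_clusters:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            clusters -= 1
            links.append((i, j))
    return [find(i) for i in range(len(points))], links


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, links = agglomerate(points, n_clusters=2)
```

Recording the order in which links are added is what yields the nested "clusters inside of clusters" picture; stopping at a fixed count is just one possible choice for the third step.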


Hierarchy by aggregation

Bottom up problems:

Tend to plainly not work on data sets representing networks with known modular structures.

Good at finding cores of well-connected (or similar) nodes... but fail to cope well with peripheral, in-between nodes.


Hierarchy by division

Top down:

Idea: Identify global structure first and recursively uncover more detailed structure.

Basic objective: find dominant components that have significantly more links within than without, as compared to randomized version.

We’ll first work through “Finding and evaluating community structure in networks” by Newman and Girvan (PRE, 2004). [12]

See also:

1. “Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality” by Newman (PRE, 2001). [10, 11]

2. “Community structure in social and biological networks” by Girvan and Newman (PNAS, 2002). [7]


Hierarchy by division

Idea: Edges that connect communities have higher betweenness than edges within communities.


Hierarchy by division

One class of structure-detection algorithms:

1. Compute edge betweenness for whole network.

2. Remove edge with highest betweenness.

3. Recompute edge betweenness.

4. Repeat steps 2 and 3 until all edges are removed.

5. Record when components appear as a function of # edges removed.

6. Generate dendrogram revealing hierarchical structure.

Figure note: Red line indicates appearance of four (4) components at a certain level.
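As a rough sketch of steps 1 to 5, the loop below removes the highest-betweenness edge, recomputes betweenness, and records each time a new component appears. It assumes the `networkx` library is available and uses shortest-path edge betweenness; the function name `division_splits` and the toy graph (two triangles joined by one edge) are illustrative choices, not part of the slides.

```python
import networkx as nx


def division_splits(G):
    """Return (number of edges removed, components) each time a component appears."""
    H = G.copy()
    n_components = nx.number_connected_components(H)
    removed = 0
    splits = []
    while H.number_of_edges() > 0:
        # Steps 1 and 3: (re)compute edge betweenness on the current network.
        betweenness = nx.edge_betweenness_centrality(H)
        # Step 2: remove the edge with the highest betweenness.
        edge = max(betweenness, key=betweenness.get)
        H.remove_edge(*edge)
        removed += 1
        # Step 5: record when new components appear.
        current = nx.number_connected_components(H)
        if current > n_components:
            n_components = current
            splits.append((removed, [set(c) for c in nx.connected_components(H)]))
    return splits


# Toy example: two triangles (nodes 0-2 and 3-5) joined by the single edge (2, 3).
G = nx.barbell_graph(3, 0)
splits = division_splits(G)
first_removed, first_components = splits[0]
```

The recorded sequence of splits is the information a dendrogram (step 6) would display; here the bridge edge goes first, separating the two triangles.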


Key element for division approach:

Recomputing betweenness.

Reason: Possible to have a low betweenness in links that connect large communities if other links carry majority of shortest paths.

When to stop?

How do we know which divisions are meaningful?

Modularity measure: difference in fraction of within-component edges to that expected for randomized version:

$$Q = \sum_i \left[ e_{ii} - a_i^2 \right]$$

where $e_{ij}$ is the fraction of (undirected) edges travelling between identified communities $i$ and $j$, and $a_i = \sum_j e_{ij}$ is the fraction of edges with at least one end in community $i$.
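To make the modularity measure concrete, here is a small sketch that builds the community-by-community matrix of edge fractions from an adjacency matrix and a community assignment, then evaluates the sum of (within-community fraction minus squared row sum). It assumes NumPy, a symmetric 0/1 adjacency matrix, and integer community labels starting at 0; each between-community edge is split evenly between the two off-diagonal entries so that the whole matrix sums to 1. The function name `modularity` is hypothetical.

```python
import numpy as np


def modularity(A, labels):
    """Modularity of a partition, computed from the matrix of edge fractions."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    n_groups = labels.max() + 1
    # S[v, c] = 1 if node v belongs to community c.
    S = np.zeros((A.shape[0], n_groups))
    S[np.arange(A.shape[0]), labels] = 1.0
    # e[c, d]: fraction of edge ends running between communities c and d.
    e = S.T @ A @ S / A.sum()
    # a[c]: row sums of e.
    a = e.sum(axis=1)
    return float(np.trace(e) - np.sum(a ** 2))


# Two triangles joined by one edge, split into their natural halves.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

q_split = modularity(A, [0, 0, 0, 1, 1, 1])
q_single = modularity(A, [0, 0, 0, 0, 0, 0])
```

For this toy graph the two-triangle split gives 6/7 - 1/2 = 5/14, while lumping everything into one community gives 0, which is the sense in which modularity says where to stop dividing.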


Hierarchy by division

Test case:

Generate random community-based networks.

$N = 128$ with four communities of size 32.

Add edges randomly within and across communities.

Example: $\langle k \rangle_{\text{in}} = 6$ and $\langle k \rangle_{\text{out}}$ = �.
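A test network of this kind can be generated with the standard library alone. The sketch below is an assumption-laden illustration: the function name `planted_partition` is hypothetical, edges are placed independently with probabilities chosen to hit the target mean within-community and between-community degrees, and the default `k_out=2.0` is an arbitrary illustrative value rather than one taken from the slides.

```python
import random
from itertools import combinations


def planted_partition(n_groups=4, group_size=32, k_in=6.0, k_out=2.0, seed=0):
    """Random network with equal-size communities and target mean degrees.

    k_in  : target mean number of neighbours inside a node's own community.
    k_out : target mean number of neighbours in other communities (assumed value).
    """
    rng = random.Random(seed)
    n = n_groups * group_size
    p_in = k_in / (group_size - 1)      # each node has group_size - 1 possible internal partners
    p_out = k_out / (n - group_size)    # and n - group_size possible external partners
    group = [v // group_size for v in range(n)]
    edges = [
        (u, v)
        for u, v in combinations(range(n), 2)
        if rng.random() < (p_in if group[u] == group[v] else p_out)
    ]
    return group, edges


group, edges = planted_partition()
internal = sum(1 for u, v in edges if group[u] == group[v])
mean_k_in = 2 * internal / len(group)
```

Feeding such networks to a detection method and checking whether the four planted communities are recovered is the point of the test case.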


Hierarchy by division

Maximum modularity $\simeq 0.5$ obtained when four communities are uncovered.

Further ‘discovery’ of internal structure is somewhat meaningless, as any communities arise accidentally.


Hierarchy by division

Factions in Zachary’s karate club network. [18]


Betweenness for electrons:

Unit resistors on each edge.

For every pair of nodes $s$ (source) and $t$ (sink), set up unit currents in at $s$ and out at $t$.

Measure absolute current along each edge $\ell$, $|I_{\ell,st}|$.

Sum $|I_{\ell,st}|$ over all pairs of nodes to obtain electronic betweenness for edge $\ell$.

(Equivalent to random walk betweenness.)

Contributing electronic betweenness for edge between nodes $i$ and $j$:

$$B^{\text{elec}}_{ij,st} = a_{ij} \, |V_{i,st} - V_{j,st}|.$$


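Electronic betweenness

Define some arbitrary voltage reference.

Kirchhoff’s laws: current flowing out of node $i$ must balance:

$$\sum_{j=1}^{N} \frac{1}{R_{ij}} (V_i - V_j) = \delta_{is} - \delta_{it}.$$

Between connected nodes, $R_{ij} = 1 = a_{ij} = 1/a_{ij}$.

Between unconnected nodes, $R_{ij} = \infty = 1/a_{ij}$.

We can therefore write:

$$\sum_{j=1}^{N} a_{ij} (V_i - V_j) = \delta_{is} - \delta_{it}.$$

Some gentle jiggery-pokery on the left hand side:

$$\sum_j a_{ij}(V_i - V_j) = V_i \sum_j a_{ij} - \sum_j a_{ij} V_j = V_i k_i - \sum_j a_{ij} V_j = \sum_j \left[ k_i \delta_{ij} V_j - a_{ij} V_j \right] = \left[ (K - A) \vec{V} \right]_i,$$

where $k_i$ is the degree of node $i$, $K$ is the diagonal matrix of degrees, and $A$ is the adjacency matrix.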


Electronic betweenness

Write right hand side as $[I^{\text{ext}}]_{i,st} = \delta_{is} - \delta_{it}$, where $\vec{I}^{\text{ext}}_{st}$ holds external source and sink currents.

Matrixingly then:

$$(K - A)\,\vec{V} = \vec{I}^{\text{ext}}_{st}.$$

$L = K - A$ (degree matrix minus adjacency matrix) is a beast of some utility, known as the Laplacian.

Solve for voltage vector $\vec{V}$ by $LU$ decomposition (Gaussian elimination).

Do not compute an inverse!

Note: voltage offset is arbitrary so no unique solution.

Presuming network has one component, null space of $K - A$ is one dimensional.

In fact, $\mathcal{N}(K - A) = \{ c\vec{1},\ c \in \mathbb{R} \}$ since $(K - A)\vec{1} = \vec{0}$.
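The electronic betweenness calculation can be sketched numerically as follows. This is a minimal illustration, assuming NumPy, a connected network, and a symmetric 0/1 adjacency matrix. The arbitrary voltage offset is fixed by grounding node 0 (setting its voltage to zero and dropping its row and column), which makes the reduced Laplacian system uniquely solvable; `np.linalg.solve` uses an LU-based solver rather than forming an inverse. The function name `electronic_betweenness` is hypothetical.

```python
import numpy as np
from itertools import combinations


def electronic_betweenness(A):
    """Sum of absolute edge currents over all source-sink pairs (unit resistors)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    laplacian = np.diag(A.sum(axis=1)) - A
    # Ground node 0 to remove the one-dimensional null space (assumed convention).
    keep = np.arange(1, n)
    reduced = laplacian[np.ix_(keep, keep)]
    B = np.zeros_like(A)
    for s, t in combinations(range(n), 2):
        # Unit current in at s and out at t.
        current = np.zeros(n)
        current[s], current[t] = 1.0, -1.0
        V = np.zeros(n)
        V[keep] = np.linalg.solve(reduced, current[keep])
        # With unit resistors, the current on edge (i, j) is the voltage difference.
        B += A * np.abs(V[:, None] - V[None, :])
    return B


# Path network 0 - 1 - 2: every source-sink current must pass along the path.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
B = electronic_betweenness(A)
```

On the three-node path, each edge carries a unit current for two of the three source-sink pairs, so both edges end up with betweenness 2, and the absent edge between the end nodes gets 0.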


Alternate betweenness measures:

Random walk betweenness:

Asking too much: Need full knowledge of network to travel along shortest paths.

One of many alternatives: consider all random walks between pairs of nodes $i$ and $j$.

Walk starts at node $i$, traverses the network randomly, ending as soon as it reaches $j$.

Record the number of times an edge is followed by a walk.

Consider all pairs of nodes.

Random walk betweenness of an edge = absolute difference in probability a random walk travels one way versus the other along the edge.

Equivalent to electronic betweenness (see also diffusion).


Hierarchy by division

Factions in Zachary’s karate club network. [18]


Hierarchy by division

Third column shows what happens if we don’t recompute betweenness after each edge removal.


Scientists working on networks (2004)


Dolphins!


Les Miserables

More network analyses for Les Miserables here and here.


Shuffling for structure

“Extracting the hierarchical organization of complex systems”, Sales-Pardo et al., PNAS (2007) [14, 15]

Consider all partitions of networks into $m$ groups.

As for Newman and Girvan approach, aim is to find partitions with maximum modularity:

$$Q = \sum_i \left[ e_{ii} - \Big( \sum_j e_{ij} \Big)^2 \right] = \operatorname{Tr} E - \| E^2 \|_1,$$

where $E$ is the matrix with entries $e_{ij}$.


Shuffling for structure

Consider partition network, i.e., the network of all possible partitions.

Defn: Two partitions are connected if they differ only by the reassignment of a single node.

Look for local maxima in partition network.

Construct an affinity matrix with entries $\text{aff}_{ij}$.

$\text{aff}_{ij} = \Pr$(random walker on modularity network ends up at a partition with $i$ and $j$ in the same group).

C.f. topological overlap between $i$ and $j$ = # matching neighbors for $i$ and $j$ divided by maximum of $k_i$ and $k_j$.
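For comparison with the affinity matrix, the topological overlap mentioned above is easy to compute directly. The sketch below assumes NumPy and a symmetric 0/1 adjacency matrix with no self-loops; it follows the definition as stated here (matching neighbors divided by the larger of the two degrees), whereas some published variants also add a term for a direct link. The function name `topological_overlap` is hypothetical.

```python
import numpy as np


def topological_overlap(A):
    """Matching neighbours of i and j divided by max(k_i, k_j)."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    # (A @ A)[i, j] counts the neighbours shared by i and j.
    common = A @ A
    denom = np.maximum.outer(k, k)
    overlap = np.divide(common, denom, out=np.zeros_like(common), where=denom > 0)
    # The overlap of a node with itself is not meaningful here; zero it out.
    np.fill_diagonal(overlap, 0.0)
    return overlap


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
A = np.zeros((4, 4))
for u, v in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

T = topological_overlap(A)
```

In this toy network nodes 0 and 1 share one neighbor out of a maximum degree of 2, giving 0.5, while nodes 2 and 3 are linked but share no neighbors, giving 0.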


Shuffling for structure

[Figure, panels A to E. A: Base network; B: Partition network; C: Coclassification matrix; D: Comparison to random networks (all the same!); E: Ordered coclassification matrix. Plot axes: Modularity, M (0.0 to 1.0) and P(M' > M) (0.0 to 1.0).]

Conclusion: no structure...


Method obtains a distribution of classification hierarchies.

Note: the hierarchy with the highest modularity score isn’t chosen.

Idea is to weight possible hierarchies according to their basin of attraction’s size in the partition network.

Next step: Given affinities, now need to sort nodes into modules, submodules, and so on.

Idea: permute nodes to minimize following cost:

$$C = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} \text{aff}_{ij} \, |i - j|.$$

Use simulated annealing (slow).

Observation: should achieve same results for more general cost function:

$$C = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} \text{aff}_{ij} \, f(|i - j|),$$

where $f$ is a strictly monotonically increasing function of 0, 1, 2, ...
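The node-ordering step can be sketched as a small simulated annealing loop. Everything specific here is an assumption made for illustration: the function names, the swap move, the geometric cooling schedule, and the parameter defaults are not taken from the paper; only the cost (affinity weighted by distance between positions in the ordering, divided by the number of nodes) follows the text above.

```python
import numpy as np


def ordering_cost(aff, order):
    """Cost of placing nodes in the given order: affinity times position distance."""
    n = len(order)
    pos = np.arange(n)
    dist = np.abs(pos[:, None] - pos[None, :])
    permuted = aff[np.ix_(order, order)]
    return float((permuted * dist).sum() / n)


def anneal_order(aff, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over node orderings using random pair swaps (assumed move)."""
    rng = np.random.default_rng(seed)
    n = aff.shape[0]
    order = np.arange(n)
    cost = ordering_cost(aff, order)
    best_order, best_cost = order.copy(), cost
    temp = t0
    for _ in range(steps):
        i, j = rng.choice(n, size=2, replace=False)
        order[i], order[j] = order[j], order[i]
        new_cost = ordering_cost(aff, order)
        # Accept improvements always, and worse orderings with Boltzmann probability.
        if new_cost <= cost or rng.random() < np.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best_order, best_cost = order.copy(), cost
        else:
            order[i], order[j] = order[j], order[i]  # undo the swap
        temp *= cooling
    return best_order, best_cost


# Toy affinity matrix: nodes 0 and 2 belong together, as do nodes 1 and 3.
aff = np.array([[0, 0, 1, 0],
                [0, 0, 0, 1],
                [1, 0, 0, 0],
                [0, 1, 0, 0]], dtype=float)
initial_cost = ordering_cost(aff, np.arange(4))
best_order, best_cost = anneal_order(aff)
```

A good ordering places high-affinity nodes next to each other, so that the reordered affinity matrix shows blocks along the diagonal, which is what the module-finding step then reads off.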


Shuffling for structure

[Figure. Panel A: affinity determination, comparing co-classification with topological overlap. Panel B: hierarchical tree determination, comparing hierarchical clustering with box model clustering. Vertical axis: mutual information (0.0 to 1.0); horizontal axis: $\rho$.]

$N = 640$, $\langle k \rangle = 16$, 3-tiered hierarchy.


Shuffling for structure

Table 1. Top-level structure of real-world networks

Network                  Nodes    Edges     Modules   Main modules
Air transportation       3,618    28,284    57        8
E-mail                   1,133    10,902    41        8
Electronic circuit       516      686       18        11
Escherichia coli KEGG    739      1,369     39        13
E. coli UCSD             507      947       28        17


Shuffling for structure

[Figure, panels A to D.]

Modules found match up with geopolitical units.


Shuffling for structure

[Figure, panels A and B: Modularity structure for metabolic network of E. coli (UCSD reconstruction). Horizontal axis: node (0 to 500); vertical axis: fraction of metabolites in main pathway (0.0 to 1.0). Pathway classes: Carbohydrates; Lipids; Nucleotides; Amino acids; Other amino acids; Glycans; Polyketides and nonribosomal peptides; Cofactors and vitamins; Secondary metabolites.]


General structure detection

“Detecting communities in large networks”, Capocci et al. (2005) [4]

Consider normal matrix $K^{-1}A$, random walk matrix $A^{\mathrm{T}} K^{-1}$, Laplacian $K - A$, and $A A^{\mathrm{T}}$.

Basic observation is that eigenvectors associated with secondary eigenvalues reveal evidence of structure.

Builds on Kleinberg’s HITS algorithm.
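The basic observation can be illustrated with the simplest of the matrices listed, the Laplacian (degree matrix minus adjacency matrix): the eigenvector belonging to its second-smallest eigenvalue tends to take opposite signs on two loosely connected groups. The sketch below is a generic illustration under that assumption, using NumPy on a toy network; it is not the specific procedure of Capocci et al., and the function name `second_eigenvector_split` is hypothetical.

```python
import numpy as np


def second_eigenvector_split(A):
    """Boolean group membership from the sign of the Laplacian's second eigenvector."""
    A = np.asarray(A, dtype=float)
    laplacian = np.diag(A.sum(axis=1)) - A
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    second = eigenvectors[:, 1]
    return second > 0


# Two triangles (nodes 0-2 and 3-5) joined by the single edge (2, 3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

side = second_eigenvector_split(A)
positive_group = {int(i) for i in np.flatnonzero(side)}
```

Which triangle ends up on the positive side is arbitrary, since an eigenvector is only defined up to sign; what matters is that the two triangles land on opposite sides.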


General structure detection

Example network:


General structure detection

Second eigenvector’s components:


General structure detection

Network of word associations for 10616 words.

Average in-degree of 7.

Using 2nd to 11th eigenvectors of a modified version of $A A^{\mathrm{T}}$:

Table 1. Words most correlated to science, literature and piano in the eigenvectors of $Q^{-1} W W^{\mathrm{T}}$. Values indicate the correlation.

Science       1        Literature    1        Piano        1
Scientific    0.994    Dictionary    0.994    Cello        0.993
Chemistry     0.990    Editorial     0.990    Fiddle       0.992
Physics       0.988    Synopsis      0.988    Viola        0.990
Concentrate   0.973    Words         0.987    Banjo        0.988
Thinking      0.973    Grammar       0.986    Saxophone    0.985
Test          0.973    Adjective     0.983    Director     0.984
Lab           0.969    Chapter       0.982    Violin       0.983
Brain         0.965    Prose         0.979    Clarinet     0.983
Equation      0.963    Topic         0.976    Oboe         0.983
Examine       0.962    English       0.975    Theater      0.982


Hierarchies and missing links

Clauset et al., Nature (2008) [5]

Idea: Shades indicate probability that nodes in left and right subtrees of dendrogram are connected.

Handle: Hierarchical random graph models.

Plan: Infer consensus dendrogram for a given real network.

Obtain probability that links are missing (big problem...).


Hierarchies and missing links

Model also predicts reasonably well:

1. average degree,

2. clustering,

3. and average shortest path length.

Table 1 | Comparison of original and resampled networks

Network        ⟨k⟩real   ⟨k⟩samp   Creal     Csamp        dreal    dsamp
T. pallidum    4.8       3.7(1)    0.0625    0.0444(2)    3.690    3.940(6)
Terrorists     4.9       5.1(2)    0.361     0.352(1)     2.575    2.794(7)
Grassland      3.0       2.9(1)    0.174     0.168(1)     3.29     3.69(2)

Statistics are shown for the three example networks studied and for new networks generated by resampling from our hierarchical model. The generated networks closely match the average degree ⟨k⟩, clustering coefficient C and average vertex–vertex distance d in each case, suggesting that they capture much of the structure of the real networks. Parenthetical values indicate standard errors on the final digits.


Hierarchies and missing links

[Figure, panels a and b: consensus dendrogram for grassland species.]

Consensus dendrogram for grassland species.
Copes with disassortative and assortative communities.


From PoCS: Small-worldness and social searchability

Social networks and identity:

Identity is formed from attributes such as:
  Geographic location
  Type of employment
  Religious beliefs
  Recreational activities.
Groups are formed by people with at least one similar attribute.
Attributes ⇔ Contexts ⇔ Interactions ⇔ Networks.


Social distance: bipartite affiliation networks.

[Figure: a bipartite affiliation network linking contexts 1–4 to individuals a–e, and the unipartite network on individuals a–e derived from it.]
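A minimal sketch of how a bipartite affiliation network collapses to a unipartite network of individuals: two people are linked whenever they share at least one context. The memberships below are hypothetical (the figure's actual links are not recoverable from the text), and the function name is an illustrative choice.

```python
from itertools import combinations

def project_onto_individuals(affiliations):
    """Return {(u, v): number of shared contexts} for every linked pair.

    affiliations maps each individual to the set of contexts they belong to.
    """
    members = {}
    for person, contexts in affiliations.items():
        for c in contexts:
            members.setdefault(c, set()).add(person)
    weights = {}
    for people in members.values():
        # Everyone inside one context is pairwise connected.
        for u, v in combinations(sorted(people), 2):
            weights[(u, v)] = weights.get((u, v), 0) + 1
    return weights

# Hypothetical memberships of individuals a-e in contexts 1-4.
affiliations = {"a": {1, 2}, "b": {1}, "c": {2, 3}, "d": {3, 4}, "e": {4}}
unipartite = project_onto_individuals(affiliations)
```

The edge weight counts shared contexts, which is one simple way to keep some of the information that the projection otherwise discards.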


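Social distance: context distance

[Figure: a context hierarchy for occupation. Occupation splits into education and health care; education into kindergarten teacher and high school teacher; health care into nurse and doctor. Individuals a–e sit at the leaves.]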


Models

Generalized affiliation networks:

[Figure: individuals a–e placed in several contexts at once: geography, occupation, and age (0 to 100).]

Blau & Schwartz [2], Simmel [16], Breiger [3], Watts et al. [17]; see also Google+ Circles.


Dealing with community overlap:

Earlier structure detection algorithms, agglomerative or divisive, force communities to be purely distinct.
Overlap: Acknowledge nodes can belong to multiple communities.
Palla et al. [13] detect communities as sets of adjacent $k$-cliques (must share $k-1$ nodes).
One of several issues: how to choose $k$?
Four new quantities:
  $m$, number of communities a node belongs to.
  $s^{\rm ov}_{\alpha,\beta}$, number of nodes shared between two given communities, $\alpha$ and $\beta$.
  $d^{\rm com}_{\alpha}$, degree of community $\alpha$.
  $s^{\rm com}_{\alpha}$, community $\alpha$'s size.
Associated distributions: $P(m)$, $P(s^{\rm ov}_{\alpha,\beta})$, $P(d^{\rm com}_{\alpha})$, and $P(s^{\rm com}_{\alpha})$.
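The clique-based definition above can be sketched in a few lines. This is a brute-force illustration for small graphs only, not the efficient technique of Palla et al.: it enumerates all $k$-cliques, merges cliques that share $k-1$ nodes, and reads off the membership number $m$. The function name and the toy graph are assumptions made for the example.

```python
from itertools import combinations

def k_clique_communities(edges, k):
    """Return communities as unions of adjacent k-cliques (sharing k-1 nodes)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Brute force: test every k-subset of nodes for completeness.
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(b in adj[a] for a, b in combinations(c, 2))]
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Union cliques that are adjacent, i.e. overlap in exactly k-1 nodes.
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)
    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return list(groups.values())

# Toy graph: two triangles that meet at node 3 only.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)]
communities = k_clique_communities(edges, 3)
membership = {n: sum(n in c for c in communities) for n in range(1, 6)}
```

Here node 3 has $m = 2$ and the two communities have overlap size 1. With $k = 2$ the same graph collapses to a single community, which illustrates why the choice of $k$ matters.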


[First page of the paper, as shown on the slide:]

Uncovering the overlapping community structure of complex networks in nature and society
Gergely Palla, Imre Derenyi, Illes Farkas & Tamas Vicsek
Biological Physics Research Group of the Hungarian Academy of Sciences and Department of Biological Physics, Eotvos University, Pazmany P. stny. 1A, H-1117 Budapest, Hungary.
Nature, Vol 435, 9 June 2005, doi:10.1038/nature03607

Many complex systems in nature and society can be described in terms of networks capturing the intricate web of connections among the units they are made of. A key question is how to interpret the global organization of such networks as the coexistence of their structural subunits (communities) associated with more highly interconnected parts. Identifying these a priori unknown building blocks (such as functionally related proteins, industrial sectors and groups of people) is crucial to the understanding of the structural and functional properties of networks. The existing deterministic methods used for large networks find separated communities, whereas most of the actual networks are made of highly overlapping cohesive groups of nodes. Here we introduce an approach to analysing the main statistical features of the interwoven sets of overlapping communities that makes a step towards uncovering the modular structure of complex systems. After defining a set of new characteristic quantities for the statistics of communities, we apply an efficient technique for exploring overlapping communities on a large scale. We find that overlaps are significant, and the distributions we introduce reveal universal features of networks. Our studies of collaboration, word-association and protein interaction graphs show that the web of communities has non-trivial correlations and specific scaling properties.

Most real networks typically contain parts in which the nodes (units) are more highly connected to each other than to the rest of the network. The sets of such nodes are usually called clusters, communities, cohesive groups or modules; they have no widely accepted, unique definition. In spite of this ambiguity, the presence of communities in networks is a signature of the hierarchical nature of complex systems. The existing methods for finding communities in large networks are useful if the community structure is such that it can be interpreted in terms of separated sets of communities (see Fig. 1b and refs 10, 15, 16–18). However, most real networks are characterized by well-defined statistics of overlapping and nested communities. This can be illustrated by the numerous communities that each of us belongs to, including those related to our scientific activities or personal life (school, hobby, family) and so on, as shown in Fig. 1a. Furthermore, members of our communities have their own communities, resulting in an extremely complicated web of the communities themselves. This has long been understood by sociologists but has never been studied systematically for large networks. Another, biological, example is that a large fraction of proteins belong to several protein complexes simultaneously.

In general, each node $i$ of a network can be characterized by a membership number $m_i$, which is the number of communities that the node belongs to. In turn, any two communities $\alpha$ and $\beta$ can share $s^{\rm ov}_{\alpha,\beta}$ nodes, which we define as the overlap size between these communities. Naturally, the communities also constitute a network, with the overlaps being their links. The number of such links of community $\alpha$ can be called its community degree, $d^{\rm com}_{\alpha}$. Finally, the size $s^{\rm com}_{\alpha}$ of any community $\alpha$ can most naturally be defined as the number of its nodes. To characterize the community structure of a large network we introduce the distributions of these four basic quantities. In particular we focus on their cumulative distribution

“Uncovering the overlapping community structure of complex networks in nature and society” Palla et al., Nature, 435, 814–818, 2005. [13]


Figure 1 | Illustration of the concept of overlapping communities. a, The black dot in the middle represents either of the authors of this paper, with several of his communities around. Zooming in on the scientific community demonstrates the nested and overlapping structure of the communities, and depicting the cascades of communities starting from some members exemplifies the interwoven structure of the network of communities. b, Divisive and agglomerative methods grossly fail to identify the communities when overlaps are significant. c, An example of overlapping $k$-clique communities at $k = 4$. The yellow community overlaps the blue one in a single node, whereas it shares two nodes and a link with the green one. These overlapping regions are emphasized in red. Notice that any $k$-clique (complete subgraph of size $k$) can be reached only from the $k$-cliques of the same community through a series of adjacent $k$-cliques. Two $k$-cliques are adjacent if they share $k - 1$ nodes.


Figure 2 | The community structure around a particular node in three different networks. The communities are colour coded, the overlapping nodes and links between them are emphasized in red, and the volume of the balls and the width of the links are proportional to the total number of communities they belong to. For each network the value of $k$ has been set to 4. a, The communities of G. Parisi in the co-authorship network of the Los Alamos Condensed Matter archive (for threshold weight $w^* = 0.75$) can be associated with his fields of interest. b, The communities of the word ‘bright’ in the South Florida Free Association norms list (for $w^* = 0.025$) represent the different meanings of this word. c, The communities of the protein Zds1 in the DIP core list of the protein–protein interactions of S. cerevisiae can be associated with either protein complexes or certain functions.

Two tunable parameters: $w^*$, the link weight threshold, and $k$, the clique size.


Figure 4 | Statistics of the $k$-clique communities for three large networks. The networks are the co-authorship network of the Los Alamos Condensed Matter archive (triangles, $k = 6$, $f^* = 0.93$), the word-association network of the South Florida Free Association norms (squares, $k = 4$, $f^* = 0.67$), and the protein interaction network of the yeast S. cerevisiae from the DIP database (circles, $k = 4$). a, The cumulative distribution function of the community size follows a power law with exponents between $-1$ (upper line) and $-1.6$ (lower line). b, The cumulative distribution of the community degree starts exponentially and then crosses over to a power law (with the same exponent as for the community size distribution). c, The cumulative distribution of the overlap size. d, The cumulative distribution of the membership number.


A link-based approach:

What we know now: Many network analyses profit from focusing on links.
Idea: form communities of links rather than communities of nodes.
Observation: Links typically of one flavor, while nodes may have many flavors.
Link communities induce overlapping and still hierarchically structured communities of nodes.

[Applause.]
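A minimal sketch of the last point: once every link carries exactly one community label, the node communities follow by collecting link endpoints, and overlap appears automatically wherever a node touches links of more than one flavor. The labels and network below are hypothetical, and how the link labels are obtained is deliberately left out.

```python
def node_communities(link_labels):
    """Map each link community label to the set of nodes its links touch."""
    communities = {}
    for (u, v), label in link_labels.items():
        communities.setdefault(label, set()).update((u, v))
    return communities

def memberships(communities):
    """Count how many communities each node belongs to."""
    counts = {}
    for nodes in communities.values():
        for n in nodes:
            counts[n] = counts.get(n, 0) + 1
    return counts

# Hypothetical link partition: every link has one flavor.
link_labels = {
    ("a", "b"): "family", ("b", "c"): "family", ("a", "c"): "family",
    ("c", "d"): "work", ("d", "e"): "work", ("c", "e"): "work",
}
induced = node_communities(link_labels)
counts = memberships(induced)
```

Node c belongs to both induced communities even though no link does, which is the sense in which a hard partition of links yields overlapping communities of nodes.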


[Page 762 of the paper, as shown on the slide; Nature, Vol 466, 5 August 2010. Figures 1 and 2 on this page are shown separately with their captions.]

[…] most comprehensive proxy of a large-scale social network currently in existence; the metabolic network iAF1260, from Escherichia coli K-12 MG1655 strain, is one of the most elaborate reconstructions currently available; and the three protein–protein interaction networks of Saccharomyces cerevisiae are the most recent and complete protein–protein interaction data yet published.

These networks possess rich metadata that allow us to describe the structural and functional roles of each node. For example, the biological roles of each protein in the protein–protein interaction network can be described by a controlled vocabulary (Gene Ontology terms). By calculating metadata-based similarity measures between nodes (Methods and Supplementary Information, section 5), we can determine the quality of communities by the similarity of the nodes they contain (‘community quality’). Likewise, we can use metadata to estimate the expected amount of overlap around a node, testing the quality of the discovered overlap according to the metadata (‘overlap quality’). For example, metabolites that participate in more metabolic pathways are expected to belong to more communities than metabolites that participate in fewer pathways. Some methods may find high-quality communities but only for a small fraction of the network; coverage measures describe how much of the network was classified by each algorithm (‘community coverage’) and how much overlap was discovered (‘overlap coverage’). Each community algorithm is tested by comparing its output with the metadata, to determine how well the discovered community structure reflects the metadata, according to the four measures. Each measure is normalized such that the best method attains a value of one. ‘Composite performance’ is the sum of these four normalized measures, such that the maximum achievable score is four. Full details are in Methods and Supplementary Information, sections 5 and 6.

Figure 2 displays the results of this quantitative comparison, showing that link communities reveal more about every network's metadata than other tested methods. Not only is our approach the overall leader in every network, it is also the winner in most individual aspects of the composite performance for all networks, particularly the quality measures. The performance of link communities stands out for dense networks, such as the metabolic and word association networks, which are expected to have pervasively overlapping structure.

“Link communities reveal multiscale complexity in networks” Ahn, Bagrow, and Lehmann, Nature, 466, 761–764, 2010. [1]

[Figure 1f: link communities around the word ‘Newton’ in the word association network. Labelled communities: “Experiment, science”; “Newton, gravity, apple”; “Smart, intellect, scientists”; “Clever, wit”; “Science, scientists”.]


[Figure 1, panels a–f. a, b: a node with overlapping communities labelled University, Joint appointment, Home and work, Family, and Buildings in same neighborhood. c: node dendrogram. d, e: a nine-node example network (nodes 1–9) with its link similarity matrix and link dendrogram over links 1–2, 1–3, 1–4, 2–3, 2–4, 3–4, 4–5, 4–6, 4–7, 5–6, 7–8, 7–9, 8–9. f: link communities around the word ‘Newton’.]

Figure 1 | Overlapping communities lead to dense networks and prevent the discovery of a single node hierarchy. a, Local structure in many networks is simple: an individual node sees the communities it belongs to. b, Complex global structure emerges when every node is in the situation displayed in a. c, Pervasive overlap hinders the discovery of hierarchical organization because nodes cannot occupy multiple leaves of a node dendrogram, preventing a single tree from encoding the full hierarchy. d, e, An example showing link communities (colours in d), the link similarity matrix (e; darker entries show more similar pairs of links) and the link dendrogram (e). f, Link communities from the full word association network around the word ‘Newton’. Link colours represent communities and filled regions provide a guide for the eye. Link communities capture concepts related to science and allow substantial overlap. Note that the words were produced by experiment participants during free word associations.

Note: See details of paper on how to choose link communities well based on partition density $D$.
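For readers who want the quantity behind that note, here is a minimal sketch of partition density, assuming the definition given in Ahn et al. [1]: for a link community $c$ with $m_c$ links and $n_c$ nodes, $D_c = \frac{m_c - (n_c - 1)}{n_c(n_c-1)/2 - (n_c - 1)}$ (taken as 0 when $n_c = 2$), and $D = \frac{1}{M}\sum_c m_c D_c$ with $M$ the total number of links. The function name and toy partitions are illustrative assumptions.

```python
def partition_density(link_communities):
    """Partition density D of a link partition.

    link_communities is an iterable of lists of links (u, v), one list per
    community; each link is assumed to appear exactly once overall.
    """
    total_links = 0
    weighted = 0.0
    for links in link_communities:
        m_c = len(links)
        n_c = len({n for link in links for n in link})
        total_links += m_c
        if n_c > 2:  # a single-link community contributes zero
            weighted += m_c * (m_c - (n_c - 1)) / ((n_c - 2) * (n_c - 1))
    return 2.0 * weighted / total_links

# Toy network: two triangles sharing node 3.
triangle_1 = [(1, 2), (2, 3), (1, 3)]
triangle_2 = [(3, 4), (4, 5), (3, 5)]
d_split = partition_density([triangle_1, triangle_2])
d_merged = partition_density([triangle_1 + triangle_2])
```

Under this definition, cutting the link dendrogram where $D$ is largest favors communities that are clique-like; in the toy case the two-triangle split scores higher than lumping all six links together.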


[Figure 2: stacked bars of composite performance (0 to 4) for four methods on eleven networks, grouped as biological, social, and other networks. Measures: community quality, community coverage, overlap quality, overlap coverage. Methods: L, link clustering; C, clique percolation; G, greedy modularity; I, Infomap.]

Network        N         ⟨k⟩
Metabolic      1,042     16.81
PPI (Y2H)      1,647     3.06
PPI (AP/MS)    1,004     16.57
PPI (LC)       1,213     4.21
PPI (all)      2,729     8.92
Phone          885,989   6.34
Actor          67,411    8.90
US Congress    390       38.95
Philosopher    1,219     9.80
Word assoc.    5,018     22.02
Amazon.com     18,142    5.09

Figure 2 | Assessing the relevance of link communities using real-world networks. Composite performance (Methods and Supplementary Information) is a data-driven measure of the quality (relevance of discovered memberships) and coverage (fraction of network classified) of community and overlap. Tested algorithms are link clustering, introduced here; clique percolation; greedy modularity optimization; and Infomap. Test networks were chosen for their varied sizes and topologies and to represent the different domains where network analysis is used. Shown for each are the number of nodes, N, and the average number of neighbours per node, ⟨k⟩. Link clustering finds the most relevant community structure in real-world networks. AP/MS, affinity-purification/mass spectrometry; LC, literature curated; PPI, protein–protein interaction; Y2H, yeast two-hybrid.

Comparison of structure detection algorithms using four measures over many networks.
Revealed communities are matched against ‘known’ communities recorded in network metadata.
Link approach particularly good for dense, overlapful networks.
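The composite score can be sketched as follows, assuming only what the paper excerpt states: each of the four measures is normalized so that the best method attains one, and the composite is the sum of the four normalized values (maximum four). The raw scores below are made-up numbers for illustration, not results from the paper, and the function name is hypothetical.

```python
def composite_performance(raw):
    """raw[method][measure] -> raw score; returns method -> composite score."""
    measures = {m for scores in raw.values() for m in scores}
    # Best raw value per measure across all methods.
    best = {m: max(scores[m] for scores in raw.values()) for m in measures}
    return {
        method: sum(scores[m] / best[m] for m in measures if best[m] > 0)
        for method, scores in raw.items()
    }

# Made-up raw scores for two methods on one network.
raw = {
    "links": {"community quality": 0.8, "overlap quality": 0.5,
              "community coverage": 1.0, "overlap coverage": 0.4},
    "clique": {"community quality": 0.4, "overlap quality": 0.5,
               "community coverage": 0.5, "overlap coverage": 0.2},
}
composite = composite_performance(raw)
```

Because each measure is rescaled by the best method on that network, composite scores are comparable across methods within a network but not across networks.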


[Figure 4, panels a–e. a–c: maps (scale bar 50 km) showing communities at thresholds t = 0.20, 0.24, and 0.27, including the largest, second largest, and third largest communities. d: the largest community, its largest subcommunity, and the remaining hierarchy, with partition density D plotted against threshold t. e: Q/Q_max against link dendrogram threshold t for the word association, metabolic, and phone networks, actual versus control.]

Figure 4 | Meaningful communities at multiple levels of the link dendrogram. a–c, The social network of mobile phone users displays co-located, overlapping communities on multiple scales. a, Heat map of the most likely locations of all users in the region, showing several cities. b, Cutting the dendrogram above the optimum threshold yields small, intra-city communities (insets). c, Below the optimum threshold, the largest communities become spatially extended but still show correlation. d, The social network within the largest community in c, with its largest subcommunity highlighted. The highlighted subcommunity is shown along with its link dendrogram and partition density, D, as a function of threshold, t. Link colours correspond to dendrogram branches. e, Community quality, Q, as a function of dendrogram level, compared with random control (Methods).


General structure detection

“The discovery of structural form” Kemp and Tenenbaum, PNAS (2008) [8]

[Figure, panels A and B. A: the same seven animals (robin, ostrich, crocodile, snake, turtle, bat, gorilla) arranged by different methods: PCA/MDS, hierarchical clustering, unidimensional scaling, clustering, circumplex models, and minimum spanning tree. B: the learning problem, going from data (objects described by features f1, f2, ..., f100) to a structure and its form (here, a tree).]


General structure detection

[Figure, panels A–D. A: table pairing each structural form with its generative process (a node replacement rule, ⇒). Forms listed: Partition, Chain, Order, Ring, Hierarchy, Tree, Grid (Chain × Chain), Cylinder (Chain × Ring).]

Top-down description of form.
Node replacement graph grammar: parent node becomes two child nodes.
B–D: Growing chains, orders, and trees.


Example learned structures:

[Figure, panels A–E. Learned structures include one over animals (Elephant, Rhino, Horse, Cow, Camel, Giraffe, Chimp, Gorilla, Mouse, Squirrel, Tiger, Lion, Cat, Dog, Wolf, Seal, Dolphin, Robin, Eagle, Chicken, Salmon, Trout, Bee, Iguana, Alligator, Butterfly, Ant, Finch, Penguin, Cockroach, Whale, Ostrich, Deer), one over Supreme Court justices (Ginsburg, Brennan, Scalia, Thomas, O'Connor, Kennedy, White, Souter, Breyer, Marshall, Blackmun, Stevens, Rehnquist), and one over world cities (New York, Bombay, Buenos Aires, Moscow, Sao Paulo, Mexico City, Jakarta, Tokyo, Lima, London, Bangkok, Santiago, Los Angeles, Berlin, Madrid, Chicago, Vancouver, Toronto, Sydney, Perth, Anchorage, Cape Town, Nairobi, Vladivostok, Dakar, Kinshasa, Bogota, Honolulu, Wellington, Cairo, Shanghai, Teheran, Irkutsk, Manila, Budapest).]

Biological features; Supreme Court votes; perceived color differences; face differences; and distances between cities.


General structure detection

[Figure: structures learned for the same set of animals using 5 features, 20 features, and 110 features.]

Effect of adding features on detected form.
Straight partition ⇓ simple tree ⇓ complex tree


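General structure detection

Performance for test networks.

[Figure: grid of results with one column per true form (Partition, Chain, Ring, Tree, Grid) and the candidate forms Tree, Ring, Chain, Grid, and Partition; panel labels 100, 5000, and 10000, with asterisks marking selected entries.]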


References I

[1] Y.-Y. Ahn, J. P. Bagrow, and S. Lehmann. Link communities reveal multiscale complexity in networks. Nature, 466(7307):761–764, 2010.

[2] P. M. Blau and J. E. Schwartz. Crosscutting Social Circles. Academic Press, Orlando, FL, 1984.

[3] R. L. Breiger. The duality of persons and groups. Social Forces, 53(2):181–190, 1974.

[4] A. Capocci, V. Servedio, G. Caldarelli, and F. Colaiori. Detecting communities in large networks. Physica A: Statistical Mechanics and its Applications, 352:669–676, 2005.


References II

[5] A. Clauset, C. Moore, and M. E. J. Newman. Hierarchical structure and the prediction of missing links in networks. Nature, 453:98–101, 2008.

[6] S. Fortunato. Community detection in graphs. Physics Reports, 486:75–174, 2010.

[7] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proc. Natl. Acad. Sci., 99:7821–7826, 2002.

[8] C. Kemp and J. B. Tenenbaum. The discovery of structural form. Proc. Natl. Acad. Sci., 105:10687–10692, 2008.


References III

[9] D. P. Kiley, A. J. Reagan, L. Mitchell, C. M. Danforth,and P. S. Dodds.The game story space of professional sports:Australian Rules Football.Draft version of the present paper using purerandom walk null model. Available online athttp://arxiv.org/abs/1507.03886v1. AccesssedJanuary 17, 2016, 2015. pdf

[10] M. E. J. Newman. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys. Rev. E, 64(1):016132, 2001.

[11] M. E. J. Newman. Erratum: Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality [Phys. Rev. E 64, 016132 (2001)]. Phys. Rev. E, 73:039906(E), 2006.

[12] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69(2):026113, 2004.

[13] G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814–818, 2005.

[14] M. Sales-Pardo, R. Guimerà, A. A. Moreira, and L. A. N. Amaral. Extracting the hierarchical organization of complex systems. Proc. Natl. Acad. Sci., 104:15224–15229, 2007.

[15] M. Sales-Pardo, R. Guimerà, A. A. Moreira, and L. A. N. Amaral. Extracting the hierarchical organization of complex systems: Correction. Proc. Natl. Acad. Sci., 104:18874, 2007.

[16] G. Simmel. The number of members as determining the sociological form of the group. I. American Journal of Sociology, 8:1–46, 1902.

[17] D. J. Watts, P. S. Dodds, and M. E. J. Newman. Identity and search in social networks. Science, 296:1302–1305, 2002.

[18] W. W. Zachary. An information flow model for conflict and fission in small groups. J. Anthropol. Res., 33:452–473, 1977.
