Resolution Limit in Community Detection

Yilin Shen 02/18/2009 1



Page 1: Resolution Limit in  Community Detection

Yilin Shen 02/18/2009

Page 2: Resolution Limit in  Community Detection

Definition: Given a network (graph G = (V, E)), a COMMUNITY is a subgraph of the network whose nodes are more tightly connected with each other than with nodes outside the subgraph.

Applications:
◦ Social networks
◦ Biochemical networks
◦ Internet
◦ Food webs

Page 3: Resolution Limit in  Community Detection

Definition: Modularity is a quantitative measure that compares the number of links inside a given module with the expected value for a randomized graph of the same size and degree sequence.

Objective: Maximize the modularity:

(the number of links inside a given module) − (the expected value for a randomized graph of the same size and degree sequence)

Page 4: Resolution Limit in  Community Detection

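$l_s$: number of links inside module $s$

$L$: number of links in the network

$d_s$: total degree of the nodes in module $s$

$d_s^2 / (4L)$: expected number of links in module $s$

$$Q = \sum_{s=1}^{m} \left[ \frac{l_s}{L} - \left( \frac{d_s}{2L} \right)^2 \right] = \frac{1}{L} \sum_{s=1}^{m} \left[ l_s - \frac{d_s^2}{4L} \right]$$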
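As a minimal sketch of how the modularity $Q = \sum_s [\, l_s/L - (d_s/2L)^2 \,]$ can be evaluated for a given partition, the Python snippet below counts internal links and total degree per module from an edge list. The function name `modularity`, the toy graph (two triangles joined by one link), and the labels `"A"` and `"B"` are illustrative assumptions, not part of the slides.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Q = sum_s [ l_s/L - (d_s/(2L))^2 ] for an undirected, unweighted graph."""
    L = len(edges)
    internal = defaultdict(int)  # l_s: links with both ends in module s
    degree = defaultdict(int)    # d_s: total degree of the nodes in module s
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[s] / L - (degree[s] / (2 * L)) ** 2 for s in degree)

# Toy example (assumption): two triangles joined by a single link.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_module = {node: "A" for node in range(6)}

q_two = modularity(edges, two_modules)  # each triangle is its own module
q_one = modularity(edges, one_module)   # the whole graph as one module
```

Here each triangle has $l_s = 3$ and $d_s = 7$ with $L = 7$, so splitting the graph into the two triangles scores higher than keeping it as a single module, whose modularity is zero.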

Page 5: Resolution Limit in  Community Detection

Probability that a randomly selected stub ends in module $s$: $\frac{d_s}{2L}$

Page 6: Resolution Limit in  Community Detection

Probability that the link is internal to module $s$:

$$\frac{d_s}{2L} \cdot \frac{d_s}{2L} = \left( \frac{d_s}{2L} \right)^2$$

Expected number of links in module $s$:

$$L \left( \frac{d_s}{2L} \right)^2 = \frac{d_s^2}{4L}$$
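The stub argument can be checked numerically. The sketch below pairs $2L$ stubs at random and counts how many of the resulting $L$ links have both ends in module $s$. The values `L = 1000` and `d_s = 400`, the trial count, and the seed are arbitrary assumptions chosen for illustration. The random pairing gives a mean that differs slightly from $d_s^2/(4L)$ at finite size, so only approximate agreement should be expected.

```python
import random

def mean_internal_links(d_s, L, trials=200, seed=0):
    """Average number of links internal to module s under random stub pairing."""
    rng = random.Random(seed)
    # True marks a stub attached to a node of module s; there are 2L stubs in total.
    stubs = [True] * d_s + [False] * (2 * L - d_s)
    total = 0
    for _ in range(trials):
        rng.shuffle(stubs)
        # Consecutive stubs (0,1), (2,3), ... form the L random links.
        total += sum(1 for i in range(0, 2 * L, 2) if stubs[i] and stubs[i + 1])
    return total / trials

L, d_s = 1000, 400
estimate = mean_internal_links(d_s, L)
predicted = d_s ** 2 / (4 * L)  # expected number of links from the slide formula
```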

Page 7: Resolution Limit in  Community Detection

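Subgraph $S$ is a module if

$$\frac{l_s}{L} - \left( \frac{d_s}{2L} \right)^2 > 0$$

Since $l_s^{out} = a\,l_s$ with $a \ge 0$ and $l_s \le L$,

$$d_s = 2 l_s + l_s^{out} = 2 l_s + a\,l_s = (a+2)\,l_s$$

so the condition becomes

$$\frac{l_s}{L} - \left( \frac{(a+2)\,l_s}{2L} \right)^2 > 0 \;\Longrightarrow\; l_s < \frac{4L}{(a+2)^2}$$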

Page 8: Resolution Limit in  Community Detection

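Consider the “weak” definition of a community: $d_s^{in} > d_s^{out}$.

Since $l_s^{out} = a\,l_s$ with $a \ge 0$ and $l_s \le L$,

$$d_s^{out} = l_s^{out} = a\,l_s < 2 l_s = d_s^{in} \;\Longrightarrow\; a < 2, \qquad l_s < \frac{4L}{(a+2)^2}$$

For $0 \le a < 2$,

$$\frac{L}{4} < \frac{4L}{(a+2)^2} \le L$$

Therefore for each $l_s < \frac{L}{4}$, $l_s < \frac{4L}{(a+2)^2}$ holds.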
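The module condition and the sufficient bound $l_s < L/4$ can be sketched as follows. The helper names `is_module` and `module_size_bound`, and the value `L = 1000`, are assumptions made for this illustration.

```python
def is_module(l_s, a, L):
    """True if the module's modularity term is positive, with d_s = (a + 2) * l_s."""
    return l_s / L - ((a + 2) * l_s / (2 * L)) ** 2 > 0

def module_size_bound(a, L):
    """Upper bound 4L / (a + 2)^2 on l_s for the term to be positive."""
    return 4 * L / (a + 2) ** 2

L = 1000
# Sample a in [0, 2): every bound stays above L/4, so l_s < L/4 is always enough.
bounds = [module_size_bound(k / 10, L) for k in range(0, 20)]

small_ok = is_module(249, 1.9, L)   # l_s < L/4: a module for any a < 2
large_bad = is_module(300, 1.9, L)  # above the bound for a = 1.9 (about 263)
large_ok = is_module(300, 0.5, L)   # same size passes when a is smaller (bound 640)
```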

Page 9: Resolution Limit in  Community Detection

A network made of $m$ identical complete graphs (or ‘cliques’), disjoint from each other and each with $l$ links (the $m$ connected components need not actually be cliques), has modularity

$$Q = m \left[ \frac{l}{L} - \left( \frac{2l}{2L} \right)^2 \right] = 1 - \frac{1}{m}$$

which converges to 1 when the number of cliques goes to infinity.
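The limit can be checked in one line, assuming each of the $m$ disjoint components has $l$ internal links and no external ones, so that $L = m\,l$ and $d_s = 2l$:

```latex
Q = \sum_{s=1}^{m}\left[\frac{l}{L} - \left(\frac{2l}{2L}\right)^2\right]
  = m\left[\frac{1}{m} - \frac{1}{m^2}\right]
  = 1 - \frac{1}{m}
  \xrightarrow{\; m \to \infty \;} 1
```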

Page 10: Resolution Limit in  Community Detection

A connected network with $N$ nodes and $L$ links which maximizes modularity:

$$Q = \sum_{s=1}^{m} \left[ \frac{l_s}{L} - \left( \frac{2 l_s + 2}{2L} \right)^2 \right]$$

where

$$\sum_{s=1}^{m} l_s = L - m$$

Page 11: Resolution Limit in  Community Detection

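For fixed $m$, $Q$ reaches its maximum when all modules have the same number of links:

$$l_s = l = \frac{L}{m} - 1, \qquad Q_M(m, L) = 1 - \frac{m}{L} - \frac{1}{m}$$

For variable $m$,

$$\frac{dQ_M(m, L)}{dm} = -\frac{1}{L} + \frac{1}{m^2} = 0 \;\Longrightarrow\; m^* = \sqrt{L}, \qquad Q_M(m^*, L) = 1 - \frac{2}{\sqrt{L}}$$

The corresponding number of links in each module is $l = \sqrt{L} - 1$.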
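The optimum over $m$ can also be found by brute force, which gives a quick check on $m^* = \sqrt{L}$. This sketch assumes the closed form $Q_M(m, L) = 1 - m/L - 1/m$ for equal-sized modules; the value `L = 10_000` is an arbitrary choice.

```python
import math

def q_max(m, L):
    """Q_M(m, L) = 1 - m/L - 1/m for m equal modules with l = L/m - 1 links each."""
    return 1 - m / L - 1 / m

L = 10_000
# Scan every integer number of modules and keep the one with the largest Q_M.
best_m = max(range(1, L + 1), key=lambda m: q_max(m, L))
best_q = q_max(best_m, L)
links_per_module = L / best_m - 1
```

With these values the scan lands on $m = 100 = \sqrt{L}$, with $\sqrt{L} - 1 = 99$ links per module.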

Page 12: Resolution Limit in  Community Detection

The crucial point here is that modularity seems to have some intrinsic scale of order $\sqrt{L}$, which constrains the number and the size of the modules. For a given total number of nodes and links we could build many more than $\sqrt{L}$ modules, but the corresponding network would be less “modular”, namely with a value of the modularity lower than the maximum.

Page 13: Resolution Limit in  Community Detection

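Here $M_1$ and $M_2$ have $l_1$ and $l_2$ internal links, are joined to each other by $a_1 l_1 = a_2 l_2$ links, and are joined to the rest of the network by $b_1 l_1$ and $b_2 l_2$ links.

Since $M_1$ and $M_2$ are constructed modules, we have

$$a_1 + b_1 \le 2, \qquad a_2 + b_2 \le 2, \qquad l_1, l_2 < \frac{L}{4}$$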

Page 14: Resolution Limit in  Community Detection

Let’s consider the following cases:
• $Q_A$: $M_1$ and $M_2$ are separate modules
• $Q_B$: $M_1$ and $M_2$ form a single module

$$\Delta Q = Q_B - Q_A = \frac{2 L a_1 l_1 - (a_1 + b_1 + 2)(a_2 + b_2 + 2)\, l_1 l_2}{2L^2}$$

Since both $M_1$ and $M_2$ are modules by construction, we need

$$\Delta Q = Q_B - Q_A < 0$$

That is,

$$l_2 > \frac{2 L a_1}{(a_1 + b_1 + 2)(a_2 + b_2 + 2)}$$
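The sign of $\Delta Q = Q_B - Q_A$ can be checked numerically. The sketch below computes it directly from the two partitions and compares it with the closed form. It assumes that $a_1 l_1 = a_2 l_2$ links join $M_1$ and $M_2$, and that $b_1 l_1$ and $b_2 l_2$ links join them to the rest of the network. The function names and the numbers (`L = 10_000`, modules of 20 and 300 links) are illustrative choices only.

```python
def module_term(l_s, d_s, L):
    """Contribution of one module to Q: l_s/L - (d_s/(2L))^2."""
    return l_s / L - (d_s / (2 * L)) ** 2

def delta_q(l1, l2, a1, b1, a2, b2, L):
    """Q_B - Q_A, computed directly from the two partitions."""
    d1 = (a1 + b1 + 2) * l1
    d2 = (a2 + b2 + 2) * l2
    l12 = a1 * l1  # links between M1 and M2 (assumed equal to a2 * l2)
    q_a = module_term(l1, d1, L) + module_term(l2, d2, L)  # separate modules
    q_b = module_term(l1 + l2 + l12, d1 + d2, L)           # merged module
    return q_b - q_a

def delta_q_closed_form(l1, l2, a1, b1, a2, b2, L):
    return (2 * L * a1 * l1 - (a1 + b1 + 2) * (a2 + b2 + 2) * l1 * l2) / (2 * L ** 2)

def merge_threshold(a1, b1, a2, b2, L):
    """Modules stay separate only if l2 exceeds this value."""
    return 2 * L * a1 / ((a1 + b1 + 2) * (a2 + b2 + 2))

L = 10_000
small = (20, 20, 1 / 20, 1 / 20, 1 / 20, 1 / 20)          # one link each way
large = (300, 300, 1 / 300, 1 / 300, 1 / 300, 1 / 300)
dq_small = delta_q(*small, L)  # positive: merging increases modularity
dq_large = delta_q(*large, L)  # negative: the modules are resolved
```

With these numbers the two 20-link modules fall below the threshold and are merged by modularity optimization, while the two 300-link modules are kept separate.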

Page 15: Resolution Limit in  Community Detection

Now let’s see how it contradicts the constructed modules $M_1$ and $M_2$.

We consider the following two scenarios ($l_1 = l_2 = l$):
• The two modules have a perfect balance between internal and external degree ($a_1 + b_1 = 2$, $a_2 + b_2 = 2$), so they are on the edge between being or not being communities, in the weak sense.
• The two modules have the smallest possible external degree, which means that there is a single link connecting them to the rest of the network and only one link connecting each other ($a_1 = a_2 = b_1 = b_2 = 1/l$).

Page 16: Resolution Limit in  Community Detection

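When $a_1 = a_2 = 2$ and $b_1 = b_2 = 0$, the right side of

$$l_2 > \frac{2 L a_1}{(a_1 + b_1 + 2)(a_2 + b_2 + 2)}$$

can reach the maximum value

$$l_R^{max} = \frac{L}{4}$$

In this case, $l < l_R^{max} = L/4$ may happen, which violates the condition, so modularity favors merging the two modules.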

Page 17: Resolution Limit in  Community Detection

When $a_1 = a_2 = b_1 = b_2 = 1/l$, the two modules are merged if

$$l < l_R^{min} = \sqrt{\frac{L}{2}}$$
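As a worked check of the two scenarios, the threshold $2 L a_1 / [(a_1 + b_1 + 2)(a_2 + b_2 + 2)]$ can be evaluated by direct substitution with $l_1 = l_2 = l$. The last step drops a term of order 1 and assumes $l \gg 1$, an approximation added here for illustration.

```latex
% Scenario 1: a_1 = a_2 = 2, b_1 = b_2 = 0
\frac{2L \cdot 2}{(2+0+2)(2+0+2)} = \frac{L}{4} = l_R^{max}

% Scenario 2: a_1 = a_2 = b_1 = b_2 = 1/l
\frac{2L/l}{(2 + 2/l)^2} = \frac{L\,l}{2\,(l+1)^2}
\quad\Longrightarrow\quad
l > \frac{L\,l}{2\,(l+1)^2}
\iff (l+1)^2 > \frac{L}{2}
\iff l > \sqrt{L/2} - 1 \approx \sqrt{L/2} = l_R^{min}
```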

Page 18: Resolution Limit in  Community Detection

$$Q_{single} = 1 - \frac{2}{m(m-1)+2} - \frac{1}{n}$$

$$Q_{pairs} = 1 - \frac{1}{m(m-1)+2} - \frac{2}{n}$$

$$Q_{single} > Q_{pairs} \iff m(m-1) + 2 > n$$
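The figure for this slide is not in the transcript. The sketch below assumes the formulas refer to a ring of $n$ cliques with $m$ nodes each, where consecutive cliques are joined by a single link. It builds that graph explicitly and compares the partition into single cliques with the partition into pairs of consecutive cliques. The values `n = 30` and `m = 5`, and all function names, are illustrative assumptions.

```python
from itertools import combinations

def ring_of_cliques(n, m):
    """Edges of n cliques K_m joined in a ring by single links (requires m >= 2)."""
    edges = []
    for c in range(n):
        nodes = [(c, i) for i in range(m)]
        edges += list(combinations(nodes, 2))            # links inside clique c
        edges.append(((c, 0), ((c + 1) % n, 1)))         # one link to the next clique
    return edges

def modularity(edges, community_of):
    """Q = sum_s [ l_s/L - (d_s/(2L))^2 ], with community_of a function on nodes."""
    L = len(edges)
    internal, degree = {}, {}
    for u, v in edges:
        cu, cv = community_of(u), community_of(v)
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(s, 0) / L - (degree[s] / (2 * L)) ** 2 for s in degree)

n, m = 30, 5  # n must be even so that cliques can be paired
edges = ring_of_cliques(n, m)
q_single = modularity(edges, lambda node: node[0])       # one module per clique
q_pairs = modularity(edges, lambda node: node[0] // 2)   # consecutive cliques paired
```

With these values $m(m-1) + 2 = 22 < n = 30$, so the paired partition has the higher modularity even though each clique is an obvious community.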

Page 19: Resolution Limit in  Community Detection

For example, $p = 5$, $m = 20$.

The maximal modularity of the network corresponds to the partition in which the two smaller cliques are merged.

Page 20: Resolution Limit in  Community Detection

Any two interconnected modules, fuzzy or not, are merged if the number of links inside each of them does not exceed $l_R^{min}$.

If modularity optimization finds a module $S$ with $l_S$ internal links, it may be that the latter is a combination of two or more smaller communities if

$$l_S < 2\, l_R^{min} = \sqrt{2L}$$

The upper limit of $l_S$ can be much larger than $\sqrt{2L}$ if the substructures are on average more interconnected with each other.
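In practice this bound suggests a simple screening step: after optimizing modularity, flag every module whose internal link count falls below $\sqrt{2L}$ and inspect it for substructure. The sketch below is one way to express that check. The function name `may_hide_substructure` and the numbers are assumptions, and a flag means only that merged sub-communities are possible, not that they exist.

```python
import math

def may_hide_substructure(l_s, L):
    """True if l_S < sqrt(2L), i.e. below twice l_R^min = sqrt(L/2)."""
    return l_s < math.sqrt(2 * L)

L = 5000                       # total links in the network (illustrative)
threshold = math.sqrt(2 * L)   # resolution scale for this network
module_sizes = [60, 99, 150]   # internal link counts of found modules (made up)
flagged = [l_s for l_s in module_sizes if may_hide_substructure(l_s, L)]
```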

Page 21: Resolution Limit in  Community Detection

21

Page 22: Resolution Limit in  Community Detection

Modularity optimization is not always consistent with the community structure it is meant to detect: it may favor network partitions with groups of modules combined into larger communities.

The resolution limit of modularity does not rely on particular network structures, but only on the comparison between the sizes of interconnected communities and that of the whole network, where the sizes are measured by the number of links.

An increase of the number of modules does not necessarily correspond to an increase in modularity because the modules would be smaller and so would be each term of the sum.

Quality functions are still helpful, but their role should probably be limited to the comparison of partitions with the same number of modules.