Page 1

Matnet Tutorial Part III: Analysis of Attributed Networks

Rushed Kanawati, Martin Atzmueller, Christine Largeron

A3, Université Sorbonne Paris Cité, France
CSLab, Cognitive Science and Artificial Intelligence, Tilburg University, Netherlands
Laboratoire Hubert Curien, Université Jean Monnet, Université de Lyon, France

WWW 2018, Lyon, 2018-04-24

Page 2

Agenda
- Attributed Networks/Graphs
- Subgroups & Communities
- Community Detection

Page 3

Attributed Network

Context: attributed network

Definition 1 [Zhou2009]: a network represented by a graph $G = (V, E)$ where each node $v \in V$ is associated with a vector of attributes $v_j$, $j \in \{1, \dots, p\}$.

CHRISTINE LARGERON (LaHC) Ecole EGC 2018 8 / 61

Page 4

Attributed Network

Context: attributed network

Definition 2 [Yin2010], [Gong2011]: a network represented by
- a graph $G = (V, E)$ describing the relationships between the entities
- a bipartite graph $G_a = (V \cup V_a, E_a)$ describing the relationships between the entities and the attributes
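
A minimal Python sketch of the two representations, on a hypothetical four-node graph with three binary attributes. The node names, the attribute names and the rule "link a node to every attribute with a non-zero value" are illustrative assumptions, not part of either definition. The graph is stored in the Definition 1 form, and the edge set $E_a$ of the Definition 2 bipartite graph is derived from it.

```python
from typing import Dict, List, Set, Tuple

# Definition 1: G = (V, E) plus one attribute vector per node (p = 3 binary attributes).
edges: List[Tuple[str, str]] = [("a", "b"), ("b", "c"), ("c", "d")]
attribute_names = ["sport", "music", "travel"]   # hypothetical attribute set V_a
attributes: Dict[str, List[int]] = {
    "a": [1, 0, 1],
    "b": [1, 1, 0],
    "c": [0, 1, 0],
    "d": [0, 0, 1],
}

def to_bipartite(attrs: Dict[str, List[int]], names: List[str]) -> Set[Tuple[str, str]]:
    """Definition 2: edge set E_a of the bipartite graph G_a = (V ∪ V_a, E_a),
    linking each node to every attribute it holds (non-zero value)."""
    return {(v, names[j]) for v, vec in attrs.items() for j, x in enumerate(vec) if x != 0}

e_a = to_bipartite(attributes, attribute_names)
print(sorted(e_a))
```

The structural graph $G$ (here `edges`) is kept unchanged; only the node-attribute information changes form.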

Page 5

Attributed Graphs
- Graph: edge attributes and/or node attributes
- Structure: ties/links (of the respective relations)
- Attributes: additional information
  - Actor attributes (node labels)
  - Link attributes (information about connections)
- Attribute vectors for actors and/or links
  - ... can be mapped from/to each other
- Integration of heterogeneous data (networks + vectors)
- Enables simultaneous analysis of relational and attribute data

Page 6

Subgroups & Cohesive Subgroups
- Subgroup: subset of actors (and all their ties)
- Define subgroups using specific criteria (homogeneity among members)
  - Compositional: actor attributes
  - Structural: using tie structures
- Detection of cohesive subgroups & communities → structural aspects
- Subgroup discovery → actor attributes
- ... attributed graph → can combine both

[Wasserman & Faust 1994]

Page 7

Compositional Subgroups
- Detect subgroups according to specific compositional criteria
  - Focus on actor attributes
  - Describe an actor subset using attributes
  - Often hypothesis-driven approaches: test specific attribute combinations
- In contrast: subgroup discovery
  - Hypothesis-generating approach
  - Exploratory data mining method
  - Local exceptionality detection

[Atzmueller 2015]

Page 8

Agenda
- Attributed Networks/Graphs
- Subgroups & Communities
- Community Detection

Page 9

Community Detection/Attributed Networks

Community detection in attributed networks: definition

Given an attributed graph $G = (V, E)$, the task consists in building a partition $\mathcal{P} = \{C_1, \dots, C_r\}$ of $V$ into $r$ communities such that:
- vertices in the same community are densely connected and similar in terms of attributes
- vertices from distinct communities are loosely connected and different in terms of attributes

[Combe et al. IDA 2015]

Page 10

Combining Structure and Attributes
- Data sources
  - Structural variables (ties, links)
  - Compositional variables
- Actor attributes
  - Represented as attribute vectors
- Edge attributes
  - Each edge has an assigned label
  - Multiplex graphs → multiple edges (labels) between nodes

Page 11

Communities/Edge-Attributed Graphs
- Clustering edge-attributed graphs
  - Reduce/flatten to a weighted graph [Bothorel et al. 2015]
    - Derive weights according to the number of edges (labels) by which two nodes are directly connected [Berlingerio et al. 2011]
    - Standard graph clustering approaches can then be applied directly
  - Frequent-itemset based [Berlingerio et al. 2013]
  - Subspace-oriented [Boden et al. 2012]
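
The flattening idea can be sketched as follows, assuming the edge-attributed graph is given as one edge list per label; the function name and input format are hypothetical. Each node pair receives a weight equal to the number of labels under which the two nodes are directly connected, so any weighted community detection algorithm can then be run on the result.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

def flatten_multiplex(layers: Dict[str, List[Tuple[str, str]]]) -> Dict[FrozenSet[str], int]:
    """Collapse an edge-labelled (multiplex) graph into a weighted simple graph.

    The weight of a node pair {u, v} is the number of labels (layers) in which
    u and v are directly connected."""
    weights: Counter = Counter()
    for edges in layers.values():
        # A set per layer, so a repeated edge counts once for that layer;
        # frozenset makes (u, v) and (v, u) the same undirected pair.
        for pair in {frozenset(e) for e in edges if e[0] != e[1]}:
            weights[pair] += 1
    return dict(weights)

layers = {
    "coauthor": [("a", "b"), ("b", "c")],
    "email": [("a", "b"), ("c", "d")],
    "friend": [("b", "a")],
}
w = flatten_multiplex(layers)
print(w[frozenset({"a", "b"})], w[frozenset({"c", "d"})])  # 3 1
```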

Page 12

Node-Attributed Graphs
- Non-uniform terminology
  - Social-attribute network
  - Attribute-augmented graph
  - Feature-vector graph, vertex-labeled graph
  - Attributed graph
  - ...
- Different representations

[Bothorel et al. 2015]

Page 13

Community Detection: Attribute Extensions
- Utilize structural + attribute information
- Different roles of a description
  - Methods "guiding" community detection using attribute information
    - "Dense structures": connectivity
    - But no "perfect" attribute homogeneity (purity)
  - Methods generating explicit descriptions, i.e., descriptive community patterns
    - "Dense structures": connectivity
    - Concrete descriptions, e.g., a conjunctive logical formula

Page 14

Attributes for Aiding Community Detection
- Weight modification (edges) according to nodal attributes [Ge et al. 2008, Dang & Viennet 2012, Ruan et al. 2013, Zhou et al. 2009, Steinhaeuser & Chawla 2008]
  - Abstraction into similarities between nodes → edge weights → apply a standard community detection algorithm
  - Specifically, distance-based community detection methods
- Entropy-oriented methods [Zhu et al. 2011, Smith et al. 2014, Cruz et al. 2011]
- Model-based approaches [Xu et al. 2012, Yang et al. 2013, Akoglu et al. 2012]

Page 15

Weight Modification
- Use an attribute-based distance measure
- Community detection: group nodes according to a threshold in (0, 1), i.e., place any pair of nodes whose edge weight exceeds the threshold into the same community
- Evaluate the final partitioning using modularity

[Steinhaeuser & Chawla 2008]
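
A minimal sketch of the thresholding step, under two assumptions of ours: the attribute similarity 1 / (1 + Euclidean distance) stands in for whatever attribute-based measure is used (it is not necessarily the one of Steinhaeuser & Chawla), and only existing edges are reweighted. Placing both endpoints of every above-threshold edge in the same community amounts to taking the connected components of the thresholded graph, computed here with union-find.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

def similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical attribute similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(x, y))

def threshold_communities(edges: List[Tuple[str, str]],
                          attrs: Dict[str, Sequence[float]],
                          threshold: float) -> List[Set[str]]:
    """Put both endpoints of every edge whose attribute-based weight exceeds
    `threshold` into the same community (connected components via union-find)."""
    parent = {v: v for v in attrs}

    def find(v: str) -> str:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        if similarity(attrs[u], attrs[v]) > threshold:
            parent[find(u)] = find(v)

    groups: Dict[str, Set[str]] = {}
    for v in attrs:
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=sorted)

attrs = {"a": [0.0], "b": [0.5], "c": [5.0], "d": [5.2], "e": [9.0]}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(threshold_communities(edges, attrs, threshold=0.5))
```

The resulting partition can then be scored with modularity, as in the last step of the slide.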

Page 16

Entropy Minimization
- For a partition, optimize entropy using Monte Carlo
- Integrate the entropy step into a modularity optimization algorithm

[Cruz et al. 2011]
[Blondel et al. 2008]
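
For categorical attributes, one simple way to quantify the attribute homogeneity of a partition is the size-weighted Shannon entropy of the attribute inside each community. The sketch below uses that form as an illustrative assumption; it is not necessarily the exact criterion of Cruz et al. Lower is better, and 0 means that every community is attribute-pure.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Set

def partition_entropy(communities: List[Set[Hashable]], attr: Dict[Hashable, str]) -> float:
    """Size-weighted average Shannon entropy (in bits) of a categorical attribute
    inside each community; 0 means every community is attribute-pure."""
    n = sum(len(c) for c in communities)
    total = 0.0
    for c in communities:
        counts = Counter(attr[v] for v in c)
        h = -sum((k / len(c)) * math.log2(k / len(c)) for k in counts.values())
        total += len(c) / n * h
    return total

colour = {1: "red", 2: "red", 3: "blue", 4: "blue"}
print(partition_entropy([{1, 2}, {3, 4}], colour))  # 0.0
print(partition_entropy([{1, 3}, {2, 4}], colour))  # 1.0
```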

Page 17

Model-based/MDL
- In general: model edge and attribute values using mixtures of probability distributions
- Use MDL to select clusters w.r.t. attribute-value similarity and connectivity similarity
  - Data compression of the connectivity and attribute matrices (PICS algorithm)
  - Lossless compression → MDL cost function
- Resulting node groups
  - Homogeneous in both the node matrix and the attribute matrix
  - Nodes: similar connectivity and high attribute coherence

[Akoglu et al. 2013]

Page 18

Descriptive Community Patterns
- Community mining scenario
  - Discover "densely connected groups of nodes"
  - Communities should have an explicit description
  - Community (evaluation) space: network/graph
- Goal
  - Often: discover the top-k communities
  - Maximize some community quality function

Page 19

Finding Explicit Descriptions
- Cluster a transformed node-attribute similarity graph and extract pure clusters
- Mine frequent itemsets (binary attributes) and analyze communities
- Combine dense subgraph mining + subspace clustering
- Apply correlated pattern mining
- Interleave community detection and redescription mining
- Adapt local exceptionality detection (using subgroup discovery) for communities

[Adnan et al. 2009]
[Moser et al. 2009, Günnemann et al. 2013]
[Atzmueller & Mitzlaff 2011, Atzmueller et al. 2015]
[Silva et al. 2012]
[Pool et al. 2014]

Page 20

Subspace Clustering & Dense Subgraphs
- Twofold cluster O: combine subspace clustering and dense subgraph mining (GAMer algorithm)
  - O fulfills the subspace property (maximal distance threshold w.r.t. the node attribute values in O) with a minimal number of dimensions
  - O fulfills the quasi-clique property, according to nodal degree and a threshold
  - The induced subgraph of O is connected and fulfills a minimal size threshold
- Quality function: Density · Size · #Dimensions
- Pruning using the subspace and quasi-clique properties
- Includes a redundancy-optimization step (overlapping communities)

[Günnemann et al. 2011]

Page 21

Correlated Pattern Mining
- Structural correlation pattern mining (SCPM)
  - Correlation between a node attribute set and the dense subgraph induced by that attribute set
- Quality measure: comparison against a null model
  - Size of the pattern
  - Cohesion of the pattern (density of the quasi-clique)
  - Compare against the expected structural correlation of the attribute set (in a random graph)

[Silva et al. 2011]

Page 22

Description-driven Community Detection
- Find communities with concise descriptions (e.g., given by tags)
- Focus: overlapping, diverse, descriptive communities
- Language: disjunctions of conjunctive expressions
- Two-stage approach
  - Greedy hill-climbing step: generate candidates for communities
  - Redescription generation: induce a description for each community, and reshape it if necessary
- Heuristic approach, due to the large search space

[Pool et al. 2014]

Page 23

- Starts with candidate communities
  - Domain knowledge
  - Partial communities
  - Single vertices (later extended using a hill-climbing approach)
- ReMine algorithm for deriving patterns for communities [Zimmermann et al. 2010]

Page 24

[Pool et al. 2013]

Page 25

Description-Oriented Community Detection
- Basic idea: pattern mining for community characterization
  - Mine patterns in the description space (tags/topics) → subgroups of users described by tags/topics
  - Optimize a quality measure in the community space → network/graph of users
- Improve the understandability of communities (explanation)

[Atzmueller et al., Information Sciences, 2016]

Page 26

Direct Descriptive Community Mining
- Goal: identification/description of communities with a high quality (exceptional model mining)
- Input: network/graph + node properties (e.g., tags)
- Output: k best community patterns
- Description language: conjunctive expressions
- COMODO algorithm: top-k pattern mining, based on the SD-Map* algorithm for subgroup discovery
  - Discover the k best patterns
  - Search space: conjunctions of tags
  - Apply standard community quality functions, e.g., modularity [Newman 2004]

Page 27

Community Detection on Attributed Graphs
- Goal: mine patterns describing such groups
- Merge networks + descriptive features, e.g., characteristics of users
- Target both
  - Community structure (some evaluation function) and
  - Community description (logical formula, e.g., a conjunction of features, see above)

Page 28

Optimistic Estimates
- Problem: exponential search space
- Optimistic estimate: an upper bound for the quality of a pattern and all its specializations → top-k pruning

[Figure panels: Delicious friend graph; Last.fm friend graph]
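
The pruning idea can be illustrated with a toy top-k search over conjunctions of tags. This is a sketch loosely in the spirit of COMODO, not its actual quality functions or estimates: the quality of a pattern is assumed to be the modularity contribution $L_S/m - (d_S/2m)^2$ of the node set $S$ it covers ($L_S$ internal edges, $d_S$ total degree), and $L_S/m$ serves as the optimistic estimate. The bound is valid because a specialization covers a subset of $S$, which cannot contain more internal edges, and the subtracted term is never negative. The function names and toy data are hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]

def quality_and_estimate(nodes: Set[str], edges: List[Edge],
                         degree: Counter, m: int) -> Tuple[float, float]:
    """(quality, optimistic estimate) of the node set S covered by a pattern."""
    internal = sum(1 for u, v in edges if u in nodes and v in nodes)  # L_S
    d = sum(degree[v] for v in nodes)                                  # d_S
    return internal / m - (d / (2 * m)) ** 2, internal / m

def top_k_patterns(node_tags: Dict[str, Set[str]], edges: List[Edge], k: int):
    m = len(edges)
    degree = Counter(v for e in edges for v in e)
    tags = sorted(set().union(*node_tags.values()))
    best: List[Tuple[float, Tuple[str, ...]]] = []   # min-heap of (quality, pattern)
    evaluated = 0

    def search(pattern: Tuple[str, ...], covered: Set[str], start: int) -> None:
        nonlocal evaluated
        for j in range(start, len(tags)):             # extend only with later tags
            new_pattern = pattern + (tags[j],)
            new_cover = {v for v in covered if tags[j] in node_tags[v]}
            if not new_cover:
                continue
            evaluated += 1
            q, oe = quality_and_estimate(new_cover, edges, degree, m)
            if len(best) < k:
                heapq.heappush(best, (q, new_pattern))
            elif q > best[0][0]:
                heapq.heapreplace(best, (q, new_pattern))
            # Top-k pruning: no specialization can beat the current k-th best.
            if len(best) == k and oe <= best[0][0]:
                continue
            search(new_pattern, new_cover, j + 1)

    search((), set(node_tags), 0)
    return sorted(best, reverse=True), evaluated

edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
node_tags = {"a": {"jazz", "web"}, "b": {"jazz", "web"}, "c": {"jazz", "web"},
             "d": {"rock", "web"}, "e": {"rock", "web"}, "f": {"pop", "web"}}
result, evaluated = top_k_patterns(node_tags, edges, k=1)
print(result, evaluated)
```

With k = 1 the search returns `jazz` and evaluates 5 patterns; without the pruning test it would also score `pop ∧ web` and `rock ∧ web`, i.e., 7 patterns. On larger tag sets the pruned branches can be much larger.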

Page 30

Part 3: Mining attributed networks

C. LARGERON
Laboratoire Hubert Curien, Université Jean Monnet, Université de Lyon

Tutorial MATNet@WWW2018

CHRISTINE LARGERON (LaHC) Tutorial WWW 2018 1 / 52

Page 31

Outline
1. Part 3: Community detection methods for attributed graphs
2. Part 4: Evaluation

Page 32

Part 3: Community detection methods for attributed graphs

Extended versions of Louvain
- Modularity measure [Newman, 2004, Clauset et al. 2004]
- Louvain method [Blondel et al. 2008]
- Extended versions of Louvain for attributed graphs

NB: thanks to @Toto!

Page 33

Modularity

A measure based on a null model [Newman and Girvan, 2004]: the number of edges inside each group is compared with the number expected in the configuration model, a random graph that preserves the node degrees.

Given a network with $n$ nodes and $m$ edges, the expected number of edges between two nodes $i$ and $i'$ with degrees $k_i$ and $k_{i'}$ is $\frac{k_i k_{i'}}{2m}$.

In the example graph ($m = 14$), the expected number of edges between nodes 1 and 2, with degrees 3 and 2, is $\frac{3 \times 2}{2 \times 14} \approx 0.21$.

(Figure: @Blondel)

Page 34

Modularity definition

Measures the quality of a community structure based on the links [Newman and Girvan, 2004]. Given a graph $G = (V, E)$ and a partition $\mathcal{P}$ of $V$:

$$Q_{NG}(\mathcal{P}) = \frac{1}{2m} \sum_{i,i'} \left( A_{ii'} - \frac{k_i \, k_{i'}}{2m} \right) \delta(c_i, c_{i'}) \qquad (1)$$

where
- $A$ is the adjacency matrix of $G$
- $k_i$ is the degree of vertex $i \in V$
- $c_i$ is the community of vertex $i$ in $\mathcal{P}$
- $\delta$ is the Kronecker delta
- $m = |E|$ is the number of edges
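
Eq. (1) can be evaluated per community, since it is equivalent to $Q_{NG} = \sum_{c} \left[ \frac{L_c}{m} - \left(\frac{d_c}{2m}\right)^2 \right]$, where $L_c$ is the number of edges inside community $c$ and $d_c$ the sum of the degrees of its nodes. A minimal Python sketch for an unweighted, undirected simple graph (the function name and input format are our own):

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

def modularity(edges: List[Tuple[Hashable, Hashable]],
               community: Dict[Hashable, Hashable]) -> float:
    """Q_NG of Eq. (1) for an undirected, unweighted simple graph,
    computed per community as sum_c [L_c / m - (d_c / 2m)^2]."""
    m = len(edges)
    internal: Dict[Hashable, int] = defaultdict(int)    # L_c
    degree_sum: Dict[Hashable, int] = defaultdict(int)  # d_c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(edges, {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}))  # 5/14 ≈ 0.357
print(modularity(edges, {v: "A" for v in range(6)}))                        # 0.0
```

The second call illustrates that putting all nodes in a single group yields $Q_{NG} = 0$.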

Page 35

Modularity: pros and cons

Pros:
- Bounded: takes values in $[-1, 1]$ (more precisely, in $[-1/2, 1]$)
- Equals 0 if all nodes are clustered into one group
- Can automatically determine the optimal number of groups

Page 36

Modularity: pros and cons

Drawbacks:
- Resolution limit [Fortunato and Barthelemy, 2007, Marcotorchino and Conde Cespedes, 2014]
  - Tendency to favor large communities: small, well-separated communities can be merged
  - Example: a ring of 30 cliques of 5 nodes each, adjacent cliques being joined by a single edge. One community per clique gives $Q_{single} = 0.876$, but merging adjacent cliques pairwise gives a higher $Q_{pairs} = 0.888$ (@Fortunato et al. 2007)
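
These numbers can be checked directly. The sketch below assumes each clique's last node is linked to the next clique's first node (any choice of one link per pair of adjacent cliques gives the same modularity values) and compares one community per clique with merging adjacent cliques pairwise. The helper names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

def modularity(edges: List[Tuple[int, int]], community: Dict[int, int]) -> float:
    """Q = sum_c [L_c / m - (d_c / 2m)^2] for an unweighted simple graph."""
    m = len(edges)
    internal, degree_sum = defaultdict(int), defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

def ring_of_cliques(n_cliques: int, size: int) -> List[Tuple[int, int]]:
    """n_cliques complete graphs K_size; clique c's last node is linked to
    clique (c + 1)'s first node, closing the ring."""
    edges: List[Tuple[int, int]] = []
    for c in range(n_cliques):
        edges.extend(combinations(range(c * size, (c + 1) * size), 2))
        edges.append(((c + 1) * size - 1, ((c + 1) % n_cliques) * size))
    return edges

n, size = 30, 5
edges = ring_of_cliques(n, size)
nodes = range(n * size)
q_single = modularity(edges, {v: v // size for v in nodes})       # one community per clique
q_pairs = modularity(edges, {v: v // (2 * size) for v in nodes})  # adjacent cliques merged pairwise
print(round(q_single, 3), round(q_pairs, 3))  # 0.876 0.888
```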

Page 37

Modularity: pros and cons

Drawbacks:
- Resolution limit [Fortunato et al. 2007, Good et al. 2010, Marcotorchino et al. 2014]
  - Multiresolution methods add a resolution parameter [Arenas et al. 2008], [Reichardt et al., 2006]
- Optimizing modularity is NP-complete [Brandes, 2008]

Page 38

Louvain algorithm

Non-deterministic greedy algorithm (the result depends on the order in which nodes are visited): Blondel et al., "Fast unfolding of community hierarchies in large networks", CoRR, vol. abs/0803.0476, 2008.

Initially, every node forms its own community. The algorithm then iteratively repeats 2 steps:
1. Moving step: repeat iteratively for all nodes i
   - remove i from its community
   - insert i into the neighboring community that maximizes modularity
   until a local maximum is attained
2. Merging step: construct a new graph from the obtained communities
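
The moving step can be sketched as below. This is a deliberately naive version under our own assumptions: each candidate move is scored by recomputing $Q_{NG}$ from scratch, whereas the actual Louvain method uses an incremental modularity-gain formula, and the merging step is omitted. Neighboring communities are tried in sorted label order, which fixes one of the possible order-dependent outcomes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[int, int]

def modularity(edges: List[Edge], community: Dict[int, int]) -> float:
    """Q = sum_c [L_c / m - (d_c / 2m)^2] for an unweighted simple graph."""
    m = len(edges)
    internal, degree_sum = defaultdict(int), defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

def moving_step(nodes: List[int], edges: List[Edge]) -> Dict[int, int]:
    """Louvain moving step, naive version: every candidate move is scored by
    recomputing Q from scratch instead of using the incremental gain formula."""
    community = {v: v for v in nodes}            # initially every node is alone
    neighbors: Dict[int, set] = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    improved = True
    while improved:                              # until a local maximum is attained
        improved = False
        for i in nodes:
            original = community[i]
            best_c, best_q = original, modularity(edges, community)
            for c in sorted({community[j] for j in neighbors[i]}):
                community[i] = c                 # tentatively move i into c
                q = modularity(edges, community)
                if q > best_q + 1e-12:           # keep strictly better moves only
                    best_c, best_q = c, q
            community[i] = best_c
            improved |= best_c != original
    return community

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
part = moving_step(list(range(6)), edges)
print(part, modularity(edges, part))
```

On two triangles joined by one edge, the moving step ends with one community per triangle.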

Page 39

Louvain illustration

Page 40

Louvain illustration

(Figure: @Guillaume)

Page 41

Extended versions of Louvain

Provide communities (i.e., clusters) based on the relationships and attribute values such that
- elements in the same group are densely connected and homogeneous regarding their attribute values
- elements in different groups are less connected and less similar

Based on the same iterative steps as Louvain (moving + merging), but optimize a global criterion $Q_{NG} + Q_{attribute}$.

Page 42

Extended versions of Louvain

The extensions differ in the attribute-based criterion $Q_{attribute}$:
- Entropy [J.C. Cruz Gomez et al., 2011] (categorical attributes)
- Distance: SAC algorithm [T.A. Dang et al., 2012], SHC algorithm [Zhan et al. 2016]
- Inertia in ToTeM or I-Louvain [Combe et al. 2013, 2015]
- Common attributes [Asim et al. 2017] (categorical attributes)

Page 43

I-Louvain [Combe et al. IDA 2015]

Community detection in attributed graphs with numerical attributes, based on the optimization of the global criterion $Q_{NG} + Q_{inertia}$.

Page 44

Modularity for link data

Measures the quality of a community structure based on the links [Newman and Girvan, 2004]. Given a graph $G = (V, E)$ and a partition $\mathcal{P}$ of $V$:

$$Q_{NG}(\mathcal{P}) = \frac{1}{2m} \sum_{i,i'} \left( A_{ii'} - \frac{k_i \, k_{i'}}{2m} \right) \delta(c_i, c_{i'}) \qquad (2)$$

where
- $A$ is the adjacency matrix of $G$
- $k_i$ is the degree of vertex $i \in V$
- $c_i$ is the community of vertex $i$ in $\mathcal{P}$
- $\delta$ is the Kronecker delta
- $m = |E|$ is the number of edges

Page 45

Inertia-based modularity for real vector data

Measures the quality, in terms of inertia, of a partition based on the attributes. Given $V$, a set of $N$ elements represented in $\mathbb{R}^p$, and $\mathcal{P}$ a partition of $V$:

$$Q_{inertia}(\mathcal{P}) = \sum_{(i,i') \in V \times V} \left[ \left( \frac{I(V,i) \cdot I(V,i')}{(2N \cdot I(V))^2} - \frac{\lVert i - i' \rVert^2}{2N \cdot I(V)} \right) \cdot \delta(c_i, c_{i'}) \right] \qquad (3)$$

where $I(V)$ is the total inertia and $I(V, i)$ is the inertia of $V$ with respect to an element $i \in V$.
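
A direct $O(N^2)$ evaluation of Eq. (3) is sketched below. It assumes the usual definitions $I(V) = \sum_{i \in V} \lVert i - g \rVert^2$, with $g$ the gravity center of $V$, and $I(V, i) = \sum_{i' \in V} \lVert i - i' \rVert^2$. With these, $\sum_i I(V,i) = 2N \cdot I(V)$, so both terms of Eq. (3) sum to 1 over all pairs and a single-cluster partition scores 0, mirroring modularity. The function names are hypothetical.

```python
from typing import Dict, Hashable, Sequence

def sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))

def inertia_modularity(points: Dict[Hashable, Sequence[float]],
                       community: Dict[Hashable, Hashable]) -> float:
    """Q_inertia of Eq. (3), evaluated directly over all ordered pairs (i, i')."""
    nodes = list(points)
    n = len(nodes)
    dim = len(points[nodes[0]])
    g = [sum(points[v][d] for v in nodes) / n for d in range(dim)]   # gravity center
    total = sum(sq_dist(points[v], g) for v in nodes)                  # I(V)
    inert = {v: sum(sq_dist(points[v], points[w]) for w in nodes)      # I(V, i)
             for v in nodes}
    norm = 2 * n * total                                               # 2N * I(V)
    q = 0.0
    for i in nodes:
        for j in nodes:
            if community[i] == community[j]:
                q += inert[i] * inert[j] / norm ** 2 - sq_dist(points[i], points[j]) / norm
    return q

pts = {0: [0.0], 1: [0.0], 2: [10.0], 3: [10.0]}
print(inertia_modularity(pts, {0: "A", 1: "A", 2: "B", 3: "B"}))  # 0.5
print(inertia_modularity(pts, {v: "A" for v in pts}))            # 0.0
```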

Page 46

Inertia-based modularity: properties

$Q_{inertia}(\mathcal{P})$:
- takes its values between -1 and 1, as modularity does
- is insensitive to linear transformations applied to all the vectors
- is insensitive to the number of clusters in the partition

Page 47

I-Louvain algorithm

Based on the optimization of the global criterion $Q_{NG} + Q_{inertia}$.

Repeats a moving phase and a merging phase, as Louvain does.

Page 48

I-Louvain algorithm

Initialization: each vertex forms its own community.

[Figure: Initialization. Nine vertices, labeled by their attribute values 1, 2, 9, 11, 28, 30, 22, 24 and 23, each forming its own community]

Page 49

I-Louvain algorithm

Iterative phase: repeat
- for each vertex v, insert v into the neighboring community that maximizes the global criterion
until a local maximum is reached.

[Figure: End of phase 1. Four communities, with weight m and gravity center g: A (m = 2, g = 1.5), B (m = 2, g = 10), C (m = 2, g = 29), D (m = 3, g = 23)]

Page 50

I-Louvain algorithm

Merging phase: build a new graph $G' = (V', E')$ from the partition $\mathcal{P}'$
- Each community $C$ of $\mathcal{P}'$ is represented by a vertex $v$ of $G'$
- The weight of the edge between two vertices $v$ and $v'$ of $G'$ is the sum of the weights of the edges between the vertices of the corresponding communities
- The attribute vector associated with $v$ equals the gravity center of $C$
- The weight of the class is the weight of the vertex in $G'$

[Figure: End of phase 2. The four communities become the vertices A (m = 2, g = 1.5), B (m = 2, g = 10), C (m = 2, g = 29) and D (m = 3, g = 23) of G']
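
The merging phase can be sketched as follows; the function name and data layout are hypothetical. Edges inside a community become a self-loop carrying their total weight (one common convention), and the gravity center is weighted by vertex weights so that the step can be applied repeatedly. The usage example reuses the attribute values from the figures, with a hypothetical set of unit-weight edges.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, List, Sequence, Tuple

def merge_phase(edges: List[Tuple[Hashable, Hashable, float]],
                attrs: Dict[Hashable, Sequence[float]],
                weight: Dict[Hashable, float],
                community: Dict[Hashable, Hashable]):
    """Build G' = (V', E'): one vertex per community of the partition."""
    new_edges: Dict[FrozenSet[Hashable], float] = defaultdict(float)
    for u, v, w in edges:
        # Sum of edge weights between (or, as a self-loop, inside) communities.
        new_edges[frozenset((community[u], community[v]))] += w
    new_weight: Dict[Hashable, float] = defaultdict(float)
    sums: Dict[Hashable, List[float]] = {}
    for v, x in attrs.items():
        c = community[v]
        new_weight[c] += weight[v]                 # class weight = vertex weight in G'
        acc = sums.setdefault(c, [0.0] * len(x))
        for d, val in enumerate(x):
            acc[d] += weight[v] * val
    centers = {c: [s / new_weight[c] for s in acc] for c, acc in sums.items()}  # gravity centers
    return dict(new_edges), centers, dict(new_weight)

values = {"v1": 1, "v2": 2, "v9": 9, "v11": 11, "v28": 28, "v30": 30, "v22": 22, "v24": 24, "v23": 23}
community = {"v1": "A", "v2": "A", "v9": "B", "v11": "B", "v28": "C", "v30": "C",
             "v22": "D", "v24": "D", "v23": "D"}
edges = [("v1", "v2", 1.0), ("v2", "v9", 1.0), ("v9", "v11", 1.0), ("v11", "v22", 1.0),
         ("v22", "v24", 1.0), ("v24", "v23", 1.0), ("v22", "v23", 1.0),
         ("v23", "v28", 1.0), ("v28", "v30", 1.0)]
g2, centers, weights = merge_phase(edges, {v: [float(x)] for v, x in values.items()},
                                   {v: 1.0 for v in values}, community)
print(centers)  # {'A': [1.5], 'B': [10.0], 'C': [29.0], 'D': [23.0]}
print(weights)  # {'A': 2.0, 'B': 2.0, 'C': 2.0, 'D': 3.0}
```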

Page 51

Evaluation of the I-Louvain method on a real network

Results obtained on a real dataset built using the databases DBLP (06/18/2014) and Microsoft Academic Search (02/03/2014).

DBLP allows us to generate a co-publication graph G = (V, E) with
- |V| = 2515 authors
- |E| = 5313 links: there is a link between two authors if they have co-published a paper in computer science in DBLP
- 23 attributes: number of publications in 23 research fields
- Ground truth: major area of publication in Microsoft Academic Search

Table: Evaluation according to the normalized mutual information (NMI)

        Louvain   K-means   ToTeM   I-Louvain
NMI     0.69      0.58      0.69    0.72
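
NMI compares the detected partition with the ground-truth classes. The slides do not say which normalization was used; the sketch below assumes the arithmetic mean of the two entropies (other variants use the geometric mean or the maximum), and the function name is our own.

```python
import math
from collections import Counter
from typing import Hashable, Sequence

def nmi(truth: Sequence[Hashable], found: Sequence[Hashable]) -> float:
    """Normalized mutual information, I(X;Y) / ((H(X) + H(Y)) / 2)."""
    n = len(truth)
    joint = Counter(zip(truth, found))
    a, b = Counter(truth), Counter(found)
    mi = sum(c / n * math.log(c * n / (a[x] * b[y])) for (x, y), c in joint.items())
    h_a = -sum(c / n * math.log(c / n) for c in a.values())
    h_b = -sum(c / n * math.log(c / n) for c in b.values())
    if h_a + h_b == 0:
        return 1.0   # both partitions are a single cluster: treat as identical
    return mi / ((h_a + h_b) / 2)

print(nmi([0, 0, 1, 1], ["x", "x", "y", "y"]))  # same grouping, different labels
print(nmi([0, 0, 1, 1], ["x", "y", "x", "y"]))  # independent groupings
```

NMI ignores label names: only the grouping matters, which is why the first call scores 1 and the second 0.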

Page 52

Data

Generated using an attributed graph model [Dang et al. 2012]:
- three communities of equal size: $|C_1| = |C_2| = |C_3| = 33$
- attribute values drawn from $\mathcal{N}(10, 7)$ in $C_1$, $\mathcal{N}(40, 7)$ in $C_2$ and $\mathcal{N}(70, 7)$ in $C_3$

Page 53

Measuring the consequences of different modifications of the original network
- Relational information degradation: $degr_{rel} \in \{0, 0.25, 0.50\}$
- Attributes degradation: $\sigma \in \{7, 10, 12\}$
- Increased network size: $|V| \in \{99, 999, 5001\}$
- Increased edge quantity: $|E| \in \{168, 315, 508\}$

Page 57: MatnetTutorial Part III: Analysis of Attributed Networkskanawati/mantut/Community detection in attributed networks Community detection in attributed network: definition Given an attributed

Party 3: Community detection methods for attributed graph

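Results: correctly classified rate (CCR)

| Setting | Louvain CCR | Louvain #cl. | K-means CCR | ToTeM CCR | ToTeM #cl. | I-Louvain CCR | I-Louvain #cl. |
|---|---|---|---|---|---|---|---|
| Reference graph R | 84% | 4 | 96% | 97% | 3 | 98% | 3 |
| Relational information degradation | | | | | | | |
| degr_rel = 0.25 | 33% | 8 | N/A* | 18% | 30 | 78% | 5 |
| degr_rel = 0.5 | 23% | 9 | N/A* | 14% | 36 | 63% | 6 |
| Spreading of attributes distribution | | | | | | | |
| σ = 10 | N/A* | N/A* | 90% | 95% | 3 | 96% | 3 |
| σ = 12 | N/A* | N/A* | 87% | 20% | 26 | 98% | 3 |
| Network size increase | | | | | | | |
| 999 vertices | 50% | 11 | 97% | 97% | 3 | 84% | 4 |
| 5001 vertices | 40% | 12 | 98% | 0.5% | 1518 | 85% | 4 |
| Edges insertion | | | | | | | |
| 315 edges | 96% | 3 | N/A* | 95% | 3 | 94% | 3 |
| 508 edges | 97% | 3 | N/A* | 98% | 3 | 98% | 3 |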

Page 58

Results: NMI

| Setting | Louvain | K-means | ToTeM | I-Louvain |
|---|---|---|---|---|
| Reference graph R | 0.784 | 0.883 | 0.861 | 0.930 |
| Relational information degradation | | | | |
| degr_rel = 0.25 | 0.220 | N/A* | 0.489 | 0.603 |
| degr_rel = 0.5 | 0.118 | N/A* | 0.377 | 0.353 |
| Attributes degradation | | | | |
| σ = 10 | N/A* | 0.721 | 0.819 | 0.885 |
| σ = 12 | N/A* | 0.637 | 0.567 | 0.930 |
| Increased network size | | | | |
| 999 vertices | 0.597 | 0.880 | 0.854 | 0.800 |
| 5001 vertices | 0.586 | 0.892 | 0.376 | 0.774 |
| Increased number of edges | | | | |
| 315 edges | 0.848 | N/A* | 0.807 | 0.816 |
| 508 edges | 0.876 | N/A* | 0.917 | 0.917 |
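The NMI scores in the table above compare each detected partition with the known classes. A minimal sketch of the computation from two label lists follows; it assumes the arithmetic-mean normalization 2·I(T;P) / (H(T) + H(P)), which the slides do not confirm (geometric-mean or max normalizations give different values).

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def nmi(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Normalized mutual information, normalized by the arithmetic mean of the entropies."""
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    # Mutual information I(T; P) = sum over (t, p) of p(t, p) * log(p(t, p) / (p(t) p(p)))
    mi = sum(
        (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    # Entropies H(T) and H(P)
    h_true = -sum((c / n) * math.log(c / n) for c in count_true.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in count_pred.values())
    if h_true + h_pred == 0:
        return 1.0  # both partitions put every node in a single block: identical
    return 2 * mi / (h_true + h_pred)


print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition with renamed labels
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # statistically independent partitions
```

scikit-learn's `normalized_mutual_info_score` uses the same arithmetic normalization by default, which makes it a convenient cross-check.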

Page 59

Run-time of I-Louvain

Run-time evolution against the number of vertices |V|, with |E| = 500,000 for |V| > 10,000.

Figure: Run-time of I-Louvain (execution time in seconds, log scale from 1 to 100,000, against the number of nodes in thousands, from 10 to 100; curves for Louvain and I-Louvain).

Page 60

I-Louvain algorithm

Theoretical and experimental results:
• benefit of using both attributes and relationships, including when data are missing
• the variation of Q_inertia induced by moving an element from one class to another relies only on local information [Combe 2013]
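To see why the variation of Q_inertia is local, it helps to write Q_inertia as a sum of pair weights w(u, v) over nodes in the same class. The sketch below assumes w(u, v) = I(V,u)·I(V,v) / (2N·I(V))² − ‖u − v‖² / (2N·I(V)), where I(V) is the total inertia of the N attribute vectors and I(V,u) = Σ_v ‖u − v‖². This normalization is our assumption, written by analogy with Newman-Girvan modularity so that each term sums to 1 over all pairs; the constant in [Combe 2013] may differ. The helper names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(u: Vector, v: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def inertia_weights(points: List[Vector]) -> List[List[float]]:
    """Pair weights w(u, v) (assumed normalization, see lead-in)."""
    d = [[squared_distance(p, q) for q in points] for p in points]
    inertia_around = [sum(row) for row in d]   # I(V, u) = sum_v ||u - v||^2
    total = sum(inertia_around)                # equals 2N * I(V)
    if total == 0:
        raise ValueError("all attribute vectors are identical")
    n = len(points)
    return [
        [inertia_around[a] * inertia_around[b] / total ** 2 - d[a][b] / total for b in range(n)]
        for a in range(n)
    ]


def q_inertia(w: List[List[float]], labels: Sequence[int]) -> float:
    """Sum of w(u, v) over all ordered pairs (u, v), u = v included, in the same class."""
    n = len(labels)
    return sum(w[a][b] for a in range(n) for b in range(n) if labels[a] == labels[b])


def move_gain(w: List[List[float]], labels: Sequence[int], x: int, target: int) -> float:
    """Change of Q_inertia when node x moves to class `target`.

    Only the weights between x and the members of its old and new classes are read."""
    source = labels[x]
    if target == source:
        return 0.0
    to_target = sum(w[x][v] for v in range(len(labels)) if labels[v] == target)
    to_source = sum(w[x][v] for v in range(len(labels)) if labels[v] == source and v != x)
    return 2 * (to_target - to_source)   # w is symmetric: pairs (x, v) and (v, x)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
w = inertia_weights(points)
labels = [0, 0, 1, 1]
moved = [1, 0, 1, 1]
print(q_inertia(w, labels), q_inertia(w, [0, 0, 0, 0]))
print(move_gain(w, labels, 0, 1), q_inertia(w, moved) - q_inertia(w, labels))
```

The last line checks that the local gain matches a full recomputation, which is what makes a Louvain-style greedy loop affordable.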

Page 61

Part 4: Evaluation

Outline

1 Part 3: Community detection methods for attributed graphs

2 Part 4: Evaluation

Page 62

Part 4: Evaluation

Evaluating community detection is a hard task:
• unsupervised task: no training set and no evaluation set
• the notion of community is not clearly defined

Two questions:

Q1: Suitability of the algorithm: is it appropriate for the data?
Q2: Quality of the partition:
• Is a community satisfactory?
• Is the partition satisfactory?
  • in comparison with the possible partitions
  • in comparison with an expected result: the ground truth

Page 63

Evaluation: algorithm quality Q1

Same rules as in clustering [Gordon, 1999; Jain and Dubes, 1988]: study the stability of the partition to identify cores
• after variation of the network:
  • add / remove links (i, j) with probability p_ij = k_i k_j / (2m), where k_i is the degree of i and m the number of edges [Karrer, 2008]
  • increase or decrease the weights, with a perturbation parameter between 0 and 1 [Gfeller, 2005]
  • by bootstrapping [Rosvall, 2008]
• after variation of the (non-deterministic) algorithm, via its initialization or its parameters
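The network perturbation in the first bullet can be sketched as follows. This is our reading of [Karrer, 2008], stated as an assumption: each edge is removed with probability alpha and replaced by an edge (i, j) drawn with probability proportional to k_i k_j, so the number of edges is preserved. The function name `perturb_graph` and its parameters are hypothetical.

```python
import random
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def perturb_graph(edges: List[Edge], alpha: float, seed: int = 0,
                  max_tries: int = 100_000) -> Set[Edge]:
    """Remove each edge with probability alpha, then add back as many edges,
    each between i and j chosen with probability proportional to k_i * k_j."""
    rng = random.Random(seed)
    canon = {(min(i, j), max(i, j)) for i, j in edges}   # simple undirected graph assumed
    # Vertex v appears k_v times in `stubs`, so a uniform draw picks v with probability k_v / 2m
    stubs = [v for e in canon for v in e]
    result = {e for e in canon if rng.random() >= alpha}
    tries = 0
    while len(result) < len(canon):
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place enough new edges; graph too dense?")
        i, j = rng.choice(stubs), rng.choice(stubs)
        if i != j:   # reject self-loops; duplicate edges are absorbed by the set
            result.add((min(i, j), max(i, j)))
    return result


ring = [(i, (i + 1) % 30) for i in range(30)]
print(sorted(perturb_graph(ring, alpha=0.3, seed=1))[:5])
```

A stability study would then run the same detection algorithm on the original and on several perturbed graphs and compare the partitions, for instance with NMI; vertices that stay together across runs form the cores.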

Page 64

Evaluation: algorithm quality Q1

Consensus clustering [Goder and Filkov, 2008; Topchy et al., 2005]
• combinatorial optimization problem
• greedy approach [Strehl and Ghosh, 2002]

1 Apply A on G nP times, yielding nP partitions
2 Compute the consensus matrix D, where D_ij is the number of partitions in which vertices i and j of G are assigned to the same community, divided by nP
3 Set all entries of D below a chosen threshold to zero
4 Apply A on D nP times, yielding nP partitions
5 If the partitions are all equal, stop. Otherwise go back to 2.

@Fortunato
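Steps 2, 3 and 5 above only manipulate partitions, so they can be sketched without the detection algorithm A itself (steps 1 and 4, not shown). In this minimal sketch a partition is a list of community labels indexed by vertex; the function names are hypothetical, and leaving the diagonal of D at zero is our own choice, made because D is then used as a weighted graph without self-loops.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple

Labels = Sequence[Hashable]   # community label of each vertex, indexed by vertex


def consensus_matrix(partitions: Sequence[Labels]) -> List[List[float]]:
    """Step 2: D[i][j] = fraction of partitions in which i and j share a community."""
    n_p = len(partitions)
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    return [[c / n_p for c in row] for row in counts]   # diagonal stays 0


def apply_threshold(d: List[List[float]], tau: float) -> List[List[float]]:
    """Step 3: set every entry below the threshold tau to zero."""
    return [[x if x >= tau else 0.0 for x in row] for row in d]


def canonical(labels: Labels) -> Tuple[int, ...]:
    """Rename communities in order of first appearance, so equal partitions compare equal."""
    mapping: Dict[Hashable, int] = {}
    return tuple(mapping.setdefault(label, len(mapping)) for label in labels)


def all_equal(partitions: Sequence[Labels]) -> bool:
    """Step 5: stop when all nP partitions are identical up to label renaming."""
    return len({canonical(p) for p in partitions}) == 1


runs = [[0, 0, 1, 1], [0, 0, 0, 1], [5, 5, 7, 7]]
d = consensus_matrix(runs)
print(d[0][1], d[1][2], apply_threshold(d, 0.5)[1][2], all_equal(runs))
```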

Page 65

Evaluation: partitioning quality Q2

Three approaches to evaluate the obtained partition:
• according to a formal definition of the communities
• using quality measures
• driven by another task

Page 66

Evaluation with respect to a formal definition

Formal definition of a community:
• clique, k-clique, k-club, quasi-clique, etc.
• connected component

Check whether the communities satisfy the definition.
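Checking a formal definition is mechanical once the graph is stored as adjacency sets. The minimal sketch below (hypothetical helper names, undirected graph assumed) tests two of the definitions listed: whether a community is a clique, and whether it induces a connected subgraph, which is weaker than being an entire connected component of G.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[int, Set[int]]   # vertex -> set of neighbours (undirected)


def is_clique(g: Graph, community: Set[int]) -> bool:
    """Every pair of distinct members must be linked."""
    return all(v in g[u] for u in community for v in community if u != v)


def induces_connected_subgraph(g: Graph, community: Set[int]) -> bool:
    """Breadth-first search that only walks along edges inside the community."""
    if not community:
        return False
    start = next(iter(community))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v in community and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == community


# A triangle {0, 1, 2} attached to a path 2 - 3 - 4
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
for community in ({0, 1, 2}, {2, 3, 4}, {0, 4}):
    print(sorted(community), is_clique(g, community), induces_connected_subgraph(g, community))
```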

Page 67

Evaluation with quality measures

Using internal quality measures:
• computed on the obtained partition P
• depending on the data: attributes or graph (weighted, directed, ...)

Using external quality measures:
• used to compare the obtained partition P with a ground truth P*

Page 68

Internal evaluation

Measures the intrinsic quality of the proposed partition:
• based on a function Q that associates a real value Q(P) with a partition P
• comparison of partitions: a high score means a "good" partition, which allows different partitions to be compared and the best one to be identified
• additivity property: Q(P) = Σ_{C ∈ P} q(C), where q is a quality function at the community level
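Newman-Girvan modularity is a standard example of an additive criterion: Q(P) = Σ_C q(C) with q(C) = m_C / m − (vol(C) / 2m)², where m_C counts the edges inside C and vol(C) is the sum of the degrees of its vertices. The sketch below assumes an undirected, unweighted graph without self-loops; `modularity_terms` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[int, int]


def modularity_terms(edges: List[Edge], labels: Dict[int, Hashable]) -> Dict[Hashable, float]:
    """Community-level terms q(C) = m_C / m - (vol(C) / 2m)^2 of modularity."""
    m = len(edges)
    inside = defaultdict(int)   # m_C: edges with both endpoints in C
    vol = defaultdict(int)      # vol(C): sum of the degrees of the vertices of C
    for u, v in edges:
        vol[labels[u]] += 1
        vol[labels[v]] += 1
        if labels[u] == labels[v]:
            inside[labels[u]] += 1
    return {c: inside[c] / m - (vol[c] / (2 * m)) ** 2 for c in vol}


# Two triangles joined by a single bridge edge (2, 3)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
terms = modularity_terms(edges, labels)
print(terms, sum(terms.values()))   # Q(P) is the sum of the community-level terms
```

Because Q decomposes this way, moving one vertex only changes the terms of its old and new communities.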

Page 69

Internal criteria

Based on triads: clustering coefficient [Watts & Strogatz, 1998]
Based on intra/inter-community density [Mancoridis, 1998]
Based on the links:
• Expansion [Radicchi, 2004]
• Coverage [Almeida, 2011]
• Conductance [Leskovec, 2008]
• Performance [Van Dongen, 2000; Brandes, 2007]

Based on the criteria used for detecting the groups:
• Ratio Cut, Normalized Cut [Wei et al., 1988; Shi et al., 2000], Modularity (relational data)
• Within inertia, Variance ratio criterion [Calinski et al., 1974], PBM [Pakhira et al., 2004] (attributes)

Based on a distance that depends on the type of data (Euclidean, geodesic, etc.):
• Silhouette coefficient [Rousseeuw, 1987; Elghael, 2007]
• Dunn index [Dunn, 1973]
• Davies and Bouldin index [1979]

But the obtained values are not easy to interpret!
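Three of the link-based criteria above can be computed in a single pass over the edges. The sketch assumes common definitions, not necessarily those of each cited paper: coverage is the fraction of edges inside communities, conductance is cut(C) / min(vol(C), vol(V∖C)) (some papers use cut(C) / vol(C)), and expansion is cut(C) / |C|, the number of boundary edges per vertex. The function name is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[int, int]


def link_based_scores(edges: List[Edge], labels: Dict[int, Hashable]):
    """Coverage of the partition, plus conductance and expansion of each community."""
    m = len(edges)
    vol = defaultdict(int)   # sum of the degrees of the vertices of C
    cut = defaultdict(int)   # edges with exactly one endpoint in C
    inside = 0               # edges with both endpoints in the same community
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        vol[cu] += 1
        vol[cv] += 1
        if cu == cv:
            inside += 1
        else:
            cut[cu] += 1
            cut[cv] += 1
    coverage = inside / m
    per_community = {}
    for c, size in Counter(labels.values()).items():
        denom = min(vol[c], 2 * m - vol[c])
        per_community[c] = {
            # 0.0 for a community with no edges at all (our convention)
            "conductance": cut[c] / denom if denom > 0 else 0.0,
            "expansion": cut[c] / size,
        }
    return coverage, per_community


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(link_based_scores(edges, labels))
```

Lower conductance and expansion, and higher coverage, indicate better-separated communities; as the slide warns, the absolute values remain hard to interpret without a reference.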

Page 70

External evaluation

Evaluation using external data / expert / ground truth P*:
• measure the agreement Q(P, P*) between the proposed partition P and an expected partition P*
• a high score means good correspondence

Page 71

External evaluation

Pair counting:
• Jaccard index [1901]
• Mirkin metric [Mirkin, 1996]

Cluster matching:
• Purity [Labatut, 2012]
• Classification error [Meila, 2001]
• Rand index [Rand, 1971], ARI [Hubert and Arabie, 1985; Meila, 2007]
• Normalized metric [van Dongen, 2000]

Information theory:
• Entropy
• Adjusted mutual information [Vinh, 2010], Normalized mutual information [Strehl, 2003; Dhillon, 2004; Pfitzner, 2009]
• V-measure [Rosenberg, 2007]
• Homogeneity
• Completeness

These measures assume that a ground truth is available!
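Several of the measures above reduce to counting vertex pairs. The minimal sketch below (hypothetical function names, at least two vertices assumed) derives the Rand index, the Jaccard index and the ARI from the four pair counts, and adds purity, which matches each detected community with its majority ground-truth class.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence, Tuple

Labels = Sequence[Hashable]


def pair_counts(a: Labels, b: Labels) -> Tuple[int, int, int, int]:
    """Vertex pairs: (together in both, together only in a, together only in b, apart in both)."""
    n11 = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    same_a = sum(comb(c, 2) for c in Counter(a).values())
    same_b = sum(comb(c, 2) for c in Counter(b).values())
    n10, n01 = same_a - n11, same_b - n11
    n00 = comb(len(a), 2) - n11 - n10 - n01
    return n11, n10, n01, n00


def rand_index(a: Labels, b: Labels) -> float:
    n11, n10, n01, n00 = pair_counts(a, b)
    return (n11 + n00) / (n11 + n10 + n01 + n00)


def jaccard_index(a: Labels, b: Labels) -> float:
    n11, n10, n01, _ = pair_counts(a, b)
    return n11 / (n11 + n10 + n01)   # undefined if no pair is grouped in either partition


def adjusted_rand_index(a: Labels, b: Labels) -> float:
    n11, n10, n01, n00 = pair_counts(a, b)
    total = n11 + n10 + n01 + n00
    same_a, same_b = n11 + n10, n11 + n01
    expected = same_a * same_b / total   # expected n11 under random labelings
    maximum = (same_a + same_b) / 2
    if maximum == expected:              # degenerate case, e.g. both partitions all-in-one
        return 1.0
    return (n11 - expected) / (maximum - expected)


def purity(pred: Labels, truth: Labels) -> float:
    """Fraction of vertices belonging to the majority class of their detected community."""
    best: Counter = Counter()
    for (p, _), c in Counter(zip(pred, truth)).items():
        best[p] = max(best[p], c)
    return sum(best.values()) / len(pred)


truth, pred = [0, 0, 1, 1], [0, 0, 1, 2]
print(rand_index(truth, pred), jaccard_index(truth, pred),
      adjusted_rand_index(truth, pred), purity(pred, truth))
```

Note that purity is not symmetric: splitting every community into singletons yields a purity of 1, which is why it is usually reported alongside a pair-counting or information-theoretic measure. The ARI values can be cross-checked against scikit-learn's `adjusted_rand_score`.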

Page 72

Criteria-based evaluation according to the type of data

Internal criteria:

| | Vectors | Graph |
|---|---|---|
| Internal criteria | Dunn | Coverage, Expansion |
| | Davies & Bouldin | Conductance, Clustering coef., Density |
| | Silhouette (Euclidean) | Silhouette (geodesic) |
| | Within inertia, inertia-based modularity | Cut, Modularity |

External criteria: Purity, NMI (Normalized Mutual Information), AMI, Rand index, ARI, V-measure

Hybrid criteria: Joint Silhouette coefficient (Moser et al., 2007)

Page 73

Criteria-based evaluation

State-of-the-art papers on partition quality:
• Jure Leskovec, Kevin J. Lang, Michael W. Mahoney, Empirical Comparison of Algorithms for Network Community Detection, WWW 2010, p. 631-640
• H. Almeida, D. Guedes, W. Meira, M. Zaki, Is There a Best Quality Metric for Graph Clusters?, ECML PKDD'11, p. 44-59
• J. Yang, J. Leskovec, Defining and Evaluating Network Communities based on Ground-truth, MDS '12: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics, Article No. 3

Difficulties:
• interpretation of the obtained values for internal criteria
• access to a ground truth for external criteria
• solution: benchmarks and generators

Page 74

Expert benchmarks:
• UCI Irvine: https://networkdata.ics.uci.edu/resources.php
• UCINET data collection
• Stanford Large Network Dataset Collection (SNAP)
• KONECT (the Koblenz Network Collection) (261 networks)
• The Colorado Index of Complex Networks (ICON) (608 networks)

but ... metadata do not necessarily fit the community structure [Hric, 2014; Peel, 2017]

Generators

Page 75

Graph generators

Parametric software / models to generate networks. Types of generators:
• graphs without attributes or community structure [Erdos-Renyi, 1960; Watts-Strogatz, 1998; Barabasi-Albert, 1999]
• attributed graphs [Gong et al., 2012; Le Tran, 2015], RTG [Akoglu-Faloutsos, 2009]
• graphs with community structure: Kleinberg model [Kleinberg, 1999], GN benchmark [Girvan-Newman, 2002], Forest Fire [Leskovec et al., 2005], LFR [Lancichinetti-Fortunato, 2009], Block Two-Level Erdos-Renyi [Kolda et al., 2013]
• Grow-shrink benchmark: dynamic graph with community structure derived from the stochastic block model [Granell et al., 2015]
• attributed graphs with community structure [Dang, 2012; Largeron, 2015; Largeron, 2016]

Page 76

Graph generators with community structure

2 Girvan-Newman (2002), weighted version [Fan, 2007]
• graph composed of 128 nodes with a degree of 16, divided into 4 groups of the same size
• two parameters: p_in and p_out = 1 − p_in, the probabilities of having an internal and an external link
• the degree distribution does not follow a power law

3 Lancichinetti-Fortunato-Radicchi (2008; 2009: overlapping communities; 2015: dynamic version based on the stochastic block model)
• the degree distribution follows a power law, but transitivity is low

4 DANCer (Largeron et al., 2017)
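A Girvan-Newman style benchmark is easy to reproduce. The sketch below uses the classical planted-partition formulation, which is an assumption relative to the slide's wording: p_in and p_out are per-pair link probabilities, chosen so that each node has on average z_in = 16 − z_out links inside its group (31 possible partners) and z_out links to the other groups (96 possible partners). In this reading, the slide's p_out corresponds to the expected fraction of external links z_out / 16. The function name and parameters are hypothetical.

```python
import random
from typing import List, Set, Tuple


def girvan_newman_benchmark(z_out: float, seed: int = 0, n_groups: int = 4,
                            group_size: int = 32, mean_degree: float = 16.0
                            ) -> Tuple[List[int], Set[Tuple[int, int]]]:
    """Planted-partition graph in the style of the Girvan-Newman benchmark."""
    n = n_groups * group_size
    p_in = (mean_degree - z_out) / (group_size - 1)   # per-pair probability inside a group
    p_out = z_out / (n - group_size)                   # per-pair probability across groups
    if not (0.0 <= p_in <= 1.0 and 0.0 <= p_out <= 1.0):
        raise ValueError("z_out gives probabilities outside [0, 1]")
    rng = random.Random(seed)                          # fixed seed: reproducible graph
    labels = [v // group_size for v in range(n)]       # ground-truth group of each node
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                edges.add((u, v))
    return labels, edges


labels, edges = girvan_newman_benchmark(z_out=2.0, seed=42)
internal = sum(labels[u] == labels[v] for u, v in edges)
print(len(labels), 2 * len(edges) / len(labels), internal / len(edges))
```

Increasing z_out towards 8 makes the groups progressively harder to recover, which is how such benchmarks are used to stress detection algorithms.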

Page 77

Graph generator DANCer

DANCer: Dynamic Attributed Network with Community Structure Generator, freely available under the terms of the GNU Public Licence at: http://perso.univ-st-etienne.fr/largeron/DANCer_Generator/

Easy to use through its interface.
Largeron et al., PLOS ONE 2015, ECML-PKDD 2016, KAIS 2017

Page 78

DANCer interface

Figure: User interface of the generator DANCer (Parameters panel, Visualization panel, Measures / Degree Distribution panel).

Page 79

Properties verified and indicators computed

P1: Community structure: modularity and average clustering coefficient

P2: Community homogeneity with respect to the attributes: within-inertia ratio

P3: Homophily: observed vs. expected homophily
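For P2, a minimal sketch of a within-inertia ratio follows, assuming the classical clustering definition I_W / I_T (inertia of the attribute vectors around their community centroids divided by their inertia around the global centroid); the slides do not give DANCer's exact formula, and the function names are hypothetical. Values close to 0 indicate communities that are homogeneous with respect to the attributes.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]


def squared_distance(u: Vector, v: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def within_inertia_ratio(points: List[Vector], labels: Sequence[Hashable]) -> float:
    """I_W / I_T, where I_W sums squared distances to community centroids
    and I_T sums squared distances to the global centroid."""
    g = centroid(points)
    total = sum(squared_distance(p, g) for p in points)
    if total == 0:
        raise ValueError("all attribute vectors are identical")
    groups = defaultdict(list)
    for p, label in zip(points, labels):
        groups[label].append(p)
    within = 0.0
    for members in groups.values():
        c = centroid(members)   # centroid of this community
        within += sum(squared_distance(p, c) for p in members)
    return within / total


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(within_inertia_ratio(points, [0, 0, 1, 1]), within_inertia_ratio(points, [0, 0, 0, 0]))
```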

Page 80

Properties verified and indicators computed

P4: Preferential attachment: degree distribution (log-log scales)

P5: Small world: diameter and average shortest path length
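The indicators for P4 and P5 can be computed directly from an adjacency structure. The minimal sketch below (hypothetical names, undirected unweighted graph assumed) returns the degree distribution, to be plotted on log-log scales, and the exact diameter and average shortest path length; restricting the average to connected pairs is our choice for disconnected graphs.

```python
from collections import Counter, deque
from typing import Dict, Hashable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]   # vertex -> neighbours (undirected)


def degree_distribution(g: Graph) -> Counter:
    """P4: number of vertices having each degree."""
    return Counter(len(neighbours) for neighbours in g.values())


def bfs_distances(g: Graph, source: Hashable) -> Dict[Hashable, int]:
    """Hop distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_average_path(g: Graph) -> Tuple[int, float]:
    """P5: exact diameter and average shortest path length over connected pairs.
    One BFS per vertex, O(|V| (|V| + |E|)): fine for a sketch, slow for large graphs."""
    total, pairs, diameter = 0, 0, 0
    for s in g:
        for t, d in bfs_distances(g, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return diameter, (total / pairs if pairs else 0.0)


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(degree_distribution(path), diameter_and_average_path(path))
```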

Page 81

Generation of a dynamic network, i.e., a sequence of graphs

Given a set of parameters, a first graph is generated; micro and/or macro operations are then applied to obtain the next graphs:

• micro operations:
  • add / remove vertices
  • add / remove between-community and/or within-community edges
  • update attribute values

• macro operations:
  • merge two communities into a single one
  • split one community into two
  • migrate vertices from a community to either a new or an existing community
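The macro operations listed above can be pictured as transformations of a vertex-to-community map. The sketch below is hypothetical: the function names are ours, and splitting a community by moving a random half of its members is an assumption, since the slides do not say how DANCer chooses which vertices leave, nor how it then rewires edges and updates attributes.

```python
import random
from typing import Dict, Iterable

Partition = Dict[int, int]   # vertex -> community id


def merge(part: Partition, c1: int, c2: int) -> Partition:
    """Merge communities c1 and c2 into a single one, kept under id c1."""
    return {v: (c1 if c == c2 else c) for v, c in part.items()}


def split(part: Partition, c: int, new_c: int, rng: random.Random) -> Partition:
    """Split community c into two by moving a random half of its members to new_c."""
    if new_c in part.values():
        raise ValueError("new_c must be an unused community id")
    members = sorted(v for v, cc in part.items() if cc == c)
    moved = set(rng.sample(members, len(members) // 2))
    return {v: (new_c if v in moved else cc) for v, cc in part.items()}


def migrate(part: Partition, vertices: Iterable[int], target: int) -> Partition:
    """Move the given vertices to an existing or a new community `target`."""
    chosen = set(vertices)
    return {v: (target if v in chosen else c) for v, c in part.items()}


part = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
print(merge(part, 0, 1))
print(split(part, 2, 9, random.Random(7)))
print(migrate(part, [5], 3))
```

Each function returns a new map, so the successive snapshots of the dynamic network remain available for evaluation.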

Page 82

Output of the generator

• parameters file
• network file describing the graphs obtained at each timestamp
• measures used to evaluate the properties:
  • graph characteristics: number of between- and within-community edges, number of edges, number of vertices, number of connected components, number of communities and number of elements per community, degree distribution
  • attribute measures: within-inertia rate, observed vs. expected homophily
  • structural measures: modularity, average clustering coefficient vs. random clustering coefficient, average degree, average shortest path length, diameter

Page 83

Network reproduction

A large number of parameters, but:
• they allow building a network with a well-defined structure for both the links and the attributes (good communities / classes)
• the link-based and/or attribute-based structure can then be weakened by changing the parameters, to evaluate the performance of algorithms on noisy networks
• a seed parameter allows exactly the same network to be reproduced

Page 84

Performance

The generation of a dynamic network composed of 20 graphs with 81,806 vertices, 328,016 edges and 100 communities takes approximately 1 minute.

Figure: Change in the runtime over 20 timestamps for micro and macro operations (runtime in seconds, 0 to 70, against the timestamp). Legend: Base (4.1 %), Remove Vert (0.4 %), Add Vert (20.4 %), Update Attr (7.3 %), Remove Wth. Edge (4.4 %), Add Wth. Edge (8.5 %), Remove Btw. Edge (17.2 %), Add Btw. Edge (23.7 %), Migrate (0.0 %), Merge (3.6 %), Split (5.1 %).