Visual Analysis of Large Graphs Using (X, Y)-clustering and Hybrid Visualizations

V. Batagelj, W. Didimo, G. Liotta, P. Palladino, M. Patrignani

(Univ. Ljubljana, Univ. Perugia, Univ. Roma Tre)

In Proc. IEEE Pacific Visualization 2010

Outline

• The problem of visualizing large graphs
• State of the art
• Our contribution
• Conclusions and open problems

The problem of visualizing large graphs

Some major issues in the visualization of large graphs:

• Readability: optimization of aesthetic criteria
• Scalability: fast computation
• Visual complexity: interaction tools that let users limit the amount of information displayed on the screen
  – overview of the graph
  – details on demand
  – preservation of the user's mental map

State of the art

• Readability: there are many effective algorithms that are computationally fast for relatively small and sparse graphs (see the graph drawing book of Di Battista, Eades, Tamassia, Tollis, 1999)
• Scalability: there are some fast graph drawing algorithms based on physical or algebraic models; the drawings have high visual complexity and do not allow detailed views (see the survey of Hachul and Jünger, 2007)
• Visual complexity: draw the whole graph and then interact with it, e.g., with focus+context techniques such as fisheye views or hyperbolic layouts; these were conceived for tree-like graphs (see the survey of Herman, Melançon, Marshall, 2000)
• Interactive approaches for visualizing and exploring large graphs:
  – the graph is visualized incrementally or at different levels of detail
  – strong interaction between the user and the drawing

Interactive Approaches

• Bottom-up strategies: the graph is visualized a piece at a time
  – a topological window moving through the canvas (Eades et al., 1997); limits: no overview, and preserving the user's mental map is difficult
  – incremental enhancement of the drawing (e.g., Carmignani et al., 2002); limits: no overview, and preserving the user's mental map is difficult without degrading readability
• Top-down strategies
  – the graph is clustered (vertices are grouped together)
  – a simplified view is shown (overview)
  – the user interactively explores the clusters (detailed views)
• Limits of top-down strategies
  – someone or something has to define the clustering rules
  – existing clustering algorithms do not guarantee properties of the graph of clusters

Our contribution

• A top-down approach with these ingredients:
  – a new clustering framework
  – a new clustering algorithm within the framework
  – hybrid visualizations
• A system: VHyXY
• Some case studies

Basic Terminology: Clustering

• G=(V, E): graph with vertex set V and edge set E
• A cluster of G=(V, E) is a subset of V
• A clustering C of G is a set of disjoint clusters of G
• The graph of clusters H(G, C) is the graph obtained by collapsing each cluster of C into a single vertex and by replacing multiple edges with a single one
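The definition of H(G, C) can be made concrete with a small sketch. This is an illustration only, not the system's code: the function name `graph_of_clusters`, the `("cluster", i)` labels, and the example graph are assumptions introduced here. Vertices that belong to no cluster are kept unchanged.

```python
def graph_of_clusters(vertices, edges, clustering):
    """Collapse each cluster into one vertex and merge parallel edges.

    vertices: iterable of vertex names of G
    edges: iterable of (u, v) pairs (undirected)
    clustering: list of disjoint vertex sets; unclustered vertices stay as they are
    """
    # Map every clustered vertex to a label that stands for its cluster.
    label = {}
    for i, cluster in enumerate(clustering):
        for v in cluster:
            label[v] = ("cluster", i)

    h_vertices = {label.get(v, v) for v in vertices}
    h_edges = set()
    for u, v in edges:
        a, b = label.get(u, u), label.get(v, v)
        if a != b:  # edges inside a cluster disappear
            h_edges.add(frozenset((a, b)))  # a set keeps a single copy of multiple edges
    return h_vertices, h_edges


# Two triangles joined by two edges, plus one unclustered vertex 7.
vertices = [1, 2, 3, 4, 5, 6, 7]
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4), (2, 5), (6, 7)]
h_vertices, h_edges = graph_of_clusters(vertices, edges, [{1, 2, 3}, {4, 5, 6}])
print(len(h_vertices), len(h_edges))  # 3 2
```

The two edges between the triangles become one edge of H, which is the "replacing multiple edges with a single one" part of the definition.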

A new clustering framework

• Clustering algorithms usually detect groups of highly connected vertices without taking care of the graph of clusters
• We adopt a new framework for the design of automatic clustering algorithms that guarantee:
  – desired properties for the clusters
  – desired properties for the graph of clusters

The (X,Y)-clustering

• X and Y are two classes of graphs with certain properties
• G is called an (X,Y)-graph if there exists a clustering of G such that:
  – each cluster induces a subgraph that belongs to Y
  – the graph of clusters belongs to X

(X,Y)-graph example

• Let X be the class of cycles and let Y be the class consisting of the complete graph K4
• The graph is a (cycle, K4)-graph
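As a rough illustration of the definition, the sketch below checks whether a given clustering witnesses that a graph is a (cycle, K4)-graph. It only verifies a clustering that is handed to it; it does not search for one. The function names, the predicate-passing design, and the 12-vertex example are assumptions made for this sketch.

```python
from itertools import combinations


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_k4(cluster, adj):
    """Y test: the cluster has 4 vertices and they are pairwise adjacent."""
    return len(cluster) == 4 and all(
        v in adj.get(u, ()) for u, v in combinations(cluster, 2))


def is_cycle(h_adj):
    """X test: connected, at least 3 vertices, every vertex of degree 2."""
    if len(h_adj) < 3 or any(len(nb) != 2 for nb in h_adj.values()):
        return False
    start = next(iter(h_adj))
    seen, stack = {start}, [start]
    while stack:
        for w in h_adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(h_adj)


def is_xy_clustering(edges, clustering, in_x, in_y):
    """True if every cluster passes in_y and the graph of clusters passes in_x."""
    adj = adjacency(edges)
    if not all(in_y(cluster, adj) for cluster in clustering):
        return False
    label = {v: ("cluster", i) for i, c in enumerate(clustering) for v in c}
    h_adj = {label.get(v, v): set() for v in adj}
    for u, v in edges:
        a, b = label.get(u, u), label.get(v, v)
        if a != b:
            h_adj[a].add(b)
            h_adj[b].add(a)
    return in_x(h_adj)


# Three copies of K4 joined in a ring by single edges.
clusters = [{0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10, 11}]
edges = [e for c in clusters for e in combinations(sorted(c), 2)]
edges += [(3, 4), (7, 8), (11, 0)]
print(is_xy_clustering(edges, clusters, is_cycle, is_k4))       # True
print(is_xy_clustering(edges[:-1], clusters, is_cycle, is_k4))  # False: H is a path
```

Passing the X and Y tests as functions keeps the checker independent of the particular classes, which mirrors how the framework treats X and Y as parameters.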

Interesting combinations

• X is some class of sparse graphs:
  – planar graphs, cycles, trees, paths, …
• Y is some class of highly connected graphs:
  – cliques, subgraphs with high-degree vertices, …
• One can think of using different visual paradigms and algorithms for drawing the graph of clusters and the subgraph induced by each cluster (hybrid visualization)

Remark on (X,Y)-clustering

• (X, Y)-clustering was previously defined by Brandenburg (GD 1997), but his model requires that every vertex belongs to some cluster
• That requirement poses severe practical limitations, so our model drops it

The (X,Y)-clustering problem

• Problem: Given a graph G and two desired classes X and Y, is G an (X,Y)-graph?

• This problem is NP-hard in general

• Theorem: Deciding whether G is a (planar, k-clique)-graph for desired k ≥ 5 is NP-hard

• This result motivates us to look for some relaxation of cliques

K-core components

• The subgraph induced by a cluster is a k-core component if it is a maximal connected subgraph in which every vertex has degree at least k

[Figure: example graph with one 5-core component and two 4-core components]
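A minimal sketch of how k-core components could be computed, assuming the standard peeling idea (repeatedly remove a vertex of minimum remaining degree). The function names and the example graph are illustrative assumptions, and the linear scan for the minimum is chosen for brevity rather than speed.

```python
from itertools import combinations


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def core_numbers(adj):
    """Core number of each vertex, by repeatedly peeling a minimum-degree vertex."""
    degree = {v: len(nb) for v, nb in adj.items()}
    remaining, core, k = set(adj), {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)  # linear scan: simple, not optimal
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


def k_core_components(adj, k):
    """Connected components of the subgraph induced by vertices of core number >= k."""
    core = core_numbers(adj)
    inside = {v for v in adj if core[v] >= k}
    seen, components = set(), []
    for start in inside:
        if start in seen:
            continue
        seen.add(start)
        component, stack = {start}, [start]
        while stack:
            for w in adj[stack.pop()]:
                if w in inside and w not in seen:
                    seen.add(w)
                    component.add(w)
                    stack.append(w)
        components.append(component)
    return components


# K5 on 0..4, K4 on 5..8, one edge between them, and a pendant vertex 9.
edges = list(combinations(range(5), 2)) + list(combinations(range(5, 9), 2))
edges += [(4, 5), (8, 9)]
adj = adjacency(edges)
print(k_core_components(adj, 4))       # [{0, 1, 2, 3, 4}]
print(len(k_core_components(adj, 3)))  # 1 (K5 and K4 are joined by an edge)
print(k_core_components(adj, 5))       # []
```

Note that the vertices of a K5 have core number 4, not 5, because each has degree 4 inside the clique.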

(Planar, K-core component)-graphs

• We investigate (X,Y)-graphs G such that:
  – X is the class of planar graphs
  – Y is the class of k-core components of G
• In particular, for a given k > 0, one can ask whether G is a (planar, k-core component)-graph
  – this decision problem can be solved in polynomial time
  – we give a polynomial-time algorithm that finds the maximum k for which G is a (planar, k-core component)-graph, and that computes the corresponding clustering

Properties of (planar, k-core component)-graphs

The union of all k-core components of G is called the k-core of G (the k-core of G, if it exists, is unique)

Property. If G has the k-core G_k (for some k ≥ 1), then G has the (k−1)-core G_{k−1} and G_k ⊆ G_{k−1}

Lemma. If G is a (planar, k-core component)-graph then it is a (planar, (k−1)-core component)-graph
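One way to read the lemma is as a monotonicity statement, which is what makes a binary search over k meaningful. The shorthand P(k) below is introduced here for illustration and is not notation from the slides; the range of j is assumed to be the one in which the lemma applies.

```latex
% P(k) is shorthand for: "G is a (planar, k-core component)-graph".
\[
  P(k) \;\Longrightarrow\; P(k-1) \quad \text{(Lemma)}
  \qquad\text{hence}\qquad
  P(k) \;\Longrightarrow\; P(j) \ \text{ for all } 1 \le j \le k .
\]
\[
  k^{*} = \max\{\, k : P(k) \,\}
  \quad\Longrightarrow\quad
  P(k) \text{ holds for every } k \le k^{*} \text{ and fails for every } k > k^{*} .
\]
```

So the values of k split into a "yes" prefix and a "no" suffix, and the boundary k* can be located with O(log n) tests instead of one test per value.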

Proof of the lemma

[Figure sequence: clusters V1 and V2 with their vertices u(V1) and u(V2) in H(G, C), and clusters V1' and V2' with their vertices u(V1') and u(V2') in H(G, C')]

Clustering Algorithm

• Theorem: Let G be a graph with n vertices and m edges. There exists an O((n+m) log n)-time algorithm that computes the maximum k for which G is a (planar, k-core component)-graph, and the corresponding clustering
• Steps of the algorithm:
  1. Compute core-numbers for the vertices
  2. Perform a binary search on core-numbers
  3. For each graph of clusters, test its planarity
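The three steps can be sketched as follows. This is a simplified illustration under several assumptions, not the authors' implementation: the planarity test is passed in as a function (`is_planar`, a hypothetical parameter), the search runs over the integer range 1..max core number, the graph is assumed non-empty, and the core-number routine is a simple quadratic peeling rather than a linear-time one. The demo uses `passes_edge_bound`, which is only the necessary condition m ≤ 3n − 6 and NOT a planarity test; it happens to give the right answers on this particular example.

```python
from itertools import combinations


def core_numbers(adj):
    """Step 1: core number of each vertex, by peeling minimum-degree vertices."""
    degree = {v: len(nb) for v, nb in adj.items()}
    remaining, core, k = set(adj), {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


def k_core_components(adj, core, k):
    """Clusters for a given k: components induced by vertices of core number >= k."""
    inside = {v for v in adj if core[v] >= k}
    seen, components = set(), []
    for start in inside:
        if start in seen:
            continue
        seen.add(start)
        component, stack = {start}, [start]
        while stack:
            for w in adj[stack.pop()]:
                if w in inside and w not in seen:
                    seen.add(w)
                    component.add(w)
                    stack.append(w)
        components.append(component)
    return components


def graph_of_clusters(adj, clusters):
    """Adjacency of H(G, C); unclustered vertices are kept as they are."""
    label = {v: ("cluster", i) for i, c in enumerate(clusters) for v in c}
    h_adj = {label.get(v, v): set() for v in adj}
    for u, neighbours in adj.items():
        for v in neighbours:
            a, b = label.get(u, u), label.get(v, v)
            if a != b:
                h_adj[a].add(b)
    return h_adj


def max_planar_k(adj, is_planar):
    """Steps 2 and 3: binary search for the largest k whose graph of clusters is planar."""
    core = core_numbers(adj)
    lo, hi, best = 1, max(core.values()), None
    while lo <= hi:
        mid = (lo + hi) // 2
        clusters = k_core_components(adj, core, mid)
        if is_planar(graph_of_clusters(adj, clusters)):
            best, lo = (mid, clusters), mid + 1
        else:
            hi = mid - 1
    return best


def passes_edge_bound(h_adj):
    """Stand-in oracle for the demo only: necessary condition m <= 3n - 6."""
    n = len(h_adj)
    m = sum(len(nb) for nb in h_adj.values()) // 2
    return n < 3 or m <= 3 * n - 6


# K8 on 0..7, K9 on 8..16, a bridge between them, and a pendant path 0-17-18.
edges = list(combinations(range(8), 2)) + list(combinations(range(8, 17), 2))
edges += [(7, 8), (0, 17), (17, 18)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
k, clusters = max_planar_k(adj, passes_edge_bound)
print(k, [len(c) for c in clusters])  # 7 [17]
```

For k = 8 only the K9 is collapsed, so the graph of clusters still contains the K8 and is not planar; for k = 7 everything except the pendant path collapses into one cluster. In practice `is_planar` would be replaced by a real planarity test from a graph library.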

Algorithm animation

• Compute the core number of each vertex, i.e., the maximum k for which there exists a k-core that contains the vertex

[Figure: animation on an example graph whose vertices have core numbers from 1 to 5, with a binary search over k = 1, 2, 3, 4, 5. The frames appear to show: vertices of core number at least 3 collapsed, result planar; vertices of core number 5 collapsed, result not planar (K5); vertices of core number at least 4 collapsed, result planar. Maximum k = 4.]

Hybrid Visualizations

• The (X, Y)-clustering technique can be used to design hybrid visualizations
  – combination of different drawing conventions for different parts of the graph
  – example: node-link representation for sparse subgraphs, matrix-based representation for dense subgraphs
  – highly readable drawings for the graph of clusters (which is always planar)

Matrix based representation

• Matrix-based representation
  – vertices are rows and columns
  – edges are cells
• The ordering of vertices in rows/columns may strongly affect the number of crossings in the drawing

Crossings minimization heuristic

[Figure: animation of the heuristic on an example drawing with labeled vertices (vertex1, vertex2, ...); the last frame shows the vertices in a different order]

Remark about hybrid visualizations

• A hybrid visualization that combines node-link and matrix-based representations was previously used in the literature (Henry et al., 2007, NodeTrix)
• In that work, clusters are defined manually:
  – no automatic clustering
  – no automatic ordering of rows and columns

The System VHyXY

• VHyXY integrates the clustering algorithm and hybrid visualizations
  – X-class chooser (e.g., planar, forest)
  – Y-class chooser (e.g., k-core component)
  – filters on edge weights
  – specific drawing algorithms for each component

User interface

Case Study: Co-authorship networks

• DBLP: on-line database of publications in Computer Science
• VHyXY allows the user to query DBLP on a specific topic
  – it retrieves data about all papers on that topic (looking at the titles of the papers)
  – it builds a network where authors are vertices and there is an edge between two authors if they share a paper (edge weight = number of shared papers)
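The network construction described above can be sketched in a few lines. The input format (one list of author names per paper) and the placeholder names "A", "B", "C" are assumptions; the sketch does not query DBLP.

```python
from collections import Counter
from itertools import combinations


def coauthorship_network(papers):
    """papers: list of author lists, one per paper.

    Returns a Counter mapping each sorted author pair to the number of
    papers the two authors share (the edge weight).
    """
    weights = Counter()
    for authors in papers:
        # set() guards against an author listed twice on the same paper.
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return weights


papers = [["A", "B", "C"], ["A", "B"], ["B", "C"], ["A", "B"]]
weights = coauthorship_network(papers)
print(dict(weights))  # {('A', 'B'): 3, ('A', 'C'): 1, ('B', 'C'): 2}
```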

• Co-authorship network for “orthogonal drawing”

• Hybrid visualizations: a matrix and a circular drawing inside an orthogonal layout

• Larger network for “graph drawing”: 114 vertices and 494 edges

• Same network with edge filtering (weight > 2)
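An edge filter of this kind amounts to a one-line selection on the weights. The sketch below assumes edges are stored as a mapping from author pairs to weights; the example values are made up, and whether the system also removes vertices left without edges is not stated in the slides.

```python
def filter_edges(weights, threshold=2):
    """Keep only the edges whose weight is strictly greater than threshold."""
    return {edge: w for edge, w in weights.items() if w > threshold}


example = {("A", "B"): 3, ("A", "C"): 1, ("B", "C"): 2}
strong = filter_edges(example)
print(strong)  # {('A', 'B'): 3}
```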

Clustering algorithm performance

Index name               Value (0-1)
Graph clustering         0.62
Coverage                 0.56
Clustering performance   0.94
Clustering error         0.06

Index name               Value (0-1)
Graph clustering         0.64
Coverage                 0.37
Clustering performance   0.999
Clustering error         0.001

• Graph clustering: a property of the graph; the higher the value, the better the clustering can be
• Coverage: how well the computed clusters cover the edges of the whole graph
• Performance: counts the number of “correctly interpreted pairs of nodes” in the graph
• Error: 1 − performance

[Brandes et al. “Engineering graph clustering: Models and experimental evaluation” ACM Journal of Experimental Algorithmics 2007]
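For reference, coverage and performance can be computed as in the sketch below, using the usual definitions: coverage is the fraction of edges that lie inside clusters, and performance is the fraction of vertex pairs that are either adjacent and in the same cluster or non-adjacent and in different clusters. Treating unclustered vertices as belonging to no cluster (so they never count as "same cluster") is an assumption of this sketch, as are the function name and the example; the graph is assumed simple, with at least two vertices and one edge.

```python
from itertools import combinations


def coverage_and_performance(vertices, edges, clustering):
    """Return (coverage, performance) of a clustering of a simple undirected graph."""
    label = {v: i for i, c in enumerate(clustering) for v in c}
    edge_set = {frozenset(e) for e in edges}

    def same_cluster(u, v):
        return u in label and label.get(u) == label.get(v)

    # Coverage: edges with both endpoints in the same cluster, over all edges.
    intra = sum(1 for e in edge_set if same_cluster(*e))
    coverage = intra / len(edge_set)

    # Performance: pairs whose adjacency agrees with their cluster membership.
    vertices = list(vertices)
    correct = sum(
        1 for u, v in combinations(vertices, 2)
        if (frozenset((u, v)) in edge_set) == same_cluster(u, v))
    pairs = len(vertices) * (len(vertices) - 1) // 2
    return coverage, correct / pairs


# Two triangles joined by one edge, each triangle taken as a cluster.
vertices = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
cov, perf = coverage_and_performance(vertices, edges, [{1, 2, 3}, {4, 5, 6}])
print(round(cov, 3), round(perf, 3), round(1 - perf, 3))  # 0.857 0.933 0.067
```

Here 6 of the 7 edges are inside clusters, and 14 of the 15 vertex pairs are "correctly interpreted"; only the pair (3, 4) is adjacent yet split across clusters.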

Open problems

• Explore additional X-classes or Y-classes for which polynomial-time clustering algorithms exist
  – X: forest, path, outerplanar, …
  – Y: relaxations of cliques, …
• Extend our techniques to
  – multi-level clustering (hierarchical clustering)
  – overlapping clusters
• Experiment with the system on a larger set of application domains
  – biological networks, criminal networks, …