Graph-based Relational Data Visualization

Daniel Mário de Lima <danielm@icmc.usp.br>
José Fernando Rodrigues Jr. <[email protected]>
Agma Juci Machado Traina <[email protected]>

Instituto de Ciências Matemáticas e de Computação
Universidade de São Paulo

17th International Conference Information Visualization
15, 16, 17 and 18 July 2013
SOAS, University of London ● London ● UK

pdf at http://www.icmc.usp.br/~junio/PublishedPapers/Lima-et_al_IV-2013.pdf

pdf at: www.icmc.usp.br/pessoas/junio

Outline

1. Introduction

2. Method

3. Experiments

4. Conclusions

1. Introduction

Introduction

• Large datasets are common
  • unstructured: text
  • semi-structured: XML, RDF, sensor data
  • structured: relational (DBMS), network (graph-like)
• Analysis Process (iterate)
  • Data Representation / Transformation
  • Storage / Retrieval
  • Statistics
  • Visualization
  • Analysis

Introduction

• How to spot interesting facts in the relationships of large relational databases?
• How are the entities in the database related to each other?
• How are the entities distributed over the relations of the database?
• How do the several attributes of the database influence the relationships of the entities?
• How do we quickly and intuitively browse the relational database, considering its complex structure?

Our approach

• Use graph representation
• Graph-partitioning techniques
• Graph-processing
• Interactive Visualization

Database → Graph → Partitioning → Visualization → Analysis

2. Method

Relationships as Graphs

Author            Publish     Work
Alice    A        A  1        1  Optic Fiber
Bob      B        B  2        2  Networks
Charles  C        C  3        3  Cryptography
…                 A  2

[Figure: the Publish rows drawn as a graph with Work nodes 1, 2, 3 and Author nodes A, B, C.]

Pipeline: DB m×m → Graph → GraphTree Partitioning → Visualization → Analysis
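The mapping from an m×m relationship to a graph can be sketched in a few lines of Python. This is only an illustration using the toy rows from the slide; the helper name `build_bipartite_graph` and the `(relation, key)` node encoding are assumptions, not part of R-Mine.

```python
from collections import defaultdict

# Toy rows copied from the slide's Author, Publish and Work tables.
authors = {"A": "Alice", "B": "Bob", "C": "Charles"}
works = {1: "Optic Fiber", 2: "Networks", 3: "Cryptography"}
publish = [("A", 1), ("B", 2), ("C", 3), ("A", 2)]


def build_bipartite_graph(relationship_rows):
    """Return an adjacency map with one node per entity key and one
    undirected edge per relationship row (hypothetical helper)."""
    adjacency = defaultdict(set)
    for author_key, work_key in relationship_rows:
        author_node = ("Author", author_key)
        work_node = ("Work", work_key)
        adjacency[author_node].add(work_node)
        adjacency[work_node].add(author_node)
    return dict(adjacency)


graph = build_bipartite_graph(publish)
for node in sorted(graph, key=str):
    print(node, "->", sorted(graph[node], key=str))
```

Each row of the relationship table becomes exactly one edge, so the number of edges equals the number of Publish rows.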

Graph Partitioning

Hierarchical Partitioning

[Figure: cut 0 splits the graph into subgraph 1 and subgraph 2; cut 1 and cut 2 then split these into subgraphs 1-1 and 1-2, and 2-1 and 2-2.]
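The recursive structure of the cuts can be sketched as follows. The slides do not say which partitioning algorithm is used, so `bisect_nodes` is a deliberately naive placeholder (it halves a sorted node list), and all function names are hypothetical. Only the labeling scheme (1, 2, 1-1, 1-2, ...) follows the slides.

```python
def bisect_nodes(nodes):
    """Placeholder partitioner (an assumption): split a sorted node list
    in half. A real system would use a graph-partitioning method here."""
    ordered = sorted(nodes)
    middle = len(ordered) // 2
    return ordered[:middle], ordered[middle:]


def partition_tree(nodes, label="", max_leaf_size=2):
    """Recursively bisect `nodes` into a tree of nested subgraphs."""
    ordered = sorted(nodes)
    tree = {"label": label or "root", "nodes": ordered, "children": []}
    if len(ordered) > max_leaf_size:
        for index, part in enumerate(bisect_nodes(ordered), start=1):
            child_label = f"{label}-{index}" if label else str(index)
            tree["children"].append(
                partition_tree(part, child_label, max_leaf_size)
            )
    return tree


def leaf_labels(tree):
    """Collect the labels of the leaf subgraphs, left to right."""
    if not tree["children"]:
        return [tree["label"]]
    return [lbl for child in tree["children"] for lbl in leaf_labels(child)]


tree = partition_tree(range(8), max_leaf_size=2)
print(leaf_labels(tree))
```

With eight nodes and leaves of at most two nodes, the sketch produces two levels of cuts, matching the depth of the example in the figure.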

SuperGraph

[Figure: each subgraph becomes a SuperNode and each cut becomes a SuperEdge. Subgraphs 1-1 and 1-2 become SuperNodes 1-1 and 1-2, joined by SuperEdge 1 (cut 1); subgraphs 2-1 and 2-2 become SuperNodes 2-1 and 2-2, joined by SuperEdge 2 (cut 2); subgraphs 1 and 2 become SuperNodes 1 and 2, joined by SuperEdge 0 (cut 0).]

• Further details in the paper
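One way to read the SuperEdge idea is that every edge crossing a cut is stored with the pair of sibling SuperNodes it connects. The sketch below assumes leaf labels written as in the figure ("1-1", "2-2", ...). The function `super_edges` and the `leaf_of` mapping are hypothetical names, and the paper may define SuperEdges differently.

```python
from collections import defaultdict


def super_edges(edges, leaf_of):
    """Group every edge whose endpoints lie in different leaf partitions
    under the pair of sibling partitions that the edge crosses."""
    crossing = defaultdict(list)
    for u, v in edges:
        path_u = leaf_of[u].split("-")
        path_v = leaf_of[v].split("-")
        if path_u == path_v:
            continue  # edge stays inside one leaf SuperNode
        # Walk down from the root until the two paths diverge.
        depth = 0
        while path_u[depth] == path_v[depth]:
            depth += 1
        left = "-".join(path_u[: depth + 1])
        right = "-".join(path_v[: depth + 1])
        crossing[tuple(sorted((left, right)))].append((u, v))
    return dict(crossing)


# Illustrative data: eight nodes spread over four leaf partitions.
leaf_of = {0: "1-1", 1: "1-1", 2: "1-2", 3: "1-2",
           4: "2-1", 5: "2-1", 6: "2-2", 7: "2-2"}
edges = [(0, 1), (1, 2), (3, 4), (5, 6), (6, 7), (2, 7)]

for siblings, members in super_edges(edges, leaf_of).items():
    print(siblings, members)
```

In this toy input, the edges between partitions 1 and 2 play the role of SuperEdge 0, while the edges inside partition 1 or partition 2 that cross the second-level cuts play the roles of SuperEdge 1 and SuperEdge 2.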

Attribute-based Partitioning

Left relation: Paper = {idPaper, country, year, title}
Right relation: Author = {idAuthor, age, dept, authorName}

[Figure: the Paper (P) and Author (A) node sets are partitioned level by level. Paper is split by local (the country attribute: US, BR) and then by year (labels ’00-’06, ’06-’11, ’95+, ’02+, ’06+, *). Author is split by age (<40, >40) and then by dept (labels IME, EESC, ICMC, FFLCH, *). Connectivity SuperEdges link the partitions of the two sides.]
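A rough sketch of attribute-based partitioning: each node gets a partition key built from its attributes, and a connectivity count is kept per pair of partitions. The sample rows, the helper names, and the exact range boundaries (a year of 2006 or earlier falls in ’00-’06, an age of exactly 40 falls in >40) are assumptions made for illustration only.

```python
from collections import Counter

# Illustrative rows; attribute names follow the Paper and Author relations.
papers = {
    1: {"country": "US", "year": 2004},
    2: {"country": "BR", "year": 2009},
    3: {"country": "BR", "year": 2010},
}
authors = {
    "a": {"age": 35, "dept": "ICMC"},
    "b": {"age": 52, "dept": "IME"},
}
paper_author = [(1, "a"), (2, "a"), (2, "b"), (3, "b")]


def paper_partition(paper):
    """Partition key: country first, then a year range (assumed bounds)."""
    year_range = "'00-'06" if paper["year"] <= 2006 else "'06-'11"
    return (paper["country"], year_range)


def author_partition(author):
    """Partition key: age group first, then department (assumed bounds)."""
    age_group = "<40" if author["age"] < 40 else ">40"
    return (age_group, author["dept"])


def connectivity(rows):
    """Count relationship rows between each pair of partitions."""
    counts = Counter()
    for paper_id, author_id in rows:
        key = (paper_partition(papers[paper_id]),
               author_partition(authors[author_id]))
        counts[key] += 1
    return counts


for (paper_part, author_part), count in connectivity(paper_author).items():
    print(paper_part, "<->", author_part, ":", count)
```

The order of the attributes in each key fixes the order of the tree levels, which is one of the initial parameters the analyst has to choose.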

R-Mine Prototype

• Based on the GMine System
• Test platform with minimalistic design
• SuperNode tree:
  • node-link, radial layout, partial focus
• SuperEdge graphs:
  • node-link, bipartite layout, edge filtering
• Leaf SuperNode graphs: typical node-link

3. Experiments

Tycho USP database

• Data from several USP systems
  • Personnel, Supervisions, Publications, Events…
• Using 5 entities and 5 relationships
  • 350k events
  • 380k examinations
  • 691k publications
  • 50k people
  • 26k supervisions
• 1.5 million nodes total
• 1.8 million edges (relationships)

Q1: active authors

• Which group of People (by age) has the largest number of recent publications?
• SQL:
    SELECT a.age, count(*) num
    FROM PersonPublication x
    JOIN Publication p ON p.id = x.publication AND p.year >= 2008
    JOIN Person a ON a.id = x.author
    GROUP BY a.age
    ORDER BY num DESC
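To make the Q1 query concrete, it can be run against a tiny in-memory database. The experiments in the talk use PostgreSQL; SQLite is used here only as a convenient stand-in. The table and column names come from the slide's query, but the column types and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema guessed from the column names used in the slide's query.
conn.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE Publication (id INTEGER PRIMARY KEY, title TEXT,
                          year INTEGER, country TEXT);
CREATE TABLE PersonPublication (author INTEGER, publication INTEGER);
INSERT INTO Person VALUES (1, 'Ana', 58), (2, 'Bruno', 58), (3, 'Carla', 34);
INSERT INTO Publication VALUES (10, 'P10', 2009, 'BR'),
                               (11, 'P11', 2011, 'US'),
                               (12, 'P12', 2005, 'BR');
INSERT INTO PersonPublication VALUES (1, 10), (2, 10), (2, 11),
                                     (3, 11), (3, 12);
""")

# Q1: publications since 2008, counted per author age.
Q1 = """
SELECT a.age, count(*) AS num
FROM PersonPublication x
JOIN Publication p ON p.id = x.publication AND p.year >= 2008
JOIN Person a ON a.id = x.author
GROUP BY a.age
ORDER BY num DESC
"""

rows = conn.execute(Q1).fetchall()
for age, num in rows:
    print(age, num)
```

The first row of the result is the age group with the most recent publications, which is the group that Q1.b and Q2 then drill into.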

Q1.b: active authors

• Who are they?
• SQL:
    SELECT a.name, p.title
    FROM PersonPublication x
    JOIN Publication p ON p.id = x.publication AND p.year >= 2008
    JOIN Person a ON a.id = x.author
    WHERE a.age IN (
      SELECT age FROM (
        SELECT a.age age, count(*) num
        FROM PersonPublication x
        JOIN Publication p ON p.id = x.publication AND p.year >= 2008
        JOIN Person a ON a.id = x.author
        GROUP BY a.age
        ORDER BY num DESC
      ) T
    )

Q2: favorite countries

• Which country receives the largest number of recent publications from this group of people?
• SQL:
    SELECT a.name, p.title
    FROM PersonPublication x
    JOIN Publication p ON p.id = x.publication AND p.year >= 2008
    JOIN Person a ON a.id = x.author AND a.age BETWEEN 56 AND 63
    WHERE p.country IN (
      SELECT country FROM (
        SELECT p.country country, count(*) num
        FROM PersonPublication x
        JOIN Publication p ON p.id = x.publication AND p.year >= 2008
        JOIN Person a ON a.id = x.author AND a.age BETWEEN 56 AND 63
        GROUP BY p.country
        ORDER BY num DESC
      ) T
    )

Q3: active authors per country

• Now in one specific country, which group of People is the most active recently?
• SQL:
    SELECT a.name, p.title
    FROM PersonPublication x
    JOIN Publication p ON p.id = x.publication AND p.year >= 2008 AND p.country = 'Estados Unidos'
    JOIN Person a ON a.id = x.author
    WHERE a.age IN (
      SELECT age FROM (
        SELECT a.age age, count(*) num
        FROM PersonPublication x
        JOIN Publication p ON p.id = x.publication AND p.year >= 2008 AND p.country = 'Estados Unidos'
        JOIN Person a ON a.id = x.author
        GROUP BY a.age
        ORDER BY num DESC
      ) T
    )

Performance: individual queries

150 analytical questions: PostgreSQL × R-Mine

Performance: accumulated time

150 analytical questions: PostgreSQL × R-Mine

Performance: loading time

SuperNode           Load (s)   Connectivity to all siblings (s)   SQL (s)
(initial loading)   6.032      -                                  -
Person              0.057      5.847                              7.349
Event               0.271      5.276                              26.716
Publication         0.160      4.484                              27.677
Total               6.520      15.607                             61.742
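The totals in the loading-time table can be re-derived from its rows. The short check below also computes one possible summary ratio, SQL time over R-Mine load plus connectivity time. Adding the two R-Mine columns together is an assumption about how the columns should be compared, not something stated on the slide.

```python
# Rows copied from the table: (load, connectivity to all siblings, SQL),
# all in seconds.
timings = {
    "Person": (0.057, 5.847, 7.349),
    "Event": (0.271, 5.276, 26.716),
    "Publication": (0.160, 4.484, 27.677),
}
initial_loading = 6.032

load_total = initial_loading + sum(t[0] for t in timings.values())
connectivity_total = sum(t[1] for t in timings.values())
sql_total = sum(t[2] for t in timings.values())

print(round(load_total, 3), round(connectivity_total, 3), round(sql_total, 3))

# Assumed comparison: total SQL time versus total R-Mine time.
ratio = sql_total / (load_total + connectivity_total)
print(f"SQL / R-Mine total time: {ratio:.2f}")
```

Under that assumption the SQL total is roughly 2.8 times the combined R-Mine total for these three SuperNodes.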

4. Conclusions

Our approach

• Can use the Relational information
  • To guide the partitioning
  • To give an initial context to the analyst
• Faster than running SQL queries
• Makes neighborhood exploration easy
• Interactive Visualization environment

Considerations

• Initial parameters
  • Which entities, relationships and attributes?
  • In which order?
  • How to define partitions? Ranges?
  • How many partitions?
• Different interaction tasks
• Ongoing usability evaluation

Thanks