
Upload: danghanh

Post on 22-May-2018

219 views

Category:

Documents


0 download

TRANSCRIPT

Source: web.engr.illinois.edu/~hanj/slides/sigmod11_pzhao_slides.pdf

Graph Cube: On Warehousing and OLAP Multidimensional Networks

Peixiang Zhao†, Xiaolei Li‡, Dong Xin§, Jiawei Han†

†Department of Computer Science, UIUC

‡Groupon Inc.

§Google Corporation


June 16th, 2011

SIGMOD 2011 Athens, Greece 1 / 24


Outline

1 Introduction

2 The Graph Cube Model

3 OLAP on Graph Cube

   Cuboid Query

   Crossboid Query

4 Implementing Graph Cube

5 Experiment

6 Conclusion


Introduction

Recent years have seen an astounding growth of networks in a wide spectrum of application domains:

Communication networks

Social networks

Biological networks

The Web

Multidimensional networks:

1 An underlying graph structure comprising entities and relationships

2 Multidimensional attributes specified for and associated with the entities of the network

There exist considerable technology gaps in managing, querying, and summarizing multidimensional networks effectively


A Sample Multidimensional Network

(a) Graph (figure not reproduced: the network structure over vertices 1 to 10)

ID Gender Location Profession Income
1 Male CA Teacher $70,000
2 Female WA Teacher $65,000
3 Female CA Engineer $80,000
4 Female NY Teacher $90,000
5 Male IL Lawyer $80,000
6 Female WA Teacher $90,000
7 Male NY Lawyer $100,000
8 Male IL Engineer $75,000
9 Female CA Lawyer $120,000
10 Male IL Engineer $95,000

(b) Vertex Attribute Table

Figure: A Multidimensional Network Comprising a Graph Structure and a Multidimensional Vertex Attribute Table


Introduction

Motivation: Can we extend decision-support facilities to multidimensional networks?

Data warehouses and OLAP are advantageous in the multidimensional network scenario

Summarizing massive networks at different levels of granularity for more effective analysis and exploration

Business intelligence: on Facebook and Twitter, advertisers and marketers exploit social networks within different multidimensional spaces to better promote their products via social targeting or viral marketing

However, in multidimensional networks, much of the value and interest lies in the network itself!

Simple group-bys over numeric values, as in traditional data warehouses, offer limited insight here, because they ignore the structural information of the network entirely


Network Aggregation vs. Traditional Group-by

(a) Aggregate Network (figure not reproduced: aggregate vertices Male and Female)

Gender COUNT(*)
Male 5
Female 5

(b) Aggregate Table

Figure: Multidimensional Network Aggregation vs. Traditional RDB Aggregation (Group by Gender)

(a) Aggregate Network (figure not reproduced: aggregate vertices (Female, CA), (Male, IL), (Male, CA), (Female, WA), (Female, NY), (Male, NY))

Gender Location COUNT(*)
Male CA 1
Female CA 2
Female WA 2
Male IL 3
Male NY 1
Female NY 1

(b) Aggregate Table

Figure: Multidimensional Network Aggregation vs. Traditional RDB Aggregation (Group by Gender and Location)
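The contrast above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the vertex attributes come from the sample table, but the edge list `EDGES` is hypothetical (the sample graph's edges are not recoverable from this transcript), and the helper names `group_key` and `aggregate` are made up. Vertices sharing the same values on the grouping dimensions are coalesced into one aggregate vertex whose weight is the group size (exactly the COUNT(*) a group-by returns); each original edge then adds 1 to the aggregate edge between its endpoints' groups, which is the structure a plain group-by discards.

```python
from collections import Counter

# Vertex attribute table of the sample network: ID -> (Gender, Location).
ATTRS = {
    1: ("Male", "CA"), 2: ("Female", "WA"), 3: ("Female", "CA"),
    4: ("Female", "NY"), 5: ("Male", "IL"), 6: ("Female", "WA"),
    7: ("Male", "NY"), 8: ("Male", "IL"), 9: ("Female", "CA"),
    10: ("Male", "IL"),
}
DIM_INDEX = {"Gender": 0, "Location": 1}

# HYPOTHETICAL undirected edges, for illustration only.
EDGES = [(1, 2), (1, 3), (2, 6), (3, 9), (4, 7), (5, 8), (5, 10), (8, 10), (6, 9)]


def group_key(v, dims):
    """Project vertex v's attribute tuple onto the grouping dimensions."""
    return tuple(ATTRS[v][DIM_INDEX[d]] for d in dims)


def aggregate(dims):
    """Return (vertex weights, edge weights) of the aggregate network for dims."""
    # Vertex coalescence: on its own, this is what GROUP BY dims with COUNT(*) returns.
    vertex_w = Counter(group_key(v, dims) for v in ATTRS)
    # Structure summarization: each edge adds 1 between its endpoints' groups.
    edge_w = Counter()
    for u, v in EDGES:
        a, b = sorted((group_key(u, dims), group_key(v, dims)))  # undirected key
        edge_w[(a, b)] += 1
    return vertex_w, edge_w


if __name__ == "__main__":
    vertices, edges = aggregate(["Gender"])
    print(vertices)  # the traditional group-by result
    print(edges)     # the structure a plain group-by discards
```

The vertex weights for `["Gender"]` and `["Gender", "Location"]` reproduce the two aggregate tables above; only the edge weights depend on the made-up edge list.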


Introduction

Graph Cube

A multidimensional network can be summarized into aggregate networks at coarser levels of granularity within different multidimensional spaces

Vertex coalescence

Structure summarization

Different query models and OLAP solutions are proposed for multidimensional networks

Cuboid Queries

Crossboid Queries

Efficient implementation is based on a combination of

Well-studied data cube implementation techniques

Special characteristics of multidimensional networks

The first work to systematically address warehousing and OLAP issues on large multidimensional networks


The Graph Cube Model

Multidimensional Network

A multidimensional network, $N$, is a graph denoted as $N = (V, E, A)$, where $V$ is a set of vertices, $E \subseteq V \times V$ is a set of edges, and $A = \{A_1, A_2, \ldots, A_n\}$ is a set of $n$ vertex-specific attributes; that is, every vertex $u \in V$ has a tuple $A(u) = (A_1(u), A_2(u), \ldots, A_n(u))$, where $A_i(u)$ is the value of $u$ on the $i$-th attribute, $1 \le i \le n$. $A$ is called the set of dimensions of the network $N$.

Some (or all) dimensions $A_i$ may take the value $*$ (ALL), representing a super-aggregation along $A_i$

Given a set of $n$ dimensions of a network, there exist $2^n$ multidimensional spaces (aggregations)

The measure within each possible space is no longer a simple numeric value, but an aggregate network
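The $2^n$ count follows from each dimension being either kept as a grouping dimension or replaced by $*$. A minimal Python sketch, assuming a cuboid is written as a tuple with one slot per dimension (the dimension name when it is grouped on, `"*"` when it is aggregated away); the function name `cuboids` is hypothetical:

```python
from itertools import combinations

ALL = "*"  # the ALL value: the dimension is aggregated away


def cuboids(dimensions):
    """Yield every aggregation of the dimensions, from the apex to the base.

    Each cuboid is a tuple with one entry per dimension: the dimension name if
    it is a grouping dimension, or ALL if it is aggregated away.
    """
    n = len(dimensions)
    for k in range(n + 1):  # k = number of grouping dimensions (lattice level)
        for kept in combinations(range(n), k):
            yield tuple(d if i in kept else ALL for i, d in enumerate(dimensions))


if __name__ == "__main__":
    for c in cuboids(["Gender", "Location", "Profession"]):
        print(c)
```

For the sample network's three dimensions this yields 8 cuboids, from `("*", "*", "*")` (the apex) to `("Gender", "Location", "Profession")` (the base); with seven dimensions, as in the IMDB data set, it would yield $2^7 = 128$.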


The Graph Cube Model

Graph Cube

Given a multidimensional network $N = (V, E, A)$, the graph cube is obtained by restructuring $N$ in all possible aggregations of $A$. For each possible aggregation $A'$ of $A$, the grouping measure is an aggregate network $G'$ w.r.t. $A'$.

Figure: The Graph Cube Lattice (figure not reproduced: Apex at the top; (Gender), (Location), (Profession) on the first level; (Gender, Location), (Gender, Profession), (Location, Profession) on the second level; Base at the bottom)
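In the lattice, a coarser cuboid can be derived from any finer cuboid that groups on a superset of its dimensions, instead of from the raw network, as long as the vertex and edge weights are distributive counts. The sketch below assumes exactly that (count weights, undirected edges); the function `roll_up` is hypothetical, the (Gender, Location) vertex counts come from the sample aggregate table, and the edge counts in `GL_EDGES` are made up. This is the standard data cube roll-up observation, not necessarily the procedure the Graph Cube implementation uses.

```python
from collections import Counter


def roll_up(vertex_w, edge_w, keep):
    """Derive a coarser aggregate network from a finer one.

    vertex_w maps group keys (tuples of dimension values) to vertex counts;
    edge_w maps (key_a, key_b) pairs to edge counts. `keep` lists the key
    positions that survive. Counts are distributive, so coarser weights are sums.
    """
    def project(key):
        return tuple(key[i] for i in keep)

    coarse_v = Counter()
    for key, w in vertex_w.items():
        coarse_v[project(key)] += w
    coarse_e = Counter()
    for (a, b), w in edge_w.items():
        pa, pb = sorted((project(a), project(b)))  # canonical undirected key
        coarse_e[(pa, pb)] += w
    return coarse_v, coarse_e


# (Gender, Location) vertex counts from the sample aggregate table.
GL_VERTICES = Counter({
    ("Male", "CA"): 1, ("Female", "CA"): 2, ("Female", "WA"): 2,
    ("Male", "IL"): 3, ("Male", "NY"): 1, ("Female", "NY"): 1,
})
# HYPOTHETICAL (Gender, Location) edge counts, for illustration only.
GL_EDGES = Counter({
    (("Female", "CA"), ("Male", "CA")): 2,
    (("Female", "CA"), ("Female", "WA")): 1,
    (("Male", "IL"), ("Male", "IL")): 3,
    (("Female", "NY"), ("Male", "NY")): 1,
})

if __name__ == "__main__":
    # Roll (Gender, Location) up to (Gender) by keeping key position 0.
    print(roll_up(GL_VERTICES, GL_EDGES, keep=(0,)))
```

Rolling up to `keep=(0,)` recovers the (Gender) vertex counts (5 and 5), and `keep=()` collapses everything into the apex.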


OLAP on Graph Cubes

Cuboid Query: returns as output the aggregate network corresponding to a specific aggregation of the dimensions of the multidimensional network

What is the network structure between various genders?

What is the network structure between the various gender and location combinations?

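(Figures not reproduced: the (Gender) aggregate network over the vertices Male and Female, and the (Gender, Location) aggregate network over the vertices (Female, CA), (Male, IL), (Male, CA), (Female, WA), (Female, NY), (Male, NY))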


OLAP on Graph Cubes

A cuboid query stays within a single multidimensional space, following the traditional OLAP model

A crossboid query crosses multiple multidimensional spaces of the network, i.e., more than one cuboid is involved in a query

What is the network structure between the user with ID = 3 and various locations?

What is the network structure between users grouped by gender vs. users grouped by location?

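(Figures not reproduced: the user with ID 3 linked to the location vertices CA, IL, NY, WA; and the gender vertices Male and Female linked to the location vertices CA, IL, WA, NY)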
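A crossboid query can be read as a bipartite aggregate network: the left side holds the groups of one cuboid, the right side the groups of another, and each edge of the original network links its endpoints' groups across the two sides. The sketch below is one plausible reading under stated assumptions: an undirected graph (so each edge counts once in each direction), the sample table's attributes, a hypothetical edge list `EDGES`, and hypothetical helper names `key` and `crossboid`; grouping on `"ID"` stands for the base cuboid, where every user is its own group.

```python
from collections import Counter

# Vertex attributes from the sample table: ID -> (Gender, Location).
ATTRS = {
    1: ("Male", "CA"), 2: ("Female", "WA"), 3: ("Female", "CA"),
    4: ("Female", "NY"), 5: ("Male", "IL"), 6: ("Female", "WA"),
    7: ("Male", "NY"), 8: ("Male", "IL"), 9: ("Female", "CA"),
    10: ("Male", "IL"),
}
DIM_INDEX = {"Gender": 0, "Location": 1}
# HYPOTHETICAL undirected edges, for illustration only.
EDGES = [(1, 2), (1, 3), (2, 6), (3, 9), (4, 7), (5, 8), (5, 10), (8, 10), (6, 9)]


def key(v, dims):
    """Group key of vertex v in the cuboid grouping on dims ("ID" = the vertex itself)."""
    return tuple(v if d == "ID" else ATTRS[v][DIM_INDEX[d]] for d in dims)


def crossboid(dims1, dims2):
    """Bipartite aggregate network between the groups of two cuboids.

    Assumes an undirected graph, so an edge (u, v) contributes one unit from u's
    group in the first cuboid to v's group in the second, and one unit the other way.
    """
    weights = Counter()
    for u, v in EDGES:
        weights[(key(u, dims1), key(v, dims2))] += 1
        weights[(key(v, dims1), key(u, dims2))] += 1
    return weights


if __name__ == "__main__":
    # "Between the user with ID = 3 and various locations": keep only ID 3's edges.
    id_vs_location = crossboid(["ID"], ["Location"])
    print({loc: w for (uid, loc), w in id_vs_location.items() if uid == (3,)})
    # "Between users grouped by gender vs. users grouped by location".
    print(crossboid(["Gender"], ["Location"]))
```

Unlike a cuboid query, neither side alone determines the result: the two group keys come from different cuboids of the lattice, which is what lets the query straddle them.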


Cuboid Queries vs. Crossboid Queries

(a) Traditional Cuboid Queries (figure not reproduced: each query is answered within a single cuboid of the lattice, e.g., Apex, (Gender), (Location), (Profession), (Gender, Location), (Gender, Profession), (Gender, Location, Profession))

(b) Crossboid Queries Straddling Multiple Cuboids (figure not reproduced: "What is the network structure between users and the locations?" and "What is the network structure between users grouped by gender and users grouped by location?", each spanning two of the cuboids (Gender), (Location), and (Gender, Location, Profession))


Graph Cube Implementation

Objective: compute the aggregate networks of the different cuboids, grouping on all possible dimension combinations of a multidimensional network

1 Full materialization: best query response time, worst space cost

2 No materialization: best space cost, worst query response time

3 Partial materialization: a small portion of the cuboids is materialized in order to balance the tradeoff between query response time and cube resource requirements
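Under partial materialization, a query on a cuboid that is not stored must be answered from some stored cuboid that groups on at least the same dimensions. A minimal sketch, assuming cuboids are frozensets of grouping dimensions, the base cuboid is always stored, and answering from a cuboid costs roughly its size (a common data cube simplification, not a cost model taken from the slides); `best_source` and the sizes in `SIZE` are hypothetical:

```python
def best_source(query, materialized, size):
    """Pick the cheapest materialized cuboid that can answer `query`.

    Cuboids are frozensets of grouping dimensions. A cuboid m can answer the
    query when query <= m (m groups on at least the query's dimensions); the
    cost of rolling up from m is modeled as size[m].
    """
    candidates = [m for m in materialized if query <= m]
    if not candidates:
        raise ValueError("no materialized cuboid can answer the query")
    return min(candidates, key=lambda m: size[m])


# HYPOTHETICAL cuboid sizes, for illustration only.
BASE = frozenset({"Gender", "Location", "Profession"})
GL = frozenset({"Gender", "Location"})
GP = frozenset({"Gender", "Profession"})
SIZE = {BASE: 100, GL: 20, GP: 30}

if __name__ == "__main__":
    stored = [BASE, GL]  # partial materialization: base plus one cuboid
    print(sorted(best_source(frozenset({"Gender"}), stored, SIZE)))      # from GL
    print(sorted(best_source(frozenset({"Profession"}), stored, SIZE)))  # from BASE
```

The three strategies differ only in `materialized`: just the base cuboid (no materialization), every cuboid (full materialization), or a selected subset in between.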


Graph Cube Implementation: Partial Materialization

Problem: select a set S of k cuboids in the graph cube for materialization such that the average time taken to evaluate the queries is minimized

The partial materialization problem is NP-complete (by reduction from set cover)

Greedy Algorithm: iteratively select k cuboids, each time choosing the one with the highest size-reduction benefit

Theorem

Let $B_{greedy}$ be the benefit of the k cuboids chosen by the greedy algorithm and let $B_{opt}$ be the benefit of any optimal set of k cuboids. Then $B_{greedy} \ge (1 - 1/e) \times B_{opt}$, and this bound is tight

MinLevel Algorithm: materialize the cuboids $c$ with $\dim(c) = l_0$, where $l_0$ is the level in the cube lattice at which materialization starts
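The greedy strategy can be sketched in the style of the classic greedy view-selection algorithm for data cubes (Harinarayan, Rajaraman, and Ullman), whose guarantee matches the $(1 - 1/e)$ bound above. This is an illustration under assumptions, not the Graph Cube's exact benefit function: the base cuboid is always stored, a cuboid $w$ is answered from its cheapest stored superset, and the benefit of adding $c$ is the total cost reduction over all cuboids $w \subseteq c$. The names `all_cuboids`, `greedy_select`, `DIMS`, and `SIZES` are hypothetical, and the sizes are made up.

```python
from itertools import combinations


def all_cuboids(dims):
    """Every subset of the dimensions, as frozensets (the 2^n cuboids)."""
    return [frozenset(c) for k in range(len(dims) + 1) for c in combinations(dims, k)]


def greedy_select(dims, size, k):
    """Greedily choose up to k cuboids to materialize besides the base cuboid.

    size maps each cuboid (frozenset of grouping dimensions) to its estimated
    size. A cuboid w can be computed from c when w <= c, at cost size[c].
    """
    lattice = all_cuboids(dims)
    chosen = [frozenset(dims)]  # the base cuboid is assumed always available

    def cost(w):
        # Cost of answering w from its cheapest currently chosen superset.
        return min(size[m] for m in chosen if w <= m)

    for _ in range(k):
        best, best_benefit = None, 0
        for c in lattice:
            if c in chosen:
                continue
            # Benefit: total cost reduction for every cuboid answerable from c.
            benefit = sum(max(0, cost(w) - size[c]) for w in lattice if w <= c)
            if benefit > best_benefit:
                best, best_benefit = c, benefit
        if best is None:  # no remaining cuboid reduces any cost
            break
        chosen.append(best)
    return chosen[1:]


# HYPOTHETICAL sizes for dimensions G(ender), L(ocation), P(rofession).
DIMS = ["G", "L", "P"]
SIZES = {
    frozenset(): 1, frozenset({"G"}): 2, frozenset({"L"}): 10, frozenset({"P"}): 5,
    frozenset({"G", "L"}): 50, frozenset({"G", "P"}): 75, frozenset({"L", "P"}): 20,
    frozenset({"G", "L", "P"}): 100,
}

if __name__ == "__main__":
    print([sorted(c) for c in greedy_select(DIMS, SIZES, 2)])
```

With these sizes the first pick is (L, P), whose benefit of 4 × (100 − 20) = 320 is the largest; the second pick is (G), because (L, P) now already covers the apex cheaply.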


Experimental Evaluation

DBLP data set

A co-authorship graph with 28,702 authors as vertices and 66,832 co-author relationships as edges

Three dimensions: name, area, productivity

area: DB, DM, AI, IR

productivity: Excellent, Good, Fair, Poor

IMDB data set

A movie rating network with 116,164 vertices and 5,452,350 edges

Seven dimensions: Title, Year, Length, Budget, Rating, MPAA and Type

MPAA: G, PG, PG-13, R, NC-17, NR

Type: action, animation, comedy, drama, documentary, romance, short


Effectiveness Evaluation

(c) (Area) (figure not reproduced: aggregate network over the area vertices DB, DM, AI, IR)

(d) (Productivity) (figure not reproduced: aggregate network over the productivity vertices Poor, Fair, Good, Excellent)

Figure: Cuboid Queries of the Graph Cube on DBLP Data Set


Effectiveness Evaluation

(a) (Area, Productivity) (figure not reproduced: aggregate network over the 16 (area, productivity) vertices, from (DB, Poor) to (IR, Excellent))

Figure: Cuboid Queries of the Graph Cube on DBLP Data Set


Effectiveness Evaluation

(a) Area ⋈ Productivity (figure not reproduced: bipartite aggregate network between the area vertices DB, DM, AI, IR and the productivity vertices Poor, Fair, Good, Excellent)

Figure: Crossboid Queries of the Graph Cube on DBLP Data Set


Effectiveness Evaluation

(a) Area ⋈ Base ⋈ Productivity for “Hector Garcia-Molina” (figure not reproduced: the author linked to the area vertices DB, DM, AI, IR and to the productivity vertices Poor, Fair, Good, Excellent)

(b) Area ⋈ Base ⋈ Productivity for “Philip S. Yu” (figure not reproduced: same layout)

Figure: Crossboid Queries of the Graph Cube on DBLP Data Set


Efficiency Evaluation

(a) Time vs. # Dimensions (figure not reproduced: runtime in seconds vs. number of dimensions; series: Raw Table, Graph Cube)

(b) Time vs. # Edges (figure not reproduced: runtime in seconds vs. number of edges, in units of 10K; series: Raw Table, Graph Cube)

Figure: Full Materialization of Graph Cube for DBLP Data Set


Efficiency Evaluation

(a) Time vs. # Dimensions (figure not reproduced: runtime in seconds vs. number of dimensions; series: Graph Cube, Raw Table)

(b) Time vs. # Edges (figure not reproduced: runtime in seconds vs. number of edges, in units of 1M; series: Graph Cube, Raw Table)

Figure: Full Materialization of Graph Cube for IMDB Data Set


Efficiency Evaluation

(a) Cuboid Queries (figure not reproduced: runtime in seconds vs. number of materialized cuboids; series: Greedy, MinLevel)

(b) Crossboid Queries (figure not reproduced: runtime in seconds vs. number of materialized cuboids; series: Greedy, MinLevel)

Figure: Average Query Response Time w.r.t. Different Partial Materialization Algorithms


Conclusion

1 This work seeks to enhance decision-support functionality on large multidimensional networks

2 Graph Cube: a new data warehousing model designed specifically for efficient aggregation on multidimensional networks

3 Different query models and OLAP solutions for Graph Cube are proposed and studied

   Crossboid queries break the boundary of the traditional OLAP model by straddling multiple cuboids of the Graph Cube

4 The implementation of Graph Cube is discussed, and the experimental results demonstrate the power and efficacy of Graph Cube as, to the best of our knowledge, the first tool for warehousing and OLAP on large multidimensional networks


Thank you
