Business Intelligence on Complex Graph Data

Dritan Bleco (dritanbleco@aueb.gr), Yannis Kotidis (kotidis@aueb.gr)

Department of Informatics, Athens University of Economics and Business

BEWEB 2012, Berlin

Outline
• Motivation
• Graph Data Model
• Operators on Graph Data
• Querying Graph Records
• Query Rewrites
• Experiments
• Conclusions

Motivational Example
• A Supply Chain Management (SCM) application
• It tracks the different routes that the articles of a customer order follow from the production lines to the consumer's hands
• Multiple warehouses are located between the production lines and the shipping points and can stage the products while the order is being assembled
• RFID readers are used to keep track of the location of the articles
• An order follows one or more paths, so our web supply chain application produces graph-like data

[Figure: an example graph record with nodes A to H, laid out in three layers (Production Lines, Warehouses, Shipping Points). Edges are labeled with their measures; nodes C and E carry internal measures (C:1, E:8).]

Start Node  End Node  Measure
A           B         15
A           C         11
C           E         7
B           E         5
B           D         9
E           E         8
E           D         4
E           F         2
C           G         43
D           H         6
F           G         9
F           H         4
C           C         1

Node  Location
A     Thessaloniki
C     Trikala
B     Lamia
D     Athens
E     Athens
F     Athens
G     Iraklion
H     Kalamata
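One way to hold the example record in memory is a pair of dictionaries, one for edge measures and one for node-internal measures. The sketch below is only an illustration: the names EDGES, NODE_MEASURE, LOCATION and successors are assumptions of this example and do not come from the talk, and the self-edges C-C and E-E of the edge table are stored here as node-internal measures.

```python
# Edge measures of the example record, keyed by (start node, end node).
EDGES = {
    ("A", "B"): 15, ("A", "C"): 11, ("C", "E"): 7, ("B", "E"): 5,
    ("B", "D"): 9, ("E", "D"): 4, ("E", "F"): 2, ("C", "G"): 43,
    ("D", "H"): 6, ("F", "G"): 9, ("F", "H"): 4,
}

# Self-edges (C, C) and (E, E) of the table, kept as node-internal measures.
NODE_MEASURE = {"C": 1, "E": 8}

LOCATION = {
    "A": "Thessaloniki", "B": "Lamia", "C": "Trikala", "D": "Athens",
    "E": "Athens", "F": "Athens", "G": "Iraklion", "H": "Kalamata",
}


def successors(node):
    """Return the nodes reachable from `node` over a single edge."""
    return sorted(v for (u, v) in EDGES if u == node)


# Nodes located in Athens, looked up from the location table.
athens_nodes = sorted(n for n, city in LOCATION.items() if city == "Athens")
```

Keeping the internal measures apart from the edges makes it easy to decide later whether a path end-point contributes its own measure or not.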

[Figure: the example graph record and its node location table.]

Q1: What is the total order completion time?
The longest path between node A and nodes G, H.

Q2: What is the total processing time for parts that are shipped through warehouses located in Athens?
The longest path between node A and nodes G, H, considering only paths that traverse at least one location in Athens.
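Both questions can be answered by brute force on a record this small. The sketch below enumerates every simple path from A to G or H and takes the longest one, first without and then with the Athens restriction. The helper names all_paths and measure are assumptions of this example, as is the choice to count node-internal measures of every node on a path.

```python
EDGES = {
    ("A", "B"): 15, ("A", "C"): 11, ("C", "E"): 7, ("B", "E"): 5,
    ("B", "D"): 9, ("E", "D"): 4, ("E", "F"): 2, ("C", "G"): 43,
    ("D", "H"): 6, ("F", "G"): 9, ("F", "H"): 4,
}
NODE_MEASURE = {"C": 1, "E": 8}   # internal measures (self-edges in the table)
ATHENS = {"D", "E", "F"}          # nodes located in Athens


def all_paths(node, targets, path=()):
    """Yield every simple path (tuple of nodes) from `node` to a target."""
    path = path + (node,)
    if node in targets:
        yield path
    for (u, v) in EDGES:
        if u == node and v not in path:
            yield from all_paths(v, targets, path)


def measure(path):
    """Sum the edge measures and the internal measures along a path."""
    edges = sum(EDGES[(u, v)] for u, v in zip(path, path[1:]))
    return edges + sum(NODE_MEASURE.get(n, 0) for n in path)


paths = list(all_paths("A", {"G", "H"}))
q1 = max(paths, key=measure)                                    # Q1
q2 = max((p for p in paths if ATHENS & set(p)), key=measure)    # Q2
```

With the edge table as transcribed, Q1 evaluates to the path A-C-G with measure 55 and Q2 to A-B-E-F-G with measure 39.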

Aggregate Nodes
[Figure: the example graph with the Athens warehouses D, E and F enclosed in aggregate node U.]
• Aggregate node U coalesces the warehouses located in Athens.
• Set In(U) contains the nodes of U that have at least one incoming edge from a node that does not belong to U: In(U) = {D, E}
• Set Out(U) contains the nodes of U that have at least one outgoing edge towards a node that does not belong to U: Out(U) = {D, F}
• A single node can be abstracted as an aggregate node whose internal structure is not revealed to the query: E = [in(E), out(E)]
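The two boundary sets follow directly from the edge list. A minimal sketch, assuming the record is stored as a dictionary of edges (the function names in_nodes and out_nodes are hypothetical):

```python
EDGES = {
    ("A", "B"): 15, ("A", "C"): 11, ("C", "E"): 7, ("B", "E"): 5,
    ("B", "D"): 9, ("E", "D"): 4, ("E", "F"): 2, ("C", "G"): 43,
    ("D", "H"): 6, ("F", "G"): 9, ("F", "H"): 4,
}


def in_nodes(group):
    """Nodes of `group` with an incoming edge from outside the group."""
    return {v for (u, v) in EDGES if v in group and u not in group}


def out_nodes(group):
    """Nodes of `group` with an outgoing edge to a node outside the group."""
    return {u for (u, v) in EDGES if u in group and v not in group}


U = {"D", "E", "F"}   # warehouses located in Athens
in_u = in_nodes(U)    # {"D", "E"}
out_u = out_nodes(U)  # {"D", "F"}
```

The same functions cover the single-node case: for the group {"E"} both sets contain only E.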

Path
Different simple paths:
• (ACE): starts from out(A) and ends at in(E) (the internal measure 8 of E is not included)
• (ACE]: starts from out(A) and ends at out(E) (the internal measure 8 of E is included)

Node
• A node is itself a path: starting from in(E) and ending at out(E), [in(E), out(E)] = E, the E-path [EE] with measure E:8
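The bracket notation can be read as a rule for which internal measures a path picks up. The sketch below assumes that reading: a round bracket drops the internal measure of that end-point, a square bracket keeps it, and interior nodes always contribute. The function name path_measure and its boolean flags are assumptions of this example.

```python
EDGES = {
    ("A", "B"): 15, ("A", "C"): 11, ("C", "E"): 7, ("B", "E"): 5,
    ("B", "D"): 9, ("E", "D"): 4, ("E", "F"): 2, ("C", "G"): 43,
    ("D", "H"): 6, ("F", "G"): 9, ("F", "H"): 4,
}
NODE_MEASURE = {"C": 1, "E": 8}   # internal measures of nodes C and E


def path_measure(nodes, closed_start=True, closed_end=True):
    """Total measure of a path; closed end-points add their internal measure."""
    if len(nodes) == 1:
        # A node seen as a path from in(n) to out(n), e.g. the E-path [EE].
        both_closed = closed_start and closed_end
        return NODE_MEASURE.get(nodes[0], 0) if both_closed else 0
    total = sum(EDGES[(u, v)] for u, v in zip(nodes, nodes[1:]))
    total += sum(NODE_MEASURE.get(n, 0) for n in nodes[1:-1])
    if closed_start:
        total += NODE_MEASURE.get(nodes[0], 0)
    if closed_end:
        total += NODE_MEASURE.get(nodes[-1], 0)
    return total


open_ace = path_measure(list("ACE"), closed_start=False, closed_end=False)  # (ACE)
half_ace = path_measure(list("ACE"), closed_start=False, closed_end=True)   # (ACE]
e_path = path_measure(["E"])                                                # [EE]
```

Under these assumptions (ACE) evaluates to 19, (ACE] to 27 because it also counts E:8, and the E-path to 8.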

Composite Path [A,E]*
• Composite paths: paths with the same starting and ending node
• [A,E]* = { [ACE], [ABE] }

Composite Path [A, in(U))*
• [A, in(U))* = { [ACE), [ABE), [ABD) }

Composite Path [in(U), out(U)]*
• [in(U), out(U)]* = { [EF], [ED] }
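Composite paths can be expanded into their constituent simple paths with a small search. The sketch below is one possible reading, not the authors' algorithm: every node of a path except the last must satisfy a caller-supplied predicate, which is how paths into U are stopped at the first node of U. The function name simple_paths and the predicate argument are assumptions of this example.

```python
EDGES = {
    ("A", "B"): 15, ("A", "C"): 11, ("C", "E"): 7, ("B", "E"): 5,
    ("B", "D"): 9, ("E", "D"): 4, ("E", "F"): 2, ("C", "G"): 43,
    ("D", "H"): 6, ("F", "G"): 9, ("F", "H"): 4,
}
U = {"D", "E", "F"}
IN_U, OUT_U = {"D", "E"}, {"D", "F"}


def simple_paths(start, targets, node_ok=lambda n: True):
    """Simple paths from `start` that end in `targets`.

    Every node except the last one must satisfy `node_ok`.
    """
    found, stack = [], [[start]]
    while stack:
        path = stack.pop()
        last = path[-1]
        if last in targets:
            found.append("".join(path))
        if node_ok(last):
            for (u, v) in EDGES:
                if u == last and v not in path:
                    stack.append(path + [v])
    return sorted(found)


a_to_e = simple_paths("A", {"E"})                            # [A,E]*
a_to_in_u = simple_paths("A", IN_U, lambda n: n not in U)    # [A, in(U))*
inside_u = sorted(
    p for s in IN_U for p in simple_paths(s, OUT_U, lambda n: n in U)
)                                                            # [in(U), out(U)]*
```

Note that the last expansion also returns the single-node path D, because D belongs to both In(U) and Out(U).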

Operators on Graph Data
• The path-join operator ⋈ concatenates two paths p1 and p2 when:
  1. The ending node of p1 is the same as the starting node of p2
  2. One of the two paths is open-ended at the common end-point
• [ACE) ⋈ [EFG] = [ACEFG]

Operators on Graph Data
[Figure: the example graph with aggregate node U and the node sets Pr and Sr marked.]
• [Pr, in(U)) ⋈ [in(U), out(U)] ⋈ (out(U), Sr]
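The two join conditions translate into two checks. The sketch below assumes that "one of the two paths is open-ended" means exactly one, so that the internal measure of the shared node is counted once; the Path type and the path_join function are hypothetical names introduced for this example.

```python
from typing import NamedTuple


class Path(NamedTuple):
    """A simple path; the flags say whether each end-point is closed."""
    nodes: str
    closed_start: bool = True
    closed_end: bool = True

    def __str__(self):
        left = "[" if self.closed_start else "("
        right = "]" if self.closed_end else ")"
        return left + self.nodes + right


def path_join(p1, p2):
    """Concatenate p1 and p2 at their common end-point."""
    if p1.nodes[-1] != p2.nodes[0]:
        raise ValueError("p1 must end where p2 starts")
    if p1.closed_end == p2.closed_start:
        # Assumption: exactly one side is open-ended at the shared node.
        raise ValueError("exactly one path must be open-ended at the join")
    return Path(p1.nodes + p2.nodes[1:], p1.closed_start, p2.closed_end)


joined = path_join(Path("ACE", closed_end=False), Path("EFG"))
```

Here str(joined) gives "[ACEFG]", while joining [ACE] with [EFG] is rejected because both paths are closed at E.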

Operators on Graph Data
• Path projection operator πp(r): projects the record r on the edges defined in path p, while retaining their measures.
  π[ACE)(r) = { (A,C):11, (C,C):1, (C,E):7 }
• The projection of a record on a composite path is computed as a set containing the projections onto the constituent paths.
  π[AE)*(r) = { {(A,C):11, (C,C):1, (C,E):7}, {(A,B):15, (B,E):5} }
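A projection can be sketched as a dictionary lookup over the edges of the path, with node-internal measures reported as self-edges, matching the (C,C):1 entry in the example. The function name project and its boolean flags for open or closed end-points are assumptions of this sketch.

```python
EDGES = {
    ("A", "B"): 15, ("A", "C"): 11, ("C", "E"): 7, ("B", "E"): 5,
    ("B", "D"): 9, ("E", "D"): 4, ("E", "F"): 2, ("C", "G"): 43,
    ("D", "H"): 6, ("F", "G"): 9, ("F", "H"): 4,
}
NODE_MEASURE = {"C": 1, "E": 8}   # reported as self-edges (C,C) and (E,E)


def project(nodes, closed_start=True, closed_end=True):
    """Project the record on a path, keeping the measures of its edges."""
    result = {}
    last = len(nodes) - 1
    for i, n in enumerate(nodes):
        start_ok = i > 0 or closed_start    # open start drops the first node
        end_ok = i < last or closed_end     # open end drops the last node
        if start_ok and end_ok and n in NODE_MEASURE:
            result[(n, n)] = NODE_MEASURE[n]
        if i < last:
            result[(n, nodes[i + 1])] = EDGES[(n, nodes[i + 1])]
    return result


# Projection on the simple path [ACE).
single = project("ACE", closed_end=False)

# Projection on the composite path [AE)*: one projection per constituent path.
composite = [project(p, closed_end=False) for p in ("ACE", "ABE")]
```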

BI on Graph Data
• Intra-path aggregate function Fp(r): applied on the measures resulting from the projection of record r on path p.
  Sum[ACE)(r) = [ACE):19
  Sum[AE)*(r) = { [ACE):19, [ABE):20 }
• Inter-path aggregate function G(Fp(r)): consolidates the result(s) obtained via intra-path aggregation.
  Max(Sum[AE)*(r)) = Max({ [ACE):19, [ABE):20 }) = [ABE):20
• Max(Sum[Pr, Sr]*(r)) returns the order completion time for the order depicted in record r.
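The two aggregation levels can be reproduced in a few lines, starting from the projections of the record on [ACE) and [ABE). This is an illustrative sketch; the variable names are assumptions of the example.

```python
# Projections of the record on the two constituent paths of [AE)*.
projections = {
    "[ACE)": {("A", "C"): 11, ("C", "C"): 1, ("C", "E"): 7},
    "[ABE)": {("A", "B"): 15, ("B", "E"): 5},
}

# Intra-path aggregation: Sum is applied to the measures of each path.
intra = {path: sum(measures.values()) for path, measures in projections.items()}

# Inter-path aggregation: Max consolidates the per-path results.
best_path, best_value = max(intra.items(), key=lambda item: item[1])
```

The intermediate result is {"[ACE)": 19, "[ABE)": 20} and the final answer is [ABE) with value 20.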

Queries using operators
[Figure: the example graph with aggregate node U and the node sets Pr and Sr marked.]

Query Rewrite
[Figure: the example graph split at aggregate node U into the parts between Pr and in(U), inside U, and between out(U) and Sr.]

MAX( SUM[Pr, in(U)) ⋈ [in(U), out(U)] ⋈ (out(U), Sr] (r) ) =
MAX( SUM[Pr, in(U))(r) ⋈_SUM SUM[in(U), out(U)](r) ⋈_SUM SUM(out(U), Sr](r) )

Generally, G(F p1 ⋈ p2 (r)) = G( Fp1(r) ⋈_H Fp2(r) ): pushing the intra-path aggregation through a path-join

Query Rewrite
[Figure: the example graph split at aggregate node U, with the partial sums of each part.]

MAX( SUM[Pr, in(U)) ⋈ [in(U), out(U)] ⋈ (out(U), Sr] (r) ) =
MAX( SUM[Pr, in(U))(r) ⋈_SUM SUM[in(U), out(U)](r) ⋈_SUM SUM(out(U), Sr](r) ) =
MAX( { [ABE):20, [ACE):19, [ABD):24 } ⋈_SUM { [EF]:10, [ED]:12, [DD]:0 } ⋈_SUM { (FG]:9, (FH]:4, (DH]:6 } ) =
MAX( { [ABEFG]:39, [ACEFG]:38, [ABEFH]:34, [ACEFH]:33, [ABEDH]:38, [ACEDH]:37, [ABDH]:30 } ) = [ABEFG]:39
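The rewritten query can be checked by joining the three sets of partial sums on their common end-points and adding the values. The sketch below takes the partial sums from the slide as given; the helper name join_sum is an assumption, and the slide's [DD]:0 entry is written as the single-node path "D".

```python
# Partial sums for the three parts of the rewritten query.
prefix = {"ABE": 20, "ACE": 19, "ABD": 24}   # SUM over [Pr, in(U))
inner = {"EF": 10, "ED": 12, "D": 0}         # SUM over [in(U), out(U)]
suffix = {"FG": 9, "FH": 4, "DH": 6}         # SUM over (out(U), Sr]


def join_sum(left, right):
    """Join partial results that share an end-point, adding their sums."""
    return {
        l_path + r_path[1:]: l_sum + r_sum
        for l_path, l_sum in left.items()
        for r_path, r_sum in right.items()
        if l_path[-1] == r_path[0]
    }


full = join_sum(join_sum(prefix, inner), suffix)
best = max(full, key=full.get)   # path with the largest total
```

The join produces the seven full paths listed above, and the maximum is ABEFG with a total of 39.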

Experiments (I)
• Two real schema graphs:
  1. BAY*: depicts San Francisco Bay Area roads
  2. Gnutella**: describes connections among Gnutella hosts from August 2002
• 120 million records are synthesized, with random real values assigned to the labels of each record.
• Experimental evaluation using PBS (Pick By Size)
• Query workload: 50% intra-path and 50% inter-path queries, chosen with a Zipf or uniform distribution
• Cost is evaluated independently, as the total number of tuples that need to be retrieved

* http://www.dis.uniroma1.it/~challenge9/download.shtml
** http://snap.stanford.edu/data/p2p-Gnutella05.html

Experiments (II)
[Figure panels: "PBS, Bay Data Set, Uniform 100 Queries" and "PBS, Bay Data Set, Zipf 100 Queries"]
• PBS-1 considers only intra-path materialized aggregates
• PBS-2 considers only inter-path materialized aggregates
• PBS selects and materializes both types of views depending on the query workload
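For readers unfamiliar with Pick By Size, the sketch below shows the selection rule as it is commonly described: candidate views are taken in increasing order of size until the space budget is exhausted. It is not the authors' implementation, and the view names, sizes and budget are hypothetical values made up for the illustration.

```python
def pick_by_size(candidates, budget):
    """Pick candidate views smallest-first while they fit in the budget."""
    chosen, used = [], 0
    for name, size in sorted(candidates.items(), key=lambda item: item[1]):
        if used + size > budget:
            break  # every remaining candidate is at least as large
        chosen.append(name)
        used += size
    return chosen


# Hypothetical candidate views with made-up sizes (in tuples).
views = {"intra_v1": 40, "inter_v2": 10, "intra_v3": 25, "inter_v4": 30}
selected = pick_by_size(views, budget=70)
```

Restricting the candidate set to intra-path or inter-path views only would correspond to the PBS-1 and PBS-2 variants compared in the experiments.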

Experiments (III)
[Figure panels: "PBS, Gnutella Data Set, Uniform 100 Queries" and "PBS, Gnutella Data Set, Zipf 100 Queries"]
• PBS-1 considers only intra-path materialized aggregates
• PBS-2 considers only inter-path materialized aggregates
• PBS selects and materializes both types of views depending on the query workload

Experiments (IV)
[Figure panels: "Varying Query Mix, BAY Data Set, Uniform Queries" and "Varying Query Mix, BAY Data Set, Zipf Queries"]
• Mix of intra-/inter-path queries on the BAY data set for a fixed budget of 20%.
• For inter-path queries, PBS and PBS-2 have the same performance.
• For intra-path-only queries, PBS-1 and PBS give the best performance.
• PBS, which considers both types of views, consistently provides the largest reduction in query cost.

Conclusions
• A framework for modeling analytical queries in a graph database, independent of
  • the underlying storage representation of the records
  • the query language used
• Our framework
  • permits rewriting of complex aggregations into smaller computational units
  • enables cost-based query optimization and pre-computation of frequently used calculations
• Experimental results show that proper selection of materialized views can provide substantial gains in a large data warehouse containing millions of graph records.


Thank you,

Questions?
