
TOWARDS DATA ANALYTICS ON ATTRIBUTED GRAPHS
NGS QE Oral Presentation

Student: Qi Fan
Supervisor: Prof. Kian-lee Tan


Outline

• Attributed Graph Analytics

• Graph Window Query

• Graph Window Query Processing

• Experiments

• Future Work

Graph Analytic Window Query Query Processing Experiments Future Work


Data Analytics

• Data analytics plays an important part in business [1]:
  • Web analytics for advertising and recommendation
  • Customer analytics for market optimization
  • Portfolio analytics for risk control

• Analytics on data yields:
  • Data products
  • Data-driven decision support
  • Insights into the data model

[1] Analytics Examples: http://en.wikipedia.org/wiki/Analytics


Relational Data Analytics

• Table as the data representation, SQL as the query language

• Analytic SQL:
  • Ranking
  • Windowing
  • LAG/LEAD
  • FIRST/LAST
  • SKYLINE
  • TOP-K
  • …


Emergence of Large Linked Data

• In the real world, linked data is emerging everywhere:
  • Facebook, LinkedIn, biological networks, phone call networks, Twitter, etc.

• Modeling linked data relationally and querying it with SQL is inefficient:
  • Graph queries are often traversal-based
  • SQL-based traversal is 100 times slower than adjacency-list-based traversal [1]

• The graph model is a better fit for linked data!

[1] http://java.dzone.com/articles/mysql-vs-neo4j-large-scale


Graph Data Model

G = (V, E, A): an attributed graph with vertices V, edges E and attributes A

[Figure: a graph of vertices and edges, next to an attribute table with columns Vertex, Attr1, Attr2, Attr3, …]

Attributed graph = graph structure + attribute dimensions


Graph Data Model

• Graph data:
  • Vertex – entities, e.g. user, webpage, molecule, etc.
  • Edge – relationships, e.g. follow, cite, depends-on, friends-of, etc.
  • Attribute – profile information for a vertex or edge

• The specific model depends on the data:
  • Edge – directed or undirected
  • Attribute – homogeneous or inhomogeneous


Graph Data Model Example

People and follow relationships…

People and friends relationships…

Biomolecules and depends-on relationships…

Attributed graphs model a wealth of information


Graph Data Analytics

• The graph database environment is growing:
  • Neo4j, Titan, SPARQL, Pregel, etc.

• Graph data analytics is becoming popular:
  • Graph summarization [1], Graph OLAP [2], etc.

• In our research, we focus on:
  • Discovering the need for native graph analytical queries
  • Processing graph analytical queries efficiently

[1] Y. Tian, R. A. Hankins, and J. M. Patel, “Efficient aggregation for graph summarization,” in Proceedings of the 2008 ACM SIGMOD

[2] C. Chen, X. Yan, F. Zhu, J. Han, and P. S. Yu, “Graph OLAP: Towards online analytical processing on graphs,” in Data Mining, 2008. ICDM’08


SQL Window Query

• A SQL window query:
  • Partitions a table
  • Sorts each partition
  • Implicitly forms the window of each tuple

[Figure: window of tuple 7]

The window of a tuple contains the other tuples related to it


Graph Window Query

• In a graph, a vertex can likewise have a set of related vertices as its window.

• Aggregating over this window gives a personalized analysis for each vertex.


Graph Window Examples

• These queries focus on the neighborhood of each user, so the neighborhood forms a vertex’s window

Summarize the age distribution of each user’s friends

Summarize the activeness of each user’s friends

Analyze the industry distribution of each user’s potential connections


Graph Window Examples

• These queries focus on the ancestor-descendant relationships of molecules, so a vertex’s ancestors or descendants form its window

Find how many enzymes are in each molecule’s pathway

Find how many molecules are affected by each enzyme in the pathway


Graph Window Queries

• We thus identify two types of graph window:

• K-hop window (k-window):
  • A vertex’s k-hop window contains all the vertices that are its k-hop neighbors

• Topological window (t-window):
  • A vertex’s topological window contains all the vertices that are its ancestors / descendants


Graph Window Queries

• K-hop window:
  • Similar to ego-centric analysis in the network analysis community
  • For an undirected graph:
    • All vertices that can reach a vertex within k hops
  • For a directed graph:
    • In-k-hop: vertices that reach a vertex within k hops
    • Out-k-hop: vertices that are reached by a vertex within k hops
    • K-hop: union of in-k-hop and out-k-hop

• T-window:
  • Requires the graph to be a DAG
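The two window types can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the adjacency-dict representation, the function names, the choice to include a vertex in its own window, the use of ancestors (rather than descendants) for the t-window, and the `EDGES` and `AGE` data are all assumptions made for the example.

```python
from collections import deque

def bounded_bfs(adj, start, k):
    """Return every vertex within k hops of `start` along `adj`, including `start`."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        vertex, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nxt in adj.get(vertex, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen

def reverse(adj):
    """Flip every edge so that in-neighbors can be traversed."""
    rev = {v: set() for v in adj}
    for src, dsts in adj.items():
        for dst in dsts:
            rev.setdefault(dst, set()).add(src)
    return rev

def k_hop_window(adj, vertex, k, direction="both"):
    """direction is 'in', 'out' or 'both' (union of in-k-hop and out-k-hop)."""
    out_part = bounded_bfs(adj, vertex, k) if direction in ("out", "both") else set()
    in_part = bounded_bfs(reverse(adj), vertex, k) if direction in ("in", "both") else set()
    return out_part | in_part

def t_window(adj, vertex):
    """Ancestors of `vertex` in a DAG (plus the vertex itself)."""
    return bounded_bfs(reverse(adj), vertex, float("inf"))

# Hypothetical DAG (source -> targets) and a hypothetical vertex attribute.
EDGES = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": {"e"}, "e": set()}
AGE = {"a": 30, "b": 25, "c": 41, "d": 19, "e": 52}

# A graph window query: one aggregate (here SUM) per vertex over its window.
in_1_hop_sum = {v: sum(AGE[u] for u in k_hop_window(EDGES, v, 1, "in")) for v in EDGES}
```

Swapping `k_hop_window` for `t_window` in the last line turns the same aggregation into a topological window query.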


Graph Window Queries

• Graph window query:
  • INPUT: a specific window (k-hop or topological) and an aggregation function
  • OUTPUT: the aggregated value over each vertex’s window


Related Work

• In [1], the EAGr system was proposed to process neighborhood queries
  • Focuses on 1-hop neighbors
  • Uses iterative planning methods to share aggregation results between different vertices’ windows
  • However, it assumes that a large amount of intermediate data resides in memory, which is not reasonable for k-windows and t-windows

[1] J. Mondal and A. Deshpande, “EAGr: Supporting continuous ego-centric aggregate queries over large dynamic graphs,” SIGMOD, 2015.


Graph Window Query Processing

• Naïve Processing I:
  1. Compute each vertex’s window sequentially
  2. Aggregate each vertex individually

• Advantage:
  • No large intermediate data is generated

• Inefficiencies:
  • Repeated computation of every vertex’s window:
    • k-window is of complexity […] in an arbitrary graph
    • t-window is of complexity […] in an arbitrary graph
  • Slow individual aggregation:
    • Each vertex may have a window size of […]
    • Total aggregation complexity can be […]


Graph Window Query Processing

• Naïve Processing II:
  1. Materialize each vertex’s window
  2. At query time, aggregate each vertex’s window individually

• Advantage:
  • No computation of windows at run time

• Inefficiencies:
  • Materialization is not memory efficient
    • Together, the vertices’ windows can be as large as […]
  • Query processing is still as slow as in Naïve Processing I


Overview of our approach

• Two index schemes:
  • Dense Block Index: for general windows and k-hop windows
  • Parent Index: for topological windows

• The indexes achieve:
  • Complete preservation of the window information for each vertex
  • Space efficiency
  • Efficient run-time query processing


Dense Block Index – Matrix View

• Window matrix:
  • Records the vertex-window mapping
  • Rows represent vertices
  • Columns represent windows

    A B C D E F
  A 1 1 1 1 1 1
  B 1 1 0 1 0 1
  C 1 0 1 1 1 1
  D 1 1 1 1 0 0
  E 1 0 1 0 1 0
  F 1 1 1 0 0 1


Dense Block Index – Matrix View

• Window matrix properties:
  • Boolean matrix
  • Completely keeps the vertex-window information

• Equivalent matrices:
  • Row and column permutations can be applied to a window matrix
  • Invariant: the number of non-zero elements

Original window matrix:

    A B C D E F
  A 1 1 1 1 1 1
  B 1 1 0 1 0 1
  C 1 0 1 1 1 1
  D 1 1 1 1 0 0
  E 1 0 1 0 1 0
  F 1 1 1 0 0 1

Equivalent matrix after row and column permutation:

    A C B E D F
  B 1 0 1 0 1 1
  D 1 1 1 0 1 0
  F 1 1 1 0 0 1
  A 1 1 1 1 1 1
  C 1 1 0 1 1 1
  E 1 1 0 1 0 0


Dense Block Index – Matrix View

• Window-matrix-based aggregation:
  • Similar to Naïve Processing II
  1. Traverse the matrix vertically, one column (window) at a time
  2. Aggregate the cells with value one; ignore the cells with value zero

• Space and query complexity:
  • […] in sparse matrix format
  • […] in matrix format
  • Note that […] can be as large as […]
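The column-wise traversal can be sketched as follows, using the 6×6 window matrix shown on the earlier matrix-view slide. The `ACTIVITY` values and the function name are hypothetical, chosen only to make the example concrete; SUM stands in for whatever aggregate the query asks for.

```python
VERTICES = ["A", "B", "C", "D", "E", "F"]

# Rows are vertices, columns are windows: a 1 in (row, col) means the row
# vertex belongs to the window of the column vertex.
WINDOW_MATRIX = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0, 1],
    [1, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 1, 0, 0, 1],
]

# Hypothetical per-vertex attribute values.
ACTIVITY = {"A": 12, "B": 15, "C": 21, "D": 30, "E": 33, "F": 7}

def aggregate_by_column(matrix, vertices, values):
    """SUM the values of the row vertices whose cell is 1, one column at a time."""
    result = {}
    for col, window_owner in enumerate(vertices):
        total = 0
        for row, member in enumerate(vertices):
            if matrix[row][col]:  # cells with value zero are ignored
                total += values[member]
        result[window_owner] = total
    return result

# Number of non-zero cells, i.e. the additions a sparse layout would perform.
NON_ZERO = sum(sum(row) for row in WINDOW_MATRIX)
SUMS = aggregate_by_column(WINDOW_MATRIX, VERTICES, ACTIVITY)
```

In the dense layout above every cell is visited, while a sparse layout would touch only the `NON_ZERO` cells; in both cases each window is aggregated on its own, with no work shared between windows.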


Dense Block Index

• Dense blocks:
  • Given a matrix, a dense block is a submatrix whose values are all non-zero

• Properties of dense blocks:
  • Space complexity: […] compared to […]
  • Query complexity: […] compared to […]
  • Same asymptotic bounds, so both can be optimized simultaneously

Example: the dense block {A, B} × {A, B, C} in the matrix

    A B C D
  A 1 1 1 0
  B 1 1 1 0
  C 0 0 0 1
  D 1 0 0 1

• Storage: store the row ids and column ids, i.e. (A,B)(A,B,C), rather than 6 elements

• Query: compute A+B first; the result is then shared by the windows (A,B,C)
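The saving in the 4×4 example can be checked directly. The sketch below assumes SUM as the aggregate and uses hypothetical `VALUES`; the helper names are illustrative only.

```python
VERTICES = ["A", "B", "C", "D"]
MATRIX = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
]
BLOCK_ROWS = ["A", "B"]       # vertices inside the block
BLOCK_COLS = ["A", "B", "C"]  # windows that share those vertices

def is_dense_block(matrix, vertices, rows, cols):
    """True when every (row, col) cell of the candidate block is 1."""
    index = {v: i for i, v in enumerate(vertices)}
    return all(matrix[index[r]][index[c]] == 1 for r in rows for c in cols)

def stored_ids(rows, cols):
    """Ids kept when the block is stored as (row ids, column ids)."""
    return len(rows) + len(cols)

def covered_cells(rows, cols):
    """Matrix cells the block stands for."""
    return len(rows) * len(cols)

VALUES = {"A": 10, "B": 20, "C": 30, "D": 40}  # hypothetical attribute values

# The partial aggregate A+B is computed once and reused by every window in the block.
SHARED = sum(VALUES[r] for r in BLOCK_ROWS)
PARTIAL = {c: SHARED for c in BLOCK_COLS}
```

Here 5 ids replace 6 cells and one addition replaces three; the gap widens as blocks grow, since storage is rows + columns while coverage is rows × columns.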


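Dense Block Index

• Dense Block Index:
  • For every window to be computed, index all the dense blocks in the window matrix
  • The index forms a bipartite graph

[Figure: a bipartite graph linking the windows A, B, C, D, E, F to the vertex groups (A,F,D), (B), (A,C), (C,E), (D), (E), (F), built from the window matrix below]

    A C B E D F
  B 1 0 1 0 1 1
  D 1 1 1 0 1 0
  F 1 1 1 0 0 1
  A 1 1 1 1 1 1
  C 1 1 0 1 1 1
  E 1 1 0 1 0 0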


Dense Block Index

• Properties:
  • Preserves every non-zero entry of the window matrix
  • During a query, there is no need to access the original window matrix

• Query processing:
  1. Compute partial aggregates for each dense block
  2. Compute final aggregates for every window


Dense Block Index Query Processing

Summarizing the activeness of each user’s friends:

Compute […] on graph G over the 1-hop window

  A 118
  B 64
  C 103
  D 78
  E 66
  F 55
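The two-step query processing can be sketched on the 6×6 window matrix from the earlier slides. Several things here are assumptions: the aggregate is taken to be SUM, the `ACTIVITY` values are hypothetical (picked so that the six sums equal the per-vertex numbers listed above), and the assignment of vertex groups to windows in `DENSE_BLOCKS` is a hand-built, non-overlapping partition, since the edges of the slide's own bipartite graph did not survive.

```python
# Windows read off the columns of the window matrix (owner -> member vertices).
WINDOWS = {
    "A": {"A", "B", "C", "D", "E", "F"},
    "B": {"A", "B", "D", "F"},
    "C": {"A", "C", "D", "E", "F"},
    "D": {"A", "B", "C", "D"},
    "E": {"A", "C", "E"},
    "F": {"A", "B", "C", "F"},
}

# Assumed index: (vertex group, windows that use the group).
DENSE_BLOCKS = [
    (("A", "D", "F"), ("A", "B", "C")),
    (("B",), ("A", "B", "D", "F")),
    (("A", "C"), ("D", "E", "F")),
    (("C", "E"), ("A", "C")),
    (("D",), ("D",)),
    (("E",), ("E",)),
    (("F",), ("F",)),
]

ACTIVITY = {"A": 12, "B": 15, "C": 21, "D": 30, "E": 33, "F": 7}  # hypothetical

def covers_exactly(windows, blocks):
    """Check that the blocks reproduce every window with no overlap and no gap."""
    for owner, members in windows.items():
        covered = [v for group, owners in blocks if owner in owners for v in group]
        if sorted(covered) != sorted(members):
            return False
    return True

def dbi_sum(blocks, values):
    # Step 1: one partial aggregate per dense block.
    partials = [sum(values[v] for v in group) for group, _ in blocks]
    # Step 2: each window adds up the partials of the blocks it is linked to.
    totals = {}
    for partial, (_, owners) in zip(partials, blocks):
        for owner in owners:
            totals[owner] = totals.get(owner, 0) + partial
    return totals

RESULT = dbi_sum(DENSE_BLOCKS, ACTIVITY)
```

The query never touches `WINDOWS`; it is used only by `covers_exactly` to confirm that the index preserves every non-zero entry. Non-overlap matters for SUM, because a cell covered by two blocks would be counted twice.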


Dense Block Index

• Equivalent matrices may have different optimal partitions

• Find the best dense block partition among all equivalent matrices
  • Fixed-size dense block partitioning is NP-hard [1]
  • Heuristics need to be applied

    A B C D E F
  A 1 1 1 1 1 1
  B 1 1 0 1 0 1
  C 1 0 1 1 1 1
  D 1 1 1 1 0 0
  E 1 0 1 0 1 0
  F 1 1 1 0 0 1

    A C B E D F
  B 1 0 1 0 1 1
  D 1 1 1 0 1 0
  F 1 1 1 0 0 1
  A 1 1 1 1 1 1
  C 1 1 0 1 1 1
  E 1 1 0 1 0 0

[1] V. Vassilevska and A. Pinar, “Finding nonoverlapping dense blocks of a sparse matrix,” Lawrence Berkeley National Laboratory, 2004


MinHash Clustering for DBI

• Heuristic:
  • Cluster similar windows together, then mine the dense blocks in each cluster
  • Clustering + mining

• Clustering:
  • The Jaccard coefficient is used to measure the similarity between windows, since each window is a set of vertices
  • MinHash is an efficient way to perform Jaccard-coefficient-based clustering
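A minimal sketch of MinHash over windows is given below. The hash family (random linear functions modulo a prime), the number of hash functions, the rule "identical signature means same cluster" and all names are assumptions for illustration; the slides do not say how signatures are turned into clusters. The `WINDOWS` are the columns of the window matrix used in the clustering example slide.

```python
import random

PRIME = 2_147_483_647  # 2**31 - 1

def make_hash_funcs(count, seed=0):
    """Random linear hash functions x -> (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(count)]

def minhash_signature(window, hash_funcs, vertex_id):
    """For each hash function, keep the smallest hash value over the window."""
    return tuple(
        min((a * vertex_id[v] + b) % PRIME for v in window) for a, b in hash_funcs
    )

def jaccard(s, t):
    """Exact Jaccard coefficient of two vertex sets."""
    return len(s & t) / len(s | t)

def estimated_jaccard(sig1, sig2):
    """Fraction of signature positions that agree; approximates Jaccard."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)

def cluster_by_signature(windows, hash_funcs, vertex_id):
    """Windows with identical signatures land in the same cluster."""
    clusters = {}
    for owner, window in windows.items():
        sig = minhash_signature(window, hash_funcs, vertex_id)
        clusters.setdefault(sig, []).append(owner)
    return list(clusters.values())

VERTEX_ID = {v: i for i, v in enumerate("ABCDEF")}
WINDOWS = {
    "A": {"B", "D"},
    "B": {"B", "D", "F"},
    "C": {"A", "B", "C", "D", "E", "F"},
    "D": {"A", "B", "C", "D", "E", "F"},
    "E": {"A", "B", "C", "F"},
    "F": {"A", "C", "D", "F"},
}
HASHES = make_hash_funcs(4)
CLUSTERS = cluster_by_signature(WINDOWS, HASHES, VERTEX_ID)
```

Each window costs one pass per hash function, and comparing two windows then means comparing short tuples rather than intersecting vertex sets. Fewer hash functions give coarser clusters, more give finer ones.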


MinHash Clustering for DBI

• Mining:
  1. Build a partial window matrix for each cluster
  2. Condense the rows with identical values
  3. For uncondensed rows, recursively cluster and mine until a stop condition is met
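One pass of the mining step might look like the sketch below. It reflects one reading of the steps above, stated here as an assumption: rows that are identical within a cluster are condensed into a group and emitted as a dense block over the columns where they hold a 1, all-zero rows are dropped, and rows that found no partner are handed back for further clustering. The matrix is the one from the clustering example slide; the function names are illustrative.

```python
VERTICES = ["A", "B", "C", "D", "E", "F"]
WINDOW_MATRIX = [
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1],
]

def partial_matrix(matrix, vertices, cluster_cols):
    """Restrict every row of the window matrix to the cluster's columns."""
    col = {v: i for i, v in enumerate(vertices)}
    return {
        v: tuple(matrix[r][col[c]] for c in cluster_cols)
        for r, v in enumerate(vertices)
    }

def condense(partial):
    """Group vertices whose restricted rows are identical; drop all-zero rows."""
    groups = {}
    for vertex, row in partial.items():
        if any(row):
            groups.setdefault(row, []).append(vertex)
    return groups

def mine_once(matrix, vertices, cluster_cols):
    """Return (dense blocks found, uncondensed rows left for the recursive step)."""
    blocks, leftover = [], []
    for row, members in condense(partial_matrix(matrix, vertices, cluster_cols)).items():
        if len(members) > 1:
            cols = [c for c, bit in zip(cluster_cols, row) if bit]
            blocks.append((members, cols))
        else:
            leftover.extend(members)
    return blocks, leftover

BLOCKS_AB, REST_AB = mine_once(WINDOW_MATRIX, VERTICES, ["A", "B"])
BLOCKS_CF, REST_CF = mine_once(WINDOW_MATRIX, VERTICES, ["C", "D", "E", "F"])
```

On the cluster {C, D, E, F} this produces the block {A, C, F} × {C, D, E, F} and leaves B, D and E for the recursive step.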


MinHash Clustering for DBI

Window matrix:

    A B C D E F
  A 0 0 1 1 1 1
  B 1 1 1 1 1 0
  C 0 0 1 1 1 1
  D 1 1 1 1 0 1
  E 0 0 1 1 0 0
  F 0 1 1 1 1 1

MinHash clustering splits the matrix into two partial matrices:

    A B
  A 0 0
  B 1 1
  C 0 0
  D 1 1
  E 0 0
  F 0 1

    C D E F
  A 1 1 1 1
  B 1 1 1 0
  C 1 1 1 1
  D 1 1 0 1
  E 1 1 0 0
  F 1 1 1 1

Outputs after condensing identical rows:

        A B
  B,D   1 1
  F     0 1

          C D E F
  A,C,F   1 1 1 1

Output dense block: {A, C, F} × {C, D, E, F}

Remaining rows, passed to the recursive cluster step:

    C D E F
  B 1 1 1 0
  D 1 1 0 1
  E 1 1 0 0


MinHash Clustering for DBI

• DBI generation can be summarized in the following steps:

• Clustering step:
  1. MinHash each vertex, based on its window

• Mining step:
  1. Generate the partial matrix for each window
  2. Group identical rows
  3. Recursive clustering

• Bottlenecks:
  • MinHash cost: […]
  • Window cost: […] for k-window, […] for t-window
  • Too high in practice


Estimated MinHash Clustering

• For k-hop windows, we developed an estimation scheme to speed up index creation.

• The observation is that as the number of hops grows, the overlap between vertices’ windows also grows
  • Thus we can use lower-hop window information in the clustering phase


Comparison

• MinHash Clustering
  1. Clustering step:
    1. MinHash each vertex, based on its window
  2. Mining step:
    1. Generate the partial matrix for each window
    2. Group identical rows
    3. Recursive clustering

• Estimated Clustering
  1. Clustering step:
    1. MinHash each vertex, based on its lower-hop window
  2. Mining step:
    1. Generate the partial matrix for each window
    2. Group identical rows
    3. Recursive clustering

The estimation reduces the indexing time because:
1. A lower-hop window has fewer elements, so MinHash is faster
2. Lower-hop window generation requires less time


Topological Window Processing

• The Dense Block Index can be used for topological windows as well
  • However, a more efficient index exists for t-window queries

• Containment relationship in t-windows:
  • If u is an ancestor of v, then the window of u is contained in the window of v
  • Thus, when computing the window of v, u’s result can be used directly


Parent Index

• Given an ancestor u of v, in order to use the window of u for computing the window of v, we need to materialize the difference between the two windows

• For a given v, the vertex with the smallest difference must be one of v’s parents

• Thus, for each vertex, we only index the parent with the smallest difference


Parent Index

• A parent index is a lookup table with three fields:
  • Vertex: the index entry
  • Parent: the id of the closest parent
  • Diff: the vertices in which the windows of Vertex and Parent differ


Parent Index based Query Processing

• Process each vertex’s window in topological order

• Use the formula: […]

• Topological order ensures that when a vertex is processed, its parents’ results are ready


Parent Index Creation

• Efficient creation based on a topological scan:
  • During the scan, each vertex passes its current ancestor information to its children
  • On receiving a parent’s ancestor information, a child unions these ancestors into its own
  • After receiving all parents’ information, the child records the parent with the smallest difference
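Index creation and the matching query can be sketched together. The sketch rests on several assumptions: a t-window is taken to be a vertex's ancestors plus the vertex itself, the aggregate is SUM, ties between parents are broken by name, and the DAG and `VALUES` are hypothetical. For brevity it keeps every ancestor set in memory during the scan, which a real implementation would likely avoid. It uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

def build_parent_index(parents):
    """parents maps each vertex to the set of its direct parents in a DAG.

    Returns (order, index), where index[v] = (closest parent or None, Diff).
    """
    order = list(TopologicalSorter(parents).static_order())
    window = {}  # ancestors of v, plus v itself
    index = {}
    for v in order:
        # Union the ancestor information received from all parents.
        window[v] = {v}.union(*(window[p] for p in parents.get(v, ())))
        # Every parent window is contained in window[v], so the parent with
        # the largest window leaves the smallest difference.
        best = max(parents.get(v, ()), key=lambda p: (len(window[p]), p), default=None)
        diff = window[v] - (window[best] if best is not None else set())
        index[v] = (best, frozenset(diff))
    return order, index

def query_sum(order, index, values):
    """Reuse the indexed parent's result and add only the Diff vertices."""
    result = {}
    for v in order:  # topological order: the parent's result is already there
        parent, diff = index[v]
        base = result[parent] if parent is not None else 0
        result[v] = base + sum(values[u] for u in diff)
    return result

# Hypothetical DAG (child -> direct parents) and attribute values.
PARENTS = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}, "e": {"d"}}
VALUES = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}

ORDER, INDEX = build_parent_index(PARENTS)
SUMS = query_sum(ORDER, INDEX, VALUES)
```

For vertex `d` the index stores parent `c` and Diff {b, d}, so its sum is the sum already computed for `c` plus the values of `b` and `d`.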


Experiments

• Machine: 2.27 GHz CPU with 32 GB memory

• Synthetic data:
  • SNAP [1] generator for directed graphs
  • DAGGER [2] generator for DAGs

[1] Stanford Network Analysis Platform, http://snap.stanford.edu/snap/index.html

[2] H. Yildirim, V. Chaoji, and M. J. Zaki, “DAGGER: A scalable index for reachability queries in large dynamic graphs,” arXiv preprint arXiv:1301.0977, 2013.


Compared Algorithms

• K-hop window:
  • MA: materialize-ahead algorithm (materialize the vertex-window mapping, aggregate individually)
  • KBBFS: bounded BFS for computing the window of each vertex
  • MC: MinHash Clustering
  • EMC: Estimated MinHash Clustering

• Topological window:
  • MA
  • DBI: Dense Block Index
  • TS: topological scan to compute the window of each vertex
  • PI: Parent Index


Effectiveness of Estimation

[Figure: four panels, for Hop = 1, Hop = 2, Hop = 3 and Hop = 4]


Benefit of Estimation

Degree 160

  Hop   MC_HASH     MC_BFS      EMC_HASH   EMC_BFS     EMC/MC
  2     157,885     241,072     1,666      120,931     0.307294
  3     2,281,794   4,494,853   1,637      2,257,493   0.33337
  4     4,355,439   8,633,192   1,631      4,414,207   0.339977

Degree 40

  Hop   MC_HASH     MC_BFS      EMC_HASH   EMC_BFS     EMC/MC
  2     33,611      19,559      484        9,974       0.19669
  3     417,102     742,502     470        374,489     0.323351
  4     964,521     1,843,078   471        927,751     0.330611
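The EMC/MC column is not defined on the slide. The short check below tests one reading, stated here as an assumption: EMC/MC = (EMC_HASH + EMC_BFS) / (MC_HASH + MC_BFS). It also assumes that the entry printed as "184,3078" is 1,843,078. Under both assumptions all six listed ratios are reproduced to within rounding.

```python
ROWS = [
    # (table, hop, MC_HASH, MC_BFS, EMC_HASH, EMC_BFS, listed EMC/MC)
    ("first", 2, 157_885, 241_072, 1_666, 120_931, 0.307294),
    ("first", 3, 2_281_794, 4_494_853, 1_637, 2_257_493, 0.33337),
    ("first", 4, 4_355_439, 8_633_192, 1_631, 4_414_207, 0.339977),
    ("second", 2, 33_611, 19_559, 484, 9_974, 0.19669),
    ("second", 3, 417_102, 742_502, 470, 374_489, 0.323351),
    ("second", 4, 964_521, 1_843_078, 471, 927_751, 0.330611),
]

def emc_over_mc(mc_hash, mc_bfs, emc_hash, emc_bfs):
    """Assumed definition: total EMC cost divided by total MC cost."""
    return (emc_hash + emc_bfs) / (mc_hash + mc_bfs)

def max_deviation(rows):
    """Largest gap between the recomputed ratio and the listed one."""
    return max(abs(emc_over_mc(*row[2:6]) - row[6]) for row in rows)

DEVIATION = max_deviation(ROWS)
```

Read this way, the table says that estimation cuts total indexing cost to roughly a third or less of MC's, with the hashing part (EMC_HASH) becoming almost negligible while the BFS part is about halved.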

Index size of MC and EMC

[Figure: degree = 40]

Scalability of EMC

[Figure: two panels, V = 100k with hop = 1 and V = 100k with hop = 2]

Effectiveness of PI

[Figure: V = 10k]

Index size of PI

[Figure: vertex = 10k]

Indexing Time of PI

[Figure: degree = 20]

Scalability of PI

[Figure: degree = 10]


Conclusion and Future Work

• Conclusion:
  • We proposed two graph window queries and two indexes for efficient processing

• Future work:
  • Extend query processing to handle large graphs (on a parallel platform or with a disk-resident index)
  • More complex aggregation processing (including graph OLAP)
  • Dynamic graphs (able to handle updates)


Thank you !
