Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Artem Aliev and Russell Spitzer, DataStax A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP #EUeco3

Upload: russell-spitzer

Post on 18-Mar-2018


Category:

Software



TRANSCRIPT

Page 1: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Artem Aliev and Russell Spitzer, DataStax

A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP

#EUeco3

Page 2: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Pierrot and Harlequin

• Artem
  • Graph Analytics Expert
  • Earth
• Russell
  • Distributed Systems Enthusiast
  • Earth

Page 3: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Tinkerpop and GraphFrames provide Complementary Approaches for Graph Analytics

[Diagram labels: DataSet, Catalyst, GraphFrames]

Page 4: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Graphs are Vertices and Edges

Vertices are things and edges represent their relations to one another

Page 9: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Graphs are Vertices and Edges

Vertices (Vertex Label: Ship)
• Registry: USS Enterprise (NCC-1701), Class: Constitution class, Service: 2245–2285 (40 Years)
• Registry: USS Enterprise (NCC-1701-A), Class: Enterprise class, Service: 2286–2293 (7 Years)
• Registry: USS Enterprise (NCC-1701-C), Class: Ambassador, Service: 2332–2344 (12 Years)
• Registry: USS Enterprise (NCC-1701-D), Class: Galaxy, Service: 2363–2371 (8 Years)

Edges
• Three "succeeded by" edges linking the Ship vertices

Page 10: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Graphs are Vertices and Edges

[Diagram: four Ship vertices linked by three "succeeded by" edges, plus two Crew vertices]

Vertices (Vertex Label: Crew)
• Position: Captain, Name: Kirk
• Position: Captain, Name: Picard

Page 11: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Graphs are Vertices and Edges

[Diagram: four Ship vertices linked by three "succeeded by" edges; two Crew vertices (Captain Kirk, Captain Picard) linked to the ships by four "served on" edges]

Page 12: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Graphs are Vertices and Edges

[Diagram: four Ship vertices linked by three "succeeded by" edges; two Crew vertices (Captain Kirk, Captain Picard) linked to the ships by four "served on" edges]

But why do I want this?

Page 13: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Graphs let us ask questions about our data based on their relations

• What Captain Served After Kirk?
• What Ship was two after the NCC-1701?

Page 14: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Traversals involve following paths through the Graph

[Diagram: the full graph of four Ship vertices and two Crew vertices (Captain Kirk, Captain Picard), with three "succeeded by" edges and four "served on" edges]

Page 15: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

What Captain was After Kirk?

[Diagram: Kirk (Crew) -served on-> NCC-1701-A (Ship) -succeeded by-> NCC-1701-C (Ship) <-served on- Picard (Crew)]

Page 16: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

What Ship was two after the NCC-1701?

[Diagram: NCC-1701 (Ship) -succeeded by-> NCC-1701-A (Ship) -succeeded by-> NCC-1701-C (Ship)]
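The two questions can be sketched as edge-following over a small in-memory graph. This is plain Python, not Gremlin or any TinkerPop API: the vertex ids, the helper names `out` and `in_`, and the assignment of captains to ships are assumptions read off the slide diagrams.

```python
# Edges as (source, label, destination), as read off the slide diagram.
EDGES = [
    ("NCC-1701", "succeeded by", "NCC-1701-A"),
    ("NCC-1701-A", "succeeded by", "NCC-1701-C"),
    ("NCC-1701-C", "succeeded by", "NCC-1701-D"),
    ("Kirk", "served on", "NCC-1701"),
    ("Kirk", "served on", "NCC-1701-A"),
    ("Picard", "served on", "NCC-1701-C"),
    ("Picard", "served on", "NCC-1701-D"),
]


def out(vertices, label):
    """Follow outgoing edges with the given label from a set of vertices."""
    return {dst for src, lbl, dst in EDGES if lbl == label and src in vertices}


def in_(vertices, label):
    """Follow incoming edges with the given label into a set of vertices."""
    return {src for src, lbl, dst in EDGES if lbl == label and dst in vertices}


# What Ship was two after the NCC-1701? Two "succeeded by" hops.
two_after = out(out({"NCC-1701"}, "succeeded by"), "succeeded by")

# What Captain served after Kirk? Kirk's ships -> their successors -> crew.
kirk_ships = out({"Kirk"}, "served on")
next_ships = out(kirk_ships, "succeeded by") - kirk_ships
captains_after_kirk = in_(next_ships, "served on") - {"Kirk"}

print(two_after)
print(captains_after_kirk)
```

Each question becomes a fixed sequence of hops, which is the shape a traversal language expresses directly.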

Page 17: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Tinkerpop is a Powerful and Flexible Graph Framework

• Server, Language, Connectors
• Graph Framework for OLAP and OLTP
• Node Centric Representations
• Fluent API (Gremlin)
• Fully Self Contained Framework

Page 18: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

OLTP Examples

Page 19: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Movie Lens Example Schema

https://grouplens.org/datasets/movielens/

Page 21: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

What happens when you have too much data?

Page 22: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Tinkerpop Spark OLAP Mechanism

• Instead of one traversal we traverse starting from all nodes simultaneously

Page 23: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Distribution Requires Partitioning

[Diagram: Big Data -> ? -> Independent Chunks of Data]

Page 24: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Vertex Stored in a PairRDD: Id -> StarVertex (Edge and Property Information)

[Diagram: vertices 1, A, C, D]

Star Vertex: Adjacency list representation (just the Id of each connected vertex)
• 1: "A", "Kirk"
• A: "C", "Kirk"
• C: "D", "Picard"
• D: "Picard"
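The adjacency list above can be sketched as (id, star vertex) pairs, which is the record shape a PairRDD would hold. This is a plain-Python stand-in, not TinkerPop's own classes: the `StarVertex` dataclass, its field names, and `to_pairs` are hypothetical, and edge direction and labels are left out because the slide only lists connected ids.

```python
from dataclasses import dataclass, field


@dataclass
class StarVertex:
    """A vertex plus what is needed to process it in isolation."""
    vertex_id: str
    properties: dict = field(default_factory=dict)
    neighbours: list = field(default_factory=list)  # ids only, not vertices


# (vertex id, connected vertex id) pairs copied from the slide.
ADJACENCY = [
    ("1", "A"), ("1", "Kirk"),
    ("A", "C"), ("A", "Kirk"),
    ("C", "D"), ("C", "Picard"),
    ("D", "Picard"),
]


def to_pairs(adjacency):
    """Build (id, StarVertex) pairs, sorted by id."""
    stars = {}
    for vertex_id, neighbour_id in adjacency:
        star = stars.setdefault(vertex_id, StarVertex(vertex_id))
        star.neighbours.append(neighbour_id)
    return sorted(stars.items(), key=lambda pair: pair[0])


pairs = to_pairs(ADJACENCY)
for vertex_id, star in pairs:
    print(vertex_id, "->", star.neighbours)
```

Because each record carries only neighbour ids, a partition can process its vertices without holding the rest of the graph.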

Page 25: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Vertex Program Runs, Initializing a Traverser for Every Vertex

[Diagram: vertices 1, A, C, D]

SparkMemory (Accumulator): used for global state

Page 26: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Then we cycle through a Message Passing Algorithm

[Diagram: three copies of vertices 1, A, C, D, one per step of the cycle]

SparkMemory (Accumulator): used for global state

Page 27: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Then we cycle through a Message Passing Algorithm

[Diagram: three copies of vertices 1, A, C, D, one per step of the cycle]

SparkMemory (Accumulator): used for global state

Passes messages from one Vertex to another with a join

Page 28: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Then we cycle through a Message Passing Algorithm

[Diagram: three copies of vertices 1, A, C, D, one per step of the cycle]

SparkMemory (Accumulator): used for global state

Repeat

Page 29: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Then we cycle through a Message Passing Algorithm

[Diagram: three copies of vertices 1, A, C, D, one per step of the cycle]

SparkMemory (Accumulator): used for global state

All Traversers Halt or Program Terminates

Result!
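The cycle on these slides can be sketched as a loop over supersteps. This is a minimal plain-Python illustration under stated assumptions, not the TinkerPop VertexProgram API: `run_vertex_program`, the `memory` dictionary standing in for SparkMemory, and the edge directions are all hypothetical.

```python
from collections import Counter

# Outgoing edges between the four vertices on the slide (assumed directions).
OUT_EDGES = {"1": ["A"], "A": ["C"], "C": ["D"], "D": []}


def run_vertex_program(out_edges, max_iterations):
    """Move one traverser per vertex along out edges, one hop per iteration."""
    # Setup: a traverser is initialized at every vertex.
    traversers = Counter({vertex_id: 1 for vertex_id in out_edges})
    memory = {"iterations": 0, "messages": 0}  # stand-in for global state

    # Stop when all traversers have halted or the program terminates.
    while traversers and memory["iterations"] < max_iterations:
        inbox = Counter()
        # Each vertex sends its traversers to its neighbours as messages.
        for vertex_id, count in traversers.items():
            for neighbour_id in out_edges[vertex_id]:
                inbox[neighbour_id] += count
                memory["messages"] += count
        # Delivering messages by destination id plays the role of the join.
        traversers = inbox
        memory["iterations"] += 1

    return traversers, memory


# Two hops starting from every vertex at once.
result, memory = run_vertex_program(OUT_EDGES, max_iterations=2)
print(dict(result), memory)
```

Traversers that reach a vertex with no outgoing edges simply drop out, so a long enough run ends with an empty result rather than an error.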

Page 30: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Example OLAP Traversals

Page 31: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Tinkerpop Spark OLAP Pros/Cons

Pros
• Every message pass requires only a single shuffle
• Edges and edge properties accessible without a step
• Very Flexible, Many Provider Specific Shortcuts possible
• Internal properties can be any Java type
• All in one, Server already ready for multiple clients

Cons
• Limited in ability to connect to external sources/other Spark applications
• Flexibility of framework allows for many platform specific shortcuts to be added
• Genericness makes some optimizations difficult
• Edges co-partitioned with vertices, high degree nodes can cause memory issues

Page 32: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

GraphFrames Background

• Third Party Package
• https://graphframes.github.io/
• Integrates with Dataset/DataFrame in Spark
• Relational under the hood

Page 33: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

GraphFrames are built of two DataFrames

[Diagram: a DataFrame, with Row and Column labelled]

Page 34: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

GraphFrames are built of two DataFrames

Vertex DataFrame
id      job              species
Geordi  Chief Engineer   Human
Data    Science Officer  Android

Edge DataFrame
src     dst   relationship
Geordi  Data  Friend

Page 35: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

GraphFrames are built of two DataFrames

Vertex DataFrame
id      job              species
Geordi  Chief Engineer   Human
Data    Science Officer  Android

Edge DataFrame
src     dst   relationship
Geordi  Data  Friend

Can Only Be Spark Types

Page 36: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

GraphFrames are built of two DataFrames

Vertex DataFrame
id      job              species
Geordi  Chief Engineer   Human
Data    Science Officer  Android

Edge DataFrame
src     dst   relationship
Geordi  Data  Friend

No Built in Labels
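The two-table layout can be sketched with plain Python rows standing in for DataFrames. This is not the GraphFrames API: the only convention taken from GraphFrames is that the vertex table has an "id" column and the edge table has "src" and "dst" columns, while `dangling_edges` and the "label" column are hypothetical names used for illustration.

```python
# Vertex table: one row per vertex, with a required "id" column.
vertices = [
    {"id": "Geordi", "job": "Chief Engineer", "species": "Human"},
    {"id": "Data", "job": "Science Officer", "species": "Android"},
]

# Edge table: one row per edge, with required "src" and "dst" columns.
edges = [
    {"src": "Geordi", "dst": "Data", "relationship": "Friend"},
]


def dangling_edges(vertices, edges):
    """Return edges whose src or dst does not match any vertex id."""
    ids = {row["id"] for row in vertices}
    return [e for e in edges if e["src"] not in ids or e["dst"] not in ids]


# No built-in labels: a label is just another column, added by convention.
labelled = [dict(row, label="Crew") for row in vertices]

print(dangling_edges(vertices, edges))
print(labelled[0])
```

Since a label is an ordinary column, filtering by label is an ordinary column filter, with nothing graph-specific about it.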

Page 37: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Catalyst Optimizes any Requests

• Simple requests using the DataFrame API don't do anything special
• Some methods fall back to GraphX (RDD Based)
• Others use pure DataFrame methods

Page 38: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

GraphFrames Motif Matching

GraphFrame: (a)-[e]->(b)

[Diagram: the vertex DataFrame (V) and the edge DataFrame (E)]

Page 39: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

GraphFrames Motif Matching

GraphFrame: (a)-[e]->(b)

• Vertex (a): Vertices as a UDT "A"
  Result: A: <VertexRow>

Page 40: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

GraphFrames Motif Matching

GraphFrame: (a)-[e]->(b)

• Vertex (a): Vertices as a UDT "A"
  Result: A: <VertexRow>
• Edge [e]: Edges as UDT "E", join with edges where A.id = E.src
  Result: A: <VertexRow>, E: <EdgeRow>

Page 41: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

GraphFrames Motif Matching

GraphFrame: (a)-[e]->(b)

• Vertex (a): Vertices as a UDT "A"
  Result: A: <VertexRow>
• Edge [e]: Edges as UDT "E", join with edges where A.id = E.src
  Result: A: <VertexRow>, E: <EdgeRow>
• Vertex (b): Vertices as UDT "B", join with vertices where E.dst = B.id
  Result: A: <VertexRow>, E: <EdgeRow>, B: <VertexRow>

Page 42: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

GraphFrames Motif Matching

GraphFrame: (a)-[e]->(b)

• Vertex (a): Vertices as a UDT "A"
  Result: A: <VertexRow>
• Edge [e]: Edges as UDT "E", join with edges where A.id = E.src
  Result: A: <VertexRow>, E: <EdgeRow>
• Vertex (b): Vertices as UDT "B", join with vertices where E.dst = B.id
  Result: A: <VertexRow>, E: <EdgeRow>, B: <VertexRow>

THAT'S SO MANY JOINS
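The join sequence for the motif (a)-[e]->(b) can be sketched with nested-loop joins over plain Python rows. This is an illustration of the relational idea only, not how GraphFrames executes it: in GraphFrames the motif string is passed to `find`, and the function name `find_a_e_b` here is hypothetical.

```python
vertices = [
    {"id": "Geordi", "job": "Chief Engineer"},
    {"id": "Data", "job": "Science Officer"},
]
edges = [{"src": "Geordi", "dst": "Data", "relationship": "Friend"}]


def find_a_e_b(vertices, edges):
    """Match (a)-[e]->(b) with two nested-loop joins."""
    # Step 1: wrap every vertex row in a column named "a".
    step1 = [{"a": v} for v in vertices]
    # Step 2: join with edges where a.id = e.src.
    step2 = [dict(row, e=e) for row in step1 for e in edges
             if row["a"]["id"] == e["src"]]
    # Step 3: join with vertices where e.dst = b.id.
    step3 = [dict(row, b=v) for row in step2 for v in vertices
             if row["e"]["dst"] == v["id"]]
    return step3


matches = find_a_e_b(vertices, edges)
for m in matches:
    print(m["a"]["id"], m["e"]["relationship"], m["b"]["id"])
```

One hop costs two joins here, so a motif with more hops grows the join chain by two for each additional edge.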

Page 43: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

DataFrames Mean Optimizations are Automatic

[Query plan: Vertex -> A: <VertexRow>; join Edge -> A: <VertexRow>, E: <EdgeRow>; join Vertex -> A: <VertexRow>, E: <EdgeRow>, B: <VertexRow>]

Page 44: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

[Query plan: Vertex -> A: <VertexRow>; join Edge -> A: <VertexRow>, E: <EdgeRow>; join Vertex -> A: <VertexRow>, E: <EdgeRow>, B: <VertexRow>; Select A.ID]

Columns Pruned and Predicates Pushed
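The effect of column pruning can be sketched by hand on plain Python rows. Catalyst applies this kind of rewrite automatically. The two functions below are hypothetical and only show that a plan which discards unused columns before joining returns the same answer as one which joins full rows first. Predicate pushdown is not shown.

```python
vertices = [
    {"id": "Geordi", "job": "Chief Engineer", "species": "Human"},
    {"id": "Data", "job": "Science Officer", "species": "Android"},
]
edges = [{"src": "Geordi", "dst": "Data", "relationship": "Friend"}]


def naive_plan(vertices, edges):
    """Join full rows first, then select a.id at the very end."""
    joined = [(a, e, b) for a in vertices for e in edges for b in vertices
              if a["id"] == e["src"] and e["dst"] == b["id"]]
    return [a["id"] for a, e, b in joined]


def pruned_plan(vertices, edges):
    """Keep only the columns the query needs before joining."""
    ids = {v["id"] for v in vertices}               # vertices pruned to id
    pairs = [(e["src"], e["dst"]) for e in edges]   # edges pruned to src, dst
    return [src for src, dst in pairs if src in ids and dst in ids]


# Same answer on this data, with far less carried through the joins.
assert naive_plan(vertices, edges) == pruned_plan(vertices, edges)
print(pruned_plan(vertices, edges))
```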

Page 48: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

All of the normal optimizations happen within this Framework

[Query plan: Vertex -> A: <VertexRow>; join Edge -> A: <VertexRow>, E: <EdgeRow>; join Vertex -> A: <VertexRow>, E: <EdgeRow>, B: <VertexRow>. Both joins are marked "Broadcast?"]

Page 49: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Code Generation and Internal Rows

[Query plan: Vertex -> A: <VertexRow>; join Edge -> A: <VertexRow>, E: <EdgeRow>; join Vertex -> A: <VertexRow>, E: <EdgeRow>, B: <VertexRow>. Five stages of the plan are marked "Code Generation"]

Page 50: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

GraphFrames Examples

Page 51: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

GraphFrame Pros/Cons

Pros
• Much Faster on basic counts
• Powerful optimizations + CodeGen
• Easy to connect to other sources

Cons
• Slower on complex traversals (2 Joins per hop)
• Relational Model not as Flexible

Page 52: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Choosing the Right Framework

Page 53: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Choose TinkerPop OLAP For Long Paths

• More complicated queries
• Traversals that require many hops
  • g.V().out().out().out().out()
• Avoid for simple counts and aggregations
• Avoid if you have very high degree Vertices

Page 54: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Choose GraphFrames for Interoperability and Short Paths

• General Edge/Vertex stats (groupCount, min, max)
• Connecting to other sources
• Short paths
• High Degree Vertices
• Avoid
  • Long path algorithms

Page 55: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Choosing the Right Framework

Gremlin on GraphFrames

OLTP backed by DSE Graph

Built in Spark

We write it!

Search Built In!

Advanced Security

Page 56: Tale of Two Graph Frameworks: Graph Frames and Tinkerpop

Thanks for Listening

• DataStax Academy Graph Course: https://academy.datastax.com/resources/ds330-datastax-enterprise-graph
• Try out DataStax Enterprise! https://academy.datastax.com/quick-downloads
• Apache Tinkerpop: http://tinkerpop.apache.org/
• GraphFrames Link: https://graphframes.github.io/