datastax | graph computing with apache tinkerpop (marko rodriguez) | cassandra summit 2016

85
Gremlin’s Graph Traversal Machinery Dr. Marko A. Rodriguez Director of Engineering at DataStax, Inc. Project Management Committee, Apache TinkerPop http://tinkerpop.apache.org

Upload: datastax

Post on 06-Jan-2017

257 views

Category:

Software


4 download

TRANSCRIPT

Page 1: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Gremlin’s Graph Traversal Machinery

Dr. Marko A. RodriguezDirector of Engineering at DataStax, Inc.

Project Management Committee, Apache TinkerPop

http://tinkerpop.apache.org

Page 2: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

f : X ! X

The function f is a process that maps a structure of type X to a structure of type X.

Page 3: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

f(x1) = x2

The function f maps the object x1 (from the set of X) to the object x2 (from the set of X).x1 2 X x2 2 X

Page 4: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

f(x)

A step is a function.

Page 5: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

f(x)

Assume that this step rotates an X by 90°.

90°

Page 6: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

90°

The algorithm of the step is a “black box.”

Page 7: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

90°

A traverser wraps a value of type V.

class Traverser<V> { V value;}

class Traverser<V> { V value;}

Page 8: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

90°

The step maps an integer traverser to an integer traverser.

class Traverser<V> { V value;}

class Traverser<V> { V value;}

Traverser<Integer> Traverser<Integer>

Page 9: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

90°

A traverser of with a rotation of 0° becomes a traverser with a rotation of 90°.

Traverser(0) Traverser(90)

class Traverser<V> { V value;}

class Traverser<V> { V value;}

Page 10: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

90°

Page 11: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Both the input and output traversers are of type Traverser<Integer>.

90°

T [N] ! T [N]

2 T [N]2 T [N]

Page 12: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

A stream of input traversers…

90°

Page 13: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

…yields a stream of output traversers.

90°

Page 14: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

A traverser can have a bulk which denotes how many V values it represents.

90°

class Traverser<V> { V value; long bulk;}

class Traverser<V> { V value; long bulk;}

Page 15: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

44

Bulking groups identical traversers to reduce the number of evaluations of a step.

90°

class Traverser<V> { V value; long bulk;}

class Traverser<V> { V value; long bulk;}

Page 16: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

A variegated stream of input traversers yields a variegated stream of output traversers.

90°

Page 17: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

1

21 1

12

Bulking can reduce the size of the stream.

90°

class Traverser<V> { V value; long bulk;}

class Traverser<V> { V value; long bulk;}

Page 18: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

If the order of the stream does not matter…

90°

Page 19: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

…then reordering can increase the likelihood of bulking.

90°1

21

total bulk = 4total count = 4

total bulk = 4total count = 3

Page 20: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

90°

180°

270°

360°

A traversal is a list of one or more steps.

90° 90° 90°

90°

Page 21: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

f : X ! Y

Different functions can yield different mappings between different domains and ranges.

h : Y ! Z

Page 22: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

The output of f can be the input to h because the range of f is the domain of h (i.e. Y).

Y y = f(x)

Z z = h(y)

Page 23: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

y = f(x)

z = h(y)

Page 24: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

y = f(x)

z = h(y)

Page 25: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

f(x) = y

h(y) = z

Page 26: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

f(x) = y

h(y) = z

Page 27: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

f(x) = yh(y) = z

Page 28: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

f(x) h = z

Page 29: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

f h = zx

Page 30: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

f h = zx · ·

Page 31: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

x · f · h = z

h(f(x)) = z

readab

le

unread

able≣

Page 32: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

z = x · f · h

Page 33: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

z = x.f().h().next()

z = x · f · h

stream/iterator/traversalhead/start

Page 34: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

90° 225°

315°

Different types of steps exist and their various compositions yield query expressivity.

Page 35: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

90° 225°

315°

Steps process traverser streams/flows and their composition is a traversal/iterator.

= . (). ().next()90° 225°

Page 36: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

repeat( ).times(2)

180°

Anonymous traversals can serve as step arguments.

90°

traversal with a single step

90°

Page 37: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

repeat( ).times(2)

180°

Some functions serve as step-modulators instead of steps.

90°

…groupCount().by(out().count())

…select(“a”,”b”).by(“name”)

…addE(“knows

”).from(“a”)

.to(“b”)

…order().by(“age”,decr)

Page 38: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

repeat( ).times(2)

90° 90°

180°

During optimization, traversal strategies may rewrite a traversal to a more efficient,semantically equivalent form.

90°

RepeatUnrollStrategy

interface TraversalStrategy { void apply(Traversal traversal); Set<TraversalStrategy> applyPrior(); Set<TraversalStrategy> applyPost();}

Page 39: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

repeat( ).until(0°)90°

Continuously process a traverser until some condition is met.

Page 40: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

repeat( ).until(0°)

4 loops2 loops1 loop

90° 180° 360°

90°do while(x != )

repeat().until() provides do/while-semantics.

90°

Page 41: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

until(0°).repeat( )

0 loops2 loops1 loop

90° 180° 0°

90°dowhile(x != )

90°

until().repeat() provides while/do-semantics.

Page 42: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

until(0°).repeat( )3

Even if the traversers in the input stream can not be bulked, the output traversers may be able to be.

90°

Page 43: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

filter(x)map(x) sideEffect(x)flatMap(x)one-to-one one-to-many one-to-(one-or-none) one-to-same

m

Page 44: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

filter(x)map(x) sideEffect(x)flatMap(x)one-to-one one-to-many one-to-(one-or-none) one-to-same

out(“knows”)

has(“name”,”gremlin”)

groupCount(“m”)

where(“a”,eq(“b”))

select(“a”,”b”)

path()

mean()

sum()

count()

groupCount()man

y-to

-one

(red

ucer

s)

order()

values(“age”)

values(“name”)and(x,y)

or(x,y)

coin(0.5)

sample(10)

simplePath()

dedup()

store(“m”)

tree(“m”)

subgraph(“m”)

in(“created”)

group(“m”)

m

label()

id()

* A collection of examples. Not the full step library.

match(x,y,z)

properties()

outE(“knows”)

V()

Page 45: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

values(“age”)

filter(x)map(x) sideEffect(x)flatMap(x)one-to-one one-to-many one-to-(one-or-none) one-to-same

out(“knows”)

has(“name”,”gremlin”)

path()

groupCount()

simplePath()

m

label()

outE(“knows”)

group(“m”)

V()

Page 46: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

T [V [ E] ! T [N+]values(“age”)

filter(x)map(x) sideEffect(x)flatMap(x)one-to-one one-to-many one-to-(one-or-none) one-to-same

out(“knows”)

has(“name”,”gremlin”)

path()

groupCount()

simplePath()

m

label()

outE(“knows”)

T [?] ! T [path]

T [V [ E] ! T [string]T [?] ! T [?]

T [V [ E] ! ; [ T [V [ E]

T [?] ! ; [ T [?]

T [V ] ! T [V ]⇤

T [V ] ! T [E]⇤group(“m”)

V()T [?] ! T [map[?,N+]]T [G] ! T [V ]⇤

Page 47: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

T [V [ E] ! T [N+]values(“age”)

out(“knows”)

has(“name”,”gremlin”)

groupCount()

T [V [ E] ! ; [ T [V [ E]

T [V ] ! T [V ]⇤

V()T [?] ! T [map[?,N+]]T [G] ! T [V ]⇤

g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()

Page 48: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

T [V [ E] ! T [N+]values(“age”)

out(“knows”)

has(“name”,”gremlin”)

groupCount()

T [V ] ! T [V ]⇤

V()

T [?] ! T [map[?,N+]]

T [G] ! T [V ]⇤

g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()

Steps can be composed if their respective domains and ranges match.

T [V [ E] ! ; [ T [V [ E]

Page 49: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

values(“age”)

out(“knows”)

has(“name”,”gremlin”)

groupCount()

T [V ] ! T [V ]⇤

V() T [G] ! T [V ]⇤

g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()

T [V ] ! ; [ T [V ]

T [V ] ! T [N+]

T [N+] ! T [map[N+,N+]]

Page 50: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()

What is the distribution of ages of the people that Gremlin knows?

Page 51: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()

one graph to many vertices (flatMap)

Page 52: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()

one graph to many vertices (flatMap) one vertex

to that vertex or no vertex(filter)

?…

Page 53: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()

one graph to many vertices (flatMap) one vertex

to that vertex or no vertex(filter)

one vertexto many friend vertices

(flatMap)

?…

name=gremlin

Page 54: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()

one graph to many vertices (flatMap) one vertex

to that vertex or no vertex(filter)

one vertexto many friend vertices

(flatMap)

one vertex to one age value

(map)

?…

37

name=gremlin

Page 55: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()

one graph to many vertices (flatMap) one vertex

to that vertex or no vertex(filter)

one vertexto many friend vertices

(flatMap)

one vertex to one age value

(map)many age values

to an age distribution(map — reducer)

?…

37 [37:2, 41:1, 24:1, 35:4]37

37

2435

3535

35 41

name=gremlin

Page 56: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

The Gremlin Traversal Language

The Gremlin Traversal Machine

Page 57: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

a b c

a b c

Traversal creation via step composition

Step parameterization via traversal and constant nesting

a().b().c()

a(b().c()).d(x)d(x)

functi

on compositi

on

functi

on nest

ing

fluen

t meth

ods

method ar

gumen

ts

Any language that supports function composition and function nesting can host Gremlin.

Gremlin Traversal Language

Page 58: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

class Traverser<V> { V value; long bulk;}

class Step<S,E> { Traverser<E> processNextStart();}

f(x)

class Traversal<S,E> implements Iterator<E> { E next(); Traverser<E> nextTraverser();}

The fundamental constructs of Gremlin’s machinery.

Gremlin Traversal Machine

interface TraversalStrategy { void apply(Traversal traversal); Set<TraversalStrategy> applyPrior(); Set<TraversalStrategy> applyPost();}

a db c

a de

Page 59: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

1

Bytecode

Gremlin-Javag.V(1). repeat(out(“knows”)).times(2). groupCount().by(“age”)

[[V, 1][repeat, [[out, knows]]][times, 2][groupCount][by, age]]

Traversal GraphStep GroupCountStep

RepeatStep

VertexStep

GraphStep GroupCountStepVertexStep VertexStep

TraversalStrategies

GraphStep GroupCountStepVertexStep VertexStep

translates

compiles

optimizes

executes

[29:2, 30:1, 31:1, 35:10]

Page 60: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

1

Bytecode

Gremlin-Pythong.V(1). repeat(out(‘knows’)).times(2). groupCount().by(‘age’)

[[V, 1][repeat, [[out, knows]]][times, 2][groupCount][by, age]]

Traversal GraphStep GroupCountStep

RepeatStep

VertexStep

GraphStep GroupCountStepVertexStep VertexStep

TraversalStrategies

GraphStep GroupCountStepVertexStep VertexStep

translates

compiles

optimizes

executes

[29:2, 30:1, 31:1, 35:10]

Page 61: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

1

Bytecode

Gremlin-Pythong.V(1). repeat(out(‘knows’)).times(2). groupCount().by(‘age’)

[[V, 1][repeat, [[out, knows]]][times, 2][groupCount][by, age]]

Traversal GraphStep GroupCountStep

RepeatStep

VertexStep

GraphStep GroupCountStepVertexStep VertexStep

TraversalStrategies

GraphStep GroupCountStepVertexStep VertexStep

translates

compiles

optimizes

executes

[29:2, 30:1, 31:1, 35:10]

Gremlin language variant

Language agnostic bytecode

Execution engine assembly

Execution engine optimization

Execution engine evaluation

Lang

uage

Mac

hine

Page 62: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Lang

uage

Mac

hine

Gremlin-Python

Gremlin-Groovy

Gremlin-Java

Gremlin-JavaScriptGremlin-Ruby

Gremlin-Scala

Gremlin-ClojureBytecode

Java-based Implementation?

?-based Implementation

? ?

Page 63: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Python 2.7.2 (default, Oct 11 2012, 20:14:37)[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwinType "help", "copyright", "credits" or "license" for more information.>>> from gremlin_python.structure.graph import Graph>>> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

Gremlin-Python

CPython

Page 64: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Python 2.7.2 (default, Oct 11 2012, 20:14:37)[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwinType "help", "copyright", "credits" or "license" for more information.>>> from gremlin_python.structure.graph import Graph>>> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection>>> graph = Graph()>>> g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182','g'))

Gremlin-Python

DriverRemoteConnection

CPython

Page 65: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Python 2.7.2 (default, Oct 11 2012, 20:14:37)[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwinType "help", "copyright", "credits" or "license" for more information.>>> from gremlin_python.structure.graph import Graph>>> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection>>> graph = Graph()>>> g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182','g'))# nested traversal with Python slicing and attribute interception extensions

>>> g.V().hasLabel("person").repeat(both()).times(2).name[0:2].toList()[u'marko', u'marko']

Gremlin-Python

Bytecode

DriverRemoteConnection

Gremlin Traversal MachineCPython JVM

Page 66: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Python 2.7.2 (default, Oct 11 2012, 20:14:37)[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwinType "help", "copyright", "credits" or "license" for more information.>>> from gremlin_python.structure.graph import Graph>>> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection>>> graph = Graph()>>> g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182','g'))# nested traversal with Python slicing and attribute interception extensions

>>> g.V().hasLabel("person").repeat(both()).times(2).name[0:2].toList()[u'marko', u'marko']# a complex, nested multi-line traversal

>>> g.V().match( \... as_(“a”).out("created").as_(“b”), \... as_(“b”).in_(“created").as_(“c”), \... as_(“a”).out("knows").as_(“c”)). \... select("c"). \... union(in_(“knows"),out("created")). \... name.toList()[u'ripple', u'marko', u'lop']>>>

Gremlin-Python

Bytecode

DriverRemoteConnection

Gremlin Traversal MachineCPython JVM

Page 67: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Cypher

Bytecode

Distinct query languages (not only Gremlin language variants) can generate bytecode for evaluation by any OLTP/OLAP TinkerPop-enabled graph system.

Gremlin Traversal Machine

Page 68: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

SELECT DISTINCT ?cWHERE { ?a v:label "person" . ?a e:created ?b . ?b v:name ?c . ?a v:age ?d . FILTER (?d > 30)}

[ [V], [match, [[as, a], [hasLabel, person]], [[as, a], [out, created], [as, b]], [[as, a], [has, age, gt(30)]]], [select, b], [by, name], [dedup]]

Byteco

de

Page 69: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

GraphDatabase

OLTP

GraphProcessor

OLAP

Bytecode Byteco

de

BytecodeTinkerGraphDSE Graph

TitanNeo4j

OrientDBIBM Graph

TinkerComputerSparkGiraph

HadoopFulgora

Gremlin Traversal Machine

Traversal Trave

rsal

Traversal

Gremlin traversals can be executed against OLTP graph databases and OLAP graph processors.That means that if, e.g., SPARQL compiles to bytecode, it can execute both OLTP and OLAP.

Page 70: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

GraphDatabase

OLTP

GraphProcessor

OLAP

Gremlin Traversal Machine

Traversal Trave

rsal

TraversalTraversalStrategies TraversalStrategies

optimizes

Graph system providers (OLTP/OLAP) typically have custom strategies registered with the Gremlin traversal machine that take advantage of unique, provider-specific features.

Page 71: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Most OLTP graph systems have a traversal strategy that combines [V,has*]-sequences into a single global index-based flatMap-step.

g.V().has(“name”,”gremlin”). out(“knows”).values(“age”). groupCount()

one graph to many vertices using index lookup (flatMap)

GraphStepStrategy

one graph to many vertices (flatMap)

one vertex to that vertex or no vertex(filter)

?…

compiles

optimizes

name=gremlin

DataStax Enterprise Graph

Page 72: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Most OLAP graph systems have a traversal strategy that bypasses Traversal semanticsand implements reducers using the native API of the system.

g.V().count()

one graph to long (map — reducer)

rdd.count() 12,146,934

compiles

one graph to many vertices (flatMap)

many vertices to long(map — reducer)

… 12,146,934

optimizes

SparkInterceptorStrategy

Page 73: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Physical Machine

DataProgram Traversal

Heap/DiskMemory MemoryMemory/Graph System

Physical Machine Java Virtual Machine

byte

code

step

s

DataProgram

Memory/DiskMemory

Physical Machine

inst

ruct

ions

JavaVirtual Machine

GremlinTraversal Machine

From the physical computing machine to the Gremlin traversal machine.

Page 74: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Stakeholders

Language ProvidersGremlin Language VariantDistinct Query Language

Application Developers Graph System Providers

OLAP ProviderOLTP Provider

Page 75: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Stakeholders

Application Developers

One query language for all OLTP/OLAP systems. Gremlin

G = (V,E)

Real-time and analytic queries are represented in Gremlin.

GraphDatabase

OLTP

GraphProcessor

OLAP

Page 76: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Stakeholders

Application Developers

One query language for all OLTP/OLAP systems.

No vendor lock-in.

Apache TinkerPop as the JDBC for graphs.

DataStax Enterprise Graph

Page 77: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Stakeholders

Application Developers

One query language for all OLTP/OLAP systems.

No vendor lock-in.

Gremlin is embedded in the developer’s language.

Iterator<String> result = g.V().hasLabel(“person”). order().by(“age”). limit(10).values(“name”)

vs.

ResultSet result = statement.executeQuery( “SELECT name FROM People \n” + “ ORDER BY age \n” + “ LIMIT 10”)

Gremlin

-Java

SQL i

n Jav

a

No “fat strings.” The developer writes their graph database/processor queries in their native programming language.

Page 78: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Stakeholders

Language ProvidersGremlin Language VariantDistinct Query Language

Easy to generate bytecode.

GraphTraversal.getMethods() .findAll { GraphTraversal.class == it.returnType } .collect { it.name } .unique() .each { pythonClass.append(""" def ${it}(self, *args): self.bytecode.add_step(“${it}”, *args) return self“””)}

Gremlin-Python’s source code is programmatically generated using Java reflection.

Page 79: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Stakeholders

Language ProvidersGremlin Language VariantDistinct Query Language

Easy to generate bytecode.

Bytecode executes against TinkerPop-enabled systems.

Language providers write a translator for the Gremlin traversal machine,not a particular graph database/processor.

DataStax Enterprise Graph

Page 80: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

GraphDatabase

OLTP

GraphProcessor

OLAP

Stakeholders

Language ProvidersGremlin Language VariantDistinct Query Language

Easy to generate bytecode.

Bytecode executes against TinkerPop-enabled systems.

Provider can focus on design, not evaluation.

Gremlin Traversal Machine

The language designer does not have to concern themselves with OLTP or OLAP execution. They simply generate bytecode and the

Gremlin traversal machine handles the rest.

Page 81: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Easy to implement core interfaces.

Graph System Providers

Stakeholders

OLAP ProviderOLTP Provider

Vertex

Edge

Graph

Transaction

TraversalStrategy

Propertykey=value

?? ?

Page 82: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Provider supports all provided languages.

Easy to implement core interfaces.

Graph System Providers

Stakeholders

OLAP ProviderOLTP Provider

The provider automatically supports all query languages that have compilers that generate Gremlin bytecode.

Page 83: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

OLTP providers can leverage existing OLAP systems.

Provider supports all provided languages.

Easy to implement core interfaces.

Graph System Providers

Stakeholders

OLAP ProviderOLTP Provider

DSE Graph leverages SparkGraphComputer for OLAP processing.

DataStax Enterprise Graph

Page 84: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Stakeholders

Language ProvidersGremlin Language VariantDistinct Query Language

Application Developers Graph System Providers

OLAP ProviderOLTP Provider

One query language for all OLTP/OLAP systems.

No vendor lock-in.

Gremlin is embedded in the developer’s language.

Easy to generate bytecode.

Bytecode executes against TinkerPop-enabled systems.

Provider can focus on design, not evaluation.

Easy to implement core interfaces.

Provider supports all provided languages.

OLTP providers can leverage existing OLAP systems.

Page 85: DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandra Summit 2016

Thank you.

http://tinkerpop.apache.orghttp://www.datastax.com/products/datastax-enterprise-graph