Enabling Multimodel Graphs with Apache TinkerPop

Jason Plurad • [email protected] • @pluradj
IBM Open Technology • Apache TinkerPop
January 14, 2017 • Graph Day Texas • #ddtx17 #gdtx17

Upload: jason-plurad

Post on 18-Feb-2017



Agenda

Apache TinkerPop

Multimodel Graphs

Graph Traversal Strategies

Provider Optimizations

On the Horizon


Apache TinkerPop™

Open source graph computing framework

Apache TinkerPop

§ Open source, vendor-agnostic, graph computing framework

§ Gremlin graph traversal language


Apache TinkerPop™

Maintainer Apache Software Foundation

License Apache

Latest Release 3.2.3, October 2016

https://tinkerpop.apache.org

Graph System Integration


Multimodel Graphs

Polyglot persistence

Multimodel Database

§ Graphs often are not alone in a data application

§ Multimodel: Combining capabilities of different database types

§ Choose the right tool for the job

§ Use graphs for highly connected data

§ Single persistence layer


OrientDB®

Maintainer OrientDB

License Apache

Latest Release 2.2.14, December 2016

https://orientdb.com


Multimodel Platform

§ Graphs often are not alone in a data application

§ Multimodel: Combining capabilities of different database types

§ Choose the right tool for the job

§ Use graphs for highly connected data

§ Take advantage of existing storage architectures


DataStax Enterprise Graph

Maintainer DataStax

License Commercial

Latest Release 5.0.5, December 2016

https://datastax.com


Graph Traversal Strategies

Optimizing a Gremlin traversal

Gremlin Machine: Everything Is a Traversal

§ Traversal

§ Step

§ Traverser

§ Traversal Source

§ Traversal Strategy


explain()

§ Details on how a traversal is compiled into a final execution plan
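To make the idea of an execution plan concrete, here is a minimal sketch in Python. It is a toy model, not TinkerPop's API: a traversal is assumed to be a plain list of step names, a strategy is a function that rewrites that list, and `explain`, `identity_removal`, and `collapse_dedup` are hypothetical names chosen for illustration. The trace it builds mirrors the general shape of an explain() report: the original traversal, the traversal after each strategy, and the final traversal.

```python
# Toy model only: steps are strings, strategies are list -> list functions.

def identity_removal(steps):
    """Drop no-op identity steps."""
    return [s for s in steps if s != "identity"]


def collapse_dedup(steps):
    """Hypothetical rule: collapse consecutive dedup steps into one."""
    result = []
    for step in steps:
        if step == "dedup" and result and result[-1] == "dedup":
            continue
        result.append(step)
    return result


def explain(steps, strategies):
    """Apply each strategy in turn and record the traversal after each one."""
    current = list(steps)
    trace = [("Original Traversal", list(current))]
    for strategy in strategies:
        current = strategy(current)
        trace.append((strategy.__name__, list(current)))
    trace.append(("Final Traversal", list(current)))
    return trace
```

Calling `explain(["V", "identity", "dedup", "dedup", "out"], [identity_removal, collapse_dedup])` returns one row per stage, which makes it easy to see which strategy changed the plan.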


withStrategies() / withoutStrategies()

§ Add or remove specific traversal strategies to a traversal source
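The following sketch illustrates the idea of adding and removing strategies on a traversal source. It is a toy model under stated assumptions, not the TinkerPop API: `ToySource`, `with_strategies`, `without_strategies`, `compile`, and `remove_identity` are hypothetical names, and a traversal is again just a list of step names. The sketch assumes each call returns a new source and leaves the original untouched.

```python
from dataclasses import dataclass


def remove_identity(steps):
    """Toy strategy: drop no-op identity steps."""
    return [s for s in steps if s != "identity"]


@dataclass(frozen=True)
class ToySource:
    """Toy traversal source: an immutable holder for an ordered set of strategies."""
    strategies: tuple = ()

    def with_strategies(self, *added):
        # Re-adding a strategy moves it to the end instead of duplicating it.
        kept = tuple(s for s in self.strategies if s not in added)
        return ToySource(kept + added)

    def without_strategies(self, *removed):
        return ToySource(tuple(s for s in self.strategies if s not in removed))

    def compile(self, steps):
        """Run every registered strategy over a copy of the step list."""
        steps = list(steps)
        for strategy in self.strategies:
            steps = strategy(steps)
        return steps
```

With `g = ToySource()` and `g2 = g.with_strategies(remove_identity)`, only `g2` strips identity steps; `g` keeps compiling traversals unchanged.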


Traversal Strategy Types

1. Decoration

2. Optimization

3. Provider Optimization

4. Finalization

5. Verification
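As a rough sketch of how these categories could drive ordering, the Python below sorts a set of strategies by category, assuming the numbered list above reflects the order in which categories are applied. It is a toy model: `CATEGORY_ORDER` and `sort_by_category` are hypothetical names, and ordering within a category is simply left as given, which is a simplification.

```python
# Toy model: a strategy is a (name, category) pair.
CATEGORY_ORDER = (
    "decoration",
    "optimization",
    "provider_optimization",
    "finalization",
    "verification",
)


def sort_by_category(strategies):
    """Order strategies by category; the sort is stable within a category."""
    rank = {category: i for i, category in enumerate(CATEGORY_ORDER)}
    return sorted(strategies, key=lambda strategy: rank[strategy[1]])
```

For example, a `("ReadOnly", "verification")` entry registered first still ends up after every decoration and optimization entry once sorted.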


Decoration

§ Application-level feature that can be embedded into the traversal logic

§ Event: raise events for graph mutations

§ Partition: use partition names to restrict element reads/writes

§ Sack: use a sack to store data that gets updated as traversers split/merge

§ Subgraph: restrict element reads based on traversals
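A partition-style decoration can be sketched as a rewrite that adds a filter after every read step and stamps newly created vertices. The Python below is a toy model, not TinkerPop's implementation: steps are assumed to be tuples such as `("V",)` or `("out", "knows")`, and `READ_STEPS`, `partition_decoration`, and the `"_partition"` key used in the usage note are hypothetical.

```python
# Toy model: each step is a tuple whose first element is the step name.
READ_STEPS = frozenset({"V", "E", "out", "in", "both", "outE", "inE", "bothE"})


def partition_decoration(steps, key, read_partitions, write_partition):
    """Restrict reads to read_partitions and tag new vertices with write_partition."""
    allowed = ("within", tuple(sorted(read_partitions)))
    decorated = []
    for step in steps:
        decorated.append(step)
        name = step[0]
        if name in READ_STEPS:
            # Every element that is read must belong to an allowed partition.
            decorated.append(("has", key, allowed))
        elif name == "addV":
            # Every vertex that is written is stamped with the write partition.
            decorated.append(("property", key, write_partition))
    return decorated
```

Decorating `[("V",), ("out", "knows")]` with key `"_partition"` yields a traversal in which each read step is followed by a `has` filter, so the application logic never has to mention partitions itself.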


Finalization

§ Enforce final adjustment, cleanup, or analysis required before executing the traversal

§ MatchAlgorithm: used in match() step to reorder execution plan

– CountMatchAlgorithm: largest traversal reduction goes first (default)

– GreedyMatchAlgorithm: traversers drain in order

§ Profile: injects profile steps into traversal to measure runtime/counts


Verification

§ Prevent traversals that are not legal for the application or traversal engine

§ LambdaRestriction: Do not allow use of lambdas

§ ReadOnly: Do not allow graph mutations

§ StandardVerification: Vertex computing steps must be executed by a graph computer. Reducing barrier steps cannot immediately follow repeat steps.
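Verification can be pictured as a set of checks that either pass a traversal through unchanged or reject it before it runs. The sketch below is a toy model in Python, not TinkerPop's API: steps are assumed to be step-name strings (or Python callables standing in for lambdas), and `VerificationError`, `MUTATING_STEPS`, `read_only`, `lambda_restriction`, and `verify` are hypothetical names.

```python
class VerificationError(Exception):
    """Raised when a toy traversal is not legal for the application."""


# Assumed set of step names that change the graph in this toy model.
MUTATING_STEPS = frozenset({"addV", "addE", "property", "drop"})


def read_only(steps):
    """Reject traversals that contain a mutating step."""
    for step in steps:
        if step in MUTATING_STEPS:
            raise VerificationError(f"mutating step not allowed: {step}")
    return steps


def lambda_restriction(steps):
    """Reject traversals that embed a callable (a stand-in for a lambda)."""
    for step in steps:
        if callable(step):
            raise VerificationError("lambdas are not allowed")
    return steps


def verify(steps, checks=(read_only, lambda_restriction)):
    """Run every check; the traversal is returned unchanged if all pass."""
    for check in checks:
        check(steps)
    return steps
```

Unlike the rewriting strategies, these checks never modify the traversal: `verify(["V", "out", "count"])` returns the same list, while `verify(["V", "drop"])` raises `VerificationError`.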


Optimization

§ A more efficient way to express the traversal using TinkerPop steps only

§ AdjacentToIncident: replace out().count() with outE().count()

§ IncidentToAdjacent: replace outE().inV() with out()

§ Connective: rewrites binary conjunction (and/or steps)

§ FilterRanking: reorders filter and order steps to prioritize steps that will keep traversers small and bulkable

§ InlineFilter: removes parent filters when child traversals are pure filters

§ PathRetraction: traversers shed unneeded path information, reducing path footprint, increasing likelihood of bulking
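The two rewrites named first in this list can be sketched as simple pattern replacements over a step list. This is a toy model in Python, not TinkerPop's implementation: a traversal is assumed to be a list of step names, the real strategies check more conditions than shown here, and `incident_to_adjacent`, `adjacent_to_incident`, and `optimize` are hypothetical function names.

```python
# Toy model: steps are strings; each rule returns a rewritten copy.

def incident_to_adjacent(steps):
    """Collapse outE().inV() into out(), and inE().outV() into in()."""
    pairs = {("outE", "inV"): "out", ("inE", "outV"): "in"}
    result, i = [], 0
    while i < len(steps):
        pair = tuple(steps[i:i + 2])
        if pair in pairs:
            result.append(pairs[pair])
            i += 2
        else:
            result.append(steps[i])
            i += 1
    return result


def adjacent_to_incident(steps):
    """Rewrite out().count() as outE().count(): counting edges needs no vertex lookup."""
    swaps = {"out": "outE", "in": "inE", "both": "bothE"}
    result = list(steps)
    for i in range(len(result) - 1):
        if result[i] in swaps and result[i + 1] == "count":
            result[i] = swaps[result[i]]
    return result


def optimize(steps, rules=(incident_to_adjacent, adjacent_to_incident)):
    """Apply each rewrite rule in order."""
    for rule in rules:
        steps = rule(steps)
    return steps
```

The rules compose: `optimize(["V", "outE", "inV", "count"])` first becomes `["V", "out", "count"]` and then `["V", "outE", "count"]`, while a traversal that still needs the adjacent vertices, such as `["V", "outE", "inV", "values"]`, stops at `["V", "out", "values"]`.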


Provider Optimizations

Graph system-specific graph traversals

Sqlg

§ Implementation of Apache TinkerPop over RDBMS

– PostgreSQL

– HSQLDB (HyperSQL Database)

– H2 Database Engine

§ Optimizes Gremlin by reducing the number of calls to the RDBMS

§ Analyze the steps and, where possible, combine them into a single SqlgGraphStepCompiled or SqlgVertexStepCompiled
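The general idea of folding several steps into one database call can be sketched as follows. This Python toy is not Sqlg's code and makes no claim about the SQL that Sqlg actually generates: the step tuples, the table-per-label layout, and the `compile_to_sql` function are all assumptions made for illustration.

```python
# Toy model: steps are tuples such as ("V", "person") or ("has", "age", 30).

def compile_to_sql(steps):
    """Fold a leading V(label) step and the has() steps after it into one query.

    Returns (sql, params, remaining_steps). Only equality filters are handled.
    """
    (name, label), *rest = steps
    if name != "V":
        raise ValueError("toy compiler expects the traversal to start with V(label)")
    clauses, params, remaining = [], [], []
    for i, step in enumerate(rest):
        if step[0] != "has":
            # First step that cannot be folded: leave it and the rest uncompiled.
            remaining = rest[i:]
            break
        clauses.append(f'"{step[1]}" = ?')
        params.append(step[2])
    # Identifiers are interpolated directly, which is acceptable only in a toy.
    sql = f'SELECT * FROM "{label}"'
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params, remaining
```

Instead of fetching every row for the label and filtering step by step, the three leading steps of `[("V", "person"), ("has", "age", 30), ("has", "name", "marko"), ("out", "knows")]` become a single parameterized query, and only `("out", "knows")` is left for further processing.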


Sqlg

Maintainer Pieter Martin

License MIT

Latest Release 1.3.2, November 2016

https://github.com/pietermartin/sqlg


TitanDB

§ Scalable graph database distributed on multi-machine clusters

§ Pluggable storage backends

– Apache Cassandra®

– Apache HBase®

§ Pluggable index backends

– Apache Solr™

– Elasticsearch™


TitanDB™

Maintainer DataStax

License Apache

Latest Release 1.0, November 2015

https://titandb.io


TitanDB + ScyllaDB storage backend

§ Scylla is a drop-in replacement for Apache Cassandra 2.1

– Higher throughput, lower latency

– C++ implementation, I/O scheduler

§ Scylla on IBM Compose (beta)

– https://www.compose.com/scylladb

§ Titan 1.0 compatibility starting with Scylla 1.3


ScyllaDB™

Maintainer ScyllaDB

License AGPL

Latest Release 1.5, December 2016

https://scylladb.com


IBM Graph

§ Fully-managed, Apache TinkerPop compatible OLTP graph database

§ Focus on your data, not on install and operations

§ #sleepMore


IBM Graph

Maintainer IBM

License Commercial

Latest Release GA, July 2016

https://ibm.biz/IBMGraph


On the Horizon

More Apache TinkerPop-enabled providers in development

Unipop

§ Data federation and virtualization engine

– Elasticsearch®

– JDBC

§ Models your data as a "virtual" graph

§ Uses Gremlin as graph query language


Unipop

Maintainer Sean Barzilay, Ran Magen

License Apache

Latest Release 0.2, September 2016

https://github.com/unipop-graph/unipop


Apache S2Graph (incubating)

§ A graph database designed for distributed and scalable management of highly interconnected data at web scale

§ Built with Apache HBase, Scala

§ S2Graph powers 20+ services in production at Kakao (mobile messaging app)

§ Apache TinkerPop support coming soon [JIRA S2GRAPH-72]


Apache S2Graph (incubating)

Maintainer Apache Software Foundation

License Apache

Latest Release 0.1, October 2016

https://s2graph.incubator.apache.org


HGraphDB

§ Apache HBase as an Apache TinkerPop Graph Database

§ Allows user-supplied ids

§ Integration with Apache Giraph for OLAP


HGraphDB

Maintainer Robert Yokota

License Apache

Latest Release 0.4.12, January 2017

https://github.com/rayokota/hgraphdb


JanusGraph

§ Fork of TitanDB code base

§ Scalable graph database distributed on multi-machine clusters with pluggable storage and indexing

§ Vendor-neutral, open community with open governance


JanusGraph™

Maintainer Linux Foundation

License Apache

First Release Planned 1Q 2017

https://janusgraph.org


Acknowledgements


§ The Crew from Aurelius

§ The Apache Software Foundation

§ The Linux Foundation

§ Ketrina Yim

Thank you!