Accumulo Summit 2014: Accumulo-backed TinkerPop Implementation
DESCRIPTION
As graph processing grows as a field, eventually standards will be created. The TinkerPop graph processing stack is one such potential standard. The TinkerPop stack contains an algorithm engine, a scripting engine and a RESTful service for accessing graphs. At the base of TinkerPop is Blueprints, an interface for accessing and creating property graphs. Blueprints has already been implemented with several different backing technologies (e.g., relational databases, RDF triple stores, graph databases) and implementations (e.g., JDBC-based, OpenRDF Sail, and Neo4j). This presentation will discuss our implementation of the Blueprints API backed by Accumulo to enable storage of arbitrarily large, distributed graphs. Our implementation falls between two extremes: distributed graph processing systems that require the entire graph to fit within the available RAM of the cluster, and batch-oriented systems that incur significant disk I/O costs during execution and generally handle iterative algorithms poorly. We will discuss the benefits of supporting the TinkerPop API and the design and performance trade-offs we faced when developing the Accumulo backend and integrating with the Hadoop MapReduce framework. We aim to merge the advantages of the TinkerPop software ecosystem with the scalability and fault-tolerance of Accumulo and provide a robust, turn-key solution for certain classes of large-scale, graph-related challenges.
TRANSCRIPT
Agenda
Introduction to TinkerPop
Detailed Implementation
Obstacles
Overcoming Obstacles
Map Reduce Integration
Performance
Background
Associate Professional at The Johns Hopkins Applied Physics Laboratory
Bachelor of Science in Computer Science with a minor in Mathematics from the University of Delaware
Pursuing a Master's in Computer Science with a focus on Distributed Systems at the Whiting School of Engineering
TinkerPop Blueprints
Foundational technology for a complete graph stack
Extensive test suite to ensure implementations follow all the rules required.
Only a simple API: getVertex, getEdge, setProperty, getProperty
Multiple Interfaces with incremental features
TinkerPop Blueprints Graph API
Graph Creation
// Connection settings for the Accumulo-backed graph
Configuration cfg = new AccumuloGraphConfiguration()
    .instance("accumulo").user("user").zkHosts("zk1")
    .password("password".getBytes()).name("myGraph");
Graph graph = GraphFactory.open(cfg);
// Two vertices, each with a "name" property
Vertex v1 = graph.addVertex("1");
v1.setProperty("name", "Alice");
Vertex v2 = graph.addVertex("2");
v2.setProperty("name", "Bob");
// Edge from v1 (out vertex) to v2 (in vertex) labeled "knows"
Edge e1 = graph.addEdge("E1", v1, v2, "knows");
e1.setProperty("since", new Date());
Trade-off Spectrum
Consistency
Performance
Accumulo Implementation
Base naïve implementation passes all required TinkerPop tests
    Far right of the spectrum
    As consistent as you can get
Table Structure
    Edge and Vertex
    Edge and Vertex Index table
    Metadata Table for indexes
Table Structure
Vertex

| Row ID | Column Family | Column Qualifier | Value |
|---|---|---|---|
| VertexID | Label Flag | Exists Flag | [empty] |
| VertexID | INVERTEX | OutVertexID_EdgeID | Edge Label |
| VertexID | OUTVERTEX | InVertexID_EdgeID | Edge Label |
| VertexID | Property Key | [empty] | Serialized Value |

Edge

| Row ID | Column Family | Column Qualifier | Value |
|---|---|---|---|
| EdgeID | Label Flag | InVertexID_OutVertexID | Edge Label |
| EdgeID | Property Key | [empty] | Serialized Value |
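To make the vertex and edge table layouts concrete, the following is a minimal sketch in plain Java of which entries a single edge might produce. It is not the implementation's code and makes no Accumulo client calls. The class name, the `Entry` type, and the marker strings `LABEL` and `EXISTS` are hypothetical placeholders. It also assumes one reading of the layout: the INVERTEX entry lives in the row of the edge's in vertex and names the out vertex in its qualifier, and the OUTVERTEX entry is the mirror image.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the table layout in plain Java; no Accumulo client calls are made.
final class GraphTableLayoutSketch {

    // Placeholder marker strings (assumptions; the slides do not give the literal values).
    static final String LABEL_FLAG = "LABEL";
    static final String EXISTS_FLAG = "EXISTS";
    static final String IN_VERTEX = "INVERTEX";
    static final String OUT_VERTEX = "OUTVERTEX";

    // One key/value entry: target table, row ID, column family, column qualifier, value.
    static final class Entry {
        final String table, row, family, qualifier, value;

        Entry(String table, String row, String family, String qualifier, String value) {
            this.table = table;
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        @Override
        public String toString() {
            return table + ": " + row + " " + family + ":" + qualifier + " -> " + value;
        }
    }

    // Existence marker written to the vertex table for a new vertex.
    static Entry vertexEntry(String vertexId) {
        return new Entry("vertex", vertexId, LABEL_FLAG, EXISTS_FLAG, "");
    }

    // Entries for one edge going from outVertexId to inVertexId.
    static List<Entry> edgeEntries(String edgeId, String outVertexId, String inVertexId, String label) {
        List<Entry> entries = new ArrayList<>();
        // Row of the in vertex: qualifier names the out vertex and the edge.
        entries.add(new Entry("vertex", inVertexId, IN_VERTEX, outVertexId + "_" + edgeId, label));
        // Row of the out vertex: qualifier names the in vertex and the edge.
        entries.add(new Entry("vertex", outVertexId, OUT_VERTEX, inVertexId + "_" + edgeId, label));
        // Edge table: qualifier is InVertexID_OutVertexID, value is the edge label.
        entries.add(new Entry("edge", edgeId, LABEL_FLAG, inVertexId + "_" + outVertexId, label));
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(vertexEntry("1"));
        System.out.println(vertexEntry("2"));
        for (Entry e : edgeEntries("E1", "1", "2", "knows")) {
            System.out.println(e);
        }
    }
}
```

Storing the edge under both endpoint rows means either vertex can list its incident edges with a scan of its own row, at the cost of writing each edge three times.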
Graph Access and Index Creation/Use
// Access before Index
for (Vertex v : graph.getVertices()) {
    String name = v.getProperty("name");
}
((KeyIndexableGraph) graph)
    .createKeyIndex("name", Vertex.class);
// Access after Index
for (Vertex v : graph.getVertices()) {
    String name = v.getProperty("name");
}
Table Structure - Continued
Indexes

| Row | Column Family | Column Qualifier | Value |
|---|---|---|---|
| Serialized Value | Property Key | VertexID | [empty] |

Metadata

| Row | Column Family | Column Qualifier | Value |
|---|---|---|---|
| Index Name | Index Class | [empty] | [empty] |
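Because the index table is keyed by serialized property value, then property key, then vertex ID, finding all vertices with a given key/value pair becomes a short range scan over sorted keys rather than a full table scan. The sketch below models that idea in plain Java with a `TreeSet` standing in for Accumulo's sorted key space. The class name, method names, and the separator character are assumptions made for illustration, not part of the implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Plain-Java model of the index layout: row = serialized value,
// column family = property key, column qualifier = vertex ID.
final class KeyIndexSketch {

    // Separator between key parts; assumes it never occurs inside values, keys, or IDs.
    private static final char SEP = '\u0000';

    // Sorted set standing in for the sorted key space of the index table.
    private final TreeSet<String> indexKeys = new TreeSet<>();

    // Record that vertexId has propertyKey set to serializedValue.
    void index(String propertyKey, String serializedValue, String vertexId) {
        indexKeys.add(serializedValue + SEP + propertyKey + SEP + vertexId);
    }

    // Range scan: all vertex IDs whose propertyKey equals serializedValue.
    List<String> lookup(String propertyKey, String serializedValue) {
        String prefix = serializedValue + SEP + propertyKey + SEP;
        List<String> vertexIds = new ArrayList<>();
        // Every key that starts with the prefix sorts inside [prefix, prefix + MAX_VALUE).
        for (String key : indexKeys.subSet(prefix, prefix + Character.MAX_VALUE)) {
            vertexIds.add(key.substring(prefix.length()));
        }
        return vertexIds;
    }

    public static void main(String[] args) {
        KeyIndexSketch idx = new KeyIndexSketch();
        idx.index("name", "Alice", "1");
        idx.index("name", "Bob", "2");
        System.out.println(idx.lookup("name", "Alice"));
    }
}
```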
Obstacles
Existence checking is expensive
    Required for TinkerPop test suite
Writing every graph object out is expensive
Building indexes post-ingest is expensive
    Blocking, full table scan
Consistency is expensive
Overcoming Obstacles
Give more power to users who know they are using an Accumulo Graph
Ingest Improvements
    Give option to disable existence checks
    Allow manual batching
    Specialized ingest path
Traversal Improvements
    Attribute preloading
    Property caching
    Element caching
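As one illustration of the property caching idea, the sketch below keeps recently read property values in a bounded, least-recently-used map so that repeated reads during a traversal avoid a round trip to the backing table. It is a generic plain-Java sketch, not the implementation's cache: the class name, the `loader` callback standing in for a table read, and the capacity parameter are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Bounded LRU cache for element properties (illustrative only).
final class PropertyCacheSketch {

    private final int capacity;
    // Stands in for a read against the backing table: (elementId, key) -> value.
    private final BiFunction<String, String, Object> loader;
    private final Map<String, Object> cache;
    private int loads = 0;

    PropertyCacheSketch(int capacity, BiFunction<String, String, Object> loader) {
        this.capacity = capacity;
        this.loader = loader;
        // Access-ordered LinkedHashMap evicts the least recently used entry when full.
        this.cache = new LinkedHashMap<String, Object>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                return size() > PropertyCacheSketch.this.capacity;
            }
        };
    }

    // Return the cached value if present; otherwise load it and remember it.
    Object getProperty(String elementId, String key) {
        String cacheKey = elementId + '\u0000' + key;
        if (cache.containsKey(cacheKey)) {
            return cache.get(cacheKey);
        }
        loads++;
        Object value = loader.apply(elementId, key);
        cache.put(cacheKey, value);
        return value;
    }

    // Number of reads that had to go to the backing store.
    int loads() {
        return loads;
    }

    public static void main(String[] args) {
        PropertyCacheSketch cache = new PropertyCacheSketch(2, (id, key) -> id + ":" + key);
        cache.getProperty("1", "name");
        cache.getProperty("1", "name");
        System.out.println("backing reads: " + cache.loads());
    }
}
```

A cache like this trades consistency for performance: a value changed by another client stays stale until the entry is evicted, which is why such features are best left as options the user enables deliberately.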
Simple Bulk Ingest
// Will migrate to BatchGraph
AccumuloBulkIngester g = new AccumuloBulkIngester(cfg);
PropertyBuilder v1 = g.addVertex("ID1");
PropertyBuilder v2 = g.addVertex("ID2");
PropertyBuilder edge = g.addEdge("ID1", "ID2", "knows");
v1.add("name", "alice");
v2.add("name", "bob");
edge.add("since", new Date());
Map Reduce Integration
In your Tool
j.setInputFormatClass(VertexInputFormat.class);
VertexInputFormat.setAccumuloGraphConfiguration(
    new AccumuloGraphConfiguration()
        .instance("accumulo").zkHosts("zk1").user("root")
        .password("secret".getBytes()).name("myGraph"));
In your Mapper
public void map(Text k, Vertex v, Context c) {
    System.out.println(v.getId().toString());
}
Results
| 2 Nodes | 4 Nodes | 8 Nodes |
|---|---|---|
| 20 Hours 9 Minutes | 13 Hours 47 Minutes | 7 Hours 4 Minutes |

Cluster Stats
    8 Node Cluster
    64 GB RAM
    Quad-Core Xeon Processor 2.50GHz 10MB
    2x 4 TB 6.0Gb/s 7200 RPM Drives
    1 Gb/s Networking
Accumulo 1.5.1, Hadoop 2.0.0 – MR1
Stanford SNAP Friendster Graph
    65,608,366 Vertices
    1,806,067,135 Edges

| 2 Nodes | 4 Nodes | 8 Nodes |
|---|---|---|
| 55 Minutes | 29 Minutes | 15 Minutes |
Vertex Iteration
Ingest
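The two timing rows can be turned into speedup and parallel-efficiency figures with simple arithmetic. The sketch below does this for both rows, referring to them as the hours-scale row and the minutes-scale row. The class and method names are hypothetical, and efficiency is defined here as speedup divided by the factor by which the node count grew. Going from 2 to 8 nodes works out to roughly 2.9x for the hours-scale row and roughly 3.7x for the minutes-scale row.

```java
// Speedup and parallel efficiency computed from the reported wall-clock times.
final class ScalingSketch {

    // Convert an "H hours M minutes" figure to minutes.
    static int minutes(int hours, int minutes) {
        return hours * 60 + minutes;
    }

    // How many times faster a run is than the baseline run.
    static double speedup(int baselineMinutes, int runMinutes) {
        return (double) baselineMinutes / runMinutes;
    }

    // Speedup divided by the growth in node count; 1.0 would be perfect linear scaling.
    static double efficiency(int baselineNodes, int baselineMinutes, int nodes, int runMinutes) {
        return speedup(baselineMinutes, runMinutes) / ((double) nodes / baselineNodes);
    }

    public static void main(String[] args) {
        int[] nodes = {2, 4, 8};
        int[] hoursScaleRow = {minutes(20, 9), minutes(13, 47), minutes(7, 4)};
        int[] minutesScaleRow = {55, 29, 15};
        for (int i = 1; i < nodes.length; i++) {
            System.out.printf("%d nodes: hours-scale speedup %.2f (efficiency %.2f), "
                    + "minutes-scale speedup %.2f (efficiency %.2f)%n",
                    nodes[i],
                    speedup(hoursScaleRow[0], hoursScaleRow[i]),
                    efficiency(nodes[0], hoursScaleRow[0], nodes[i], hoursScaleRow[i]),
                    speedup(minutesScaleRow[0], minutesScaleRow[i]),
                    efficiency(nodes[0], minutesScaleRow[0], nodes[i], minutesScaleRow[i]));
        }
    }
}
```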
Conclusion
Simple, easy-to-read graph API
Gives developers a lot of tuning points for their implementations
Performance is “good enough”
    Not meant for high-performance, specialized solutions
Quick to develop new ideas and investigate your graph
Easy to integrate and already integrated
    Low effort to get REST access to your graph
Future
Polish and open source
Iterators
Locality Groups
Addressing Security
Graph Query
Extending MapReduce Integration
Upgrading to Accumulo 1.6, TinkerPop 2.5
    Conditional Mutations
    Table namespaces
Resources
http://www.tinkerpop.com/
http://snap.stanford.edu/data/com-Friendster.html