Accumulo Summit 2014: Open Source Graph Analysis and Visualization Powered by Accumulo
DESCRIPTION
Lumify is a relatively new open source platform for big data analysis and visualization, designed to help organizations derive actionable insights from the large volumes of diverse data flowing through their enterprise. Using popular big data tools like Hadoop, Accumulo, and Storm, it ingests and integrates many kinds of data, from unstructured text documents and structured datasets to images and video. Several open source analytic tools (including Tika, OpenNLP, CLAVIN, OpenCV, and ElasticSearch) are used to enrich the data, increase its discoverability, and automatically uncover hidden connections. All information is stored in a secure graph database implemented on top of Accumulo to support cell-level security of all data and metadata elements. A modern, browser-based user interface enables analysts to explore and manipulate their data, discovering subtle relationships and drawing critical new insights. In addition to full-text search, geospatial mapping, and multimedia processing, Lumify features a powerful graph visualization supporting sophisticated link analysis and complex knowledge representation.
TRANSCRIPT
Open Source Graph Analysis and Visualization Powered by Accumulo
Jeff Kunkle, June 12, 2014
Lumify is an open source big data analysis and visualization platform powered by Accumulo
Built on Scalable Open Source Tech
Hadoop CDH 4
Accumulo
ElasticSearch
tesseract, CLAVIN, CMU Sphinx, OpenNLP, OpenCV, ffmpeg
Storm
Secure Graph
custom code
Secure Graph: a secure graph abstraction layer atop Accumulo
Vertices Table Format
Row ID        Column Family  Column Qualifier   Value          Description
V[vertex id]  V              -                  -              Vertex existence and visibility
V[vertex id]  EOUT           [edge id]          [label]        Out edges
V[vertex id]  VOUT           [vertex id]        [edge label]   Out vertex
V[vertex id]  EIN            [edge id]          [label]        In edges
V[vertex id]  VIN            [vertex id]        [edge label]   In vertex
V[vertex id]  PROP           [prop name + key]  [prop value]   Property
V[vertex id]  PROPMETA       [prop name + key]  [prop meta]    Property metadata
Edges Table Format
Row ID      Column Family  Column Qualifier   Value          Description
E[edge id]  E              -                  -              Edge existence and visibility
E[edge id]  VOUT           [vertex id]        -              Out vertex
E[edge id]  VIN            [vertex id]        -              In vertex
E[edge id]  PROP           [prop name + key]  [prop value]   Property
E[edge id]  PROPMETA       [prop name + key]  [prop meta]    Property metadata
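The two table formats can be sketched as plain functions that flatten a vertex or an edge into (row, column family, column qualifier, value) entries. This is only an illustration of the layout, not Secure Graph's actual code: the names Entry, vertex_entries, and edge_entries are hypothetical, the row ID is assumed to be the "V" or "E" prefix followed by the raw id, the "prop name + key" qualifier is assumed to be a plain concatenation, and PROPMETA entries and visibilities are left out for brevity.

```python
from typing import Dict, List, NamedTuple, Tuple


class Entry(NamedTuple):
    """One key/value entry: row ID, column family, column qualifier, value."""
    row: str
    family: str
    qualifier: str
    value: str


# Each edge reference is (edge id, edge label, id of the vertex at the other end).
EdgeRef = Tuple[str, str, str]


def vertex_entries(vertex_id: str,
                   out_edges: List[EdgeRef],
                   in_edges: List[EdgeRef],
                   props: Dict[Tuple[str, str], str]) -> List[Entry]:
    """Flatten one vertex into entries following the vertices table format."""
    row = "V" + vertex_id  # assumed row ID convention: prefix + raw id
    entries = [Entry(row, "V", "", "")]  # existence marker
    for edge_id, label, other_vertex in out_edges:
        entries.append(Entry(row, "EOUT", edge_id, label))
        entries.append(Entry(row, "VOUT", other_vertex, label))
    for edge_id, label, other_vertex in in_edges:
        entries.append(Entry(row, "EIN", edge_id, label))
        entries.append(Entry(row, "VIN", other_vertex, label))
    for (name, key), value in props.items():
        entries.append(Entry(row, "PROP", name + key, value))
    return entries


def edge_entries(edge_id: str,
                 out_vertex: str,
                 in_vertex: str,
                 props: Dict[Tuple[str, str], str]) -> List[Entry]:
    """Flatten one edge into entries following the edges table format."""
    row = "E" + edge_id
    entries = [
        Entry(row, "E", "", ""),             # existence marker
        Entry(row, "VOUT", out_vertex, ""),  # out vertex, empty value
        Entry(row, "VIN", in_vertex, ""),    # in vertex, empty value
    ]
    for (name, key), value in props.items():
        entries.append(Entry(row, "PROP", name + key, value))
    return entries
```

Because each vertex row carries both its edge ids and the ids of neighboring vertices, a single row read is enough to list a vertex's neighbors without touching the edges table.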
Show key features of Lumify
Show how Accumulo is used to implement the features
1. graph security
2. sandboxed workspaces
3. index security
Key Concepts
Ontology: structure for organizing information (i.e., your data model)
Entities: any “thing” you want to represent (e.g., person, place, event)
Relationships: a link between two entities (e.g., leader of, works for, sibling of)
Properties: data about an entity (e.g., first name, last name, date of birth)
Graph: collection of entities and the relationships between them
graph security
Walmart (vertex V3)
Row ID  Column Family  Column Qualifier  Visibility  Value
V3      V              -                 U           -
V3      EIN            E1                TS          Is leader
V3      VIN            V1                TS          Is leader
V3      EIN            E3                S           works for
V3      VIN            V4                S           works for
V3      EOUT           E2                U           headquartered in
V3      VOUT           V2                U           headquartered in
V3      PROP           name1             U           Walmart
V3      PROP           founded1          S           1962-01-01
User with U, S, and TS visibility
Row ID  Column Family  Column Qualifier  Visibility  Value             Visible
V3      V              -                 U           -                 yes
V3      EIN            E1                TS          Is leader         yes
V3      VIN            V1                TS          Is leader         yes
V3      EIN            E3                S           works for         yes
V3      VIN            V4                S           works for         yes
V3      EOUT           E2                U           headquartered in  yes
V3      VOUT           V2                U           headquartered in  yes
V3      PROP           name1             U           Walmart           yes
V3      PROP           founded1          S           1962-01-01        yes
User with U and S visibility
Row ID  Column Family  Column Qualifier  Visibility  Value             Visible
V3      V              -                 U           -                 yes
V3      EIN            E1                TS          Is leader         no
V3      VIN            V1                TS          Is leader         no
V3      EIN            E3                S           works for         yes
V3      VIN            V4                S           works for         yes
V3      EOUT           E2                U           headquartered in  yes
V3      VOUT           V2                U           headquartered in  yes
V3      PROP           name1             U           Walmart           yes
V3      PROP           founded1          S           1962-01-01        yes
User with U visibility
Row ID  Column Family  Column Qualifier  Visibility  Value             Visible
V3      V              -                 U           -                 yes
V3      EIN            E1                TS          Is leader         no
V3      VIN            V1                TS          Is leader         no
V3      EIN            E3                S           works for         no
V3      VIN            V4                S           works for         no
V3      EOUT           E2                U           headquartered in  yes
V3      VOUT           V2                U           headquartered in  yes
V3      PROP           name1             U           Walmart           yes
V3      PROP           founded1          S           1962-01-01        no
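The effect of cell-level visibility on the V3 row can be sketched as a simple filter over its entries. This is a simplified model rather than Accumulo's API: each entry is assumed to carry a single visibility label, the user's authorizations are modeled as a plain set, and the names ROWS and visible_rows are hypothetical. In Accumulo itself the filtering happens on the server side, based on the authorizations passed with the scan.

```python
from typing import List, Set, Tuple

# (row ID, column family, column qualifier, visibility, value) for vertex V3
Row = Tuple[str, str, str, str, str]

ROWS: List[Row] = [
    ("V3", "V", "", "U", ""),
    ("V3", "EIN", "E1", "TS", "Is leader"),
    ("V3", "VIN", "V1", "TS", "Is leader"),
    ("V3", "EIN", "E3", "S", "works for"),
    ("V3", "VIN", "V4", "S", "works for"),
    ("V3", "EOUT", "E2", "U", "headquartered in"),
    ("V3", "VOUT", "V2", "U", "headquartered in"),
    ("V3", "PROP", "name1", "U", "Walmart"),
    ("V3", "PROP", "founded1", "S", "1962-01-01"),
]


def visible_rows(rows: List[Row], authorizations: Set[str]) -> List[Row]:
    """Keep only the entries whose visibility label the user is authorized for."""
    return [row for row in rows if row[3] in authorizations]


if __name__ == "__main__":
    for auths in ({"U", "S", "TS"}, {"U", "S"}, {"U"}):
        shown = visible_rows(ROWS, auths)
        print(sorted(auths), "sees", len(shown), "of", len(ROWS), "entries")
```

A user holding only U still sees that the vertex exists, its name, and where it is headquartered, but none of the S or TS relationships or the founding date.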
sandboxed workspaces
Zarka de Mexico Vertex (V3)
Row ID  Column Family  Column Qualifier  Visibility  Value
V3      V              -                 U           -
V3      EIN            E1                TS          Is leader
V3      VIN            V1                TS          Is leader
V3      EIN            E3                S           works for
V3      VIN            V4                S           works for
V3      EOUT           E2                U           headquartered in
V3      VOUT           V2                U           headquartered in
V3      EIN            E8                S&WS1       works for
V3      VIN            V8                S&WS1       works for
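The S&WS1 visibility on the last two entries reads as "requires both S and the workspace label WS1", which is how an edge added inside one workspace can stay hidden from users outside it. A minimal sketch of evaluating such expressions follows. It is an assumption-laden stand-in for Accumulo's own visibility evaluation: is_visible is a hypothetical name, only labels, &, |, and parentheses are handled, operators are applied left to right, and malformed expressions are not validated.

```python
import re
from typing import List, Set


def is_visible(expression: str, authorizations: Set[str]) -> bool:
    """Evaluate a visibility expression such as "S&WS1" against a set of labels."""
    if expression == "":
        return True  # an empty expression is treated as visible to everyone
    tokens: List[str] = re.findall(r"[A-Za-z0-9_]+|[&|()]", expression)
    pos = 0

    def parse_term() -> bool:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_expr()
            pos += 1  # skip the closing ")"
            return result
        return token in authorizations

    def parse_expr() -> bool:
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            operator = tokens[pos]
            pos += 1
            right = parse_term()
            result = (result and right) if operator == "&" else (result or right)
        return result

    return parse_expr()


if __name__ == "__main__":
    print(is_visible("S&WS1", {"U", "S", "WS1"}))  # user inside workspace WS1
    print(is_visible("S&WS1", {"U", "S"}))         # same clearance, outside WS1
```

Under this reading, publishing a sandboxed change would amount to rewriting the entry without the workspace label; that step is an inference from the table, not something the slide spells out.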
index security
Implemented in ElasticSearch
• Use parent/child document indexing. One document per property.
• Store visibility with indexed docs.
• Custom-developed ES filter uses Accumulo’s visibility evaluation code to filter out documents prior to query evaluation.
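The indexing approach in the bullets above can be sketched with an in-memory stand-in for the search index: one child document per property, each storing its own visibility, with unauthorized documents removed before the query is matched. Nothing here is ElasticSearch or Lumify code. PROPERTY_DOCS and search are hypothetical names, visibilities are assumed to be single labels, and the substring match stands in for real query evaluation.

```python
from typing import Dict, List, Set

# One child document per property; "parent" is the id of the owning element.
PROPERTY_DOCS: List[Dict[str, str]] = [
    {"parent": "V3", "name": "name", "value": "Walmart", "visibility": "U"},
    {"parent": "V3", "name": "founded", "value": "1962-01-01", "visibility": "S"},
]


def search(docs: List[Dict[str, str]], term: str,
           authorizations: Set[str]) -> List[str]:
    """Return ids of parent elements that have a visible property matching term."""
    # Step 1: drop documents the user may not see, before any matching happens.
    allowed = [doc for doc in docs if doc["visibility"] in authorizations]
    # Step 2: evaluate the query only against what is left.
    needle = term.lower()
    return sorted({doc["parent"] for doc in allowed
                   if needle in doc["value"].lower()})


if __name__ == "__main__":
    print(search(PROPERTY_DOCS, "1962", {"U"}))       # S-only property is hidden
    print(search(PROPERTY_DOCS, "1962", {"U", "S"}))  # property is visible
```

Filtering before matching matters because a hit on a hidden property would otherwise reveal that the value exists, even if the value itself were never displayed.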
demo
Questions?
learn more at www.lumify.io
Jeff Kunkle @kunklejr
References
SecureGraph • http://securegraph.org • http://youtu.be/JMde_jFDM2M