Accumulo Summit 2014: Open Source Graph Analysis and Visualization Powered by Accumulo

Uploaded by accumulo-summit on 01-Dec-2014

DESCRIPTION

Lumify is a relatively new open source platform for big data analysis and visualization, designed to help organizations derive actionable insights from the large volumes of diverse data flowing through their enterprise. Built on popular big data tools such as Hadoop, Accumulo, and Storm, it ingests and integrates many kinds of data, from unstructured text documents and structured datasets to images and video. Several open source analytic tools (including Tika, OpenNLP, CLAVIN, OpenCV, and ElasticSearch) enrich the data, increase its discoverability, and automatically uncover hidden connections. All information is stored in a secure graph database implemented on top of Accumulo, which provides cell-level security for every data and metadata element. A modern, browser-based user interface lets analysts explore and manipulate their data, discover subtle relationships, and draw critical new insights. In addition to full-text search, geospatial mapping, and multimedia processing, Lumify offers a powerful graph visualization that supports sophisticated link analysis and complex knowledge representation.

TRANSCRIPT

Page 1: Accumulo Summit 2014: Open Source Graph Analysis and Visualization powered by Accumulo

Open Source Graph Analysis and Visualization Powered by Accumulo

Jeff Kunkle, June 12, 2014

Page 2

Lumify is an open source big data analysis and visualization platform powered by Accumulo.

Page 3

Built on Scalable Open Source Tech

•  Hadoop (CDH 4)
•  Accumulo
•  ElasticSearch
•  Storm
•  tesseract, CLAVIN, CMU Sphinx, OpenNLP, OpenCV, ffmpeg
•  SecureGraph
•  custom code

Page 4

SecureGraph: a secure graph abstraction layer atop Accumulo

Page 5

Vertices Table Format

Row ID         Column Family   Column Qualifier    Value          Description
V[vertex id]   V               -                   -              Vertex existence and visibility
V[vertex id]   EOUT            [edge id]           [label]        Out edges
V[vertex id]   VOUT            [vertex id]         [edge label]   Out vertex
V[vertex id]   EIN             [edge id]           [label]        In edges
V[vertex id]   VIN             [vertex id]         [edge label]   In vertex
V[vertex id]   PROP            [prop name + key]   [prop value]   Property
V[vertex id]   PROPMETA        [prop name + key]   [prop meta]    Property Metadata
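To make the layout concrete, here is a minimal Python sketch (an illustration, not SecureGraph's code) that turns one vertex into the cells listed in the vertices table. The helper name `vertex_cells`, the tuple shape, and the plain concatenation of property name and key (as in "name1" on the later slides) are assumptions; visibility labels and PROPMETA cells are left out.

```python
from typing import Dict, List, Tuple

# One Accumulo-style cell: (row id, column family, column qualifier, value).
# Visibility labels and PROPMETA cells are omitted to keep the sketch short.
Cell = Tuple[str, str, str, str]


def vertex_cells(vertex_id: str,
                 props: Dict[Tuple[str, str], str],
                 out_edges: List[Tuple[str, str, str]],
                 in_edges: List[Tuple[str, str, str]]) -> List[Cell]:
    """Lay out one vertex as cells of the vertices table (hypothetical helper).

    props maps (property name, property key) -> value; the qualifier here is
    name + key concatenated (an assumption; the real encoding may differ).
    out_edges / in_edges hold (edge id, edge label, other vertex id) triples.
    """
    row = "V" + vertex_id
    cells: List[Cell] = [(row, "V", "", "")]  # existence/visibility marker
    for edge_id, label, other in out_edges:
        cells.append((row, "EOUT", edge_id, label))
        cells.append((row, "VOUT", other, label))
    for edge_id, label, other in in_edges:
        cells.append((row, "EIN", edge_id, label))
        cells.append((row, "VIN", other, label))
    for (name, key), value in props.items():
        cells.append((row, "PROP", name + key, value))
    # Accumulo keeps keys sorted; tuple ordering mimics row/family/qualifier order.
    return sorted(cells)


cells = vertex_cells("3", {("name", "1"): "Walmart"},
                     out_edges=[("e2", "headquartered in", "2")], in_edges=[])
for cell in cells:
    print(cell)
```

Because every cell for a vertex shares the row V[vertex id], a single Accumulo row scan returns the vertex's existence marker, its adjacency, and its properties together.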

Page 6

Edges Table Format

Row ID         Column Family   Column Qualifier    Value          Description
E[edge id]     E               -                   -              Edge existence and visibility
E[edge id]     VOUT            [vertex id]         -              Out vertex
E[edge id]     VIN             [vertex id]         -              In vertex
E[edge id]     PROP            [prop name + key]   [prop value]   Property
E[edge id]     PROPMETA        [prop name + key]   [prop meta]    Property Metadata
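Taken together, the two formats mean that one edge touches three rows: its own E row plus the rows of both endpoint vertices. The hypothetical Python helper below (not SecureGraph's API) lists those cells. As in the edges table format, the edge row itself carries no label, so the label appears only in the vertex-table values; visibility and PROPMETA cells are omitted.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, str, str, str]  # (row id, column family, column qualifier, value)


def edge_cells(edge_id: str, label: str, out_vertex: str, in_vertex: str,
               props: Dict[Tuple[str, str], str]) -> Dict[str, List[Cell]]:
    """Return the cells one edge adds to each table (hypothetical helper)."""
    erow = "E" + edge_id
    edges_table: List[Cell] = [
        (erow, "E", "", ""),              # existence and visibility
        (erow, "VOUT", out_vertex, ""),   # out vertex
        (erow, "VIN", in_vertex, ""),     # in vertex
    ]
    for (name, key), value in props.items():
        edges_table.append((erow, "PROP", name + key, value))

    # The same edge is also recorded on both endpoint rows of the vertices
    # table, so either vertex's neighbors come back from a single row scan.
    vertices_table: List[Cell] = [
        ("V" + out_vertex, "EOUT", edge_id, label),
        ("V" + out_vertex, "VOUT", in_vertex, label),
        ("V" + in_vertex, "EIN", edge_id, label),
        ("V" + in_vertex, "VIN", out_vertex, label),
    ]
    return {"edges": edges_table, "vertices": vertices_table}


m = edge_cells("e2", "headquartered in", out_vertex="3", in_vertex="2", props={})
print(m["edges"])
print(m["vertices"])
```

The trade-off this layout suggests is extra writes (three rows per edge) in exchange for neighbor lookups that never have to join against the edges table.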

Page 7

Show key features of Lumify

Show how Accumulo is used to implement the features

Page 8

1. graph security
2. sandboxed workspaces
3. index security

Page 9

Key Concepts

Ontology: structure for organizing information (i.e., your data model)

Entities: any “thing” you want to represent (e.g., person, place, event)

Relationships: a link between two entities (e.g., leader of, works for, sibling of)

Properties: data about an entity (e.g., first name, last name, date of birth)

Graph: collection of entities and the relationships between them
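One way to picture how these concepts fit together is the small Python model below. The class names and the rule that every entity and relationship must use a type declared in the ontology are assumptions made for illustration, not Lumify's actual classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Ontology:
    """The data model: which entity concepts and relationship labels exist."""
    concepts: Set[str]
    relationships: Set[str]


@dataclass
class Entity:
    entity_id: str
    concept: str                                              # e.g. "person", "place"
    properties: Dict[str, str] = field(default_factory=dict)  # e.g. {"first name": ...}


@dataclass
class Relationship:
    label: str    # e.g. "works for"
    source: str   # entity id
    target: str   # entity id


@dataclass
class Graph:
    """Entities plus the relationships between them, checked against the ontology."""
    ontology: Ontology
    entities: Dict[str, Entity] = field(default_factory=dict)
    relationships: List[Relationship] = field(default_factory=list)

    def add_entity(self, entity: Entity) -> None:
        if entity.concept not in self.ontology.concepts:
            raise ValueError(f"unknown concept: {entity.concept}")
        self.entities[entity.entity_id] = entity

    def relate(self, label: str, source: str, target: str) -> None:
        if label not in self.ontology.relationships:
            raise ValueError(f"unknown relationship: {label}")
        if source not in self.entities or target not in self.entities:
            raise KeyError("both entities must be added before relating them")
        self.relationships.append(Relationship(label, source, target))


ontology = Ontology({"person", "company", "place"}, {"works for", "headquartered in"})
g = Graph(ontology)
g.add_entity(Entity("3", "company", {"name": "Walmart"}))
g.add_entity(Entity("2", "place"))
g.relate("headquartered in", "3", "2")
```

In the Accumulo layout above, each Entity would become a vertex row and each Relationship an edge row.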

Page 10

graph security

Page 11

Walmart (vertex V3)

Row ID   Column Family   Column Qualifier   Visibility   Value
V3       V               -                  U            -
V3       EIN             E1                 TS           Is leader
V3       VIN             V1                 TS           Is leader
V3       EIN             E3                 S            works for
V3       VIN             V4                 S            works for
V3       EOUT            E2                 U            headquartered in
V3       VOUT            V2                 U            headquartered in
V3       PROP            name1              U            Walmart
V3       PROP            founded1           S            1962-01-01

Page 12

User with U, S, and TS authorizations (every cell is visible)

Row ID   Column Family   Column Qualifier   Visibility   Value
V3       V               -                  U            -
V3       EIN             E1                 TS           Is leader
V3       VIN             V1                 TS           Is leader
V3       EIN             E3                 S            works for
V3       VIN             V4                 S            works for
V3       EOUT            E2                 U            headquartered in
V3       VOUT            V2                 U            headquartered in
V3       PROP            name1              U            Walmart
V3       PROP            founded1           S            1962-01-01

Page 13

User with U and S authorizations (the TS cells for E1 and V1 are hidden)

Row ID   Column Family   Column Qualifier   Visibility   Value
V3       V               -                  U            -
V3       EIN             E3                 S            works for
V3       VIN             V4                 S            works for
V3       EOUT            E2                 U            headquartered in
V3       VOUT            V2                 U            headquartered in
V3       PROP            name1              U            Walmart
V3       PROP            founded1           S            1962-01-01

Page 14

User with U authorization only (only the U cells remain visible)

Row ID   Column Family   Column Qualifier   Visibility   Value
V3       V               -                  U            -
V3       EOUT            E2                 U            headquartered in
V3       VOUT            V2                 U            headquartered in
V3       PROP            name1              U            Walmart
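The three views follow from one rule: a user sees a cell only if the user's authorizations satisfy the cell's visibility. Because every label on these slides is a single token, a set-membership test is enough for this minimal Python sketch; real Accumulo evaluates full boolean visibility expressions on the server side. The names `V3_CELLS` and `visible_cells` are illustrative only.

```python
from typing import List, Set, Tuple

# Row V3 from the slides: (column family, column qualifier, visibility, value).
V3_CELLS: List[Tuple[str, str, str, str]] = [
    ("V", "", "U", ""),
    ("EIN", "E1", "TS", "Is leader"),
    ("VIN", "V1", "TS", "Is leader"),
    ("EIN", "E3", "S", "works for"),
    ("VIN", "V4", "S", "works for"),
    ("EOUT", "E2", "U", "headquartered in"),
    ("VOUT", "V2", "U", "headquartered in"),
    ("PROP", "name1", "U", "Walmart"),
    ("PROP", "founded1", "S", "1962-01-01"),
]


def visible_cells(cells: List[Tuple[str, str, str, str]],
                  auths: Set[str]) -> List[Tuple[str, str, str, str]]:
    """Keep only the cells whose single-label visibility the user holds.

    Simplification: assumes each visibility is one label, as on the slides.
    """
    return [cell for cell in cells if cell[2] in auths]


for auths in ({"U", "S", "TS"}, {"U", "S"}, {"U"}):
    print(sorted(auths), len(visible_cells(V3_CELLS, auths)))
```

Run as-is, the loop reports 9, 7, and 4 visible cells, matching the three slides.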

Page 15

sandboxed workspaces

Page 16

Zarka de Mexico Vertex (V3)

Row ID   Column Family   Column Qualifier   Visibility   Value
V3       V               -                  U            -
V3       EIN             E1                 TS           Is leader
V3       VIN             V1                 TS           Is leader
V3       EIN             E3                 S            works for
V3       VIN             V4                 S            works for
V3       EOUT            E2                 U            headquartered in
V3       VOUT            V2                 U            headquartered in
V3       EIN             E8                 S&WS1        works for
V3       VIN             V8                 S&WS1        works for
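The S&WS1 rows show how a sandboxed workspace can be expressed with ordinary visibility: a cell labeled S&WS1 requires both the S and the WS1 authorizations, so (reading WS1 as the workspace's label) only users working in that workspace see edge E8. The Python sketch below checks such conjunctions and shows one hypothetical way to publish a sandboxed cell by dropping the workspace label; the `publish` step is an assumption, not something the slide shows.

```python
from typing import Set, Tuple

Cell = Tuple[str, str, str, str]  # (column family, column qualifier, visibility, value)


def can_see(expression: str, auths: Set[str]) -> bool:
    """Evaluate a conjunction-only visibility such as "S&WS1".

    Simplification: only '&' is handled; Accumulo's grammar also allows '|'
    and parentheses. An empty expression is visible to everyone.
    """
    if not expression:
        return True
    return all(label in auths for label in expression.split("&"))


def publish(cell: Cell, workspace: str) -> Cell:
    """Hypothetical publish step: drop the workspace label from the visibility.

    In Accumulo the visibility is part of the key, so a real publish would
    write the new cell and delete the old one rather than edit it in place.
    """
    family, qualifier, visibility, value = cell
    labels = [label for label in visibility.split("&") if label != workspace]
    return (family, qualifier, "&".join(labels), value)


draft = ("EIN", "E8", "S&WS1", "works for")
print(can_see(draft[2], {"U", "S"}))          # False: analyst outside WS1
print(can_see(draft[2], {"U", "S", "WS1"}))   # True: analyst working in WS1
print(publish(draft, "WS1"))                  # ('EIN', 'E8', 'S', 'works for')
```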

Page 17

index security

Page 18

Implemented in ElasticSearch

•  Use parent/child document indexing, with one document per property.
•  Store the visibility with each indexed document.
•  A custom-developed ES filter uses Accumulo’s visibility evaluation code to filter out documents prior to query evaluation.
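The slide says the ElasticSearch filter reuses Accumulo's own visibility evaluation code; in Java, Accumulo exposes this through classes such as `ColumnVisibility` and `VisibilityEvaluator`. To show the idea without a cluster, the Python sketch below re-implements a simplified version of the expression grammar (quoted labels are not supported) and applies it to hypothetical per-property documents before matching the query text. The document fields `parent`, `property`, `value`, and `visibility` and the function names are assumptions.

```python
import re
from typing import Dict, List, Set

# Labels use letters, digits and _ - . : / ; operators are & | ( ).
_TOKEN = re.compile(r"[A-Za-z0-9_\-.:/]+|[&|()]")


def evaluate(expression: str, auths: Set[str]) -> bool:
    """Evaluate an Accumulo-style visibility expression such as "(S|TS)&WS1".

    Python stand-in for illustration only. As in Accumulo, mixing '&' and '|'
    at one level without parentheses is rejected, and an empty expression is
    visible to everyone.
    """
    tokens = _TOKEN.findall(expression)
    if "".join(tokens) != expression:
        raise ValueError(f"invalid character in {expression!r}")
    if not tokens:
        return True
    pos = 0

    def term() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths

    def expr() -> bool:
        nonlocal pos
        value = term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing '&' and '|' requires parentheses")
            op = tokens[pos]
            pos += 1
            rhs = term()  # always parse the right side to consume its tokens
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    result = expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


def search(docs: List[Dict[str, str]], auths: Set[str], text: str) -> List[Dict[str, str]]:
    """Drop documents the user cannot see first, then match the query text."""
    allowed = [d for d in docs if evaluate(d["visibility"], auths)]
    return [d for d in allowed if text.lower() in d["value"].lower()]


# One child document per property, pointing at its parent vertex.
docs = [
    {"parent": "V3", "property": "name1", "value": "Walmart", "visibility": "U"},
    {"parent": "V3", "property": "founded1", "value": "1962-01-01", "visibility": "S"},
]
print(search(docs, {"U"}, "1962"))             # []
print(len(search(docs, {"U", "S"}, "1962")))   # 1
```

Filtering before matching means a user without S cannot even learn, through hit counts, that a hidden founding date matches the query.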

Page 19

demo

Page 20

Questions?

Learn more at www.lumify.io

Jeff Kunkle @kunklejr

References

SecureGraph
•  http://securegraph.org
•  http://youtu.be/JMde_jFDM2M