Transcript
Page 1: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

sqrrl data, INC. Secure. Scale. Adapt.

[email protected] | @sqrrl_inc | 617.520.4375 sqrrl data, INC., All Rights Reserved

Adam Fuchs, Chief Technology Officer

Page 2: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Who We Are

sqrrl is the commercial provider of:

Mature Database Technology - Apache Accumulo

Fine-Grained Access Controls - Data Integration and Sharing

Proven Performance - Petabytes and Beyond

Advanced Analytics - Search, Statistics, and Graphs

Page 3: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Contents

Core Philosophy

Technology

Techniques

Application APIs

Page 4: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Apache Accumulo Perspective

Integration across:

Multiple business lines

Multiple data sets

Multiple applications

Multiple security, privacy, legal, policy, regulatory, and compliance constraints

New demands

[Diagram: three Application boxes and three Data boxes]

Page 5: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Accumulo Design Drivers

1. Cell-Level Security: Express common security requirements in the infrastructure, not just in the application. Data-centric approach encourages secure sharing.

2. Scalability: Near linear performance improvements at thousands of nodes. Durable and reliable under increased failures that come with scale.

3. Diverse, Interactive Analytics: Sorted key/value core performs well in a diverse set of domains. Information retrieval, statistics, graph analysis, geo indexing, and more.

4. Flexible, Adaptive Schema: Start with universal structures and indexing. Refine the schema over time.

Page 6: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Contents

Core Philosophy

Technology

Techniques

Application APIs

Page 7: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Accumulo Key Structure

An Accumulo key is a 5-tuple, consisting of:

Row: Controls Atomicity
Column Family: Controls Locality
Column Qualifier: Controls Uniqueness
Visibility Label: Controls Access
Timestamp: Controls Versioning

Accumulo Key/Value Example

Row       Col. Fam.     Col. Qual.     Visibility   Timestamp  Value
John Doe  Notes         PCP            PCP_JD       20120912   Patient suffers from an acute …
John Doe  Test Results  Cholesterol    JD|PCP_JD    20120912   183
John Doe  Test Results  Mental Health  JD|PSYCH_JD  20120801   Pass
John Doe  Test Results  X-Ray          JD|PHYS_JD   20120513   1010110110100…
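
To make the 5-tuple concrete, here is a minimal Python sketch of how such keys could be ordered. It is not Accumulo code: the `Key` class and `sort_key` function are hypothetical names, and the older Cholesterol entry with value "190" is invented for illustration. It relies on the fact that Accumulo sorts keys by row, column family, column qualifier, and visibility in ascending order, and by timestamp in descending order so that the newest version comes first.

```python
from typing import NamedTuple


class Key(NamedTuple):
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


def sort_key(k: Key):
    # Ascending on the first four components, descending on timestamp,
    # so the most recent version of a cell sorts first.
    return (k.row, k.family, k.qualifier, k.visibility, -k.timestamp)


entries = [
    (Key("John Doe", "Test Results", "Mental Health", "JD|PSYCH_JD", 20120801), "Pass"),
    # Hypothetical older version of the Cholesterol cell (not on the slide).
    (Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20120801), "190"),
    (Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20120912), "183"),
]
entries.sort(key=lambda kv: sort_key(kv[0]))
```

After sorting, both Cholesterol versions sit next to each other with the 20120912 entry first, followed by the Mental Health entry.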

Page 8: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Visibility Syntax & Semantics
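
The body of this slide did not survive in the transcript. As a rough illustration only, the sketch below evaluates visibility labels such as `JD|PCP_JD` from the earlier key/value example against a set of authorizations. It assumes Accumulo's label syntax of `&` (and), `|` (or), and parentheses, where mixing `&` and `|` at one level requires parentheses. The function name `visible` is hypothetical, and malformed input is not fully validated.

```python
import re

# Label tokens, or one of the operator/parenthesis characters.
TOKEN = re.compile(r"[A-Za-z0-9_\-:./]+|[&|()]")


def visible(expression: str, authorizations: set) -> bool:
    """Return True if the authorizations satisfy the visibility expression."""
    tokens = TOKEN.findall(expression)
    pos = 0

    def term() -> bool:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = expr()
            pos += 1  # skip the closing ")"
            return result
        return tok in authorizations

    def expr() -> bool:
        nonlocal pos
        result = term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            op = tokens[pos]
            pos += 1
            right = term()
            result = (result and right) if op == "&" else (result or right)
        return result

    # An empty label is treated as visible to everyone.
    return expr() if tokens else True
```

For example, `visible("JD|PCP_JD", {"PCP_JD"})` is true, while the same authorizations do not satisfy `JD|PSYCH_JD`.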

Page 9: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Tablets

Collections of KV pairs form Tables

Tables are partitioned into Tablets

Metadata tablets hold info about other tablets, forming a 3-level hierarchy

A Tablet is a unit of work for a Tablet Server

[Diagram: tablet hierarchy]

Well-Known Location (zookeeper)
  Root Tablet: -∞ to ∞
    Metadata Tablet 1: -∞ to “Encyclopedia:Ocelot”
    Metadata Tablet 2: “Encyclopedia:Ocelot” to ∞

Table: Adam’s Table
  Data Tablet: -∞ to thing
  Data Tablet: thing to ∞

Table: Encyclopedia
  Data Tablet: -∞ to Ocelot
  Data Tablet: Ocelot to Yak
  Data Tablet: Yak to ∞

Table: Foo
  Data Tablet: -∞ to ∞
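
Because tablets partition a table's sorted row space, finding the tablet for a row amounts to a binary search over the split points. The sketch below uses the Encyclopedia splits from the diagram (Ocelot, Yak). The function name `tablet_for` is hypothetical, and the sketch assumes each tablet's range excludes its start row and includes its end row.

```python
from bisect import bisect_left

# Split points for the Encyclopedia table: the tablets cover
# (-inf, Ocelot], (Ocelot, Yak], and (Yak, +inf).
splits = ["Ocelot", "Yak"]


def tablet_for(row: str, split_points: list) -> int:
    """Return the index of the tablet whose row range contains `row`."""
    # bisect_left keeps a row equal to a split point in the earlier tablet,
    # which matches an inclusive end row.
    return bisect_left(split_points, row)
```

The same search can be applied at each level of the hierarchy: the root tablet locates a metadata tablet, which in turn locates a data tablet.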

Page 10: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Accumulo Architecture

[Diagram components: Application (x3), Tablet Server (x3, each hosting a Tablet), Master, Zookeeper (x3), Hadoop]

[Diagram edge labels: Read/Write, Store/Replicate, Assign/Balance, Delegate Authority (x2)]

Page 11: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Tablet Data Flow

[Diagram: data flow within a Tablet]

Writes go to the In-Memory Map and the Write Ahead Log (For Recovery)

Minor Compaction (through an Iterator Tree) writes the In-Memory Map to a Sorted, Indexed File

Merging / Major Compaction (through an Iterator Tree) combines Sorted, Indexed Files

Reads (Scan) pass through an Iterator Tree over the In-Memory Map and the Sorted, Indexed Files
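
The flow in this diagram can be sketched as a toy log-structured store. This is a deliberately simplified illustration, not Accumulo's implementation: the `Tablet` class and its method names are hypothetical, keys are plain strings, and timestamps, visibility labels, and iterator trees are left out.

```python
class Tablet:
    def __init__(self):
        self.wal = []        # write-ahead log, kept for recovery
        self.memtable = {}   # in-memory map of recent writes
        self.files = []      # immutable sorted files, oldest first

    def write(self, key, value):
        self.wal.append((key, value))  # log first so the write is recoverable
        self.memtable[key] = value

    def minor_compact(self):
        # Flush the in-memory map to a new sorted file.
        if self.memtable:
            self.files.append(sorted(self.memtable.items()))
            self.memtable = {}
            self.wal = []  # flushed data no longer needs the log

    def major_compact(self):
        # Merge all files into one, keeping the newest value per key.
        merged = {}
        for f in self.files:  # oldest first, so later files win
            merged.update(f)
        self.files = [sorted(merged.items())] if merged else []

    def scan(self):
        # Read across the files and the in-memory map; newest source wins.
        merged = {}
        for f in self.files:
            merged.update(f)
        merged.update(self.memtable)
        return sorted(merged.items())
```

A scan sees a consistent sorted view whether a given entry currently lives in memory or in a file.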

Page 12: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Contents

Core Philosophy

Technology

Techniques

Application APIs

Page 13: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Hierarchical Decomposition

[Diagram: tree decomposing a record into key components, level by level]

Row: <person>

Column Family: attribute, purchases, returns

Column Qualifier: age, discount, hat, sneakers

Value: <age>, <cost>, <cost>, <40%>

Page 14: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Materialized Table

[Diagram: the tree populated with data, with a callout labeled Key/Value Pair]

Row: george

Column Family: attribute, purchases, returns

Column Qualifier: age, hat, sneakers

Value: 27, $83, $42

Row: bill

Column Family: attribute, purchases

Column Qualifier: sneakers, discount, age

Value: 40%, $100, 49
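
One way to read the hierarchical decomposition is as a flattening of a nested record into sorted key/value pairs. The sketch below shows that idea in Python. The `flatten` function is a hypothetical name, and the pairing of qualifiers with values in the sample record is an assumption, since the transcript does not preserve which value belongs to which qualifier.

```python
def flatten(row: str, record: dict) -> list:
    """Turn {family: {qualifier: value}} into sorted
    (row, family, qualifier, value) tuples."""
    pairs = [
        (row, family, qualifier, value)
        for family, columns in record.items()
        for qualifier, value in columns.items()
    ]
    return sorted(pairs)


# Sample record; names follow the slide, but the pairing is assumed.
record = {
    "purchases": {"sneakers": "$100"},
    "attribute": {"age": "49", "discount": "40%"},
}
pairs = flatten("bill", record)
```

Each resulting tuple corresponds to one path from the row down to a value, which is what a single key/value pair stores.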

Page 15: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Forward and Inverted Index

Table           Row     Column Family     Column Qualifier  Value
Forward Index   <UUID>  <Type>            <Field>           <Term>
Inverted Index  <Term>  <Type> + <Field>  <UUID>            <Digest of Event>
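
As a hedged sketch of this layout, the Python below builds forward and inverted index entries for an event and looks up UUIDs by term. The names `index_event` and `lookup` are hypothetical, the ":" separator between type and field is an assumption, and the 8-character SHA-1 prefix merely stands in for whatever the digest of the event actually contains.

```python
import hashlib


def index_event(uuid, event_type, fields):
    """Return (forward, inverted) entries as
    (row, family, qualifier, value) tuples."""
    # Placeholder digest; the real digest format is not given on the slide.
    digest = hashlib.sha1(repr(sorted(fields.items())).encode()).hexdigest()[:8]
    forward, inverted = [], []
    for field, term in sorted(fields.items()):
        forward.append((uuid, event_type, field, term))
        inverted.append((term, event_type + ":" + field, uuid, digest))
    return forward, inverted


def lookup(inverted, term):
    """Return the sorted UUIDs of events that contain the term."""
    return sorted({uuid for (row, _, uuid, _) in inverted if row == term})
```

In a sorted table the lookup would be a range scan over the rows equal to the term rather than a full pass over the entries.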

Page 16: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Forward and Inverted Index

Page 17: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Graph Analysis

Table: Graph Table

Row: <Node ID>

Column Family  Column Qualifier (Tuples)  Value
“Node Info”    <Field>                    <Value>
“Out Edges”    <Node ID>, <Edge ID>       <Edge Info>
“In Edges”     <Node ID>, <Edge ID>       <Edge Info>
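
A small sketch of how this graph table could be used: each edge is written twice, once under the source node's "Out Edges" and once under the destination node's "In Edges", so that neighbors in either direction come from a single row. The function names and the sample edges are hypothetical.

```python
from collections import deque


def edge_entries(src, dst, edge_id, info):
    """Return the two (row, family, qualifier, value) entries for one edge."""
    return [
        (src, "Out Edges", (dst, edge_id), info),
        (dst, "In Edges", (src, edge_id), info),
    ]


def neighbors(table, node, family="Out Edges"):
    """Return the sorted node IDs adjacent to `node` in the given direction."""
    return sorted({q[0] for (row, fam, q, _) in table if row == node and fam == family})


def reachable(table, start):
    """Breadth-first search along out edges."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in neighbors(table, queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical sample edges.
table = sorted(
    edge_entries("a", "b", "e1", "follows")
    + edge_entries("b", "c", "e2", "follows")
    + edge_entries("d", "a", "e3", "follows")
)
```

Storing the reverse edge costs a second write but avoids scanning the whole table to answer "who points at this node".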

Page 18: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Geospatial Queries

Table      Row        Column Family  Column Qualifier  Value
Geo Index  <GeoHash>  <Event Type>   <UUID>            <Digest of Event>

Latitude:          10110101001
Longitude:         00111010010
Depth:             11010110110
Interleaved bits:  101001110111010101011100001011100
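
The long bit string on this slide is the result of taking one bit at a time from latitude, longitude, and depth in turn. The sketch below reproduces that interleaving; the function name `interleave` is hypothetical.

```python
def interleave(*dimensions: str) -> str:
    """Interleave equal-length bit strings, one bit from each in turn."""
    if len({len(d) for d in dimensions}) != 1:
        raise ValueError("all dimensions must have the same number of bits")
    return "".join("".join(bits) for bits in zip(*dimensions))


latitude = "10110101001"
longitude = "00111010010"
depth = "11010110110"
geohash = interleave(latitude, longitude, depth)
# geohash == "101001110111010101011100001011100"
```

Because the high-order bits of every dimension come first, points that are close in all three dimensions tend to share a prefix, which a sorted table can serve with range scans.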

Page 19: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Document Partitioning

Table: Shard Table

Row: <Partition ID>

Column Family  Column Qualifier (Tuples)  Value
“Docs”         <UUID>, <Field>            <Value>
“Inv. Index”   <Term>, <UUID>
“Field Index”  <Field:Term>, <UUID>
“Geo”          <Hash>, <UUID>
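
The point of document partitioning is that a document and all of its index entries land in the same row, so a query can be answered inside one partition. A minimal sketch under stated assumptions: the partition count, the CRC32-based `partition_id` function, and whitespace tokenization are all hypothetical choices, not taken from the slides.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_id(uuid: str) -> str:
    """Map a document UUID to a zero-padded partition ID."""
    return "%04d" % (zlib.crc32(uuid.encode()) % NUM_PARTITIONS)


def shard_entries(uuid, fields):
    """Return (row, family, qualifier, value) entries for one document."""
    row = partition_id(uuid)
    entries = []
    for field, value in fields.items():
        entries.append((row, "Docs", (uuid, field), value))
        for term in value.lower().split():
            entries.append((row, "Inv. Index", (term, uuid), ""))
            entries.append((row, "Field Index", (field + ":" + term, uuid), ""))
    return entries


entries = shard_entries("doc-1", {"title": "Big Data"})
```

Every entry for `doc-1` shares one row, so the document and its index terms are stored and scanned together.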

Page 20: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Document Partitioning

Page 21: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Intersecting Iterator

Query: ‘foo’ and (‘bar’ or ‘baz’)

Row: <Partition ID>

Column Family  Column Qualifier (Tuples)  Value
“Docs”         <UUID>, <Field>            <Value>
“Inv. Index”   <Term>, <UUID>
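
Within one partition, the inverted index lists the UUIDs for each term in sorted order, so a boolean query can be answered by merging sorted lists. The sketch below evaluates the query from the slide this way. It illustrates the idea only and is not the code of Accumulo's intersecting iterator; the function names and the sample index are hypothetical.

```python
import heapq


def union(*lists):
    """Merge sorted UUID lists, dropping duplicates."""
    out = []
    for uuid in heapq.merge(*lists):
        if not out or out[-1] != uuid:
            out.append(uuid)
    return out


def intersect(a, b):
    """Intersect two sorted UUID lists by advancing whichever cursor is behind."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical inverted index for one partition: term -> sorted UUIDs.
inv_index = {
    "foo": ["d1", "d2", "d4"],
    "bar": ["d2", "d3"],
    "baz": ["d4", "d5"],
}
# 'foo' and ('bar' or 'baz')
hits = intersect(inv_index["foo"], union(inv_index["bar"], inv_index["baz"]))
```

Each partition can evaluate the query independently, and the matching UUIDs can then be used to fetch the documents from the same row.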

Page 22: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Contents

Core Philosophy

Technology

Techniques

Application APIs

Page 23: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

acorn

Key/Value pairs are great! How do I construct a document partitioning key again?

Techniques should be built into an API

Let the people have polyglot

Lucene, SQL, SPARQL, JAQL, Matlab (not just Key, Value, Range)

Page 24: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Combined IR + Graph Search

Page 25: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Schema-less Stats

Page 26: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Get Involved

http://accumulo.apache.org

Help us make Accumulo even better!

Page 27: Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Contact

Adam Fuchs, CTO

sqrrl data, Inc.

617-520-4375

www.sqrrl.com

@sqrrl_inc

[email protected]
