
Apache Accumulo and Cloudera
Hadoop-DC, July 2013
Joey Echeverria | Director, Federal FTS
[email protected] | @fwiffo
©2013 Cloudera, Inc. All Rights Reserved. 1

HADOOP 101
2

Operating Systems
• Manage and schedule machine resources
  • CPU
  • RAM
  • Memory
• Provide abstractions and APIs
  • Files = stream of bytes
  • Process = instructions + private memory space
3

Distributed Operating System
• Same thing, but over a cluster of networked servers
• Additional concerns:
  • Inter-process and inter-machine communication
  • Data locality
  • Data availability
  • Data processing availability
4

Hadoop
• De facto distributed operating system
  • Apache HDFS
  • Apache MapReduce and Apache YARN
5

Ecosystem
6
[Diagram: the Hadoop ecosystem]
• Key-value stores
• High-level batch languages
• Low-latency SQL engine
• Graph processing

Cloudera
7

CDH History
8
• CDH1: *HDFS *MR *Hive *Pig
• CDH2: *HDFS *MR *Hive *Pig
• CDH3: *HDFS *MR *Hive *Pig *Flume *HBase Hue *Mahout *Oozie *Sqoop *Whirr *Zookeeper *Avro
• CDH4: *HDFS *MR *YARN *Hive *Pig *Flume *HBase Hue *Mahout *Oozie *Sqoop *Whirr *Zookeeper *Avro DataFu HCatalog Impala *Solr *BigTop Sentry

ACCUMULO 101 AND 201
9

BigTable
10

Accumulo Data Model
• Multi-dimensional sorted map: row id -> [ family -> [ qualifier -> [ visibility -> [ timestamp -> value ] ] ] ]
11
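To make the nested-map view concrete, here is a minimal Python sketch (Python is used only for illustration; this is not Accumulo's Java client API). Plain dicts stand in for the sorted maps, so ordering is ignored here, and the helper names `put` and `get_latest` and the sample row and column values are hypothetical.

```python
# Hypothetical nested-dict model of the slide:
# row id -> family -> qualifier -> visibility -> timestamp -> value


def put(table, row, family, qualifier, visibility, timestamp, value):
    """Store one cell at its five coordinates, creating inner maps as needed."""
    cell_versions = (
        table.setdefault(row, {})
        .setdefault(family, {})
        .setdefault(qualifier, {})
        .setdefault(visibility, {})
    )
    cell_versions[timestamp] = value


def get_latest(table, row, family, qualifier, visibility):
    """Return the value with the largest timestamp, or None if the cell is absent."""
    versions = (
        table.get(row, {})
        .get(family, {})
        .get(qualifier, {})
        .get(visibility, {})
    )
    if not versions:
        return None
    return versions[max(versions)]


table = {}
put(table, "row1", "fam", "qual", "public", 100, "old")
put(table, "row1", "fam", "qual", "public", 200, "new")
print(get_latest(table, "row1", "fam", "qual", "public"))  # new
print(get_latest(table, "row1", "fam", "qual", "secret"))  # None
```

Each level of nesting is one coordinate of the key, so reading a single value requires all five of them.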

Accumulo Storage Model
• key -> value
• key = <row id><column><timestamp>
• column = <family><qualifier><visibility>
12
[Diagram: Key = Row ID + Column (Family, Qualifier, Visibility) + Timestamp; Key -> Value]
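The flattened key is what actually gets sorted. As far as I know, Accumulo orders keys by row, family, qualifier, and visibility ascending, then by timestamp descending, so the newest version of a cell is read first. The sketch below models only that ordering. It compares Python strings rather than raw bytes (equivalent for ASCII) and ignores delete markers; `Key`, `sort_key`, and the sample entries are hypothetical.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Flattened key: <row id><family><qualifier><visibility><timestamp>."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


def sort_key(key: Key):
    # Row, family, qualifier and visibility sort ascending; the timestamp is
    # negated so it sorts descending and the newest version comes first.
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)


entries = {
    Key("r2", "f", "q", "", 5): "c",
    Key("r1", "f", "q", "", 1): "a-old",
    Key("r1", "f", "q", "", 9): "a-new",
    Key("r1", "f", "p", "", 3): "b",
}

for key in sorted(entries, key=sort_key):
    print(key.row, key.family, key.qualifier, key.timestamp, entries[key])
# r1 f p 3 b
# r1 f q 9 a-new
# r1 f q 1 a-old
# r2 f q 5 c
```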

13

Other Concerns
• Write-ahead log
• Tablet server failure handling
• Versioning
• Iterators
• Cell-level security
14
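Versioning and iterators work together: if entries are sorted so that, within each row/family/qualifier/visibility group, the newest timestamp comes first, then keeping the first N entries of each group keeps the N newest versions. Accumulo does this with its VersioningIterator (new tables keep one version by default, as far as I know). The function below is only a hypothetical illustration of the idea in plain Python, not the iterator API.

```python
from itertools import groupby

# An entry is ((row, family, qualifier, visibility, timestamp), value).


def cell_coordinates(entry):
    """Everything in the key except the timestamp."""
    return entry[0][:4]


def newest_versions(entries, max_versions=1):
    """Keep at most `max_versions` entries per cell, newest timestamp first."""
    ordered = sorted(entries, key=lambda e: (cell_coordinates(e), -e[0][4]))
    kept = []
    for _, versions in groupby(ordered, key=cell_coordinates):
        kept.extend(list(versions)[:max_versions])
    return kept


entries = [
    (("r1", "f", "q", "", 1), "v1"),
    (("r1", "f", "q", "", 3), "v3"),
    (("r1", "f", "q", "", 2), "v2"),
    (("r2", "f", "q", "", 7), "x"),
]
print([value for _, value in newest_versions(entries)])     # ['v3', 'x']
print([value for _, value in newest_versions(entries, 2)])  # ['v3', 'v2', 'x']
```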

PROJECT HISTORY
15

Pre-Apache
16

Apache
17

Relationship to Hadoop Releases
• 1.3.x -> Hadoop 0.20.2
• 1.4.x -> Hadoop 0.20.2, Hadoop 0.20.203
• 1.5.x -> Hadoop 1.0.4, Hadoop 2.0.4-alpha
18

Accumulo and Cloudera Releases
• Accumulo 1.3.x, 1.4.x, and 1.5.x all work with CDH3
• Accumulo 1.5.x should work with CDH4…
  • Limited testing
19

ANNOUNCEMENT
20

CLOUDERA SUPPORT OF APACHE ACCUMULO ON CDH4
21

DEMO
22

System Logs
• Id
  • Unique id for an action
• Timestamp
  • Time the action occurred
• Actor
  • User or system performing the action
• Action
  • The action taken
• Object
  • The object of the action
• Info
  • Free-form information (e.g. success/failure, attribute value, etc.)
23

Actions
• created_user
• deleted_user
• set_password
• logged_in
• logged_out
• read
• modified
24

Roles
• system
  • Any user on the system
• admin
  • Administrators
• audit
  • Auditors
25

Accumulo Data Model
26
[Diagram: Key (Row ID; Column = Family, Qualifier, Visibility; Timestamp) -> Value, filled in for the log records]
• Row ID: <ts>-<id>
• Family: <actor>
• Qualifier: <action>:<object>
• Value: <info>
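The mapping above can be written as a small function. This is a hypothetical sketch: `LogEvent` and `to_cell` are illustrative names, the `yyyyMMddHHmm` timestamp format is inferred from the demo rows, and the visibility label is passed in by the caller rather than derived from any rule stated in the talk.

```python
from typing import NamedTuple


class LogEvent(NamedTuple):
    """One record with the fields from the System Logs slide (hypothetical type)."""
    event_id: int
    timestamp: str  # assumed yyyyMMddHHmm, e.g. "201307241535"
    actor: str
    action: str
    obj: str        # the slide's "Object" field
    info: str


def to_cell(event: LogEvent, visibility: str):
    """Return (row, family, qualifier, visibility, value) following the layout
    row = <ts>-<id>, family = <actor>, qualifier = <action>:<object>, value = <info>."""
    row = f"{event.timestamp}-{event.event_id}"
    qualifier = f"{event.action}:{event.obj}"
    return row, event.actor, qualifier, visibility, event.info


event = LogEvent(1, "201307241535", "root", "created_user", "sean", "succeeded")
print(to_cell(event, "audit"))
# ('201307241535-1', 'root', 'created_user:sean', 'audit', 'succeeded')
```

Because rows sort lexicographically, a fixed-width timestamp at the front of the row ID means a plain scan returns records in time order.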

DEMO
27

Logs Demo
28
Row key | Column | Visibility | Value
201307241535-1 | root:created_user:sean | audit | succeeded
201307241535-1 | root:set_password:sean | admin&audit | password
201307241537-2 | sean:logged_in:host | system | succeeded
201307241538-3 | sean:read:/tmp/a | audit | succeeded
201307241539-4 | sean:modified:/tmp/a | audit | failed
201307241540-5 | sean:logged_out:host | system | succeeded
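Cell-level security decides which of these rows a scan returns: a user sees an entry only if their authorizations satisfy its visibility expression. The sketch below evaluates expressions built from labels, `&`, `|`, and parentheses, and, like Accumulo, rejects `&` and `|` mixed without parentheses. It is a simplified illustration (no quoted labels) rather than Accumulo's own parser; `can_see`, `visible_rows`, and treating each role from the Roles slide as an authorization are assumptions.

```python
import re

# Labels may use letters, digits, '_', '-', '.', ':' and '/'; quoted labels
# are not supported in this sketch.
TOKEN = re.compile(r"[A-Za-z0-9_.:/-]+|[&|()]")


def tokenize(expr):
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos} in {expr!r}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def can_see(expr, auths):
    """True if a user holding the set `auths` may read a cell labelled `expr`."""
    if expr == "":
        return True  # an empty visibility is readable by everyone
    tokens = tokenize(expr)
    pos = 0

    def parse_expr():
        nonlocal pos
        result = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            op = tokens[pos]
            pos += 1
            rhs = parse_term()  # always parse, so every token is checked
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError(f"unexpected end of {expr!r}")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError(f"missing ')' in {expr!r}")
            pos += 1
            return value
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected {token!r} in {expr!r}")
        return token in auths

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"trailing input in {expr!r}")
    return result


# (row key, column, visibility, value) from the demo table
ROWS = [
    ("201307241535-1", "root:created_user:sean", "audit", "succeeded"),
    ("201307241535-1", "root:set_password:sean", "admin&audit", "password"),
    ("201307241537-2", "sean:logged_in:host", "system", "succeeded"),
    ("201307241538-3", "sean:read:/tmp/a", "audit", "succeeded"),
    ("201307241539-4", "sean:modified:/tmp/a", "audit", "failed"),
    ("201307241540-5", "sean:logged_out:host", "system", "succeeded"),
]


def visible_rows(auths):
    return [row for row in ROWS if can_see(row[2], auths)]


print(len(visible_rows({"system"})),
      len(visible_rows({"system", "audit"})),
      len(visible_rows({"system", "admin", "audit"})))  # 2 5 6
```

With these rows, a user holding only `system` sees the two login/logout entries, an auditor holding `system` and `audit` sees everything except the `set_password` entry, and only a user holding both `admin` and `audit` sees the password change.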

VERSIONS REDUX
29

Recap
• Accumulo 1.3.x, 1.4.x, and 1.5.x all work with CDH3
• Accumulo 1.5.x should work with CDH4
30

Cloudera Support
• Naturally, Cloudera has tested and packaged Accumulo 1.5…
• But 1.5 is rather bleeding edge…
• So, we instead backported Hadoop 2.0 support from 1.5 onto 1.4.3
31

ECOSYSTEM INTEGRATION
32

Apache Nutch
33

Apache Pig
34

DEMO
35

NEXT STEPS
36

Recap
• What’s available today
  • Beta release of Accumulo 1.4.3 on CDH4.3
  • Beta release of Accumulo 1.4.3 Pig integration
• Semi-private beta
  • Contact me ([email protected]) if you’re interested in trying out the bits
37

Future Ideas (not promises ;)
• Cloudera Manager integration
• Flume integration
• Sqoop integration
• Hive integration
• Impala integration
38

What next?
• Download Hadoop!
  • CDH available at www.cloudera.com
  • Cloudera provides preloaded VMs
    • https://ccp.cloudera.com/display/SUPPORT/Cloudera+QuickStart+VM
• Reach out to me ([email protected]) if you want to try out the Accumulo beta
  • Instructions to replicate the demos pending

My personal preference
• Cloudera Manager
  • https://ccp.cloudera.com/display/SUPPORT/Downloads
• Free up to unlimited nodes!

Shout Out
• Jason Trost
  • @jason_trost
  • covert.io blog posts
    • http://www.covert.io/post/18414889381/accumulo-nutch-and-gora
    • http://www.covert.io/post/18605091231/accumulo-and-pig
