Apache Accumulo and Cloudera
Hadoop-DC, July 2013
Joey Echeverria | Director, Federal FTS
[email protected] | @fwiffo
©2013 Cloudera, Inc. All Rights Reserved.
HADOOP 101
Operating Systems
• Manage and schedule machine resources
  • CPU
  • RAM
  • Memory
• Provide abstractions and APIs
  • Files = stream of bytes
  • Process = instructions + private memory space
Distributed Operating System
• Same thing, but over a cluster of networked servers
• Additional concerns:
  • Inter-process and inter-machine communication
  • Data locality
  • Data availability
  • Data processing availability
Hadoop
• De facto Distributed Operating System
  • Apache HDFS
  • Apache MapReduce and Apache YARN
Ecosystem
[Figure: ecosystem diagram; labels: Key Value Stores, High Level Batch Languages, Low Latency SQL Engine, Graph Processing]
Cloudera
CDH History
CDH1
*HDFS *MR *Hive *Pig
CDH2
*HDFS *MR *Hive *Pig
CDH3
*HDFS *MR *Hive *Pig *Flume *HBase Hue *Mahout *Oozie *Sqoop *Whirr *Zookeeper *Avro
CDH4
*HDFS *MR *YARN *Hive *Pig *Flume *HBase Hue *Mahout *Oozie *Sqoop *Whirr *Zookeeper *Avro DataFu HCatalog Impala *Solr *BigTop Sentry
ACCUMULO 101 AND 201
BigTable
Accumulo Data Model
• Multi-dimensional sorted map
  row id -> [ family -> [ qualifier -> [ visibility -> [ timestamp -> value ] ] ] ]
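The nested map above can be pictured with plain dictionaries. This is only a toy sketch in Python, not the Accumulo client API: the names `table`, `put`, `get`, and `sorted_rows`, and the sample cells, are hypothetical and exist purely to show the nesting order.

```python
# Toy model of the nested map:
# row id -> family -> qualifier -> visibility -> timestamp -> value
# (hypothetical helper names; this is not the Accumulo API)
table = {}


def put(row, family, qualifier, visibility, timestamp, value):
    """Store one cell, creating the intermediate maps as needed."""
    cell_versions = (
        table.setdefault(row, {})
        .setdefault(family, {})
        .setdefault(qualifier, {})
        .setdefault(visibility, {})
    )
    cell_versions[timestamp] = value


def get(row, family, qualifier, visibility, timestamp):
    """Look up one cell by walking the five levels in order."""
    return table[row][family][qualifier][visibility][timestamp]


def sorted_rows():
    """Row ids in sorted order, mimicking the 'sorted map' part of the model."""
    return sorted(table)


# Made-up sample cells for illustration only.
put("row1", "name", "first", "public", 1, "Ada")
put("row1", "name", "last", "public", 1, "Lovelace")
put("row0", "name", "first", "public", 1, "Alan")
```

Calling `get("row1", "name", "first", "public", 1)` walks the five levels in the same order as the arrows on the slide.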
Accumulo Storage Model
• key -> value
• key = <row id><column><timestamp>
• column = <family><qualifier><visibility>
Key: Row ID | Column (Family | Qualifier | Visibility) | Timestamp
Value
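A minimal sketch of how the flattened key could be ordered, assuming the usual Accumulo convention that row, family, qualifier, and visibility sort ascending while the timestamp sorts descending, so the newest version of a cell comes first. That ordering detail is not stated on the slide. The `Key` tuple, `sort_order`, and the sample entries are hypothetical.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Flattened key: <row id><family><qualifier><visibility><timestamp>."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


def sort_order(key):
    # Ascending on the first four fields; negate the timestamp so that
    # newer versions of the same cell sort ahead of older ones
    # (assumed convention, not taken from the slides).
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)


# Made-up entries for illustration only.
entries = {
    Key("r2", "f", "q", "public", 5): "other row",
    Key("r1", "f", "q", "public", 5): "old",
    Key("r1", "f", "q", "public", 9): "new",
}

ordered_keys = sorted(entries, key=sort_order)
ordered_values = [entries[k] for k in ordered_keys]
```

Storing everything under one sorted key is what makes a range scan over adjacent row ids a sequential read.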
Other Concerns
• Write-ahead log
• Tablet server failure handling
• Versioning
• Iterators
• Cell-level security
PROJECT HISTORY
Pre-Apache
Apache
Relationship to Hadoop Releases
• 1.3.x -> Hadoop 0.20.2
• 1.4.x -> Hadoop 0.20.2, Hadoop 0.20.203
• 1.5.x -> Hadoop 1.0.4, Hadoop 2.0.4-alpha
Accumulo and Cloudera Releases
• Accumulo 1.3.x, 1.4.x, and 1.5.x all work with CDH3
• Accumulo 1.5.x should work with CDH4…
  • Limited testing
ANNOUNCEMENT
CLOUDERA SUPPORT OF APACHE ACCUMULO ON CDH4
DEMO
System Logs
• Id
  • Unique id for an action
• Timestamp
  • Time the action occurred
• Actor
  • User or system performing the action
• Action
  • The action taken
• Object
  • The object of the action
• Info
  • Free-form information (e.g. success/failure, attribute value, etc.)
Actions
• created_user
• deleted_user
• set_password
• logged_in
• logged_out
• read
• modified
Roles
• system
  • Any user on the system
• admin
  • Administrators
• audit
  • Auditors
Accumulo Data Model
Key: Row ID | Column (Family | Qualifier | Visibility) | Timestamp
Value
• Row ID = <ts>-<id>
• Family = <actor>
• Qualifier = <action>:<object>
• Value = <info>
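The mapping from a log record to a key/value pair might be sketched as follows. It assumes the layout that the demo rows suggest: row id `<ts>-<id>`, family `<actor>`, qualifier `<action>:<object>`, value `<info>`, with the visibility label chosen by the caller. `LogRecord` and `to_entry` are hypothetical names, not part of Accumulo.

```python
from typing import NamedTuple


class LogRecord(NamedTuple):
    """One system log record with the six fields from the System Logs slide."""
    id: int
    timestamp: str  # e.g. "201307241535"
    actor: str
    action: str
    object: str
    info: str


def to_entry(record, visibility):
    """Build ((row id, family, qualifier, visibility), value) for one record.

    The visibility label is supplied by the caller; how the demo picked it
    for each action is an assumption left outside this sketch.
    """
    row_id = f"{record.timestamp}-{record.id}"
    family = record.actor
    qualifier = f"{record.action}:{record.object}"
    return (row_id, family, qualifier, visibility), record.info


record = LogRecord(1, "201307241535", "root", "created_user", "sean", "succeeded")
key, value = to_entry(record, "audit")
```

Leading the row id with the timestamp keeps records in time order under the sorted key, so a scan over a row range corresponds to a time window.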
DEMO
Logs Demo
Row key         Column                  Visibility   Value
201307241535-1  root:created_user:sean  audit        succeeded
201307241535-1  root:set_password:sean  admin&audit  password
201307241537-2  sean:logged_in:host     system       succeeded
201307241538-3  sean:read:/tmp/a        audit        succeeded
201307241539-4  sean:modified:/tmp/a    audit        failed
201307241540-5  sean:logged_out:host    system       succeeded
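To illustrate how the visibility column limits what a scan returns, here is a simplified sketch over the demo rows. It is not Accumulo's evaluator: it handles only `&` and `|` without parentheses, and `is_visible`, `scan`, and `ROWS` are hypothetical names.

```python
def is_visible(expression, authorizations):
    """Return True if the authorizations satisfy the visibility expression.

    Simplified rules (an assumption of this sketch): '|' separates
    alternatives, '&' joins labels that must all be held, and an empty
    expression is visible to everyone. Parentheses are not supported.
    """
    if not expression:
        return True
    return any(
        all(label in authorizations for label in term.split("&"))
        for term in expression.split("|")
    )


# The rows from the Logs Demo slide: (row key, column, visibility, value).
ROWS = [
    ("201307241535-1", "root:created_user:sean", "audit", "succeeded"),
    ("201307241535-1", "root:set_password:sean", "admin&audit", "password"),
    ("201307241537-2", "sean:logged_in:host", "system", "succeeded"),
    ("201307241538-3", "sean:read:/tmp/a", "audit", "succeeded"),
    ("201307241539-4", "sean:modified:/tmp/a", "audit", "failed"),
    ("201307241540-5", "sean:logged_out:host", "system", "succeeded"),
]


def scan(rows, authorizations):
    """Keep only the rows whose visibility the caller is authorized to see."""
    auths = set(authorizations)
    return [row for row in rows if is_visible(row[2], auths)]


for row in scan(ROWS, {"audit"}):
    print(row)
```

Under these rules, a scan with only the `audit` authorization skips the `set_password` row, because `admin&audit` requires both labels.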
VERSIONS REDUX
Recap
• Accumulo 1.3.x, 1.4.x, and 1.5.x all work with CDH3
• Accumulo 1.5.x should work with CDH4
Cloudera Support
• Naturally, Cloudera has tested and packaged Accumulo 1.5…
• But 1.5 is rather bleeding edge…
• So, we instead backported Hadoop 2.0 support from 1.5 onto 1.4.3
ECOSYSTEM INTEGRATION
Apache Nutch
Apache Pig
DEMO
NEXT STEPS
Recap
• What’s available today
  • Beta release of Accumulo 1.4.3 on CDH4.3
  • Beta release of Accumulo 1.4.3 Pig integration
• Semi-private beta
  • Contact me ([email protected]) if you’re interested in trying out the bits
Future Ideas (not promises ;)
• Cloudera Manager integration
• Flume integration
• Sqoop integration
• Hive integration
• Impala integration
What next?
• Download Hadoop!
• CDH available at www.cloudera.com
• Cloudera provides pre-loaded VMs
  • https://ccp.cloudera.com/display/SUPPORT/Cloudera+QuickStart+VM
• Reach out to me ([email protected]) if you want to try out the Accumulo beta
• Instructions to replicate the demos pending
My personal preference
• Cloudera Manager
  • https://ccp.cloudera.com/display/SUPPORT/Downloads
• Free up to unlimited nodes!
Shout Out
• Jason Trost
• @jason_trost
• covert.io blog posts
  • http://www.covert.io/post/18414889381/accumulo-nutch-and-gora
  • http://www.covert.io/post/18605091231/accumulo-and-pig