Transcript
Page 1: Apache Accumulo and Cloudera

Apache Accumulo and Cloudera
Hadoop-DC, July 2013
Joey Echeverria | Director, Federal FTS
[email protected] | @fwiffo

©2013 Cloudera, Inc. All Rights Reserved.

Page 2: Apache Accumulo and Cloudera

HADOOP 101

Page 3: Apache Accumulo and Cloudera

Operating Systems

• Manage and schedule machine resources
  • CPU
  • RAM
  • Memory
• Provide abstractions and APIs
  • Files = stream of bytes
  • Process = instructions + private memory space

Page 4: Apache Accumulo and Cloudera

Distributed Operating System

• Same thing, but over a cluster of networked servers
• Additional concerns:
  • Inter-process and inter-machine communication
  • Data locality
  • Data availability
  • Data processing availability

Page 5: Apache Accumulo and Cloudera

Hadoop

• De facto Distributed Operating System
  • Apache HDFS
  • Apache MapReduce and Apache YARN

Page 6: Apache Accumulo and Cloudera

Ecosystem

• Key Value Stores
• High Level Batch Languages
• Low Latency SQL Engine
• Graph Processing

Page 7: Apache Accumulo and Cloudera

Cloudera  


Page 8: Apache Accumulo and Cloudera

CDH History

CDH1: *HDFS *MR *Hive *Pig
CDH2: *HDFS *MR *Hive *Pig
CDH3: *HDFS *MR *Hive *Pig *Flume *HBase Hue *Mahout *Oozie *Sqoop *Whirr *Zookeeper *Avro
CDH4: *HDFS *MR *YARN *Hive *Pig *Flume *HBase Hue *Mahout *Oozie *Sqoop *Whirr *Zookeeper *Avro DataFu HCatalog Impala *Solr *BigTop Sentry

Page 9: Apache Accumulo and Cloudera

ACCUMULO 101 AND 201

Page 10: Apache Accumulo and Cloudera

BigTable  


Page 11: Apache Accumulo and Cloudera

Accumulo Data Model

• Multi-dimensional sorted map
  row id -> [ family -> [ qualifier -> [ visibility -> [ timestamp -> value ] ] ] ]
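The nested map on this slide can be modeled with ordinary dictionaries. What follows is a minimal Python sketch, not Accumulo's API: `put` and `walk` are hypothetical helper names chosen for illustration. Each level is iterated in sorted order, and timestamps are walked newest first, which is how Accumulo orders versions of the same cell.

```python
def put(table, row, family, qualifier, visibility, timestamp, value):
    """Store one value in the nested map (hypothetical helper, not an Accumulo call)."""
    versions = (
        table.setdefault(row, {})
        .setdefault(family, {})
        .setdefault(qualifier, {})
        .setdefault(visibility, {})
    )
    versions[timestamp] = value


def walk(table):
    """Yield (row, family, qualifier, visibility, timestamp, value) in sorted order."""
    for row in sorted(table):
        for family in sorted(table[row]):
            for qualifier in sorted(table[row][family]):
                for visibility in sorted(table[row][family][qualifier]):
                    versions = table[row][family][qualifier][visibility]
                    # Newest timestamp first within a single cell.
                    for timestamp in sorted(versions, reverse=True):
                        yield (row, family, qualifier, visibility,
                               timestamp, versions[timestamp])


if __name__ == "__main__":
    table = {}
    put(table, "row2", "fam", "qual", "", 1, "a")
    put(table, "row1", "fam", "qual", "", 1, "b")
    put(table, "row1", "fam", "qual", "", 2, "c")
    for entry in walk(table):
        print(entry)
```

Inserting in any order and reading back through `walk` always returns rows in sorted order, which is the property that makes range scans over row ids cheap.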

Page 12: Apache Accumulo and Cloudera

Accumulo Storage Model

• key -> value
• key = <row id><column><timestamp>
• column = <family><qualifier><visibility>

Key = Row ID + Column (Family, Qualifier, Visibility) + Timestamp

Row ID | Family | Qualifier | Visibility | Timestamp | Value
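The storage model flattens the nested map into a single sorted list of key/value pairs. The sketch below is an illustration in Python under stated assumptions: `Key`, `sort_key`, and `sorted_entries` are hypothetical names, and the real system compares byte sequences rather than Python strings. It shows the composite ordering: row id, then family, qualifier, and visibility ascending, then timestamp descending so the newest version of a cell comes first.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Illustrative stand-in for the composite key; not the Accumulo Key class."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int


def sort_key(key):
    # Negate the timestamp so that larger (newer) timestamps sort first.
    return (key.row, key.family, key.qualifier, key.visibility, -key.timestamp)


def sorted_entries(entries):
    """Sort an iterable of (Key, value) pairs into storage order."""
    return sorted(entries, key=lambda entry: sort_key(entry[0]))


if __name__ == "__main__":
    entries = [
        (Key("row1", "fam", "qual", "", 1), "old"),
        (Key("row1", "fam", "qual", "", 5), "new"),
        (Key("row0", "zzz", "qual", "", 1), "other"),
    ]
    for key, value in sorted_entries(entries):
        print(key, value)
```

Because the row id is the leading component, everything for one row is stored contiguously regardless of how many columns it has.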

Page 13: Apache Accumulo and Cloudera


Page 14: Apache Accumulo and Cloudera

Other Concerns

• Write-ahead log
• Tablet server failure handling
• Versioning
• Iterators
• Cell-level security

Page 15: Apache Accumulo and Cloudera

PROJECT HISTORY

Page 16: Apache Accumulo and Cloudera

Pre-Apache

Page 17: Apache Accumulo and Cloudera

Apache  


Page 18: Apache Accumulo and Cloudera

Relationship to Hadoop Releases

• 1.3.x -> Hadoop 0.20.2
• 1.4.x -> Hadoop 0.20.2, Hadoop 0.20.203
• 1.5.x -> Hadoop 1.0.4, Hadoop 2.0.4-alpha

Page 19: Apache Accumulo and Cloudera

Accumulo and Cloudera Releases

• Accumulo 1.3.x, 1.4.x, and 1.5.x all work with CDH3
• Accumulo 1.5.x should work with CDH4…
  • Limited testing

Page 20: Apache Accumulo and Cloudera

ANNOUNCEMENT

Page 21: Apache Accumulo and Cloudera

CLOUDERA SUPPORT OF APACHE ACCUMULO ON CDH4

Page 22: Apache Accumulo and Cloudera

DEMO

Page 23: Apache Accumulo and Cloudera

System Logs

• Id
  • Unique id for an action
• Timestamp
  • Time the action occurred
• Actor
  • User or system performing the action
• Action
  • The action taken
• Object
  • The object of the action
• Info
  • Free-form information (e.g. success/failure, attribute value, etc.)

Page 24: Apache Accumulo and Cloudera

Actions

• created_user
• deleted_user
• set_password
• logged_in
• logged_out
• read
• modified

Page 25: Apache Accumulo and Cloudera

Roles

• system
  • Any user on the system
• admin
  • Administrators
• audit
  • Auditors

Page 26: Apache Accumulo and Cloudera

Accumulo Data Model

Key = Row ID + Column (Family, Qualifier, Visibility) + Timestamp

Row ID    | Family  | Qualifier         | Visibility | Timestamp | Value
<ts>-<id> | <actor> | <action>:<object> |            |           | <info>
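The mapping on this slide from a log record to an Accumulo entry can be sketched in a few lines of Python. This is an illustration only: `log_to_entry` is a hypothetical helper, the record is assumed to be a plain dictionary with the six fields from the System Logs slide, and the visibility label is passed in by the caller because the slide leaves that column unspecified.

```python
def log_to_entry(record, visibility):
    """Map a log record to (row id, family, qualifier, visibility, value).

    Hypothetical helper: the field names follow the System Logs slide.
    """
    row_id = f"{record['timestamp']}-{record['id']}"          # <ts>-<id>
    family = record["actor"]                                  # <actor>
    qualifier = f"{record['action']}:{record['object']}"      # <action>:<object>
    value = record["info"]                                    # <info>
    return (row_id, family, qualifier, visibility, value)


if __name__ == "__main__":
    record = {
        "id": 1,
        "timestamp": "201307241535",
        "actor": "root",
        "action": "created_user",
        "object": "sean",
        "info": "succeeded",
    }
    print(log_to_entry(record, "audit"))
```

Leading the row id with the timestamp keeps entries in time order, so a scan over a row-id range corresponds to a scan over a time window.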

Page 27: Apache Accumulo and Cloudera

DEMO

Page 28: Apache Accumulo and Cloudera

Logs Demo

Row key        | Column                 | Visibility  | Value
201307241535-1 | root:created_user:sean | audit       | succeeded
201307241535-1 | root:set_password:sean | admin&audit | password
201307241537-2 | sean:logged_in:host    | system      | succeeded
201307241538-3 | sean:read:/tmp/a       | audit       | succeeded
201307241539-4 | sean:modified:/tmp/a   | audit       | failed
201307241540-5 | sean:logged_out:host   | system      | succeeded
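To make the Visibility column concrete, here is a small Python sketch of how cell-level filtering behaves for the demo rows. It is a simplified stand-in, not Accumulo's implementation: `is_visible` and `visible_rows` are hypothetical names, labels are assumed to be alphanumeric, `&` is given precedence over `|` for simplicity, and an empty expression is treated as visible to everyone.

```python
import re

TOKEN = re.compile(r"\s*([()&|]|[A-Za-z0-9_]+)")

DEMO_ROWS = [
    ("201307241535-1", "root:created_user:sean", "audit", "succeeded"),
    ("201307241535-1", "root:set_password:sean", "admin&audit", "password"),
    ("201307241537-2", "sean:logged_in:host", "system", "succeeded"),
    ("201307241538-3", "sean:read:/tmp/a", "audit", "succeeded"),
    ("201307241539-4", "sean:modified:/tmp/a", "audit", "failed"),
    ("201307241540-5", "sean:logged_out:host", "system", "succeeded"),
]


def tokenize(expr):
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"bad visibility expression: {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr, auths):
    """Return True if the label set `auths` satisfies the expression `expr`."""
    tokens = tokenize(expr)
    if not tokens:
        return True  # assumption: unlabeled cells are visible to everyone
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_term()
            result = result and rhs
        return result

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if token in {")", "&", "|"}:
            raise ValueError(f"unexpected token: {token!r}")
        return token in auths

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result


def visible_rows(rows, auths):
    """Keep only the rows whose visibility expression the reader satisfies."""
    return [row for row in rows if is_visible(row[2], auths)]


if __name__ == "__main__":
    for row in visible_rows(DEMO_ROWS, {"audit"}):
        print(row)
```

With only the `audit` label a reader sees the `created_user`, `read`, and `modified` entries but not the `set_password` entry, which requires both `admin` and `audit`.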

Page 29: Apache Accumulo and Cloudera

VERSIONS REDUX

Page 30: Apache Accumulo and Cloudera

Recap

• Accumulo 1.3.x, 1.4.x, and 1.5.x all work with CDH3
• Accumulo 1.5.x should work with CDH4

Page 31: Apache Accumulo and Cloudera

Cloudera Support

• Naturally, Cloudera has tested and packaged Accumulo 1.5…
• But 1.5 is rather bleeding edge…
• So, we instead backported Hadoop 2.0 support from 1.5 onto 1.4.3

Page 32: Apache Accumulo and Cloudera

ECOSYSTEM INTEGRATION

Page 33: Apache Accumulo and Cloudera

Apache  Nutch  


Page 34: Apache Accumulo and Cloudera

Apache  Pig  


Page 35: Apache Accumulo and Cloudera

DEMO

Page 36: Apache Accumulo and Cloudera

NEXT STEPS

Page 37: Apache Accumulo and Cloudera

Recap

• What’s available today
  • Beta release of Accumulo 1.4.3 on CDH4.3
  • Beta release of Accumulo 1.4.3 Pig integration
• Semi-private beta
  • Contact me ([email protected]) if you’re interested in trying out the bits

Page 38: Apache Accumulo and Cloudera

Future Ideas (not promises ;)

• Cloudera Manager integration
• Flume integration
• Sqoop integration
• Hive integration
• Impala integration

Page 39: Apache Accumulo and Cloudera

What next?

• Download Hadoop!
  • CDH available at www.cloudera.com
  • Cloudera provides pre-loaded VMs
    • https://ccp.cloudera.com/display/SUPPORT/Cloudera+QuickStart+VM
• Reach out to me ([email protected]) if you want to try out the Accumulo beta
  • Instructions to replicate the demos pending

Page 40: Apache Accumulo and Cloudera

My personal preference

• Cloudera Manager
  • https://ccp.cloudera.com/display/SUPPORT/Downloads
  • Free up to unlimited nodes!

Page 41: Apache Accumulo and Cloudera

Shout Out

• Jason Trost
  • @jason_trost
  • covert.io blog posts
    • http://www.covert.io/post/18414889381/accumulo-nutch-and-gora
    • http://www.covert.io/post/18605091231/accumulo-and-pig

Page 42: Apache Accumulo and Cloudera

Questions?

• Contact me!
  • Joey Echeverria
  • [email protected]
  • @fwiffo
• We’re hiring!

Page 43: Apache Accumulo and Cloudera


