Performance evaluation of Cloudera Impala 0.6 beta with comparison to Hive

Copyright © CELLANT Corp. All Rights Reserved. http://www.cellant.jp/

Upload: yukinori-suda

Post on 14-Jun-2015


Page 1: Performance evaluation of cloudera impala 0.6 beta with comparison to Hive

Cloudera Impala 0.6 beta Performance Evaluation (with Comparison to Hive)

Mar. 6, 2013
CELLANT Corp. R&D Strategy Division

Yukinori SUDA (@sudabon)

Page 2

Cloudera Impala 0.6 beta

• Changelog from 0.5 beta
  – Cloudera Manager 4.5 and CDH 4.2 support Impala 0.6.
  – Support for the RCFile file format.
  – Added support for Impala on SUSE and Debian/Ubuntu.
• Supported platforms
  – RHEL 5.7/6.2 and CentOS 5.7/6.2
  – SUSE 11 with Service Pack 1 or later
  – Ubuntu 10.04/12.04 and Debian 6.03

Page 3

System Environment

• Installed via Cloudera Manager Free Edition 4.5.0
• All servers are connected with 1 Gbps Ethernet through an L2 switch

Master (3 servers)
  – Active NameNode
  – Stand-by NameNode
  – JobTracker, statestored

Slave (11 servers)
  – DataNode, TaskTracker, impalad (on each server)

Page 4

Server Specification

• CPU: Intel Core 2 Duo 2.13 GHz with Hyper Threading
• Memory: 4GB
• Disk: 7,200 rpm SATA mechanical hard disk drive
• OS: CentOS 6.2

Page 5

Benchmark

• Used CDH 4.2.0 + Impala version 0.6 beta
• Used hivebench from the open-source benchmark tool "HiBench"
  – https://github.com/hibench
• Scaled the datasets down to 1/10
  – The default configuration generates a table with 1 billion rows
• Modified the query
  – Deleted "INSERT INTO TABLE …" to evaluate read-only performance
• Combined several Hive storage formats with several compression methods
  – TextFile, SequenceFile, RCFile
  – No compression, Gzip, Snappy
• Compared query job latency
  – Average job latency over 5 measurements
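A minimal Python sketch of the measurement procedure above: run the query 5 times and average the wall-clock latency. The `run_query` callable is hypothetical; in practice it might shell out to `hive -e` or `impala-shell -q`, but the slides do not say how the jobs were submitted or timed.

```python
import time
from statistics import mean
from typing import Callable, List


def measure_latencies(run_query: Callable[[], None],
                      repeats: int = 5,
                      clock: Callable[[], float] = time.perf_counter) -> List[float]:
    """Run `run_query` `repeats` times and return each latency in seconds."""
    latencies = []
    for _ in range(repeats):
        start = clock()
        run_query()  # hypothetical: submit the benchmark query and wait for it to finish
        latencies.append(clock() - start)
    return latencies


def average_latency(latencies: List[float]) -> float:
    """Average job latency, the metric plotted on the result slides."""
    return mean(latencies)


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a cluster.
    runs = measure_latencies(lambda: time.sleep(0.01))
    print(f"runs: {[round(r, 3) for r in runs]}  avg: {average_latency(runs):.3f} sec")
```

The clock is injectable so the averaging logic can be checked without a cluster or real timings.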

Page 6

Modified Datasets

• Uservisits table
  – 100 million rows
  – Table definitions:
      • sourceIP string
      • destURL string
      • visitDate string
      • adRevenue double
      • userAgent string
      • countryCode string
      • languageCode string
      • searchWord string
      • duration int
• Rankings table
  – 12 million rows
  – Table definitions:
      • pageURL string
      • pageRank int
      • avgDuration int
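One way to prepare the five storage/compression variants of these tables is to rewrite a plain source table with Hive's output-compression settings. The Python sketch below only generates the HiveQL text. The source table name (`rankings_raw`), the `<table>_<format>_<codec>` naming scheme, and the use of `INSERT OVERWRITE ... SELECT` are assumptions, not the procedure documented in the slides; `hive.exec.compress.output`, `mapred.output.compression.codec` and `mapred.output.compression.type` are the MRv1-era property names.

```python
from typing import List, Optional, Tuple

Column = Tuple[str, str]

# Schemas from the "Modified Datasets" slide.
USERVISITS: List[Column] = [
    ("sourceIP", "STRING"), ("destURL", "STRING"), ("visitDate", "STRING"),
    ("adRevenue", "DOUBLE"), ("userAgent", "STRING"), ("countryCode", "STRING"),
    ("languageCode", "STRING"), ("searchWord", "STRING"), ("duration", "INT"),
]
RANKINGS: List[Column] = [("pageURL", "STRING"), ("pageRank", "INT"), ("avgDuration", "INT")]

GZIP = "org.apache.hadoop.io.compress.GzipCodec"
SNAPPY = "org.apache.hadoop.io.compress.SnappyCodec"

# The five format/codec combinations that appear on the result charts.
COMBINATIONS: List[Tuple[str, Optional[str]]] = [
    ("TEXTFILE", None), ("SEQUENCEFILE", GZIP), ("SEQUENCEFILE", SNAPPY),
    ("RCFILE", GZIP), ("RCFILE", SNAPPY),
]


def hive_statements(table: str, columns: List[Column], fmt: str,
                    codec: Optional[str], source: str) -> List[str]:
    """Return HiveQL that copies `source` into a new table stored as `fmt`/`codec`."""
    codec_name = codec.rsplit(".", 1)[1].replace("Codec", "").lower() if codec else "none"
    target = f"{table}_{fmt.lower()}_{codec_name}"  # hypothetical naming scheme
    stmts = []
    if codec:
        stmts.append("SET hive.exec.compress.output=true;")
        stmts.append(f"SET mapred.output.compression.codec={codec};")
        if fmt == "SEQUENCEFILE":
            # Compress blocks of records together rather than each record individually.
            stmts.append("SET mapred.output.compression.type=BLOCK;")
    else:
        stmts.append("SET hive.exec.compress.output=false;")
    cols = ", ".join(f"{name} {typ}" for name, typ in columns)
    stmts.append(f"CREATE TABLE {target} ({cols}) STORED AS {fmt};")
    stmts.append(f"INSERT OVERWRITE TABLE {target} SELECT * FROM {source};")
    return stmts


if __name__ == "__main__":
    for fmt, codec in COMBINATIONS:
        print("\n".join(hive_statements("rankings", RANKINGS, fmt, codec, "rankings_raw")))
        print()
```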

Page 7

Modified Query

SELECT sourceIP, sum(adRevenue) as totalRevenue, avg(pageRank)
FROM rankings_t R
JOIN (
  SELECT sourceIP, destURL, adRevenue
  FROM uservisits_t UV
  WHERE (datediff(UV.visitDate, '1999-01-01') >= 0
         AND datediff(UV.visitDate, '2000-01-01') <= 0)
) NUV
ON (R.pageURL = NUV.destURL)
group by sourceIP
order by totalRevenue DESC
limit 1;
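To make the semantics of the modified query concrete, here is a small in-memory Python equivalent. It is a sketch for illustration only, not how Hive or Impala execute the query. Rows are reduced to the columns the query touches, and visit dates are `datetime.date` values rather than the `string` column of the real table. Together, the two `datediff` conditions keep visits dated from 1999-01-01 to 2000-01-01, both ends inclusive.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Optional, Tuple

Ranking = Tuple[str, int]                 # (pageURL, pageRank)
UserVisit = Tuple[str, str, date, float]  # (sourceIP, destURL, visitDate, adRevenue)


def top_source_ip(rankings: List[Ranking], uservisits: List[UserVisit],
                  start: date = date(1999, 1, 1),
                  end: date = date(2000, 1, 1)) -> Optional[Tuple[str, float, float]]:
    """Return (sourceIP, totalRevenue, avg(pageRank)) for the top sourceIP, or None."""
    # Subquery NUV: datediff(visitDate, start) >= 0 AND datediff(visitDate, end) <= 0,
    # i.e. start <= visitDate <= end.
    nuv = [(ip, url, rev) for ip, url, day, rev in uservisits if start <= day <= end]

    # Index rankings by pageURL for the equi-join R.pageURL = NUV.destURL.
    ranks_by_url: Dict[str, List[int]] = defaultdict(list)
    for url, rank in rankings:
        ranks_by_url[url].append(rank)

    # GROUP BY sourceIP over the joined rows: sum(adRevenue) and avg(pageRank).
    total_revenue: Dict[str, float] = defaultdict(float)
    rank_sum: Dict[str, int] = defaultdict(int)
    rank_count: Dict[str, int] = defaultdict(int)
    for ip, url, rev in nuv:
        for rank in ranks_by_url.get(url, []):  # inner join: unmatched visits drop out
            total_revenue[ip] += rev
            rank_sum[ip] += rank
            rank_count[ip] += 1

    if not total_revenue:
        return None
    # ORDER BY totalRevenue DESC LIMIT 1 (SQL breaks ties arbitrarily;
    # here the first sourceIP seen wins).
    best = max(total_revenue, key=lambda ip: total_revenue[ip])
    return best, total_revenue[best], rank_sum[best] / rank_count[best]
```

Note that the join can multiply rows: if one `pageURL` appears in several ranking rows, the matching visit's `adRevenue` is counted once per match, just as in the SQL.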

Page 8

Benchmark Result (Hive)

[Bar chart: Avg. Job Latency [sec] by storage format and compression, axis 0 to 250]
• TextFile, No Comp.: 235.843
• SequenceFile, Gzip: 227.883
• SequenceFile, Snappy: 213.616
• RCFile, Gzip: 234.289
• RCFile, Snappy: 197.894

Page 9

Benchmark Result (Impala)

[Bar chart: Avg. Job Latency [sec] by storage format and compression, axis 0 to 250]
• TextFile, No Comp.: 32.776
• SequenceFile, Gzip: 21.25
• SequenceFile, Snappy: 17.725
• RCFile, Gzip: 17.03
• RCFile, Snappy: 16.059

Page 10

Block Location Cache effect?

Impala job latency per run [sec]

Job       TextFile   SequenceFile        RCFile
          No Comp.   Gzip      Snappy    Gzip      Snappy
1st       50.256     23.692    22.085    18.475    20.042
2nd       34.905     20.710    19.733    16.690    18.859
3rd       30.752     20.604    15.608    16.620    16.642
4th       26.848     20.625    15.602    16.617    12.148
5th       21.121     20.620    15.597    16.747    12.606
Average   32.776     21.250    17.725    17.030    16.059

• In every configuration the 1st job is the slowest, and the fastest job is always one of the later runs. Is this a Block Location Cache effect?
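The averages and the first-run pattern in this table can be re-derived with a few lines of Python. The column labels follow the format/codec grouping of the result charts (uncompressed TextFile, then SequenceFile and RCFile with Gzip and Snappy), which is a reading of the flattened chart labels rather than something the table states. The "runs 2-5" average is an extra view, not a figure from the slides, showing how much the first run raises the mean.

```python
from statistics import mean
from typing import Dict, List, NamedTuple

# Per-run Impala latencies [sec] from the table; labels are an assumed reading.
RUNS: Dict[str, List[float]] = {
    "TextFile / No Comp.":   [50.256, 34.905, 30.752, 26.848, 21.121],
    "SequenceFile / Gzip":   [23.692, 20.710, 20.604, 20.625, 20.620],
    "SequenceFile / Snappy": [22.085, 19.733, 15.608, 15.602, 15.597],
    "RCFile / Gzip":         [18.475, 16.690, 16.620, 16.617, 16.747],
    "RCFile / Snappy":       [20.042, 18.859, 16.642, 12.148, 12.606],
}


class RunSummary(NamedTuple):
    average: float
    average_runs_2_to_5: float
    first_is_slowest: bool
    fastest_run: int  # 1-based run number


def summarize(latencies: List[float]) -> RunSummary:
    """Average, average without the first run, and where the min/max runs fall."""
    return RunSummary(
        average=mean(latencies),
        average_runs_2_to_5=mean(latencies[1:]),
        first_is_slowest=latencies[0] == max(latencies),
        fastest_run=latencies.index(min(latencies)) + 1,
    )


if __name__ == "__main__":
    for label, runs in RUNS.items():
        s = summarize(runs)
        print(f"{label:22s} avg={s.average:7.3f}  avg(runs 2-5)={s.average_runs_2_to_5:7.3f}  "
              f"1st slowest={s.first_is_slowest}  fastest=run {s.fastest_run}")
```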

Page 11

Conclusion

• Impala is over 10 times faster than MRv1 + Hive in four of the five storage format/compression combinations (about 7 times for uncompressed TextFile)
• Specifically,
  – Impala 0.6 beta
      • RCFile compressed with Snappy: 16.059 sec
  – MRv1 + Hive 0.10
      • RCFile compressed with Snappy: 197.894 sec
• We hope that Impala GA, included in CDH5, will be faster still, with
  – Support for the Trevni columnar format
  – An optimized query planner
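The "over 10 times" figure is the ratio of average Hive latency to average Impala latency for each configuration. A short Python sketch computing it from the averages on the two result slides; the format/codec labels are an assumed reading of the flattened chart labels:

```python
from typing import Dict

# Average job latency [sec] from the Hive and Impala result slides.
HIVE: Dict[str, float] = {
    "TextFile / No Comp.": 235.843, "SequenceFile / Gzip": 227.883,
    "SequenceFile / Snappy": 213.616, "RCFile / Gzip": 234.289, "RCFile / Snappy": 197.894,
}
IMPALA: Dict[str, float] = {
    "TextFile / No Comp.": 32.776, "SequenceFile / Gzip": 21.250,
    "SequenceFile / Snappy": 17.725, "RCFile / Gzip": 17.030, "RCFile / Snappy": 16.059,
}


def speedups(hive: Dict[str, float], impala: Dict[str, float]) -> Dict[str, float]:
    """Speedup = Hive latency / Impala latency for each configuration in both dicts."""
    return {k: hive[k] / impala[k] for k in hive if k in impala}


if __name__ == "__main__":
    for label, x in speedups(HIVE, IMPALA).items():
        print(f"{label:22s} {x:5.1f}x")
```

On these averages the ratio is about 12.3 for RCFile + Snappy and ranges from roughly 7 (TextFile, no compression) to roughly 14 (RCFile + Gzip).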

Page 12

Thanks.