Accumulo Summit 2014: Benchmarking Accumulo: How Fast is Fast?

Posted on 15-Jan-2015

DESCRIPTION

Speaker: Mike Drob

Apache Accumulo has long held a reputation for enabling high-throughput operations in write-heavy workloads. In this talk, we use the Yahoo! Cloud Serving Benchmark (YCSB) to put real numbers on Accumulo performance. We then compare these numbers to previous versions and to other databases, and wrap up with a discussion of parameters that can be tweaked to improve them.

TRANSCRIPT

1. Benchmarking Accumulo: How Fast is Fast? Mike Drob, Software Engineer, Cloudera

2. Me: Cloudera engineer, Accumulo committer, perpetual tinkerer. (Image: Victor Grigas, CC BY-SA 3.0)
3. Agenda: Methodology; Accumulo 1.4 to 1.6; Accumulo to HBase; Conclusions. (Image: Reuvenk, CC BY-SA 2.5)
4. Methodology. Measuring performance: task latency (time) and throughput (bps). Workloads: read, write, mixed. (Image: AngMoKio, CC BY-SA 2.5)
5. Methodology. Yahoo! Cloud Serving Benchmark (YCSB): workloads and connectors; highly configurable (number of rows/columns, size of value, number of threads); parallelizable across a configurable number of clients. (Image: Sfoskett, CC BY-SA 3.0)
6. Accumulo across versions
7. Accumulo across versions. Setup: Accumulo 1.4.4-cdh4.5.0 vs. Accumulo 1.6.0-cdh4.6.0-beta-1; YCSB 0.14+50; 80-node cluster across 5 racks; 10 clients. (Image: public domain, via USAF)
8. Accumulo across versions. The data: 200 GB; 2k columns; table pre-split 80x; varying number of rows; varying value size. (We actually did a lot more, but it was hard to graph.) (Image: Morio, CC BY-SA 3.0)
9. Accumulo across versions. [Chart: aggregate read throughput (MB/sec) vs. number of rows (10 to 100,000), Accumulo 1.4 vs. Accumulo 1.6]
10. Accumulo across versions. [Chart: aggregate mixed-workload throughput (MB/sec) vs. number of rows (10 to 100,000), Accumulo 1.4 vs. Accumulo 1.6]
11. Accumulo across versions. [Chart: aggregate write throughput (MB/sec) vs. number of rows (10 to 100,000), Accumulo 1.4 vs. Accumulo 1.6]
12. Accumulo across versions. Write speed improved! Read speed is about the same. Something weird happens when writing 1000 rows. (Image: Christopher Foster, CC BY-SA 3.0)
13. Accumulo across versions. So, what happens at 1000 rows? Nothing. [Chart: throughput (MB/sec) vs. number of rows (10 to 100,000)] The problem is at 100 rows.
14. Accumulo and HBase
15. Accumulo and HBase. Setup: Accumulo 1.6.0-cdh4.6.0-beta-1 vs. HBase 0.94.15-cdh4.6.0; YCSB 0.14+50; 5 worker nodes; 5 split points; 5G heap, 3G memory map. (Image: Abdullah AlBargan, CC BY-ND 2.0)
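Slide 8 fixes the total volume (200 GB) and the column count (2k) while varying both the number of rows and the value size. The sketch below works through that arithmetic under an assumption the slide does not state outright: that the two were varied together so that rows × 2,000 columns × value size stays at 200 GB, using decimal units. It also shows how an MB/sec figure like those on the charts follows from bytes moved and elapsed time. The helper names (value_size_bytes, throughput_mb_per_sec) are hypothetical and are not part of YCSB or Accumulo.

```python
# Hypothetical sizing helpers for the slide 8 data set.
# Assumption: total volume and column count are held fixed, so the
# per-cell value size shrinks as the number of rows grows.

TOTAL_BYTES = 200 * 10**9  # "200 GB", assuming decimal gigabytes
COLUMNS = 2_000            # "2k columns"


def value_size_bytes(rows: int, total_bytes: int = TOTAL_BYTES,
                     columns: int = COLUMNS) -> int:
    """Bytes per cell value when rows * columns cells share total_bytes."""
    cells = rows * columns
    return total_bytes // cells


def throughput_mb_per_sec(bytes_moved: int, seconds: float) -> float:
    """Aggregate throughput in MB/sec (decimal megabytes)."""
    return bytes_moved / seconds / 10**6


if __name__ == "__main__":
    # Row counts matching the x-axis of the cross-version charts.
    for rows in (10, 100, 1_000, 10_000, 100_000):
        print(f"{rows:>7} rows -> {value_size_bytes(rows):>10,} bytes per value")
```

Under this assumption, the 10-row runs would use 10,000,000-byte values and the 100,000-row runs 1,000-byte values. As a purely hypothetical example of the throughput formula, moving the full 200 GB in 500 seconds would give throughput_mb_per_sec(200 * 10**9, 500) == 400.0.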
16. Accumulo and HBase. Single client (5 threads). Workload sizes: in memory (15G) and forcing disk activity (30G). Constant number of rows, varying number of columns. Activity: 100% write, 100% read. (Image: nahtanoj, CC BY 2.0)
17. Accumulo and HBase. [Chart: read throughput (MB/sec) vs. number of columns (100 to 1,000,000), reading 15GB (500 rows), Accumulo vs. HBase]
18. Accumulo and HBase. [Chart: read throughput (MB/sec) vs. number of columns (100 to 1,000,000), reading 30GB (1000 rows), Accumulo vs. HBase]
19. Accumulo and HBase. [Chart: write throughput (MB/sec) vs. number of columns (100 to 1,000,000), writing 15GB (500 rows), Accumulo vs. HBase]
20. Accumulo and HBase. [Chart: write throughput (MB/sec) vs. number of columns (100 to 1,000,000), writing 30GB (1000 rows), Accumulo vs. HBase]
21. Performance Tweaks
22. Performance Tweaks. Client side: number of rows/columns; batch writer threads; batch writer buffer size (use a large buffer for small values and a small buffer for large values); possible fix: ACCUMULO-2766. (Image: public domain, via USN)
23. Performance Tweaks. Server side: apply table splits liberally; increase the automatic split threshold. Some properties to play with: table.compaction.minor.logs.threshold, tserver.compaction.minor.concurrent.max, tserver.walog.max.size. If running with dfs.datanode.synconclose, also enable dfs.datanode.sync.behind.writes.
24. Thank You! Please visit our booth! Mike Drob, madrob@cloudera.com, @mikhaildrob
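Slides 15 to 20 pair a per-node memory setting with two workload sizes. One possible reading, which is an inference and not something the slides state, is that 5 worker nodes × a 3G memory map give 15G of aggregate in-memory capacity, so the 15G workload fits in memory and the 30G workload cannot. Both workloads also work out to the same 30 MB per row (15 GB / 500 rows and 30 GB / 1000 rows), so sweeping the column count changes only the per-cell value size. The sketch below checks that arithmetic; the function names (fits_in_memory, bytes_per_value) are hypothetical, and decimal units are assumed throughout.

```python
# Hypothetical arithmetic for the Accumulo vs. HBase runs (slides 15-20).
# Assumptions: "G"/"GB" are treated as decimal gigabytes, and the 3G memory
# map is a per-node setting on each of the 5 worker nodes.

GB = 10**9
WORKER_NODES = 5
MEM_MAP_PER_NODE = 3 * GB


def fits_in_memory(workload_bytes: int, nodes: int = WORKER_NODES,
                   per_node: int = MEM_MAP_PER_NODE) -> bool:
    """True if the workload fits within the aggregate in-memory map capacity."""
    return workload_bytes <= nodes * per_node


def bytes_per_value(workload_bytes: int, rows: int, columns: int) -> int:
    """Per-cell value size when every row holds `columns` equal-sized cells."""
    return workload_bytes // (rows * columns)


if __name__ == "__main__":
    for label, size, rows in (("15GB", 15 * GB, 500), ("30GB", 30 * GB, 1000)):
        print(f"{label}: {size // rows:,} bytes/row, in memory: {fits_in_memory(size)}")
        for columns in (100, 1_000, 10_000, 100_000, 1_000_000):
            print(f"  {columns:>9,} columns -> {bytes_per_value(size, rows, columns):>7,} bytes/value")
```

Under these assumptions, the column sweep spans values from 300,000 bytes (100 columns) down to 30 bytes (1,000,000 columns), the kind of small-versus-large value spread that the batch writer buffer advice on slide 22 distinguishes. The capacity check ignores heap and other overheads, so treat it as a rough bound at best.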