Performance Models for Apache Accumulo

Securely explore your data PERFORMANCE MODELS FOR APACHE ACCUMULO: THE HEAVY TAIL OF A SHARED-NOTHING ARCHITECTURE Chris McCubbin Director of Data Science Sqrrl Data, Inc.

Upload: sqrrl

Post on 17-Dec-2014

TRANSCRIPT

Page 1: Performance Models for Apache Accumulo

Securely explore your data

PERFORMANCE MODELS FOR APACHE ACCUMULO: THE HEAVY TAIL OF A SHARED-NOTHING ARCHITECTURE

Chris McCubbin Director of Data Science Sqrrl Data, Inc.

Page 2: Performance Models for Apache Accumulo

I’M NOT ADAM FUCHS

•  But perhaps I’m still an interesting guy
•  MS in CS from UMBC in Network Security and Quantum Computing
•  8 years at JHU/APL working on UxV Swarms
•  4 years at JHU/APL and TexelTek creating Big Data Applications for the NSA
•  Co-founder and Director of Data Science at Sqrrl

©2014 Sqrrl Data, Inc 2

Page 3: Performance Models for Apache Accumulo

SO, YOUR DISTRIBUTED APPLICATION IS SLOW

•  Today’s distributed applications run on tens or hundreds of library components
    •  Many versions, so internet advice could be ineffective or, worse, flat out wrong
•  Hundreds of settings
    •  Some, shall we say, could be better documented
•  Shared-nothing architectures are usually “shared-little” architectures with tricky interactions
•  Profiling is hard and time-consuming
•  What do we do?

Page 4: Performance Models for Apache Accumulo

TODAY’S TALK

1.  Quick intro to performance optimization
2.  Tricks and techniques for targeted, model-driven performance improvement of distributed applications
3.  A deep dive into improving bulk load application performance

Page 5: Performance Models for Apache Accumulo

The Apache Accumulo™ sorted, distributed key/value store is a secure, robust, scalable, high performance data storage and retrieval system.

•  Many applications in real-time storage and analysis of “big data”:
    •  Spatio-temporal indexing in non-relational distributed databases - Fox et al 2013 IEEE International Congress on Big Data
    •  Big Data Dimensional Analysis - Gadepally et al IEEE HPEC 2014
•  Leading its peers in performance and scalability:
    •  Achieving 100,000,000 database inserts per second using Accumulo and D4M - Kepner et al IEEE HPEC 2014
    •  An NSA Big Graph experiment (Technical Report NSA-RD-2013-056002v1)
    •  Benchmarking Apache Accumulo BigData Distributed Table Store Using Its Continuous Test Suite - Sen et al 2013 IEEE International Congress on Big Data

For more papers and presentations, see http://accumulo.apache.org/papers.html

Page 6: Performance Models for Apache Accumulo

SCALING UP: DIVIDE & CONQUER

•  Collections of KV pairs form Tables
•  Tables are partitioned into Tablets
•  Metadata tablets hold info about other tablets, forming a 3-level hierarchy
•  A Tablet is a unit of work for a Tablet Server

[Diagram: tablet lookup hierarchy]

Well-Known Location (zookeeper)
    Root Tablet: -∞ to ∞
        Metadata Tablet 1: -∞ to “Encyclopedia:Ocelot”
        Metadata Tablet 2: “Encyclopedia:Ocelot” to ∞

Data tablets, by table:
    Table: Adam’s Table
        Data Tablet -∞ : thing
        Data Tablet thing : ∞
    Table: Encyclopedia
        Data Tablet -∞ : Ocelot
        Data Tablet Ocelot : Yak
        Data Tablet Yak : ∞
    Table: Foo
        Data Tablet -∞ to ∞
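
As a rough illustration of how sorted split points route a row to a tablet, the sketch below runs a binary search over the split points of the Encyclopedia table from the diagram. The function name `find_tablet` and the list layout are assumptions made for this example. It is not Accumulo client code, which resolves tablets through ZooKeeper, the root tablet, and the metadata tablets.

```python
from bisect import bisect_left


def find_tablet(split_points, row):
    """Return the (start, end) range of the tablet that holds `row`.

    `split_points` is a sorted list of tablet end rows. A tablet covers
    rows greater than its start and up to and including its end.
    None stands for an unbounded side (-inf or +inf).
    """
    idx = bisect_left(split_points, row)
    start = split_points[idx - 1] if idx > 0 else None
    end = split_points[idx] if idx < len(split_points) else None
    return start, end


# Split points of the hypothetical "Encyclopedia" table from the diagram.
encyclopedia_splits = ["Ocelot", "Yak"]

for row in ["Aardvark", "Ocelot", "Panda", "Zebra"]:
    print(row, "->", find_tablet(encyclopedia_splits, row))
```

Because each lookup is a binary search over sorted ranges, adding tablets grows the lookup cost only logarithmically, which is part of what makes the divide-and-conquer layout scale.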


Page 7: Performance Models for Apache Accumulo

PERFORMANCE ANALYSIS CYCLE

[Diagram: iterative cycle. Start: Create Model. Cycle stages: Simulate & Experiment, Analyze, Modify Code, Refine Model. Outputs: Better Code + Models.]

Page 8: Performance Models for Apache Accumulo

MAKING A MODEL

•  Determine points of low-impact metrics
    •  Add some if needed
•  Create parallel state machine models with components driven by these metrics
•  Estimate running times and bottlenecks from a-priori information and/or apply measured statistics
•  Focus testing on validation of the initial model and the (estimated) pain points
•  Apply Amdahl’s Law
•  Rinse, repeat
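
For the Amdahl’s Law step, a minimal sketch of the arithmetic may help when deciding which model component to attack first. The function name and the example numbers are hypothetical and are not taken from the talk.

```python
def amdahl_speedup(fraction_improved, local_speedup):
    """Overall speedup when `fraction_improved` of the runtime gets
    `local_speedup` times faster: 1 / ((1 - p) + p / s)."""
    if not 0.0 <= fraction_improved <= 1.0:
        raise ValueError("fraction_improved must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / local_speedup)


# Hypothetical case: a stage that takes 40% of the job is made 4x faster.
print(round(amdahl_speedup(0.4, 4), 3))

# Upper bound if that same stage became free: 1 / (1 - 0.4).
print(round(amdahl_speedup(0.4, float("inf")), 3))
```

The second call shows why the model matters: even an infinitely fast stage cannot speed up the job by more than the share of time it occupied.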

Page 9: Performance Models for Apache Accumulo

BULK INGEST OVERVIEW

•  Accumulo supports two mechanisms to bring data in: streaming ingest and bulk ingest.
•  Bulk Ingest
    •  Goal: maximize throughput without constraining latency.
    •  Create a set of Accumulo RFiles, then register those files with Accumulo.
    •  RFiles are groups of sorted key-value pairs with some indexing information
    •  MapReduce has a built-in key sorting phase: a good fit to produce RFiles

Page 10: Performance Models for Apache Accumulo

BULK INGEST MODEL

[Diagram: three sequential phases along a Time axis: Map, Reduce, Register.]

Page 11: Performance Models for Apache Accumulo

BULK INGEST MODEL

Hypothetical Resource Usage

[Diagram: phases Map, Reduce, Register along a Time axis]

•  Map: 100% CPU, 20% Disk, 0% Network, 46 seconds
•  Reduce: 40% CPU, 100% Disk, 20% Network, 168 seconds
•  Register: 10% CPU, 20% Disk, 40% Network, 17 seconds

Page 12: Performance Models for Apache Accumulo

INSIGHT

[Diagram: phases Map, Reduce, Register along a Time axis]

•  Map: 100% CPU, 20% Disk, 0% Network, 46 seconds
•  Reduce: 40% CPU, 100% Disk, 20% Network, 168 seconds
•  Register: 10% CPU, 20% Disk, 40% Network, 17 seconds

Observations:

•  Spare disk here, spare CPU there: can we even out resource consumption?
•  Why did reduce take 168 seconds? It should be more like 40 seconds.
•  No clear bottleneck during registration: is there a synchronization or serialization problem?
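
The reasoning on this slide can be sketched as a tiny phase model. The code below encodes the hypothetical resource numbers from the slide and flags, per phase, which resource is saturated, if any. The `Phase` class and the 90% saturation threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    cpu: float      # utilization as a fraction, 0.0 to 1.0
    disk: float
    network: float
    seconds: float

    def bottleneck(self, saturated=0.9):
        """Return the busiest resource if it is saturated, else None."""
        usage = {"cpu": self.cpu, "disk": self.disk, "network": self.network}
        resource = max(usage, key=usage.get)
        return resource if usage[resource] >= saturated else None


# Hypothetical numbers from the slide.
PHASES = [
    Phase("map", cpu=1.0, disk=0.2, network=0.0, seconds=46),
    Phase("reduce", cpu=0.4, disk=1.0, network=0.2, seconds=168),
    Phase("register", cpu=0.1, disk=0.2, network=0.4, seconds=17),
]

total = sum(p.seconds for p in PHASES)
for p in PHASES:
    share = p.seconds / total
    limit = p.bottleneck() or "none (look for synchronization or serialization)"
    print(f"{p.name:9s} {share:5.1%} of job, bottleneck: {limit}")
```

A phase with no saturated resource, such as registration here, is the kind of result that points to waiting rather than working.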

Page 13: Performance Models for Apache Accumulo

LOOKING DEEPER: REFINED BULK INGEST MODEL

[Diagram: refined model as two parallel lanes along a Time axis. Lanes: Map Thread, Reduce Thread. Stages: Map Setup, Map, Sort, Spill, Merge, Serve, Shuffle, Sort, Reduce, Output. Legend: Parallel, Latch.]

Page 14: Performance Models for Apache Accumulo

BULK INGEST MODEL PREDICTIONS

•  We can constrain parts of the model by physical throughput limitations
    •  Disk -> Memory (100 MB/s, average 7200 rpm sequential read rate)
        •  Input reader
    •  Memory -> Disk (100 MB/s)
        •  Spill, OutputWriter
    •  Disk -> Disk (50 MB/s)
        •  Merge
    •  Network (Gigabit = 125 MB/s)
        •  Shuffle
•  And/or algorithmic limitations
    •  Sort, (Our) Map, (Our) Reduce, SerDe
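
These limits translate directly into lower bounds on stage time: bytes moved divided by the throughput of the constraining device. The sketch below assumes the slide's rates are megabytes per second, that 1 MB is 10^6 bytes, and that parallel devices scale linearly. The stage names and the 1 GB example are hypothetical.

```python
# Throughput ceilings from the slide, in MB/s (assumed to mean megabytes).
RATES_MB_PER_S = {
    "input_read": 100,    # disk -> memory
    "spill": 100,         # memory -> disk
    "output_write": 100,  # memory -> disk
    "merge": 50,          # disk -> disk
    "shuffle": 125,       # gigabit network
}


def min_seconds(num_bytes, stage, parallel_devices=1):
    """Lower bound on stage time imposed by physical throughput alone."""
    rate_bytes_per_s = RATES_MB_PER_S[stage] * 1_000_000 * parallel_devices
    return num_bytes / rate_bytes_per_s


one_gb = 1_000_000_000  # hypothetical data volume
for stage in RATES_MB_PER_S:
    print(f"{stage:13s} >= {min_seconds(one_gb, stage):5.1f} s per GB on one device")
```

A measured stage that runs far above its bound is a candidate for an algorithmic limitation rather than a hardware one.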


Page 15: Performance Models for Apache Accumulo

PERFORMANCE GOAL MODEL

Performance goals obtained through:
•  Simulation of individual components
•  Prediction of available resources at runtime

Page 16: Performance Models for Apache Accumulo

INSTRUMENTATION

application version                             1.3.3
application sha                                 8d17baf8
yarn.nodemanager.resource.memory-mb             43008
yarn.scheduler.minimum-allocation-mb            2048
yarn.scheduler.maximum-allocation-mb            43008
yarn.app.mapreduce.am.resource.mb               2048
yarn.app.mapreduce.am.command-opts              -Xmx1536m
mapreduce.map.memory.mb                         2048
mapreduce.map.java.opts                         -Xmx1638m
mapreduce.reduce.memory.mb                      2048
mapreduce.reduce.java.opts                      -Xmx1638m
mapreduce.task.io.sort.mb                       100
mapreduce.map.sort.spill.percent                0.8
mapreduce.task.io.sort.factor                   10
mapreduce.reduce.shuffle.parallelcopies         5
mapreduce.job.reduce.slowstart.completedmaps    1
mapreduce.map.output.compress                   FALSE
mapred.map.output.compression.codec             n/a
description                                     baseline

SYSTEM
node num                                        1
map num containers                              20
red num containers                              20
cores physical                                  12
cores logical                                   24
disk num                                        8
disk bandwidth                                  100
replication                                     1
monitoring                                      TRUE

DATA
input type                                      arcsight
input block size                                32
input block count                               20
input total                                     672054649
output map                                      9313303723
output map:combine input records                243419324
output map:combine records out                  209318830
output map:spill                                7325671992
output final                                    573802787
output map:combine                              7301374577

TIME
map:setup avg                                   8
map:map avg                                     12
map:sort avg                                    12
map:spill avg                                   12
map:spill count                                 7
map:merge avg                                   46
map total                                       290
red:shuffle avg                                 6
red:merge avg                                   38
red:reduce avg                                  68
red:total avg                                   112
red:reducer count                               20
job:total                                       396

RATIOS
input explosion factor                          13.877904
compression intermediate                        1.003327786
load combiner output                            0.783972562
total ratio                                     0.786581455

CONSTANTS
avg schema entry size (bytes)                   59

effective MB/sec                                1.618488025
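
The RATIOS entries and the effective MB/sec figure appear to be derived from the DATA and TIME entries. The formulas below are inferred by matching the numbers on this slide, not stated in the talk, so treat them as a plausible reconstruction. They assume the size counters are bytes, the block size is in MiB, and job:total is in seconds.

```python
MIB = 1024 * 1024

# Counters copied from the instrumentation slide.
input_block_size_mib = 32
input_block_count = 20
input_total = 672_054_649
output_map = 9_313_303_723
output_map_combine = 7_301_374_577
output_map_spill = 7_325_671_992
job_total_seconds = 396

# Nominal input: block size times block count (an inferred definition).
nominal_input = input_block_size_mib * input_block_count * MIB

ratios = {
    "input explosion factor": output_map / nominal_input,
    "compression intermediate": output_map_spill / output_map_combine,
    "load combiner output": output_map_combine / output_map,
    "total ratio": output_map_spill / output_map,
    "effective MB/sec": input_total / job_total_seconds / MIB,
}

for name, value in ratios.items():
    print(f"{name:26s} {value:.6f}")
```

Read this way, the baseline job writes almost 14 bytes of map output per input byte and moves only about 1.6 MB of input per second, which frames the improvements that follow.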


Page 17: Performance Models for Apache Accumulo

PERFORMANCE MEASUREMENT

Baseline (naive implementation)

[Timeline chart of measured stage durations. Lanes: Map Thread, Reduce Thread. Stages: Map Setup, Map, Sort, Spill, Merge, Serve, Shuffle, Sort, Reduce, Output.]

Page 18: Performance Models for Apache Accumulo

PATH TO IMPROVEMENT

1.  Profiling revealed much time spent serializing/deserializing Key

2.  With proper configuration, MapReduce supports comparison of keys in serialized form

3.  Rewriting Key’s serialization led to an order-preserving encoding, easy to compare in serialized form

4.  Configure MapReduce to use native code to compare Keys

5.  Tweak map input size and spill memory for as few spills as possible
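
To make the order-preserving idea concrete, the sketch below encodes a simplified key of (row, timestamp) so that plain byte-wise comparison of the encodings matches the logical sort order: row ascending, then timestamp descending. The two-field key, the escaping scheme, and the name `encode_key` are assumptions for illustration. This is not Accumulo's Key serialization or the encoding used in the talk.

```python
import struct

MAX_TS = 0x7FFFFFFFFFFFFFFF  # timestamps assumed to lie in [0, 2**63 - 1]


def encode_key(row: bytes, timestamp: int) -> bytes:
    """Encode (row, timestamp) so that byte order equals logical order."""
    # Escape embedded zero bytes, then end the row with 0x00 0x00. The
    # terminator sorts below any escaped or ordinary byte, so a row always
    # sorts before any longer row that it prefixes.
    escaped = row.replace(b"\x00", b"\x00\x01")
    # Invert the timestamp so newer entries sort first, stored big-endian
    # so that numeric order and byte order agree.
    inverted = struct.pack(">Q", MAX_TS - timestamp)
    return escaped + b"\x00\x00" + inverted


keys = [(b"yak", 5), (b"ocelot", 1), (b"ocelot", 9), (b"oce", 3), (b"oce\x00", 3)]

by_bytes = sorted(keys, key=lambda k: encode_key(*k))
by_logic = sorted(keys, key=lambda k: (k[0], -k[1]))
print(by_bytes == by_logic)
for key in by_bytes:
    print(key)
```

With an encoding like this, a sort can compare raw buffers without deserializing either key, which is the property that Hadoop's raw comparators (for example, implementations of `RawComparator`) rely on.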


Page 19: Performance Models for Apache Accumulo

PERFORMANCE MEASUREMENT

Optimized sorting

•  Improvements:
    •  Time for map-side merge went down
    •  Sort performance drastically improved in both map and reduce phases
        •  300% faster

Page 20: Performance Models for Apache Accumulo

PERFORMANCE MEASUREMENT

Optimized sorting

Insights:
•  Map is slower than expected
•  Output is disk bound; maybe we can move more processing to Reduce
    •  “Reverse Amdahl’s law”
•  Intermediate data inflation ratio (output/input for map) is very high

[Timeline chart of measured stage durations. Lanes: Map Thread, Reduce Thread. Stages: Map Setup, Map, Sort, Spill, Merge, Serve, Shuffle, Sort, Reduce, Output.]

Page 21: Performance Models for Apache Accumulo

PATH TO IMPROVEMENT

1.  Profiling revealed much time spent copying data
2.  Evaluation of data passed from map to reduce revealed inefficiencies:
    •  Constant timestamp cost 8 bytes per key
    •  Repeated column names could be encoded/compressed
    •  Some Key/Value pairs didn’t need to be created until reduce
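
A back-of-the-envelope check shows why these inefficiencies matter. The sketch below estimates how much of the baseline map output the constant timestamps account for, using counters from the instrumentation slide, and shows one simple way to dictionary-encode repeated column names. It assumes that "map:combine input records" equals the number of keys emitted by map. The `dictionary_encode` helper and the sample column names are hypothetical.

```python
TIMESTAMP_BYTES = 8

# Counters from the baseline instrumentation slide.
map_output_bytes = 9_313_303_723
map_output_records = 243_419_324  # assumed: one key per combine input record

timestamp_total = TIMESTAMP_BYTES * map_output_records
timestamp_fraction = timestamp_total / map_output_bytes
print(f"timestamps: {timestamp_total} bytes, {timestamp_fraction:.1%} of map output")


def dictionary_encode(names):
    """Replace repeated names with small integer ids.

    Returns (ids, table), where table[i] is the name assigned id i.
    """
    ids = {}
    encoded = [ids.setdefault(name, len(ids)) for name in names]
    return encoded, list(ids)


# Hypothetical column names repeated across many keys.
columns = ["src_ip", "dst_ip", "src_ip", "event", "dst_ip", "src_ip"]
encoded, table = dictionary_encode(columns)
print(encoded, table)
```

Under these assumptions, a field that never varies takes up roughly a fifth of the intermediate data, all of which is sorted, spilled, merged, and shuffled.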


Page 22: Performance Models for Apache Accumulo

PERFORMANCE MEASUREMENT

Optimized map code

•  Improvement:
    •  Big speedup in map function
        •  Twice as fast
    •  Reduced intermediate inflation sped up all steps between map and reduce

Page 23: Performance Models for Apache Accumulo

DO TRY THIS AT HOME

Hints for Accumulo Application Optimization

With these steps, we achieved 6X speedup:
•  Perform comparisons on serialized objects
•  With Map/Reduce, calculate how many merge steps are needed
•  Avoid premature data inflation
•  Leverage compression to shift bottlenecks
•  Always consider how fast your code should run
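
For the hint about calculating merge steps, a simplified estimate can be sketched as follows. The parameters mirror `mapreduce.task.io.sort.mb`, `mapreduce.map.sort.spill.percent`, and `mapreduce.task.io.sort.factor` from the instrumentation slide, but the formulas are a deliberate simplification: Hadoop's real buffer accounting also reserves space for record metadata, and its merge scheduling differs in detail. The 444 MB per-task output is an illustrative figure.

```python
import math


def estimate_spills(map_output_mb, sort_mb, spill_percent):
    """Rough number of spill files written by one map task."""
    usable_mb = sort_mb * spill_percent
    return max(1, math.ceil(map_output_mb / usable_mb))


def merge_passes(num_files, merge_factor):
    """Passes needed to merge `num_files` sorted files into one,
    combining at most `merge_factor` files per merge."""
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    passes = 0
    while num_files > 1:
        num_files = math.ceil(num_files / merge_factor)
        passes += 1
    return passes


spills = estimate_spills(map_output_mb=444, sort_mb=100, spill_percent=0.8)
print("estimated spills:", spills)
print("merge passes at factor 10:", merge_passes(spills, 10))
print("merge passes for 25 files at factor 10:", merge_passes(25, 10))
```

Each extra pass rereads and rewrites the intermediate data at disk-to-disk speed, so keeping the spill count at or below the merge factor avoids a full additional trip through the disks.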


Page 24: Performance Models for Apache Accumulo

SOME CURRENT ACCUMULO PERFORMANCE PROJECTS

•  Optimize metadata operations
    •  Batch to improve throughput (ACCUMULO-2175, ACCUMULO-2889)
    •  Remove from critical path where possible
•  Optimize write-ahead log performance
    •  Maximize throughput
    •  Reduce flushes
    •  Parallelize WALs (ACCUMULO-1083)
    •  Avoid downtime by pre-allocating

Page 25: Performance Models for Apache Accumulo

Securely explore your data

SQRRL IS HIRING! QUESTIONS?

Chris McCubbin Director of Data Science Sqrrl Data, Inc.