Accumulo Summit 2014: Benchmarking Accumulo: How Fast is Fast?
Post on 15-Jan-2015
1
Benchmarking Accumulo: How Fast is Fast?
Mike Drob
Software Engineer, Cloudera
Me
• Cloudera Engineer
• Accumulo Committer
• Perpetual Tinkerer
2
Victor Grigas CC-BY-SA 3.0
Agenda
• Methodology
• Accumulo 1.4 to 1.6
• Accumulo to HBase
• Conclusions
3
Reuvenk CC-BY-SA 2.5
Methodology
• Measuring Performance
• Task Latency (time)
• Throughput (bps)
• Workloads
• Read
• Write
• Mixed
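The two metrics above can be sketched in a few lines. This is a minimal illustration rather than the harness used in the talk: `measure`, `operation`, and `payload_bytes` are hypothetical names, and the operation timed here is a stand-in for a real read or write.

```python
import time


def measure(operation, payload_bytes, repetitions):
    """Call `operation` `repetitions` times.

    Returns (mean latency in seconds, throughput in bytes per second),
    assuming every call moves `payload_bytes` bytes.
    """
    latencies = []
    start = time.perf_counter()
    for _ in range(repetitions):
        t0 = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    mean_latency = sum(latencies) / len(latencies)
    total_bytes = payload_bytes * repetitions
    # Guard against a zero-length interval on very coarse clocks.
    throughput = total_bytes / elapsed if elapsed > 0 else float("inf")
    return mean_latency, throughput


if __name__ == "__main__":
    # Stand-in operation: copy a 1 KB buffer.
    buf = bytes(1024)
    latency, throughput = measure(lambda: bytes(buf), len(buf), 1000)
    print(f"mean latency: {latency:.9f} s, throughput: {throughput / 1e6:.1f} MB/s")
```

Latency is averaged per task, while throughput is taken over the whole run, so time spent between tasks counts against throughput but not against latency.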
4
AngMoKio CC-BY-SA 2.5
Methodology
• Yahoo! Cloud Serving Benchmark
• Workloads
• Connectors
• Highly configurable
• # of Rows/Columns
• Size of Value
• # of Threads
• Parallelizable number of clients
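As a rough sketch of how these knobs map onto a YCSB run, the helper below writes a workload properties file. The property names (`recordcount`, `fieldcount`, `fieldlength`, `threadcount`, and the proportion settings) are the core workload properties as I understand them, so check them against the YCSB version in use. The helper `ycsb_properties` and its parameters are hypothetical.

```python
def ycsb_properties(rows, columns, value_bytes, threads, read_fraction):
    """Build the text of a YCSB workload properties file.

    Assumes YCSB's CoreWorkload; the remaining operations after
    `read_fraction` are issued as updates.
    """
    props = {
        "workload": "com.yahoo.ycsb.workloads.CoreWorkload",
        "recordcount": rows,          # number of rows
        "operationcount": rows,       # one operation per row in this sketch
        "fieldcount": columns,        # number of columns per row
        "fieldlength": value_bytes,   # size of each value in bytes
        "threadcount": threads,       # client threads
        "readproportion": read_fraction,
        "updateproportion": round(1.0 - read_fraction, 6),
        "scanproportion": 0,
        "insertproportion": 0,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # A mixed workload: half reads, half updates.
    print(ycsb_properties(rows=1000, columns=2000, value_bytes=100,
                          threads=5, read_fraction=0.5))
```

Running several clients in parallel would mean launching one such run per client machine, each with its own slice of the key space.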
5
Sfoskett CC BY-SA 3.0
6
Accumulo across versions
Accumulo across versions
• Accumulo 1.4.4-cdh4.5.0
• Accumulo 1.6.0-cdh4.6.0-beta-1
• YCSB 0.1.4+50
• 80 node cluster
• 10 clients
• 5 racks
7
Public Domain via USAF
Accumulo across versions
The Data:
• 200 GB
• 2k Columns
• Pre-Split Table 80x
• Vary # of rows
• Vary value size
(we actually did a lot more,
but it was hard to graph)
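One way to read "vary # of rows, vary value size" is that the total stays at 200 GB while the value size shrinks as the row count grows. The arithmetic below assumes that reading, takes 1 GB as 10^9 bytes and "2k" as 2000 columns, and uses the row counts from the chart axes. None of these assumptions are confirmed by the slides.

```python
TOTAL_BYTES = 200 * 10**9   # assumed: 200 GB, decimal gigabytes
COLUMNS = 2000              # assumed: "2k" columns means 2000


def value_size_bytes(rows, total_bytes=TOTAL_BYTES, columns=COLUMNS):
    """Bytes per value if `total_bytes` is spread evenly over rows x columns."""
    return total_bytes // (rows * columns)


if __name__ == "__main__":
    for rows in (10, 100, 1000, 10_000, 100_000):
        print(f"{rows:>7} rows -> {value_size_bytes(rows):>10} bytes per value")
```

Under these assumptions the value size ranges from 10 MB at 10 rows down to 1 KB at 100,000 rows, so the row axis in the charts is also a value-size axis.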
8
Morio CC BY-SA 3.0
Accumulo across versions
9
[Chart: Aggregate Read. Throughput (MB/sec) vs. # of Rows (10 to 100000). Series: Accumulo 1.4, Accumulo 1.6.]
Accumulo across versions
10
[Chart: Aggregate Mixed. Throughput (MB/sec) vs. # of Rows (10 to 100000). Series: Accumulo 1.4, Accumulo 1.6.]
Accumulo across versions
11
[Chart: Aggregate Write. Throughput (MB/sec) vs. # of Rows (10 to 100000). Series: Accumulo 1.4, Accumulo 1.6.]
Accumulo across versions
• Write speed improved!
• Read speed about the same.
• Something weird happens writing 1000 rows.
12
Christopher Foster CC BY-SA 3.0
Accumulo across versions
So, what happens at 1000 rows…? Nothing.
13
[Chart: Throughput (MB/sec) vs. # of Rows (10 to 100000).]
Problem is at 100 rows.
14
Accumulo and HBase
Accumulo and HBase
• Accumulo 1.6.0-cdh4.6.0-beta-1
• HBase 0.94.15-cdh4.6.0
• YCSB 0.1.4+50
• 5 worker nodes
• 5 split points
• 5G Heap, 3G mem map
15
Abdullah AlBargan CC BY-ND 2.0
Accumulo and HBase
• Single client (5 threads)
• Workload sizes
• In memory (15G)
• Force disk activity (30G)
• Constant # of rows
• Vary # of columns
• Activity
• 100% Write
• 100% Read
16
nahtanoj CC-BY-2.0
Accumulo and HBase
17
[Chart: Reading 15GB (500 rows). Throughput (MB/sec) vs. # of columns (100 to 1000000). Series: Accumulo, HBase.]
Accumulo and HBase
18
[Chart: Reading 30GB (1000 rows). Throughput (MB/sec) vs. # of columns (100 to 1000000). Series: Accumulo, HBase.]
Accumulo and HBase
19
[Chart: Writing 15GB (500 rows). Throughput (MB/sec) vs. # of columns (100 to 1000000). Series: Accumulo, HBase.]
Accumulo and HBase
20
[Chart: Writing 30GB (1000 rows). Throughput (MB/sec) vs. # of columns (100 to 1000000). Series: Accumulo, HBase.]
21
Performance Tweaks
Performance Tweaks – Client Side
• Number of rows/columns
• Batch Writer Threads
• Batch Writer Buffer Size
• Use large buffer for small values
• Use small buffer for large values
• ACCUMULO-2766 possible fix
22
Public Domain via USN
Performance Tweaks – Server Side
• Apply table splits liberally
• Increase automatic split threshold
• Some properties to play with:
• table.compaction.minor.logs.threshold
• tserver.compaction.minor.concurrent.max
• tserver.walog.max.size
• If running with dfs.datanode.synconclose, also enable dfs.datanode.sync.behind.writes
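For the advice on applying table splits, a pre-split needs a list of split points. The sketch below generates evenly spaced ones, assuming zero-padded numeric row keys with a fixed prefix. That key format is an illustration only: the keys YCSB generates may be hashed or unpadded, in which case the split points would need to follow that distribution instead. `split_points` and its parameters are hypothetical names.

```python
def split_points(num_tablets, key_space, prefix="user", width=10):
    """Return num_tablets - 1 evenly spaced split points.

    Assumes row keys look like `prefix` followed by a zero-padded
    integer in [0, key_space), so lexicographic and numeric order agree.
    """
    if num_tablets < 2:
        return []
    step = key_space / num_tablets
    return [f"{prefix}{int(i * step):0{width}d}" for i in range(1, num_tablets)]


if __name__ == "__main__":
    # 5 tablets over 1000 keys gives 4 split points.
    for point in split_points(5, 1000, width=4):
        print(point)
```

The resulting list could be written one point per line to a file and passed to whatever mechanism adds splits to the table.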
23
24
Thank You! Please visit our booth!
Mike Drob – madrob@cloudera.com
@mikhaildrob