Accumulo Summit 2014: Benchmarking Accumulo: How Fast is Fast?
DESCRIPTION
Speaker: Mike Drob
Apache Accumulo has long held a reputation for enabling high-throughput operations in write-heavy workloads. In this talk, we use the Yahoo! Cloud Serving Benchmark (YCSB) to put real numbers on Accumulo performance. We then compare these numbers to previous versions and to other databases, and wrap up with a discussion of parameters that can be tweaked to improve them.
TRANSCRIPT
1
Benchmarking Accumulo: How Fast is Fast?
Mike Drob
Software Engineer, Cloudera
Me
• Cloudera Engineer
• Accumulo Committer
• Perpetual Tinkerer
2
Victor Grigas CC-BY-SA 3.0
Agenda
• Methodology
• Accumulo 1.4 to 1.6
• Accumulo to HBase
• Conclusions
3
Reuvenk CC-BY-SA 2.5
Methodology
• Measuring Performance
• Task Latency (time)
• Throughput (bps)
• Workloads
• Read
• Write
• Mixed
4
AngMoKio CC-BY-SA 2.5
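The two metrics on this slide can be derived from the same set of per-operation timings. The following is a minimal sketch, not the tooling used in the talk: the function name `summarize` and its inputs are hypothetical, operations are assumed to run back to back on a single thread, and throughput is reported in MB/sec to match the charts later in the deck.

```python
def summarize(latencies_s, bytes_per_op):
    """Summarize a run from per-operation latencies (in seconds).

    Assumes operations ran sequentially, so elapsed time is the sum
    of the individual latencies.
    """
    total_time = sum(latencies_s)
    ops = len(latencies_s)
    return {
        # Task latency: average time taken by one operation.
        "mean_latency_s": total_time / ops,
        # Throughput: bytes moved per second, scaled to MB (10**6 bytes).
        "throughput_mb_s": ops * bytes_per_op / total_time / 1e6,
    }


# Two operations of 1 MB each, taking half a second apiece.
result = summarize([0.5, 0.5], 1_000_000)
```

With several client threads the elapsed wall-clock time would be shorter than the sum of latencies, so an aggregate throughput figure would need the wall-clock duration instead.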
Methodology
• Yahoo! Cloud Serving Benchmark
• Workloads
• Connectors
• Highly configurable
• # of Rows/Columns
• Size of Value
• # of Threads
• Parallelizable number of clients
5
Sfoskett CC BY-SA 3.0
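The knobs listed above map onto properties of the YCSB core workload. As a rough sketch, the helper below renders a properties file from them; the helper name `ycsb_properties` is hypothetical, and treating writes as `updateproportion` is an assumption, since the slides do not say whether the write workloads used updates or inserts.

```python
def ycsb_properties(rows, columns, value_bytes, threads, read_fraction):
    """Render YCSB core-workload properties as text (one key=value per line)."""
    props = {
        "workload": "com.yahoo.ycsb.workloads.CoreWorkload",
        "recordcount": rows,          # number of rows
        "fieldcount": columns,        # number of columns per row
        "fieldlength": value_bytes,   # size of each value, in bytes
        "threadcount": threads,       # client threads
        "readproportion": read_fraction,
        # Assumption: all non-read operations are updates.
        "updateproportion": round(1.0 - read_fraction, 6),
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


# A mixed workload: half reads, half writes.
mixed = ycsb_properties(rows=1000, columns=2000, value_bytes=100000,
                        threads=5, read_fraction=0.5)
```

The number of parallel clients is not a workload property; it would be controlled by how many YCSB processes are launched.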
6
Accumulo across versions
Accumulo across versions
• Accumulo 1.4.4-cdh4.5.0
• Accumulo 1.6.0-cdh4.6.0-beta-1
• YCSB 0.14+50
• 80 node cluster
• 10 clients
• 5 racks
7
Public Domain via USAF
Accumulo across versions
The Data:
• 200 GB
• 2k Columns
• Pre-Split Table 80x
• Vary # of rows
• Vary value size
(We actually did a lot more, but it was hard to graph.)
8
Morio CC BY-SA 3.0
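If the total data size and column count stay fixed while the number of rows varies, the value size follows by division. The arithmetic below is a sketch under stated assumptions: 200 GB is taken as 200 × 10^9 bytes, "2k columns" as 2000, the row counts are the tick marks on the charts that follow, and key and storage overhead are ignored. The function name is hypothetical.

```python
def value_size_bytes(total_bytes, rows, columns):
    """Bytes per value when total_bytes is spread evenly over rows x columns."""
    return total_bytes // (rows * columns)


TOTAL_BYTES = 200 * 10**9   # assumption: decimal gigabytes
COLUMNS = 2000              # "2k columns"

sizes = {
    rows: value_size_bytes(TOTAL_BYTES, rows, COLUMNS)
    for rows in (10, 100, 1000, 10000, 100000)
}
# 10 rows      -> 10,000,000 bytes per value
# 100,000 rows -> 1,000 bytes per value
```

Under these assumptions the benchmark sweeps from a few very large values to many small ones, which is worth keeping in mind when reading the throughput charts.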
Accumulo across versions
9
[Chart: Aggregate Read. Throughput (MB/sec, 0 to 1400) vs. # of Rows (10 to 100000). Series: Accumulo 1.4, Accumulo 1.6.]
Accumulo across versions
10
[Chart: Aggregate Mixed. Throughput (MB/sec, 0 to 2000) vs. # of Rows (10 to 100000). Series: Accumulo 1.4, Accumulo 1.6.]
Accumulo across versions
11
[Chart: Aggregate Write. Throughput (MB/sec, 0 to 450) vs. # of Rows (10 to 100000). Series: Accumulo 1.4, Accumulo 1.6.]
Accumulo across versions
• Write speed improved!
• Read speed about the same.
• Something weird happens writing 1000 rows.
12
Christopher Foster CC BY-SA 3.0
Accumulo across versions
So, what happens at 1000 rows…? Nothing.
13
[Chart: Throughput (MB/sec, 100 to 700) vs. # of Rows (10 to 100000).]
Problem is at 100 rows.
14
Accumulo and HBase
Accumulo and HBase
• Accumulo 1.6.0-cdh4.6.0-beta-1
• HBase 0.94.15-cdh4.6.0
• YCSB 0.14+50
• 5 worker nodes
• 5 split points
• 5G Heap, 3G mem map
15
Abdullah AlBargan CC BY-ND 2.0
Accumulo and HBase
• Single client (5 threads)
• Workload sizes
• In memory (15G)
• Force disk activity (30G)
• Constant # of rows
• Vary # of columns
• Activity
• 100% Write
• 100% Read
16
nahtanoj CC-BY-2.0
Accumulo and HBase
17
[Chart: Reading 15GB (500 rows). Throughput (MB/sec, 0 to 600) vs. # of columns (100 to 1000000). Series: Accumulo, HBase.]
Accumulo and HBase
18
[Chart: Reading 30GB (1000 rows). Throughput (MB/sec, 0 to 600) vs. # of columns (100 to 1000000). Series: Accumulo, HBase.]
Accumulo and HBase
19
[Chart: Writing 15GB (500 rows). Throughput (MB/sec, 0 to 80) vs. # of columns (100 to 1000000). Series: Accumulo, HBase.]
Accumulo and HBase
20
[Chart: Writing 30GB (1000 rows). Throughput (MB/sec, 0 to 80) vs. # of columns (100 to 1000000). Series: Accumulo, HBase.]
21
Performance Tweaks
Performance Tweaks – Client Side
• Number of rows/columns
• Batch Writer Threads
• Batch Writer Buffer Size
• Use large buffer for small values
• Use small buffer for large values
• ACCUMULO-2766 possible fix
22
Public Domain via USN
Performance Tweaks – Server Side
• Apply table splits liberally
• Increase automatic split threshold
• Some properties to play with:
• table.compaction.minor.logs.threshold
• tserver.compaction.minor.concurrent.max
• tserver.walog.max.size
• If running with dfs.datanode.synconclose, also enable dfs.datanode.sync.behind.writes
23
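One way to try the server-side properties above is through the Accumulo shell's `config` command. The sketch below only builds the command strings. The helper name, the table name, and every value shown are hypothetical placeholders rather than recommendations from the talk, and `table.split.threshold` is assumed here to be the property behind "automatic split threshold".

```python
def accumulo_shell_commands(table, table_props, system_props):
    """Build Accumulo shell `config` commands for the given properties."""
    # Per-table properties are set with -t <table>.
    commands = [
        f"config -t {table} -s {name}={value}"
        for name, value in table_props.items()
    ]
    # Tablet-server properties are set system-wide (no -t).
    commands += [
        f"config -s {name}={value}"
        for name, value in system_props.items()
    ]
    return commands


# Placeholder values for illustration only; tune against your own benchmark.
commands = accumulo_shell_commands(
    table="usertable",
    table_props={
        "table.split.threshold": "4G",
        "table.compaction.minor.logs.threshold": 5,
    },
    system_props={
        "tserver.compaction.minor.concurrent.max": 8,
        "tserver.walog.max.size": "2G",
    },
)
```

The two `dfs.datanode.*` settings belong to HDFS rather than Accumulo, so they would be set in the Hadoop configuration and are not covered by this sketch.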