Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Upload: accumulo-summit

Post on 15-Jan-2015

DESCRIPTION

Speaker: Mike Drob

Apache Accumulo has long held a reputation for enabling high-throughput operations in write-heavy workloads. In this talk, we use the Yahoo! Cloud Serving Benchmark (YCSB) to put real numbers on Accumulo performance. We then compare these numbers to previous versions and to other databases, and wrap up with a discussion of parameters that can be tweaked to improve them.

TRANSCRIPT

Page 1: Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?

Benchmarking Accumulo: How Fast is Fast?

Mike Drob

Software Engineer, Cloudera

Page 2

Me

• Cloudera Engineer

• Accumulo Committer

• Perpetual Tinkerer

Image credit: Victor Grigas, CC BY-SA 3.0

Page 3

Agenda

• Methodology

• Accumulo 1.4 to 1.6

• Accumulo to HBase

• Conclusions

Image credit: Reuvenk, CC BY-SA 2.5

Page 4

Methodology

• Measuring Performance

  • Task Latency (time)

  • Throughput (bps)

• Workloads

  • Read

  • Write

  • Mixed
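
The two metrics above can be sketched in a few lines of Python. This is only an illustration of the definitions, not the harness used for the talk: `run_benchmark` and `throughput_mb_per_sec` are hypothetical helper names, and the "operation" here is an in-memory list append standing in for a real database read or write.

```python
import time


def throughput_mb_per_sec(total_bytes, elapsed_seconds):
    """Convert a byte count and a wall-clock duration into MB/sec (1 MB = 10**6 bytes)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_bytes / elapsed_seconds / 1e6


def run_benchmark(operation, payloads):
    """Time operation(payload) for each payload.

    Returns (per-call latencies in seconds, aggregate throughput in MB/sec).
    """
    latencies = []
    total_bytes = 0
    start = time.perf_counter()
    for payload in payloads:
        t0 = time.perf_counter()
        operation(payload)  # stand-in for a read or write against the store
        latencies.append(time.perf_counter() - t0)
        total_bytes += len(payload)
    elapsed = time.perf_counter() - start
    return latencies, throughput_mb_per_sec(total_bytes, elapsed)


# Demo: 1,000 "writes" of 1,000 bytes each into an in-memory list.
sink = []
latencies, mbps = run_benchmark(sink.append, [b"x" * 1000] * 1000)
print(f"mean latency: {sum(latencies) / len(latencies):.9f} s")
print(f"throughput:   {mbps:.1f} MB/sec")
```

Latency is recorded per task while throughput is computed once over the whole run, which is why the two can move independently when a workload is batched or parallelized.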

Image credit: AngMoKio, CC BY-SA 2.5

Page 5

Methodology

• Yahoo! Cloud Serving Benchmark

• Workloads

• Connectors

• Highly configurable

• # of Rows/Columns

• Size of Value

• # of Threads

• Parallelizable number of clients
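
As a rough sketch of what "highly configurable" means in practice, the snippet below renders a YCSB workload properties file from a few knobs. The property names are the YCSB CoreWorkload and client properties as I understand them (`recordcount`, `fieldcount`, `fieldlength`, `readproportion`, `threadcount`, and so on); the function name and all values are illustrative assumptions, not the settings used in this talk. Writes are modeled as updates here, which is also an assumption.

```python
def render_ycsb_workload(record_count, field_count, field_length,
                         read_proportion, thread_count):
    """Render a YCSB-style properties file as text (hypothetical helper).

    record_count    -> number of rows
    field_count     -> number of columns per row
    field_length    -> size of each value in bytes
    read_proportion -> fraction of operations that are reads (rest are updates)
    thread_count    -> client threads
    """
    if not 0.0 <= read_proportion <= 1.0:
        raise ValueError("read_proportion must be between 0 and 1")
    props = {
        # Package name used by older YCSB releases; newer ones use site.ycsb.
        "workload": "com.yahoo.ycsb.workloads.CoreWorkload",
        "recordcount": record_count,
        "operationcount": record_count,
        "fieldcount": field_count,
        "fieldlength": field_length,
        "readproportion": read_proportion,
        "updateproportion": round(1.0 - read_proportion, 6),
        "insertproportion": 0,
        "scanproportion": 0,
        "threadcount": thread_count,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


# Illustrative mixed workload: 1,000 rows x 2,000 columns, 50% reads.
mixed = render_ycsb_workload(1000, 2000, 100_000, 0.5, 5)
print(mixed)
```

Running several clients in parallel then amounts to launching the same workload file from multiple machines.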

Image credit: Sfoskett, CC BY-SA 3.0

Page 6

Accumulo across versions

Page 7

Accumulo across versions

• Accumulo 1.4.4-cdh4.5.0

• Accumulo 1.6.0-cdh4.6.0-beta-1

• YCSB 0.14+50

• 80 node cluster

• 10 clients

• 5 racks

Image credit: Public domain, via USAF

Page 8

Accumulo across versions

The Data:

• 200 GB

• 2k Columns

• Pre-Split Table 80x

• Vary # of rows

• Vary value size

(we actually did a lot more, but it was hard to graph)
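
One reading of this slide is that the total data size and column count stay fixed while the row count changes, so the value size has to change with it. The sketch below works through that arithmetic. It assumes that reading is correct, uses decimal gigabytes, and ignores key and storage overhead; the row counts are the tick marks from the charts that follow.

```python
def value_size_bytes(total_bytes, rows, columns):
    """Bytes per value if total_bytes is spread evenly over rows x columns cells."""
    if rows <= 0 or columns <= 0:
        raise ValueError("rows and columns must be positive")
    return total_bytes // (rows * columns)


TOTAL_BYTES = 200 * 10**9   # 200 GB (decimal, an assumption)
COLUMNS = 2000              # "2k Columns"

for rows in (10, 100, 1000, 10_000, 100_000):
    size = value_size_bytes(TOTAL_BYTES, rows, COLUMNS)
    print(f"{rows:>7} rows -> {size:>10,} bytes per value")
```

Under these assumptions the values range from roughly 10 MB each at 10 rows down to about 1 KB each at 100,000 rows.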

Image credit: Morio, CC BY-SA 3.0

Page 9

Accumulo across versions

[Figure: Aggregate Read. Throughput (MB/sec, axis 0 to 1400) vs. # of Rows (10 to 100,000), Accumulo 1.4 vs. Accumulo 1.6.]

Page 10

Accumulo across versions

[Figure: Aggregate Mixed. Throughput (MB/sec, axis 0 to 2000) vs. # of Rows (10 to 100,000), Accumulo 1.4 vs. Accumulo 1.6.]

Page 11

Accumulo across versions

[Figure: Aggregate Write. Throughput (MB/sec, axis 0 to 450) vs. # of Rows (10 to 100,000), Accumulo 1.4 vs. Accumulo 1.6.]

Page 12

Accumulo across versions

• Write speed improved!

• Read speed about the same.

• Something weird happens writing 1000 rows.

Image credit: Christopher Foster, CC BY-SA 3.0

Page 13

Accumulo across versions

So, what happens at 1000 rows…? Nothing.

[Figure: Throughput (MB/sec, axis 100 to 700) vs. # of Rows (10 to 100,000).]

Problem is at 100 rows.

Page 14

Accumulo and HBase

Page 15

Accumulo and HBase

• Accumulo 1.6.0-cdh4.6.0-beta-1

• HBase 0.94.15-cdh4.6.0

• YCSB 0.14+50

• 5 worker nodes

• 5 split points

• 5G Heap, 3G mem map

Image credit: Abdullah AlBargan, CC BY-ND 2.0

Page 16

Accumulo and HBase

• Single client (5 threads)

• Workload sizes

• In memory (15G)

• Force disk activity (30G)

• Constant # of rows

• Vary # of columns

• Activity

• 100% Write

• 100% Read

Image credit: nahtanoj, CC BY 2.0

Page 17

Accumulo and HBase

[Figure: Reading 15GB (500 rows). Throughput (MB/sec, axis 0 to 600) vs. # of columns (100 to 1,000,000), Accumulo vs. HBase.]

Page 18

Accumulo and HBase

[Figure: Reading 30GB (1000 rows). Throughput (MB/sec, axis 0 to 600) vs. # of columns (100 to 1,000,000), Accumulo vs. HBase.]

Page 19

Accumulo and HBase

[Figure: Writing 15GB (500 rows). Throughput (MB/sec, axis 0 to 80) vs. # of columns (100 to 1,000,000), Accumulo vs. HBase.]

Page 20

Accumulo and HBase

[Figure: Writing 30GB (1000 rows). Throughput (MB/sec, axis 0 to 80) vs. # of columns (100 to 1,000,000), Accumulo vs. HBase.]

Page 21

Performance Tweaks

Page 22

Performance Tweaks – Client Side

• Number of rows/columns

• Batch Writer Threads

• Batch Writer Buffer Size

  • Use large buffer for small values

  • Use small buffer for large values

• ACCUMULO-2766 possible fix
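
One way to reason about the buffer-size advice is to count how many cells fit in the client buffer before it has to be flushed. The sketch below does only that arithmetic; it does not model the Batch Writer itself. `cells_per_flush` is a hypothetical helper, and the 50 MiB buffer and the value sizes are illustrative assumptions. In the Accumulo Java client, the corresponding knobs are `BatchWriterConfig.setMaxMemory` and `BatchWriterConfig.setMaxWriteThreads`.

```python
def cells_per_flush(buffer_bytes, value_bytes, key_overhead_bytes=0):
    """Approximate number of cells a client buffer holds before it must flush.

    key_overhead_bytes is a rough allowance for row/column key bytes per cell
    (zero by default, which overstates the count slightly).
    """
    per_cell = value_bytes + key_overhead_bytes
    if buffer_bytes <= 0 or per_cell <= 0:
        raise ValueError("buffer and cell sizes must be positive")
    return max(1, buffer_bytes // per_cell)


BUFFER = 50 * 1024 * 1024  # illustrative 50 MiB buffer (assumption)

for value_bytes in (1_000, 100_000, 10_000_000):
    count = cells_per_flush(BUFFER, value_bytes)
    print(f"{value_bytes:>10,}-byte values -> {count:>6,} cells per flush")
```

With small values a large buffer amortizes each flush over many thousands of cells, while with multi-megabyte values the same buffer holds only a handful, so most of its size buys little batching.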

Image credit: Public domain, via USN

Page 23

Performance Tweaks – Server Side

• Apply table splits liberally

• Increase automatic split threshold

• Some properties to play with:

  • table.compaction.minor.logs.threshold

  • tserver.compaction.minor.concurrent.max

  • tserver.walog.max.size

• If running with dfs.datanode.synconclose, also enable dfs.datanode.sync.behind.writes
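
The Accumulo properties listed above can be set from the Accumulo shell with `config -s` (adding `-t <table>` for per-table properties). The sketch below just generates those command lines so they can be reviewed before being pasted into a shell. The helper name, the table name `usertable`, and every value shown are placeholders for illustration, not recommendations from the talk.

```python
def accumulo_config_commands(table, table_props, system_props):
    """Build Accumulo shell 'config' commands (hypothetical helper).

    table_props  -> properties applied to one table with 'config -t'
    system_props -> properties applied system-wide with 'config -s'
    """
    commands = [f"config -t {table} -s {name}={value}"
                for name, value in table_props.items()]
    commands += [f"config -s {name}={value}"
                 for name, value in system_props.items()]
    return commands


# Placeholder values only; tune against your own benchmark results.
commands = accumulo_config_commands(
    table="usertable",
    table_props={"table.compaction.minor.logs.threshold": 6},
    system_props={
        "tserver.compaction.minor.concurrent.max": 8,
        "tserver.walog.max.size": "2G",
    },
)
for command in commands:
    print(command)
```

The two `dfs.datanode.*` settings are HDFS properties rather than Accumulo ones, so they belong in the HDFS configuration and are not covered by this sketch.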

Page 24

Thank You! Please visit our booth!

Mike Drob – [email protected]

@mikhaildrob