Accumulo Summit 2014: Benchmarking Accumulo: How Fast is Fast?

Posted on 15-Jan-2015

DESCRIPTION

Speaker: Mike Drob

Apache Accumulo has long held a reputation for enabling high-throughput operations in write-heavy workloads. In this talk, we use the Yahoo! Cloud Serving Benchmark (YCSB) to put real numbers on Accumulo performance. We then compare these numbers with previous versions and with other databases, and wrap up with a discussion of parameters that can be tuned to improve them.

TRANSCRIPT

Benchmarking Accumulo: How Fast is Fast?

Mike Drob

Software Engineer, Cloudera

Me

• Cloudera Engineer

• Accumulo Committer

• Perpetual Tinkerer

Image credit: Victor Grigas, CC BY-SA 3.0

Agenda

• Methodology

• Accumulo 1.4 to 1.6

• Accumulo to HBase

• Conclusions

Image credit: Reuvenk, CC BY-SA 2.5

Methodology

• Measuring Performance

• Task Latency (time)

• Throughput (bps)

• Workloads

• Read

• Write

• Mixed
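As a rough illustration of the two measurements named above, the sketch below summarizes one benchmark run. The function name `summarize`, the `RunSummary` fields, and the choice of MB/sec and a 95th-percentile latency are assumptions made for this example; the talk does not describe how its numbers were aggregated.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RunSummary:
    operations: int
    mean_latency_s: float    # average time per operation
    p95_latency_s: float     # 95th percentile, nearest-rank method
    throughput_mb_s: float   # payload moved per wall-clock second


def summarize(latencies_s: Sequence[float], bytes_moved: int,
              wall_clock_s: float) -> RunSummary:
    """Summarize one run (hypothetical helper, not part of YCSB)."""
    if not latencies_s or wall_clock_s <= 0:
        raise ValueError("need at least one sample and a positive duration")
    ordered = sorted(latencies_s)
    n = len(ordered)
    # Nearest-rank percentile with integer math: rank = ceil(0.95 * n).
    rank = (95 * n + 99) // 100
    return RunSummary(
        operations=n,
        mean_latency_s=sum(ordered) / n,
        p95_latency_s=ordered[rank - 1],
        throughput_mb_s=bytes_moved / 1e6 / wall_clock_s,
    )
```

Latency is reported per operation, while throughput is computed over the whole run, so a multi-threaded client can show high throughput even when individual operations are slow.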

Image credit: AngMoKio, CC BY-SA 2.5

Methodology

• Yahoo! Cloud Serving Benchmark

• Workloads

• Connectors

• Highly configurable

• # of Rows/Columns

• Size of Value

• # of Threads

• Parallelizable number of clients
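To make the knobs above concrete, here is a minimal sketch that renders a YCSB core-workload properties file. The property names (`recordcount`, `fieldcount`, `fieldlength`, `threadcount`, and the proportion settings) are YCSB core workload properties, and the `com.yahoo.ycsb` package name matches YCSB releases of that era. The mapping from the slide's bullets to these properties, the helper name `ycsb_properties`, and any values passed in are assumptions; the talk does not list its actual settings.

```python
def ycsb_properties(rows: int, columns: int, value_bytes: int,
                    threads: int, read_fraction: float) -> str:
    """Render a YCSB properties file as text (hypothetical helper)."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    props = {
        "workload": "com.yahoo.ycsb.workloads.CoreWorkload",
        "recordcount": rows,            # of rows
        "operationcount": rows,         # one operation per row (assumption)
        "fieldcount": columns,          # of columns per row
        "fieldlength": value_bytes,     # size of each value, in bytes
        "threadcount": threads,         # of client threads
        "readproportion": read_fraction,
        "updateproportion": round(1.0 - read_fraction, 6),
        "insertproportion": 0,
        "scanproportion": 0,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    # Example values only; not the configuration used in the talk.
    print(ycsb_properties(rows=1000, columns=2000, value_bytes=100,
                          threads=5, read_fraction=0.5))
```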

Image credit: Sfoskett, CC BY-SA 3.0

Accumulo across versions

• Accumulo 1.4.4-cdh4.5.0

• Accumulo 1.6.0-cdh4.6.0-beta-1

• YCSB 0.14+50

• 80 node cluster

• 10 clients

• 5 racks

Image credit: public domain, via USAF

Accumulo across versions

The Data:

• 200 GB

• 2k Columns

• Pre-Split Table 80x

• Vary # of rows

• Vary value size

(We actually did a lot more, but it was hard to graph.)
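One way to read "vary # of rows, vary value size" is that the total volume stays at 200 GB while the two parameters trade off against each other. The sketch below works out the value size under that reading. It assumes decimal gigabytes, ignores key and timestamp overhead, and takes the row counts from the chart axis; the talk does not state how value sizes were actually chosen.

```python
TOTAL_BYTES = 200 * 10**9   # "200 GB", read as decimal gigabytes (assumption)
COLUMNS = 2_000             # "2k Columns"


def value_size_bytes(rows: int, columns: int = COLUMNS,
                     total_bytes: int = TOTAL_BYTES) -> int:
    """Bytes per value if the total volume is spread evenly over all cells."""
    if rows <= 0 or columns <= 0:
        raise ValueError("rows and columns must be positive")
    return total_bytes // (rows * columns)


def sizing_table(row_counts=(10, 100, 1_000, 10_000, 100_000)) -> dict:
    """Map each row count to its implied value size in bytes."""
    return {rows: value_size_bytes(rows) for rows in row_counts}


if __name__ == "__main__":
    for rows, size in sizing_table().items():
        print(f"{rows:>7} rows -> {size:>10} bytes per value")
```

Under these assumptions, 10 rows implies values of about 10 MB each, while 100000 rows implies values of about 1 KB.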

Image credit: Morio, CC BY-SA 3.0

Chart: Aggregate Read. Throughput (MB/sec), 0 to 1400, against # of rows, 10 to 100000. Series: Accumulo 1.4, Accumulo 1.6.

Chart: Aggregate Mixed. Throughput (MB/sec), 0 to 2000, against # of rows, 10 to 100000. Series: Accumulo 1.4, Accumulo 1.6.

Chart: Aggregate Write. Throughput (MB/sec), 0 to 450, against # of rows, 10 to 100000. Series: Accumulo 1.4, Accumulo 1.6.

Accumulo across versions

• Write speed improved!

• Read speed about the same.

• Something odd happens when writing 1000 rows.

Image credit: Christopher Foster, CC BY-SA 3.0

Accumulo across versions

So, what happens at 1000 rows…? Nothing.

Chart: Throughput (MB/sec), 100 to 700, against # of rows, 10 to 100000.

Problem is at 100 rows.

Accumulo and HBase

• Accumulo 1.6.0-cdh4.6.0-beta-1

• HBase 0.94.15-cdh4.6.0

• YCSB 0.14+50

• 5 worker nodes

• 5 split points

• 5G Heap, 3G mem map

Image credit: Abdullah AlBargan, CC BY-ND 2.0

Accumulo and HBase

• Single client (5 threads)

• Workload sizes

• In memory (15G)

• Force disk activity (30G)

• Constant # of rows

• Vary # of columns

• Activity

• 100% Write

• 100% Read
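The test plan above amounts to a small grid of runs. The sketch below enumerates it. The row counts (500 for 15 GB, 1000 for 30 GB) come from the chart titles in the talk; treating the five column-axis ticks (100 to 1000000) as the measured points is an assumption, as are the names `run_matrix` and the dictionary keys.

```python
from itertools import product

# Dataset size in GB -> row count, as given in the talk's chart titles.
ROWS_BY_SIZE_GB = {15: 500, 30: 1000}
ACTIVITIES = ("write", "read")          # 100% write, 100% read
# Column counts read off the chart axis ticks (assumed to be the test points).
COLUMN_COUNTS = (100, 1_000, 10_000, 100_000, 1_000_000)
CLIENT_THREADS = 5                       # single client, 5 threads


def run_matrix() -> list:
    """Enumerate every (size, activity, column count) combination."""
    runs = []
    for (size_gb, rows), activity, columns in product(
            ROWS_BY_SIZE_GB.items(), ACTIVITIES, COLUMN_COUNTS):
        runs.append({
            "size_gb": size_gb,
            "rows": rows,
            "activity": activity,
            "columns": columns,
            "threads": CLIENT_THREADS,
        })
    return runs


if __name__ == "__main__":
    for run in run_matrix():
        print(run)
```

With two dataset sizes, two activities, and five column counts, this yields 20 runs per database.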

Image credit: nahtanoj, CC BY 2.0

Chart: Reading 15GB (500 rows). Throughput (MB/sec), 0 to 600, against # of columns, 100 to 1000000. Series: Accumulo, HBase.

Chart: Reading 30GB (1000 rows). Throughput (MB/sec), 0 to 600, against # of columns, 100 to 1000000. Series: Accumulo, HBase.

Chart: Writing 15GB (500 rows). Throughput (MB/sec), 0 to 80, against # of columns, 100 to 1000000. Series: Accumulo, HBase.

Chart: Writing 30GB (1000 rows). Throughput (MB/sec), 0 to 80, against # of columns, 100 to 1000000. Series: Accumulo, HBase.

Performance Tweaks

Performance Tweaks – Client Side

• Number of rows/columns

• Batch Writer Threads

• Batch Writer Buffer Size

• Use large buffer for small values

• Use small buffer for large values

• ACCUMULO-2766 possible fix

Image credit: public domain, via USN

Performance Tweaks – Server Side

• Apply table splits liberally

• Increase automatic split threshold

• Some properties to play with:

• table.compaction.minor.logs.threshold

• tserver.compaction.minor.concurrent.max

• tserver.walog.max.size

• If running with dfs.datanode.synconclose, also enable dfs.datanode.sync.behind.writes
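The last bullet pairs two HDFS settings. As a minimal sketch, the helper below renders them in the standard Hadoop `<configuration>`/`<property>` layout used by `hdfs-site.xml`. The two property names come from the slide; the helper name `hdfs_site_fragment` is hypothetical, and how the fragment is merged into a cluster's existing configuration is left out.

```python
import xml.etree.ElementTree as ET


def hdfs_site_fragment(sync_on_close: bool) -> str:
    """Render the paired HDFS settings as an hdfs-site.xml style fragment."""
    settings = {"dfs.datanode.synconclose": sync_on_close}
    if sync_on_close:
        # Follows the slide's advice: enable this alongside synconclose.
        settings["dfs.datanode.sync.behind.writes"] = True

    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value).lower()
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(hdfs_site_fragment(sync_on_close=True))
```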

Thank You! Please visit our booth!

Mike Drob – madrob@cloudera.com

@mikhaildrob
