Compaction and Splitting in Apache Accumulo

© Hortonworks Inc. 2012. Compaction and Splitting in Apache Accumulo. Billie Rinaldi, [email protected], October 24, 2012

What are compaction and splitting?

• Accumulo tables are divided into non-overlapping key ranges called tablets

• Compaction selects a set of sorted files for a single tablet and rewrites them into one file

• Splitting divides a tablet into two tablets

Tablet Overview

• When memory fills, new sorted files are created by flushing

• Sorted files are merged into fewer sorted files by compaction

How much data are you writing?

• If you never compact: O(N)

• If you always compact: O(N^2)
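As a rough illustration of the two extremes above, the sketch below counts how many units of data reach disk. It assumes each flush writes one unit-sized file, and that "always compact" means rewriting all stored data into a single file after every flush. The function name and the unit-size model are assumptions made for illustration, not something taken from the slides.

```python
def total_written(num_flushes: int, always_compact: bool) -> int:
    """Count units written to disk after num_flushes flushes of one unit each.

    Assumed model: with always_compact, every flush is followed by a
    compaction that rewrites all data stored so far into a single file.
    """
    written = 0
    stored = 0  # units currently held on disk
    for _ in range(num_flushes):
        written += 1  # the flush itself writes one new unit
        stored += 1
        if always_compact and stored > 1:
            written += stored  # the compaction rewrites everything stored
    return written
```

Under this model, four flushes cost 4 units without compaction and 13 units with compaction after every flush (4 flushed plus 2 + 3 + 4 rewritten). The gap grows quadratically as the number of flushes increases.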

Accumulo Compaction Algorithm

• Compact a set of files when:

size of the largest file × compaction ratio ≤ sum of the sizes of the files

• The compaction ratio is set by table.compaction.major.ratio
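The condition above (largest file size times the compaction ratio is at most the sum of the file sizes) can be written as a small predicate. This is a minimal sketch: the function name is hypothetical, the default of 3 simply mirrors the r = 3 used in the walkthrough slides, and the requirement of at least two files is an assumption added so that a single file is never "compacted" with itself.

```python
def should_compact(file_sizes, ratio=3):
    """Return True when largest * ratio <= sum of sizes for the given set.

    Assumption: a candidate set needs at least two files to be worth compacting.
    """
    if len(file_sizes) < 2:
        return False
    return max(file_sizes) * ratio <= sum(file_sizes)
```

For example, three files of size 1 satisfy the condition at ratio 3 (3 ≤ 3), while files of sizes 3, 1 and 1 do not (9 > 5).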

In Action (r = 3, N = 1, W = 1)

In Action (r = 3, N = 2, W = 2)

In Action (r = 3, N = 3, W = 3)

In Action (r = 3, N = 3, W = 6)

In Action (r = 3, N = 4, W = 7)

In Action (r = 3, N = 5, W = 8)

In Action (r = 3, N = 6, W = 9)

In Action (r = 3, N = 6, W = 12)

In Action (r = 3, N = 7, W = 13)

In Action (r = 3, N = 8, W = 14)

In Action (r = 3, N = 9, W = 15)

In Action (r = 3, N = 9, W = 24)

In Action (r = 3, N = 27, W = 90*)
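The sequence of (N, W) values in the "In Action" slides can be reproduced with a short simulation, reading r as the compaction ratio, N as the number of unit-sized flushes so far, and W as the total units written (flushes plus compaction output). That reading of the symbols is an interpretation of the slide titles. The selection rule used here (try all files first, then drop the largest and retry) is also an assumption; the slides state only the ratio condition. All names are hypothetical.

```python
def select_start(sizes_desc, ratio):
    """Return the index where the compactable suffix begins, or None.

    sizes_desc must be sorted largest first. Assumed rule: test the whole
    set, and if the ratio condition fails, drop the largest file and retry.
    """
    for start in range(len(sizes_desc) - 1):
        candidate = sizes_desc[start:]
        if candidate[0] * ratio <= sum(candidate):
            return start
    return None


def simulate(num_flushes, ratio=3):
    """Return {N: W}: total units written after each of num_flushes flushes."""
    files = []    # sizes of the tablet's sorted files
    written = 0
    history = {}
    for n in range(1, num_flushes + 1):
        files.append(1)   # a flush creates one unit-sized file
        written += 1
        while True:
            files.sort(reverse=True)
            start = select_start(files, ratio)
            if start is None:
                break
            merged = sum(files[start:])
            written += merged             # compaction rewrites the chosen files
            files = files[:start] + [merged]
        history[n] = written
    return history
```

With ratio 3, `simulate(27)` gives W = 6 at N = 3, W = 12 at N = 6, W = 24 at N = 9 and W = 90 at N = 27, matching the post-compaction values in the slide titles.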

Amount of data written

• W(r^k) = (k+1) r^k - (k-1) r^{k-1}

• Thus, W(N) ≈ O(N log N)
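To see where the N log N bound comes from, substitute N = r^k (so k = log_r N) into the formula: W(N) = N [(1 - 1/r) log_r N + 1 + 1/r]. The sketch below evaluates both forms. The function names are hypothetical, and the closed form is assumed to apply only for k ≥ 1.

```python
import math


def written_at_power(r: int, k: int) -> int:
    """W(r^k) = (k+1) r^k - (k-1) r^(k-1), assumed valid for k >= 1."""
    if k < 1:
        raise ValueError("closed form is only used here for k >= 1")
    return (k + 1) * r**k - (k - 1) * r ** (k - 1)


def written_in_terms_of_n(n: int, r: int) -> float:
    """Same quantity after substituting N = r^k, i.e. k = log_r N."""
    k = math.log(n, r)
    return n * ((1 - 1 / r) * k + 1 + 1 / r)
```

For r = 3 the first function returns 6, 24 and 90 for k = 1, 2 and 3, which are the W values shown for N = 3, 9 and 27 in the walkthrough slides.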

HBase Compaction Algorithm

• Compact a set of files when:

size of the largest file ≤ compaction ratio × sum of the sizes of the smaller files

• The compaction ratio is set by hbase.hstore.compaction.ratio

• The HBase and Accumulo conditions are equivalent when:

HBase ratio = 1 / (Accumulo ratio - 1)
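The relationship between the two ratios follows from writing the Accumulo sum as "largest file plus smaller files": largest × r ≤ largest + smaller rearranges to largest ≤ smaller / (r - 1). The sketch below encodes both conditions as stated on the slides and the conversion between them. The function names are hypothetical, and the conversion assumes an Accumulo ratio greater than 1.

```python
def accumulo_condition(sizes, accumulo_ratio):
    """largest * ratio <= sum of all file sizes (including the largest)."""
    return max(sizes) * accumulo_ratio <= sum(sizes)


def hbase_condition(sizes, hbase_ratio):
    """largest <= ratio * sum of the sizes of the smaller files."""
    largest = max(sizes)
    smaller = sum(sizes) - largest
    return largest <= hbase_ratio * smaller


def hbase_ratio_from_accumulo(accumulo_ratio):
    """Convert an Accumulo ratio (> 1) into the equivalent HBase ratio."""
    if accumulo_ratio <= 1:
        raise ValueError("conversion assumes an Accumulo ratio greater than 1")
    return 1 / (accumulo_ratio - 1)
```

With an Accumulo ratio of 3 the equivalent HBase ratio is 0.5, and both predicates then agree on file sets such as [1, 1, 1] (compact) and [3, 1, 1] (do not compact).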

Other Compaction-related Properties

• Accumulo
– table.file.max
– tserver.compaction.major.thread.files.open.max
– tserver.compaction.major.delay
– table.compaction.major.everything.idle

• HBase
– hbase.hstore.compactionThreshold
– hbase.hstore.blockingStoreFiles
– hbase.hstore.blockingWaitTime
– hbase.hstore.compaction.min
– hbase.hstore.compaction.max
– hbase.hstore.compaction.min.size
– hbase.hstore.compaction.max.size

Accumulo Splitting

• Always check to see if a split is needed before compacting

• If it is needed, split first

• File names are stored in the metadata table

[Figure label: split threshold]
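The ordering rule on this slide (check for a split first, compact only if no split is needed) can be sketched as a small decision function. The slide names only a "split threshold", so the rule that a split is needed once the tablet's total file size exceeds that threshold is an assumption here, as are the function and parameter names.

```python
def next_maintenance_step(tablet_size, split_threshold, compaction_wanted):
    """Decide what to do with a tablet, giving splits priority over compactions.

    Assumption: a split is needed when tablet_size exceeds split_threshold.
    compaction_wanted is whatever the compaction check has already decided.
    """
    if tablet_size > split_threshold:
        return "split"
    if compaction_wanted:
        return "compact"
    return "none"
```

A plausible motivation for splitting first is that compacting a tablet that is about to split would rewrite data that the two new tablets may soon compact again separately.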

Accumulo Splitting Process

• Tablet closed, no new writes

• Three writes to the metadata table
– tablet made smaller & marked as splitting
– new tablet added
– original tablet's splitting marks removed

• Tablet server swaps new tablets for old tablet in its online tablet list

• Master informed

Accumulo Splitting Recovery

• Whenever a tablet is brought online, the tablet server checks whether it has splitting marks.

• If so, it assumes the splitting process was interrupted and finishes making changes to the metadata table.
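The three metadata writes and the recovery check can be modelled with a toy in-memory metadata table. This is only a sketch under stated assumptions: the dictionary layout (tablets keyed by end row, each storing its previous row), the `split_mark` field and all function names are hypothetical, not Accumulo's actual schema or API. Closing the tablet, swapping the online tablet list and informing the master are not modelled.

```python
class SimulatedCrash(Exception):
    """Raised to imitate a tablet server dying partway through a split."""


def split_tablet(metadata, end_row, split_row, crash_after=None):
    """Split the tablet ending at end_row using three metadata writes.

    crash_after (1 or 2) aborts after that many writes, for demonstration.
    """
    old_prev_row = metadata[end_row]["prev_row"]

    # Write 1: make the original tablet smaller and mark it as splitting.
    metadata[end_row] = {"prev_row": split_row, "split_mark": old_prev_row}
    if crash_after == 1:
        raise SimulatedCrash()

    # Write 2: add the new tablet covering the rest of the old range.
    metadata[split_row] = {"prev_row": old_prev_row}
    if crash_after == 2:
        raise SimulatedCrash()

    # Write 3: remove the splitting mark from the original tablet.
    del metadata[end_row]["split_mark"]


def bring_online(metadata, end_row):
    """Finish an interrupted split if the tablet still carries a split mark."""
    entry = metadata[end_row]
    if "split_mark" in entry:
        # Redo write 2 if it never happened, then perform write 3.
        metadata.setdefault(entry["prev_row"], {"prev_row": entry["split_mark"]})
        del entry["split_mark"]
```

In this model the mark written in step 1 carries enough information (the old previous row) to redo the remaining writes, so a crash after either the first or the second write leaves a state that `bring_online` can complete.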

Hortonworks Data Platform

• Simplify deployment to get started quickly and easily

• Monitor, manage any size cluster with familiar console and tools

• Only platform to include data integration services to interact with any data

• Metadata services open the platform for integration with existing applications

• Dependable high availability architecture

• Tested at scale to future proof your cluster growth

Reduce risks and cost of adoption. Lower the total cost to administer and provision. Integrate with your existing ecosystem.

Hortonworks Training

The expert source for Apache Hadoop training & certification

Role-based Developer and Administration training

– Coursework built and maintained by the core Apache Hadoop development team
– The “right” course, with the most extensive and realistic hands-on materials
– Provide an immersive experience into real-world Hadoop scenarios
– Public and Private courses available

Comprehensive Apache Hadoop Certification

– Become a trusted and valuable Apache Hadoop expert

Next Steps?

1 Download Hortonworks Data Platform: hortonworks.com/download

2 Use the getting started guide: hortonworks.com/get-started

3 Learn more… get support

• Expert role based training
• Course for admins, developers and operators
• Certification program
• Custom onsite options

hortonworks.com/training

Hortonworks Support

• Full lifecycle technical support across four service levels
• Delivered by Apache Hadoop Experts/Committers
• Forward-compatible

hortonworks.com/support

[email protected]
