
Understanding & Tuning Compaction Algorithms

Nicolas Spiegelberg
Software Engineer, Facebook

HBase Users Group, January 23, 2013

Agenda

1 Background
2 Compactions in HBase
3 Compactions: Other System Algorithms
4 Parting Thoughts

Compactions: Background

Log Structured Merge Tree

[Figure: a Server hosts Shard #1, Shard #2, …; each shard holds ColumnFamily #1, ColumnFamily #2, …; each column family's Memstore is flushed to HFiles.]

Data in an HFile is sorted and has a block index for efficient retrieval.
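To make the block-index idea concrete, here is a minimal Python sketch. It assumes an HFile can be modelled as an in-memory list of sorted key/value pairs cut into fixed-size blocks, with an index that stores the first key of each block. `MiniHFile` and `block_size` are hypothetical names; the real HFile format is considerably richer.

```python
from bisect import bisect_right


class MiniHFile:
    """Hypothetical in-memory stand-in for an HFile: sorted entries split into blocks."""

    def __init__(self, items, block_size=4):
        entries = sorted(items)  # data in the file is sorted by key
        self.blocks = [entries[i:i + block_size] for i in range(0, len(entries), block_size)]
        # Block index: the first key of every block.
        self.index = [block[0][0] for block in self.blocks]

    def get(self, key):
        # Binary-search the index for the last block whose first key <= key,
        # so only that single block has to be read.
        pos = bisect_right(self.index, key) - 1
        if pos < 0:
            return None
        for k, v in self.blocks[pos]:
            if k == key:
                return v
        return None


f = MiniHFile({f"row{i:02d}": i for i in range(10)}.items())
print(f.index)         # ['row00', 'row04', 'row08']
print(f.get("row05"))  # 5
print(f.get("zzz"))    # None
```

A lookup binary-searches the small index and then scans one block, instead of scanning the whole file.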

About LSMT

Write Algorithms are relatively trivial
▪ Write a new, immutable file
▪ Avoid stalls

Read Algorithms are varied
▪ Compaction
▪ Server-side Filters
▪ Block Index
▪ Bloom Filter
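The asymmetry between cheap writes and costlier reads can be shown with a toy model. The sketch below is hypothetical (class name, flush threshold and tuple-based "files" are all assumptions): writes only touch memory and every flush appends a new immutable sorted file, while a read has to look through the files from newest to oldest.

```python
class MiniLSM:
    """Hypothetical toy LSM tree: an in-memory memstore plus immutable sorted files."""

    def __init__(self, flush_threshold=3):
        self.memstore = {}
        self.files = []  # oldest first; each file is an immutable, sorted tuple of (key, value)
        self.flush_threshold = flush_threshold  # hypothetical knob: entries per flush

    def put(self, key, value):
        # Writes are cheap: update memory, and flush once the memstore fills up.
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Never rewrite old files; always write a brand-new, sorted, immutable one.
        if self.memstore:
            self.files.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        """Return (value, files_examined); the read checks the newest data first."""
        if key in self.memstore:
            return self.memstore[key], 0
        for examined, f in enumerate(reversed(self.files), start=1):
            for k, v in f:  # linear scan for brevity; real HFiles use a block index
                if k == key:
                    return v, examined
        return None, len(self.files)


db = MiniLSM(flush_threshold=2)
for i in range(6):
    db.put(f"k{i}", i)
print(len(db.files))  # 3
print(db.get("k5"))   # (5, 1): found in the newest file
print(db.get("k0"))   # (0, 3): had to examine every file
```

Each flush adds a file, and a read for old data may have to consult all of them; bounding that read cost is the job of compaction.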

Compactions: Intro

Critical for Read Performance
▪ Merge N files
▪ Reduces read IO when earlier filters don’t help enough
▪ The most complicated part of an LSMT
  ▪ What & when to select

[Figure: several HFiles merged into one]
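Merging N files is essentially a k-way merge of already-sorted inputs. A minimal sketch using Python's `heapq.merge`, assuming each file is a sorted list with unique keys, files are passed oldest first, and the newest value for a key simply wins. Real HBase compactions also handle cell versions, timestamps and delete markers, which this deliberately ignores; `merge_files` is a hypothetical name.

```python
import heapq


def merge_files(files):
    """Merge sorted files (oldest -> newest) into one sorted list of (key, value).

    Simplifying assumption: when a key appears in several files, the newest value wins.
    """
    # Tag each entry with -age so that, for equal keys, the newest file sorts first.
    tagged = [[(key, -age, value) for key, value in f] for age, f in enumerate(files)]
    merged = []
    last_key = object()  # sentinel that never equals a real key
    for key, _, value in heapq.merge(*tagged):
        if key != last_key:  # first occurrence of a key is its newest version
            merged.append((key, value))
            last_key = key
    return merged


merged = merge_files([[("a", 1), ("c", 3)], [("a", 10), ("b", 2)]])
print(merged)  # [('a', 10), ('b', 2), ('c', 3)]
```

The merge streams through the inputs in one pass, so its cost is roughly proportional to the total bytes read and rewritten, which is why compaction IO matters.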

Compactions: Disclaimers

Assumptions
▪ Only general algorithms included
  ▪ Coprocessors available for some common apps
▪ Assume a relatively stable R+W workload

Compactions in HBase

Sigma Compaction
Default algorithm in HBase 0.90

#1. File selection based on summation of sizes:

size[i] < (size[0] + size[1] + … + size[i-1]) * C

#2. Compact only if at least N eligible files found.

+ trivial implementation
+ minimal overwrites
- non-deterministic latency
- files have variable lifetime
- no incremental benefit
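The two selection steps can be sketched in Python as below. This is a hedged reading of the slide's formula, not HBase's actual selection code: it assumes files are ordered newest first (so size[0..i-1] are the files newer than file i), that the oldest file satisfying the inequality is compacted together with every newer file, and it treats that run as the "eligible" set checked against N (`min_files`). The function name is hypothetical.

```python
def sigma_select(sizes, ratio, min_files):
    """Return indices of files to compact, or [] if fewer than `min_files` qualify.

    Assumption: sizes[0] is the newest file and higher indices are older.
    """
    # Scan from the oldest file toward the newest and stop at the first file that
    # is small relative to the sum of everything newer than it:
    #   size[i] < (size[0] + ... + size[i-1]) * C
    for i in range(len(sizes) - 1, 0, -1):
        if sizes[i] < sum(sizes[:i]) * ratio:
            selected = list(range(i + 1))  # file i plus every newer file
            return selected if len(selected) >= min_files else []
    return []


print(sigma_select([10, 10, 10, 100, 1000], ratio=1.2, min_files=3))  # [0, 1, 2]
```

With these example sizes only the three small, recent files are merged and the two large, older files are not rewritten, which is the "minimal overwrites" property listed on the slide.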

Compactions: Configuration

All Compaction Algorithms
▪ hbase.hstore.compaction.ratio
▪ hbase.hstore.compaction.min
▪ hbase.hregion.majorcompaction
▪ hbase.offpeak.start.hour
▪ hbase.offpeak.end.hour
▪ hbase.hstore.compaction.ratio.offpeak
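The off-peak settings let the ratio C in the selection rule change with the time of day: a larger C lets larger files pass size[i] < (size[0] + … + size[i-1]) * C, so more aggressive compactions can run while traffic is low. A minimal sketch, assuming the off-peak window is the half-open hour range [start, end), that it may wrap past midnight, and that a negative hour disables it. The function name is hypothetical and the example values are illustrative, not quoted HBase defaults.

```python
def effective_ratio(hour, ratio, offpeak_ratio, start_hour, end_hour):
    """Pick the compaction ratio C for an hour of the day (0-23).

    Assumptions: the window is [start_hour, end_hour), may wrap past midnight,
    and is disabled when either bound is negative.
    """
    if start_hour < 0 or end_hour < 0:
        return ratio
    if start_hour <= end_hour:
        in_window = start_hour <= hour < end_hour
    else:  # the window spans midnight, e.g. 22 -> 6
        in_window = hour >= start_hour or hour < end_hour
    return offpeak_ratio if in_window else ratio


print(effective_ratio(23, ratio=1.2, offpeak_ratio=5.0, start_hour=22, end_hour=6))  # 5.0
print(effective_ratio(12, ratio=1.2, offpeak_ratio=5.0, start_hour=22, end_hour=6))  # 1.2
```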

Tiered Compaction
Default algorithm in BigTable/HBase

#1. File selection based on size relative to a pivot:

size[i] * C >= size[p] <= size[k] / C, for i < p < k

#2. Compact only if at least N eligible files found.

(groups files into “tiers”)

+ trivial implementation
+ more deterministic behavior
+ medium size files are warm
- more file seeks necessary
- still write-biased
- no incremental benefit

Compactions: Configuration

Tiered Compaction
▪ Enable: “hbase.hstore.compaction.CompactionPolicy”
▪ Default.NumCompactionTiers
▪ Default.Tier.X
  ▪ MaxSize
  ▪ MaxAgeInDisk
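One way to picture how per-tier MaxSize and MaxAgeInDisk limits could group files into tiers is sketched below. This is a hypothetical interpretation for illustration only, not the HBASE-7055 implementation: each file is described as (size in bytes, age in seconds) and goes into the first tier whose limits it fits, files fitting no tier fall into the last one, and a tier is compacted once it holds at least N files (rule #2). `Tier`, `assign_tiers` and `tiers_to_compact` are made-up names.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    max_size: int      # hypothetical stand-in for Default.Tier.X.MaxSize (bytes)
    max_age_secs: int  # hypothetical stand-in for Default.Tier.X.MaxAgeInDisk


def assign_tiers(files, tiers):
    """Group (size_bytes, age_secs) files into tiers; files fitting no tier go to the last one."""
    groups = [[] for _ in tiers]
    for size, age in files:
        for t, tier in enumerate(tiers):
            if size <= tier.max_size and age <= tier.max_age_secs:
                groups[t].append((size, age))
                break
        else:  # no tier matched
            groups[-1].append((size, age))
    return groups


def tiers_to_compact(groups, min_files):
    # Rule #2: only compact a tier once it holds at least `min_files` files.
    return [t for t, group in enumerate(groups) if len(group) >= min_files]


tiers = [Tier(100, 3_600), Tier(10_000, 86_400), Tier(10**12, 10**12)]
files = [(10, 60), (20, 120), (30, 600), (5_000, 7_200), (200, 100), (10**9, 10**6)]
groups = assign_tiers(files, tiers)
print([len(g) for g in groups])               # [3, 2, 1]
print(tiers_to_compact(groups, min_files=3))  # [0]
```

Because only files of similar size and age are merged together, each compaction's cost is more predictable, in line with the "more deterministic behavior" point on the previous slide.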

Compactions: Work Queues
▪ Problem: Starvation
▪ Solution:
  ▪ Handle Large & Small Compactions Differently
  ▪ Allow a configurable “throttle” to determine which queue a compaction goes to

Compactions: Configuration

Compaction Work Queues
▪ hbase.regionserver.thread.compaction.small
▪ hbase.regionserver.thread.compaction.large
▪ hbase.regionserver.thread.compaction.throttle / “ThrottlePoint”
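The routing decision can be sketched as follows, assuming a request counts as "large" when its total input size is strictly greater than the throttle point. The class and method names are hypothetical. In a region server each queue is drained by its own thread pool, and the `.small` and `.large` settings are understood to size those pools; the sketch models only the routing.

```python
from collections import deque


class CompactionQueues:
    """Hypothetical model of separate small and large compaction queues."""

    def __init__(self, throttle_point):
        self.throttle_point = throttle_point  # bytes; plays the role of "ThrottlePoint"
        self.small = deque()
        self.large = deque()

    def submit(self, request_id, total_bytes):
        # Route big requests to their own queue so they cannot starve small ones.
        if total_bytes > self.throttle_point:
            self.large.append((request_id, total_bytes))
            return "large"
        self.small.append((request_id, total_bytes))
        return "small"


q = CompactionQueues(throttle_point=1_000)
print(q.submit("minor-1", 10))      # small
print(q.submit("major-1", 50_000))  # large
print(q.submit("minor-2", 1_000))   # small (not strictly above the throttle point)
```

A very large compaction only occupies the large queue, so quick small compactions keep flowing, which is the starvation fix described on the previous slide.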

Compactions: Other Algorithms

Leveled Compaction
Default algorithm in LevelDB

#1. Bucket into tiers of magnitude difference (~10x)
#2. Shard the compaction across files (not just block index)
#3. Only compact the shard that goes over a certain size

+ optimized for read-heavy use
+ faster compaction turnaround
+ easy to cache-on-compact
- complicated algorithm
- heavy rewrites on write-dominated use
- time range filters less effective
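The level-selection idea can be sketched as follows. It is a simplified, LevelDB-flavoured illustration under stated assumptions, not LevelDB's or HBase's code: level L may hold about `base_bytes * 10**L` bytes, the most over-full level is compacted first, and a file pushed down is merged only with the next-level files whose key ranges overlap it (the "shard" that gets rewritten). Real LevelDB, for example, triggers level 0 by file count rather than bytes; `base_bytes` and the helper names are hypothetical.

```python
def level_budget(level, base_bytes):
    # Each level may hold roughly 10x more data than the level above it.
    return base_bytes * 10 ** level


def pick_level(levels, base_bytes):
    """Return the index of the most over-budget level, or None if none is over.

    `levels[L]` is a list of (min_key, max_key, size_bytes) files.
    """
    best, best_score = None, 1.0
    for lvl, files in enumerate(levels[:-1]):  # the last level has nowhere to push data
        score = sum(size for _, _, size in files) / level_budget(lvl, base_bytes)
        if score > best_score:
            best, best_score = lvl, score
    return best


def overlapping(file, next_level):
    """Files in the next level whose key ranges overlap `file`; only these get rewritten."""
    lo, hi, _ = file
    return [f for f in next_level if not (f[1] < lo or f[0] > hi)]


levels = [
    [("a", "m", 6), ("n", "z", 6)],                    # level 0: 12 bytes vs budget 10
    [("a", "f", 30), ("g", "p", 30), ("q", "z", 30)],  # level 1: 90 bytes vs budget 100
    [],                                                # level 2
]
src = pick_level(levels, base_bytes=10)
print(src)                                           # 0
print(overlapping(levels[src][0], levels[src + 1]))  # [('a', 'f', 30), ('g', 'p', 30)]
```

Only two of the three level-1 files are rewritten here, which is why each compaction finishes quickly; under a write-heavy load, though, the same data is rewritten once per level it descends.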

Time-Series Compaction

▪ Log-structured Merge Tree
  ▪ Time-ordered Data Storage!
▪ Time-Series Compaction
  ▪ Implement with Coprocessor
  ▪ Time-boundary Based
  ▪ Shard HFiles on Hour, Day, etc.
▪ Time-series data optimized
▪ Write-biased query optimized

[Figure: flushed HFiles sharded into day and hour buckets]
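A time-boundary policy can be pictured as bucketing HFiles by the time period their data falls in and only merging files within the same bucket, so days that no longer receive writes stop being rewritten. The helper below is a hypothetical sketch, not a coprocessor API: it assumes each file is summarised by its newest cell timestamp in seconds and buckets by UTC day; a real implementation would also need to handle files that straddle a boundary.

```python
from collections import defaultdict
from datetime import datetime, timezone


def shard_by_day(files):
    """Group (name, max_timestamp_secs) HFiles by the UTC day of their newest cell.

    Hypothetical helper: a time-boundary policy would only merge files that share a bucket.
    """
    buckets = defaultdict(list)
    for name, max_ts in files:
        day = datetime.fromtimestamp(max_ts, tz=timezone.utc).strftime("%Y-%m-%d")
        buckets[day].append(name)
    return dict(buckets)


files = [("f1", 0), ("f2", 3_600), ("f3", 86_400), ("f4", 86_460)]
print(shard_by_day(files))  # {'1970-01-01': ['f1', 'f2'], '1970-01-02': ['f3', 'f4']}
```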

Parting Thoughts

Compactions: Associated JIRAs
▪ 0.90 Sigma Compactions (HBASE-3209)
▪ 0.92 Multi-Threaded Compactions (HBASE-1476)
▪ 0.96 Tier-based Compaction (HBASE-6371 & 7055)
▪ Future
  ▪ Make Compactions Pluggable (HBASE-7516)
  ▪ Leveled Compaction (HBASE-7519)

Compactions: High Level Thoughts

Variables
▪ Disk IO on HFile Read
▪ Disk & Network IO on Compaction (R+W)

Compactions: High Level Thoughts

Related Questions
▪ Is the data mutated or appended?
  ▪ Mutates benefit from lazy seeks but cause disk bloat
  ▪ HFile reduction is less useful as row queries get larger
▪ Are you missing critical filters?
  ▪ Explicit vs. Implicit Requests
  ▪ Cache on write/compact (CacheConfig)
  ▪ Time Range / Column Filter
  ▪ Bloom Filters: non-trivial decision, need to measure
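A time-range filter is cheap because it can rule out whole files: if each HFile records the minimum and maximum timestamp of its cells, a query only needs the files whose range overlaps its own. A minimal sketch under that assumption (the function name, file names and tuple layout are hypothetical). It also shows why merging old and new data into the same file, as leveled compaction does, weakens this filter: the merged file's time range widens, so it overlaps more queries.

```python
def files_to_read(files, query_min_ts, query_max_ts):
    """Names of HFiles whose [min_ts, max_ts] range overlaps the query's range.

    Assumes each file is summarised as (name, min_ts, max_ts); every other
    file can be skipped without touching disk.
    """
    return [
        name
        for name, min_ts, max_ts in files
        if min_ts <= query_max_ts and max_ts >= query_min_ts
    ]


separate = [("old", 0, 99), ("mid", 100, 199), ("new", 200, 299)]
print(files_to_read(separate, 250, 260))  # ['new']
merged = [("merged", 0, 299)]
print(files_to_read(merged, 250, 260))    # ['merged']: the one file now holds all the data
```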

Thanks! Questions?
