Understanding & Tuning Compaction Algorithms
Nicolas Spiegelberg, Software Engineer, Facebook
HBase Users Group, January 23, 2013
Agenda
1 Background
2 Compactions in HBase
3 Compactions: Other System Algorithms
4 Parting Thoughts
Compactions: Background
Log Structured Merge Tree
[Diagram: Server → Shard #1, Shard #2, … → ColumnFamily #1, ColumnFamily #2, … → Memstore → flush → HFiles]
Data in HFile is sorted; has block index for efficient retrieval
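To illustrate why a block index over sorted data helps retrieval, here is a minimal sketch. It is not HBase's actual HFile layout: it assumes the index simply stores the first key of each block, so a binary search narrows a lookup to one block. The names `build_block_index`, `find_block`, and `contains` are hypothetical.

```python
import bisect

# Hypothetical helpers for illustration; not the real HFile format.

def build_block_index(sorted_keys, block_size):
    """Split sorted keys into fixed-size blocks; the index is each block's first key."""
    blocks = [sorted_keys[i:i + block_size]
              for i in range(0, len(sorted_keys), block_size)]
    index = [block[0] for block in blocks]
    return blocks, index

def find_block(index, key):
    """Return the position of the only block that could hold `key`, or None."""
    pos = bisect.bisect_right(index, key) - 1
    return pos if pos >= 0 else None

def contains(blocks, index, key):
    """Look up `key` by reading a single block instead of the whole file."""
    pos = find_block(index, key)
    return pos is not None and key in blocks[pos]
```

Because the keys are sorted, only one block ever needs to be read per lookup, regardless of file size.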
About LSMT
Write Algorithms are relatively trivial
▪ Write new, immutable file
▪ Avoid stalls
Read Algorithms are varied
▪ Compaction
▪ Server-side Filters
▪ Block Index
▪ Bloom Filter
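The write and read paths described above can be sketched with a toy LSM tree. This is an illustrative model only: the class `TinyLSM` and its `flush_threshold` parameter are hypothetical, and real systems add write-ahead logging, indexes, and filters.

```python
class TinyLSM:
    """Toy LSM tree (hypothetical): a memstore plus immutable sorted files."""

    def __init__(self, flush_threshold=3):
        self.memstore = {}
        self.files = []  # oldest first; each file is a sorted list of (key, value)
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        # Writes only touch memory; a full memstore is flushed as a new file.
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # A flush writes a new immutable file; existing files are never modified.
        if self.memstore:
            self.files.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        # Reads check the memstore, then every file from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for f in reversed(self.files):
            for k, v in f:
                if k == key:
                    return v
        return None
```

The sketch shows the asymmetry: `put` is trivial, while `get` may have to consult every file, which is the cost compaction is meant to reduce.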
Compactions: Intro
Critical for Read Performance
▪ Merge N files
▪ Reduces read IO when earlier filters don’t help enough
▪ The most complicated part of an LSMT
  ▪ What & when to select
[Diagram: HFiles → Merge]
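The merge step itself can be sketched as a k-way merge of sorted files. This is a simplified illustration, assuming each file is a sorted list of `(key, value)` pairs with unique keys and that the newest value for a key wins; the function name `compact` is hypothetical, and delete markers and versions are ignored.

```python
import heapq

def compact(files):
    """Merge sorted files (oldest first) into one sorted file; newest value wins.

    Hypothetical sketch: assumes keys are unique within each file.
    """
    # Tag each entry with the negated file position so that, for equal keys,
    # entries from newer files sort first.
    tagged = [
        [(key, -age, value) for key, value in f]
        for age, f in enumerate(files)
    ]
    merged = []
    for key, _, value in heapq.merge(*tagged):
        # Keep only the first (newest) entry seen for each key.
        if not merged or merged[-1][0] != key:
            merged.append((key, value))
    return merged
```

After the merge, a read for any key touches one file instead of N, which is where the read IO saving comes from.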
Compactions: Disclaimers
Assumptions
▪ Only general algorithms included
  ▪ Coprocessors available for some common apps
▪ Assume a relatively-stable R+W workload
Compactions in HBase
Sigma Compaction
Default algorithm in HBase 0.90
#1. File selection based on summation of sizes:
    size[i] < (size[0] + size[1] + … + size[i-1]) * C
#2. Compact only if at least N eligible files found.
+ trivial implementation
+ minimal overwrites
- non-deterministic latency
- files have variable lifetime
- no incremental benefit
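The two rules above can be sketched as follows. This is one reading of the slide, not the HBase source: it assumes `sizes` is ordered from newest (index 0) to oldest, and that once a file satisfies the inequality, it and all newer files are compacted together. The function name `select_sigma` and its parameters are hypothetical; `ratio` plays the role of C and `min_files` the role of N.

```python
def select_sigma(sizes, ratio, min_files):
    """Return the indices of files to compact, or [] if too few qualify.

    Assumption: sizes[0] is the newest file and sizes[-1] the oldest.
    """
    # Walk from the oldest file toward the newest and stop at the first file
    # that satisfies size[i] < (size[0] + ... + size[i-1]) * ratio.
    for i in range(len(sizes) - 1, 0, -1):
        if sizes[i] < sum(sizes[:i]) * ratio:
            selected = list(range(i + 1))
            # Rule #2: compact only if at least min_files files are eligible.
            return selected if len(selected) >= min_files else []
    return []
```

With sizes `[10, 12, 15, 100]` and a ratio of 1.2, the 100-unit file is skipped because it is far larger than the sum of the newer files, and the three small files are selected.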
Compactions: Configuration
All Compaction Algorithms
▪ hbase.hstore.compaction.ratio
▪ hbase.hstore.compaction.min
▪ hbase.hregion.majorcompaction
▪ hbase.offpeak.start.hour
▪ hbase.offpeak.end.hour
▪ hbase.hstore.compaction.ratio.offpeak
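How the off-peak settings interact with the regular ratio can be sketched like this. The function `compaction_ratio` is hypothetical, and the window semantics (start inclusive, end exclusive, allowed to wrap past midnight, equal start and end meaning disabled) are assumptions for illustration rather than confirmed HBase behavior. The parameters stand in for `hbase.hstore.compaction.ratio`, `hbase.hstore.compaction.ratio.offpeak`, `hbase.offpeak.start.hour`, and `hbase.offpeak.end.hour`.

```python
def compaction_ratio(hour, ratio, offpeak_ratio, offpeak_start, offpeak_end):
    """Pick the selection ratio to use at the given hour of day (0-23).

    Assumed semantics: the off-peak window is [offpeak_start, offpeak_end)
    and may wrap past midnight.
    """
    if offpeak_start == offpeak_end:
        return ratio  # empty window: treated as off-peak disabled (assumption)
    if offpeak_start < offpeak_end:
        in_window = offpeak_start <= hour < offpeak_end
    else:
        in_window = hour >= offpeak_start or hour < offpeak_end
    return offpeak_ratio if in_window else ratio
```

A larger off-peak ratio makes more files eligible, so heavier compactions tend to run when the cluster is quieter. The values 1.2 and 5.0 used when trying this out are illustrative only.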
Tiered Compaction
Default algorithm in BigTable/HBase
#1. File selection based on size relative to a pivot (groups files into “tiers”):
    size[i] * C >= size[p] <= size[k] / C :: i < p < k
#2. Compact only if at least N eligible files found.
+ trivial implementation
+ more deterministic behavior
+ medium-size files are warm
- more file seeks necessary
- still write-biased
- no incremental benefit
Compactions: Configuration
Tiered Compaction
▪ Enable: “hbase.hstore.compaction.CompactionPolicy”
▪ Default.NumCompactionTiers
▪ Default.Tier.X
  ▪ MaxSize
  ▪ MaxAgeInDisk
Compactions: Work Queues
▪ Problem: Starvation
▪ Solution:
  ▪ Handle Large & Small Compactions Differently
  ▪ Allow a configurable “throttle” to determine which queue
Compactions: Configuration
Compaction Work Queues
▪ hbase.regionserver.thread.compaction.small
▪ hbase.regionserver.thread.compaction.large
▪ hbase.regionserver.thread.compaction.throttle / “ThrottlePoint”
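The throttle idea can be sketched as a simple routing decision. This is an illustration under assumptions: the functions `pick_queue` and `route` are hypothetical, and whether a compaction exactly at the throttle counts as small or large is a guess.

```python
def pick_queue(compaction_size, throttle):
    """Route a compaction to the 'large' queue above the throttle, else 'small'.

    Assumption: a compaction exactly at the throttle goes to 'small'.
    """
    return "large" if compaction_size > throttle else "small"

def route(compactions, throttle):
    """Split (name, total_size) compaction requests into the two queues."""
    queues = {"small": [], "large": []}
    for name, size in compactions:
        queues[pick_queue(size, throttle)].append(name)
    return queues
```

With separate worker pools behind each queue, a long-running large compaction no longer blocks the small ones queued behind it, which addresses the starvation problem.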
Compactions: Other Algorithms
Leveled Compaction
Default algorithm in LevelDB
#1. Bucket into tiers of magnitude difference (~10x)
#2. Shard the compaction across files (not just block index)
#3. Only compact the shard that goes over a certain size
+ optimized for read-heavy use
+ faster compaction turnaround
+ easy to cache-on-compact
- complicated algorithm
- heavy rewrites on write-dominated use
- time range filters less effective
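The leveled scheme can be sketched roughly as follows. This is a simplified model, not LevelDB's implementation: it assumes each level has a size target 10x the previous one, and that a shard is a key range `(low, high)`. All function names are hypothetical.

```python
def level_targets(base_size, num_levels, factor=10):
    """Size target per level, each `factor` times larger than the previous."""
    return [base_size * factor ** level for level in range(num_levels)]

def levels_to_compact(level_sizes, base_size, factor=10):
    """Levels whose current size exceeds their target (assumed trigger)."""
    targets = level_targets(base_size, len(level_sizes), factor)
    return [level for level, (size, target)
            in enumerate(zip(level_sizes, targets)) if size > target]

def overlapping(key_range, next_level):
    """Shards in the next level whose key range overlaps the one being pushed down."""
    low, high = key_range
    return [r for r in next_level if r[0] <= high and low <= r[1]]
```

Only the overlapping shards are rewritten, which is why turnaround is faster than rewriting whole files, and also why a write-dominated workload rewrites the same ranges repeatedly.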
Time-Series Compaction
▪ Log-structured Merge Tree
  ▪ Time-ordered Data Storage!
▪ Time-Series Compaction
  ▪ Implement with Coprocessor
  ▪ Time-boundary Based
    ▪ Shard HFiles on Hour, Day, etc…
▪ Time-series data optimized
▪ Write-biased query optimized
[Diagram: flush → HFiles, sharded into … hour … day … buckets]
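Sharding files on a time boundary can be sketched as bucketing by timestamp. This is an illustration only: it assumes each file is described by a `(name, timestamp_seconds)` pair, and the functions `bucket_by_boundary` and `compaction_candidates` are hypothetical rather than part of any coprocessor API.

```python
from collections import defaultdict

HOUR = 3600
DAY = 24 * HOUR

def bucket_by_boundary(files, boundary_seconds):
    """Group (name, timestamp_seconds) files by the time window they fall in."""
    buckets = defaultdict(list)
    for name, ts in files:
        buckets[ts // boundary_seconds].append(name)
    return dict(buckets)

def compaction_candidates(files, boundary_seconds, min_files=2):
    """Windows holding at least min_files files, oldest window first."""
    buckets = bucket_by_boundary(files, boundary_seconds)
    return [names for _, names in sorted(buckets.items())
            if len(names) >= min_files]
```

Files are only merged with others from the same window, so a time-range query can skip whole buckets and old data is not rewritten when new data arrives.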
Parting Thoughts
Compactions: Associated JIRAs
▪ 0.90: Sigma Compactions (HBASE-3209)
▪ 0.92: Multi-Threaded Compactions (HBASE-1476)
▪ 0.96: Tier-based Compaction (HBASE-6371 & 7055)
▪ Future: Make Compactions Pluggable (HBASE-7516), Leveled Compaction (HBASE-7519)
Compactions: High Level Thoughts
Variables
▪ Disk IO on HFile Read
▪ Disk & Network IO on Compaction (R+W)
Compactions: High Level Thoughts
Related Questions
▪ Is data mutated or appended?
  ▪ Mutates benefit from lazy seeks but cause disk bloat
  ▪ HFile reduction is less useful as row queries get larger
▪ Are you missing critical filters?
  ▪ Explicit vs. Implicit Requests
  ▪ Cache on write/compact (CacheConfig)
  ▪ Time Range / Column Filter
  ▪ Bloom Filters: non-trivial decision, need to measure
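As an example of a filter that can reduce read IO before compaction is needed, a time range filter can be sketched as below. This assumes each file carries the minimum and maximum timestamp of its data as a `(name, min_ts, max_ts)` tuple; the function `files_for_time_range` is hypothetical.

```python
def files_for_time_range(files, query_start, query_end):
    """Keep only files whose [min_ts, max_ts] overlaps the queried range.

    Hypothetical sketch: files are (name, min_ts, max_ts) tuples.
    """
    return [name for name, min_ts, max_ts in files
            if min_ts <= query_end and query_start <= max_ts]
```

If such a filter already excludes most files for typical queries, further reducing the file count through compaction buys less, which is why it is worth checking the filters first.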
Thanks! Questions?