HBase Applications - Atlanta HUG - May 2014
DESCRIPTION
HBase is good at various workloads, ranging from sequential range scans to purely random access. These access patterns can be translated into application types, usually falling into two major groups: entities and events. This presentation discusses the underlying implications and how to approach those use-cases. Examples taken from Facebook show how this has been tackled in real life.
TRANSCRIPT
![Page 1: HBase Applications - Atlanta HUG - May 2014](https://reader033.vdocuments.net/reader033/viewer/2022060107/554ba204b4c905b3618b4b2f/html5/thumbnails/1.jpg)
1
HBase Applications
Selected Use-Cases around a Common Theme
Atlanta HUG - May 2014
Lars George, Cloudera EMEA Chief Architect
2
About Me
• EMEA Chief Architect @ Cloudera • Consulting on Hadoop projects (everywhere)
• Apache Committer • HBase and Whirr
• O'Reilly Author • HBase: The Definitive Guide
• Now in Japanese!
• Contact • [email protected] • @larsgeorge
The Japanese edition is out too!
3
The Content...
• HBase - Strengths and weaknesses • Common use-cases and patterns • Focus on specific types of applications • Summary
4 CONFIDENTIAL - RESTRICTED
HBase Strengths and Weaknesses
5
IOPS vs Throughput Mythbusters
It is all physics in the end: you cannot solve an I/O problem without reducing I/O in general. Parallelize access and read/write sequentially.
6
HBase: Strengths & Weaknesses
Strengths: • Random access to small(ish) key-value pairs • Rows and columns stored sorted lexicographically • Adds table and region concepts to group related KVs • Stores and reads data sequentially • Parallelizes across all clients
• Non-blocking I/O throughout
7
HBase: Strengths & Weaknesses
Weaknesses: • Not optimized (yet) for 100% of the possible throughput of the underlying storage layer
• And HDFS is not fully optimized either
• Single writer issue with WALs • Single server hot-spotting with non-distributed keys
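Hot-spotting occurs when monotonically increasing keys, such as timestamps, all land in the same region. A common mitigation that the slide does not spell out is salting: prefixing each key with a small, hash-derived bucket number so consecutive writes spread over several regions. The sketch below assumes a hypothetical `bucket|key` string layout; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that spreads sequential keys over N buckets ("salting"). */
final class SaltedKeys {
    private SaltedKeys() {}

    /** Stable bucket derived from the key itself, so the same key always lands in the same bucket. */
    static int bucket(String key, int buckets) {
        return Math.floorMod(key.hashCode(), buckets);
    }

    /** Prefixes the key with a zero-padded bucket number, e.g. "03|event-1000" (assumes < 100 buckets). */
    static String salt(String key, int buckets) {
        return String.format("%02d|%s", bucket(key, buckets), key);
    }

    /** Readers must fan out one scan per bucket prefix to see the whole key range again. */
    static List<String> scanPrefixes(int buckets) {
        List<String> prefixes = new ArrayList<>();
        for (int b = 0; b < buckets; b++) {
            prefixes.add(String.format("%02d|", b));
        }
        return prefixes;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println(salt("event-" + (1000 + i), 8));
        }
    }
}
```

The trade-off is on the read side: a range scan in original key order now needs one scan per bucket, merged on the client.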
8
Patterns
• There are common patterns in many common use-cases, like programming patterns.
• We need to extract these common patterns and make them repeatable.
• Similar to the “Gang of Four” (Gamma, Helm, Johnson, Vlissides), or the “Three Amigos” (Booch, Jacobson, Rumbaugh)
9
Common Patterns
10
HBase Dilemma
Although HBase can host many applications, they may require completely opposite features
[Diagram: Events vs. Entities; examples: Time Series, Message Store]
11
This talk (at this event)
• Message Store • Information exchange between entities • Sending/Receiving information is an event
• Time-Series • Sequence of data points measured at successive points in time, spaced at uniform intervals
• Measuring of a data point is an event
12
Using HBase Strengths
13
HBase “Indexes” (cont.)
• Use primary keys, aka the row keys, as sorted index • One sort direction only • Use "secondary index" to get reverse sorting
• Lookup table or same table
• Use secondary keys, aka the column qualifiers, as sorted index within main record
• Use prefixes within a column family or separate column families
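HBase compares row keys as unsigned bytes, so the key layout decides the one available sort order. The slide suggests a secondary index for reverse sorting; a different, commonly used technique is to store a reversed timestamp (`Long.MAX_VALUE - ts`) in the key so newest entries sort first, which also serves the "newest 20 messages" read below. This sketch uses a `TreeMap` with an unsigned byte comparator as a stand-in for a table; the `user|reversed-ts` layout is a hypothetical example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: a TreeMap ordered like HBase row keys (unsigned byte comparison) standing in for a table. */
final class ReverseTimeKeys {
    private ReverseTimeKeys() {}

    /** Hypothetical layout: user id, '|', then (Long.MAX_VALUE - ts) as 8 big-endian bytes (ts >= 0). */
    static byte[] rowKey(String user, long timestampMillis) {
        byte[] prefix = (user + "|").getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix)
                .putLong(Long.MAX_VALUE - timestampMillis) // newer timestamps produce smaller keys
                .array();
    }

    /** Scans from the user's prefix and stops after n rows: these are the n newest entries. */
    static List<String> newest(TreeMap<byte[], String> table, String user, int n) {
        byte[] prefix = (user + "|").getBytes(StandardCharsets.UTF_8);
        List<String> result = new ArrayList<>();
        for (Map.Entry<byte[], String> e : table.tailMap(prefix, true).entrySet()) {
            byte[] key = e.getKey();
            boolean samePrefix = key.length >= prefix.length
                    && Arrays.equals(key, 0, prefix.length, prefix, 0, prefix.length);
            if (!samePrefix || result.size() == n) {
                break; // left this user's key range, or already have n rows
            }
            result.add(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> table = new TreeMap<byte[], String>(Arrays::compareUnsigned);
        table.put(rowKey("alice", 1_000L), "oldest");
        table.put(rowKey("alice", 2_000L), "middle");
        table.put(rowKey("alice", 3_000L), "newest");
        table.put(rowKey("bob", 5_000L), "other user");
        System.out.println(newest(table, "alice", 2)); // [newest, middle]
    }
}
```

The cost of this choice is that ascending-time reads now need a reverse scan or a second index, which is the "one sort direction only" limitation on the slide.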
14
Common Use-Cases
15
Use-Case I: Messages
16
HBase Message Store
Use-Case: • Store incoming messages in HBase, such as Emails, SMS, MMS, IM
• Constant updates of existing entities • e.g. Email read, flagged, starred, moved, deleted
• Reading of top-N entries, sorted by time • Newest 20 messages, last 20 conversations
• Examples: • Facebook Messages
17
Problem Description
• Records are of varying size • Large ones hinder smaller ones
• Massive index issue • Users can sort and filter by everything • At the same time, reading top-N should be fast • But what to do for automated accounts? 80/20 rule? • Only doable with heuristics
• Only create minimal indexes • Create additional ones when the user asks for them
• Cross-mailbox issues with Conversations • Similar to the Timeline in Facebook
• Overall requirements for I/O
18
Interlude I: Compaction Details
Write Amplification in HBase
19
Compactions in HBase
• Must happen to keep data in check • Combine small flush files into larger ones • Remove old data (during major compactions)
• Two types: Minor and Major Compactions • Minor are triggered with API mutation calls • Major are time scheduled (or auto-promoted) • Both can be triggered manually if needed
• Add extra background I/O that grows over time • Write amplification!
• Have to be tuned for heavy write systems
20
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; files: HF1]
hbase.hregion.memstore.flush.size = 128MB
21
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; files: HF1, HF2]
22
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; files: HF1, HF2, HF3]
hbase.hstore.compaction.min = 3 (hbase.hstore.compactionThreshold = 3 in 0.90)
hbase.hstore.compaction.max = 10
23
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; files: CF1]
1. Compaction (major, auto-promoted)
24
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; files: CF1, HF4]
25
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; files: CF1, HF4, HF5]
26
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; files: CF1, HF4, HF5, HF6]
27
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; files: CF1, HF4, HF5, HF6]
hbase.hstore.compaction.ratio = 1.2
hbase.hstore.compaction.min.size = flush size
28
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; files: CF1, HF4, HF5, HF6]
hbase.hstore.compaction.ratio = 1.2 (120%)
29
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; files: CF2]
2. Compaction (major, auto-promoted)
30
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; files: CF2, HF7]
31
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; files: CF2, HF7, HF8]
32
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; files: CF2, HF7, HF8, HF9]
33
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; files: CF2, HF7, HF8, HF9, HF10]
34
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; files: CF2, HF7, HF8, HF9, HF10]
hbase.hstore.compaction.ratio = 1.2 (120%)
Eliminate files from older to newer until the remaining files are within the ratio
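The selection rule walked through above can be sketched as follows. This is a simplified model, not HBase's actual compaction policy code: starting at the oldest file, a file is eliminated while it is larger than `ratio` times the sum of the newer files that would be compacted with it (files no bigger than `minSizeMb`, which the slides set to the flush size, always qualify). A compaction runs only if at least `minFiles` files remain, capped at `maxFiles`. Names and parameters are illustrative.

```java
import java.util.List;

/** Simplified sketch of ratio-based minor compaction selection; not HBase's exact implementation. */
final class RatioSelection {
    private RatioSelection() {}

    /**
     * @param sizesMbOldestFirst store file sizes in MB, oldest file first
     * @return the chosen files (oldest first), or an empty list if fewer than minFiles qualify
     */
    static List<Long> select(List<Long> sizesMbOldestFirst, double ratio,
                             long minSizeMb, int minFiles, int maxFiles) {
        int n = sizesMbOldestFirst.size();
        int start = 0;
        while (start < n) {
            long size = sizesMbOldestFirst.get(start);
            // Sum of the newer files that would be compacted together with this one.
            long newer = 0;
            for (int i = start + 1; i < Math.min(n, start + maxFiles); i++) {
                newer += sizesMbOldestFirst.get(i);
            }
            if (size <= Math.max((double) minSizeMb, ratio * newer)) {
                break; // within ratio (or small enough): compact from here on
            }
            start++; // too big compared to the newer files: eliminate it
        }
        int end = Math.min(n, start + maxFiles);
        if (end - start < minFiles) {
            return List.of(); // not enough files left, skip this compaction
        }
        return List.copyOf(sizesMbOldestFirst.subList(start, end));
    }
}
```

For example, with files of 600, 128, 128 and 128 MB, a ratio of 1.2 and a 128 MB minimum size, the 600 MB file exceeds 1.2 × 384 MB and is eliminated, leaving the three 128 MB files. When the selection covers every store file, the result corresponds to what the slides call an auto-promoted major compaction.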
35
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; files: CF2, CF3]
3. Compaction
36
Fast Forward...
37
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer]
38
Additional Notes #1
There are a few more settings for compactions:
• hbase.hstore.compaction.max = 10: limits the maximum number of files per compaction
• hbase.hstore.compaction.max.size = Long.MAX_VALUE: excludes files larger than this setting (0.92+)
• hbase.hregion.majorcompaction = 1d: scheduled major compactions
39
Additional Notes #2
• hbase.hstore.compaction.kv.max = 10: limits internal scanner caching while reading the files to be compacted
• hbase.hstore.blockingStoreFiles = 7: enforces an upper limit on the number of files so compactions can catch up; blocks user operations!
• hbase.hstore.blockingWaitTime = 90s: upper limit on how long user operations are blocked
40
Write Fragmentation
Yo, where's the data at?
41
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; legend: Existing Row Mutations, Unique Row Inserts]
We are looking at two specific rows: one is never changed, the other is changed frequently.
42
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; legend: Existing Row Mutations, Unique Row Inserts]
43
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; legend: Existing Row Mutations, Unique Row Inserts]
44
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; legend: Existing Row Mutations, Unique Row Inserts]
1. Compaction (major, auto-promoted)
45
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; legend: Existing Row Mutations, Unique Row Inserts]
46
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; legend: Existing Row Mutations, Unique Row Inserts]
47
Skip forward again...
48
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; legend: Existing Row Mutations, Unique Row Inserts]
49
Source: http://www.ngdata.com/visualizing-hbase-flushes-and-compactions/
50
Compaction Summary
• Compaction tuning is important • Do not be too aggressive, or write amplification becomes noticeable under load
• Use timestamps/time ranges in Get/Scan to limit files

| Ratio | Effect |
|---|---|
| 1.0 | Dampened; causes more store files; needs to be combined with effective Bloom filter usage (non-random) |
| 1.2 | Default value, moderate setting |
| 1.4 | More aggressive; keeps the number of files low; causes more auto-promoted major compactions to occur |
51
Interlude II: Bloom Filters
Call me maybe, baby?
52
Background on Bloom Filters
53
Background on Bloom Filters
• Bit arrays of m bits and k hash functions • HBase uses hash folding
• Returns "No" or "Maybe" only • Error rate tunable, usually about 1% • At a 1% error rate, the optimum is about 9.6 bits per key (optimal k ≈ 7)
[Diagram: example Bloom filter with m = 18, k = 3]
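To make the "No" or "Maybe" semantics and the 9.6-bits-per-key figure concrete, here is a minimal Bloom filter sketch. It sizes the bit array with the standard formulas m = -n ln p / (ln 2)² and k = (m/n) ln 2, and derives its k bit positions by double hashing (h1 + i·h2). That hashing scheme is a swapped-in textbook technique, not the hash folding HBase uses, and the class is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

/** Minimal Bloom filter sketch using double hashing (not HBase's own hash folding). */
final class SimpleBloom {
    private final BitSet bits;
    private final int m; // number of bits
    private final int k; // number of hash functions

    SimpleBloom(int expectedKeys, double errorRate) {
        double ln2 = Math.log(2);
        this.m = (int) Math.ceil(-expectedKeys * Math.log(errorRate) / (ln2 * ln2));
        this.k = Math.max(1, (int) Math.round((double) m / expectedKeys * ln2));
        this.bits = new BitSet(m);
    }

    int bitCount() { return m; }

    int hashCount() { return k; }

    /** 32-bit FNV-1a, used as the second, independent hash. */
    private static int fnv1a(byte[] data) {
        int h = 0x811C9DC5;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    /** i-th bit position: h1 + i * h2 (mod m). */
    private int position(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = fnv1a(key) | 1; // force odd so the step is never zero
        return Math.floorMod(h1 + i * h2, m);
    }

    void add(String key) {
        byte[] b = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < k; i++) bits.set(position(b, i));
    }

    /** false means "definitely not in this file"; true only means "maybe". */
    boolean mightContain(String key) {
        byte[] b = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(b, i))) return false;
        }
        return true;
    }
}
```

With 1,000 expected keys at a 1% error rate this yields 9,586 bits (about 9.6 per key) and k = 7. A read path can skip a store file whenever `mightContain` returns false for the requested row.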
54
Seeking with Bloom Filters
55
Read Time Series Entry
• Event record is written once and never deleted or updated
• Keeps the entire record in a specific location in the storage files
• Use a time range to indicate what is needed • {Get|Scan}.setTimeRange() • Helps the system skip unnecessary (older) files
• Bloom filters help for given row key(s) and column qualifiers
• Can skip files not containing requested details
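HBase keeps the minimum and maximum cell timestamp of each store file in its metadata, which is what lets a time range on a Get or Scan skip files. The sketch below models only that overlap check, with a hypothetical `StoreFile` type; like `setTimeRange()`, the upper bound is treated as exclusive.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: skip store files whose timestamp range cannot overlap the requested time range. */
final class TimeRangeSkip {
    /** Hypothetical per-file metadata: name plus min/max cell timestamps in the file. */
    static final class StoreFile {
        final String name;
        final long minTs;
        final long maxTs;

        StoreFile(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    private TimeRangeSkip() {}

    /** Returns the files that may hold cells in [from, to); all others need no I/O at all. */
    static List<String> filesToRead(List<StoreFile> files, long from, long to) {
        List<String> result = new ArrayList<>();
        for (StoreFile f : files) {
            boolean overlaps = f.maxTs >= from && f.minTs < to;
            if (overlaps) result.add(f.name);
        }
        return result;
    }
}
```

For write-once event data this works well, because each event's cells sit in the one file covering its write time.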
56
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; legend: Existing Row Mutations, Unique Row Inserts]
Single block read (64KB): the Bloom filter and/or time range eliminates all other store files
57
Read Updateable Entity
• Data is updated regularly, aging out at intervals • Reading an entity needs to read all details to reconstitute the current state
• Deletes mask out attributes • Updates override (or complement) attributes
• Bloom filters will have a hard time saying "no", since most files might contain entity attributes
• A time filter on scans or gets also has few options to skip files, since older attributes might still be important
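Why an updatable entity defeats both filters becomes clearer when the read is written out: every file holding any cell of the row must be consulted, newer values win, and delete markers hide older values. This is a simplified sketch, assuming one version per column and a hypothetical `DELETE` marker value; real HBase delete markers carry their own types and timestamps.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: rebuild an entity's current state from several store files (one version per column). */
final class EntityMerge {
    /** Hypothetical stand-in for an HBase delete marker. */
    static final String DELETE = "<delete>";

    private EntityMerge() {}

    /**
     * @param filesNewestFirst each map holds column -> value (or DELETE) for this row in one file
     * @return the visible attributes, sorted by column name
     */
    static Map<String, String> reconstitute(List<Map<String, String>> filesNewestFirst) {
        Map<String, String> seen = new HashMap<>();
        for (Map<String, String> file : filesNewestFirst) { // every file may contribute: none can be skipped
            for (Map.Entry<String, String> cell : file.entrySet()) {
                seen.putIfAbsent(cell.getKey(), cell.getValue()); // the newest value for a column wins
            }
        }
        Map<String, String> visible = new TreeMap<>();
        seen.forEach((column, value) -> {
            if (!DELETE.equals(value)) visible.put(column, value); // deletes mask older values
        });
        return visible;
    }
}
```

Even the oldest file can hold the only copy of an attribute (the `subject` below), so a time range cannot exclude it.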
58
Writes: Flushes and Compactions
[Chart: store file size (MB) over time, older to newer; each file labeled with its Bloom filter answer, "yes" or "no"]
Bloom filter returns "yes" for all but two files: 7+ block loads (64KB) needed
59
Bloom Filter Options
There are three choices: • NONE: Duh! Use this when the Bloom filter is not useful for the use-case (default setting)
• ROW: Indexes only the row key; needs one entry per row key in the Bloom filter
• ROWCOL: Indexes row and column key; requires an entry in the filter for every column cell (KeyValue)
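The size difference between ROW and ROWCOL follows directly from the entry counts above. This back-of-envelope sketch applies the textbook formula m = -n ln p / (ln 2)² to both; the row and column counts are hypothetical, and HBase's real filters will differ somewhat because they are built in chunks with their own hashing.

```java
/** Back-of-envelope Bloom filter sizing for ROW vs. ROWCOL (textbook formula, not HBase's exact sizes). */
final class BloomSizing {
    private BloomSizing() {}

    /** Bits needed for n keys at false positive rate p. */
    static double bits(long n, double p) {
        double ln2 = Math.log(2);
        return -n * Math.log(p) / (ln2 * ln2);
    }

    /** ROW: one entry per row key. */
    static double rowBytes(long rows, double p) {
        return bits(rows, p) / 8;
    }

    /** ROWCOL: one entry per cell, i.e. per row and column. */
    static double rowColBytes(long rows, long columnsPerRow, double p) {
        return bits(rows * columnsPerRow, p) / 8;
    }

    public static void main(String[] args) {
        long rows = 10_000_000L; // hypothetical store size
        System.out.printf("ROW:    %.1f MB%n", rowBytes(rows, 0.01) / 1e6);
        System.out.printf("ROWCOL: %.1f MB%n", rowColBytes(rows, 20, 0.01) / 1e6);
    }
}
```

At 1% error, 10 million rows need roughly 12 MB for ROW, while 20 columns per row push ROWCOL to roughly 240 MB. ROWCOL only pays off when reads actually ask for specific columns.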
60
How to decide?
61
Bloom Filter Summary
• They help a lot - but not always • Highly depends on write patterns
• Keep an eye on size, since they are cached • HFile v2 helps here as it only loads root index info
"Bloom filters can get as large as 100 MB per HFile, which adds up to 2 GB when aggregated over 20 regions. Block indexes can grow as large as 6 GB in aggregate size over the same set of regions."
Source: http://hbase.apache.org/book/hfilev2.html
62
Interlude III: Write-ahead Log
The lonesome writer tale.
63
Write-ahead Log - Data Flow
64
Write-ahead Log - Overview
• One file per Region Server • All regions have a reference to this file
• Actually a wrapper around the physical file • The file is in the end a Hadoop SequenceFile
• Stored in HDFS so it can be recovered after a server failure
• There is a synchronization barrier that impacts all parallel writers, aka clients
• Overall performance is BAD, maybe 10MB/s
65
Write-ahead Log - Workarounds
• Enable log compression: hbase.regionserver.wal.enablecompression
• Disable WAL for secondary records • Restore indexes or derived records from the main one • But be careful when using a coprocessor hook for this, as it cannot access the region that is currently replaying
• Work on upstream JIRAs • Multiple logs per server • Fix single writer issue in HDFS
66
Back to the main theme...
Yes, message stores.
67
Schema
• Every row is an inbox • Indexes as CFs or separate tables
• Random updates and inserts cause storage file churn • Facebook used more than 4 or 5 schema iterations
• Not really representative: pure blob storage • Evolved over time to be more HBase-like
• Another customer iterated over various schemas at about the same time
• Difficult to keep indexes up to date
68
Facebook Messages
An interesting use-case…
69
Facebook Messages - Statistics
Source: HBaseCon 2012 - Anshuman Singh
70
71
72
Schema 1
73
Notes on Facebook Schema 1
This is basically the same as the NameNode, i.e. the application only writes edits, and those are merged with a snapshot of the data. The application does not use HBase as an operational store; instead, all data is cached in memory. It occasionally writes large chunks, and reads only a few times to merge or recover.
74
Notes on Facebook Schema 1
Three column families: • Snapshot, Actions, Keywords
Settings changes: • DFS Block Size: 256MB
• Since large KVs are written • Efficiency of HFile block index a concern
• Compaction ratio: 1.4 • Be more aggressive to clean up files
• Split Size: 2TB • Manage splitting manually
• Major Compactions: 3 days
75
Schema 2
76
Notes on Facebook Schema 2
• Eight column families • Snapshots per thread (user to user)
Settings changes: • Block Cache Size: 55%
• Cache more data on HBase side • Blocking Store Files: 25
• Allow more files to be around • Compaction Min Size: 4MB
• Reduce number of unconditionally selected files • Major Compactions: 14 days
77
Schema 3
78
Notes on Facebook Schema 3
• Eleven column families • Twenty regions per server • One hundred servers per cluster
Settings changes: • Block Cache Size: 60%
• Cache more data on HBase side
• Region Slop: 5% (from 20%) • Keep strict boundaries on regions per server
79
80
Note the imbalance! Recall that flushes are interconnected and cause compaction storms.
81
FB Messages Summary
• Triggered many changes in HBase: • Change compaction selection algorithm • Upper bounds on file sizes • Pools for small and large compactions • Online schema changes • Finer grained metrics • Lazy seeking in files • Point-seek optimizations • …
82
FB Messages Summary
• Went from "Snapshot" to a more proper schema • Needed to wait for the schema to settle • Could sustain warped load for a while • Eventually uses HBase more as a KV store
• Tweaked settings depending on schema • Tuned compactions from aggressive to relaxed • Changed block sizes to fit KV sizes
• Strict limit on I/O • 100 servers • 20 regions per server • 50 million users per cluster
83
Use-Case II: Time Series Database
84
Events make big data big
• The majority of use cases deal with event-based data • Especially on the HDFS and MapReduce level
• Machine Scale vs. Human Scale • An event has attributes
• Type • Identifier • Actor • Other attributes
85
Events contd.
• Accessing event data • Give me everything about event e_id1 • Give me everything in [t1,t2] • Give me everything for event type e_t1 in [t1,t2] • Give me everything for actor a1 in [t1,t2] • Give me everything for event type e_t1 by actor a1 in [t1,t2]
• Aggregate based on some parameters (like above) and report
• Find events that match some other given criteria
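One of these access patterns, "everything for event type e_t1 by actor a1 in [t1,t2]", becomes a single contiguous range scan if the row key is ordered as type, actor, timestamp. This sketch assumes a hypothetical `type|actor|timestamp` string key with a zero-padded, non-negative timestamp so that lexicographic order matches numeric order; a `TreeMap` stands in for the table and `subMap` for a scan with start and (exclusive) stop rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Sketch of a composite event row key: type|actor|timestamp (hypothetical layout). */
final class EventKeys {
    private EventKeys() {}

    /** Zero-padding to 19 digits keeps lexicographic order equal to numeric order for ts >= 0. */
    static String key(String type, String actor, long ts) {
        return String.format("%s|%s|%019d", type, actor, ts);
    }

    /** "Everything for event type t by actor a in [t1, t2]": start row inclusive, stop row exclusive. */
    static List<String> query(TreeMap<String, String> table, String type, String actor, long t1, long t2) {
        String start = key(type, actor, t1);
        String stop = key(type, actor, t2 + 1);
        return new ArrayList<>(table.subMap(start, true, stop, false).values());
    }
}
```

The other patterns (by actor across all types, by event id alone) do not fit this key order and would need a different key layout or a secondary index table, which is exactly the trade-off discussed next.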
86
HBase and Time Series
• Access patterns suited for HBase • Random access to event data or aggregate data • Serving… Not real-time computing (that's Impala)
• Schema design is the tricky thing • OpenTSDB does this well (but limited) • Key principle:
• Collocate data you want to read together • Spread out as much as possible at write time • The above two conflict in a lot of cases, so you decide on the trade-off
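A layout in the spirit of OpenTSDB balances the two principles: one row per metric per time bucket collocates the points that are read together, while different metrics land in different key ranges. This simplified sketch uses a hypothetical `metric|hour` string row key and a seconds-within-the-hour column qualifier; it is not OpenTSDB's actual byte format, which also encodes tags and uses UIDs.

```java
/** Sketch of an OpenTSDB-style layout (simplified, not its byte format): one row per metric per hour. */
final class HourBuckets {
    static final long HOUR_SECONDS = 3600;

    private HourBuckets() {}

    /** Row key: metric plus the timestamp rounded down to the hour, zero-padded to keep key order numeric. */
    static String rowKey(String metric, long epochSeconds) {
        long base = epochSeconds - Math.floorMod(epochSeconds, HOUR_SECONDS);
        return String.format("%s|%010d", metric, base);
    }

    /** Column qualifier: offset in seconds within that hour (0..3599). */
    static int qualifier(long epochSeconds) {
        return (int) Math.floorMod(epochSeconds, HOUR_SECONDS);
    }
}
```

For example, `rowKey("cpu.load", 1400000000L)` yields `cpu.load|1399996800` with qualifier 3200, so one hour of a metric is one row read. A single very busy metric still writes to the current hour's row, which is the conflict the slide mentions.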
87
Time Series design patterns
• Ingest • Flume or direct writing via app
• HDFS • Batch queries in Hive • Faster queries in Impala • No user-time serving
• HBase • Serve individual events (OpenTSDB) • Serve pre-computed aggregates (OpenTSDB, FB Insights)
• Solr • To make individual events searchable
88
Time Series design patterns
• Land data in HDFS and HBase • Aggregate in HDFS and write to HBase
• HBase can do some aggregates too (counters)
• Keep servable data in HBase, then discard it (TTL tw) • Keep all data in HDFS for future use
89
The story with only HBase
• Landing destination • Aggregates via counters • Serving end users • Event -> Flume/App -> HBase
• Raw entry in HBase for exact value • Multiple counter increments for aggregates
• OSS implementation - OpenTSDB
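The "raw entry plus multiple counter increments" flow can be sketched in memory as follows. In HBase the increments would map to atomic counter operations such as `incrementColumnValue`; here plain maps stand in for the table, and the `type|granularity|bucket` counter key format and the chosen granularities are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for "raw entry plus multiple counter increments" per event. */
final class EventAggregator {
    final List<String> raw = new ArrayList<>();
    final Map<String, Long> counters = new HashMap<>();

    /** Stores the raw event, then bumps one counter per hypothetical aggregation level (ts >= 0). */
    void record(String type, long epochSeconds) {
        raw.add(type + "@" + epochSeconds);
        long hour = epochSeconds / 3600;
        long day = epochSeconds / 86_400;
        counters.merge(type + "|hour|" + hour, 1L, Long::sum);
        counters.merge(type + "|day|" + day, 1L, Long::sum);
        counters.merge("all|day|" + day, 1L, Long::sum);
    }

    long count(String counterKey) {
        return counters.getOrDefault(counterKey, 0L);
    }
}
```

Note that each incoming event turns into four writes here (one raw cell, three increments), so the write rate HBase sees is a multiple of the event rate.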
90
Overall Summary
91
Applications in HBase
Requires working with schema peculiarities and implementation idiosyncrasies. It is important to compute the write rate and un-optimize the schema to fit the given hardware. If hardware is no issue, then the optimum is achievable. Trifecta of good performance: Compactions, Bloom Filters, and key design (but also look out for Memstore and Blockcache settings).
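Computing the write rate can start with simple arithmetic: how often each region flushes, and therefore how many new store files compactions must absorb. This sketch assumes writes spread evenly over the active regions of a server; the 10 MB/s, 20 regions and 128 MB flush size in `main` are hypothetical inputs echoing figures from earlier slides. In practice flushes may also be forced earlier, for example by global memstore pressure, so treat the result as an upper bound on the time between flushes.

```java
/** Back-of-envelope write-rate check; assumes writes are spread evenly over the active regions. */
final class WriteRateEstimate {
    private WriteRateEstimate() {}

    /** Seconds until one region's memstore reaches the flush size. */
    static double secondsBetweenFlushes(double ingestMbPerSec, int activeRegions, double flushSizeMb) {
        double perRegionMbPerSec = ingestMbPerSec / activeRegions;
        return flushSizeMb / perRegionMbPerSec;
    }

    /** New store files each region produces per hour, before compactions merge them. */
    static double flushesPerRegionPerHour(double ingestMbPerSec, int activeRegions, double flushSizeMb) {
        return 3600.0 / secondsBetweenFlushes(ingestMbPerSec, activeRegions, flushSizeMb);
    }

    public static void main(String[] args) {
        // Hypothetical load: 10 MB/s per server, 20 active regions, 128 MB flush size.
        System.out.printf("%.0f s between flushes%n", secondsBetweenFlushes(10, 20, 128));
        System.out.printf("%.1f flushes per region per hour%n", flushesPerRegionPerHour(10, 20, 128));
    }
}
```

With these inputs each region flushes every 256 seconds, about 14 new files per region per hour, which is the stream the compaction settings above have to keep up with.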
92
Questions?