Identifying Hot and Cold Data in Main-Memory Databases
Justin Levandoski, Per-Åke Larson, Radu Stoica
Identifying Hot and Cold Data in Main-Memory Databases
Justin Levandoski, Per-Åke Larson, Radu Stoica
Cold Data
• Records accessed infrequently in an OLTP setting
• May have become irrelevant to the workload
• Time-correlated and natural skew in real workloads
  • UPS packages
  • Some users more active than others
Cold Data (2)
• Do we need to keep cold data in memory?
• High-density DRAM comes at a large premium
• The N-Minute Rule for modern hardware prices
[Chart: system price (thousands of $) vs. system memory (16 GB to 2048 GB) for 1U, 2U, and 4U RAM systems and for 2 TB flash (high and low estimates).]
| Record Size | Store on Flash After |
|-------------|----------------------|
| 200 Bytes   | 60 Minutes           |
| 1 KB        | 11.6 Minutes         |
| 2 KB        | 5.8 Minutes          |
| 4 KB        | 2.9 Minutes          |
Microsoft SQL Server “Hekaton”
• Main-memory optimized database engine targeting OLTP workloads
• Designed for high levels of concurrency
  • Multi-version optimistic concurrency control
  • Latch-free data structures
• Record-centric data organization
  • No page-based organization
  • Oblivious to OS memory pages
Motivation
• The first step in managing cold data is identifying it
• Why not caching?
  • No page-based organization to exploit; only record granularity
• Space overhead
  • Caching techniques require per-item statistics
• Overhead on the critical path
  • Caching techniques require bookkeeping on the critical path
  • Queue updates introduce 25% overhead in our system
• No need to meet a hard deadline
  • A property we can exploit
  • Cold data management is a performance/cost optimization
Requirements
• Minimize space overhead
  • Cannot store access statistics on records in the database
  • 8 bytes × 50B records = 372 GB
• Minimize overhead on the critical path
  • Main-memory systems have short critical paths (speed)
  • Remove the hot/cold classification decision from the critical path
Outline
• Overview of approach
  • Siberia
  • Logging and Sampling
  • Exponential smoothing
• Classification Algorithms
• Performance
• Conclusion
Siberia: Cold Data Management
[Architecture diagram: index scanners over memory (hot storage) and Siberia (cold storage), with a cold record cache, access filters, and an update memo; record accesses feed an access log used for offline analysis, which drives periodic migration between hot and cold storage.]
• Project studying cold data management in Hekaton
1. Cold data classification (this talk)
2. Storage techniques
3. Cold data access and migration
4. Cold storage access reduction
Logging and Sampling
• Log record accesses
  • Accesses can be batched and written asynchronously
• Write only a sample of record accesses to the log
  • Sampling minimizes overhead on the critical path
• Analyze the log to classify hot and cold records
  • Can be done periodically
  • Classification can be moved to a separate machine if necessary
Example log: <t1> RID2 RID6 RID3 <t2> RID1 RID4 RID2 <t3> RID1 RID5
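The scheme above can be sketched as follows. This is a minimal illustration, not Hekaton's implementation: the class and parameter names are hypothetical, and a real system would flush batches to stable storage asynchronously rather than to an in-memory list.

```python
import random

class AccessLog:
    """Sketch of a sampled access log: record ids are sampled on the
    critical path and flushed to the log in batches."""

    def __init__(self, sample_rate=0.1, batch_size=3):
        self.sample_rate = sample_rate  # fraction of accesses logged
        self.batch_size = batch_size
        self.buffer = []                # in-flight batch
        self.log = []                   # stands in for the on-disk log

    def record_access(self, rid):
        # Critical path: only a coin flip and an occasional append.
        if random.random() < self.sample_rate:
            self.buffer.append(rid)
            if len(self.buffer) >= self.batch_size:
                self.flush()

    def flush(self):
        # In the real system this write is batched and asynchronous.
        self.log.append(list(self.buffer))
        self.buffer.clear()
```

With `sample_rate=1.0` every access is logged; lowering it trades classification accuracy for less overhead on the critical path.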
Exponential Smoothing
• Used to estimate access frequencies
• w(tk): score at time tk
• x_tk: observed value at time tk
• α: decay constant
• In our scenario, x_tk = 1 if an access is observed at tk, 0 otherwise

$w(t_k) = \alpha \cdot x_{t_k} + (1-\alpha) \cdot w(t_{k-1})$
[Example timeline for a single record, from tB = t1 to tE = tn: observations 0 1 0 0 1 1 at time slices t1, t2, t3, ..., tn.]
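The smoothing update can be sketched in a few lines. The decay constant α = 0.5 below is chosen only to make the example arithmetic easy to follow; it is not a value from the talk.

```python
def smooth(prev_score, observed, alpha=0.05):
    """One exponential-smoothing step:
    w(tk) = alpha * x_tk + (1 - alpha) * w(tk-1)."""
    x = 1.0 if observed else 0.0
    return alpha * x + (1.0 - alpha) * prev_score

# Example timeline 0 1 0 0 1 1 from the slide, starting from score 0:
w = 0.0
for observed in [0, 1, 0, 0, 1, 1]:
    w = smooth(w, observed, alpha=0.5)
```

Recent accesses dominate the score: the two trailing 1s push the estimate well above that of a record whose accesses all came early.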
Exponential Smoothing (2)
• Chosen due to simplicity and high accuracy
• Advantages over well-known caching policies
• Works very well with sampling
Vs. well-known caching techniques:
[Chart: loss in hit rate (%) vs. hot data size (% of total data, 0.01 to 50) for SES, LRU-2, and ARC; lower is better.]
Outline
• Overview of approach
• Classification Algorithms
  • Forward approach
  • Backward approach
• Performance
• Conclusion
Classification Algorithms
• All of our classification algorithms take the same input and produce the same output
• Input
  • Log of sampled record accesses
  • Parameter K, signifying the number of “hot” records
• Output
  • K “hot” record ids
• All approaches use exponential smoothing
Forward Algorithm
• Scan the log forward from time t1 to tn
• Upon encountering an access for a record, update its estimate in a hash table
• Update the estimate using the formula below
• Once the scan is finished, classify the top K records with the highest access frequency estimates as hot (the rest as cold)
[Hash table maintained during the forward scan from t1 to tn:]

| Record ID | Estimate | Last Access |
|-----------|----------|-------------|
| A         | .45      | t4          |
| G         | .98      | t2          |
| L         | .03      | t7          |
| P         | .34      | t3          |
| Z         | .20      | t8          |
$w(t_k) = \alpha \cdot x_{t_k} + w(t_{last}) \cdot (1-\alpha)^{t_k - t_{last}}$, where $t_{last}$ is the record's last access time
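The forward scan can be sketched as below. This is a simplified illustration, assuming the log is a non-empty, time-ordered list of (time, record id) pairs and that unseen records start with a score of zero; the function and variable names are hypothetical.

```python
def forward_classify(log, k, alpha=0.05):
    """Forward algorithm sketch: one pass over (time, rid) accesses,
    keeping (estimate, last_access) per record in a hash table.
    Skipped time slices are folded in as (1-alpha)^gap, matching
    w(tk) = alpha * x_tk + w(t_last) * (1-alpha)^(tk - t_last)."""
    table = {}  # rid -> (estimate, last_access_time)
    for t, rid in log:  # log assumed sorted by time
        if rid in table:
            w, last = table[rid]
            w = alpha + w * (1.0 - alpha) ** (t - last)
        else:
            w = alpha  # first observed access; prior score assumed 0
        table[rid] = (w, t)
    # Decay every estimate forward to the end of the log before ranking.
    t_end = log[-1][0]
    scores = {rid: w * (1.0 - alpha) ** (t_end - last)
              for rid, (w, last) in table.items()}
    return set(sorted(scores, key=scores.get, reverse=True)[:k])
```

Only records that actually appear in the sampled log consume hash-table space, which is what keeps the space overhead far below per-record statistics.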
Forward Algorithm in Parallel
• Hash partition by record id
  • Assign a worker thread per partition
  • Final classification done using estimates from every partition
• Partition the log by time
  • Assign a worker thread to each log segment
  • Each thread calculates a partial estimate for its segment
[Diagram: the log from t1 to tn is split at ta and tb into Segment1, Segment2, Segment3, processed by Thread1, Thread2, Thread3.]
Backward Algorithm
• Avoid scanning the entire log from beginning to end
• Algorithm
  • Scan the log backward from time tn to t1
  • Sharpen upper and lower bounds for each record’s final estimate
  • Occasionally attempt classification using the estimate bounds
  • Discard records that cannot possibly be in the hot set
[Illustration: scanning backward from tn; at each time slice the Kth-highest lower bound forms a threshold, and records whose upper bounds fall below it are thrown out (shown at slices tn and tn-4).]
Calculating Bounds
• For record R at time slice tk, calculate the base estimate as:

$w_R(t_k) = \alpha \cdot (1-\alpha)^{t_E - t_k} + w_R(t_{last})$

where $t_{last}$ is the time slice of R's last observed access, with $t_{last} > t_k$

• Upper bound on the final estimate (assume the record is present in every time slice moving back in the log):

$Upper_R(t_k) = w_R(t_k) + (1-\alpha)^{t_E - t_k + 1}$

• Lower bound on the final estimate (assume the record is not observed again in the log):

$Lower_R(t_k) = w_R(t_k) + (1-\alpha)^{t_E - t_B + 1}$
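The bound computation follows directly from the three formulas. The helper below is a hypothetical sketch: `w_last` stands for the running estimate already accumulated from later (already-scanned) time slices.

```python
def backward_bounds(alpha, t_E, t_B, t_k, w_last):
    """Upper/lower bounds on a record's final estimate during a
    backward scan, per the slide's formulas."""
    base = alpha * (1.0 - alpha) ** (t_E - t_k) + w_last
    # Upper: assume the record appears in every earlier time slice.
    upper = base + (1.0 - alpha) ** (t_E - t_k + 1)
    # Lower: assume the record is never observed again in the log.
    lower = base + (1.0 - alpha) ** (t_E - t_B + 1)
    return lower, upper
```

Note that the bound width, upper − lower, depends only on how far back the scan has reached; it shrinks as tk decreases, which is what lets the algorithm discard records early.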
Backward Algorithm in Parallel
• Partition log into N pieces using hash function on record id
[Diagram: a hash function on record id partitions the log into Log1, Log2, ..., LogN.]
Phase I: Initialization
[Diagram: worker threads read back in Log1..LogN and report to a coordinator thread; K = 9.]
• N worker threads and one coordinator thread
• Each worker reads back in its log segment to find all records in contention for the local top K/N
• Each worker may read back a different distance in its segment
Phase I: Initialization (2)
• Each worker thread reports only three values:
  • Its local K/Nth threshold value
  • Upper bound: the number of records in contention to be in the top K/N records
  • Lower bound: the minimum number of records that can be in the top K/N records
• Example: three workers report thresholds 0.7, 0.6, and 0.8 with upper/lower counts of 6/3, 7/3, and 8/3, giving global statistics of threshold range .6 to .8, upper 21, lower 9
Phase II: Threshold Search
• The final “global” threshold that yields at least K records lies between the high and low thresholds from the initialization phase
• The coordinator’s job is to find this final global threshold
• Basic idea: iterate, choosing a new candidate threshold
  • Poll workers for upper and lower counts for the threshold
  • Can also ask workers to sharpen bounds for their frequency estimates
• Details in the paper
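One way to picture the coordinator's iteration is a binary search over candidate thresholds. This is a simplified stand-in for the protocol in the paper: bounds are held in plain dicts, workers are simulated in-process, and the tighten-bounds step is elided.

```python
def threshold_search(worker_bounds, k, lo, hi, tol=1e-6):
    """Coordinator sketch: binary-search a threshold between the low
    and high values from initialization. worker_bounds is a list of
    per-worker dicts rid -> (lower, upper) estimate bounds."""
    def report_counts(thresh):
        # Mirrors ReportCounts: upper counts records that *might* clear
        # the threshold, lower counts records that *must* clear it.
        upper = sum(1 for w in worker_bounds
                    for (l, u) in w.values() if u >= thresh)
        lower = sum(1 for w in worker_bounds
                    for (l, u) in w.values() if l >= thresh)
        return upper, lower

    while hi - lo > tol:
        mid = (lo + hi) / 2
        upper, lower = report_counts(mid)
        if lower >= k:      # mid already guarantees k records: raise it
            lo = mid
        elif upper < k:     # even optimistically too few records: lower it
            hi = mid
        else:
            # Bounds too loose to decide; the real system asks workers
            # to tighten bounds here. The sketch just accepts mid.
            return mid
    return lo
```

With exact bounds (lower = upper per record) the search converges to the largest threshold that still yields K records.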
Tighten Bounds Operation
[Diagram: the coordinator issues TightenBounds to each worker; global statistics: threshold .6 to .8, upper 21, lower 9.]
• Each worker reads backward in the log and sharpens the estimates for its records
• The paper describes the details of how far back a worker must read
Report Count Operation
• Workers calculate upper and lower counts given a threshold value from the coordinator
[Diagram: the coordinator issues ReportCounts(0.75); workers return upper/lower counts of 2/1, 4/3, and 3/2, giving global statistics of threshold 0.75, upper 9, lower 6.]
Phase III: Merge and Report
• Workers report the ids of all records with lower bounds above the threshold
[Diagram: at threshold 0.725 (upper 9, lower 9), workers report {r3, r9, r18}, {r4, r8}, and {r17, r20, r32, r70}.]
Outline
• Overview of approach
• Classification Algorithms
• Performance
  • Setup
  • Wall Clock Time
  • Space Overhead
• Conclusion
Experiment Setup
• Workstation-class machine: HP Z400 with Xeon W3550 @ 3 GHz
• 1 million records
• 1 billion accesses
• Three access distributions:
  • Zipf
  • TPC-E non-uniform random
  • Uniform random
• Eight worker threads used for parallel versions
Wall Clock Time
[Charts: classification time (sec) vs. hot set size (% of total data, 0.1 to 80) for Forward, Forward-Parallel, Back, and Back-Parallel, under Zipf and uniform random distributions.]
• Classification of 1B Accesses in sub-second times
Space Overhead – Zipf Distribution
[Chart: maximum hash table size vs. hot set size (% of total data, 0.1 to 80) for Fwd-Serial, Fwd-Parallel, Back-Serial, and Back-Parallel, compared against the hot set size itself.]
Outline
• Overview of approach
• Classification Algorithms
• Performance
• Conclusion
Conclusion
• Studied cold data identification in main-memory databases
• Cold data classification framework
  • Microsoft SQL Server Hekaton: no page-based organization
  • Logging and sampling of record accesses
• Classification algorithms
  • Forward approach
  • Backward approach
• Great performance
  • Consistent sub-second classification times for the best algorithms
  • Close to optimal space overhead in practice
• Part of Project Siberia: stay tuned!
Questions?
Tighten Bounds Details
• How far to read back?
  • During every count, remember the minimum overlap value across all records straddling the threshold
  • Bring the upper/lower range of all records down to the value of the minimum overlap
  • Intuition: if bounds are tightened to the size of the minimum overlap, the overall overlap between records should be sufficiently small
Tighten Bounds Details
• How far do we read back in the log to tighten a bound to a given value M?
• Recall the bounds are calculated as:

$w_R(t_k) = \alpha \cdot (1-\alpha)^{t_E - t_k} + w_R(t_{last})$, with $U = (1-\alpha)^{t_E - t_k + 1}$ and $L = (1-\alpha)^{t_E - t_B + 1}$

• Need to read back to a time slice tk such that the bound width satisfies: $U - L < M$, i.e. $(1-\alpha)^{t_E - t_k + 1} < M + L$
• Solving for tk gives us: $t_k < t_E + 1 - \log_{1-\alpha}(M + L)$
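A small numeric sketch of that inequality, based on the bound definitions above (the solved form is reconstructed from those definitions, so the paper's exact expression may differ):

```python
import math

def read_back_slice(alpha, t_E, t_B, m):
    """Latest time slice tk at which the bound width U - L drops
    below m. Width at slice tk is
      (1-a)^(tE - tk + 1) - (1-a)^(tE - tB + 1),
    so solving (1-a)^(tE - tk + 1) < m + L for tk gives the
    boundary returned below; read back at least this far."""
    L = (1.0 - alpha) ** (t_E - t_B + 1)
    return t_E + 1 - math.log(m + L, 1.0 - alpha)
```

For example, with α = 0.5, tE = 10, tB = 0 and M = 0.1, the boundary falls between slices 7 and 8: reading back to slice 7 shrinks the width below 0.1, while stopping at slice 8 does not.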
Backward Parallel: Three Phases
• Phase I: Initialization
  • Goal: find an initial set of candidates and gather their statistics to define a beginning search space
• Phase II: Threshold Search
  • Goal: find a suitable threshold estimate value that will yield at least K records (can be within a tolerable error)
• Phase III: Merge and Report
  • Goal: retrieve hot record ids from each worker thread and merge them, creating the hot set
Forward Algorithm in Parallel (2)
• The estimate for each segment is calculated independently
• Only the last term in each segment relies on the score from the previous piece
• A final aggregation step sums the results of each piece
[Diagram: log from t1 to tn split at ta and tb into Piece1, Piece2, Piece3.]
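The linearity that makes this work can be illustrated directly: with a zero initial score, the smoothing recurrence expands into a sum of per-access contributions α(1−α)^(tn − ti), so each piece of the log can be scored on its own and the partial sums merged. The names below are illustrative; this is a sketch of the idea, not the system's code.

```python
def partial_estimate(segment, t_end, alpha=0.05):
    """Partial contribution of one log segment to each record's final
    score: an access at time t contributes alpha * (1-alpha)^(t_end - t)."""
    totals = {}
    for t, rid in segment:
        totals[rid] = totals.get(rid, 0.0) + \
            alpha * (1.0 - alpha) ** (t_end - t)
    return totals

def merge(partials):
    """Final aggregation step: sum the partial estimates per record."""
    out = {}
    for p in partials:
        for rid, w in p.items():
            out[rid] = out.get(rid, 0.0) + w
    return out
```

Because the contributions are independent sums, splitting the log at ta and tb and merging afterward gives exactly the same scores as one pass over the whole log.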