2016 Storage Developer Conference. © NetApp Inc. All Rights Reserved.
Modern Erasure Codes for Distributed Storage Systems
Srinivasan Narayanamurthy (Srini), NetApp
Everything around us is changing!
• The Data Deluge
  ◦ Disk capacities and densities are increasing faster than disk transfer rates
  ◦ Increased delay to recover using classical techniques leads to availability exposure
• Changing Storage Technologies
  ◦ Architectures: Scale-out, Distributed Storage, Cloud, Converged
  ◦ Media: Flash, NVM, SMR, Tape, et al.
  ◦ Features: Geo-distribution, Security, Commodity hardware (Failure is a norm!)
• Newer Dimensions of Erasure Codes
  ◦ Optimality tradeoffs redefined
  ◦ More about this inside…
Organization
• Background
  ◦ Erasure Codes Timeline
  ◦ Classical Codes - (n, k) code
• Modern Codes
  ◦ Locally Repairable Codes (Codes on Codes)
  ◦ Regenerating Codes (Network Codes)
• Technical Analysis
  ◦ Optimality Tradeoff and Reliability Analysis
  ◦ System Requirements and Codes
• Literature & Key Players
Background
Timeline – Classical (n, k) codes
Timeline – Overview
[Figure: code families shown are Classical Codes, Fountain Codes, Locally Repairable Codes, and Regenerating Codes. Metrics shown are Reliability, Performance, Repair Degree, and Repair Bandwidth, each a tradeoff against "Storage Overhead".]
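Timeline
[Figure: timeline of erasure codes from the 1950s to 2016, with entries placed along a THEORY-to-SYSTEMS axis.]
• Hamming (1950s)
• Reed-Solomon (1960)
• RAID (1988)
• Fountain codes: Tornado (1998), LT (Luby, 2002), Raptor (2006)
• RAID-6 (2002)
• MDS Array
• Pyramid (2007)
• Hierarchical (2008)
• Network Codes for Storage (2010)
• On the Locality (2011)
• Azure (mLRC) (2012)
• XORBAS (fLRC) (2013)
• Local Regeneration (2013)
• Double Replica (2014)
• Optimal I/O (2015)
• Repairing RS (2015)
• Butterfly (2016)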
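Classical (n, k) Codes: (6, 4) example
[Figure: the three operations of a classical (n, k) code.]
• Encode: chunk the object / file into k data blocks, then encode them into n encoded blocks.
• Decode: retrieve k surviving blocks and decode them into the reconstructed data.
• Repair: retrieve k surviving blocks, decode, re-encode, and replace the lost blocks so that n encoded blocks exist again.
Think distributed systems; repairs are expensive!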
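The encode, decode, and repair flow of a classical (n, k) code can be sketched in a few lines. This is a minimal illustration, not the construction used by any particular system: it assumes a toy systematic Reed-Solomon-style code over the prime field GF(257), built with Lagrange interpolation, and the names `encode`, `decode`, `repair`, and `survivors` are hypothetical. It shows that repairing even a single lost block of a (6, 4) code reads k = 4 surviving blocks.

```python
P = 257  # prime modulus for the toy field GF(257); an illustrative assumption


def lagrange_eval(points, x):
    """Value at x of the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P  # pow(..., -1, P) needs Python 3.8+
    return total


def encode(data, n):
    """Systematic encode: block x holds the polynomial's value at x; blocks 0..k-1 are the data."""
    k = len(data)
    points = list(enumerate(data))
    return list(data) + [lagrange_eval(points, x) for x in range(k, n)]


def survivors(blocks, k):
    """Pick k surviving (index, value) pairs; None marks a lost block."""
    alive = [(x, y) for x, y in enumerate(blocks) if y is not None]
    if len(alive) < k:
        raise ValueError("fewer than k blocks survive; data is lost")
    return alive[:k]


def decode(blocks, k):
    """Rebuild the k data blocks from k surviving blocks."""
    helpers = survivors(blocks, k)
    return [lagrange_eval(helpers, x) for x in range(k)]


def repair(blocks, lost, k):
    """Rebuild one lost block; returns (value, number of blocks read)."""
    helpers = survivors(blocks, k)
    return lagrange_eval(helpers, lost), len(helpers)


data = [11, 22, 33, 44]          # k = 4 data blocks (symbols below 257)
coded = encode(data, n=6)        # (6, 4) code: 4 data + 2 parity blocks
damaged = list(coded)
damaged[1] = damaged[4] = None   # lose two blocks
print(decode(damaged, k=4))      # recovers the original data
value, reads = repair(damaged, lost=1, k=4)
print(value, reads)              # one lost block still costs k = 4 reads
```

In a distributed system each of those k reads is a network transfer from a different node, which is the repair cost the modern codes in the rest of the deck try to reduce.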
Modern Erasure Codes
Locally Repairable Codes – Regenerating codes
Locally Repairable Codes
Tradeoff metric: Repair Degree
• Hierarchical (Bottom-Up) & Pyramid (Top-Down) Codes
• Microsoft Azure (mLRC): (k, l, r) codes, e.g. (6, 2, 2). Tradeoff: Locality / Maximum Recoverability
• Facebook XORBAS (fLRC): RS (14,10) & optimizing single block failures. Tradeoff: Locality / Minimum Distance
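To make repair degree concrete, here is a minimal sketch of local repair in a (6, 2, 2) layout: six data fragments split into two local groups of three, each protected by one XOR local parity. The two global parities need finite-field arithmetic and are deliberately omitted, so this is not a full Azure LRC, and the function names are hypothetical. It shows a single lost data fragment being rebuilt from 3 fragments in its own group rather than from 6.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def encode_local(data, groups=2):
    """Split the data fragments into equal groups and add one XOR parity per group."""
    size = len(data) // groups
    layout = []
    for g in range(groups):
        members = data[g * size:(g + 1) * size]
        layout.append({"data": list(members), "parity": xor_blocks(members)})
    return layout


def local_repair(layout, group, index):
    """Rebuild one data fragment from the rest of its local group; returns (fragment, reads)."""
    grp = layout[group]
    helpers = [b for i, b in enumerate(grp["data"]) if i != index]
    helpers.append(grp["parity"])
    return xor_blocks(helpers), len(helpers)


fragments = [bytes([i] * 4) for i in range(1, 7)]   # k = 6 data fragments
layout = encode_local(fragments, groups=2)          # l = 2 local parities
rebuilt, reads = local_repair(layout, group=1, index=0)
print(rebuilt == fragments[3], reads)               # local repair reads 3 fragments, not 6
```

The price of the lower repair degree is the extra local parities, which is the storage-overhead side of the tradeoff.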
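Regenerating Codes
• An Information Flow Graph & Min-Cut Bound
• Functional Repair
[Figure: functional repair example. Four nodes each store two packets: (A, B), (C, D), (A+B+C+D, A+2B+C+2D), and (A+2B+3C+D, 3A+2B+2C+3D). When the last node fails, each surviving node sends one packet: P1 = A+2B, P2 = 2C+D, P3 = 4A+5B+4C+5D. The replacement node stores 5A+7B+8C+7D and 6A+9B+6C+6D.]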
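The functional repair example can be checked mechanically. The sketch below treats every packet as a coefficient vector over (A, B, C, D) and verifies two things: that the replacement node's packets are combinations of P1, P2, and P3, and that any two nodes can still recover all four symbols after the repair. The node layout (two packets per node) is inferred from the figure, the combination coefficients are worked out here to match the packets shown, and rational arithmetic stands in for the finite field a real code would use.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix over the rationals via Gauss-Jordan elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def combine(coeffs, packets):
    """Linear combination of packets, each a coefficient vector over (A, B, C, D)."""
    return tuple(sum(c * p[i] for c, p in zip(coeffs, packets)) for i in range(4))


node1 = [(1, 0, 0, 0), (0, 1, 0, 0)]            # A, B
node2 = [(0, 0, 1, 0), (0, 0, 0, 1)]            # C, D
node3 = [(1, 1, 1, 1), (1, 2, 1, 2)]            # A+B+C+D, A+2B+C+2D

# Each survivor sends ONE packet, a combination of the two it stores.
p1 = combine((1, 2), node1)                     # A+2B
p2 = combine((2, 1), node2)                     # 2C+D
p3 = combine((3, 1), node3)                     # 4A+5B+4C+5D

# The newcomer stores two combinations of the three received packets.
new_node4 = [combine((1, 2, 1), (p1, p2, p3)),  # 5A+7B+8C+7D
             combine((2, 1, 1), (p1, p2, p3))]  # 6A+9B+6C+6D

nodes = [node1, node2, node3, new_node4]
mds = all(rank(nodes[i] + nodes[j]) == 4
          for i in range(4) for j in range(i + 1, 4))
print(new_node4, mds)   # any two nodes still recover A, B, C, D
```

The repair downloads three packets, whereas rebuilding the node by decoding the whole object would download four. The new node does not hold the same packets as the failed one, which is what makes the repair functional rather than exact.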
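Repair By Transfer (MBR) Codes
[Figure: Pentagon Code. Blocks 1 to 9 and a parity block P label the edges of a graph on five nodes; each node stores the four blocks on its edges: {1, 2, 3, 4}, {1, 5, 7, 9}, {4, 6, 7, P}, {2, 5, 6, 8}, {3, 8, 9, P}.]
Local Regenerating Codes
[Figure: Storage / Repair BW tradeoff curve, with Storage on one axis and Repair Bandwidth on the other. The MSR and MBR points are marked on the curve, and one region is labeled "No Codes Exist!".]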
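Repair by transfer in the Pentagon Code can be sketched as plain block copying. The placement below is the one listed for the five nodes; the rule that P is the XOR of blocks 1 to 9 is an assumption made for this illustration, and the function names are hypothetical. Because every block lives on exactly two nodes, a failed node is rebuilt by fetching one block from each of the four survivors, with no decoding.

```python
from functools import reduce

# Node contents as listed for the Pentagon Code; "P" is the parity block.
PLACEMENT = {
    0: [1, 2, 3, 4],
    1: [1, 5, 7, 9],
    2: [4, 6, 7, "P"],
    3: [2, 5, 6, 8],
    4: [3, 8, 9, "P"],
}


def make_blocks(data):
    """Map ids 1..9 to data values and add P as their XOR (an assumed parity rule)."""
    blocks = {i + 1: v for i, v in enumerate(data)}
    blocks["P"] = reduce(lambda a, b: a ^ b, data)
    return blocks


def store(blocks):
    """Place each block on the nodes named in PLACEMENT."""
    return {n: {b: blocks[b] for b in ids} for n, ids in PLACEMENT.items()}


def repair_by_transfer(nodes, failed):
    """Rebuild a failed node by copying blocks; returns (contents, blocks sent per helper)."""
    rebuilt, sent = {}, {}
    for block_id in PLACEMENT[failed]:
        helper = next(n for n, held in nodes.items()
                      if n != failed and block_id in held)
        rebuilt[block_id] = nodes[helper][block_id]   # plain copy, no decoding
        sent[helper] = sent.get(helper, 0) + 1
    return rebuilt, sent


blocks = make_blocks([3, 14, 15, 92, 65, 35, 89, 79, 32])
nodes = store(blocks)
lost = nodes.pop(2)                       # node 2 fails
rebuilt, sent = repair_by_transfer(nodes, failed=2)
print(rebuilt == lost, sent)              # every helper sends exactly one block
```

The repair traffic equals the amount stored on one node, which is the defining property of the minimum-bandwidth (MBR) end of the tradeoff; the cost is that each node stores four blocks where an MDS layout of the same nine data blocks over any three nodes would store three.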
Technical Analysis
Optimality Tradeoffs – Reliability Analysis – System Requirements
Summary of Codes and their Tradeoffs

Codes/Family         | Tradeoff
---------------------+-------------------------------------------------
MDS                  | Storage overhead vs. Reliability
Replication & Parity | Storage overhead vs. Reliability
Reed-Solomon         | Storage overhead vs. Reliability
Near-Optimal         | Correction capability vs. Computational Complexity
Fountain             | Rate vs. Probability of Correction
Codes on Codes       | Storage overhead vs. Repair Degree (Fan-in)
Azure (mLRC)         | MDS vs. Maximum Recoverability
XORBAS (fLRC)        | Locality vs. Minimum Distance
Regenerating         | Storage overhead vs. Repair Bandwidth
Local Regenerating   | Storage overhead vs. Reconstruction Cost
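For the regenerating row, the storage versus repair-bandwidth tradeoff has two well-known extreme points, MSR (minimum storage) and MBR (minimum bandwidth). The sketch below evaluates the standard closed-form expressions for both. The symbols are the usual regenerating-codes notation rather than anything on the slide: M is the object size, k the number of nodes needed to decode, d the number of helpers contacted during a repair, alpha the storage per node, and gamma the total repair download. The parameter values are illustrative.

```python
from fractions import Fraction


def msr_point(M, k, d):
    """(alpha, gamma) at the minimum-storage regenerating point."""
    alpha = Fraction(M, k)
    gamma = Fraction(M * d, k * (d - k + 1))
    return alpha, gamma


def mbr_point(M, k, d):
    """(alpha, gamma) at the minimum-bandwidth regenerating point; the two are equal."""
    gamma = Fraction(2 * M * d, k * (2 * d - k + 1))
    return gamma, gamma


M, k, d = 9, 3, 4                          # illustrative: 9 units, any 3 nodes decode, 4 helpers
print("MSR (alpha, gamma):", msr_point(M, k, d))
print("MBR (alpha, gamma):", mbr_point(M, k, d))
print("decode-and-re-encode repair download:", M)
```

For these parameters MSR stores 3 units per node and downloads 6 to repair, MBR stores 4 and downloads 4, and a classical repair that decodes the whole object downloads 9.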
Reliability Analysis
[Figure: two panels, titled Locally Repairable Codes and Regenerating Codes, plotting MTTDL and Storage Overhead together with Code Length in one panel and Repair Traffic in the other. The schemes compared include 3-replica, RS (14,10), fLRC (10,6,5), and mLRC (6,2,2).]
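Two of the quantities behind this comparison follow directly from the code parameters. The sketch below computes storage overhead and the number of blocks read to rebuild one lost data block for the four schemes named in the figure. It assumes fLRC (10,6,5) means 10 data blocks, 6 parity blocks, and locality 5, and that mLRC (6,2,2) means 6 data blocks, 2 local parities, and 2 global parities with local groups of 3. MTTDL is not modeled here, since it depends on failure and repair rates that the slide does not give.

```python
from dataclasses import dataclass


@dataclass
class Scheme:
    name: str
    data_blocks: int     # k
    total_blocks: int    # blocks stored per stripe
    repair_reads: int    # blocks read to rebuild one lost data block

    @property
    def storage_overhead(self):
        return self.total_blocks / self.data_blocks


SCHEMES = [
    Scheme("3-replica", 1, 3, 1),
    Scheme("RS (14,10)", 10, 14, 10),
    Scheme("fLRC (10,6,5)", 10, 16, 5),   # assumed: 10 data + 6 parity, locality 5
    Scheme("mLRC (6,2,2)", 6, 10, 3),     # assumed: 6 data + 2 local + 2 global
]

for s in SCHEMES:
    print(f"{s.name:14s} overhead {s.storage_overhead:.2f}x  "
          f"single-failure reads {s.repair_reads}")
```

The LRC variants sit between replication and Reed-Solomon on both columns: a little more storage than RS (14,10) in exchange for roughly half to a third of the repair reads.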
System Requirements and Example Codes

System | Properties of the system: most important | Properties of the system: least important | Requirements for a code: most important | Requirements for a code: least important | Example family/code
Architecture: General-purpose storage array | Reliability & Performance | Cost | Reliability | Complexity | MSR, SD/STAIR Codes
Architecture: Geo-distributed storage | Repair over WAN is expensive | Storage overhead across DR sites | Local repair | Storage overhead | LRC
Architecture: Secure Storage | Security | Storage overhead | Faster degraded reads | Repair time | Non-systematic codes; MBR
Architecture: Distributed Systems | Parallelism & Availability | Storage overhead | Systematic | Storage overhead | Replication
Workload: Big Data (say, Hadoop) | Large volumes of data | Write latency | Storage overhead | Repair bandwidth | Regenerating (MSR/MBR), systematic
Literature & Key Players
Theory & Systems
Researchers, Big Players & Startups
[Figure: researchers and their work, arranged along a THEORY-to-SYSTEMS axis.]
• Researchers: James Plank (U Tennessee); Patrick PC Lee (Chinese U Hong Kong); Greenan; Kannan R (UC Berkeley); Alex Dimakis (UT Austin); Vijay Kumar (IISc); Oggier (NTU); Parikshit Gopalan; Muriel Medard (MIT)
• Work shown: Jerasure (2014); GF-Complete (2013); SD-Codes (2013); GF Intel SIMD (2013); STAIR (2014); Flat XOR (2010); PMDS (2013); Network Codes for Storage (2010); Piggyback (2013); Hitchhiker (2014); Pyramid (2007); Locality (2011); mLRC (2012); Self-repairing (2011); MSR/MBR Points (2013); Network Flow & Linear Coding; XORBAS (2013); PM (RBT) (2015); PM (MSR) RBT (2011-2015); Double Rep (2014)
• Codes listed: RS; Fountain; RLNC; Non-systematic RS
Related Areas
• Cross-object Coding
  ◦ Sector & Disk failures: PMDS, SD, STAIR Codes
• Other media
  ◦ Flash: LDPC, WOM, Multi-write codes; NVM
• Security
  ◦ Dispersal, AONT-RS
• Cloud
  ◦ NC-Cloud
• Transformational Codes
  ◦ Transform encoded data to different parameters as they become hot/cold without decoding and re-encoding
Schrodinger’s Code
“The condition of any system is unknown until a repair is
complete.”