Cooperative regenerating codes for distributed storage systems
Kenneth Shum (Joint work with Yuchong Hu)
22nd July 2011
Multiple node failures
• Large-scale storage system
  – Google data center, example from Kannan’s talk.
  – 800,000 servers, failure rate = 4% per year
  – Repair in 2 days
  – Mean number of failed servers in 2 days = 175.
• The lazy-repair policy in TotalRecall
  – A repair process is triggered only after the number of failed nodes has reached a certain threshold.
Jul, 2011 2kshum
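The figure of 175 failed servers can be checked with a few lines of arithmetic. This is a minimal sketch that assumes failures are spread uniformly over a 365-day year; the variable names are illustrative only.

```python
# Expected number of server failures during one repair window,
# assuming failures are spread uniformly over a 365-day year (an assumption).
servers = 800_000
annual_failure_rate = 0.04
repair_window_days = 2

expected_failures = servers * annual_failure_rate * repair_window_days / 365
print(round(expected_failures))
```

The result rounds to the 175 quoted on the slide, which is why several failures are typically pending whenever a repair is triggered.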
Jointly repair multiple failures
Hu et al. (JSAC, Feb 2010)
[Figure labels: Storage nodes, Newcomers, Data exchange]
Can we further reduce the repair-bandwidth?
Distributed storage (erasure coding)
[Figure: four storage nodes and a data collector.]
Node 1: A1, A2
Node 2: B1, B2
Node 3: A1+B1, 2A2+B2
Node 4: 2A1+B1, A2+B2
Data collector: recovers A1, A2, B1, B2
(Wu, Dimakis, ISIT09)
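The four-node layout on this slide (A1, A2 / B1, B2 / A1+B1, 2A2+B2 / 2A1+B1, A2+B2) lets a data collector decode from any two nodes. The sketch below checks that claim by computing the rank of the stacked coefficient vectors. The field GF(5) is an assumption, since the slides do not name the field, and `rank_mod_p` is a helper written for this illustration.

```python
from itertools import combinations

P = 5  # field size; an assumption, the slides do not name the field

# Each packet is a coefficient vector over (A1, A2, B1, B2).
NODES = {
    1: [(1, 0, 0, 0), (0, 1, 0, 0)],   # A1, A2
    2: [(0, 0, 1, 0), (0, 0, 0, 1)],   # B1, B2
    3: [(1, 0, 1, 0), (0, 2, 0, 1)],   # A1+B1, 2A2+B2
    4: [(2, 0, 1, 0), (0, 1, 0, 1)],   # 2A1+B1, A2+B2
}

def rank_mod_p(rows, p=P):
    """Rank of a matrix over GF(p) by Gauss-Jordan elimination."""
    m = [[x % p for x in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)          # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                f = m[i][col]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

# A pair of nodes is decodable when its four packets span all four unknowns.
decodable = {pair: rank_mod_p(NODES[pair[0]] + NODES[pair[1]]) == 4
             for pair in combinations(NODES, 2)}
print(decodable)
```

All six pairs come out decodable under this assumed field, which is the property a data collector relies on.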
Naive Repair
[Figure: the node storing A1, A2 has failed. The newcomer downloads B1, B2 from one surviving node and A1+B1, 2A2+B2 from another, then recovers A1, A2.]
4 packets required.
Repair with ``code alignment’’
[Figure: the node storing A1, A2 has failed. The newcomer downloads one packet from each of the three surviving nodes.]
Downloaded packets: B1+B2, A1+2A2+B1+B2, 2A1+A2+B1+B2
3 packets required.
Solve: P1 = A1 + 2A2, P2 = 2A1 + A2
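The three-packet repair can be traced step by step: subtract B1+B2 from the other two downloads to obtain P1 and P2, then solve the 2×2 system. The sketch below does this over GF(5), an assumed field in which 3 is invertible; `repair_node1` is a hypothetical helper name.

```python
from itertools import product

P = 5  # assumed prime field; 3 must be invertible, so P cannot be 2 or 3

def repair_node1(b1, b2, n3, n4, p=P):
    """Rebuild (A1, A2) from one combined packet per surviving node."""
    d2 = (b1 + b2) % p            # node 2 sends B1+B2
    d3 = (n3[0] + n3[1]) % p      # node 3 sends (A1+B1)+(2A2+B2)
    d4 = (n4[0] + n4[1]) % p      # node 4 sends (2A1+B1)+(A2+B2)
    p1 = (d3 - d2) % p            # P1 = A1 + 2A2
    p2 = (d4 - d2) % p            # P2 = 2A1 + A2
    inv3 = pow(3, -1, p)
    a1 = ((2 * p2 - p1) * inv3) % p   # 2*P2 - P1 = 3*A1
    a2 = ((2 * p1 - p2) * inv3) % p   # 2*P1 - P2 = 3*A2
    return a1, a2

# Exhaustive check over every possible file (A1, A2, B1, B2) in GF(5)^4.
ok = all(
    repair_node1(b1, b2,
                 ((a1 + b1) % P, (2 * a2 + b2) % P),
                 ((2 * a1 + b1) % P, (a2 + b2) % P)) == (a1, a2)
    for a1, a2, b1, b2 in product(range(P), repeat=4)
)
print(ok)
```

The interference terms B1 and B2 are aligned into the single combination B1+B2, which is why one packet from the B node is enough to cancel them.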
Multiple failures, separate repair
[Figure: the nodes storing (B1, B2) and (2A1+B1, A2+B2) have failed. Each newcomer downloads 2 packets from each of the two surviving nodes.]
8 packets in total, 4 packets per newcomer
Multiple failures, cooperative repair (I)
[Figure: the two newcomers rebuild (B1, B2) and (2A1+B1, A2+B2). Edge labels: A1, A2; 2A2+B2; 2A1+B1; B1, B2.]
6 packets in total, 3 packets per newcomer
Multiple failures, cooperative repair (II)
[Figure: cooperative repair of the nodes storing (B1, B2) and (2A1+B1, A2+B2). Packets shown on the edges include A1, A2, A1+B1, 2A2+B2 and B2.]
6 packets in total, 3 packets per newcomer
Outline of the talk
• Is it optimal in terms of repair-bandwidth?
• What is the tradeoff between storage and repair-bandwidth for cooperative repair?
• Can we achieve the Pareto-optimal operating points on the tradeoff curve by linear network coding?
  – Exact repair
  – Functional repair
Information flow graph
[Figure: information flow graph from the source S to the data collector, with vertices In1/Out1 to In5/Out5 and In6/Mid6/Out6, In7/Mid7/Out7; edge capacities 1 and 2 are marked.]
Is this regenerating code optimal ?
[Figure: the cooperative repair scheme (II) shown earlier.]
6 packets in total, 3 packets per newcomer
First cut
[Figure: first cut in the information flow graph.]
$B \le 4\beta_1$
Second cut
[Figure: second cut in the information flow graph.]
$B \le 2 + \beta_1 + \beta_2$
A linear programming problem
• Minimize $2\beta_1 + \beta_2$ (repair bandwidth)
• Subject to
  $4 \le 4\beta_1$
  $4 \le 2 + \beta_1 + \beta_2$
  $\beta_1, \beta_2 \ge 0$
At least 3 packets
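The linear program on this slide is small enough to solve by enumerating the corner points of the feasible region. The sketch below reads the two unknowns as the traffic from each surviving node and the traffic exchanged with the other newcomer; that reading, and the symbol names, are assumptions drawn from the objective "2 × first + second". Vertex enumeration is used here only because the problem has two variables and is bounded below.

```python
from fractions import Fraction
from itertools import combinations

# Constraints written as a*beta1 + b*beta2 >= rhs.
CONSTRAINTS = [
    (4, 0, 4),   # first cut:  4*beta1 >= 4
    (1, 1, 2),   # second cut: 2 + beta1 + beta2 >= 4
    (1, 0, 0),   # beta1 >= 0
    (0, 1, 0),   # beta2 >= 0
]
COST = (2, 1)    # repair bandwidth per newcomer: 2*beta1 + beta2

def feasible(x, y):
    return all(a * x + b * y >= r for a, b, r in CONSTRAINTS)

def vertices():
    """Intersect every pair of constraint boundaries and keep feasible points."""
    for (a1, b1, r1), (a2, b2, r2) in combinations(CONSTRAINTS, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundaries never meet
        x = Fraction(r1 * b2 - r2 * b1, det)   # Cramer's rule
        y = Fraction(a1 * r2 - a2 * r1, det)
        if feasible(x, y):
            yield x, y

best = min(vertices(), key=lambda v: COST[0] * v[0] + COST[1] * v[1])
best_cost = COST[0] * best[0] + COST[1] * best[1]
print(best, best_cost)
```

The minimum is attained where both cut constraints are tight, giving one packet from each surviving node and one exchanged packet, 3 packets in total.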
Non-homogeneous download traffic
[Figure: first cut in the information flow graph, with the four cut edges carrying a, b, c and d packets.]
$B \le a + b + c + d$
Non-homogeneous traffic
[Figure: second cut in the information flow graph, with edge traffic labelled e, f, g, h, i and j. The slide is built up over four steps, adding one inequality per step.]
$B \le 2 + f + j$
$B \le 2 + h + i$
$B \le 2 + e + j$
$B \le 2 + g + i$
The same LP problem
• Minimize
• Subject to
At least 3 packets
TRADEOFF BETWEEN STORAGE AND REPAIR-BANDWIDTH
Storage vs Repair-bandwidth
[Figure: storage per node versus repair bandwidth per failed node, with two curves: one-by-one repair, and repairing 3 newcomers jointly.]
File size = 420, d = 8, k = 4
(S., ICC 2011; Kermarrec, Le Scouarnec and Straub, Netcod 2011.)
Fair comparison?
[Figure: one-by-one repair and cooperative repair, each drawing on the surviving nodes.]
• One-by-one repair: repair degree = 8; number of connections for each newcomer = 8
• Cooperative repair: number of connections for each newcomer = 8 + 2
MBCR and MSCR
[Figure: storage per node versus repair bandwidth per failed node, for one-by-one repair and cooperative repair, with the two end points of the cooperative curve marked.]
• Minimum bandwidth cooperative repair (MBCR)
• Minimum storage cooperative repair (MSCR)
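For reference, the two end points of the cooperative curve are usually quoted in closed form in the papers cited in this deck. The expressions below are recalled from that literature, not taken from this slide, so treat them as an assumption. Here $\alpha$ is the storage per node and $\gamma$ the repair bandwidth per failed node; both symbol names are chosen for this note.

```latex
\[
\text{MSCR:}\qquad \alpha = \frac{B}{k}, \qquad
\gamma = \frac{B\,(d+r-1)}{k\,(d+r-k)}
\]
\[
\text{MBCR:}\qquad \alpha = \gamma = \frac{B\,(2d+r-1)}{k\,(2d+r-k)}
\]
% Worked example with the plotted parameters B = 420, d = 8, k = 4, r = 3:
\[
\text{MSCR:}\quad \alpha = \frac{420}{4} = 105,\quad
\gamma = \frac{420 \cdot 10}{4 \cdot 7} = 150;
\qquad
\text{MBCR:}\quad \alpha = \gamma = \frac{420 \cdot 18}{4 \cdot 15} = 126 .
\]
```

As a sanity check, B = 6, k = d = 3, r = 2 gives $\alpha = 2$, $\gamma = 4$, and B = 8, k = d = 2, r = 2 gives $\alpha = \gamma = 5$. Both agree with the numbers on the explicit-construction slides later in the deck.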
How much can we improve?
[Figure: storage per node versus repair bandwidth per failed node, for one-by-one repair and for repairing 10 newcomers jointly.]
File size = 2275, d = 30, k = 5
When d is large, joint repair does not have a significant advantage over one-by-one repair.
How much can we improve?
[Figure: storage per node versus repair bandwidth per failed node, for one-by-one repair and for repairing 10 newcomers jointly.]
File size = 616, d = 8, k = 4
The repair-bandwidth reduction is more prominent when d is not so large.
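The two parameter sets can be compared numerically. The sketch below uses the standard end-point formulas for single-failure regenerating codes, and the cooperative end-point formulas as recalled from the cited literature. None of these formulas appear on the slides, so they are assumptions here, and the function names are illustrative.

```python
from fractions import Fraction as F

def single_repair(B, k, d):
    """(min-storage gamma, min-bandwidth gamma) for one-by-one repair."""
    msr = F(B * d, k * (d - k + 1))
    mbr = F(2 * B * d, k * (2 * d - k + 1))
    return msr, mbr

def cooperative_repair(B, k, d, r):
    """(MSCR gamma, MBCR gamma) for joint repair of r newcomers."""
    mscr = F(B * (d + r - 1), k * (d + r - k))
    mbcr = F(B * (2 * d + r - 1), k * (2 * d + r - k))
    return mscr, mbcr

def savings(B, k, d, r):
    """Relative reduction in repair bandwidth at the two end points."""
    msr, mbr = single_repair(B, k, d)
    mscr, mbcr = cooperative_repair(B, k, d, r)
    return float(1 - mscr / msr), float(1 - mbcr / mbr)

large_d = savings(2275, 5, 30, 10)   # file size 2275, d = 30, k = 5
small_d = savings(616, 4, 8, 10)     # file size 616,  d = 8,  k = 4
print(large_d, small_d)
```

Under these formulas the saving at the min-storage end is a few percent for d = 30 but roughly a quarter for d = 8, in line with the remarks on the two slides.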
AN EXPLICIT CONSTRUCTION FOR MINIMUM-BANDWIDTH COOPERATIVE REPAIR
An explicit construction for MBCR
• Minimum repair-bandwidth
• Storage per node
• B = 8 information packets
• n = 4 nodes
• Each node stores 5 packets.
• Repair r = 2 failures simultaneously
• No. of connections for each DC = k = 2
• No. of helpers for each failed node = d = 2
(S., Hu, ISIT 2011.) Require d = k, r = n – d
Min-Bandwidth point
[Figure: storage per node versus repair bandwidth per failed node, for one-by-one repair and for repairing 2 new nodes cooperatively.]
Data Distribution
8 data packets: A, B, C, D, E, F, G, H
Node 1: A, B, C, D, F+G
Node 2: C, D, E, F, H+A
Node 3: E, F, G, H, B+C
Node 4: G, H, A, B, D+E
5 packets per node: 4 systematic, 1 parity-check (XOR)
Data collection
[Figure: the data collector connects to the nodes storing (A, B, C, D, F+G) and (E, F, G, H, B+C).]
Downloads: A, B, C, D and E, F, G, H
Data collector: A, B, C, D, E, F, G, H
Data collection
[Figure: the data collector connects to the nodes storing (A, B, C, D, F+G) and (C, D, E, F, H+A).]
Downloads: A, B, C, F+G and D, E, F, H+A
Coefficient matrix of the downloaded packets (rows F+G, H+A, A, B, C, D, E, F; columns A to H): triangular, full-rank
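The two data-collection slides show two particular pairs of nodes. The sketch below checks every pair by treating each stored packet as a 0/1 vector over the eight data packets and computing its rank over GF(2), since the parity packets are XORs. The bitmask encoding and the helper names are choices made for this illustration.

```python
from itertools import combinations

SYMBOLS = "ABCDEFGH"

def packet(expr):
    """Encode e.g. 'F+G' as a bitmask over the 8 data packets (XOR combination)."""
    mask = 0
    for s in expr.split("+"):
        mask ^= 1 << SYMBOLS.index(s)
    return mask

# Node contents exactly as listed on the data-distribution slide.
NODES = [
    [packet(p) for p in row.split(", ")]
    for row in ("A, B, C, D, F+G", "C, D, E, F, H+A",
                "E, F, G, H, B+C", "G, H, A, B, D+E")
]

def rank_gf2(vectors):
    """Rank over GF(2), keeping one basis vector per leading bit."""
    basis = {}
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]
    return len(basis)

ranks = {(i + 1, j + 1): rank_gf2(NODES[i] + NODES[j])
         for i, j in combinations(range(4), 2)}
print(ranks)
```

Every pair reaches rank 8, so a data collector that contacts any k = 2 nodes can solve for all eight packets.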
Exact Repair
How to repair?
[Figure: the nodes storing (A, B, C, D, F+G) and (E, F, G, H, B+C) have failed. Edge labels from the surviving nodes: A, B; C, D; E, F; G, H. Labels on the exchange between the newcomers: B+C, F+G.]
Total repair-bandwidth = 10
Exact Repair
How to repair?
[Figure: the nodes storing (C, D, E, F, H+A) and (E, F, G, H, B+C) have failed. Edge labels from the surviving nodes: C, D; D+E, H+A; G, H; B+C, F+G. Labels on the exchange between the newcomers: E, F.]
Total repair-bandwidth = 10
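The two exact-repair slides can be replayed in code. The sketch below follows one reading of the edge labels in the two figures, namely which helper sends which packet, so the exact flows are an assumption. Packets are modelled as random bytes and "+" as XOR, and the function names are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible
A, B, C, D, E, F, G, H = (random.getrandbits(8) for _ in range(8))

# Node contents as on the data-distribution slide (4 systematic + 1 XOR parity).
NODE1 = (A, B, C, D, F ^ G)
NODE2 = (C, D, E, F, H ^ A)
NODE3 = (E, F, G, H, B ^ C)
NODE4 = (G, H, A, B, D ^ E)

def repair_nodes_1_and_3(node2, node4):
    """Nodes 1 and 3 are lost; nodes 2 and 4 help. Returns (new1, new3, packets sent)."""
    c, d, e, f, _ = node2
    g, h, a, b, _ = node4
    sent = 8                      # each newcomer downloads 2 packets from each helper
    parity_for_3 = b ^ c          # newcomer 1 holds B and C, sends B+C
    parity_for_1 = f ^ g          # newcomer 3 holds F and G, sends F+G
    sent += 2                     # one exchanged packet in each direction
    return (a, b, c, d, parity_for_1), (e, f, g, h, parity_for_3), sent

def repair_nodes_2_and_3(node1, node4):
    """Nodes 2 and 3 are lost; nodes 1 and 4 help. Returns (new2, new3, packets sent)."""
    _, b, c, d, fg = node1
    g, h, a, _, de = node4
    ha = h ^ a                    # node 4 combines H and A before sending
    bc = b ^ c                    # node 1 combines B and C before sending
    sent = 8                      # C, D, D+E, H+A to newcomer 2; G, H, B+C, F+G to newcomer 3
    e = d ^ de                    # newcomer 2 solves for E and forwards it
    f = g ^ fg                    # newcomer 3 solves for F and forwards it
    sent += 2
    return (c, d, e, f, ha), (e, f, g, h, bc), sent

case_a = repair_nodes_1_and_3(NODE2, NODE4)
case_b = repair_nodes_2_and_3(NODE1, NODE4)
print(case_a[2], case_b[2])
```

In both cases the lost nodes are rebuilt exactly and 10 packets cross the network, 5 per newcomer, matching the storage of 5 packets per node at the minimum-bandwidth point.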
Min-Bandwidth point
[Figure: storage per node versus repair bandwidth per failed node, for one-by-one repair and for repairing 2 new nodes cooperatively.]
AN EXPLICIT CONSTRUCTION FOR MINIMUM-STORAGE COOPERATIVE REPAIR
An explicit construction for MSCR
• Minimum repair-bandwidth
• Storage per node
• B = 6 information packets
• n nodes
• Each node stores 2 packets.
• Repair r = 2 failures simultaneously
• No. of connections for each DC = k = 3
• No. of helpers for each failed node = d = 3
(S., ICC 2011.) Require d = k
The min-storage point
[Figure: storage per node versus repair bandwidth per failed node, marking the non-cooperative and cooperative min-storage points.]
k = 3, d = 3, r = 2, B = 6
Cooperative: storage cost per node = 2, repair bandwidth per node = 4
Data retrieval
[Figure: source data → encode → two codewords (MDS code with dimension k = 3) → storage nodes → data collector → decode.]
Repair : phase 1
[Figure: two storage nodes are lost; each of the two newcomers downloads from the surviving storage nodes and decodes.]
Repair: phase 2
[Figure: each newcomer re-encodes, and the two newcomers exchange packets.]
Repair bandwidth per node = 8/2 = 4
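The two repair phases can be sketched with a small MDS code. The slides only say "MDS code with dimension k=3", so the Reed-Solomon style code over GF(7), the five evaluation points and the sample data below are all assumptions made for illustration. Each node stores one symbol of each of two codewords. In phase 1 each newcomer decodes one codeword from three helpers, and in phase 2 it re-encodes and hands one symbol to the other newcomer.

```python
P = 7                      # assumed prime field size
XS = [1, 2, 3, 4, 5]       # assumed evaluation points, one per node (n = 5)

def encode(msg, x, p=P):
    """Evaluate the message polynomial msg[0] + msg[1]*x + msg[2]*x^2 at x."""
    return sum(c * pow(x, i, p) for i, c in enumerate(msg)) % p

def interpolate_at(points, x, p=P):
    """Lagrange-interpolate the polynomial through `points` and evaluate it at x."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

rows = [[3, 1, 4], [1, 5, 2]]                        # B = 6 source packets (sample data)
nodes = {x: tuple(encode(r, x) for r in rows) for x in XS}

failed, helpers = [1, 2], [3, 4, 5]                  # r = 2 failures, d = 3 helpers
downloads = 0
phase1 = {}
# Phase 1: newcomer j downloads symbol j from every helper (3 packets)
# and can now decode codeword j.
for j, x_new in enumerate(failed):
    phase1[x_new] = (j, [(h, nodes[h][j]) for h in helpers])
    downloads += len(helpers)
# Phase 2: newcomer j re-encodes codeword j at both failed positions,
# keeps its own symbol and sends the other one across (1 packet each).
rebuilt = {x: [None, None] for x in failed}
for x_new, (j, pts) in phase1.items():
    for target in failed:
        rebuilt[target][j] = interpolate_at(pts, target)
        if target != x_new:
            downloads += 1
print(rebuilt, downloads)
```

The count comes to 3 + 3 + 1 + 1 = 8 packets for two newcomers, the 8/2 = 4 per node stated on the slide.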
The construction is optimal
[Figure: storage per node versus repair bandwidth per failed node, marking the non-cooperative and cooperative min-storage points.]
k = 3, d = 3, r = 2, B = 6
Cooperative: storage cost per node = 2, repair bandwidth per node = 4
EXISTENCE OF COOPERATIVE REGENERATING CODES UNDER FUNCTIONAL REPAIR
Existence of optimal linear regenerating codes in general
• Sustainable storage system
  – Will it work after arbitrarily many repairs?
• Technical difficulty: The information flow graph is unbounded.
• Can we work over a fixed finite field, for an unlimited number of regenerations?
  – Yes, if we can construct an exact regenerating code.
  – The answer is also “yes” for cooperative functional repair in general.
(S., Hu, Netcod 2011.)
Trellis structure
[Figure: a trellis with Stage 0, Stage 1, Stage 2, ...]
• m: message vector (row vector)
• T0 is the “transfer matrix” in stage 0; the output of stage 0 is mT0
• T1 is the “transfer matrix” in stage 1; the output of stage 1 is mT0T1
• T2 is the “transfer matrix” in stage 2; the output of stage 2 is mT0T1T2
Flow in information flow graph
[Figure: information flow graph from the source S through two repair stages (In1, In2, Mid1, Mid2, Out1, Out2, then In3, In4, Mid3, Mid4, Out3, Out4) to the data collector DC, annotated with edge capacities and flow values.]
The cut-set bound says that the cut capacity is at least 8.
Can we construct a flow with value 8?
Cross-sectional flow pattern
[Figure: the information flow graph from S to the DC, annotated with the flow values on the cross-sections between stages.]
A recursive construction of flow
[Figure: one segment of the graph between stage s and stage s+1 (In1, In2, Mid1, Mid2, Out1 to Out4), with cross-section flow values g1, g2, g3, g4 on one side and h1, h2, h3, h4 on the other.]
1. Identify a set of cross-section flow patterns, say H.
2. For any cross-section flow pattern (h1, h2, h3, h4) in H at stage s+1, we can find a flow in this segment of the graph such that (g1, g2, g3, g4) is also in H.
3. Each pattern corresponds to a submatrix of the transfer matrix.
4. By the Schwartz-Zippel lemma, we can find the local encoding vectors so that all such determinants are non-zero, if the finite field is sufficiently large.
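For step 4, the lemma is usually stated as below. The statement itself is standard. The way it is tied to the determinants is the customary argument and is not spelled out on the slide, so read that part as an assumption. The symbols $P$, $D$, $m$ and $q$ are names chosen for this note.

```latex
% Schwartz-Zippel lemma: P is a non-zero polynomial of total degree D in m
% variables over the finite field F_q, and x_1, ..., x_m are drawn
% independently and uniformly at random from F_q.
\[
\Pr\bigl[\,P(x_1,\dots,x_m) = 0\,\bigr] \;\le\; \frac{D}{q}
\]
% Customary application: let P be the product of the finitely many
% determinants considered in one stage, viewed as polynomials in the local
% encoding coefficients. If each determinant is a non-zero polynomial, so is P,
% and for q > D the bound is below 1, hence some choice of coefficients
% makes every determinant non-zero simultaneously.
```

Because the set H of cross-section patterns is fixed, the same degree bound applies at every stage, which is what allows a single finite field to serve an unlimited number of repairs.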
Summary
• Multiple node failures in medium-scale to large-scale storage systems
• Formulation as a linear program
• Functional repair: a linear regenerating code over a fixed finite field that matches the cut-set bound on repair-bandwidth exists.
• Exact repair: two families of explicit code constructions
  – Minimum-bandwidth point: d = k, r = n – d
  – Minimum-storage point: d = k, r arbitrary
References
• Y. Wu and A. G. Dimakis, Reducing repair traffic for erasure coding-based storage via interference alignment, ISIT, Jul, 2009.
• Y. Hu, Y. Xu, X. Wang, C. Zhan and P. Li, Cooperative recovery of distributed storage systems from multiple losses with network coding, J. Sel. Area Comm., vol. 28, no. 2, pp. 268-275, Feb, 2010.
• K. W. Shum, Cooperative Regenerating Codes for Distributed Storage Systems, ICC, Jun, 2011.
• A.-M. Kermarrec, N. Le Scouarnec and G. Straub, Repairing Multiple Failures with Coordinated and Adaptive Regenerating Codes, Netcod, Jul, 2011.
• K. W. Shum and Y. Hu, Existence of Minimum-Repair-Bandwidth Cooperative Regenerating Codes, Netcod, Jul, 2011.
• K. W. Shum and Y. Hu, Exact Minimum-Repair-Bandwidth Cooperative Regenerating Codes for Distributed Storage Systems, ISIT, Aug, 2011.