Cooperative regenerating codes for distributed storage systems Kenneth Shum (Joint work with Yuchong Hu) 22nd July 2011


Page 1:

Cooperative regenerating codes for distributed storage systems

Kenneth Shum (Joint work with Yuchong Hu)

22nd July 2011

Page 2:

Multiple node failures

• Large-scale storage system
  – Google data center, example from Kannan’s talk.
  – 800,000 servers, failure rate = 4% per year
  – Repair in 2 days
  – Mean number of failed servers in 2 days = 175.
• The lazy-repair policy in TotalRecall
  – A repair process is triggered only after the number of failed nodes has reached a certain threshold.
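The figure of 175 failed servers can be checked with a short calculation. This is a minimal sketch that assumes failures are spread evenly over a 365-day year; the variable names are illustrative and not from the slides.

```python
# Expected number of server failures inside one repair window,
# assuming failures are spread uniformly over a 365-day year.
servers = 800_000
annual_failure_rate = 0.04      # 4% per year, as on the slide
repair_window_days = 2

expected_failures = servers * annual_failure_rate * repair_window_days / 365
print(round(expected_failures))  # 175
```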

Page 3:

Jointly repair multiple failures

[Figure: storage nodes, newcomers, and data exchange among the newcomers. Hu et al. (JSAC, Feb 2010).]

Can we further reduce the repair-bandwidth?

Page 4:

Distributed storage (erasure coding)

[Figure (Wu, Dimakis, ISIT09): four storage nodes hold the packet pairs (A1, A2), (B1, B2), (A1+B1, 2A2+B2) and (2A1+B1, A2+B2). A data collector recovers A1, A2, B1, B2.]
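The storage layout on this slide can be checked for the property that any two of the four nodes suffice to rebuild A1, A2, B1, B2. The sketch below does this by a rank computation. The slides do not name the field, so the prime field GF(5) is an assumption here, and the node numbering and function names are illustrative.

```python
from itertools import combinations

P = 5  # assumed prime field size; the slides do not state the field

# Each stored packet is a coefficient vector over (A1, A2, B1, B2).
NODES = {
    1: [(1, 0, 0, 0), (0, 1, 0, 0)],   # A1, A2
    2: [(0, 0, 1, 0), (0, 0, 0, 1)],   # B1, B2
    3: [(1, 0, 1, 0), (0, 2, 0, 1)],   # A1+B1, 2A2+B2
    4: [(2, 0, 1, 0), (0, 1, 0, 1)],   # 2A1+B1, A2+B2
}

def rank_mod_p(rows, p=P):
    """Rank of a matrix over GF(p) by Gauss-Jordan elimination."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)          # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def any_two_nodes_suffice():
    """True if every pair of nodes holds 4 linearly independent packets."""
    return all(rank_mod_p(NODES[a] + NODES[b]) == 4
               for a, b in combinations(NODES, 2))

print(any_two_nodes_suffice())  # True
```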

Page 5:

Naive Repair

[Figure: the four nodes store (A1, A2), (B1, B2), (A1+B1, 2A2+B2) and (2A1+B1, A2+B2). The newcomer rebuilding A1, A2 downloads B1, B2 and A1+B1, 2A2+B2.]

4 packets required.

Page 6:

Repair with "code alignment"

[Figure: the four nodes store (A1, A2), (B1, B2), (A1+B1, 2A2+B2) and (2A1+B1, A2+B2). The newcomer rebuilding A1, A2 downloads one packet from each surviving node: B1+B2, A1+2A2+B1+B2, and 2A1+A2+B1+B2.]

3 packets required.

Solve:
P1 = A1 + 2A2
P2 = 2A1 + A2
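The three-packet repair can be traced step by step. The sketch below is an illustration under the assumption that arithmetic is over the prime field GF(5), which the slides do not state. Note that the 2x2 system for (A1, A2) has determinant 1 - 4 = -3, so any field of characteristic 3 would not work for this particular choice of downloads. The function name is hypothetical.

```python
P = 5  # assumed prime field; characteristic 3 would make the system singular

def repair_first_node(a1, a2, b1, b2, p=P):
    """Rebuild (A1, A2) from one packet per surviving node."""
    # One combined packet from each of the three surviving nodes.
    from_node2 = (b1 + b2) % p                        # B1 + B2
    from_node3 = ((a1 + b1) + (2 * a2 + b2)) % p      # A1 + 2A2 + B1 + B2
    from_node4 = ((2 * a1 + b1) + (a2 + b2)) % p      # 2A1 + A2 + B1 + B2

    # The B-terms are aligned, so subtracting B1 + B2 cancels them.
    p1 = (from_node3 - from_node2) % p                # P1 = A1 + 2A2
    p2 = (from_node4 - from_node2) % p                # P2 = 2A1 + A2

    # Solve the 2x2 system by Cramer's rule; determinant is 1*1 - 2*2 = -3.
    det_inv = pow(-3 % p, -1, p)
    a1_rec = ((p1 - 2 * p2) * det_inv) % p
    a2_rec = ((p2 - 2 * p1) * det_inv) % p
    return a1_rec, a2_rec

print(repair_first_node(3, 4, 1, 2))  # (3, 4)
```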

Page 7:

Multiple failures, separate repair

[Figure: the four nodes store (A1, A2), (B1, B2), (A1+B1, 2A2+B2) and (2A1+B1, A2+B2). The nodes holding (B1, B2) and (2A1+B1, A2+B2) are rebuilt separately; each transfer to a newcomer carries 2 packets.]

8 packets in total, 4 packets per newcomer

Page 8:

Multiple failures, cooperative repair (I)

[Figure: same storage layout; the newcomers rebuilding (B1, B2) and (2A1+B1, A2+B2) cooperate during the repair.]

6 packets in total, 3 packets per newcomer

Page 9:

Multiple failures, cooperative repair (II)

[Figure: same storage layout; a second cooperative repair scheme for the same two failed nodes.]

6 packets in total, 3 packets per newcomer

Page 10:

Outline of the talk

• Is it optimal in terms of repair-bandwidth?
• What is the tradeoff between storage and repair-bandwidth for cooperative repair?
• Can we achieve the Pareto-optimal operating points on the tradeoff curve by linear network coding?
  – Exact repair
  – Functional repair

Page 11:

Information flow graph

[Figure: information flow graph with source S, In/Out vertices for nodes 1 to 5, In/Mid/Out vertices for the two newcomers (nodes 6 and 7), and a data collector; edges carry capacities 1 and 2.]

Page 12:

Is this regenerating code optimal?

[Figure: the cooperative repair scheme (II) from Page 9, repeated.]

6 packets in total, 3 packets per newcomer

Page 13:

First cut

[Figure: a cut in the information flow graph separating the source from a data collector connected to the two newcomers (Out6, Out7).]

$B \le 4\beta_1$

Page 14:

Second cut

[Figure: a second cut in the information flow graph, through In/Mid/Out vertices of nodes 1 to 4, with edge capacities 1 and 2.]

$B \le 2 + \beta_1 + \beta_2$

Page 15:

A linear programming problem

• Minimize $2\beta_1 + \beta_2$ (repair bandwidth)
• Subject to
  $4 \le 4\beta_1$
  $4 \le 2 + \beta_1 + \beta_2$
  $\beta_1, \beta_2 \ge 0$

At least 3 packets
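The bound of 3 packets follows directly from the two cut constraints. The derivation below is a sketch that reads the extraction-damaged constraints as $4 \le 4\beta_1$ and $4 \le 2 + \beta_1 + \beta_2$, and assumes $\beta_1$ is the amount a newcomer downloads from each surviving node and $\beta_2$ the amount it receives from the other newcomer; that interpretation of the symbols is an assumption, since the slides define them only in the figures.

```latex
\begin{align*}
4 \le 4\beta_1 \;&\Longrightarrow\; \beta_1 \ge 1, \\
4 \le 2 + \beta_1 + \beta_2 \;&\Longrightarrow\; \beta_1 + \beta_2 \ge 2, \\
2\beta_1 + \beta_2 \;&=\; \beta_1 + (\beta_1 + \beta_2) \;\ge\; 1 + 2 \;=\; 3 .
\end{align*}
```

Equality holds at $\beta_1 = \beta_2 = 1$, which is the operating point of the 3-packets-per-newcomer cooperative schemes shown earlier.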

Page 16:

Non-homogeneous download traffic

[Figure: the first cut again, with the four download edges into the newcomers labelled a, b, c, d.]

$B \le a + b + c + d$

Page 17:

Non-homogeneous traffic

[Figure: information flow graph through In/Mid/Out vertices of nodes 1 to 4, with edges labelled e, f, g, h, i, j.]

$B \le 2 + f + j$

Page 18:

Non-homogeneous traffic

[Figure: same graph as Page 17.]

$B \le 2 + f + j$

$B \le 2 + h + i$

Page 19:

Non-homogeneous traffic

[Figure: same graph as Page 17.]

$B \le 2 + f + j$

$B \le 2 + h + i$

$B \le 2 + e + j$

Page 20:

Non-homogeneous traffic

[Figure: same graph as Page 17.]

$B \le 2 + f + j$

$B \le 2 + h + i$

$B \le 2 + e + j$

$B \le 2 + g + i$

Page 21:

The same LP problem

• Minimize
• Subject to

At least 3 packets

Page 22:

TRADEOFF BETWEEN STORAGE AND REPAIR-BANDWIDTH

Page 23:

Storage vs Repair-bandwidth

[Plot: storage per node (100 to 140) against repair bandwidth per failed node (120 to 180), with one curve for one-by-one repair and one for repairing 3 newcomers jointly. File size = 420, d = 8, k = 4.]

(S., ICC 2011; Kermarrec, Le Scouarnec and Straub, Netcod 2011.)

Page 24:

Fair comparison?

• One-by-one repair: repair degree = 8; number of connections per newcomer = 8
• Cooperative repair: number of connections per newcomer = 8+2

[Figure: newcomers connected to the surviving nodes under each scheme.]

Page 25:

MBCR and MSCR

[Plot: storage per node (100 to 140) against repair bandwidth per failed node (120 to 180), with curves for one-by-one repair and cooperative repair. The two end points of the cooperative curve are marked:]

• Minimum bandwidth cooperative repair (MBCR)
• Minimum storage cooperative repair (MSCR)

Page 26:

How much can we improve?

[Plot: storage per node (450 to 500) against repair bandwidth per failed node (480 to 550), with curves for one-by-one repair and for repairing 10 newcomers jointly. File size = 2275, d = 30, k = 5.]

When d is large, joint repair does not have a significant advantage over one-by-one repair.

Page 27:

How much can we improve?

[Plot: storage per node (150 to 200) against repair bandwidth per failed node (180 to 260), with curves for one-by-one repair and for repairing 10 newcomers jointly. File size = 616, d = 8, k = 4.]

The repair-bandwidth reduction is more prominent when d is not so large.
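The end points of the tradeoff curves in these plots can be computed in closed form. The slides show only the plots, so the formulas below are an assumption taken from the regenerating-code literature cited in the deck (minimum-storage and minimum-bandwidth points for one-by-one and for cooperative repair); they do reproduce the numbers that the slides state for the two explicit constructions. Function names are illustrative. Each function returns (storage per node, repair bandwidth per failed node).

```python
from fractions import Fraction

def msr(B, k, d):
    """Minimum-storage point, one-by-one repair."""
    return Fraction(B, k), Fraction(d * B, k * (d - k + 1))

def mbr(B, k, d):
    """Minimum-bandwidth point, one-by-one repair."""
    g = Fraction(2 * d * B, k * (2 * d - k + 1))
    return g, g

def mscr(B, k, d, r):
    """Minimum-storage point, cooperative repair of r nodes."""
    return Fraction(B, k), Fraction(B * (d + r - 1), k * (d - k + r))

def mbcr(B, k, d, r):
    """Minimum-bandwidth point, cooperative repair of r nodes."""
    g = Fraction(B * (2 * d + r - 1), k * (2 * d + r - k))
    return g, g

# Parameters of the three plots: (file size, d, k, newcomers repaired jointly)
for B, d, k, r in [(420, 8, 4, 3), (2275, 30, 5, 10), (616, 8, 4, 10)]:
    print(B, [tuple(map(float, pt)) for pt in
              (msr(B, k, d), mbr(B, k, d), mscr(B, k, d, r), mbcr(B, k, d, r))])
```

For the first plot this gives a minimum-storage point of (105, 168) for one-by-one repair against (105, 150) for cooperative repair, which fits the axis ranges shown on the slide.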

Page 28:

AN EXPLICIT CONSTRUCTION FOR MINIMUM-BANDWIDTH COOPERATIVE REPAIR

Page 29:

An explicit construction for MBCR

• Minimum repair-bandwidth
• Storage per node
• B = 8 information packets
• n = 4 nodes
• Each node stores 5 packets.
• Repair r = 2 failures simultaneously
• No. of connections for each DC = k = 2
• No. of helpers for each failed node = d = 2

(S., Hu, ISIT 2011.) Require d = k, r = n - d

Page 30:

Min-Bandwidth point

[Plot: storage per node (3.5 to 6) against repair bandwidth per failed node (5 to 9), with curves for one-by-one repair and for repairing 2 new nodes cooperatively.]

Page 31:

Data Distribution

8 data packets: A, B, C, D, E, F, G, H

A, B, C, D, F+G

C, D, E, F, H+A

E, F, G, H, B+C

G, H, A, B, D+E

(+ is XOR)

5 packets per node: 4 systematic, 1 parity-check

Page 32:

Data collection

[Figure: the four nodes store (A, B, C, D, F+G), (C, D, E, F, H+A), (E, F, G, H, B+C) and (G, H, A, B, D+E). The data collector downloads A, B, C, D from one node and E, F, G, H from another, and obtains A, B, C, D, E, F, G, H.]

Page 33:

Data collection

[Figure: same four nodes. The data collector downloads A, B, C, F+G from one node and D, E, F, H+A from another. Written as a matrix over the columns A B C D E F G H, the downloaded packets are triangular and full-rank, so A, B, C, D, E, F, G, H can be recovered.]
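The data-collection property of this layout (k = 2, so any two nodes must suffice) can be verified with a rank computation over GF(2). This is a minimal sketch; the node numbering 1 to 4 follows the order in which the slide lists the nodes, and the helper names are illustrative.

```python
from itertools import combinations

PACKETS = "ABCDEFGH"

def vec(*names):
    """Bitmask for the XOR of the named data packets."""
    mask = 0
    for ch in names:
        mask ^= 1 << PACKETS.index(ch)
    return mask

NODES = {
    1: [vec("A"), vec("B"), vec("C"), vec("D"), vec("F", "G")],
    2: [vec("C"), vec("D"), vec("E"), vec("F"), vec("H", "A")],
    3: [vec("E"), vec("F"), vec("G"), vec("H"), vec("B", "C")],
    4: [vec("G"), vec("H"), vec("A"), vec("B"), vec("D", "E")],
}

def rank_gf2(vectors):
    """Rank over GF(2), keeping one reduced vector per leading bit."""
    pivots = {}
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)

for a, b in combinations(NODES, 2):
    # Rank 8 means all eight data packets are recoverable from this pair.
    print(a, b, rank_gf2(NODES[a] + NODES[b]))
```

Every one of the six pairs reaches rank 8, while a single node only reaches rank 5.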

Page 34:

Exact Repair

How to repair?

[Figure: the four nodes store (A, B, C, D, F+G), (C, D, E, F, H+A), (E, F, G, H, B+C) and (G, H, A, B, D+E). Packets shown on the transfer arrows: A, B; C, D; E, F; G, H; and F+G, B+C.]

Total repair-bandwidth = 10

Page 35:

Exact Repair

How to repair?

[Figure: same four nodes, a second failure pattern. Packets shown on the transfer arrows: C, D; G, H; D+E, H+A; B+C, F+G; and E, F.]

Total repair-bandwidth = 10
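The two repair figures can be replayed in code. The sketch below is one reading of the arrow labels, not a transcription of the authors' procedure: the first case assumes the nodes holding (A, B, C, D, F+G) and (E, F, G, H, B+C) fail, the second that the nodes holding (C, D, E, F, H+A) and (E, F, G, H, B+C) fail. Packets are modelled as small integers combined with XOR, and all function names are illustrative. Both cases move 10 packets in total.

```python
import random

def encode(data):
    """Store 8 packets on 4 nodes: 4 systematic packets and 1 XOR parity each."""
    A, B, C, D, E, F, G, H = data
    return {
        1: [A, B, C, D, F ^ G],
        2: [C, D, E, F, H ^ A],
        3: [E, F, G, H, B ^ C],
        4: [G, H, A, B, D ^ E],
    }

def repair_1_and_3(node2, node4):
    """Nodes 1 and 3 failed; helpers are nodes 2 and 4."""
    C, D, E, F, _ = node2
    G, H, A, B, _ = node4
    # Phase 1: each newcomer downloads 2 packets from each helper (8 in total).
    new1, new3, sent = [A, B, C, D], [E, F, G, H], 8
    # Phase 2: each newcomer forms the parity the other one needs (2 packets).
    new1.append(F ^ G)   # sent by newcomer 3
    new3.append(B ^ C)   # sent by newcomer 1
    return new1, new3, sent + 2

def repair_2_and_3(node1, node4):
    """Nodes 2 and 3 failed; helpers are nodes 1 and 4."""
    A, B, C, D, FG = node1
    G, H, A4, _, DE = node4
    # Phase 1: newcomer 2 gets C, D from node 1 and D+E, H+A from node 4;
    # newcomer 3 gets G, H from node 4 and B+C, F+G from node 1 (8 in total).
    E = D ^ DE           # newcomer 2 strips D from D+E
    F = G ^ FG           # newcomer 3 strips G from F+G
    sent = 8
    # Phase 2: newcomer 2 sends E, newcomer 3 sends F (2 packets).
    new2 = [C, D, E, F, H ^ A4]
    new3 = [E, F, G, H, B ^ C]
    return new2, new3, sent + 2

data = [random.getrandbits(8) for _ in range(8)]
nodes = encode(data)
print(repair_1_and_3(nodes[2], nodes[4])[:2] == (nodes[1], nodes[3]))  # True
print(repair_2_and_3(nodes[1], nodes[4])[:2] == (nodes[2], nodes[3]))  # True
```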

Page 36:

Min-Bandwidth point

[Plot: same as Page 30. Storage per node (3.5 to 6) against repair bandwidth per failed node (5 to 9), with curves for one-by-one repair and for repairing 2 new nodes cooperatively.]

Page 37:

AN EXPLICIT CONSTRUCTION FOR MINIMUM-STORAGE COOPERATIVE REPAIR

Page 38:

An explicit construction for MSCR

• Minimum repair-bandwidth
• Storage per node
• B = 6 information packets
• n nodes
• Each node stores 2 packets.
• Repair r = 2 failures simultaneously
• No. of connections for each DC = k = 3
• No. of helpers for each failed node = d = 3

(S. ICC 2011.) Require d = k

Page 39:

The min-storage point

[Plot: storage per node (1 to 7) against repair bandwidth per failed node (1 to 7), with curves for non-cooperative and cooperative repair. k = 3, d = 3, r = 2, B = 6.]

Cooperative: storage cost per node = 2, repair bandwidth per node = 4

Page 40:

Data retrieval

[Figure: the source data is encoded into two codewords of an MDS code with dimension k = 3; the codeword symbols are spread over the storage nodes, 2 per node, and a data collector decodes.]

Page 41:

Repair: phase 1

[Figure: same encoding of the source data into two codewords. Two storage nodes are lost; each of the two newcomers downloads from the surviving storage nodes and decodes.]

Page 42:

Repair: phase 2

[Figure: the two newcomers re-encode and exchange data with each other.]

Repair bandwidth per node = 8/2 = 4
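The two repair phases can be simulated with a small MDS code. The sketch below is one plausible reading of the figures, not the authors' stated construction: it assumes a Reed-Solomon code over GF(7), five nodes, and that node i stores the i-th symbol of each of the two codewords. In phase 1 each newcomer downloads the symbols of one codeword from d = 3 helpers and decodes it; in phase 2 each newcomer re-encodes the symbol the other one is missing and sends it. The field, node count and function names are all assumptions.

```python
P = 7   # assumed prime field size (larger than the number of nodes)
N = 5   # hypothetical number of storage nodes; the slide only says "n nodes"

def poly_eval(coeffs, x, p=P):
    """Evaluate a polynomial (lowest degree first) at x over GF(p)."""
    return sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p

def lagrange_eval(points, x, p=P):
    """Value at x of the unique low-degree polynomial through the points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

def store(msg1, msg2):
    """Node i keeps symbol i of each of the two k = 3 codewords."""
    return {i: (poly_eval(msg1, i), poly_eval(msg2, i)) for i in range(1, N + 1)}

def cooperative_repair(nodes, failed, helpers):
    f1, f2 = failed
    # Phase 1: newcomer f1 collects codeword-1 symbols, newcomer f2 codeword-2.
    pts1 = [(h, nodes[h][0]) for h in helpers]
    pts2 = [(h, nodes[h][1]) for h in helpers]
    traffic = len(pts1) + len(pts2)
    # Phase 2: each newcomer re-encodes the symbol the other lacks and sends it.
    sym1_for_f2 = lagrange_eval(pts1, f2)
    sym2_for_f1 = lagrange_eval(pts2, f1)
    traffic += 2
    rebuilt = {
        f1: (lagrange_eval(pts1, f1), sym2_for_f1),
        f2: (sym1_for_f2, lagrange_eval(pts2, f2)),
    }
    return rebuilt, traffic

nodes = store([1, 2, 3], [4, 5, 6])      # B = 6 information symbols
rebuilt, traffic = cooperative_repair(nodes, failed=(1, 2), helpers=(3, 4, 5))
print(rebuilt[1] == nodes[1], rebuilt[2] == nodes[2], traffic)  # True True 8
```

Under this reading the total traffic is 3 + 3 + 1 + 1 = 8 symbols, which matches the slide's figure of 8/2 = 4 per repaired node.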

Page 43:

The construction is optimal

[Plot: same as Page 39. Storage per node (1 to 7) against repair bandwidth per failed node (1 to 7), with curves for non-cooperative and cooperative repair. k = 3, d = 3, r = 2, B = 6.]

Cooperative: storage cost per node = 2, repair bandwidth per node = 4

Page 44:

EXISTENCE OF COOPERATIVE REGENERATING CODES UNDER FUNCTIONAL REPAIR

Page 45:

Existence of optimal linear regenerating codes in general

• Sustainable storage system
  – Will it work after arbitrarily many repairs?
• Technical difficulty: The information flow graph is unbounded.
• Can we work over a fixed finite field, for an unlimited number of regenerations?
  – Yes, if we can construct an exact regenerating code.
  – The answer is also "yes" for cooperative functional repair in general.

(S., Hu, Netcod 2011.)

Page 46:

Trellis structure

[Figure: a trellis with stages 0, 1, 2.]

• m: message vector (row vector)
• After stage 0: m T0, where T0 is the "transfer matrix" in stage 0
• After stage 1: m T0 T1, where T1 is the "transfer matrix" in stage 1
• After stage 2: m T0 T1 T2, where T2 is the "transfer matrix" in stage 2

Page 47:

Flow in information flow graph

[Figure: information flow graph with source S, In/Mid/Out vertices for nodes 1 to 4 over successive repair stages, and a data collector DC; edges are labelled with capacities and flow values.]

The cut-set bound says that the cut capacity is at least 8.

Can we construct a flow with value 8?

Page 48:

Cross-sectional flow pattern

[Figure: the information flow graph from Page 47 with the flow values crossing each stage boundary marked on the Out vertices.]

Page 49:

A recursive construction of flow

[Figure: one segment of the information flow graph between stage s and stage s+1, with In/Mid/Out vertices for the repaired nodes. The cross-section flow values are g1, g2, g3, g4 at stage s and h1, h2, h3, h4 at stage s+1.]

1. Identify a set of cross-section flow patterns, say H.

2. For any cross-section flow pattern (h1, h2, h3, h4) in H at stage s+1, we can find a flow in this segment of the graph such that (g1, g2, g3, g4) is also in H.

3. Each pattern corresponds to a submatrix of the transfer matrix.

4. By the Schwartz-Zippel lemma, we can find the local encoding vectors so that all such determinants are non-zero, if the finite field is sufficiently large.
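Step 4 relies on the fact that a non-zero polynomial of total degree t vanishes on at most a fraction t/q of the points of GF(q), so a large enough field always leaves a choice of coefficients for which every required determinant is non-zero. The sketch below illustrates the bound on a toy case that is not from the slides: the determinant ad - bc of a 2x2 matrix, a polynomial of degree 2, counted exhaustively over small prime fields.

```python
from itertools import product

def singular_count(p):
    """Count 2x2 matrices over GF(p) whose determinant a*d - b*c is zero."""
    singular = sum(1 for a, b, c, d in product(range(p), repeat=4)
                   if (a * d - b * c) % p == 0)
    return singular, p ** 4

for p in (5, 7, 11):
    singular, total = singular_count(p)
    # Schwartz-Zippel bound for a degree-2 polynomial: fraction <= 2 / p.
    print(p, singular / total, 2 / p)
```

For p = 5 the count is 145 singular matrices out of 625, a fraction of 0.232, below the bound of 0.4; the fraction shrinks as the field grows.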

Page 50:

Summary

• Multiple node failures in medium-scale to large-scale storage systems
• Formulation as a linear program
• Functional repair: a linear regenerating code over a fixed finite field that matches the cut-set bound on repair-bandwidth exists.
• Exact repair: two families of explicit code constructions
  – Minimum-bandwidth point: d = k, r = n - d
  – Minimum-storage point: d = k, r arbitrary

Page 51:

References

• Y. Wu and A. G. Dimakis, Reducing repair traffic for erasure coding-based storage via interference alignment, ISIT, Jul. 2009.
• Y. Hu, Y. Xu, X. Wang, C. Zhan and P. Li, Cooperative recovery of distributed storage systems from multiple losses with network coding, J. Sel. Area Comm., vol. 28, no. 2, pp. 268-275, Feb. 2010.
• K. W. Shum, Cooperative Regenerating Codes for Distributed Storage Systems, ICC, Jun. 2011.
• A.-M. Kermarrec, N. Le Scouarnec and G. Straub, Repairing Multiple Failures with Coordinated and Adaptive Regenerating Codes, Netcod, Jul. 2011.
• K. W. Shum and Y. Hu, Existence of Minimum-Repair-Bandwidth Cooperative Regenerating Codes, Netcod, Jul. 2011.
• K. W. Shum and Y. Hu, Exact Minimum-Repair-Bandwidth Cooperative Regenerating Codes for Distributed Storage Systems, ISIT, Aug. 2011.