Transcript
Page 1: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

1

NCCloud: A Network-Coding-Based Storage System in a Cloud-of-Clouds

Henry C. H. Chen, Yuchong Hu, Patrick P. C. Lee, Yang Tang

IEEE Transactions on Computers, 15 August 2013

Page 2: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

2

Outline

• Introduction
• Repair in Multiple Cloud Storage
• FMSR Codes
• NCCloud
• Conclusion

Page 3: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

3

Introduction

• Cloud storage provides an on-demand remote backup solution.

• Relying on a single cloud storage provider raises problems such as a single point of failure.

Page 4: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

4

Introduction

• The general solution is to distribute data across different cloud providers.
  - stripe data

• The diversity of multiple clouds improves fault tolerance.

Page 5: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

5

Introduction-Data Failure

• This paper focuses on unexpected permanent cloud failures.
  - When a cloud fails permanently, a repair is activated.
  - The repair maintains data redundancy and fault tolerance.

• A repair operation
  - retrieves data from the surviving clouds.
  - reconstructs the lost data in a new cloud.

Page 6: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

6

Introduction-Data Failure

• During repair, each surviving node
  - encodes its stored data chunks.
  - sends the encoded chunks to a new node.

• The new node then regenerates the lost data.

Page 7: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

7

Introduction-Cost Problem

• Today’s cloud storage providers charge users for outbound data.

• When repairing failures, moving an enormous amount of data (repair traffic) can introduce significant monetary costs.

Page 8: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

8

Introduction-Repair Traffic Problem

• To minimize repair traffic, regenerating codes [16] have been proposed. They
  - store data redundantly in a distributed storage system.
  - require less repair traffic than traditional erasure codes, at the same fault-tolerance level.

[16] Network Coding for Distributed Storage Systems

Page 9: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

9

Introduction-Regenerating Codes

• However, most existing regenerating codes require storage nodes to
  - be equipped with computation capabilities.
  - perform encoding operations during repair.

Page 10: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

10

Introduction-Regenerating Codes

• To make regenerating codes portable to any cloud storage service, this paper assumes only a thin-cloud interface, where storage nodes need only support read/write operations.

Page 11: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

11

Introduction-NCCloud

• In this paper, we present the design and implementation of NCCloud:
  - a proxy-based storage system.
  - a fault-tolerant storage.
  - over multiple cloud storage providers.

Page 12: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

12

Introduction-FMSR

• NCCloud is built on top of the functional minimum-storage regenerating (FMSR) codes.

• The FMSR code implementation
  - maintains double-fault tolerance.
  - maintains the same storage cost as RAID-6.
  - requires less repair traffic when recovering from a single-cloud failure.

Page 13: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

13

Introduction-FMSR

• FMSR codes are non-systematic:
  - the encoded chunks are formed by linear combinations of the original data chunks.
  - the original data chunks are not kept, unlike in systematic coding schemes.

Page 14: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

14

Outline

• Introduction
• Repair in Multiple Cloud Storage
• FMSR Codes
• NCCloud
• Conclusion

Page 15: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

15

Repair in Multiple Cloud Storage

• Transient failure
  - is short-term, such that the failed cloud will return to normal after some time and no outsourced data is lost.

Page 16: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

16

Repair in Multiple Cloud Storage

• Permanent failure
  - is long-term, in the sense that the outsourced data on a failed cloud becomes permanently unavailable.

• Examples:
  - data center outages in disasters
  - data loss and corruption
  - malicious attacks

Page 17: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

17

Outline

• Introduction
• Repair in Multiple Cloud Storage
• FMSR Codes
  - Motivation
  - Implementation
• NCCloud
• Conclusion

Page 18: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

18

Motivation

• This paper considers
  - distributed
  - multiple-cloud storage
  - data is striped
  - proxy-based design

Page 19: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

19

Motivation

Page 20: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

20

Fault-tolerant

• Maximum Distance Separable (MDS) property
  - (n, k)-MDS code
    - divide the file into equal-size native chunks
    - linearly combine the native chunks to form code chunks
  - distribute the code chunks over n (larger than k) nodes.
  - reconstruct the original file from any k of the n nodes.
  - tolerate the failure of any n − k nodes.
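A toy illustration of the "any k of n" property may help here. The sketch below assumes n = 4, k = 2, one-symbol chunks, and arithmetic modulo the small prime 257; the coefficient table `COEFFS` and the helper names are hypothetical and are not taken from the paper.

```python
from itertools import combinations

P = 257  # small prime; all arithmetic is modulo P (an assumption of this sketch)

# Node i stores one code chunk c0*A + c1*B (mod P); each row holds (c0, c1).
COEFFS = [(1, 0), (0, 1), (1, 1), (1, 2)]  # n = 4 nodes, k = 2 native chunks


def encode(a, b):
    """Return the n code chunks for native chunks a and b."""
    return [(c0 * a + c1 * b) % P for c0, c1 in COEFFS]


def decode(i, j, yi, yj):
    """Recover (a, b) from the chunks of nodes i and j via Cramer's rule."""
    (p, q), (r, s) = COEFFS[i], COEFFS[j]
    det = (p * s - q * r) % P
    inv = pow(det, -1, P)  # modular inverse (Python 3.8+); fails if det == 0
    a = ((s * yi - q * yj) * inv) % P
    b = ((p * yj - r * yi) * inv) % P
    return a, b


chunks = encode(42, 200)
# Every pair of nodes should yield the same native chunks.
recovered = {decode(i, j, chunks[i], chunks[j]) for i, j in combinations(range(4), 2)}
print(recovered)
```

Because every 2×2 coefficient submatrix is invertible modulo 257, any two of the four nodes suffice, so the code tolerates n − k = 2 node failures.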

Page 21: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

21

Fault-tolerant

• FMSR codes can reconstruct the data of a failed node from the surviving nodes.
  - download less data.
  - do not reconstruct the whole file.

Page 22: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

22

Different Coding Schemes

• Storage size 2M, repair traffic M
• Storage size 2M, repair traffic 0.75M
• Storage size 2M, repair traffic 0.75M

Page 23: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

23

Double-fault Tolerant FMSR Codes

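• Divide a file of size M into 2(n − 2) native chunks.
• Generate 2n code chunks.
• Each node stores two code chunks of size M/(2(n − 2)) each, so the total storage size is Mn/(n − 2).
• To repair a failed node, download one chunk from each of the n − 1 surviving nodes; the repair traffic is M(n − 1)/(2(n − 2)).
• For RAID-6 codes, the total storage size is also Mn/(n − 2), but the repair traffic is M.
• FMSR saves close to 50% of the repair traffic when n is large.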
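The storage and traffic figures can be checked numerically. This sketch uses the expressions for double-fault-tolerant FMSR codes, total storage Mn/(n − 2) and repair traffic M(n − 1)/(2(n − 2)), with RAID-6 repair traffic M; the function names are illustrative only.

```python
from fractions import Fraction


def fmsr_repair_traffic(n):
    """FMSR repair traffic as a fraction of the file size M."""
    return Fraction(n - 1, 2 * (n - 2))


def total_storage(n):
    """Total storage as a multiple of M (same for RAID-6 and FMSR)."""
    return Fraction(n, n - 2)


for n in (4, 8, 16, 64):
    traffic = fmsr_repair_traffic(n)
    saving = 1 - traffic  # relative to RAID-6, whose repair traffic is M
    print(f"n={n}: storage={float(total_storage(n)):.3f}M, "
          f"repair={float(traffic):.4f}M, saving={float(saving):.1%}")
```

For n = 4 this gives storage 2M and repair traffic 0.75M (a 25% saving); the saving approaches 50% only as n grows.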

Page 24: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

24

Outline

• Introduction
• Repair in Multiple Cloud Storage
• FMSR Codes
  - Motivation
  - Implementation
• NCCloud
• Conclusion

Page 25: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

25

FMSR Codes Implementation

• FMSR codes do not require lost chunks to be exactly reconstructed.
  - The regenerated chunks need not be identical to those on the failed node.

• This is acceptable as long as the MDS property holds.

Page 26: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

26

FMSR Codes Implementation

• This paper proposes a two-phase checking scheme to ensure that the code chunks on all nodes always satisfy the MDS property.

Page 27: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

27

FMSR Codes Implementation

• The implementation assumes a thin-cloud interface.
  1. File upload
  2. File download
  3. Repair

Page 28: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

28

File Upload

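• Native chunks: the file is divided into k(n − k) equal-size chunks.

• Code chunks: n(n − k) chunks, each a linear combination of the native chunks.

• Encoding matrix of coefficients:
  - size n(n − k) × k(n − k)
  - entries in the Galois field GF(p^n)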

Page 29: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

29

File Upload

• Galois field GF(p^n)

Encoding coefficient vector (ECV): a row of the encoding matrix, giving the coefficients of one code chunk.

Page 30: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

30

File Download

1. Download the k(n−k) code chunks from any k of the n storage nodes.

2. The ECVs of the k(n−k) code chunks can form a k(n−k)×k(n−k) square matrix.

3. Obtain the original k(n − k) native chunks.
  - multiply the inverse of the square matrix with the code chunks.
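The upload and download steps can be sketched end to end. This is a minimal illustration, not the NCCloud implementation: it assumes n = 4, k = 2, one-symbol chunks, a random encoding matrix, and the prime field modulo 257 standing in for GF(p^n). The helpers `mat_inverse` and `mat_vec` are hypothetical names.

```python
import random

P = 257  # prime modulus; assumption: a prime field stands in for GF(p^n)


def mat_inverse(m):
    """Invert a square matrix modulo P (Gauss-Jordan); ValueError if singular."""
    size = len(m)
    aug = [list(row) + [int(i == j) for j in range(size)] for i, row in enumerate(m)]
    for col in range(size):
        pivot = next((r for r in range(col, size) if aug[r][col] % P), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)  # modular inverse, Python 3.8+
        aug[col] = [(x * inv) % P for x in aug[col]]
        for r in range(size):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(x - factor * y) % P for x, y in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def mat_vec(m, v):
    """Multiply matrix m by vector v modulo P."""
    return [sum(a * b for a, b in zip(row, v)) % P for row in m]


n, k = 4, 2
native = [11, 22, 33, 44]  # k(n-k) = 4 native chunks, one symbol each
rng = random.Random(1)

while True:
    # Upload: encoding matrix with n(n-k) rows (the ECVs) and k(n-k) columns.
    em = [[rng.randrange(P) for _ in range(k * (n - k))] for _ in range(n * (n - k))]
    code = mat_vec(em, native)  # node i stores rows i*(n-k) .. (i+1)*(n-k)-1
    # Download: fetch the k(n-k) chunks of k nodes (here nodes 1 and 3).
    rows = [r for node in (1, 3) for r in range(node * (n - k), (node + 1) * (n - k))]
    try:
        inverse = mat_inverse([em[r] for r in rows])
    except ValueError:
        continue  # unlucky random draw; try another encoding matrix
    decoded = mat_vec(inverse, [code[r] for r in rows])
    break

print(decoded)
```

The retry loop only guards against a singular submatrix for the two chosen nodes; a real deployment would need the encoding matrix to be invertible for every choice of k nodes.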

Page 31: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

31

Iterative Repair

• The MDS property must hold even after iterative repairs.

• This paper proposes a two-phase checking scheme.
  - MDS property
  - rMDS (repair MDS) property

Page 32: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

32

Satisfy MDS, but not rMDS

Page 33: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

33

Iterative Repair

Step 1. Download the encoding matrix from a surviving node.

Step 2. Select one ECV from each of the n-1 surviving nodes.

Step 3. Generate a repair matrix.

Step 4. Compute the ECVs for the new code chunks and reproduce a new encoding matrix.

Page 34: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

34

Iterative Repair

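Step 5. Given EM’, verify that both properties are satisfied.
  - verify MDS by enumerating all C(n, k) subsets of k nodes.
  - verify rMDS by enumerating n(n − k)^(n−1) cases (each of the n possible failed nodes, with one of the n − k chunks chosen from each of the n − 1 survivors).
  - The corresponding encoding matrices must have full rank.

Step 6. Download the actual chunk data and regenerate the new chunk data from:
  - the new ECVs computed in Step 4
  - the code chunks from the surviving nodes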
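Steps 2 to 5 can be sketched as a retry loop over the encoding matrix alone. The code below is an illustration under stated assumptions: arithmetic modulo the prime 257 instead of a general Galois field, a random repair matrix of size (n − k) × (n − 1), and only the MDS phase of the check (the rMDS phase and the chunk download of Step 6 are omitted). All function names are hypothetical.

```python
import random
from itertools import combinations

P = 257  # prime modulus standing in for the Galois field (assumption)


def rank_mod_p(rows):
    """Rank of a matrix over the integers modulo P (Gaussian elimination)."""
    m = [[x % P for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, P)
        m[rank] = [(x * inv) % P for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(x - f * y) % P for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def is_mds(em, n, k):
    """True if the ECVs of every k-node subset have full rank k(n-k)."""
    per_node = n - k
    for nodes in combinations(range(n), k):
        rows = [em[node * per_node + i] for node in nodes for i in range(per_node)]
        if rank_mod_p(rows) != k * per_node:
            return False
    return True


def repair(em, n, k, failed, rng, max_tries=1000):
    """Return a new encoding matrix EM' with the failed node's ECVs regenerated."""
    per_node = n - k
    survivors = [node for node in range(n) if node != failed]
    for _ in range(max_tries):
        # Step 2: select one ECV from each surviving node.
        picked = [em[node * per_node + rng.randrange(per_node)] for node in survivors]
        # Step 3: random repair matrix, (n-k) rows by (n-1) columns.
        rm = [[rng.randrange(P) for _ in survivors] for _ in range(per_node)]
        # Step 4: each new ECV is a linear combination of the selected ECVs.
        new_ecvs = [
            [sum(g * ecv[c] for g, ecv in zip(row, picked)) % P
             for c in range(len(em[0]))]
            for row in rm
        ]
        candidate = [list(r) for r in em]
        candidate[failed * per_node:(failed + 1) * per_node] = new_ecvs
        # Step 5, MDS phase only: accept the candidate if every k-subset is full rank.
        if is_mds(candidate, n, k):
            return candidate
    raise RuntimeError("no valid repair found")


n, k = 4, 2
rng = random.Random(7)
em = None
while em is None or not is_mds(em, n, k):
    em = [[rng.randrange(P) for _ in range(k * (n - k))] for _ in range(n * (n - k))]
new_em = repair(em, n, k, failed=0, rng=rng)
print(is_mds(new_em, n, k))
```

Checking the matrix before touching any chunk data means a rejected candidate costs only local computation, not outbound traffic from the clouds.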

Page 35: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

35

rMDS Sustaining

Page 36: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

36

Time of Two-phase Checking

Page 37: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

37

Double-fault Tolerant Codes

ه Markov Model

Page 38: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

38

MTTDL, Compared to RAID-6

Mean Time To Data Loss

Page 39: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

39

Outline

• Introduction
• Repair in Multiple Cloud Storage
• FMSR Codes
• NCCloud
• Conclusion

Page 40: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

40

NCCloud

• A proxy that bridges user applications and multiple clouds.

• Its design is built on three layers.
  - File system layer
  - Coding layer
  - Storage layer

Page 41: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

41

NCCloud

• It is mainly implemented in Python, while the coding schemes are implemented in C for better efficiency.

Page 42: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

42

Goal of NCCloud

• Compare the costs and response times of using RAID-6 and FMSR codes.

• Show the cost advantage of FMSR over RAID-6 while maintaining acceptable response time.

Page 43: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

43

Goal of NCCloud

• Normal operations
  - RAID-6 and FMSR incur similar storage costs.

• Repair operation
  - FMSR saves a significant amount of transfer costs over RAID-6.

Page 44: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

44

Cost Saving-Price

Page 45: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

45

Cost Saving

• Normal operations
  - 1.25PB of data stored
    - FMSR: $86,851 monthly storage cost
    - RAID-6: $86,851 monthly storage cost

• Repair operation
  - RAID-6: 1PB of data, $56,832
  - FMSR: 0.5625PB of data, $33,894
  - Saving of $22,938
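The figures on this slide can be cross-checked with a few lines of arithmetic. The dollar amounts and data volumes are taken from the slide; the setting n = 10 clouds with a 1PB file is an inference that happens to reproduce the volumes under the double-fault-tolerant expressions Mn/(n − 2) for storage and M(n − 1)/(2(n − 2)) for FMSR repair traffic, and is not stated in the slides.

```python
raid6_cost = 56_832  # USD, RAID-6 repair (1PB transferred), from the slide
fmsr_cost = 33_894   # USD, FMSR repair (0.5625PB transferred), from the slide

saving = raid6_cost - fmsr_cost
saving_ratio = saving / raid6_cost

# Assumption: n = 10 clouds and a 1PB file (inferred, not given in the slides).
n, file_pb = 10, 1.0
stored_pb = file_pb * n / (n - 2)                    # total storage
fmsr_traffic_pb = file_pb * (n - 1) / (2 * (n - 2))  # FMSR repair traffic

print(f"saving: ${saving:,} ({saving_ratio:.1%})")
print(f"stored: {stored_pb}PB, FMSR repair traffic: {fmsr_traffic_pb}PB")
```

The cost saving is smaller in relative terms than the traffic reduction, which suggests the per-GB price is not uniform across the transferred volume; the slides do not give the pricing details needed to confirm this.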

Page 46: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

46

Response Time-Local Cloud

Page 47: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

47

Response Time-Local Cloud

Page 48: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

48

Response Time-Commercial Cloud

Page 49: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

49

Outline

• Introduction
• Repair in Multiple Cloud Storage
• FMSR Codes
• NCCloud
• Conclusion

Page 50: NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

50

Conclusion

• This paper presents NCCloud, which addresses the reliability of today’s cloud backup storage.
  - proxy-based
  - multiple-cloud storage system

• NCCloud not only provides fault tolerance in storage, but also allows cost-effective repair.

• The FMSR code implementation eliminates the encoding requirement of storage nodes during repair.

