NCCloud: A Network-Coding-Based Storage System in a Cloud-of-Clouds

Henry C. H. Chen, Yuchong Hu, Patrick P. C. Lee, Yang Tang

IEEE Transactions on Computers, 15 August 2013

Outline

• Introduction
• Repair in Multiple Cloud Storage
• FMSR Codes
• NCCloud
• Conclusion

Introduction

• Cloud storage provides an on-demand remote backup solution.

• Relying on a single cloud storage provider raises problems such as a single point of failure.

Introduction

• The general solution is to distribute data across different cloud providers.
  - stripe data

• The diversity of multiple clouds improves fault tolerance.

Introduction-Data Failure

• This paper focuses on unexpected permanent cloud failures.
  - when a cloud fails permanently, repair is activated.
  - repair maintains data redundancy and fault tolerance.

• A repair operation
  - retrieves data from the surviving clouds.
  - reconstructs the lost data in a new cloud.

Introduction-Data Failure

• During repair, each surviving node
  - encodes its stored data chunks.
  - sends the encoded chunks to a new node.

• The new node then regenerates the lost data.

Introduction-Cost Problem

• Today's cloud storage providers charge users for outbound data.

• During repair, moving an enormous amount of data (repair traffic) out of the surviving clouds can incur significant monetary cost.

Introduction-Repair Traffic Problem

• To minimize repair traffic, regenerating codes [16] have been proposed. They
  - store data redundantly in a distributed storage system.
  - require less repair traffic than traditional erasure codes at the same fault-tolerance level.

[16] Network Coding for Distributed Storage Systems

Introduction-Regenerating Codes

• However, most existing regenerating codes require storage nodes to
  - be equipped with computation capabilities.
  - perform encoding operations during repair.

Introduction-Regenerating Codes

• To make regenerating codes portable to any cloud storage service, this paper assumes only a thin-cloud interface, in which storage nodes need to support only read/write operations.
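As a rough illustration of what a thin-cloud interface implies, the sketch below models a storage node that can only write and read opaque objects. The names `ThinCloud`, `InMemoryCloud`, `put`, and `get` are hypothetical and are not taken from NCCloud; the point is only that no encoding logic lives on the storage side.

```python
from abc import ABC, abstractmethod


class ThinCloud(ABC):
    """Hypothetical thin-cloud interface: objects can only be written and read."""

    @abstractmethod
    def put(self, key, data):
        """Store the bytes `data` under `key`."""

    @abstractmethod
    def get(self, key):
        """Return the bytes previously stored under `key`."""


class InMemoryCloud(ThinCloud):
    """Stand-in for a real provider; it stores bytes and performs no computation."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]


cloud = InMemoryCloud()
cloud.put("file1.chunk0", b"encoded bytes")   # the key name is illustrative
fetched = cloud.get("file1.chunk0")
```

Under this assumption, any encoding or repair computation has to run outside the clouds, which is what motivates the proxy-based design.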

Introduction-NCCloud

• This paper presents the design and implementation of NCCloud:
  - a proxy-based storage system
  - that provides fault-tolerant storage
  - over multiple cloud storage providers.

Introduction-FMSR

• NCCloud is built on top of the functional minimum-storage regenerating (FMSR) codes proposed in this paper.

• The FMSR code implementation
  - maintains double-fault tolerance.
  - has the same storage cost as RAID-6.
  - requires less repair traffic when recovering from a single-cloud failure.

Introduction-FMSR

• FMSR codes are non-systematic:
  - the encoded chunks are formed by linear combinations of the original data chunks.
  - the original data chunks are not kept, unlike in systematic coding schemes.

Repair in Multiple Cloud Storage

• Transient failure
  - is short-term: the failed cloud returns to normal after some time, and no outsourced data is lost.

Repair in Multiple Cloud Storage

• Permanent failure
  - is long-term: the outsourced data on the failed cloud becomes permanently unavailable.

• Examples:
  - data center outages in disasters
  - data loss and corruption
  - malicious attacks

Outline

• Introduction
• Repair in Multiple Cloud Storage
• FMSR Codes
  - Motivation
  - Implementation
• NCCloud
• Conclusion

Motivation

• This paper considers
  - distributed, multiple-cloud storage
  - striped data
  - a proxy-based design

Fault-tolerant

• Maximum Distance Separable (MDS) property
  - (n, k)-MDS code
    · divide the file into equal-size native chunks
    · linearly combine the native chunks to form code chunks
  - distribute the code chunks over n (larger than k) nodes.
  - reconstruct the original file from any k of the n nodes.
  - tolerate the failure of any n − k nodes.
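The MDS property can be seen in the smallest possible case. The sketch below uses a (3, 2) single-parity XOR code, which is not the code used by NCCloud and is chosen only because it is easy to follow: two native chunks plus one parity chunk are spread over three nodes, and any two nodes suffice to rebuild the file. The function names are illustrative.

```python
from itertools import combinations


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(a, b):
    """(3, 2) code: two native chunks plus their XOR parity, one chunk per node."""
    return [a, b, xor_bytes(a, b)]


def decode(available):
    """Rebuild both native chunks from any two nodes (keys are node indices 0..2)."""
    if 0 in available and 1 in available:
        return available[0], available[1]
    if 0 in available:                                  # nodes 0 and 2 survive
        return available[0], xor_bytes(available[0], available[2])
    return xor_bytes(available[1], available[2]), available[1]  # nodes 1 and 2


a, b = b"NCCl", b"oud!"
nodes = encode(a, b)
# Try every choice of k = 2 surviving nodes out of n = 3.
recovered = [decode({i: nodes[i], j: nodes[j]}) for i, j in combinations(range(3), 2)]
```

Every entry of `recovered` is the original pair of chunks, so the code tolerates the failure of any n − k = 1 node.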

Fault-tolerant

• FMSR codes can reconstruct the data of a failed node from the surviving nodes while
  - downloading less data.
  - not reconstructing the whole file.

Different Coding Schemes

• Storage size 2M, repair traffic M
• Storage size 2M, repair traffic 0.75M
• Storage size 2M, repair traffic 0.75M

Double-fault Tolerant FMSR Codes

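• divide a file of size M into 2(n − 2) native chunks, each of size M/(2(n − 2)).
• generate 2n code chunks.
• each node stores two code chunks of size M/(2(n − 2)).
• to repair a failed node, the repair traffic is M(n − 1)/(2(n − 2)).
• with RAID-6 codes, the total storage size is Mn/(n − 2) and the repair traffic is M.
• FMSR saves up to 50% of the repair traffic (the limit as n grows large).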
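The quantities on this slide follow from simple arithmetic, sketched below with exact fractions. The function names are illustrative; the formulas assume the double-fault-tolerant setting k = n − 2, with the file size M normalized to 1 and repair downloading one chunk from each of the n − 1 surviving nodes.

```python
from fractions import Fraction


def fmsr_costs(n, file_size=Fraction(1)):
    """Chunk size, total storage and single-node repair traffic for FMSR (k = n - 2)."""
    chunk = file_size / (2 * (n - 2))      # size of one native (and one code) chunk
    storage = 2 * n * chunk                # 2n code chunks in total
    repair = (n - 1) * chunk               # one chunk from each of the n - 1 survivors
    return chunk, storage, repair


def raid6_costs(n, file_size=Fraction(1)):
    """Total storage and repair traffic for RAID-6; repair reads data equal to the file size."""
    return file_size * n / (n - 2), file_size


chunk, storage, repair = fmsr_costs(4)
raid_storage, raid_repair = raid6_costs(4)
saving = 1 - repair / raid_repair
```

For n = 4 this gives a storage size of 2M for both schemes and a repair traffic of 0.75M for FMSR against M for RAID-6, a 25% saving. The saving 1 − (n − 1)/(2(n − 2)) grows with n and tends to 50%.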

FMSR Codes Implementation

• FMSR codes do not require lost chunks to be exactly reconstructed.
  - the regenerated chunks need not be identical to those on the failed node,
  - as long as the MDS property still holds.

FMSR Codes Implementation

• This paper proposes a two-phase checking scheme to ensure that the code chunks on all nodes always satisfy the MDS property.

FMSR Codes Implementation

• The implementation assumes a thin-cloud interface and covers three operations:
  1. File upload
  2. File download
  3. Repair

File Upload

• Native chunks: k(n − k) equal-size chunks

• Code chunks: n(n − k) chunks, n − k per node

• Encoding matrix of coefficients:
  - size n(n − k) × k(n − k)
  - entries in the Galois field GF(p^n)

File Upload

• Galois field GF(p^n)

• Each row of the encoding matrix is an encoding coefficient vector (ECV).
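To make the upload step concrete, the sketch below forms code chunks as byte-wise linear combinations of native chunks over a Galois field. It assumes GF(2^8) with the reduction polynomial 0x11B (the slides only say GF(p^n)), and the toy matrix and function names are illustrative rather than taken from NCCloud.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B, an assumed choice)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11B           # reduce back into 8 bits
        b >>= 1
    return result


def encode(matrix, native_chunks):
    """Each code chunk is a byte-wise linear combination of all native chunks."""
    size = len(native_chunks[0])
    code_chunks = []
    for ecv in matrix:                       # one ECV (row) per code chunk
        chunk = bytearray(size)
        for coeff, native in zip(ecv, native_chunks):
            for pos in range(size):
                chunk[pos] ^= gf_mul(coeff, native[pos])
        code_chunks.append(bytes(chunk))
    return code_chunks


native = [b"ab", b"cd"]
matrix = [[1, 1], [1, 2], [1, 3], [1, 4]]    # toy 4 x 2 encoding matrix
code = encode(matrix, native)
```

None of the rows is a unit vector, so no code chunk equals a native chunk, which mirrors the non-systematic nature of FMSR codes.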

File Download

1. Download the k(n − k) code chunks from any k of the n storage nodes.

2. The ECVs of the k(n − k) code chunks form a k(n − k) × k(n − k) square matrix.

3. Obtain the original k(n − k) native chunks.
  - multiply the inverse of the square matrix by the code chunks.
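The download steps amount to solving a linear system over the Galois field. The sketch below does this with Gauss-Jordan elimination applied directly to the chunk data, which is equivalent to multiplying by the inverse matrix. It assumes GF(2^8) with polynomial 0x11B; the function names and the two-chunk example are illustrative.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo 0x11B (an assumed polynomial)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Inverse in GF(2^8): a^254, since a^255 = 1 for every non-zero a."""
    result, power, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, power)
        power = gf_mul(power, power)
        exp >>= 1
    return result


def decode(ecvs, code_chunks):
    """Solve ecvs * native = code_chunks by Gauss-Jordan elimination over GF(2^8)."""
    size = len(ecvs)
    rows = [list(ecv) for ecv in ecvs]
    data = [bytearray(chunk) for chunk in code_chunks]
    for col in range(size):
        pivot = next((r for r in range(col, size) if rows[r][col]), None)
        if pivot is None:
            raise ValueError("ECVs are linearly dependent; the matrix is not invertible")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        data[col], data[pivot] = data[pivot], data[col]
        scale = gf_inv(rows[col][col])                 # normalize the pivot row
        rows[col] = [gf_mul(scale, v) for v in rows[col]]
        data[col] = bytearray(gf_mul(scale, v) for v in data[col])
        for r in range(size):                          # clear the column elsewhere
            factor = rows[r][col]
            if r != col and factor:
                rows[r] = [v ^ gf_mul(factor, p) for v, p in zip(rows[r], rows[col])]
                data[r] = bytearray(v ^ gf_mul(factor, p) for v, p in zip(data[r], data[col]))
    return [bytes(chunk) for chunk in data]


# Code chunks produced from native chunks b"ab" and b"cd" with ECVs [1, 1] and [1, 2].
ecvs = [[1, 1], [1, 2]]
code_chunks = [bytes([0x02, 0x06]), bytes([0xA7, 0xAA])]
native = decode(ecvs, code_chunks)
```

If the chosen ECVs are linearly dependent, the sketch raises an error, which is exactly the situation the MDS property is meant to rule out.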

Iterative Repair

• The MDS property must hold even after iterative repairs.

• This paper proposes two-phase checking of
  - the MDS property
  - the rMDS (repair MDS) property

Satisfy MDS, but not rMDS

Iterative Repair

Step 1. Download the encoding matrix from a surviving node.

Step 2. Select one ECV from each of the n − 1 surviving nodes.

Step 3. Generate a repair matrix.

Step 4. Compute the ECVs for the new code chunks and reproduce a new encoding matrix.
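Steps 2 to 4 can be sketched as matrix arithmetic over the Galois field. The code below makes several assumptions that the slides do not state: GF(2^8) with polynomial 0x11B, a row layout in which node i owns rows i(n − k) to (i + 1)(n − k) − 1 of the encoding matrix, random selection of one ECV per survivor, and a repair matrix of size (n − k) × (n − 1) with random non-zero entries. All names are illustrative.

```python
import random


def gf_mul(a, b):
    """Multiply in GF(2^8) modulo 0x11B (an assumed polynomial)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def combine(coeff_rows, vectors):
    """Return coeff_rows * vectors over GF(2^8); each vector is a list of byte values."""
    out = []
    for coeffs in coeff_rows:
        acc = [0] * len(vectors[0])
        for c, vec in zip(coeffs, vectors):
            acc = [a ^ gf_mul(c, v) for a, v in zip(acc, vec)]
        out.append(acc)
    return out


def plan_repair(encoding_matrix, n, k, failed, rng):
    """Steps 2-4: pick one ECV per surviving node, draw a repair matrix, derive new ECVs."""
    per_node = n - k
    survivors = [node for node in range(n) if node != failed]
    picks = [node * per_node + rng.randrange(per_node) for node in survivors]
    repair_matrix = [[rng.randrange(1, 256) for _ in picks] for _ in range(per_node)]
    new_ecvs = combine(repair_matrix, [encoding_matrix[row] for row in picks])
    return picks, repair_matrix, new_ecvs


n, k = 4, 2
rng = random.Random(0)
em = [[rng.randrange(256) for _ in range(k * (n - k))] for _ in range(n * (n - k))]
picks, rm, new_ecvs = plan_repair(em, n, k, failed=0, rng=rng)

# Applying the same repair matrix to the selected chunks yields the new chunk data.
native = [[rng.randrange(256) for _ in range(3)] for _ in range(k * (n - k))]
code = combine(em, native)
new_chunks = combine(rm, [code[p] for p in picks])
```

Because the new ECVs and the new chunk data are produced by the same coefficients, `new_chunks` equals what encoding the native chunks with `new_ecvs` would give, so the proxy can check the new ECVs before downloading any chunk data.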

Iterative Repair

Step 5. Given EM’, verify that both properties are satisfied.
  - verify MDS by enumerating all subsets of k nodes.
  - verify rMDS by checking n(n − k)^(n − 1) cases.
  - the corresponding encoding matrices must have full rank.

Step 6. Download the actual chunk data and regenerate the new chunk data, using
  - the new ECVs from Step 4
  - the code chunks from the surviving nodes
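The MDS half of Step 5 can be sketched as a rank test over every choice of k nodes. The code below covers only the MDS phase, not rMDS. It assumes GF(2^8) with polynomial 0x11B and a layout in which node i owns n − k consecutive rows; the Vandermonde test matrix is a stand-in chosen because any square selection of its rows is invertible, not the way FMSR builds its encoding matrix.

```python
from itertools import combinations


def gf_mul(a, b):
    """Multiply in GF(2^8) modulo 0x11B (an assumed polynomial)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Inverse in GF(2^8) computed as a^254."""
    result, power, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, power)
        power = gf_mul(power, power)
        exp >>= 1
    return result


def rank(rows):
    """Rank of a matrix over GF(2^8) via Gaussian elimination."""
    rows = [list(r) for r in rows]
    found = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        scale = gf_inv(rows[found][col])
        rows[found] = [gf_mul(scale, v) for v in rows[found]]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col]
            if factor:
                rows[r] = [v ^ gf_mul(factor, p) for v, p in zip(rows[r], rows[found])]
        found += 1
    return found


def is_mds(encoding_matrix, n, k):
    """True if the ECVs held by every subset of k nodes have full rank k(n - k)."""
    per_node = n - k
    for nodes in combinations(range(n), k):
        rows = [encoding_matrix[node * per_node + j] for node in nodes for j in range(per_node)]
        if rank(rows) < k * per_node:
            return False
    return True


# 8 x 4 Vandermonde matrix with rows [1, x, x^2, x^3] for x = 1..8 (n = 4, k = 2).
vandermonde = []
for x in range(1, 9):
    row = [1]
    for _ in range(3):
        row.append(gf_mul(row[-1], x))
    vandermonde.append(row)

broken = [list(r) for r in vandermonde]
broken[2], broken[3] = list(broken[0]), list(broken[1])   # node 1 duplicates node 0
```

`is_mds(vandermonde, 4, 2)` holds, while `is_mds(broken, 4, 2)` fails because nodes 0 and 1 together contribute only two independent ECVs.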

rMDS Sustaining

Time of Two-phase Checking

Double-fault Tolerant Codes

• Markov model

MTTDL Compared to RAID-6

• MTTDL: mean time to data loss

NCCloud

• A proxy that bridges user applications and multiple clouds.

• Its design is built on three layers:
  - File system layer
  - Coding layer
  - Storage layer

NCCloud

• NCCloud is implemented mainly in Python, while the coding schemes are implemented in C for better efficiency.

Goal of NCCloud

• Compare the costs and response times of RAID-6 and FMSR codes.

• Show the cost advantage of FMSR over RAID-6 while maintaining acceptable response time.

Goal of NCCloud

• Normal operations
  - RAID-6 and FMSR incur similar storage costs.

• Repair operation
  - FMSR saves a significant amount of transfer cost compared with RAID-6.

Cost Saving-Price

Cost Saving

• Normal operations
  - 1.25PB of data stored
    · FMSR: $86,851 monthly storage cost
    · RAID-6: $86,851 monthly storage cost

• Repair operation
  - RAID-6: 1PB of data, $56,832
  - FMSR: 0.5625PB of data, $33,894
  - Saving of $22,938
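The figures on this slide can be cross-checked with a few lines of arithmetic. The dollar amounts are taken from the slide; the configuration n = 10 clouds with a 1PB file is an inference, not something the slide states, chosen because it reproduces both the 1.25PB stored and the 0.5625PB of FMSR repair traffic.

```python
from fractions import Fraction

raid6_repair_cost = 56_832      # USD, from the slide
fmsr_repair_cost = 33_894       # USD, from the slide
saving = raid6_repair_cost - fmsr_repair_cost
saving_ratio = Fraction(saving, raid6_repair_cost)

# Assumption (not stated on the slide): n = 10 clouds, k = n - 2, and a 1 PB file.
n, file_pb = 10, Fraction(1)
stored_pb = file_pb * n / (n - 2)                     # total storage, both schemes
fmsr_traffic_pb = file_pb * (n - 1) / (2 * (n - 2))   # FMSR repair traffic
raid6_traffic_pb = file_pb                            # RAID-6 repair traffic
```

Under that assumption `stored_pb` is 5/4 and `fmsr_traffic_pb` is 9/16, matching the slide, and the dollar saving is about 40% of the RAID-6 repair cost.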

Response Time-Local Cloud

Response Time-Commercial Cloud

Conclusion

• This paper presents NCCloud, which addresses the reliability of today's cloud backup storage.
  - proxy-based
  - multiple-cloud storage system

• NCCloud not only provides fault tolerance in storage, but also allows cost-effective repair.

• The FMSR code implementation eliminates the encoding requirement of storage nodes during repair.