

2018 Storage Developer Conference. © Aricent – Altran Group. All Rights Reserved. 1

Accelerated Erasure Coding: The New Frontier of Software Defined Storage

Dineshkumar Bhaskaran, Aricent – Altran Group


Introduction to Data Resiliency

• Traditional RAID and mirroring
  • Multiple disks are used for data placement, improving performance and resiliency
  • High storage overhead; long rebuild times
  • Difficult to recover from correlated disk failures
• Erasure coding
  • Erasure coding is a data protection method in which data is encoded into data blocks and parity blocks, which are then stored across locations or storage nodes
  • Compute intensive
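The simplest instance of the data-plus-parity idea is a single XOR parity block, as in RAID-5. The sketch below is illustrative only: the helper names are hypothetical, and a (k, 1) XOR code survives just one lost block, whereas the (k, m) codes discussed next survive m.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_single_parity(data_blocks):
    """Hypothetical (k, 1) encoder: the k data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover_one(stored, lost_index):
    """Rebuild any single lost block (data or parity) by XOR-ing all survivors."""
    survivors = [block for i, block in enumerate(stored) if i != lost_index]
    return xor_blocks(survivors)


stored = encode_single_parity([b"ABCD", b"EFGH", b"IJKL"])
rebuilt = recover_one(stored, 1)   # recovers b"EFGH"
```

XOR works here because every block, parity included, equals the XOR of all the others; a second simultaneous loss leaves one equation with two unknowns, which is why more parity and a richer code are needed.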


Erasure Coding : A primer

• A traditional erasure code is represented as (k, m): it adds m parity blocks to k data blocks and writes all of them to k + m storage nodes

• An optimal (or MDS) code can recover from any m node failures

• A popular code is Reed-Solomon (RS). It has been used successfully in several solutions such as Linux RAID-6, Google File System II, Hadoop and Facebook

Figure: A (4, 2) erasure code. A data block D is encoded into data chunks D1 to D4 and code chunks C1 and C2.
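The (k, m) notation makes the trade-off against replication easy to compute. A small sketch with hypothetical helper names, assuming an MDS code so that any m node failures are tolerated:

```python
from fractions import Fraction


def ec_profile(k, m):
    """Storage cost and fault tolerance of a (k, m) MDS erasure code."""
    return {
        "nodes": k + m,
        "overhead": Fraction(k + m, k),  # raw bytes stored per user byte
        "tolerates": m,                  # any m node failures (MDS property)
    }


def replication_profile(copies):
    """The same metrics for n-way replication (mirroring)."""
    return {"nodes": copies, "overhead": Fraction(copies), "tolerates": copies - 1}


print(ec_profile(4, 2))
print(replication_profile(3))
```

For example, a (4, 2) code stores 1.5 bytes per user byte on 6 nodes and survives any 2 node failures, while 3-way replication stores 3 bytes per user byte for the same 2-failure tolerance.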


Erasure Coding : Read and Write

Traditional erasure code
• A (4, 2) erasure code has 4 data chunks and 2 parity chunks

Figure: write path: the application's data block (file/object) is split into chunks F1 to F4 (create chunks), encoded into data chunks D1 to D4 and code chunks C1 and C2, and written to disks S1 to S6. Read path: D1 to D4 are read and F1 to F4 are reconstructed for the application.
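The "create chunks" step of the write path can be sketched as splitting the object into k equal data chunks and zero-padding the tail, since the encoder works on equal-length chunks. These are hypothetical helpers, not CEPH's code; the original length must be kept so that the padding can be stripped on read.

```python
def split_into_chunks(block, k):
    """Split a bytes object into k equal-size data chunks, zero-padding the tail."""
    chunk_size = -(-len(block) // k)              # ceiling division
    padded = block.ljust(chunk_size * k, b"\0")
    return [padded[i * chunk_size:(i + 1) * chunk_size] for i in range(k)]


def join_chunks(chunks, original_length):
    """Read-path inverse: concatenate the data chunks and drop the padding."""
    return b"".join(chunks)[:original_length]


chunks = split_into_chunks(b"hello world", 4)     # four 3-byte chunks, last one padded
```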


Erasure Coding : Read and Write

Figure: the same (4, 2) read and write paths, plus a reconstruct path: the surviving chunks D1, D2, D3 and C1 are read, and the lost chunks D4 and C2 are reconstructed onto new disks.


Erasure Coding : Shortcomings

• Encode is compute intensive
  • In the case of Reed-Solomon, a generator matrix of dimension (k+m, k) is used to create code chunks from data chunks

• Reconstruction is costly. It is triggered in the case of:
  • Degraded read: the application receives a read exception while reading a data block on a node, due to software errors (hot-spot effects or system updates) or hardware errors
  • Node repair: the whole node is down

Figure: Number of failed nodes in a Facebook cluster of 3,000 nodes over one month [4]
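A minimal pure-Python sketch of generator-matrix encoding and decoding, to show where the compute goes. It assumes GF(2^8) with primitive polynomial 0x11d and a systematic (k+m, k) generator whose top k rows are the identity and whose bottom m rows form a Cauchy matrix (one common MDS construction; Jerasure and ISA-L offer others). A degraded read or repair must invert the k × k submatrix for the surviving chunks and then multiply it into the data, which is the costly reconstruction step. All function names are hypothetical.

```python
# GF(2^8) log/antilog tables for x^8 + x^4 + x^3 + x^2 + 1 (0x11d); 2 is a generator.
EXP, LOG = [0] * 512, [0] * 256
_x = 1
for _i in range(255):
    EXP[_i], LOG[_x] = _x, _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    return EXP[255 - LOG[a]]


def generator_matrix(k, m):
    """(k+m) x k systematic generator: identity on top, Cauchy rows 1/(x_i + y_j) below."""
    identity = [[1 if i == j else 0 for j in range(k)] for i in range(k)]
    cauchy = [[gf_inv((k + i) ^ j) for j in range(k)] for i in range(m)]
    return identity + cauchy


def mat_apply(rows, chunks):
    """Each output chunk is a GF(2^8) linear combination of the input chunks."""
    out = []
    for row in rows:
        acc = bytearray(len(chunks[0]))
        for coef, chunk in zip(row, chunks):
            if coef:
                for b, byte in enumerate(chunk):
                    acc[b] ^= gf_mul(coef, byte)
        out.append(bytes(acc))
    return out


def gf_mat_inv(M):
    """Gauss-Jordan inversion over GF(2^8) (subtraction is XOR)."""
    n = len(M)
    A = [row[:] + [1 if i == j else 0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col])
        A[col], A[pivot] = A[pivot], A[col]
        inv_p = gf_inv(A[col][col])
        A[col] = [gf_mul(v, inv_p) for v in A[col]]
        for r in range(n):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [v ^ gf_mul(f, p) for v, p in zip(A[r], A[col])]
    return [row[n:] for row in A]


def encode(data_chunks, m):
    """Returns the k data chunks followed by m parity chunks."""
    return mat_apply(generator_matrix(len(data_chunks), m), data_chunks)


def decode(available, k, m):
    """available: {chunk index: bytes} with at least k entries; returns the k data chunks."""
    G = generator_matrix(k, m)
    idx = sorted(available)[:k]
    return mat_apply(gf_mat_inv([G[i] for i in idx]), [available[i] for i in idx])
```

Any k rows of [I; Cauchy] form an invertible matrix because every square submatrix of a Cauchy matrix is nonsingular, which is exactly the MDS property from the primer. The sketch needs k + m ≤ 256 so that the Cauchy points stay distinct.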


Erasure Coding : Modern approaches

Locally recoverable code (LRC)
• LRCs trade storage efficiency for faster recovery
• LRCs use an MDS code hierarchically, performing the encoding at multiple levels

Figure: LRC layout with data blocks D1 to D8, local parities L1 and L2, and global parities G1 to G4.
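The repair saving of an LRC comes from its local parities. The sketch below models only that part, assuming two local groups of four data blocks, each protected by an XOR local parity (the role of L1 and L2); the global parities, which would be computed with an MDS code, are omitted. Helper names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_local(data, group_size):
    """One XOR local parity per consecutive group of group_size data blocks."""
    return [xor_blocks(data[i:i + group_size]) for i in range(0, len(data), group_size)]


def local_repair(data, local_parity, group_size, lost):
    """Rebuild data[lost] using only its own group; returns (block, blocks_read)."""
    start = (lost // group_size) * group_size
    survivors = [data[i] for i in range(start, min(start + group_size, len(data))) if i != lost]
    reads = survivors + [local_parity[lost // group_size]]
    return xor_blocks(reads), len(reads)


data = [bytes([i] * 4) for i in range(1, 9)]     # D1..D8
parities = encode_local(data, 4)                 # L1, L2
damaged = list(data)
damaged[1] = None                                # D2 is lost
block, reads = local_repair(damaged, parities, 4, 1)
```

Repairing D2 this way reads 4 blocks (three group members plus L1) instead of the 8 blocks an RS repair over all eight data blocks would read; the extra local parities are the storage price paid for that.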


Erasure Coding : Modern approaches

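Regenerating codes
• These are mostly MDS codes, represented as (n, k, d, α, β): data is stored on n nodes, any k of which suffice to recover it; each node stores α symbols, and a failed node is repaired by downloading β symbols from each of d helper nodes. The chunks are divided into smaller sub-chunks during encoding
• They reduce repair bandwidth by reducing the amount of data read from each node
• They are further classified as minimum storage regenerating (MSR) codes and minimum bandwidth regenerating (MBR) codes
• Highly compute intensive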
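The two extreme points of the storage/repair-bandwidth trade-off have well-known closed forms. The sketch below evaluates them with exact fractions for an object of B symbols; B and the function names are illustrative assumptions. For comparison, a conventional RS repair downloads k full chunks, i.e. the entire object.

```python
from fractions import Fraction


def msr_point(B, k, d):
    """Minimum storage: alpha = B/k, beta = alpha/(d-k+1); returns (alpha, beta, repair traffic)."""
    alpha = Fraction(B, k)
    beta = alpha / (d - k + 1)
    return alpha, beta, d * beta


def mbr_point(B, k, d):
    """Minimum bandwidth: beta = 2B/(k(2d-k+1)), alpha = d*beta = repair traffic."""
    beta = Fraction(2 * B, k * (2 * d - k + 1))
    return d * beta, beta, d * beta


def rs_repair(B, k):
    """Conventional MDS repair: k helpers each send a full chunk of B/k."""
    alpha = Fraction(B, k)
    return alpha, alpha, k * alpha
```

With B = 12, k = 4 and d = 5, the MSR point stores α = 3 per node and downloads 7.5 symbols per repair, versus 12 for conventional RS repair; the MBR point cuts repair traffic further to 30/7 but stores 30/7 per node instead of 3.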


Erasure code @ Aricent – Altran Group

• Improvement in storage efficiency and latency
  • Employ the new-generation Clay Code [2]. Clay Code has:
  • Least possible storage overhead
  • Least possible repair bandwidth and disk read
  • Shown 3x repair time reduction and up to 30% and 106% improvement in degraded read and write with CEPH
• Acceleration of erasure coding
  • Offloading the computation to GPU
  • Accelerated Cauchy RS (CRS) from the Jerasure library, and Clay Code
  • Integrate the accelerated erasure code algorithms into CEPH


Cauchy Reed-Solomon

• Cauchy Reed-Solomon
  • Uses Cauchy generator matrices
  • Multiplication is reduced to XOR operations
• Accelerated Cauchy Reed-Solomon
  • Uses GPU constant memory for the generator matrix
  • Uses shared memory to optimize access to data in global memory

Figure: Cauchy RS erasure code [3]. An mw × kw generator matrix is applied to the input data, which is processed in blocks (Block 0 to Block N−1). In each block, data chunks 0 to k−1 are each split into packets 0 to w−1 and encoded into parity chunks 0 to m−1.
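The "multiplication is reduced to XOR" step can be sketched as follows: each GF(2^w) element of the m × k Cauchy matrix is expanded into a w × w bit matrix, giving the mw × kw matrix of the figure, and each parity packet becomes the XOR of the data packets selected by one row. This sketch assumes w = 4 and the primitive polynomial x^4 + x + 1 so the field stays tiny, and models each packet as a Python int so that XOR of packets is a single `^`. Names are hypothetical; production libraries such as Jerasure use larger w and optimized XOR schedules.

```python
W = 4                 # word size w in bits (assumption: GF(2^4) keeps the example tiny)
POLY = 0b10011        # x^4 + x + 1, a primitive polynomial for GF(2^4)


def gf_mul(a, b):
    """Multiply in GF(2^W) by shift-and-add, reducing modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << W):
            a ^= POLY
    return result


def gf_inv(a):
    """Brute-force inverse, adequate for a 16-element field."""
    return next(x for x in range(1, 1 << W) if gf_mul(a, x) == 1)


def cauchy_matrix(k, m):
    """m x k Cauchy matrix 1 / (x_i + y_j) with x_i = k + i, y_j = j (needs k + m <= 2^W)."""
    return [[gf_inv((k + i) ^ j) for j in range(k)] for i in range(m)]


def element_bitmatrix(e):
    """W x W binary matrix of 'multiply by e': column c holds the bits of e * 2^c."""
    columns = [gf_mul(e, 1 << c) for c in range(W)]
    return [[(columns[c] >> r) & 1 for c in range(W)] for r in range(W)]


def cauchy_bitmatrix(k, m):
    """Expand the m x k Cauchy matrix into an (m*W) x (k*W) bit matrix."""
    C = cauchy_matrix(k, m)
    bits = [[0] * (k * W) for _ in range(m * W)]
    for i in range(m):
        for j in range(k):
            sub = element_bitmatrix(C[i][j])
            for r in range(W):
                for c in range(W):
                    bits[i * W + r][j * W + c] = sub[r][c]
    return bits


def encode_xor_only(data_packets, k, m):
    """data_packets[j][c] is packet c of data chunk j, held as an int bit vector.
    Each parity packet is the XOR of the data packets selected by the bit matrix;
    no field multiplication is applied to the data itself."""
    bits = cauchy_bitmatrix(k, m)
    parity = []
    for i in range(m):
        packets = []
        for r in range(W):
            acc = 0
            for j in range(k):
                for c in range(W):
                    if bits[i * W + r][j * W + c]:
                        acc ^= data_packets[j][c]
            packets.append(acc)
        parity.append(packets)
    return parity
```

The result is bit-for-bit the same as multiplying w-bit words by the Cauchy matrix, because e·d is the XOR of e·2^c over the set bits c of d. The bit matrix depends only on (k, m, w), so it is small and read-only, which is presumably why the slide keeps the generator matrix in GPU constant memory.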


Clay Code : Construction

• Consider the (2, 2) encoding. Each sub-chunk is represented by a point in a plane. Sub-chunks are further classified as coupled (blue dots) and uncoupled (red dots)
• Uncoupled sub-chunks are copied as is, and a Pairwise Reverse Transform (PRT) is applied to the paired sub-chunks to obtain elements of the uncoupled data cube (the cube on the right). An MDS code is used to obtain the rest of the uncoupled data cube
• From the newly constructed uncoupled data cube, a Pairwise Forward Transform (PFT) is applied to obtain the code chunks. Both PRT and PFT are (2, 2) MDS codes

Figure: A (2, 2) Clay Code. Sub-chunks C1, C1*, C2, C2* pass through PRT + copy into the uncoupled data cube (U1, U1*, U2, U2*); the MDS code yields U3, U3*, U4, U4*; PFT + copy yields C3, C3*, C4, C4*. The planes are z = (0,0), (0,1), (1,0) and (1,1), with x and y axes.
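Following the construction in [2], the n = k + m nodes can be viewed as a q × t grid with q = d − k + 1 and t = n / q, and each chunk is split into α = q^t sub-chunks, one per plane z. In plane z, the sub-chunk of node (x, y) is uncoupled (red) when z[y] = x; otherwise it is paired with the sub-chunk of node (z[y], y) in the plane obtained by setting z[y] = x. The sketch below enumerates this layout; it assumes q divides n, uses hypothetical names, and is not CEPH's plugin code.

```python
from itertools import product


def clay_layout(k, m, d):
    """Enumerate Clay sub-chunks (x, y, z) and classify them as uncoupled or paired."""
    n = k + m
    q = d - k + 1
    if n % q:
        raise ValueError("this sketch assumes q = d - k + 1 divides n = k + m")
    t = n // q
    planes = list(product(range(q), repeat=t))        # z = (z_0, ..., z_{t-1}); alpha = q**t
    uncoupled, pairs = [], set()
    for x, y in product(range(q), range(t)):          # node coordinates in the q x t grid
        for z in planes:
            if z[y] == x:
                uncoupled.append((x, y, z))           # red vertex: copied as is
            else:
                partner_plane = z[:y] + (x,) + z[y + 1:]
                partner = (z[y], y, partner_plane)    # companion vertex it is coupled with
                pairs.add(frozenset({(x, y, z), partner}))
    return q, t, planes, uncoupled, pairs


q, t, planes, uncoupled, pairs = clay_layout(2, 2, 3)
```

For the (2, 2) code on the slide with d = 3, q = t = 2, so there are four planes (0,0), (0,1), (1,0), (1,1), 8 uncoupled sub-chunks and 4 coupled pairs.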


Clay Code : Decode/Recovery Process

• Consider the following single data node erasure case
  • The uncoupled data cube is created using PRT and by copying the unpaired sub-chunks
  • MDS decode is performed on the planes selected for recovery, and the uncoupled sub-chunks are copied

Figure: PRT + copy, followed by MDS decode and copy, applied to sub-chunks C, C*, U and U*.
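The "planes selected for recovery" follow from the same layout: per [2], repairing node (x0, y0) only needs the planes z with z[y0] = x0, i.e. α/q of the α planes, from each of d helpers. A sketch with hypothetical names, assuming q divides n:

```python
from itertools import product


def repair_planes(k, m, d, failed_x, failed_y):
    """Planes read when repairing node (failed_x, failed_y): those with z[failed_y] == failed_x."""
    q = d - k + 1
    t = (k + m) // q
    return [z for z in product(range(q), repeat=t) if z[failed_y] == failed_x]


def repair_bandwidth(k, m, d):
    """Sub-chunks downloaded for one node repair: Clay vs. a conventional MDS repair."""
    q = d - k + 1
    alpha = q ** ((k + m) // q)    # sub-chunks per chunk
    clay = d * alpha // q          # d helpers, alpha / q sub-chunks each
    conventional = k * alpha       # k helpers, full chunks
    return clay, conventional
```

For the (2, 2) example with d = 3, 2 of the 4 planes are read (half of them), giving 6 sub-chunks of repair traffic versus 8 for a conventional repair; for (4, 2) with d = 5 the figures are 20 versus 32.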


Clay Code : Decode/Recovery Process

• With the Clay code construction, any two sub-chunks in the set {U, U*, C, C*} can be recovered from the remaining two using PFT. Here C1* is computed from C1 and U1, and C2* from C2 and U2
• Repair bandwidth is reduced because data from only 2 (half) of the z-planes is used in the recovery process

Figure: C1* recovered from C1 and U1; C2* recovered from C2 and U2.
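The "any two from the remaining two" property can be illustrated with a 2 × 2 coupling over GF(2^8). This sketch assumes U = C + γ·C* and U* = γ·C + C* with γ ∉ {0, 1}; the exact coefficients in [2] and in CEPH's Clay plugin may differ. With this choice every 2 × 2 subsystem is invertible, so (C, C*, U, U*) behaves as a (2, 2) MDS code. It works on single bytes rather than whole sub-chunks, and the names are hypothetical.

```python
def gf_mul(a, b):
    """GF(2^8) multiply, primitive polynomial 0x11d."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
    return r


def gf_inv(a):
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)


GAMMA = 2                      # assumption: any gamma outside {0, 1} works
NAMES = ["C", "C*", "U", "U*"]
# Parity checks H . (C, C*, U, U*) = 0, i.e. U = C + g*C* and U* = g*C + C*
H = [[1, GAMMA, 1, 0],
     [GAMMA, 1, 0, 1]]


def recover_pair(known):
    """known: two of 'C', 'C*', 'U', 'U*' -> byte. Solves for the other two by Cramer's rule."""
    a, b = [i for i, name in enumerate(NAMES) if name not in known]
    r0 = r1 = 0
    for i, name in enumerate(NAMES):           # move known terms to the right-hand side
        if name in known:
            r0 ^= gf_mul(H[0][i], known[name])
            r1 ^= gf_mul(H[1][i], known[name])
    det_inv = gf_inv(gf_mul(H[0][a], H[1][b]) ^ gf_mul(H[0][b], H[1][a]))
    xa = gf_mul(gf_mul(r0, H[1][b]) ^ gf_mul(H[0][b], r1), det_inv)
    xb = gf_mul(gf_mul(H[0][a], r1) ^ gf_mul(H[1][a], r0), det_inv)
    return {NAMES[a]: xa, NAMES[b]: xb}


c1, c1_star = 0x53, 0xCA
u1 = c1 ^ gf_mul(GAMMA, c1_star)
rebuilt = recover_pair({"C": c1, "U": u1})   # yields C* (== c1_star) and U*
```

In characteristic 2 subtraction is XOR, which is why the right-hand side is accumulated with `^`; this mirrors the slide's example of computing C1* from C1 and U1.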


Clay Code : Enhancements

• Use of accelerated Cauchy RS
  • Clay code uses MDS codes to perform PFT and PRT through the existing erasure code infrastructure in CEPH. A version of the earlier accelerated Cauchy RS is used for PFT and PRT
  • The CEPH erasure code infrastructure involved multiple memory allocations (on both the CPU and GPU side) and the related copies. These were optimized by removing redundant operations
• Optimized memory access and a separate GPU kernel for PFT and PRT
  • Clay code construction uses data copies and various transforms to create intermediate and final results. All Clay operations were moved to GPU space, using CUDA/OpenCL primitives for the copy operations
  • An optimized, independent (2, 2) erasure code CUDA/OpenCL implementation is used for PFT and PRT


Environment

• Hardware
  • 16-core Intel(R) Xeon(R) CPU E5-2660 @ 2.20GHz with 64GB RAM
  • NVIDIA GTX 1080
• Software
  • CEPH 13.1.0 (mimic)
  • CUDA 8.0 (driver 384.111)
  • Intel OpenCL 2.1 for CPU

Figure: software and test architecture. Components shown: RBD, RADOS and OSD; the CEPH erasure code library and its ErasureInterface; the Jerasure library; the plugins CRS, Reed Solomon (Vand), Intel ISA-L, Locally repairable erasure, Shingled Erasure code and Clay Code; the added Accelerated CRS, Accelerated Clay for GPU and Accelerated Clay for CPU; and a test harness made of a TestCaseInjector, a Googletest EC test fixture and a Resultabridger.


Results – CRS REF Performance

Figure: execution throughput (MB/s). Encode and decode performance for various (k, m) values with different chunk sizes for the CRS algorithm.


Results – CRS REF with OpenCL Performance

Figures: execution throughput (MB/s). Encode and decode performance for various (k, m) values with different chunk sizes for the CRS algorithm.

CRS: encode and decode performance decreases with higher (k, m) values. For decode, performance declines with the number of erasures, similar to REF.


Results – CRS REF with GPU Performance

Figures: execution throughput (MB/s). Encode and decode performance for various (k, m) values with different chunk sizes for the CRS algorithm.

CRS: encode and decode performance is fairly consistent across (k, m) values and numbers of erasures.


Results – CLAY REF Performance

Figure: execution throughput (MB/s). Encode and decode performance for various (k, m) values with different chunk sizes for the CLAY algorithm.


Results – CLAY REF with OpenCL Performance

Figures: execution throughput (MB/s). Encode and decode performance for various (k, m) values with different chunk sizes for the CLAY algorithm.

CLAY: encode performance decreases with higher (k, m) values. Decode performance is consistent across the number of erasures, similar to REF.


Results – CLAY REF with GPU Performance

Figures: execution throughput (MB/s). Encode and decode performance for various (k, m) values with different chunk sizes for the CLAY algorithm.

CLAY: encode performance decreases with higher (k, m) values. Decode performance is consistent across the number of erasures.


Results – CLAY Decode performance

Figures: execution throughput (MB/s).

(12, 6) decode with one erasure is ~3x and ~2.5x faster with OpenCL and GPU respectively.


Results – CRS performance summary

Figures: execution throughput (MB/s).

An approximate gain of 3-5x is observed with OpenCL and 10-18x with GPU. The gain increases with the number of erasures.

Encode bandwidth improves approximately 4x and 16x with OpenCL and GPU respectively for (6, 4), and the gain gradually increases up to 45x as (k, m) grows. A slight decrease is seen at (k, m) = (20, 10).


Results – CLAY performance summary

Figures: execution throughput (MB/s).

Encode bandwidth shows 2-22x improvement with OpenCL and ~7-77x improvement with GPU across different (k, m) values.

The decode gain decreases with higher (k, m) values: from ~2.5x to 1.7x with OpenCL and from ~15x to ~10x with GPU.


Summary

• Accelerated Cauchy Reed-Solomon (CRS) and Clay Code show good performance gains over the corresponding reference versions, both on GPU and with OpenCL. The table below shows the maximum gain obtained in each case.

• We are continuing this work by testing the new and improved CRS and Clay code on a CEPH cluster of four server machines (16-core Intel Xeon CPU E5-2660 @ 2.20GHz, 64GB RAM, NVIDIA GTX 1080 card) with a 60TB storage array

Maximum gain over the reference version (x):

Algo.   Encode (OpenCL)   Encode (GPU)   Decode (OpenCL)   Decode (GPU)
CRS     9.94              45.80          5.90              18.48
CLAY    22.84             78.78          2.63              14.88


Erasure code : Future possibilities

Erasure coding use cases:
• Application workload dependent resiliency
• Storage technology dependent resiliency
• Integration of EC with the file system
• Data migration for resiliency optimization


Reference

1. Mingyuan Xia, Mohit Saxena, Mario Blaum, and David A. Pease. A Tale of Two Erasure Codes in HDFS. USENIX Conference on File and Storage Technologies (FAST), 2015.

2. Myna Vajha, Vinayak Ramkumar, Bhagyashree Puranik, Ganesh Kini, Elita Lobo, Birenjith Sasidharan, P. Vijay Kumar, Alexander Barg, Min Ye, Srinivasan Narayanamurthy, Syed Hussain, and Siddhartha Nandi. Clay Codes: Moulding MDS Codes to Yield an MSR Code. USENIX Conference on File and Storage Technologies (FAST), 2018.

3. Chengjian Liu, Qiang Wang, Xiaowen Chu, and Yiu-Wing Leung. G-CRS: GPU Accelerated Cauchy Reed-Solomon Coding. IEEE Transactions on Parallel and Distributed Systems, 2018.

4. Maheswaran Sathiamoorthy, Alexandros G. Dimakis, Megasthenis Asteris, Ramkumar Vadali, Dhruba Borthakur, Dimitris Papailiopoulos, and Scott Chen. XORing Elephants: Novel Erasure Codes for Big Data. Proceedings of the VLDB Endowment, 2013.


Thank you