
Challenges in Distributed Storage and Compute Systems

Parimal Parag

Electrical Communication Engineering, Indian Institute of Science

MVJ College of Engineering, August 29, 2017


Evolving Digital Landscape

[Figure: applications plotted by delay tolerance against rate requirements, with regions labelled "Information Theoretic Regime" and "Emerging Applications". Applications shown: Voice, Cloud Storage, Video Streaming, Network Printer, Email, Browsing, File Transfer.]


Dominant traffic on Internet

[Figure: Peak Period Traffic Composition (North America). Upstream, downstream, and aggregate traffic on a 0 to 100 scale, split into Real-Time Entertainment, Web Browsing, Marketplaces, Filesharing, Tunneling, Social Networking, Storage, Communications, Gaming, and Outside Top 5.]

- Real-Time Entertainment: 64.54% for downstream and 36.56% for mobile access [1]

[1] https://www.sandvine.com/downloads/general/global-internet-phenomena/2015/global-internet-phenomena-report-latin-america-and-north-america.pdf


Centralized Paradigm – Media Vault

[Figure: a single vault storing Files 1 to 6 and receiving all requests.]

Potential Issues with Centralized Scheme

- Traffic load: Vault must handle all requests for all files
- Service rate: Large storage entails longer access time
- Not robust to hardware failures or malicious attacks


Alternative to Centralized Paradigm

[Figure: Files 1 to 11 spread across multiple nodes.]

Distributed Systems

- Autonomous nodes with local memory
- Interaction between the connected nodes
- Nodes with local knowledge of input and network topology
- Heterogeneous and potentially time-varying system topology


Distributed Systems

[Figure: Files 1 to 11 spread across multiple nodes.]

Desirable Properties

- Scalability: Linear or sub-linear increase in number of nodes
- Resilience: Able to withstand local node failures
- Efficiency: Minimum interaction between nodes
- Fairness: Almost equal load at all nodes


Examples

Distributed Storage

- Content streaming: NetFlix, HotStar, Eros Now, YouTube, Hulu, Amazon Prime Video
- Cloud storage: GitHub, DropBox, iCloud, OneDrive, UbuntuOne
- Cloud service: Facebook, Google Suite, Office365

Distributed Computation

- Cloud computing: Amazon Web Services, Microsoft Azure, Google Search
- Cluster computing: Hadoop, Spark
- Distributed database: Aerospike, Cassandra, Couchbase, Druid


Distributed System Architecture

Classification

- Client-server: Online banking, Web servers, e-commerce
- Peer-to-peer: Bitcoin, OS distribution
- Hybrid: Spotify, content delivery in ISPs

Interaction

- Master-slave: Message passing with local memory
- Database-centric: Relational database for interaction


Content Delivery Network

[Figure: a vault holding Files 1 to 6 is mirrored onto four local servers holding Files {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, and {2, 4, 5}; requests are routed to these servers.]

Redundancy for resilience

- Mirroring content with local servers
- Media file on multiple servers
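The mirrored placement drawn on this slide can be checked mechanically. The sketch below is only an illustration in Python: it encodes the four server contents from the figure and tests whether every file stays reachable when servers fail. The server labels and the names `PLACEMENT`, `replicas`, and `survives_failures` are hypothetical, not from the talk.

```python
from itertools import combinations

# Server contents as drawn on the slide; the server labels are hypothetical.
PLACEMENT = {
    "s1": {1, 3, 5},
    "s2": {1, 4, 6},
    "s3": {2, 3, 6},
    "s4": {2, 4, 5},
}
FILES = set(range(1, 7))


def replicas(placement):
    """Map each file to the set of servers that store a copy of it."""
    return {f: {s for s, held in placement.items() if f in held} for f in FILES}


def survives_failures(placement, num_failed):
    """True if every file is still stored after any num_failed servers fail."""
    for failed in combinations(placement, num_failed):
        alive = [held for s, held in placement.items() if s not in failed]
        if set().union(*alive) != FILES:
            return False
    return True


if __name__ == "__main__":
    print(replicas(PLACEMENT))
    print(survives_failures(PLACEMENT, 1))
    print(survives_failures(PLACEMENT, 2))
```

With this placement each file sits on exactly two servers, so any single failure is tolerated, while some pairs of failures (for example the two servers holding File 1) lose a file.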


Load Balancing through File Fragmentation

[Figure: the vault splits each of Files 1 to 6 into fragments A and B. Fragment groups shown on the servers: {1A, 3A, 5A}, {2B, 4B, 6B}, {1A, 4A, 6A}, {2B, 3B, 5B}, {2A, 3A, 6A}, {1B, 4B, 5B}, {2A, 4A, 5A}, {1B, 3B, 6B}. Labels: Multiple Requests, Partial Completions.]

Shared Coherent Access

- Availability and better content distribution
- File segments on multiple servers


Problem Statement

[Figure: requests arriving at four storage nodes holding $f_1(X)$, $f_2(X)$, $f_3(X)$, and $f_4(X)$.]

Quantify mean access time

- with number of fragments for a single message $X$,
- with encoding and storage $f_i(X)$ for fragmented message $X = (X_1, \dots, X_k)$ at $n$ distinct nodes.


[Figure: two four-node layouts serving requests. One stores A, A, B, B (replication); the other stores A, B, A+B, A-B (coding).]

Problem: Quantify the latency gains offered by distributed coding

Solution: Coded storage offers scaling gains over replication
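To see why the coded layout A, B, A+B, A-B is more flexible than A, A, B, B, the following Python sketch counts the pairs of nodes from which both A and B can be recovered. It is a minimal illustration that works over the rationals rather than a finite field. The names `CODED`, `REPLICATED`, `recover`, and `decodable_pairs`, and the sample values 7 and 3, are assumptions made for this example.

```python
from fractions import Fraction
from itertools import combinations

# Each stored symbol is a linear combination c_a*A + c_b*B, as in the figure.
CODED = {"A": (1, 0), "B": (0, 1), "A+B": (1, 1), "A-B": (1, -1)}
REPLICATED = {"A#1": (1, 0), "A#2": (1, 0), "B#1": (0, 1), "B#2": (0, 1)}


def recover(rows, values):
    """Solve the 2x2 linear system by Cramer's rule; None if it is singular."""
    (p, q), (r, s) = rows
    det = p * s - q * r
    if det == 0:
        return None
    v0, v1 = values
    return Fraction(v0 * s - q * v1, det), Fraction(p * v1 - v0 * r, det)


def decodable_pairs(layout, a=7, b=3):
    """Count the node pairs whose contents suffice to recover (a, b)."""
    good = 0
    for n1, n2 in combinations(layout, 2):
        rows = (layout[n1], layout[n2])
        values = tuple(ca * a + cb * b for ca, cb in rows)
        if recover(rows, values) == (a, b):
            good += 1
    return good


if __name__ == "__main__":
    print("coded:", decodable_pairs(CODED))
    print("replicated:", decodable_pairs(REPLICATED))
```

All six pairs work for the coded layout, whereas replication fails for the two pairs that hold the same symbol twice. This "any k of n" behaviour is the MDS property.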


System Model

File storage

- Each media file divided into $k$ pieces
- Pieces encoded and stored on $n$ servers

Arrival of requests

- Each request wants entire media file
- Poisson arrival of requests with rate $\lambda$

Time in the system

- Till the reception of whole file

Service at each server

- IID exponential service time with rate $k/n$
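As a rough feel for this model, consider a single request in an otherwise empty system, so that queueing plays no role. This is an assumption of the sketch, not a result from the talk. With IID exponential service at rate $k/n$, an MDS-coded request finishes when any $k$ of the $n$ servers respond, while a replicated request (each piece on $n/k$ servers) finishes when the slowest of its $k$ pieces arrives. Both means follow from order statistics of exponentials. The function names are illustrative.

```python
from fractions import Fraction


def mean_time_mds(n, k):
    """Mean time until any k of n servers finish, each at rate k/n."""
    rate = Fraction(k, n)
    # The j-th completion happens at aggregate rate (n - j) * rate.
    return sum(Fraction(1, n - j) for j in range(k)) / rate


def mean_time_replication(n, k):
    """Mean time until all k pieces arrive, each stored on n/k servers."""
    if n % k:
        raise ValueError("replication needs n to be a multiple of k")
    # The first copy of a piece arrives at rate (n/k) * (k/n) = 1, and the
    # request waits for the slowest of k such pieces: the harmonic number H_k.
    return sum(Fraction(1, j) for j in range(1, k + 1))


if __name__ == "__main__":
    for n, k in [(4, 2), (8, 4), (16, 8)]:
        print(n, k, float(mean_time_replication(n, k)), float(mean_time_mds(n, k)))
```

For $n = 4$, $k = 2$ this gives 3/2 for replication and 7/6 for the MDS code. The queueing analysis in the rest of the talk addresses the loaded system, which this sketch ignores.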


Question: Duplication versus MDS Coding

[Figure: two four-node storage layouts, one holding symbols B, B, A, A and the other holding D, C, B, A.]

Reduction of access time

- How to select number of fragments for a single message?
- How to encode and store at the distributed storage nodes?


Pertinent References (very incomplete)

N. B. Shah, K. Lee, and K. Ramchandran, "When do redundant requests reduce latency?", IEEE Trans. Commun., 2016.

G. Joshi, Y. Liu, and E. Soljanin, "On the delay-storage trade-off in content download from coded distributed storage systems", IEEE J. Sel. Areas Commun., 2014.

Dimakis, Godfrey, Wu, Wainwright, and Ramchandran, "Network Coding for Distributed Storage Systems", IEEE Trans. Info. Theory, 2010.

A. Eryilmaz, A. Ozdaglar, M. Medard, and E. Ahmed, "On the delay and throughput gains of coding in unreliable networks", IEEE Trans. Info. Theory, 2008.

D. Wang, D. Silva, F. R. Kschischang, "Robust Network Coding in the Presence of Untrusted Nodes", IEEE Trans. Info. Theory, 2010.

A. Dimakis, K. Ramchandran, Y. Wu, C. Suh, "A Survey on Network Codes for Distributed Storage", Proceedings of IEEE, 2011.

Karp, Luby, Meyer auf der Heide, "Efficient PRAM simulation on a distributed memory machine", ACM Symposium on Theory of Computing, 1992.

Adler, Chakrabarti, Mitzenmacher, Rasmussen, "Parallel randomized load balancing", ACM Symposium on Theory of Computing, 1995.

Gardner, Zbarsky, Velednitsky, Harchol-Balter, Scheller-Wolf, "Understanding Response Time in the Redundancy-d System", SIGMETRICS, 2016.

B. Li, A. Ramamoorthy, R. Srikant, "Mean-field-analysis of coding versus replication in cloud storage systems", INFOCOM, 2016.


Storage Coding – The Centralized MDS Queue

[Figure: centralized MDS queue.]

e.g.: Shah, Lee, Ramchandran (2013); Lee, Shah, Huang, Ramchandran (2017); Vulimiri, Michel, Godfrey, Shenker (2012); Ananthanarayanan, Ghodsi, Shenker, Stoica (2012); Baccelli, Makowski, Shwartz (1989)


State Space Structure

Keeping Track of Partially Fulfilled Requests

- Element $Y_S(t)$ of the state vector is the number of users holding a given subset $S$ of pieces

Continuous-Time Markov Chain

- $Y(t) = \{Y_S(t) : S \subset [n], |S| < k\}$ is a Markov process


Storage Coding – (n, k) Fork-Join Model

[Figure: (n, k) fork-join queue.]

e.g.: Joshi, Liu, Soljanin (2012, 2014); Joshi, Soljanin, Wornell (2015); Sun, Zheng, Koksal, Kim, Shroff (2015); Kadhe, Soljanin, Sprintson (2016); Li, Ramamoorthy, Srikant (2016)


State Space Collapse

Theorem: For duplication and coding schemes under priority scheduling and the parallel processing model, the collection

$$\mathcal{S}(t) = \{S \subset [n] : Y_S(t) > 0,\ |S| < k\}$$

of information subsets is totally ordered in terms of set inclusion.

Corollary: Let $Y_i(t)$ be the number of requests with $i$ information symbols at time $t$. Then

$$Y(t) = (Y_0(t), Y_1(t), \dots, Y_{k-1}(t))$$

is a Markov process.
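The practical benefit of the collapse is the size of the state description. A small Python sketch, with illustrative function names, compares the number of counters $Y_S$ indexed by subsets $S \subset [n]$ with $|S| < k$ against the $k$ counters $Y_0, \dots, Y_{k-1}$ of the corollary.

```python
from math import comb


def full_state_size(n, k):
    """Number of subsets S of [n] with |S| < k: one counter Y_S per subset."""
    return sum(comb(n, i) for i in range(k))


def collapsed_state_size(k):
    """Number of counters Y_0, ..., Y_{k-1} after the collapse."""
    return k


if __name__ == "__main__":
    for n, k in [(4, 2), (12, 6), (20, 10)]:
        print(n, k, full_state_size(n, k), collapsed_state_size(k))
```

Even at $n = 20$, $k = 10$ the subset-indexed description needs several hundred thousand counters, whereas the collapsed chain needs ten.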


State Transitions of Collapsed System

Arrival of Requests

- Unit increase in $Y_0$: $Y_0(t) = Y_0(t^-) + 1$, with rate $\lambda$

Getting Additional Symbol

- Unit increase in $Y_i$: $Y_i(t) = Y_i(t^-) + 1$
- Unit decrease in $Y_{i-1}$: $Y_{i-1}(t) = Y_{i-1}(t^-) - 1$

Getting Last Missing Symbol

- Unit decrease in $Y_{k-1}$: $Y_{k-1}(t) = Y_{k-1}(t^-) - 1$
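These three transition types can be written as pure functions on the state vector. The Python sketch below is illustrative only. It indexes a service event by the level the request leaves, so `gain_symbol(y, i - 1)` corresponds to the slide's "increase $Y_i$, decrease $Y_{i-1}$". The function names are assumptions.

```python
def arrival(y):
    """A new request joins level 0 (it holds no symbols yet)."""
    nxt = list(y)
    nxt[0] += 1
    return tuple(nxt)


def gain_symbol(y, i):
    """A request at level i receives one more symbol.

    For i < k-1 it moves up to level i+1; for i = k-1 it has the whole
    file and leaves the system.
    """
    if y[i] == 0:
        raise ValueError(f"no request at level {i}")
    nxt = list(y)
    nxt[i] -= 1
    if i + 1 < len(nxt):
        nxt[i + 1] += 1
    return tuple(nxt)


if __name__ == "__main__":
    state = (0, 0)                 # k = 2 levels, empty system
    state = arrival(state)         # (1, 0)
    state = gain_symbol(state, 0)  # (0, 1)
    state = gain_symbol(state, 1)  # (0, 0): request served
    print(state)
```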


Tandem Queue Interpretation (No Empty States)

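[Figure: tandem queue with arrival rate $\lambda$ into level $Y_0(t)$, served at rate $\gamma_0$, followed by level $Y_1(t)$, served at rate $\gamma_1$.]

Duplication

- When all states non-empty
- No. of servers available at level $i$ is $n/k$
- Normalized service rate at level $i$:

$$\gamma_i = 1, \quad i = 0, \dots, k-1$$

MDS Coding

- When all states non-empty
- One server available at level $i \neq k-1$
- Normalized service rate at level $i$:

$$\gamma_i = \begin{cases} \dfrac{k}{n}, & i < k-1 \\[4pt] \dfrac{k}{n}(n-k+1), & i = k-1 \end{cases}$$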
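The two rate profiles are easy to tabulate. The following Python sketch uses illustrative function names and assumes each server works at rate $k/n$, as in the system model. It builds $\gamma_0, \dots, \gamma_{k-1}$ for both schemes, and in both cases the rates add up to $k$, the total capacity of $n$ servers.

```python
from fractions import Fraction


def gamma_duplication(n, k):
    """n/k servers per level, each at rate k/n, so every level has rate 1."""
    if n % k:
        raise ValueError("duplication needs n to be a multiple of k")
    return [Fraction(1)] * k


def gamma_mds(n, k):
    """One server per level below k-1; the other n-k+1 servers at level k-1."""
    per_server = Fraction(k, n)
    return [per_server] * (k - 1) + [per_server * (n - k + 1)]


if __name__ == "__main__":
    print(gamma_duplication(4, 2), gamma_mds(4, 2))
    print(sum(gamma_duplication(8, 4)), sum(gamma_mds(8, 4)))
```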


Tandem Queue Interpretation (General Case)

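[Figure: tandem queue with arrival rate $\lambda$ into level $Y_0(t)$, served at rate $\gamma_0$, followed by level $Y_1(t)$, served at rate $\gamma_1$.]

Tandem Queue with Pooled Resources

- Servers with empty buffers help upstream
- Aggregate service at level $i$ becomes

$$\sum_{j=i}^{l_i(t)-1} \gamma_j, \quad \text{where } l_i(t) = k \wedge \min\{l > i : Y_l(t) > 0\}$$

- No explicit description of stationary distribution for multi-dimensional Markov process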
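Because no closed-form stationary distribution is available, the pooled tandem queue is commonly studied by simulation. The Python sketch below is one possible event-driven simulation and not the author's code. `aggregate_rates` implements the pooling rule above, and `simulate_mean_sojourn` estimates the mean sojourn time from the time-averaged number of requests using Little's law. The function names, the default event count, and the seed are assumptions.

```python
import random


def aggregate_rates(y, gamma):
    """Service rate of each level when idle downstream servers help upstream.

    A non-empty level i is served at gamma[i] + ... + gamma[l-1], where l is
    the next non-empty level above i, or k if there is none. Empty levels get 0.
    """
    k = len(y)
    rates = [0.0] * k
    pooled = 0.0
    for i in range(k - 1, -1, -1):  # walk from the last level down to level 0
        pooled += gamma[i]
        if y[i] > 0:
            rates[i] = pooled
            pooled = 0.0
    return rates


def simulate_mean_sojourn(lam, gamma, num_events=200_000, seed=0):
    """Estimate the mean sojourn time as (time-average occupancy) / lam."""
    rng = random.Random(seed)
    k = len(gamma)
    y = [0] * k
    clock = 0.0
    area = 0.0  # integral over time of the number of requests in the system
    for _ in range(num_events):
        rates = aggregate_rates(y, gamma)
        total = lam + sum(rates)
        dt = rng.expovariate(total)
        area += sum(y) * dt
        clock += dt
        u = rng.random() * total
        if u < lam:  # arrival of a new request
            y[0] += 1
            continue
        u -= lam
        for i, r in enumerate(rates):
            if u < r:  # a request at level i gains a symbol
                y[i] -= 1
                if i + 1 < k:
                    y[i + 1] += 1
                break
            u -= r
    return area / clock / lam


if __name__ == "__main__":
    # (4, 2) MDS code: gamma = (1/2, 3/2); arrival rate 0.3.
    print(simulate_mean_sojourn(0.3, [0.5, 1.5]))
```

As a sanity check, with a single level (`gamma = [1.0]`) the model reduces to an M/M/1 queue, whose mean sojourn time is $1/(1 - \lambda)$.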


Bounding and Separating

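[Figure: tandem queue with arrival rate $\lambda$ and service rates $\mu_0$, $\mu_1$.]

Theorem†: When $\lambda < \min_i \mu_i$, the tandem queue has product form distribution

$$\pi(y) = \prod_{i=0}^{k-1} \left(1 - \frac{\lambda}{\mu_i}\right) \left(\frac{\lambda}{\mu_i}\right)^{y_i}$$

Uniform Bounds on Service Rate

Transition rates are uniformly bounded by

$$\gamma_i \le \sum_{j=i}^{l_i(y)-1} \gamma_j \le \sum_{j=i}^{k-1} \gamma_j \triangleq \Gamma_i$$

†F. P. Kelly, Reversibility and Stochastic Networks. New York, NY, USA: Cambridge University Press, 2011.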
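A short worked consequence of the product form, added here as a reading aid rather than taken from the slides: each level behaves like an independent M/M/1 queue with load $\rho_i = \lambda/\mu_i$, so the mean occupancy and, through Little's law, the mean sojourn time $T$ follow directly.

```latex
\begin{align*}
\mathbb{E}[Y_i] &= \sum_{y \ge 0} y\,(1-\rho_i)\,\rho_i^{\,y}
                 = \frac{\rho_i}{1-\rho_i}
                 = \frac{\lambda}{\mu_i-\lambda},
                 \qquad \rho_i = \frac{\lambda}{\mu_i}, \\
\mathbb{E}[T]   &= \frac{1}{\lambda}\sum_{i=0}^{k-1}\mathbb{E}[Y_i]
                 = \sum_{i=0}^{k-1}\frac{1}{\mu_i-\lambda}.
\end{align*}
```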


Bounds on Tandem Queue

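[Figure: the pooled tandem queue with rates $\gamma_0$, $\gamma_1$, shown alongside two bounding tandem queues, one with service rates $\Gamma_0$, $\Gamma_1$ and one with service rates $\gamma_0$, $\gamma_1$.]

Lower Bound

Higher values for service rates yield lower bound on queue distribution

$$\pi(y) = \prod_{i=0}^{k-1} \left(1 - \frac{\lambda}{\Gamma_i}\right) \left(\frac{\lambda}{\Gamma_i}\right)^{y_i}$$

Upper Bound

Lower values for service rates yield upper bound on queue distribution

$$\pi(y) = \prod_{i=0}^{k-1} \left(1 - \frac{\lambda}{\gamma_i}\right) \left(\frac{\lambda}{\gamma_i}\right)^{y_i}$$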
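To turn the two bounding distributions into numbers, one can use the fact that a product-form tandem queue with rates $\mu_i$ has mean sojourn time $\sum_i 1/(\mu_i - \lambda)$ by Little's law. The Python sketch below applies this with $\mu_i = \Gamma_i$ for the lower bound and $\mu_i = \gamma_i$ for the upper bound. The function names are illustrative, and the upper bound is reported as infinite whenever $\lambda \ge \min_i \gamma_i$.

```python
def suffix_sums(gamma):
    """Gamma_i = gamma_i + gamma_{i+1} + ... + gamma_{k-1}."""
    out, running = [], 0.0
    for g in reversed(gamma):
        running += g
        out.append(running)
    return out[::-1]


def mean_sojourn(lam, rates):
    """Sum of 1/(mu_i - lam) over levels; infinite if any level is unstable."""
    if any(lam >= mu for mu in rates):
        return float("inf")
    return sum(1.0 / (mu - lam) for mu in rates)


def sojourn_bounds(lam, gamma):
    """(lower, upper) bounds on the mean sojourn time of the pooled queue."""
    return mean_sojourn(lam, suffix_sums(gamma)), mean_sojourn(lam, gamma)


if __name__ == "__main__":
    # (4, 2) MDS code: gamma = (1/2, 3/2).
    for lam in (0.1, 0.3, 0.45, 0.6):
        print(lam, sojourn_bounds(lam, [0.5, 1.5]))
```

For the (4, 2) MDS rates the upper bound blows up once $\lambda$ reaches $\gamma_0 = 1/2$, while the lower bound stays finite up to $\lambda = 3/2$.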


Mean Sojourn Time

[Figure: mean sojourn time (0 to 15) versus arrival rate (0.1 to 0.95) for replication coding. Curves: Upper Bound, Simulation, Approximation, Lower Bound.]


Mean Sojourn Time

[Figure: mean sojourn time (0 to 15) versus arrival rate (0.1 to 0.95) for a (4, 2) MDS code. Curves: Upper Bound, Simulation, Approximation, Lower Bound.]


Approximating Pooled Tandem Queue

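[Figure: the pooled tandem queue with rates $\gamma_0$, $\gamma_1$ approximated by a tandem queue with rates $\mu_0$, $\mu_1$.]

Independence Approximation with Statistical Averaging

Service rate is equal to base service rate $\gamma_i$ plus cascade effect, averaged over time:

$$\mu_{k-1} = \gamma_{k-1}, \qquad \mu_i = \gamma_i + \mu_{i+1}\,\pi_{i+1}(0)$$

$$\pi(y) = \prod_{i=0}^{k-1} \left(1 - \frac{\lambda}{\mu_i}\right) \left(\frac{\lambda}{\mu_i}\right)^{y_i}$$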
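The recursion can be evaluated from the last level backwards. The Python sketch below is a minimal illustration and assumes that $\pi_{i+1}(0)$, the probability that level $i+1$ is empty, takes the M/M/1 value $1 - \lambda/\mu_{i+1}$. Under that assumption the recursion simplifies to $\mu_i = \gamma_i + \mu_{i+1} - \lambda$. The function names are illustrative.

```python
def approx_rates(lam, gamma):
    """Backward recursion mu_{k-1} = gamma_{k-1}, mu_i = gamma_i + mu_{i+1} * pi_{i+1}(0)."""
    k = len(gamma)
    mu = [0.0] * k
    mu[k - 1] = gamma[k - 1]
    for i in range(k - 2, -1, -1):
        # Assumption: level i+1 is empty with M/M/1 probability 1 - lam / mu_{i+1}.
        p_empty = max(0.0, 1.0 - lam / mu[i + 1])
        mu[i] = gamma[i] + mu[i + 1] * p_empty
    return mu


def approx_mean_sojourn(lam, gamma):
    """Mean sojourn time of the approximating product-form tandem queue."""
    mu = approx_rates(lam, gamma)
    if any(lam >= m for m in mu):
        return float("inf")
    return sum(1.0 / (m - lam) for m in mu)


if __name__ == "__main__":
    # (4, 2) MDS code at arrival rate 0.3: gamma = (1/2, 3/2).
    print(approx_rates(0.3, [0.5, 1.5]))
    print(approx_mean_sojourn(0.3, [0.5, 1.5]))
```

For the (4, 2) MDS code at $\lambda = 0.3$ this gives $\mu_1 = 1.5$ and $\mu_0 = 1.7$, so the approximate mean sojourn time is $1/1.4 + 1/1.2 \approx 1.55$.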


Comparing Replication versus MDS Coding

[Figure: mean sojourn time (1 to 5) versus number of servers (2 to 20). Curves: Repetition Simulation, Repetition Approximation, MDS Simulation, MDS Approximation.]

Arrival rate 0.3 units and coding rate $n/k = 2$


Summary and Discussion

Main Contributions

- Analytical framework for study of distributed computation and storage systems
- Upper and lower bounds to analyze replication and MDS codes
- A tight closed-form approximation to study distributed storage codes
- MDS codes are better suited for large distributed systems
- Mean access time is better for MDS codes for all code-rates