spanstore: cost-effective geo-replicated storage spanning multiple cloud services zhe wu, michael...

SPANStore: Cost-Effective Geo-Replicated Storage Spanning Multiple Cloud Services

Zhe Wu, Michael Butkiewicz, Dorian Perkins, Ethan Katz-Bassett, Harsha V. Madhyastha

UC Riverside and USC

• 2

Geo-distributed Services for Low Latency

• 3

Cloud Services Simplify Geo-distribution

• 4

Need for Geo-Replication

Data uploaded by a user may be viewed/edited by users in other locations
• Social networking (Facebook, Twitter)
• File sharing (Dropbox, Google Docs)

Geo-replication of data is necessary

Isolated storage service in each cloud data center

Application needs to handle replication itself

• 5

Geo-replication on Cloud Services

Lots of recent work on enabling geo-replication
• Walter (SOSP’11), COPS (SOSP’11), Spanner (OSDI’12), Gemini (OSDI’12), Eiger (NSDI’13)…
• Faster performance or stronger consistency

Added consideration on cloud services: minimizing cost

• 6

Outline

Problem and motivation

SPANStore overview

Techniques for reducing cost

Evaluation

• 7

SPANStore

Key-value store (GET/PUT interface) spanning cloud storage services

Main objective: minimize cost

Satisfy application requirements
• Latency SLOs
• Consistency (eventual vs. sequential consistency)
• Fault tolerance
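The requirements listed above can be captured as a small configuration object next to the GET/PUT interface the application programs against. The following is a minimal sketch under assumptions: the class and field names (`AppRequirements`, `tolerated_failures`, `KeyValueStore`) are hypothetical and are not SPANStore's actual API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Protocol


class Consistency(Enum):
    EVENTUAL = "eventual"
    SEQUENTIAL = "sequential"


@dataclass(frozen=True)
class AppRequirements:
    """Hypothetical container for the requirements an application hands to the store."""
    consistency: Consistency
    get_slo_ms: float         # latency bound for GETs
    put_slo_ms: float         # latency bound for PUTs
    slo_percentile: float     # e.g. 90 for a 90th-percentile SLO
    tolerated_failures: int   # number of data center failures to survive


class KeyValueStore(Protocol):
    """The GET/PUT interface seen by the application (names are illustrative)."""
    def get(self, key: str) -> Optional[bytes]: ...
    def put(self, key: str, value: bytes) -> None: ...


# Example: eventual consistency with a 90th-percentile SLO of 100 ms;
# tolerated_failures=1 is an illustrative value, not taken from the slides.
retwis_like = AppRequirements(Consistency.EVENTUAL, get_slo_ms=100, put_slo_ms=100,
                              slo_percentile=90, tolerated_failures=1)
```

Using `typing.Protocol` keeps the interface structural: any object with matching `get`/`put` methods satisfies it without inheriting from a base class.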

• 8

SPANStore Overview

[Figure: the App sends a request to the SPANStore library; the library performs metadata lookups, reads/writes data in data centers A, B, C, and D based on the optimal replication policy, and returns data/ACK to the App]
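The request path in the figure (metadata lookup, then read/write at the data centers chosen by the replication policy, then data/ACK back to the App) can be sketched as follows. This is a minimal illustration under assumptions: `Policy` and `LibrarySketch` are hypothetical names, the metadata is a plain dictionary rather than a replicated metadata service, and each data center's storage service is modeled as an in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Policy:
    """Replication policy for one key, as seen from the local data center (hypothetical)."""
    put_replicas: List[str]  # data centers a PUT is written to before returning ACK
    get_replica: str         # data center a GET reads from


class LibrarySketch:
    """Hypothetical client library sitting between the App and the storage services."""

    def __init__(self, metadata: Dict[str, Policy], stores: Dict[str, Dict[str, bytes]]):
        self.metadata = metadata  # key -> policy; stands in for the metadata lookup
        self.stores = stores      # data center -> that data center's storage service

    def put(self, key: str, value: bytes) -> str:
        policy = self.metadata[key]        # metadata lookup
        for dc in policy.put_replicas:     # write data per the replication policy
            self.stores[dc][key] = value
        return "ACK"                       # ACK returned to the App

    def get(self, key: str) -> Optional[bytes]:
        policy = self.metadata[key]        # metadata lookup
        return self.stores[policy.get_replica].get(key)  # data returned to the App


stores = {dc: {} for dc in "ABCD"}
lib = LibrarySketch({"user:42": Policy(put_replicas=["A", "C"], get_replica="C")}, stores)
lib.put("user:42", b"profile")
print(lib.get("user:42"))  # b'profile'
```

The application never names a data center itself; swapping the policy in `metadata` changes where data lives without changing application code.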

• 9

SPANStore Overview

[Figure: SPANStore and the App run in each of data centers A, B, C, and D. SPANStore reports the workload to the Placement Manager, which returns a replication policy. The Placement Manager's other inputs are the SPANStore characterization (inter-DC latencies and pricing policies) and the application input (latency, consistency, and fault-tolerance requirements).]

• 10

Outline

Problem and motivation

SPANStore overview

Techniques for reducing cost

Evaluation

Questions to be addressed for every object:
• Where to store replicas
• How to execute PUTs and GETs
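One way to make the two questions concrete is a per-object policy that records where replicas live and, for each client data center, which replicas a PUT must reach and which a GET must contact. This is a sketch under assumptions: the field names are hypothetical, and the consistency check uses the standard quorum-intersection condition (every GET set overlaps every PUT set), which the slides do not spell out.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical per-object policy answering the two questions above."""
    replicas: FrozenSet[str]             # where to store replicas
    put_sets: Dict[str, FrozenSet[str]]  # client DC -> replicas a PUT must reach
    get_sets: Dict[str, FrozenSet[str]]  # client DC -> replicas a GET must contact

    def is_well_formed(self) -> bool:
        """Every PUT/GET set only uses data centers that actually hold a replica."""
        sets = list(self.put_sets.values()) + list(self.get_sets.values())
        return all(s <= self.replicas for s in sets)

    def gets_see_latest_put(self) -> bool:
        """Quorum intersection: every GET set overlaps every PUT set."""
        return all(g & p for g in self.get_sets.values() for p in self.put_sets.values())


fs = frozenset
# Asymmetric sets: clients at A and C use different PUT/GET sets that still intersect.
strong = ReplicationPolicy(fs({"A", "B", "C"}),
                           put_sets={"A": fs({"A", "B"}), "C": fs({"B", "C"})},
                           get_sets={"A": fs({"A", "B"}), "C": fs({"B"})})
# Local-only reads and writes: cheap and fast, but GETs may miss recent PUTs.
local_only = ReplicationPolicy(fs({"A", "C"}),
                               put_sets={"A": fs({"A"}), "C": fs({"C"})},
                               get_sets={"A": fs({"A"}), "C": fs({"C"})})
print(strong.gets_see_latest_put(), local_only.gets_see_latest_put())  # True False
```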

• 12

Cloud Storage Service Cost

Storage service cost = Storage cost + Request cost + Data transfer cost

• Storage cost (the amount of data stored)
• Request cost (the number of PUT and GET requests issued)
• Data transfer cost (the amount of data transferred out of the data center)
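Written out for a single data center, with symbols introduced here only for illustration ($S$: GB stored; $N_{\text{PUT}}$, $N_{\text{GET}}$: request counts; $T_{\text{out}}$: GB transferred out; each $p$: the matching unit price), the decomposition reads as below. Summing over the set $D$ of data centers used gives the cost of a deployment. The formula assumes linear prices and ignores any volume tiers a provider may apply.

```latex
C_{\text{service}} = \underbrace{S \, p_{\text{store}}}_{\text{storage cost}}
  + \underbrace{N_{\text{PUT}} \, p_{\text{PUT}} + N_{\text{GET}} \, p_{\text{GET}}}_{\text{request cost}}
  + \underbrace{T_{\text{out}} \, p_{\text{xfer}}}_{\text{data transfer cost}},
\qquad
C_{\text{total}} = \sum_{d \in D} C_{\text{service}}(d)
```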

• 13

Low Latency SLO Requires High Replication in Single Cloud Deployment

[Figure: replicas (R) needed across AWS regions to meet a latency bound of 100 ms]

• 14

Technique 1: Harness Multiple Clouds

[Figure: replica (R) placement under a latency bound of 100 ms when data centers of multiple clouds, not only AWS regions, are candidates]
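The effect of adding candidate data centers can be illustrated as a covering problem: find the fewest replicas such that every client location has one within the latency bound. The sketch below uses exhaustive search, which is only practical for a handful of sites. All latencies and the `other-central` site (standing in for a data center of a second cloud provider) are hypothetical.

```python
from itertools import combinations

# Hypothetical latencies (ms) from client locations to candidate data centers.
LATENCY_MS = {
    "us-west": {"aws-west": 10, "aws-east": 110, "aws-eu": 150, "aws-asia": 120, "other-central": 50},
    "us-east": {"aws-west": 110, "aws-east": 10, "aws-eu": 90, "aws-asia": 200, "other-central": 50},
    "eu":      {"aws-west": 150, "aws-east": 90, "aws-eu": 10, "aws-asia": 180, "other-central": 95},
    "asia":    {"aws-west": 120, "aws-east": 200, "aws-eu": 180, "aws-asia": 10, "other-central": 160},
}


def min_replicas(latency_ms, candidates, bound_ms):
    """Smallest set of candidate data centers such that every client has a
    replica within bound_ms (exhaustive search over subsets of increasing size)."""
    clients = list(latency_ms)
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            if all(any(latency_ms[c][d] <= bound_ms for d in subset) for c in clients):
                return set(subset)
    return None  # the bound cannot be met with these candidates


AWS_ONLY = ["aws-west", "aws-east", "aws-eu", "aws-asia"]
print(sorted(min_replicas(LATENCY_MS, AWS_ONLY, 100)))
# ['aws-asia', 'aws-east', 'aws-west']
print(sorted(min_replicas(LATENCY_MS, AWS_ONLY + ["other-central"], 100)))
# ['aws-asia', 'other-central']
```

With one extra, centrally located site, two replicas satisfy the 100 ms bound instead of three, which is the density argument behind harnessing multiple clouds.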

• 15

Price Discrepancies across Clouds

Cloud region | Storage price ($/GB) | Data transfer price ($/GB) | GET request price ($/10,000 requests) | PUT request price ($/1,000 requests)

S3 US West | 0.095 | 0.12 | 0.004 | 0.005

Azure Zone2 | 0.095 | 0.19 | 0.001 | 0.0001

GCS | 0.085 | 0.12 | 0.01 | 0.01

… | … | … | … | …

Leveraging discrepancies judiciously can reduce cost
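To see why the cheapest provider depends on the workload, the sketch below prices two hypothetical workloads with the table's numbers, treating each provider as a single data center and applying storage cost + request cost + data transfer cost. The workload sizes are invented for illustration; real deployments also pay per-destination transfer prices that this sketch ignores.

```python
# Prices from the table above (USD), in the table's units.
PRICES = {
    "S3 US West":  {"storage_gb": 0.095, "transfer_gb": 0.12, "get_per_10k": 0.004, "put_per_1k": 0.005},
    "Azure Zone2": {"storage_gb": 0.095, "transfer_gb": 0.19, "get_per_10k": 0.001, "put_per_1k": 0.0001},
    "GCS":         {"storage_gb": 0.085, "transfer_gb": 0.12, "get_per_10k": 0.01,  "put_per_1k": 0.01},
}


def service_cost(p, gb_stored, gb_out, gets, puts):
    """Storage cost + request cost + data transfer cost for one data center."""
    storage = gb_stored * p["storage_gb"]
    requests = gets / 10_000 * p["get_per_10k"] + puts / 1_000 * p["put_per_1k"]
    transfer = gb_out * p["transfer_gb"]
    return storage + requests + transfer


def cheapest(gb_stored, gb_out, gets, puts):
    """Provider with the lowest total cost for this (hypothetical) workload."""
    return min(PRICES, key=lambda name: service_cost(PRICES[name], gb_stored, gb_out, gets, puts))


# Small objects, many requests: request prices dominate.
print(cheapest(gb_stored=1, gb_out=1, gets=1_000_000, puts=100_000))  # Azure Zone2
# Large objects, few requests: data transfer and storage prices dominate.
print(cheapest(gb_stored=100, gb_out=500, gets=10_000, puts=1_000))   # GCS
```

Azure's low request prices win the first workload, while its higher transfer price loses the second, so no single provider is cheapest for every workload.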

• 16

Range of Candidate Replication Policies

Strategy 1: single replica in cheapest storage cloud

High latencies

• 17


Strategy 2: few replicas to reduce latencies

High data transfer cost

• 18


Strategy 3: replicate everywhere

[Figure: a PUT propagated to replicas (R) in every data center]

High latency and cost of PUTs

High storage cost

Optimal replication policy depends on:

1. application requirements
2. workload properties

• 19

High Variability of Individual Objects

Estimate workload based on same hour in previous week

60% of hours have error higher than 50%

20% of hours have error higher than 100%

Error can be as high as 1000%

Analyze predictability of Twitter workload
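The estimator named on this slide, predicting each hour from the same hour one week earlier, and the error statistics quoted above can be computed as sketched below. The definition of error as |predicted − actual| / actual is an assumption made here; the slides do not define it. The request history is hypothetical.

```python
HOURS_PER_WEEK = 7 * 24


def predict_same_hour_last_week(hourly_counts, t):
    """Predict hour t's request count as the count at the same hour one week earlier."""
    return hourly_counts[t - HOURS_PER_WEEK]


def relative_errors(hourly_counts):
    """|predicted - actual| / actual for every hour after the first week
    (hours with no requests are skipped to avoid dividing by zero)."""
    errors = []
    for t in range(HOURS_PER_WEEK, len(hourly_counts)):
        actual = hourly_counts[t]
        if actual > 0:
            predicted = predict_same_hour_last_week(hourly_counts, t)
            errors.append(abs(predicted - actual) / actual)
    return errors


def fraction_above(errors, threshold):
    """Share of hours whose error exceeds threshold (0.5 means 50%)."""
    return sum(e > threshold for e in errors) / len(errors)


# Hypothetical object whose popularity triples in its second week.
history = [10] * HOURS_PER_WEEK + [30] * HOURS_PER_WEEK
errs = relative_errors(history)
print(fraction_above(errs, 0.5))  # 1.0: every hour is off by about 67%
```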

• 20

Technique 2: Aggregate Workload Prediction per Access Set

Observation: stability in aggregate workload
• Diurnal and weekly patterns

Classify objects by access set:
• Set of data centers from which object is accessed

Leverage application knowledge of sharing pattern
• Dropbox/Google Docs know users that share a file
• Facebook controls every user’s news feed
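Grouping objects by access set and summing their hourly request counts can be sketched as follows; the aggregated series, rather than per-object series, would then feed the same-hour-last-week estimate. The object ids, data center names, and counts are hypothetical. The example also shows how two individually bursty objects can add up to a flat, predictable aggregate.

```python
from collections import defaultdict


def aggregate_by_access_set(objects):
    """objects: iterable of (object_id, access_dcs, hourly_counts).
    Groups objects by the set of data centers they are accessed from and
    sums their hourly request counts element-wise (equal-length series assumed)."""
    totals = defaultdict(list)
    for _object_id, access_dcs, counts in objects:
        key = frozenset(access_dcs)  # the order of data centers does not matter
        current = totals[key] or [0] * len(counts)
        totals[key] = [a + b for a, b in zip(current, counts)]
    return dict(totals)


objects = [
    ("photo-1", ["us-east", "eu"], [0, 10, 0, 10]),  # busy on odd hours
    ("photo-2", ["eu", "us-east"], [10, 0, 10, 0]),  # busy on even hours
    ("doc-7",   ["asia"],          [5, 5, 5, 5]),
]
agg = aggregate_by_access_set(objects)
print(agg[frozenset({"us-east", "eu"})])  # [10, 10, 10, 10]
```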

• 21

Technique 2: Aggregate Workload Prediction per Access Set

Aggregate workload is more stable and predictable

Estimate workload based on same hour in previous week

• 22

Optimizing Cost for GETs and PUTs

[Figure: a GET served by one of several replicas (R)]

Use data centers whose request + data transfer prices are cheap
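For GETs, the per-request cost of a replica is its GET request price plus its data transfer price times the object size, so the cheapest replica changes with object size. The sketch below picks, among replicas that meet the GET SLO, the one with the lowest such cost. Prices come from the earlier table; latencies are hypothetical, and PUTs would be handled analogously with PUT prices.

```python
# Prices from the earlier table: GET requests per 10,000, data transfer per GB.
PRICES = {
    "S3 US West":  {"get_per_10k": 0.004, "transfer_gb": 0.12},
    "Azure Zone2": {"get_per_10k": 0.001, "transfer_gb": 0.19},
    "GCS":         {"get_per_10k": 0.01,  "transfer_gb": 0.12},
}


def get_cost(replica, object_gb):
    """Request price plus the price of shipping the object out of the replica's data center."""
    p = PRICES[replica]
    return p["get_per_10k"] / 10_000 + p["transfer_gb"] * object_gb


def cheapest_get_replica(replicas, latency_ms, slo_ms, object_gb):
    """Among replicas that meet the GET SLO, pick the one with the lowest cost."""
    feasible = [r for r in replicas if latency_ms[r] <= slo_ms]
    if not feasible:
        return None
    return min(feasible, key=lambda r: get_cost(r, object_gb))


replicas = ["S3 US West", "Azure Zone2", "GCS"]
latency_ms = {"S3 US West": 40, "Azure Zone2": 60, "GCS": 70}  # hypothetical
print(cheapest_get_replica(replicas, latency_ms, 100, object_gb=1e-6))  # Azure Zone2 (1 KB)
print(cheapest_get_replica(replicas, latency_ms, 100, object_gb=0.1))   # S3 US West (100 MB)
print(cheapest_get_replica(replicas, latency_ms, 50, object_gb=1e-6))   # S3 US West (only one within 50 ms)
```

Small objects favor the lowest request price, large objects favor the lowest transfer price, and the latency SLO can override both.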

• 23

Technique 3: Relay Propagation

[Figure: a PUT propagated to replicas (R); per-GB data transfer prices shown on the paths: 0.25$, 0.19$, 0.2$, 0.19$, 0.12$]

Asynchronous propagation (no latency constraint)
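Relay propagation compares sending a copy of the PUT from the writer to every replica with sending one copy to a relay replica whose outbound transfers are cheaper and letting it forward the rest. The sketch below assumes a per-GB price for each directed hop and, for simplicity, only considers relays that are themselves replicas. All data center names and prices are hypothetical.

```python
def propagation_cost(writer, replicas, price_per_gb, size_gb, relay=None):
    """Transfer cost of copying one PUT from `writer` to every replica.

    relay=None: the writer sends a copy directly to each replica.
    relay=x:    the writer sends one copy to replica x, which forwards it
                to the remaining replicas.
    """
    if relay is None:
        hops = [(writer, r) for r in replicas]
    else:
        hops = [(writer, relay)] + [(relay, r) for r in replicas if r != relay]
    return sum(price_per_gb[hop] for hop in hops) * size_gb


def cheapest_propagation(writer, replicas, price_per_gb, size_gb):
    """Return (relay or None, cost) with the lowest transfer cost; no latency constraint."""
    options = [None] + list(replicas)
    costs = {opt: propagation_cost(writer, replicas, price_per_gb, size_gb, opt) for opt in options}
    best = min(options, key=costs.__getitem__)
    return best, costs[best]


# Hypothetical per-GB prices for each directed hop.
PRICE_PER_GB = {
    ("W", "X"): 0.25, ("W", "Y"): 0.25, ("W", "Z"): 0.25,
    ("X", "Y"): 0.12, ("X", "Z"): 0.12,
    ("Y", "X"): 0.19, ("Y", "Z"): 0.19,
    ("Z", "X"): 0.20, ("Z", "Y"): 0.20,
}
best, cost = cheapest_propagation("W", ["X", "Y", "Z"], PRICE_PER_GB, size_gb=1.0)
print(best)
# relay via X: 0.25 + 2 * 0.12 = 0.49 per GB, versus 3 * 0.25 = 0.75 for direct copies
```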

• 24

Technique 3: Relay Propagation

[Figure: the same PUT and per-GB data transfer prices (0.25$, 0.19$, 0.2$, 0.19$, 0.12$) to replicas (R); one relayed path is marked "Violate SLO"]

Asynchronous propagation (no latency constraint)

Synchronous propagation (bounded by latency SLO)
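For replicas that must be updated synchronously, a cheaper relayed path is only usable if its total latency still fits within the SLO. The sketch below evaluates each synchronous replica in isolation and chooses between the direct route and routes through a single relay. Prices, latencies, and data center names are hypothetical.

```python
def route_metrics(route, price_per_gb, latency_ms):
    """Total per-GB price and total latency along a route of data centers."""
    hops = list(zip(route, route[1:]))
    cost = sum(price_per_gb[h] for h in hops)
    latency = sum(latency_ms[h] for h in hops)
    return cost, latency


def cheapest_sync_route(writer, target, relays, price_per_gb, latency_ms, slo_ms):
    """Cheapest route for a PUT that must reach `target` within slo_ms.
    Candidate routes: direct, or through exactly one relay data center."""
    routes = [(writer, target)] + [(writer, x, target) for x in relays if x not in (writer, target)]
    feasible = []
    for route in routes:
        cost, latency = route_metrics(route, price_per_gb, latency_ms)
        if latency <= slo_ms:  # a relayed path that is too slow would violate the SLO
            feasible.append((cost, route))
    if not feasible:
        return None
    cost, route = min(feasible)
    return route, cost


# Hypothetical values: relaying via X is cheaper but slower than going direct.
PRICE_PER_GB = {("W", "Z"): 0.25, ("W", "X"): 0.02, ("X", "Z"): 0.12}
LATENCY_MS = {("W", "Z"): 80, ("W", "X"): 70, ("X", "Z"): 60}
print(cheapest_sync_route("W", "Z", ["X"], PRICE_PER_GB, LATENCY_MS, slo_ms=100)[0])  # ('W', 'Z')
print(cheapest_sync_route("W", "Z", ["X"], PRICE_PER_GB, LATENCY_MS, slo_ms=150)[0])  # ('W', 'X', 'Z')
```

With a 100 ms SLO the 130 ms relayed path is rejected even though it costs less; loosening the bound makes the relay the cheapest valid choice.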

• 25

Summary

Insights to reduce cost
• Multi-cloud deployment
• Use aggregate workload per access set
• Relay propagation

Placement manager uses ILP to combine insights

Other techniques
• Metadata management
• Two-phase locking protocol
• Asymmetric quorum sets
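The placement manager solves an ILP. As a stand-in that shows the same trade-off on a tiny instance, the sketch below enumerates every replica set for one access set, prices it (storage on each replica, each GET served by the cheapest replica within the GET SLO, each PUT written to every replica), and keeps the cheapest feasible one. This is exhaustive search, not an ILP. The cost model is a simplification, fault tolerance is approximated by a minimum replica count, and all prices, latencies, and workload numbers are hypothetical.

```python
from itertools import combinations

# Hypothetical prices per data center (per request, per GB).
PRICES = {
    "A": {"storage_gb": 0.10, "get": 1e-6, "put": 1e-5, "transfer_gb": 0.10},
    "B": {"storage_gb": 0.10, "get": 1e-6, "put": 1e-5, "transfer_gb": 0.10},
    "C": {"storage_gb": 0.05, "get": 1e-6, "put": 1e-5, "transfer_gb": 0.10},
}
_PAIRS = {("A", "B"): 30, ("A", "C"): 120, ("B", "C"): 120}  # hypothetical latencies (ms)


def latency(a, b):
    return 5 if a == b else _PAIRS.get((a, b), _PAIRS.get((b, a)))


def policy_cost(replicas, wl, get_slo_ms):
    """Cost of one access set's data on `replicas`; None if some GET SLO is missed."""
    cost = sum(wl["gb_stored"] * PRICES[r]["storage_gb"] for r in replicas)
    for dc, n in wl["gets"].items():
        nearby = [r for r in replicas if latency(dc, r) <= get_slo_ms]
        if not nearby:
            return None
        # each GET is served by the cheapest replica that meets the SLO
        cost += n * min(PRICES[r]["get"] + PRICES[r]["transfer_gb"] * wl["object_gb"] for r in nearby)
    for dc, n in wl["puts"].items():
        # each PUT is written to every replica; copies leaving `dc` pay its egress price
        for r in replicas:
            egress = 0 if r == dc else PRICES[dc]["transfer_gb"] * wl["object_gb"]
            cost += n * (PRICES[r]["put"] + egress)
    return cost


def best_policy(wl, get_slo_ms, min_replicas=1):
    """Exhaustively try every replica set with at least min_replicas members."""
    best = None
    for k in range(min_replicas, len(PRICES) + 1):
        for reps in combinations(sorted(PRICES), k):
            cost = policy_cost(reps, wl, get_slo_ms)
            if cost is not None and (best is None or cost < best[1]):
                best = (set(reps), cost)
    return best


workload = {"gb_stored": 1, "object_gb": 0.001,
            "gets": {"A": 1000, "B": 1000}, "puts": {"A": 10}}
print(sorted(best_policy(workload, get_slo_ms=50)[0]))                  # ['A']
print(sorted(best_policy(workload, get_slo_ms=50, min_replicas=2)[0]))  # ['A', 'C']
```

Requiring a second replica does not simply add the nearest site: the cheap-storage site C is chosen because GETs are still served from A within the SLO.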

• 26

Outline

Problem and motivation

SPANStore overview

Techniques for reducing cost

Evaluation

• 27

Evaluation

Scenario
• Application is deployed on EC2
• SPANStore is deployed across S3, Azure and GCS

Simulations to evaluate cost savings

Deployment to verify application requirements
• Retwis
• ShareJS

• 28

Simulation Settings

Compare SPANStore against
• Replicate everywhere
• Single replica
• Single cloud deployment

Application requirements
• Sequential consistency
• PUT SLO: the minimum SLO that replicate everywhere can satisfy
• GET SLO: the minimum SLO that single replica can satisfy
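The two SLO settings can be derived from a latency matrix. The sketch assumes replicate everywhere uses write-all PUTs, so a PUT completes only when the farthest data center acknowledges, and that the single replica is placed where the worst client's latency is smallest. It also uses single latency values instead of the percentile measurements a real evaluation would use. The matrix is hypothetical.

```python
_RTT = {("A", "B"): 30, ("A", "C"): 120, ("B", "C"): 100}  # hypothetical latencies (ms)


def latency(a, b):
    if a == b:
        return 5
    return _RTT[(a, b)] if (a, b) in _RTT else _RTT[(b, a)]


def min_put_slo_replicate_everywhere(clients, dcs):
    """Write-all assumption: a PUT from c finishes when the farthest replica acknowledges."""
    return max(max(latency(c, d) for d in dcs) for c in clients)


def min_get_slo_single_replica(clients, dcs):
    """Place the single replica where the worst client's latency is smallest."""
    return min(max(latency(c, d) for c in clients) for d in dcs)


dcs = clients = ["A", "B", "C"]
print(min_put_slo_replicate_everywhere(clients, dcs))  # 120
print(min_get_slo_single_replica(clients, dcs))        # 100
```

Setting the SLOs this way guarantees that each baseline is feasible, so the comparison measures cost rather than whether a baseline can meet the bound at all.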

• 29

SPANStore Enables Cost Savings across Disparate Workloads

#1: big objects, more GETs (lots of data transfers from replicas): savings by reducing data transfer

#2: big objects, more PUTs (lots of data transfers to replicas): savings by relay propagation

#3: small objects, more GETs (lots of GET requests): savings by price discrepancy of GET requests

#4: small objects, more PUTs (lots of PUT requests): savings by price discrepancy of PUT requests

• 30

Deployment Settings

Retwis
• Scaled-down Twitter workload
• GET: read timeline
• PUT: make post
• Insert: read follower’s timeline and append post to it

Requirements:
• Eventual consistency
• 90th-percentile PUT/GET SLO = 100 ms
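The three Retwis operations map onto GET/PUT as sketched below; Insert is a read-modify-write (a GET of the follower's timeline followed by a PUT of the extended timeline). `InMemoryKV` and the key naming scheme are hypothetical stand-ins for illustration, not SPANStore's or Retwis's actual code.

```python
class InMemoryKV:
    """Hypothetical stand-in for a GET/PUT key-value store."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


def read_timeline(kv, user):
    """GET: read a user's timeline (a list of post ids)."""
    return kv.get(f"timeline:{user}", [])


def make_post(kv, author, post_id, text, followers):
    """PUT the post, then Insert it into each follower's timeline."""
    kv.put(f"post:{post_id}", (author, text))
    for follower in followers:
        # Insert = GET the follower's timeline, then PUT it back with the post appended
        timeline = read_timeline(kv, follower)
        kv.put(f"timeline:{follower}", timeline + [post_id])


kv = InMemoryKV()
make_post(kv, "alice", 1, "hello", followers=["bob", "carol"])
print(read_timeline(kv, "bob"))  # [1]
```

Because Insert is a read-modify-write, two concurrent Inserts into the same timeline could overwrite each other under eventual consistency; this sketch does not guard against that.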

• 31

SPANStore Meets SLOs

[Figure: latencies at the 90th percentile compared against the SLO and the Insert SLO]
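Checking a 90th-percentile SLO means comparing the 90th-percentile latency, not the mean, against the bound. The sketch below uses the nearest-rank definition of percentile (one common choice; the slides do not say which definition was used). The latency samples are hypothetical.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample with at least q% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def meets_slo(latencies_ms, slo_ms, q=90):
    """True if the q-th percentile latency is within the SLO."""
    return percentile(latencies_ms, q) <= slo_ms


latencies = [20] * 85 + [60] * 10 + [400] * 5  # hypothetical measurements (ms)
print(percentile(latencies, 90))         # 60
print(meets_slo(latencies, slo_ms=100))  # True
```

The five 400 ms outliers pull the mean up to 43 ms but leave the 90th percentile at 60 ms, which is why percentile SLOs tolerate a small tail of slow requests.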

• 32

Conclusions

SPANStore
• Minimize cost while satisfying latency, consistency, and fault-tolerance requirements

Use multiple cloud providers for greater data center density and pricing discrepancies

Judiciously determine replication policy based on workload properties and application needs
