A Novel Data Placement Model for Highly-Available Storage Systems

Rama, Microsoft Research joint work with John MacCormick, Nick Murphy, Kunal Talwar, Udi Wieder, Junfeng Yang, and Lidong Zhou A Novel Data Placement Model for Highly-Available Storage Systems


Page 1: A Novel Data Placement Model for Highly-Available Storage ......Kinesis: A data/replica placement framework for LAN storage systems Structured organization, freedom-of-choice, and

Rama, Microsoft Research

joint work with

John MacCormick, Nick Murphy, Kunal Talwar,

Udi Wieder, Junfeng Yang, and Lidong Zhou

A Novel Data Placement Model for Highly-Available Storage Systems


Introduction

Kinesis:

Framework for placing data and replicas in a data-center storage system

Three Design Principles:

Structured organization

Freedom of choice

Scattered placement


Kinesis Design => Structured Organization

Segmentation: Divide storage servers into k segments

Each segment is an independent hash-based system

Failure Isolation: Servers with shared components in the same segment

r replicas per item, each on a different segment

Reduces impact of correlated failures


Kinesis Design => Freedom of Choice

Balanced Resource Usage:

Multiple-choice paradigm

Write: r out of k choices

Read: 1 out of r choices
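
The write and read choices above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `KinesisSketch` class, the salted SHA-256 hash, the modulo server selection within a segment, and the use of stored bytes as the load metric are all assumptions made for the example.

```python
import hashlib

def segment_hash(key: str, segment: int, n: int) -> int:
    """Hash key into [0, n) with a per-segment salt (assumed scheme)."""
    digest = hashlib.sha256(f"{segment}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % n

class KinesisSketch:
    def __init__(self, k: int, r: int, servers_per_segment: int):
        if not 1 <= r <= k:
            raise ValueError("need 1 <= r <= k")
        self.k, self.r, self.n = k, r, servers_per_segment
        # stored[s][i]: bytes held by server i of segment s
        self.stored = [[0] * self.n for _ in range(k)]
        # holds[s][i]: keys replicated on server i of segment s
        self.holds = [[set() for _ in range(self.n)] for _ in range(k)]

    def candidates(self, key: str):
        """One candidate server per segment, so k choices in total."""
        return [(s, segment_hash(key, s, self.n)) for s in range(self.k)]

    def _load(self, server):
        s, i = server
        return self.stored[s][i]

    def write(self, key: str, size: int):
        """Write: place replicas on the r least-loaded of the k candidates."""
        chosen = sorted(self.candidates(key), key=self._load)[: self.r]
        for s, i in chosen:
            self.stored[s][i] += size
            self.holds[s][i].add(key)
        return chosen

    def read(self, key: str):
        """Read: pick 1 of the (up to r) candidates that hold a replica.
        Stored bytes stand in for request load here (an assumption)."""
        holders = [(s, i) for s, i in self.candidates(key) if key in self.holds[s][i]]
        return min(holders, key=self._load) if holders else None
```

Because each candidate comes from a different segment, the r replicas of an item always land on r distinct segments.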


Kinesis Design => Scattered Data Placement

Independent hash functions

Each server stores a different set of items

Parallel Recovery

Spread recovery load among multiple servers uniformly

Recover faster
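
To see why scattered placement helps recovery, the sketch below counts how many distinct servers hold the surviving replicas of a failed server's items, under chained placement and under independent per-segment hashing. It is only an illustration: the function names, the salted SHA-256 hash, the 15-server layout, and the simplification of one replica per segment (no freedom of choice) are assumptions, not details from the slides.

```python
import hashlib

def h(key: str, salt, n: int) -> int:
    """Deterministic salted hash into [0, n) (assumed scheme)."""
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % n

def chain_replicas(key: str, n_servers: int, r: int):
    """Chained placement: primary by hash, replicas on the next r-1 servers."""
    primary = h(key, "chain", n_servers)
    return [(primary + j) % n_servers for j in range(r)]

def scattered_replicas(key: str, segments: int, per_segment: int):
    """Scattered placement: an independent hash function per segment."""
    return [(s, h(key, s, per_segment)) for s in range(segments)]

def recovery_sources(placements, failed):
    """Servers that hold another replica of any item stored on `failed`."""
    sources = set()
    for replicas in placements:
        if failed in replicas:
            sources.update(x for x in replicas if x != failed)
    return sources

keys = [f"item-{i}" for i in range(1000)]
chain_src = recovery_sources([chain_replicas(k, 15, 3) for k in keys], failed=0)
scattered_src = recovery_sources(
    [scattered_replicas(k, 3, 5) for k in keys], failed=(0, 0)
)
print(len(chain_src), "source servers with chained placement")
print(len(scattered_src), "source servers with scattered placement")
```

With chained placement and three replicas, the surviving copies can only sit on the two neighbors on either side of the failed server, whereas scattered placement lets many more servers share the recovery work.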


Motivation => Data Placement Models

Hash-based:

Data is stored on a server determined by a hash function

Server identified by a local computation during reads

Low overhead

Limited control in placing data items and replicas

Directory-based:

Any server can store any data

A directory provides the server to fetch data from

Expensive to maintain a globally-consistent directory in a large-scale system

Can place data items carefully to maintain load balance, avoid correlated failure, etc.


Kinesis => Advantages

Enables hash-based storage systems to have advantages of directory-based systems

Near-optimal load balance

Tolerance to shared-component failures

Freedom from problems induced by correlated replica placement

No bottlenecks induced by directory maintenance

Avoids slow data/replica placement algorithms


Evaluation

Theory

Simulations

Experiments


Theory => The “Balls and Bins” Problem

Load balancing is often modeled by the task of throwing balls (items) into bins (servers)

Throw m balls into n bins:

Pick a bin uniformly at random

Insert the ball into the bin

Single-Choice Paradigm:

Max Load:(with high prob.)


Theory => Multiple-Choice Paradigm

Throw m balls into n bins:

Pick d bins uniformly at random (d ≥ 2)

Insert the ball into the less-loaded bin

Max Load: (with high prob.)

Excess load is independent of m, number of balls! [BCSV00]
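
A quick simulation illustrates the gap between the single-choice and multiple-choice paradigms. This is a sketch under stated assumptions: the function name `max_excess_load`, the fixed seed, and sampling the d bins with replacement are choices made for the example, not details from the slides.

```python
import random

def max_excess_load(m: int, n: int, d: int, seed: int = 0) -> float:
    """Throw m balls into n bins, each ball going to the least-loaded of
    d bins sampled uniformly at random (with replacement).
    Returns the maximum load minus the average load m / n."""
    rng = random.Random(seed)
    bins = [0] * n
    for _ in range(m):
        choices = [rng.randrange(n) for _ in range(d)]
        target = min(choices, key=bins.__getitem__)
        bins[target] += 1
    return max(bins) - m / n

if __name__ == "__main__":
    for m in (2_000, 20_000):
        single = max_excess_load(m, 100, d=1)
        double = max_excess_load(m, 100, d=2)
        print(f"m={m}: single-choice excess {single:.1f}, two-choice excess {double:.1f}")
```

With d = 1 the excess over the average grows as more balls are thrown, while with d = 2 it stays small, which is consistent with the [BCSV00] observation that the excess load does not depend on m.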


Theory vs. Practice

Storage load vs. network load: [M91]

Network load, unlike storage load, is not persistent

Non-uniform sampling: [V99],[W07]

Servers chosen based on consistent (or linear) hashing

Replication: [MMRWYZ08]

Choose r out of k servers instead of 1 out of d bins

Heterogeneity:

Variable-sized servers [W07]

Variable-sized items [TW07]


Theory => Planned Expansions

Adding new, empty servers into the system

Creates sudden, large imbalances in load

Empty servers vs. filled servers

Eventual storage balance

Do subsequent inserts/writes fill up new servers?

Eventual network load balance

Are new items distributed between old and new servers?

New items are often more popular than old items


Theory => Planned Expansions

Expanding a linear-hashing system:

[Figure: hash address space with labels 0, 2^(b-1), and 2^b, shown for N = 2, N = 4, and N = 6 servers]
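
The slides name linear hashing but do not spell out the address mapping. The sketch below uses the textbook linear-hashing rule as an assumption: hash into a space of size 2^b, and fold any slot that has no server yet back onto the lower half. The function name `linear_hash_server` is hypothetical.

```python
def linear_hash_server(hash_value: int, num_servers: int) -> int:
    """Map a hash value to a server index in [0, num_servers) using the
    textbook linear-hashing rule (assumed, not taken from the slides)."""
    if num_servers < 1:
        raise ValueError("need at least one server")
    b = (num_servers - 1).bit_length()   # smallest b with 2**b >= num_servers
    slot = hash_value % (1 << b)
    if slot >= num_servers:
        # This slot's server has not been added yet: fall back to the
        # server it will eventually be split from.
        slot -= 1 << (b - 1)
    return slot

if __name__ == "__main__":
    for n in (2, 4, 6):
        share = [0] * n
        for v in range(8):
            share[linear_hash_server(v, n)] += 1
        print(f"N = {n}: hash-space share per server (out of 8) = {share}")
```

For N = 6 this gives servers 2 and 3 twice the hash space of servers 0, 1, 4, and 5: the servers that have not yet been split carry double the share of the split servers and the new servers.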


Theory => Planned Expansions

Before Expansion:

Assume there are kn servers on k segments (n per segment)

Storage is evenly balanced

Expansion:

Add αn new servers to each segment

After expansion, in each segment:

R = (1 - α)n servers with twice the load of the L = 2αn servers (new servers + split servers)


Theory => Planned Expansions

Relative Load Distribution: RL

Ratio of expected number of replicas inserted in L servers

to expected number of replicas inserted on all servers

RL > 1 => eventual storage balance!

RL < 1 => no eventual storage balance

Theorem: RL > 1 if k ≥ 2 r [MMRWYZ08]
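
The following sketch gives one way to get a feel for the theorem. It rests on assumptions that the slides do not state: each L server owns half the hash space of an R server (so a segment's candidate is an L server with probability α), a less-loaded L candidate is always preferred over an R candidate, and the ratio is normalized by the fraction of servers that are L servers, 2α / (1 + α). The function name `relative_load` is hypothetical.

```python
from math import comb

def relative_load(k: int, r: int, alpha: float) -> float:
    """Share of new replicas landing on L servers divided by the share of
    servers that are L servers (normalization assumed for this sketch)."""
    # J = number of the k candidates that are L servers, J ~ Binomial(k, alpha).
    # With L candidates always preferred, min(J, r) replicas go to L servers.
    expected_on_l = sum(
        min(j, r) * comb(k, j) * alpha**j * (1 - alpha) ** (k - j)
        for j in range(k + 1)
    )
    replica_share = expected_on_l / r
    server_share = 2 * alpha / (1 + alpha)
    return replica_share / server_share

if __name__ == "__main__":
    for k, r in ((3, 3), (6, 3), (7, 3)):
        print(k, r, [round(relative_load(k, r, a), 2) for a in (0.1, 0.5, 0.9)])
```

Under these assumptions the ratio approaches k / (2r) for small α, which is consistent with the k ≥ 2r threshold in the theorem, although the exact definition in [MMRWYZ08] may differ.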


Evaluation

Theory

Simulations

Experiments


Simulations => Overview

Real-World Traces:

Trace     Num Files      Total Size    Type
MSNBC     30,656         2.33 TB       Read only
Plan-9    1.9 Million    173 GB        Read/write

Compare with Chain:

one segment

single-choice paradigm

chained replica placement

E.g. PAST, CFS, Boxwood, Petal


Simulations => Load Balance

[Figure: Storage: Max - Avg (GB) for Chain(3) and Kinesis(7,3), in two panels. MSNBC Trace: x-axis Number of Objects (0 to 30,000). Plan-9 Trace: x-axis Number of Operations (10^6), 0 to 3.]


Simulations => System Provisioning

[Figure: bar chart of Percentage Overprovisioning (y-axis, 0% to 60%) for K (5), K (6), K (7), K (8), K (9), K (10), and Chain; bar labels in extraction order: 45.0%, 35.0%, 29.0%, 26.0%, 23.0%, 20.0%, 52.0%]


Simulations => User Experienced Delays

[Figure: 90th pc. Queuing Delay (sec), 0 to 10, vs. Number of Servers, 0 to 500, for Chain (3) and Kinesis (7,3)]


Simulations => Read Load vs. Write Load

Read load: short term resource consumption

Network bandwidth, computation, disk bandwidth

Directly impacts user experience

Write load: short and long term resource consumption

Storage space

Solution (Kinesis S+L): Choose short term over long term

Storage balance is restored when transient resources are not bottlenecked
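
One plausible reading of "choose short term over long term" is a lexicographic ordering: rank candidates by queued requests first and break ties by storage used. The sketch below shows that reading only. The slides do not give the actual S+L rule, so the `ServerState` type, its fields, and `choose_s_plus_l` are hypothetical.

```python
from typing import List, NamedTuple

class ServerState(NamedTuple):
    name: str
    queued: int   # short-term load: outstanding requests
    stored: int   # long-term load: bytes stored

def choose_s_plus_l(candidates: List[ServerState], r: int) -> List[ServerState]:
    """Pick r candidates, preferring low request load, then low storage.
    (Assumed policy: short-term load dominates, storage breaks ties.)"""
    return sorted(candidates, key=lambda s: (s.queued, s.stored))[:r]

if __name__ == "__main__":
    candidates = [
        ServerState("A", queued=5, stored=1),
        ServerState("B", queued=0, stored=9),
        ServerState("C", queued=0, stored=3),
        ServerState("D", queued=2, stored=0),
    ]
    print([s.name for s in choose_s_plus_l(candidates, 2)])
```

When no server has queued requests, every candidate ties on the first component and the choice falls back to storage balance, which matches the slide's remark that storage balance is restored when transient resources are not bottlenecked.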


Simulations => Read Load vs. Write Load

[Figure: Queuing Delay: Max - Avg (ms), 0 to 200, vs. Update to Read Ratio (%), 0 to 100, for Chain, Kinesis S, Kinesis S+L, and Kinesis L]


Simulations => Read Load vs. Write Load

[Figure: Storage Imbalance (%), 0% to 15%, and Request Rate (Gb/s), 0.0 to 1.5, vs. Time (hours), 0 to 24; series: Request Rate, Kinesis S+L, Kinesis S]


Evaluation

Theory

Simulations

Experiments


Kinesis => Prototype Implementation

Storage system for variable-sized objects

Read/Write interface

Storage servers

Linear hashing system

Read, Append, Exists, Size (storage), Load (queued requests)

Failure recovery for primary replicas

Primary for an item is the highest-numbered server with a replica


Kinesis => Prototype Implementation

Front end

Two-step read protocol

1) Query k candidate servers

2) Fetch from least loaded server with a replica

Versioned updates with copy-on-write semantics

RPC-based communication with servers

Not implemented:

Failure detector

Failure-consistent updates
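
The two-step read protocol above can be sketched as follows. The method names echo the Exists, Load, and Read operations listed for the storage servers, but their signatures, the `StorageServer` class, and the in-process calls standing in for RPCs are assumptions made for this example.

```python
class StorageServer:
    """In-memory stand-in for a storage server (hypothetical)."""

    def __init__(self, name: str):
        self.name = name
        self.objects = {}   # key -> bytes
        self.queued = 0     # number of queued requests

    def exists(self, key: str) -> bool:
        return key in self.objects

    def load(self) -> int:
        return self.queued

    def read(self, key: str) -> bytes:
        return self.objects[key]

def two_step_read(key: str, candidates):
    """Step 1: query all k candidate servers for the item.
    Step 2: fetch from the least-loaded server that holds a replica."""
    holders = [s for s in candidates if s.exists(key)]
    if not holders:
        raise KeyError(key)
    target = min(holders, key=lambda s: s.load())
    return target.name, target.read(key)

if __name__ == "__main__":
    servers = [StorageServer(f"s{i}") for i in range(4)]
    for s, q in zip(servers, (3, 1, 0, 2)):
        s.queued = q
    servers[0].objects["photo"] = b"data"
    servers[1].objects["photo"] = b"data"
    servers[3].objects["photo"] = b"data"
    print(two_step_read("photo", servers))
```

Server s2 is the least loaded overall but holds no replica, so the read goes to s1, the least loaded of the servers that do.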


Kinesis => Experiments

15 node LAN test-bed

14 storage servers and 1 front end

MSNBC trace of 6 hours duration

5000 files, 170 GB total size, and 200,000 reads

Failure induced at 3 hours


Experiments => Kinesis vs. Chain

[Figure: per-server bar chart, servers 1 to 14, comparing Kinesis(7,3) and Chain(3). Bar labels as extracted, series not identified:
10.7 10.7 10.2 10.6 10.3 10.8 14.0 17.8 17.4 14.3 10.9 10.7 10.4 10.6
11.9 11.4 11.7 11.9 11.5 11.9 11.6 12.4 12.5 12.9 13.4 12.7 12.3 11.4]

Average Latency

Kinesis: 1.73 sec

Chain: 3 sec


Experiments => Failure Recovery

[Figure: per-server recovery chart, servers 1 to 14, for Kinesis(7,3) and Chain(3)]

Total Recovery Time

Kinesis: 17 min (12 servers)

Chain: 44 min (5 servers)


Related Work

Prior work:

Hashing: Archipelago, OceanStore, PAST, CFS…

Two-choice paradigm: common load balancing technique

Parallel recovery: Chain Replication [RS04]

Random distribution: Ceph [WBML06]

Our contributions:

Putting them together in the context of storage systems

Extending theoretical results to the new design

Demonstrating the power of these simple design principles


Summary and Conclusions

Kinesis: A data/replica placement framework for LAN storage systems

Structured organization, freedom-of-choice, and scattered distribution

Simple and easy to implement, yet quite powerful!

Reduces infrastructure cost, tolerates correlated failures, and quickly restores lost replicas

Soon to appear in ACM Transactions on Storage, 2008

Download: http://research.microsoft.com/users/rama


Experiments => Kinesis vs. Chain

[Figure: Average Latency (sec), 0 to 15, vs. Time (hours), 0 to 3, for Chain (3) and Kinesis (7,3)]


Experiments => Kinesis vs. Chain

[Figure: Server Load: Max - Avg and Request Rate (both axes 0 to 50) vs. Time (hours), 0 to 3; series: Total Load, Chain (3), Kinesis (7,3)]