A Distributed Storage System Allowing Application Users to Reserve I/O Performance in Advance for Achieving SLA

Yusuke Tanimura, Hidetaka Koie, Tomohiro Kudoh, Isao Kojima, and Yoshio Tanaka
National Institute of AIST, Japan
The 11th ACM/IEEE International Conference on Grid Computing
October 26-28, 2010, Brussels, Belgium



2

On Grids/Clouds
• Importance of Service Level Agreement (SLA)
  – A contract between users and the service providers
    • End-to-end performance, reliability, etc.
• I/O performance of the storage system tends to be a critical bottleneck.
  – The bandwidth can be guaranteed by recent network technologies such as the lambda path.
[Figure: a service reaching storage over a best-effort BW or guaranteed BW network. Question posed for the storage: performance guaranteed?]

3

On-going Studies
• QoS of parallel I/O in a distributed storage
  – Focused on scheduling and resource allocation
[Figure labels: Application, I/O library, Local I/O scheduler, Storage servers, Storage client, Storage NW, Broker, Requirements, Analyzed behaviors, Automatic translation and performance reservation, I/O control technologies]
However, resources are assigned to each application on a first-come (open request), first-served basis.

4

Our Approach
• The storage system allows application users to reserve I/O performance in advance.
  – Explicit throughput (MB/sec) reservation
• In advance reservation, there is room to negotiate the contract.
  – Financial charge to a request
• Features of our design and implementation:
  – A distributed storage which supports:
    • Advance performance reservation
      – User interfaces, protocols, resource allocation, etc.
    • Striping I/O with QoS according to the reservation
      – Integration of I/O control techniques
  – Cooperation with the network bandwidth reservation and the computing resource reservation

5

Assumptions and Definitions (1)
• Assumed I/O workload (in our current focus)
  – Streaming type for a large amount of data
  – Not a mixture of read & write at a single access
    • Open for read-only, create or append-only
• Space reservation
  – Cooperation with write performance reservation
  – Reserved space = “Bucket”
    • User’s private space
      – Name
      – Start and end time
      – Space size
      – Guaranteed throughput
        » Read
        » Write
  – Stored data = “Object”
[Figure: time transition showing the bucket lifetime and the object lifetime. Label: Object creation is allowed.]
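The bucket attributes listed above can be pictured as a small record. The following is a minimal sketch in Python, not Papio's actual data structure: the class name `Bucket`, its field names, and the method `allows_object_creation` are illustrative assumptions, as is the rule that objects may be created only during the bucket lifetime.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Bucket:
    """Hypothetical record for a reserved space ("bucket")."""
    name: str
    start: datetime
    end: datetime
    size_mb: int
    read_mb_per_sec: Optional[float] = None   # guaranteed read throughput, if reserved
    write_mb_per_sec: Optional[float] = None  # guaranteed write throughput, if reserved

    def __post_init__(self) -> None:
        if self.end <= self.start:
            raise ValueError("bucket end time must be after its start time")
        if self.size_mb <= 0:
            raise ValueError("bucket size must be positive")

    def allows_object_creation(self, now: datetime) -> bool:
        # Assumption: objects may be created only during the bucket lifetime.
        return self.start <= now < self.end


bucket = Bucket("video-archive",
                datetime(2010, 10, 26, 9), datetime(2010, 10, 26, 17),
                size_mb=10_000, write_mb_per_sec=80.0)
print(bucket.allows_object_creation(datetime(2010, 10, 26, 12)))  # True
```

Both throughput fields are optional here because a bucket may carry a read guarantee, a write guarantee, both, or neither.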

6

Assumptions and Definitions (2)
• Performance reservation
  – Metrics shown to users:
    • Throughput (MB/sec)
    • Start and end time
    • Access type
      – Read for object
      – Write for bucket or object
  – Condition:
    • Space reservation or object creation should be prior to the performance reservation, and vice versa for the cancellation.
• Combined reservation
  – Support 1 space & N performance reservations at once
    • The storage resources are co-allocated so that all the reservations are accepted, or the request is rejected.
[Figure: time transition. Bucket lifetime: write reservation is allowed. Object lifetime: read and write (append) reservations are allowed.]
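The all-or-nothing rule for a combined reservation can be sketched as follows. This is a simplified illustration, not the prototype's co-allocation logic: it assumes a single throughput pool with a fixed `capacity` instead of per-OSD space and throughput, and the names `peak_load` and `reserve_all_or_nothing` are hypothetical.

```python
from typing import List, Tuple

# (start, end, throughput in MB/sec); the end time is exclusive
Reservation = Tuple[int, int, float]


def peak_load(reservations: List[Reservation]) -> float:
    """Highest total reserved throughput at any instant."""
    peak = 0.0
    for start, _, _ in reservations:
        # The total load can only rise at the start of some reservation.
        load = sum(mbps for s, e, mbps in reservations if s <= start < e)
        peak = max(peak, load)
    return peak


def reserve_all_or_nothing(accepted: List[Reservation],
                           requests: List[Reservation],
                           capacity: float) -> bool:
    """Accept every part of the combined request, or none of them."""
    if peak_load(accepted + requests) > capacity:
        return False           # one part does not fit: reject the whole request
    accepted.extend(requests)  # all parts fit: commit them together
    return True


accepted: List[Reservation] = [(0, 10, 50.0)]
print(reserve_all_or_nothing(accepted, [(5, 15, 40.0), (12, 20, 60.0)], 100.0))  # True
print(reserve_all_or_nothing(accepted, [(0, 5, 30.0), (6, 8, 20.0)], 100.0))     # False
```

In the second call the first part alone would fit, but it is rejected together with the part that does not.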

7

Overview Architecture
[Figure: components of our proposed distributed storage system]
• Client node: Applications, Client API library
• Web Services-based reservation client
• Global Resource Coordinator, Network Resource Manager, Storage Resource Manager
• Management server (MGS)
  – Reservation management
  – Metadata management for buckets and objects
  – Allocate resources and administer I/O controls according to the reservation
• Storage servers (SS), each with an OSD (Disk I/O rate control)
• Link labels: Reserve request, Web Services-based protocol, Commands, Collocation, (Network flow control)

9

Reservation Interface
• Command-line interface
• Web-services interface (SRM interface)
  – A wrapped interface of command-line interface
  – Based on the GNS-WSI3 protocol
    • Polling-based asynchronous operation and two-phase commit
      – reserve/modify/release request ... (polling) ... commit/abort ... (polling) ...
  – We newly defined “ReservationResources_Type” for storage resources.

ReservationResources_Type
  ReservationID: string [0..1]
  ReservationStatus: ReservationStatus_Type [0..1]
  TimeSpecification: TimeSpecification_Type [1..1]
  ResourceAttribute: ResourceAttribute_Type [0..*]

StorageResources_Type
  ServicePoint: ServicePoint_Type [1..1]
  Space: Space_Type [0..1]
  Access: Access_Type [0..1]

Space_Type
  SpaceName: string [1..1]
  SpaceSize: GeneralSpaceSize_Type [1..1]
  GuaranteedReadThput: GeneralThput_Type [0..1]
  GuaranteedWriteThput: GeneralThput_Type [0..1]

Access_Type
  Client: Client_Type [0..1]
  FileName: string [1..1]
  SpaceName: string [0..1]
  Mode: Mode_Type [1..1]
  GuaranteedThput: GeneralThput_Type [1..1]
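The polling-based, two-phase sequence (request, poll, commit or abort, poll) can be sketched with an in-memory stand-in. Nothing here is the GNS-WSI3 or SRM API: `FakeReservationService`, its methods, the `Status` values, and the reservation ID `rsv-1` are hypothetical names chosen for illustration, and the service simply finishes each operation after a fixed number of polls.

```python
import time
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


class FakeReservationService:
    """In-memory stand-in for a reservation service (hypothetical)."""

    def __init__(self, polls_needed: int = 2) -> None:
        self._polls_needed = polls_needed
        self._polls_left = 0
        self._status = Status.PENDING
        self._target = Status.PENDING

    def _begin(self, target: Status) -> None:
        self._target = target
        self._polls_left = self._polls_needed

    def reserve(self) -> str:
        self._begin(Status.PREPARED)
        return "rsv-1"  # reservation ID handed back to the caller

    def commit(self, reservation_id: str) -> None:
        self._begin(Status.COMMITTED)

    def abort(self, reservation_id: str) -> None:
        self._begin(Status.ABORTED)

    def get_status(self, reservation_id: str) -> Status:
        # Each poll advances the asynchronous operation by one step.
        if self._polls_left > 0:
            self._polls_left -= 1
            if self._polls_left == 0:
                self._status = self._target
        return self._status


def wait_for(service, reservation_id, wanted, interval=0.0, max_polls=50):
    """Poll until the reservation reaches the wanted status."""
    for _ in range(max_polls):
        if service.get_status(reservation_id) is wanted:
            return True
        time.sleep(interval)
    return False


service = FakeReservationService()
rid = service.reserve()                          # phase 1: request
if wait_for(service, rid, Status.PREPARED):      # ... (polling) ...
    service.commit(rid)                          # phase 2: commit
    done = wait_for(service, rid, Status.COMMITTED)
else:
    service.abort(rid)                           # phase 2: abort
    done = False
print(rid, done)  # rsv-1 True
```

The `interval` parameter is where a real client would place its polling interval; it defaults to zero here so the sketch runs instantly.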

11

Client API Library
• Features
  – Striping I/O over multiple storage servers
  – Use a fixed I/O size against storage servers
    • Conversion from the application’s I/O size
  – Non-POSIX API
• Reservation ID must be specified in a create or open request for object.
  – Reservation ID is returned as a ticket when the performance reservation request is accepted.
  – The management server verifies the reservation by Reservation ID and User ID.
• API functions: create_bucket(), delete_bucket(), create_object(), open_object(), read(), write(), close()
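The conversion from an application's I/O size to a fixed I/O size striped over several storage servers can be illustrated as below. This is a sketch under assumptions: a round-robin stripe layout is assumed (the slides do not state the layout), and `split_request` is a hypothetical helper, not a function of the client API library.

```python
from typing import List, Tuple

# (server index, offset within that server's portion, length in bytes)
Chunk = Tuple[int, int, int]


def split_request(offset: int, length: int,
                  io_size: int, num_servers: int) -> List[Chunk]:
    """Split one application request into fixed-size units, round-robin over servers."""
    chunks: List[Chunk] = []
    end = offset + length
    pos = offset
    while pos < end:
        unit = pos // io_size          # index of the fixed-size unit holding pos
        within = pos % io_size         # position inside that unit
        n = min(io_size - within, end - pos)
        server = unit % num_servers    # assumed round-robin placement
        server_offset = (unit // num_servers) * io_size + within
        chunks.append((server, server_offset, n))
        pos += n
    return chunks


# A 10-byte request at offset 3, with 4-byte units striped over 3 servers.
print(split_request(offset=3, length=10, io_size=4, num_servers=3))
# [(0, 3, 1), (1, 0, 4), (2, 0, 4), (0, 4, 1)]
```

Only the first and last chunks can be shorter than the fixed I/O size; every chunk in between is exactly one unit.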

13

Resource Management
• Storage resources
  – Disk space & throughput of each OSD in a certain period of time
• Role of MGS
  – Collect status information of all the OSDs
    • Max. throughput (currently static)
    • Used/free space
      – Each OSD primarily manages its own disk space.
  – Allocate resources according to the reservation request
    • Record allocate/free information in the internal tables
[Figure: the Management Server (MGS) receives a reserve request from the client. Its internal tables hold access reservation info. and space reservation info. (cache). A space reservation request is sent to the OSD on the storage server (SS) before committing the allocation plan, and the OSD replies.]

14

Resource Allocation (1)
Input: a set of reservation requests. Each request usually has a time window, space size, and performance.
1. Check availability of each OSD (performance model)
   - Estimate available space in a time window
   - Estimate available performance in a time window
2. Score each OSD and sort the list (scoring model)
   - Normalize and weight the availability
   - Allocation strategy: balancing space, balancing workload, or something else?
3. Allocate a set of OSDs to the request (allocation model)
   - Assign OSDs from the list according to the score
   - Check to ensure the assigned OSDs are not overused (performance model)
   - Change striping count and iterate this process
Output: a set of OSDs
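The scoring step (normalize and weight the availability, then sort) might look like the sketch below. The weights, the normalization by the maximum value, and the function name `score_osds` are assumptions for illustration; the slides leave the scoring model to be customized by the administrator.

```python
from typing import Dict, List, Tuple


def score_osds(free_space: Dict[str, float],
               free_thput: Dict[str, float],
               w_space: float = 0.3,
               w_thput: float = 0.7) -> List[Tuple[str, float]]:
    """Return (OSD name, score) pairs, best candidate first."""
    # Normalize each metric by its maximum so both fall in [0, 1].
    max_space = max(free_space.values()) or 1.0
    max_thput = max(free_thput.values()) or 1.0
    scores = {
        osd: w_space * free_space[osd] / max_space
             + w_thput * free_thput[osd] / max_thput
        for osd in free_space
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical availability estimated for one time window.
free_space = {"osd1": 100.0, "osd2": 50.0, "osd3": 80.0}    # GB
free_thput = {"osd1": 40.0, "osd2": 100.0, "osd3": 100.0}   # MB/sec
ranked = score_osds(free_space, free_thput)
print([name for name, _ in ranked])  # ['osd3', 'osd2', 'osd1']
```

Raising `w_space` relative to `w_thput` shifts the strategy from balancing workload toward balancing space.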

15

Resource Allocation (2)
• Three models (performance, scoring and allocation models) should be customizable by storage administrators.
• Our simple models in a prototype:
  – Read throughput is proportionally shared by multiple accesses.
    • E.g. a total of 200 MB/s shared by 2 processes -> each process can get 90 MB/s with 10% overhead.
  – Write access is always exclusive to any other accesses.
  – Balancing I/O workload is first.
    • The OSD which can provide higher throughput will be assigned first (a greedy strategy).
  – Free space is considered second.
  – Minimize striping count and limit the max. striping count
    • Striping size is fixed as a system-wide parameter.
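The simple models above can be written down in a few lines. This sketch reproduces the 200 MB/s example and a greedy choice of OSDs with the smallest workable striping count. It is an illustration, not the prototype's code: the function names are hypothetical, and the assumption that a striped request is divided equally among its OSDs is ours.

```python
from typing import Dict, List, Optional


def shared_read_thput(max_thput: float, num_accesses: int,
                      overhead: float = 0.10) -> float:
    """Proportional share of read throughput, less a fixed overhead."""
    return max_thput / num_accesses * (1.0 - overhead)


def allocate_greedy(available: Dict[str, float], requested: float,
                    max_stripes: int) -> Optional[List[str]]:
    """Pick the fewest OSDs, highest available throughput first."""
    ranked = sorted(available.items(), key=lambda kv: kv[1], reverse=True)
    for stripes in range(1, max_stripes + 1):
        if stripes > len(ranked):
            break
        chosen = ranked[:stripes]
        share = requested / stripes  # assumed: equal share per OSD
        # Reject this striping count if any chosen OSD would be overused.
        if all(thput >= share for _, thput in chosen):
            return [osd for osd, _ in chosen]
    return None  # no striping count within the limit satisfies the request


print(shared_read_thput(200.0, 2))  # 90.0

available = {"osd1": 90.0, "osd2": 60.0, "osd3": 90.0, "osd4": 30.0}
print(allocate_greedy(available, requested=160.0, max_stripes=4))  # ['osd1', 'osd3']
print(allocate_greedy(available, requested=240.0, max_stripes=4))  # None
```

Trying striping counts in increasing order is what keeps the count minimal, and `max_stripes` enforces the upper limit.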

17

I/O Rate Control Framework
• The storage server controls I/O rate according to the MGS’s instruction.
  – Disk I/O scheduling
    • Under development
  – A storage network between client and storage servers
    • Integrate PSPacer into our prototype to configure the target network bandwidth on the Ethernet
• The instruction is delivered using the capability model.
[Figure: Client, Management Server (MGS), and OSD on the storage server (SS). The MGS and the SS share the key.]
1. Open request with reservation ID
2. Receive a capability
3. Connect request
4. Verify the capability
5. Enforce rate control on this connection
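One common way to realize a capability with a shared key is a keyed hash (HMAC) over the granted parameters. The sketch below illustrates steps 2 and 4 under that assumption. The slides do not say which scheme Papio uses, so the message format, the field names, and the functions `issue_capability` and `verify_capability` are all hypothetical.

```python
import hashlib
import hmac

# Placeholder for the key shared by the MGS and the storage server.
SHARED_KEY = b"key-shared-by-mgs-and-ss"


def issue_capability(key: bytes, user_id: str, reservation_id: str,
                     rate_mb_per_sec: int) -> dict:
    """MGS side: sign the granted parameters with the shared key."""
    message = f"{user_id}|{reservation_id}|{rate_mb_per_sec}".encode()
    tag = hmac.new(key, message, hashlib.sha256).hexdigest()
    return {"user": user_id, "reservation": reservation_id,
            "rate": rate_mb_per_sec, "tag": tag}


def verify_capability(key: bytes, cap: dict) -> bool:
    """Storage server side: recompute the tag and compare in constant time."""
    message = f"{cap['user']}|{cap['reservation']}|{cap['rate']}".encode()
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cap["tag"])


cap = issue_capability(SHARED_KEY, "alice", "rsv-1", 80)
print(verify_capability(SHARED_KEY, cap))      # True

forged = dict(cap, rate=300)                    # client tries to raise its rate
print(verify_capability(SHARED_KEY, forged))    # False
```

Because the rate is part of the signed message, the storage server can trust the value it enforces without contacting the MGS on every connection.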

18

Prototype Implementation
• Papio: our developed distributed storage software
  – Implemented in C++ on Linux
  – Use SQLite version 3 for the internal database of MGS
  – Use EBOFS (an extent and B+tree based object file system) as our OSD base
    • Extend the allocation algorithm to support space reservation
  – Use PSPacer for network bandwidth control
  – Support the simple models for resource allocation
• SRM: our developed reservation agent for Papio, providing Web-Services interface (SRM interface)
  – Implemented in Java
  – Use GridARS to support the GNS-WSI3 protocol

19

Evaluation
• Reservation cost
  – Comparison between command-line and SRM interfaces
  – Overheads of SRM and Papio
• Performance of reserved vs. non-reserved access
  – A single occupation strategy
  – A multiple occupation strategy
• Experiment environment
  – The 6 machines below, connected by a Dell PowerConnect 6248

  CPU      AMD Opteron Quad Core 2.3GHz
  Memory   8GB
  Disk     OCZ Apex v3 (SSD)
  OS       CentOS 5 (Kernel version 2.6.18)
  Network  1 GbE (4 nodes) or 10 GbE (2 nodes)

20

Reservation Cost (1)
• We had 4 experiment cases.
[Figure: the four cases, with the components shown in each]
  a) Web Services-based reservation client, Storage Resource Manager (SRM), dummy MGS cmds (Node-2, Node-1)
  b) Web Services-based reservation client, Storage Resource Manager (SRM), MGS cmds, MGS (Node-2, Node-1)
  c) MGS cmds, MGS (Node-1)
  d) MGS cmds, MGS (Node-1, Node-2)

21

Reservation Cost (2)
• In the result, the SRM interface was 3~4 times slower than the command-line interface because of the polling-based operation (100 msec interval).
• The cost is reasonably low and might not be a bottleneck.

Execution time [msec]

  Operation           a)     b)     c)     d)
  Initialize             82.4       -      -
  Reserve  Total      477    613    149    153
           Request    102    105
           Confirm    197    322
           Commit     177    185
  Release  Total      386    511    137    141
           Request    107    108
           Confirm     98    221
           Commit     181    183
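The slowdown factor can be checked directly from the totals in the table. The short calculation below pairs case a) with c) and case b) with d); that pairing is our assumption about which command-line case corresponds to which SRM case.

```python
# Total execution times in msec, copied from the table.
totals = {
    "reserve": {"a": 477, "b": 613, "c": 149, "d": 153},
    "release": {"a": 386, "b": 511, "c": 137, "d": 141},
}

# Ratio of SRM-interface time to command-line time (assumed pairing a/c, b/d).
ratios = {
    op: (round(t["a"] / t["c"], 1), round(t["b"] / t["d"], 1))
    for op, t in totals.items()
}
print(ratios)  # {'reserve': (3.2, 4.0), 'release': (2.8, 3.6)}
```

Under this pairing the ratios range from about 2.8 to 4.0, in line with the stated 3~4 times.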

22

Reserved / non-reserved access (1)
• Measured Client-A’s read access:
  – Reserved: Papio applies a single occupation strategy in which each OSD serves only one access.
  – Non-reserved: I/O control is not applied. Conflict with Client-B’s read or write access.
[Figure: Client-A and Client-B accessing OSDs on storage servers (SS) in the reserved and non-reserved cases. Panels: Client-A: 1 stream, Client-A: Striping.]

23

Reserved / non-reserved access (2)
• Non-reserved access was affected by Client-B’s access.
[Figure: Client-A’s read throughput [MB/sec] versus total read size [MB]. Panels: 1 stream (annotated 55MB/s) and Striping (annotated 55MB/s x3). Series: Reserved, Non-reserved: R-R, Non-reserved: R-W.]

24

Reserved / non-reserved access (3)
• Measured Client-A’s read access:
  – Reserved: Papio applies a multiple occupation strategy in which each OSD serves more than one access. I/O control by PSPacer is applied.
  – Non-reserved: Conflict with Client-B’s read or write access.
[Figure: Client-A and Client-B accessing OSDs on storage servers (SS) in the reserved and non-reserved cases. Panels: Client-A: 1 stream (annotated 80MB/s and 20MB/s), Client-A: Striping (annotated 80MB/s x3 and 20MB/s). Note: 10% overhead (protocol etc.) estimation.]
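Rate control on a connection amounts to spacing transmissions so that the average rate stays at the target. The sketch below computes such a send schedule for a target of 80 MB/s, the figure used in this experiment. It only illustrates the idea of pacing and is not how PSPacer is implemented; `send_schedule` and the 20 MB chunk size are hypothetical.

```python
from typing import List


def send_schedule(total_mb: float, chunk_mb: float,
                  rate_mb_per_sec: float) -> List[float]:
    """Earliest send time in seconds for each chunk under the target rate."""
    times: List[float] = []
    sent = 0.0
    while sent < total_mb:
        # A chunk may start once all earlier data fits within the rate budget.
        times.append(sent / rate_mb_per_sec)
        sent += chunk_mb
    return times


print(send_schedule(total_mb=80, chunk_mb=20, rate_mb_per_sec=80))
# [0.0, 0.25, 0.5, 0.75]
```

With the same chunk size, a connection limited to 20 MB/s would send one chunk per second instead of four.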

25

Reserved / non-reserved access (4)
• Reserved access got the requested I/O throughput.
[Figure: Client-A’s read throughput [MB/sec] versus total read size [MB]. Panels: 1 stream (annotated 80MB/s) and Striping (annotated 80MB/s x3). Series: Single occupation, Non-reserved: R-R, Reserved: controlled.]

26

Potential Applications
• Constraints
  – Require an advance reservation
  – Read and append-only access for a large amount of data
• Potential applications (scheduled execution?)
  – Multimedia streaming (We had a demo in August.)
  – Moving large data between data centers
  – Server provisioning
[Figure: VOD service example. Labels: VOD service provider, Watch reservation, Streaming servers, Optical path network, Papio storage, SRM, NRM, Coordinate & reserve resources.]

27

Related Work
• SRM in OGF
  – SLA features: retention policy, access latency
• Automatic configuration to satisfy given I/O workload
  – Hippodrome, MINERVA
    • Resource allocation based on performance prediction
• Many existing works for QoS
  – Disk I/O scheduling
  – Network QoS
  – Performance monitoring and feedback-based I/O control
We would like to apply some of these techniques to Papio and achieve a more fine-grained performance guarantee.

28

Conclusion and Future Work
• Proposed “an advance reservation feature by application users” for storage access.
  – A different model from the one in which resources are allocated at the time of creating/opening files (on-demand)
  – Design
    • Defined performance metrics and storage resources
    • Four key components:
      – Reservation interface
      – Client API
      – Resource management framework
      – I/O control framework
• Implemented Papio and SRM as a prototype and evaluated the basic performance and functions.
• Providing a more sophisticated user interface and a “guarantee” mechanism is in our future work.

29

Acknowledgement
• A part of this work was supported by Special Coordination Funds for Promoting Science and Technology of the Japanese Ministry of Education, Culture, Sports, Science and Technology.