THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM
J. Wilkes, R. Golding, C. Staelin, T. Sullivan
HP Laboratories, Palo Alto, CA


TRANSCRIPT

Page 1: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

J. Wilkes, R. Golding, C. Staelin, T. Sullivan

HP Laboratories, Palo Alto, CA

Page 2: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

INTRODUCTION

Must protect data against disk failures: too frequent and too hard to repair
Possible solutions:
  For small numbers of disks: mirroring
  For larger numbers of disks: RAID

Page 3: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

RAID

Typical RAID organizations:
  Level 3: bit- or byte-level interleaving with a dedicated parity disk
  Level 5: block-level interleaving with parity blocks spread over all disks
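
The parity idea behind both levels can be sketched in a few lines. This is a minimal illustration, not the code of any particular array: the helper name `xor_blocks` and the sample blocks are assumptions made for the example. Parity is the byte-wise XOR of the data blocks in a stripe, so any single lost block can be rebuilt from the survivors.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe (illustrative values).
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# Simulate losing data[1] and rebuilding it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
```

In RAID 3 the parity block always lives on the same disk; in RAID 5 its location changes from stripe to stripe so that no single disk absorbs all parity updates.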

Page 4: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

LIMITATIONS OF RAID (I)

Each RAID level performs well for only a narrow range of workloads
Too many parameters to configure: data and parity layout, stripe depth, stripe width, cache sizes, write-back policies, ...

Page 5: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

LIMITATIONS OF RAID (II)

Changing from one layout to another or adding capacity requires downloading and reloading the data
Spare disks remain unused until a failure occurs

Page 6: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

A BETTER SOLUTION

A managed storage hierarchy:
  Mirror active data
  Store less active data in RAID 5
This requires locality of reference: the active subset must be rather stable
  Found to be true in several studies

Page 7: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

IMPLEMENTATION LEVEL

Storage hierarchy could be implemented:
  Manually: can use the most knowledge but cannot adapt quickly
  In the file system: offers the best balance of knowledge and implementation freedom but is specific to a particular file system
  Through a smart array controller: easiest to deploy (HP AutoRAID)

Page 8: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

MAJOR FEATURES (I)

Mapping of host block addresses to physical disk locations
Mirroring of write-active data
Adaptation to changes in the amount of data stored:
  Starts using RAID 5 when the array becomes full
Adaptation to workload changes
Hot-pluggable disks, fans, power supplies and controllers

Page 9: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

MAJOR FEATURES (II)

On-line storage capacity expansion: the system then switches to mirroring
Can mix or match disk capacities
Controlled fail-over: can have dual controllers (primary/standby)
Active hot spares: used for more mirroring
Simple administration and setup: appears to the host as one or more logical units
Log-structured RAID 5 writes

Page 10: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

RELATED WORK (I)

Storage Technology Corporation Iceberg:
  Also uses redirection but is based on RAID 6
  Handles variable-size records
  Emphasis on very high reliability

Page 11: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

RELATED WORK (II)

Floating parity scheme from IBM Almaden:
  Relocates parity blocks and uses distributed sparing
Work on log-structured file systems and cleaning policies at U.C. Berkeley

Page 12: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

RELATED WORK (III)

Whole literature on hierarchical storage systems
Schemes compressing inactive data
Use of non-volatile memory (NVRAM) for optimizing writes:
  Allows reliable delayed writes

Page 13: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

OVERVIEW

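[Figure: block diagram of the array controller. Labels recoverable from the slide: host computer attached through a SCSI controller at 20 MB/s; processor, RAM and control logic; parity logic; DRAM read cache; NVRAM write cache; matching RAM; other RAM; four blocks labeled "Control" on a 2x10 MB/s bus.]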

Page 14: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

PHYSICAL DATA LAYOUT

Data space on disks is broken up into large Physical EXtents (PEXes):
  Typical size is 1 MB
PEXes can be combined to form Physical Extent Groups (PEGs) containing at least three PEXes on three different disks
PEGs can be assigned to the mirrored storage class or to the RAID 5 storage class
Segments are the units of contiguous space on a disk (128 KB in prototype)
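
The PEX and PEG rules above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `free` dictionary, and the policy of drawing from the disks with the most free PEXes are all hypothetical and are not taken from the slides. The one rule the sketch enforces from the source is that a PEG holds at least three PEXes on three different disks.

```python
PEX_SIZE = 1 << 20  # 1 MB, the typical PEX size quoted on the slide


def pex_count(disk_bytes):
    """Number of whole PEXes that fit on a disk of the given size."""
    return disk_bytes // PEX_SIZE


def form_peg(free, width=3):
    """Build one PEG from `width` PEXes on `width` different disks.

    `free` maps disk id -> list of free PEX ids (hypothetical structure).
    Assumed policy: draw from the disks that have the most free PEXes.
    """
    candidates = sorted(
        (disk for disk in free if free[disk]),
        key=lambda disk: len(free[disk]),
        reverse=True,
    )
    if len(candidates) < width:
        raise ValueError("need free PEXes on at least %d disks" % width)
    return [(disk, free[disk].pop()) for disk in candidates[:width]]


free = {0: [0, 1], 1: [0], 2: [0, 1, 2], 3: []}
peg = form_peg(free)
```

Because each PEX comes from a different disk, disks of different capacities simply contribute different numbers of PEXes.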

Page 15: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

LOGICAL DATA LAYOUT

Logical allocation and migration unit is the Relocation Block (RB)
Size in prototype was 64 KB:
  Smaller RBs require more mapping information, but larger RBs increase migration costs after small updates
Each PEG holds a fixed number of RBs
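
The RB size trade-off can be made concrete with a little arithmetic. The 24 GB capacity and the 4 KB update size below are assumptions chosen for illustration; only the 64 KB RB size comes from the slide.

```python
KB = 1024
GB = 1024 ** 3


def rb_count(capacity_bytes, rb_bytes):
    """Number of map entries needed: one per RB (rounded up)."""
    return -(-capacity_bytes // rb_bytes)


def migration_amplification(update_bytes, rb_bytes):
    """Bytes moved per byte updated when a whole RB must migrate."""
    return rb_bytes / update_bytes


capacity = 24 * GB   # assumed capacity, for illustration only
update = 4 * KB      # assumed small-update size

table = {
    rb_kb: (rb_count(capacity, rb_kb * KB),
            migration_amplification(update, rb_kb * KB))
    for rb_kb in (16, 64, 256)
}
```

Under these assumptions, going from 64 KB to 16 KB RBs quadruples the number of map entries, while going to 256 KB RBs quadruples the data moved for each small update that triggers a migration.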

Page 16: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

MAPPING STRUCTURES

Map addresses from virtual volumes to PEGs, PEXes and physical disk addresses
Optimized for quickly finding the physical address of an RB given its logical address:
  Each logical unit has a virtual device table listing all RBs in the logical unit and pointing to their PEG
  Each PEG has a PEG table listing all RBs in the PEG and the PEXes used to store them
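
The two-level lookup described above might look like the sketch below. The class and field names are hypothetical, and the sketch is deliberately simplified: it resolves a logical byte address to a single physical copy and ignores mirroring and RAID 5 striping within the PEG.

```python
from dataclasses import dataclass, field

RB_SIZE = 64 * 1024  # RB size in the prototype


@dataclass
class PegTable:
    # PEXes backing this PEG: list of (disk_id, pex_start_byte).
    pexes: list
    # RB index -> (index into `pexes`, byte offset inside that PEX).
    rb_slots: dict = field(default_factory=dict)


@dataclass
class VirtualDeviceTable:
    # RB index within the logical unit -> the PegTable that holds it.
    rb_to_peg: dict = field(default_factory=dict)


def resolve(vdt, logical_byte):
    """Translate a logical byte address into (disk_id, physical_byte)."""
    rb_index, within_rb = divmod(logical_byte, RB_SIZE)
    peg = vdt.rb_to_peg[rb_index]                # first level: RB -> PEG
    pex_index, offset = peg.rb_slots[rb_index]   # second level: RB -> PEX
    disk_id, pex_start = peg.pexes[pex_index]
    return disk_id, pex_start + offset + within_rb


peg = PegTable(pexes=[(0, 0), (1, 1 << 20), (2, 0)],
               rb_slots={5: (1, RB_SIZE)})
vdt = VirtualDeviceTable(rb_to_peg={5: peg})
location = resolve(vdt, 5 * RB_SIZE + 100)
```

Migrating an RB then only requires updating its entries in these tables; the host-visible logical address does not change.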

Page 17: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

NORMAL OPERATIONS (I)

Requests are sent to the controller in SCSI Command Descriptor Blocks (CDBs):
  Up to 32 CDBs can be simultaneously active and 2048 others queued
Long requests are broken into 64 KB segments
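
Breaking a long request into 64 KB pieces can be sketched as below. The slide does not say whether pieces are aligned to 64 KB boundaries; the sketch assumes they are, and the function name is hypothetical.

```python
SEGMENT = 64 * 1024


def split_request(offset, length, segment=SEGMENT):
    """Yield (offset, length) pieces that never cross a segment boundary.

    Alignment to segment boundaries is an assumption of this sketch.
    """
    end = offset + length
    while offset < end:
        boundary = (offset // segment + 1) * segment
        piece_end = min(end, boundary)
        yield offset, piece_end - offset
        offset = piece_end


# A 100 KB request starting 60 KB into the logical unit.
pieces = list(split_request(60 * 1024, 100 * 1024))
```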

Page 18: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

NORMAL OPERATIONS (II)

Read requests:
  First check whether the data are already in the read cache or in the non-volatile write cache
  Otherwise allocate space in cache and issue one or more requests to back-end storage classes
Write requests return as soon as data are modified in the non-volatile write cache:
  Cache has a delayed write policy
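
The read and write paths above can be sketched with plain dictionaries standing in for the caches and the back-end storage classes. The class name, the choice to consult the write cache before the read cache, and the explicit `flush` method are assumptions of this sketch, not details from the slides.

```python
class ControllerCache:
    """Toy model of the controller's caches (hypothetical names)."""

    def __init__(self, backend):
        self.backend = backend   # dict standing in for the storage classes
        self.read_cache = {}     # stands in for the DRAM read cache
        self.write_cache = {}    # stands in for the NVRAM write cache

    def read(self, rb):
        if rb in self.write_cache:      # newest data wins (assumed order)
            return self.write_cache[rb]
        if rb in self.read_cache:
            return self.read_cache[rb]
        data = self.backend[rb]         # miss: go to back-end storage
        self.read_cache[rb] = data
        return data

    def write(self, rb, data):
        # The request completes here; the back-end write is delayed.
        self.write_cache[rb] = data
        self.read_cache.pop(rb, None)   # drop any stale read-cache copy

    def flush(self):
        """Delayed write: push dirty data to the back end."""
        for rb, data in self.write_cache.items():
            self.backend[rb] = data
        self.write_cache.clear()


backend = {1: b"old"}
cache = ControllerCache(backend)
first = cache.read(1)
cache.write(1, b"new")
before_flush = backend[1]
second = cache.read(1)
cache.flush()
```

A second write to the same RB before the flush simply overwrites the cached copy, so only one back-end write is needed.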

Page 19: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

NORMAL OPERATIONS (III)

Flushing data from cache can involve:
  A back-end write to a mirrored storage class
  Promotion from RAID 5 to mirrored storage before the write
Mirrored reads and writes are straightforward

Page 20: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

NORMAL OPERATIONS (IV)

RAID 5 reads are straightforward
RAID 5 writes can be done:
  On a per-RB basis: requires two reads and two writes
  In batched writes: more complex but cheaper
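
The "two reads and two writes" of a per-RB RAID 5 write come from the standard read-modify-write parity update, sketched below. The list of byte strings standing in for disks and the function names are assumptions made for the example.

```python
def xor(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(disks, data_disk, parity_disk, new_data):
    """Update one data block and the stripe's parity block."""
    old_data = disks[data_disk]          # read 1
    old_parity = disks[parity_disk]      # read 2
    new_parity = xor(xor(old_parity, old_data), new_data)
    disks[data_disk] = new_data          # write 1
    disks[parity_disk] = new_parity      # write 2


# One stripe: three data blocks and their parity (0x01 ^ 0x02 ^ 0x04 = 0x07).
disks = [b"\x01", b"\x02", b"\x04", b"\x07"]
small_write(disks, data_disk=1, parity_disk=3, new_data=b"\x0a")
```

Batching avoids the two reads: when several blocks of the same stripe are written together, parity can be computed from the new data alone.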

Page 21: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

BACKGROUND OPERATIONS

Triggered when the array has been idle for some time
Include:
  Compaction of empty RB slots
  Migration between storage classes (using an approximate LRU algorithm)
  Load balancing between disks
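
Choosing which RBs to demote from mirrored storage to RAID 5 can be sketched as below. The slides mention an approximate LRU algorithm without giving details, so this sketch swaps in an exact least-recently-written order as a stand-in; the class name and the `free_target` parameter are hypothetical.

```python
from collections import OrderedDict


class MirroredClass:
    """Tracks RBs in mirrored storage, oldest write first (exact LRU stand-in)."""

    def __init__(self, capacity_rbs):
        self.capacity = capacity_rbs
        self.rbs = OrderedDict()

    def record_write(self, rb):
        self.rbs[rb] = True
        self.rbs.move_to_end(rb)   # most recently written goes last

    def demote_candidates(self, free_target):
        """Pick least-recently-written RBs until `free_target` slots are free."""
        excess = len(self.rbs) - (self.capacity - free_target)
        victims = []
        for _ in range(max(0, excess)):
            rb, _flag = self.rbs.popitem(last=False)
            victims.append(rb)
        return victims


mirrored = MirroredClass(capacity_rbs=4)
for rb in (1, 2, 3, 4):
    mirrored.record_write(rb)
mirrored.record_write(1)           # RB 1 becomes the most recently written
victims = mirrored.demote_candidates(free_target=2)
```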

Page 22: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

MONITORING

System also includes:
  An I/O logging tool
  A management tool for analyzing the array performance

Page 23: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

PERFORMANCE RESULTS (I)

HP AutoRAID configuration with:
  16 MB of controller data cache
  Twelve 2.0 GB Seagate Barracuda disks (7200 rpm)
Compared with:
  Data General RAID array with 64 MB front-end cache
  Eleven individual disk drives implementing disk striping but without any redundancy

Page 24: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

PERFORMANCE RESULTS (II)

Results of OLTP database workload:
  AutoRAID was better than the RAID array and comparable to the set of non-redundant drives
  But the whole database was stored in mirrored storage!
Micro-benchmarks:
  AutoRAID is always better than the RAID array but has smaller I/O rates than the set of drives

Page 25: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

SIMULATION RESULTS (I)

Increasing the disk speed improves the throughput:
  Especially if density remains constant
  Transfer rates matter more than rotational latency
64 KB seems to be a good size for the Relocation Blocks:
  Around the size of a disk track

Page 26: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

SIMULATION RESULTS (II)

Best heuristic for selecting the mirrored copy to be read is shortest queue
Allowing write cache overwrites has a HUGE impact on performance
RBs demoted to RAID 5 should use existing holes when the system is not too loaded
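
The shortest-queue heuristic for mirrored reads fits in one function. The input format and the tie-break on the lowest disk id are assumptions of this sketch.

```python
def pick_copy(queue_lengths):
    """Choose the disk to read from among those holding a copy.

    `queue_lengths` maps disk id -> number of outstanding requests.
    Ties go to the lowest disk id (an arbitrary choice for this sketch).
    """
    return min(queue_lengths, key=lambda disk: (queue_lengths[disk], disk))


choice = pick_copy({3: 5, 7: 2})
tie = pick_copy({1: 2, 0: 2})
```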

Page 27: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

SUMMARY (I)

System is very easy to set up
Dynamic adaptation is a big win, but it will not work for all workloads
Software is what makes AutoRAID, not the hardware
Being auto-adaptive makes AutoRAID hard to benchmark

Page 28: THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM

SUMMARY (II)

Future work includes:
  System tuning, especially:
    Idle period detection
    Front-end cache management algorithms
  Developing better techniques for synthesizing traces