Upload: irene-stevens

Post on 21-Dec-2015

What is it?

• HPSS (High Performance Storage System) is hierarchical storage software developed in collaboration with five US Department of Energy labs since 1992
• Allows storage management of 100s of billions of files spanning 100s of petabytes for the HPC community
• Licensed and supported by IBM

Why?

• Reduced cost
• Scalability
• Power usage
• Reliability
• Speed
• Long-term storage

How?

• Distributed cluster architecture
• Metadata engine: IBM DB2
• Multiple storage classes
• Striped disks and tapes
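To make "striped disks and tapes" concrete, here is a minimal sketch of round-robin striping, assuming fixed-size blocks and in-memory byte strings standing in for devices. The function names `stripe` and `unstripe` and the block layout are illustrative assumptions, not the HPSS on-media format.

```python
def stripe(data: bytes, width: int, block_size: int) -> list[bytes]:
    """Split data into fixed-size blocks and deal them round-robin to `width` devices."""
    if width < 1 or block_size < 1:
        raise ValueError("width and block_size must be positive")
    devices = [bytearray() for _ in range(width)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        devices[block_index % width] += data[offset:offset + block_size]
    return [bytes(d) for d in devices]


def unstripe(devices: list[bytes], block_size: int, total_len: int) -> bytes:
    """Reassemble the original byte stream by reading one block per device in turn."""
    out = bytearray()
    offsets = [0] * len(devices)
    dev = 0
    while len(out) < total_len:
        chunk = devices[dev][offsets[dev]:offsets[dev] + block_size]
        if not chunk:
            raise ValueError("devices hold less data than total_len")
        out += chunk
        offsets[dev] += block_size
        dev = (dev + 1) % len(devices)
    return bytes(out)


parts = stripe(b"abcdefghij", width=3, block_size=2)
restored = unstripe(parts, block_size=2, total_len=10)
```

Because each device holds an independent slice of the file, the slices can be read or written at the same time, which is where the parallel transfer rate comes from.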

Who Uses it?

• NCSA BlueWaters
• Argonne National Lab
• Indiana University

Disk and Tape

• Hierarchical storage management (HSM)
• Frequently used data cached on disk
• Archival data on tape
• Automatic migration (mirror offsite)
• Scalable: any instance of HPSS can access many tapes at the same time to provide parallel transfer rates

• Tape
  • Pros: lower cost, no power usage, reliable
  • Cons: high latency
• Disk
  • Pros: low latency
  • Cons: power usage, reliability, higher cost

Standard POSIX interface

• Users can access files using several methods:
  • FTP – standard FTP from a mover
  • PFTP – parallel transfer of data from multiple movers
  • Client API
  • HSI – put and get files to and from HPSS
  • HTAR – archive multiple files together and transfer them to HPSS
  • VFS Client
  • XFS

Components

• Core Server
  • Translates human-readable names into HPSS object identifiers
  • Translates virtual volumes into physical volumes
  • Allows parallel I/O to the resources
  • Schedules mounting and dismounting of media
• Migration/Purge Server
  • Manages migration and purge policies
• Disk Migration Purge
  • Once files have been moved down the hierarchy they are purged from disk
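The migrate-then-purge cycle described above can be sketched as follows. This is a toy model, not HPSS policy code: the `FileEntry` record, the idle-time rule for migration, and the high/low water marks for purging are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    name: str
    size: int            # bytes
    last_access: float   # seconds, arbitrary epoch
    on_disk: bool = True
    on_tape: bool = False


def migrate(files: list[FileEntry], now: float, min_idle: float) -> list[str]:
    """Copy files idle for at least min_idle seconds down to tape; the disk copy stays."""
    migrated = []
    for f in files:
        if f.on_disk and not f.on_tape and now - f.last_access >= min_idle:
            f.on_tape = True
            migrated.append(f.name)
    return migrated


def purge(files: list[FileEntry], capacity: int, high_water: float, low_water: float) -> list[str]:
    """Once disk usage exceeds high_water, drop disk copies of already-migrated
    files, least recently used first, until usage is at or below low_water."""
    used = sum(f.size for f in files if f.on_disk)
    purged = []
    if used <= high_water * capacity:
        return purged
    for f in sorted(files, key=lambda entry: entry.last_access):
        if used <= low_water * capacity:
            break
        if f.on_disk and f.on_tape:  # never purge the only copy
            f.on_disk = False
            used -= f.size
            purged.append(f.name)
    return purged


files = [FileEntry("a", 40, 0.0), FileEntry("b", 40, 50.0), FileEntry("c", 20, 95.0)]
migrated = migrate(files, now=100.0, min_idle=30.0)
purged = purge(files, capacity=100, high_water=0.8, low_water=0.5)
```

The key invariant in this sketch is that a file is only removed from disk after a copy exists lower in the hierarchy, so purging never loses data.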

• Tape File Migration
  • Makes additional copies for multi-site setups
• Tape Volume Migration
  • Moves data between tapes to fill tapes optimally
• Gatekeeper (GK)
  • Account validation service
  • Site authorization etc…
• Location Server (LS)
  • Allows clients to determine which location they should contact
  • Improves speed in multi-site setups
• Physical Volume Library (PVL)
  • Manages all HPSS physical volumes
  • Mounting and dismounting (via the PVR)
  • Atomic mounts for sets of cartridges for parallel access to data
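As an illustration of what an atomic mount of a cartridge set could look like, the sketch below grants a request only if every cartridge in the set can get a drive, and otherwise mounts nothing. The `DrivePool` class and its methods are hypothetical; the real PVL interface is not shown in the slides.

```python
class DrivePool:
    """Toy model of a tape drive pool with all-or-nothing set mounts."""

    def __init__(self, drive_count: int) -> None:
        self.free_drives = drive_count
        self.mounted: set[str] = set()

    def mount_set(self, cartridges: list[str]) -> bool:
        """Mount every cartridge in the set, or none of them."""
        wanted = set(cartridges) - self.mounted
        if len(wanted) > self.free_drives:
            return False  # not enough drives: leave the pool untouched
        self.mounted |= wanted
        self.free_drives -= len(wanted)
        return True

    def dismount(self, cartridge: str) -> None:
        """Release the drive held by a mounted cartridge."""
        if cartridge in self.mounted:
            self.mounted.remove(cartridge)
            self.free_drives += 1


pool = DrivePool(drive_count=4)
first = pool.mount_set(["A", "B", "C"])   # 3 of 4 drives used
second = pool.mount_set(["D", "E"])       # needs 2 drives, only 1 free
pool.dismount("A")
third = pool.mount_set(["D", "E"])        # now fits
```

One plausible reason for the all-or-nothing rule is that a striped transfer cannot start until all of its tapes are mounted, so a partial mount would only tie up drives that other requests could use.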

• Physical Volume Repository (PVR)
  • Interface to request cartridge mounts and dismounts
  • One-to-one with tape libraries
• Movers
  • Handle the actual data transfers
  • Communicate with the Core Server to determine source and destination
  • Retry moves on failure
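The "retry moves on failure" behaviour can be sketched as a small wrapper around a transfer callable. The function name `transfer_with_retry`, the attempt limit, and the choice of `OSError` as the retryable error are assumptions for illustration; the slides do not describe the movers' actual retry policy.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def transfer_with_retry(move: Callable[[], T], attempts: int = 3, delay: float = 0.0) -> T:
    """Run `move`, retrying on OSError up to `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return move()
        except OSError as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)  # pause before the next try
    raise RuntimeError(f"transfer failed after {attempts} attempts") from last_error


calls = {"count": 0}


def flaky_move() -> str:
    """Hypothetical transfer that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError("simulated I/O error")
    return "done"


result = transfer_with_retry(flaky_move, attempts=3)
```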

Scalability

• Scales horizontally:
  • Add more movers
  • Add more tape drives

BlueWaters

• Software “RAIT” is being developed jointly by IBM and NCSA
• Adds 8+2 reliability to HPSS striping
• 40 GbE network
• 100,000 tape cartridges
• 38.5 TB per hour
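For a sense of scale, the figures above can be turned into a few derived numbers. This is back-of-the-envelope arithmetic under two assumptions that the slides do not state: TB is read as a decimal terabyte (10^12 bytes), and "8+2" is read as 8 data stripes plus 2 parity stripes.

```python
TB = 10**12  # decimal terabyte, an assumption


def bytes_per_second(tb_per_hour: float) -> float:
    """Convert a TB/hour transfer rate into bytes per second."""
    return tb_per_hour * TB / 3600


def stripe_efficiency(data_stripes: int, parity_stripes: int) -> float:
    """Fraction of the written capacity that holds user data."""
    return data_stripes / (data_stripes + parity_stripes)


def parity_overhead(data_stripes: int, parity_stripes: int) -> float:
    """Extra capacity written per unit of user data."""
    return parity_stripes / data_stripes


rate = bytes_per_second(38.5)        # roughly 10.7 GB/s
efficiency = stripe_efficiency(8, 2)  # 0.8
overhead = parity_overhead(8, 2)      # 0.25
```

Under these assumptions, 38.5 TB per hour is a little under 11 GB/s sustained, and an 8+2 layout spends 25% extra tape capacity on parity.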

Indiana University

• Multi-site setup
• Centralized archival storage for all campus clusters