Post on 24-Dec-2015

Storage Systems in HPC

John A. Chandy
Department of Electrical and Computer Engineering
University of Connecticut

Research Summary

• Storage Systems

– Active Storage

– Parallel File Systems

– Reliable Data Storage

– Active Storage Networks

Storage Systems

• Parallel Computing

– Building parallel file systems to support HPC

– Computation at the storage node

– Data organization methods to improve performance

• Reliable Data Storage

– Customizable and extensible storage for reliability

– Backup strategies using personal storage devices

– Data security, trust, and reliability in the cloud

Parallel File Systems

• Network Attached Storage

– Put the storage on the network with a computer (server) acting as the go-between

[Diagram label: Network]

Parallel File Systems

• Separate the metadata from the storage

[Diagram labels: Network, Metadata]
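The separation of metadata from storage can be sketched as a toy in-memory model. The slide does not describe a layout policy, so the round-robin striping, the class names (`MetadataServer`, `DataNode`), and the helper functions below are all assumptions made for illustration, not the author's design.

```python
class DataNode:
    """Hypothetical storage node: holds stripe contents only."""

    def __init__(self):
        self.blocks = {}

    def write(self, key, data):
        self.blocks[key] = data

    def read(self, key):
        return self.blocks[key]


class MetadataServer:
    """Hypothetical metadata service: holds file layouts, never file contents."""

    def __init__(self, num_nodes, stripe_size):
        self.num_nodes = num_nodes
        self.stripe_size = stripe_size
        self.layouts = {}

    def create(self, name, size):
        # Assumed policy: stripe i is placed on node (i mod num_nodes).
        num_stripes = (size + self.stripe_size - 1) // self.stripe_size
        layout = [(i % self.num_nodes, f"{name}:{i}") for i in range(num_stripes)]
        self.layouts[name] = layout
        return layout

    def lookup(self, name):
        return self.layouts[name]


def write_file(mds, nodes, name, data):
    # One metadata request, then the data goes straight to the data nodes.
    layout = mds.create(name, len(data))
    for i, (node_id, key) in enumerate(layout):
        start = i * mds.stripe_size
        nodes[node_id].write(key, data[start:start + mds.stripe_size])


def read_file(mds, nodes, name):
    # The metadata server is consulted once; the bytes never pass through it.
    return b"".join(nodes[node_id].read(key) for node_id, key in mds.lookup(name))


nodes = [DataNode() for _ in range(3)]
mds = MetadataServer(num_nodes=3, stripe_size=4)
payload = b"abcdefghijklmnop!"
write_file(mds, nodes, "f", payload)
round_trip = read_file(mds, nodes, "f")
```

In this sketch the client contacts the metadata service only to learn where the stripes live, which is the point of taking the server out of the data path.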

Parallel File Systems

• How do you improve metadata performance?

– Distribute metadata services on data nodes

– Use active storage and object services

Active Storage

• Allows us to run applications on storage nodes

• Can dramatically reduce data traffic

– Eliminate large network latencies

• Take advantage of fast RAID arrays and SSDs

– Drives are bottlenecked by slow networks

• Run applications in parallel across multiple nodes

• Make use of unused processor time
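The traffic reduction can be illustrated with a toy comparison: counting a pattern either by shipping the whole object to the client or by running the count on the storage node and returning only the result. `StorageNode`, its `execute` method, and the byte accounting are hypothetical names and simplifications, not the system's actual interface.

```python
class StorageNode:
    """Hypothetical storage node holding named objects in memory."""

    def __init__(self, objects):
        self.objects = dict(objects)

    def read(self, name):
        # Conventional path: the whole object crosses the network.
        return self.objects[name]

    def execute(self, func, name):
        # Active storage path: the function runs next to the data.
        return func(self.objects[name])


def conventional_count(node, name, needle):
    data = node.read(name)
    # Returns (answer, bytes that had to be transferred).
    return data.count(needle), len(data)


def active_count(node, name, needle):
    result = node.execute(lambda data: data.count(needle), name)
    # Only the encoded result travels back to the client.
    return result, len(str(result).encode())


node = StorageNode({"log": b"ok\nERROR\n" * 1000})
conventional = conventional_count(node, "log", b"ERROR")
active = active_count(node, "log", b"ERROR")
```

Both paths give the same answer; in this toy accounting the conventional path moves the full 9000-byte object while the active path moves a 4-byte result.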

Programming Model

• Based on object storage

• RPC based

– Executable objects

– RPC calls have full access to all object functions: read, write, create, set attribute, etc.

• Functions can be synchronous or async

• Supports multiple languages (C, Java, Python)

Programming Model

• Based on work by Acharya and Riedel (stream based)

• Our model is Remote Procedure Call (RPC) based

o Use executable objects

o Added command to begin execution

o Allow full access to all OSD functions

• Functions can be run sync or async

o Async is needed because of the 30-second iSCSI timeout

o Working to allow queries for async functions

• Allow parallel execution using async

• Support multiple languages (C, Java, Python)
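A rough sketch of the RPC-style model, assuming a purely in-memory stand-in for an OSD. `ToyOSD`, its method names, and the `asynchronous` flag are hypothetical; the real system issues OSD commands over iSCSI, which is not modeled here. A Python callable stored as an object plays the role of an executable object.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyOSD:
    """Hypothetical in-memory stand-in for an object storage device."""

    def __init__(self):
        self.objects = {}      # object id -> bytes, or a callable for executable objects
        self.attributes = {}   # object id -> attribute dictionary
        self._pool = ThreadPoolExecutor(max_workers=2)

    # Ordinary object functions that an executable object may call.
    def create(self, oid, data=b""):
        self.objects[oid] = data
        self.attributes[oid] = {}

    def read(self, oid):
        return self.objects[oid]

    def write(self, oid, data):
        self.objects[oid] = data

    def set_attribute(self, oid, key, value):
        self.attributes[oid][key] = value

    def execute(self, func_oid, *args, asynchronous=False):
        """The added 'begin execution' command: run an executable object."""
        func = self.objects[func_oid]
        if asynchronous:
            # Return immediately with a handle so the caller need not block.
            return self._pool.submit(func, self, *args)
        return func(self, *args)

    def close(self):
        self._pool.shutdown()


def word_count(osd, input_oid, output_oid):
    # Runs on the storage node and uses the regular object functions.
    count = len(osd.read(input_oid).split())
    osd.create(output_oid, str(count).encode())
    osd.set_attribute(output_oid, "source", input_oid)
    return count


osds = [ToyOSD(), ToyOSD()]
for osd, text in zip(osds, [b"a b c", b"d e"]):
    osd.create("input", text)
    osd.create("wc", word_count)   # store the function as an executable object

sync_result = osds[0].execute("wc", "input", "out_sync")

# Async calls start on every node first, then the results are collected.
futures = [osd.execute("wc", "input", "out_async", asynchronous=True) for osd in osds]
async_results = [future.result() for future in futures]
for osd in osds:
    osd.close()
```

Starting every asynchronous call before waiting on any of them is what lets the nodes work in parallel; a long-running synchronous call would instead hold the request open, which is where a transport timeout becomes a concern.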

Security

• Multiprocess implementation

– Limits AS functions from directly accessing objects

– Limits access to the object services library

– Enforces use of object security mechanisms

• chroot sandboxing

– C/Java engines run in a chroot directory

– Allows limited system libraries, e.g. libc

Security

• Multiprocess Implementation

o Limits AS functions from directly accessing objects

o Limits access to the OSD services library

  – Forces the use of RPC

o Enforces the use of OSD security mechanisms

• Chroot Sandboxing

o Applied to engines

o Limits engines inside a single directory

o Allows limiting of libraries

  – AS versions of libraries possible
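The multiprocess idea can be sketched as follows: the function runs in a separate process, so it has no reference to the object store and can reach objects only through requests that the parent checks. This covers process isolation only and does not perform a chroot. The JSON-over-pipe protocol, the capability string, and all names here are assumptions made for illustration.

```python
import json
import subprocess
import sys

# Source of a hypothetical active storage function. It runs in its own process
# and can reach objects only by sending requests to its parent.
FUNCTION_SOURCE = r'''
import json, sys

def rpc(request):
    sys.stdout.write(json.dumps(request) + "\n")
    sys.stdout.flush()
    return json.loads(sys.stdin.readline())

granted = rpc({"op": "read", "object": "obj1", "capability": "token-123"})
denied = rpc({"op": "read", "object": "obj1", "capability": "forged"})
rpc({"op": "done", "result": [granted, denied]})
'''


class ObjectService:
    """Hypothetical object service that checks a capability on every request."""

    def __init__(self, objects, capability):
        self.objects = objects
        self.capability = capability

    def handle(self, request):
        if request.get("capability") != self.capability:
            return {"ok": False, "error": "access denied"}
        if request.get("op") == "read" and request.get("object") in self.objects:
            return {"ok": True, "data": self.objects[request["object"]]}
        return {"ok": False, "error": "unsupported request"}


def run_function(service, source):
    """Run the function in a child process and serve its requests over pipes."""
    proc = subprocess.Popen([sys.executable, "-c", source],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
    result = None
    while True:
        line = proc.stdout.readline()
        if not line:               # the child exited and closed its pipe
            break
        request = json.loads(line)
        if request.get("op") == "done":
            result = request.get("result")
            reply = {"ok": True}
        else:
            reply = service.handle(request)
        proc.stdin.write(json.dumps(reply) + "\n")
        proc.stdin.flush()
    proc.stdin.close()
    proc.wait()
    return result


service = ObjectService({"obj1": "hello"}, capability="token-123")
outcome = run_function(service, FUNCTION_SOURCE)
```

Because every access goes through `handle`, the security check cannot be bypassed from inside the function, which mirrors the "forces the use of RPC" point on the slide.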

Active Storage Code Example

Results: AES Local vs. Active Storage

Results: Scaling with Multiple OSDs

Results: C vs. Java

High Performance Computing

• Active storage network

– Computing in the network

– SIMD-like processing of data in motion

– Adaptive computing network elements

– Application optimizations for database queries, scientific applications, data mining, sort, etc.
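One way to picture processing of data in motion is a network element that applies the same operation to every record streaming past it, for example a query predicate, so that only matching records continue to the client. The slides do not describe how the network elements are built, so `FilterElement` and the record format below are hypothetical.

```python
class FilterElement:
    """Hypothetical network element applying one predicate to every record."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.seen = 0        # records that entered the element
        self.forwarded = 0   # records passed on toward the client

    def process(self, stream):
        # Records are handled one at a time as they arrive; nothing is buffered.
        for record in stream:
            self.seen += 1
            if self.predicate(record):
                self.forwarded += 1
                yield record


# Records leaving the storage nodes, generated lazily to mimic a stream.
storage_stream = ({"id": i, "temp": 20 + i % 10} for i in range(100))

element = FilterElement(lambda record: record["temp"] >= 28)
received = list(element.process(storage_stream))
```

In this sketch the client receives 20 of the 100 records, so the selection work of the query is done on the way rather than at the endpoint.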

Active Storage Networks

[Diagram label: Data Sort]

BECAT Collaboration

• Large Data Problems

• Parallel File Systems Implementation
