Storage Resource Management (SRM) for Grid Applications: A SciDAC Supported Middleware Component

Arie Shoshani, Computing Sciences Directorate, Lawrence Berkeley National Laboratory
http://sdm.lbl.gov/srm

Post on 21-Jan-2016

TRANSCRIPT

Page 1: Storage Resource Management (SRM) For Grid Applications A SciDAC supported middleware component

Storage Resource Management (SRM)
For Grid Applications
A SciDAC supported middleware component

Arie Shoshani
Computing Sciences Directorate
Lawrence Berkeley National Laboratory

http://sdm.lbl.gov/srm

Page 2

Participants

PI: Arie Shoshani

LBNL – 2 FTEs:
• Arie Shoshani, PI
• Alex Sim, co-PI
• Junmin Gu
• Andreas Mueller

Fermilab – ½ FTE:
• Don Petravick, co-PI
• Rich Wellner

Page 3

Motivation

• Grid architecture emphasized in the past
  • Security
  • Compute resource coordination & scheduling
  • Network resource coordination & scheduling (QoS)
• SRMs' role in the data grid architecture
  • Storage resource coordination & scheduling
• Types of storage resource managers
  • Disk Resource Manager (DRM)
  • Tape Resource Manager (TRM)
  • Hierarchical Resource Manager (HRM = TRM + DRM)
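To make the three manager types listed above concrete, here is a minimal Python sketch of how an HRM might combine a TRM with a DRM. All class names, method names, and return strings are hypothetical and are not taken from the SRM software; file sizes are plain integers and the tape system is simulated with a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class DiskResourceManager:
    """DRM: manages space in a disk cache (sizes in arbitrary units)."""
    capacity: int
    files: dict = field(default_factory=dict)  # file name -> size

    def used(self) -> int:
        return sum(self.files.values())

    def store(self, name: str, size: int) -> bool:
        """Admit a file only if it fits in the remaining space."""
        if self.used() + size > self.capacity:
            return False
        self.files[name] = size
        return True


@dataclass
class TapeResourceManager:
    """TRM: fronts a tape system; `archive` stands in for the tape contents."""
    archive: dict = field(default_factory=dict)  # file name -> size

    def stage(self, name: str) -> int:
        """Return the size of a file read from tape (KeyError if absent)."""
        return self.archive[name]


@dataclass
class HierarchicalResourceManager:
    """HRM: a TRM combined with a DRM that caches staged files on disk."""
    trm: TapeResourceManager
    drm: DiskResourceManager

    def get(self, name: str) -> str:
        if name in self.drm.files:
            return "disk hit"
        size = self.trm.stage(name)
        if not self.drm.store(name, size):
            return "no space"
        return "staged from tape"
```

The sketch deliberately leaves out eviction, so a full disk cache simply refuses new files; that is the "what if there is no space?" question raised later in the talk.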

Page 4

Where Do SRMs Fit in Grid Architecture?

[Figure: architecture diagram. Component labels: clients at the client’s site, Request Interpreter, request planning, Request Executer, property-file index, Replica catalog, Network Weather Service, network, DRM with Disk Cache (shown at more than one site), HRM with Disk Cache and tape system. Flow labels: logical query, logical files, site-specific files, site-specific files requests, pinning & file transfer requests.]

Page 5

Challenges (1)

• Managing storage resources in an unreliable, distributed, large, heterogeneous system
• Long-lasting, data-intensive transactions
  • Can’t afford to restart jobs
  • Can’t afford to lose data, especially from experiments
• Types of failures
  • Storage system failures
    • Mass Storage System (MSS)
    • Disk system
  • Server failures
  • Network failures

Page 6

Challenges (2)

• Heterogeneity
  • Operating systems (well understood)
  • MSS – HPSS, Castor, Enstore, …
  • Disk systems – system attached, network attached, parallel
• Optimization issues
  • Avoid extra file transfers – what to keep in each disk cache over time
  • How to maximize sharing for multiple users
  • Global optimization
  • Multi-tier storage system optimization
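The question of what to keep in a disk cache over time can be illustrated with a small Python sketch. It assumes a least-recently-used (LRU) retention policy, which the slides do not name; the class `LRUDiskCache` and its `transfers` counter are hypothetical and only show how a retention policy translates into avoided file transfers.

```python
from collections import OrderedDict


class LRUDiskCache:
    """Keeps at most `capacity` files; evicts the least recently used one."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.files = OrderedDict()  # file name -> True, oldest first
        self.transfers = 0          # remote fetches actually performed

    def request(self, name: str) -> bool:
        """Return True on a cache hit; on a miss, count one file transfer."""
        if name in self.files:
            self.files.move_to_end(name)  # mark as most recently used
            return True
        self.transfers += 1
        if len(self.files) >= self.capacity:
            self.files.popitem(last=False)  # evict least recently used
        self.files[name] = True
        return False
```

Every hit is a transfer that did not have to happen, so when several users request overlapping file sets, the same counter also gives a rough measure of how much sharing the cache achieves.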

Page 7

Specific Problems

• Managing resource space allocation
  • What if there is no space?
• Managing pinning of files
  • What if files can be removed in the middle of a transfer?
• Space reservations
  • What if multiple files are needed concurrently?
• File streaming
  • For processing a large set of files
• Pin-lock
  • What if you pinned files and the system deadlocks?
• User priorities
• Access control – who can read/write a file
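The pinning and space-allocation problems above can be sketched in a few lines of Python. The sketch assumes that each pin carries a lifetime after which it lapses, one possible way to keep a forgotten pin from blocking the cache indefinitely; the slides raise the pin-lock problem without prescribing this remedy. The class `PinnedCache` and all of its methods are hypothetical.

```python
class PinnedCache:
    """Disk cache where pinned files cannot be evicted until the pin lapses."""

    def __init__(self, capacity: int):
        self.capacity = capacity   # total space, in arbitrary units
        self.sizes = {}            # file name -> size
        self.pin_expiry = {}       # file name -> time at which the pin lapses

    def _is_pinned(self, name: str, now: float) -> bool:
        return self.pin_expiry.get(name, 0.0) > now

    def pin(self, name: str, now: float, lifetime: float) -> None:
        """Pin a cached file for `lifetime` seconds starting at `now`."""
        if name not in self.sizes:
            raise KeyError(name)
        self.pin_expiry[name] = now + lifetime

    def release(self, name: str) -> None:
        """Drop the pin early, for example once a transfer has completed."""
        self.pin_expiry.pop(name, None)

    def allocate(self, name: str, size: int, now: float) -> bool:
        """Make room by evicting unpinned files; fail if pins block it."""
        if name in self.sizes:
            return True  # already cached
        free = self.capacity - sum(self.sizes.values())
        victims = [n for n in self.sizes if not self._is_pinned(n, now)]
        reclaimable = sum(self.sizes[n] for n in victims)
        if size > free + reclaimable:
            return False  # no space: the caller must wait or be refused
        for n in victims:
            if free >= size:
                break
            free += self.sizes.pop(n)
            self.pin_expiry.pop(n, None)
        self.sizes[name] = size
        return True
```

A pinned file survives allocation requests made during a transfer, while an expired or released pin lets the space be reclaimed.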

Page 8

HRMs in PPDG (high level view)

[Figure: file replication between LBNL and BNL; a tape system with a Disk Cache is shown twice. Labels: Replica Coordinator, HRM-COPY, HRM-GET, HRM (performs writes), HRM (performs reads), GridFTP GET (pull mode).]

• Monitors files written into BNL’s HPSS
• Selects files to replicate
• Issues request_to_put for file (or many files)

Page 9

Details of Interactions

[Figure: interaction diagram with a Client, an HRM at LBNL-PDSF, and an HRM at BNL.]

1. request to replicate
2. file request
3. Stage the file
4. notify the caller
5. gridftp from BNL to PDSF
6. release the file
7. notify the client (file in HRM)
8. migrate the file to HPSS
9. notify the client (file in HPSS)
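The numbered interaction above can be read as a fixed sequence of actions for one file. The Python sketch below walks through steps 2 to 9 once a request to replicate (step 1) has arrived. The function `replicate` and its five callable hooks are hypothetical; which component carries out each step is not spelled out in the transcript, so the hooks are left for the caller to supply.

```python
from typing import Callable, List


def replicate(
    file_name: str,
    stage: Callable[[str], None],
    transfer: Callable[[str], None],
    release: Callable[[str], None],
    migrate: Callable[[str], None],
    notify: Callable[[str, str], None],
) -> List[str]:
    """Run steps 2-9 for one file and return the list of completed steps."""
    done = ["2. file request"]            # the request is passed on for this file
    stage(file_name)                      # step 3: stage the file
    done.append("3. stage the file")
    done.append("4. notify the caller")   # staging is reported as complete
    transfer(file_name)                   # step 5: GridFTP from BNL to PDSF
    done.append("5. gridftp from BNL to PDSF")
    release(file_name)                    # step 6: release the file
    done.append("6. release the file")
    notify(file_name, "file in HRM")      # step 7
    done.append("7. notify the client (file in HRM)")
    migrate(file_name)                    # step 8: migrate the file to HPSS
    done.append("8. migrate the file to HPSS")
    notify(file_name, "file in HPSS")     # step 9
    done.append("9. notify the client (file in HPSS)")
    return done
```

Because the client is notified twice, it can start using the file as soon as it is on disk in the HRM, without waiting for the migration to HPSS.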

Page 10

Measurements

[Figure: timeline of replication events per file. Axis labels: time (ticks from 20020103123100 to 20020103123800 in steps of 100) and process.]

Files plotted:
• set287_07_10evts_h_dst.xdf.STAR.DB
• set195_02_2evts_dst.xdf.STAR.DB
• set162_01_28evts_dst.xdf.STAR.DB
• set195_01_33evts_dst.xdf.STAR.DB
• set193_01_17evts_h_dst.xdf.STAR.DB
• set165_01_31evts_dst.xdf.STAR.DB
• set165_02_30evts_dst.xdf.STAR.DB
• set163_02_24evts_dst.xdf.STAR.DB
• set163_01_32evts_dst.xdf.STAR.DB
• set192_01_27evts_dst.xdf.STAR.DB

Event labels:
• FILE_REQUEST_FAILED
• Notified_Client
• Migration_Finished
• Migration_Requested
• Transfered_to_PDSF_from_BNL
• Staging_finished_at_BNL
• Staging_started_at BNL
• Staging_requested_at_BNL
• File replication request start
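A measurement log of this kind can be turned into per-stage durations with a short Python sketch. It assumes the time-axis ticks are timestamps in the form YYYYMMDDhhmmss, which is an interpretation of values such as 20020103123100 and is not stated in the slides. The function names and the shape of the input, a list of (timestamp, event) pairs for one file, are hypothetical.

```python
from datetime import datetime

# Assumption: ticks such as 20020103123100 encode year, month, day, hour,
# minute, and second with no separators.
STAMP_FORMAT = "%Y%m%d%H%M%S"


def parse_stamp(stamp: str) -> datetime:
    """Convert one timestamp string into a datetime."""
    return datetime.strptime(stamp, STAMP_FORMAT)


def stage_durations(events):
    """Given [(stamp, event_name), ...] for one file, return the seconds
    elapsed between consecutive events, keyed by the later event's name."""
    ordered = sorted(events, key=lambda e: parse_stamp(e[0]))
    durations = {}
    for (t0, _), (t1, name) in zip(ordered, ordered[1:]):
        durations[name] = (parse_stamp(t1) - parse_stamp(t0)).total_seconds()
    return durations
```

Fed with the recorded events for each file, the result shows where the time goes, for example in staging at BNL as opposed to the transfer to PDSF.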

Page 11

SC 2001 Demo Setup

[Figure: demo setup diagram. Site labels: Berkeley (twice), Chicago, Livermore, Denver. Component labels: client, four servers, DRM, HRM, Disk Cache, GridFTP, FTP, BIT-MAP Index, Request Manager, File Transfer Monitoring. Legend: Logical Request, Data Path, Control path.]

Page 12

Monitoring File Transfer

Page 13

Accomplishment

• Developed HRMs and DRMs using the same uniform protocols
• Deployed in PPDG
• Developed command-line interface to HRM
• Wrote a joint design specification in coordination with EDG, JLab, and Fermilab (to be presented at GGF)
• Wrote a paper for MSS conference
• Future: develop a standard protocol
• Future: deploy HRM in ORNL & NERSC for ESG II project
