The Condor Data Access Framework
The Condor Data Access Framework GridFTP / NeST Day 31 July 2001 Douglas Thain


Page 1: The Condor Data Access Framework

The Condor Data Access Framework

GridFTP / NeST Day, 31 July 2001

Douglas Thain

Page 2: The Condor Data Access Framework

The Condor Data Access Framework
- Philosophy
- Components
- Organization: Communities
- Resource Discovery with ClassAds
- Example Applications
- Ongoing Work

Page 3: The Condor Data Access Framework

Philosophy
- Goal: location-independent execution of jobs with large I/O needs.
- Build moderately-sized mechanisms that can be quickly deployed to existing problems.
- With experience, explore general-purpose policies and larger systems.
- Priorities, in order: Reliability and Correctness, Throughput (PB/year), …, Performance (MB/sec)

Page 4: The Condor Data Access Framework

Where does Globus fit in?
- We expect that the Globus protocols will be the lingua franca of the grid.
- Condor is committed to speaking the right language in order to participate.
- Like any integration effort, there are some impedance-matching problems in both protocols and APIs.
- None are insurmountable.

Page 5: The Condor Data Access Framework

Components
- NeST: Network Storage Appliance
- ReqEx: Scheduled Data Mover
- Kangaroo: Opportunistic Data Mover
- Bypass: Adapts Apps to Grid
- ClassAds: Express Relationships
- Others?

Page 6: The Condor Data Access Framework

NeST
[Diagram labels: MSS, NeST, FTPD]

- ReqEx: Schedules I/O according to declarations.
- Kangaroo: Performs I/O as apps request and conditions permit.
- Bypass: Adapts ordinary I/O operations into grid protocols.
- ClassAds: Express relationships and restrictions between participants.

Page 7: The Condor Data Access Framework

ReqEx: Scheduled Data Mover
[Diagram labels: FTPD, NeST]

- Begin with list of jobs and data needs.
- Reserve space, move inputs, submit jobs, move outputs.
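The four steps on this slide can be read as a simple planning loop. The sketch below is only an illustration of that ordering, not ReqEx itself: the `Job` and `Storage` classes, the `plan` function, and the rule of deferring jobs that do not fit are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    inputs: dict   # file name -> size in bytes (declared in advance)
    outputs: dict  # file name -> expected size in bytes


@dataclass
class Storage:
    """Hypothetical stand-in for a storage appliance with fixed capacity."""
    capacity: int
    reserved: int = 0

    def reserve(self, nbytes):
        # Refuse the reservation if it would exceed capacity.
        if self.reserved + nbytes > self.capacity:
            return False
        self.reserved += nbytes
        return True


def plan(jobs, storage):
    """Turn declared data needs into an ordered list of actions.

    Returns (actions, deferred), where deferred names the jobs whose
    declared needs did not fit in the remaining space.
    """
    actions, deferred = [], []
    for job in jobs:
        need = sum(job.inputs.values()) + sum(job.outputs.values())
        if not storage.reserve(need):
            deferred.append(job.name)
            continue
        actions.append(("reserve", job.name, need))
        for fname in sorted(job.inputs):
            actions.append(("move_input", job.name, fname))
        actions.append(("submit", job.name, None))
        for fname in sorted(job.outputs):
            actions.append(("move_output", job.name, fname))
    return actions, deferred
```

Because every need is declared before anything runs, the planner can refuse or postpone work up front instead of discovering a full disk in the middle of a job.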

Page 8: The Condor Data Access Framework

Kangaroo: Opportunistic Data Mover
[Diagram labels: FTPD, NeST, NeST]

Move outputs back:
- During execution
- As conditions permit
- Fine-grained
- Hop-by-hop

Move inputs:
- On demand
- Should cache
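The output path described here (spool at once, forward when the link allows, one hop at a time) can be sketched with a chain of queues. This is a toy model rather than Kangaroo's implementation; `Hop`, `Destination`, `write`, and `pump` are hypothetical names, and link failure is modelled as a simple flag.

```python
from collections import deque


class Hop:
    """One intermediate spool on the path to the destination (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.spool = deque()
        self.link_up = True  # state of the link from this hop to the next


class Destination:
    def __init__(self):
        self.received = []


def write(hops, block):
    """Application-side write: spool at the first hop and return at once."""
    hops[0].spool.append(block)


def pump(hops, dest):
    """Make one forwarding pass and return the number of blocks moved.

    Hops are visited from the destination end backwards, so a block
    advances at most one hop per pass. A hop whose link is down keeps
    its blocks spooled until a later pass.
    """
    moved = 0
    for i in range(len(hops) - 1, -1, -1):
        hop = hops[i]
        if not hop.link_up or not hop.spool:
            continue
        block = hop.spool.popleft()
        if i + 1 < len(hops):
            hops[i + 1].spool.append(block)
        else:
            dest.received.append(block)
        moved += 1
    return moved
```

The application only ever calls `write`, so a broken link delays delivery but never blocks the job, which is the behaviour the slide summarizes as "as conditions permit".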

Page 9: The Condor Data Access Framework

Bypass
[Diagram labels: NeST, Bypass]

- Creates interposition agents that re-route system calls to other code.
- Pluggable File System (PFS): An agent built with Bypass. Presents grid protocols as filesystems.

    vi /ftp/coral.cs.wisc.edu/etc/hosts
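The example path suggests a naming scheme of the form /protocol/host/remote-path. The slide does not show how PFS parses paths, so the function below is an assumption inferred from that single example; `split_grid_path` is a hypothetical name, and only the `ftp` prefix seen on the slide is recognized.

```python
def split_grid_path(path):
    """Split '/<protocol>/<host>/<remote path>' into its three parts.

    Returns None for any path without a recognized protocol prefix, so a
    caller could hand such paths to the local filesystem unchanged.
    """
    known = {"ftp"}  # only the prefix shown on the slide; others omitted
    # '/ftp/host/etc/hosts' splits into ['', 'ftp', 'host', 'etc/hosts']
    parts = path.split("/", 3)
    if len(parts) < 3 or parts[0] != "" or parts[1] not in known or not parts[2]:
        return None
    remote = "/" + parts[3] if len(parts) == 4 else "/"
    return parts[1], parts[2], remote
```

An interposition agent would make a decision of this kind inside each re-routed call such as `open`, which is why an unmodified program like `vi` can be pointed at a remote file.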

Page 10: The Condor Data Access Framework

Organizing Structure: I/O Communities
- A community is simply a storage appliance shared by a number of CPUs.
- Traditional community: distributed file system.
- Ordinary users want to restructure communities according to application and load.
- So, communities for grid computing should be easy to set up, reconfigure, and tear down.
- NeST + Bypass makes this easy: use the protocol appropriate for the situation.

Page 11: The Condor Data Access Framework

I/O Communities
[Diagram labels: Short-haul I/O, Long-haul I/O, GridFTP, Chirp]

Page 12: The Condor Data Access Framework

What Discovery System?

Device Discovery:
- Where is my disk?
- Where can I place my output now?

Replica Discovery:
- If X is not on my disk, where can I find it?
- If I fetch X, where should I put it so that others can find it?

Page 13: The Condor Data Access Framework

Everything Together
[Diagram labels: Agent, Job, Device Discovery, Replica Discovery, CPU Discovery, Execution Site, NeST, Remote Storage, Short-Haul, Long-Haul]

Page 14: The Condor Data Access Framework

Resource Discovery with ClassAds
- “Classic” ClassAds describe the properties and requirements of two parties looking for each other.
- When expressing I/O communities, there are three parties to a match: jobs, machines, and storage.
- By extending the language slightly, we allow jobs to refer to the properties of the attached storage:

    Requirements = NearestStorage.HasCMSData

Page 15: The Condor Data Access Framework

Classic ClassAds
[Diagram: a Job and two Machines; a JobAd and a MachineAd are joined by a "match".]

Page 16: The Condor Data Access Framework

References in ClassAds
[Diagram: a Job, two Machines, and a NeST; a JobAd, a MachineAd, and a StorageAd are joined by a "match".]
- JobAd: Refers to NearestStorage.
- MachineAd: Knows where NearestStorage is.

Page 17: The Condor Data Access Framework

ClassAd Example

Job Ad:

    Type = "Job"
    Cmd = "cmsim.exe"
    Owner = "thain"
    Requirements = (OpSys == "LINUX") && (NearestStorage.HasCMS)

Machine Ad:

    Type = "Machine"
    Name = "vulture"
    OpSys = "Linux"
    Requirements = (Owner == "thain")
    NearestStorage = (Type == "Storage") && (Name == "turkey")

Storage Ad:

    Type = "Storage"
    Name = "turkey"
    HasCMS = True
    CMSPath = "/cms"
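To make the three-party match concrete, the ads above can be mimicked with plain Python dictionaries whose requirement fields are functions. This is a rough model of the idea and not the ClassAd language or Condor's matchmaker: the `matches` helper is hypothetical, and the case-insensitive string comparison of ClassAds is imitated here with `upper()`.

```python
def matches(job, machine, storage_ads):
    """Return the storage ad that completes a job/machine match, or None."""
    # Resolve the machine's NearestStorage reference against the known
    # storage ads; the first ad satisfying the expression is used.
    nearest = next((s for s in storage_ads if machine["NearestStorage"](s)), None)
    if nearest is None:
        return None
    # The machine's requirements look only at the job.
    if not machine["Requirements"](job):
        return None
    # The job's requirements may look at the machine and at its storage.
    if not job["Requirements"](machine, nearest):
        return None
    return nearest


job = {
    "Type": "Job", "Cmd": "cmsim.exe", "Owner": "thain",
    "Requirements": lambda m, s: m["OpSys"].upper() == "LINUX"
    and s.get("HasCMS", False),
}
machine = {
    "Type": "Machine", "Name": "vulture", "OpSys": "Linux",
    "Requirements": lambda j: j["Owner"] == "thain",
    "NearestStorage": lambda s: s["Type"] == "Storage" and s["Name"] == "turkey",
}
storage = {"Type": "Storage", "Name": "turkey", "HasCMS": True, "CMSPath": "/cms"}
```

Calling `matches(job, machine, [storage])` returns the `turkey` ad. If that ad is absent, or lacks `HasCMS`, the job's requirement fails and no match is produced, even though job and machine are otherwise compatible.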

Page 18: The Condor Data Access Framework

Notes on ClassAds
- Every match is a hint. Participants must verify in the claiming phase.
- Storage: If the dataset is missing, abort the process and roll back.
- The reference feature is new in Condor 6.3.
- It is a variation on ‘gang-matching’ as described by Raman et al.
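Treating a match as a hint means the claim step has to re-check what the ad promised and undo its own work if the check fails. The sketch below shows only that pattern; the `claim` function and the `datasets` and `claimed_by` fields are hypothetical and do not reflect Condor's claiming protocol.

```python
def claim(job, storage):
    """Try to turn a match into a claim; return True on success.

    `storage` is a dict with 'datasets' (names actually present) and
    'claimed_by' (names of jobs holding a claim). Both are assumptions.
    """
    storage["claimed_by"].add(job["name"])          # tentative claim
    if job["dataset"] not in storage["datasets"]:   # verify the hint
        storage["claimed_by"].discard(job["name"])  # roll back
        return False
    return True
```

A stale advertisement therefore costs one failed claim, after which the job can be matched again elsewhere.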

Page 19: The Condor Data Access Framework

Example Applications
- I/O Communities: Applied to CMS simulation codes running at INFN and UW. Unmodified apps retrieve calibration data from nearest NeST.
- Kangaroo: Applied to Gaussian codes running at NCSA. Users get progressive output when possible, but network failures don’t stop output. Same idea applied to CMS reconstruction at INFN. (Older work called Grid Console.)
- ReqEx: In testing mode on CMS reconstruction at UW.

Page 20: The Condor Data Access Framework

Ongoing Work
- Move jobs to data or vice versa? We can easily build communities for a particular application. Can we build software that works reasonably well in any situation?
- Select staging or remote I/O? Depends on number of jobs, storage capacity, network capacity, etc.
- Integration with replica management. Is the App->NeST channel collection-aware?

Page 21: The Condor Data Access Framework

Upcoming Publications
- Thain, Basney, Chang, Livny, “The Kangaroo Approach to Data Movement on the Grid”, HPDC 10.
- Thain, Bent, Livny, Arpaci-Dusseau, Arpaci-Dusseau, “Gathering at the Well: Creating Communities for Grid I/O”, Supercomputing 2001.
