high throughput urgent computing

16
High Throughput Urgent Computing Jason Cope [email protected] Condor Week 2008

Upload: abra

Post on 01-Feb-2016

18 views

Category:

Documents


0 download

DESCRIPTION

Condor Week 2008. High Throughput Urgent Computing. Jason Cope [email protected]. Project Collaborators. Argonne National Laboratory / University of Chicago Pete Beckman Suman Nadella Nick Trebon University of Wisconsin-Madison Ian Alderman Miron Livny. Urgent Computing Use Cases. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: High Throughput Urgent Computing

High Throughput Urgent Computing

Jason Cope

[email protected]

Condor Week 2008

Page 2: High Throughput Urgent Computing

2

Project Collaborators

Argonne National Laboratory / University of Chicago Pete Beckman Suman Nadella Nick Trebon

University of Wisconsin-Madison Ian Alderman Miron Livny

Page 3: High Throughput Urgent Computing

3

Urgent Computing Use Cases

Page 4: High Throughput Urgent Computing

4

High Throughput Urgent Computing

Urgent computing provides immediate, cohesive access to computing resources for emergency computations

Support for urgent high throughput computing environments is necessary Support for high throughput emergency computing

applications Urgent cycle scavenging

Page 5: High Throughput Urgent Computing

5

Resources for Urgent Computing Environments

Pros Cons

Dedicated Resources

Immediate access Wasted cycles Cost

Shared Resources

Reuse existing resources Increased utilization

Resource contention Scheduling, authorization

Page 6: High Throughput Urgent Computing

6

SPRUCE

Special PRiority Urgent Computing Environment (SPRUCE) TeraGrid Science Gateway http://spruce.teragrid.org

GOAL: Provide cohesive urgent computing infrastructure for emergency computations Authorization Resource Selection Resource Allocation

Page 7: High Throughput Urgent Computing

7

SPRUCE Architecture Overview ( 1 / 2 )

AutomatedTrigger

HumanTrigger

Right-of-WayToken

2

1

Event

First Responder

Right-of-Way Token

SPRUCE Gateway / Web

Services

Source: Pete Beckman, ‘SPRUCE: An Infrastructure for Urgent Computing’

Page 8: High Throughput Urgent Computing

8

SPRUCE Architecture Overview ( 2 / 2 )

Conventional Job Submission

Parameters

!Urgent Computing

Parameters

Urgent ComputingJob Submission

SPRUCE JobManager

SupercomputerResource

Local Site Policies

Priority JobQueue

User Team Authentication

3

Choose aResource

4

5

Source: Pete Beckman, ‘SPRUCE: An Infrastructure for Urgent Computing’

Page 9: High Throughput Urgent Computing

9

SPRUCE Resources

Deployed on TeraGrid resources at IU, NCSA, NCAR, Purdue, TACC, SDSC, UC/ANL

Supported Resource Managers PBS PBS Pro LSF SGE LoadLeveler Cobalt

Local and Grid resource managers supported

Page 10: High Throughput Urgent Computing

10

SPRUCE and Condor

Conventional Job Submission

Parameters

!Urgent Computing

Parameters

Urgent ComputingJob Submission

SPRUCE JobManager

Condor Pool

Local Site Policies

User Team Authentication

3

Choose aResource

4

Adapted from Pete Beckman, ‘SPRUCE: An Infrastructure for Urgent Computing’

Page 11: High Throughput Urgent Computing

11

SPRUCE / Condor Integration

Added support for urgent computing ClassAds SPRUCE_URGENCY SPRUCE_TOKEN_VALID SPRUCE_TOKEN_VALID_CHECK_TIME

Modifications to the Condor schedd that support identifying SPRUCE jobs

SPRUCE Grid ASCII Helper Protocol (GAHP) Server Asynchronously invoke SPRUCE Web service

operations GAHP calls integrated into the Condor schedd

Page 12: High Throughput Urgent Computing

12

SPRUCE / Condor Integration

Page 13: High Throughput Urgent Computing

13

SPRUCE / Condor Integration

SPRUCE provides an authorization mechanism for access to Condor resources “Right-of-Way” access to Condor resources Same authorization infrastructure for

supercomputer and Grid resource access

Leverage existing Condor features to enhance scheduling policies Job ranking / suspension / preemption Site administrators define local scheduling policies

Page 14: High Throughput Urgent Computing

14

SPRUCE / Condor Status

Prototype complete August, 2007 Demonstrated urgent authorization and scheduling

capabilities Deployed and tested on equipment at the

University of Colorado Currently revising the prototype for a stable

software release Condor 7.0 support Final software development iteration before official

release Evaluation of SPRUCE-related software integrated

into larger Condor pools

Page 15: High Throughput Urgent Computing

15

Future Work

High throughput support for urgent computing applications SURA SCOOP CH3D Grid Appliance

Many additional evaluation tasks Application requirements Security Deadline scheduling / response time Reliability / fault tolerance analysis Data management

Page 16: High Throughput Urgent Computing

16

High Throughput Urgent Computing

[email protected]

http://spruce.teragrid.org