condor-g making condor grid enabled

Post on 30-Dec-2015

86 Views

Category:

Documents

7 Downloads

Preview:

Click to see full reader

DESCRIPTION

Condor-G Making Condor Grid Enabled. Outline. Why use Condor-G Globus Universe GlideIn Status & Future Work. What is Condor-G?. Extensions to Condor to allow access to the Grid through Globus Two Parts Globus Universe GlideIn. Why Use Condor-G. Condor - PowerPoint PPT Presentation

TRANSCRIPT

Jaime FreyComputer Sciences DepartmentUniversity of Wisconsin-Madison

jfrey@cs.wisc.eduhttp://www.cs.wisc.edu/condor

Condor-GMaking Condor Grid

Enabled

www.cs.wisc.edu/condor

Outline

› Why use Condor-G

› Globus Universe

› GlideIn

› Status & Future Work

www.cs.wisc.edu/condor

What is Condor-G?

› Extensions to Condor to allow access to the Grid through Globus

› Two Parts Globus Universe GlideIn

www.cs.wisc.edu/condor

Why Use Condor-G

› Condor Designed to run jobs within a single

administrative domain

› Globus Designed to run jobs across many

administrative domains

› Condor-G Combine the strengths of both

www.cs.wisc.edu/condor

Condor-G Helps Condor Users

› Machines available to Condor users are limited Local Condor Pool Friendly Condor Pools (via Flocking)

› Through Globus, many more machines become available to run your jobs

www.cs.wisc.edu/condor

Condor-G Helps Globus Users

› Globus is primarily an infrastructure upon which to develop distributed applications

› Command-line tools are limited› Some users don’t want to rewrite their

applications to use Globus› Condor-G provides them a powerful

interface to the Grid to run their existing applications

www.cs.wisc.edu/condor

Globus Universe

› Advantages of using Condor as a front-end to Globus Full-featured queuing service Fault-tolerance Credential Management

www.cs.wisc.edu/condor

Full-Featured Queue

› Persistent queue

› Many queue-manipulation tools

› Set up job dependencies (DAGman)

› E-mail notification of events

› Log files

www.cs.wisc.edu/condor

Fault-Tolerance

› Local Crash Queue state kept on disk Condor Master restarts other daemons

› Remote Crash Condor will resubmit jobs Globus jobmanager enhanced to

improve recoverability

www.cs.wisc.edu/condor

Credential Management

› Authentication in Globus is done with limited-lifetime X509 proxies

› Proxy may expire before jobs finish executing

› Condor can put jobs on hold and e-mail user to refresh proxy

www.cs.wisc.edu/condor

How It Works

ScheddSchedd

LSFLSF

Personal Condor Globus Resource

www.cs.wisc.edu/condor

How It Works

ScheddSchedd

LSFLSF

Personal Condor Globus Resource

600 Globusjobs

www.cs.wisc.edu/condor

How It Works

ScheddSchedd

LSFLSF

Personal Condor Globus Resource

GridManagerGridManager

600 Globusjobs

www.cs.wisc.edu/condor

How It Works

ScheddSchedd JobManagerJobManager

LSFLSF

Personal Condor Globus Resource

GridManagerGridManager

600 Globusjobs

www.cs.wisc.edu/condor

How It Works

ScheddSchedd JobManagerJobManager

LSFLSF

User JobUser Job

Personal Condor Globus Resource

GridManagerGridManager

600 Globusjobs

www.cs.wisc.edu/condor

Globus Universe

› Disadvantages No matchmaking or dynamic

scheduling of jobs No job checkpoint or migration No remote system calls

www.cs.wisc.edu/condor

Solution: GlideIn

› Use the Globus Universe to run the Condor daemons on Globus resources

› When the resources run these GlideIn jobs, they will join your Condor Pool

› Submit your jobs as Standard or Vanilla Universe jobs and they will be matched and run on the Globus resources

www.cs.wisc.edu/condor

How It Works

ScheddSchedd

LSFLSF

CollectorCollector

Personal Condor Globus Resource

600 Condorjobs

www.cs.wisc.edu/condor

How It Works

ScheddSchedd

LSFLSF

CollectorCollector

Personal Condor Globus Resource

600 Condorjobs

glide-ins

www.cs.wisc.edu/condor

How It Works

ScheddSchedd

LSFLSF

CollectorCollector

Personal Condor Globus Resource

GridManagerGridManager

600 Condorjobs

glide-ins

www.cs.wisc.edu/condor

How It Works

ScheddSchedd JobManagerJobManager

LSFLSF

CollectorCollector

Personal Condor Globus Resource

GridManagerGridManager

600 Condorjobs

glide-ins

www.cs.wisc.edu/condor

How It Works

ScheddSchedd JobManagerJobManager

LSFLSF

StartdStartd

CollectorCollector

Personal Condor Globus Resource

GridManagerGridManager

600 Condorjobs

glide-ins

www.cs.wisc.edu/condor

How It Works

ScheddSchedd JobManagerJobManager

LSFLSF

StartdStartd

CollectorCollector

Personal Condor Globus Resource

GridManagerGridManager

600 Condorjobs

glide-ins

www.cs.wisc.edu/condor

How It Works

ScheddSchedd JobManagerJobManager

LSFLSF

User JobUser Job

StartdStartd

CollectorCollector

Personal Condor Globus Resource

GridManagerGridManager

600 Condorjobs

glide-ins

www.cs.wisc.edu/condor

GlideIn Concerns

› What if a Globus resource kills my GlideIn? That resource will disappear from your pool and

you jobs will be rescheduled on other machines

› What if all my jobs are done before a GlideIn runs? If the glided-in Condor daemons are not

matched with a job in 10 minutes, they terminate

www.cs.wisc.edu/condor

yourworkstation

personalCondor

Globus Grid

PBS LSF

Condor

GroupCondor

www.cs.wisc.edu/condor

yourworkstation

personalCondor

Globus Grid

PBS LSF

Condor

GroupCondor

600 Condorjobs

www.cs.wisc.edu/condor

yourworkstation

personalCondor

Globus Grid

PBS LSF

Condor

GroupCondor

600 Condorjobs

www.cs.wisc.edu/condor

yourworkstation

personalCondor

Globus Grid

PBS LSF

Condor

GroupCondor

glide-ins

600 Condorjobs

www.cs.wisc.edu/condor

yourworkstation

personalCondor

Globus Grid

PBS LSF

Condor

GroupCondor

glide-ins

600 Condorjobs

www.cs.wisc.edu/condor

yourworkstation

personalCondor

Globus Grid

PBS LSF

Condor

GroupCondor

glide-ins

600 Condorjobs

www.cs.wisc.edu/condor

yourworkstation

personalCondor

Globus Grid

PBS LSF

Condor

GroupCondor

glide-ins

600 Condorjobs

www.cs.wisc.edu/condor

yourworkstation

personalCondor

Globus Grid

PBS LSF

Condor

GroupCondor

glide-ins

600 Condorjobs

www.cs.wisc.edu/condor

Current Status

› First version of GridManager ready Runs jobs using Globus GRAM Stages executable and standard I/O

using Globus GASS

› Jobmanager changes will be folded into a future release of Globus

› Credential management in progress

www.cs.wisc.edu/condor

Future Work

› GridManager Stage user jobs’ data files

› Automatic GlideIn Condor creates GlideIn jobs when more

resources are needed

› Matchmaking in Globus Universe Use Globus GRIS to create ClassAds for

Globus resources and match them to job ClassAds

www.cs.wisc.edu/condor

Questionsand

Thank You!

top related