job management

17
INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org Job management Introduction to job management Jorge Gomes , Mario David, Gonçalo Borges LIP

Upload: phamdan

Post on 06-Feb-2017

219 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Job management

INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

Job managementIntroduction to job management

Jorge Gomes, Mario David, Gonçalo BorgesLIP

Page 2: Job management

Workload Management System – 29 September 2007

2

Enabling Grids for E-sciencE

INFSO-RI-508833

Contents

• Simple direct job submission

• What is the Workload Management System and Resource Brokers?

• How do you use it?

• Further information

• Practicals

Page 3: Job management

Workload Management System – 29 September 2007

3

Enabling Grids for E-sciencE

INFSO-RI-508833

Computing Element

WNWN ...

Batch system

WNWN WNWN

CECE

Internet

SiteLocalnetwork

Farm

Jobs

SiteFirewallgatekeeper qsub or similar

PBS, LSF. SGE, BQS, Condor ...

Page 4: Job management

Workload Management System – 29 September 2007

4

Enabling Grids for E-sciencE

INFSO-RI-508833

Computing Elements and User interfaces

• The most deployed type of Computing Element in EGEE is the so called lcg-CE– Based on the globus gatekeeper– Enhanced by the CERN LCG project– Further enhanced and maintained by EGEE and included in gLite

(ported to the SL4 OS and being currently tested)• Other computing elements include

– gLite-CE now being abandoned– CREAM a web services based CE fully independent from globus

• It is possible to submit jobs directly to the lcg-CE using the globus utilities available in the User Interface

UserUserInterface Interface Nodes

Page 5: Job management

Workload Management System – 29 September 2007

5

Enabling Grids for E-sciencE

INFSO-RI-508833

Architecture summary

CEs

LocalWorkstation

UIUI (user interface) has preinstalled client software

ssh

lcg-CE lcg-CE CREAM-CE gLite-CE

GlobusICE

Resource Broker server

lcg-RBor

I2G CrossBroker

gLite WMS

Page 6: Job management

Workload Management System – 29 September 2007

6

Enabling Grids for E-sciencE

INFSO-RI-508833

Globus job submission

• Direct job submission to the CE in gLite infrastructures may not be support in the future

• CLI tools for direct job submission to CREAM CEs will be provided as well (ICE) ICE-CREAM

• Direct job submission does not fit in the gLite job management philosophy– gLite philosophy is based on job submission through the

Resource Brokers (RB)• But it may be useful for management and test purposes

– It is faster– Uses less levels of middleware services

Page 7: Job management

Workload Management System – 29 September 2007

7

Enabling Grids for E-sciencE

INFSO-RI-508833

Globus job submission

• Uses the Globus GRAM services only

• Steps:1. Find a computing element matching your needs and

a local queue were to submit the job2. Make sure you have a valid proxy3. Use the globus-job-run for synchronous job execution

or the globus-job-submit to get the prompt back4. When using the globus-job-submit you get back a job

identifier that can be used to:a) check the job status with globus-job-statusb) cancel the job with globus-job-cancelc) retrieve the job output with globus-job-get-output

Page 8: Job management

Workload Management System – 29 September 2007

8

Enabling Grids for E-sciencE

INFSO-RI-508833

Without a resource broker…

• Without a resource broker, need direct interaction with nodes– Need to know resource addresses, capabilities, availability …

• Usually want a higher level abstraction – submit a job to a Grid not to a CE

User User Nodes

Page 9: Job management

Workload Management System – 29 September 2007

10

Enabling Grids for E-sciencE

INFSO-RI-508833

Which CE do you want to use?

• Without an RB, use the Information System to see what’s available, then choose… lcg-infosites --vo itut ce

#CPU Free Total Jobs Running Waiting ComputingElement---------------------------------------------------------- 22 22 0 0 0 ce-ieg.bifi.unizar.es:2119/jobmanager-lcgpbs-itut 20 20 0 0 0 ce.i2g.cesga.es:2119/jobmanager-lcgpbs-itutgrid 20 11 0 0 0 ce.i2g.cyf-kr.edu.pl:2119/jobmanager-pbs-itut 350 293 0 0 0 i2gce01.ifca.es:2119/jobmanager-lcgpbs-itut 32 32 0 0 0 i2gce.ui.savba.sk:2119/jobmanager-pbs-itut 60 22 0 0 0 i2g-ce01.lip.pt:2119/jobmanager-lcgsge-itutgridsdj

• The RB does this for you! – chooses CE for each job, balances workload, manages jobs and

their files

Page 10: Job management

Workload Management System – 29 September 2007

11

Enabling Grids for E-sciencE

INFSO-RI-508833

With a Resource Broker

• RB manages jobs on users’ behalf• User doesn’t decide where jobs are run• User defines the job and its requiremements, RB matches

this with available CEs• Effect:

• Easier submission• Users insulated from change in Compute elements • RB – can optimise your jobs – e.g. which CE?

UserUser RBRBComputing Elements

Page 11: Job management

12

Enabling Grids for E-sciencE

INFSO-RI-508833

The gLite job management

CEs

LocalWorkstation

UIUI (user interface) has preinstalled client software

lcg-RBor

WMS

User describes job in text file using Job Description Language (JDL)

Submits job to a RB using (usually) the command-line interface

ssh

lcg-RB server

or

gLite WMS server

Page 12: Job management

Workload Management System – 29 September 2007

13

Enabling Grids for E-sciencE

INFSO-RI-508833

Submitting jobs to lcg-RBs• Jobs run in batch mode on traditional gLite grids. • Steps in running a job on a gLite grid with a lcg-RB:

1. Create a text file in “Job Description Language”2. Optional check: list the compute elements that match

your requirements (“edg-job-list-match myfile.jdl” command)

3. Submit the job ~ “edg-job-submit myfile.jdl”Non-blocking - Each job is given an id.

4. Occasionally check the status of your job (“edg-job-status” command)

5. When “Done” retrieve output (“edg-job-get-output” command)

6. Or just cancel the job (“edg-job-cancel” command)

Page 13: Job management

Workload Management System – 29 September 2007

14

Enabling Grids for E-sciencE

INFSO-RI-508833

Submitting jobs to a gLite WMS• Jobs run in batch mode on traditional gLite grids. • Steps in running a job on a gLite grid with WMS:

1. Create a text file in “Job Description Language”2. Optional check: list the compute elements that match

your requirements (“glite-wms-job-list-match myfile.jdl” command)

3. Submit the job ~ “glite-wms-job-submit myfile.jdl”Non-blocking - Each job is given an id.

4. Occasionally check the status of your job (“glite-wms-job-status” command)

5. When “Done” retrieve output (“glite-wms-job-output” command)

6. Or just cancel the job (“glite-wms-job-cancel” command)

Page 14: Job management

Workload Management System – 29 September 2007

16

Enabling Grids for E-sciencE

INFSO-RI-508833

Executable = “gridTest”;StdError = “stderr.log”;StdOutput = “stdout.log”;InputSandbox = {“/home/joda/test/gridTest”};OutputSandbox = {“stderr.log”, “stdout.log”};Requirements = other.GlueCEPolicyMaxCPUTime > 480;RetryCount = 3;

Example JDL file

Page 15: Job management

Workload Management System – 29 September 2007

17

Enabling Grids for E-sciencE

INFSO-RI-508833

Job states

Flag Meaning

SUBMITTEDsubmission logged in the Logging & Bookkeeping

service

WAIT job match making for resources

READY job being sent to executing CE

SCHEDULED job scheduled in the CE queue manager

RUNNINGjob executing on a Worker Node of the selected

CE queue

DONE job terminated without grid errors

CLEARED job output retrieved

ABORT job aborted by middleware, check reason

Page 16: Job management

Workload Management System – 29 September 2007

23

Enabling Grids for E-sciencE

INFSO-RI-508833

Summary

Logging &Logging &Book-keepingBook-keeping

WMSWMS

StorageStorageElementElement

ComputingComputingElementElement

Information Information ServiceService

Job Status

DataSets info

Author.&Authen.

Job Submit

Event

Job Q

uery Job S

tatu

s

Input “sandbox”

Input “sandbox” + Broker Info

Output “sandbox”

Output “sandbox”

Publish

SE & CE info

““User User interface”interface”

LCG LCG FileCatalogue FileCatalogue (LFC)(LFC)

Page 17: Job management

Workload Management System – 29 September 2007

24

Enabling Grids for E-sciencE

INFSO-RI-508833

Further information• gLite Users Guide

– Follow http://www.glite.org and “Documentation”

• About JDL and grid– https://edms.cern.ch/file/555796/1/EGEE-JRA1-TEC-555796-JD

L-Attributes-v0-8.pdf– http://www.grid.org.tr/servisler/dokumanlar/DataGrid-JDL-HowTo

.pdf

• GILDA wiki– We are using some of these pages– https://grid.ct.infn.it/twiki/bin/view/GILDA/

• EGEE Digital Library http://egee.lib.ed.ac.uk/