FP6−2004−Infrastructures−6-SSA-026409 www.eu-eela.org E-infrastructure shared between Europe and Latin America Special Jobs Claudio Cherubino INFN Catania Fourth EELA Tutorial for Managers and Users Mexico City, 28 August-1 September 2006


FP6−2004−Infrastructures−6-SSA-026409

www.eu-eela.org

E-infrastructure shared between Europe and Latin America

Special Jobs

Claudio Cherubino

INFN Catania

Fourth EELA Tutorial for Managers and Users

Mexico City, 28 August-1 September 2006


Outline

• MPI jobs on gLite

• DAG

• Job Collection

• Parametric jobs


MPI Overview

• Executing parallel jobs is essential for modern computing applications.

• The most widely used library for parallel job support is MPI (Message Passing Interface).

• Currently, parallel jobs can run only inside a single Computing Element (CE).

  – Several projects are studying the possibility of executing parallel jobs on Worker Nodes (WNs) belonging to different CEs.


Requirements & settings

• For an MPI job to run, the following requirements MUST be satisfied:

  – The MPICH software must be installed on each WN of the CE, and its location must be included in the PATH environment variable.

  – Some MPI applications require a filesystem shared among the WNs in order to run.

      The variable VO_<name_of_VO>_SW_DIR contains the name of a directory if there is a SHARED filesystem.

      The variable VO_<name_of_VO>_SW_DIR contains "." if there is NO SHARED filesystem.
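The check described above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and upper-casing the VO name (for example VO_GILDA_SW_DIR) is an assumption about how <name_of_VO> is spelled on the site.

```python
import os


def vo_sw_dir_variable(vo_name):
    """Build the variable name VO_<name_of_VO>_SW_DIR.

    Assumption: the VO name is upper-cased and '.'/'-' become '_'.
    """
    safe = vo_name.upper().replace(".", "_").replace("-", "_")
    return f"VO_{safe}_SW_DIR"


def shared_sw_dir(vo_name, env=None):
    """Return the shared directory, or None when the variable is "." or unset."""
    env = os.environ if env is None else env
    value = env.get(vo_sw_dir_variable(vo_name))
    if value is None or value == ".":
        return None  # no shared filesystem among the WNs
    return value


# Example with an explicit, made-up environment instead of os.environ.
print(shared_sw_dir("gilda", {"VO_GILDA_SW_DIR": "."}))
```

A job wrapper script could use such a check to decide whether the executable must be copied to every WN before starting the MPI run.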


• From the user's point of view, a job is run as MPI by setting the JDL JobType attribute to MPICH and specifying the NodeNumber attribute as well. E.g.:

    JobType = "MPICH";

    NodeNumber = 4;

  NodeNumber defines the number of CPUs required by the application.


• When the JobType and NodeNumber attributes are included in a JDL, the User Interface (UI) automatically adds the following expression to the JDL Requirements expression, in order to find the best resource where the job can be executed:

    (other.GlueCEInfoTotalCPUs >= NodeNumber) &&
    Member("MPICH", other.GlueHostApplicationSoftwareRunTimeEnvironment)
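To make the effect of this expression concrete, here is a minimal Python sketch that applies the same two conditions to a few CE descriptions. The CE records and their ids are invented for illustration; the real matchmaking is performed by the WMS against the information system, not by user code.

```python
def matches_mpi_requirements(ce, node_number):
    """Mirror the two conditions of the Requirements expression above."""
    enough_cpus = ce["GlueCEInfoTotalCPUs"] >= node_number
    has_mpich = "MPICH" in ce["GlueHostApplicationSoftwareRunTimeEnvironment"]
    return enough_cpus and has_mpich


# Hypothetical CE descriptions, invented for this example.
ces = [
    {"id": "ce-a", "GlueCEInfoTotalCPUs": 8,
     "GlueHostApplicationSoftwareRunTimeEnvironment": ["MPICH", "LCG-2"]},
    {"id": "ce-b", "GlueCEInfoTotalCPUs": 2,
     "GlueHostApplicationSoftwareRunTimeEnvironment": ["MPICH"]},
    {"id": "ce-c", "GlueCEInfoTotalCPUs": 16,
     "GlueHostApplicationSoftwareRunTimeEnvironment": ["LCG-2"]},
]

eligible = [ce["id"] for ce in ces if matches_mpi_requirements(ce, 4)]
print(eligible)  # ['ce-a']
```

With NodeNumber = 4, ce-b is excluded because it has too few CPUs and ce-c because it does not publish MPICH.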


MPI exercise

Create the file "mpi-test.jdl" inside $HOME/gLite/Other with the following content:

[
  Type = "Job";
  JobType = "MPICH";
  Executable = "cpi";
  NodeNumber = 2;
  StdOutput = "cpi.out";
  StdError = "cpi.err";
  InputSandbox = {"cpi"};
  OutputSandbox = {"cpi.err", "cpi.out"};
  RetryCount = 0;
]
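When many similar JDL files are needed, they can be generated rather than typed. The following is a rough Python sketch of such a generator; render_jdl and jdl_value are hypothetical helpers, not part of gLite, and they only handle strings, integers and flat lists (no escaping of quotes inside strings).

```python
def jdl_value(value):
    """Format a Python value as JDL text: quoted string, {list} or bare number."""
    if isinstance(value, str):
        return '"' + value + '"'
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    return str(value)


def render_jdl(attributes):
    """Render a dict of attributes as a single JDL description."""
    lines = ["["]
    for name, value in attributes.items():
        lines.append(f"  {name} = {jdl_value(value)};")
    lines.append("]")
    return "\n".join(lines)


mpi_test = {
    "Type": "Job",
    "JobType": "MPICH",
    "Executable": "cpi",
    "NodeNumber": 2,
    "StdOutput": "cpi.out",
    "StdError": "cpi.err",
    "InputSandbox": ["cpi"],
    "OutputSandbox": ["cpi.err", "cpi.out"],
    "RetryCount": 0,
}

print(render_jdl(mpi_test))
```

The printed text has the same attributes as the exercise file and could be written to mpi-test.jdl.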


MPI submission

[glite-tutor] /home/claudio > glite-job-submit -o id mpi-test.jdl

Selected Virtual Organisation name (from proxy certificate extension): gilda
Connecting to host glite-rb.ct.infn.it, port 7772
Logging to host glite-rb.ct.infn.it, port 9002

========== glite-job-submit Success ======================
The job has been successfully submitted to the Network Server.
Use glite-job-status command to check job current status.
Your job identifier is:

- https://glite-rb.ct.infn.it:9000/bsrbbzbcXZWSzU3iUYlm6g

The job identifier has been saved in the following file:
/home/claudio/id
==========================================================
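Scripts that drive several submissions often need the job identifier out of this text. A minimal Python sketch follows; it assumes identifiers always have the https://host:port/id shape shown above, and extract_job_ids is a hypothetical helper name.

```python
import re

# Assumption: job identifiers look like https://<host>:<port>/<id>,
# as in the submission output shown on the slide.
JOB_ID_PATTERN = re.compile(r"https://[\w.\-]+:\d+/[\w\-]+")


def extract_job_ids(text):
    """Return every job identifier found in the given text."""
    return JOB_ID_PATTERN.findall(text)


sample_output = """\
Connecting to host glite-rb.ct.infn.it, port 7772
Your job identifier is:

- https://glite-rb.ct.infn.it:9000/bsrbbzbcXZWSzU3iUYlm6g

The job identifier has been saved in the following file:
/home/claudio/id
"""

print(extract_job_ids(sample_output))
```

The same function can be applied to the contents of the file written by the -o option, since that file stores the identifier as well.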


MPI status and output

Query the status of the job using the following command:

[glite-tutor] /home/claudio > glite-job-status -i id

…………………………………………….

When the status of the job is “DONE”, you can retrieve the output with the following command:

[glite-tutor] /home/claudio > glite-job-output -i id

……………………………………………


MPI on the web…

• LCG-2 User Guide Manuals Series
  – https://edms.cern.ch/file/454439/LCG-2-UserGuide.pdf

• http://oscinfo.osc.edu/training/

• http://www.netlib.org/mpi/index.html

• http://www-unix.mcs.anl.gov/mpi/learning.html

• http://www.ncsa.uiuc.edu/


Workload Manager Proxy


WMProxy overview

• WMProxy (Workload Manager Proxy)

– is a new service providing access to the gLite Workload Management System (WMS) functionality through a simple Web Services based interface.

– has been designed to efficiently handle a large number of requests for job submission and control to the WMS

– the service interface addresses the Web Services and SOA architecture standards, in particular adhering to WS-I

– developed in C++ using gsoap 2.7.6b as soap stubs generator


New request types

• Support for new types strongly relies on newly developed JDL converters and on the DAG submission support
  – all JDL conversions are performed on the server
  – a single submission for several jobs

• All new request types can be monitored and controlled through a single handle (the request id)
  – each sub-job can, however, be followed up and controlled independently through its own id

• “Smarter” WMS client commands/API
  – allow submission of DAGs, collections and parametric jobs exploiting the concept of “shared sandbox”
  – allow automatic generation and submission of collections and DAGs from sets of JDL files located in user-specified directories on the UI


WMProxy : submission & monitoring

• In order to submit jobs with WMProxy, it is mandatory to delegate credentials:

    glite-wms-job-delegate-proxy -d del_ID

• The submission/monitoring commands are slightly different, but most of the “old” options are supported:

    glite-wms-job-submit -d del_ID collection.jdl

    glite-wms-job-status jobID

    glite-wms-job-output \
      https://glite-rb.ct.infn.it:9000/LHIIGaCVdl7Olmsz0jpI_g
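The delegate-then-submit order can be captured in a small Python sketch that only builds the command lines, using exactly the options shown on the slide. The helper name wmproxy_commands is hypothetical, and nothing is executed here; on a real UI the lists could be passed to subprocess.run.

```python
import shlex


def wmproxy_commands(delegation_id, jdl_path):
    """Return the two commands, in order: delegate credentials, then submit."""
    return [
        ["glite-wms-job-delegate-proxy", "-d", delegation_id],
        ["glite-wms-job-submit", "-d", delegation_id, jdl_path],
    ]


for argv in wmproxy_commands("del_ID", "collection.jdl"):
    # Print the command as it would be typed in a shell.
    print(shlex.join(argv))
```

Keeping the delegation identifier in one variable avoids submitting with an identifier that was never delegated.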


DAG job

• A DAG job is a set of jobs where input, output, or execution of one or more jobs can depend on other jobs

• Dependencies are represented through Directed Acyclic Graphs, where the nodes are jobs, and the edges identify the dependencies

[Figure: example DAG with nodes nodeA, nodeB, nodeC, NodeF and nodeD]
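The "acyclic" property is what guarantees that some execution order exists. As a small illustration, the Python sketch below checks it with the standard graphlib module (Python 3.9+). The graphs used are made up for the example and are not taken from the figure; is_acyclic is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(predecessors):
    """predecessors maps each job to the set of jobs it depends on."""
    try:
        TopologicalSorter(predecessors).prepare()
    except CycleError:
        return False
    return True


# Hypothetical dependency maps, for illustration only.
valid = {"job2": {"job1"}, "job3": {"job2"}}
circular = {"job1": {"job2"}, "job2": {"job1"}}

print(is_acyclic(valid))     # True
print(is_acyclic(circular))  # False
```

A circular description such as the second one could never be scheduled, because each job would wait for the other.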


JDL structure


Attribute: Nodes


Attribute: Dependencies


DAG jdl

[
  type = "dag";
  max_nodes_running = 4;
  nodes = [
    nodeA = [ file = "nodes/nodeA.jdl"; ];
    nodeB = [ file = "nodes/nodeB.jdl"; ];
    nodeC = [ file = "nodes/nodeC.jdl"; ];
    nodeD = [ file = "nodes/nodeD.jdl"; ];
    dependencies = {
      {nodeA, nodeB},
      {nodeA, nodeC},
      { {nodeB, nodeC}, nodeD }
    }
  ];
]

Node descriptions could also be written inline in the nodes attribute, instead of using separate files.
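To see what the dependencies attribute of this DAG implies, the Python sketch below expands each pair into "runs before" edges and groups the nodes into waves that could run concurrently. It is an illustration built on the standard graphlib module (Python 3.9+), not the WMS scheduler; the helper names are hypothetical and max_nodes_running is not modelled.

```python
from graphlib import TopologicalSorter


def as_list(item):
    """Treat a single node name and a group of names uniformly."""
    return list(item) if isinstance(item, (list, tuple, set)) else [item]


def predecessors_from_dependencies(dependencies):
    """Expand pairs such as ({nodeB, nodeC}, nodeD) into a predecessor map."""
    preds = {}
    for before, after in dependencies:
        for parent in as_list(before):
            preds.setdefault(parent, set())
        for child in as_list(after):
            preds.setdefault(child, set()).update(as_list(before))
    return preds


def execution_waves(preds):
    """Group nodes into successive waves whose dependencies are all satisfied."""
    sorter = TopologicalSorter(preds)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# The dependencies of the DAG JDL above, written as Python pairs.
dependencies = [
    ("nodeA", "nodeB"),
    ("nodeA", "nodeC"),
    (["nodeB", "nodeC"], "nodeD"),
]

print(execution_waves(predecessors_from_dependencies(dependencies)))
# [['nodeA'], ['nodeB', 'nodeC'], ['nodeD']]
```

nodeB and nodeC land in the same wave because neither depends on the other, while nodeD has to wait for both.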


Job Collection

• A job collection is a set of independent jobs that the user wants to submit and monitor via a single request

• Jobs of a collection are submitted as DAG nodes without dependencies

• The JDL contains a list of ClassAds, each describing one sub-job

[
  Type = "collection";
  VirtualOrganisation = "gilda";
  nodes = {
    [ <job descr 1> ],
    [ <job descr 2> ]
  };
]


‘Scattered’ Input Sandboxes

• Input Sandbox can contain
  – file paths on the UI machine (i.e. the usual way)
  – URIs pointing to files on a remote gridFTP/HTTPS server

    InputSandbox = {
      "gsiftp://neo.datamat.it:2811/var/prg/sim.exe",
      "https://ghemon.cnaf.infn.it:8443/data/idat_1",
      "file:///home/pacio/myconf"
    };

• A base URI to be applied to all sandbox files can also be specified

    InputSandboxBaseURI = "gsiftp://matrix.datamat.it:2811/var";

• Only local files (file://) are uploaded to the WMS node

• Files pointed to by URIs are downloaded directly to the WN by the JobWrapper just before the job is started
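The distinction between the two kinds of entries can be sketched in Python as follows. The split mirrors the rule stated above (local files go through the WMS node, remote URIs are fetched on the WN); the function name and the set of remote schemes are assumptions made for the example.

```python
from urllib.parse import urlparse

# Assumption: only these schemes denote remote servers in this example.
REMOTE_SCHEMES = {"gsiftp", "https"}


def split_input_sandbox(entries):
    """Separate entries uploaded from the UI from those fetched on the WN."""
    local, remote = [], []
    for entry in entries:
        scheme = urlparse(entry).scheme
        if scheme in REMOTE_SCHEMES:
            remote.append(entry)
        else:
            # Plain paths and file:// URIs live on the UI machine.
            local.append(entry)
    return local, remote


entries = [
    "gsiftp://neo.datamat.it:2811/var/prg/sim.exe",
    "https://ghemon.cnaf.infn.it:8443/data/idat_1",
    "file:///home/pacio/myconf",
]

local, remote = split_input_sandbox(entries)
print("uploaded to the WMS node:", local)
print("downloaded on the WN:", remote)
```

For the sandbox shown on the slide, only file:///home/pacio/myconf ends up in the first group.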


‘Scattered’ Output Sandboxes

• JDL has been enriched with new attributes for specifying the destinations for the files listed in the OutputSandbox attribute list

OutputSandbox = { "jobOutput","run1/event1","jobError" };

OutputSandboxDestURI = {
  "gsiftp://matrix.datamat.it/var/jobOutput",
  "https://grid003.ct.infn.it:8443/home/cms/event1",
  "gsiftp://matrix.datamat.it/var/jobError"
};

• A base URI to be applied to all sandbox files can also be specified

OutputSandboxBaseDestURI = "gsiftp://neo.datamat.it/home/run1/";

• When the job has completed execution, the JobWrapper copies the files to the specified destinations without passing through the WMS node
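In the example above, each OutputSandbox entry appears to be paired with the destination at the same position of OutputSandboxDestURI. The Python sketch below makes that pairing explicit; the positional matching is an assumption inferred from the example, output_destinations is a hypothetical helper, and the base-URI variant is not modelled.

```python
def output_destinations(output_sandbox, dest_uris):
    """Pair each output file with the destination URI at the same position."""
    if len(output_sandbox) != len(dest_uris):
        raise ValueError("OutputSandbox and OutputSandboxDestURI differ in length")
    return dict(zip(output_sandbox, dest_uris))


output_sandbox = ["jobOutput", "run1/event1", "jobError"]
dest_uris = [
    "gsiftp://matrix.datamat.it/var/jobOutput",
    "https://grid003.ct.infn.it:8443/home/cms/event1",
    "gsiftp://matrix.datamat.it/var/jobError",
]

for name, uri in output_destinations(output_sandbox, dest_uris).items():
    print(f"{name} -> {uri}")
```

Checking the two lengths before submission catches a forgotten destination early, on the UI.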


Job collection example

[
  type = "collection";
  InputSandbox = {"date.sh"};
  RetryCount = 0;
  nodes = {
    [ file = "jobs/job1.jdl"; ],
    [
      Executable = "/bin/sh";
      Arguments = "date.sh";
      StdOutput = "date.out";
      StdError = "date.err";
      OutputSandbox = {"date.out", "date.err"};
    ],
    [ file = "jobs/job3.jdl"; ]
  };
]

The InputSandbox defined at the collection level is shared by all nodes.
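The idea of a shared sandbox can be illustrated with a short Python sketch in which every node receives the collection-level attributes. That a node's own attributes take precedence over the shared ones is an assumption of this sketch, not something stated on the slide, and effective_nodes is a hypothetical helper.

```python
def effective_nodes(collection):
    """Return each node's attributes merged with the collection-level ones."""
    shared = {k: v for k, v in collection.items() if k not in ("type", "nodes")}
    # Assumption: attributes set on a node override the shared ones.
    return [{**shared, **node} for node in collection["nodes"]]


collection = {
    "type": "collection",
    "InputSandbox": ["date.sh"],
    "RetryCount": 0,
    "nodes": [
        {
            "Executable": "/bin/sh",
            "Arguments": "date.sh",
            "StdOutput": "date.out",
            "StdError": "date.err",
            "OutputSandbox": ["date.out", "date.err"],
        },
    ],
}

for node in effective_nodes(collection):
    print(node["Executable"], node["InputSandbox"])
```

The inline node never mentions date.sh in an InputSandbox of its own, yet the script is available to it through the shared one.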


Parametric Job

• A parametric job is a job where one or more of its attributes are parameterized

• Values of attributes vary according to a parameter

• Job monitoring / managing is always done through a unique jobID, as if it were a single job (see submission of collections)

[
  JobType = "Parametric";
  Executable = "/bin/sh";
  Arguments = "md5.sh input_PARAM_.txt";
  InputSandbox = {"md5.sh", "input_PARAM_.txt"};
  StdOutput = "out_PARAM_.txt";
  StdError = "err_PARAM_.txt";
  Parameters = 4;
  ParameterStart = 1;
  ParameterStep = 1;
  OutputSandbox = {"out_PARAM_.txt", "err_PARAM_.txt"};
]


Parametric job / 2

• Parameters can also be a list of strings

• InputSandbox (if present) has to be coherent with the parameters

[ui-test] /home/giorgio/param > cat param2.jdl

[
  JobType = "Parametric";
  Executable = "/bin/cat";
  Arguments = "input_PARAM_.txt";
  InputSandbox = "input_PARAM_.txt";
  StdOutput = "myoutput_PARAM_.txt";
  StdError = "myerror_PARAM_.txt";
  Parameters = {earth,moon,mars};
  OutputSandbox = {"myoutput_PARAM_.txt"};
]

[ui-test] /home/giorgio/param > ls

inputEARTH.txt inputMARS.txt inputMOON.txt param2.jdl
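How _PARAM_ is expanded for a list of strings can be sketched in Python. The expansion itself happens on the server; this is only an illustration of the substitution, expand_parametric is a hypothetical helper, and the values are passed in explicitly so that no assumption is made about how the integer form (Parameters, ParameterStart, ParameterStep) is interpreted.

```python
def expand_parametric(job, values):
    """Produce one job description per value, replacing _PARAM_ everywhere."""

    def substitute(item, value):
        if isinstance(item, str):
            return item.replace("_PARAM_", str(value))
        if isinstance(item, list):
            return [substitute(element, value) for element in item]
        return item  # numbers and other values are left untouched

    return [
        {name: substitute(attr, value) for name, attr in job.items()}
        for value in values
    ]


job = {
    "Executable": "/bin/cat",
    "Arguments": "input_PARAM_.txt",
    "StdOutput": "myoutput_PARAM_.txt",
    "StdError": "myerror_PARAM_.txt",
    "OutputSandbox": ["myoutput_PARAM_.txt"],
}

for sub_job in expand_parametric(job, ["earth", "moon", "mars"]):
    print(sub_job["Arguments"], "->", sub_job["StdOutput"])
```

Listing the substituted file names in this way is a quick check that the input files present on the UI are coherent with the parameter values.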


References

• JDL attributes specification for WM proxy
  – https://edms.cern.ch/document/590869/1

• WMProxy quickstart
  – http://egee-jra1-wm.mi.infn.it/egee-jra1-wm/wmproxy_client_quickstart.shtml

• WMS user guides
  – https://edms.cern.ch/document/572489/1