porting scientific application on...

37
EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme www.euindiagrid.eu How to Port Scientific Application on the GRID ? Stefano Cozzini CNR/IOM Democritos

Upload: others

Post on 17-Apr-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

How to Port Scientific Application on the GRID ?

Stefano CozziniCNR/IOM Democritos

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Outline

• a look at gLite infrastructures• Users&Application&Problems• Our methodology to work on the GRID• Client/server architecture• A real example of porting an application• Conclusions

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Glite infrastructure for applications (1)

• What it can offer: – CPU time on best effort basis if not

otherwise specified/decided with Specific SLAs agreement

– A job submission mechanism (batch system) – Storage and Data Management tools for

specialized data – Several VOs you can take part of...

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

• CPUs are generally loosely coupled • quad/six/eight core machines quite

available• Limited MPI support

– No inter-site MPI computation (not really a problem)

– No real way to distinguish between clusters and farms for MPI parallel jobs..

gLite infrastructure for applications (2)

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

A few definitions• Grid infrastructure: A distributed infrastructure of computation and storage resources, which can be used by users in a transparent way (i.e. without need to know about the location of the resources etc.)

• Grid-enabling procedure: The procedure that allows a scientific (or generic) user to solve her scientific computational problem by means of a Grid infrastructure

• Application: A collection of work items to solve a certain problem or to achieve desired results using a Grid infrastructure. (...) In other words, a Grid application may consist of a number of jobs that together fulfill the whole task.

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Observations

• An application is not only a software application but rather a complex computational problem to be solved in a Grid environment.

• A successful story of “Grid-enabled applications” refer not just to the porting of some scientific software on the Grid infrastructure but instead to a real exploitation of such software by a scientific community.

• Each application is unique in that it tries to solve a specific computational problem clearly stated by a scientific user or a

scientific community. In this context we can use as a synonym for application the term computational experiment.

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Which kind of sofware?• home made codes – easy: users are supposed to know

everything about it• Some well know packages used with slight

modification/adaptations– less easy: users do not know too much

• “black box” application (a.k.a. legacy application)–difficult: nobody knows about it

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

which kind of “computational experiment ?”

• just a single application to run – parallel

• embarrassingly /tightly coupled

– serial

• a bunch of applications linked together (workflow)

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Our method to port applications..• Identify them through user's discussions

and meeting:– Understand the computational requirements

of the group– Understand the level of IT skills in the group – Understand their concept of GRID and

applications– Understand the application requirements– Propose a solution (if exists)

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Steps to follow..

• Step 0: awareness of grid computing opportunities /analysis of the computational requirement

• Step 1: technical deployment on the infrastructure

• Step 2: benchmarking procedures and assessment of the efficiency

• Step 3: production runs and final evaluation

• Step 4: dissemination of the results among peers.

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Analysis

Grid-enabling procedure

Does is fiton the GRID ?

successful ?Benchmarking

analysis

OK ?

Production Runs

DOCUMENTATION AND REPORTS

No

NO

Y

Y

Yes

NO

Review the analysis

Review the procedure

(document why and exit)

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Options for step 1: technical deployment

• Simple serial porting of an application– setup of a few simple tools (scripts etc. ) to

run the computational experiment (generally a parametric study)

• Porting of parallel applications with MPI:– this again is quite simple in principle but

present MPI limitation on the infrastructure makes this strategy not very efficient.

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Options for step 1: technical deployment (2)

• Porting of complex codes/packages– requires important changes on the original

way to run the application and the experiment associated.

• Client/Server approach– to run embarrassingly or loosely coupled

applications/experiments

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

What do we need to port them on GRID ?

• TOOLS – Scripting languages for CLI – Ganga:

• a tool for computational-task management and easy access to Grid resources;

– Portals

• HUMAN FACTOR – Man power (from both sides: providers+ users) – Goodwill (from users)

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Command Line interface..• UI commands are generally complex,

they are usually reiterated very frequently becoming:– boring/error prone/inefficient

• However this approach is really flexible..• Hackers will love it • Windows guys will hate it ( grid is not

“just one click away”.)

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

CLI flexibility...• You can combine different commands

into simple scripts• Scripts can be easily re-used/changed...

or combined together..• This is not grid computing this is just

plain linux/unix hacking..• To do that please use your preferred

scripting language.

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Which kind of applications/computational

experiments fit/unfit?• FIT:– Parameter sweep computational

experiments– Embarrassingly parallel computations – SMP/openMP programs

• UNFIT:– Tightly coupled parallel programs – I/O, Memory bounded applications

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Client/Server architecture• Fits perfectly for loosely coupled application:• Idea

– Submit N independent jobs and coordinate them through a specifc client-server architecture

– Each node performs a task – Exchanges among clients are coordinated by the

client/server mechanism..

• Requirement:– outbound connectivity from Wns to Server

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Client/Server mechanisms

Server(must a FQDH)

Site A

Site B

Site C

Client (WN)

Client (WN)

Client (WN)

Client (WN)

Client (WN)

UI

WMS

6job

s

CE

CE

2 jobs

2 jobs

2 jobs

CE

Client (WN)

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Client/Server examples• BEMUSE:

– Biased Exchange Metadynamics Submission Environment

• Quantum Simulation: – Phonon calculations with Quantum/Espresso

• Both described in the conference proceedings of 2008 COST activity.

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

• joint COST D37/EU IndiaGrid/CompChem schools already organized in Trieste (Italy)

– September 2008

– 29 March -01 April 2010

– Proceeding of the 2008 Workshop published in ICTP lecture Series (freely available on line)

– http://publications.ictp.it/lns/vol24/vol24toc.html

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Q/E and the GRIDLarge-scale computations with Quantum ESPRESSO require HPC resources (tightly coupled clusters):

BUT:

often many smaller-size, loosely-coupled or independent computations are required. A few examples:

• the search for transition pathways (Nudged Elastic Band method);

• calculations under diferent conditions (pressure, temperature) or for diferent compositions, or for diferent values of some parameters;

• full phonon dispersions in crystals

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Phonon in crystals

Force Costant Matrixes

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Calculations of phonons • Force constants are computed by ph.x code on a

grid of n q vectors.

• For each q one has to perform a set of linear response calculations one of each irrep ( irreducible representation) generally proportional to the number of atoms

• BUT:– Force constant calculation are independent for

each q – Irreps are almost independent for q: only

some data should collected at the end

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Phonon workfow

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Practical implementation (I)On Q/E Package: minor changes needed in the phonon code, namely

• possibility to run one q-vector at the time (already there)

• possibility to run one irrep (or one group of irreps) at the time and to save partial results.

On GRID infrastructure: Implemented a Python server-client application which takes care of dispatching ph.x jobs and of collecting results

NOTE: server-client is independent from jobs submission mechanisms: it could run everywhere if plugins for job submission are provided.

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Case study: gamma-Al2O

3

• HPC runs:– 11 4/8/16 cpus : –

• 11 q points • 120 irreps for q

A few weeks on modern workstation

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Comparing HPC vs GRID

• HPC client /server approach– For each q an

independent parallel job.

– Parallel jobs can use 4/8/16 CPUS

• GRID client/server approach– each client computes

serially one or more (1,4,6) irreps

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

HPC Results • Time to results: best cases

16 CPUs: ~ 40 hours

08 CPUs: ~ 50 hours

04 CPUs: ~110 hours

NOTES: HPC scheduling policies taken into accounts: -maximum 128 cpus at time

-maximum 12 hours for jobs

-maximum 10 jobs running for users

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Grid experiment

3000 jobs submited in chunks of 500: clients contact back the server, receive input data and starting data fles (hundreds of Mb).Jobs lost in cyberspace (∼ 60% of all contacted servers! of which 30- 40% due to failure in downloading starting data fles) are resubmited.VOs involved: COMPCHEM and EU-IndiaGRID

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

GRID Results (6 irreps per clients)

• Up to 145 independent jobs simultaneously running

• Less 60 hours to complete the experiments :

• Comparable with 8 CPUs parallel runs!

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Conclusions • A realistic application of Quantum ESPRESSO to

frst-principle calculations at the nanoscale was demonstrated on the GRID

• Results produced in a relatively short time in spite of a rather high job failure rate: GRID can be competitive with conventional High-Performance Computers !

• Full exploitation of GRID infrastructure requires however the possibility to select HPC (with MPI), or large multicore machines (with OpenMP), In order to enable High throughput calculation on medium size HPC problem

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Some tips to users (1)

• State clearly the computational requirements:– CPU intensive – I/O requirements– memory requirements– scientific libraries needed (if any)

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Some tips to users (2)

• try to estimate the actual need of resource for your computational problem– how many times do I need to run my

program ? – how much does it take to run ?– is there any room from improvement in term

of performances ?

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Lesson learned● Role of human interaction is fundamental

and sometimes underestimated● GRID users speak different language

from GRID providers● GRID providers need to adopt their

language NOT viceversa ● Technical tools are important BUT not

sufficient to overcome the user inertia..

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

Conclusions

• the successful porting of an application to a Grid environment highly depends on the smooth interaction of three key elements:– the infrastructure, – the problem – people around them

EU-IndiaGrid (RI-031834) is funded by the European Commission under the Research Infrastructure Programme

www.euindiagrid.eu

references

• How to port application on GRID– users.ictp.it/~pub_off/lectures/lns024/02-cozzini2/02-

cozzini2.pdf