
Page 1:

Managing distributed computing resources with DIRAC

A.Tsaregorodtsev, CPPM-IN2P3-CNRS, Marseille

12-17 September 2011, NEC’11, Varna

Page 2:

Outline

DIRAC Overview

Main subsystems
• Workload Management
• Request Management
• Transformation Management
• Data Management

Use in LHCb and other experiments

DIRAC as a service

Conclusion

Page 3:

Introduction

DIRAC is first of all a framework to build distributed computing systems
Supporting Service Oriented Architectures
GSI-compliant secure client/service protocol
• Fine-grained service access rules
Hierarchical Configuration Service for bootstrapping distributed services and agents

This framework is used to build all the DIRAC systems:
Workload Management
• Based on the Pilot Job paradigm
Production Management
Data Management
etc.
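To make the framework point concrete, here is a minimal sketch of how a service built on the DIRAC framework can look: a handler class exports remotely callable methods, declares the expected argument types, and attaches per-method authorization rules (the fine-grained access rules mentioned above). The class layout follows common DIRAC conventions, but the exact imports and options should be treated as illustrative rather than as production code.

```python
# Illustrative DIRAC-style service handler; treat exact imports/options as assumptions.
from DIRAC import S_OK
from DIRAC.Core.DISET.RequestHandler import RequestHandler

class HelloHandler(RequestHandler):
    """Toy service exporting two methods with different access rules."""

    types_addItem = [str]            # argument types checked by the framework
    auth_addItem = ['authenticated'] # fine-grained rule: authenticated clients only

    def export_addItem(self, itemName):
        # The GSI-authenticated client identity is available to the handler.
        credentials = self.getRemoteCredentials()
        owner = credentials.get('username', 'anonymous')
        return S_OK('%s registered by %s' % (itemName, owner))

    types_ping = []
    auth_ping = ['all']              # anyone with a valid connection may ping

    def export_ping(self):
        return S_OK('pong')
```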

Page 4:

[Architecture diagram: a Physicist User and the Production Manager submit jobs to the central Task Queue / Matcher Service; EGEE, NDG, EELA and CREAM Pilot Directors deploy pilots to the EGI/WLCG Grid, NDG Grid, GISELA Grid and CREAM CEs.]
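The cycle shown in the diagram can be condensed into a short sketch: users and the Production Manager fill a central Task Queue, Pilot Directors land pilots on the grids, and each pilot pulls a compatible payload from the Matcher. The classes and field names below are invented for illustration and are not DIRAC APIs.

```python
# Toy pilot/matcher cycle (invented names, not DIRAC APIs).
from collections import deque

class Matcher:
    """Stands in for the central Task Queue + Matcher Service."""
    def __init__(self):
        self.queue = deque()

    def submit_job(self, job):                # called by users / Production Manager
        self.queue.append(job)

    def request_job(self, capabilities):      # called by a pilot (pull model)
        for _ in range(len(self.queue)):
            job = self.queue.popleft()
            if job['site'] in (None, capabilities['site']):
                return job                    # first compatible payload wins
            self.queue.append(job)            # keep incompatible jobs queued
        return None

def run_pilot(matcher, site):
    """What a pilot does once a Pilot Director has landed it on a worker node."""
    while True:
        job = matcher.request_job({'site': site})
        if job is None:
            break                             # nothing compatible, the pilot exits
        print('running %(name)s with the proxy of %(owner)s' % job)

matcher = Matcher()
matcher.submit_job({'name': 'mc_simulation', 'owner': 'alice', 'site': None})
matcher.submit_job({'name': 'user_analysis', 'owner': 'bob', 'site': 'LCG.CERN.ch'})
run_pilot(matcher, site='LCG.CERN.ch')
```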

Page 5:

User credentials management

The WMS with Pilot Jobs requires a strict user proxy management system
• Jobs are submitted to the DIRAC Central Task Queue with the credentials of their owner (VOMS proxy)
• Pilot Jobs are submitted to a grid WMS with the credentials of a user with a special Pilot role
• The Pilot Job fetches the user job and the job owner's proxy
• The User Job is executed with its owner's proxy, used to access SEs, catalogs, etc.
The DIRAC Proxy Manager service ensures the necessary functionality
• Proxy storage and renewal
• Possibility to outsource the proxy renewal to the MyProxy server
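A heavily simplified sketch of the proxy handling described above: the server stores the best proxy it has seen per user/group, hands out a copy when a payload needs one, and triggers a renewal when the remaining lifetime gets short. The class and method names are invented for illustration and are not the DIRAC ProxyManager API.

```python
# Invented illustration of proxy storage and renewal (not the DIRAC ProxyManager API).
import time

RENEWAL_MARGIN = 6 * 3600            # renew when less than 6 h of lifetime is left

class ProxyStore:
    def __init__(self):
        self._proxies = {}            # (user, group) -> (pem_blob, expiry_timestamp)

    def upload(self, user, group, pem_blob, expiry):
        """Called at job submission time with the owner's VOMS proxy."""
        stored = self._proxies.get((user, group))
        if stored is None or expiry > stored[1]:
            self._proxies[(user, group)] = (pem_blob, expiry)

    def get_for_payload(self, user, group):
        """Called when a pilot fetches a job of this owner."""
        pem_blob, expiry = self._proxies[(user, group)]
        if expiry - time.time() < RENEWAL_MARGIN:
            pem_blob, expiry = self._renew(user, group)
        return pem_blob

    def _renew(self, user, group):
        # A real system would renew against a MyProxy server here; we simply
        # pretend the proxy has been extended by 24 hours.
        pem_blob, _ = self._proxies[(user, group)]
        new_expiry = time.time() + 24 * 3600
        self._proxies[(user, group)] = (pem_blob, new_expiry)
        return pem_blob, new_expiry
```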

Page 6:

Direct submission to CEs

Using the gLite WMS now just as a pilot deployment mechanism
Limited use of its brokering features
• For jobs with input data the destination site is already chosen
Have to use multiple Resource Brokers because of scalability problems

DIRAC supports direct submission to CEs
• CREAM CEs
• Individual site policies can be applied
• The site chooses how much load it can take (Pull vs Push paradigm)
• Direct measurement of the site state by watching the pilot status info

This is a general trend: all the LHC experiments have declared that they will eventually abandon the gLite WMS
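The pull paradigm can be illustrated in a few lines: a director submitting directly to a CE only tops up the pilots to the level the site has declared it can take, using the observed pilot states as a direct measurement of the site. All names are hypothetical, not DIRAC code.

```python
# Hypothetical per-site pilot director decision (not DIRAC code).

def pilots_to_submit(waiting_payloads, site_policy, pilot_states):
    """How many new pilots to send to one CE in this director cycle."""
    queued = sum(1 for s in pilot_states if s == 'Waiting')   # direct site measurement
    running = sum(1 for s in pilot_states if s == 'Running')
    free_slots = max(0, site_policy['max_pilots'] - queued - running)
    # Never submit more pilots than payloads that could match this site.
    return min(free_slots, waiting_payloads, site_policy['max_per_cycle'])

policy = {'max_pilots': 200, 'max_per_cycle': 20}
states = ['Running'] * 150 + ['Waiting'] * 30
print(pilots_to_submit(waiting_payloads=500, site_policy=policy, pilot_states=states))  # 20
```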

Page 7:

DIRAC sites

Dedicated Pilot Director per (group of) site(s)

On-site Director
• Site managers have full control of LHCb payloads

Off-site Director
• Site delegates control to the central service
• Site must only define a dedicated local user account
• Payload submission through an SSH tunnel

In both cases the payload is executed with the owner's credentials

[Diagram: On-site Director and Off-site Director submission schemes]
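For the off-site director case, the pilot can be shipped and started through an SSH tunnel using only the dedicated local account the site provides. The sketch below shows the idea with plain scp/ssh subprocess calls; the host name, account, paths and the way the pilot is started are placeholders, not a DIRAC configuration.

```python
# Illustrative off-site pilot submission over SSH (host, account and paths are placeholders).
import subprocess

SITE_HOST = 'gateway.example-site.org'   # hypothetical site gateway
SITE_USER = 'diracpilot'                 # dedicated local account defined by the site

def submit_pilot_over_ssh(pilot_script):
    remote_path = '/tmp/dirac-pilot.py'
    # 1) copy the pilot script to the site through the SSH tunnel
    subprocess.check_call(['scp', pilot_script,
                           '%s@%s:%s' % (SITE_USER, SITE_HOST, remote_path)])
    # 2) start it under the local account (a real site would usually wrap this
    #    in its batch system submission command)
    subprocess.check_call(['ssh', '%s@%s' % (SITE_USER, SITE_HOST),
                           'nohup python %s > pilot.log 2>&1 &' % remote_path])

# submit_pilot_over_ssh('dirac-pilot.py')
```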

Page 8:

DIRAC Sites

Several DIRAC sites in production in LHCb
E.g. Yandex
• 1800 cores
• Second largest MC production site

Interesting possibility for small user communities or infrastructures
• e.g. contributing local clusters, building regional or university grids

Page 9:

WMS performance

Up to 35K concurrent jobs in ~120 distinct sites
• Limited by the resources available to LHCb
10 mid-range servers hosting the DIRAC central services
Further optimizations to increase the capacity are possible
• Hardware and database optimizations, service load balancing, etc.

Page 10:

Belle (KEK) use of Amazon EC2

VM scheduler developed for the Belle MC production system
Dynamic VM spawning taking spot prices and Task Queue (TQ) state into account

Thomas Kuhr, Belle
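The spawning rule sketched on this slide comes down to: start additional VMs only while the spot price stays below the bid and the Task Queue holds more waiting payloads than the running cores can absorb. The thresholds and numbers below are invented for illustration; this is not the Belle scheduler code.

```python
# Toy version of the VM-spawning rule (spot price + Task Queue state); numbers are invented.
MAX_BID = 0.08          # assumed maximum USD per core-hour we are willing to pay
CORES_PER_VM = 8

def vms_to_spawn(spot_price, queued_jobs, running_cores, max_new_vms=10):
    if spot_price >= MAX_BID:
        return 0                                  # too expensive right now
    backlog = queued_jobs - running_cores         # payloads nobody will pick up soon
    if backlog <= 0:
        return 0
    needed = -(-backlog // CORES_PER_VM)          # ceiling division
    return min(needed, max_new_vms)

print(vms_to_spawn(spot_price=0.05, queued_jobs=500, running_cores=320))  # -> 10
print(vms_to_spawn(spot_price=0.12, queued_jobs=500, running_cores=320))  # -> 0
```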

Page 11:

Belle use of Amazon EC2

Various computing resources combined in a single production system
• KEK cluster
• LCG grid sites
• Amazon EC2
Common monitoring, accounting, etc.

Thomas Kuhr, Belle II

Page 12:

Belle II

Starting in 2015 after the KEK upgrade
• 50 ab⁻¹ by 2020
Computing model
• Data rate of 1.8 GB/s (high-rate scenario)
• Using the KEK computing center, grid and cloud resources
The Belle II distributed computing system is based on DIRAC

[Diagram: Belle II computing model, showing Raw Data Storage and Processing, MC Production and Ntuple Production, and Ntuple Analysis]
Thomas Kuhr, Belle II

Page 13:

Support for MPI Jobs

MPI Service developed for applications in the GISELA Grid
• Astrophysics, BioMed, Seismology applications
No special MPI support on sites is required
• MPI software installed by Pilot Jobs
MPI ring usage optimization
• Ring reuse for multiple jobs
• Lower load on the gLite WMS
• Variable ring sizes for different jobs
Possible usage for HEP applications: PROOF on Demand dynamic sessions
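As an illustration of the kind of payload such an MPI ring runs, here is a minimal mpi4py program in which every rank computes a partial result and rank 0 gathers them; the Pilot Jobs would install the MPI software and launch it with mpirun. The program is a generic example, not part of the DIRAC MPI Service.

```python
# Minimal MPI payload (generic example, not part of the DIRAC MPI Service).
# Run with e.g.:  mpirun -np 4 python mpi_payload.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank does its share of the work (here a trivial local computation).
local_result = rank * rank

# Rank 0 collects the partial results from the whole ring.
results = comm.gather(local_result, root=0)
if rank == 0:
    print('ring size %d, gathered %s' % (size, results))
```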

Page 14:

Coping with failures

Problem: distributed resources and services are unreliable
• Software bugs, misconfiguration
• Hardware failures
• Human errors

Solution: redundancy and asynchronous operations

DIRAC services are redundant
• Geographically: Configuration, Request Management
• Several instances of any service

Page 15:

Request Management System

A Request Management System (RMS) accepts and asynchronously executes any kind of operation that can fail
• Data upload and registration
• Job status and parameter reports
Requests are collected by RMS instances on VO-boxes at the 7 Tier-1 sites
• Extra redundancy in VO-box availability
Requests are forwarded to the central Request Database
• For keeping track of the pending requests
• For efficient bulk request execution
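The essence of the RMS is that an operation which could fail is first recorded and then retried asynchronously by an agent until it succeeds. The small class below is an invented illustration of that pattern, not the DIRAC RequestDB schema or client.

```python
# Invented illustration of the request/retry pattern (not the DIRAC RequestDB API).
import collections

class RequestDatabase:
    def __init__(self):
        self.pending = collections.deque()

    def add(self, operation, **kwargs):
        """Called by jobs/services that could not perform the operation synchronously."""
        self.pending.append({'op': operation, 'args': kwargs, 'attempts': 0})

    def execute_all(self, executors):
        """One agent cycle: try every pending request, keep the ones that still fail."""
        still_pending = collections.deque()
        while self.pending:
            request = self.pending.popleft()
            request['attempts'] += 1
            try:
                executors[request['op']](**request['args'])
            except Exception:
                still_pending.append(request)     # unreliable service: retry later
        self.pending = still_pending

rms = RequestDatabase()
rms.add('registerFile', lfn='/lhcb/user/a/alice/file.dst', se='CERN-USER')
rms.add('setJobStatus', job_id=12345, status='Done')
rms.execute_all({'registerFile': lambda **kw: print('registered', kw),
                 'setJobStatus': lambda **kw: print('status set', kw)})
```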

Page 16:

DIRAC Transformation Management

Data-driven payload generation based on templates

Generating data processing and replication tasks

LHCb-specific templates and catalogs
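Data driven means that the system watches the incoming files and, whenever new files match a transformation's input filter, groups them into tasks built from the template. A toy version of that grouping step is sketched below; it illustrates the idea only and is not the DIRAC Transformation System code.

```python
# Toy data-driven task generation (illustration only, not the DIRAC Transformation System).

def generate_tasks(new_files, input_filter, group_size, job_template):
    """Group freshly registered files that pass the filter into processing tasks."""
    selected = [f for f in new_files if input_filter(f)]
    tasks = []
    for i in range(0, len(selected), group_size):
        chunk = selected[i:i + group_size]
        tasks.append(dict(job_template, InputData=chunk))
    return tasks

files = ['/lhcb/data/2011/RAW/run%05d.raw' % n for n in range(7)]
tasks = generate_tasks(files,
                       input_filter=lambda lfn: lfn.endswith('.raw'),
                       group_size=3,
                       job_template={'Application': 'reconstruction'})
for task in tasks:
    print(len(task['InputData']), 'files ->', task['Application'])
```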

Page 17:

Data Management

Based on the Request Management System

Asynchronous data operations
• transfers, registration, removal

Two complementary replication mechanisms
Transfer Agent
• user data
• public network
FTS service
• production data
• private FTS OPN network

Smart pluggable replication strategies
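The smart pluggable replication strategies can be pictured as interchangeable objects that decide, per file, which channel to use and where to replicate; the agent simply calls whichever strategy it was configured with. The sketch below is an invented illustration of this plugin pattern, not the DIRAC replication code.

```python
# Invented illustration of pluggable replication strategies (not DIRAC code).

class UserDataStrategy:
    """User files: one extra replica, transferred over the public network."""
    channel = 'TransferAgent'
    def destinations(self, lfn, existing_replicas):
        candidates = ['CERN-USER', 'IN2P3-USER', 'PIC-USER']
        return [se for se in candidates if se not in existing_replicas][:1]

class ProductionDataStrategy:
    """Production files: broadcast to the Tier-1s over the private FTS/OPN network."""
    channel = 'FTS'
    def destinations(self, lfn, existing_replicas):
        tier1s = ['CERN', 'CNAF', 'GRIDKA', 'IN2P3', 'NIKHEF', 'PIC', 'RAL']
        return [se for se in tier1s if se not in existing_replicas]

def schedule_replication(lfn, existing_replicas, strategy):
    for se in strategy.destinations(lfn, existing_replicas):
        print('replicate %s -> %s via %s' % (lfn, se, strategy.channel))

schedule_replication('/lhcb/data/run123.dst', ['CERN'], ProductionDataStrategy())
schedule_replication('/lhcb/user/a/alice/ntuple.root', ['IN2P3-USER'], UserDataStrategy())
```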

Page 18:

Transfer accounting (LHCb)

Page 19:

ILC using DIRAC

ILC CERN group
• Using the DIRAC Workload Management and Transformation systems
2M jobs run in the first year
• Instead of the 20K planned initially
The DIRAC File Catalog was developed for ILC
• More efficient than the LFC for common queries
• Includes user metadata natively
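A key point of the DIRAC File Catalog is that user metadata lives in the catalog itself, so a dataset can be selected with a single metadata query. The snippet below sketches what such a query can look like with the DIRAC FileCatalog client; the metadata names and values are invented, and the exact client call should be checked against the DIRAC release in use.

```python
# Sketch of a metadata-driven lookup with the DIRAC File Catalog client.
# Metadata fields/values are invented; verify the client API against your DIRAC release.
from DIRAC.Core.Base.Script import parseCommandLine
parseCommandLine()                    # initializes DIRAC (configuration, proxy)

from DIRAC.Resources.Catalog.FileCatalogClient import FileCatalogClient

fc = FileCatalogClient()
result = fc.findFilesByMetadata({'Machine': 'ilc',      # hypothetical metadata
                                 'Energy': 250,
                                 'EvtType': 'higgs'},
                                path='/ilc/prod')
if result['OK']:
    print('%d matching files' % len(result['Value']))
else:
    print('query failed: %s' % result['Message'])
```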

Page 20:

DIRAC as a service

A DIRAC installation shared by a number of user communities and centrally operated

EELA/GISELA grid
• gLite based
• DIRAC is part of the grid production infrastructure
• Single VO

French NGI installation: https://dirac.in2p3.fr
• Started as a service supporting grid tutorials
• Now serving users from various domains
• Biomed, earth observation, seismology, …
• Multiple VOs

Page 21:

DIRAC as a service

Necessity to manage multiple VOs with a single DIRAC installation
• Per-VO pilot credentials
• Per-VO accounting
• Per-VO resources description

Pilot Directors are VO-aware
• Job matching takes the pilot VO assignment into account
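VO-aware matching comes down to one extra constraint in the Matcher: a pilot carries the VO it was submitted for, and it is only offered payloads from the task queues of that VO, with the corresponding per-VO credentials and accounting. Below is a toy version of that filter; it is illustrative, not the DIRAC Matcher.

```python
# Toy VO-aware matching filter (illustrative, not the DIRAC Matcher).

TASK_QUEUES = [
    {'vo': 'biomed', 'jobs': ['dock-001', 'dock-002']},
    {'vo': 'eela',   'jobs': ['climate-017']},
    {'vo': 'esr',    'jobs': []},
]

def match_for_pilot(pilot_vo, task_queues):
    """Only queues of the pilot's VO are considered when handing out a payload."""
    for queue in task_queues:
        if queue['vo'] == pilot_vo and queue['jobs']:
            return queue['jobs'].pop(0)    # accounting would be charged to this VO
    return None

print(match_for_pilot('biomed', TASK_QUEUES))   # -> dock-001
print(match_for_pilot('esr', TASK_QUEUES))      # -> None (no payloads for this VO)
```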

Page 22:

DIRAC Consortium

Other projects are starting to use or evaluate DIRAC
• CTA, SuperB, BES, VIP (medical imaging), …
• Contributing to DIRAC development
• Increasing the number of experts
Need for a user support infrastructure

Turning DIRAC into an Open Source project
• DIRAC Consortium agreement in preparation
• IN2P3, Barcelona University, CERN, …
http://diracgrid.org
• News, docs, forum

Page 23:

Conclusions

DIRAC has been successfully used in LHCb for all distributed computing tasks during the first years of LHC operations

Other experiments and user communities have started to use DIRAC, contributing their developments to the project

The DIRAC open source project is being built now to bring the experience from HEP computing to other experiments and application domains

Page 24:

Backup slides

Page 25:

LHCb in brief


Experiment dedicated to studying CP violation
• Responsible for the dominance of matter over antimatter
• Matter-antimatter difference studied using the b-quark (beauty)
• High-precision physics (tiny difference…)

Single-arm spectrometer
• Looks like a fixed-target experiment
Smallest of the 4 big LHC experiments
• ~500 physicists

Nevertheless, computing is also a challenge…

Page 26:

LHCb Computing Model

Page 27:

Tier0 Center

Raw data shipped in real time to Tier-0
• Resilience enforced by a second copy at the Tier-1s
• Rate: ~3000 evts/s × ~35 kB/evt ≈ 100 MB/s
Part of the first pass reconstruction and re-reconstruction
• Acting as one of the Tier-1 centers
Calibration and alignment performed on a selected part of the data stream (at CERN)
• Alignment and tracking calibration using dimuons (~5/s)
• Used also for validation of new calibrations
• PID calibration using Ks, D*
CAF – CERN Analysis Facility
• Grid resources for analysis
• Direct batch system usage (LXBATCH) for SW tuning
• Interactive usage (LXPLUS)

Page 28:

Tier1 Center

Real data persistency

First pass reconstruction and re-reconstruction

Data Stripping
• Event preselection in several streams (if needed)
• The resulting DST data shipped to all the other Tier-1 centers

Group analysis
• Further reduction of the datasets, μDST format
• Centrally managed using the LHCb Production System

User analysis
• Selections on stripped data
• Preparing N-tuples and reduced datasets for local analysis

Page 29:

Tier2-Tier3 centers

No assumption of local LHCb-specific support

MC production facilities
• Small local storage requirements to buffer MC data before shipping to the respective Tier-1 center

User analysis
• No assumption of user analysis in the base Computing Model
• However, several distinguished centers are willing to contribute
• Analysis (stripped) data replication to T2-T3 centers by site managers, full or partial sample
• Increases the amount of resources capable of running user analysis jobs
• Analysis data at T2 centers available to the whole Collaboration
• No special preferences for local users
