ATLAS Computing TDR


Page 1: ATLAS  Computing TDR

ATLAS Computing TDR

Lamberto Luminari

CSN1 – Napoli, 22 September 2005

Page 2: ATLAS  Computing TDR


Computing TDR: changes to the Computing Model

Days of operation in 2007: 100 -> 25-50

Access to resources at the various centres:

Access to the Tier-0 facility is granted only to people in the central production group and to those providing the first-pass calibration.

Access to the Tier-1 facilities is essentially restricted to the production managers of the working groups and to the central production group for reprocessing.

In principle, all members of the ATLAS virtual organisation have access to a given Tier-2. In practice (and for operational optimization), heightened access to CPU and resources may be given to specific working groups at a particular site, according to a local policy agreed with the ATLAS central administration in a way that the ATLAS global policy is enforced over the aggregate of all sites. An example would be that the DPD for the Higgs working group are replicated to a subset of Tier-2 facilities, with the working-group members having heightened access to those facilities.

Page 3: ATLAS  Computing TDR


Computing TDR: changes to the Computing Model (2)

Tier-3 Resources: There will be a continuing need for local resources within an institution to store user ntuple-equivalents and to allow work to proceed off the Grid. Clearly, user expectations for these facilities will grow, and a site would typically already provide terabytes of storage for local use. Such 'Tier-3' facilities (which may be collections of desktop machines or local institute clusters) should be Grid-enabled, both to allow job submission to and retrieval from the Grid, and to permit resources to be used temporarily, and by agreement, as part of the Tier-2 activities. Such resources may be useful for simulation or for the collective analysis of datasets shared within a working group for some of the time. The size of Tier-3 resources will depend on the size of the local user community and on other factors, such as any specific software development or analysis activity foreseen in a given institute, and is therefore neither centrally planned nor controlled. It is nevertheless assumed that every active user will need O(1 TB) of local disk storage and a few kSI2k of CPU capacity to analyse ATLAS data efficiently.
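As a rough illustration of the per-user rule of thumb above, a minimal sizing sketch; the helper and its defaults (in particular the 3 kSI2k per user) are illustrative assumptions, not an ATLAS planning tool.

```python
# Illustrative only: per-user rule of thumb quoted above (~1 TB of disk and
# a few kSI2k of CPU per active user); the 3 kSI2k default is an assumption.
def tier3_estimate(active_users, tb_per_user=1.0, ksi2k_per_user=3.0):
    """Rough (disk_TB, cpu_kSI2k) estimate for a local Tier-3."""
    return active_users * tb_per_user, active_users * ksi2k_per_user

disk_tb, cpu_ksi2k = tier3_estimate(active_users=15)
print(f"~{disk_tb:.0f} TB disk, ~{cpu_ksi2k:.0f} kSI2k CPU")   # ~15 TB, ~45 kSI2k
```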

Page 4: ATLAS  Computing TDR


Computing TDR: AOD production

… As AOD events will be read many times more often than ESD and RAW data, AOD events are physically clustered on output by trigger, physics channel, or other criteria that reflect analysis access patterns. This means that an AOD production job, unlike an ESD production job, produces many output files. The baseline streaming model is that each AOD event is written to exactly one stream: the AOD output streams form a disjoint partition of the run. All streams produced in first-pass reconstruction share the same definition of AOD. On the order of 10 streams are anticipated in first-pass reconstruction…

… Alternative models have been considered, and could also be viable. It is clear from the experience of the Tevatron experiments that a unique solution is not immediately evident. The above scenario reflects the best current understanding of a viable scheme, taking into account the extra constraints of the considerably larger ATLAS dataset. It relies heavily on the use of event collections and the TAG system. These methods are only undergoing their first serious tests at the time of writing. However, the system being devised is flexible, and can (within limits) sustain somewhat earlier event streaming and modestly overlapping streams without drastic technical or resource implications.
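To illustrate the baseline exclusive-streaming idea (each event routed to exactly one stream, so the streams partition the run and one job produces many output files), here is a minimal sketch; the stream names, trigger-bit test and file-naming convention are invented for illustration and are not the ATLAS implementation.

```python
# Illustrative sketch of exclusive AOD streaming: each event is routed to
# exactly one output stream, so the streams partition the run disjointly.
STREAMS = ["egamma", "muon", "jet", "minbias", "other"]  # hypothetical names

def stream_for(event):
    """Pick the first stream whose trigger condition fires; fall back to 'other'."""
    for name in STREAMS[:-1]:
        if event["triggers"].get(name, False):
            return name
    return "other"

def write_streams(events, run_number):
    files = {}
    for evt in events:
        name = stream_for(evt)
        # one output file per stream (per job) -> an AOD job has many outputs
        files.setdefault(name, []).append(evt["id"])
    return {f"AOD.run{run_number}.{name}.pool.root": ids for name, ids in files.items()}

events = [{"id": 1, "triggers": {"muon": True}},
          {"id": 2, "triggers": {"egamma": True, "muon": True}},  # exclusive: first match wins
          {"id": 3, "triggers": {}}]
print(write_streams(events, run_number=1234))
```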

Page 5: ATLAS  Computing TDR


Computing TDR: Offline software

Several orthogonal domain decompositions have been identified. The first spans the ATLAS detector subsystems:

Inner detector (pixel det. + silicon strip det. + transition radiation tracker).
Liquid argon calorimeter.
Tile calorimeter.
Muon spectrometer.

The primary data processing activities that must be supported for all of these detector subsystems are:

Event generation, simulation, digitization, pile-up, detector reconstruction, combined reconstruction, physics analysis, high level triggering, online monitoring, calibration and alignment processing.

Further domain decompositions cover the infrastructure needed to support the software development activity, and components that derive from the overall architectural vision. The overall structure is the following:

Framework and Core Services (event processing framework based on plug-compatible components and abstract interfaces).
Event generators, simulation, digitization and pile-up.
Event selection, reconstruction and physics analysis tools.
Calibration and alignment.
Infrastructure (services that support the software development process).

Page 6: ATLAS  Computing TDR


Offline software: Athena Component Model

Page 7: ATLAS  Computing TDR


Offline software: Athena Major components

Application Manager: the overall driving intelligence that manages and coordinates the activity of all other components within the application.

Algorithms and Sequencers: algorithms provide the basic per-event processing capability of the framework. A Sequencer is a sequence of Algorithms, each of which might itself be another Sequencer.

Tools: a tool is similar to an Algorithm, but differs in that it can be executed multiple times per event.

Transient Data Stores: all the data objects are organized in various transient data stores depending on their characteristics and lifetimes (e.g. event data, detector conditions data, etc…)

Services: provide services needed by the Algorithms. In general these are high-level, designed to support the needs of the physicist. Examples are the message-reporting system, different persistency services, random-number generators, etc.

Selectors: components that perform selection (e.g., the Event Selector provides functionality for selecting the input events that the application will process).

Converters: responsible for converting data from one representation to another. One example is the transformation of an object from its transient form to its persistent form and vice versa.

Utilities: C++ classes that provide general support for other components.
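To make the roles above concrete, here is a toy event loop in the same spirit: an application manager drives a Sequencer of Algorithms, which exchange data only through a transient event store. The class names and interfaces are simplified illustrations, not the actual Athena/Gaudi C++ API.

```python
# Toy sketch of the component model described above (not the real Athena API).
class Algorithm:
    def __init__(self, name): self.name = name
    def execute(self, store): raise NotImplementedError

class Sequencer(Algorithm):
    """A Sequencer is itself an Algorithm made of other Algorithms."""
    def __init__(self, name, members): super().__init__(name); self.members = members
    def execute(self, store):
        for alg in self.members:
            alg.execute(store)

class TrackFinder(Algorithm):
    def execute(self, store):
        store["Tracks"] = [f"track{i}" for i in range(len(store["RawHits"]))]

class VertexFitter(Algorithm):
    def execute(self, store):
        store["Vertices"] = ["vtx0"] if store["Tracks"] else []

class ApplicationManager:
    """Drives the per-event loop; the transient store is rebuilt for each event."""
    def __init__(self, top_sequence): self.top = top_sequence
    def run(self, events):
        for raw in events:
            store = {"RawHits": raw}          # transient event data store
            self.top.execute(store)
            print(store["Tracks"], store["Vertices"])

app = ApplicationManager(Sequencer("TopSeq", [TrackFinder("trk"), VertexFitter("vtx")]))
app.run([[1, 2, 3], []])
```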

Page 8: ATLAS  Computing TDR


Offline software: Simulation data flow

Page 9: ATLAS  Computing TDR


Offline software: Reconstruction chains

Page 10: ATLAS  Computing TDR


Offline Software for HLT and Monitoring

Page 11: ATLAS  Computing TDR


Computing TDR: Databases and Data Man. (Project)

There are two broad categories of data storage in ATLAS: file-based data and database-resident data (more specifically, relational database-resident data). The two storage approaches are complementary and are used in appropriate contexts in ATLAS:

File storage is used for bulky data such as event data and large conditions data volumes; for contexts in which the remote connectivity (usually) implied by database storage is not reliably available; and generally for cases where simple, lightweight storage is adequate. Database storage is used where concurrent writes and transactional consistency are required; where data handling is inherently distributed, typically with centralized writers and distributed readers; where indexing and rapid querying across moderate data volumes is required; and where structured archival storage and query-based retrieval is required. Vendor neutrality in the DB interface (with implemented support for Oracle, MySQL and SQLite) has been addressed through the development of the Relational Access Layer (RAL) within the POOL project.

COOL (developed in a collaboration between LCG Application Area and ATLAS) is another DB-based storage service layered over RAL and is the basis for ATLAS conditions data storage. It provides for interval-of-validity based storage and retrieval of conditions.
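The key COOL idea, interval-of-validity (IOV) storage, can be illustrated with a minimal sketch: a conditions payload is stored with a validity range and retrieved by the event's time. The classes and method names here are illustrative assumptions, not the COOL API.

```python
# Minimal illustration of interval-of-validity (IOV) conditions storage:
# each payload is valid for a half-open [since, until) range and is looked
# up by the event's time (or run/lumi-block).  Not the actual COOL API.
import bisect

class IOVFolder:
    def __init__(self):
        self._since = []     # sorted IOV start times
        self._payloads = []  # (until, payload) aligned with _since

    def store(self, since, until, payload):
        i = bisect.bisect(self._since, since)
        self._since.insert(i, since)
        self._payloads.insert(i, (until, payload))

    def retrieve(self, time):
        i = bisect.bisect_right(self._since, time) - 1
        if i < 0:
            raise LookupError("no conditions valid at this time")
        until, payload = self._payloads[i]
        if time >= until:
            raise LookupError("no conditions valid at this time")
        return payload

calib = IOVFolder()
calib.store(since=0,    until=1000, payload={"pedestal": 40.1})
calib.store(since=1000, until=5000, payload={"pedestal": 40.4})
print(calib.retrieve(250)["pedestal"])   # 40.1
print(calib.retrieve(1200)["pedestal"])  # 40.4
```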

Page 12: ATLAS  Computing TDR


Computing TDR: Databases and Data Management

Use of the conditions database online for subdetector and HLT configuration presents considerable performance challenges. Parallel read performance is beyond the capacity of one database server and replication will have to be used to share the load amongst many slave servers:

One interesting possibility comes from the Frontier project, developed to distribute data using a web-caching technology, where database queries are translated into HTTP requests for web-page content, which can be cached using conventional web proxy server technology. This is particularly suitable for distributed read-only access, where updates can be forced by flushing the proxy caches.
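The idea sketched above (turn read-only database queries into cacheable HTTP requests) can be illustrated as follows; the URL scheme, cache and stand-in database are invented for illustration and are not the real Frontier protocol.

```python
# Illustrative sketch of the Frontier idea: a read-only DB query is encoded
# as an HTTP-style request key, so identical queries can be served from a
# web-proxy cache instead of hitting the database server.
import hashlib, urllib.parse

DB = {"SELECT gain FROM lar_calib WHERE run=1234": [1.01, 0.99, 1.02]}  # stand-in DB
proxy_cache = {}

def query_url(sql):
    """Encode the query as a cacheable GET URL (illustrative scheme)."""
    return "http://frontier.example.org/query?" + urllib.parse.urlencode({"q": sql})

def fetch(sql):
    key = hashlib.sha1(query_url(sql).encode()).hexdigest()
    if key in proxy_cache:                     # served by the proxy, no DB load
        return proxy_cache[key], "cache hit"
    result = DB[sql]                           # only cache misses reach the DB
    proxy_cache[key] = result
    return result, "cache miss"

def flush_cache():
    """Updates are propagated by flushing the proxy caches."""
    proxy_cache.clear()

print(fetch("SELECT gain FROM lar_calib WHERE run=1234"))  # miss -> DB
print(fetch("SELECT gain FROM lar_calib WHERE run=1234"))  # hit  -> proxy
```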

Conditions data will also have to be distributed worldwide, for subsequent reconstruction passes, user analysis and subdetector calibration tasks:

The LCG 3D (Distributed Deployment of Databases) project is prototyping the necessary techniques, based on conventional database replication, with an architecture of Oracle servers at the Tier-0 (CERN) and the Tier-1 centres, and MySQL-based replicas of subsets of the data at Tier-2 sites and beyond. The use of the RAL backend-independent database access library by COOL and other database applications will be particularly important here, to enable such cross-platform replication.

Page 13: ATLAS  Computing TDR


Computing TDR: GRID-based production system

[Diagram: Windmill supervisors drive per-Grid executors (Lexor on LCG, Dulcinea on NG, Capone on Grid3, plus an LSF executor); they connect to the production database (prodDB) and AMI, and to the Don Quijote data management system (dms) with its local and global catalogues (RLS).]

Page 14: ATLAS  Computing TDR


Production system performance

[Chart: jobs per day on the LCG-2 infrastructure, from 25/6/2004 to 25/7/2005 (y-axis 0-8000 jobs per day), covering the DC2 and Rome production periods.]

Page 15: ATLAS  Computing TDR


Computing TDR: Tier-0 Operations

Page 16: ATLAS  Computing TDR


[Diagram: ATLAS computing model data flow: Event Builder (10 GB/s) feeding the Event Filter (~7.5 MSI2k); ~320 MB/s into the Tier-0 (5 MSI2k, 5 PB/y); ~75 MB/s over 622 Mb/s links to each of ~10 Tier-1s (8 MSI2k, 2 PB/y); ~4 Tier-2s per Tier-1 (~1.5 MSI2k each) and Tier-3s; detector output ~PB/s.]

Data replication

RAW: a complete replica of the raw data resides at the Tier-1s (~1/10 per Tier-1). Samples of events are also stored at the Tier-2s and, to a lesser extent, at the Tier-3s.

ESD: all ESD versions are replicated and reside at at least two Tier-1s. The primary ESD and the associated RAW data are assigned to the ~10 Tier-1s with a round-robin mechanism. Samples of events are also stored at the Tier-2s and, to a lesser extent, at the Tier-3s.

AOD: fully replicated at every Tier-1 and partially at the Tier-2s (~1/3 - 1/4). Some streams may be stored at the Tier-3s.

TAG: the TAG databases are replicated at all Tier-1s and Tier-2s.

DPD: at the Tier-1s, Tier-2s and Tier-3s.

Page 17: ATLAS  Computing TDR


Computing TDR: Resource Requirement Evolution

Tier-0

CAF

Page 18: ATLAS  Computing TDR


Comp. TDR: Resource Requirement Evolution (2)

Tier-1

Tier-2

Page 19: ATLAS  Computing TDR


Computing System Commissioning

The physics groups have requested 10^8 simulated events, with the latest layout and with the knowledge of detector response from the cosmic-ray runs, to be studied in depth before the start of the run in July 2007.

Six months of sustained production starting from late summer 2006.

Computing resources needed (estimated from the participation in the Physics Workshop activities, with simulation, reconstruction and analysis of 7×10^6 events = 15 times more events in a period ~4 times longer):

4 × the computing power available for the Physics Workshop.
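A quick cross-check of the scaling quoted above, under the stated assumptions (10^8 events versus the 7×10^6 of the Physics Workshop, over a period ~4 times longer):

```python
# Cross-check of the commissioning resource estimate quoted above.
events_commissioning = 1e8   # requested simulated events
events_physics_ws    = 7e6   # events produced for the Physics Workshop
time_ratio           = 4     # ~4 times longer production period

event_ratio = events_commissioning / events_physics_ws   # ~14.3, i.e. ~15x
power_ratio = event_ratio / time_ratio                    # ~3.6, i.e. ~4x
print(f"{event_ratio:.1f}x events, {power_ratio:.1f}x computing power")
```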

Page 20: ATLAS  Computing TDR


Milestone 2006

1. January 2006: production release for computing system commissioning and initial cosmic-ray studies; completion of the Event Data Model implementation for reconstruction.
2. February 2006: start of Data Challenge 3, also called Computing System Commissioning.
3. April 2006: integration of the ATLAS components with the LCG Service Challenge 4.
4. July 2006: production release for the cosmic-ray runs (autumn 2006).
5. December 2006: production release for the first real proton data.

Page 21: ATLAS  Computing TDR



Planned activity in the Italian centres


Reconstruction: Muon Detector (LE, NA, PV), Calorimeters (MI, PI), Pixel Detector (MI)

Calibration/alignment/detector data: MDT (LNF, RM1-3), RPC (LE, NA, RM2), Calorimeters (MI, PI), Pixel Detector (MI), Cond. DB (CS), Det. Descr. DB (LE, PI), Det. Mon. (CS, NA, UD)

Performance studies: Muons (CS, LE, LNF, NA, PI, PV, RM1-2-3), Tau/jet/EtMiss/egamma (GE, MI, PI)

Analyses: Higgs, both SM and MSSM (CS, LNF, MI, PI, PV, RM1); SUSY (LE, MI, NA); Top (PI, UD); B physics (CS, GE, PI)

Simulations related to the above activities. Studies of the analysis model.

Page 22: ATLAS  Computing TDR



Resources needed in the Italian Tier-2/3 centres


In the Tier-2s: simulations for computing system commissioning; a copy of the AOD (10^8 events × 100 KB = 10 TB) with different streaming schemes (exclusive and inclusive) for analysis and analysis-model studies; samples of events in RAW and ESD format for calibration and for the development of reconstruction algorithms; calibration centres; organized analysis activities.

450 kSI2k (250 already available at the end of 2005); 80 TB (30 already available at the end of 2005).

In the Tier-3s: individual and chaotic analysis activities.

40 kSI2k; 10 TB.
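A quick check of the AOD volume quoted above (10^8 events at ~100 KB/event):

```python
# AOD copy at a Tier-2: 1e8 events at ~100 KB/event (decimal units).
n_events, kb_per_event = 1e8, 100
aod_tb = n_events * kb_per_event * 1e3 / 1e12   # KB -> bytes -> TB
print(f"{aod_tb:.0f} TB")                       # 10 TB, as quoted above
```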

Page 23: ATLAS  Computing TDR


T2 Cloud Growth

Year          2007      2008      2009      2010      2011      2012
Disk (TB)   1606.60   8747.98  15904.56  25815.10  35725.63  45654.33
CPU (kSI2k) 3653.24  19938.74  31767.93  53014.37  71121.85  89229.33

Overall ATLAS Tier-2 resources (Comp. TDR)

Page 24: ATLAS  Computing TDR


Tier-2 cost estimate (purchases in the current year)

Tier-2 INFN          2006   2007   2008   2009   2010   Tot. k€
CPU (kSI2k)  new      200    300   1812   1420   2709
CPU (kSI2k)  tot      450    750   2532   3832   6261
CPU k€                117    114    453    241    325      1250
Disk (TB)    new       50    160    972    847   1334
Disk (TB)    tot       80    240   1212   2039   3194
Disk k€               115    224    855    466    454      2114
Tot. k€               232    338   1308    707    779      3364

Page 25: ATLAS  Computing TDR


Tier-2 implementation projects

7/9: Identification of the local ATLAS contact persons for the projects
12/9: Formation of the technical support committee
12-20/9: Input from the activity coordinators and from the technical support committee
23/9: First draft of the local projects
26-29/9: Preliminary review of the projects and feedback
30/9: "Complete"(?) version of the projects
3-4/10: Computing Committee (Comm. Calcolo) workshop -> "technical" verification of the projects
5/10: (Virtual) meeting to discuss the projects
10/10: CSN1

Page 26: ATLAS  Computing TDR


2006 requests (non Tier-2)

GE: 2 dual-processor nodes + 2 TB disk (12.4 k€)
BO: analysis farm (infrastructure + 5 dual-processor nodes) + 4 TB disk (26.5 k€)
PI: 5 kSI2k + 2 TB disk (11 k€)
PV: 5 kSI2k + 2 TB disk (11 k€)
RM2: 2 Gigabit switches (2 k€)
RM3: dual-processor node + 2 TB disk (7 k€)
UD: dual-processor node + 1 TB disk (5 k€)

Page 27: ATLAS  Computing TDR


Cost evolution estimate used

P. Capiluppi, CSN1 - 17 May 2005

Cost evaluation table (costs by Pasta III + Bernd, Jul 04)

               2004  2005  2006  2007  2008  2009  2010
CPU (€/SI2k)   1.25  0.83  0.55  0.36  0.24  0.16  0.12
Disk (€/GB)    5.90  3.69  2.30  1.44  0.90  0.56  0.35
Tape (€/GB)    0.53  0.53  0.53  0.53  0.26  0.26  0.26

Cost INFN      2004  2005  2006  2007  2008  2009  2010
CPU (€/SI2k)   1.30  0.86  0.58  0.38  0.25  0.17  0.12
Disk (€/GB)    5.76  3.60  2.25  1.40  0.88  0.55  0.34
Tape (€/GB)    0.42  0.42  0.42  0.42  0.21  0.21  0.21

INFN 2005 tenders (total cost / cost per unit):
dual-CPU box of 3 kSI2k for 2150 Euro + VAT = 2580 -> 0.86 €/SI2k
1 TB of complete FC disk for 3000 Euro + VAT = 3600 -> 3.6 €/GB
200 GB tape cartridge for 70 Euro each = 84 -> 0.42 €/GB

               2004  2005  2006  2007  2008   2009   2010
kSI2k/box      2.40  3.60  5.40  8.10  12.15  18.23  27.34
GB/disk        200   330   540   900   1500   2400   4000

Latest CNAF disk tender: 2.3 Euro/GB
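Combining the INFN unit costs on this slide with the newly purchased capacities on the Tier-2 cost slide reproduces the quoted yearly outlays within a few k€; a small sketch of that cross-check, using the values as read from the two slides:

```python
# Cross-check: yearly Tier-2 cost = new capacity bought that year x INFN unit cost.
cpu_eur_per_si2k = {2006: 0.58, 2007: 0.38, 2008: 0.25, 2009: 0.17, 2010: 0.12}
disk_eur_per_gb  = {2006: 2.25, 2007: 1.40, 2008: 0.88, 2009: 0.55, 2010: 0.34}
new_cpu_ksi2k    = {2006: 200, 2007: 300, 2008: 1812, 2009: 1420, 2010: 2709}
new_disk_tb      = {2006: 50,  2007: 160, 2008: 972,  2009: 847,  2010: 1334}

for year in sorted(new_cpu_ksi2k):
    cpu_keur  = new_cpu_ksi2k[year] * 1000 * cpu_eur_per_si2k[year] / 1000
    disk_keur = new_disk_tb[year] * 1000 * disk_eur_per_gb[year] / 1000
    print(year, round(cpu_keur), "k€ CPU,", round(disk_keur), "k€ disk")
# e.g. 2006: 116 k€ CPU, 112 k€ disk (table: 117 and 115);
#      2008: 453 k€ CPU, 855 k€ disk (matches the table).
```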

Page 28: ATLAS  Computing TDR


ATLAS jobs run at each LCG site