The ATLAS Computing Model, Roger Jones, Lancaster University, CHEP06, Mumbai, 13 Feb. 2006


Page 1: The ATLAS Computing Model Roger Jones Lancaster University CHEP06 Mumbai 13 Feb. 2006

The ATLAS Computing Model

Roger Jones

Lancaster University

CHEP06

Mumbai 13 Feb. 2006

Page 2: Overview

RWL Jones 13 Feb. 2006 Mumbai

• Brief summary of ATLAS Facilities and their roles
• Growth of resources
  • CPU, Disk, Mass Storage
• Network requirements
  • CERN ↔ Tier 1 ↔ Tier 2
• Operational Issues and Hot Topics

Page 3: Computing Resources

• Computing Model fairly well evolved, documented in the C-TDR
  • Externally reviewed
  • http://doc.cern.ch//archive/electronic/cern/preprints/lhcc/public/lhcc-2005-022.pdf
• There are (and will remain for some time) many unknowns
  • Calibration and alignment strategy is still evolving
  • Physics data access patterns MAY be exercised from June
  • Unlikely to know the real patterns until 2007/2008!
  • Still uncertainties on the event sizes and reconstruction times
• Lesson from the previous round of experiments at CERN (LEP, 1989-2000)
  • Reviews in 1988 underestimated the computing requirements by an order of magnitude!

Page 4: ATLAS Facilities

• Event Filter Farm at CERN
  • Located near the experiment; assembles data into a stream to the Tier 0 Center
• Tier 0 Center at CERN
  • Raw data to mass storage at CERN and to Tier 1 centers
  • Swift production of Event Summary Data (ESD) and Analysis Object Data (AOD)
  • Ship ESD, AOD to Tier 1 centers; mass storage at CERN
• Tier 1 Centers distributed worldwide (10 centers)
  • Re-reconstruction of raw data, producing new ESD, AOD
  • Scheduled, group access to full ESD and AOD
• Tier 2 Centers distributed worldwide (approximately 30 centers)
  • Monte Carlo simulation, producing ESD, AOD shipped to Tier 1 centers
  • On-demand user physics analysis
• CERN Analysis Facility
  • Analysis
  • Heightened access to ESD and RAW/calibration data on demand
• Tier 3 Centers distributed worldwide
  • Physics analysis

Page 5: Processing

• Tier-0:
  • Prompt first-pass processing on the express/calibration physics stream
  • 24-48 hours later, process the full physics data stream with reasonable calibrations
  • Implies large data movement from T0 → T1s
• Tier-1:
  • Reprocess 1-2 months after arrival with better calibrations
  • Reprocess all resident RAW at year end with improved calibration and software
  • Implies large data movement from T1 ↔ T1 and T1 → T2

Page 6: Analysis model

Analysis model broken into two components:
• Scheduled central production of augmented AOD, tuples & TAG collections from ESD
  • Derived files moved to other T1s and to T2s
• Chaotic user analysis of augmented AOD streams, tuples, new selections etc. and individual user simulation and CPU-bound tasks matching the official MC production
  • Modest job traffic between T2s

Page 7: Inputs to the ATLAS Computing Model (1)

Page 8: Inputs to the ATLAS Computing Model (2)

Page 9: Data Flow

(Diagram panels: Tier 0 view and Tier 2 view.)

• EF farm → T0
  • 320 MB/s continuous
• T0 raw data → mass storage at CERN
• T0 raw data → Tier 1 centers
• T0 ESD, AOD, TAG → Tier 1 centers
  • 2 copies of ESD distributed worldwide
• T1 → T2
  • Some RAW/ESD, all AOD, all TAG
  • Some group derived datasets
• T2 → T1
  • Simulated RAW, ESD, AOD, TAG
• T0 → T2: calibration processing?

Page 10: ATLAS partial & “average” T1 Data Flow (2008)

(Flow diagram linking Tier-0, the Tier-1 CPU farm, disk buffer, tape, disk storage, the other Tier-1s and each associated Tier-2. Per-dataset rates as shown:)

• RAW: 1.6 GB/file, 0.02 Hz, 1.7K files/day, 32 MB/s, 2.7 TB/day
• ESD2: 0.5 GB/file, 0.02 Hz, 1.7K files/day, 10 MB/s, 0.8 TB/day
• AOD2: 10 MB/file, 0.2 Hz, 17K files/day, 2 MB/s, 0.16 TB/day
• AODm2: 500 MB/file, 0.004 Hz, 0.34K files/day, 2 MB/s, 0.16 TB/day
• RAW + ESD2 + AODm2 combined: 0.044 Hz, 3.74K files/day, 44 MB/s, 3.66 TB/day
• RAW, ESD (2x), AODm (10x) through the CPU farm: 1 Hz, 85K files/day, 720 MB/s
• ESD2 exchanged with other Tier-1s: 0.5 GB/file, 0.036 Hz, 3.1K files/day, 18 MB/s, 1.44 TB/day
• AODm2 exchanged with other Tier-1s: 500 MB/file, 0.036 Hz, 3.1K files/day, 18 MB/s, 1.44 TB/day
• ESD1: 0.5 GB/file, 0.02 Hz, 1.7K files/day, 10 MB/s, 0.8 TB/day
• AODm1: 500 MB/file, 0.04 Hz, 3.4K files/day, 20 MB/s, 1.6 TB/day
• AODm2 to each Tier-2: 500 MB/file, 0.04 Hz, 3.4K files/day, 20 MB/s, 1.6 TB/day

Plus simulation and analysis data flow
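The per-dataset figures on this slide are internally consistent: bandwidth is file size times file rate, and the daily figures follow from 86 400 seconds per day. A minimal sketch (assuming decimal units, 1 GB = 1000 MB, which reproduces the quoted values):

```python
# Reproduce the per-dataset rates on this slide from first principles:
# bandwidth = file size x file rate; daily figures scale by 86 400 s/day.
# Decimal units (1 GB = 1000 MB) are assumed.
SECONDS_PER_DAY = 86_400

def link_rates(file_size_mb: float, files_per_sec: float):
    """Return (files/day, MB/s, TB/day) for one data-flow link."""
    mb_per_s = file_size_mb * files_per_sec
    files_per_day = files_per_sec * SECONDS_PER_DAY
    tb_per_day = mb_per_s * SECONDS_PER_DAY / 1e6
    return files_per_day, mb_per_s, tb_per_day

# RAW from Tier-0: 1.6 GB/file at 0.02 Hz
files, mbs, tbd = link_rates(1600, 0.02)
print(f"RAW: {files:.0f} files/day, {mbs:.0f} MB/s, {tbd:.2f} TB/day")
# matches the slide's 1.7K files/day, 32 MB/s, 2.7 TB/day
```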

Page 11: Total ATLAS Requirements for 2008

Page 12: Important points

• Discussion on disk vs tape storage at Tier-1s
  • Tape in this discussion means low-access secure storage
  • No ‘disk buffers’ included except input to Tier 0
• Storage of simulation data from Tier 2s
  • Assumed to be at T1s
  • Need partnerships to plan networking
  • Must have fail-over to other sites
• Commissioning
  • Requirement of flexibility in the early stages
• Simulation is a tunable parameter in T2 numbers!
• Heavy Ion running still under discussion.

Page 13: ATLAS T0 Resources

Page 14: ATLAS T1 Resources

Page 15: ATLAS T2 Resources

Page 16: Required Network Bandwidth

• Caveats
  • No safety factors
  • No headroom
  • Just sustained average numbers
  • Assumes no years/datasets are ‘junked’
  • Physics analysis pattern still under study…
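These caveats mean the curves that follow are bare sustained averages, so any real link would be provisioned above them. A hedged sketch; the 1.5x and 2.0x factors below are illustrative placeholders, not ATLAS planning numbers:

```python
# Scale a nominal sustained rate to a link-provisioning figure. The
# safety and headroom factors are invented for illustration; the slide
# only states that the nominal numbers include neither.
def provisioned_bandwidth(nominal_mb_s: float,
                          safety_factor: float = 1.5,
                          headroom_factor: float = 2.0) -> float:
    """Nominal sustained MB/s inflated by safety and headroom factors."""
    return nominal_mb_s * safety_factor * headroom_factor

# e.g. the 44 MB/s nominal Tier-0 export rate from the data-flow slide
print(provisioned_bandwidth(44.0))  # 132.0
```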

Page 17: T1 ↔ CERN Bandwidth I+O

(Chart: nominal bandwidth in MB/s, month by month from Jul 2007 to Dec 2010; series: ATLAS and ATLAS HI; scale 0-1500 MB/s.)

• Mainly outward data movement

The projected time profile of the nominal bandwidth required between CERN and the Tier-1 cloud.

Page 18: T1 ↔ T1 Bandwidth I+O

(Chart: nominal bandwidth in MB/s, month by month from Jul 2007 to Dec 2010; series: ATLAS and ATLAS HI; scale 0-600 MB/s.)

• About half is scheduled analysis

The projected time profile of the nominal bandwidth required between a Tier-1 and the rest of the Tier-1 cloud.

Page 19: T1 ↔ T2 Bandwidth I+O

(Chart: nominal bandwidth in MB/s, month by month from Jul 2007 to Dec 2010; series: ATLAS and ATLAS HI; scale 0-250 MB/s.)

The projected time profile of the nominal aggregate bandwidth expected for an average ATLAS Tier-1 and its three associated Tier-2s.

• Dominated by AOD

Page 20: Issues 1: T1 Reprocessing

• Reprocessing at Tier 1s is understood in principle
  • In practice, requires efficient recall of data from archive and processing
  • Pinning, pre-staging, DAGs all required?
• Requires the different storage roles to be well understood
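A toy sketch of the pin/pre-stage pattern the bullets allude to (the classes and names are invented, not ATLAS or mass-storage software): files are recalled from tape into a bounded disk buffer, pinned while a job reads them, and unpinned afterwards so the buffer can evict them.

```python
# Toy model of staged recall for reprocessing: a bounded disk buffer in
# front of tape, with pinning so in-use files cannot be evicted.
from collections import OrderedDict

class DiskBuffer:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cached = OrderedDict()   # file name -> pinned flag

    def stage(self, name: str) -> None:
        """Recall a file from tape, evicting an unpinned file if full."""
        if name in self.cached:
            self.cached.move_to_end(name)
        else:
            while len(self.cached) >= self.capacity:
                victim = next((f for f, p in self.cached.items() if not p), None)
                if victim is None:
                    raise RuntimeError("buffer full of pinned files")
                del self.cached[victim]
            self.cached[name] = False
        self.cached[name] = True      # pin for the upcoming job

    def release(self, name: str) -> None:
        self.cached[name] = False     # unpin; evictable again

def reprocess(files, buffer):
    """Pre-stage, process (stand-in), and release each file in turn."""
    done = []
    for f in files:
        buffer.stage(f)
        done.append(f)                # stand-in for running reconstruction
        buffer.release(f)
    return done

buf = DiskBuffer(capacity=2)
print(reprocess(["raw-001", "raw-002", "raw-003"], buf))
```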

Page 21: Issues 2: Streaming

• This is *not* a theological issue
  • All discussions are about optimisation of data access
• TDR has 4 streams from the event filter
  • Primary physics, calibration, express, problem events
  • Calibration stream has split at least once since!
• At AOD, envisage ~10 streams
• ESD streaming?
  • Straw-man streaming schemes (trigger-based) being agreed
  • Will explore the access improvements in large-scale exercises
  • Will also look at overlaps, bookkeeping etc.
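A minimal sketch of trigger-based streaming in the spirit of the four TDR streams; the flag names and routing order here are invented stand-ins for the real trigger decisions:

```python
# Route each event into one of the four TDR streams (primary physics,
# calibration, express, problem events). Flags are illustrative only.
def route(event: dict) -> str:
    if event.get("corrupt"):
        return "problem"
    if event.get("calibration_trigger"):
        return "calibration"
    if event.get("high_priority"):       # small, fast-feedback subset
        return "express"
    return "physics"

streams = {"physics": [], "calibration": [], "express": [], "problem": []}
for ev in [{"id": 1}, {"id": 2, "calibration_trigger": True},
           {"id": 3, "high_priority": True}, {"id": 4, "corrupt": True}]:
    streams[route(ev)].append(ev["id"])
print(streams)
```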

Page 22: TAG Access

• TAG is a keyed list of variables per event
• Two roles:
  • Direct access to an event in a file via a pointer
  • Data collection definition function
• Two formats, file and database
• Now believe large queries require the full database
  • Restricts them to Tier-1s and large Tier-2s/CAF
  • Ordinary Tier-2s hold file-based TAGs corresponding to locally-held datasets
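A schematic illustration of the two TAG roles (field names and file names are invented; the real ATLAS TAG schema differs): a small keyed record per event carries a few selection variables plus a pointer back to the full event, so a query over TAGs both defines a collection and locates the events to read.

```python
# Toy TAG record: per-event selection variables plus a back-pointer
# (file name, offset) to the full event. Schema is invented.
from dataclasses import dataclass

@dataclass
class TagRecord:
    run: int
    event: int
    n_jets: int
    missing_et: float      # GeV
    file_name: str         # pointer: which event file holds the event
    file_offset: int       # pointer: where in that file

tags = [
    TagRecord(1, 1, 2, 35.0, "aod_0001.pool.root", 0),
    TagRecord(1, 2, 4, 120.5, "aod_0001.pool.root", 1),
    TagRecord(1, 3, 3, 80.2, "aod_0002.pool.root", 0),
]

# Role 2: a "collection definition" is a query over TAG variables.
collection = [t for t in tags if t.n_jets >= 3 and t.missing_et > 50]

# Role 1: the selected TAGs point straight at the events to read.
pointers = [(t.file_name, t.file_offset) for t in collection]
print(pointers)
```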

Page 23: Conclusions

• Computing Model data flow understood for placing RAW, ESD and AOD at tiered centers
  • Still need to understand the data flow implications of physics analysis
• SC4/Computing System Commissioning in 2006 is vital.
• Some issues will only be resolved with real data in 2007-8.

Page 24: Backup Slides

Page 25: Heavy Ion Running