
Page 1: Title

LHCb Readiness for Run 2
2015 WLCG Workshop Okinawa
Stefan Roiser / CERN IT-SDC, for LHCb Distributed Computing

Page 2: Content

• Online changes with impact for Offline
• Offline Data Processing
• Offline Data Management
• Services LHCb relies on

Page 3: Preamble – LHC Evolution

Parameter | Run 1 | Planned for Run 2
Max beam energy | 4 TeV | 6.5 TeV
Transverse beam emittance | 1.8 μm | 1.9 μm
β* (beam oscillation) | 0.6 m (LHCb: 3 m) | 0.4 m (LHCb: 3 m)
Number of bunches | 1374 | 2508
Max protons per bunch | 1.7 × 10^11 | 1.15 × 10^11
Bunch spacing | 50 ns | 25 ns
LHC maximum luminosity | 7.7 × 10^33 cm^-2 s^-1 | 1.6 × 10^34 cm^-2 s^-1
LHCb maximum luminosity | 4 × 10^32 cm^-2 s^-1 | 4 × 10^32 cm^-2 s^-1
LHCb μ (avg. # collisions/crossing) | 1.6 | 1.2

NB: LHCb uses "luminosity leveling": the "in-time pile-up", and with it the instantaneous luminosity, stays constant for LHCb during an LHC fill.

[Slide figure: instantaneous luminosity during a fill, with labels "ATLAS & CMS" and "LHCb"]

Page 4: Pit & Online

Page 5: Trigger Scheme

• The hardware trigger reduces the event rate to ~1 MHz
• The High Level Trigger computing farm is split into:
  1. HLT1 with partial event reconstruction; its output is buffered on local disks
  2. The HLT1 output is used for detector calibration and alignment (O(hours)); this was done offline in Run 1
  3. HLT2 runs deferred, with a signal event reconstruction very close to the offline reconstruction
• 12.5 kHz event rate to OFFLINE
  • At ~60 kB event size this is ~750 MB/s (a quick check follows below)
  • The event rate was 4.5 kHz in Run 1

NB: Because of the deferred trigger, the HLT farm has very little availability for offline data processing.

See also Marco's talk tomorrow on further evolution for future Runs.
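As a quick sanity check of the numbers quoted above (assuming the slide's 12.5 kHz event rate and ~60 kB average event size), the offline data rate works out as follows:

    # Back-of-the-envelope check of the Run 2 offline throughput quoted on this slide.
    event_rate_hz = 12.5e3       # events per second sent to offline
    event_size_bytes = 60e3      # ~60 kB per event
    throughput_mb_s = event_rate_hz * event_size_bytes / 1e6
    print(f"Offline data rate: ~{throughput_mb_s:.0f} MB/s")   # -> ~750 MB/s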

Page 6: HLT Output Stream Splitting

• 10 kHz go to the classic Offline reconstruction / stripping on distributed computing resources
  • If needed, part of this can be "parked" and processed in LS2
• New concept of a "Turbo Stream" in Run 2 for ~2.5 kHz
  • i.e. wherever sufficient, take the HLT output with its event reconstruction directly for physics analysis
  • Initially the RAW information is included; it will be stripped off Offline

[Slide diagram: 12.5 kHz to storage, split into a 10 kHz Full (+Parked) stream and a 2.5 kHz Turbo stream]

S. Benson, "The LHCb Turbo Stream", T1, Village Center, Thu 10am

Page 7: Data Processing

Page 8: Offline Processing Workflow

1. The RAW input file is available on the Disk Buffer.
2. Reconstruction runs ~24 h: 1 input RAW file, 1 output FULL.DST to the Disk Buffer.
3. Asynchronous migration of the FULL.DST from the Disk Buffer to Tape.
4. Stripping (DaVinci) runs on 1 or 2 input files (~6 h/file) and outputs several unmerged (M)DST files (one per "stream") to the Disk Buffer.
   • The input FULL.DST is removed from the Disk Buffer asynchronously.
5. The above workflows are rerun for all files of one run.
6. Once a stream reaches 5 GB of unmerged (M)DSTs (up to O(100) files), Merging runs ~15–30 min and outputs one merged (M)DST file to Disk (a sketch of this merging rule follows below).
   • The input (M)DST files are removed from the Disk Buffer asynchronously.

[Slide diagram: RAW (3 GB, 1 copy) on the Buffer → Reconstruction (24 h) → FULL.DST (5 GB, 1 copy) on the Buffer and on Tape → Stripping (6 h) → unmerged (M)DST files (O(MB), 1 copy) on the Buffer → Merging (30 min) → merged (M)DST (5 GB, 1 copy) on Disk. Legend: Application, File Type, Storage Element.]
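A minimal sketch, in Python, of the merging rule in step 6 (not the actual LHCbDIRAC transformation code): unmerged (M)DST files accumulate per stream on the Disk Buffer, and once a stream holds ~5 GB a merging job is launched. The stream name, file names and sizes below are invented for illustration.

    from collections import defaultdict

    MERGE_THRESHOLD_BYTES = 5 * 1000**3    # ~5 GB per merged (M)DST, as on the slide

    pending = defaultdict(list)            # stream name -> list of (lfn, size) tuples

    def launch_merge(stream, input_lfns):
        # Placeholder for submitting a ~15-30 min merging job writing one file to Disk;
        # in reality this is handled by the LHCbDIRAC transformation system.
        print(f"merging {len(input_lfns)} unmerged (M)DST files of stream '{stream}'")

    def add_unmerged_file(stream, lfn, size_bytes):
        """Register one unmerged (M)DST produced by a stripping job."""
        pending[stream].append((lfn, size_bytes))
        if sum(size for _, size in pending[stream]) >= MERGE_THRESHOLD_BYTES:
            launch_merge(stream, [name for name, _ in pending[stream]])
            pending[stream].clear()

    # Hypothetical usage: ~90 MB unmerged files arriving for one stream.
    for i in range(60):
        add_unmerged_file("BHADRON.MDST", f"/lhcb/example/file_{i}.mdst", 90 * 1000**2)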

Page 9: Offline Data Processing Changes

• What is reconstructed offline is supposed to be the final reconstruction pass
  • The calibration / alignment from the HLT is used also offline
  • No reprocessing (reconstruction) is foreseen before the end of Run 2
• Expecting a higher stripping retention because calibration and alignment are done ONLINE
  • Partly damped by moving most physics streams to the M(icro)DST format
    (Note: MDST ~O(10 kB)/event, DST ~O(120 kB)/event; see the comparison below)
• All files from one "LHCb run" are forced to reside on the same storage
  • A run is the smallest granularity for physics analysis files
  • E.g. this reduces the impact in case a disk breaks
• Workflow execution is now also possible on Tier 2 sites
  • Needed because of the increase in collected data
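A rough illustration of why the move to MDST damps the storage cost of a higher stripping retention, using the per-event sizes quoted above; the event count is invented:

    # Per-event sizes from the slide (~120 kB DST, ~10 kB MDST); event count is made up.
    events = 1e9                               # hypothetical number of stripped events
    dst_tb  = events * 120e3 / 1e12            # stream kept in DST format
    mdst_tb = events * 10e3  / 1e12            # same stream in MDST format
    print(f"DST: ~{dst_tb:.0f} TB, MDST: ~{mdst_tb:.0f} TB "
          f"(factor ~{dst_tb / mdst_tb:.0f} smaller)")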

Page 10: Workflow Execution Location

• The data processing workflow is executed by default at Tier 0/1 sites (stays the same as in Run 1)
• For Run 2 we additionally allow:
  • A Tier 2 site to participate remotely for a certain job type (most useful would be Reconstruction)
  • Any Tier 2 to participate at any time in any job type (no static 1-to-1 "attaching" anymore; see the sketch below)
  • In principle the system also allows ANY site to participate in any job type remotely

[Slide diagram: CNAF and GRIDKA (Tier 1) run the full RAW → Reconstruction → Stripping → Merging chain locally, producing FULL.DST, unmerged DST and DST; Manchester (Tier 2) runs Reconstruction remotely on RAW and uploads the FULL.DST; crossed-out links mark the departure from the (technically) MONARC model.]
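A hypothetical sketch of the dynamic site/job-type eligibility described above, replacing the static Run 1 "attaching"; the dictionary and function names are illustrative and not the LHCbDIRAC configuration:

    # Illustrative only: eligibility is decided per job type at match time, rather
    # than by statically attaching each Tier 2 to one Tier 1 and one job type.
    ELIGIBLE_TIERS = {
        "Reconstruction": {0, 1, 2},   # Tier 2 sites may now reconstruct remotely
        "Stripping":      {0, 1, 2},
        "Merging":        {0, 1, 2},
        "MonteCarlo":     {0, 1, 2},
    }

    def site_can_run(site_tier, job_type):
        """Return True if a site of the given tier level may pick up this job type."""
        return site_tier in ELIGIBLE_TIERS.get(job_type, set())

    print(site_can_run(2, "Reconstruction"))   # True in Run 2, was False in Run 1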

Page 11: All Workflows

Workflow | Run 1 | Run 2
Data Processing | T0/1 | T0/1/2
Monte Carlo | T2 | T2 (can also run on T0/1 sites if resources are available)
User analysis | T0/1 | T0/1/2D (without input data, can also run on any T2)

• The very flexible computing model allows almost all workflows to be executed on every tier level / resource type
• Interested in running multicore jobs – especially on VMs – but no pressing need for it
• "Elastic MC" – the job knows the work per event; at the start of the payload it calculates on the fly how many events to produce before the "end of the queue" (see the sketch after this list)
• User analysis – the least amount of work, but the highest priority in the central task queue

F. Stagni, "Jobs masonry with elastic Grid Jobs", T4, B250, Mo 5pm
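A minimal sketch of the "Elastic MC" idea from the list above; the per-event CPU cost, remaining queue time and safety margin are invented numbers, and the real logic lives in the LHCb production system:

    # Size a Monte Carlo payload to fit into the remaining batch queue time.
    def events_to_produce(remaining_queue_seconds, cpu_seconds_per_event,
                          safety_margin_seconds=600):
        """Return how many events fit before the end of the queue, keeping a margin
        for initialisation and output upload."""
        usable = remaining_queue_seconds - safety_margin_seconds
        return max(0, int(usable // cpu_seconds_per_event))

    # e.g. 20 h left in the queue, ~300 CPU-seconds per simulated event:
    print(events_to_produce(20 * 3600, 300))   # -> 238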

Page 12: Compute Resources

Non-virtualized | Virtualized
"Classic Grid" – CE, batch system, … | BOINC – volunteer computing
Non-pledged – commercial, HPC, … | Vac – self-managed cloud resources
HLT farm – little use during Run 2 | Vcycle – interaction via IaaS

• Expect a ramp-up of virtualized infrastructures during Run 2
• All environments are served by the same pilot infrastructure, talking to one LHCb/DIRAC central task queue

F. Stagni, "Pilots 2.0: DIRAC pilots for all the skies", T4, B250, Mo 2pm
A. McNab, "Managing virtual machines with Vac and Vcycle", T7, C210, Mo 5pm

Page 13: Data Management

Page 14: Data Storage

• Introduced the concept of Tier 2D(isk) sites
  • i.e. Tier 2 sites with disk areas >= 300 TB
• No more direct processing from "tape caches" is foreseen
  • Interact with the disk buffer via FTS3 and process from there
  • E.g. pre-staging of the "Legacy Run 1 Stripping" data
  • Should lead to a reduction of the disk cache size in front of tape

Page 15: Data Storage (ctd)

• Catalogs
  • File Catalog: provides replica information; recently migrated from the LCG File Catalog to the Dirac File Catalog
  • Bookkeeping (unchanged): provides data provenance information
• Data Popularity
  • Data collected since 2012

C. Haen, "Federating LHCb datasets using the Dirac File Catalog", T3, C209, Mo 4.45pm
M. Hushchyn, "Disk storage mgmt for LHCb on Data Popularity", T3, C209, Tue 6.15pm

Page 16: Data Access Operations

• Gaudi Federation
  • In use since last fall
  • LHCb analysis jobs create a local replica catalog for their input data; if the local copy is not available, they fall back to the next remote replica
• Data access protocols
  • SRM
    • Shall be in use for tape interactions
    • … and for writing to storage (job output upload, data replication)
  • Xroot
    • LHCb will construct TURLs for input data on the fly, without SRM interaction, for disk-resident data access (see the sketch after this list)
    • Needs a single and stable xroot endpoint per storage element
  • HTTP/WebDAV
    • All storage sites are equipped with HTTP/WebDAV access
    • Could be used as a second access protocol
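A minimal sketch of what constructing an xroot TURL on the fly (without SRM) could look like; the endpoint hostnames and path prefixes below are invented, and the real mapping comes from the LHCb storage element configuration:

    # Illustrative only: map a logical file name (LFN) straight to an xroot TURL
    # using a (hypothetical) stable xrootd endpoint per storage element.
    XROOT_ENDPOINTS = {
        "SITE1-DST": ("xrootd.site1.example", "/lhcb/disk"),   # made-up endpoint
        "SITE2-DST": ("xrootd.site2.example", "/data/lhcb"),   # made-up endpoint
    }

    def lfn_to_turl(se_name, lfn):
        """Build an xroot TURL for disk-resident data, skipping SRM negotiation."""
        host, prefix = XROOT_ENDPOINTS[se_name]
        return f"root://{host}/{prefix}{lfn}"

    print(lfn_to_turl("SITE1-DST", "/lhcb/LHCb/Collision15/DST/example_1.dst"))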

Page 17: Underlying Services

Page 18: Services

• CVMFS
  • Building block for LHCb distributed computing; distributes all software and conditions data
• CernVM
  • Vac, Vcycle and BOINC are using CernVM 3
• FTS3
  • Vital for LHCb WAN transfers and tape interaction (pre-staging of input data)
• Several WLCG monitoring services in use
  • SAM 3, dashboards, network monitoring
  • Working on perfSONAR data extraction into LHCbDIRAC
• HTTP Federation
  • Builds on top of HTTP/WebDAV access; provides easy access to the LHCb data namespace
  • Development on top of it is ongoing for data consistency checks

F. Furano, "Seamless access to LHCb HTTP/WebDAV storage", Mo/Tue, Poster Sess. A

Page 19: Summary

• LHCb is ready for Run 2
• Several changes introduced for Run 2:
  • Calibration/alignment in the HLT farm
  • Closer integration of Tier 2 sites in data processing
  • New Dirac file replica catalog deployed
  • Disk-resident data access via direct xroot
  • CVMFS and FTS3 are key "external" services

Page 20: Goodie page

http://lhcb-web-dirac.cern.ch/DIRAC/LHCb-Production/undefined/grid/SiteStatus/display?name=LCG.RAL.uk
