LHCb Readiness for Run 2 – 2015 WLCG Workshop, Okinawa
DESCRIPTION
LHCb Run 2 Readiness: Online changes with impact for Offline, Offline Data Processing, Offline Data Management, Services LHCb relies on.

TRANSCRIPT
LHCb Readiness for Run 2
2015 WLCG Workshop Okinawa
Stefan Roiser / CERN IT-SDC
for LHCb Distributed Computing
Content
• Online changes with impact for Offline
• Offline Data Processing
• Offline Data Management
• Services LHCb relies on
Preamble – LHC Evolution

                                        Run 1                     Planned for Run 2
Max beam energy                         4 TeV                     6.5 TeV
Transverse beam emittance               1.8 μm                    1.9 μm
β* (beam oscillation)                   0.6 m / LHCb 3 m          0.4 m / LHCb 3 m
Number of bunches                       1374                      2508
Max protons per bunch                   1.7 × 10^11               1.15 × 10^11
Bunch spacing                           50 ns                     25 ns
LHC maximum luminosity                  7.7 × 10^33 cm^-2 s^-1    1.6 × 10^34 cm^-2 s^-1
LHCb maximum luminosity                 4 × 10^32 cm^-2 s^-1      4 × 10^32 cm^-2 s^-1
LHCb μ (avg # collisions/crossing)      1.6                       1.2

NB: LHCb uses “luminosity levelling”, i.e. the “in-time pile-up” and hence the instantaneous luminosity stay constant for LHCb during an LHC fill.
[Figure: instantaneous luminosity during an LHC fill – ATLAS & CMS vs LHCb (levelled)]
Pit & Online
Trigger Scheme
• Hardware trigger reduces the event rate to ~ 1 MHz
• High Level Trigger computing farm split into
  1. HLT1 with partial event reconstruction; output is buffered on local disks
  2. HLT1 output used for detector calibration and alignment (O(hours)) (was done offline in Run 1)
  3. HLT2 runs deferred, with signal event reconstruction very close to the offline reconstruction
• 12.5 kHz event rate to OFFLINE
  • At ~ 60 kB event size this is ~ 750 MB/s (see the sketch below)
  • Event rate was 4.5 kHz in Run 1
NB: Because of the deferred trigger, the HLT farm is only rarely available for offline data processing.
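As a quick sanity check of the quoted bandwidth, a minimal back-of-envelope sketch in Python (rates and event size taken from the bullets above; the Run 1 event size is assumed equal purely for comparison):

```python
# Back-of-envelope check of the HLT output bandwidth quoted above.
event_rate_hz = 12.5e3     # 12.5 kHz to offline in Run 2
event_size_bytes = 60e3    # ~ 60 kB per event

print(f"Run 2: {event_rate_hz * event_size_bytes / 1e6:.0f} MB/s")   # -> 750 MB/s

# Run 1 rate for comparison, assuming (for illustration) the same event size
print(f"Run 1: {4.5e3 * event_size_bytes / 1e6:.0f} MB/s")           # -> 270 MB/s
```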
See also Marco’s talk tomorrow on further evolution for future Runs
HLT Output Stream Splitting
• 12.5 kHz to storage, split into:
• 10 kHz Full (+Parked) Stream: goes to the classic offline reconstruction / stripping on distributed computing resources
  • If needed, part of this can be “parked” and processed in LS2
• 2.5 kHz Turbo Stream: new concept for Run 2
  • i.e. wherever sufficient, take the HLT output with its event reconstruction directly for physics analysis
  • Initially the RAW information is included; it will be stripped off offline

S. Benson, “The LHCb Turbo Stream”, T1, Village Center, Thu 10am
Data Processing
Offline Processing Workflow
[Diagram: offline processing workflow – RAW (3 GB, 1x) on Buffer → Reconstruction (24 h) → FULL.DST (5 GB, 1x) on Buffer and Tape → Stripping (6 h) → unmerged (M)DST (O(MB), 1x) on Buffer → Merging (30 min) → merged (M)DST (5 GB, 1x) on Disk; legend distinguishes Application, File Type and Storage Element]
1. The RAW input file is available on the Disk Buffer
2. Reconstruction runs ~ 24 h; 1 input RAW, 1 output FULL.DST to the Disk Buffer
3. Asynchronous migration of the FULL.DST from the Disk Buffer to Tape
4. Stripping (DaVinci) runs on 1 or 2 input files (~ 6 h/file), output several unmerged (M)DST files (one per “stream”) to the Disk Buffer
   1. Input FULL.DST removed from the Disk Buffer asynchronously
5. Rerun the above workflows for one run
6. Once a stream reaches 5 GB of unmerged (M)DSTs (up to O(100) files), Merging runs ~ 15 – 30 mins, output one merged (M)DST file to Disk (see the sketch below)
   1. Input (M)DST files removed from the Disk Buffer asynchronously
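A minimal sketch of the merge trigger described in step 6, with hypothetical helper and file-record names (illustrative only, not the LHCbDIRAC transformation system code):

```python
# Illustrative sketch of the per-stream merge decision in step 6 above.
# run_merge() and the file records are hypothetical stand-ins.

MERGE_THRESHOLD = 5 * 1024**3   # ~5 GB of unmerged (M)DST per stream

def run_merge(stream, lfns):
    """Placeholder for the actual Merging application (~15-30 min)."""
    print(f"merging {len(lfns)} files for stream {stream}")
    return f"{stream}.merged.mdst"   # hypothetical merged output name

def maybe_merge(stream, unmerged_files):
    """Merge a stream once its unmerged (M)DSTs add up to ~5 GB."""
    if sum(f["size"] for f in unmerged_files) < MERGE_THRESHOLD:
        return None   # keep accumulating on the Disk Buffer
    merged = run_merge(stream, [f["lfn"] for f in unmerged_files])
    # step 6.1: the input files are then removed from the buffer asynchronously
    return merged
```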
Offline Data Processing Changes
• What is reconstructed offline is supposed to be the final reconstruction pass
  • Calibration / alignment from the HLT is used also offline
  • No reprocessing (reco) foreseen before the end of Run 2
• Expecting a higher stripping retention because calibration and alignment are done ONLINE
  • Partly damped by moving most physics streams to the M(icro)DST format (Note: MDST O(10 kB/event), DST O(120 kB/event); see the comparison below)
• All files from one “LHCb run” are forced to reside on the same storage
  • A run is the smallest granularity for physics analysis files
  • E.g. reduces the impact in case a disk breaks
• Workflow execution is now also possible on Tier 2 sites
  • Needed because of the increase of collected data
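To illustrate why moving streams to MDST damps the extra retention, a back-of-envelope comparison using the per-event sizes quoted above (the event count is an arbitrary example, not an LHCb figure):

```python
# Rough MDST vs DST disk footprint, using the per-event sizes quoted above.
MDST_BYTES_PER_EVENT = 10e3     # O(10 kB/event)
DST_BYTES_PER_EVENT  = 120e3    # O(120 kB/event)

events = 1e9    # hypothetical stream of 10^9 stripped events
print(f"DST : {events * DST_BYTES_PER_EVENT  / 1e12:.0f} TB")   # ~120 TB
print(f"MDST: {events * MDST_BYTES_PER_EVENT / 1e12:.0f} TB")   # ~10 TB
```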
Workflow Execution Location
[Diagram: the RAW → Reco → FULL.DST → Stripping → unmerged DST → Merge → DST chain running at Tier 0/1 sites (e.g. CNAF, GRIDKA), with a Tier 2 site (e.g. Manchester) running the Reconstruction step remotely; the static MONARC-style attachment is crossed out]
• Data Processing workflow executed by default at Tier 0/1 sites (stays the same as in Run 1)
• For Run 2 we in addition allow
  • a Tier 2 site to participate in a certain Job Type remotely (most useful would be Reconstruction)
  • any Tier 2 to participate at any time in any Job Type (no static 1-to-1 “attaching” anymore; see the sketch below)
  • in principle the system also allows ANY site to participate in any Job Type remotely
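A minimal sketch of the dynamic matching idea: any eligible site, regardless of tier, can pick up a job type at any time. Site records, fields and the selection heuristic are illustrative only, not the LHCbDIRAC configuration or matcher:

```python
# Illustrative only: dynamic site selection per Job Type, instead of a
# static Tier 2 -> Tier 1 attachment. Names and fields are hypothetical.

def pick_site(job_type, sites, running_jobs):
    """Pick the least loaded site that declares support for this job type."""
    eligible = [s for s in sites if job_type in s["job_types"]]
    if not eligible:
        return None
    return min(eligible, key=lambda s: running_jobs.get(s["name"], 0))

sites = [
    {"name": "LCG.CNAF.it",       "tier": 1, "job_types": {"Reco", "Stripping", "Merge"}},
    {"name": "LCG.GRIDKA.de",     "tier": 1, "job_types": {"Reco", "Stripping", "Merge"}},
    {"name": "LCG.Manchester.uk", "tier": 2, "job_types": {"Reco"}},   # remote participation
]
print(pick_site("Reco", sites, running_jobs={"LCG.CNAF.it": 800, "LCG.Manchester.uk": 15}))
```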
All Workflows

                   Run 1      Run 2
Data Processing    T0/1       T0/1/2
Monte Carlo        T2         T2 (can also run on T0/1 sites if resources available)
User analysis      T0/1       T0/1/2D (without input data can also run on T2)
• Very flexible computing model allows almost all workflows to be executed on every tier level / resource type
• Interested in running multicore jobs – especially on VMs – but no pressing need for it
• “Elastic MC” – knows the work per event; at the start of the payload it calculates on the fly how many events to produce before the “end of the queue” (see the sketch below)
• User analysis – least amount of work but highest priority in the central task queue
F. Stagni, “Jobs masonry with elastic Grid Jobs”, T4, B250, Mo 5pm
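A minimal sketch of that calculation, with an assumed safety margin and made-up numbers (not the actual production code):

```python
# Illustrative "Elastic MC" sizing: fit the number of simulated events to the
# time left in the queue. The safety margin and numbers are assumptions.

def events_to_produce(seconds_left, seconds_per_event, safety=0.8):
    """Decide on the fly how many events fit before the end of the queue."""
    usable = seconds_left * safety          # keep a margin for finalisation/upload
    return max(0, int(usable // seconds_per_event))

# e.g. 12 h left on the worker node, ~90 s of CPU per simulated event
print(events_to_produce(12 * 3600, 90))     # -> 384
```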
Compute Resources

Non virtualized                           Virtualized
“Classic Grid” – CE, batch system, …      BOINC – volunteer computing
Non pledged – commercial, HPC, …          Vac – self managed cloud resources
HLT farm – little use during Run 2        Vcycle – interaction via IaaS
• Expect a ramp-up of virtualized infrastructures during Run 2
• All environments served by the same pilot infrastructure talking to one LHCb/DIRAC central task queue
F. Stagni, “Pilots 2.0: DIRAC pilots for all the skies”, T4, B250, Mo 2pm
A. McNab, “Managing virtual machines with Vac and Vcycle”, T7, C210, Mo 5pm
Data Management
Data Storage
• Introduced the concept of Tier 2D(isk) sites
  • i.e. Tier 2 sites with disk areas >= 300 TB
• No more direct processing from “tape caches” foreseen
  • Interact with the disk buffer via FTS3 and process from there (see the sketch below)
  • E.g. pre-staging the “Legacy Run 1 Stripping” data
  • Should lead to a reduction of the disk cache size in front of tape
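As a sketch of what “interact with the disk buffer via FTS3” could look like, here is a staging/replication request using the FTS3 Python “easy” bindings; the endpoint and SURLs are placeholders, and the exact module path and parameters should be checked against the installed fts-rest client:

```python
# Sketch: ask FTS3 to stage a tape-resident FULL.DST and copy it to the disk
# buffer. Endpoint and SURLs are placeholders; signatures may differ between
# fts-rest client versions.
import fts3.rest.client.easy as fts3

FTS_ENDPOINT = "https://fts3.cern.ch:8446"                                   # example
source      = "srm://tape-se.example.org/lhcb/FULL.DST/somefile.full.dst"    # placeholder
destination = "srm://buffer-se.example.org/lhcb/buffer/somefile.full.dst"    # placeholder

context  = fts3.Context(FTS_ENDPOINT)                     # uses the grid proxy by default
transfer = fts3.new_transfer(source, destination)
job      = fts3.new_job([transfer], bring_online=86400)   # request staging from tape first
print(fts3.submit(context, job))                          # prints the FTS job id
```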
Data Storage (ctd)
• Catalogs
  • File Catalog: provides replica information, recently migrated from the LCG File Catalog to the Dirac File Catalog
  • Bookkeeping (unchanged): provides data provenance information
• Data Popularity
  • Data collected since 2012
C. Haen, “Federating LHCb datasets using the Dirac File Catalog”, T3, C209, Mo 4.45pm
M. Hushchyn, “Disk storage mgmt for LHCb on Data Popularity”, T3, C209, Tue 6.15pm
Data Access Operations
• Gaudi Federation
  • In use since last fall
  • LHCb analysis jobs create a local replica catalog for their input data; if the local copy is not available they fall back to the next remote replica
• Data access protocols
  • SRM
    • Shall be in use for tape interactions
    • … and for writing to storage (job output upload, data replication)
  • Xroot
    • LHCb will construct TURLs for input data on the fly, without SRM interaction, for disk-resident data access (see the sketch below)
    • Needs a single and stable xroot endpoint per storage element
  • HTTP/WebDAV
    • All storage sites are equipped with HTTP/WebDAV access
    • Could be used as a second access protocol
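A minimal sketch of what constructing TURLs on the fly amounts to: prefix the LFN with a fixed per-storage-element xroot endpoint instead of asking SRM for it. The storage element names, endpoints and path prefixes below are made up for illustration:

```python
# Illustrative only: build a direct-access xroot TURL from an LFN and a fixed
# per-SE endpoint, with no SRM lookup. Endpoints and prefixes are hypothetical.

XROOT_PREFIX = {
    "CERN-DST-EOS": "root://eoslhcb.example.cern.ch//eos/lhcb/grid/prod",
    "RAL-DST":      "root://xrootd.example-ral.uk//lhcb",
}

def turl(storage_element, lfn):
    """Return a direct-access TURL for a disk-resident LFN on the given SE."""
    return XROOT_PREFIX[storage_element] + lfn

print(turl("RAL-DST", "/lhcb/LHCb/Collision15/BHADRON.MDST/00012345_00000001_1.bhadron.mdst"))
```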
Underlying Services
Services
• CVMFS
  • Building block for LHCb distributed computing, distributes all software and conditions data
• CernVM
  • Vac, Vcycle, BOINC are using CernVM 3
• FTS3
  • Vital for LHCb WAN transfers and tape interaction (pre-staging of input data)
• Several WLCG monitoring services in use
  • SAM 3, dashboards, network monitoring
  • Working on perfSONAR data extraction into LHCbDIRAC
• HTTP Federation
  • Builds on top of HTTP/WebDAV access, provides easy access to the LHCb data namespace (see the sketch below)
  • Development on top of it is ongoing for data consistency checks
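Since every storage site exposes HTTP/WebDAV, browsing the namespace can be done with a plain PROPFIND request. A sketch using the Python requests library; the endpoint, path and proxy location are placeholders, and grid-proxy client authentication is assumed:

```python
# Sketch: list a directory on a WebDAV-enabled storage element.
# URL, path and proxy location are placeholders.
import requests

url   = "https://webdav-se.example.org/lhcb/LHCb/Collision15/"   # placeholder endpoint
proxy = "/tmp/x509up_u1000"                                      # typical grid proxy file

resp = requests.request(
    "PROPFIND", url,
    headers={"Depth": "1"},                     # immediate children only
    cert=proxy,                                 # client cert + key in one file
    verify="/etc/grid-security/certificates",   # CA certificate directory
)
print(resp.status_code)      # 207 Multi-Status on success
print(resp.text[:500])       # XML body with one <response> element per entry
```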
F. Furano, “Seamless access to LHCb HTTP/WebDAV storage”, Mo/Tue, Poster Sess. A
Summary
• LHCb is ready for Run 2
• Several changes introduced for Run 2
  • Calibration/alignment in the HLT farm
  • Closer integration of Tier 2 sites in data processing
  • New Dirac file replica catalog deployed
  • Disk-resident data access via direct xroot
  • CVMFS and FTS3 are key “external” services
Goodie page
http://lhcb-web-dirac.cern.ch/DIRAC/LHCb-Production/undefined/grid/SiteStatus/display?name=LCG.RAL.uk