configuring and automating an lhc experiment · hlt farm now even better resource optimization in...
TRANSCRIPT
![Page 1: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/1.jpg)
Clara Gaspar, October 2017
Configuring and Automating an LHC Experiment
For faster and better Physics Output
![Page 2: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/2.jpg)
Clara Gaspar, October 2017
❚ LHCb’s New Online Dataflow in Run II❙ Introduced Prompt Calibration and Alignment❙ Aims:
❘ Better Trigger Efficiency❘ Full offline-quality data directly out of the Online System
-> straight to Physics Analyses❘ By Making better use of online farm resources
〡Take advantage of LHC’s duty cycle and shutdown periods
❚ Represented a challenge for the control system
2
Introduction
![Page 3: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/3.jpg)
Clara Gaspar, October 2017
❚ Scope:
3
Experiment Control System
Detector Channels
Front End Electronics
Readout Boards
High Level Trigger Farm
Storage
Trigger
Expe
rimen
t Con
trol
Sys
tem
DAQ
Infrastructure & Detector Control System
External Systems (LHC, Technical Services, Safety, etc.)
TFC
Monitoring
![Page 4: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/4.jpg)
Clara Gaspar, October 2017
Experiment Control System
4
LVDev1
LVDev2
LVDevN
DCS
SubDetNDCS
SubDet2DCS
SubDet1DCS
SubDet1LV
SubDet1TEMP
SubDet1GAS
…
…
Com
man
ds
DAQ
SubDetNDAQ
SubDet2DAQ
SubDet1DAQ
SubDet1FEE
SubDet1RO
FEEDev1
FEEDev2
FEEDevN
ControlUnit
DeviceUnit
…
…
Legend:
INFR. TFC LHC
ECS
HLT
Stat
us &
Ala
rms
❚ Control Units are logical entities:❙ Behave as a Finite State Machine / Rule Based system:
❘ Capable of Partitioning: Exclude/Include children❘ Can take local decisions: Sequence & Automate Operations or Recover Errors
❚ Device Units ❙ Provide the interface to the device (hardware or software)
❚ Implementation:(JCOP project)❙ WinCC-OA❙ SMI++
❚ Deployment:❙ Runs distributed
over ~160 PCs(Virtual Machines)
![Page 5: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/5.jpg)
Clara Gaspar, October 2017
❚ Selects interesting events for Physics❚ Runs distributed on the HLT Farm
❙ Farm Hardware❘ ~1600 nodes, ~ 50000 cores, heterogeneous❘ Organized in 62 sub-farms, 24 to 32 nodes each
❙ Farm Software❘ Dataflow Pattern
〡Buffer Manager Concept❘ Dataflow Tasks
〡Based on Gaudi Online〡Integrated in Control System
like any other Devicevia FSM states&actions
5
The High Level Trigger
![Page 6: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/6.jpg)
Clara Gaspar, October 2017
Farm NodeFarm Node
HLT Farm Evolution❚ Resource Optimization in 2012 -> Deferred HLT
❙ Idea: Buffer data to disk when HLT busy / Process in inter-fill gap➨ Reconfigure the whole farm at start/end of Physics
6
Standard HLT Deferred HLTFarm Node Farm Node during Physics Farm Node Inter-fill
1 MHz
5 KHz
70 KHz HLTHLT HLT
EvtBuilder EvtBuilder
~20%
![Page 7: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/7.jpg)
Clara Gaspar, October 2017
HLT Farm Now❚ Even better Resource Optimization in 2015 -> Split HLT
❙ Idea: Buffer ALL data to disk after HLT1 / Perform Calibration & Alignment / Run HLT2 permanently in background
➨ Each node has 3 concurrent/independent dataflow slices
7
Farm Node
100%
For DQ only
Local diskBuffer
1 MHz
HLT1
EvtBuilder
100 KHz
![Page 8: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/8.jpg)
Clara Gaspar, October 2017
HLT Farm Now❚ Even better Resource Optimization in 2015 -> Split HLT
❙ Idea: Buffer ALL data to disk after HLT1 / Perform Calibration & Alignment / Run HLT2 permanently in background
➨ Each node has 3 concurrent/independent dataflow slices
8
Farm Node
100%
For DQ only ConditionsDB
Calib&AlignSelectedEvents
Local diskBuffer
1 MHz
HLT1
EvtBuilder
100 KHz
![Page 9: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/9.jpg)
Clara Gaspar, October 2017
HLT Farm Now❚ Even better Resource Optimization in 2015 -> Split HLT
❙ Idea: Buffer ALL data to disk after HLT1 / Perform Calibration & Alignment / Run HLT2 permanently in background
➨ Each node has 3 concurrent/independent dataflow slices
9
Farm Node
100%
For DQ only ConditionsDB
Local diskBuffer
1 MHz
12.5 KHz
HLT1 HLT2
EvtBuilder
100 KHz
![Page 10: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/10.jpg)
Clara Gaspar, October 2017
HLT Farm Now❚ Even better Resource Optimization in 2015 -> Split HLT
❙ Idea: Buffer ALL data to disk after HLT1 / Perform Calibration & Alignment / Run HLT2 permanently in background
➨ Each node has 3 concurrent/independent dataflow slices
10
Farm Node
100%
For DQ only ConditionsDB
Calib&AlignSelectedEvents
Local diskBuffer
1 MHz
12.5 KHz
HLT1 HLT2
EvtBuilder
100 KHz
![Page 11: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/11.jpg)
Clara Gaspar, October 2017
❚ Control Tree❙ From 1 to 3 Independent Trees (actions, states and partitioning)
❚ Dataflow Tasks:❙ In the past were Hardwired in the Control Tree❙ Now Dynamic via a Controller which receives an “Architecture”
11
HLT Farm Control
62 Subfarms
28/32 Nodes per Subfarm
HLT
HLTFNNHLTA02HLTA01
HLTA0101 HLTA0128
…
…
…
EvBuilder Moore Sender
HLTA
HLTFNNHLTA02HLTA01
HLTA0101 HLTA0128
…
…
Controller
…Task 1 Task 2 Task n
HLT2
HLTFNNHLTA02HLTA01
HLTA0101 HLTA0128
…
…
Controller
…Task 1 Task 2 Task n
HLT
HLTFNNHLTA02HLTA01
HLTA0101 HLTA0128
…
…
Controller
…Task 1 Task 2 Task n
![Page 12: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/12.jpg)
Clara Gaspar, October 2017
❚ Defines the dataflow / task layout on each (farm) node
12
“Architecture” Concept
❚ Graphic Editor:❙ Examples:
❘ Old-style HLT❘ HLT1❘ HLT2❘ Calibration,
Alignment, etc.
❚ Graphic Usedat Run Time
![Page 13: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/13.jpg)
Clara Gaspar, October 2017
❚ Defines the dataflow / task layout on each (farm) node
13
“Architecture” Concept
❚ Graphic Editor:❙ Examples:
❘ Old-style HLT❘ HLT1❘ HLT2❘ Calibration,
Alignment, etc.
❚ Graphic Usedat Run Time
![Page 14: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/14.jpg)
Clara Gaspar, October 2017
❚ Defines the dataflow / task layout on each (farm) node
14
“Architecture” Concept
❚ Graphic Editor:❙ Examples:
❘ Old-style HLT❘ HLT1❘ HLT2❘ Calibration,
Alignment, etc.
❚ Graphic Usedat Run Time
![Page 15: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/15.jpg)
Clara Gaspar, October 2017
❚ Defines the dataflow / task layout on each (farm) node
15
“Architecture” Concept
❚ Graphic Editor:❙ Examples:
❘ Old-style HLT❘ HLT1❘ HLT2❘ Calibration,
Alignment, etc.
❚ Graphic Usedat Run Time
![Page 16: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/16.jpg)
Clara Gaspar, October 2017
❚ Defines the dataflow / task layout on each (farm) node
16
“Architecture” Concept
❚ Graphic Editor:❙ Examples:
❘ Old-style HLT❘ HLT1❘ HLT2❘ Calibration,
Alignment, etc.
❚ Graphic Usedat Run Time
![Page 17: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/17.jpg)
Clara Gaspar, October 2017
“Scenario” Concept❚ Defines number of HLT1 & HLT2 tasks for load
balancing (initially hardwired in the Architecture)❙ Depending on the LHC/LHCb State and type of node ❙ Can be applied at RunTime and takes effect within a second
17
![Page 18: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/18.jpg)
Clara Gaspar, October 2017
“Scenario” Concept❚ Defines number of HLT1 & HLT2 tasks for load
balancing (initially hardwired in the Architecture)❙ Depending on the LHC/LHCb State and type of node ❙ Can be applied at RunTime and takes effect within a second
18
![Page 19: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/19.jpg)
Clara Gaspar, October 2017 19
“Activity” Concept❚ Run Control driven by an Activity:
❙ Defines the set of parameters (“recipe”) which is applied by each sub-system.❘ Example: “PHYSICS|LEAD”❘ Contains, for example, the “Architecture”
❚ Three Farm Slices -> Three Run Control(s):
![Page 20: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/20.jpg)
Clara Gaspar, October 2017 20
“Activity” Concept❚ Run Control driven by an Activity:
❙ Defines the set of parameters (“recipe”) which is applied by each sub-system.❘ Example: “PHYSICS|LEAD”❘ Contains, for example, the “Architecture”
❚ Three Farm Slices -> Three Run Control(s):
![Page 21: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/21.jpg)
Clara Gaspar, October 2017 21
“Activity” Concept❚ Run Control driven by an Activity:
❙ Defines the set of parameters (“recipe”) which is applied by each sub-system.❘ Example: “PHYSICS|LEAD”❘ Contains, for example, the “Architecture”
❚ Three Farm Slices -> Three Run Control(s):
![Page 22: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/22.jpg)
Clara Gaspar, October 2017 22
“Scheduler” Concept❚ Big Brother automates operations
based on the LHC State, driven by Scheduler(s):❙ Define sequences of Activities/Scenarios❙ Complete Automation of each Run
Control with very few lines❙ 3 lines to automate main Run Control
![Page 23: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/23.jpg)
Clara Gaspar, October 2017 23
“Scheduler” Concept❚ Big Brother automates operations
based on the LHC State, driven by Scheduler(s):❙ Define sequences of Activities/Scenarios❙ Complete Automation of each Run
Control with very few lines❙ 3 lines to automate main Run Control
![Page 24: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/24.jpg)
Clara Gaspar, October 2017 24
“Scheduler” Concept❚ Big Brother automates operations
based on the LHC State, driven by Scheduler(s):❙ Define sequences of Activities/Scenarios❙ Complete Automation of each Run
Control with very few lines❙ 3 lines to automate main Run Control
![Page 25: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/25.jpg)
Clara Gaspar, October 2017 25
“Scheduler” Concept❚ Big Brother automates operations
based on the LHC State, driven by Scheduler(s):❙ Define sequences of Activities/Scenarios❙ Complete Automation of each Run
Control with very few lines❙ 3 lines to automate main Run Control
![Page 26: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/26.jpg)
Clara Gaspar, October 2017 26
“Scheduler” Concept❚ Big Brother automates operations
based on the LHC State, driven by Scheduler(s):❙ Define sequences of Activities/Scenarios❙ Complete Automation of each Run
Control with very few lines❙ 3 lines to automate main Run Control
![Page 27: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration](https://reader033.vdocuments.net/reader033/viewer/2022060606/605b11af267cc806c16b5005/html5/thumbnails/27.jpg)
Clara Gaspar, October 2017
❚ LHCb’s New Online Dataflow represented an Operational challenge, but due to:❙ The modular architecture and flexibility of the original design❙ Some new concepts (architectures, scenarios, schedulers)
❚ It is now:❙ Seamlessly Integrated❙ Completely automated
❚ And the farm is kept as busy as possible❙ Also offline simulation
(concurrently) when still free resources.
27
Conclusions