configuring and automating an lhc experiment · hlt farm now even better resource optimization in...

27
Clara Gaspar, October 2017 Configuring and Automating an LHC Experiment For faster and better Physics Output

Upload: others

Post on 18-Oct-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017

Configuring and Automating an LHC Experiment

For faster and better Physics Output

Page 2: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017

❚ LHCb’s New Online Dataflow in Run II❙ Introduced Prompt Calibration and Alignment❙ Aims:

❘ Better Trigger Efficiency❘ Full offline-quality data directly out of the Online System

-> straight to Physics Analyses❘ By Making better use of online farm resources

〡Take advantage of LHC’s duty cycle and shutdown periods

❚ Represented a challenge for the control system

2

Introduction

Page 3: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017

❚ Scope:

3

Experiment Control System

Detector Channels

Front End Electronics

Readout Boards

High Level Trigger Farm

Storage

Trigger

Expe

rimen

t Con

trol

Sys

tem

DAQ

Infrastructure & Detector Control System

External Systems (LHC, Technical Services, Safety, etc.)

TFC

Monitoring

Page 4: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017

Experiment Control System

4

LVDev1

LVDev2

LVDevN

DCS

SubDetNDCS

SubDet2DCS

SubDet1DCS

SubDet1LV

SubDet1TEMP

SubDet1GAS

Com

man

ds

DAQ

SubDetNDAQ

SubDet2DAQ

SubDet1DAQ

SubDet1FEE

SubDet1RO

FEEDev1

FEEDev2

FEEDevN

ControlUnit

DeviceUnit

Legend:

INFR. TFC LHC

ECS

HLT

Stat

us &

Ala

rms

❚ Control Units are logical entities:❙ Behave as a Finite State Machine / Rule Based system:

❘ Capable of Partitioning: Exclude/Include children❘ Can take local decisions: Sequence & Automate Operations or Recover Errors

❚ Device Units ❙ Provide the interface to the device (hardware or software)

❚ Implementation:(JCOP project)❙ WinCC-OA❙ SMI++

❚ Deployment:❙ Runs distributed

over ~160 PCs(Virtual Machines)

Page 5: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017

❚ Selects interesting events for Physics❚ Runs distributed on the HLT Farm

❙ Farm Hardware❘ ~1600 nodes, ~ 50000 cores, heterogeneous❘ Organized in 62 sub-farms, 24 to 32 nodes each

❙ Farm Software❘ Dataflow Pattern

〡Buffer Manager Concept❘ Dataflow Tasks

〡Based on Gaudi Online〡Integrated in Control System

like any other Devicevia FSM states&actions

5

The High Level Trigger

Page 6: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017

Farm NodeFarm Node

HLT Farm Evolution❚ Resource Optimization in 2012 -> Deferred HLT

❙ Idea: Buffer data to disk when HLT busy / Process in inter-fill gap➨ Reconfigure the whole farm at start/end of Physics

6

Standard HLT Deferred HLTFarm Node Farm Node during Physics Farm Node Inter-fill

1 MHz

5 KHz

70 KHz HLTHLT HLT

EvtBuilder EvtBuilder

~20%

Page 7: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017

HLT Farm Now❚ Even better Resource Optimization in 2015 -> Split HLT

❙ Idea: Buffer ALL data to disk after HLT1 / Perform Calibration & Alignment / Run HLT2 permanently in background

➨ Each node has 3 concurrent/independent dataflow slices

7

Farm Node

100%

For DQ only

Local diskBuffer

1 MHz

HLT1

EvtBuilder

100 KHz

Page 8: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017

HLT Farm Now❚ Even better Resource Optimization in 2015 -> Split HLT

❙ Idea: Buffer ALL data to disk after HLT1 / Perform Calibration & Alignment / Run HLT2 permanently in background

➨ Each node has 3 concurrent/independent dataflow slices

8

Farm Node

100%

For DQ only ConditionsDB

Calib&AlignSelectedEvents

Local diskBuffer

1 MHz

HLT1

EvtBuilder

100 KHz

Page 9: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017

HLT Farm Now❚ Even better Resource Optimization in 2015 -> Split HLT

❙ Idea: Buffer ALL data to disk after HLT1 / Perform Calibration & Alignment / Run HLT2 permanently in background

➨ Each node has 3 concurrent/independent dataflow slices

9

Farm Node

100%

For DQ only ConditionsDB

Local diskBuffer

1 MHz

12.5 KHz

HLT1 HLT2

EvtBuilder

100 KHz

Page 10: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017

HLT Farm Now❚ Even better Resource Optimization in 2015 -> Split HLT

❙ Idea: Buffer ALL data to disk after HLT1 / Perform Calibration & Alignment / Run HLT2 permanently in background

➨ Each node has 3 concurrent/independent dataflow slices

10

Farm Node

100%

For DQ only ConditionsDB

Calib&AlignSelectedEvents

Local diskBuffer

1 MHz

12.5 KHz

HLT1 HLT2

EvtBuilder

100 KHz

Page 11: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017

❚ Control Tree❙ From 1 to 3 Independent Trees (actions, states and partitioning)

❚ Dataflow Tasks:❙ In the past were Hardwired in the Control Tree❙ Now Dynamic via a Controller which receives an “Architecture”

11

HLT Farm Control

62 Subfarms

28/32 Nodes per Subfarm

HLT

HLTFNNHLTA02HLTA01

HLTA0101 HLTA0128

EvBuilder Moore Sender

HLTA

HLTFNNHLTA02HLTA01

HLTA0101 HLTA0128

Controller

…Task 1 Task 2 Task n

HLT2

HLTFNNHLTA02HLTA01

HLTA0101 HLTA0128

Controller

…Task 1 Task 2 Task n

HLT

HLTFNNHLTA02HLTA01

HLTA0101 HLTA0128

Controller

…Task 1 Task 2 Task n

Page 12: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017

❚ Defines the dataflow / task layout on each (farm) node

12

“Architecture” Concept

❚ Graphic Editor:❙ Examples:

❘ Old-style HLT❘ HLT1❘ HLT2❘ Calibration,

Alignment, etc.

❚ Graphic Usedat Run Time

Page 13: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017

❚ Defines the dataflow / task layout on each (farm) node

13

“Architecture” Concept

❚ Graphic Editor:❙ Examples:

❘ Old-style HLT❘ HLT1❘ HLT2❘ Calibration,

Alignment, etc.

❚ Graphic Usedat Run Time

Page 14: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017

❚ Defines the dataflow / task layout on each (farm) node

14

“Architecture” Concept

❚ Graphic Editor:❙ Examples:

❘ Old-style HLT❘ HLT1❘ HLT2❘ Calibration,

Alignment, etc.

❚ Graphic Usedat Run Time

Page 15: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017

❚ Defines the dataflow / task layout on each (farm) node

15

“Architecture” Concept

❚ Graphic Editor:❙ Examples:

❘ Old-style HLT❘ HLT1❘ HLT2❘ Calibration,

Alignment, etc.

❚ Graphic Usedat Run Time

Page 16: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017

❚ Defines the dataflow / task layout on each (farm) node

16

“Architecture” Concept

❚ Graphic Editor:❙ Examples:

❘ Old-style HLT❘ HLT1❘ HLT2❘ Calibration,

Alignment, etc.

❚ Graphic Usedat Run Time

Page 17: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017

“Scenario” Concept❚ Defines number of HLT1 & HLT2 tasks for load

balancing (initially hardwired in the Architecture)❙ Depending on the LHC/LHCb State and type of node ❙ Can be applied at RunTime and takes effect within a second

17

Page 18: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017

“Scenario” Concept❚ Defines number of HLT1 & HLT2 tasks for load

balancing (initially hardwired in the Architecture)❙ Depending on the LHC/LHCb State and type of node ❙ Can be applied at RunTime and takes effect within a second

18

Page 19: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017 19

“Activity” Concept❚ Run Control driven by an Activity:

❙ Defines the set of parameters (“recipe”) which is applied by each sub-system.❘ Example: “PHYSICS|LEAD”❘ Contains, for example, the “Architecture”

❚ Three Farm Slices -> Three Run Control(s):

Page 20: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017 20

“Activity” Concept❚ Run Control driven by an Activity:

❙ Defines the set of parameters (“recipe”) which is applied by each sub-system.❘ Example: “PHYSICS|LEAD”❘ Contains, for example, the “Architecture”

❚ Three Farm Slices -> Three Run Control(s):

Page 21: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017 21

“Activity” Concept❚ Run Control driven by an Activity:

❙ Defines the set of parameters (“recipe”) which is applied by each sub-system.❘ Example: “PHYSICS|LEAD”❘ Contains, for example, the “Architecture”

❚ Three Farm Slices -> Three Run Control(s):

Page 22: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017 22

“Scheduler” Concept❚ Big Brother automates operations

based on the LHC State, driven by Scheduler(s):❙ Define sequences of Activities/Scenarios❙ Complete Automation of each Run

Control with very few lines❙ 3 lines to automate main Run Control

Page 23: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017 23

“Scheduler” Concept❚ Big Brother automates operations

based on the LHC State, driven by Scheduler(s):❙ Define sequences of Activities/Scenarios❙ Complete Automation of each Run

Control with very few lines❙ 3 lines to automate main Run Control

Page 24: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017 24

“Scheduler” Concept❚ Big Brother automates operations

based on the LHC State, driven by Scheduler(s):❙ Define sequences of Activities/Scenarios❙ Complete Automation of each Run

Control with very few lines❙ 3 lines to automate main Run Control

Page 25: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017 25

“Scheduler” Concept❚ Big Brother automates operations

based on the LHC State, driven by Scheduler(s):❙ Define sequences of Activities/Scenarios❙ Complete Automation of each Run

Control with very few lines❙ 3 lines to automate main Run Control

Page 26: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017 26

“Scheduler” Concept❚ Big Brother automates operations

based on the LHC State, driven by Scheduler(s):❙ Define sequences of Activities/Scenarios❙ Complete Automation of each Run

Control with very few lines❙ 3 lines to automate main Run Control

Page 27: Configuring and Automating an LHC Experiment · HLT Farm Now Even better Resource Optimization in 2015 -> Split HLT Idea: Buffer ALL data to disk after HLT1 / Perform Calibration

Clara Gaspar, October 2017

❚ LHCb’s New Online Dataflow represented an Operational challenge, but due to:❙ The modular architecture and flexibility of the original design❙ Some new concepts (architectures, scenarios, schedulers)

❚ It is now:❙ Seamlessly Integrated❙ Completely automated

❚ And the farm is kept as busy as possible❙ Also offline simulation

(concurrently) when still free resources.

27

Conclusions