
Page 1

The ATLAS Trigger and Data Acquisition System
An historical overview

Fred Wickens, representing ATLAS TDAQ

But with some personal commentary

SLAC – 16 Nov 2010

Page 2

Health Warning

• This talk has been prepared at relatively short notice – insufficient time for me to check that details are totally up-to-date with the appropriate experts.
• Some statements are my own opinions/recollections – they may not be agreed by other ATLAS TDAQ participants.
• Some slides have been stolen from other public talks on ATLAS; any errors of interpretation or detail are entirely mine.
  – If you wish to know more details of the system, see talks given at various recent conferences – especially CHEP2010, e.g. talks by Nicoletta Garelli and Ricardo Goncalo.

Page 3

Outline

The talk will describe the ATLAS Trigger and Data Acquisition system, focusing mainly on the DataFlow and HLT framework: how it evolved, what it is now, its performance and some perspectives for the future.

• Introduction
  – Including a description of the problem as it appeared in 1994, when the ATLAS Technical Proposal was written
• The history of how the system evolved
  – Noting some of the key architectural and implementation decisions
• How it works
• The performance achieved in 2010
• Future challenges
• Summary

Page 4

Introduction

Page 5

The ATLAS Detector

• Large angular coverage: |η| < 4.9; tracking in |η| < 2.5

• Inner detector ~100M Channels

– Pixels, Si-strips and Transition Radiation Tracker

• Calorimeters – O(100K) Channels

– Liquid Argon electromagnetic; Iron-scintillating tile hadronic

• Outer Muon Spectrometer ~ 1M Channels

• Magnets:

– Inner Tracker 2T solenoid

– Muons 4T air-core toroids

Page 6

Physics rates at the LHC

• At the LHC the physics of interest is a small fraction of the total interaction rate
  – b-physics fraction ~ 10^-3
  – t-physics fraction ~ 10^-8
  – Higgs fraction ~ 10^-11
• At 14 TeV and luminosity 10^34 cm^-2 s^-1 (design energy + luminosity) this gives (see the worked rates below):
  – Total interactions 10^9 s^-1
  – b-physics 10^6 s^-1
  – t-physics 10 s^-1
  – Higgs 10^-2 s^-1
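As a quick sanity check, these rates follow from multiplying the total interaction rate by the quoted fractions; the total rate itself is the inelastic cross-section (≈ 100 mb = 10^-25 cm², my assumption here) times the design luminosity:

```latex
\[
\begin{aligned}
R_{\mathrm{tot}} &\approx \sigma_{\mathrm{inel}}\,\mathcal{L}
  \approx 10^{-25}\,\mathrm{cm^{2}} \times 10^{34}\,\mathrm{cm^{-2}\,s^{-1}} = 10^{9}\,\mathrm{s^{-1}} \\
R_{b} &\approx 10^{9}\times 10^{-3} = 10^{6}\,\mathrm{s^{-1}},\quad
R_{t} \approx 10^{9}\times 10^{-8} = 10\,\mathrm{s^{-1}},\quad
R_{H} \approx 10^{9}\times 10^{-11} = 10^{-2}\,\mathrm{s^{-1}}
\end{aligned}
\]
```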

Page 7

The LHC and ATLAS

• The LHC design has
  – Energy 14 TeV
  – Luminosity 10^34 cm^-2 s^-1
  – Bunch separation 25 ns (bunch length ~1 ns)
• This results in
  – ~23 interactions / bunch crossing
    • ~80 charged particles (mainly soft pions) / interaction
    • ~2000 charged particles / bunch crossing
• Produces ~1 PetaByte/s in the detector (a stack of CDs a mile high!)
• The ATLAS Technical Proposal assumed (see the arithmetic below):
  – Event size of ~1 MB
  – Level-1 trigger rate of ~100 kHz
  – Hence data rate into DAQ/HLT ~100 GB/s
  – Acceptable rate to off-line ~100 MB/s
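The quoted bandwidths are simply rate times event size, and the charged multiplicity is the per-interaction multiplicity times the pile-up:

```latex
\[
23 \times 80 \approx 2000\ \text{charged particles / crossing},\qquad
100\,\mathrm{kHz} \times 1\,\mathrm{MB} = 100\,\mathrm{GB/s},\qquad
100\,\mathrm{Hz} \times 1\,\mathrm{MB} = 100\,\mathrm{MB/s}
\]
```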

Page 8

Experiment TDAQ comparisons

Page 9

Time Line

• 1994 Dec – ATLAS Technical Proposal
  – First physics assumed in 2005
• 1998 June – Level-1 Technical Design Report
• 1998 June – DAQ, HLT and DCS Technical Progress Report
• 2000 March – DAQ, HLT and DCS Technical Proposal
• 2003 June – DAQ, HLT and DCS Technical Design Report
  – First physics now assumed in 2007
• 2005 – Detector commissioning, no central DAQ
• 2006 – Detector commissioning with central DAQ
• 2007 – Start of combined cosmic running
• 2008 Sept – A few days of LHC running + more cosmic running
• 2009 Nov – LHC re-start at 900 GeV
• 2010 Mar – LHC starts running at 7 TeV (3.5 on 3.5)

Page 10

Sociology

• The ATLAS TDAQ community comprises a very large number of people and many institutes, with only a few people at most institutes
  – L1 TDR – 85 people from 20 institutes
  – DAQ/HLT/DCS TP – 197 people from 45 institutes
  – DAQ/HLT/DCS TPR – 211 people from 42 institutes
  – DAQ/HLT/DCS TDR – 228 people from 41 institutes
  – Currently:
    • TDAQ author list – 574 people from 105 institutes
    • TDAQ Institutes Board – 73 institutes
• In addition, much of the early development of ATLAS was done in the context of the LHC R&D projects, which formed their own sub-communities, many wedded to particular views, potential solutions and even technologies

Page 11

Funding Issues

• Funding complicated the picture even more
• Most of TDAQ was funded directly by the participating Funding Agencies – not indirectly via the Common Fund
  – Consequently the number of Funding Agencies involved was large:
    • L1 – 7 FA's (3 L1-Calo, 3 L1-Muon, 1 Central L1)
    • DAQ/HLT – originally 15 FA's + ~17% CF
      – Note: subsequently 7 other FA's have also contributed
    • DCS – 2 FA's + ~40% CF
• TDAQ has had to adjust to several major perturbations in funding
  – Loss of a major part of the CF to meet shortfalls elsewhere
  – The initial DAQ/HLT system had to be scaled back by ~50% as money was needed to meet ATLAS cash-flow problems
  – But offset more recently by some additional contributions from new ATLAS collaborators

Page 12

Summary of the Problem

• Thus ATLAS faced a problem:
  – Requiring a system of unprecedented scale and performance
  – A number of candidate technologies existed which might support such a system, but no clear front runner – in terms of performance, cost, longevity and future evolution
  – A long timescale – time for technologies to evolve and solutions to emerge, but need to ensure a solution is in place
  – A large diverse community – many ideas but little agreement
  – A spectrum of approaches – from too abstract to too concrete
• Development of the system was done gradually with various targeted studies:
  – To obtain a better understanding of issues
  – Strike a balance between abstract and concrete
  – Find a good solution, avoid searching for the "best"
  – Form a coherent community with consensus views

Page 13

History of how the System Evolved

Page 14

The ATLAS TP Architecture for DAQ/HLT

• 3-level trigger
• L1 uses selected coarse data from the calorimeters and muon spectrometer
• L1 latency ~2 μs
  – Data held in pipelines in the detector front-ends
• Average L2 latency ~10 ms
• Event building at ~1 kHz
• Data to storage/off-line at ~100 Hz / ~100 MB/s

Page 15

Possible Architecture Implementation of L2

• Uses RoI principle (see later)

• During L2 decision time data stored in “LVL2 Buffers”

• Parallel processing in Local Processors of data within each RoI from different detectors

• Results from different detectors and different RoI’s combined in a Global Processor

Page 16

Possible Overall DAQ/HLT Implementation

• Much of the thinking was still based on custom h/w
  – VMEbus crates
  – DSP's or FPGA's for L2 Local
  – Special processor boards with a micro-kernel for L2 Global
  – Various high-speed interconnects suggested: Fibre Channel, HIPPI, ATM, SCI
• Although it was recognised that commodity h/w might become available for some parts

Page 17

Some key choices in the TP Architecture - 1

• Uniformity from the level of the ROL (Read-Out Link)
  – The read-out of each detector up to the output of the Read-Out Driver (ROD) is the responsibility of the detector group.
    • Although there are some commonalities, there are major differences across the ROD's
• Separation of the ROD's and the "Read-Out Crates" (now Read-Out System – ROS)
  – This simplified decision making, and also greatly simplified stand-alone detector and TDAQ commissioning and debugging.
  – The separation continues to give considerable operational advantages
  – But it has been noted that combining these units could lead to cheaper hardware and more flexible solutions.

Page 18

Some key choices in the TP Architecture - 2

• Separation of the ROD crates of different detectors into a small number (~15) of fixed TTC zones (Timing, Trigger and Control – a real-time high-precision timing system for synchronisation and transport of small data packets)
  – The DAQ (and the EF part of the HLT, but not L1 or L2) can also be partitioned to allow concurrent independent operation of different detectors.
    • This supports parallel independent calibration or debugging runs of different detectors.
• The LVL1 architecture essentially as built (see below)
  – Although there were further developments
    • some technology changes (e.g. fewer ASIC's, more FPGA's)
    • max rate reduced from 100 to 75 kHz
      – to reduce the cost of some detector electronics
• The RoI principle
  – See next slide

Page 19

Regions of Interest

• The Level-1 selection is dominated by local signatures (i.e. within a Region of Interest – RoI)
• Typically there are 1–2 RoI/event
• Can obtain further rate reduction at Level-2 using just the data within the Region of Interest
  – E.g. validate calorimeter data at full granularity
  – If still OK, check the track in the inner detector
• Emphasis on reducing the network b/w and processing power required (a rough estimate of the gain is sketched below)
• Thus reduced the demand on the technology, but gave a stronger coupling between Trigger and DataFlow
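To illustrate the gain, a back-of-the-envelope sketch (my own illustrative numbers where noted, e.g. ~1% of the event per RoI, in line with the "RoI data = 1–2%" quoted on the later architecture slide) compares RoI-based Level-2 requests with shipping every full event to Level-2:

```python
# Rough illustration only: RoI-based L2 data requests vs. full read-out at the L1 rate.
def l2_bandwidth(l1_rate_hz, event_size_mb, n_roi=2, roi_fraction=0.01):
    """Return (RoI-based bandwidth, full-read-out bandwidth) in MB/s,
    assuming each RoI needs roi_fraction of the event data."""
    roi_bw  = l1_rate_hz * event_size_mb * n_roi * roi_fraction
    full_bw = l1_rate_hz * event_size_mb
    return roi_bw, full_bw

roi_bw, full_bw = l2_bandwidth(l1_rate_hz=75_000, event_size_mb=1.5)
print(f"RoI-based: {roi_bw/1000:.1f} GB/s vs full read-out: {full_bw/1000:.0f} GB/s")
# -> a few GB/s instead of ~100 GB/s, consistent with the architecture slide
```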

Page 20

ARCHITECTURE

[Trigger/DAQ overview diagram] Three logical levels and a hierarchical data flow:
• LVL1 – fastest: only calorimeter and muon data, hardwired; ~2 μs; data held in on-detector pipelines
• LVL2 – local: LVL1 refinement + track association; ~10 ms; event fragments buffered in parallel
• LVL3 – full event: "offline"-style analysis in a processor farm; ~1 sec
Input 40 MHz (~1 PB/s equivalent); output to physics storage ~100 Hz / ~100 MB/s

Page 21

Level-1 TDR

[Block diagram: the Calorimeters and Muon Detectors feed the Calorimeter Trigger Processor and Muon Trigger Processor; subtrigger information (e/γ, jet, ET, μ) goes to the Central Trigger Processor; timing, trigger and control distribution to the front-end systems; the Region-of-Interest Unit links Level-1 to Level-2]

• Calorimeter and muon triggers on inclusive signatures
  – muons; em/tau/jet calo clusters; missing and sum ET
• Bunch crossing identified
• Hardware trigger with
  – Programmable thresholds
  – Selection based on multiplicities and thresholds
• Region of Interest information sent to Level-2, e.g.
  – calo clusters (ET > 10 GeV)
  – muon tracks (pT > 6 GeV)

Page 22

Evolution up to the TPR (1998) and TP (2000)

• Wide range of studies in technology and software
• Standardised suites of software starting to emerge for various functions
• Some major changes for L2, with consensus that:
  – It should be implemented using sequential processing of algorithms, mainly in Unix PCs
  – The custom RoI Distributor assumed earlier should be dropped
    • Data requests should pass via the network
• But still far from consensus in some key areas
  – Networks (Ethernet slowly emerged, but various more exotic networks were favoured for a long time)
  – Read-Out Buffers – DAQ groups focussed on functionality, the L2 community focussed on performance. But even here some consensus was emerging on a custom ROBin card

Page 23

Evolution up to the TPR (1998) and TP (2000)

• In the TP
  – yet more convergence
  – the whole DAQ/HLT system is described in a common language (UML)

Page 24

The TDR (2003) or The System Crystallises

• Overall Architecture agreed

Page 25

The TDR (2003)

• Baseline Implementation agreed

Page 26

TDR (2003)

• Gigabit Ethernet to be used for all networks
• Most of the system to use standard rack-mounted Linux PC servers
• Read-Out System based on an industrial PC plus ROBin cards
• Custom h/w limited to:
  – RoIB (VME-based system to build the RoI pointers from different parts of L1 into a single record)
  – ROBin – custom PCI card to buffer event data from the detector RODs
  – ROL – the read-out link used to transport data from a detector ROD to a ROBin (160 MB/s S-Link)
• Some adjustments to rates and assumed event size (1.5 MB)

Page 27

TDR (2003)

• Standard racks and their locations defined
• Assumed the HLT would be implemented with 8 GHz (single-core) dual-socket PCs!

Page 28

ATLAS TDAQ Barrack Rack Layout

[Floor plans of SDX1 Level 1 and Level 2 (selectable by year, 2005–2009): rows of EF and EF/L2 (XPU) racks, plus SFI, SFO, DC and online switch racks, network, DCS, DSS and power-distribution racks. Annotations show floor loading (landing 350 daN/m²), ventilation ducts, water pipes, doors, air flow and rack spacing (63–100 cm, limited by structural beams, ventilation flaps and power-distribution boxes).]

Page 29

How it Works

Page 30

ARCHITECTURE

[Trigger/DAQ dataflow diagram with design rates:]
• LVL1 (hardware calorimeter + muon trigger, ~2.5 μs, data held in the front-end pipelines): 40 MHz → 75 kHz
• Read-out: on LVL1 accept, the RODs send data over the Read-Out Links (120 GB/s aggregate; ~1 PB/s equivalent in the detector) into the Read-Out Buffers (ROBs) of the ROS Read-Out Sub-systems
• LVL2 (RoIB, L2 supervisors L2SV, L2 processors L2P, L2 network L2N): RoI requests fetch only 1–2% of the event data (~2–3 GB/s); decision in ~10 ms; 75 kHz → ~2 kHz
• Event Builder (EB, ~3 GB/s) assembles full events on LVL2 accept
• Event Filter (EF processor farm and network, ~1 sec/event): ~2 kHz → ~200 Hz, ~300 MB/s to storage

Page 31

The Trigger Framework

• The development and testing of event selection code used the off-line software framework (Athena)
• The event selection code was then ported to the on-line "DataCollection" framework
  – The latter provides the interfaces to the on-line services (e.g. run control, configuration, message passing)
• But an increasing number of services provided in off-line code would need to be ported and maintained for on-line use
  – Including services required to handle calibration and alignment data
• The TDR introduced the "PESA Steering Controller" (PSC) to reduce this on-going effort

Page 32

The Trigger Framework

• The PSC is an interface inside the on-line application
  – Provides an "Athena-like" environment (i.e. off-line)
  – Hides the on-line complications from the event selection s/w

Page 33

The Trigger Framework

• The PSC allowed the use of many off-line services (a schematic sketch of the idea follows below)
  – Simplifies trigger code development and testing
  – Provides direct access to the s/w handling calibration/alignment
  – Greater homogeneity between off-line and on-line code
  – Event selection code in L2 and EF sees the same environment – eases moving algorithms between them
  – In principle still allowed multiple algorithm threads in a single application – in practice this proved no longer practical
    • e.g. some offline services used external libraries which were not thread-safe
• Hence L2 moved to the use of many off-line services – but dropped multiple algorithm threads
  – Still need thorough testing of the services introduced
    • to ensure that they meet the online requirements (timing, memory leaks and robustness)
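A schematic sketch of the idea (hypothetical names, not the real PSC or Athena interfaces): an adapter receives the online run-control transitions and online event data, and drives offline-style algorithms through an "Athena-like" interface, so the selection code never sees the online machinery directly.

```python
# Illustrative only: not the actual PSC or Athena API.
class OfflineStyleAlgorithm:
    """The interface the event-selection code is written against (offline-like)."""
    def initialize(self, config): ...
    def execute(self, event): ...      # returns True/False trigger decision
    def finalize(self): ...

class SteeringController:
    """Adapter between online run control / data flow and offline-style algorithms."""
    def __init__(self, algorithms):
        self.algorithms = algorithms

    # --- callbacks driven by the online run-control state machine ---
    def configure(self, config):
        for alg in self.algorithms:
            alg.initialize(config)       # offline-style initialization

    def process_event(self, raw_online_event):
        event = self._decode(raw_online_event)   # hide online raw-data details
        return any(alg.execute(event) for alg in self.algorithms)

    def unconfigure(self):
        for alg in self.algorithms:
            alg.finalize()

    @staticmethod
    def _decode(raw_online_event):
        return raw_online_event          # placeholder for raw-data conversion
```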

Page 34

Event Selection Code

• HLT algorithms:
  – Extract features from sub-detector data
  – Combine features to reconstruct physical objects
    • electron, muon, jet, etc.
  – Combine objects to test the event topology
  – Organised into Trigger Chains
• Trigger Chain (a minimal sketch of the execution logic follows below):
  – Started if its seed has fired
  – Processing of a chain stops as soon as an algorithm is not passed
  – A chain passes if the last Hypothesis in the chain is passed
  – Can be used to seed other chains in the next level
• Trigger Menu
  – Consists of a list of triggers including prescales at each level
    • i.e. L1 Item -> L2 Chain -> EF Chain
  – Can enable/disable a trigger during a run using the prescales
• An event is passed if at least one EF Chain passed
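A minimal sketch of that logic (my own illustrative data structures, not the ATLAS steering code): a chain runs only if its seed fired, stops at the first failed step, and the event is accepted if any EF chain passes after prescales are applied.

```python
import random

def run_chain(chain, event):
    """Run a chain's steps in order; stop at the first step that fails.
    The chain passes only if its final (hypothesis) step passes."""
    for step in chain["steps"]:
        if not step(event):
            return False
    return True

def run_level(chains, seeds, event, prescales):
    """Run all chains whose seed fired at the previous level, applying prescales.
    Returns the set of chain names that passed (the seeds for the next level)."""
    passed = set()
    for chain in chains:
        if chain["seed"] not in seeds:
            continue                       # a chain only starts if its seed fired
        if random.randrange(prescales.get(chain["name"], 1)) != 0:
            continue                       # prescaled away for this event
        if run_chain(chain, event):
            passed.add(chain["name"])
    return passed

def accept_event(event, l1_items, l2_chains, ef_chains, prescales):
    """Event is accepted if at least one EF chain passes."""
    l2_passed = run_level(l2_chains, l1_items, event, prescales)
    ef_passed = run_level(ef_chains, l2_passed, event, prescales)
    return len(ef_passed) > 0
```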

Page 35

Execution of a Trigger Chain

[Flow diagram for an e/γ chain:]
• Level 1: a Region of Interest is found and its position in the EM calorimeter is passed to Level 2 (EM RoI, electromagnetic clusters)
• Level 2, seeded by Level 1 – fast reconstruction algorithms, reconstruction within the RoI: L2 calorimeter → cluster? → L2 tracking → track? → cluster–track match?
• Event Filter, seeded by Level 2 – offline reconstruction algorithms with refined alignment and calibration: EF calorimeter → EF tracking → track? → e/γ reconstruction → e/γ OK?

Page 36

Changes since the TDR

• Multi-core technology – the 8 GHz (and faster) CPU clocks did not appear, so we have to use multi-core CPUs. The major impact to date is how to handle the large increase in the number of applications!
• XPU racks – the initial HLT racks are connected to both the DataCollection network (for L2) and the Back-End network (for EF)
• Fewer but larger, more performant SFOs
• Concept of Luminosity Blocks – defines a short period (1–2 mins) where running is stable – implemented using a tag added in the L1-CTP (see the sketch below)
  – Allows parts of the system to be removed/added within a run
  – Allows a synchronised change of trigger prescales within a run
• Event streaming – separate files for different event types (express, different physics streams, calibration, debug, etc.)
• Partial Event Building added – for greater flexibility in calibrations
• Better scaling through more proxies (e.g. for databases, Information Servers, gathering of histograms)
• Better monitoring and configuration tools
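A minimal sketch of how luminosity-block-synchronised prescale changes can work (my own illustrative names and LB length, not the ATLAS implementation): a change requested mid-run only becomes active at the next LB boundary, so every event in a given LB was taken with a single, well-defined prescale set.

```python
LB_LENGTH_S = 120  # ~1-2 minutes per luminosity block (assumed value)

def lumi_block(time_in_run_s):
    """Luminosity-block number for a given time since the start of the run."""
    return int(time_in_run_s // LB_LENGTH_S)

class PrescaleSchedule:
    def __init__(self, initial_set):
        self.by_lb = {0: initial_set}            # LB at which each prescale set becomes valid

    def request_change(self, current_time_s, new_set):
        # A change requested now only takes effect at the next LB boundary.
        self.by_lb[lumi_block(current_time_s) + 1] = new_set

    def active_set(self, time_in_run_s):
        """Return (LB number, prescale set in force during that LB)."""
        lb = lumi_block(time_in_run_s)
        valid_from = max(k for k in self.by_lb if k <= lb)
        return lb, self.by_lb[valid_from]
```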

Page 37

ATLAS TDAQ System

[System diagram, approximate node counts in brackets; the Control, Configuration and Monitoring network is not shown:]
• Underground (UX15/USA15): the detector (~90M channels, ~98% fully operational in 2010) is read out through the Read-Out Drivers (RODs, VME bus) over ~1600 Read-Out Links into the Read-Out Subsystems (ROSes) [~150]; the Level-1 trigger, Timing Trigger Control (TTC) and RoI Builder [1] sit alongside
• Surface (SDX1): Level-2 Supervisors [~5] steer the Level-2 farm [~500], which sends event-data requests and delete commands to the ROSes and receives the requested event data; the DataFlow Manager and Event Builder SubFarm Inputs (SFIs) [~100] build full events for the Event Filter (EF) farm [~1600]; SubFarm Outputs (SFOs) with local storage send the data to the CERN computer centre for storage
• Plus Control + Configuration, Monitoring, File Servers [70] and Network Switches [4]

Page 38

Other Components/Issues

• No time to include many other important aspects
• In particular:
  – Run Control
  – Monitoring – of the infrastructure and of the data
  – Configuration – of the many, O(10K), applications
  – Error reporting
  – Use of databases
  – Calibration runs
  – Scaling of all of the above

Page 39

Status in 2010 Running

Page 40

TDAQ Farm Status

Component           | Installed | Comments
Online & Monitoring | 100%      | ~60 nodes
ROSes               | 100%      | ~150 nodes
RoIB & L2SVs        | 100%      |
HLT (L2+EF)         | ~50%      | ~800 XPU nodes; ~300 EF nodes
Event Builder       | 100%      | ~60 nodes (exploiting multi-core)
SFO                 | 100%      | Headroom for high instantaneous throughput
Networking          | 100%      | Redundancy deployed in critical areas

27 XPU racks, ~800 XPU nodes. XPU = L2 or EF Processing Unit, which can be configured to run either as L2 or EF on a "run by run" basis. The possibility to move processing power between L2 and EF gives high flexibility to meet the trigger needs.

Page 41

DataFlow rates Achieved

Output from the ROS
• Have exceeded by a good margin the TDR specification of 20 kHz L2 data requests together with EB requests at 3 kHz

Event Building
• Have sustained data rates of well over 4.5 GB/s for a wide range of event sizes (100 kB to 10 MB)
  – An EB test with 1.3 MB events achieved 9 GB/s (see the arithmetic below)

SFO Output
• Have sustained running at well over 1 GB/s
  – cf. 300 MB/s in the TDR
• Output to the Computer Centre runs at up to ~900 MB/s
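For orientation, these throughputs are consistent with rate times event size (the TDR baseline being 3 kHz of event building at 1.5 MB):

```latex
\[
3\,\mathrm{kHz}\times 1.5\,\mathrm{MB} \approx 4.5\,\mathrm{GB/s}\ \text{(TDR baseline)},\qquad
\frac{9\,\mathrm{GB/s}}{1.3\,\mathrm{MB}} \approx 7\,\mathrm{kHz}\ \text{(EB rate reached in the test)}
\]
```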

Page 42

ATLAS Run Efficiency

ATLAS efficiency @ Stable Beams at √s = 7 TeV (not luminosity weighted):
• Run Efficiency 96.5% (green): fraction of time in which ATLAS is recording data while the LHC is delivering stable beams
• Run Efficiency Ready 93% (grey): fraction of time in which ATLAS is recording physics data with the innermost detectors at nominal voltages (safety aspect)

Key functionality for maximising efficiency:
• Data taking starts at the beginning of the LHC fill
• Stop-less removal/recovery: automated removal/recovery of channels which stopped the trigger
• Dynamic resynchronisation: automated procedure to resynchronise channels which lost synchronisation with the LHC clock, without stopping the trigger

752.7 h of stable beams (March 30th – Oct 11th)

Page 43

Trigger Menu and Configuration

Trigger menu:
• Collection of trigger signatures
• ≈200–500 algorithm chains in current menus
• Algorithms re-used in many chains
• Selections dictated by the ATLAS physics programme
• Includes calibration & monitoring chains

Configuration infrastructure:
• Very flexible!
• Prescale factors employed to change the menu while running
  – At a change of Lumi Block
• Adapt to the changing LHC luminosity

Page 44

Trigger Commissioning

• Initial timing-in of L1 with cosmics + single beams
• First collisions: L1 only
• Since June: gradual activation of the HLT

Page 45

Beam spot monitoring in L2

• Example of using the flexibility and spare capacity in the system
• Fit the primary vertex using tracks from Inner Detector data (a toy sketch of the beam-spot estimate follows below)
• Does not use RoIs, so limited to a few kHz – because of the ROS request limit
• Very useful diagnostic during 2010 for LHC tuning
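As a toy illustration (not the ATLAS algorithm), the beam spot can be characterised from the accumulated distribution of fitted primary-vertex positions, e.g. its centre and spread per luminosity block:

```python
import statistics

def beam_spot(vertices):
    """vertices: list of (x, y, z) primary-vertex fits, e.g. in mm.
    Returns the per-axis mean position and spread (RMS) of the luminous region."""
    xs, ys, zs = zip(*vertices)
    centre = tuple(statistics.fmean(axis) for axis in (xs, ys, zs))
    spread = tuple(statistics.pstdev(axis) for axis in (xs, ys, zs))
    return centre, spread

# Example with made-up vertex positions:
centre, spread = beam_spot([(0.1, 1.1, -5.0), (0.2, 1.0, 3.0), (0.0, 1.2, -1.0)])
```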

Page 46

Evolution of LHC Luminosity in 2010

Page 47

Total Integrated Luminosity in 2010

Page 48

Page 49

Summary of 2010 p-p running

• The ATLAS TDAQ system has operated beyond its design requirements
  • meeting the changing needs for trigger commissioning, understanding the detector and accelerator, and delivering physics results
• Robust and flexible system
  • thanks to years of planning, prototyping, commissioning and dedicated work by many people
• The data-recording farm is regularly used well beyond its design specifications
• The system (dataflow and trigger) has successfully handled running with luminosity spanning 5 orders of magnitude
• There is space for the trigger to evolve, and the selections will continue to be optimised for even higher luminosities
• There are enough HLT nodes to meet the full EB rate and the present luminosity
  • If needed, more CPU power will be installed in 2011
• High Run Efficiency for Physics of 93%
• Ready for running in 2011

Page 50

Moved on to 4 weeks of Heavy Ion running

Page 51

Page 52

Future Challenges

Page 53

Future Challenges – Up to Design Luminosity

• Small increase in event size
  – Still dominated by the non-zero-suppressed LAr calorimeter
• Need higher HLT rejection (x ~4)
  – L1 rate increased x ~2, SFO rate reduced by x ~2
• Possible limits
  – HLT CPU power
    • Add up to ~1200 more EF nodes
    • Also increase L2 – reduce the number of XPUs used for EF
  – ROS request rate
    • Some requests to go beyond TDR performance to allow more non-RoI usage in L2 (e.g. ID full-scan and missing ET)
• Possible improvements (with little change in network b/w):
  – Optimise details of the detector mapping into ROSs
  – Further optimisation of the ROS s/w and/or L2 data requests
  – Update the ROS PCs
  – Add the possibility of fetching pre-processed data (e.g. energy sums)

Page 54

Future Challenges – Beyond Design Luminosity

• Improve L1:
  – Use topology information and finer-granularity data
  – Add a L1 Track Trigger
  – i.e. move some algorithms from L2 to L1
• Seems likely that:
  – L1 accept rate still ~100 kHz (detector power)
  – SFO event rate still ~200 Hz (off-line capacity)
• But much bigger events – several MB
• DataFlow b/w and HLT processing will need to increase
  – By at least the factor of the event-size increase, possibly more
  – DataFlow – more full-scans and/or a higher EB rate
  – HLT processing – more complex events and some algorithms "stolen" by L1
• May be able to offset these by using pre-processed data
  – E.g. in the ROD, in the ROS, Fast h/w Track Finder (FTK)
• Technology improvements will help
  – faster networks, faster CPUs, GPUs?, …

Page 55

Some likely Issues

• Scaling with the number of applications – likely to increase even more (currently > 10,000)
  – At least until we find a way to use parallel threads
    • also a problem for off-line, but they may find a solution not compatible with the on-line environment!
  – Need to avoid this causing problems (delays) in: state transitions; configuration; gathering monitoring data (especially histograms)
• Load balancing – heterogeneous farms, events in the tails of the processing-time distribution, L2 vs EF
• Bandwidth balancing – the mapping of detectors to RODs/ROLs/ROSs balanced the b/w load for design luminosity, but should be revisited

Page 56

Outlook

• Studies underway for a DataFlow evolution prototype
  – Combine L2, EB and EF in the same processor
    • but still use RoIs for L2
  – Addresses various issues including load balancing and some aspects of scaling
• For the intermediate term – may need to revise the specification for ROS requests and then investigate how to meet it (retaining the current RODs and ROLs)
• For running in the longer term (>2020) – revisit the whole architecture of the DAQ/HLT

Page 57

Summary

• Have described:
  – The ATLAS TDAQ system and how, over a number of years, the architecture and implementation planning evolved to a system based almost entirely on commodity hardware
  – How it uses the RoI principle to guide data flow across the networks as well as data processing in the HLT
  – The status of the system today and the performance achieved in 2010
    • DataFlow rates at or beyond TDR specifications and >93% efficiency
    • A flexible HLT system running and many algorithms deployed
  – Briefly, the challenges for the future: to reach design luminosity and beyond

Page 58

Backup Slides

Page 59

[Trigger-related detector elements:]
• Pixel: 10x100 μm; 80 M channels. Strips: 80 μm; 6 M channels
• 160000 channels
• Beam Pickup: at ±175 m from ATLAS; trigger on filled bunches; provides the reference timing
• Minbias Trigger Scintillator: 32 sectors on the LAr cryostat; main trigger for initial running; coverage 2.1 to 3.8 in |η|

Page 60

Detector ROD’s (Read-Out Drivers)

• Subdetector-specific designs
• Collects and processes data (no event selection)
• Built as VME modules
  – DSP's, FPGA's or ASIC's for processing
  – E.g. the LAr ROD's use DSPs to calculate
    • Energy, fit quality, …
• Output via the standard Read-Out Link (ROL)
  – 160 MByte/s optical fibre to the Read-Out System

Page 61

ROS – Read-Out System

• ROBin boards
  – 64-bit x 66 MHz PCI card with buffer memory and hardware-assisted memory control
  – FPGA and embedded PPC
  – Receives data from up to 3 ROL's
  – Stores data during the LVL2 decision time
• ROS PC's
  – Standard PC servers
  – Up to 4 ROBin's per ROS PC
  – Receive data requests and send responses via GbE (the request/delete cycle is sketched below)
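A minimal sketch of the request/delete cycle a ROS has to serve (illustrative data structures only, not the real ROS software): fragments are buffered per ROL during the LVL2 decision, L2 asks only for the ROLs inside its RoIs while the Event Builder asks for everything, and delete commands free the buffers.

```python
class ReadOutSystem:
    def __init__(self, rol_ids):
        self.buffers = {rol: {} for rol in rol_ids}   # rol -> {l1_id: fragment}

    def store(self, rol, l1_id, fragment):
        """Filled on every LVL1 accept, one fragment per ROL."""
        self.buffers[rol][l1_id] = fragment

    def data_request(self, l1_id, rols):
        """L2 requests only the ROLs inside an RoI; the Event Builder requests all of them."""
        return {rol: self.buffers[rol].get(l1_id) for rol in rols}

    def delete(self, l1_ids):
        """Issued once LVL2 rejects the event or event building has completed."""
        for buf in self.buffers.values():
            for l1_id in l1_ids:
                buf.pop(l1_id, None)
```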

Page 62

Final ROBIN: ~600 used in the full read-out

[Photograph of the board; the numbers in brackets refer to the labelled components:]
• 3 Read-Out Link channels (1) (200 MByte/s per channel), 64 MByte buffer memory per ROL, electrical Gigabit Ethernet (2), PowerPC processor (466 MHz) (3), 128 MByte program and data memory, Xilinx XC2V2000 FPGA (4), 66 MHz PCI-64 interface (5)
• 12-layer PCB, 220 x 106 mm
• Surface-mounted devices on both sides
• Power consumption ~15 W (operational)

Page 63

MDT ROD’s, LArg ROD’s + ROS’s

Page 64

ATLAS DAQ/HLT Hardware – HLT racks on the right; Online services + Event Builder + SFO on the left

Page 65

TDAQ 2010

[Architecture diagram annotated with the 2010 operating point: event size ~1.5 MB, 150 ns bunch spacing; LVL1 output ~20 kHz (~30 GB/s into the ROSes); LVL2 average latency ~40 ms, output ~3.5 kHz (event building ~5.5 GB/s); Event Filter average latency ~300 ms, output ~350 Hz (~550 MB/s to storage); further annotations of ~1 MHz appear on the diagram. A high-rate test with random triggers reached ~65 kHz.]

Page 66

Minimum Bias Trigger

[Plots: LHC collision rate for nb=2 and nb=4 filling schemes]

• Soft QCD studies
• Provide a control trigger on p-p collisions; discriminate against beam-related backgrounds (using the signal time)
• Minimum Bias Trigger Scintillators (MBTS) installed in each end-cap
  – 32 sectors on the LAr cryostats; main trigger for initial running; coverage 2.1 to 3.8 in |η|
• Example: MBTS_1 – at least 1 hit in the MBTS
• Also triggers on the number of hits in the Inner Detector

Page 67

Muon Trigger

[L1 muon trigger acceptance plot: ~80% acceptance due to support structures etc.; JINST 3 (2008) S08003]

• Low pT: J/ψ, Υ and B-physics
• High pT: H/Z/W/τ ➝ μ, SUSY, exotics
• Level 1: look for coincidence hits in the muon trigger chambers
  – Resistive Plate Chambers (barrel) and Thin Gap Chambers (endcap)
  – pT resolved from the coincidence hits in a look-up table
• Level 2: refine the Level 1 candidate with precision hits from the Muon Drift Tubes (MDT) and combine with an inner detector track
• Event Filter: use offline algorithms and precision; a complementary algorithm does inside-out tracking and muon reconstruction

Page 68

• Stand-alone: muons reconstructed from Muon Spectrometer information only
  – L2 efficiency > 98% w.r.t. L1 for muons with pT > 4 GeV
  – Good agreement with simulation
• Combined: muons reconstructed from a Muon Spectrometer segment combined with an Inner Detector track
  – Sharp turn-on and high efficiency
  – Good agreement with simulation
• An alternative inside-out algorithm is also used in the Event Filter

[Plots: Standalone Level 2 efficiency w.r.t. Level 1, pT > 4 GeV; Combined Event Filter efficiency w.r.t. Level 2, pT > 4 GeV]

Page 69

e/γ Trigger

• pT ≈ 3-20 GeV: b/c/tau decays, SUSY
• pT ≈ 20-100 GeV: W/Z/top/Higgs
• pT > 100 GeV: exotics
• Level 1: local ET maximum in ΔηxΔφ = 0.2x0.2 with a possible isolation cut
• Level 2: fast tracking and calorimeter clustering – use shower-shape variables plus track-cluster matching
• Event Filter: high-precision offline algorithms wrapped for online running

[Plot: W ➝ eν; L1 EM trigger, pT > 5 GeV]

Page 70

Jet Trigger (ATLAS-CONF-2010-065)

• QCD multijet production, top, SUSY, generic BSM searches
• Level 1: look for a local maximum in ET in calorimeter towers of ΔηxΔφ = 0.4x0.4 to 0.8x0.8
• Level 2: simplified cone clustering algorithm (3 iterations max) on calorimeter cells
• Event Filter: anti-kT algorithm on calorimeter cells; currently running in transparent mode (no rejection)
• High Level Trigger running at the EM scale plus jet-energy-scale corrections at the moment

Note in preparation

Page 71

Missing ET Trigger

• SUSY, Higgs
• Level 1: ET^miss and sum ET calculated from all calorimeter towers
• Level 2: only muon corrections possible
• Event Filter: re-calculated from calorimeter cells and reconstructed muons

[Plots: Level 1, 5 GeV threshold; Level 1, 20 GeV threshold]

Page 72

Plans for Heavy Ion Run

• Collect ≈3μb-1 of Pb-Pb collisions at 2.76 TeV/nucleon during 4 weeks in November

• Take advantage of ATLAS capabilities– Good angular coverage– Good particle ID– Forward scintillators and Zero Degree

Calorimeters

• Trigger rate ≈ 140 Hz– σPb+Pb≈ 7.6 barn– L ≈ 1x1025cm-2s-1 (1% of design)– I.e. around 100Hz of collisions

• Use modified L1 menu only– Use as little High Level Trigger as

possible– Avoid tracking if possible (1000s of

tracks for central collisions)
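The quoted collision rate is just cross-section times luminosity:

```latex
\[
R = \sigma_{\mathrm{Pb+Pb}}\,\mathcal{L}
  \approx 7.6\times10^{-24}\,\mathrm{cm^{2}} \times 1\times10^{25}\,\mathrm{cm^{-2}\,s^{-1}}
  \approx 76\,\mathrm{Hz} \;\sim\; 100\,\mathrm{Hz}
\]
```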

Trigger       | Triggers, thresholds, etc.
Minimum bias  | Hits in forward scintillators, zero-degree calorimeter, luminosity detectors etc. for wide eta coverage. Primary triggers for the heavy ion run
Σ ET          | 50, 500, 1000, 2000 GeV. Centrality trigger and centrality veto to enhance peripheral collisions
Jets          | Single and di-jet triggers, scalar sum of jet energy for centrality veto
EM            | Single photon and electron triggers
Muons         | Single muon and di-muon triggers
Tau           | Single tau and di-tau triggers