werner wiedenmann university of wisconsin, madison, wisconsin on behalf of the atlas collaboration

17
1 The ATLAS Online High Level Trigger Framework: Experience reusing Offline Software Components in the ATLAS Trigger Werner Wiedenmann University of Wisconsin, Madison, Wisconsin on behalf of the ATLAS Collaboration CHEP 2009 Prague, 21-27 March 2009

Upload: said

Post on 19-Jan-2016

35 views

Category:

Documents


0 download

DESCRIPTION

The ATLAS Online High Level Trigger Framework: Experience reusing Offline Software Components in the ATLAS Trigger. Werner Wiedenmann University of Wisconsin, Madison, Wisconsin on behalf of the ATLAS Collaboration CHEP 2009 Prague, 21-27 March 2009. The ATLAS Trigger. Level-1 - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Werner Wiedenmann University of Wisconsin, Madison, Wisconsin on behalf of the ATLAS Collaboration

1

The ATLAS Online High Level Trigger Framework: Experience reusing Offline

Software Components in the ATLAS Trigger

Werner Wiedenmann

University of Wisconsin, Madison, Wisconsin

on behalf of the ATLAS Collaboration

CHEP 2009

Prague, 21-27 March 2009

Page 2: Werner Wiedenmann University of Wisconsin, Madison, Wisconsin on behalf of the ATLAS Collaboration

2Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

The ATLAS Trigger

Target processing time

<75 (100) kHz

~ 3 kHz

~ 200 Hz

Rate

~ 4 s 2 GHz / 4 core

CPU

~ 40 ms2 GHz / 4 core

CPU

2.5 μs

Level-1

Hardware trigger

High Level Triggers

(HLT)

Level-2 + Event Filter

Software trigger

runs on commodity hardware

Level 2 Trigger

Event Filter

Data Recording

Event Size ~1.6 MB

Page 3: Werner Wiedenmann University of Wisconsin, Madison, Wisconsin on behalf of the ATLAS Collaboration

3Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

HLT Hardware

Final system: 17 L2 racks 62 EF racks

1 rack = 31 PCs (1U)28 racks configurable as L2

or EF (XPU)1 Local File Server per rack

served by Central File Servers (CFS)

27 XPU racks installed ~35% of final system

XPUCPU 2x Harpertown

quad-core @ 2.5 GHz2 GB Memory/coreNetwork booted

Page 4: Werner Wiedenmann University of Wisconsin, Madison, Wisconsin on behalf of the ATLAS Collaboration

4Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

ATLAS High Level Triggers

Level-2 (75 kHz→3 kHz) Partial event reconstruction in Regions of Interest (RoI) provided

by Level 1 event data in RoI are requested from selection algorithms over the network from the readout system

HLT selection software runs in the Level-2 Processing Unit (L2PU). Selection algorithms are run in a worker thread.

Event Filter (3 kHz→200 Hz) Full event reconstruction seeded by Level-2 result with offline-type

algorithms Independent Processing Tasks (PT) run selection software on

Event Filter (EF) farm nodes HLT Event Selection Software is based on the ATLAS Athena

offline Framework HLT framework interfaces the HLT event selection algorithms to

online Driven by run control and data flow software Event loop

managed by data flow software Allows HLT algorithms to run unchanged in the trigger and offline

environment

Page 5: Werner Wiedenmann University of Wisconsin, Madison, Wisconsin on behalf of the ATLAS Collaboration

5Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

HLT software in the Level 2 and Event Filter Data Flow

Data Collection

L2PU/EFPT

Steering ControllerHLT Event Selection Software

HLT Framework

based on ATHENA/GAUDI

Online/Data Collection Framework

Level-2

Event processing in

Worker threads

Event Filter

Event processing in

Processing Tasks

Page 6: Werner Wiedenmann University of Wisconsin, Madison, Wisconsin on behalf of the ATLAS Collaboration

6Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

HLT Event Selection Software

HLTSSW

Steering Monitoring Service

1..*

MetaData Service

1..*ROB DataCollector

DataManager

HLTAlgorithms

Processing Task

Event DataModel

L2PU Application

<<import>>

Event DataModel

Reconstr. Algorithms

<<import>>

StoreGateAthena/Gaudi

<<import>><<import>>

Interface

Dependency

Package

Event Filter

HLT Core Software

Offline Core Software Offline Reconstruction

HLT Algorithms

Level2

HLT Data Flow Software HLT Selection Software Framework ATHENA/GAUDI Reuse offline components Common to Level-2 and EF

Offline algorithms used in EF

Page 7: Werner Wiedenmann University of Wisconsin, Madison, Wisconsin on behalf of the ATLAS Collaboration

7Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

HLT Framework

Data Flow

L2PU/EFPT

Algorithms

ATHENA Environment

athena

Algorithms

OfflineOnline

AthenaSvc AthenaSvc

Online Information

ServiceConditions Data Base

Provides special Athena services when necessary to allow algorithms a transparent running in online / offline. Whenever possible the offline infrastructure services are reused. Athena services with a special “online” backend are used e.g. for

Interfacing to online run control, state machine and the online configuration database

e.g. for the readout buffer configuration

Interfacing the algorithm configuration to the Trigger Database

Holds menu configuration

Interfacing to online conditions data

Forwarding of data requests to data flow in Level 2

Page 8: Werner Wiedenmann University of Wisconsin, Madison, Wisconsin on behalf of the ATLAS Collaboration

8Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

HLT Framework

The HLT framework is responsible For packaging and forwarding of the HLT decisions to online, e.g.

Detailed Level 2 and Event Filter decision record for subsequent reconstruction steps

Event output destination (Stream tag information) Use of event for calibration purposes and building event only partially ….

For error handling and forwarding of error conditions from algorithms to online applications

For forwarding of monitoring histograms and counters to online monitoring infrastructure

Level-2 and Event Filter share many common components from the framework differences mainly due to different data access in RoI or of full event

Page 9: Werner Wiedenmann University of Wisconsin, Madison, Wisconsin on behalf of the ATLAS Collaboration

9Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Development Model

Data Flow

L2PU/EFPT

Steering Controller

Algorithms

ATHENA Environment

athenaMT/PT

Steering Controller

Algorithms

OfflineOnlineOffline support for developers

Emulators athenaMT/PT

Emulate complete L2PU/PT environment with specific boundary conditions for data access

Command line tools no need to setup complex Data Flow systems

Allow testing of complete state machine

Written in Python/C++

HLT algorithm development, testing and optimization is done mainly in offline environment.

Developers can use online emulators athenaMT for Level-2 and athenaPT for Event filter to test their applications Decoupling HLT algorithm development and TDAQ data flow development

Integration

HLT Framework software

Page 10: Werner Wiedenmann University of Wisconsin, Madison, Wisconsin on behalf of the ATLAS Collaboration

10

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

The Atlas HLT Software Project The HLT framework packages form the AtlasHLT

software project setup like other offline software projects

uses many facilities from offline build and testing system is hosted in the offline version control repository Follows offline build and patching strategy Automatic tests …

AtlasHLT project has dependencies on most of the Atlas software projects AtlasHLT has to be last in build sequence

AtlasHLT

LCG

AtlasTrigger

AtlasCore

Gaudi

Atlas….

offline

tdaq

tdaq-common

online

Extends directly Gaudi base

services

Automatic memory leak test for Level 2

Contains HLT

algorithms

Page 11: Werner Wiedenmann University of Wisconsin, Madison, Wisconsin on behalf of the ATLAS Collaboration

11

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Operational Experience

The HLT framework has been successfully tested and used now over many years in e.g. The ATLAS test beam in 2004 and TDAQ commissioning runs In cosmic data taking and detector commissioning periods 1st beam on September 10th: configured in pass through mode

HLT framework used regularly for cosmic data taking and commissioning runs: e.g. continuous cosmic data taking after Sept .12:

216 millions events 453 TB data 400k files

Proved to be very versatile and allowed to react fast on changing detector conditions

Emulators athenaMT/PT used to analyze events offline in DEBUG streams and to eventually reassign them to the analysis streams.

Valuable feedback on functionality, stability and reliability

Page 12: Werner Wiedenmann University of Wisconsin, Madison, Wisconsin on behalf of the ATLAS Collaboration

12

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Operational Experience

System tests with full 1031 menu and simulated data showed that timing for event processing is at specification (2.5 GHz quad core CPUs)

Level-2: ~ 45 ms Event Filter: ~ 260 ms

(Event Filter not exploited by this setup)

7 hour run

L2 input ~ 60 kHz (80% of design rate)

2880 L2PU (70% of final system)

Cron jobs

Level-2

rejected events

Event Filter

rejected events

Page 13: Werner Wiedenmann University of Wisconsin, Madison, Wisconsin on behalf of the ATLAS Collaboration

13

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Event Parallelism and Multi-Core Machines

Multi-core machines are used for the HLT processors Processor development rapidly increasing number of CPU cores Calls for massive parallelism typically inherent to high energy physics

selection and reconstruction as “event parallelism” The original ATLAS HLT design exploits event parallelism in

Level-2: parallel event processing in multiple worker threads (with the possibility to fall back to multiple selection processes)

Event filter: multiple event processing tasks The HLT framework supports both ways of exploiting event parallelism

Special version of Gaudi/Athena framework to create selection algorithm instances for worker threads

Experience with multi threading in Level-2 has shown that it is very difficult to maintain a large code base in an open developer community thread safe and thread efficient.

Use Level-2 processing unit with one worker thread with one Level-2 processing unit per CPU core Allows also in Level-2 to profit from large ATLAS offline code basis (mostly

written and designed in “pre-multi-core era”) Increases software communality between Level-2 and Event Filter and

allows transparent move of Level-2 algorithms to Event Filter Increases however the resource usage (memory, network connections, …)

Page 14: Werner Wiedenmann University of Wisconsin, Madison, Wisconsin on behalf of the ATLAS Collaboration

14

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Experience with Multi Threading in HLT Framework

Initialization Event processing

Worker threads blocked due to common memory pool for STL containers

L2PU with 3 worker threads

Load

Config

Start

Stop

UnConfig

UnLoad

Worker Thread

Event processing in multiple worker threads Code sharing Small context switch times “lightweight processes” Automatic sharing of many hardware resources

Requires careful tuning and proves difficult to keep code thread safe and efficient over release cycles impacts capability to react fast on changing conditions and to deploy new code

Page 15: Werner Wiedenmann University of Wisconsin, Madison, Wisconsin on behalf of the ATLAS Collaboration

15

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Experience with Multiple Processes in HLT Framework

Run on every CPU core one instance of a Level-2 processing unit or Event Filter processing task Easy to do with existing code basis a

priori no code changes required Facilitates reuse of offline components

Observe good scaling with number of cores provides required throughput

Resource requirements are multiplied with number of process instances Memory

~ 1– 1.5 GByte/Application file descriptors, network sockets, number of controlled

applications ~ 7k presently ~ 20k final system

Trigger rate as a function of # applications/node (Hz)

0

50

100

150200

250

300

350

0 2 4 6 8 10 12 14 16 18

processes

rate

[Hz]

Machine with 8

cores in total

Event Filter

memory requirements

Page 16: Werner Wiedenmann University of Wisconsin, Madison, Wisconsin on behalf of the ATLAS Collaboration

16

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Resource Optimization

Typically in HEP applications all processes use a large amount of constant configuration and detector description data.

Common problem for offline reconstruction and trigger

Different prototypes tried see contr. # 244, S. Binet, “Harnessing multicores: strategies and implementations in ATLAS”

Very interesting also “OS based” memory optimization “KSM”

P/T

Core 1

P/T

Core 2

P/T

Core 3

P/T

Core 4

Memory Controller

P/T

Core 1

P/T

Core 2

P/T

Core 3

P/T

Core 4

Memory Controller

Data per

P/T

Data per

P/T

Data per

P/T Data per

P/T

Data per

P/T

Data per

P/T

Data per

P/T

Data per

P/TSharable

data

Memory

Page 17: Werner Wiedenmann University of Wisconsin, Madison, Wisconsin on behalf of the ATLAS Collaboration

17

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Werner Wiedenmann, The ATLAS Online High Level Trigger Framework…, CHEP 2009, Prague

Conclusions

The ATLAS HLT framework based on the ATHENA offline framework has successfully been used in many different data taking periods and provides the required performance

The reuse of offline software components allows to benefit from a big developer community and a large code basis.

The ability to develop, optimize and test trigger algorithms in offline is successfully used to establish the ATLAS HLT selection menu

The optimal use of multi core processors requires further frame work enhancements to reduce resource utilization. Many issues there are shared with offline and can profit from common solutions.

The HLT Framework is ready for the new data taking periods.