

NOνA Event Building, Buffering & Filtering From within the DAQ System

M. Fischler, C. Green, J. Kowalkowski, A. Norman, M. Paterno, R. RechenMacher, Fermi National Accelerator Laboratory

Fermilab Scientific Computing Division, Fermi National Accelerator Laboratory, Batavia, Illinois, USA.

The Power of Data Driven Triggering

DAQ Topology

The NOνA DAQ system is designed to support continuous readout of over 368,000 detector channels. NOνA accomplishes this with a hierarchical readout structure that transmits raw data to aggregation nodes, which perform multiple stages of event building and organization before retransmitting the data in real time to the next stage of the DAQ chain. In this manner data moves from Front End digitizer boards (FEBs) to data concentrator modules (DCMs) to Event Building & Buffering nodes and finally into a data logging station.

The flow of data out of the buffer nodes is controlled by a Global Trigger process that issues extraction directives based on time windows identified to be of interest. These windows can represent beam spills, calibration triggers, or data driven trigger decisions.

The Buffer nodes used for DAQ processing are a farm of high performance commodity computers. Each node has a minimum of 16 compute cores and 32 GB of available memory for data processing and buffering. This permits them to run both the event builder process and the full multi-threaded data driven triggering suite, and buffer a minimum of 20 seconds of full, continuous, zero-bias data readout.
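The buffering and extraction behaviour described above can be illustrated with a minimal sketch. It assumes timeslices arrive in increasing time order and are keyed by their start time in nanoseconds; the class name `TimesliceBuffer` and its methods are hypothetical and are not taken from the NOνA DAQ code. Only the 5 ms slice width and the 20 s buffering depth come from the text.

```python
from collections import OrderedDict

SLICE_NS = 5_000_000            # 5 ms timeslice width, from the text
RETENTION_NS = 20_000_000_000   # 20 s minimum buffering depth, from the text


class TimesliceBuffer:
    """Hypothetical in-memory buffer of fixed-width timeslices (illustration only)."""

    def __init__(self, slice_ns=SLICE_NS, retention_ns=RETENTION_NS):
        self.slice_ns = slice_ns
        self.retention_ns = retention_ns
        self._slices = OrderedDict()  # start time (ns) -> payload, oldest first

    def add(self, start_ns, payload):
        """Store a newly built timeslice, then evict slices older than the retention depth."""
        self._slices[start_ns] = payload
        horizon = start_ns - self.retention_ns
        while self._slices and next(iter(self._slices)) < horizon:
            self._slices.popitem(last=False)

    def extract(self, window_start_ns, window_end_ns):
        """Return (start, payload) for every slice overlapping [window_start, window_end)."""
        return [
            (t0, data)
            for t0, data in self._slices.items()
            if t0 < window_end_ns and t0 + self.slice_ns > window_start_ns
        ]


if __name__ == "__main__":
    buf = TimesliceBuffer()
    for i in range(10):
        buf.add(i * SLICE_NS, f"slice{i}")
    # A trigger window from 12 ms to 22 ms overlaps the slices starting at 10, 15 and 20 ms.
    print(buf.extract(12_000_000, 22_000_000))
```

In this sketch an extraction directive is simply a start and end time; a beam spill, a calibration trigger, and a data driven trigger decision would all be served by the same `extract` call.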

The NOνA detector uses a simple XY range stack geometry. This allows very efficient and powerful global tracking algorithms to be applied to identify event topologies that are of interest to physics analyses.

The NOνA online physics program needs the ability to produce data driven triggers (DDTs) based on the identification and global reconstruction characteristics of four broad classes of event topology.

Being able to generate DDTs based on these topologies allows NOνA to:
• Improve detector calibrations using “horizontal” muons
• Verify detector/beam timing with contained neutrino events
• Search for exotic phenomena in non-beam data (magnetic monopole, WIMP annihilation signatures)
• Search for directional cosmic neutrino sources
• Detect nearby supernovae

[Event displays: νµ CC candidate; multi-prong neutrino candidate; cosmic ray]

Realtime Event Filtering

ARTDAQ is a software product, under active development at Fermilab, that provides tools to build programs that combine realtime event building and event filtering as part of an integrated DAQ system.

The NOνA Data Driven Trigger (NOνA-DDT) demonstrator module uses the high speed event feeding and event filtering components of ARTDAQ to provide a real-world proof-of-concept application that meets the NOνA trigger requirements.

NOνA-DDT takes already-built 5 ms wide raw data windows from shared memory and injects them into the art framework, where trigger decision modules are run and a decision message is broadcast to the NOνA Global Trigger System.

The ART framework allows standard NOνA reconstruction concepts to be applied to the online trigger decision process. The common ART framework additionally allows online reconstruction & filtering algorithms to be applied in the offline environment.

This allows the identical code used in the DAQ to be used in offline simulation and testing, for determination of selection efficiencies, reconstruction biases and overall trigger performance.

Neutrino Event topologies of interest observed in the NOνA prototype near detector during the 2011-2012 run. These topologies exhibit the core features for identification with the NOνA-DDT.

The four event topology classes targeted by the data driven triggers are:
• Linear tracks
• Multi Track Vertices
• EM Shower objects
• Correlated Hit Cluster

Data Driven Trigger Buffering

As timeslices of detector data are collected and built they are written into two distinct buffer pools:
• The Global Trigger pool contains timeslices waiting for up to 20 s to be extracted by a global trigger and sent to the datalogger.
• The DDT pool is a separate set of IPV4 shared memory buffers designed to be the input queues for data driven trigger analysis processes.

The DDT shared memory interface is optimized as a one-writer-many-readers design:
• The shared memory buffer has N “large” (fixed size) segments, which the writer rotates through to reduce collisions with reader processes.
• The writing process places guard indicators before and after the detector data. These indicators allow reader processes to tell whether the data segment being read may have been overwritten during read-out.
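The guard-indicator scheme can be sketched as follows. This is a single-process illustration that uses a `bytearray` in place of real shared memory; the segment layout (leading guard, length word, payload, trailing guard), the use of a sequence number as the guard value, and all names are assumptions made for the example, not the NOνA implementation. A real multi-process version would also need to address memory ordering between writer and readers, which this sketch ignores.

```python
import struct

GUARD = struct.Struct("<Q")  # 8-byte sequence number used as the guard word (assumed layout)
LEN = struct.Struct("<I")    # 4-byte payload length (assumed layout)


class GuardedRing:
    """Hypothetical one-writer-many-readers ring of N fixed-size guarded segments."""

    def __init__(self, n_segments, segment_size):
        self.n = n_segments
        self.size = segment_size
        self.stride = GUARD.size + LEN.size + segment_size + GUARD.size
        self.mem = bytearray(self.n * self.stride)  # stand-in for shared memory
        self.next_seq = 1                           # 0 means "never written"

    def _offsets(self, index):
        head = index * self.stride
        length = head + GUARD.size
        data = length + LEN.size
        tail = data + self.size
        return head, length, data, tail

    def write(self, payload):
        """Writer: leading guard first, then the data, then the trailing guard."""
        if len(payload) > self.size:
            raise ValueError("payload larger than segment")
        seq = self.next_seq
        head, length, data, tail = self._offsets(seq % self.n)
        GUARD.pack_into(self.mem, head, seq)
        LEN.pack_into(self.mem, length, len(payload))
        self.mem[data:data + len(payload)] = payload
        GUARD.pack_into(self.mem, tail, seq)
        self.next_seq += 1
        return seq

    def read(self, seq):
        """Reader: trailing guard first, copy the data, then check the leading guard.

        Returns None if the segment was (or may have been) overwritten.
        """
        head, length, data, tail = self._offsets(seq % self.n)
        (tail_seq,) = GUARD.unpack_from(self.mem, tail)
        (n,) = LEN.unpack_from(self.mem, length)
        payload = bytes(self.mem[data:data + n])
        (head_seq,) = GUARD.unpack_from(self.mem, head)
        if head_seq != seq or tail_seq != seq:
            return None
        return payload
```

Because the writer updates the leading guard before touching the data and the trailing guard after it, a reader that sees both guards equal to the sequence number it expected can treat its copy as consistent; any mismatch means the writer has rotated back onto that segment, and the reader should discard the copy.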

DAQ Event Buffering & Building

NOνA-DDT Performance

Because the DAQ system performs a 200 to 1 synchronized burst transmission of data between the data concentrator modules (DCMs) and the Event Builder buffer nodes, a large amount of network & data transmission buffering is required to handle the data rate.

The buffering has been tuned by adjusting:
• the size of data packets to match a single complete time window
• the receive window on the buffer nodes to exploit the buffering capability of the high speed switched fabric between the DCMs and buffer nodes.

[Figure label: Cisco 4948 network switch array]

The NOνA Experiment & DAQ Readout

The NOνA experiment is designed to probe the oscillations of neutrinos and anti-neutrinos to determine their transition probabilities.

These transition rates will allow NOνA to determine the neutrino mass hierarchy and highly constrain the CP violating phase δ. The NOνA measurements will improve our understanding of the mixing angles θ13 and θ23. To perform these measurements NOνA has been designed with a 15 kton far detector, which is located 810 km away from Fermilab, and a small near detector at Fermilab.

The far detector, with over 368,000 channels of synchronized readout electronics, operates in a free-running “trigger-less” readout mode. All of the channels are continuously digitized and the data are accumulated through a multi-tiered readout chain into a large farm of high speed computers. The hit data is buffered for a minimum of 20 s in the farm while it awaits beam timing data from Fermilab. Beam spill information from the NuMI (Neutrinos at the Main Injector) beamline is generated independently at FNAL and transmitted to the Ash River detector site. When the beam information is received, the data blocks that match the beam spill in time are extracted with zero bias and recorded to permanent storage. The rest of the data is examined using real time filtering to determine which portions of the 4.3 GB/s data stream are of interest.
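As a rough check of what these figures imply, buffering 20 s of a 4.3 GB/s stream requires 4.3 GB/s × 20 s = 86 GB across the farm. The sketch below works this out per node under the assumption (not stated in the source) that the load is spread evenly over the buffer nodes; the node counts of 140 and 200 are the range quoted for the buffer farm, and the function names are illustrative only.

```python
STREAM_GB_PER_S = 4.3   # full data stream rate, from the text
BUFFER_SECONDS = 20     # minimum buffering depth, from the text
NODE_MEMORY_GB = 32     # minimum memory per buffer node, from the text


def farm_buffer_gb(rate_gb_per_s=STREAM_GB_PER_S, seconds=BUFFER_SECONDS):
    """Total data held in the farm for the given rate and buffering depth."""
    return rate_gb_per_s * seconds


def per_node_gb(n_nodes):
    """Share of the buffered data per node, assuming an even spread (an assumption)."""
    return farm_buffer_gb() / n_nodes


if __name__ == "__main__":
    print(f"farm total: {farm_buffer_gb():.1f} GB")
    for n in (140, 200):
        share = per_node_gb(n)
        print(f"{n} nodes: {share:.2f} GB per node "
              f"({100 * share / NODE_MEMORY_GB:.1f}% of {NODE_MEMORY_GB} GB)")
```

Under these assumptions each node holds roughly 0.4 to 0.6 GB of buffered data, a small fraction of its 32 GB, which is consistent with the nodes having room to run the event builder and the trigger processes alongside the buffer.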

NOνA Far Detector Stats:
Weight: 15 kT
Height: 5 stories (53x53 ft)
Detection Cells: 368,640
“Largest plastic structure built by man.”

Prototype near detector, in operation at FNAL 2010-Present

[Figure: DAQ data flow diagram. Recoverable labels:
• Front end: FEBs, zero suppressed at 6-8 MeV/cell; "11,160 Front End Boards"; "11520 FEBs (368,640 det. channels)"
• Concentration: 180 Data Concentrator Modules (DCMs); COTS Ethernet, 1 Gb/s
• Buffer farm: 140-200 Buffer Node computers; "200 Buffer Nodes (3200 Compute Cores)"; Data Buffer of 5 ms data blocks; Event builder; Data Slice Pointer Table; Data Time Window Search; Trigger Reception; Shared Memory DDT event stack
• Data Driven Triggers System: Trigger Processors; Data Driven Trigger Decisions
• Global Trigger Processor: Beam Spill Indicator (async from FNAL at 0.5-0.9 Hz); Calib. Pulser (50-91 Hz); Grand Trigger OR; Trigger Broadcast
• Output: Triggered Data Output; Minimum Bias 0.75 GB/s stream; Data Logger]

Read the full paper at: http://nova-docdb.fnal.gov/

Network (switch) buffering optimization as a function of far detector average DCM data rates and acquisition window time.

The NOνA-DDT prototype system implements a real-world trigger algorithm designed to identify tracks in the NOνA detector. The algorithm calculates the pairwise intersections of the Hough trajectories for the detector hits and identifies the resulting track candidate peaks in the Hough space. This is a real-world example of an N² complexity algorithm and is of particular interest due to its ability to identify candidate slow particle tracks (i.e. magnetic monopoles with β > 10⁻⁵) in the far detector.
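A minimal sketch of a pairwise Hough transform is given below. Each hit maps to a curve in (θ, ρ) space, and the curves of two hits intersect at the parameters of the straight line through both hits, so voting once per hit pair is equivalent to histogramming those intersections. The function names, the bin sizes, and the use of plain 2D points as hits are assumptions for illustration; this is not the NOνA-DDT code.

```python
import math
from collections import Counter
from itertools import combinations


def line_through(p, q):
    """Line through p and q in normal form x*cos(theta) + y*sin(theta) = rho.

    theta is the direction of the line's normal, folded into [0, pi).
    """
    (x1, y1), (x2, y2) = p, q
    theta = math.atan2(x2 - x1, -(y2 - y1)) % math.pi
    rho = x1 * math.cos(theta) + y1 * math.sin(theta)
    return theta, rho


def hough_peak(hits, n_theta=180, rho_bin=1.0):
    """Vote once per hit pair and return (theta, rho, votes) for the fullest bin.

    The loop runs over N*(N-1)/2 pairs, which is the source of the N^2 cost.
    """
    votes = Counter()
    for p, q in combinations(hits, 2):
        if p == q:
            continue  # identical hits do not define a line
        theta, rho = line_through(p, q)
        t_bin = min(int(theta / math.pi * n_theta), n_theta - 1)
        r_bin = math.floor(rho / rho_bin + 0.5)
        votes[(t_bin, r_bin)] += 1
    if not votes:
        return None
    (t_bin, r_bin), count = votes.most_common(1)[0]
    return (t_bin + 0.5) * math.pi / n_theta, r_bin * rho_bin, count


if __name__ == "__main__":
    track = [(x, 2 * x + 1) for x in range(6)]  # six hits on the line y = 2x + 1
    noise = [(0, -5), (6, 0), (-3, 4)]          # three unrelated hits
    theta, rho, count = hough_peak(track + noise)
    print(f"theta = {math.degrees(theta):.1f} deg, rho = {rho:.1f}, votes = {count}")
```

Six collinear hits contribute 15 pairs to a single bin, while pairs involving noise hits scatter across the Hough space, which is why the track shows up as a peak.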

The NOνA-DDT-Hough algorithm, without parallelization, was run on the NOνA near detector DAQ trigger cluster. It reached decisions in 98±14 ms on average (compared to the target goal of 60). It achieved this level of performance using only 1/16th of the available CPU resources of each cluster node. This initial test shows that the system, after parallelization and optimizations, will be able to meet and exceed the NOνA trigger requirements.

The ARTDAQ framework overhead for processing of 5 ms time windows of NOνA near detector readout. Overhead includes unpacking and formatting of the DAQHit data objects.

Hough transform execution time for 5 ms time windows of NOνA near detector readout. No parallelization was applied to the algorithm.