Data Acquisition Backbone Core DABC
J. Adamczewski, H.G. Essel, N. Kurz, S. Linev
GSI, Darmstadt
The new Facility for Antiproton and Ion Research at GSI
First sketch of the data acquisition concept for CBM (2004): H.G. Essel, FutureDAQ for CBM: On-line event selection, IEEE TNS Vol. 53, No. 3, June 2006, pp. 677-681
CHEP2007, Victoria BC, Canada, 2-7 September 2007. Work supported by EU FP6 project JRA1 FutureDAQ RII3-CT-2004-506078
General purpose DAQ
• Event building network with standard components like InfiniBand
• Scheduled data transfer with ~10 µs accuracy
• Thread synchronization with deterministic latency
Needs realtime patches of the standard Linux kernel 2.6.19:
• RT priorities, nanosleep and pthread condition variables
• PREEMPT_RT (Ingo Molnar), high resolution timer (Thomas Gleixner)
Use cases
• Detector tests
• FE equipment tests
• High speed data transport
• Time distribution
• Switched event building
• Software evaluation
• MBS* event builder
Requirements
• build events over fast networks
• handle triggered or self-triggered front-ends
• process time-stamped data streams
• provide data flow control (to front-ends)
• connect (nearly) any front-ends
• provide interfaces to plug in application code
• connect MBS readout or collector nodes
• be controllable by several controls frameworks
* Multi Branch System: the current standard DAQ at GSI
Motivation for developing DABC
The left Figure shows the hardware components DABC has to deal with. Front-end boards typically carry sampling digitizers, whereas combiner and dispatcher boards have fast data links controlled by FPGAs. Prototypes of these boards are under test. Data rates in the order of 1 GByte/s could be delivered through the PCI express interface.
Frontend hardware example
Logically, such a setup looks as sketched in the right Figure. Depending on the performance requirements (data rates of the front-end channels), one may connect one or more front-ends to one DABC node. From a minimum of one PC with Ethernet up to medium-sized clusters, all kinds of configurations are possible. One may separate input and processing nodes, or combine both tasks on each machine using bi-directional links, depending on CPU requirements.
DABC as data flow engine
If the receiver nodes are busy with other calculations like event building, it is necessary to hold back the senders. This is achieved by the back-pressure mechanism shown in the Figure above. The receivers inform the senders about their queue status after a collection of buffers has been processed, and the senders only send when the receivers have signalled sufficient resources. The back pressure works smoothly.
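The queue-status feedback described above is essentially credit-based flow control. The following is a minimal Python sketch of the idea (my illustration; the actual DABC implementation is C++ and its class names differ): the receiver grants a batch of "credits" (free buffer slots) back to the sender after consuming a collection of buffers, and the sender blocks when no credits are left.

```python
import queue
import threading

CREDIT_BATCH = 4  # receiver reports its queue status after this many buffers

class CreditLink:
    """Sketch of a sender/receiver link with back pressure via credits."""

    def __init__(self, queue_depth=8):
        self.buffers = queue.Queue(maxsize=queue_depth)
        self.credits = threading.Semaphore(queue_depth)  # free receiver slots

    def send(self, buf):
        self.credits.acquire()   # block until the receiver has signalled room
        self.buffers.put(buf)

    def receive_loop(self, consume, n_total):
        consumed = 0
        while consumed < n_total:
            consume(self.buffers.get())
            consumed += 1
            if consumed % CREDIT_BATCH == 0:
                for _ in range(CREDIT_BATCH):  # feed a batch of credits back
                    self.credits.release()

# Usage: a sender thread pushes 16 buffers; the receiver paces it via credits.
received = []
link = CreditLink()
sender = threading.Thread(target=lambda: [link.send(i) for i in range(16)])
sender.start()
link.receive_loop(received.append, 16)
sender.join()
```

Batching the credit feedback (rather than acknowledging every buffer) keeps the control traffic small, which matches the "after a collection of buffers" behaviour described above.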
Data flow control of DABC
[Figure: data flow between two nodes. On the sending node, a producer thread allocates buffers from a memory pool, adds them to an output port queue, and a send thread passes them through network interface A (wait completion / wait feed back). On the receiving node, a receive thread takes buffers from network interface B into a memory pool, an input port queue feeds a consumer thread, and a wait thread returns the feed-back signal that implements the back pressure; freed buffers return to the pools.]
Logical structure of DABC
[Figure: logical structure of DABC. Front-ends (readout, MBS readout, other) feed scheduler nodes; PCs perform data input, sorting, tagging, filtering and analysis, connected through an InfiniBand (IB) builder network; analysis and archive nodes attach to the process network via Gigabit Ethernet (GE).]
http://www-linux.gsi.de/~dabc/dabc/DABC.php
Bandwidth achieved by DABC event building on 4 nodes at GSI (non-synchronized mode, left graph). The result of a plain network test program is included for comparison. With buffers > 32 KB, DABC delivers nearly the bandwidth of the plain test program. Gigabit Ethernet delivers 70 MByte/s. Four data sources per node generate data; event building on the receiver nodes only checks the completeness of the data.
On the IWF cluster the scaling of bandwidth with the number of nodes has been measured from 5 to 23 nodes (right graph). The rates are plotted normalized to 5 nodes (100%). Rates drop to 83% for non-synchronized traffic and large buffers. With 8 KB buffers, traffic stays at nearly 100%. The fourth measurement shows the effect of synchronizing the traffic: performance for large buffers on 23 nodes is even better than non-synchronized traffic on 5 nodes.
All nodes send to all nodes (bi-directional); data rates are given per direction. In synchronized mode the senders send data according to a time schedule, avoiding conflicts at the receivers. Without synchronization, all senders send round robin to all receivers and the network must handle the collisions. Measurements have been performed at GSI (4 dual-Opteron machines, 2.2 GHz, non-synchronized, left graph) and at Forschungszentrum Karlsruhe (IWF) on a cluster of 23 dual dual-core Opterons (2.2 GHz)* with a SilverStorm (QLogic) switch. On the 4 nodes at GSI there is no difference between synchronized and non-synchronized mode.
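A conflict-free time schedule of the kind described above can be sketched with a simple round-robin (Latin-square) pattern; this is my illustration of the principle, not necessarily the exact schedule DABC computes. In time slot t, sender s transmits to receiver (s + t) mod N, so every receiver is targeted by exactly one sender per slot and no collisions occur at the receiving ports.

```python
def schedule(n_nodes):
    """Return a list of time slots; slot[t][s] is the receiver that
    sender s targets in slot t. Each slot is a permutation of all
    receivers, so no receiver is hit by two senders at once."""
    return [[(s + t) % n_nodes for s in range(n_nodes)]
            for t in range(n_nodes)]

# In every slot the targets form a permutation of all receivers,
# and over a full cycle each sender reaches each receiver exactly once.
for slot in schedule(5):
    assert sorted(slot) == list(range(5))
```

After N slots every sender has served every receiver once, which is exactly what all-to-all event building requires per cycle.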
* Many thanks to Frank Schmitz and Ivan Kondov
Performance measurements InfiniBand
Using the real time features of Linux
• Linux kernel 2.6.19 with RT patches (high resolution timer now in 2.6.21).
• With two threads doing condition ping-pong and one "worker" thread, one turn takes 3.2 µs.
• With the high resolution timer and hardware priority, a sleeping thread gets control back after 10 µs.
• Synchronized transfer with microsleep achieves 800 MB/s.
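The condition "ping-pong" used in that latency probe can be sketched as follows (a Python illustration of the measurement idea; the quoted 3.2 µs figure comes from a C/pthreads test on an RT-patched kernel, and Python overhead makes the absolute numbers much larger):

```python
import threading
import time

def pingpong(turns):
    """Two threads alternately signal each other through one condition
    variable; returns the average time per turn in seconds."""
    cond = threading.Condition()
    state = {"turn": 0}

    def player(me):
        with cond:
            for _ in range(turns):
                while state["turn"] != me:
                    cond.wait()          # sleep until it is our turn
                state["turn"] = 1 - me   # hand the turn to the peer
                cond.notify()

    t0 = time.perf_counter()
    a = threading.Thread(target=player, args=(0,))
    b = threading.Thread(target=player, args=(1,))
    a.start(); b.start(); a.join(); b.join()
    return (time.perf_counter() - t0) / turns
```

Each turn forces one wakeup of a sleeping thread, so the per-turn time is dominated by the scheduler's wakeup latency, which is exactly what the RT patches improve.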
Java GUI (DIM client)
[Figure: DABC modules with input and output ports. Locally, two modules exchange buffers through data queues managed by the Memory management; across the network, Device and Transport components (with their transport data queues) move the buffers, coordinated by the Module manager. In a thread with 1-n modules, an ActionEvent manager dispatches data-ready events and commands to the module functions processInput and processCommand; the Transport and data queue live in the Device thread.]
[Figure: frontend hardware example. FE: front-end board with sampling ADCs (4*4) and clock distribution. DC: data combiner board, distributing the clock to the FEs, fed by *4 2.5 GBit/s data links. DD: data dispatcher board, a PCI express card, connected to the combiners by *8 2.5 GBit/s bi-directional (optical) links carrying data and clock. 8-20 dual quad-core PCs perform bi-directional event building over an InfiniBand (IB) switch forming the builder network; a Gigabit Ethernet (GE) switch connects the process network. Legend: FE: Frontend board, DC: Data combiner board, DD: Data dispatcher board, GE: Gigabit Ethernet, IB: InfiniBand.]
The right Figure shows the components of the DABC data flow mechanism. The basic object is a module, which gets access to data buffer queues through input ports, processes the buffers, and delivers them to output ports. The data queues on each node are controlled by a Memory management, which allows modules running on the same node to process buffers without copying; the Transport component in this case propagates only references. If a port is connected to a remote port, the Transport and Device components perform the transfers.
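The zero-copy reference passing described above can be sketched with a small reference-counted buffer pool (a Python illustration only; the real DABC memory pool is a C++ component with a different interface): each local module holds a reference to the same pooled buffer, and the buffer returns to the pool when the last reference is released.

```python
class PoolBuffer:
    """A buffer owned by a MemoryPool, shared between modules by reference."""

    def __init__(self, pool, data):
        self.pool, self.data, self.refs = pool, data, 0

    def addref(self):
        self.refs += 1       # another local module now sees this buffer
        return self

    def release(self):
        self.refs -= 1
        if self.refs == 0:   # last reference gone: back to the pool, no copy
            self.pool.free.append(self)

class MemoryPool:
    def __init__(self, nbuffers, bufsize):
        self.free = [PoolBuffer(self, bytearray(bufsize))
                     for _ in range(nbuffers)]

    def allocate(self):
        buf = self.free.pop()
        buf.refs = 1
        return buf

# Usage: producer fills a buffer, a second module takes a reference.
pool = MemoryPool(2, 16)
buf = pool.allocate()
ref = buf.addref()   # consumer module: same payload, no copy
buf.release()        # producer done; buffer stays alive (refs == 1)
ref.release()        # last consumer done; buffer returns to the pool
```

Because only the reference count changes hands, the payload is never copied between modules on the same node, which is the point of the Memory management component.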
Module components can either run in separate threads to utilize multiple CPUs, or in one thread, which is faster because no module synchronization by mutexes is needed in this case (left Figure). The data processing functions of a module are called by the ActionEvent manager. The data flow is controlled by buffer resources (queues) handled through the Transports. Once a queue buffer is filled, the Transport sends a signal to the ActionEvent manager, which in turn calls the processInput function of the module associated through a port. This function has access to the data buffers through the port (Figure above). Similarly, commands can be sent to the ActionEvent manager, which calls the associated module function processCommand.
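The dispatch loop above can be sketched as follows (names follow the text, but this is a simplified Python illustration, not the real DABC classes): Transports and controllers post events into one queue, and the event manager calls the callback of the module registered for the port.

```python
import queue

class Module:
    """A module whose callbacks are invoked by the event manager."""

    def __init__(self):
        self.seen, self.commands = [], []

    def processInput(self, port, buf):   # called on a data-ready event
        self.seen.append((port, buf))

    def processCommand(self, cmd):       # called on a command event
        self.commands.append(cmd)

class ActionEventManager:
    def __init__(self):
        self.events = queue.Queue()
        self.ports = {}                  # port name -> module

    def connect(self, port, module):
        self.ports[port] = module

    def data_ready(self, port, buf):     # posted by a Transport
        self.events.put(("input", port, buf))

    def command(self, module, cmd):      # posted by a controller
        self.events.put(("command", module, cmd))

    def run(self):
        while not self.events.empty():
            kind, target, payload = self.events.get()
            if kind == "input":
                self.ports[target].processInput(target, payload)
            else:
                target.processCommand(payload)

# Usage: one module serving one input port, driven purely by events.
mgr = ActionEventManager()
mod = Module()
mgr.connect("in0", mod)
mgr.data_ready("in0", b"buffer-1")
mgr.data_ready("in0", b"buffer-2")
mgr.command(mod, "start")
mgr.run()
```

Since every module callback runs from this single loop, modules sharing one thread need no mutexes, which is the speed advantage the text mentions.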
The GUI is built up dynamically from the information delivered by the DIM servers running in the application threads on all nodes.