A demonstration of a Time Multiplexed Trigger for CMS
Rob Frazier, Simon Fayer, Geoff Hall, Christopher Hunt, Greg Iles, Dave Newbold, Andrew Rose
(Imperial College & University of Bristol)
28 September 2011
2
Overview
• How is a Time Multiplexed Trigger different from a Conventional Trigger?
• The demonstration system
• Communication over Ethernet - IPbus
3
Concepts: Time Multiplexed Trigger
Exploring an alternative method of triggering: time-multiplex the incoming data so that the entire calorimeter (~5 Tb/s) can be processed in a single FPGA. Akin to the DAQ event builders, but everything must be finished in ~1 μs.
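The routing idea above can be modelled in a few lines. A minimal Python sketch; the node count and event labels are illustrative, not the real system parameters:

```python
# Illustrative sketch of time multiplexing: each bunch crossing (event)
# is sent, whole, to one processing node in round-robin fashion, so every
# node sees complete events and starts a new calculation each cycle.

def route_events(events, n_nodes):
    """Assign each event (by index) to a node: event i -> node i % n_nodes."""
    buckets = [[] for _ in range(n_nodes)]
    for i, event in enumerate(events):
        buckets[i % n_nodes].append(event)
    return buckets

# With an illustrative multiplexing period of 6 (the talk assumes 14 bx),
# 12 consecutive bunch crossings land on 6 nodes, 2 events each.
buckets = route_events(list(range(12)), 6)
assert buckets[0] == [0, 6]
assert all(len(b) == 2 for b in buckets)
```

In the real system each "event" is one bunch crossing's worth of calorimeter data, so each processor receives a complete event once per multiplexing period.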
4
Time Multiplex
Can time-multiplex in either η (strips) or φ (rings). Both are feasible…
5
[Diagram: time multiplexing. Data for bunch crossings −5 … +2 (past to future) arrive on the η optical links; new data are streamed, in time/φ, into a processing engine and then on to the next stage.]
Intrinsically very efficient use of logic (i.e. new calculation on every clock cycle – no waiting for data)
Essentially a 5Tb/s, low latency (1μs) image processor
6
Demonstrator System Hardware : Part 1
• NAT Europe MCH
  – Provides GbE and IPMI
• Vadatech VT892
  – Dual-star topology with 12 full-size, double-width cards
  – Modified to have vertical airflow
  – MCH slot 1: standard services (GbE, IPMI, etc)
  – MCH slot 2: experimental services (clock distribution, fast control & feedback, DAQ)
See Eric Hazen’s talk: AMC13
7
Demonstrator System Hardware : Part 2
• MINI-T5-R2
  – Double-width AMC card
  – Virtex-5 TX240T
  – Optics
    • IN: 160 Gb/s (32× 5 Gb/s)
    • OUT: 100 Gb/s (20× 5 Gb/s)
  – RAM
    • Dual QDR II, 2× 72 Mb
    • 2× 9 Gb/s on each port (R/W)
  – MMC
    • Atmel AT32UC3A3256
  – AMC
    • 2× Ethernet, 1× SATA
    • 4× FatPipe, 1× Ext FatPipe
Thanks to Jean-Pierre Cachemiche for the MMC code, ported to the Atmel AVR32 microcontroller by Simon Fayer.
Test System
8
[Diagram: test system. MINI-T5 cards each simulate 6 pre-processor cards (PP0–PP5), providing the 24 pre-processors (36 in the full system); this is possible because only 2 main processors are needed (14 in the full system). ECAL & HCAL trigger primitive data would enter at the patch panel in the final system; one fibre from each pre-processor is routed to a MINI-T5 main processor. The clock is distributed via the MCH (standard services slot, plus a custom services slot), and the AMC13 slot provides clock, fast control & feedback, and DAQ.]
9
AMC13
Test system internals
10
[Diagram: test system internals. RAMs #0–#9 feed a time-multiplexing block that simulates the 24 pre-processors and their 240 input links (¼ of the CMS Calorimeter Trigger); 24 links carry the data to the main processor, which aligns the links, runs the algorithm, and sends results to the Global Trigger and to DAQ via the AMC13 or, alternatively, Ethernet.]
11
[Screenshot: all 24 channels aligned. Each frame carries a header, the PP identifier, RAM #0 (3 words), RAM #1 (3 words), …; the CRC is checked then zeroed by firmware, and 8B/10B commas are visible between frames.]
12
Where next?
• The current MINI-T5 can handle ¼ of the CMS Calorimeter Trigger
  – Assumes a time-multiplexing period of 14 bx
  – Requires ×4 the bandwidth to place the entire Calo Trigger into a single FPGA
  – Link speed ×2 (Virtex-7 with 10 Gb/s); number of links ×2 (48–72 Rx)
Virtex-7 (MAXI-T7) optics options:
  – MicroPOD™: 8.2 × 7.8 mm with LGA electrical interface
  – MiniPOD™: 22 × 18.5 mm with 9×9 MegArray™ connector
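The ×4 figure follows from simple arithmetic; a quick check (the 64-link count below is one point in the quoted 48–72 Rx range):

```python
# Current MINI-T5 input: 32 links x 5 Gb/s = 160 Gb/s, enough for 1/4 of
# the Calorimeter Trigger. To fit the whole trigger into one FPGA we need
# 4x the input bandwidth: double the link speed (5 -> 10 Gb/s on Virtex-7)
# and double the number of links.
current = 32 * 5          # Gb/s into one MINI-T5
required = 4 * current    # whole Calo Trigger into a single FPGA
upgraded = 64 * 10        # 2x links at 2x speed (illustrative link count)
assert current == 160
assert upgraded == required == 640
```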
13
IPbus
A method to communicate with cards over Ethernet
14
Communication Requirements
• The primary control path in MicroTCA is Ethernet
  – Which protocol should we use: UDP, TCP or something else?
• Requirements
  – Robust
  – Scalable
  – Reasonable bandwidth (make good use of the 1 Gb/s crate interface)
  – Relatively simple
    • Not too onerous (10% of the design, not 90%)
    • Maintainable over 10 years with different versions of the tools and different people
  – Portable from one card to the next
15
Communication Ideas
• The primary advantage of TCP is not reliability, but throughput
  – Imagine UDP with a retry capability
  – Ethernet has a large latency (packet based, CRCs, etc), i.e. single transactions will be relatively slow
  – TCP allows multiple packets in flight simultaneously and ensures that all packets arrive, in the correct order
    • Ideal, but requires a powerful CPU or complex firmware core to reach 1 Gb/s
    • Separate commands are still slow (i.e. do A, then B, then C)
• An embedded CPU on the card, either within the FPGA or external, can get quite complex (i.e. requires CPU, RAM, Flash, etc)
16
IPbus
• Originally created by Jeremy Mans et al. in 2009/2010*
• The protocol describes the basic transactions needed to control h/w
  – A32/D32 read/write, block transfers, auto address incrementing
  – Simple concatenation of commands: a single packet may contain a write followed by a block read
[Block diagram: PHY → EMAC → UDP or TCP → transaction engine → I2C, GTX and DAQ cores.]
*John Jones implemented something similar
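The command-concatenation idea can be sketched as follows. Note this is NOT the real IPbus wire format (the type codes and field layout are invented for illustration); it only shows the concept of packing several transactions into one UDP payload:

```python
import struct

# Illustrative only: made-up transaction type codes and a made-up 3-word
# layout, sketching how a single-word write followed by a block read can
# be concatenated into one UDP payload.
WRITE, BLOCK_READ = 0x1, 0x2

def write_txn(addr, value):
    """One single-word write transaction (type, address, data)."""
    return struct.pack(">III", WRITE, addr, value)

def block_read_txn(addr, n_words):
    """One block-read request (type, start address, word count)."""
    return struct.pack(">III", BLOCK_READ, addr, n_words)

# One packet: write 0xCAFE to 0x1000, then read 350 words from 0x2000.
payload = write_txn(0x1000, 0xCAFE) + block_read_txn(0x2000, 350)
assert len(payload) == 24  # two 3-word (12-byte) transactions
```

The point is that the hardware's transaction engine can walk such a payload and execute each command in turn, so a single round trip covers several operations.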
IPbus Firmware
• Resource usage
  – Xilinx SP601 demo board
    • Costs £200/$350
    • Small Spartan-6 (XC6SLX16-CSG324)
    • Uses 7% of registers, 18% of LUTs and 25% of BRAM
  – Block RAM usage may increase slightly for the v2.0 protocol
• Additional features
  – Firmware also includes an interface to the IPMI controller
17
Firmware by Jeremy Mans & Dave Newbold
18
The IPbus Suite Overview
• MicroHAL (based on Redwood)
  – C++ hardware access library
  – Highly scalable and fast
  – Hierarchical, with array capability
    • Mimics the firmware structure
  – Automatic segmentation of large commands
    • e.g. a block read is split up
  – Software has full knowledge of the registers
    • Can map onto a database
Redwood Explorer allows access to any register via a simple web interface.
Software by Andy Rose & Christopher Hunt
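The automatic segmentation of large commands amounts to splitting one request into packet-sized chunks. A generic sketch, with an assumed per-packet limit of 256 words (the real limit depends on the Ethernet MTU, not on this number):

```python
def segment_block_read(addr, n_words, max_words_per_packet=256):
    """Split one large block read into per-packet (address, length) chunks,
    advancing the start address as each chunk is consumed."""
    chunks = []
    while n_words > 0:
        length = min(n_words, max_words_per_packet)
        chunks.append((addr, length))
        addr += length
        n_words -= length
    return chunks

# A 600-word read becomes three packets: 256 + 256 + 88 words.
chunks = segment_block_read(0x2000, 600)
assert chunks == [(0x2000, 256), (0x2100, 256), (0x2200, 88)]
```

The client reassembles the replies in order, so the caller sees a single block read regardless of how many packets were used.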
19
The IPbus Suite Overview
• Control Hub
  – Single point of contact with the hardware
    • Allows multiple applications/clients to access a single board
  – Reliable and scalable
  – Built on Erlang
    • Concurrent telecoms programming language
    • Scales across multiple CPU cores
  – Automatic segmentation of large commands (e.g. a block read is split up)
Software by Rob Frazier
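The single-point-of-contact idea can be sketched as a funnel: many client threads, one hub thread that alone talks to the board (simulated here; the queue and fake response are illustrative, not the Control Hub's actual design):

```python
import queue
import threading

# Requests from all clients are funnelled through one queue, so transactions
# from different clients are never interleaved on the wire to the board.
requests = queue.Queue()
results = {}

def hub(n_requests):
    """The only thread allowed to 'talk to the board'."""
    for _ in range(n_requests):
        client_id, addr = requests.get()
        results[(client_id, addr)] = addr + 1  # fake board response

def client(client_id):
    for addr in range(5):
        requests.put((client_id, addr))

hub_thread = threading.Thread(target=hub, args=(15,))
hub_thread.start()
for cid in range(3):          # three concurrent clients, one board
    threading.Thread(target=client, args=(cid,)).start()
hub_thread.join()
assert len(results) == 15     # every request was served exactly once
```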
20
Scalability with Redwood and the Control Hub
21
IPbus Suite Overview
• PyChips
– Python-based user-facing Hardware Access Library
– Simple & easy interface
– Great for very small or single-board projects
– Cross-platform: Windows, Linux, OS X, etc
– No dependencies except the Python interpreter itself
Software by Rob Frazier
22
IPbus Test System
• Substantial test system
  – 3 MicroHAL PCs
  – 1 Control Hub PC
  – 20 IPbus clients
  – Currently 40 Mb/s per card
    • 480 Mb/s per crate
    • Increase this to 100–200 Mb/s with jumbo frames (×6) and firmware improvements
    • Consider moving to TCP for 1 Gb/s
  – Reliability
    • 1 in 189 million UDP packets lost
    • OK for a lab system, but v2.0 of IPbus will have a retry mechanism
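A generic sketch of such a retry mechanism (the sequence numbering, retry count, and lossy-channel model below are assumptions for illustration, not the v2.0 design):

```python
# Sketch of a UDP retry mechanism: tag each request with a sequence number
# and resend until the matching reply arrives or retries are exhausted.

def send_with_retry(channel, seq, payload, max_retries=3):
    for attempt in range(max_retries + 1):
        reply = channel(seq, payload)      # None models a lost packet
        if reply is not None and reply[0] == seq:
            return reply
    raise TimeoutError("no reply after %d retries" % max_retries)

drops = iter([True, True, False])          # drop the first two sends

def lossy_channel(seq, payload):
    """A channel that loses the first two packets, then echoes."""
    if next(drops, False):
        return None
    return (seq, payload.upper())

# The request survives two lost packets and succeeds on the third attempt.
assert send_with_retry(lossy_channel, 7, "read") == (7, "READ")
```

Matching replies to requests by sequence number also makes duplicated or reordered packets harmless, which is the property UDP lacks out of the box.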
23
Links
• IPbus SVN & Wiki hosted on the CACTUS project
  – Website
    • http://projects.hepforge.org/cactus/index.php
  – HepForge repository
    • http://projects.hepforge.org/cactus/trac/browser/trunk
  – MicroHAL
    • The software user manual, instant-start tutorials and developer's guide
    • http://projects.hepforge.org/cactus/trac/browser/trunk/doc/user_manual/Redwood.pdf?rev=head&format=txt
• Firmware chief: contact Dave Newbold: [email protected]
• Software chief: Rob Frazier: [email protected]
• MicroHAL & Redwood: [email protected]
24
Next steps
• Load the RAMs with real events from CMS and pass them through the algorithms under development
• A Virtex-7 design with the necessary 10G links is underway
• Develop and release IPbus v2.0
  – Allows access to IPbus via IPMI
  – Implements a retry mechanism for the UDP transport
• IPbus Software Suite v2.0
  – Code is fairly mature, but improvements and bug fixes will continue as it becomes more widely used
  – Still relatively new; feedback is welcome, e.g. on performance, user interface, etc
25
Extra
26
Just one example - hierarchical design
27
Performance
– Currently
  • 40 Mb/s with the full structure, albeit with multiple MicroHAL instances on the same PC and the Control Hub (i.e. the likely scenario in CMS)
  • Scales linearly with the number of cards (i.e. 480 Mb/s for a crate)
– Target of 100+ Mb/s with UDP (TCP/IP required for 1 Gb/s)
  • Reducing copy stages in the firmware from 5 to 3
  • Moving to jumbo frames, 1.5 kB to 9 kB (×6)
[Plots: time for 100k reads (s) and data rate (Mb/s) versus read depth in 32-bit words (0–350), measured on the PC used to synthesise firmware.]
28
Reliability
• Private network
  – All unnecessary network protocols switched off (spanning tree, etc)
• Sent 5 billion block-read requests
  – 10 billion packets in total; 53 went missing
  – 350 × 32-bit block reads
  – 7 terabytes of IPbus payload data received
  – 19 IPbus clients used in the test
  – Packet loss averages 1 in 189 million UDP packets
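These figures can be cross-checked with a little arithmetic:

```python
# 5 billion block reads of 350 32-bit words give 7 TB of payload, and
# 53 lost packets out of 10 billion is roughly 1 loss per 189 million.
requests = 5_000_000_000
payload_bytes = requests * 350 * 4           # 32-bit words -> bytes
assert payload_bytes == 7_000_000_000_000    # 7 terabytes

loss_interval = 10_000_000_000 // 53
assert abs(loss_interval - 189_000_000) < 1_000_000
```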
[Diagram: IPbus protocol version 1.2 versus version 2.0 (draft 2), which adds the retry mechanism.]