CDF Pulsar Board
PULSAR: PULSer And Recorder
• People involved in this project
• A few words about CDF trigger (Level 2): why build this board?
• Pulsar design:
(1) as an L2 teststand tool; (2) as a prototype for a possible L2 upgrade; (3) as a general purpose tool
• Current status
• Possible applications for Pulsar (within and outside CDF)
Ted Liu (Fermilab) , June 3rd, 2002, lunch talk at UC
PULSAR: PULSer And Recorder is designed primarily as a teststand tool for the CDF Level 2 trigger system
• from UC: Mircea Bogdan, Harold Sanders, Henry Frisch, Mel Shochet
• from Fermilab: TL, Natalia Kuznetsova, Sakari Pitkanen (visiting student from Finland)
• from UPenn: Peter Wittich
People Involved (part time or full time)
Since building something to test something else seems somewhat out of character at CDF nowadays, people who are currently involved in this project are mostly volunteers.
•Fully pipelined DAQ+Trigger:
•“Deadtimeless”:
Dead-time only incurred if all Level 2 or DAQ buffers are full
A few words about CDF trigger…
Front-end & Trigger Electronics Architecture
Global L2 decision
Today’s topic
Reces x 4L1
XTRP
SVT
CLIST
MUON
ISO
AlphasX 4
One SVT Cable each
6 fiber (hotlink)1 LVDS cable 16 fibers (hotlink) 7 fibers (taxi)
12 fibers(Taxi)3 L1 cable
Magic bus (128 bits)
L2 crate inputs
Global L2 decision crate:
ROC
TRACER
Track datacluster data
64bits L1MEtSumEt
• After L1A, data arrive at each interface board with a different latency
• Most boards (L1, XTRP, CLIST, IsoList and SVT) then request the bus and send their data to the alpha over the magic bus
• The alpha processes the data; if it needs muon or Reces data (for certain events), it gets the data via programmed I/O over the magic bus from the muon and Reces boards
• Once the decision is made, the alpha board handshakes with the Trigger Supervisor (via the L2-TS cable)
L1 trk svt clist Iso reces mu SumEt,MEtTracks
Jetselectronsphotonsmuons
Taus Tags
MetSumEt
What does Level 2 really do?
• Create all the trigger objects needed, then
• Count objects above thresholds, or
• Cut on kinematic quantities
Reces x 4L1
XTRP
SVT
CLIST
ISO
MUON
alphas
Magic bus96 bits/evt
1.5Kbits/evt
21 bits/trk, 117 bits/trk, 40 bits/cls, 145 bits/cls, 512 flag bits (or 10.5 Kb/evt)
Assume a typical event: 4 tracks and 6 clusters (x2 safety margin)
Total per event: 96 + 21x4 + 117x4 + 40x6 + 145x6 + 512 (10.5K) + (1.5K)
= 2306 bits (300 bytes), or ~14 Kbits (1.75 Kbytes) if all data are sent
50 kHz L1A with 10% deadtime: need 10 us transfer time, 10 us processing time
Peak (10 us): 30 MB/s (no muon and Reces data), or 174 MB/s if all data are sent
Magic bus: sends 128 bits (16 bytes) every 200 ns: 80 MB/s. This is the maximum bandwidth for the magic bus, assuming a single burst from one interface board. Arbitration overhead, programmed I/O speed, etc. must be added to get the actual performance.
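The arithmetic on this slide can be reproduced with a short sketch. The constant and function names below are illustrative only, and "10.5K" and "1.5K" are read as 10,500 and 1,500 bits, which is an assumption. A plain sum of the quoted terms gives 2270 bits for the typical event, slightly below the 2306 bits quoted on the slide; the source does not explain the difference.

```python
# Per-path data sizes in bits, as quoted on the slide.
L1_BITS = 96
XTRP_BITS_PER_TRACK = 21
SVT_BITS_PER_TRACK = 117
CLIST_BITS_PER_CLUSTER = 40
ISO_BITS_PER_CLUSTER = 145
MUON_FLAG_BITS = 512
MUON_FULL_BITS = 10_500  # "10.5K" read as 10,500 bits (assumption)
RECES_BITS = 1_500       # "1.5K" read as 1,500 bits (assumption)


def event_bits(n_tracks, n_clusters, send_all=False):
    """Bits per event; send_all swaps the muon flag bits for full muon + Reces data."""
    bits = L1_BITS
    bits += n_tracks * (XTRP_BITS_PER_TRACK + SVT_BITS_PER_TRACK)
    bits += n_clusters * (CLIST_BITS_PER_CLUSTER + ISO_BITS_PER_CLUSTER)
    if send_all:
        bits += MUON_FULL_BITS + RECES_BITS
    else:
        bits += MUON_FLAG_BITS
    return bits


def rate_mb_per_s(n_bits, time_ns):
    """Transfer rate in MB/s (1 MB = 10**6 bytes) for n_bits sent in time_ns."""
    return (n_bits / 8) * 1000 / time_ns


typical = event_bits(4, 6)                    # 4 tracks, 6 clusters
everything = event_bits(4, 6, send_all=True)  # plus full muon and Reces data

print("typical event bits:", typical)
print("all data bits:", everything)
print("peak MB/s, typical:", rate_mb_per_s(typical, 10_000))
print("peak MB/s, all data:", rate_mb_per_s(everything, 10_000))
print("magic bus MB/s:", rate_mb_per_s(128, 200))
```

With these inputs the peak rates come out close to the slide's round numbers (about 28 MB/s and about 172 MB/s against a magic bus ceiling of 80 MB/s), which is the point of the comparison: sending everything would exceed the bus bandwidth.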
A quick look at bandwidth:
largest datasize
When I first joined CDF (almost 2 years ago), I was told by some L2 experts that everything was already done and the only thing left to do was to write professional code for Alpha…
So I bought this book… and so far never had a chance to read it. And the company ceased to exist…
Reces x 4L1
XTRP
SVT
CLIST
ISO
MUON
AlphasX 4
One SVT Cable each
6 fiber (hotlink)1 LVDS cable 7 fibers (Taxi) 16 fibers (hotlink)
12 fibers(Taxi)L1 cables
Magic bus
L2 crate inputs
Asked 3 questions a while ago (the answers are in the Pulsar design):
(1) Can one design a universal test (pulser) board? (testability)
(2) Can one design a universal interface board? (uniformity)
(3) Can the universal interface board also be a pre-processor?
So far, ~20 man-years have been spent on commissioning this crate in the past two years… The main difficulty is that the system was not designed for testability or simplicity/uniformity.
Each board is designed differently by different groups.
Main Injector(new)
Tevatron
DØCDF
Chicago↓
p source
Booster
aSVT
XTRP
CLIST
ISO
L2 decision crate
L1
RecesHP scope
LogicalAnalyzer
HP scope
The way we have been debugging the Level 2 decision crate:
VERY often need to use CDF + Tevatron as a pulser, to run the entire CDF DAQ system to debug L2.
The idea is to build test stand tools to “replace” CDF and Tevatron, to make life MUCH easier for EVERYONE.
L2 inputs
TS
Why diagnostic tools?
aSVT
XTRP
CLIST
ISO
L2 decision crate
L1
Reces
Pulsar
Hotlink IO
Taxi IO
SVT/XTRPL1TS
CDFctrl
VME
MUON
Pulsar is designed to have all the data interfaces that the Level 2 decision crate has. It is a data source for all trigger inputs to the Level 2 decision crate, and it can also be used to record data from upstream.
PulsAR: Pulser And Recorder
Basic hardware requirement: have all hardware interfaces
Main difficulty for Pulsar design: each subsystem data path was implemented differently, so designing a universal tester board is not all that easy… The only way is to use mezzanine cards…
TS
Fixed or variabledata length?
SVT XTRP L1 CLIST ISO Muon
Data with Buffer#?
Incoming dataClock rate
data size range
Latency range*
EOE with data? (or from separate path?)
B0 marker?
Data gap withinone event?
Reces
30Mhz 7.6Mhz 7.6Mhz 20Mhz 12Mhz 30Mhzcdfclk x 4
Interface hardware SVT cable SVT cable L1 cable Hotlink+fiber Taxi+fiber Hotlink+fiber
7.6 Mhzcdfclk
Taxi+fiber
96 bits/evt
fixed fixed
noyes yes - no no yes -
yes
Level 2 trigger input data paths were implemented differently
Flow control ?
150bits/trk 21 bits/trk 46bits/clu 145bits/clu 11Kbits/evt 1.5Kb/evt
~ 6 us
yesyes
~132 ns~1us - 10us
variable
yes
BC#
yes
not used
~10-100us
variable
yes
BC#
yes
Not used
yes
no
no
no
~1-20us
variable
yes
no
no
no
variable
no
fixed
no no
no
nono
~1-5 us~few us
* Latency range also depends on L1A history …
noyes
aSVT
XTRP
CLIST
ISO
L2 decision crate
L1
RecesHP scope
LogicalAnalyzer
HP scope
What are the test tools ?
L2 inputs
MMB
PULSAR
PULSAR
PULSAR
PULSAR
PULSAR
MMBMagic
MysteryBoard
SVTformat
Magicbus
CDFctrl
Pulsar
Hotlink IO
Taxi IO
SVT/XTRPL1TS
CDFctrl
VME
MUON
L2 test crate
TS
Goals of Pulsar project
(1) Speed up the debugging process for the L2 system: Alpha, Mbus, all interface boards
(2) test the robustness and rate capability of the L2 system without the need of beam or high luminosity beam;
(3) Long term maintenance of the system …
(4) If needed, Pulsar can be used as a prototype(candidate) for possible L2 upgrade path
Overall goal: make sure we have a robust L2 system with great performance well into Run2b …
SVT
Control/Merger
TS
L1
Data IOSVT
Mezz cardconnectors
Pulsar design (general purpose tool)
9U VME(VME FPGA not shown)
P1
P2 userctrl
3 Altera APEX 20K400_652 (BGA) FPGAs
P3
SLINKsignal lines
Data IO
spare lines
Each FPGA has 502 user IO pins, all of them are used.
Control FPGA
L1 input L1 output
L1 cableconnectors
TRK
FIFO
TRK
SVT/XTRP input connector
TSI
TSI
SVT/XTRP output connector
Some details on SVT/XTRP, L1 and TSI interfaces
fromalpha
toTSI
external FIFO, follows SVT implementation
L1 input and output share the same IO pins, set by one enable bit
MezzanineCard
PULSAR design
IO
Ctrl
IO
Hotlink/Taxi/S-LINK…
TS
L1
SVTSVT
L1
Front-panel(double width)
component side
The mezzanine card connectors can be used either for user I/O or SLINK cards
To/from
Pulsar or a PC
SLINK
VME
LVDS connectorsMezzanine cards
Three FPGAs: Altera APEX20K400
S-LINKLDC or LSC
S-LINKLDC or LSC
optional user defined signal connection from P2/P3
VME interface to all three FPGAs:
• based on the UC VMEchip used on other UC boards
• the interface from the UC VMEchip to the three main FPGAs:
VMEdata(31:0), VMEaddr(23:2), vmeAS, vmeDS and vmeWrite
CDF control (P2) signals to all three FPGAs:
• CDFCLK, BC, B0, L1A/R, L2B0/B1, Halt, Recover, Run, L2A/R, L2BD0/BD1, CDF_error, GLIVE, STOP, RL(2:0) … essentially all CDF control signals are visible to Pulsar
Pulsar inter-communication control lines (P2 user defined pins):
• follows SVT implementation (can communicate with any SVT board)
• A1: Pulsar_init
• A2: Pulsar_error
• A3: Pulsar_freeze
• A4: Pulsar_lostlock
• A5: Pulsar_spare
Any Pulsar board can drive and listen to these 5 lines from P2.
Pulsar interface to/from P1 and P2 backplane
Custom Mezzanine cards (follow CMC standard)
• Hotlink: Tx and Rx (CLIST, Muon data paths)
• Taxi: Tx and Rx (Iso, Reces data paths)
Altera EP1K30_144 FPGA
CMC Connectors (J1 and J3)
Hotlink or Taxi Tx/Rx chips
Hotlink Tx/Rx: CY7B923JC/933JC
Taxi Tx/Rx: AM7968/7969-175JC
Hotlink Optical Tx/Rx: HFBR-1119T/2119T
Taxi Optical Tx/Rx: HFBR-1414T/2416T
CMC: Common Mezzanine Card standard
hotlink mezzanine card prototype
The transition module is very simple (just a few SLINK CMC connectors).
It uses a P2 type connector for P3. We will only use P3 for SLINK and spare (user defined) signals (it doesn’t have to be a P2 type connector).
Loaded with SLINK mezzanine cards
Can simply use the CDF CAL backplane.
CERN sent us two transition modules
CDF Pulsar board
ATLAS will use one standard link (S-LINK) to move the data from all Read-out Drivers (RODs) from the subdetectors from the experiment 100 meter down under the ground to the Read-out Buffers (ROBs) located in buildings on the surface. Around 1500 of those links are needed, each one moving data at 160 MByte/s.
SLINK format example: ATLAS SLINK data format
SLINK interface mezzanine card
PCI to SLINKSLINK to PCI
Proven technology: it has been used by a few experiments to take hundreds of TB of data in the past few years
New 32-bit SLINK to 64-bit PCI interface card: S32PCI64
High-speed follow-up of the Simple SLINK to PCI interface card
• highly autonomous data reception
• 32-bit SLINK, 64-bit PCI bus
• 33 MHz and 66 MHz PCI clock speed
• up to 260 MByte/s bandwidth
[Block diagram: S-LINK to PCI-64 interface (Erik van der Bij, CERN, division EP ATE/DQ). Blocks shown: SLINK input, 32-to-64 map, input buffer FIFO (1024 x 64-bit), backend control logic, PCI burst FIFO (128 x 64-bit), request FIFO (address, length), acknowledge FIFO (ctl words, length), control, status & interrupt registers, DMA engine, 64-bit PCI core, 33/66 MHz 32/64-bit PCI.]
Possible applications for Pulsar
• As Pulser for Hit-Finder boards in SVT system
• As Pulser for L2 decision crate: can source any data path as well as multiple paths at the same time
• As Recorder for L2: can record data from upstream for any data path (or multiple paths)
• As a simple standalone DAQ system (in a test beam environment with custom mezz cards) → can receive external trigger signals (NIM etc.) via AUX card in the back
• General diagnostics tool within and outside CDF
• Future expansion is as cheap/fast/easy as designing a mezzanine card ……
the rest are all commercially available …
Other possible/potential applications:
The flexible design (“lego style”) makes it possible to use Pulsar as a general purpose tool within or outside CDF …
• Can sink/source G-link/TAXI/HotLink/LVDS data for generic DAQ/trigger diagnostics
• As spare interface board for the existing L2 decision crate
• could be used as a prototype for Level 2 upgrade if needed
PC 1
One possible configuration of multiple Pulsars
S-LINK
PC 2
S-LINK
S-LINK
S-LINK
S-LINK
S-LINK
CustomMezzanine cards
Pulsar
Pulsar
Pulsar
Pulsar
Pulsar
Pulsar
The formation of Pulsar cluster
Examples: possible applications of Pulsar
• as a pulser for CDF L2 system;
• as a recorder for CDF L2 system (via VME);
• general purpose recorder (data sink to PC);
• possible candidate for CDF L2 upgrade
MezzanineCard
IO
Ctrl
IO
Hotlink/Taxi/S-LINK…
SLINK
VME
S-LINKLDC or LSC
S-LINKLDC or LSC
optional user defined signal connection from P2/P3
Basic requirement for a L2 data source board:
L1A for buffer n
Data block
latencyTo L2 interfaceBoard input
FIFO
RAM
Test pattern
(1) upon L1A for buffer n, start a counter for buffer n;
(2) at the same time, clock data from RAM into the FIFO;
(3) once the counter reaches the latency threshold, clock the data out of the FIFO at a speed that matches that of the subsystem… the actual latency is controlled by when the data are clocked out of the FIFO.
This is an oversimplified picture
FIFOFIFOFIFOFIFO
Optical IO FPGA
SRAM
InternalTest
RAM
controller
L1ABuf #
The latency is controlled by when the data is clocked out of the FIFO
Pulsar in pulser mode: hotlink examples:
(only one mezzanine card shown)
128K x 36 SRAM
8 bitsdata
8 bitsdata
8 bitsdata
4 Ctrlbits
Load test pattern memory (either use internal RAM or use 128K x 36 external SRAM, CY7C1350): use a 36-bit data width; 32 bits are for the 4 fiber outputs (4 x 8), and the highest 4 bits are used as control bits to mark the content of the data. For each event's worth of data, the first word is the header, whose 32 data bits contain the latency (and number of words etc.) for this particular event and this particular path. The last word is the trailer, which can contain other info if needed (such as what the L2 decision should be etc.).
Buffer0 data
Buffer1 data
Buffer2 data
Buffer3 data
36bits
8 bitsdata
The highest two address bits will be controlled by buffer number to divide automatically the memory for 4 buffers
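As a minimal sketch of this addressing scheme: a 128K-word SRAM has 17 address bits, so using the top two for the buffer number leaves 32K words per buffer. The function and constant names are hypothetical, not taken from the Pulsar firmware.

```python
ADDR_BITS = 17                        # 128K words -> 2**17 addresses
BUFFER_BITS = 2                       # four L2 buffers
OFFSET_BITS = ADDR_BITS - BUFFER_BITS
WORDS_PER_BUFFER = 1 << OFFSET_BITS   # 32K words per buffer


def sram_address(buffer_number, offset):
    """Build an SRAM address whose two highest bits are the buffer number."""
    if not 0 <= buffer_number < (1 << BUFFER_BITS):
        raise ValueError("buffer number must be 0..3")
    if not 0 <= offset < WORDS_PER_BUFFER:
        raise ValueError("offset does not fit in one buffer region")
    return (buffer_number << OFFSET_BITS) | offset


# Start address of each buffer's region.
for buf in range(4):
    print(buf, hex(sram_address(buf, 0)))
```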
How does it work:
(1) after L1A, read the first word (header) and get the latency; at the same time start a counter;
(2) continue to read out the rest of the data words from the memory and clock them into a FIFO, until the trailer is reached (can get the L2 decision information there);
(3) once the counter reaches the latency threshold, clock the data out from the FIFO at a speed which matches the subsystem.
This way the latency for each event and each data path can be individually controlled by the user.
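The three steps can be sketched as a small software model. This is only an illustration under simplifying assumptions: words are (kind, payload) tuples rather than 36-bit patterns, the header payload is the latency in ns, and the header and trailer themselves are not transmitted. None of these names come from the firmware.

```python
from collections import deque

HEADER, DATA, TRAILER = "header", "data", "trailer"


def play_event(memory, start, word_period_ns):
    """Replay one event from test-pattern memory.

    Returns (outputs, trailer_payload), where outputs is a list of
    (time_ns_after_L1A, data_word) pairs.
    """
    # Step 1: read the header to get the latency; the counter starts at L1A (t = 0).
    kind, latency_ns = memory[start]
    if kind != HEADER:
        raise ValueError("event must start with a header word")

    # Step 2: move data words from memory into the FIFO until the trailer.
    fifo = deque()
    addr = start + 1
    while True:
        kind, payload = memory[addr]
        if kind == TRAILER:
            trailer_payload = payload
            break
        fifo.append(payload)
        addr += 1

    # Step 3: once the counter reaches the latency, clock the FIFO out
    # at the word period of the emulated subsystem.
    outputs = []
    t = latency_ns
    while fifo:
        outputs.append((t, fifo.popleft()))
        t += word_period_ns
    return outputs, trailer_payload


memory = [(HEADER, 1000), (DATA, 0xA), (DATA, 0xB), (TRAILER, "accept")]
print(play_event(memory, 0, 132))
```

Because the latency lives in the header of each event, two events stored back to back can be replayed with different latencies without touching the playback logic.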
Buffer 0 data memory
header
trailer
Latency for this event, and other info
data data data data
Other information (what L2 decision should be etc)
One could have more control by inserting gaps in between data words… etc. using the 4 control bits, to better mimic the real situation for certain data paths.
This approach seems to be quite flexible
1st event
Ctrl bit 35: header
Ctrl bit 34: trailer
Ctrl bit 33: gap
Ctrl bit 32: reserved
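A minimal sketch of how a 36-bit test-pattern word could be packed and unpacked with these control bits. The helper names are hypothetical, and the assignment of the four data bytes to fibers (lowest byte to fiber 0) is an assumption; the slides do not specify the byte order.

```python
CTRL_BITS = {"header": 35, "trailer": 34, "gap": 33, "reserved": 32}
DATA_MASK = 0xFFFFFFFF  # low 32 bits: 4 fibers x 8 bits


def pack_word(data32, header=False, trailer=False, gap=False):
    """Combine 32 data bits with the control flags into one 36-bit word."""
    if not 0 <= data32 <= DATA_MASK:
        raise ValueError("data must fit in 32 bits")
    word = data32
    for name, flag in (("header", header), ("trailer", trailer), ("gap", gap)):
        if flag:
            word |= 1 << CTRL_BITS[name]
    return word


def unpack_word(word36):
    """Split a 36-bit word back into its control flags and 32 data bits."""
    fields = {name: bool((word36 >> bit) & 1) for name, bit in CTRL_BITS.items()}
    fields["data"] = word36 & DATA_MASK
    return fields


def fiber_bytes(data32):
    """Split 32 data bits into four 8-bit values (assumed: fiber 0 = lowest byte)."""
    return [(data32 >> (8 * i)) & 0xFF for i in range(4)]


word = pack_word(0x12345678, header=True)
print(hex(word), unpack_word(word), fiber_bytes(0x12345678))
```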
Initial thoughts on tester firmware design
FIFO
L1ABuf#
counter0counter1counter2
counter3
RAM
VME36bits
addr
State machine
FIFO
FIFO
FIFO
FIFO
data
ctrl ctrl
MezzanineCard side
• Latch L1A + buf#
• Read 1st word from RAM
• Save latency & compare with counter
• Continue reading data from RAM to FIFOs until last word
• Once counter counts up to latency, enable FIFO output and ctrl signals for Tx chips
• Ready for next L1A
Buffer 0 data
Buffer 1 data
Buffer 2 data
Buffer 3 data
null
L1A
header
RAMtoFIFO
Latency
done
output
Comments:
Pro: simple
Con: not so elegant, as the state machine has to finish sending all data out of the FIFO before being able to process the next event. Maybe OK if run at a higher clock rate. There will be some intrinsic delay between events.
Good starting point; allows us to simulate the board soon.
Would be better to separate the RAM-to-FIFO part from the actual data-sending part
Possible implementation A:
Peter Wittich will talk about pulser mode firmware design detail and status soon.
null
L1A
header
RAMtoFIFO
done
Comments:
Pro: more elegant
Con: somewhat more involved.
Implement this later for the real thing.
We decided to go for implementation A first.
Output controller
FIFO
Data ready to be sent
latencycounterQ
Possible implementation B:
FIFOFIFOFIFOFIFO
Optical IO Unit
SRAM
InternalTest
RAM
controller
L1ABuf #
Configure the (S)RAM as a circular buffer for recording (for each L1A); recording can be stopped and the buffer read out via VME.
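The recorder idea can be illustrated with a small software model of a circular buffer: new words overwrite the oldest ones until recording is stopped, after which the contents are read back oldest first. The class and method names are hypothetical and stand in for the SRAM controller and the VME readout.

```python
class CircularRecorder:
    """Software model of an SRAM used as a circular recording buffer."""

    def __init__(self, depth):
        self.mem = [None] * depth
        self.write_ptr = 0
        self.count = 0        # number of valid words, capped at depth
        self.running = True

    def record(self, word):
        """Store one incoming word, overwriting the oldest once full."""
        if not self.running:
            return
        self.mem[self.write_ptr] = word
        self.write_ptr = (self.write_ptr + 1) % len(self.mem)
        self.count = min(self.count + 1, len(self.mem))

    def stop(self):
        """Freeze the buffer so it can be read out (e.g. after an error)."""
        self.running = False

    def read_out(self):
        """Return the recorded words, oldest first."""
        depth = len(self.mem)
        start = (self.write_ptr - self.count) % depth
        return [self.mem[(start + i) % depth] for i in range(self.count)]


rec = CircularRecorder(depth=4)
for w in range(6):
    rec.record(w)
rec.stop()
rec.record(99)          # ignored: recording has been stopped
print(rec.read_out())
```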
hotlink examples:
(only one mezzanine card shown)
Pulsar in recorder mode (readout via VME)
or
VME
PULSAR
Magic bus
Level 2 Test Stand: Board level test
ALPHA
INTFACEdata input
Data source: Level2_Pulsar
Data sink: Alpha, MMB+Pulsar
Data patterns:
(1) hand made
(2) derived from MC
(3) derived from data bank
(4) recorded from upstream, to catch errors and reproduce them
TRACER
ROC
L2 decision to TS
MMB
MagicbusAnalyzer
Full loop(s) test
• To just test an interface board, one Pulsar + MMB is needed.
• To just test alpha and magic bus, one Pulsar and MMB is needed
• many other options…
Reces x 4L1
XTRP
SVT
CLIST
ISO
MUON
alphas
Magic bus
PULSAR
PULSAR
Driving the full system:
TRACER
L2 test crate
PULSAR
only 5 Pulsar boards needed to drive the full system
16 fibersX 3 =48
16 fibers
PULSAR
TS input
with different
PULSAR
ROC
L1A rate &Event patterns
TRK
Merger
L1
OpticalIO
CDFctrl
OpticalIOSLINK
datactrlSLINK to PCI
PC
Pulsar in (general purpose) recorder mode (directly into a PC)
Mezzanine cardconnectors
Pulsar can convert any user data (via custom mezzanine cards) into SLINK format, then transfer the data into a PC
TS
TRK
Pulsar in recorder mode(into a PC): firmware design is similar to all three FPGAs
FIFO
FIFO
L1A(x4) queue
FIFOL1 trigger bits
(pulls one event's worth of data at a time)
• Checks data consistency
• Merges and stamps data
SLINKFormatter
L1ABuffer(2)
Input 1
Input 2
headerinfo.
DataIO FPGA
DataIO FPGA
Control/merger FPGA
P3
L1A
L1T
firmware design is similar to all three FPGAs
Data package (per L1A) in SLINK format sent to PC… Throughput: 160 MB/s into PC
Board Level simulation
SLINK output to P3
L1A
L1Trigger bits
input data1
input data2
Output from DataIO FPGA
Can we use Pulsar to upgrade Level 2 if needed?
it is possible…
Level2_PULSAR:
Level2_Processor Controller pre-processor/merger pULSer
AndReadout
RecesPre-processor
X 3
16 x3fibers
Cluster Pre-processor
MuonPre-processor
L1 bitsXTRP
cluster6 fibersIso7 fibers
16 fibers
Global ProcessorController
L1SVT
TS
L2 decision
One possible configuration:
Reces/trk
Cluster/trk
Muon/trk
Slink to PCI
PCI to Slink
CPU
SumET,METfrom a L1 type cable(another mezzanine card)
SLINK
SLINK
SLINK
RecesPre-processor
X 3
3 RecesPre-processor
+ 1 merger
All use the same type of board: Pulsar
16x3 fiberstaxi
6 hotlink+ LVDS
7 taxi fibers
12 hotlinkmatchbox
4 hotlinkprematchbox
XTRP
L1
SLINK
SumEt,MET
CPU
TS
L1 SVT
Possible configuration:
Reces/Trk
Cluster/trk
Muon/trkSLINK
Muon path
Clist and Iso path
Reces path
XTRP
Control/Merger
TS
L1
Data IO
Pulsar as Muon Pre-processor (interface board)
P1
P2
P3
SLINKsignals
spare lines
0-120 degree
120-240 degree
240-360 degreeDataIO
SRAM
SRAM
Info available: L1, XTRP, Muon matchbox data (0-240 degree)
Info available: L1, XTRP, Muon matchbox, Pre-matchbox
ALL muon data available with Track info.
SVT
data from Matchbox (16 hotlink fibers)
data from pre-match box(4 hotlink fibers)
Can already create trigger objects at this stage
XTRP
Control/Merger
TS
L1
Data IO
Pulsar as Cluster Pre-processor (interface board)
P1
P2
P3
SLINKsignals
spare lines
4 hotlink fibers
2 hotlink fibersPlus 1 LVDS cable
4 Taxi fibersDataIO
SRAM
SRAM
Info available: L1, XTRP, all clusters
Info available: L1, XTRP, isolation
ALL cluster information available with track info
Can already create trigger objects at this stage
SVT
CLIST input
IsoList input
3 Taxi fibers
bit    Transfer A (1st 33 ns)   Transfer B (2nd 33 ns)   Transfer C (3rd 33 ns)   Transfer D (4th 33 ns)
bit0   data(0)                  data(7)                  data(14)                 data(21)
bit1   data(1)                  data(8)                  data(15)                 data(22)
bit2   data(2)                  data(9)                  data(16)                 data(23)
bit3   data(3)                  data(10)                 data(17)                 GND
bit4   data(4)                  data(11)                 data(18)                 GND
bit5   data(5)                  data(12)                 data(19)                 GND
bit6   data(6)                  data(13)                 data(20)                 L2 Endmark
bit7   VCC                      VCC                      Bunch Zero Marker        GND
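The bit map above can be turned into a short encode/decode sketch. The function names are hypothetical, and VCC and GND are taken to mean logic 1 and logic 0 respectively, which is an assumption about how the fixed bits appear on the receiving side.

```python
def encode_muon_word(data24, bunch_zero=False, l2_endmark=False):
    """Split a 24-bit muon word into the four 8-bit hotlink transfers A-D."""
    if not 0 <= data24 < (1 << 24):
        raise ValueError("data must fit in 24 bits")
    a = (data24 & 0x7F) | 0x80                                  # bit7 = VCC
    b = ((data24 >> 7) & 0x7F) | 0x80                           # bit7 = VCC
    c = ((data24 >> 14) & 0x7F) | (0x80 if bunch_zero else 0)   # bit7 = B0 marker
    d = ((data24 >> 21) & 0x07) | (0x40 if l2_endmark else 0)   # bit6 = L2 endmark
    return a, b, c, d


def decode_muon_word(a, b, c, d):
    """Rebuild the 24-bit word and the two marker bits from transfers A-D."""
    data24 = (
        (a & 0x7F)
        | ((b & 0x7F) << 7)
        | ((c & 0x7F) << 14)
        | ((d & 0x07) << 21)
    )
    return {
        "data": data24,
        "bunch_zero": bool(c & 0x80),
        "l2_endmark": bool(d & 0x40),
    }


transfers = encode_muon_word(0xABCDEF, l2_endmark=True)
print([hex(t) for t in transfers], decode_muon_word(*transfers))
```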
Information from Eric James about muon input:
Each matchbox card sends up to 32 24-bit words for each fiber. The transfer time for each word is 132 ns. Each 24-bit word is encoded into a 32-bit transfer over hotlink and comes in as four groups of 8-bit words.
There is a register on the Matchbox card which gives one the ability to send zero, ten, twenty, or thirty words to L2. This feature was included in case we needed to complete our data transfer within a given time window to make the system work. The central trigger primitives are sent in the first ten words, the forward trigger primitives are sent in the next ten words, and L1 trigger decision data is sent in the last ten words. If one looks at the table in section 29.5.1 of CDF4152, the words which get sent to L2 begin with the High Pt CMU East bits (P0+3) and end with the IMB Diagnostic bits (P0+32). The output ordering of the words is the same as that shown in the table. The pre-match connections work in exactly the same way. There are only 16 24-bit words output to L2 from each pre-match card. From the table in section 29.5.2 of CDF4152, the first word which gets sent is the CMP primitives for stacks 00-23 (P0+2). The last word sent is CMP/CSP west matches for stacks 72-95 (P0+17). The ordering is the same as in the table. There are also register bits on the Pre-Match to control the number of words being sent. For this card one would transfer either zero, eight, or sixteen words.
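As a cross-check, the word counts above can be combined with the fiber counts quoted elsewhere in this talk (12 matchbox fibers and 4 pre-match fibers) to estimate the maximum muon data size per event. This is only a sketch: it assumes each pre-match fiber carries the 16 words of one pre-match card, and the names are illustrative.

```python
WORD_BITS = 24                 # useful bits per muon word

MATCHBOX_FIBERS = 12
MATCHBOX_WORDS_PER_FIBER = 32  # "up to 32 24-bit words for each fiber"

PREMATCH_FIBERS = 4
PREMATCH_WORDS_PER_FIBER = 16  # assumed: one pre-match card per fiber


def muon_event_bits(matchbox_words=MATCHBOX_WORDS_PER_FIBER,
                    prematch_words=PREMATCH_WORDS_PER_FIBER):
    """Useful muon data bits per event for the given words-per-fiber settings."""
    matchbox = MATCHBOX_FIBERS * matchbox_words * WORD_BITS
    prematch = PREMATCH_FIBERS * prematch_words * WORD_BITS
    return matchbox + prematch


max_bits = muon_event_bits()
print(max_bits, "bits =", max_bits // 8, "bytes =", max_bits / 1024, "Kbits")
```

Under these assumptions the maximum comes to 10752 bits (1344 bytes), in line with the roughly 10.5 to 11 Kbits per event and 1.3 KB per event figures used for the muon path on other slides.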
FIFOFIFOFIFOFIFO
Optical IO FPGA firmware for muon case
Motherboard Optical IO FPGA
Each matchbox card sends up to 32 24-bit words for each fiber. The transfer time for each word is 132 ns. Each 24-bit word is encoded into a 32-bit transfer over hotlink and comes in as four groups of 8-bit words.
D C B A
8 bits each; total encoded useful data is 24 bits @132 ns
An 8-bit data stream is pushed into a FIFO for each fiber, then clocked into the motherboard optical IO FPGA (pushed or pulled). The FIFO read clock can be faster than the write clock (depending on how fast one can clock data over the CMC connectors). To be conservative, assume for now that the read and write clocks are about the same.
XTRP lookup table (for both muon and Cal):
XFT divides the COT into 288 segments (1.25 degrees each). A wedge spanning 15 degrees has 12 segments; each segment has a unique LUT within a wedge.
15-bit address, 18-bit output
Address bits:
0-1: phase
        CM            IM
  00:   CMU high Pt   CAL
  01:   CMU low Pt    crack
  10:   CMX high Pt   IMU
  11:   CMX low Pt    IMU
2-8: 96 XFT signed-Pt
9-11: Local phi within segment
12: passed superlayer 8?
13: Isolation (other tracks nearby?)
14: reserved for future use
For CMU, CMX and IMU, the 18 bits represent the 18 trigger towers in three wedges (one + two neighbor wedges).
When even a part of a tower is within the muon footprint and satisfies the Pt threshold, the bit corresponding to the tower is set to 1.
An example of LUT for CMU for an isolated track with Pt = -6.19 GeV/c, local phi = 1.02 degree, and passed superlayer 8 of COT:
Bit address        output content
001100011111100    000000000000000000
001100011111101    000000000000100000
001100011111110    000000000000000000
001100011111111    000000000001110000
Phase:CMU high Pt
Pt bins for –6.19 GeV/c
local phi
passed SL8
isolation
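The address layout can be illustrated by decoding one of the example addresses. This sketch assumes the addresses are printed most significant bit first (bit 14 on the left), which is consistent with the four example rows differing only in their last two digits, the phase field; the function name and returned field names are hypothetical.

```python
def decode_lut_address(address_bits):
    """Decode a 15-bit XTRP LUT address written MSB first (assumed ordering)."""
    if len(address_bits) != 15 or set(address_bits) - {"0", "1"}:
        raise ValueError("expected a string of 15 binary digits")
    value = int(address_bits, 2)
    return {
        "phase": value & 0b11,                  # bits 0-1
        "pt_bin": (value >> 2) & 0x7F,          # bits 2-8
        "local_phi": (value >> 9) & 0b111,      # bits 9-11
        "passed_sl8": bool((value >> 12) & 1),  # bit 12
        "isolation_bit": (value >> 13) & 1,     # bit 13 (raw value)
        "reserved": (value >> 14) & 1,          # bit 14
    }


example = decode_lut_address("001100011111100")
print(example)
```

Under that assumed bit ordering, the first example row decodes to phase 00 with the superlayer 8 bit set, and the remaining three rows step through phases 01, 10 and 11 for the same track parameters.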
32-bit FIFO@132ns
(30 x 4 deep)8->32
Motherboard Optical IO FPGA (only shown 4 ch)
32-bit FIFO@132ns
32-bit FIFO@132ns
32-bit FIFO@132ns
8->32
8->32
8->32
8 bits@33ns
L1A(x4) queue
64-bit (x4) FIFO@132ns
L1 bits
24-bit FIFO@132ns
XTRP data
Muon Filter Algorithm
(pulls one event's worth of data at a time)
• Checks data consistency
• Filters data based on L1 bits and XTRP information
• Muon-track match… etc.
SRAM
0-30 deg
30-60 deg
60-90 deg
90-120 deg
SLINKFormatter
32 bits data+ ctrl bits
L1ABuffer(2)
ToMergerFPGA
Muon data as an example. Each word is 24 bits (sent as four 8-bit words over hotlink).
Can pack them into 32-bit words (SLINK is 32 bits): four 24-bit words can be packed into three 32-bit words.
Or can pack them in 24 bits exactly as in the muon bank format, and use the other 8 bits to stamp other information …
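The first packing option (four 24-bit words into three 32-bit words) can be sketched as follows. The word order inside each 96-bit group, first word in the highest bits, is an assumption made for illustration, and the function names are hypothetical.

```python
def pack_24_to_32(words24):
    """Pack groups of four 24-bit words into three 32-bit words each."""
    if len(words24) % 4:
        raise ValueError("number of 24-bit words must be a multiple of 4")
    packed = []
    for i in range(0, len(words24), 4):
        block = 0
        for w in words24[i:i + 4]:
            if not 0 <= w < (1 << 24):
                raise ValueError("word does not fit in 24 bits")
            block = (block << 24) | w            # build a 96-bit group
        packed.extend([(block >> 64) & 0xFFFFFFFF,
                       (block >> 32) & 0xFFFFFFFF,
                       block & 0xFFFFFFFF])
    return packed


def unpack_32_to_24(words32):
    """Inverse of pack_24_to_32."""
    if len(words32) % 3:
        raise ValueError("number of 32-bit words must be a multiple of 3")
    unpacked = []
    for i in range(0, len(words32), 3):
        block = 0
        for w in words32[i:i + 3]:
            block = (block << 32) | w
        unpacked.extend((block >> shift) & 0xFFFFFF for shift in (72, 48, 24, 0))
    return unpacked


muon_words = [0x111111, 0x222222, 0x333333, 0x444444]
slink_words = pack_24_to_32(muon_words)
print([hex(w) for w in slink_words])
```

The second option on the slide (one 24-bit word per 32-bit SLINK word, with 8 spare bits for stamping) trades 25% of the bandwidth for simpler decoding downstream.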
CDF Muon bank data format
Muon data path:
Muon path has 16 fiber (hotlink) inputs and a large data size (11 Kbits/evt): 12 fibers from the matchbox, each covering 30 degrees in phi, and 4 from the pre-matchbox.
Designed to have many options:
Simple ways to suppress muon data (examples):
• first check the L1 bits: if no muon info is needed, send out an empty data package with only a header and trailer, where the header is stamped with buffer number, bunch counter etc.
• if muon info is needed, then check the XTRP (track) bits, and possibly pass only the ROI (Region-Of-Interest) phi data downstream;
• or zero suppress the data based on the muon data itself
Sophisticated way to suppress muon data:
• use the XTRP LUT to pre-match tracks with muon data;
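The simple suppression options listed above can be sketched in software. Everything here is hypothetical: the event layout (a dict of fiber index to word list), the function name, and the choice to zero-suppress per fiber are illustrative assumptions, not the firmware design.

```python
def suppress_muon(event, l1_needs_muon, roi_fibers=None, zero_suppress=False):
    """Build a muon data package, dropping data that is not needed downstream."""
    package = {
        # Header is always sent, stamped with buffer number and bunch counter.
        "header": {"buffer": event["buffer"],
                   "bunch_counter": event["bunch_counter"]},
        "data": {},
        "trailer": {},
    }
    # Option 1: L1 bits say no muon info is needed -> header and trailer only.
    if not l1_needs_muon:
        return package

    data = dict(event["fibers"])
    # Option 2: keep only the fibers covering the track region of interest.
    if roi_fibers is not None:
        data = {f: words for f, words in data.items() if f in roi_fibers}
    # Option 3: zero suppression based on the muon data itself
    # (here: drop fibers whose words are all zero).
    if zero_suppress:
        data = {f: words for f, words in data.items() if any(words)}

    package["data"] = data
    return package


event = {"buffer": 2, "bunch_counter": 17,
         "fibers": {0: [0, 0], 1: [5, 0], 2: [7, 1]}}
print(suppress_muon(event, l1_needs_muon=False))
print(suppress_muon(event, l1_needs_muon=True, roi_fibers={1}))
```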
FIFOFIFOFIFO
F E D C B A
8 bits each @50 ns; one cluster is encoded in six 8-bit words across all 6 fibers
8 bits data stream will be pushed into FIFO for each fiber. The end of event marker comes from LVDS connector.
FIFOFIFOFIFOFIFO
Optical IO FPGA firmwarefor CLIST case
LVDS
Inputs: 6 fibers + LVDS
48-bit FIFO8->48
Motherboard Optical IO FPGA for CLIST case
8->48
8->48
8->48
8 bits@50ns
L1A(x4) queue
64-bit (x4) FIFOL1 bits
24-bit FIFOXTRP data
Cluster Formatter Algorithm
(pulls one event's worth of data at a time)
• Sums all info for each cluster
• Checks end of event
• Checks data consistency
• Stamps data based on L1 bits and XTRP information
• Cluster-track match? (electron ID)
SRAM
SLINKFormatter
32 bits data+ 4 bits ctrl
L1ABuffer(2)
ToMergerFPGA
48-bit FIFO
48-bit FIFO
48-bit FIFO
48-bit FIFO
48-bit FIFO
8->48
8->48
8-bit FIFO4->16LVDS
Motherboard Merger FPGA
FIFO
FIFO
Iso input
L1A(x4) queue
64-bit (x4) FIFO@132nsL1 bits
24-bit FIFO@132ns
XTRP data
Cluster merger Algorithm
(pulls one event's worth of data at a time)
• Checks data consistency
• Merges and stamps data for both cluster and Iso info
SRAM
SLINKFormatter
32 bits data+ 4 bits ctrl
L1ABuffer(2)
FromOptical IO FPGAs To
ControlFPGA
Cluster input
SVT
Control/Merger
TS
L1
Data IO
Pulsar as Processor Controller
P1
P2
P3
SLINKsignal lines
spare lines
Reces/trk
Cluster/trk
Muon/trk
Met/SumEt
DataIO
SRAM
SRAM
Info available: L1, SVT, XTRP, Reces, Cluster, Iso
Info available: L1, SVT, XTRP, Muon, Met, SumEt
Tracks, Jets, e, photon, tau etc
Tracks, muon, Met, SumEt etc
ALL Info available for L2 decisions,
Can even make simple L2 decisions, only send complicated events to PC
L1 trk svt clist Iso reces mu SumEt,MEtTracks
Jetselectronsphotonsmuons
Taus Tags
MetSumEt
What does Level 2 really do? (examples)
• Create all the trigger objects needed, then
• Count objects above thresholds, or
• Cut on kinematic quantities
Possible Baseline Design could be:
• PreProcessor/Interface versions of the board gather data from each subsystem and package the data. This can include sparsification based on L1 trigger bits, tracking, or the data itself.
• The Processor controller/Merger version of the board merges data from the interface boards and packages the data for transfer to a CPU. The data can be further sparsified at this stage. This board also provides the interface between L2 and the TS.
• Both types of PreProcessor board can be used to read out data for TL2D
The default operation would have the final decisions made in code in the CPU.
Optical IO FPGA
DAQ buffersSRAMController
VME interface
Formatter/filter
RAM controller
RAM controller
MezzanineCard interface
CDF Ctrlinterface
upstreamdownstream
Blocks in green are common to all FPGA types; blocks in white are specific to this type of FPGA
Play/recordPlay/record
Mezzanine card interface
Mezzanine Card ID
Mezzanine cardIdentification
Mezzanine cardinterface
Feature: has a dedicated unit which identifies the mezzanine card plugged into the motherboard at power up.
At power up, all FPGAs will be configured, but the inputs will be “disconnected” by default. The mezzanine card identification unit first identifies the mezzanine card type; if it is the wrong type for this FPGA firmware, it signals an error (error register & LED). If the right type is identified, the gate will be closed….
Mezzanine card identification at Power Up
Mezzanine cards have both input and output types
gate
MUON
RECES
PROCESSOR
ROC
TRACER
CLUSTER
RECES
RECES
Muon/trk
Cluster/trk
MERGER
NewL2Decisioncrate
SumEt,MEt
TS
Slinkto PCI
PCIto Slink
mem
CPU
Reces/trk
GHz PC orVME processor
All Pulsar boards take two slots (due to mezzanine cards)
Total: 7 Pulsar boards = 14 slots
Baseline design: use pre-processors to simply suppress/organize data, and use the processor controller to simply pass data to the CPU via SLINK to PCI and also handshake with the TS. All trigger algorithms will be handled by the CPU.
MUON
RECES
PROCESSOR
ROC
TRACER
CLUSTER
RECES
RECES
Muon/trk
Cluster/trk
MERGER
NewL2Decisioncrate
PreFred
TS
Slinkto PCI
PCIto Slink
mem
CPU
Reces/trk
GHz PC orVME processor
We use SLINK products to:
(1) Send data from processor controller to CPU;
(2) Receive L2 decision data from CPU to Pulsar;
(3) Link between Pulsar boards
48 fiberstaxi
6 hotlink+ LVDS
7 taxi fibers
12 hotlinkmatchbox
4 hotlinkprematchbox
XTRP
L1
SLINK
SumEt,MET
CPU
TS
L1 SVT
Another possible system configuration (3 Pulsar boards, use 1 MMB and 4 Reces)
Reces/Trk
Cluster/trk
Muon/trkSLINK
muon
cluster
Existing Reces boards
Short Mbus
MMBReces data in SVT or SLINK format
FIFO
DAQ buffer
TestRAM
TestRAM
Upstream data
Downstreamdata
Initial estimate of FPGA RAM capacity requirements (worst case):
Each FPGA needs up to 4 events of FIFO-like storage at the input stage, plus 4 events of DAQ buffers (not including test RAMs at input and output):
Optical IO FPGA (worst case is muon input):
(1.3 KB/evt * 4) * 2 = 10.4 KB
EP20K400 has 26 KB maximum RAM capacity, while EP20K200 has 13 KB.
Merger FPGA (worst case is in processor controller mode): (1.8 KB/evt * 4) * 2 = 14.4 KB (~15 KB); EP20K400 should be big enough
Assuming no data suppression (for worst case)
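The estimate above is simple enough to check with a few lines. The names are illustrative, and KB is treated as 1000 bytes for round numbers, which is an assumption.

```python
EVENTS_BUFFERED = 4   # events held per storage stage
STAGES = 2            # input FIFO-like storage + DAQ buffers

# Maximum internal RAM capacity quoted on the slide, in bytes.
CAPACITY_BYTES = {"EP20K400": 26_000, "EP20K200": 13_000}


def ram_needed_bytes(event_bytes):
    """Worst-case RAM for one FPGA: 4 events in each of the two stages."""
    return event_bytes * EVENTS_BUFFERED * STAGES


def fits(event_bytes, device):
    """True if the worst-case requirement fits in the device's RAM."""
    return ram_needed_bytes(event_bytes) <= CAPACITY_BYTES[device]


optical_io = ram_needed_bytes(1300)   # muon input, 1.3 KB/evt
merger = ram_needed_bytes(1800)       # processor controller mode, 1.8 KB/evt

print("optical IO:", optical_io, "bytes")
print("merger:", merger, "bytes")
for device in CAPACITY_BYTES:
    print(device, fits(1300, device), fits(1800, device))
```

With these numbers the merger case does not fit in an EP20K200 but fits comfortably in an EP20K400, which matches the slide's conclusion.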
What has been done so far (project officially started last Dec.)
Pulsar design is “optimized and finalized”:
• schematics finished in April
• core firmware in place, VHDL code in CVS
• intensive board level simulation in progress
• detailed firmware design and VHDL coding in progress
• layout in progress
• Pulsar web page is set up with all design details
• hotlink mezzanine card (Tx and Rx) prototype testing ongoing
Plan in the summer:
• First Pulsar prototype
Pulsar Top Level schematics
IO FPGA
Control FPGA
Initial Pulsar components placement in April (the placement has been changed since)
MANY IO pins are assigned by hand to make the routing easier. Critical signals (clocks etc) are assigned by the compiler with core working VHDL code.
Initial Pulsar layout (as in April). This is old; the placement has been changed since
Pulsar firmware in CVS
• common: library for lower level VHDL code
• fpgapinmap: IO pin assignment files, compile setting files, VHDL templates etc
• Rx: VHDL code in various receiving modes
• Tx: VHDL code in various pulser modes
crucial for team work and long term maintenance
Pulsar is designed for long term maintenance for Level 2
Hotlink mezzanine card prototypes (Tx and Rx)
http://hep.uchicago.edu/~thliu/projects/Pulsar
Board Level simulation and prototype testing plan
Pulsar Mezzanine cards: hotlink Tx and Rx mezzanine cards have been simulated together; a teststand is set up at UC for prototype testing.
Pulsar motherboard:
• intensive board level simulation in progress
• prepare for multi-board simulation: Pulsar + mezzanine cards
• schedule depends on board level simulation work…
• the goal is to validate the design with INTENSIVE board level simulation by the end of June. Send out the prototype in July, and continue board level simulation…
Board Level simulation:
Prototype testing plan:
• can use SLINK test tools first
• then test with custom mezzanine cards
Transmitter Receiver
Mezzanine cards board level simulation
(1) 4 fiber case first
(2) 2 fiber + LVDS case
This setup can test everything except CMC connectors
Natalia will talk about details later
Transmitter Receiver
Mezzanine card Prototype/production test plan I (use the working teststand setup at UC):
Pattern Generator HP LA
(1) Use PG + LA;
(2) Use FPGA internal RAM + LA
(3) Use BIST + LA (hotlink) (run for a long time and set a limit on the bit-error-rate to test for robustness)
This setup can test everything except CMC connectors
Pulsar + 4 hotlink mezzanine cards: prepare for multi-board simulation
Pulsar
Mezzanine cards
PULSAR
IO
M
IO
Slink out
CPU
Use SLINK source card to send data
Use SLINK data sink to check data
Initial test plan for Pulsar board prototype
Pulsar board prototype can first be tested with SLINK test tools, then with custom mezzanine cards
SLINK data drain test board
SLINK data source test board (ideal for initial Pulsar prototype testing)
Global L2 decision
PULSAR
PULSAR
PULSAR
PULSAR
PULSAR
PULSAR
PULSAR
PULSAR
PULSAR
PULSAR
(1) As pulser and recorder
(2) upgrade/replacement
What Pulsar can do for L2?
Summary:
• Pulsar is designed primarily as a teststand tool for the CDF Level 2 trigger system;
• The design is general enough that it can be used in many applications:
(1) as a teststand tool for the CDF L2 decision crate and SVT system;
(2) as a spare interface board for the CDF L2 decision crate;
(3) as an upgrade path for the L2 decision crate;
(4) as a general purpose DAQ/trigger diagnostics tool: can source/sink/spy LVDS/Hotlink/Taxi/G-LINK etc.
(5) as a simple DAQ system (such as in a test beam environment) …
MezzanineCard
IO
Ctrl
IO
Hotlink/Taxi/Glink/S-LINK…
TS
L1
SVTSVT
L1
SLINK
VME
S-LINKLDC or LSC
S-LINKLDC or LSC
optional user defined signal connection from P2/P3
Future expansion is as cheap/fast/easy as designing a mezzanine card; the rest is commercially available (SLINK + PC)
The flexible design (“lego style”) makes it possible to use Pulsar as a general purpose tool (within and outside CDF)
[Block diagram: quad HOLA S-LINK to 64-bit PCI interface. Blocks shown: 4 x HOLA S-LINK destination (integrated), 4 x 33-to-66 map, 4 x input FIFO (256 x 66-bit), backend control logic, PCI burst FIFO (128 x 64-bit), 4 x request FIFO (address, each 15 deep), 4 x ACK FIFO (ctl words, length, each 15 deep), control, status & interrupt registers, DMA engine, 64-bit PCI core, 33/66 MHz 32/64-bit PCI.]
Next generation: FILAR (Quad HOLA S-LINK to 64-bit/66 MHz PCI interface)
4S32PCI64 QuadIn
- Software and hardware design effort minimal
• Same programming model as S32PCI64
- optimised even more: only 4 instead of 6 PCI accesses needed per event (fixed buffer size; reading the ACK FIFO also gives empty status)
• all hardware components exist, just integration
[4S32PCI64 block diagram: 4 x S-LINK LDC (integrated), 4 x input buffer FIFO (256 x 64 bit), DMA engine, 64-bit PCI core (33/66 MHz, 32/64 bit, 3.3 V only), request FIFO (address), ACK FIFO (CTL words, length), control logic, PCI.]
Tx
Prototype test plan II: use one Pulsar prototype board
this would allow full tests (including the CMC connectors)
Rx
I/O
I/O
Mezzanine card production can start ONLY AFTER the prototypes are tested with the Pulsar prototype
M
Groups involved in building the system
SMXR, SMD & SQUID(ANL, FNAL)
ADMEM + Café, (FNAL, UCLA,Texas Tech)
HADASD (Frascati)
TDC (Michigan, Pisa,Duke, LBL, FNAL)
Michigan
Michigan, ANL,UCLA, Yale
OSU, FNAL,Michigan
Illinois
Chicago
Chicago, Yale
Michigan
Pisa, Geneva, Trieste, Chicago
ANL
FNAL, Yale
Global L2 decision