DCC Out of Sync Problems
Stan Durkin, Ohio State
In Recent High Rate Cosmic Runs (July 18-23, 2010) DCCs have gone into an Out-of-Sync Condition 7 times
FMM 750  W 82    B 28  S 1  E 0
FMM 752  W 0     B 0   S 0  E 0
FMM 754  W 1023  B 14  S 6  E 0
FMM 756  W 107   B 33  S 2  E 0
Analyze Study Run 141291 (specifically 490s to 540 s)
4,230,000 events through each RUI
5102 events in the CMSSW data
~0.1% of events saved
Rate (from slopes): 79.5 kHz
Plot: L1As vs. time (seconds)
DCC FIFO Overflows at High Data Rates
SLINK FIFO 1MB
Input_FIFO 248KB
CSC DCC and DDU headers carry FMM information
CSC DCC sTTS state machine:
SLINK_FIFO goes to Half_Full: set WARNING
SLINK_FIFO drops back to Almost_Empty: reset WARNING
IN_FIFO goes to Half_Full and L1A Buffer in WARNING: set BUSY
IN_FIFO goes to Half_Full, but SLINK_FIFO not in WARNING: set WARNING
IN_FIFO stays Half_Full for more than 3.2 ms: set BUSY
IN_FIFO reaches Almost_Full: set Out_Of_Sync
IN_FIFO or SLINK_FIFO reaches Full: set Out_Of_Sync
L1A Buffer >1536: set WARNING, reset WARNING when it drops to 1280
L1A Buffer >1920: set BUSY, reset BUSY when it drops to 1536
L1A Buffer >2016: set Out_Of_Sync
- WARNING and BUSY stop L1A triggers (latency ~1 sec)
- Out_Of_Sync stops the run for a resync
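The L1A Buffer thresholds above form a hysteresis scheme, which is easier to follow in code than in prose. The following is a minimal sketch of those three rules only, not the DCC firmware: the class name `L1ABufferMonitor` and its methods are hypothetical, "drops to N" is assumed to mean "depth <= N", and Out_Of_Sync is assumed to stay latched until a resync.

```python
class L1ABufferMonitor:
    """Hypothetical model of the L1A Buffer rules (not the real firmware)."""

    WARN_SET, WARN_CLEAR = 1536, 1280   # set WARNING above 1536, clear at 1280
    BUSY_SET, BUSY_CLEAR = 1920, 1536   # set BUSY above 1920, clear at 1536
    OOS_SET = 2016                      # set Out_Of_Sync above 2016

    def __init__(self):
        self.warning = False
        self.busy = False
        self.out_of_sync = False

    def update(self, depth):
        """Apply the threshold rules for the current buffer depth."""
        if depth > self.OOS_SET:
            self.out_of_sync = True     # assumption: latched until resync()
        if depth > self.BUSY_SET:
            self.busy = True
        elif depth <= self.BUSY_CLEAR:
            self.busy = False
        if depth > self.WARN_SET:
            self.warning = True
        elif depth <= self.WARN_CLEAR:
            self.warning = False
        return self.state()

    def state(self):
        """Report the most severe flag that is currently set."""
        if self.out_of_sync:
            return "OUT_OF_SYNC"
        if self.busy:
            return "BUSY"
        if self.warning:
            return "WARNING"
        return "READY"

    def resync(self):
        """Clear all flags, standing in for the resync after Out_Of_Sync."""
        self.warning = self.busy = self.out_of_sync = False


monitor = L1ABufferMonitor()
depths = [1000, 1537, 1400, 1280, 1921, 1600, 1536, 2017, 0]
states = [monitor.update(d) for d in depths]
print(list(zip(depths, states)))
```

Between 1280 and 1536 the WARNING flag keeps its previous value, so a depth of 1400 reads as WARNING on the way down and READY on the way up. The FIFO rules (Half_Full, Almost_Full, the 3.2 ms timer) are left out of this sketch.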
t(s)            dt(s)         FMM
139.384429875   0.436721600   1
139.386232725   0.001802850   8
140.119162225   0.732929500   1
140.120998750   0.001836525   8
144.130565900   4.009567150   1
144.132397975   0.001832075   8
146.057188825   1.924790850   1
146.058872650   0.001683825   8
148.779290350   2.720417700   1
148.781143125   0.001852775   8
152.496441950   3.715298825   1
152.498013425   0.001571475   8
152.817810300   0.319796875   1
152.819979975   0.002169675   8
153.590204650   0.770224675   1
153.592016100   0.001811450   8
154.189867650   0.597851550   1
154.191494650   0.001627000   8
… repeats 90 times …
191.300884525   0.001097700   8
191.301140075   0.000255550   1
191.303430625   0.002290550   2
FMM Log 141491
FMM Throttling Seems to be Working
Time FMM 1 asserted: 1.8 msec (histogram, x axis: time in msec)
Transition FMM 1 -> 2: 2.290 ± 0.005 msec
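Both times can be read straight off the FMM log by differencing consecutive timestamps. The sketch below does this for a handful of rows copied from the log; the function name `dwell_times_ms` is hypothetical, and the elided middle of the log is simply absent, so the mean covers only the rows shown.

```python
# Rows copied from the FMM log: (timestamp in seconds, FMM state).
# The elided middle of the log is not reproduced here.
LOG = [
    (139.384429875, 1), (139.386232725, 8),
    (140.119162225, 1), (140.120998750, 8),
    (144.130565900, 1), (144.132397975, 8),
    (191.301140075, 1), (191.303430625, 2),
]


def dwell_times_ms(rows, state):
    """Return (duration_ms, next_state) for every visit to `state`."""
    visits = []
    for (t0, s0), (t1, s1) in zip(rows, rows[1:]):
        if s0 == state:
            visits.append(((t1 - t0) * 1e3, s1))
    return visits


visits = dwell_times_ms(LOG, 1)
to_8 = [d for d, nxt in visits if nxt == 8]   # FMM 1 released back to 8
to_2 = [d for d, nxt in visits if nxt == 2]   # FMM 1 followed by 2

print(f"FMM 1 -> 8: mean {sum(to_8) / len(to_8):.3f} ms over {len(to_8)} visits")
print(f"FMM 1 -> 2: {to_2[0]:.3f} ms")
```

On these rows, the visits that return to FMM 8 last about 1.8 msec, and the final visit that ends in FMM 2 lasts about 2.29 msec.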
Data Rates aren’t Large Enough to be Causing Overflows
Average Event Sizes
RUI 750   884 bytes
RUI 751   993 bytes
RUI 752   861 bytes
RUI 753  1129 bytes
RUI 754   843 bytes
RUI 755  1163 bytes
RUI 756   821 bytes
RUI 757   988 bytes
78.5 kHz
~78.5 MB/s
Plot: Theoretical Probability of >50 events in Queue, Log10(P)*106 vs. Rate (MB/s)
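The quoted data rate is event size times L1A rate. As a rough check, the sketch below multiplies each average event size listed above by the 78.5 kHz rate; the names are hypothetical, and MB is taken as 10^6 bytes, which is an assumption.

```python
# Average event sizes in bytes per RUI, copied from the list above.
EVENT_SIZE_BYTES = {
    750: 884, 751: 993, 752: 861, 753: 1129,
    754: 843, 755: 1163, 756: 821, 757: 988,
}
L1A_RATE_HZ = 78.5e3  # 78.5 kHz, as quoted on the slide


def rate_mb_per_s(event_bytes, l1a_rate_hz=L1A_RATE_HZ):
    """Average data rate in MB/s (assumption: 1 MB = 10**6 bytes)."""
    return event_bytes * l1a_rate_hz / 1e6


rates = {rui: rate_mb_per_s(size) for rui, size in EVENT_SIZE_BYTES.items()}
for rui, rate in sorted(rates.items()):
    print(f"RUI {rui}: {rate:6.1f} MB/s")

mean_rate = sum(rates.values()) / len(rates)
print(f"mean: {mean_rate:.1f} MB/s, max: {max(rates.values()):.1f} MB/s")
```

Every RUI comes out between roughly 64 and 92 MB/s, consistent with the ~78.5 MB/s quoted for an event size near 1000 bytes.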
SLINK FIFO 1 Mbyte
600 MB/s   480 MB/s
To fill the SLINK FIFO in 2.29 msec requires >200 MB/s even if the output is stopped
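The >200 MB/s bound can be checked with one division. The slide does not say how much of the FIFO must fill in the 2.29 msec; the sketch below assumes either the whole 1 Mbyte or only the half between Half_Full (where WARNING is set) and Full (where Out_Of_Sync is set). Both readings, the decimal megabyte, and the function name are assumptions.

```python
FIFO_BYTES = 1_000_000    # "1 Mbyte"; assumption: decimal megabyte
TRANSITION_S = 2.290e-3   # measured FMM transition time, 2.290 msec
INPUT_RATE_MB_S = 78.5    # approximate data rate quoted on the slide


def fill_rate_mb_per_s(bytes_to_fill, seconds):
    """Net input rate (MB/s) needed to add `bytes_to_fill` with output stopped."""
    return bytes_to_fill / seconds / 1e6


full_fifo = fill_rate_mb_per_s(FIFO_BYTES, TRANSITION_S)
half_fifo = fill_rate_mb_per_s(FIFO_BYTES / 2, TRANSITION_S)

print(f"fill whole FIFO:      {full_fifo:.1f} MB/s")
print(f"fill Half_Full->Full: {half_fifo:.1f} MB/s")
print(f"available input rate: {INPUT_RATE_MB_S} MB/s")
```

Under either reading the required rate is well above the ~78.5 MB/s actually delivered, which is the argument that real data volume cannot be filling the FIFO.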
60 Events in Run 141491 CMSSW data show bad transmission
Bad data, 0xBC50 idle code:
1960 826d bc50 bc50
0000 8000 bc50 bc50
0080 0000 bc50 bc50
8000 8000 bc50 bc50
0000 0000 bc50 bc50
0080 2c1e bc50 bc50
c0de c000 bc50 bc50

Good data:
1560 826d 6d0f 5080
0000 8000 0001 8000
0080 0000 1014 3f7f
8000 8000 ffff 8000
0000 0000 0000 2000
0080 2210 0006 a000

Two independent 3.2 Gbit links (each labeled 3.2 GB/s)
Transfer problem on the 3.2 Gbit backplane
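Events like the bad one above can be flagged automatically by scanning the 16-bit words for the idle code. The sketch below assumes the dump layout of four hex words per row; the function names are hypothetical, and the two dumps are copied from the slide.

```python
IDLE_CODE = 0xBC50  # idle code that appears in place of data

BAD_DUMP = """
1960 826d bc50 bc50
0000 8000 bc50 bc50
0080 0000 bc50 bc50
8000 8000 bc50 bc50
0000 0000 bc50 bc50
0080 2c1e bc50 bc50
c0de c000 bc50 bc50
"""

GOOD_DUMP = """
1560 826d 6d0f 5080
0000 8000 0001 8000
0080 0000 1014 3f7f
8000 8000 ffff 8000
0000 0000 0000 2000
0080 2210 0006 a000
"""


def parse_words(dump):
    """Turn a hex dump into rows of integers, one list per text line."""
    return [[int(word, 16) for word in line.split()]
            for line in dump.strip().splitlines()]


def idle_positions(rows, code=IDLE_CODE):
    """Return (row, column) of every word equal to the idle code."""
    return [(r, c) for r, row in enumerate(rows)
            for c, word in enumerate(row) if word == code]


bad_hits = idle_positions(parse_words(BAD_DUMP))
good_hits = idle_positions(parse_words(GOOD_DUMP))
print(f"bad dump:  {len(bad_hits)} idle words, columns {sorted({c for _, c in bad_hits})}")
print(f"good dump: {len(good_hits)} idle words")
```

In the bad dump every idle word sits in the last two columns (indices 2 and 3), while the first two columns still carry data. A real scan would also need to rule out 0xBC50 occurring as legitimate payload, which this sketch does not attempt.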
f308 7342 76b2 5164
01f0 5ae0 0e36 d900
1960 734d 5064 c0de
0000 8000 8000 76b2
0080 0000 3f7f 0001
8000 8000 8000 1014
0000 0000 2000 ffff
0080 be16 a000 0000
c0de c000 c000 0006
1960 86bd 5064 c0de
0000 8000 8000 76b3
0080 0000 3f7f 0001
8000 8000 8000 1014
0000 0000 2000 ffff
0080 2a10 a000 0000
c0de c000 c000 0006
1960 916d 5064 c0de
0000 8000 8000 76b4
0080 0000 3f7f 0001
8000 8000 8000 1014
0000 0000 2000 ffff
0080 5039 a000 0000
c0de c000 c000 0006
1960 960d 5064 c0de
How do we prove these events are causing the problem?
last column shift
Viewed several hundred bad transmission events. Only a small number of DDU->DCC links gave problems.
RUI755 DDU 25  most
RUI757 DDU 33  many
RUI751 DDU 7   a few
RUI751 DDU 3   a few
RUI756 DDU 35  one
RUI755 DDU 16  one
We will swap DDU 25 and see if the problems go away.
Possible Remedies to Problem
• Fix problem boards
• Reconfigure Xilinx RocketIOs
Channel bonding: lock-step data transmissions
16 bit -> 32 bit transfers: keep data packets together
• Change Clock Frequency in Firmware (divide by 2)
we don’t need 800 Mbyte/s
This is not urgent. We will proceed with caution.