avoiding metastability in fpga devices mapld 2009 david landoll applications architect mentor...
TRANSCRIPT
Avoiding Metastability in FPGA Devices
MAPLD 2009
David LandollApplications ArchitectMentor Graphics Corp.
Today’s FPGAs
Onboard Computer
System on Chip / FPGA
Memory Controller
Timers IRQCtrl
UART DMA Controller
AHB/APB Bridge
AMBA AHB
32bit Data bus
Data Memory Parity Memory
CAN Controller
HDLC Controller
SpaceWire Controller
AMBA APB
FPU
HDLC Controller
CAN Switch
Leon CPU
EDAC Controller
Boot PROM
SDRAM SDRAM
Address/Control bus
RS422 SpaceWire HDLC1
Up/Downlink HDLC0
Up/Downlink CAN0 CAN1
Dual CAN transceiver
1.5V Linear Regulator
+3.3V
+3.3V +1.5V
CLK Generator
Configuration PROM
JTAG
I/O port
PPS out PPS in
DMA Controller UART
CPU AMBAAHB
• Fabrication advances provide more available silicon area• More functionality can weigh less and take up less space• Integrating/reusing capabilities lowers cost
2
Flight Management
MaintenanceDiagnostics
Image Processing
Integrated Avionics Processing
Flight Control & Avoidance Systems
Boeing 787: Integration’s Next StepFrom its central processor to its common data network, surveillance system and navigation system, the theme of the Boeing 787 Dreamliner is integration.
James W. Ramsey Communications
Weather Radar
3
Integration Presents New Challenges
Such integration usually involves multiple independent clock domains, which leads to clock-domain crossings and metastability errors!
Clock Domain Crossing (CDC) ErrorsUnpredictable Loss of Data
CDC problems— corrupt control and data signals— are subtle, intermittent, unpredictable— are the 2nd major cause of respins— are difficult to reproduce and debug — are temperature, voltage, and process sensitive— will only occur in hardware; often in the final design
Traditional verification techniques do not work for CDC signals
4
A CDC Verification methodology is needed to reduce the risk of CDC related data errors
4
MetastabilityWhat the heck is it, anyway?
What is a clock?— Periodic pulsing signal— Digital logic uniformly connected to this signal— Acts as the Symphony Conductor – keeps logic in sync— Action happens across the logic at one specific point
Typically the “rising edge”
55
Vee, Vss: GND: 0V
Vcc, Vdd : +5V, +3.3V
MetastabilityWhat the heck is it, anyway?
What’s in a register?— (Also known as a latch, flip-flop, etc)— Contain transistors that “trap” the input value at the
appropriate time E.g. rising edge of the clock
— How does this happen?
66
MetastabilityThe Physics of a Register
Let’s take a look at a register— CMOS D-type transmission gate flip-flop
7
CLK
D Q
D
CLK
Q0
0
0
-- simple D-type flip-flop
process(CLK)
begin
if rising_edge(CLK) then
Q <= D;
end if;
end process;
7
Transistor Model of a D Flip-Flop
MetastabilityThe Physics of a Register
Let’s take a look at a register— CMOS D-type transmission gate flip-flop
8
CLK
D Q
D
CLK
Q1
0
0
-- simple D-type flip-flop
process(CLK)
begin
if rising_edge(CLK) then
Q <= D;
end if;
end process;
8
Transistor Model of a D Flip-Flop
MetastabilityThe Physics of a Register
Let’s take a look at a register— CMOS D-type transmission gate flip-flop
9
CLK
D Q
D
CLK
Q1
1
1
-- simple D-type flip-flop
process(CLK)
begin
if rising_edge(CLK) then
Q <= D;
end if;
end process;
9
Transistor Model of a D Flip-Flop
Transistor Model of a D flip-flop
MetastabilityThe Physics of a Register
Let’s take a look at a register— CMOS D-type transmission gate flip-flop
10
CLK
D Q
D
CLK
Q0
0
1
Only works if D has a “good value” at the rising edge of the clock(no Set-up/hold time violations)
-- simple D-type flip-flop
process(CLK)
begin
if rising_edge(CLK) then
Q <= D;
end if;
end process;
10
MetastabilityThe Physics of a Register
When setup/hold conditions are violated, the output of a storage element becomes unpredictable
This effect is called metastability If not contained, metastability can propagate…
11
din tffMTBF
clk
1
fclk = Clock Frequency
fin = Input Signal Frequency
td = Duration of critical time window
CLK
D
Q
CLK
D Q
Setup/hold window
Metastability is UNAVOIDABLE in designs with multiple asynchronous clocks
11
Clock Domain Crossing signal
Clock Domain CrossingsGuaranteed to Cause Metastability
When 2 or more designs run on disparate clocks:— The clocks will continually skew, guaranteeing setup/hold violations — Signals from one design to another are “Clock Domain Crossings” (CDCs)
12
D
CLK
Q
Sensor System Guidance System
TxTx
Clock B
Clock AClock A
Setup/hold window
Signals that cross asynchronous clock
domains (CDC signals) WILL violate setup and
hold conditions
12
D
CLK
Q
Mitigating Clock Domain Crossing Issues
Problem:— Signals crossing a clock domain will violate set-up/hold— Impact: Control/data signals will be dropped/corrupted
Loss of Data
Approaches:— Avoid having systems that have multiple clocks
Although sensible, it’s becoming impossible
— Design around the problem Designer can add “synchronizers” to the design Metastability still happens, but nobody else sees it
— E.g. 2DFF, FIFO, etc.— “Fences in” metastability
13
13
Isolate Metastability: Synchronizers
Designers add synchronizers to reduce the probability of metastable signals
Synchronizers are sub-circuits that can prevent metastable values from being sampled across clock domains
— Take unpredictable metastable signals and create predictable behavior
14
Mitigating Clock Domain Crossing IssuesIsolate Metastability: Synchronizers
15
Clock AClock A Clock BClock B
ii i +1i +1 i +2i +2i -1i -1ii i +1i +1 i +2i +2i -1i -1
TxTx
Metastability window
When metastability occurs, the delay through a synchronizer becomes unpredictable
RxRx
ii i +1i +1 i +2i +2i -1i -1 i +3i +3
15
Invalid Command
Synchronizer Delays Can Reconverge with unexpected resultswith unexpected results
CDC signals cross with an assumed relationship Can be combinational, sequential, or deeply sequential Unpredictable delays on CDC paths lead to reconvergence errors
— Designs need logic to correctly handle reconvergence— Can occur on single-bit or multiple-bit signals
16
S1
S2
S3 S4FS
M In
pu
t
tx_d0
tx_d2
tx_d1
Sync 2
Sync 1
Sync 0
0 0
00 1
00 1
0 0 0
00 0
00 1
0
Valid Command – but delayedG
rey E
nco
der
Gre
y D
eco
der
0 0
01 1
11 1
1 0 0
01 0
11 1
1
Sync 10
0
00 0
01 1
1
16
And, Synchronizers Fail if Misused
Synchronization between clock domains requires a transfer protocol
— Ensures data is predictably transferred between domains
These protocols must be verified
When protocol is violated— Data is lost— Simulation may not show a failure— Silicon will eventually show a
functional error
Synchronizer won’t function properly if the required Transfer Protocol is violated
17
Verification Must Cover All Three CDC Problems
18
Clock domain crossings need:— Structured synchronization— Transfer protocols— Global reconvergence checking
Reconvergence problem
Missing sync problem Possible
protocol problem
18
Mitigating Clock Domain Crossing Issues
Problem:— Signals crossing a clock domain will violate set-up/hold— Impact: Control/data signals will be dropped/corrupted
Approaches:— Avoid having systems that have multiple clocks— Designer can add “synchronizers” to the design— Designer-added synchronizers + full CDC verification
Assures synchronizers are present and used correctly
19
19
Recommendations
During design planning1. Create systems/designs using 1 clk, 1 edge when possible
2. If multiple clocks are required, try to use 1 designer for both clock domains, and use coding guidelines
1. Use signal naming conventions
2. Many clock domain errors come from design changes, not the initial design
3. Limit “clock domain crossings” to specific areas or blocks in the design, when possible.
4. NOTE: These techniques can help assure synchronizers are present, but are unlikely to help identify reconvergence or CDC protocol issues.
3. When multi-clock design is required, plan for proper verification
1. How to we accomplish this?
20
20
My_sig_Tx
My_data_Tx_RegMy_data_Rx_sync1
My_data_Rx_sync2
Comb_sig_Rx
For Example:
• Append “_A_reg” to signals leaving A-clk register, “_A” for A-clk combo signals
• Leverage during code reviews - help identify missing synchronizers
• Make sure ONLY _A_reg signals go to synchronizers (no combo logic)
Verifying CDC Synchronization
Problem:— Missing synchronizers will create metastability— Correctly placed but misused synchronizers won’t work— Reconvergence of synchronized signals can create
unexpected behavior Approaches:
— Simulation Digital logic simulators do NOT model transistor behavior Do not model “metastability”
21
21
For example …
22
Q
D
CLK
Simulation captures a ‘1’ while silicon produces
either a ‘1’ or ‘0’
Setup Violation
Q
D
CLK
Hold Violation
Simulation Does NOT Reflect Silicon Behavior
Q in silicon Q in silicon
Simulation captures a ‘0’ while silicon produces
either a ‘1’ or ‘0’
22
Q in simulation Q in simulation
Verifying CDC Synchronization
Problem:— Missing synchronizers will create metastability— Correctly placed but misused synchronizers won’t work— Reconvergence of synchronized Control logic bugs
Approaches:— Simulation
Won’t model CDC’s correctly to detect errors
— Static Timing Analysis Can be used to identify signals that cross domains Can be used as input for a manual review But…Won’t detect missing or incorrectly used synchronizers, or
reconvergence
23
23
Verifying CDC Synchronization
Problem:— Missing synchronizers will create metastability— Correctly placed but misused synchronizers won’t work— Reconvergence of synchronized Control logic bugs
Approaches:— Simulation
Won’t model CDC’s correctly to detect errors
— Static Timing Analysis Identifies signals for manual review, but otherwise useless
— Manual Design Reviews Error prone (and very time consuming) Typically only identifies synchronizer structures, misses
reconvergence and invalid sync protocol usage Evidence suggests at least some synchronizers will be missed
24
24
For Example…Trivial Reconvergence Error
Reconverging synchronized CDC signals - timing is unpredictable. Need to verify the downstream logic can handle variations
— Manually identifying the reconvergence is very hard— Manually identifying all possible behaviors is harder— Manually assuring logic will behave correctly – typically intractable
25
25
Verifying CDC Synchronization
Problem:— Missing synchronizers will create metastability— Correctly placed but misused synchronizers won’t work— Reconvergence of synchronized Control logic bugs
Approaches:— Simulation - Won’t model CDC’s correctly to detect errors
— Timing Analysis - Identifies signals for review, but otherwise useless
— Manual Design Reviews - error prone, incomplete
— Lab Verification? Problem is intermittent, debug is impossible
— Spice simulation? – It *does* model transistors, but… Where will you get the “Spice deck”? (transistor level model)
Would be far too slow on a large FPGA
26
26
Verifying CDC Synchronization
Problem:— Missing synchronizers will create metastability— Correctly placed but misused synchronizers won’t work— Reconvergence of synchronized Control logic bugs
Approaches:— So - we need a new method that reliably:
Identifies ALL CDC signals, structures, reconvergence Assures ALL connected, functioning correctly Creates reports for manual reviews The EDA industry has responded
— 6 commercial tools now available…and counting— But…most won’t identify all 3 of our CDC issues
27
27
Mentor’s CDC Verification Technology
Who’s using our technology? Mil-Aero
— Honeywell, Inc. — L-3 Communications — Lockheed Martin Co — Ministry of Aerospace & Aeronautics — Northrop Grumman Corp — Raytheon — Rockwell Collins Inc. — SAAB Group — Thales
Commercial— Widely used in commercial space
The market leader in CDC verificationThe market leader in CDC verification
28
28
Example Value from One Customer Design
— IEEE standard serial communications core— Used in 50-60 other COMMERCIAL ASIC products— Widely deployed (millions in use daily)
Placed core in a sensor guidance system— Found issues in the lab— Debugged FPGA for weeks— Suspected a CDC issue, but not sure…
Deployed Mentor’s CDC solution— Results same day— Found 199 serious CDC bugs!
■ 45 Missing Synchronizers■ 83 Incorrect Synchronizers■ 76 Reconverging Signals■ 11 other problems
— Most resulting from “more stressful” usage In production:
— Commercial ASIC : Customer issue – device is erratic, locks up— Avionics: Could result in an Airworthiness Directive
29
29
SummaryRecommendations
During design planning1. Create systems/designs using 1 clk, 1 edge when possible
2. If multiple clocks are required, try to use 1 designer for all clock domains
3. When multi-clock design is required, plan for proper verification
During verification1. Watch for multiple clocks in designs (Tip – Count PLLs)
2. Ask how CDC issues are mitigated (remember there are 3)
Utilize commercial tools designed for detecting these problems1. Verify all 3 classes of CDC problems
1. Structural Verification
2. Protocols Verification
3. Reconvergance Verification
2. Use reports to aid manual reviews
3. Use CDC tools to support ROBUSTNESS
30
30
In Conclusion …
Every multi-clock design is subject to metastability Traditional verification methodologies CANNOT
assure robustness
To properly mitigate the dangers of CDC, we strongly recommend a solution that… :
— Supports Manual Reviews— Automatically reports all sources of CDC problems— Has a proven CDC verification methodology &
customer success
31
31
32