landoll david mapld09 pres 1

Upload: pooja-khare

Post on 05-Apr-2018

224 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/2/2019 Landoll David Mapld09 Pres 1

    1/32

    Avoiding Metastability inFPGA Devices

    MAPLD 2009

    David Landoll

    Applications Architect

    Mentor Graphics Corp.

  • 8/2/2019 Landoll David Mapld09 Pres 1

    2/32

    Todays FPGAs

    Onboard Computer

    System on Chip /

    FPGA

    MemoryController

    Timers IRQCtrl

    UARTDMA

    Controller

    AHB/APBBridge

    AMBA AHB

    bit Data bus33

    Data Memory Parity Memory

    CAN

    Controller

    HDLC

    Controller

    SpaceWire

    Controller

    AMBA APB

    FPU

    HDLC

    Controller

    CAN

    Switch

    Leon CPU

    EDAC

    Controller

    BootPROM

    SDRAMSDRAM

    Address/Control bus

    RS33 SpaceWireHDLC3

    Up/DownlinkHDLC3

    Up/DownlinkCAN3CAN3

    Dual CAN

    transceiver

    . V Linear11

    Regulator

    + . V33

    + . V33 + . V33

    CLK

    Generator

    Configuration

    PROM

    JTAG

    I/O port

    PPS outPPS in

    DMAController

    UART

    CPUAMBAAHB

    Fabrication advances provide more available silicon area More functionality can weigh less and take up less space Integrating/reusing capabilities lowers cost

    2

  • 8/2/2019 Landoll David Mapld09 Pres 1

    3/32

    Flight Management

    MaintenanceDiagnostics

    Image Processing

    Integrated AvionicsProcessing

    Flight Control &AvoidanceSystems

    Boeing 787: Integrations Next StepFrom its central processor to its common data network,surveillance system and navigation system, the theme ofthe Boeing 787 Dreamliner is integration.

    James W. Ramsey Communications

    Weather Radar

    3

    Integration Presents New Challenges

    Such integration usually involves multiple independent clock domains,which leads to clock-domain crossings and metastability errors!

  • 8/2/2019 Landoll David Mapld09 Pres 1

    4/32

    Clock Domain Crossing (CDC) ErrorsUnpredictable Loss of Data

    CDC problems corrupt control and data signals

    are subtle, intermittent, unpredictable

    are the 2nd major cause of respins

    are difficult to reproduce and debug

    are temperature, voltage, and process sensitive

    willonly occur in hardware; often in the final design

    Traditional verification techniques do notwork for CDCsignals

    4

    A CDC Verification methodology is needed to reducethe risk of CDC related data errors

    4

  • 8/2/2019 Landoll David Mapld09 Pres 1

    5/32

    Metastability

    What the heck is it, anyway?

    What is a clock? Periodic pulsing signal

    Digital logic uniformly connected to this signal

    Acts as the Symphony Conductor keeps logic in sync

    Action happens across the logic at one specific point Typically the rising edge

    55

    Vee, Vss: GND: 0V

    Vcc, Vdd : +5V, +3.3V

  • 8/2/2019 Landoll David Mapld09 Pres 1

    6/32

    Metastability

    What the heck is it, anyway?

    Whats in a register? (Also known as a latch, flip-flop, etc)

    Contain transistors that trap the input value at the

    appropriate time E.g. rising edge of the clock

    How does this happen?

    66

  • 8/2/2019 Landoll David Mapld09 Pres 1

    7/32

    Metastability

    The Physics of a Register

    Lets take a look at a register CMOS D-type transmission gate flip-flop

    7

    CLK

    D Q

    D

    CLK

    Q0

    0

    0

    -- simple D-type flip-flopprocess(CLK)

    begin

    if rising_edge(CLK) then

    Q

  • 8/2/2019 Landoll David Mapld09 Pres 1

    8/32

    Metastability

    The Physics of a Register

    Lets take a look at a register CMOS D-type transmission gate flip-flop

    8

    CLK

    D Q

    D

    CLK

    Q1

    0

    0

    -- simple D-type flip-flopprocess(CLK)

    begin

    if rising_edge(CLK) then

    Q

  • 8/2/2019 Landoll David Mapld09 Pres 1

    9/32

    Metastability

    The Physics of a Register

    Lets take a look at a register CMOS D-type transmission gate flip-flop

    9

    CLK

    D Q

    D

    CLK

    Q1

    1

    1

    -- simple D-type flip-flopprocess(CLK)

    begin

    if rising_edge(CLK) then

    Q

  • 8/2/2019 Landoll David Mapld09 Pres 1

    10/32

    Transistor Model of a D flip-flop

    Metastability

    The Physics of a Register

    Lets take a look at a register CMOS D-type transmission gate flip-flop

    10

    CLK

    D Q

    D

    CLK

    Q0

    0

    1

    Only works if D has a good value at the rising edge of the clock

    (no Set-up/hold time violations)

    -- simple D-type flip-flopprocess(CLK)

    begin

    if rising_edge(CLK) then

    Q

  • 8/2/2019 Landoll David Mapld09 Pres 1

    11/32

    Metastability

    The Physics of a Register

    When setup/hold conditions are violated, the output of astorage element becomes unpredictable

    This effect is called metastability If not contained, metastability can propagate

    11

    din tff

    MTBFclk

    =3

    fclk = Clock Frequency

    fin = Input Signal Frequency

    td = Duration of critical time window

    CLK

    D

    Q

    CLK

    D Q

    Setup/hold window

    Metastability is UNAVOIDABLE in designs with multipleasynchronous clocks

    11

  • 8/2/2019 Landoll David Mapld09 Pres 1

    12/32

    Clock Domain Crossing signal

    Clock Domain Crossings

    Guaranteed to Cause Metastability

    When 2 or more designs run on disparate clocks: The clocks will continually skew, guaranteeing setup/hold violations

    Signals from one design to another are Clock Domain Crossings (CDCs)

    12

    D

    CLK

    Q

    Sensor System Guidance System

    TxTx

    Clock B

    Clock AClock A

    Setup/hold window

    Signals that crossasynchronous clock

    domains (CDC signals)WILL violate setup and hold

    conditions

    12

    D

    CLK

    Q

  • 8/2/2019 Landoll David Mapld09 Pres 1

    13/32

    Mitigating Clock Domain Crossing Issues

    Problem: Signals crossing a clock domain will violate set-up/hold

    Impact: Control/data signals will be dropped/corrupted Loss of Data

    Approaches: Avoid having systems that have multiple clocks

    Although sensible, its becoming impossible Design around the problem

    Designer can add synchronizers to the design Metastability still happens, but nobody else sees it

    E.g. 2DFF, FIFO, etc.

    Fences in metastability

    13 13

  • 8/2/2019 Landoll David Mapld09 Pres 1

    14/32

    Isolate Metastability: Synchronizers

    Designers add synchronizers to reduce the probability ofmetastable signals

    Synchronizers are sub-circuits that can prevent metastable valuesfrom being sampled across clock domains

    Take unpredictable metastable signals and create predictable behavior

    14

  • 8/2/2019 Landoll David Mapld09 Pres 1

    15/32

    Mitigating Clock Domain Crossing Issues

    Isolate Metastability: Synchronizers

    15

    QQ

    Clock AClock A Clock BClock B

    ii i +1i +1 i +2i +2i -1i -1ii i +1i +1 i +2i +2i -1i -1

    TxTx

    Metastability window

    When metastability occurs, the delay through asynchronizer becomes unpredictable

    RxRx

    ii i +1i +1 i +2i +2i -1i -1 i +3i +3

    15

  • 8/2/2019 Landoll David Mapld09 Pres 1

    16/32

    Invalid Command

    Synchronizer Delays Can Reconverge

    with unexpected resultswith unexpected results

    CDC signals cross with an assumed relationship Can be combinational, sequential, or deeply sequential Unpredictable delays on CDC paths lead to reconvergence errors

    Designs need logic to correctly handle reconvergence Can occur on single-bit or multiple-bit signals

    16

    S1

    S2

    S3 S4

    FSM

    Input

    tx_d0

    tx_d2

    tx_d1

    Sync 2

    Sync 1

    Sync 0

    000

    010

    010

    000

    000

    010

    Valid Command but delayed

    Grey

    Encode

    r

    GreyDecod

    er

    000

    111

    111

    000

    101

    111

    Sync 1

    000

    000

    111

    16

  • 8/2/2019 Landoll David Mapld09 Pres 1

    17/32

    And, Synchronizers Fail if Misused

    Synchronization between clockdomains requires a transferprotocol

    Ensures data ispredictablytransferred between domains

    These protocols mustbe verified When protocol is violated

    Data is lost

    Simulation may notshow a failure

    Silicon willeventually show a

    functional error

    Synchronizer wont function properly if the requiredTransfer Protocol is violated

    17

  • 8/2/2019 Landoll David Mapld09 Pres 1

    18/32

    Verification Must Cover

    All Three CDC Problems

    18

    Clock domain crossings need: Structured synchronization

    Transfer protocols

    Global reconvergence checking

    Reconvergenceproblem

    Missing syncproblem Possible protocol

    problem

    18

  • 8/2/2019 Landoll David Mapld09 Pres 1

    19/32

    Mitigating Clock Domain Crossing Issues

    Problem: Signals crossing a clock domain will violate set-up/hold

    Impact: Control/data signals will be dropped/corrupted

    Approaches: Avoid having systems that have multiple clocks Designer can add synchronizers to the design

    Designer-added synchronizers + full CDC verification Assures synchronizers are present and used correctly

    19 19

  • 8/2/2019 Landoll David Mapld09 Pres 1

    20/32

    Recommendations

    During design planning1. Create systems/designs using 1 clk, 1 edge when possible

    2. If multiple clocks are required, try to use 1 designer for both clock

    domains, and use coding guidelines

    1. Use signal naming conventions

    2. Many clock domain errors come from design changes, not the initial design

    3. Limit clock domain crossings to specific areas or blocks in the design, when possible.

    4. NOTE: These techniques can help assure synchronizers are present, but are unlikely

    to help identify reconvergence or CDC protocol issues.

    3. When multi-clock design is required, plan for proper verification

    1. How to we accomplish this?

    20 20

    My_sig_Tx

    My_data_Tx_RegMy_data_Rx_sync1

    My_data_Rx_sync2

    Comb_sig_Rx

    or Example:

    Append _A_reg to signals leaving A-clk register, _A for A-clk combo signals

    Leverage during code reviews - help identify missing synchronizers

    Make sure ONLY _A_reg signals go to synchronizers (no combo logic)

  • 8/2/2019 Landoll David Mapld09 Pres 1

    21/32

    Verifying CDC Synchronization

    Problem: Missing synchronizers will create metastability

    Correctly placed but misused synchronizers wont work

    Reconvergence of synchronized signals can create unexpected

    behavior Approaches: Simulation

    Digital logic simulators do NOT model transistor behavior Do not model metastability

    21 21

  • 8/2/2019 Landoll David Mapld09 Pres 1

    22/32

    For example

    22

    Q

    D

    CLK

    Simulation captures a 1 while

    silicon produces either a 1 or0

    Setup Violation

    Q

    D

    CLK

    Hold Violation

    Simulation Does NOT Reflect Silicon Behavior

    Q in silicon Q in silicon

    Simulation captures a 0 while

    silicon produces either a 1 or0

    22

    Q in simulation Q in simulation

  • 8/2/2019 Landoll David Mapld09 Pres 1

    23/32

    Verifying CDC Synchronization

    Problem: Missing synchronizers will create metastability

    Correctly placed but misused synchronizers wont work

    Reconvergence of synchronizedControl logic bugs

    Approaches: Simulation Wont model CDCs correctly to detect errors Static Timing Analysis

    Can be used to identify signals that cross domains Can be used as input for a manual review ButWont detect missing or incorrectly used synchronizers, or

    reconvergence

    23 23

  • 8/2/2019 Landoll David Mapld09 Pres 1

    24/32

    Verifying CDC Synchronization

    Problem: Missing synchronizers will create metastability

    Correctly placed but misused synchronizers wont work

    Reconvergence of synchronizedControl logic bugs

    Approaches: Simulation Wont model CDCs correctly to detect errors Static Timing Analysis

    Identifies signals for manual review, but otherwise useless Manual Design Reviews

    Error prone (and very time consuming) Typically only identifies synchronizer structures, misses reconvergence

    and invalid sync protocol usage

    Evidence suggests at least some synchronizers will be missed

    24 24

  • 8/2/2019 Landoll David Mapld09 Pres 1

    25/32

    For ExampleTrivial Reconvergence Error

    Reconverging synchronized CDC signals - timing is unpredictable. Need to verify the downstream logic can handle variations Manually identifying the reconvergence is very hard

    Manually identifying all possible behaviors is harder

    Manually assuring logic will behave correctly typically intractable

    25 25

  • 8/2/2019 Landoll David Mapld09 Pres 1

    26/32

    Verifying CDC Synchronization

    Problem: Missing synchronizers will create metastability

    Correctly placed but misused synchronizers wont work

    Reconvergence of synchronizedControl logic bugs

    Approaches: Simulation - Wont model CDCs correctly to detect errors Timing Analysis -Identifies signals for review, but otherwise useless

    Manual Design Reviews - error prone, incomplete

    Lab Verification?

    Problem is intermittent, debug is impossible Spice simulation? It *does* model transistors, but

    Where will you get the Spice deck? (transistor level model) Would be far too slow on a large FPGA

    26 26

  • 8/2/2019 Landoll David Mapld09 Pres 1

    27/32

    Verifying CDC Synchronization

    Problem: Missing synchronizers will create metastability Correctly placed but misused synchronizers wont work

    Reconvergence of synchronizedControl logic bugs

    Approaches: So - we need a new method that reliably: Identifies ALL CDC signals, structures, reconvergence

    Assures ALL connected, functioning correctly Creates reports for manual reviews The EDA industry has responded

    6 commercial tools now availableand counting Butmost wont identify all 3 of our CDC issues

    27 27

  • 8/2/2019 Landoll David Mapld09 Pres 1

    28/32

    Mentors CDC Verification Technology

    Whos using our technology?Mil-Aero Honeywell, Inc. L-3 Communications Lockheed Martin Co Ministry of Aerospace & Aeronautics

    Northrop Grumman Corp Raytheon Rockwell Collins Inc. SAAB Group Thales Commercial Widely used in commercial space

    The market leader in CDC verificationThe market leader in CDC verification

    28 28

  • 8/2/2019 Landoll David Mapld09 Pres 1

    29/32

    Example Value from One Customer

    Design IEEE standard serial communications core Used in 50-60 other COMMERCIAL ASIC products Widely deployed (millions in use daily) Placed core in a sensor guidance system Found issues in the lab Debugged FPGA for weeks Suspected a CDC issue, but not sure

    Deployed Mentors CDC solution Results same day Found 199 serious CDC bugs!

    45 Missing Synchronizers

    83 Incorrect Synchronizers

    76 Reconverging Signals

    11 other problems Most resulting from more stressful usage In production: Commercial ASIC : Customer issue device is erratic, locks up Avionics: Could result in an Airworthiness Directive

    29 29

    S

  • 8/2/2019 Landoll David Mapld09 Pres 1

    30/32

    Summary

    Recommendations

    During design planning

    1. Create systems/designs using 1 clk, 1 edge when possible

    2. If multiple clocks are required, try to use 1 designer for all clock domains

    3. When multi-clock design is required, plan for proper verification

    During verification

    1. Watch for multiple clocks in designs (Tip Count PLLs)

    2. Ask how CDC issues are mitigated (remember there are 3)

    Utilize commercial tools designed for detecting these problems

    1. Verify all 3 classes of CDC problems

    1. Structural Verification

    2. Protocols Verification

    3. Reconvergance Verification

    2. Use reports to aid manual reviews

    3. Use CDC tools to support ROBUSTNESS

    30 30

  • 8/2/2019 Landoll David Mapld09 Pres 1

    31/32

    In Conclusion

    Every multi-clock design is subject to metastability Traditional verification methodologies CANNOT assurerobustness

    To properly mitigate the dangers of CDC, we stronglyrecommend a solution that :

    Supports Manual Reviews

    Automatically reports all sources of CDC problems

    Has a proven CDC verification methodology &

    customer success

    31 31

  • 8/2/2019 Landoll David Mapld09 Pres 1

    32/32

    32