
A Simplified Approach to Fault Tolerant State Machine Design for Single Event Upsets

Melanie Berg


Berg, D219 / MAPLD 2004

Overview

This presentation describes “Hardened by Design” techniques at a high level of abstraction: FPGA/ASIC logic design

Background:
— Definition of Fault Tolerance
— State Machines
— Synchronous Design Theory

Proposed method of SEU detection

Proposed method of SEU correction


Definition of Fault Tolerance

Masking or recovering from erroneous conditions in a system once they have been detected

The degree of fault tolerance implemented is defined by your system level requirements, i.e. what behavior is actually acceptable upon error

Questions that must be answered within the system requirements documentation:

— Does your system only need to detect an error?
— How quickly must the system respond to an error?
— Must your system also correct the error?
— Is the system susceptible to more than one error per clock cycle?


Synchronous Design with Asynchronous Events

This discussion focuses on sequential Single Event Upsets (SEUs) within a synchronous design environment.

The SEU is considered a soft (temporary) error which has occurred due to a DFF being hit by a charged particle.

Configuration or SRAM errors will not be considered

Although the design is synchronous, it is very important to note that the SEU is an asynchronous event…
— Generally not taken into account
— Metastability and unpredictable events can occur
— Can invoke a SEFI (Single Event Functional Interrupt)


Common Fault Tolerant Implementation

Triple Modular Redundancy (TMR) is the most commonly implemented solution for SEU tolerance.
— Why? Because it is a very simple solution

In many cases it is not implemented correctly
— Glitches within the TMR voting logic (due to mitigation across separate clock domains or hazardous combinational logic) must be taken into account in case an SEU occurs near a clock edge

TMR can be very area-intensive

[Figure: three D flip-flops (each with D, Q, SET and CLR pins) feeding a Voting Logic block]
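The voting step can be sketched as a small behavioral model. This is only an illustration in Python, not synthesizable logic: the function name `majority` and the 8-bit register value are assumptions made for the example, and a zero-delay software model cannot show the glitches discussed above.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant register copies."""
    return (a & b) | (a & c) | (b & c)


# Three copies of the same (hypothetical) 8-bit register value.
copies = [0b10110010, 0b10110010, 0b10110010]

# Model an SEU: flip bit 3 in the third copy only.
copies[2] ^= 1 << 3

voted = majority(*copies)
print(f"{voted:08b}")  # the upset copy is outvoted by the other two
```

The voter masks the upset only in the steady state. Whether the real gates glitch while one input is changing depends on how the voter is synthesized and routed, which is the concern raised on this slide.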


Glitches in TMR Circuitry: Example

[Figure: a TMR circuit with inputs A, B and C drives OutSig into a 32-bit counter; other labels: E, sysclk, Reset]

For this example, C will be hit by an SEU. The TMR logic should stay stable. However, poor TMR circuitry was synthesized and a glitch occurs on OutSig.

If OutSig glitches near a clock edge, unpredictable results occur within the counter.


Glitchy TMR Circuitry Continued


Proposed EDAC Methodology

Goal: The proposed EDAC techniques are:
— Targeted for synchronous Finite State Machine designs
— Less area-intensive than TMR
— Glitch-free and synchronous: reduces the rate of SEFI

Note: Synchronous Design techniques referred to in this presentation are derived from the ASIC industry and are implemented using HDL…

— DFF data inputs should not change within the setup and hold window of the DFF; otherwise metastability and unpredictable functionality can occur
— Within a synchronous design, metastability will only happen at clock domain crossings… Must use metastability filters (synchronizers) to protect against these asynchronous events
— Synchronous design theory minimizes clock boundary crossings
— This is a challenge when SEUs can occur at any point in time, anywhere in the circuit


Synchronous State Machines

A Finite State Machine (FSM) is designed to deterministically transition through a pattern of defined states

A synchronous FSM utilizes flip-flops to hold its current state, transitions according to a clock edge and only accepts inputs that have been synchronized to the same clock

Generally FSMs are utilized as control mechanisms

Concern/Challenge:
— If an SEU occurs within an FSM, the entire system can lock up in an unreachable state: SEFI!


Synchronous State Machines

The structure consists of four major parts:
— Inputs
— Current State Register
— Next State Logic
— Output Logic

[Figure: FSM block diagram with labels Inputs, Next State, Current State, Clock and Outputs]


Encoding Schemes

Each state of a FSM must be mapped into some type of encoding (pattern of bits)

Once the state is mapped, it is then considered a defined (legal) state

Unmapped bit patterns are illegal states

Example: Five states need to be mapped. There is only one input: Start.

[Figure: state diagram with states IDLE, GetData, ProcessData, SendData and BadData; transitions labeled Start=0 and Start=1]


Encoding Schemes

STATES (5), binary encoding (3 registers):
IDLE         : 000
GET_DATA     : 001
PROCESS_DATA : 010
BAD_DATA     : 011
SEND_DATA    : 100

Good state: 1 0 0 (SEND_DATA)
Bad state: 1 1 0 (unmapped)

STATES (5), One-Hot encoding (5 registers):
IDLE         : 00001
GET_DATA     : 00010
PROCESS_DATA : 00100
BAD_DATA     : 01000
SEND_DATA    : 10000

Good state: 1 0 0 0 0 (SEND_DATA)
Bad state: unmapped (two registers set)
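To make the difference between the two encodings concrete, the sketch below enumerates every single-bit flip of every legal state and counts how many land on another legal state versus an unmapped pattern. It uses the two encoding tables from this slide; the function name `flip_outcomes` is an assumption for illustration.

```python
BINARY = {"IDLE": 0b000, "GET_DATA": 0b001, "PROCESS_DATA": 0b010,
          "BAD_DATA": 0b011, "SEND_DATA": 0b100}
ONE_HOT = {"IDLE": 0b00001, "GET_DATA": 0b00010, "PROCESS_DATA": 0b00100,
           "BAD_DATA": 0b01000, "SEND_DATA": 0b10000}


def flip_outcomes(encoding: dict, width: int) -> tuple:
    """Return (legal, illegal): the number of single-bit flips of a legal
    state that land on another legal state or on an unmapped pattern."""
    legal_codes = set(encoding.values())
    legal = illegal = 0
    for code in legal_codes:
        for bit in range(width):
            if code ^ (1 << bit) in legal_codes:
                legal += 1
            else:
                illegal += 1
    return legal, illegal


binary_result = flip_outcomes(BINARY, 3)
one_hot_result = flip_outcomes(ONE_HOT, 5)
print(binary_result)   # (10, 5)
print(one_hot_result)  # (0, 25)
```

With this binary table, 10 of the 15 possible single-bit upsets move the FSM to another legal state. With the one-hot table, all 25 single-bit upsets produce an unmapped pattern.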


Safe State Machines???

A “Safe” State Machine has been defined as one that:
— Has a set of defined states
— Can deterministically jump to a defined state if an illegal state has been reached (due to an SEU)

Synthesis tools offer a “Safe” option (demand from our industry):

    TYPE states IS ( IDLE, GET_DATA, PROCESS_DATA, SEND_DATA, BAD_DATA );
    SIGNAL current_state, next_state : states;
    attribute SAFE_FSM : Boolean;
    attribute SAFE_FSM of states : type is true;

However… Designers beware!
— The synthesis tools' “Safe” option is not deterministic if an SEU occurs near a clock edge!


Binary Encoding: How Safe is the “Safe” Attribute?

If a Binary encoded FSM flips into an illegal (unmapped) state, the safe option will return the FSM to a known state that is defined by the others or default clause

If a Binary encoded FSM flips into a legal (mapped) state, this error will go undetected.

— If the FSM is controlling a critical output, this phenomenon can be very detrimental!

— How safe is this?


Safe State Machines???

STATES (5):
IDLE      : 000
TURNON_A  : 001
TURNOFF_A : 010
TURNON_B  : 011
TURNOFF_B : 100

State(1) flips upon SEU:

Example 1: Good state 1 0 0 becomes 1 1 0, an illegal (unmapped) state. Using the “Safe” attribute will transition the user to a specified legal state upon an SEU.

Example 2: Good state TURNON_A (0 0 1) becomes 0 1 1, the legal state TURNON_B. Using the “Safe” attribute will not detect the SEU: this could cause detrimental behavior.
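The two cases can be modeled in a few lines. This is a behavioral sketch, not what a synthesis tool actually emits: `safe_recover` and `upset` are hypothetical helper names, and the choice of IDLE as the recovery state is an assumption standing in for whatever the others/default clause specifies.

```python
STATES = {0b000: "IDLE", 0b001: "TURNON_A", 0b010: "TURNOFF_A",
          0b011: "TURNON_B", 0b100: "TURNOFF_B"}
RECOVERY_STATE = 0b000  # assumption: the "others" clause returns to IDLE


def upset(code: int, bit: int) -> int:
    """Flip one bit of the state register, modeling an SEU."""
    return code ^ (1 << bit)


def safe_recover(code: int) -> int:
    """Unmapped codes go to the recovery state; mapped codes pass through
    unchanged, because nothing marks them as wrong."""
    return code if code in STATES else RECOVERY_STATE


case_illegal = safe_recover(upset(0b100, 1))  # 110 is unmapped
case_legal = safe_recover(upset(0b001, 1))    # 011 is TURNON_B

print(STATES[case_illegal])  # IDLE
print(STATES[case_legal])    # TURNON_B
```

In the second case the model has no way to tell that TURNON_B was reached by an upset rather than by a legitimate transition.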


One-Hot vs. Binary

There used to be a consensus suggesting that Binary is “safer” than One-Hot
— Based on the idea that One-Hot requires more DFFs to implement an FSM and thus has a higher probability of incurring an error

This view has changed!
— Most of the community now understands that although One-Hot requires more registers, it has the built-in detection that is necessary for safe design
— Binary encoding can lead to a very “un-safe” design


Proposed SEU Error Detection: One-Hot

One-Hot requires that exactly one bit be active high in any clock period

If more than one bit (or no bit) is turned on, then an error will be detected.

Combinational XNOR over the FSM bits is sufficient for SEU detection… even if an SEU occurs near a clock edge

A MUX can be used to transition the current state into a defined “ERROR STATE” if the parity check fails

If the system cannot receive Multiple Event Upsets within one clock period, then the circuitry can never flip into a legal state (illegally)!

[Figure: FSM block diagram (Inputs, Next State, Current State, Clock, Outputs) extended with XNOR combinational logic, a two-flip-flop metastability filter to synchronize the SEU, and a MUX that selects the Error State Pattern]
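A behavioral sketch of the parity check and the MUX follows. The names `xnor_reduce` and `mux` are hypothetical, and reusing the BAD_DATA pattern 01000 as the error state is an assumption made only for this example. The metastability filter from the diagram is not modeled.

```python
from functools import reduce
from operator import xor

WIDTH = 5
ERROR_STATE = 0b01000  # assumption: reuse the BAD_DATA one-hot pattern


def bits(code: int) -> list:
    """Split a state code into its individual bits, LSB first."""
    return [(code >> i) & 1 for i in range(WIDTH)]


def xnor_reduce(code: int) -> int:
    """XNOR over all state bits: 1 when an even number of bits is set."""
    return 1 - reduce(xor, bits(code))


def mux(current: int, next_state: int) -> int:
    """Select the error pattern when the parity check on the current
    state fails, otherwise pass the normal next state through."""
    return ERROR_STATE if xnor_reduce(current) else next_state


print(xnor_reduce(0b10000))                 # 0: legal one-hot state
print(xnor_reduce(0b10001))                 # 1: two bits set
print(xnor_reduce(0b00000))                 # 1: no bit set
print(format(mux(0b10001, 0b00001), "05b")) # 01000
```

A single flip always changes the number of set bits from one to zero or two, so the parity check catches it. Two flips in the same clock period can restore odd parity, which is why the slide restricts the claim to systems that cannot receive multiple upsets within one period.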


FSM SEU: Error Correction : Using Companion States

There exist many publications on Error Correction theory.

None directly address how to correctly implement FSM fault correction while using current day synthesis tools.
— Glitch control: Generally synthesis tools will produce “glitchy” logic
— Synthesis “optimization” algorithms will erase the redundancy that is necessary for EDAC
— The user must sometimes hand-instantiate logic
— The user must place the necessary attributes to avoid redundant logic erasure


Error Correction within One Cycle: Using Companion States

We’ll base the derivation on a 4-state FSM:

[Figure: original FSM state diagram]
STATEA: Outsig = '1'
STATEB: Outsig = '0'
STATEC: Outsig = '0'
STATED: Outsig = '0'
Transition labels: Intrans = '1', Intrans = '0'


Error Correction within One Cycle: Using Companion States

1. Find an encoding such that the states have a Hamming distance of 3 (at least 3 bits must differ between any two states):
— 00000 (state A)
— 11100 (state B)
— 01111 (state C)
— 10011 (state D)

Five bits are necessary to encode a four-state machine in order to achieve the required Hamming distance of three.
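The distance requirement is easy to check mechanically. The sketch below computes the pairwise Hamming distances of the four encodings listed above; the helper name `hamming` is an assumption for illustration.

```python
from itertools import combinations

CODES = {"A": 0b00000, "B": 0b11100, "C": 0b01111, "D": 0b10011}


def hamming(x: int, y: int) -> int:
    """Number of bit positions in which x and y differ."""
    return bin(x ^ y).count("1")


distances = {(p, q): hamming(CODES[p], CODES[q])
             for p, q in combinations(CODES, 2)}
min_distance = min(distances.values())

for pair, d in distances.items():
    print(pair, d)
print("minimum:", min_distance)  # minimum: 3
```

A minimum distance of 3 means that a single flipped bit leaves the register closer to its original state than to any other, which is what makes single-error correction possible.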


Error Correction within One Cycle: Using Companion States

2. For each encoding, calculate the companion encodings such that the Hamming distance is one. For example:
— Companion encodings for state A (00000): 00001, 00010, 00100, 01000, 10000
— Companion encodings for state B (11100): 11101, 11110, 11000, 10100, 01100
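Each state has one companion per bit position, obtained by flipping that bit. A minimal sketch that generates them (the function name `companions` is hypothetical):

```python
WIDTH = 5
CODES = {"A": 0b00000, "B": 0b11100, "C": 0b01111, "D": 0b10011}


def companions(code: int) -> list:
    """All encodings at Hamming distance one from `code`."""
    return [code ^ (1 << bit) for bit in range(WIDTH)]


companions_a = sorted(format(c, "05b") for c in companions(CODES["A"]))
companions_b = sorted(format(c, "05b") for c in companions(CODES["B"]))
print(companions_a)  # ['00001', '00010', '00100', '01000', '10000']
print(companions_b)  # ['01100', '10100', '11000', '11101', '11110']

# With a minimum distance of 3 between states, no encoding is shared:
# 4 states x (1 encoding + 5 companions) = 24 distinct patterns.
all_patterns = [p for code in CODES.values() for p in [code] + companions(code)]
print(len(set(all_patterns)))  # 24
```

Because the 24 patterns are distinct, every pattern within one bit flip of a state identifies that state unambiguously.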


Error Correction within One Cycle: Using Companion States

When implementing the state machine, state A is encoded as 00000 and then (theoretically) “OR-ed” with all of its companion encodings. This covers all possible single-bit SEUs.

Do the same for all other states.

Use the output of the “OR-ed” states to determine the next state logic.
— Thus if a bit flips, the companion state will catch it and the FSM will be able to correctly determine the next state

Be careful! The “OR” logic is more complex than simply using a string of “OR” gates.
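In a behavioral model, the “OR-ed” decode amounts to a lookup that maps each encoding and its companions to the same state. The sketch below is an illustration only. The transition table `NEXT` is hypothetical (a simple ring A, B, C, D advanced when `intrans` is 1), since the original state diagram is not reproduced here, and `next_code` is an assumed name.

```python
WIDTH = 5
CODES = {"A": 0b00000, "B": 0b11100, "C": 0b01111, "D": 0b10011}

# Hypothetical transitions: advance around a ring when intrans = 1,
# hold the current state when intrans = 0.
NEXT = {"A": "B", "B": "C", "C": "D", "D": "A"}

# Decode table: each state's encoding "OR-ed" with its five companions.
DECODE = {}
for name, code in CODES.items():
    DECODE[code] = name
    for bit in range(WIDTH):
        DECODE[code ^ (1 << bit)] = name


def next_code(current: int, intrans: int) -> int:
    """Next-state logic driven by the decoded state, so a single flipped
    bit in `current` does not change the result."""
    state = DECODE[current]  # raises KeyError for patterns 2+ flips away
    return CODES[NEXT[state]] if intrans else CODES[state]


upset_b = CODES["B"] ^ 0b00100  # state B with one bit flipped
print(format(next_code(upset_b, 1), "05b"))  # 01111, state C
print(format(next_code(0b00010, 0), "05b"))  # 00000, state A restored
```

Because the next state is always written back as a clean encoding, the upset is removed on the following clock edge whether the FSM advances or holds.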


Error Correction within One Cycle: Glitch Control

One major issue that is frequently overlooked is SEUs occurring near clock edges.

If this occurs, your error checking logic may cause a glitch.

Due to routing timing differences, this can cause incorrect values to be latched into the current state registers.

Refer to a Karnaugh map for a glitch-free implementation.

The designer may have to hand-instantiate the logic if the synthesis tool does not adhere to the VHDL as expected.


Error Correction within One Cycle: Glitch Control

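[Figure: Karnaugh map for StateA and its companion states over State(0) through State(3), with both axes in Gray-code order 00, 01, 11, 10; five cells are marked 1]

StateA companion states SOP (including the State(4) dimension), writing $S_i$ for State($i$) and using an overbar for the inverted bit:

\[
\overline{S_0}\,\overline{S_1}\,\overline{S_2}\,\overline{S_3}
+ \overline{S_0}\,\overline{S_1}\,\overline{S_2}\,\overline{S_4}
+ \overline{S_0}\,\overline{S_1}\,\overline{S_3}\,\overline{S_4}
+ \overline{S_0}\,\overline{S_2}\,\overline{S_3}\,\overline{S_4}
+ \overline{S_1}\,\overline{S_2}\,\overline{S_3}\,\overline{S_4}
\]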
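The five-term sum of products can be checked exhaustively. The sketch below assumes every bit appears in inverted form in each product term, which is what the StateA encoding 00000 requires. It verifies two things: that the SOP is true exactly for StateA and its companions, and that every pair of adjacent true inputs is covered by a single product term, which is the usual condition for avoiding a static-1 hazard when one bit changes. All names are hypothetical.

```python
from itertools import combinations, product

WIDTH = 5
# Each product term is the set of four bit positions that must all be 0.
TERMS = [frozenset(c) for c in combinations(range(WIDTH), 4)]


def term_true(term, bits) -> bool:
    """A product term is true when all of its four bits are 0."""
    return all(bits[i] == 0 for i in term)


def sop(bits) -> bool:
    """Sum of products: true when any of the five terms is true."""
    return any(term_true(t, bits) for t in TERMS)


all_inputs = list(product((0, 1), repeat=WIDTH))
on_set = [b for b in all_inputs if sop(b)]

# The SOP should be true exactly when at most one bit is set.
matches_spec = all(sop(b) == (sum(b) <= 1) for b in all_inputs)


def adjacent(x, y) -> bool:
    """True when x and y differ in exactly one bit."""
    return sum(p != q for p, q in zip(x, y)) == 1


# Every adjacent pair of true inputs must share one product term.
hazard_free = all(
    any(term_true(t, x) and term_true(t, y) for t in TERMS)
    for x, y in combinations(on_set, 2) if adjacent(x, y)
)

print(len(on_set), matches_spec, hazard_free)  # 6 True True
```

This only checks the logic equation. Whether the synthesized netlist keeps all five terms is a separate question, since an optimizer may restructure or merge them.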


Error Correction within One Cycle: Glitch Control

The designer will have to include synthesis directives in order to turn off the tool's “optimization”:
— Preserve_driver
— Preserve_signal

Always check the gate-level output of the synthesis tool.


Conclusion

This presentation proposes methods for implementing fault tolerant state machines in ICs that are potentially susceptible to SEUs.

Be aware of potential glitches due to asynchronous SEUs occurring near a clock edge…
— Mitigation techniques must be glitch-free!
— Mitigation may need a synchronization circuit
— Due to metastability and routing delay differences, an SEU can be more catastrophic than expected

Special directives must be used in order to drive the synthesis tools when implementing fault tolerant redundant logic, because the tools are generally focused on area and speed optimization.