On-chip-redundancy

On-chip-redundancy according to IEC 61508, Annex E: Analysis of a Combination of a Software Virtualization Layer and a SPEAr1310 System-on-Chip (ARM Cortex-A9 Dual-Core)

Andreas Buchwieser, Wind River - Germany

International TÜV Rheinland Symposium in China: Functional Safety in Industrial Applications, 18 – 19 October 2011, Shanghai - China

Agenda

• Definitions
• Consolidation through Virtualization
• Compliant Item
• Analysis Approach
• Hardware Random Failures
• Common Cause Failures
• Outlook
• Summary

Definitions

• Virtualization: abstraction of computer resources, hiding the physical characteristics
• Hypervisor: virtualization platform that allows multiple operating systems to run on a host computer at the same time (Wikipedia)
• Virtual Board: environment for one operating system or bare application; has physical and/or virtual hardware controlled by the Hypervisor

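Consolidation through Virtualization (HFT 0)

[Figure: single-channel architecture (hardware fault tolerance 0). One processing element (PE) drives the Equipment Under Control. On shared hardware, the virtualization mechanism (WR Hypervisor) hosts Virtual Board 1 and Virtual Board 2, which between them run a COTS OS with an application and VxWorks Cert with a safe application.]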

Safety Standard IEC 61508

• “Where the software is to implement both safety and non-safety functions, then all of the software shall be treated as safety-related, unless adequate design measures ensure that the failures of non-safety functions cannot adversely affect safety functions.”

• “... all of the software shall be treated as belonging to the highest SIL, unless adequate independence between the safety functions of the different safety integrity levels can be shown in the design. It shall be demonstrated either (1) that independence is achieved both in the spatial and temporal domain, or (2) that any violation of independence is controlled.”

[source: IEC 61508-3, Edition 2.0, Dated: 2010-04]
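The spatial and temporal independence named in the quote can be illustrated with a small configuration check. The sketch below is not a WR Hypervisor API; the `VirtualBoard` fields, the memory addresses and the time-slot values are all hypothetical and serve only to show what "no overlap in space and time" could mean for a set of virtual boards.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class VirtualBoard:
    """Hypothetical description of one virtual board (not a real hypervisor type)."""
    name: str
    mem_start: int      # first address of the board's private memory region
    mem_size: int       # size of that region in bytes
    core: int           # physical core the board is scheduled on
    slot_start_us: int  # start of the board's time slot within the schedule frame
    slot_len_us: int    # length of the time slot in microseconds


def _overlap(start_a, len_a, start_b, len_b):
    """True if the half-open intervals [start, start + len) intersect."""
    return start_a < start_b + len_b and start_b < start_a + len_a


def independence_violations(boards):
    """List pairwise spatial (memory) and temporal (same-core time slot) overlaps."""
    issues = []
    for a, b in combinations(boards, 2):
        if _overlap(a.mem_start, a.mem_size, b.mem_start, b.mem_size):
            issues.append(f"spatial: {a.name} and {b.name} share memory")
        if a.core == b.core and _overlap(
            a.slot_start_us, a.slot_len_us, b.slot_start_us, b.slot_len_us
        ):
            issues.append(f"temporal: {a.name} and {b.name} share a time slot")
    return issues


# Hypothetical configurations: the first is cleanly separated, the second is not.
separated = [
    VirtualBoard("safe_vb", 0x1000_0000, 0x0100_0000, 0, 0, 500),
    VirtualBoard("cots_vb", 0x2000_0000, 0x0100_0000, 0, 500, 500),
]
conflicting = [
    VirtualBoard("safe_vb", 0x1000_0000, 0x0100_0000, 0, 0, 500),
    VirtualBoard("cots_vb", 0x1080_0000, 0x0100_0000, 0, 400, 500),
]
```

A check of this kind covers only the static configuration. Showing that the separation actually holds at run time (for example through the MMU and the scheduler) is a separate argument that the sketch does not address.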

Consolidation through Virtualization (HFT 1)

[Figure: dual-channel solution (hardware fault tolerance 1) forming the compliant item. Two processing elements (PE) drive the Equipment Under Control. The virtualization mechanism (WR Hypervisor) runs on Core 1 and Core 2 and hosts three virtual boards: two safe channels (a safe application on VxWorks Cert and a safe application on a safe OS) and one non-safe virtual board (an application on a COTS OS).]

Safety Standard IEC 61508

• “... On-chip redundancy as used in this standard means a duplication (or triplication etc.) of functional units to establish a hardware fault tolerance greater than zero ...”

• “... A subsystem with a hardware fault tolerance greater than 0 can be realized using one single IC semi-conductor substrate (on-chip redundancy) ...”

[source: IEC 61508-2, Annex E, Edition 2.0, Dated: 2010-04]

1oo2D Architecture

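• In normal operation, both channels need to demand the safety function
• Diagnostic tests detect faults in either channel
• If a fault is detected in one channel, voting is adapted so that the output follows the other channel
• Faults in both channels -> safe state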
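The voting behavior listed above can be sketched as a small decision function. This is a minimal illustration, not the voter of the compliant item: the `Channel` type and the three output labels are hypothetical names, and the case of a discrepancy that cannot be attributed to either channel is left out.

```python
from dataclasses import dataclass

# Hypothetical output labels for the voter.
DEMAND = "demand_safety_function"
NO_DEMAND = "no_demand"
SAFE_STATE = "safe_state"


@dataclass(frozen=True)
class Channel:
    demand: bool  # True if this channel demands the safety function
    faulty: bool  # True if the diagnostic tests flagged this channel


def vote_1oo2d(a: Channel, b: Channel) -> str:
    """Return the voter output for two channels with diagnostics."""
    if a.faulty and b.faulty:
        # Faults in both channels: go to the safe state.
        return SAFE_STATE
    if a.faulty:
        # Voting is adapted: the output follows the healthy channel b.
        return DEMAND if b.demand else NO_DEMAND
    if b.faulty:
        # Voting is adapted: the output follows the healthy channel a.
        return DEMAND if a.demand else NO_DEMAND
    # Both channels healthy: both need to demand the safety function.
    return DEMAND if (a.demand and b.demand) else NO_DEMAND
```

For example, `vote_1oo2d(Channel(False, True), Channel(True, False))` follows the healthy second channel and returns `DEMAND`, while two faulty channels always yield `SAFE_STATE`.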

Safety Requirements Compliant Item

• Used for two kinds of safety functions:
  – Low-demand SIL3
  – High-demand / continuous mode SIL2
• Used in a safety function powered on for a long time
• Safe State:
  – OS is informed about a fault
  – Voter is informed and is able to achieve or maintain a safe state
• Could execute both safety functions and non-safety functions
• Is assumed to be analyzed according to routes 1H and 1S
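The two SIL targets above correspond to numeric failure-measure bands in IEC 61508-1: average probability of dangerous failure on demand (PFDavg) for low-demand mode, and frequency of dangerous failure per hour (PFH) for high-demand or continuous mode. The sketch below encodes those bands and checks a computed value against a target. The function names are hypothetical, and the check covers only the numeric band, not architectural constraints or systematic capability.

```python
# Upper and lower bounds of the failure-measure bands per SIL (IEC 61508-1).
LOW_DEMAND_PFD = {4: (1e-5, 1e-4), 3: (1e-4, 1e-3), 2: (1e-3, 1e-2), 1: (1e-2, 1e-1)}
HIGH_DEMAND_PFH = {4: (1e-9, 1e-8), 3: (1e-8, 1e-7), 2: (1e-7, 1e-6), 1: (1e-6, 1e-5)}


def achieved_sil(value, table):
    """Return the highest SIL whose upper bound the value stays below (0 if none)."""
    for sil in (4, 3, 2, 1):
        _lower, upper = table[sil]
        if value < upper:
            return sil
    return 0


def meets_target(value, table, target_sil):
    """True if the computed failure measure is at least as good as the target band."""
    return achieved_sil(value, table) >= target_sil
```

With these tables, a PFDavg of `5e-4` falls into the SIL3 low-demand band and a PFH of `5e-7` per hour falls into the SIL2 high-demand band, which are the two targets named for the compliant item.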

YOGITECH's fRMethodology

• YOGITECH's fRMethodology is a “white-box” approach to functional safety analysis and safety-oriented exploration of integrated circuits (ICs) in compliance with IEC 61508 and ISO 26262:
  – splitting the circuit into elementary parts (“sensitive zones”); sensitive zones are extracted from the design database with automatic tools (to guarantee completeness)
  – computing their failure rates
  – using those failure rates to compute safety metrics
  – verifying the results with fault injection, covering these fault types:
     · Transient (single event upset and single event transient)
     · Permanent (stuck-at, bridging, stuck-open, stuck-on)
     · Common-cause failures (e.g. clock/PLL and reset faults)
  – allowing sensitivity analyses by changing parameters
  – delivering numbers to the customer for comparing different architectures

• YOGITECH is responsible for ISO 26262-10, Annex A, on how to apply ISO 26262 to microcontrollers

Failure rate of a sensitive zone:

$$\lambda_{SZ} = f(\lambda_{elem}, C, D, F, DC)$$

where
  λelem = elementary failure rate for each fault model
  C = probability of failure in that sensitive zone in terms of area, number and type of gates, number of registers, type of interconnections, number of logic levels etc.
  D = dangerousness of the sensitive zone
  F = frequency of use of the sensitive zone
  DC = diagnostic coverage for the sensitive zone
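The slide names the inputs of the sensitive-zone failure rate but does not give the form of f. To make the roles of the parameters concrete, the sketch below assumes a simple multiplicative form in which the diagnostic coverage removes the detected share. This is purely an illustrative assumption and not the fRMethodology formula; the class, the function and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensitiveZone:
    name: str
    lam_elem: float  # elementary failure rate for one fault model (e.g. in FIT)
    c: float         # weighting for area, gates, registers, interconnections, ...
    d: float         # dangerous fraction of failures in the zone, 0..1
    f: float         # frequency of use of the zone, 0..1
    dc: float        # diagnostic coverage for the zone, 0..1


def residual_rate(zone: SensitiveZone) -> float:
    """Dangerous undetected rate under the ASSUMED multiplicative form of f."""
    for label, value in (("d", zone.d), ("f", zone.f), ("dc", zone.dc)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{label} must lie in [0, 1], got {value}")
    return zone.lam_elem * zone.c * zone.d * zone.f * (1.0 - zone.dc)


# Hypothetical sensitivity analysis: the same zone with two DC values.
baseline = SensitiveZone("alu_regs", lam_elem=10.0, c=2.0, d=0.5, f=0.5, dc=0.9)
improved = SensitiveZone("alu_regs", lam_elem=10.0, c=2.0, d=0.5, f=0.5, dc=0.99)
```

Comparing `residual_rate(baseline)` with `residual_rate(improved)` is the kind of parameter variation the slide calls a sensitivity analysis. Under the assumed form, raising DC from 0.9 to 0.99 cuts the residual rate by about a factor of ten.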

Analysis Process*

[Figure: flow chart of the analysis process. Starting from the safety architecture, three branches are analyzed. Labels, as far as recoverable:
• HW random failures: criticality ranking; DC of available diagnostics; computation of metrics; identification of safety gaps; improvements: application measures (end customer), PBIT/CBIT measures (WINDRIVER/end customer), HW measures (chip provider), evidences of low criticality (chip provider or end customer)
• Common-cause failures: Annex E of part 2; identification of safety gaps; improvements: HW measures (chip provider or end customer), layout measures (chip provider), diversity (end customer/WINDRIVER)
• Systematic failures: interference freeness; configuration; HW/SW/tools design guidelines; identification of safety gaps; improvements: application / SW separation (WINDRIVER/end customer)]

Source*: Yogitech fRMethodology, approved by TÜV SÜD (certificate Z10 06 11 61674 001)

Abstract View of Compliant Item Functions

[Figure: block diagram of the compliant item functions. Block labels: S, PEi, PEc, PEo, A, COMM, DDR, E2PROM, SR, Vi, Ve, WDTe, VMONe.
Legend:
• Zone 0: fully shared HW
• Zone 1: separate HW
• Zone 1-: shared HW used in time division]

Allocation of SPEAr1310 modules to zones

[Figure: SPEAr1310 block diagram with each module marked by zone.
Module and interface labels: CPU1, CPU2, SCU, GIC, cache controller, L2 cache, power management, reset and clock control, SMI (serial memory interface), MPMC (DDR2/3 interface), 32KB ROM (only for boot), 32KB+4KB SRAM (only for boot), OTP memory, Giga-Ethernet, Fast-Ethernet (1, 2, 3), GPT0, GPT1, GPT2,3, UART0, UART1, UART2, UART3,4,5, GPIO, I2C, RS485, ADC, RTC, THSENS, MISC, external watchdog (WDTe), external voter (Ve), VMONe, DDR (HW and VB programs and data), E2PROM, actuator, safe comm. (input), safe comm. (output), safe inter-board comm., unsafe comm., unsafe inter-board comm., unsafe analogue input.
Legend:
• Separate HW used ONLY by unsafe VBs
• Separate HW used ONLY by safe VB channel 1
• Separate HW used ONLY by safe VB channel 2
• Separate HW used by safe channels but with some fully shared HW
• Shared HW used by safe channels in time division
• Shared HW used by safe channels in time division with some fully shared HW
• Fully shared HW (one-channel)]

HW Random Failures: Allocation of Targets

HW Random Failures: Steps for quantitative Analysis

• Partition the compliant item into sub-modules
• Identify the fault models for each module and sub-module
• Estimate the failure rates for each sub-module and for each fault model, including an estimate of the share of safe, no-effect and no-part failures according to IEC 61508, 2nd edition
• Make a first estimate of the DC for each sub-module
• Rank the sub-modules in terms of the remaining risk of undetected failures
• Make a first computation of the safety metrics (SFF and PFD/PFH)
• Fix a new target for the DC for each sub-module in order to match the SFF targets defined during safety requirements allocation
• Define the conditions of use (CoUs) needed to match those targets
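The ranking, SFF and DC-target steps above can be sketched in a few lines. The sketch uses the IEC 61508 definition SFF = (λS + λDD) / (λS + λDD + λDU), with λDD = DC · λD and λDU = (1 − DC) · λD. The `SubModule` type, the function names and all failure-rate numbers are hypothetical and are not results of the analysis presented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubModule:
    name: str
    lam_safe: float       # safe failure rate (e.g. in FIT)
    lam_dangerous: float  # dangerous failure rate (e.g. in FIT)
    dc: float             # estimated diagnostic coverage, 0..1

    @property
    def lam_dd(self) -> float:
        """Dangerous detected failure rate."""
        return self.lam_dangerous * self.dc

    @property
    def lam_du(self) -> float:
        """Dangerous undetected failure rate (the remaining risk)."""
        return self.lam_dangerous * (1.0 - self.dc)


def rank_by_residual_risk(modules):
    """Sub-modules sorted by dangerous undetected rate, worst first."""
    return sorted(modules, key=lambda m: m.lam_du, reverse=True)


def sff(modules) -> float:
    """Safe failure fraction over all sub-modules."""
    safe = sum(m.lam_safe for m in modules)
    dd = sum(m.lam_dd for m in modules)
    du = sum(m.lam_du for m in modules)
    return (safe + dd) / (safe + dd + du)


def required_dc(lam_safe: float, lam_dangerous: float, sff_target: float) -> float:
    """Overall DC needed to reach sff_target, from SFF = (S + DC*D) / (S + D)."""
    dc = (sff_target * (lam_safe + lam_dangerous) - lam_safe) / lam_dangerous
    if dc > 1.0:
        raise ValueError("SFF target not reachable with diagnostics alone")
    return max(0.0, dc)


# Hypothetical first estimates for two sub-modules.
modules = [
    SubModule("cpu", lam_safe=50.0, lam_dangerous=50.0, dc=0.9),
    SubModule("ram", lam_safe=20.0, lam_dangerous=80.0, dc=0.6),
]
```

With these illustrative numbers the RAM ranks first because it carries the larger undetected rate, and the first SFF estimate is about 0.815. `required_dc(70.0, 130.0, 0.9)` then gives the overall DC (about 0.846) that a 90 % SFF target would demand, which is the input for fixing new per-module DC targets.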

HW Random Failures: Diagnostic Measures in the Compliant Item

DM3, DM4: Internal and External Watchdog

Common Cause Failures: Critical Requirements IEC 61508-2, Annex E, E.1 and E.2

Common Cause Failures affecting redundant channels in MCUs

• Sleeping Faults
• Clock Faults
• Power Faults
• Temperature Faults (hot spots)
• Timing Faults
• Checker Faults: faults in the software checker or in the comparator of the dual core
• Cascading Faults: class of dependent failures

Common Cause Failures: Application of BetaIC table, IEC 61508-2 Annex E.3, Zone 1

Summary

• Diversity required
  – Diversity at application level between the channels
  – Diversity at Hypervisor level between the channels
• CCF
  – More details on the MCU structure required, or
  – Specific structure of the external watchdog needed
• Complexity of SW tests
  – Need to reach high coverage for MCU, bus interconnect
  – Not possible to reach with simple SW based on the MCU instruction manual
  – Approach needs:
     · Software tests with MCU-aware approach
     · Verification strategy (fault injection)
• Safety Manual Quality
  – Guidelines for SW tests solid and verified upfront
  – Proper verification strategy
  – End customer will reach claimed coverage following guidelines

Outlook

[Figure: project roadmap. An initial analysis of the SPEAr1310-based logic solver is followed by Phase A1.0, Phase A1 and Phase A2/A3, with a go/no-go decision and product certification at the end. Activities shown across the phases:
• Detailed specification of what has to be covered by the diagnostic measures (BIT, WDT-FPGA, etc.)
• Product detailed specification and implementation
• Verification of product implementation (V&V of BIT, WDT-FPGA etc. coverage)
• Verification of CCF (e.g. analysis of SPEAr1310 layout)
• Collection of evidences
One path is labeled “Skip application diversity”.]