DEF(AUST) 5679: A Rough Guide

Matthew Squair

www.criticaluncertainties.com

2 November 2014


1 Introduction

2 The DEF (AUST) 5679 safety requirements

3 System integrity assessment

4 Components design & implementation assurance

5 Equivalence with other standards

6 Independent safety evaluation

7 COTS criticality assessment

8 Experience and feedback

9 Changes

10 Conclusions and references


DEF(AUST) 5679 A User’s guide

Nulka, the first project to use DEF(AUST) 5679


Why a rough guide?

This guide represents lessons learned in the application of the standard

As all standards are interpreted there is, inevitably, a likelihood that the user’s interpretation does not exactly reflect the writer’s intent

This guide discusses some of the areas of ambiguity and interpretation, as well as delving into ’how to do it’

We’ll be discussing Issue 1 of the standard (1998), because that was the current issue when I wrote this document :) Issue 2 was published in 2008.


The history of the standard

Originally published in August 1998 by the then Army Engineering Agency

Developed by DSTO to address perceived shortfalls in the then extant set of system and software safety standards

A system standard: it considers people, hardware and software, although the focus is on software

Published after an extensive period of consultation with various government, defence, industry and academic experts

In practice it’s used by Army and Navy

Air Force still cleaves to DO-178 for software and MIL-STD-882C for system safety


A comparison with MIL-STD-882D

MIL-STD-882D is a goal based standard

Focus on broad principles

Not prescriptive as to method and processes applied

Eight general system safety requirements

No detailed requirements, up to the program manager to define

Task descriptions such as Design and Integration in Annex A

Inherently requires tailoring

DEF (AUST) 5679 is a process based standard, as the authors did not believe the time was right to move to a goal based standard

Does not allow tailoring

Considers operators as an integral part of the system

Process and task specific


Safety case

Provides detailed requirements for a safety case

Requires a component level of design (SW, HW & human)

Use of a high level argument (can use goal structuring notation)

Supporting documents (analyses) e.g. PHAR, SHA, SIAR etc

Progressive review (safety assessment group, independent assessor (interim))

Section 11.4 of the standard cross references other standard document requirements to the DEF (AUST) 5679 safety case


Preliminary hazard analysis

PHA defines system boundary, accidents and their severity

Focus is on the system, its interaction with the environment and external events, and how these can combine to cause an accident

Interfaces establish the system boundary

The set of system functions defines the scope of the system

The system functions also serve to define the system interfaces and thus the boundary

Anything outside the system which can affect it is termed the operational context


HRI and LOT mismatch

Problem: Accident severities may not match the safety program’s or organisational definition of the severity component of risk

Example - The RAN’s Hazard Risk Index severities versus 5679:

Example (RAN) accident severity and 5679 mismatch

Problem 1: Only hazards to personnel are covered by 5679, not equipment or mission

Problem 2: Probabilities are not expressed in terms of duration of exposure


HRI and LOT mismatch (part II)

What to do about it:

Use a subzone in the risk matrix to capture multiple fatalities, i.e. ’Ia’, or augment the risk matrix with a multiple fatalities row

Define the exposure duration for risk

Add equipment and mission targets to LOT definition


Preliminary hazard analysis (cont’d)

System hazards are identified, which are top level system states from which an accident could result (sometimes in combination with a set of external events termed an accident sequence)

Quantitative assessment of hazard likelihood is excluded

Each system hazard generates safety requirements:

System Safety Requirements = Hazard X should not occur

Level of Trust (LOT) assigned for each (accident sequence, hazard) tuple

LOT = f(severity, external mitigators)

A default LOT is selected if we do not know the probability of a hazard propagating to an accident

The LOT matrix therefore needs to correlate with the organisational risk matrix contours


HRI and LOT matrices mismatch (example)

LOT matrix does not correlate to the RAN’s Hazard Risk Index

Treatment contours do not strictly correlate to traditional iso-risk contours

Difficult to correlate between hazard risk zone and LOT selected

Example (RAN) risk matrix and 5679 LOT matrix mismatch


LOT to risk reduction problems

The mismatch between the LOT matrix and the risk matrix makes it difficult to argue from a LOT backwards to show that the software hazard now has a reduced likelihood and therefore a reduced risk

How do we argue from a LOT back to a level of risk reduction?

What then is the safety claim being made, e.g. ’residual risk of a software hazard’?


Harmonisation of HRI and LOT (example)

What to do about it:

Harmonise DEF(AUST) 5679 LOT zones with the risk matrix

Bin both severity and probability ranges into qualitative ranges

Define the duration of exposure, i.e. X years of system life

Harmonisation of LOT with example (RAN) risk matrix
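
The binning step could be sketched roughly as follows. This is a minimal illustration only: the bin boundaries, the category names, the constant-rate exposure model and the function names are all assumptions made for the example, not values taken from DEF(AUST) 5679 or from the RAN risk matrix.

```python
import math

# Hypothetical decade-wide bins: (lower bound on probability over the
# defined exposure period, qualitative label). Illustrative values only.
LIKELIHOOD_BINS = [
    (1e-1, "Frequent"),
    (1e-2, "Probable"),
    (1e-3, "Occasional"),
    (1e-4, "Remote"),
    (0.0, "Improbable"),
]


def probability_over_exposure(rate_per_hour: float, exposure_hours: float) -> float:
    """Probability of at least one occurrence during the exposure period.

    Assumes a constant occurrence rate (exponential model), which is an
    assumption of this sketch rather than anything the standard mandates.
    """
    return -math.expm1(-rate_per_hour * exposure_hours)


def bin_likelihood(probability: float) -> str:
    """Map a probability over the exposure period to a qualitative bin."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    for lower_bound, label in LIKELIHOOD_BINS:
        if probability >= lower_bound:
            return label
    return LIKELIHOOD_BINS[-1][1]
```

Stating the exposure duration explicitly (here `exposure_hours`) is what makes the qualitative label comparable between the LOT zones and the risk matrix.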


System hazard analysis (analysis part)

System Hazard Analysis (SHA)

System decomposed to components (human, hardware & software)

Establishes component hazards that could lead to a system hazard

Component level hazard = Component Safety Requirement

Decompose and allocate system safety requirements to components

Component Y should not cause system hazard X
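
As a rough sketch of the allocation step, each component that the SHA finds can contribute to a system hazard receives one component safety requirement of the form given above. The class name, function name and the example hazard and component identifiers are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentSafetyRequirement:
    component: str
    system_hazard: str

    def text(self) -> str:
        # Wording follows the pattern on the slide.
        return f"{self.component} should not cause system hazard {self.system_hazard}"


def allocate_requirements(
    hazard_contributors: dict[str, list[str]],
) -> list[ComponentSafetyRequirement]:
    """Create one CSR per (system hazard, contributing component) pair.

    hazard_contributors maps a system hazard identifier to the components
    (human, hardware or software) that the analysis found could lead to it.
    """
    return [
        ComponentSafetyRequirement(component, hazard)
        for hazard, components in hazard_contributors.items()
        for component in components
    ]


# Hypothetical example data.
csrs = allocate_requirements({"H1": ["Operator", "Launch control software"]})
for csr in csrs:
    print(csr.text())
```

The sketch also makes the scaling concern visible: the number of requirements grows with the number of contributing components per hazard.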


System hazard analysis (design part)

SHA includes architectural requirements

Analysis is a bit of a misnomer as SHA contains system design activities

Architecture safety requirements 14.3.10/12

14.3.10 Localise and isolate safety critical functions

14.3.10 Keep as small as possible

14.3.10 Separately identified functions

14.3.10 Keep as simple as possible

14.3.12 Loosely coupled


Strengths and weaknesses of the process

Thoroughness and completeness (effects to causes)

But your hazard list scales up with the number of components

Does not identify requirements incompleteness/inconsistency as a hazard

Dormant specification fault

Up to 70% of all software related accidents originate with specification errors

SHA doesn’t address traditional common cause effects

Zonal hazard analysis?

Dependence of hardware (common power supplies, cabling etc)

See DEF STAN 00-56 (the early issues) for how these can be covered


DEF(AUST) 5679 accident sequence model

Models are important: they define what we pay attention to and what we don’t

Standard is not clear on:

Can multiple hazards lead to the same accident?

Can one hazard lead to multiple accidents?

Hazard propagation within the system

Excludes environmental hazards but allows environmental mitigators (which is inconsistent)


DEF(AUST) 5679 accident sequence model

DEF(AUST) 5679 original accident sequence


An alternative accident sequence model

Work by J.C. Laprie as part of the IEEE-CS TC on Fault-Tolerant Computing in 1970 and IFIP WG 10.4 produced a set of standard terms and definitions:

Fault. A fault is the adjudged or hypothesised cause of an error

A fault is active when it produces an error, otherwise it is dormant

Faults can be grouped as design, physical or interaction

Error. An error is that part of the system state that may cause a subsequent failure

Error propagation. Propagation within a given component (i.e. internal propagation) is caused by the process: an error is successively transformed into other errors

Failure. A system failure is a transition event that occurs when the delivered service deviates from correct service and results in a system fault state


An alternative accident sequence model (Part II)

Giving us:

Hazard. A sequence of faults, events and error states causally linked to an accident

Accident. A system failure with defined loss conditions (fault states)

Accident sequence. The set of events and intermediate states that propagate a system hazard state to an accident


An alternative accident sequence (part III)

Accident sequence based on Laprie (IFIP WG 10.4)


Accident sequence issues

Standard does not allow qualitative assessment of external events

But this may be all you have

Also introduces an inconsistency in arguing between qualitative versus quantitative criteria

What to do about it:

Develop quantitative probability bins and equate them to qualitative criteria à la MIL-STD-882

Using the qualitative likelihood, reduce the LOT by one level for each decade of reduced likelihood

Apply the no more than 2 levels rule to this means of reducing LOT
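
The suggested workaround can be sketched as a small function. This is an illustration under stated assumptions, not the standard's own procedure: LOTs are represented as integers with 1 as the lowest level, the likelihood of the external events is given as a probability over the defined exposure period, and the names `reduce_lot`, `MAX_LOT_REDUCTION` and `lowest_lot` are hypothetical.

```python
import math

# The "no more than 2 levels" rule quoted on the slide.
MAX_LOT_REDUCTION = 2


def reduce_lot(default_lot: int, event_probability: float, lowest_lot: int = 1) -> int:
    """Reduce a default LOT by one level per whole decade of reduced likelihood.

    event_probability is the assessed likelihood that the external events in
    the accident sequence occur (1.0 means certain, so no reduction).
    lowest_lot is an assumption of this sketch about the numbering scheme.
    """
    if not 0.0 < event_probability <= 1.0:
        raise ValueError("event_probability must lie in (0, 1]")
    # Whole decades below certainty; the small epsilon guards against
    # floating point results such as 1.9999999999999998.
    decades = math.floor(-math.log10(event_probability) + 1e-9)
    reduction = min(decades, MAX_LOT_REDUCTION)
    return max(default_lot - reduction, lowest_lot)


print(reduce_lot(5, 0.05))   # one decade of reduction
print(reduce_lot(5, 1e-4))   # four decades, capped at two levels
```

Capping the reduction keeps the argument from resting entirely on a likelihood estimate for external events that may be little better than a qualitative judgement.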


Safety Assurance Level (SAL) assignment

There are six SALs corresponding to the six LOTs:

De minimis SAL is equivalent to the LOT associated with the CSR

Defined by a set of design & analysis techniques

This is also a circular argument, and a weakness of SALs in general

Each SAL has design and implementation requirements

Protective measures are assigned to reduce the LOT:

Undesirable to have components with a SAL of 4 or higher

Can only reduce a SAL by two levels, i.e. SAL 5 to SAL 3

If you start with a SAL 6 component you can only reduce it to SAL 4

The rules imply redesign of the system architecture

Operator components should not be allocated a SAL higher than 3

If so then their use must be justified in the safety case
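
The assignment rules listed above lend themselves to a simple automated check. The sketch below encodes only the three rules stated on this slide; the `Component` structure, the field names and the wording of the findings are hypothetical, and a real check would need the full rule set from the standard.

```python
from dataclasses import dataclass

MAX_SAL_REDUCTION = 2      # a SAL can only be reduced by two levels
UNDESIRABLE_SAL_FROM = 4   # SAL 4 or higher is undesirable
OPERATOR_SAL_LIMIT = 3     # operator components should not exceed SAL 3


@dataclass
class Component:
    name: str
    kind: str          # "software", "hardware" or "operator"
    default_sal: int   # SAL before protective measures are credited
    assigned_sal: int  # SAL after protective measures are credited


def check_sal_assignment(component: Component) -> list[str]:
    """Return a list of findings; an empty list means no rule was tripped."""
    findings = []
    if component.default_sal - component.assigned_sal > MAX_SAL_REDUCTION:
        findings.append("SAL reduced by more than two levels")
    if component.assigned_sal >= UNDESIRABLE_SAL_FROM:
        findings.append("SAL 4 or higher is undesirable; consider architectural redesign")
    if component.kind == "operator" and component.assigned_sal > OPERATOR_SAL_LIMIT:
        findings.append("operator SAL above 3 must be justified in the safety case")
    return findings
```

For example, an operator component with a default SAL of 6 that is reduced to SAL 4 respects the two-level limit but still trips both of the other rules, which is the point at which the architecture has to change.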


Rules for assigning SALs

Can reduce SALs through employing independent components

Independence argument must be part of the justification for a SAL assignment that relies on it

Techniques

Redundancy (function and data)

Design redundancy (i.e. dissimilar techniques)

  Cannot be used alone to reduce the SAL value

Safety kernel

Use of lower software control categories, i.e. requiring operator input

  Can reduce the SAL value of the control component by one

  Cannot be used alone to reduce the SAL two levels below the default SAL

Software that can be user modified must be assigned a SAL of 0

Again there is an implied architectural design step


LOT to SAL flowdown


Component design assurance

The objective of design assurance is to reduce to the minimum the likelihood of design flaws that could cause hazardous faults

’Flaw’ is an interesting term, used throughout the standard but not defined

For discussion we’ll treat a flaw as a latent design fault

Component Safety Assurance Levels (SALs)

Terms such as formal and rigorous are defined in the standard

Note the amalgamation of SAL levels into common assurance strategies


SALs - some questions

Assumption that formality costs more, but some case studies indicate not?

What is the acceptable level of design faults?

Should the safety specification be separated from the software specification?

Is indirect traceability acceptable for design verification?


Component implementation assurance

The objective of implementation assurance is to reduce to the minimum the likelihood of detail design and implementation flaws that could cause hazardous faults

What are detailed design products? Not specified in the standard

What is detailed, ”that which follows the PDR and precedes CDR”?

See below for a definition of preliminary versus detailed design in an OOD context

Preliminary versus detailed software design


Component implementation assurance

Three component classes are defined (does not include procedures)

Three generic safety case patterns are discussed

Allows separation of concerns and a modularised safety case


Software assurance

Of the three, software is the most well developed

As with design assurance there is a significant watershed at SAL 3/4

Notionally each SAL is an order of magnitude greater; if so, why does only one attribute differentiate SAL 5 from SAL 6?

SAL aggregation for tests does not follow that of other software attributes


Operator assurance

Only training and skill level requirements are specified

These are not independent, nor are they the only factors!

In the real world we also use:

Currency e.g. ’hours flown lately’ is important especially for experts

Independent certification e.g. ’check rides’ and the like

Personnel selection and psychological aptitude tests

Procedures are not covered as a separate component class

But errors in procedures can lead to hazards and accidents...

Should be given their own CSR


Operator assurance

Human error is difficult to predict, but not random

We need a reasonable process of identifying possible human error that at least includes:

A formal model of human error

Error affordances (procedure design, simple errors, mode confusion,decision bias)

Workload including management of the HMI

Performance shaping factors (fatigue, lighting, stress etc)

Team interaction effects


Operator assurance

What to do about it:

Augment existing assurance requirements including:

Task analysis and allocation (more than just Fitts list)

Error analysis

Defensive techniques

Team dynamics/interaction design (including the software agent)

For design, conduct human error analysis (informal (S1, S2), semiformal (S3, S4) & formal (S5, S6))

Address procedures in operator SAL tasks, or develop a separate CSR for procedures

Use interfaces to establish human factors contracts between operator and hardware/software


Hardware assurance

Even more austere than operator assurance, revolves around hardware verification from informal to formal

Focuses on custom hardware not Non Development Items (NDI)

NDI hardware implicitly covered in section 11.2 of the standard

For lower integrity levels (S1, S2) provide evidence the component meets the supplier specification

For higher levels (>S3) treat as a developmental item, but section 6.4.2.2 states:

Clause 6.4.2.2

For non-custom (NDI) hardware, it is required that reliable, thoroughly tested and robust equipment is used. Such equipment must be analysed with respect to safety, including the potential interaction between computing systems.


Hardware assurance

There appears to be some confusion in the NDI (xOTS) definitions

On paper provides little incentive to select a simple hardware design to specifically replace a complex software design


NDI integrity level ambiguity resolution

Partial solution to NDI (COTS/MOTS) integrity level ambiguity

See xOTS section for further discussion


General comments and recommendations on SALs

Definition of SAL requirements is not entirely convincing

No transparency in the standard as to how they were arrived at

Focuses on process (e.g. test coverage) rather than outcome (e.g. freedom from faults, expressed as N faults per 1000 SLOC)

A risk of people ’gaming’ the standard (but DEF(AUST) 5679 is not alone in this, all risk based integrity level standards suffer from this problem)

Simplicity requirement (para 3.3.3) is not linked to SALs


General comments and recommendations on SALs

What to do about it:

Simplify LOT and SAL zones in line with risk matrices such as those in MIL-STD-882 or DEF STAN 00-56

SAL assignments should be independently assessed and approved to minimise ’gaming’

Define a critical fault density level e.g. ’0.1 per KSLOC’ for SAL 6

Let the developer select the methods to achieve this

Clarify and supply missing definitions if invoking the standard

Require simple software, e.g. a McCabe Cyclomatic Complexity Metric of < 20 for safety critical modules (but refer to Les Hatton’s work on what really works as an indicator)
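
Outcome-based criteria of this kind can be checked mechanically. The sketch below is illustrative only: it approximates McCabe complexity for Python source by counting branch points with the standard `ast` module (a simplification, not a certified metric tool), and the function names are hypothetical. The thresholds themselves (0.1 per KSLOC, complexity below 20) are the examples given on the slide.

```python
import ast

# Node types counted as one decision point each in this approximation.
DECISION_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths.
            complexity += len(node.values) - 1
    return complexity


def fault_density_per_ksloc(critical_faults: int, sloc: int) -> float:
    """Critical faults per thousand source lines of code."""
    if sloc <= 0:
        raise ValueError("sloc must be positive")
    return critical_faults / (sloc / 1000)


def meets_criteria(source: str, critical_faults: int, sloc: int,
                   max_complexity: int = 20, max_density: float = 0.1) -> bool:
    """True if complexity is below the limit and fault density within it."""
    return (cyclomatic_complexity(source) < max_complexity
            and fault_density_per_ksloc(critical_faults, sloc) <= max_density)
```

A check like this states the target and leaves the choice of development method to the developer, which is the intent of the recommendation.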


Equivalence with other standards

It’s not

Because of 5679’s rigour it is difficult to argue equivalence

Some guidance on other standard tasks that may equate to DEF(AUST) 5679 is provided in table 1 of Part 2 Section 11

DEF STAN 00-55 (cancelled) is probably closest in process structure

The standard does not allow tailoring, making it difficult to argue equivalence at a process level

ALARP principles are not invoked (or SFAIRP)

Does not allow acceptance of higher risk by a higher authority as per MIL-STD-882


Equivalence with other standards

The equivalence problem is significant

In Australia we spend very little time developing a system from scratch

Mostly we take an existing product and tailor it to our unique needs

The project is not the system’s ’parent’ but the parent of a change to the system, usually coupled with a different role and environment

Safety programs for modified systems:

start with an existing safety baseline

evaluate whether changes introduce new hazards

evaluate whether changes are removing existing hazard controls


Equivalence with other standards

What to do about it:

Don’t argue equivalence of process:

Instead consider whether the application of another standard has achieved the required outcome

Base the safety argument around a change management process

Obtain agreement to work with the existing system’s safety standard & case from the regulator or independent safety evaluator

Based on the extant safety case it is assumed the system is safe

The intent of any change is to maintain the current levels of (accepted) risk. Analyse changes individually and as a cohort for:

New hazards introduced

The reduction in control and mitigation measures for extant hazards
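
The two checks above amount to a comparison between the baseline hazard log and the hazard log of the modified system. The sketch below assumes, purely for illustration, that each hazard log is a mapping from a hazard identifier to the set of controls credited against it; the function name and data layout are hypothetical.

```python
def assess_change(
    baseline: dict[str, set[str]],
    modified: dict[str, set[str]],
) -> tuple[list[str], dict[str, list[str]]]:
    """Compare a modified system's hazard log against the accepted baseline.

    Returns (new_hazards, lost_controls), where lost_controls maps each
    extant hazard to the controls credited in the baseline but no longer
    present after the change.
    """
    new_hazards = sorted(set(modified) - set(baseline))
    lost_controls = {
        hazard: sorted(baseline[hazard] - modified[hazard])
        for hazard in sorted(baseline)
        if hazard in modified and baseline[hazard] - modified[hazard]
    }
    return new_hazards, lost_controls


# Hypothetical example: the change adds hazard H3 and drops an interlock on H1.
baseline_log = {"H1": {"interlock", "operator check"}, "H2": {"fuse"}}
modified_log = {"H1": {"operator check"}, "H2": {"fuse"}, "H3": set()}
print(assess_change(baseline_log, modified_log))
```

Running the comparison on the combined set of changes, as well as on each change alone, covers the cohort case where individually benign changes jointly remove a mitigation.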


Independent audit and certification

Auditor function

Independent of the developer and acquirer

Monitors the safety management process

Acts as the independent arbitrator

Tasks the evaluator to evaluate the safety case

Evaluator

Actually a body rather than a single person

Checks the detail and validity of the developer’s arguments that the system will meet its safety critical requirements

There may also be a certifier role (i.e. TAR, DOS)


Independent audit and certification

Staged and independent evaluation of the safety case

Independence of the evaluator must be demonstrated

No commercial/management relationship with the project’s safety management group

No involvement in the system development

Formal evaluation reports are required:

Describes evaluation activities

Findings of each activity

Identifies specific recommendations and concerns

Recommendations (Essential, desirable and optional)

Standard implies that these are binding on all parties (possibly contentious)

Needs to be made an explicit clause of the contract


’x’ Off The Shelf (xOTS)

Many similar terms: COTS, MOTS, GOTS, NDI (Non Developmental Item) & SOUP (Software of Uncertain Pedigree)

DEF(AUST) 5679 does not deal comprehensively with xOTS

So what is ’Off The Shelf’?

COTS is:

sold, leased, or licensed to the general public

offered by a vendor trying to profit from it

supported and evolved by the vendor, who retains IP rights

available in multiple, identical copies

used without modification of the internals

[Mayer & Oberndorf 01]

The paradigm according to Tim Kelly [Ye & Kelly 04]

The attraction is... they were developed by someone else

The problem is... they were developed by someone else

45 COTS criticality assessment

xOTS ’hazards’

xOTS doesn’t really introduce new software hazards, but it does possess traditional ones to a much greater degree than bespoke software:

xOTS usually does much more than you ask of it

Added complexity both from the COTS item and handlers

So the likelihood of getting into trouble is much greater, usually in the form of emergent interaction style hazards

Some other ’gotchas’ in COTS use:

There may be backdoors and attendant safety/security issues

There is the risk attendant on modifying without insight (integration)

There may be existing design faults, of which you are unaware

46 COTS criticality assessment

The safety challenge for xOTS

All of the preceding presents some fundamental problems for a safety program:

How do we assure freedom from faults with little to no influence over (or insight into) the product?

Manage context hazards, i.e. integration hazards

Preclude inadvertent interactions

Avoid or manage un-needed complicatedness or functionality

However, in other domains COTS items are used all the time. For example, standard bolts, rivets, lamps etc. (usually built to a published standard)

47 COTS criticality assessment

Justifying xOTS safety

Role of historical / operational / usage data:

potentially the most compelling evidence

hard to get enough data, of adequate quality

Rough rules of thumb [Ye & Kelly 2004]. If operated N hours without hazardous failure:

50% confidence will operate safely for next N hours

90% confidence will operate safely for N/3 hours

Assumes random, time-independent failure (is this valid?)

Critical assumptions:

That the operational environment is representative (Design margins?)

That failures are logged and assessed (Active safety program?)

We neither add nor modify functions, nor correct design faults
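The rules of thumb above can be compared with a standard zero-failure bound. The sketch below is an illustration, not the derivation given by Ye & Kelly: it assumes a constant hazardous failure rate (the "random, time-independent" assumption the slide questions) and reads each rule as a lower bound on mean time to hazardous failure. The function name `mttf_lower_bound` and the 10,000 hour figure are hypothetical.

```python
import math


def mttf_lower_bound(failure_free_hours: float, confidence: float) -> float:
    """Lower bound on mean time to hazardous failure after zero observed failures.

    Assumption: exponential (constant rate) failure model. The largest rate
    still consistent with no failure in `failure_free_hours`, at the given
    one-sided confidence, is -ln(1 - confidence) / failure_free_hours.
    """
    if failure_free_hours <= 0:
        raise ValueError("failure_free_hours must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    rate_upper = -math.log(1.0 - confidence) / failure_free_hours
    return 1.0 / rate_upper


# Hypothetical service history: N failure-free operating hours.
n_hours = 10_000.0
for confidence in (0.5, 0.9):
    bound = mttf_lower_bound(n_hours, confidence)
    print(f"{confidence:.0%} confidence: MTTF >= {bound / n_hours:.2f} N")
```

Under these assumptions the bounds come out at roughly 1.44 N and 0.43 N, so "N at 50%" and "N/3 at 90%" sit on the conservative side of them. None of this holds if the operational environment changes or the item is modified, which is the point of the critical assumptions listed above.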

48 COTS criticality assessment

Two distinct ways in which we apply xOTS

1 New start systems which we want to future proof

2 Legacy systems which we want to refresh with COTS technology

New start systems lend themselves to architecture solutions

Objective is a fault coverage equivalent to a bespoke system

Dependably modify the system during development and sustainment

Legacy systems are more problematic

Inherently a greater risk

Greater effort involved

May have to modify the COTS component

49 COTS criticality assessment

Simplex Architecture (JSF, NSSN, F16 INSERT)

Use dynamic binding to allow components to be added/deleted at runtime

Replacement unit containers enforce firewalls for address space faults

Provide real time group communication

Dynamic real time resource management using GRMS theory

Use analytic redundancy to assure continued performance if a COTS component or upgraded component fails

Partition each function into two domains that are analytically redundant:

High assurance simple functional kernel - MUST ALWAYS WORK

High assurance container for upgrades/COTS - MUST PERFORM

There are also significant payoffs in terms of maintenance and upgrade
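The kernel/container split can be illustrated with a minimal sketch. This is not the Simplex implementation itself: the class `SimplexChannel`, the two controller functions and the single `limit` envelope check are all hypothetical, chosen only to show the switching idea. The output of the upgrade (COTS) component is used while it stays inside a safe envelope, otherwise the simple kernel takes over.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class SimplexChannel:
    """One function split into a simple kernel and an upgradeable component."""
    kernel: Callable[[float], float]   # simple, high assurance: must always work
    upgrade: Callable[[float], float]  # COTS or upgraded component: must perform
    limit: float                       # hypothetical safe envelope on the command

    def _acceptable(self, command: float) -> bool:
        return math.isfinite(command) and abs(command) <= self.limit

    def step(self, error: float) -> Tuple[float, str]:
        """Return (command, source), falling back to the kernel on any doubt."""
        try:
            candidate = self.upgrade(error)
            ok = self._acceptable(candidate)
        except Exception:
            # A crash or a nonsensical return value counts as a failed upgrade.
            ok = False
        if ok:
            return candidate, "upgrade"
        return self.kernel(error), "kernel"


def simple_kernel(error: float) -> float:
    # Low gain, hard clamped: modest performance but easy to verify.
    return max(-1.0, min(1.0, 0.5 * error))


def cots_controller(error: float) -> float:
    # Stand-in for a higher performance component with no built-in bound.
    return 2.0 * error


channel = SimplexChannel(simple_kernel, cots_controller, limit=1.0)
print(channel.step(0.2))  # upgrade output is inside the envelope
print(channel.step(5.0))  # upgrade output is outside, so the kernel is used
```

The design choice worth noting is that only the kernel and the acceptance check need the highest assurance; the upgrade component can be replaced without re-arguing the safety of the whole function, provided the check remains valid.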

50 COTS criticality assessment Simplex architecture approach - New start

Simplex Architecture (with annotated LOT & SALs)

51 COTS criticality assessment Simplex architecture approach - New start

Contract Based (x)COTS Product Selection (CBCPS)

Technology refresh of an existing architecture [Ye & Kelly 2004]

If we want safe xOTS we have to buy the right (safe) xOTS

Principle of a defined software contract established at acquisition

Three streams of parallel activities

xOTS assessment and acquisition

software engineering

safety engineering

52 COTS criticality assessment Contract Based COTS Product Selection (CBCPS) - Legacy

CBCPS process adapted to DEF(AUST) 5679

53 COTS criticality assessment Contract Based COTS Product Selection (CBCPS) - Legacy

CBCPS criticality assessment methodology

General method

Design process is used to generate evaluation criteria for COTS item

There is an explicit decision to use xOTS and manage attendant risks

Proactive approach, we don’t just ’buy then justify’

Assurance evidence will need to be tailored to the SAL

It may be necessary to modify the xOTS item to meet the SAL

Modifiability should also be a criterion
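The general method can be sketched as a comparison of each candidate against a contract derived from the design. Everything below is hypothetical and for illustration only: the `Contract` and `Candidate` types, the function and evidence names, and in particular the rule that an evidence gap is tolerable only when the item is modifiable. A real project would take its criteria from its own design process and the SAL in question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """Evaluation criteria generated by the design process (hypothetical)."""
    required_functions: frozenset
    required_evidence: frozenset  # assurance evidence tailored to the SAL


@dataclass(frozen=True)
class Candidate:
    name: str
    functions: frozenset
    evidence: frozenset
    modifiable: bool


def assess(candidate: Candidate, contract: Contract) -> dict:
    """Report where a candidate falls short of, or exceeds, the contract."""
    missing_functions = contract.required_functions - candidate.functions
    unneeded_functions = candidate.functions - contract.required_functions
    evidence_gaps = contract.required_evidence - candidate.evidence
    # Illustrative rule only: gaps in evidence are tolerable if we can modify.
    viable = not missing_functions and (not evidence_gaps or candidate.modifiable)
    return {
        "name": candidate.name,
        "missing_functions": sorted(missing_functions),
        "unneeded_functions": sorted(unneeded_functions),
        "evidence_gaps": sorted(evidence_gaps),
        "viable": viable,
    }


contract = Contract(
    required_functions=frozenset({"position_fix", "time_sync"}),
    required_evidence=frozenset({"service_history", "failure_log"}),
)
candidates = [
    Candidate("vendor_a", frozenset({"position_fix", "time_sync", "web_admin"}),
              frozenset({"service_history"}), modifiable=False),
    Candidate("vendor_b", frozenset({"position_fix", "time_sync"}),
              frozenset({"service_history", "failure_log"}), modifiable=True),
]
for candidate in candidates:
    print(assess(candidate, contract))
```

Reporting unneeded functions separately reflects the earlier point that xOTS usually does much more than is asked of it; those extras are candidates for hazard analysis even when the item otherwise meets the contract.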

54 COTS criticality assessment Contract Based COTS Product Selection (CBCPS) - Legacy

Industrial experience with the standard

The following projects have applied the standard

RAN Nulka EU2 Autopilot

Stonefish Exercise Mine

Navigation Display System

Digital Hydrographic Database

SEA 1429 Collins Class Replacement Combat System (partial)

55 Experience and feedback

Project/industry feedback on the standard

Comments received on Issues 1 and 2 of the standard:

Solid standard for development

Demanding at higher LOTs

How do we maintain a 60 page spec in Z?

Initial support from DSTO required

Not as prescriptive as DEF STAN 00-55

NULKA safety case lacked design level

Acceptable for the WSERB?

Issue of financial independence of auditor, evaluators?

Simple and clear on application

How to use in sustainment not clear

Tailored application, endorsed by evaluator

56 Experience and feedback

Mooted changes [Cant et al. 2005]

Accident sequences to be clarified

Use of qualitative estimates of external mitigating factors

Better definition of what is meant by semi-formal and rigorous proofs

Removal of ability to modify the strength of assurance

Terminology changes

57 Changes

Conclusions

Though well thought through, the standard is still based on safety integrity levels, with all their problems:

Circular definition

Focus upon processes rather than outcomes

The number of SALs does not correlate to actions required

Problematic probability:

Ignores qualitative assessments

Does not consider units of duration for probability

Be prepared to tailor

Not much support for acquisition of xOTS solutions

Your certifier will need to approve tailoring of the standard

58 Conclusions and references

Bibliography

[Cant et al. 2005] Cant, T., Mahoney, B. and Atchison, B., Revision of Australian Defence Standard DEF (Aust) 5679, 10th Australian Workshop on Safety Related Programming Systems, Sydney, CRPIT, Vol XX. Tony Cant Ed., 2005.

[DEF (AUST) 5679 1998] DSTO, DEF(Aust) 5679, The Procurement of Computer-Based Safety-Critical Systems, Defence Science and Technology Organisation, Australia, Australian Defence Standard, Issue 1, August 1998.

[Mayer & Oberndorf 01] Meyers, B. C. and Oberndorf, P., Managing Software Acquisition: Open Systems and COTS Products, Addison-Wesley, 2001.

[Ye & Kelly 2004] Ye, F. and Kelly, T., Criticality Analysis for COTS Software Components, University of York, York, UK, ISSC 2004.

59 Conclusions and references