Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001


Page 1: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

Software Safety Engineering (S2E) Program Status

Dan Fitch

March 7, 2001

Page 2: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

Software Safety Program - Overview

General Safety Concepts - WHY

Software Safety and CLCS - HOW
• Known Hazards
• Designing for Safety
• Safety & Reliability Thread

Current Status

Page 3: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

Software Safety – What is it?

[Figure: diagram with the labels Limit (repeated), Anticipate, Detect, Control, Mitigate, Rate/Slope, Absolute Value, Prevent, Limit Damage, Return to Safe State]
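The Anticipate and Detect labels on this slide, together with Rate/Slope and Absolute Value, suggest two kinds of limit check: one on the projected value and one on the current value. The following is a minimal Python sketch under that reading. The class name `LimitMonitor`, the state names, and every threshold are hypothetical and are not taken from CLCS.

```python
from dataclasses import dataclass


@dataclass
class LimitMonitor:
    """Hypothetical monitor for one measurement (illustrative values only)."""
    limit: float      # absolute-value limit
    horizon_s: float  # look-ahead window used for anticipation, in seconds

    def assess(self, previous: float, current: float, dt_s: float) -> str:
        if dt_s <= 0:
            raise ValueError("dt_s must be positive")
        # Detect: the absolute value has already reached the limit.
        if current >= self.limit:
            return "DETECT"
        # Anticipate: the current slope would reach the limit within the horizon.
        rate = (current - previous) / dt_s
        if rate > 0 and current + rate * self.horizon_s >= self.limit:
            return "ANTICIPATE"
        return "NOMINAL"


monitor = LimitMonitor(limit=100.0, horizon_s=5.0)
for prev, cur in [(50.0, 52.0), (80.0, 90.0), (95.0, 101.0)]:
    print(prev, cur, monitor.assess(prev, cur, dt_s=1.0))
```

In this sketch an "ANTICIPATE" result leaves time to control the process before the limit is reached, while "DETECT" would trigger mitigation and a return to a safe state.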

Page 4: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

Software Safety – What is it?

Definitions
• Functionally-critical: mission completion
• Safety-critical: Humans = life & limb; Hardware = $10^6

Some set theory
• Input versus output

Page 5: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

Some Theory…

[Figure: Set of Inputs mapped to Set of Outputs. Labels: Unknowns, Known Safe, Known Unsafe, Assumed Safe. The symbols inside the parentheses were lost in extraction.]

Sources: Normal Operation, Hardware Failures, Human Intervention, Models/Simulators

Page 6: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

Software Safety – Why do it?

Direction:
• DoD: Mil-Std-882D, DoD-Std-2167
• NASA: NSTS-07700, NSS-8719.13, NASA-GB-1740.13, NSS-22206, NSS-22254, direction from Dan Goldin
• CLCS: 84K-00055, KDP-P-2901

Page 7: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

Software Safety – Why do it?

Objective: Identify & Mitigate Risk

Known Fault Scenarios – by requirements, analyses & test

Possible Unknowns – by design approach & further test

Page 8: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

“Knowns”

Hardware fault-driven scenarios

Legacy of hardware failure data available from the 1970s

Hardware-driven hazards
• May be analyzed – the SSA
• May be tested – specific fault injection

Identifies Risk & Yields Design Changes – Issues/ESRs

The Safety Case – Summary of Risk Findings

Page 9: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

“Unknowns”

“Stuff” Happens

Software doesn’t fail – It just doesn’t do what we thought it would

Hardware and some functions (e.g., seeds & races) cause most random errors

Specification & coding errors = prime cause
• 90% of errors are in the specifications
• C++ and Java are inherently powerful, but dangerous

Page 10: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

Ferengi Software Safety Rule #76

If it "touches*" hardware that can impact the safety of people or equipment, an SSA is absolutely necessary.

*(i.e., controls, monitors, or mitigates the risk of using)

Page 11: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

SSA - What and When

Assessment of risk factors due to software
• Hardware hazards
• SFMEA and SFTA
• KDP-P-2901

Schedule:
• 30 days before the first interaction with flight hardware
• In time for 5A/B testing
• Presented at TRR/ORR

Page 12: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

System Safety Analysis

[Figure: development timeline. Labels: Conceptual Design, Detail Design, Code Development; IPT/DP-1, SRS/DP-2, DDS/ODS/DP-3]

Page 13: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

System Safety Analysis

[Figure: development timeline, extended. Labels: Conceptual Design, Detail Design, Code Development, System Test, Val/Ver Test, 5A/B (With Hdwr); IPT/DP-1, SRS/DP-2, DDS/ODS/DP-3, 3A/B, 4A/B; Readiness Reviews, TRR/ORR]

Page 14: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

System Safety Analysis

[Figure: KDP-P-2901 SSA Process on the development timeline. Timeline labels: Conceptual Design, Detail Design, Code Development, System Test, Val/Ver Test, 5A/B (With Hdwr); IPT/DP-1, SRS/DP-2, DDS/ODS/DP-3, 3A/B, 4A/B; Readiness Reviews, TRR/ORR. SSA labels: S-C Matrix, Hazards, PHA]

Page 15: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

System Safety Analysis

[Figure: KDP-P-2901 SSA Process on the development timeline. Timeline labels: Conceptual Design, Detail Design, Code Development, System Test, Val/Ver Test, 5A/B (With Hdwr); IPT/DP-1, SRS/DP-2, DDS/ODS/DP-3, 3A/B, 4A/B; Readiness Reviews, TRR/ORR. SSA labels: S-C Matrix, Hazards, PHA, FTA/FMEA, Issues]

Page 16: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

System Safety Analysis

[Figure: KDP-P-2901 SSA Process on the development timeline. Timeline labels: Conceptual Design, Detail Design, Code Development, System Test, Val/Ver Test, 5A/B (With Hdwr); IPT/DP-1, SRS/DP-2, DDS/ODS/DP-3, 3A/B, 4A/B; Readiness Reviews, TRR/ORR. SSA labels: S-C Matrix, Hazards, PHA, FTA/FMEA, Risk Assessment, CHAWS*, Issues (twice)]

*CHAWS = CLCS Hazard Analysis Worksheet

Page 17: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

System Safety Analysis

[Figure: KDP-P-2901 SSA Process on the development timeline. Timeline labels: Conceptual Design, Detail Design, Code Development, System Test, Val/Ver Test, 5A/B (With Hdwr); IPT/DP-1, SRS/DP-2, DDS/ODS/DP-3, 3A/B, 4A/B; Readiness Reviews, TRR/ORR. SSA labels: S-C Matrix, Hazards, PHA, FTA/FMEA, Risk Assessment, SSA Report, CHAWS*, Risk, CM-Driven Changes, Issues (twice)]

*CHAWS = CLCS Hazard Analysis Worksheet

Page 18: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

Software Fault Tree Analysis

• Works backward from the fault to its root causes
• Uses design details of the entire system
• Leads to better understanding of causes and their prevention
• Unknown fault events not considered

Page 19: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

Fault Tree Analysis

[Figure: fault tree. Top Event: Fill Valve not closed. Other events: Human did not notice pressure; S/W did not react to over pressure; S/W did not anticipate rapid pressure rise; Other Root Cause. Legend: Basic Fault Events, Intermediate Events, Causal Relationship, AND]
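A fault tree like the one on this slide can be evaluated mechanically once its gates are known. The Python sketch below reuses the slide's event names, but the gate placement is an assumption: the transcript preserves only the labels and a single AND, so the OR gate and the parent/child layout here are hypothetical, as are the `Event` class and its `occurs` method.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    name: str
    gate: str = "BASIC"  # "BASIC", "AND", or "OR"
    children: List["Event"] = field(default_factory=list)

    def occurs(self, basic: Dict[str, bool]) -> bool:
        """Evaluate this event given the truth values of the basic events."""
        if self.gate == "BASIC":
            return bool(basic.get(self.name, False))
        results = [child.occurs(basic) for child in self.children]
        return all(results) if self.gate == "AND" else any(results)


# Assumed structure (not recoverable from the slide transcript).
sw_anticipate = Event("S/W did not anticipate rapid pressure rise")
other = Event("Other root cause")
sw_react = Event("S/W did not react to over pressure", "OR", [sw_anticipate, other])
human = Event("Human did not notice pressure")
top = Event("Fill valve not closed", "AND", [human, sw_react])

# The top event needs both the human and the software branch to fail.
print(top.occurs({human.name: True, sw_anticipate.name: True}))
print(top.occurs({sw_anticipate.name: True}))
```

Working backward from `top` through `children` mirrors how the analysis traces a fault to its root causes. As the previous slide notes, events that nobody put in the tree are not considered.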

Page 20: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

Analysis & CLCS Architecture

[Figure: CLCS architecture layers: Hardware Safing, System S/W, Sys Srvcs, Apps Srvcs, Applications. Other labels: Hazardous Event, Detection & Anticipation, Control & Mitigation, Remaining Risk]

Page 21: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

The Software FMEA

Predicted hardware failures followed to their conclusion through the software
• What can go wrong?
• What happens when it does?

Must know system failures up front
Won’t prevent the unexpected
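The two questions on this slide map naturally onto the columns of a worksheet. The Python sketch below shows one possible record layout. The `FmeaRow` type, the `open_items` helper, and both example rows are invented for illustration; they are not the CLCS worksheet format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FmeaRow:
    hardware_failure: str             # what can go wrong?
    software_effect: str              # what happens when it does?
    mitigation: Optional[str] = None  # None: no mitigation identified yet


def open_items(rows: List[FmeaRow]) -> List[FmeaRow]:
    """Rows with no mitigation yet: candidates for issues or design changes."""
    return [row for row in rows if row.mitigation is None]


# Invented example rows, for illustration only.
worksheet = [
    FmeaRow("Pressure sensor stuck at last value",
            "Over-pressure is not detected",
            "Flag stale data when the reading stops changing"),
    FmeaRow("Valve position feedback lost",
            "Close command cannot be confirmed"),
]

for row in open_items(worksheet):
    print(row.hardware_failure, "->", row.software_effect)
```

A table like this can only hold failures someone predicted, which is the limitation the slide points out.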

Page 22: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

CLCS

Spiral Development Cultural Changes

Failure of software Test

Page 23: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

SSA – Traditional Approach

Failure Modes & Effects Analysis

Fault Tree Analysis

Traditional Development
• All or most code available
• A lot known about the system
• Too late…

Page 24: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

SSA - An Iterative Process

Safety Criticality Assessment

Engineering Design Changes

Failure Modes & Effects Analysis

Fault Tree Analysis

Spiral Development

Page 25: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

SSA - Where

S&MA will perform a Software Safety Analysis (SSA) for each delivery and every location; i.e., as we step up to each new drop.

After the initial SSA, an update of the analysis and a new SSA report will be done for each modification to the safety-critical software.

Page 26: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

SSA - Planning

On a PERT chart, the SSA preparation activity will begin during the preparation of the design specifications and have a finish-to-finish relationship with the validation/verification (4A/B) testing.

[Figure: timeline from Design Begin … Val/Ver Test, with the SSA activities PHA, FTA, FMEA, Risk Assessment, SSA Report]

Page 27: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

Ferengi Software Safety Rule #304

The SSA isn’t enough.

Page 28: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

CLCS

Spiral Development Cultural Changes

Failure of software Test

Page 29: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

Paradigms

Software Failures:

“Software does not fail - it just does not perform as intended”

Dr. Nancy Leveson, MIT

Page 30: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

Paradigms

Design and test for functionality:

Also specify what the system should not do.

Then test it.
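One way to read this slide is that every functional test should be paired with a "shall not" test. The Python sketch below shows such a pair for a made-up fill-valve command. The function `fill_valve_command`, the constant `MAX_FILL_PRESSURE`, and the requirement wording are assumptions for illustration only.

```python
MAX_FILL_PRESSURE = 100.0  # hypothetical limit, arbitrary units


def fill_valve_command(requested_open: bool, pressure: float) -> bool:
    """Return True to open the valve.

    Functional requirement: open when requested.
    Shall-not requirement: never open at or above MAX_FILL_PRESSURE.
    """
    return requested_open and pressure < MAX_FILL_PRESSURE


def test_opens_when_requested():
    # Functional test: the system does what it should.
    assert fill_valve_command(True, 20.0) is True


def test_shall_not_open_at_or_above_limit():
    # Negative test: the system does not do what it must not.
    # NaN is included because a comparison with NaN is always False,
    # so an invalid reading also keeps the valve closed here.
    for pressure in (100.0, 100.1, 250.0, float("inf"), float("nan")):
        assert fill_valve_command(True, pressure) is False


test_opens_when_requested()
test_shall_not_open_at_or_above_limit()
```

The second test exists only because the "shall not" behavior was written down as a requirement, which is the cultural change the slide asks for.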

Page 31: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

Some Theory… 2nd Look

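[Figure: Set of Inputs mapped to Set of Outputs, second look. Labels: Unknowns, Known Safe, Known Unsafe, Assumed Safe, Fault Injection (added known). The symbols inside the parentheses were lost in extraction.]

Sources: Normal Operation, Hardware Failures, Human Intervention, Models/Simulators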
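The "added known" label suggests that each injected fault moves one input out of the assumed-safe region and into a known one. The Python sketch below illustrates that idea with a made-up handler. `handle_reading`, `inject_faults`, the command names, the ranges, and the fault list are all hypothetical, and the pass criterion (a faulted input must yield the safe state) is an assumption.

```python
import math
from typing import Callable, Dict, Optional


def handle_reading(reading: Optional[float]) -> str:
    """Hypothetical software under test: maps a pressure reading to a command."""
    if reading is None or math.isnan(reading) or not 0.0 <= reading <= 150.0:
        return "SAFE_STATE"  # invalid data: return to a safe state
    return "CLOSE_VALVE" if reading >= 100.0 else "CONTINUE"


def inject_faults(handler: Callable[[Optional[float]], str],
                  faults: Dict[str, Optional[float]]) -> Dict[str, str]:
    """Feed each faulted input to the handler and record what is now known."""
    results = {}
    for label, value in faults.items():
        try:
            safe = handler(value) == "SAFE_STATE"
        except Exception:
            safe = False  # a crash on bad input counts as unsafe
        results[label] = "known safe" if safe else "known unsafe"
    return results


FAULTS = {
    "dropout": None,
    "not a number": float("nan"),
    "negative": -5.0,
    "over range": 1.0e9,
}

for label, verdict in inject_faults(handle_reading, FAULTS).items():
    print(label, "->", verdict)
```

Inputs that were never injected stay in the assumed-safe region, so the value of the technique grows with the fault list.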

Page 32: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

Design for Safety

“Program and Project Responsibilities”

Dan Goldin message:
• Safety is more than FMEA and FTA
• Safety must be designed in at the earliest

Existing specifications
• Must include safety
• Methods & techniques for mitigation of hazards
• Requirements – traceable and testable

Page 33: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

Initiatives

Dan Goldin: “Design for Safety”
• Smart practices applied early to designs
• Early engineering changes are cheaper
• Provide draft guidance for design of safety-critical software

Process changes
• Design guidelines – NASA-GB-1740.13
• Peer reviews – enhanced checklist
• Test development – fault injection for robustness
• Works to prevent unforeseen fault scenarios

Page 34: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

Objectives

Known fault scenarios
• Analysis
• Redesign
• Test – functionality and robustness

Unknowns
• Design them out of the system
• Test – fault injection

Page 35: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

S/W Safety – Where we are.

Safety-critical software identified & in engineering review

Software Safety Integration Team formed

Software FTA/FMEA in work
• Will be recurring due to spiral development

Design for Safety concepts being integrated

Safety & Reliability Thread introduced

Post-SSA analysis tools being procured

Page 36: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

S/W Safety – What’s Next?

Today
• “Design for Safety” and “Known Fault Analyses”

Tomorrow
• Recursive and bi-directional analyses
• Reliability predictions, Markov, numerical integration, Weibull analysis techniques
• Probabilistic fault injection techniques

Page 37: Software Safety Engineering (S2E) Program Status Dan Fitch March 7, 2001

Summary

Life on the Leading Edge

Probably the “Largest real-time safety-critical control system on the planet”

Safety is our #1 core value

We are on front and center stage – The NASA team is watching