Copyright SoftRel, LLC 2020 This material may not be reprinted in part or in whole without written permission from Ann Marie Neufelder.
BENEFITS OF SOFTWARE RELIABILITY ASSESSMENT AND SOFTWARE FMEAS
Ann Marie Neufelder, SoftRel, LLC, [email protected]
http://www.softrel.com
321-514-4659
Why do people do software reliability assessments? Software FMEAs?
• Software and firmware are growing in size by an average of 10-12% per year according to the General Accounting Office [1]
• With software, only one catastrophic failure needs to escape the testing cycle to affect the ROI of the entire system. Software FMEAs can identify the failure modes that are difficult to see in testing but are catastrophic in operation.
• Leading causes of late software deliveries are [2]:
  • Defect pileup from previous releases resulted in unplanned maintenance
  • Maintaining the previous release required unexpected personnel from the current release, which in turn caused it to be late
  • Both of these can be predicted and managed with SRE
• Contrary to popular belief, the organizations that (legitimately) deliver on time also deliver with fewer defects
  • Does not apply to organizations that are on time via half-baked deliveries
• Almost any development practice that keeps the schedule on track, especially early in development, has the potential to also reduce the defects found in operation
Software reliability
Quantitative assessment used for planning releases, staffing, and minimizing project risk.
Qualitative software failure modes effects analysis used for identifying what can go wrong with the software early enough to affect design.
Ann Marie Neufelder is a recognized leader in both quantitative and qualitative software reliability
IEEE 1633 Recommended Practices for Software Reliability recommends both quantitative and qualitative approaches discussed herein
• Approved on first ballot with 100% approval by DoD, NASA, NRC, medical devices, energy systems, manufacturing
Softrel, LLC currently has the largest database of software development factors versus reliability versus on time delivery
Since 1993, Softrel, LLC has been benchmarking actual operational and test data from 150+ real software systems based on
• Reliability/defect density of deployed software
• Probability of on time delivery
• Magnitude of schedule slip when not on time
• 689 development practices and inherent risks associated with the software release
• Reliability growth
Every 12-18 months predictive models are recalibrated based on new data
Every 4-5 years models are rerun for new development and testing methods
Software projects span many industries and sizes and range from seriously distressed to world class success
Softrel Database has Industry Coverage
Our benchmarking is mainly on engineering systems that contain software
[Pie chart: share of database by industry]
• Machinery 39%
• Defense 30%
• Commercial electronics 10%
• Medical 8%
• Energy 5%
• Space 4%
• Commercial software 4%
Softrel Database has SW Release Outcome Coverage
Our data shows…
• Contrary to popular myth, the probability and magnitude of late deliveries decrease as defect density decreases (on time is with respect to software engineering estimates)
• No successful SW project had more than 2 major risks
• No distressed SW project had fewer than 1 major risk
• Inherent risks typically can’t be avoided and include new technology, new product, new personnel, and target hardware that doesn’t yet exist when software is being developed
• The most distressed project had 612 times as many defects in operation as the most successful project when normalized by effective code size

Outcome | Successful release | Mediocre release | Distressed release
Probability of late delivery (based on SE estimates) | 10-25% | 25%-85% | 100%
Magnitude of late delivery as % of original schedule | 12-25% | 25%-67% | 67%-200%
Defect removal upon operational deployment | >= 75% | 40-74% | < 40%
Fielded defect density per normalized effective size | 0.04 | 0.31 | 1.63
Range of defect density | .0056-.089 | .090-.870 | .880-3.4
No major risks | 78% | 27% | 0%
Exactly one inherent risk | 11% | 64% | 50%
Exactly two inherent risks | 11% | 6% | 30%
Exactly three inherent risks | 0% | 0% | 10%
Four or more inherent risks | 0% | 3% | 10%
Softrel Database covers software projects of nearly every size too
Effective size (EKSLOC) – amount of new and modified code plus de-rated amount of reused code
Normalized EKSLOC – normalized to one base language so as to compare projects developed in different languages
Database includes projects covering the entire range of size
• 875 to 1,587,000 lines of new or modified code for a specific software release
• Reused code is typically an additional 100,000 to 10,000,000 SLOC
• 1 month project to 7 year project
• 1 month of labor to 290 years of labor
How the SRE Assessment Models Work
1. Complete assessment (the score determines the predicted group)
2. Defect density and probability of late delivery are identified from the corresponding row

Predicted group | Predicted normalized fielded defect density | Predicted probability of late delivery
World class | .011 | 10%
Very good | .060 | 20%
Above average | .112 | 25%
Average | .205 | 36%
Fair | .608 | 85%
Impaired | 1.111 | 100%
Distressed | 2.069 | 100%

Predicted operational defects = Defect density x normalized effective size in KSLOC
Predicted failure rate = Predicted defects per month / expected duty cycle
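The lookup and the two formulas above can be sketched in a few lines of Python. This is a minimal illustration and not the Softrel tool: the table values are copied from the slide, while the function names, the reading of "duty cycle" as operating hours per month, and the example inputs are assumptions made for illustration.

```python
# Predicted normalized fielded defect density and probability of late delivery
# per predicted group, copied from the table on the slide.
GROUPS = {
    "World class":   (0.011, 0.10),
    "Very good":     (0.060, 0.20),
    "Above average": (0.112, 0.25),
    "Average":       (0.205, 0.36),
    "Fair":          (0.608, 0.85),
    "Impaired":      (1.111, 1.00),
    "Distressed":    (2.069, 1.00),
}


def predicted_operational_defects(group, normalized_eksloc):
    """Defect density for the group times normalized effective size in KSLOC."""
    density, _probability_late = GROUPS[group]
    return density * normalized_eksloc


def predicted_failure_rate(defects_per_month, duty_cycle_hours_per_month):
    """Predicted defects per month divided by the expected duty cycle.

    Assumption: the duty cycle is expressed in operating hours per month,
    so the result is in failures per operating hour.
    """
    return defects_per_month / duty_cycle_hours_per_month


# Hypothetical project: assessed as "Average" with 50 normalized EKSLOC.
defects = predicted_operational_defects("Average", 50)
# Hypothetical: 2 of those defects surface in a month with 400 operating hours.
rate = predicted_failure_rate(2, 400)
print(f"predicted operational defects: {defects:.2f}")
print(f"predicted failure rate: {rate:.4f} per operating hour")
```

How the predicted defects are spread over the months of operation is not covered by this sketch; the 2 defects per month above is simply a made-up input.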
Key benefits of SRE quantitative assessment
• Identify failed project early: Identify a failed project before it becomes failed.
  • Basic Softrel models predict distressed, mediocre and successful before code is written.
  • Detailed models allow for sensitivity analysis to identify the cheapest and fastest way to get back on track.
• Predict: Predict defect pileup – #1 cause of distressed programs. Predict the best way to schedule releases so as to avoid defect pileup.
• Identify ineffective practices: Identify effective and ineffective development practices.
• Select alternatives: Commercial off the shelf/vendor supplied, reinvent, or reuse to reduce effective code size and hence operational defects; replacing ineffective with effective development practices; planning for reliability growth.
• Benchmark to industry: Benchmark defect density of components to each other or to others.
• Assess vendors: Assess vendors, subcontractors.
IDENTIFY A FAILED SW PROJECT WHEN IT’S EARLY ENOUGH TO MITIGATE
Reason #1 for SRE assessment
SRE predicts this defect profile, so you can manage defects, size and on time delivery and identify a failed project
• Height and width are a function of new/modified code (effective size) and development techniques and risk (defect density).
• Incremental testing can cause multiple peaks.
• Every distressed project in our DB was unaware that the SW failure rate was increasing upon deployment.
• 40 years of history show that all software releases eventually experience a Rayleigh curve. The only difference is height, width and number of peaks. [3]
[Figure: defects versus normalized usage time, with markers at 40% defects removed and 75% defects removed]
• Distressed programs deploy before the peak is observed
• Mediocre programs release SW after the peak and before 75% removal
• Successful deployments release SW from 75% removal onwards
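The defect profile described above can be sketched with the textbook Rayleigh distribution. This is an illustration under assumptions, not the Softrel model: a single peak, the standard Rayleigh form with the peak at `peak_time`, and hypothetical function names. In this form about 39% of defects have been found when the peak is reached, which is close to the 40% marker on the chart.

```python
import math


def fraction_removed(t, peak_time):
    """Cumulative fraction of all defects found by time t (Rayleigh CDF)."""
    return 1.0 - math.exp(-(t ** 2) / (2.0 * peak_time ** 2))


def defect_rate(t, peak_time, total_defects):
    """Defects found per unit time at time t (Rayleigh density times total)."""
    return total_defects * (t / peak_time ** 2) * math.exp(-(t ** 2) / (2.0 * peak_time ** 2))


def time_to_fraction(fraction, peak_time):
    """Time at which the given fraction of defects has been found."""
    return peak_time * math.sqrt(-2.0 * math.log(1.0 - fraction))


def classify_deployment(deploy_time, peak_time):
    """Apply the three deployment rules listed on the slide."""
    if deploy_time < peak_time:
        return "distressed"   # deployed before the peak was observed
    if fraction_removed(deploy_time, peak_time) < 0.75:
        return "mediocre"     # after the peak, before 75% removal
    return "successful"       # 75% removal onwards


# Hypothetical release whose defect discovery peaks at month 10.
for month in (5, 12, 17):
    print(month, classify_deployment(month, peak_time=10),
          f"{fraction_removed(month, 10):.0%} removed")
print(f"75% removal reached at month {time_to_fraction(0.75, 10):.1f}")
```

Releases with several peaks (for example from incremental testing) would need one such curve per increment; that case is left out here.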
Projects are often late and unreliable when SRE isn’t used because of underestimates of scope and defect potential
• No one sets out to release software with an increasing failure rate
• It happens when SRE metrics aren’t used early in the project, when there is still time to do something about it
• The team is expecting a small number of defects when the larger number could have been predicted and managed before code was even written
Real world application of SRE assessment
• Company X had a long history of successful software projects
• Until one, which compared to the national average was mediocre, but compared to their past history was a failure
• Company X wanted to know why so that
  • They don’t spend money fixing the wrong root cause
  • History doesn’t repeat itself
• Root cause was identified from SRE assessment
  • They tried to tackle 4 inherent risks in one release
  • They learned how to
    • Identify the inherent risks that derail the project
    • Schedule them such that there are no more than 2 in any one SW release
    • 2 smaller releases with 2 risks each is better than 1 large release with 4 risks with regard to both schedule and defects
• Testing longer and adding more people did not solve this problem; breaking the releases into smaller chunks did.
AVOID DEFECT PILEUP WHICH IS #1 CAUSE OF LATE RELEASE
Reason #2 for SRE assessment
Defect density/defect prediction can be used to plan release sizes/frequency to avoid defect pileup
Superimpose predicted defects from current and future releases together
[Figure: two charts, "Defects from release #1" and "Defects from release #2", y-axis 0 to 12. Red – operational, Yellow – formal test, Grey – developer testing]
In this example, defects are piling up from release to release
Solutions to pileup –
1) Split features up into more, smaller releases
2) Keep the same spacing but less new code in each release
3) Keep the same code size but greater spacing
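Superimposing release profiles amounts to adding the per-month predictions on a shared calendar. The sketch below is only an illustration: the monthly defect counts, the start months and the function name are made up, and the real predictions would come from the assessment models.

```python
def superimpose(release_profiles, start_months):
    """Sum predicted defects per calendar month across several releases.

    release_profiles: one list per release, defects per month since that
                      release started.
    start_months:     calendar month index at which each release starts.
    """
    horizon = max(start + len(profile)
                  for profile, start in zip(release_profiles, start_months))
    total = [0.0] * horizon
    for profile, start in zip(release_profiles, start_months):
        for offset, defects in enumerate(profile):
            total[start + offset] += defects
    return total


# Hypothetical monthly defect profile for one release.
profile = [2, 6, 10, 8, 5, 3, 1]

close_together = superimpose([profile, profile], start_months=[0, 3])
spread_out = superimpose([profile, profile], start_months=[0, 6])

print("releases 3 months apart, peak load:", max(close_together))
print("releases 6 months apart, peak load:", max(spread_out))
```

With the second release only 3 months behind the first, the combined peak exceeds the peak of either release alone, which is the pileup; greater spacing (solution 3) brings the peak back down. Smaller profiles per release would model solutions 1 and 2 in the same way.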
Real example of avoiding defect pileup
• In the real example below, “kicking the can” was predicted to cause defect pileup
• Releases are too far apart initially and too close together at the end
• SRE predictions allowed for leveling of features before code was even written
[Figure: total defects predicted (nominal case) from releases 3 to 7 for each month, y-axis 0 to 900. Average per month = 132]
IDENTIFY EFFECTIVE AND INEFFECTIVE DEVELOPMENT PRACTICES
Reason #3 for SRE Assessment
Identify which software development characteristics and practices have the biggest effect on software reliability
• Softrel, LLC has mathematically correlated 689 development characteristics related to the categories below to operational defect density
• 23 characteristics can identify distressed, mediocre and successful
• Top 155 characteristics comprise the detailed model, which supports sensitivity analysis
• Assessment identifies gaps as well as predicted improvement when addressing a gap

Category of questions | Examples
Avoiding big blobs - Decomposition | Code a little, test a little philosophy. Release development/test time < 18 months long. Each developer has a schedule that is granular to day or week.
Domain expertise | Expertise of software engineers as end user or with industry
Inherent risks | Government regulations, safety, cyber, untrained end users, etc.
Execution of project | Monitoring software progress daily or weekly, identifying risks early, etc.
Personnel | Small team sizes, software managers who don’t try to both manage people and write code
Planning ahead | Planning the scope, personnel, equipment, risks before they become problematic
Visualization | A picture is worth 1000 words. Specifications with diagrams/pictures/tables are associated with fewer defects than text.
Requirements | Developing requirements that aren’t missing anything important
System testing | Testing the requirements, design, stresses, lines of code, operational profile
Unit testing | Unit testing by every software engineer is mandatory and as per a defined template. Branch coverage tools and metrics.
SRE Assessment considers Neufelder’s Law
In the Softrel DB:
• There have been no successful releases when the engineering cycle exceeds 18 months
• All successful releases have an engineering cycle of <= 18 months
• When the engineering cycle time is <= 8.5 months, few SW projects fail
[Figure: engineering cycle time versus defect density. X-axis: months of development/test time for release (0 to 90). Y-axis: deployed defect density (0 to 1.4). Series: Successful, Mediocre, Distressed]
Identify practices that don’t affect defects as much as people think
• Twenty five years of research by Softrel, LLC shows….

Practices that don’t always reduce defects as much as people think, and why:
• Code reviews
  • Engineers rarely look at what is “missing” from the code
  • Agenda isn’t necessarily related to defects
  • Action items aren’t followed up on
  • Too much time spent on unimportant code
  • Too much time spent on things that could be easily identified with an automated tool
  • Too much time on style, not enough on substance
• SEI CMMi assessment: ROI plateaus at level 3
• Too much focus on formal validation and not enough on developer testing: Organizations forget to do unit and integration testing and focus only on requirements testing, which covers < 40% of code
• Waiting until the code is done to write the test plan: Test plan is based on what the code does as opposed to what it is required to do. Missing code falls through the cracks.
• RM tools such as DOORS: It’s hard to have pictures/diagrams in DOORS. The problem isn’t the tool, it’s the “text” approach to requirements tracing.
ABILITY TO SELECT ALTERNATIVES WHILE ALTERNATIVES ARE STILL FEASIBLE
Reason #4 for SRE Assessment
Use SRE assessment to perform sensitivity analysis
1. Complete assessment (the score determines the predicted percentile group)
2. Find defect density and probability of late delivery from the corresponding row
3. If the project can improve to the next group before code is written, then…
• Average defect reduction = 55%
• Average probability late reduction = 25%

Predicted percentile group | Predicted normalized fielded defect density | Predicted probability of late delivery
World class | .011 | 10%
Successful | .060 | 20%
Above average | .112 | 25%
Average | .205 | 36%
Below average | .608 | 85%
Impaired | 1.111 | 100%
Distressed | 2.069 | 100%
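The average defect reduction quoted in step 3 can be checked against the table. The sketch below takes the defect densities from the slide and computes the relative drop for each one-group improvement; the variable and function names are hypothetical, and treating the average as a simple mean of the six steps is an assumption.

```python
# Predicted normalized fielded defect density per group, worst to best,
# copied from the table on the slide.
DENSITY_BY_GROUP = [
    ("Distressed", 2.069),
    ("Impaired", 1.111),
    ("Below average", 0.608),
    ("Average", 0.205),
    ("Above average", 0.112),
    ("Successful", 0.060),
    ("World class", 0.011),
]


def reductions_to_next_group(table):
    """Fractional drop in defect density for each move up by one group."""
    steps = []
    for (worse, d_worse), (better, d_better) in zip(table, table[1:]):
        steps.append((worse, better, 1.0 - d_better / d_worse))
    return steps


steps = reductions_to_next_group(DENSITY_BY_GROUP)
average_reduction = sum(r for _, _, r in steps) / len(steps)

for worse, better, r in steps:
    print(f"{worse} -> {better}: {r:.0%} fewer predicted defects")
print(f"Average over all steps: {average_reduction:.0%}")
```

Under that assumption the mean comes out at roughly 55%, in line with the figure on the slide. The same exercise is not repeated here for the probability of late delivery, since the slide does not say how that average was formed.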
Identify alternatives
• Software defects found in operation are related to the parameters below, which can be traded off before code is written.
• Some of these things can be changed early in development.
• However, once code is written, testing longer and delaying the schedule, or deploying and living with field support, is typically the only alternative.

Parameter | Sensitivity | Resolution
EFFECTIVE size | Cutting the EFFECTIVE size in half will double the MTTF. | Avoid reinventing the wheel with reuse, COTS, FOSS when possible.
Defect density prediction (assessment of practices and risks) | Cutting the defect density in half will double the MTTF. Problem is that this may not be possible in the short term. Generally not possible to reduce > 50% in one release. | Replace ineffective practices with effective practices.
Reliability growth | Increasing test time on target hardware, removing defects and not adding new features during growth has an exponential effect. | Deploy smaller releases and grow reliability while the next release is under development.
BENCHMARK DEFECT DENSITY OF COMPONENTS TO EACH OTHER OR TO OTHERS
Reason #5 for SRE Assessment
Average defect density by industry and application type

Industry | Fielded defect density per normalized EKSLOC* | 95% confidence
Defense | 0.0899 | 0.0357
Space | 0.2292 |
Medical | 1.0608 | 0.3946
Commercial electronics | 0.2373 | 0.1407
Commercial transportation | 0.0355 |
Commercial software | 0.1681 | 0.0832
Energy | 0.6573 |
Machinery | 0.7365 | 0.2675

Application | Fielded defect density per normalized EKSLOC* | 95% confidence
Vehicle | 0.0956 | 0.0111
Satellite | 0.1023 | 0.0565
Missiles | 0.0108 |
Software only | 0.2477 | 0.2183
Equipment | 0.7037 | 0.2481
Sensor or FW | 0.2292 |
Device | 0.3377 | 0.2591
Aircraft | 0.0355 |

*This means only defects found in operation (after testing).
Average defect density by SEI CMMi level

CMMi level | Predicted fielded defect density | 95% confidence | Predicted testing defect density | 95% confidence
1 | 0.548 | 0.208 | 3.563 | 3.14
2 | 0.182 | 0.086 | 3.554 | 2.755
3, 4 or 5 | 0.1005 | 0.081 | 1.356 | .351
Note that the Softrel, LLC database did not identify any measurable difference in fielded defect density beyond level 3
The demonstrated accuracy of the models
One parameter models (that don’t employ an assessment) are more accurate than guessing but not as accurate as an SRE assessment

Model | Demonstrated relative error when used before code is written
Guessing | 800%
Industry/application lookup | 284% (RSQ = 9.9)
SEI CMMi model overall | 450% (RSQ = 6.6)
SEI CMMi level 1 | 706%
SEI CMMi level 2 | 49% *
SEI CMMi level >= 2 | 155% *
All relative error demonstrations depend on accurate and complete inputs
*If, and only if, the SEI CMMi assessment is recent and the organization developing the software is working at that level consistently and throughout
The demonstrated accuracy of the Softrel SRE Assessment models
• Relative error for world class is high because of the small number of defects predicted.
  • Example: If the model predicts 1 defect and 2 are found, the relative error is 100%.
• Shortcut model is relatively accurate but provides very little sensitivity analysis
• All relative error demonstrations depend on accurate and complete inputs

Demonstrated relative error by percentile group when prediction performed before code is written
Model | # of parameters | Overall | World class | Very good | Above average | Average | Below average | Impaired | Distressed
Full-scale | 100 | 83% | 188% | 35% | 71% | 63% | 26% | 66% | 96%
Full-scale | 208 | 113% | 245% | 50% | 104% | 83% | 48% | 73% | 98%
Full-scale | 361 | 131% | 302% | 82% | 103% | 102% | 23% | 79% | 81%
Shortcut | 22 | 90% | 747% | 60% | 29% | 21% | 26% | 42% | 68%
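The world class example implies that relative error is measured against the predicted value (1 predicted, 2 found, 100% error). The short sketch below assumes that definition, which is inferred from the example rather than stated on the slide, and shows why the same one-defect miss looks very different for small and large predictions. The second pair of numbers is hypothetical.

```python
def relative_error(predicted, actual):
    """Absolute miss divided by the predicted value.

    Assumption: error is taken relative to the prediction, which is the
    reading consistent with the slide's "1 predicted, 2 found = 100%" example.
    """
    return abs(actual - predicted) / predicted


# The slide's world class example: a miss of one defect on a tiny prediction.
print(f"{relative_error(1, 2):.0%}")
# Hypothetical larger project: the same one-defect miss on a big prediction.
print(f"{relative_error(100, 101):.0%}")
```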
ASSESS VENDORS, SUBCONTRACTORS
Reason #6 for SRE Assessments
All models can be used to select and assess vendors/subcontractors
Assessment has one of seven outcomes which can be used relatively to compare one contractor or vendor to another
Several large organizations in industry and Government have used SRE assessment for that purpose
SOFTWARE FMEA
Qualitative methods
Software and firmware FMEAs are used to
• Identify failure modes for systems that are difficult or expensive to test (e.g. missiles, spacecraft)
• Identify a small number of catastrophic failures that would be difficult or expensive to identify during testing
• Identify a small number of catastrophic failures that span across multiple systems (e.g. mass produced systems)
• Find what was not specified and should have been. > 50% of operational failures are due to this, and SFMEA is one of the few tools that can identify it.
• Focus on failure space with regard to requirements, design, code, installation scripts, use cases, user manuals. (Reviews rarely focus on anything other than success space.)
• Identify alternative processing, fault tolerance, health monitoring systems (HMS)
• Develop test plans that cover both off nominal and nominal cases
A few software and firmware failure modes

Failure mode categories | Description
Faulty functionality | The software provides the incorrect functionality, fails to provide required functionality, or provides extraneous functionality
Faulty timing | The software or parts of it execute too early or too late, or the software responds too quickly or too sluggishly
Faulty sequence/order | A particular event is initiated in the incorrect order or not at all
Faulty data | Data is corrupted, incorrect, in the incorrect units, etc.
Faulty error detection and/or recovery | Software fails to detect or recover from a failure in the system
False alarm | Software detects a failure when there is none
Faulty synchronization | The parts of the system aren’t synchronized or communicating
Faulty logic | There is complex logic and the software executes the incorrect response for a certain set of conditions
Faulty processing | The software behaves improperly after an unexpected shutdown
Faulty algorithms/computations | A formula or set of formulas does not work for all possible inputs
Faulty usability | Software engineers have faulty assumptions about end users. End users can’t recover from mistakes they make. User manuals are incorrect, missing or not useful.
Process for performing a Software Failure Modes Effects Analysis is similar to hardware FMEA except for failure modes and viewpoint
[Figure: SFMEA process flow. Phases and the steps shown in the diagram:]
• Prepare Software FMEA: Define scope; Identify resources; Tailor the SFMEA
  • Steps shown: Identify boundary; Set ground rules; Select viewpoints; Identify what can go wrong; Identify riskiest functions; Gather artifacts; Define likelihood and severity; Select template and tools
• Analyze failure modes and root causes: Identify equivalent failure modes; Analyze applicable failure modes; Identify root cause(s) for each failure mode
  • For each use case, use case steps, requirements, interfaces, detailed design, user manuals, installation scripts … (as applicable based on selected viewpoint)
• Identify consequences: Identify local/subsystem/system failure effects; Identify severity and likelihood
• Mitigate: Identify corrective actions; Identify preventive measures; Identify compensating provisions; Revise RPN
• Generate CIL: Generate a Critical Items List (CIL)
SFMEA template
Template columns, in order: Software under analysis | Description of use case, requirement, interface, etc. | Failure mode | Root cause | Local effect | Effect on subsystem | Effect on system | Preventive measures | Severity | Likelihood | RPN = severity * likelihood | Corrective action | Compensating provisions | Revised RPN | Test procedure? | Action item? | Fault tolerance?
• Failure analysis contains information pertinent to the selected viewpoint:
  1. Use case
  2. SRS statement
  3. Interface definition
  4. Function
  5. User instructions
  6. Installation scripts
• Consequences section. There can be more effects such as effects on manufacturer, effects on user, etc.
• Severity ratings can use the same scale as HW. Likelihood = f(development risk * visibility * past history * install base)
• Mitigation section. Some SFMEA rows will feed the test procedures. Some will result in action items. Some will result in fault tolerance.
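The RPN column and the critical items list can be illustrated with a small sketch. Everything specific here is hypothetical: the 1 to 5 rating scales, the threshold, the class and function names, and the example rows. Likelihood is taken as an already-rated input because the slide gives it only as a function of development risk, visibility, past history and install base without specifying the form.

```python
from dataclasses import dataclass


@dataclass
class SfmeaRow:
    failure_mode: str
    severity: int    # hypothetical scale: 1 (negligible) to 5 (catastrophic)
    likelihood: int  # hypothetical scale: 1 (remote) to 5 (frequent)

    @property
    def rpn(self):
        # RPN = severity * likelihood, as in the template column.
        return self.severity * self.likelihood


def critical_items(rows, rpn_threshold):
    """Rows at or above the threshold, highest RPN first."""
    selected = [row for row in rows if row.rpn >= rpn_threshold]
    return sorted(selected, key=lambda row: row.rpn, reverse=True)


# Hypothetical rows using failure mode categories from the earlier table.
rows = [
    SfmeaRow("Faulty data: value in incorrect units", severity=5, likelihood=3),
    SfmeaRow("False alarm", severity=2, likelihood=4),
    SfmeaRow("Faulty timing: response too late", severity=4, likelihood=2),
    SfmeaRow("Faulty usability: manual missing a step", severity=2, likelihood=2),
]

for row in critical_items(rows, rpn_threshold=8):
    print(row.rpn, row.failure_mode)
```

After mitigation, the same calculation with the re-rated severity and likelihood would give the revised RPN column.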
Failure mode identification is the most critical part of SFMEA
• Peel the onion approach typically works the best
  • What can go wrong with the entire software? What’s missing altogether? What happens if a commonly executed function fails? What if software loses track of system state?
  • What can go wrong with one use case or feature? What’s missing within the use case? What if steps execute out of order? What if timing is off? What if the use case inadvertently executes? Does it conflict with other use cases?
  • What can go wrong with one step in a use case or feature? What happens if software shuts down while executing this step of the use case? What happens if data is faulty?
[Figure: onion layers labeled "Entire software system", "One use case or feature", "One software requirement/specification"]
Summary
SRE quantitative assessments available via
• Training class (open session, online, on site) – Software Reliability Toolkit provided to every student. Has basic capabilities in macro enabled spreadsheet
• Frestimate software – Has a graphical user interface for the toolkit and has more features for planning and sensitivity analysis
• Services – Ann Marie Neufelder can perform predictions for you until employees are trained. She can also review SRE assessments once employees are trained.
SRE qualitative SFMEA available via
• Practical Applications of Software Reliability, 2014 available on website and Amazon
• SFMEA toolkit automates 400+ failure mode and root cause pairs. SFMEA toolkit and book bundle - $525
• SFMEA training (open session, online, on site)
• SFMEA services – Ann Marie Neufelder can perform SFMEAs for you until employees are trained. Ann Marie can also review the SFMEAs once employees are trained.
BACKUP MATERIAL
Annex
SFMEA viewpoints

Software viewpoint | Level of architecture applicable for viewpoint | Failure modes
Functional | The use cases, system and software requirements | The system does not do its required function or does the wrong function
Interface | The interface design | The system components aren’t synchronized or compatible
Detailed | The detailed design or code | The design and/or code isn’t implemented to the requirements or design
Maintenance | A change to the design or code | The change to the design or code will cause a new fault in the software
Usability | The ability for the software to be consistent and user friendly | The end user causes a system failure because of the software interface
Serviceability | The ability for the software to be installed or updated without a software engineer | The software doesn’t operate because it isn’t installed or updated properly
Vulnerability | The ability for the software to protect the system from hackers | The software is performing the wrong functions because it is being controlled externally, or sensitive information has been leaked to the wrong people
Software production process | The ability for the software engineering process to uncover software faults prior to operational failure events | The software system has faults that could have been found and corrected prior to operation
A few examples of big disasters caused by little SW faults

Failure event | Associated software fault
Several patients suffered radiation overdose from the Therac 25 equipment in the mid-1980s. [THERAC] | A race condition combined with ambiguous error messages and missing hardware overrides.
AT&T long distance service was down for 9 hours in January 1991. [AT&T] | An improperly placed “break” statement was introduced into the code while making another change.
Ariane 5 explosion in 1996. [ARIAN5] | An unhandled mismatch between 64 bit and 16 bit format.
NASA Mars Climate Orbiter crash in 1999. [MARS] | Metric/English unit mismatch. Mars Climate Orbiter was written to take thrust instructions using the metric unit Newton (N), while the software on the ground that generated those instructions used the Imperial measure pound-force (lbf).
28 cancer patients were over-radiated in Panama City in 2000. [PANAMA] | The software was reconfigured in a manner that had not been tested by the manufacturer.
On October 8th, 2005, the European Space Agency's CryoSat-1 satellite was lost shortly after launching. [CRYOSAT] | Flight Control System code was missing a required command from the on-board flight control system to the main engine.
A rail car fire in a major underground metro system in April 2007. [RAILCAR] | Missing error detection and recovery by the software.
Ann Marie Neufelder authored current guidance for SFMEA

Guidance | Comments
Mil-Std 1629A Procedures for Performing a Failure Mode, Effects and Criticality Analysis, November 24, 1980. Cancelled on 8/1998. | Defines how FMEAs are performed but it doesn’t discuss software components
MIL-HDBK-338B, Military Handbook: Electronic Reliability Design Handbook, October 1, 1998. | Adapted in 1988 to apply to software. However, the guidance provides only a few failure modes and a limited example. There is no discussion of the software related viewpoints.
“SAE ARP 5580 Recommended Failure Modes and Effects Analysis (FMEA) Practices for Non-Automobile Applications”, July, 2001, Society of Automotive Engineers. | Introduced the concepts of the various software viewpoints. Introduced a few failure modes but examples and guidance are limited.
“Effective Application of Software Failure Modes Effects Analysis”, November, 2014, AM Neufelder, produced for Quanterion, Inc. | Identifies hundreds of software specific failure modes and root causes, 8 possible viewpoints and dozens of real world examples.
IEEE 1633 Recommended Practices for Software Reliability, 2016 | Based on AM Neufelder 2014 publication
GUIDANCE RELATED TO SOFTWARE RELIABILITY

Guidance | Comments
IEEE 1633 Recommended Practices for Software Reliability | 2016 document is comprehensive and practical. 2008 document is not.
SAE JA 1002 and 1003 Software Reliability Program Implementation Guide | Useful for developing a software reliability plan. The techniques, however, are discussed elsewhere such as IEEE 1633.
DO178C Software Considerations in Airborne Systems and Equipment Certification | Probably the best software standard for ultra high reliable software.
Rome Laboratory TR-92-52: Software Reliability, Measurement, and Testing, 1992. | Great document but outdated.
DACS Software Reliability Sourcebook | Nice overview but doesn’t discuss predictions
The Handbook of Software Reliability Engineering | Encyclopedia type document. Parts are outdated.
System and Software Assurance Notebook | If combining software and hardware predictions, this is a must have document.
A Survey of Software Reliability Modeling and Estimation by Naval Surface Weapons Center | Contains the theory behind nearly every software reliability growth model
References
[1] US General Accounting Office, GAO report number GAO-10-706T, “Defense Acquisitions: Observations on Weapon Program Performance and Acquisition Reforms”, released May 19, 2010. http://www.gao.gov/products/GAO-10-706T
[2] A. Neufelder, “The Cold Hard Truth About Reliable Software”, published by Softrel, LLC, 2016. http://www.softrel.com/truth.htm
[3] Some references include
a) J. McCall, W. Randell, J. Dunham, L. Lauterbach, Software Reliability, Measurement, and Testing Software Reliability and Test Integration, RL-TR-92-52, Rome Laboratory, Rome, NY, 1992
b) P. Lakey, Boeing Corp., A. Neufelder, “System and Software Reliability Assurance Notebook”, produced for Rome Laboratory, 1997
c) Keene, Dr. Samuel, Cole, G.F. “Gerry”, “Reliability Growth of Fielded Software”, Reliability Review, Vol 14, March 1994