presentation bio return to main menu f9presentation bio return to main menu paper bio f9 friday, nov...
TRANSCRIPT
P R E S E N T A T I O N
International Conference On
Software Testing, Analysis & ReviewNOV 8-12, 1999 • BARCELONA, SPAIN
Presentation
Bio
Return to Main Menu
PresentationPaperBio
Return to Main Menu F9
Friday, Nov 12, 1999
Les Hatton
Testing, Complexity,
Coupling, Diagnosis and
Repetitive Failure
Title Slide
OAKWOOD COMPUTING - SURVIVAL AND AVOIDANCE STRATEGIES FOR SOFTWARE FAILURE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
"Testing: the influence of complexity, couplingrepetitive failure and diagnosis"
EuroStar’99, Barcelona, 12th Nov, 1999
by
Les Hatton
Oakwood Computing, Surrey, U.K.and
Computing Laboratory, University of Kent, UK
Version 1.0: 30/Sep/1999
©Copyright, L.Hatton, 1999
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 2)
Overview of talk
! Some observations on testing
! The nature of fault
! Repetitive failure and diagnosis
! Conclusions
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 3)
Observations
! Why do we test ?
! Growing problems
! Risk management
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 4)
Why do we test ?
! The classic view is to find fault. A successfultest causes the system to fail.
! A more modern view is that testing does twothings:-
– Finds faults and
– Quantifies the run-time behaviour" Risk management
" Demonstration of standard of care
" To study severity of system failure
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 5)
Growing problemsfor testing
! Testing is getting harder in modern systemsbecause of:-
– Increasing complexity
– Increasing coupling
– Failures in education
– ‘Reduced time to market’
! The balance between fault finding and riskassessment is also changing because of:-
– Increasing diagnostic problems
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 6)
Increasing complexity
The amount of software in consumer electronicproducts is currently doubling about every18 months.
• Line-scan TVs have ~250,000 lines of C.
• There are around 200,000 lines of C in a car.
• Modern commercial passenger aircraft havebetween 2 and 5 million lines of code in over ahundred computer systems
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 7)
Increasing complexity
Flat system
Function 1
Function 2
Function 3
Function 4
For M modules and N paths per module,complexity - M x N
Tests easy to construct but many required
This architecture is common in real-time systems
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 8)
Increasing complexity
Deep system
Function 1.1.1.1
Function 1.1.1.2
Function 1.1.1
Function 1.1.2
Function 1.1.3
Function 1.1
Function 1
Function 2
For M modules and N paths per module,and binary fan-out of µ,
complexity -
Tests hard to construct but not so many requiredThis architecture is common in conventional systems
N1 − 2µ( )
ln M
ln 2
1− 2µ[ ]
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 9)
Increasing coupling
Coupling is the degree of interdependencebetween otherwise separate systems
• In telecommunications systems, coupling canbe very high
• In consumer appliances such as cars, manycomputer systems communicate with eachother giving high coupling
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 10)
Failures in education
Note the following quotations:-“Our students graduate and move into industry without anysubstantial knowledge of how to go about testing a program.Moreover, we rarely have any advice to provide in ourintroductory courses on how a student should go abouttesting and debugging his or her exercises”.
“Every programmer and programming organisation couldimprove immensely by performing a detailed analysis of thedetected errors, or at least a subset of them of duty of care”
“An efficient program debugger should be able to pinpointmost errors without going near a computer”
The most depressing aspect about these is thatthey were made in 1979 by Glen Myers.
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 11)
Overview of talk
! Some observations on testing
! The nature of fault
! Repetitive failure and diagnosis
! Conclusions
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 12)
Easy faults ...
Dereference pointer contents 0x0 at
strlen(...) called from
line 126 of myc_constexpr.c called from
line 247 of myc_evalexpr.c called from
line 2459 of myc_expr.c
This is called a stack trace. It points unerringly at theresponsible code line and usually takes a matter ofmoments to fix. This is why pointer failuresamount for a relatively small amount of failure inreleased systems.
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 13)
... and hard faults
...
if ( tolerance == acceptable_tolerance )
...
This is a comparison of real valued variables. It isbroken in most programming languages. In 1982,the author and a colleague spent 14 weeks tryingto find this in the middle of 70,000 lines of signal-processing software, because it occasionallybehaved slightly differently on one machine thananother. An acceptance test found the symptoms.
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 14)
How do faults lead to failure ?
! Ed Adams of IBM (1984) found that
– ~33% of all faults only failed < once every 5000execution years
– The most common failures, ( > once every 5years) were caused by only 2% of the faults.
– Any correction had about a 15% chance ofintroducing a problem at least as big into thesystem.
! Pfleeger and Hatton (1997) found (amongstother things) that:-
– static faults and dynamic failure were highlycorrelated in a high reliability system.
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 15)
The relationship between faultand failure
All faults
Those faultswhich fail
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 16)
Fault mean free path
! As Voas (1998) points out:-
– A significant number of deliberately injectedfaults caused no change in the externalbehaviour of the program.
Note also that inspections find fault and dynamictesting finds failure.
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 17)
Inspection detectable faults inC applications
������
������������
��������
��������
��������
������
����
����
�������
����������
����������
����������
����������
����������
����������
���������
��������
��������
�������
������
������
������
��������
���������
��������
��������
��������
������
��������
���������
��������
������
��������
����������
��������
�������
������
��������
����������
����������
����������
�����������
������������
������������
������������
��������
�������
������
������
����������
����������
��������
������
������������
��������
��������
������
��������
����������
����������
����������
����������
����������
����������
��������
������
������
������
����������������
������
�����������������
���������
������������
������������
������������
����������
�������
������
�����
����
����������
��������
������
������������
������������
����������
������
����
��������
�������
������
����
����
�������
����������
����������
��������
�����
����
����
����
��������
��������
��������
������
�������
����������
����������
��������
�����
�������
���������
������
�������
���������
������
����
����
����
����
����������
��������
����
Wei
ghte
d f
ault
s p
er 1
000
lin
es.
0
5
10
15
20
25
Gra
phic
s
Gen
eral
Ele
c-en
g
Des
ign
Sys
tem
Con
trol
Dat
abas
e
Gra
phic
s
Par
sing
Par
sing
Insu
ranc
e
Uti
liti
es
Uti
liti
es
Uti
liti
es
Con
trol
Com
ms
Com
ms
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 18)
Failure rate of staticallydetectable faults
����������������������������������������������
����������������������������������������������
����������������������������������������������
����������������������������������������������
����������������������������������������������
����������������������������������������������
����������������������������������������������
����������������������������������������������
����������������������������������������������
����������������������������������������������
����������������������������������������������
������������������������������������������������
������������������������������������������������
Data derived from CAA CDIS
0
0.5
1
1.5
2
2.5
3
3.5
4
Average
dynamic
testing
Thorough
dynamic
testing
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 19)
The failure density U curve
Defects perKLOC
Average component complexity
For Ada, assembler, C, C++, Cobol, Fortran, Pascal, and PL/M systems:
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 20)
The failure density U curve -invasive truncation
Defects perKLOC
Average component complexity
In those systems where excessive complexity has been restricted, thecurve is truncated. Here a component specification issue is closelyinvolved with deciding the eventual optimal test strategy.
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 21)
Some observations on OO(Humphrey’s (1995) data)
������������������
�������������������
�������������������
����������������
����������������
��������������������������������
�����������������
�����������������
������������������
������������������
������������������
������������������
������������������
������������������
������������������
������������������
������������������
������������������
Relative time to fix defects in C++
v. Pascal (Humphrey)
0
10
20
30
40
50
60
Code
review
Unit
testing
After
unit
testing
����������������
Pascal����������������
C++
In OO systems, the cost-detection curve appears to rise much quickersuggesting that inspection-weighted testing will be far moreeffective.
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 22)
Some observations on OO(Hatton’s (1998) data)
0
10
20
30
40
50
60
70
80
90
100
< 1
hou
r
< 2
hou
rs
< 5
hou
rs
< 1
0 ho
urs
< 2
0 ho
urs
< 5
0 ho
urs
< 1
00 h
ours
< 2
00 h
ours
C++
C
In these OO systems, around 5% of all failures took an extremely longtime to fix, although they were found easily.
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 23)
The nature of fault
! We can conclude:-
– Certain classes of fault yield much easier totesting than others.
– Certain classes of fault have no effect on theprogram’s expected behaviour.
– Certain classes of fault fail but are entirelyavoidable.
– Most faults never fail
– A knowledge of the design is necessary todetermine the best way to test it.
– The balance between fault finding and riskassessment changes with design.
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 24)
Overview of talk
! Some observations on testing
! The nature of fault
! Repetitive failure and diagnosis
! Conclusions
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 25)
An observablefact
Software systems are unique amongstengineering systems in that their behaviour isdominated by repetitive failure. Why ?
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 26)
Risk Management
! All studies of real systems failure show:-
– It is overwhelmingly likely that systems will fail
– Systems are dominated by repetitive failure
– Relating failure to a responsible fault or faults isgetting much harder leading to a substantial“don’t know” category - the diagnosis problem.
All this leads us inevitably to recognise thatfailure is an inevitable property of softwaresystems and we should assess the risk by testquantification and plan for its management.
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 27)
Risk management
The Patriot missile problem, 1991
• The Patriot missile missed an incoming Scud inthe 1991 Gulf War, which then killed 29 people.
• The tracking software failed because therewere anomalously two different representationsof the constant “0.1”. This in turn was multipliedby system uptime to give a spatial error.
• The defect was found in systems testing toolate to fix.
• The system carried the message, “System mustbe rebooted every 8 hours”. This keeps spatialerror small enough. System was left up 48hours.
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 28)
More evidence for repetitivefailure
����������������������
����������������������
������������������������������������������
��������������������
��������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�����������������������������������������
����������������������
����������������������
����������������������
����������������������
����������������������
����������������������
����������������������
����������������������
����������������������
������������������������������������������
��������������������
��������������������
��������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������
�������������������Per
cen
t of
pos
t-d
eliv
ery
fail
ure
s
0
5
10
15
20
25
30
35
Maj
or
Med
ium
Min
or
Tes
ting
Doc
umen
tati
on
Una
ssig
ned
This data suggests that major / minor failure ratio is somewherebetween 5% and 10%. Note that fully 1/3 were not diagnosable.
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 29)
Diagnosis
Factors inhibiting diagnosis
• System complexity and coupling
• Engineer over-optimism leading to poordiagnostics and hence to poor diagnosis
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 30)
Diagnosis
DiagnosticDistance
DiagnosticQuality
Difficult
Easy
Moderate
Moderate
Poor Good
Close
Distant
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 31)
Moderate, (distant/good)
An example from real life, Airbus A320 AF319,25/8/88, (Mellor (1994)):-• MAN PITCH TRIM ONLY, followed in quick succession by ...
• Fault in right main landing gear
• Fault in electrical flight control system computer 2
• Fault in alternate ground spoilers 1-2-3-5
• Fault in left pitch control green hydraulic circuit
• Loss of attitude protection
• Fault in Air Data System 2
• Autopilot 2 shown as engaged when it was disengaged
• LAVATORY SMOKE
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 32)
Airbus A340 G-VAEL,Sept 1994
Programmers effort:-
Please wait ...
Symptom: The Flight management system hung (andlots of other exciting things).
Translation into English:-
The Flight Management System has crashedand will take slightly less than N of yourearth minutes to reboot. Try whistling.
(This is still a problem after a very large amount of effort).
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 33)
The great local bar disaster
Programmers effort:-
System stressed ...
Symptom: The author’s local bar was unable todispense beer.
Translation into English:-
The printer has run out of paper(Two hours of author and friend, entirely wasted).
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 34)
The computer for the rest of us
Programmers effort:-
More than 64 TCP or UDPstreams open ...
Symptom: The author’s lovely new G3 Mac would notlog onto to his Internet Service Provider
Translation into English:-
The modem is not switched on(Nearly 3 hours wasted).
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 35)
Moderate, (close/poor)
“Button push ignored”
• This appears on the Flight ManagementSystem of a McDonnell-Douglas MD-11, (Drury(1997))
It is not clear what the programmer is trying toconvey. “Paris is the capital of France” wouldhave been equally useful.
• The pilot also noted “The airplane[computer system] manuals were writtenas though by creatures from anotherplanet”.
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 36)
The reasons for repetitivefailure
The following factors very commonly appear:-
• Software engineers expect their systems towork rather than accepting their inevitablefailure and planning accordingly
• Software systems are characterised byexceptionally poor diagnosis and frequentlyincomprehensible user manuals
• Software systems are getting much larger andmore tightly coupled in general
• We don’t learn from our mistakes
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 37)
Repetitive failure in the outsideworld
! Just to show that reluctance to learn isn’t solelythe preserve of software engineers ...
– The DC-10 cargo door saga" In the six months prior to the dreadful crash of the Turkish
Airlines DC-10 in Paris in March 1974, there had been no lessthan 1000 cargo door incidents amongst the then 100 strongfleet of DC-10s. They were disregarded.
" In this incident, the cargo door fell off, and the resulting de-pressurisation caused the cabin floor to collapse severing vitalcontrol cables..
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 38)
What we would like to do
DiagnosticDistance
DiagnosticQuality
Difficult
Easy
Moderate
Moderate
Poor Good
Close
Distant
Education
Design
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 39)
The influence of poordiagnosis on testing
! In essence poor diagnosis has the followingimplications for testing
– Testing priority switches in favour of riskmanagement. Tests find defects which cannotbe corrected.
– Testing is not only assessing the reliability of theproduct but its sensitivity. Well-designedproducts respond easily to correctivemaintenance, (and don’t require much of it !).(Think of car engine evolution)
– One of the responsibilities of testing is to assessdiagnosability
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 40)
Testing is an economictrade-off
••
•• •
•
•ROCOF
USAGE TIME
Fitted Curve
Measured reliability
Desired reliability
The point at which testing stops is the point at which it is economicallyviable to stop given the trade-off between early availability and therisk of failure. It is manifestly NOT an engineering decision.
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 41)
Testing is an economictrade-off
Late requirements change Ambition / capability clash
Replacement of testers
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 42)
Overview of talk
! Some observations on testing
! The nature of defect
! Repetitive failure and diagnosis
! Conclusions
© OC Copyright, 1999EuroStar’99: v. 1.0, 30/Sep/1999, (slide 43)
Conclusions
! Testing is getting harder
! Testing is increasingly concerned with riskassessment
– Diagnosis is much harder
– Systems are much bigger and more tightly-coupled
! Testing is itself a diagnostic for thedevelopment process
! Testing should play a much bigger role indesign
Les Hatton
Les Hatton is an independent consultant in software reliability. He isalso Professor of Software Reliability at the Computing Laboratory,University of Kent, U.K. He holds a B.A. (1970) from King's College,Cambridge, an M.Sc. (1971) and Ph.D. (1973) from the University ofManchester, all in mathematics, an A.L.C.M. (1980) in guitar from theLondon College of Music, and an LL.M. in IT law from the University ofStrathclyde (1999). He received a number of international prizes forgeophysics in the 1970's and '80s culminating in the 1987 EuropeanConrad Schlumberger prize for his work in computational geophysics.
Shortly afterwards, he became interested in software reliability, andchanged careers to study the design of high-integrity and safety-criticalsystems on which he has been a keynote speaker at a number of softwareconferences. He is the author of numerous technical papers and booksand is finally nearing completion of another book entitled "SoftwareFailure: avoiding the avoidable and living with the rest. In October1998, he was voted amongst the “world’s leading scholars of systems andsoftware engineering” for the period 1993-1997 by the US Journal ofSystems and Software.