SENG521 (Fall 2002) [email protected] 1
SENG 521 Software Reliability & Testing
Overview of Software Reliability Engineering
Department of Electrical & Computer Engineering, University of Calgary
B.H. Far ([email protected])
http://www.enel.ucalgary.ca/~far/Lectures/SENG521/01/
Contents
- About this course.
- What is software reliability?
- What factors affect software quality?
- What is software reliability engineering?
- Software reliability engineering process.
Section 1: Basic Concepts & Definitions
Realities …
- Software development is a very high-risk task.
- About 20% of software projects are canceled (missed schedules, etc.).
- About 84% of software projects are incomplete when released (they need patches, etc.).
- Almost all software projects exceed their initial cost estimates (cost overrun).
Software Engineering /1
- Business software has a large number of parts with many interactions (i.e., it is complex).
- Software engineering paradigms provide models and techniques that make this complexity easier to handle.
- A number of contemporary software engineering paradigms have been proposed: object orientation, component-ware, design patterns, software architectures, etc.
Software Engineering /2
Evolution of software engineering paradigms:
- Assembly languages
- Procedural and structured programming
- Object-oriented programming
- Component-ware
- Design patterns
- Software architectures
- …
- Software agents
[Figure: the paradigms above placed along a time axis with increasing complexity; the earliest are languages whose conceptual basis is determined by machine architecture, the later ones are languages whose key abstractions are rooted in the problem domain.]
What Affects Software?
- Timeliness: meeting the project deadline and reaching the market at the right time.
- Cost: meeting the anticipated project costs.
- Reliability: working correctly for the designated period on the designated system.
Definition: Failure & Availability
- Failure: any departure of system behavior in execution from user needs.
- Failure intensity: the number of failures per natural unit or per time unit. Failure intensity is a way of expressing reliability.
- Availability: the probability, at any given time, that a system or a capability of a system functions satisfactorily in a specified environment. If you are given an average downtime per failure, availability implies a certain reliability.
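The last point can be made concrete. Assuming every failure causes the same average downtime t_m, and measuring failure intensity λ per unit of uptime, each unit of uptime brings λ·t_m units of downtime, so A = 1 / (1 + λ·t_m), which gives λ = (1 − A) / (A·t_m). A minimal sketch; the function name and the example numbers are illustrative assumptions:

```python
def failure_intensity_from_availability(availability: float,
                                        downtime_per_failure: float) -> float:
    """Failure intensity (failures per unit of uptime) implied by an availability.

    Assumes each failure causes the same average downtime t_m, so that
    A = 1 / (1 + lambda * t_m), which rearranges to
    lambda = (1 - A) / (A * t_m).
    """
    if not 0.0 < availability <= 1.0:
        raise ValueError("availability must be in (0, 1]")
    if downtime_per_failure <= 0.0:
        raise ValueError("downtime per failure must be positive")
    return (1.0 - availability) / (availability * downtime_per_failure)


# Hypothetical example: 99% availability, 0.5 hours average downtime per failure.
lam = failure_intensity_from_availability(0.99, 0.5)
print(f"{lam:.4f} failures per hour of uptime")  # 0.0202
```

A stricter availability or a longer repair time both lower the failure intensity the software may have.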
Definition: Verification & Validation
- Verification: for each development phase or each module, are the inputs and outputs generated correctly, and do they match each other?
- Validation: does the software meet its requirements?
Definition: Reliability
Reliability is the probability that a system or a capability of a system functions without failure for a “specified time” or “number of natural units” in a specified environment (Musa et al.).
A recent survey of software consumers revealed that reliability was the most important quality attribute of application software.
This course is concerned with the engineering of reliable software products.
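When the failure intensity λ is constant (for example, during a period in which no faults are removed), this definition takes the familiar exponential form R(τ) = exp(−λτ), where τ is execution time or a count of natural units. A minimal sketch under that constant-intensity assumption; the numbers are hypothetical:

```python
import math


def reliability(failure_intensity: float, duration: float) -> float:
    """Probability of failure-free operation for `duration` units.

    Assumes a constant failure intensity (failures per time unit or per
    natural unit), i.e. failures follow a homogeneous Poisson process.
    """
    if failure_intensity < 0 or duration < 0:
        raise ValueError("failure intensity and duration must be non-negative")
    return math.exp(-failure_intensity * duration)


# Hypothetical: 0.001 failures per hour over a 100-hour mission.
print(round(reliability(0.001, 100.0), 4))  # 0.9048
```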
About This Course …
The topics discussed include:
- concepts and relationships;
- analytical models and supporting tools;
- techniques for software reliability improvement, including fault avoidance, fault elimination, fault tolerance, error detection and repair, and failure detection and retraction;
- risk management.
Section 2: Reliability
Reliability: Natural System
Natural system life cycle.
Aging effect: the life span of a natural system is limited by the maximum reproduction rate of its cells.
Reliability: Hardware
Hardware life cycle.
The useful life span of a hardware system is limited by its age (wear-out).
Reliability: Software
Software life cycle.
Software systems are changed (updated) many times during their life cycle.
Each update adds to the structural deterioration of the software system.
Software vs. Hardware
- Software reliability does not decrease with time (software does not wear out).
- Hardware faults are mostly physical faults.
- Software faults are mostly design faults, which are harder to measure, model, detect, and correct.
Reliability: Science
Exploring ways of implementing “reliability” in software products.
Reliability science's goals:
- Developing “models” and “techniques” to build reliable software.
- Testing such models and techniques for adequacy, soundness, and completeness.
Section 3: Reliability Engineering
What is Engineering?
Engineering = Analysis + Design + Construction + Verification + Management
- What is the problem to be solved?
- What characteristics of the entity are used to solve the problem?
- How will the entity be realized? How will it be constructed?
- What approach is used to uncover errors in design and construction?
- How will the entity be supported in the long term?
Reliability: Engineering /1
Engineering of “reliability” in software products.
Reliability engineering's goal: developing software that reaches the market
- with “minimum” development time,
- with “minimum” development cost,
- with “maximum” reliability.
[Figure: these three goals together make up software quality.]
Reliability: Engineering /2
Pick quantitative representations for the three factors (cost, time, and reliability) and measure them!
Software quality means getting the right balance among development cost, development time, and reliability.
[Figure: SRE moving from minimum cost and time and maximum reliability toward an optimum balance of cost, time, and reliability.]
What is SRE? /1
Software Reliability Engineering (SRE) is a multifaceted discipline covering the software product lifecycle.
It involves both technical and management activities in three basic areas:
- software development and maintenance;
- measurement and analysis of reliability data;
- feedback of reliability information into the software lifecycle activities.
What is SRE? /2
SRE is a practice for quantitatively planning and guiding software development and test, with emphasis on reliability and availability.
SRE simultaneously does three things:
- It ensures that product reliability and availability meet user needs.
- It delivers the product to market faster.
- It increases productivity, lowering product life-cycle cost.
In applying SRE, one can vary the relative emphasis placed on these three factors.
Section 4: Software Reliability Engineering (SRE) Process
SRE: Process /1
There are five steps in the SRE process (for each system to test):
1. Define necessary reliability
2. Develop operational profiles
3. Prepare for test
4. Execute test
5. Apply failure data to guide decisions
SRE: Process /2
The Define Necessary Reliability, Develop Operational Profiles, and Prepare for Test activities all start during the Requirements and Architecture phases of the software development process.
They all extend to varying degrees into the Design and Implementation phase, as they can be affected by it.
The Execute Test and Guide Test (apply failure data to guide decisions) activities coincide with the Test phase.
SRE: Necessary Reliability
- Define what “failure” means for the product.
- Choose a common measure for all failure intensities: either failures per some natural unit or failures per hour.
- Set the total system failure intensity objective (FIO).
- Compute a developed software FIO by subtracting the total of the FIOs of all hardware and acquired software components from the system FIO.
- Use the developed software FIO to track reliability growth during system test.
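The FIO subtraction step is simple arithmetic once every objective is expressed in the same unit. A minimal sketch; the function name and the example budget (hardware and an acquired database) are hypothetical:

```python
def developed_software_fio(system_fio: float, component_fios: list[float]) -> float:
    """FIO left for the software developed in-house.

    All FIOs must use the same unit (e.g. failures per 1000 hours or per
    million transactions). The developed-software FIO is the system FIO
    minus the sum of the hardware and acquired-software FIOs.
    """
    remaining = system_fio - sum(component_fios)
    if remaining <= 0:
        raise ValueError("components already consume the whole system FIO")
    return remaining


# Hypothetical budget, in failures per million transactions:
# system FIO 100, hardware 20, acquired database 30.
print(developed_software_fio(100.0, [20.0, 30.0]))  # 50.0
```

Raising an error when nothing is left flags an infeasible budget early, before system test begins.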
SRE: Operational Profile /1
- An operation is a major system logical task, which returns control to the system when complete.
- An operational profile is a complete set of operations with their probabilities of occurrence.
SRE: Operational Profile /2
There are four principal steps in developing an operational profile:
1. Identify the operation initiators.
2. List the operations invoked by each initiator.
3. Determine the occurrence rates.
4. Determine the occurrence probabilities by dividing the occurrence rates by the total occurrence rate.
There are three kinds of initiators: user types, external systems, and the system itself.
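Step 4 is a normalization of the measured rates. A minimal sketch; the operation names and rates are hypothetical:

```python
def occurrence_probabilities(occurrence_rates: dict[str, float]) -> dict[str, float]:
    """Turn occurrence rates (e.g. operations per hour) into probabilities
    by dividing each rate by the total occurrence rate."""
    total = sum(occurrence_rates.values())
    if total <= 0:
        raise ValueError("total occurrence rate must be positive")
    return {op: rate / total for op, rate in occurrence_rates.items()}


# Hypothetical operations and rates (per hour) gathered in steps 1 to 3.
rates = {"process_call": 900.0, "add_subscriber": 80.0, "generate_report": 20.0}
profile = occurrence_probabilities(rates)
print(profile)  # {'process_call': 0.9, 'add_subscriber': 0.08, 'generate_report': 0.02}
```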
SRE: Operational Profile /3
Using the operational profile:
- Review the functionality to be implemented, removing operations that are not likely to be worth their cost.
- Suggest operations where opportunities for reuse will be most cost-effective.
- Plan a more competitive release strategy using operational development. With operational development, development proceeds operation by operation, ordered by the operational profile. This makes it possible to deliver the most used, most critical capabilities to customers earlier than scheduled.
- Allocate resources for requirements, design, and code reviews among operations to cut schedules and costs.
- Allocate system engineering, architectural design, development, and code resources among operations to cut schedules and costs.
- Allocate development, code, and test resources among modules to cut schedules and costs.
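Operational development can be sketched as sorting operations by occurrence probability and tracking how much of expected field usage each increment covers. This is a simplified illustration: it orders by probability only, and weighting by criticality would need an extra, hypothetical factor. Operation names and probabilities are hypothetical:

```python
def release_order(profile: dict[str, float]) -> list[tuple[str, float]]:
    """Order operations by occurrence probability (most used first) and report
    the cumulative share of field usage covered after each operation."""
    ordered = sorted(profile.items(), key=lambda item: item[1], reverse=True)
    plan: list[tuple[str, float]] = []
    covered = 0.0
    for op, prob in ordered:
        covered += prob
        plan.append((op, covered))
    return plan


profile = {"generate_report": 0.02, "process_call": 0.9, "add_subscriber": 0.08}
for op, covered in release_order(profile):
    print(f"{op:16s} cumulative usage {covered:.0%}")
```

Here the first increment alone would already cover 90% of expected use.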
SRE: Prepare for Test
- The Prepare for Test activity uses the operational profiles to prepare test cases and test procedures.
- Test cases are allocated in accordance with the operational profile.
- Test cases are assigned to the operations by selecting from all the possible intra-operation choices with equal probability.
- The test procedure is the controller that invokes test cases during execution.
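The two allocation rules above can be sketched as follows. Rounding probability × total to whole test cases needs some policy; the largest-remainder rounding used here is an assumption, not a rule from the course. Operation and choice names are hypothetical:

```python
import math
import random


def allocate_test_cases(profile: dict[str, float], total_cases: int) -> dict[str, int]:
    """Split total_cases among operations in proportion to their occurrence
    probabilities, using largest-remainder rounding so the counts sum exactly
    to total_cases."""
    total_prob = sum(profile.values())
    raw = {op: p / total_prob * total_cases for op, p in profile.items()}
    counts = {op: math.floor(x) for op, x in raw.items()}
    shortfall = total_cases - sum(counts.values())
    # Give the leftover cases to the operations with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda op: raw[op] - counts[op], reverse=True)
    for op in by_remainder[:shortfall]:
        counts[op] += 1
    return counts


def pick_choice(choices: list[str], rng: random.Random) -> str:
    """Select one intra-operation choice; every choice is equally likely."""
    return rng.choice(choices)


profile = {"process_call": 0.5, "add_subscriber": 0.3, "generate_report": 0.2}
print(allocate_test_cases(profile, 7))
# {'process_call': 4, 'add_subscriber': 2, 'generate_report': 1}
rng = random.Random(42)
print(pick_choice(["local", "long_distance", "toll_free"], rng))
```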
SRE: Execute Test
- Allocate test time among the associated systems and types of test (feature, load, regression, etc.).
- Invoke the test cases at random times, choosing operations randomly in accordance with the operational profile.
- Identify failures, along with when they occur. This information is used in the Apply Failure Data and Guide Test activities.
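Random invocation in accordance with the profile can be sketched as a schedule generator. Drawing exponential gaps between invocations (a Poisson process) is one simple way to get “random times” and is an assumption here; a real harness would also record each failure and its time. The names and rates are hypothetical:

```python
import random
from typing import Optional


def generate_test_schedule(profile: dict[str, float], invocation_rate: float,
                           duration: float,
                           seed: Optional[int] = None) -> list[tuple[float, str]]:
    """Return (time, operation) pairs for one test run.

    Invocation times form a Poisson process with the given rate (assumed),
    and each invocation picks an operation with the profile's probability.
    """
    rng = random.Random(seed)
    operations = list(profile)
    weights = [profile[op] for op in operations]
    schedule = []
    t = rng.expovariate(invocation_rate)
    while t < duration:
        op = rng.choices(operations, weights=weights, k=1)[0]
        schedule.append((t, op))
        t += rng.expovariate(invocation_rate)
    return schedule


profile = {"process_call": 0.9, "add_subscriber": 0.08, "generate_report": 0.02}
for t, op in generate_test_schedule(profile, invocation_rate=2.0, duration=3.0, seed=1):
    print(f"{t:6.2f} h  {op}")
```

Fixing the seed makes a run reproducible, which helps when a failure has to be re-created.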
Types of Test
- Reliability growth test
- Certification test
SRE: Apply Failure Data
- Plot each new failure as it occurs on a reliability demonstration chart.
- Accept or reject the software (operations) using the reliability demonstration chart.
- Track reliability growth as faults are removed.
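A reliability demonstration chart is built from a sequential probability ratio test (SPRT); the sketch below assumes that construction. Time is normalized by the failure intensity objective (T_N = t × FIO), γ is the discrimination ratio, α the supplier risk and β the consumer risk. The defaults γ = 2, α = β = 0.1 and the function name are illustrative assumptions:

```python
import math


def demonstration_decision(failures: int, elapsed_time: float, fio: float,
                           gamma: float = 2.0, alpha: float = 0.1,
                           beta: float = 0.1) -> str:
    """Classify a point on a reliability demonstration chart.

    failures: number of failures observed so far
    elapsed_time: test time (or natural units) at the current point
    fio: failure intensity objective in matching units
    """
    normalized_time = elapsed_time * fio
    accept_bound = math.log(beta / (1.0 - alpha))  # lower SPRT threshold
    reject_bound = math.log((1.0 - beta) / alpha)  # upper SPRT threshold
    # Log-likelihood ratio of "intensity = gamma * FIO" versus "intensity = FIO".
    log_ratio = failures * math.log(gamma) - (gamma - 1.0) * normalized_time
    if log_ratio <= accept_bound:
        return "accept"
    if log_ratio >= reject_bound:
        return "reject"
    return "continue"


# Hypothetical FIO of 0.1 failures per hour.
print(demonstration_decision(0, 30.0, 0.1))  # accept
print(demonstration_decision(1, 10.0, 0.1))  # continue
print(demonstration_decision(5, 5.0, 0.1))   # reject
```

Each new failure moves the point up by ln γ, while failure-free test time moves it toward acceptance.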
Collect Field Data
SRE covers the whole software product lifecycle.
- Collect field data to use in succeeding releases, either with automatic reporting routines or by manual collection from a random sample of field sites.
- Collect data on failure intensity and on customer satisfaction, and use this information in setting the failure intensity objective for the next release.
- Measure operational profiles in the field and use this information to correct the estimated operational profiles.
- Collect information to refine the process of choosing reliability strategies in future projects.
Section 5: Error & Failure
Definition: Fault
- A fault is a cause of either a failure of the program or an internal error (e.g., an incorrect state or incorrect timing).
- A fault must be detected and then removed.
- A fault can be removed without execution (e.g., by code inspection or design review).
- Fault removal through execution depends on the occurrence of the associated “failure”.
- That occurrence depends on the length of execution time and on the operational profile.
Definition: Error
Error has two meanings:
A discrepancy between a computed, observed or measured value or condition and the true, specified or theoretically correct value or condition.
A human action that results in software containing a fault.
Human errors are the hardest to detect.
More Definitions
- Defect: refers to either a fault (cause) or a failure (effect).
- Service: the expected behavior of a software system.
- Availability: system uptime divided by the sum of system uptime and downtime.
$$\text{Availability} = \frac{\text{Uptime}}{\text{Uptime} + \text{Downtime}}$$
Failure Specification /1
Failure data can be specified as:
1) time of failure,
2) time interval between failures,
3) cumulative failures up to a given time,
4) failures experienced in a time interval.
Failure no.   Failure time (hours)   Failure interval (hours)
1             10                     10
2             19                     9
3             32                     13
4             43                     11
5             58                     15
6             70                     12
7             88                     18
8             103                    15
9             125                    22
10            150                    25
11            169                    19
12            199                    30
13            231                    32
14            256                    25
15            296                    40
Table: time-based failure specification.
Failure Specification /2
1) Time of failure, 2) time interval between failures, 3) cumulative failures up to a given time, 4) failures experienced in a time interval.
Time (hours)   Cumulative failures   Failures in interval
30             2                     2
60             5                     3
90             7                     2
120            8                     1
150            10                    2
180            11                    1
210            12                    1
240            13                    1
270            14                    1
Table: failure-based failure specification.
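The two specifications are two views of the same data: grouping the failure times of the time-based table into 30-hour windows (a failure exactly on a window boundary counted in the window it closes) reproduces the failure-based table. A minimal sketch; the helper names are hypothetical and the failure times must be sorted:

```python
from bisect import bisect_right

# Failure times (hours) from the time-based failure specification table.
FAILURE_TIMES = [10, 19, 32, 43, 58, 70, 88, 103, 125, 150, 169, 199, 231, 256, 296]


def intervals_between_failures(times: list[int]) -> list[int]:
    """Time-based view: gap between each failure and the previous one (start at 0)."""
    return [t - prev for prev, t in zip([0] + times[:-1], times)]


def failure_counts(times: list[int], window: int,
                   n_windows: int) -> tuple[list[int], list[int]]:
    """Failure-based view: cumulative failures at each window boundary and the
    number of failures inside each window (boundary failures count inclusively)."""
    cumulative = [bisect_right(times, window * k) for k in range(1, n_windows + 1)]
    per_window = [c - p for p, c in zip([0] + cumulative[:-1], cumulative)]
    return cumulative, per_window


print(intervals_between_failures(FAILURE_TIMES))
# [10, 9, 13, 11, 15, 12, 18, 15, 22, 25, 19, 30, 32, 25, 40]
print(failure_counts(FAILURE_TIMES, window=30, n_windows=9))
# ([2, 5, 7, 8, 10, 11, 12, 13, 14], [2, 3, 2, 1, 2, 1, 1, 1, 1])
```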
Failure Specification /3
Many reliability modeling programs and the tools based on them (e.g., SMERFS and CASRE) can estimate model parameters from either “failure count” or “time interval between failures” data.
Failure Functions /1
The cumulative failure function (mean value function) denotes the average cumulative failures associated with each time point.
Failures in time period   Probability   Value × Probability
0                         0.10          0.00
1                         0.18          0.18
2                         0.22          0.44
3                         0.16          0.48
4                         0.11          0.44
5                         0.08          0.40
6                         0.05          0.30
7                         0.04          0.28
8                         0.03          0.24
9                         0.02          0.18
10                        0.01          0.10
Cumulative failures (mean)              3.04
Table: failure distribution.
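The value of the mean value function at a given time is the expected value of the failure distribution at that time, i.e. the sum of the “Value × Probability” column. A minimal sketch; the function name is hypothetical:

```python
def mean_failures(distribution: dict[int, float]) -> float:
    """Expected number of failures: sum of value x probability."""
    if abs(sum(distribution.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(n * p for n, p in distribution.items())


# Distribution from the table (failures in the time period -> probability).
dist = {0: 0.10, 1: 0.18, 2: 0.22, 3: 0.16, 4: 0.11, 5: 0.08,
        6: 0.05, 7: 0.04, 8: 0.03, 9: 0.02, 10: 0.01}
print(round(mean_failures(dist), 2))  # 3.04
```

Applied to the 5-hour column of Failure Functions /5, the same calculation gives 7.77.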
Failure Functions /2
- The failure intensity function (FIF) represents the rate of change of the cumulative failure function.
- As faults are removed, failure intensity tends to drop and reliability tends to increase.
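Since the FIF is the rate of change of the cumulative failure function, a crude estimate from grouped data is the number of failures in each interval divided by the interval length. The raw estimate is noisy; fitted reliability growth models give smoother curves. A minimal sketch using the per-interval counts of the failure-based specification table (30-hour intervals); the function name is hypothetical:

```python
def interval_failure_intensity(failures_per_window: list[int],
                               window_hours: float) -> list[float]:
    """Average failure intensity (failures per hour) in each window: a
    finite-difference estimate of the slope of the cumulative failure function."""
    return [n / window_hours for n in failures_per_window]


# Failures per 30-hour window, from the failure-based specification table.
counts = [2, 3, 2, 1, 2, 1, 1, 1, 1]
window = 30
for k, lam in enumerate(interval_failure_intensity(counts, window), start=1):
    print(f"{window * (k - 1):3d}-{window * k:3d} h: {lam:.3f} failures/hour")
```

The estimates fall from about 0.1 to about 0.033 failures per hour, consistent with faults being removed.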
Failure Functions /3
- Mean time to failure (MTTF): the expected time until the next failure is observed. With R(x) the reliability,
$$\text{MTTF} = \int_0^\infty R(x)\,dx$$
- Mean time to repair (MTTR): the expected time until the system is repaired.
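The MTTF integral can be checked numerically. For a constant failure intensity λ, R(x) = exp(−λx) and the integral equals 1/λ. The sketch below uses the trapezoidal rule and truncates the integral at a finite upper limit, assumed large enough that R(x) is negligible beyond it; the function name and λ value are hypothetical:

```python
import math
from typing import Callable


def mttf(reliability: Callable[[float], float], upper: float,
         steps: int = 100_000) -> float:
    """Approximate MTTF = integral of R(x) from 0 to infinity with the
    trapezoidal rule, truncated at `upper`."""
    h = upper / steps
    total = 0.5 * (reliability(0.0) + reliability(upper))
    for i in range(1, steps):
        total += reliability(i * h)
    return total * h


lam = 0.01  # hypothetical constant failure intensity, failures per hour
estimate = mttf(lambda x: math.exp(-lam * x), upper=2000.0)
print(round(estimate, 2))  # 100.0, matching 1 / lam
```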
Failure Functions /4
- Failure rate function: the probability per unit time that a failure occurs in the interval [t, t + Δt], given that no failure has occurred before t.
- Mean time between failures (MTBF): MTBF = MTTF + MTTR.
- Availability can also be defined as:
$$\text{Availability} = \frac{\text{MTTF}}{\text{MTBF}} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}$$
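Both formulas translate directly into code. A minimal sketch; the function names and example times are hypothetical:

```python
def mtbf(mttf: float, mttr: float) -> float:
    """Mean time between failures: mean time to failure plus mean time to repair."""
    return mttf + mttr


def availability(mttf: float, mttr: float) -> float:
    """Availability = MTTF / MTBF = MTTF / (MTTF + MTTR)."""
    if mttf < 0 or mttr < 0 or mttf + mttr == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf / mtbf(mttf, mttr)


# Hypothetical: 98 hours of operation between failures on average, 2 hours to repair.
print(mtbf(98.0, 2.0))          # 100.0
print(availability(98.0, 2.0))  # 0.98
```

Halving MTTR raises availability as much as a corresponding increase in MTTF would, which is why repair time matters in availability targets.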
Failure Functions /5
Failures in time period   Probability (elapsed time 1 hour)   Probability (elapsed time 5 hours)
0                         0.10                                0.01
1                         0.18                                0.02
2                         0.22                                0.03
3                         0.16                                0.04
4                         0.11                                0.05
5                         0.08                                0.07
6                         0.05                                0.09
7                         0.04                                0.12
8                         0.03                                0.16
9                         0.02                                0.13
10                        0.01                                0.10
11                        0.00                                0.07
12                        0.00                                0.05
13                        0.00                                0.03
14                        0.00                                0.02
15                        0.00                                0.01
Mean                      3.04                                7.77
Reliability Model
Inputs to the reliability model:
- Fault introduction: characteristics of the product (e.g., program size); development process (e.g., SE tools and techniques, staff experience, etc.).
- Fault removal: failure discovery (e.g., extent of execution, operational profile); quality of repair activity.
- Environment.
Conclusion
Software Reliability Engineering (SRE) can offer metrics to help elevate a software development organization to the upper levels of software development maturity.