Response to Undesired Events in Software Systems
Presented by
Joe Piccioni
Kim Ushe Mupfumira
Senthil Ramanathan
Smitha Chunduri
Overview
Definition of Undesired Events (UEs)
How to handle UEs
Effects of UEs on code complexity
Impossible Abstractions
Direction of Propagation of UEs
Common Error Indications
Suggestions for a proper UE handling mechanism
Degrees of UE
Factors which determine the degree of UE
Conclusions
What are Undesired Events?
Deviation from normal behavior
Errors should not be handled but corrected
Even with programs proven to be correct, UEs at run-time will continue to be a problem
Routines to respond to UEs must be provided in reliable systems
Why should we expect UEs?
Programs written to demonstrate structured programming are written with the assumption that they will always perform correctly
Incorrect or inconsistent data may be supplied to the system
Programs are changed from time to time, and new errors may appear
What should we do about UEs?
Programs can be defined to take corrective action when UEs occur
Often such programs can only be added after a period of use
The structure of the system should allow for such a likely change or addition to the program, to enhance overall system reliability
A program's response to UEs
Attempt self-diagnosis
Print diagnostic information
Save partial results
Retry
Use alternative resources
Send a message to the user
Leveled structure
A UE will be detected by a lower level
Information available elsewhere (usually at higher levels) determines the appropriate action
The UE should be communicated to higher levels, where diagnosis and recovery are attempted
Effects of UEs on code complexity
The probability of UEs in I/O modules is higher
Straightforward machine language to write on a tape is usually simple
Code needed for error detection and correction makes the program quite complex
As a result, changing the normal case is difficult
Solution
Parnas proposes the use of a software analog of the trap used in hardware systems
Traps simplify code and decrease the probability of UEs going undetected
The code concerned with recovery from a UE is called by means of a trap
This organization achieves a lexical separation of normal use, detection, and correction procedures, thereby easing changes
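The lexical separation described above can be sketched in Java as follows. This is only an illustration under stated assumptions, not Parnas' own notation: the names Trap, TapeWriter, and RecordingTrap are hypothetical, and the "tape" is just an in-memory list with an invented capacity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback that a higher level supplies to a lower level. */
interface Trap {
    void raise(String module, String event);
}

/** Lower level: detects the UE but contains no recovery code. */
class TapeWriter {
    private final Trap trap;
    private final int capacity;   // illustrative capacity, in records
    private final List<String> records = new ArrayList<>();

    TapeWriter(int capacity, Trap trap) {
        this.capacity = capacity;
        this.trap = trap;
    }

    /** The normal case stays short; a detected UE is handed to the trap. */
    boolean write(String record) {
        if (records.size() >= capacity) {
            trap.raise("TapeWriter", "end of tape");
            return false;
        }
        records.add(record);
        return true;
    }

    int written() {
        return records.size();
    }
}

/** Higher level: the response to the UE lives here, apart from write(). */
class RecordingTrap implements Trap {
    final List<String> log = new ArrayList<>();

    @Override
    public void raise(String module, String event) {
        // A trivial trap routine: it only records which module reported what.
        log.add(module + ": " + event);
    }
}
```

Because the writer only knows the Trap interface, the response to "end of tape" can be changed or extended later without touching the normal-case code.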
Separating error-handling code from "regular" code
In traditional programming, error detection, reporting, and handling often lead to confusing spaghetti code.
For example, pseudocode for a function that reads an entire file into memory might look like this:
readFile {
    open the file;
    determine its size;
    allocate that much memory;
    read the file into memory;
    close the file;
}
This function looks simple enough, but it ignores all of the following errors:
– What happens if the file can't be opened?
– What happens if the length of the file can't be determined?
– What happens if enough memory can't be allocated?
– What happens if the read fails?
– What happens if the file can't be closed?
To answer these questions within your readFile function, your code would end up looking like this:
errorCodeType readFile {
    initialize errorCode = 0;
    open the file;
    if (theFileIsOpen) {
        determine the length of the file;
        if (gotTheFileLength) {
            allocate that much memory;
            if (gotEnoughMemory) {
                read the file into memory;
                if (readFailed) errorCode = -1;
            } else errorCode = -2;
        } else errorCode = -3;
        close the file;
        if (theFileDidntClose && errorCode == 0) errorCode = -4;
        else errorCode = errorCode and -4;
    } else errorCode = -5;
    return errorCode;
}
With error detection built in, the original 7 lines have been inflated to 17 lines of code
Worse, there is so much error detection, reporting, and returning that the original 7 lines of code are lost in the clutter
Java provides an easy solution to the problem of error management
Exceptions enable you to write the main flow of your code and deal with the, well, exceptional cases elsewhere
If the readFile function used exceptions instead of traditional error management techniques, it would look like this:
readFile {
    try {
        open the file;
        determine its size;
        allocate that much memory;
        read the file into memory;
        close the file;
    } catch (fileOpenFailed) { doSomething; }
    catch (sizeDeterminationFailed) { doSomething; }
    catch (memoryAllocationFailed) { doSomething; }
    catch (readFailed) { doSomething; }
    catch (fileCloseFailed) { doSomething; }
}
Note that exceptions do not spare you the effort of detecting, reporting, and handling errors
What exceptions do is separate all the details of what to do when a UE happens from the normal case
Also, the code is smaller and its structure is simpler
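As a rough illustration of the exception-based pseudocode, here is a minimal runnable Java sketch. It assumes the standard library call Files.readAllBytes stands in for the open, size, allocate, read, and close steps; the class name ReadFileSketch and the choice to return an empty Optional after a UE are assumptions made for this example, not part of the original slides.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Optional;

class ReadFileSketch {
    /**
     * Reads a whole file. The try block holds the normal case; each catch
     * clause holds the response to one kind of UE.
     */
    static Optional<byte[]> readFile(Path path) {
        try {
            // Opens the file, allocates a buffer, reads, and closes.
            return Optional.of(Files.readAllBytes(path));
        } catch (NoSuchFileException e) {
            // The file could not be opened because it does not exist.
            System.err.println("cannot open: " + e.getFile());
        } catch (IOException e) {
            // Any other I/O failure while sizing, reading, or closing.
            System.err.println("read failed: " + e.getMessage());
        } catch (OutOfMemoryError e) {
            // Shown only to mirror memoryAllocationFailed; catching this
            // error is rarely advisable in production code.
            System.err.println("file too large for memory");
        }
        return Optional.empty();
    }
}
```

The more specific NoSuchFileException is caught before IOException because it is a subclass; the compiler rejects the reverse order.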
Impossible Abstractions
The need to make an appropriate response often severely limits the abstractions we set up.
Programs become less clear when the user can't write all of their code in terms of the abstract model.
– For practical reasons, one must compromise the abstraction and include a set of degraded designs.
Parnas' 2nd suggestion is to not specify a module to have properties which UEs frequently violate.
Interfaces must include the necessary operations to communicate the occurrence of a UE.
The Direction of Propagation of Undesired Events
Downward: violates the specified restrictions on the virtual machine. Represents an "Error of Usage".
Upward: failure of a properly used mechanism, or reflection of an Undesired Event which was previously sent downward. Represents an "Error of Mechanism".
– Job abortion occurs as a last resort.
A program should:
– Recover, or
– Adjust its external state and report the UE upwards.
Continuation After UE "Handling"
The meta-structure previously described has four advantages:
It doesn't violate the principles of information hiding.
The Uses definition remains valid.
It allows evolution in a direction of increased reliability.
Trivial trap routines generally simplify debugging as the system is integrated.
– These routines may only print their own name, but they can also indicate which module is at fault.
– This information in turn will designate who should study the problem.
Football Example
Responsibility Hierarchy
Upper Level: Head Coach
Responsible overall.
Middle Level: Other Coaches and Staff
Responsible for certain groups, preparing certain teams.
Lower Level: Players
Responsible for own performance.
Modular Design
Offense
– O-Line
– Backs
– Receivers
Defense
– D-Line
– Linebackers
– Corners, Safeties
Special Teams
– Punters
– Kickers
– Rest
Head Coach
He or she acts as an interface between the modules.
Systematic Approach = Game Plan
Basic Operations
– Offense:
» Running, Throwing, Catching, Blocking
– Defense:
» Tackling, Batting, Cover, Pursuit
Types of UEs
– Injury, Performance, Penalties, Equipment, Time Management, Drastic Game Situations, etc.
Responsibility Levels
Event Types and Handlers
Player Level (lower):
– Minor injury = Tough it out
– Poor performance = Try harder
– Few penalties = Play smarter
Staff Level (middle):
» These errors may be detected at player level, but staff is responsible for taking the corrective action.
– Severe injury = Substitute player
– Continued poor performance = Switch formation
– Equipment = Replacement
Head Coach Level
Problems which Staff cannot solve
Has information from each module plus game situation information
Time Management
– Run out the clock
– Stop the clock
» Call time out
» Spike the football
» Run out of bounds
Drastic Game Situations
– Onside Kick Attempt
» Special Teams coach can relay the costs of such an attempt.
Points applied to the Example
Impossible Abstractions
– Quarterbacks can run (injury and complexity risk)
– Trick plays
Interfaces contain operations to communicate UEs
– Head Coach can call plays through a microphone directly to the quarterback.
– Staff has complete field snapshots where they can detect important information and bring it to the head coach.
Information Hiding Principles still Valid
– Coaches are concerned only with the team they manage.
– Players concentrate on their own position.
Common error indications
A list of general conditions where a UE could occur.
Aids in constructing a list which specifies the limitations of the program and the UEs which are bound to occur in case of a violation.
Aimed at improving one's anticipation of the types of UEs
It is not a comprehensive list of UEs
Common error indications (contd)
Limitations on the values of parameters
Example:
1. Entering the value of speed in a stationary bike.
2. Entering your address in a web form.
Capacity limitations
Example:
1. When the maximum weight an elevator can carry is exceeded.
2. Uploading attachments to your e-mail.
Common error indications (contd)
Requests for undefined information
Example:
Trying to open a file which doesn't exist.
Restrictions on the order of operations
Examples:
1. A banking module which provides functionality such as inserting, deleting, and displaying a customer account.
2. Trying to access a file before opening the file.
Common error indications (contd)
Detection of actions which are likely to be unintended
Examples:
1. A door in a car is not locked properly.
2. The door of an elevator is not locked properly (here I mean elevators where the doors are manual).
3. Trying to open a file which is already open.
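The two file examples from these slides (accessing a file before opening it, and opening a file that is already open) can be sketched as state checks. This is a hedged illustration only: CheckedHandle and its attempt helper are hypothetical names, the "file" holds placeholder data, and IllegalStateException stands in for whatever trap mechanism a real module would use.

```java
/** Hypothetical file-like handle that checks the order of operations. */
class CheckedHandle {
    private boolean open = false;

    void open() {
        if (open) {
            // Likely unintended: the handle is already open.
            throw new IllegalStateException("already open");
        }
        open = true;
    }

    String read() {
        if (!open) {
            // Order violation: read requested before open.
            throw new IllegalStateException("read before open");
        }
        return "data";   // placeholder payload for the sketch
    }

    void close() {
        open = false;
    }

    /** Demonstration helper: returns the UE message, or "ok" if none occurred. */
    static String attempt(Runnable action) {
        try {
            action.run();
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }
}
```

For instance, calling read() on a fresh handle reports "read before open", and a second open() reports "already open", while the normal open-then-read sequence proceeds without any check failing.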
Suggestions on building a proper UE handling mechanism
Sufficiency
Priority of traps
A single erroneous call may violate several of the applicability conditions.
Trapping to several UE routines is not efficient.
Traps should be prioritized.
Example:
Entering a credit card number when making a purchase on the web.
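One way to picture trap priority for the credit card example is to test the applicability conditions in a fixed order and report only the first violation. The sketch below is an assumption-laden illustration: the CardViolation values, their ordering, and the 16-digit length are invented for the example, and the checksum is the well-known Luhn check.

```java
import java.util.Optional;

/** Hypothetical applicability conditions, declared in priority order. */
enum CardViolation {
    EMPTY, NON_DIGIT, WRONG_LENGTH, BAD_CHECKSUM
}

class CardCheck {
    /** Returns only the highest-priority violation, or empty if there is none. */
    static Optional<CardViolation> firstViolation(String number) {
        if (number == null || number.isEmpty()) {
            return Optional.of(CardViolation.EMPTY);
        }
        if (!number.chars().allMatch(c -> c >= '0' && c <= '9')) {
            return Optional.of(CardViolation.NON_DIGIT);
        }
        if (number.length() != 16) {   // assumed length for this sketch
            return Optional.of(CardViolation.WRONG_LENGTH);
        }
        if (!luhnValid(number)) {
            return Optional.of(CardViolation.BAD_CHECKSUM);
        }
        return Optional.empty();
    }

    /** Luhn checksum: double every second digit, counting from the right. */
    static boolean luhnValid(String digits) {
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9;
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }
}
```

An input such as "12ab" violates both the digit and the length conditions, yet only NON_DIGIT is reported, so a single trap routine runs per erroneous call.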
Suggestions (contd)
Size of the "trap vector"
Influence of the state of a function on occurrence of a trap
Example:
A doctor who diagnoses a patient before providing any treatment.
Suggestions (contd)
Providing accurate information about the UE to the user
This is difficult because the design details hidden from the user are what provide accurate information about the UE
Two extreme approaches
1. Use of a single trap to report failure.
Disadvantages:
It is very hard for the user to diagnose the failure
Suggestions (contd)
2. Fully detailed, where a predicate is associated with each function.
A predicate is set to true if the associated function is affected by the failure.
A master predicate is set to true in case of a catastrophic failure
Disadvantages:
Would return true or false for each function call.
Highly redundant.
Suggestions (contd)
An optimized approach
Failure trap routines pass a parameter which classifies the type of error.
Example: errno and strerror(errno) in the C language.
Redundancy and efficiency
The fully detailed extreme provides a highly insulated module.
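The optimized approach, one trap routine plus a classifying parameter, might look like the following in Java. This is a sketch in the spirit of errno and strerror, not a port of them: FailureKind, its three values, and FailureTrap are hypothetical names chosen to echo the common error indications listed earlier.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical classification of failures, in the spirit of errno. */
enum FailureKind {
    PARAMETER_OUT_OF_RANGE("parameter value out of range"),
    CAPACITY_EXCEEDED("capacity exceeded"),
    UNDEFINED_INFORMATION("requested information is undefined");

    private final String text;

    FailureKind(String text) {
        this.text = text;
    }

    /** Plays the role that strerror(errno) plays in C. */
    String describe() {
        return text;
    }
}

/** One trap routine for the module; the parameter says what went wrong. */
class FailureTrap {
    final List<FailureKind> seen = new ArrayList<>();

    void raise(FailureKind kind) {
        seen.add(kind);
        System.err.println("UE: " + kind.describe());
    }
}
```

This sits between the two extremes: the user gets more than a bare "something failed", but the module does not expose one predicate per function.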
Suggestions (contd)
Redundancy of checks has to be eliminated when UEs are rare.
Retaining the upper level checks
Can detect UEs before any irreversible change
Retaining the lower level checks
Usually preferred, except when it is not difficult to back up
Incidents vs. Crashes
Incidents are events which, although undesired, were expected, and for which recovery attempts were successful.
All other errors are CRASHES!
This distinction is required to allow several degrees of undesired events.
Recovery is considered successful if each degree satisfies a set of predicates.
If the requirements of degree 'i' can't be met, the system attempts to satisfy degree 'i+1'.
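The rule "if degree i cannot be met, try degree i+1" can be sketched as a loop over recovery attempts. The code below is an illustrative assumption: Degree and Recovery are hypothetical names, each degree's predicates are collapsed into a single boolean result, and the string "CRASH" marks the case where no degree succeeds.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

/** One degree: a recovery attempt whose result says whether its predicates hold. */
class Degree {
    final String name;
    final BooleanSupplier attempt;

    Degree(String name, BooleanSupplier attempt) {
        this.name = name;
        this.attempt = attempt;
    }
}

class Recovery {
    /**
     * Tries the degrees in order. Returns the name of the first degree whose
     * requirements were met (an incident), or "CRASH" if none succeeded.
     */
    static String recover(List<Degree> degrees) {
        for (Degree d : degrees) {
            if (d.attempt.getAsBoolean()) {
                return d.name;
            }
        }
        return "CRASH";
    }
}
```

Ordering the list from cheapest to most drastic recovery matches the advice to start with the simplest action and move on only when it fails.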
Example
An error playing a CD:
1. Check if the case is properly closed.
2. Check if the power cord is properly connected.
3. Any internal problem which can be repaired.
4. Any internal part which needs replacement.
5. Serious damage which can't be repaired or replaced.
Degrees of UE
Allows a programmer to define:
What he expects his program to do.
What he wants to treat as an incident and how he is prepared to handle it.
What he means by correct UE handling.
Factors which determine the degree of a UE
Basic Cause
Find the cause by trying recovery actions. Start with the simplest or cheapest and, when it fails, try the next one.
Situation
The degree of an undesired event depends on the situation at the time the UE occurred.
Order of Degrees
Criteria for determining the ordering of degrees:
Order of Aims
Order of Actions
Order of Aims
The situation achieved by degree 'i' is less desirable than the aims of lower degrees.
"Less desirable" depends on the goal and purpose of the user. They might be different for different users.
Order of Actions
The order of degrees may be different even if all degrees lead to the same situation, using different methods and costs.
The decision as to which degree should be tried must be left to the user.
Recovery from a UE requires cooperation of both levels.
Solutions
Provide different versions of the system (the difference lies in their preparation for and recovery from UEs).
Provide recovery actions as operations of the abstract machine.
Dependable, Feel-Good Software
Systematic approach throughout the system.
Abstract interfaces not excessively restrictive.
Pass failures upward, reflect downward-traveling UEs.
UE consideration requires half (or more) of the programmer's effort.
The TRAP function should be a separate module containing the details of inter-level communication.
– This communication is hidden from each level.
Information about UEs is defined in the level's abstract terms.
The Uses hierarchy is maintained.
Costs are low as long as no UE occurs.