A New Systems-Theoretic Approach to Safety
Dr. John Thomas
Outline
• Goals for a systemic approach
• Foundations
• New systems approaches to safety
– STAMP (Systems-Theoretic Accident Model and Processes)
– STPA (hazard analysis)
– CAST (accident analysis)
Goals for a systemic approach
• Need to expand our view of safety
– Safety is dependent on many aspects
– Technical, human, organizational, etc.
– Need to understand the whole system of interactions
• Need to build in safety from the start
– Versus waiting to assure a finished design is safe
• Handle challenges in modern systems
– Traditional approaches developed for relatively simple electro-mechanical systems
– Software and digital complexity make exhaustive testing impossible
– Role of humans is changing
– Unanticipated and unexpected emergent system behavior
Goals for a systemic approach
• Need to expand our view of safety
– Safety depends on many aspects
– Technical, human, organizational, etc.
• Technical factors
– Easy to focus on independent random failures
– But other technical problems pose growing challenge
– Design errors
– Incomplete/incorrect requirements
• Esp. accidents from software operating as required
– Incorrect assumptions
[Figure: Technology]
The problem doesn’t exist in any single component
It exists in the interactions among many components
Goals for a systemic approach
• Need to expand our view of safety
– Safety depends on many aspects
– Technical, human, organizational, etc.
• Software
– Doesn’t “fail” like hardware
– “Curse of software”
– Most software-related accidents result from flawed requirements
[Figure: Technology]
Goals for a systemic approach
• Need to expand our view of safety
– Safety depends on many aspects
– Technical, human, organizational, etc.
• Human behavior / social factors
– Human error more than random component failure
– Need to look deeper than human-machine interface
Must consider:
– “Clumsy automation”, mode confusion, etc.
– How technology might induce human error
– Human error often a symptom of deeper trouble (Dekker)
• To fix, need to understand why it would make sense at the time
[Figure: Technology, Human]
China Airlines 006
• Autopilot compensates for single engine malfunction
• Autopilot reaches max limits, aircraft turns slightly
• Pilots not notified that autopilot is at its limits
• Pilots disengage autopilot for manual control
– Controls return to default
– Aircraft immediately nosedives
Goals for a systemic approach
[Figure: Technology; Human (operations, engineering, etc.)]
• Need to expand our view of safety
– Safety depends on many aspects
– Technical, human, organizational, etc.
• Engineering and development
– Engineers are human too!
– Design/requirements errors are another form of human error
– Fixing design/requirements problems is not enough
• What about the processes that created them and analysis methods that overlooked them?
Goals for a systemic approach
• Need to expand our view of safety
– Safety depends on many aspects
– Technical, human, organizational, etc.
• Stakeholders: vendors, regulators, contractors, public, etc.
• Organizational, managerial, leadership, culture, etc.
– Clearly impact safety, but too easily ignored
– How can we anticipate these influences?
– How do we include them in a systemic approach?
[Figure: Technology, Human / Social, Organizational]
Goals for a systemic approach
• Need a holistic view of safety
– Cannot consider these factors in isolation
– Highly dependent on interactions
– These are complex socio-technical systems
– Social must be integrated with the technical
STAMP: a systems approach (Nancy Leveson)
• A new view of safety based on systems theory
• Treat safety as a dynamic control problem
– Safety requires enforcing constraints on system behavior
– Accidents occur when interactions among components violate those constraints
– Safety a control problem, not just failure problem
• Captures dysfunctional interactions and unsafe system behavior
– Whether due to failures, design errors, flawed requirements, human behavior, etc.
– Includes unanticipated and unexpected behaviors
– Includes systemic factors for accidents
Nancy Leveson, 2012, Engineering a Safer World
Safety as a control problem
• Examples
– O-ring did not control propellant gas release in field joint of Challenger Space Shuttle
– In HPCI example, did not adequately control the flow of water into the plant
– At Fukushima, did not control the release of radioactivity from the plant
– Software did not adequately control descent speed of Mars Polar Lander
[Figure: control loop labeled Controlled Process, Process Model, Control Actions, Feedback]
STAMP
• Controllers use a process model to determine control actions
• Accidents often occur when the process model is incorrect
• Four types of hazardous control actions:
1) Control commands required for safety are not given
2) Unsafe ones are given
3) Potentially safe commands are given too early or too late
4) Control action stops too soon or is applied too long
Controller
Explains software errors, human errors, component interaction accidents, component failures …
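The control loop above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the tank, the fill valve, the target level, and the names Tank, Controller, and run are all hypothetical. It shows a controller that decides from its process model (its belief about the tank level) rather than from the real process, so a loss of feedback leaves the model stale and the controller keeps issuing a control action that has become unsafe, even though no component has "failed".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tank:
    """Controlled process: a tank filled through a valve (hypothetical example)."""
    level: float = 0.0
    capacity: float = 10.0

    def step(self, valve_open: bool) -> None:
        # Each time step with the valve open adds one unit of liquid.
        if valve_open:
            self.level += 1.0

    @property
    def overflowed(self) -> bool:
        return self.level > self.capacity


class Controller:
    """Chooses control actions from its process model, not from the real process."""

    def __init__(self, target: float) -> None:
        self.target = target
        self.believed_level = 0.0  # the process model

    def update(self, feedback: Optional[float]) -> None:
        # Missing feedback leaves the process model unchanged (and possibly stale).
        if feedback is not None:
            self.believed_level = feedback

    def control_action(self) -> bool:
        # Control action: open the valve while the believed level is below target.
        return self.believed_level < self.target


def run(steps: int, feedback_lost_after: int) -> Tank:
    """Simulate the loop; feedback stops arriving after the given step."""
    tank = Tank()
    controller = Controller(target=8.0)
    for t in range(steps):
        feedback = tank.level if t < feedback_lost_after else None
        controller.update(feedback)
        tank.step(controller.control_action())
    return tank


if __name__ == "__main__":
    print("with feedback:", run(20, 20))
    print("feedback lost:", run(20, 5))
```

With feedback intact the controller stops filling at the target. With feedback lost early, "open valve" is an unsafe control action of the second type (an unsafe command is given), caused by an incorrect process model.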
© Copyright John Thomas 2013
Example Safety Control Structure
Control structure examples (from completed analyses)
HPCI/RCIC
Safety Control Structure
More Detailed Control Structure
Cyclotron
Proton Therapy Machine High-level Control Structure
Beam path and control elements
Gantry
Proton Therapy Machine High-level Control Structure
Antoine PhD Thesis, 2012
Proton Therapy Machine Control Structure
Antoine PhD Thesis, 2012
Chemical Plant
Engineering a Safer World (ESW), p. 354
Image from: http://www.cbgnetwork.org/2608.html
Captures interactions between Management, Operations, Technology, Engineering, etc.
Ballistic Missile Defense System
Image from: http://www.mda.mil/global/images/system/aegis/FTM-21_Missile%201_Bulkhead%20Center14_BN4H0939.jpg
Safeware Corporation
Extremely complex system
But the complexity is managed
U.S. pharmaceutical safety control structure
Image from: http://www.kleantreatmentcenter.com/wp-content/uploads/2012/07/vioxx.jpeg
CAST and STPA
Accidents are caused by inadequate control
How do we find inadequate control in a design or accident?
[Figure: STAMP Model, with STPA Hazard Analysis and CAST Accident Analysis]
Nancy Leveson, 2012, Engineering a Safer World
System-Theoretic Process Analysis (STPA)
• Method of applying STAMP for a design
• Integrates safety into system engineering
– Can drive design from the beginning of project (more efficient)
• Can also analyze hazards in existing design
• Starts at a very high level of abstraction
– Scalable to extremely complex systems
• Can help identify unexpected accident scenarios
STPA (System-Theoretic Process Analysis)
• Identify accidents and hazards
• Construct the control structure
• Step 1: Identify unsafe control actions
• Step 2: Identify causal factors and control flaws
[Figure: control loop labeled Controller, Control Actions, Controlled process, Feedback; captions: STAMP Model, STPA Hazard Analysis]
Identifying Unsafe Control Actions
Control Action | Not providing causes hazard | Providing causes hazard | Incorrect Timing/Order | Stopped Too Soon / Applied Too Long
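The table layout for Step 1 can be sketched as a small worksheet generator. This is an illustrative sketch only: the function names uca_worksheet and render and the two control actions ("Open valve", "Close valve") are hypothetical and not from the slides. It creates one empty cell for every pairing of a control action with one of the four column headings, so that no combination is skipped during analysis.

```python
from itertools import product

# The four column headings of the unsafe control action table.
UCA_TYPES = (
    "Not providing causes hazard",
    "Providing causes hazard",
    "Incorrect timing/order",
    "Stopped too soon / applied too long",
)


def uca_worksheet(control_actions):
    """Return one empty worksheet cell per (control action, column) pair."""
    return {(action, uca_type): []
            for action, uca_type in product(control_actions, UCA_TYPES)}


def render(sheet):
    """Format the worksheet as pipe-separated rows, flagging unanalyzed cells."""
    lines = []
    for (action, uca_type), entries in sheet.items():
        status = "; ".join(entries) if entries else "(to be analyzed)"
        lines.append(f"{action} | {uca_type} | {status}")
    return "\n".join(lines)


# Hypothetical control actions for a tank fill valve.
sheet = uca_worksheet(["Open valve", "Close valve"])
sheet[("Open valve", "Providing causes hazard")].append(
    "Valve opened while tank is full"
)
print(render(sheet))
```

The analyst still decides which cells are hazardous and in what context; the sketch only guarantees that every cell is visited.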
STPA Step 2
[Figure: control loop annotated with causal factors]
• Controller
– Inadequate control algorithm (flaws in creation, process changes, incorrect modification or adaptation)
– Process model inconsistent, incomplete, or incorrect
– Control input or external information wrong or missing
– Missing or wrong communication with another controller
• Control path (controller, actuator, process)
– Inappropriate, ineffective, or missing control action
– Actuator: inadequate operation
– Delayed operation
– Conflicting control actions
• Controlled process
– Component failures
– Changes over time
– Unidentified or out-of-range disturbance
– Process input missing or wrong
– Process output contributes to system hazard
• Feedback path (process, sensor, controller)
– Sensor: inadequate operation
– Incorrect or no information provided
– Measurement inaccuracies
– Inadequate or missing feedback
– Feedback delays
Is it Practical?
• STPA has been or is being used in a large variety of industries
– Nuclear and Electrical Power
– Spacecraft
– Aircraft
– Air Traffic Control
– UAVs (RPAs)
– Defense
– Automobiles (GM, Ford, Nissan?)
– Medical Devices and Hospital Safety
– Chemical plants
– Oil and Gas
– CO2 Capture, Transport, and Storage
– Etc.
Is it Practical? (2)
Social and Managerial
• Analysis of the management structure of the space shuttle program (post-Columbia)
• Risk management in the development of NASA’s new manned space program (Constellation)
• NASA mission control: re-planning and changing mission control procedures safely
• Food safety
• Safety in pharmaceutical drug development
• Risk analysis of outpatient GI surgery at Beth Israel Deaconess Hospital
• Analysis and prevention of corporate fraud
Does it Work?
• Most of these systems are very complex (e.g., the U.S. Missile Defense System)
• In all cases where a comparison was made:
– STPA found the same hazard causes as the old methods
– Plus it found more causes than traditional methods
• All components were operating exactly as intended but complexity of component interactions led to unanticipated system behavior
• Examples: missing case in software requirements, timing problems in sending and receiving messages, etc.
– Sometimes found accidents that had occurred that other methods missed
– Cost was orders of magnitude less than the traditional hazard analysis methods
One Example:
• Blood Gas Analyzer (Vincent Balgos)
– 75 scenarios found by FMEA
– 175 identified by STPA
– Took much less time and resources (mostly human)
• FMEA took a team of people months to perform
• STPA took one person two weeks (and he was just learning STPA)
– Only STPA found scenario that had led to a Class 1 recall by FDA (actually found nine scenarios leading to it)
Automating STPA
[Figure: Hazards, Hazardous Control Actions, Formal (model-based) requirements specification]
• Can automate most of Step 1 (but requires human decision making)
• Formal underlying discrete mathematical models allow for automated consistency/completeness checks (can detect conflicts)
• Have not yet automated Step 2 (causes of unsafe control actions)
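One way such a consistency/completeness check could look is sketched below. It is not the tool described in the talk. It assumes that each control action is assessed over a small set of discrete process-model variables; the variables, the train door example, and the function name check are all hypothetical. The check reports contexts that were never assessed (incomplete) and contexts where both providing and not providing the action are marked hazardous (a conflict, since no safe choice remains).

```python
from itertools import product

# Hypothetical discrete process-model variables for a train door controller.
VARIABLES = {
    "train_motion": ("stopped", "moving"),
    "emergency": ("no", "yes"),
}

# Control action under analysis: "open door".
# Each entry maps a context to (hazardous if provided, hazardous if not provided).
# These judgments come from a human analyst; the values here are illustrative.
ASSESSMENT = {
    ("stopped", "no"): (False, False),
    ("stopped", "yes"): (False, True),
    ("moving", "yes"): (True, True),  # conflict: no safe choice in this context
    # ("moving", "no") is deliberately left out to show the completeness check
}


def check(variables, assessment):
    """Return (conflicts, missing) over every combination of variable values."""
    conflicts, missing = [], []
    for context in product(*variables.values()):
        if context not in assessment:
            missing.append(context)  # completeness gap: context never assessed
            continue
        provided_hazardous, omitted_hazardous = assessment[context]
        if provided_hazardous and omitted_hazardous:
            conflicts.append(context)  # both choices are marked hazardous
    return conflicts, missing


conflicts, missing = check(VARIABLES, ASSESSMENT)
print("conflicts:", conflicts)
print("missing:", missing)
```

Enumerating every context is what makes the check mechanical. Deciding whether each context is hazardous remains a human judgment, in line with the first bullet above.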
Thank you!
• Email: [email protected]
• Interested in systems approach to security?
– STAMP / STPA works for security too!
• Book: “Engineering a Safer World”
– MIT Press, 2012 (Nancy Leveson)
• STPA Primer
– More examples, exercises
– Search Google for “STPA Primer”