Velocity EU 2012, Escalating Scenarios: Outage Handling Pitfalls
DESCRIPTION
When things go wrong, our judgement is clouded at best, blinded at worst. To successfully navigate a large-scale outage, being aware of potential gaps in knowledge and context can help make for a better outcome. The Human Factors and Systems Safety community has been studying how people situate themselves, coordinate amongst a team, use tooling, make decisions, and keep their cool in sometimes very stressful and escalating scenarios. We can learn from this research in order to adopt a more mature stance when the s*#t hits the fan. We’re going to look closely at how people behave under these circumstances using real-world examples, and look at what we can learn from High Reliability Organizations (HROs) and fields such as aviation, the military, and trauma-driven healthcare.
TRANSCRIPT
Escalating Scenarios
A Deep Dive Into Outage Pitfalls
John Allspaw
Velocity London 2012
Wednesday, October 3, 12
TROUBLESHOOTING
This is NOT about troubleshooting
Or, not just about troubleshooting
LAYOUT
• Criteria
• Situational Awareness
• HROs
• Decision Making
• Communication
• Team Coordination
• A little bit of psychology
How important is this?
Oct 2011
Sept 2012
Where to learn from?
TMI
Kegworth 1989
Dr. Richard Cook, Velocity US 2012
http://www.youtube.com/watch?v=R_PDc0HFdP0
“The Self-Designing High-Reliability Organization: Aircraft Carrier Flight Operations at Sea”
Rochlin, La Porte, and Roberts. Naval War College Review 1987
http://govleaders.org/reliability.htm
What Goes On In Our Heads?
Jens Rasmussen, 1983
Senior Member, IEEE
“Skills, Rules, and Knowledge; Signals, Signs, and Symbols, and Other Distinctions in Human Performance Models”
IEEE Transactions On Systems, Man, and Cybernetics, May 1983
SKILL - BASED
Simple, routine
RULE - BASED
Knowable, but unfamiliar
KNOWLEDGE - BASED
WTF IS GOING ON?
(Reason, 1990)
Situational Awareness
"the perception of elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future" (Endsley, 1995)
"keeping track of what is going on around you in a complex, dynamic environment" (Moray, 2005, p. 4)
"knowing what is going on so you can figure out what to do" (Adam, 1993)
OODA Loop
• Observe: Metrics, Monitoring, Alerting, Alarming
• Orient: Analysis, Visualization, Correlation
• Decide: Planning, Resourcing
• Act: Execution
credit: http://blog.b3k.us/ooda.html
Canonical Work
“Toward a Theory of Situation Awareness in Dynamic Systems”
Mica Endsley, Human Factors (1995)
http://www.satechnologies.com/Papers/pdf/Toward%20a%20Theory%20of%20SA.pdf
Situational Awareness
Level I: Perception
Level II: Comprehension
Level III: Projection
Situational Awareness (model diagram, Endsley)
State of the environment -> Situational Awareness -> Decision -> Performance of actions -> Feedback
• LEVEL I: Perception of elements in current situation
• LEVEL II: Comprehension of current situation
• LEVEL III: Projection of future status
Task/System Factors: System capability, Interface design, Stress and workload, Complexity, Automation
Individual Factors: Goals and objectives, Preconceptions (expectations), Information processing mechanisms, Long term memory stores, Automaticity, Abilities, Experience, Training
Level One: Perception
Context
Can you spot the anomaly?
24 hours
Context
Context
7 days
Context
Normal But Noisy
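The Context slides above suggest that the same reading can look alarming against a short window and ordinary against a longer, noisier one. A minimal sketch of that idea, assuming made-up sample values and a simple standard-deviation comparison (the function name `deviation_from_baseline` and every number here are hypothetical, not taken from the talk):

```python
from statistics import mean, stdev


def deviation_from_baseline(history, latest):
    """Return how many sample standard deviations `latest` sits from the mean of `history`."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is infinitely surprising.
        return 0.0 if latest == mu else float("inf")
    return (latest - mu) / sigma


# Hypothetical samples: a calm recent window inside a much noisier longer window.
short_window = [100, 102, 98, 101, 99, 100, 103, 97]
long_window = short_window + [60, 140, 75, 130, 65, 135, 70, 125]
latest = 120

short_view = deviation_from_baseline(short_window, latest)
long_view = deviation_from_baseline(long_window, latest)

print(f"vs. short window: {short_view:.1f} sigma")
print(f"vs. long window:  {long_view:.1f} sigma")
```

Against the short window the reading sits about 10 standard deviations out; against the longer window it is under one. Which view is "right" depends on the context the responder has, which is the point of the slides.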
Level Two
Comprehension
Level Two
Mental Models
• Categorization & Comprehension
• Mental “map” or “schema”
• Informed by experience, stored in memory
Mental Models
Level Three
Mental Models
Level Three
Common Clues you’re losing SA at this level
• Ambiguity
• Fixation
• Confusion
• Lack of Information
• Failure to maintain
• Failure to meet expected checkpoint or target
• Failure to resolve discrepancies
• A bad gut feeling that things are not quite right
Characteristics of response to escalating scenarios
...tend to neglect how processes develop over time (awareness of rates), focusing instead on how things are in the moment
Characteristics of response to escalating scenarios
“On the Difficulties People Have in Dealing With Complexity” Dietrich Doerner, 1980
...have difficulty in dealing with exponential developments (hard to imagine how fast something can change, or accelerate)
Characteristics of response to escalating scenarios
“On the Difficulties People Have in Dealing With Complexity” Dietrich Doerner, 1980
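To make the point about exponential developments concrete, here is a small sketch comparing how long a resource lasts under linear versus compounding growth. The scenario (a queue that is 10% full), the growth figures, and both function names are assumptions chosen for illustration; they are not from the talk or from Doerner's paper.

```python
import math


def hours_until_full_linear(used, capacity, added_per_hour):
    """Hours until `capacity` is reached when a fixed amount is added each hour."""
    return (capacity - used) / added_per_hour


def hours_until_full_exponential(used, capacity, growth_per_hour):
    """Hours until `capacity` is reached when usage compounds each hour.

    Solves used * (1 + g)**t = capacity for t.
    """
    return math.log(capacity / used) / math.log(1 + growth_per_hour)


# Hypothetical queue: 10 units used out of 100.
used, capacity = 10.0, 100.0

# Both scenarios start out growing by about 1 unit in the first hour.
linear = hours_until_full_linear(used, capacity, added_per_hour=1.0)
exponential = hours_until_full_exponential(used, capacity, growth_per_hour=0.10)

print(f"linear growth:      full in {linear:.0f} hours")
print(f"exponential growth: full in {exponential:.1f} hours")
```

Both cases look nearly identical in the first hour, yet the linear case leaves 90 hours and the compounding case roughly 24. A responder who extrapolates from the current rate alone would badly overestimate the time available.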
...inclined to think in causal SERIES, instead of causal NETS.
A therefore B,
instead of
A, therefore B and C (therefore D and E), etc.
Characteristics of response to escalating scenarios
“On the Difficulties People Have in Dealing With Complexity” Dietrich Doerner, 1980
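The difference between a causal series and a causal net can be sketched as a graph walk. This is only an illustration: the edge layout below (A causes B and C, B causes D, C causes E) is one assumed reading of the slide's example, and `downstream_effects` is a hypothetical helper name.

```python
from collections import deque


def downstream_effects(causes, start):
    """Breadth-first walk returning every effect reachable from `start`, in discovery order."""
    visited = {start}
    effects = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for effect in causes.get(node, []):
            if effect not in visited:
                visited.add(effect)
                effects.append(effect)
                queue.append(effect)
    return effects


# Thinking in a causal series: A therefore B.
series = {"A": ["B"]}

# Thinking in a causal net (assumed layout): A therefore B and C, therefore D and E.
net = {"A": ["B", "C"], "B": ["D"], "C": ["E"]}

print(downstream_effects(series, "A"))
print(downstream_effects(net, "A"))
```

With the series model, a change to A appears to touch only B. With the net model, the same change reaches B, C, D, and E, which is the kind of side effect that is easy to miss mid-outage.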
Requisite Memory Trap
SA Pitfalls
Workload, anxiety, fatigue, other stressors
SA Pitfalls
Data Overload
SA Pitfalls
Misplaced Salience
SA Pitfalls
http://www.perceptualedge.com/articles/Whitepapers/Dashboard_Design.pdf
Complexity Creep
“Everything should be as simple as it can be, but not simpler.”
- paraphrased, Einstein
SA Pitfalls
Poor Mental Models
SA Pitfalls
Out-Of-The-Loop Syndrome
SA Pitfalls
Refusal to make decisions
SA Pitfalls
Non-communicating lone wolf-isms
Heroism
Irrelevant noise in comm channels
Distraction
• Divide and conquer applied to problem space, division of labor
• Incident resolution vs. Problem resolution
• Reproducibility
• Fault Tolerance Effects
TEAMS
Shotgun debugging
TEAMS
• Interpredictability
• Common Ground
• Directability
JOINT ACTIVITY
http://csel.eng.ohio-state.edu/woods/distributed/CG%20final.pdf
Interpredictability
Common Ground
Directability
Improvisation
“...you can’t improvise on nothing; you got to improvise on something.”
Charles Mingus
Diagnose the problem
Represent the problem
Detect the Problem/Opportunity
Generate a course of action
Apply Leverage Points
Evaluate
Communication Recommendations
• Explicitness
• Assertiveness
• Timing
Assertiveness
• Passive
• Assertive
• Aggressive
Exercise
Communication
• IRC?
• Face-To-Face?
• Conference Call?
• Morse Code?
Kegworth 1989
Communication model (diagram)
Sender: Meaning -> Encode -> Transmission -> Decode -> Meaning: Receiver
Return path, roles reversed: Meaning -> Encode -> Transmission -> Decode -> Meaning
Feedback: Informational
Feedback: Corrective
Feedback: Reinforcing
Decision Making
Naturalistic Decision Making (NDM)
Gary Klein
Decision Making
Step One: What is the problem?
Decision Making
Step Two: What shall I do?
Recognition-Primed Decisions
Decision Making
Rule-Based Decisions
Decision Making
Choice decisions
Decision Making
Creative decisions
Decision Making
Decision Making
Creative | Choice | Rule-Based | RPD
-> Decreasing cognitive effort, decreasing effects of stress
<- Increasing cognitive effort, increasing effects of stress
PRE-Mortem
Decision Making
Tooling
??
Metric, Time Period
Controls
ALERTS
• Meant to boost SA
• Alarm overload
• High false alarm rates
• Routinely disable alerts
Alert Reliability
ALERT DESIGN
• Signal:Noise can be difficult
• Easy to err on more false alarms
• Decay in trust
• Origins: Undetectable conditions
ALERT DESIGN
Confirmation
ALERT DESIGN
Expectancy
ALERT DESIGN
ALERT DESIGN
• Don’t make people singularly reliant on alarms
• Support alarm confirmation activities
• Make alarms unambiguous
• Reduce, reduce, reduce false alerts
• Set missed/false alert trade-offs appropriately
ALERT DESIGN
• Use multiple modalities
• Minimize alarm disruptions to ongoing activities
• Support the assessment/diagnosis of multiple alerts
• Support global SA of systems in an alarm state
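The missed/false alert trade-off mentioned in the alert design lists can be put into numbers. The sketch below is an assumption-laden illustration: `alert_stats` is a hypothetical helper, and the counts for the two alert configurations are invented, not measurements from the talk.

```python
def alert_stats(true_alerts, false_alerts, missed_events):
    """Summarize alert reliability from simple counts.

    precision: fraction of fired alerts that pointed at a real problem.
    miss_rate: fraction of real problems that produced no alert.
    """
    fired = true_alerts + false_alerts
    real = true_alerts + missed_events
    return {
        "precision": true_alerts / fired if fired else 0.0,
        "miss_rate": missed_events / real if real else 0.0,
    }


# Hypothetical month of data for the same 10 real problems, under two thresholds.
sensitive = alert_stats(true_alerts=9, false_alerts=91, missed_events=1)
strict = alert_stats(true_alerts=6, false_alerts=4, missed_events=4)

print("sensitive threshold:", sensitive)
print("strict threshold:   ", strict)
```

In this made-up data the sensitive threshold misses only 1 problem in 10, but just 9% of its alerts are real, which is the condition under which trust decays and people start disabling alerts. The strict threshold is trusted more (60% real) but misses 4 in 10. Neither is correct in general; the counts only make the trade-off visible so it can be set deliberately.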
Mature Role of Automation
http://www.bainbrdg.demon.co.uk/Papers/Ironies.html
“Ironies of Automation” - Lisanne Bainbridge
Mature Role of Automation
• Moves humans from manual operator to supervisor
• Extends and augments human abilities, doesn’t replace them
• Doesn’t remove “human error”
• Is brittle
• Recognizes that there is always discretionary space for humans
• Recognizes the Law of Stretched Systems
SUMMARY
So what can we do?
“In preparing for battle, I have always found that plans are useless but planning is indispensable.”
- Eisenhower
So what can we do?
We develop our Non-Technical Skills
• Situational Awareness
• Communication
• Decision Making
• Improvisation
• Crew Resource Management (CRM)
So what can we do?
We tailor our environment to adapt
• Tooling to support SA
• Learning from outages (PostMortem)
• Anticipating problems (PreMortem)
• Gather Meta-Metrics
THE END
Credits
• http://www.flickr.com/photos/28650594@N03/2718027136/
• http://www.flickr.com/photos/telstar/2816887784/
• http://www.flickr.com/photos/soldiersmediacenter/3855375117/
• http://www.flickr.com/photos/19743256@N00/2640217771/
• http://www.flickr.com/photos/29456680@N06/6148754691/
• https://www.flickr.com/photos/spencediddy/3197199659/sizes/l/in/faves-allspaw/
• http://www.flickr.com/photos/splorp/64027565/sizes/l/in/photostream/