Velocity EU 2012 Escalating Scenarios: Outage Handling Pitfalls

Post on 28-Jan-2015

DESCRIPTION

When things go wrong, our judgement is clouded at best and blinded at worst. To navigate a large-scale outage successfully, it helps to be aware of potential gaps in knowledge and context. The Human Factors and Systems Safety community has been studying how people situate themselves, coordinate within a team, use tooling, make decisions, and keep their cool in stressful, escalating scenarios. We can learn from this research in order to adopt a more mature stance when the s*#t hits the fan. We’re going to look closely at how people behave under these circumstances using real-world examples, and see what we can learn from High Reliability Organizations (HROs) and fields such as aviation, the military, and trauma-driven healthcare.

TRANSCRIPT

Escalating Scenarios

A Deep Dive Into Outage Pitfalls

John Allspaw

Velocity London 2012

Wednesday, October 3, 12

TROUBLESHOOTING

This is NOT about troubleshooting

Or, not just about troubleshooting

LAYOUT

• Criteria

• Situational Awareness

• HROs

• Decision Making

• Communication

• Team Coordination

• A little bit of psychology

How important is this?

Oct 2011

Sept 2012

Where to learn from?

TMI

Kegworth 1989

Dr. Richard Cook, Velocity US 2012
http://www.youtube.com/watch?v=R_PDc0HFdP0

“The Self-Designing High-Reliability Organization: Aircraft Carrier Flight Operations at Sea”
Rochlin, La Porte, and Roberts. Naval War College Review, 1987

http://govleaders.org/reliability.htm

What Goes On In Our Heads?

Jens Rasmussen, 1983
Senior Member, IEEE

“Skills, Rules, and Knowledge; Signals, Signs, and Symbols, and Other Distinctions in Human Performance Models”
IEEE Transactions on Systems, Man, and Cybernetics, May 1983

SKILL-BASED: Simple, routine

RULE-BASED: Knowable, but unfamiliar

KNOWLEDGE-BASED: WTF IS GOING ON?

(Reason, 1990)

Situational Awareness

"the perception of elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future" (Endsley, 1995)

"keeping track of what is going on around you in a complex, dynamic environment" (Moray, 2005, p. 4)

"knowing what is going on so you can figure out what to do" (Adam, 1993)

OODA Loop

Observe: Metrics, Monitoring, Alerting, Alarming

Orient: Analysis, Visualization, Correlation

Decide: Planning, Resourcing

Act: Execution

credit: http://blog.b3k.us/ooda.html

Canonical Work

“Toward a Theory of Situation Awareness in Dynamic Systems”
Mica Endsley, Human Factors (1995)

http://www.satechnologies.com/Papers/pdf/Toward%20a%20Theory%20of%20SA.pdf

Situational Awareness

Level I: Perception

Level II: Comprehension

Level III: Projection

Situational Awareness

[Diagram: Endsley's model of situational awareness]

State of the environment → Situational Awareness → Decision → Performance of actions → Feedback (back to the state of the environment)

LEVEL I: Perception of elements in current situation

LEVEL II: Comprehension of current situation

LEVEL III: Projection of future status

Task/System Factors: System capability, Interface design, Stress and workload, Complexity, Automation

Individual Factors: Goals and objectives, Preconceptions (expectations), Information processing mechanisms, Long-term memory stores, Automaticity, Abilities, Experience, Training

(Endsley)

Level One: Perception

Context

Can you spot the anomaly?

24 hours

Context

Context

7 days

Context

Normal But Noisy
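The 24-hour and 7-day views make a point that can be sketched in code: the same reading can look anomalous or ordinary depending on how much history it is judged against. This is a minimal sketch, not something from the talk. The hourly samples, the `zscore` helper, and the choice of a z-score as the test are all assumptions made for illustration.

```python
from statistics import mean, stdev


def zscore(value, history):
    """How many standard deviations `value` sits from the mean of `history`."""
    return (value - mean(history)) / stdev(history)


# Hypothetical hourly request rates: six noisy days followed by one quiet day.
quiet_day = [100, 102, 98, 101, 99, 100, 103, 97] * 3        # 24 samples
noisy_days = [100, 140, 60, 130, 70, 150, 50, 120] * 18      # 144 samples
week = noisy_days + quiet_day                                 # 168 samples

latest = 130  # the reading an on-call engineer is staring at

z_24h = zscore(latest, week[-24:])  # judged against the last 24 hours only
z_7d = zscore(latest, week)         # judged against the full 7 days

print(f"24 hours of context: z = {z_24h:.1f}")
print(f"7 days of context:   z = {z_7d:.1f}")
```

Against the last 24 hours the reading sits many standard deviations out; against the full week it is well inside the usual noise. Neither view is "right" on its own, which is why the window shown on a graph is part of perception.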

Level Two: Comprehension

Level Two

Mental Models

• Categorization & Comprehension

• Mental “map” or “schema”

• Informed by experience, stored in memory

Mental Models

Level Three

Mental Models

Level Three

• Ambiguity

• Fixation

• Confusion

• Lack of Information

• Failure to maintain

• Failure to meet expected checkpoint or target

• Failure to resolve discrepancies

• A bad gut feeling that things are not quite right

Common Clues you’re losing SA at this level

Characteristics of response to escalating scenarios

...tend to neglect how processes develop within time (awareness of rates) versus assessing how things are in the moment

Characteristics of response to escalating scenarios

“On the Difficulties People Have in Dealing With Complexity” Dietrich Doerner, 1980

...have difficulty in dealing with exponential developments (hard to imagine how fast something can change, or accelerate)

Characteristics of response to escalating scenarios

“On the Difficulties People Have in Dealing With Complexity” Dietrich Doerner, 1980
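To make the point about rates and exponential developments concrete, here is a minimal sketch comparing two ways of extrapolating the same pair of observations. The queue-depth numbers, the limit, and both helper functions are hypothetical and chosen only for illustration; they do not come from the talk or from Doerner's paper.

```python
import math


def minutes_until_linear(current, previous, limit):
    """Time to reach `limit` if the last per-minute increase simply repeats."""
    step = current - previous
    return (limit - current) / step


def minutes_until_exponential(current, previous, limit):
    """Time to reach `limit` if the last per-minute growth ratio repeats."""
    ratio = current / previous
    return math.log2(limit / current) / math.log2(ratio)


# Hypothetical queue depth one minute ago, now, and the point of failure.
previous, current, limit = 500, 1_000, 64_000

linear_eta = minutes_until_linear(current, previous, limit)
exponential_eta = minutes_until_exponential(current, previous, limit)

print(f"linear guess:      {linear_eta:.0f} minutes left")
print(f"exponential guess: {exponential_eta:.0f} minutes left")
```

With these numbers the linear reading suggests about two hours of headroom, while the exponential reading suggests six minutes. The observations are identical; only the assumed shape of the curve differs.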

...inclined to think in causal SERIES, instead of causal NETS.

A therefore B,

instead of

A, therefore B and C (therefore D and E), etc.

Characteristics of response to escalating scenarios

“On the Difficulties People Have in Dealing With Complexity” Dietrich Doerner, 1980

Requisite Memory Trap

SA Pitfalls

Workload, anxiety, fatigue, other stressors

SA Pitfalls

Data Overload

SA Pitfalls

Misplaced Salience

SA Pitfalls

http://www.perceptualedge.com/articles/Whitepapers/Dashboard_Design.pdf

Complexity Creep

“Everything should be as simple as it can be, but not simpler.”
- paraphrased, Einstein

SA Pitfalls

Poor Mental Models

SA Pitfalls

Out-Of-The-Loop Syndrome

SA Pitfalls

Refusal to make decisions

SA Pitfalls

Non-communicating lone wolf-isms

Heroism

Irrelevant noise in comm channels

Distraction

• Divide and conquer applied to problem space, division of labor

• Incident resolution vs. Problem resolution

• Reproducibility

• Fault Tolerance Effects

TEAMS

Shotgun debugging

TEAMS

• Interpredictability

• Common Ground

• Directability

JOINT ACTIVITY

http://csel.eng.ohio-state.edu/woods/distributed/CG%20final.pdf

Interpredictability

Common Ground

Directability

Improvisation

“...you can’t improvise on nothing; you got to improvise on something.”

Charles Mingus

Diagnose the problem

Represent the problem

Detect the Problem/Opportunity

Generate a course of action

Apply Leverage Points

Evaluate

Communication Recommendations

• Explicitness

• Assertiveness

• Timing

Assertiveness

• Passive

• Assertive

• Aggressive

Exercise

Communication

• IRC?

• Face-To-Face?

• Conference Call?

• Morse Code?

Kegworth 1989

[Diagram: communication model, built up over several slides]

One-way: Sender (Meaning → Encode) → Transmission → Receiver (Decode → Meaning)

Two-way: the Receiver becomes the Sender for the reply, which follows the same path in reverse (Meaning → Encode → Transmission → Decode → Meaning)

Feedback: Informational

Feedback: Corrective

Feedback: Reinforcing

Decision Making

Naturalistic Decision Making (NDM)
Gary Klein

Decision Making

Step One: What is the problem?

Decision Making

Step Two: What shall I do?

Recognition-Primed Decisions

Decision Making

Rule-Based Decisions

Decision Making

Choice decisions

Decision Making

Creative decisions

Decision Making

Decision Making

Creative | Choice | Rule-Based | RPD

Toward RPD: decreasing cognitive effort, decreasing effects of stress

Toward Creative: increasing cognitive effort, increasing effects of stress

PRE-Mortem

Decision Making

Tooling

??

Metric, Time Period

Controls

ALERTS

• Meant to boost SA

• Alarm overload

• High false alarm rates

• Routinely disable alerts

Alert Reliability

ALERT DESIGN

• Signal:Noise can be difficult

• Easy to err on more false alarms

• Decay in trust

• Origins: Undetectable conditions

ALERT DESIGN

Confirmation

ALERT DESIGN

Expectancy

ALERT DESIGN

• Don’t make people singularly reliant on alarms

• Support alarm confirmation activities

• Make alarms unambiguous

• Reduce, reduce, reduce false alerts

• Set missed/false alert trade-offs appropriately
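The missed/false alert trade-off in the last bullet can be sketched by replaying past data against candidate thresholds. This is a minimal illustration under stated assumptions: the `alert_tradeoff` helper, the error-rate history, and the incident labels are all hypothetical, and a single static threshold is only one of many possible alert designs.

```python
def alert_tradeoff(samples, threshold):
    """Count false alarms and missed incidents for one alert threshold.

    `samples` is a list of (metric_value, was_real_incident) pairs.
    An alert fires whenever metric_value >= threshold.
    """
    false_alarms = sum(
        1 for value, real in samples if value >= threshold and not real
    )
    missed = sum(1 for value, real in samples if value < threshold and real)
    return false_alarms, missed


# Hypothetical history: error rate (%) and whether an incident was real.
history = [
    (1.0, False), (2.5, False), (3.0, False), (4.0, True),
    (4.5, False), (6.0, True), (7.5, True), (9.0, True),
]

for threshold in (2.0, 4.0, 5.0, 8.0):
    false_alarms, missed = alert_tradeoff(history, threshold)
    print(f"threshold {threshold}%: {false_alarms} false alarms, {missed} missed")
```

Lowering the threshold trades missed incidents for false alarms, and raising it does the reverse. Where to sit on that curve is a judgment about the cost of each kind of error, including the decay in trust that frequent false alarms cause.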

ALERT DESIGN

• Use multiple modalities

• Minimize alarm disruptions to ongoing activities

• Support the assessment/diagnosis of multiple alerts

• Support global SA of systems in an alarm state

Mature Role of Automation

http://www.bainbrdg.demon.co.uk/Papers/Ironies.html

“Ironies of Automation” - Lisanne Bainbridge

Mature Role of Automation

• Moves humans from manual operator to supervisor

• Extends and augments human abilities, doesn’t replace them

• Doesn’t remove “human error”

• Is brittle

• Recognizes that there is always discretionary space for humans

• Recognizes the Law of Stretched Systems

SUMMARY

So what can we do?

“In preparing for battle, I have always found that plans are useless but planning is indispensable.”

- Eisenhower

So what can we do?

We develop our Non-Technical Skills

• Situational Awareness

• Communication

• Decision Making

• Improvisation

• Crew Resource Management (CRM)

So what can we do?

We tailor our environment to adapt

• Tooling to support SA

• Learning from outages (PostMortem)

• Anticipating problems (PreMortem)

• Gather Meta-Metrics

THE END
