
Page 1: CS5032 Lecture 6: Human Error 2

HUMAN RELIABILITY

DR JOHN ROOKSBY

Page 2: CS5032 Lecture 6: Human Error 2

IN THIS LECTURE

The “new view” of human error

Rules and rule following

Cooperative work and crew resource management

Page 3: CS5032 Lecture 6: Human Error 2

THE KEGWORTH AIR DISASTER

Page 4: CS5032 Lecture 6: Human Error 2

OUTLINE

A plane crash on 8 January 1989

British Midland Flight 92, flying from Heathrow to Belfast

Crashes by the M1 motorway near Kegworth, while attempting an emergency landing at East Midlands Airport

The plane was a Boeing 737-400, a new variant of the Boeing 737, in use by British Midland for less than two months

There were 118 passengers and 8 crew. 47 died, and 74 were seriously injured

Page 5: CS5032 Lecture 6: Human Error 2

SEQUENCE OF EVENTS

• The pilots hear a pounding noise and feel vibrations (subsequently found to be caused by a fan blade breaking inside the left engine)

• Smoke enters the cabin and passengers sitting near the rear of the plane notice flames coming from the left engine

• The flight is diverted to East Midlands Airport

• The pilot shuts down the engine on the right

Page 6: CS5032 Lecture 6: Human Error 2

SEQUENCE OF EVENTS

• The pilots can no longer feel the vibrations, and do not notice that the vibration detector is still reporting a problem. The smoke disperses.

• The pilot informs the passengers and crew that there is a problem with the right engine and that it has been shut down

• Twenty minutes later, on approach to East Midlands Airport, the pilot increases thrust. This causes the left engine to burst into flames and cease operating

• The pilots try to restart the left engine, but crash short of the runway

Page 7: CS5032 Lecture 6: Human Error 2

WRONG ENGINE SHUT DOWN. WHY?

Incorrect assumption: The pilots believed the “bleed air” was taken from the right engine, and therefore that the smoke must be coming from the right. Earlier 737 variants took bleed air from the right engine; the 737-400 did not. Psychologists call this a mistake in “knowledge-based performance”

Design issues: The pilots had no direct view of the engines, so they relied on other information sources to explain the vibrations. The vibration meters were tiny and used a new style of digital display. The meters had been inaccurate on earlier 737s, but not on the 737-400

Inadequate training: A one-day course, and no simulator training

Page 8: CS5032 Lecture 6: Human Error 2

ERROR NOT TRAPPED. WHY?

Coincidence: The smoke disappeared after shutting down the right engine, and the vibrations lessened. Psychologists call this “confirmation bias”.

Lapse in procedure: After shutting down the right engine, the pilot began checking all meters and reviewing decisions, but stopped after being interrupted by a transmission from the airport asking him to descend to 12,000 ft.

Lack of communication: Some cabin crew and passengers could see that the left engine was on fire, but did not inform the pilot, even when he announced that he was shutting down the right engine.

Design issue: The vibration meters would have shown a problem with the left engine, but were too difficult to read. There was no alarm.
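This design issue lends itself to a concrete, if simplified, illustration. The sketch below is hypothetical: the names, units, threshold, and checking logic are all invented for this example and are not the 737-400's actual avionics. It shows how a system that already holds vibration data could raise an explicit alarm, and cross-check the readings against which engine the crew has shut down, instead of leaving the information on a hard-to-read meter.

```python
# Hypothetical sketch: raising an alarm from vibration data and
# cross-checking it against crew actions. All names, units, and
# thresholds are invented; this is not real avionics logic.

from dataclasses import dataclass

VIBRATION_ALARM_THRESHOLD = 4.0  # arbitrary units for this sketch


@dataclass
class EngineState:
    name: str
    vibration: float   # latest sensor reading
    shut_down: bool = False


def vibration_alarms(engines: list[EngineState]) -> list[str]:
    """Return explicit alarm messages rather than relying on a small meter."""
    alarms = []
    for engine in engines:
        if engine.vibration > VIBRATION_ALARM_THRESHOLD and not engine.shut_down:
            alarms.append(f"ALARM: high vibration on {engine.name} engine, "
                          f"which is still running")
    # The Kegworth pattern: the vibrating engine is the only one still running.
    running = [e for e in engines if not e.shut_down]
    if len(running) == 1 and running[0].vibration > VIBRATION_ALARM_THRESHOLD:
        alarms.append("ALARM: the only running engine is the one vibrating")
    return alarms


if __name__ == "__main__":
    left = EngineState("left", vibration=5.2)                    # faulty, running
    right = EngineState("right", vibration=0.3, shut_down=True)  # healthy, shut down
    for message in vibration_alarms([left, right]):
        print(message)
```

In the Kegworth case exactly this kind of cross-check was missing: the meters held the information, but nothing actively drew the crew's attention to it.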

Page 9: CS5032 Lecture 6: Human Error 2

COCKPIT OF A BOEING 737-400

Page 10: CS5032 Lecture 6: Human Error 2

VIEWPOINTS

Traditional engineering view

• The crash was caused by an engine failure. Therefore we must design better engines.

Traditional managerial view

• The crash was caused by the pilots. We must hire better pilots.

The socio-technical systems engineering view, or “new view”

• The crash had no single cause, but involved problems in testing, design, training, teamwork, communications, procedure following, decision making, ‘upgrade’ management, and more

• We need better engines, but we also need to expect problems to happen and to be adequately prepared for them

Page 11: CS5032 Lecture 6: Human Error 2

THE “NEW VIEW” OF HUMAN ERROR

The old view

Human error is the cause of accidents

Systems are inherently safe and people introduce errors

Bad things happen to bad people

The new view

Human error is a symptom of trouble deeper inside a system

Systems are inherently unsafe and people usually keep them running well

All humans are fallible

Page 12: CS5032 Lecture 6: Human Error 2

THE “NEW VIEW” OF HUMAN ERROR

It is not new! “New view” is just a name; the view itself has been around for 20 years.

Draws the emphasis away from modelling human error, and towards understanding what underlies human actions when operating technology

• How do people get things right?

Argues that too much emphasis is placed on “the sharp end”, and that error is symptomatic of deeper trouble

Opposes the “blame culture” that has arisen in many organisations: we are too quick to blame system operators when managers and engineers are at fault.

Page 13: CS5032 Lecture 6: Human Error 2

HUMAN RELIABILITY

Humans don’t just introduce errors into systems, but are often responsible for avoiding and correcting them too.

What do people really do when they are operating a technology?

• Very little human work is driven by a clear and unambiguous set of recipes or processes, even when these are available

• All human work is situationally contingent. Work must inevitably be more than following a set of steps.

• If people work to rule, accidents can happen. For example, prior to the sinking of the MS Estonia, a crew member did not report a leak because it was not his job.

Page 14: CS5032 Lecture 6: Human Error 2

CORRECT PROCEDURE?

There is not always a ‘correct’ procedure by which to judge any action.

Sometimes trial and error processes are necessary

• In young organisations, best practices may not yet exist

• New and unusual situations may occur in which a trial and error approach is appropriate

• Sometimes it is appropriate to play or experiment. This is how innovation often happens.

So deciding when something is an error, and judging whether an error was appropriate to a set of circumstances, can be highly context dependent.

Page 16: CS5032 Lecture 6: Human Error 2

FIELDWORK

Often we don’t notice the work that people do to keep complex systems running smoothly.

• Fieldwork is an important aspect of understanding how systems are operated and how people work.

Page 17: CS5032 Lecture 6: Human Error 2

STUDYING SUCCESS

It is important to study and understand ordinary work

We can also learn lessons from “successful failures”, including

• The Apollo 13 Mission

• The Airbus A380 engine explosion over Batam Island

• The Sioux City Crash

However, accounts of successful failures can turn into a form of hero worship, and organisations that experience these kinds of success against the odds can build a false sense of invulnerability.

Page 18: CS5032 Lecture 6: Human Error 2

PROBLEMS WITH AUTOMATION

As work becomes automated, engineers often make the mistake of automating whatever is easiest to automate, leaving people to cope with what remains.

• The Fitts list or “MABA-MABA” (men-are-better-at, machines-are-better-at) approach can lead to a dangerous lack of awareness and control for system operators.

• The “paradox of automation” is that automation creates and requires new forms of labour.

• The major design problem is no longer how to support workflow, but how to support awareness across a system and organisation, and how to support appropriate kinds of intervention
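To make that design problem concrete, here is a minimal hypothetical sketch (the class, the method names, and the callback design are invented for illustration, not taken from any real system): an automated controller that announces its actions and the reasons behind them, and exposes an explicit intervention hook, supports operator awareness rather than just workflow.

```python
# Hypothetical sketch of automation designed for operator awareness.
# All names are invented; the callback design is one choice among many.

from typing import Callable


class MonitoredAutopilot:
    """An automated controller that reports what it does and why,
    and lets the operator intervene at any time."""

    def __init__(self, notify: Callable[[str], None]):
        self.notify = notify        # channel to the operator (display, log, ...)
        self.target_thrust = 0.5
        self.manual_override = False

    def adjust_thrust(self, demanded: float, reason: str) -> None:
        if self.manual_override:
            self.notify(f"Automation suppressed (manual override): "
                        f"wanted thrust {demanded:.2f} because {reason}")
            return
        self.target_thrust = demanded
        # Announcing actions *and* reasons lets operators spot a faulty premise.
        self.notify(f"Setting thrust to {demanded:.2f} because {reason}")

    def operator_takeover(self) -> None:
        self.manual_override = True
        self.notify("Operator has taken manual control")


if __name__ == "__main__":
    autopilot = MonitoredAutopilot(notify=print)
    autopilot.adjust_thrust(0.8, reason="approach profile requires more thrust")
    autopilot.operator_takeover()
    autopilot.adjust_thrust(0.9, reason="approach profile requires more thrust")
```

The design choice here is that every automated action is paired with its rationale and surfaced to the operator, so awareness is maintained continuously rather than demanded suddenly when something goes wrong.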

Page 19: CS5032 Lecture 6: Human Error 2

CREW RESOURCE MANAGEMENT

One approach to improving reliability and reducing human error is crew resource management (CRM)

• Developed in the aviation industry, and now widely used

• Formerly known as Cockpit Resource Management

CRM Promotes

• The effective use of all resources (human, physical, software)

• Teamwork

• Proactive accident prevention

Page 20: CS5032 Lecture 6: Human Error 2

CREW RESOURCE MANAGEMENT

The focus of CRM is upon

• Communication: How to communicate clearly and effectively

• Situational awareness: How to build and maintain an accurate and shared picture of an unfolding situation

• Decision making: How to make appropriate decisions using the available information (and how to make appropriate information available)

• Teamwork: Effective group work, effective leadership, and effective followership.

• Removing barriers: How to remove barriers to the above

Page 21: CS5032 Lecture 6: Human Error 2

KEY POINTS

It can be too narrow to focus on human error

• Human errors are usually symptomatic of deeper problems

• Human reliability is not just about humans not making errors, but about how humans maintain dependability

We cannot rely on there being correct procedures for every situation. Procedures are important, but we need to support cooperative working

Design approaches, as well as human and organisational approaches, can be taken to support human reliability.