Y2K: Perrow’s Normal Accident Theory (NAT) Tested and “Normal Accidents-Yesterday and Today”

Presented by Nathanael Paul, September 25, 2002




Some questions…

What is the most code you have ever written?
Largest project (number of lines of code) that you have ever worked on?

Y2K – the ultimate system failure
Were you an optimist or a pessimist?


Society and Systems after 1984

First non-negotiable deadline
180 billion lines of code need inspecting

Social Security: 30 million lines to fix
After 400 people and 5 years, only 6 million fixed

“organizations are always screwing up… uncertainty drives system accidents, and this is a hallmark of Y2K”

Failures and Social Incoherence
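
The Social Security figures on this slide imply a rough pace. A minimal sketch of that arithmetic follows, assuming the remaining lines are fixed at the same average rate by the same staff. That constant-rate assumption and the function names are illustrative only and do not come from the presentation.

```python
# Figures quoted on the slide; the constant-rate assumption is illustrative only.
TOTAL_LINES = 30_000_000   # Social Security lines needing a fix
FIXED_LINES = 6_000_000    # lines fixed so far
YEARS_SPENT = 5            # years of effort so far
STAFF = 400                # people working on the fix


def years_remaining(total: int, fixed: int, years_spent: float) -> float:
    """Years left if the remaining lines are fixed at the same average rate."""
    rate_per_year = fixed / years_spent
    return (total - fixed) / rate_per_year


def lines_per_person_year(fixed: int, years_spent: float, staff: int) -> float:
    """Average lines fixed per person per year over the period so far."""
    return fixed / years_spent / staff


if __name__ == "__main__":
    print(years_remaining(TOTAL_LINES, FIXED_LINES, YEARS_SPENT))
    print(lines_per_person_year(FIXED_LINES, YEARS_SPENT, STAFF))
```

Under that assumption the remaining 24 million lines would take about 20 more years, which is why a deadline that cannot move matters so much here.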


Perrow’s take on Y2K: “I expect moderate failures but little social incoherence”


Ingredients

Unexpected interaction of multiple failures
Tightly coupled system
Incomprehensible


Interdependency

Slight inconvenience or isolated hardships
  Not as interconnected
  Handling of problem was better than at first thought

Tight coupling
  One way of doing things w/o too much slack

“web of connections”
  Closest analogy to software yet…
  Alternative paths
  Testing


Optimists and Pessimists

Pessimists
  Biblical Apocalypse
  Computer and financial experts

Optimists
  Industrial trade groups, government, and companies
  Late in their response


Key points to both sides

P: Everything is linked, so everything is “mission critical”
  Hard to prioritize

O: Experience with failures before will see us through

O: Testing results not announced b/c of liability

P: Accident becomes a catastrophic disaster (multiple failures with coupled single systems)


The Chip

Chips can’t be reprogrammed
7 billion programmable microcontrollers in ’96

Air Traffic Control’s problems with mainframes
People locked in car plant, prisoners let loose
“We know something about unexpected interactions and are more prepared to look for them than ever before.”

“The Butterfly Effect”


Electric Power

Society’s lifeblood
Complex interconnected “grids”
1998: Most of the 75% electric N. American power companies were in awareness/assessment (same findings in Jan. 1999)

“Just in time production”
Nuclear facilities not “expecting” problems


Lack of Interest

Jan. 1998 at premier tech conf. of industry
  No sessions, no meetings on Y2K
  One presentation was scheduled
  People were mad. Y2K was a hoax and presenter was a profiteering consultant

March 1998 at 3rd annual industry-wide meeting on Y2K
  70 of 8,000 companies were there

One summer’s power meeting canceled b/c of lack of interest


More on Power

Interconnectivity
  No telecom, no power. No power, no telecom.
  Available fuel supply and delivery

No service obligations to provide base load power to bulk power entities

Gov’t intervention not wanted.
Merge, but no fix


And last, but not least… Nuclear Power

Jan. 1999: Only 31 percent ready
Harder to fix?
  Not expecting problems
  Hard to test all components
If not ready by 2000, shutdown
Provided 25% of power
  40% in Northeast


Y2K going wrong… We give up.

Y2K compliance vs. Y2K readiness
“The Domino Effect”
  Banking
  Shipping
  Farming and hybrid seeds

Just show them the software warranty
You’re probably not liable anyway


Conclusions about NAT and accidents today

Which of the characteristics of NAT does software normally exhibit?
  Tight/loose coupling?
  Interdependency?
  Linear/complex?
  “web of connections”

What has been done in the past to help reduce software “accidents”? (reduction of tight coupling/complexity/interdependencies)

Let’s see what Strauch has to say…


Other Accidents and Views

Challenger and Chernobyl
  Do these accidents support NAT?

Operator Error
  Someone to blame?
  Justified blame?

Chemical refineries, nuclear power, and commercial aviation have all seen drops in accidents or in types of accidents in Perrow

Can Perrow’s assertion that more system accidents would happen be justified?


1995 crash of American Airlines Boeing 757 in Cali, Colombia

Saving time and expense by landing to the south (Miami to Cali)

Many tasks performed to get ready for approach
Approach named after final approach fix (unusual, not named after initial approach)
Initial approach beacon, Tulua, deleted from approach data
Flew to final approach (not initial)


Factors involved in Cali crash

“Hide the results of operator actions from operators themselves”

Navigational database design
Abbreviations used and instrument approach procedures
Nepal 1992 accident, very similar to Cali crash
  Lesson was not learned.


Accident Frequency since ’84

Depends on country and particular system
Perrow’s assertions affected by:
  Industry variables
  Cultural variables
  Hindsight of his work in helping others
  And…


Technology

Airbus Industrie (A-320)
  Introduce new technology, time to familiarize
  High to lower rate of fatalities as time goes on

Training
  Better able to emulate real system
  Focus on what people need seen in training
  Training-related accidents all but eliminated
  Operator error reduction in training? Was Perrow still right?


Aviation Technology

CFIT (controlled flight into terrain)
  Ground Proximity Warning Systems (GPWS) not good at high speeds (Cali crash good example)
  Terrain Awareness and Warning System (TAWS)
  No TAWS aircraft in CFIT crash, yet…

In-flight collision of 2 aircraft
  Traffic Alert and Collision Avoidance System (TCAS)
  No 2 planes with TCAS in a collision, yet…


Organizational Accidents – Organizational Features in system safety

ValuJet ’96 crash of DC-9
  Canisters of chemical oxygen generators
  Non-traditional contracting out of work
  Maintenance personnel were rushed to work on 2 aircraft to meet deadlines
  Canisters not labeled correctly
  Warehousing personnel returned the canisters to their rightful owner


What Happened?

Cost reductions over safety
Regulation (FAA) failed where accident may have been prevented
Enron


Learning from our mistakes

Something done before 1984…
Shortcomings of navigational databases addressed (Cali accident)
FAA operational oversight addressed (ValuJet)
Financial system deficiencies addressed (Enron)
Rejected takeoffs decreased after better training
What about the Exxon Valdez oil spill (vessel’s master and alcohol)?


Doomed to repeat, if there is no change

Airplane flaps and slats
  ’87 Northwest Airlines crash in Detroit (better training and aural warnings)
  Dallas-Ft. Worth crash b/c of flaps and slats (made sure this didn’t happen again)

Concorde
  Engine could eat tire debris, tires in front of engine
  Nothing done until 2000 Paris accident (problem cited much earlier by engineers)


Conclusions

Was NAT successful? Why?
“Features” can create deficiencies
Are systems any more comprehensible?
Operator error vs. design error
Why the reduction in system accidents?

Have we truly stopped accusing the operator and started looking at the systems?

Technology
  Did it help or hurt more in system accidents?