

IChemE SYMPOSIUM SERIES NO. 153 © 2007 IChemE

“CRITICAL EVENTS” – APPLYING CONTINUOUS IMPROVEMENT TO INCIDENT REPORTING

Nigel Cann (FIChemE, CEng, CEnv, GAICD)

General Manager, Australian Vinyls, 65 Leakes Road, Laverton, VIC 3028, Australia; e-mail: [email protected]

The intention of Major Hazard Regulations around the world is to bring about sufficient control of high hazard risks to prevent high consequence events on infrastructure, communities and the environment. There is an obligation on Operators¹ to ensure that these hazards are controlled, and to have the systems implemented and maintained to reduce and mitigate the hazard consequences. Despite these well understood principles from both the regulators and the operators of Major Hazard Facilities, major hazard accidents have continued to occur, most noticeably in the media at Buncefield² and Texas City³ in the last 24 months.

So how does an Operator of a Major Hazard Facility know the control measures they have applied are adequate and that they remain functional? Regulations⁴,⁵ suggest this is done by full review every 5 years. This author contends that this is insufficient and that to be effective a more responsive mechanism is required to get Operators' attention.

Australian Vinyls has developed a process where key incidents – called “critical events” – are identified. This process is based on the belief that Reason's “Swiss Cheese” model⁶ of the stages leading to an accident applies. Central to the process is the identification of events, incidents and near misses (or “near hits”) that can be studied in detail down to “root cause” so that any “holes” in the control measures can be located and eliminated before they can align, thereby preventing a Major Incident.

The paper provides an illustration of how some control measures are assessed as “critical control measures” and how examples of identifying “critical events” related to “critical control measures” from incidents, monitoring programs and annual reviews of performance measures lead to continuous improvement of safety and a high degree of assurance that Major Hazard Incidents can be prevented.

The incidents discussed will be the loss of level control in a storage tank, the automatic stopping of a runaway reaction and the replacement of the above ground piping on a storage deluge system.

INTRODUCTION

The intention of Major Hazard Regulations around the world is to bring about sufficient control of high hazard risks to prevent high consequence events on infrastructure, communities and the environment. There is an obligation on Operators to ensure that these hazards are controlled, to have the systems implemented and maintained to reduce and mitigate the hazard consequences.

Despite these well understood principles from both the regulators and the operators of Major Hazard Facilities, major hazard accidents have continued to occur, most notably at Buncefield and Texas City in the last 24 months.

¹ Here “Operator” has the meaning of Major Hazard Facilities Regulations and refers to the employer who has management or control of the facility, p5.
² http://www.buncefieldinvestigation.gov.uk/index.htm
³ http://www.chron.com/disp/story.mpl/special/05/blastarchive/3747726.html
⁴ Occupational Health and Safety (Major Hazards Facilities) Regulations 2000, Victoria, Statutory Rule No. 50/2000.
⁵ The Control of Major Accident Hazards Regulations 1999, ISBN 0 11 082192 0, Statutory Instrument 1999 No. 743.
⁶ Reason, J. (1990). Human error. New York: Cambridge University Press.


So how does an Operator of a Major Hazard Facility know the control measures they have applied are adequate and that they remain functional? Regulations suggest this is done by full review every 5 years. This author contends that this is insufficient and that to be effective a more responsive mechanism is required to hold Operators' attention.

Australian Vinyls has developed a process where key incidents – called “critical events” – are identified. This process is based on the application of Reason’s “Swiss Cheese” model (Reason, 1990) and Deming’s Continuous Improvement PDCA Cycle (Tague, 1995). Central to the process is the easy identification of events, incidents and near misses (or “near hits”) that can be studied in detail down to “root cause” so that any “holes” in the control measures can be located and eliminated before they can align, thereby maintaining the layers of protection and thus preventing Major Incidents.

The paper provides an illustration of how some control measures are identified as “critical control measures” and how examples of identifying “critical events” related to “critical control measures” from incidents, monitoring programs and annual reviews of performance measures lead to continuous improvement of safety and a high degree of assurance that Major Hazard Incidents can be prevented.


Two incidents will be discussed to illustrate the approach: the loss of level control in a storage tank and the replacement of the above ground piping on a storage deluge system.

DEMONSTRATION AND THE PDCA CYCLE

Following the Esso Longford Gas Plant Accident (Dawson, 1999) the State of Victoria in Australia responded by implementing the Occupational Health and Safety (Major Hazard Facilities) Regulations 2000 (Government of Victoria, 2000). Examination of these regulations shows a progressive pathway of demonstration required of the Occupiers of Major Hazard Facilities in the state. These are:

• Regulation 302 Identification of major incidents and hazards
  (1) The operator of a major hazard facility must:
      (a) identify all major incidents which could occur at the major hazard facility; and
      (b) identify all hazards that could cause, or contribute to causing, those major incidents.
• Regulation 304 (1) The operator of a major hazard facility must adopt control measures which eliminate or, if it is not practicable to eliminate, which reduce so far as is practicable, risk to health and safety.
• Regulation 301 (1) The operator of a major hazard facility must establish and implement a Safety Management System for the major hazard facility.
• Regulation 402 (2) (b) demonstrating the adequacy of the control measures adopted or reviewed under regulations 304 and 306.
• Schedule 2, 7.3 Performance indicators for the effectiveness of control measures adopted . . .
• Regulation 306 (1) . . . must review, and as necessary revise, those matters so as to ensure that the control measures adopted are such that the operator continues to comply with regulation 304(1).

If these regulations are looked at through the eyes of a Business Manager rather than through those of a Safety Professional or Risk Engineer, the basis of a Deming Continuous Improvement (or PDCA) cycle (Tague, 1995) can be seen, as illustrated in Figure 1. The stages of the process become:

Plan: Identify all the potential major incident scenarios (Regulation 302 (1) (a)) and then identify the hazards that can cause those incidents (Regulation 302 (1) (b)).

Do: Adopt control measures to manage those hazards (Regulation 304(1)) and manage those via a safety management system (Regulation 301(1)).

Check: Demonstrate the adequacy of the control measures (Regulation 402 (2) (b)) by the adoption of performance indicators (Schedule 2, part 7.3) that test the effectiveness and indicate failure of the control measures. Implement tests, audit cycles, identify incidents (both internal and external) and review the information periodically.

Act: Make changes to your performance indicators (PIs), safety management system (SMS) and establish new control measures (Regulation 306 (1)).

Figure 1. Putting the regulations into a continuous improvement framework
[Diagram of the PDCA cycle. Plan: What Major Incidents can occur? What Hazards exist that can cause those Incidents? Do: What Control Measures can be used to manage and contain those hazards? How do I manage in SMS? Check: What performance indicators do I use? Tests? Audits? Incidents? Time? Review? Act: Make changes to PIs, SMS, install new Control Measures?]

Figure 2. Copy of Major Hazard Incident Database for a Loss of Containment from a large hole (diameter > largest fitting) in a Vinyl Chloride Monomer Storage Tank

This paper will not go into the detail of how to identify incidents and hazards, or how to select control measures to manage those hazards. These are necessary prerequisite steps for actively managing the control measures via a safety management system. For those not familiar with the process, the following illustration from Australian Vinyls will have to suffice. In Figure 2, a major incident scenario is illustrated for a loss of containment from a large hole in a vinyl chloride monomer (VCM) storage tank. In the example four separate hazards are listed, with details of the internal overpressure hazard (highlighted in blue) shown along with the control measures for that particular hazard. In turn the details of the relief valve control measure (highlighted in black) are provided. The “!” indicates this has been classified as a critical control measure.
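As a rough illustration only, the hierarchy described for Figure 2 (a major incident scenario, its hazards, and the control measures for each hazard, some flagged as critical) might be represented along the following lines. This is a minimal Python sketch, not the Australian Vinyls database: the class and field names are hypothetical, and only the overpressure hazard and its relief valve are taken from the text (the other three hazards are not named in the paper and are left out).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ControlMeasure:
    name: str
    critical: bool = False  # plays the role of the "!" flag described for Figure 2


@dataclass
class Hazard:
    name: str
    control_measures: List[ControlMeasure] = field(default_factory=list)


@dataclass
class MajorIncidentScenario:
    description: str
    hazards: List[Hazard] = field(default_factory=list)

    def critical_control_measures(self) -> List[ControlMeasure]:
        """Return every control measure flagged as critical, across all hazards."""
        return [cm for h in self.hazards for cm in h.control_measures if cm.critical]


# Hypothetical record built from the one hazard and control measure named in the text.
scenario = MajorIncidentScenario(
    description="Loss of containment from a large hole in a VCM storage tank",
    hazards=[
        Hazard(
            name="Internal overpressure",
            control_measures=[ControlMeasure("Relief valve", critical=True)],
        ),
    ],
)
critical_names = [cm.name for cm in scenario.critical_control_measures()]
```

Keeping the critical flag on the control measure itself means that the list of items needing performance indicators can be derived from the database rather than maintained separately.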

LEARNING FROM INDUSTRIAL ACCIDENTS

After nearly every significant, high consequence, major industrial accident, some form of public enquiry is inevitably held – Buncefield being a recent example (Powell, 2006). No matter where the incidents occurred, whether at Longford in Australia (Hopkins, 2000), Texas City in the USA (Broadribb, 2006) or further back at Flixborough (Kletz, 1994), the conclusions are the same: there are immediate causes; there are underlying causes, due to failures at the organizational level, that have contributed to the immediate, direct event (Anderson, 2004); and further, there are precursor incidents that have indicated there were problems (Hopkins, 2000).

These conclusions, by well resourced multi-disciplinary teams, readily point to an accident causation model in line with what has become commonly known as the “Swiss Cheese Model” (Reason, 1990). In this model, for every barrier that prevents a hazard leading to an incident, there are holes (like the holes in a slice of Swiss cheese). If a series of holes in each barrier line up then there is nothing to prevent the hazard leading to an undesirable incident.
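The effect of one barrier being left with an open hole can be made concrete with a small numerical sketch. Reason's model is qualitative; the calculation below is an added illustration that assumes each barrier fails independently with a known probability, which rarely holds exactly in practice. The function name and all probabilities are invented for the example.

```python
from math import prod


def probability_all_barriers_fail(hole_probabilities):
    """Chance that every barrier has a hole when challenged (independence assumed)."""
    return prod(hole_probabilities)


# Three barriers, each assumed to fail 1 time in 100 demands (illustrative numbers).
healthy = probability_all_barriers_fail([0.01, 0.01, 0.01])

# Same barriers, but the middle one has an undetected latent failure (certain to fail).
degraded = probability_all_barriers_fail([0.01, 1.0, 0.01])
```

Under these assumptions the undetected latent failure raises the chance of all holes aligning by a factor of about one hundred, which is the argument for finding and closing holes before a demand occurs.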

CRITICAL EVENTS

It is one thing to observe that the historical record fits a well respected model. Knowledge only turns into learning when it is applied appropriately. Thus the learning to be gained from the applicability of the “Swiss Cheese Model” is only real if a system can be devised to manage a major hazard facility in real time, so that latent failures (Swiss cheese holes) are identified and rectified before a situation (abnormal event) arrives that finds fault in every layer of the protection.

At Australian Vinyls we have defined a system called “Critical Events” that has been integrated into our Incident Reporting system that leads to continuous improvement. A Critical Event is defined as:

“A CRITICAL EVENT is any breach, failure, or loss of a CRITICAL CONTROL MEASURE or failure to meet a monitoring schedule for a CRITICAL CONTROL MEASURE.”

Figure 3. VCM storage tank deluge line flushing

So for each Critical Control Measure, performance indicators are defined that capture critical events so that action is taken in a timely manner to remedy problems before the control measure is called into action in anger. Every incident is categorised on the following Critical Event scale:

1. Not a Major Incident or Critical Event
2. Critical Event – Inspection, test or audit of a Critical Control Measure not performed as scheduled
3. Critical Event – Critical Equipment/Control Measure found to be defective on inspection or test
4. Critical Event – Critical Equipment/Control Measure found to be defective on demand
5. Critical Event – Failure to follow a Critical Work Instruction or Procedure
6. Major Incident
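The six-point scale above lends itself to a simple enumeration in an incident reporting tool. The following is a minimal Python sketch under that assumption; the member names and the helper function are hypothetical and are not taken from the Australian Vinyls system.

```python
from enum import IntEnum


class CriticalEventCategory(IntEnum):
    """The six categories of the Critical Event scale, numbered as in the list."""
    NOT_CRITICAL = 1            # not a Major Incident or Critical Event
    CHECK_NOT_PERFORMED = 2     # inspection, test or audit not performed as scheduled
    DEFECTIVE_ON_TEST = 3       # found defective on inspection or test
    DEFECTIVE_ON_DEMAND = 4     # found defective on demand
    PROCEDURE_NOT_FOLLOWED = 5  # critical work instruction or procedure not followed
    MAJOR_INCIDENT = 6


def is_critical_event(category: CriticalEventCategory) -> bool:
    """Categories 2 to 5 are Critical Events; 1 and 6 are not."""
    return (CriticalEventCategory.CHECK_NOT_PERFORMED
            <= category
            <= CriticalEventCategory.PROCEDURE_NOT_FOLLOWED)


# Hypothetical incident log: (description, category).
incidents = [
    ("Slip on wet floor", CriticalEventCategory.NOT_CRITICAL),
    ("Relief valve test overdue", CriticalEventCategory.CHECK_NOT_PERFORMED),
    ("Deluge nozzle blocked during test", CriticalEventCategory.DEFECTIVE_ON_TEST),
]
critical_descriptions = [d for d, c in incidents if is_critical_event(c)]
```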

DEVELOPING PERFORMANCE INDICATORS

Using the example of relief valves on the VCM storage tanks (as per Figure 2), the following performance indicators have been established:

1. No failure on demand.
2. As received Pop test to be within 10% of setting as per AVRES/ENG/INT/PC205
3. Inspection and test to be no more than 3 months overdue without Tolerable Risk Assessment as per AVRES/ENG/INT/PC205
4. Number of SHE Incidents that have identified, as a root cause, inadequate inspection, testing, installation or overhaul of PRVs and BDs (pressure relief valves and bursting discs).
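Indicators 2 and 3 are simple threshold checks, which could be sketched as follows. This is an illustrative Python sketch rather than the content of procedure AVRES/ENG/INT/PC205: the function names are hypothetical, "3 months" is approximated as 91 days, and the pressures and dates in the usage lines are invented.

```python
from datetime import date

POP_TEST_TOLERANCE = 0.10  # indicator 2: as-received pop test within 10% of setting
MAX_OVERDUE_DAYS = 91      # indicator 3: "3 months", approximated here as 91 days


def pop_test_within_tolerance(set_pressure: float, as_received_pop_pressure: float) -> bool:
    """True if the as-received pop pressure is within 10% of the set pressure."""
    return abs(as_received_pop_pressure - set_pressure) <= POP_TEST_TOLERANCE * set_pressure


def inspection_overdue_breach(due_date: date, today: date,
                              has_tolerable_risk_assessment: bool) -> bool:
    """True if the test is more than the allowed period overdue with no risk assessment."""
    overdue_days = (today - due_date).days
    return overdue_days > MAX_OVERDUE_DAYS and not has_tolerable_risk_assessment


# Invented example values, in any consistent pressure unit.
valve_ok = pop_test_within_tolerance(1000.0, 1080.0)
valve_out = pop_test_within_tolerance(1000.0, 1150.0)
breach = inspection_overdue_breach(date(2006, 1, 1), date(2006, 6, 1), False)
covered = inspection_overdue_breach(date(2006, 1, 1), date(2006, 6, 1), True)
```

A failed pop test would correspond to category 3 on the Critical Event scale, and an overdue test without a risk assessment to category 2.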

Alignment of the performance indicators with the Critical Event scale can be seen. Thus events as they occur are followed up and corrections made. On a yearly basis a formal review of the longer term performance indicators is undertaken, in which the operational owners report to the site Safety, Health and Environment Committee (employee and management representatives), longer term trends are determined, conclusions drawn and further improvements made.

CASE STUDIES

STORAGE TANK OVERFILLED

At 4:30am on 5th March 2002 problems were encountered unloading a VCM tanker as the vapour return compressors were cutting out and ice was noticed on the catch pot. The control room operator then noticed that the level indicator trend on the storage tank had not been rising for some time. A simple jog of the controller and the indicator immediately rose to 95% (above the high limit). The unloading operation was placed on hold, the storage tank equalised with another (both settled at about 55%) and the inventory then brought down in the faulty storage tank.

The unloading compressor would not run because a low suction pressure was detected. This was due to a ball float in the suction catch pot blocking off the inlet: liquid VCM had been correctly collected there after the storage tank overflowed into the vapour return line. The level control system did not function because it had become frozen in position at the 34% level due to corrosion product buildup. This had prevented both the automatic cutoff and the Distributed Control System (DCS) software that swapped tanks from operating. The static nature of the trend was not noticed by the operator because charging of the plant was coming from the same storage tank due to low stock levels. Also, a secondary level indication system was not monitored because the operators did not believe its results – it was a capacitance probe and the results had been affected by water levels in the VCM on previous occasions.

Underpinning these operational and mechanical failures our investigations revealed:

• An operational culture of accepting restarts without understanding the causes of trips
• A maintenance culture of operating even critical equipment to failure
• A maintenance culture of not strictly keeping to calibration and trip & alarm schedules
• A culture of accepting equipment that failed to operate (the capacitance probe and a related boom gate operation)
• Checking DCS software that had been bypassed due to the difficulty of appropriately managing normal operational requirements (filling and using VCM from the same storage tank) in an effort to reduce unnecessary alarms
• A management culture that accepted undesirable contaminants in the raw VCM (water)


Figure 4. The PDCA continuous improvement cycle in action


As a consequence, the incident generated specific improvements to the level control system:

• A new independent level indication system was sourced, hooked up into the DCS and either measurement used to stop the unloading operation (this took nearly 2 years to research, trial and install in all four storage tanks)
• DCS software was rewritten to handle all known operating circumstances
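The "either measurement" arrangement in the first improvement is, in effect, a one-out-of-two vote on two independent level readings. A minimal Python sketch of that logic follows. It is not the rewritten DCS software: the 90% trip point is a hypothetical value (the paper states only that 95% was above the high limit), and the frozen-reading check is an added illustration of how a static trend like the one in this incident might be flagged, not something the paper says was installed.

```python
HIGH_LEVEL_LIMIT_PERCENT = 90.0  # hypothetical trip point; the paper does not state it


def should_stop_unloading(primary_level: float, secondary_level: float,
                          high_limit: float = HIGH_LEVEL_LIMIT_PERCENT) -> bool:
    """Stop if either independent measurement reaches the high limit (one-out-of-two)."""
    return primary_level >= high_limit or secondary_level >= high_limit


def reading_looks_frozen(recent_levels, min_samples: int = 5, tolerance: float = 0.05) -> bool:
    """Flag a trend that has moved by no more than `tolerance` over the sample window."""
    if len(recent_levels) < min_samples:
        return False
    return max(recent_levels) - min(recent_levels) <= tolerance


# Primary instrument stuck at 34% while the independent one reads 95%.
stop = should_stop_unloading(34.0, 95.0)
frozen = reading_looks_frozen([34.0, 34.0, 34.0, 34.0, 34.0, 34.0])
moving = reading_looks_frozen([30.0, 31.0, 32.0, 33.0, 34.0])
```

With a single instrument the stuck reading of 34% would never trip; with the vote, the second measurement alone is enough to stop the transfer.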

Also the following cultural changes were implemented:

• Root cause analysis training was promulgated through the operating teams
• Maintenance systems and procedures were changed to get a planned mentality in place (working via written plans, integrating maintenance system software into people's normal working, afternoon planning meetings rather than morning fault meetings)
• Systems were put in place to track progress of routine calibration and trip & alarm routines on critical equipment

And further studies were undertaken to:

• Find other vessels where level control may not be appropriate – four were found.
• Determine the need for the boom gates (the study found these to create more maintenance and operational risks than those that they reduced – currently in final stages of being decommissioned).

YEARLY REVIEW FINDS DELUGE PROBLEM

In 2004 a yearly review found that the frequency of one-off failures of the deluge systems in the unloading and storage areas had increased. In each individual critical event nozzles were becoming blocked by internal corrosion products from the 25 year old mild steel above ground piping. This led initially to a series of flushing exercises (see Figure 3) and ultimately to total replacement of the above ground piping of the affected deluges.
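The kind of trend this review picked up, individual failures that each look like one-offs but rise in frequency, can be surfaced by counting critical events per year for each control measure. The following is a minimal Python sketch of such a count; the function names and the event data are invented for illustration and do not reproduce the site's records.

```python
from collections import Counter


def failures_per_year(events, control_measure):
    """Count critical events per calendar year for one control measure."""
    return Counter(year for year, measure in events if measure == control_measure)


def years_with_increase(counts):
    """Return the years whose count exceeds that of the previous recorded year."""
    years = sorted(counts)
    return [y for prev, y in zip(years, years[1:]) if counts[y] > counts[prev]]


# Invented event log: (year, control measure involved).
events = [
    (2002, "deluge"), (2003, "deluge"), (2003, "relief valve"),
    (2004, "deluge"), (2004, "deluge"), (2004, "deluge"),
]
deluge_counts = failures_per_year(events, "deluge")
rising_years = years_with_increase(deluge_counts)
```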


SUMMARY

The key to each of these studies is to continue the improvement, to build ongoing performance measures and a way to identify the early warning signals that all is not as it should be. And lastly, when Critical Events are identified, Management must ensure follow up occurs and issues are addressed in a timely manner to close the defects in the control measures. Then take action to continuously improve by starting another PDCA cycle. Checks and reviews need to be performed at multiple layers of management to continually monitor, check and review.

This process is represented in Figure 4 where Continuous Improvement is represented by a set of steps for the PDCA cycle to climb. The Safety Management System (SMS) acts as a wedge to hold you in position whilst also acting as a lever to achieve the next level of improvement.

BIBLIOGRAPHY

Anderson, M., 2004, Behavioural Safety and Major Accident Hazards: Magic Bullet or Shot in the Dark?, IChemE, Hazards XVIII Symposium Series 150: 697–711

Broadribb, M.P., 2006, Lessons from Texas City – A Case History, Loss Prevention Bull, 192: 3–12

Dawson, D.M., Brooks, B.J., 1999, Report of the Longford Royal Commission, Government Printer, State of Victoria.

Government of Victoria, 2000, Occupational Health and Safety (Major Hazards Facilities) Regulations 2000, Victoria, Statutory Rule No. 50/2000.

Kletz, T.A., 1994, Learning from Accidents 2nd edn., Oxford: Butterworth-Heinemann Ltd, 69–87

Powell, T., 2006, The Buncefield Investigation: Third Progress Report

Reason, J., 1990, Human error, New York: Cambridge University Press, 207–9

Tague, N., 1995, The Quality Toolbox, Milwaukee: ASQ Quality Press