Alyeska Pipeline Service Company,

TK-190 Overfill Incident Root Cause Analysis Report And Post Accident Review

June 22, 2010

Table of Contents

Executive Summary . . . . . . . . . . . . . . . . . . . . . . . Page 3-5

Incident Description . . . . . . . . . . . . . . . . . . . . . . . Page 6-8

Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Page 9-10

Investigation Team . . . . . . . . . . . . . . . . . . . . . . . . Page 10

Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Page 10-11

Root Causes & Recommendations . . . . . . . . . . . . Page 11-13

Contributing Causes & Recommendations . . . . . . Page 13-17

Attachments:

1) PS09 TK-190 Overfill Photos
2) PS09 Simplified Power Configuration (withheld)
3) PS09 Simplified Control Module UPS Configuration (withheld)
4) PS09 Normal Operational Configuration (withheld)
5) PS09 Relief Event Configuration (withheld)
6) PS09 UPS Panel / Breaker Photos (withheld)
7) PS09 Response Configuration (withheld)
8) Technical Failure Analysis Report (withheld)
9) Incident Event & Causal Factor Chart (withheld)
10) Incident Summary Chart (withheld)
11) Incident Barrier Analysis Worksheet (withheld)
12) Personnel Interview List (withheld)
13) Document Review List

In response to a public records request under Alaska statutes, the Alaska State Pipeline Coordinator’s Office has released the company investigation report, withholding Attachments 2 through 7 and 9 through 12, citing security and public health and welfare (Attachments 2 through 7), protection of trade secrets (Attachments 9 through 11) and harm to the state’s oversight ability (Attachment 12).

Executive Summary

On May 25, 2010, during a scheduled short duration shutdown of the Trans Alaska Pipeline System (TAPS), Pump Station 9 (PS09) experienced an unrecognized relief event. The event resulted in overfilling of Relief Tank 190 (TK-190) and a product spill into secondary containment. There were no injuries and the pipeline was successfully restarted approximately 77 hours after initial notification. The volume of product spilled was approximately 4,500 barrels of Alaska North Slope Crude Oil.

On May 19, 2010, the Operations Control Center (OCC) conducted a Safe Operating Committee (SOC) meeting to review procedure TP-OCC-1007, Pipeline Short Duration Shutdown, May 25, 2010, for Valve Testing, Fire System Testing, and Turin Work, and discuss work activities planned for the shutdown. On May 24, 2010, the Shutdown Commander requested and received oral confirmation that everything was in place and all requirements had been satisfied to ensure a successful shutdown. On the same day, the fire system testing TWSPs were approved and the Shutdown Coordinator issued the Approved Work List indicating SOCs on these procedures had not been completed.

On May 25, 2010, a final go/no-go meeting was conducted by teleconference with involved personnel attending. The Shutdown Commander again orally verified everything was in place and all requirements had been satisfied. At 9:02 am, OCC initiated the pipeline shutdown in accordance with the approved procedure TP-OCC-1007. At 9:24 am, Maintenance Technicians initiated Temporary Work Site Procedure TWSP-40007697-03, Test PS09 Manifold IR Detector Response, the first of three PS09 fire system test procedures to be performed during the shutdown. The purpose of the first test was to specifically trigger a Station Isolation command. At 9:33 am, PS09 battery limit valves BL-1 and BL-2 closed as a result of the Station Isolation command issued. In accordance with TP-OCC-1007, OCC subsequently initiated a command to reopen battery limit valves BL-1 and BL-2, the valves traveled to their fully open position, and the station was back in normal configuration. The second test of the fire system, TWSP-40007697-01, Test PS09 Control Module Room Pull Station Response, was initiated and completed by the Maintenance Technicians.

At 10:20 am, the third and final fire system test procedure, TWSP-40007697-02, Test PS09 PDC Voted Smoke Response, was initiated. This test simulated an electrical fire in the station Power Distribution Center (PDC) Module. System design for an electrical fire in the PDC Module is to isolate power by opening the breakers to the transformers for the PDC. The Control Module receives power from the PDC so it also lost power as expected. The Control Module's redundant power supply is designed utilizing an Uninterruptible Power Supply (UPS) with batteries and a 65 kW generator. The Control Module UPS failed, separating the 65 kW generator from the electrical bus. The PDC Module also has a UPS redundant power supply designed to pick up load, which also failed.

The primary power and redundant power outage to the Control Module caused the critical station control systems to shut down: Safety, Integrity, Pressure Protection System (SIPPS), the Network Interface Panel (NIP), and the Station Control Panel (SCP). As a result, OCC lost visibility to and control of PS09. The tank farm audible and visual evacuation alarm did not function as a result of the loss of primary power and the loss of SIPPS. There was also no normal or emergency lighting available in the Control Module.

The relief system is designed to fail open during a power outage so that the pipeline is not inadvertently over pressured. There are a total of five relief valves: three suction side (upstream) relief valves and two discharge side (downstream) relief valves. All five valves went to the fully open position upon station power loss.

The fire system testing Maintenance Technician called OCC via phone to notify the controller of the power loss and that he would begin troubleshooting. The Technician and an on-site Fire Systems Engineer verified the breakers to the PDC Module transformers were tripped as expected by performance of the fire system testing procedure.

Confidential and proprietary information protected from public disclosure under AS 40.25.120(a)(4) and the Freedom of Information Act, Exemption 4, 5 USC § 522 (b)(4): trade secrets and commercial or financial information obtained from a person and privileged or confidential.

TK-190 Overfill Incident Root Cause Analysis Report And Post Accident Review, 6/22/10, Page 3 of 17

OCC and PS09 field personnel began to focus attention on reestablishing communications with the Control Module and power restoration to the station. OCC contacted an Automation Engineer located at the OCC facility to assist field personnel with the troubleshooting process. The attention of OCC was singularly focused on restoring power and communication with the Control Module. They did not recognize that loss of power to the Control Module caused the PS09 relief valves to open, allowing crude oil to flow into TK-190. PS09 field personnel also did not recognize the relief valves went to the open position after station power loss.

The Maintenance Technician troubleshooting the redundant power supply visually inspected the UPS breakers in the Control Module and found them in a closed (normal) position. A UPS providing power to emergency lighting is required by UL 924 to have a guard installed on the breakers to preclude inadvertent contact, which may open the breakers. When the DC input breaker opened during the power outage, the visible switch indicating breaker status was not able to travel to the tripped position due to the protective guard. This masked the actual condition that the breaker was tripped and not closed as indicated by the breaker switch.

After several minutes of not being able to identify the redundant power supply failure mode, the troubleshooting team decided to restore primary power. The breakers to the PDC transformer that had opened during the final fire system test were reset (closed) and immediately tripped open again. The technicians attempted to close the PDC transformer breakers two more times. The troubleshooting team realized that the SCP resets all internal registers to zero during a power outage. When power was restored, the reset SCP indicated a fire in the PDC Module and reopened the breakers to the PDC transformer. The troubleshooting team decided to isolate the SCP and then restore primary power. Before this action was fully completed, TK-190 was observed to be overfilling.

At 11:00 am, the Shutdown Coordinator began a scheduled shutdown status meeting via teleconference. In the background of the teleconference, a radio call from PS09 was overheard announcing TK-190 was overfilling. All non-essential personnel were evacuated from the station. A Valve Program Engineer, a Fire System Engineer, and Maintenance Technicians began to systematically close the upstream and downstream relief block valves in the Manifold Building. Maintenance Technicians also reported to the battery limit valve BL-1 and started closing it.

OCC relieved line pressure into the PS05 relief tank and began isolating the pipeline north of PS09 by sequentially closing Remote Gate Valves (RGV) at 11:06 am. PS09 on-site personnel finished closing the ten relief block valves at 11:15 am. The PS09 substation 13.8 kV breaker was opened by on-site personnel, electrically isolating the station from utility power, at 11:35 am. Technicians closed each pump unit's suction and discharge valves and the station recycle valve. Battery limit valve BL-1 was closed at 11:50 am and battery limit valve BL-2 was closed at 12:16 pm. These actions cumulatively fully isolated the station hydraulically and electrically. The Incident Management Team was activated, which started the response coordination.

A number of significant incidents on TAPS over the last several years demonstrate a trend of operational discipline deficiencies similar to those involved with the TK-190 overfill incident. The investigations were conducted satisfactorily with identification of key contributing factors, root causes, and corresponding recommendations. Many of the key recommendations developed from the previous incident investigations have direct application to the TK-190 overfill incident. The previous investigation reports and corresponding recommendation implementation demonstrate significant energy and resources were dedicated to maintain TAPS safety and integrity. However, there is recognition of a need for significant improvement in the organization's ability to effectively learn from these experiences and prevent recurrence.

Root Causes

A Technical Failure Analysis was completed but was unable to specifically determine the physical failure cause. The UPS failure was not expected and despite multiple attempts subsequent to the TK-190 overfill incident, the failure could not be recreated and no specific root cause identified.

Root Cause #1 - Design Less Than Adequate (LTA)

Several technical and design issues were identified during the investigation. These items are detailed further in this report and should be reviewed and fully considered by a team of technical discipline experts to determine any necessary system modifications.

Root Cause #2 - Previous Incident MAPs & Lessons Learned LTA

Over the last several years, there have been a number of incidents with resulting Management Action Plans (MAPs) intended to implement recommendations identified during the investigations. Lessons Learned are routinely conducted throughout the organization for a wide variety of activities such as major maintenance completion, pipeline shutdowns, oil spill drills, and incident response. Despite the efforts made to address previous incidents and to learn from previous work activities, there continues to be a pattern of significant incidents occurring. As an organization, we are not optimizing our opportunities to learn. Completion of actions intended to prevent incidents and the opportunities to learn from work activities have not been effective in influencing the culture or behaviors.

Contributing Causes

Contributing Cause #1 - Situational Awareness LTA

Employees knowledgeable of operational processes (OCC & field) did not react in a manner that supported the safety and integrity of TAPS. Situational awareness is paramount to responding to abnormal conditions. Interviews suggest many of those directly involved in this event, both at OCC and PS09, reflected that they should have realized the relief valves had opened and crude was flowing to TK-190 when power was lost.

Contributing Cause #2 - Safe Operating Committees LTA

The Corporate Safety Program, SA-38, and other Department Operating Procedures (DOPs) outline specific requirements for performing Safe Operating Committee (SOC) review of procedures. As a result of the recent PS09 Piping Overpressure Event in July 2009, some of these procedures were improved and others developed. These procedures were reviewed during the investigation and noted as not fully addressing all the issues previously identified. Specific to this event, SOCs were not conducted by field personnel for the three fire system testing procedures (TWSP-40007697-01, 02, 03) to fully assess potential impacts by their execution. An SOC was conducted for the OCC May 25, 2010, shutdown procedure (TP-OCC-1007) by OCC, but it did not include all affected parties as required (e.g., no PS09 participation).

Contributing Cause #3 - Standards, Policies, and Administrative Controls (Procedures) LTA

Policies, standards, procedures, and administrative controls are in place to help govern personnel actions during pipeline normal and abnormal conditions. During the May 25, 2010 shutdown and the TK-190 overfill incident, there were gaps and conflicts in procedural requirements and compliance. Work permits were issued by the Response Base Supervisor and Pump Station Caretaker. The Shutdown Coordinator located at the FEOC was accountable for coordination of shutdown work activities, the Fairbanks Maintenance Base O&M Supervisor was on site supervision, and work procedural initiation was provided by OCC controllers. Personnel interviews indicated confusion about who had primary control and process oversight during the shutdown.

Recommendations

See the detailed descriptions of the Root and Contributing Causes of this report for recommendations.

Incident Description

On May 25, 2010, during a scheduled short duration shutdown of the Trans Alaska Pipeline System (TAPS), Pump Station 9 (PS09) experienced an unrecognized relief event. The event resulted in overfilling of Relief Tank 190 (TK-190) and a product spill into secondary containment. Immediate actions included personnel evacuation, valve closure to stop flow, pump station electrical isolation, and activation of the Incident Management Team (IMT). There were no injuries and the pipeline was successfully restarted approximately 77 hours after initial notification. The volume of product spilled was approximately 4,500 barrels of Alaska North Slope Crude Oil. (Reference Attachment 1)

On May 19, 2010, the Operations Control Center (OCC) conducted a Safe Operating Committee (SOC) meeting to review procedure TP-OCC-1007, Pipeline Short Duration Shutdown, May 25, 2010, for Valve Testing, Fire System Testing, and Turin Work, and discuss work activities planned for the shutdown. The scope description summarized the work as "... pipeline shutdown to allow Valve Testing at PS07 and PS09, PS05 AT&T Turin work, PS09 Fire System Testing and Gas Building work at PS03." The SOC participants included an OCC Controller, Project Lead, Fire System Engineer, Automation Engineer, and PS03 personnel. There were no personnel from PS09 in attendance.

In preparation for the shutdown green light meeting on May 24, 2010, the Shutdown Commander requested and received oral confirmation that everything was in place and all requirements had been satisfied to ensure a successful shutdown. SOC reviews were required for the fire system testing Temporary Work Site Procedures (TWSP). On the same day, the fire system testing TWSPs were approved and the Shutdown Coordinator issued the Approved Work List indicating SOCs for these procedures had not been completed.

Prior to the pipeline shutdown on May 25, 2010, the PS09 fire system was released by OCC to Maintenance to prepare for the scheduled testing. A final go/no-go meeting was conducted in the Fairbanks Emergency Operations Center (FEOC) and by teleconference with involved personnel attending. During the meeting the Shutdown Commander again orally verified everything was in place and all requirements had been satisfied. Unit Work Permit U-PS9-052410-004 was approved and issued by the PS09 Response Base Supervisor and the Caretaker for the fire system testing. The Unit Work Permit included, "Fire Tests: Manifold, Control & PDC-Test IR, Pull Station & Smoke Appliances per TWSPs NOTE: Valve movement, HVAC, Unit SO's, Stobes & pwr outage (PDC, Control Mod, Guard Shack) ... " as the work description.

The PS09 Caretaker had multiple roles during the shutdown. The Caretaker was the process representative, one of the maintenance technicians assigned to perform the fire system testing, and the Operations single point of contact (SPOC) for project F645, the PS09 02 Valve Replacement. Project F645 was scheduled to conduct testing on battery limit valve BL-1 during the shutdown in preparation for project implementation during the July, 2010, scheduled shutdown. An Alyeska Valve Program Engineer was on-site at PS09 to support the BL-1 valve testing and a SOC had been completed for this work.

At 9:02 am, OCC initiated the pipeline shutdown in accordance with the approved procedure TP-OCC-1007. The suction relief set-point at PS09 was changed from normal operating relief pressure of 310 psi to 475 psi for the static state conditions.

At 9:24 am, the Maintenance Technicians initiated Temporary Work Site Procedure TWSP-40007697-03, Test PS09 Manifold IR Detector Response, the first of three fire system test procedures to be performed during the shutdown. Similar fire system testing had been completed during pipeline slowdowns in early May at PS03 and PS04. The previous testing required the Maintenance Override System (MOS) to be activated to isolate the pipeline control system. The PS09 testing on May 25, 2010, was being performed to demonstrate the full functionality of the fire system so the MOS was not activated.

The purpose of the first test was to specifically trigger a Station Isolation command, which was automatically activated as designed. At 9:33 am, PS09 battery limit valves BL-1 and BL-2 closed as a result of the Station Isolation command issued. This portion of the fire system testing was successfully completed. In accordance with TP-OCC-1007, OCC subsequently initiated a command to reopen battery limit valves BL-1 and BL-2, the valves traveled to their fully open position, and the station was back in normal configuration.

The second test of the fire system, TWSP-40007697-01, Test PS09 Control Module Room Pull Station Response, was initiated by the Maintenance Technicians. OCC confirmed that a Station Shutdown was activated, successfully completing this fire system test. Controllers in OCC noted increasing pipeline head pressure at PS09, due to pipeline topography causing flow, and increased the suction relief set-point from 475 psi to 500 psi.

At 10:20 am, the third and final fire system test procedure, TWSP-40007697-02, Test PS09 PDC Voted Smoke Response, was initiated. This test simulated an electrical fire in the station Power Distribution Center (PDC) Module. System design for an electrical fire in the PDC Module is to isolate power by opening the breakers to the transformers for the PDC. The Control Module receives power from the PDC so it also lost power as expected (Reference Attachment 2).

The Control Module's redundant power supply is designed utilizing an Uninterruptible Power Supply (UPS) with batteries and a 65 kW generator. The Control Module UPS failed, separating the 65 kW generator from the electrical bus. The PDC Module also has a UPS redundant power supply designed to pick up load, which also failed (Reference Attachment 3).

The primary power and redundant power outage to the Control Module caused the critical station control systems to shutdown: Safety, Integrity, Pressure Protection System (SIPPS), the Network Interface Panel (NIP), and the Station Control Panel (SCP). As a result, OCC lost visibility to and control of PS09. There were no system updates received by OCC, with the last known station status conditions displayed on the Operator Work Stations (OWS). There was no normal or emergency lighting available in the Control Module.

The relief system is designed to fail open during a power outage so that the pipeline is not inadvertently over pressured. There are a total of five relief valves: three suction side (upstream) relief valves and two discharge side (downstream) relief valves (Reference Attachment 4). All five valves went to the fully open position upon station power loss. Crude oil was flowing into TK-190 from both the suction (north) as well as the discharge (south) sides of the station due to pressure created as a result of area topography (Reference Attachment 5).

The tank farm audible and visual evacuation alarm did not function as a result of the loss of primary power and the loss of SIPPS. The tank farm audible and visual alarm is designed to activate when the relief valves are open to 5% or more. The alarm is activated by the SIPPS and is powered on a non-critical circuit. Non-critical circuits are not supported by the redundant power supply.
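The alarm dependencies described above can be illustrated with a short sketch. It is not taken from the report or from any Alyeska system; the class and function names are hypothetical, and the only figures drawn from the text are the 5% activation threshold, the five relief valves, and the alarm's reliance on SIPPS and a non-critical circuit.

```python
from dataclasses import dataclass
from typing import List

# From the report: the alarm is designed to activate at 5% open or more.
ALARM_THRESHOLD_PCT = 5.0


@dataclass
class StationState:
    """Hypothetical, simplified view of the conditions the alarm depends on."""
    sipps_running: bool                 # SIPPS is what activates the alarm
    noncritical_power: bool             # alarm circuit has no redundant supply
    relief_valve_open_pct: List[float]  # position of each relief valve, 0-100


def alarm_demanded(state: StationState) -> bool:
    """True when any relief valve is open to the threshold or beyond."""
    return any(p >= ALARM_THRESHOLD_PCT for p in state.relief_valve_open_pct)


def alarm_sounds(state: StationState) -> bool:
    """The alarm sounds only if it is demanded AND both dependencies are up."""
    return (
        alarm_demanded(state)
        and state.sipps_running
        and state.noncritical_power
    )


# Conditions after the power loss: all five valves fully open,
# SIPPS shut down, non-critical circuits unpowered.
incident = StationState(
    sipps_running=False,
    noncritical_power=False,
    relief_valve_open_pct=[100.0] * 5,
)
```

For the `incident` state, `alarm_demanded` is true while `alarm_sounds` is false: the same power loss that opened the relief valves also removed the alarm meant to announce it.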

The fire system testing Maintenance Technician/Caretaker called OCC via phone to notify the controller of the power loss and that he would begin troubleshooting. The Technician and an on-site Fire Systems Engineer verified breakers 1-52-6 and 2-52-7 to the PDC Module transformers were tripped as expected by performance of the fire system testing procedure. The 65 kW generator was running but the back-up automatic transfer switch (ATS) and the UPS indicator lights were not illuminated. The Maintenance Technician verified the ATS linkage and breaker were in the proper configuration.

OCC successfully contacted each of the other pump stations to ensure open communications. OCC and PS09 personnel began to focus attention on reestablishing communications with the Control Module and power restoration to the station. OCC contacted an Automation Engineer located at the OCC facility to assist field personnel with the troubleshooting process. The attention of the OCC Pipeline Controllers, Lead Controller, and OCC Supervisor was singularly focused on restoring power. No one at OCC conducted an overall system review to gain context or larger understanding for the events occurring at PS09. While OCC was focused on restoring communication with the Control Module, they did not recognize that loss of power to the Control Module caused the PS09 relief valves to open, allowing crude oil to flow into TK-190.

Field personnel located at PS09 during the shutdown were focused on maintenance. Similar to OCC, they did not recognize the relief valves went to the open position after station power loss. On-site personnel were in the Control Module and the PDC Module troubleshooting the power failures.

Pump Station electrical distribution is fed through a rectifier in the UPS that converts AC to DC power, and then through an inverter that converts the DC power to AC. The design of the system through the UPS is to provide "clean power" to the Control Module. A DC input breaker located before the inverter tripped during the power outage. This breaker is downstream of both the UPS batteries and supplemental generator, which created a single point of failure in the system (Reference Attachment 3).
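The single point of failure can be expressed as a small truth-function sketch. The topology here is an assumption made for illustration: all three sources (primary AC through the rectifier, UPS batteries, and the 65 kW generator) are taken to reach the inverter only through the DC input breaker. The actual configuration is shown in Attachment 3, which was withheld, and the function name is hypothetical.

```python
def control_module_powered(
    primary_ac: bool,
    batteries: bool,
    generator: bool,
    dc_input_breaker_closed: bool,
) -> bool:
    """Simplified model: sources are in parallel, the breaker is in series.

    ASSUMPTION: every source feeds the inverter through the one DC input
    breaker, so redundancy upstream of it cannot compensate for its loss.
    """
    any_source_available = primary_ac or batteries or generator
    return any_source_available and dc_input_breaker_closed


# During the test: primary AC isolated as designed, batteries and generator
# available, but the DC input breaker tripped.
during_incident = control_module_powered(
    primary_ac=False,
    batteries=True,
    generator=True,
    dc_input_breaker_closed=False,
)
```

In this model `during_incident` evaluates to false even though two backup sources are healthy, which is the sense in which a breaker placed downstream of both backups defeats the redundancy.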

The technicians troubleshooting the redundant power supply visually inspected the UPS breakers in the Control Module and found them in a closed (normal) position. These UPS units were originally configured for supplying power to critical circuits. In application, they were also used to provide power for emergency lighting. A UPS providing power to emergency lighting is required by UL 924 to have a guard installed on the breakers to preclude inadvertent contact, which may open the breakers. When the DC input breaker opened during the power outage, the visible switch indicating breaker status was not able to travel to the tripped position due to the particular design of the protective guard. This masked the actual condition that the breaker was tripped and not closed as indicated by the breaker switch (Reference Attachment 6).

After several minutes of not being able to identify the redundant power supply failure mode, the troubleshooting team decided to restore primary power. The breakers to the PDC transformer that had opened during the final fire system test were reset (closed) and immediately tripped open again. The technicians continued troubleshooting this condition and subsequently attempted to close the PDC transformer breakers two more times. During the troubleshooting process the team realized that the Station Control Panel (SCP) Programmable Logic Controller (PLC) resets all internal registers to zero during a power outage. When power was restored, the reset Station Control Panel indicated a fire existed in the PDC Module and opened the breakers to the PDC transformer. On-site personnel working with the Automation Engineer located at OCC, decided to isolate the Station Control Panel and then restore primary power. Before this action was fully completed the fire system testing Maintenance Technician/Caretaker observed TK-190 overfilling.
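The repeated breaker trips can be pictured with a toy model. The report states only that the SCP PLC resets its registers to zero on power loss and that the reset panel then indicated a fire; it does not say how a zeroed register maps to a fire indication. The sketch below assumes, purely for illustration, that a register value of 0 is read as "fire detected". The class, register name, and encoding are all hypothetical.

```python
class StationControlPanel:
    """Toy model of the SCP PLC. Not the actual control logic."""

    HEALTHY = 1  # ASSUMPTION: non-zero means "no fire"; zero reads as "fire"

    def __init__(self) -> None:
        self.pdc_fire_register = self.HEALTHY

    def power_cycle(self) -> None:
        # From the report: all internal registers reset to zero on power loss.
        self.pdc_fire_register = 0

    def fire_indicated(self) -> bool:
        return self.pdc_fire_register == 0


def breaker_stays_closed(scp: StationControlPanel, scp_isolated: bool) -> bool:
    """Return True if a reset PDC transformer breaker would remain closed."""
    if scp_isolated:
        return True  # an isolated SCP cannot command the breaker open
    return not scp.fire_indicated()


scp = StationControlPanel()
scp.power_cycle()

# Three close attempts with the SCP still in circuit, as in the narrative.
attempts = [breaker_stays_closed(scp, scp_isolated=False) for _ in range(3)]

# The planned workaround: isolate the SCP, then restore primary power.
after_isolation = breaker_stays_closed(scp, scp_isolated=True)
```

Under this assumption every attempt fails for the same reason, so retrying without changing the SCP state cannot succeed; only isolating the panel (or restoring its register state) breaks the loop.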

At 11:00 am, the Shutdown Coordinator began a scheduled shutdown status meeting via teleconference. OCC notified the FEOC there was no visibility to PS09 due to the Control Module loss of power. OCC did not have an estimate for when power would be restored. The Fairbanks Maintenance Base Operations and Maintenance (O&M) Supervisor, on-site at PS09, reported the fire system testing was complete but they were having difficulty restoring power. In the background of the teleconference, a radio call from PS09 was overheard announcing TK-190 was overfilling.

The Maintenance Technician/Caretaker announced via radio for all non-essential personnel to evacuate the station and the on-site Valve Engineer supporting project F645 to report to the Manifold Building. Assisted by the Fire System Engineer and other Maintenance Technicians they began to systematically close the upstream and downstream relief block valves. Maintenance Technicians also reported to the station battery limit valve BL-1 and started closing it using the RB-100 mule and generator, which had been staged at the valve. No gas detection monitoring was conducted at the Manifold Building. However, personnel responding to BL-1 did perform gas detection monitoring.

OCC relieved line pressure into the PS05 relief tank and began isolating the pipeline north of PS09 by sequentially closing Remote Gate Valves (RGV) at 11:06 am. PS09 on-site personnel finished closing the ten relief block valves at 11:15 am. The PS09 substation 13.8 kV breaker was opened by on-site personnel, electrically isolating the station from utility power, at 11:35 am. The Maintenance Technicians closed each pump unit's suction and discharge valves and the station recycle valve. PS09 battery limit valve BL-1 was closed at 11:50 am and battery limit valve BL-2 was closed at 12:16 pm using the relocated RB-100 mule and generator. These actions cumulatively fully isolated the station hydraulically and electrically. The IMT was activated, which started the response coordination and management.
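The clock times given in this section can be lined up to show how the response unfolded. The sketch below uses only times stated in the report; the event labels are paraphrased, and the 10:20 am reference is the time the third test was initiated, since the report does not give the exact moment power was lost. Function names are illustrative.

```python
from datetime import datetime
from typing import List, Tuple

# (time, paraphrased event) pairs, all on May 25, 2010, from the narrative.
EVENTS: List[Tuple[str, str]] = [
    ("09:02", "OCC initiates pipeline shutdown"),
    ("09:24", "First fire system test initiated"),
    ("09:33", "BL-1 and BL-2 close on Station Isolation command"),
    ("10:20", "Third fire system test initiated"),
    ("11:00", "Shutdown status teleconference begins"),
    ("11:06", "OCC begins closing RGVs north of PS09"),
    ("11:15", "Ten relief block valves closed"),
    ("11:35", "13.8 kV substation breaker opened"),
    ("11:50", "BL-1 closed"),
    ("12:16", "BL-2 closed"),
]


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end, both given as zero-padded HH:MM."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def elapsed_since(reference: str = "10:20") -> List[Tuple[str, int, str]]:
    """Events at or after the reference time, with minutes elapsed."""
    return [
        (t, minutes_between(reference, t), label)
        for t, label in EVENTS
        if t >= reference  # valid because times are zero-padded HH:MM
    ]


for t, mins, label in elapsed_since():
    print(f"{t}  +{mins:3d} min  {label}")
```

By this reckoning the relief block valves were closed 55 minutes after the third test began, and full hydraulic isolation (BL-2 closed) came at 116 minutes.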

Background

A number of significant incidents on TAPS over the last several years demonstrate a trend of operational discipline deficiencies similar to those involved with the TK-190 overfill incident. These incidents include:

1. Pump Station 9 Tank Vent Fire on January 6, 2007
2. RGV 32 Leak on January 9, 2007
3. Pump Station 9 TK-190 Overfill Near Loss on March 22, 2007
4. Pump Station 9 Energy Isolation Near Loss Events in October 2008
5. Pump Station 1 Sadlerochit Stream Gas Excursion on January 15, 2009
6. Pump Station 9 Piping Overpressure Event on July 19, 2009

Each of the above noted incidents appears to have been investigated in a manner consistent with requirements and expectations. The investigations were conducted satisfactorily with identification of key contributing factors, root causes, and corresponding recommendations. The recommendations were either implemented within a separate Management Action Plan (MAP) or embedded within the report. These recommendations appear to have been completed within the context of each individual incident in question and were believed to have been effective toward mitigating likelihood and consequence of further incidents. Many of the key recommendations generated from these incident investigations have direct application to the TK-190 overfill incident as summarized below:

• Perform a Process Hazard Analysis (PHA) on tank farms, sumps, vents and drains
• Strengthen our Process Safety Management (PSM) practices on TAPS
• Install automatic audible alarms in Tank Farms for relief events
• Review with OCC controllers to provide increased awareness of abnormal conditions
• Identify all sources of hazardous energy present in the work site prior to starting work
• Consider additional protocols for blocking incoming flows to the pump station if facilities and/or personnel are at risk
• Review current requirements for SOC reviews
• Establish a standard protocol for evaluating non-routine operations prior to proceeding with same
• Consider implementation of clear, concise, quick-reference, guidance documents to expedite correct response and communication by operating personnel
• Review the effectiveness of audible station alarms
• Modify work permitting and OCC procedures to strengthen and document the turnover/transfer of custody of equipment
• Reinforce with OCC controllers their accountabilities for monitoring and maintaining system conditions at all times
• Reinforce role/accountability of the maintenance work force in energy isolation and work implementation after taking and when returning custody of equipment
• Conduct an Operations Engineering review of pump station valve configurations to identify configurations that could be injurious to the facility
• Validate shutdown maintenance planning process; ensure all personnel understand their roles and accountabilities
• In addition to procedure reviews and the SOC process, use the pipeline real-time simulator and/or control system test bed to test shutdown procedures, conflicts with controls system logic, and OCC controller training
• Clarify expectations regarding when to convene an SOC
• Review the effectiveness of "Operational Discipline" procedures; reiterate expectations with management teams and employees.

The previous investigation reports and corresponding recommendation implementation demonstrate significant energy and resources were dedicated to maintain TAPS safety and integrity. However, there is recognition of a need for significant improvement in the organization's ability to effectively learn from these experiences and prevent recurrence. The previous incident actions have been completed; however, they did not result in the cultural and behavioral changes addressing the above noted issues and those contained further in this report.

OMPV-0001, Management of TAPS Shutdown Based Maintenance, includes a requirement of conducting post shutdown Lessons Learned. However, there is no direction or requirements for how this information is specifically considered during planning for the next shutdown. A review of the Lessons Learned Reports from the four previous scheduled pipeline shutdowns revealed a number of repeated "Opportunities for Improvement" items. For example, there are multiple references to SOCs, concern was expressed with the time allotted between shutdowns and resource availability, and procedures were received or revised at the last minute.

TAPS employees continually identify hazards, assess risk, and implement mitigation measures in the course of business. An assessment of hazards and risks was conducted in October 2009 for the fire system testing work at another pump station, which identified risks specific to a UPS failure. An email was distributed by the responsible Fire Systems Engineer to the Fairbanks Maintenance team which identified the series of events to occur and potential ramifications to consider. The email included the following statements:

"You will want to notify OCC that you are testing. This procedure will remove 480V power from the PDC affecting valves, process ventilation, and relief systems. The valves will fail in place, the ventilation will shut down and the relief systems should not be affected because of the UPS system that powers them. It will not shut down the station unless the relief system UPS is not up to snuff. When the test is done the valves and relief will resume where they left off. The ventilation must be restarted."

There are two important observations from these statements. First, had an SOC been conducted on the fire system testing procedures and these individuals involved, it is likely the risk would have been identified and a contingency developed in the event of UPS failure (e.g., close in TK-190). Second, this information was never used in a broader consideration of risks for the fire system testing.

Investigation Team

Immediately after the May 25, 2010 incident, Alyeska's Senior Management requested a team be assembled to investigate the incident and potential management system deficiencies. The Investigation Team was tasked with developing an incident description, identifying root and contributing causes, and developing recommendations to mitigate or prevent recurrence. Team members were selected based on specific expertise and comprised of TAPS personnel with varied qualifications, experience, and backgrounds. These include operations, maintenance, engineering, operational discipline, process safety management, and a Root Cause Analysis Subject Matter Expert (SME). The investigation team members are:

Ray Grubb, Quality Assurance Supervisor (Investigation Team Lead)
Whitney Grande, Senior Health & Safety Manager
Scott James, Facility Engineering Supervisor
Brendan LaBelle-Hamer, Pump Station 1 Operations & Maintenance Supervisor
Andres Morales, Valdez Maintenance Manager
Tom Stokes, Operations Business Strategy Manager

Methodology

Two distinct approaches were taken to complete the incident investigation: 1) an Incident Root Cause Analysis (RCA), conducted in conjunction with 2) a Technical Failure Analysis. The Incident RCA was conducted to identify any potential management system deficiencies which acted as root or contributing causes. The Technical Failure Analysis was conducted to determine the specific technical failure mode (Reference Attachment 8).

The Technical Failure Analysis Team members are:

Dave Roberts, Automation Engineer
Bill Frichtl, Senior Electrical Engineer
Rick Signor, Maintenance Technician

The Incident RCA included data gathering using various techniques such as personnel interviews, documentation reviews, and records reviews. The data and evidence gathered provided the basis for the development of an Event & Causal Factor Chart (Reference Attachment 9). This investigation tool helps show what happened (or did not), when it happened, and what organizations/personnel were involved. The Event & Causal Factor (E&CF) Chart evolved from data gathered during personnel interviews (Reference Attachment 12) and document and record reviews (Reference Attachment 13). The E&CF was also used to identify additional interviewees, documents, and records where potentially useful information could be obtained. It was also used to determine if additional analysis tools (e.g. Barrier or Change Analysis) would assist with the investigation or analysis. Each event on the E&CF was then comprehensively considered based on data gathered to identify causal factors.

A Barrier Analysis was completed to assess the adequacy of the barriers or safeguards that should have prevented or at least mitigated the TK-190 overfill (Reference Attachment 11). Barriers can be physical such as labeling, placards, guards and design; or administrative controls such as policies, procedures, or work permits. A Barrier Analysis can also identify missing barriers that should have been in place but were not. The Barrier Analysis did not identify any missing barriers but it did identify additional barriers for the investigation team to consider and include in the E&CF.

Following standard root cause determination methods, each causal factor was worked through the TAPRoot® Root Cause Tree to identify specific root causes. The causal factors were then bundled into distinct categories, which were comprehensively considered for root and contributing cause designations. The causal factors, contributing causes, and root causes were used as the basis for recommendation development summarized in an Incident Summary Chart (Reference Attachment 10). The recommendations are for Alyeska Senior Management consideration. If further action is determined necessary, Alyeska management will develop the strategy and identify the resources necessary for successful implementation.

Root Causes

The Technical Failure Analysis was not able to specifically determine the physical failure cause. The UPS failure was not expected and, despite multiple attempts subsequent to the TK-190 overfill incident, the failure could not be recreated and no specific root cause was identified. Recommendations are included in the Technical Failure Analysis Report, which is hereby incorporated into this report (Reference Attachment 8).

Root Cause #1 - Design Less Than Adequate (LTA)

Several technical and design issues were identified during the investigation. Each item should be reviewed and fully considered by a team of technical discipline experts to determine any necessary system modifications.

At the same time the PDC transformer breakers 1-52-6 and 2-52-7 opened as part of the fire systems test, the Control Module UPS (39-UPS-4603R) internal breaker for DC input opened. This breaker has a shunt trip which opens the breaker on DC undervoltage, downstream AC overvoltage, or overcurrent through the inverter. These functions are standard features on the UPS to provide protection for the UPS itself, but they created a single point of failure.
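The shunt trip behavior described above can be pictured as a simple "any one of three conditions opens the breaker" rule. The following is only an illustrative sketch: the threshold values, the `UpsReadings` type, and the function names are hypothetical and are not taken from the UPS documentation or the Technical Failure Analysis.

```python
from dataclasses import dataclass


@dataclass
class UpsReadings:
    """Hypothetical snapshot of the three quantities the shunt trip watches."""
    dc_input_volts: float
    ac_output_volts: float
    inverter_amps: float


# Assumed, illustrative limits only; the real trip settings are not in the report.
DC_UNDERVOLTAGE_LIMIT = 105.0
AC_OVERVOLTAGE_LIMIT = 132.0
INVERTER_OVERCURRENT_LIMIT = 50.0


def shunt_trip_reasons(r: UpsReadings) -> list:
    """Return every condition that would cause the shunt trip to open the breaker."""
    reasons = []
    if r.dc_input_volts < DC_UNDERVOLTAGE_LIMIT:
        reasons.append("DC undervoltage")
    if r.ac_output_volts > AC_OVERVOLTAGE_LIMIT:
        reasons.append("downstream AC overvoltage")
    if r.inverter_amps > INVERTER_OVERCURRENT_LIMIT:
        reasons.append("inverter overcurrent")
    return reasons


def breaker_opens(r: UpsReadings) -> bool:
    """Any single condition is enough to open the DC input breaker."""
    return bool(shunt_trip_reasons(r))
```

Because any one condition opens the DC input breaker, and the loads behind it have no alternate feed in this sketch, the protective function doubles as a single point of failure for everything the UPS supplies.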


A guard was installed on the Control Module UPS DC input breaker to protect it from accidentally being tripped. This guard is UL required because the UPS feeds emergency lighting circuits. The particular configuration of the guard prevented visual confirmation that the breaker had opened and made it difficult to troubleshoot.

When power was restored to the Station Control Panel, part of the initialization sequence was to reset all the PLC internal registers to zero. When this happened, all the outputs were reset, which in this case indicated a fire in the PDC Module and opened the PDC transformer breakers again.

The tank farm audible and visual alarm activates whenever any of the relief valves move open 5% or more. The alarm is powered from the non-critical bus and gets the control signal from SIPPS. The loss of power to the Control Module caused the SIPPS panel to be de-energized, resulting in the tank farm evacuation alarm not functioning.
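The alarm dependency described above can be sketched as a small truth function. This is a simplified illustration, not the station's actual control logic: the function name and parameters are hypothetical, and the only values taken from the report are the 5% opening threshold and the two dependencies (non-critical bus power and the SIPPS control signal).

```python
RELIEF_OPEN_THRESHOLD_PERCENT = 5.0  # from the report: alarm at 5% open or more


def evacuation_alarm_sounds(valve_open_percents, non_critical_bus_energized, sipps_energized):
    """Return True only if a relief valve is open enough AND both dependencies are alive.

    valve_open_percents: iterable of relief valve positions in percent open.
    non_critical_bus_energized: the alarm itself is powered from this bus.
    sipps_energized: SIPPS supplies the control signal; if it is dead, no signal is sent.
    """
    valve_open = any(p >= RELIEF_OPEN_THRESHOLD_PERCENT for p in valve_open_percents)
    control_signal = bool(sipps_energized) and valve_open
    return control_signal and bool(non_critical_bus_energized)
```

In this sketch, calling `evacuation_alarm_sounds([0.0, 12.0], True, False)` returns `False`: a relief valve is well past 5% open, yet the alarm stays silent because the de-energized SIPPS panel cannot send the control signal.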

After the first fire system test, battery limit valves BL-1 and BL-2 could have been either in the normally operated open position or remained in a closed position isolating the station. Had the valves remained in a closed position, the subsequent power outage would have caused minimal crude flow and TK-190 would not have overfilled. The relief valves provide protection from over pressurizing the pipeline. Isolating the pipeline from the relief system is permissible but must be properly executed and managed.

In preparation for the shutdown, the fill level of TK-190 had been reduced from 18' to 13.8'. Prior to 1994 the minimum fill level for TK-190 was approximately 5.5'. Currently, the tank has a minimum operating fill level of 13.3' to ensure flooding of the relief valves and elimination of vapor pocket formation. It is unclear if TK-190 capacity at the existing minimum fill level is adequate under the current operating philosophy.

Maintenance issues were noted during the investigation including UPS batteries with some dead cells and temperature concerns in the PDC Module. Work orders were generated prior to the May 25, 2010, shutdown to address this work but had not been completed. This work was not considered critical by the Investigation Team and not integral to the incident.

Root Cause #1 Recommendations:

1. Establish technical teams to conduct reviews of the technical and design issues identified above and to validate the overall Strategic Reconfiguration (SR) Design Basis. Examples of areas that should be considered for review include:

• Control Logic
• Use and capacity of the breakout tanks
• Process configuration and relief systems
• Electrical configuration and circuits on the critical bus or non-critical bus
• How power is fed to the control systems in the Control Module (SIPPS, SCADA, NIPS, and SCP).

2. Conduct a review of the philosophy and operating practice regarding the configuration of the mainline valves (RGVs and BLs) during shutdowns.

3. Review Electrical and Automation (E&A) changes made at PS03 and PS04 and ensure all applicable to PS09 have been completed. Expedite the implementation of optimization projects at PS09, specifically the security and controls upgrade projects.

Root Cause #2 - Previous Incident MAPs & Lessons Learned LTA

Over the last several years, there have been a number of incidents with resulting Management Action Plans (MAPs) intended to implement recommendations identified during the investigations. Lessons Learned are routinely conducted throughout the organization for a wide variety of activities such as major maintenance completion, pipeline shutdowns, oil spill drills, and incident response. Despite the efforts made to address previous incidents and to learn from previous work activities, there continues to be a pattern of significant incidents occurring. As an organization, we are not optimizing our opportunities to learn. Personnel are working hard to complete all requirements and remain in compliance, but the completion of actions intended to prevent incidents and the opportunities to learn from work activities have not been effective in influencing the culture or behaviors. Although actions are implemented to address deficiencies and foster continuous improvement, the fragmented approach does not always result in the comprehensive results intended.

Reports and recommendations from previous incidents have not been communicated well throughout the organization. There may be some expectation for the communications to be included in the MAPs but most actions are specific, with little dissemination of information for possible application to other parts of the organization. Review of the Incident Investigation Process and sub tier procedures reveals no direction regarding MAP development, implementation, or validation of resolution action effectiveness. There is usually no continuity between the Incident Investigation Team and the MAP Development Team. The richness of discussion and complete appreciation for the report recommendations may not always be transferred into the MAPs. The Operations Incident Review Board has not been meeting as routinely as intended and has not effectively communicated incident learnings throughout the organization.

The organization does not always conduct Lessons Learned with the rigor necessary or fully use the previous lessons learned for the next major maintenance activity or pipeline shutdown. An example of procedural deficiencies regarding Lessons Learned is OMPV-0001, Management of TAPS Shutdown Based Maintenance. This procedure includes a requirement for conducting post shutdown Lessons Learned but no direction or requirements for how this information is specifically considered during planning for the next shutdown. Management has not fully succeeded in fostering a culture of learning by taking actions such as establishing a centralized repository for lessons learned, establishing expectations for utilization, or developing methods for broader analysis and communication.

Root Cause #2 Recommendations:

1. Include TK-190 Investigation Team representation during the MAP development and implementation to ensure continuity during the transfer of recommendations into completed action items. Ensure Incident Investigation Team representation during MAP development and implementation for future incident investigations.

2. Enhance AMS-024, Incident Reporting, Investigation and Analysis Process, and LPS-001, Loss Prevention System, to provide direction and detail on MAP purpose, accountabilities, Investigation Team/MAP continuity, development, communication, tracking, and validation. Also, provide guidance to the Operations Incident Review Board to incorporate knowledge sharing and a learning culture.

3. Improve methods to provide easy and reasonable access to incident investigation reports, Lessons Learned, risk assessments, and hazard analysis (e.g. SOC, PHAs) type documents. Establish expectations for personnel to utilize the tools to foster a culture of knowledge sharing and learning throughout the organization.

Contributing Causes

Contributing Cause #1 - Situational Awareness LTA

Employees knowledgeable of operational processes (OCC & Field) did not react in a manner that supported the safety and integrity of TAPS. The investigation team strongly considered "Situational Awareness LTA" as a primary root cause. However, after much deliberation, the team concluded that this was a significant Contributing Cause of this event and actually fit beneath the umbrella of Root Cause #2, Previous Incident MAPs & Lessons Learned LTA. Specifically, situational awareness was identified in the previous PS09 Piping Overpressure Event report and the fact that it was identified as an issue during this incident investigation provides direct linkage to Root Cause #2.


Situational awareness is paramount to responding to abnormal conditions. Interviews suggest many of those directly involved in this event, both at OCC and PS09, reflected that they should have realized the relief valves had opened and crude was flowing to TK-190 when power was lost. OCC and operational procedures and training do not historically focus on static-state shutdown conditions. Procedures are not consistently written in a manner that provides the user with instruction or guidance on how to address an abnormal condition should the system not react in the expected manner. PS09 field personnel failed to monitor for gas detection before entering the Manifold Building during the initial incident response. Existing hazard recognition training programs focus on personnel safety (e.g., line of fire).

During the PS09 power outage, OCC and field personnel immediately began troubleshooting the power and communications failures. No one stepped back to take a more holistic view of the pipeline or potential ramifications to the pump station due to loss of power. The OCC Controllers had some degree of attention on the relief system because the relief valve set point was changed twice, but they did not recognize available information to help them further assess the situation (e.g., OCC data indicating significant upstream flow). OCC personnel monitor and react to volumes of data, alarms, and screens. As such, we must ensure they remain focused (situational awareness) and equipped (visibility to critical decision making data) to appropriately respond to abnormal conditions. This lack of action and preparedness prevailed in spite of a communication in 2009 which noted the fire system testing "will not shut down the station unless the relief system UPS is not up to snuff".

Individuals on-site at PS09 during the shutdown, including the O&M Supervisor, had operational backgrounds but their focus was on maintenance activities and not operational processes. These individuals did not achieve situational awareness nor did they react in a manner to safely address the abnormal condition. Individuals in operating roles must have operational process knowledge and focus to be successful. Currently, there are expectations for maintenance and response personnel to fill some of the roles historically performed by operations personnel. In the case of the TK-190 overfill incident, one individual filled three roles as the Pump Station Caretaker, Maintenance Technician, and project F645 SPOC.

While OCC did not have visibility of PS09, monitoring upstream pressure at PS08 could have provided insight to the Controllers that pipeline pressure was dropping, indicating flow. OCC is responsible for operating a complex system in terms of station configurations, human machine interfaces (HMI), and complexity of automated systems. Each of the four active pump stations currently has a different configuration. PS09 is fully automated and unstaffed, PS03 and PS04 are also fully automated and transitioning to unstaffed, and PS01 has not started the automation process and remains in the legacy configuration. OCC responses to alarms received vary station to station due to their particular configuration. For automated stations, the interconnection and failure modes are complex and can be difficult to troubleshoot. Automation Engineers are routinely consulted to assist OCC and field personnel with troubleshooting activities.
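The point about PS08 upstream pressure can be illustrated with a simple rate-of-change check: during a static shutdown, a sustained pressure decline at an upstream station suggests crude is moving somewhere. The sketch below is hypothetical throughout; the sample values, the threshold, and the function names are assumptions for illustration and do not represent OCC tooling or actual TAPS pressures.

```python
def pressure_change_rate(samples):
    """Average pressure change in psi per minute between the first and last sample.

    samples: list of (minutes_elapsed, pressure_psi) tuples in time order.
    """
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    if t1 == t0:
        raise ValueError("need samples spanning a non-zero time interval")
    return (p1 - p0) / (t1 - t0)


def decline_suggests_flow(samples, threshold_psi_per_min=-1.0):
    """Flag a sustained decline steeper than an assumed, illustrative threshold."""
    return pressure_change_rate(samples) <= threshold_psi_per_min


# Hypothetical readings during a shutdown that is expected to be static.
declining = [(0, 500.0), (5, 480.0), (10, 460.0)]
steady = [(0, 500.0), (5, 500.0), (10, 500.0)]
```

With these made-up readings, `decline_suggests_flow(declining)` is `True` and `decline_suggests_flow(steady)` is `False`. A check of this kind only prompts a question; interpreting it still depends on the Controller's awareness of the station configuration.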

Contributing Cause #1 Recommendations:

1. Enhance the organization's Process Safety competencies with emphasis on heightened awareness, understanding, and communication of Process Safety. Within this context consider:
• Provide Process Safety Management (PSM) training for all managers involved with operations, maintenance, and projects. The training should familiarize personnel in leadership positions with the different methods for managing process safety versus personnel safety
• Develop operating procedures to ensure pipeline shutdown states (e.g. slowdown, static) are considered and appropriately addressed
• Incorporate SOC and risk assessment results with appropriate contingency actions for potential abnormal conditions in operating procedures (e.g., at specific hold points or significant steps within the procedure)
• Assess the opportunity to apply the Safe Performance Self Assessment (SPSA) concepts within an operating procedure (e.g., if this next step does not respond as intended what will I do and what should I be concerned about)
• Incorporate elements of Situational Awareness in the work permitting process.


2. Review Situational Awareness training programs and incorporate improvements into existing training considering:
• Retaining a resource with expertise in the area of Situational Awareness
• Defining Situational Awareness and its importance in operating TAPS safely
• Scenario based simulations (OCC) and exercises (field) which test situational awareness and related decision making to abnormal conditions
• Specific situational awareness exercises within spill drills
• Process knowledge skills based training for accountable managers, supervisors, and technicians
• Assessment of knowledge for accountable managers, supervisors, and technicians
• A structured feedback or Situational Awareness Mentoring Program where more experienced technicians/controllers mentor new personnel focusing on potential abnormal conditions and how to respond appropriately.

3. Assess industry best practices and improve management processes for OCC alarms and HMI screens.

4. Enhance the investigation and lessons learned processes by incorporating a focus for identifying situational awareness deficiencies for improvement opportunities.

Contributing Cause #2 - Safe Operating Committees LTA

The Corporate Safety Program, SA-38, and other Department Operating Procedures (DOPs) outline specific requirements for performing Safe Operating Committee (SOC) review of procedures. As a result of the recent PS09 Piping Overpressure Event in July 2009, some of these procedures were improved and others developed. These procedures were reviewed during the investigation and noted as not fully addressing all the issues previously identified. As an example, SA-38 specifically requires that DOPs be developed and that they identify who can be an SOC Chairperson. OCC procedure OMD-0101, Oil Movements Department Safe Operating Committee Review Requirements, was published in 2008 but this requirement has not been addressed.

Specific to this event, SOCs were not conducted by field personnel for the three fire system testing procedures (TWSP-40007697-01, 02, 03) to fully assess potential impacts by their execution. An SOC was conducted for the OCC May 25, 2010, shutdown procedure (TP-OCC-1007) by OCC but it did not include all affected parties as required (e.g., no PS09 participation). There was an opportunity to block in or isolate the station and still accommodate the fire system testing and pressure protection of the pipeline. The SOC process employed did not recognize or encourage discussion involving alternate methods to isolate the station.

No requisite training on the SOC requirement was provided to personnel. There appears to be a high degree of variability in implementing the SOC process as it is inherently influenced by the Chairperson and participants. SA-38 provides structure but there is no standard methodology for conducting SOCs. SOCs may not be viewed as a critical barrier in our overall safety system and their value may not be fully understood. For example, a Lead Maintenance Technician was specifically questioned about SOCs for the PS09 shutdown work and he answered affirmatively that they had been completed. However, the SOCs for the fire system testing TWSPs had not been completed prior to the shutdown.

Contributing Cause #2 Recommendations:

1. Revise relevant SOC programs, processes, and procedures (e.g. SA-38 and DOPs) to:
• Ensure sufficient clarity is provided for when an SOC shall be performed and address any remaining gaps previously identified in the PS09 Overpressure Investigation Report (e.g., OMD-0101 OCC Chairperson clarity)
• Incorporate risk assessment methodologies into the SOC processes and procedures to provide a standardized approach and aid in the performance of SOCs
• Consider incorporating a peer review or assurance process for each SOC to improve quality. The assurance activity should also provide verification that an SOC has been completed prior to commissioning affected work.

2. Enhance SOC program to improve training and include:
• SOC requirements
• How to conduct an SOC
• Methodology in assessing risks/hazards
• Incorporation of risk mitigation into procedures and how to validate effectiveness
• Value and importance of SOCs
• Evaluation of all forms of energy (e.g. kinetic)
• Knowledge transfer assurance (i.e., attendee testing).

Contributing Cause #3 - Standards, Policies, and Administrative Controls (Procedures) LTA

Policies, standards, procedures, and administrative controls are in place to help govern personnel actions during pipeline normal and abnormal conditions. During the May 25, 2010, shutdown and the TK-190 overfill incident, there were gaps and conflicts in procedural requirements and compliance. Prior to the shutdown, required SOCs were not completed (i.e., fire system testing TWSPs) or did not meet the expectations of SA-38 and DOPs. For example, OCC procedure OMD-0101 does not meet all SA-38 requirements (see Contributing Cause #2).

Personnel interviews indicated confusion about who had primary control and process oversight during the shutdown. Work permits were issued by the Response Base Supervisor and Pump Station Caretaker. The Shutdown Coordinator located at the FEOC was accountable for coordination of shutdown work activities, the Fairbanks Maintenance Base O&M Supervisor was on site supervision, and work procedural initiation was provided by OCC controllers. OMPV-0001, Management of TAPS Shutdown Based Maintenance, specifies personnel at the FEOC as the oversight and coordinating organization, but the role was fulfilled to some degree by the OCC Controllers, Response Base Supervisor, O&M Supervisor, and PS09 Caretaker.

During the shutdown, communication protocol between personnel located in the FEOC, OCC and PS09 on-site supervision did not meet the expectations intended by OMPV-0001. Immediately after the PS09 power loss, OCC continually communicated with the fire system testing Maintenance Technicians via phone, but the FEOC was not aware of the abnormal condition until the 11:00 am scheduled shutdown status teleconference. This resulted in the inability of the FEOC to initiate a contingency plan and subsequent response to the abnormal conditions being encountered. Additionally, notification to all shutdown personnel would have increased the collective knowledge base and may have raised situational awareness to a level where there was recognition the PS09 relief valves were open due to the loss of power. The communication protocol outlined in OMPV-0001 describing the roles and responsibilities for accountable positions, content and update frequency of status meetings, and method of communication (i.e. 24 hour Meet-Me-Line) was not fully complied with and did not meet expectations.

During an emergency, EC-71-09 requires that OCC be the primary contact for coordination and OMPV-0001 requires the FEOC to be notified. A previous event, PS09 Piping Overpressure, was the catalyst for development of OCC-3.15, OCC Interface with Field Maintenance Work, to increase the rigor of work activity reviews for broader potential impacts. The log required to be completed by the OCC Controllers in support of OCC-3.15 does not meet the expectations of the intended rigorous review.

Inconsistencies between procedures added to the lack of full communications with all involved personnel during the shutdown.

Confidential and proprietary information protected from public disclosure under AS 40.25.120(a)(4) and the Freedom of Information Act, Exemption 4, 5 USC § 522 (b)(4) trade secrets and commercial or financial information obtained from a person and privileged or confidential.


Contributing Cause #3 Recommendations:

1. Clarify OMPV-0001, Management of TAPS Shutdown Based Maintenance, to ensure communications and notification consistency with the EC-71 series.

2. Enhance existing shutdown checklists or develop additional checklists as a requirement in OMPV-0001 to provide a method for compliance verification.

3. Enhance existing procedures to assure authority, control, and organizational hierarchy during pipeline shutdown conditions (e.g. OMPV-0001 & EC-71).

4. Revise the OCC Field Maintenance Log (form 10089) which supports procedure OCC-3.15 to ensure the required critical questions are systematically asked before an entry to the log is made.


Attachments


Attachment 8: Technical Failure Analysis Report
PS09 Control System UPS Failure, May 25, 2010

Summary

During a planned pipeline shutdown on May 25, 2010, full functional tests of several sections of the fire protection system at PS09 were implemented. One of the tests involved manually initiating a confirmed fire indication in the station Power Distribution Center (PDC) Module. During the test the protective features programmed into the control system functioned as designed and isolated the utility power to the PDC Module. There was also a resultant loss of utility power to the Control Module. In both the PDC and Control Modules, there should have been a "bumpless" power transfer to the respective Uninterruptible Power Supplies (UPS). However, the UPS in both modules failed to supply power when required. This resulted in a total loss of power to both the SIPPS and Station Control Panel (SCP) and resultant loss of process controls. The failure of both UPS was attributed to protective breakers opening between the DC batteries and the DC/AC inverters. Three subsequent tests resulted in proper transfer to UPS power without tripping the subject breakers. To date no definitive reason for these breakers opening has been identified. The most probable root cause is an intermittent low voltage problem on the DC side of the system. Additional system testing, a review of the maintenance program for the UPS, and a design review in the current operating context and risk profile are recommended. The recommendations are currently being worked with UPS enhancements being designed for near term implementation. Additional testing is planned to validate the UPS will respond as designed until the enhancements are made.

Background

During execution of PDC Fire Test Procedure on May 25, 2010, a confirmed fire was initiated in the PDC Module 4701. Automatic controls isolated the utility power into the PDC Module per the C&E Matrix actions. Subsequent to PDC power isolation, utility power was lost to the Control Module 4601. Technicians stationed in the PDC and Control Modules observed that both modules lost utility power AS WELL AS UPS power. Control Module total power loss was confirmed by the loss of power to SCP and SIPPS and by loss of module emergency lighting. PDC module UPS failure was confirmed by observation of loss of emergency lighting in the PDC module and in the manifold building relief bay. The entire 13.8 kV power distribution system in the VFD modules was still active throughout the event.

When the control module UPS failed while normal power was isolated, several systems shut down:

• SIPPS
• Network Interface Panel (Turin)
  o SIPPS peer to peer primary network
  o SIPPS peer to peer backup network
  o SCADA primary network
  o SCADA backup network
  o Maintenance network
• Station Control PLC (SCP)
• SCADA primary Field Control Unit (FCU)
• Control Module emergency lighting
• Security video panel

The fire panel and backup SCADA FCU stayed active due to battery backup. When SIPPS lost power several events occurred:

• All 5 relief valves transitioned to their fail safe state (3 suction and 2 discharge) of full open.


• All RGVs controlled by PS 09 SIPPS (Segments 9 and 10) stayed in open position.
• PS 09 manifold, BL, and T valves stayed at last position:
  o BL 1 and 2 valves were open
  o T0 and T1 were open
  o S1, S2, M1, and D1 were open
  o D2 and M2 were closed
• OCC lost visibility of the site with the exception of the 3 main line units (MLUs).
• PS 05 peer to peer watchdog timer began to time (2 minutes later a block line was initiated but the site was already in that state).
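The valve behavior listed above can be summarized as a small sketch. It is an illustration only, not the SIPPS logic: the names `FailMode` and `position_after_power_loss`, the valve labels for the relief valves, and the relief valves' "closed" position before the event are assumptions made for the example. Only the outcomes (relief valves full open, other valves holding last position) come from the report.

```python
from enum import Enum


class FailMode(Enum):
    FAIL_OPEN = "fail open"            # drives to full open when control power is lost
    FAIL_LAST = "fail last position"   # holds whatever position it had


def position_after_power_loss(mode: FailMode, position_before: str) -> str:
    """Return the valve position once control power is lost."""
    if mode is FailMode.FAIL_OPEN:
        return "open"
    return position_before


# (fail mode, position before the power loss).
# Relief valve labels and their prior "closed" position are assumptions.
valves = {
    "relief_suction_1": (FailMode.FAIL_OPEN, "closed"),
    "relief_discharge_1": (FailMode.FAIL_OPEN, "closed"),
    "BL1": (FailMode.FAIL_LAST, "open"),
    "T0": (FailMode.FAIL_LAST, "open"),
    "S1": (FailMode.FAIL_LAST, "open"),
    "M2": (FailMode.FAIL_LAST, "closed"),
}

after = {
    name: position_after_power_loss(mode, before)
    for name, (mode, before) in valves.items()
}
```

In this sketch only the relief valves change state, which matches the report's observation that the relief path opened while the rest of the station held its last configuration.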

The three MLUs were still visible since the communications module UPS was functional along with its Turin node as well as the equipment in the MLU modules. The DC input breaker to the inverter on Control Module UPS-4603 was found tripped during the power recovery immediately following the power loss.

The 65 kW generator started within 5 minutes and was confirmed running by on site personnel. Due to the failure mode of the UPS the power could not be routed to the equipment from the generator. The generator feeds a critical bus in the control module. The bypass power to the UPS is from the non-critical bus such that when the UPS went to bypass, no power was available to the load side of the UPS.
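The routing problem described above can be sketched as a simple model. This is a simplification under stated assumptions, not the station's electrical design: `PowerState` and `load_powered` are hypothetical names, and the model assumes that the UPS AC input comes from the critical bus and that the inverter can carry the load from either that AC input or the battery path. The report itself states only that the generator feeds the critical bus and that the bypass is fed from the non-critical bus.

```python
from dataclasses import dataclass


@dataclass
class PowerState:
    utility_available: bool   # normal utility feed to the control module
    generator_running: bool   # backup generator, feeds the critical bus only
    dc_breaker_closed: bool   # DC input breaker between battery and inverter
    ups_on_bypass: bool       # UPS has transferred its load to the bypass path


def critical_bus_live(s: PowerState) -> bool:
    return s.utility_available or s.generator_running


def non_critical_bus_live(s: PowerState) -> bool:
    # Assumption: the generator does not back up the non-critical bus.
    return s.utility_available


def load_powered(s: PowerState) -> bool:
    if s.ups_on_bypass:
        # Bypass is fed from the non-critical bus (per the report).
        return non_critical_bus_live(s)
    # On inverter: assumed to need AC input from the critical bus or the battery.
    return critical_bus_live(s) or s.dc_breaker_closed


# May 25 condition: utility isolated, generator running, DC breaker tripped,
# UPS on bypass.
event = PowerState(False, True, False, True)
# Condition resembling Test 1: utility isolated, generator running, UPS healthy.
test_1 = PowerState(False, True, True, False)
```

Under these assumptions `load_powered(event)` is `False` even though the generator is running, while `load_powered(test_1)` is `True`.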

The tripped DC input breaker on PDC UPS-4711 was not discovered until after restoration of utility power at approximately 3 am on May 26, 2010.

After power was restored, several tests of the control module UPS-4603 and backup 65kw generator (39-GEN-4605R) were conducted. These tests were done on the evening of 5/27/2010 and the following morning. Throughout the tests the UPS operated correctly with no disruption in power to any of the downstream equipment. Several Alyeska technicians participated in or witnessed all or portions of the tests, as did Automation engineering via telephone.

Test 1: Opened the AC feed breaker (SWB-4701R-2B) to the UPS (MCC-4603R R2C) while UPS was normal and load was on the inverter.

1) UPS load remained on inverter.
2) 65kw generator started after specified delay.
3) 65kw generator auto transfer switch TSW-4805R (ATS) transferred 65kw onto the critical MCC bus as expected after specified delay. UPS battery voltage dropped during the delay before the ATS transferred. A low battery voltage alarm was observed during the delay period. Battery voltage returned to normal after the ATS transferred. These were the expected actions except for the low battery alarm. All statuses and alarms were confirmed at OCC throughout the transition.
4) AC power was restored to the control module by closing the feeder breaker in the PDC. The ATS transferred back to normal position and the 65kw generator switched off the bus and went into cool down as expected. All statuses and alarms were confirmed at OCC throughout the transition.

Test 2: Opened the AC feed to the control module from the PDC module while UPS was normal and load was on the inverter with the 65kw generator still running in cool down cycle.

1) The results matched test 1 except no low battery voltage alarm was observed.

Test 3: Opened the AC feed to the control module from the PDC module while UPS was normal and load was on the inverter and 65kw generator had shut down after its cool down cycle completed. Note: a volt meter was in place to monitor the UPS battery voltage.

1) Results matched test 1.
2) Low battery voltage alarm activated at 120vdc battery voltage.


3) Battery voltage was observed at 116vdc when ATS transferred.

Test 4: Transitioned the UPS into "service mode" while UPS was normal and load was on the inverter per the manufacturer's procedure.

1) UPS load transferred to bypass without interruption.
2) UPS switched to "service mode" without load interruption.
3) UPS returned to "normal mode" and load transferred back to inverter without load interruption.
4) All statuses and alarms were confirmed at OCC throughout the transitions.

Test 5: Transitioned the UPS to bypass via the inverter bypass pushbutton.

1) UPS load transferred to bypass without interruption.
2) UPS load transferred back to inverter without interruption.
3) All statuses and alarms were confirmed at OCC throughout the transitions.

Test 6: Opened UPS DC Input breaker (top left breaker on Inverter section of UPS) while UPS was normal and load was on the inverter.

1) UPS load transferred to bypass without interruption.
2) Closed UPS DC Input breaker and reset the inverter via the inverter reset pushbutton.
3) UPS load transferred back to inverter without interruption.

All statuses and alarms were confirmed at OCC throughout the transitions.

Test 7: Opened UPS AC Input breaker (MCC4601-R2C) while UPS was normal and load was on the inverter.

1) UPS load remained on inverter.
2) UPS AC Power Fail alarm active and confirmed at OCC.

Closed UPS AC Input breaker and AC Power Fail alarm cleared locally and at OCC.

Findings/Probable Root Cause

The DC input breaker which opened and caused the loss of UPS power has four automatic trips:

• Low DC input voltage (1.75 Volts/Cell or 105 Volts)
• High AC output voltage
• Inverter over current for 10 seconds (110% of rated)
• Short on the AC output distribution system

Analysis by internal personnel was unable to conclusively determine which of the possibilities actually caused the trip. It was noted that there are outstanding work orders to replace weak cells in the battery bank of the UPS. However, testing subsequent to the event was unable to recreate the conditions of May 25. Had the cells been low enough to be the root cause, the effects should have been repeatable. Follow up discussions with technical representatives at the equipment manufacturer did not uncover any known issues or directly related design changes which have been made since time of purchase by APSC. Vendor personnel did suggest that a mode of failure which is often not repeatable (as in our subsequent testing) can be that of high resistance terminations on the DC side of the system, typically the battery terminals. Such failures often are not repeatable because they "self correct" due to arcing across the high resistance area thereby re-establishing a path of conductance. Forensic analysis was not done to investigate this mode of failure due to the inability to access the equipment while in service. To date, no definitive root cause of the failure on May 25 can be attributed. It is most probable that the problem is/was associated with the DC side of the system (i.e. the battery bank).
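The voltage figures quoted in this attachment can be put on a common per-cell basis with a short calculation. This is a sketch for orientation only: the helper names are hypothetical, and the 60-cell count is not stated in the report but is implied by the trip setting of 1.75 Volts/Cell being equated to 105 Volts, assuming the cells are in series.

```python
# Figures taken from the report.
TRIP_VOLTS_PER_CELL = 1.75   # low DC input voltage trip, per cell
TRIP_BANK_VOLTS = 105.0      # same trip expressed for the whole bank
ALARM_BANK_VOLTS = 120.0     # low battery voltage alarm seen in Test 3
OBSERVED_BANK_VOLTS = 116.0  # battery voltage when the ATS transferred in Test 3


def implied_cell_count(bank_volts: float, volts_per_cell: float) -> int:
    """Number of series cells implied by a bank voltage and a per-cell voltage."""
    return round(bank_volts / volts_per_cell)


def volts_per_cell(bank_volts: float, cells: int) -> float:
    """Average voltage per cell for a series bank."""
    return bank_volts / cells


cells = implied_cell_count(TRIP_BANK_VOLTS, TRIP_VOLTS_PER_CELL)
alarm_per_cell = volts_per_cell(ALARM_BANK_VOLTS, cells)
observed_per_cell = volts_per_cell(OBSERVED_BANK_VOLTS, cells)
margin_to_trip = OBSERVED_BANK_VOLTS - TRIP_BANK_VOLTS

print(f"implied cells: {cells}")
print(f"alarm level: {alarm_per_cell:.2f} V/cell")
print(f"lowest observed in Test 3: {observed_per_cell:.2f} V/cell")
print(f"margin above trip at that point: {margin_to_trip:.0f} V")
```

On these numbers the bank was still about 11 V above the trip setting at the lowest reading taken in Test 3, which is consistent with the report's statement that the follow-up tests did not reproduce the trip.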


Recommendations

Recommendations developed as a result of this failure and resultant analysis are as follows:

• During the planned maintenance shutdown June 19/20, both the PDC and Control Module UPS battery banks should be tested. Test specifications to be supplied by Bill Frichtl, Alyeska Electrical Engineering SME.

• Subsequent to testing, all terminations in the DC side of the subject UPS should be inspected for signs of high resistance terminations and/or arcing.

• Any known maintenance issues should be addressed immediately subsequent to above recommended testing etc.

• A design review of the UPS should be conducted. Specific emphasis should be placed on reliability, reduced complexity, and maintainability of the system. The design requirements based on PHA/LOPA analysis should be reviewed and updated in the current operating context/risk profile of the TAPS. Any identified recommendations for design improvements should be elevated as soon as practical for management consideration.

• An RCM analysis should be conducted on the UPS systems based on current operating context and operating history of the equipment.


Attachment 13: Document Review List

No | Document ID | Title | Revision | Effective Date
1 | CP-35-1 | Trans Alaska Pipeline System Pipeline Oil Discharge Prevention and Contingency Plan, Volume 1, Regulatory Volume | Ed 2, Rev 0 | 4/30/10
2 | OMPV-0001 | Management of TAPS Shutdown Based Maintenance | 1 | 6/9/10
3 | OMPV-0004 | Requirements for Control and Approval of Critical Work | 0 | 6/4/10
4 | OCC-3.01-SR | Pressure Controller Set Points (SR) | 9 | 6/10/10
5 | OCC-3.15 | OCC Interface with Field Maintenance Work | 3 | 2/1/10
6 | TP-OCv1 007- | Pipeline Short Duration Shutdown, May 25, 2010, for Valve Testing, Fire System Testing and Turin Work | 2 | 5/24/10
7 | N-6.00.12 | Pipeline Safe Operating Committee (SOC) Review Requirements | 7 | 12/28/09
8 | N-6.00.18 | Pipeline Operational Discipline | 2 | 3/24/10
9 | DB-180 | Design Basis Update, Section 4.2 Pipeline Tanks | Ed 4, Rev 24 | 2/4/10
10 | TWSP-40007697-01 | Test PS09 Control Module Control Room Pull Station Response | 0 |
11 | TWSP-40007697-02 | Test PS09 PDC Voted Smoke Response | 0 |
12 | TWSP-40007697-03 | Test PS09 Manifold IR Detector Response | 0 |
13 | LI-09-00520 | PS-9 Potential Relief Line Overpressure July 17, 2009 Event | | 8/31/09
14 | LI-09-00520 | PS-9 Potential Relief Line Overpressure July 17, 2009 Event Management Action Plan | | 9/4/09
15 | LI-09-00027 MAP | PS01 Sadlerochit Stream Gas Excursion January 15, 2009 Event Management Action Plan | | 3/24/10
16 | DM 00-M85A | Oil Movements Gradient 1.5 MMBPD, Base SG=0.876 PS-1 Inlet 117°F | 3 |
17 | D-39-E3252, sh 1 | Pump Station 9 39-MCC-4601R (39-BD-4601R) 480V One Line Diagram | 2 |
18 | D-39-E3252, sh 2 | Pump Station 9 39-MCC-4601R (39-BD-4601R) 480V One Line Diagram | 2 |
19 | D-39-E3252, sh 3 | Pump Station 9 39-MCC-4601R (39-BD-4601R) 480V One Line Diagram | 2 |
20 | D-39-E3300, sh 1 | Pump Station 9 Control Module (39-BD-4601R) Equipment and Cable Tray Layout | 2 |
21 | D-39-E3350, sh 1 | 39-BD-4601R Control Module 39-PDP-4601R & 39-PDP-4603R Panel Schedule | 5 |
22 | SA-38 | Corporate Safety Manual | 44 | 2/4/10
23 | N-6.00.12 | Pipeline Safety Operating Committee (SOC) Review Requirements | 7 | 12/28/09
24 | OMD-0102 | Oil Movements Department Operational Discipline | 0 | 9/25/08
25 | OCC-4.02-SR | Total Loss of Communication Between OCC and all Pump Stations | 8 | 12/22/09


No | Document ID | Title | Revision | Effective Date
26 | OCC-7.01-SR | High Crude Tank Level (Abnormal Operating Procedure) | 3 | 5/11/09
27 | U-PS09-052410-004 | Unit Work Permit | 9 | 12/1/09
28 | H-PS09-052410-003 | Hot Work Permit | 13 | 12/1/09
29 | AMS-015 | Organizational Change Management | 1 | 3/12/09
30 | N/A | Work Plan May 25 Short Duration Pipeline Shutdown | | 5/24/10
31 | N/A | PS01 Sadlerochit Stream Gas Excursion Management Action Plan with Addendum | | 7/24/09
32 | N/A | OCC Controller Operator Qualifications Report | | 6/10/10
33 | N/A | Lessons Learned July 2009 Shutdown | | 8/5/09
34 | N/A | Lessons Learned August 2008 Shutdown | | 9/16/08
35 | N/A | Lessons Learned Mini Pipeline Shutdown | | 2/23/08
36 | N/A | PS09 UPS Maintenance History Report | | 6/3/10
37 | N/A | APSC Government Letter 20857, Notice to Correct Deficiencies F&G PS03 PS04 PS09 | | 4/22/10
38 | N/A | United States Department of Interior Letter No. 10-033-RN | | 3/26/10
39 | N/A | State of Alaska Division of Fire and Life Safety Letter No. 10-026-RN | | 3/2/10
40 | N/A | JPO Letter No. 09-092-RN | | 9/17/09
41 | N/A | Work Order Package 40007697 | | 6/1/10
42 | N/A | PS09 Tank Vent Fire Incident Report | | 3/9/07
43 | N/A | PS09 TK 190 Overfill Near Loss Incident Report | | 3/22/07
44 | N/A | PS09 Near Loss Incident Involving Energy Isolation Incident Report | | 1111108
45 | N/A | TAPS PS01 Sadlerochit Stream Gas Excursion Incident Report | | 2/23/09
46 | N/A | TAPS PS09 Piping Over Pressure Incident Report | | 8/31/09
47 | N/A | Incident Report, March 22, 2007 Pump Station 9 Shutdown Incident | | 4/13/07
48 | N/A | PS-9 Tank Vent Fire Management Action Plan | | 3/14/07
