Best Ever Alarm System Toolkit Xihui Chen, Katia Danilova, Kay Kasemir SNS/ORNL kasemirk@ornl.gov July 2009


Best Ever Alarm System Toolkit

Xihui Chen,

Katia Danilova,

Kay Kasemir

SNS/ORNL

kasemirk@ornl.gov

July 2009


Managed by UT-Battelle for the U.S. Department of Energy

Previous Attempts at SNS

• ALH, soft-IOCs and EDM screens
• Issues
  – GUI
    • Static layouts
    • N clicks to see (some of the) active alarms
  – Configuration
    • .. was bad. Always too many alarms
    • Changes required contacting one of the 2 experts, restart ALH, …
  – Information
    • Operator guidance?
    • Related displays?
    • Most frequent alarm?
    • Timeline of alarm?


New End-User View: Alarm Table

• All current alarms
  – new, ack’ed
• Sort by PV, Descr., Time, Severity, …
• Optional: Annunciate
• Acknowledge one or multiple alarms
  – Select by PV or description
  – BNL/RHIC type un-ack’


Another View: Alarm Tree

• All alarms
  – Disabled, inactive, new, ack’ed
• Hierarchical
  – Optionally only show active alarms
  – Ack’/Un-ack’ PVs or sub-tree


Guidance, Related Displays, Commands

Basic Text

Start EDM screen

Open web page

Run ext. command

Hierarchical: Including info of parent entries

Merges Guidance etc. from all selected alarms
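The hierarchical merge described above can be sketched as follows. This is a minimal illustration, not the BEAST implementation: the `AlarmTreeItem` class, its fields, and the example entries are all hypothetical. Each item contributes its own guidance plus that of its parent entries, and the guidance of all selected alarms is combined without duplicates.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AlarmTreeItem:
    """Hypothetical alarm tree node: an area, a subsystem, or a PV."""
    name: str
    guidance: List[str] = field(default_factory=list)
    parent: Optional["AlarmTreeItem"] = None


def guidance_with_parents(item):
    """Guidance of the item itself, followed by guidance of its ancestors."""
    result = []
    node = item
    while node is not None:
        result.extend(node.guidance)
        node = node.parent
    return result


def merged_guidance(selected):
    """Merge guidance of all selected items, keeping order, dropping duplicates."""
    merged = []
    for item in selected:
        for text in guidance_with_parents(item):
            if text not in merged:
                merged.append(text)
    return merged


# Made-up example tree: one area with two alarm PVs below it
area = AlarmTreeItem("Cryo", ["Call cryo on-call"])
temp = AlarmTreeItem("Cryo:Temp1", ["Open Valve 47 a bit"], parent=area)
pres = AlarmTreeItem("Cryo:Press1", ["Check compressor"], parent=area)
print(merged_guidance([temp, pres]))
```

Selecting both PVs lists the parent's guidance only once, which is the point of merging instead of concatenating.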


.. Within CSS

Alarms

History of PV

EPICS Config.


E-Log Entries

• “Logbook” from context menu creates text w/ basic info about selected alarms. Edit, submit.

• Pluggable implementation, not limited to Oracle-based SNS ELog


Online Configuration Changes

.. may require Authentication/Authorization

Log in/out while CSS is running


Configure PV

• Again online

• Especially useful for operators to update guidance and related screens.


Logging

• .. into generic CSS log, also used for error/warn/info/debug messages
• Alarm Server: State transitions, Annunciations
• Alarm GUI: Ack/Un-Ack requests, Config changes
• Generic Message History Viewer
  – Example w/ Filter on TEXT=CONFIG


Logging: Get timeline

• Example: Filter on TYPE, PV

1. PV triggers, clears, triggers again
2. Alarm Server latches alarm
3. Alarm Server annunciates
4. Problem fixed
5. Ack’ed by operator
6. All OK
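The timeline above can be reproduced with a small latching state machine. This is a sketch under stated assumptions, not the Alarm Server's code: the class name, the log strings, and the three-level severity table are hypothetical. It assumes a latched alarm clears only once the PV is back to OK and the operator has acknowledged it.

```python
SEVERITY_ORDER = {"OK": 0, "MINOR": 1, "MAJOR": 2}


class LatchedAlarm:
    """Hypothetical latching alarm: remembers the highest severity seen."""

    def __init__(self):
        self.current = "OK"        # severity the PV reports right now
        self.latched = "OK"        # highest severity since the last clear
        self.acknowledged = False
        self.log = []              # stands in for the message log

    def pv_update(self, severity):
        self.current = severity
        if SEVERITY_ORDER[severity] > SEVERITY_ORDER[self.latched]:
            # Only a higher severity re-latches and annunciates
            self.latched = severity
            self.acknowledged = False
            self.log.append(f"latch {severity}")
            self.log.append(f"annunciate {severity}")
        self._maybe_clear()

    def acknowledge(self):
        if self.latched != "OK":
            self.acknowledged = True
            self.log.append("ack")
        self._maybe_clear()

    def _maybe_clear(self):
        # Clear only when the PV is OK again AND the alarm was acknowledged
        if self.latched != "OK" and self.current == "OK" and self.acknowledged:
            self.latched = "OK"
            self.acknowledged = False
            self.log.append("all ok")


alarm = LatchedAlarm()
for severity in ("MAJOR", "OK", "MAJOR"):   # 1. triggers, clears, triggers again
    alarm.pv_update(severity)               # 2./3. latched and annunciated once
alarm.pv_update("OK")                       # 4. problem fixed
alarm.acknowledge()                         # 5. ack'ed by operator, 6. all OK
print(alarm.log)
```

Because the alarm is latched, the repeated trigger in step 1 produces one log entry and one annunciation, not two.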


All Sorts of Web Reports


Technical View

[Diagram: IOCs send PV Updates (Channel Access, …) to the Alarm Server (Current Alarms: Acknowledged? Transient? Annunciated?), which is backed by the Alarm Cfg & State RDB. JMS topics: ALARM_SERVER, ALARM_CLIENT, TALK, LOG, carrying Alarm Updates, Ack’ and Config Updates, Annunciations, and Log Messages. JMS listeners: JMS2Speech, JMS2RDB. Message RDB feeds Tomcat Reports. CSS Applications: Alarm Client GUI.]


General Alarm Server Behavior

• Latch highest severity, or non-latching
  – like ALH “ack. transient”
• Annunciate
• Chatter filter à la ALH
  – Alarm only if severity persists some minimum time
  – .. or alarm happens >=N times within period
• Optional formula-based alarm enablement:
  – Enable if “(pv_x > 5 && pv_y < 7) || pv_z==1”
  – … but we prefer to move that logic into IOC
• When acknowledging MAJOR alarm, subsequent MINOR alarms not annunciated
  – ALH would again blink/require ack’
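The chatter filter in the list above can be sketched like this. It is an illustration under assumptions, not the Alarm Server's implementation: the class and parameter names are hypothetical, times are plain seconds supplied by the caller, and persistence is only noticed when `update` is called again (a real server would use a timer).

```python
from collections import deque


class ChatterFilter:
    """Hypothetical filter: pass an alarm on only if it persists for `delay`
    seconds, or if it triggered at least `count` times within `period` seconds."""

    def __init__(self, delay, count, period):
        self.delay = delay
        self.count = count
        self.period = period
        self.active_since = None   # start time of the current raw alarm
        self.triggers = deque()    # times of recent OK -> alarm transitions

    def update(self, in_alarm, now):
        """Feed the raw alarm state at time `now`; return the filtered state."""
        if not in_alarm:
            self.active_since = None
            return False
        if self.active_since is None:
            # New OK -> alarm transition
            self.active_since = now
            self.triggers.append(now)
        # Forget transitions that are older than the counting period
        while self.triggers and now - self.triggers[0] > self.period:
            self.triggers.popleft()
        persisted = now - self.active_since >= self.delay
        chattering = self.count > 0 and len(self.triggers) >= self.count
        return persisted or chattering


# A brief glitch is suppressed, a persistent alarm gets through
f = ChatterFilter(delay=5, count=3, period=60)
print(f.update(True, 0), f.update(True, 4), f.update(True, 5))
```

The count-based path catches a PV that flickers in and out of alarm and would otherwise never persist long enough to pass the delay.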


Best Ever Alarm System Tools, Indeed

.. but Tools are only half the issue

Good configuration requires plan & follow-up.

B. Hollifield, E. Habibi, "Alarm Management: Seven (??) Effective Methods for Optimum Performance", ISA, 2007


Alarm Philosophy

Goal:

Help operators take correct actions

– Alarms with guidance, related displays
– Manageable alarm rate (<150/day)
– Operators will respond to every alarm (corollary to manageable rate)


What’s a valid alarm?

DOES IT REQUIRE IMMEDIATE OPERATOR ACTION?
– What action? Alarm guidance!
– Not “make elog entry”, “tell next shift”, …
– Consider consequence of no action

Is it the best alarm?
– Would other subsystems, with better PVs, alarm at the same time?


How are alarms added?

• Alarm triggers: PVs on IOCs
  – But more than just setting HIGH, HIHI, HSV, HHSV
  – HYST is good idea
  – Dynamic limits, enable based on machine state, ...
  Requires thought, communication, documentation
• Added to alarm server with
  – Guidance: How to respond
  – Related screen: Reason for alarm (limits, …), link to screens mentioned in guidance
  – Link to rationalization info (wiki)


Impact/Consequence Grid

Category | So What | Minor Consequence | Major Consequence
Personnel Safety | PPS independent from EPICS? | |
Environment, Public | | Can EPICS cause contained spill of mercury? | Uncontained spill??
Cost: Beam Production, Downtime, Beam Quality | No effect; Beam off < 1 sec? | Beam off <10 min; <$10000 | Beam off >10 min; >$10000

• Mostly: How long will beam be off?


.. combined with Response Time

Time to Respond | Minor Consequence | Major Consequence
>30 minutes | NO_ALARM | MINOR
10..30 minutes | MINOR | MAJOR
<10 minutes | MAJOR | MAJOR + Annunciate

– This part is still evolving…
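The response-time grid above is effectively a lookup table, which a short function makes explicit. This is only a sketch: the function name and return shape are hypothetical, and treating exactly 10 and exactly 30 minutes as part of the "10..30 minutes" row is an assumption, since the slide does not say where the boundaries fall.

```python
def alarm_severity(minutes_to_respond, major_consequence):
    """Return (severity, annunciate) following the grid on this slide.

    Assumption: 10 and 30 minutes both belong to the "10..30 minutes" row.
    """
    if minutes_to_respond > 30:
        minor, major = "NO_ALARM", "MINOR"
    elif minutes_to_respond >= 10:
        minor, major = "MINOR", "MAJOR"
    else:
        minor, major = "MAJOR", "MAJOR"
    severity = major if major_consequence else minor
    # Only the "<10 minutes, major consequence" cell is annunciated
    annunciate = minutes_to_respond < 10 and major_consequence
    return severity, annunciate


# Example: 10 minutes to prevent an interlock trip with a major consequence
print(alarm_severity(10, major_consequence=True))
```

Writing the grid down this way also makes it easy to review each new alarm request against the same rule.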


Example: Elevated Temp/Press/Res.Err./…

• Immediate action required?
  – Do something to prevent interlock trip
• Impact, Consequence?
  – Beam off: Reset & OK, 5 minutes?
  – Cryo cold box trip: Off for a day?
• Time to respond?
  – 10 minutes to prevent interlock?

• MINOR? MAJOR?

• Guidance: “Open Valve 47 a bit, …”

• Related Displays: Screen that shows Temp, Valve, …


“Safety System” Alarms

Protection Systems not per se high priority
– Action is required, but we’re safe for now, it won’t get worse if we wait

Pick One
“Mommy, I need to gooo!”
“Mommy, I went”

(Does it require operator action? How much time is there?)


Avoid Multiple Alarm Levels


Bad Example: Old SNS ‘MEBT’ Alarms

• Each amplifier trip: ≥ 3 ~identical alarms, no guidance
• Rethought w/ subsystem engineer, IOC programmer and operators: 1 better alarm


Alarms for Redundant Pumps


Alarm Generation: Redundant Pumps the wrong way

• Control System
  – Pump1 on/off status
  – Pump2 on/off status
• Simple Config setting: Pump Off => Alarm:
  – It’s normal for the ‘backup’ to be off
  – Both running is usually bad as well
    • Except during tests or switchover
  – During maintenance, both can be off


Redundant Pumps

• Control System
  – Pump1 on/off status
  – Pump2 on/off status
  – Number of running pumps
  – Configurable number of desired pumps
• Alarm System: Running == Desired?
  – … with delay to handle tests, switchover
• Same applies to devices that are only needed on-demand

Required Pumps: 1
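The "Running == Desired?" check with a delay can be sketched as below. The slides suggest doing this logic in the control system (IOC); the Python here only illustrates the idea, and the class name, parameters, and 30-second delay are hypothetical.

```python
class PumpCountAlarm:
    """Hypothetical check: alarm when the number of running pumps differs
    from the desired number for at least `delay` seconds."""

    def __init__(self, desired, delay):
        self.desired = desired        # configurable number of desired pumps
        self.delay = delay            # seconds to tolerate tests, switchover
        self.mismatch_since = None

    def update(self, pump_states, now):
        """pump_states: iterable of on/off booleans; now: time in seconds."""
        running = sum(1 for on in pump_states if on)
        if running == self.desired:
            self.mismatch_since = None
            return False
        if self.mismatch_since is None:
            self.mismatch_since = now
        return now - self.mismatch_since >= self.delay


alarm = PumpCountAlarm(desired=1, delay=30)
print(alarm.update([True, False], now=0))    # backup off: normal
print(alarm.update([True, True], now=10))    # switchover: both on briefly
print(alarm.update([False, True], now=20))   # switchover done
```

For maintenance, setting `desired` to 0 makes "both off" the normal state without disabling the alarm.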


Weekly Review: How Many? Top 10?


A lot of information available

How often did PV trigger?

For how long?

When?

Temporary issue? Or need HYST, alarm delay, fix to hardware?
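The questions above (how often, for how long) can be answered from logged state transitions. The following is a sketch, assuming a hypothetical event format of `(timestamp_seconds, pv, severity)` tuples sorted by time; it is not the schema of the SNS message RDB, and the PV names are made up. Alarms still active at the end of the data are counted but contribute no duration.

```python
from collections import defaultdict


def alarm_statistics(events):
    """Return {pv: (trigger_count, total_seconds_in_alarm)}.

    events: iterable of (timestamp_seconds, pv, severity), sorted by time.
    """
    count = defaultdict(int)
    duration = defaultdict(float)
    started = {}                      # pv -> time the current alarm began
    for time, pv, severity in events:
        if severity != "OK":
            if pv not in started:     # OK -> alarm transition
                started[pv] = time
                count[pv] += 1
        elif pv in started:           # alarm -> OK transition
            duration[pv] += time - started.pop(pv)
    return {pv: (count[pv], duration[pv]) for pv in count}


def top_alarms(stats, n=10):
    """The n PVs that triggered most often, most frequent first."""
    return sorted(stats.items(), key=lambda item: item[1][0], reverse=True)[:n]


events = [
    (0, "Demo:Temp", "MAJOR"), (10, "Demo:Temp", "OK"),
    (20, "Demo:Temp", "MINOR"), (25, "Demo:Temp", "OK"),
    (30, "Demo:Flow", "MAJOR"),
]
stats = alarm_statistics(events)
print(top_alarms(stats))
```

A PV with many triggers but little total time in alarm is a candidate for HYST or an alarm delay; one with long durations points more toward a hardware fix.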


Weekly Check: Stale, Forgotten?


What about the DESY Alarm System?

[Diagram components: IOC, Interconnection Server, JMS (topics LOG, ALARM), Filters, filtered alarms (Filt. Alrm), JMS2RDB, RDB, LDAP, CSS, Other.]

GUI: Similar to SNS GUI shown here

No Channel Access Monitor of selected alarm PVs!

IOCs push all alarms via new protocol into Interconn. Server.


Design Choices

• Similar alarm table and tree GUIs

• JMS for communication
  – slightly different messages, though
• DESY IOCs send all alarms, then filtered in AMS
  – DESY: All IOC alarms should show up in AMS, zero additional configuration
  – At SNS, how many of the 350000 PVs would send alarms? We want to make the addition of alarms simple, but not automatic, and encourage guidance, related displays.
• DESY/SNS: LDAP vs. RDB for configuration/state
  – Choice was based on available infrastructure.
• JMS Listeners
  – SNS: Logger, Annunciator
  – DESY: Logger, Send SMS, EMail, Voice Mail


Summary

• BEAST operational at SNS since Feb’09
  – DESY AMS is similar and has been operational for longer
• Pick either, but good configuration requires work in any case
  – Started with previous “annunciated” alarms
    • ~300, no guidance, no related displays
    • Now ~400, all with guidance, rel. displays, links to operational procedures
  – “Philosophy” helps decide what gets added and how
    • Immediate Operator Action? Consequence? Response Time?
  – Weekly review spots troubles and tries to improve configuration