Safety Systems Kelly Mahoney December 1, 2010 Remote Access Review


Safety Systems

Kelly Mahoney

December 1, 2010 

Remote Access Review

Safety Systems

• Personnel Safety Systems
  – Fully redundant PLC-based safety systems
  – Fail-safe
  – Separate from EPICS controls

• Machine Protection Systems
  – Special fail-safe hardware managed by EPICS controls
  – Critical inputs/functions have additional protection

Safety Systems – PSS Now

[Diagram labels: PLC A0 … PLC An-even; Modbus (RS232); MB+ (~1 Mb/s); TCP/IP; Bridge/Router; PSS HMI; PSS Program/Monitor; PSS Development; JLab IT Infrastructure Enclave; Controls Enclave]

Fully Redundant (1 of 2 systems shown)

The only communication with the PLCs is through proprietary HW/SW on dedicated machines.

PCs can be connected to the accelerator network for patches.

Safety Systems - PSS

PSS Remote Access:
  – On-site (e.g. office)
    • Only access is from dedicated machines, using special hardware/software (total of 4)
    • Additional password protection
    • PLCs in “Read Only” mode
  – Off-site (e.g. home)
    • No off-site access

Safety Systems - PSS

Engineering Solutions:
  – Segregation/Redundancy/Fail-safe
  – Multiple layers of protection
  – Division A/B handshake through hardware
  – No safety functions are performed through the network
  – HW memory protect
  – SW “Program” mode protection
  – Equipment racks padlocked
  – Working towards NIST SP800-82
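The redundancy and fail-safe principles listed above can be illustrated with a minimal sketch. The function name `beam_permit`, the modelling of each division's output as an optional boolean, and the rule that both divisions must agree are assumptions made for illustration only; the slides state that the real A/B handshake is done in hardware, not in software like this.

```python
from typing import Optional


def beam_permit(division_a: Optional[bool], division_b: Optional[bool]) -> bool:
    """Grant the permit only when both divisions positively report safe.

    None models a lost or invalid signal (hypothetical representation).
    Fail-safe here means a missing signal is treated like an unsafe one.
    """
    return division_a is True and division_b is True


# Both divisions healthy: permit granted.
print(beam_permit(True, True))
# One division silent (e.g. a cut cable): permit withheld.
print(beam_permit(True, None))
```

The point of the sketch is that the permit is the default-off state: any disagreement, fault, or loss of signal in either division removes it.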

Policy/Procedures

• Policy
  – PSS Config Control Policy (currently under revision)
  – JLab Safety Configuration Management Board
  – JRRP review for major changes
  – External technical panel review for major architecture changes, e.g. 12GeV

• Procedure
  – PSS communication network well documented and under config control
  – Work control/work authorization procedures
  – PSS Certification Procedures
    • Include specific steps for SW validation
    • Include a step to save a ‘gold’ copy of the software after certification
  – PSS is a line item in the ATLis work control system

Effective?

• PSS fiber-optic trunk cut during construction
  – Able to:
    • Initiate investigation/conduct forensics
    • Identify specific fibers/systems from documentation
    • Create temporary route, while maintaining config control
    • Use approved red-line markups for changes
    • Consult the SCMB on proposed changes
    • Create work-control documentation
    • Run temporary fiber trunks/terminate and QA
    • Re-certify affected systems
    …in < 11 hours after cut
  – PLC network drawings updated to reflect “temporary” route as new released revision level
  – Updated drawings again for permanent fix

PSS Change Process

Training

• All SSG personnel receive classroom training from the PLC mfg.
  – Augmented by specialty training from the mfg. in the form of DVDs, webinars, manuals, tutorials, etc.
• Additional training on JLab/SSG-specific implementation
• All engineers receive system safety training
• Typical training cycle is two years before personnel are considered competent in PSS policy/procedures
• Operators trained to spot suspicious/inconsistent behavior
  – Formal training
  – OJT
  – Written test includes “what if” questions
• SSG engineer is trained in safety network architecture
  – Will be taking test for ProfiNet Engineer certification in December 2010

Assurance

• Risk-based/graded approach
  – JLab-developed risk assessment method applies to all types of software-based systems
• Emphasis on competency of SSG staff
• Emphasis on requirements
  – System requirements
  – Logic specification
• Certification procedure directly derived from logic spec
• Logic spec, program, and certification procedures all track at the same rev level
• Working toward NASA software assurance model
  – NASA-STD-8739.8
  – NASA-GB-A201
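The requirement that the logic spec, program, and certification procedures track at the same rev level lends itself to a simple automated check. The sketch below is illustrative only: the function name, the artifact keys, and the revision labels are hypothetical and do not describe the actual JLab configuration-control tooling.

```python
from typing import Mapping


def same_rev_level(revisions: Mapping[str, str]) -> bool:
    """Return True when every listed artifact carries the same revision label.

    An empty mapping returns False, since there is nothing to confirm.
    """
    return len(set(revisions.values())) == 1


# Hypothetical artifact names and revision labels.
artifacts = {
    "logic_spec": "C",
    "program": "C",
    "certification_procedure": "B",  # lagging one revision behind
}
if not same_rev_level(artifacts):
    print("Revision mismatch:", artifacts)
```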

Performance Measures

• Very few with regard to remote access
  – No recorded incidents of attempted cyber intrusion

• If there were an event, it would be treated like an accident investigation, with the information recorded in the corrective action tracking system.

• SSG has culture of copious self-reporting. Problems are recorded for future reference. Information analyzed for trends.

Residual Risks

• Malware targeted at industrial controls, à la Stuxnet (transferred through USB memory sticks)

  Solution:
  – Isolation
  – Disable “autorun”
  – Keep antivirus software current
  – Scan portable memory
  – Dedicated memory devices for PSS software data/license transfer
  – Stay connected to DHS CERT, manufacturer bulletins, and other sources of information on potential threats
  – Dedicated program/HMI PCs

• Access by unauthorized JLab “insider”
  – Separate authentication in addition to JLab/Controls Dept. requirements at both the PC and PLC level
  – Multiple layers
  – Independent/redundant systems
  – Locked racks
  – Least privileges
  – On-line program periodically compared to “gold” copy downloaded after last certification
  – PM program includes inspection of system racks
  – All PSS systems and infrastructure are labeled
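The periodic comparison of the on-line program against the “gold” copy can be sketched as a file-digest check. This assumes the on-line program can be exported to a file with the vendor's tools; the function names and file paths are hypothetical, and the slides do not say how the comparison is actually performed.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_gold(online_export: Path, gold_copy: Path) -> bool:
    """True when the exported program has the same digest as the gold copy."""
    return file_digest(online_export) == file_digest(gold_copy)
```

A mismatch would not say what changed, only that the running program differs from the certified one, which is enough to trigger an investigation.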

Residual Risks

• Restoration after loss of HW/SW
  – Active PCs have RAID redundant hard drives
  – Periodically perform image of HD
  – Development PC is duplicate of control room display PC
  – SW backup on JLab-managed file system
  – Hardcopy printout

• Obsolescence/Compatibility
  – Dedicated test stands for burn-in/test of spares and new/refurbished HW/SW
  – Much of the existing I/O is obsolete
  – Attrition plan is to transfer to new safety PLC model
  – Re$ource limited

• Programming mistake
  – Two independent programmers
  – PLC addressing scheme segregates divisions (A=Even, B=Odd)
  – PLC address hardcoded into PLC program
  – Version check at installation/test
  – Spread responsibilities for requirements/test procedures among personnel

• Induced trip, e.g. DNS, network storm, IT sniffing, …
  – Isolation from controls network
  – Bandwidth limits
  – Communication time allocation limits
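The division addressing scheme listed above (A=Even, B=Odd) makes it easy to catch a program loaded into the wrong division. The following is a minimal sketch; the function names are hypothetical, and treating the PLC address as a plain non-negative integer is an assumption.

```python
def division_for_address(plc_address: int) -> str:
    """Map a PLC address to its division: even is A, odd is B."""
    if plc_address < 0:
        raise ValueError("PLC address must be non-negative")
    return "A" if plc_address % 2 == 0 else "B"


def address_matches_division(hardcoded_address: int, intended_division: str) -> bool:
    """Check that the address hardcoded in a program belongs to the intended division."""
    return division_for_address(hardcoded_address) == intended_division


# A Division B program carrying an even address would be flagged.
print(address_matches_division(4, "B"))
```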

Safety Systems - MPS

• EPICS based – same access as for other controls

• Inputs may be masked to support multiple machine operating modes and configurations.

• Engineering controls
  – Hardware mask disable for always-critical inputs (always active)
  – Additional software controls for mask enable for critical inputs that require masking from time to time
  – Fail-safe design deters tampering
  – Control room alarms for important run configuration items
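The masking rules above can be sketched in a few lines. This is an illustrative model only: the class, field, and input names are hypothetical, and the real MPS implements masking in fail-safe hardware managed by EPICS rather than in Python.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MpsInput:
    name: str
    faulted: bool
    always_critical: bool = False  # hardware mask disabled; input is always active
    mask_requested: bool = False   # operating mode asks for this input to be masked
    mask_enabled: bool = False     # additional software control permitting the mask


def is_masked(inp: MpsInput) -> bool:
    """An always-critical input can never be masked; others need both flags."""
    if inp.always_critical:
        return False
    return inp.mask_requested and inp.mask_enabled


def trip_required(inputs: Iterable[MpsInput]) -> bool:
    """Trip when any unmasked input reports a fault."""
    return any(inp.faulted and not is_masked(inp) for inp in inputs)


# Hypothetical input names for illustration.
inputs = [
    MpsInput("vacuum_valve", faulted=True, mask_requested=True, mask_enabled=True),
    MpsInput("beam_loss_monitor", faulted=False, always_critical=True),
]
print(trip_required(inputs))
```

In this model a mask request on an always-critical input is simply ignored, which mirrors the slide's statement that those inputs are always active.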

Safety Systems - MPS

Safety Systems

• Future Plans
  – 12GeV Hall D
    • Dedicated/certified safety PLCs
    • Redundant ProfiNet running ProfiSafe protocol
    • Dedicated PSS industrial firewall and managed switches
    • Access through firewall requires manufacturer’s certificate
    • Safety PLCs are “network aware”
  – Work to SEI CMMI model for better metrics

12GeV Hall D Safety Systems

[Diagram labels: IT Router; PSS Firewall; Managed Switch (×5); Controls Domain; Safety Domain; Safety PLCs and I/O; Potential DMZ; self-healing ring running the ProfiSafe protocol on ProfiNet]

PCs must have a certificate from the PLC mfg. to communicate through the firewall.

Note: System is redundant after first switch. Only one division shown for clarity.

Safety Systems

• Future Plans
  – Upgrade to safety PLCs using industrial Ethernet
  – Two-level authentication for all PCs with access to safety PLCs
  – Full implementation of suite of complementary standards
  – Slow and deliberate move to Windows 7 as support software is certified for use with this OS; will start with test stand
  – Formal modeling of safety/security system properties

Safety Systems

• Comments?