Maximizing the HALO Detector’s Uptime Stéphane Venne – SNOLAB User’s Meeting 2016



Helium And Lead Observatory

• Operational since May 2012

• 79 tons of lead and 128 Helium-3 counters

• Low cost, low maintenance, long lifetime, dedicated supernova detector

• Supernovas occur approximately 3 times per century

• Crucial that HALO be operational all the time, to not miss the next supernova

HALO – Maximizing Uptime - 2016


Current live time plot


Causes of downtime


Current live time fraction: ~95%

Cause and solution:

• Cause: Active detector work and maintenance. Solution: Testing on Laurentian spare equipment. Should decrease now that most systems are in place.
• Cause: Power outages. Solution: UPS, automatic shutdown and startup of detector.
• Cause: Hardware malfunction. Solution: Redundant hardware and automatic sentry.

Maximize the uptime with automated systems and procedures. Reduces the response time.


Sentry – DAQ monitoring


Overview
• For redundancy: 2 DAQ computers
• Both running ORCA
• For consistency, configuration files and run scripts are stored on Dropbox
• Thus, consistent run number
• Primary and secondary DAQs

DAQ failure
• Primary is running, secondary is on standby
• If primary does not respond to pings, secondary takes over and raises alarm

Toggle scheduler
• Automatic transfer/switch at set interval to test system
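The failover rule above can be sketched as a small watchdog running on the secondary DAQ. This is a minimal illustration, not the Sentry code itself: the class name `StandbySentry`, the injected `ping` callable, and the `max_missed` threshold are all assumptions (the talk only says the secondary takes over when the primary stops answering pings).

```python
from typing import Callable


class StandbySentry:
    """Hypothetical secondary-DAQ watchdog: take over after repeated missed pings."""

    def __init__(self, ping: Callable[[], bool], max_missed: int = 3):
        self.ping = ping              # returns True if the primary answered
        self.max_missed = max_missed  # assumed threshold, not stated in the talk
        self.missed = 0               # consecutive pings without an answer
        self.active = False           # True once this machine has taken over

    def poll(self) -> bool:
        """Ping the primary once; return True if an alarm should be raised now."""
        if self.active:
            return False              # already taking data, nothing to decide
        if self.ping():
            self.missed = 0           # primary is healthy, reset the counter
            return False
        self.missed += 1
        if self.missed >= self.max_missed:
            self.active = True        # secondary takes over data taking
            return True               # caller raises the alarm
        return False
```

Requiring several consecutive misses is a design choice that avoids a takeover on a single dropped packet. A toggle scheduler could exercise the same path by forcing `ping` to fail at a set interval.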


Sentry – SBC monitoring


• ORCA will only start a run if all data taking hardware is present
• In HALO, these are the 2 SBCs
• If an SBC becomes unreachable, run in progress is stopped
• Redundant hardware – can operate half the detector

Full detector operation is preferable. Sentry checks if the removed SBCs are reachable at the end of a run. If they are, next run starts with both SBCs taking data.

Raises alarm
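The end-of-run check can be sketched as a function that picks the hardware for the next run. The function name `plan_next_run` and the SBC labels are hypothetical; the only behaviour taken from the slide is that a run uses whichever SBCs are reachable and that missing hardware raises an alarm.

```python
def plan_next_run(all_sbcs, reachable):
    """Return (sbcs_to_use, alarm) for the next run.

    all_sbcs  : every SBC of the full detector, in a fixed order
    reachable : set of SBCs that answered the end-of-run check
    """
    # Keep only the SBCs that can take data right now.
    usable = [sbc for sbc in all_sbcs if sbc in reachable]
    # Anything less than the full detector is worth an alarm.
    alarm = len(usable) < len(all_sbcs)
    return usable, alarm
```

With two SBCs, one usable entry corresponds to half-detector operation and an empty list means no run can start.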


Automatic power management


• Laboratory is subject to scheduled (and unscheduled) power outages
• Uninterruptible Power Supply (UPS) installed
• Can keep the detector operational for ~3h on battery
• If outage is longer than 3h, systems must be shut down gracefully
• Systems must be booted when power is restored

[Diagram: Network UPS Tools (NUT) and UPS]
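As an illustration of the shutdown decision, the sketch below parses the `key: value` lines that NUT's `upsc` command prints and decides what to do. How HALO actually uses NUT is not described in the talk, so the function names and the `min_runtime_s` margin are assumptions; `ups.status` (with flags such as `OL` for on line, `OB` for on battery, `LB` for low battery) and `battery.runtime` (in seconds) are standard NUT variables.

```python
def parse_upsc(text):
    """Parse 'key: value' lines, as printed by NUT's upsc, into a dict."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            values[key.strip()] = value.strip()
    return values


def power_action(values, min_runtime_s=600):
    """Return 'run', 'wait' or 'shutdown' (min_runtime_s is an assumed margin)."""
    flags = values.get("ups.status", "").split()
    if "OL" in flags:
        return "run"       # mains power present
    runtime = float(values.get("battery.runtime", 0))
    if "LB" in flags or runtime < min_runtime_s:
        return "shutdown"  # battery nearly exhausted: shut down gracefully
    return "wait"          # on battery with margin left: keep taking data
```

If the status is missing entirely, the function falls through to `shutdown`, which errs on the safe side.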


Power outage

[Figure: power outage sequence marked with the numbers 1 to 4]

Numbers in red indicate possible times at which power could be restored. Must be ready for every scenario.


Power restoration

• Can start full or half detector, depending on available hardware
• Can restart SBCs if necessary
• Sends email to report the process activity
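The restoration steps above can be sketched as follows. This is an assumed structure, not the actual startup script: `restore_detector`, `build_report`, the injected `is_reachable` and `restart` callables, and the example addresses are all hypothetical. The report is built with the standard library's `email.message.EmailMessage` and is not sent here.

```python
from email.message import EmailMessage


def restore_detector(sbcs, is_reachable, restart):
    """Bring up as much of the detector as possible; return (mode, log)."""
    log, ready = [], []
    for sbc in sbcs:
        if not is_reachable(sbc):
            log.append(f"{sbc} unreachable, restarting")
            restart(sbc)  # one restart attempt per SBC
        if is_reachable(sbc):
            ready.append(sbc)
            log.append(f"{sbc} ready")
        else:
            log.append(f"{sbc} still unreachable")
    if len(ready) == len(sbcs):
        mode = "full"
    elif ready:
        mode = "half"     # with 2 SBCs, one of them means half the detector
    else:
        mode = "none"
    log.append(f"starting detector in mode: {mode}")
    return mode, log


def build_report(mode, log, sender, recipient):
    """Build (but do not send) the email reporting what the process did."""
    msg = EmailMessage()
    msg["Subject"] = f"HALO power restoration: {mode} detector"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(log))
    return msg
```

Passing the reachability check and the restart action in as callables keeps the decision logic testable without touching real hardware.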


Next steps…


• Sentry has been running since July
• A minor bug needs to be fixed
• Automatic power management system has recently been installed
• Needs to be tested a few more times
• Needs to be tested in a real power outage
• Add statistics recording – how much time we gain with automated system
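One possible shape for the statistics recording is a live time fraction computed from recorded run intervals. The talk does not specify how statistics would be kept, so `live_fraction` and its input format (a list of `(start, end)` times in seconds) are assumptions.

```python
def live_fraction(runs, period_start, period_end):
    """Fraction of [period_start, period_end] covered by at least one run.

    runs is a list of (start, end) times in seconds. Overlapping runs are
    merged so that no stretch of time is counted twice.
    """
    live = 0.0
    covered_until = period_start  # everything before this is already counted
    for start, end in sorted(runs):
        start = max(start, covered_until)  # clip to the uncounted part
        end = min(end, period_end)         # clip to the reporting period
        if end > start:
            live += end - start
            covered_until = end
    return live / (period_end - period_start)
```

Comparing this number before and after the automated systems were enabled would show how much time they gain. For scale, a 95% live fraction corresponds to roughly 18 days of downtime per year.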


Any questions?

Thank you!
