Maximizing the HALO Detector’s Uptime
Stéphane Venne – SNOLAB User’s Meeting 2016
Helium And Lead Observatory
• Operational since May 2012
• 79 tons of lead and 128 Helium-3 counters
• Low cost, low maintenance, long lifetime, dedicated supernova detector
• Supernovas occur approximately 3 times per century
• Crucial that HALO be operational all the time, to not miss the next supernova
HALO – Maximizing Uptime - 2016
Current live time plot
Causes of downtime
Current live time fraction: ~95%
Cause | Solution
Active detector work and maintenance | Testing on Laurentian spare equipment. Should decrease now that most systems are in place.
Power outages | UPS, automatic shutdown and startup of detector
Hardware malfunction | Redundant hardware and automatic sentry
Maximize the uptime with automated systems and procedures
Reduces the response time
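To put the ~95% live time fraction in perspective, the arithmetic can be sketched as below. The function names are illustrative, and the calculation assumes the fraction is simply live time divided by elapsed time.

```python
def live_time_fraction(live_seconds: float, total_seconds: float) -> float:
    """Fraction of elapsed time during which the detector was taking data."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return live_seconds / total_seconds


def downtime_days_per_year(fraction: float, days_per_year: float = 365.0) -> float:
    """Days per year the detector is off for a given live time fraction."""
    return (1.0 - fraction) * days_per_year


# A 95% live time fraction corresponds to roughly 18 days of downtime per year.
print(round(downtime_days_per_year(0.95), 2))
```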
Sentry – DAQ monitoring
Overview
• For redundancy: 2 DAQ computers
• Both running ORCA
• For consistency, configuration files and run scripts are stored on Dropbox
• Thus, consistent run number
• Primary and secondary DAQs
DAQ failure
• Primary is running, secondary is on standby
• If primary does not respond to pings, secondary takes over and raises alarm
Toggle scheduler
• Automatic transfer/switch at set interval to test system
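The DAQ failure rule above can be sketched as a small decision function. This is not the sentry's actual code: the `ping` callable, the number of attempts, and the action names are assumptions made for illustration.

```python
from typing import Callable, List


def secondary_should_take_over(ping: Callable[[], bool], attempts: int = 3) -> bool:
    """Return True only if the primary fails every one of `attempts` pings.

    `ping` is a hypothetical callable that returns True when the primary
    DAQ answers. Requiring several failures is an assumption, chosen so
    that a single dropped packet does not trigger a takeover.
    """
    for _ in range(attempts):
        if ping():
            return False
    return True


def sentry_step(ping: Callable[[], bool], attempts: int = 3) -> List[str]:
    """Actions the secondary takes in one monitoring cycle."""
    if secondary_should_take_over(ping, attempts):
        return ["start_run_on_secondary", "raise_alarm"]
    return []


print(sentry_step(lambda: False))  # primary never answers
```

Passing the ping as a callable keeps the decision logic testable without a network.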
Sentry – SBC monitoring
• ORCA will only start a run if all data-taking hardware is present
• In HALO, these are the 2 SBCs
• If an SBC becomes unreachable, the run in progress is stopped
• Redundant hardware: can operate half the detector
Full detector operation is preferable.
• At the end of a run, the sentry checks whether the removed SBCs are reachable
• If they are, the next run starts with both SBCs taking data
• Raises alarm
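The choice of hardware for the next run can be sketched as follows. The SBC names are hypothetical, and raising the alarm whenever fewer than all SBCs are reachable is an assumption about when the alarm fires.

```python
from typing import Dict, List, Tuple


def select_sbcs(reachable: Dict[str, bool]) -> Tuple[List[str], bool]:
    """Pick the SBCs for the next run and decide whether to raise an alarm.

    `reachable` maps an SBC name to the result of a reachability check
    made at the end of the previous run. Returns (SBCs to use, alarm).
    """
    active = [name for name, ok in reachable.items() if ok]
    alarm = len(active) < len(reachable)  # assumed: alarm while running degraded
    return active, alarm


# One SBC down: run half the detector and raise the alarm.
print(select_sbcs({"sbc1": True, "sbc2": False}))
# Both back: the next run uses the full detector.
print(select_sbcs({"sbc1": True, "sbc2": True}))
```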
Automatic power management
• Laboratory is subject to scheduled (and unscheduled) power outages
• Uninterruptible Power Supply (UPS) installed
• Can keep the detector operational for ~3h on battery
• If the outage lasts longer than 3h, systems must be shut down gracefully
• Systems must be booted when power is restored
[Diagram labels: Network UPS Tools (NUT), UPS]
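The shutdown rule can be sketched as a pure decision function. It takes the flags NUT reports in its `ups.status` variable (`OL` on line, `OB` on battery, `LB` low battery). The 30-minute safety margin and the action names are assumptions, not the deployed policy.

```python
def power_action(
    ups_status: str,
    seconds_on_battery: float,
    runtime_s: float = 3 * 3600,  # ~3h of battery, from the slide
    margin_s: float = 30 * 60,    # assumed margin to finish a graceful shutdown
) -> str:
    """Return 'run', 'wait', or 'shutdown' for the current UPS state."""
    flags = ups_status.split()
    if "OB" not in flags:
        return "run"       # on line power: keep taking data
    if "LB" in flags or seconds_on_battery >= runtime_s - margin_s:
        return "shutdown"  # battery nearly exhausted: shut down gracefully
    return "wait"          # on battery, but with runtime to spare


print(power_action("OB", 600))
print(power_action("OB LB", 60))
```

Keeping the policy separate from the code that queries the UPS makes each outage scenario easy to test without cutting power.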
Power outage
[Figure: outage timeline with four points, numbered 1 to 4 in red]
Numbers in red indicate possible times at which power could be restored. The system must be ready for every scenario.
Power restoration
• Can start full or half detector, depending on available hardware
• Can restart SBCs if necessary
• Sends email to report the process activity
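The restoration steps can be sketched with the Python standard library. The SBC names, the email addresses, and the report wording are hypothetical. The message is only built here, not sent.

```python
from email.message import EmailMessage
from typing import Dict


def startup_mode(sbc_reachable: Dict[str, bool]) -> str:
    """Return 'full', 'half', or 'none' depending on which SBCs respond."""
    if not sbc_reachable:
        raise ValueError("no SBCs configured")
    up = sum(sbc_reachable.values())
    if up == len(sbc_reachable):
        return "full"
    return "half" if up > 0 else "none"


def build_report(
    mode: str,
    sbc_reachable: Dict[str, bool],
    sender: str = "halo-daq@example.org",       # hypothetical address
    recipient: str = "halo-shift@example.org",  # hypothetical address
) -> EmailMessage:
    """Build the email that reports what the restoration process did."""
    msg = EmailMessage()
    msg["Subject"] = f"HALO power restoration: {mode} detector"
    msg["From"] = sender
    msg["To"] = recipient
    lines = [
        f"{name}: {'reachable' if ok else 'unreachable'}"
        for name, ok in sbc_reachable.items()
    ]
    msg.set_content("\n".join(lines))
    return msg


state = {"sbc1": True, "sbc2": False}
report = build_report(startup_mode(state), state)
print(report["Subject"])
```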
Next steps…
• Sentry has been running since July
• A minor bug needs to be fixed
• Automatic power management system has recently been installed
• Needs to be tested a few more times
• Needs to be tested in a real power outage
• Add statistics recording: how much time we gain with the automated system
Any questions?
Thank you!