Resilient Notifications By Sean Falzon


Post on 13-Apr-2017


About me

What is this about?

• Why resilient notifications?

– Notifications are IMPORTANT

• They can also be ANNOYING

– Monitoring a system which you rely on for communications is good

• Relying on that system to notify you when you have a problem is probably not best practice

Sending a notification when something happens is natural behavior

From birth

Humans have always been trying to find better ways to send notifications

Most popular notifications

• E-mail

• SMS

My favorite ways to notify

• Email

• SMS

• Voice

• Service desk tickets

Pros of SMS

• Simple to send

• Web/email gateways

• SMS Modem / connected phone

Cons of SMS

• No way to know if it was received

• No way to tell if it was read

• Cannot be automatically forwarded by the phone network

So what's the answer?

• There are commercial products out there

– Some can be costly

– Free products like Pushbullet and Telegram are good, but may not suit because they depend on internet connectivity and, as free services, come with no SLA

– Apps like aNag are good but have downsides: they need internet / VPN connectivity, and the app sometimes has issues

I love voice

• You can forward a mobile phone / IP phone

• You can listen to a voice message in a car (well in Australia you can)

• You can acknowledge with voice or keypad

Voice network diagram

Voice is resilient

• Multiple pathways

– ISDN

– SIP

– PSTN

Voice is resilient

• Forwarding of calls

– On-call rosters are hard to manage

– Things change

– Forwarding a phone is easy!

Voice is resilient

• Acknowledging voice calls is easy

– Keypad acknowledgement

– Voice acknowledgement

How does it work?

[Flow diagram. Boxes: Create Asterisk call file & synth voice; Call speedcall; Call support person; Call 2nd person or manager]
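The "create Asterisk call file" step can be sketched roughly as below. This is a minimal illustration, not the presenter's script: the channel `Local/8001@speedcall`, the context `nagios-notify`, the variable name `NAGIOS_MESSAGE` and both helper functions are assumptions made for the example. It relies only on the standard Asterisk call file keys (`Channel`, `MaxRetries`, `RetryTime`, `WaitTime`, `Context`, `Extension`, `Priority`, `Setvar`) and on the usual advice to write the file elsewhere and then move it into the outgoing spool (commonly `/var/spool/asterisk/outgoing`). The dialplan entered after the call is answered would be responsible for speaking the message.

```python
import os
import tempfile


def build_call_file(channel, context, extension, message,
                    max_retries=2, retry_time=60, wait_time=30):
    """Return call file text: dial `channel`, then enter the dialplan at context/extension."""
    # Collapse whitespace so a multi-line alert cannot break the key: value format.
    safe_message = " ".join(message.split())
    lines = [
        f"Channel: {channel}",
        f"MaxRetries: {max_retries}",
        f"RetryTime: {retry_time}",
        f"WaitTime: {wait_time}",
        f"Context: {context}",
        f"Extension: {extension}",
        "Priority: 1",
        # Hypothetical variable name; the dialplan would read it for text-to-speech.
        f"Setvar: NAGIOS_MESSAGE={safe_message}",
    ]
    return "\n".join(lines) + "\n"


def spool_call_file(text, staging_dir, spool_dir):
    """Write the file in staging_dir, then move it into spool_dir in one step.

    staging_dir should be on the same filesystem as spool_dir so the move is a
    rename and Asterisk never picks up a half-written file.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=".call", dir=staging_dir)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    final_path = os.path.join(spool_dir, os.path.basename(tmp_path))
    os.replace(tmp_path, final_path)
    return final_path
```

A Nagios notification command could call something like `spool_call_file(build_call_file("Local/8001@speedcall", "nagios-notify", "s", "web01 is DOWN"), "/var/spool/asterisk/tmp", "/var/spool/asterisk/outgoing")`, with the paths and names adjusted to the local installation.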

Acknowledgement

• In my view it is best if you do not allow someone to acknowledge anything outside of the Nagios interface

Escalations

“Insanity is doing the same thing over and over again and expecting different results”

Attributed to Albert Einstein

Use multiple methods

• Combine multiple methods to achieve a solution that you KNOW will deliver the notification when you need it

Problem: on call rosters

• Created by people for people, so not predictable. If nobody ever swaps a shift, your employees are robots

Escalate notifications

• Nagios has built-in support for notification escalations

• Avoid using the same communications method for the escalation

• Can be used to escalate from one method to another for the same contact
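As a rough illustration of escalating from one method to another, the sketch below mimics how Nagios escalation definitions select a range of notification numbers through `first_notification` and `last_notification` (where `0` means no upper limit). The `Escalation` class, the `methods_for` helper and the email/SMS/voice ordering are assumptions made for the example, and it simplifies what Nagios itself does with contacts and contact groups.

```python
from dataclasses import dataclass


@dataclass
class Escalation:
    method: str
    first_notification: int
    last_notification: int  # 0 means "no upper limit", as in Nagios

    def applies_to(self, notification_number):
        """True when this escalation covers the given notification number."""
        if notification_number < self.first_notification:
            return False
        return (self.last_notification == 0
                or notification_number <= self.last_notification)


def methods_for(notification_number, escalations, default_method="email"):
    """Return the methods to use; fall back to the default when nothing matches."""
    matching = [e.method for e in escalations if e.applies_to(notification_number)]
    return matching or [default_method]


# Hypothetical policy: email first, then SMS, then voice calls until resolved.
ESCALATIONS = [
    Escalation("sms", first_notification=2, last_notification=3),
    Escalation("voice", first_notification=4, last_notification=0),
]

for number in (1, 2, 4):
    print(number, methods_for(number, ESCALATIONS))
```

Each step uses a different communications method, so a failure of the email path does not also silence the escalation.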

Problem: Excessive Notifications

• Operators ignore more notifications than they action

• Important notifications are missed

• Do not notify on unreachable
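In Nagios, not notifying on unreachable usually comes down to leaving `u` out of a host's `notification_options` (for hosts, `d` is DOWN, `u` is UNREACHABLE and `r` is recovery). The small sketch below only models that filter; the `should_notify` helper and the event names it accepts are assumptions for the example, not part of Nagios.

```python
# Flags as used in a Nagios host definition's notification_options directive.
HOST_EVENT_FLAGS = {"DOWN": "d", "UNREACHABLE": "u", "RECOVERY": "r"}


def should_notify(event, notification_options):
    """Return True when the event's flag is listed in the options string."""
    enabled = {opt.strip() for opt in notification_options.split(",")}
    return HOST_EVENT_FLAGS[event] in enabled


# With "d,r" a host behind a failed router stays quiet; only the router alerts.
print(should_notify("DOWN", "d,r"))
print(should_notify("UNREACHABLE", "d,r"))
```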

Solutions?

• Most successful solutions require a third-party web interface to select who is on call

– Works well

– Requires access to the web interface

My preferred way?

• Use the corporate phone system

– System speed calls allow you to divert numbers

– Remote hot desk allows changing diversions securely without a computer

– Phone system has redundancy / resiliency built in from its original design

Right tool for the job

• Pick the right notification type

– Do we want to be interrupted with a loud notification type for issues that can happen frequently but only matter if they are sustained?

– No, probably not
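One way to picture this choice is a small rule that keeps frequent, short-lived problems on a quiet channel and only becomes loud once a problem is sustained. The `pick_channel` function, the channel names and the 15-minute threshold are all hypothetical; in practice the same effect might be achieved with Nagios settings such as check retries or escalations.

```python
def pick_channel(critical, minutes_in_problem_state, sustained_after=15):
    """Choose how loudly to notify.

    critical: the problem needs attention immediately, however brief it is.
    minutes_in_problem_state: how long the problem has persisted so far.
    sustained_after: hypothetical threshold for treating a problem as sustained.
    """
    if critical:
        return "voice"
    if minutes_in_problem_state >= sustained_after:
        return "sms"
    # Brief, non-critical blips go to a channel that does not interrupt anyone.
    return "email"


print(pick_channel(critical=False, minutes_in_problem_state=2))
print(pick_channel(critical=False, minutes_in_problem_state=20))
print(pick_channel(critical=True, minutes_in_problem_state=0))
```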

Don’t rely just on “smart” phones

• If using SMS, voice, email or apps, it is easy to forget that the person who is on call is probably going to access all four on the same device.

Resources

Here’s what you do:

Read about notification escalations here http://tinyurl.com/qx68m65

Read about status and reachability here http://tinyurl.com/qfoyrue

Want scripts? Or want to share yours?

http://exchange.nagios.org/

Thank you!

Any Questions?

Some things shouldn’t be interrupted

Unless it's really important