moving beyond nagios next generation monitoring - … · nagios pain - configuration • nagios...

50
Next Generation Monitoring: Moving Beyond Nagios

Upload: buituyen

Post on 13-Sep-2018

247 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Next Generation Monitoring:Moving Beyond Nagios

Page 2: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Intro - Us

• who are we?• why do we care about this?

Page 3: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Intro - You

do you like your servers?

Page 4: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Nagios

• "It was here when I got here"• initially released 1999

Page 5: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

What this talk is about

• Nagios pains• Sensu <3

Page 6: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Sensu <3

• client: runs checks• server:

• maintains state• executes handlers• answers questions about state

Page 7: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Nagios pain - Configuration

• large central configuration• NRPE configs on server & client• Perl

Page 8: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Nagios pain - Configuration

• Nagios config in separate git repo• vast differences b/w dev, stages, & prod• chicken-and-egg w/ code changes• complexity grew w/ number of teams

Page 9: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Nagios Pain - Scaling

Weak scaling story• each check adds load• scheduled checks may not run

No HA• no good failover• can't share state

Page 10: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Nagios pain - Interfacethe interface is brutal

Page 11: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Nagios pain - Workflow

This one is mostly our fault.

Page 12: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Nagios pain - Workflow

• acknowledge and forget• downtime and forget• disable notifications and forget• disable checks and forget

Page 13: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Nagios pain - Workflow

We had a weekly email that would bug us about acknowledged or silenced alerts...

Page 14: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Nagios pain - Workflow

….no one read it.

Page 15: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Let’s talk about greener grass

Page 16: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Alternative Nagios strategies

generate Nagios config via puppet• passive checks?• centralized checks not associated with

nodes?

Page 17: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Sensu is compatible

• uses the Nagios plugin interface• existing Nagios checks work in Sensu

Page 18: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making
Page 19: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Scared?

“Sensu has so many moving parts that I wouldn’t be able to sleep at night unless I set up a Nagios instance to make sure they were all running.” -- @murphy_slaw

Page 20: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

How we use Sensu

• set up in HA clusters• individual components are HA• entirely configured in Puppet

Page 21: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Checks are just Puppet

• all of our checks run standalone• Puppet configures the checks that run on a

node• the Sensu server requires no reconfiguration

to add new checks

Page 22: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Checks sit next to functionality

Page 23: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making
Page 24: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Sensu + Puppet == <3

Page 25: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

How do MySQL and Sensu interact?

• host-based condition monitoring• not graphing

• monitor health of MySQL and its hardware

Page 26: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

What do we check?

Page 27: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

More Puppet <3

Page 28: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

More Puppet <3

Page 29: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

How do we work with Sensu?

Page 30: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Alerts vs. Events: Nagios

Nagios is normally used for everything:• scheduling• emailing• raising alerts• notification• escalation• acknowledgement

Page 31: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Alerts vs. Events: Sensu

Sensu does these things• runs checks• generates events• calls handlers

Page 32: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Disabling/Stashing checks

• stash checks/host• sensu-cli tools

• silence around maintenance/reboots• can query API for stashed alerts

• our stashed alerts come back

Page 33: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Stash example

Page 34: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Sensu notification handlers

our Sensu handlers can• announce in IRC• file a (JIRA) ticket• page via PagerDuty• send email

Page 35: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Sensu notification handlers

Page 36: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Less used notification handlers

We also have:• Graphite• aws_prune• OpsGenie

Page 37: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Users and events

Users interact with their notifications via:• PagerDuty

• IRC bot, mobile app, SMS, web UI

Can see overview via:• Uchiwa

Page 38: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Uchiwa

Page 39: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Event resolution

Sensu • announces in IRC• can close a ticket• PagerDuty• sends email

Page 40: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

motd

Page 41: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Are we totally dependent on Puppet?

Page 42: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Are we totally dependent on Puppet?

• Puppet runs hourly• We may want to change checks more

frequently• Problem? Certainly not

Page 43: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

(1) Smart checks

Page 44: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

(2) Dynamic thresholds

Page 45: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Symlinks to the rescue!

Page 46: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

(3) Symlinked checks

• not in use at Yelp• allows tweaking of:

• intervals• handlers

Page 47: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Making the switch

• compatibility helps• there are still differences• make the best use of customized check

routing• start non-alerting• gain confidence• add in alerting

Page 48: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Conclusion

• it's an adjustment• Sensu lets us do more with less• we still need to migrate some checks

Page 49: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

Questions?

Page 50: Moving Beyond Nagios Next Generation Monitoring - … · Nagios pain - Configuration • Nagios config in separate git repo • vast differences b/w dev, stages, & prod ... Making

@YelpEngineering

YelpEngineers

engineeringblog.yelp.com

github.com/yelp

Follow us! @hashbrowncipher & @jcsuperstar

Related GitHub Projects:github.com/yelp/sensu_handlersgithub.com/yelp/pysensu-yelp

Kyle’s (@solarkennedy) talk at SFDevOps:Part 1Part 2

Tom’s (@bobtfish) from Puppetconf 2014Sensu and Sensibility

Come see us at booth 402!