from zero to visibility

From Zero To Visibility Bridget Kromhout

Upload: bridgetkromhout

Post on 08-May-2015


DESCRIPTION

Monitorama Portland 2014, Portland, OR, 2014-05-05 to 2014-05-07

When I joined a startup already in progress as their first ops hire, what monitoring existed was a twisty maze of half-measures. The devteam dreaded on-call, and our Mean Time To Lost Sleep was way too low. Improving visibility into our infrastructure and application performance required trying new tools and changing how we thought about what we were measuring. Join me for a tragicomic journey from the vale of blissful ignorance through the straits of Nagios and into the mountains of Graphite. Thrill! to the victories. Cringe! at the rewards of hubris. Share! your own insights, because this tale never really ends.

TRANSCRIPT

Page 1: From Zero To Visibility

From Zero To Visibility

Bridget Kromhout

Page 2: From Zero To Visibility

8thbridge.com
small social commerce startup
acquired in the last month by Fluid, Inc.
small devteam
I am the ops team

http://www.thedirtbox.com/wp-content/uploads/2013/01/ping-pongart.jpg

Page 3: From Zero To Visibility

twisty maze of little shell scripts

http://www.pcgameshardware.de/screenshots/1280x1024/2007/07/CA01.jpg

Page 4: From Zero To Visibility

time-consuming to understand
difficult to modify
doesn’t scale

artisanal monitoring?!

http://shop.bespokebacon.com/images/bespoke-logo.final(3).png

Page 5: From Zero To Visibility

New Relic

pros:
nice graphs
application-level view
good error analysis

cons:
slow to update
many false-positive alerts
high prices (better now)

Page 6: From Zero To Visibility

motivating change

http://99designs.com/illustrations/contests/illustration-pagerduty-161025/entries

Page 7: From Zero To Visibility

as hideous as you remember

Page 8: From Zero To Visibility

“Horrendous interface”

“Well, it’s more “old” than anything else. At least everything is in the same place as you left it because it’s been the same since 1912.”

https://laur.ie/blog/2014/02/why-ill-be-letting-nagios-live-on-a-bit-longer-thank-you-very-much/

not alone!

Page 9: From Zero To Visibility

“Sensu has so many moving parts that I wouldn’t be able to sleep at night unless I set up a Nagios instance to make sure they were all running.”

who watches the RabbitMQ?

-- @murphy_slaw (via @lozzd)

http://images.sodahead.com/profiles/0/0/0/5/1/6/6/3/9/Watchmen-trademark-symbol-62141795529.jpeg
http://portertech.ca/images/2011-11-01/sensu-diagram.png

Page 10: From Zero To Visibility

hating on nagios: the middle years

Page 11: From Zero To Visibility

“hadoop does not suffer from a paucity of configuration options”

http://jaganesundar.wordpress.com/2011/12/05/installing-and-configuring-hadoop-0-20-205-using-it-rpm/

monitor all the ports?!

best way to monitor HBase:
hbck: the HBase consistency checker

nagios -> bash script -> parsing output of hbck

http://www.ymc.ch/en/how-to-monitor-hbase-health-by-nagios
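
The nagios -> script -> hbck chain above can be sketched roughly as follows. This is a minimal illustration in Python rather than the bash script the slide mentions, and it is not the script from the linked article. It assumes the hbck report contains a `Status: OK` or `Status: INCONSISTENT` line and an `N inconsistencies detected` line; the function name `check_hbck` is hypothetical. Running `hbase hbck` itself and capturing its output is left out.

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_hbck(output):
    """Map hbck report text to a (nagios_exit_code, message) pair.

    Assumption: the report has a "Status: <WORD>" line and, usually,
    an "<N> inconsistencies detected" line.
    """
    match = re.search(r"^Status:\s*(\w+)", output, re.MULTILINE)
    if match is None:
        # No recognizable report at all, e.g. hbck failed to connect.
        return UNKNOWN, "HBCK UNKNOWN - no Status line found"

    status = match.group(1)
    count = re.search(r"(\d+) inconsistencies detected", output)
    detail = f"{count.group(1)} inconsistencies" if count else "count not reported"

    if status == "OK":
        return OK, f"HBCK OK - {detail}"
    return CRITICAL, f"HBCK CRITICAL - status {status}, {detail}"
```

A wrapper would print the message and exit with the returned code, which is how Nagios reads a plugin's result.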

Page 12: From Zero To Visibility

http://modiinhub.com/wp-content/uploads/2014/02/logo-mongodb-tagline.png

Page 13: From Zero To Visibility
Page 14: From Zero To Visibility

“Cyber” monday: 1988 called; wants its word back.

Page 15: From Zero To Visibility

wow. such nosql. very webscale.

“a single write operation holds the lock exclusively, and no other read or write operations may share the lock.”

Page 16: From Zero To Visibility

“If it moves, we track it. Sometimes we’ll draw a graph of something that isn’t moving yet, just in case it decides to make a run for it.” Ian Malpass, Etsy

http://codeascraft.com/2011/02/15/measure-anything-measure-everything/

Page 17: From Zero To Visibility

the (former) state of our graphite & statsd

● Graphite 0.9.9
  ○ hand-rolled
  ○ over 2 years old
  ○ missing new features (Consolidate by!)

● StatsD was newish, but…
  ○ hand-rolled
  ○ running in a screen session
  ○ on a special snowflake box
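
For context on the missing feature: `consolidateBy()` is a Graphite render function that controls how datapoints are combined when a graph has more points than pixels. A minimal sketch of building a render URL that uses it follows. The host `graphite.example.com`, the metric name, and the helper `render_url` are assumptions for illustration only.

```python
from urllib.parse import urlencode


def render_url(base, metric, func="max", window="-1h"):
    """Build a Graphite /render URL that wraps a metric in consolidateBy().

    base and metric are placeholders; func is the consolidation
    function name, such as 'max', 'min', 'sum' or 'average'.
    """
    target = f"consolidateBy({metric},'{func}')"
    query = urlencode({"target": target, "from": window, "format": "json"})
    return f"{base}/render?{query}"


url = render_url("http://graphite.example.com", "stats.timers.checkout.upper")
```

Using `'max'` here keeps short spikes visible on a zoomed-out graph instead of averaging them away.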

Page 18: From Zero To Visibility

http://media-cache-ec0.pinimg.com/736x/68/c2/9d/68c29deb72bad94cd4e3c1aa0f3cdcd8.jpg

this is wrong tool. never use this.

Page 19: From Zero To Visibility

Community cookbooks?

● StatsD
  ○ https://github.com/librato/statsd-cookbook

● Graphite ones good, but…
  ○ focus on Apache (we use nginx)
  ○ we haven’t moved to Chef 11 (gasp!)

Page 20: From Zero To Visibility

when in doubt: tcpdump is your friend

http://blog.johngoulah.com/2012/10/looking-under-the-covers-of-statsd/
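
What tcpdump shows on the wire is short plain-text UDP datagrams of the form `name:value|type`, optionally followed by `|@sample_rate`. A minimal sketch of producing them follows. The metric names and the helpers `format_metric` and `send_metric` are hypothetical; port 8125 is the usual StatsD default and may differ on a given install.

```python
import socket


def format_metric(name, value, metric_type, sample_rate=None):
    """Format one StatsD line: "c" for counters, "ms" for timers, "g" for gauges."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Send one metric line as a single UDP datagram (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


counter = format_metric("checkout.completed", 1, "c")
timer = format_metric("checkout.render", 320, "ms", sample_rate=0.1)
```

Because the transport is UDP, a sender gets no error when nothing is listening, which is one reason watching the packets directly helps when metrics go missing.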

Page 21: From Zero To Visibility

carbon-aggravator (between 0.9.10 & 0.9.12)

# If set true, metric received will be forwarded to DESTINATIONS in addition to
# the output of the aggregation rules. If set false the carbon-aggregator will
# only ever send the output of aggregation.
FORWARD_ALL = True

Page 22: From Zero To Visibility

carbonate: A+++ would clone again

whisper-fill.py: backfill datapoints between whisper files
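
The backfill idea can be illustrated without whisper at all. The sketch below is not carbonate's implementation, which works on whisper archives on disk. It assumes two series of `(timestamp, value)` pairs with aligned timestamps and `None` marking a missing point; the function name `backfill` is hypothetical.

```python
def backfill(dst, src):
    """Return dst with missing (None) points filled from src.

    Points that already exist in dst are never overwritten, and a gap
    stays a gap when src has no value for that timestamp either.
    """
    src_by_ts = {ts: value for ts, value in src if value is not None}
    filled = []
    for ts, value in dst:
        if value is None:
            value = src_by_ts.get(ts)
        filled.append((ts, value))
    return filled


new_box = [(60, 1.0), (120, None), (180, None), (240, 4.0)]
old_box = [(60, 9.0), (120, 2.0), (180, None), (240, 9.0)]
merged = backfill(new_box, old_box)
```

Only the gap at timestamp 120 is filled here: the existing points at 60 and 240 are kept, and 180 stays empty because neither series has data for it.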

Page 23: From Zero To Visibility

life as a third wheel party

thresholds: because not every outage is abrupt

normal traffic

decision to turn off

decision to turn back on

accidental removal
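
A threshold check for a slow decline like the one in this graph might look like the sketch below. It compares the mean of the most recent samples against an expected baseline rather than testing for a hard zero. The ratios, window size, and the name `classify_traffic` are assumptions for illustration, not the thresholds used in the talk.

```python
def classify_traffic(samples, baseline, warn_ratio=0.7, crit_ratio=0.4, window=5):
    """Classify recent traffic relative to a baseline.

    samples: request counts per interval, oldest first.
    baseline: expected count per interval under normal traffic.
    """
    if not samples or baseline <= 0:
        return "unknown"
    recent = samples[-window:]
    ratio = (sum(recent) / len(recent)) / baseline
    if ratio < crit_ratio:
        return "critical"
    if ratio < warn_ratio:
        return "warning"
    return "ok"


normal = classify_traffic([100, 98, 102, 99, 101], baseline=100)
sagging = classify_traffic([80, 70, 60, 50, 40], baseline=100)
gone = classify_traffic([30, 20, 10, 5, 5], baseline=100)
```

Averaging over a window trades a little detection delay for fewer alerts on single noisy intervals.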

Page 24: From Zero To Visibility

open-source error reporting

Page 25: From Zero To Visibility

all the things

StatsD

Application-level error analysis

Alarms for autoscaling

Timers & counters

Log & host-level

Hadoop & HBase visualization

MongoDB

Graphs

Time-series data graphing

client-side plugins

Threshold-based alarms

Dashboard

external checks

Page 26: From Zero To Visibility

What’s next?

http://blog.xebia.fr/wp-content/uploads/2013/12/file-logstash-es-kibana.png

Page 27: From Zero To Visibility

what even is ideal monitoring solution

http://www.quickmeme.com/img/f5/f512ff9bee084263df5571d3c81388019dcb063173e1dbcfa2babac9274576b6.jpg

❏ finds real problems
❏ actionable alerting
❏ usable by all
❏ …?

Page 28: From Zero To Visibility

questions; comments; whatnot

Twitter: @bridgetkromhout
Email: [email protected]

In person: DevOps Days Minneapolis (devopsdays.org)