from zero to visibility


Bridget Kromhout

8thbridge.com
small social commerce startup
acquired in the last month by Fluid, Inc.
small dev team
I am the ops team

http://www.thedirtbox.com/wp-content/uploads/2013/01/ping-pongart.jpg

twisty maze of little shell scripts

http://www.pcgameshardware.de/screenshots/1280x1024/2007/07/CA01.jpg

time-consuming to understand
difficult to modify
doesn’t scale

artisanal monitoring?!

http://shop.bespokebacon.com/images/bespoke-logo.final(3).png

New Relic

pros:
nice graphs
application-level view
good error analysis

cons:
slow to update
many false-positive alerts
high prices (better now)

motivating change

http://99designs.com/illustrations/contests/illustration-pagerduty-161025/entries

as hideous as you remember

“Horrendous interface”

“Well, it’s more “old” than anything else. At least everything is in the same place as you left it because it’s been the same since 1912.”

https://laur.ie/blog/2014/02/why-ill-be-letting-nagios-live-on-a-bit-longer-thank-you-very-much/

not alone!

“Sensu has so many moving parts that I wouldn’t be able to sleep at night unless I set up a Nagios instance to make sure they were all running.”

who watches the RabbitMQ?

-- @murphy_slaw (via @lozzd)

http://images.sodahead.com/profiles/0/0/0/5/1/6/6/3/9/Watchmen-trademark-symbol-62141795529.jpeg

http://portertech.ca/images/2011-11-01/sensu-diagram.png

hating on nagios: the middle years

“hadoop does not suffer from a paucity of configuration options”

http://jaganesundar.wordpress.com/2011/12/05/installing-and-configuring-hadoop-0-20-205-using-it-rpm/

monitor all the ports?!

best way to monitor HBase:
hbck: the HBase consistency checker

nagios -> bash script -> parsing output of hbck

http://www.ymc.ch/en/how-to-monitor-hbase-health-by-nagios
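The nagios -> script -> hbck chain boils down to turning hbck's text into a Nagios exit code. A minimal sketch in Python rather than bash, assuming the captured output contains a "Status: OK" or "Status: INCONSISTENT" line and an "N inconsistencies detected" line; the function name and messages are hypothetical, and the exact hbck output should be checked against your HBase version.

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_hbck(output):
    """Map captured hbck text to a Nagios (exit_code, message) pair.

    Assumption: the text contains "Status: OK" / "Status: INCONSISTENT"
    and a line like "3 inconsistencies detected".
    """
    match = re.search(r"(\d+) inconsistencies detected", output)
    count = int(match.group(1)) if match else None

    if "Status: OK" in output and count in (None, 0):
        return OK, "HBASE OK - hbck reports no inconsistencies"
    if "Status: INCONSISTENT" in output or (count is not None and count > 0):
        detail = f"{count} inconsistencies" if count is not None else "inconsistent"
        return CRITICAL, f"HBASE CRITICAL - {detail}"
    # Anything we cannot parse is UNKNOWN, never silently OK.
    return UNKNOWN, "HBASE UNKNOWN - could not parse hbck output"
```

A wrapper would run hbck, pass its stdout to `check_hbck`, print the message, and exit with the returned code. Treating unparseable output as UNKNOWN keeps a changed output format from looking like a healthy cluster.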

http://modiinhub.com/wp-content/uploads/2014/02/logo-mongodb-tagline.png

“Cyber” monday: 1988 called; wants its word back.

wow. such nosql. very webscale.

“a single write operation holds the lock exclusively, and no other read or write operations may share the lock.”

“If it moves, we track it. Sometimes we’ll draw a graph of something that isn’t moving yet, just in case it decides to make a run for it.” Ian Malpass, Etsy

http://codeascraft.com/2011/02/15/measure-anything-measure-everything/

the (former) state of our graphite & statsd

● Graphite 0.9.9
  ○ hand-rolled
  ○ over 2 years old
  ○ missing new features (Consolidate by!)

● StatsD was newish, but…
  ○ hand-rolled
  ○ running in a screen session
  ○ on a special snowflake box

http://media-cache-ec0.pinimg.com/736x/68/c2/9d/68c29deb72bad94cd4e3c1aa0f3cdcd8.jpg

this is wrong tool. never use this.

Community cookbooks?

● StatsD
  ○ https://github.com/librato/statsd-cookbook

● Graphite ones good, but…
  ○ focus on Apache (we use nginx)
  ○ we haven’t moved to Chef 11 (gasp!)

when in doubt: tcpdump is your friend

http://blog.johngoulah.com/2012/10/looking-under-the-covers-of-statsd/
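What tcpdump shows on the wire is plain text: one `name:value|type` line per UDP datagram. A minimal sketch of a sender, assuming StatsD listens on its default UDP port 8125 on localhost; the helper names and the metric names in the usage note are hypothetical.

```python
import socket


def statsd_line(name, value, metric_type, sample_rate=None):
    """Build one StatsD payload: <name>:<value>|<type>[|@<rate>]."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; nothing confirms that StatsD received it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

For example, `send_metric(statsd_line("checkout.requests", 1, "c"))` increments a counter and `send_metric(statsd_line("checkout.latency", 320, "ms"))` records a timer. Because UDP is fire-and-forget, watching the port with something like `tcpdump -A udp port 8125` on the StatsD host is a quick way to confirm the lines arrive as expected.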



carbon-aggravator (between 0.9.10 & 0.9.12)

# If set true, metric received will be forwarded to
# DESTINATIONS in addition to
# the output of the aggregation rules. If set false
# the carbon-aggregator will
# only ever send the output of aggregation.
FORWARD_ALL = True

carbonate: A+++ would clone again

whisper-fill.pybackfill datapoints between whisper files
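The backfill idea can be sketched on plain lists: wherever the destination has a gap, take the source's value for the same timestamp, and never overwrite data the destination already has. This is only a model of the concept, not carbonate's implementation, which works on whisper files and their archives; the function name and data layout are assumptions.

```python
def backfill(dst_points, src_points):
    """Fill gaps (None values) in dst with src values at the same timestamp.

    Both arguments are lists of (timestamp, value) pairs.
    Existing dst values always win; gaps with no src data stay None.
    """
    src_by_ts = {ts: value for ts, value in src_points if value is not None}
    return [
        (ts, value if value is not None else src_by_ts.get(ts))
        for ts, value in dst_points
    ]
```

With `dst = [(60, 1.0), (120, None), (180, 3.0)]` and `src = [(60, 9.0), (120, 2.0)]`, only the gap at timestamp 120 is filled from the source.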

life as a third wheel party

thresholds: because not every outage is abrupt

[graph annotations: normal traffic; decision to turn off; decision to turn back on; accidental removal]
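A slow decline never trips an up/down check, so the check has to compare recent values against a floor. A minimal sketch, assuming the input is a list of recent datapoints for one metric (with None for missing points); the function name, window size, and threshold numbers are all invented for illustration.

```python
def threshold_state(values, warn_below, crit_below, window=5):
    """Classify the mean of the last `window` non-null datapoints."""
    recent = [v for v in values if v is not None][-window:]
    if not recent:
        # No data at all is its own problem, not a healthy state.
        return "UNKNOWN"
    mean = sum(recent) / len(recent)
    if mean < crit_below:
        return "CRITICAL"
    if mean < warn_below:
        return "WARNING"
    return "OK"
```

Averaging over a window trades a little alerting delay for fewer false positives from a single low datapoint. For traffic drifting down as `[100, 98, 95, 90, 82, 75, 66, 58, 49, 41]` with `warn_below=70` and `crit_below=40`, the last five points average 57.8, so the state is WARNING well before traffic reaches zero.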

open-source error reporting

all the things

StatsD

Application-level error analysis

Alarms for autoscaling

Timers & counters

Log & host-level

Hadoop & HBase visualization

MongoDB Graphs

Time-series data graphing

client-side plugins

Threshold-based alarms

Dashboard

external checks

What’s next?

http://blog.xebia.fr/wp-content/uploads/2013/12/file-logstash-es-kibana.png

what even is ideal monitoring solution

http://www.quickmeme.com/img/f5/f512ff9bee084263df5571d3c81388019dcb063173e1dbcfa2babac9274576b6.jpg

❏ finds real problems
❏ actionable alerting
❏ usable by all
❏ …?

questions; comments; whatnot

Twitter: @bridgetkromhout
Email: [email protected]

In person: DevOps Days Minneapolis (devopsdays.org)
