how to measure everything: a million metrics per second with minimal developer overhead - puppetco
DESCRIPTION
How to Measure Everything: A Million Metrics Per Second with Minimal Developer Overhead - Jos Boumans, Krux
TRANSCRIPT
HOW TO MEASURE EVERYTHING
A million metrics per second with minimal developer overhead
Jos Boumans - @jiboumans
http://www.imagemediapartners.com/Portals/20286/images/MeasuringTape-s.jpg
RIPE NCC
Engineering manager for RIPE Database
http://www.ripe.net/db
CANONICAL
http://lukeroberts.deviantart.com/art/Destroy-Ubuntu-93235775
Engineering manager for Ubuntu Server 10.04 & 10.10
http://www.ubuntu.com/business/server/overview
KRUX
VP of Operations & Infrastructure
http://www.krux.com/
SOME OF OUR CUSTOMERS
A LOT OF TRAFFIC
http://www.americapictures.net/buenos-aires-traffic-city-night-argentina.html
AVERAGE DATA EVENTS / SEC
[Chart, axis 0 to 140,000 events per second. Series: Twitter (new tweets), Wikipedia (page views), Facebook (messages sent), Krux (new data points)]
http://investor.fb.com/results.cfm http://www.statisticbrain.com/twitter-statistics/
http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm
MONTHLY UNIQUE USERS
[Chart, axis 0 to 2,000,000,000 monthly unique users]
http://reportcard.wmflabs.org/ http://www.statisticbrain.com/twitter-statistics/
http://newsroom.fb.com/company-info/
DATA IS EVERYTHING
Always know what's going on
http://perpetual-wonder.com/blog/wp-content/uploads/2012/09/Where-do-we-go-from-here.jpg
UNIQUE METRICS
Unique metrics received, per second
METRICS & VISUALIZATION
… and a little bit of monitoring
http://getfit101.files.wordpress.com/2012/04/visualization.jpg
VISUALIZATION MATTERS
Humans are good at patterns & shapes
http://1.bp.blogspot.com/-CO-8FK9bohE/T89rD8dTyEI/AAAAAAAAAEE/YUZ00v_filk/s1600/live_like_it_matters_by_mythirll-d3iqcxt.jpg
INSIGHT MATTERS
We consider it a core competence
http://yourselfseries.com/teens/files/2013/05/suicide_bonus_Insight_final.jpg
SHOW EVERYONE
And better yet, encourage people to add their own
http://www.kissimmee.org/ftp/KCC/events/views/images/crowd_cheer.jpg
THE BOTTOM LINE
KEY CHARACTERISTICS
… of our metrics collection
http://www.fullcirclefeedback.com.au/resources/wp-content/uploads/2014/01/Key-skills-and-characteristics-of-good-HR-leaders.jpg
WHAT TO VISUALIZE
Pick your operational KPIs
http://1.bp.blogspot.com/-nrB1A9hamEk/UVZui_JUG1I/AAAAAAAAAdI/zGqHuanZNVU/s1600/missed-opportunities.jpg
REQUEST & ERROR RATES
The baseline for everything else
WORST RESPONSE TIMES
Track the worst upper 95th & upper 99th across a cluster
TRACK EVENTS
Did a code change or batch job cause a change in behaviour?
CAPACITY / THRESHOLDS
How much traffic can your service sustain?
SINGLE SERVICE OVERVIEW
Create a single graph for every service
WHAT TO CAPTURE
Everything. No, really.
http://arkansasagnews.uark.edu/monarchs95.jpg
INFRASTRUCTURE
Everything needed to create, capture and act on a million metrics per second
http://discussamerica.org/remer-blog/images/Freeway_Interchange2.jpg
GRAPHITE, STATSD & COLLECTD
The Trifecta
COLLECTD
Open Source Monitoring Tool
https://collectd.org/ https://collectd.org/wiki/index.php/Plugin:StatsD
STATSD
Simple stats collector service
https://github.com/etsy/statsd http://codeascraft.com/2011/02/15/measure-anything-measure-everything/
https://wwwx.cs.unc.edu/~sparkst/howto/network_tuning.php
http://emps.exeter.ac.uk/media/universityofexeter/emps/eisa/exista-splash.jpg
STATSD NAMING SCHEME
    stats.               # to distinguish from events
    $environment.        # prod, dev, etc
    $cluster_name.       # api-ash, www-dub, etc
    $application.        # webapp, login, etc
    $metric_name_here.   # any key the app wants
    $hostname            # node the stat came from
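To make the naming scheme concrete, here is a minimal Python sketch that joins the six components into one dotted path. The function name `metric_name` and the sample values are made up for illustration; trimming the hostname to its first label mirrors the `split('.')[0]` in the StatsD configuration slide.

```python
def metric_name(environment, cluster, application, key, hostname):
    """Join the naming scheme's components into one dotted Graphite path.

    Hypothetical helper for illustration only; not part of StatsD or Graphite.
    """
    # Dots separate path levels, so keep only the first label of an FQDN.
    short_host = hostname.split(".")[0]
    parts = ["stats", environment, cluster, application, key, short_host]
    if any(not part for part in parts):
        raise ValueError("every component of the metric name must be non-empty")
    return ".".join(parts)


print(metric_name("prod", "api-ash", "webapp", "login.success", "web01.example.com"))
# prints: stats.prod.api-ash.webapp.login.success.web01
```

The application only has to pick `$application` and the key; in the configuration on the next slide the surrounding prefix and hostname suffix are added by StatsD itself.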
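STATSD CONFIGURATION
    {
      graphite: {
        // $env and $cluster_name are placeholders, filled in per deployment
        globalPrefix: "stats.$env.$cluster_name",
        // short hostname of the node running StatsD
        globalSuffix: require('os').hostname().split('.')[0],
        legacyNamespace: false,
      },
      percentThreshold: [ 95, 99 ],
      deleteIdleStats: true,
    }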
https://github.com/etsy/statsd/blob/master/exampleConfig.js
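The `percentThreshold: [ 95, 99 ]` setting is what produces the "upper 95th" and "upper 99th" values mentioned earlier. As a rough sketch of the idea (a nearest-rank approximation written for this transcript, not StatsD's own code), the upper value at a threshold is the largest timing left once the slowest few percent of samples in a flush interval are dropped:

```python
def upper_at(values, pct):
    """Largest value remaining after dropping the slowest (100 - pct)% of samples.

    Nearest-rank approximation for illustration; StatsD's exact rounding may differ.
    """
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # Number of samples that fall inside the threshold, at least one.
    keep = max(1, round(len(ordered) * pct / 100))
    return ordered[keep - 1]


timings_ms = list(range(1, 101))  # 100 samples: 1 ms .. 100 ms
print(upper_at(timings_ms, 95), upper_at(timings_ms, 99))
```

Unlike an average, this number is not dragged around by a handful of extreme outliers, yet it still reflects what the slow tail of requests experiences.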
GRAPHITE
Metric store & Graph UI
http://graphite.wikidot.com/ http://graphite.readthedocs.org/en/latest/
GRAPHITE SETUP
At least one Graphite server per data center
DATA RETENTION
    [default]
    pattern = .*
    priority = 110
    retentions = 10:6h,60:15d,600:5y
    xFilesFactor = 0
http://graphite.readthedocs.org/en/latest/config-carbon.html#storage-schemas-conf
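The retention string reads as "seconds per point : how long to keep it". A small sketch of the arithmetic, assuming Whisper stores 12 bytes per data point and treats a year as 365 days (both are assumptions stated here, not taken from the slides; file headers are ignored):

```python
# Seconds per unit suffix; a year is assumed to be 365 days.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "y": 365 * 86400}
BYTES_PER_POINT = 12  # assumption: 4-byte timestamp + 8-byte value


def parse_retentions(spec):
    """Turn '10:6h,60:15d' into a list of (seconds_per_point, points) pairs."""
    archives = []
    for archive in spec.split(","):
        step, duration = archive.strip().split(":")
        seconds = int(duration[:-1]) * UNIT_SECONDS[duration[-1]]
        archives.append((int(step), seconds // int(step)))
    return archives


archives = parse_retentions("10:6h,60:15d,600:5y")
total_points = sum(points for _, points in archives)
for step, points in archives:
    print(f"{step:>4}s resolution: {points:>7} points")
print(f"total: {total_points} points, about {total_points * BYTES_PER_POINT} bytes per metric")
```

Under those assumptions each metric preallocates 286,560 points, roughly 3.4 MB on disk, with the 5-year archive accounting for most of it.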
STANDARD AGGREGATIONS
    # Average & Sum for timers
    <prefix>.timers.<key>._totals.ash.<type>.avg (10) = avg <<prefix>>.timers.<<key>>.<node>.<type>
    <prefix>.timers.<key>._totals.ash.<type>.sum (10) = sum <<prefix>>.timers.<<key>>.<node>.(?!upper|lower)<type>
    # Min / Max for Lower / Upper
    <prefix>.timers.<key>._totals.ash.upper (10) = max <<prefix>>.timers.<<key>>.<node>.upper
    <prefix>.timers.<key>._totals.ash.lower (10) = min <<prefix>>.timers.<<key>>.<node>.lower
http://graphite.readthedocs.org/en/latest/config-carbon.html#aggregation-rules-conf
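What these rules do for a single timer key, every 10 seconds, can be sketched in plain Python. This is an illustration of the intent, not carbon-aggregator code; the node names and values are invented, and the `startswith` check is my reading of the `(?!upper|lower)` lookahead.

```python
def cluster_totals(per_node):
    """Roll per-node timer stats up into cluster-wide '_totals' values.

    per_node maps a node name to {stat_type: value} for one timer key.
    """
    by_type = {}
    for stats in per_node.values():
        for stat_type, value in stats.items():
            by_type.setdefault(stat_type, []).append(value)

    totals = {}
    for stat_type, values in by_type.items():
        # Rule 1: average every type across nodes.
        totals[stat_type + ".avg"] = sum(values) / len(values)
        # Rule 2: sum every type except those beginning with upper/lower.
        if not stat_type.startswith(("upper", "lower")):
            totals[stat_type + ".sum"] = sum(values)
    # Rules 3 and 4: the cluster's worst and best case.
    if "upper" in by_type:
        totals["upper"] = max(by_type["upper"])
    if "lower" in by_type:
        totals["lower"] = min(by_type["lower"])
    return totals


nodes = {
    "web01": {"count": 10, "mean": 40, "upper": 120, "lower": 3},
    "web02": {"count": 30, "mean": 60, "upper": 250, "lower": 5},
}
print(cluster_totals(nodes))
```

Taking the maximum of `upper` rather than its average is what surfaces the worst response time across a cluster: one slow node cannot hide behind several fast ones.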
PERFORMANCE
First problem: IOPS
Second problem: CPU
http://www.organisationscience.com/styled-6/files/dt-improved-performance.jpg
GRAPHITE ALTERNATIVES
Circonus: All the insights you ever wanted
Zabbix: OSS self hosted monitoring
http://circonus.com http://zabbix.com
https://github.com/lyft/circonus-statsd-backend
https://github.com/dlecocq/statsd-zabbix
GRAPHITE.JS
Custom dashboards using jQuery
https://github.com/prestontimmons/graphitejs http://dashboarddude.com/blog/2013/01/23/dashboards-for-graphite/
COST
Optimize for adoption rates in your organization by eliminating cost as a constraint
http://www.examiner.com/images/blog/wysiwyg/image/money].jpg
INSTRUMENTATION
Instrument your infrastructure, not just your apps
http://2.bp.blogspot.com/-bL9D8VMtor4/TiNBDEJmvOI/AAAAAAAAByc/Y0Uc3GVPNl0/s400/SeminaGestaoPessoasOrquestraROB4428.jpg
APACHE
Use mod_statsd to capture stats directly from the Apache request
http://kaleidos.net/files/images/apache318x260.png http://httpd.apache.org/
https://github.com/jib/mod_statsd
BASIC CONFIGURATION
    <Location /api>
        Statsd On
        StatsdPrefix apache
    </Location>
https://github.com/jib/mod_statsd/blob/master/DOCUMENTATION
    $ curl http://localhost/api/foo?id=42

    Stat: apache.api.foo.GET.200:31|ms
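The stat above packs the prefix, URL path, HTTP method, status code and response time into a single StatsD timer line. A hedged Python sketch of that mapping follows; `request_stat` is a hypothetical helper, not how mod_statsd is implemented, and replacing dots inside path segments is an extra assumption to keep each segment on one Graphite level.

```python
from urllib.parse import urlsplit


def request_stat(prefix, method, url, status, elapsed_ms):
    """Build a StatsD timer line of the form prefix.path.METHOD.status:ms|ms."""
    path = urlsplit(url).path  # the query string (?id=42) is dropped
    # Assumption: dots inside a segment would add path levels, so swap them out.
    segments = [seg.replace(".", "_") for seg in path.split("/") if seg]
    key = ".".join([prefix, *segments, method, str(status)])
    return f"{key}:{elapsed_ms}|ms"


print(request_stat("apache", "GET", "http://localhost/api/foo?id=42", 200, 31))
# prints: apache.api.foo.GET.200:31|ms
```

Because method and status are part of the key, request rates, error rates and response times per endpoint all fall out of one line per request.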
VARNISH
Use libvmod-statsd & libvmod-timers to capture stats directly from the Varnish request
http://www.adammalone.net/sites/default/files/styles/blog_image/public/varnish-bunny.png?itok=1bBDTA1A
https://www.varnish-cache.org/ https://github.com/jib/libvmod-statsd
BASIC CONFIGURATION
    # pseudo code
    import statsd;
    import timers;

    sub vcl_deliver {
        statsd.timing(
            $backend +      # from req.backend
            $hit_miss +     # from obj.hits
            $resp_code,     # from obj.status
            timers.req_response_time()
        );
    }
https://github.com/jib/libvmod-statsd/blob/master/README.rst http://jiboumans.wordpress.com/2013/02/27/realtime-stats-from-varnish/
SAMPLE GRAPH
The requests per second & response time graphs come straight from Varnish
PYTHON
Create a base library in your language of choice
https://pypi.python.org/pypi?%3Aaction=search&term=krux&submit=search
KRUX-STDLIB
    $ pip install krux-stdlib
https://staticfiles.krxd.net/foss/docs/pypi/krux-stdlib/
BASIC APP USING STDLIB
    $ sample-app -h
    […]

    logging:
      --log-level {info,debug,critical,warning,error}
                            Verbosity of logging. (default: warning)

    stats:
      --stats               Enable sending statistics to statsd. (default: False)
      --stats-host STATS_HOST
                            Statsd host to send statistics to. (default: localhost)
      --stats-port STATS_PORT
                            Statsd port to send statistics to. (default: 8125)
      --stats-environment STATS_ENVIRONMENT
                            Statsd environment. (default: dev)
https://staticfiles.krxd.net/foss/docs/pypi/krux-stdlib/
BASIC APP USING STDLIB
    import krux.cli

    class App(krux.cli.Application):
        def __init__(self):
            ### Call to the superclass to bootstrap.
            super(App, self).__init__(name = 'sample-app')

        def run(self):
            stats = self.stats
            log = self.logger

            # Everything inside the block is timed and reported as 'run'.
            with stats.timer('run'):
                log.info('running...')
                ...
https://staticfiles.krxd.net/foss/docs/pypi/krux-stdlib/ https://pypi.python.org/pypi?%3Aaction=search&term=krux&submit=search
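The appeal of `with stats.timer('run'):` is that timing costs the developer one line. A minimal sketch of how such a timer could work, written from scratch for this transcript (it is not krux-stdlib's implementation, and the `emit` parameter is a hypothetical hook standing in for the real StatsD client):

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(name, emit=print):
    """Time the enclosed block and hand a StatsD timer line to `emit`."""
    start = time.monotonic()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed runs are still measured.
        elapsed_ms = int((time.monotonic() - start) * 1000)
        emit(f"{name}:{elapsed_ms}|ms")


with timer("run"):
    time.sleep(0.01)  # stand-in for real work
```

Reporting from the `finally` clause is a deliberate choice in this sketch: slow failures are often the most interesting data points.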
CLI
    $ echo 'events.deploy.appname:1|c' | nc -u localhost 8125
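The same deploy event can be sent from any language that can open a UDP socket. A short Python sketch using only the standard library; the helper names `send_stat` and `deploy_event` are made up for this example, and the host and port defaults follow the CLI line above.

```python
import socket


def deploy_event(app_name):
    """Format a counter increment marking a deploy of app_name."""
    return f"events.deploy.{app_name}:1|c"


def send_stat(line, host="localhost", port=8125):
    """Fire-and-forget one StatsD line over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    send_stat(deploy_event("appname"))
```

UDP means the sender never waits on the stats pipeline: if StatsD is down the packet is simply lost, and the deploy carries on.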
JAVASCRIPT
Use a simple HTTP endpoint to send stats
PUPPET
Use the Puppet module graphite-report to send Puppet reporting data directly to Graphite
http://docs.puppetlabs.com/guides/reporting.html
https://github.com/krux/puppet-module-graphite-report
Q & A
http://vickicaruana.blogspot.com/2011/01/are-you-afraid-to-raise-your-hand.html
@jiboumans http://slideshare.net/jiboumans