how to measure everything: a million metrics per second with minimal developer overhead - puppetco


Upload: puppet-labs

Posted on 01-Dec-2014


DESCRIPTION

How to Measure Everything: A Million Metrics Per Second with Minimal Developer Overhead - Jos Boumans, Krux

TRANSCRIPT

Page 1

HOW TO MEASURE EVERYTHING: A million metrics per second with minimal developer overhead

Jos Boumans - @jiboumans

http://www.imagemediapartners.com/Portals/20286/images/MeasuringTape-s.jpg

Page 2

RIPE NCC: Engineering manager for RIPE Database

http://www.ripe.net/db

Page 3

CANONICAL

http://lukeroberts.deviantart.com/art/Destroy-Ubuntu-93235775

Engineering manager for Ubuntu Server 10.04 & 10.10

http://www.ubuntu.com/business/server/overview

Page 4

KRUX: VP of Operations & Infrastructure

http://www.krux.com/

Page 5

SOME OF OUR CUSTOMERS

Page 6

A LOT OF TRAFFIC

http://www.americapictures.net/buenos-aires-traffic-city-night-argentina.html

Page 7

AVERAGE DATA EVENTS / SEC

[Chart: average data events per second for Twitter (new tweets), Wikipedia (page views), Facebook (messages sent) and Krux (new data points)]

http://investor.fb.com/results.cfm http://www.statisticbrain.com/twitter-statistics/

http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm

Page 8

MONTHLY UNIQUE USERS

[Chart: monthly unique users]

http://reportcard.wmflabs.org/ http://www.statisticbrain.com/twitter-statistics/

http://newsroom.fb.com/company-info/

Page 9

DATA IS EVERYTHING: Always know what’s going on

http://perpetual-wonder.com/blog/wp-content/uploads/2012/09/Where-do-we-go-from-here.jpg

Page 11

METRICS & VISUALIZATION… and a little bit of monitoring

http://getfit101.files.wordpress.com/2012/04/visualization.jpg

Page 12

VISUALIZATION MATTERS: Humans are good at patterns & shapes

http://1.bp.blogspot.com/-CO-8FK9bohE/T89rD8dTyEI/AAAAAAAAAEE/YUZ00v_filk/s1600/live_like_it_matters_by_mythirll-d3iqcxt.jpg

Page 13

INSIGHT MATTERS: We consider it a core competence

http://yourselfseries.com/teens/files/2013/05/suicide_bonus_Insight_final.jpg

Page 14

SHOW EVERYONE: And better yet, encourage people to add their own

http://www.kissimmee.org/ftp/KCC/events/views/images/crowd_cheer.jpg

Page 15

THE BOTTOM LINE

Page 16

KEY CHARACTERISTICS… of our metrics collection

http://www.fullcirclefeedback.com.au/resources/wp-content/uploads/2014/01/Key-skills-and-characteristics-of-good-HR-leaders.jpg

Page 17

WHAT TO VISUALIZE: Pick your operational KPIs

http://1.bp.blogspot.com/-nrB1A9hamEk/UVZui_JUG1I/AAAAAAAAAdI/zGqHuanZNVU/s1600/missed-opportunities.jpg

Page 18

REQUEST & ERROR RATES: The baseline for everything else

Page 19

WORST RESPONSE TIMES: Track the worst upper 95th & upper 99th percentiles across a cluster

Page 20

TRACK EVENTS: Did a code change or batch job cause a change in behaviour?

Page 21

CAPACITY / THRESHOLDS: How much traffic can your service sustain?

Page 22

SINGLE SERVICE OVERVIEW: Create a single graph for every service

Page 23

WHAT TO CAPTURE: Everything. No, really.

http://arkansasagnews.uark.edu/monarchs95.jpg

Page 24

INFRASTRUCTURE: Everything needed to create, capture and act on a million metrics per second

http://discussamerica.org/remer-blog/images/Freeway_Interchange2.jpg

Page 25

GRAPHITE, STATSD & COLLECTD: The Trifecta

Page 26

COLLECTD: Open Source Monitoring Tool

https://collectd.org/ https://collectd.org/wiki/index.php/Plugin:StatsD

Page 27

STATSD: Simple stats collector service

https://github.com/etsy/statsd http://codeascraft.com/2011/02/15/measure-anything-measure-everything/

https://wwwx.cs.unc.edu/~sparkst/howto/network_tuning.php http://emps.exeter.ac.uk/media/universityofexeter/emps/eisa/exista-splash.jpg

Page 28

STATSD NAMING SCHEME

stats.              # to distinguish from events
$environment.       # prod, dev, etc
$cluster_name.      # api-ash, www-dub, etc
$application.       # webapp, login, etc
$metric_name_here.  # any key the app wants
$hostname           # node the stat came from
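A minimal Python sketch of the scheme, assembling one full name from its parts. `metric_name` and the sample values are hypothetical; the hostname is shortened the same way as the `split('.')[0]` in the StatsD configuration's `globalSuffix`. In practice StatsD adds the prefix and suffix itself (and, with `legacyNamespace: false`, a type segment such as `timers` after the prefix), so an application would only need to send the `$application.$metric_name_here` part.

```python
def metric_name(environment: str, cluster_name: str, application: str,
                metric: str, hostname: str) -> str:
    """Join the parts as stats.$environment.$cluster_name.$application.$metric.$hostname."""
    # Keep only the short host name, as the StatsD globalSuffix setting does.
    short_host = hostname.split(".")[0]
    parts = ["stats", environment, cluster_name, application, metric, short_host]
    # An empty part would create ".." and an unexpected empty node in Graphite's tree.
    if any(not part for part in parts):
        raise ValueError(f"empty component in {parts!r}")
    return ".".join(parts)


print(metric_name("prod", "api-ash", "webapp", "login.success", "web01.ash.example.com"))
# stats.prod.api-ash.webapp.login.success.web01
```

Because every level is fixed, a Graphite wildcard such as `stats.prod.api-ash.webapp.login.success.*` selects the same metric on every node in the cluster.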

Page 29

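STATSD CONFIGURATION

{
  graphite: {
    // $env and $cluster_name are per-host placeholders, assumed to be filled in by a template
    globalPrefix: "stats.$env.$cluster_name",
    // short hostname, e.g. "web01" from "web01.ash.example.com"
    globalSuffix: require('os').hostname().split('.')[0],
    legacyNamespace: false,
  },
  percentThreshold: [ 95, 99 ],
  deleteIdleStats: true,
}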

https://github.com/etsy/statsd/blob/master/exampleConfig.js
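`percentThreshold: [ 95, 99 ]` makes StatsD emit extra per-timer values at each flush, including `upper_95` and `mean_95`. The Python sketch below is modelled on how etsy/statsd derives those two (sort the samples, keep the lowest N percent using JavaScript-style rounding, report the largest kept value and the mean of the kept values). It is an illustration under that assumption, not StatsD's own code, and `percentile_stats` is a hypothetical name.

```python
import math
from typing import Dict, List


def percentile_stats(values: List[float], pct: int) -> Dict[str, float]:
    """Sketch of StatsD-style upper_<pct> and mean_<pct> for one flush interval."""
    if not values:
        return {}
    ordered = sorted(values)
    count = len(ordered)
    num_in_threshold = count
    if count > 1:
        # JavaScript's Math.round rounds .5 up; floor(x + 0.5) does the same for x >= 0.
        num_in_threshold = math.floor(pct / 100 * count + 0.5)
        if num_in_threshold == 0:
            return {}  # too few samples to say anything about this percentile
    kept = ordered[:num_in_threshold]
    return {
        f"upper_{pct}": kept[-1],
        f"mean_{pct}": sum(kept) / num_in_threshold,
    }


samples = list(range(1, 101))  # 100 response times in ms: 1, 2, ..., 100
print(percentile_stats(samples, 95))  # {'upper_95': 95, 'mean_95': 48.0}
```

The upper value is a nearest-rank percentile per node and per flush, which is why the worst `upper_95` / `upper_99` across a cluster has to be computed separately.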

Page 30

GRAPHITE: Metric store & Graph UI

http://graphite.wikidot.com/ http://graphite.readthedocs.org/en/latest/

Page 31

GRAPHITE SETUP: At least one Graphite server per data center

Page 32

DATA RETENTION

[default]
pattern = .*
priority = 110
retentions = 10:6h,60:15d,600:5y
xFilesFactor = 0

http://graphite.readthedocs.org/en/latest/config-carbon.html#storage-schemas-conf
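Each `precision:retention` pair defines one archive: `10:6h` keeps 10-second points for six hours, `60:15d` one-minute points for 15 days, and `600:5y` ten-minute points for five years. Whisper creates each file at its full size, so this schema fixes the disk cost of every metric name up front. A Python sketch of that arithmetic, assuming the whisper on-disk layout (a 16-byte header, 12 bytes of info per archive, 12 bytes per point) and handling only the common short unit suffixes; the function names are hypothetical.

```python
import re
from typing import List, Tuple

# Seconds per unit for the short suffixes used in storage-schemas.conf.
UNIT_SECONDS = {"s": 1, "min": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}
HEADER_BYTES = 16        # aggregation type, max retention, xFilesFactor, archive count
ARCHIVE_INFO_BYTES = 12  # offset, seconds per point, number of points
POINT_BYTES = 12         # 4-byte timestamp + 8-byte double value


def _amount(text: str) -> Tuple[int, str]:
    match = re.fullmatch(r"(\d+)([a-z]*)", text.strip())
    if match is None:
        raise ValueError(f"cannot parse {text!r}")
    return int(match.group(1)), match.group(2)


def parse_retentions(spec: str) -> List[Tuple[int, int]]:
    """Return (seconds_per_point, points) for each precision:retention pair."""
    archives = []
    for pair in spec.split(","):
        precision_text, retention_text = pair.split(":")
        number, unit = _amount(precision_text)
        step = number * UNIT_SECONDS[unit] if unit else number
        number, unit = _amount(retention_text)
        # A bare retention number is a point count; with a unit it is a time span.
        points = number * UNIT_SECONDS[unit] // step if unit else number
        archives.append((step, points))
    return archives


def whisper_file_size(spec: str) -> int:
    archives = parse_retentions(spec)
    total_points = sum(points for _, points in archives)
    return HEADER_BYTES + ARCHIVE_INFO_BYTES * len(archives) + POINT_BYTES * total_points


print(parse_retentions("10:6h,60:15d,600:5y"))   # [(10, 2160), (60, 21600), (600, 262800)]
print(whisper_file_size("10:6h,60:15d,600:5y"))  # 3438772
```

For this schema that is 286,560 points, about 3.3 MiB per distinct metric name, which is worth multiplying by the number of names before sizing disks.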

Page 33

STANDARD AGGREGATIONS

# Average & Sum for timers
<prefix>.timers.<key>._totals.ash.<type>.avg (10) = avg <<prefix>>.timers.<<key>>.<node>.<type>
<prefix>.timers.<key>._totals.ash.<type>.sum (10) = sum <<prefix>>.timers.<<key>>.<node>.(?!upper|lower)<type>

# Min / Max for Lower / Upper
<prefix>.timers.<key>._totals.ash.upper (10) = max <<prefix>>.timers.<<key>>.<node>.upper
<prefix>.timers.<key>._totals.ash.lower (10) = min <<prefix>>.timers.<<key>>.<node>.lower

http://graphite.readthedocs.org/en/latest/config-carbon.html#aggregation-rules-conf
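Each rule reads `output (frequency) = method input`: carbon-aggregator buffers the matching per-node timer values for 10 seconds and writes one cluster-wide total under `_totals.ash`. The Python sketch below applies the same four rules to a dict of already-collected values in one shot, skipping the buffering. The regular expression, `cluster_totals`, and the sample metric names are assumptions for illustration, not carbon's implementation.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict

# <prefix>.timers.<key>.<node>.<type>; prefix and key may span several dotted segments.
TIMER_RE = re.compile(r"^(?P<prefix>.+)\.timers\.(?P<key>.+)\.(?P<node>[^.]+)\.(?P<type>[^.]+)$")


def cluster_totals(values: Dict[str, float], cluster: str = "ash") -> Dict[str, float]:
    grouped = defaultdict(list)
    for name, value in values.items():
        match = TIMER_RE.match(name)
        if match is None or "._totals." in name:
            continue  # not a per-node timer, or already an aggregate
        grouped[(match["prefix"], match["key"], match["type"])].append(value)

    totals = {}
    for (prefix, key, metric_type), node_values in grouped.items():
        base = f"{prefix}.timers.{key}._totals.{cluster}"
        # Rule 1: average of every timer type across nodes.
        totals[f"{base}.{metric_type}.avg"] = mean(node_values)
        # Rule 2: sum, except upper*/lower* (mirrors the (?!upper|lower) lookahead).
        if not metric_type.startswith(("upper", "lower")):
            totals[f"{base}.{metric_type}.sum"] = sum(node_values)
        # Rules 3 and 4: cluster-wide maximum of upper and minimum of lower.
        if metric_type == "upper":
            totals[f"{base}.upper"] = max(node_values)
        elif metric_type == "lower":
            totals[f"{base}.lower"] = min(node_values)
    return totals


per_node = {
    "stats.prod.api-ash.timers.login.web01.upper": 50.0,
    "stats.prod.api-ash.timers.login.web02.upper": 80.0,
}
print(cluster_totals(per_node)["stats.prod.api-ash.timers.login._totals.ash.upper"])  # 80.0
```

Summing `upper` values would be meaningless, which is why the sum rule excludes them and the max rule instead yields the worst response time across the cluster.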

Page 34

PERFORMANCE

First problem: IOPS

Second problem: CPU

http://www.organisationscience.com/styled-6/files/dt-improved-performance.jpg

Page 35

GRAPHITE ALTERNATIVES

Circonus: All the insights you ever wanted

Zabbix: OSS self-hosted monitoring

http://circonus.com

http://zabbix.com https://github.com/lyft/circonus-statsd-backend

https://github.com/dlecocq/statsd-zabbix

Page 36

GRAPHITE.JS: Custom dashboards using jQuery

https://github.com/prestontimmons/graphitejs http://dashboarddude.com/blog/2013/01/23/dashboards-for-graphite/

Page 37

COST: Optimize for adoption rates in your organization by eliminating cost as a constraint

http://www.examiner.com/images/blog/wysiwyg/image/money].jpg

Page 38

INSTRUMENTATION: Instrument your infrastructure, not just your apps

http://2.bp.blogspot.com/-bL9D8VMtor4/TiNBDEJmvOI/AAAAAAAAByc/Y0Uc3GVPNl0/s400/SeminaGestaoPessoasOrquestraROB4428.jpg

Page 39

APACHE: Use mod_statsd to capture stats directly from the Apache request

http://kaleidos.net/files/images/apache318x260.png http://httpd.apache.org/

https://github.com/jib/mod_statsd

Page 40

BASIC CONFIGURATION

<Location /api>
    Statsd On
    StatsdPrefix apache
</Location>

https://github.com/jib/mod_statsd/blob/master/DOCUMENTATION

$ curl 'http://localhost/api/foo?id=42'

Stat: apache.api.foo.GET.200:31|ms
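The `Stat:` line is StatsD's plain-text line protocol, `name:value|type`, optionally followed by `|@rate` for sampled stats. A small Python parser can help check what mod_statsd (or anything else) emits. It is a sketch that covers only counters (`c`), timers (`ms`) and gauges (`g`); `StatsdLine` and `parse_statsd_line` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class StatsdLine(NamedTuple):
    name: str
    value: float
    metric_type: str             # "c" counter, "ms" timer, "g" gauge
    sample_rate: Optional[float]


LINE_RE = re.compile(
    r"^(?P<name>[^:|@\s]+):(?P<value>[-+]?\d+(?:\.\d+)?)"
    r"\|(?P<type>c|ms|g)(?:\|@(?P<rate>\d*\.?\d+))?$"
)


def parse_statsd_line(line: str) -> StatsdLine:
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a counter/timer/gauge line: {line!r}")
    rate = match["rate"]
    return StatsdLine(match["name"], float(match["value"]), match["type"],
                      float(rate) if rate else None)


print(parse_statsd_line("apache.api.foo.GET.200:31|ms"))
# StatsdLine(name='apache.api.foo.GET.200', value=31.0, metric_type='ms', sample_rate=None)
```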

Page 41

VARNISH: Use libvmod-statsd & libvmod-timers to capture stats directly from the Varnish request

http://www.adammalone.net/sites/default/files/styles/blog_image/public/varnish-bunny.png?itok=1bBDTA1A

https://www.varnish-cache.org/ https://github.com/jib/libvmod-statsd

Page 42

BASIC CONFIGURATION

# pseudo code
import statsd;
import timers;

sub vcl_deliver {
    statsd.timing(
        $backend +     # from req.backend
        $hit_miss +    # from obj.hits
        $resp_code,    # from obj.status
        timers.req_response_time()
    );
}

https://github.com/jib/libvmod-statsd/blob/master/README.rst http://jiboumans.wordpress.com/2013/02/27/realtime-stats-from-varnish/

Page 43

SAMPLE GRAPH: The requests-per-second and response-time graphs come straight from Varnish

Page 44

PYTHON: Create a base library in your language of choice

https://pypi.python.org/pypi?%3Aaction=search&term=krux&submit=search

Page 45

KRUX-STDLIB

$ pip install krux-stdlib

https://staticfiles.krxd.net/foss/docs/pypi/krux-stdlib/

Page 46

BASIC APP USING STDLIB

$ sample-app -h
[…]
logging:
  --log-level {info,debug,critical,warning,error}
                        Verbosity of logging. (default: warning)

stats:
  --stats               Enable sending statistics to statsd. (default: False)
  --stats-host STATS_HOST
                        Statsd host to send statistics to. (default: localhost)
  --stats-port STATS_PORT
                        Statsd port to send statistics to. (default: 8125)
  --stats-environment STATS_ENVIRONMENT
                        Statsd environment. (default: dev)

https://staticfiles.krxd.net/foss/docs/pypi/krux-stdlib/
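The slides do not show how krux-stdlib builds these options, so the following is only a sketch of the command-line half of such a base library: it reproduces the flags and defaults from the help output above with Python's standard `argparse`. `build_parser` is a hypothetical name, and the real library may be structured quite differently.

```python
import argparse

LOG_LEVELS = ["info", "debug", "critical", "warning", "error"]


def build_parser(name: str) -> argparse.ArgumentParser:
    """Hypothetical base-library parser with the logging and stats flags shown above."""
    # ArgumentDefaultsHelpFormatter appends "(default: ...)" to each help string.
    parser = argparse.ArgumentParser(
        prog=name, formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    logging_group = parser.add_argument_group("logging")
    logging_group.add_argument("--log-level", choices=LOG_LEVELS, default="warning",
                               help="Verbosity of logging.")

    stats_group = parser.add_argument_group("stats")
    stats_group.add_argument("--stats", action="store_true",
                             help="Enable sending statistics to statsd.")
    stats_group.add_argument("--stats-host", default="localhost",
                             help="Statsd host to send statistics to.")
    stats_group.add_argument("--stats-port", type=int, default=8125,
                             help="Statsd port to send statistics to.")
    stats_group.add_argument("--stats-environment", default="dev",
                             help="Statsd environment.")
    return parser


args = build_parser("sample-app").parse_args(["--stats", "--stats-environment", "prod"])
print(args.stats, args.stats_host, args.stats_port, args.stats_environment)
# True localhost 8125 prod
```

Putting these flags in one shared function is what keeps developer overhead low: every new app gets the same stats switches without writing them again.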

Page 47

BASIC APP USING STDLIB

import krux.cli


class App(krux.cli.Application):
    def __init__(self):
        ### Call to the superclass to bootstrap.
        super(App, self).__init__(name='sample-app')

    def run(self):
        stats = self.stats    # statsd client configured from the --stats* flags
        log = self.logger     # logger configured from --log-level

        # Time the whole run under the 'run' key.
        with stats.timer('run'):
            log.info('running...')
            ...

https://staticfiles.krxd.net/foss/docs/pypi/krux-stdlib/ https://pypi.python.org/pypi?%3Aaction=search&term=krux&submit=search

Page 48

CLI

echo 'events.deploy.appname:1|c' | nc -u localhost 8125
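The one-liner fires a single UDP datagram in the StatsD line format at port 8125 and expects no reply. The same thing from Python using only the standard library; `send_stat` and `deploy_event` are hypothetical helper names.

```python
import socket


def deploy_event(appname: str) -> str:
    """StatsD counter line marking one deploy of appname (hypothetical helper)."""
    return f"events.deploy.{appname}:1|c"


def send_stat(line: str, host: str = "localhost", port: int = 8125) -> None:
    """Send one StatsD line as a single UDP datagram; like nc -u, no reply is expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


# Equivalent of the nc one-liner, e.g. at the end of a deploy script:
# send_stat(deploy_event("appname"))
```

Because UDP is fire-and-forget, a deploy script never blocks or fails just because StatsD is down, which is the property that makes it safe to sprinkle such calls everywhere.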

Page 49

JAVASCRIPT: Use a simple HTTP endpoint to send stats
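Browsers cannot send UDP, so client-side JavaScript needs a small HTTP endpoint that turns requests into StatsD lines. A minimal sketch of such a relay using Python's standard `http.server`; the `/stat` path, its `key`/`value`/`type` query parameters, the `js.` prefix and `make_relay` are all assumptions for illustration, not the endpoint the talk describes.

```python
import math
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

ALLOWED_TYPES = {"c", "ms", "g"}  # counters, timers, gauges


class StatsRelayHandler(BaseHTTPRequestHandler):
    """Turn GET /stat?key=...&value=...&type=... into one StatsD UDP datagram."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/stat":
            self.send_error(404)
            return
        params = parse_qs(url.query)
        try:
            key = params["key"][0]
            value = float(params["value"][0])
        except (KeyError, ValueError):
            self.send_error(400, "key and a numeric value are required")
            return
        stat_type = params.get("type", ["c"])[0]
        # Reject unknown types, non-finite numbers and keys that would break the line format.
        if (stat_type not in ALLOWED_TYPES or not math.isfinite(value)
                or any(ch in key for ch in ":|@ ")):
            self.send_error(400, "invalid stat")
            return
        line = f"js.{key}:{value:.15g}|{stat_type}"  # "js." prefix is hypothetical
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(line.encode("utf-8"), self.server.statsd_addr)
        self.send_response(204)  # nothing to return to the browser
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet; a real relay would log somewhere


def make_relay(port: int = 8080, statsd_addr=("127.0.0.1", 8125)) -> HTTPServer:
    """Bind the relay on localhost; statsd_addr is where datagrams are forwarded."""
    server = HTTPServer(("127.0.0.1", port), StatsRelayHandler)
    server.statsd_addr = statsd_addr
    return server


# make_relay().serve_forever(), then in the browser, for example:
#   new Image().src = "http://localhost:8080/stat?key=page.load&value=231&type=ms";
```

Answering a plain GET with 204 lets the browser fire the request as an image load and ignore the response; the single-threaded `HTTPServer` is only adequate for a demo.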

Page 50

PUPPET: Use the Puppet module graphite-report to send Puppet reporting data directly to Graphite

http://docs.puppetlabs.com/guides/reporting.html

https://github.com/krux/puppet-module-graphite-report
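Sending straight to Graphite bypasses StatsD and uses carbon's plaintext protocol: one `path value timestamp` line per data point over TCP, on port 2003 by default. A Python sketch of that protocol; the `puppet.web01.time.total` path is a hypothetical example, not necessarily what graphite-report produces, and the helper names are invented for illustration.

```python
import socket
import time
from typing import Iterable, Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one data point in carbon's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def send_to_graphite(lines: Iterable[str], host: str = "localhost", port: int = 2003) -> None:
    """Open one TCP connection and write all lines in a single batch."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


print(graphite_line("puppet.web01.time.total", 12.5, 1400000000), end="")
# puppet.web01.time.total 12.5 1400000000
```

Unlike StatsD, carbon stores exactly what it receives with no aggregation, which suits once-per-run data such as Puppet reports.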

Page 51

Q & A

http://vickicaruana.blogspot.com/2011/01/are-you-afraid-to-raise-your-hand.html

@jiboumans http://slideshare.net/jiboumans