Monitoring: the art of knowing when and why things go wrong
The first school 100% dedicated to open source
Open Source School was founded at the initiative of Smile, a leader in open source integration and managed services, and EPSI, a pioneering private institution of higher education in computer science.
As part of the Programme d’Investissements d’Avenir (PIA), the French government decided to support the creation of this school by granting it an initial subsidy of €1.4M, confirming its intention to support the fast-growing free software sector.
With annual growth of more than 10% and 4,000 vacancies each year in the free software sector, OSS aims to address the sector's skills shortage by mobilizing the whole ecosystem and by offering the broadest range of training in open source technologies, in both initial and continuing education.
Training for full employment!
Continuing Education
Open Source School "Executive Education" is an accredited training organization offering a catalogue of more than 200 professional training courses, along with several retraining schemes that support a return to employment (POE) or better employability for many IT professionals.
For inquiries: [email protected]
Initial Education
100% free software and 100% work-study, the Open Source School curriculum builds on EPSI's skills-block framework. It leads to an RNCP level I qualification (Bac+5). The programme is offered on 6 campuses: Bordeaux, Lille, Lyon, Montpellier, Nantes and Paris.
Principles Monitoring tools Reading graphs Real-life examples
Plan
1 Principles
2 Monitoring tools
3 Reading graphs
4 Real-life examples
www.opensourceschool.fr – Licence Creative Commons (CC BY-SA 3.0 FR) – 2/63
Principles
Stating the obvious
Guess what?
You need monitoring!
Alright, why?
Your application will be down at some point, and you’ll want to know before your users do
You will want to know what failed first, because it is the likely cause of the problem
You will want to know how your application behaves compared to projections
You might want to keep an eye on some particular subsystem
What is monitoring
Monitoring is a vague term that encompasses many techniques used to:
Periodically make sure everything works fine
Warn you when something is wrong
Take automated corrective actions
Collect performance data periodically and present it to you
Not all monitoring tools implement all these features simultaneously; simpler systems may concentrate on one feature
Information theory
To see if something is wrong, you must perform a check
The program that performs the check is often called a probe
The check may return a binary status : working or failing
The check may return data: 35°C, 1432MB, 14MB/s...
Such values are usually compared to a threshold
Failed checks may lead to notification, and even escalation
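The check-and-threshold idea above can be sketched as follows. This is only an illustration, not the API of any monitoring tool: the names `Status` and `evaluate` and the threshold values are assumptions made for the example.

```python
from enum import Enum


class Status(Enum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


def evaluate(value, warning, critical):
    """Compare a measured value to two thresholds (higher is worse)."""
    if value >= critical:
        return Status.CRITICAL
    if value >= warning:
        return Status.WARNING
    return Status.OK


if __name__ == "__main__":
    # Hypothetical temperature reading, in degrees Celsius.
    print(evaluate(35, warning=60, critical=80).name)
```

A binary check is the degenerate case where the probe only ever reports OK or CRITICAL.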
In order to get a quick notification, you need to check often
This may increase the load on the server
Checks are not actual users; designing a good check is hard
And often requires additional development!
A single failure does not mean the application is down, and a single success does not mean the application is fully back up
Avoiding false positives and false negatives is hard
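One common way to act on the "single failure, single success" remark is to change state only after several consecutive results that contradict the current state. The sketch below assumes that approach; the class name `StateTracker` and the default counts are hypothetical, not taken from any of the tools presented later.

```python
class StateTracker:
    """Flip between up and down only after several consecutive
    results that contradict the current state."""

    def __init__(self, failures_needed=3, successes_needed=2):
        self.failures_needed = failures_needed
        self.successes_needed = successes_needed
        self.up = True
        self._streak = 0  # consecutive results contradicting the state

    def record(self, check_passed):
        """Feed one check result and return the state (True means up)."""
        if check_passed == self.up:
            self._streak = 0  # the result agrees with the current state
            return self.up
        self._streak += 1
        needed = self.failures_needed if self.up else self.successes_needed
        if self._streak >= needed:
            self.up = not self.up
            self._streak = 0
        return self.up
```

The price of fewer false positives is a slower reaction: with a 5-minute polling interval and three failures required, a real outage is confirmed only after about 15 minutes.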
Polling intervals
Some monitoring tools poll their target every 15 minutes
This is good in some cases
What if you are running a 20-minute load test? Or a 5-minute import?
Some monitoring tools poll every 10 seconds
Do you have enough storage space? Is it really necessary in the long term?
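The trade-off between polling interval and storage can be made concrete with a little arithmetic. The sketch below assumes 8 bytes per stored sample and no consolidation at all; both are assumptions for illustration, and real tools store data differently.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(interval_seconds):
    """Number of polls in one day for a given polling interval."""
    return SECONDS_PER_DAY // interval_seconds


def yearly_bytes(interval_seconds, metrics, bytes_per_sample=8):
    """Raw storage for one year of samples, without any consolidation."""
    return samples_per_day(interval_seconds) * 365 * metrics * bytes_per_sample


if __name__ == "__main__":
    for interval in (10, 300, 900):
        print(interval, samples_per_day(interval), yearly_bytes(interval, metrics=100))
```

With 100 metrics, polling every 10 seconds needs roughly 2.5 GB per year under these assumptions, against about 28 MB when polling every 15 minutes. On the other hand, a 20-minute load test polled every 15 minutes produces at most two samples.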
Data storage
Some monitoring tools store data in RRD databases
Fixed size
But you lose accuracy with time
Data retention is usually configurable
Others store data in a regular database
Watch for data growth
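Why a fixed-size store loses accuracy with time can be shown with a small sketch: older samples are merged into coarser buckets. The function below averages groups of samples; it is a simplified model of consolidation, not RRDtool's actual implementation, and the sample values are made up.

```python
def consolidate(samples, factor):
    """Average each group of `factor` consecutive samples into one value."""
    consolidated = []
    for start in range(0, len(samples), factor):
        group = samples[start:start + factor]
        consolidated.append(sum(group) / len(group))
    return consolidated


if __name__ == "__main__":
    recent = [10, 10, 100, 10, 10, 10]  # hypothetical values with one spike
    print(consolidate(recent, 3))
```

In this example the spike at 100 becomes an average of 40 once three samples are merged into one. RRDtool can also keep the maximum of each group instead of the average, which preserves peaks at the cost of hiding the typical value.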
Monitoring tools
Munin
Munin is a "graphing" tool
Configuration is almost decentralized, and mostly done on the monitored system
Munin uses RRD graphs and has a default polling interval of 5 minutes
Munin has a (static) web interface
Munin: power features
Complex aggregate graphs
Custom probes are simple to write and test
Ability to send notifications on thresholds
Easy enough to set up to be used on a single server
Munin: how to install
aptitude install munin: on the server
aptitude install munin-node: on each monitored system
Or on the same machine, if standalone
aptitude install munin-plugins-extra: additional plugins
Wait about 5 minutes for munin-cron to trigger
Go to: http://localhost/munin/
You might have to set allow from all in /etc/apache2/conf.d/munin
Munin: configuring plugins
By default, Debian will enable some plugins based on already installed packages
Use munin-node-configure to list available and enabled plugins
To enable a new plugin, symlink it in /etc/munin/plugins
Plugins might need configuration in /etc/munin/plugin-conf.d
Usually to change the user the plugin should run as
Or to pass environment variables
The name of the symlink can be used to select:
Plugin argument: if_eth0, ps_apache2
Sub-features or alternate features: df_inode, varnish_objects
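How a plugin can read its argument from the symlink name may be sketched as follows. This is an illustration written for these notes: the helper name `split_wildcard_name` is hypothetical, and real plugins each implement this in their own way.

```python
import os


def split_wildcard_name(link_name, plugin_prefix):
    """Return the part of the symlink name that follows the plugin prefix."""
    base = os.path.basename(link_name)
    if not base.startswith(plugin_prefix):
        raise ValueError(f"{base!r} is not a link to {plugin_prefix!r}")
    return base[len(plugin_prefix):]


if __name__ == "__main__":
    # A plugin would normally pass sys.argv[0], the name it was invoked as.
    print(split_wildcard_name("/etc/munin/plugins/if_eth0", "if_"))
```

The same executable can therefore be linked several times under different names, once per interface or process to graph.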
Munin: testing plugins
Use munin-run to run a plugin in the same environment as munin-node
This avoids permission issues: your plugin may work as root but not as the munin user
You can give arguments such as config to print the plugin config
Munin: writing plugins
A plugin is just a program that returns one or more labelled values
It can use any language: C, shell, perl, even PHP
A plugin should have a config argument that makes it print its config
Title
Labels
Colors
Graphing options (stacked, lines, differential, units...)
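A minimal plugin following the description above could look like the sketch below. It graphs the 1-minute load average; the choice of metric, the graph title and the field name `load` are assumptions for the example, and the script is not one of the plugins shipped with Munin.

```python
import os
import sys


def config_lines():
    """Graph metadata, printed when the plugin is called with `config`."""
    return [
        "graph_title Load average (example)",
        "graph_vlabel load",
        "graph_category system",
        "load.label 1-minute load",
    ]


def value_lines():
    """One `<field>.value <number>` line per labelled value."""
    one_minute, _, _ = os.getloadavg()
    return [f"load.value {one_minute:.2f}"]


def main(argv):
    lines = config_lines() if argv[1:2] == ["config"] else value_lines()
    print("\n".join(lines))
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Called without arguments the script prints the current value; called with config it prints the title and label that Munin uses to draw the graph.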
Munin: configuring nodes for remote polling
Edit /etc/munin/munin-node.conf
cidr_allow 127.0.0.1/32
cidr_allow $munin_server_address
Restart munin-node
Check that you can nc -v host 4949 from the monitoring server
By the way: once connected, try the list and fetch $plugin commands
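The manual nc session can also be scripted. The sketch below assumes the plain-text exchange seen on port 4949: a greeting line, then one `field.value number` line per value, ending with a line containing a single dot. The function names and the host passed to `fetch` are hypothetical.

```python
import socket


def parse_fetch(lines):
    """Turn `field.value number` lines into a dict, stopping at the lone '.'."""
    values = {}
    for line in lines:
        line = line.strip()
        if line == ".":
            break
        if not line or line.startswith("#"):
            continue
        name, _, raw = line.partition(" ")
        if name.endswith(".value"):
            name = name[:-len(".value")]
        # Munin reports an unknown value as "U".
        values[name] = None if raw == "U" else float(raw)
    return values


def fetch(host, plugin, port=4949, timeout=5.0):
    """Ask a munin-node for the current values of one plugin."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rw", encoding="ascii", newline="\n")
        stream.readline()  # greeting banner sent by the node
        stream.write(f"fetch {plugin}\n")
        stream.flush()
        lines = []
        for line in stream:
            lines.append(line)
            if line.strip() == ".":
                break
    return parse_fetch(lines)
```

For instance, `fetch("localhost", "load")` should return a one-entry dict on a machine where munin-node runs and allows connections from 127.0.0.1.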
Munin: adding hosts to the central server
Edit /etc/munin/munin.conf
[myserver.mydomain.com]
    address myserver.mydomain.com
Combined graphs:
[domain.com;Totals] # Force it into the "domain.com" domain...
    update no # Turn off data-fetching for this "host".
    loadsum.graph_title Loads summarised
    loadsum.combined_loads.sum server1.mydomain.com:load.load server2.mydomain.com:load.load
    loadsum.combined_loads.label Combined loads
Nagios
Nagios is a service monitoring application
It is based on a Green-Yellow-Red trichotomy (OK, WARNING and CRITICAL)
It can be used for graphing through third-party components (nagiosgrapher)
It is at the heart of many other network monitoring tools (Centreon...)
It is old, and has successors now: Icinga, Shinken
Nagios: features
The largest collection of probes in the world
In fact, other solutions stopped bothering and tend to be compatible with nagios probes
Based on hosts and services
Contacts can be defined, and escalated to
It is possible to schedule downtime, and to temporarily acknowledge a problem
Nagios has flap detection
Heaps of third-party components (Nagvis...)
Passive monitoring (with freshness checks)
Distribution / redundancy
Nagios: installation
aptitude install nagios3
Configure standard HTTP authentication in /etc/apache2/conf.d/nagios3.conf if required
Log in to http://localhost/nagios3/
Nagios: object types
Simple types
hosts: a physical or virtual server, printer, switch...
services: something a host does or has, and that you want to monitor
contacts: where and how you want to send notifications
commands: probes and notification methods
Group types
hostgroup
servicegroup
contactgroup
Advanced types:
time periods
service/host dependency
service/host escalation
service/host extended information: only cosmetic
Nagios: defining a new host
define host{
    use generic-host
    host_name test.lxc
    address 192.168.122.13
}
Reload nagios
The host is now visible in the web interface, and checked (pinged)
But no services are monitored
Add it to the ssh-servers hostgroup to check SSH
This works because a service has been previously defined on the ssh-servers hostgroup
Nagios: adding a new service
Let’s monitor a hypothetical /appcheck.php test page
First, add a command, since there are no default commands that allow arbitrary URLs
define command{
    command_name check_webapp
    command_line /usr/lib/nagios/plugins/check_http -I '$HOSTADDRESS$' -u '/appcheck.php'
}
Now, we can add a service check
define service{
    host_name localhost
    service_description Magento
    check_command check_webapp
    use generic-service
}
Nagios: parameters
What if the URL is not always /appcheck.php?
What if we want to set a timeout on the check, or any other parameter?
define command{
    command_name check_webapp
    command_line /usr/lib/nagios/plugins/check_http -I '$HOSTADDRESS$' -u '$ARG1$'
}
define service{
    host_name localhost
    service_description Magento
    check_command check_webapp!/appcheck.php
    use generic-service
}
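The way arguments travel from check_command to command_line can be modelled in a few lines. This is a simplified model written for these notes, not Nagios code: the function name `expand_command` is hypothetical and only the $HOSTADDRESS$ and $ARGn$ macros are handled.

```python
def expand_command(command_line, check_command, host_address):
    """Split check_command on '!' and substitute $HOSTADDRESS$ and $ARGn$."""
    name, *args = check_command.split("!")
    expanded = command_line.replace("$HOSTADDRESS$", host_address)
    for index, arg in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{index}$", arg)
    return name, expanded


if __name__ == "__main__":
    template = "/usr/lib/nagios/plugins/check_http -I '$HOSTADDRESS$' -u '$ARG1$'"
    print(expand_command(template, "check_webapp!/appcheck.php", "127.0.0.1"))
```

Each additional value after another "!" becomes $ARG2$, $ARG3$ and so on, which is how a timeout or any other parameter can be passed per service.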
Nagios: custom probes
A custom probe is simply a plugin whose exit code is 0, 1, 2 or 3 (OK, WARNING, CRITICAL or UNKNOWN)
It may return performance data for graphing, if relevant
All textual output is saved, but has no effect on the status
As with munin, any language can be used
Of course, the faster the better: standard plugins are thus written in C
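A custom probe along these lines might be sketched as below. It checks disk usage on the root filesystem; the metric, the thresholds and the function name `check_disk` are assumptions for the example. The part of the output after the "|" character is the optional performance data.

```python
import shutil
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_disk(used_percent, warn=80.0, crit=90.0):
    """Return (exit_code, output_line) for a disk usage percentage."""
    if used_percent >= crit:
        code = CRITICAL
    elif used_percent >= warn:
        code = WARNING
    else:
        code = OK
    # Text before '|' is for humans; text after it is performance data.
    line = (f"DISK {LABELS[code]} - {used_percent:.0f}% used"
            f" | used={used_percent:.0f}%;{warn:.0f};{crit:.0f}")
    return code, line


def main():
    try:
        usage = shutil.disk_usage("/")
    except OSError as error:
        print(f"DISK UNKNOWN - {error}")
        return UNKNOWN
    code, line = check_disk(100.0 * usage.used / usage.total)
    print(line)
    return code


if __name__ == "__main__":
    sys.exit(main())
```

Only the exit code decides the state shown in the interface; the printed line is stored alongside it for the operator.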
Nagios: remote probes
The information you want to read may not be available remotely
Or it may be, but you would rather keep the port closed
The solution is the Nagios Remote Plugin Executor: NRPE
On the monitoring server: aptitude install nagios-nrpe-plugin
On the monitored systems: aptitude install nagios-nrpe-server nagios-plugins
Then, allow the monitoring server to connect in /etc/nagios/nrpe.cfg
You may now define probes in /etc/nagios/nrpe.cfg
By default, the monitoring server cannot change a probe’s arguments, but you may decide to allow it
In that case, set dont_blame_nrpe=1
Nagios: nrpe config
Write a plugin that tests whether a controversial package has been installed
In nrpe_local.cfg:
command[check_openjdk]=/usr/lib/nagios/plugins/check_openjdk
In the nagios config:
define service{
    host_name localhost
    service_description Magento
    check_command check_nrpe_1arg!check_openjdk
    use generic-service
}
collectd
Collectd is a small monitoring tool
It uses RRD with a very fast default refresh rate (10 seconds)
Plugins are mostly written in C and not very easy to adapt
It is pretty easy to compile if you need to
It is perfect for load test monitoring
It is not perfect for long-term monitoring!
collectd: installation
aptitude install collectd
Edit /etc/collectd/collectd.conf
Enable the plugins you need
And configure them
For instance, you can display apache2 and mysqld processes separately
And you might need to enter a mysql password
collectd: visualization
By default, collectd only fills RRD files, and does not show you anything
Fortunately, we are provided with a CGI script for that
You can rsync the rrd files and display them locally to avoid messing with the apache config
/usr/share/doc/collectd/examples
Copy collection3 to /var/www
Install libconfig-general-perl and libregexp-common-perl
Set AllowOverride All for /var/www/collection3
Access http://localhost/collection3/bin
Use the controls to zoom in on the graphs!
Reading graphs
"Traffic lights" monitoring is easy to comprehend
It is working (green-OK)
It might fail soon (yellow-WARNING)
It is failing (red-CRITICAL)
Graphs are harder
Where do I put the threshold?
Is this pattern normal?
What is happening here?
Common pitfalls
Common pitfalls: understanding the metric
What is it I am reading?
Load average: what does it mean, what does it NOT mean
CPU usage: user vs system
Memory
Usually extremely confusing
Simple available/free graphs are wrong by definition (because memory management is not simple)
Graphs that show caches and buffers are probably more reliable
How is the value obtained ?
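The memory pitfall can be illustrated with the kind of figures found in /proc/meminfo on Linux. The sample below is made up for the example, and the helper names are hypothetical. Adding buffers and cache to free memory is only a rough approximation, since not all cached memory can be reclaimed.

```python
SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
MemFree:          500000 kB
Buffers:          300000 kB
Cached:          4200000 kB
"""


def parse_meminfo(text):
    """Parse `Name:   value kB` lines into a dict of integers (in kB)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if rest.strip():
            fields[name.strip()] = int(rest.split()[0])
    return fields


def free_views(fields):
    """Free memory alone, then free memory plus buffers and page cache."""
    naive_free = fields["MemFree"]
    with_caches = naive_free + fields["Buffers"] + fields["Cached"]
    return naive_free, with_caches


if __name__ == "__main__":
    print(free_views(parse_meminfo(SAMPLE_MEMINFO)))
```

With these figures a naive "free" graph shows about 6% of memory available, while most of the rest is cache that the kernel can give back when applications need it.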
Common pitfalls: monitoring tool limitations
Sampling rate
Frequent spikes will not appear
Gauge or counter?
The probe is just another process: it may hang if the server is under heavy load
Holes in the graph are not a cause for alarm in small amounts
Before reading a graph, check the date: reading stale data is useless
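The gauge-or-counter question matters because a counter only makes sense once converted to a rate between two samples. The sketch below assumes a 32-bit counter that wraps around; the function name and the sample values are hypothetical.

```python
def counter_to_rates(samples, wrap_at=2**32):
    """Convert (timestamp, counter) samples into per-second rates."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        delta = c1 - c0
        if delta < 0:  # the counter wrapped around between the two samples
            delta += wrap_at
        rates.append(delta / (t1 - t0))
    return rates


if __name__ == "__main__":
    # Hypothetical byte counter read every 300 seconds.
    print(counter_to_rates([(0, 1000), (300, 4000), (600, 4000)]))
```

The rate is an average over the whole interval: 3000 bytes sent in a 10-second burst still show up as 10 bytes per second over 5 minutes, which is why short spikes disappear at low sampling rates.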
Common pitfalls: graphing limitations
Always check the scales (time scale at top, metric vertical scale, min, max and average values at bottom)
A huge spike will completely "crush" the graph
Adjust the axes if necessary
Common pitfalls: interpretation
This weird-looking pattern that happened at the same time as the application bug might just be perfectly normal
Check the history of the graphs: does the pattern repeat itself? Regularly?
You should be able to see your backups
When performance-enhancing changes enter production, check the graphs!
When "we swear it won’t affect performance" changes enter production, double-check the graphs!
Real-life examples
A website has TV coverage on a high-audience channel
TV commercial impact 1/4 : bandwidth usage
Context: a merchant website is mentioned in a popular TV show.
TV commercial impact 2/4 : varnish hit/miss count
TV commercial impact 3/4 : what to read
What do we read in those two graphs:
1 HTTP hits per second climb from less than 500/s to almost 5000/s (a 10-fold rise)
2 In the varnish graph we clearly see that the cache hits scale (green), not the misses (orange).
3 The burst doesn’t last: people watching TV visit immediately or forget. Most don’t stay long on the site.
4 Network usage grows 10-fold too. The average is around 20Mb/s, the 95th percentile is around 40Mb/s, and here we reached a maximum of 352Mb/s.
5 At no time do we see a "flat" zone in either graph. Nothing saturated, meaning that nothing limited the positive impact of the TV coverage.
TV commercial impact 4/4 : analysis
Point 2 above is very good: it means that varnish is able to cache most hits, and the rate of non-cached hits does not grow with traffic. This is the best situation one can wish for.
Prerequisites: the application is designed and developed with varnish in mind, and developers understand what the gain is
Probable outcome: the platform can handle massive traffic with no sweat.
No network saturation means the website’s users probably saw no visible symptom of slowness.
Real-world outcome: this particular website successfully handled its most active day (and highest daily income to date).
Important news impact on editorial website
Big news on news-site 1/3 : varnish hit/miss count
Context: newspaper-like website with eZ publish. Muammar Gaddafi dies that day. The daily evening newsletter links to the article.
Big news on news-site 2/3 : what to read
Highlighted on the graph, from left to right:
1 The previous day's after-lunch spike
2 This morning's spike (similar in height)
3 The massive afternoon spikes, when the news spreads on the web, Facebook and elsewhere
4 The even more massive traffic spike after the newsletter was sent
Learn how your users behave. On news websites, the typical pattern is: people open their favorite website once in the morning when they arrive at work and once in the afternoon after lunch. Those periods are typical high-traffic moments.
The green/orange ratio is similar to the previous example: the scaling is absorbed by cached hits.
Big news on news-site 3/3 : analysis
As in the previous example, varnish handled the scaling by caching most hits.
Real-world outcome: this particular website behaved correctly throughout.
Hardware limit crashes a website
Firewall limit 1/4 : network bandwidth graph
Context: a Magento website switches to using memcache for storing data and sessions. A bug in the application causes an endless loop of lookups/writes in the cache. Heavy traffic kills the firewall.
Firewall limit 2/4 : memcache hit rate graph
Firewall limit 3/4 : what to read
1 On the network graph we can see several things
The flat zone indicates the network traffic limit reached by the firewall (around 600Mb/s)
Around 6pm we can see the huge rise in network traffic (3-fold) and the instant saturation (caused by the application loop).
Around 11pm memcache was deactivated in the application’s configuration.
2 On the memcache graph, the cache hit rate is high (around one thousand), which is good
Every time a cache is used instead of a database or file, it is faster for everyone (server performance and client response time)
The cache impact was therefore positive, but the network limit crippled the architecture and forced a fallback to a slower solution.
Firewall limit 4/4 : analysis
1 A flat zone in most graphs is a sign of a problem: healthy metrics are always going up or down.
2 Both graphs above show flat zones: we can see the network problem and the application bug.
3 After the firewall crash, MySQL servers located in the same segment as memcache became unreachable, causing the website to go dark.
4 Such graphs, used with knowledge of the incident chronology, can help investigate and understand the cause of an incident post mortem, but rarely help to understand it in real time.
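Spotting a flat zone can be partly automated. The sketch below flags runs of samples that stay close to the maximum of the series, which is what saturation looks like; the function name, the 2% tolerance and the minimum run length are assumptions, and the traffic values are made up.

```python
def flat_zones(values, tolerance=0.02, min_length=3):
    """Return (start, end) index pairs where values stay within `tolerance`
    (relative) of the series maximum for at least `min_length` samples."""
    peak = max(values)
    floor = peak * (1 - tolerance)
    zones, start = [], None
    for index, value in enumerate(values):
        if value >= floor:
            if start is None:
                start = index
        else:
            if start is not None and index - start >= min_length:
                zones.append((start, index - 1))
            start = None
    if start is not None and len(values) - start >= min_length:
        zones.append((start, len(values) - 1))
    return zones


if __name__ == "__main__":
    traffic = [200, 210, 190, 600, 598, 601, 599, 600, 220]  # Mb/s, hypothetical
    print(flat_zones(traffic))
```

A genuine spike touches its maximum for a single sample and is ignored, whereas a plateau pinned against a hardware limit is reported.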
Custom monitoring
Custom metrics 1/2
Custom metrics 2/2
The graph above shows the moving average of the number of sales over 5, 10 and 30 days
A simple SQL query can be an interesting metric.
Don’t overdo it: the query or process must remain very fast and light for the server to execute.
select 'five_day.value', count(*)/5
from sales_flat_order
where created_at > date_sub(now(), interval 5 day)
union
select 'ten_day.value', count(*)/10
from sales_flat_order
where created_at > date_sub(now(), interval 10 day)
union
select 'thirty_day.value', count(*)/30
from sales_flat_order
where created_at > date_sub(now(), interval 30 day)