Enterprise Drupal Application & Hosting Infrastructure Level Monitoring

Upload: daniel-kanchev

Post on 06-Apr-2017

TRANSCRIPT

Page 1: Enterprise Drupal Application & Hosting Infrastructure Level Monitoring

Enterprise Drupal Application & Hosting Infrastructure Level Monitoring

Daniel Kanchev

Senior Site Reliability Engineer

@dvkanchev

Enterprise Drupal Hosting Characteristics

○ Consists of multiple servers

○ Provides high availability

○ Offers auto scalability

○ Requires multiple services to work as expected

○ Really expensive

○ Nobody wants to manage this sh*t :)

Hosting Types Complexity

○ Shared Hosting Service

○ Single Virtual Server

○ Single Dedicated Server

○ PaaS

○ Custom Private/Public Clouds

Page 9: Enterprise Drupal Application & Hosting Infrastructure Level Monitoring

○ ElasticSearch/Solr

○ Redis/Memcached

○ GraphQL

○ MongoDB

○ Nodejs

○ Gearman

○ CI systems

One Monitoring To Rule Them All

• Website Monitoring

• Hosting Infrastructure Monitoring

Website Monitoring Architecture

Website

London Amsterdam Munich

503 ISE

Incidents

○ Critical Incident - website is down from all locations

○ Major Incident - website is down from a single location; MySQL replication is broken; PHP fatal errors recorded in the logs; read-only file system issue

○ Minor Incident - Memcached/Redis on a single server is down

○ Notice Incident - web node X is running out of space; PHP warnings recorded in the logs
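
The website-level part of the incident levels above can be sketched as a small classifier over the per-location probe results. This is a minimal illustration, not the speaker's tooling: the `Severity` enum, the `classify_website_checks` function, and the rule that a failure from more than one (but not all) locations is also treated as major are assumptions made for the example.

```python
from enum import Enum


class Severity(Enum):
    """Incident levels named on the slide, plus OK for a healthy state."""
    OK = 0
    NOTICE = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4


def classify_website_checks(results: dict[str, bool]) -> Severity:
    """Map per-location probe results (True = site reachable) to a severity.

    Hypothetical rule set: down everywhere is critical, down from some
    locations is major, otherwise OK.
    """
    if not results:
        raise ValueError("at least one probe location is required")
    failed = [location for location, up in results.items() if not up]
    if len(failed) == len(results):
        return Severity.CRITICAL  # website is down from all locations
    if failed:
        return Severity.MAJOR  # down from at least one location, but not all
    return Severity.OK
```

For instance, a result set in which only the Amsterdam probe fails would be classified as `Severity.MAJOR` under these assumptions.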

Core Principles

○ Log all events and archive them. Write postmortem reports

○ Check every single incident - even minor ones and notices

○ Define performance limits and regularly check reports

○ Beware of cascade failures

○ Always strive to go back to pre-incident state

○ Check one thing at a time and return “OK” or “Failure”
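
The last principle, one check per thing with a plain "OK" or "Failure" result, could look roughly like the sketch below. It tests only whether a TCP connection can be opened; the function name `check_tcp_port` and the two-second default timeout are assumptions for illustration, not something taken from the talk.

```python
import socket


def check_tcp_port(host: str, port: int, timeout: float = 2.0) -> str:
    """Check exactly one thing: can a TCP connection be opened to host:port?"""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return "OK"
    except OSError:
        return "Failure"
```

Keeping each check this narrow means a failing result points at one component (for example, a Redis or Memcached port on a single node) rather than at "the site" in general.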

Examples

Example 1

○ 1 of 5 app servers goes down

○ The other 4 absorb its 20% share of the traffic (about 25% more load on each)

○ Redis caches are invalidated - overload

○ Varnish is restarted by a system administrator to apply a configuration change

○ App servers start to return 503 errors

Example 2

○ MySQL master goes down

○ MySQL slave 1 takes over and at this moment there is no downtime

○ MySQL slave 2 is behind the new master

○ The new MySQL master goes down too - the result is a broken or outdated DB
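
The arithmetic behind the app server example can be made explicit. Assuming traffic is balanced evenly (an assumption, since the slide does not say how load is distributed), the failed server's 20% share of total traffic is split across four survivors, which is a 25% increase relative to what each survivor carried before. The helper name `load_increase_per_survivor` is hypothetical.

```python
def load_increase_per_survivor(total_servers: int, failed: int = 1) -> float:
    """Fractional load increase on each surviving server, assuming even balancing."""
    survivors = total_servers - failed
    if total_servers <= 0 or failed < 0 or survivors <= 0:
        raise ValueError("need at least one surviving server")
    # Each server carried 1/total_servers of the traffic and now carries 1/survivors.
    return total_servers / survivors - 1.0


print(load_increase_per_survivor(5))     # 0.25, i.e. 25% more load per server
print(load_increase_per_survivor(5, 2))  # about 0.67 if a second server fails
```

The jump from one failure to two shows why the slide treats this as the start of a cascade: each further loss raises the load on the remaining servers by a larger step.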

KEY TAKEAWAYS

1. Embrace Failure and Design for Failure

2. Automate Recovery

3. Log all incidents and analyse them

4. Measure and graph the performance of all components

5. Regularly break things on purpose in order to test
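
As one possible reading of "Automate Recovery", the sketch below retries a restart action until a health check passes or a limit is reached. Everything here is hypothetical: `recover_service`, its `check` and `restart` callables, and the default of three attempts are illustration only, and a real setup would also log each attempt so that it can be analysed afterwards.

```python
from typing import Callable


def recover_service(
    check: Callable[[], bool],
    restart: Callable[[], None],
    max_attempts: int = 3,
) -> bool:
    """Restart a service until check() passes or attempts run out.

    Returns True if the service is healthy at the end, False otherwise
    (at which point a human should be paged).
    """
    for _ in range(max_attempts):
        if check():
            return True
        restart()
    # Final verification after the last restart attempt.
    return check()
```

Passing the check and restart logic in as callables keeps the loop testable with fakes, which also makes it easier to break things on purpose in a controlled way.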

RESOURCES

Injecting Failure at Netflix - goo.gl/YE1sEY

What is SRE - goo.gl/2lI8E0

SRE book - goo.gl/bfL2At

Netflix Open Source Software - https://netflix.github.io/

Etsy “Measure Everything” - goo.gl/CPVUT5

JOIN US FOR CONTRIBUTION SPRINTS

First Time Sprinter Workshop - 9:00-12:00 - Room Wicklow 2A

Mentored Core Sprint - 9:00-18:00 - Wicklow Hall 2B

General Sprints - 9:00-18:00 - Wicklow Hall 2A

Evaluate This Session

THANK YOU!

events.drupal.org/dublin2016/schedule

WHAT DID YOU THINK?