Making Your Logs Work for You: Drupal Escalation and Disaster Recovery

Post on 06-Aug-2015

When sh*t hits the fan, what do you do?

How to make your logs work for you in times of website sadness

DrupalCon LA 2015

Pantheon.io

Most common causes of downtime

30% Weather/Environment

33% IT/Equipment

34% Cyber Attack

The top cause of unplanned outages?

48% Human Error

52% of those surveyed believe ALL or most of unplanned outages can be avoided.

What does downtime cost you and your business?

How does downtime affect your bottom line?

37%

22%

15%

10%

9%

7%

Cost associated with reputation and brand damage

Revenues lost because of system availability problems

Loss of user productivity and increased frustration

Cost associated with compliance or regulatory failure

Cost of forensics to determine the root causes of disruptions

Cost of technical support to restore systems to an operational state

Meet the Speaker

Timani Tunduwani

Customer Support Manager

PANTHEON RUNS 100,000 WEBSITES

WE DO THIS ALL THE TIME

Agenda

1. Overview
2. Logging in Drupal & PHP
3. Incident Planning & Management
4. Live Demo
5. Questions

Website is down. Why?

How do we get it back up?

Is the infrastructure down?

Is Drupal Sad and Broken again??

What’s going on?

Do I fix it or Pantheon?

WTF?! FIX IT?

What happens when your website is down?

EVERY SECOND YOU DON’T KNOW IS ANOTHER SECOND YOUR WEBSITE IS DOWN

What would you do if your website went down… right now?

Website Owner

Who do I contact?

YOU NEED A PLAN! SERIOUSLY!

Website Developer

Project Manager

Drupal Support & Maintenance Team

Application log management

<?php watchdog($type, $message, $variables = array(), $severity = WATCHDOG_NOTICE, $link = NULL); ?>

A 5 step plan for success

1. Standardize
2. Centralize
3. Aggregate
4. Analyze
5. Alert

Current limitations of watchdog

1. Semi-arbitrary log format
2. Drupal 8 using PSR-3
3. Cannot have saved searches beyond sticky search
4. No reporting dashboard for post mortem
5. No stack traces
6. Not portable. Have you tried to export the watchdog table?

MariaDB [pantheon]> select wid, type, message, variables from watchdog limit 100 \G
*************************** 1. row ***************************
wid: 1830682
type: php
message: %type: !message in %function (line %line of %file).
variables: a:6:{s:5:"%type";s:6:"Notice";s:8:"!message";s:26:"Undefined index: authorize";s:9:"%function";s:40:"FeedsEntityProcessor->entitySaveAccess()";s:5:"%file";s:108:"/srv/bindings/aa7491e7ef954a8fb4f9dc41abccab80/code/sites/all/modules/feeds/plugins/FeedsEntityProcessor.inc";s:5:"%line";i:77;s:14:"severity_level";i:5;}

Drupal watchdog table
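
The row above stores the message template and its placeholders separately, which is part of why the table is hard to read or export. As a rough sketch, assuming plain PHP outside Drupal, a row can be turned back into a readable line by unserializing `variables` and substituting the placeholders. The function name `render_watchdog_row` is hypothetical, and the sketch does plain text substitution only; it skips the escaping Drupal applies when it displays these messages.

```php
<?php
// Hypothetical helper (not part of Drupal): render one watchdog row as text.
function render_watchdog_row(string $message, string $serializedVariables): string
{
    // Refuse to instantiate objects from untrusted serialized data.
    $variables = unserialize($serializedVariables, ['allowed_classes' => false]);
    if (!is_array($variables)) {
        return $message; // Nothing to substitute.
    }

    $replacements = [];
    foreach ($variables as $key => $value) {
        // Keep only placeholder-style keys (%name, !name, @name) with scalar values.
        if (is_string($key) && preg_match('/^[%!@]/', $key) && is_scalar($value)) {
            $replacements[$key] = (string) $value;
        }
    }
    return strtr($message, $replacements);
}

// Sample data rebuilt from the row shown above.
$variables = serialize([
    '%type' => 'Notice',
    '!message' => 'Undefined index: authorize',
    '%function' => 'FeedsEntityProcessor->entitySaveAccess()',
    '%file' => '/srv/bindings/aa7491e7ef954a8fb4f9dc41abccab80/code/sites/all/modules/feeds/plugins/FeedsEntityProcessor.inc',
    '%line' => 77,
    'severity_level' => 5,
]);

echo render_watchdog_row('%type: !message in %function (line %line of %file).', $variables), PHP_EOL;
```

Keys without a placeholder prefix, such as `severity_level`, are ignored by the substitution.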

Overview

1. PHP Framework Interop Group (PHP-FIG)
2. Monolog
   2.1. Chain of responsibility logging pattern
   2.2. Core concepts

Proposing a Standards Recommendation (PSR)

❏ PSR-0: Autoloading Standard
❏ PSR-1: Basic Coding Standard
❏ PSR-2: Coding Style Guide
❏ PSR-3: Logger Interface
❏ PSR-4: Autoloader

PSR-3: A common interface for logging libraries

The goal is to allow libraries to receive a Psr\Log\LoggerInterface object and write logs to it in a simple and universal way.
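
PSR-3 log messages may carry `{placeholder}` tokens that are filled from a context array, which keeps the message text stable and searchable. A minimal sketch of that substitution in plain PHP follows; the function name `interpolate` and the sample values are illustrative assumptions, and a real logging library would do this inside its own implementation of the interface.

```php
<?php
// Hypothetical helper: fill PSR-3 style {placeholders} from a context array.
function interpolate(string $message, array $context = []): string
{
    $replacements = [];
    foreach ($context as $key => $value) {
        // Only values that can be cast to string are substituted.
        if (is_scalar($value) || (is_object($value) && method_exists($value, '__toString'))) {
            $replacements['{' . $key . '}'] = (string) $value;
        }
    }
    return strtr($message, $replacements);
}

// Sample values are made up for illustration.
echo interpolate('User {username} failed to log in from {ip}', [
    'username' => 'editor1',
    'ip' => '203.0.113.7',
]), PHP_EOL;
```

Placeholders with no matching context key are left in the message untouched.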

Logging Levels - RFC 5424

Level      Code  Description
DEBUG      100   Detailed debug information.
INFO       200   Interesting events. Examples: User logs in, SQL logs.
NOTICE     250   Normal but significant events.
WARNING    300   Exceptional occurrences that are not errors.
ERROR      400   Runtime errors that do not require immediate action.
CRITICAL   500   Critical conditions.
ALERT      550   Action must be taken immediately.
EMERGENCY  600   Emergency: system is unusable.
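
The level names above come from RFC 5424, while the three-digit codes are the values Monolog assigns to them. Drupal 7's watchdog stores the RFC 5424 severity itself, from 0 (emergency) to 7 (debug), so moving between the two needs a lookup. A small sketch, assuming plain PHP; the constant and function names are hypothetical:

```php
<?php
// Drupal 7 watchdog severities follow RFC 5424: 0 (emergency) through 7 (debug).
// The values on the right are the numeric codes from the table above.
const SEVERITY_TO_CODE = [
    0 => 600, // emergency
    1 => 550, // alert
    2 => 500, // critical
    3 => 400, // error
    4 => 300, // warning
    5 => 250, // notice
    6 => 200, // info
    7 => 100, // debug
];

// Hypothetical helper: translate a watchdog severity into a table code.
function severity_to_code(int $severity): int
{
    if (!array_key_exists($severity, SEVERITY_TO_CODE)) {
        throw new InvalidArgumentException("Unknown severity: $severity");
    }
    return SEVERITY_TO_CODE[$severity];
}

echo severity_to_code(5), PHP_EOL; // severity 5 is "notice", prints 250
```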

Monolog + Composer + Drupal

Monolog sends your logs to files, sockets, inboxes, databases and various web services

Chain of responsibility pattern
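
In this pattern a log record is offered to a list of handlers in order. Each handler decides whether the record meets its minimum level, and a handler that accepts it can either let it continue down the chain or stop it there. The sketch below shows the idea in plain PHP. The classes `SketchHandler` and `SketchLogger` are illustrative only and are not Monolog's own classes; the numeric levels are the codes from the logging levels slide.

```php
<?php
// Illustrative classes only; these are not Monolog's own classes.
final class SketchHandler
{
    /** @var string[] Messages this handler accepted. */
    public $received = [];
    private $minLevel;
    private $bubble;

    public function __construct(int $minLevel, bool $bubble = true)
    {
        $this->minLevel = $minLevel;
        $this->bubble = $bubble;
    }

    // Returns true when the record was handled and must not travel further.
    public function handle(int $level, string $message): bool
    {
        if ($level < $this->minLevel) {
            return false; // Below this handler's threshold: pass it along.
        }
        $this->received[] = $message;
        return !$this->bubble;
    }
}

final class SketchLogger
{
    private $handlers;

    public function __construct(array $handlers)
    {
        $this->handlers = $handlers;
    }

    public function log(int $level, string $message): void
    {
        foreach ($this->handlers as $handler) {
            if ($handler->handle($level, $message)) {
                break; // A handler stopped the chain.
            }
        }
    }
}

// Critical records (500+) stop at the pager; lower records from INFO (200) up
// fall through to the file handler.
$pager = new SketchHandler(500, false);
$file = new SketchHandler(200);
$logger = new SketchLogger([$pager, $file]);

$logger->log(200, 'User logged in');
$logger->log(500, 'Database unreachable');

echo count($pager->received), ' paged, ', count($file->received), ' written to file', PHP_EOL;
```

Giving the pager handler a bubbling flag of `true` instead would send critical records to both destinations.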

Core Concepts

1. Logger
2. Handler
3. Log Levels
4. Formatter
5. Processor
6. Utilities

Log system overview

Application Performance Monitoring

Centralizing application logs

Monolog

Features

• On-Call Scheduling
• Auto-Escalation
• International Reach
• Collaboration
• Advanced Analytics
• Reliability
• Monitoring Aggregation
• Easy Setup
• Effective Alerting
• Full stack visibility

Cloud-based centralized log management

Centralizing application logs

Monolog

Features

• Built-in alerting
• Customized dashboards
• Persistent workspaces
• Multiple integrations available
• Advanced Analytics
• Overage protection
• Agentless log collection
• Centralized logging
• Supports multiple log formats
• Automated event parsing
• Powerful search capabilities
• Unlimited saved searches

Incident Management

Incident Response Goals

1. Verify that an incident occurred.
2. Maintain or restore business continuity.
3. Reduce the incident impact.
4. Determine how the attack was done or the incident happened.
5. Prevent future attacks or incidents.
6. Improve security and incident response.
7. Prosecute illegal activity.
8. Keep management informed of the situation and response.

Incident planning

Step 1: Form a Collaborative Planning Team
Step 2: Understand the Situation
Step 3: Determine Goals and Objectives
Step 4: Plan Development
Step 5: Plan Prep, Review & Approval
Step 6: Plan Implementation & Maintenance

Incident management system

IT incident management platform

Features

1. Reliability
2. Monitoring Aggregation
3. Easy Setup
4. Effective Alerting
5. Mobile Incident Management
6. Escalation Policies

1. On-Call Scheduling
2. Auto-Escalation
3. International Reach
4. Collaboration
5. Advanced Analytics

Slack HQ communication platform

Live Demo #1

Time to break something. YAY!

Fin!
