We'll Do It Live! - Joseph Pierri, PagerDuty - DevOpsDays Tel Aviv 2016

Post on 08-Jan-2017

TRANSCRIPT

joseph@pagerduty.com

“We’ll do it Live!”

Testing your Software in Production

Joseph Pierri

People who don’t take risks generally make about two big mistakes a year. People who do take risks generally make about two big mistakes a year. - Peter Drucker

Testing: Problem Statement

Optimize User Experience

Minimize Operational Pain

Constraint: Developer Time

Conventional Approach

Production

Local Testing

Staging

Load Test

Staging

Benefits

• Sort of prod-like

• Integration

Challenges

• Contention

• Difficult to Scale

• Often Broken

Load Testing

Benefits

• Realistic Data

• Realistic Fleet

• Realistic Traffic

Challenges

• Maintaining Data

• Scaling Fleet

• Traffic…

Less Conventional Approaches

Local Containers

Disposable Environments

Test in Production

Local Containers

• No contention issues

• Easy integration testing

• Some scalability issues

Disposable Environments

• Codified environment

• Spun up and disposed of on demand

Testing in Production

• Very production-like!

• Workload & environment

• Requires risk mitigation techniques…

Reducing Risk

Know when Something Breaks

Limit the Impact

Rolling Back

Know when Something Breaks

Monitoring, Logging, Alerting

(Others)

Know when Something Breaks

Production End-to-End Functional Testing

[Diagram: E2E Suite → Software System; FAIL? → ALERT]
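
The "suite fails, so alert" loop on this slide can be sketched in a few lines. This is a minimal illustration, not the speaker's tooling: the check names (`login_works`, `notification_delivered`) and the `send_alert` callback are hypothetical stand-ins for real production probes and a real paging integration.

```python
from typing import Callable, Dict, List


def run_e2e_suite(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every check and return the names of the ones that failed."""
    failures = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            # A check that raises (timeout, connection error) counts as a failure.
            ok = False
        if not ok:
            failures.append(name)
    return failures


def alert_on_failures(failures: List[str], send_alert: Callable[[str], None]) -> bool:
    """Send one alert summarizing all failures; return True if an alert was sent."""
    if not failures:
        return False
    send_alert("E2E failures in production: " + ", ".join(failures))
    return True


# Hypothetical checks standing in for real end-to-end probes.
def login_works() -> bool:
    return True


def notification_delivered() -> bool:
    raise TimeoutError("no delivery within deadline")


sent_alerts: List[str] = []  # stand-in for a paging system
failures = run_e2e_suite({"login": login_works, "notification": notification_delivered})
alerted = alert_on_failures(failures, sent_alerts.append)
```

In practice a suite like this would run on a schedule against production, so a failure is noticed even when no deploy is in progress.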

Limit the Impact

Canary Deploys

[Diagram: Hosts running V and V+]
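
A canary deploy, as pictured, puts the new version (V+) on a small slice of hosts while the rest stay on V. The sketch below shows one way to pick that slice and to judge the canary. The host names, the 10% fraction, and the error-rate tolerance are assumptions chosen for illustration, not values from the talk.

```python
import math
from typing import List, Tuple


def pick_canaries(hosts: List[str], fraction: float) -> Tuple[List[str], List[str]]:
    """Split hosts into (canary hosts for V+, remaining hosts on V).

    Always selects at least one canary so the new version sees real traffic.
    """
    count = max(1, math.ceil(len(hosts) * fraction))
    return hosts[:count], hosts[count:]


def canary_is_healthy(canary_error_rate: float,
                      baseline_error_rate: float,
                      tolerance: float = 0.01) -> bool:
    """Treat the canary as healthy if its error rate is within `tolerance`
    of the error rate seen on hosts still running the old version."""
    return canary_error_rate <= baseline_error_rate + tolerance


hosts = [f"host-{i}" for i in range(1, 11)]  # hypothetical fleet of 10 hosts
canaries, remaining = pick_canaries(hosts, fraction=0.1)
```

Comparing the canary against the rest of the fleet, rather than against a fixed threshold, keeps the check meaningful when the baseline error rate drifts.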

Limit the Impact

Feature Flags

[Diagram: Users → Software Service, serving V or V+]
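
Where a canary limits impact by host, a feature flag limits it by user: the same service serves V to most users and V+ to a chosen share. The following is a minimal sketch of a percentage rollout. The flag name `new-code-path` and the hashing scheme are illustrative assumptions, not a description of any particular flagging product.

```python
import hashlib


def bucket(user_id: str, flag: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(user_id: str, flag: str, rollout_percent: int) -> bool:
    """A user sees the flag if their bucket falls below the rollout percentage."""
    return bucket(user_id, flag) < rollout_percent


def handle_request(user_id: str, rollout_percent: int) -> str:
    """Serve the new code path (V+) only to users inside the rollout."""
    if is_enabled(user_id, "new-code-path", rollout_percent):
        return "V+"
    return "V"
```

Because the bucket is derived from a hash rather than a random draw, a given user gets the same answer on every request, and raising the percentage only ever adds users. Setting it back to 0 acts as an immediate off switch without a deploy.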

Rolling Back

Deployment Pipeline

Test, Build

Canary

Deploy

Rollback
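
The pipeline stages on this slide can be sketched as a single function in which rollback is an ordinary, automated step. This is an illustrative outline under stated assumptions: the callbacks (`test_build`, `canary`, `deploy`, `healthy`) are hypothetical hooks that a real pipeline would back with CI jobs, canary analysis, and monitoring.

```python
from typing import Callable, List, Tuple


def run_pipeline(new: str,
                 current: str,
                 test_build: Callable[[str], bool],
                 canary: Callable[[str], bool],
                 deploy: Callable[[str], None],
                 healthy: Callable[[str], bool]) -> Tuple[str, List[str]]:
    """Run Test/Build -> Canary -> Deploy, rolling back on failure.

    Returns the version left live and the list of steps that ran.
    """
    steps = ["test_build"]
    if not test_build(new):
        return current, steps  # nothing shipped, nothing to roll back

    steps.append("canary")
    if not canary(new):
        steps.append("rollback")  # canary hosts go back to the current version
        return current, steps

    steps.append("deploy")
    deploy(new)
    if not healthy(new):
        steps.append("rollback")
        deploy(current)  # rolling back is just deploying the previous version
        return current, steps
    return new, steps


deployed: List[str] = []  # records what was pushed to the fleet
live, steps = run_pipeline(
    "v2", "v1",
    test_build=lambda v: True,
    canary=lambda v: True,
    deploy=deployed.append,
    healthy=lambda v: False,  # simulate a bad full deploy
)
```

Treating rollback as "deploy the previous version" means it exercises the same path as every other deploy, so it is already well tested when it is needed.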

Culture

Quantify!

Risk Tolerance

“You built it, you run it”

TiP (Testing in Production) - Real World

Good Fit

• New feature

• Incremental Change

• New service

Difficult

• Mobile app

• Bank machine

• Rocket

Measure Everything!

# deploys by service
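
A metric like "# deploys by service" needs little more than a log of deploy events. The sketch below assumes a hypothetical list of (service, date) records; the service names and dates are made up for illustration and are not data from the talk.

```python
from collections import Counter
from typing import Iterable, Tuple


def deploys_by_service(deploy_log: Iterable[Tuple[str, str]]) -> Counter:
    """Count deploy events per service from (service, date) records."""
    return Counter(service for service, _date in deploy_log)


# Hypothetical deploy log; a real one would come from the deployment pipeline.
deploy_log = [
    ("notifications", "2016-11-01"),
    ("kafka-producer", "2016-11-01"),
    ("kafka-producer", "2016-11-02"),
    ("kafka-producer", "2016-11-03"),
]
counts = deploys_by_service(deploy_log)
```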

A Tale of Two Services

People who don’t take risks generally make about two big mistakes a year. People who do take risks generally make about two big mistakes a year. - Peter Drucker

A Tale of Two Services

• 2014: New backend service for notifications

• Contention headaches

• Load Test fleet matches prod

A Tale of Two Services

• 2016: New Kafka producer service

• Containerized

• “Prod is the best LT anyways”

Bringing it Together

1 Conventional Approaches

2 Testing in Production

3 Reducing Risk

4 Real World

A Tale of a Bridge

Silver Bridge (1928-1967)

A Tale of a Bridge

“[The Silver Bridge] legacy should be to remind engineers to proceed always with the utmost caution, ever mindful of the possible existence of unknown unknowns and the potential consequences of even the smallest design decisions” - Henry Petroski

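Software ≠ Bridge

• Deploys: One (Bridge) vs. Many (Software)

• Partial Deploys? No (Bridge) vs. Yes (Software)

• Rollbacks: Difficult (Bridge) vs. Easy (Software)

• Bad Deploys? Disaster (Bridge) vs. Manageable (Software)

• Approach: Never Fail (Bridge) vs. Fail Fast (Software)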

Questions
