We'll Do It Live! - Joseph Pierri, PagerDuty - DevOpsDays Tel Aviv 2016
Posted on 08-Jan-2017
joseph@pagerduty.com
“We’ll do it Live!”
Testing your Software in Production
Joseph Pierri
People who don’t take risks generally make about two big mistakes a year. People who do take risks generally make about two big mistakes a year. - Peter Drucker
Testing: Problem Statement
Optimize User Experience
Minimize Operational Pain
Constraint: Developer Time
Conventional Approach
Local Testing → Staging → Load Test → Production
Staging
Benefits
• Sort of prod-like
• Integration
Challenges
• Contention
• Difficult to scale
• Often broken
Load Testing
Benefits
• Realistic data
• Realistic fleet
• Realistic traffic
Challenges
• Maintaining data
• Scaling fleet
• Traffic…
Less Conventional Approaches
Local Containers
Disposable Environments
Test in Production
Local Containers
• No contention issues
• Easy integration testing
• Some scalability issues
Disposable Environments
• Codified environment
• Spun up and disposed of on demand
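A minimal Python sketch of the disposable-environment idea, assuming a hypothetical `FakeProvisioner` in place of whatever tool actually builds the environment from its codified spec (the `ENV_SPEC` dictionary is likewise illustrative, not from the talk). It shows the lifecycle: the environment exists only for the duration of the `with` block and is torn down even when the tests inside fail.

```python
import uuid
from contextlib import contextmanager
from typing import Dict, Iterator, List, Set

# Hypothetical codified environment spec; in practice this would live in
# version control as a template or manifest.
ENV_SPEC: Dict[str, List[str]] = {"services": ["api", "worker", "db"]}


class FakeProvisioner:
    """Hypothetical stand-in for a real provisioning tool; tracks live envs."""

    def __init__(self) -> None:
        self.live: Set[str] = set()

    def create(self, spec: Dict[str, List[str]]) -> str:
        # A real provisioner would build resources from `spec`;
        # this fake only records a new environment ID.
        env_id = f"env-{uuid.uuid4().hex[:8]}"
        self.live.add(env_id)
        return env_id

    def destroy(self, env_id: str) -> None:
        self.live.discard(env_id)


@contextmanager
def disposable_environment(
    provisioner: FakeProvisioner, spec: Dict[str, List[str]]
) -> Iterator[str]:
    """Spin up an environment from the spec and always tear it down afterwards."""
    env_id = provisioner.create(spec)
    try:
        yield env_id
    finally:
        # Teardown runs even if the tests inside the block raise.
        provisioner.destroy(env_id)


if __name__ == "__main__":
    prov = FakeProvisioner()
    with disposable_environment(prov, ENV_SPEC) as env:
        print("running tests against", env)
    print("live environments after teardown:", prov.live)
```

Wrapping the lifecycle in a context manager makes leaked environments less likely, since teardown does not depend on the tests finishing cleanly.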
Testing in Production
• Very production-like!
• Workload & environment
• Requires risk mitigation techniques…
Reducing Risk
Know when Something Breaks
Limit the Impact
Rolling Back
Know when Something Breaks
Monitoring, Logging, Alerting
(Others)
Know when Something Breaks
Production End-to-End Functional Testing
(Diagram: an E2E suite runs against the production software system; if it fails, it triggers an alert.)
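A rough sketch of a production E2E runner, assuming each check is a hypothetical zero-argument function that returns `True` on success and that `alert` is whatever hook pages the on-call engineer; neither is an API from the talk. It shows the FAIL? → ALERT step: failures, including checks that crash, are collected and reported in a single alert.

```python
from typing import Callable, List, Tuple

# A named check returns True on success. In a real suite each check would
# drive the production system end to end; here they are placeholders.
Check = Tuple[str, Callable[[], bool]]


def run_e2e_suite(checks: List[Check], alert: Callable[[str], None]) -> List[str]:
    """Run every check against production; alert once if any of them fail."""
    failures: List[str] = []
    for name, check in checks:
        try:
            passed = check()
        except Exception:
            # A crashing check is treated as a failure, not silently skipped.
            passed = False
        if not passed:
            failures.append(name)
    if failures:
        alert("Production E2E failure: " + ", ".join(failures))
    return failures


if __name__ == "__main__":
    # Hypothetical check names, for illustration only.
    checks: List[Check] = [
        ("api_responds", lambda: True),
        ("notification_delivered", lambda: False),
    ]
    run_e2e_suite(checks, alert=print)
```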
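Limit the Impact
Canary Deploys
(Diagram: the new version V+ runs on a small subset of hosts while the remaining hosts stay on the current version V.)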
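A minimal sketch of choosing canary hosts, assuming hosts are identified by name and that a fixed fraction of the fleet gets the new version V+ first; `split_canary` is a hypothetical helper, not part of any particular deploy tool.

```python
import math
from typing import List, Tuple


def split_canary(hosts: List[str], fraction: float) -> Tuple[List[str], List[str]]:
    """Split hosts into (canary, stable).

    Canary hosts receive the new version V+; stable hosts keep version V.
    At least one host is chosen as a canary whenever any hosts exist.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    ordered = sorted(hosts)  # deterministic choice, so reruns pick the same hosts
    n_canary = max(1, math.floor(len(ordered) * fraction))
    return ordered[:n_canary], ordered[n_canary:]


if __name__ == "__main__":
    fleet = [f"web-{i:02d}" for i in range(10)]
    canary, stable = split_canary(fleet, 0.25)
    print("V+ on:", canary)
    print("V  on:", len(stable), "hosts")
```

Sorting before slicing keeps the choice deterministic, so a rerun of the same deploy targets the same canaries.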
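Limit the Impact
Feature Flags
(Diagram: one software service serves its users; a flag sends some users to the new behavior V+ and the rest to the current behavior V.)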
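One common way to implement a percentage-based feature flag is to hash the user ID into a stable bucket; the sketch below assumes that approach and uses hypothetical function and flag names rather than any specific feature-flag product. Users whose bucket falls under the rollout percentage see V+, everyone else sees V.

```python
import hashlib


def rollout_bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """True if this user should see the new behavior (V+) for this flag.

    The same user always lands in the same bucket, so raising
    rollout_percent only adds users; it never flips existing ones back to V.
    """
    return rollout_bucket(flag, user_id) < rollout_percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    # "new-notification-path" is a hypothetical flag name.
    on = sum(is_enabled("new-notification-path", u, 10) for u in users)
    print(f"{on} of {len(users)} users get V+")
```

Hashing the flag name together with the user ID keeps different flags from always selecting the same group of users.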
Rolling Back
Deployment Pipeline
(Diagram: Test, Build → Canary → Deploy, with a Rollback path back to the previous version.)
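A minimal sketch of the pipeline's control flow, assuming hypothetical callables for the build, deploy, health-check, and rollback steps; a real pipeline would invoke its CI/CD tooling at those points. It shows where rollback fits: a failed health check at the canary or full-fleet stage reverts that stage to the previous version V instead of continuing.

```python
from typing import Callable


def run_pipeline(
    test_and_build: Callable[[], bool],
    deploy: Callable[[str], None],    # deploy V+ to "canary" or "fleet"
    healthy: Callable[[str], bool],   # health check for the given stage
    rollback: Callable[[str], None],  # revert the given stage to version V
) -> str:
    """Test/build, then canary, then full deploy; roll back on a failed check."""
    if not test_and_build():
        return "stopped at test/build"
    for stage in ("canary", "fleet"):
        deploy(stage)
        if not healthy(stage):
            rollback(stage)
            return f"rolled back at {stage}"
    return "deployed"


if __name__ == "__main__":
    result = run_pipeline(
        test_and_build=lambda: True,
        deploy=lambda stage: print("deploy V+ to", stage),
        healthy=lambda stage: stage != "canary",  # simulate a bad canary
        rollback=lambda stage: print("roll back", stage, "to V"),
    )
    print(result)
```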
Culture
Quantify!
Risk Tolerance
“You built it, you run it”
TiP (Testing in Production) - Real World
Good fit
• New feature
• Incremental change
• New service
Difficult
• Mobile app
• Bank machine
• Rocket
People who don’t take risks generally make about two big mistakes a year. People who do take risks generally make about two big mistakes a year. - Peter Drucker
A Tale of Two Services
• 2014: New backend service for notifications
• Contention headaches
• Load Test fleet matches prod
A Tale of Two Services
• 2016: New Kafka producer service
• Containerized
• “Prod is the best LT anyways”
Bringing it Together
1. Conventional Approaches
2. Testing in Production
3. Reducing Risk
4. Real World
A Tale of a Bridge
Silver Bridge (1928-1967)
A Tale of a Bridge
“[The Silver Bridge] legacy should be to remind engineers to proceed always with the utmost caution, ever mindful of the possible existence of unknown unknowns and the potential consequences of even the smallest design decisions.” - Henry Petroski
Software ≠ Bridge
                    Bridge        Software
Deploys             One           Many
Partial deploys?    No            Yes
Rollbacks           Difficult     Easy
Bad deploys?        Disaster      Manageable
Approach            Never fail    Fail fast