2016 - Ignite - Blameless System Design

Blameless System Design, Douglas Land, Vast.com, Inc.

Uploaded by: devopsdaysaustin

Posted on 13-Apr-2017


TRANSCRIPT

Page 1: 2016 - IGNITE - Blameless System Design

Blameless System Design
Douglas Land
Vast.com, Inc.

Page 2

I break systems… a LOT
● Auth
● Syslog
● Chef
● Ambassadors
● Prod frontends

Page 3

Sometimes I ‘break’ systems on purpose...
● Service discovery by Chef
● 90% code in prod
● No shared storage for CloudStack

Sometimes you just need to do things.

Page 4

Higher standards
And yet, I still hold others to a higher standard...

● Servers still on the public internet???
● Created a flat VLAN when we did move to private IPs???
● No centralized management of virtualization infrastructure???
● The only 'shared storage' is via DRBD and ha.d???

Page 5

Technical debtor’s prison
We’re obsessed with technical debt.

Qualifying it:

● Application debt
● Infrastructure debt
● Architecture debt

Quantifying it:

● Size of code base
● Code coverage
● Coupling and cohesion reports
● Cyclomatic complexity
● Halstead complexity measures
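Two of the metrics in the quantifying list are formula-based. As a rough illustration, here is a minimal Python sketch, assuming McCabe's cyclomatic complexity is approximated as one plus the number of decision points found in the syntax tree, and using the standard Halstead formulas: vocabulary n = n1 + n2, length N = N1 + N2, volume V = N × log2(n), difficulty D = (n1 / 2) × (N2 / n2), effort E = D × V. The function names, the sample snippet, and the exact set of nodes counted as decisions are hypothetical choices made for this example; tokenizing source into Halstead operator and operand counts is left out.

```python
import ast
import math

# Hypothetical helpers for illustration only; real analyzers count more constructs.
DECISION_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points in `source`."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            # if/elif (an elif is a nested If), loops, except clauses, `x if c else y`
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` short-circuits at two points
            decisions += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            # each `if` filter inside a comprehension is a branch
            decisions += len(node.ifs)
    return decisions + 1


def halstead(n1: int, n2: int, N1: int, N2: int) -> dict:
    """Halstead measures from operator/operand counts.

    n1, n2: number of distinct operators / operands
    N1, N2: total occurrences of operators / operands
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("need at least one distinct operator and operand")
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": difficulty * volume,
    }


# Hypothetical sample code to measure.
SAMPLE = '''
def grade(score):
    if score >= 90:
        return "A"
    elif score >= 80 and score < 90:
        return "B"
    return "C"
'''

if __name__ == "__main__":
    print(cyclomatic_complexity(SAMPLE))  # 4: if, elif, and one `and`
    print(halstead(n1=4, n2=4, N1=8, N2=8)["difficulty"])  # 4.0
```

Whether to count a comprehension's own `for` clause, `match` cases, or `assert` statements is a choice that tools make differently, so absolute numbers from a sketch like this will not match other analyzers; such numbers are mostly useful for comparing the same code base against itself over time.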

Page 6

The myth of technical debt

Peter Norvig, “All code is liability”

Not actually technical debt:
● Maintenance
● Changes in understanding
● Operational inertia
● Poor code choices
● Dependency liabilities

Page 7

So what is technical debt?
Technical debt is the set of choices we intentionally make to speed up the development or implementation of systems, and which we acknowledge will need to be changed later.

Technical debt is the result of an Efficiency-Thoroughness Trade-Off (ETTO) at the individual level.

Technical debt is the output of a project constraint model at the organizational level.

Page 8

The blame game
Shouldn't we stop blaming people for making the trade-offs they're forced to make?

Page 9

Being Blameless
● If we remove fear, we will have a more honest conversation about trade-offs
● If we're honest about those trade-offs, a crisis might be averted altogether
● If we understand our history, we won't be destined to repeat it

Page 10

What is blameless system design?
● Assuming goodwill
● Blameless post-mortems
● Empathy
● Experimentation
● Honesty
● Communication

Page 11

Assume Goodwill

Your co-worker probably doesn’t come into work every day with the intent of harming you or the organization.

Page 12

Blameless Post-mortems
“We must strive to understand that accidents don’t happen because people gamble and lose.
Accidents happen because the person believes that:
…what is about to happen is not possible,
…or what is about to happen has no connection to what they are doing,
…or that the possibility of getting the intended outcome is well worth whatever risk there is.”

- Erik Hollnagel

Page 13

Empathy
● Reject ‘contempt culture’
● Focus on the positive
● Consider others’ perspectives

Page 14

Experimentation
The Engineering Design Process
● Define the Problem
● Do Background Research
● Specify Requirements
● Brainstorm Solutions
● Choose the Best Solution
● Do Development Work
● Build a Prototype
● Test and Redesign

Page 15

Honesty
● Publish ALL your results
● Document ALL your decisions
● Be honest about trade-offs
● Track mitigations

Page 16

Communication
● Broadcast expectations
● Honor achievements
● Make docs easy to find
● Open discussions
● Well-defined feedback channels

Page 17

Did someone say devops?
● Culture
● Measurement
● Sharing
● Feedback loops

Page 18

The bad
It’s hard to change culture and get away from a retribution culture and the root cause analysis (RCA) mentality.

It’s hard to get over hindsight bias.

It’s a lot of work to encourage openness and honesty, and to define what that looks like.

It’s hard for people to get over their impostor syndrome and/or contempt culture.

Page 19

The good
● Remove fear
● Encourage ‘risk’
● Create feedback
● Reduce redundant learning
● Improve the working environment and trust

Page 20

Douglas Land - Director of Operations, Vast.com, Inc.

@webuilddevops

Some References:

http://www.datical.com/blog/technical-debt-devops/

http://laughingmeme.org/2016/01/10/towards-an-understanding-of-technical-debt/

http://blog.aurynn.com/86/contempt-culture

http://erikhollnagel.com/ideas/etto-principle/index.html

http://indecorous.com/fallible_humans/

https://hbr.org/2003/05/it-doesnt-matter/ar/pr

https://codeascraft.com/2014/07/18/just-culture-resources/

http://sidneydekker.com/just-culture/