DevOps Days Tel Aviv 2013: What Exactly Is Anti-Fragile in DevOps? - Asher Sterkin
TRANSCRIPT
Cisco Confidential © 2013 Cisco and/or its affiliates. All rights reserved. 1
What Exactly Is Anti-Fragile in DevOps?
Asher Sterkin
Distinguished Engineer, SPVSS, Cisco Video Systems, Israel
September 30, 2013
Antifragile
Some things benefit from shocks…
volatility, randomness, disorder,
and stressors and love adventure,
risk, and uncertainty… there is no
word for the exact opposite of
fragile. Let’s call it antifragile.
Nassim N. Taleb, “Antifragile: Things That Gain from Disorder”
The Book and Reactions
I think this concept is incredibly
powerful when applied to systems
and organizational architecture.
Jez Humble, “On Antifragility in Systems and
Organizational Structure”
The Book and Reactions
The Netflix cloud architecture is
anti-fragile… The Netflix culture is
anti-fragile… Getting stronger
through failure is the basis of anti-fragility.
Avoiding failure at all costs
… makes you brittle and
vulnerable...
Adrian Cockcroft, “Looking back at 2012 with
pointers to 2013”
The Book and Reactions
If the idea is nice and neat,
however, the book that houses it
is just the opposite. It is a big,
baggy, sprawling mess.
David Runciman, review of the book in The Guardian,
November 21, 2012
Larger Body of Knowledge
• Complex Adaptive Systems
• Highly-Optimized Tolerance
• Technology Development Cycle
• Disruptive Innovations
• Product Development Flow
• Lean Startup
For Today
• De-fragilization
• Skin in the Game
• Barbell
• Asymmetric Pay-off
DevOps Areas
OPS DEV
Area 1: extend delivery to
production
Area 2: extend operations
feedback to project
Area 3: embed project knowledge into operations
Area 4: embed operations knowledge into project
Patrick Debois: “Codifying devops practices”
De-Fragilization
“Beauty plus pity - that is the closest we can get to a definition of art. Where there is beauty there is pity for the simple reason that beauty must die: beauty always dies, the manner dies with the matter, the world dies with the individual.”
V. Nabokov, “Lecture on Metamorphosis”
DevOps Areas
OPS DEV
Area 1: extend delivery to
production
Area 2: extend operations
feedback to project
Area 3: embed project knowledge into operations
Area 4: embed operations knowledge into project
Continuous Delivery
... to exert a constant stress on
your delivery and deployment
process to reduce its fragility so
that releasing becomes a
boring, low-risk activity.
Jez Humble, “On Antifragility in Systems and
Organizational Structure”
Large batches increase cycle time
Large batches increase variability in flow
John Allspaw: “Ops Meta-Metrics”,
slides 103-109
Reducing batch size accelerates feedback
Reducing batch size reduces overhead
Reducing batch size reduces risk
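The feedback and risk claims above can be illustrated with a toy model. This is a minimal sketch and not taken from the talk: it assumes one change is finished per day, that a batch ships on the day its last change is finished, and that each change independently has a hypothetical 5% chance of breaking the release. The function names and all numbers are illustrative assumptions, and the sketch does not model transaction overhead.

```python
def average_feedback_delay(batch_size: int) -> float:
    """Average days a finished change waits before its batch is released.

    Assumes one change is finished per day and the batch ships on the day
    its last change is finished, so the waits are 0, 1, ..., batch_size - 1.
    """
    return (batch_size - 1) / 2


def release_failure_probability(batch_size: int, p_bad_change: float) -> float:
    """Probability that at least one change in the batch breaks the release.

    Assumes changes fail independently with probability p_bad_change.
    """
    return 1 - (1 - p_bad_change) ** batch_size


if __name__ == "__main__":
    for size in (1, 5, 20, 100):
        delay = average_feedback_delay(size)
        risk = release_failure_probability(size, p_bad_change=0.05)
        # When a release fails, every change in the batch is a suspect.
        print(f"batch={size:3d}  avg delay={delay:5.1f} days  "
              f"P(release fails)={risk:.2f}  suspects per failure={size}")
```

Under these assumptions, both the average wait for feedback and the chance that a release fails grow with batch size, and so does the number of changes to inspect when something breaks.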
Batch Size and Bottlenecks
Reduce batch size before you attack bottlenecks
DevOps Areas
OPS DEV
Area 1: extend delivery to
production
Area 2: extend operations
feedback to project
Area 3: embed project knowledge into operations
Area 4: embed operations knowledge into project
“Resilience through Failure”
Resilience to failure is a
lofty goal. It enables a
system to survive and
withstand failure. There's an
even higher peak to strive
for, however: making the
system stronger and better
with each failure.
A. Tseitlin, “The Antifragile Organization”
Culture of Continuous Learning
W. E. Deming, “Out of the Crisis”
“Confusing common causes with special causes will only make things worse.”
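One common way to tell the two kinds of cause apart is a Shewhart-style control chart: points inside the control limits are treated as common-cause noise, points outside as candidates for a special cause. The following is a minimal, simplified sketch, assuming 3-sigma limits computed from a baseline of hypothetical deploy durations; the data and function names are illustrative and do not come from the talk.

```python
from statistics import mean, stdev


def control_limits(baseline: list[float], sigmas: float = 3.0) -> tuple[float, float]:
    """Return (lower, upper) limits as mean +/- sigmas * sample std deviation."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - sigmas * spread, center + sigmas * spread


def special_cause_points(baseline: list[float], observations: list[float],
                         sigmas: float = 3.0) -> list[float]:
    """Return the observations that fall outside the baseline control limits."""
    low, high = control_limits(baseline, sigmas)
    return [x for x in observations if x < low or x > high]


if __name__ == "__main__":
    # Hypothetical deploy durations in minutes.
    baseline_minutes = [10.2, 9.8, 10.5, 9.9, 10.1, 10.4, 9.7, 10.0]
    new_minutes = [10.3, 9.6, 14.9, 10.1]
    # Only the 14.9-minute deploy lies outside the limits; the others are
    # ordinary variation and, on this reading, not worth a reaction.
    print(special_cause_points(baseline_minutes, new_minutes))
```

Reacting to every point inside the limits as if it were an incident is the confusion the quote warns about.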
Resilience, Reliability, Robustness
[Timeline diagram: Normal Operation (MTBF) -> Failure! -> MTTD, MTTR -> Normal Operation (MTBF); annotated with Max Downtime and Max Data Lost]
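The quantities on the timeline above combine into a simple availability estimate. This is a minimal sketch under stated assumptions: MTBF is taken as the mean uptime between failures, and downtime per incident is taken as MTTD plus MTTR (mean time to detect plus mean time to repair). The function name and the numbers are hypothetical.

```python
def availability(mtbf_hours: float, mttd_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up, given mean uptime and mean downtime."""
    downtime = mttd_hours + mttr_hours
    return mtbf_hours / (mtbf_hours + downtime)


if __name__ == "__main__":
    base = availability(mtbf_hours=500, mttd_hours=0.5, mttr_hours=1.5)
    # Option A: make failures half as frequent.
    rarer_failures = availability(mtbf_hours=1000, mttd_hours=0.5, mttr_hours=1.5)
    # Option B: detect and repair twice as fast.
    faster_recovery = availability(mtbf_hours=500, mttd_hours=0.25, mttr_hours=0.75)
    print(f"base={base:.6f}  rarer failures={rarer_failures:.6f}  "
          f"faster recovery={faster_recovery:.6f}")
```

In this model, halving detection and repair time yields the same availability as doubling the time between failures, which is one way to read the preference for resilience over ever-rarer failures.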
It’s me
“Obsessive protection of the system against extremely rare events makes it more fragile. Resilience comes before the last percentiles of reliability.”
DevOps Areas
OPS DEV
Area 1: extend delivery to
production
Area 2: extend operations
feedback to project
Area 3: embed project knowledge into operations
Area 4: embed operations knowledge into project
Give ‘em pagers!
“DevOps: architects, developers, QA/QC, system engineers, and IT cooperate to maximize the company's value.”
It’s me again
Barbell
DevOps Areas
OPS DEV
Area 1: extend delivery to
production
Area 2: extend operations
feedback to project
Area 3: embed project knowledge into operations
Area 4: embed operations knowledge into project
John Allspaw, “Dev and Ops Collaboration”
Developers
• Responding to outages, takes on-call
• Alerting systems thresholding, design
• Architecture design and review
• Building metrics collection
• Application configuration
• Shipping public-facing code
Operations
• Responding to outages, takes on-call
• Alerting systems thresholding, design
• Architecture design and review
• Building metrics collection
• Application configuration
• Infrastructure buildout/management
John Allspaw, “Reply to NoOps @ Netflix”
DevOps Areas
OPS DEV
Area 1: extend delivery to
production
Area 2: extend operations
feedback to project
Area 3: embed project knowledge into operations
Area 4: embed operations knowledge into project
No, not really
Every analogy has its limit.
One just needs to learn where
to stop.
N. Taleb, “Antifragile”
“The first step to anti-fragility consists in decreasing downside… This brings us to the solution in the form of barbell… Away from Golden Middle.”
Away from Golden Middle
Risk Aversion
Risk Loving
Niek Bartholomeus,
“DevOps For Dinosaurs”
Jeremy Edberg, “DevOps at Netflix”
Risk Aversion Risk Loving
N. Taleb, “Antifragile”
“The downside/loss should be known and protected, not probability.”
It’s me
“You never know where your next outage or cyber attack will come from, or when. The maximal downtime and maximal data loss should be known and guaranteed regardless of probabilities.”
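One way to make such bounds explicit is to derive them from design parameters and not from failure probabilities. The sketch below is a hypothetical illustration, not the speaker's method: it assumes the worst-case data loss equals the interval between durable backups, and the worst-case downtime equals the longest detection delay plus the longest restore time. The class, field, and function names are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryDesign:
    backup_interval_min: float    # time between durable backups or snapshots
    detection_timeout_min: float  # longest time before a failure is noticed
    restore_time_min: float       # longest time needed to restore service

    def max_data_loss_min(self) -> float:
        """Worst case: failure strikes just before the next backup."""
        return self.backup_interval_min

    def max_downtime_min(self) -> float:
        """Worst case: slowest detection followed by slowest restore."""
        return self.detection_timeout_min + self.restore_time_min


def meets_bounds(design: RecoveryDesign, max_downtime_min: float,
                 max_data_loss_min: float) -> bool:
    """Check the worst cases against the promised bounds; no probabilities used."""
    return (design.max_downtime_min() <= max_downtime_min
            and design.max_data_loss_min() <= max_data_loss_min)


if __name__ == "__main__":
    design = RecoveryDesign(backup_interval_min=15,
                            detection_timeout_min=5,
                            restore_time_min=20)
    print(design.max_downtime_min(), design.max_data_loss_min())
    print(meets_bounds(design, max_downtime_min=30, max_data_loss_min=15))
```

The check deliberately ignores how likely a failure is: it only asks whether the worst case stays within the promised limits.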
Asymmetric Pay-Offs
[Chart: Gain/Loss f(x) plotted against x, with regions labeled Pain and Gain]
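The asymmetry can be checked numerically. This is a minimal sketch, assuming a hypothetical payoff f(x) whose losses are capped at a fixed floor while gains grow with x. Because such a payoff is convex, its average over symmetric shocks increases as the shocks get larger, even though the average shock stays at zero (Jensen's inequality). The function names and numbers are illustrative.

```python
def capped_downside_payoff(x: float, max_loss: float = 1.0) -> float:
    """Convex payoff: gains grow with x, losses never exceed max_loss."""
    return max(x, -max_loss)


def average_payoff(shocks: list[float], payoff=capped_downside_payoff) -> float:
    """Mean payoff over a list of equally likely shocks."""
    values = [payoff(x) for x in shocks]
    return sum(values) / len(values)


if __name__ == "__main__":
    calm = [-1.0, 1.0]        # small symmetric shocks, mean 0
    volatile = [-10.0, 10.0]  # large symmetric shocks, mean 0
    print(average_payoff(calm))      # 0.0
    print(average_payoff(volatile))  # 4.5
```

With the downside capped, more volatility raises the average outcome in this toy model; with a payoff that instead capped the gains, the same shocks would lower it.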
DevOps Areas
OPS DEV
Area 1: extend delivery to
production
Area 2: extend operations
feedback to project
Area 3: embed project knowledge into operations
Area 4: embed operations knowledge into project
N. Taleb
“Never be a sucker. Period!”
Thanks