shift happens - rapidly rolling forward during production failure

Shift Happens: Continually moving forward when the outcome looks bleak
@Al_Wagner, IBM, September 16, 2015

Upload: ibm-urbancode-products

Post on 08-Feb-2017


Avoiding Deployment Failures

Avoiding deployment failures, especially those that could cause a production outage, is top of mind for many IT professionals. However, failures will sometimes occur in production, which means that planning for recovery is essential. Preventative measures like canary, blue/green, or rolling deployments can help. Having the ability to roll forward instead of rolling back, also known as shifting right, means you can push through a failure while learning from deployment process mistakes and shortening mean time to recovery (MTTR).

• Deployment models like canary, blue/green, and rolling that can help prevent major production outages
• How to pinpoint deployment failures in your process and correct them
• Pulling together a basic failure response plan
• How you can roll forward while improving your deployment process


Survey says…

https://www-01.ibm.com/marketing/iwm/dre/signup?source=mrs-form-3570&S_PKG=ov50501

DevOps is all about executing with speed!

Getting ideas into production fast – Getting people to use them – Analyzing their feedback

[Figure labels: Line-of-business, Customer, Continuous Delivery, Continuous Feedback, Continuous Innovation]

Managing the Iron Triangle by…

• Reducing Scope
✓ Small batches of incremental changes

• Empowering Resources
✓ Co-located, autonomous teams

• Accelerating Schedules
✓ Automate, automate, automate

• Increasing Quality
✓ Everyone contributes

[Figure: the Iron Triangle of Quality, Schedule, Scope, and Resources, annotated with: small batches of incremental changes; co-located, autonomous teams and collaboration; continuous release & deployment; everyone contributes]

Traditional software deployments

[Figure: users reach Environment #1 and Environment #2 (each a web server, app server, and database server) through a load balancer. The new version of the application is deployed and tested while the servers are off-line.]

1. Servers taken off-line
2. New release is deployed & tested
3. Servers brought back on-line

The clock is ticking!

Manual deployments are error prone

One wrong move and it can all

And when disaster strikes! You need to know…

• What failed?
• Where did it fail?
• Why did it fail?
• What apps were impacted?
• Should I move traffic to another server?
• Do we go forward or roll back?

If you fail to plan, you plan to fail!

During the post mortem, you need to uncover…

• Did anything trigger the deployment failure?
• What was the root cause of the failure?
• What could we have done differently to avoid this situation?
• How can we improve so it doesn’t happen again?

Accelerate delivery of incremental software change

• Failures due to inconsistent dev and production environments
• Bottlenecks trying to deliver more frequent releases to meet market demands
• Complex, manual processes for release lack repeatability and speed
• Poor visibility into dependencies across releases, resources, and teams


The Four Pillars of Gold-Standard Deployment

• Use the same process
✓ Reduces deployment errors

• Automate, automate, automate
✓ Delivers repeatability and reliability, with traceability

• Deliver incremental changes
✓ Reduces risk to business

• Release what you test
✓ Increases confidence

Automate provisioning and deployments

[Figure labels: SCM; Build Automation (pull changes, publish build); IBM UrbanCode Deploy (automate deployment to hybrid environments); provision environment with open patterns; IBM Cloud Orchestrator; IBM PureApplication System; IBM Cloud Manager with OpenStack; IBM Bluemix; VMware vCenter]

Deployment targets:
• Public: shared off-premises cloud
• Dedicated: off-premises cloud
• Local: dedicated on-premises cloud
• Traditional IT

✓ Traceable
✓ Repeatable
✓ Reliable

IBM Cloud UrbanCode Deploy as a Service (NEW!)

[Figure labels: Develop, Build, Deploy; IBM Cloud UrbanCode Deploy delivering apps to Mobile Device, Mainframe, Traditional, and SoftLayer, AWS, Azure]

Features of the new SaaS offering
• Full automated application delivery capabilities
• Hosted on IBM infrastructure, managed by IBM
• Monthly subscription, license managed by IBM
• Full product support

© 2016 IBM Corporation

IBM UrbanCode Release for release management

✓ No more release weekend parties: Coordinate stakeholders, orchestrate deployment activities, enforce the qualification process with relevant workflows and quality gates, and get the necessary approvals before going to production. Make releases predictable and boring!

✓ Reduced downtime: Eliminate wasted time and orchestrate large, complex releases involving several hundred applications and hundreds of stakeholders.

✓ Reduced time to market with continuous delivery releases: Accelerate release frequency with distributed release management for small-scope, frequent releases delivered by application teams.

IBM UrbanCode Release & Deploy iOS mobile app

✓ Monitor progress: Understand the overall progress of your releases and the remaining work. Get real-time calculations of the projected completion time.

✓ Alert for critical issues: See critical data on late and idling tasks so you can counter problems and mitigate business risks.

✓ Understand team status: Learn from teams what is blocking them so you can take the right corrective actions.

https://itunes.apple.com/ca/app/ibm-urbancode-release-deploy/id1084753666?mt=8

Shift right and continuously move forward

Accelerate releases by making a conscious decision to carry an acceptable level of …

…into PRODUCTION!

Dark Launches & Toggles

• Feature toggle: restricts access to code in development until it is ready for release to end users

    if "work_in_progress" {
        develop new functionality here
    } else {
        already deployed as production code
    };

• Business toggle: controls which users or groups of users have access to new functionality

    if "beta_usergroup" {
        provide access to new experiment
    } else {
        route user to existing production code
    };

✓ Pros: New experiments can quickly be made available to groups of trusted users

✗ Cons: Increase in technical debt, as "toggle" code needs to be managed
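The two toggle types can be combined in a few lines. The following is a minimal sketch, not a product API: `ToggleStore`, `checkout`, the flag name `new_checkout`, and the user names are all hypothetical, and a real system would usually load toggle state from configuration rather than hold it in memory.

```python
from dataclasses import dataclass, field


@dataclass
class ToggleStore:
    """Hypothetical in-memory registry of toggles."""
    enabled_features: set = field(default_factory=set)
    beta_users: set = field(default_factory=set)

    def feature_on(self, name: str) -> bool:
        # Feature toggle: the code ships dark until the flag is switched on.
        return name in self.enabled_features

    def in_beta(self, user: str) -> bool:
        # Business toggle: only a trusted group of users sees the experiment.
        return user in self.beta_users


def checkout(user: str, toggles: ToggleStore) -> str:
    """Route a user to the new or the existing code path."""
    if toggles.feature_on("new_checkout") and toggles.in_beta(user):
        return "new checkout flow"
    return "existing production checkout"


toggles = ToggleStore(enabled_features={"new_checkout"}, beta_users={"alice"})
print(checkout("alice", toggles))
print(checkout("bob", toggles))
```

Each `if` branch of this kind is the technical debt the slide warns about: once the experiment is released to everyone, the toggle and the old branch should be removed.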

Zero downtime deployment strategies

• Canary Release: a technique to reduce the risk of introducing a new software version in production by slowly rolling out the change to a small subset of users before rolling it out to the entire infrastructure and making it available to everybody.

• Blue/Green Deployments: a release technique that reduces downtime and risk by running two identical production environments called Blue and Green. At any time, only one of the environments is live, with the live environment serving all production traffic.

• Rolling Deployments: a software release strategy that staggers deployment across multiple phases, which usually include one or more servers performing one or more functions within a server cluster to reduce application downtime.

Canary Releases (example flow)

[Figure: users reach two environments (each a web server, app server, and database server) through a load balancer; deployment automation and inventory sit alongside.]

1. Both environments run the old version, with 50% of users routed to each.
2. All users are routed to the environment on the old version while the new version is deployed to the other environment.
3. Some users (5%) are routed to the new version while most users (95%) stay on the old version. As confidence in the new release increases, the percentage of users who have access is increased.
4. All users are routed to the new version. Eventually the new version is deployed to the second environment.
5. And the user load is split across the two environments (50% of users each).
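The percentage-based routing in the canary flow can be sketched as follows. This is an illustration under stated assumptions, not how any particular load balancer works: the function names, the `user-N` identifiers, and the choice of hashing the user ID into 100 buckets are all hypothetical. Hashing keeps each user on the same version between requests, and raising the percentage only ever moves users from old to new.

```python
import hashlib


def canary_bucket(user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100) so routing is sticky per user."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def route(user_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent of users to the new version."""
    return "new" if canary_bucket(user_id) < canary_percent else "old"


def share_on_new(users, canary_percent: int) -> float:
    """Fraction of the given users that land on the new version."""
    return sum(route(u, canary_percent) == "new" for u in users) / len(users)


users = [f"user-{i}" for i in range(10_000)]
# Widen the canary step by step as confidence in the release grows.
for percent in (0, 5, 50, 100):
    print(percent, round(share_on_new(users, percent), 3))
```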

Blue / Green Deployments (example flow)

[Figure: users reach Environment #1 and Environment #2 (each a web server, app server, and database server) through a router / load balancer; deployment automation and inventory sit alongside.]

Two environments, each of sufficient resources to serve the application in production.

1. All users are routed to one environment. The other environment holds the previous release as a hot stand-by.
2. The new release is deployed to the idle environment.
3. When the new deployment is working as expected, users are routed to the new version. The environment holding the previous release remains as a hot stand-by.
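The cut-over step is the heart of blue/green, and a minimal sketch of it looks like this. `BlueGreenRouter`, the environment names, the release numbers, and the health check are all hypothetical; a real router or load balancer would be reconfigured through its own interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class BlueGreenRouter:
    """Hypothetical router that sends all users to exactly one environment."""
    versions: Dict[str, str]  # environment name -> deployed release
    live: str = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, release: str) -> None:
        # Users are unaffected: the idle environment takes no traffic.
        self.versions[self.idle] = release

    def cut_over(self, health_check: Callable[[str], bool]) -> bool:
        """Switch traffic only if the idle environment passes its check."""
        if not health_check(self.idle):
            return False
        self.live = self.idle  # the old live environment becomes the hot stand-by
        return True


router = BlueGreenRouter(versions={"blue": "1.0", "green": "1.0"})
router.deploy_to_idle("1.1")
switched = router.cut_over(lambda env: router.versions[env] == "1.1")
print(switched, router.live, router.versions)
```

Because the previous release stays deployed on the stand-by environment, falling back is a second routing switch rather than a redeployment.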

Rolling Deployments (example flow)

[Figure: users reach Server Clusters #1 to #4 (each a web server, app server, and database server) through a load balancer; deployment automation and inventory sit alongside.]

First phase
1. Cluster #1 taken off-line
2. Application change deployed
3. Deployment tested

Second phase
1. Cluster #1 brought back on-line
2. Cluster #2 is taken off-line
3. Application change deployed
4. Deployment tested

Third phase
1. Cluster #2 brought back on-line
2. Clusters #3 & #4 are taken off-line
3. Application change deployed
4. Deployment tested

All environments are presenting the latest version of the application.
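The phases of the rolling flow can be sketched as a loop over batches of clusters. The function `rolling_deploy`, the cluster names, the release numbers, and the `smoke_test` callback are hypothetical stand-ins for whatever the deployment automation provides; the sketch only models the bookkeeping.

```python
from typing import Callable, Dict, List


def rolling_deploy(
    clusters: Dict[str, str],
    batches: List[List[str]],
    release: str,
    smoke_test: Callable[[str], bool],
) -> List[str]:
    """Upgrade clusters batch by batch and stop at the first failing batch.

    Returns the clusters that are back on-line with the new release.
    """
    in_rotation = set(clusters)
    upgraded: List[str] = []
    for batch in batches:
        in_rotation -= set(batch)  # take the batch off-line
        if not in_rotation:
            raise RuntimeError("no cluster left to serve users")
        for name in batch:
            clusters[name] = release  # deploy the application change
        if not all(smoke_test(name) for name in batch):
            break  # failed batch stays off-line; later batches keep the old release
        in_rotation |= set(batch)  # bring the batch back on-line
        upgraded.extend(batch)
    return upgraded


clusters = {f"cluster-{i}": "1.0" for i in range(1, 5)}
batches = [["cluster-1"], ["cluster-2"], ["cluster-3", "cluster-4"]]
done = rolling_deploy(clusters, batches, "1.1", smoke_test=lambda name: True)
print(done)
```

Stopping at the first failing batch limits the blast radius: the remaining clusters keep serving the old release while the team decides whether to roll forward or roll back.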

Pros and Cons…

Canary Release

Pros
• No downtime of production environment
• Quick access to a backup environment
• A/B testing of new features and functionality
• Capture performance metrics of new release during early adoption

Cons
• Management and maintenance of multiple versions of the software
• Maintain persistent sessions during deployment
• Database must support two versions of the application (until cut-over is complete)

Blue/Green Deployments

Pros
• No downtime of production environment
• Quick access to a backup environment – hot standby
• Ability to test application in a production environment

Cons
• Requires two similar environments
• Maintain persistent sessions during deployment
• Database must support two versions of the application (until cut-over is complete)

Rolling Deployments

Pros
• No downtime of production environment
• Incrementally validate deployments and reduce risk
• Reduce visibility of performance degradation
• Seamless user experience

Cons
• Maintain persistent sessions during deployment
• Database must support two versions of the application (until deployment is complete)

Your mission, if you choose to accept it…

Measure your DevOps progress

• Deployment / Change Frequency
– Measures delivery team responsiveness, cohesiveness, capabilities, efficiency, & tooling effectiveness

• Change Lead Time
– Measures efficiency of the end-to-end development process, from first code change to deployment
– Measures cycle time of the individual activities

• Change Failure Rate
– Number of failed deployments / total number of deployments

• Mean Time To Recover (MTTR)
– How long does it take to recover from a failure
– Understand the contributors to failure: code complexity, number of app changes, number of operating environment changes
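Three of these measures are simple arithmetic over a deployment log. The sketch below assumes a hypothetical `Deployment` record with the timestamps shown; the sample history is made up purely to exercise the functions and is not data from the presentation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical log entry for one deployment."""
    first_commit_at: datetime
    deployed_at: datetime
    failed: bool = False
    recovered_at: Optional[datetime] = None


def change_failure_rate(deployments: List[Deployment]) -> float:
    """Failed deployments divided by total deployments."""
    return sum(d.failed for d in deployments) / len(deployments)


def mean_time_to_recover(deployments: List[Deployment]) -> timedelta:
    """Average time from a failed deployment to its recovery."""
    outages = [d.recovered_at - d.deployed_at
               for d in deployments if d.failed and d.recovered_at]
    return sum(outages, timedelta()) / len(outages)


def mean_lead_time(deployments: List[Deployment]) -> timedelta:
    """Average time from first code change to deployment."""
    leads = [d.deployed_at - d.first_commit_at for d in deployments]
    return sum(leads, timedelta()) / len(leads)


t0 = datetime(2015, 9, 1, 9, 0)
history = [  # illustrative sample data only
    Deployment(t0, t0 + timedelta(hours=24)),
    Deployment(t0, t0 + timedelta(hours=48), failed=True,
               recovered_at=t0 + timedelta(hours=49)),
    Deployment(t0, t0 + timedelta(hours=72)),
    Deployment(t0, t0 + timedelta(hours=96), failed=True,
               recovered_at=t0 + timedelta(hours=99)),
]
print(change_failure_rate(history))
print(mean_time_to_recover(history))
print(mean_lead_time(history))
```

Tracking these numbers per release makes it possible to see whether rolling forward is actually shortening recovery time.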


Thank You