Continuous Delivery with Netflix OSS
Dan Woods
/danveloper
Senior Software Engineer: Delivery Engineering
Learning Ratpack
Overview of Netflix OSS
• Netflix encourages talking to the world about how we’re solving problems
• We solve a ton of problems that companies both small and large are faced with
• Shoot to open source as much as possible
Overview of Netflix OSS
• Netflix is a large consumer of cloud offerings, mostly from AWS
• We’ve done a ton of work over the years to lift the infrastructure entirely to the cloud
• Pioneered running at scale on AWS
Overview of Netflix OSS
• Developed a massive tool suite to operationalize running in the cloud at scale
• Teams need to be able to quickly get code running in the cloud
• Teams need to be able to quickly see metrics and performance
Overview of Netflix OSS
Links:
http://techblog.netflix.com/
http://github.com/netflix
http://netflix.github.io
Continuous Delivery
Big Picture:
What Does Continuous Delivery Mean At Netflix?
Continuous Delivery
Big Picture:
• Immutable Infrastructure
• Tooling the Build System
• Ongoing and Continuous Deployment
Immutable Infrastructure
• Designing a server to become your unit of deployment
• “Bake” the software into a “pre-cooked” (known-good configuration) image
• Allows you to test and certify a server image for distribution
• Walk that server through the phases of test, qa, and finally to prod
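The promotion path above can be sketched in a few lines of Python. This is a minimal illustration and not Netflix's tooling: the `ServerImage` record, the `promote` function, and the image identifier are hypothetical names introduced here. The point it shows is that the same unmodified image is certified for each phase in order.

```python
from dataclasses import dataclass, replace
from typing import Tuple

# The phases named on the slide, in the order an image must pass them.
STAGES = ("test", "qa", "prod")


@dataclass(frozen=True)
class ServerImage:
    """A baked image record; frozen so it cannot be modified in place."""
    image_id: str                       # hypothetical identifier
    package_version: str                # version of the software baked in
    certified_for: Tuple[str, ...] = ()


def promote(image: ServerImage, stage: str, checks_passed: bool) -> ServerImage:
    """Certify the same image for the next stage, refusing to skip stages."""
    if len(image.certified_for) == len(STAGES):
        raise ValueError("image is already certified for every stage")
    expected = STAGES[len(image.certified_for)]
    if stage != expected:
        raise ValueError(f"next stage must be {expected!r}, not {stage!r}")
    if not checks_passed:
        raise RuntimeError(f"image {image.image_id} failed checks for {stage!r}")
    # Only the certification record changes; the image contents never do.
    return replace(image, certified_for=image.certified_for + (stage,))
```

An image that fails its checks is discarded and a new one is baked, rather than patched in place.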
Immutable Infrastructure
• Builds must be designed in a way that produces an OS package
• This allows the build to control the manner in which the server image will be created
• Specify OS-level dependencies (Java, Python, etc.)
• Get all the benefits of a version-controlled configuration
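To make "the build specifies OS-level dependencies" concrete, here is a rough Python sketch that renders the control stanza of a Debian package from a small description object. The `OsPackage` class, the `render_control` function, and the example package names are assumptions made for illustration; at Netflix this job is done by the Gradle build, not by code like this.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OsPackage:
    """Hypothetical description of the OS package a build produces."""
    name: str
    version: str
    maintainer: str
    description: str
    architecture: str = "all"
    depends: List[str] = field(default_factory=list)  # OS-level dependencies


def render_control(pkg: OsPackage) -> str:
    """Render the fields of a Debian binary package control stanza."""
    lines = [
        f"Package: {pkg.name}",
        f"Version: {pkg.version}",
        f"Architecture: {pkg.architecture}",
        f"Maintainer: {pkg.maintainer}",
    ]
    if pkg.depends:
        lines.append("Depends: " + ", ".join(pkg.depends))
    lines.append(f"Description: {pkg.description}")
    return "\n".join(lines) + "\n"


# Illustrative values only; the service and dependency names are made up.
example = OsPackage(
    name="my-service",
    version="1.0.0",
    maintainer="Example Team <team@example.com>",
    description="Example service packaged for baking",
    depends=["openjdk-7-jre-headless", "python"],
)
```

Because the description lives in the build, the dependency list is versioned alongside the code it supports.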
Tooling the Build System
• Hundreds, sometimes thousands, of builds run every day at Netflix
• Builds need to fit into a somewhat conformant structure to garner the support of the tooling
• A polyglot stack adds a ton of complexity to designing the tooling for the build system
• Teams are free to use whatever language, framework, or stack they want, and we need to do our best to have a handle on the permutations
Tooling the Build System
• The JVM is the predominant code platform at Netflix
• Many different languages on the JVM, including: JavaScript, Scala, Groovy, Clojure, Ruby, Python
• The “runner up” runtime is NodeJS
• Lots of new JavaScript stuff is starting to come out, so we are starting to design scalable tooling around JS
Tooling the Build System
• Netflix has adopted Gradle as its build platform
• Gradle is a JVM-based build system that is capable of building JVM and non-JVM projects
• Support for dynamically and programmatically designing builds (loads of flexibility)
• Great open source community, tons of support from Gradleware
Tooling the Build System
• Can build plugins for Gradle in Groovy (ahh soo nice :-))
• Plugins are designed to make it appealing for teams to conform to the tooling infrastructure
• Custom internal Gradle wrapper applies common conventions and hacks that would otherwise be unmanageable at scale
• The goal of all this is to make teams want to use the build tooling, so that we can operationalize and manage it for scale
Continuous Deployment
• Continuous Delivery at Netflix speaks to more than just staging code for deployment
• The Continuous Delivery story is a follow-through, from source to production
• Continuous Deployment is an integral part of that process (it means the code is running in the cloud!)
• Hands down this is the trickiest and most fragile part of the whole process…
Continuous Deployment
• By this point in the workflow, the code has already been built and baked…
• We have an immutable server image, and we’re ready to ship it off to the cloud…
• The complexity is here: “ship it off to the cloud” is an inherently asynchronous process…
• There are many failure points.
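Because the launch is asynchronous, a deployment tool typically has to poll for a result and give up after a deadline. The sketch below is one minimal way to do that in Python; `wait_until` and `DeploymentTimeout` are hypothetical names, and the `check` callable stands in for whatever "is it deployed yet?" question the tool asks. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


class DeploymentTimeout(Exception):
    """Raised when the deployment does not report success before the deadline."""


def wait_until(
    check: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or `timeout_s` seconds have elapsed."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            raise DeploymentTimeout(f"no success within {timeout_s} seconds")
        sleep(interval_s)
```

A timeout is only one of the failure points; a real tool would also need to distinguish "still starting" from "failed outright".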
Continuous Deployment
What constitutes a successful deployment?
• Every application has a different definition of “success”
• Need to provide tooling so that the process is able to identify the vectors of success
Continuous Deployment
What constitutes a successful deployment?
• Amazon telling us the server has deployed is basically the equivalent of them saying they pressed the power button
• Need to consider a successful deployment in terms of “this server is ready to start taking traffic”
Continuous Deployment
What constitutes a successful deployment?
• “Ready to start taking traffic” means different things to different applications:
  • Tomcat has started, and the app is listening?
  • Tomcat has started, app is listening, caches are primed?
  • Tomcat has started, app is listening, and the server group is in some designated traffic pool (canary)?
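One way to picture per-application definitions of "ready" is as a list of small checks that each application composes for itself. The following Python sketch assumes a hypothetical instance-state dictionary and made-up application names; it is an illustration of the idea, not the shape of any Netflix API.

```python
from typing import Callable, Dict, List

# An instance's self-reported state; the keys are hypothetical.
InstanceState = Dict[str, object]
Check = Callable[[InstanceState], bool]


def app_listening(inst: InstanceState) -> bool:
    return inst.get("listening") is True


def caches_primed(inst: InstanceState) -> bool:
    return inst.get("caches_primed") is True


def in_canary_pool(inst: InstanceState) -> bool:
    return inst.get("traffic_pool") == "canary"


# Each application declares which checks make up its own definition of ready.
READINESS: Dict[str, List[Check]] = {
    "simple-api": [app_listening],
    "cached-api": [app_listening, caches_primed],
    "canary-api": [app_listening, in_canary_pool],
}


def ready_for_traffic(app: str, instance: InstanceState) -> bool:
    """True only if every check the app registered passes for this instance."""
    return all(check(instance) for check in READINESS[app])
```

The same instance state can be "ready" for one application and "not ready" for another, which is the point the slide makes.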
Continuous Deployment
• Service discovery becomes a very big part of understanding the health of an app
• Gives the app the responsibility to inform the tool as to its traffic-taking readiness
• It would be difficult for the tool to reach out to every instance to ask it for its health; better to have the instance tell us
• The tooling now only needs to query two places: Amazon and the Service Registry
Continuous Deployment
• Teams can choose if “Discovery” health should be incorporated into their continuous deployment workflow
• This may not be necessary; for strictly IPC stack apps, it’s ok for them to be “up” and to let the IPC client (Ribbon) determine to which instance traffic is routed
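The two lookups described above, plus the option to skip Discovery health, can be sketched as a single decision function. In this Python sketch the two dictionaries are stand-ins for the answers from Amazon and from the service registry; the function name and the `use_discovery` flag are hypothetical. The status strings `InService` (Auto Scaling lifecycle state) and `UP` (Eureka instance status) are the values those systems report.

```python
from typing import Dict


def deployment_succeeded(
    asg_states: Dict[str, str],
    registry_states: Dict[str, str],
    use_discovery: bool = True,
) -> bool:
    """Decide success from what Amazon and the service registry report.

    asg_states:      instance id -> Auto Scaling lifecycle state
    registry_states: instance id -> status the instance registered for itself
    use_discovery:   teams may opt out of the registry check
    """
    if not asg_states:
        return False  # nothing launched, so nothing succeeded
    for instance_id, lifecycle in asg_states.items():
        if lifecycle != "InService":
            return False  # Amazon has not finished bringing it up
        if use_discovery and registry_states.get(instance_id) != "UP":
            return False  # the app has not declared itself ready
    return True
```

With `use_discovery=False` the function accepts Amazon's answer alone, which matches the case where the IPC client decides routing.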
Continuous Deployment
What do we do after success?
• Once the new version of code is deployed, now what?
• Netflix lumps packages of software into a “cluster”, within which different versions may run
• For rapid rollback, we need to keep the ancestor server group around, but take it out of traffic rotation
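A cluster holding several versions implies some way to tell the server groups apart and to find the ancestor. The Python sketch below assumes a naming scheme in which each server group is the cluster name followed by a three-digit `-vNNN` suffix; that scheme and both function names are assumptions made for illustration.

```python
import re
from typing import Iterable, Optional

# Assumed naming scheme: "<cluster>-vNNN", for example "api-prod-v001".
_GROUP_NAME = re.compile(r"^(?P<cluster>.+)-v(?P<seq>\d{3})$")


def _sequence(cluster: str, group_name: str) -> Optional[int]:
    """Return the version number if `group_name` belongs to `cluster`."""
    match = _GROUP_NAME.match(group_name)
    if match and match.group("cluster") == cluster:
        return int(match.group("seq"))
    return None


def next_server_group(cluster: str, existing: Iterable[str]) -> str:
    """Name for the server group a new deployment would create."""
    seqs = [s for s in (_sequence(cluster, g) for g in existing) if s is not None]
    seq = max(seqs) + 1 if seqs else 0
    return f"{cluster}-v{seq:03d}"


def ancestor_server_group(cluster: str, existing: Iterable[str]) -> Optional[str]:
    """The newest existing group, which is kept around for rapid rollback."""
    best = None
    for group in existing:
        seq = _sequence(cluster, group)
        if seq is not None and (best is None or seq > best[0]):
            best = (seq, group)
    return best[1] if best else None
```

The sketch does not handle the sequence running past 999; a real tool would need a rule for that.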
Continuous Deployment
What do we do after success?
• Put the ancestral server group into a “disabled” state
• Inform the service registry that the instances within this group are no longer accepting traffic
• Most consuming apps will use the service registry to find their endpoint, so this is sufficient
• For those that use DNS and go through a load balancer, we remove the instances from associated load balancers as well
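The "disabled" state above amounts to two bookkeeping changes while the instances keep running. This Python sketch models the service registry and the load balancers as plain in-memory dictionaries; the function names are hypothetical, and `OUT_OF_SERVICE` and `UP` are used here as the registry status values.

```python
from typing import Dict, Iterable, List, Set


def disable_server_group(
    instance_ids: List[str],
    registry: Dict[str, str],
    load_balancers: Dict[str, Set[str]],
) -> None:
    """Take a server group out of traffic without terminating its instances."""
    for instance_id in instance_ids:
        registry[instance_id] = "OUT_OF_SERVICE"   # registry consumers stop routing
    for members in load_balancers.values():
        members.difference_update(instance_ids)    # DNS and load balancer consumers stop too


def enable_server_group(
    instance_ids: List[str],
    registry: Dict[str, str],
    load_balancers: Dict[str, Set[str]],
    lb_names: Iterable[str],
) -> None:
    """Rollback path: put the still-running ancestor back into rotation."""
    for instance_id in instance_ids:
        registry[instance_id] = "UP"
    for name in lb_names:
        load_balancers[name].update(instance_ids)
```

Rollback is fast precisely because nothing has to be launched; the ancestor only has to be re-enabled.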
Continuous Deployment
Why not just update the existing config and roll the servers (rolling push)?
• Rolling push is a bad, bad thing
• While new instances are launching against a new image, ancestral instances still exist
• Can leave the server group in a half-done state, which can yield very weird results
• Tooling is built around the server group being the management target
Continuous Deployment
Incubating Deployment Strategies…
• Phased canary
  • 25%, 50%, 75%, 100%
• Global push
  • Deployment windows to different regions
• Highlander
  • Don’t keep the ancestor server group around
  • This is good for test environments that don’t need rollback
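As a rough illustration of two of these strategies, the Python sketch below turns the phased canary percentages into instance counts and encodes what happens to the ancestor under Highlander. Rounding up at each phase, the function names, and the strategy strings are assumptions of this sketch; the slides do not say how the real tooling computes phases.

```python
from typing import List, Sequence

PHASES = (25, 50, 75, 100)  # the percentages listed for the phased canary


def phased_targets(desired_capacity: int, phases: Sequence[int] = PHASES) -> List[int]:
    """Instance count the new server group should reach at each phase.

    Rounds up, so even a small group gets at least one instance in phase one.
    """
    if desired_capacity < 1:
        raise ValueError("desired_capacity must be at least 1")
    # -(-a // b) is integer ceiling division.
    return [-(-(desired_capacity * pct) // 100) for pct in phases]


def ancestor_action(strategy: str) -> str:
    """What to do with the ancestor server group once the new one succeeds."""
    # Highlander: there can be only one, so no rollback target is kept.
    return "delete" if strategy == "highlander" else "disable"
```

For a group of ten instances, `phased_targets(10)` yields 3, 5, 8, and 10 instances across the four phases.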
Continuous Deployment
Continuous Delivery Tooling
• Many CD tools are available today from NetflixOSS!
• The puzzle pieces are there for the entire problem domain
• Tooling for build system packaging, baking immutable infrastructure, service discovery, continuous deployment, and cluster management
Build System Tooling
Nebula Gradle Plugins
• Nebula (like, “space clouds”) is a collection of Gradle plugins to assist in the continuous delivery workflow
• Often two parts, Nebula and Gradle: the “Gradle” part is just a Gradle plugin, and you’re on your own to configure it; the “Nebula” part is an opinionated veneer
• Tons of great plugins, extensive documentation, and many, many, many available videos and presentations on Nebula
Build System Tooling
Nebula OS Package Plugins
• The Gradle Side
  • Provides mechanism for producing Debian and RPM artifacts
  • Very straightforward integration that uses Gradle’s well-known CopySpec for getting files into an OS structure
  • Nice DSL for describing OS-level dependencies
• The Nebula Side
  • Derives configuration in a “best fit” kind of way
  • Provides integration with Gradle’s application plugin to package a runnable distribution into an OS artifact
  • Provides ability to produce an OS daemon for your service
https://github.com/nebula-plugins/nebula-ospackage-plugin
Build System Tooling
The Bakery
Baking a Server Image
• Aminator
  • Provides easy creation of package-specific AMIs
  • Attaches a “Base Image” volume, installs your software package
  • Takes a snapshot of the volume, resulting in an AMI
  • This AMI is the immutable infrastructure
  • AMI will act as our unit of deployment going forward
https://github.com/netflix/aminator
Service Discovery
Service Registry for Apps
• Eureka
  • Applications can register their own health
  • Integrates tightly with Ribbon to provide inter-app service discovery, load balancing, and fault tolerance
  • Able to be leveraged during the continuous deployment process to inform as to successful deployments
https://github.com/netflix/eureka
https://github.com/netflix/ribbon
Continuous Deployment and Cluster Management
Managing Deployments
• Asgard
  • Provides a UI for managing AWS cloud resources
  • RESTful API for consumers to be able to script against
  • Decorates AWS with concepts that are relevant to Netflix’s continuous delivery infrastructure
  • This includes the concept of applications and clusters, which is something that AWS does not have
  • Standalone, runnable JAR or WAR deployment options
https://github.com/netflix/asgard
Continuous Deployment and Cluster Management
Some Harsh Realities…
• All of this stuff is difficult to get up-and-running
• Every tool makes assumptions about account structure, available resources, naming conventions, etc.
• Non-native concepts, like applications and clusters, are difficult to understand from an outsider’s perspective
• The benefit may not justify the cost if you’re not adopting the entire stack
Getting better…
• Many initiatives underway currently to engage the open source community more directly
• The goal is to make the barrier for entry very low on getting up-and-running with NetflixOSS
• Andrew Spyker (@aspyker) is leading the charge for making NetflixOSS plug-and-play…
• Although, not very much (right now) speaks directly to gluing tools together for continuous delivery
Some Resources
• Zero to Cloud:
  • http://www.oscon.com/oscon2014/public/schedule/detail/34252
  • Walks you through a document that shows how to set up your AWS account
  • Shows you how to leverage CloudFormation to configure a NetflixOSS runtime
• Zero to Docker:
  • http://techblog.netflix.com/2014/11/zerotodocker-easy-way-to-evaluate.html
  • Pre-built Docker images for NetflixOSS components
  • Provides a quick way to get up-and-running
  • Not for production use; not in use at Netflix
Trying to make this easy on you…
Introducing the Zero to Cloud Gradle Plugin
https://github.com/Netflix-Skunkworks/zerotocloud-gradle
• “Netflix Skunkworks”, so not officially NetflixOSS at this point
• A single command can initialize a continuous delivery infrastructure built on NetflixOSS technologies
• Plugin can be utilized by builds to be the “glue” between the OS packaging, the Bakery, and Asgard