continuous delivery in practice (extended)

Continuous Delivery In Practice

Lessons from Kenshoo’s RTB project

Who, What, Where

Tzach Zohar:
● System Architect
● [email protected]

Kenshoo:
● Founded 2006
● Online Marketing Technology
● >500 employees
● 12 worldwide locations

http://il.linkedin.com/in/tzachzohar

http://www.kenshoo.com/

Agenda

● Continuous Delivery: What? Why?
● RTB Project
● How: 10 Field Tested Tips
● The Process
● Appendices

Continuous Delivery: Definition(s)

“Continuous Delivery (CD) is a design practice …blah blah blah… Techniques such as automated testing, continuous integration …blah blah blah… resulting in the ability to rapidly, reliably and repeatedly push out enhancements …blah blah blah.”

- Wikipedia

http://en.wikipedia.org/wiki/Continuous_delivery


TL;DR


“Continuous delivery is a set of principles and practices to reduce the cost, time, and risk of delivering incremental changes to users.”

- Jez Humble

http://www.thoughtworks.com/insights/blog/case-continuous-delivery


“Continuous Delivery is a software development discipline where you build software in such a way that the software can be released to production at any time”

- Martin Fowler

http://martinfowler.com/bliki/ContinuousDelivery.html

Continuous Delivery: Why bother?

“Our highest priority is to satisfy the customer through early and continuous delivery of valuable software”

First principle of the Agile Manifesto

http://agilemanifesto.org/principles.html

Continuous Delivery: Why bother?

● Better suited product
● Responsiveness
● Less waste
● Higher quality
● Simplicity

Recommended Further Reading on ThoughtWorks

http://www.thoughtworks.com/insights/blog/case-continuous-delivery

Background: RTB Project

● ~1.5 years
● ~3 developers, 1 PM, 0.5 Ops (no QA)
● Dozens of paying clients
● ~50 servers (AWS)
● ~1.5M requests per minute
● ~7ms average response time
● ~99.9% availability


● Frontend Cluster: highly available, high throughput, ~20 node cluster
● Backend: single node, internal APIs
● FBX: Facebook RTB API
● Reporting Cluster: Elastic Map Reduce (EMR), on-demand 16-node cluster
● Cassandra Cluster: highly available, high throughput, ~24 node cluster
● S3: raw traffic logs


~5-10 deployments / week

1. The Obvious

● Single branch (details later)
● Full, Fast, Reliable coverage
● Full deployment automation
● Fast feedback
● ABCD - Always Be Continuously Deploying

http://itsmeduncan.com/always-be-cap-deploying-abcd

2. Four-Layer Test Suite

● Unit: complete functional coverage
● Integration: with external systems - thin!
● Behavioral: we use Cucumber
● Staging: verify actual server upgrade

http://cukes.info/

Staging: verify compatibility of new build with other components’ production builds


3. Keep Builds Stable

Do not overlook a test that “sometimes fails”: being able to trust the build status is crucial.

3. Keep Builds Stable

Be suspicious of:

● Random data tests
● Asynchronous tests
● Integration tests
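One common way to keep random-data tests from becoming tests that “sometimes fail” is to draw the data from a seeded generator and report the seed on failure, so a red build can be replayed exactly. The following is a minimal sketch of that idea, not the project's actual test code; `normalize_bid` and its limits are hypothetical stand-ins for whatever is under test.

```python
import random
import unittest

MAX_BID = 10_000_000  # hypothetical upper bound, for illustration only


def normalize_bid(price):
    """Hypothetical function under test: clamp a bid into [0, MAX_BID]."""
    return max(0, min(price, MAX_BID))


class RandomDataTest(unittest.TestCase):
    def test_normalized_bid_stays_in_range(self):
        # Choose a fresh seed per run, but keep it so failures are reproducible.
        seed = random.SystemRandom().randrange(2**32)
        rng = random.Random(seed)
        for _ in range(1000):
            price = rng.randint(-10**9, 10**9)
            result = normalize_bid(price)
            # The failure message carries the seed and the offending input.
            self.assertTrue(0 <= result <= MAX_BID,
                            msg=f"seed={seed} price={price}")
```

When such a test fails, the seed from the message can be hard-coded temporarily to reproduce the exact inputs locally.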

4. Master Is Always Shippable

On every commit? Not quite. We follow the “GitHub Flow”:

1. pull: Master → Local Master
2. checkout: Local Master → Local Feature Branch
3. push: Local Feature Branch → Feature Branch
4. merge: Feature Branch → Master

https://guides.github.com/introduction/flow/index.html

4. Master Is Always Shippable

“Merge” == Build and Deploy

credit: [email protected]

5. Rigorous Code Reviews

● Because “merge” means “deploy”!
● Insist on proper coverage
● Insist on code cleanliness
● Insist on consistent design
● Insist!

5. Rigorous Code Reviews

https://github.com/tzachz/github-comment-counter


6. Real-Time Feedback

Detect issues immediately and visually

7. Keep Upgrade in Mind (1)

Use the “Parallel Change” pattern when changing cross-node APIs / Data

1. Write: old, Read: both
(deploy)
2. Write: new, Read: both
(deploy)
3. Write: new, Read: new

http://martinfowler.com/bliki/ParallelChange.html
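As an illustration of the three Parallel Change phases applied to a message exchanged between nodes, here is a minimal sketch. The field names `price` (old) and `bid_price` (new) are hypothetical, and the flag-based writer is just one way to stage the rollout, not necessarily how the RTB project did it.

```python
OLD_KEY = "price"       # hypothetical old field name
NEW_KEY = "bid_price"   # hypothetical new field name


def read_price(message):
    """Phases 1 and 2: the reader accepts both formats."""
    if NEW_KEY in message:
        return message[NEW_KEY]
    return message[OLD_KEY]


def write_price(value, use_new_format):
    """Phase 1 writes the old format; phase 2 onward writes the new one."""
    key = NEW_KEY if use_new_format else OLD_KEY
    return {key: value}


def read_price_new_only(message):
    """Phase 3: once no node writes the old format, drop the old branch."""
    return message[NEW_KEY]
```

Each phase is a separate deployment: readers must understand both formats on every node before any writer switches to the new one, and the old read path is removed only after all writers have switched.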

8. Keep Upgrade in Mind (2)

Verify backward compatibility in tests
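A backward-compatibility test can be as simple as feeding the new build payloads in the format the production build emits. The sketch below assumes hypothetical sample payloads and a hypothetical `parse_bid` function; in a real suite the samples would be captured from the production version rather than written by hand.

```python
import json
import unittest

# Hypothetical payloads in the format written by the current production build.
PRODUCTION_SAMPLES = [
    '{"campaign_id": 7, "price": 120}',
    '{"campaign_id": 8, "price": 95, "currency": "USD"}',
]


def parse_bid(raw):
    """Hypothetical parser in the new build; it must keep accepting old payloads."""
    data = json.loads(raw)
    return {
        "campaign_id": data["campaign_id"],
        # Prefer the new field name, fall back to the old one.
        "price": data.get("bid_price", data.get("price")),
        "currency": data.get("currency", "USD"),
    }


class BackwardCompatibilityTest(unittest.TestCase):
    def test_new_parser_reads_production_payloads(self):
        for raw in PRODUCTION_SAMPLES:
            with self.subTest(raw=raw):
                parsed = parse_bid(raw)
                self.assertIsNotNone(parsed["price"])
                self.assertEqual(parsed["currency"], "USD")
```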

9. A/B Testing

Apply new features to a limited user-group
Measure business results per-group

(Not by branching)

9. A/B Testing

Splitting into groups correctly is important
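One widely used way to split users into stable groups is to hash the user ID together with the experiment name, so the same user always lands in the same group and different experiments are split independently. This is a generic sketch of that technique, not necessarily the grouping used in the RTB project; the function name and parameters are hypothetical.

```python
import hashlib


def assign_group(user_id, experiment, treatment_percent):
    """Deterministically map a user to 'treatment' or 'control'."""
    # Including the experiment name decorrelates groups across experiments.
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return "treatment" if bucket < treatment_percent else "control"
```

Because the assignment is a pure function of its inputs, every node in a cluster reaches the same decision without shared state. This addresses only the mechanics of splitting; it does not by itself guard against biased populations or wrong comparison methods.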

9. A/B Testing

It’s easy to mess up (neglecting biases, wrong grouping, wrong comparison methods)

This excellent talk by LivePerson’s Shlomo Lahav helped us a lot

http://www.youtube.com/watch?v=wim79vHA_oI

10. Own It

● Constantly check builds
● Constantly collect feedback
● Constantly check monitors
● Answer the phone at 3am

10. Own It

That’s It.

The Process

● Greenfield? That’s easy:
○ Start with deployment and build
○ Deploy a Hello World application
○ Every new feature is test-covered

The Process (RTB)

1. Increase Unit+Integration coverage; create naive deployment automation; create monitoring; manual staging tests

2. Automated staging; downtime eradicated; manual (but frequent) deployment trigger

3. Autopilot - deploy upon commit

~ 9 Months

~ 3 Months

Appendix A. Partial Tool List

● Testing: JUnit, Cucumber, Nose
● Build / CI: Jenkins, Gradle, JaCoCo
● Code Review: GitHub
● Provisioning: Puppet
● Deployment: Fabric, boto
● Monitoring: Metrics, Graphite

http://junit.org/

http://cukes.info/

http://pythontesting.net/framework/nose/nose-introduction/

http://jenkins-ci.org/

http://www.gradle.org/

http://www.eclemma.org/jacoco/

https://github.com/

http://puppetlabs.com/

http://www.fabfile.org/

http://boto.readthedocs.org/en/latest/

http://metrics.codahale.com/

http://graphite.wikidot.com/

Appendix B. Are You Ready?

Unit Coverage > 90%?

Good Staging Tests?

Informative Monitors?

Builds Are Kept Green?

No API Breaking Changes?

Rigorous Code Reviews?

Support Has Your Phone Number?

Do You Own it?

(Flowchart: a “No” to any question leads to “Not Ready”; a “Yes” leads to the next question.)

credit: [email protected]

Thanks. Questions?
