Making Spinnaker Go @ Stitch Fix

Diana Tkachenko, Data Platform Engineer

Posted on 16-Apr-2017

TRANSCRIPT

Page 1: Making Spinnaker Go @ Stitch Fix

Making Spinnaker Go

@ Stitch Fix

Diana Tkachenko, Data Platform Engineer

Page 2: Making Spinnaker Go @ Stitch Fix

Spinnaker Is Not Yet in Production

Let me tell you an awesome story of how to install and set up Spinnaker so that it works for you!

Page 3: Making Spinnaker Go @ Stitch Fix

I. Our Infrastructure
II. Setting Up Spinnaker
III. Authentication on Spinnaker

Page 4: Making Spinnaker Go @ Stitch Fix

PART I Our Infrastructure Pre-Spinnaker

Page 5: Making Spinnaker Go @ Stitch Fix

100% of Infrastructure on AWS

3 Peered VPCs

Isolate environments into different VPCs:

● TEST
  ○ testing deployments before pushing to prod
● PROD
  ○ all production deployments
● INFRA
  ○ tools that both prod and test need to use

[Diagram: three peered VPCs (prod, test, infra); the infra VPC hosts jenkins, artifactory, spinnaker and flotilla]

Page 6: Making Spinnaker Go @ Stitch Fix

Deployment Pipeline

Immutable Server Pattern

● Package Code into RPMs
● Bake AMI from RPM
● Deploy
  ○ Set up Launch Config with AMI
  ○ Create ASG
  ○ Set up ELBs, Route53

Page 7: Making Spinnaker Go @ Stitch Fix

Process Overview

Definition of Application (one-time setup to create an application):

● create spec: the RPM is built from this recipe
● create ELB, create Route53: app “scaffolding” on AWS; Route53 points to the ELB

Repeatable Deployment Process (iterative process for deploying new versions):

make changes to code → build RPM → bake AMI → launch ASG → attach to ELB

Page 8: Making Spinnaker Go @ Stitch Fix

Step 1: Build RPM from Spec

Wrote up simple tools to create the RPM:

● Create spec file from template
● Customize spec file
● Jenkins job to build RPM

The process appears complex:

● The spec file seems scary for user
● But it makes deployment easy down the line!

Name: sf-helloworld
Version: 0.0.1
Release: 1
Summary: YOUR SUMMARY HERE!
Group: Development/Libraries
License: stitchfix-internal
BuildArch: noarch
AutoReqProv: no
BuildRequires:
Requires: sf-base, sf-aa, sf-nginx

%install
mkdir -p $RPM_BUILD_ROOT{/stitchfix,/etc/init.d}
cp -R %{_sourcedir} $RPM_BUILD_ROOT/stitchfix/%{base_name}
cp %{_topdir}/SCRIPTS/sf-%{base_name} $RPM_BUILD_ROOT/etc/init.d/sf-%{base_name}

%files
/stitchfix/%{base_name}
/etc/init.d/sf-%{base_name}

%post
ln -s /etc/nginx/sites-available/sf-app.conf /etc/nginx/sites-enabled/sf-app.conf
/usr/bin/pip-2.7 install -e /stitchfix/%{base_name}
chkconfig --add %{name}
chkconfig --levels 345 %{name} on

sf-helloworld.spec
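
The "create spec file from template" step could look roughly like the sketch below. It is only an illustration: the template text, the `render_spec` helper and its parameters are assumptions, not the actual Stitch Fix tooling. Only the field names are taken from the sample spec.

```python
from string import Template

# Hypothetical template: the field names mirror the sample spec,
# but the template itself is an assumption made for illustration.
SPEC_TEMPLATE = Template("""\
Name: sf-$app
Version: $version
Release: $release
Summary: $summary
License: stitchfix-internal
BuildArch: noarch
Requires: $requires
""")


def render_spec(app, version, summary, requires, release=1):
    """Fill the template for one application and return the spec text."""
    return SPEC_TEMPLATE.substitute(
        app=app,
        version=version,
        release=release,
        summary=summary,
        requires=", ".join(requires),
    )


if __name__ == "__main__":
    print(render_spec("helloworld", "0.0.1", "Hello world service",
                      ["sf-base", "sf-aa", "sf-nginx"]))
```

A user would then only customize the rendered file (summary, extra dependencies) before the Jenkins job builds the RPM.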

Page 9: Making Spinnaker Go @ Stitch Fix

Step 2: Bake AMI

● Used aminator (also from Netflix) to create AMIs
● Jenkins job for baking

How does AMI get baked?

1. Create volume from base AMI id
2. Attach and mount volume
3. Chroot into volume
4. Install RPM on volume
5. Create snapshot from volume
6. Register AMI from snapshot

[Diagram: the EC2 instance (baking machine) gets the RPM from Artifactory (the RPM repo) and installs it on the volume]
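
The six bake steps can be sketched as a single function. This is not aminator's code, just a minimal outline under stated assumptions: `ec2` is expected to behave like a boto3 EC2 client, `base_snapshot_id` is assumed to be the root snapshot of the base AMI, and `install_rpm` is a hypothetical callback that does the mount, chroot and RPM install on the baking machine. The device names and image attributes are placeholders.

```python
def bake_ami(ec2, base_snapshot_id, instance_id, zone, name,
             install_rpm, device="/dev/sdf"):
    """Walk the bake steps in order and return the id of the new AMI."""
    # 1. Create a volume from the base AMI's root snapshot.
    volume_id = ec2.create_volume(SnapshotId=base_snapshot_id,
                                  AvailabilityZone=zone)["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # 2. Attach the volume to the baking instance.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id,
                      Device=device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])

    # 2-4. Mount, chroot and install the RPM (hypothetical callback,
    # because this part runs as shell commands on the baking machine).
    install_rpm(device)

    # 5. Create a snapshot from the volume.
    snapshot_id = ec2.create_snapshot(VolumeId=volume_id,
                                      Description=name)["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # 6. Register an AMI whose root device is that snapshot.
    image = ec2.register_image(
        Name=name,
        Architecture="x86_64",
        VirtualizationType="hvm",
        RootDeviceName="/dev/xvda",
        BlockDeviceMappings=[
            {"DeviceName": "/dev/xvda", "Ebs": {"SnapshotId": snapshot_id}},
        ],
    )
    return image["ImageId"]
```

With real credentials, `ec2` would be `boto3.client("ec2")`. Passing the client in keeps the function easy to exercise with a stub.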

Page 10: Making Spinnaker Go @ Stitch Fix

Step 3: Deploy

[Diagram: internet traffic → Route53 (routes traffic) → ELB → ASG of EC2 instances (immutable servers). The RPM is baked into the AMI; the AMI and the Launch Config are both used to create the instances.]

Page 11: Making Spinnaker Go @ Stitch Fix

Why Spinnaker?

80 Data Scientists

10 Platform Engineers

Our data scientists are responsible for:

● Building ETLs

● Deploying Dashboards and Services

We value self service!

Page 12: Making Spinnaker Go @ Stitch Fix

PART II Setting Up Spinnaker In Our Infrastructure

Page 13: Making Spinnaker Go @ Stitch Fix

Key Differences from the Netflix Setup

And how to handle them

1. Amazon Linux instead of Ubuntu
   a. Adding RPM support to Gradle
   b. System V instead of Upstart
2. Nginx instead of Apache
3. Secured Redis on AWS
4. No Cassandra in Existing Architecture

Page 14: Making Spinnaker Go @ Stitch Fix

Diff #1

You drew the short straw with Amazon Linux (Red Hat) instead of Ubuntu

Page 15: Making Spinnaker Go @ Stitch Fix

Adding RPM Support to Gradle

Create the buildRpm block:

● add our rpm repo in /etc/yum.repos.d on bake machine

● add dependency rpms inside the block

● make sure to build all the other spinnaker rpms and push to your rpm repo

./gradlew buildRpm

// Ubuntu
buildDeb {
    requires('redis-server', '3.0.5', GREATER | EQUAL)
    requires('spinnaker-clouddriver')
    requires('spinnaker-deck')
    requires('spinnaker-echo')
    requires('spinnaker-front50')
    requires('spinnaker-gate')
    requires('spinnaker-igor')
    requires('spinnaker-orca')
    requires('spinnaker-rosco')
    requires('spinnaker-rush')
    requires('apache2')
}

// Centos
buildRpm {
    requires('sf-nginx')
    requires('sf-base')
    requires('spinnaker-clouddriver')
    requires('spinnaker-deck')
    requires('spinnaker-echo')
    requires('spinnaker-front50')
    requires('spinnaker-gate')
    requires('spinnaker-igor')
    requires('spinnaker-orca')
    requires('spinnaker-rosco')
    requires('spinnaker-rush')
    os = LINUX // ⇐ YOU NEED THIS MAGIC LINE!
}

[spinnaker] build.gradle

Page 16: Making Spinnaker Go @ Stitch Fix

Upstart on Amazon Linux

Different startup systems:

● We use System V (ancient)
  ○ service nginx start
  ○ startup scripts in /etc/init.d
  ○ chkconfig for starting on bootup
● Spinnaker uses upstart
  ○ initctl start spinnaker
  ○ conf files in /etc/init

Another Issue:

● Amazon Linux ships upstart 0.6.5, which is way older than the 1.4 on Ubuntu

description "rosco"

start on filesystem or runlevel [2345]

# not supported in old version
# so for amazon linux we remove these lines:
setuid spinnaker
setgid spinnaker

expect fork

stop on stopping spinnaker

env HOME=/home/spinnaker

exec /opt/rosco/bin/rosco 2>&1 > /var/log/spinnaker/rosco/rosco.log &

[rosco] /etc/init/rosco.conf

Page 17: Making Spinnaker Go @ Stitch Fix

Diff #2

You’re hip and use Nginx instead of Apache

Page 18: Making Spinnaker Go @ Stitch Fix

Namespace Gate and Rosco in Nginx

● include /etc/nginx/sites-enabled in main nginx conf
● on deploy, symlink /etc/nginx/sites-available/spinnaker.conf => /etc/nginx/sites-enabled/spinnaker.conf

[spinnaker] /etc/nginx/sites-available/spinnaker.conf

# all services on the same machine
server {
    listen 80;

    location / {
        root /opt/deck/html;
    }

    # namespacing gate
    location ~* ^/gate/ {
        rewrite ^/gate/(.*) /$1 break;
        proxy_pass http://localhost:8084;
    }

    # namespacing rosco
    location ~* ^/rosco/ {
        rewrite ^/rosco/(.*) /$1 break;
        proxy_pass http://localhost:8087;
    }
}

[Diagram: spinnaker.<internal-domain>.com → ELB (HTTP 80 ⇒ HTTP 80) → nginx on port 80 on the EC2 instance]

/ => /opt/deck/html
/gate/health => localhost:8084/health
/rosco/health => localhost:8087/health
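
To make the namespacing concrete, here is a small Python model of what the three locations do to a request path. It is a simplified sketch, not nginx itself: it ignores query strings and other nginx details, and the `route` function and `LOCATIONS` table are names made up for this illustration.

```python
import re

# (location regex, rewrite regex, upstream) for each namespaced service.
# The location match is case-insensitive (~*), the rewrite is not.
LOCATIONS = [
    (re.compile(r"^/gate/", re.IGNORECASE), re.compile(r"^/gate/(.*)"),
     "http://localhost:8084"),
    (re.compile(r"^/rosco/", re.IGNORECASE), re.compile(r"^/rosco/(.*)"),
     "http://localhost:8087"),
]
DECK_ROOT = "/opt/deck/html"


def route(path):
    """Return where a request path ends up under the namespaced config."""
    for location, rewrite, upstream in LOCATIONS:
        if location.search(path):
            # Same effect as: rewrite ^/gate/(.*) /$1 break; proxy_pass ...
            return upstream + rewrite.sub(r"/\1", path)
    # Everything else is served as static files from deck's html root.
    return DECK_ROOT + path


if __name__ == "__main__":
    for path in ("/", "/gate/health", "/rosco/health"):
        print(path, "=>", route(path))
```

The prefix exists only between the browser and nginx. Gate and rosco never see `/gate` or `/rosco` in the paths they receive.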

Page 19: Making Spinnaker Go @ Stitch Fix

Diff #3

You happily use AWS ElastiCache for Redis, but find out Spinnaker angers it

Page 20: Making Spinnaker Go @ Stitch Fix

AWS ElastiCache is Special

AWS Redis won’t let you issue CONFIG commands!

● Redis version has to be >= 2.8.0
● On AWS ElastiCache console, add notify-keyspace-events=Egx to a new parameter group
  ○ this enables Redis keyspace notifications (keyevent channel) for generic commands and expired events
● In gate.yml, add redis.configuration.secure=true

server:
  port: ${services.gate.port:8084}
  address: ${services.gate.host:localhost}

...

redis:
  connection: ${services.redis.connection}
  # add the following two lines if using aws redis
  configuration:
    secure: true

[spinnaker] /config/gate.yml

[Diagram: AWS Redis 2.8.0 with the spinnaker parameter group, notify-keyspace-events=Egx]
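
For anyone wondering what `Egx` stands for, the flag characters can be decoded with a small lookup. The meanings below follow the Redis keyspace notifications documentation. The `describe` helper is just an illustration, not part of Spinnaker or Redis.

```python
# Meaning of the notify-keyspace-events flag characters (Redis docs).
FLAGS = {
    "K": "keyspace events, published with the __keyspace@<db>__ prefix",
    "E": "keyevent events, published with the __keyevent@<db>__ prefix",
    "g": "generic commands such as DEL, EXPIRE, RENAME",
    "$": "string commands",
    "l": "list commands",
    "s": "set commands",
    "h": "hash commands",
    "z": "sorted set commands",
    "x": "expired events",
    "e": "evicted events",
}


def describe(value):
    """Explain each flag character of a notify-keyspace-events value."""
    return [FLAGS[flag] for flag in value]


if __name__ == "__main__":
    for line in describe("Egx"):
        print(line)
```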

Page 21: Making Spinnaker Go @ Stitch Fix

Diff #4

You’d like a quick Cassandra hack since you are Cassandra-less

Page 22: Making Spinnaker Go @ Stitch Fix

Quick EBS Backed Cassandra Node

Don’t want an entire cluster - want fast setup, so create single-node Cassandra:

● EBS backed store for cassandra data
● Startup script remaps route53 entry on each deployment
  ○ Point straight to EC2, not ELB

On redeploy or termination:

● EBS detaches, so data is not lost
● cassandra.<internal-domain>.com mapped to new EC2

[Diagram: cassandra.<internal-domain>.com → Cassandra EC2 instance, with the EBS volume at /cassandra-storage]

# change all store dirs to EBS
data_file_directories:
    - /cassandra-storage/data
commitlog_directory: /cassandra-storage/commitlog
saved_caches_directory: /cassandra-storage/saved_caches

# point all to private route53 entry
seed_provider:
    parameters:
        - seeds: cassandra.<internal-domain>.com
listen_address: cassandra.<internal-domain>.com
rpc_address: cassandra.<internal-domain>.com

/etc/cassandra/conf/cassandra.yaml
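
The "startup script remaps route53 entry" part might look something like this sketch. It assumes the entry is an A record pointing at the instance's private IP, which the slides do not spell out. `route53` is expected to behave like a boto3 Route53 client, and the function names, zone id and hostname in the example are placeholders.

```python
def build_upsert(record_name, private_ip, ttl=60):
    """Build a Route53 ChangeBatch that points record_name at one instance."""
    return {
        "Comment": "repoint single-node Cassandra after redeploy",
        "Changes": [{
            "Action": "UPSERT",  # create the record, or overwrite the old one
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",  # assumption: plain A record to the EC2 private IP
                "TTL": ttl,
                "ResourceRecords": [{"Value": private_ip}],
            },
        }],
    }


def remap(route53, hosted_zone_id, record_name, private_ip):
    """Apply the change through a boto3-like Route53 client."""
    return route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=build_upsert(record_name, private_ip),
    )


if __name__ == "__main__":
    # Hypothetical hostname and address, printed instead of sent to AWS.
    print(build_upsert("cassandra.internal.example.com", "10.0.1.23"))
```

Because the record is upserted on every boot, a replacement instance takes over the name as soon as it starts and re-attaches the EBS volume.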

Page 23: Making Spinnaker Go @ Stitch Fix

Overview: Spinnaker on AWS

[Diagram: Spinnaker on AWS]

● Route53 CNAME for the load balancer: spinnaker.<internal-domain>.com
● Load balancer listeners: HTTP 80 ⇒ HTTP 80
● EC2 instance in an ASG running every service:

  Service      Port
  nginx        80
  deck         80
  clouddriver  7002
  front50      8080
  orca         8083
  gate         8084
  rush         8085
  rosco        8087
  igor         8088
  echo         8089

● deck, rosco, gate through nginx
● gate calls everything else
● External stores: cassandra, redis

Page 24: Making Spinnaker Go @ Stitch Fix

PART III Auth on Spinnaker

Keep Calm

Page 25: Making Spinnaker Go @ Stitch Fix

SSL + Auth on Spinnaker

● Where to Terminate SSL?
● Glory and the Beast of Self Signed Certs
● Google OAuth2.0 Redirects Mess up Nginx Rewrites
● Tomcat Ignores Client Certs for Client Auth

Get ready to read a lot of stack traces

Page 26: Making Spinnaker Go @ Stitch Fix

SSL: Dilemma #1

Where to terminate SSL:

a. ELB
b. Nginx
c. Server

Page 27: Making Spinnaker Go @ Stitch Fix

Nginx to Terminate SSL for Deck, Rosco

● Configure nginx with cert and key and turn ssl on
● Nginx now cannot start on bootup - needs password?
  ○ Add password to a file, add to nginx
● Now our healthcheck is messed up
  ○ Add 5000 port for easy ELB healthcheck
● Optional 80 => 443 redirect
● Notice how gate rewrite is gone…
  ○ has to do with oauth redirects

server {
    listen 5000;
    location / {
        add_header Content-Type text/plain;
        return 200 'POOOOOOOOP';
    }
}

# optional redirect here
server {
    listen 80;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    ssl_password_file /etc/keys/spinnaker.pass;
    ssl_certificate /opt/spinnaker/ssl/server.crt;
    ssl_certificate_key /opt/spinnaker/ssl/server.key;

    location / {
        root /opt/deck/html;
    }

    location ~* ^/rosco/ {
        rewrite ^/rosco/(.*) /$1 break;
        proxy_pass http://localhost:8087;
    }
}

[spinnaker] /etc/nginx/sites-available/spinnaker.conf

Page 28: Making Spinnaker Go @ Stitch Fix

For Gate, Pass Through SSL Directly to Server

We want ELB to just pass traffic through to gate without decrypting:

● Bypass nginx for gate: ports 8084 ⇒ 8084 for gate SSL

Gate is responsible for all types of authentication:

● Have client certificate?
  ○ Authenticate client certificate - this is why gate needs to terminate SSL
● No client certificate?
  ○ Send to google oauth

[Diagram: spinnaker.<internal-domain>.com → ELB → EC2]

ELB listeners:

● HTTP 80 ⇒ HTTP 80 (nginx redirects 80 ⇒ 443)
● TCP 443 ⇒ TCP 443 (nginx, port 443)
● TCP 8084 ⇒ TCP 8084 (gate, port 8084)

Page 29: Making Spinnaker Go @ Stitch Fix

SSL: Dilemma #2

Self signed certs? Meet your new best friends, the Java TrustStores

Page 30: Making Spinnaker Go @ Stitch Fix

Tomcat Needs CA to Be in Trust Store

Because we are using self-signed certs, it’s important to have our self created CA in the truststore:

● Add spinnaker cert to java keystore using keytool utility
● Add keystore/truststore file location to gate-local.yml config

server:
  ssl:
    enabled: true
    keyStore: /opt/spinnaker/ssl/keystore.jks
    keyStorePassword: poop
    keyAlias: server
    trustStore: /opt/spinnaker/ssl/keystore.jks
    trustStorePassword: poop

/opt/spinnaker/conf/gate-local.yml

But at some point I still had problems, so here’s a quick hack - add your CA to default java CA file:

$JAVA_HOME/jre/lib/security/cacerts

Page 31: Making Spinnaker Go @ Stitch Fix

OAuth: Dilemma #3

Google OAuth2.0 redirects trample all over your Nginx rewrites

Page 32: Making Spinnaker Go @ Stitch Fix

Remove Namespacing for Gate & Bypass Nginx

● Set redirect_uri to our gate address: https://spinnaker.<internal-domain>.com:8084/login

● Gate can no longer be namespaced: on redirect, the /gate prefix in the path gets lost because only $host is recorded

[Diagram: OAuth 2.0 flow between the Web Browser (deck javascript), Spinnaker (gate) at https://spinnaker.<internal-domain>.com:8084/login, and the Google Auth Server]

1. User authorization request
2. User authorizes application
3. Auth code grant
4. Access token request
5. Access token grant

Page 33: Making Spinnaker Go @ Stitch Fix

Client Auth: Dilemma #4

Tomcat doesn’t seem to care about your client cert

Page 34: Making Spinnaker Go @ Stitch Fix

Make Tomcat Request Client Cert for Client Auth

We need to enable scripts to post tasks to spinnaker with client authentication:

● Create certs for client
● Configure gate tomcat to validate client cert

[Diagram: Beakhead (Spinnaker Client) → Spinnaker Gate at spinnaker.<internal-domain>.com:8084]

x509:
  enabled: true
  subjectPrincipalRegex: CN=(.*?)

server:
  ssl:
    clientAuth: want
    enabled: true
    keyStore: /opt/spinnaker/ssl/keystore.jks
    keyStorePassword: poop
    keyAlias: server
    trustStore: /opt/spinnaker/ssl/keystore.jks
    trustStorePassword: poop

/opt/spinnaker/conf/gate-local.yml
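
One thing worth checking in your own setup is what `subjectPrincipalRegex` actually captures. The sketch below uses Python's `re` as a stand-in for Java's regex engine (the two agree on lazy quantifiers) and assumes the pattern is applied with search-style matching against the certificate's subject DN. The subject string and the anchored variant are made up for comparison, not taken from the slides.

```python
import re

# Hypothetical subject DN of a client certificate.
SUBJECT = "CN=beakhead,OU=data-platform,O=Example"


def principal(pattern, subject=SUBJECT):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = re.search(pattern, subject)
    return match.group(1) if match else None


# A bare lazy group is satisfied by the empty string.
bare = principal(r"CN=(.*?)")
# Anchoring the group on the next comma (or end of string) captures the name.
anchored = principal(r"CN=(.*?)(?:,|$)")

if __name__ == "__main__":
    print(repr(bare))      # ''
    print(repr(anchored))  # 'beakhead'
```

If the principal comes out empty, adding a terminator such as `(?:,|$)` to the pattern is one possible fix.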

POST /tasks

Include client cert in request

● Layer based authentication on gate

● Tomcat validates cert: has to recognize cert authority from truststore

● Returns response if authenticated
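
A script posting a task with a client certificate could be sketched as below, using only the Python standard library. This is not Beakhead's code. The file paths, hostname and task payload in the usage note are placeholders, and the helper names are invented for the example.

```python
import http.client
import json
import ssl


def make_context(ca_file, cert_file, key_file):
    """TLS context that trusts our own CA and presents a client certificate."""
    context = ssl.create_default_context(cafile=ca_file)
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


def build_task_request(task):
    """Return (method, path, body, headers) for posting a task to gate."""
    body = json.dumps(task)
    return "POST", "/tasks", body, {"Content-Type": "application/json"}


def post_task(host, port, context, task):
    """Send the task over mutual TLS and return (status, response text)."""
    method, path, body, headers = build_task_request(task)
    conn = http.client.HTTPSConnection(host, port, context=context)
    try:
        conn.request(method, path, body=body, headers=headers)
        response = conn.getresponse()
        return response.status, response.read().decode("utf-8")
    finally:
        conn.close()
```

Usage would be along the lines of `post_task("spinnaker.internal.example.com", 8084, make_context("ca.crt", "client.crt", "client.key"), task)`. The `ca.crt` file has to be the self created CA, otherwise the handshake fails on the client side before gate ever sees the certificate.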

Page 35: Making Spinnaker Go @ Stitch Fix

PART IV Takeaways

What we learned

Page 36: Making Spinnaker Go @ Stitch Fix

Spinnaker is complex!

There are barriers to overcome if working with different infrastructure.

Page 37: Making Spinnaker Go @ Stitch Fix

I learned a lot about SSL, OAuth 2.0 and Client Authentication.

Like a lot.

Page 38: Making Spinnaker Go @ Stitch Fix

Thanks for Listening!

We are very much looking forward to having Spinnaker in production.

Find me on spinnaker slack

@dtkachenko

All pictures used in this presentation are credited to Allie Brosh, hyperboleandahalf.blogspot.com