Large Scale Kubernetes on AWS at Europe's Leading Online Fashion Platform - AWS Tech Community...

AWS TECH COMMUNITY DAYS 2017-09-28 HENNING JACOBS @try_except_ Kubernetes on AWS @ZalandoTech

Upload: henning-jacobs

Post on 22-Jan-2018


TRANSCRIPT

AWS TECH COMMUNITY DAYS

2017-09-28

HENNING JACOBS

@try_except_

Kubernetes on AWS @ZalandoTech

2


ZALANDO

15 markets

6 fulfillment centers

21 million active customers

3.6 billion € net sales 2016

200 million visits per month

13,000 employees in Europe

3


ZALANDO TECHNOLOGY

HOME-BREWED, CUTTING-EDGE & SCALABLE technology solutions

1,800 employees from 77 nations

6 tech locations + HQs in Berlin

help our brand to WIN ONLINE

4


ZALANDO TECH’S INFRASTRUCTURE

5


FOUR ERAS AT ZALANDO TECH

PHP: Data center, PHP files

ZOMCAT (from 2010): Data center, WAR

STUPS (from 2015): AWS, Docker, Cloud Formation, low level (AWS API)

KUBERNETES (from 2016): AWS, Docker, Kubernetes manifest, high abstraction level

6


LARGE SCALE?

8


KUBERNETES: ARCHITECTURE

9


KUBERNETES ON AWS: CONTEXT

200 engineering teams

30 prod. clusters

AWS/STUPS

Dockerized apps

No manual operations

Reliability

Autoscaling

Seamless migration

10


ISOLATED AWS ACCOUNTS

[Diagram: Internet traffic reaches two isolated AWS accounts, Product ABC (*.abc.example.org) and Product XYZ (*.xyz.example.org); labels shown: LB, LB, EC2]

11


KUBERNETES ON AWS

12


DEPLOYMENT

13


DEPLOYMENT CONFIGURATION

.
├── deploy/apply
│   ├── deployment.yaml   # K8s Deployment
│   ├── credentials.yaml  # K8s TPR
│   ├── ingress.yaml      # K8s Ingress
│   └── service.yaml      # K8s Service
└── delivery.yaml         # pipeline config

14


INGRESS.YAML

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: "..."
spec:
  rules:
    # DNS name your application should be exposed on
    - host: "myapp.foo.example.org"
      http:
        paths:
          - backend:
              serviceName: "myapp"
              servicePort: 80
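The Ingress above routes to a Service named "myapp" on port 80, which is the service.yaml listed in the deployment configuration. The slides do not show that file, so the following is only a minimal sketch of what it could look like; the pod label and the container port are assumptions made for illustration.

```yaml
# Hypothetical service.yaml matching the Ingress backend above.
apiVersion: v1
kind: Service
metadata:
  name: myapp              # must equal serviceName in the Ingress rule
spec:
  selector:
    application: myapp     # assumed pod label, not given in the slides
  ports:
    - port: 80             # must equal servicePort in the Ingress rule
      targetPort: 8080     # assumed port the container listens on
```

The Ingress only references the Service by name and port, so the selector is the single place that ties traffic to the pods of the Deployment.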

15


CONTINUOUS DELIVERY PLATFORM

16


CDP: APPLY

17


CDP: OPTIONAL APPROVAL

18


AWS INTEGRATION

19


CLOUD FORMATION VIA CI/CD

.
├── deploy/apply
│   ├── deployment.yaml    # K8s Deployment
│   ├── cf-iam-role.yaml   # AWS IAM Role
│   ├── cf-rds.yaml        # AWS RDS Database
│   ├── kube-ingress.yaml  # K8s Ingress
│   ├── kube-secret.yaml   # K8s Secret
│   └── kube-service.yaml  # K8s Service
└── delivery.yaml          # CI/CD config

20


ASSIGNING AWS IAM ROLE TO POD

kind: Deployment
spec:
  template:
    metadata:
      annotations:
        # annotation for kube2iam
        iam.amazonaws.com/role: "app-myapp-role"
    spec:
      containers:
        - name: ...
          ...

https://github.com/jtblin/kube2iam

⇒ AWS SDKs just work as expected
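For the annotation to take effect, kube2iam has to be able to assume the named role on behalf of the pod, which means the role must trust the IAM role of the worker nodes. The slides do not show cf-iam-role.yaml, so this is a hedged sketch of such a Cloud Formation template; the account ID, the worker node role name, the bucket and the attached policy are all placeholders chosen for illustration.

```yaml
# Sketch of a cf-iam-role.yaml. All names and ARNs are placeholders.
AWSTemplateFormatVersion: "2010-09-09"
Description: IAM role assumed for myapp pods via kube2iam
Resources:
  MyAppRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: app-myapp-role            # matches the pod annotation
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action: sts:AssumeRole
            Principal:
              # assumed ARN of the worker nodes' instance role
              AWS: arn:aws:iam::123456789012:role/worker-node-role
      Policies:
        - PolicyName: read-myapp-bucket   # hypothetical example permission
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action: s3:GetObject
                Resource: arn:aws:s3:::myapp-example-bucket/*
```

With a role like this in place, the application needs no AWS keys of its own: the SDK asks the instance metadata endpoint for credentials and kube2iam answers with temporary credentials for the annotated role.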

21


OAUTH / IAM INTEGRATION

22


SERVICE TO SERVICE AUTHNZ

Kubernetes Cluster

https://resource-server.example.org/protected

HTTP/1.1 401 Unauthorized

{"message": "Authorization required"}

23


CREDENTIAL PROVIDER

24


USING THE OAUTH CREDENTIALS

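#!/bin/bash
# Read the OAuth token from the mounted credentials file
secret=$(cat /creds/mytok-token-secret)
# Call the protected resource, passing the token as a Bearer credential
curl -H "Authorization: Bearer $secret" \
    https://resource-server.example.org/protected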
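The script reads its token from /creds/mytok-token-secret, which suggests the credentials are mounted into the pod as files. The slides do not show how that mount is configured; one plausible way, sketched here under that assumption, is a Kubernetes Secret volume in the pod template. The container name, image and Secret name are hypothetical.

```yaml
# Fragment of a Deployment; names marked "hypothetical" are illustrative only.
spec:
  template:
    spec:
      containers:
        - name: myapp                      # hypothetical container name
          image: example.org/myapp:1.0     # hypothetical image
          volumeMounts:
            - name: credentials
              mountPath: /creds            # directory the script reads from
              readOnly: true
      volumes:
        - name: credentials
          secret:
            secretName: myapp-credentials  # hypothetical Secret name
```

Each key of a mounted Secret appears as a file in the mount directory, so a key named mytok-token-secret would show up as /creds/mytok-token-secret.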

25


CHALLENGES

26


CHALLENGES

1. Getting Started

2. Stability

3. Onboarding

4. User Experience

27


CHALLENGE 1: GETTING STARTED

28


GETTING STARTED

https://github.com/hjacobs/kubernetes-on-aws-users


30


CLUSTER PROVISIONING

31


CLUSTER PROVISIONING

• Two Cloud Formation stacks

• Master & worker ASGs + etcd

• Nodes w/ Container Linux

• K8s manifests applied separately

• kube-system Deployments

• DaemonSets

32


GETTING STARTED

Goal: use Kubernetes API as primary interface for AWS

• Mate, External DNS

• Kubernetes Ingress Controller for AWS

• kube2iam

⇒ we wrote new components to achieve our goal

33


INGRESS CONTROLLER

https://github.com/zalando-incubator/kube-ingress-aws-controller / https://github.com/kubernetes-incubator/external-dns

34


GETTING STARTED

Other questions we asked ourselves...

• Single AZ vs. Multi AZ?

• Federation?

• Overlay network?

• Authnz?

35


GETTING STARTED

Other questions we asked ourselves...

• Single AZ vs. Multi AZ? ⇒ Multi AZ

• Federation? ⇒ No, not ready yet

• Overlay network? ⇒ Flannel, “rock solid”

• Authnz? ⇒ OAuth, webhook

36


CHALLENGE 2: STABILITY

37


CLUSTER UPDATES

38


STABILITY: AWS RATE LIMITS

• Ran into the same trap twice (Mate & Ingress Ctrl)

• Kubernetes core causes many calls (e.g. EBS)

• Monitoring (ZMON) needs to poll AWS

⇒ One of our biggest pain points with AWS (and all workarounds are hard and/or ugly)

39


STABILITY: LIMIT RANGE

kubectl describe limitrange
Name:       limits
Namespace:  default
Type        Resource  Min  Max   Default Req  Default Limit  Max Limit/Request Ratio
----        --------  ---  ---   -----------  -------------  -----------------------
Container   memory    -    64Gi  100Mi        1Gi            -
Container   cpu       -    16    100m         3              -

http://kubernetes-on-aws.readthedocs.io/en/latest/admin-guide/kubernetes-in-production.html#resources

⇒ Mitigate errors on OSI layer 8 ;-)
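The kubectl output above describes an existing LimitRange object. The manifest itself is not in the slides, so the following is a reconstruction from that output rather than the original file: a LimitRange that should yield the same values when described.

```yaml
# LimitRange reconstructed from the kubectl output above (not the original file).
apiVersion: v1
kind: LimitRange
metadata:
  name: limits
  namespace: default
spec:
  limits:
    - type: Container
      max:                # upper bound a container may ask for
        memory: 64Gi
        cpu: "16"
      defaultRequest:     # request applied when a container sets none
        memory: 100Mi
        cpu: 100m
      default:            # limit applied when a container sets none
        memory: 1Gi
        cpu: "3"
```

Containers deployed without explicit resources then get the default request and limit, which guards against the human mistake of forgetting them.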

Recommended: The 5 Whys

https://en.wikipedia.org/wiki/5_Whys

ON CALL

43


CHALLENGE 3: ONBOARDING

44


ONBOARDING

• Many new concepts to grasp vs. 200 teams

• Kubernetes Training (2h)

• Documentation

• Recorded Friday Demos

• Support Channels (chat, mail)

45


CHALLENGE 4: USER EXPERIENCE

46


USER EXPERIENCE

• Continuous Delivery Platform (delivery.yaml)

• Juggling with K8s and CF YAMLs

• Inconsistent state, troubleshooting

47


KUBERNETES VS. AWS ECS

48


WHY NOT ECS?

AWS ECS                        Kubernetes
AWS API                   ⟺   Declarative API (fast & no rate limits)
Tasks, Services           ⟺   High level abstractions (Ingress, CronJob)
Static AWS API            ⟺   Extensible API (e.g. TPR)
Blox                      ⟺   Batteries included (DaemonSet, StatefulSet)
Operating worker nodes    ⟺   Operating etcd, master & worker nodes
Vendor community/support  ⟺   Huge community
AWS only                  ⟺   Run anywhere

disclaimer: incomplete and opinionated ;-)

https://github.com/hjacobs/kube-ops-view

50


LINKS

Running Kubernetes in Production on AWS: http://kubernetes-on-aws.readthedocs.io/en/latest/admin-guide/kubernetes-in-production.html

Kube AWS Ingress Controller: https://github.com/zalando-incubator/kube-ingress-aws-controller

External DNS: https://github.com/kubernetes-incubator/external-dns

PostgreSQL Operator: https://github.com/zalando-incubator/postgres-operator

Zalando Cluster Configuration: https://github.com/zalando-incubator/kubernetes-on-aws

List of Organizations using Kubernetes on AWS: https://github.com/hjacobs/kubernetes-on-aws-users


QUESTIONS?

HENNING JACOBS

DEDICATED OWNER

DEVELOPER PRODUCTIVITY

[email protected]

@try_except_

Illustrations by @01k
