AWS re:Invent 2016: Netflix: Container Scheduling, Execution, and Integration with AWS (CON313)


Page 1: AWS re:Invent 2016: Netflix: Container Scheduling, Execution, and Integration with AWS (CON313)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Andrew Spyker, Sr. Software Engineer, Netflix

December 2016

CON313

Netflix Container Scheduling, Execution, and Integration with AWS

Page 2

What to Expect from the Session

• Why containers?

• Including current use cases and scale

• How did we get there?

• Overview of our container cloud platform

• Collaboration with ECS

Page 3

About Netflix

• 86.7M members

• 1000+ developers

• 190+ countries

• > ⅓ NA internet download traffic

• 500+ microservices

• Over 100,000 VMs

• 3 regions across the world

Page 4

Why containers?

Given that our VM architecture is comprised of …

amazingly resilient,

microservice driven,

cloud native,

CI/CD devops enabled,

elastically scalable

do we really need containers?

Page 5

Our Container System Provides Innovation Velocity

• Iterative local development, deploy when ready

• Manage app and dependencies easily and completely

• Simpler way to express resources, let system manage

Page 6

Innovation Velocity - Use Cases

• Media encoding - encoding research development time

• Using VMs - 1 month, using containers - 1 week

• Niagara

• Build all Netflix codebases in hours

• Saves developers hundreds of hours of debugging

• Edge Rearchitecture with Node.js

• Focus returns to app development

• Simplifies, speeds test and deployment

Page 7

Why not use existing container mgmt solution?

• Most solutions are focused on the datacenter

• Most solutions are

• Working to abstract datacenter and cross-cloud

• Delivering more than cluster manager

• Not yet at our level of scale

• Wanted to leverage our existing cloud platform

• Not appropriate for Netflix

Page 8

Batch

Page 9

What do batch users want?

• Simple shared resources, run till done, job files

• NOT

• EC2 instance sizes, automatic scaling, AMI OS

• WHY

• Offloads resource management ops, simpler

Page 10

Historic use of containers

• General workflow (Meson), stream processing (Mantis)

• Proven using cgroups and Mesos

• With simple isolation

• Using specific packaging formats

Linux cgroups

Page 11

Enter Titus

Job Management

Batch

Resource Management & Optimization

Container Execution

Integration

Page 12

Sample batch use cases

• Algorithm model training

Page 13

GPU usage

• Personalization and recommendation

• Deep learning with neural nets/mini batch

• Titus

• Added g2 support using nvidia-docker-plugin

• Mounts nvidia drivers and devices into Docker containers

• Distribution of training jobs and infrastructure made self service

• Recently moved to p2.8xl instances

• 2X performance improvement with same CUDA-based code

Page 14

Sample batch use cases

• Media encoding experimentation

• Digital watermarking

Page 15

Sample batch use cases

• Ad hoc reporting

• Open Connect CDN reporting

Page 16

Lessons learned from batch

• Docker helped generalize use cases

• Cluster automatic scaling adds efficiency

• Advanced scheduling required

• Initially ignored failures (with retries)

• Time-sensitive batch came later

Page 17

Titus Batch Usage (Week of 11/7)

• Started ~ 300,000 containers during the week

• Peak of 1000 containers per minute

• Peak of 3,000 instances (mix of r3.8xls and m4.4xls)

Page 18

Services

Page 19

Adding Services to Titus

Job Management

Batch

Resource Management & Optimization

Container Execution

Integration

Service

Page 20

Services are just long-running batches, right?

Page 21

Services more complex

Services resize constantly and run forever

• Automatic scaling

• Hard to upgrade underlying hosts

Have more state

• Ready for traffic vs. just started/stopped

• Even harder to upgrade

Existing, well-defined dev, deploy, runtime, & ops tools

Page 22

Real Networking is Hard

Page 23

Multi-Tenant Networking is Hard

• IP per container

• Security group support

• IAM role support

• Network bandwidth isolation

Page 24

Solutions

• VPC Networking driver

• Supports ENIs - full IP functionality

• With scheduling - security groups

• Support traffic control (isolation)

• EC2 Metadata proxy

• Adds container “node” identity

• Delivers IAM roles

Page 25

VPC Networking Integration with Docker

Titus

Executor

Titus Networking Driver

- Create and attach ENI with

- security group

- IP address

create net namespace

Page 26

VPC Networking Integration with Docker

Titus

Executor

Titus Networking Driver

- Launch “pod root” container with

- IP address

- Using “pause” container

- Using net=none

Pod Root Container

Docker

create net namespace

Page 27

VPC Networking Integration with Docker

Titus

Executor

Titus Networking Driver

- Create virtual ethernet

- Configure routing rules

- Configure metadata proxy iptables NAT

- Configure traffic control for bandwidth

Pod Root Container (pod_root_id)

Page 28

VPC Networking Integration with Docker

Titus

Executor

Pod Root Container (pod_root_id)

Docker

App Container

create container with --net=container:pod_root_id
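
The launch order described on Pages 26 and 28 can be sketched as two Docker invocations. This is only an illustration, not the Titus executor's code: it builds the argument lists without running Docker, and the image names and function names are hypothetical. The `--net=none` and `--net=container:<id>` network modes are standard Docker options.

```python
from typing import List


def pod_root_command(pause_image: str) -> List[str]:
    """Start the pod root with no Docker-managed networking.

    The networking driver is assumed to configure the namespace afterwards.
    """
    return ["docker", "run", "-d", "--net=none", pause_image]


def app_command(pod_root_id: str, app_image: str) -> List[str]:
    """Start the app container inside the pod root's network namespace."""
    return ["docker", "run", "-d", f"--net=container:{pod_root_id}", app_image]


if __name__ == "__main__":
    # Hypothetical image names and container id, for illustration only.
    print(" ".join(pod_root_command("pause-image")))
    print(" ".join(app_command("abc123", "app-image")))
```

Because the app container joins an existing namespace, the IP address, routing rules, and traffic control set up for the pod root apply to the app without further configuration.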

Page 29

Metadata Proxy

container

Amazon Metadata Service (169.254.169.254)

Titus Metadata Proxy

What is my IP, instanceid, hostname?

- Return Titus assigned

What is my AMI, instance type, etc.

- Unknown

Give me my role credentials

- Assume role to container role, return credentials

Give me anything else

- Proxy

[Diagram: container traffic on veth<id> to 169.254.169.254:80 is redirected by iptables/NAT to host_ip:9999]
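
The proxy's decision table can be sketched as a small classifier. This is a hedged illustration rather than the Titus implementation: the metadata paths are the standard EC2 ones, but the function names, the returned labels, and the exact form of the iptables rule are assumptions.

```python
from typing import List

PREFIX = "/latest/meta-data/"
CREDS_PREFIX = PREFIX + "iam/security-credentials/"
# Keys answered with Titus-assigned values, per the slide.
TITUS_ASSIGNED = ("local-ipv4", "instance-id", "hostname")
# Keys the slide says are answered as unknown.
UNKNOWN = ("ami-id", "instance-type")


def route(path: str) -> str:
    """Classify a metadata request into one of the four cases on the slide."""
    if path.startswith(CREDS_PREFIX):
        return "assume-container-role"
    if path.startswith(PREFIX):
        key = path[len(PREFIX):]
        if key in TITUS_ASSIGNED:
            return "titus-assigned"
        if key in UNKNOWN:
            return "unknown"
    return "proxy"  # anything else goes to the real metadata service


def dnat_rule(veth: str, host_ip: str, port: int = 9999) -> List[str]:
    """Build (not run) an assumed DNAT rule sending metadata traffic to the proxy."""
    return ["iptables", "-t", "nat", "-A", "PREROUTING", "-i", veth,
            "-d", "169.254.169.254", "-p", "tcp", "--dport", "80",
            "-j", "DNAT", "--to-destination", f"{host_ip}:{port}"]
```

Intercepting at the network layer means applications and AWS SDKs inside the container keep using the usual metadata address unchanged.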

Page 30

Putting it all together

[Diagram: a virtual machine host with three ENIs (ENI1 sg=A, ENI2 sg=X, ENI3 sg=Y,Z) carrying a non-routable IP plus IP1, IP2, and IP3. Containers 1 to 4 each pair an app container with a pod root and attach through a veth<id>; the container labels are a non-routable IP with sg=A, sg=X, sg=X, and sg=Y,Z. Linux policy-based routing and traffic control connect the veths to the ENIs, and 169.254.169.254 is NATed to the metadata proxy.]

Page 31

Additional AWS Integrations

• Live and rotated to S3 log file access

• Multi-tenant resource isolation (disk)

• Environmental context

• Automatic instance type selection

• Elastic scaling of underlying resource pool

Page 32

Netflix Infrastructure Integration

• Spinnaker CI/CD

• Atlas telemetry

• Discovery/IPC

• Edda (and dependent systems)

• Healthcheck, system metrics pollers

• Chaos testing

Page 33

Why? Single consistent cloud platform

[Diagram: on VPC and EC2, three stacks share Edda, Eureka, and Atlas. Virtual machines managed by the AWS Autoscaler run service applications with cloud platform libraries (metrics, IPC, health). VMs managed by Titus Job Control run containers holding service applications with cloud platform libraries (metrics, IPC, health) and containers holding batch applications with cloud platform libraries (metrics, IPC).]

Page 34

Titus Spinnaker Integration

Page 35

Deploy based on new Docker registry tags

Page 36

Deployment strategies same as Auto Scaling group

IAM roles and security groups per container

Basic resource requirements

Page 37

Easily see health check & service discovery status

Page 38

Page 39

Page 40

Fenzo – The heart of Titus scheduling

Extensible library for scheduling frameworks

• Plugin-based scheduling objectives

• Bin packing, etc.

• Heterogeneous resources & tasks

• Cluster automatic scaling

• Multiple instance types

• Plugin-based constraints evaluator

• Resource affinity, task locality, etc.

• Single offer mode added in support of ECS

Page 41

Fenzo scheduling strategy

For each task
  On each host
    Validate hard constraints
    Eval fitness and soft constraints
  Until fitness “good enough”, and a minimum #hosts evaluated

Plugins
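
The loop on this slide can be sketched roughly as follows. This is a minimal illustration and not Fenzo's API: the `Host` and `Task` types, the CPU-only hard constraint, the fitness formula, and the threshold values are all assumptions, and soft constraints are left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Host:
    name: str
    cpus_total: float
    cpus_used: float


@dataclass
class Task:
    name: str
    cpus: float


def fits(task: Task, host: Host) -> bool:
    """Hard constraint: the host must have enough free CPU."""
    return host.cpus_total - host.cpus_used >= task.cpus


def fitness(task: Task, host: Host) -> float:
    """Score in (0, 1]; higher means the host ends up fuller."""
    return (host.cpus_used + task.cpus) / host.cpus_total


def pick_host(task: Task, hosts: List[Host],
              good_enough: float = 0.9, min_hosts: int = 2) -> Optional[Host]:
    """Scan hosts until the best fit is good enough and enough hosts were seen."""
    best, best_score, evaluated = None, -1.0, 0
    for host in hosts:
        evaluated += 1
        if not fits(task, host):
            continue
        score = fitness(task, host)
        if score > best_score:
            best, best_score = host, score
        if best_score >= good_enough and evaluated >= min_hosts:
            break  # stop early instead of scoring every host
    return best


def schedule(tasks: List[Task], hosts: List[Host]) -> Dict[str, str]:
    """Assign each task to a host, updating usage as tasks are placed."""
    placements: Dict[str, str] = {}
    for task in tasks:
        host = pick_host(task, hosts)
        if host is not None:
            host.cpus_used += task.cpus
            placements[task.name] = host.name
    return placements
```

Stopping at "good enough" trades a slightly worse placement for scheduling speed, which matters when the cluster has thousands of hosts.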

Page 42

Scheduling – Capacity Guarantees

Titus maintains …

Critical tier

• guaranteed capacity & start latencies

Flex tier

• more dynamic capacity & variable start latency

[Diagram: Titus Master with Scheduler and Fenzo; cluster sizes marked Desired and Max]

Page 43

Scheduling – Bin Packing, Elastic Scaling

User adds work tasks

• Titus does bin packing to ensure that we can downscale entire hosts efficiently

[Diagram: Titus Master with Scheduler and Fenzo; cluster sizes marked Min, Desired, and Max; emptied hosts marked ✖ “Can terminate”]
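
Why bin packing helps downscaling can be shown with a small sketch. It assumes identical hosts, a single CPU dimension, and a best-fit rule; these simplifications and the function names are illustrative assumptions, not the scheduler's actual policy.

```python
from typing import List


def pack(tasks: List[float], host_capacity: float, num_hosts: int) -> List[float]:
    """Place each task on the fullest host that still has room (best fit)."""
    used = [0.0] * num_hosts
    for cpus in tasks:
        candidates = [i for i in range(num_hosts)
                      if used[i] + cpus <= host_capacity]
        if not candidates:
            raise ValueError("no host has room for task")
        target = max(candidates, key=lambda i: used[i])
        used[target] += cpus
    return used


def terminable(used: List[float], min_hosts: int) -> int:
    """Count empty hosts that can be removed without going below the minimum."""
    empty = sum(1 for u in used if u == 0)
    return max(0, min(empty, len(used) - min_hosts))


if __name__ == "__main__":
    usage = pack([2, 2, 2, 2], host_capacity=8, num_hosts=4)
    print(usage, terminable(usage, min_hosts=2))
```

Spreading the same four tasks evenly would leave every host partly used, so none could be terminated.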

Page 44

Scheduling – Constraints including zone balancing

User specifies constraints

• Availability Zone balancing

• Resource and Task affinity

• Hard and soft

[Diagram: Titus Master with Scheduler and Fenzo placing tasks across Availability Zone A and Availability Zone B; cluster sizes marked Min and Desired]
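
The difference between hard and soft constraints can be sketched with zone placement. In this assumed example, `allowed` acts as a hard constraint (zones outside it are never used) while balancing is a soft preference for the least-loaded zone; the function name and parameters are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional


def pick_zone(zones: List[str], running: Iterable[str],
              allowed: Optional[Iterable[str]] = None) -> Optional[str]:
    """Pick a zone for the next task of a job.

    zones:   zones available to the cluster
    running: zone of each task of this job that is already running
    allowed: optional hard constraint restricting the usable zones
    """
    counts = Counter(running)
    allowed_set = None if allowed is None else set(allowed)
    candidates = [z for z in zones
                  if allowed_set is None or z in allowed_set]
    if not candidates:
        return None  # hard constraint cannot be satisfied
    # Soft preference: the zone currently holding the fewest tasks.
    return min(candidates, key=lambda z: counts[z])
```

A hard constraint can leave a task unscheduled, whereas a soft one only influences which of the valid placements is chosen.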

Page 45

Scheduling – Rolling new Titus code

Operator updates Titus agent codebase

• New scheduling on new cluster

• Batch jobs drain

• Service tasks are migrated via Spinnaker pipelines

• Old cluster scales down

[Diagram: Titus Master with Scheduler and Fenzo; Auto Scaling group version 001 and Auto Scaling group version 002, each marked Min and Desired; hosts being removed marked ✖]

Page 46

Current Service Usage

• Approach

• Started with internal applications

• Moved on to line-of-fire Node.js (shadow first, prod 1Q17)

• Moved on to stream processing (prod 4Q)

• Current - ~ 2000 long running containers

[Timeline across 1Q to 4Q: Batch, Service pre-prod, Service shadow, Service prod]

Page 47

Collaboration with ECS

Page 48

Why ECS?

• Decrease operational overhead of underlying cluster state management

• Allow open source collaboration on ECS agent

• Work with Amazon and others on EC2 enablement

• GPUs, VPC, security groups, IAM roles, etc.

• Over time, this enablement should result in less maintenance

Page 49

Titus Today

[Diagram: Titus Scheduler with EC2 integration, Mesos master, and a container host running mesos-agent, Titus executor, and containers]

Outbound

- Launch/terminate container

- Reconciliation

Inbound

- Container host events (and offers)

- Container events

Page 50

First Titus ECS Implementation

[Diagram: Titus Scheduler with EC2 integration, ECS, and a container host running ECS agent, Titus executor, and containers]

Outbound

- Launch/terminate container

- Polling for container host events and container events

Page 51

Collaboration with ECS team starts

• Collaboration on ECS “event stream” that could provide

• “Real time” task & container instance state changes

• Event based architecture more scalable than polling

• Great engineering collaboration

• Face to face focus

• Monthly interlocks

• Engineer to engineer focused

Page 52

Current Titus ECS Implementation

[Diagram: Titus Scheduler with EC2 integration, ECS, CloudWatch Events, SQS, and a container host running ECS agent, Titus executor, and containers]

Outbound

- Launch/terminate container

- Reconciliation

Inbound

- Container host events

- Container events
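
Consuming the inbound events might look like the sketch below. It assumes, as the labels on this slide suggest, that ECS state changes arrive through CloudWatch Events into an SQS queue, and that each message body is an "ECS Task State Change" event carrying `taskArn`, `lastStatus`, and `version` in its `detail`. The state table and function name are hypothetical, and reading from the queue is omitted.

```python
import json
from typing import Dict, Tuple

# task ARN -> (event version, last known status)
TaskState = Dict[str, Tuple[int, str]]


def apply_event(state: TaskState, message_body: str) -> bool:
    """Apply one queued event; return True if the tracked state changed."""
    event = json.loads(message_body)
    if event.get("detail-type") != "ECS Task State Change":
        return False  # container instance events would be handled elsewhere
    detail = event["detail"]
    arn = detail["taskArn"]
    version = detail["version"]
    status = detail["lastStatus"]
    seen = state.get(arn)
    if seen is not None and seen[0] >= version:
        return False  # duplicate or out-of-order delivery, keep newer state
    state[arn] = (version, status)
    return True
```

Comparing versions keeps the table correct when the queue delivers an event twice or out of order; periodic reconciliation remains the backstop for events that never arrive.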

Page 53

Analysis - Periodic Reconciliation

For tasks in listTasks

describeTasks (batches of 100)

Number of API calls: 1 + num tasks / 100 per reconcile

1280 containers across 40 nodes
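
The call count can be worked through for the cluster size quoted on this slide. This sketch takes the slide's formula at face value (one listTasks call, then describeTasks in batches of 100) and rounds partial batches up, which is an assumption; the function names are illustrative.

```python
import math
from typing import List, Sequence


def batches(task_arns: Sequence[str], batch_size: int = 100) -> List[Sequence[str]]:
    """Split task ARNs into the groups passed to each describeTasks call."""
    return [task_arns[i:i + batch_size]
            for i in range(0, len(task_arns), batch_size)]


def reconcile_api_calls(num_tasks: int, batch_size: int = 100) -> int:
    """One listTasks call plus one describeTasks call per batch."""
    return 1 + math.ceil(num_tasks / batch_size)


if __name__ == "__main__":
    print(reconcile_api_calls(1280))  # 1 + 13 = 14 calls per reconcile
```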

Page 54

Analysis - Scheduling

• Number of API calls: 2X number of tasks

• registerTaskDefinition and startTask

• Largest Titus historical job

• 1000 tasks per minute

• Possible with increased rate limits
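
The request rate implied by these figures is simple arithmetic, sketched here under the slide's stated two calls per task; the function name is illustrative.

```python
def scheduling_api_calls(num_tasks: int, calls_per_task: int = 2) -> int:
    """registerTaskDefinition plus startTask for every task launched."""
    return num_tasks * calls_per_task


calls_per_minute = scheduling_api_calls(1000)  # peak of 1000 tasks per minute
calls_per_second = calls_per_minute / 60

print(calls_per_minute)            # 2000
print(round(calls_per_second, 1))  # 33.3
```

A sustained rate of roughly 33 calls per second is why the slide notes that this load needs increased rate limits.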

Page 55

Continued areas of scheduling collaboration

• Combining/batching registerTaskDefinition and startTask

• More resource types in the control plane

• Disk, network bandwidth, ENIs

• To fit with existing scheduler approach

• Extensible message fields in task state transitions

• Named tasks (beyond ARNs) for terminate

• Starting vs. started state

Page 56

Possible phases of ECS support in Titus

• Work in progress

• ECS completing scheduling collaboration items

• Complete transition to ECS for overall cluster manager

• Allows us to contribute our Netflix cloud platform and EC2 integration points to the open source ECS agent

• Future

• Provide Fenzo as the ECS task placement service

• Extend Titus Job Management features to ECS

Page 57

Titus Future Focus

Page 58

Future Strategy of Titus

• Service automatic scaling and global traffic integration

• Service/batch SLA management

• Capacity guarantees, fair shares, and pre-emption

• Trough / Internal Spot market management

• Exposing pods to users

• More use cases and scale

Page 59

Questions?

Andrew Spyker (@aspyker)

Page 60

Thank you!

Page 61

Remember to complete your evaluations!