(DVO203) A Day in the Life of a Netflix Engineer Using 37% of the Internet

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Dave Hahn, Operations and Reliability Engineering [email protected] | @relix42 DVO203 A Day in the Life of a Netflix Engineer Using 37% of the Internet

Upload: amazon-web-services

Post on 15-Apr-2017


A bit about me

Dave Hahn

Senior * Engineer

Critical Operations and Response Engineering (CORE) team

* Operations, Crisis Handling, Cloud, Performance, Reliability, Insight, Network, Engagement, Hardware, Software

* Making things better

CORE team

Part of Operations Engineering

Responsible for:

Crisis management

Availability reporting

Reliability best practices

AWS relationship

Operations education

CORE team

SREs

PMs

Crisis leaders

CORE goals

Protect the customer experience

Cannot connect to the Netflix service

Try Again

Go outside and play

Sunshine!

CORE goals

Protect the customer experience

Unique failures

Constant improvement

A bit about Netflix

Media and entertainment company

Goal

Delight our customers and win moments of truth

Moment

The Netflix cloud journey

’09: Start cloud effort

’10: First device talking to AWS

’11: Serving from EU-WEST-1

’13: Serving from US-WEST-2

’15: Migration complete

Netflix architecture

AZ1

Global deployment

US-West US-East EU-West

Netflix architecture

Microservices

100s & 100s of microservices

Netflix architecture

Netflix ecosystem

100s of microservices

1000s of daily production changes

10,000s of instances

100,000s of customer interactions per minute

1,000,000s of customers

1,000,000,000s of metrics

10,000,000,000 hours streamed

10s of operations engineers

No NOC

How?

DevOps culture

DevOps culture

100% ownership

Code

Test

Deploy

Run

Support

DevOps culture

100% ownership

On call 24x7

Incident reviews

Honest and open feedback

How?

DevOps culture

Easy ownership

Easy ownership

Service discovery

Solid communication

Continuous deployment

Data persistence

How?

DevOps culture

Easy ownership

Insight

Insight

Metrics

Operational insight

How?

DevOps culture

Easy ownership

Insight

Cloud thinking

Cloud thinking

Verbs not nouns

How?

DevOps culture

Easy ownership

Insight

Cloud thinking

Remove surprises

Cloud guarantees

Your instances will die

R.I.P. <your favorite instance>

Stateless applications

High data spread and redundancy

Production failure injection
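
The slide above pairs stateless applications with production failure injection. A minimal sketch of that pairing follows, assuming a simulated in-memory fleet; it is not the tooling used at Netflix, and the names `Instance`, `Fleet`, `kill_random_instance`, and `run_experiment` are hypothetical. It shows that when no request depends on state held by one instance, killing instances at random leaves the service answering as long as one healthy instance remains.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A stateless worker: it holds no data that a request depends on."""
    name: str
    alive: bool = True

    def handle(self, request: str) -> str:
        if not self.alive:
            raise ConnectionError(f"{self.name} is gone")
        return f"{request} served"


@dataclass
class Fleet:
    """A simulated group of interchangeable instances (illustration only)."""
    instances: list
    rng: random.Random = field(default_factory=random.Random)

    def kill_random_instance(self):
        """Failure injection: terminate one live instance chosen at random."""
        live = [inst for inst in self.instances if inst.alive]
        if not live:
            return None
        victim = self.rng.choice(live)
        victim.alive = False
        return victim

    def serve(self, request: str) -> str:
        """Try instances in order; any healthy one can answer the request."""
        for inst in self.instances:
            try:
                return inst.handle(request)
            except ConnectionError:
                continue  # this instance died, fall through to the next one
        raise RuntimeError("no healthy instances left")


def run_experiment(size=5, kills=3, seed=42):
    """Kill `kills` instances one at a time, serving a request after each kill."""
    fleet = Fleet([Instance(f"i-{n}") for n in range(size)], random.Random(seed))
    results = []
    for k in range(kills):
        fleet.kill_random_instance()
        results.append(fleet.serve(f"play-{k}"))
    return fleet, results
```

Calling `run_experiment()` kills three of five simulated instances and still serves every request; calling it with `kills` equal to `size` ends in a `RuntimeError`, which is the case that redundancy is meant to keep out of production.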

Cloud guarantees

Your instances will die

You will share resources

The architecture will change

You never see the lights

A day in the life

What would you say you do here?

Well Bob

Crisis handling

Engagement

Making things

Well Bob

Crisis handling

Engagement

Automation

Education

What I don’t do

What I do

Make things better

Making things better for you

netflix.github.io

Making things better for you

jobs.netflix.com

Running Spark and Presto on the Netflix Big Data Platform

Speaker: Daniel Weeks

When? Thu @ 11am

Where? Palazzo F

Splitting the Check on Compliance and Security: Keeping Developers and Auditors Happy in the Cloud

Speaker: Jason Chan

When? Thu @ 11am

Where? Marcello 4501B

Visit the Netflix booth

Speakers there to answer questions

Thank you!

Dave Hahn

[email protected]

@relix42

Remember to complete your evaluations!