What an Enterprise Can Learn from Netflix, a Cloud-native Company (ENT203) | AWS re:Invent 2013

Upload: amazon-web-services

Post on 11-May-2015

DESCRIPTION

In moving its streaming product to the cloud, Netflix has been able to realize tremendous benefits in scalability, performance, and availability. The biggest benefit came from moving to a service-based architecture, which allowed engineering teams to accelerate their development cycle and innovate more quickly. However, cloud migration was a substantial effort. We mobilized resources across the company over several years, reorganized our engineering and operations teams, developed new security policies, migrated to the DevOps operations model, and even embraced a new product architecture. In this talk, we trace the evolution of the Netflix cloud model, both the successes and the challenges, and present them in a way that’s maximally useful to enterprises considering making the move to the cloud.

TRANSCRIPT

Page 1: What an Enterprise Can Learn from Netflix, a Cloud-native Company (ENT203) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

What an Enterprise Can Learn from Netflix, a Cloud Native Company

Yury Izrailevsky, VP Cloud and Platform Engineering, Netflix

November 14, 2013

Page 2:

August 2008 Database Corruption

RDBMS

Page 3:

Performance Scalability Availability

Page 4:

Netflix Streaming Growth

• 5 billion quarterly streaming hours

• 40 million customers

• 41 countries

• 3 continents

100x growth since 2009

Page 5:

Netflix Cross-regional Cloud Architecture

Page 6:

Page 7:

Performance Scalability Availability

Page 8:

Cloud Too Expensive?

Netflix data center

87% cost reduction per streaming start

Page 9:

Cloud Efficiency Benefits

Economy of scale, Elasticity

[Chart: Streaming growth, x-axis 1/4/2009 to 1/4/2013]

[Chart: Cyclical daily streaming usage]
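The elasticity point can be illustrated with a small sketch. The hourly demand numbers below are hypothetical and are not taken from the talk; they only mimic a daily cycle with an evening peak. The sketch compares instance-hours when capacity is fixed at the daily peak against capacity that follows demand hour by hour.

```python
from typing import List

# Hypothetical instances needed per hour over one day (not Netflix data).
HOURLY_DEMAND: List[int] = [
    20, 15, 10, 10, 15, 30,
    50, 60, 60, 55, 50, 50,
    55, 60, 65, 70, 80, 90,
    100, 100, 90, 70, 45, 30,
]


def fixed_capacity_hours(demand: List[int]) -> int:
    """Instance-hours when provisioning for the peak all day."""
    return max(demand) * len(demand)


def elastic_capacity_hours(demand: List[int]) -> int:
    """Instance-hours when capacity tracks demand each hour."""
    return sum(demand)


fixed = fixed_capacity_hours(HOURLY_DEMAND)
elastic = elastic_capacity_hours(HOURLY_DEMAND)
print(f"fixed: {fixed}, elastic: {elastic}, saved: {1 - elastic / fixed:.0%}")
```

The saving depends entirely on how peaky the assumed demand curve is, so the printed percentage says nothing about the 87% figure quoted on the previous slide.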

Page 10:

Performance Scalability Availability

Page 11:

A Truly Great Service…

Availability goal: 99.99%

30 secs/week

at peak traffic

Has to Just Work!
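As a rough check on what an availability target means in time, the sketch below converts a target into a flat wall-clock downtime budget per week. The function name is an illustrative assumption. A flat calculation at 99.99% gives about a minute per week; the slide's tighter 30 secs/week figure is stated for peak traffic, and the source does not say how it was derived.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def weekly_downtime_budget(availability: float) -> float:
    """Seconds of downtime per week allowed by an availability target (0..1)."""
    return SECONDS_PER_WEEK * (1.0 - availability)


budget = weekly_downtime_budget(0.9999)
print(f"{budget:.1f} seconds per week")
```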

Page 12:

Weekly Streaming Availability (13wkMA)

[Chart: weekly x-axis from 7/17/2011 to 10/13/2013; annotation: 12/24/2012 Elastic Load Balancing outage]

Using AWS redundancy to build highly fault-tolerant architecture

Page 13:

Netflix Cloud Journey: Tough Decisions

• System rearchitecture

• New security model

• New operational model

• Organizational changes

Page 14:

Old Architecture: Consolidated Java App

Javaweb Javaweb Javaweb

… …

Page 15:

Cloud Native Service-based Architecture

Page 16:

Cascading Failures

API

Instant Queue

Simple DB

Page 17:

Cascading Failures

99% availability × 99% availability × … × 99% availability

0.99^500 ≈ 0.657%
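The arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch that assumes a request succeeds only if every one of the dependencies in the chain is up and that failures are independent; the function name is illustrative.

```python
def chain_availability(per_service: float, n_services: int) -> float:
    """Availability of a request that needs all n independent services up."""
    return per_service ** n_services


overall = chain_availability(0.99, 500)
print(f"{overall:.3%}")  # 0.657%
```

Even a modest per-service failure rate compounds quickly, which is why the following slides turn to graceful degradation and redundancy.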

Page 18:

Cloud Native: Strategies to Improve Availability

Graceful degradation

Redundancy

Page 19:

Cloud Native: Graceful Degradation
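One common way to implement graceful degradation is to wrap each call to a dependency with a fallback, so a failing service yields a reduced response instead of an error. The sketch below is an illustration only, not Netflix's implementation; `with_fallback`, `personalized_recommendations`, and `DEFAULT_TITLES` are hypothetical names.

```python
from typing import Callable, List

# Hypothetical non-personalized default shown when the dependency is down.
DEFAULT_TITLES: List[str] = ["Popular title A", "Popular title B"]


def with_fallback(primary: Callable[[], List[str]], fallback: List[str]) -> List[str]:
    """Return the primary result, or the fallback if the primary call fails."""
    try:
        return primary()
    except Exception:
        # Degrade: serve the generic result rather than failing the page.
        return fallback


def personalized_recommendations() -> List[str]:
    """Stand-in for a remote call; here it always simulates an outage."""
    raise TimeoutError("recommendation service unavailable")


rows = with_fallback(personalized_recommendations, DEFAULT_TITLES)
print(rows)
```

A production version would typically add timeouts and stop calling a dependency that keeps failing; this sketch shows only the fallback path.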

Page 20:

Cloud Native: Redundancy

Zone A

Zone B

Zone C

Redundancy across Availability Zones
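The benefit of redundancy can be estimated with a short calculation. The sketch assumes zones fail independently and that the service is up whenever at least one zone is up; both are simplifying assumptions, and the 99% per-zone figure is illustrative rather than a number from the talk.

```python
def redundant_availability(zone_availability: float, zones: int) -> float:
    """Probability that at least one of `zones` independent zones is up."""
    return 1.0 - (1.0 - zone_availability) ** zones


one_zone = redundant_availability(0.99, 1)
three_zones = redundant_availability(0.99, 3)
print(f"1 zone: {one_zone:.4%}, 3 zones: {three_zones:.4%}")
```

Correlated failures, such as a region-wide outage, break the independence assumption, so real-world gains are smaller than this estimate.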

Page 21:

Cloud Native Persistence

RDBMS Relational

NoSQL distributed databases

Page 22:

Testing Fault Tolerance: Simian Army

Chaos Monkey, Latency Monkey, Chaos Gorilla
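The idea behind Chaos Monkey, deliberately killing instances to verify that the service survives, can be sketched against an in-memory list. This is a toy simulation and not the actual Simian Army code or API; the instance IDs and the `chaos_monkey` function are hypothetical.

```python
import random
from typing import List


def chaos_monkey(instances: List[str], rng: random.Random) -> str:
    """Terminate one randomly chosen instance and return its ID."""
    victim = rng.choice(instances)
    instances.remove(victim)
    return victim


def service_is_up(instances: List[str]) -> bool:
    """In this toy model the service is up while any instance remains."""
    return len(instances) > 0


group = ["i-a1", "i-b2", "i-c3"]  # hypothetical instance IDs
victim = chaos_monkey(group, random.Random(0))
print(f"terminated {victim}; service up: {service_is_up(group)}")
```

Passing in a seeded `random.Random` keeps the experiment repeatable, which helps when a failure needs to be reproduced.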

Page 23:

Open Source Portal at http://netflix.github.com

Page 24:

Cloud Native Operations: DevOps

Netflix data center

Central NOC team coordinates bi-weekly releases

Dev teams push production changes on own schedule; no central coordination

Page 25:

AMI-Based Cloud Deployments

Old code

New code

Red-black deployments

Bake new AMI for each app deployment
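A red-black deployment keeps the old cluster running while traffic moves to a cluster built from the new image, so rollback is just a switch back. The sketch below models that with plain objects; `Cluster`, `Router`, and the AMI IDs are hypothetical, and no AWS calls are made.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    ami_id: str          # hypothetical image identifier baked for this release
    healthy: bool = True


class Router:
    """Sends traffic to one active cluster and keeps the previous one on standby."""

    def __init__(self, active: Cluster) -> None:
        self.active = active
        self.standby: Optional[Cluster] = None

    def deploy(self, new: Cluster) -> bool:
        """Switch traffic to the new cluster only if it passes its health check."""
        if not new.healthy:
            return False
        self.standby, self.active = self.active, new
        return True

    def rollback(self) -> None:
        """Switch traffic back to the previous cluster."""
        if self.standby is None:
            raise RuntimeError("no previous cluster to roll back to")
        self.active, self.standby = self.standby, self.active


router = Router(Cluster("ami-old"))
deployed = router.deploy(Cluster("ami-new"))
print(deployed, router.active.ami_id)
router.rollback()
print(router.active.ami_id)
```

Because each release is a freshly baked image, the old and new clusters never share mutable state, which is what makes the switch back safe in this model.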

Page 26:

Evolving a Cloud Native Organization

Data center | Cloud native

IT-Ops manages budget, capacity | Self-service provisioning by dev teams; visibility through tools

Coordinated releases via centralized NOC | Distributed DevOps; SREs build tools, share best practices

Oracle DBAs manage several databases | Java, DevOps engineers support dozens of Cassandra clusters

Data science: analysts write SQL queries | Hadoop engineers build ETL using PIG/Python

Page 27:

Cloud Pilot Project: Jobs Page

Page 28:

Building a Great Streaming Product
