Scaling the Netflix API - OSCON

Scaling the Netflix API Daniel Jacobson @daniel_jacobson http://www.linkedin.com/in/danieljacobson http://www.slideshare.net/danieljacobson

Upload: daniel-jacobson

Post on 15-Jan-2015


DESCRIPTION

The term "scale" for engineering often is used to discuss systems and their ability to grow with the needs of its users. This is clearly an important aspect of scaling, but there are many other areas in which an engineering organization needs to scale to be successful in the long term. This presentation discusses some of those other areas and details how Netflix (and specifically the API team) addresses them.

TRANSCRIPT

Page 1: Scaling the Netflix API - OSCON

Scaling the Netflix API

Daniel Jacobson @daniel_jacobson

http://www.linkedin.com/in/danieljacobson http://www.slideshare.net/danieljacobson

Page 2: Scaling the Netflix API - OSCON

Please read the notes associated with each slide for the full context of the presentation

Page 3: Scaling the Netflix API - OSCON

What do I mean by “scale”?

Page 4: Scaling the Netflix API - OSCON

But There Are Many Ways to Scale!

Organization

Systems

Devices

Development

Testing

Page 5: Scaling the Netflix API - OSCON

But first, some background…

Page 6: Scaling the Netflix API - OSCON

Global Streaming Video for TV Shows and Movies

Page 7: Scaling the Netflix API - OSCON

More than 36 Million Subscribers

More than 40 Countries

Page 8: Scaling the Netflix API - OSCON

Netflix Accounts for 33% of Peak Internet Traffic in North America

Netflix subscribers are watching more than 1 billion hours a month

Page 9: Scaling the Netflix API - OSCON
Page 10: Scaling the Netflix API - OSCON

Netflix REST API: One-Size-Fits-All (OSFA) Solution

Page 11: Scaling the Netflix API - OSCON

Image courtesy of Jay Mac 3 on Flickr

Page 12: Scaling the Netflix API - OSCON

Netflix API Requests by Audience at Launch in 2008

External Developers

Page 13: Scaling the Netflix API - OSCON
Page 14: Scaling the Netflix API - OSCON
Page 15: Scaling the Netflix API - OSCON

Image courtesy of Jay Mac 3 on Flickr

Page 16: Scaling the Netflix API - OSCON

Netflix API Requests by Audience from 2011

External Developers

Page 17: Scaling the Netflix API - OSCON

Scaling…

Organization

Systems

Devices

Development

Testing

Page 18: Scaling the Netflix API - OSCON

Distributed Architecture

Page 19: Scaling the Netflix API - OSCON
Page 20: Scaling the Netflix API - OSCON

1000+ Device Types

Page 21: Scaling the Netflix API - OSCON

Dozens of Dependencies: Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine

Page 22: Scaling the Netflix API - OSCON

API dependencies: Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine

Page 23: Scaling the Netflix API - OSCON

API dependencies: Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine

Page 24: Scaling the Netflix API - OSCON

http://www.slideshare.net/reed2001/culture-1798664

Page 25: Scaling the Netflix API - OSCON

Scaling…

Organization

Systems

Devices

Development

Testing

Page 26: Scaling the Netflix API - OSCON

System Resiliency

Page 27: Scaling the Netflix API - OSCON

Distributed Architecture

Page 28: Scaling the Netflix API - OSCON

Dependency Relationships

Page 29: Scaling the Netflix API - OSCON

2,000,000,000 Requests Per Day to the Netflix API

Page 30: Scaling the Netflix API - OSCON

30 Distinct, Direct Dependent Services for the Netflix API

Page 31: Scaling the Netflix API - OSCON

14,000,000,000 Netflix API Calls Per Day to those Dependent Services

Page 32: Scaling the Netflix API - OSCON

0 Dependent Services with 100% SLA

Page 33: Scaling the Netflix API - OSCON

99.99%^30 = 99.7%

0.3% of 2B = 6M failures per day

2+ Hours of Downtime Per Month

Page 34: Scaling the Netflix API - OSCON

99.99%^30 = 99.7%

0.3% of 2B = 6M failures per day

2+ Hours of Downtime Per Month

Page 35: Scaling the Netflix API - OSCON

99.9%^30 = 97%

3% of 2B = 60M failures per day

20+ Hours of Downtime Per Month
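The arithmetic on these slides can be reproduced with a short sketch. It assumes (this is not stated on the slides) that every API request touches all 30 dependencies, that their failures are independent, and that a month has 720 hours. The function names are illustrative only.

```python
def compound_availability(per_service: float, n_services: int) -> float:
    """Availability of a request that needs every one of n independent services."""
    return per_service ** n_services


def failures_per_day(availability: float, requests_per_day: float) -> float:
    """Expected number of failed requests per day at the given availability."""
    return (1.0 - availability) * requests_per_day


def downtime_hours_per_month(availability: float, hours_in_month: float = 720.0) -> float:
    """Equivalent downtime per month, assuming a 30-day (720-hour) month."""
    return (1.0 - availability) * hours_in_month


if __name__ == "__main__":
    for sla in (0.9999, 0.999):
        combined = compound_availability(sla, 30)
        print(
            f"per-service {sla:.2%} -> combined {combined:.2%}, "
            f"{failures_per_day(combined, 2e9) / 1e6:.1f}M failures/day, "
            f"{downtime_hours_per_month(combined):.1f} h downtime/month"
        )
```

Under these assumptions, moving each dependency from four nines to three nines multiplies the expected failures roughly tenfold, which matches the jump from 6M to 60M on the slides.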

Page 36: Scaling the Netflix API - OSCON

API dependencies: Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine

Page 37: Scaling the Netflix API - OSCON

API dependencies: Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine

Page 38: Scaling the Netflix API - OSCON

API dependencies: Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine

Page 39: Scaling the Netflix API - OSCON

API dependencies: Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine

Page 40: Scaling the Netflix API - OSCON

API dependencies: Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine

Page 41: Scaling the Netflix API - OSCON
Page 42: Scaling the Netflix API - OSCON

Circuit Breaker Dashboard

Page 43: Scaling the Netflix API - OSCON
Page 44: Scaling the Netflix API - OSCON

Call Volume and Health / Last 10 Seconds

Page 45: Scaling the Netflix API - OSCON

Call Volume / Last 2 Minutes

Page 46: Scaling the Netflix API - OSCON

Successful Requests

Page 47: Scaling the Netflix API - OSCON

Successful, But Slower Than Expected

Page 48: Scaling the Netflix API - OSCON

Short-Circuited Requests, Delivering Fallbacks

Page 49: Scaling the Netflix API - OSCON

Timeouts, Delivering Fallbacks

Page 50: Scaling the Netflix API - OSCON

Thread Pool & Task Queue Full, Delivering Fallbacks

Page 51: Scaling the Netflix API - OSCON

Exceptions, Delivering Fallbacks

Page 52: Scaling the Netflix API - OSCON

Error Rate

(# + # + # + #) / (# + # + # + # + #) = Error Rate
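A minimal sketch of this error-rate calculation follows. It assumes the four numerator terms are the fallback-delivering categories from the preceding slides (short-circuited, timeouts, thread pool and task queue full, exceptions) and that the denominator adds successful requests. That mapping is an inference from the slide order, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Request counts for one dependency over a rolling window (names are illustrative)."""
    successes: int
    short_circuited: int
    timeouts: int
    rejected: int      # thread pool and task queue full
    exceptions: int

    def error_rate(self) -> float:
        # Numerator: every outcome that ended in a fallback.
        errors = self.short_circuited + self.timeouts + self.rejected + self.exceptions
        # Denominator: all requests seen in the window.
        total = errors + self.successes
        return errors / total if total else 0.0


if __name__ == "__main__":
    window = WindowCounts(successes=90, short_circuited=2, timeouts=3, rejected=1, exceptions=4)
    print(f"error rate: {window.error_rate():.0%}")
```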

Page 53: Scaling the Netflix API - OSCON

Status of Fallback Circuit

Page 54: Scaling the Netflix API - OSCON

Requests per Second, Over Last 10 Seconds

Page 55: Scaling the Netflix API - OSCON

SLA Information

Page 56: Scaling the Netflix API - OSCON

API dependencies: Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine

Page 57: Scaling the Netflix API - OSCON

API dependencies: Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine

Page 58: Scaling the Netflix API - OSCON

API dependencies: Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine

Page 59: Scaling the Netflix API - OSCON

API dependencies (with Fallback): Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine

Page 60: Scaling the Netflix API - OSCON

API dependencies (with Fallback): Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine
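The circuit-breaker-with-fallback idea shown across these slides can be sketched as below. This is a simplified, single-threaded illustration and not Netflix's implementation: the class, the consecutive-failure threshold, and the reset timer are all assumptions chosen for brevity.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Opens after N consecutive failures; allows a trial call after a cool-down."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # After the cool-down, let one trial call through ("half-open").
        return self.clock() - self.opened_at < self.reset_after

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            return fallback()  # short-circuit: do not touch the dependency
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open the circuit
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each dependency call, for example `breaker.call(lambda: fetch_reviews(title_id), lambda: [])`, where `fetch_reviews` is a hypothetical client function and the empty list is the degraded response.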

Page 61: Scaling the Netflix API - OSCON

System Infrastructure

Page 62: Scaling the Netflix API - OSCON

AWS Cloud

Page 63: Scaling the Netflix API - OSCON
Page 64: Scaling the Netflix API - OSCON
Page 65: Scaling the Netflix API - OSCON

Autoscaling

Page 66: Scaling the Netflix API - OSCON

Autoscaling

Page 67: Scaling the Netflix API - OSCON

More than 36 Million Subscribers

More than 40 Countries

Page 68: Scaling the Netflix API - OSCON

Zuul: Gatekeeper for the Netflix Streaming Application

Page 69: Scaling the Netflix API - OSCON

Zuul

• Multi-Region Resiliency

• Insights

• Stress Testing

• Canary Testing

• Dynamic Routing

• Load Shedding

• Security

• Static Response Handling

• Authentication

Page 70: Scaling the Netflix API - OSCON

Isthmus

Page 71: Scaling the Netflix API - OSCON

Forced Failure

Page 72: Scaling the Netflix API - OSCON
Page 73: Scaling the Netflix API - OSCON

Scaling…

Organization

Systems

Devices

Development

Testing

Page 74: Scaling the Netflix API - OSCON
Page 75: Scaling the Netflix API - OSCON
Page 76: Scaling the Netflix API - OSCON

Screen Real Estate

Page 77: Scaling the Netflix API - OSCON

Controller

Page 78: Scaling the Netflix API - OSCON

Technical Capabilities

Page 79: Scaling the Netflix API - OSCON

One-Size-Fits-All API

[Diagram: the API surrounded by repeated "Request" labels]

Page 80: Scaling the Netflix API - OSCON

Scaling…

Organization

Systems

Devices

Development

Testing

Page 81: Scaling the Netflix API - OSCON

Courtesy of South Florida Classical Review

Page 82: Scaling the Netflix API - OSCON
Page 83: Scaling the Netflix API - OSCON

Resource-Based API

vs.

Experience-Based API

Page 84: Scaling the Netflix API - OSCON

Resource-Based Requests

• /users/<id>/ratings/title

• /users/<id>/queues

• /users/<id>/queues/instant

• /users/<id>/recommendations

• /catalog/titles/movie

• /catalog/titles/series

• /catalog/people

Page 85: Scaling the Netflix API - OSCON

REST API

RECOMMENDATIONS

MOVIE DATA

SIMILAR MOVIES

AUTH

MEMBER DATA

A/B TESTS

START-UP

RATINGS

Network Border Network Border

Page 86: Scaling the Netflix API - OSCON

RECOMMENDATIONS

MOVIE DATA

SIMILAR MOVIES

AUTH

MEMBER DATA

A/B TESTS

START-UP

RATINGS

OSFA API

Network Border Network Border

SERVER CODE

CLIENT CODE

Page 87: Scaling the Netflix API - OSCON

RECOMMENDATIONS

MOVIE DATA

SIMILAR MOVIES

AUTH

MEMBER DATA

A/B TESTS

START-UP

RATINGS

OSFA API

Network Border Network Border

DATA GATHERING, FORMATTING, AND DELIVERY

USER INTERFACE RENDERING

Page 88: Scaling the Netflix API - OSCON
Page 89: Scaling the Netflix API - OSCON
Page 90: Scaling the Netflix API - OSCON

Experience-Based Requests

• /ps3/homescreen
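The contrast with the resource-based requests can be sketched as follows: one device-specific endpoint gathers what would otherwise be several client round trips and returns a single payload. The slides place this adapter logic in a Groovy layer; Python is used here only for illustration, and every function name and all sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the backend services behind the API.
def fetch_queue(user_id: str) -> list:
    return ["title-1", "title-2"]


def fetch_recommendations(user_id: str) -> list:
    return ["title-3"]


def fetch_ratings(user_id: str) -> dict:
    return {"title-1": 4}


def ps3_homescreen(user_id: str) -> dict:
    """Handle one /ps3/homescreen request by fanning out on the server side."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        # The three lookups run concurrently inside the data center,
        # instead of as three separate requests across the network border.
        queue = pool.submit(fetch_queue, user_id)
        recommendations = pool.submit(fetch_recommendations, user_id)
        ratings = pool.submit(fetch_ratings, user_id)
        return {
            "queue": queue.result(),
            "recommendations": recommendations.result(),
            "ratings": ratings.result(),
        }


if __name__ == "__main__":
    print(ps3_homescreen("user-42"))
```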

Page 91: Scaling the Netflix API - OSCON

JAVA API

Network Border Network Border

RECOMMENDATIONS

MOVIE DATA

SIMILAR MOVIES

AUTH

MEMBER DATA

A/B TESTS

START-UP

RATINGS

Groovy Layer

Page 92: Scaling the Netflix API - OSCON
Page 93: Scaling the Netflix API - OSCON

RECOMMENDATIONS

MOVIE DATA

SIMILAR MOVIES

AUTH

MEMBER DATA

A/B TESTS

START-UP

RATINGS

JAVA API

SERVER CODE

CLIENT CODE

CLIENT ADAPTER CODE (WRITTEN BY CLIENT TEAMS, DYNAMICALLY UPLOADED TO SERVER)

Network Border Network Border

Page 94: Scaling the Netflix API - OSCON

RECOMMENDATIONS

MOVIE DATA

SIMILAR MOVIES

AUTH

MEMBER DATA

A/B TESTS

START-UP

RATINGS

JAVA API

DATA GATHERING

DATA FORMATTING AND DELIVERY

USER INTERFACE RENDERING

Network Border Network Border

Page 95: Scaling the Netflix API - OSCON
Page 96: Scaling the Netflix API - OSCON

Scaling…

Organization

Systems

Devices

Development

Testing

Page 97: Scaling the Netflix API - OSCON

Dependency Relationships

Page 98: Scaling the Netflix API - OSCON
Page 99: Scaling the Netflix API - OSCON

Testing Philosophy:

Act Fast, React Fast

Page 100: Scaling the Netflix API - OSCON

That Doesn’t Mean We Don’t Test

• Unit tests

• Functional tests

• Regression scripts

• Continuous integration

• Capacity planning

• Load / Performance tests

Page 101: Scaling the Netflix API - OSCON

Cloud-Based Deployment Techniques

Page 102: Scaling the Netflix API - OSCON

Current Code

In Production

API Requests from the Internet

Page 103: Scaling the Netflix API - OSCON

Canary Analysis Automation

Page 104: Scaling the Netflix API - OSCON

Single Canary Instance to Test New Code with Production Traffic

(around 1% or less of traffic)

Current Code

In Production

API Requests from the Internet

Error!
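One way to picture the canary step is the sketch below: a small, deterministic share of requests goes to the canary, and the canary is judged by comparing its error rate with the baseline. The hashing scheme, the 1.5x tolerance, and the function names are assumptions for illustration and do not describe Netflix's canary analysis tooling.

```python
import hashlib


def route(request_id: str, canary_percent: float = 1.0) -> str:
    """Send roughly canary_percent of requests to the canary, the rest to baseline."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000  # 0..9999
    return "canary" if bucket < canary_percent * 100 else "baseline"


def canary_healthy(canary_errors: int, canary_total: int,
                   base_errors: int, base_total: int,
                   tolerance: float = 1.5) -> bool:
    """Pass the canary only if its error rate stays within tolerance of baseline."""
    if canary_total == 0 or base_total == 0:
        return False  # not enough data to judge
    canary_rate = canary_errors / canary_total
    base_rate = base_errors / base_total
    return canary_rate <= base_rate * tolerance


if __name__ == "__main__":
    routed = [route(f"req-{i}") for i in range(100_000)]
    print("canary share:", routed.count("canary") / len(routed))
    print("healthy:", canary_healthy(1, 100, 10, 1000))
```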

Page 105: Scaling the Netflix API - OSCON

Current Code

In Production

API Requests from the Internet

Page 106: Scaling the Netflix API - OSCON

Current Code

In Production

API Requests from the Internet

Perfect!

Page 107: Scaling the Netflix API - OSCON

Current Code

In Production

API Requests from the Internet

Perfect!

Page 108: Scaling the Netflix API - OSCON

Stress Test with Zuul

Page 109: Scaling the Netflix API - OSCON

Current Code

In Production

API Requests from the Internet

New Code

Getting Prepared for Production

Page 110: Scaling the Netflix API - OSCON

Current Code

In Production

API Requests from the Internet

New Code

Getting Prepared for Production

Page 111: Scaling the Netflix API - OSCON

Error!

Current Code

In Production

API Requests from the Internet

New Code

Getting Prepared for Production

Page 112: Scaling the Netflix API - OSCON

Current Code

In Production

API Requests from the Internet

New Code

Getting Prepared for Production

Page 113: Scaling the Netflix API - OSCON

Current Code

In Production

API Requests from the Internet

Perfect!

Page 114: Scaling the Netflix API - OSCON

Stress Test with Zuul

Page 115: Scaling the Netflix API - OSCON

Current Code

In Production

API Requests from the Internet

New Code

Getting Prepared for Production

Page 116: Scaling the Netflix API - OSCON

Current Code

In Production

API Requests from the Internet

New Code

Getting Prepared for Production

Page 117: Scaling the Netflix API - OSCON

API Requests from the Internet

New Code

Getting Prepared for Production

Page 118: Scaling the Netflix API - OSCON

https://www.github.com/Netflix

Page 119: Scaling the Netflix API - OSCON

Scaling the Netflix API

Daniel Jacobson @daniel_jacobson

http://www.linkedin.com/in/danieljacobson http://www.slideshare.net/danieljacobson

Help Wanted!