netflix cloud architecture and open source

Netflix Architecture and Open Source Andrew Spyker Senior Software Engineer, Netflix

Upload: aspyker

Post on 07-Jan-2017


TRANSCRIPT

Page 1: Netflix Cloud Architecture and Open Source

Netflix Architecture and Open Source

Andrew Spyker
Senior Software Engineer, Netflix

Page 2: Netflix Cloud Architecture and Open Source

About Netflix

69M members
2,000+ employees (1,400 tech)
80+ countries
> 100M hours watched per day
> ⅓ of North American internet download traffic
500+ microservices
Many tens of thousands of VMs
3 regions across the world

Page 3: Netflix Cloud Architecture and Open Source

About the Speaker

Cloud platform technologies
    Distributed configuration, service discovery, RPC, application frameworks, non-Java sidecar

Container cloud
    Resource management and scheduling, making Docker containers operational in Amazon EC2/ECS

Open Source
    Organize @NetflixOSS meetups & internal group

Performance
    Assist across Netflix, but focused mainly on cloud platform perf

With Netflix for ~1 year. Previously at IBM here in Raleigh/Durham (RTP)

@aspyker

ispyker.blogspot.com

Page 4: Netflix Cloud Architecture and Open Source

Agenda

NetflixOSS
Netflix Cloud Architecture
Getting Started

Page 5: Netflix Cloud Architecture and Open Source

Why does Netflix open source?

Allows engineers to gather feedback
    Openly talk, through code, about our approach
    Collaboration on key projects with the world

Happily use proven outside open source
    And improve it for Netflix scale and availability

Netflix culture of freedom and responsibility
    Want to open source? Go for it, be responsible!

Recruiting and retention
    Candidates know exactly what they can work on
    NetflixOSS engineers choose to stay at Netflix

Page 6: Netflix Cloud Architecture and Open Source

NetflixOSS is widely used

The architecture has shaped public cloud usage
    Immutability, red/black deploys, chaos, regional and worldwide high availability

Offerings
    Pivotal Spring Cloud

Large usage
    IBM Watson as a Service (on IBM Cloud)
    Nike Digital is hiring NetflixOSS experts

Interesting usage
    “To help locate new troves of data claiming to be the files stolen from AshleyMadison, the company’s forensics team has been using a tool that Netflix released last year called Scumblr”

Page 7: Netflix Cloud Architecture and Open Source

NetflixOSS Website Relaunch

http://netflix.github.io

Page 8: Netflix Cloud Architecture and Open Source

Key aspects of NetflixOSS website

Show how the pieces fit together
    Projects now discussed with each other in context

OSS categories mirror internal teams
    No artificial categories, focal points for each area

Focus on projects that are core to Netflix
    Projects mentioned are core and strategic

Page 9: Netflix Cloud Architecture and Open Source

Agenda

NetflixOSS
Netflix Cloud Architecture
Getting Started

Page 10: Netflix Cloud Architecture and Open Source

Elastic, Web and Hyper Scale

Doing this

Not doing that

Page 11: Netflix Cloud Architecture and Open Source

Elastic, Web and Hyper Scale

[Diagram components: Load Balancers; Front end API; Recommendation Microservice; Another Microservice; ...; Temporal caching; Durable Storage]

Strategy | Benefit
Make deployments automated | Without automation, impossible
Expose well designed API to users | Offloads presentation complexity to clients
Remove state for mid tier services | Allows easy elastic scale out
Push temporal state to client and caching tier | Leverages clients, avoids data tier overload
Use partitioned data storage | Data design and storage scales with HA
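The "use partitioned data storage" strategy can be illustrated with a minimal sketch of hash partitioning. This is not taken from any Netflix component: the function names `partition_for` and `spread` and the member-style keys are assumptions made for illustration only.

```python
import hashlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a stable hash (illustrative helper)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # A cryptographic digest is used only because it is stable across processes,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def spread(keys, num_partitions: int) -> Counter:
    """Count how many keys land on each partition."""
    return Counter(partition_for(k, num_partitions) for k in keys)


if __name__ == "__main__":
    # Hypothetical keys; any stateless mid tier instance computes the same answer.
    members = [f"member-{i}" for i in range(10_000)]
    print(sorted(spread(members, 4).items()))
```

Because the mapping depends only on the key, any stateless mid tier instance can locate the data without coordination. Plain modulo hashing moves most keys when the partition count changes; consistent hashing is a common refinement for that case.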

Page 12: Netflix Cloud Architecture and Open Source

HA and Automatic Recovery

Feeling This

Not Feeling That

Page 13: Netflix Cloud Architecture and Open Source

Highly Available Service Runtime Recipe

[Diagram components: Microservice #1 (REST services) with its microservice implementation, Karyon, Hystrix (execute call, fallback implementation) and the Ribbon REST client with Eureka; a call to Microservice #2 (app service); Eureka server(s)]

Implementation Detail | Benefits
Decompose into microservices | Key user path always available; failure does not propagate across service boundaries
Karyon w/ automatic Eureka registration | New instances are quickly found; failing individual instances disappear
Ribbon client with Eureka awareness | Load balances & retries across instances with “smarts”; handles temporal instance failure
Hystrix as dependency circuit breaker | Allows for fast failure; provides graceful cross service degradation/recovery
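The "dependency circuit breaker" idea in the last row can be sketched in a few lines. This is a hand-rolled illustration of the general pattern, not the Hystrix API: the class name `CircuitBreaker`, its parameters, and the thresholds are all assumptions.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: fail fast to a fallback after repeated failures."""

    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds before a trial call
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None

    def is_open(self):
        if self.opened_at is None:
            return False
        # After the timeout, let one trial call through (the "half-open" state).
        return self.clock() - self.opened_at < self.reset_timeout

    def call(self, primary, fallback):
        if self.is_open():
            return fallback()  # fast failure: the dependency is not called at all
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()  # graceful degradation
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote dependency in its own breaker, for example `breaker.call(fetch_recommendations, lambda: default_recommendations)`, where both names are hypothetical. Keeping one breaker per dependency is what stops a failing service from dragging down its callers.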

Page 14: Netflix Cloud Architecture and Open Source

IaaS High Availability

[Diagram components: Region (us-east-1) with availability zones us-east-1c, us-east-1d and us-east-1e; ELBs; Eureka; Web App, Service1, Service2; Cluster Auto Recovery and Scaling Services (Auto Scaling Groups)]

Rule | Why?
Always > 2 of everything | 1 is a SPOF; 2 doesn’t web scale and makes DR recovery slow
Including IaaS and cloud services | You’re only as strong as your weakest dependency
Use auto scaler/recovery monitoring | Clusters guarantee availability and service latency
Use application level health checks | Instance on the network != healthy
Worldwide availability | Data replication, global front-end routing, cross region traffic
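The rule "instance on the network != healthy" can be made concrete with a small sketch in which each instance exposes an application-level check, and a zone counts as covered only if it has enough instances that pass it. The `Instance` type, the function names, and the instance ids are hypothetical. Only the zone names come from the slide.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    instance_id: str
    zone: str
    health_check: Callable[[], bool]  # application-level check, not a ping


def healthy_instances(instances: List[Instance]) -> List[Instance]:
    """Keep only instances whose own health check passes."""
    healthy = []
    for inst in instances:
        try:
            ok = inst.health_check()
        except Exception:
            ok = False  # a crashing health check counts as unhealthy
        if ok:
            healthy.append(inst)
    return healthy


def zones_below_minimum(instances: List[Instance], zones: List[str], minimum: int = 1) -> List[str]:
    """Return the zones with fewer than `minimum` healthy instances."""
    counts = {zone: 0 for zone in zones}
    for inst in healthy_instances(instances):
        if inst.zone in counts:
            counts[inst.zone] += 1
    return sorted(zone for zone, count in counts.items() if count < minimum)
```

An auto scaler or recovery monitor could act on the zones this returns, which is one way to read the "use auto scaler/recovery monitoring" rule.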

Page 15: Netflix Cloud Architecture and Open Source

A truly global service

Replicate data across regions

Be able to redirect traffic from region to region

Be able to migrate regional traffic to other regions

Have automated control across regions

Flux Demo
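As a rough illustration of migrating regional traffic to other regions, the sketch below redistributes a failed region's share of traffic proportionally across the surviving regions. It is an assumption-laden toy, not how Netflix or the Flux demo works: the function name `evacuate_region`, the weights, and the `region-b`/`region-c` names are made up.

```python
def evacuate_region(weights: dict, failed_region: str) -> dict:
    """Return new routing weights with the failed region's share moved elsewhere.

    `weights` maps region name to its fraction of global traffic.
    """
    if failed_region not in weights:
        raise KeyError(failed_region)
    survivors = {r: w for r, w in weights.items() if r != failed_region and w > 0}
    total = sum(survivors.values())
    if total == 0:
        raise ValueError("no surviving region can take traffic")
    # Survivors keep their relative proportions; everything else drops to zero.
    return {r: (survivors[r] / total if r in survivors else 0.0) for r in weights}


if __name__ == "__main__":
    before = {"us-east-1": 0.5, "region-b": 0.25, "region-c": 0.25}
    print(evacuate_region(before, "us-east-1"))
```

Shifting traffic this way only works if the data is already replicated across regions, which is why replication comes first on the slide.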

Page 16: Netflix Cloud Architecture and Open Source

Testing is the only way to prove HA

Chaos Monkey
    Kills instances in production - runs regularly

Chaos Gorilla
    Kills availability zones (single datacenter)
    Testing for split brain is also important

Chaos Kong
    Kills an entire region and shifts traffic globally
    Run frequently but with prior scheduling
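The instance-level case can be sketched as a scheduled job that picks one random instance per cluster and terminates it. This is a simplified illustration, not the Chaos Monkey implementation: `pick_victims`, `unleash`, the `terminate` callback, and the rule of skipping clusters with fewer than two instances are all assumptions of this sketch.

```python
import random
from typing import Callable, Dict, List, Tuple


def pick_victims(clusters: Dict[str, List[str]], rng: random.Random,
                 probability: float = 1.0, min_cluster_size: int = 2) -> List[Tuple[str, str]]:
    """Choose at most one instance per cluster to terminate."""
    victims = []
    for name in sorted(clusters):
        instances = clusters[name]
        if len(instances) < min_cluster_size:
            continue  # assumption: leave clusters that have no redundancy alone
        if rng.random() < probability:
            victims.append((name, rng.choice(instances)))
    return victims


def unleash(clusters: Dict[str, List[str]], terminate: Callable[[str], None],
            rng: random.Random) -> List[Tuple[str, str]]:
    """Terminate the chosen instances through a caller-supplied function."""
    victims = pick_victims(clusters, rng)
    for _, instance_id in victims:
        terminate(instance_id)
    return victims
```

In a test the `terminate` callback can simply record ids. In a real environment it would call the cloud provider's terminate operation, which is deliberately left out here.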

Page 17: Netflix Cloud Architecture and Open Source

Continuous Delivery

Reading This

Not This

Page 18: Netflix Cloud Architecture and Open Source

Continuous Delivery

[Diagram components: Continuous Build Server; baked to images (AMIs); Cluster v1, Canary v2, Cluster v2]

Step | Technology
Developers test locally | Unit test frameworks
Continuous build | Continuous build server based on Gradle builds
Build “bakes” full instance image | Aminator and deployment pipeline bake images from build artifacts
Developers work across dev and test | Archaius allows for environment based context
Developers do canary tests, red/black deployments in prod | Asgard console provides app cluster common devops approach, security patterns, and visibility
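The canary and red/black step can be sketched as two small decisions: does the canary look at least as healthy as the baseline, and which cluster takes traffic as a result. The function names, the error-rate comparison, and the 1% tolerance are illustrative assumptions, not Asgard behavior.

```python
def canary_passes(baseline_errors: int, baseline_requests: int,
                  canary_errors: int, canary_requests: int,
                  tolerance: float = 0.01) -> bool:
    """Pass the canary if its error rate is within `tolerance` of the baseline."""
    if baseline_requests == 0 or canary_requests == 0:
        return False  # no traffic means no evidence, so do not promote
    baseline_rate = baseline_errors / baseline_requests
    canary_rate = canary_errors / canary_requests
    return canary_rate <= baseline_rate + tolerance


def red_black_deploy(active: str, candidate: str, canary_ok: bool):
    """Return (cluster taking traffic, cluster kept disabled for rollback)."""
    if canary_ok:
        return candidate, active  # new cluster takes traffic, old one stays as standby
    return active, candidate      # keep the old cluster live, leave the candidate disabled


if __name__ == "__main__":
    ok = canary_passes(5, 1000, 8, 1000)
    print(red_black_deploy("cluster-v1", "cluster-v2", ok))
```

Keeping the previous cluster around but disabled is what makes rollback a traffic switch rather than a redeploy.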

Page 19: Netflix Cloud Architecture and Open Source

From Asgard to Spinnaker

Spinnaker is our CI/CD solution
    CI/CD solution including baking and Jenkins integration
    Workflow engine for continuous delivery
    Pipeline based deployment including baking
    Global visibility across all of our AWS regions
    Provides an API first design
    A microservices runtime HA architecture
    More flexible cloud model so the community can contribute back improvements not related to AWS

Asgard continues to work side-by-side
    Spinnaker is the new end to end CI/CD tool
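The idea of a workflow engine driving pipeline based deployment can be illustrated with a toy runner that executes named stages in order and stops at the first failure. Nothing here reflects Spinnaker's actual API or data model: `run_pipeline`, the stage functions, and the context keys are hypothetical.

```python
def run_pipeline(stages, context):
    """Run (name, function) stages in order; stop at the first failing stage."""
    history = []
    for name, stage in stages:
        try:
            # Each stage receives a copy so a failing stage cannot half-update state.
            context = stage(dict(context))
        except Exception as exc:
            history.append((name, "FAILED: " + str(exc)))
            break
        history.append((name, "SUCCEEDED"))
    return history, context


def bake(ctx):
    """Hypothetical bake stage: turn a build id into an image name."""
    ctx["image"] = "image-for-" + ctx["build"]
    return ctx


def deploy_canary(ctx):
    """Hypothetical deploy stage: requires a baked image."""
    if "image" not in ctx:
        raise RuntimeError("nothing baked")
    ctx["canary"] = ctx["image"]
    return ctx


if __name__ == "__main__":
    print(run_pipeline([("bake", bake), ("canary", deploy_canary)], {"build": "42"}))
```

Recording a per-stage history is a small stand-in for the visibility a pipeline view provides.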

Page 20: Netflix Cloud Architecture and Open Source

Spinnaker Examples

Works at Netflix scale

Views of global pipelines

From simple Asgard like deployment to advanced CI/CD pipelines

Page 21: Netflix Cloud Architecture and Open Source

Operational Visibility

If you can’t see it, you can’t improve it

Page 22: Netflix Cloud Architecture and Open Source

Operational Visibility

[Diagram components: Microservice #1, Microservice #2, ...; Servo/Spectator; Hystrix/Turbine; Vector; External Uptime Monitoring; Metric/Event Repositories; Atlas; LogStash/ElasticSearch/Kibana; Incidents]

Visibility Point | Technology
Basic IaaS instance monitoring | Not enough (not scalable, not app specific)
User like external monitoring | SaaS offerings or OSS like Uptime
Targeted performance, sampling | Vector performance and app level metrics
Service to service interconnects | Hystrix streams ➔ Turbine aggregation ➔ Hystrix dashboard
Application centric metrics | Servo/Spectator gauges, counters, timers sent to metrics store like Atlas
Remote logging | Logstash/Kibana or similar log aggregation and analysis frameworks
Threshold monitoring and alerts | Services like Atlas and PagerDuty for incident management
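The "gauges, counters, timers" in the application centric metrics row can be pictured with a tiny in-process registry. This is a generic sketch and does not mirror the Servo or Spectator APIs: the class `MetricsRegistry` and its method names are assumptions.

```python
import time
from collections import defaultdict


class MetricsRegistry:
    """In-memory counters, gauges and timers that can be snapshotted for export."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.gauges = {}
        self.timers = defaultdict(list)

    def increment(self, name, amount=1):
        """Counter: a value that only grows, such as requests served."""
        self.counters[name] += amount

    def set_gauge(self, name, value):
        """Gauge: the latest sampled value, such as queue depth."""
        self.gauges[name] = value

    def record_time(self, name, seconds):
        """Timer: one duration sample for an operation."""
        self.timers[name].append(seconds)

    def timed(self, name, fn, *args, **kwargs):
        """Run fn and record how long it took, even if it raises."""
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            self.record_time(name, time.perf_counter() - start)

    def snapshot(self):
        """Summarize everything in a plain dict, ready to send to a metrics store."""
        timers = {
            name: {"count": len(s), "total_seconds": sum(s), "max_seconds": max(s)}
            for name, s in self.timers.items()
        }
        return {"counters": dict(self.counters), "gauges": dict(self.gauges), "timers": timers}
```

A publisher thread would periodically ship `snapshot()` to a metrics store, where threshold alerts can be evaluated. That export step is omitted here.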

Page 23: Netflix Cloud Architecture and Open Source

Security

Dynamic Security

Done in new ways

NOT

Page 24: Netflix Cloud Architecture and Open Source

Dynamic, Web Scale & Simpler Security

Security Monkey
    Monitors security policies, tracks changes, alerts on situations
Scumblr
    Searches internet for security “nuggets” (credentials, hacking discussions)
Sketchy
    A safe way to collect text and screenshots from websites
FIDO
    Automated event detection, analysis, enrichment & enforcement
Sleepy Puppy
    Delayed cross site scripting propagation testing framework
Lemur
    x.509 certificate orchestration framework
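The "monitors security policies, tracks changes" idea can be sketched as a diff between two snapshots of policy definitions. This is not Security Monkey code: `diff_policies`, the snapshot shape (name mapped to a list of rules), and the example rule strings are invented for illustration.

```python
def diff_policies(previous: dict, current: dict) -> dict:
    """Compare two policy snapshots and report what was added, removed or changed."""
    prev_keys, cur_keys = set(previous), set(current)
    return {
        "added": sorted(cur_keys - prev_keys),
        "removed": sorted(prev_keys - cur_keys),
        "changed": sorted(k for k in prev_keys & cur_keys if previous[k] != current[k]),
    }


if __name__ == "__main__":
    # Hypothetical snapshots taken on two consecutive scans.
    before = {"sg-web": ["443/tcp from 0.0.0.0/0"], "sg-db": ["5432/tcp from sg-web"]}
    after = {"sg-web": ["443/tcp from 0.0.0.0/0", "22/tcp from 0.0.0.0/0"],
             "sg-cache": ["6379/tcp from sg-web"]}
    print(diff_policies(before, after))
```

An alerting layer would then decide which of these changes deserve attention, for example a newly opened port to the whole internet.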

Page 25: Netflix Cloud Architecture and Open Source

What did we not cover?

Over 50 GitHub projects
    NetflixOSS is “Technical indigestion as a service”

Big Data, Data Persistence and UI Engineering
    Big Data tools used well beyond Netflix
    Ephemeral, semi and fully persistent data systems
    Recent addition of UI OSS and Falcor

Page 26: Netflix Cloud Architecture and Open Source

Agenda

NetflixOSS
Netflix Cloud Architecture
Getting Started

Page 27: Netflix Cloud Architecture and Open Source

How do I get started?

All of the previous slides show NetflixOSS components
    Code: http://netflix.github.io
    Announcements: http://techblog.netflix.com/

Want to get running a bit faster?

ZeroToCloud
    Workshop for getting started with build/bake/deploy in Amazon EC2

ZeroToDocker
    Docker images containing running Netflix technologies (not production ready, but easy to understand)

Page 28: Netflix Cloud Architecture and Open Source

ZeroToDocker Demo

[Diagram layers: Mac OS X; VirtualBox; Ubuntu 14.04 (single kernel); containers: Container #1 (filesystem + process), Eureka container, Zuul container, another container, ...]

Docker running instances
    Single kernel
    Contained processes

Zookeeper and Exhibitor

A microservices app and surrounding NetflixOSS services (Zuul to Karyon with Eureka)

Page 29: Netflix Cloud Architecture and Open Source

Questions

?