Building a Distributed & Automated Open Source Program at Netflix

Post on 23-Jan-2018

Netflix Open Source

Andrew Spyker (@aspyker) - Engineering Manager

Building a distributed and automated open source program

About Netflix

● 86.7M members
● A few thousand employees
● 190+ countries
● > ⅓ NA internet download traffic
● 500+ Microservices
● Many tens of thousands of VMs
● 3 regions across the world

Trivia

Netflix has been open sourcing since?

a) Around the start of streaming service - 2007
b) Around when we went international - 2010
c) Around House of Cards release time - 2013

Answer

2010

Why does Netflix Open Source?

Improve Engineering
● Great feedback from wider community
● Collaborate through open code

Recruit and retain engineering talent
● Hard problems are openly worked on

Industry Alignment

Why does Netflix Open Source?

[Timeline figure: Netflix moves to cloud; years marked 2008, 2013, 2016]

http://netflix.github.io

Open Source Functional Areas

● Contribute to Hadoop, Hive, Pig, Parquet, Presto, Spark
● Genie - RESTful APIs for Big Data Jobs
● Lipstick - Graphical depiction of executing Pig jobs
● Aegisthus - Data pipeline from Cassandra to Big Data

Open Source Functional Areas

● Nebula - Plugins for Gradle to simplify builds
● Aminator - Bakes AMIs from OS installation packages
● Spinnaker - New continuous delivery platform

Open Source Functional Areas

● Eureka, Ribbon, Hystrix - Cloud native, resilient IPC
● Karyon, Prana, Archaius - Microservice App Frameworks
● Fenzo - Mesos advanced scheduling library

Open Source Functional Areas

● Photon - Java Interoperable Master Format (IMF) implementation
● VMAF - Perceptual quality metric algorithm and test toolkit

Open Source Functional Areas

● Raigad/Priam - Management/ops sidecars for ES and C*
● EVCache - Distributed, replicated memcache++
● Dynomite - Dynamo layer on top of non-dynamo data stores

Open Source Functional Areas

● Spectator/Atlas - Monitoring and Telemetry client and server
● Vector - Fine grained per instance performance monitoring
● Vizceral - Worldwide traffic to microservice graph visualization
● Simian Army - Suite of automations and resiliency testing

Open Source Functional Areas

● Security Monkey - Automated cloud security monitoring
● Scumblr/Sketchy - Internet intelligence gathering
● FIDO - Security event orchestration (analysis/response)
● Lemur - Simplified X.509 cert management
● Sleepy Puppy - Delayed cross-site scripting framework

Open Source Functional Areas

● Work across front end technologies including Restify
● Falcor - Virtual JSON graph & optimized query to backends
● RxJS - Simplify JavaScript async event based programming

Netflix’s approach to open source

Form a small cross-functional working group that centralizes OSS competence, so that decentralized teams working with OSS spend less time on the administrative aspects (legal, tooling, branding, monitoring, and community promotion).

Open source enabler - OSS Interest Group

● Internal mailing list
● Meets once per month
● Topics from developers
● Help each other with common problems

Trivia

How many OSS projects does Netflix have?

a) 59
b) 102
c) 176

Answer

176

Netflix (119), Spinnaker (17), nebula-plugins (40)

Open Source Shepherds

● Management with business context

● Consistency across related projects

● Document how area fits together

● Focus on OSS health of each area

Common tools accelerate developers

● Security
● Backup
● GitHub user/group repo management
● Project tracking
● Build systems
● CI systems

Security tools

● We scan code for
  ○ Access keys, credentials, email addresses, hostnames
● Provide tools and automation to
  ○ Scan before initial release
  ○ Scan repeatedly on GitHub
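
A minimal sketch of the kind of scan described above, assuming simple regular-expression rules. The patterns and the `corp.example.com` internal domain are illustrative assumptions, not the rules Netflix uses; the function name `scan_text` is hypothetical.

```python
import re

# Illustrative patterns only; a real scanner needs a broader, tuned rule set.
PATTERNS = {
    # AWS access key IDs commonly start with "AKIA" followed by 16 characters.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # "corp.example.com" stands in for an internal domain suffix (assumed).
    "internal_hostname": re.compile(r"\b[\w-]+(?:\.[\w-]+)*\.corp\.example\.com\b"),
}


def scan_text(text):
    """Return (line_number, kind, match) tuples for every suspicious token."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            for m in pattern.finditer(line):
                findings.append((lineno, kind, m.group(0)))
    return findings
```

The same function could run once before an initial release and then on a schedule against the published repositories, which matches the two bullets above.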

Source code management

● Backup and archival
  ○ GitHub down != Netflix down
● Internal mirrors we could build from

Project Ownership

All projects have
● Development lead, Management lead
● Shepherd from OSS functional area

Only projects with active leads stay active!

GitHub management

● Has to be easy
  ○ Otherwise, teams will go it alone
● Has to be automated
  ○ Self service - chat ops
  ○ Following secure best practices

GitHub user management

Support bring-your-own GitHub id
● User links to internal id
● All tools can then associate identity

Two Factor Auth Enforcement
● Automation to boot users who don’t enable it
● Be careful - education on recovery!
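
A rough sketch of the enforcement step, assuming the membership data has already been fetched and that users get one warning before removal. The warn-then-remove grace period, the `Member` type, and `plan_2fa_enforcement` are assumptions for illustration; the slides only say that non-compliant users are booted and that recovery education matters.

```python
from dataclasses import dataclass


@dataclass
class Member:
    github_login: str
    two_factor_enabled: bool


def plan_2fa_enforcement(members, internal_ids, already_warned):
    """Split non-compliant members into 'warn' and 'remove' lists.

    internal_ids: dict mapping GitHub login -> internal id (the linked identity).
    already_warned: set of logins warned in an earlier run (assumed policy).
    """
    plan = {"warn": [], "remove": []}
    for m in members:
        if m.two_factor_enabled:
            continue
        action = "remove" if m.github_login in already_warned else "warn"
        # Carry the internal id along so follow-up tooling can reach the person.
        plan[action].append((m.github_login, internal_ids.get(m.github_login)))
    return plan
```

Keeping the internal id next to each login is what lets the automation contact the right person with recovery instructions before removing them.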

GitHub group management

● Owners
  ○ Limited group - due to power
  ○ Automate via chatops all owner actions
● Netflixer group
  ○ Full write permissions on all repos
● Outside contributors
  ○ Added by netflixers, validated over time

GitHub automated through chat ops
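
One way a chat-ops bot for this could be shaped, as a sketch only. The command names (`create-repo`, `add-contributor`) and the function `handle_command` are hypothetical; the bot is assumed to hold the owner credentials so that individual users do not need them.

```python
def handle_command(sender, text, employees):
    """Parse a chat message and return the action the bot would perform.

    sender: chat user issuing the command.
    employees: set of chat users linked to an internal id (assumed check).
    Returns a tuple describing the action, or an error string.
    """
    if sender not in employees:
        return "denied: sender is not a linked employee account"
    parts = text.split()
    if len(parts) == 2 and parts[0] == "create-repo":
        return ("create_repo", parts[1])
    if len(parts) == 3 and parts[0] == "add-contributor":
        return ("add_outside_contributor", parts[1], parts[2])
    return "unknown command"
```

Returning a description of the action, instead of calling GitHub directly, keeps the parsing testable and leaves the privileged call in one audited place.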

Overall Org Health Tracking

Metrics we track

● Issues
  ○ open, closed, TTC
● Pull Requests
  ○ open, closed, TTC
● Last commit timing
● Stars/forks
● Num contributors
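
A small sketch of how the issue and pull request numbers above might be computed, reading TTC as "time to close" (an assumption) and using the median as the summary statistic (also an assumption). The function name `issue_stats` and the input shape are hypothetical.

```python
from datetime import datetime
from statistics import median


def issue_stats(issues):
    """Summarize issues or pull requests.

    issues: list of (created_at, closed_at) datetimes; closed_at is None
    while the item is still open.
    """
    closed_durations = [
        (closed - created).total_seconds() / 86400.0  # seconds -> days
        for created, closed in issues
        if closed is not None
    ]
    return {
        "open": sum(1 for _, closed in issues if closed is None),
        "closed": len(closed_durations),
        "median_ttc_days": median(closed_durations) if closed_durations else None,
    }
```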

Project Health Tracking

github.com/Netflix/OSSTracker

Common Build For Gradle/Java

● Repeatable builds
● deb/rpm files for OS package baking
● Reduces boilerplate for common best practices
● Standards for release/version mgmt

nebula-plugins.github.io

Common CI Systems

● Travis CI
  ○ Populate .travis.yml and sh files
  ○ Standard targets for snapshots, candidates, and releases
  ○ Binary upload credentials handled
  ○ Consistency across projects
● Cloudbees
  ○ Job-dsl to create release jobs
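
To illustrate the "standard targets" idea, here is a sketch of how a CI script could pick a target from the build context. The tag conventions (`vX.Y.Z` for releases, `vX.Y.Z-rc.N` for candidates) and the function `build_target` are assumptions for illustration, not the logic in the actual NetflixOSS build scripts. On Travis CI the inputs would typically come from the `TRAVIS_TAG` and `TRAVIS_PULL_REQUEST` environment variables.

```python
import re

# Assumed tag conventions; adjust to the project's real versioning scheme.
RELEASE_TAG = re.compile(r"^v\d+\.\d+\.\d+$")
CANDIDATE_TAG = re.compile(r"^v\d+\.\d+\.\d+-rc\.\d+$")


def build_target(tag, is_pull_request):
    """Choose which standard target a CI run should execute."""
    if is_pull_request:
        return "build"  # verify only, never publish from a pull request
    if tag and RELEASE_TAG.match(tag):
        return "release"
    if tag and CANDIDATE_TAG.match(tag):
        return "candidate"
    return "snapshot"  # untagged commits publish a snapshot
```

Having every project share one decision function like this is one way to get the consistency the last Travis CI bullet mentions.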

Using Docker to make projects easier

● A running image is worth a thousand wiki documents

● Started with ZeroToDocker
  ○ Monolithic solution
  ○ Leveraged Dockerhub trusted builds

Introducing TravisCI Docker builds

Function                                   Dockerhub trusted builds   TravisCI Docker support
Github commit traceable builds             ✔                          ✔
Trusted build servers                      ✔                          ✔
Full build control (labels, etc.)          ✖                          ✔
Easy to integrate with artifact releases   ✖                          ✔

● Experimenting: OSSTracker & Genie
● Docker compose used across images

TODO Group

● Joined 2015
● Collaborate on how to better collaborate
● Leverage TODO group’s work
  ○ GitHub focus
  ○ Automation innovations
● Good group for helping OSS companies

Trivia

Which of the following does Hystrix lead in?

a) Most PRs closed
b) Most Issues closed
c) Most Stars
d) Most Forks
e) Most contributors

Answer

All of the above

Recent NetflixOSS Releases

CI at Netflix scale

Multi-region deployment control

Advanced CI/CD pipelines

Recent NetflixOSS Releases

Chaos Monkey 2.0
● Integrated with Spinnaker
● Better termination scheduling
● Termination event tracking

Photon
● Java IMF implementation
● Parsing, Interpretation, Validation

Recent NetflixOSS Releases

Vizceral
● React and Web Component
● Graph data to visualize traffic

Dynomite
● Dynamo layer on top of data stores
● Redis and memcache
● Manager (config, multi-region, backup)

Questions?

Andrew Spyker (@aspyker) - Engineering Manager
