Building a Distributed & Automated Open Source Program at Netflix
Netflix Open Source
Andrew Spyker (@aspyker) - Engineering Manager
About Netflix
● 86.7M members
● A few thousand employees
● 190+ countries
● More than ⅓ of North American internet download traffic
● 500+ microservices
● Many tens of thousands of VMs
● 3 regions across the world
Trivia
Since when has Netflix been open sourcing?
a) Around the start of the streaming service - 2007
b) Around when we went international - 2010
c) Around the House of Cards release - 2013
Answer
2010
Why does Netflix Open Source?
Improve engineering
● Great feedback from the wider community
● Collaborate through open code
Recruit and retain engineering talent
● Hard problems are worked on in the open
Industry Alignment
Why does Netflix Open Source?
[Timeline figure: Netflix moves to cloud; years marked 2008, 2013, 2016]
http://netflix.github.io
Open Source Functional Areas
● Contribute to Hadoop, Hive, Pig, Parquet, Presto, Spark
● Genie - RESTful APIs for Big Data jobs
● Lipstick - Graphical depiction of executing Pig jobs
● Aegisthus - Data pipeline from Cassandra to Big Data
Open Source Functional Areas
● Nebula - Plugins for Gradle to simplify builds
● Aminator - Bakes AMIs from OS installation packages
● Spinnaker - New continuous delivery platform
Open Source Functional Areas
● Eureka, Ribbon, Hystrix - Cloud native, resilient IPC
● Karyon, Prana, Archaius - Microservice app frameworks
● Fenzo - Advanced scheduling library for Mesos
Open Source Functional Areas
● Photon - Java implementation of the Interoperable Master Format (IMF)
● VMAF - Perceptual video quality metric algorithm and test toolkit
Open Source Functional Areas
● Raigad/Priam - Management/ops sidecars for Elasticsearch (ES) and Cassandra (C*)
● EVCache - Distributed, replicated memcache++
● Dynomite - Dynamo layer on top of non-Dynamo data stores
Open Source Functional Areas
● Spectator/Atlas - Monitoring and telemetry client and server
● Vector - Fine-grained, per-instance performance monitoring
● Vizceral - Visualization of worldwide traffic flowing through the microservice graph
● Simian Army - Suite of automation and resiliency testing tools
Open Source Functional Areas
● Security Monkey - Automated cloud security monitoring
● Scumblr/Sketchy - Internet intelligence gathering
● FIDO - Security event orchestration (analysis/response)
● Lemur - Simplified X.509 certificate management
● Sleepy Puppy - Delayed cross-site scripting (XSS) framework
Open Source Functional Areas
● Work across front-end technologies, including Restify
● Falcor - Virtual JSON graph and optimized queries to backends
● RxJS - Simplifies asynchronous, event-based JavaScript programming
Netflix’s approach to open source
Form a small, cross-functional working group that centralizes OSS competence, so that the decentralized teams working on OSS spend less time on the administrative aspects (legal, tooling, branding, monitoring, and community promotion).
Open source enabler - OSS Interest Group
● Internal mailing list
● Meets once per month
● Topics come from developers
● Members help each other with common problems
Trivia
How many OSS projects does Netflix have?
a) 59
b) 102
c) 176
Answer
176
By GitHub organization: Netflix (119), Spinnaker (17), nebula-plugins (40)
Open Source Shepherds
● Management with business context
● Consistency across related projects
● Document how area fits together
● Focus on OSS health of each area
Common tools accelerate developers
● Security
● Backup
● GitHub user/group repo management
● Project tracking
● Build systems
● CI systems
Security tools
● We scan code for
○ Access keys, credentials, email addresses, hostnames
● We provide tools and automation to
○ Scan before initial release
○ Scan repeatedly on GitHub
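The scan targets above lend themselves to pattern matching. The following is a minimal Python sketch, not Netflix's actual scanner: the rule names, the regular expressions, and the internal domain `corp.example.com` are all illustrative assumptions, and a production tool would use a larger, tuned rule set.

```python
import re
from typing import Iterable, List, Tuple

# Illustrative rules only (assumptions, not Netflix's scanner configuration).
RULES = {
    # AWS access key IDs: "AKIA" followed by 16 upper-case letters or digits.
    "access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Hard-coded credentials such as password = "hunter2".
    "credential": re.compile(r"(?i)\b(?:password|passwd|secret)\s*[:=]\s*[\"'][^\"']+[\"']"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    # Hostnames under a hypothetical internal domain.
    "hostname": re.compile(r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.corp\.example\.com\b"),
}

Finding = Tuple[int, str, str]  # (1-based line number, rule name, matched text)

def scan_lines(lines: Iterable[str]) -> List[Finding]:
    """Return every rule match found in the given lines."""
    findings: List[Finding] = []
    for lineno, line in enumerate(lines, start=1):
        for rule, pattern in RULES.items():
            for match in pattern.finditer(line):
                findings.append((lineno, rule, match.group(0)))
    return findings
```

The same function can run over every file before the first push and again on a schedule against the published repositories. Matches still need human triage, since an email address in a license header, for example, is usually intentional.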
Source code management
● Backup and archival
○ GitHub down != Netflix down
● Internal mirrors we could build from
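One common way to keep buildable internal mirrors is a bare `git clone --mirror` that is refreshed periodically. This Python sketch assumes git is on the PATH and that mirrors live under a local directory you choose; the function names are hypothetical, and it is not a description of Netflix's backup system.

```python
import subprocess
from pathlib import Path
from typing import List

def mirror_command(repo_url: str, mirror_root: Path) -> List[str]:
    """Return the git command that creates or refreshes a bare mirror of repo_url."""
    name = repo_url.rstrip("/").rsplit("/", 1)[-1]
    if not name.endswith(".git"):
        name += ".git"
    target = mirror_root / name
    if target.exists():
        # Existing mirror: fetch every ref and drop refs deleted upstream.
        return ["git", f"--git-dir={target}", "remote", "update", "--prune"]
    # First run: --mirror copies all refs (branches, tags, notes) into a bare repo.
    return ["git", "clone", "--mirror", repo_url, str(target)]

def sync_all(repo_urls: List[str], mirror_root: Path) -> None:
    """Mirror each repository; check=True stops on the first git failure."""
    mirror_root.mkdir(parents=True, exist_ok=True)
    for url in repo_urls:
        subprocess.run(mirror_command(url, mirror_root), check=True)
```

Internal builds can then clone from `mirror_root` instead of github.com, which is what keeps a GitHub outage from becoming a Netflix outage.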
Project Ownership
All projects have
● A development lead and a management lead
● A shepherd from the OSS functional area
Only projects with active leads stay active!
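The ownership rule can be checked mechanically. In this sketch the `Project` record and the set of active employees are hypothetical inputs (for example, from an internal directory). The slides do not say what happens to a flagged project, so the sketch stops at reporting it.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class Project:
    name: str
    dev_lead: Optional[str]   # development lead
    mgmt_lead: Optional[str]  # management lead
    shepherd: Optional[str]   # shepherd from the OSS functional area

def projects_missing_owners(projects: List[Project], active: Set[str]) -> List[str]:
    """Names of projects where a required role is unset or held by an inactive person."""
    flagged = []
    for p in projects:
        roles = (p.dev_lead, p.mgmt_lead, p.shepherd)
        if any(r is None or r not in active for r in roles):
            flagged.append(p.name)
    return flagged
```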
GitHub management
● Has to be easy
○ Otherwise, teams will go it alone
● Has to be automated
○ Self-service via ChatOps
○ Following secure best practices
GitHub user management
Support bringing your own GitHub ID
● User links it to their internal ID
● All tools can then associate identity
Two-factor auth enforcement
● Automation removes users who don’t enable it
● Be careful: educate users on account recovery!
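A membership audit combining the two ideas above might look like the following sketch. It assumes the member list has already been fetched. GitHub's REST API can list organization members with `filter=2fa_disabled`, but that call is omitted here. The `links` mapping from GitHub login to internal ID is a hypothetical stand-in for the identity-linking tool. The sketch only reports; given the warning about recovery, notifying users before any removal seems prudent.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Member:
    login: str         # GitHub username
    two_factor: bool   # True if 2FA is enabled on the account

def audit_members(members: List[Member], links: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify org members; links maps GitHub login -> internal employee ID."""
    report: Dict[str, List[str]] = {"unlinked": [], "no_2fa": []}
    for m in members:
        if m.login not in links:
            report["unlinked"].append(m.login)
        if not m.two_factor:
            report["no_2fa"].append(m.login)
    return report
```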
GitHub group management
● Owners
○ Limited group, due to their power
○ All owner actions automated via ChatOps
● Netflixer group
○ Full write permissions on all repos
● Outside contributors
○ Added by Netflixers, validated over time
GitHub automated through ChatOps
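With owner rights held by a bot rather than by people, the bot's main job is to validate who is asking and what they are asking for. This Python sketch uses a made-up command grammar (`create-repo`, `add-contributor`) and a plain set for the Netflixer group. It is not the actual Netflix bot, and executing the validated action against GitHub is left out.

```python
from typing import NamedTuple, Set

class Action(NamedTuple):
    verb: str
    repo: str
    user: str = ""

# Hypothetical command grammar:
#   create-repo <repo>
#   add-contributor <repo> <github-login>
ARITY = {"create-repo": 1, "add-contributor": 2}

def parse_command(text: str, requester: str, netflixers: Set[str]) -> Action:
    """Validate a chat command; running it with the bot's owner credentials is out of scope."""
    if requester not in netflixers:
        raise PermissionError(f"{requester} is not in the Netflixer group")
    parts = text.split()
    if not parts or parts[0] not in ARITY:
        raise ValueError(f"unknown command: {text!r}")
    verb, args = parts[0], parts[1:]
    if len(args) != ARITY[verb]:
        raise ValueError(f"{verb} expects {ARITY[verb]} argument(s)")
    return Action(verb, *args)
```

Keeping the grammar small and the arity explicit makes every owner-level action auditable from the chat log.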
Overall Org Health Tracking
Metrics we track
● Issues
○ Open, closed, TTC (time to close)
● Pull requests
○ Open, closed, TTC
● Last commit timing
● Stars/forks
● Number of contributors
Project Health Tracking
github.com/Netflix/OSSTracker
Common Build For Gradle/Java
nebula-plugins.github.io
● Repeatable builds
● deb/rpm files for OS package baking
● Reduces boilerplate for common best practices
● Standards for release/version management
Common CI Systems
● Travis CI
○ Populate .travis.yml and shell scripts
○ Standard targets for snapshots, candidates, and releases
○ Binary upload credentials handled
○ Consistency across projects
● CloudBees
○ Job DSL to create release jobs
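The "standard targets" idea amounts to a small decision on each build: what kind of build is this, and what should it publish? This Python sketch reads two real Travis CI variables. `TRAVIS_PULL_REQUEST` is `"false"` unless the build is for a PR, and `TRAVIS_TAG` is empty unless it is a tag build. It maps them to the `snapshot`, `candidate`, and `final` tasks of the nebula release plugin. The `-rc.` tag convention is an assumption, and real projects may pass extra release properties.

```python
from typing import List, Mapping

def gradle_args(env: Mapping[str, str]) -> List[str]:
    """Pick the Gradle invocation for a Travis CI build (sketch)."""
    pull_request = env.get("TRAVIS_PULL_REQUEST", "false")
    tag = env.get("TRAVIS_TAG", "")
    if pull_request != "false":
        # PR builds: compile and test only; never publish from untrusted code.
        return ["./gradlew", "build"]
    if not tag:
        # Branch build without a tag: publish a snapshot.
        return ["./gradlew", "snapshot"]
    if "-rc." in tag:
        # Assumed tag convention: v1.2.0-rc.1 means a release candidate.
        return ["./gradlew", "candidate"]
    return ["./gradlew", "final"]
```

A CI script would call `gradle_args(os.environ)`. Refusing to publish on PR builds fits with how Travis handles credentials: encrypted variables are not exposed to pull requests from forks.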
Using Docker to make projects easier
● A running image is worth a thousand wiki documents
● Started with ZeroToDocker
○ Monolithic solution
○ Leveraged Docker Hub trusted builds
Introducing TravisCI Docker builds

| Function | Docker Hub trusted builds | TravisCI Docker support |
|---|---|---|
| GitHub commit traceable builds | ✔ | ✔ |
| Trusted build servers | ✔ | ✔ |
| Full build control (labels, etc.) | ✖ | ✔ |
| Easy to integrate with artifact releases | ✖ | ✔ |
● Experimenting: OSSTracker & Genie
● Docker Compose used across images
TODO Group
● Joined 2015
● Collaborate on how to better collaborate
● Leverage the TODO Group’s work
○ GitHub focus
○ Automation innovations
● Good group for helping OSS companies
Trivia
Which of the following does Hystrix lead in?
a) Most PRs closed
b) Most issues closed
c) Most stars
d) Most forks
e) Most contributors
Answer
All of the above
Recent NetflixOSS Releases
CI at Netflix scale
Multi-region deployment control
Advanced CI/CD pipelines
Recent NetflixOSS Releases
Chaos Monkey 2.0
● Integrated with Spinnaker
● Improved termination scheduling
● Termination event tracking
Photon
● Java IMF implementation
● Parsing, interpretation, validation
Recent NetflixOSS Releases
Vizceral
● React and Web Component
● Graph data to visualize traffic
Dynomite
● Dynamo layer on top of data stores
● Redis and memcache
● Manager (config, multi-region, backup)
Questions?