atmosphere 2014: really large scale systems configuration - phil dibowitz

65
Really large scale systems configuration Config Management @ Facebook Phil Dibowitz Atmosphere – May 2014

Upload: proidea

Post on 08-May-2015

130 views

Category:

Presentations & Public Speaking


0 download

DESCRIPTION

For many years, Facebook managed its systems with cfengine2. With many individual clusters over 10k nodes in size, a slew of different constantly-changing system configurations, and small teams, this system was showing its age and the complexity was steadily increasing, limiting its effectiveness and usability. It was difficult to integrate with internal systems, testing was often impractical, and it provided no isolation of configurations, among many other problems. After an extensive evaluation of the tools and paradigms in modern systems configuration management – open source, proprietary, and a potential home-grown solution – we built a system based on the open-source project Chef. The evaluation process involved understanding the direction we wanted to take in managing the next many iterations of systems, clusters, and teams. More importantly, we evaluated the various paradigms behind effective configuration management and the different kinds of scale they provide. What we ended up with is an extremely flexible system that allows a tiny team to manage an incredibly large number of systems with a variety of unique configuration needs. In this talk we will look at the paradigms behind the system we built, the software we chose and why, and the system we built using that software. Further, we will look at how the philosophies we followed can apply to anyone wanting to scale their systems infrastructure. Phil Dibowitz - Phil Dibowitz has been working in systems engineering for 12 years and is currently a production engineer at Facebook. Initially, he worked on the traffic infrastructure team, automating load balancer configuration management, as well as designing and building the production IPv6 infrastructure. He now leads the team responsible for rebuilding the configuration management system from the ground up. Prior to Facebook, he worked at Google, where he managed the large Gmail environment, and at Ticketmaster, where he co-authored and open sourced a configuration management tool called Spine. He also contributes to, and maintains, various open source projects and has spoken at conferences and LUG’s on a variety of topics from Path MTU Discovery to X509.

TRANSCRIPT

Page 1: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Really large scalesystems configuration

Config Management @ FacebookPhil DibowitzAtmosphere – May 2014

Page 2: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz
Page 3: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Configuration Management Experience

● Co-authored Spine (2005-2007)

● Authored Provision

Scale Experience

● Ticketmaster, Google, Facebook

Passionate about scaling configuration management

Who am I?

Page 4: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

http://coolinterestingstuff.com/amazing-space-images/

Scaling

Page 5: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Scaling Configuration Management

How many homogeneous systems can I maintain?

How many heterogeneous systems can I maintain?

How many people are needed?

Can I safely delegate delta configuration?

Page 6: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

The Goal

http://www.prathidhwani.org/sports/goal-2012-the-beautiful-game-is-back/

Page 7: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

The Goal

● 4 people

● Hundreds of thousands of heterogeneous systems

● Service owners own/adjust relevant settings

Page 8: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

What did we need?

Page 9: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

https://www.asburyseminary.edu/elink/my-profile-on-the-hub/

1. BasicScalableBuildingBlocks

Page 10: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

1. Basic Scalable Build Blocks

Distributed! Everything on the client (duh!)

Page 11: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz
Page 12: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

1. Basic Scalable Build Blocks

Distributed! Everything on the client (duh!)

Deterministic! The system you want on every run

Idempotent! Only the necessary changes

Extensible! Tied into internal systems

Flexible! No dictated workflow

Page 13: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

http://www.greenbookblog.org/2012/03/21/big-data-opportunity-or-threat-for-market-research/

2. Configuration as Data

Page 14: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

2. Configuration as Data

Service Owner

http://www.flickr.com/photos/laurapple/7370381182/

I want• shared mem• VIP• core files somewhere else• service running• less/more/no nscd caching

Page 15: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

2. Configuration as Data

Service owners may not know:

• How to configure VIPs

• Optimal sysctl settings

• Network settings

• Authentication settings

Page 16: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

http://livelovesweat.wordpress.com/2011/12/07/the-importance-of-flexibility/

3. Flexibility

Page 17: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

3. Flexibility

• Super-fast prototyping

• Internal assumptions can be changed - easily

• Extend in new ways – easily

• Adapt to our workflow ...

Page 18: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

3. Flexibility - Example

• Template /etc/sysctl.conf

• Build a hash of default sysctls

• Provide these defaults early in “run”

• Let any engineer munge the bits they want

• /etc/sysctl.conf template interpolated “after”

Page 19: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

http://www.flickr.com/photos/75905404@N00/7126147125/

Picking a tool

Page 20: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Why Chef?

Easier to see from a problem with Chef

Page 21: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Problem (for us): node.save()

● Scale issues with 15k-node clusters

● Standard solution sucks (disable ohai plugins)

Page 22: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Solution (for us): whitelist_node_attrs

• New cookbook re-opens Chef::Node.save()

• Deletes non-white-listed attrs before saving

• Have as much data as you want during the run

• We send < 1kb back to the server!

Code available:

https://github.com/opscode-cookbooks/whitelist-node-attrs

Page 23: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Chef: whitelist_node_attrsclass Chef class Node

alias_method :old_save, :save# Overwrite chef’s node.save to whitelist. doesn’t get “later” than thisdef save Chef::Log.info(“Whitelisting node attributes”) whitelist = self[:whitelist].to_hash self.default_attrs = Whitelist.filter(self.default_attrs, whitelist) self.normal_attrs = Whitelist.filter(self.normal_attrs, whitelist) self.override_attrs = Whitelist.filter(self.override_attrs, whitelist) self.automatic_attrs = Whitelist.filter(self.override_attrs, whitelist) old_saveend

endend

Page 24: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Chef: whitelist_node_attrs

Well... that’s flexible!

(we did this a few times)

Page 25: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Our desired workflow

Page 26: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Our Desired Workflow

• Provide API for anyone, anywhere to extend configs by

munging data structures

• Engineers don’t need to know what they’re building on, just

what they want to change

• Engineers can change their systems without fear of changing

anything else

• Testing should be easy

• And...

Page 27: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Something Different

“Moving idempotency up”

Or

“Idempotent Systems”

http://www.flickr.com/photos/esi_design/4548531839/sizes/l/in/photostream/

Page 28: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Moving Idempotency Up

● Idempotent records can get stale

● Remove cron/sysctl/user/etc. resource

● Resulting entry not removed => stale entries

● Idempotent systems control a set of configs

● Remove cron/sysctl/user/etc.

● No longer rendered in config

Page 29: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Idempotent Records vs. Systems

This is a pain:

Page 30: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Idempotent Records vs. Systems

This is better:

Page 31: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Case Study

Page 32: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Case Study: sysctl

● fb_sysctl/attributes/default.rb

● Provides defaults looking at hw, kernel, etc.

● fb_sysctl/recipes/default.rb

● Defines a template

● fb_sysctl/templates/default/sysctl.conf.erb

● 3-line template

Page 33: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Case Study: sysctl

# Generated by Chef, do not edit directly!<%- node['fb']['fb_sysctl'].to_hash.keys.sort.each do |key| %> <%= key %> = <%= node['fb']['fb_sysctl'][key] %> <%- end %>

Template:

# Generated by Chef, do not edit directly!...net.ipv6.conf.eth0.accept_ra = 1 net.ipv6.conf.eth0.accept_ra_pinfo = 0 net.ipv6.conf.eth0.autoconf = 0...

Result:

Page 34: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Case Study: sysctl

In the cookbook for the DB servers:

node.default['fb']['fb_sysctl']['kernel.shmmax'] = 19541180416

node.default['fb']['fb_sysctl']['kernel.shmall'] = 5432001

fb_database/recipes/default.rb

Page 35: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Case Study: sysctl

How does this help us scale?

• Delegation is simple, thus...

• Fewer people need to manage configs, therefore...

• Significantly better heterogenous scale

Page 36: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Other Examples

node.default['fb']['fb_networking']['want_ipv6'] = true

Want IPv6?

node.is_layer3?

Want to know what kind of network?

node.default['fb']['fb_cron']['jobs']['myjob'] = {

'time' => '*/15 * * * *',

'command' => 'thing',

'user' => 'myservice',

}

New cronjob?

Page 37: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Other Examples: DSR

LB

Web Web Web

Internet

Page 38: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Other Examples: DSR

node.add_dsr_vip('10.1.1.2')

Page 39: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Our Chef Infrastructure

Page 40: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Our Chef Infrastructure

Open Source Chef and Enterprise Chef

Page 41: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Our Chef Infrastructure - Customizations

• Stateless Chef Servers• No search• No databags

• Separate Failure Domains

• Tiered Model

Page 42: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Production: Global

SVN RepoSVN Repo

CodeCode Cluster 1Cluster 1

ReviewReview

LintLint

Cluster 2Cluster 2 Cluster 3Cluster 3 Cluster 4Cluster 4

Page 43: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Production: Cluster

Chef FE 2Chef FE 2Chef FE 1Chef FE 1

LB

LB

Chef BE 2Chef BE 2

Chef FE 3Chef FE 3

Web Web SVC DB DB

(grocery-delivery) Chef BE 1Chef BE 1

(grocery-delivery)

SVNSVN

Page 44: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Assumptions

• Server is basically stateless

• Node data not persistent

• No databags

• Grocery-delivery keeps roles/cookbooks in sync

• Chef only knows about the cluster it is in

Page 45: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Implementation Details

• Grocery-delivery runs on every backend, every minute

• Persistent data needs to come from FB systems of

record (SOR)

• Ohai is tied into necessary SORs

• Runlist is forced on every run by node

Page 46: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Implementation Details: Client● Report Handlers feed data into monitoring:

● Last exception seen

● Success/Failure of run

● Number of resources

● Time to run

● Time since last run

● Other system info

Page 47: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Implementation Details: Server● chef-server-stats gathers data for monitoring:

● Stats (postgres, authz [opc], etc.)

● Errors (nginx, erchef, etc.)

● Grocery-delivery syncs cookbooks, roles, (databags):

● Pluggable and configurable

● Both open-source:

● https://github.com/facebook/chef-utils

Page 48: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

But does it scale?

Page 49: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Scale

• Cluster size ~10k+ nodes

• Lots of clusters

• 15 minute convergence (14 min splay)

• grocery-delivery runs every minute

Page 50: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Scale - Erchef

Pre-erchef vs Post-erchef

(Enterprise Chef 1.4+)

Page 51: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Scale - Erchef

Page 52: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Scale – Open Source Chef Server 11

Let’s throw more than a cluster ata Chef instance!

(Open Source Chef 11.0)

Page 53: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Scale – Open Source Chef Server 11

Page 54: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz
Page 55: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Testing: Multi-prong approach

• Foodcritic pre-commits for sanity

• Chefspec for unit-testing of core/critical cookbooks

• Taste-tester for full test-in-prod

Page 56: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Testing: taste-tester

• Manage a local chef-zero instance

• Track changes to local git repo, upload deltas

• Point real production servers at chef-zero for testing

• Configurable, with plug-in framework

Page 57: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Testing: taste-tester

desktop$ taste-tester test -s server1.example.comSync your repo to chef-zero; Test on a prod machine

Run Chef on test server server1# chef-client

Fix bugs, re-sync desktop$ vim … ; taste-tester upload

Available at http://github.com/facebook/chef-utils

Page 58: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Our source...

• Chef-server-stats – Get perf stats from chef-servers

• Taste-tester – Our testing system (new!)

• Grocery-delivery – Our cookbook syncing system

(rewritten and re-released!)

• Between-meals – changes-tracking lib for TT and GD

(new!)

• Example cookbooks – fb_cron and fb_sysctl (new!)

• Philosophy.md – Ideas in this talk in text form (new!)

Page 59: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Our source...

Coming soon...

rubygems.org releases of TT, BM, GD

Page 60: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Lessons

Page 61: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Lessons

• Idempotent systems > idempotent records

• Delegating delta config == easier heterogeneity

• Full programming languages > restrictive DSLs

• Scale is more than just a number of clients

• Easy abstractions are critical

• Testing against real systems is useful and necessary

Page 62: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Summary

So how about those types of scale?

Page 63: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Summary

How many homogeneous systems can you

maintain?

How many heterogeneous systems can you

maintain?

How many people are needed?

Can you safely delegate delta configuration?

>> 17k

> 17k

~4

Yes

Page 64: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

Thanks● Opscode

● Adam Jacob, Chris Brown, Steven Danna, & the erchef and

bifrost teams

● Andrew Crump

● foodcritic rules!

● Everyone I work(ed) with

● Aaron, Bethanye, David, KC, Larry, Marcin, Pedro, Tyler

Page 65: Atmosphere 2014: Really large scale systems configuration - Phil Dibowitz

INFRASTRUCTURE