ChefConf 2015 - Chef Retrospective

Post on 21-Jul-2015

TRANSCRIPT

Looking Back: Chef Retrospectives at Workday & CommerceHub

@JoeNuspl @gwaldo

Joe and I have come to Chef from drastically different places, and our working conditions are almost guaranteed to be different from yours, but here are some lessons that we’ve learned the long way.

About Joe

• Senior Ops Engineer

• @JoeNuspl

• github.com/nvwls

J

Workday

An NYSE-listed company (WDAY) that provides enterprise cloud applications for human capital management (HCM), payroll, financial management, recruiting, and analytics.

J

Workday Environment

• 9 physical data centers worldwide, plus Amazon and HP cloud

• 124 roles

• 153 cookbooks

• More than 10K servers under Chef control

• PCI and Regulatory compliance

J

About Waldo

• Pipeline Team Lead

• @gwaldo

• github.com/gwaldo

Connects e-retailers with suppliers, providing drop-shipping services. Processed more than 44 million orders for the top online retailers in the US & Canada (more than $7B in retail sales).

CommerceHub Environment

(kidding)

CommerceHub Environment

• Low-Thousands of VMs (VMware)

• Mostly monolithic codebase

• Java on Windows originally, now split w/ Ubuntu

• Many Roles and (small) Envs

Introduction of Chef

Workday

• 0.8.2 in 2010

• Knew no Ruby

• Hired to apply engineering discipline to operations

CommerceHub

• Chef 11 in 2013

• Knew no Ruby

• Hired as DevOps / Automation Cheerleader

The Good @ Workday: SSH

• 2FA SSH into the data center, then multi-hop SSH to get to the final machine

• Wrote an SSH wrapper that grabs the PIN from SecurID.app and sets up SSH control masters and SOCKS proxies along the way.

• A VP regularly uses it to get access to some real-time performance dashboards.

J

The Good @ Workday: Jira Automation

• Don’t just automate servers; automate workflow

• Automate routine Jira / Confluence updates

J

Good at CommerceHub

• Solid Infrastructure, ramping up spending

• There was a lot of desire for improvements

• People care

• Some automation was already in place

• exp. around Testing

The Bad @ Workday: Chef Workarounds

• cookbook_file resources would update the file on every Chef run. Used templates for everything instead.

• Search was slow and unreliable. Ran knife exec scripts to collect the search data and stuff it into a data bag.

• Too much “convert this shell script into Chef code”

J W: Chef Search result order (“Sensunamis”)
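
A minimal sketch of the collect-and-stash workaround, written in plain Ruby so it runs without a Chef server. The node list, the `build_inventory_item` helper, and the `node_inventory` item id are all hypothetical; a real script would pull the nodes through knife exec and save the result as a data bag item. Sorting by name is an assumption here about how to keep search result order from changing the output between runs.

```ruby
require 'json'

# Hypothetical stand-in for search results; a real script run through
# `knife exec` would get these from the Chef server instead.
SEARCH_RESULTS = [
  { 'name' => 'web02', 'ipaddress' => '10.0.0.12', 'roles' => ['web'] },
  { 'name' => 'db01',  'ipaddress' => '10.0.0.20', 'roles' => ['db'] },
  { 'name' => 'web01', 'ipaddress' => '10.0.0.11', 'roles' => ['web'] }
].freeze

# Build a data-bag-item-shaped hash from search results.
# Sorting by node name keeps the output identical from run to run,
# whatever order the search happened to return.
def build_inventory_item(results, item_id)
  nodes = results.sort_by { |n| n['name'] }.map do |n|
    { 'name' => n['name'], 'ipaddress' => n['ipaddress'], 'roles' => n['roles'].sort }
  end
  { 'id' => item_id, 'nodes' => nodes }
end

item = build_inventory_item(SEARCH_RESULTS, 'node_inventory')
puts JSON.pretty_generate(item)
```

Recipes can then read the one item instead of searching on every run, and a template rendered from it only changes when the node list itself changes.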

The Bad @ Workday: Community Cookbook Quality Variance

• A majority assumed:

• Running Ubuntu

• Internet access

• Can compile code

J W: I understand the Internet Access assumption, but not the code compiling one. Is it that you wouldn’t want to compile everything, but options for specifying a built package aren’t available? A bigger problem that I have with community cookbooks is that many simply don’t work. Ask about this on stage. ‘In fact, mcollective removes things like…'

The Bad @ Workday: Not having a “gold standard cookbook”

• Programmers tend to plagiarize.

• It is encouraged as “code reuse”

• People inevitably choose the worst example

• Causing the crap to spread

J

Bad at CommerceHub

• Key people wanted different things

• Lots of “Key People”

• “Can you automate this environment first?”

• Gatekeepers

• Little insight tooling (logging, metrics, alerting)

• Surprise! Chef requires Engineering Effort

It’s not a DevOpsey conference without a @littleidea quote. But seriously, it seems that some people thought “Hire a DevOp, and it’ll magically get better!”

The Ugly @ Workday: Data Bag Misuse

• Created the silo data bag to hold data-center-specific overrides

• Predated Chef::Environments

• Grew out of control: 280K of JSON.

J

The Ugly @ Workday

• Not being tightly integrated with the rest of the Infrastructure team

• Not creating build pipeline sooner

• Not creating easy-to-use test environments sooner

• Occasional excessive logic in Templates

• We were lacking a clear “Gold Standard” Cookbook design example.

J: Not tooting our own horn

Ugly at CommerceHub

• Developers sometimes uninterested in Chef/Ruby “Ops Work”

• Not establishing opinions early (TIMTOWTDI)

• Many small Environments

• Many teams solving the same problems*

Ugly at CommerceHub

• Resistance to Include Ops Eng work in timeframes

• Aligning People + Interest + Time/Opportunity/Dollars

• Berkshelf and Testing are late additions to Chef workflow

What do you call…

A group of Wolves? a Pack

A group of Crows? a Murder

A group of Developers? a Merge Conflict

W: Despite interest in Chef, there was significant resistance to adopt.

Why Resistant to Change?

• You’re going to automate me out of a job

• I inherited the pile of crap, I don’t understand how it works, so if you break it I won’t be able to fix it.

• If it ain’t broke, don’t fix it. (or “I made this pile of crap. Don’t change it.”)

• Damn it Jim, I’m a sysadmin, not a programmer.

• Used to Ops being invisible.

Resistant to Change

• “I’d just have to verify that it worked anyway.”

• Overemphasis on Standardization and Consensus.

• The people know the processes. They made them.

• “I don’t trust code.”

• “It’ll take longer to do the automation than the work.”

Friction

• Status Quo

• Language

• Common Idioms

• “I have to learn Ruby?!”

• Analysis Paralysis

• Training, because Learning Curve

• “Windows Support*”

Friction

• Ops Engineering

• More used to the Former than the Latter

Mistakes Were Made

J

What could we have done better?

• Lots of things

• Identify the goals of your org & make them:

• See the light

• Enter the light

• And shine

• Fight the Silver Bullet mentality

J

What could we have done better?

• Be more explicit about engineering effort involved. (It’s software engineering)

• Chef is powerful, but not always the best tool for the job.

• Identify as part of a skill and job promotion.

W:

What could we have done better?

• More Explicit about code-reviews.

• Be more opinionated early-on.

• Testing up-front.

W:

Wins

• Consistency

• No more snowflake hunts

• Mitigating environment differences

• Capacity additions made easy

• Facilitating Services split-outs

We don’t want everything to sound too dour, because Chef has been a huge win for us. None of these are news, but we’re so close to Chef that they can become so familiar as to become invisible.

Wins

• Gateway drug to automation-addiction

• People Upgrades

• Bringing visibility of Operations work

• Reduction of “Works on my machine” rage

So, where does that leave us?

So here is where we have suggestions from Mugglesville.

Request #1: “Best Practices”

We’re often asked for “Best Practices”, but people see things like this. Their reaction is…

…If I try to be non-prescriptive, and give options, it can come across as wishy-washy….

Well…

…Trying to figure out what they need leads to exasperation…

…and sometimes they wonder if we know what we’re doing. Having strong feelings leads to the original problem when they see an opposing view. (ROLES STAHP)

Or you end up being wrong because they do something(s) unexpected.

Solution #1 “Recommended Practices”

• Present Options/Views of a subject (e.g. Roles)

• Explain pros & cons of the approach.

• “If your environment looks like ABC, this may make sense for you.”

• Review periodically, and describe changes visibly.

So, let’s give it to them.

Request #2: “Where the [devops] did that value come from?”

Attributes. I love ‘em.

“Can you take a look at something?

I can’t figure out why the value isn’t $val.”

This is where I take them through the process of figuring out what values are being set, and where in the order they fit. This is time-consuming. And I often come down to showing them this:

https://docs.chef.io/attributes.html#attribute-precedence

I love this page. It gives new Chefs hives. 15 attribute levels. But you want to help, so you sit down.

And start digging… After a while you still can’t figure it out, and then they say…

“Oh, I must have set this value on the node itself…”

“WHY WOULD YOU DO THAT?!” (I’d want to scream)

What I’d like to see is something like this:

Solution #2 `knife  (…)  inspect  (…)`

The process to determine what value is set. Let’s make it a little more verbose.
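
One way a more verbose answer could look, sketched in plain Ruby. The level names follow the precedence list on the docs page above; `trace_attribute`, `winning_level`, and the sample values are hypothetical and are not part of knife or Chef.

```ruby
# Chef's documented attribute precedence, lowest to highest (15 levels).
PRECEDENCE = [
  'default (attribute file)',
  'default (recipe)',
  'default (environment)',
  'default (role)',
  'force_default (attribute file)',
  'force_default (recipe)',
  'normal (attribute file)',
  'normal (recipe)',
  'override (attribute file)',
  'override (recipe)',
  'override (role)',
  'override (environment)',
  'force_override (attribute file)',
  'force_override (recipe)',
  'automatic (ohai)'
].freeze

# Return every level that sets the attribute, lowest precedence first.
# `layers` maps a level name to the value set there (hypothetical input).
def trace_attribute(layers)
  PRECEDENCE.each_with_index
            .select { |level, _rank| layers.key?(level) }
            .map { |level, rank| { rank: rank + 1, level: level, value: layers[level] } }
end

# The winner is the highest-precedence level that sets the value.
def winning_level(layers)
  trace_attribute(layers).last
end

# Sample input: the same attribute set in three different places.
layers = {
  'default (attribute file)' => 8080,
  'override (role)' => 8443,
  'normal (recipe)' => 9090
}

trace_attribute(layers).each do |entry|
  puts format('%2d  %-32s %s', entry[:rank], entry[:level], entry[:value])
end
winner = winning_level(layers)
puts "effective value: #{winner[:value]} (from #{winner[:level]})"
```

Listing every level that touched the value, not just the winner, is the point: the surprise is usually a level nobody remembered setting.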

Request #3: Windows

Look, I love this community. And I honestly don’t hate Windows. But Chef-on-Windows has not been great these last 2 years.

Request #4: Versions on Roles & Envs

It’s time, no?

W: Finally, a Plea to Chef: Chef is not our job. Our priorities are not the same. Asking for empathy and patience, and we’ll give you the same.

Now, we’d like to end on a high note…

Introducing

Sous Chef

https://github.com/commercehub-oss/sous_chef/

(not an official logo) The work of Larry Zarou, this is a cookbook to help you set up a cookbook-testing pipeline.

Currently opinionated toward CHUB’s environment, but contribs welcome!

@joenuspl Workday

@gwaldo CommerceHub

Thank you!

Please rate in the app
