ChefConf 2015 - Chef Retrospective

Looking Back: Chef Retrospectives at Workday & CommerceHub (@JoeNuspl, @gwaldo)

Upload: gwaldo

Post on 21-Jul-2015


TRANSCRIPT

Page 1: ChefConf 2015 - Chef Retrospective

Looking Back:

@JoeNuspl @gwaldo

Chef Retrospectives at Workday & CommerceHub

Page 2: ChefConf 2015 - Chef Retrospective

Joe and I have come to Chef from drastically different places, and our working conditions are almost guaranteed to be different than yours, but here are some lessons that we’ve learned the long way.

Page 3: ChefConf 2015 - Chef Retrospective

About Joe

• Senior Ops Engineer

• @JoeNuspl

• github.com/nvwls

J

Page 4: ChefConf 2015 - Chef Retrospective

Workday

An NYSE-listed company (WDAY) that provides enterprise cloud applications for human capital management (HCM), payroll, financial management, recruiting, and analytics.

J

Page 5: ChefConf 2015 - Chef Retrospective

Workday Environment

• 9 physical data centers worldwide, plus Amazon and HP cloud

• 124 roles

• 153 cookbooks

• More than 10K servers under chef control

• PCI and Regulatory compliance

J

Page 6: ChefConf 2015 - Chef Retrospective

About Waldo

• Pipeline Team Lead

• @gwaldo

• github.com/gwaldo

Page 7: ChefConf 2015 - Chef Retrospective

Connect e-Retailers with Suppliers, providing drop-shipping services. Processed > 44 million orders for the top online retailers in US & Canada (> $7B retail sales)

Page 8: ChefConf 2015 - Chef Retrospective

CommerceHub Environment

(kidding)

Page 9: ChefConf 2015 - Chef Retrospective

CommerceHub Environment

• Low-Thousands of VMs (VMware)

• Mostly monolithic codebase

• Java on Windows originally, now split w/ Ubuntu

• Many Roles and (small) Envs

Page 10: ChefConf 2015 - Chef Retrospective

Introduction of Chef

Workday

• 0.8.2 in 2010

• Knew no Ruby

• Hired to apply engineering discipline to operations

CommerceHub

• Chef 11 in 2013

• Knew no Ruby

• Hired as DevOps / Automation Cheerleader

Page 13: ChefConf 2015 - Chef Retrospective

The Good @ Workday: SSH

• 2FA ssh into the data center, then multi-hop ssh to reach the final machine

• Wrote ssh wrapper that grabs PIN from SecurID.app and sets up ssh control masters and socks proxies along the way.

• A VP regularly uses it to get access to some realtime performance dashboards.

J
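The wrapper itself is not shown in the slides. As a rough sketch of the idea, the Ruby below builds an `ssh` command line that reuses one authenticated connection through a control master, hops through a bastion, and optionally opens a SOCKS proxy. The function name `ssh_argv`, the host names, and the control directory are hypothetical; the SecurID PIN retrieval is not covered.

```ruby
# Build the argv for a multi-hop ssh session.
# Assumption: one bastion per data center; all names here are made up.
def ssh_argv(target:, bastion:, socks_port: nil, control_dir: "~/.ssh/cm")
  argv = [
    "ssh",
    "-o", "ControlMaster=auto",                  # first connection becomes the master
    "-o", "ControlPath=#{control_dir}/%r@%h:%p", # socket shared by later sessions
    "-o", "ControlPersist=10m",                  # keep the master alive between uses
    "-o", "ProxyCommand=ssh -W %h:%p #{bastion}" # hop through the bastion
  ]
  argv += ["-D", socks_port.to_s] if socks_port  # local SOCKS proxy, if requested
  argv << target
end

# Usage: exec(*ssh_argv(target: "app01.dc1.example.com",
#                       bastion: "bastion.dc1.example.com",
#                       socks_port: 1080))
```

Because later sessions attach to the existing master socket, the second factor only has to be entered when the master connection is first opened.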

Page 14: ChefConf 2015 - Chef Retrospective

The Good @ Workday: Jira Automation

• Don’t just automate servers; automate workflow

• Automate routine Jira / Confluence updates

J
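The slides do not say how the Jira updates were automated. One plausible shape, sketched here under that assumption, is a small script that posts a comment through Jira's REST API (version 2 issue comment endpoint). The function name `jira_comment_request`, the server URL, and the issue key are hypothetical, and authentication is left out.

```ruby
require "json"
require "net/http"
require "uri"

# Build (but do not send) a request that adds a comment to a Jira issue.
def jira_comment_request(base_url, issue_key, text)
  uri = URI.join(base_url, "/rest/api/2/issue/#{issue_key}/comment")
  req = Net::HTTP::Post.new(uri)
  req["Content-Type"] = "application/json"
  req.body = JSON.generate({ "body" => text })
  [uri, req]
end

# Usage (credentials omitted):
#   uri, req = jira_comment_request("https://jira.example.com", "OPS-123", "Deployed")
#   Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
```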

Page 15: ChefConf 2015 - Chef Retrospective

Good at CommerceHub

• Solid Infrastructure, ramping up spending

• There was a lot of desire for improvements

• People care

• Some automation was already in place

• Especially around Testing

Page 16: ChefConf 2015 - Chef Retrospective

The Bad @ Workday: Chef Workarounds

• cookbook_file resources would update the file on every Chef run, so we used templates for everything.

• Search was slow and unreliable, so we ran knife exec scripts to collect the search data and stuff it into a data bag.

• Too much “convert this shell script into Chef code”

J W: Chef Search result order (“Sensunamis”)
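If search results come back in a different order on each run, any file rendered from them changes too, and the services watching that file restart. A common mitigation, sketched below in plain Ruby, is to sort the results before storing or rendering them. `SearchHit` and `data_bag_item` are hypothetical stand-ins for real node objects and for whatever the knife exec scripts actually wrote.

```ruby
# Stand-in for a node returned by a Chef search (hypothetical type).
SearchHit = Struct.new(:name, :ipaddress)

# Turn search results into a data bag item whose content does not depend
# on the order in which the search returned the nodes.
def data_bag_item(id, hits)
  members = hits.sort_by(&:name).map do |hit|
    { "name" => hit.name, "ipaddress" => hit.ipaddress }
  end
  { "id" => id, "members" => members }
end
```

With a stable order, two runs over the same set of nodes produce identical content, so nothing downstream is notified.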

Page 17: ChefConf 2015 - Chef Retrospective

The Bad @ Workday: Community Cookbook Quality Variance

• A majority assumed:

• running ubuntu

• Internet access

• can compile code

J W: I understand the Internet Access assumption, but not the code compiling one. Is it that you wouldn’t want to compile everything, but options for specifying a built package aren’t available? A bigger problem that I have with community cookbooks is that many simply don’t work. Ask about this on stage. ‘In fact, mcollective removes things like…'

Page 18: ChefConf 2015 - Chef Retrospective

The Bad @ Workday: Not having a “gold standard cookbook”

• Programmers tend to plagiarize.

• It is encouraged as “code reuse”

• People inevitably choose the worst example

• Causing the crap to spread

J

Page 19: ChefConf 2015 - Chef Retrospective

Bad at CommerceHub

• Key people wanted different things

• Lots of “Key People”

• “Can you automate this environment first?”

• Gatekeepers

• Little insight tooling (logging, metrics, alerting)

• Surprise! Chef requires Engineering Effort

Page 20: ChefConf 2015 - Chef Retrospective

It’s not a DevOpsey conference without a @littleidea quote. But seriously, it seems that some people thought “Hire a DevOp, and it’ll magically get better!”

Page 21: ChefConf 2015 - Chef Retrospective

The Ugly @ Workday: Data Bag Misuse

• Created the silo data bag to hold data-center-specific overrides

• Predated Chef::Environments

• Grew out of control: 280K of JSON.

J

Page 22: ChefConf 2015 - Chef Retrospective

The Ugly @ Workday

• Not being tightly integrated with the rest of the Infrastructure team

• Not creating build pipeline sooner

• Not creating easy-to-use test environments sooner

• Occasional excessive logic in Templates

• We were lacking clear “Gold Standard” Cookbook design example.

J Not tooting our own horn
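On the "excessive logic in Templates" point, one way to keep templates thin is to do the filtering and sorting in Ruby and hand the template only finished values. The sketch below uses Ruby's standard ERB library rather than Chef's template resource, and the names `upstream_lines`, `render_upstream`, and the node fields are hypothetical.

```ruby
require "erb"

# All decisions happen here, in testable Ruby...
def upstream_lines(nodes, port:)
  nodes.select { |n| n[:enabled] }
       .sort_by { |n| n[:name] }
       .map { |n| "server #{n[:ip]}:#{port};" }
end

# ...so the template only prints what it is given.
UPSTREAM_TEMPLATE = <<~'ERB_SRC'
  upstream app {
    <%= lines.join("\n  ") %>
  }
ERB_SRC

def render_upstream(nodes, port:)
  lines = upstream_lines(nodes, port: port)
  ERB.new(UPSTREAM_TEMPLATE).result_with_hash(lines: lines)
end
```

The selection rules can then be unit-tested without rendering anything, and the template stays readable to people who do not know Ruby well.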

Page 23: ChefConf 2015 - Chef Retrospective

Ugly at CommerceHub

• Developers sometimes uninterested in Chef/Ruby “Ops Work”

• Not establishing opinions early (TIMTOWTDI)

• Many small Environments

• Many teams solving the same problems*

Page 24: ChefConf 2015 - Chef Retrospective

Ugly at CommerceHub

• Resistance to including Ops Eng work in timeframes

• Aligning People + Interest + Time/Opportunity/Dollars

• Berkshelf and Testing are late additions to Chef workflow

Page 25: ChefConf 2015 - Chef Retrospective

What do you call…

A group of Wolves? a Pack

A group of Crows? a Murder

A group of Developers? a Merge Conflict

Page 26: ChefConf 2015 - Chef Retrospective

W: Despite interest in Chef, there was significant resistance to adopt.

Page 27: ChefConf 2015 - Chef Retrospective

Why Resistant to Change?

• You’re going to automate me out of a job

• I inherited the pile of crap, I don’t understand how it works, so if you break it I won’t be able to fix it.

• If it ain’t broke, don’t fix it. (or “I made this pile of crap. Don’t change it.”)

• Damn it Jim, I’m a sysadmin, not a programmer.

• Used to Ops being invisible.

Page 28: ChefConf 2015 - Chef Retrospective

Resistant to Change

• “I’d just have to verify that it worked anyway.”

• Overemphasis on Standardization and Consensus.

• The people know the processes. They made them.

• “I don’t trust code.”

• “It’ll take longer to do the automation than the work.”

Page 29: ChefConf 2015 - Chef Retrospective

Friction

• Status Quo

• Language

• Common Idioms

• “I have to learn Ruby?!”

• Analysis Paralysis

• Training, because Learning Curve

• “Windows Support*”

Page 30: ChefConf 2015 - Chef Retrospective

Friction

• Ops Engineering

• More used to the Former than the Latter

Page 32: ChefConf 2015 - Chef Retrospective

Mistakes Were Made

J

Page 33: ChefConf 2015 - Chef Retrospective

What could we have done better?

• Lots of things

• Identify the goals of your org & make them:

• See the light

• Enter the light

• And shine

• Fight the Silver Bullet mentality

J

Page 34: ChefConf 2015 - Chef Retrospective

What could we have done better?

• Be more explicit about engineering effort involved. (It’s software engineering)

• Chef is powerful, but not always the best tool for the job.

• Identify as part of a skill and job promotion.

W:

Page 35: ChefConf 2015 - Chef Retrospective

What could we have done better?

• More Explicit about code-reviews.

• Be more opinionated early-on.

• Testing up-front.

W:

Page 36: ChefConf 2015 - Chef Retrospective

Wins

• Consistency

• No more snowflake hunts

• Mitigating environment differences

• Capacity additions made easy

• Facilitating Services split-outs

We don’t want everything to sound too dour, because Chef has been a huge win for us. None of these are news, but we’re so close to Chef that they can become so familiar as to become invisible.

Page 37: ChefConf 2015 - Chef Retrospective

Wins

• Gateway drug to automation-addiction

• People Upgrades

• Bringing visibility of Operations work

• Reduction of “Works on my machine” rage


Page 38: ChefConf 2015 - Chef Retrospective

So, where does that leave us?

Page 39: ChefConf 2015 - Chef Retrospective

So here is where we have suggestions from Mugglesville.

Page 40: ChefConf 2015 - Chef Retrospective

Request #1: “Best Practices”

We’re often asked for “Best Practices”, but people see things like this. Their reaction is…

Page 41: ChefConf 2015 - Chef Retrospective

…If I try to be non-prescriptive and give options, it can come across as wishy-washy…

Page 42: ChefConf 2015 - Chef Retrospective

Well…

…Trying to figure out what they need leads to exasperation…

Page 43: ChefConf 2015 - Chef Retrospective

…and sometimes they wonder if we know what we’re doing. Having strong feelings leads to the original problem when they see an opposing view. (ROLES STAHP)

Page 44: ChefConf 2015 - Chef Retrospective

Or you end up being wrong because they do something(s) unexpected.

Page 45: ChefConf 2015 - Chef Retrospective

Solution #1 “Recommended Practices”

• Present Options/Views of a subject (e.g. Roles)

• Explain pros & cons of the approach.

• “If your environment looks like ABC, this may make sense for you.”

• Review periodically, and describe changes visibly.

So, let’s give it to them.

Page 46: ChefConf 2015 - Chef Retrospective

Request #2: “Where the [devops] did that value come from?”

Attributes. I love ‘em.

Page 47: ChefConf 2015 - Chef Retrospective

“Can you take a look at something?

I can’t figure out why the value isn’t $val.”

This is where I take them through the process of figuring out what values are being set, and where in the order they fit. This is time-consuming. And I often come down to showing them this:

Page 48: ChefConf 2015 - Chef Retrospective

https://docs.chef.io/attributes.html#attribute-precedence

I love this page. It gives new Chefs hives. 15 attribute levels. But you want to help, so you sit down.

Page 49: ChefConf 2015 - Chef Retrospective

And start digging… After a while you still can’t figure it out, when they say…

Page 50: ChefConf 2015 - Chef Retrospective

“Oh, I must have set this value on the node itself…”

Page 51: ChefConf 2015 - Chef Retrospective

“WHY WOULD YOU DO THAT?!” (I’d want to scream)

What I’d like to see is something like this:

Page 52: ChefConf 2015 - Chef Retrospective

Solution #2 `knife  (…)  inspect  (…)`

The process to determine what value is set. Let’s make it a little more verbose.
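A minimal sketch of what such an inspect command might report, in plain Ruby. It is not a Chef or knife API: `PRECEDENCE`, `Setting`, and `inspect_attribute` are hypothetical names, and the level order is transcribed from the precedence table on the docs page linked earlier (Chef 11 and later), so check it against that page before relying on it.

```ruby
# The 15 attribute precedence levels, lowest first (assumed order; see lead-in).
PRECEDENCE = [
  [:default, :attribute_file], [:default, :recipe],
  [:default, :environment], [:default, :role],
  [:force_default, :attribute_file], [:force_default, :recipe],
  [:normal, :attribute_file], [:normal, :recipe],
  [:override, :attribute_file], [:override, :recipe],
  [:override, :role], [:override, :environment],
  [:force_override, :attribute_file], [:force_override, :recipe],
  [:automatic, :ohai]
].freeze

# One place where a value was set: precedence level, kind of source,
# a human-readable origin (file name), and the value itself.
Setting = Struct.new(:level, :source, :origin, :value)

# Report the winning value, where it came from, and what it shadowed.
def inspect_attribute(settings)
  return nil if settings.empty?
  ranked = settings.sort_by do |s|
    PRECEDENCE.index([s.level, s.source]) ||
      raise(ArgumentError, "unknown level: #{s.level}/#{s.source}")
  end
  winner = ranked.last
  {
    value: winner.value,
    set_by: winner.origin,
    shadowed: ranked[0...-1].reverse.map(&:origin) # nearest loser first
  }
end
```

Given a cookbook default, a role default, and an environment override for the same attribute, the report names the environment file as the source and lists the role and cookbook files as shadowed, which is the answer the "where did that value come from?" conversation is looking for.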

Page 53: ChefConf 2015 - Chef Retrospective

Request #3: Windows

Look, I love this community. And I honestly don’t hate Windows. But Chef-on-Windows has not been great these last 2 years.

Page 54: ChefConf 2015 - Chef Retrospective

Request #4: Versions on Roles & Envs

It’s time, no?

Page 55: ChefConf 2015 - Chef Retrospective

W: Finally, a Plea to Chef: Chef is not our job. Our priorities are not the same. We’re asking for empathy and patience, and we’ll give you the same.

Page 56: ChefConf 2015 - Chef Retrospective

Now, we’d like to end on a high note…

Page 57: ChefConf 2015 - Chef Retrospective

Introducing

Sous Chef

https://github.com/commercehub-oss/sous_chef/

(not an official logo) Work of Larry Zarou, this is a cookbook to help you set up a cookbook-testing pipeline.

Page 58: ChefConf 2015 - Chef Retrospective

Introducing

Sous Chef

https://github.com/commercehub-oss/sous_chef/

Currently opinionated toward CHUB’s environment, but contribs welcome!

Page 59: ChefConf 2015 - Chef Retrospective

@joenuspl Workday

@gwaldo CommerceHub

Thank you!

Please rate in the app
