infrastructure migration

Upload: matt-simmons

Post on 30-Oct-2014

DESCRIPTION

Slideshow from a (nonrecorded) talk I gave at the Columbus, Ohio LOPSA chapter.

TRANSCRIPT

Page 1: Infrastructure Migration

Infrastructure Migrations

How many infrastructure migrations have I done? I’m not sure. I stopped counting around 5.

One of the benefits of working for a small company that’s growing quickly is that you get to experience a lot of new things...and moving production and office environments is one of them.

Thursday, August 2, 12

Page 2: Infrastructure Migration

I am: Matt Simmons

• 10+ year sysadmin

• Small infrastructures

• 6+ infrastructure migrations

• http://www.standalone-sysadmin.com

You probably know this...


Page 3: Infrastructure Migration

This is:

Infrastructure Migrations


Page 4: Infrastructure Migration

10,000ft view

• Pre-Planning

• Execution

• Post-Mortem

Like most things, 90% of the work is planning.

The other 90% is lifting heavy things.

There’s another 10-25% reserved for figuring out what went wrong, and determining how to make it not happen again.


Page 5: Infrastructure Migration

Considerations: Types of Migrations

• Build in parallel

• Move Infrastructure

• Hybrid

You really, really want to build in parallel. Sure, it’s expensive, but it means much, much shorter periods of downtime.

Moving an infrastructure is hair-raising, because there are only a few million things that can go wrong.

And you don’t know scary until you’re driving a U-Haul full of servers across the Pennsylvania Turnpike in the middle of a rainstorm.

Most people will probably end up doing hybrid migrations, where you build some of the new infrastructure, then migrate some from the existing setup.

Watch out for things like IP addressing issues, and that you’ve made the correct assumptions about rack space and power requirements for the machines that are moving.
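Checking rack space and power assumptions is a quick tally you can do before anything moves. A minimal sketch, assuming you know each machine's rack units and power draw; the machines, wattages, the 208 V / 30 A circuit, and the 80% continuous-load derating (a common rule of thumb, but confirm what your datacenter allows) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    rack_units: int
    watts: float  # assumed draw; measure it or use nameplate figures


def rack_fits(machines, free_units, volts, breaker_amps, derate=0.8):
    """Check a planned rack layout against free space and circuit capacity.

    derate=0.8 assumes circuits are loaded to at most 80% of the breaker
    rating; substitute whatever limit your datacenter actually enforces.
    Returns (fits, used_units, total_amps, usable_amps).
    """
    used_units = sum(m.rack_units for m in machines)
    total_amps = sum(m.watts for m in machines) / volts
    usable_amps = breaker_amps * derate
    fits = used_units <= free_units and total_amps <= usable_amps
    return fits, used_units, round(total_amps, 2), usable_amps


# Hypothetical rack contents moving onto a hypothetical 208 V / 30 A circuit.
rack = [
    Machine("db1", 2, 750),
    Machine("web1", 1, 350),
    Machine("web2", 1, 350),
    Machine("san1", 4, 1200),
]
print(rack_fits(rack, free_units=42, volts=208, breaker_amps=30))
```

Running the same check against the circuit you *think* you ordered is a cheap way to catch a wrong assumption before the truck leaves.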


Page 6: Infrastructure Migration

Considerations:

• Downtime Limits

• Uptime Requirements

• Service Window Length

You might have a maintenance window, where downtime is planned and doesn’t count against your SLAs. If your migration can fit within this, awesome (hint: it can’t).

So you need to figure out what kind of downtime you can afford, and remember to schedule notices to your customers far enough in advance so that they aren’t taken by surprise.

Strangely enough, downtime limits and uptime requirements aren’t the same.

Figure out what your uptime requirements are, based on your user base’s expectations, then figure out how much infrastructure needs to be running to meet them. Good luck.
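One way to see how uptime requirements and downtime limits relate is to turn an uptime percentage into a downtime budget. A minimal sketch, assuming a 30-day month and that planned migration downtime counts against the target; the targets and the 6-hour outage are hypothetical.

```python
def downtime_budget_minutes(uptime_percent, period_days=30):
    """Minutes of downtime an uptime target allows over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


# Hypothetical targets compared against a hypothetical 6-hour migration outage.
migration_outage_minutes = 6 * 60
for target in (99.0, 99.9, 99.99):
    budget = downtime_budget_minutes(target)
    verdict = "fits" if migration_outage_minutes <= budget else "does not fit"
    print(f"{target}% over 30 days allows {budget:.1f} min; a 6 h outage {verdict}")
```

At 99.9% the whole month's budget is about 43 minutes, which is why a migration rarely fits unless the SLA excludes a maintenance window.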


Page 7: Infrastructure Migration

Considerations:

Upstream Network Changes

I think I could do an entire presentation where I just list all of the problems that could happen when network providers screw things up.

Big ones to watch out for:

1. Is the test and turn-up date early enough so that inevitable failures don’t impact the go-live date?

2. Is the circuit exactly what you ordered, and is what you ordered exactly what you need?

3. Are cross-connects in the datacenter ordered, and is the datacenter networking team working with the provider?


Page 8: Infrastructure Migration

Considerations:

(Wo)man Power

You can’t lift all of the things you own.

You need friends to come help you move, right? And you usually pay them beer and pizza for the effort.

Moving infrastructures is kind of like that, except “money” typically substitutes for beer and pizza, and you want to find people who are reasonably smart, because you probably don’t own anything in your apartment that costs as much as a high-performance RAID array.

Figure out how many people you need, then add 20% to cover the stuff you didn’t think of.

Have another 10% at home ready to come in if the need arises.
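The staffing rule of thumb is simple arithmetic, but it's worth rounding up, since you can't bring 0.4 of a person. A minimal sketch, assuming both percentages are taken from the base estimate; it uses integer math so floating-point error can't bump the count by one.

```python
def ceil_percent(count, percent):
    """count * percent / 100, rounded up, using exact integer arithmetic."""
    return -(-count * percent // 100)


def staffing(base_needed, extra_percent=20, standby_percent=10):
    """Return (people on site, people on standby at home)."""
    on_site = ceil_percent(base_needed, 100 + extra_percent)
    standby = ceil_percent(base_needed, standby_percent)
    return on_site, standby


print(staffing(7))  # base estimate of 7 people -> (9, 1)
```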


Page 9: Infrastructure Migration

Considerations:

How can we parallelize the work?

If you have multiple teams, it’s important to have them work independently and simultaneously, so try not to leave one team waiting around on the results of another. This is no different from removing bottlenecks in a computing infrastructure.
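If the plan's dependencies are written down, you can work out mechanically which tasks are able to run at the same time. A minimal sketch using Python's standard `graphlib` module (Python 3.9+); the task names are hypothetical. Each wave contains tasks whose prerequisites are all finished, so separate teams can work them in parallel.

```python
from graphlib import TopologicalSorter


def parallel_waves(deps):
    """Group tasks into waves; every task in a wave has all prerequisites done.

    deps maps each task to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical migration tasks and their prerequisites.
deps = {
    "unrack servers": {"power down old rack"},
    "rack servers": {"unrack servers", "run new cabling"},
    "power on": {"rack servers"},
    "verify services": {"power on"},
}
for number, wave in enumerate(parallel_waves(deps), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Here the cabling team can work during the first wave while the rack team powers down; everything after that is a single chain, which is exactly where one team ends up waiting on another.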


Page 10: Infrastructure Migration

Establishing a Plan

Documentation shall set you free!


Page 11: Infrastructure Migration

Build a checklist

• What needs to be done

• By whom?

• Where?

• In what order?

Every good plan includes a checklist


Page 12: Infrastructure Migration

Build a checklist

• Off-site, before the move

• On-site, before the move

• On-site, during the move

• On-site, after the move

• Testing

• Signoff

Include all phases

Off-site tasks before the move are usually slow processes or long-term changes that depend on TTLs (such as DNS record TTLs) or on people outside of your organization.
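DNS is a good example of why off-site work has to start early. A resolver may keep serving a cached record for up to its old TTL after you change it, so a lowered TTL only takes full effect once the old one has run out. A minimal sketch of that arithmetic, assuming DNS is what the TTLs refer to; the cutover time, the one-day TTL, and the one-hour safety margin are hypothetical, and some resolvers don't honor TTLs exactly.

```python
from datetime import datetime, timedelta


def lower_ttl_by(cutover, current_ttl_seconds, margin=timedelta(hours=1)):
    """Latest time to publish a lowered TTL.

    A resolver that cached the record just before the change may keep it for
    the full old TTL, so the lowered TTL must be live at least that long
    before cutover, plus a margin for slow secondaries and clock skew.
    """
    return cutover - timedelta(seconds=current_ttl_seconds) - margin


cutover = datetime(2030, 6, 29, 22, 0)  # hypothetical cutover time
print(lower_ttl_by(cutover, current_ttl_seconds=86400))  # one-day TTL
```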


Page 13: Infrastructure Migration

Build a checklist

Establish Dependencies

If item 23 relies on item 24 being done, then it’s probably in the wrong place...

Figuring out all of these dependencies is like untangling a knot. It’s slow, it’s difficult, and when you’re done, no one seems to be as appreciative of your hard work as you are.
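Once each item's prerequisites are written down, a script can catch the "item 23 relies on item 24" problem for you and suggest a valid order. A minimal sketch using the standard `graphlib` module (Python 3.9+); the checklist items are hypothetical.

```python
from graphlib import TopologicalSorter


def misordered(checklist, deps):
    """Return (item, prerequisite) pairs where the prerequisite is missing
    from the checklist or appears after the item that needs it."""
    position = {item: i for i, item in enumerate(checklist)}
    problems = []
    for item in checklist:
        for prereq in sorted(deps.get(item, ())):
            if prereq not in position or position[prereq] > position[item]:
                problems.append((item, prereq))
    return problems


# Hypothetical checklist with one step in the wrong place.
checklist = ["unrack servers", "power down old rack", "rack servers"]
deps = {
    "unrack servers": {"power down old rack"},
    "rack servers": {"unrack servers"},
}
print(misordered(checklist, deps))
# A valid order; raises graphlib.CycleError if the dependencies loop.
print(list(TopologicalSorter(deps).static_order()))
```

A cycle error is worth having: it means two steps each claim to need the other, which is a knot the plan has to untangle before anyone shows up.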


Page 14: Infrastructure Migration

Build a checklist

Build in checkpoints

Checkpoints are a great place to stop all the teams at the same time and make sure that everyone’s on the same page.


Page 15: Infrastructure Migration

Build a checklist

Include communication upstream

Overcommunicate.

Keep your boss informed.

Keep your stakeholders informed.

If you have the kind of work environment where your users care, keep them informed.


Page 16: Infrastructure Migration

Build a checklist

• Per team?

• Per location?

• Per person?

Multiple Checklists

If you’ve got multiple teams, you are likely to need multiple checklists.

Ditto if your locations are farther apart.

If each person’s tasks are complicated, give each person an individual checklist, too.
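Generating per-team or per-person checklists from one master list keeps the copies from drifting apart. A minimal sketch; the step numbers, owners, and tasks are hypothetical. Keeping the master step number on each sheet lets people cross-reference the master list at checkpoints.

```python
from collections import defaultdict


def split_checklist(master):
    """master: iterable of (step, owner, task) tuples.
    Returns {owner: [(step, task), ...]} in step order."""
    per_owner = defaultdict(list)
    for step, owner, task in sorted(master):
        per_owner[owner].append((step, task))
    return dict(per_owner)


master = [
    (2, "network team", "Run new cabling to rack B4"),
    (1, "rack team", "Power down old rack"),
    (2, "rack team", "Unrack servers"),
    (3, "point guard", "Call checkpoint 1"),
]
for owner, tasks in split_checklist(master).items():
    print(owner)
    for step, task in tasks:
        print(f"  {step}. {task}")
```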


Page 17: Infrastructure Migration

Build a checklist

Schedule Breaks

Breaks are SO important.

You can’t work for 8 hours without stopping to rest, physically or mentally. Put these into the schedule.


Page 18: Infrastructure Migration

Change Management Techniques

Establish tests for complicated steps (or groups of steps)

Would you build a new server then put it into production without testing it?

Of course not.

Build tests to see if your work so far is correct. It can be as simple as “at this point, LEDs 7, 8, and 9 should be green, and LED 10 should be amber”.
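Tests like the LED example can be written down as expected state and checked the same way every time. A minimal sketch: `check_step` is a hypothetical helper that compares expected values against whatever `observe` reports, and `tcp_port_open` is one example probe built on the standard `socket` module. The hostnames and ports are placeholders.

```python
import socket


def check_step(expected, observe):
    """Compare expected state to observed state.

    expected: mapping of thing -> expected state
    observe:  callable returning the actual state of one thing
    Returns a list of (thing, expected, actual) mismatches; empty means pass.
    """
    mismatches = []
    for thing, want in expected.items():
        got = observe(thing)
        if got != want:
            mismatches.append((thing, want, got))
    return mismatches


def tcp_port_open(host_port, timeout=3.0):
    """Example probe: report whether a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection(host_port, timeout=timeout):
            return "open"
    except OSError:
        return "closed"


# Manual check: someone reads the LEDs and records what they see.
expected_leds = {"LED 7": "green", "LED 8": "green", "LED 9": "green", "LED 10": "amber"}
seen = {"LED 7": "green", "LED 8": "green", "LED 9": "off", "LED 10": "amber"}
print(check_step(expected_leds, seen.get))

# Automated check (placeholder host; run from the new network):
# check_step({("db1.example.com", 5432): "open"}, tcp_port_open)
```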


Page 19: Infrastructure Migration

Change Management Techniques

Establish roll-back procedures

Things happen. Stuff doesn’t always go right.

Make sure your plan includes when to roll back and what steps to take to do it.
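A roll-back plan is easiest to follow when every step is paired with its undo, and completed steps are undone in the opposite order, since later steps usually depend on earlier ones. A minimal sketch with hypothetical step names and no-op actions; in a real migration most "actions" are people following the checklist, and this assumes a failed step leaves nothing behind that needs undoing.

```python
def run_with_rollback(steps):
    """Run (name, do, undo) steps in order.

    If a step raises, undo every completed step in reverse order and stop.
    Returns (succeeded, log).
    """
    completed = []
    log = []
    for name, do, undo in steps:
        try:
            do()
        except Exception as exc:
            log.append(f"FAILED {name}: {exc}")
            for done_name, done_undo in reversed(completed):
                done_undo()
                log.append(f"rolled back {done_name}")
            return False, log
        completed.append((name, undo))
        log.append(f"done {name}")
    return True, log


def router_fails_to_boot():
    raise RuntimeError("router did not boot")


def noop():
    pass


steps = [
    ("point DNS at new site", noop, noop),
    ("swap uplink", noop, noop),
    ("boot new router", router_fails_to_boot, noop),
]
ok, log = run_with_rollback(steps)
print(ok)
print("\n".join(log))
```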


Page 20: Infrastructure Migration

Change Management Techniques

Establish failure guidelines

(What happens if...)

• ...a machine breaks?

• ...a router doesn’t boot?

• ...?

Failures are inevitable. Unhandled failures, though, are unnecessary.

Know how to tell if something has failed, and know what to do about it.


Page 21: Infrastructure Migration

Identify Goods & Services to be Purchased

• Cables of specific lengths, connectors, label tape, Velcro, rack shelves, etc.

• Servers, routers, firmware, licenses, etc.

• Circuits, bandwidth, accounts, etc.

These kinds of steps require a lot of planning, but more planning just makes the end result better.


Page 22: Infrastructure Migration

Maintain Communications

• Cellphones (at least one per team)

• 2-way radios (for when there’s no cellular service)

• Probably not IP phones

Cell reception in datacenters is spotty. Using handheld 2-way radios is much more reliable.

Don’t rely on your IP phone infrastructure for critical communications during network outages.

Just don’t.


Page 23: Infrastructure Migration

Find Warm Bodies

Figure out how many people you need.

Add 20% for good measure

Have 10% standing by


Page 24: Infrastructure Migration

Establish Roles

• Zone

• Man to Man

• Point Guard

Zone: “Your job is to stay at this rack, pulling things out in the order prescribed by the checklist, and to load them on the cart once removed”

Man to Man: “Your job is to cart these servers to the truck, and once the number of servers in the truck matches the number prescribed by the checklist, to drive the truck to the new datacenter, and assist in loading the servers onto the cart for the next zone man”

Point Guard: “Your job is to act as the communications hub, the person to verify that check points happen on schedule, and that things are correct, as well as to finalize sign-off and hand-off once we’re done”

...and so on, as required by your migration.


Page 25: Infrastructure Migration

Communicate the plan

Default to being too communicative

Have your point guard annoy people with the number of email updates.


Page 26: Infrastructure Migration

Communicate the plan

Get clearance from the stakeholders

Before ever starting work, make sure that everyone is on board with the migration plan, and that everyone has agreed and signed off.


Page 27: Infrastructure Migration

Communicate the plan

Alert users multiple times

• Well in advance (so long-term projects aren’t scheduled)

• A week before (so short-term pushes aren’t interrupted)

• Immediately before (so last-minute issues don’t compound)
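Turning the three notices into calendar dates makes them easy to put on the checklist. A minimal sketch; the 30-day lead for "well in advance", sending the last notice the day before, and the migration date are all assumptions, so pick intervals that suit your users.

```python
from datetime import date, timedelta


def notice_dates(migration_day, advance_days=30):
    """Dates to send user notices ahead of a migration (intervals are assumptions)."""
    return {
        "well in advance": migration_day - timedelta(days=advance_days),
        "a week before": migration_day - timedelta(days=7),
        "immediately before": migration_day - timedelta(days=1),
    }


for label, when in notice_dates(date(2030, 6, 30)).items():
    print(f"{label}: {when.isoformat()}")
```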


Page 28: Infrastructure Migration

Communicate the plan

Give everyone the information they need

• Checklists

• Plan document

• Contact Information

...and make sure everyone has signed off on it

I actually got to the point where every person involved in the migration got a personalized envelope.

The contents were the checklist relevant to their job, diagrams of what the racks looked like before and what the new racks were supposed to look like, and the contact information for all of the other team members.


Page 29: Infrastructure Migration

Executing the plan

I love it when a plan comes together...


Page 30: Infrastructure Migration

Executing the plan

Verify all goods were purchased

Doing inventory sucks, but running short of Ethernet cables that actually reach the switch sucks more...
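Inventory is mostly a matter of comparing what the plan needs against what actually showed up. Python's `collections.Counter` handles the subtraction and drops items whose count comes out zero or negative, so only shortfalls remain. A minimal sketch; the item names and quantities are hypothetical.

```python
from collections import Counter


def shortfall(needed, on_hand):
    """Items (and quantities) still missing; surplus items are dropped."""
    return Counter(needed) - Counter(on_hand)


needed = {"cat6 3ft": 48, "cat6 10ft": 12, "1U shelf": 2, "label tape": 3}
on_hand = {"cat6 3ft": 50, "cat6 10ft": 8, "label tape": 3}
for item, qty in shortfall(needed, on_hand).items():
    print(f"still need {qty} x {item}")
```

Listing cables by length, as in the purchasing list, is what makes this useful: two extra short cables don't make up for four missing long ones.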


Page 31: Infrastructure Migration

Executing the plan

Clear personal schedules

“oh, that was this weekend? Crap, man, I’m sorry. I have to go drink beer with my other friends and have a good weekend. Maybe next time, brah”


Page 32: Infrastructure Migration

Executing the plan

Complete off-site checklist items

Verify that everyone at both sites knows what’s happening, when, and is on board. Make sure the datacenter has people on hand to help who are capable of helping.


Page 33: Infrastructure Migration

Executing the plan

Show up early

...because something won’t be right.


Page 34: Infrastructure Migration

Executing the plan

Verify assigned roles

Ask for questions

...and ask each person.

Make sure that they know how to get ahold of you and the point guard.


Page 35: Infrastructure Migration

Executing the plan

Step through the list


Page 36: Infrastructure Migration

Executing the plan

Verify completeness with each team


Page 37: Infrastructure Migration

Executing the plan

Perform on-site and off-site post-completion items


Page 38: Infrastructure Migration

Executing the plan

Go have a beer.

Seriously, celebrate completing the task with the team. I didn’t always get to do this, and I’m still sorry about it today.


Page 39: Infrastructure Migration

Executing the plan

Complete post-mortem according to schedule

During the next work-week, complete the post-mortem and identify what went wrong as well as what went right.

You can’t replicate success and eliminate failure unless you identify them.


Page 40: Infrastructure Migration

Dealing with problems

Yes, you will have problems...


Page 41: Infrastructure Migration

Dealing with problems

Problems are inevitable

(It’s not “if”, it’s “when”)

During my talk, I gave far more discussion on this topic than I’m going to give here.

Two big take-aways:

1) Problems are inevitable because they are a condition of the infrastructure, and they arise from its inherent complexity.

2) It’s not possible to eliminate all failures, but it’s desirable to minimize them, and to try to eliminate repeating the same failure by improving the process and design.

Read “The Field Guide to Understanding Human Error” by Sidney Dekker

http://amzn.to/QFpcqY


Page 42: Infrastructure Migration

Dealing with problems

• Identify & Acknowledge the problem

• Don’t punish the reporter

• Follow the failure guidelines

• Roll back if necessary & reschedule


Page 43: Infrastructure Migration

Post-mortem

• What went wrong?

• Why?

• The ‘Five Whys’

• What went right?

• What have we learned?


Page 44: Infrastructure Migration

Infrastructure Migrations

Thanks for your time.

I hope you were able to get something out of it.

If you have questions, feel free to contact me

@[email protected]
