off-hours critical issue escalation

Critical Issue Escalation: Our Process Evan Hamilton Head of Community, UserVoice

Upload: evan-hamilton

Post on 20-Aug-2015


TRANSCRIPT

Page 1: Off-Hours Critical Issue Escalation

Critical Issue Escalation: Our Process

Evan Hamilton, Head of Community, UserVoice

Page 2: Off-Hours Critical Issue Escalation

OMG EVERYTHING IS BROKEN

Page 7: Off-Hours Critical Issue Escalation

Why do we need a process here?

• Our customers need a working product (or they’ll leave)

• When things go wrong without a plan, chaos ensues

• We don’t work 24/7

• We don’t want to wake everyone up every time there is an issue

Page 10: Off-Hours Critical Issue Escalation

So what is a critical issue?

Work hours:

• Interrupting core functionality (Ex: settings not saving consistently)

• OR losing/corrupting data

• OR serious consequences (Ex: loss of a major account)

• AND can be reproduced (or has been reported by enough people that it must be happening)

Off hours:

• Blocking core functionality (Ex: can’t access feature)

• AND affecting multiple people

• OR losing/corrupting data

• AND can be reproduced (or has been reported by enough people that it must be happening)
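
The AND/OR bullets above can be read in more than one way. The sketch below shows one plausible reading, in which an issue needs a qualifying impact and must also be reproducible (or widely reported). The `Issue` class and all of its field names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    # Hypothetical fields that mirror the bullets on the slide.
    hits_core_functionality: bool = False   # "interrupting" (work hours) or "blocking" (off hours)
    affects_multiple_people: bool = False
    loses_or_corrupts_data: bool = False
    serious_consequences: bool = False      # e.g. loss of a major account
    reproducible_or_widely_reported: bool = False


def is_critical(issue: Issue, off_hours: bool) -> bool:
    """One reading of the slide: (impact condition) AND (reproducible)."""
    if off_hours:
        # Off hours the bar is higher: core functionality must be blocked
        # for multiple people, unless data is being lost or corrupted.
        impact = (
            issue.hits_core_functionality and issue.affects_multiple_people
        ) or issue.loses_or_corrupts_data
    else:
        impact = (
            issue.hits_core_functionality
            or issue.loses_or_corrupts_data
            or issue.serious_consequences
        )
    return impact and issue.reproducible_or_widely_reported
```

Under this reading, a reproducible settings bug seen by a single customer is critical during work hours but does not justify waking anyone up off hours.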

Page 11: Off-Hours Critical Issue Escalation

Step 0: Spot the Issue

Page 14: Off-Hours Critical Issue Escalation

Ticket Queue

Support team monitors all day, and at least twice each evening. If any of the team will be unavailable for an extended period of time, they’ll deputize someone from Sales or Community.

Social Media

Community team monitors all day, and at least twice each evening. If any of the team is unavailable for an extended period of time, they’ll potentially deputize someone from Support.

The Rest of the Team

They may not be on the Customer Team, but if they see a critical issue, it’s their responsibility to report it.

Page 15: Off-Hours Critical Issue Escalation

Step 1: Create a Trello bug.

Issue history FTW. Ad-hoc communication FTL.

Page 16: Off-Hours Critical Issue Escalation

Step 2: Contact a Developer

Page 20: Off-Hours Critical Issue Escalation

• Choose the relevant product area (Systems, Front-End, or Code) & contact the dev at the top of that list.

• Office hours? Ping them in HipChat.

• Off hours? Call them, don’t text or chat or email.

• If they don’t respond to 2 pings within 10 minutes, move down the list.
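
The "two pings, then move down the list" rule can be sketched as a small loop. This is an illustration only: `escalate` and the `ping` callback are hypothetical names, and the callback stands in for whatever actually contacts a developer (HipChat during office hours, a phone call off hours) and reports whether they responded. Spacing the two pings inside the 10-minute window is left to that callback.

```python
from typing import Callable, Optional

MAX_PINGS = 2  # from the slide: 2 pings within 10 minutes, then move on


def escalate(devs: list[str], ping: Callable[[str], bool]) -> Optional[str]:
    """Walk the list top to bottom; return the first dev who responds.

    Returns None if nobody on the list responds.
    """
    for dev in devs:
        for _ in range(MAX_PINGS):
            if ping(dev):
                return dev
    return None
```

For example, `escalate(["Jonathan", "Mark"], ping)` tries Jonathan twice before trying Mark.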

Page 21: Off-Hours Critical Issue Escalation

Dev Escalation List

• System dev: Kevin

• App devs: Jonathan, Mark, Austin, Joey, Raimo, Rich

• Interface devs: Joshua, John, Brad, Rich

For System Issues (site is down/slow, emails don’t work): contact system + app dev

For Interface Issues (the interface looks broken, won’t work, etc): contact interface + app dev

For all other issues: contact app dev
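
The routing rules above map naturally onto a lookup table. The sketch below is one way to encode them; the dictionary keys and the `lists_to_contact` function are hypothetical names, while the developer names and their order come from the slide.

```python
# Escalation lists as shown on the slide, in contact order.
ESCALATION_LISTS = {
    "system": ["Kevin"],
    "app": ["Jonathan", "Mark", "Austin", "Joey", "Raimo", "Rich"],
    "interface": ["Joshua", "John", "Brad", "Rich"],
}


def lists_to_contact(issue_kind: str) -> list[list[str]]:
    """Return the escalation lists to work through for a given kind of issue."""
    if issue_kind == "system":        # site is down/slow, emails don't work
        areas = ["system", "app"]
    elif issue_kind == "interface":   # the interface looks broken or won't work
        areas = ["interface", "app"]
    else:                             # all other issues
        areas = ["app"]
    return [ESCALATION_LISTS[area] for area in areas]
```

For a system issue this yields two lists, so the reporter contacts Kevin and, in parallel, the first responsive app dev.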

Page 24: Off-Hours Critical Issue Escalation

Devs: did you get the call? Then:

• Respond affirmatively to the person who contacted you

• Join the Engineering room on HipChat and let others know someone is working on it

Page 25: Off-Hours Critical Issue Escalation

NO additional customer team members should be communicating with the dev solving the problem – only the one who first reported it.

More voices confuse and distract.

Page 26: Off-Hours Critical Issue Escalation

Step 3: Inform the Customer Team

Page 28: Off-Hours Critical Issue Escalation

Email the whole customer team so they know about the issue (and that you’re working with the devs)

Is it work hours? Also @all everyone in the Support room on HipChat

Page 29: Off-Hours Critical Issue Escalation

Step 4: Is it super-critical?

Page 34: Off-Hours Critical Issue Escalation

Ask Developer (before they fix the bug):

• Roughly how many people might this be affecting?

• Roughly what issues might this be causing?

• (If they’re too busy fixing it, consider calling in a second dev)

• Customer team: it’s your job to ensure this happens

Page 35: Off-Hours Critical Issue Escalation

How do I know if it’s Super-Critical?

• Does this affect more than 20% of accounts?

• Is this a very frustrating or visible bug (vs just an annoyance)?

*This is somewhat arbitrary.
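
As a rough sketch, the two questions above can be turned into a single check. The slide does not say whether both conditions are required or either one is enough; the code below assumes either one suffices, and that is an assumption, not the stated policy. The function and parameter names are hypothetical.

```python
THRESHOLD_PERCENT = 20  # from the slide: "more than 20% of accounts"


def is_super_critical(
    affected_accounts: int,
    total_accounts: int,
    very_frustrating_or_visible: bool,
) -> bool:
    """Assumed reading: either condition alone makes the issue super-critical."""
    if total_accounts <= 0:
        raise ValueError("total_accounts must be positive")
    # Integer comparison avoids floating-point edge cases at exactly 20%.
    over_threshold = affected_accounts * 100 > THRESHOLD_PERCENT * total_accounts
    return over_threshold or very_frustrating_or_visible
```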

Page 36: Off-Hours Critical Issue Escalation

If Super-Critical:

• Call the Head of Support & Head of Community

• Community Department should tweet about the issue (make sure to reschedule any other tweets - “check out our blog” would be an unfortunate tweet during an outage)

• Leave a maximum of 30m between any public messages about critical bugs and 15m between public messages about downtime

• DO NOT suggest a timeframe (it may change)

• DO NOT talk about the cause (you may be wrong)

• Going to require a long fix? Publish a blog post & Facebook status too
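
The update cadence above (30 minutes for critical bugs, 15 minutes for downtime) is easy to track with a timestamp. A minimal sketch, with hypothetical function names and dictionary keys:

```python
from datetime import datetime, timedelta

# Maximum gap between public messages, per the slide.
MAX_GAP = {
    "critical_bug": timedelta(minutes=30),
    "downtime": timedelta(minutes=15),
}


def next_update_deadline(last_message_at: datetime, kind: str) -> datetime:
    """Latest time by which the next public message should go out."""
    return last_message_at + MAX_GAP[kind]


def update_overdue(last_message_at: datetime, kind: str, now: datetime) -> bool:
    """True if more time has passed than the slide allows for this kind of issue."""
    return now > next_update_deadline(last_message_at, kind)
```

The deadline says when to post, not what to say: the rules about not suggesting a timeframe or a cause still apply to every message.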

Page 37: Off-Hours Critical Issue Escalation

Step 5: Respond to Issues

Page 40: Off-Hours Critical Issue Escalation

Who answers what?

Work hours? Community handles social media, Support handles tickets.

Off hours? Support handles both (but call in backup if needed).

Regardless, make sure you’re in the Support room in HipChat so you can communicate with the team.

*This is somewhat arbitrary.

Page 41: Off-Hours Critical Issue Escalation

Step 6: Solve and Verify

Page 44: Off-Hours Critical Issue Escalation

• Dev should fix the issue (duh).

• Dev should verify that the fix will stick (may require calling in a second dev)

• Customer Team member should also verify that issues are resolved

Page 45: Off-Hours Critical Issue Escalation

Step 7: Report Damage and Close the Loop (the 7 questions)

Page 46: Off-Hours Critical Issue Escalation

Dev should answer these questions for the Customer Team member:

1. What did our customers experience? (Please be explicit: don’t just say what was broken, explain the experience our customers would have had when trying to accomplish this task.)

2. How many/which customers were affected?

3. When did this issue start? When was it resolved?

4. What caused it?

5. What are we doing to avoid it in the future?

6. What are the chances that there will be related issues in the short-term future?

7. What was the damage (data, accounts, etc)?
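
One way to make sure none of the seven questions gets skipped is to treat them as a checklist. The sketch below is illustrative only; the short keys and the `unanswered` helper are hypothetical, and each key stands for the corresponding question above.

```python
# Hypothetical short keys, one per question, in the order listed above.
SEVEN_QUESTIONS = [
    "customer_experience",     # What did our customers experience?
    "customers_affected",      # How many/which customers were affected?
    "timeline",                # When did it start? When was it resolved?
    "cause",                   # What caused it?
    "prevention",              # What are we doing to avoid it in the future?
    "risk_of_related_issues",  # Chances of related issues in the short term?
    "damage",                  # What was the damage (data, accounts, etc)?
]


def unanswered(answers: dict[str, str]) -> list[str]:
    """Return the questions that are missing or blank in the dev's report."""
    return [q for q in SEVEN_QUESTIONS if not answers.get(q, "").strip()]
```

A Customer Team member could hold the report until `unanswered(report)` comes back empty.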

Page 51: Off-Hours Critical Issue Escalation

The Loop-Closing:

1. Entire Engineering Team and Customer Team should be sent these answers so they’re clued in

2. Customer Team member should follow up with all customers who reported the issue*

3. If mass communication occurred, publish announcement of the fix to those channels

*And give the one who reported it a discount for their next month of billing!

Page 55: Off-Hours Critical Issue Escalation

Post-issue Communication: should we blog about it?

• The litmus test: would I be angry if I experienced this and then heard nothing? Then blog.

• (If it only affected a small # of accounts, email them)

• If extremely severe, consider reimbursements as well.

Page 56: Off-Hours Critical Issue Escalation

Hooray, we’ve saved the day!

Page 57: Off-Hours Critical Issue Escalation

Evan Hamilton @[email protected] content at http://community.uservoice.com