![Page 1: Plan the Work Work the Plan - USENIX · Postmortems 101 Postmortems are great! And necessary! BUT… what about all those follow-up action items that still haven't been resolved months](https://reader034.vdocuments.net/reader034/viewer/2022042313/5edf3e87ad6a402d666a97c8/html5/thumbnails/1.jpg)
Plan the Work, Work the Plan
Postmortem Action Items: Follow-up and Burndown
Postmortems 101
Postmortems are great! And necessary!
BUT… what about all those follow-up action items that still haven't been resolved months after the fact?
http://www.nasa.gov/mission_pages/swift/bursts/shredded-star.html#.UnymcnWvyCw
Antipattern 1: Unbalanced action item (AI) plan
vs.
https://commons.wikimedia.org/wiki/File:Scaffolding_on_Princes_Gate.jpg
https://pixabay.com/en/band-aid-first-aids-injury-24298/
Solution: Balance your action item plan
https://commons.wikimedia.org/wiki/File:A_dog_plays_on_a_seesaw_with_children_in_Scotland,.jpg
Antipattern 2: Only fixing symptoms
https://pixabay.com/en/photos/thermometer/
https://pixabay.com/en/photos/winter/?image_type=vector&cat=nature
Solution: Address the problem at the root level
Antipattern 3: Humans as root cause
http://publicdomainvectors.org/tr/bedava-vektor/K%C4%B1rm%C4%B1z%C4%B1-i%C5%9Faret-eden-bir-ele/36212.html
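Reliability = f(...), where the arguments appear as icons on the original slide (a person, a database, a geometric shape, and a server, per the image credits)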
Solution: Remove the ability for humans to introduce errors
https://cdn.pixabay.com/photo/2013/07/12/17/12/happy-151793_960_720.png
https://upload.wikimedia.org/wikipedia/commons/thumb/c/c4/Linecons_database.svg/600px-Linecons_database.svg.png
https://cdn.pixabay.com/photo/2013/07/12/17/46/geometry-152406_960_720.png
https://cdn.pixabay.com/photo/2013/07/12/12/34/server-145957_960_720.png
Antipattern 4: Not thinking beyond prevention
https://pixabay.com/en/domino-hand-stop-corruption-665547/
Solution: Consider the entire timeline of the incident
[Figure: incident timeline, shown twice. Labels: Root Cause, Hits Production, Detect, Mitigate, Resolve; phases: Detection; Diagnose, Triage, Mitigate; Incident Duration. The second copy adds the annotations "Improve Detection" and "Improve Diagnosis & Triage".]
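One way to make the timeline concrete is to compute the length of each phase from incident timestamps. The sketch below is illustrative only: the `Incident` fields, the example times, and the choice to measure incident duration from production impact to resolution are all assumptions, not taken from the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # All field names are hypothetical; map them to whatever your tracker records.
    hit_production: datetime  # when the root cause reached production
    detected: datetime        # when an alert or a human noticed
    mitigated: datetime       # when user impact stopped
    resolved: datetime        # when the underlying problem was fixed


def phase_durations(incident: Incident) -> dict[str, timedelta]:
    """Split one incident into the phases named on the timeline."""
    return {
        "detection": incident.detected - incident.hit_production,
        "diagnose_triage_mitigate": incident.mitigated - incident.detected,
        "mitigate_to_resolve": incident.resolved - incident.mitigated,
        # Assumption: duration runs from production impact to resolution.
        "incident_duration": incident.resolved - incident.hit_production,
    }


# Made-up example incident.
example = Incident(
    hit_production=datetime(2017, 1, 1, 10, 0),
    detected=datetime(2017, 1, 1, 10, 45),
    mitigated=datetime(2017, 1, 1, 11, 30),
    resolved=datetime(2017, 1, 1, 14, 0),
)
durations = phase_durations(example)
for phase, length in durations.items():
    print(f"{phase}: {length}")
```

A long detection phase suggests action items for monitoring and alerting; a long diagnose, triage, mitigate phase suggests action items for tooling, playbooks, or rollback.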
Transforming dysfunction to function
Best Practice 1: Prioritize and classify the work
[Figure: Sprints 1 through 4, with a high-priority postmortem action item marked on the schedule]
Best Practice 2: Executive focus
https://en.wikipedia.org/wiki/Grace_Hopper
Executive focus: Ben Treynor Sloss
"To our users, a postmortem without subsequent action is indistinguishable from no postmortem. Therefore, all postmortems which follow a user-affecting outage must have at least one P[01] bug associated with them. I personally review exceptions. There are very few exceptions."
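The rule in the quote can be checked mechanically. A minimal sketch, assuming each postmortem record carries a user-impact flag and the priorities of its linked bugs; the record shape and IDs are hypothetical. `P[01]` is read here as a regular expression that matches P0 or P1.

```python
import re

# "P[01]" read as a regular expression: matches exactly "P0" or "P1".
HIGH_PRIORITY = re.compile(r"P[01]")


def missing_high_priority_bug(postmortems):
    """Return IDs of user-affecting postmortems with no P0 or P1 bug attached."""
    flagged = []
    for pm in postmortems:
        if not pm["user_affecting"]:
            continue  # the rule only covers user-affecting outages
        if not any(HIGH_PRIORITY.fullmatch(p) for p in pm["bug_priorities"]):
            flagged.append(pm["id"])
    return flagged


# Hypothetical records, for illustration only.
postmortems = [
    {"id": "pm-1", "user_affecting": True, "bug_priorities": ["P1", "P3"]},
    {"id": "pm-2", "user_affecting": True, "bug_priorities": ["P2"]},
    {"id": "pm-3", "user_affecting": False, "bug_priorities": []},
]
print(missing_high_priority_bug(postmortems))
```

Anything this returns would go on the exceptions list for review.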
Best Practice 3: Postmortem reviews and reports
Example AI Review Checklist
❐ Realistic?
❐ Repeat incident prevention?
❐ Resolution time improvements?
❐ Automation Opportunities?
❐ Added to the project plan?
https://pixabay.com/en/check-mark-tick-mark-check-correct-1292787/
Reports: AIs open by priority
Total, Critical, High, Medium, Low, Trivial
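A report like this can be produced from a plain export of action items. The sketch below is one possible shape, not the tooling used in the talk: the `priority` and `status` fields and the sample records are assumptions.

```python
from collections import Counter

# Priority buckets as named in the report legend.
PRIORITIES = ["Critical", "High", "Medium", "Low", "Trivial"]


def open_by_priority(action_items):
    """Count open action items per priority, plus an overall total."""
    counts = Counter(
        ai["priority"] for ai in action_items if ai["status"] == "open"
    )
    report = {"Total": sum(counts.values())}
    for priority in PRIORITIES:
        report[priority] = counts.get(priority, 0)
    return report


# Hypothetical export from a bug tracker.
action_items = [
    {"id": 1, "priority": "Critical", "status": "open"},
    {"id": 2, "priority": "High", "status": "open"},
    {"id": 3, "priority": "High", "status": "closed"},
    {"id": 4, "priority": "Low", "status": "open"},
]
report = open_by_priority(action_items)
print(report)
```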
Reports: AI age
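An age report can be as simple as listing open action items from oldest to newest. This is a sketch under assumed data: the `opened` and `closed` fields, the IDs, and the dates are hypothetical.

```python
from datetime import date


def open_item_ages(action_items, today):
    """Return (id, age in days) for every open item, oldest first."""
    ages = [
        (ai["id"], (today - ai["opened"]).days)
        for ai in action_items
        if ai["closed"] is None  # still open
    ]
    return sorted(ages, key=lambda pair: pair[1], reverse=True)


# Hypothetical action items.
action_items = [
    {"id": "ai-1", "opened": date(2017, 1, 10), "closed": None},
    {"id": "ai-2", "opened": date(2017, 3, 1), "closed": date(2017, 3, 15)},
    {"id": "ai-3", "opened": date(2017, 3, 20), "closed": None},
]
ages = open_item_ages(action_items, today=date(2017, 4, 1))
for item_id, days in ages:
    print(f"{item_id}: open for {days} days")
```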
Reports: AI debt buildup
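Debt buildup can be read as the running count of action items opened minus action items closed. The following sketch assumes the same hypothetical `opened` and `closed` fields and groups by calendar month; months with no activity are simply omitted from the series.

```python
from collections import Counter
from datetime import date


def debt_by_month(action_items):
    """Return [(YYYY-MM, open item count at month end)] as a running total."""
    delta = Counter()
    for ai in action_items:
        delta[ai["opened"].strftime("%Y-%m")] += 1
        if ai["closed"] is not None:
            delta[ai["closed"].strftime("%Y-%m")] -= 1
    running, series = 0, []
    for month in sorted(delta):  # YYYY-MM strings sort chronologically
        running += delta[month]
        series.append((month, running))
    return series


# Hypothetical action items.
action_items = [
    {"id": "ai-1", "opened": date(2017, 1, 10), "closed": None},
    {"id": "ai-2", "opened": date(2017, 1, 20), "closed": date(2017, 2, 5)},
    {"id": "ai-3", "opened": date(2017, 2, 12), "closed": None},
    {"id": "ai-4", "opened": date(2017, 3, 3), "closed": None},
]
series = debt_by_month(action_items)
for month, open_count in series:
    print(month, open_count)
```

A series that only climbs means action items are being filed faster than they are closed.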
In sum: Every postmortem should have
● A balanced action item plan
● Concrete and actionable follow-up
Caveat: Specificity of Google
Modify these recommendations for:
● A much smaller organization
● Downtime-intolerant services
● Downtime-tolerant services
Further Resources
● USENIX ;login: article
● Postmortem Culture: Learning from Failure (SRE Book)
● Handout: PM AI checklist
Thank You!