[HN Gopher] On the Term "Blameless Postmortem"
___________________________________________________________________
On the Term "Blameless Postmortem"
Author : thcipriani
Score : 20 points
Date : 2022-01-27 19:51 UTC (3 hours ago)
(HTM) web link (tylercipriani.com)
| SpicyLemonZest wrote:
| I like the idea here, but in my experience the biggest practical
| issue with postmortems is getting people to actually do them. A
| heavy term serves as a reminder that it's an important
| investigation, it has to get done, we can't just put it off until
| it fits conveniently into the schedule. I worry whether a
| lighter-sounding term would make it easier for people to work on
| their projects first and delay post-incident investigations
| indefinitely.
| schmatz wrote:
| I like the term "Incident report".
| marcosdumay wrote:
| Oh, God. If you believe "disquisition" carries fewer negative
| connotations than "blameless postmortem", you completely failed
| at reading the audience.
|
| Aviation uses the word "investigation", by the way. But they can
| only omit the "blameless" part because there are very strong
| guarantees that it will be blameless.
| isleyaardvark wrote:
| I prefer "retrospective", which doesn't sound like a police
| investigation or bring to mind airplane crashes.
|
| It's also easier to do those on a weekly basis, so there's less
| of a Pavlovian association of "bad thing happens" then "synonym
| for postmortem happens".
| rfreiberger wrote:
| The name needs to change, but so does the attitude that, as
| engineers, we can build complex systems and assume everyone knows
| how to use them. A few worldwide outages I've been a part of were
| caused by a task runner that didn't lint the command and allowed
| a broken bash one-liner to be executed across every system in
| parallel.
|
| Yes, it's a simple mistake, but how was a system given access to
| our global environment without this edge case ever being
| considered? In many of the meetings, the common issue is
| communication, even between co-workers on the same team, and
| between internal platform providers. In one case, an outage on
| the storage backend, we realized after a long meeting that the
| internal SLA was much greater than we expected (long enough that
| our systems would time out). It had only worked for as long as
| storage utilization stayed extremely low.
| zippergz wrote:
| You can't debate the value of the term without considering the
| conditions that led to it. Why would someone call a postmortem
| "blameless"? Because (in some companies) there absolutely was a
| culture of blame, which made people less forthcoming and
| thoughtful about the causes of incidents, which limited the
| potential learnings. This term was not pulled out of thin air, or
| built upon some imaginary possible blame. It was designed to
| explicitly remove blame that was already present in the culture.
| fragmede wrote:
| In particular, firing the dev that wrote the buggy code or the
| SRE that pushed the bad change is an obvious management
| reaction that "blameless postmortem" seeks to redress. I'm
| happy for OP that they've never worked somewhere that toxic but
| those places absolutely exist.
___________________________________________________________________
(page generated 2022-01-27 23:02 UTC)