[HN Gopher] Premortems will keep your code alive
       ___________________________________________________________________
        
       Premortems will keep your code alive
        
       Author : eschluntz
       Score  : 47 points
       Date   : 2021-10-13 16:58 UTC (6 hours ago)
        
 (HTM) web link (www.cobaltrobotics.com)
 (TXT) w3m dump (www.cobaltrobotics.com)
        
       | onion2k wrote:
       | _Before deploying new code, ask yourself and at least one other
       | person "If this is going to cause a major issue, what would it
       | be?"_
       | 
       | I love a premortem, but this isn't quite how I do them. I prefer
       | to start long before a deploy, before a feature hits sprint
       | planning. By asking my team "What could go wrong if we start
       | this?" we often uncover some of the unknowns that can derail our
       | engineering effort. It's particularly important to get the
       | quality assurance team involved that early too, by getting them
       | to think about how they'll test the feature - QA should be about
       | implementing processes to assure quality after all.
        
       | eschluntz wrote:
       | Author here! Sorry about the CSS crashing :p
       | 
       | Don't worry, our robots are hosted separately from our marketing
       | site.
        
       | cactus2093 wrote:
       | I like this idea and this framing of it. A way I've put it in the
       | past is that a postmortem is only really useful if you thought
       | that you were already doing everything you could to prevent an
       | issue of this type. In 95% of incidents I've experienced in my
       | career, this is just not the case. There was a known lack of
       | testing, the deadlines were too tight and we decided to push
       | through and take shortcuts to launch anyway, there was known tech
       | debt where everyone on the team was scared of this code but it
       | was too much work to address it so it kept getting put off, etc.
       | 
       | Often every person on the team could rattle off 5 major problems
       | off the top of their head. That's it, that's your postmortem
       | right there, let's go and prioritize solving these underlying
       | problems. But engineering orgs get obsessed with establishing a
       | sacred heavy-handed process around postmortems. There must be a
       | reviewer, and a chairperson, and a standardized form that takes
       | an hour or two to fill out, and several rounds of revisions. This
       | gives you a nice shiny metric to point to where you can say that
       | 100% of incidents were followed up with a postmortem. If any big
       | picture questions come up during the postmortem, well that's not
       | the kind of thing you can just make a jira ticket for, i.e.
       | "don't make the deadline so short next time". So we'll move right
       | past that one and then make one or two actionable jira tickets
       | like adding another view to the metrics dashboard which overfits
       | on this particular issue while doing nothing to address the
       | underlying problems, and then get back to work taking shortcuts
       | and building up more tech debt on a whole new set of features.
       | 
       | A premortem may end up with the same sort of perverse incentives,
       | but it's an interesting thing to try. If it's something the org
       | commits to actually putting aside time for ahead of every
       | deadline, I could even see it potentially helping a little bit to
       | address some of the underlying problems.
        
         | eschluntz wrote:
         | Yup, one of the things we care about is making the process as
         | painless as possible, so no one groans if someone says "let's
         | do a premortem".
         | 
         | For postmortems, there are always existing risks that people knew
         | about, but I think they're still helpful to identify _which_ of
         | those risks are actually causing the most pain and up their
         | priority.
        
         | acover wrote:
         | It depends on how accurately you can predict failures.
         | 
         | Take SpaceX's early days: the actual cause of failure #2 wasn't
         | on anyone's top 10 list. Doing a postmortem has value: it lets
         | you see if the shortcuts you take make sense.
        
       | codetrotter wrote:
       | Page seems to not load CSS for me. Anyone else seeing that
       | happening?
       | 
       | Edit: Here's a snapshot of the page with seemingly no CSS loaded,
       | the same as what I was seeing: https://archive.ph/qe2IW
        
         | eschluntz wrote:
         | Turned out to be our nginx cache fighting our wordpress cache.
         | 
         | Clearly should have done a premortem :p
        
         | aesyondu wrote:
         | Yep, getting 404 on the css files.
        
         | elpatokamo wrote:
         | Same for me. Tried with and without uBlock turned on. Getting
         | 404s for the css files
        
       ___________________________________________________________________
       (page generated 2021-10-13 23:01 UTC)