[HN Gopher] It's not your fault
       ___________________________________________________________________
        
       It's not your fault
        
       Author : thcipriani
       Score  : 55 points
       Date   : 2022-04-03 09:45 UTC (13 hours ago)
        
 (HTM) web link (www.kostaharlan.net)
 (TXT) w3m dump (www.kostaharlan.net)
        
       | alexashka wrote:
       | > breaking prod is not your fault. Put differently: breaking prod
       | is a systems failure, not an individual one.
       | 
       | False dichotomy.
        
       | blakesterz wrote:
       | I feel like, for me at least, being able to say "That was my
       | fault" makes me remember what I did. It makes me document that
       | failure. And makes me tell others to avoid what I did.
       | 
       | I've been training a junior lately and I will always say, "Here's all
       | the things I did wrong when I did this, so don't do these things."
       | 
       | If I did it, he could easily do it, and if we can all avoid my
       | mistakes, so much the better.
       | 
       | It WAS my fault quite a few times, and I'm ok with that. And
       | luckily, everyone else around here is too. I'd hate to work
       | somewhere that punishes honest mistakes. (there are limits, of
       | course)
        
       | hprotagonist wrote:
       | Perhaps the best example I can think of was the GitLab incident.
       | 
       | and if we're honest it's because the engineer's name is Yorick.
        
       | _aleph2c_ wrote:
       | This may be true, but when things break people will look for a
       | scapegoat. So when things break, and you are mostly-responsible
       | for initiating the failure, use collective language ("we" and not
       | "I"), frame the failure as a systems failure when you are talking
       | to management or the executives, look cool even if you are
       | feeling stressed out. Manage the narrative! Sure, you flipped the
       | switch or whatever, but try and survive the event. Just because
       | you think it's a systems failure doesn't mean other people share
       | this belief; don't volunteer to be thrown under the bus.
        
         | ChrisMarshallNY wrote:
         | In my experience, working at a "classic" Japanese engineering
         | firm, scapegoating was discouraged.
         | 
         | During postmortems, we would often decide something like "Chris
         | made an erroneous assumption that the fix introduced no bugs."
         | (That's a classic "oldtimer" mistake, BTW. I make it all the
         | time; I'm a slow learner.)
         | 
         | Absolutely no blame would be affixed. It was really important
         | for Chris (that's me) to assume Responsibility for the error,
         | and the team would develop a solution.
         | 
         | This being a Japanese company, of course, said "solution"
         | usually ended up being another punchlist item, like "Perform
         | complete regression tests for even the smallest bug fix
         | release," etc.
         | 
         | I'm not thrilled with people using "hero programmer syndrome,"
         | or "bus factor" as an excuse to write naive or deliberately
         | dumbed-down code, though.
         | 
         | Sometimes, a program needs to be maintained by skilled,
         | experienced, well-paid, and motivated people. If a company
         | insists on developing code using advanced techniques and then
         | turning maintenance over to junior staff, or on doing a bad job
         | of writing a program because it wants the program maintained by
         | the absolute cheapest programmers possible, that's a problem.
        
       | _3u10 wrote:
       | It being your fault is a good way to go through life.
       | 
       | It focuses your mind on what you could do to avoid those
       | situations in the first place.
       | 
       | If it's your fault prod broke you can fix the process or you can
       | look for a new job where the process is already fixed. Or you can
       | find a role that doesn't involve pushing code to prod, maybe in
       | R&D, etc.
        
         | ivraatiems wrote:
         | It being your fault in the sense of "you know you made a
         | mistake and you're committed to remedying that mistake" is
         | fine. What we don't need is "you know you made a mistake and
         | now you must endure abuse and have your job threatened over a
         | mistake."
        
           | _3u10 wrote:
           | Yeah shitty managers exist. I was a lead when one of the
           | engineers shipped a debug build that made it past App Store
           | review. (Our debug builds were obvious). My manager says Mike
           | (name changed) isn't cutting releases anymore.
           | 
           | I say Mike is cutting releases because he's now the one
           | person I trust on the team to not fuck it up.
           | 
           | If you need it in writing so you can fire me if Mike fucks it
           | up, let me know.
           | 
           | The manager, Mike, and I all cut the next release at Mike's
           | workstation, with him knowing my ass was on the line if we
           | shipped another debug build.
           | 
           | Mike never shipped another debug build.
        
             | rdtwo wrote:
             | Only works if Mike has skin in the game. If every time Mike
             | cuts a debug build we blame the process, then Mike's never
             | too worried about it.
        
               | _3u10 wrote:
               | True. Most people have no skin in the game.
        
               | rdtwo wrote:
               | Sometimes in big corporations they don't, and they can keep
               | screwing up.
        
             | drewcoo wrote:
             | Did nobody say "fix the process" or "make 'cutting a build'
             | automated with tests so that a human can't 'fuck it up'" or
             | "have Mike drive that automation?" Because that would be a
             | blameless approach.
        
         | kayodelycaon wrote:
         | It depends what you mean by fault.
         | 
         | If you mean hold yourself responsible for doing the best you
         | can and learning from mistakes, then I fully agree.
         | 
         | The issue is fault can also mean carrying guilt with you and
         | continuing to be blamed for it. This is not helpful once you've
         | learned the lessons you needed.
        
       | slibhb wrote:
       | The question isn't whether it's your fault, it's whether you take
       | responsibility for it. If no one takes responsibility for
       | anything then you get nowhere. And if you take responsibility for
       | it, it's your fault if it goes wrong.
       | 
       | > "Hero Programmer" is a derogatory name for a programmer who
       | chooses to fix problems in epic, caffeine-fueled 36-hour coding
       | sessions that frequently just kick the can down the road to the
       | next heroic 36-hour coding blitz. Hero programmers would rather
       | react than plan. Projects with hero programmers working on them
       | often make a lot of progress initially, but never arrive at a
       | stable state of completion
       | 
       | Maybe there are workplaces where people get together to
       | collaborate on a design and then break the design down into tasks
       | and assign those tasks to programmers to implement. Maybe this
       | process is performed until the project is done. Maybe. But I've
       | never seen it. I see people taking responsibility for small and
       | large tasks, and the large ones sometimes involve a single person
       | re-implementing entire systems spread across thousands of files
       | (though not necessarily in "36-hour coding blitzes").
        
         | marginalia_nu wrote:
         | Honestly that whole hero programmer bit seems like a bit of a
         | strawman. What's being described sounds like a talented but
         | inexperienced developer (which doesn't necessarily mean they
         | are young or fresh to the field; some people manage to stay
         | beginners for decades). Doesn't mean there aren't highly
         | talented developers that can get a lot of work done in a short
         | amount of time if you let them.
         | 
         | The failure in that case is not having a more senior developer
         | mentor the kid.
        
         | darkerside wrote:
         | Agreed.
         | 
         | > Projects with hero programmers working on them often make a
         | lot of progress initially, but never arrive at a stable state
         | of completion
         | 
         | No project is ever really finished except the ones nobody cares
         | about. Probably because they stopped being maintained by the
         | hero programmer.
        
           | watwut wrote:
           | This is not true. We and I have finished tons of projects.
           | They are done in the true sense. We are OK with their state
           | and they rarely ever change. They work.
           | 
           | We moved on to other projects.
        
       | ozzythecat wrote:
       | > When you look at things from this lens, all the successes of a
       | website, an application, or an organization flow from the talents
       | and genius of a few individuals. It's a compelling outlook
       | because, well, empirically it can definitely appear this way, and
       | it's naturally aligned with the other dominant societal ideas we
       | have about individuality.
       | 
       | In many large organizations, much of the success comes from the
       | foresight, insight, and hard "work" of a few, benefitting
       | many. The reality is, it _is_ individuals and not some collective
       | group or "teams".
        
       | ivraatiems wrote:
       | I have some paradoxical feelings about "blameless" retro culture
       | that I'll try to sum up.
       | 
       | In general, I'm in favor of the approach. I don't think singling
       | people out and bullying or shaming them for their mistakes ever
       | works. I think most well-intentioned engineers will already beat
       | themselves up plenty for making a serious mistake, and they don't
       | need any encouragement to do so. I know I do.
       | 
       | On the other hand, there is a red line. At a place I worked, a
       | DBA was let go after he repeatedly brought production down for 45
       | minutes to an hour at a time by running intensive queries of his
       | own design for data-gathering, in some cases, after being
       | explicitly told not to do that against the prod database. This
       | was a person whose job description required him to have access to
       | prod.
       | 
       | There were process problems, maybe (being allowed to run
       | whatever queries you want on production under your own authority,
       | sure), but his cavalier attitude towards a production environment
       | was still unacceptable. Process can only help when people are
       | well-intentioned and doing their best; if people are malicious or
       | negligent or just not good at their jobs, adding more process to
       | get around that only makes things worse.
        
         | mateo411 wrote:
         | It seems like a read replica would have helped out in this
         | instance.
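A rough sketch of the read-replica idea, assuming the application holds two connection strings (the hypothetical `PRIMARY_DSN` and `REPLICA_DSN` below) and that ad hoc data-gathering queries are read-only: anything that is not a write goes to the replica, so a heavy query cannot starve the primary. The keyword check is deliberately naive.

```python
# Hypothetical connection strings for illustration only.
PRIMARY_DSN = "postgresql://primary.internal/app"
REPLICA_DSN = "postgresql://replica.internal/app"

# Leading keywords treated as writes; everything else is considered read-only.
WRITE_KEYWORDS = (
    "INSERT", "UPDATE", "DELETE", "MERGE", "CREATE",
    "ALTER", "DROP", "TRUNCATE", "GRANT", "REVOKE",
)


def choose_dsn(sql: str) -> str:
    """Send writes to the primary and everything else to the read replica."""
    stripped = sql.strip()
    first_word = stripped.split(None, 1)[0].upper() if stripped else ""
    if first_word in WRITE_KEYWORDS:
        return PRIMARY_DSN
    return REPLICA_DSN
```

If the replica is a Postgres hot standby, it rejects writes, so a write this check misses (for example a data-modifying `WITH` query) fails on the replica instead of silently running against production.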
         | 
         | I agree if somebody decides to keep doing the same actions
         | after being told not to do them, because their actions would
         | bring down production, and their actions do bring down
         | production, then they should be held accountable.
        
         | benjiweber wrote:
         | Reminded of
         | https://twitter.com/allspaw/status/931543941966647297
        
         | tuckerman wrote:
         | I think there should be a difference between a postmortem
         | process and a performance management process and just because
         | the first is blameless doesn't mean that the second can't look
         | back to find problems or negligence.
         | 
         | That said, even when there is obvious negligence, having the
         | postmortem process look at the issue with blamelessness is
         | important to build up tooling/changes that could prevent it
         | from happening again. For example, maybe you could revoke
         | individuals having direct access to the production database
         | without multi-party authentication.
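One way to sketch the multi-party idea, assuming a hypothetical access-request record: direct production access is granted only once at least two people other than the requester have approved. `AccessRequest` and `MIN_APPROVERS` are illustrative names, not a specific product's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MIN_APPROVERS = 2  # hypothetical policy: two independent approvals


@dataclass
class AccessRequest:
    requester: str
    target: str  # e.g. "prod-db"
    approvers: set[str] = field(default_factory=set)

    def approve(self, approver: str) -> None:
        # Self-approval is ignored, so one person can never unlock prod alone.
        if approver != self.requester:
            self.approvers.add(approver)

    def is_granted(self) -> bool:
        # A set means the same person approving twice still counts once.
        return len(self.approvers) >= MIN_APPROVERS
```

The check sits in the tooling rather than in anyone's judgment on a given day, which is what lets the postmortem stay blameless even when the incident involved a careless query.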
        
           | SilasX wrote:
           | >I think there should be a difference between a postmortem
           | process and a performance management process and just because
           | the first is blameless doesn't mean that the second can't
           | look back to find problems or negligence.
           | 
           | That doesn't make sense. The moment that you look back at a
           | postmortem for use in penalizing someone via performance
           | management, the postmortem is no longer blameless.
        
             | tuckerman wrote:
             | You don't look back at the postmortem, but if a manager
             | says "you have repeatedly broken policy and, despite
             | warnings, have logged into systems without permissions
             | leading to incidents" I don't think that's a problem. It's
             | completely separate.
             | 
             | Additionally, if someone is going up for promotion and uses
             | a number of launches in their packet that all resulted in
             | regressions and didn't have good rollback plans, I don't
             | think the committee needs to be blind to that fact.
        
         | phkahler wrote:
         | >> if people are malicious or negligent or just not good at
         | their jobs, adding more process to get around that only makes
         | things worse.
         | 
         | That's why there is a hiring _and_ firing process.
        
         | jdc wrote:
         | https://blog.crunchydata.com/blog/control-runaway-postgres-q...
        
         | NewEntryHN wrote:
         | > he repeatedly
         | 
         | Surely the first occurrence led to a post-mortem which
         | documented and forbade the practices that became known to be
         | dangerous for production.
        
           | electroly wrote:
           | Yes, that is presumably what "after being explicitly told not
           | to do that against the prod database" refers to.
        
             | dsjoerg wrote:
             | Maybe. Unclear if it was documented or just told verbally.
        
               | ivraatiems wrote:
               | I don't know for sure but I believe there was a PIP and
               | so on.
        
         | Buttons840 wrote:
         | > At a place I worked, a DBA was let go after he repeatedly
         | brought production down for 45 minutes to an hour at a time by
         | running intensive queries of his own design for data-gathering,
         | in some cases, after being explicitly told not to do that
         | against the prod database. This was a person whose job
         | description required him to have access to prod.
         | 
         | Trying to have some sympathy: Was he given an alternative? Or
         | was it a "stop doing that important thing -- I don't know how
         | else to do it, figure it out" situation?
        
           | ivraatiems wrote:
           | It wasn't particularly important and we had "offline" copies
           | of most of the DB data for this sort of thing, just somewhat
           | less up to date. I honestly don't know why he did this.
        
       | wly_cdgr wrote:
       | Just another bullshit hit piece on the fact that a lot of the
       | very best software is made by either one person or a very small
       | band of very close collaborators. The great man theory reflects
       | reality to a large degree, drink your salty tears
        
         | ivraatiems wrote:
         | Their argument: "The great man theory is wrong because [actual
         | reasons]."
         | 
         | Your argument: "The great man theory is right, you're dumb if
         | you disagree."
         | 
         | Sure, I'm pretty dumb, but why should I believe you?
        
           | PeterisP wrote:
           | Their entire argument against the great man theory, quoting
           | the original article, is "But it's also wrong, and toxic to
           | sustainable development and equitable environments."
           | 
           | No reasons or arguments are provided, just an assertion.
           | 
           | On the other hand, "a lot of the very best software is made
           | by either one person or a very small band of very close
           | collaborators." is an argument - not very strong, but at
           | least something, with prominent examples such as git, TeX,
           | Calibre, Emacs, OBS, Minecraft, Stardew Valley, Sublime Text
           | and many others.
        
           | wly_cdgr wrote:
           | A good reason would be the countless examples available
           | freely on sites like GitHub.
           | 
           | Also the fact that it is this way in music, art, sculpture,
           | writing, mathematics, game design, sport, cooking,
             | fashion... It would be shocking indeed if it were any
             | different in programming.
        
             | Underphil wrote:
             | If you release on GitHub you have no skin in the game. If
             | you break something or abandon the project entirely (which
             | happens with a breathtaking frequency) you can just move on
             | to something else. Not the same _at all_.
        
             | drewcoo wrote:
             | Aha. I disagreed with you until you made this point. Now I
             | think I see where you're right and people are talking past
             | each other.
             | 
             | Art masterpieces are not built by committee. You list
             | mostly what I'd consider arts.
             | 
             | You claim that programming is (like) an art.
             | 
             | And I think that's the difference. The opinion you're
             | arguing against comes from the perspective that programming
             | on a larger team to solve real world problems is (like) an
             | engineering discipline. Engineering is both an art and a
             | science, but more importantly engineering is a
             | collaborative process.
             | 
             | And that's why the focus of the piece was on social aspects
             | of engineering and avoiding blame culture.
             | 
             | We need better terms than we're using. "Programming" is
             | about as useful in this discussion as "building things with
             | rock." Does that mean stacking cairns? Does that mean fine
             | art sculpture? Does that mean designing highways and dams?
        
             | ivraatiems wrote:
             | ...it's your argument. Not on me to prove it for you!
        
               | wly_cdgr wrote:
               | Sure. But I got better things to do than prove the
               | obvious to people who don't want to believe it
        
               | exBarrelSpoiler wrote:
               | Then you cede the argument entirely.
        
         | _3u10 wrote:
         | They might not build all of it but I'm pretty sure Linux never
         | gets built without Linus.
        
       | dj_mc_merlin wrote:
       | "Great Men are the cause of everything!"
       | 
       | "No, you're wrong, it's Systems!"
       | 
       | "No, it's Great Men!"
       | 
       | "No, it's Systems!"
       | 
       | The author found out there's another way to think about events
       | other than "someone did it". Now he thinks he can apply that
       | thought to everything. Sometimes it's the system, sometimes the
       | person. You can't generalize easily.
        
         | beaconstudios wrote:
         | If we're talking about the historical theory, the systems angle
         | doesn't deny that smart people exist. It just highlights that
         | Einstein was born at the right time and in the right
         | environment to become a great person; 50 years earlier and the
         | groundwork he relied on would not have existed, and he would've
         | been stuck laying that groundwork himself. I feel like Kuhn's
         | "The Structure of Scientific Revolutions" lays out the case
         | well, in that incremental improvements set up for a massive
         | paradigm shift, but both the iteration and the revolution are
         | conducted by smart people.
        
           | marginalia_nu wrote:
           | This is assuming you can separate a person from their
           | environment, which is a highly dubious counterfactual that
           | becomes stranger the more you consider it. If through some
           | freak accident in probability and a family tradition of
           | extramarital affairs with Germanic slaves someone genetically
           | identical to Einstein was born gens Claudia in the early
           | imperial period of Rome, he simply wouldn't be the same
           | person.
           | 
           | A person's experiences and circumstances are an inextricable
           | part of that person. They are part of what makes them great
           | (or not). You simply can't remove the person from their
           | environment.
        
             | beaconstudios wrote:
             | That is also very true, there are many layers to a systemic
             | (or critical, if you prefer) perspective.
        
       | mberning wrote:
       | Sorry, sometimes breaking prod is your fault. As a senior
       | engineer there is no excuse for forgetting a where clause on your
       | SQL delete. And doubly no excuse for not having somebody else
       | look at it.
        
         | dilyevsky wrote:
         | This is a bad take. Everyone makes mistakes. I once saw a
         | principal engineer take a 15,000-machine cluster offline
         | because he fat-fingered a command. The "blameless" part of
         | postmortem culture is designed to address process and
         | architecture rather than individual failures. Of course, if
         | someone is intentionally subverting the process, that's a
         | different story...
        
           | darkerside wrote:
           | Are you saying that wasn't his fault? I agree it's a false
           | dichotomy between the process failing and people sharing
           | blame.
        
         | theli0nheart wrote:
         | Humans forget things and make mistakes, no matter what your job
         | title is. Good systems and organizations take this into account
         | and are fault tolerant.
        
           | mattm wrote:
           | That's true but a senior engineer is expected to know when
           | they are doing something potentially risky and should know
           | when to ask someone to double check their work.
        
             | fsociety wrote:
             | That's all well and good if your senior engineer is a program.
             | But they are a human, and humans are prone to making silly
             | mistakes.
             | 
             | Instead systems should be designed such that you can't
             | easily take them down due to operator errors.
        
         | hacker_newz wrote:
         | It's a process problem. If an engineer shouldn't be able to run
         | a SQL delete query without a where clause, then there should be
         | processes, procedures, or tooling in place to prevent that.
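A minimal sketch of such tooling, assuming every ad hoc statement passes through a wrapper before it reaches the database: reject any `DELETE` or `UPDATE` that has no `WHERE` clause. `guard_statement` and `UnsafeStatementError` are hypothetical names, and the check uses crude regular expressions rather than a real SQL parser. (The MySQL command-line client's `--safe-updates` option is a comparable built-in safeguard.)

```python
import re


class UnsafeStatementError(Exception):
    """Raised for statements that would touch every row in a table."""


# Statements that begin with DELETE or UPDATE, case-insensitive.
_MUTATION = re.compile(r"^\s*(DELETE|UPDATE)\b", re.IGNORECASE)
# A WHERE keyword anywhere in the text; crude, not a parser.
_WHERE = re.compile(r"\bWHERE\b", re.IGNORECASE)


def guard_statement(sql: str) -> str:
    """Return the statement unchanged, or raise if it is an unbounded DELETE/UPDATE."""
    if _MUTATION.match(sql) and not _WHERE.search(sql):
        raise UnsafeStatementError(f"refusing unbounded statement: {sql.strip()!r}")
    return sql
```

A `WHERE` hidden in a string literal or subquery would slip past this check, which is why a production tool would parse the SQL properly; the point is that the safety net lives in the tooling, not in a senior engineer's memory.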
        
       | NewEntryHN wrote:
       | The way to attribute accountability is protocols. Create and
       | maintain protocols that are known ways to do something safely. If
       | you broke something by disregarding the protocol, then you've
       | fucked up. If you broke something by complying with the protocol,
       | then you discovered the protocol needs to be updated.
        
       ___________________________________________________________________
       (page generated 2022-04-03 23:01 UTC)