[HN Gopher] It's not your fault
___________________________________________________________________
It's not your fault
Author : thcipriani
Score : 55 points
Date : 2022-04-03 09:45 UTC (13 hours ago)
(HTM) web link (www.kostaharlan.net)
(TXT) w3m dump (www.kostaharlan.net)
| alexashka wrote:
| > breaking prod is not your fault. Put differently: breaking prod
| is a systems failure, not an individual one.
|
| False dichotomy.
| blakesterz wrote:
| I feel like, for me at least, being able to say "That was my
| fault" makes me remember what I did. It makes me document that
| failure. And makes me tell others to avoid what I did.
|
| I've been training a JR lately and I will always say "Here's all
| the things I did wrong when I did this, so don't do these things"
|
| If I did it, he could easily do it, and if we can all avoid my
| mistakes, so much the better.
|
| It WAS my fault quite a few times, and I'm ok with that. And
| luckily, everyone else around here is too. I'd hate to work
| somewhere that punishes honest mistakes. (there are limits, of
| course)
| hprotagonist wrote:
| perhaps the best example i can think of was the gitlab incident.
|
| and if we're honest it's because the engineer's name is Yorick.
| _aleph2c_ wrote:
| This may be true, but when things break people will look for a
| scapegoat. So when things break, and you are mostly-responsible
| for initiating the failure, use collective language ("we" and not
| "I"), frame the failure as a systems failure when you are talking
| to management or the executives, look cool even if you are
| feeling stressed out. Manage the narrative! Sure, you flipped the
| switch or whatever, but try and survive the event. Just because
| you think it's a systems failure doesn't mean other people share
| this belief, don't volunteer to be thrown off the bus.
| ChrisMarshallNY wrote:
| In my experience, working at a "classic" Japanese engineering
| firm, scapegoating was discouraged.
|
| During postmortems, we would often decide something like "Chris
| made an erroneous assumption that the fix introduced no bugs."
| (That's a classic "oldtimer" mistake, BTW. I make it all the
| time - I'm a slow learner).
|
| Absolutely no blame would be affixed. It was really important
| for Chris (that's me) to assume Responsibility for the error,
| and the team would develop a solution.
|
| This being a Japanese company, of course, said "solution"
| usually ended up being another punchlist item, like "Perform
| complete regression tests for even the smallest bug fix
| release," etc.
|
| I'm not thrilled with people using "hero programmer syndrome,"
| or "bus factor" as an excuse to write naive or deliberately
| dumbed-down code, though.
|
| Sometimes, a program needs to be maintained by skilled,
| experienced, well-paid, and motivated people. If a company
| insists on developing code using advanced techniques and then
| turning over maintenance to junior staff, or does a bad job
| writing a program because they want it to be maintained by the
| absolute cheapest programmers possible, that's a problem.
| _3u10 wrote:
| It being your fault is a good way to go through life.
|
| It focuses your mind on what you could do to avoid those
| situations in the first place.
|
| If it's your fault prod broke you can fix the process or you can
| look for a new job where the process is already fixed. Or you can
| find a role that doesn't involve pushing code to prod, maybe in
| R&D, etc.
| ivraatiems wrote:
| It being your fault in the sense of "you know you made a
| mistake and you're committed to remedying that mistake" is
| fine. What we don't need is "you know you made a mistake and
| now you must endure abuse and have your job threatened over a
| mistake."
| _3u10 wrote:
| Yeah shitty managers exist. I was a lead when one of the
| engineers shipped a debug build that made it past App Store
| review. (Our debug builds were obvious). My manager says Mike
| (name changed) isn't cutting releases anymore.
|
| I say Mike is cutting releases because he's now the one
| person I trust on the team to not fuck it up.
|
| If you need it in writing so you can fire me if Mike fucks it
| up, let me know.
|
| Mike, my manager, and I all cut the next release at Mike's
| workstation with him knowing my ass was on the line if we
| shipped another debug build.
|
| Mike never shipped another debug build.
| rdtwo wrote:
| Only works if Mike has skin in the game. If every time Mike
| cuts a debug build we blame the process, then Mike's never
| too worried about it.
| _3u10 wrote:
| True. Most people have no skin in the game.
| rdtwo wrote:
| Sometimes in big corporate they don't, and can keep
| screwing up.
| drewcoo wrote:
| Did nobody say "fix the process" or "make 'cutting a build'
| automated with tests so that a human can't 'fuck it up'" or
| "have Mike drive that automation?" Because that would be a
| blameless approach.
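
A minimal sketch of the "a human can't ship a debug build" idea from the comment above, in Python. Everything here is hypothetical: `BuildInfo`, its fields, and `cut_release` are illustrative names, not part of any real release tool or of the App Store workflow described in the thread. The point is only that the check runs in the pipeline instead of depending on whoever is cutting the release.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BuildInfo:
    """Hypothetical description of a build artifact (assumed fields)."""
    version: str
    configuration: str  # e.g. "debug" or "release"
    tests_passed: bool


def release_blockers(build: BuildInfo) -> List[str]:
    """Return the reasons this build must not ship; empty means OK."""
    blockers = []
    if build.configuration.lower() != "release":
        blockers.append(
            f"configuration is {build.configuration!r}, expected 'release'"
        )
    if not build.tests_passed:
        blockers.append("test suite did not pass")
    return blockers


def cut_release(build: BuildInfo) -> str:
    """Refuse to cut a release while any blocker remains."""
    blockers = release_blockers(build)
    if blockers:
        raise RuntimeError("refusing to cut release: " + "; ".join(blockers))
    return f"release {build.version} approved"
```

With a gate like this, a debug build fails the same way no matter who pushes the button, so the postmortem question becomes "why was the gate missing?" and not "why did Mike do that?".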
| kayodelycaon wrote:
| It depends what you mean by fault.
|
| If you mean hold yourself responsible for doing the best you
| can and learning from mistakes, then I fully agree.
|
| The issue is fault can also mean carrying guilt with you and
| continuing to be blamed for it. This is not helpful once you've
| learned the lessons you needed.
| slibhb wrote:
| The question isn't whether it's your fault, it's whether you take
| responsibility for it. If no one takes responsibility for
| anything then you get nowhere. And if you take responsibility for
| it, it's your fault if it goes wrong.
|
| > "Hero Programmer" is a derogatory name for a programmer who
| chooses to fix problems in epic, caffeine-fueled 36-hour coding
| sessions that frequently just kick the can down the road to the
| next heroic 36-hour coding blitz. Hero programmers would rather
| react than plan. Projects with hero programmers working on them
| often make a lot of progress initially, but never arrive at a
| stable state of completion
|
| Maybe there are workplaces where people get together to
| collaborate on a design and then break the design down into tasks
| and assign those tasks to programmers to implement. Maybe this
| process is performed until the project is done. Maybe. But I've
| never seen it. I see people taking responsibility for small and
| large tasks, and the large ones sometimes involve a single person
| re-implementing entire systems spread across thousands of files
| (though not necessarily in "36-hour coding blitzes").
| marginalia_nu wrote:
| Honestly that whole hero programmer bit seems like a bit of a
| strawman. What's being described sounds like a talented but
| inexperienced developer (which doesn't necessarily mean they
| are young or fresh to the field; some people manage to stay
| beginners for decades). Doesn't mean there aren't highly
| talented developers that can get a lot of work done in a short
| amount of time if you let them.
|
| The failure in that case is not having a more senior developer
| mentor the kid.
| darkerside wrote:
| Agreed.
|
| > Projects with hero programmers working on them often make a
| lot of progress initially, but never arrive at a stable state
| of completion
|
| No project is ever really finished except the ones nobody cares
| about. Probably because they stopped being maintained by the
| hero programmer.
| watwut wrote:
| This is not true. We (and I) have finished tons of projects.
| They are done in the true sense. We are OK with their state
| and they rarely ever change. They work.
|
| We moved on to other projects.
| ozzythecat wrote:
| > When you look at things from this lens, all the successes of a
| website, an application, or an organization flow from the talents
| and genius of a few individuals. It's a compelling outlook
| because, well, empirically it can definitely appear this way, and
| it's naturally aligned with the other dominant societal ideas we
| have about individuality.
|
| In many large organizations, much of the success comes from the
| foresight, insight, and hard "work" of a few, benefitting
| many. The reality is, it _is_ individuals and not some collective
| group or "teams".
| ivraatiems wrote:
| I have some paradoxical feelings about "blameless" retro culture
| that I'll try to sum up.
|
| In general, I'm in favor of the approach. I don't think singling
| people out and bullying or shaming them for their mistakes ever
| works. I think most well-intentioned engineers will already beat
| themselves up plenty for making a serious mistake, and they don't
| need any encouragement to do so. I know I do.
|
| On the other hand, there is a red line. At a place I worked, a
| DBA was let go after he repeatedly brought production down for 45
| minutes to an hour at a time by running intensive queries of his
| own design for data-gathering, in some cases, after being
| explicitly told not to do that against the prod database. This
| was a person whose job description required him to have access to
| prod.
|
| There were process problems, maybe - being allowed to run
| whatever queries you want on production under your own authority,
| sure - but his cavalier attitude towards a production environment
| was still unacceptable. Process can only help when people are
| well-intentioned and doing their best; if people are malicious or
| negligent or just not good at their jobs, adding more process to
| get around that only makes things worse.
| mateo411 wrote:
| It seems like a read replica would have helped out in this
| instance.
|
| I agree if somebody decides to keep doing the same actions
| after being told not to do them, because their actions would
| bring down production, and their actions do bring down
| production, then they should be held accountable.
| benjiweber wrote:
| Reminded of
| https://twitter.com/allspaw/status/931543941966647297
| tuckerman wrote:
| I think there should be a difference between a postmortem
| process and a performance management process and just because
| the first is blameless doesn't mean that the second can't look
| back to find problems or negligence.
|
| That said, even when there is obvious negligence, having the
| postmortem process look at the issue with blamelessness is
| important to build up tooling/changes that could prevent it
| from happening again. For example, maybe you could revoke
| individuals having direct access to the production database
| without multi-party authentication.
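
One way to picture the multi-party idea in the comment above is a request object that cannot be approved by the person who opened it. This is a sketch under stated assumptions: `ProdAccessRequest`, its fields, and the approval threshold are hypothetical names invented for illustration, and a real setup would sit behind an actual identity and access system.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ProdAccessRequest:
    """Hypothetical request for direct production database access."""
    requester: str
    reason: str
    required_approvals: int = 1
    approvers: Set[str] = field(default_factory=set)

    def approve(self, approver: str) -> None:
        # The second party must be someone other than the requester.
        if approver == self.requester:
            raise PermissionError("requesters cannot approve their own access")
        self.approvers.add(approver)

    @property
    def granted(self) -> bool:
        # A set is used so repeat approvals by one person count once.
        return len(self.approvers) >= self.required_approvals
```

The design choice is that the rule lives in the tooling: nobody has to remember, or be trusted, to ask for a second pair of eyes.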
| SilasX wrote:
| >I think there should be a difference between a postmortem
| process and a performance management process and just because
| the first is blameless doesn't mean that the second can't
| look back to find problems or negligence.
|
| That doesn't make sense. The moment that you look back at a
| postmortem for use in penalizing someone via performance
| management, the postmortem is no longer blameless.
| tuckerman wrote:
| You don't look back at the postmortem, but if a manager
| says "you have repeatedly broken policy and, despite
| warnings, have logged into systems without permissions
| leading to incidents" I don't think that's a problem. It's
| completely separate.
|
| Additionally, if someone is going up for promotion and uses
| a number of launches in their packet that all resulted in
| regressions and didn't have good rollback plans, I don't
| think the committee needs to be blind to that fact.
| phkahler wrote:
| >> if people are malicious or negligent or just not good at
| their jobs, adding more process to get around that only makes
| things worse.
|
| That's why there is a hiring _and_ firing process.
| jdc wrote:
| https://blog.crunchydata.com/blog/control-runaway-postgres-q...
| NewEntryHN wrote:
| > he repeatedly
|
| Surely the first occurrence led to a post-mortem which
| documented and forbade the practices that became known to be
| dangerous for production.
| electroly wrote:
| Yes, that is presumably what "after being explicitly told not
| to do that against the prod database" refers to.
| dsjoerg wrote:
| Maybe. Unclear if it was documented or just told verbally.
| ivraatiems wrote:
| I don't know for sure but I believe there was a PIP and
| so on.
| Buttons840 wrote:
| > At a place I worked, a DBA was let go after he repeatedly
| brought production down for 45 minutes to an hour at a time by
| running intensive queries of his own design for data-gathering,
| in some cases, after being explicitly told not to do that
| against the prod database. This was a person whose job
| description required him to have access to prod.
|
| Trying to have some sympathy: Was he given an alternative? Or
| was it a "stop doing that important thing -- I don't know how
| else to do it, figure it out" situation?
| ivraatiems wrote:
| It wasn't particularly important and we had "offline" copies
| of most of the DB data for this sort of thing, just somewhat
| less up to date. I honestly don't know why he did this.
| wly_cdgr wrote:
| Just another bullshit hit piece on the fact that a lot of the
| very best software is made by either one person or a very small
| band of very close collaborators. The great man theory reflects
| reality to a large degree, drink your salty tears
| ivraatiems wrote:
| Their argument: "The great man theory is wrong because [actual
| reasons]."
|
| Your argument: "The great man theory is right, you're dumb if
| you disagree."
|
| Sure, I'm pretty dumb, but why should I believe you?
| PeterisP wrote:
| Their entire argument against the great man theory, quoting
| the original article, is "But it's also wrong, and toxic to
| sustainable development and equitable environments."
|
| No reasons or arguments are provided, just an assertion.
|
| On the other hand, "a lot of the very best software is made
| by either one person or a very small band of very close
| collaborators." is an argument - not very strong, but at
| least something, with prominent examples such as git, tex,
| Calibre, emacs, OBS, Minecraft, Stardew Valley, Sublime Text
| and many others.
| wly_cdgr wrote:
| A good reason would be the countless examples available
| freely on sites like GitHub.
|
| Also the fact that it is this way in music, art, sculpture,
| writing, mathematics, game design, sport, cooking,
| fashion,....it would be shocking indeed if it was any
| different in programming
| Underphil wrote:
| If you release on GitHub you have no skin in the game. If
| you break something or abandon the project entirely (which
| happens with a breathtaking frequency) you can just move on
| to something else. Not the same _at all_.
| drewcoo wrote:
| Aha. I disagreed with you until you made this point. Now I
| think I see where you're right and people are talking past
| each other.
|
| Art masterpieces are not built by committee. You list
| mostly what I'd consider arts.
|
| You claim that programming is (like) an art.
|
| And I think that's the difference. The opinion you're
| arguing against comes from the perspective that programming
| on a larger team to solve real world problems is (like) an
| engineering discipline. Engineering is both an art and a
| science, but more importantly engineering is a
| collaborative process.
|
| And that's why the focus of the piece was on social aspects
| of engineering and avoiding blame culture.
|
| We need better terms than we're using. "Programming" is
| about as useful in this discussion as "building things with
| rock." Does that mean stacking cairns? Does that mean fine
| art sculpture? Does that mean designing highways and dams?
| ivraatiems wrote:
| ...it's your argument. Not on me to prove it for you!
| wly_cdgr wrote:
| Sure. But I got better things to do than prove the
| obvious to people who don't want to believe it
| exBarrelSpoiler wrote:
| Then you cede the argument entirely.
| _3u10 wrote:
| They might not build all of it but I'm pretty sure Linux never
| gets built without Linus.
| dj_mc_merlin wrote:
| "Great Men are the cause of everything!"
|
| "No, you're wrong, it's Systems!"
|
| "No, it's Great Men!"
|
| "No, it's Systems!"
|
| The author found out there's another way to think about events
| other than "someone did it". Now he thinks he can apply that
| thought to everything. Sometimes it's the system, sometimes the
| person. You can't generalize easily.
| beaconstudios wrote:
| If we're talking about the historical theory, the systems angle
| doesn't deny that smart people exist. It just highlights that
| Einstein was born at the right time and in the right
| environment to become a great person - 50 years earlier and the
| groundwork he relied on would not have existed, and he would've
| been stuck laying that groundwork himself. I feel like Kuhn's
| "the structure of scientific revolutions" lays out the case
| well, in that incremental improvements set up for a massive
| paradigm shift, but both the iteration and the revolution are
| conducted by smart people.
| marginalia_nu wrote:
| This is assuming you can separate a person from their
| environment, which is a highly dubious counterfactual that
| becomes stranger the more you consider it. If through some
| freak accident in probability and a family tradition of
| extramarital affairs with Germanic slaves someone genetically
| identical to Einstein was born into gens Claudia in the early
| imperial period of Rome, he simply wouldn't be the same
| person.
|
| A person's experiences and circumstances are an inextricable
| part of that person. They are part of what makes them great
| (or not). You simply can't remove the person from their
| environment.
| beaconstudios wrote:
| That is also very true, there are many layers to a systemic
| (or critical, if you prefer) perspective.
| mberning wrote:
| Sorry, sometimes breaking prod is your fault. As a senior
| engineer there is no excuse for forgetting a where clause on your
| SQL delete. And doubly no excuse for not having somebody else
| look at it.
| dilyevsky wrote:
| This is a bad take. Everyone makes mistakes. I once saw a
| principal engineer take a 15-thousand-machine cluster
| offline bc he fat-fingered a command. The "blameless" part of
| pm culture is designed to address process and architecture
| rather than individual failures. Ofc if someone is
| intentionally subverting the process that's a different
| story...
| darkerside wrote:
| Are you saying that wasn't his fault? I agree it's a false
| dichotomy between the process failing and people sharing
| blame.
| theli0nheart wrote:
| Humans forget things and make mistakes, no matter what your job
| title is. Good systems and organizations take this into account
| and are fault tolerant.
| mattm wrote:
| That's true but a senior engineer is expected to know when
| they are doing something potentially risky and should know
| when to ask someone to double check their work.
| fsociety wrote:
| That's all well and good if your senior engineer is a program.
| But they are a human, and humans are prone to making silly
| mistakes.
|
| Instead systems should be designed such that you can't
| easily take them down due to operator errors.
| hacker_newz wrote:
| It's a process problem. If an engineer shouldn't be able to run
| a SQL delete query without a where clause, then there should be
| processes, procedures, or tooling in place to prevent that.
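
As a rough illustration of the "tooling" option in the comment above, here is a Python sketch of a wrapper that refuses a `DELETE` or `UPDATE` with no `WHERE` clause. The names `is_unbounded_write` and `guarded_execute` are made up for this example, and the check is a naive regex heuristic, not a SQL parser: a `WHERE` inside a subquery or a string literal would fool it. It uses the standard-library `sqlite3` module only so the example is self-contained.

```python
import re
import sqlite3

# Naive heuristic: statement starts with DELETE or UPDATE ...
_WRITE = re.compile(r"^\s*(delete|update)\b", re.IGNORECASE)
# ... and the word WHERE appears nowhere in it.
_WHERE = re.compile(r"\bwhere\b", re.IGNORECASE)


def is_unbounded_write(sql: str) -> bool:
    """True if sql looks like a DELETE/UPDATE with no WHERE clause."""
    return bool(_WRITE.match(sql)) and not _WHERE.search(sql)


def guarded_execute(conn: sqlite3.Connection, sql: str, params=()):
    """Run sql on conn, but reject unbounded DELETE/UPDATE statements."""
    if is_unbounded_write(sql):
        raise ValueError("refusing DELETE/UPDATE without a WHERE clause")
    return conn.execute(sql, params)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER)")
    conn.execute("INSERT INTO users VALUES (1), (2)")
    try:
        guarded_execute(conn, "DELETE FROM users")
    except ValueError as err:
        print(err)
    guarded_execute(conn, "DELETE FROM users WHERE id = ?", (1,))
    print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])
```

A guard like this does not replace review by a second person, but it turns the forgotten `WHERE` from an outage into an error message.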
| NewEntryHN wrote:
| The way to attribute accountability is protocols. Create and
| maintain protocols that are known ways to do something safely. If
| you broke something by disregarding the protocol, then you've
| fucked up. If you broke something by complying with the protocol,
| then you discovered the protocol needs to be updated.
___________________________________________________________________
(page generated 2022-04-03 23:01 UTC)