[HN Gopher] Firing Myself
       ___________________________________________________________________
        
       Firing Myself
        
       Author : banzin
       Score  : 63 points
       Date   : 2024-07-13 19:22 UTC (3 hours ago)
        
 (HTM) web link (backintyme.substack.com)
 (TXT) w3m dump (backintyme.substack.com)
        
       | doctor_eval wrote:
       | > I found myself on the phone to Rackspace, leaning on a desk for
       | support, listening to their engineer patiently explain that
       | backups for this MySQL instance had been cancelled over 2 months
       | ago. Ah.
       | 
       | There is no part of this story that's the protagonist's fault.
       | What a mess.
        
         | cwales95 wrote:
         | Yeah, cannot help but agree. It should have been impossible for
         | this to happen in the first place.
        
           | RaftPeople wrote:
           | > _It should have been impossible for this to happen in the
           | first place._
           | 
           | Exactly, CEO should have fired himself for allowing that
           | environment to exist.
        
         | occz wrote:
         | Agreed. Negligence bordering on criminal all the way up the
         | management chain. The fact that they blamed the author is
         | telling about the culture as well.
        
       | badgersnake wrote:
       | I wouldn't blame you for resigning, it sounds like an awful
       | environment.
       | 
        | But individuals will always make mistakes; systems and
        | processes prevent individual mistakes from doing damage.
        | That's what was lacking here, not your fault at all. I just
        | hope lessons were learned.
        
         | unyttigfjelltol wrote:
         | Should have spun it as a novel game feature. Like burning the
         | library at Alexandria.
        
       | jt2190 wrote:
       | How did this
        
       | fishtoaster wrote:
       | I once made a huge fuckup.
       | 
       | A couple years into my career, I was trying to get my AWS keys
       | configured right locally. I hardcoded them into my .zshrc file. A
       | few days later on a Sunday, forgetting that I'd done that, I
       | committed and pushed that file to my public dotfiles repo, at
       | which point those keys were instantly and automatically
       | compromised.
       | 
       | After the dust settled, the CTO pulled me into the office and
       | said:
       | 
       | 1. So that I know you know: explain to me what you did, why it
       | shouldn't have happened, and how you'll avoid it in the future.
       | 
       | 2. This is not your fault - it's ours. These keys were way
       | overpermissioned and our safeguards were inadequate - we'll fix
       | that.
       | 
       | 3. As long as it doesn't happen again, we're cool.
       | 
       | Looking back, 10 years later, I think that was exactly the right
       | way to handle it. Address what the individual did, but realize
       | that it's a process issue. If your process only works when 100%
       | of people act perfectly 100% of the time, your process does not
       | work and needs fixing.
        
         | vvanders wrote:
          | Yep, I've been close enough to a couple of large ones over
          | my career to see the details, and up close to a few more,
          | and this is the right way to approach it.
         | 
         | Did the person know they screwed up? Did they show remorse and
         | a willingness to dive in and sort it out? They likely feel like
         | absolute shit about the whole thing and you don't need to come
          | down on them like a ton of bricks. If that much damage could
          | be done by a single person, then you have a gap in your
          | process/culture/etc., and that should be addressed from the
          | top.
         | 
          | One of the best takes I've seen on this came from a previous
          | manager who was confronted with a situation similar to the
          | article's (it was a full DB drop). The person tried to hand
          | in their resignation on the spot; instead, the manager (and
          | I'm paraphrasing here) said: "You're the most qualified
          | person to handle this risk in the future, as we've just
          | spent $(insert revenue hit here) training you. Moving
          | forward, we want you to own backup/restore and make sure
          | those things work".
         | 
          | That person ended up being one of their best engineers, and
          | they had fantastic resiliency moving forward. It turns out
          | that if you give someone a bit of grace and trust when they
          | realize they screwed up, you'll end up with a stronger
          | organization and culture because of it.
        
           | NegativeK wrote:
           | To quote a statistician friend: 100% of humans make mistakes.
           | 
           | OP's leadership was shit. The org let a junior dev delete
           | shit in prod and then didn't own up to _their_ mistake? Did
           | they later go on to work at a genetics company and blame
           | users for being the subject of password sprays?
        
           | Aurornis wrote:
           | > they instead(and I'm paraphrasing here) said: "You're the
           | most qualified person to handle this risk in the future as
           | we've just spent $(insert revenue hit here) training you.
           | 
            | This is an old quote that has been attributed to different
            | people over the years. It shows up in a lot of management
            | books and, more recently, LinkedIn influencer posts.
           | 
           | It's good for lightening the situation and adding some
           | levity, but after hearing it repeated 100 different times
           | from different books, podcasts, and LinkedIn quotes it has
           | really worn on me as somewhat dishonest. It feels clever the
           | first time you hear it, but really the cost of the mistake is
           | a separate issue from the decision to fire someone for it.
           | 
            | In real-world situations, the decision to let someone go
            | involves a deeper dive into whether the incident was
            | really a one-off mistake or the culmination of a pattern
            | of careless behavior, failure to learn, or refusal to
            | adopt good practices.
           | 
           | I've seen situations where the actual dollar amount of the
           | damage was negligible, but the circumstances that caused the
           | accident were so egregiously bad and avoidable that we
           | couldn't justify allowing the person to continue operating in
           | the role. I wish it was as simple as training people up or
           | having them learn from their mistakes, but some people are so
           | relentlessly careless that it's better for everyone to just
           | cut losses.
           | 
           | However when the investigation shows that the incident really
           | was a one-time mistake from someone with an otherwise strong
           | history of learning and growing, cutting that person for a
           | single accident is a mistake.
           | 
           | The important thing to acknowledge is point #3 from the post
           | above: Once you've made an expensive mistake, that's usually
           | your last freebie. The next expensive mistake isn't very
            | likely to be joked away as another "expensive training".
        
             | vvanders wrote:
              | I'm fairly certain it occurred, since the story was
              | first-hand and from about 12+ years ago (although they
              | may have lifted it from similar sources). It's not a bad
              | way to defuse things if it's clear there was an honest
              | mistake.
             | 
              | Your point on willingness to learn is bang on. If
              | there's no remorse, or the behavior was intentionally
              | negligent, then yes, that's a different story.
        
               | Aurornis wrote:
               | Oh I'm sure it occurred. The CEO was just repeating it
               | from the countless number of management books where the
               | quote appears.
               | 
                | My point was that it's a story that gets overlaid on
                | top of the real decision-making process.
        
         | ay wrote:
         | So much this.
         | 
          | There is a great book which I think should be on the desk of
          | every single person (especially leadership) working in any
          | place where humans interact with machines:
         | 
         | https://www.amazon.com/Field-Guide-Understanding-Human-Error...
        
         | kmarc wrote:
          | Besides the obvious takeaway of the story, to anyone who
          | reads this: use pre-commit hooks (or something equivalent)
          | to avoid this kind of problem.
         | 
         | With the pre-commit framework, an example hook would be
         | https://github.com/Yelp/detect-secrets
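          | 
          | A minimal .pre-commit-config.yaml along those lines might
          | look like the following (the pinned rev is just an example;
          | use whichever release you've actually vetted):
          | 
          |     repos:
          |       - repo: https://github.com/Yelp/detect-secrets
          |         rev: v1.5.0  # example pin, not a recommendation
          |         hooks:
          |           - id: detect-secrets
          |             args: ['--baseline', '.secrets.baseline']
          | 
          | Generate the baseline once with
          | 
          |     detect-secrets scan > .secrets.baseline
          | 
          | and the hook will then refuse commits that introduce new
          | candidate secrets.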
        
         | nine_k wrote:
          | Here's one of my favorite anecdotes / fables on the topic.
          | 
          | A young trader joined a financial company. He tried hard to
          | show how good and useful he was, and he indeed was, at the
          | rookie level.
         | 
         | One day he made a mistake, directly and undeniably attributable
         | to him, and lost $200k due to that mistake.
         | 
         | Crushed and depressed, he came to his boss and said:
         | 
         | -- Sir! I failed so badly. I think I'm not fit for this job. I
         | want to leave the company.
         | 
         | But the boss went furious:
         | 
         | -- How dare you, yes, how dare you ask me to let you go right
         | after we've invested $200k in your professional training?!
        
       | amackera wrote:
       | Less "Firing yourself" and more like liberating yourself from a
       | toxic unprofessional clown show.
        
       | anon115 wrote:
        | so they didn't have a backup? that's on them lol
        
       | dudus wrote:
        | So a company gives junior engineers full access to a
        | production database, without backups, so they can work on it
        | developing features that require DDL SQL commands. I've seen
        | it happen before; what I've never seen is someone blame the
        | junior employee when things undoubtedly go south.
        | 
        | I'm not sure I even believe that part of the story. This was
        | either a very dysfunctional company or a looooong time ago.
        
         | endofreach wrote:
          | > I'm not sure I even believe that part of the story. This
          | was either a very dysfunctional company
          | 
          | The first sentence of the article tells us it was "a Social
          | Gaming startup", and with that, everything we needed to
          | know.
        
         | kevin_nisbet wrote:
         | I haven't personally seen this particular case either but I
         | have no doubt it could happen. I've seen orgs where a blameless
         | type culture isn't natural, and I've had to explain to the
         | leadership that publicly humiliating (in jest) someone for
         | getting caught by the phishing tests or posting private data to
         | a pastebin type service is a bad idea.
         | 
          | And I've interacted with plenty of people who externalize
          | everything that goes wrong for them; naturally, some of
          | these folks end up in leadership positions.
        
       | loktarogar wrote:
       | No junior should have been able to cause this much damage on
       | their own without a safety net of some kind.
       | 
       | It's on the company for cancelling their backups.
        
       | freehorse wrote:
        | This sounds like a company that does not learn from errors and
        | looks for "junior engineer" scapegoats instead of looking at
        | the systemic processes that facilitated this, and not a great
        | place to stay tbh. This was a chance for the company to
        | reflect on some of their processes and take measures that
        | would avoid similar issues (and the steps to take are pretty
        | obvious). And the description of what happened afterwards
        | shows a probably toxic environment.
       | 
        | It should never be like this, and especially in this case I
        | blame OP 0%. This is something that could happen to anybody in
        | such circumstances. I have not deleted a full database, but I
        | have had to restore things a few times; I have made mistakes
        | myself and have rushed to fix problems caused by others'
        | mistakes, and every single time the whole point of the
        | discussion was improving our processes so that it does not
        | happen again.
        
       | newaccountman2 wrote:
       | > backups for this MySQL instance had been cancelled over 2
       | months ago.
       | 
       | Uhh, there's the problem, not that someone accidentally deleted
       | something lol
        
       | menzoic wrote:
        | Clearly the fault of a terribly led engineering organization.
        | Mistakes are almost guaranteed to happen. This is why good
        | engineering orgs have guardrails in place. There were no
        | guardrails whatsoever here. Accounts used for manual, ad hoc
        | access to production databases should not have delete
        | permissions for critical data. And worst of all, no backups.
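        | 
        | As a rough sketch of what that could look like in MySQL (the
        | account and schema names are made up for illustration):
        | 
        |     -- limited account for ad hoc access to production
        |     -- (hypothetical user and schema names)
        |     CREATE USER 'dev_adhoc'@'%' IDENTIFIED BY '<password>';
        |     GRANT SELECT ON game_prod.* TO 'dev_adhoc'@'%';
        |     -- no DELETE, DROP, or ALTER; destructive changes go
        |     -- through a separate, audited deployment account
        | 
        | Even a guardrail this crude turns an accidental deletion into
        | a permissions error instead of a lost database.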
        
       | cybervegan wrote:
        | There's a lot of responsibility there resting on your
        | superiors, because they weren't following "best practices".
        | Sure, you fucked up, but if they had backups, it wouldn't have
        | been such a disaster, and if you had a Dev environment to test
        | against, it would have been a non-issue entirely. Straight out
        | of Uni, you shouldn't have been expected to know that, but I
        | bet you grew as a consequence.
        
         | Spivak wrote:
          | Yep, whether the leadership recognizes it or not, this is an
          | organizational failure. No access controls for destroying
          | prod data, no backups, no recovery plan, engineers told to
          | test in prod, and whatever horrible process they have that
          | required engineers to regularly access the database
          | directly.
        
       | cstrahan wrote:
       | I can relate to this with my own story, where I managed to delete
       | an entire database -- my first day on the job, no less.
       | 
       | I was hired by a little photo development company, doing both
        | walk-in jobs and electronic B2B orders. I was brought in to pick
       | up on the maintenance and development of the B2B order placement
       | web service the previous developer had written.
       | 
       | Sadly, the previous dev designed the DB schema and software under
       | the assumption that there would only ever be one business
       | customer. When that ceased to be the case, he decided to simply
       | create another database and spin up another process.
       | 
       | So here I am on my first day, tasked with creating a new empty
       | database to bring on another customer. I used the Microsoft SQL
       | Server admin GUI to generate the DDL from one of the existing
       | tables, created (and switched the connection to) a pristine, new
       | DB, and ran the script.
       | 
        | Little did I know, in the middle of many thousands of lines of
        | SQL, the script switched the connection back to the DB from
        | which the DDL was generated, and then proceeded to drop every
        | single table.
       | 
       | Oops.
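        | 
        | Roughly (heavily simplified, with made-up names), the
        | offending portion of the generated script looked something
        | like this:
        | 
        |     -- ...thousands of lines of scripted DDL above...
        |     USE [CustomerA_Orders]   -- back to the ORIGINAL database
        |     GO
        |     DROP TABLE [dbo].[Orders]
        |     GO
        |     CREATE TABLE [dbo].[Orders] ( OrderId int NOT NULL /* , ... */ )
        |     GO
        | 
        | So it didn't matter that my connection was pointed at the new,
        | empty database when I hit execute.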
       | 
        | Of course, the last dev had disabled backups a couple of
        | months before I joined. My one saving grace was that the dev
        | had some strange
       | fixation on logging every single thing that happened in a bunch
       | of XML log files; I managed to quickly write some code to rebuild
       | the state of the DB from those log files.
       | 
       | I was (and am) grateful to my boss for trusting my ability to
       | resolve the problem I had created, and placing as much value as
       | he did in my ownership of the problem.
       | 
       | That was about 16 years ago. One of the best working experiences
       | in my career, and a time of rapid technical growth for myself. I
       | would have missed out on a lot if that had been handled
       | differently.
        
         | esafak wrote:
         | > Sadly, the previous dev designed the DB schema and software
         | under the assumption that there would only ever be one business
         | customer.
         | 
         | What kind of an assumption is that?!
        
       | zitterbewegung wrote:
       | > I found myself on the phone to Rackspace, leaning on a desk for
       | support, listening to their engineer patiently explain that
       | backups for this MySQL instance had been cancelled over 2 months
       | ago. Ah.
       | 
        | This is the issue, not what the author did. It was only a
        | matter of time before the database was accidentally deleted
        | somehow.
        
       | kaiokendev wrote:
        | I've been in situations just like this, on pretty much every
        | side (the fuck-upper, the person who has to fix the fuck-up,
        | and the person who has to come up with a fuck-up remediation
        | plan).
       | 
        | The most egregious case involved an incompetent configuration
        | that resulted in hundreds of millions of dollars in lost data
        | and a 6-month-long automated recovery project. Fortunately,
        | there were
       | traces of the data across the entire stack - from page caches in
       | a random employee's browser, to automated reports and OCR dumps.
       | By the end of the project, all data was recovered. No one from
       | outside ever found out or even realized anything had happened -
       | we had redundancy upon redundancy across several parts of the
       | business, and the entire company basically shifted the way we did
       | ops to work around the issue for the time being. Every department
       | had a scorecard tracking how many of their files were recovered,
       | and we had little celebrations when we hit recovery milestones.
       | To this day only a few people know who was responsible (wasn't
       | me! lol)
       | 
       | Blame and derision are always inevitable in situations like this.
       | It's how it's handled afterwards that really marks the competence
       | of the company.
        
       | xyst wrote:
       | > One of the peculiarities of my development environment was that
       | I ran all my code against the production database.
       | 
       | Hahaha. I still see this being done today every now and then.
       | 
       | > The CEO leaned across the table, got in my face, and said,
       | "this, is a monumental fuck up. You're gonna cost us millions in
       | revenue". His co-founder (remotely present via Skype) chimed in
       | "you're lucky to still be here".
       | 
        | This type of leadership needs to be put on blast. 2010 or
        | 2024, it doesn't matter.
       | 
       | If it's going to cost "millions in revenue", then maybe it would
       | have been prudent to invest time in proper data access controls,
       | proper backups, and rollback procedures.
       | 
       | Absolutely incompetent leadership should never be hired ever
       | again. There should be a public blacklist so I don't make the
       | mistake of ever working with such idiocy.
       | 
        | The only people who should ever be "fired" here are the
        | leadership. Unless it was done on purpose, in which case you
        | should be subject to jail time.
        
         | BoorishBears wrote:
         | They let you stay after costing them millions in revenue then?
         | Doesn't sound like the worst leadership to me?
        
       | hcarvalhoalves wrote:
       | > The CEO leaned across the table, got in my face, and said,
       | "this, is a monumental fuck up. You're gonna cost us millions in
       | revenue". His co-founder (remotely present via Skype) chimed in
       | "you're lucky to still be here".
       | 
       | Should expose the CEO's name. Between this and forcing you to
       | work 3 days straight, that was the least professional way to
       | handle this situation.
        
       | alex_lav wrote:
       | > I found myself on the phone to Rackspace, leaning on a desk for
       | support, listening to their engineer patiently explain that
       | backups for this MySQL instance had been cancelled over 2 months
       | ago. Ah.
       | 
       | As usual, a company with legitimately moronic processes
       | experiences the consequences of those moronic processes when a
       | "junior" person breaks something. Whoever turned off those
       | backups as well as whoever thought devs (especially "junior"
       | devs) should be mutating prod tables by hand are ultimately
       | accountable.
        
       | steve_adams_86 wrote:
       | I can't imagine putting someone who's new to this work in that
       | kind of precarious position. If I let someone make a mistake that
        | severe, _I'd_ apologize to _them_ and work with them through the
       | solution and safeguards to prevent it from happening again.
       | 
       | A little bit of room for error is essential for learning, but
       | this is insane. I'm so glad the only person who has ever put me
       | in that kind of position is me, haha. This career would have
       | seemed so much scarier if the people I worked with early on were
       | willing to trust me with such terrifying error margins.
        
       | delichon wrote:
        | It makes you a better developer. I back up obsessively BECAUSE I
       | fucked up almost this badly and more than once. Hire yourself
       | back and charge a bit more for the extra wisdom.
        
       | andrewstuart wrote:
       | It is ALWAYS the fault of management when the databases are lost.
       | 
       | Engineers must never feel guilty if the company was run in such a
       | way as to make that possible.
        
       ___________________________________________________________________
       (page generated 2024-07-13 23:01 UTC)