[HN Gopher] Firing Myself
___________________________________________________________________
Firing Myself
Author : banzin
Score : 63 points
Date : 2024-07-13 19:22 UTC (3 hours ago)
(HTM) web link (backintyme.substack.com)
(TXT) w3m dump (backintyme.substack.com)
| doctor_eval wrote:
| > I found myself on the phone to Rackspace, leaning on a desk for
| support, listening to their engineer patiently explain that
| backups for this MySQL instance had been cancelled over 2 months
| ago. Ah.
|
| There is no part of this story that's the protagonist's fault.
| What a mess.
| cwales95 wrote:
| Yeah, cannot help but agree. It should have been impossible for
| this to happen in the first place.
| RaftPeople wrote:
| > _It should have been impossible for this to happen in the
| first place._
|
| Exactly, the CEO should have fired himself for allowing that
| environment to exist.
| occz wrote:
| Agreed. Negligence bordering on criminal all the way up the
| management chain. The fact that they blamed the author is
| telling about the culture as well.
| badgersnake wrote:
| I wouldn't blame you for resigning; it sounds like an awful
| environment.
|
| But individuals will always make mistakes; systems and processes
| prevent individuals' mistakes from doing damage. That's what was
| lacking here, not your fault at all. I just hope lessons were
| learned.
| unyttigfjelltol wrote:
| Should have spun it as a novel game feature. Like burning the
| library at Alexandria.
| jt2190 wrote:
| How did this
| fishtoaster wrote:
| I once made a huge fuckup.
|
| A couple years into my career, I was trying to get my AWS keys
| configured right locally. I hardcoded them into my .zshrc file. A
| few days later on a Sunday, forgetting that I'd done that, I
| committed and pushed that file to my public dotfiles repo, at
| which point those keys were instantly and automatically
| compromised.
|
| After the dust settled, the CTO pulled me into the office and
| said:
|
| 1. So that I know you know: explain to me what you did, why it
| shouldn't have happened, and how you'll avoid it in the future.
|
| 2. This is not your fault - it's ours. These keys were way
| overpermissioned and our safeguards were inadequate - we'll fix
| that.
|
| 3. As long as it doesn't happen again, we're cool.
|
| Looking back, 10 years later, I think that was exactly the right
| way to handle it. Address what the individual did, but realize
| that it's a process issue. If your process only works when 100%
| of people act perfectly 100% of the time, your process does not
| work and needs fixing.
| vvanders wrote:
| Yep, I've been adjacent to a couple of large ones over my
| career, close enough to see the details, and up close to a few
| more; this is the right way to approach it.
|
| Did the person know they screwed up? Did they show remorse and
| a willingness to dive in and sort it out? They likely feel like
| absolute shit about the whole thing, and you don't need to come
| down on them like a ton of bricks. If that much damage could be
| done by a single person, then you have a gap in your
| process/culture/etc., and that should be addressed from the top.
|
| One of the best takes I've seen on this was from a previous
| manager confronted with a situation similar to the article's (it
| was a full DB drop). The person tried to hand in their
| resignation on the spot; instead the manager (and I'm
| paraphrasing here) said: "You're the most qualified person to
| handle this risk in the future, as we've just spent $(insert
| revenue hit here) training you. Moving forward we want you to
| own backup/restore and make sure those things work".
|
| That person ended up being one of their best engineers and they
| had fantastic resiliency moving forward. It turns out that if
| you give someone a bit of grace and trust when they realize they
| screwed up, you'll end up with a stronger organization and
| culture because of it.
| NegativeK wrote:
| To quote a statistician friend: 100% of humans make mistakes.
|
| OP's leadership was shit. The org let a junior dev delete
| shit in prod and then didn't own up to _their_ mistake? Did
| they later go on to work at a genetics company and blame
| users for being the subject of password sprays?
| Aurornis wrote:
| > they instead (and I'm paraphrasing here) said: "You're the
| most qualified person to handle this risk in the future as
| we've just spent $(insert revenue hit here) training you.
|
| This is an old quote that has been attributed to different
| people over the years. It shows up in a lot
| of different management books and, more recently, LinkedIn
| influencer posts.
|
| It's good for lightening the situation and adding some
| levity, but after hearing it repeated 100 different times
| from different books, podcasts, and LinkedIn quotes it has
| really worn on me as somewhat dishonest. It feels clever the
| first time you hear it, but really the cost of the mistake is
| a separate issue from the decision to fire someone for it.
|
| In real-world situations, the decision to let someone go
| involves a deeper dive into assessing whether the incident
| was really a one-off mistake, or the culmination of a pattern
| of careless behavior, failure to learn, or refusal to adopt
| good practices.
|
| I've seen situations where the actual dollar amount of the
| damage was negligible, but the circumstances that caused the
| accident were so egregiously bad and avoidable that we
| couldn't justify allowing the person to continue operating in
| the role. I wish it was as simple as training people up or
| having them learn from their mistakes, but some people are so
| relentlessly careless that it's better for everyone to just
| cut losses.
|
| However when the investigation shows that the incident really
| was a one-time mistake from someone with an otherwise strong
| history of learning and growing, cutting that person for a
| single accident is a mistake.
|
| The important thing to acknowledge is point #3 from the post
| above: Once you've made an expensive mistake, that's usually
| your last freebie. The next expensive mistake isn't very
| likely to be joked away as another "expensive training"
| vvanders wrote:
| I'm fairly certain it occurred, since the story was first-hand
| and from about 12+ years ago (although they may have lifted it
| from similar sources). It's not a bad way to defuse things if
| it's clear there was an honest mistake.
|
| Your point on willingness to learn is bang on. If there's no
| remorse, or the negligence was intentional, then yes, that's a
| different story.
| Aurornis wrote:
| Oh I'm sure it occurred. The CEO was just repeating it
| from the countless number of management books where the
| quote appears.
|
| My point was that it's a story that gets overlaid on top
| of the real decision-making process.
| ay wrote:
| So much this.
|
| There is a great book which I think should be on the desk of
| every single person (especially leadership) working in any
| place where humans interact with machines:
|
| https://www.amazon.com/Field-Guide-Understanding-Human-Error...
| kmarc wrote:
| Besides the obvious takeaway of the story, to anyone who reads
| this: use pre-commit hooks (or something equivalent) to avoid
| this kind of problem.
|
| With the pre-commit framework, an example hook would be
| https://github.com/Yelp/detect-secrets
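|
| A minimal sketch of what that can look like in the pre-commit
| framework's .pre-commit-config.yaml (the rev below is a
| placeholder; pin whichever detect-secrets release you use):
|
|   repos:
|     - repo: https://github.com/Yelp/detect-secrets
|       rev: v1.5.0   # placeholder - pin a real release tag
|       hooks:
|         - id: detect-secrets
|           args: ['--baseline', '.secrets.baseline']
|
| Generate the baseline once with something like `detect-secrets
| scan > .secrets.baseline`; after that the hook should block
| commits that introduce new candidate secrets such as AWS keys.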
| nine_k wrote:
| Here's one of my favorite anecdotes / fables on the topic.
|
| A young trader joined a financial company. He tried hard to
| show how good and useful he was, and indeed he was, at the
| rookie level.
|
| One day he made a mistake, directly and undeniably attributable
| to him, and lost $200k due to that mistake.
|
| Crushed and depressed, he came to his boss and said:
|
| -- Sir! I failed so badly. I think I'm not fit for this job. I
| want to leave the company.
|
| But the boss grew furious:
|
| -- How dare you, yes, how dare you ask me to let you go right
| after we've invested $200k in your professional training?!
| amackera wrote:
| Less "Firing yourself" and more like liberating yourself from a
| toxic unprofessional clown show.
| anon115 wrote:
| so they didn't have a backup? that's on them lol
| dudus wrote:
| So a company gives junior engineers full access to a production
| database, without backups, so they can work on it developing
| features that require DDL SQL commands. I've seen it happen
| before; what I've never seen is someone blame the junior
| employee when things inevitably go south.
|
| I'm not sure I even believe that part of the story. This was
| either a very dysfunctional company or a looooong time ago.
| endofreach wrote:
| > I'm not sure I even believe that part of the story. This was
| either a very dysfunctional company
|
| The first sentence of the article tells us it was "a Social
| Gaming startup", and with that, everything we needed to know.
| kevin_nisbet wrote:
| I haven't personally seen this particular case either but I
| have no doubt it could happen. I've seen orgs where a blameless
| type culture isn't natural, and I've had to explain to the
| leadership that publicly humiliating (in jest) someone for
| getting caught by the phishing tests or posting private data to
| a pastebin type service is a bad idea.
|
| And I've interacted with plenty of people who externalize
| everything that goes wrong for them; naturally, some of these
| folks will be in leadership positions.
| loktarogar wrote:
| No junior should have been able to cause this much damage on
| their own without a safety net of some kind.
|
| It's on the company for cancelling their backups.
| freehorse wrote:
| This sounds like a company that does not learn from errors and
| looks for "junior engineer" scapegoats instead of examining the
| systemic processes that facilitated this; not a great place to
| stay, tbh. This was a chance for the company to reflect on some
| of their processes and take measures to avoid similar issues
| (and the steps to take are pretty obvious). And the description
| of what happened afterwards shows a probably toxic environment.
|
| It should never be like this, and especially in this case I blame
| OP 0%. This is something that could happen to anybody in such
| circumstances. I have not deleted a full database, but I have
| had to restore stuff a few times; I have made mistakes myself
| and have rushed to fix problems caused by others' mistakes, and
| every single time the whole point of the discussion was
| improving our processes so that it does not happen again.
| newaccountman2 wrote:
| > backups for this MySQL instance had been cancelled over 2
| months ago.
|
| Uhh, there's the problem, not that someone accidentally deleted
| something lol
| menzoic wrote:
| Clearly the fault of a terribly led engineering organization.
| Mistakes are almost guaranteed to happen. This is why good
| engineering orgs have guardrails in place. There were no
| guardrails whatsoever here. Accounts used to manually access
| production databases ad hoc should not have delete permissions
| for critical data. And worst of all, no backups.
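|
| As a rough sketch of that kind of guardrail in MySQL (user and
| schema names are made up for illustration), the accounts people
| touch day to day simply never get the destructive privileges:
|
|   -- read-only account for ad hoc poking around
|   CREATE USER 'dev_ro'@'%' IDENTIFIED BY '...';
|   GRANT SELECT ON gamedb.* TO 'dev_ro'@'%';
|
|   -- application account: DML only, no DROP/DELETE/ALTER
|   CREATE USER 'app_rw'@'%' IDENTIFIED BY '...';
|   GRANT SELECT, INSERT, UPDATE ON gamedb.* TO 'app_rw'@'%';
|   -- destructive rights stay on a separate break-glass admin
|   -- account that nobody uses casually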
| cybervegan wrote:
| There's a lot of responsibility there resting on your superiors
| because they weren't following "best practises". Sure, you
| fucked up, but if they had backups, it wouldn't have been such a
| disaster, and if you had a Dev environment to test against, it
| would have been a non-issue entirely. Straight out of Uni, you
| shouldn't have been expected to know that, but I bet you grew as
| a consequence.
| Spivak wrote:
| Yep, whether the leadership recognizes it or not, this is an
| organizational failure. No access controls for destroying prod
| data, no backups, no recovery plan, told to do testing in prod,
| whatever horrible process they have that required engineers
| regularly directly accessing the database.
| cstrahan wrote:
| I can relate to this with my own story, where I managed to delete
| an entire database -- my first day on the job, no less.
|
| I was hired by a little photo development company, doing both
| walk in jobs and electronic B2B orders. I was brought in to pick
| up on the maintenance and development of the B2B order placement
| web service the previous developer had written.
|
| Sadly, the previous dev designed the DB schema and software under
| the assumption that there would only ever be one business
| customer. When that ceased to be the case, he decided to simply
| create another database and spin up another process.
|
| So here I am on my first day, tasked with creating a new empty
| database to bring on another customer. I used the Microsoft SQL
| Server admin GUI to generate the DDL from one of the existing
| tables, created (and switched the connection to) a pristine, new
| DB, and ran the script.
|
| Little did I know, in the middle of many thousands of lines of
| SQL, the script switched the connection back to the DB from which
| the DDL was generated, and then proceeded to drop every single
| table.
|
| Oops.
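|
| To illustrate the trap (table and database names here are made
| up): a "drop and re-create" script generated from the source
| database typically carries its own USE statement, so running it
| while connected to the new, empty database quietly switches the
| session back before the drops:
|
|   USE [OriginalOrdersDb];    -- buried among thousands of lines
|   GO
|   DROP TABLE [dbo].[Orders]; -- now running against production
|   GO
|   CREATE TABLE [dbo].[Orders] (
|       OrderId INT PRIMARY KEY
|       -- ...rest of the generated columns
|   );
|   GO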
|
| Of course, the last dev had disabled backups a couple of months
| before I joined. My one saving grace was that he had some strange
| fixation on logging every single thing that happened in a bunch
| of XML log files; I managed to quickly write some code to rebuild
| the state of the DB from those log files.
|
| I was (and am) grateful to my boss for trusting my ability to
| resolve the problem I had created, and placing as much value as
| he did in my ownership of the problem.
|
| That was about 16 years ago. One of the best working experiences
| in my career, and a time of rapid technical growth for myself. I
| would have missed out on a lot if that had been handled
| differently.
| esafak wrote:
| > Sadly, the previous dev designed the DB schema and software
| under the assumption that there would only ever be one business
| customer.
|
| What kind of an assumption is that?!
| zitterbewegung wrote:
| > I found myself on the phone to Rackspace, leaning on a desk for
| support, listening to their engineer patiently explain that
| backups for this MySQL instance had been cancelled over 2 months
| ago. Ah.
|
| This is the issue, not what the author did. It was only a matter
| of time before the database was accidentally deleted somehow.
| kaiokendev wrote:
| Have been in situations just like this, on pretty much every side
| (the fuck-upper, the person who has to fix the fuck up, and the
| person who has to come up with a fuck-up remediation plan).
|
| The most egregious case involved an incompetent configuration
| that resulted in hundreds of millions of dollars in lost data and a
| 6-month long automated recovery project. Fortunately, there were
| traces of the data across the entire stack - from page caches in
| a random employee's browser, to automated reports and OCR dumps.
| By the end of the project, all data was recovered. No one from
| outside ever found out or even realized anything had happened -
| we had redundancy upon redundancy across several parts of the
| business, and the entire company basically shifted the way we did
| ops to work around the issue for the time being. Every department
| had a scorecard tracking how many of their files were recovered,
| and we had little celebrations when we hit recovery milestones.
| To this day only a few people know who was responsible (wasn't
| me! lol)
|
| Blame and derision are always inevitable in situations like this.
| It's how it's handled afterwards that really marks the competence
| of the company.
| xyst wrote:
| > One of the peculiarities of my development environment was that
| I ran all my code against the production database.
|
| Hahaha. I still see this being done today every now and then.
|
| > The CEO leaned across the table, got in my face, and said,
| "this, is a monumental fuck up. You're gonna cost us millions in
| revenue". His co-founder (remotely present via Skype) chimed in
| "you're lucky to still be here".
|
| this type of leadership needs to be put on blast. 2010 or 2024,
| doesn't matter.
|
| If it's going to cost "millions in revenue", then maybe it would
| have been prudent to invest time in proper data access controls,
| proper backups, and rollback procedures.
|
| Absolutely incompetent leadership should never be hired ever
| again. There should be a public blacklist so I don't make the
| mistake of ever working with such idiocy.
|
| The only people ever "fired" should be leadership. Unless it was
| done on purpose, in which case you should be subject to jail
| time.
| BoorishBears wrote:
| They let you stay after costing them millions in revenue then?
| Doesn't sound like the worst leadership to me?
| hcarvalhoalves wrote:
| > The CEO leaned across the table, got in my face, and said,
| "this, is a monumental fuck up. You're gonna cost us millions in
| revenue". His co-founder (remotely present via Skype) chimed in
| "you're lucky to still be here".
|
| Should expose the CEO's name. Between this and forcing you to
| work 3 days straight, that was the least professional way to
| handle this situation.
| alex_lav wrote:
| > I found myself on the phone to Rackspace, leaning on a desk for
| support, listening to their engineer patiently explain that
| backups for this MySQL instance had been cancelled over 2 months
| ago. Ah.
|
| As usual, a company with legitimately moronic processes
| experiences the consequences of those moronic processes when a
| "junior" person breaks something. Whoever turned off those
| backups as well as whoever thought devs (especially "junior"
| devs) should be mutating prod tables by hand are ultimately
| accountable.
| steve_adams_86 wrote:
| I can't imagine putting someone who's new to this work in that
| kind of precarious position. If I let someone make a mistake that
| severe, _I'd_ apologize to _them_ and work with them through the
| solution and safeguards to prevent it from happening again.
|
| A little bit of room for error is essential for learning, but
| this is insane. I'm so glad the only person who has ever put me
| in that kind of position is me, haha. This career would have
| seemed so much scarier if the people I worked with early on were
| willing to trust me with such terrifying error margins.
| delichon wrote:
| It makes you a better developer. I back up obsessively BECAUSE I
| fucked up almost this badly and more than once. Hire yourself
| back and charge a bit more for the extra wisdom.
| andrewstuart wrote:
| It is ALWAYS the fault of management when the databases are lost.
|
| Engineers must never feel guilty if the company was run in such a
| way as to make that possible.
___________________________________________________________________
(page generated 2024-07-13 23:01 UTC)