[HN Gopher] I Accidentally Deleted 7TB of Videos Before Going to...
___________________________________________________________________
I Accidentally Deleted 7TB of Videos Before Going to Production
Author : thevinter
Score : 445 points
Date : 2022-05-05 10:00 UTC (13 hours ago)
(HTM) web link (blog.thevinter.com)
(TXT) w3m dump (blog.thevinter.com)
| lnxg33k1 wrote:
| But are you a junior dev with less than one year of experience
| working by yourself alone at a company? No tech lead/help?
| birdyrooster wrote:
| Is 7TB a lot? Peers at personal arrays at orders of magnitude
| greater.
| iamben wrote:
| I like these stories. I think they resonate well for 'the rest of
| us'. I've made plenty of mistakes like this - you learn and grow,
| right?
|
| One of the best things about HN is that so many incredible,
| talented people post. It's incredibly inspiring to raise your own
| game, to see what the best are doing. But sometimes it's equally
| important to realise we all fuck up, and for every unicorn dev
| there's another thousand of us grinding away.
|
| OP - well done for sorting the problem and telling us all about
| it!
| rossdavidh wrote:
| Amen
| JacobiX wrote:
| > It involves bad practices and errors from multiple parties in a
| > world that might seem foreign to the "Silicon Valley" world but
| > paints an accurate picture of what development is for small IT
| > companies around the world
|
| Everybody makes mistakes even in the "Silicon Valley" world, but
| such problems could be easily caught by testing (which he did but
| it was restricted to the first page) and performing a simple dry-
| run.
| crispyambulance wrote:
| Exactly, _everyone_ makes mistakes. Sometimes huge ones. In
| hindsight or on the sidelines it's always easy to point out a
| few technical things that WOULD HAVE avoided catastrophe, but
| does that help? I think not (aside from a cautionary parable
| for interns).
|
| Things are complicated, people are human and forget things,
| there are pressures to "get it done" and override the
| guardrails. Everybody has horror stories. Some worse than
| others. Welcome to the OP's day of horror. I would think
| "Silicon Valley" dev-ops horror stories make this one seem like
| a triviality.
| batch12 wrote:
| It's like the first time you run rm -rf /path/to/delete/ *
|
| And realize it is taking too long...
| SnowHill9902 wrote:
| Can you explain? I feel like it removes / but not sure why.
| switch007 wrote:
| The error is the space before the asterisk. The original
| intention was to delete the contents of the folder
| /path/to/delete/. Instead, the asterisk enumerates files in
| the current directory and they get deleted
| KarlKode wrote:
| Besides recursively deleting /path/to/delete/ the command
| also deletes all (non hidden) content of the current
| directory (note the * at the end of the line). I assume the
| correct command would be /path/to/delete/*.
| Tesl wrote:
| It removes everything in the current directory
| pwg wrote:
| rm -rf /path/to/delete/ *
|
| Note the space between the last / and the *
|
| This will recursively remove the directory /path/to/delete
| and remove every file/directory that matches * in the current
| directory where 'rm' is being run.
|
| When what was most likely meant was:
|
| rm -rf /path/to/delete/*
|
| Note the lack of a space between the last / and *. This will
| remove all files that match * that reside in the
| /path/to/delete/ directory.
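The difference between the two commands comes down to how the line is
split into arguments before rm ever runs. A minimal sketch using
Python's shlex module, which follows POSIX-style word splitting; it
does not expand the glob, which the shell would do afterwards against
the current directory:

```python
import shlex

# The mistyped command: the space makes "*" an argument of its own,
# which the shell then expands against the current directory.
mistyped = shlex.split("rm -rf /path/to/delete/ *")

# The intended command: "*" stays attached to the path.
intended = shlex.split("rm -rf /path/to/delete/*")

print(mistyped)
print(intended)
```

The mistyped form hands rm two targets (the directory and a bare
glob) where the intended form hands it one.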
| progx wrote:
| Now you learned what a backup is.
| lpointal wrote:
| How can any enterprise rely only on such online services and
| not keep copies of their work on their own storage?
|
| At least store it on large multi-TB hard disks, connected with a
| SATA adapter when needed, and put them in a case in a safe place
| (better: two copies, stored in two places). What is the cost of
| the disks plus copy time relative to production work?
| stareatgoats wrote:
| A great success story as far as I'm concerned, even if it doesn't
| reflect well on Vimeo support. But a good reminder to have
| someone doublecheck your logic if you aim to delete massive
| amounts of data from production. And to check if the backups are
| working (producing restorable data) on a regular basis. Sometimes
| they just seem to be working, as I have learned the hard way...
| legalcorrection wrote:
| [deleted]
| JauntyHatAngle wrote:
| I'm baffled by this too. Unnecessary bridge burning I'd call
| it.
|
| It's not even necessary to the story.
| dkersten wrote:
| It's explained on the first line: "I'm a Junior Developer
| with less than one year of actual experience. Some of the
| things that might seem obvious to some might not be so for
| me". I guess it applies to this, too, not just the technical
| aspects.
| dsego wrote:
| I might've missed it, but I don't think that line existed
| when this was first posted.
| thevinter wrote:
| You're right and I edited the company's name (might be too late
| but better this way). That said I'm not very happy with the
| experience of working for TheCompanyTM anyways so I'm in the
| process of switching jobs.
|
| Thanks for the comment :)
| philliphaydon wrote:
| I would take down the post entirely.
|
| Your current job is linked in your CV.
| legalcorrection wrote:
| And try emailing the hackernews mods asking them to take
| this post down.
| yowlingcat wrote:
| As sibling comments indicate, I would advise emailing HN mods
| to take this post down and remove it from your blog and post
| it on an anonymous one. Here are the problems you will face:
|
| 1) Your current blog has your current employer + client
| linked to it.
|
| 2) Your github has your real name.
|
| 3) All of these have been crawled/archived.
|
| None of this bodes well for your career in the future. While
| I think your blog post is a great war story, it's really not
| a good idea to post it on your main account which can be
| traced back to your real name and CV because it will come up
| the next time you apply for a job.
|
| Unfortunately, even if it illustrates a great deal of
| ingenuity and creativity on your part in fixing a mess you
| made, many folks will take one look at it and be judgmental.
| You have to manage your reputation online and be careful.
| legalcorrection wrote:
| You're welcome and good luck!
| KingOfCoders wrote:
| Talking bad about your employer is great for finding a new
| job. Companies are eager to hire people who bad-talk them.
| breakfastduck wrote:
| He doesn't talk bad about his employer. He talks bad about
| his employer's client.
| mkr-hn wrote:
| Tech is like any other human endeavor. People talk.
| People change jobs and still like the people in the place
| they left.
| urbandw311er wrote:
| Would you have had the courage to post this here if you hadn't
| been able to fix it?
| tomkwong wrote:
| First, I want to say that this is a great post. You always grow
| stronger when you make mistakes. Writing it up solidifies
| understanding in the learning process.
|
| This story resonates with many people here because many
| experienced engineers have done something similar before. For me,
| destructive batch operations like this would be two distinct
| steps:
|
| 1. Identify files that need to be deleted;
|
| 2. Loop through the list and delete them one by one.
|
| These steps are decoupled so that the list can be validated. Each
| step can be tested independently. And the scripts are idempotent
| and can be reused.
|
| Production operations are always risky. A good practice is to
| always prepare an execution plan with detailed steps, a
| validation plan, and a rollback plan. And, review the plan with
| peers before the operation.
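The two decoupled steps can be sketched roughly as follows. This is
an illustration only: the function names and the suffix-based
selection rule are assumptions, not anything from the original post.

```python
from pathlib import Path


def plan_deletions(root: Path, suffix: str) -> list[Path]:
    """Step 1: decide. Return the files that would be deleted; touch nothing."""
    return sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())


def apply_deletions(plan: list[Path]) -> int:
    """Step 2: act. Delete exactly the files in the reviewed plan."""
    deleted = 0
    for path in plan:
        # Skipping files that are already gone keeps a re-run harmless.
        if path.exists():
            path.unlink()
            deleted += 1
    return deleted
```

Because plan_deletions only reads, its output can be printed,
diffed, or reviewed by a peer before apply_deletions is ever called.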
| notyourday wrote:
| > 1. Identify files that need to be deleted; 2. Loop through
| the list and delete them one by one.
|
| > These steps are decoupled so that the list can be validated.
| Each step can be tested independently. And the scripts are
| idempotent and can be reused.
|
| This is the most underrated comment.
|
| I'm saying it as someone who had the ultimate oversight of
| deleting hundreds of TBs per day spread across billions of files on
| different clouds and local storage.
| dsego wrote:
| > but at the time the code seemed completely correct to me
|
| It always does.
|
| > Well, it teaches me to do more diverse tests when doing
| destructive operations.
|
| Or add some logging and do a dry run and check the results,
| literally simple print statements:
|
| print("-----")
| print(f"Downloading video ids from url: {url}")
| print(ids)
| ...
| # delete()  dangerous action commented out until I'm sure it's right
| print(f"I'm about to delete video {id}")
| print(f"Deleted {count} videos")  # maybe even assert
| ...
|
| Then dump out to a file and spot check it five times before
| running for real.
| aqme28 wrote:
| Rather than commenting it out, I suggest adding a --live-run
| flag to scripts and checking the output of --live-run=false (or
| omitted) before you run it "live."
| sdevonoes wrote:
| But then you have double the chances of introducing a bug for
| the specific scenario we are talking about:
|
| Before: there is a chance there is a bug in my "delete" use
| case
|
| Now: what we had before plus the chance that there is a bug
| in my "--live-run" flag
| aqme28 wrote:
| You can make automated tests for your flag. You can't make
| automated tests for your code comments.
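A minimal sketch of the --live-run idea, assuming a hypothetical
delete_fn callback that stands in for the real API call. The point
is that the flag is ordinary code, so it can be covered by an
automated test, which a commented-out line cannot:

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Delete videos (dry run by default).")
    # store_true makes the flag default to False: omitting it never deletes.
    parser.add_argument("--live-run", action="store_true",
                        help="actually delete; without it, only print the plan")
    return parser.parse_args(argv)


def delete_videos(video_ids, delete_fn, live_run=False):
    """Return the ids that were (or, in a dry run, would be) deleted."""
    for video_id in video_ids:
        if live_run:
            delete_fn(video_id)
        else:
            print(f"[dry run] would delete video {video_id}")
    return list(video_ids)
```

A test can pass a list's append method as delete_fn and assert that
nothing is recorded unless live_run is set.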
| mbiondi wrote:
| Agreed, I've also been burned doing stupid things like this and
| always print out the commands and check them before actually
| doing the commit.
|
| As they say, measure twice, cut once.
|
| Don't feel bad, I think every professional in IT goes through
| something similar at one time or another.
| lifthrasiir wrote:
| Human-in-the-loop is such an important concept in ops and yet
| everyone (that's including me) seems to learn it the hard way.
| GordonS wrote:
| It's amazing the number of times I look at some simple code and
| think "nah, this is so simple it doesn't need a test!", add
| tests anyway (because I know I should)... and immediately find
| the test fails because of an issue that would have been
| difficult to diagnose in production.
|
| Automated tests are awesome :)
| dncornholio wrote:
| Dry run really is key here. Most automated tests wouldn't find
| this bug.
| pc86 wrote:
| I just want to say as someone currently working on a script to
| delete approximately 3.2TB of a ~4TB production database, this
| subthread is pure gold.
| hayd wrote:
| I'd make sure those include WARN or ERROR (I'd use logging to
| do that), that way you can grep for those. Spot checking might
| be difficult if the logs get long.
| V__ wrote:
| This was my first thought too. Another thing I like to do is
| to limit the loop to say one page or 10 entries and check after
| each run that it was correctly executed. It makes it a half-
| automated task, but saves time in the long run.
| hinkley wrote:
| Condensed to aphorism form: Decide, then act.
|
| There's a whole menagerie of failure modes that come from
| trying to make decisions and actions at the same time. This is
| but one of them.
|
| Another of my favorites is egregious use of caching, because
| traversing a DAG can result in the same decision being made
| four or five times, and the 'obvious' solution is to just add
| caches and/or promises to fix the problem.
|
| As near as I can tell, this dates back to a time when
| accumulating two copies of data into memory was considered a
| faux pas, and so we try to stream the data and work with it at
| the same time. We don't live there anymore, and because we
| don't live there anymore we are expected to handle bigger
| problems, like DAGs instead of lists or trees. These
| incremental solutions only work with streams and sometimes
| trees. They don't work with graphs.
|
| Critically, if the reason you're creating duplicate work is
| because you're subconsciously trying to conserve memory by
| acting while traversing, then adding caches completely
| sabotages that goal (and a number of others). If you build the
| plan first, then executing it is effectively dynamic
| programming. Or as you've pointed out, you can just not execute
| it at all.
|
| Plus the testing burden is so drastically reduced that I get
| super-frustrated having to have this conversation with people
| over and over again.
| password4321 wrote:
| SELECT COUNT(1) FROM table
| -- UPDATE table SET col='val'
| WHERE 1=1
| worble wrote:
| BEGIN TRANSACTION
| UPDATE table SET col='val' WHERE 1=1
| ROLLBACK
| password4321 wrote:
| Definitely better, when you can afford the overhead!
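The transaction-then-rollback check can be tried out with Python's
built-in sqlite3 module. This is a sketch against an in-memory
database; the videos table and its columns are made up for the
example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO videos (status) VALUES (?)",
                 [("old",), ("old",), ("new",)])
conn.commit()

# Run the real statement inside a transaction and see how many rows it hit.
cursor = conn.execute("UPDATE videos SET status = 'archived' WHERE status = 'old'")
affected = cursor.rowcount

# Roll back: the table returns to its previous state.
conn.rollback()
remaining_old = conn.execute(
    "SELECT COUNT(*) FROM videos WHERE status = 'old'"
).fetchone()[0]
print(affected, remaining_old)
```

If the affected count is not what was expected, nothing has been
lost; only once it looks right would the rollback be swapped for a
commit.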
| tomrod wrote:
| Exactly!
| mipmap04 wrote:
| I do this, too, but I also take a count of the expected number
| of items to be deleted as well. If my collection I'm iterating
| over doesn't have exactly that number of objects I expect, I
| don't proceed.
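The expected-count guard might look like the sketch below. The
function name and the delete_fn callback are placeholders; the
expected number would come from an independent source, such as a
count agreed on beforehand:

```python
def delete_if_count_matches(items, expected_count, delete_fn):
    """Refuse to delete anything unless the selection has exactly the expected size."""
    items = list(items)
    if len(items) != expected_count:
        raise RuntimeError(
            f"expected {expected_count} items, selected {len(items)}; nothing deleted"
        )
    for item in items:
        delete_fn(item)
    return len(items)
```

The check runs before the first deletion, so a mismatch leaves the
data untouched.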
| kortex wrote:
| This is why I like to always write any sort of user-script
| batch-job tools (backfills, purges, scrapers) with a "porcelain
| and plumbing" approach: The first step generates a fully
| declarative manifest of files/uris/commands (usually just json)
| and the second step actually executes them. I've used a --dry-
| run flag to just output the manifest, but I just read some
| folks use a --live-run flag to _enable_ , with dry-run being
| the default, and I like that much better so I'll be using that
| going forward.
|
| This pattern has the added benefit that it makes it really easy
| to write unit tests, which is something often sorely lacking in
| these sorts of batch scripts. It also makes full automation
| down the line a breeze, since you have nice shearing layers
| between your components.
|
| http://www.laputan.org/mud/mud.html#ShearingLayers
| InfoSecErik wrote:
| I tend towards a --dry-run flag for creative actions and
| --confirm for destructive actions. Probably slightly annoying
| that the commands end up seemingly different, but it sure
| beats accidentally nuking something important.
| gilleain wrote:
| Yes, I find command line tools that have a "--dry-run" flag to
| be very helpful. If the tool (or script or whatever) is
| performing some destructive or expensive change, then having
| the ability to ask "what do you think I want to do?" is great.
|
| It's like the difference between "do what I say" and "do what I
| mean"...
| bzxcvbn wrote:
| That's what I like about powershell. Every script can include
| a "SupportsShouldProcess" [1] attribute. What this means is
| that you can pass two new arguments to your script, which have
| standardized names across the whole platform:
|
| - -WhatIf to see what would happen if you run the script;
|
| - -Confirm, which asks for confirmation before any
| potentially destructive action.
|
| Moreover these arguments get passed down to any command you
| write in your script that support them. So you can write
| something like:
| [CmdletBinding(SupportsShouldProcess)]
| param ([Parameter()] [string] $FolderToBeDeleted)
| # I'm using bash-like aliases but these are really powershell cmdlets!
| echo "Deleting files in $FolderToBeDeleted"
| $files = @(ls $FolderToBeDeleted -rec -file)
| echo "Found $($files.Length) files"
| rm $files
|
| If I call this script with -WhatIf, it will only display the
| list of files to be deleted without doing anything. If I call
| it with -Confirm, it will ask for confirmation before each
| file, with an option to abort, debug the script, or process
| the rest without confirming again.
|
| I can also declare that my script is "High" impact with the
| "ConfirmImpact = High" switch. This will make it so that the
| user gets asked for confirmation without explicitly passing
| -Confirm. A user can set their $ConfirmPreference to High,
| Medium, Low, or None, to make sure they get asked for
| confirmation for any script that declares an impact at least
| as high as their preference.
|
| [1]: https://docs.microsoft.com/en-
| us/powershell/scripting/learn/...
| spookthesunset wrote:
| I'm a bit confused (because I didn't read the docs)... does
| calling it with "--whatif" exercise the same code path as
| calling without, only the "do destructive stuff"
| automagically doesn't do anything? Or is it a separate
| routine that you have to write?
|
| Cause if it is an entirely separate code path, doesn't that
| introduce a case where what you say you'll do isn't exactly
| what actually happens?
| bzxcvbn wrote:
| It's the first option. And yes, sometimes you have to be
| careful if you want to implement SupportsShouldProcess
| correctly, it's not something you can add willy-nilly.
| For example, if you create a folder, you can't `cd` there
| in -WhatIf mode.
| FriedrichN wrote:
| All my tools that have a possible destructive outcome use
| either an interactive stdin prompt or a --live option. I like
| the idea of dry running by default.
| rjh29 wrote:
| Going further, make it dry run by default and have an
| --execute flag to actually run the commands: this encourages
| the user to check the dryrun output first.
| mmcclimon wrote:
| The rule we have is that anything that is not idempotent and
| not run as a matter of daily routine must dry-run by default,
| and not take action unless you pass --really. This has saved
| my bacon many times!
| maweki wrote:
| Deleting actually is idempotent. Doing it twice won't be
| different from doing it once.
| maccard wrote:
| Deleting * may not be though. Your selection needs to be
| idempotent.
| maweki wrote:
| idempotency means that f(X) = f(f(X)). Modifying the X
| in between is not allowed. Is there really an initial
| environment where rm * ; rm * ; does something different
| than rm * once?
| einsty wrote:
| In the case of any live system, I would say yes.
| Additional, and different, files could have appeared on
| the file system in between the times of each rm *.
| mikeryan wrote:
| * is just short hand for a list of files. Calling rm with
| the same list of files will have the same results if you
| call it multiple times. That's idempotent.
|
| Your example is changing the list of files, or arguments
| to rm between runs. Same as pc86's example where the
| timestamp argument changes.
| pc86 wrote:
| In addition to what einsty said (which is 100% accurate),
| if you're deleting aged records, on any system of
| sufficient size objects will become aged beyond your
| threshold between executions.
| jameshart wrote:
| Right. You can kind of consider the state of a filesystem
| on which you occasionally run rm * purges to be a system
| whose state is made up of 'stuff in the filesystem' and
| 'timestamp the last purge was run'.
|
| If you run rm * multiple times, the state of the system
| changes each time because that 'timestamp' ends up being
| different each time.
|
| But if instead you run an rm on files older than a fixed
| timestamp, multiple times, the resulting filesystem is
| idempotent with respect to that operation, because the
| timestamp ends up set to the same value, and the
| filesystem in every case contains all the files added
| later than that timestamp.
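The fixed-cutoff point can be made concrete with a small model in
which a "filesystem" is just a dict of name to modification time.
This is a toy sketch with invented names, not a real purge script:

```python
def select_older_than(mtimes, cutoff):
    """Return the names whose modification time is strictly before cutoff."""
    return sorted(name for name, mtime in mtimes.items() if mtime < cutoff)


def purge(mtimes, cutoff):
    """Remove everything older than cutoff; return the surviving entries."""
    doomed = set(select_older_than(mtimes, cutoff))
    return {name: mtime for name, mtime in mtimes.items() if name not in doomed}
```

With the cutoff held fixed, purging twice gives the same result as
purging once, even if newer files arrive between the runs. A
selection like * or "older than 30 days from now" has no such
guarantee, because the selection itself moves.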
| hansel_der wrote:
| > Is there really an initial environment where rm * ; rm
| * ; does something different than rm * once?
|
| if * expands to the rm binary itself, maybe.
| maweki wrote:
| How is the system different after the first and after the
| second call?
| jgoldshlag wrote:
| If there is an rm executable in the current directory,
| and also one later in your PATH, the second run might use
| a different rm that could do whatever it wants to
| zrail wrote:
| Early in my career I used --yes-i-really-mean-it and then a
| coworker removed it with the commit message "remove
| whimsy".
|
| 'Twas a sad day.
| inglor_cz wrote:
| Yeah, that is what I recommend too.
|
| Instead of performing the dangerous action outright, just log a
| message to screen (or elsewhere) and watch what is happening.
|
| Alternatively, or subsequently, chroot and try that stuff on
| some dummy data to see if it actually works.
| thunderbong wrote:
| That is called experience.
|
| Good decisions come from experience. Experience comes from
| making bad decisions.
| dkersten wrote:
| I was involved with archiving of data that was legally required
| to be retained for PSD2 compliance. So it was pretty important
| that the data was correctly archived, but it was just as
| important that it was properly removed from other places due to
| data protection.
|
| This is basically the approach that was taken: log before and
| after every action exactly what data or files are being acted on
| and how. Don't actually do it. Then have multiple people
| inspect the logs. Once ok'd, run again, with manual prompts
| after each log item asking to continue, for the first few
| files/bits of data. Only after that was ok'd too did it run the
| remainder.
|
| In other things I've worked on, I've taken the terraform-style
| plan first, then apply the plan approach, with manual
| inspection of the plan in between.
| dredmorbius wrote:
| mv then rm is another idiom. So long as you have the space.
|
| For database entries, flag for deletion, then delete.
|
| In the files case, the move or rename also accomplishes the
| result of breaking any functionality which still relies on
| those file ... whilst you can still recover.
|
| Way back in the day I was doing filesystem surgery on a Linux
| system, shuffling partitions around. I meant to issue 'rm -rf .'
| in a specific directory, but I happened to be in root.
|
| However ...
|
| - I'd booted a live-Linux version. (This was back when those
| still ran from floppy).
|
| - I'd mounted all partitions _other_ than the one I was
| performing surgery on '-ro' (read-only).
|
| So what I bought was a reboot, and an opportunity to see what
| a Linux system with an active shell, but no executables,
| looks like.
|
| Plan ahead. Make big changes in stages. Measure twice (or 3,
| or 10, or 20 times), cut once. Sit on your hands for a minute
| before running as root. Paste into an editor session (C-x C-e
| Readline command, as noted elsewhere in this thread).
|
| Have backups.
| marcosdumay wrote:
| You mean cp then rm?
|
| And yes, copy, verify, delete. And make sure by the code
| structure that you either do the three on the same files,
| or they fail.
|
| Also, do it slowly, with just a bit of data on each
| iteration. That will make the verification step more
| reliable.
|
| Anyway, for a huge majority of cases, only having backups
| is enough already. Just make sure to test them.
| andi999 wrote:
| I think mv then rm is probably meant as 'windows trash
| bin' style.
| csours wrote:
| Make a plan, check the plan, [fix the plan, check the plan
| (loop)], do the plan
|
| See PDCA for a more time-critical decision loop.
| https://en.wikipedia.org/wiki/PDCA
| zeristor wrote:
| Yes, I love the idea of the Plan Apply.
| crispyambulance wrote:
| > ... Then have multiple people inspect the logs. Once ok'd,
| run again, with manual prompts after each log item asking to
| continue...
|
| This sort-of reminds me of some "critical" work I had to do a
| couple of decades ago. I was in a shop that used this
| horrifically tedious tool for designing masks for special
| kinds of photonic devices-- basically it was tracing out
| optical waveguides that would be placed on a crystal that was
| processed much like a silicon IC.
|
| The process was for TWO of us to sit in front of computer and
| review the curves in this crazy old EDA layout tool called
| "L-edit" before it got sent to have the actual masks made
| (which were very expensive). It took HOURS to check
| everything.
|
| The first hour was tolerable but then boredom started to
| creep in and we got sloppy. The whole reason TWO people got
| tasked with this was because it was thought that we would
| keep each other focused-- 2 pairs of eyes are better than
| one, right? Instead, it just underscored the tedium of it
| all. One day someone walked in and found us BOTH in DEEP
| SLEEP in front of the monitor. Having two people didn't
| decrease the waste caused by mistakes, it just bored the hell
| out of more people.
| foota wrote:
| How many mistakes did you catch?
| Freestyler_3 wrote:
| From his story I can tell he found one big mistake. The
| tedious work itself.
| mmmm2 wrote:
| Another good approach is do deletions slowly. Put sleeps
| between each operation, and log everything. That way if you
| realize something is broken, you have a chance of catching it
| before it's too late.
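A rough sketch of slow, logged deletion. The should_stop hook is an
assumption added for the example; it stands in for whatever "abort"
signal an operator would use (a flag file, Ctrl-C handling, and so
on):

```python
import logging
import time

log = logging.getLogger("slow_delete")


def delete_slowly(items, delete_fn, pause_seconds=1.0, should_stop=lambda: False):
    """Delete one item at a time, logging each one and pausing in between."""
    done = []
    for item in items:
        if should_stop():
            log.warning("stop requested after %d deletions", len(done))
            break
        log.info("deleting %r", item)
        delete_fn(item)
        done.append(item)
        time.sleep(pause_seconds)
    return done
```

The pause is what buys reaction time: at one item per second, a bad
selection costs a handful of items before someone notices, not the
whole set.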
| water8 wrote:
| It never hurts to ask for another set of eyes to review. At
| the least if something goes awry, the blame isn't solely on
| you.
| tauwauwau wrote:
| Once we get used to doing same thing multiple times a day, it
| doesn't matter if the log shows that we're about to take a
| destructive action, we'll still do it. Only thing that is
| foolproof is to not take the destructive action because
| people make mistakes, it's human nature. I don't know how this
| can be implemented, maybe encrypt the files, take a backup
| in some other location (which may not be allowed).
|
| Multiple reviewers here didn't catch the mistake
|
| https://www.bloombergquint.com/markets/citi-s-900-million-
| mi...
| dkersten wrote:
| > Multiple reviewers here didn't catch the mistake
|
| Sure, but we can only do so much. I find it's good bang for
| buck and alternatives that might prevent that are not
| always available, so we do the best we can. You gotta make
| a call on whether it's enough or not.
| slaymaker1907 wrote:
| I'm a fan of doing things temporally so data is very rarely
| actually deleted from the database. Most of the time, you
| just update the "valid_to" field to the current time.
| Sometimes real deletes are required such as with privacy
| requests, but I think that sort of thing is pretty rare.
|
| If your application has space concerns, you can modify this
| approach to be like a recycle bin where you delete records
| which are no longer valid and have been invalid for over a
| month (or whatever time frame is appropriate for your
| application). However, I think this is unnecessary in most
| cases except for blob/file storage.
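The "valid_to" approach, including the recycle-bin variant, can be
sketched with sqlite3. The table layout and function names here are
invented for illustration; timestamps are ISO-8601 strings so that
plain string comparison orders them correctly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT, valid_to TEXT)")
conn.executemany("INSERT INTO videos (title) VALUES (?)", [("intro",), ("outro",)])


def soft_delete(conn, video_id, now):
    # Mark the row as no longer valid instead of removing it.
    conn.execute(
        "UPDATE videos SET valid_to = ? WHERE id = ? AND valid_to IS NULL",
        (now, video_id),
    )


def live_titles(conn):
    rows = conn.execute("SELECT title FROM videos WHERE valid_to IS NULL ORDER BY id")
    return [title for (title,) in rows]


def purge_expired(conn, cutoff):
    # Recycle-bin variant: really delete rows that became invalid before cutoff.
    return conn.execute("DELETE FROM videos WHERE valid_to < ?", (cutoff,)).rowcount
```

Rows that were never soft-deleted have a NULL valid_to, which never
satisfies the comparison in purge_expired, so live data cannot be
purged by mistake.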
| Danieru wrote:
| That form had a couple weird checkboxes with odd wording.
| It is a famous mistake, but also rather understandable just
| because the form was cryptic.
| irrational wrote:
| Because everyone assumes that everyone else is looking at
| it more closely than they are. "I'll just do a cursory look
| since I'm sure everyone else is doing an in-depth look."
| Narrator: nobody did an in-depth search.
| HowardStark wrote:
| While this is a huge issue, a solution (well, a partial
| mitigation) I've seen and used is the "Pointing and
| Calling" technique. The basic idea is that you incorporate
| more actions beyond reading and typing or pressing a button
| --generally by having people point at something and say
| aloud what it is they're doing and what they expect to
| happen.
|
| It's used rather extensively in safety-critical public
| transportation in Japan [1] and to a lesser extent in New
| York (along with many other countries) [2]. This can easily
| extend to software without overcomplicating by just setting
| the expectation that engineers, QA, etc. do this even when
| alone.
|
| [1] https://www.atlasobscura.com/articles/pointing-and-
| calling-j...
|
| [2] https://en.wikipedia.org/wiki/Pointing_and_calling
| emerged wrote:
| "I'm removing that semicolon!" (Pointing)
| bbarnett wrote:
| Parent meant this sort of pointing.
|
| https://t.co/TjfX5K54H7
| akavel wrote:
| I heard of this technique, but unfortunately I don't see
| how it can be easily applied in software
| engineering/devops.
|
| Also, I now realized that aviation checklists seem to
| tend to be done similarly with gestures - at least from
| what I saw on YouTube, not sure if that's representative
| or only used during education (?)
| samus wrote:
| Spelling out loudly the command you are about to execute
| and explaining the reasoning behind it can help a lot
| too.
| samhw wrote:
| Hell, GitHub does that to an extent, with the "type the
| name of this repository to delete it" prompts. Typing the
| name of the repository isn't exactly perfect, but it's an
| interesting direction.
| Blackcatmaxy wrote:
| There was a thread recently about a repo that
| accidentally went private and lost all of its stars
| because of confusion with GH teams vs GH profile readme
| repo naming. I think this type of prompt is very useful
| for explicitly preventing the rare worst case scenarios
| but the problem is making any type of prompt "routine" so
| that our brains fail to process it.
| lostlogin wrote:
| This is it I think.
| https://news.ycombinator.com/item?id=31033758
| swid wrote:
| The suggestion in that post about how to fix it is good,
| and mirrors one I read in the Rachel by the Bay blog -
| type the number of machines to continue:
|
| https://rachelbythebay.com/w/2020/10/26/num/
|
| The takeaway from both is there is actually something to
| do which can wake people up when the stakes are high, and
| they might not be doing what they expect.
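A "type the number to continue" prompt is only a few lines. In this
sketch prompt_fn defaults to the built-in input and is injectable so
the check can be tested; the wording of the prompt is made up:

```python
def confirm_count(item_count, prompt_fn=input):
    """Proceed only if the operator types the number of affected items."""
    answer = prompt_fn(
        f"This will delete {item_count} items. Type the number to continue: "
    )
    return answer.strip() == str(item_count)
```

Unlike a y/n prompt, the expected answer changes with the size of
the operation, so a reflexive "y" fails and an unexpectedly large
number has to be read before it can be typed.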
| oauea wrote:
| And most importantly, don't let yourself get into the
| habit of copy pasting the value
| underwater wrote:
| I wonder if you could print some non visible characters
| in there to taint the copied value in some detectable
| way.
| skrtskrt wrote:
| I always copy-paste into that box as well, they should
| probably make at least an attempt at disabling pasting
| into it
| JadeNB wrote:
| > Then have multiple people inspect the logs.
|
| I think that this is the most important part of any check.
| Your parent refers to checking the log five times, but, at
| least in my experience, I won't catch any more errors on the
| fifth time than the first--if I once saw what I expected
| rather than what was there, I'll keep doing so. Of course
| everyone has their blind spots, but, as in the famous Swiss-
| cheese approach, we just hope that they don't line up!
| veltas wrote:
| Yep, even writing a simple wildcard at command-line I will
| 'echo' before I 'rm'.
| pjerem wrote:
| On computers I own, I always install "trash-cli" and i even
| created an alias for rm to trash. It's like rm, but it goes
| to the good old trash. It will not save your prod but it's
| pretty useful on your own computer at least.
| sam0x17 wrote:
| Indeed. I would say that framework or even language-level
| support for putting things in "dry-run" mode is something
| sorely missed from many modern frameworks and languages, that
| old C libraries used to do.
| OrwellianTimes wrote:
| Experience is the best teacher(tm)
| rawgabbit wrote:
| To ensure that the files are actually downloaded (step 1)
| before deleting the original (step 2), I would make step 1
| an input to step 2. That is, step 2 cannot work without step 1.
| Something like:
|
| (step 1) Download video from URL. Include the Id in the filename.
|
| (step 2) Grab the list of files that have been downloaded and
| parse to get the Id. Using the Id, delete the original file.
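One way to make step 2 depend on step 1 is to derive the deletion
list from the files on disk instead of from the remote listing. The
sketch below assumes a hypothetical naming scheme of
"<id>_<title>.mp4"; the real scheme would be whatever step 1 writes:

```python
from pathlib import Path


def downloaded_ids(download_dir: Path, suffix: str = ".mp4") -> list[str]:
    """Ids recovered from the non-empty files that step 1 wrote.

    Assumes the hypothetical naming scheme "<id>_<title><suffix>".
    """
    ids = []
    for path in sorted(download_dir.glob(f"*{suffix}")):
        if path.stat().st_size > 0:  # skip empty, presumably failed, downloads
            ids.append(path.name.split("_", 1)[0])
    return ids
```

An id can only reach the delete step if a non-empty local copy with
that id exists.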
| bambax wrote:
| Yes. Also, maybe not have a delete action in the middle of a
| script. It's usually better to build a list of items to be
| deleted. In that case, two lists: items to be deleted, items to
| be kept. Then compare the lists:
|
| - make sure the sum of their lengths == number of total current
| items
|
| - make sure items_to_be_kept.length != 0
|
| - make sure no two items appear in both lists
|
| - check some items chosen at random to see if they were sorted
| in the correct list
|
| At this point the only possible mistake left is to confuse the
| lists and send the "to_be_kept" one to the delete script; a dry
| run of the delete list can be in order.
| pc86 wrote:
| I've had good success with this approach, have two distinct
| scripts generate the two lists, then in addition to your
| items here also checking that every item appears in one of
| the lists.
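The list checks described above, plus the "every item appears in one
of the lists" check, fit in one small function. This is a sketch
with invented names; the random spot check is left to a human:

```python
def check_partition(all_items, to_delete, to_keep):
    """Raise ValueError unless the two lists split all_items cleanly."""
    all_set, delete_set, keep_set = set(all_items), set(to_delete), set(to_keep)
    if len(to_delete) + len(to_keep) != len(all_items):
        raise ValueError("list lengths do not add up to the total")
    if not keep_set:
        raise ValueError("keep list is empty; refusing to delete everything")
    if delete_set & keep_set:
        raise ValueError("some items appear in both lists")
    if delete_set | keep_set != all_set:
        raise ValueError("some items are in neither list")
    return True
```

Running this before the delete step turns several classes of
selection bug into an exception raised before anything is deleted.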
| ectopod wrote:
| This. The original approach can fail horribly if there's a
| problem on the server when you run the script for real. Your
| code can be perfect but that's no guarantee the server will
| always return what it ought to.
| ufo wrote:
| What do you recommend, to not get into trouble if there are
| spaces or newlines in the file names?
| marcosdumay wrote:
| Try not to delete stuff with Bash.
|
| This is the most reliable way. Bash has a few niceties for
| error handling, but if you are using them, you would
| probably fare better in another language.
|
| If you do insist on Bash, quote everything, and use the
| "${var}" syntax instead of "$var". Also, make sure you
| handle every single possible error.
| ricardobeat wrote:
| `set -e` will abort on any error, anywhere in the
| pipeline. It's a must for any critical script.
| kevinmgranger wrote:
| Don't use a shell script.
| ufo wrote:
| Do you mean, always pass the list directly to the next
| script via function calls, without writing it to an
| intermediate file / pipeline?
| plonk wrote:
| Yes, use the list argument to Python's subprocess.run for
| example. It's much easier to not mess up if your
| arguments don't get parsed by a shell before getting
| passed.
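To illustrate the list form of `subprocess.run`: each list element reaches the child process as exactly one argument, so spaces, newlines and shell metacharacters in file names are never re-parsed. In this sketch the child is just a Python one-liner that counts its arguments, standing in for a real tool:

```python
import subprocess
import sys


def count_args_seen_by_child(filenames):
    """Run a child process and report how many arguments it received."""
    # No shell is involved: the list is handed to the OS as-is.
    cmd = [sys.executable, "-c", "import sys; print(len(sys.argv) - 1)", *filenames]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return int(result.stdout.strip())
```

With `shell=True` and a single command string, the same names would be split and interpreted by the shell first.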
| mkr-hn wrote:
| This sounds like a "do nothing script."
|
| https://news.ycombinator.com/item?id=29083367
|
| It defaults to not doing anything so you can gradually and
| selectively have it do something.
|
| Learned about it when I posted my command line checklist tool on
| HN: https://github.com/givemefoxes/sneklist
|
| (https://news.ycombinator.com/item?id=25811276)
|
| You could use it to summon up a checklist of to-dos like "make
| sure the collection in the dictionary has the expected number
| of values" before a "do you want to proceed? Y/n"
| jagged-chisel wrote:
| This is how I do it in compiled code. In shell, I print the
| destructive command for dry runs - no conditions around whether
| to print or not, I go back to remove echo and printf to
| actually run the commands.
| zrail wrote:
| Another technique that I've used with good success is to write
| a script that dumps out bash commands to delete files
| individually. I can visually inspect the file, analyze it with
| other tools, etc and then when I'm happy it's correct just
| "bash file_full_of_rms.sh" and be confident that it did the
| right thing.
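A rough sketch of the "generate a file full of rm commands" idea, written in Python so that quoting is handled by `shlex.quote`. The function name and the `set -eu` header are choices made for this example, not something the comment above specifies:

```python
import shlex


def build_rm_script(paths):
    """Return the text of a shell script with one quoted `rm` per path."""
    lines = ["#!/bin/bash", "set -eu"]
    for path in paths:
        # `--` stops option parsing, so names starting with "-" are safe.
        lines.append(f"rm -- {shlex.quote(path)}")
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file, reviewed or diffed, and only then executed by hand.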
| cruano wrote:
| That was our SOP for running DELETE SQL commands on
| production too, a script that generates a .sql that's run
| manually. It saved our asses a fair amount of times
| ineedasername wrote:
| Yeah, wish I'd learned that the easy way. Fresh into one of
| my first jobs I was working with a vendor's custom
| interface to merge/purge duplicate records. It didn't have
| a good method of record matching on inserts from the
| customer web interface so a large % of records had
| duplicates.
|
| Anyway, I selected what I thought was a "merge all
| duplicates" option without previewing results. What I had
| _actually_ done was "merge all selected". So, the system
| proceeded to merge a very large % of the database... Into
| One. Single. Record.
|
| Luckily the vendor kept very good backups, and so I kept my
| job. Because I also luckily had a very good boss and I had
| already demonstrated my value in other ways, he just asked
| me "Well, are you going to make that mistake again?". I
| wisely said no, and he just smiled and said "Then I think
| we're done here."
|
| I have been particularly fortunate throughout my career to
| have very good managers. As much as managers get a lot of
| flak here on HN, done well they are empowering, not a
| hindrance, and I attribute a lot of success in my career to
| them.
| JadeNB wrote:
| > Yeah, wish I'd learned that the easy way.
|
| I think that, if you've only learned something like that
| the easy way, then you haven't learned it yet. As long as
| everything's only ever gone right, it's easy to think,
| I'm in a rush this one time, and I've never really needed
| those safety procedures before, ....
| karlding wrote:
| At a previous job the DB admin mandated that everyone had
| to write queries that would create a temporary table
| containing a copy of all the rows that needed to be
| deleted. This data would be inspected to make sure that it
| was truly the correct data. Then the data would be deleted
| from the actual table by doing a delete that joined against
| the copied table. If for some reason it needed to be
| restored, the data could be restored from the copy.
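The copy-then-delete procedure described above might look roughly like this. The sketch uses an in-memory SQLite database with an invented `videos` table purely for illustration; a production database would have its own syntax for temporary tables and joined deletes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT, duplicate INTEGER);
    INSERT INTO videos VALUES (1, 'intro', 0), (2, 'intro copy', 1), (3, 'outro', 0);
""")

# 1. Copy the rows that are about to be deleted into a temporary table.
conn.execute("CREATE TEMP TABLE to_delete AS SELECT * FROM videos WHERE duplicate = 1")

# 2. Inspect the copy before touching the real table.
staged = conn.execute("SELECT id, title FROM to_delete ORDER BY id").fetchall()

# 3. Delete from the real table only what is present in the copy.
conn.execute("DELETE FROM videos WHERE id IN (SELECT id FROM to_delete)")
remaining = [row[0] for row in conn.execute("SELECT id FROM videos ORDER BY id")]

# 4. If something turns out to be wrong, restore from the copy.
conn.execute("INSERT INTO videos SELECT * FROM to_delete")
restored = [row[0] for row in conn.execute("SELECT id FROM videos ORDER BY id")]
```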
| XorNot wrote:
| At the point you're doing this, you should be using a proper
| programming language with better defined string handling
| semantics though. In every place it comes up you'll have
| access to Python and can call the unlink command directly and
| much more safely - plus a debugging environment which you can
| actually step through if you're unsure.
| zrail wrote:
| Eh, I think that misses the point a bit. Use whatever you
| want to generate the output, but make the intermediary
| structure trivial to inspect and execute. If you're
| actually taking the destructive actions within your
| complicated* logic then there's less room to stop, think,
| and test.
|
| You could always generate an intermediary set,
| inspect/test/etc, and then apply it with Python. I've done
| that too, works just as well. The important thing is to
| separate the planning step from the apply step.
|
| * where "complicated" means more complicated than, for ex,
| `rm some_path.txt` or `DELETE FROM table WHERE id = 123`.
| KMnO4 wrote:
| Ah, I'm glad I'm not the only one who did this. It also means
| that you can fix things when they break halfway. Say you get
| an error when the script is processing entry 101 (perhaps
| it's running files through ffmpeg). Just fix the error and
| delete the first 100 lines.
| hinkley wrote:
| I tend to write one script that emits a list of files, and
| another that takes a list of files as arguments.
|
| It's simple to manually test corner cases, and then when
| everything is smooth I can just script1 |
| xargs script2
|
| It's also handy if the process gets interrupted in the
| middle, because running script1 again generates a shorter
| list the second time, without having to generate the file
| again.
|
| When I'm trying to get script1 right I can pipe it to a file,
| and cat the file to work out what the next sed or awk script
| needs to be.
| francis-io wrote:
| This was taught to me in my first linux admin job.
|
| I was running commands manually to interact with files and
| databases, but was quickly shown that even just writing all
| the commands out, one by one, gives room to personally review and
| get a peer review, and also helps with typos. I could ask a
| colleague "I'm about to run all these commands on the DB, do
| you see any problem with this?". It also reduces the blame if
| things go wrong if it managed to pass approval by two
| engineers.
|
| While I'm thinking back, another little tip I was told was to
| always put a "#" in front of any command I paste into a
| terminal. This stops accidentally copying a carriage return
| and executing the command.
| koolba wrote:
| > This stops accidentally copying a carriage return and
| executing the command.
|
| For a one-liner sure, but a multi line command can still be
| catastrophic.
|
| Showing the contents of the clipboard in the terminal
| itself (eg via xclip) or opening an editor and saving the
| contents to a file are usually better approaches. The
| latter lets you craft the entire command in the editor and
| then run it as a script.
| afiori wrote:
| From [0]:
|
| [For Bash] Ctrl + x + Ctrl + e : launch editor defined by
| $EDITOR to input your command. Useful for multi-line
| commands.
|
| I have tested this on windows with a MINGW64 bash, it
| works similarly to how `git commit` works; by creating a
| new temporary file and detecting* when you close the
| editor.
|
| [0] https://github.com/onceupon/Bash-Oneliner
|
| * Actually I have no idea how this works; does bash wait
| for the child process to stop? does it do some posix
| filesystem magic to detect when the file is "free"? I
| can't really see other ways
| mh- wrote:
| It does create and give a temporary file path to the
| editor, but then simply waits for the process to exit
| with a healthy status.
|
| Once that happens, it reads from the temporary file that
| it created.
| remram wrote:
| The 'enable-bracketed-paste' setting is an easier and more
| reliable way to deal with that:
| https://unix.stackexchange.com/a/600641/81005
|
| It will prevent any number of newlines from running the
| commands if they're pasted instead of typed.
|
| You can enable it either in .inputrc or .bashrc (with `bind
| 'set enable-bracketed-paste on'`)
| ineedasername wrote:
| _> literally simple prints statements_
|
| Yes, that can be a simple but powerful live on screen log. I
| developed a library to use an API from a SaaS vendor, in much
| the same way as the author. It was my first such project & I
| learned the hard way (wasted time, luckily no data loss or
| corruption) that print() was an excellent way to keep tabs on
| progress. On more than one occasion it saved me when the
| results started scrolling by and I did an _oh sh*t!_ as I
| rushed to kill the job.
| krono wrote:
| The No. 2 philosophy!
|
| Make sure you got everything out and off before you pull up
| your pants, or else you better be prepared to deal with all the
| shit that might follow!
| ElCapitanMarkla wrote:
| Nice work :D I tend to always add a `--dryrun` flag to any
| scripts like this these days so that when we move it to
| production we can run an extra test there just to be sure.
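A `--dryrun` flag of the kind mentioned above can be a few lines of `argparse`. This is a sketch under assumptions: the positional `ids` argument and the `delete_remote` callback are placeholders, not part of any script from the post:

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Delete videos by ID (sketch).")
    parser.add_argument("ids", nargs="+", help="IDs of the videos to delete")
    parser.add_argument("--dryrun", action="store_true",
                        help="print what would be deleted without deleting it")
    return parser.parse_args(argv)


def run(args, delete_remote):
    """Delete each ID, or only report it when --dryrun is set."""
    for video_id in args.ids:
        if args.dryrun:
            print(f"[dry run] would delete {video_id}")
        else:
            delete_remote(video_id)
```

Some people invert the default, so that the script does nothing unless an explicit flag such as `--execute` is passed; that variant fails safe when the flag is forgotten.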
| mikotodomo wrote:
| > Some of the things that might seem obvious to some might not be
| so for me, thanks!
|
| > my mind thought that url would refresh itself as soon as the
| page variable changed
|
| This is what I thought too when I read the code. I don't think
| it's obvious at all!
| xmprt wrote:
| That's actually surprising to me. In most languages that I've
| worked with, strings are immutable so the fact that url doesn't
| update is more obvious to me and I'd be surprised if it did
| update.
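For readers who shared the author's expectation, here is a small reconstruction of the kind of bug being discussed. It is not the original code; the endpoint is a placeholder. A formatted string is built once, and changing `page` afterwards has no effect on it:

```python
BASE = "https://api.example.com/videos"  # placeholder endpoint

page = 1
url = f"{BASE}?page={page}"  # evaluated once, right here

# Buggy pattern: `page` changes, but `url` is the same string every time.
stale = []
for page in range(1, 4):
    stale.append(url)

# Fixed pattern: rebuild the string inside the loop.
fresh = []
for page in range(1, 4):
    fresh.append(f"{BASE}?page={page}")
```

In the buggy loop every request goes to page 1, which matches the symptom of a test that only ever looked at the first page.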
| shantnutiwari wrote:
| What negativity and arrogance in the comments here. Jeez, it's
| like no one on HN ever made a mistake, a bunch of 10xers ninja
| programmers here. Please read this:
|
| >I also want to preface this whole post by saying that I'm a
| Junior Developer with less than one year of actual experience.
| Some of the things that might seem obvious to some might not be
| so for me, thanks!
|
| It's just some kid sharing a mistake they made and owning up.
| Ease up on the "LOL what an idiot" attitude
| nicbou wrote:
| More importantly, this person is helping us learn from their
| mistake. This is something that should be encouraged, not
| mocked.
| JacobiX wrote:
| Just to be fair also to some commenters, I think that the post
| had been edited after posting from what I remember ... so maybe
| the older comments are not very relevant.
| thevinter wrote:
| To clarify, I only removed the company name and added the top
| disclaimer
| [deleted]
| [deleted]
| noufalibrahim wrote:
| I think it was a great post. Reveals a knack for clarity in
| explanations. The mistake is simple enough and natural for a
| junior. If it were just one video or something, it would
| probably not even be noteworthy. I think the developer learned
| from the incident too. So all good.
|
| I do think Vimeo was irresponsible in the whole affair though.
| snowwrestler wrote:
| I'm impressed by their commitment to automation. If that was
| me, once I realized that manually uploading from Gdrive to
| Vimeo would fix the problem, I probably would have just
| committed myself to manually doing that all weekend. It would
| feel safer and serve as a sort of penance for screwing up the
| automation the first time.
|
| But nope, they went right back to scripting and got it done.
| KrishnaShripad wrote:
| I have done a lot of such blunders myself. Accidentally deleted
| my unchecked code and had to re-write everything from memory.
|
| I envy those who claim to make no mistakes at all.
| boygobbo wrote:
| Don't envy them - they are deluding themselves.
| aeroplanetext wrote:
| I've been there! At least when you write it the second time
| it goes more quickly.
| FunnyLookinHat wrote:
| I was actually really impressed with this individual! For
| someone who has less than a year of experience, they're showing
| quite a bit of initiative, drive, and curiosity - which really
| are what make or break engineers as they develop. Taking the
| time to do a blog post (effectively a post-mortem) and share it
| is even better!
|
| And yes - I've literally done this exact same error (with TB of
| video data!). Spending the following week remediating all of
| that data loss was a great lesson in patience and attention to
| detail. :-)
|
| OP: If you're ever looking for a job be sure to send me a
| message. Contact info in profile.
| Moru wrote:
| My mistake was on floppy disc with source code, other text
| files and images. Was hand editing (in hex disc editor) the
| floppy to get back the data, sector by sector. Fun times. Not
| going back there though :-)
| nso wrote:
| Mine was a DELETE FROM Users; WHERE... Fun was had.
| codegeek wrote:
| Usually the recommendation is to not start writing the
| DELETE query first. Write the SELECT query first and see
| the results. If you miss the WHERE clause, you will see
| that immediately. Then change SELECT * to DELETE. But I
| assume you have learned that lesson already :)
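The SELECT-before-DELETE habit can also be enforced in code by reusing one WHERE clause for both statements and comparing row counts before committing. A sketch against an in-memory SQLite database with an invented `users` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);
    INSERT INTO users VALUES (1, 'alice', 1), (2, 'bob', 0), (3, 'carol', 1);
""")

# One WHERE clause, shared by the preview and the delete.
WHERE = "WHERE active = 0"

# 1. Preview: a missing or wrong WHERE clause shows up here first.
preview = conn.execute(f"SELECT id, name FROM users {WHERE}").fetchall()

# 2. Delete with the same clause, then confirm the count matches the preview.
cursor = conn.execute(f"DELETE FROM users {WHERE}")
if cursor.rowcount != len(preview):
    conn.rollback()  # counts disagree: undo and investigate
else:
    conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```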
| Moru wrote:
| Yes, but it can't be stressed enough, always the first
| time for someone.
| tasuki wrote:
| Wrt "less than one year of experience", looking at Nikita's
| CV and GitHub, despite the title, they aren't really a junior
| developer :)
| franciscop wrote:
| True, he's been teaching programming since at least 2018, I
| was in a similar boat where I'd been programming for almost
| 5-7 years for fun and profit before my first official
| fulltime job.
| [deleted]
| 692 wrote:
| there's an argument that the best people around are the people
| who have already (or almost) made some big mistakes.
|
| I have made a couple of huge ones - luckily I kept my job
| comprev wrote:
| When interviewing candidates I always enquire about their
| professional mistakes. Their reply often is the decider
| between hiring/rejecting.
|
| I want to have colleagues who admit fault, be truthful about
| actions which lead to the issue, and learn from it. The
| learning includes organisations perhaps putting additional
| measures in place to prevent future issues.
|
| One candidate told a story of how he was On-Call early in his
| career and was told situations happened so rarely, just to
| continue living life as normal.
|
| Unfortunately for him, his pager went off at 02:00am while he
| was high as a kite on drugs - but felt he had to take action
| (mostly due to arrogance!).
|
| He promptly deleted production data and things only got worse
| when he tried to rectify the situation.
|
| Of course he was fired for his actions but ever since he's
| been stone cold sober when on-call.... just in case.
|
| He learned a valuable lesson about professional
| responsibilities.
| vsareto wrote:
| >When interviewing candidates I always enquire about their
| professional mistakes.
|
| "You see, my biggest mistake was programming in the first
| place! Since then, it's just been an apology tour"
| avgcorrection wrote:
| It's funny how so many managers on this board are like,
| yeah I focus disproportionately much on this one factor.
| Why? Because my intuition and experience says so.
| DoubleDerper wrote:
| Don't fire for the mistake. Fire for the inability of
| someone to own it, cover it up, or point fingers at others.
| comprev wrote:
| His honesty of admitting to being off his nut while on-
| call led to his firing, not the action of deleting
| things.
| BolexNOLA wrote:
| >His honesty of admitting to being off his nut
|
| This is now my favorite euphemism for being high
| YorickPeterse wrote:
| I currently have about 12 years of experience, and a few years
| back I accidentally cleaned up GitLab's database a bit too
| well. I wouldn't be surprised if the people being dismissive
| simply never worked on a moderately complex and large system,
| and thus don't understand how easy it is to make these kinds of
| mistakes.
| nspattak wrote:
| LOL!
|
| I have multiple years more experience than this man and still I
| could *very* *too* *easily* make a 7TB mistake (or likely more
| :P )
| grumple wrote:
| This sort of mistake happens all the time when you write in
| multiple languages. A key solution is code review, a standard
| practice which doesn't seem to have happened here (and
| certainly isn't the fault of a junior).
| [deleted]
| aristus wrote:
| Hey, everyone, ease up. I have: 1) dropped a production database
| because I thought it was the test database. 2) screwed up a print
| job costing $100,000 in today's money and had to do it again 3)
| crashed all of Facebook with a C++ bug. 4) crashed Facebook photo
| uploads, with a JavaScript bug, in my first month. 5) literally
| killed a startup's cash flow and caused them to lose their
| merchant account because I over focused on the wrong bugs.
| paintman252 wrote:
| You worked at Facebook, we get it
| hbn wrote:
| At my first development job (paid internship at a moderately-
| sized, though fast-growing business - maybe 300 people at the
| time?) I introduced a bug that didn't appear until a certain
| microservice stopped working (my code defaulted in the wrong
| direction when the ms failed) and as far as I can tell they may
| have lost or almost lost a pretty big account from it. In an
| after-hours meeting regarding the issue, one of the higher ups
| ended up storming out and never showing up again.
|
| In my defence, we had to get 2 PR approvals before anything was
| merged! But I definitely learned a thing or two from that
| experience
| [deleted]
| JasonFruit wrote:
| I believe if we're honest, we've all done stupid things we should
| have avoided. I remember a group of about 3000 emails that went
| out to insurance agents saying that policy #123456789 for Someone
| Funky was going to be cancelled by underwriting. I also remember
| very quickly figuring out how to automate Outlook's email recall
| feature.
|
| We've all made big dumb mistakes. Recover and learn.
| hexsprite wrote:
| when doing migrations/conversions I always write a script in dry-
| run mode first. I exhaustively check the results to make sure
| they are expected. Then try to do a real conversion/transfer of
| only the 1st file and make sure that worked. Then do a couple
| more. Etc. Only then do I feel confident to do the whole thing.
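The "one file, then a couple more, then everything" rollout can be expressed as a generator of growing batches, with the caller verifying results between batches. The function name and the default batch sizes are arbitrary choices for this sketch:

```python
def staged_batches(items, first_sizes=(1, 3)):
    """Yield a few small batches first, then everything that remains."""
    items = list(items)
    start = 0
    for size in first_sizes:
        if start >= len(items):
            return
        yield items[start:start + size]
        start += size
    if start < len(items):
        yield items[start:]
```

A driver loop would process one batch, check the outcome (by hand or with assertions), and only then ask for the next batch.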
| uptown wrote:
| Junior Dev: "I'm under an NDA"
|
| Also Junior Dev: "Here's my source code"
| [deleted]
| bufferoverflow wrote:
| Always do a dry run when deleting many things with code.
|
| - Captain Obvious
| mastazi wrote:
| > Vimeo doesn't provide an easy way of doing it. I wrote to the
| support team around October asking them if it was possible to do
| a migration, and they told us that they "will look into it"
| without letting us know anything ever since. [...] At one point,
| without letting us know anything, Vimeo decided it was a great
| idea to comply with our request and dumped all the videos present
| on OTT onto the new platform. No questions were asked [...] they
| were duplicating videos that were already uploaded.
|
| Oh yes Vimeo, the crappy company that won't let you play videos
| unless you enable autoplay in your browser[1].
|
| Selecting them as a provider was the actual mistake.
|
| [1] https://askubuntu.com/questions/777489/vimeo-video-not-
| playi...
| 0xbadcafebee wrote:
| This is more common than you think. Not just losing data, but not
| having a good handle on where the important parts of the system
| are, and how close you are to catastrophe. I find diagrams really
| help. I can recall a visual map of the system when I work on some
| component, and think, "OH, I remember seeing this component
| connected to a really critical thing, I need to check something
| first."
|
| Start by creating one empty page for every component of your
| system. You won't remember them all, but over time you can add
| missing ones. Each page is the authoritative source of info on
| that component. If you need more pages for one component, put
| them in a directory of the same name as the page and add ".d" to
| the directory name, and link to them from the first page.
| Finally, create a diagram (however you want) that includes every
| component you have a page for. Add the count of components to the
| top of the diagram. If the count on the diagram doesn't match the
| number of documents, time to update the diagram. If you ever add,
| remove or rename a page, time to update the diagram. If you do
| this the same way for every different system you have, you can
| link them all together and get both small and large scale
| diagrams. (p.s. don't waste time automating this unless you find
| the system changing constantly or you have a very big system)
| fedeb95 wrote:
| in my opinion any process that isn't preceded by another
| identical and automated process that varies only by the data
| involved is very risky to do in production. your management
| hopefully had a big reality check? or not because of backups?
| chanandler_bong wrote:
| Experience is directly proportional to the amount of equipment
| ruined or data lost.
|
| Even though you were fortunate not to lose any data, you gained a
| lot of experience!
| rexreed wrote:
| A big part of the reason for the problem in this post is because
| Vimeo made it impossible to move videos from one Vimeo product to
| another Vimeo product: "There were roughly 500 videos on VimeoOTT
| that had to be transferred to Enterprise and Vimeo doesn't
| provide an easy way of doing it."
|
| I have found working with Vimeo to be very frustrating,
| especially recently. They have a great video solution, especially
| for streaming, but they seem to put these unnecessary and
| frustrating roadblocks that make me constantly question my
| decision to use Vimeo. From the inability to move videos from one
| place to another, requiring complete uploads (resulting in
| problems like this post) to nonsensical limits and pricing,
| especially on their new webinar offering, which has a limit of
| 100 registered attendees. For anyone who has run webinars before,
| this makes no sense since 100 registered attendees usually means
| 20-30% of those people actually attend, so you're capped at 20-30
| live attendees. They should price it like most event sites and
| charge per live attendance rather than registration.
|
| Regardless, I've been very frustrated with Vimeo since it could
| be so much better if they didn't have these roadblocks in place.
| If they could have easily enabled moving videos from one product
| to another, the post (and 7TB of lost videos) would never have
| happened. It wasn't always this way with Vimeo, but they went IPO
| in May 2021 and it's no surprise they're turning the screws on
| their product offering and pricing now.
| beeforpork wrote:
| > I Accidentally Deleted 7TB of Videos ...
|
| Spoiler:
|
| But there was a backup that could be reuploaded in time and
| everything was fine in the end.
| nix23 wrote:
| ZFS -> Snapshot....always!! Before touching writable-data (my
| personal mantra) ;)
| hnlmorg wrote:
| I love ZFS too but that's not really relevant to this
| discussion because the deleted items were on a video hosting
| platform and the company did already have local copies.
| nix23 wrote:
| Yes and? Make a snapshot on live. Again, never touch data
| before snapshot.
| volume wrote:
| This reminds me of some IRC threads. You post a question and
| someone's answer assumes you are going to rip out and
| replace your existing prod setup just so you can use their
| pet tool.
| hnlmorg wrote:
| At risk of sounding snarky, you do understand how video
| hosting platforms work? Customers, even enterprise ones,
| don't have shell access let alone control over what file
| system is used.
|
| There are a hundred ways this problem could have been
| prevented but ZFS isn't one of them.
| whiplash451 wrote:
| So, "i am under NDA" but I reveal my client's name and a lot of
| sensitive details about what we are doing. LOL.
| dewey wrote:
| Where do you see the clients name? I only see Vimeo being
| mentioned.
| ceejayoz wrote:
| It has been edited.
|
| https://news.ycombinator.com/item?id=31271836
| daniel-cussen wrote:
| Well at least deleting the secret is a step back toward the
| NDA he left behind.
| Closi wrote:
| It still breaks the NDA:
|
| * Firstly, you don't have to name the company to break the
| NDA anyway (you are still disclosing information you aren't
| supposed to disclose regardless of if it can be linked back
| to the company).
|
| * Secondly, the client is still named on the front page of
| the website.
|
| * Thirdly, OP posted this with his real name that trivially
| links back to the dev shop he is working for. The site also
| has his CV which lists the client again, with a description
| of the project to link it to the post.
|
| * Finally, The client can trivially be identified by
| googling the description in the second paragraph (i.e. just
| search the named countries in operation plus the word Gym).
| 12ian34 wrote:
| Not all NDAs have the same terms. I could write up and
| serve an NDA right now that still counts as an NDA yet
| permits everything in your list.
| Closi wrote:
| All contracts vary in terms, but I've never seen an NDA
| that says "you can talk about the content under NDA as
| long as you don't mention the business's name, and just
| identify who they are in a roundabout way instead".
|
| "Well i'm under an NDA, so I can tell you all the
| specifics of the project, but I can't tell you the
| company's name. I _can say_ they own the largest search
| engine though, and have a market cap of 1.5 trillion, and
| rhyme with "Roogle", but I really can't say who they
| are. Anyway, here is some code I wrote for them and a
| description of how we nearly ruined their project along
| with me calling them incompetent..."
| dewey wrote:
| Got it. To be honest I'd be hesitant to publish a blog post
| like that with your name + current company name attached to
| it.
|
| It's a bit different to share a fun story a few years later
| about that time you almost wiped production.
| [deleted]
| unfocused wrote:
| I'm currently working with FOIA software, and a regular user can
| only delete one document at a time from the information that they
| verify/redact before sending out. They can't even multi select!
| Only an admin can delete multiple documents at one time.
|
| I'm guessing users accidentally deleted multiple documents one
| too many times, and now it's baked in.
| qwertox wrote:
| Aaaahhh, the feeling you get when you notice that you fucked up.
| Everything gets quiet, body motion stops, cheeks get hot, heart
| starts to beat and sinks really low, "fuck, fuck, fuck, fuck,
| fuck, fuck, fuck, fuck, fuck, fucking shit". Pause. Wait. Think.
| "Backups, what do I have, how hard will it be to recover? What is
| lost?". Later you get up and walk in circles, fingers rolling the
| beard, building the plan in the head. Coffee gets made.
| wonderwonder wrote:
| lol, it's amazing how fast the blood leaves your face when your
| mind transitions from "cool that worked well" to "Oh no, what
| have I done?"
|
| That backups comment sounds very familiar.
|
| I accidentally deleted a client's products table from the
| production database in my early years as a solo dev. There was
| only a production database. Luckily I had written a feature to
| export the products to an excel sheet a while before and
| happened to have an excel copy from the prior day. I managed to
| build an export to ingest the excel and repopulate the table in
| record speed while waiting for my phone to ring and the client
| to be furious. Luckily they never found out.
| [deleted]
| gwerbret wrote:
| I had this experience when, years ago on my first day as group
| lead at $JOB, I was being shown a RAID 5 production server that
| held years of valuable, irreplaceable data (because there were
| no backups. Let me repeat that there were no backups). For some
| bizarre reason, I thought "oh cool, hot-swappable drives" and
| pulled one out of the rack. This naturally resulted in loud,
| persistent beeping from the machine, which everyone ignored on
| the assumption that the fellow who was just hired as the group
| lead knew what the f he was doing.
|
| While I _didn't_ know what I was doing, I did manage to get
| the beeping to stop, and had to come in at 5 a.m. the next day
| to restripe the drive I'd yanked out.
|
| Did I mention there were no backups? When I was a little bit
| more seasoned on the job, I raised a polite but persistent
| issue with management of the need for durable backups. Although
| I kept at it for months, they thought about it, talked about
| it, and ultimately did nothing. A few months after I left, the
| entire array failed. Since the group's work relied on the
| irreplaceable data, all work ground to a halt for the several
| months it took for an off-site company to recover the data.
| ycmjs wrote:
| My previous boss stores company data this same way. I begged
| him to approve the $5 per month cost for Backblaze on the
| computers I used. He approved it for some, but not all (about
| half of the ten computers). He completely rejected the idea
| for the company's data. After all, it was already protected
| by RAID.
| ricardobeat wrote:
| Isn't RAID 5 supposed to survive a single disk being taken
| out?
| windsurfer wrote:
| If a second drive fails after the first while rebuilding
| (which happens more often with larger and slower drives),
| the data is lost.
| arminiusreturns wrote:
| Theoretically but there are often other things at play. I
| know the story is older but since about 2015 raid5 has been
| dead to me, mostly because at current drive sizes a raid5
| rebuild takes so long your chance of a cascade failure and
| losing a second drive which makes it a "send to a recovery
| lab" risk. Anywhere you would use raid5 just do raid6.
| cntrl wrote:
| damn, your description is spot on and reading this triggered
| PTSD in me... Last time I had this feeling was two years ago
| when I destroyed one of our development servers because of a
| failed application update. I know exactly how I wished Ctrl + Z
| to exist in real life... We had backups of the machine, but it
| was still kind of a humiliating feeling to tell everybody and
| ask for restore from backup (everybody was cool though in the
| end)
| Taylor_OD wrote:
| God the feeling of having your body temp rise based purely on
| realizing you fucked up is so relatable.
| deltarholamda wrote:
| Pffft, it's not a real panic until you weigh the pros and cons
| of leaving the country with nothing but the clothes on your
| back and becoming an illegal immigrant shepherd in a nation with
| too many consonants in its name.
|
| (Your description is so, so, spot on.)
| beardedetim wrote:
| Ah, the goat farmer fantasy that always seems to come _at the
| cusp_ of the solution.
| CapmCrackaWaka wrote:
| The worst panic I've felt actually took me over the precipice
| into peaceful oblivion. I started simply saying to myself "oh
| well... It's just a job".
| sergiotapia wrote:
| I lost 1hr and 30 minutes of a Slack-like app (chat messages).
| Luckily at the time we were pretty small so not much data was
| lost but holy shit did that make me almost throw up.
|
| Thank God my automatic backups were so close to the mistake I
| made and I didn't lose 24 hours.
|
| Haven't made a mistake like that since and I don't destroy DB
| records like that anymore.
| Oarch wrote:
| Poetic! Love it
| Helitio wrote:
| Just a note: being able to click yourself a server at Google, AWS,
| etc. might be cheap enough, even paying for 15TB of traffic.
| DonHopkins wrote:
| >... the "Silicon Valley" world ...
|
| To rebillionizing!
|
| https://www.youtube.com/watch?v=wGy5SGTuAGI&t=369s
|
| ...yeah, the Tres Commas bottle was on the DELETE key. The corner
| of it was just, it juuuust got on there...
| lesgobrandon wrote:
| [deleted]
| [deleted]
| dclowd9901 wrote:
| His solution reminds me of how I used Cypress to generate test
| accounts on our local admin dashboard for Cypress tests, since
| our api was inadequate (it didn't do the billing signoff required
| to create accounts that last longer than a month... don't
| ask...).
| SnowHill9902 wrote:
| Related: is there any HTTP API model that supports transactions
| with commit and rollback? Also isolation levels? Usually one
| wants to set_stock(get_stock() + 10) but there may be competing
| requests from various clients between both calls, resulting in races.
| Usual web APIs seem vulnerable to this.
| jffry wrote:
| Wouldn't the model be to expose an increment_stock(10) type
| HTTP endpoint instead, and the backend can ensure it's atomic?
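|
| A minimal sketch of that idea, assuming a SQL-backed service. The
| table name, column names and the increment_stock() helper are
| hypothetical; the point is only that the read and the write happen
| in a single UPDATE statement instead of a separate get and set.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one stock row (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO stock VALUES ('widget', 5)")
    conn.commit()
    return conn


def increment_stock(conn, item, delta):
    """Add delta to the item's quantity in one statement and return the new value.

    Because the database evaluates `qty + ?` itself, no client ever has to
    read the old value and write it back, which is where the race comes from.
    """
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE stock SET qty = qty + ? WHERE item = ?", (delta, item)
        )
        if cur.rowcount != 1:
            raise KeyError(item)
    row = conn.execute("SELECT qty FROM stock WHERE item = ?", (item,)).fetchone()
    return row[0]
```

| An HTTP handler for something like POST /stock/widget/increment
| would then just call increment_stock() with the requested delta.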
| LinAGKar wrote:
| Shouldn't that be `page={page}` rather than `page{page}`? Or
| better yet, use the requests `params` argument.
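|
| To illustrate the difference, here is a small offline sketch using
| only the standard library. The buggy_url(), fixed_url() and
| page_param() helpers are made up for this example; with the requests
| library the equivalent would be passing params={"page": page,
| "step": 100} to requests.get().

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://api.ourservice.com/media"


def buggy_url(page):
    # Missing "=": the query string becomes "page2&step=100",
    # so there is no parameter named "page" at all.
    return f"{BASE}?page{page}&step=100"


def fixed_url(page):
    # Let the library build and escape the query string.
    return f"{BASE}?{urlencode({'page': page, 'step': 100})}"


def page_param(url):
    """Return the value of the 'page' query parameter, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("page")
    return values[0] if values else None


if __name__ == "__main__":
    for page in (1, 2, 3):
        print(page, page_param(buggy_url(page)), page_param(fixed_url(page)))
```

| With the buggy URL a server never sees a "page" parameter, so it
| would plausibly keep returning its default page on every request.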
| hanly_paul wrote:
| I am also a junior with 1 year's experience, just in Python but
| none with the requests module or web development. If the 'page'
| variable is being changed, was the error something specific to
| this module, not refreshing the page?
| orange_puff wrote:
| As everyone else has already pointed out, better testing would
| have been very useful here. For instance, print(len(our_ids))
| would have been a dead giveaway that something was up.
|
| I am also a junior dev and completely empathize with being given
| a lot of responsibility and potentially messing up. I think for
| someone with < 1 year of experience, to solve the problems you
| created as fast as you did is really impressive. Thankfully your
| story ends well :)
| AtNightWeCode wrote:
| The conclusion should include that backup at separate locations
| is key. Also, that the backups are tested and work. I worked with
| clients that had everything from lightning strikes destroying
| servers to ransomware to people making mistakes. No problem with
| solid backups. There is a difference between a good process and
| skill.
| thisNeeds2BeSad wrote:
| The only thing that I can remember helping against such actions
| is the exponential need for confirmation by intent.
|
| That means: if you delete one small file, you need one
| confirmation; if you delete thousands, you need to state an intent
| such as "I expect a thousand files to be deleted". Same goes for
| size. So not an OK button, but instead a form allowing you to enter
| the dimensions of the intended outcome: 100 files max, 1 GB max
| deleted.
|
| If the request goes over the intent, the system aborts.
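|
| A rough sketch of such a guard, under the assumption that the caller
| knows each item's size up front. IntentExceeded and
| delete_with_intent() are invented names for illustration, not part
| of any real tool.

```python
class IntentExceeded(Exception):
    """Raised when a delete request is larger than the declared intent."""


def delete_with_intent(items, delete_fn, max_count, max_bytes):
    """Delete items only if they fit within the caller's declared limits.

    items is a list of (name, size_in_bytes) pairs. All checks run before
    the first deletion, so an oversized request deletes nothing.
    """
    total_bytes = sum(size for _, size in items)
    if len(items) > max_count:
        raise IntentExceeded(
            f"{len(items)} items requested, intent allows {max_count}"
        )
    if total_bytes > max_bytes:
        raise IntentExceeded(
            f"{total_bytes} bytes requested, intent allows {max_bytes}"
        )
    for name, _ in items:
        delete_fn(name)
    return len(items)
```

| For example, delete_with_intent(items, api_delete, max_count=100,
| max_bytes=10**9) would abort before touching anything if a buggy
| listing step handed it 900 items.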
| dncornholio wrote:
| What is the f doing in
|
| url = f"https://api.ourservice.com/media?page{page}&step=100 ?
| throwaway744678 wrote:
| It's a Python f-string [0]. A way of formatting a string by
| directly including a Python expression between curly braces.
|
| [0] https://docs.python.org/3/tutorial/inputoutput.html#tut-f-
| st...
| qwertox wrote:
| "f-strings", a (new) way to format strings.
| jraph wrote:
| f for format ("formatted string").
|
| It does the same thing as
| `https://api.ourservice.com/media?page${page}&step=100` [sic]
| in Javascript, or
| "https://api.ourservice.com/media?page$page&step=100" in Bash,
| PHP, Perl or Groovy (and other languages). It opts you into
| variable substitution / interpolation in the string literal.
|
| In Python these string literals are called f-strings if you
| want to look it up. They are defined in PEP 498 - Literal
| String Interpolation [1] and available since Python 3.6.
|
| [1] https://peps.python.org/pep-0498/
|
| [sic] there probably would be a missing '=' in this url after
| "?page"
| fifticon wrote:
| if it's python, it's the formatting/interpolation string
| marker.
| vjust wrote:
| So much wisdom in these comments, people have different styles of
| being careful, and each makes sense in a nuclear "go" situation
| p0d wrote:
| For many years I have had a private blog. I like to write but
| realised 99% of us are not interesting to read. This is a young
| guy processing his thoughts. Not "teaching" the rest of us as he
| frames it. This should have stayed in-house and personal. The
| company can then decide which clients, authorities to contact if
| necessary. There is a book in all of us as they say. For most of
| us it should stay there.
| donalhunt wrote:
| fwiw I would probably have turned to rclone.org for this. It
| doesn't have support for vimeo out of the box but the Vimeo API
| seems sane enough that it would be trivial to implement uploads
| quickly.
|
| Previously used rclone for doing massive transfers between cloud
| providers using "cheap" on-demand servers which provide unlimited
| data transfer (the public clouds make this very expensive).
| ghoomketu wrote:
| The more I read about vimeo the more I wonder what's up with
| these guys.
|
| Only recently they made some god-awful policy changes for
| content creators(1), but it looks like they treat their
| enterprise customers just the same.
|
| Surely, there must be better alternatives for hosting videos than
| being at the mercy of a company who couldn't care less about big
| paying customers.
|
| (1) https://www.theverge.com/2022/3/18/22985820/vimeo-
| bandwidth-...
| pfista wrote:
| mux.com seems like a great alternative and is super developer
| focused.
| bbbush wrote:
| scary. maybe as well just pay vimeo to restore data.
| IYasha wrote:
| So, apparently, vimeo has better support than youtube (not
| informative, but at least they DO something). Duly noted.
| aasasd wrote:
| After having read about plenty of such cases over the years, I
| have a persistent dread of pulling something like that myself, to
| the point of being nervous with '*' in the terminal, and
| generally checking everything twice. (And also have some kind of
| mild horror-high from corporate snafu stories, weirdly
| reminiscent of Ballard's 'Crash').
|
| So: I never feed the data straight from the gathering script into
| the modifying script, at least not in the first runs. Instead, I
| dump the whole list of items into a file, count them in there,
| gawk at them to see that they're right, and compare with the
| source data by hand until I begin to annoy myself. Then I feed
| that file to the second script.
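|
| The two-step workflow could look roughly like this. The file format
| (a JSON list) and the helper names dump_candidates() and
| load_reviewed() are assumptions made for the sketch; the human
| review happens between the two calls.

```python
import json
import tempfile
from pathlib import Path


def dump_candidates(ids, path):
    """Step 1: write the gathered IDs to a file for manual inspection."""
    ids = list(ids)
    Path(path).write_text(json.dumps(ids, indent=2))
    return len(ids)


def load_reviewed(path, expected_count):
    """Step 2: read the reviewed file back, refusing obviously wrong input."""
    ids = json.loads(Path(path).read_text())
    if len(ids) != expected_count:
        raise ValueError(f"expected {expected_count} ids, file has {len(ids)}")
    if len(set(ids)) != len(ids):
        raise ValueError("file contains duplicate ids")
    return ids


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "to_delete.json"
        count = dump_candidates(["v1", "v2", "v3"], path)
        print(f"wrote {count} ids to {path}; review before continuing")
        print(load_reviewed(path, expected_count=3))
```

| The duplicate check is there because a gathering script that keeps
| fetching the same page tends to produce the same IDs repeatedly.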
| Peleus wrote:
| Under NDA but I'll give rough details of what's occurring while
| also naming my client and disparaging them to the public.
|
| Well that's a brave move...
| searchableguy wrote:
| They said they are a junior developer with not much experience.
| I'm afraid they may not know what is and isn't covered under
| NDA.
| KingOfCoders wrote:
| My tip would be: read what you sign.
| thevinter wrote:
| Just to clarify, my company is under an NDA and not
| personally me. It also encompasses only the actual project
| details so a post like this is legally compliant. (Not a
| lawyer, might be wrong)
| KingOfCoders wrote:
| So you're not under an NDA as you wrote.
|
| I don't know your position but I would assume an NDA is
| part of your freelancer or employee contract.
| mkr-hn wrote:
| OP might at least want to consult with a contract lawyer
| in Italy to make sure.
| Closi wrote:
| You likely have a confidentiality clause in your
| contract.
|
| If your company is under an NDA, your company will have
| an obligation to ensure that _you_ also do not disclose
| information.
|
| Companies are mostly just collections of people, and an
| NDA is mostly meant to stop people working on the project
| from talking about the project.
| bluehatbrit wrote:
| In every contract I've ever signed, part of the NDA
| clause with my employer is that I'm also bound by any NDAs
| my employer is bound by, so if the employer signs an NDA
| with a customer, I would also be bound by that. It might
| be worth checking your contract, otherwise having a
| company sign an NDA doesn't hold much weight if their
| staff are free to go around sharing the information
| themselves.
| [deleted]
| photon-torpedo wrote:
| Apart from all the advice on how to do such destructive
| operations more safely, I think there's also a lesson to be
| learned about communicating more actively:
|
| 1. Vimeo responds to the original request with "will look into
| it", then... nothing happens? This may depend on culture, but at
| least from my experience in the UK, this is a very non-committal
| response, and if you really want them to do something, you'll
| need to chase them. Wait a few days and inquire if they have any
| estimate for when it might get done, or if they need more
| information. I find that the "looking into it" response is
| sometimes used to gauge how important the request is to you.
|
| 2. Once you go with your own solution, just drop a quick message
| to Vimeo: "Hey, just wanted to let you know we've found our own
| solution for this, and won't require your help any more. Sorry if
| you've already committed any resources for this task. Have a nice
| day, yada yada." This not only avoids what happened here, but is
| also a courtesy to them.
| mbostleman wrote:
| Related: The change is fine, it's only one line.
| amtamt wrote:
| A computer lets you make more mistakes faster than any invention
| in human history, with the possible exceptions of handguns and
| tequila.
| mindcrime wrote:
| Imagine coding while drinking tequila...
| johnklos wrote:
| We can all poke at this person for doing things incorrectly, but
| one has to wonder what mindset could lead to any programmer ever
| thinking that:
|
| 1) parsing a web page shouldn't be considered incredibly fraught
| with problems
|
| 2) reloading web pages should be part of (1)
|
| 3) this should ever possibly be run without validating the list of
| files that would be deleted
|
| So forget the specifics. Where are people learning these things,
| and what do we do to teach them better things?
| bsder wrote:
| "rm -rf" blowing your foot off is a Unix Rite of Passage(tm).
|
| You _will_ do it at least once in your career. If you're old
| enough you will do it twice. If you're really old, you get the
| joy of doing it a third time.
|
| The subtlety increases each time because you _do_ learn.
| dboreham wrote:
| College? Parents? In my experience it runs pretty deep so not
| sure it can be easily trained out. This mindset is probably
| quite useful in evolutionary terms: rush at the attacking bear
| without thinking, for example.
| plonk wrote:
| > rush at the attacking bear without thinking, for example
|
| Would that work? I don't see a bear backing down and I don't
| see the human winning either.
| qayxc wrote:
| > Where are people learning these things, and what do we do to
| teach them better things?
|
| Learn to learn and learn to work carefully. It starts in school
| and should be part of a proper college/university education or
| vocational training.
|
| There are several ways of learning the specifics: by experience
| on-the-job, which can be hard if mistakes can get you fired; or
| by putting in the work in your free time.
|
| If your job is to work with certain web frameworks and you're
| not very experienced, ask senior devs to assist/review
| before going live with critical changes. Alternatively,
| practice at home. Unpopular, but you need to get experience
| from somewhere. OSS projects are a great way to do that - be
| that by creating your own or by contributing to an existing
| one.
| dncornholio wrote:
| Some lessons can only be learned by making the mistake. Sometimes
| you can tell someone something a hundred times and they won't learn
| until they experience it.
|
| The point is not to prevent these mistakes, but to keep the
| consequences low.
|
| Have backups, have version control, etc.
| ufmace wrote:
| True, and worth remembering why. Most of us are constantly
| getting warned about the dire potential consequences of huge
| numbers of things, most of which are either massively
| unlikely to ever happen or not actually that bad, or both.
| It's very difficult to tell which of the things we get warned
| about are actually high risk until something bites us.
| Mo3 wrote:
| Seriously.. also, looking at these code snippets...
|
| If someone delivers code that looks like that, especially if
| intended for a production system, I'm firing immediately.
|
| It's a miracle nothing has happened sooner.
| ziddoap wrote:
| From the article:
|
| > _I'm a Junior Developer with less than one year of actual
| experience._
|
| > _The bad news is that this was on Friday, and we needed to
| have the videos back up at most for Tuesday morning._
|
| You say:
|
| > _If someone delivers code that looks like that, especially
| if intended for a production system, I'm firing immediately_
|
| Fire immediately? What a miserable sounding place to work.
| Mo3 wrote:
| In this case - seeing how they let them have direct access
| to production - I agree on the miserable sounding place to
| work and repeat myself -
|
| It's a miracle nothing happened sooner
| ziddoap wrote:
| I was referring to your workplace.
| Mo3 wrote:
| At least we don't let junior developers with close to
| zero experience anywhere near production..
|
| I didn't quite read the part about his experience in the
| article, I agree firing over that wouldn't be fair, but
| that just raises other questions.
| DeathArrow wrote:
| There's a thing called unit tests.
| muglug wrote:
| The root of this particular issue was Vimeo's failure to do this
| migration for their customers.
|
| Vimeo OTT has a codebase written in Rails, whereas the main Vimeo
| application is written in PHP. At the time Vimeo acquired Vimeo
| OTT's codebase, the Vimeo OTT codebase was small -- around 10,000
| lines of Ruby. Rewriting that codebase inside the Vimeo PHP
| application would have been a tough technical challenge for the
| all-Ruby team, and they'd have likely lost some people along the
| way and missed out on some content deals, so they decided instead
| to maintain two separate codebases and two separate login
| systems.
|
| The video-playback and video-storage infra has since been
| unified, but all the business logic is still siloed.
| conductr wrote:
| He wasn't asking them to refactor their internal code bases.
| But they should be able to whip up the 20 lines of code needed
| to do this between APIs (or just directly on their servers).
| Essentially what the author was trying to do when he screwed up.
| For the author this was disposable code, for Vimeo this would
| have been a reusable utility.
|
| I know how these things happen. Support ticket queues and all.
| And while I don't fully know the difference in cost, I would
| assume a customer upgrading to an Enterprise plan would get a
| better support experience.
|
| Whoever within the author's company negotiated the upgrade to
| Enterprise (or didn't) and failed to embed some agreement
| around OTT to Enterprise transition assistance was the one who
| made the first mistake.
| macspoofing wrote:
| >The root of this particular issue was Vimeo's failure to do
| this migration for their customers.
|
| Yes and No. At the end of the day, you as a business have to
| insulate yourself from your infrastructure provider.
| notyourday wrote:
| Vimeo is the only infrastructure provider providing that
| service. It is impossible to insulate a business from it.
| chernevik wrote:
| Per the post, Vimeo DID do it -- without telling the customer!
| And then wouldn't help uncluster the situation.
| macspoofing wrote:
| > but at the time the code seemed completely correct to me
|
| I venture this kind of (misplaced) over-confidence is not
| atypical of many junior developers. As someone with a few years
| under my belt, I don't care how sure I was of the code I wrote
| that deletes important data, I would have gone through the code
| over and over again, and at least run a simulation (by maybe
| logging the generated delete urls for manual verification).
|
| It's a rite of passage and we all went through something like
| this. It's how you learn and grow.
|
| >It also should probably teach something to Vimeo
|
| No. Even if Vimeo could have made things better, it's still your
| fault. You have to take responsibility for your business. At the
| end of the day, if this causes the closure of your company, Vimeo
| is still fine.
| wumms wrote:
| Not completely off topic (as one of my scripts recently deleted
| files whose dates were off by one):
|
| > Fri May 06 2022
|
| > I'm currently working [...] in Italy
| masswerk wrote:
| Controversial opinion: And this is why block syntax by white
| space is not for production.
| krit_dms wrote:
| This is hardly a whitespace issue
| masswerk wrote:
| Ah, yes, I just noticed the difference in indentation. In
| actuality, the error is about the mental model of variable
| states.
| havkom wrote:
| The company was lucky to have someone like you that could
| actually sort out real problems efficiently. I would bring up
| this story when negotiating for a raise.
| davbryn1 wrote:
| "What does this teach us? Well, it teaches me to do more diverse
| tests when doing destructive operations. It also should probably
| teach something to Vimeo and to my contractor but I doubt it will
| (and yes, the upload for some reason is still manual to this day.
| Go figure!)"
|
| So you wrote bad code, didn't test it properly, ran it on
| production on the Friday before a release and are blaming Vimeo
| and [name redacted]?
|
| And your resolution was yet another cobbled together script that
| you probably didn't test?
|
| This isn't a great article to have attached your name to
| gala8y wrote:
| Not to mention that he _deleted_, but not _lost_ videos.
| Nothing to see here.
| oneepic wrote:
| Earlier in the article, the author does call out that it's bad
| code, so he's not entirely blaming these companies. Anyway: You
| should not be afraid of thinking about what _each_ party could
| have done better. Not just yourself, but other people too. When
| I look back on times where I only blamed myself for prod
| issues, it was less of a learning experience, and more focused
| on beating myself up for no good reason. That approach shows
| that I'm afraid of the consequences, and it's an effective way
| to feel isolated from the team instead of improving.
| nickkell wrote:
| Better to do it before the release than afterwards. I'm
| assuming this way nobody noticed the issue.
|
| Also, would you rather everyone only ever posted about all the
| times they were successful?
| chopin wrote:
| I'd hire this guy if only for being this frank about his
| mistake. He owned it and that is what I would look for.
|
| After deletion, what should he have done? Postpone the go-live?
| That's often not a cost-effective option. As for risk analysis,
| the worst that could happen was deletion of the remaining videos.
| I don't think that makes a big difference in this situation. And
| to do the right thing, you have to have the infrastructure in
| place, if you are in a hurry. I doubt that's the case for a
| 10-person shop.
| GordonS wrote:
| Aye, this is how you learn and make sure it doesn't happen
| again.
|
| I did a similar thing ~20 years ago when I first started my
| career, accidentally deleting a production database because I
| thought I was working on the test database.
|
| I owned it, learned lessons from it, and it's never happened
| again.
| davbryn1 wrote:
| Owning the mistake would be fine if he did that - he didn't.
| He blamed the company he was contracting for. That's a big no
| from me
| esquivalience wrote:
| It's as if we read different articles. He literally writes
| that he made "A series of mistakes that could've probably
| been easily prevented."
| thevinter wrote:
| I'm sorry if it came off like that. The mistake in this
| case was completely mine (bad code and bad testing). The
| detour on the other two companies was mostly because this
| way of deleting/recovering stuff should've probably been
| avoided in the first place, other than that I'm absolutely
| not blaming anyone else!
| davbryn1 wrote:
| Don't worry about all that - there isn't a developer
| worth their salt that hasn't made a mistake. But I'd
| consider having this blog post and HN post retracted
| purely for future internet checks. It isn't a reflection
| on you, and your honesty is fantastic. But there is a lot
| to be said about using a pseudonym when it comes this
| close to your employers
| desarun wrote:
| I'd probably make your github profile private for a while
| as well. Or at least removing your real name from it.
| [deleted]
| malexbone wrote:
| Agree 100%. Acknowledged mistake, moved forward to find a
| solution. Reflected on lessons learned. Shared valuable
| lesson.
|
| To me this indicates intelligence, competence, integrity,
| grit and generosity. Technical proficiency is much easier to
| come by than integrity, grit and generosity. I would trust
| the author to deliver on commitments.
| honksillet wrote:
| Agreed. But I'd also fire him from this job.
| Beltiras wrote:
| For having got into a sticky situation and out of it?
| [deleted]
| SparkyMcUnicorn wrote:
| "Recently, I was asked if I was going to fire an employee
| who made a mistake that cost the company $600,000. No, I
| replied, I just spent $600,000 training him. Why would I
| want somebody to hire his experience?"
|
| -- Thomas J. Watson
| yohannparis wrote:
| Doesn't make sense. Their employer literally paid them to
| learn from their mistake.
|
| Now, you think they should be fired? So that another
| employer reaps the benefits of that learning experience.
| kwertyoowiyop wrote:
| Will every developer who has never checked in bad code on
| Friday, or accidentally deleted the wrong data, please raise
| their hand?
|
| 'Judgment comes from experience, and experience comes from poor
| judgment.'
|
| :-)
| dang wrote:
| (Since the OP redacted the company name from the post, I've
| done the same in your comment here. I hope that's ok.)
|
| (We do this sort of thing to protect users, usually as the
| result of an emailed request, and you can tell when we've done
| it because of the word 'redacted' in square brackets.)
| jasonlotito wrote:
| > This isn't a great article to have attached your name to
|
| A million times better than your comment.
| davbryn1 wrote:
| All I did was give advice. If you don't like it it's fine.
| smokey_circles wrote:
| Oof, we wouldn't work well together. Very rarely is someone
| good enough to be this obnoxious.
| davbryn1 wrote:
| I very much doubt you would ever work with or for me.
| [deleted]
| breakfastduck wrote:
| Vimeo completed a major migration of videos between accounts
| with no confirmation or communication before committing it, then
| refused to reverse the change. Hardly the best service.
|
| The article hardly comes across as 'blaming' them for the core
| issue but they were definitely not helpful.
| wruza wrote:
| Code without constant logging of "utc [who] does what exactly" is
| a no-go for me for a long time. Also, if you have to be
| destructive, replace the <rm/sell/halt> with log() for at least
| one time (aka --verbose --dry-run) and check your expectations.
| One-shot scripts like this are screaming disaster.
|
| (The problematic line lacks the closing ", probably a typo? I
| thought it closed in an unexpected location)
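|
| As a sketch of the "replace the destructive call with log()" habit:
| the delete_videos() function, its actor argument and the logger name
| below are hypothetical, and dry-run is the default so that a real
| deletion has to be requested explicitly.

```python
import logging
import time

logger = logging.getLogger("cleanup")


def delete_videos(video_ids, delete_fn, actor, dry_run=True):
    """Log every intended deletion; only call delete_fn when dry_run is False.

    Returns the list of IDs that were actually deleted.
    """
    deleted = []
    for video_id in video_ids:
        if dry_run:
            logger.info("[%s] DRY RUN: would delete %s", actor, video_id)
            continue
        logger.info("[%s] deleting %s", actor, video_id)
        delete_fn(video_id)
        deleted.append(video_id)
    return deleted


if __name__ == "__main__":
    # Timestamps in UTC, in the spirit of "utc [who] does what exactly".
    logging.Formatter.converter = time.gmtime
    logging.basicConfig(level=logging.INFO, format="%(asctime)sZ %(message)s")
    delete_videos(["v1", "v2"], delete_fn=print, actor="alice")
```

| Reading the dry-run log first makes a list of 900 IDs where 9 were
| expected hard to miss.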
| ge96 wrote:
| The product I work on, I can watch the events occur afterwards
| (videos of people using it) and it's so embarrassing watching it
| fail. The wasted time. Ahh... I've gotten better at checking deps
| and running a full automated E2E test every time new code is deployed
| (before/after diff envs).
|
| Still things happen. Hopefully you have a large enough client
| base where some bad experience doesn't define the whole thing.
| BillyTheKing wrote:
| For larger 'live' production changes I've now started to rely on
| generative programming. I've got one script in some 'normal'
| programming language like javascript, or python, which in turn
| generates a script that contains a list of curl or other cli
| commands which do the actual deletion, modification, addition,
| etc.
|
| This allows me to run a small sub-set of commands and test those
| under a live-environment before running all commands at once. In
| addition, this also functions as a complete log of what has been
| changed manually in production.
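|
| A minimal illustration of that approach, assuming a REST-style
| DELETE endpoint. The base URL in the usage example and the
| generate_commands() helper are placeholders; curl's --fail and -X
| options are real.

```python
import shlex
from urllib.parse import quote


def generate_commands(video_ids, base_url):
    """Return one curl command per ID instead of performing the deletion.

    The resulting lines can be written to a shell script, reviewed,
    run a few at a time, and kept afterwards as a record of the change.
    """
    commands = []
    for video_id in video_ids:
        # Percent-encode the ID for the URL, then quote the URL for the shell.
        url = f"{base_url}/{quote(str(video_id), safe='')}"
        commands.append(f"curl --fail -X DELETE {shlex.quote(url)}")
    return commands


if __name__ == "__main__":
    for line in generate_commands(["123", "456"], "https://api.example.com/media"):
        print(line)
```

| Running only the first two or three generated lines against
| production is then a cheap way to test the whole batch.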
| RankingMember wrote:
| I'm impressed you went with an automated solution (Playwright)
| for 500 videos after all that, considering they could be cross-
| loaded from Google Drive almost instantaneously. I'm glad it
| worked, but coding around a screw-up under the gun seems like a
| high-risk operation compared to spending 4 hours doing the task
| manually (albeit being super bored the whole time), but with the
| benefit of knowing it's being done correctly instead of hurriedly
| writing a script to potentially do something else wrong very
| efficiently and dig your hole deeper.
| leokennis wrote:
| Actually I was surprised reading that the person wrote a script
| to delete 900 videos.
|
| If you need to do it once, it's probably 2-3 hours of work?
| That is identifying a duplicate video and then clicking the
| button(s) to delete it once every 20 seconds.
|
| Reminds me of https://xkcd.com/1205/
| bruhbruhbruh wrote:
| +1 to this. After the few major screw-ups I've caused at work,
| my self-confidence in my coding ability was rocked, and I tended
| to react by erring towards manual cleanup, rather than coding
| some scalable solution for fixing the issues
| alkaloid wrote:
| Does anyone else get that deep, dark, disturbing feeling in their
| gut when they know they have done something bad like this?
|
| This is why I use so many print statements and comment out
| destructive actions! Lots of experience with these feelings!
| arein3 wrote:
| You can automate using puppeteer or selenium
| dsego wrote:
| The author used Playwright in the end to automate uploads.
| Using e2e tools for automating tasks is clever, I'm not sure I
| would've thought of it.
| chopin wrote:
| It's clever, but also brittle. And might have disastrous
| error conditions (like hitting "Delete" instead of "Continue"
| if the wrong UI part has focus).
| andreagrandi wrote:
| It should really be something like: "a flaw in our system allowed
| me to delete 7 TB of videos". Not entirely your fault.
| mrkwse wrote:
| System and/or development processes
| desarun wrote:
| Oh dude, we've all been there.
|
| 9 years ago I was working for a major broadcasting company in the
| arse end of London as a junior dev, building one of their Android
| apps.
|
| We'd roll features out months before & enable them with feature
| flags via a json file we'd manually push to a prod server at a
| later date.
|
| We'd just built a huge new feature letting you request content to
| be downloaded to your set top box remotely & it had a 250k
| marketing campaign to go along with the launch.
|
| Senior dev trusted me with prod deployment rights.
|
| I pushed the wrong json config to prod, launching the feature
| weeks before the marketing campaign.
|
| Thank god I was a junior perm, that was definitely a firing
| offence.
| hayd wrote:
| > Senior dev trusted me with prod deployment rights.
|
| That part's crazy! If you think it was a firing offence,
| wouldn't they have been fired? (I don't think it is, but
| obviously requires system changes/explanation.)
| BurningPenguin wrote:
| I accidentally deleted a printer from the printserver by using a
| python script. The docs weren't exactly clear, so i thought it
| would only remove the local printer connection. After reading
| this post i feel better now. My fuckup wasn't that bad in
| comparison. :)
| furyofantares wrote:
| Great post and great attitude.
|
| I think I would reflect on why this is a script to begin with.
| It's run once and with only 500 items could be done manually,
| though 500 is certainly a bit much.
|
| But it's not a massive time saver; the point of the script should
| be almost entirely to increase accuracy. I think I would write
| one script to generate the list of videos to delete; that's the
| part that's actually difficult, and a human can then verify the
| list. I would probably just delete them by hand after that, but
| if I really wanted a script for that part too, it would be a
| separate script that uses a list that has been vetted by a human
| even if initially created by the first script.
| Reason077 wrote:
| > _" What does this teach us? Well, it teaches me to do more
| diverse tests when doing destructive operations."_
|
| I think it also teaches us that adversity sometimes leads to
| better solutions. I love that the OP made a hacky script that did
| in 4 hours what a guy was paid to do manually over several
| months!
| KingOfCoders wrote:
| "I'm under an NDA"
|
| Don't write a blog post.
| franciscop wrote:
| This is a great technical write up, I'd love to hear the human
| side of this story as well! When did you tell the higher ups that
| you deleted production? Was no one more senior on call to try to
| fix it? Did they want you to learn how to fix it? Or were you the
| most senior responsible for this whole area? Or did they not
| know?
| thevinter wrote:
| The first part of my write up slightly explains it but the
| point is that HN is the top 1%. In my current company we have
| 10 developers, most of them without a technical degree. They
| know how to do what they've been doing for the past 10 years
| but (as with most small companies here in Italy) people don't
| know what best practices are used in the industry, what a
| pipeline is or what a dry-run is (I learned about it today
| myself!).
|
| What happened is that no one knew how to react and I was
| probably the best suited for it, we don't really have seniority
| in office.
|
| That said when I deleted the videos I immediately told my boss.
| He was kind of scared but his reaction was mostly "Well, now we
| have to re-upload them immediately, find a way. The people that
| uploaded them once won't be doing it twice". I was basically
| left on my own to find a solution (which I luckily did).
|
| Please note that I'm in no way blaming my company or accusing
| it of something, this is the standard knowledge base and way of
| dealing with things in many places, contrary to what working in
| big tech or reading HN might make you believe!
| franciscop wrote:
| Thanks for the explanation, that makes a lot of sense!
|
| > "HN is the top 1%" + "this is the standard knowledge base
| and way of dealing with things in many places, contrary to
| what working in big tech or reading HN might make you
| believe!"
|
| I'm in fact from Spain and now live in Japan, and I believe
| the practices in Spain would be as bad as Italy, and in Japan
| they are def worse (great at hardware, horrible at software),
| so I do understand a lot of what you are saying. FWIW, in
| Spain I've seen whole dev teams composed only of interns!
|
| > "we landed a big contract for one of the biggest gym
| companies in Italy, the UK and South Africa" + "we don't
| really have seniority in office"
|
| Maybe now that it seems like you have the budget it's a good
| time to go to management and suggest to hire some senior devs
| who can mentor the rest into learning best practices? You can
| sell it like a reinvestment in the company to management if
| they want to take it as pure profit. If Italy is like Spain,
| many devs won't really even want to learn these things, but
| some will and then those will become seniors at some point.
| Sirikon wrote:
| Everyone makes mistakes, juniors and seniors alike, but I
| consider you have the right mindset and problem-solving skills that
| will make you thrive :)
| ricardobayes wrote:
| Any process that makes a junior directly access prod
| codebase/database is flawed. No matter how small of a company you
| are, you can set up a proper CI/CD pipeline.
| thevinter wrote:
| 90% of IT companies in Italy don't even know what a CI/CD
| pipeline is. That said I don't think it's something we could've
| integrated in our pipeline as it's an error that originated
| from an external service!
| Fritsdehacker wrote:
| This is why you have backups. Good on you to have them!
|
| When I just started as a junior dev at a small company I made the
| classic mistake of emptying the prod db instead of my local dev
| db. This was a small and in hindsight insignificant project. But
| Google was our customer, so it didn't feel insignificant at the
| time.
|
| In this case my inexperience was partly my savior. All the data
| was inputted by people via a web form. Normally you're supposed
| to use POST to submit a form. But I was quite clueless at the
| time, so I had used GET. This meant all requests were still in
| the Apache logs. I could simply replay all requests.
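|
| A minimal sketch of that kind of replay in Python, assuming the
| access log uses Apache's common log format (request line in
| double quotes, followed by the status code). The function name,
| the /submit path, and the sample log lines are hypothetical; a
| real replay would re-send each recovered path to the server.

```python
import re

# Matches the quoted request line of a common-log-format entry, e.g.
# "GET /submit?name=Ann HTTP/1.1" followed by the three-digit status code.
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def extract_get_paths(log_lines, form_path="/submit"):
    """Return path-plus-query of every successful GET form submission, in log order."""
    paths = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # not a request line we can parse
        if match["method"] != "GET" or not match["status"].startswith("2"):
            continue  # assumption: only replay GETs that succeeded the first time
        if match["path"].split("?", 1)[0] == form_path:
            paths.append(match["path"])
    return paths


# Hypothetical log excerpt for illustration only.
sample_log = [
    '10.0.0.1 - - [05/May/2022:10:00:00 +0000] "GET /submit?name=Ann&age=30 HTTP/1.1" 200 512',
    '10.0.0.2 - - [05/May/2022:10:01:00 +0000] "POST /login HTTP/1.1" 302 0',
    '10.0.0.3 - - [05/May/2022:10:02:00 +0000] "GET /submit?name=Bob&age=41 HTTP/1.1" 200 498',
    '10.0.0.4 - - [05/May/2022:10:03:00 +0000] "GET /favicon.ico HTTP/1.1" 404 0',
]
replay_paths = extract_get_paths(sample_log)
```

| This only works because GET puts the form data in the URL; the
| body of a POST request is not written to the access log.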
|
| I still feel my heart pounding when I think about the moment I
| realized what had happened. I was really relieved when everything
| was back!
|
| What I learned from this incident:
|
| - make automated backups
|
| - no access to prod db from anywhere but prod
| cassandratt wrote:
| Yea, I've wiped out an entire government's form library once.
| Backups are a career saver.
| NikolaNovak wrote:
| Honestly, this is positively representative of any junior
| developer with comparable experience. Depending on their
| background and how much production work they had, there's an
| overwhelming sense of eagerness and enthusiasm. Quick to script
| and perhaps a bit too quick to execute.
|
| A friendly team will harness that enthusiasm and tame the
| quickness / encourage respect for production. We all made a
| massive doo doo and it's how you proceed that'll define your
| career.
| RcouF1uZ4gsC wrote:
| This is one of those times that even if you don't use a fully
| functional language, trying to make as much of your program logic
| pure functions would be helpful.
|
| It also makes it more testable. Instead of putting the delete
| call right in the loop, split it into four functions.
|
|     function getAllVimeoVideos()
|     function getAllDbVideos()
|     function getVideosToDelete(vimeo_videos, db_videos)
|     function deleteVideos(videos_to_delete)
|
| Your core logic lives in getVideosToDelete which is simply a set
| difference.
|
| Given that there are only a few hundred videos, it is easy to run
| the getter functions above and quickly verify they are returning
| what you expect.
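|
| The split above can be sketched in Python as follows. This is
| an illustration, not the original script: it assumes the intent
| is to delete Vimeo videos the database does not reference, and
| the IDs, the delete_one callback, and the dry_run flag are all
| hypothetical.

```python
def get_videos_to_delete(vimeo_ids, db_ids):
    """Pure function: IDs present on Vimeo but not referenced by the database."""
    return set(vimeo_ids) - set(db_ids)


def delete_videos(ids_to_delete, delete_one, dry_run=True):
    """Delete each ID via delete_one, or only print the plan when dry_run is set."""
    deleted = []
    for video_id in sorted(ids_to_delete):
        if dry_run:
            print(f"[dry run] would delete {video_id}")
        else:
            delete_one(video_id)
            deleted.append(video_id)
    return deleted


# Hypothetical data standing in for the two getter functions.
vimeo_ids = ["v1", "v2", "v3", "v4"]
db_ids = ["v1", "v3"]

to_delete = get_videos_to_delete(vimeo_ids, db_ids)

# removed_log.append stands in for the real (side-effecting) API call.
removed_log = []
dry_result = delete_videos(to_delete, removed_log.append, dry_run=True)
real_result = delete_videos(to_delete, removed_log.append, dry_run=False)
```

| Because the set difference has no side effects, it can be unit
| tested and eyeballed before the one function that does damage
| is ever called with dry_run=False.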
| acutis_fan wrote:
| Yes that's fun. A
|
|     List<Foo> getFoosToUpdate(List<Foo> foos, List<Bar> bars)
|
| function is the first time I thought about time complexity in
| my job.
|
| Say Foo and Bar have fields in common, such that you can say a
| Foo object "equals" or "matches to" a Bar object, like if they
| have name and dateOfBirth fields or something else that are the
| same (nothing like a common ID between the two). Now say there
| are some other fields too, like amountSpentThisYearOnDogFood
| that you know is always accurate for Bars, but might be out of
| date for Foos. How do you get the list of all the Foos to
| update?
|
| Initially I did the nested for loop solution that's like
|
|     List<Foo> getFoosToUpdate(List<Foo> foos, List<Bar> bars)
|     {
|         List<Foo> returnList = new List<Foo>();
|         foreach (var foo in foos)
|         {
|             foreach (var bar in bars)
|             {
|                 // check if "equal" or "matching" based on some criteria
|                 // if equal, update foo dog food expenditure with bar dog
|                 // food expenditure, add to returnList, and break
|             }
|         }
|         return returnList;
|     }
|
| but that's O(n^2) right.
|
| The solution with a Dictionary is obviously better. All you
| need to ensure is that you have a method for both the Foo and
| Bar classes that will produce the equivalent hash for both, if
| they would be considered equal or matching by whatever criteria
| you are using.
|
| So you could have something like
|
|     int GetHashOfFoo(Foo foo)
|     {
|         string firstName = foo.FirstName;
|         string lastName = foo.LastName;
|         DateTime dob = foo.Dob;
|         return (firstName, lastName, dob).GetHashCode(); // convenient c# method
|     }
|
|     int GetHashOfBar(Bar bar)
|     {
|         string firstName = bar.FirstName;
|         string lastName = bar.LastName;
|         DateTime dob = bar.Dob;
|         return (firstName, lastName, dob).GetHashCode();
|     }
|
| These two functions will return the same value if those fields
| are the same. So then you can do something like
|
|     List<Foo> getFoosToUpdate(List<Foo> foos, List<Bar> bars)
|     {
|         List<Foo> returnList = new List<Foo>();
|         Dictionary<int, Bar> barsByHash = new Dictionary<int, Bar>(bars.Count);
|         foreach (var bar in bars)
|         {
|             int barHash = GetHashOfBar(bar);
|             barsByHash[barHash] = bar;
|         }
|         foreach (var foo in foos)
|         {
|             int fooHash = GetHashOfFoo(foo);
|             if (barsByHash.ContainsKey(fooHash))
|             {
|                 returnList.Add(foo.CopyWith(
|                     dogFoodExpenditure: barsByHash[fooHash].DogFoodExpenditure));
|             }
|         }
|         return returnList;
|     }
|
| Which is faster cause you only have to go through the bars list
| once.
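|
| For comparison, a hedged Python sketch of the same lookup-table
| idea. It is a variation rather than a translation: it keys the
| dictionary on the tuple of matching fields itself instead of on
| its integer hash, so two different people whose hashes happen
| to collide are not treated as a match. The field names and
| sample records are assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class Foo:
    first_name: str
    last_name: str
    dob: date
    dog_food_expenditure: float


@dataclass(frozen=True)
class Bar:
    first_name: str
    last_name: str
    dob: date
    dog_food_expenditure: float


def match_key(record):
    """Fields that decide whether a Foo and a Bar describe the same person."""
    return (record.first_name, record.last_name, record.dob)


def get_foos_to_update(foos, bars):
    """Return copies of the Foos that match a Bar, carrying the Bar's expenditure."""
    bars_by_key = {match_key(bar): bar for bar in bars}  # one pass over bars
    updated = []
    for foo in foos:  # one pass over foos, average O(1) lookup each
        bar = bars_by_key.get(match_key(foo))
        if bar is not None:
            updated.append(replace(foo, dog_food_expenditure=bar.dog_food_expenditure))
    return updated


# Hypothetical sample records.
foos = [
    Foo("Ann", "Lee", date(1990, 1, 1), 10.0),
    Foo("Bob", "Ray", date(1985, 6, 15), 20.0),
]
bars = [Bar("Ann", "Lee", date(1990, 1, 1), 99.5)]
result = get_foos_to_update(foos, bars)
```

| Total work is roughly len(foos) + len(bars) instead of
| len(foos) * len(bars).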
|
| I actually messed up something like OP with this, but with
| doing undesired additions instead of undesired deletions.
|
| You can think of it as having two endpoints, both expecting a
| .csv with rows being the things you were
| updating/changing/deleting.
|
| The problem was, there was a column to indicate (with a
| character) whether the row was for an edit, or addition, or
| deletion, but this was only with one of these endpoints. For
| the other, there was only addition functionality, but I thought
| changes and deletions were also options for the other kind of
| .csv due to some unwise assumptions on my part (thinking that
| the second .csv would have the same options as the first).
| That's how we accidentally put in over 100 additions that
| should have been changes that had to be manually deleted.
| Luckily I had a list of all the mistaken additions.
| tomhallett wrote:
| This was going to be my exact recommendation. By "separating
| the concerns", you make it easier on pretty much every
| dimension: testing in unit tests, doing a dry run in
| production, ability to read the code (you and code reviews),
| and in some cases your code will be written in a more
| functional way reducing variable scoping issues.
| DeathArrow wrote:
| This wouldn't be an issue if providers like Vimeo would soft
| delete and hard delete the items after a period of time, allowing
| recovery between.
|
| Everywhere I have to implement a delete operation, I never hard
| delete data on first call.
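|
| A minimal in-memory sketch of that soft-delete-then-purge
| pattern in Python. The class, its method names, and the 30-day
| retention window are hypothetical; a real system would more
| likely use a deleted_at column and a scheduled purge job.

```python
from datetime import datetime, timedelta


class SoftDeleteStore:
    """In-memory stand-in for a table with a deleted_at column."""

    def __init__(self, retention=timedelta(days=30)):
        self.retention = retention
        self.items = {}       # item_id -> payload
        self.deleted_at = {}  # item_id -> time of the soft delete

    def add(self, item_id, payload):
        self.items[item_id] = payload

    def delete(self, item_id, now):
        """First call only marks the item; the data stays recoverable."""
        if item_id in self.items:
            self.deleted_at[item_id] = now

    def restore(self, item_id):
        """Undo a soft delete, if the item has not been purged yet."""
        self.deleted_at.pop(item_id, None)

    def visible(self):
        """IDs that ordinary reads should see."""
        return sorted(i for i in self.items if i not in self.deleted_at)

    def purge(self, now):
        """Hard-delete items whose retention window has fully elapsed."""
        expired = [i for i, t in self.deleted_at.items() if now - t >= self.retention]
        for item_id in expired:
            del self.items[item_id]
            del self.deleted_at[item_id]
        return sorted(expired)


start = datetime(2022, 5, 5)
store = SoftDeleteStore()
store.add("video-1", "squats.mp4")
store.add("video-2", "lunges.mp4")
store.delete("video-1", now=start)
store.delete("video-2", now=start)
store.restore("video-2")  # mistake noticed in time
purged_early = store.purge(now=start + timedelta(days=1))
purged_late = store.purge(now=start + timedelta(days=31))
```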
| kirillzubovsky wrote:
| Mistakes happen. Kudos to the author on taking it as a learning
| opportunity. I am friends with a lot of smart devs, and many of
| them have dropped a production db at least once, and if not then,
| then accidentally emailed 10k people ...etc. It happens. Work to
| avoid it, but plan for what to do when it inevitably happens.
| ¯\_(ツ)_/¯
___________________________________________________________________
(page generated 2022-05-05 23:00 UTC)