[HN Gopher] Japan HP accidentally deleted 77TB data in Kyoto U. ...
       ___________________________________________________________________
        
       Japan HP accidentally deleted 77TB data in Kyoto U. supercomputing
       system
        
       Author : rguiscard
       Score  : 229 points
       Date   : 2021-12-30 05:43 UTC (17 hours ago)
        
 (HTM) web link (www.iimc.kyoto-u.ac.jp)
 (TXT) w3m dump (www.iimc.kyoto-u.ac.jp)
        
       | deepsun wrote:
       | Yet another bug due to using command-line interface (which is
       | designed for humans not programs) by programs.
        
       | Jimmc414 wrote:
       | >the find command containing undefined variables was executed and
       | deleted the files
       | 
       | Just a note that "set -u" at the beginning of a bash script will
       | cause it to throw an error for undefined variables. Be warned that
       | this should of course be tested, as it will also cause [[ $var ]]
       | to fail when var is unset.
       | 
       | If that's the case,
       | 
       |     [ -z "${VAR:-}" ] && echo "VAR is not set or is empty" || echo "VAR is set to $VAR"
       | 
       | will help test that condition.
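       | 
       | A minimal sketch of that guard, assuming a hypothetical variable
       | named LOG_DIR and a hypothetical helper check_log_dir (neither is
       | from the report). It shows that ${LOG_DIR:-} can be tested safely
       | under "set -u", while a bare reference to an unset variable fails.
       | 

```shell
#!/usr/bin/env bash
set -u   # referencing an unset variable is now a fatal error

# Hypothetical helper: report whether LOG_DIR is usable.
check_log_dir() {
  # ${LOG_DIR:-} expands to an empty string when LOG_DIR is unset,
  # so the test itself does not trip "set -u".
  if [ -z "${LOG_DIR:-}" ]; then
    echo "LOG_DIR is not set or is empty"
  else
    echo "LOG_DIR is set to $LOG_DIR"
  fi
}

unset LOG_DIR
check_log_dir

LOG_DIR=/var/log/example   # assumed path, for illustration only
check_log_dir

# Without the ":-" default, the same kind of reference aborts under
# "set -u". Run it in a subshell so only the subshell dies.
if ( : "$UNSET_VARIABLE_FOR_DEMO" ) 2>/dev/null; then
  echo "expansion succeeded"
else
  echo "expansion failed under set -u"
fi
```

       | 
       | The subshell at the end is only there so the demonstration can
       | continue after the failure; in a real script the abort is the
       | point.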
        
       | moonbug wrote:
       | it's a lustre filesystem. the data would've been eaten eventually
       | anyway.
        
         | bayindirh wrote:
         | What would make you think of that?
        
       | l33tman wrote:
       | I've been a Linux coder and user forever, and I didn't know that
       | bash "reloads" a script while running if the file is modified.
       | Good to learn before I also delete a whole filesystem due to
       | this! :)
        
         | [deleted]
        
         | tyingq wrote:
         | Is that what happened? I can't reproduce that by changing a
         | bash script that's running a while [ 1 ] loop.
         | 
         | Is it maybe that they were editing or copying the file and a
         | cron job kicked off?
        
           | marcan_42 wrote:
           | That's because it's a loop, so it's already read. Try
           | appending a line to a running script instead.
        
             | tyingq wrote:
             | Ah, yep. This does work, and prints out both "one" and
             | "two":
             | 
             |     printf "echo one\nsleep 3\n" > s1;(bash s1 &);sleep 1 && printf "echo two\n" >> s1
             | 
             | That's interesting. And changing the "sleep 1" to "sleep 4"
             | makes it only output "one".
        
         | [deleted]
        
       | pharmakom wrote:
       | I have switched to F# for scripting tasks and have found F#
       | scripts are (usually) either correct on the first try or fail at
       | the type-checking stage. I would highly recommend it for anything
       | near production.
        
       | gnufx wrote:
       | Assuming this was a "scratch" HPC filesystem, as I'd guess,
       | "scratch" is used advisedly -- users should be prepared to lose
       | anything on it, not that it should happen with finger trouble.
       | However, if I understand correctly from the comments, I'm
       | surprised at the tools, and that the vendor was managing the
       | filesystem. I'd expect to use
       | https://github.com/cea-hpc/robinhood/wiki with Lustre, though I
       | thought I'd seen a Cray presentation about tools of their own.
        
       | booleandilemma wrote:
       | It's amazing how human errors scale with technology. Just
       | imagine, one day we'll be making mistakes at the Type III
       | civilization level! :)
        
       | ketanip wrote:
       | Did they have any other separate backups?
        
       | 0xbadcafebee wrote:
       | Not really surprising. HPE has provided bottom-of-the-barrel
       | support for decades.
        
       | rvnx wrote:
       | I really appreciate the announcement from Hewlett Packard, which
       | is very apologetic:
       | https://www.iimc.kyoto-u.ac.jp/services/comp/pdf/file_loss_i...
       | 
       | They do not try to blame it on complex systems or other factors.
       | 
       | Users lost 1 day and 1/2 of recent work (which doesn't seem to be
       | that bad).
       | 
       |     About file loss in Lustre file system in your supercomputer
       |     system, we are 100% responsible.
       |     We deeply apologize for causing a great deal of inconvenience
       |     due to the serious failure of the file loss.
       | 
       |     We would like to report the background of the file
       |     disappearance, its root cause and future countermeasures as
       |     follows:
       | 
       |     We believe that this file loss is 100% our responsibility.
       |     We will offer compensation for users who have lost files.
       | 
       |     [...]
       | 
       |     Impact:
       |     --
       |     Target file system: /LARGE0
       |     Deleted files: December 14, 2021 17:32 to December 16, 2021 12:43
       |     Files that were supposed to be deleted: Files that had not
       |     been updated since 17:32 on December 3, 2021
       | 
       |     [...]
       | 
       |     Cause:
       |     --
       |     The backup script uses the find command to delete log files
       |     that are older than 10 days.
       |     A variable name is passed to the delete process of the find
       |     command.
       |     A new improved version of the script was applied on the system.
       |     However, during deployment, there was a lack of consideration
       |     as the periodical script was not disabled.
       |     The modified shell script was reloaded from the middle.
       |     As a result, the find command containing undefined variables
       |     was executed and deleted the files.
       | 
       |     [...]
       | 
       |     Further measures:
       |     --
       |     In the future, the programs to be applied to the system will
       |     be fully verified and applied.
       |     We will examine the extent of the impact and make improvements
       |     so that similar problems do not occur.
       |     In addition, we will re-educate the engineers in charge of
       |     human error and risk prediction / prevention to prevent
       |     recurrence.
       |     We will thoroughly implement the measures.
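       | 
       | A hedged sketch of how an undefined variable can widen a find
       | command. The report does not show the real script, so LOG_DIR, the
       | find arguments, and both helper functions are assumptions. The
       | commands are only printed, never executed.
       | 

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction: the variable name and find arguments are
# assumed for illustration. Nothing here deletes any file.

# Build the cleanup command line as text, with no guard on LOG_DIR.
unsafe_cleanup_command() {
  echo "find ${LOG_DIR}/ -type f -mtime +10 -delete"
}

# Same command, but ${LOG_DIR:?} makes the shell stop with an error
# when LOG_DIR is unset or empty instead of expanding to nothing.
safe_cleanup_command() {
  echo "find ${LOG_DIR:?LOG_DIR must be set}/ -type f -mtime +10 -delete"
}

unset LOG_DIR

# With LOG_DIR undefined, the search root collapses to "/".
unsafe_cleanup_command

# The guarded version refuses; the subshell keeps this demo running.
( safe_cleanup_command ) 2>/dev/null || echo "refused: LOG_DIR is unset"
```

       | 
       | Combining a guard like this with "set -u" gives two independent
       | checks before a destructive command can run.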
        
         | thaumasiotes wrote:
         | > However, during deployment, there was a lack of consideration
         | as the cronjob was not disabled.
         | 
         | I'm intrigued to see that the report you link (which is in
         | Japanese) mentions `find` and `bash` by those names, but
         | doesn't contain the word `cron`. How does the report refer to
         | the idea of a "cronjob"? Why is it different?
        
           | pm215 wrote:
           | The Japanese text in that PDF doesn't say anything about
           | cron. It just says that the script was overwritten "while
           | there was an executing script in existence"
           | ("実行中のスクリプトが存在している状態で"), and doesn't
           | say whether that was because that executing script was
           | launched by cron or by hand.
        
           | rvnx wrote:
           | I took: "bash は、シェルスクリプトの実行中に適時シェ",
           | which means it's either cronjob or sleep with a loop
           | (https://iww.hateblo.jp/entry/20211229/file_lost_insident)
        
             | ramchip wrote:
             | I read this sentence as "Bash reads the shell script just-
             | in-time while executing it", with no context as to why it
             | was running (cron, loop, by hand...)
        
             | kragen wrote:
             | "シェルスクリプト" is "shell script", but my Japanese is
             | too poor to understand 実行中 or 適時.
        
               | numpad0 wrote:
               | 実行中: while executing
               | 
               | 適時: appropriate times, or as needed
        
               | rvnx wrote:
               | I guess the most correct context is that the script was
               | running "periodically".
               | 
               | https://zenn.dev/mattn/articles/5af86b61004bdc
               | https://iww.hateblo.jp/entry/20211229/file_lost_insident
        
         | jacquesm wrote:
         | The sense of honor and responsibility shining through is
         | refreshing.
        
         | [deleted]
        
         | doctor_eval wrote:
         | So this is something I've never understood. If you modify a
         | shell script while it's running, the shell executes the
         | modified file. This normally but not always causes the script
         | to fail.
         | 
         | Now I've known about this behaviour for a very long time and it
         | always seemed very broken to me. It's not how binaries work (at
         | least not when I was doing that kind of thing).
         | 
         | So I guess bash or whatever does an mmap of the script it's
         | running, which is presumably why modifications to the script
         | are visible immediately. But if a new file was installed eg
         | using cp/tar/unzip, I'm surprised that this didn't just unlink
         | the old script and create a new one - which would create a new
         | inode and therefore make the operation atomic, right? And this
         | (I assume) is why a recompiled binary doesn't have the same
         | problem (because the old binary is first unlinked).
         | 
         | So, how could this (IMO) bad behaviour be fixed? Presumably
         | mmap is used for efficiency, but isn't it possible to mark a
         | file as in use so it can't be modified? I've certainly seen on
         | some old Unices that you can't overwrite a running binary. Why
         | can't we do the same with shell scripts?
         | 
         | Honestly, while it's great that HP is accepting responsibility,
         | and we know that this happens, the behaviour seems both
         | arbitrary and unnecessary to me. Is it fixable?
        
           | wongarsu wrote:
           | > isn't it possible to mark a file as in use so it can't be
           | modified?
           | 
           | That's the route chosen by Windows for binary executables
           | (exe/dll) and various other systems. Locking a file against
           | writes, delete/rename or even read is just another flag in
           | the windows equivalent of fopen [1]. This makes for software
           | that's quite easy to reason about, but hard to update. The
           | reason why you have to restart Windows to install Windows
           | updates or even install some software is largely due to this
           | locking mechanism: you can't update files that are open (and
           | rename tricks don't work because locks apply to files, not
           | inodes).
           | 
           | With about three decades of hindsight I'm not sure if it's a
           | good tradeoff. It makes it easy to prevent the race
           | conditions that are an endless source of security bugs on
           | unix-like systems; but otoh most software doesn't use the
           | mechanism because it's not in the lowest-common-denominator
           | File APIs of most programming languages; and MS is paying for
           | it with users refusing to install updates because they don't
           | want to restart their PC.
           | 
           | 1: Search for FILE_SHARE_DELETE in
           | https://docs.microsoft.com/en-us/windows/win32/api/fileapi/n...
        
             | pjmlp wrote:
             | Files in use can be shadow updated and then will be
             | actually replaced when possible.
             | 
             | Naturally no one reads MSDN docs.
             | 
             | Also to note that other non-UNIX clones follow similar
             | approach to file locking.
        
             | formerly_proven wrote:
             | On Unix/Linux you can't update a file mmaped for execution
             | either - text files are busy.
        
               | toast0 wrote:
               | I've updated .so files on FreeBSD while they're running.
               | They weren't busy and a program which had it mmaped to
               | run promptly crashed (my update wasn't intended to be hot
               | loaded and wasn't crafted to be safe, although, it could
               | have been if I knew it was possible). And now I won't
               | forget why I should use install instead of cp (install
               | unlinks before writing, by default, cp opens and
               | overwrites the existing file)
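               | 
               | A small sketch of the difference, using temp-file-plus-mv
               | in place of install (a swapped-in technique with the same
               | effect of giving the path a fresh inode). File names and
               | the inode_of helper are made up for illustration.
               | 

```shell
#!/usr/bin/env bash
# Compare inode numbers after an in-place overwrite (cp) and after a
# replace-by-rename (write a temp file, then mv it over the target).
workdir=$(mktemp -d)

# Hypothetical helper: print the inode number of a file.
inode_of() {
  ls -i "$1" | awk '{print $1}'
}

printf 'echo old\n' > "$workdir/job.sh"
printf 'echo new\n' > "$workdir/job.sh.new"
before=$(inode_of "$workdir/job.sh")

# cp onto an existing file truncates and rewrites it in place, so any
# process that already has the file open sees the new bytes.
cp "$workdir/job.sh.new" "$workdir/job.sh"
after_cp=$(inode_of "$workdir/job.sh")

# Writing a sibling temp file and renaming it swaps the directory entry
# to a different inode; the old inode stays intact for existing readers.
cp "$workdir/job.sh.new" "$workdir/job.sh.tmp"
mv "$workdir/job.sh.tmp" "$workdir/job.sh"
after_mv=$(inode_of "$workdir/job.sh")

rm -rf "$workdir"

echo "inode before:   $before"
echo "inode after cp: $after_cp"
echo "inode after mv: $after_mv"
```

               | 
               | The first two numbers should match and the third should
               | differ, which is why a running reader survives the rename
               | but not the overwrite.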
        
           | prussian wrote:
           | >So, how could this (IMO) bad behaviour be fixed?
           | 
           | By reading in the whole file at once. Bash does not do a
           | shared mmap of the script it is parsing. You can see this
           | behavior with
           | 
           |     strace -e read,lseek bash << EOF
           |     echo 1
           |     echo 2
           |     EOF
           | 
           | bash will read(), do its multi-step expansion-parsing thing
           | and then lseek back so the next read starts on the next input
           | it needs to handle. This is why the problems described in the
           | story can happen.
           | 
           | The other way to fix this is to simply use editors that will
           | just make a new file and move that file over the target on
           | save. I believe vim or neovim does this by default, but
           | things like ed or vi do not. Emacs will do something similar
           | on _first_ save if you did not (setq backup-by-copying t), but
           | any write after that will still be done in place. I tested
           | this trivially, without reviewing the emacs source, by simply
           | doing the following, and you can too with your $EDITOR of
           | choice:
           | 
           |     #!/usr/bin/env bash
           |     echo test
           |     sleep 10
           |     # evil command below, uncomment me and save
           |     # echo test2
           | 
           | While sleep is running, if changing the script causes things
           | to happen, your editor may cause the problem described.
        
           | helsinkiandrew wrote:
           | > If you modify a shell script while it's running, the shell
           | executes the modified file
           | 
           | That is dependent on the OS. In this case wasn't the shell
           | script just executed fresh from a cronjob?
           | 
           | I remember on Digital Unix - on an Alpha so this was a few
           | years ago - that you could change a c program (a loop that
           | printed something then slept, for example), recompile and it
           | would change the running binary.
        
             | doctor_eval wrote:
             | > wasn't the shell script just executed fresh from a
             | cronjob?
             | 
             | The description said that the script changed while it was
             | running, so certain newly introduced environment variables
             | didn't have values and this triggered the issue.
             | 
             | My reading was that this was just a terrible coincidence -
             | the cron job must have started just before the upgrade.
             | 
             | Regarding changing a C program, now you mention it I think
             | that the behaviour you describe might also have happened on
             | DG/UX, after an upgrade. IIRC it used to use ETXTBSY and
             | after an upgrade it would just overwrite.
             | 
             | Not really behaviour that you want (or expect) tho.
        
           | eklavya wrote:
           | From what I know, so far linux doesn't have an exclusive lock
           | capability on a file, windows does however. So in linux you
           | can't mark a file in exclusive possession of a process.
        
             | [deleted]
        
             | eklavya wrote:
             | Down voters should read up on the state of mandatory
             | locking in Linux and what conditions need to be met and how
             | reliable it is.
        
           | kragen wrote:
           | This behavior in shell scripts predates mmap. In very early
           | versions of Unix it was arguably even useful; there was a
           | goto command which was implemented by seeking on the shell-
           | script file descriptor rather than as a shell builtin, for
           | example. I don't know of any use for it since the transition
           | to the Bourne shell, but my knowledge is far from
           | comprehensive. (I suppose if your shell script is not small
           | compared to the size of RAM, it might be undesirable to read
           | it all in at the start of execution; shar files are a real-
           | life example even on non-PDP-11 machines.)
           | 
           | As I understand it, the reason for ETXTBSY ("on some old
           | Unices...you can't overwrite a running binary") was to
           | prevent segfaults.
           | 
           | cp usually just opens the file O_WRONLY|O_TRUNC, which seems
           | like the wrong default; Emacs for example does create a new
           | file and rename it over the old one when you save, usually,
           | allocating a new inode as you say. By default it makes an
           | exception if there are other hardlinks to the file.
           | 
           | Btrfs and xfs have a "reflink" feature that allows you to
           | efficiently make a copy-on-write snapshot of a file, which
           | would be ideal for this sort of thing, since the shell or
           | whatever won't see any changes to the original file, even if
           | it's overwritten in place. Unfortunately I don't think you
           | can make _anonymous_ reflinks, so for the shell to reflink a
           | shell script when it starts executing it would need write
           | access to somewhere in the filesystem to put the reflink, and
           | then it would need to know how to find that place, somehow.
           | And of course that wouldn't help if you were running on
           | ext4fs or, I imagine, Lustre, though apparently an
           | implementation was proposed in 02019:
           | https://wiki.lustre.org/Lreflink_High_Level_Design
        
             | sillysaurusx wrote:
             | "Emacs for example does create a new file and rename it
             | over the old one when you save, usually, allocating a new
             | inode as you say. By default it makes an exception if there
             | are other hardlinks to the file."
             | 
             | Though the trade off is that all operation ceases on a full
             | hard drive.
             | 
             | I don't have a better solution, but it's worth noting.
        
               | vbezhenar wrote:
               | Does it mean that I need to have extra free space? Does
               | not sound good.
        
               | thaumasiotes wrote:
               | Well, it looks like creating another hard link is a
               | nearly-free solution. And beyond that, since emacs
               | already has both behaviors, presumably you can tell it
               | you want the in-place modification.
        
             | exikyut wrote:
             | > _there was a goto command which was implemented by
             | seeking on the shell-script file descriptor rather than as
             | a shell builtin, for example._
             | 
             | Oh noooo I just realized you could probably implement a
             | shared library loadable module for bash `enable` that does
             | the same thing... just fseek()s the fd...
             | 
             | * _Runs for the hills screaming_ *
        
           | bluedino wrote:
           | It's nice to see the same mistakes that people have been
           | making for as long as I've been alive, on small and large
           | systems all over the world, still happen on projects with
           | professional teams from HPE or IBM that cost hundreds of
           | millions of dollars.
        
           | Hello71 wrote:
           | > So I guess bash or whatever does an mmap of the script it's
           | running
           | 
           | this is incorrect, and is relatively easy to test:
           | 
           |     $ strace -y -P /tmp/test.sh bash /tmp/test.sh
           |     ioctl(3</tmp/test.sh>, TCGETS, 0x7ffc6daea580) = -1 ENOTTY (Inappropriate ioctl for device)
           |     lseek(3</tmp/test.sh>, 0, SEEK_CUR)     = 0
           |     read(3</tmp/test.sh>, "#!/bin/sh\n", 80) = 10
           |     lseek(3</tmp/test.sh>, 0, SEEK_SET)     = 0
           |     dup2(3</tmp/test.sh>, 255)              = 255</tmp/test.sh>
           |     close(3</tmp/test.sh>)                  = 0
           |     fcntl(255</tmp/test.sh>, F_SETFD, FD_CLOEXEC) = 0
           |     fcntl(255</tmp/test.sh>, F_GETFL)       = 0x8000 (flags O_RDONLY|O_LARGEFILE)
           |     newfstatat(255</tmp/test.sh>, "", {st_mode=S_IFREG|0644, st_size=10, ...}, AT_EMPTY_PATH) = 0
           |     lseek(255</tmp/test.sh>, 0, SEEK_CUR)   = 0
           |     read(255</tmp/test.sh>, "#!/bin/sh\n", 10) = 10
           |     read(255</tmp/test.sh>, "", 10)         = 0
           | 
           | the reason why modifying a script during execution can have
           | unpredictable results, not demonstrated in this test, is that
           | Unix shells traditionally alternate between reading commands
           | and executing them, instead of reading the entire file
           | (potentially very large compared to 1970s RAM size) and
           | executing commands from the in-memory copy. on modern
           | systems, shell script sizes are usually negligible compared
           | to system RAM. therefore, you can manually cause the entire
           | file to be buffered by enclosing the script in a function or
           | subshell:
           | 
           |     #!/bin/sh
           |     main() {
           |     # script goes here
           |     }
           |     main
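           | 
           | A rough demonstration of that wrapper, assuming bash is on
           | the PATH. The file names and the run_and_append helper are
           | made up. One addition not in the comment above: "exit" sits
           | on the same line as the call to main, because bash would
           | otherwise keep reading the file after main returns and run
           | anything appended below it.
           | 

```shell
#!/usr/bin/env bash
# Append a line to a script while it runs, with and without the
# whole-body-in-a-function wrapper, and compare what each one prints.
workdir=$(mktemp -d)

# Plain script: bash reads the next command only after "sleep" returns.
printf 'echo one\nsleep 2\n' > "$workdir/plain.sh"

# Wrapped script: the body is parsed in full before it runs, and the
# trailing "exit" (an addition of this sketch) stops bash from reading
# anything that appears after the call.
printf 'main() {\n  echo one\n  sleep 2\n}\nmain "$@"; exit\n' > "$workdir/wrapped.sh"

# Hypothetical helper: start the script in the background, append
# "echo two" one second later, wait, then print what the script wrote.
run_and_append() {
  bash "$1" > "$1.out" &
  pid=$!
  sleep 1
  printf 'echo two\n' >> "$1"
  wait "$pid"
  cat "$1.out"
}

plain_output=$(run_and_append "$workdir/plain.sh")
wrapped_output=$(run_and_append "$workdir/wrapped.sh")
rm -rf "$workdir"

printf 'plain script printed:\n%s\n' "$plain_output"
printf 'wrapped script printed:\n%s\n' "$wrapped_output"
```

           | 
           | The plain script should pick up the appended line and the
           | wrapped one should not. The one-second timing is arbitrary
           | and only needs to land inside the script's sleep.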
        
         | KaiserPro wrote:
         | Ahhh the joy of lustre and the accidental cronjob.
         | 
         | About 15 years ago I experienced the same thing. An updater
         | script based on rsync was trying to keep one nfs machine image
         | in sync with another. However, for whatever reason, the script
         | accidentally tried to sync the entire nfs root directory with
         | its own, deleting everything show by show in reverse
         | alphabetical order.
         | 
         | At the time Lustre didn't really have any good monitoring tools
         | for showing you who was doing what, so they had to wait till
         | they hit a normal NFS server before they could figure out and
         | stop what was deleting everything.
         | 
         | Needless to say, a lot of the backups may have been failing.
        
           | unixhero wrote:
           | For this reason I actually use simpler tools than rsync.
        
           | throwaway75787 wrote:
           | rsync has a number of safety and boundary options, not to
           | mention --dry-run.
           | 
           | Options, as in I also found out about them the hard way.
        
         | martin_vejmelka wrote:
         | Just pointing out that those are most likely just the days the
         | files were saved. There could still be some unlucky souls that
         | ran computations for several days/weeks that happened to
         | terminate on those days (and store the results). Those people
         | could lose significantly more than a day and a half. On the
         | flip side, HPC jobs tend to be frequently checkpointed unless
         | the storage cost is prohibitive for the type of job.
        
         | Closi wrote:
         | Agreed - no corporate-speak, sounds like it was written by an
         | actual human.
        
           | Hamuko wrote:
           | It sounds a lot like Japanese corpo-speak.
        
             | Aeolun wrote:
             | Which is very formulaic, but also almost definitely written
             | by a human :D
        
         | dustintrex wrote:
         | Japanese companies structure apologies very differently from US
         | ones, because the legal consequences are very different. In the
         | US, an apology is considered an admission of responsibility and
         | is often the _starting_ point of legal action against the
         | culprit, while in Japan, a sufficiently sincere* apology may
         | well defuse the situation entirely.
         | 
         | * 真 makoto, a word often glossed as "sincere" but not
         | identical in meaning: it's about the amount of effort you're
         | willing to take on, not how "honestly" you feel something
         | 
         | Also, the culprit here is not HP proper but their consulting/SI
         | wing HP Enterprise, which has a, uhh, less than stellar
         | reputation for competence.
        
           | tiahura wrote:
           | "In the US, an apology is considered an admission of
           | responsibility and is often the starting point of legal
           | action against the culprit, while in Japan, a sufficiently
           | sincere* apology may well defuse the situation entirely."
           | 
           | -----
           | 
           | "hospital staff and doctors willing to discuss, apologize for
           | and resolve adverse medical events through a "collaborative
           | communication resolution program" experienced a significant
           | decrease in the filing of legal claims, defense costs,
           | liability costs and time required to close cases."
           | 
           | https://www.natlawreview.com/article/you-had-me-i-m-sorry-im...
        
           | gilmore606 wrote:
           | I once caused a blank box to appear on Rakuten's homepage for
           | several hours. My boss had to fly to Japan to apologize in
           | person to their CEO.
        
             | zhte415 wrote:
             | I don't know a huge amount about Rakuten. Don't they
             | purposefully adopt a pretty flat communication structure,
             | and the CEO travels around a lot, as well as English
             | demanded in senior roles?
        
             | hcknwscommenter wrote:
             | Or that is what he used to justify a free (likely business
             | class) flight to JP.
        
           | cafard wrote:
           | I was deeply impressed some--30?--years ago when there was a
           | minor scandal in sumo wrestling: some kid, who had been
           | advanced too quickly did a few stupid things, as I recall the
           | sort of thing you can find in American sports pages every
           | week. The heads of the sumo wrestling association
           | acknowledged that they had contributed to the situation, and
           | docked their own pay. Do you think Roger Goodell is going to
           | do that?
        
           | numpad0 wrote:
           | > 弊社100%の責任により
           | 
           | Voluntarily stating 100% responsibility is consequential and
           | not typical, smells politics.
        
             | dustintrex wrote:
             | I presume there were extensive discussions between the two
             | parties about the wording before this statement was
             | published.
        
           | p_l wrote:
           | HP and HPE are now two separate companies, split from post-
           | Fiorina HP.
           | 
           | HP does consumer grade stuff only, while HPE does the
           | enterprise side (not just consulting; in fact a non-trivial
           | portion of the HPE consulting arm was spun off and merged into
           | DXC)
        
             | KevinEldon wrote:
             | I agree with your point, and want to add that HP does
             | commercial grade end-user compute and printing along with
             | the related enterprise services. They have a whole set of
             | offerings for the medical industry [1], industrial printing
             | [2], and enterprise PC fleet management services [3].
             | 
             | [1] https://www.hp.com/us-en/printers/3d-printers/industries/hea...
             | 
             | [2] https://www.hp.com/us-en/industrial-digital-presses.html
             | 
             | [3] https://www.hp.com/us-en/services/manageability.html
        
             | exikyut wrote:
             | I think "sell the HP-35 for 3.14 x cost of materials" cool-
             | HP became Agilent, right?
             | 
             | If I wanted to follow the trail of awesomeness what forest
             | should I be sticking my nose to the ground in? :)
        
               | ted_dunning wrote:
               | That depends on which trail of awesomeness you are after.
               | 
               | The trail that maintains the legacy of DEC and Silicon
               | Graphics and Cray is in HPE (where I work). The Cray
               | legend is still very much alive, but you can still detect
               | the whiff of the spirit that made HP and DEC
               | minicomputers extraordinary.
        
               | p_l wrote:
               | Well, I suspect the SGI legacy is now in better hands
               | than when it was controlled by Rackable with branding
               | filed off. The only good parts they sold us were the
               | Ultraviolets, and those were probably the most
               | nonsensical purchase (protip: do not buy supercomputer
               | modules just to run 8 VMs on it, it's a waste of money even
               | if the hw is awesome)
        
               | dbuder wrote:
               | now Keysight, Agilent is medical equip.
        
             | hk1337 wrote:
             | More specifically, it spun off and merged with CSC to
             | create DXC.
        
           | resoluteteeth wrote:
           | > a sufficiently sincere* apology may well defuse the
           | situation entirely.
           | 
           | > * 真 makoto, a word often glossed as "sincere" but not
           | identical in meaning: it's about the amount of effort you're
           | willing to take on, not how "honestly" you feel something
           | 
           | While it's true that makoto can be translated as sincere(ly)
           | in constructions like "makotoni moushiwakearimasen" (I
           | sincerely apologize) (although this is written 誠に
           | rather than 真に), it is unlikely that the word makoto
           | would be used in a phrase like "a sincere apology" or in
           | discussing how sincere an apology was, so I don't really
           | think introducing the word "makoto" in your comment sheds any
           | additional light on japanese culture surrounding apologies.
           | 
           | You could actually make the exact same comment about how
           | "real" in English can also effectively mean sincere in
           | "really sorry" and draw the same conclusions about American
           | culture.
        
             | eof wrote:
             | My takeaway (as a westerner with zero direct Japanese
             | culture exposure) from the comment you're replying to was
             | that in Japan, companies are incentivized to take on some
             | measure of ownership and voluntary restitution, because
             | there is some legal notion in Japan around "honest mistakes
             | not being litigable if they are genuinely rectified."
        
               | Aeolun wrote:
               | I think it's more that legal measures would not be
               | employed, rather than that they are legally unactionable.
               | 
               | You could have an uphill battle against an unsympathetic
               | judge in front of you if you sue anyway though.
        
               | gowld wrote:
               | In the US "genuinely rectified" is the standard for civil
               | suit ("actual damages" for negligence, and "specific
               | performance" or monetary equivalent for contracts).
               | Punitive damages are added only for malicious intent.
               | 
               | The reason many US companies don't apologize is because
               | they _don't want_ to make restitution, and can get away
               | without paying.
        
           | [deleted]
        
           | [deleted]
        
           | Hello71 wrote:
           | > In the US, an apology is considered an admission of
           | responsibility and is often the starting point of legal
           | action against the culprit
           | 
           | source? this is a popular theory among non-lawyers, but not,
           | as far as I can tell, well-supported by the evidence.
           | http://jaapl.org/content/early/2021/05/19/JAAPL.200107-20 has
           | extensive citations including for their claim that "In
           | theory, telling a patient about an error may make patients
           | more likely to pursue litigation. In practice, however, bad
           | outcomes alone are typically not reason enough for patients
           | or their families to file malpractice claims."
        
             | blacksmith_tb wrote:
             | Medicine may not fit the pattern (personally I'd want to
             | know where I stand, even if the news is bad), but I took
             | the OP to be saying "American firms prefer not to go on
             | record saying they screwed up, since that would naturally
             | be brought up in subsequent legal proceedings".
        
           | imiric wrote:
           | Apologies, both personal and corporate, are taken very
           | seriously in Japanese culture[1]. They're a way of preserving
           | honor that dates back to the samurai era. You can see this in
           | the custom of bowing, where the length and extension of a bow
           | reflects the gravity of the situation. The act of seppuku can
           | be considered an extreme version of this.
           | 
           | I'm not a Japanophile, but find their culture fascinating.
           | 
           | [1]:
           | https://theculturetrip.com/asia/japan/articles/sumimasen-
           | beh...
        
             | sandworm101 wrote:
             | Because Japanese culture accepts failure. Admission and
             | acceptance preserves honour. Western culture, particularly
             | in north America, punishes the admission of failure.
             | Everyone is supposed to fight things out to the last, to
             | never give an inch. Even when companies settle cases they
             | rarely admit wrongdoing publicly. Notice that this Japanese
             | company talks about re-educating/training personnel. An
             | American company would be expected to sack all involved and
             | sue the consulting firm.
        
               | tiahura wrote:
               | Hmm. I suppose it's subjective, but isn't America's
               | acceptance of failure often cited as an element in the
               | success of its entrepreneurial spirit?
               | 
               | And, wrt to samurais and failure, seppuku?
        
               | adventured wrote:
               | > isn't America's acceptance of failure often cited as an
               | element in the success of its entrepreneurial spirit
               | 
               | Yes, the parent comment is entirely wrong. The US culture
               | is hyper accepting of failure compared to most other
               | prominent cultures, including Japanese culture (which in
               | fact does not tolerate failure very well at all).
        
               | maneesh wrote:
               | Yes, I think op is conflating the terms of apology and
               | failure in this instance.
        
               | halpert wrote:
               | Weren't Japanese soldiers in WWII instructed to kill
               | themselves instead of being captured to "preserve honor?"
               | That doesn't sound like the acceptance of failure to me.
        
               | singlow wrote:
               | Only certain classes of officers if I remember correctly,
               | and it more likely was motivated by preventing the
               | leaking of intelligence under interrogation by the enemy,
               | rather than really being about preserving honor. Of
               | course it's probably easier to carry out seppuku if you
               | convince yourself it's honorable, rather than to just do
               | it based on rationality.
               | 
               | Not all failure results in capture, so one particular
               | failure scenario does not speak of the general view on
               | failure.
        
               | moogleii wrote:
               | Citizens were also instructed to kill themselves.
        
               | bsanr2 wrote:
               | They were instructed to avoid capture by any means
               | because they had been indoctrinated by propaganda that
               | indicated horrific treatment by Allied POW personnel.
               | While this was obviously false, it's interesting to note
               | that Japanese knowledge of America's history with
               | slavery, segregation, and Indian removal would have made
               | this assumption not unreasonable, and further, may have
               | influenced Japanese treatment of American POWs. After
               | all, a major consideration in Japan's decision to go to
               | war in the first place was the leadership's understanding
               | that their lack of status as a white power would hamper
               | their colonial ambitions. They were only a few decades
               | removed from being excluded from the Berlin Conference,
               | for example.
        
               | shakow wrote:
               | > they had been indoctrinated by propaganda that
               | indicated horrific treatment by Allied POW personnel.
               | 
               | Sources? I have never heard of that yet.
        
               | AdrianB1 wrote:
               | Not true, there are lots of sources of information that
               | give the entire picture: it is a tradition from the ages
               | of samurai.
        
               | argiopetech wrote:
               | It's far too much to go in to in the HN comments section
               | from a phone, but removing the key military leadership
               | who pushed this mindset, a new government structure, and
               | the rejuvenation of business under e.g., Deming caused a
               | significant shift in Japanese culture following the war.
               | 
               | Edit: This is not necessarily in support of the
               | grandparent comment; rather, a caution against judging a
               | culture based on historical anecdotes.
        
               | hollander wrote:
               | But at least you won't end up in court.
        
               | lvass wrote:
               | That's exactly what acceptance of failure sounds like to
               | me. E.g. you abandon a chess game when you accept you'll
               | lose. To just keep pushing for a situation you know is
               | irredeemable is to deny you've failed.
        
               | gowld wrote:
               | A captured soldier in war doesn't get to go home and try
               | again after apologizing.
               | 
               | Also, going to war is inherently already a massive
               | corruption of culture.
        
               | AdrianB1 wrote:
               | When you learned about that, you did not read the entire
               | paragraph: being captured was traditionally considered a
               | dishonor (for the past 500 years, at least), it is a huge
               | failure for a soldier (or samurai). Ritual suicide is the
               | solution to that dishonor.
               | 
               | There are degrees of failure, some is considered to be
               | beyond fixing.
        
               | whatshisface wrote:
               | That sounds like saying that the ongoing US war crimes at
               | Guantanamo are important for understanding Google app
               | engine service contracts. Just because Japan is far away
               | doesn't mean that everything that happened there happened
               | in the same place at the same time.
        
             | moogleii wrote:
             | Unfortunately it can also be taken a bit too far. It's not
             | uncommon to essentially buy yourself out of punitive
             | criminal justice via an apology and some cash (jidan/gomen
             | money). It's not totally unlike a settlement here, except
             | it's much more acceptable as an opening move (as a victim,
             | you should seriously consider it). It seems to be in line
             | with the goal of preserving social harmony and making "it"
             | go away asap.
        
             | xvilka wrote:
             | On the other hand, the failure is generally frowned upon
             | and avoided at all costs.
        
             | EvanAnderson wrote:
             | Japanese firms are interesting to work with. I enjoyed the
             | culture, personally. I have two anecdotes.
             | 
             | The old white box PC shop I worked for, back in the 90s,
             | quoted PCs to a Japanese-owned auto parts manufacturer. The
             | Customer accepted the quote, paid an up-front deposit, and
             | requested the PCs not be built and delivered until some
             | construction at their site was completed in a few months.
             | 
             | In the meantime component pricing went down and
             | speeds/feeds went up. When the PCs were built we ended up
             | being forced to source higher clock speed CPUs and larger
             | hard disk drives. It took some cajoling to get the Customer
             | to take delivery. They felt they should pay more for the
             | upgrades. (We were actually making more money on the deal
             | even with the upgraded components anyway!)
             | 
             | My current company once pitched a support agreement to a
             | Japanese-owned firm. We offered a discount for annual
             | commitment versus month-to-month. I'd copy/pasted the
             | month-to-month terms for the annual but forgot to alter the
             | minimum notice period for ending the annual option. Both
             | month-to-month and annual indicated a notice period of 30
             | days rather than the intended 180 days for the annual
             | option.
             | 
             | The Customer's contact questioned the notice period being
             | the same. He asked why they wouldn't opt for annual
             | commitment, get the discounted rate, and also have the 30
             | day notice period. My partner, who had worked with Japanese
             | firms in the past, responded: "We know you wouldn't choose
             | the annual option if your intention was not to work with us
             | for at least a year." The Customer agreed and we ended up
             | getting the gig on an annual basis.
        
               | masklinn wrote:
               | > The Customer's contact questioned the notice period
               | being the same. He asked why they wouldn't opt for annual
               | commitment, get the discounted rate, and also have the 30
               | day notice period. My partner, who had worked with
               | Japanese firms in the past, responded: "We know you
               | wouldn't choose the annual option if your intention was
               | not to work with us for at least a year." The Customer
               | agreed and we ended up getting the gig on an annual
               | basis.
               | 
               | That's a really cute answer by your partner, I like it a
               | lot, even ignoring that it manages to save face while
               | paying serious respect to the purported customer.
        
               | EvanAnderson wrote:
               | Writing the parent comment gave me, if nothing else, an
               | excuse to publicly document that exchange. I was, and
               | still am, in awe of his ability to think on his feet and
               | come up with such a wholly appropriate response.
        
           | zengargoyle wrote:
           | I flip table rage quit my 15 year job at a university over a
           | new C-suite from who knows where throwing HPE
           | equipment/software/consultants at me for a big sort of
           | project and finding them all so utter crap that I would not be
           | a part of that stupidity. I have my standards.
        
             | hpcjoe wrote:
             | Before I left HPE post Cray acquisition, some of the folks
             | in the consulting/"cloud" division insisted that to use
             | their tooling, we had to insert a windows machine into
             | clusterstor, that mounted lustre, so that they could run
             | their powershell script to gather usage metrics.
             | 
             | What I found working with these teams, was that the desire
             | to flip tables was quite strong after meeting with them. I
             | tried to address their concerns, one point at a time. They
             | were bewildered that windows could not mount (modern)
             | lustre. Really bewildered. I offered to help rewrite their
             | scripts in another (portable) language, so we could avoid
             | these problems. Still they were bewildered.
             | 
             | They were not why I left. Merely a confirmation that my
             | decision to leave was the right one.
        
         | codedokode wrote:
         | > As a result, the find command containing undefined variables
         | was executed
         | 
         | And this is why shell should not execute commands with
         | "undefined" variables and give an error instead.
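
A minimal sketch of the failure mode described above, not the actual HPE script: `LOG_DIR` and both function names are hypothetical. It shows how an unset variable silently turns a path into `/`, and how the `${VAR:?}` expansion refuses to continue instead.

```shell
#!/usr/bin/env bash
# LOG_DIR, build_target_unsafe and build_target_checked are made-up names.

# Unsafe: an unset LOG_DIR silently expands to "", so the path becomes "/".
build_target_unsafe() {
    printf '%s/\n' "$LOG_DIR"
}

# Safer: ${LOG_DIR:?} aborts the expansion when LOG_DIR is unset or empty.
build_target_checked() {
    printf '%s/\n' "${LOG_DIR:?LOG_DIR must be set}"
}

unset LOG_DIR
unsafe_target=$(build_target_unsafe)

# The command substitution runs in a subshell, so the failed expansion
# ends that subshell only and this script can record the status.
checked_status=0
checked_target=$(build_target_checked 2>/dev/null) || checked_status=$?

echo "unsafe target:  '${unsafe_target}'"
echo "checked status: ${checked_status}"
```

A cleanup command built on the unsafe path would be pointed at the filesystem root, while the checked version stops with a non-zero status before any command runs.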
        
         | taubek wrote:
         | 77TB in one and half day? Impressive.
         | 
         | The style of apology is very nice. It is not extensive as some
         | technical post mortem analysis that I've read, but all of the
         | important things are here.
        
         | agumonkey wrote:
         | What a strangely simple error
        
         | Solstinox wrote:
         | I think I found the problem:
         | 
         | "A new improved version of the script was applied on the
         | system."
        
         | olliej wrote:
         | 1.5 days isn't too bad. If it were me my primary concern would
         | be losing bash history :D
        
         | nikanj wrote:
         | Every shell script should start with set -e and set -u
        
           | thadk wrote:
           | https://www.gnu.org/software/bash/manual/bash.html#The-
           | Set-B...
        
           | sillysaurusx wrote:
           | e doesn't work for (subshell | commands), and u is
           | inconvenient when appending to PATHs. Every tool has its
           | place, and dogma is often unhelpful.
        
             | CyberShadow wrote:
             | > e doesn't work for (subshell | commands)
             | 
             | That's not an argument against enabling it.
             | 
             | In bash, -o pipefail addresses this.
             | 
             | > and u is inconvenient when appending to PATHs
             | 
             | PATH should always be set. Try: env -i sh -c 'echo $PATH'
             | 
             | If you're prioritizing convenience over correctness,
             | prepare to face the consequences.
             | 
             | > Every tool has its place, and dogma is often unhelpful.
             | 
             | Visual Basic's "ON ERROR RESUME NEXT" perhaps also had its
             | place. That doesn't mean that using it is good advice.
             | 
             | If anything, I would consider the often cited wooledge etc.
             | advice of not using -e/-u as dogma. Case in point: no one
             | lost 77TB of data because they should _not_ have used
             | -e/-u.
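
A small sketch of the `pipefail` point, assuming bash is available as `bash`. Each case runs in a child shell so the exit status can be compared side by side.

```shell
#!/usr/bin/env bash
# Compare the exit status of a failing pipeline with and without pipefail.

# Without pipefail the pipeline's status is that of the last command (cat).
bash -c 'false | cat'
status_default=$?

# With pipefail the status is that of the rightmost command that failed.
bash -o pipefail -c 'false | cat'
status_pipefail=$?

# Combined with -e: only the pipefail variant stops before "reached".
reached_default=$(bash -e -c 'false | cat; echo reached')
reached_pipefail=$(bash -e -o pipefail -c 'false | cat; echo reached' || true)

echo "default:  status=${status_default} output='${reached_default}'"
echo "pipefail: status=${status_pipefail} output='${reached_pipefail}'"
```

So `-e` on its own does miss a failure on the left side of a pipe, and adding `-o pipefail` is what closes that gap.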
        
               | sillysaurusx wrote:
               | I said PATHs, not PATH. There are at least four I use on
               | a regular basis.
               | 
               | Super not interested in a pedantic debate. It's easy to
               | armchair analyze. I found flaws in 55 codebases at
               | Matasano, and yours is no exception.
               | 
               | e makes it super annoying to pass a variable number of
               | args to a script, since shift will fail and cause an
               | exit.
               | 
               | I do usually turn it on after, but you seem like the type
               | to fail a code review if a script doesn't start with it.
               | I don't think that's a productive attitude.
        
               | CyberShadow wrote:
               | That's quite a number of bad-faith assumptions in your
               | comment, which are also incidentally wrong.
               | 
               | > e makes it _super annoying_
               | 
               | I rest my case?
        
               | sillysaurusx wrote:
               | I don't think the audience is interested in this. If
               | you'd like to be specific, I'm happy to talk about
               | specific critiques. Otherwise it's just posturing, and
               | there are better things to do over the holidays.
               | 
               | The original assertion was that under no circumstances
               | should a bash script not begin with -e. I gave a
               | circumstance (passing optional arguments), and said dogma
               | is often counterproductive. I stand by all of those.
               | 
               | Let's agree to disagree and move on.
        
               | Aeolun wrote:
               | I kind of agree with your point that there should be
               | exceptions, but I think I also agree with OP that using
               | -e as a general rule is probably a safe starting point.
        
               | stonewareslord wrote:
               | I disagree. You can write shell scripts just fine and
               | always set -euo pipefail
               | 
               | * I'm not sure what you mean by four PATHs, but if you
               | really mean to be using unset variables for them, you
               | should be using "${V-}" or "${V:-}" syntax which does
               | not fail. But again I don't know why you would do this
               | other than maybe [[ "${1-}" ]]
               | 
               | * Variable arguments are still trivial with $#. Check
               | (($#>3)), use while (($#>0)), etc
               | 
               | I also disagree that this is unproductive. With minor
               | modifications/(adding :- or -), you can prevent a whole
               | class of bugs (undefined variables). This would have
               | prevented real-world issues such as in the post here as
               | well as Steam when it wiped home directories since it ran
               | (not sure the exact syntax) rm -rf $STEAMROOT/* with an
               | unset variable
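
A sketch of how optional and variable-length arguments can coexist with `set -euo pipefail`, along the lines suggested above. The `greet` function is purely illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical function: greet [name] [extra...]
greet() {
    # ${1-} expands to "" instead of tripping nounset when no argument is given.
    local name="${1-}"
    if [[ -z "$name" ]]; then
        name="world"
    fi
    # Only shift when there is something to shift, so errexit is not triggered.
    if (($# > 0)); then
        shift
    fi
    # Count the remaining arguments with a $#-driven loop.
    local extras=0
    while (($# > 0)); do
        extras=$((extras + 1))
        shift
    done
    echo "hello ${name} (+${extras} extra args)"
}

greet
greet alice
greet bob x y
```

The `shift` is guarded by a `$#` check, so it never fails, and `${1-}` covers the case where the first argument is missing.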
        
           | vbezhenar wrote:
           | I long adopted
           | 
           |     /bin/sh -eu
           | 
           | header in my scripts. It's a must-have.
        
             | CyberShadow wrote:
             | If you mean as a shebang (#!/bin/sh -eu), I would suggest
             | switching to using "set" instead, because the shebang will
             | not be interpreted if the script is run as "sh script.sh"
             | (as opposed to ./script.sh).
        
             | kop316 wrote:
             | Pardon my ignorance, but what do those do? Searching for it
             | doesn't give me anything.
        
               | Symbiote wrote:
               | See
               | https://www.gnu.org/software/bash/manual/bash.html#The-
               | Set-B..., in short -e makes scripts exit if a command
               | fails, and -u makes them exit if a variable is undefined.
               | 
               | If you think your colleagues won't know this, "set -o
               | errexit; set -o nounset" would be easier for them to
               | search on.
               | 
               | (Via "3.4 Shell Parameters" - "3.4.1 Positional
               | Parameters" - "4 Shell Builtin Commands", or searching
               | the whole page for "-e".)
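
To make the two options concrete, here is a short sketch using the long option names. `NOT_SET` is a made-up variable name, and each case runs in a child bash so its exit does not stop the outer script.

```shell
#!/usr/bin/env bash
# Each case runs in a child bash so that its exit does not stop this script.

# errexit (-e): the child stops at `false`, so "after" is never printed.
rc_errexit=0
out_errexit=$(bash -c 'set -o errexit; false; echo after') || rc_errexit=$?

# nounset (-u): expanding the unset variable is an error, so the child exits.
rc_nounset=0
out_nounset=$(bash -c 'set -o nounset; unset NOT_SET; echo "value=$NOT_SET"; echo after' 2>/dev/null) || rc_nounset=$?

# Neither option: both mistakes pass silently and the script keeps going.
out_plain=$(bash -c 'unset NOT_SET; false; echo "value=$NOT_SET"')

echo "errexit: rc=${rc_errexit} output='${out_errexit}'"
echo "nounset: rc=${rc_nounset} output='${out_nounset}'"
echo "plain:   output='${out_plain}'"
```

With neither option set, the failed command and the empty expansion go unnoticed, which is the situation the incident report describes.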
        
               | kop316 wrote:
               | Thank you very much, I appreciate it.
        
         | AdamJacobMuller wrote:
         | > The modified shell script was reloaded from the middle.
         | 
         | This is an incredible edge case. I'm amazed they hit this issue
         | and just as amazed that they correctly identified that issue
         | and reported on it.
         | 
         | This response is great, it's the exact opposite of the wishy-
         | washy mealy-mouthed response to the lastpass security incident.
        
         | capableweb wrote:
         | Interesting, seems the shell script was executed from the cron
         | job just as it was being replaced on the server itself?
        
         | s5300 wrote:
         | Huh. I may be remembering incorrectly, but I recall having
         | somebody somewhat entrenched in related business tell me that
         | HP has been going downhill from an industry perspective roughly
         | two years ago...
         | 
         | Nice to see them completely own up to the mistake right away. I
         | wonder who made the final call on doing so, companies admitting
         | fault so transparently & immediately offering recourse seems
         | pretty damn rare anymore.
         | 
         | Without the intent of sounding xenophobic, I wonder if it's
         | because it's HP Japan where reputation is much more culturally
         | important. US MBA's admitting fault... haha...
        
       | ananonymoususer wrote:
       | The cause of this is a known behavior of Unix/Linux shells, but
       | unfortunately not everyone knows it. The shell reads a script
       | incrementally as it runs, keeping track of its position in the
       | file. If you change the script in place while it is running,
       | the shell reads (what it thinks is) the next line at the
       | position it expected in the old script, but from the new
       | contents. So what it reads and executes will probably not be
       | what you wanted.
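
One common defensive pattern, sketched here as an assumption rather than anything taken from the incident report, is to wrap the script body in a function and put the call and `exit` on a single line. Bash parses the whole function and that final line before running them, so later changes to the file are never read. The demo script below modifies itself to simulate a deploy.

```shell
#!/usr/bin/env bash
# Write a small self-modifying script to a temporary file (hypothetical demo).
demo_script=$(mktemp)
cat > "$demo_script" <<'EOF'
main() {
    echo "step 1"
    # Simulate a deploy that changes the file while the script is running.
    echo 'echo "INJECTED"' >> "$0"
    echo "step 2"
}
# Call and exit on one line: both are parsed before main starts running.
main "$@"; exit
EOF

wrapped_output=$(bash "$demo_script")
rm -f "$demo_script"
echo "$wrapped_output"
```

The appended line is never executed because the shell exits before reading past the line it has already parsed. Replacing a script via rename (a new file moved over the old name) avoids the problem from the other side, since the running shell keeps reading the old file.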
        
       | mayurbirle wrote:
       | nani!
        
       | pettycashstash2 wrote:
       | Looks like 10 of 14 groups were restored from backup.
        
       | rguiscard wrote:
       | In the process of functional modification of the backup program
       | by Hewlett-Packard Japan, the supplier of the supercomputer
       | system, there was a problem in the unintentional modification of
       | the program and its application procedure, which caused a
       | malfunction in the process of deleting the files under the
       | /LARGE0 directory instead of deleting the backup log files that
       | are no longer needed.
       | 
       | Translated with www.DeepL.com/Translator (free version)
        
       | Proven wrote:
        
       | vardump wrote:
       | That's a lot of floppy disks!
        
       | quelsolaar wrote:
       | Who brought tres commas?
        
         | alekun wrote:
         | just in case someone didn't see this masterpiece
         | https://youtu.be/vvDK8tMyCic
        
       | marcan_42 wrote:
       | Everyone is mentioning error control for shell scripts or "don't
       | use shell scripts", but neither of those are the solution to
       | _this_ problem. The solution to this problem is correctly
       | implementing atomic deployment, which is important for any system
       | using any programming language.
       | 
       | What I like to do is have two directories I ping pong between
       | when deploying, and a `cur` symlink that points to the current
       | version. The symlink is atomically replaced (new symlink and
       | rename it over) whenever the deploy process completes. Any
       | software/scripts using that tree will be written to first chdir()
       | in, which will resolve the symlink at that time, and thus won't
       | be affected by the deploy (at least as long as you don't do it
       | twice in a row; if that is a concern due to long running
       | processes, you could use timestamped directories instead and a
       | garbage collection process that cleans stuff up once it is
       | certain there are no users left).
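
A rough sketch of the symlink swap described above, assuming GNU coreutils (`mv -T` and `ln -n` are not portable everywhere). The directory layout (`a`, `b`, `cur`) and the `deploy` function are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout: <root>/a and <root>/b hold releases, <root>/cur points at one.
deploy() {
    local root="$1" release="$2"
    # Create the new symlink under a temporary name first.
    ln -sfn "$release" "$root/cur.new"
    # Then rename it over the old one. The rename replaces the target atomically;
    # -T (GNU mv) stops mv from moving the link into the directory cur points at.
    mv -T "$root/cur.new" "$root/cur"
}

root=$(mktemp -d)
mkdir "$root/a" "$root/b"
echo "v1" > "$root/a/version"
echo "v2" > "$root/b/version"

deploy "$root" a
first=$(cat "$root/cur/version")
deploy "$root" b
second=$(cat "$root/cur/version")
echo "before: ${first}, after: ${second}"
```

A consumer that starts with `cd -P "$root/cur"` resolves the symlink once and keeps working in that release directory even if `cur` is swapped afterwards.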
        
         | db65edfc7996 wrote:
         | The original blue-green deployment strategy. I have done a
         | similar thing as well.
        
       | karlerss wrote:
       | When communicating non-critical data-loss to teammates, I like to
       | do it with this haiku:
       | 
       |     Three things are certain:
       |     Death, taxes, and lost data.
       |     Guess which has occurred.
       | 
       | From https://www.gnu.org/fun/jokes/error-haiku.en.html
        
       | nh2 wrote:
       |     However, during deployment, there was a lack of
       |     consideration as the periodical script was not disabled.
       | 
       |     The modified shell script was reloaded from the middle.
       | 
       | In my opinion, this is the wrong takeaway, and an important
       | lesson was not learned.
       | 
       | It's not an operator "lack of consideration".
       | 
       | The lesson should be "when dealing with important data, do not
       | use outrageously bad programming languages that allow run-time
       | code rewriting, and that continue to execute even in the presence
       | of undefined variables".
       | 
       | If you use shell scripting, this is bound to happen, and will
       | happen again.
       | 
       | "We'll use Python or anything else instead of shell" would
       | fundamentally remove the possibility of this category of failure.
        
         | toast0 wrote:
         | > outrageously bad programming languages that allow run-time
         | code rewriting
         | 
         | Almost all languages allow run-time code rewriting. Some of
         | them just make it easier than others, and some of them make it
         | a very useful feature. If you're very careful, updating a bash
         | script while you're running it can be useful, but most often
         | it's a mistake; in Erlang, hot loading is usually intentional
         | and often useful. Most other languages don't make it easy, so
         | you'll probably only do it if it's useful.
        
         | 0xbadcafebee wrote:
         | The problem was not that they used shell scripts. The problem
         | was that the people writing the shell scripts were just bad
         | programmers. If you hire a bad programmer to write them in
         | Python, they'll still have tons of bugs.
         | 
         | The shell scripts I write have fewer bugs than the Python code
         | I see other teams churn out. But that's because I know what I'm
         | doing. Don't hire people who don't know what they're doing.
        
       | j1elo wrote:
       | I guess this is as good of a time as any other to remind people
       | to use the "unofficial" Bash strict mode:
       | 
       | https://gist.github.com/robin-a-meade/58d60124b88b60816e8349...
       | [^1]
       | 
       | And always, _always_, use ShellCheck
       | (https://www.shellcheck.net/) to catch most pitfalls and common
       | mistakes on this powerful but dangerous language that is shell
       | scripting.
       | 
       | [^1]: I think this gist is better than the original article on
       | which it is based, because the article also suggested changing
       | the IFS variable, which is _not_ very good advice, so sadly
       | the original text becomes a bad recommendation!
        
         | xvilka wrote:
         | And don't use shell for writing complex scripts, there are
         | better automation tools and languages.
        
           | [deleted]
        
           | j1elo wrote:
           | Good point, except if an important part of your complex
           | script is really just plumbing the outputs of one program to
           | the inputs of another. Because that's what shell scripting
           | excels at. Calling an external process is a first-class
           | citizen in shell, whereas it is a somewhat clunky thing (or
           | at the very least, much more verbose) to do in any other
           | languages.
        
           | rrll22 wrote:
           | Such as?
        
             | dqpb wrote:
             | Python
        
               | BeatQuestGames wrote:
               | For example, take my project.
               | 
               | https://github.com/Mylab6/PiBluetoothMidSetup
               | 
               | While I could have done this in Bash.
               | 
               | 1. I don't really like Bash
               | 
               | 2. Python is much easier. I did challenge myself to only
               | use Python's built in libraries, but aside from being
               | unable to use Yaml everything works.
               | 
               | I can imagine in some environments you might not have
               | access to a Python interpreter though...
        
         | thaumasiotes wrote:
         | > I guess this is as good of a time as any other to remind
         | people to use the "unofficial" Bash strict mode
         | 
         | Not really; the report doesn't mention any error in the script.
        
           | leoh wrote:
           | There is a reading which suggests that an environment
           | variable being unset caused an overabundance of files being
           | deleted. `set -u` causes the script to exit if any variables
           | are unset.
        
       | fred_is_fred wrote:
       | This is HPE - not HP. Servers, not Printers.
        
       ___________________________________________________________________
       (page generated 2021-12-30 23:02 UTC)