[HN Gopher] Japan HP accidentally deleted 77TB data in Kyoto U. ...
___________________________________________________________________
Japan HP accidentally deleted 77TB data in Kyoto U. supercomputing
system
Author : rguiscard
Score : 229 points
Date : 2021-12-30 05:43 UTC (17 hours ago)
(HTM) web link (www.iimc.kyoto-u.ac.jp)
(TXT) w3m dump (www.iimc.kyoto-u.ac.jp)
| deepsun wrote:
| Yet another bug caused by programs using a command-line
| interface (which is designed for humans, not programs).
| Jimmc414 wrote:
| >the find command containing undefined variables was executed and
| deleted the files
|
| Just a note that "set -u" at the beginning of a bash script will
| cause it to throw an error for undefined variables. Be warned
| that this should be tested first, as it will also cause
| [[ $var ]] to fail when var is unset.
|
| If that's the case
|
|   [ -z "${VAR:-}" ] && echo "VAR is not set or is empty" || echo "VAR is set to $VAR"
|
| will help test that condition
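|
| A minimal sketch of the "set -u" point above, run in a child
| shell so the failure can be observed. TARGET_DIR is a
| hypothetical, deliberately unset variable and not a name from
| the incident report.

```shell
#!/usr/bin/env bash
# TARGET_DIR is a hypothetical variable that is deliberately left unset.
unset TARGET_DIR

# With `set -u`, expanding an unset variable aborts the non-interactive shell,
# so the second echo is never reached.
out=$(bash -c 'set -u; echo "cleaning $TARGET_DIR/logs"; echo "reached the end"' 2>&1) \
    && status=0 || status=$?
echo "exit status: $status"
echo "output: $out"

# The ${VAR:-} default form stays usable under `set -u`.
check=$(bash -c 'set -u; [ -z "${TARGET_DIR:-}" ] && echo "unset or empty" || echo "set"')
echo "check: $check"
```

| The child shell should stop at the first echo with a non-zero
| status, while the "${TARGET_DIR:-}" test keeps working.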
| moonbug wrote:
| it's a lustre filesystem. the data would've been eaten eventually
| anyway.
| bayindirh wrote:
| What makes you think that?
| l33tman wrote:
| I've been a Linux coder and user forever, and I didn't know that
| bash "reloads" a script while running if the file is modified.
| Good to learn before I also delete a whole filesystem due to
| this! :)
| [deleted]
| tyingq wrote:
| Is that what happened? I can't reproduce that by changing a
| bash script that's running a while [ 1 ] loop.
|
| Is it maybe that they were editing or copying the file and a
| cron job kicked off?
| marcan_42 wrote:
| That's because it's a loop, so it's already read. Try
| appending a line to a running script instead.
| tyingq wrote:
| Ah, yep. This does work, and prints out both "one" and
| "two":
|
|   printf "echo one\nsleep 3\n" > s1;(bash s1 &);sleep 1 && printf "echo two\n" >> s1
|
| That's interesting. And changing the "sleep 1" to "sleep 4"
| makes it only output "one".
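|
| The same experiment as a self-contained sketch that captures
| the output instead of printing to the terminal. The scratch
| directory and file names are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Reproduce the append-while-running behaviour in a scratch directory.
tmpdir=$(mktemp -d)
script="$tmpdir/s1"
printf 'echo one\nsleep 3\n' > "$script"

bash "$script" > "$tmpdir/out" &    # start the script in the background
pid=$!
sleep 1                             # the script is now inside its own `sleep 3`
printf 'echo two\n' >> "$script"    # append a line to the file bash is still reading
wait "$pid"

result=$(cat "$tmpdir/out")
echo "$result"
rm -rf "$tmpdir"
```

| If bash reads the script incrementally, as described in this
| thread, the appended line runs too and both lines show up.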
| [deleted]
| pharmakom wrote:
| I have switched to F# for scripting tasks and have found F#
| scripts are (usually) either correct on the first try or fail at
| the type-checking stage. I would highly recommend it for anything
| near production.
| gnufx wrote:
| Assuming this was a "scratch" HPC filesystem, as I'd guess,
| "scratch" is used advisedly -- users should be prepared to lose
| anything on it, not that it should happen with finger trouble.
| However, if I understand correctly from the comments, I'm
| surprised at the tools, and that the vendor was managing the
| filesystem. I'd expect to use https://github.com/cea-
| hpc/robinhood/wiki with Lustre, though I thought I'd seen a Cray
| presentation about tools of their own.
| booleandilemma wrote:
| It's amazing how human errors scale with technology. Just
| imagine, one day we'll be making mistakes at the Type III
| civilization level! :)
| ketanip wrote:
| Did they have any other separate backups?
| 0xbadcafebee wrote:
| Not really surprising. HPE has provided bottom-of-the-barrel
| support for decades.
| rvnx wrote:
| I really appreciate the announcement from Hewlett Packard, which
| is very apologetic:
| https://www.iimc.kyoto-u.ac.jp/services/comp/pdf/file_loss_i...
|
| They do not try to blame it on complex systems or other factors.
|
| Users lost 1 day and 1/2 of recent work (which doesn't seem to be
| that bad).
|
|   About file loss in Lustre file system in your supercomputer
|   system, we are 100% responsible. We deeply apologize for
|   causing a great deal of inconvenience due to the serious
|   failure of the file loss.
|
|   We would like to report the background of the file
|   disappearance, its root cause and future countermeasures as
|   follows:
|
|   We believe that this file loss is 100% our responsibility. We
|   will offer compensation for users who have lost files.
|
|   [...]
|
|   Impact:
|   --
|   Target file system: /LARGE0
|   Deleted files: December 14, 2021 17:32 to December 16, 2021
|   12:43
|   Files that were supposed to be deleted: Files that had not
|   been updated since 17:32 on December 3, 2021
|
|   [...]
|
|   Cause:
|   --
|   The backup script uses the find command to delete log files
|   that are older than 10 days.
|   A variable name is passed to the delete process of the find
|   command.
|   A new improved version of the script was applied on the
|   system.
|   However, during deployment, there was a lack of consideration
|   as the periodical script was not disabled.
|   The modified shell script was reloaded from the middle.
|   As a result, the find command containing undefined variables
|   was executed and deleted the files.
|
|   [...]
|
|   Further measures:
|   --
|   In the future, the programs to be applied to the system will
|   be fully verified and applied.
|   We will examine the extent of the impact and make
|   improvements so that similar problems do not occur.
|   In addition, we will re-educate the engineers in charge of
|   human error and risk prediction / prevention to prevent
|   recurrence.
|   We will thoroughly implement the measures.
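|
| A hedged sketch of how an undefined variable can widen a find
| command. The real script was not published, so LOG_DIR and the
| exact find arguments are assumptions. Nothing is deleted here;
| the commands are only printed.

```shell
#!/usr/bin/env bash
# LOG_DIR is a hypothetical variable name; the real script was not published.
unset LOG_DIR

# Unguarded: the unset variable expands to nothing, so the search root becomes "/".
unguarded="find ${LOG_DIR}/ -type f -mtime +10 -delete"
echo "would run: $unguarded"

# Guarded: ${LOG_DIR:?message} makes the shell stop instead of expanding to "".
guarded=$(bash -c 'echo "find ${LOG_DIR:?LOG_DIR is not set}/ -type f -mtime +10 -delete"' 2>&1) \
    && guard_status=0 || guard_status=$?
echo "guarded status: $guard_status"
echo "guarded output: $guarded"
```

| The ":?" expansion is one way to make a missing value fatal at
| the point of use, independent of whether "set -u" is active.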
| thaumasiotes wrote:
| > However, during deployment, there was a lack of consideration
| as the cronjob was not disabled.
|
| I'm intrigued to see that the report you link (which is in
| Japanese) mentions `find` and `bash` by those names, but
| doesn't contain the word `cron`. How does the report refer to
| the idea of a "cronjob"? Why is it different?
| pm215 wrote:
| The Japanese text in that PDF doesn't say anything about
| cron. It just says that the script was overwritten "while
| there was an executing script in existence"
| ("実行中のスクリプトが存在している状態で"), and doesn't
| say whether that was because that executing script was
| launched by cron or by hand.
| rvnx wrote:
| I took: "bash ha, shierusukuriputonoShi Xing Zhong niShi Shi
| shie", which means it's either cronjob or sleep with a loop (
| https://iww.hateblo.jp/entry/20211229/file_lost_insident )
| ramchip wrote:
| I read this sentence as "Bash reads the shell script just-
| in-time while executing it", with no context as to why it
| was running (cron, loop, by hand...)
| kragen wrote:
| "シェルスクリプト" is "shell script", but my Japanese is
| too poor to understand 実行中 or 適時.
| numpad0 wrote:
| 実行中 : while executing
|
| 適時 : appropriate times, or as needed
| rvnx wrote:
| I guess the most correct context is that the script was
| running "periodically".
|
| https://zenn.dev/mattn/articles/5af86b61004bdc
| https://iww.hateblo.jp/entry/20211229/file_lost_insident
| jacquesm wrote:
| The sense of honor and responsibility shining through is
| refreshing.
| [deleted]
| doctor_eval wrote:
| So this is something I've never understood. If you modify a
| shell script while it's running, the shell executes the
| modified file. This normally but not always causes the script
| to fail.
|
| Now I've known about this behaviour for a very long time and it
| always seemed very broken to me. It's not how binaries work (at
| least not when I was doing that kind of thing).
|
| So I guess bash or whatever does an mmap of the script it's
| running, which is presumably why modifications to the script
| are visible immediately. But if a new file was installed eg
| using cp/tar/unzip, I'm surprised that this didn't just unlink
| the old script and create a new one - which would create a new
| inode and therefore make the operation atomic, right? And this
| (I assume) is why a recompiled binary doesn't have the same
| problem (because the old binary is first unlinked).
|
| So, how could this (IMO) bad behaviour be fixed? Presumably
| mmap is used for efficiency, but isn't it possible to mark a
| file as in use so it can't be modified? I've certainly seen on
| some old Unices that you can't overwrite a running binary. Why
| can't we do the same with shell scripts?
|
| Honestly, while it's great that HP is accepting responsibility,
| and we know that this happens, the behaviour seems both
| arbitrary and unnecessary to me. Is it fixable?
| wongarsu wrote:
| > isn't it possible to mark a file as in use so it can't be
| modified?
|
| That's the route chosen by Windows for binary executables
| (exe/dll) and various other systems. Locking a file against
| writes, delete/rename or even read is just another flag in
| the windows equivalent of fopen [1]. This makes for software
| that's quite easy to reason about, but hard to update. The
| reason why you have to restart Windows to install Windows
| updates or even install some software is largely due to this
| locking mechanism: you can't update files that are open (and
| rename tricks don't work because locks apply to files, not
| inodes).
|
| With about three decades of hindsight I'm not sure if it's a
| good tradeoff. It makes it easy to prevent the race
| conditions that are an endless source of security bugs on
| unix-like systems; but otoh most software doesn't use the
| mechanism because it's not in the lowest-common-denominator
| File APIs of most programming languages; and MS is paying for
| it with users refusing to install updates because they don't
| want to restart their PC.
|
| 1: Search for FILE_SHARE_DELETE in
| https://docs.microsoft.com/en-
| us/windows/win32/api/fileapi/n...
| pjmlp wrote:
| Files in use can be shadow updated and then will be
| actually replaced when possible.
|
| Naturally no one reads MSDN docs.
|
| Also note that other operating systems that are not UNIX clones
| follow a similar approach to file locking.
| formerly_proven wrote:
| On Unix/Linux you can't update a file mmaped for execution
| either - text files are busy.
| toast0 wrote:
| I've updated .so files on FreeBSD while they're running.
| They weren't busy and a program which had it mmaped to
| run promptly crashed (my update wasn't intended to be hot
| loaded and wasn't crafted to be safe, although, it could
| have been if I knew it was possible). And now I won't
| forget why I should use install instead of cp (install
| unlinks before writing, by default, cp opens and
| overwrites the existing file)
| prussian wrote:
| >So, how could this (IMO) bad behaviour be fixed?
|
| By reading in the whole file at once. Bash does not mmap
| shared the script it is parsing. You can see this behavior
| with
|
|   strace -e read,lseek bash << EOF
|   echo 1
|   echo 2
|   EOF
|
| bash will read(), do its multi-step expansion-parsing thing
| and then lseek back so the next read starts on the next input
| it needs to handle. This is why the problems described in the
| story can happen.
|
| The other way to fix this is to simply use editors that will
| just make a new file and move over that file on the target on
| save. I believe vim or neovim does this by default, but
| things like ed or vi do not. Emacs will do something similar
| on _first_ save if you did not (setq backup-by-copying t) but
| any write after will still be done in-place. I tested this
| trivially, without reviewing the emacs source, by doing the
| following, and you can too with your $EDITOR of choice:
|
|   #!/usr/bin/env bash
|   echo test
|   sleep 10
|   # evil command below, uncomment me and save
|   # echo test2
|
| While sleep is running, if changing the script causes things
| to happen, your editor may cause the problem described.
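|
| A small sketch of the difference between overwriting in place
| and renaming a new file over the target, seen from a reader
| that already has the file open. File names are placeholders.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
printf 'new\n' > "$tmpdir/update.sh"

# Case 1: overwrite in place (what `cp` does to an existing destination).
printf 'old\n' > "$tmpdir/inplace.sh"
exec 3< "$tmpdir/inplace.sh"                  # a reader that already has the file open
cp "$tmpdir/update.sh" "$tmpdir/inplace.sh"   # same inode, contents rewritten
seen_inplace=$(cat <&3)                       # the open descriptor sees the new bytes
exec 3<&-

# Case 2: write a new file next to the target, then rename it over the target.
printf 'old\n' > "$tmpdir/renamed.sh"
exec 4< "$tmpdir/renamed.sh"
cp "$tmpdir/update.sh" "$tmpdir/renamed.sh.tmp"
mv "$tmpdir/renamed.sh.tmp" "$tmpdir/renamed.sh"   # rename within one filesystem
seen_renamed=$(cat <&4)                       # the reader still holds the old inode
exec 4<&-
now_on_disk=$(cat "$tmpdir/renamed.sh")

echo "in-place reader saw: $seen_inplace"
echo "rename reader saw:   $seen_renamed"
echo "path now contains:   $now_on_disk"
rm -rf "$tmpdir"
```

| With the rename approach, a shell that already opened the old
| script keeps reading the old inode, while new invocations get
| the new file.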
| helsinkiandrew wrote:
| > If you modify a shell script while it's running, the shell
| executes the modified file
|
| That is dependent on the OS. In this case wasn't the shell
| script just executed fresh from a cronjob?
|
| I remember on Digital Unix - on an Alpha so this was a few
| years ago - that you could change a c program (a loop that
| printed something then slept, for example), recompile and it
| would change the running binary.
| doctor_eval wrote:
| > wasn't the shell script just executed fresh from a
| cronjob?
|
| The description said that the script changed while it was
| running, so certain newly introduced environment variables
| didn't have values and this triggered the issue.
|
| My reading was that this was just a terrible coincidence -
| the cron job must have started just before the upgrade.
|
| Regarding changing a C program, now you mention it I think
| that the behaviour you describe might also have happened on
| DG/UX, after an upgrade. IIRC it used to use ETXTBSY and
| after an upgrade it would just overwrite.
|
| Not really behaviour that you want (or expect) tho.
| eklavya wrote:
| From what I know, so far linux doesn't have an exclusive lock
| capability on a file, windows does however. So in linux you
| can't mark a file in exclusive possession of a process.
| [deleted]
| eklavya wrote:
| Down voters should read up on the state of mandatory
| locking in Linux and what conditions need to be met and how
| reliable it is.
| kragen wrote:
| This behavior in shell scripts predates mmap. In very early
| versions of Unix it was arguably even useful; there was a
| goto command which was implemented by seeking on the shell-
| script file descriptor rather than as a shell builtin, for
| example. I don't know of any use for it since the transition
| to the Bourne shell, but my knowledge is far from
| comprehensive. (I suppose if your shell script is not small
| compared to the size of RAM, it might be undesirable to read
| it all in at the start of execution; shar files are a real-
| life example even on non-PDP-11 machines.)
|
| As I understand it, the reason for ETXTBSY ("on some old
| Unices...you can't overwrite a running binary") was to
| prevent segfaults.
|
| cp usually just opens the file O_WRONLY|O_TRUNC, which seems
| like the wrong default; Emacs for example does create a new
| file and rename it over the old one when you save, usually,
| allocating a new inode as you say. By default it makes an
| exception if there are other hardlinks to the file.
|
| Btrfs and xfs have a "reflink" feature that allows you to
| efficiently make a copy-on-write snapshot of a file, which
| would be ideal for this sort of thing, since the shell or
| whatever won't see any changes to the original file, even if
| it's overwritten in place. Unfortunately I don't think you
| can make _anonymous_ reflinks, so for the shell to reflink a
| shell script when it starts executing it would need write
| access to somewhere in the filesystem to put the reflink, and
| then it would need to know how to find that place, somehow.
| And of course that wouldn't help if you were running on
| ext4fs or, I imagine, Lustre, though apparently an
| implementation was proposed in 02019:
| https://wiki.lustre.org/Lreflink_High_Level_Design
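|
| A user-level approximation of the snapshot idea, not something
| the shell does itself: copy the script to a private file and
| run the copy. This assumes GNU cp, where --reflink=auto falls
| back to an ordinary copy on filesystems without reflink
| support. Paths are placeholders, and it needs a writable
| location, which is the limitation noted above.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
script="$tmpdir/job.sh"
printf 'echo one\nsleep 3\n' > "$script"

# Take a private snapshot of the script and execute the snapshot instead.
snapshot=$(mktemp "$tmpdir/snapshot.XXXXXX")
cp --reflink=auto "$script" "$snapshot"
bash "$snapshot" > "$tmpdir/out" &
pid=$!

sleep 1
printf 'echo two\n' >> "$script"    # modify the original while the snapshot runs
wait "$pid"

snapshot_result=$(cat "$tmpdir/out")
echo "$snapshot_result"
rm -rf "$tmpdir"
```

| Because the running shell reads the snapshot, the line appended
| to the original file is not part of this run.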
| sillysaurusx wrote:
| "Emacs for example does create a new file and rename it
| over the old one when you save, usually, allocating a new
| inode as you say. By default it makes an exception if there
| are other hardlinks to the file."
|
| Though the trade off is that all operation ceases on a full
| hard drive.
|
| I don't have a better solution, but it's worth noting.
| vbezhenar wrote:
| Does it mean that I need to have extra free space? Does
| not sound good.
| thaumasiotes wrote:
| Well, it looks like creating another hard link is a
| nearly-free solution. And beyond that, since emacs
| already has both behaviors, presumably you can tell it
| you want the in-place modification.
| exikyut wrote:
| > _there was a goto command which was implemented by
| seeking on the shell-script file descriptor rather than as
| a shell builtin, for example._
|
| Oh noooo I just realized you could probably implement a
| shared library loadable module for bash `enable` that does
| the same thing... just fseek()s the fd...
|
| * _Runs for the hills screaming_ *
| bluedino wrote:
| It's nice to see the same mistakes that people have been
| making for as long as I've been alive, on small and large
| systems all over the world, still happen on projects with
| professional teams from HPE or IBM that cost hundreds of
| millions of dollars.
| Hello71 wrote:
| > So I guess bash or whatever does an mmap of the script it's
| running
|
| this is incorrect, and is relatively easy to test:
|
|   $ strace -y -P /tmp/test.sh bash /tmp/test.sh
|   ioctl(3</tmp/test.sh>, TCGETS, 0x7ffc6daea580) = -1 ENOTTY (Inappropriate ioctl for device)
|   lseek(3</tmp/test.sh>, 0, SEEK_CUR) = 0
|   read(3</tmp/test.sh>, "#!/bin/sh\n", 80) = 10
|   lseek(3</tmp/test.sh>, 0, SEEK_SET) = 0
|   dup2(3</tmp/test.sh>, 255) = 255</tmp/test.sh>
|   close(3</tmp/test.sh>) = 0
|   fcntl(255</tmp/test.sh>, F_SETFD, FD_CLOEXEC) = 0
|   fcntl(255</tmp/test.sh>, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
|   newfstatat(255</tmp/test.sh>, "", {st_mode=S_IFREG|0644, st_size=10, ...}, AT_EMPTY_PATH) = 0
|   lseek(255</tmp/test.sh>, 0, SEEK_CUR) = 0
|   read(255</tmp/test.sh>, "#!/bin/sh\n", 10) = 10
|   read(255</tmp/test.sh>, "", 10) = 0
|
| the reason why modifying a script during execution can have
| unpredictable results, not demonstrated in this test, is that
| Unix shells traditionally alternate between reading commands
| and executing them, instead of reading the entire file
| (potentially very large compared to 1970s RAM size) and
| executing commands from the in-memory copy. on modern
| systems, shell script sizes are usually negligible compared
| to system RAM. therefore, you can manually cause the entire
| file to be buffered by enclosing the script in a function or
| subshell:
|
|   #!/bin/sh
|   main() {
|       # script goes here
|   }
|   main
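|
| A sketch that exercises the function-wrapper idea against an
| append made while the script runs. One assumption is added
| that is not in the comment above: "exit" is placed on the same
| line as the call to main, so the shell has already parsed it
| and does not go back to the file once main returns. File names
| are placeholders.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
script="$tmpdir/wrapped.sh"

# The whole body lives in main(); `main "$@"; exit` is parsed as one line.
cat > "$script" <<'EOF'
#!/bin/sh
main() {
    echo one
    sleep 3
}
main "$@"; exit
EOF

bash "$script" > "$tmpdir/out" &
pid=$!
sleep 1
printf 'echo two\n' >> "$script"    # append while main() is still sleeping
wait "$pid"

wrapped_result=$(cat "$tmpdir/out")
echo "$wrapped_result"
rm -rf "$tmpdir"
```

| Without the trailing exit, the shell would read on after main
| returns and could still pick up lines appended in the meantime.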
| KaiserPro wrote:
| Ahhh the joy of lustre and the accidental cronjob.
|
| About 15 years ago I experienced the same thing. An updater
| script based on rsync was trying to keep one NFS machine image
| in sync with another. However, for whatever reason, the script
| accidentally tried to sync the entire NFS root directory with
| its own, deleting everything show by show in reverse
| alphabetical order.
|
| At the time Lustre didn't really have any good monitoring tools
| for showing you who was doing what, so they had to wait till
| they hit a normal NFS server before they could figure out and
| stop what was deleting everything.
|
| Needless to say, a lot of the backups may have been failing.
| unixhero wrote:
| For this reason I actually use simpler tools than rsync.
| throwaway75787 wrote:
| rsync has a number of safety and boundary options, not to
| mention --dry-run.
|
| Options, as in I also found out about them the hard way.
| martin_vejmelka wrote:
| Just pointing out that those are most likely just the days the
| files were saved. There could still be some unlucky souls that
| ran computations for several days/weeks that happened to
| terminate on those days (and store the results). Those people
| could lose significantly more than a day and a half. On the
| flip side, HPC jobs tend to be frequently checkpointed unless
| the storage cost is prohibitive for the type of job.
| Closi wrote:
| Agreed - no corporate-speak, sounds like it was written by an
| actual human.
| Hamuko wrote:
| It sounds a lot like Japanese corpo-speak.
| Aeolun wrote:
| Which is very formulaic, but also almost definitely written
| by a human :D
| dustintrex wrote:
| Japanese companies structure apologies very differently from US
| ones, because the legal consequences are very different. In the
| US, an apology is considered an admission of responsibility and
| is often the _starting_ point of legal action against the
| culprit, while in Japan, a sufficiently sincere* apology may
| well defuse the situation entirely.
|
| * 真 makoto, a word often glossed as "sincere" but not
| identical in meaning: it's about the amount of effort you're
| willing to take on, not how "honestly" you feel something
|
| Also, the culprit here is not HP proper but their consulting/SI
| wing HP Enterprise, which has a, uhh, less than stellar
| reputation for competence.
| tiahura wrote:
| "In the US, an apology is considered an admission of
| responsibility and is often the starting point of legal
| action against the culprit, while in Japan, a sufficiently
| sincere* apology may well defuse the situation entirely."
|
| -----
|
| "hospital staff and doctors willing to discuss, apologize for
| and resolve adverse medical events through a "collaborative
| communication resolution program" experienced a significant
| decrease in the filing of legal claims, defense costs,
| liability costs and time required to close cases."
|
| https://www.natlawreview.com/article/you-had-me-i-m-sorry-
| im...
| gilmore606 wrote:
| I once caused a blank box to appear on Rakuten's homepage for
| several hours. My boss had to fly to Japan to apologize in
| person to their CEO.
| zhte415 wrote:
| I don't know a huge amount about Rakuten. Don't they
| purposefully adopt a pretty flat communication structure,
| and the CEO travels around a lot, as well as English
| demanded in senior roles?
| hcknwscommenter wrote:
| Or that is what he used to justify a free (likely business
| class) flight to JP.
| cafard wrote:
| I was deeply impressed some--30?--years ago when there was a
| minor scandal in sumo wrestling: some kid, who had been
| advanced too quickly did a few stupid things, as I recall the
| sort of thing you can find in American sports pages every
| week. The heads of the sumo wrestling association
| acknowledged that they had contributed to the situation, and
| docked their own pay. Do you think Roger Goodell is going to
| do that?
| numpad0 wrote:
| > 弊社100%の責任により
|
| Voluntarily stating 100% responsibility is consequential and
| not typical, smells politics.
| dustintrex wrote:
| I presume there were extensive discussions between the two
| parties about the wording before this statement was
| published.
| p_l wrote:
| HP and HPE are now two separate companies, split from post-
| Fiorina HP.
|
| HP does consumer grade stuff only, while HPE does the
| enterprise side (not just consulting; in fact a nontrivial
| portion of HPE's consulting arm was spun off and merged into
| DXC)
| KevinEldon wrote:
| I agree with your point, and want to add that HP does
| commercial grade end-user compute and printing along with
| the related enterprise services. They have a whole set of
| offerings for the medical industry [1], industrial printing
| [2], and enterprise PC fleet management services [3].
|
| [1] https://www.hp.com/us-
| en/printers/3d-printers/industries/hea...
|
| [2] https://www.hp.com/us-en/industrial-digital-
| presses.html
|
| [3] https://www.hp.com/us-en/services/manageability.html
| exikyut wrote:
| I think "sell the HP-35 for 3.14 x cost of materials" cool-
| HP became Agilent, right?
|
| If I wanted to follow the trail of awesomeness what forest
| should I be sticking my nose to the ground in? :)
| ted_dunning wrote:
| That depends on which trail of awesomeness you are after.
|
| The trail that maintains the legacy of DEC and Silicon
| Graphics and Cray is in HPE (where I work). The Cray
| legend is still very much alive, but you can still detect
| the whiff of the spirit that made HP and DEC
| minicomputers extraordinary.
| p_l wrote:
| Well, I suspect the SGI legacy is now in better hands
| than when it was controlled by Rackable with branding
| filed off. The only good parts they sold us were the
| Ultraviolets, and those were probably the most
| nonsensical purchase (protip: do not buy supercomputer
| modules just to run 8 VMs on it, it's a waste of money even
| if the hw is awesome)
| dbuder wrote:
| now Keysight, Agilent is medical equip.
| hk1337 wrote:
| More specifically, it spun off and merged with CSC to
| create DXC.
| resoluteteeth wrote:
| > a sufficiently sincere* apology may well defuse the
| situation entirely.
|
| > * 真 makoto, a word often glossed as "sincere" but not
| identical in meaning: it's about the amount of effort you're
| willing to take on, not how "honestly" you feel something
|
| While it's true that makoto can be translated as sincere(ly)
| in constructions like "makotoni moushiwakearimasen" (I
| sincerely apologize) (although this is written 誠に
| rather than 真に), it is unlikely that the word makoto
| would be used in a phrase like "a sincere apology" or in
| discussing how sincere an apology was, so I don't really
| think introducing the word "makoto" in your comment sheds any
| additional light on japanese culture surrounding apologies.
|
| You could actually make the exact same comment about how
| "real" in English can also effectively mean sincere in
| "really sorry" and draw the same conclusions about American
| culture.
| eof wrote:
| My take away (as westerner with zero direct Japanese
| culture exposure) from the comment you're replying to was
| that in Japan, companies are incentivized to take on some
| measure of ownership and voluntary restitution, because
| there is some legal notion in Japan around "honest mistakes
| not being litigable if they are genuinely rectified."
| Aeolun wrote:
| I think it's more that legal measures would not be
| employed, rather than that they are legally unactionable.
|
| You could have an uphill battle against an unsympathetic
| judge in front of you if you sue anyway though.
| gowld wrote:
| In the US "genuinely rectified" is the standard for civil
| suit ("actual damages" for negligence, and "specific
| performance" or monetary equivalent for contracts).
| Punitive damages are added only for malicious intent.
|
| The reason many US companies don't apologize is because
| they _don't want_ to make restitution, and can get away
| without paying.
| [deleted]
| [deleted]
| Hello71 wrote:
| > In the US, an apology is considered an admission of
| responsibility and is often the starting point of legal
| action against the culprit
|
| source? this is a popular theory among non-lawyers, but not,
| as far as I can tell, well-supported by the evidence.
| http://jaapl.org/content/early/2021/05/19/JAAPL.200107-20 has
| extensive citations including for their claim that "In
| theory, telling a patient about an error may make patients
| more likely to pursue litigation. In practice, however, bad
| outcomes alone are typically not reason enough for patients
| or their families to file malpractice claims."
| blacksmith_tb wrote:
| Medicine may not fit the pattern (personally I'd want to
| know where I stand, even if the news is bad), but I took
| the OP to be saying "American firms prefer not to go on
| record saying they screwed up, since that would naturally
| be brought up in subsequent legal proceedings".
| imiric wrote:
| Apologies, both personal and corporate, are taken very
| seriously in Japanese culture[1]. They're a way of preserving
| honor that dates back to the samurai era. You can see this in
| the custom of bowing, where the length and extension of a bow
| reflects the gravity of the situation. The act of seppuku can
| be considered an extreme version of this.
|
| I'm not a Japanophile, but find their culture fascinating.
|
| [1]:
| https://theculturetrip.com/asia/japan/articles/sumimasen-
| beh...
| sandworm101 wrote:
| Because Japanese culture accepts failure. Admission and
| acceptance preserves honour. Western culture, particularly
| in north America, punishes the admission of failure.
| Everyone is supposed to fight things out to the last, to
| never give an inch. Even when companies settle cases they
| rarely admit wrongdoing publicly. Notice that this Japanese
| company talks about re-educating/training personnel. An
| American company would be expected to sack all involved and
| sue the consulting firm.
| tiahura wrote:
| Hmm. I suppose it's subjective, but isn't America's
| acceptance of failure often cited as an element in the
| success of its entrepreneurial spirit?
|
| And, wrt samurai and failure, seppuku?
| adventured wrote:
| > isn't America's acceptance of failure often cited as an
| element in the success of its entrepreneurial spirit
|
| Yes, the parent comment is entirely wrong. The US culture
| is hyper accepting of failure compared to most other
| prominent cultures, including Japanese culture (which in
| fact does not tolerate failure very well at all).
| maneesh wrote:
| Yes, I think op is conflating the terms of apology and
| failure in this instance.
| halpert wrote:
| Weren't Japanese soldiers in WWII instructed to kill
| themselves instead of being captured to "preserve honor?"
| That doesn't sound like the acceptance of failure to me.
| singlow wrote:
| Only certain classes of officers if I remember correctly,
| and it more likely was motivated by preventing the
| leaking of intelligence under interrogation by the enemy,
| rather than really being about preserving honor. Of
| course it's probably easier to carry out seppuku if you
| convince yourself its honorable, rather than to just do
| it based on rationality.
|
| Not all failure results in capture, so one particular
| failure scenario does not speak of the general view on
| failure.
| moogleii wrote:
| Citizens were also instructed to kill themselves.
| bsanr2 wrote:
| They were instructed to avoid capture by any means
| because they had been indoctrinated by propaganda that
| indicated horrific treatment by Allied POW personnel.
| While this was obviously false, it's interesting to note
| that Japanese knowledge of America's history with
| slavery, segregation, and Indian removal would have made
| this assumption not unreasonable, and further, may have
| influenced Japanese treatment of American POWs. After
| all, a major consideration in Japan's decision to go to
| war in the first place was the leadership's understanding
| that their lack of status as a white power would hamper
| their colonial ambitions. They were only a few decades
| removed from being excluded from the Berlin Conference,
| for example.
| shakow wrote:
| > they had been indoctrinated by propaganda that
| indicated horrific treatment by Allied POW personnel.
|
| Sources? I have never heard of that yet.
| AdrianB1 wrote:
| Not true, there are lots of sources of information that
| give the entire picture: it is a tradition from the age
| of the samurai.
| argiopetech wrote:
| It's far too much to go in to in the HN comments section
| from a phone, but removing the key military leadership
| who pushed this mindset, a new government structure, and
| the rejuvenation of business under e.g., Deming caused a
| significant shift in Japanese culture following the war.
|
| Edit: This is not necessarily in support of the
| grandparent comment; rather, a caution against judging a
| culture based on historical anecdotes.
| hollander wrote:
| But at least you won't end up in court.
| lvass wrote:
| That's exactly what acceptance of failure sounds like to
| me. E.g. you abandon a chess game when you accept you'll
| lose. To just keep pushing for a situation you know is
| irredeemable is to deny you've failed.
| gowld wrote:
| A captured soldier in war doesn't get to go home and try
| again after apologizing.
|
| Also, going to war is inherently already a massive
| corruption of culture.
| AdrianB1 wrote:
| When you learned about that, you did not read the entire
| paragraph: being captured was traditionally considered a
| dishonor (for the past 500 years, at least), it is a huge
| failure for a soldier (or samurai). Ritual suicide is the
| solution to that dishonor.
|
| There are degrees of failure, some is considered to be
| beyond fixing.
| whatshisface wrote:
| That sounds like saying that the ongoing US war crimes at
| Guantanamo are important for understanding Google app
| engine service contracts. Just because Japan is far away
| doesn't mean that everything that happened there happened
| in the same place at the same time.
| moogleii wrote:
| Unfortunately it can also be taken a bit too far. It's not
| uncommon to essentially buy yourself out of punitive
| criminal justice via an apology and some cash (jidan/gomen
| money). It's not totally unlike a settlement here, except
| it's much more acceptable as an opening move (as a victim,
| you should seriously consider it). It seems to be in line
| with the goal of preserving social harmony and making "it"
| go away asap.
| xvilka wrote:
| On the other hand, the failure is generally frowned upon
| and avoided at all costs.
| EvanAnderson wrote:
| Japanese firms are interesting to work with. I enjoyed the
| culture, personally. I have two anecdotes.
|
| The old white box PC shop I worked for, back in the 90s,
| quoted PCs to a Japanese-owned auto parts manufacturer. The
| Customer accepted the quote, paid an up-front deposit, and
| requested the PCs not be built and delivered until some
| construction at their site was completed in a few months.
|
| In the meantime component pricing went down and
| speeds/feeds went up. When the PCs were built we ended up
| being forced to source higher clock speed CPUs and larger
| hard disk drives. It took some cajoling to get the Customer
| to take delivery. They felt they should pay more for the
| upgrades. (We were actually making more money on the deal
| even with the upgraded components anyway!)
|
| My current company once pitched a support agreement to a
| Japanese-owned firm. We offered a discount for annual
| commitment versus month-to-month. I'd copy/pasted the
| month-to-month terms for the annual but forgot to alter the
| minimum notice period for ending the annual option. Both
| month-to-month and annual indicated a notice period of 30
| days rather than the intended 180 days for the annual
| option.
|
| The Customer's contact questioned the notice period being
| the same. He asked why they wouldn't opt for annual
| commitment, get the discounted rate, and also have the 30
| day notice period. My partner, who had worked with Japanese
| firms in the past, responded: "We know you wouldn't choose
| the annual option if your intention was not to work with us
| for at least a year." The Customer agreed and we ended up
| getting the gig on an annual basis.
| masklinn wrote:
| > The Customer's contact questioned the notice period
| being the same. He asked why they wouldn't opt for annual
| commitment, get the discounted rate, and also have the 30
| day notice period. My partner, who had worked with
| Japanese firms in the past, responded: "We know you
| wouldn't choose the annual option if your intention was
| not to work with us for at least a year." The Customer
| agreed and we ended up getting the gig on an annual
| basis.
|
| That's a really cute answer by your partner, I like it a
| lot, even ignoring that it manages to save face while
| paying serious respect to the purported customer.
| EvanAnderson wrote:
| Writing the parent comment gave me, if nothing else, an
| excuse to publicly document that exchange. I was, and
| still am, in awe of his ability to think on his feet and
| come up with such a wholly appropriate response.
| zengargoyle wrote:
| I flip table rage quit my 15 year job at a university over a
| new C-suite from who knows where throwing HPE
| equipment/software/consultants at me for a big sort of
| project and finding them all so utter crap that I wouldn't be a
| part of that stupidity. I have my standards.
| hpcjoe wrote:
| Before I left HPE post Cray acquisition, some of the folks
| in the consulting/"cloud" division insisted that to use
| their tooling, we had to insert a windows machine into
| clusterstor, that mounted lustre, so that they could run
| their powershell script to gather usage metrics.
|
| What I found working with these teams, was that the desire
| to flip tables was quite strong after meeting with them. I
| tried to address their concerns, one point at a time. They
| were bewildered that windows could not mount (modern)
| lustre. Really bewildered. I offered to help rewrite their
| scripts in another (portable) language, so we could avoid
| these problems. Still they were bewildered.
|
| They were not why I left. Merely a confirmation that my
| decision to leave was the right one.
| codedokode wrote:
| > As a result, the find command containing undefined variables
| was executed
|
| And this is why shell should not execute commands with
| "undefined" variables and give an error instead.
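A minimal sketch of the kind of guard being asked for here, using the standard `${VAR:?message}` expansion. The report does not show the actual script, so `cleanup_old_logs`, `LOG_DIR`, the `*.log` pattern and the 10-day cutoff are all hypothetical. Without the guard, an unset `LOG_DIR` would turn `"$LOG_DIR/"` into `/`.

```bash
# Hypothetical cleanup helper; LOG_DIR and the retention period are
# illustrative assumptions, not taken from the incident report.
cleanup_old_logs() {
    # ${LOG_DIR:?...} fails with an error if LOG_DIR is unset or empty
    # (a non-interactive shell exits), so find is never started with
    # a path that has collapsed to "/".
    find "${LOG_DIR:?LOG_DIR is not set}/" -type f -name '*.log' -mtime +10 -delete
}
```

Called with `LOG_DIR` pointing at a real directory, this removes only old `*.log` files there. Called with `LOG_DIR` unset, it stops before `find` runs.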
| taubek wrote:
| 77TB in one and a half days? Impressive.
|
| The style of apology is very nice. It is not as extensive as some
| technical post mortem analyses that I've read, but all of the
| important things are here.
| agumonkey wrote:
| What a strangely simple error
| Solstinox wrote:
| I think I found the problem:
|
| "A new improved version of the script was applied on the
| system."
| olliej wrote:
| 1.5 days isn't too bad. If it were me my primary concern would
| be losing bash history :D
| nikanj wrote:
| Every shell script should start with set -e and set -u
| thadk wrote:
| https://www.gnu.org/software/bash/manual/bash.html#The-
| Set-B...
| sillysaurusx wrote:
| e doesn't work for (subshell | commands), and u is
| inconvenient when appending to PATHs. Every tool has its
| place, and dogma is often unhelpful.
| CyberShadow wrote:
| > e doesn't work for (subshell | commands)
|
| That's not an argument against enabling it.
|
| In bash, -o pipefail addresses this.
|
| > and u is inconvenient when appending to PATHs
|
| PATH should always be set. Try: env -i sh -c 'echo $PATH'
|
| If you're prioritizing convenience over correctness,
| prepare to face the consequences.
|
| > Every tool has its place, and dogma is often unhelpful.
|
| Visual Basic's "ON ERROR RESUME NEXT" perhaps also had its
| place. That doesn't mean that using it is good advice.
|
| If anything, I would consider the often cited wooledge etc.
| advice of not using -e/-u as dogma. Case in point: no one
| lost 77TB of data because they should _not_ have used
| -e/-u.
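A small sketch of the pipeline point above: with only `-e`, a failure in an early pipeline stage goes unnoticed, and `-o pipefail` changes that. The function names are made up for illustration; each case starts its own `bash` so the options stay contained.

```bash
# Each case runs in its own bash process so the options do not leak
# into the calling shell.
without_pipefail() {
    # The pipeline's status is that of the last stage (cat), so set -e
    # sees success and the script carries on.
    bash -c 'set -e; false | cat; echo "continued"'
}

with_pipefail() {
    # With pipefail the pipeline reports the failing stage, so set -e
    # stops the script before the echo.
    bash -c 'set -e -o pipefail; false | cat; echo "continued"'
}
```

Running `without_pipefail` prints `continued`. Running `with_pipefail` prints nothing and returns a non-zero status.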
| sillysaurusx wrote:
| I said PATHs, not PATH. There are at least four I use on
| a regular basis.
|
| Super not interested in a pedantic debate. It's easy to
| armchair analyze. I found flaws in 55 codebases at
| Matasano, and yours is no exception.
|
| e makes it super annoying to pass a variable number of
| args to a script, since shift will fail and cause an
| exit.
|
| I do usually turn it on after, but you seem like the type
| to fail a code review if a script doesn't start with it.
| I don't think that's a productive attitude.
| CyberShadow wrote:
| That's quite a number of bad-faith assumptions in your
| comment, which are also incidentally wrong.
|
| > e makes it _super annoying_
|
| I rest my case?
| sillysaurusx wrote:
| I don't think the audience is interested in this. If
| you'd like to be specific, I'm happy to talk about
| specific critiques. Otherwise it's just posturing, and
| there are better things to do over the holidays.
|
| The original assertion was that under no circumstances
| should a bash script not begin with -e. I gave a
| circumstance (passing optional arguments), and said dogma
| is often counterproductive. I stand by all of those.
|
| Let's agree to disagree and move on.
| Aeolun wrote:
| I kind of agree with your point that there should be
| exceptions, but I think I also agree with OP that using
| -e as a general rule is probably a safe starting point.
| stonewareslord wrote:
| I disagree. You can write shell scripts just fine and
| always set -euo pipefail
|
| * I'm not sure what you mean by four PATHs, but if you
| really mean to be using unset variables for them, you
| should be using "${V-}" or "${V:-}" syntax which does
| not fail. But again I don't know why you would do this
| other than maybe [[ "${1-}" ]]
|
| * Variable arguments are still trivial with $#. Check
| (($#>3)), use while (($#>0)), etc
|
| I also disagree that this is unproductive. With minor
| modifications/(adding :- or -), you can prevent a whole
| class of bugs (undefined variables). This would have
| prevented real-world issues such as in the post here as
| well as Steam when it wiped home directories since it ran
| (not sure the exact syntax) rm -rf $STEAMROOT/* with an
| unset variable
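A sketch of the argument handling described above, under `set -eu`: loop on `$#` so `shift` never runs on an empty list, and use `${var:-default}` for optional values. `parse_args` and its `-v` flag are hypothetical, and the POSIX test `[ "$#" -gt 0 ]` stands in for the bash-only `(($#>0))`.

```bash
# parse_args is a hypothetical helper; the -v flag is illustrative.
# The body is a subshell, so set -eu stays local to the function.
parse_args() (
    set -eu
    verbose=0
    target=""
    # Loop on the argument count instead of shifting blindly, so shift
    # is only reached while at least one argument remains.
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -v) verbose=1 ;;
            *)  target="$1" ;;
        esac
        shift
    done
    # ${target:-none} supplies a fallback for the empty case.
    echo "verbose=$verbose target=${target:-none}"
)
```

Calling `parse_args` with no arguments, with `-v`, or with `-v some/path` all complete without tripping `-e` or `-u`.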
| vbezhenar wrote:
| I long adopted /bin/sh -eu
|
| header in my scripts. It's a must-have.
| CyberShadow wrote:
| If you mean as a shebang (#!/bin/sh -eu), I would suggest
| switching to using "set" instead, because the shebang will
| not be interpreted if the script is run as "sh script.sh"
| (as opposed to ./script.sh).
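The difference can be sketched with two throwaway scripts, both run as `sh script`. The file names and `UNDEFINED_VAR` are made up for the demonstration, and `UNDEFINED_VAR` is assumed not to be set in the environment.

```bash
# Two throwaway scripts that both reference an undefined variable.
demo_dir=$(mktemp -d)

# Options only on the shebang line.
printf '%s\n' '#!/bin/sh -eu' \
    'echo "value: $UNDEFINED_VAR"' \
    'echo "still running"' > "$demo_dir/shebang_only.sh"

# Options set inside the script body.
printf '%s\n' '#!/bin/sh' 'set -eu' \
    'echo "value: $UNDEFINED_VAR"' \
    'echo "still running"' > "$demo_dir/set_in_body.sh"

# Invoked as "sh script", the shebang line is only a comment, so the
# -eu on it is never applied and the script runs to the end.
shebang_result=$(sh "$demo_dir/shebang_only.sh" 2>/dev/null) || true

# "set -eu" in the body applies however the script is started, so the
# shell stops at the undefined variable.
body_result=$(sh "$demo_dir/set_in_body.sh" 2>/dev/null) || true
```

Afterwards `shebang_result` includes the `still running` line, while `body_result` is empty.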
| kop316 wrote:
| Pardon my ignorance, but what do those do? Searching for it
| doesn't give me anything.
| Symbiote wrote:
| See
| https://www.gnu.org/software/bash/manual/bash.html#The-
| Set-B..., in short -e makes scripts exit if a command
| fails, and -u makes them exit if a variable is undefined.
|
| If you think your colleagues won't know this, "set -o
| errexit; set -o nounset" would be easier for them to
| search on.
|
| (Via "3.4 Shell Parameters" - "3.4.1 Positional
| Parameters" - "4 Shell Builtin Commands", or searching
| the whole page for "-e".)
| kop316 wrote:
| Thank you very much, I appreciate it.
| AdamJacobMuller wrote:
| > The modified shell script was reloaded from the middle.
|
| This is an incredible edge case. I'm amazed they hit this issue
| and just as amazed that they correctly identified that issue
| and reported on it.
|
| This response is great, it's the exact opposite of the wishy-
| washy mealy-mouthed response to the lastpass security incident.
| capableweb wrote:
| Interesting, seems the shell script was executed from the cron
| job just as it was being replaced on the server itself?
| s5300 wrote:
| Huh. I may be remembering incorrectly, but I recall having
| somebody somewhat entrenched in related business tell me that
| HP has been going downhill from an industry perspective roughly
| two years ago...
|
| Nice to see them completely own up to the mistake right away. I
| wonder who made the final call on doing so, companies admitting
| fault so transparently & immediately offering recourse seems
| pretty damn rare anymore.
|
| Without the intent of sounding xenophobic, I wonder if it's
| because it's HP Japan where reputation is much more culturally
| important. US MBAs admitting fault... haha...
| ananonymoususer wrote:
| The cause of this is a known behavior of Unix/Linux scripts, but
| unfortunately not everyone knows this. If you change a script
| while it is running, the shell that runs it will read (what it
| thinks is) the next line of the old script: it keeps reading at the
| offset it had reached in the old script file, but from the new
| script file. So what it reads and executes will probably not be
| what you wanted.
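A rough way to observe this, and one common mitigation, sketched under the assumption that `bash` is available. The first script appends a line to its own file while running. The second puts its body in a function and places the call and `exit` on one line, so bash has already parsed everything it will execute. The file names and the `main` wrapper are illustrative, and this is not what the vendor's script did.

```bash
demo_dir=$(mktemp -d)

# Unprotected: the script appends a line to its own file while running.
cat > "$demo_dir/plain.sh" <<'EOF'
echo one
echo 'echo two' >> "$0"
EOF

# Wrapped: the body is a function, and "main; exit" is a single line,
# so bash never goes back to the file for more commands.
cat > "$demo_dir/wrapped.sh" <<'EOF'
main() {
    echo one
    echo 'echo two' >> "$0"
}
main "$@"; exit
EOF

plain_output=$(bash "$demo_dir/plain.sh")
wrapped_output=$(bash "$demo_dir/wrapped.sh")
```

With the behavior described above, `plain_output` should pick up the appended `echo two`, while `wrapped_output` should contain only `one`. The wrapper protects the running copy of one file only; replacing the file by renaming a new copy over it also avoids the in-place edit.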
| mayurbirle wrote:
| nani!
| pettycashstash2 wrote:
| Looks like 10 of 14 groups were restored from backup.
| rguiscard wrote:
| In the process of functional modification of the backup program
| by Hewlett-Packard Japan, the supplier of the supercomputer
| system, there was a problem in the unintentional modification of
| the program and its application procedure, which caused a
| malfunction in the process of deleting the files under the
| /LARGE0 directory instead of deleting the backup log files that
| are no longer needed.
|
| Translated with www.DeepL.com/Translator (free version)
| Proven wrote:
| vardump wrote:
| That's a lot of floppy disks!
| quelsolaar wrote:
| Who brought tres commas?
| alekun wrote:
| just in case someone didn't see this masterpiece
| https://youtu.be/vvDK8tMyCic
| marcan_42 wrote:
| Everyone is mentioning error control for shell scripts or "don't
| use shell scripts", but neither of those is the solution to
| _this_ problem. The solution to this problem is correctly
| implementing atomic deployment, which is important for any system
| using any programming language.
|
| What I like to do is have two directories I ping pong between
| when deploying, and a `cur` symlink that points to the current
| version. The symlink is atomically replaced (new symlink and
| rename it over) whenever the deploy process completes. Any
| software/scripts using that tree will be written to first chdir()
| in, which will resolve the symlink at that time, and thus won't
| be affected by the deploy (at least as long as you don't do it
| twice in a row; if that is a concern due to long running
| processes, you could use timestamped directories instead and a
| garbage collection process that cleans stuff up once it is
| certain there are no users left).
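The symlink swap described above might look roughly like this. `deploy_release`, the `releases/` layout and the `cur.new` temporary name are hypothetical, and `mv -T` assumes GNU coreutils. The release directory is expected to be fully written before the swap.

```bash
# deploy_release is a hypothetical helper. It assumes GNU coreutils
# (mv -T) and that "$1/releases/$2" was fully written beforehand.
deploy_release() {
    root=$1
    release=$2
    # Build the new symlink under a temporary name first...
    ln -s "releases/$release" "$root/cur.new"
    # ...then rename it over the old one. The rename replaces "cur" in
    # one step, so readers see either the old or the new target.
    # -T stops mv from moving cur.new *into* the directory that the
    # existing "cur" symlink points at.
    mv -T "$root/cur.new" "$root/cur"
}
```

A consumer would then resolve the link once at startup, for example with `cd -P "$root/cur"`, and keep working in that directory even if a later deploy repoints `cur`.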
| db65edfc7996 wrote:
| The original blue-green deployment strategy. I have done a
| similar thing as well.
| karlerss wrote:
| When communicating non-critical data-loss to teammates, I like to
| do it with this haiku:
|
| Three things are certain:
| Death, taxes, and lost data.
| Guess which has occurred.
|
| From https://www.gnu.org/fun/jokes/error-haiku.en.html
| nh2 wrote:
| > However, during deployment, there was a lack of consideration as
| the periodical script was not disabled. The modified shell script
| was reloaded from the middle.
|
| In my opinion, this is the wrong takeaway, and an important
| lesson was not learned.
|
| It's not an operator "lack of consideration".
|
| The lesson should be "when dealing with important data, do not
| use outrageously bad programming languages that allow run-time
| code rewriting, and that continue to execute even in the presence
| of undefined variables".
|
| If you use shell scripting, this is bound to happen, and will
| happen again.
|
| "We'll use Python or anything else instead of shell" would
| fundamentally remove the possibility of this category of failure.
| toast0 wrote:
| > outrageously bad programming languages that allow run-time
| code rewriting
|
| Almost all languages allow run-time code rewriting. Some of
| them just make it easier than others, and some of them make it
| a very useful feature. If you're very careful, updating a bash
| script while you're running it can be useful, but most often
| it's a mistake; in Erlang, hot loading is usually intentional
| and often useful. Most other languages don't make it easy, so
| you'll probably only do it if it's useful.
| 0xbadcafebee wrote:
| The problem was not that they used shell scripts. The problem
| was that the people writing the shell scripts were just bad
| programmers. If you hire a bad programmer to write them in
| Python, they'll still have tons of bugs.
|
| The shell scripts I write have fewer bugs than the Python code
| I see other teams churn out. But that's because I know what I'm
| doing. Don't hire people who don't know what they're doing.
| j1elo wrote:
| I guess this is as good of a time as any other to remind people
| to use the "unofficial" Bash strict mode:
|
| https://gist.github.com/robin-a-meade/58d60124b88b60816e8349...
| [^1]
|
| And always, _always_, use ShellCheck
| (https://www.shellcheck.net/) to catch most pitfalls and common
| mistakes in this powerful but dangerous language that is shell
| scripting.
|
| [^1]: I think this gist is better than the original article on
| which it is based, because the article also suggested changing
| the IFS variable, which is _not_ good advice, so sadly
| the original text becomes a bad recommendation!
| xvilka wrote:
| And don't use shell for writing complex scripts, there are
| better automation tools and languages.
| [deleted]
| j1elo wrote:
| Good point, except if an important part of your complex
| script is really just plumbing the outputs of one program to
| the inputs of another. Because that's what shell scripting
| excels at. Calling an external process is a first-class
| citizen in shell, whereas it is a somewhat clunky thing (or
| at the very least, much more verbose) to do in any other
| languages.
| rrll22 wrote:
| Such as?
| dqpb wrote:
| Python
| BeatQuestGames wrote:
| For example, take my project.
|
| https://github.com/Mylab6/PiBluetoothMidSetup
|
| While I could have done this in Bash.
|
| 1. I don't really like Bash
|
| 2. Python is much easier. I did challenge myself to only
| use Python's built in libraries, but aside from being
| unable to use Yaml everything works.
|
| I can imagine in some environments you might not have
| access to a Python interpreter though...
| thaumasiotes wrote:
| > I guess this is as good of a time as any other to remind
| people to use the "unofficial" Bash strict mode
|
| Not really; the report doesn't mention any error in the script.
| leoh wrote:
| There is a reading which suggests that an environment
| variable being unset caused an overabundance of files being
| deleted. `set -u` causes the script to exit if any variables
| are unset.
| fred_is_fred wrote:
| This is HPE - not HP. Servers, not Printers.
___________________________________________________________________
(page generated 2021-12-30 23:02 UTC)