[HN Gopher] Unix Admin Horror Story Summary (1992)
       ___________________________________________________________________
        
       Unix Admin Horror Story Summary (1992)
        
       Author : voxadam
       Score  : 102 points
       Date   : 2021-12-21 15:38 UTC (7 hours ago)
        
 (HTM) web link (www-uxsup.csx.cam.ac.uk)
 (TXT) w3m dump (www-uxsup.csx.cam.ac.uk)
        
       | cecilpl2 wrote:
       | I found my favorite story buried in the middle of this, from
       | 1986. It's a classic on par with The Story of Mel, a Real
       | Programmer. Reproduced here for your reading pleasure:
       | 
       | Have you ever left your terminal logged in, only to find when you
       | came back to it that a (supposed) friend had typed "rm -rf ~/*"
       | and was hovering over the keyboard with threats along the lines
       | of "lend me a fiver 'til Thursday, or I hit return"?  Undoubtedly
       | the person in question would not have had the nerve to inflict
       | such a trauma upon you, and was doing it in jest.  So you've
       | probably never experienced the worst of such disasters....
       | 
       | It was a quiet Wednesday afternoon.  Wednesday, 1st October, 15:15
       | BST, to be precise, when Peter, an office-mate of mine, leaned
       | away from his terminal and said to me, "Mario, I'm having a little
       | trouble sending mail."  Knowing that msg was capable of confusing
       | even the most capable of people, I sauntered over to his terminal
       | to see what was wrong.  A strange error message of the form (I
       | forget the exact details) "cannot access /foo/bar for userid 147"
       | had been issued by msg.  My first thought was "Who's userid 147?;
       | the sender of the message, the destination, or what?"  So I leant
       | over to another terminal, already logged in, and typed
       | 
       |     grep 147 /etc/passwd
       | 
       | only to receive the response
       | 
       |     /etc/passwd: No such file or directory.
       | 
       | Instantly, I guessed that something was amiss.  This was confirmed
       | when in response to
       | 
       |     ls /etc
       | 
       | I got
       | 
       |     ls: not found.
       | 
       | I suggested to Peter that it would be a good idea not to try
       | anything for a while, and went off to find our system manager.
       | 
       | When I arrived at his office, his door was ajar, and within ten
       | seconds I realised what the problem was.  James, our manager, was
       | sat down, head in hands, hands between knees, as one whose world
       | has just come to an end.  Our newly-appointed system programmer,
       | Neil, was beside him, gazing listlessly at the screen of his
       | terminal.  And at the top of the screen I spied the following
       | lines:
       | 
       |     # cd
       |     # rm -rf *
       | 
       | Oh, shit, I thought.  That would just about explain it.
       | 
       | I can't remember what happened in the succeeding minutes; my
       | memory is just a blur.  I do remember trying ls (again), ps, who
       | and maybe a few other commands beside, all to no avail.  The next
       | thing I remember was being at my terminal again (a multi-window
       | graphics terminal), and typing
       | 
       |     cd /
       |     echo *
       | 
       | I owe a debt of thanks to David Korn for making echo a built-in of
       | his shell; needless to say, /bin, together with /bin/echo, had
       | been deleted.  What transpired in the next few minutes was that
       | /dev, /etc and /lib had also gone in their entirety; fortunately
       | Neil had interrupted rm while it was somewhere down below /news,
       | and /tmp, /usr and /users were all untouched.
       | 
       | Meanwhile James had made for our tape cupboard and had retrieved
       | what claimed to be a dump tape of the root filesystem, taken four
       | weeks earlier.  The pressing question was, "How do we recover the
       | contents of the tape?".  Not only had we lost /etc/restore, but
       | all of the device entries for the tape deck had vanished.  And
       | where does mknod live?  You guessed it, /etc.  How about recovery
       | across Ethernet of any of this from another VAX?  Well, /bin/tar
       | had gone, and thoughtfully the Berkeley people had put rcp in /bin
       | in the 4.3 distribution.  What's more, none of the Ether stuff
       | wanted to know without /etc/hosts at least.  We found a version of
       | cpio in /usr/local, but that was unlikely to do us any good
       | without a tape deck.
       | 
       | Alternatively, we could get the boot tape out and rebuild the root
       | filesystem, but neither James nor Neil had done that before, and
       | we weren't sure that the first thing to happen would be that the
       | whole disk would be re-formatted, losing all our user files.  (We
       | take dumps of the user files every Thursday; by Murphy's Law this
       | had to happen on a Wednesday).  Another solution might be to
       | borrow a disk from another VAX, boot off that, and tidy up later,
       | but that would have entailed calling the DEC engineer out, at the
       | very least.  We had a number of users in the final throes of
       | writing up PhD theses and the loss of a maybe a weeks' work (not
       | to mention the machine down time) was unthinkable.
       | 
       | So, what to do?  The next idea was to write a program to make a
       | device descriptor for the tape deck, but we all know where cc, as
       | and ld live.  Or maybe make skeletal entries for /etc/passwd,
       | /etc/hosts and so on, so that /usr/bin/ftp would work.  By sheer
       | luck, I had a gnuemacs still running in one of my windows, which
       | we could use to create passwd, etc., but the first step was to
       | create a directory to put them in.  Of course /bin/mkdir had gone,
       | and so had /bin/mv, so we couldn't rename /tmp to /etc.  However,
       | this looked like a reasonable line of attack.
       | 
       | By now we had been joined by Alasdair, our resident UNIX guru, and
       | as luck would have it, someone who knows VAX assembler.  So our
       | plan became this: write a program in assembler which would either
       | rename /tmp to /etc, or make /etc, assemble it on another VAX,
       | uuencode it, type in the uuencoded file using my gnu, uudecode it
       | (some bright spark had thought to put uudecode in /usr/bin), run
       | it, and hey presto, it would all be plain sailing from there.  By
       | yet another miracle of good fortune, the terminal from which the
       | damage had been done was still su'd to root (su is in /bin,
       | remember?), so at least we stood a chance of all this working.
       | 
       | Off we set on our merry way, and within only an hour we had
       | managed to concoct the dozen or so lines of assembler to create
       | /etc.  The stripped binary was only 76 bytes long, so we converted
       | it to hex (slightly more readable than the output of uuencode),
       | and typed it in using my editor.  If any of you ever have the same
       | problem, here's the hex for future reference:
       | 
       |     070100002c000000000000000000000000000000000000000000000000000000
       |     0000dd8fff010000dd8f27000000fb02ef07000000fb01ef070000000000bc8f
       |     8800040000bc012f65746300
       | 
       | I had a handy program around (doesn't everybody?) for converting
       | ASCII hex to binary, and the output of /usr/bin/sum tallied with
       | our original binary.  But hang on---how do you set execute
       | permission without /bin/chmod?  A few seconds thought (which as
       | usual, lasted a couple of minutes) suggested that we write the
       | binary on top of an already existing binary, owned by
       | me...problem solved.
       | 
       | So along we trotted to the terminal with the root login, carefully
       | remembered to set the umask to 0 (so that I could create files in
       | it using my gnu), and ran the binary.  So now we had a /etc,
       | writable by all.  From there it was but a few easy steps to
       | creating passwd, hosts, services, protocols, (etc), and then ftp
       | was willing to play ball.  Then we recovered the contents of /bin
       | across the ether (it's amazing how much you come to miss ls after
       | just a few, short hours), and selected files from /etc.  The key
       | file was /etc/rrestore, with which we recovered /dev from the dump
       | tape, and the rest is history.
       | 
       | Now, you're asking yourself (as I am), what's the moral of this
       | story?  Well, for one thing, you must always remember the immortal
       | words, DON'T PANIC.  Our initial reaction was to reboot the
       | machine and try everything as single user, but it's unlikely it
       | would have come up without /etc/init and /bin/sh.  Rational
       | thought saved us from this one.
       | 
       | The next thing to remember is that UNIX tools really can be put to
       | unusual purposes.  Even without my gnuemacs, we could have
       | survived by using, say, /usr/bin/grep as a substitute for
       | /bin/cat.
       | 
       | And the final thing is, it's amazing how much of the system you
       | can delete without it falling apart completely.  Apart from the
       | fact that nobody could login (/bin/login?), and most of the useful
       | commands had gone, everything else seemed normal.  Of course, some
       | things can't stand life without say /etc/termcap, or /dev/kmem, or
       | /etc/utmp, but by and large it all hangs together.
       | 
       | I shall leave you with this question: if you were placed in the
       | same situation, and had the presence of mind that always comes
       | with hindsight, could you have got out of it in a simpler or
       | easier way?
       | 
       | Answers on a postage stamp to:
       | 
       | Mario Wolczko
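
Two steps from the story above can be sketched in a POSIX shell: turning typed ASCII hex back into bytes, and ending up with an executable without chmod by writing over a file that already has execute permission. This is only an illustration under stated assumptions: hex_to_bin is a hypothetical name (the story does not show Mario's own converter), the sample input is just the last five bytes of the listing ("/etc" plus a NUL), and a temp file with mode 755 stands in for the "already existing binary".

```shell
#!/bin/sh
# hex_to_bin: hypothetical helper that writes the bytes named by a hex string.
hex_to_bin() {
    hex=$1
    # Refuse odd-length input: every byte needs exactly two hex digits.
    [ $(( ${#hex} % 2 )) -eq 0 ] || return 1
    while [ -n "$hex" ]; do
        rest=${hex#??}
        pair=${hex%"$rest"}          # the first two characters
        hex=$rest
        # printf takes octal escapes portably, so convert hex -> octal first.
        printf "\\$(printf '%03o' "$((0x$pair))")"
    done
}

# Stand-in (an assumption of this sketch) for an existing executable file.
target=$(mktemp) || exit 1
chmod 755 "$target"

# Redirection truncates and rewrites the file but leaves its mode alone.
hex_to_bin 2f65746300 > "$target"
ls -l "$target"      # the execute bits are still set
wc -c < "$target"    # 5 bytes: "/etc" plus a trailing NUL
rm -f "$target"
```

The point of the redirection is that the file's inode, owner and mode are reused; only the contents change.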
        
       | geocrasher wrote:
       | This one made me LOL:
       | 
       |     My mistake on SunOS (with OpenWindows) was to try and clean up
       |     all the '.*' directories in /tmp. Obviously "rm -rf /tmp/*"
       |     missed these, so I was very careful and made sure I was in
       |     /tmp and then executed "rm -rf ./.*".
       | 
       |     I will never do this again. If I am in any doubt as to how a
       |     wildcard will expand I will echo it first.
       | 
       | I read this, and just had to go try it because I couldn't picture
       | it in my brain. Here it is:
       | 
       |     $ echo ./.*
       |     ./. ./..
       | 
       | So if you're in /tmp/ and do 'rm -rf ./.*', it's
       | 
       |     rm -rf ./. ./..
       | 
       | and ./.. is .. which from tmp is /. Thankfully we have
       | protections against this now. Back then, not so much.
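
A small sketch of the "echo it first" advice, assuming a POSIX shell. The two patterns `.[!.]*` and `..?*` together match hidden names but can never match `.` or `..`, so the expansion stays inside the directory. list_hidden is a hypothetical helper name, and the file names in the demo are made up.

```shell
#!/bin/sh
# list_hidden: hypothetical helper that prints the hidden entries of a
# directory without ever matching "." or "..".
list_hidden() {
    dir=$1
    for entry in "$dir"/.[!.]* "$dir"/..?*; do
        # An unmatched pattern is left as literal text, so skip names
        # that do not exist.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        printf '%s\n' "${entry##*/}"
    done
}

# Demo in a throwaway directory (names are invented for illustration).
demo=$(mktemp -d) || exit 1
touch "$demo/.a" "$demo/.bb" "$demo/..c" "$demo/visible"
list_hidden "$demo"
rm -rf "$demo"
```

Printing the expansion first costs one extra command and shows exactly what a later rm would receive.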
        
       | DonHopkins wrote:
       | I posted this horror story before, with a link to Pete "Gymble
       | Roulette" Cottrell's infamous contest at the end (which I wasn't
       | supposed to tell anyone outside of UMD CS Dept staff about):
       | 
       | https://news.ycombinator.com/item?id=15802533
       | 
       | Pyramid's OSx version of Unix (a dual-universe Unix supporting
       | both 4.xBSD and System V) [1] had a bug in the "passwd" program,
       | such that if somebody edited /etc/passwd with a text editor and
       | introduced a blank line (say at the end of the file, or
       | anywhere), the next person who changed their password with the
       | setuid root passwd program would cause the blank line to be
       | replaced by "::0:0:::" (empty user name, empty password, uid 0,
       | gid 0), which then let you get a root shell with 'su ""', and log
       | in as root by pressing the return key to the Login: prompt. (Well
       | it wasn't quite that simple. The email explains.)
       | 
       | [1] https://en.wikipedia.org/wiki/Pyramid_Technology
       | 
       | Here's the email in which I reported it to the staff mailing
       | list.
       | 
       |     Date: Tue, 30 Sep 86 03:53:12 EDT
       |     From: Don Hopkins <don@brillig.umd.edu>
       |     Message-Id: <8609300753.AA22574@brillig.umd.edu>
       |     To: chris@mimsy.umd.edu, staff@mimsy.umd.edu,
       |         Pete "Gymble Roulette" Cottrell <pete@mimsy.umd.edu>
       |     In-Reply-To: Chris Torek's message of Mon, 29 Sep 86 22:57:57 EDT
       |     Subject: stranger and stranger and stranger and stranger and
       |         stranger
       | 
       |         Date: Mon, 29 Sep 86 22:57:57 EDT
       |         From: Chris Torek <chris@mimsy.umd.edu>
       | 
       |         Gymble has been `upgraded'.
       | 
       |         Pyramid's new login program requires that every account
       |         have a password.
       | 
       |         The remote login system works by having special,
       |         password-less accounts.
       | 
       |         Fun.
       | 
       |     Pyramid's has obviously put a WHOLE lot of thought into their
       |     nifty security measures in the new release.
       | 
       |     Is it only half installed, or what? I can't find much in the
       |     way of sources. /usr/src (on the ucb side of the universe at
       |     lease) is quite sparse.
       | 
       |     On gymble, if there is a stray newline at the end of
       |     /etc/passwd, the next time passwd is run, a nasty little
       |     "::0:0:::" entry gets added on that line! [Ye Olde Standard
       |     Unix "passwd" Bug That MUST Have Been Put There On Purpose.]
       |     So I tacked a newline onto the end with vipw to see how much
       |     fun I could have with this....
       | 
       |     One effect is that I got a root shell by typing:
       | 
       |         % su ""
       | 
       |     But that's not nearly as bad as the effect of typing:
       | 
       |         % rlogin gymble -l ""
       | 
       |     All I typed after that was <cr>:
       | 
       |         you don't hasword: New passhoose one new
       |         word: <cr>
       |         se a lonNew passger password.
       |         word: <cr>
       |         se a lonNew password:ger password.
       |         <cr>
       |         Please use a longer password.
       |         Password: <cr>
       |         Retype new password: <cr>
       |         Connection closed
       | 
       |     Yes, it was quite garbled for me, too: you're not seeing
       |     things, or on ttyh4. I tried it several times, and it was
       |     still garbled. But I'm not EVEN going to complain about it
       |     being garbled, though, for three reasons: 1) It's the effect
       |     of a brand new Pyramid "feature", and being used to their
       |     software releases, it seems only trivial cosmetic,
       |     comparitivly. 2) I want to be able to get to sleep tonight, so
       |     I'm just going to pretend it didn't happen. 3) There are
       |     PLEANTY of things to complain about that are much much much
       |     worse. [My guess, though, would be that something is writing
       |     to /dev/tty one way, and something else isn't.]  Except for
       |     this sentence, I will also completely ignore the fact that it
       |     closed the connection after setting the password, in a
       |     generous fit of compassion for overworked programmers with
       |     ridiculous deadlines.
       | 
       |     So then there was an entry in /etc/passwd where the ::0:0:::
       |     had been:
       | 
       |         :7h37OHz9Ww/oY:0:0:::
       | 
       |     i.e., it let me insist upon a password it thought was too
       |     short by repeating it. (A somewhat undocumented feature of the
       |     passwd program.) ("That's not a bug, it's a feature!")
       | 
       |     Then instead of recognizing an empty string as meaning no
       |     password, and clearing out the field like it should, it
       |     encrypted the null string and stuck it there. PRETTY CHEEZY,
       |     PYRAMID!!!! That means grepping for entries in /etc/passwd
       |     that have null strings in the password field will NOT
       |     necessarily find all accounts with no password.
       | 
       |     So just because I was enjoying myself so much, I once again
       |     did:
       | 
       |         % rlogin gymble -l ""
       |         Password: <cr>
       |         [ message of the day et all ]
       |         #
       | 
       |     Wham, bam, thank you man! Instead of letting me in without
       |     prompting for a password [like it should, according to
       |     everyone but pyramid], or not allowing a null password and
       |     insisting I change it [like it shouldn't, according to
       |     everyone but pyramid], it asked for a password. I hit return,
       |     and sure enough the encrypted null string matched what was in
       |     the passwd entry. It was quite difficult to resist the
       |     temptation of deleting everyone's files and trashing the root
       |     partition.
       | 
       |         -Don
       | 
       |     P.S.: First one to forward this to Pyramid is a turd.
       | 
       | P.P.S.: The origin story of Pete's "Gymble Roulette" nick-name is
       | here:
       | 
       | http://art.net/~hopkins/Don/text/gymble-roulette.html
       | 
       | The postscript comment was an oblique reference to the fact that
       | I'd previously gotten in trouble for forwarding Pete's hilarious
       | "Gymble Roulette" email to a mailing list and somehow it found
       | its way back to Pyramid. (In my defense, he did say "Tell your
       | friends and loved ones.")
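
The failure mode in the email above (a blank line in /etc/passwd becoming a "::0:0:::" entry) suggests a simple audit, sketched here with awk in a POSIX shell. audit_passwd is a hypothetical helper name, the sample accounts are invented, and the check runs on a temp file rather than the real /etc/passwd. As the email itself points out, an empty-field check cannot catch an account whose password field holds the encrypted null string.

```shell
#!/bin/sh
# audit_passwd: hypothetical helper that reports suspicious lines in a
# passwd-format file (colon-separated fields: name, password, uid, ...).
audit_passwd() {
    awk -F: '
        /^[ \t]*$/ { printf "line %d: blank line\n", NR; next }
        $1 == ""   { printf "line %d: empty user name\n", NR }
        $2 == ""   { printf "line %d: empty password field\n", NR }
        $3 == 0 && $1 != "root" {
            printf "line %d: uid 0 account \"%s\"\n", NR, $1
        }
    ' "$1"
}

# Invented sample data: a normal root entry, a stray blank line, the
# entry the buggy passwd program produced, and an ordinary user.
sample=$(mktemp) || exit 1
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/sh' \
    '' \
    '::0:0:::' \
    'don:x:1001:100::/home/don:/bin/csh' > "$sample"
audit_passwd "$sample"
rm -f "$sample"
```

Flagging every uid 0 entry other than root also catches the ":7h37OHz9Ww/oY:0:0:::" form from the email, where the password field is no longer empty.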
        
         | stryan wrote:
         | What a small world; I read this comment while sitting in the
         | UMD CS department machine room.
         | 
         | Glad bad login programs aren't something I have to deal with
         | anymore (knock on wood).
        
       | DonHopkins wrote:
       | At least he didn't have to install Solaris on Sun executives'
       | workstations.
       | 
       | Michael Tiemann on "The Worst Job in the World":
       | 
       | http://www.art.net/~hopkins/Don/unix-haters/slowlaris/worst-...
       | 
       | >I have a friend who has to have the worst job in the world: he
       | is a Unix system administrator. But it's worse than that, as I
       | will soon tell. [...]
       | 
       | https://en.wikipedia.org/wiki/Michael_Tiemann
       | 
       | >Michael Tiemann is vice president of open source affairs at Red
       | Hat, Inc., and former President of the Open Source Initiative.
       | [...] He co-founded Cygnus Solutions in 1989. [...]
       | Opensource.com profiled him in 2014, calling him one of "open
       | source's great explainers."
       | 
       | https://news.ycombinator.com/item?id=20006186
       | 
       | http://www.poppyfields.net/filks/00070.html
       | 
       | The Day SunOS Died
       | 
       |     "Bye, bye, SunOS 4.1.3!
       |     ATT System V has replaced BSD.
       |     You can cling to the standards of the industry
       |     But only if you pay the right fee --
       |     Only if you pay the right fee . . ."
        
         | drewg123 wrote:
         | I remember being assigned to look into Solaris when working as
         | a volunteer sysadmin in grad school, where we were a SunOS
         | shop. I took a sparcstation, wiped it, and installed Solaris.
         | This was 1992 or so, so it must have been 5.0 or 5.1. I hated
         | it, but I don't remember very many specifics about why I didn't
         | like it. I think it was partially the unbundled compilers,
         | combined with everything just being "different", combined with
         | perceived slowness. That was the last place I worked with Suns,
         | as my first job was sysadmin'ing DEC Ultrix boxes, and DEC
         | Alphas. Ultrix & OSF/1 were much closer to SunOS than Solaris,
         | ironically.
         | 
         | I do wish that Sun would have evolved the BSD kernel rather
         | than jumped to System V.
        
           | pjmlp wrote:
            | That was around the time that GCC finally started to get some
            | wind, due to the unbundling of the UNIX SDK.
        
             | drewg123 wrote:
             | I think it also got wind because it was just so much easier
             | to compile stuff with gcc than it was to use the vendor
             | compilers, with all of their incompatible flags and
             | extensions. This was in the days before package managers,
             | when everybody compiled open source stuff themselves, a lot
             | of things didn't use autoconf, etc.
             | 
             | I remember compiling almost all open source stuff (emacs,
             | tex, postscript, file utils, etc) with GCC, and reserving
             | the vendor compiler for situations where performance
             | actually matters (math / linear algebra packages,
             | professors' code).
             | 
             | EDIT: I remember a few years where people tended to assume
             | all the world ran SunOS 4.1, just like people assume all
             | the world runs some flavor of debian/ubuntu now.
        
             | DonHopkins wrote:
             | The unbundling of the free C compiler and the high price of
             | the unbundled C compiler and AT&T's shitty bloated C++
             | compiler was emblematic of what was so bad about Sun
             | abandoning their Berkeley BSD roots and getting into bed
             | with AT&T System V with Solaris. And that provided an
             | opportunity for Cygnus Solutions.
             | 
             | Not coincidentally, after he founded Cygnus Solutions
             | (which Red Hat later bought), Michael Tiemann worked
             | closely with Sun to support GCC on their platform.
             | 
             | https://web.archive.org/web/20160310075610/http://www.toad.
             | c...
             | 
             | >We had the grandiose idea that major computer companies
             | like Sun, SGI, and DEC would fire their compiler
             | departments and use our free compilers and debuggers
             | instead, paying us a million dollars a year for support and
             | development. That wasn't quite right, but before we
             | starved, we stumbled into the embedded systems market,
             | doing jobs for Intel (the i960, a now-forgotten RISC chip),
             | AMD (their now-forgotten but nice 29000 RISC), and various
             | companies like 3Com and Adobe who had to port major pieces
             | of code to these chips. In that market, once we fixed the
             | tools to support cross-compiling, we had major advantages
             | over the existing competitors, and we swarmed right through
             | the market for 32-bit embedded system programming tools.
             | And ultimately, we did get million-dollar contracts, such
             | as one from Sony for building Playstation compilers and
             | emulators. This allowed game developers to start working a
             | year before the Playstation hardware was available. This
             | enabled the Playstation to come to market sooner, with more
             | and better games.
             | 
             | https://web.archive.org/web/20150701032848/http://www.toad.
             | c...
             | 
             | >Michael Tiemann, President, has been writing free software
             | since 1987. He wrote the code for GNU C's function
             | inlining. He wrote a portable instruction scheduler which
              | boosted GNU C's performance by 30% on the SPARC. He is the
             | author of GNU C++, the first available native code C++
             | compiler. Mr. Tiemann has ported the GNU compiler to the
             | SPARC, Motorola 88000, and National 32032 architectures, as
             | well as adding support for Sun's FPA board on Sun 3s. He
             | ported the GNU debugger to the SPARC and Intel 80386
             | architectures, extended the debugger and linker to handle
             | C++ features, and ported the linker to SPARC.
             | 
             | https://www.oreilly.com/openbook/opensources/book/tiemans.h
             | t...
             | 
             | >The real bombshell came in June of 1987, when Stallman
             | released the GNU C Compiler (GCC) Version 1.0. I downloaded
             | it immediately, and I used all the tricks I'd read about in
             | the Emacs and GDB manuals to quickly learn its 110,000
             | lines of code. Stallman's compiler supported two platforms
             | in its first release: the venerable VAX and the new Sun3
             | workstation. It handily generated better code on these
             | platforms than the respective vendors' compilers could
             | muster. In two weeks, I had ported GCC to a new
             | microprocessor (the 32032 from National Semiconductor), and
             | the resulting port was 20% faster than the proprietary
             | compiler supplied by National. With another two weeks of
             | hacking, I had raised the delta to 40%. (It was often said
             | that the reason the National chip faded from existence was
             | because it was supposed to be a 1 MIPS chip, to compete
             | with Motorola's 68020, but when it was released, it only
             | clocked .75 MIPS on application benchmarks. Note that 140%
             | * 0.75 MIPS = 1.05 MIPS. How much did poor compiler
             | technology cost National?) Compilers, Debuggers, and
             | Editors are the Big 3 tools that programmers use on a day-
             | to-day basis. GCC, GDB, and Emacs were so profoundly better
             | than the proprietary alternatives, I could not help but
             | think about how much money (not to mention economic
             | benefit) there would be in replacing proprietary technology
             | with technology that was not only better, but also getting
             | better faster.
        
       | geocrasher wrote:
       | 2003 or 2004. Customer called in and said that his dedicated
       | server was hacked. I restored from backup.
       | 
       | An hour later, he calls back. Hacked _again_. Restored again.
       | 
       | An hour later, he calls back. He realizes that the hacker is
       | _him_! He's doing a thing, but doesn't know what he's doing
       | wrong. So I have him email me the last thing he typed on his
       | server, as root:
       | 
       |     rm -rf /home/user/path/to/thing /home/otheruser/path/to/somethingelse / home/path/to/some/other/thing/altogether
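
One way to make a stray space like that visible before anything is deleted is to print each argument on its own line, sketched here for a POSIX shell. check_rm_args is a hypothetical helper name and the paths are the placeholders from the comment above; the sketch only prints and never runs rm.

```shell
#!/bin/sh
# check_rm_args: hypothetical pre-flight check for a deletion command.
# Prints every argument in brackets and returns non-zero if one is "/".
check_rm_args() {
    status=0
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
        if [ "$arg" = "/" ]; then
            printf 'refusing: argument is the root directory\n' >&2
            status=1
        fi
    done
    return "$status"
}

# The stray space splits one path into two arguments, one of them "/".
check_rm_args /home/user/path/to/thing / home/path/to/some/other/thing/altogether \
    || echo "not running rm"
```

The brackets make word splitting obvious: a lone "[/]" line stands out in a way that a space in a long command line does not.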
        
         | terr-dav wrote:
         | I wonder if there's a terminal setting or font that renders
         | whitespace like a ]-shaped underscore? Could solve a whole
         | class of bugs.
        
         | hnlmorg wrote:
         | This is precisely why I always `-v` when `rm`ing recursively.
         | It might be closing the barn door after the proverbial horse
         | has bolted; but at least the fuck up is visible and in some
         | circumstances you have a fighting chance to kill `rm` before
         | too much damage has been done.
        
       | cstross wrote:
       | This takes me back to roughly 1993.
       | 
       | I was in a department running on a mix of Wyse green-screen
       | terminals and, later, X terminals, when we got a budget upgrade
       | that would roll out actual individual PCs -- 486s running SCO
       | Open Desktop -- to everyone. (This was _not_ cheap, it cost about
       | £4000 for the hardware per seat, although the software was free
       | because, er, this was back in the day when SCO was a respectable
       | UNIX development house rather than a serial litigation zombie,
       | and we were SCO's techpubs department).
       | 
       | Anyway, the editors, who were techpubs management (and thereby
       | stronger on the management than the tech side of things), got
       | their workstations before anyone else. And one of them thought,
       | "ooh goodie, my very own UNIX system!" And proceeded to do "sudo
       | chown -R me:me /" (substitute their username and group for "me")
       | all over the root filesystem.
       | 
       | It's amazing what breaks when every shared library suddenly
       | belongs to a random user, isn't it?
        
         | etcet wrote:
         | I've seen a bash history where "sudo chown -R me:me /" was
         | followed up by "sudo chown +R me:me /". At least they tried.
        
         | forinti wrote:
         | He knew just enough to shoot himself in the foot.
        
         | AceJohnny2 wrote:
         | > _rather than a serial litigation zombie_
         | 
         | "this sounds like cstross"
         | 
         | <checks username>
         | 
         | "heh"
        
         | hnlmorg wrote:
         | heh I have an almost identical war story.
        
       | jlv2 wrote:
       | I vaguely thought I posted in this thread back then.
       | 
       | Back in 1984/5, I had a directory in my homedir called "etc" for
       | miscellaneous stuff. One day I thought: that's a bad name, I
       | should remove it. I errantly typed "rm -rf /etc". Thankfully I
       | got a "Permission Denied" error. Except, I then did the obvious
       | override, "sudo rm -rf /etc" (1). This was on a VAX 11/780 with
       | about 50 undergrads doing project work on it. The command ran for a
       | while and then I heard moans out in the terminal room, as the
       | system crashed. It took us about 3 hours to restore from backup
       | tape.
       | 
       | (1) I used to have to explain what "sudo" was, because this
       | happened before we posted it to USENET, and before it was
       | ubiquitous on systems.
        
         | drewg123 wrote:
         | When did it become ubiquitous? I first got root privs via sudo
         | on a *nix system in 1991 or so, and I remember it being widely
         | deployed even then.
        
       | lifeisstillgood wrote:
       | So much of this can be filed under "before we culturally accepted
       | prod is different.."
        
         | mdpye wrote:
         | Most of these stories relate to administering interactive
         | multi-user machines, not the kind of thing we now think of as a
         | server.
         | 
         | Users were simultaneously logged in at the shell going about
         | their business in *nix, not sending stateless requests in to a
         | server process.
         | 
         | And you certainly couldn't afford to have a duplicate of a
         | machine that expensive.
         | 
         | The idea of multiple environments didn't really exist, and you
         | mostly administered machines from within - hence many of the
         | stories being about getting enough tools working again to
         | straighten it out. You didn't have another machine (or perhaps
         | the connectivity) to put the thing on the operating table from
         | a working system.
         | 
         | Things were different...
        
         | hnlmorg wrote:
         | Try "before hardware was cheap enough that companies could run
         | dedicated non-prod instances"
        
       | dang wrote:
       | One small (very) past thread:
       | 
       |  _Unix Admin. Horror Story Summary, version 1.0 (old)_ -
       | https://news.ycombinator.com/item?id=721578 - July 2009 (6
       | comments)
        
       | midasuni wrote:
       | > But the most important thing that can be learned from this is
       | not that you have to make backups (we all know that, right? ;-)
       | ). More important than making backups is to make sure your
       | backups are complete and verified
       | 
       | Plus ça change...
        
       | karmakaze wrote:
       | I really enjoy the recovery parts of the stories that have them,
       | like a good Hollywood movie script, but real.
       | 
       | Unix wasn't very common after leaving university and I have more
       | PC/LAN type stories. There was one memorable moment, where I was
       | working very late in the office and got a call. [If working late,
       | the main line would ring the entire office and I could press the
       | blinky light to answer.] It was one of our consultants on the
       | west coast who somehow had a corrupt filesystem, but that machine
       | was the one that had all the project files for the many months of
       | consulting work that the team had been developing. [I don't
       | recall but it may have been CVS or SVN.]
       | 
       | The tricky bit was that it was using OS/2 and its HPFS filesystem
       | so the usual file utilities wouldn't work. We had a number of IBM
       | tech books on our bookshelves (because we also did mainframe
       | consulting) and I'd been reading about terminal streams and one
       | about the HPFS filesystem in particular. It mentioned boot
       | blocks, superblocks, bands, allocation bit blocks, etc.
       | 
       | Being young (and dumb), I went with "what's the worst thing that
       | could happen" and came up with a plan: using the DOS 'nu' (Norton
       | Utility) copy a few choice sectors from a similar spec-looking
       | machine and try the OS/2 equivalent of 'chkdsk /f'--the client
       | after all was IBM, known for conformity. We first had to dial-up
       | modem transmit the 'nu' program, but then we were copying the
       | first 18 (or so) sectors to get the boot sector, partition table,
       | boot program or other HPFS initial sector data; then there were
       | some sectors in the middle of the disk that served as a kind of
       | main description table with others in bands (that we didn't
       | bother with). Guessed the starting point and number of sectors.
       | This was a grasping at straws Hail Mary. Rebooted the machine,
       | let OS/2 run its chkdsk as it detected a problem, waited a
       | long while until it was done. Unbelievably, it all worked! There
       | might have been a couple open files lost and some files that were
       | recently deleted being present, but no big differences. We didn't
       | think we needed to tell anyone. He bought me beers as promised
       | when I came to visit.
       | 
       | Bonus memory: LapLink with the parallel transfer cable was _the
       | shit_ in those days. https://en.wikipedia.org/wiki/LapLink_cable
        
       | notme77 wrote:
       | source ~/.bash_history
        
         | oneweekwonder wrote:
         | hah, before CI/devops tools were popular and you had to set up
         | multiple identical/fail-over machines, you could scp your
         | .bash_history, clean it up a bit and source it, neat.
        
       | butterfi wrote:
       | I thought I would find these funny and instead they just made me
       | anxious. I mean, they are funny, I guess I just have PTSD from
       | years of Unix administration.
        
         | cf100clunk wrote:
         | Came here to say the same thing... Unix sysadmin since 1988. I
         | would laugh about these stories but I just cannot without a
         | lump in my throat and a few mea maxima culpas in my own mind.
         | Reminds me of having heard somewhere that Tom Waits, on
         | watching the litany of road show disasters parodied on This Is
         | Spinal Tap, wept rather than laughed.
         | 
         | My own worst Unix Admin Horror Story is a variant of the
         | classic ''accidental delete-restore from backup if you've got
         | one'' scenario: in the early '90s I accidentally repopulated
         | all YP tables on a production Sparcstation 10 machine in real
         | time on a busy workday, but the tables had not been kept up to
         | date! It took until the next day to restore from backup and a
         | further day of research and testing to get all the YP tables up
         | to proper state, then write scripts to keep them updated. (This
         | was before Sun was legally forced to rename YP to NIS, btw).
        
       | krylon wrote:
       | Ironically, while I love Unix, I have spent most of my career
       | shepherding Windows boxes. The only real horror story I got was a
       | new coworker (turning me from "the IT guy" into "half the IT
       | department") who looked through the Active Directory tree and
       | found the GPO management part had replicated the organizational
       | structure. Since there were no GPOs at the time, he considered
       | this wasteful and confusing, so he went and deleted it.
       | 
       | ...
       | 
       | Except that what he _did_ delete, it turned out, was the actual
       | organizational structure of the Active Directory tree, including
       | _ALL_ user accounts. (It's hard to explain without visual aids,
       | the UI gave no indication it would delete the actual AD objects,
       | not just the (non-existent) GPOs.)
       | 
       | Before long, people started calling to let us know they could no
       | longer log into their computers or the terminal server. _sigh_ It
       | was a fairly stressful morning.
       | 
       | We really tried, for about 45 minutes, to resurrect the Active
       | Directory tree, but it was no good (this was Windows Server 2008,
       | so no AD Recycle Bin), so we had to restore the server from
       | backup. I have since learnt that there is backup software that
       | allows you to restore, say, your AD tree, or maybe even just a
       | part of it, instead of the whole machine. Well, the backup
       | software we had at the time _suuuuucked_ , so not only did we
       | have to restore the entire server, but we had to literally sit
       | _all day_ and watch the progress bar move at glacial speed.
       | 
       | In the end, we had the server up and running again, and
       | fortunately both the company's CEO and most employees actually
       | welcomed the opportunity to finally, _FINALLY_ clean up their
       | desks, something every single one of them had been delaying for a
       | long time. And by the time we were done, I was just so exhausted
       | I wasn't even mad at the newbie anymore.
       | 
       | At least we learnt from that mistake, though. Got ourselves a
       | second domain controller, and a much better backup solution. In
       | retrospect, I think it was probably a good thing - our boss took
       | it with good humor, no data was lost, our backup system worked,
       | but we also saw how badly it sucked, and the incident gave us
       | some leverage to get the funding for said upgrades. Also,
       | everyone had a clean desk, and since it was a Friday, a couple of
       | coworkers decided to start their weekend early.
        
         | yeuxardents wrote:
         | This story worked out surprisingly well, usually, not so much
         | (:
        
       | rntksi wrote:
       | >Well one time I was installing a minimal base system of Linux on
       | a friend's PC, so that we would have all the necessary utilities
       | to bring over the rest of the stuff. His 3 1/2 inch disk was
       | dead, so we had to get the 5 1/4 inch version of the boot/root
       | disk. Too bad that version, having to fit in 1.2M instead of
       | 1.44, didn't have tar
       | 
       | Heh ... I wonder how many years from now people will stop knowing
       | what a 3 1/2 and a 5 1/4 disk are
        
         | CoastalCoder wrote:
         | Or to understand the confusion regarding 3.5" disks being
         | floppies rather than hard disks.
        
         | hulitu wrote:
         | Ah, the good old days. Single density, double density (720 kB,
         | 1.44 MB). I also heard of 2.88 MB floppies - never saw one in
         | real life. If I remember correctly 2.88 MB was double density
         | double sided and you needed a special floppy drive.
        
           | bityard wrote:
           | Single-sided: 360 KB, double-sided (or double density): 720
           | KB, high density: 1.44MB.
           | 
           | There were 2.88 MB disks and drives but they never gained
           | much traction, because they were expensive and the PC
           | industry kept promising various "floppy killers" like the ZIP
           | drive.
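           | 
           | A minimal sketch of where those figures come from, assuming
           | the usual PC floppy geometries with 512-byte sectors (the
           | track and sector counts below are the standard PC ones, not
           | something stated in this thread). Note the "1.44 MB" label is
           | really 1440 KiB:

```python
SECTOR_BYTES = 512  # standard PC floppy sector size

# (label, tracks per side, sides, sectors per track)
FORMATS = [
    ("3.5in single-sided DD", 80, 1, 9),
    ("3.5in double-sided DD", 80, 2, 9),
    ("5.25in HD", 80, 2, 15),
    ("3.5in HD", 80, 2, 18),
    ("3.5in ED", 80, 2, 36),
]


def capacity_kib(tracks, sides, sectors):
    """Formatted capacity in KiB for a given disk geometry."""
    return tracks * sides * sectors * SECTOR_BYTES // 1024


for label, tracks, sides, sectors in FORMATS:
    print(f"{label}: {capacity_kib(tracks, sides, sectors)} KiB")
```

           | The 5.25in HD row is the "1.2M" disk from the quoted story,
           | which is why that boot/root image had less room than the
           | 3.5in one.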
        
         | pmontra wrote:
         | As in "You 3D printed the Save icon!" ?
         | 
         | https://logosatwork.com/you-3d-printed-the-save-icon/
        
       | greedo wrote:
       | I had a contractor installing software last week. He was granted
       | full sudo permissions despite having demonstrated a unique set of
       | command line skills. Everything was going fine until I got a text
       | from his manager saying the contractor couldn't SSH in this
       | morning. Turns out he had gotten frustrated with some file/folder
       | permissions in the directory where he was supposed to install the
       | software package. So he simply ran `chmod -R 777 /*`
       | 
       | Needless to say that required a full restore from the previous
       | night's backup since he hadn't snapshotted the VM before
       | beginning his work. He was very angry that he had lost 2 days of
       | work. I was very sympathetic...
        
       | AceJohnny2 wrote:
       | My very favorite is more of a "recovery legend", telling the
       | heroic tale of recovering a Unix system after an errant "rm
       | -rf" deleted most of the system's critical files:
       | 
       | https://www.ee.ryerson.ca/~elf/hack/recovery.html
        
         | agentwiggles wrote:
         | Nice! When I saw the thread title I was hoping this story would
         | get posted somewhere. I read this a long time ago and hadn't
         | been able to find it for years!
         | 
         | Thanks for sharing!
        
           | AceJohnny2 wrote:
           | I once spent an hour or more finding it again, so I
           | bookmarked it ;)
        
       | whartung wrote:
       | Worst thing I ever did was cross hard mount NFS volumes across
       | two machines.
       | 
       | With a hard NFS mount, the mount will hang until the other
       | machine responds.
       | 
       | When we had to power cycle the two servers, they would not come
       | up as they were deadlocked waiting for each other. That was
       | exciting.
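       | 
       | A rough sketch of how one might spot this ahead of time,
       | assuming you can list which hosts each machine hard-mounts at
       | boot. The host names and the find_mount_cycle helper are made up
       | for illustration, and real behaviour also depends on the mount
       | options in use:

```python
def find_mount_cycle(hard_mounts):
    """Return a list of hosts forming a boot-time wait cycle, or None.

    hard_mounts maps each host to the servers it hard-mounts at boot,
    i.e. the machines it will block on until they respond.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    state = {}
    path = []

    def visit(host):
        state[host] = GREY
        path.append(host)
        for server in hard_mounts.get(host, ()):
            seen = state.get(server, WHITE)
            if seen == GREY:
                # server is already on the current path: a wait cycle
                return path[path.index(server):] + [server]
            if seen == WHITE:
                cycle = visit(server)
                if cycle:
                    return cycle
        path.pop()
        state[host] = BLACK
        return None

    for host in hard_mounts:
        if state.get(host, WHITE) == WHITE:
            cycle = visit(host)
            if cycle:
                return cycle
    return None


# Two hypothetical servers that hard-mount each other
print(find_mount_cycle({"alpha": ["beta"], "beta": ["alpha"]}))
```

       | For the two-machine case above this reports alpha -> beta ->
       | alpha, which is exactly the pair that never finishes booting.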
        
         | treesknees wrote:
         | This is somewhat common in environments with stable power, you
         | basically never have the entire IT system go down and come back
         | up at the same time.
        
         | patrickdavey wrote:
         | How did you fix it?
        
           | oneweekwonder wrote:
           | not op but we recently wanted to mount nfs and the sysadmin
           | was adamant we use automounter[0] instead of fstab because if
           | the nfs mount is not available it can hang the kernel.
           | 
           | Not sure if it is true or just sysadmin lore but was
           | interesting enough to learn about an alternative.
           | 
           | [0]: https://linux.die.net/man/8/automount
        
           | jcynix wrote:
           | Been there, done that. Ok, a colleague did it, about 30 years
           | ago ;-)
           | 
           | IIRC resetting the machine and forcing it to boot single-user
           | was the solution.
        
         | bostik wrote:
         | Speaking of NFS, ex-coworker had renamed his prior company's
         | servers "notresponding" and "stilltrying".
         | 
         | The NFS client logs must have been glorious.
        
       | kloch wrote:
       | I love this. It takes us back to a time when administering a Unix
       | system was a Big Deal. Partly because they were rare and
       | expensive. But also they were truly multi-user with dozens of
       | people logged in at any given time.
        
         | rilindo wrote:
         | They still are a big deal, only you can manage up to thousands
         | of them and with modern automation, if you screw up one, you
         | screw up all of them.
        
       | qwertox wrote:
       | This reminds me of the time when I executed `rm -rf ~` in hopes
       | of deleting an erroneously created directory.
       | 
       | Or when my "wrongly" expanded `mv` command moved all the files
       | and directories in home into the last directory of the home
       | directory. Which was an NTFS mount, leading to a loss of all the
       | file permissions.
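       | 
       | For the first one: the shell expands an unquoted ~ to the home
       | directory before rm ever sees it, so a directory literally named
       | "~" has to be written as './~' or quoted. A small sketch of a
       | safer removal outside the shell, assuming the stray directory
       | sits directly under a known parent (remove_literal_tilde_dir is
       | a hypothetical helper, not an existing tool):

```python
import shutil
import tempfile
from pathlib import Path


def remove_literal_tilde_dir(parent):
    """Delete a directory literally named '~' inside parent, nothing else."""
    stray = Path(parent) / "~"  # pathlib does not expand '~' unless asked
    if stray.is_dir() and not stray.is_symlink():
        shutil.rmtree(stray)
        return True
    return False


# Demonstrate on a throwaway directory rather than a real home
with tempfile.TemporaryDirectory() as work:
    (Path(work) / "~" / "junk").mkdir(parents=True)
    print(remove_literal_tilde_dir(work))
    print((Path(work) / "~").exists())
```

       | Because the path is built from an explicit parent and never
       | passed through expanduser(), the home directory is never in
       | play.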
        
       | csydas wrote:
       | The first entry about adding tcsh is more or less the basis of
       | one of the questions I use in technical interviews for our Linux
       | team. It's less about specifics on tcsh and more just about
       | explaining the hierarchy of linux/unix and why we have the
       | bin/sbin directories under / and /usr; the more the candidate can
       | explain (or even hypothesize), the more comfortable my team feels
       | with their general curiosity/understanding of Linux/Unix.
       | 
       | It's a niche situation sure, but being able to understand the
       | system and tooling you're working with to a degree to understand
       | what options you really have shows a great deal of discipline and
       | curiosity, for me at least. Again, it's less about "can you
       | figure out what to do in this specific situation" and more "can
       | you just explain what you look at every single day in plain and
       | simple terms? Did you ever think about it?"
       | 
       | It's been a surprisingly revealing question for nascent Linux
       | admins on how they react to questioning the things they look at
       | every single day, and how ready they are to __really__ dig into
       | the kernel internals.
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2021-12-21 23:00 UTC)