[HN Gopher] Unix Admin Horror Story Summary (1992)
___________________________________________________________________
Unix Admin Horror Story Summary (1992)
Author : voxadam
Score : 102 points
Date : 2021-12-21 15:38 UTC (7 hours ago)
(HTM) web link (www-uxsup.csx.cam.ac.uk)
(TXT) w3m dump (www-uxsup.csx.cam.ac.uk)
| cecilpl2 wrote:
| I found my favorite story buried in the middle of this, from
| 1986. It's a classic on par with The Story of Mel, a Real
| Programmer. Reproduced here for your reading pleasure:
| Have you ever left your terminal logged in, only to find when you
| came back to it that a (supposed) friend had typed "rm -rf
| ~/*" and was hovering over the keyboard with threats along
| the lines of "lend me a fiver 'til Thursday, or I hit
| return"? Undoubtedly the person in question would not have
| had the nerve to inflict such a trauma upon you, and was
| doing it in jest. So you've probably never experienced the
| worst of such disasters.... It was a quiet Wednesday
| afternoon. Wednesday, 1st October, 15:15 BST, to be
| precise, when Peter, an office-mate of mine, leaned away
| from his terminal and said to me, "Mario, I'm having a little
| trouble sending mail." Knowing that msg was capable of
| confusing even the most capable of people, I sauntered over
| to his terminal to see what was wrong. A strange error
| message of the form (I forget the exact details) "cannot
| access /foo/bar for userid 147" had been issued by msg. My
| first thought was "Who's userid 147?; the sender of the
| message, the destination, or what?" So I leant over to another
| terminal, already logged in, and typed grep 147
| /etc/passwd only to receive the response
| /etc/passwd: No such file or directory. Instantly, I
| guessed that something was amiss. This was confirmed when
| in response to ls /etc I got
| ls: not found. I suggested to Peter that it would be
| a good idea not to try anything for a while, and went off
| to find our system manager. When I arrived at his
| office, his door was ajar, and within ten seconds I
| realised what the problem was. James, our manager, was sat
| down, head in hands, hands between knees, as one whose world has
| just come to an end. Our newly-appointed system programmer,
| Neil, was beside him, gazing listlessly at the screen of
| his terminal. And at the top of the screen I spied the
| following lines:
|
|     # cd
|     # rm -rf *
|
| Oh, shit, I thought. That would just about explain it.
| I can't remember what happened in the succeeding minutes; my
| memory is just a blur. I do remember trying ls (again),
| ps, who and maybe a few other commands beside, all to no
| avail. The next thing I remember was being at my terminal
| again (a multi-window graphics terminal), and typing
|
|     cd /
|     echo *
|
| I owe a debt of thanks to David
| Korn for making echo a built-in of his shell; needless to
| say, /bin, together with /bin/echo, had been deleted. What
| transpired in the next few minutes was that /dev, /etc and
| /lib had also gone in their entirety; fortunately Neil had
| interrupted rm while it was somewhere down below /news, and /tmp,
| /usr and /users were all untouched. Meanwhile
| James had made for our tape cupboard and had retrieved what
| claimed to be a dump tape of the root filesystem, taken four
| weeks earlier. The pressing question was, "How do we
| recover the contents of the tape?". Not only had we lost
| /etc/restore, but all of the device entries for the tape
| deck had vanished. And where does mknod live? You guessed
| it, /etc. How about recovery across Ethernet of any of
| this from another VAX? Well, /bin/tar had gone, and
| thoughtfully the Berkeley people had put rcp in /bin in the 4.3
| distribution. What's more, none of the Ether stuff wanted to
| know without /etc/hosts at least. We found a version of
| cpio in /usr/local, but that was unlikely to do us any good
| without a tape deck. Alternatively, we could
| get the boot tape out and rebuild the root filesystem, but
| neither James nor Neil had done that before, and we weren't
| sure that the first thing to happen wouldn't be that the whole
| disk would be re-formatted, losing all our user files. (We take
| dumps of the user files every Thursday; by Murphy's Law
| this had to happen on a Wednesday). Another solution might
| be to borrow a disk from another VAX, boot off that, and
| tidy up later, but that would have entailed calling the DEC
| engineer out, at the very least. We had a number of users
| in the final throes of writing up PhD theses and the loss
| of maybe a week's work (not to mention the machine down time)
| was unthinkable. So, what to do? The next idea was
| to write a program to make a device descriptor for the tape
| deck, but we all know where cc, as and ld live. Or maybe
| make skeletal entries for /etc/passwd, /etc/hosts and so
| on, so that /usr/bin/ftp would work. By sheer luck, I had a
| gnuemacs still running in one of my windows, which we could use
| to create passwd, etc., but the first step was to create a
| directory to put them in. Of course /bin/mkdir had gone,
| and so had /bin/mv, so we couldn't rename /tmp to /etc.
| However, this looked like a reasonable line of attack.
| By now we had been joined by Alasdair, our resident UNIX guru,
| and as luck would have it, someone who knows VAX assembler.
| So our plan became this: write a program in assembler which
| would either rename /tmp to /etc, or make /etc, assemble it
| on another VAX, uuencode it, type in the uuencoded file
| using my gnu, uudecode it (some bright spark had thought to
| put uudecode in /usr/bin), run it, and hey presto, it would
| all be plain sailing from there. By yet another miracle of
| good fortune, the terminal from which the damage had been
| done was still su'd to root (su is in /bin, remember?), so at
| least we stood a chance of all this working.
| Off we set on our merry way, and within only an hour we had
| managed to concoct the dozen or so lines of assembler to
| create /etc. The stripped binary was only 76 bytes long,
| so we converted it to hex (slightly more readable than the
| output of uuencode), and typed it in using my editor. If
| any of you ever have the same problem, here's the hex for
| future reference:
|
|     070100002c000000000000000000000000000000000000000000000000000000
|     0000dd8fff010000dd8f27000000fb02ef07000000fb01ef070000000000bc8f
|     8800040000bc012f65746300
|
| I had a handy program
| around (doesn't everybody?) for converting ASCII hex to
| binary, and the output of /usr/bin/sum tallied with our
| original binary. But hang on---how do you set execute permission
| without /bin/chmod? A few seconds' thought (which, as usual,
| lasted a couple of minutes) suggested that we write the
| binary on top of an already existing binary, owned by
| me...problem solved. So along we trotted to the
| terminal with the root login, carefully remembered to set
| the umask to 0 (so that I could create files in it using my
| gnu), and ran the binary. So now we had a /etc, writable by
| all. From there it was but a few easy steps to creating passwd,
| hosts, services, protocols, (etc), and then ftp was willing to
| play ball. Then we recovered the contents of /bin across
| the ether (it's amazing how much you come to miss ls after
| just a few, short hours), and selected files from /etc.
| The key file was /etc/rrestore, with which we recovered
| /dev from the dump tape, and the rest is history.
| Now, you're asking yourself (as I am), what's the moral of this
| story? Well, for one thing, you must always remember the
| immortal words, DON'T PANIC. Our initial reaction was to
| reboot the machine and try everything as single user, but
| it's unlikely it would have come up without /etc/init and
| /bin/sh. Rational thought saved us from this one.
| The next thing to remember is that UNIX tools really can be put
| to unusual purposes. Even without my gnuemacs, we could
| have survived by using, say, /usr/bin/grep as a substitute
| for /bin/cat. And the final thing is, it's amazing
| how much of the system you can delete without it falling
| apart completely. Apart from the fact that nobody could
| login (/bin/login?), and most of the useful commands had
| gone, everything else seemed normal. Of course, some things
| can't stand life without say /etc/termcap, or /dev/kmem, or
| /etc/utmp, but by and large it all hangs together.
| I shall leave you with this question: if you were placed in the
| same situation, and had the presence of mind that always
| comes with hindsight, could you have got out of it in a
| simpler or easier way? Answers on a postage stamp to:
| Mario Wolczko
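
The hex dump in the story can be turned back into bytes in a few lines. This is a minimal sketch in Python, not the author's own converter (the story does not show it). It checks only what the text states or implies: the binary is 76 bytes long and it embeds the path /etc. The names `HEX_DUMP` and `binary` are illustrative.

```python
# Hex of the binary, copied from the story and split into 4-byte groups
# purely for readability; the digits themselves are unchanged.
HEX_DUMP = """
07010000 2c000000 00000000 00000000 00000000 00000000 00000000 00000000
0000dd8f ff010000 dd8f2700 0000fb02 ef070000 00fb01ef 07000000 0000bc8f
88000400 00bc012f 65746300
"""

# Drop all whitespace, then convert each pair of hex digits to one byte.
binary = bytes.fromhex("".join(HEX_DUMP.split()))

print(len(binary))   # 76
print(binary[-5:])   # b'/etc\x00'
```

The trailing NUL-terminated "/etc" is the directory name the program was written to create, which makes a quick sanity check that the digits were typed in correctly.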
| geocrasher wrote:
| This one made me LOL: My mistake on SunOS (with
| OpenWindows) was to try and clean up all the '.*'
| directories in /tmp. Obviously "rm -rf /tmp/*" missed these, so I
| was very careful and made sure I was in /tmp and then executed
| "rm -rf ./.*". I will never do this again. If I am in
| any doubt as to how a wildcard will expand I will echo it
| first.
|
| I read this, and just had to go try it because I couldn't picture
| it in my brain. Here it is:
|
|     $ echo ./.*
|     ./. ./..
|
| So if you're in /tmp/ and do 'rm -rf ./.*', it's
| rm -rf ./. ./..
|
| and ./.. is .. which from tmp is /. Thankfully we have
| protections against this now. Back then, not so much.
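
The expansion above can be modelled outside a shell. This is a rough sketch in Python, assuming a shell that matches the pattern against every directory entry, including "." and "..", which is the behaviour the comment describes. The directory listing is hypothetical, and `expand` is an illustrative helper, not a real shell API.

```python
import fnmatch
import posixpath

# Hypothetical contents of /tmp as readdir() would report them,
# including the "." and ".." entries.
entries = [".", "..", ".X11-unix", ".hidden", "scratch.txt"]

def expand(pattern, names):
    """Model a shell that tests the pattern against every directory entry."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]

matches = expand(".*", entries)
print(matches)   # ['.', '..', '.X11-unix', '.hidden']

# Where "./.." points when the current directory is /tmp.
parent = posixpath.normpath(posixpath.join("/tmp", "./.."))
print(parent)    # /
```

Echoing the wildcard first, as the quoted author resolves to do, is the same check done by the shell itself.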
| DonHopkins wrote:
| I posted this horror story before, with a link to Pete "Gymble
| Roulette" Cottrell's infamous contest at the end (which I wasn't
| supposed to tell anyone outside of UMD CS Dept staff about):
|
| https://news.ycombinator.com/item?id=15802533
|
| Pyramid's OSx version of Unix (a dual-universe Unix supporting
| both 4.xBSD and System V) [1] had a bug in the "passwd" program,
| such that if somebody edited /etc/passwd with a text editor and
| introduced a blank line (say at the end of the file, or
| anywhere), the next person who changed their password with the
| setuid root passwd program would cause the blank line to be
| replaced by "::0:0:::" (empty user name, empty password, uid 0,
| gid 0), which then let you get a root shell with 'su ""', and log
| in as root by pressing the return key to the Login: prompt. (Well
| it wasn't quite that simple. The email explains.)
|
| https://en.wikipedia.org/wiki/Pyramid_Technology
|
| Here's the email in which I reported it to the staff mailing
| list.
|
| Date: Tue, 30 Sep 86 03:53:12 EDT
| From: Don Hopkins <don@brillig.umd.edu>
| Message-Id: <8609300753.AA22574@brillig.umd.edu>
| To: chris@mimsy.umd.edu, staff@mimsy.umd.edu,
|     Pete "Gymble Roulette" Cottrell <pete@mimsy.umd.edu>
| In-Reply-To: Chris Torek's message of Mon, 29 Sep 86 22:57:57 EDT
| Subject: stranger and stranger and stranger and stranger and
|     stranger
|
|     Date: Mon, 29 Sep 86 22:57:57 EDT
|     From: Chris Torek <chris@mimsy.umd.edu>
|
|     Gymble has been `upgraded'. Pyramid's new login program
|     requires that every account have a password.
|     The remote login system works by having special, password-less
|     accounts. Fun.
|
| Pyramid has
| obviously put a WHOLE lot of thought into their nifty
| security measures in the new release. Is it only
| half installed, or what? I can't find much in the way of
| sources. /usr/src (on the ucb side of the universe at least) is
| quite sparse. On gymble, if there is a
| stray newline at the end of /etc/passwd, the next time
| passwd is run, a nasty little "::0:0:::" entry gets added on
| that line! [Ye Olde Standard Unix "passwd" Bug That MUST Have
| Been Put There On Purpose.] So I tacked a newline onto
| the end with vipw to see how much fun I could have with
| this.... One effect is that I got a root shell by
| typing:
|
|     % su ""
|
| But that's not nearly as bad as the effect of typing:
|
|     % rlogin gymble -l ""
|
| All I typed after that was <cr>: you
| don't hasword: New passhoose one new word: <cr>
| se a lonNew passger password. word: <cr> se a
| lonNew password:ger password. <cr> Please use a
| longer password. Password: <cr> Retype new
| password: <cr> Connection closed Yes, it was
| quite garbled for me, too: you're not seeing things, or on
| ttyh4. I tried it several times, and it was still garbled. But
| I'm not EVEN going to complain about it being garbled,
| though, for three reasons: 1) It's the effect of a brand
| new Pyramid "feature", and being used to their software
| releases, it seems only trivially cosmetic, comparatively.
| 2) I want to be able to get to sleep tonight, so I'm just
| going to pretend it didn't happen. 3) There are PLENTY of things
| to complain about that are much much much worse. [My guess,
| though, would be that something is writing to /dev/tty
| one way, and something else isn't.] Except for this
| sentence, I will also completely ignore the fact that it
| closed the connection after setting the password, in a
| generous fit of compassion for overworked programmers with
| ridiculous deadlines. So then there was an entry in
| /etc/passwd where the ::0:0::: had been:
| :7h37OHz9Ww/oY:0:0::: i.e., it let me insist upon a
| password it thought was too short by repeating it. (A
| somewhat undocumented feature of the passwd program.)
| ("That's not a bug, it's a feature!") Then instead
| of recognizing an empty string as meaning no password,
| and clearing out the field like it should, it encrypted the null
| string and stuck it there. PRETTY CHEEZY, PYRAMID!!!! That means
| grepping for entries in /etc/passwd that have null strings in the
| password field will NOT necessarily find all accounts with no
| password. So just because I was enjoying myself so
| much, I once again did:
|
|     % rlogin gymble -l ""
|     Password: <cr>
|     [ message of the day et al ]
|     #
|
| Wham, bam, thank you man! Instead of letting me in without
| prompting for a password [like it should, according to
| everyone but pyramid], or not allowing a null password
| and insisting I change it [like it shouldn't, according
| to everyone but pyramid], it asked for a password. I hit
| return, and sure enough the encrypted null string matched
| what was in the passwd entry. It was quite difficult to resist
| the temptation of deleting everyone's files and trashing the root
| partition.
|
| -Don
|
| P.S.: First one to forward this to Pyramid is a turd.
|
| P.P.S.: The origin story of Pete's "Gymble Roulette" nick-name is
| here:
|
| http://art.net/~hopkins/Don/text/gymble-roulette.html
|
| The postscript comment was an oblique reference to the fact that
| I'd previously gotten in trouble for forwarding Pete's hilarious
| "Gymble Roulette" email to a mailing list and somehow it found
| its way back to Pyramid. In my defense, he did say "Tell your
| friends and loved ones."
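
The email's point about grepping for empty password fields can be made concrete. This is a minimal sketch in Python of a broader check, assuming the classic seven-field colon-separated passwd layout. Only the two ":...:0:0:::" entries come from the email; the other sample accounts and the `audit` helper are made up for illustration.

```python
# Sample passwd contents. The last two lines are the entries quoted in
# the email; the first two accounts are hypothetical.
SAMPLE_PASSWD = """\
root:Xq1examplehash:0:0:Operator:/:/bin/csh
don:Ab2examplehash:101:10:Don:/usr/don:/bin/csh
::0:0:::
:7h37OHz9Ww/oY:0:0:::
"""

def audit(passwd_text):
    """Return (line_number, reasons) for entries that look dangerous."""
    findings = []
    for number, line in enumerate(passwd_text.splitlines(), start=1):
        if not line.strip():
            findings.append((number, ["blank line"]))
            continue
        fields = line.split(":")
        if len(fields) < 3:
            findings.append((number, ["malformed entry"]))
            continue
        name, password, uid = fields[0], fields[1], fields[2]
        reasons = []
        if name == "":
            reasons.append("empty user name")
        if password == "":
            reasons.append("empty password field")
        if uid == "0" and name != "root":
            reasons.append("uid 0 but not root")
        if reasons:
            findings.append((number, reasons))
    return findings

findings = audit(SAMPLE_PASSWD)
for number, reasons in findings:
    print(number, ", ".join(reasons))
```

A check for an empty password field alone flags line 3 but not line 4, where the null string has been encrypted and stored. The empty user name and uid 0 checks catch both.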
| stryan wrote:
| What a small world; I read this comment while sitting in the
| UMD CS department machine room.
|
| Glad bad login programs aren't something I have to deal with
| anymore (knock on wood).
| DonHopkins wrote:
| At least he didn't have to install Solaris on Sun executives'
| workstations.
|
| Michael Tiemann on "The Worst Job in the World":
|
| http://www.art.net/~hopkins/Don/unix-haters/slowlaris/worst-...
|
| >I have a friend who has to have the worst job in the world: he
| is a Unix system administrator. But it's worse than that, as I
| will soon tell. [...]
|
| https://en.wikipedia.org/wiki/Michael_Tiemann
|
| >Michael Tiemann is vice president of open source affairs at Red
| Hat, Inc., and former President of the Open Source Initiative.
| [...] He co-founded Cygnus Solutions in 1989. [...]
| Opensource.com profiled him in 2014, calling him one of "open
| source's great explainers."
|
| https://news.ycombinator.com/item?id=20006186
|
| http://www.poppyfields.net/filks/00070.html
|
| The Day SunOS Died
|
| "Bye, bye, SunOS 4.1.3!
| ATT System V has replaced BSD.
| You can cling to the standards of the industry
| But only if you pay the right fee --
| Only if you pay the right fee . . ."
| drewg123 wrote:
| I remember being assigned to look into Solaris when working as
| a volunteer sysadmin in grad school, where we were a SunOS
| shop. I took a sparcstation, wiped it, and installed Solaris.
| This was 1992 or so, so it must have been 5.0 or 5.1. I hated
| it, but I don't remember very many specifics about why I didn't
| like it. I think it was partially the unbundled compilers,
| combined with everything just being "different", combined with
| perceived slowness. That was the last place I worked with Suns,
| as my first job was sysadmin'ing DEC Ultrix boxes, and DEC
| Alphas. Ultrix & OSF/1 were much closer to SunOS than Solaris,
| ironically.
|
| I do wish that Sun had evolved the BSD kernel rather
| than jumping to System V.
| pjmlp wrote:
| That was around the time that GCC finally started to get some
| wind, due to the unbundling of the UNIX SDK.
| drewg123 wrote:
| I think it also got wind because it was just so much easier
| to compile stuff with gcc than it was to use the vendor
| compilers, with all of their incompatible flags and
| extensions. This was in the days before package managers,
| when everybody compiled open source stuff themselves, a lot
| of things didn't use autoconf, etc.
|
| I remember compiling almost all open source stuff (emacs,
| tex, postscript, file utils, etc) with GCC, and reserving
| the vendor compiler for situations where performance
| actually matters (math / linear algebra packages,
| professors' code).
|
| EDIT: I remember a few years where people tended to assume
| all the world ran SunOS 4.1, just like people assume all
| the world runs some flavor of debian/ubuntu now.
| DonHopkins wrote:
| The unbundling of the free C compiler and the high price of
| the unbundled C compiler and AT&T's shitty bloated C++
| compiler was emblematic of what was so bad about Sun
| abandoning their Berkeley BSD roots and getting into bed
| with AT&T System V with Solaris. And that provided an
| opportunity for Cygnus Solutions.
|
| Not coincidentally, after he founded Cygnus Solutions
| (which Red Hat later bought), Michael Tiemann worked
| closely with Sun to support GCC on their platform.
|
| https://web.archive.org/web/20160310075610/http://www.toad.
| c...
|
| >We had the grandiose idea that major computer companies
| like Sun, SGI, and DEC would fire their compiler
| departments and use our free compilers and debuggers
| instead, paying us a million dollars a year for support and
| development. That wasn't quite right, but before we
| starved, we stumbled into the embedded systems market,
| doing jobs for Intel (the i960, a now-forgotten RISC chip),
| AMD (their now-forgotten but nice 29000 RISC), and various
| companies like 3Com and Adobe who had to port major pieces
| of code to these chips. In that market, once we fixed the
| tools to support cross-compiling, we had major advantages
| over the existing competitors, and we swarmed right through
| the market for 32-bit embedded system programming tools.
| And ultimately, we did get million-dollar contracts, such
| as one from Sony for building Playstation compilers and
| emulators. This allowed game developers to start working a
| year before the Playstation hardware was available. This
| enabled the Playstation to come to market sooner, with more
| and better games.
|
| https://web.archive.org/web/20150701032848/http://www.toad.
| c...
|
| >Michael Tiemann, President, has been writing free software
| since 1987. He wrote the code for GNU C's function
| inlining. He wrote a portable instruction scheduler which
| boosted GNU C's performance by 30% on the SPARC. He is the
| author of GNU C++, the first available native code C++
| compiler. Mr. Tiemann has ported the GNU compiler to the
| SPARC, Motorola 88000, and National 32032 architectures, as
| well as adding support for Sun's FPA board on Sun 3s. He
| ported the GNU debugger to the SPARC and Intel 80386
| architectures, extended the debugger and linker to handle
| C++ features, and ported the linker to SPARC.
|
| https://www.oreilly.com/openbook/opensources/book/tiemans.h
| t...
|
| >The real bombshell came in June of 1987, when Stallman
| released the GNU C Compiler (GCC) Version 1.0. I downloaded
| it immediately, and I used all the tricks I'd read about in
| the Emacs and GDB manuals to quickly learn its 110,000
| lines of code. Stallman's compiler supported two platforms
| in its first release: the venerable VAX and the new Sun3
| workstation. It handily generated better code on these
| platforms than the respective vendors' compilers could
| muster. In two weeks, I had ported GCC to a new
| microprocessor (the 32032 from National Semiconductor), and
| the resulting port was 20% faster than the proprietary
| compiler supplied by National. With another two weeks of
| hacking, I had raised the delta to 40%. (It was often said
| that the reason the National chip faded from existence was
| because it was supposed to be a 1 MIPS chip, to compete
| with Motorola's 68020, but when it was released, it only
| clocked .75 MIPS on application benchmarks. Note that 140%
| * 0.75 MIPS = 1.05 MIPS. How much did poor compiler
| technology cost National?) Compilers, Debuggers, and
| Editors are the Big 3 tools that programmers use on a day-
| to-day basis. GCC, GDB, and Emacs were so profoundly better
| than the proprietary alternatives, I could not help but
| think about how much money (not to mention economic
| benefit) there would be in replacing proprietary technology
| with technology that was not only better, but also getting
| better faster.
| geocrasher wrote:
| 2003 or 2004. Customer called in and said that his dedicated
| server was hacked. I restored from backup.
|
| An hour later, he calls back. Hacked _again_. Restored again.
|
| An hour later, he calls back. He realizes that the hacker is
| _him_! He's doing a thing, but doesn't know what he's doing
| wrong. So I have him email me the last thing he typed on his
| server, as root:
|
|     rm -rf /home/user/path/to/thing /home/otheruser/path/to/somethingelse / home/path/to/some/other/thing/altogether
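
To see why that one stray space is fatal, it helps to look at the argument list the shell hands to rm. This is a small sketch in Python using `shlex.split`, which follows POSIX-style word splitting. It assumes the stray space sat between "/" and "home", as the story implies. Nothing is executed; the command is only tokenized.

```python
import shlex

# The command from the story, with the stray space after the lone "/".
command = (
    "rm -rf /home/user/path/to/thing "
    "/home/otheruser/path/to/somethingelse "
    "/ home/path/to/some/other/thing/altogether"
)

# Split the line into words the way a POSIX shell would.
args = shlex.split(command)
for index, arg in enumerate(args):
    print(index, repr(arg))

# Any operand that is exactly "/" means "delete the whole filesystem".
dangerous = [arg for arg in args[1:] if arg == "/"]
print(dangerous)   # ['/']
```

The intended third path arrives as two operands: "/" and a relative path that most likely does not exist.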
| terr-dav wrote:
| I wonder if there's a terminal setting or font that
| renders whitespace like a ]-shaped underscore? Could solve a
| whole class of bugs.
| hnlmorg wrote:
| This is precisely why I always `-v` when `rm`ing recursively.
| It might be closing the barn door after the proverbial horse
| has bolted; but at least the fuck up is visible and in some
| circumstances you have a fighting chance to kill `rm` before
| too much damage has been done.
| cstross wrote:
| This takes me back to roughly 1993.
|
| I was in a department running on a mix of Wyse green-screen
| terminals and, later, X terminals, when we got a budget upgrade
| that would roll out actual individual PCs -- 486s running SCO
| Open Desktop -- to everyone. (This was _not_ cheap, it cost about
| £4000 for the hardware per seat, although the software was free
| because, er, this was back in the day when SCO was a respectable
| UNIX development house rather than a serial litigation zombie,
| and we were SCO's techpubs department).
|
| Anyway, the editors, who were techpubs management (and thereby
| stronger on the management than the tech side of things), got
| their workstations before anyone else. And one of them thought,
| "ooh goodie, my very own UNIX system!" And proceeded to do "sudo
| chown -R me:me /" (substitute their username and group for "me")
| all over the root filesystem.
|
| It's amazing what breaks when every shared library suddenly
| belongs to a random user, isn't it?
| etcet wrote:
| I've seen a bash history where "sudo chown -R me:me /" was
| followed up by "sudo chown +R me:me /". At least they tried.
| forinti wrote:
| He knew just enough to shoot himself in the foot.
| AceJohnny2 wrote:
| > _rather than a serial litigation zombie_
|
| "this sounds like cstross"
|
| <checks username>
|
| "heh"
| hnlmorg wrote:
| heh I have an almost identical war story.
| jlv2 wrote:
| I vaguely thought I posted in this thread back then.
|
| Back in 1984/5, I had a directory in my homedir called "etc" for
| miscellaneous stuff. One day I thought: that's a bad name, I
| should remove it. I errantly typed "rm -rf /etc". Thankfully I
| got a "Permission Denied" error. Except, I then did the obvious
| override, "sudo rm -rf /etc" (1). This was on a VAX 11/780 with
| about 50 undergrads doing project work on it. The command ran for a
| while and then I heard moans out in the terminal room, as the
| system crashed. It took us about 3 hours to restore from backup
| tape.
|
| (1) I used to have to explain what "sudo" was, because this
| happened before we posted it to USENET, and before it was
| ubiquitous on systems.
| drewg123 wrote:
| When did it become ubiquitous? I first got root privs via sudo
| on a *nix system in 1991 or so, and I remember it being widely
| deployed even then.
| lifeisstillgood wrote:
| So much of this can be filed under "before we culturally accepted
| prod is different.."
| mdpye wrote:
| Most of these stories relate to administering interactive
| multi-user machines, not the kind of thing we now think of as a
| server.
|
| Users were simultaneously logged in at the shell going about
| their business in *nix, not sending stateless requests in to a
| server process.
|
| And you certainly couldn't afford to have a duplicate of a
| machine that expensive.
|
| The idea of multiple environments didn't really exist, and you
| mostly administered machines from within - hence many of the
| stories being about getting enough tools working again to
| straighten it out. You didn't have another machine (or perhaps
| the connectivity) to put the thing on the operating table from
| a working system.
|
| Things were different...
| hnlmorg wrote:
| Try "before hardware was cheap enough that companies could run
| dedicated non-prod instances"
| dang wrote:
| One small (very) past thread:
|
| _Unix Admin. Horror Story Summary, version 1.0 (old)_ -
| https://news.ycombinator.com/item?id=721578 - July 2009 (6
| comments)
| midasuni wrote:
| > But the most important thing that can be learned from this is
| not that you have to make backups (we all know that, right? ;-)
| ). More important than making backups is to make sure your
| backups are complete and verified
|
| C'est plus ca change...
| karmakaze wrote:
| I really enjoy the recovery parts of the stories that have them,
| like a good Hollywood movie script, but real.
|
| Unix wasn't very common after leaving university and I have more
| PC/LAN type stories. There was one memorable moment, where I was
| working very late in the office and got a call. [If working late,
| the main line would ring the entire office and I could press the
| blinky light to answer.] It was one of our consultants on the
| west coast who somehow had a corrupt filesystem, but that machine
| was the one that had all the project files for the many months of
| consulting work that the team had been developing. [I don't
| recall but it may have been CVS or SVN.]
|
| The tricky bit was that it was using OS/2 and its HPFS filesystem
| so the usual file utilities wouldn't work. We had a number of IBM
| tech books on our bookshelves (because we also did mainframe
| consulting) and I'd been reading about terminal streams and one
| about the HPFS filesystem in particular. It mentioned boot
| blocks, superblocks, bands, allocation bit blocks, etc.
|
| Being young (and dumb), I went with "what's the worst thing that
| could happen" and came up with a plan: using the DOS 'nu' (Norton
| Utilities), copy a few choice sectors from a similar spec-looking
| machine and try the OS/2 equivalent of 'chkdsk /f'--the client,
| after all, was IBM, known for conformity. We first had to transmit
| the 'nu' program over a dial-up modem, but then we were copying the
| first 18 (or so) sectors to get the boot sector, partition table,
| boot program or other HPFS initial sector data; then there were
| some sectors in the middle of the disk that served as a kind of
| main description table with others in bands (that we didn't
| bother with). We guessed the starting point and number of sectors.
| This was a grasping-at-straws Hail Mary. Rebooted the machine,
| let OS/2 run its chkdsk as it detected a problem, waited a
| long while until it was done. Unbelievably it all worked! There
| might have been a couple open files lost and some files that were
| recently deleted being present, but no big differences. We didn't
| think we needed to tell anyone. He bought me beers as promised
| when I came to visit.
|
| Bonus memory: LapLink with the parallel transfer cable was _the
| shit_ in those days. https://en.wikipedia.org/wiki/LapLink_cable
| notme77 wrote:
| source ~/.bash_history
| oneweekwonder wrote:
| hah, before ci/devops tools were popular and you had to set up
| multiple identical/fail-over machines, you could scp your
| .bash_history, clean it up a bit and source it, neat.
| butterfi wrote:
| I thought I would find these funny and instead they just made me
| anxious. I mean, they are funny, I guess I just have PTSD from
| years of Unix administration.
| cf100clunk wrote:
| Came here to say the same thing... Unix sysadmin since 1988. I
| would laugh about these stories but I just cannot without a
| lump in my throat and a few mea maxima culpas in my own mind.
| Reminds me of having heard somewhere that Tom Waits, on
| watching the litany of road show disasters parodied on This Is
| Spinal Tap, wept rather than laughed.
|
| My own worst Unix Admin Horror Story is a variant of the
| classic ''accidental delete-restore from backup if you've got
| one'' scenario: in the early '90s I accidentally repopulated
| all YP tables on a production Sparcstation 10 machine in real
| time on a busy workday, but the tables had not been kept up to
| date! It took until the next day to restore from backup and a
| further day of research and testing to get all the YP tables up
| to proper state, then write scripts to keep them updated. (This
| was before Sun was legally forced to rename YP to NIS, btw).
| krylon wrote:
| Ironically, while I love Unix, I have spent most of my career
| shepherding Windows boxes. The only real horror story I got was a
| new coworker (turning me from "the IT guy" into "half the IT
| department") who looked through the Active Directory tree and
| found the GPO management part had replicated the organizational
| structure. Since there were no GPOs at the time, he considered
| this wasteful and confusing, so he went and deleted it.
|
| ...
|
| Except that what he _did_ delete, it turned out, was the actual
| organizational structure of the Active Directory tree, including
| _ALL_ user accounts. (It's hard to explain without visual aids,
| the UI gave no indication it would delete the actual AD objects,
| not just the (non-existent) GPOs.)
|
| Before long, people started calling to let us know they could no
| longer log into their computers or the terminal server. _sigh_ It
| was a fairly stressful morning.
|
| We really tried, for about 45 minutes, to resurrect the Active
| Directory tree, but it was no good (this was Windows Server 2008,
| so no AD Recycle Bin), so we had to restore the server from
| backup. I have since learnt that there is backup software that
| allows you to restore, say, your AD tree, or maybe even just a
| part of it, instead of the whole machine. Well, the backup
| software we had at the time _suuuuucked_ , so not only did we
| have to restore the entire server, but we had to literally sit
| _all day_ and watch the progress bar move at glacial speed.
|
| In the end, we had the server up and running again, and
| fortunately both the company's CEO and most employees actually
| welcomed the opportunity to finally, _FINALLY_ clean up their
| desks, something every single one of them had been delaying for a
| long time. And by the time we were done, I was just so exhausted
| I wasn't even mad at the newbie anymore.
|
| At least we learnt from that mistake, though. Got ourselves a
| second domain controller, and a much better backup solution. In
| retrospect, I think it was probably a good thing - our boss took
| it with good humor, no data was lost, our backup system worked,
| but we also saw how badly it sucked, and the incident gave us
| some leverage to get the funding for said upgrades. Also,
| everyone had a clean desk, and since it was a Friday, a couple of
| coworkers decided to start their weekend early.
| yeuxardents wrote:
| This story worked out surprisingly well. Usually, not so much
| (:
| rntksi wrote:
| >Well one time I was installing a minimal base system of Linux on
| a friend's PC, so that we would have all the necessary utilities
| to bring over the rest of the stuff. His 3 1/2 inch disk was
| dead, so we had to get the 5 1/4 inch version of the boot/root
| disk. Too bad that version, having to fit in 1.2M instead of
| 1.44, didn't have tar
|
| Heh ... I wonder how many years from now people will stop knowing
| what 3 1/2 and 5 1/4 disks are
| CoastalCoder wrote:
| Or to understand the confusion regarding 3.5" disks being
| floppies rather than hard disks.
| hulitu wrote:
| Ah, the good old days. Single density, double density (720 kB,
| 1.44 MB). I also heard of 2.88 MB floppies - never saw one in
| real life. If I remember correctly 2.88 MB was double density
| double sided and you needed a special floppy drive.
| bityard wrote:
| Single-sided: 360 KB, double-sided (or double density): 720
| KB, high density: 1.44MB.
|
| There were 2.88 MB disks and drives but they never gained
| much traction, because they were expensive and the PC
| industry kept promising various "floppy killers" like the ZIP
| drive.
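
The capacities quoted in this sub-thread follow from the disk geometry: sides x tracks x sectors per track x 512 bytes. A minimal sketch of that arithmetic, using the standard PC floppy geometries (the names `FORMATS` and `capacity_kib` are made up for this example):

```python
SECTOR_BYTES = 512  # sector size used by the standard PC floppy formats

# (label, sides, tracks per side, sectors per track)
FORMATS = [
    ('5.25" double density', 2, 40, 9),
    ('5.25" high density', 2, 80, 15),
    ('3.5" double density', 2, 80, 9),
    ('3.5" high density', 2, 80, 18),
    ('3.5" extended density', 2, 80, 36),
]


def capacity_kib(sides: int, tracks: int, sectors: int) -> int:
    """Formatted capacity in KiB (units of 1024 bytes)."""
    return sides * tracks * sectors * SECTOR_BYTES // 1024


for label, sides, tracks, sectors in FORMATS:
    print(f"{label}: {capacity_kib(sides, tracks, sectors)} KiB")
```

Note that the "1.44 MB" label means 1440 KiB (1,474,560 bytes), and the "1.2M" of the 5 1/4 inch boot disk mentioned earlier means 1200 KiB, which is why the smaller image had to drop utilities.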
| pmontra wrote:
| As in "You 3D printed the Save icon!" ?
|
| https://logosatwork.com/you-3d-printed-the-save-icon/
| greedo wrote:
| I had a contractor installing software last week. He was granted
| full sudo permissions despite having demonstrated a unique set of
| command line skills. Everything was going fine until I got a text
| from his manager saying the contractor couldn't SSH in this
| morning. Turns out he had gotten frustrated with some file/folder
| permissions in the directory where he was supposed to install the
| software package. So he simply ran `chmod -R 777 /*`
|
| Needless to say that required a full restore from the previous
| night's backup since he hadn't snapshotted the VM before
| beginning his work. He was very angry that he had lost 2 days of
| work. I was very sympathetic...
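
One way to make a mistake like `chmod -R 777 /*` harder to commit is to wrap recursive permission changes in a check that the target sits inside the directory the work was meant for. The sketch below is illustrative only: `is_within` and `guarded_chmod_tree` are hypothetical helpers, not anything used in the story, and a real setup would more likely restrict sudo itself.

```python
import os
from pathlib import Path


def is_within(target: Path, allowed_root: Path) -> bool:
    """Return True if target resolves to allowed_root or something below it."""
    target = target.resolve()
    allowed_root = allowed_root.resolve()
    return target == allowed_root or allowed_root in target.parents


def guarded_chmod_tree(target: Path, allowed_root: Path, mode: int) -> int:
    """Recursively chmod target, refusing anything outside allowed_root.

    Returns the number of paths changed. Symlinks are skipped.
    """
    root = allowed_root.resolve()
    if root == Path(root.anchor):
        raise ValueError("refusing to use the filesystem root as the allowed root")
    if not is_within(target, allowed_root):
        raise PermissionError(f"{target} is outside {allowed_root}")

    # Collect every path first so the walk finishes before any mode changes.
    paths = [str(target)]
    for dirpath, dirnames, filenames in os.walk(target):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)

    changed = 0
    # Reverse order handles children before their parent directories.
    for path in reversed(paths):
        if os.path.islink(path):
            continue
        os.chmod(path, mode)
        changed += 1
    return changed
```

With an install directory as `allowed_root`, a call that targets `/` or any sibling directory raises `PermissionError` before a single mode bit changes.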
| AceJohnny2 wrote:
| My very favorite is more of a "recovery legend", telling the
| heroic tale of recovering a Unix system after an errant "rm
| -rf" deleted most of the system's critical files:
|
| https://www.ee.ryerson.ca/~elf/hack/recovery.html
| agentwiggles wrote:
| Nice! When I saw the thread title I was hoping this story would
| get posted somewhere. I read this a long time ago and hadn't
| been able to find it for years!
|
| Thanks for sharing!
| AceJohnny2 wrote:
| I once spent an hour or more finding it again, so I
| bookmarked it ;)
| whartung wrote:
| Worst thing I ever did was cross hard mount NFS volumes across
| two machines.
|
| With a hard NFS mount, the mount will hang until the other
| machine responds.
|
| When we had to power cycle the two servers, they would not come
| up as they were deadlocked waiting for each other. That was
| exciting.
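
The deadlock described above is a cycle in the "who waits for whom at boot" graph. A minimal sketch of checking for such a cycle before deploying mounts, assuming you can list which servers each host hard-mounts from (the function name and the host names in the usage note are hypothetical):

```python
from typing import Dict, List, Optional


def find_mount_cycle(hard_mounts: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of hosts that block on each other at boot, or None.

    hard_mounts maps a host to the servers it hard-mounts NFS volumes from.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour: Dict[str, int] = {}
    stack: List[str] = []

    def visit(host: str) -> Optional[List[str]]:
        colour[host] = GREY
        stack.append(host)
        for server in hard_mounts.get(host, []):
            state = colour.get(server, WHITE)
            if state == GREY:
                # server is already on the current path: that is a cycle
                return stack[stack.index(server):] + [server]
            if state == WHITE:
                cycle = visit(server)
                if cycle:
                    return cycle
        stack.pop()
        colour[host] = BLACK
        return None

    for host in hard_mounts:
        if colour.get(host, WHITE) == WHITE:
            cycle = visit(host)
            if cycle:
                return cycle
    return None
```

For two machines that mount from each other, `find_mount_cycle({"alpha": ["beta"], "beta": ["alpha"]})` reports the loop. The sketch only models the dependency graph; whether a real boot actually hangs also depends on mount options such as `bg` or `soft` and on whether an automounter is used.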
| treesknees wrote:
| This is somewhat common in environments with stable power: you
| basically never have the entire IT system go down and come back
| up at the same time.
| patrickdavey wrote:
| How did you fix it?
| oneweekwonder wrote:
| not op but we recently wanted to mount nfs and the sysadmin
| was adamant we use automounter[0] instead of fstab because if
| the nfs mount is not available it can hang the kernel.
|
| Not sure if it is true or just sysadmin lore but was
| interesting enough to learn about an alternative.
|
| [0]: https://linux.die.net/man/8/automount
| jcynix wrote:
| Been there, done that. Ok, a colleague did it, about 30 years
| ago ;-)
|
| IIRC resetting the machine and forcing it to boot single-user was
| the solution.
| bostik wrote:
| Speaking of NFS, ex-coworker had renamed his prior company's
| servers "notresponding" and "stilltrying".
|
| The NFS client logs must have been glorious.
| kloch wrote:
| I love this. It takes us back to a time when administering a Unix
| system was a Big Deal. Partly because they were rare and
| expensive. But also they were truly multi-user with dozens of
| people logged in at any given time.
| rilindo wrote:
| They still are a big deal, only you can manage up to thousands
| of them and with modern automation, if you screw up one, you
| screw up all of them.
| qwertox wrote:
| This reminds me of the time when I executed `rm -rf ~` in hopes
| of deleting an erroneously created directory.
|
| Or when my "wrongly" expanded `mv` command moved all the files
| and directories in home into the last directory of the home
| directory. Which was an NTFS mount, leading to a loss of all the
| file permissions.
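
Assuming the stray directory was literally named `~`, the problem is that an unquoted leading tilde is expanded to the home directory before `rm` ever sees it. Python's `os.path.expanduser` follows the same basic rule (only a leading tilde is expanded), so it can illustrate the difference; `shell_like_target` is a made-up helper name and this mirrors only the basic case, not full shell expansion.

```python
import os.path


def shell_like_target(arg: str) -> str:
    """Show what a leading tilde turns into; mirrors only the basic rule."""
    return os.path.expanduser(arg)


home = os.path.expanduser("~")

# A bare tilde is replaced by the home directory.
print(shell_like_target("~") == home)

# A path prefix keeps the tilde literal, so this names a directory called "~"
# inside the current directory.
print(shell_like_target("./~"))
```

In a shell, the equivalent safe spellings are a path prefix (`./~`) or quoting the name.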
| csydas wrote:
| The first entry about adding tcsh is more or less the basis of
| one of the questions I use in technical interviews for our Linux
| team. It's less about specifics on tcsh and more just about
| explaining the hierarchy of linux/unix and why we have the
| bin/sbin directories under / and /usr; the more the candidate can
| explain (or even hypothesize), the more comfortable my team feels
| with their general curiosity/understanding of Linux/Unix.
|
| It's a niche situation sure, but being able to understand the
| system and tooling you're working with to a degree to understand
| what options you really have shows a great deal of discipline and
| curiosity, for me at least. Again, it's less about "can you
| figure out what to do in this specific situation" and more "can
| you just explain what you look at every single day in plain and
| simple terms? Did you ever think about it?"
|
| It's been a surprisingly revealing question for nascent Linux
| admins on how they react to questioning the things they look at
| every single day, and how ready they are to _really_ dig into
| the kernel internals.
| [deleted]
___________________________________________________________________
(page generated 2021-12-21 23:00 UTC)