[HN Gopher] How Unix spell ran in 64kb RAM
       ___________________________________________________________________
        
       How Unix spell ran in 64kb RAM
        
       Author : madmax108
       Score  : 218 points
       Date   : 2025-01-19 00:31 UTC (22 hours ago)
        
 (HTM) web link (blog.codingconfessions.com)
 (TXT) w3m dump (blog.codingconfessions.com)
        
       | Koshkin wrote:
       | 64kB was huge. The first version of UNIX needed 24kB (half of
       | which was taken by the kernel).
        
       | MarkusWandel wrote:
       | Marginally related... has anyone ever ported the "typo" program
       | to modern C?
        
         | o11c wrote:
         | For reference,
         | https://github.com/robpike/typo/blob/master/unix/typo.c
        
           | MikeTheGreat wrote:
           | Also for reference, for those who aren't familiar with typo
           | and don't want to read the C source code listed above (side-
           | comment: citing the comment-free source code 'for reference'
           | is hilarious. Thank you for the laugh :) )
           | 
           | https://github.com/robpike/typo/blob/master/typo.png
        
         | yencabulator wrote:
         | How about Go? I think that counts as "modern C" ;-)
         | 
         | This is pretty much the real thing:
         | https://github.com/robpike/typo
         | 
         | (If you're crabby about it, this is similar:
         | https://github.com/crate-ci/typos/)
        
       | sitkack wrote:
       | Way too smart for worse is better. Think Worser!
       | 
        | The main memory bandwidth and the disk bandwidth were about the
        | same, a little over 1 MB/s.
       | 
       | I would have done this in multiple passes (but still used the
       | Bloom Filters, those are cool).
       | 
       | https://github.com/arnoldrobbins/v10spell
       | 
       | https://code.google.com/archive/p/unix-spell/
       | 
       | The original paper is great
       | https://www.semanticscholar.org/paper/Development-of-a-Spell...
       | 
       | It is hosted on his web page https://www.cs.dartmouth.edu/~doug/
       | 
       | https://en.wikipedia.org/wiki/Douglas_McIlroy
       | 
        | If you are a word nerd, you will have found "obovate" and, from
        | there, this chart.
       | 
       | https://upload.wikimedia.org/wikipedia/commons/e/e8/Leaf_mor...
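The Bloom filter mentioned above can be sketched in a few lines. This is an illustrative Python version of the general technique, not spell's actual implementation (which used a single hash per word plus compression); the sizes and hash construction here are arbitrary choices for the example.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions over an m-bit array.

    Membership tests can return false positives but never false
    negatives -- the same trade-off spell exploited to squeeze a
    dictionary into a small memory budget.
    """

    def __init__(self, m_bits=2**20, k=4):
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, word):
        # Derive k bit positions by salting one strong hash k ways.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{word}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, word):
        for p in self._positions(word):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, word):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(word))

bf = BloomFilter()
for w in ("receive", "separate", "definitely"):
    bf.add(w)
```

A word is flagged as a possible misspelling only when at least one of its k bit positions is unset; all-set positions mean "probably in the dictionary".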
        
       | WaitWaitWha wrote:
        | In the mid-80s I ran into something similar. Fast is relative.
       | 
        | I had a lot of data, 640KB RAM, 64KB of heap, and 64KB of stack.
        | I had hundreds of megabytes that I had to search, extract data
        | from, and then combine some of it.
       | 
       | I experimented with data index structured into ternary trees.
       | Conceptually it made sense, but implementation-wise the
       | relationships and paths were still too big to keep in 64KB.
       | 
        | Instead of compression, I did swapping. I wrote a TSR [0] (think
        | service): a piece of code that would process a chunk of the data,
        | extract the results, store them on the stack, dump the original
        | data, and make an interrupt call to the TSR, which in turn would
        | destroy the heap, read in the next chunk from storage, and return
        | control to the program to process, combine with the stack data,
        | and continue until the entire job was finished.
       | 
        | Originally this process took about a week for three data entry
        | persons (think about a dozen 3" ring binders filled with tables)
        | and a specialist combining the information. The program
        | completed the work in just a few hours. It was amazingly "fast".
       | 
       | This was on a single threaded system.
       | 
       | [0] https://en.wikipedia.org/wiki/Terminate-and-stay-
       | resident_pr...
        
       | fortran77 wrote:
       | I had spelling checkers on the Apple ][ that ran in 48K!
        
         | jhbadger wrote:
         | Exactly. 64K only seems tiny to people who didn't use home
         | computers in the 1980s.
        
         | dahart wrote:
         | Do you know how many words it had, how accurate it was, and
         | whether it used the disk for lookup? Do you know if it was a
         | port of unix spell?
        
       | vandyswa wrote:
       | This spell check must have come later. The original spell check
       | burst the words of the document, and sort -u'd them:
       | 
       | makewords sentence | lowercase | sort | unique | mismatch
        
       | Scaevolus wrote:
       | I wonder what common typos this misses thanks to the hashing?
       | 
       | Related, a contest about compressing the Wordle dictionary:
       | http://golf.horse/wordle/
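A back-of-the-envelope answer, assuming the scheme the article's numbers imply (one 27-bit hash code per word, about 2^15 dictionary words): a typo goes unflagged only when its hash happens to collide with some dictionary word's hash.

```python
# Probability that a misspelled word's 27-bit hash collides with one
# of the ~2^15 stored dictionary hashes, assuming uniform hashing.
# Parameters are taken from the figures quoted elsewhere in the thread;
# the exact historical scheme may differ.
n_words = 2**15          # dictionary size
hash_space = 2**27       # 27-bit hash codes
p_missed_typo = 1 - (1 - 1 / hash_space) ** n_words
# Approximately n_words / hash_space = 2**-12, i.e. about 1 in 4096.
```

So roughly one misspelling in four thousand would sneak past, which is a price the design accepted in exchange for the memory savings.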
        
       | msephton wrote:
       | How about 39kB for a video game with physics, dynamic graphics,
       | two music tracks, sound effects, online high scores, and built-in
       | instructions? https://news.ycombinator.com/item?id=38372936
        
         | fallat wrote:
         | Not the same when the machines in question are not on the same
         | level...
        
         | GuB-42 wrote:
         | Sizecoding is a thing. .kkrieger is a rather famous 96kB FPS
         | game. There is even an entire demoparty called Lovebyte that is
         | dedicated to it, the biggest category is 1k, but all demoscene
         | events I can think of have a sizecoding competition of some
         | kind.
         | 
          | And it is a completely different thing. In general, it is more
          | about procedural generation and tricks than good packing.
          | Runtime packers are used, like Crinkler and kkrunchy, but they
          | actually use a _lot_ of RAM, hundreds of MB, which is a bit
          | surprising considering that the decompressed executable is in
          | the tens of kB. But that's because they use very powerful but
          | slow compression algorithms.
          | 
          | Sizecoding usually doesn't care about RAM, unless the platform
          | requires it; the only thing that matters is the size of the
          | executable file and its data. For that 39kB Playdate game, I
          | guess that's the same idea. The Playdate has 16MB of RAM; I bet
          | the game took full advantage of it.
        
         | teamonkey wrote:
         | The original Elite was 22k on tape running on a machine with
         | 32k of RAM
        
       | PaulHoule wrote:
        | You can write an external memory spell checker with a tiny
        | amount of RAM, something like:
        | 
        |     - sort the words in the document
        |     - eliminate unique words (they sort together)
        |     - merge the sorted words with the sorted dictionary and
        |       keep only the missing words
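The three steps above can be sketched as follows. This is an in-memory Python illustration of the technique; a true external-memory version would sort on disk (e.g. with the system `sort`) and stream both files, holding only one word from each at a time during the merge.

```python
from itertools import groupby

def misspelled(document_words, sorted_dictionary):
    """Spell check via sort + dedupe + sorted merge:
    1. sort the document's words,
    2. collapse duplicates (identical words sort together),
    3. merge against the sorted dictionary, keeping words not found.
    """
    words = sorted(w.lower() for w in document_words)
    unique = [w for w, _ in groupby(words)]        # step 2

    missing = []
    it = iter(sorted_dictionary)
    d = next(it, None)
    for w in unique:                               # step 3: merge
        while d is not None and d < w:
            d = next(it, None)
        if d != w:
            missing.append(w)
    return missing

dictionary = sorted(["the", "cat", "sat", "on", "mat"])
doc = "teh cat sat on teh mat".split()
print(misspelled(doc, dictionary))   # -> ['teh']
```

Because every pass is a sort or a sequential scan, the working set in RAM can be tiny no matter how large the document or dictionary is.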
       | 
        | I saw this in BASIC in _Creative Computing_ and got it working
        | on my TRS-80 Color Computer, which had much less than 32k of
        | available RAM, so that was the first thing I thought of when I
        | saw the headline.
       | 
       | Now this blew people away when it came out
       | 
       | https://winworldpc.com/product/turbo-lightning/1x
       | 
        | it had a compressed dictionary that would fit in memory together
        | with the other programs you were running on a PC and spell check
        | _as you typed_; there was a 640k limit for the PC, but it could
        | only use a fraction of that so as not to interfere, and in the
        | early days of the PC you couldn't actually afford to fill it all
        | out anyway.
        
         | bongodongobob wrote:
         | Link seems to be broken.
        
           | fsckboy wrote:
           | works4me
        
         | jandrese wrote:
          | I guess you really used the fact that most words are repeated
          | to keep the byte count in check? On the old C=64 I had, it was
          | a bit of a problem not to blow out the memory with just the
          | text of the document once you started using it for more than a
          | one- or two-page paper. Keeping a second sorted copy seems
          | almost luxurious.
         | 
         | I guess you could save the working copy to disk first, then do
         | the sort, then compare, then reload the working copy. I think
         | the C=64 developers probably avoided that strategy because the
         | disk interface was so damn slow.
        
           | tczMUFlmoNk wrote:
           | I may be wrong, but from "external memory" in the description
           | I think the idea is that each of those steps can be done on
           | disk, not RAM. An external merge sort is a pretty standard
           | database primitive, and the other two only require purely
           | sequential access, so are friendly to spinning disks and the
           | like.
        
             | PaulHoule wrote:
             | Knuth's _Sorting and Searching_ book talks a lot about that
             | sort of algorithm.
             | 
             | The old IBM 360 had a 16MB address space at best, but in
             | 1968 there were very few installations anywhere near
             | filling it. This kind of tape
             | 
             | https://en.wikipedia.org/wiki/9-track_tape
             | 
              | could have a capacity of 80 MB or so, which is (1) much
              | larger than RAM, and (2) mainly sequential (it could fast
              | forward for a bit and try to find a particular block, but
              | it was pretty slow). Contrast this to the floppy drives in
              | the 70-140 kB range which the 8-bit micros had. Thus there
              | was a lot of literature in the early days on external
              | memory algorithms that would work on tapes -- though
              | similar methods are still of interest today for "big data",
              | as RAM is a lot faster when you access it sequentially, you
              | want to minimize round trips in distributed systems, etc.
             | 
             | (It was funny though when I talked to computer scientists
             | around 2004 and was told to forget about external memory
             | algorithms because 'main memory' was the thing; people
             | realized, for instance, that Google Maps could store all
             | the tiles in RAM in half a rack of 1U servers and if
             | utilization was high it was much cheaper than serving the
             | tiles off disk; Hadoop came along and was a blast from the
             | past that itself was obsolete in just a few years)
        
         | probably_wrong wrote:
         | Worth noting that the article mentions this alternative as
         | their first PoC and its drawbacks: "Because of its simplistic
         | implementation, it was not very accurate, and also slow because
         | of dictionary lookups on the disk."
        
       | paulg2222 wrote:
       | Cool. Now move forward.
        
         | hulitu wrote:
         | > Cool. Now move forward.
         | 
         | They are trying, but only seem to go backwards. See Windows,
         | Android, iOS, Teams for examples.
        
       | eichin wrote:
       | For perspective, in 1983 or so, Grammatik on CP/M ran in under
       | 64k and did "grammar checking" (spell checking, plus a bunch of
       | expert system rules) on an 8-bit system. (It sticks in my memory
       | because of the time spent poking at the _really_ interesting
       | part: that it was so compact because it was in Forth, and there
       | was enough of an outer interpreter in the product that with a
       | little hex editing you could just use it as a Forth interpreter -
       | with a _very_ specialized set of functions preloaded :-)
        
         | stevekemp wrote:
         | The Wordstar editor I run on my own CP/M system, with 64k of
         | RAM, contains the 2023-byte long "SPELL.COM" spell-checker.
         | 
         | I've not decompiled it to see how it works, but it's small and
         | fast, and works well.
        
           | yencabulator wrote:
           | You have to count the size of the dictionary too. See
           | "DICT.DIC".
           | 
           | https://dflund.se/~pi/cpm/files/ftp.mayn.de/pub/cpm/archive/.
           | ..
        
       | eichin wrote:
       | One of the things that got me intrigued by Unix was an early
       | 1980s(ish) Byte article which walked through building a (trivial
       | example, not the "real" one) spell checker out of a
       | split/sort/comm pipeline, something like 7 commands? 8-bit PCs
       | didn't have anything like that, and yet it didn't look like it
       | needed _that_ much sophistication...
        
         | MarkSweep wrote:
          | On a similar note, there is this period video where Brian
          | Kernighan shows how to construct a spell checker using a UNIX
          | shell one-liner:
         | 
         | https://youtu.be/tc4ROCJYbm0?t=4m56s
        
       | LeoPanthera wrote:
       | I can't remember the name of the product but in the 80s there was
       | a _hardware_ spell checker for the IBM PC. It was a box that
       | connected between your keyboard and your PC, and if you ever
       | typed a string of letters that it did not recognize as a
       | dictionary word, it would beep to let you know.
        
         | jadamson wrote:
         | Xerox PC Type Right
         | 
         | https://vintageapple.org/pcworld/pdf/PC_World_8711_November_...
         | page 237 has a review (big PDF warning)
        
           | eurleif wrote:
           | >You can even set the device to recognize your password and
           | beep if you mistype it, thereby eliminating a bevy of
           | incorrect sign-on messages.
           | 
           | Wow, what a thing for a tech publication to suggest doing!
           | Different times.
        
       | ronjakoi wrote:
       | > But 27-bit hash codes were too big: with 2^15 words, they
       | needed 2^15 * 27 bits of memory, while the PDP-11 had only 2^15 *
       | 16 bits (64kB) of RAM--compression was essential.
       | 
       | I'm frustrated when people put this kind of typography on the
       | web. HTML can do superscript.
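Typography aside, the quoted arithmetic checks out:

```python
# Figures from the quoted passage: 2^15 dictionary words, 27-bit hash
# codes, and a PDP-11 with 2^15 16-bit words of RAM.
words = 2**15
raw_bits = words * 27        # uncompressed hash codes
ram_bits = 2**15 * 16        # total machine RAM (64 kB)
print(raw_bits // 8)         # 110592 bytes (~108 kB) of hashes alone
print(ram_bits // 8)         # 65536 bytes (64 kB) of RAM
```

The raw hash table alone would overflow the whole machine by more than half again, which is why compressing the hash codes was not optional.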
        
         | hnlmorg wrote:
        | Then you should enjoy this article, because nearly all the
        | expressions are presented as proper mathematical equations, bar
        | the few places where those expressions are pseudocode.
        
       | daantje wrote:
       | Reading about this and similar techniques in Programming Pearls
       | (Second Edition) by Jon Bentley left the younger me spellbound.
       | Similar to the evolution of linkers up to mold.
        
         | canucker2016 wrote:
         | archive.org has Jon Bentley's Programming Pearls 2nd Edition
         | available:
         | 
         | https://archive.org/details/programming-pearls/mode/2up
        
       | dalke wrote:
       | > At this time, Bloom filter was not even called Bloom filter. In
       | his paper, Douglas calls it a "superimposed code scheme".
       | 
       | A Bloom filter is a specific type of superimposed code.
       | 
       | Calvin Mooers developed random (1) superimposed coding in his
       | Master's thesis at MIT back in the 1940s, directly influenced by
       | Shannon's work.
       | 
       | Bourne's superb 1963 book "Methods of Information Handling" gives
       | details of the mathematics.
       | 
       | I've no doubt Douglas knew about the broader technique, which,
       | for example, the author of "The Large Data Base File Structure
       | Dilemma" (1975) at http://dx.doi.org/10.1021/ci60001a005
       | described as "an old technique called super-imposed coding".
       | 
       | (1) The "random" is an important qualifier because there were
       | superimposed codes predating Mooers, but they were not
       | mathematically interesting or all that practically important.
        
       | noja wrote:
       | B is bytes. b is bits.
        
       | Theodores wrote:
       | I can remember using UNIX spell with the '-b' option, because I
       | am British. There were only two language options, and now I want
       | to know what the decision making was behind that, how the code
       | catered for that and where the respective dictionaries came from.
       | Did Australians and New Zealanders use British spelling or
       | American?
       | 
       | UNIX spell was the 'ZX81 1K chess' of spelling, and, on home
       | computers, we did not have a lot of spell checking going on until
       | MS Word for Windows 3.1. Before then, in offices, secretaries did
       | the typing with WordPerfect. They were human spell checkers for
       | their respective managers and teams.
       | 
       | Meanwhile, at home, with our dot matrix printers and flickery
       | screens, we were winging it with paper dictionaries for all of
       | those early years of computing. I can't remember spell checking
       | as being that important back then as everyone could spell. I was
       | in a school of a thousand and there was only one kid that claimed
       | to be dyslexic, a plausible excuse for not being able to spell.
       | Maybe the 1980s was literacy's golden age with there being a
       | clear start date for the decline in our spelling ability, that
       | being the day UNIX spell was written.
       | 
       | I like to play Scrabble. Although a very different problem to
       | spell checking, the process shares some steps with UNIX spell.
       | Common word prefixes and suffixes are identified and bolted
       | together in the rack or on the board with other components. Then
       | a Scrabble dictionary is a bit like UNIX spell as it is just a
       | big dictionary of words with no meanings provided. All that
       | matters is whether a given word is in the book or not. It also
       | has a few special look up tables such as the 102 two letter
       | words.
        
       | 1vuio0pswjnm7 wrote:
       | "In order to secure funding for Unix, Ken Thompson and Dennis
       | Ritchie pitched Unix as a text processing system for the patents
       | department to AT&T. Naturally, a text processing system needed a
       | spell checker as well."
       | 
       | I still use UNIX every day primarily for text processing.
        
       ___________________________________________________________________
       (page generated 2025-01-19 23:01 UTC)