[HN Gopher] How Unix spell ran in 64kb RAM
___________________________________________________________________
How Unix spell ran in 64kb RAM
Author : madmax108
Score : 218 points
Date : 2025-01-19 00:31 UTC (22 hours ago)
(HTM) web link (blog.codingconfessions.com)
(TXT) w3m dump (blog.codingconfessions.com)
| Koshkin wrote:
| 64kB was huge. The first version of UNIX needed 24kB (half of
| which was taken by the kernel).
| MarkusWandel wrote:
| Marginally related... has anyone ever ported the "typo" program
| to modern C?
| o11c wrote:
| For reference,
| https://github.com/robpike/typo/blob/master/unix/typo.c
| MikeTheGreat wrote:
| Also for reference, for those who aren't familiar with typo
| and don't want to read the C source code listed above (side-
| comment: citing the comment-free source code 'for reference'
| is hilarious. Thank you for the laugh :) )
|
| https://github.com/robpike/typo/blob/master/typo.png
| yencabulator wrote:
| How about Go? I think that counts as "modern C" ;-)
|
| This is pretty much the real thing:
| https://github.com/robpike/typo
|
| (If you're crabby about it, this is similar:
| https://github.com/crate-ci/typos/)
| sitkack wrote:
| Way too smart for worse is better. Think Worser!
|
| The main memory bandwidth and the disk bandwidth were about the
| same, a little of 1MB/s.
|
| I would have done this in multiple passes (but still used the
| Bloom Filters, those are cool).
|
| https://github.com/arnoldrobbins/v10spell
|
| https://code.google.com/archive/p/unix-spell/
|
| The original paper is great
| https://www.semanticscholar.org/paper/Development-of-a-Spell...
|
| It is hosted on his web page https://www.cs.dartmouth.edu/~doug/
|
| https://en.wikipedia.org/wiki/Douglas_McIlroy
|
| If you are a word nerd, you will have found obovate and there,
| this chart.
|
| https://upload.wikimedia.org/wikipedia/commons/e/e8/Leaf_mor...
| WaitWaitWha wrote:
| in the mid 80's i ran into something similar. Fast is relative.
|
| I had a lot of data, 640KB RAM, 64KB of heap, and 64KB of stack.
| I had hundreds of megabytes that I had to search extract data
| from and then combine some of them.
|
| I experimented with data index structured into ternary trees.
| Conceptually it made sense, but implementation-wise the
| relationships and paths were still too big to keep in 64KB.
|
| Instead of compression, I did swapping. I wrote a TSR (think
| service), a piece of code that would process a chunk of the data,
| extract the results, store it n the stack, dump the original
| data, make an interrupt call to the TSR, which in turn destroy
| the heap, and read in the next chunk from storage, return control
| to the program, process, combine with stack data, and continue
| until finished the entire process.
|
| Originally this process took about a week for three data entry
| persons (think about a dozen 3" ring binders filled with tables),
| and an specialist combining the information. The program
| completed the work in just a few hours. It was amazingly "fast".
|
| This was on a single threaded system.
|
| [0] https://en.wikipedia.org/wiki/Terminate-and-stay-
| resident_pr...
| fortran77 wrote:
| I had spelling checkers on the Apple ][ that ran in 48K!
| jhbadger wrote:
| Exactly. 64K only seems tiny to people who didn't use home
| computers in the 1980s.
| dahart wrote:
| Do you know how many words it had, how accurate it was, and
| whether it used the disk for lookup? Do you know if it was a
| port of unix spell?
| vandyswa wrote:
| This spell check must have come later. The original spell check
| burst the words of the document, and sort -u'd them:
|
| makewords sentence | lowercase | sort | unique | mismatch
| Scaevolus wrote:
| I wonder what common typos this misses thanks to the hashing?
|
| Related, a contest about compressing the Wordle dictionary:
| http://golf.horse/wordle/
| msephton wrote:
| How about 39kB for a video game with physics, dynamic graphics,
| two music tracks, sound effects, online high scores, and built-in
| instructions? https://news.ycombinator.com/item?id=38372936
| fallat wrote:
| Not the same when the machines in question are not on the same
| level...
| GuB-42 wrote:
| Sizecoding is a thing. .kkrieger is a rather famous 96kB FPS
| game. There is even an entire demoparty called Lovebyte that is
| dedicated to it, the biggest category is 1k, but all demoscene
| events I can think of have a sizecoding competition of some
| kind.
|
| And it is a completely different thing. In general, it is more
| about procedural generation and tricks then good packing.
| Runtime packers are used, like crinkler and kkrunchy, but
| actually they use a _lot_ of RAM, like hundreds of MB, which is
| a bit surprising considering that the decompressed executable
| is in the tens of kB. But that 's because they use very
| powerful but slow compression algorithms.
|
| Sizecoding usually doesn't care about RAM, unless the platform
| requires it, the only think that matters is the size of the
| executable file and its data. For that 39kB Playdate game, I
| guess that's the same idea. The Playdate has 16MB of RAM, I bet
| the game took full advantage of it.
| teamonkey wrote:
| The original Elite was 22k on tape running on a machine with
| 32k of RAM
| PaulHoule wrote:
| You can write an external memory spell checker with a tiny amount
| of RAM: something like - sort the words in the
| document - eliminate unique words (they sort together)
| - merge the sorted words with the sorted dictionary and keep only
| the missing words
|
| I saw this in BASIC in _Creative Computing_ and got it working in
| on my TRS-80 Color Computer which had much less than 32k of
| available RAM, so that was the first thing I thought when I saw
| the headline.
|
| Now this blew people away when it came out
|
| https://winworldpc.com/product/turbo-lightning/1x
|
| it had a compressed dictionary that would fit together with the
| other programs you were running on a PC and spell check _as you
| typed_ ; there was a 640k limit for the PC but it could only use
| a fraction of that so as not to interfere and in the early days
| of the PC you couldn't actually afford to fill it out.
| bongodongobob wrote:
| Link seems to be broken.
| fsckboy wrote:
| works4me
| jandrese wrote:
| I guess you really used the fact that most words are repeated
| to keep the byte count in check? On the old C=64 I had it was a
| bit of a problem not to blow out the memory with just the text
| of the document once you started using it for more than a 1 or
| 2 page paper. Keeping a second sorted copy seems almost
| luxurious.
|
| I guess you could save the working copy to disk first, then do
| the sort, then compare, then reload the working copy. I think
| the C=64 developers probably avoided that strategy because the
| disk interface was so damn slow.
| tczMUFlmoNk wrote:
| I may be wrong, but from "external memory" in the description
| I think the idea is that each of those steps can be done on
| disk, not RAM. An external merge sort is a pretty standard
| database primitive, and the other two only require purely
| sequential access, so are friendly to spinning disks and the
| like.
| PaulHoule wrote:
| Knuth's _Sorting and Searching_ book talks a lot about that
| sort of algorithm.
|
| The old IBM 360 had a 16MB address space at best, but in
| 1968 there were very few installations anywhere near
| filling it. This kind of tape
|
| https://en.wikipedia.org/wiki/9-track_tape
|
| could have a capacity of 80 MB or so, which is (1) much
| larger than RAM, and (2) mainly sequential (it could fast
| forward for a bit and try to find a particular block, but
| it was pretty slow) Contrast this to the floppy drives in
| the 70-140 kb range which the 8-bit micros had. Thus there
| was a lot of literature on external memory algorithms that
| would work on tapes in the early days -- though similar
| methods are still of interest today for "big data" as RAM
| is faster by a lot when you access it sequentially, you
| want to minimize round trips in distributed systems, etc.
|
| (It was funny though when I talked to computer scientists
| around 2004 and was told to forget about external memory
| algorithms because 'main memory' was the thing; people
| realized, for instance, that Google Maps could store all
| the tiles in RAM in half a rack of 1U servers and if
| utilization was high it was much cheaper than serving the
| tiles off disk; Hadoop came along and was a blast from the
| past that itself was obsolete in just a few years)
| probably_wrong wrote:
| Worth noting that the article mentions this alternative as
| their first PoC and its drawbacks: "Because of its simplistic
| implementation, it was not very accurate, and also slow because
| of dictionary lookups on the disk."
| paulg2222 wrote:
| Cool. Now move forward.
| hulitu wrote:
| > Cool. Now move forward.
|
| They are trying, but only seem to go backwards. See Windows,
| Android, iOS, Teams for examples.
| eichin wrote:
| For perspective, in 1983 or so, Grammatik on CP/M ran in under
| 64k and did "grammar checking" (spell checking, plus a bunch of
| expert system rules) on an 8-bit system. (It sticks in my memory
| because of the time spent poking at the _really_ interesting
| part: that it was so compact because it was in Forth, and there
| was enough of an outer interpreter in the product that with a
| little hex editing you could just use it as a Forth interpreter -
| with a _very_ specialized set of functions preloaded :-)
| stevekemp wrote:
| The Wordstar editor I run on my own CP/M system, with 64k of
| RAM, contains the 2023-byte long "SPELL.COM" spell-checker.
|
| I've not decompiled it to see how it works, but it's small and
| fast, and works well.
| yencabulator wrote:
| You have to count the size of the dictionary too. See
| "DICT.DIC".
|
| https://dflund.se/~pi/cpm/files/ftp.mayn.de/pub/cpm/archive/.
| ..
| eichin wrote:
| One of the things that got me intrigued by Unix was an early
| 1980s(ish) Byte article which walked through building a (trivial
| example, not the "real" one) spell checker out of a
| split/sort/comm pipeline, something like 7 commands? 8-bit PCs
| didn't have anything like that, and yet it didn't look like it
| needed _that_ much sophistication...
| MarkSweep wrote:
| One a similar note, there is this period video where Brian
| Kernighan shows how to construct a spell checker using a UNIX
| shell one-liner:
|
| https://youtu.be/tc4ROCJYbm0?t=4m56s
| LeoPanthera wrote:
| I can't remember the name of the product but in the 80s there was
| a _hardware_ spell checker for the IBM PC. It was a box that
| connected between your keyboard and your PC, and if you ever
| typed a string of letters that it did not recognize as a
| dictionary word, it would beep to let you know.
| jadamson wrote:
| Xerox PC Type Right
|
| https://vintageapple.org/pcworld/pdf/PC_World_8711_November_...
| page 237 has a review (big PDF warning)
| eurleif wrote:
| >You can even set the device to recognize your password and
| beep if you mistype it, thereby eliminating a bevy of
| incorrect sign-on messages.
|
| Wow, what a thing for a tech publication to suggest doing!
| Different times.
| ronjakoi wrote:
| > But 27-bit hash codes were too big: with 2^15 words, they
| needed 2^15 * 27 bits of memory, while the PDP-11 had only 2^15 *
| 16 bits (64kB) of RAM--compression was essential.
|
| I'm frustrated when people put this kind of typography on the
| web. HTML can do superscript.
| hnlmorg wrote:
| Then you should enjoy this article because nearly all the
| expressions are presented as proper mathematical equations bar
| the few places where those expressions are pseudocode
| daantje wrote:
| Reading about this and similar techniques in Programming Pearls
| (Second Edition) by Jon Bentley left the younger me spellbound.
| Similar to the evolution of linkers up to mold.
| canucker2016 wrote:
| archive.org has Jon Bentley's Programming Pearls 2nd Edition
| available:
|
| https://archive.org/details/programming-pearls/mode/2up
| dalke wrote:
| > At this time, Bloom filter was not even called Bloom filter. In
| his paper, Douglas calls it a "superimposed code scheme".
|
| A Bloom filter is a specific type of superimposed code.
|
| Calvin Mooers developed random (1) superimposed coding in his
| Master's thesis at MIT back in the 1940s, directly influenced by
| Shannon's work.
|
| Bourne's superb 1963 book "Methods of Information Handling" gives
| details of the mathematics.
|
| I've no doubt Douglas knew about the broader technique, which,
| for example, the author of "The Large Data Base File Structure
| Dilemma" (1975) at http://dx.doi.org/10.1021/ci60001a005
| described as "an old technique called super-imposed coding".
|
| (1) The "random" is an important qualifier because there were
| superimposed codes predating Mooers, but they were not
| mathematically interesting or all that practically important.
| noja wrote:
| B is bytes. b is bits.
| Theodores wrote:
| I can remember using UNIX spell with the '-b' option, because I
| am British. There were only two language options, and now I want
| to know what the decision making was behind that, how the code
| catered for that and where the respective dictionaries came from.
| Did Australians and New Zealanders use British spelling or
| American?
|
| UNIX spell was the 'ZX81 1K chess' of spelling, and, on home
| computers, we did not have a lot of spell checking going on until
| MS Word for Windows 3.1. Before then, in offices, secretaries did
| the typing with WordPerfect. They were human spell checkers for
| their respective managers and teams.
|
| Meanwhile, at home, with our dot matrix printers and flickery
| screens, we were winging it with paper dictionaries for all of
| those early years of computing. I can't remember spell checking
| as being that important back then as everyone could spell. I was
| in a school of a thousand and there was only one kid that claimed
| to be dyslexic, a plausible excuse for not being able to spell.
| Maybe the 1980s was literacy's golden age with there being a
| clear start date for the decline in our spelling ability, that
| being the day UNIX spell was written.
|
| I like to play Scrabble. Although a very different problem to
| spell checking, the process shares some steps with UNIX spell.
| Common word prefixes and suffixes are identified and bolted
| together in the rack or on the board with other components. Then
| a Scrabble dictionary is a bit like UNIX spell as it is just a
| big dictionary of words with no meanings provided. All that
| matters is whether a given word is in the book or not. It also
| has a few special look up tables such as the 102 two letter
| words.
| 1vuio0pswjnm7 wrote:
| "In order to secure funding for Unix, Ken Thompson and Dennis
| Ritchie pitched Unix as a text processing system for the patents
| department to AT&T. Naturally, a text processing system needed a
| spell checker as well."
|
| I still use UNIX every day primarily for text processing.
___________________________________________________________________
(page generated 2025-01-19 23:01 UTC)