[HN Gopher] We'd be better off with 9-bit bytes
       ___________________________________________________________________
        
       We'd be better off with 9-bit bytes
        
       Author : luu
       Score  : 43 points
       Date   : 2025-08-06 19:39 UTC (3 hours ago)
        
 (HTM) web link (pavpanchekha.com)
 (TXT) w3m dump (pavpanchekha.com)
        
       | FrankWilhoit wrote:
       | That's what the PDP-10 community was saying decades ago.
        
       | Keyframe wrote:
       | Yeah, but hear me out - 10-bit bytes!
        
         | Waterluvian wrote:
         | Uh oh. Looks like humanity has been bitten by the bit byte bug.
        
         | pdpi wrote:
         | One of the nice features of 8 bit bytes is being able to break
         | them into two hex nibbles. 9 bits breaks that, though you could
         | do three octal digits instead I suppose.
         | 
         | 10 bit bytes would give us 5-bit nibbles. That would be 0-9a-v
         | digits, which seems a bit extreme.
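          | 
          | For concreteness, a little C sketch of what those 0-9a-v
          | digits might look like (the alphabet here is a hypothetical
          | choice, just base 32 in order):
          | 
          |     #include <stdio.h>
          | 
          |     /* Hypothetical digit set for 5-bit nibbles: 0-9, a-v. */
          |     static const char DIGITS[] =
          |         "0123456789abcdefghijklmnopqrstuv";
          | 
          |     /* Print a 10-bit byte as two 5-bit nibbles. */
          |     static void print10(unsigned v)
          |     {
          |         printf("%c%c\n", DIGITS[(v >> 5) & 0x1f],
          |                DIGITS[v & 0x1f]);
          |     }
          | 
          |     int main(void)
          |     {
          |         print10(42);   /* 1*32 + 10 -> "1a" */
          |         print10(1023); /* all ones  -> "vv" */
          |         return 0;
          |     }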
        
           | pratyahava wrote:
           | Crockford base32 would be great. it is 0-9, A-Z minus I, L,
           | O, U.
        
             | pdpi wrote:
              | The moment you feel the need to skip letters due to their
              | propensity for errors should also be the moment you
              | realise you're doing something wrong, though. It's kind of
              | fine if you want a case-insensitive encoding scheme, but
              | it's kind of nasty for human-first purposes (e.g. in
              | source code).
        
           | int_19h wrote:
           | Clearly it should be 12 bits, that way you could use either 3
            | hex digits or 4 octal ones.
        
             | monocasa wrote:
             | Alternate world where the pdp-8 evolved into our modern
             | processors.
        
         | pratyahava wrote:
          | i've been fascinated with the idea of 10-bit bytes for years.
          | i asked chatgpt "below is a piece of an article about 9-bit
          | bytes, create the same but for 10-bit bytes" and gave it a
          | piece; the reply is below.
         | 
          | IPv4: Everyone knows the story: IPv4 had 32-bit addresses, so
          | about 4 billion total (less due to various reserved subnets).
          | That's not enough in a world with 8 billion humans, and
          | that's led to NATs, more active network middleware, and the
          | impossibly glacial pace of IPv6 roll-out. It's 2025 and
          | Github--Github!--doesn't support IPv6. But in a world with
          | 10-bit bytes IPv4 would have had 40-bit addresses, about 1
          | trillion total. That would be more than enough right now, and
          | likely sufficient well into the 22nd century. (In our
          | timeline, exhaustion hit in 2011, when demand was doubling
          | every five years; 256x more addresses gets us to 2065
          | projecting linearly, and probably later with slowing growth.)
          | When exhaustion does set in, it would plausibly happen in a
          | world where address demand has stabilized, and light market
          | forces or reallocation would suffice--no need for NAT
          | spaghetti or painful protocol transitions.
          | 
          | UNIX time: In our timeline, 32-bit UNIX timestamps run out in
          | 2038, so again all software has to painfully transition to
          | larger, 64-bit structures. Equivalent 40-bit timestamps last
          | until year 34,857, so absolutely no hurry. Negative
          | timestamps would reach back to year -34,818, easily covering
          | everything from the birth of agriculture to the last Ice Age
          | to the time Neanderthals still roamed Europe. (And yes,
          | probably long enough to support most science fiction
          | timelines without breaking a sweat.)
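          | 
          | (For reference, a quick C sanity check of that 40-bit figure,
          | assuming a signed 40-bit count of seconds since 1970 and a
          | mean Gregorian year of 31,556,952 seconds:)
          | 
          |     #include <stdint.h>
          |     #include <stdio.h>
          | 
          |     int main(void)
          |     {
          |         /* Assumption: signed 40-bit seconds-since-1970. */
          |         int64_t max_secs = (INT64_C(1) << 39) - 1;
          |         double years = max_secs / 31556952.0; /* ~17,421 */
          |         printf("covers year %.0f to year %.0f\n",
          |                1970 - years, 1970 + years);
          |         return 0;
          |     }
          | 
          | That prints roughly year -15451 to year 19391, which doesn't
          | match the quoted figures above.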
        
           | relevant_stats wrote:
           | I really don't get why some people like to pollute
           | conversations with LLMs answers. Particularly when they are
           | as dumb as your example.
           | 
           | What's the point?
        
             | svachalek wrote:
             | Same, we all have access to the LLM too, but I go to forums
             | for human thoughts.
        
               | pratyahava wrote:
                | ok, agree with your point, i should have got the
                | numbers from chatgpt and just put them in the comment
                | in my own words; i was just too lazy to calculate how
                | much we would gain with 10-bit bytes.
        
             | pratyahava wrote:
              | umm, i guess most of the article was made by an llm, so
              | i did not see it as a sin, but in other cases i agree,
              | copy-pasting from an llm is crap
        
         | iosjunkie wrote:
          | No! No, no, not 10! He said 9. Nobody's comin' up with 10.
          | Who's processin' with 10 bits? What's the extra bit for?
          | You're just wastin' electrons.
        
         | titzer wrote:
         | 10 bit bytes would be awesome! Think of 20 bit microcontrollers
         | and 40 bit workstations. 40 bits makes 5 byte words, that'd be
         | rad. Also, CPUs could support "legacy" 32 bit integers and use
         | a full 8 bits for tags, which are useful for implementing
         | dynamic languages.
        
       | zamadatix wrote:
        | Because we have 8-bit bytes, we're familiar with the famous or
        | obvious cases where multiples of 8 bits ran out, and those
        | cases sound a lot better with 12.5% extra bits. What's harder
        | to see in this kind of thought experiment is what the famously
        | obvious cases of multiples of 9 bits running out would have
        | been. The article starts to think about some of these towards
        | the end, but it's hard, as it's not immediately obvious how
        | many others there might be (or, alternatively, why there would
        | be a significantly different total number of issues than 8-bit
        | bytes had). ChatGPT particularly isn't going to have a ton of
        | training data about the problems of 9-bit multiples running
        | out to hand-feed you.
        | 
        | It works in the reverse direction too. E.g. knowing that
        | networking headers don't even care about byte alignment for
        | subfields (a VID is 10 bits because it's packed with a few
        | other fields into 2 bytes), I wouldn't be surprised if IPv4
        | had ended up with 3-byte addresses = 27 bits, instead of
        | 4*9 = 36, since the designers were more worried about small
        | packet overheads than about matching specific word sizes in
        | certain CPUs.
        
         | marcosdumay wrote:
          | Well, there should be half as many cases of multiples of
          | 9 bits running out as there were for multiples of 8 bits.
         | 
         | I don't think this is enough of a reason, though.
        
       | MangoToupe wrote:
       | Maybe if we worked with 7-bit bytes folks would be more grateful.
        
       | folsom wrote:
        | I don't know; what if we had ended up with a 27-bit address
        | space?
       | 
        | As far as ISPs competing on speeds in the mid-90s goes, for
        | some reason it feels like historical retrospectives are always
        | about ten years off.
        
       | jayd16 wrote:
        | I guess nibbles would be 3 bits and you'd have 3 per byte?
        
       | monocasa wrote:
       | Ohh, and then we could write the digits in octal.
       | 
        | Interestingly, the N64 internally had 9-bit bytes; accesses
        | from the CPU just ignored one of the bits. This wasn't a
        | parity bit, but a true extra data bit that was used by the
        | GPU.
        
         | ethan_smith wrote:
         | The N64's Reality Display Processor actually used that 9th bit
         | as a coverage mask for antialiasing, allowing per-pixel alpha
         | blending without additional memory lookups.
        
           | monocasa wrote:
           | As well as extra bits in the Z buffer to give it a 15.3 fixed
           | point format.
        
       | kazinator wrote:
       | 36 bit addresses would be better than 32, but I like being able
       | to store a 64 bit double or pointer or integer in a word using
       | NaN tagging (subject to the limitation that only 48 bits of the
       | pointer are significant).
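        | 
        | A rough sketch of one NaN-boxing layout in C, for illustration
        | (the tag constant is made up here, and it assumes pointers fit
        | in 48 bits, as on current x86-64/AArch64):
        | 
        |     #include <stdint.h>
        |     #include <stdio.h>
        |     #include <string.h>
        | 
        |     /* Hypothetical tag: a NaN pattern no real double uses. */
        |     #define TAG_PTR UINT64_C(0xFFFC000000000000)
        |     #define PAYLOAD UINT64_C(0x0000FFFFFFFFFFFF)
        | 
        |     typedef uint64_t value; /* 64-bit slot: double or pointer */
        | 
        |     static value box_double(double d) {
        |         value v;
        |         memcpy(&v, &d, sizeof v); /* doubles stored as-is */
        |         return v;
        |     }
        | 
        |     static value box_ptr(void *p) {
        |         /* Only the low 48 pointer bits are significant. */
        |         return TAG_PTR | ((uintptr_t)p & PAYLOAD);
        |     }
        | 
        |     static int is_ptr(value v) {
        |         return (v & TAG_PTR) == TAG_PTR;
        |     }
        | 
        |     static void *unbox_ptr(value v) {
        |         return (void *)(uintptr_t)(v & PAYLOAD);
        |     }
        | 
        |     int main(void) {
        |         int x = 42;
        |         value a = box_double(3.14), b = box_ptr(&x);
        |         printf("a tagged? %d b tagged? %d *b = %d\n",
        |                is_ptr(a), is_ptr(b), *(int *)unbox_ptr(b));
        |         return 0;
        |     }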
        
         | consp wrote:
          | Funny thing is we sort of got 36-bit addressing mainstream
          | with PAE in the 32-bit x86 age.
        
           | kazinator wrote:
           | We sort of got 16 + 4 = 20 bit addressing in the 16 bit x86
           | age too.
        
       | Retr0id wrote:
       | Aside from memory limits, one of the problems with 32-bit
       | pointers is that ASLR is weakened as a security mitigation -
       | there's simply fewer bits left to randomise. A 36-bit address
       | space doesn't improve on this much.
       | 
       | 64-bit pointers are pretty spacious and have "spare" bits for
       | metadata (e.g. PAC, NaN-boxing). 72-bit pointers are even better
       | I suppose, but their adoption would've come later.
        
       | kazinator wrote:
        | Problem is, not only did we have decades of C code that
        | unnecessarily assumed 8/16/32, but this all-the-world-is-a-VAX
        | view is now baked into newer languages.
       | 
       | C is good for portability to this kind of machine. You can have a
       | 36 bit int (for instance), CHAR_BIT is defined as 9 and so on.
       | 
        | With a little bit of extra reasoning, you can make the code
        | fit different machine sizes so that you use all the available
        | bits.
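        | 
        | A minimal sketch of that style, where nothing hard-codes an
        | 8-bit byte:
        | 
        |     #include <limits.h>
        |     #include <stdio.h>
        | 
        |     int main(void)
        |     {
        |         /* On a 9-bit-byte machine CHAR_BIT is 9, and the
        |            values below adjust instead of silently breaking. */
        |         unsigned width = sizeof(unsigned) * CHAR_BIT;
        |         unsigned top_bit = 1u << (width - 1);
        |         printf("unsigned has %u bits; top bit mask %#x\n",
        |                width, top_bit);
        |         return 0;
        |     }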
        
         | pratyahava wrote:
         | was that assumption in C code really unnecessary? i suppose it
         | made many things much easier.
        
           | kazinator wrote:
           | In my experience, highly portable C is cleaner and easier to
           | understand and maintain than C which riddles abstract logic
           | with dependencies on the specific parameters of the abstract
           | machine.
           | 
           | Sometimes the latter is a win, but not if that is your
           | default modus operandi.
           | 
           | Another issue is that machine-specific code that assumes
           | compiler and machine characteristics often has outright
           | undefined behavior, not making distinctions between "this
           | type is guaranteed to be 32 bits" and "this type is
           | guaranteed to wrap around to a negative value" or "if we
           | shift this value 32 bits or more, we get zero so we are okay"
           | and such.
           | 
           | There are programmers who are not stupid like this, but those
           | are the ones who will tend to reach for portable coding.
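            | 
            | For instance, a sketch of the shift case (the guarded
            | version is one portable alternative):
            | 
            |     #include <limits.h>
            |     #include <stdio.h>
            | 
            |     int main(void)
            |     {
            |         unsigned x = 1, n = 32;
            | 
            |         /* Undefined if n >= the width of unsigned
            |            (commonly 32); on x86 the hardware masks the
            |            count, often giving 1 rather than 0. */
            |         /* unsigned bad = x << n; */
            | 
            |         /* Guarded version that really does yield zero: */
            |         unsigned width = sizeof x * CHAR_BIT;
            |         unsigned good = n >= width ? 0u : x << n;
            |         printf("%u\n", good); /* 0 with 32-bit unsigned */
            |         return 0;
            |     }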
        
         | 0cf8612b2e1e wrote:
          | Now there's a C++ proposal to define a byte as 8 bits:
         | 
         | https://isocpp.org/files/papers/P3477R1.html
        
       | TruffleLabs wrote:
        | The PDP-8 has a 12-bit word size.
        
       | bawolff wrote:
       | > But in a world with 9-bit bytes IPv4 would have had 36-bit
       | addresses, about 64 billion total.
       | 
        | Or we would have had 27-bit addresses and run into problems
        | sooner.
        
         | bigstrat2003 wrote:
         | That might've been better, actually. The author makes the
         | mistake of "more time would've made this better", but we've had
         | plenty of time to transition to IPv6. People simply don't
         | because they are lazy and IPv4 works for them. More time
         | wouldn't help that, any more than a procrastinating student
         | benefits when the deadline for a paper gets extended.
         | 
          | But on the other hand, if we had run out _sooner_, perhaps
         | IPv4 wouldn't be as entrenched and people would've been more
         | willing to switch. Maybe not, of course, but it's at least a
         | possibility.
        
           | dmitrygr wrote:
           | > simply don't because they are lazy and IPv4 works for them
           | 
           | Or because IPv6 was not a simple "add more bits to address"
           | but a much larger in-places-unwanted change.
        
             | zamadatix wrote:
             | Most of the "unwanted" things in IPv6 aren't actually
             | required by IPv6. Temporary addresses, most of the feature
             | complexity in NDP, SLAAC, link-local addresses for anything
             | but the underlying stuff that happens automatically, "no
             | NAT, you must use PD", probably more I'm forgetting.
             | Another large portion is things related to trying to be
             | dual stack like concurrent resolutions/requests, various
             | forms of tunneling, NAT64, and others.
             | 
             | They're almost always deployed though because people end up
             | liking the ideas. They don't want to configure VRRP for
             | gateway redundancy, they don't want a DHCP server for
             | clients to be able to connect, they want to be able to use
             | link-local addresses for certain application use cases,
             | they want the random addresses for increased privacy, they
              | want to dual stack for compatibility, etc. The people
              | who don't care see all of this being deployed and think
              | "oh damn, that's nuts", not realizing you can still just
              | deploy it almost exactly the same as IPv4 with longer
              | addresses if that's all you want.
        
               | JoshTriplett wrote:
               | > They're almost always deployed though because people
               | end up liking the ideas.
               | 
               | Or they're deployed because it's difficult to use IPv6
               | without them, even if you want to. For instance, it's
               | quite difficult to use Linux with IPv6 in a static
               | configuration _without_ any form of autodiscovery of
                | addresses or routes; I've yet to achieve such a
               | configuration. With IPv4, I can bring up the network in a
                | tiny fraction of a second and have it _work_; with IPv6,
               | the only successful configuration I've found takes many
               | seconds to decide it has a working network, and sometimes
               | flakes out entirely.
               | 
               | Challenge: boot up an AWS instance, configure networking
               | using your preferred IP version, successfully make a
               | connection to an external server using that version, and
               | get a packet back, in under 500ms from the time your
               | instance gets control, succeeding 50 times out of 50.
               | Very doable with IPv4; I have yet to achieve that with
               | IPv6.
        
             | bigstrat2003 wrote:
             | I've run IPv6 on both corporate and home networks. Whether
             | or not the additions were merited, they are not a
             | formidable challenge for any reasonably-skilled admin. So
             | no, I don't think that the reason you gave suffices as an
             | excuse for why so many still refuse to deploy IPv6.
        
         | ay wrote:
         | The first transition was _to_ IPv4, and it was reportedly (I
         | wasn't in the workforce yet :-) relatively easy...
         | 
         | https://www.internetsociety.org/blog/2016/09/final-report-on...
         | 
         | Some more interesting history reading here:
         | 
         | https://datatracker.ietf.org/doc/html/rfc33
        
       | SlowTao wrote:
        | Can you imagine the argument for 8-bit bytes if we still lived
        | in the original 6-bit world of the 1950s?
        | 
        | A big part of the move to 8-bit systems was that it allowed
        | expanded text encodings with letter casing, punctuation and
        | various ASCII stuff.
        | 
        | We could move to the 36-bit world of Fortran if really needed
        | and solve all these problems while introducing a problem
        | called Fortran.
        
         | LegionMammal978 wrote:
         | There was already more than enough space for characters with
         | 12-bit systems like the PDP-8. If anything, the convergence on
         | 8-bit words just made it more efficient to use 7-bit codepages
         | like ASCII.
        
           | consp wrote:
            | As the UTF encodings have shown, you can put any encoding
            | in any bit format if need be.
        
       | duskwuff wrote:
       | Non-power-of-2 sizes are awkward from a hardware perspective. A
       | lot of designs for e.g. optimized multipliers depend on the
       | operands being divisible into halves; that doesn't work with
        | units of 9 bits. It's also nice to be able to describe a bit
        | position using a fixed number of bits (0-7 in 3 bits, 0-31 in
        | 5 bits, 0-63 in 6 bits), e.g. to represent a bitwise shift
        | amount, or to select a bit from a byte; this also falls apart
        | with 9, where you'd have to use four bits and have a bunch of
        | invalid values.
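        | 
        | A small C sketch of that mismatch (the 9-bit half is the
        | hypothetical case):
        | 
        |     #include <stdio.h>
        | 
        |     int main(void)
        |     {
        |         unsigned byte = 0xA5; /* an 8-bit byte */
        | 
        |         /* 8-bit byte: a 3-bit selector (0-7) is exact, so
        |            every encoding is valid and "& 7" is a free check. */
        |         for (unsigned sel = 0; sel < 8; sel++)
        |             printf("%u", (byte >> (sel & 7)) & 1u);
        |         printf("\n");
        | 
        |         /* 9-bit byte: the selector needs 4 bits (0-15), but
        |            encodings 9-15 select nothing, so hardware or code
        |            must range-check them. */
        |         for (unsigned sel = 0; sel < 16; sel++)
        |             printf("%2u: %s\n", sel,
        |                    sel < 9 ? "valid" : "invalid");
        |         return 0;
        |     }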
        
         | falcor84 wrote:
         | We just need 3 valued electronics
        
           | percentcer wrote:
           | on, off, and the other thing
        
             | tyingq wrote:
              | Hi-Z is one choice, though I don't know how well that
              | works past a certain speed.
        
           | skissane wrote:
           | The Soviets had ternary computers:
           | https://en.wikipedia.org/wiki/Setun
           | 
           | Then they decided to abandon their indigenous technology in
           | favour of copying Western designs
        
       | smallstepforman wrote:
        | The elephant in the room nobody talks about is silicon cost
        | (wires, gates, multiplexers, AND and OR gates, etc.). With a
        | 4th lane, you may as well go straight to 16 bits per byte.
        
         | pratyahava wrote:
          | This must be the real reason for using 8 bits. But then why
          | did they make 9-bit machines instead of 16-bit ones?
        
           | AlotOfReading wrote:
           | The original meaning of byte was a variable number of bits to
           | represent a character, joined into a larger word that
           | reflected the machine's internal structure. The IBM STRETCH
           | machines could change how many bits per character. This was
           | originally only 1-6 bits [1] because they didn't see much
           | need for 8 bit characters and it would have forced them to
            | choose 64 bit words, when 60 bit words were faster and
           | cheaper. A few months later they had a change of heart after
           | considering how addressing interacted with memory paging [2]
           | and added support for 8 bit bytes for futureproofing and 64
           | bit words, which became dominant with the 360.
           | 
            | [1] https://web.archive.org/web/20170404160423/http://archive.co...
           | 
            | [2] https://web.archive.org/web/20170404161611/http://archive.co...
        
       | alphazard wrote:
        | When you stop to think about it, it really doesn't make sense
        | for memory addresses to map to 8-bit values instead of to bits
        | directly. Storage, memory, and CPUs all deal with larger
        | blocks of bits, which have names like "pages" and "sectors"
        | and "words" depending on the context.
        | 
        | If accessing a bit really means accessing a larger block and
        | throwing away most of it in every case, then the additional
        | byte grouping isn't really helping much.
        
         | SpaceNoodled wrote:
         | It makes sense for the address to map to a value the same width
         | as the data bus.
         | 
          | A one-bit-wide bus ... er, wire, now, I guess ... could work
          | just fine, but now we are extremely limited in the number of
          | operations achievable, as well as in the amount of
          | addressable data: an eight-bit address can now only
          | reference a maximum of 32 bytes of data, which is so small
          | as to be effectively useless.
        
           | alphazard wrote:
           | If each memory address mapped to a CPU word sized value, that
           | would make sense, and that is closer to the reality of
           | instructions reading a word of memory at a time. Instead of
           | using the CPU word size as the smallest addressable value, or
           | the smallest possible value (a bit) as the smallest
           | addressable value, we use a byte.
           | 
           | It's an arbitrary grouping, and worse, it's rarely useful to
           | think in terms of it. If you are optimizing access patterns,
           | then you are thinking in terms of CPU words, cache line
           | sizes, memory pages, and disk sectors. None of those are
           | bytes.
        
         | wmf wrote:
         | Byte addressing is really useful for string handling.
        
       | m463 wrote:
       | We have already solved this problem many times.
       | 
       | In clothing stores, numerical clothes sizes have steadily grown a
       | little larger.
       | 
       | The same make and model car/suv/pickup have steadily grown larger
       | in stance.
       | 
       | I think what is needed is to silently add 9-bit bytes, but don't
       | tell anyone.
       | 
       | also: https://imgs.xkcd.com/comics/standards_2x.png
        
       | nottorp wrote:
       | Of course, if that happens we'll get an article demanding 10-bit
       | bytes.
       | 
       | Got to stop somewhere.
        
       | NelsonMinar wrote:
       | This is ignoring the natural fact that we have 8 bit bytes
       | because programmers have 8 fingers.
        
         | mkl wrote:
         | Most have 10. That's the reason we use base 10 for numbers,
         | even though 12 would make a lot of things easier:
         | https://en.wikipedia.org/wiki/Duodecimal
        
           | alserio wrote:
           | ISO reserves programmers thumbs to LGTM on pull requests
        
         | classichasclass wrote:
         | No, we still have 10. Real programmers think in octal. ;)
        
       | skort wrote:
       | > Thank you to GPT 4o and o4 for discussions, research, and
       | drafting.
       | 
       | Note to the author, put this up front, so I know that you did the
       | bare minimum and I can safely ignore this article for the slop it
       | is.
        
       | js8 wrote:
       | I have thought for fun about a little RISC microcomputer with
       | 6-bit bytes, and 4-byte words (12 MiB of addressable RAM). I
       | think 6-bit bytes would have been great at a point in history,
        | and in something crazy fun like Minecraft. (It's actually an
        | interesting question: if we were to design early
        | microprocessors with today's knowledge of HW methods, things
        | like RISC, caches or pipelining, what would we do
        | differently?)
        
       | zokier wrote:
        | Another interesting thought experiment would be: what if we
        | had gone down to 6-bit bytes instead? Then the common widths
        | would probably be 24 and especially 48 bits (4 and 8 bytes),
        | though 36-bit values might have appeared in some places too.
        | In many ways 6-bit bytes would have had a similar effect to
        | 9-bit bytes; 18 and 36 bits would have been 3 and 6 bytes
        | instead of 2 and 4. Notably, with 6-bit bytes, text encoding
        | would have needed to be multibyte from the get-go, which might
        | have been a significant benefit (12-bit ASCII?).
        
         | wmf wrote:
         | Some early mainframes used 6-bit characters which is why they
         | didn't have lowercase.
        
       | HappyPanacea wrote:
       | Panchekha is on a roll lately, I just read all of his recent
       | posts a week ago. I really liked his AI vs Herbie series.
        
       | sedatk wrote:
        | Our capability to mispredict wouldn't have been any different.
        | We would still have picked the wrong size and gotten stuck
        | with scaling problems.
        
       | labrador wrote:
       | At the end: "Thank you to GPT 4o and o4 for discussions,
       | research, and drafting."
       | 
        | At first I thought that was a nice way to handle credit, but
        | on further thought I wonder if it's necessary because the
        | baseline assumption is now that everyone is using LLMs to help
        | them write.
        
         | xandrius wrote:
          | Yeah, I don't remember ever thanking the spellchecker for
          | anything in the past. Maybe we are kinder to technology
          | nowadays, such that we even credit it?
         | 
         | Thank you to Android for mobile Internet connectivity,
         | browsing, and typing.
        
           | labrador wrote:
            | A counterpoint is that googling "thank you linux" turns up
            | a lot of hits. "thank you linux for opening my eyes to a
            | bigger world" is a typical comment.
        
         | svachalek wrote:
         | As soon as that's my baseline assumption, I think I'm done with
         | the internet. I can get LLM slop on my own.
        
           | labrador wrote:
           | I thought the article was well written. I'm assuming the
           | author did most of the writing because it didn't sound like
           | AI slop. I also assume he meant he uses AI to assist, not as
           | the main driver.
        
       | PaulHoule wrote:
       | I thought the PDP 10 had 6-bit bytes, or at least 6-bit
       | characters
       | 
       | https://en.wikipedia.org/wiki/Six-bit_character_code#DEC_SIX...
       | 
       | Notably the PDP 8 had 12 bit words (2x6) and the PDP 10 had 36
       | bit words (6x6)
       | 
        | Notably, the PDP 10 had addressing modes where it could
        | address a run of bits inside a word, so it was adaptable to
        | working with data from other systems. I've got some notes on a
        | fantasy computer that has 48-bit words (they fit inside a
        | Javascript double!) and a mechanism like the PDP 10's, where
        | you can write "deep pointers" that have a bit offset and a
        | length and can even hang into the next word; with the length
        | set to zero bits this could address UTF-8 character sequences.
        | Think of a world where something like the PDP 10 inspired
        | microcomputers, was used by people who used CJK characters,
        | and had a video system that would make the NeoGeo blush.
        | Crazy, I know.
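        | 
        | A rough sketch of what such a deep pointer might look like in
        | C (every field width here is invented for illustration,
        | loosely modeled on PDP-10 byte pointers):
        | 
        |     #include <stdint.h>
        |     #include <stdio.h>
        | 
        |     #define MASK48 UINT64_C(0xFFFFFFFFFFFF)
        | 
        |     /* Hypothetical deep pointer into 48-bit words: word
        |        index, bit offset from the word's MSB, field length. */
        |     typedef struct {
        |         uint32_t word; /* which 48-bit word */
        |         uint8_t  off;  /* starting bit, 0-47 */
        |         uint8_t  len;  /* bits; may hang into the next word */
        |     } deep_ptr;
        | 
        |     /* Words live in the low 48 bits of a uint64_t. */
        |     static uint64_t load_field(const uint64_t *mem, deep_ptr p)
        |     {
        |         uint64_t out = 0;
        |         for (unsigned i = 0; i < p.len; i++) {
        |             unsigned bit = p.off + i; /* 0-95, two words */
        |             uint64_t w = (bit < 48 ? mem[p.word]
        |                                    : mem[p.word + 1]) & MASK48;
        |             unsigned b = bit < 48 ? bit : bit - 48;
        |             out = (out << 1) | ((w >> (47 - b)) & 1u);
        |         }
        |         return out;
        |     }
        | 
        |     int main(void)
        |     {
        |         uint64_t mem[2] = { UINT64_C(0xABCDEF012345),
        |                             UINT64_C(0x678900000000) };
        |         deep_ptr p = { 0, 40, 16 }; /* straddles the words */
        |         printf("%llx\n", /* prints 4567 */
        |                (unsigned long long)load_field(mem, p));
        |         return 0;
        |     }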
        
       | nayuki wrote:
       | Today, we all agree that "byte" means 8 bits. But half a century
       | ago, this was not so clear and the different hardware
       | manufacturers were battling it out with different sized bytes.
       | 
       | A reminder of that past history is that in Internet standards
       | documents, the word "octet" is used to unambiguously refer to an
       | 8-bit byte. Also, "octet" is the French word for byte, so a
       | "gigaoctet (Go)" is a gigabyte (GB) in English.
       | 
       | (Now, if only we could pin down the sizes of C/C++'s
       | char/short/int/long/long-long integer types...)
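        | 
        | (A quick probe of what the standard actually guarantees; only
        | minimum widths are pinned down, and <stdint.h>'s int32_t and
        | friends give exact widths where they exist:)
        | 
        |     #include <limits.h>
        |     #include <stdio.h>
        | 
        |     int main(void)
        |     {
        |         /* Guaranteed minimums only: char >= 8 bits,
        |            short/int >= 16, long >= 32, long long >= 64. */
        |         printf("char:      %u bits\n", (unsigned)CHAR_BIT);
        |         printf("short:     %u bits\n",
        |                (unsigned)(sizeof(short) * CHAR_BIT));
        |         printf("int:       %u bits\n",
        |                (unsigned)(sizeof(int) * CHAR_BIT));
        |         printf("long:      %u bits\n",
        |                (unsigned)(sizeof(long) * CHAR_BIT));
        |         printf("long long: %u bits\n",
        |                (unsigned)(sizeof(long long) * CHAR_BIT));
        |         return 0;
        |     }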
        
       | Dwedit wrote:
        | Many old 8-bit processors were basically 9-bit processors once
        | you considered the carry flag.
        
       | apt-apt-apt-apt wrote:
       | We may have been stuck with slower, more expensive machines for
       | 40+ years while computers that couldn't fully use the higher
       | limits wasted time and energy.
        
       | brudgers wrote:
       | [delayed]
        
       | kyralis wrote:
       | "We've guessed wrong historically on data sizes, and if we had 9
       | bit bytes those guesses (if otherwise unchanged) would have been
       | less wrong, so 9 bit bytes would be better!" is an extremely
       | tenuous argument. Different decisions would have been made.
       | 
        | We need to get better at estimating required sizes, not try to
        | trick ourselves into accomplishing that by slipping an extra
        | bit into our bytes.
        
       | Nevermark wrote:
       | What if wherever we were using a byte, we just used two bytes?
       | 
        | 8-bit byte software continues to work, while the new 16-bit
        | "doublebyte" software gets all the advantages of the extra 8
        | bits without requiring any changes to CPU/GPU, RAM, SSD, ...
       | 
       | Magic. :)
        
       | LarMachinarum wrote:
        | While none of the article's arguments came even close to being
        | convincing, or to balancing out the disadvantages of a
        | non-power-of-two organization, there is actually one totally
        | different argument/domain where the 9-bits-per-byte idea would
        | hold true: ECC bits in consumer devices (as opposed to just on
        | servers).
        | 
        | The fact that Intel managed to push their shitty market
        | segmentation strategy of only supporting ECC RAM on servers
        | has had rather nefarious and long-lasting consequences.
        
       ___________________________________________________________________
       (page generated 2025-08-06 23:00 UTC)