[HN Gopher] We'd be better off with 9-bit bytes
___________________________________________________________________
We'd be better off with 9-bit bytes
Author : luu
Score : 43 points
Date : 2025-08-06 19:39 UTC (3 hours ago)
(HTM) web link (pavpanchekha.com)
(TXT) w3m dump (pavpanchekha.com)
| FrankWilhoit wrote:
| That's what the PDP-10 community was saying decades ago.
| Keyframe wrote:
| Yeah, but hear me out - 10-bit bytes!
| Waterluvian wrote:
| Uh oh. Looks like humanity has been bitten by the bit byte bug.
| pdpi wrote:
| One of the nice features of 8 bit bytes is being able to break
| them into two hex nibbles. 9 bits breaks that, though you could
| do three octal digits instead I suppose.
|
| 10 bit bytes would give us 5-bit nibbles. That would be 0-9a-v
| digits, which seems a bit extreme.
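 | E.g., a quick C sketch of the two schemes (the byte widths are
 | hypothetical, the rest is plain C):
 |
 |     #include <stdio.h>
 |
 |     int main(void) {
 |         unsigned v9 = 0777;   /* max 9-bit value */
 |         unsigned v10 = 1023;  /* max 10-bit value */
 |         const char *b32 = "0123456789abcdefghijklmnopqrstuv";
 |
 |         /* 9 bits: exactly three octal digits -> "777" */
 |         printf("%03o\n", v9);
 |         /* 10 bits: two 5-bit digits from 0-9a-v -> "vv" */
 |         printf("%c%c\n", b32[v10 >> 5], b32[v10 & 31]);
 |         return 0;
 |     }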
| pratyahava wrote:
 | Crockford base32 would be great. It is 0-9, A-Z minus I, L,
 | O, U.
| pdpi wrote:
| The moment you feel the need to skip letters due to
| propensity for errors should also be the moment you realise
| you're doing something wrong, though. It's kind of fine if
| you want a case insensitive encoding scheme, but it's kind
| of nasty for human-first purposes (e.g. in source code).
| int_19h wrote:
 | Clearly it should be 12 bits, that way you could use either 3
 | hex digits or 4 octal ones.
| monocasa wrote:
| Alternate world where the pdp-8 evolved into our modern
| processors.
| pratyahava wrote:
 | i have been fascinated with the idea of 10-bit bytes for years.
 | i asked chatgpt "below is a piece of an article about 9-bit
 | bytes, create the same but for 10-bit bytes" and gave it a
 | piece; the reply is below.
|
 | IPv4: Everyone knows the story: IPv4 had 32-bit addresses, so
 | about 4 billion total (less due to various reserved subnets).
 | That's not enough in a world with 8 billion humans, and that's
 | led to NATs, more active network middleware, and the impossibly
 | glacial pace of IPv6 roll-out. It's 2025 and Github--Github!--
 | doesn't support IPv6. But in a world with 10-bit bytes IPv4
 | would have had 40-bit addresses, about 1 trillion total. That
 | would be more than enough right now, and likely sufficient well
 | into the 22nd century (in our timeline, exhaustion hit in 2011,
 | when demand was doubling every five years; 256x more addresses
 | gets us to 2065 projecting linearly, and probably later with
 | slowing growth). When exhaustion does set in, it would plausibly
 | happen in a world where address demand has stabilized, and light
 | market forces or reallocation would suffice--no need for NAT
 | spaghetti or painful protocol transitions.
|
 | UNIX time: In our timeline, 32-bit UNIX timestamps run out in
 | 2038, so again all software has to painfully transition to
 | larger, 64-bit structures. Equivalent 40-bit timestamps last
 | until year 34,857, so absolutely no hurry. Negative timestamps
 | would reach back to year -34,818, easily covering everything
 | from the birth of agriculture to the last Ice Age to the time
 | Neanderthals still roamed Europe (and yes, probably long enough
 | to support most science fiction timelines without breaking a
 | sweat).
| relevant_stats wrote:
 | I really don't get why some people like to pollute
 | conversations with LLM answers. Particularly when they are as
 | dumb as your example.
|
| What's the point?
| svachalek wrote:
| Same, we all have access to the LLM too, but I go to forums
| for human thoughts.
| pratyahava wrote:
| ok, agree with your point, i should have got the numbers
| from chatgpt and just put them in the comment with my
| words, i was just lazy to calculate how much profit we
| would have with 10-bit bytes.
| pratyahava wrote:
| umm, i guess most of the article is made by llm, so i did
| not see it as a sin, but for other cases i agree, copy-
| pasting from llm is crap
| iosjunkie wrote:
 | No! No, no, not 10! He said 9. Nobody's comin' up with 10. Who's
 | processing with 10 bits? What's the extra bit for? You're just
 | wastin' electrons.
| titzer wrote:
| 10 bit bytes would be awesome! Think of 20 bit microcontrollers
| and 40 bit workstations. 40 bits makes 5 byte words, that'd be
| rad. Also, CPUs could support "legacy" 32 bit integers and use
| a full 8 bits for tags, which are useful for implementing
| dynamic languages.
| zamadatix wrote:
 | Because we have 8-bit bytes, we are familiar with the famous and
 | obvious cases where multiples of 8 bits ran out, and those cases
 | sound a lot better with 12.5% extra bits. What's harder to see in
 | this kind of thought experiment is what the famously obvious
 | cases of multiples of 9 bits running out would have been. The
 | article starts to think about some of these towards the end, but
 | it's hard, as it's not immediately obvious how many others there
 | might be (or, alternatively, why the total number of issues would
 | be significantly different from what 8-bit bytes had). ChatGPT
 | particularly isn't going to have a ton of training data about the
 | problems with 9-bit multiples running out to hand-feed you.
|
 | It also works in the reverse direction. E.g., knowing that
 | networking headers don't even care about byte alignment for
 | subfields (a VID is 10 bits because it's packed with a few
 | other fields into 2 bytes), I wouldn't be surprised if IPv4 had
 | ended up with 3-byte addresses = 27 bits, instead of 4*9=36,
 | since the designers were more worried about small packet
 | overheads than about matching specific word sizes in certain
 | CPUs.
| marcosdumay wrote:
 | Well, there should be half as many cases of multiples of 9 bits
 | running out as of multiples of 8 bits.
|
| I don't think this is enough of a reason, though.
| MangoToupe wrote:
| Maybe if we worked with 7-bit bytes folks would be more grateful.
| folsom wrote:
 | I don't know - what if we had ended up with a 27-bit address
 | space?
|
| As far as ISPs competing on speeds in the mid 90s, for some
| reason it feels like historical retrospectives are always about
| ten years off.
| jayd16 wrote:
 | I guess nibbles would be 3 bits and you'd have 3 per byte?
| monocasa wrote:
| Ohh, and then we could write the digits in octal.
|
 | Interestingly, the N64 internally had 9-bit bytes; accesses
 | from the CPU just ignored one of the bits. This wasn't a parity
 | bit, but a true extra data bit that was used by the GPU.
| ethan_smith wrote:
| The N64's Reality Display Processor actually used that 9th bit
| as a coverage mask for antialiasing, allowing per-pixel alpha
| blending without additional memory lookups.
| monocasa wrote:
| As well as extra bits in the Z buffer to give it a 15.3 fixed
| point format.
| kazinator wrote:
| 36 bit addresses would be better than 32, but I like being able
| to store a 64 bit double or pointer or integer in a word using
| NaN tagging (subject to the limitation that only 48 bits of the
| pointer are significant).
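 | A minimal sketch of the kind of NaN tagging I mean (assumes
 | IEEE-754 doubles and 48 significant pointer bits; illustrative
 | only - a real implementation also has to distinguish boxed
 | pointers from genuine NaNs):
 |
 |     #include <stdint.h>
 |     #include <string.h>
 |
 |     /* Quiet NaN: exponent all ones plus the quiet bit; the
 |        low 48 bits are free for a pointer payload. */
 |     #define QNAN     0x7FF8000000000000ULL
 |     #define PTR_MASK 0x0000FFFFFFFFFFFFULL
 |
 |     static double box_ptr(void *p) {
 |         uint64_t bits = QNAN | ((uintptr_t)p & PTR_MASK);
 |         double d;
 |         memcpy(&d, &bits, sizeof d);  /* type-pun without UB */
 |         return d;
 |     }
 |
 |     static void *unbox_ptr(double d) {
 |         uint64_t bits;
 |         memcpy(&bits, &d, sizeof bits);
 |         return (void *)(uintptr_t)(bits & PTR_MASK);
 |     }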
| consp wrote:
 | Funny thing is we sort of got 36-bit addressing mainstream with
 | PAE in the 32-bit x86 age.
| kazinator wrote:
| We sort of got 16 + 4 = 20 bit addressing in the 16 bit x86
| age too.
| Retr0id wrote:
 | Aside from memory limits, one of the problems with 32-bit
 | pointers is that ASLR is weakened as a security mitigation -
 | there are simply fewer bits left to randomise. A 36-bit address
 | space doesn't improve on this much.
|
| 64-bit pointers are pretty spacious and have "spare" bits for
| metadata (e.g. PAC, NaN-boxing). 72-bit pointers are even better
| I suppose, but their adoption would've come later.
| kazinator wrote:
| Problem is, not only did we have decades of C code that
| unnecessarily assumed 8/16/32, this all-the-world-is-a-VAX view
| is now baked into newer languages.
|
| C is good for portability to this kind of machine. You can have a
| 36 bit int (for instance), CHAR_BIT is defined as 9 and so on.
|
 | With a little bit of extra reasoning, you can make the code fit
 | different machine sizes so that you use all the available bits.
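 | For instance, a sketch that relies only on what ISO C actually
 | guarantees, so it works just as well when CHAR_BIT is 9:
 |
 |     #include <limits.h>
 |     #include <stdio.h>
 |
 |     int main(void) {
 |         /* Nothing here assumes 8-bit bytes or 32-bit ints. */
 |         printf("bits per byte: %d\n", CHAR_BIT);
 |         printf("bits per int:  %d\n",
 |                (int)(sizeof(int) * CHAR_BIT));
 |
 |         /* Portable all-bits-set for any unsigned width: */
 |         unsigned all_ones = ~0u;
 |         printf("all ones == UINT_MAX: %d\n",
 |                all_ones == UINT_MAX);
 |         return 0;
 |     }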
| pratyahava wrote:
| was that assumption in C code really unnecessary? i suppose it
| made many things much easier.
| kazinator wrote:
| In my experience, highly portable C is cleaner and easier to
| understand and maintain than C which riddles abstract logic
| with dependencies on the specific parameters of the abstract
| machine.
|
| Sometimes the latter is a win, but not if that is your
| default modus operandi.
|
 | Another issue is that machine-specific code that assumes
 | compiler and machine characteristics often has outright
 | undefined behavior, failing to distinguish between "this type
 | is guaranteed to be 32 bits" and "this type is guaranteed to
 | wrap around to a negative value" or "if we shift this value 32
 | bits or more, we get zero, so we are okay" and such.
|
| There are programmers who are not stupid like this, but those
| are the ones who will tend to reach for portable coding.
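 | A concrete sketch of the width-vs-behavior distinction from
 | above:
 |
 |     #include <stdint.h>
 |
 |     uint32_t bad(uint32_t x, unsigned n) {
 |         /* Undefined behavior when n >= 32, even though x is
 |            guaranteed to be exactly 32 bits wide: */
 |         return x >> n;
 |     }
 |
 |     uint32_t good(uint32_t x, unsigned n) {
 |         /* Makes "shifting everything out yields zero"
 |            explicit instead of assuming it: */
 |         return n < 32 ? x >> n : 0;
 |     }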
| 0cf8612b2e1e wrote:
 | Now there's a C++ proposal to define a byte as 8 bits:
|
| https://isocpp.org/files/papers/P3477R1.html
| TruffleLabs wrote:
| PDP-8 has a 12-bit word size
| bawolff wrote:
| > But in a world with 9-bit bytes IPv4 would have had 36-bit
| addresses, about 64 billion total.
|
 | Or we would have had 27-bit addresses and run into problems
 | sooner.
| bigstrat2003 wrote:
| That might've been better, actually. The author makes the
| mistake of "more time would've made this better", but we've had
| plenty of time to transition to IPv6. People simply don't
| because they are lazy and IPv4 works for them. More time
| wouldn't help that, any more than a procrastinating student
| benefits when the deadline for a paper gets extended.
|
 | But on the other hand, if we had run out _sooner_, perhaps
| IPv4 wouldn't be as entrenched and people would've been more
| willing to switch. Maybe not, of course, but it's at least a
| possibility.
| dmitrygr wrote:
| > simply don't because they are lazy and IPv4 works for them
|
| Or because IPv6 was not a simple "add more bits to address"
| but a much larger in-places-unwanted change.
| zamadatix wrote:
| Most of the "unwanted" things in IPv6 aren't actually
| required by IPv6. Temporary addresses, most of the feature
| complexity in NDP, SLAAC, link-local addresses for anything
| but the underlying stuff that happens automatically, "no
| NAT, you must use PD", probably more I'm forgetting.
| Another large portion is things related to trying to be
| dual stack like concurrent resolutions/requests, various
| forms of tunneling, NAT64, and others.
|
| They're almost always deployed though because people end up
| liking the ideas. They don't want to configure VRRP for
| gateway redundancy, they don't want a DHCP server for
| clients to be able to connect, they want to be able to use
| link-local addresses for certain application use cases,
| they want the random addresses for increased privacy, they
 | want to dual stack for compatibility, etc. The people who don't
 | care see others deploying all of this and think "oh damn,
 | that's nuts", not realizing you can still deploy it almost
 | exactly the same as IPv4 with longer addresses, if that's all
 | you want.
| JoshTriplett wrote:
| > They're almost always deployed though because people
| end up liking the ideas.
|
| Or they're deployed because it's difficult to use IPv6
| without them, even if you want to. For instance, it's
| quite difficult to use Linux with IPv6 in a static
| configuration _without_ any form of autodiscovery of
 | addresses or routes; I've yet to achieve such a
| configuration. With IPv4, I can bring up the network in a
 | tiny fraction of a second and have it _work_; with IPv6,
| the only successful configuration I've found takes many
| seconds to decide it has a working network, and sometimes
| flakes out entirely.
|
| Challenge: boot up an AWS instance, configure networking
| using your preferred IP version, successfully make a
| connection to an external server using that version, and
| get a packet back, in under 500ms from the time your
| instance gets control, succeeding 50 times out of 50.
| Very doable with IPv4; I have yet to achieve that with
| IPv6.
| bigstrat2003 wrote:
| I've run IPv6 on both corporate and home networks. Whether
| or not the additions were merited, they are not a
| formidable challenge for any reasonably-skilled admin. So
| no, I don't think that the reason you gave suffices as an
| excuse for why so many still refuse to deploy IPv6.
| ay wrote:
| The first transition was _to_ IPv4, and it was reportedly (I
| wasn't in the workforce yet :-) relatively easy...
|
| https://www.internetsociety.org/blog/2016/09/final-report-on...
|
| Some more interesting history reading here:
|
| https://datatracker.ietf.org/doc/html/rfc33
| SlowTao wrote:
 | Can you imagine the argument for 8-bit bytes if we still lived
 | in the original 6-bit world of the 1950s?
 |
 | A big part of the move to 8-bit systems was that it allowed
 | expanded text systems with letter casing, punctuation, and
 | various ASCII stuff.
 |
 | We could move to the world of 36-bit Fortran if really needed
 | and solve all these problems, while introducing a problem
 | called Fortran.
| LegionMammal978 wrote:
| There was already more than enough space for characters with
| 12-bit systems like the PDP-8. If anything, the convergence on
| 8-bit words just made it more efficient to use 7-bit codepages
| like ASCII.
| consp wrote:
 | As the UTF encodings have shown, you can put any encoding in
 | any bit format if need be.
| duskwuff wrote:
 | Non-power-of-2 sizes are awkward from a hardware perspective. A
 | lot of designs for e.g. optimized multipliers depend on the
 | operands being divisible into halves; that doesn't work with
 | units of 9 bits. It's also nice to be able to describe a bit
 | position using a fixed number of bits (e.g. 0-7 in 3 bits, 0-31
 | in 5 bits, 0-63 in 6 bits), e.g. to represent a shift amount or
 | to select a bit from a byte; this also falls apart with 9,
 | where you'd have to use four bits and accept a bunch of invalid
 | values.
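 | To make the shift-count point concrete (a sketch):
 |
 |     #include <stdint.h>
 |
 |     /* With 8-bit units, every valid bit index 0-7 fits
 |        exactly in 3 bits, so hardware (and C code) can simply
 |        mask the count - no invalid values are possible: */
 |     uint8_t get_bit(uint8_t x, unsigned n) {
 |         return (uint8_t)((x >> (n & 7)) & 1);
 |     }
 |
 |     /* A 9-bit unit would need a 4-bit count and a check for
 |        the seven invalid encodings (9 through 15). */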
| falcor84 wrote:
 | We just need 3-valued electronics
| percentcer wrote:
| on, off, and the other thing
| tyingq wrote:
| hi-z is one choice. Though I don't know how well that does
| past a certain speed.
| skissane wrote:
| The Soviets had ternary computers:
| https://en.wikipedia.org/wiki/Setun
|
 | Then they decided to abandon their indigenous technology in
 | favour of copying Western designs.
| smallstepforman wrote:
 | The elephant in the room nobody talks about is silicon cost
 | (wires, gates, multiplexers, AND and OR gates, etc.). With a
 | 4th lane, you may as well go straight to 16 bits to a byte.
| pratyahava wrote:
 | This must be the real reason for using 8 bits. But then why did
 | they make 9-bit machines instead of 16-bit?
| AlotOfReading wrote:
 | The original meaning of byte was a variable number of bits
 | representing a character, packed into a larger word that
 | reflected the machine's internal structure. The IBM STRETCH
 | machines could change how many bits per character. This was
 | originally only 1-6 bits [1] because they didn't see much need
 | for 8-bit characters, and it would have forced them to choose
 | 64-bit words when 60-bit words were faster and cheaper. A few
 | months later they had a change of heart, after considering how
 | addressing interacted with memory paging [2], and added support
 | for 8-bit bytes (for futureproofing) and 64-bit words, which
 | became dominant with the 360.
|
 | [1] https://web.archive.org/web/20170404160423/http://archive.co...
 |
 | [2] https://web.archive.org/web/20170404161611/http://archive.co...
| alphazard wrote:
| When you stop to think about it, it really doesn't make sense to
| have memory addresses map to 8-bit values, instead of bits
| directly. Storage, memory, and CPUs all deal with larger blocks
| of bits, which have names like "pages" and "sectors" and "words"
| depending on the context.
|
| If accessing a bit is really accessing a larger block and
| throwing away most of it in every case, then the additional byte
| grouping isn't really helping much.
| SpaceNoodled wrote:
| It makes sense for the address to map to a value the same width
| as the data bus.
|
 | A one-bit wide bus ... er, wire, now, I guess ... could work
 | just fine, but now we are extremely limited in the number of
 | operations achievable, as well as in the amount of addressable
 | data: an eight-bit address can now only reference a maximum of
 | 32 bytes of data, which is so small as to be effectively
 | useless.
| alphazard wrote:
| If each memory address mapped to a CPU word sized value, that
| would make sense, and that is closer to the reality of
| instructions reading a word of memory at a time. Instead of
| using the CPU word size as the smallest addressable value, or
| the smallest possible value (a bit) as the smallest
| addressable value, we use a byte.
|
| It's an arbitrary grouping, and worse, it's rarely useful to
| think in terms of it. If you are optimizing access patterns,
| then you are thinking in terms of CPU words, cache line
| sizes, memory pages, and disk sectors. None of those are
| bytes.
| wmf wrote:
| Byte addressing is really useful for string handling.
| m463 wrote:
| We have already solved this problem many times.
|
| In clothing stores, numerical clothes sizes have steadily grown a
| little larger.
|
 | The same make and model of car/SUV/pickup has steadily grown
 | larger in stance.
|
| I think what is needed is to silently add 9-bit bytes, but don't
| tell anyone.
|
| also: https://imgs.xkcd.com/comics/standards_2x.png
| nottorp wrote:
| Of course, if that happens we'll get an article demanding 10-bit
| bytes.
|
| Got to stop somewhere.
| NelsonMinar wrote:
| This is ignoring the natural fact that we have 8 bit bytes
| because programmers have 8 fingers.
| mkl wrote:
| Most have 10. That's the reason we use base 10 for numbers,
| even though 12 would make a lot of things easier:
| https://en.wikipedia.org/wiki/Duodecimal
| alserio wrote:
 | ISO reserves programmers' thumbs to LGTM pull requests
| classichasclass wrote:
| No, we still have 10. Real programmers think in octal. ;)
| skort wrote:
| > Thank you to GPT 4o and o4 for discussions, research, and
| drafting.
|
| Note to the author, put this up front, so I know that you did the
| bare minimum and I can safely ignore this article for the slop it
| is.
| js8 wrote:
 | I have thought for fun about a little RISC microcomputer with
 | 6-bit bytes and 4-byte words (12 MiB of addressable RAM). I
 | think 6-bit bytes would have been great at some point in
 | history, and in something crazy fun like Minecraft. (It's
 | actually an interesting question: if we were to design early
 | microprocessors with today's knowledge of HW methods - things
 | like RISC, caches, or pipelining - what would we do
 | differently?)
| zokier wrote:
 | Another interesting thought experiment would be: what if we had
 | gone down to 6-bit bytes instead? Then the common values would
 | probably be 24 and especially 48 bits (4 and 8 bytes), but
 | 36-bit values might have appeared in some places too. In many
 | ways 6-bit bytes would have had a similar effect to 9-bit
 | bytes; 18 and 36 bits would have been 3 and 6 bytes instead of
 | 2 and 4 bytes. Notably, with 6-bit bytes text encoding would
 | have needed to be multibyte from the get-go, which might have
 | been a significant benefit (12-bit ASCII?)
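 | A sketch of what that might have looked like (an entirely
 | hypothetical fixed-width "12-bit ASCII": every character is
 | exactly two 6-bit bytes, high unit first, 4096 code points
 | from day one):
 |
 |     #include <stdint.h>
 |
 |     void put12(uint16_t cp, uint8_t out[2]) {
 |         out[0] = (cp >> 6) & 0x3F;  /* high 6 bits */
 |         out[1] = cp & 0x3F;         /* low 6 bits */
 |     }
 |
 |     uint16_t get12(const uint8_t in[2]) {
 |         return (uint16_t)(((in[0] & 0x3F) << 6)
 |                           | (in[1] & 0x3F));
 |     }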
| wmf wrote:
| Some early mainframes used 6-bit characters which is why they
| didn't have lowercase.
| HappyPanacea wrote:
| Panchekha is on a roll lately, I just read all of his recent
| posts a week ago. I really liked his AI vs Herbie series.
| sedatk wrote:
 | Our capability to mispredict wouldn't have been any different.
 | We would still have picked the wrong size and gotten stuck with
 | scaling problems.
| labrador wrote:
| At the end: "Thank you to GPT 4o and o4 for discussions,
| research, and drafting."
|
 | At first I thought that was a nice way to handle credit, but on
 | further thought I wonder if this is necessary because the
 | baseline assumption is that everyone is using LLMs to help them
 | write.
| xandrius wrote:
 | Yeah, I don't remember ever thanking the spellchecker for
 | anything in the past. Maybe we are kinder to technology
 | nowadays, such that we even credit it?
|
| Thank you to Android for mobile Internet connectivity,
| browsing, and typing.
| labrador wrote:
| A counter point is that googling "thank you linux" turns up a
| lot of hits. "thank you linux for opening my eyes to a bigger
| world" is a typical comment.
| svachalek wrote:
| As soon as that's my baseline assumption, I think I'm done with
| the internet. I can get LLM slop on my own.
| labrador wrote:
| I thought the article was well written. I'm assuming the
| author did most of the writing because it didn't sound like
| AI slop. I also assume he meant he uses AI to assist, not as
| the main driver.
| PaulHoule wrote:
| I thought the PDP 10 had 6-bit bytes, or at least 6-bit
| characters
|
| https://en.wikipedia.org/wiki/Six-bit_character_code#DEC_SIX...
|
| Notably the PDP 8 had 12 bit words (2x6) and the PDP 10 had 36
| bit words (6x6)
|
 | Notably the PDP 10 had addressing modes where it could address
 | a run of bits inside a word, so it was adaptable to working
 | with data from other systems. I've got some notes on a fantasy
 | computer that has 48-bit words (they fit inside a Javascript
 | double!) and a mechanism like the PDP 10's where you can write
 | "deep pointers" that have a bit offset and a length, and can
 | even hang into the next word; with the length set to zero bits
 | this could address UTF-8 character sequences. Think of a world
 | where something like the PDP 10 inspired microcomputers, was
 | used by people who used CJK characters, and had a video system
 | that would make the NeoGeo blush. Crazy, I know.
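 | Roughly what those deep pointers could look like (a sketch;
 | the names and field widths are made up, and it leans on the
 | GCC/Clang __int128 extension):
 |
 |     #include <stdint.h>
 |
 |     #define MASK48 ((1ULL << 48) - 1)
 |
 |     /* Word address plus a bit offset and bit length, so a
 |        field can straddle a 48-bit word boundary. */
 |     typedef struct {
 |         uint32_t word;  /* word address */
 |         uint8_t  bit;   /* 0-47, counted from the MSB end */
 |         uint8_t  len;   /* field width in bits, <= 48 */
 |     } deep_ptr;
 |
 |     uint64_t deep_load(const uint64_t *mem, deep_ptr p) {
 |         /* Join this word and the next into a 96-bit window
 |            (assumes a successor word exists), then extract
 |            the field. */
 |         unsigned __int128 w =
 |             ((unsigned __int128)(mem[p.word] & MASK48) << 48)
 |             | (mem[p.word + 1] & MASK48);
 |         unsigned shift = 96 - p.bit - p.len;
 |         return (uint64_t)((w >> shift)
 |                           & ((1ULL << p.len) - 1));
 |     }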
| nayuki wrote:
| Today, we all agree that "byte" means 8 bits. But half a century
| ago, this was not so clear and the different hardware
| manufacturers were battling it out with different sized bytes.
|
| A reminder of that past history is that in Internet standards
| documents, the word "octet" is used to unambiguously refer to an
| 8-bit byte. Also, "octet" is the French word for byte, so a
| "gigaoctet (Go)" is a gigabyte (GB) in English.
|
| (Now, if only we could pin down the sizes of C/C++'s
| char/short/int/long/long-long integer types...)
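 | (C99 did give us a partial fix, for what it's worth - a sketch
 | of the distinction:)
 |
 |     #include <stdint.h>
 |
 |     /* Exact-width types are optional: a machine that can't
 |        supply them just doesn't define them, e.g. a 9-bit-byte
 |        machine has no uint8_t. */
 |     int32_t exactly_32;
 |
 |     /* The core language only guarantees minimum widths:
 |        char >= 8 bits, short/int >= 16, long >= 32,
 |        long long >= 64. The least/fast types always exist: */
 |     int_least32_t at_least_32;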
| Dwedit wrote:
 | Many old 8-bit processors were basically 9-bit processors once
 | you considered the carry flag.
| apt-apt-apt-apt wrote:
| We may have been stuck with slower, more expensive machines for
| 40+ years while computers that couldn't fully use the higher
| limits wasted time and energy.
| brudgers wrote:
| [delayed]
| kyralis wrote:
| "We've guessed wrong historically on data sizes, and if we had 9
| bit bytes those guesses (if otherwise unchanged) would have been
| less wrong, so 9 bit bytes would be better!" is an extremely
| tenuous argument. Different decisions would have been made.
|
 | We need to be better at estimating required sizes, not trying
 | to trick ourselves into accomplishing that by slipping an extra
 | bit into our bytes.
| Nevermark wrote:
| What if wherever we were using a byte, we just used two bytes?
|
 | 8-bit byte software continues to work, while the new 16-bit
 | "doublebyte" software gets all the advantages of the extra 8
 | bits without requiring any changes to CPU/GPU, RAM, SSD, ...
|
| Magic. :)
| LarMachinarum wrote:
 | While none of the arguments in the article come even close to
 | being convincing, or to balancing out the disadvantages of a
 | non-power-of-two orientation, there actually is one totally
 | different argument/domain where the 9-bits-per-byte thing would
 | hold true: ECC bits in consumer devices (as opposed to just on
 | servers).
 |
 | The fact that Intel managed to push their shitty market
 | segmentation strategy of only supporting ECC RAM on servers has
 | rather nefarious and long-lasting consequences.
___________________________________________________________________
(page generated 2025-08-06 23:00 UTC)