[HN Gopher] How to pack ternary numbers in 8-bit bytes
       ___________________________________________________________________
        
       How to pack ternary numbers in 8-bit bytes
        
       Author : todsacerdoti
       Score  : 65 points
       Date   : 2024-12-05 15:51 UTC (7 hours ago)
        
 (HTM) web link (compilade.net)
 (TXT) w3m dump (compilade.net)
        
       | sigmoid10 wrote:
       | >I'll be calling a "ternary digit" a "trit", like a "binary
       | digit" is called a "bit".
       | 
       | Shouldn't it be a "tit" if you want to be consistent? In fact,
       | please let's make that happen. The HR execs are gonna go crazy
       | when they hear the devs talking about tits all day.
        
         | jgtrosh wrote:
         | Wait till you learn what the french pronunciation of bit sounds
         | like
        
         | foxyv wrote:
         | They were talking about tits and killing children! I think we
         | have a satanic cult in our IT department.
        
           | koolba wrote:
           | " _And then he said the way to become a demon was to kill his
           | parent!_ "
        
         | zamadatix wrote:
         | It's a joke comment but it made me realize people see the "bit"
         | shortening coming about differently. It could be any one of
         | (and maybe even more, I just had to stop thinking about this
         | before I ended up spending all day on it):
         | 
         | - BInary digiT [BIT]
         | 
          | - BInary digIT [BIT] (i.e. "merge on the shared vowel")
         | 
         | - Binary digIT [BIT]
         | 
         | That's consistent with the common usage of trit if you take it
         | as a "trinary digit":
         | 
         | - TRInary digiT [TRIT]
         | 
         | - TRInary digIT [TRIT]
         | 
         | - TRinary digIT [TRIT]
         | 
         | But if you're one to stick to ternary digit then you can end up
         | with all sorts of unique combos:
         | 
         | - TErnary digiT [TET]
         | 
         | - TErnary digIT [TEIT] (no shared vowel to merge on)
         | 
         | - Ternary digIT [TIT] (the above humor)
         | 
         | Personally, the middle options for the first 2 sections (BIT
         | and TRIT via shared vowel) are what I always mentally jumped
         | to. If I was going to stick with "ternary" terminology I think,
         | from a pure mapping of mental mechanics, I'd have ended up on
         | TET (i.e. when there is no shared vowel assume the vowel was
         | from the first word instead of the 2nd). The problem with that
         | being TET conflicts with base 4, leaving you with either the
         | awkward double vowel or the awkward end vowel pull. Or just
         | calling it a trinary digit where you get the same consistency
         | results as from BIT.
         | 
         | Thank you for coming to my TED talk (the things a hacker thinks
         | about while burning their PTO I suppose?)
        
           | beng-nl wrote:
           | I look forward to your thesis.
        
           | sigmoid10 wrote:
           | Nice analysis, but you missed the most obvious one, which is
           | also what I was deriving from in the above comment:
           | Pronunciation. The "i" in "bit" is pronounced exactly like
           | the second "i" in "digit" and not like the "i" in "binary",
           | which sounds more like the "i" in bite. So the "it" in "bit"
           | has to come from digIT. Consequently, the most logical
           | shortening for "Trinary digIT" must be TIT.
        
             | zamadatix wrote:
             | That's the "Binary digIT [BIT]" line. Do you always go
             | based on the input sounds rather than output reading or
             | just sometimes though? E.g. for special weapons and tactics
             | do you say it like you would in "the cat swat at the toy"
             | or more akin to "sw@" (same first half but then "at" for
             | the second half because the 'a' comes from a short 'a'?).
             | 
             | Of course, just for GIF alone I wouldn't mind your method
             | was the universally obvious way... but now I might as well
             | talk about religion and politics too!
        
               | sigmoid10 wrote:
               | I meant you missed that line for the "trinary digit"
               | cases.
        
           | saghm wrote:
           | > Personally, the middle options for the first 2 sections
           | (BIT and TRIT via shared vowel) are what I always mentally
           | jumped to. If I was going to stick with "ternary" terminology
           | I think, from a pure mapping of mental mechanics, I'd have
           | ended up on TET (i.e. when there is no shared vowel assume
           | the vowel was from the first word instead of the 2nd). The
           | problem with that being TET conflicts with base 4, leaving
           | you with either the awkward double vowel or the awkward end
           | vowel pull. Or just calling it a trinary digit where you get
           | the same consistency results as from BIT.
           | 
           | If I had been trying to come up with something, I probably
           | would have started with an approach of picking something that
           | was as similar-sounding to "bit" as possible so it would be
           | recognizable and then looking for a justification I could use
           | to argue why it was consistent rather than trying to figure
           | out what the origin of "bit" is and then utilizing the same
           | formula. Interestingly, this would have ended up at basically
           | the same place as you, since I would have first thought of
           | "tit", rejected it due to it not seeming likely to get
           | adopted due to the connotations, and then picked "trit" as
           | the next closest thing.
        
           | cperciva wrote:
           | We have more to go on than binary though. Hexadecimal
           | "digits" are universally known as "hexits", not "hexats" --
           | so "it" must be the suffix, not the "t" by itself.
        
         | pjturpeau wrote:
         | I think it should better be "tet".
        
         | NelsonMinar wrote:
         | Or call it a trit, avoid the juvenile joke, and respect decades
         | of computer science history.
            | https://en.wikipedia.org/wiki/Ternary_numeral_system
        
           | fsckboy wrote:
           | but there are no "trits" in his scheme, he's using base 343
           | represented in base 2, from which trits can be extracted
           | through a reasonably laborious conversion.
           | 
           | also, bits have the dual nature of being boolean logic
           | values, and integers (in fact, arithmetic is done by logic).
           | Bits deserve the honor of their name, and pretenders to the
           | throne are hoping for stolen valor.
        
             | hinkley wrote:
             | 243. Which is 3^5.
        
           | alt187 wrote:
           | Tit tit tit.
        
         | jamamp wrote:
         | If you thought random bit flips were bad, wait until you get
         | random tit flips.
        
       | taeric wrote:
       | I'm curious what the benefits of ternary numbers are here? The
       | section on balanced ternary numbers in Knuth's work was super fun
       | to read; but I didn't realize this was relevant nowadays?
       | 
       | At the outset, it seems like it is more about storing ternary
       | weights in binary without having to convert a multidigit ternary
       | number? That is, if you have 20 weights, if this was binary, you
       | could store all 20 weights as a single 20 bit number. I'm
       | assuming you could convert the 3^20 number to binary and do the
       | same? Is that roughly what is happening, here, with most of the
       | effort in how to get out each trit?
       | 
       | Edit: I should add that I was largely misunderstanding parts of
       | the article. I was reading this as akin to BCD, but with ternary.
       | Reading it without that misunderstanding, I can see that they are
       | largely doing what I said here.
        
         | winwang wrote:
         | PAM3 for data transfer: https://www.synopsys.com/designware-
         | ip/technical-bulletin/pa...
        
           | taeric wrote:
           | At the physical layer, this makes a ton of sense. Apologies
           | for completely ignoring that part. I was specifically curious
           | on encoding each "digit" of a ternary number independently in
           | a computer. Which.... yeah, that isn't what this article was
           | even saying.
        
         | Mokosha wrote:
         | I don't know what the benefits of ternary numbers are in this
         | particular case, but a similar technique is used in ASTC[1].
         | 
         | In this case, a fixed-size block of pixels stores one or more
         | pairs of colors using a lower resolution than your usual 8bpp.
         | Then, these color "endpoints" are interpolated using weights
         | that allow you to approximate the original colors. For storing
         | the endpoints, sometimes you can get a better trade-off of
         | resolution/bit if you can pack ternary numbers in the bitstream
         | instead of being restricted to power-of-two boundaries.
         | 
         | [1]: https://developer.arm.com/documentation/102162/0430/The-
         | ASTC...
        
         | thechao wrote:
         | Ternary is a thing; however, there's a lot of times when you're
         | hand-rolling a compression scheme. In those cases, it is common
         | to consider all sorts of nonstandard bases. So, for instance,
         | if you need to encode a pair of numbers, and that pair is in
         | the (rectangular) range: `[0,2]*[0,4]`, then it can be
         | efficiently encoded in a 4b value, using the generalized
         | version of the technique outlined in the article. I know that a
         | lot of GPU block textures rely pretty heavily on products-of-
         | weird-bases, usually 3, 5, and 7. Here's some 'good'
         | combinations (high utilization rates, easy to decode):
         | 
         | 3^5 ~ 8b
         | 
         | 5^3 ~ 7b
         | 
         | 2 * 5 * 7^2 ~ 9b
         | 
         | 2 * 3 * 5 ~ 5b
         | 
         | 11 * 11 ~ 7b
         | 
         | 3 * 5 * 7 ~ 7b
         | 
         | etc.
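          | 
          | The multiply/divide mechanics of such a packing are simple; a
          | minimal Python sketch of the [0,2]*[0,4] example above
          | (helper names are made up):

```python
# Hypothetical sketch: a pair (a, b) with a in [0,2] and b in [0,4]
# has 3 * 5 = 15 combined states, so it fits in a 4-bit value.
def pack_pair(a: int, b: int) -> int:
    assert 0 <= a <= 2 and 0 <= b <= 4
    return a * 5 + b                  # value in [0, 15)

def unpack_pair(v: int) -> tuple[int, int]:
    return v // 5, v % 5              # invert via integer div / mod

# Round-trip over every pair in the rectangular range:
assert all(unpack_pair(pack_pair(a, b)) == (a, b)
           for a in range(3) for b in range(5))
```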
        
           | taeric wrote:
           | Yeah, I was making the mistake of thinking this was storing
           | ternary in a BCD like scheme. I... honestly don't know how I
           | talked myself into that.
        
           | hinkley wrote:
           | Doom also famously redefined a circle as 256deg and reworked
           | all of their trigonometry and polar coordinate translations
           | to match. And so the smallest increment of angle you could do
           | in the game was around 85 arc minutes.
        
         | AlchemistCamp wrote:
         | At the hardware level, I think the answer is that 3 is closer
         | to e than 2 is.
         | 
         | For a software emulation of ternary, I have no idea.
        
         | astrobe_ wrote:
         | true, false, unknown is something you need sometimes.
        
         | av_conk wrote:
         | There's emerging work on using the quantum analogue for trits
         | (qutrits) in quantum computing.
         | 
         | They are more robust to certain errors.
        
         | WorldMaker wrote:
         | It's not directly relevant "nowadays", but it is also fun to
         | note that hardware ternary computers existed, for example:
         | https://en.wikipedia.org/wiki/Setun
         | 
         | Also, I forget which RAM format it was, but I recall that one
         | of the RAM formats still in use is essentially ternary at the
         | physical hardware level and uses the "sign" of the (-1, 0, 1)
         | trit as a "check half-digit" for very fast checksums in
         | hardware (and then the absolute value as the bit), because the
         | physical hardware was going to be three-state anyway and cheap
         | checksums are something RAM will always want.
        
           | amelius wrote:
           | > but it is also fun to note that hardware ternary computers
           | existed
           | 
           | Maybe we should also have a post about storing bits in trits,
           | then.
        
             | taeric wrote:
             | At a base level, this is just how to store numbers and
             | convert between radixes?
             | 
             | That is, the confusion I had in my thread was I was
             | thinking "pack numbers" meant something more than just
             | storing a radix three number. Where the "packing" was just
             | taking advantage of the standard positional values.
             | 
             | For example, consider that the number 123 can be seen as
             | three numbers 0-9 as opposed to one number. How do you
             | extract the three numbers? It can take a few instructions
             | for similar reasons as the ternary case.
             | 
             | Stated differently, if you can see how the numbers 0x12,
             | 18, 022 (octal), and 0b10010 (binary) are all the same
             | number, then you can see how the storage of them is roughly
             | all the same.
             | 
             | Now, for fun, you can also see how getting out the
             | different "digits" of a number can be very different based
             | on the base and limiting yourself to only working with
             | binary shifts.
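              | 
              | As a concrete sketch, five trits packed into one byte is
              | just a base-243 positional number; the article's actual
              | decoder uses a fixed-point multiply trick instead, but
              | the plain divide/modulo version looks like this in Python
              | (function names are made up):

```python
def pack_trits(trits: list[int]) -> int:
    """Five trits -> one byte, read as a base-3 positional number."""
    assert len(trits) == 5 and all(0 <= t < 3 for t in trits)
    v = 0
    for t in reversed(trits):        # trits[0] ends up least significant
        v = v * 3 + t
    return v                         # in [0, 243), fits in 8 bits

def unpack_trits(v: int) -> list[int]:
    """One byte back to five trits via repeated divide / modulo."""
    out = []
    for _ in range(5):
        out.append(v % 3)            # peel off the lowest trit
        v //= 3
    return out

print(pack_trits([2, 0, 1, 1, 2]))   # 200
```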
        
       | jyscao wrote:
       | Does anyone know where the theoretical limit of log(3)/log(2)
       | comes from?
        
         | dmitrygr wrote:
         | Information theory.
         | 
         | https://en.wikipedia.org/wiki/Entropy_(information_theory)
        
         | wging wrote:
         | Another way to write that number is as the log base 2 of 3. The
         | log base 2 of the number of possible states gives you the
         | number of bits of information in a value chosen from that many
         | possible states; here, a trit has 3 possible states.
        
         | taeric wrote:
         | This is the equation for converting radix. You can lookup
         | optimal radix choice for more, I think?
         | 
         | For fun, consider how much space a base 10 number takes in
         | binary. It takes about 4 bits with slack. Specifically, about
         | 3.32 bits. Which is log(10)/log(2).
        
         | badmintonbaseba wrote:
          | To be able to reversibly encode an N-trit ternary number into
          | an M-bit binary number, the number of possible N-trit numbers
          | must be less than or equal to the number of possible M-bit
          | binary numbers. Otherwise two input ternary numbers would map
          | to the same binary number (pigeonhole principle). Which means
          | that 3^N <= 2^M:
          | 
          |     3^N <= 2^M
          |     log(3^N) <= log(2^M)
          |     N*log(3) <= M*log(2)
          |     log(3)/log(2) <= M/N
          | 
          | where M/N is the bits per ternary digit.
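          | 
          | For the article's concrete case (N = 5, M = 8) the bound can
          | be checked numerically; a quick illustrative Python check:

```python
import math

# The article's concrete case: N = 5 trits in M = 8 bits.
assert 3 ** 5 <= 2 ** 8              # 243 <= 256, so the encoding is reversible
floor = math.log(3) / math.log(2)    # theoretical minimum bits per trit
print(f"{floor:.4f} <= {8 / 5:.4f}") # ~1.5850 <= 1.6000
```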
        
         | tonyarkles wrote:
         | The information theory comment is correct but I'm going to
         | elaborate a little bit with some potentially useful info.
         | 
         | For a binary number with N bits, you can represent 2^N values.
         | Easy example: There are 2^8 = 256 possible values that can be
         | represented by an 8-bit value. You can go the other way, too,
         | and ask "how many bits do I need to represent a given value?"
         | by using some math.
         | 
         | 2^N = 256
         | 
         | Log2(2^N) = Log2(256)
         | 
         | N = Log2(256) = Log(256)/Log(2) = 8
         | 
         | And you can use this to figure out how many bits you need to
         | represent an arbitrary number: N = Log2(1000) =
         | Log(1000)/Log(2) = 9.966. That makes intuitive sense because a
         | 10-bit number has 1024 values.
         | 
         | To get to the theoretical limit you're asking about we do the
         | same thing. For an arbitrary number x, how many trits do we
         | need to represent it in trinary and how many bits N do we need
         | to represent it in binary? What is the ratio of trits to bits?
         | 
         | x = 3^M and x = 2^N
         | 
         | Log3(x) = M
         | 
         | Log(x) / Log(3) = M
         | 
         | Log2(x) = N
         | 
         | Log(x) / Log(2) = N
         | 
         | And then we can just take that ratio N/M to figure out how many
         | bits N we need to represent M trits:
         | 
          | R = N/M = [Log(x)/Log(2)] * [Log(3)/Log(x)] = Log(3)/Log(2)
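          | 
          | The "how many bits for a given number of states" step can be
          | written as a one-line helper; a small Python sketch (the
          | helper name is made up):

```python
import math

def bits_needed(n_states: int) -> int:
    # smallest whole number of bits able to index n_states values
    return math.ceil(math.log2(n_states))

print(bits_needed(256))    # 8
print(bits_needed(1000))   # 10, since 2**10 = 1024 >= 1000

# The trit-to-bit ratio from the comment above:
ratio = math.log(3) / math.log(2)    # ~1.585 bits per trit
```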
        
         | nick__m wrote:
         | you should read it as log2(3) because
         | log[?](y)=log[?](y)/log[?](x) and it represents the minimal
         | number of bits (yes it's not an integer) required to represents
         | 3 states
        
         | anon291 wrote:
         | In general, in order to a store a thing with N values you need
         | log_2(N) bits of information. This is actually really simple to
         | see if N is a power of two. If you have four things, how many
         | yes/no questions do you need to exactly determine one of the
         | four things? The answer is two (divide the four into two groups
         | of two, ask which group the object is in. Then, when you have
         | that group, split into two groups of one, and ask the next
         | question)[1].
         | 
         | So, for powers of two, it's obviously log_2(N).
         | 
         | Now we just extrapolate. For three things, on average, you need
         | to ask log_2(3) questions. Sometimes you ask just one,
         | sometimes you must ask two (depending on where you split the
         | group). Either way, the formula is the same.
         | 
         | log(3)/log(2) is just a way of writing log_2(3) (since the
         | value of log3/log2 is the same regardless of which base the log
         | is in).
         | 
         | Note that two isn't special. It's only because we've limited
         | ourselves to 'how many yes/no/questions'. If you wanted to know
         | how many multiple choice questions of 3 responses would you
         | need to exactly determine which object of a set of 4, the
         | answer is log_3(4).
         | 
         | [1] You can think of any byte, word, double word, quad word as
         | simply a path in a binary tree of all numbers in the range
         | 0..2^N-1 (where N is bit width). The path uniquely identifies
         | the number.
        
       | PhilipRoman wrote:
       | If you like this sort of thing (and don't care about performance)
       | check out Arithmetic Coding for the most generalized fractional
       | symbol packing algorithm.
        
         | kthielen wrote:
         | Or ANS if you want an equally general packing to AC but you
         | care about performance. ;P
        
       | sweis wrote:
       | I dug into this once and the "theoretical ideal" of 3 originated
       | in a 1950s paper about vacuum tube computers, which itself
       | immediately backed off and said the choice of base 2 is
       | frequently justified.
       | 
       | https://sweis.medium.com/revisiting-radix-economy-8f642d9f3c...
       | 
        | In this case, the context is {-1, 0, 1} weights in an LLM,
        | which I don't think is being used for any hardware efficiency
        | argument. I think it's just quantizing weights into 3 states.
        
       | jo-han wrote:
       | Do -1, 0 and 1 all occur with the same frequency in typical large
       | language models or would there be a benefit in encoding 0 with 0
       | and -1 and 1 with respectively 10 and 11? (or something even more
       | complex)
       | 
       | Edit: probably not when easy retrieval is needed.
        
       | Spoof7726 wrote:
       | This is exactly a Succinct Data Structure. The solution proposed
       | by the author doesn't allow random access; that is, you can't
       | access a specific A[i] unless you unpack all of them. There is
       | some research (e.g., [1]) allowing random access within
       | reasonable time with slightly more storage, although it is almost
       | entirely of theoretical interest.
       | 
       | [1]: Mihai Patrascu, Succincter, FOCS 2008 best student paper.
        
         | hinkley wrote:
         | If I'm looking for the 3rd position in decimal I ignore
         | everything above and below the third position. Which you can do
         | by taking modulo 1000 and dividing by 100, or dividing by 100
         | and taking modulo 10, which is easier to calculate in the
         | moment. You don't need any fixed point math just normal integer
         | floor on division operations.
         | 
         | So why would it be different in ternary?
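          | 
          | The divide-then-modulo extraction works the same way in any
          | base; a tiny Python sketch (the helper name is made up):

```python
def digit(n: int, i: int, base: int = 10) -> int:
    """The i-th digit of n (i = 0 is least significant) in the given base."""
    return (n // base ** i) % base

print(digit(12345, 2))          # 3: divide by 100, then mod 10
print(digit(200, 2, base=3))    # 1: 200 is 21102 in ternary
```

          | (The catch, as the reply points out, is that a division by a
          | power of 3 costs more in binary than a shift by a power of 2.)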
        
           | Spoof7726 wrote:
            | Because division is not trivial. Even computing x mod 3 for
            | an n-bit integer x is O(n) if x is represented in binary.
        
         | chris_va wrote:
         | Yes, nothing here is really new.
         | 
         | For inference, trits/sec decoded into L1 cache is a lot more
         | important than random access, and is going to be hardware
         | specific (e.g. what SIMD instructions are available). 8bit
         | seems an odd choice given most available instructions, but the
         | Succincter paper's use of mod and individual decoding is
         | (unless I am misreading it) much slower.
        
           | Spoof7726 wrote:
           | I completely agree. I don't think the paper is practical
           | either; I shared it only to show that random access is
           | possible in theory.
        
       | zellyn wrote:
       | Am I doing something wrong? I get 94.9% for 243/256
       | 
        | 243 is pretty good. You have to go up to 3^17 = 129140163 <
        | 2^27 = 134217728 to beat it:
        | 
        |   i                 3^i     next power of 2 bits              slack  ratio
        |   1                   3                   4    2                  1  75.0%*
        |   2                   9                  16    4                  7  56.2%
        |   3                  27                  32    5                  5  84.4%*
        |   4                  81                 128    7                 47  63.3%
        |   5                 243                 256    8                 13  94.9%*
        |   6                 729                1024   10                295  71.2%
        |   7                2187                4096   12               1909  53.4%
        |   8                6561                8192   13               1631  80.1%
        |   9               19683               32768   15              13085  60.1%
        |  10               59049               65536   16               6487  90.1%
        |  11              177147              262144   18              84997  67.6%
        |  12              531441             1048576   20             517135  50.7%
        |  13             1594323             2097152   21             502829  76.0%
        |  14             4782969             8388608   23            3605639  57.0%
        |  15            14348907            16777216   24            2428309  85.5%
        |  16            43046721            67108864   26           24062143  64.1%
        |  17           129140163           134217728   27            5077565  96.2%*
        |  18           387420489           536870912   29          149450423  72.2%
        |  19          1162261467          2147483648   31          985222181  54.1%
        |  20          3486784401          4294967296   32          808182895  81.2%
        |  21         10460353203         17179869184   34         6719515981  60.9%
        |  22         31381059609         34359738368   35         2978678759  91.3%
        |  23         94143178827        137438953472   37        43295774645  68.5%
        |  24        282429536481        549755813888   39       267326277407  51.4%
        |  25        847288609443       1099511627776   40       252223018333  77.1%
        |  26       2541865828329       4398046511104   42      1856180682775  57.8%
        |  27       7625597484987       8796093022208   43      1170495537221  86.7%
        |  28      22876792454961      35184372088832   45     12307579633871  65.0%
        |  29      68630377364883      70368744177664   46      1738366812781  97.5%*
        |  30     205891132094649     281474976710656   48     75583844616007  73.1%
        |  31     617673396283947    1125899906842624   50    508226510558677  54.9%
        |  32    1853020188851841    2251799813685248   51    398779624833407  82.3%
        |  33    5559060566555523    9007199254740992   53   3448138688185469  61.7%
        |  34   16677181699666569   18014398509481984   54   1337216809815415  92.6%
        |  35   50031545098999707   72057594037927936   56  22026048938928229  69.4%
        |  36  150094635296999121  288230376151711744   58 138135740854712623  52.1%
        |  37  450283905890997363  576460752303423488   59 126176846412426125  78.1%
        |  38 1350851717672992089 2305843009213693952   61 954991291540701863  58.6%
        |  39 4052555153018976267 4611686018427387904   62 559130865408411637  87.9%
       | 
        | Generated with this:
        | 
        |     import math
        | 
        |     max_use = 0
        |     for i in range(1, 40):
        |         trinary = 3**i
        |         power = math.ceil(math.log2(trinary))
        |         binary = 2**int(power)
        |         diff = binary - trinary
        |         use = trinary / binary
        |         max_star = ''
        |         if use > max_use:
        |             max_use = use
        |             max_star = '*'
        |         print(f' {i:2} {trinary:19} {binary:19} {power:2} {diff:18} {use:.1%}{max_star}')
        
         | nayuki wrote:
         | > I get 94.9% for 243/256
         | 
         | You're looking at the alphabet utilization rate. Whereas the
         | standard way to measure efficiency (which is the way that the
         | author did it) is to look at entropy.
         | 
         | Here's how I would phrase it. Obviously, if you have a message
         | that has 256 possible states and each one has the same
         | probability of occurring, then the entropy is 8 bits per
         | message.
         | 
          | If you have a message having 243 uniformly distributed states,
          | then its entropy is defined as log base-2 of 243 = log_2(243)
          | = ln(243) / ln(2) ≈ 7.925 bits. But if you used 8 bits to
          | express this message, then you've wasted 0.075 bits (or 75
          | millibits, mb) per message, so your efficiency is 7.925/8 ≈
          | 99.06%.
          | 
          | How would you achieve 7.925 bits per message in practice? By
          | packing more messages together and sharing wasted space. Using
          | some continued fraction magic, let me propose storing 15867 of
          | these base-243 messages together. There are 243^15867
          | possibilities, which is approximately equal to
          | 2^125742.999994713, or just a hair under 125743 bits. So, a
          | block of 125743 bits can store 15867 base-243 messages
          | losslessly, which means each base-243 message uses 125743/15867
          | ≈ 7.925 bits on average.
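          | 
          | The efficiency figures above can be reproduced in a few lines
          | of Python (the 15867-message block size is the
          | continued-fraction choice from the comment):

```python
import math

entropy = math.log2(243)             # ~7.9248 bits per base-243 message
print(f"{entropy / 8:.2%}")          # one byte per message: 99.06%

# Blocking 15867 messages into 125743 bits is lossless because
# 243**15867 is a hair under 2**125743:
assert 243 ** 15867 < 2 ** 125743
print(f"{125743 / 15867:.4f}")       # ~7.9248 bits per message
```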
        
       ___________________________________________________________________
       (page generated 2024-12-05 23:00 UTC)