[HN Gopher] How to pack ternary numbers in 8-bit bytes
___________________________________________________________________
How to pack ternary numbers in 8-bit bytes
Author : todsacerdoti
Score : 65 points
Date : 2024-12-05 15:51 UTC (7 hours ago)
(HTM) web link (compilade.net)
(TXT) w3m dump (compilade.net)
| sigmoid10 wrote:
| >I'll be calling a "ternary digit" a "trit", like a "binary
| digit" is called a "bit".
|
| Shouldn't it be a "tit" if you want to be consistent? In fact,
| please let's make that happen. The HR execs are gonna go crazy
| when they hear the devs talking about tits all day.
| jgtrosh wrote:
 | Wait till you learn what the French pronunciation of bit sounds
| like
| foxyv wrote:
| They were talking about tits and killing children! I think we
| have a satanic cult in our IT department.
| koolba wrote:
| " _And then he said the way to become a demon was to kill his
| parent!_ "
| zamadatix wrote:
| It's a joke comment but it made me realize people see the "bit"
| shortening coming about differently. It could be any one of
| (and maybe even more, I just had to stop thinking about this
| before I ended up spending all day on it):
|
| - BInary digiT [BIT]
|
 | - BInary digIT [BIT] (i.e. "merge on the shared vowel")
|
| - Binary digIT [BIT]
|
| That's consistent with the common usage of trit if you take it
| as a "trinary digit":
|
| - TRInary digiT [TRIT]
|
| - TRInary digIT [TRIT]
|
| - TRinary digIT [TRIT]
|
| But if you're one to stick to ternary digit then you can end up
| with all sorts of unique combos:
|
| - TErnary digiT [TET]
|
| - TErnary digIT [TEIT] (no shared vowel to merge on)
|
| - Ternary digIT [TIT] (the above humor)
|
| Personally, the middle options for the first 2 sections (BIT
| and TRIT via shared vowel) are what I always mentally jumped
| to. If I was going to stick with "ternary" terminology I think,
| from a pure mapping of mental mechanics, I'd have ended up on
| TET (i.e. when there is no shared vowel assume the vowel was
| from the first word instead of the 2nd). The problem with that
| being TET conflicts with base 4, leaving you with either the
| awkward double vowel or the awkward end vowel pull. Or just
| calling it a trinary digit where you get the same consistency
| results as from BIT.
|
| Thank you for coming to my TED talk (the things a hacker thinks
| about while burning their PTO I suppose?)
| beng-nl wrote:
| I look forward to your thesis.
| sigmoid10 wrote:
| Nice analysis, but you missed the most obvious one, which is
| also what I was deriving from in the above comment:
| Pronunciation. The "i" in "bit" is pronounced exactly like
| the second "i" in "digit" and not like the "i" in "binary",
| which sounds more like the "i" in bite. So the "it" in "bit"
| has to come from digIT. Consequently, the most logical
| shortening for "Trinary digIT" must be TIT.
| zamadatix wrote:
| That's the "Binary digIT [BIT]" line. Do you always go
| based on the input sounds rather than output reading or
| just sometimes though? E.g. for special weapons and tactics
 | do you say it like you would in "the cat swats at the toy"
| or more akin to "sw@" (same first half but then "at" for
| the second half because the 'a' comes from a short 'a'?).
|
 | Of course, just for GIF alone I wouldn't mind if your method
 | were the universally obvious way... but now I might as well
| talk about religion and politics too!
| sigmoid10 wrote:
| I meant you missed that line for the "trinary digit"
| cases.
| saghm wrote:
| > Personally, the middle options for the first 2 sections
| (BIT and TRIT via shared vowel) are what I always mentally
| jumped to. If I was going to stick with "ternary" terminology
| I think, from a pure mapping of mental mechanics, I'd have
| ended up on TET (i.e. when there is no shared vowel assume
| the vowel was from the first word instead of the 2nd). The
| problem with that being TET conflicts with base 4, leaving
| you with either the awkward double vowel or the awkward end
| vowel pull. Or just calling it a trinary digit where you get
| the same consistency results as from BIT.
|
| If I had been trying to come up with something, I probably
| would have started with an approach of picking something that
| was as similar-sounding to "bit" as possible so it would be
| recognizable and then looking for a justification I could use
| to argue why it was consistent rather than trying to figure
| out what the origin of "bit" is and then utilizing the same
| formula. Interestingly, this would have ended up at basically
| the same place as you, since I would have first thought of
| "tit", rejected it due to it not seeming likely to get
| adopted due to the connotations, and then picked "trit" as
| the next closest thing.
| cperciva wrote:
| We have more to go on than binary though. Hexadecimal
| "digits" are universally known as "hexits", not "hexats" --
| so "it" must be the suffix, not the "t" by itself.
| pjturpeau wrote:
 | I think it should rather be "tet".
| NelsonMinar wrote:
| Or call it a trit, avoid the juvenile joke, and respect decades
| of computer science history.
 | https://en.wikipedia.org/wiki/Ternary_numeral_system
| fsckboy wrote:
| but there are no "trits" in his scheme, he's using base 343
| represented in base 2, from which trits can be extracted
| through a reasonably laborious conversion.
|
| also, bits have the dual nature of being boolean logic
| values, and integers (in fact, arithmetic is done by logic).
| Bits deserve the honor of their name, and pretenders to the
| throne are hoping for stolen valor.
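A minimal sketch of the conversion fsckboy describes, assuming the article's 5-trits-per-byte base-243 packing (function names here are illustrative):

    # Pack 5 trits (each 0..2) into one byte: 3^5 = 243 <= 256.
    def pack_trits(trits):
        assert len(trits) == 5 and all(0 <= t <= 2 for t in trits)
        byte = 0
        for t in reversed(trits):
            byte = byte * 3 + t
        return byte

    # Unpack by repeated division/modulo -- the "laborious
    # conversion" mentioned above.
    def unpack_trits(byte):
        trits = []
        for _ in range(5):
            trits.append(byte % 3)
            byte //= 3
        return trits

    assert unpack_trits(pack_trits([2, 0, 1, 1, 2])) == [2, 0, 1, 1, 2]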
| hinkley wrote:
| 243. Which is 3^5.
| alt187 wrote:
| Tit tit tit.
| jamamp wrote:
| If you thought random bit flips were bad, wait until you get
| random tit flips.
| taeric wrote:
| I'm curious what the benefits of ternary numbers are here? The
| section on balanced ternary numbers in Knuth's work was super fun
| to read; but I didn't realize this was relevant nowadays?
|
| At the outset, it seems like it is more about storing ternary
| weights in binary without having to convert a multidigit ternary
| number? That is, if you have 20 weights, if this was binary, you
| could store all 20 weights as a single 20 bit number. I'm
| assuming you could convert the 3^20 number to binary and do the
| same? Is that roughly what is happening, here, with most of the
| effort in how to get out each trit?
|
| Edit: I should add that I was largely misunderstanding parts of
| the article. I was reading this as akin to BCD, but with ternary.
| Reading it without that misunderstanding, I can see that they are
| largely doing what I said here.
| winwang wrote:
| PAM3 for data transfer: https://www.synopsys.com/designware-
| ip/technical-bulletin/pa...
| taeric wrote:
| At the physical layer, this makes a ton of sense. Apologies
| for completely ignoring that part. I was specifically curious
| on encoding each "digit" of a ternary number independently in
| a computer. Which.... yeah, that isn't what this article was
| even saying.
| Mokosha wrote:
| I don't know what the benefits of ternary numbers are in this
| particular case, but a similar technique is used in ASTC[1].
|
| In this case, a fixed-size block of pixels stores one or more
| pairs of colors using a lower resolution than your usual 8bpp.
| Then, these color "endpoints" are interpolated using weights
| that allow you to approximate the original colors. For storing
| the endpoints, sometimes you can get a better trade-off of
| resolution/bit if you can pack ternary numbers in the bitstream
| instead of being restricted to power-of-two boundaries.
|
| [1]: https://developer.arm.com/documentation/102162/0430/The-
| ASTC...
| thechao wrote:
 | Ternary is a thing; however, there are a lot of times when you're
| hand-rolling a compression scheme. In those cases, it is common
| to consider all sorts of nonstandard bases. So, for instance,
| if you need to encode a pair of numbers, and that pair is in
| the (rectangular) range: `[0,2]*[0,4]`, then it can be
| efficiently encoded in a 4b value, using the generalized
| version of the technique outlined in the article. I know that a
| lot of GPU block textures rely pretty heavily on products-of-
| weird-bases, usually 3, 5, and 7. Here's some 'good'
| combinations (high utilization rates, easy to decode):
|
| 3^5 ~ 8b
|
| 5^3 ~ 7b
|
| 2 * 5 * 7^2 ~ 9b
|
| 2 * 3 * 5 ~ 5b
|
| 11 * 11 ~ 7b
|
| 3 * 5 * 7 ~ 7b
|
| etc.
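A sketch of the generalized mixed-radix packing thechao describes, using the `[0,2]*[0,4]` example (helper names are illustrative):

    # Encode digits d_i with bases b_i into one integer; any mixed radix works.
    def encode_mixed(digits, bases):
        value = 0
        for d, b in zip(reversed(digits), reversed(bases)):
            assert 0 <= d < b
            value = value * b + d
        return value

    def decode_mixed(value, bases):
        digits = []
        for b in bases:
            digits.append(value % b)
            value //= b
        return digits

    # A pair in [0,2]*[0,4]: 3 * 5 = 15 states fit in a 4-bit value.
    v = encode_mixed([2, 4], [3, 5])
    assert v < 16 and decode_mixed(v, [3, 5]) == [2, 4]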
| taeric wrote:
| Yeah, I was making the mistake of thinking this was storing
| ternary in a BCD like scheme. I... honestly don't know how I
| talked myself into that.
| hinkley wrote:
| Doom also famously redefined a circle as 256deg and reworked
| all of their trigonometry and polar coordinate translations
| to match. And so the smallest increment of angle you could do
| in the game was around 85 arc minutes.
| AlchemistCamp wrote:
| At the hardware level, I think the answer is that 3 is closer
| to e than 2 is.
|
| For a software emulation of ternary, I have no idea.
| astrobe_ wrote:
| true, false, unknown is something you need sometimes.
| av_conk wrote:
| There's emerging work on using the quantum analogue for trits
| (qutrits) in quantum computing.
|
| They are more robust to certain errors.
| WorldMaker wrote:
| It's not directly relevant "nowadays", but it is also fun to
| note that hardware ternary computers existed, for example:
| https://en.wikipedia.org/wiki/Setun
|
| Also, I forget which RAM format it was, but I recall that one
| of the RAM formats still in use is essentially ternary at the
| physical hardware level and uses the "sign" of the (-1, 0, 1)
| trit as a "check half-digit" for very fast checksums in
| hardware (and then the absolute value as the bit), because the
| physical hardware was going to be three-state anyway and cheap
| checksums are something RAM will always want.
| amelius wrote:
| > but it is also fun to note that hardware ternary computers
| existed
|
| Maybe we should also have a post about storing bits in trits,
| then.
| taeric wrote:
| At a base level, this is just how to store numbers and
| convert between radixes?
|
| That is, the confusion I had in my thread was I was
| thinking "pack numbers" meant something more than just
| storing a radix three number. Where the "packing" was just
| taking advantage of the standard positional values.
|
| For example, consider that the number 123 can be seen as
| three numbers 0-9 as opposed to one number. How do you
| extract the three numbers? It can take a few instructions
| for similar reasons as the ternary case.
|
| Stated differently, if you can see how the numbers 0x12,
| 18, 022 (octal), and 0b10010 (binary) are all the same
| number, then you can see how the storage of them is roughly
| all the same.
|
| Now, for fun, you can also see how getting out the
| different "digits" of a number can be very different based
| on the base and limiting yourself to only working with
| binary shifts.
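The point taeric makes about radixes and digit extraction, as a quick sketch:

    # 0x12, 18, 0o22, and 0b10010 are all the same stored integer.
    assert 0x12 == 18 == 0o22 == 0b10010

    n = 123
    # Extracting base-10 "digits" needs division and modulo...
    assert [(n // 10**i) % 10 for i in range(3)] == [3, 2, 1]
    # ...while base-2 digits fall out of shifts and masks alone.
    assert [(n >> i) & 1 for i in range(7)] == [1, 1, 0, 1, 1, 1, 1]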
| jyscao wrote:
| Does anyone know where the theoretical limit of log(3)/log(2)
| comes from?
| dmitrygr wrote:
| Information theory.
|
| https://en.wikipedia.org/wiki/Entropy_(information_theory)
| wging wrote:
| Another way to write that number is as the log base 2 of 3. The
| log base 2 of the number of possible states gives you the
| number of bits of information in a value chosen from that many
| possible states; here, a trit has 3 possible states.
| taeric wrote:
 | This is the equation for converting radix. You can look up
| optimal radix choice for more, I think?
|
 | For fun, consider how much space a base-10 digit takes in
| binary. It takes about 4 bits with slack. Specifically, about
| 3.32 bits. Which is log(10)/log(2).
| badmintonbaseba wrote:
| To be able to reversibly encode an N-trit ternary number into
| an M-bit binary number, the number of possible N-trit numbers
 | must be less than or equal to the number of possible M-bit
 | binary numbers. Otherwise there would be two input ternary
 | numbers that would map to the same binary number (pigeonhole
 | principle). Which means that:
 |
 |     3^N <= 2^M
 |     log(3^N) <= log(2^M)
 |     N*log(3) <= M*log(2)
 |     log(3)/log(2) <= M/N
|
| where M/N is the bits per ternary digit.
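For the article's concrete case (N = 5 trits in M = 8 bits), a quick check of that bound:

    import math

    # 3^5 = 243 <= 256 = 2^8, so 5 trits pack reversibly into one byte,
    # and 8/5 = 1.6 bits/trit stays above the log(3)/log(2) ~ 1.585 floor.
    assert 3**5 <= 2**8
    assert 8 / 5 >= math.log(3) / math.log(2)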
| tonyarkles wrote:
| The information theory comment is correct but I'm going to
| elaborate a little bit with some potentially useful info.
|
| For a binary number with N bits, you can represent 2^N values.
| Easy example: There are 2^8 = 256 possible values that can be
| represented by an 8-bit value. You can go the other way, too,
| and ask "how many bits do I need to represent a given value?"
| by using some math.
|
| 2^N = 256
|
| Log2(2^N) = Log2(256)
|
| N = Log2(256) = Log(256)/Log(2) = 8
|
| And you can use this to figure out how many bits you need to
| represent an arbitrary number: N = Log2(1000) =
| Log(1000)/Log(2) = 9.966. That makes intuitive sense because a
| 10-bit number has 1024 values.
|
| To get to the theoretical limit you're asking about we do the
| same thing. For an arbitrary number x, how many trits do we
| need to represent it in trinary and how many bits N do we need
| to represent it in binary? What is the ratio of trits to bits?
|
| x = 3^M and x = 2^N
|
| Log3(x) = M
|
| Log(x) / Log(3) = M
|
| Log2(x) = N
|
| Log(x) / Log(2) = N
|
| And then we can just take that ratio N/M to figure out how many
| bits N we need to represent M trits:
|
 | R = N/M = [Log(x)/Log(2)] * [Log(3)/Log(x)] = Log(3)/Log(2)
| nick__m wrote:
 | you should read it as log_2(3), because
 | log_x(y) = log_b(y)/log_b(x) (for any base b), and it
 | represents the minimal number of bits (yes, it's not an
 | integer) required to represent 3 states.
| anon291 wrote:
 | In general, in order to store a thing with N values you need
| log_2(N) bits of information. This is actually really simple to
| see if N is a power of two. If you have four things, how many
| yes/no questions do you need to exactly determine one of the
| four things? The answer is two (divide the four into two groups
| of two, ask which group the object is in. Then, when you have
| that group, split into two groups of one, and ask the next
| question)[1].
|
| So, for powers of two, it's obviously log_2(N).
|
| Now we just extrapolate. For three things, on average, you need
| to ask log_2(3) questions. Sometimes you ask just one,
| sometimes you must ask two (depending on where you split the
| group). Either way, the formula is the same.
|
| log(3)/log(2) is just a way of writing log_2(3) (since the
| value of log3/log2 is the same regardless of which base the log
| is in).
|
| Note that two isn't special. It's only because we've limited
 | ourselves to 'how many yes/no questions'. If you wanted to know
 | how many multiple-choice questions with 3 responses you would
 | need to exactly determine which object of a set of 4, the
 | answer is log_3(4).
|
| [1] You can think of any byte, word, double word, quad word as
| simply a path in a binary tree of all numbers in the range
| 0..2^N-1 (where N is bit width). The path uniquely identifies
| the number.
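The four-things example above, as a tiny sketch: the two answers are exactly the two bits of the item's index.

    # Identify one of four items with two yes/no questions; each answer
    # ("is it in the upper half?") halves the candidate set.
    items = ['a', 'b', 'c', 'd']

    def find(index):
        group = items
        for bit in (1, 0):          # high bit answers the first question
            half = len(group) // 2
            group = group[half:] if (index >> bit) & 1 else group[:half]
        return group[0]

    assert [find(i) for i in range(4)] == items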
| PhilipRoman wrote:
| If you like this sort of thing (and don't care about performance)
| check out Arithmetic Coding for the most generalized fractional
| symbol packing algorithm.
| kthielen wrote:
| Or ANS if you want an equally general packing to AC but you
| care about performance. ;P
| sweis wrote:
| I dug into this once and the "theoretical ideal" of 3 originated
| in a 1950s paper about vacuum tube computers, which itself
| immediately backed off and said the choice of base 2 is
| frequently justified.
|
| https://sweis.medium.com/revisiting-radix-economy-8f642d9f3c...
|
 | In this case, the context is {-1, 0, 1} weights in an LLM,
 | which I don't think is being used for any hardware efficiency
 | argument. I think it's just quantizing weights into 3 states.
| jo-han wrote:
| Do -1, 0 and 1 all occur with the same frequency in typical large
 | language models, or would there be a benefit in encoding 0 as "0"
 | and -1 and 1 as "10" and "11" respectively? (or something even more
| complex)
|
| Edit: probably not when easy retrieval is needed.
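A back-of-envelope check of that idea, assuming i.i.d. weights where zero occurs with probability p0: the prefix code {0 -> "0", -1 -> "10", +1 -> "11"} costs 2 - p0 bits per weight on average, so it only beats the 1.6 bits/trit of the fixed 5-trits-per-byte packing when p0 > 0.4.

    # Expected bits per weight for the prefix code 0->"0", -1->"10", +1->"11".
    def expected_bits(p_zero):
        return p_zero * 1 + (1 - p_zero) * 2   # = 2 - p_zero

    assert expected_bits(0.5) == 1.5       # skewed toward 0: variable-length wins
    assert expected_bits(1/3) > 8 / 5      # uniform trits: fixed packing wins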
| Spoof7726 wrote:
| This is exactly a Succinct Data Structure. The solution proposed
| by the author doesn't allow random access; that is, you can't
| access a specific A[i] unless you unpack all of them. There is
| some research (e.g., [1]) allowing random access within
| reasonable time with slightly more storage, although it is almost
| entirely of theoretical interest.
|
| [1]: Mihai Patrascu, Succincter, FOCS 2008 best student paper.
| hinkley wrote:
| If I'm looking for the 3rd position in decimal I ignore
| everything above and below the third position. Which you can do
| by taking modulo 1000 and dividing by 100, or dividing by 100
| and taking modulo 10, which is easier to calculate in the
 | moment. You don't need any fixed-point math, just normal
 | integer division with flooring.
|
| So why would it be different in ternary?
| Spoof7726 wrote:
| Because division is not trivial. Even computing x mod 3 for
 | an n-bit integer x is O(n) if x is represented in binary
 | form.
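That is true for arbitrary-precision x. For the fixed-width values in the article (a byte holding 5 trits), though, the divisions are by a constant, which compilers routinely reduce to a multiply and a shift. A sketch of that reduction (the constant 171 happens to be exact for all 8-bit inputs, since 171/512 exceeds 1/3 by only 1/1536):

    # Division-free x // 3 for 0 <= x < 256.
    def div3_u8(x):
        return (x * 171) >> 9

    for x in range(256):
        assert div3_u8(x) == x // 3            # exact quotient
        assert x - 3 * div3_u8(x) == x % 3     # and remainder, i.e. x mod 3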
| chris_va wrote:
| Yes, nothing here is really new.
|
| For inference, trits/sec decoded into L1 cache is a lot more
| important than random access, and is going to be hardware
| specific (e.g. what SIMD instructions are available). 8bit
| seems an odd choice given most available instructions, but the
| Succincter paper's use of mod and individual decoding is
| (unless I am misreading it) much slower.
| Spoof7726 wrote:
| I completely agree. I don't think the paper is practical
| either; I shared it only to show that random access is
| possible in theory.
| zellyn wrote:
| Am I doing something wrong? I get 94.9% for 243/256
|
| 243 is pretty good. You have to go up to 3^17 = 129140163 < 2^27
 | = 134217728 to beat it (columns: i, 3^i, next power of two,
 | bits, 2^bits - 3^i, utilization; * marks a new best):
 |
 |  i                 3^i              2^bits  b         2^bits-3^i use
 |  1                   3                   4  2                  1 75.0%*
 |  2                   9                  16  4                  7 56.2%
 |  3                  27                  32  5                  5 84.4%*
 |  4                  81                 128  7                 47 63.3%
 |  5                 243                 256  8                 13 94.9%*
 |  6                 729                1024 10                295 71.2%
 |  7                2187                4096 12               1909 53.4%
 |  8                6561                8192 13               1631 80.1%
 |  9               19683               32768 15              13085 60.1%
 | 10               59049               65536 16               6487 90.1%
 | 11              177147              262144 18              84997 67.6%
 | 12              531441             1048576 20             517135 50.7%
 | 13             1594323             2097152 21             502829 76.0%
 | 14             4782969             8388608 23            3605639 57.0%
 | 15            14348907            16777216 24            2428309 85.5%
 | 16            43046721            67108864 26           24062143 64.1%
 | 17           129140163           134217728 27            5077565 96.2%*
 | 18           387420489           536870912 29          149450423 72.2%
 | 19          1162261467          2147483648 31          985222181 54.1%
 | 20          3486784401          4294967296 32          808182895 81.2%
 | 21         10460353203         17179869184 34         6719515981 60.9%
 | 22         31381059609         34359738368 35         2978678759 91.3%
 | 23         94143178827        137438953472 37        43295774645 68.5%
 | 24        282429536481        549755813888 39       267326277407 51.4%
 | 25        847288609443       1099511627776 40       252223018333 77.1%
 | 26       2541865828329       4398046511104 42      1856180682775 57.8%
 | 27       7625597484987       8796093022208 43      1170495537221 86.7%
 | 28      22876792454961      35184372088832 45     12307579633871 65.0%
 | 29      68630377364883      70368744177664 46      1738366812781 97.5%*
 | 30     205891132094649     281474976710656 48     75583844616007 73.1%
 | 31     617673396283947    1125899906842624 50    508226510558677 54.9%
 | 32    1853020188851841    2251799813685248 51    398779624833407 82.3%
 | 33    5559060566555523    9007199254740992 53   3448138688185469 61.7%
 | 34   16677181699666569   18014398509481984 54   1337216809815415 92.6%
 | 35   50031545098999707   72057594037927936 56  22026048938928229 69.4%
 | 36  150094635296999121  288230376151711744 58 138135740854712623 52.1%
 | 37  450283905890997363  576460752303423488 59 126176846412426125 78.1%
 | 38 1350851717672992089 2305843009213693952 61 954991291540701863 58.6%
 | 39 4052555153018976267 4611686018427387904 62 559130865408411637 87.9%
 | Generated with this:
 |
 |     import math
 |
 |     max_use = 0
 |     for i in range(1, 40):
 |         trinary = 3**i
 |         power = math.ceil(math.log2(trinary))
 |         binary = 2**int(power)
 |         diff = binary - trinary
 |         use = trinary / binary
 |         max_star = ''
 |         if use > max_use:
 |             max_use = use
 |             max_star = '*'
 |         print(f'{i:2} {trinary:19} {binary:19} {power:2} {diff:18} {use:.1%}{max_star}')
| nayuki wrote:
| > I get 94.9% for 243/256
|
| You're looking at the alphabet utilization rate. Whereas the
| standard way to measure efficiency (which is the way that the
| author did it) is to look at entropy.
|
| Here's how I would phrase it. Obviously, if you have a message
| that has 256 possible states and each one has the same
| probability of occurring, then the entropy is 8 bits per
| message.
|
| If you have a message having 243 uniformly distributed states,
 | then its entropy is defined as log base-2 of 243 = log_2(243)
 | = ln(243) / ln(2) ~ 7.925 bits. But if you used 8 bits to
 | express this message, then you've wasted 0.075 bits (or 75
 | millibits, mb) per message, so your efficiency is 7.925/8 ~
 | 99.06%.
|
| How would you achieve 7.925 bits per message in practice? By
| packing more messages together and sharing wasted space. By
 | using some continued fraction magic, let me propose to store
 | 15867x of these base-243 messages together. There are 243^15867
 | possibilities, which is approximately equal to
 | 2^125742.999994713, or just a hair under 125743 bits. So, a
 | block of 125743 bits can store 15867x base-243 messages
 | losslessly, which means each base-243 message uses 125743/15867
 | ~ 7.925 bits on average.
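The two measures, computed side by side (a quick sketch; 15867 is the continued-fraction choice above):

    import math

    print(243 / 256)               # 0.9492...  alphabet utilization, ~94.9%
    print(math.log2(243) / 8)      # 0.9906...  entropy efficiency, ~99.06%

    # Packing 15867 base-243 messages: just under 125743 bits total,
    # i.e. ~7.925 bits per message.
    print(15867 * math.log2(243))  # 125742.9999...
    print(125743 / 15867)          # 7.9253...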
___________________________________________________________________
(page generated 2024-12-05 23:00 UTC)