[HN Gopher] Show HN: Outperforming VByte for Large Integers Usin...
___________________________________________________________________
Show HN: Outperforming VByte for Large Integers Using Phi-Encoding
Author : keepamovin
Score : 14 points
Date : 2024-08-25 13:38 UTC (2 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| gloria_mundi wrote:
| I'm not sure I understand the prime density thing. Of the numbers
| up to 8258, about 12.5% are prime. Accounting for the fact that
| about a quarter of these primes ends in 101, i.e. cannot occur, I
| would expect about 10.7% = 12.5% * (3/4) / (7/8), which is fairly
| close to the observed 9.4%.
|
| The 2.1% in the README seems to be the density of primes < 1000
| among numbers up to 8258. That's not what was counted.
| ForOldHack wrote:
| After you read about Ramanujan...
|
| https://web.williams.edu/Mathematics/sjmiller/public_html/ma...
|
| To be honest, I have a degree in math, and struggle to
| understand the extreme difficulty in assessing the density of
| primes.
| gloria_mundi wrote:
| I haven't read all of that, but the problem at hand seems
| significantly less complicated.
|
| We're mapping the numbers from 1 to 1000 to distinct numbers
| up to 8258, and the README claims that we should expect 2.1%
| of the resulting numbers to be prime. I see no reason for
| this claim, and as I understand it, the 2.1% comes from
| pi(1000) / 8258, which seems like nonsense to me.
| ot wrote:
| VByte is a weird baseline to compare to: it is a byte-aligned
| encoding scheme, so it trades off some space efficiency for speed
| of decoding. The proposed method is bit-aligned, so it should be
| compared against bit-aligned encoding schemes.
|
| For large integers, it is hard to beat Elias Delta coding [1],
| which asymptotically uses log(n) + o(log(n)) bits, and in
| practice it breaks even with most other encoding schemes quite
| early on.
|
| More importantly, Elias Gamma and Delta are complete, meaning
| that there is no redundancy (another way to look at it is that
| any sequence over the alphabet is decodable). VByte is complete
| over the byte alphabet. Any complete code is optimal for some
| integer distribution (see "implied probability" in the Wikipedia
| page).
|
| So if your integer distribution is heavier on the large integers,
| there are already plenty of complete encodings to pick from, and
| almost certainly one that fits well the distribution.
|
| The scheme proposed here, instead, is not complete, as mentioned
| in the README ("this method does introduce some redundancy,
| particularly when sequences must be padded to prevent
| unintentional "101" patterns"), so it cannot be optimal for any
| distribution.
|
| The Wikipedia page on the Kraft-McMillan inequality [2] explains
| this in more detail.
|
| [1] https://en.wikipedia.org/wiki/Elias_delta_coding [2]
| https://en.wikipedia.org/wiki/Kraft%E2%80%93McMillan_inequal...
___________________________________________________________________
(page generated 2024-08-27 23:01 UTC)