[HN Gopher] Show HN: Outperforming VByte for Large Integers Using Phi-Encoding
       ___________________________________________________________________
        
       Show HN: Outperforming VByte for Large Integers Using Phi-Encoding
        
       Author : keepamovin
       Score  : 14 points
       Date   : 2024-08-25 13:38 UTC (2 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | gloria_mundi wrote:
       | I'm not sure I understand the prime density thing. Of the numbers
       | up to 8258, about 12.5% are prime. Accounting for the fact that
       | about a quarter of these primes end in 101 in binary, i.e.
       | cannot occur, I would expect about 10.7% = 12.5% * (3/4) /
       | (7/8), which is fairly close to the observed 9.4%.
       | 
       | The 2.1% in the README seems to be the density of primes < 1000
       | among numbers up to 8258. That's not what was counted.
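        
        A quick numeric check of the estimate above (a sketch; primepi
        comes from sympy, and 8258 is the bound quoted in the comment):
        
            from sympy import primepi
        
            N = 8258
            density = int(primepi(N)) / N     # fraction of 1..8258 that is prime
            print(f"{density:.1%}")           # ~12.5%
        
            # Drop the ~1/4 of primes whose binary form ends in 101 (such
            # outputs cannot occur), relative to the 7/8 of all numbers
            # that do not end in 101.
            adjusted = density * (3 / 4) / (7 / 8)
            print(f"{adjusted:.1%}")          # ~10.7%, near the observed 9.4%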
        
         | ForOldHack wrote:
         | After you read about Ramanujan...
         | 
         | https://web.williams.edu/Mathematics/sjmiller/public_html/ma...
         | 
          | To be honest, I have a degree in math and still struggle to
          | grasp the extreme difficulty of assessing the density of
          | primes.
        
           | gloria_mundi wrote:
           | I haven't read all of that, but the problem at hand seems
           | significantly less complicated.
           | 
           | We're mapping the numbers from 1 to 1000 to distinct numbers
           | up to 8258, and the README claims that we should expect 2.1%
           | of the resulting numbers to be prime. I see no reason for
           | this claim, and as I understand it, the 2.1% comes from
           | pi(1000) / 8258, which seems like nonsense to me.
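            
            For reference, the two densities at issue (a sketch using
            sympy's primepi):
            
                from sympy import primepi
            
                print(int(primepi(1000)) / 8258)  # 168/8258 ~= 2.0%, the
                                                  # README's figure
                print(int(primepi(8258)) / 8258)  # ~12.5%, the density that
                                                  # applies to numbers up to 8258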
        
       | ot wrote:
        | VByte is a weird baseline to compare to: it is a byte-aligned
        | encoding scheme, so it trades some space efficiency for decoding
        | speed. The proposed method is bit-aligned, so it should be
        | compared against bit-aligned encoding schemes (a VByte sketch
        | follows this comment).
       | 
        | For large integers, it is hard to beat Elias Delta coding [1],
        | which asymptotically uses log(n) + o(log(n)) bits and in
        | practice breaks even with most other encoding schemes quite
        | early on (see the Elias sketch after this comment).
       | 
        | More importantly, Elias Gamma and Delta are complete, meaning
        | that there is no redundancy (another way to look at it is that
        | any sequence over the alphabet is decodable). VByte is complete
        | over the byte alphabet. Any complete code is optimal for some
        | integer distribution (see "implied probability" on the
        | Kraft-McMillan Wikipedia page [2]; a numeric check of
        | completeness for Gamma follows this comment).
       | 
       | So if your integer distribution is heavier on the large integers,
       | there are already plenty of complete encodings to pick from, and
        | almost certainly one that fits the distribution well.
       | 
        | The scheme proposed here, instead, is not complete, as mentioned
        | in the README ("this method does introduce some redundancy,
        | particularly when sequences must be padded to prevent
        | unintentional "101" patterns"), so it cannot be optimal for any
        | distribution (the last sketch after this comment quantifies the
        | cost of forbidding "101").
       | 
       | The Wikipedia page on the Kraft-McMillan inequality [2] explains
       | this in more detail.
       | 
        | [1] https://en.wikipedia.org/wiki/Elias_delta_coding
        | 
        | [2] https://en.wikipedia.org/wiki/Kraft%E2%80%93McMillan_inequal...
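        
        For concreteness, one common VByte variant (a sketch; conventions
        differ on whether the stop bit marks the first or the last byte):
        
            def vbyte_encode(n: int) -> bytes:
                # 7 payload bits per byte, least-significant group first;
                # in this variant the high bit marks the final byte.
                out = bytearray()
                while n >= 0x80:
                    out.append(n & 0x7F)
                    n >>= 7
                out.append(n | 0x80)
                return bytes(out)
        
            def vbyte_decode(buf: bytes) -> int:
                n, shift = 0, 0
                for b in buf:
                    n |= (b & 0x7F) << shift
                    if b & 0x80:          # stop bit: this was the last byte
                        break
                    shift += 7
                return n
        
            assert vbyte_decode(vbyte_encode(2 ** 40)) == 2 ** 40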
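        
        A minimal Elias Gamma/Delta encoder (a sketch; it emits bit
        strings rather than packed bits, to keep it short):
        
            def elias_gamma(n: int) -> str:
                # unary length prefix (len-1 zeros), then n's bits, MSB first
                b = bin(n)[2:]
                return "0" * (len(b) - 1) + b
        
            def elias_delta(n: int) -> str:
                # Gamma-code the bit length of n, then emit n without its
                # leading 1 bit: log2(n) + O(log log n) bits in total.
                b = bin(n)[2:]
                return elias_gamma(len(b)) + b[1:]
        
            print(elias_delta(2))     # 0100
            print(elias_delta(1000))  # 0001010111101000 (16 bits, vs. 19
                                      # bits for elias_gamma(1000))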
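        
        The completeness claim can be checked numerically: the Kraft sum
        over Elias Gamma codeword lengths tends to exactly 1 (a sketch):
        
            def gamma_len(n: int) -> int:
                # Elias Gamma spends 2*floor(log2 n) + 1 bits on n >= 1
                return 2 * (n.bit_length() - 1) + 1
        
            kraft = sum(2.0 ** -gamma_len(n) for n in range(1, 1 << 20))
            print(kraft)  # 0.999999... -> 1: no bit pattern is wasted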
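        
        Finally, the cost of forbidding "101" can be quantified with a
        small dynamic program (a back-of-the-envelope sketch, not the
        README's exact padding scheme): the number of L-bit strings
        avoiding "101" grows like r^L with r ~= 1.755, i.e. roughly 0.81
        usable bits per stored bit.
        
            from math import log2
        
            def count_avoiding_101(L: int) -> int:
                # DP over the last two bits; discard extensions forming "101"
                if L < 2:
                    return 2 ** L
                cnt = {"00": 1, "01": 1, "10": 1, "11": 1}
                for _ in range(L - 2):
                    nxt = dict.fromkeys(cnt, 0)
                    for last2, c in cnt.items():
                        for b in "01":
                            if last2 + b != "101":
                                nxt[last2[1] + b] += c
                    cnt = nxt
                return sum(cnt.values())
        
            L = 60
            print(log2(count_avoiding_101(L)) / L)  # ~0.82; asymptotically
                                                    # log2(1.755) ~= 0.81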
        
       ___________________________________________________________________
       (page generated 2024-08-27 23:01 UTC)