[HN Gopher] ChibiHash: Small, Fast 64 bit hash function
       ___________________________________________________________________
        
       ChibiHash: Small, Fast 64 bit hash function
        
       Author : thunderbong
       Score  : 87 points
       Date   : 2024-11-18 00:23 UTC (22 hours ago)
        
 (HTM) web link (nrk.neocities.org)
 (TXT) w3m dump (nrk.neocities.org)
        
       | aappleby wrote:
       | SMHasher/Murmurhash author here - I don't see anything
       | fundamentally wrong with this hash function, it uses basically
       | the same operations as the Murmur family (and a lot of other
       | hashes at this point).
       | 
       | The handling of the "tail" of the key (the last < 32 bytes) is
       | slightly odd (the "if (l & 1) { mix 1 byte of key }" happens
       | before 8-byte chunks and 2-byte chunks), but if it passes
       | SMHasher it should be fine for general use.
        
         | HexDecOctBin wrote:
         | > if it passes SMHasher it should be fine for general use.
         | 
         | Author of the post writes:
         | 
         | > I kept making changes until the tests passed.
         | 
         | If SMHasher has become a target, shouldn't it be considered a
         | bad measure now? [1]
         | 
         | [1]: https://en.wikipedia.org/wiki/Goodhart%27s_law
        
           | gliptic wrote:
           | Hash tests have always been targets. What else are you
           | supposed to do for non-cryptographic hashes?
        
             | vanderZwan wrote:
             | Not OP, but I suppose there might be a difference between
             | blindly trying to pass a test by tweaking numbers, or a
             | more principled approach to figuring out how to introduce
             | better randomness.
             | 
             | OTOH, I would expect that tests for hashing and random
             | number generation in particular would be naturally
             | resistant to overfitting, due to the nature of the problem.
             | But I'm not an expert at this topic, so I'd love to hear
             | someone else's take on that
        
               | gliptic wrote:
               | Maybe, but from what I've seen you run out of what theory
               | can say about your non-crypto hash function pretty
               | quickly.
        
               | HelloNurse wrote:
               | If this is the case, improvements come from tuning
               | constants and calculation and tests are more valuable.
        
       | snvzz wrote:
       | How does it compare to CRC64, for the purpose of detecting
       | errors?
        
         | GTP wrote:
         | With CRC, you have algebraic guarantees on which kind of errors
         | it detects (and which kind doesn't), while with hash functions
         | collisions are essetially random (unless you have some clever
         | insight on how it works internally). Which one is best, depends
         | on the context.
        
         | AlotOfReading wrote:
         | There are few reasons to use a CRC64. Your purposes would
         | probably be just as well-served with a good CRC32.
         | 
         | CRCs allow you to precisely tune the type of error detection
         | guarantees you have to the application. You just pick the
         | combination of hamming distance and polynomial size that you
         | like best from the CRC Zoo:
         | 
         | https://users.ece.cmu.edu/~koopman/crc/index.html
         | 
         | A hash function is essentially just a statistical version with
         | the avalanche property that works acceptably well the majority
         | of the time instead of a hard guarantee.
        
       | teo_zero wrote:
       | While the benefit of processing chunks of 8 bytes is obvious,
       | what's the purpose of grouping those into macrogroups of 4? Does
       | it trigger any implicit parallelism I failed to spot? Or is it
       | just to end this phase with the 4 h[] having had the same amount
       | of entropy, and thus starting the next one with h[0]?
       | 
       | > The way it loads the 8 bytes is also important. The correct way
       | is to load via shift+or > This is free of any UB, works on any
       | alignment and on any machine regardless of it's endianness. It's
       | also fast, gcc and clang recognize this pattern and optimize it
       | into a single mov instruction on x86 targets.
       | 
       | Is a single MOV instruction still fast when the 8 bytes begin on
       | an odd address?
        
         | e4m2 wrote:
         | > Is a single MOV instruction still fast when the 8 bytes begin
         | on an odd address?
         | 
         | On x86, yes. There is no performance penalty for misaligned
         | loads, except when the misaligned load also happens to straddle
         | a cache line boundary, in which case it _is_ slower, but only
         | marginally so.
        
         | xxs wrote:
         | like mentioned x86/64 is quite generous with non-aligned
         | access, yet on architectures that require aligned loads, they
         | will be aligned (all lowest bits being zero at the start), so
         | it will continue being aligned with each 8 byte load.
        
           | teo_zero wrote:
           | > they will be aligned (all lowest bits being zero at the
           | start
           | 
           | I don't think so. At the start k will be equal to keyIn,
           | which can point to any arbitrary memory location.
        
       | palsecam wrote:
       | See also: "Meow hash" by the legendary Casey Muratori, a fast
       | non-cryptographic hash function:
       | https://github.com/cmuratori/meow_hash
       | 
       | > _It's the fastest hash function we know of, and we have
       | benchmarked all the ones we could find. On modern Intel x64 CPUs,
       | it hashes 16 bytes per cycle single-threaded. This means in cache
       | it can hash at a rate of 64 gigabytes per second on a 4.2gHz
       | machine. Out of cache, it hashes at whatever speed your main
       | memory bus can provide to a single core, since that is usually
       | the limiting factor on modern x64 CPUs._
       | 
       | > _It has also now been tuned to be the fastest hash on small
       | inputs, too. Despite the fact that it is a full 128-bit hash, it
       | still outperforms "fast" 64-bit hashes across all input sizes._
       | 
       | -- https://archive.is/CQOVm (originally
       | https://mollyrocket.com/meowhash)
       | 
       | Discussion on HN: https://news.ycombinator.com/item?id=29038813 &
       | https://news.ycombinator.com/item?id=18262627
        
         | Sesse__ wrote:
         | Not sure why people keep claiming it's the "fastest" when it's
         | far down the list on the SMHasher benchmark list:
         | 
         | https://rurban.github.io/smhasher/doc/table.html
         | 
         | In particular, rapidhash is three times as fast for small
         | inputs and is portable (unlike meow_hash). Plus meow fails one
         | of the SMHasher quality tests.
        
           | palsecam wrote:
           | Right, Meow is not the best when it comes to small inputs.
           | 
           | It shines for large inputs (it's far _up_ the SMHasher list
           | you linked in this case).
           | 
           | It offers a good compromise; it's not bad ( _good_ , even)
           | for both large and small inputs.
           | 
           | Granted, rapidhash is quite good on both fronts, too.
        
             | Sesse__ wrote:
             | Well, it's not even the best AES-NI hash on the list (for
             | large nor small inputs), and it's not passing the quality
             | tests, so why bother?
        
         | zamalek wrote:
         | From the archive link (blog post published 2018):
         | 
         | > To our surprise, we found a lack of published, well-
         | optimized, large-data hash functions.
         | 
         | Murmur was released in 2008. Murmur3 in 2011. Release dates on
         | higher quality functions are not easy to find, but I am sure
         | that there were more around.
         | 
         | This type of thing is why I take Casey's claims with a huge
         | dose of salt.
        
         | ur-whale wrote:
         | > See also: "Meow hash"
         | 
         | Meow hash may very well be fast on large data, but it
         | completely fails the "small code" criteria, which is one of the
         | clearly stated design goal of Chibi.
         | 
         | Generally speaking, the "code as small as possible" is IMO a
         | design constraint that is very often under-estimated or even
         | ignored for these type of algorithms, including the crypto hard
         | ones.
         | 
         | I personally find the "code fits on half a page" metric very
         | important: beyond satisfying some very personal sense of
         | aesthetics, it buys you quite a lot of nice properties:
         | - fits in head        - inlines like a dream        - easy to
         | integrate        - easy to do header only        - easy to
         | security audit        - pretty hard to introduce bugs        -
         | nothing up my sleeve numbers pretty much obvious
         | 
         | For these reasons, I used to love the TEA (tiny encryption
         | algorithm) [1] before cryptanalysis unfortunately showed it to
         | be weak. Its successors (XXTEA, etc...) are already almost too
         | big for my taste.
         | 
         | The speck cipher [2] is quite nice in that regard, in spite of
         | its rather untrustworthy provenance.
         | 
         | [1] https://en.wikipedia.org/wiki/Tiny_Encryption_Algorithm
         | 
         | [2] https://en.wikipedia.org/wiki/Speck_(cipher)
        
       | remix2000 wrote:
       | XXH3 omitted from benchmark?
        
       | anfilt wrote:
       | Might also try running it through Smhasher3:
       | https://gitlab.com/fwojcik/smhasher3/-/blob/main/results/REA...
       | 
       | Also here is a hash function I wrote a while ago now:
       | https://github.com/Keith-Cancel/k-hashv/tree/main
        
       ___________________________________________________________________
       (page generated 2024-11-18 23:02 UTC)