[HN Gopher] Meow Hash (2018)
___________________________________________________________________
Meow Hash (2018)
Author : DeathArrow
Score : 327 points
Date : 2021-10-29 14:27 UTC (8 hours ago)
(HTM) web link (mollyrocket.com)
(TXT) w3m dump (mollyrocket.com)
| Cilvic wrote:
| I know +1 is against the rules, but I think this deserves a
| positive comment. I enjoyed reading it, thanks for taking the
| time and giving back to the ecosystem.
| anonova wrote:
| A detailed analysis of Meow Hash: https://peter.website/meow-
| hash-cryptanalysis
|
 | It's not the highest-quality hash function (see the SMHasher
 | benchmarks), but it is fast. A great alternative is XXH3
| (https://cyan4973.github.io/xxHash/), which has seen far more
| usage in practice.
| thomasahle wrote:
| XXH3 has issues too. There are certain inputs, independent of
| the seed, on which it collides much more often than it should.
|
| On the other hand a hash like UMash guarantees low collisions
| on any input:
| https://engineering.backtrace.io/2020-08-24-umash-fast-enoug...
| jdcarter wrote:
| I'm using XXHash3 for verifying data integrity of large (10+MB)
| blobs. Very fast and appears to work very well in my testing--
| it's never missed any bit error I've thrown at it.
|
| Aside: when storing hashes, be sure to store the hash type as
| well so that you can change it later if needed, e.g.
| "xxh3-[hash value]". RFC-6920 also has things to say about
| storing hash types and values, although I haven't seen its
| format in common use.
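jdcarter's "store the hash type alongside the value" advice can be sketched in a few lines. This is a hypothetical helper, not any standard API; SHA-256 stands in for xxh3 since the latter isn't in Python's standard hashlib:

```python
import hashlib

def tagged_digest(data: bytes, algo: str = "sha256") -> str:
    """Return a self-describing digest string like 'sha256-<hex>',
    so the algorithm can be swapped later without ambiguity."""
    h = hashlib.new(algo)
    h.update(data)
    return f"{algo}-{h.hexdigest()}"

def verify(data: bytes, tagged: str) -> bool:
    """Re-derive the digest using the algorithm named in the stored
    string, rather than a hard-coded one."""
    algo, _, _ = tagged.partition("-")
    return tagged_digest(data, algo) == tagged
```

The point is that `verify` never assumes a particular algorithm; it reads it back out of the stored value.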
| DeathArrow wrote:
 | Useless analysis, since the author says it's not a cryptographic
 | hash but is useful as a fast hash for change detection.
|
| "we wanted a fast, non-cryptographic hash for use in change
| detection and deduplication"
|
| >A great alternative is XXH3
|
| Meow Hash is twice as fast.
| iratewizard wrote:
| Meow hash is also written by people who initially thought
| SHA-1 was acceptable for large scale change hashing.
| MauranKilom wrote:
| If you had made it one section into the analysis, you would
| have seen that _at the time_ MeowHash made certain
| cryptographic claims that the author set out to disprove.
|
| The readme has since been updated. I didn't check whether any
| algorithmic changes were made on top, but the discussion of
| the analysis on github didn't point to a lot of low-hanging
| fruit.
| [deleted]
| LeoPanthera wrote:
| It's not useless analysis, because even for non-cryptographic
| hashes you want the likelihood of any arbitrary hash to be
| roughly equal. A hash function which "prefers" certain
| outputs has a far higher probability of collision.
| IncRnd wrote:
| Don't you think asset planting is an attack against a game's
| pipeline?
|
| The author of the article's page claims the hash is not
| cryptographic but actually goes on to make security claims
| about the hash. People who do not understand cryptography
 | should be careful about making such claims. The author appears
 | to understand this more than your comment demonstrates.
|
| For example, a claim about change detection is a
| cryptographic claim of detecting preimage attacks. In a
| threat model, a security professional would determine whether
| a first preimage or a second preimage attack is what should
| be guarded in attack scenarios. Then, the professional would
| help with analysis, determining mitigations, defense in
| depth, and prioritization of fixing the vulnerabilities
| exposed by how the hash is used.
|
| A hash cannot be considered standalone. It is the
| architecture and use-case where the hash's security
| properties are used to determine what security properties of
| the application are fulfilled.
|
| So, if the author is correct, which seems to be the case,
| then meowhash should not be used in a production environment
| outside of the most simplistic checks. It seems faster for
| its intended use case to simply check for a single bit
| difference between two images - no hash required.
| null000 wrote:
| > it seems faster for its intended use...
|
| But then you have to store the entire before & after
| locally? That's the entire point of using a hash for change
| detection.
| IncRnd wrote:
| > But then you have to store the entire before & after
| locally?
|
 | Yes, there is a difference between comparing the two (as you
 | say) and integrity (modification detection). In the case
| of comparing new assets in a pipeline to those that were
| created earlier, it sounds plausible both copies would be
| present.
|
| > That's the entire point of using a hash for change
| detection.
|
| This is called integrity protection. Change detection is
| the incorrect term to use here. Please see what I
| referenced earlier for first and second preimage.
| ilitirit wrote:
 | I can vouch for xxHash (it does what it says on the can), but
 | I'm really curious to hear from those who have experience with
 | Meow hash.
| kazinator wrote:
| > _Because we have to process hundreds of gigabytes of art assets
| to build game packages, we wanted a fast, non-cryptographic hash
| for use in change detection and deduplication. We had been using
| a cryptographic hash (SHA-1), but it was unnecessarily slowing
| things down._
|
| That's because you were doing a potentially silly thing: hashing
| every byte.
|
| What does a hash tell you? If two hashes are different, it
| confirms that the objects are different. If two hashes are the
| same, they could be the same object or they could be different.
|
| For that purpose, you don't have to hash everything: just hash
| the areas where objects are likely to differ.
|
 | If two 15 gigabyte files are likely to have differences in their
 | first kilobyte, then hashing just the first kilobyte (negligibly
 | fast) will work just as well as hashing 15 gigabytes.
|
| If two files are of different length, then they are not the same.
| How about:
|
| hash(<length> + <first kilobyte> + <last kilobyte>)
|
| For deduplication, a 32 bit CRC could probably be reasonably
| efficient. No matter what hash you use, you have to verify the
| collisions. If CRC32 is enough to make false positives rare over
| your type of data set, it's good enough.
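kazinator's hash(length + first kilobyte + last kilobyte) scheme could look like this minimal Python sketch. SHA-1 is just a stand-in for whatever fast hash you prefer, and, as the comment says, equal fingerprints are only candidates and still need a full byte-for-byte confirmation:

```python
import hashlib
import os

def quick_fingerprint(path: str, block: int = 1024) -> str:
    """Hash <length> + <first block> + <last block> instead of
    every byte of a potentially huge file."""
    size = os.path.getsize(path)
    # Fold the length in first: different-length files can never match.
    h = hashlib.sha1(size.to_bytes(8, "little"))
    with open(path, "rb") as f:
        h.update(f.read(block))                 # first block
        if size > block:
            f.seek(max(size - block, block))    # last block, no overlap
            h.update(f.read(block))
    return h.hexdigest()
```

For a multi-gigabyte file this reads at most two kilobytes plus one stat call, regardless of file size.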
| andruc wrote:
| "just hash the areas where objects are likely to differ"
|
| Correct me if I'm wrong but given these are art assets,
| wouldn't "determining where the objects are likely to differ"
| itself be an expensive/hard problem?
| cb321 wrote:
| It really depends. The early bytes probably contain things
| like image dimensions which might discriminate a lot - or not
| at all. They might contain palette-style color information
| which might discriminate tons - or not at all. Most things
| with hash functions depend upon distributions of inputs
| relative to the function, though.
|
| I think it would be accurate to say "Determining where they
| are likely to differ is not necessarily hard, but it could
| be." If it happens to be easy then you can radically change
| the scale of the problem.
|
| File length alone is a weak hash actively maintained by any
| file system and so available "for free" (or one stat call).
| And it can work surprisingly well, as the grandparent alludes
| to.
|
| EDIT: You actually do not need to even open(2) files that are
| not part of "same size clusters", never mind hash their data.
| This alone can really squash the problem size over a more
| naive approach even if you did feel the need to hash the
| whole file.
|
| EDIT: And in terms of performance, if this data has to
| actually be read off a drive then the hash need not go any
| faster than the drive IO which is probably much, much slower
| than most candidate hash functions. Even PCIe5.0 NVMes are
| only like 14 GB/s, SATA buses like 0.75 GB/s, etc.
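The size-clustering idea above (never even open files that are alone in their size class) is a one-liner in Python using only stat calls; this is an illustrative sketch, not the `dups` program mentioned below:

```python
import os
from collections import defaultdict

def size_clusters(paths):
    """Group files by byte length via stat alone. Only same-size
    clusters with more than one member can contain duplicates, so
    only those files ever need to be opened and hashed."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)
    return {s: ps for s, ps in by_size.items() if len(ps) > 1}
```

On realistic file trees this typically discards the vast majority of candidates before any I/O beyond directory metadata.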
| kazinator wrote:
| To avoid the hash being entirely oblivious to bytes beyond
| a small block at the start, we could take additional
| samples of the file at exponentially increasing offsets:
|
| Hash the first kilobyte.
|
| Hash the 16th kilobyte
|
| Hash the 256th kilobyte
|
| Hash the 4096th kilobyte (4M)
|
| Hash the 65536th kilobyte (64M)
|
| Hash the 1048576th kilobyte (1G)
|
| Then 16G, 256G, ...
|
| It might be worth including the last kilobyte. (I think I
| mentioned that.) If the file format has trailing metadata,
| that can help, especially if there is any sort of checksum
| in it.
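The exponentially spaced sampling above might be sketched like this; the spacing factor and block size are illustrative parameters, not a tested design:

```python
def sample_offsets(size: int, block: int = 1024, base: int = 16):
    """Byte offsets of the blocks to hash: block 0, then blocks
    16, 256, 4096, ... (exponentially spaced) that fall inside the
    file, plus the final block."""
    offs, k = [], 0
    while k * block < size:
        offs.append(k * block)
        k = base if k == 0 else k * base
    tail = max(size - block, 0)   # include the last block too
    if tail not in offs:
        offs.append(tail)
    return offs
```

The number of sampled blocks grows only logarithmically with file size, so total I/O stays tiny even for terabyte files.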
| cb321 wrote:
| Sure. I agree various sampling strategies are a good
| design, as long as they don't add up to much IO. Making
| that a "user parameter" to be customizable seems ideal.
| What you suggest may make a fine default - if adapted to
| varying file sizes somehow. E.g., that Nim dups program I
| mentioned elsewhere could just take a sequence of slices
| instead of only one..maybe just ignoring out of bounds
| slices. { And for size clusters where the size < N, screw
| it - just hash the whole thing, probably. ;-) }
| cb321 wrote:
| This is roughly how
| https://github.com/c-blake/cligen/blob/master/examples/dups....
 | can work. There is little question that changing the scale of
 | the problem very much lessens performance sensitivity to the
 | hash.
| pepoluan wrote:
| CRC32 is actually very slow in modern CPUs. So if you can use
| an algorithm that is both (1) faster _and_ (2) better quality,
| why not use that algo instead?
| kazinator wrote:
| > _CRC32 is actually very slow in modern CPUs_
|
 | Did you read my whole comment? If we are hashing just a few
 | kilobytes of a multi-gigabyte file, it almost certainly
 | doesn't matter.
| midnightclubbed wrote:
 | I wouldn't categorize it as 'very slow', at worst 50% slower
 | than the 128bit XXH3 hash (and faster than the 32bit XXH hash).
|
| Modern intel processors have the CRC32C instructions and ARM
| have both CRC32 and CRC32C. There is a reluctance in some of
| the hash comparisons to use the hardware CRC implementations
| for compatibility reasons, but most modern CPUs have them.
|
| The main argument for using a hardware CRC32 (and to some
| extent a software one too) is simplicity. If your use-case
| needs decent (but not blinding) performance, then there is
| something to be said for a hash implementation that can be
| written in ~10 lines of code. You still have to handle the
| possibility of collisions but any 32bit hash has that
| restriction, which is fine for many uses.
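As an illustration of that simplicity argument, here is a bit-at-a-time CRC-32 (the reflected 0xEDB88320 polynomial, the same variant zlib uses) in well under ten lines; real deployments would use the table-driven or hardware version for speed:

```python
def crc32(data: bytes) -> int:
    """Bitwise CRC-32 (IEEE), compatible with zlib.crc32."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the polynomial if the low bit was set.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF
```

The standard check value for this variant is crc32(b"123456789") == 0xCBF43926.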
| ed25519FUUU wrote:
| > _The Meow hash is a high-speed hash function named after the
| character Meow in Meow the Infinite._
|
 | As someone who doesn't watch cartoons, I'm constantly surprised
 | at the influence cartoons and anime are having on modern tech,
 | especially with regard to computer science.
| phtrivier wrote:
 | In this case, the comic Meow the Infinite is actually done by
 | the same game shop behind the Meow hash.
| caeril wrote:
| Not sure you can call MR a "game shop". Pretty sure it's just
| Casey, with Anna on contract.
| arduinomancer wrote:
| To add some context Casey is one of the creators of Meow the
| Infinite
| alecco wrote:
| (2018)
| amenghra wrote:
| Would be interesting to compare it to siphash.
| adamgordonbell wrote:
| Found this in the readme: Due to recent
| discoveries by Peter Schmidt-Nielsen, we have decided to
| reclassify Meow hash 0.5/calico from level 3 to level 1. This
| means that we recommend not to use this hash for message
| authentication codes, or for hash tables in scenarios where
| collision induced denial-of-service attacks are a concern.
| For level 3/MAC capabilities consider migrating to SipHash. If
| the performance of SipHash is not satisfying, continuing to use
| Meow 0.5 for hash tables is better than migrating to another
| fast hash.
| CodesInChaos wrote:
| Siphash has pretty bad performance on large inputs. From what I
| remember, even MD5 was faster.
| Comevius wrote:
| Google made faster Siphash variants, and also HighwayHash
| that's much faster.
|
| https://github.com/google/highwayhash
| kortex wrote:
| This is great! Many applications (like dedupe) don't need full
| crypto guarantees.
|
| If you need something fast, and crypto secure, I recommend
| checking out Blake3/b3sum. I'm just learning about XXH3 in this
| thread so I cannot comment on how it stacks up but I love b3sum
| for fast file hashing.
|
| https://github.com/BLAKE3-team/BLAKE3/tree/master/b3sum
| scriptdevil wrote:
| I recently used seahash in rust for a similar job. This seems to
| fall in the same niche.
| buildbot wrote:
| This could be super useful for deduplicating large photo
| datasets!
|
| It might be worth changing this link to the github, there's a lot
| more info on the hash there.
| NKosmatos wrote:
 | Came here to say the same thing :-) Anyone know of free(ish)
 | software based on a very fast hashing algorithm for checking
 | large numbers of images? My intention is to check for bit-rot
| and other unforeseen data corruptions between backup disks, I
| assume many others must have a similar use case.
| bityard wrote:
| > This could be super useful for deduplicating large photo
| datasets!
|
| That is indeed the use case mentioned in the article. A hash
| like this is useful for deduplicating large _file_ datasets but
| for deduplicating images in particular, you really want a
| perceptual hash. Because two images files can _look_ identical
| but have slightly (or wildly) different byte streams. The
| trade-off is that perceptual hashes are notorious for false
| positives in large datasets.
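A perceptual "average hash" (aHash), one of the simplest perceptual hashes, can be sketched without any image library by assuming the input has already been downscaled to an 8x8 grayscale grid; a real implementation would also do the resize and usually a DCT-based variant:

```python
def average_hash(pixels) -> int:
    """64-bit aHash of an 8x8 grayscale grid (list of 8 rows of
    8 ints): each bit says whether that pixel is above the mean."""
    flat = [p for row in pixels for p in row]
    avg = sum(flat) / 64
    bits = 0
    for p in flat:
        bits = (bits << 1) | (p > avg)
    return bits

def hamming(a: int, b: int) -> int:
    """Perceptual distance: number of differing bits (0 = likely
    the 'same' image, small = near-duplicate)."""
    return bin(a ^ b).count("1")
```

Unlike a byte-level hash, near-identical images land at small Hamming distances instead of colliding exactly, which is both the feature and the source of the false positives mentioned above.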
| kayamon wrote:
| I once saw a guy dedupe a folder by batch-converting them to
| low quality JPEG thumbnails then sorting by size.
| buildbot wrote:
| Yep, just happy someone was looking into it, I currently use
| xxHash for this.
|
 | Perceptual hashing for photos isn't great in my opinion; for
 | example, I have many raw photos that look very similar, where
 | the only difference may be a slightly higher shutter speed or a
 | correction of the camera angle. I'm guessing many perceptual
 | hashes would mark these as duplicates. Maybe that's what many
 | want; it's not what I want personally.
| pepoluan wrote:
| And what's wrong with xxHash?
|
| According to xxHash's performance comparison page, xxHash
| is faster than MeowHash.
| da_big_ghey wrote:
| Seahash came out very well last I have seen in it, any
| comparison? https://docs.rs/seahash/4.1.0/seahash/
| pepoluan wrote:
| XXHash author last updated the following page on Aug 10 of this
| year:
|
| https://github.com/Cyan4973/xxHash/wiki/Performance-comparis...
| wiskinator wrote:
 | I love this website because I went to the link, didn't know what
 | a PWA was, went to the root website and _STILL_ don't know what
 | a PWA is.
|
| I'm going to try the old trick of saying the wrong thing so
| someone corrects me.
|
| Clearly a PWA is a Pug With Attitude. Is this a Jonathon and
| Noodle sponsored app?
| PebblesRox wrote:
| A PWA is a Progressive Web App.
|
| https://developer.mozilla.org/en-US/docs/Web/Progressive_web...
| bowmessage wrote:
| I think you may have commented on the wrong article.
|
| (Did you mean to comment on "Publish Your PWAs to the iOS App
| Store" [0])?
|
| [0] https://news.ycombinator.com/item?id=29040793
| apavlo wrote:
| SMHasher benchmark numbers are here (including Meow Hash):
|
| https://github.com/rurban/smhasher#readme
| CodesInChaos wrote:
| The quality column on that page seems dubious, since it lists
| test failures for secure cryptographic hashes.
| pkhuong wrote:
| Cryptographic hashes mostly look at collisions for the whole
| output (or precisely truncated outputs). There can still be
| patterns or clumping in short ranges of the output bits for
| certain input families, as long as the full outputs differ.
|
| For example, the identity function is extremely resistant to
| collisions. However, it doesn't mix the input bits, so it's
| not a great hash function.
| lmilcin wrote:
| Now the only issue is, I will be an instant subject of ridicule
| if my team sees me using something called "Meow Hash".
| 0des wrote:
| Why?
| failrate wrote:
| Too bad, there's also the very useful Squirrel3 Noise Hash.
| recursive wrote:
| Not everyone can do it, but I recommend ignoring ridicule
| that's not well founded.
| jnwatson wrote:
| You better not use a VCS that's also a common insult then.
| NelsonMinar wrote:
| I wish there was a consensus non-cryptographic hash algorithm.
| Something that hashes arbitrary data to 128 or 256 bit keys with
| good randomness, few collisions on typical input data, and
| universal implementation.
|
| Most programmers I know reach for SHA-1 (or if we're being
| honest, MD5). But a cryptohash is really wasteful if you don't
| need the cryptographic properties.
|
| Last I looked at this Murmurhash was the closest to widely used
| with some folks saying Metrohash was better. But that was years
| ago. Judging by this discussion xxHash is the new hotness?
| bmn__ wrote:
| > good randomness, few collisions on typical input data, and
| universal implementation
|
| The relevant standards body NIST has issued SHA-3 in 2015.
|
| Spread the word that programmers should not reach for outdated
| hash algos. <https://valerieaurora.org/hash.html>
| tptacek wrote:
| The chart on that page has always squicked me out a bit,
| because SHA-2 is certainly still "considered strong" by
| cryptography engineers.
| pepoluan wrote:
| xxHash is not really "new", it's been available since 2014. At
| least, that's the oldest Python bindings available on PyPI:
|
| https://pypi.org/project/xxhash/0.0.1/
|
| Sure, as it developed, xxHash variants appear. The latest in
| the family is XXH3 and it's already being used at least since
| 2019:
|
| http://fastcompression.blogspot.com/2019/03/presenting-xxh3....
| chronogram wrote:
| You want to have XXH128 for that right? 128 bit, portable,
| virtually impossible to collide, and only slightly slower than
| XXH3 while still way faster than most options.
| pspeter3 wrote:
| I'm curious too about the differences between Murmurhash and
| xxHash. I've been using Murmur in my side projects.
| pkhuong wrote:
| The production of seed-independent collisions for various
| versions of murmurhash (e.g.,
| http://emboss.github.io/blog/2012/12/14/breaking-murmur-
| hash...) motivated siphash. In general, when there's no
| positive proof of collision bounds, I would assume (un)lucky
| inputs can collide much more often than usual, even when you
| change the seed.
|
| The difference between murmurhash and xxhash in that regard
| is that anyone can download code to generate murmurhash
| collisions, while code to make xxhash cry is still
| unknown/secret.
|
| XXH3 is also a lot faster for long strings, and I believe
| comparable to murmurhash for short inputs.
| tromp wrote:
| > I wish there was a consensus non-cryptographic hash algorithm
|
| I think Siphash serves that role pretty well. It's one of the
| most if not the most secure among non-cryptographic hashes, and
| quite speedy, especially with SIMD implementations. One need
| only consider alternatives if one wants to trade off a bit of
| security for even higher speed.
| NelsonMinar wrote:
 | Siphash is more like a crypto hash, right? According to
 | smhasher its performance is somewhere between a non-crypto
 | hash like xxHash or Meow and something like SHA-1.
| tromp wrote:
| As its Wikipedia page says:
|
| Although designed for use as a hash function to ensure
| security, SipHash is fundamentally different from
| cryptographic hash functions like SHA in that it is only
| suitable as a message authentication code: a keyed hash
| function like HMAC. That is, SHA is designed so that it is
| difficult for an attacker to find two messages X and Y such
| that SHA(X) = SHA(Y), even though anyone may compute
| SHA(X). SipHash instead guarantees that, having seen Xi and
| SipHash(Xi, k), an attacker who does not know the key k
| cannot find (any information about) k or SipHash(Y, k) for
 | any message Y ∉ {Xi} which they have not seen before.
| Traubenfuchs wrote:
| > we have benchmarked all the ones we could find
|
| Those results would increase the value of this article from "very
| low" to "medium".
| chalst wrote:
| The README for xxhash has benchmarks covering fast hashes
| including Meow:
|
| https://github.com/Cyan4973/xxHash/wiki/Performance-comparis...
| cb321 wrote:
| I have measured (on a Skylake core) the hash from Daniel Lemire,
| Owen Kaser, "Faster 64-bit universal hashing using carry-less
| multiplications, Journal of Cryptographic Engineering (to
| appear)" at t_cycles = (0.00741 +- 0.00098 cycles/byte) * bytes +
| (37.43 +- 0.48) which is roughly 135 B/cycle (675 GB/s at 5 GHz)
| or over 2x faster than the claim here. Admittedly, that 37 cycle
| fixed cost is probably quite a bit higher, and I don't know how
| well the bit mixing compares.
|
| Parenthetically, a time equation (like t_cycles = coef * bytes +
| fixedCost) is a more convenient way to summarize performance of
| this kind of thing than the constant short string/long string
| mumble-mumble. Yes, loop unrolling/etc. can create step-ish
| functions and a linear regression is pretty approximate (but my
| stderrs are pretty small above), but even so..much less tripping
| over words.
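Fitting such a time equation (t = coef * bytes + fixedCost) to measured minimum times is ordinary least squares; a dependency-free sketch:

```python
def fit_time_model(sizes, times):
    """Ordinary least squares for t = coef * n + fixed_cost.
    `sizes` are input lengths in bytes, `times` the corresponding
    minimum-of-many-runs timings (e.g. in cycles)."""
    n = len(sizes)
    mx = sum(sizes) / n
    my = sum(times) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(sizes, times))
    var = sum((x - mx) ** 2 for x in sizes)
    coef = cov / var              # cycles per byte
    return coef, my - coef * mx   # (slope, fixed cost)
```

The slope is the asymptotic cycles/byte figure and the intercept is the per-call fixed cost, which is exactly the two-number summary used above.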
| pkhuong wrote:
| umash (https://github.com/backtrace-labs/umash) has a similar
| structure PH block structure (and similar universality
| guarantees), but was designed for decent bit mixing (enough to
| satisfy smhasher, unlike CLHASH, which needs an additional
| finalizer) with a lower fixed time cost: 22 cycles for a one-
| byte hash on a 2.5 GHz Xeon 8175M.
|
| I'm not sure how one would use that linear regression. What
| kind of hardware offers 675 GB/s of memory bandwidth? 140
| bytes/cycle is easily more than twice the L2 read bandwidth
| offered by any COTS chip I'm aware of. There are also warm up
| effects past the fixed cost of setup and finalizers that slow
| down hashing for short input. For what range of input sizes
| (and hot/cold cache state) would you say the regression is a
| useful model?
| cb321 wrote:
| I was only doing short < 1024 byte strings in L1 (like URI
| lengths-ish). I'm well aware no memory system on typical HW
| can deliver that (EDIT: "at large space scale"). The article
| itself mentions a similar calculation (but at their slower
| scale) which is why I included it.
|
| And yes, warm up absolutely matters. My numbers are for the
| best case path which is among the only easy things to even
| _define_ , IMO. Worst and average are sooooo much
| harder/assumption-riddled. Then you argue a lot about said
| assumptions.
|
| I do the minimum times of many runs in an inner loop and the
| regression on those min-time results. So, the min-loops will
| warm up everything like branch predictors, etc. I'd suspect
| the regression model would hold up to usable L1 cache (e.g.
| 30 KiB or so). Beyond that, it's mostly just measuring the
| memory system not the hash function..and I'm not sure what
| the point of that is. Separation of concerns would suggest L1
| perf is most relevant to the "hash calculation part". (IMO,
| many benchmarks confuse this.)
|
| And, yeah..modern memory systems suck. Simple back of the
| envelope would suggest some DIMM-oriented workload would be
| waiting on the DIMMs like 95% of the time, usually much
| higher on Xeons with their low single core BW (maybe 98%!).
| The Meow article sort of alludes to that as well with less
| detail. Single core Xeon BW is always a huge disappointment.
| Forces you to go (very) parallel just to saturate DIMMs
| (EDIT: which in turn cannot saturate even a single L1!).
|
| BTW, it sounds like you know all of this (and maybe more) and
| I don't mean to suggest otherwise, but I thought some
| numbers/details might assist other passersby.
|
| { EDIT: Also, the stderr on that slope is ~13%. So, it's
| possible 135 is really 128 (1.05x lower), the 2 cache
| line/cycle load limit or something like that. This doesn't
| mean the 640 GB/s rate number is wrong - it just underscores
| how mismatched memory systems are at their various scales. }
|
| { EDIT: and yes, 128/16 = 8x faster than meow, not 2x. Oops.
| }
| pbsd wrote:
| 1/135 cycles per byte on Skylake is just plain impossible, even
| if the hash consisted of simply one xor per 32 bytes of input.
|
| The lower bound for CLHASH would be the cost of one carryless
| multiplication per 16 bytes of input, or in other words 1/16 ~
| 0.0625 cycles per byte, since you can only issue one such
| multiplication per cycle.
| cb321 wrote:
| Feel free to measure it yourself instead of just speculating.
| (And as mentioned elsewhere, it is probably 1/128.) { EDIT:
| and I never measured meow - it could be faster than 1/16
| cycle/byte but poorly assessed; just be sure to use a min
| loop and small data. }
| arduinomancer wrote:
| A collision was reported a couple months ago
|
| https://github.com/cmuratori/meow_hash/issues/80
| Accacin wrote:
| The back and forth in that thread is incredibly interesting
| too.
| cbsmith wrote:
| As noted in the post title, the original article is from 2018. I
| had either forgotten about Meow hash or just never knew about it,
| so I appreciate the post, but I was curious about the timing of
| the post.
| bmn__ wrote:
| > I was curious about the timing of the post.
|
| cmuratori mentioned meowhash in passing in a recent (i.e. few
| days ago) Refterm or optimisation video. It is possible that
| user DeathArrow watched this, too, and decided to submit the
| project homepage to HN.
| cateye wrote:
 | I really enjoyed reading this; the explanation is clear and
 | concise, and works well as documentation.
|
| While I was totally not interested in a hash function whatsoever,
| I read the whole thing :)
| rurban wrote:
| For comparison: https://github.com/rurban/smhasher/
|
 | It's indeed pretty fast (among the top 5), but not particularly
 | good. It fails some basic tests. Rather use xxh3, t1ha, or
 | farmhash.
| daniel-thompson wrote:
| Can you elaborate on what it fails?
| https://mollyrocket.com/meowhash claims:
|
| > It cleanly passes every test in smhasher
| pepoluan wrote:
| MeowHash page last updated 2018.
|
| smhasher page last updated several days ago.
| [deleted]
| oefrha wrote:
| Discussed at the time:
| https://news.ycombinator.com/item?id=18262627
| dang wrote:
| Thanks! Macroexpanded:
|
| _Meow Hash: A high-speed non-cryptographic hash function_ -
| https://news.ycombinator.com/item?id=18262627 - Oct 2018 (68
| comments)
|
| Also:
|
| _Cryptanalysis of Meow Hash_ -
| https://news.ycombinator.com/item?id=27978878 - July 2021 (9
| comments)
| wolf550e wrote:
| How does it compare to XXH3?
|
| https://github.com/Cyan4973/xxHash
|
| https://github.com/rurban/smhasher#readme says XXH3 is 16GB/s
| while meow is 17GB/s.
| caeril wrote:
| > How does it compare to XXH3?
|
| XX is a slightly better hashing function, meow is slightly
| faster.
|
| Take your pick. Life is about tradeoffs.
| pepoluan wrote:
| Well, the page for MeowHash is written in 2018.
|
| xxHash Performance Comparison page is last updated in 2021,
| and in that page XXH3 is faster than MeowHash:
|
| https://github.com/Cyan4973/xxHash/wiki/Performance-
| comparis...
| DeathArrow wrote:
 | An i7-9700K can reach up to 4.9 GHz and Meow Hash can hash
 | at a rate of 16 bytes per cycle. That means 78 GB/s.
|
| xxHash rate is 31.5 GB/s on that CPU.
| wolf550e wrote:
| Scroll down, XXH3 is 59 GB/s using AVX2 while meow is 58
| GB/s in their benchmark.
| snapetom wrote:
| I've been using Hashids (https://hashids.org/). It's been fine
| performance-wise and well supported across languages.
| andruc wrote:
| Care to share your definition of fine, and give at least a
| ballpark on the size of the assets you're using it on?
| snapetom wrote:
| It's a pretty small app for some researchers. Front end is
| Angular, backend is an API in Bottle. At most, there's 10
| people generating data, and about 40 people on reading data.
| Most times are 10 users uploading, and 5-10 users reading.
| The readers are external partners. Hashid is to obfuscate ids
| we have in another system so these external people cannot tie
 | the data back due to privacy issues. It generates about 1000
 | rows of data across 4 PostgreSQL tables in a given day.
|
| I like how the algorithm has been well supported across
| languages. We have no problems with it on the Python side. An
 | analyst runs monthly reports using R against the data (so
 | about 30k records pulled). Numbers are summarized and stats
 | are generated. The hashing isn't a bottleneck at all.
___________________________________________________________________
(page generated 2021-10-29 23:00 UTC)