[HN Gopher] Fuzzy Joins (Minhash)
___________________________________________________________________
Fuzzy Joins (Minhash)
Author : yellowflash
Score : 43 points
Date : 2022-06-24 04:37 UTC (2 days ago)
(HTM) web link (blog.yellowflash.in)
(TXT) w3m dump (blog.yellowflash.in)
| goldenkey wrote:
| Brilliant stuff. Isn't the XORing essentially just equivalent to
| a 1-time pad -- which isn't very hash-like. I'd think using a
| PRNG [1] with the initial hash value as the seed, to generate
| more values, would be more effective.
|
| https://en.wikipedia.org/wiki/Pseudorandom_number_generator
| davesque wrote:
| Yeah, I had a similar intuition about that approach possibly
| being weak or problematic. However, if my cursory reading of
| the algorithm is at all correct, the goal may simply be to
| consistently choose another random hash value (a different min
| hash) and it may work fine. In other words, the same hash
| values would be considered the minimum after XORing with a
| random number _and_ they will be different from the initial
| hash that was selected as the minimum without XORing. Those are
| the behaviors that matter and not so much that the hashes that
| result from the XORing are cryptographically secure.
___________________________________________________________________
(page generated 2022-06-26 23:00 UTC)