[HN Gopher] Simhashing (Hopefully) Made Simple (2012)
___________________________________________________________________
Simhashing (Hopefully) Made Simple (2012)
Author : kkm
Score : 32 points
Date : 2021-04-10 10:52 UTC (12 hours ago)
(HTM) web link (ferd.ca)
(TXT) w3m dump (ferd.ca)
| Nzen wrote:
| Simhashing is a technique for characterizing how similar pieces
| of data are. The author begins with the idea that we can discard
| the first characters of { aaarock, aabjeep, aaareep } to prefer
| the latter two as most similar, and concludes by computing the
| Hamming distance between the hashed data.
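|
| For readers who want to see the idea concretely, here is a minimal
| sketch of a simhash fingerprint plus a Hamming-distance comparison.
| It is not the article's code: the function names, the choice of
| character bigrams as features, the unweighted votes, and the use of
| MD5 truncated to 64 bits are all assumptions made for illustration.

```python
import hashlib


def features(text, n=2):
    """Split text into character n-grams (an assumed feature choice)."""
    return [text[i:i + n] for i in range(max(len(text) - n + 1, 1))]


def simhash(text, bits=64):
    """Build a `bits`-wide fingerprint from per-feature hash votes."""
    counts = [0] * bits
    for feat in features(text):
        # MD5 is used only as a stable, well-mixed hash (assumption);
        # Python's built-in hash() is randomized per process for str.
        digest = hashlib.md5(feat.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            # Each feature votes +1 or -1 on every bit position.
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:  # majority of features had this bit set
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    words = ["aaarock", "aabjeep", "aaareep"]
    for i, x in enumerate(words):
        for y in words[i + 1:]:
            print(x, y, hamming(simhash(x), simhash(y)))
```

| A smaller distance suggests that two inputs share more features.
| With inputs this short the fingerprints are noisy, so treat the
| printed numbers as a demonstration of the mechanics only.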
| jonathankoren wrote:
| I wrote something similar several years ago on minhashing for
| near-duplicate detection.
|
| https://medium.com/@jonathankoren/near-duplicate-detection-b...
| [deleted]
___________________________________________________________________
(page generated 2021-04-10 23:01 UTC)