[HN Gopher] Simhashing (Hopefully) Made Simple (2012)
       ___________________________________________________________________
        
       Author : kkm
       Score  : 32 points
       Date   : 2021-04-10 10:52 UTC (12 hours ago)
        
 (HTM) web link (ferd.ca)
        
       | Nzen wrote:
       | Simhashing is a technique for characterizing how similar pieces
       | of data are. The author starts from the idea that we can discard
       | the first characters of { aaarock, aabjeep, aaareep } to rank the
       | latter two as most similar, and ends by computing the Hamming
       | distance between the resulting hashes.
        
       | jonathankoren wrote:
       | I wrote something similar several years ago on minhashing for
       | near-duplicate detection.
       | 
       | https://medium.com/@jonathankoren/near-duplicate-detection-b...
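For comparison, a hedged sketch of the minhashing approach the comment mentions, not the code from the linked post. It assumes word bigrams as features and simulates a family of hash functions by salting BLAKE2b with a seed; the fraction of positions where two signatures agree estimates the Jaccard similarity of the feature sets.

```python
import hashlib


def _hash(seed: int, feature: str) -> int:
    # Assumption: one member of the hash family per seed, via a salted BLAKE2b.
    data = f"{seed}:{feature}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def word_shingles(text: str, k: int = 2) -> set[str]:
    # Hypothetical feature choice: overlapping word k-grams (never empty).
    words = text.lower().split()
    return {" ".join(words[i : i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash_signature(features: set[str], num_hashes: int = 128) -> list[int]:
    # For each hash function, keep the minimum hash over the (non-empty) set.
    return [min(_hash(seed, f) for f in features) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    # Fraction of hash functions whose minimums agree.
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def jaccard(a: set[str], b: set[str]) -> float:
    # Exact Jaccard similarity, for checking the estimate.
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    x = word_shingles("the quick brown fox jumps over the lazy dog")
    y = word_shingles("the quick brown fox leaps over the lazy dog")
    print(jaccard(x, y), estimated_jaccard(minhash_signature(x), minhash_signature(y)))
```

More hash functions give a tighter estimate at the cost of a longer signature.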
        
       ___________________________________________________________________
       (page generated 2021-04-10 23:01 UTC)