[HN Gopher] Application of Locality Sensitive Hashing to audio f...
       ___________________________________________________________________
        
       Application of Locality Sensitive Hashing to audio fingerprinting
        
       Author : pncnmnp
       Score  : 30 points
       Date   : 2024-04-01 11:46 UTC (1 days ago)
        
 (HTM) web link (santhoshhari.github.io)
 (TXT) w3m dump (santhoshhari.github.io)
        
       | abetusk wrote:
       | Does anyone have experience with how well this works?
        
         | johanvts wrote:
         | How well it works for identifying music is going to be an issue
         | of how well you can construct the feature vector which is not
         | covered. If you map every song in to the same small subset of
         | space, LSH will struggle. Map them to well separated places and
         | it will shine.
        
           | bravura wrote:
           | This is wrong, or misleading at best.
           | 
           | It's a known difficult open problem to create a good audio
           | representation (feature vector). Good, in the case of LSH,
           | would be that representation distance correlates well with
           | human perception.
           | 
           | For something like Shazam, you want this representation to be
           | invariant to minor transformations of the audio. That's the
           | interesting problem.
        
         | pncnmnp wrote:
         | Check out Steve and Iran's website
         | https://musicinformationretrieval.com/
         | 
         | They have a bunch of Colab notebooks where they explore various
         | music IR ideas, including LSH.
         | 
         | GitHub:
         | https://github.com/iranroman/musicinformationretrieval.com/
        
       | bequanna wrote:
       | Does anyone know how this approach differs from the approach
       | taken by the popular(ish) dejavu project?
       | 
       | https://github.com/worldveil/dejavu
        
         | willseth wrote:
         | Last I checked Dejavu doesn't use LSH or any approximations. It
         | simply queries for exact matches, "aligns" them, and determines
         | if there is enough signal from the noise to consider it a
         | match.
        
           | bequanna wrote:
           | Ahh, I see.
           | 
           | The approach here is just getting vector nearest neighbors.
           | 
           | I guess I don't see how this could work like "Shazam" when
           | you're comparing the vector of some sample to the vector for
           | the whole song.
        
             | willseth wrote:
             | Oh, no, it's not for the whole song. For each song, you
             | take N samples, each sample gets converted into a
             | fingerprint, and all of the fingerprints are stored in a
             | database. Then when you query, you perform the same sample-
             | to-fingerprint computation, and query the database for
             | matching fingerprints. The database will return many false
             | positives, but only 1 will "align", meaning the relative
             | timing of the fingerprints returned is in sync with an
             | actual song in the database. Once the number of aligned
             | fingerprints reaches a set threshold, you can stop and
             | return it as a result. That's how it performs subset
             | matching.
        
               | bequanna wrote:
               | Gotcha.
               | 
               | So taking n overlapping samples for the song,
               | fingerprinting then vectorizing each. Then doing the same
               | thing for the captured sample and getting nearest
               | neighbors.
               | 
               | I've used dejavu for a personal project and it is quite
               | fast. Is the LSH approach somehow more
               | efficient/attractive now because we've made strides in
               | vector nearest neighbors search?
        
               | willseth wrote:
               | Almost, but dejavu doesn't search nearest-neighbors. It
               | searches for exact matches, so its robustness is
               | essentially based on oversampling. Nearest neighbor
               | search might be a way to achieve similar robustness with
               | fewer fingerprints - assuming it results in fewer false
               | negatives. LSH is one way to approach nearest neighbor
               | search. So in theory you could modify dejavu with an LSH-
               | based NN search and potentially increase robustness with
               | less data, modulo whatever tradeoffs are necessary for
               | computing NN search.
        
       ___________________________________________________________________
       (page generated 2024-04-02 23:01 UTC)