[HN Gopher] Videohash - Perceptual video hashing python package
       ___________________________________________________________________
        
       Videohash - Perceptual video hashing python package
        
       Author : akamhy
       Score  : 42 points
       Date   : 2021-10-11 15:59 UTC (7 hours ago)
        
 (HTM) web link (pypi.org)
 (TXT) w3m dump (pypi.org)
        
       | phenkdo wrote:
        | Can you explain how you are hashing the video? I took a quick
        | look at the GitHub repo and don't see details...
        
         | fxtentacle wrote:
         | It's open source ;)
         | 
          | Apparently, they extract video frames using FFmpeg, create a
          | collage out of those frames, and then apply the whash
          | (wavelet hash) method of the Python imagehash package to
          | that collage.
          | 
          | So it basically reduces video hashing to image hashing,
          | which is an already-solved problem.
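          | 
          | For illustration, a minimal sketch of such a pipeline (the
          | grid layout, tile size, and function name are assumptions,
          | not the package's actual code):
          | 
          |     import math
          |     from PIL import Image
          |     import imagehash
          | 
          |     def collage_hash(frame_paths, tile=144):
          |         # Arrange the extracted frames into a roughly
          |         # square grid of tile x tile thumbnails.
          |         cols = math.ceil(math.sqrt(len(frame_paths)))
          |         rows = math.ceil(len(frame_paths) / cols)
          |         collage = Image.new("RGB", (cols * tile, rows * tile))
          |         for i, path in enumerate(frame_paths):
          |             frame = Image.open(path).resize((tile, tile))
          |             collage.paste(frame, ((i % cols) * tile,
          |                                   (i // cols) * tile))
          |         # The wavelet hash of the collage stands in for
          |         # the whole video.
          |         return imagehash.whash(collage)
          | 
          | Two such hashes can then be compared with imagehash's
          | subtraction operator, which returns the hamming distance.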
        
         | hetspookjee wrote:
         | I think it creates a collage of the video frames:
         | https://github.com/akamhy/videohash/blob/8759b6ad7fdabcdf4dd...
         | 
         | and passes that on to the videohash.py module to generate a
         | hash:
         | https://github.com/akamhy/videohash/blob/main/videohash/vide...
         | 
          | by using the imagehash library:
         | https://pypi.org/project/ImageHash/
        
         | giantrobot wrote:
          | They're extracting video frames at some interval (default of
          | 1 second) as 144x144px stills and turning them into a square
          | collage. A perceptual hash is then computed on that collage.
         | 
          | The major problem here is that two videos with exactly the
          | same content but slightly different timing (say, one with a
          | couple-second intro) will rarely, if ever, produce a
          | positive match.
         | 
          | The only case where I see this particular scheme being
          | helpful is where you've got videos with the same content but
          | different encodings. The length will be the same, but the
          | quality (and names) of the two encodings might differ. This
          | would help you find them in a sea of files.
         | 
          | A simple improvement would be to first check only the frame
          | from the middle of each video. If the frames at the same
          | timestamp match, you've got a non-zero probability of an
          | overall match. Then you can check more frames, radiating out
          | from the center point. Negative matches will fail fast and
          | save you work. This also handles videos whose lengths differ
          | because of trims or splices at the beginning or end.
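          | 
          | A sketch of that center-out comparison (it assumes frames
          | were already extracted as PIL images at the same sampling
          | rate; phash and the distance threshold are my choices):
          | 
          |     import imagehash
          | 
          |     def center_out_match(a_frames, b_frames, max_dist=8):
          |         n = min(len(a_frames), len(b_frames))
          |         mid = n // 2
          |         # Visit indices ordered by distance from the
          |         # middle frame so mismatches fail fast.
          |         for i in sorted(range(n), key=lambda j: abs(j - mid)):
          |             d = (imagehash.phash(a_frames[i])
          |                  - imagehash.phash(b_frames[i]))
          |             if d > max_dist:
          |                 return False  # negative match, bail early
          |         return True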
         | 
          | A second improvement would be to pick a frame from video A
          | and scan through video B (or a segment of each) to find a
          | high-probability match. Then check other segments of the
          | videos for matches in the same way.
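          | 
          | A sketch of that scan (again with hypothetical names; it
          | returns the index in B of the best match for a single frame
          | hash taken from A, or None):
          | 
          |     import imagehash
          | 
          |     def find_offset(needle_hash, b_frames, max_dist=8):
          |         best = None
          |         for i, frame in enumerate(b_frames):
          |             d = imagehash.phash(frame) - needle_hash
          |             if d <= max_dist and (best is None or d < best[1]):
          |                 best = (i, d)
          |         return best[0] if best else None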
         | 
          | Trying to turn a video into a single static representation
          | and comparing that is not the best approach.
        
           | jsdwarf wrote:
            | Wouldn't it make more sense to convert the video to
            | greyscale, detect significant changes of brightness
            | between frames, and store them as vector coordinates
            | (% of playtime, brightness delta)?
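            | 
            | A minimal sketch of that idea (the threshold and the
            | frame sampling are assumptions):
            | 
            |     from PIL import Image, ImageStat
            | 
            |     def brightness_signature(paths, threshold=16):
            |         # Mean greyscale brightness of each frame.
            |         levels = [ImageStat.Stat(
            |             Image.open(p).convert("L")).mean[0]
            |             for p in paths]
            |         sig = []
            |         for i in range(1, len(levels)):
            |             delta = levels[i] - levels[i - 1]
            |             if abs(delta) >= threshold:
            |                 # (% of playtime, brightness delta)
            |                 sig.append((100.0 * i / len(levels),
            |                             delta))
            |         return sig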
        
             | giantrobot wrote:
             | That could work. But I think limiting your search to
             | brightness patterns is going to make for a lot of false
             | positives. The brightness search might make for a good
              | first pass to find a subset of the corpus for a more
              | in-depth search.
        
       | willcodeforfoo wrote:
        | Tangentially related: what's the state of the art for storing
        | a bunch[1] of hashes like the OP's (or pHash, etc.) in
        | PostgreSQL and querying by hamming distance in a reasonable
        | time?[2]
       | 
       | pg_similarity? pg_trgm? cube?
       | 
       | [1]: 10-50 Million
       | 
       | [2]: < 200ms
        
         | varelaz wrote:
          | I don't know if it makes sense to query by hamming distance
          | for hashes; the closest hashes don't guarantee the closest
          | images at all. You can instead check how many parts of the
          | hash match, with a query like:
          | 
          |     SELECT video_id
          |     FROM video_hashes
          |     WHERE hash IN (...)
          |     GROUP BY video_id
          |     ORDER BY count(DISTINCT hash) DESC
          |     LIMIT 10;
          | 
          | Technically this can be fast, since the selection on hash
          | can be very narrow. You only need an index on (hash,
          | video_id).
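          | 
          | For context, a sketch of the chunking that makes such a
          | query work (the 64-bit hash size and 16-bit parts are
          | assumptions): by the pigeonhole principle, two hashes
          | within hamming distance 3 must agree exactly on at least
          | one of their four 16-bit parts, so exact matches on stored
          | parts can prefilter candidates.
          | 
          |     def hash_parts(h, part_bits=16, total_bits=64):
          |         # Split the hash into fixed-width chunks; store
          |         # one (video_id, chunk) row per chunk and
          |         # exact-match on chunks to find candidates.
          |         mask = (1 << part_bits) - 1
          |         return [(h >> s) & mask
          |                 for s in range(0, total_bits, part_bits)]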
        
           | CaveTech wrote:
            | OP is referring to pHashes, a.k.a. perceptual hashes,
            | where closer hashes should indeed indicate similarity.
        
       | helsinki wrote:
        | You may want to support either bit interleaving or a CNN that
        | emits a vector preserving visual locality across videos, so
        | that small changes between hashes can be ignored.
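        | 
        | One possible reading of the bit-interleaving idea, as a
        | sketch (per-frame 64-bit hashes are an assumption):
        | round-robin the bits of the per-frame hashes so a change in
        | a single frame only touches every Nth bit of the combined
        | value.
        | 
        |     def interleave(frame_hashes, bits=64):
        |         out = 0
        |         # Take one bit from each hash in turn, MSB first.
        |         for pos in range(bits - 1, -1, -1):
        |             for h in frame_hashes:
        |                 out = (out << 1) | ((h >> pos) & 1)
        |         return out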
        
       | varelaz wrote:
        | I used a similar approach for video hashing. Instead of a
        | fixed interval, I used the keyframes found by ffmpeg, so you
        | don't depend on the codec. I also didn't rescale, but took a
        | hash of every frame. For YouTube I found that it still
        | sometimes produces different hashes.
        | 
        | edit: to get only keyframes, use select=eq(pict_type,I)
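        | 
        | For example, a sketch of that extraction (file names are
        | placeholders):
        | 
        |     import subprocess
        | 
        |     # Keep only I-frames (keyframes) and write each one out
        |     # as a numbered PNG still.
        |     subprocess.run([
        |         "ffmpeg", "-i", "input.mp4",
        |         "-vf", "select='eq(pict_type,I)'",
        |         "-vsync", "vfr",
        |         "keyframes_%04d.png",
        |     ], check=True)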
        
         | mzs wrote:
         | faster: -skip_frame nokey
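          | 
          | That is, the decoder itself skips non-keyframes instead of
          | decoding everything and filtering afterwards, e.g.:
          | 
          |     ffmpeg -skip_frame nokey -i input.mp4 -vsync vfr \
          |         keyframes_%04d.png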
        
       ___________________________________________________________________
       (page generated 2021-10-11 23:01 UTC)