[HN Gopher] How Good Is Monterey's Visual Look Up?
       ___________________________________________________________________
        
       How Good Is Monterey's Visual Look Up?
        
       Author : ingve
       Score  : 65 points
       Date   : 2022-03-16 08:43 UTC (1 day ago)
        
 (HTM) web link (eclecticlight.co)
 (TXT) w3m dump (eclecticlight.co)
        
       | mistersquid wrote:
       | Visual Lookup performs poorly on non-Western historical art such
       | as the Fart Battle Scroll (including non-bowdlerized versions).
       | [0] [1]
       | 
       | Hopefully Visual Lookup's data set will improve with time and
       | usage.
       | 
       | [0]
       | https://archive.wul.waseda.ac.jp/kosho/chi04/chi04_01029/chi...
       | 
       | [1] https://www.tofugu.com/japan/fart-scrolls/
        
         | KarlKemp wrote:
         | The feature isn't (yet) available outside their larger western
         | markets: https://www.apple.com/ios/feature-
         | availability/#visual-look-... It will presumably improve over
         | time.
        
         | reayn wrote:
         | I don't know what I expected to see after clicking through
         | to that article, but I can't say I'm disappointed lmao.
        
       | dymk wrote:
       | Just for famous paintings, apparently. Surely the article could
       | have shown off more than just that?
       | 
       | I guess there was a single mention of a Havanese dog.
        
         | ajmurmann wrote:
         | For what it's worth, I just tried this with a contemporary
         | painting I own that's not famous but at least came through
         | the gallery system, and it was recognized.
         | 
         | Edit: it just gives me a link to
         | https://www.artsy.net/artwork/mark-andrew-bailey-ingess It
         | does not present any information like artist, size, etc. in
         | the iOS interface when showing this painting, just the link.
         | Still pretty cool.
        
         | felixthehat wrote:
         | Just tried it on my pup, a triumph! https://imgur.com/a/bpkSTuu
        
           | Rebelgecko wrote:
           | Those results are certainly interesting from a geopolitical
           | perspective. It looks like there's an internal mapping of
           | Tibet -> PRC?
        
             | felixthehat wrote:
             | 'What have I told you about chewing my shoes and getting
             | embroiled in geopolitics?'
        
       | recursive wrote:
       | It doesn't seem to say it explicitly anywhere. For anyone as
       | confused as I was trying to make sense of what this article is
       | actually about: it appears to be a new macOS feature.
        
       | acdha wrote:
       | I had a ton of pictures from a trip to Portugal in 2003 which I
       | had roughly located (the dates told me which city I was in) but
       | the integrated lookups in Photos made it pretty easy to see exact
       | names and locations for most of the historic buildings, artwork,
       | etc.
       | 
       | The big thing I wish they had is a workflow optimization: it'd
       | be great if there were a way to copy a location with a single
       | click to temporally adjacent photos, since if you took a
       | picture of, say, a famous church, you could safely assume that
       | the close-up details of stonework 3 minutes later were in the
       | same place.
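       | 
       | Something like that would be easy to script if Photos exposed
       | it. A rough sketch in Python, over a hypothetical list of photo
       | records (the field names are made up, not any real Photos API):
       | 
       |     from datetime import timedelta
       | 
       |     def propagate_locations(photos, window=timedelta(minutes=5)):
       |         # photos: dicts with 'time' and optional 'location',
       |         # sorted by timestamp; copy the last known location
       |         # onto unlocated shots taken within `window` of it
       |         last_time, last_loc = None, None
       |         for photo in photos:
       |             if photo.get("location"):
       |                 last_time, last_loc = photo["time"], photo["location"]
       |             elif last_loc and photo["time"] - last_time <= window:
       |                 photo["location"] = last_loc
       |         return photos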
        
       | GekkePrutser wrote:
       | I have mixed feelings about this...
       | 
       | It sounds like a useful feature, but I don't want to help Apple
       | train the algorithms they're still planning to use to snoop on
       | our computers. Their plans are only 'on hold', not cancelled,
       | which sounds a lot like they're waiting for the upheaval to
       | blow over, or for some other vendor to introduce this so they
       | can point the finger at them and say they're not doing anything
       | unprecedented.
       | 
       | So I probably won't use it; I've already stopped using Apple's
       | built-in Photos app and most of iCloud anyway. Not that I have
       | anything to hide, I just don't want big tech looking over my
       | shoulder. It was great when Apple was one of the last to take a
       | stand on privacy, and I'm sad at the ease with which they threw
       | it out the window.
        
         | KarlKemp wrote:
         | They are not asking for users to correct or provide any
         | information, so you really aren't helping. Plus, it isn't even
         | obvious how tags on your sightseeing photos would be used to
         | improve CSAM detection.
        
       | KarlKemp wrote:
       | Of note, Photos.app has long had some image recognition features,
       | and they have worked quite well for me in the past.
       | 
       | As one example, searching for "paper" brings up dozens of hits
       | in my library of thousands, including a fair number where it took
       | me a while to find the paper. It somehow manages to find two
       | portraits where the person is wearing paper-in-plastic-sleeve ID
       | tags, but not any of the almost identical portraits with all-
       | plastic IDs.
        
       | WoodenChair wrote:
       | You can do this in iOS Safari too, in the latest version, with
       | a long press.
        
       | morpheuskafka wrote:
       | > Confusion between the Pissarro and Gallen-Kallela paintings
       | above resulted from a 'collision', in which their Neural Hashes
       | are sufficiently similar to result in misidentification, one of
       | the reasons that Apple was dissuaded from using this technique to
       | detect CSAM.
       | 
       | So if two literal paintings made centuries ago can cause a hash
       | collision, there's no way this ever should have been considered
       | for matching against files that no one else can see or do
       | research with, for the most serious crime/reputation damage
       | imaginable. It would not even be remotely hard to make up some
       | collisions, and it could probably be done even without the
       | original dataset.
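       | 
       | NeuralHash is a learned embedding hash, but even the classic
       | 'average hash' shows why similar compositions collide: only 64
       | bits of coarse luminance structure survive. A toy sketch in
       | Python (using Pillow; this is not Apple's algorithm):
       | 
       |     from PIL import Image
       | 
       |     def average_hash(path, size=8):
       |         # shrink to 8x8 grayscale; one bit per pixel,
       |         # set by comparison against the mean luminance
       |         img = Image.open(path).convert("L").resize((size, size))
       |         px = list(img.getdata())
       |         mean = sum(px) / len(px)
       |         return int("".join("1" if p > mean else "0"
       |                            for p in px), 2)
       | 
       |     def hamming(a, b):  # small distance == declared 'match'
       |         return bin(a ^ b).count("1")
       | 
       | Two unrelated paintings with a similar tonal layout can land
       | within a few bits of each other, which is exactly the kind of
       | collision the article describes.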
        
         | KarlKemp wrote:
         | As Apple stated at the time, any action would require more
         | than one match. For any hash value of a given size, it is
         | trivially easy to calculate the probability of collisions and
         | to adjust required thresholds for any arbitrary rate of false
         | positives.
         | 
         | If the FP rate is 1/1000, requiring three "hits" makes it
         | 1/1,000,000,000, or essentially zero.
        
           | Someone wrote:
           | The false positive rate has to be a lot lower for this to
           | work.
           | 
           | If it is 1/1000, it is only 1/1,000,000,000 if they have
           | only 3 images from a customer. They typically have
           | thousands, though. A 1:1000 false positive rate would mean
           | several false positives in many iCloud photo libraries.
           | 
           | On the plus side, in case of multiple hits, they would have a
           | human look at the images.
           | 
           | The whole thing was intended as a way to make that human
           | check economically viable. Instead of having people look at
           | every picture uploaded to iCloud, they would filter out
           | almost all of them and only let humans look at the few
           | remaining (where, I guess, 'few' could still be a lot,
           | given their number of users).
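           | 
           | Back-of-the-envelope, assuming independent per-image false
           | positives:
           | 
           |     from math import comb
           | 
           |     def p_at_least(k, n, p):
           |         # P(at least k false positives among n images)
           |         return 1 - sum(comb(n, i) * p**i * (1 - p)**(n - i)
           |                        for i in range(k))
           | 
           |     p_at_least(3, 3, 1/1000)       # ~1e-9, parent's figure
           |     p_at_least(3, 10_000, 1/1000)  # ~0.997, near-certain
           | 
           | So the per-image rate has to sit far below 1/1000 for a
           | multi-hit threshold to do the work described above.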
        
             | IfOnlyYouKnew wrote:
             | They wouldn't have a human look at it. They don't have
             | the images, only the hashes; that's the point.
             | 
             | And it's somewhat irrelevant how the probability of
             | collisions is specifically calculated (1/1000 already
             | assumed 1:n comparisons), as long as we agree it's easy
             | to calculate for a given user. The algorithm _does_ know
             | about the sizes of the respective image libraries, for
             | example, and could adjust the threshold with precision.
        
               | Someone wrote:
               | They don't have the images, but they do have "visual
               | derivatives". https://www.apple.com/child-
               | safety/pdf/CSAM_Detection_Techni...:
               | 
               |  _"The device creates a cryptographic safety voucher that
               | encodes the match result. It also encrypts the image's
               | NeuralHash and a visual derivative. This voucher is
               | uploaded to iCloud Photos along with the image.
               | 
               | [...]
               | 
               | Once more than a threshold number of matches has
               | occurred, Apple has enough shares that the server can
               | combine the shares it has retrieved, and reconstruct the
               | decryption key for the ciphertexts it has collected,
               | thereby revealing the NeuralHash and visual derivative
               | for the known CSAM matches."_
               | 
               | https://www.apple.com/child-
               | safety/pdf/Security_Threat_Model... is even clearer:
               | 
               |  _"The decrypted vouchers allow Apple servers to access a
               | visual derivative - such as a low-resolution version - of
               | each matching image.
               | 
               | These visual derivatives are then examined by human
               | reviewers"_
        
           | giantrobot wrote:
           | That assumes that only "bad" images exist in the CSAM
           | corpus. There's no guarantee of that, and it's not
           | something that can be audited in a meaningful way. In the
           | US, the only place CSAM images can _legally_ exist is
           | NCMEC. Even someone wanting to build a detection system
           | can't access the corpus directly and has to rely on some
           | convoluted system of hashes.
        
         | outworlder wrote:
         | > So if two literal paintings made centuries ago can cause a
         | hash collision, there's no way this ever should have been
         | considered for matching against files that no one else can
         | see or do research with, for the most serious
         | crime/reputation damage imaginable.
         | 
         | It simply does not follow that the classifier for CSAM would
         | have the same rate of false positives. There isn't enough
         | information to infer that.
        
       | jchw wrote:
       | Based on the article, I'd expect Apple was retooling their CSAM
       | scanner to try to catch art thieves.
       | 
       | Jokes aside, I'd like to take this opportunity to express
       | something I really want: I really wish I could search the
       | Wayback Machine with perceptual hashes. Google Images has had
       | search by image for a long time, but it seems to drop content a
       | while after it goes offline. Meanwhile, the Internet Archive
       | has a ton of images you basically can't find elsewhere anymore,
       | and depending on _how_ a page was archived, it may be very
       | difficult to find if you don't already know the URL. For the
       | sake of preservation, that would be genuinely amazing. You
       | could go from a single thumbnail or image and potentially find
       | more images or better versions.
       | 
       | It's not like being able to identify common objects and
       | artifacts with a phone camera isn't super cool, but it's far
       | from perfect, and in some of its more novel use cases (such as
       | helping blind people navigate) that can be troublesome. Nothing
       | technically stops the aforementioned Internet Archive phash
       | index except that there will probably never be enough resources
       | to create or maintain such an index.
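       | 
       | The query side, at least, is simple once an index exists; the
       | hard part is purely scale. A toy lookup over a hypothetical
       | list of (hash, url) pairs:
       | 
       |     def search(index, query, max_dist=5):
       |         # linear scan; at real scale you'd want a BK-tree
       |         # or multi-index hashing instead of a full pass
       |         return [url for h, url in index
       |                 if bin(h ^ query).count("1") <= max_dist]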
        
         | dessant wrote:
         | Internet Archive has an experimental API to perform reverse
         | image searches.
         | 
         | https://archive.readme.io/docs/reverse-image-search-api
         | 
         | There is also RootAbout: http://rootabout.com/
         | 
         | You may have a better chance of finding the image by searching
         | on a couple dozen search engines using my extension.
         | 
         | https://github.com/dessant/search-by-image#readme
        
         | KarlKemp wrote:
         | For as long as it lasts, Yandex is my go-to reverse image
         | search favorite, by a rather large margin.
         | 
         | Try searching with a portrait... it is unlikely to find the
         | person, unless there are images of that person on Russian
         | social media. But it will find your identical twin behind the
         | ironic curtain.
        
         | gregsadetsky wrote:
         | Agreed that it'd be great to have that phash image index. A
         | full-text search of the Wayback Machine's archives would be
         | amazing to have as well!
         | 
         | I've been putting off the idea of starting a server that
         | would request archives from the Wayback Machine, parse text
         | from the HTML documents, and create the world's-simplest-
         | search-index, i.e. just the location (document id) of every
         | encountered word. There's a ton of problems with this "plan",
         | but... having any search would be better than nothing?
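         | 
         | The index itself really is the simple part; roughly:
         | 
         |     from collections import defaultdict
         | 
         |     def build_index(docs):
         |         # word -> set of document ids, nothing fancier
         |         index = defaultdict(set)
         |         for doc_id, text in docs.items():
         |             for word in text.lower().split():
         |                 index[word].add(doc_id)
         |         return index
         | 
         |     idx = build_index({1: "famous church", 2: "church nave"})
         |     idx["church"]  # {1, 2}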
        
           | jchw wrote:
           | Honestly, I think possibly the biggest problem with
           | indexing the Wayback Machine is simply the size. I'm pretty
           | sure it's growing far faster than anyone can pull WARCs out
           | for indexing, especially because, well, it's not exactly
           | high throughput on the download side. I don't blame anyone
           | for that, but it does make the prospect of externally
           | indexing it feel a bit bleak.
           | 
           | At this point, I'd like it if there were just tools to index
           | huge WARCs on their own. Maybe it's time to write that.
        
             | gregsadetsky wrote:
             | Right, the download speed is definitely an issue (and like
             | you say, it's quite understandable considering the
             | volume/traffic they deal with), and the continual growth is
             | one of many factors I didn't consider.
             | 
             | I wonder if the IA would allow someone to interconnect
             | directly with their storage datacenter, if one were to
             | submit a well-articulated plan to create this search
             | index/capability.
             | 
             | Also, what do you mean by tools to index WARCs?
             | Specifically, the gzip + WARC parsing + html parsing steps?
             | Would the (CLI?) result be text extracted from the original
             | html pages, i.e. something along the lines of running
             | `strings` or beautifulsoup?
        
               | jchw wrote:
               | Yeah, pretty much. Though being able to directly load
               | data into a search cluster like Elastic would be nice.
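               | 
               | The extraction step could be as small as this, assuming
               | the warcio and beautifulsoup4 packages:
               | 
               |     from warcio.archiveiterator import ArchiveIterator
               |     from bs4 import BeautifulSoup
               | 
               |     def extract_text(warc_path):
               |         # yield (url, text) per HTML response record
               |         with open(warc_path, "rb") as f:
               |             for rec in ArchiveIterator(f):
               |                 if rec.rec_type != "response":
               |                     continue
               |                 ct = (rec.http_headers.get_header(
               |                     "Content-Type") or "")
               |                 if "html" not in ct:
               |                     continue
               |                 url = rec.rec_headers.get_header(
               |                     "WARC-Target-URI")
               |                 html = rec.content_stream().read()
               |                 soup = BeautifulSoup(html,
               |                                      "html.parser")
               |                 yield url, soup.get_text(" ",
               |                                          strip=True)
               | 
               | From there, bulk-loading the (url, text) pairs into a
               | search cluster is the easy part.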
        
       ___________________________________________________________________
       (page generated 2022-03-17 23:01 UTC)