[HN Gopher] How Good Is Monterey's Visual Look Up?
___________________________________________________________________
How Good Is Monterey's Visual Look Up?
Author : ingve
Score : 65 points
Date : 2022-03-16 08:43 UTC (1 day ago)
(HTM) web link (eclecticlight.co)
(TXT) w3m dump (eclecticlight.co)
| mistersquid wrote:
| Visual Lookup performs poorly on non-Western historical art such
| as the Fart Battle Scroll (including non-bowdlerized versions).
| [0] [1]
|
| Hopefully Visual Lookup's data set will improve with time and
| usage.
|
| [0]
| https://archive.wul.waseda.ac.jp/kosho/chi04/chi04_01029/chi...
|
| [1] https://www.tofugu.com/japan/fart-scrolls/
| KarlKemp wrote:
| The feature isn't (yet) available outside their larger western
| markets: https://www.apple.com/ios/feature-
| availability/#visual-look-... It will presumably improve over
| time.
| reayn wrote:
| I don't know what I expected to see after clicking on to that
| article but I can't say I'm disappointed lmao.
| dymk wrote:
| Just for famous paintings, apparently. Surely the article could
| have shown off more than just that?
|
| I guess there was a single mention of a Havanese dog.
| ajmurmann wrote:
| For what it's worth, I just tried this with a contemporary
| painting I own that's not famous but at least came through the
| gallery system, and it was recognized.
|
| Edit: it just gives me a link to
| https://www.artsy.net/artwork/mark-andrew-bailey-ingess It does
| not present any information like artist, size etc in the iOS
| interface when showing this painting, just the link. Still
| pretty cool.
| felixthehat wrote:
| Just tried it on my pup, a triumph! https://imgur.com/a/bpkSTuu
| Rebelgecko wrote:
| Those results are certainly interesting from a geopolitical
| perspective. It looks like there's an internal mapping of
| Tibet -> PRC?
| felixthehat wrote:
| 'What have I told you about chewing my shoes and getting
| embroiled in geopolitics?'
| recursive wrote:
| It doesn't seem to say it explicitly anywhere. For anyone as
| confused as I was trying to make sense of what this article is
| actually about, it appears to be a new macos feature.
| acdha wrote:
| I had a ton of pictures from a trip to Portugal in 2003 which I
| had roughly located (the dates told me which city I was in) but
| the integrated lookups in Photos made it pretty easy to see exact
| names and locations for most of the historic buildings, artwork,
| etc.
|
| The big thing I wish they had was a workflow optimization: it'd
| be great if there was a way to copy the locations with a single
| click and copy them to temporally adjacent photos since if you
| took a picture of, say, a famous church you could safely assume
| that the closeup details of stonework 3 minutes later were in the
| same place.
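The propagation acdha describes is easy to sketch: borrow a location from the nearest already-located photo taken within a small time window. Everything here (the five-minute window, the field names, the sample trip) is illustrative, not anything like Photos' actual internals:

```python
from datetime import datetime, timedelta

def propagate_locations(photos, window=timedelta(minutes=5)):
    """Copy a location from the nearest located photo taken within
    `window` to photos that lack one. `photos`: dicts with 'time'
    (a datetime) and optionally 'location'."""
    located = [p for p in photos if p.get("location")]
    for photo in photos:
        if photo.get("location"):
            continue
        # (time gap, location) pairs for every located photo in range
        candidates = [(abs(photo["time"] - p["time"]), p["location"])
                      for p in located]
        candidates = [c for c in candidates if c[0] <= window]
        if candidates:
            photo["location"] = min(candidates)[1]  # closest in time wins
    return photos

trip = [
    {"time": datetime(2003, 6, 1, 10, 0), "location": "Mosteiro dos Jerónimos"},
    {"time": datetime(2003, 6, 1, 10, 3)},  # stonework closeup, 3 min later
    {"time": datetime(2003, 6, 1, 15, 0)},  # afternoon, nothing located nearby
]
propagate_locations(trip)
```

The 10:03 closeup inherits the monastery's location; the afternoon photo, with no located neighbor inside the window, stays unlabeled.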
| GekkePrutser wrote:
| I have mixed feelings about this.
|
| It sounds like a useful feature but I don't want to help Apple to
| train their algorithms which they're still planning to use to
| snoop on our computers. Their plans are only 'on hold', not
| cancelled. Which sounds a lot like they're waiting for the
| upheaval to blow over, or for some other vendor to introduce this
| so they can point the finger at them and say they're not doing
| anything unprecedented.
|
| So probably I won't use it, I've already stopped using Apple's
| built in photo app and most of iCloud anyway. Not that I have
| anything to hide, I just don't want big tech looking over my
| shoulder. It was great when Apple was one of the last to take a
| stand on privacy and I'm sad at the ease with which they threw it
| out the window.
| KarlKemp wrote:
| They are not asking for users to correct or provide any
| information, so you really aren't helping. Plus, it isn't even
| obvious how tags on your sightseeing photos would be used to
| improve CSAM detection.
| KarlKemp wrote:
| Of note, Photos.app has long had some image recognition features,
| and they have worked quite well for me in the past.
|
| As one example, searching for "paper" brings up dozens of hits
| in my library of thousands, including a fair number where it took
| me a while to find the paper. It somehow manages to find two
| portraits where the person is wearing paper-in-plastic-sleeve ID
| tags, but not any of the almost identical portraits with all-
| plastic IDs.
| WoodenChair wrote:
| You can do this on iOS Safari too in the latest version with a
| long press.
| morpheuskafka wrote:
| > Confusion between the Pissarro and Gallen-Kallela paintings
| above resulted from a 'collision', in which their Neural Hashes
| are sufficiently similar to result in misidentification, one of
| the reasons that Apple was dissuaded from using this technique to
| detect CSAM.
|
| So if two literal paintings made centuries ago can cause a hash
| collision, there's no way this ever should have been considered
| for matching against files that no one else can see or research
| with for the most serious crime/reputation damage imaginable. It
| would not even be remotely hard to make up some collisions, and
| it could probably be done even without the original dataset.
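For intuition on how two unrelated paintings can collide: a perceptual hash maps an image to a short bit string whose Hamming distance tracks visual similarity, so distinct images can land on identical bits. NeuralHash is a proprietary learned model, but a toy difference hash (dHash) over made-up pixel grids shows the mechanism:

```python
def dhash(pixels):
    """Difference hash: one bit per horizontal neighbor pair,
    set when the left pixel is brighter than the right.
    `pixels` is a row-major grid of grayscale values."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

# Two different images whose brightness *gradients* agree,
# so their hashes collide despite distinct pixel values.
a = [[10, 20, 30], [30, 20, 10]]
b = [[12, 22, 28], [31, 19, 9]]
print(hamming(dhash(a), dhash(b)))  # 0 -- a collision
```

Only the sign of each left/right comparison survives into the hash, which is exactly why visually similar (or gradient-similar) images can be indistinguishable to it.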
| KarlKemp wrote:
| As Apple stated at the time, any action would require more than
| one match. For any hash value of a given size, it is trivially
| easy to calculate the probability of collisions and to adjust
| required thresholds for any arbitrary rate of false positives.
|
| If the FP rate is 1/1000, requiring three "hits" makes it
| 1/1,000,000,000, or essentially zero.
| Someone wrote:
| The false positive rate has to be a lot lower for this to
| work.
|
| If it is 1/1000, it is only 1/1,000,000,000 if they have only
| 3 images from a customer. They typically have thousands,
| though. A 1/1000 false positive rate would mean several false
| positives in many iCloud photo libraries.
|
| On the plus side, in case of multiple hits, they would have a
| human look at the images.
|
| The whole thing was intended as a way to make that human
| check economically viable. Instead of having people look at
| every picture uploaded to iCloud, they would filter out
| almost all of them, and only let humans look at the few
| remaining (where, I guess, 'few' still could be a lot, given
| their number of users)
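The arithmetic both comments rely on is binomial. A quick sketch, using the thread's hypothetical 1/1000 rate (not Apple's actual numbers), shows why a threshold that looks safe over 3 photos stops being safe over a 10,000-photo library:

```python
from math import comb

def p_at_least_k(n, p, k):
    """Probability of at least k false positives among n independent
    per-image checks, each with false-positive rate p (binomial tail)."""
    return 1 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

p = 1 / 1000  # hypothetical per-image false-positive rate from the thread

# Three photos, all three must match: p**3, about 1e-9.
print(p_at_least_k(3, p, 3))

# A 10,000-photo library expects ~10 false positives, so crossing
# a 3-hit threshold is nearly certain.
print(p_at_least_k(10_000, p, 3))
```

That is Someone's point: the per-image rate has to be far below 1/1000 for a small fixed threshold to keep the overall false-accusation probability negligible across libraries with thousands of photos.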
| IfOnlyYouKnew wrote:
| They wouldn't have a human look at it. They don't have the
| images, only the hashes; that's the point.
|
| And it's somewhat irrelevant how the probability of
| collisions is specifically calculated (1/1000 already
| assumed 1:n comparisons), as long as we agree it's easy to
| calculate for a given user. The algorithm _does_ know about
| the sizes of the respective image libraries, for example,
| and could adjust the threshold with precision.
| Someone wrote:
| They don't have the images, but they do have "visual
| derivatives". https://www.apple.com/child-
| safety/pdf/CSAM_Detection_Techni...:
|
| _"The device creates a cryptographic safety voucher that
| encodes the match result. It also encrypts the image's
| NeuralHash and a visual derivative. This voucher is
| uploaded to iCloud Photos along with the image.
|
| [...]
|
| Once more than a threshold number of matches has
| occurred, Apple has enough shares that the server can
| combine the shares it has retrieved, and reconstruct the
| decryption key for the ciphertexts it has collected,
| thereby revealing the NeuralHash and visual derivative
| for the known CSAM matches."_
|
| https://www.apple.com/child-
| safety/pdf/Security_Threat_Model... is even clearer:
|
| _"The decrypted vouchers allow Apple servers to access a
| visual derivative - such as a low-resolution version - of
| each matching image.
|
| These visual derivatives are then examined by human
| reviewers"_
| giantrobot wrote:
| That assumes that only "bad" images exist in CSAM corpus.
| There's no guarantee of that and it's not something that can
| be audited in a meaningful way. In the US the only place CSAM
| images can _legally_ exist is NCMEC. Even someone wanting to
| generate a detection system can't access the corpus directly
| and has to rely on some convoluted system of hashes.
| outworlder wrote:
| > So if two literal paintings made centuries ago can cause a
| hash collision, there's no way this ever should have been
| considered for matching against files that no one else can see
| or research with for the most serious crime/reputation damage
| imaginable.
|
| It simply does not follow that the classifier for CSAM would
| have the same rate of false positives. There isn't enough
| information to infer that.
| jchw wrote:
| Based on the article, I'd expect Apple was retooling their CSAM
| scanner to try to catch art thieves.
|
| Jokes aside, I would like to use this opportunity to express
| something I really want: I really wish I could search Wayback
| Machine with perceptual hashes. Google Images has had search by
| image for a long time, but it seems to get rid of content after a
| while once it's offline. Meanwhile, Internet Archive has a ton of
| images you basically can't find elsewhere anymore, and depending
| on _how_ it was archived, it may be very difficult to find it if
| you don't already know the URL. For sake of preservation, that
| would be genuinely amazing. You could go from a single thumbnail
| or image and potentially find more images or better versions.
|
| It's not like being able to identify common objects and artifacts
| with a phone camera isn't super cool, but it's far from perfect
| and in some of its more novel use cases (such as helping blind
| people navigate) that can be troublesome. Nothing technically
| stops the aforementioned Internet Archive phash index except for
| the fact that there will probably never be enough resources to
| create or maintain such an index.
| dessant wrote:
| Internet Archive has an experimental API to perform reverse
| image searches.
|
| https://archive.readme.io/docs/reverse-image-search-api
|
| There is also RootAbout: http://rootabout.com/
|
| You may have a better chance of finding the image by searching
| on a couple dozen search engines using my extension.
|
| https://github.com/dessant/search-by-image#readme
| KarlKemp wrote:
| For as long as it lasts, yandex is my go-to reverse image
| search favorite, by a rather large margin.
|
| Try searching with a portrait... it is unlikely to find the
| person, unless there are images of that person in Russian
| social media. But it will find your identical twin behind the
| ironic curtain.
| gregsadetsky wrote:
| Agreed that it'd be great to have that phash image index. A
| full text search of the Wayback Machine's archives would be
| amazing to have as well..!
|
| I've been putting off the idea of starting a server that would
| request archives from the Wayback Machine, parse text from the
| html documents, and create the world's-simplest-search-index
| i.e. just the location (document id) of every encountered word.
| There's a ton of problems with this "plan", but... having any
| search would be better than nothing?
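The "world's-simplest-search-index" described above, just a mapping from each word to the documents containing it, fits in a few lines. The doc ids and sample text here are made up for illustration:

```python
import re
from collections import defaultdict

def build_index(docs):
    """Inverted index: word -> set of doc ids.
    `docs` maps a document id (e.g. a Wayback URL) to extracted text."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index

docs = {
    "web/2003/example.com": "historic buildings of Lisbon",
    "web/2004/example.org": "Lisbon tram photos",
}
index = build_index(docs)
print(sorted(index["lisbon"]))  # both documents mention Lisbon
```

A real deployment would need tokenization beyond `[a-z0-9]+`, on-disk posting lists, and incremental updates, but even this degenerate form supports the "any search is better than nothing" use case.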
| jchw wrote:
| Honestly, I think possibly the biggest problem with indexing
| Wayback Machine is simply the size. I'm pretty sure it's
| growing far faster than anyone can pull WARCs out for
| indexing, especially because well, it's not exactly high
| throughput on the download side. I don't blame anyone for
| that, but it does make the prospect of externally indexing
| feel a bit bleak.
|
| At this point, I'd like it if there were just tools to index
| huge WARCs on their own. Maybe it's time to write that.
| gregsadetsky wrote:
| Right, the download speed is definitely an issue (and like
| you say, it's quite understandable considering the
| volume/traffic they deal with), and the continual growth is
| one of many factors I didn't consider.
|
| I wonder if the IA would allow someone to interconnect
| directly with their storage datacenter, if one were to
| submit a well articulated plan to create this search
| index/capability.
|
| Also, what do you mean by tools to index WARCs?
| Specifically, the gzip + WARC parsing + html parsing steps?
| Would the (CLI?) result be text extracted from the original
| html pages, i.e. something along the lines of running
| `strings` or beautifulsoup?
| jchw wrote:
| Yeah, pretty much. Though being able to directly load
| data into a search cluster like Elastic would be nice.
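For the HTML-to-text step of such a WARC indexing pipeline (record parsing itself is usually delegated to a library such as warcio; this only sketches text extraction, using the standard library):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from an HTML payload, skipping
    script and style content."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

print(html_to_text("<p>Visual <b>Look Up</b></p>"))  # Visual Look Up
```

The extracted text could then be fed straight into an inverted index or bulk-loaded into a search cluster like Elastic, as suggested above.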
___________________________________________________________________
(page generated 2022-03-17 23:01 UTC)