[HN Gopher] I accidentally built a meme search engine
___________________________________________________________________
I accidentally built a meme search engine
Author : EamonLeonard
Score : 296 points
Date : 2024-04-12 18:13 UTC (2 days ago)
(HTM) web link (harper.blog)
(TXT) w3m dump (harper.blog)
| systemz wrote:
| Interesting, I knew about something similar but more focused on
| server side: https://github.com/simon987/sist2
| bo0tzz wrote:
| Last year we added CLIP-based image search to https://immich.app/
| and even though I have a pretty good understanding of how it
| works, it still blows my mind damn near every day. It's the
| closest thing to magic I've ever seen.
| apricot13 wrote:
| Just needs OCR for the perfect meme searching solution!
| bo0tzz wrote:
| OCR will be there at some point, but it already does a
| surprisingly good job without!
| itake wrote:
| I'd also consider adding search via QR codes. You could
| search by the content of the QR code (like the URL) or, if
| it's a URL, search the content of the page it points to.
| Eisenstein wrote:
| Why? I can't think of a use for this.
| dsvf wrote:
| Happy immich user here! I once took a cute photo of our baby
| chewing on a whisk, and actually finding the correct photo in
| an unsorted, untagged huge pile of photos by simply searching
| for "whisk" was a mind-blowing experience! It is an amazingly
| powerful tool!
| pwillia7 wrote:
| How does it compare to Google Photo search? I search things
| like 'whisk' with success regularly... though to be fair not
| as random as whisk, but things like "steering wheel"
| fxtentacle wrote:
| Wow I have been looking for something like that for a long
| time. Thanks for telling me about immich :)
| wruza wrote:
| The "Demo portal" link breaks back button, btw.
| atif089 wrote:
| Thanks for sharing about immich. I have a task that has been on
| my to-do list for several years now.
|
| Amongst all the WhatsApp media on my phone I would like to get
| a list of all the videos and photos with my family in it and
| then delete the rest.
|
| Is something like this possible with immich?
| robotnikman wrote:
| Gives me an idea for a meme search service I can use locally to
| search through all the images on my computer to find a specific
| meme (I tend to know I downloaded a funny one and then when I
| want to share it with someone I can never find it)
| rmdes wrote:
| I want to do this but for 30GB of PDFs
| harper wrote:
| this shouldn't be too hard
| mft_ wrote:
| I steered a friend towards Paperless (and away from an LLM
| solution) as a way of searching/accessing GBs of architectural
| PDFs recently - so far, it's apparently working well for them.
|
| https://github.com/paperless-ngx/paperless-ngx
| rmdes wrote:
| I have been playing with it for a while, but I miss a
| conversational interface where I can interrogate the PDFs
| and summarize them or, let's say, find all the main events per
| year in a corpus of text and build a timeline of said events
| (context: a legal case with tons of text data to parse)
| ritavdas wrote:
| Did you host any version of this on the cloud for the general
| public to access?
| harper wrote:
| Nope. I would love to make something public using this tech. It
| is so magical.
| araes wrote:
| We're almost getting back to the .com era of the 2000's with
| some of these "public cloud" company demos. Enough frenzy, that
| if your app really starts grinding compute cycles you can
| quickly DDOS yourself with server costs. Even at $0.001/request
| [1], if you get 10,000 _HN_ readers who all make 100 requests
| on average, you suddenly get a $1000 server bill from somebody.
| Those used to be on /. all the time circa 2000.
|
| If few convert, and most just tell their friends to try your
| cool demo, you can suddenly have 100,000 _reddit_ users making
| 200+ requests on average every day cause your free demo's so
| cool. And suddenly you're mostly trying to figure out how to
| scrounge up server costs to cover the free parts.
|
| Frankly, seems like the entire industry's probably going to
| have a lot of the same optimizations pretty soon. "How do we
| stop delivering such enormous JPGs with every Amazon/eBay
| click?" and similar.
|
| [1] Slightly old article, so I lowered the $/request on compute a
| bit from $0.0014 to $0.001.
| https://a16z.com/navigating-the-high-cost-of-ai-compute/
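
The back-of-the-envelope numbers in the comment above can be checked with a few lines of Python. This is only a sketch of that arithmetic: the function name `demo_cost` is made up for illustration, and the flat $0.001 per request is the commenter's own rounded assumption, not a measured price.

```python
def demo_cost(users: int, requests_per_user: int, cost_per_request: float) -> float:
    """Total compute bill for a demo, assuming a flat price per request."""
    return users * requests_per_user * cost_per_request


# Assumed price from the comment: $0.001 per request (rounded down from $0.0014).
COST_PER_REQUEST = 0.001

# 10,000 HN readers making 100 requests each (a one-off spike).
hn_bill = demo_cost(10_000, 100, COST_PER_REQUEST)

# 100,000 reddit users making 200 requests each, every day.
reddit_daily_bill = demo_cost(100_000, 200, COST_PER_REQUEST)

print(f"HN spike: ${hn_bill:,.0f}")
print(f"Reddit-scale demo: ${reddit_daily_bill:,.0f} per day")
```

Under those assumptions the HN spike comes to the $1000 quoted in the comment, and the reddit-scale case works out to about twenty times that amount for each day the demo stays popular.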
| ianbicking wrote:
| Huh, are the image vector embeddings implicitly doing OCR as
| well? Because it seems like the meme search is pulling from the
| text as well as images, though it's not entirely clear.
| bo0tzz wrote:
| CLIP does not have explicit OCR support, but it does somewhat
| coincidentally have a slight understanding of text. This is
| explained by training captions containing (some of) the text
| that is in the image.
| osmarks wrote:
| I think the SigLIP models' dataset (WebLi) includes OCRed
| things too, so they have very good text understanding. I
| tested a bunch of things for my own meme search engine.
| osmarks wrote:
| (https://arxiv.org/pdf/2209.06794.pdf page 20.)
| lancehasson wrote:
| This is awesome! We made similar functionality (plus more)
| available through an API. If anyone is interested to try it out
| and share feedback, please message me and I'll hook you up.
| harper wrote:
| would love to check it out
| yreg wrote:
| Last year there was also a very funny project of meme search
| engine leveraging an iPhone farm:
|
| https://findthatmeme.com/blog/2023/01/08/image-stacks-and-ip...
|
| https://news.ycombinator.com/item?id=34315782
| rovr138 wrote:
| You might be interested in this,
| https://github.com/mazzzystar/Queryable, https://queryable.app/
|
| I run it on my iPhone.
|
| Native app. Doesn't require a network connection (great for
| privacy).
|
| > Queryable is a Core ML model that runs locally on your device.
| Leveraging OpenAI CLIP's model encoding technology to connect
| images and text, you can search your iPhone photo album using any
| natural language input. Most importantly, it is completely
| offline, so your album privacy will not be revealed to anyone.
| And, it is open-source: GitHub
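
The idea of "connecting images and text" in the quote above comes down to comparing vectors: the text query and every image are mapped into the same embedding space, and the images whose vectors point in the most similar direction to the query are returned. The sketch below shows only that ranking step. The three-dimensional vectors are invented stand-ins (real CLIP embeddings have hundreds of dimensions and come from a model), and the names `cosine` and `search` are hypothetical helpers, not part of Queryable or any CLIP library.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec: list[float], image_vecs: dict[str, list[float]], top_k: int = 3) -> list[str]:
    """Return the names of the top_k images most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in image_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Toy stand-ins for image embeddings (made-up numbers, not model output).
image_vecs = {
    "whisk.jpg": [0.9, 0.1, 0.0],
    "steering_wheel.jpg": [0.1, 0.9, 0.1],
    "dumpster.jpg": [0.0, 0.2, 0.9],
}

# Toy stand-in for the embedding of the text query "whisk".
query_vec = [0.8, 0.2, 0.1]

results = search(query_vec, image_vecs, top_k=2)
print(results)
```

In a real setup the image vectors would be computed once and stored, so a search only needs to embed the query text and run this comparison, which is why it can work offline on a phone.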
| mazzystar wrote:
| Thank you for your introduction!
|
| After creating Queryable, I also developed an app called
| MemeSearch, which searches for memes on Reddit
| (https://apps.apple.com/us/app/memesearch-reddit-meme-finder/...).
| Although it's completely free, it hasn't been
| downloaded by many users. I thought nobody wanted it, so I'm
| glad to see there are still some people who share a similar
| taste.
| ggrelet wrote:
| Thanks for Queryable, I use it quite often. As for Reddit
| meme finder, how do you deal with Reddit's sudden price
| increase for its API?
|
| Also, I think you should use another icon for this app
| because it looks like a goofy side project. It probably is
| but people would probably not download iPhone apps if the
| icon doesn't look professional. (My two cents)
| mazzystar wrote:
| I made this app when the Reddit API was free : )
|
| As for the icon, I drew it myself. I found it funny.
| rovr138 wrote:
| Definitely! Great app!
|
| Hadn't seen MemeSearch, downloading it now.
| speedgoose wrote:
| It's very cool to see how it's now possible to easily replicate
| old Google Photos features in 10 hours using open-source tools on
| a laptop.
| gardenhedge wrote:
| I think this is based off Google research
| https://github.com/google-research/big_vision
| osmarks wrote:
| The Google research was based on OpenAI research from 2021,
| though.
| diptanu wrote:
| These hacks/side projects are amazing! I feel we will see a lot
| of creativity as tools to build data intensive AI applications
| become easier.
|
| We built and open sourced Indexify
| https://github.com/tensorlakeai/indexify to make it easy to build
| resilient pipelines to combine data with many different models
| and transformations to build applications that rely on
| embeddings or any other metadata extracted by models from videos,
| photos and any documents!
|
| I didn't know about SigLIP, which the author mentioned on the blog;
| need to add this to our library :) I also found it incredible
| that he generated the crawler with Claude! This is the type of
| boilerplate I hope we don't have to write in the future
| thesz wrote:
| It should be named "I accidentally a meme search engine" [1].
|
| [1]
| https://www.reddit.com/r/AskReddit/comments/jooo5/reddit_ori...
| harper wrote:
| I thought of this far too late
| mulmen wrote:
| I think that makes it even better. In the sense that you
| truly accidentally meme'd.
| giancarlostoro wrote:
| Don't you hate it when you accidentally the whole post?
| justinator wrote:
| Hey @harper, you ever write about your vision quests?
| harper wrote:
| Rarely. They are mostly just fucking around. I will try to
| document the current one more.
|
| Fwiw, my recent blog is me trying to do this more
| justinator wrote:
| Cool!
| om8 wrote:
| CLIP is a very interesting technology.
|
| At my previous job, the ML department created an internal tool where
| you could search through city panoramas (like Google Street View)
| using text.
|
| It could find, in a second, all the potholes, overfilled dumpsters
| and other ugly (and beautiful) things you wanted.
| lanej wrote:
| Oh hey Eamo
| brabel wrote:
| > I imagine that we will see this tech rolled into all the
| various photo apps shortly.
|
| Yeah, Google's and Apple's Photos both can search for pictures
| given a description of what you're looking for. In my experience
| both work very well (e.g. search for "cars" in your pics, and
| it'll find all your cars over the years if you, like me, take
| pictures with your cars a lot :) ).
| black_puppydog wrote:
| I love the use of "to Google something" to mean "take something
| that works pretty well and then make it so bad nobody will
| use/notice it"
| pksebben wrote:
| I apologize in advance if you're sick of hearing this, but...
|
| I clicked through to your sites 'cause I dig your angle and I saw
| the bit about the kindle. Ouch, dude. Money sure ain't everything
| but holy crap.
|
| You have my condolences. Keep building awesome shit, please.
|
| edit: followup question - do you still have it?
| harper wrote:
| I will never part with my expensive kindle. Lolol. I read so
| many good books on it.
___________________________________________________________________
(page generated 2024-04-14 23:02 UTC)