[HN Gopher] Improving recommendation systems and search in the a...
___________________________________________________________________
Improving recommendation systems and search in the age of LLMs
Author : 7d7n
Score : 312 points
Date : 2025-03-23 03:40 UTC (19 hours ago)
(HTM) web link (eugeneyan.com)
(TXT) w3m dump (eugeneyan.com)
| whatever1 wrote:
| Why don't we have an LLM-based search tool for our PCs /
| smartphones?
|
| Especially for smartphones, all of your data is in the cloud
| anyway; instead of just scraping it for advertising and the FBI,
| they could also do something useful for the user.
| dmbche wrote:
| It doesn't solve any problem; you can just search your files
| using your preferred file explorer (ctrl-f).
|
| I'd assume most people organise their files so that they know
| where things are as well.
| whatever1 wrote:
| But a file explorer does not read the actual files and build
| context. Even for plain text files, which some search
| functions can read, I need to remember the exact string of
| characters I am looking for.
|
| I was hoping an LLM would have context on all of my content
| (text and visual) and, for the first time, use my computer's
| data as a knowledge base.
|
| Queries like "what was my design file for that x service?"
| Today it's impossible to answer unless you have organized
| your data yourself.
|
| Why do we still have to organize our data manually?
| pests wrote:
| The photos apps do this well now. Can search Apple/Google
| photos with questions about the content of images and
| videos and get useful results.
| nine_k wrote:
| > _you can just search your files using your preferred file
| explorer_
|
| This only works if you remember specific substrings. An LLM
| (or some other language model) can summarize and interpolate.
| It can be asked to find that file that mentions a transaction
| for buying candy, and it has a fair chance to find it, even
| if none of the words "transaction", "buying" or "candy" are
| present in the file, e.g. it says "shelled out $17 for a huge
| pack of gobstoppers".
|
| > _I'd assume most people organise their files_
|
| You'll be shocked, but...
| ozim wrote:
| I think the same; people are not organized - even with
| things that make them money, where being organized could
| earn them much more.
| dmbche wrote:
| But isn't that candy example nonsensical? In what
| situation do you need some information without any of the
| context (or without knowing any of the context)?
|
| I really believe that this is not an actual problem in need
| of solving, but instead creating a tool (personal AI
| assistant) and trying to find a use case.
|
| Edit0: note to self, rambling - assuming there exists
| valuable information that one needs to access in their
| files, but one doesn't know where it is, when it was made,
| its name, or other information about it (as you could find
| said file right away with this information).
|
| Say you need some information from documentation like the
| C standard - you need precise information on some process.
| Is it not much simpler to just open the doc and use the
| index? Then again, for you to be aware of the C standard
| makes the query useless.
|
| If it's from something less well organised, say you want
| letters you wrote to your significant other, maybe the
| assistant could help. But then again, what are you asking?
| How hard is it to keep your letters in a folder? Or even
| simply know what you've done (I surely can't imagine
| forgetting things I've created but somehow finding use in an
| LLM that finds it for me).
|
| Like asking it "what is my opinion on x" or "what's a good
| compliment I wrote" is nonsensical to me, but asking it
| about external resources makes the idea of training it on
| your own data pointless. "How did I write X API" - just
| open your file, no? You know where it is, you made it.
|
| Like saying "get me that picture of uncle Tony in Florida"
| might save you 10 seconds instead of going into your files
| and thinking about when you got that picture, but it's not
| solving a real issue or making things more efficient.
| (Edit1: if you don't know Tony, when you got the picture or
| of what it's a picture of, why are you querying? What's the
| use case for this information, is it just to prove it can be
| done? It feels like the user needs to contort themselves
| in a small niche for this product to be useful)
|
| Either it's used for non-valuable work (menial search) or
| you already know how to get the answer you need.
|
| I cannot imagine a query that would be useful compared to
| simply being aware of what's in your computer. And if
| you're not aware of it, how do you search for it?
| mjlee wrote:
| I think your brain may just work differently to mine, and
| I don't think I'm unique.
|
| > "get me that picture of uncle Tony in Florida" might
| save you 10 seconds instead of going into your files and
| thinking about when you got that picture
|
| I don't have a memory for time, and I can't picture
| things in my mind. Thinking about when I took a picture
| does nothing for me, I could be out by years. Having some
| unified natural language search engine would be amazing
| for me. I might remember it was a sunny day and that we
| got ice cream, and that's what I want to search on.
|
| The "small niche" use case for me is often my daughter
| wants to see a photo of a family member I'm talking
| about, or I want to remember some other aspect of the day
| and the photo triggers that for me.
| dmbche wrote:
| Makes a lot of sense. Thanks for the response, enjoy your
| day!
| pizza wrote:
| Here's an example of a type of feature I want: I'm
| looking at a menu from a popular restaurant and it has
| hundreds of choices. I start to feel some analysis
| paralysis. I say to my computer, "hey computer, I'm open
| to any suggestions, so long as it's well-seasoned, spicy,
| salty, has some protein and fiber, easy to digest, rich
| in nutrients, not too dry, not too oily, pairs well with
| <whatever I have in my fridge>, etc.." Basically,
| property-oriented search queries whose answers can be
| verified, without having to trudge through them myself,
| where I don't really care about correctness, just
| satisficing.
| ozim wrote:
| I think you are really wrong.
|
| Most people I see at work and outside don't care; they
| want the stupid machine to deal with it.
|
| That is why smartphones and tablets are moving away from
| providing "file system" access.
|
| It is super annoying for me, but most people want to save
| their tax form or their baby photo without even understanding
| that each is a different file type - because they couldn't
| care less about file types, let alone making a folder
| structure to keep them organized.
| acchow wrote:
| Curiously, the things I search most often are not located in
| files: calendar, photo content/location, email, ChatGPT
| history, Spotify library, iMessage/whatsapp history,
| contacts, notes, Amazon order history
| rudedogg wrote:
| This is roughly what Apple Intelligence was supposed to deliver
| but has yet to.
| visarga wrote:
| I found that ChatGPT or Claude are really good at music and
| shopping suggestions. Just chat with them about your tastes for
| a while, then ask for suggestions. Compared to old recommender
| systems this method allows much better user guidance.
| josephg wrote:
| Yeah, Claude helped me decide what to get my girlfriend for
| her birthday a few weeks ago. It suggested some great gift
| ideas I hadn't thought of - and my girlfriend loved them.
| Workaccount2 wrote:
| I think we can expect this to be rapidly monetized.
| KoftaBob wrote:
| For shopping suggestions, I've had the best experience with
| Perplexity.
| GraemeMeyer wrote:
| It's coming for PCs soon:
| https://www.theregister.com/2025/01/20/microsoft_unveils_win...
|
| And to a certain extent for the Microsoft cloud experience as
| well: https://www.theverge.com/2024/10/8/24265312/microsoft-
| onedri...
| curious_cat_163 wrote:
| > Why don't we have an LLM-based search tool for our PCs /
| smartphones?
|
| I'll offer my take as an outside observer. If someone has
| better insights, feel free to share as well.
|
| In market terms, I think it is because Google, Microsoft and
| Apple are all still trying with varied success. It has to be
| them because that's where a big bulk of the users are. They are
| all also public companies with impatient investors wanting the
| stock to go up and to the right. So, they are both cautious about
| what they ship to billions of devices (brand protection) and
| cautious about "opening up" their OS beyond what they have
| already done (fear of disruption).
|
| In technical terms, it is taking a while because if the tool is
| going to use LLMs, then they need to solve for 99.999% of the
| reliability problems (brand protection) that come with that
| tech. They need to solve for power consumption (either on edge
| or in the data centers) due to their sheer scale.
|
| So, their choices are ship fast (which Google has been trying
| to do more) and iterate in public; or partner with other
| product companies by investing in them (which Microsoft has
| been doing with OpenAI and Google is doing with Anthropic,
| etc.).
|
| Apple is taking some middle path but they just fired the person
| who was heading up the initiative [1] so let's see how that
| goes.
|
| My two cents.
|
| [1] https://www.reuters.com/technology/artificial-
| intelligence/a...
| anon373839 wrote:
| Terrific post. Just about everything Eugene writes about AI/ML is
| pure gold.
| hackernewds wrote:
| haha this is some solid astroturfing Eugene :)
| 7d7n wrote:
| haha that wasn't me ;)
| anon373839 wrote:
| Oops, sorry! Wasn't trying to make you look bad. Just a fan
| of your writing.
| 7d7n wrote:
| Not at all! I appreciate the kind words. Thank you!
| anonymousDan wrote:
| It's interesting that none of these papers seem to be coming out
| of academic labs....
| pizza wrote:
| Checking if a recommendation system is actually good in
| practice is kind of tough to do without owning a whole internet
| media platform as well. At best, you'll get the table scraps
| from these corporations (in the form of toy datasets/models
| made available), and you still will struggle to make your dev
| loop productive enough without throwing similar amounts of
| compute that the ~FAANGs do so as to validate whether that 0.2%
| improvement you got really meant anything or not. Oh, and also,
| the nature of recommendations is that they get very stale very
| quickly, so be prepared to check that your method still works
| when you do yet another huge training run on a weekly/daily
| cadence.
| bradly wrote:
| > you still will struggle to make your dev loop productive
| enough without throwing similar amounts of compute that the
| ~FAANGs do so as to validate whether that 0.2% improvement
| you got really meant anything or not
|
| And do not forget the incredible number of actual humans
| FAANG pays every day to evaluate any changes in result sets
| for top x,000 queries.
| lmeyerov wrote:
| As someone whose customers do this stuff, I'm 100% for most
| academics chasing harder and more important problems.
|
| Most of these papers are specialized increments on high
| baselines for a primarily commercial problem. Likewise, they
| focus on optimizing phenomena that occur in their product,
| which may not occur in others. E.g., Netflix's sliding window is
| neat to see the result of, but I'd rather students use their
| freedom to explore bigger ideas like Mamba, and leave sliding
| windows to a master's student who is experimenting with
| intentionally narrowly scoped tweaks. At that point, the top PhD
| grads at industrial labs will probably win.
|
| That said, recsys is a general formulation with applications
| beyond shopping carts and social feeds, and bigger ideas do
| come out that I'd expect competitive labs to do projects on.
| GNNs for recsys were a big bet a couple of years ago, and LLMs
| now, and it is curious to me that those bigger shifts come from
| industrial lab papers, as you say. Maybe the takeaway is that
| recsys is one of the areas where industry hires a lot of PhDs,
| as it is so core to revenue lift: academia has regular
| representation, while industry is overrepresented.
| jamesblonde wrote:
| It is very interesting that Eugene does this work and publishes
| it so soon after conferences. Traditionally this would be a
| literature survey by a PhD student and would take 12 months to
| come out in some obscure journal behind a walled garden. I wonder
| if it is an outlier (Eugene is good!) or a sign of things to
| come?
| drodgers wrote:
| > a sign of things to come
|
| Isn't this, like, a sign of what's been happening for the last
| 20+ years (arxiv, blogs etc.)?
| jamesblonde wrote:
| To some extent. But it's hard to find quality. Eugene's stuff
| is quality. For example, I'm in distributed systems,
| databases, and MLOps. Murat Demirbas (Uni Buffalo) has been
| the best in dist systems. Andy Pavlo (CMU) for databases.
| Stanford (Matei) have been doing the best summarizing in
| MLOps.
| thorum wrote:
| In the age of local LLMs I'd like to see a personal
| recommendation system that doesn't care about being scalable and
| efficient. Why can't I write a prompt that describes exactly what
| I'm looking for in detail and then let my GPU run for a week
| until it finds something that matches?
| r4ndomname wrote:
| This is exactly what I am hoping to get sometimes (though I
| would say 1 week is maybe a little long).
|
| If I go through my current tasks and see that for some task I
| need a set of documents, emails, etc., why can't I just prompt
| the system to get it in 30-ish minutes? But as someone already
| stated, Apple Intelligence is supposed to fill this gap.
| mdp2021 wrote:
| > _maybe a little long_
|
| Many of us have ongoing problems pending for years - for just
| "a week", "where do I sign".
|
| It really depends on the task.
| whiplash451 wrote:
| Why would it take a week?
|
| Is this because you want it to continuously watch for live data
| that could match your need?
| mdp2021 wrote:
| Because thinking takes time.
| pizza wrote:
| It's worth pointing out that even with the largest models out
| there, coherence drops fast with length. In a local home ML
| setup, until somebody radically improves long-term coherence,
| a model with < x memory may be diametrically opposed to
| something that still says the right thing after >
| y minutes of search.
| fhe wrote:
| or it keeps monitoring the web and notifies me whenever something
| that matches my interests shows up -- like a more sophisticated
| Google Alert. I really would love that.
| osmarks wrote:
| You could just run a local LLM over every document and ask it
| "is this related to this query". I don't think you actually
| want to wait a week (and holding all the documents you might
| ever want to search would run to petabytes).
|
| (the _reasonable_ way is embedding search, which runs much
| faster with some precomputation, but you still have to store
| things)
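The brute-force "LLM over every document" scan described above might look like the sketch below. `is_relevant` is a hypothetical stand-in for the per-document LLM call ("is this related to this query?"); here it is a trivial keyword check so the example runs offline.

```python
def is_relevant(document: str, query: str) -> bool:
    # Stand-in for a local LLM prompted with "Is this document related
    # to this query? Answer yes or no." -- a keyword overlap check here
    # so the sketch runs without a model.
    return any(w in document.lower() for w in query.lower().split())

def brute_force_search(query: str, docs: list[str]) -> list[str]:
    # The "let it run for a week" approach: one model call per document,
    # keeping whatever the model judges relevant. O(corpus size) LLM
    # calls, which is exactly why precomputed embeddings are the
    # reasonable alternative for large corpora.
    return [d for d in docs if is_relevant(d, query)]
```

The cost scales linearly with corpus size per query, versus a one-time embedding pass plus cheap similarity lookups.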
| kortilla wrote:
| The entire library of Congress is like 10TB. You don't need
| anything near petabytes until you get out of text into rich
| media.
| osmarks wrote:
| Common Crawl is petabytes. Anna's Archive is about a
| petabyte, but it includes PDFs with images.
| amelius wrote:
| A better way would be to ask the LLM to generate keywords (or
| queries), then use old-school techniques to find a set of
| documents, and filter those using another LLM.
| brookst wrote:
| How is that better than embeddings? You're using embeddings
| to get a finite list of keywords, throwing out the extra
| benefits of embeddings (support for every human language,
| for instance), using a conventional index, and then going
| back to embeddings space for the final LLM?
|
| That whole thing can be simplified to: compute and store
| embeddings for docs, compute embeddings for query, find
| most similar docs.
| amelius wrote:
| Yes, you can do the "old school search" part with
| embeddings.
| brookst wrote:
| Ah, I had interpreted "old school search" to mean classic
| text indexing and Boolean style search. I'd argue that if
| it's using embeddings and cosine similarity, it's not old
| school. But that's just semantics.
| osmarks wrote:
| https://arxiv.org/abs/2212.10496
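The embedding pipeline described above ("compute and store embeddings for docs, compute embeddings for query, find most similar docs") can be sketched in a few lines. This is a toy illustration: `embed` here is a bag-of-words stand-in for a real embedding model (a real model is what would actually bridge vocabulary gaps like candy vs. gobstoppers), and all function names are invented for the example.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model (e.g. a sentence encoder);
    # a sparse bag-of-words vector so the sketch runs on its own.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query: str, docs: list[str], k: int = 3) -> list[str]:
    # Document embeddings can be precomputed once; only the query is
    # embedded at search time, then documents are ranked by similarity.
    doc_vecs = [(doc, embed(doc)) for doc in docs]
    q = embed(query)
    ranked = sorted(doc_vecs, key=lambda dv: cosine(q, dv[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]
```

In practice the precomputed vectors would live in a vector index rather than a Python list, but the ranking logic is the same.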
| bryanrasmussen wrote:
| this is sort of like a dream I had
| https://medium.com/luminasticity/the-county-map-of-the-world...
|
| >The idea was that he could graft queries in this that he did
| not expect to finish quickly but which he could let run for
| hours or days and how freeing it was to do more advanced
| research this way.
| desdenova wrote:
| Why can't you?
|
| Just run the biggest model you can find out of swap and wait a
| long time for it to finish.
|
| You'll obviously see more focus on smaller models, because most
| people aren't willing to wait weeks for their slop, and also
| don't have server GPU clusters to run huge models.
| HeatrayEnjoyer wrote:
| > Just run the biggest model you can find out of swap
|
| This kills the SSD
| onel wrote:
| Another amazing post from Eugene
| anthk wrote:
| Use 'Recoll' and learn to use search strings. For Windows users,
| older Recoll releases are standalone and have all the
| dependencies bundled, so you can search inside PDFs, ODT/DOCX and
| tons more.
| x1xx wrote:
| > Spotify saw a 9% increase in exploratory intent queries, a 30%
| rise in maximum query length per user, and a 10% increase in
| average query length--this suggests the query recommendation
| updates helped users express more complex intents
|
| To me it's not clear that it should be interpreted as an
| improvement: what I read in this summary is that users had to
| search more and enter longer queries to get to what they
| needed.
| rorytbyrne wrote:
| We would need to normalise query length by the success rate to
| draw any informative conclusions here. The rate of immediate
| follow-up queries could be a decent proxy for this.
| Traubenfuchs wrote:
| > a 9% increase in exploratory intent queries
|
| Users struggle to find the right stuff, or stuff that's so good
| they don't need to do more queries.
|
| > a 30% rise in maximum query length per user, and a 10%
| increase in average query length
|
| Users need to execute more complex queries to find what they
| are looking for.
| 1oooqooq wrote:
| that's what you get when you have a "search pm".
| RicoElectrico wrote:
| I can understand tracking metrics for performance (as in speed,
| server load) or revenue. But I don't see how anyone could make
| such conclusions as they did with a straight face, apart from
| achieving some OKR for promotion reasons. There's no substitute
| for user research, focused mindset and good taste.
|
| I can imagine that's why today's apps suck so much: most of
| the pain points won't be easily caught by user behavior
| metrics.
|
| One thing Alex from Organic Maps taught me is how important it
| is to just listen to your users. Many of the UX improvements
| were driven by addressing complaints from e-mail feedback.
| wildrhythms wrote:
| No you don't understand, more queries = more engagement!
| MostlyStable wrote:
| It's relatively easy to construct a scenario where more
| search _is_ in fact indicative of _better_ search. To stick
| with Spotify: let's imagine they have an amazing search tool
| that consistently finds new, interesting music that the user
| genuinely likes. I can imagine that in that situation, users
| are going to search more, because doing so consistently gets
| them new, enjoyable music.
|
| But the opposite is equally possible: a terrible search tool
| could regularly fail to find what the user is looking for or
| produce music that they enjoy. In this situation, I can
| _also_ imagine users searching more, because it takes more
| search effort to find something they like.
|
| The key is _why_ users are searching. In Spotify's case, I
| imagine that you could try to connect the number of searches per
| listen, or how often a search results in a listen and how
| often those listens result in a positive rating. There are
| probably more options, but there needs to be some way of
| connecting the amount of search with how the user feels about
| those search results.
|
| And yeah, using nothing other than search volume is probably
| a bad way to go about it
| braiamp wrote:
| Yeah, this should be evaluated in a multivariate/bivariate
| model: of the successful queries, how did the length change
| before and after the intervention?
| stuaxo wrote:
| Off topic - but I think joining recommendation systems and forums
| (aka all the social media that isn't bsky or fedi) has been a
| complete disaster for society.
| thaumiel wrote:
| ah this explains why my spotify experience has gotten worse over
| time.
| UrineSqueegee wrote:
| I have the exact opposite experience: recently, when a playlist
| of mine ends, I love every recommended track that plays
| after so much that I end up putting it in my playlist.
| appleorchard46 wrote:
| I liked when you could make a playlist radio and do that
| manually. That's been removed now of course.
| Melatonic wrote:
| On desktop I believe you can still take any of your
| playlists and tell it to generate a "similar" playlist.
| Works really well.
| thaumiel wrote:
| My taste in music is apparently so varied that if I want to
| keep the "daily" Spotify lists as I want them, I have to limit
| the variation in what I listen to; otherwise they will
| get too mixed up and I will not enjoy them anymore. So I use
| other people's recommendations or music review sites instead
| to find new music/bands/artists. I tried the Spotify AI DJ
| service a couple of times, but it has not been a good
| experience; when it tries to push in a new direction, it has
| never really gotten it right for me.
| tullie wrote:
| The other direction that isn't explicitly mentioned in this post
| is the variants of SASRec and BERT4Rec that are still trained on
| ID tokens but show scaling laws much like LLMs. E.g. Meta's
| approach https://arxiv.org/abs/2402.17152 (paper write up here:
| https://www.shaped.ai/blog/is-this-the-chatgpt-moment-for-re...)
| bookofjoe wrote:
| Perplexity Pro suggested several portable car battery chargers,
| which led me to search online reviews, whose consensus (five or
| so review sites) highest-rated chargers were the first two on
| Perplexity's recommendation list. In other words, the AI was a
| helpful guide to focused deeper search.
| memhole wrote:
| It looks like a great overview of recommendation systems. I think
| my main takeaways are:
|
| 1. Latency is a major issue.
|
| 2. Fine-tuning can lead to major improvements and, if I didn't
| misread, reduce latency.
|
| 3. There's some threshold of problem at which prompting vs. fine-
| tuning should be used.
| novia wrote:
| I started listening to this article (using a text to speech
| model) shortly after waking up.
|
| I thought it was very heavy on jargon. Like, it was written in a
| way that makes the author appear very intelligent without
| necessarily effectively conveying information to the audience.
| This is something that I've often seen authors do in academic
| papers, and my one published research paper (not first author) is
| no exception.
|
| I'm by no means an expert in the field of ML, so perhaps I am
| just not the intended audience. I'm curious if other people here
| felt the same way when reading though.
|
| Hopefully this observation / opinion isn't too negative.
| curious_cat_163 wrote:
| To me, it reads like a survey paper intended for (and maybe
| written by) a researcher about to start a new project. I am not
| a researcher in this space but I have dabbled elsewhere, so it
| is somewhat accessible. The degree to which one leverages
| existing jargon in their writing is a choice, of course.
|
| I am curious -- what would have made it more effective at
| conveying information to you? Different people learn
| differently but I wonder how people get beyond the hurdles of
| jargon.
| novia wrote:
| Yeah I'm not sure if it's just me and my learning style or if
| researchers purposefully use terminology that's obstructive
| to understanding to maintain walled gardens. I don't think my
| reading comprehension level is particularly low!
|
| Usually the best way to learn about things like this for me
| is to see some actual code or to write things myself, but the
| lack of coding examples in the text isn't the thing that I
| find troubling. I don't know, it's just.. like, excessively
| pointer heavy?
|
| Maybe if you've been in the field long enough, reading a
| particular term will instantly conjure up an idea of a
| corresponding algorithm or code block or something and that's
| what I'm missing.
| 7d7n wrote:
| Thank you for the feedback! I'm sorry you found it jargony/less
| accessible than you'd like.
|
| The intended audience was my team and fellow practitioners;
| assuming some understanding of the jargon allowed me to skip
| the basics and write more concisely.
| softwaredoug wrote:
| A lot of teams can do a lot with search just by putting LLMs in
| the loop on the query and index side, doing enrichment that used
| to be months-long projects. Even with smaller, self-hosted models
| and fairly naive prompts you can turn a search string into a more
| structured query - and cache the hell out of it. Or classify
| documents into a taxonomy. All backed by boring old lexical or
| vector search engine. In fact I'd say if you're NOT doing this
| you're making a mistake.
| syndacks wrote:
| Can you share more, or at least point me in the right
| direction?
| ntonozzi wrote:
| One place to explore more would be Doc2Query:
| https://arxiv.org/abs/1904.08375.
|
| It's not the latest and hottest but super simple to do with
| LLMs these days and can improve a lexical search engine quite
| a lot.
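The Doc2Query idea linked above amounts to: have a model write the queries a document could answer, and index those expansions alongside the document text, so lexical search matches vocabulary the original never uses. In this sketch `generate_queries` is a hypothetical stand-in for the LLM call, with canned output so it runs offline.

```python
def generate_queries(doc: str) -> list[str]:
    # Stand-in for an LLM prompted with "write search queries this
    # passage answers"; canned output so the sketch runs offline.
    canned = {
        "shelled out $17 for a huge pack of gobstoppers":
            ["candy purchase receipt", "how much did the candy cost"],
    }
    return canned.get(doc, [])

def build_index(docs: list[str]) -> dict[str, set[int]]:
    # Append generated queries to each document before building a plain
    # inverted index, so terms like "candy" point at the gobstopper doc.
    index: dict[str, set[int]] = {}
    for i, doc in enumerate(docs):
        expanded = doc + " " + " ".join(generate_queries(doc))
        for term in set(expanded.lower().split()):
            index.setdefault(term, set()).add(i)
    return index

def lookup(index: dict[str, set[int]], query: str) -> set[int]:
    # Conjunctive lexical lookup: documents matching every query term.
    hits = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*hits) if hits else set()
```

The expansion cost is paid once at index time, which is what makes this cheap to bolt onto an existing lexical engine.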
___________________________________________________________________
(page generated 2025-03-23 23:00 UTC)