[HN Gopher] Improving recommendation systems and search in the a...
       ___________________________________________________________________
        
       Improving recommendation systems and search in the age of LLMs
        
       Author : 7d7n
       Score  : 312 points
       Date   : 2025-03-23 03:40 UTC (19 hours ago)
        
 (HTM) web link (eugeneyan.com)
 (TXT) w3m dump (eugeneyan.com)
        
       | whatever1 wrote:
        | Why don't we have an LLM-based search tool for our PCs /
        | smartphones?
       | 
        | Especially for smartphones, all of your data is in the cloud
        | anyway. Instead of just scraping it for advertising and the
        | FBI, they could also do something useful for the user.
        
         | dmbche wrote:
          | It doesn't solve any problem; you can just search your files
          | using your preferred file explorer (Ctrl-F).
         | 
         | I'd assume most people organise their files so that they know
         | where things are as well.
        
           | whatever1 wrote:
            | But the file explorer does not read the actual files and
            | build context. Even for plain text files, which search
            | functions can sometimes read, I need to remember the exact
            | string of characters I am looking for.
           | 
            | I was hoping an LLM would have context on all of my content
            | (text and visual) and, for the first time, use my computer's
            | data as a knowledge base.
           | 
            | Queries like "what was my design file for that X service?"
            | are impossible to answer today unless you have organized
            | your data yourself.
           | 
           | Why do we still have to organize our data manually?
        
             | pests wrote:
              | The photos apps do this well now. You can search
              | Apple/Google Photos with questions about the content of
              | images and videos and get useful results.
        
           | nine_k wrote:
           | > _you can just search your files using your prefered file
           | explorer_
           | 
            | This only works if you remember specific substrings. An LLM
           | (or some other language model) can summarize and interpolate.
           | It can be asked to find that file that mentions a transaction
           | for buying candy, and it has a fair chance to find it, even
           | if none of the words "transaction", "buying" or "candy" are
           | present in the file, e.g. it says "shelled out $17 for a huge
           | pack of gobstoppers".
           | 
            | > _I'd assume most people organise their files_
           | 
           | You'll be shocked, but...
        
             | ozim wrote:
              | I think the same: people are not organized, even with
              | things that make them money, where being organized could
              | earn them much more.
        
             | dmbche wrote:
              | But isn't that candy example nonsensical? In what
              | situation do you need some information without any of the
              | context (or without knowing any of the context)?
             | 
              | I really believe that this is not an actual problem in
              | need of solving, but rather a case of creating a tool
              | (personal AI assistant) and trying to find a use case.
             | 
              | Edit0: note to self, rambling - assuming there exists
              | valuable information that one needs to access in their
              | files, but one doesn't know where it is, when it was made,
              | its name, or other information about it (as you could find
              | said file right away with this information).
             | 
              | Say you need a piece of information from some
              | documentation like the C standard - you need precise
              | information on some process. Is it not much simpler to
              | just open the doc and use the index? Then again, being
              | aware of the C standard already makes the query useless.
             | 
             | If it's from something less well organised, say you want
             | letters you wrote to your significant other, maybe the
             | assistant could help. But then again, what are you asking?
             | How hard is it to keep your letters in a folder? Or even
             | simply know what you've done (I surely can't imagine
              | forgetting things I've created but somehow finding use in
              | an LLM that finds them for me).
             | 
             | Like asking it "what is my opinion on x" or "what's a good
             | compliment I wrote" is nonsensical to me, but asking it
              | about external resources makes the idea of training it on
             | your own data pointless. "How did I write X API" - just
             | open your file, no? You know where it is, you made it.
             | 
              | Like saying "get me that picture of uncle Tony in Florida"
             | might save you 10 seconds instead of going into your files
             | and thinking about when you got that picture, but it's not
             | solving a real issue or making things more efficient.
              | (Edit1: if you don't know Tony, when you got the picture,
              | or what it's a picture of, why are you querying? What's
              | the use case for this information; is it just to prove it
              | can be done? It feels like the user needs to contort
              | themselves into a small niche for this product to be
              | useful.)
             | 
              | Either it's used for non-valuable work (menial search) or
             | you already know how to get the answer you need.
             | 
             | I cannot imagine a query that would be useful compared to
             | simply being aware of what's in your computer. And if
             | you're not aware of it, how do you search for it?
        
               | mjlee wrote:
               | I think your brain may just work differently to mine, and
               | I don't think I'm unique.
               | 
                | > "get me that picture of uncle Tony in Florida" might
               | save you 10 seconds instead of going into your files and
               | thinking about when you got that picture
               | 
               | I don't have a memory for time, and I can't picture
                | things in my mind. Thinking about when I took a picture
                | does nothing for me; I could be out by years. Having some
               | unified natural language search engine would be amazing
               | for me. I might remember it was a sunny day and that we
               | got ice cream, and that's what I want to search on.
               | 
               | The "small niche" use case for me is often my daughter
               | wants to see a photo of a family member I'm talking
               | about, or I want to remember some other aspect of the day
               | and the photo triggers that for me.
        
               | dmbche wrote:
               | Makes a lot of sense. Thanks for the response, enjoy your
               | day!
        
               | pizza wrote:
               | Here's an example of a type of feature I want: I'm
               | looking at a menu from a popular restaurant and it has
               | hundreds of choices. I start to feel some analysis
               | paralysis. I say to my computer, "hey computer, I'm open
               | to any suggestions, so long as it's well-seasoned, spicy,
               | salty, has some protein and fiber, easy to digest, rich
               | in nutrients, not too dry, not too oily, pairs well with
                | <whatever I have in my fridge>, etc." Basically,
               | property-oriented search queries whose answers can be
               | verified, without having to trudge through them myself,
               | where I don't really care about correctness, just
               | satisficing.
        
           | ozim wrote:
           | I think you are really wrong.
           | 
           | Most people I see at work and outside don't care and they
           | want stupid machine to deal with it.
           | 
            | That is why smartphones and tablets are moving away from
            | providing "file system" access.
           | 
            | It is super annoying for me, but most people want to save
            | their tax form or their baby photo without even
            | understanding that each is a different file type - because
            | they couldn't care less about file types, let alone making a
            | folder structure to keep them organized.
        
           | acchow wrote:
           | Curiously, the things I search most often are not located in
           | files: calendar, photo content/location, email, ChatGPT
           | history, Spotify library, iMessage/whatsapp history,
           | contacts, notes, Amazon order history
        
         | rudedogg wrote:
         | This is roughly what Apple Intelligence was supposed to deliver
         | but has yet to.
        
         | visarga wrote:
         | I found that ChatGPT or Claude are really good at music and
         | shopping suggestions. Just chat with them about your tastes for
         | a while, then ask for suggestions. Compared to old recommender
         | systems this method allows much better user guidance.
        
           | josephg wrote:
           | Yeah, Claude helped me decide what to get my girlfriend for
           | her birthday a few weeks ago. It suggested some great gift
           | ideas I hadn't thought of - and my girlfriend loved them.
        
             | Workaccount2 wrote:
             | I think we can expect this to be rapidly monetized.
        
           | KoftaBob wrote:
           | For shopping suggestions, I've had the best experience with
           | Perplexity.
        
         | GraemeMeyer wrote:
         | It's coming for PCs soon:
         | https://www.theregister.com/2025/01/20/microsoft_unveils_win...
         | 
         | And to a certain extent for the Microsoft cloud experience as
         | well: https://www.theverge.com/2024/10/8/24265312/microsoft-
         | onedri...
        
         | curious_cat_163 wrote:
         | > Why we don't have an LLM based search tool for our pc /
         | smartphones?
         | 
         | I'll offer my take as an outside observer. If someone has
         | better insights, feel free to share as well.
         | 
         | In market terms, I think it is because Google, Microsoft and
         | Apple are all still trying with varied success. It has to be
          | them because that's where the bulk of the users are. They are
          | all also public companies with impatient investors wanting the
          | stock to go up and to the right. So, they are cautious both
          | about what they ship to billions of devices (brand protection)
          | and about "opening up" their OS beyond what they have already
          | done (fear of disruption).
         | 
         | In technical terms, it is taking a while because if the tool is
         | going to use LLMs, then they need to solve for 99.999% of the
         | reliability problems (brand protection) that come with that
         | tech. They need to solve for power consumption (either on edge
         | or in the data centers) due to their sheer scale.
         | 
         | So, their choices are ship fast (which Google has been trying
         | to do more) and iterate in public; or partner with other
         | product companies by investing in them (which Microsoft has
          | been doing with OpenAI and Google is doing with Anthropic,
         | etc.).
         | 
         | Apple is taking some middle path but they just fired the person
         | who was heading up the initiative [1] so let's see how that
         | goes.
         | 
         | My two cents.
         | 
         | [1] https://www.reuters.com/technology/artificial-
         | intelligence/a...
        
       | anon373839 wrote:
       | Terrific post. Just about everything Eugene writes about AI/ML is
       | pure gold.
        
         | hackernewds wrote:
         | haha this is some solid astroturfing Eugene :)
        
           | 7d7n wrote:
           | haha that wasn't me ;)
        
             | anon373839 wrote:
             | Oops, sorry! Wasn't trying to make you look bad. Just a fan
             | of your writing.
        
               | 7d7n wrote:
               | Not at all! I appreciate the kind words. Thank you!
        
       | anonymousDan wrote:
       | It's interesting that none of these papers seem to be coming out
       | of academic labs....
        
         | pizza wrote:
         | Checking if a recommendation system is actually good in
         | practice is kind of tough to do without owning a whole internet
         | media platform as well. At best, you'll get the table scraps
         | from these corporations (in the form of toy datasets/models
          | made available), and you will still struggle to make your dev
          | loop productive enough without throwing the same kind of
          | compute at it that the ~FAANGs do, just to validate whether
          | that 0.2% improvement you got really meant anything. Oh, and also,
         | the nature of recommendations is that they get very stale very
         | quickly, so be prepared to check that your method still works
         | when you do yet another huge training run on a weekly/daily
         | cadence.
        
           | bradly wrote:
            | > you will still struggle to make your dev loop productive
            | enough without throwing the same kind of compute at it that
            | the ~FAANGs do, just to validate whether that 0.2%
            | improvement you got really meant anything
           | 
            | And do not forget the incredible number of actual humans
            | FAANG pays every day to evaluate any changes in result sets
            | for the top x,000 queries.
        
         | lmeyerov wrote:
         | As someone whose customers do this stuff, I'm 100% for most
         | academics chasing harder and more important problems.
         | 
         | Most of these papers are specialized increments on high
         | baselines for a primarily commercial problem. Likewise, they
         | focus on optimizing phenomena that occur in their product,
          | which may not occur in others. E.g., Netflix's sliding window
          | is neat to see the result of, but I'd rather students use
          | their freedom to explore bigger ideas like Mamba, and leave
          | sliding windows to a master's student who is experimenting
          | with intentionally narrowly scoped tweaks. At that point, the
          | top PhD grads at industrial labs will probably win.
         | 
         | That said, recsys is a general formulation with applications
         | beyond shopping carts and social feeds, and bigger ideas do
          | come out that I'd expect competitive labs to do projects on.
          | GNNs for recsys were a big bet a couple of years ago, and LLMs
          | are now, and it is curious to me that those bigger shifts are
          | industrial-lab papers, as you say. Maybe the takeaway is that
          | recsys is one of the areas where industry hires a lot of PhDs,
          | as it is so core to revenue lift: academia has regular
          | representation, while industry is overrepresented.
        
       | jamesblonde wrote:
       | It is very interesting that Eugene does this work and publishes
       | it so soon after conferences. Traditionally this would be a
       | literature survey by a PhD student and would take 12 months to
        | come out in some obscure journal behind a walled garden. I wonder
       | if it is an outlier (Eugene is good!) or a sign of things to
       | come?
        
         | drodgers wrote:
         | > a sign of things to come
         | 
         | Isn't this, like, a sign of what's been happening for the last
         | 20+ years (arxiv, blogs etc.)?
        
           | jamesblonde wrote:
           | To some extent. But it's hard to find quality. Eugene's stuff
            | is quality. For example, I'm in distributed systems,
            | databases, and MLOps. Murat Demirbas (Uni Buffalo) has been
            | the best in dist systems, Andy Pavlo (CMU) for databases,
            | and Stanford (Matei) has been doing the best summarizing in
            | MLOps.
        
       | thorum wrote:
       | In the age of local LLMs I'd like to see a personal
       | recommendation system that doesn't care about being scalable and
       | efficient. Why can't I write a prompt that describes exactly what
       | I'm looking for in detail and then let my GPU run for a week
       | until it finds something that matches?
        
         | r4ndomname wrote:
         | This is exactly what I am hoping to get sometimes (but I would
         | say, 1 week is maybe a little long).
         | 
          | If I go through my current tasks and see that for some task I
          | need a set of documents, emails, ..., why can't I just prompt
          | the system to get them in 30-ish minutes? But as someone
          | already stated, Apple Intelligence is supposed to fill this
          | gap.
        
           | mdp2021 wrote:
           | > _maybe a little long_
           | 
           | Many of us have ongoing problems pending for years - for just
           | "a week", "where do I sign".
           | 
           | It really depends on the task.
        
         | whiplash451 wrote:
         | Why would it take a week?
         | 
         | Is this because you want it to continuously watch for live data
         | that could match your need?
        
           | mdp2021 wrote:
           | Because thinking takes time.
        
         | pizza wrote:
         | It's worth pointing out that even with the largest models out
         | there, coherence drops fast over length. In a local home ML
         | setup, until somebody radically improves long-term coherence,
          | models with < x memory may be diametrically opposed to the
          | goal of something that still says the right thing after > y
          | minutes of search.
        
         | fhe wrote:
          | Or it keeps monitoring the web and notifies me whenever
          | something that matches my interests shows up -- like a more
          | sophisticated Google Alert. I really would love that.
        
         | osmarks wrote:
         | You could just run a local LLM over every document and ask it
         | "is this related to this query". I don't think you actually
         | want to wait a week (and holding all the documents you might
         | ever want to search would run to petabytes).
         | 
         | (the _reasonable_ way is embedding search, which runs much
         | faster with some precomputation, but you still have to store
         | things)
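          | 
          | A minimal sketch of that brute-force loop, assuming a local
          | Ollama server on its default port; the model name, prompt,
          | and notes/ corpus are all illustrative:
          | 
          |     import glob, json, urllib.request
          | 
          |     OLLAMA = "http://localhost:11434/api/generate"
          | 
          |     def ask_llm(prompt):
          |         # One blocking call to a local model (assumed
          |         # Ollama JSON API; swap in any LLM client).
          |         body = json.dumps({"model": "llama3",
          |                            "prompt": prompt,
          |                            "stream": False}).encode()
          |         req = urllib.request.Request(
          |             OLLAMA, data=body,
          |             headers={"Content-Type": "application/json"})
          |         with urllib.request.urlopen(req) as resp:
          |             return json.loads(resp.read())["response"]
          | 
          |     def brute_force_search(query, paths):
          |         hits = []
          |         for path in paths:
          |             text = open(path, errors="ignore").read()
          |             verdict = ask_llm(
          |                 f"Document:\n{text[:4000]}\n\n"
          |                 f"Is this related to {query!r}? "
          |                 "Answer yes or no.")
          |             if verdict.strip().lower().startswith("yes"):
          |                 hits.append(path)
          |         return hits
          | 
          |     print(brute_force_search(
          |         "transaction for buying candy",
          |         glob.glob("notes/**/*.txt", recursive=True)))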
        
           | kortilla wrote:
           | The entire library of Congress is like 10TB. You don't need
           | anything near petabytes until you get out of text into rich
           | media.
        
             | osmarks wrote:
             | Common Crawl is petabytes. Anna's Archive is about a
             | petabyte, but it includes PDFs with images.
        
           | amelius wrote:
            | A better way would be to ask the LLM to generate keywords
            | (or queries), then use old-school techniques to find a set
            | of documents, and then filter those using another LLM.
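            | 
            | One way that pipeline could look, with llm() standing in
            | for whatever model call you prefer (purely a sketch, not a
            | specific API):
            | 
            |     def llm(prompt: str) -> str:
            |         # Placeholder for any local or hosted
            |         # chat-completion call.
            |         raise NotImplementedError
            | 
            |     def keyword_search(query, docs):
            |         # 1. LLM expands the query into keywords.
            |         keywords = llm(
            |             "List 5 short search keywords for: "
            |             f"{query!r}, one per line."
            |         ).lower().split()
            | 
            |         # 2. Old-school filter: keep docs that
            |         #    contain any keyword.
            |         shortlist = [d for d in docs if any(
            |             k in d.lower() for k in keywords)]
            | 
            |         # 3. Second LLM pass filters the shortlist.
            |         return [d for d in shortlist if llm(
            |             f"Query: {query}\nDoc: {d[:2000]}\n"
            |             "Relevant? yes or no.")
            |             .strip().lower().startswith("yes")]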
        
             | brookst wrote:
             | How is that better than embeddings? You're using embeddings
             | to get a finite list of keywords, throwing out the extra
             | benefits of embeddings (support for every human language,
             | for instance), using a conventional index, and then going
             | back to embeddings space for the final LLM?
             | 
             | That whole thing can be simplified to: compute and store
             | embeddings for docs, compute embeddings for query, find
             | most similar docs.
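              | 
              | That loop is a few lines with any off-the-shelf embedding
              | model; sentence-transformers here is just one assumed
              | choice, and the docs are toy strings:
              | 
              |     import numpy as np
              |     from sentence_transformers import (
              |         SentenceTransformer)
              | 
              |     model = SentenceTransformer(
              |         "all-MiniLM-L6-v2")
              | 
              |     docs = [
              |         "shelled out $17 for a huge pack of "
              |         "gobstoppers",
              |         "meeting notes from the design review",
              |         "flight itinerary for the Florida trip",
              |     ]
              | 
              |     # Compute and store doc embeddings once.
              |     doc_vecs = model.encode(
              |         docs, normalize_embeddings=True)
              | 
              |     def search(query, k=2):
              |         # Embed the query, rank by cosine
              |         # similarity (vectors are normalized).
              |         q = model.encode(
              |             [query], normalize_embeddings=True)[0]
              |         scores = doc_vecs @ q
              |         top = np.argsort(-scores)[:k]
              |         return [(docs[i], float(scores[i]))
              |                 for i in top]
              | 
              |     print(search("transaction for buying candy"))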
        
               | amelius wrote:
               | Yes, you can do the "old school search" part with
               | embeddings.
        
               | brookst wrote:
               | Ah, I had interpreted "old school search" to mean classic
               | text indexing and Boolean style search. I'd argue that if
               | it's using embeddings and cosine similarity, it's not old
               | school. But that's just semantics.
        
             | osmarks wrote:
             | https://arxiv.org/abs/2212.10496
        
         | bryanrasmussen wrote:
         | this is sort of like a dream I had
         | https://medium.com/luminasticity/the-county-map-of-the-world...
         | 
         | >The idea was that he could graft queries in this that he did
         | not expect to finish quickly but which he could let run for
         | hours or days and how freeing it was to do more advanced
         | research this way.
        
         | desdenova wrote:
         | Why can't you?
         | 
         | Just run the biggest model you can find out of swap and wait a
         | long time for it to finish.
         | 
         | You'll obviously see more focus on smaller models, because most
         | people aren't willing to wait weeks for their slop, and also
         | don't have server GPU clusters to run huge models.
        
           | HeatrayEnjoyer wrote:
           | > Just run the biggest model you can find out of swap
           | 
           | This kills the SSD
        
       | onel wrote:
       | Another amazing post from Eugene
        
       | anthk wrote:
       | Use 'Recoll' and learn to use search strings. For Windows users,
       | older Recoll releases are standalone and have all the
        | dependencies bundled, so you can search inside PDFs, ODT/DOCX,
        | and tons more.
        
       | x1xx wrote:
       | > Spotify saw a 9% increase in exploratory intent queries, a 30%
       | rise in maximum query length per user, and a 10% increase in
       | average query length--this suggests the query recommendation
       | updates helped users express more complex intents
       | 
       | To me it's not clear that it should be interpreted as an
       | improvement: what I read in this summary is that users had to
       | search more and to enter longer queries to get to what they
       | needed.
        
         | rorytbyrne wrote:
         | We would need to normalise query length by the success rate to
         | draw any informative conclusions here. The rate of immediate
         | follow-up queries could be a decent proxy for this.
        
         | Traubenfuchs wrote:
         | > a 9% increase in exploratory intent queries
         | 
          | Users struggle to find the right stuff, or stuff that's so
          | good they don't need to do more queries.
         | 
         | > a 30% rise in maximum query length per user, and a 10%
         | increase in average query length
         | 
         | Users need to execute more complex queries to find what they
         | are looking for.
        
         | 1oooqooq wrote:
         | that's what you get when you have a "search pm".
        
         | RicoElectrico wrote:
         | I can understand tracking metrics for performance (as in speed,
         | server load) or revenue. But I don't see how anyone could make
         | such conclusions as they did with a straight face, apart from
         | achieving some OKR for promotion reasons. There's no substitute
          | for user research, a focused mindset, and good taste.
         | 
          | I can imagine that's why today's apps suck so much, as most of
         | the pain points won't be easily caught by user behavior
         | metrics.
         | 
         | One thing Alex from Organic Maps taught me is how important it
         | is to just listen to your users. Many of the UX improvements
         | were driven by addressing complaints from e-mail feedback.
        
         | wildrhythms wrote:
         | No you don't understand, more queries = more engagement!
        
           | MostlyStable wrote:
           | It's relatively easy to construct a scenario where more
           | search _is_ in fact indicative of _better_ search. To stick
            | with Spotify: let's imagine they have an amazing search tool
           | that consistently finds new, interesting music that the user
           | genuinely likes. I can imagine that in that situation, users
           | are going to search more, because doing so consistently gets
           | them new, enjoyable music.
           | 
           | But the opposite is equally possible: a terrible search tool
           | could regularly fail to find what the user is looking for or
           | produce music that they enjoy. In this situation, I can
           | _also_ imagine users searching more, because it takes more
           | search effort to find something they like.
           | 
            | The key is _why_ users are searching. In Spotify's case I
            | imagine that you could try to connect the number of searches
            | per listen, or how often a search results in a listen and how
           | often those listens result in a positive rating. There are
           | probably more options, but there needs to be some way of
           | connecting the amount of search with how the user feels about
           | those search results.
           | 
           | And yeah, using nothing other than search volume is probably
            | a bad way to go about it.
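            | 
            | As a toy illustration of tying search volume to outcomes
            | (the event schema here is made up):
            | 
            |     # One dict per hypothetical search session.
            |     sessions = [
            |         {"searches": 3, "listens": 2, "liked": 1},
            |         {"searches": 1, "listens": 1, "liked": 1},
            |         {"searches": 5, "listens": 0, "liked": 0},
            |     ]
            | 
            |     searches = sum(s["searches"] for s in sessions)
            |     listens = sum(s["listens"] for s in sessions)
            |     liked = sum(s["liked"] for s in sessions)
            | 
            |     # More searches only look good if they keep
            |     # converting into listens and likes.
            |     print("searches per listen:",
            |           searches / max(listens, 1))
            |     print("search -> listen rate:",
            |           listens / searches)
            |     print("listen -> like rate:",
            |           liked / max(listens, 1))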
        
         | braiamp wrote:
         | Yeah, this should be evaluated in a multivariate/bivariate
          | model: of the successful queries, how did the length change
          | before and after the intervention?
        
       | stuaxo wrote:
       | Off topic - but I think joining recommendation systems and forums
       | (aka all the social media that isn't bsky or fedi) has been a
       | complete disaster for society.
        
       | thaumiel wrote:
        | Ah, this explains why my Spotify experience has gotten worse over
       | time.
        
         | UrineSqueegee wrote:
          | I have the exact opposite experience: recently, when a
          | playlist of mine ends, I love every recommended track that
          | plays afterwards so much that I end up adding it to my
          | playlist.
        
           | appleorchard46 wrote:
           | I liked when you could make a playlist radio and do that
           | manually. That's been removed now of course.
        
             | Melatonic wrote:
             | On desktop I believe you can still take any of your
             | playlists and tell it to generate a "similar" playlist.
             | Works really well.
        
           | thaumiel wrote:
            | My taste in music is apparently so varied that if I want to
            | keep the "daily" Spotify lists as I want them, I have to
            | limit the variation in what I listen to; otherwise they will
            | get too mixed up and I will not enjoy them anymore. So I use
            | other people's recommendations or music review sites instead
            | to find new music/bands/artists. I tried the Spotify AI DJ
            | service a couple of times, but it has not been a good
            | experience; when it tries to push in a new direction it has
            | never really gotten it right for me.
        
       | tullie wrote:
       | The other direction that isn't explicitly mentioned in this post
        | is the variants of SASRec and BERT4Rec that are still trained on
        | ID tokens but show scaling laws much like LLMs. E.g. Meta's
       | approach https://arxiv.org/abs/2402.17152 (paper write up here:
       | https://www.shaped.ai/blog/is-this-the-chatgpt-moment-for-re...)
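        | 
        | For readers unfamiliar with the ID-token framing, a bare-bones
        | SASRec-style next-item model might look like this (a generic
        | PyTorch sketch, not Meta's actual architecture):
        | 
        |     import torch
        |     import torch.nn as nn
        | 
        |     class TinySASRec(nn.Module):
        |         # Causal transformer over item-ID sequences that
        |         # predicts the next item, GPT-style but with
        |         # catalog items as the vocabulary.
        |         def __init__(self, n_items, d=64,
        |                      heads=2, layers=2):
        |             super().__init__()
        |             self.item_emb = nn.Embedding(n_items, d)
        |             self.pos_emb = nn.Embedding(512, d)
        |             block = nn.TransformerEncoderLayer(
        |                 d, heads, dim_feedforward=4 * d,
        |                 batch_first=True)
        |             self.encoder = nn.TransformerEncoder(
        |                 block, layers)
        |             self.head = nn.Linear(d, n_items)
        | 
        |         def forward(self, item_ids):  # (batch, seq)
        |             seq = item_ids.size(1)
        |             pos = torch.arange(seq, device=item_ids.device)
        |             x = self.item_emb(item_ids) + self.pos_emb(pos)
        |             # Additive causal mask: -inf above the diagonal.
        |             mask = torch.triu(torch.full(
        |                 (seq, seq), float("-inf"),
        |                 device=item_ids.device), diagonal=1)
        |             h = self.encoder(x, mask=mask)
        |             return self.head(h)  # next-item logits
        | 
        |     model = TinySASRec(n_items=10_000)
        |     batch = torch.randint(0, 10_000, (4, 20))  # toy batch
        |     logits = model(batch)  # (4, 20, 10_000)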
        
       | bookofjoe wrote:
        | Perplexity Pro suggested several portable car battery chargers,
        | which led me to search online reviews; the consensus highest-
        | rated chargers across five or so review sites were the first two
        | on Perplexity's recommendation list. In other words, the AI was
        | a helpful guide to focused deeper search.
        
       | memhole wrote:
       | It looks like a great overview of recommendation systems. I think
       | my main takeaways are:
       | 
       | 1. Latency is a major issue.
       | 
        | 2. Fine-tuning can lead to major improvements and, I think,
        | reduce latency. If I didn't misread.
        | 
        | 3. There's some threshold, or class of problem, that determines
        | whether prompting or fine-tuning should be used.
        
       | novia wrote:
       | I started listening to this article (using a text to speech
       | model) shortly after waking up.
       | 
       | I thought it was very heavy on jargon. Like, it was written in a
       | way that makes the author appear very intelligent without
       | necessarily effectively conveying information to the audience.
       | This is something that I've often seen authors do in academic
       | papers, and my one published research paper (not first author) is
       | no exception.
       | 
       | I'm by no means an expert in the field of ML, so perhaps I am
       | just not the intended audience. I'm curious if other people here
       | felt the same way when reading though.
       | 
       | Hopefully this observation / opinion isn't too negative.
        
         | curious_cat_163 wrote:
         | To me, it reads like a survey paper intended for (and maybe
         | written by) a researcher about to start a new project. I am not
         | a researcher in this space but I have dabbled elsewhere, so it
         | is somewhat accessible. The degree to which one leverages
         | existing jargon in their writing is a choice, of course.
         | 
         | I am curious -- what would have made it more effective at
         | conveying information to you? Different people learn
         | differently but I wonder how people get beyond the hurdles of
         | jargon.
        
           | novia wrote:
           | Yeah I'm not sure if it's just me and my learning style or if
           | researchers purposefully use terminology that's obstructive
           | to understanding to maintain walled gardens. I don't think my
           | reading comprehension level is particularly low!
           | 
           | Usually the best way to learn about things like this for me
           | is to see some actual code or to write things myself, but the
           | lack of coding examples in the text isn't the thing that I
           | find troubling. I don't know, it's just.. like, excessively
           | pointer heavy?
           | 
           | Maybe if you've been in the field long enough, reading a
           | particular term will instantly conjure up an idea of a
           | corresponding algorithm or code block or something and that's
           | what I'm missing.
        
         | 7d7n wrote:
         | Thank you for the feedback! I'm sorry you found it jargony/less
         | accessible than you'd like.
         | 
         | The intended audience was my team and fellow practitioners;
         | assuming some understanding of the jargon allowed me to skip
         | the basics and write more concisely.
        
       | softwaredoug wrote:
        | A lot of teams can do a lot with search just by putting LLMs in
        | the loop on the query and index sides, doing enrichment that used
        | to take months-long projects. Even with smaller, self-hosted
        | models and fairly naive prompts you can turn a search string into
        | a more structured query - and cache the hell out of it. Or
        | classify documents into a taxonomy. All backed by a boring old
        | lexical or vector search engine. In fact I'd say if you're NOT
        | doing this you're making a mistake.
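        | 
        | A sketch of the query-side half of that, with llm_json() as a
        | stand-in for whatever model call you use (names and JSON schema
        | are illustrative, not from any specific product):
        | 
        |     import functools, json
        | 
        |     def llm_json(prompt: str) -> str:
        |         # Stand-in for any chat model prompted to
        |         # return a JSON object as a string.
        |         raise NotImplementedError
        | 
        |     @functools.lru_cache(maxsize=100_000)
        |     def structure_query(q: str) -> str:
        |         # Cache the hell out of it: identical search
        |         # strings never hit the model twice.
        |         raw = llm_json(
        |             "Turn this search into JSON with keys "
        |             "'keywords', 'category', 'price_max' "
        |             f"(null if absent): {q!r}")
        |         try:
        |             json.loads(raw)   # validate
        |             return raw
        |         except json.JSONDecodeError:
        |             return json.dumps({"keywords": q,
        |                                "category": None,
        |                                "price_max": None})
        | 
        |     # The parsed dict then becomes filters and terms for
        |     # a boring old lexical or vector search engine.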
        
         | syndacks wrote:
         | Can you share more, or at least point me in the right
         | direction?
        
           | ntonozzi wrote:
           | One place to explore more would be Doc2Query:
           | https://arxiv.org/abs/1904.08375.
           | 
           | It's not the latest and hottest but super simple to do with
           | LLMs these days and can improve a lexical search engine quite
           | a lot.
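            | 
            | The core trick is index-time expansion: generate a few
            | synthetic queries per document and append them to the
            | indexed text. A sketch, with gen_queries() standing in for
            | the model call:
            | 
            |     def gen_queries(doc: str, n: int = 3) -> list[str]:
            |         # Placeholder: ask any LLM for n questions
            |         # this document answers, one per line.
            |         raise NotImplementedError
            | 
            |     def expand_for_index(doc: str) -> str:
            |         # Doc2Query-style expansion: the synthetic
            |         # queries add vocabulary users actually type,
            |         # which boosts recall in a plain BM25 index.
            |         return doc + "\n" + "\n".join(gen_queries(doc))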
        
       ___________________________________________________________________
       (page generated 2025-03-23 23:00 UTC)