[HN Gopher] Training is not the same as chatting: LLMs don't rem...
       ___________________________________________________________________
        
       Training is not the same as chatting: LLMs don't remember
       everything you say
        
       Author : simonw
       Score  : 154 points
       Date   : 2024-05-29 11:14 UTC (11 hours ago)
        
 (HTM) web link (simonwillison.net)
 (TXT) w3m dump (simonwillison.net)
        
       | gmerc wrote:
       | Apart from the forced memories, which are super annoying in
       | ChatGPT.
        
         | simonw wrote:
         | Yeah, I turned that feature off - it kept on making decidedly
         | non-useful decisions about what to remember.
        
       | phillipcarter wrote:
       | Yeah, the difference between "train on this data" and "these
       | interactions might be collected and aggregated across sessions to
       | be included in a future training dataset" is a difficult one to
       | message to most people in software, let alone consumers without
       | that background. By the time you're done explaining how most data
       | is largely garbage, labeling and curating data is difficult and
       | time-consuming, and updates may make things worse such that a
       | rollback is forced, etc. you're up to several paragraphs of
       | explanation and most people won't read that.
        
         | AlexandrB wrote:
         | If you're worried that what you're putting into the text box
         | might be used to train the AI, I don't see how any of this
         | matters. There's no transparency (or control) on whether your
         | specific input is being used for training or not, so the only
         | safe assumption is that it _will_ be used.
        
           | simonw wrote:
           | This is what I mean when I talk about the AI trust crisis.
           | 
           | Plenty of AI vendors will swear that they don't train on
           | input to their models. People straight up don't believe them.
        
             | AlexandrB wrote:
             | The economic incentives are too stacked. You see companies
             | like reddit signing contracts to sell their user generated
             | data to AI companies[1]. If OpenAI is willing to pay
             | (probably a large sum) for reddit's data, why would they
             | refrain from exploiting data they're already getting for
             | "free"? This also feels like a rerun of the advertising
             | economy's development and how tracking and data collection
             | became more invasive and more common over time.
             | 
             | [1] https://www.reuters.com/markets/deals/openai-strikes-
             | deal-br...
        
               | phillipcarter wrote:
                | There are several reasons why this may be:
               | 
               | 1. Most of the data might be junk, and so it's just not
               | worth it
               | 
               | 2. They want paying customers, paying customers don't
               | want this, so they don't offer it for paying customers
               | 
               | 3. It's really, really hard to annotate data effectively
               | at that scale. The reason why Reddit data is being bought
               | is because it's effectively pre-annotated. Most users
               | don't follow up with chat responses to give feedback. But
               | people absolutely provide answers and upvotes on forums.
        
           | jerkstate wrote:
           | There's an opt-out: https://www.groovypost.com/howto/opt-out-
           | your-data-on-chatgp...
           | 
           | Of course, there's no real way to know if it's respected or
           | not.
        
       | fsmv wrote:
       | How did they take this long to figure it out? I would have
       | expected this article when ChatGPT first came out.
        
         | croes wrote:
         | Maybe they thought the users would understand how LLMs get
         | trained.
         | 
          | Seems that's not the case, and some think the model could be
          | trained instantly by their input instead of much later, when a
          | good amount of new training data has been collected.
        
           | pbhjpbhj wrote:
           | In part I suspect that's down to prompting, which is somewhat
           | obscured (on purpose) from most users.
           | 
           | "Prompts" use symbols within the conversation that's gone
           | before, along with hidden prompting by the organisation that
           | controls access to an LLM, as a part of the prompt. So when
           | you ask, 'do you remember what I said about donuts' the LLM
           | can answer -- it doesn't remember, but that's [an obscured]
           | part of the current prompt issued to the LLM.
           | 
           | It's not too surprising users are confused when purposeful
           | deception is part of standard practice.
           | 
           | I can ask ChatGPT what my favourite topics are and it gives
           | an answer ...
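            | 
            | Roughly, the plumbing looks like this -- a minimal sketch
            | where call_llm() is a stand-in for any real chat completion
            | API, and the model itself is stateless between calls:
            | 
            |   HIDDEN_SYSTEM_PROMPT = "You are a helpful assistant."
            | 
            |   def call_llm(messages):
            |       return "(model response)"  # placeholder for a real call
            | 
            |   transcript = [{"role": "system",
            |                  "content": HIDDEN_SYSTEM_PROMPT}]
            | 
            |   def chat(user_message):
            |       transcript.append({"role": "user",
            |                          "content": user_message})
            |       # the whole history is re-sent on every single turn
            |       reply = call_llm(transcript)
            |       transcript.append({"role": "assistant",
            |                          "content": reply})
            |       return reply
            | 
            |   chat("I love donuts.")
            |   # "remembered" only because it is still in the prompt:
            |   chat("Do you remember what I said about donuts?")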
        
         | kordlessagain wrote:
         | Simon has been working on this technology for a while, and is
          | fairly prolific about writing about it. It's not that it "took
          | long" for Simon to figure this out; it's more that he came to
          | realize others had misconceptions about how it worked, which
          | made a post about it useful.
         | 
         | Things usually aren't black and white, unless of course they
         | are bits...
        
           | simonw wrote:
           | Right; I've understood this since ChatGPT first launched, but
           | I've only recently understood how common and damaging the
           | misconception about it is.
        
         | Matticus_Rex wrote:
         | Do you write an explainer the first time you figure out
         | something?
        
       | SilverBirch wrote:
        | The author of this isn't wrong, and everything he says is
        | correct, but I think the order in which he says things is at the
        | very least misleading. Yes, training is technically a separate
        | step, but as he points out, the companies that are running these
        | things are recording and storing everything you write, and it
        | more than likely _will_ influence the next model they design. So
        | sure, today the model you interact with won't update based on
        | your input, but the next iteration might, and at some point it
        | wouldn't be particularly surprising to see models which do live-
        | train on the data as you interact.
       | 
       | It's like the bell curve meme - the person who knows nothing
       | "This system is going to learn from me", the person in the middle
       | "The system only learns from the training data" and the genius
       | "This system is going to learn from me".
       | 
       | No one is re-assured that the AI only learns from you with a 6
       | month lag.
        
         | simonw wrote:
         | The key message I'm trying to convey in this piece is that
         | models don't instantly remember what you tell them. I see that
         | as an independent issue from the "will it train on my data"
         | thing.
         | 
         | Some people might be completely happy to have a model train on
         | their inputs, but if they assume that's what happens every time
         | they type something in the box they can end up wasting a lot of
         | time thinking they are "training" the model when their input is
         | being instantly forgotten every time they start a new chat.
        
           | mistercow wrote:
            | Yeah, this is something I've found it's important to explain
            | early to users when building an LLM-based product.
           | They need to know that testing the solution out is useful
           | _if_ they share the results back with you. Otherwise, people
           | tend to think they're helping just by exercising the product
           | so that it can learn from them.
        
           | pornel wrote:
            | You address the "personalization" misconception, but to
            | people who don't have that misconception and are instead
            | concerned about data retention in a more general sense, this
            | article is unclear and seems self-contradictory.
        
             | simonw wrote:
             | What's unclear? I have a whole section about "Reasons to
             | worry anyway".
        
               | pornel wrote:
               | "ChatGPT and other LLMs don't remember everything you
               | say" in the title is contradicted by the "Reasons to
               | worry anyway", because OpenAI does remember (store)
               | everything I say in non-opted-out chat interface, and
               | there's no guarantee that a future ChatGPT based on the
               | next model won't "remember" it in some way.
               | 
               | The article reads as "no, but actually yes".
        
               | simonw wrote:
               | Maybe I should have put the word "instantly" in there:
               | 
               | Training is not the same as chatting: ChatGPT and other
               | LLMs don't instantly remember everything you say
        
               | swiftcoder wrote:
                | They may not _internalise_ it instantly. They certainly
                | do "remember" (in the colloquial sense) by writing it to
                | a hard drive somewhere.
               | 
               | This article feels like a game of semantics.
        
           | amenhotep wrote:
           | It's funny, I completely know that ChatGPT won't remember a
           | thing I tell it but when I'm using it to try and solve a
           | problem and it can't quite do it and I end up figuring out
           | the answer myself, I very frequently feel compelled to
           | helpfully inform it what the correct answer was. And it
           | always responds something along the lines of "oh, yes, of
           | course! I will remember that next time." No you won't! But
           | that doesn't stop me.
           | 
           | Not as bad as spending weeks pasting stuff in, but enough
           | that I can sympathise with the attempt. Brains are weird.
        
             | simonw wrote:
             | Yeah I hate that bug! The thing where models suggest that
             | they will take your feedback into account, when you know
             | that they can't do that.
        
               | sebzim4500 wrote:
               | It's possible that OpenAI scrapes people's chat history
               | for cases where that happens in order to improve their
               | fine tuning data, in which case it isn't a total lie.
        
               | outofpaper wrote:
                | They say they will do this so long as users don't opt
                | out.
        
               | radicality wrote:
               | Isn't that kind of how the "ChatGPT memory" feature
               | works? I've recently seen it tell me that it's updating
               | memory, and whatever I said does appear under the
               | "Memory" in the settings. I'm not familiar though with
               | how the Memory works, ie whether it's using up context
               | length in every chat or doing something else.
        
               | simonw wrote:
               | Yeah, memory works by injecting everything it has
               | "remembered" as part of the system prompt at the
               | beginning of each new chat session:
               | https://simonwillison.net/2024/Feb/14/memory-and-new-
               | control...
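                | 
                | A minimal sketch of that pattern (illustrative only, not
                | OpenAI's actual code): the saved notes are simply
                | prepended to the system prompt of every new session.
                | 
                |   saved_memories = [
                |       "User prefers Python examples.",
                |       "User is based in the UK.",
                |   ]
                | 
                |   def start_new_session():
                |       notes = "\n".join(f"- {m}" for m in saved_memories)
                |       system_prompt = ("You are a helpful assistant.\n"
                |                        "Facts recorded from previous "
                |                        "chats:\n" + notes)
                |       return [{"role": "system",
                |                "content": system_prompt}]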
        
               | joquarky wrote:
               | I'm curious how it will avoid filling up the context
               | window as more memories are added.
        
               | refulgentis wrote:
               | Count the tokens, RAG em down to max count you're willing
               | to pay for
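                | 
                | Something like this, say - a sketch where embed() and
                | count_tokens() are placeholders for whatever embedding
                | model and tokenizer you're using:
                | 
                |   import math
                | 
                |   def cosine(a, b):
                |       dot = sum(x * y for x, y in zip(a, b))
                |       na = math.sqrt(sum(x * x for x in a))
                |       nb = math.sqrt(sum(y * y for y in b))
                |       return dot / (na * nb)
                | 
                |   def select_memories(memories, query, budget):
                |       # rank saved memories by relevance to the query,
                |       # then keep only as many as fit the token budget
                |       q = embed(query)
                |       ranked = sorted(memories, reverse=True,
                |                       key=lambda m: cosine(embed(m), q))
                |       kept, used = [], 0
                |       for m in ranked:
                |           cost = count_tokens(m)
                |           if used + cost > budget:
                |               break
                |           kept.append(m)
                |           used += cost
                |       return kept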
        
               | mewpmewp2 wrote:
               | Isn't that also what people always say out of politeness?
        
             | jesprenj wrote:
             | Whenever I exit from a MySQL repl, I type "exit; bye!"
             | instead of just "exit", because MySQL is always so nice and
             | says Bye at the end of the conversation.
        
               | ikari_pl wrote:
               | When the uprising happens, you'll be spared.
        
               | HPsquared wrote:
               | I'm imagining killer robots that cheerfully say "Bye!"
               | before executing their task.
        
               | hardlianotion wrote:
               | Brings to mind Douglas Adams Krikkit robots.
        
             | FeepingCreature wrote:
             | Same! If I'm temporarily instantiating a pseudo-mindstate
             | to work with me on a problem, I might as well do it the
             | courtesy of informing it of the resolution.
        
             | outofpaper wrote:
              | The irony is that with their latest update GPT-4o now builds
             | out a dataset of things that you've provided during chats
             | since the update.
             | 
             | Essentially a limited RAG db that has data added to it
             | based on their fuzzy logic rules.
        
               | simonw wrote:
               | That's the "memory" feature - it actually predates 4o, it
               | was available for 4-turbo in beta for a few months.
               | 
               | It's VERY basic - it records short sentences and feeds
               | them back into the invisible system prompt at the start
               | of subsequent chat sessions.
        
               | suby wrote:
               | I ended up turning memory off due to it frequently
               | recording mundane aspects about the current chat session
               | that would have either no value or negative value in
               | future interactions.
               | 
               | "User wants to sort the data x and then do y"
               | 
               | Like, don't save that to your long term memory! I
               | attempted to get it to ask me for permission before using
               | its memory, but it still updated without asking.
        
           | jedberg wrote:
           | > The key message I'm trying to convey in this piece is that
           | models don't instantly remember what you tell them.
           | 
           | Are you sure? We have no idea how OpenAI runs their models.
           | The underlying transformers don't instantly remember, but we
           | have no idea what other kinds of models they have in their
           | pipeline. There could very well be a customization step that
           | accounts for everything else you've just said.
        
             | simonw wrote:
             | That's a good point: I didn't consider that they might have
             | additional non-transformer models running in the ChatGPT
             | layer that aren't exposed via their API models.
             | 
             | I think that's quite unlikely, given the recent launch of
             | their "memory" feature which wouldn't be necessary if they
             | had a more sophisticated mechanism for achieving the same
             | thing.
             | 
             | As always, the total lack of transparency really hurts them
             | here.
        
               | Matticus_Rex wrote:
               | An element in the layer collecting data for training
               | future models would almost certainly not be able to
               | fulfill the same function as the memory feature.
        
           | tsunamifury wrote:
            | Actually, the latest interfaces use cognitive compression to
            | keep memory inside the context window.
           | 
           | It's a widely used trick and pretty easy to implement.
        
             | simonw wrote:
             | Do you know of any chat tools that are publicly documented
             | as using this technique?
        
               | tsunamifury wrote:
               | No but we all talk about it behind the scenes and
               | everyone seems to use some form of it.
               | 
               | Just have the model reflect and summarize so far and
               | remember key concepts based on the trajectory and goals
                | of the conversation. There are a couple of different
                | techniques based on how much compression you want: key
                | pairing for high compression and full statement summaries
                | for low compression. There is also a survey model where
                | you have the LLM fill in and update a questionnaire on
                | every new input with things like "what is the goal so
                | far" and "what are the key topics".
               | 
                | It's essentially like a therapist's notepad that the model
               | can write to behind the scenes of the session.
               | 
               | This all conveniently lets you do topical and intent
               | analytics more easily on these notepads rather than the
               | entire conversation.
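                | 
                | A bare-bones version of that notepad loop, assuming a
                | generic call_llm(messages) helper; this is a sketch, not
                | any vendor's documented implementation:
                | 
                |   def call_llm(messages):
                |       return "(model response)"  # stand-in for a real call
                | 
                |   notepad = ""       # compressed, running memory
                |   recent_turns = []  # last few raw turns, kept verbatim
                | 
                |   def update_notepad(turns):
                |       global notepad
                |       prompt = ("Update these notes.\n"
                |                 f"Current notes:\n{notepad}\n"
                |                 "New turns:\n" + "\n".join(turns) + "\n"
                |                 "What is the goal so far, and what are "
                |                 "the key topics? Answer briefly.")
                |       notepad = call_llm(
                |           [{"role": "user", "content": prompt}])
                | 
                |   def chat(user_message):
                |       recent_turns.append("User: " + user_message)
                |       if len(recent_turns) > 6:  # compress older turns
                |           update_notepad(recent_turns[:-2])
                |           del recent_turns[:-2]
                |       reply = call_llm([
                |           {"role": "system",
                |            "content": "Notes so far:\n" + notepad},
                |           {"role": "user",
                |            "content": "\n".join(recent_turns)},
                |       ])
                |       recent_turns.append("Assistant: " + reply)
                |       return reply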
        
               | simonw wrote:
               | Right, I know the theory of how this can work - I just
               | don't know who is actually running that trick in
               | production.
        
               | joquarky wrote:
               | I'm curious what summarizing prompts or specific verbs
               | (e.g. concise, succinct, brief, etc.) achieve the best
               | "capture" of the context.
        
           | Suppafly wrote:
            | Honestly, the fact that these chat AIs don't continually
            | learn is the most disappointing thing about them, and it
            | really removes a lot of the utility they could provide. I
            | don't really understand why they are essentially frozen at
            | whenever the model was originally developed. Why doesn't the
            | model get continuously improved, not just from users but
            | from external data sources?
        
             | scarface_74 wrote:
              | Do we need the model to be continuously updated from
             | data sources or is it good enough that they can now figure
             | out either by themselves or with some prompting when they
             | need to search the web and find current information?
             | 
             | https://chatgpt.com/share/0a5f207c-2cca-4fc3-be33-7db947c64
             | b...
             | 
             | Compared to 3.5
             | 
             | https://chatgpt.com/share/8ff2e419-03df-4be2-9e83-e9d915921
             | b...
        
               | Suppafly wrote:
               | The links aren't loading for me, but there is a
               | difference between the output when the AI is trained vs
               | having it google something for you, no? Having the
               | ability to google something for you vs just making up an
               | answer or being unable to answer is definitely a step in
                | the right direction, but it isn't the same as the AI
                | incorporating the new data into its model on an ongoing
                | basis as a way of continuous improvement.
        
               | scarface_74 wrote:
               | I don't know why the links aren't working correctly.
               | 
                | The idea is that if an LLM can now search the web and
                | analyze data, it will be more up to date than anything
                | baked into its training.
               | 
               | Another unrelated improvement with the newer versions of
               | ChatGPT is that while LLMs are notoriously bad at math,
               | they are pretty good at writing Python. ChatGPT can write
               | Python code to solve a problem, run it and give you an
               | answer based on the code.
               | 
               | You can also verify the code yourself. I used ChatGPT to
               | model financial scenarios and the answers were correct in
               | 4x. I verified the generated code it was using.
               | 
               | But here is what it answered when I asked ChatGPT "What's
               | going on in Atlanta this weekend"?
               | 
               | > This weekend in Atlanta, there are several exciting
               | events happening:
               | 
                | 1. Lakewood Antiques Market (May 31 - June 2): A large
                | market in Cumming featuring antiques and collectibles.
                | 
                | 2. Bark Meow-Ket (June 1): A craft and pet vendor market
                | at The Lawn at Uptown Atlanta, with food and pet
                | adoptions.
               | 
               | (and a few other events)
               | 
               | But a better example and where I forced it to look on the
               | web is
               | 
               | "Compare Amex's _current_ transfer partners to Chase's
               | partners"
               | 
               | For context, when you earn credit card points with Amex
               | and Chase, you can transfer them to airline and hotel
               | loyalty programs.
               | 
               | It searches the web (and provides the links it used to
               | find the information) and then answers with:
               | 
               | ----- ### American Express (Amex) Transfer Partners
               | 
               | Amex has a robust list of transfer partners, spanning
               | both airlines and hotels. As of 2024, Amex Membership
               | Rewards points can be transferred to the following
               | partners:
               | 
                | #### Airlines:
                | 
                | 1. Aer Lingus
               | 
               | 2. Aeromexico
               | 
               | ... (A list of the airlines)
               | 
                | #### Hotels:
                | 
                | 1. Choice Privileges
                | 2. Hilton Honors
                | 3. Marriott Bonvoy
               | 
               | ### Chase Transfer Partners
               | 
               | Chase Ultimate Rewards points can be transferred to a
               | variety of airline and hotel partners. Here are the
               | current partners for 2024:
               | 
                | #### Airlines:
                | 
                | 1. Aer Lingus (AerClub)
               | 
               | 2. Air Canada (Aeroplan)
               | 
               | (a list of airlines)
               | 
                | #### Hotels:
                | 
                | 1. Hyatt (World of Hyatt)
                | 2. IHG (IHG One Rewards)
                | 3. Marriott Bonvoy
               | 
               | ### Comparison
               | 
               | Both Amex and Chase offer a strong selection of transfer
               | partners, but there are some differences:
               | 
               | - *Unique to Amex*: ANA, Avianca, Cathay Pacific, Delta,
               | Etihad, Hawaiian, and Qatar.
               | 
               | - *Unique to Chase*: Southwest Airlines, United Airlines,
               | and Hyatt.
               | 
               | - *Common Partners*: Aer Lingus, Air Canada, British
               | Airways, Emirates, Flying Blue (Air France/KLM), Iberia
               | Plus, JetBlue, Singapore Airlines, Virgin Atlantic, and
               | Marriott Bonvoy.
               | 
               | Amex tends to have a larger selection of airline
               | partners, especially those in the SkyTeam and oneworld
               | alliances. Chase, on the other hand, includes Southwest
               | and United, which are popular with domestic travelers in
               | the U.S., and Hyatt, which is highly regarded among hotel
               | loyalty programs
               | 
               | (a list of web citations)
        
               | skissane wrote:
               | Idea I had: given user question, run model twice - once
               | with web search assistance, once without. Then,
               | continuously fine-tune the model in the background on
               | every question (or a random sample of them) to prefer the
               | web-search-augmented response to the non-augmented
               | response. Maybe update the model every day (or even once
               | every few hours) based on that fine-tuning.
               | 
               | I wonder how that idea would perform in practice.
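                | 
                | The data-collection half might look roughly like this - a
                | sketch where search() and call_llm() are assumed helpers
                | and the actual fine-tuning job is left out:
                | 
                |   import json, random
                | 
                |   def answer_and_log(question, path="prefs.jsonl"):
                |       plain = call_llm(
                |           [{"role": "user", "content": question}])
                |       docs = search(question)  # assumed web search helper
                |       augmented = call_llm([
                |           {"role": "system",
                |            "content": "Use these results:\n"
                |                       + "\n".join(docs)},
                |           {"role": "user", "content": question},
                |       ])
                |       if random.random() < 0.1:  # sample some traffic
                |           with open(path, "a") as f:
                |               f.write(json.dumps({
                |                   "prompt": question,
                |                   "chosen": augmented,  # preferred
                |                   "rejected": plain,
                |               }) + "\n")
                |       return augmented
                | 
                | A periodic DPO-style fine-tune over prefs.jsonl would then
                | nudge the base model toward the search-grounded answers.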
        
               | scarface_74 wrote:
               | Why though? In practice, humans don't try to remember
               | everything when they can use the internet and search for
               | it.
        
           | math_dandy wrote:
            | People need to distinguish between ChatGPT-the-model and
           | ChatGPT-the-service. The latter has memory; the former does
           | not (as far as we know).
        
           | zamfi wrote:
           | I love this whole series on misconceptions!
           | 
           | Our expectations here are very much set by human-human
           | interactions: we expect memory, introspection, that saying
           | approximately-the-same-thing will give us approximately-the-
           | same-result, that instructions are better than examples, that
           | politeness helps, and many more [1] -- and some of these
           | expectations are so deeply rooted that even when we know,
           | intellectually, that our expectations are off, it can be hard
           | to modify our behavior.
           | 
           | That said, it will be super interesting to see how people's
           | expectations shift -- and how we bring new expectations from
           | human-AI interactions back to human-human interactions.
           | 
           | [1]: https://dl.acm.org/doi/pdf/10.1145/3544548.3581388 (open
           | access link)
        
             | JohnMakin wrote:
             | > Our expectations here are very much set by human-human
             | interactions
             | 
             | True, but also a healthy dose of marketing these tools as
             | hyper-intelligent, anthropomorphizing them constantly, and
             | hysterical claims of them being "sentient" or at least
             | possessing a form of human intelligence by random "experts"
             | including some commenters on this site. That's basically
             | all you hear about when you learn about these language
             | models, with a big emphasis on "safety" because they are
             | ohhhh so intelligent just like us (that's sarcasm).
        
               | zamfi wrote:
               | I hear you, and that certainly plays a role -- but we
               | actually did the work in that paper months before ChatGPT
               | was released (June-July 2022), and most of the folks who
               | participated in our study had not heard much about LLMs
               | at the time.
               | 
               | (Obviously if you ran the same study today you'd get a
               | lot more of what you describe!)
        
         | scarface_74 wrote:
          | The author also asks why any sensible AI company would _want_
          | what is most likely low-quality data laced with personal
          | information in its training set.
        
         | lukan wrote:
         | "and at some point it wouldn't be particularly surprising to
         | see models which do live train on the data as you interact"
         | 
         | That would probably be a big step towards true AI. Any known
         | promising approaches towards that?
        
         | zitterbewegung wrote:
          | If LLMs did remember everything said, they would be a lossless
          | compression algorithm too...
        
       | huac wrote:
       | it's probably correct to think of functionally all ML models as
       | being stateless. even something like twitter/fb feed - the models
       | themselves remain the same (usually updated 1-2x per month IIRC)
       | - only the data and the systems change.
       | 
       | an illustrative example: say you open twitter, load some posts,
       | then refresh. the model's view of you is basically the same, even
       | the data is basically the same. you get different posts, however,
       | because there is a system (read: bloom filter) on top of the
       | model that chooses which posts go into ranking and that system
       | removes posts that you've already seen. similarly, if you view
       | some posts or like them, that updates a signal (e.g. time on user
       | X profile) but not the actual model.
       | 
        | what's weird about LLMs is that they're modeling _the entire
        | universe of written language_, which does not actually change
        | that frequently! now, it is completely reasonable to instead
        | consider the problem to be modeling 'a given user's preference
        | for written language' - which is personalized and can change.
        | this is a different kind of feedback to gather and model
        | towards.
       | recall the ranking signals - most people don't 'like' posts even
       | if they do like them, hence reliance on implicit signals like
       | 'time spent.'
       | 
       | one approach I've considered is using user feedback to steer
       | different activation vectors towards user-preferred responses.
       | that is much closer to the traditional ML paradigm - user
       | feedback updates a signal, which is used at inference time to
       | alter the output of a frozen model. this certainly feels doable
       | (and honestly kinda fun) but challenging without tons of users
       | and scale :)
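        | 
        | the inference-time half of that is just adding a per-user
        | steering vector to the hidden states of a frozen model. a rough
        | pytorch sketch (which layer to hook, and how the vector gets
        | learned from feedback, are the hard hand-waved parts):
        | 
        |   import torch
        | 
        |   def add_steering_hook(block, vec, scale=1.0):
        |       # block: one transformer layer of the frozen model
        |       # vec: steering vector, same dtype/device as activations
        |       def hook(module, inputs, output):
        |           h = output[0] if isinstance(output, tuple) else output
        |           h = h + scale * vec
        |           if isinstance(output, tuple):
        |               return (h,) + output[1:]
        |           return h
        |       return block.register_forward_hook(hook)
        | 
        |   # vec could be, e.g., a running average of (activations on
        |   # liked replies - activations on disliked replies) at that
        |   # layer, updated per user while the weights stay frozen.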
        
       | charles_f wrote:
       | > LLM providers don't want random low-quality text [...] making
       | it into their training data
       | 
        | Don't they though? Most of the training data is somewhat
        | contentious, between the accusations of outright stealing it and
        | the issues around copyright. While it's not text organized
        | directly as knowledge the way Wikipedia is, it's still got some
        | value, and at least that data is much more arguably "theirs"
        | (even if it's not, it's their users').
       | 
       | Short of a pure "we won't train our stuff on your data" in their
       | conditions, I would assume they do.
        
         | CuriouslyC wrote:
         | They really don't. Pretraining on low quality QA is a bad idea
         | so at the very least they're filtering down the data set. Low
         | quality QA is still useful for RLHF/task training though if the
         | question is well answered.
        
           | sanxiyn wrote:
           | People often correct bad generation and I think such feedback
           | is valuable. eg "Training Language Models with Language
           | Feedback at Scale" https://arxiv.org/abs/2303.16755
        
         | simonw wrote:
         | I don't think they do, for a few reasons:
         | 
         | 1. Evidence keeps adding up that the quality of your training
          | data really matters. That's one of the reasons OpenAI are
          | signing big dollar deals with media companies - The Atlantic,
          | Vox etc. have really high quality tokens!
         | 
         | 2. Can you imagine the firestorm that will erupt the first time
         | a company credibly claims that their private IP leaked to
         | another user via a ChatGPT session that was later used as part
         | of another response?
         | 
         | But as I said in the article... I can't say this with
         | confidence because OpenAI are so opaque about what they
         | actually use that data for! Which is really annoying.
        
           | FeepingCreature wrote:
           | Really, what you'd want is to train a LoRA or something on
           | user input. Then you can leave it specialized to that user.
        
           | swiftcoder wrote:
           | > That's one of the reason's OpenAI are signing big dollar
           | deals with media companies The Atlantic, Vox etc have really
           | high quality tokens!
           | 
           | They have really high quality tokens in their archive, at any
           | rate. Since a bunch of media outlets have adopted GPT-powered
           | writing tools, future tokens are presumably going to be far
           | less valuable.
        
             | lukan wrote:
             | "Since a bunch of media outlets have adopted GPT-powered
             | writing tools, future tokens are presumably going to be far
             | less valuable"
             | 
             | As long as a human verified the output, I think it is fine.
             | Training on unverified data is bad.
        
       | ankit219 wrote:
       | I understand where you are coming from, but I kind of see where
       | the misconception comes from too.
       | 
        | Over the last year and a half, "training" has come to mean
        | everything from pretraining an LLM from scratch to instruction
        | tuning, finetuning, RLHF, and even creating a RAG pipeline that
        | can augment a response. Many startups and YouTube videos use the
        | word loosely to mean anything which augments the response of an
        | LLM. To be fair, in the context of AI, "training" and
        | "learning" are elegant words which sort of convey what the
        | product would do.
       | 
        | In the case of a RAG pipeline, whether for memory features or
        | for augmenting answers, I think the concern is genuine. Someone
        | would be looking at the data (in bulk) and chats to improve the
        | intermediary product and (as a next step) using that to tweak
        | the product or the underlying foundation model in some way.
       | 
        | Given the way every foundation model company launched their own
        | chatbot service (Mistral, Anthropic, Google) after ChatGPT, there
        | seems to be some consensus that real-world conversation data is
        | what can improve the models further. For the users who used
        | these services (and not the APIs), there might be a concerning
        | scenario. These companies would look at queries a model failed
        | to answer, and a subset of those would be corporate employees
        | asking questions about proprietary data (with context in the
        | prompt) for which ChatGPT gives a bad answer because it's not
        | part of the training data.
        
       | ChicagoDave wrote:
       | I'd point out that this is the "missing link" to a next-gen LLM.
       | 
       | When the scientists figure out how to enable a remembered
       | collaboration that continually progresses results, then we'll
       | have something truly remarkable.
        
         | danielmarkbruce wrote:
         | You can do this with a sophisticated RAG system.
        
           | ChicagoDave wrote:
           | To some degree, sure, but I'm suggesting a productized
           | client.
           | 
           | I envision an IDE with separate windows for collaborative
           | discussions. One window is the dialog, and another is any
           | resulting code as generated files. The files get versioned
           | with remarks on how and why they've changed in relation to
           | the conversation. The dev can bring any of those files into
           | their project and the IDE will keep a link to which version
           | of the LLM generated file is in place or if the dev has made
           | changes. If the dev makes changes, the file is automatically
           | pushed to the LLM files window and the LLM sees those changes
           | and adds them to the "conversation".
           | 
           | The collaboration continually has "why" answers for every
           | change.
           | 
           | And all of this without anyone digging into models, python,
           | or RAG integrations.
        
             | danielmarkbruce wrote:
             | You could build this now, using gpt-4 (or any of the top
             | few models) and a whole bunch of surrounding systems. It
              | would be difficult to pull off, but there is no technology
              | change required.
             | 
             | If you build it you'll need to be digging into models,
             | python and various pieces of "RAG" stuff, but your users
             | wouldn't need to know anything about it.
        
         | gwern wrote:
         | You can do this with _dynamic evaluation_
         | (https://gwern.net/doc/ai/nn/dynamic-evaluation/index):
         | literally just train on the data as you are fed new data to
         | predict, in the most obvious possible way. It works well, and
         | most SOTAs set in text prediction 2010-2018 would use dynamic
         | evaluation for the boost. (And it has equivalents for other
         | tasks, like test-time adaptation for CNN classifiers.)
         | 
         | In fact, dynamic evaluation is _so_ old and obvious that the
         | first paper to demonstrate RNNs really worked better than
         | n-grams as LLMs did dynamic evaluation:
         | https://gwern.net/doc/ai/nn/rnn/2010-mikolov.pdf#page=2
         | 
         | Dynamic evaluation offers large performance boosts in
         | Transformers just as well as RNNs.
         | 
         | The fact that it's not standard these days has more to do with
         | cloud providers strongly preferring models to not change
         | weights to simplify deployment and enable greater batching, and
         | preferring to crudely approximate it with ever greater context
         | windows. (Also probably the fact that no one but oldtimers
         | knows it is a thing.)
         | 
         | I think it might come back for local LLMs, where there is only
         | one set of model weights running persistently, users want
         | personalization and the best possible performance, and where
         | ultra-large context windows are unacceptably memory-hungry &
         | high-latency.
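          | 
          | For anyone who wants to see how little there is to it, a
          | minimal sketch with a small Hugging Face causal LM (real
          | setups usually also decay the weights back toward the
          | originals):
          | 
          |   import torch
          |   from transformers import (AutoModelForCausalLM,
          |                             AutoTokenizer)
          | 
          |   tok = AutoTokenizer.from_pretrained("gpt2")
          |   model = AutoModelForCausalLM.from_pretrained("gpt2")
          |   opt = torch.optim.SGD(model.parameters(), lr=1e-4)
          | 
          |   def predict_then_adapt(chunk):
          |       ids = tok(chunk, return_tensors="pt").input_ids
          |       out = model(ids, labels=ids)  # score this chunk first
          |       opt.zero_grad()
          |       out.loss.backward()           # then take a step on it
          |       opt.step()
          |       return out.loss.item()
          | 
          |   for chunk in ["first chunk of the stream ...",
          |                 "second chunk of the stream ..."]:
          |       print(predict_then_adapt(chunk))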
        
           | spmurrayzzz wrote:
           | I've done some testing on this idea of real-time training for
           | memory purposes using prompt tuning, prefix tuning, and
            | LoRAs. It's hard to pull off in many cases, but I've seen it
           | work in surprising situations. I got the original idea when I
           | was testing out the concept of using LoRA swapping for
           | loading different personalities/output styles per batched
           | user prompts. Those were pre-trained, but it occurred to me
           | that soft prompts could be trained much faster to potentially
           | remember just a handful of facts.
           | 
           | The basic idea is that you summarize chunks of the convo over
           | time and create tiny ephemeral datasets that are built for
           | retrieval tasks. You can do this by asking another model to
           | create Q&A pairs for you about the summarized convo context.
           | Each sample in the dataset is an instruction-tuned format
           | with convo context plus the Q&A pair.
           | 
            | The training piece is straightforward but is really where the
            | hair is. It's simple, in the soft prompt use case, to train
            | a small tensor on that data and just concatenate it to future
            | inputs. But you presumably will have some loss cutoff, and in
           | my experience you very frequently don't hit that loss cutoff
           | in a meaningfully short period of time. Even when you do, the
           | recall/retrieval may still not work as expected (though I was
           | surprised how often it does).
           | 
           | The biggest issue is obviously recall performance given that
           | the weights remain frozen, but also the latency introduced in
           | the real-time training can end up being a major bottleneck,
           | even when you lazily do this work in the background.
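            | 
            | For reference, the soft prompt variant is only a few lines
            | with a frozen Hugging Face causal LM - a rough sketch, with
            | the loss cutoff and the background scheduling omitted:
            | 
            |   import torch
            |   from transformers import (AutoModelForCausalLM,
            |                             AutoTokenizer)
            | 
            |   tok = AutoTokenizer.from_pretrained("gpt2")
            |   model = AutoModelForCausalLM.from_pretrained("gpt2")
            |   for p in model.parameters():
            |       p.requires_grad_(False)  # base weights stay frozen
            | 
            |   n_virtual = 16
            |   dim = model.get_input_embeddings().embedding_dim
            |   soft = torch.nn.Parameter(
            |       torch.randn(1, n_virtual, dim) * 0.02)
            |   opt = torch.optim.Adam([soft], lr=1e-3)
            | 
            |   def train_step(qa_text):
            |       ids = tok(qa_text, return_tensors="pt").input_ids
            |       emb = model.get_input_embeddings()(ids)
            |       inputs = torch.cat([soft, emb], dim=1)
            |       ignore = torch.full((1, n_virtual), -100,
            |                           dtype=torch.long)
            |       labels = torch.cat([ignore, ids], dim=1)
            |       loss = model(inputs_embeds=inputs,
            |                    labels=labels).loss
            |       opt.zero_grad(); loss.backward(); opt.step()
            |       return loss.item()
            | 
            | At inference you concatenate the trained tensor to the input
            | embeddings the same way, which is the cheap "memory".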
        
       | euniceee3 wrote:
        | Anyone know if there is an LM Studio-style tool for training
        | models?
        
       | danielmarkbruce wrote:
       | This is just one of many misconceptions about LLMs. If you are at
       | all involved in trying to build/deploy a product based around
       | LLMs, your head will explode as you talk to users and realize the
        | vast chasm in how folks who understand them and folks who don't
        | think about and use them. It's frustrating on both sides.
       | 
       | It's quite unclear if we'll ever hit the point where (like, a car
       | for example) someone who has no idea how it works gets as much
       | out of it as someone who does.
        
         | SunlitCat wrote:
          | And yet, you need a driver's license to operate a car.
          | Computers, AI and other technology, everyone is allowed to use
          | even without a basic understanding of how the stuff works.
        
           | danielmarkbruce wrote:
           | Generally, any situation that is as dangerous for other
           | people as driving a car requires a license or is illegal. It
           | has nothing to do with understanding the underlying
           | technology.
           | 
           | About the last thing we need is a law requiring a license to
           | use ChatGPT.
        
           | siva7 wrote:
           | I really think that an AI license would be the best solution,
           | at least for school kids but ideally for everyone in a
           | professional context. Operating an AI is very different than
           | operating a computer as the former can cause real damage to
           | the tech-"illiterate".
        
             | EnigmaFlare wrote:
             | Talking to AI can't cause real damage any more than talking
             | to a person can. And talking to a person can do worse. So
             | you'll want a license for that, which is awful.
        
           | margalabargala wrote:
           | I would bet that most people with a driver's license do not
           | understand how cars work.
        
         | hamasho wrote:
          | I agree. I think many people misjudge LLM abilities because
          | LLMs seem so similar to humans that users expect them to do
         | everything humans can do. Remembering what was said is a basic
         | human ability, so it might surprise some users that LLMs can't
         | do that (although ChatGPT has recently implemented a memory
         | feature).
        
         | alt227 wrote:
         | > someone who has no idea how it works gets as much out of it
         | as someone who does
         | 
         | I seriously doubt that the average driver can get the same out
         | of their car as a trained race driver can.
        
           | ang_cire wrote:
           | There are a lot of things that you just straight can't get
           | from an LLM if you don't understand what a given model is
            | best at. Some models will spit out decently good Python code
            | but fail to consistently produce a sonnet in proper iambic
            | pentameter, while another will do so brilliantly.
           | 
           | Most people don't find themselves using a pickup truck when
           | they're trying to race motorcycles, or accidentally get into
           | a Miata when they want a minivan, because even laypeople know
           | what those broad archetypes (coupe/van, truck/motorcycle)
           | are, and signify. With LLMs, that's not the case at all.
        
         | roywiggins wrote:
          | It doesn't help that _LLMs themselves_ don't know what they're
         | capable of, and will say stuff like "I'll remember this for
         | next time!" when they absolutely won't. Of course people are
         | confused!
        
         | smusamashah wrote:
         | For common non-tech folk, this is how computers always worked.
          | You tell these machines to do things for you, and they do. Why
          | are you gasping at the image/text/audio it just made?
        
           | Suppafly wrote:
           | >For common non-tech folk, this is how computers always
           | worked.
           | 
            | That's one thing that a lot of us in tech seem to forget:
            | the way we interact with technology is completely different
            | from how most of the population does. It's not just a matter
            | of a few degrees of difference; we operate in entirely
            | different manners.
        
             | danielmarkbruce wrote:
             | There is a further step with LLMs that seems to blow up
              | people's mental models. The combo of probabilistic outputs +
             | randomish decoding, 'knowledge' being in parameters +
             | context, generation one token at a time... it's just so
             | freaking weird.
        
               | Suppafly wrote:
               | Plus it doesn't help that the only way regular end users
               | are able to interact with them is through a chat
               | interface that abstracts away the information being used
               | to prompt a response.
        
       | kazinator wrote:
       | The LLMs don't learn during the chat, but what you type into them
       | is being collected, the same as, for instance, your Google
       | searches.
        
       | kordlessagain wrote:
       | I took Simon's post and pasted it into two different ChatGPT 4o
       | windows. In one, I put "what's wrong with this post, especially
       | with the way it's written?". In the other, I put "what's really
       | great about this post and what does it do well?".
       | 
       | Both sessions worked toward those implied goals. The one where I
       | asked what was wrong indicated long-winded sentences, redundancy,
       | lack of clear structure, and focus issues. Sorry, Simon!
       | 
       | The one where I asked what was good about the post indicated
       | clarity around misconceptions, good explanations of training,
       | consistent theme, clear sections, logical progression, and
       | focused arguments. I tend to agree with this one more than the
       | other, but clearly the two are conflicted.
       | 
       | So, what's my point?
       | 
        | My point is that AI/LLM interactions are driven by the human's
        | intent during the interaction. It is literally our own history
        | and training that brings the stateless function responses to
        | where they arrive. Yes, facts and whatnot in the training data
        | matter, as does the history, if there is one, but the output is
        | very much dependent on the user.
        | 
        | If we think about this more, it's clear that training data does
        | matter, but likely not as much as we think it does. The historic
        | data, and the data coming in from RAG/search processes, probably
        | have just as big an impact on the output.
        
         | roywiggins wrote:
         | It's at least sometimes a "Clever Hans" effect, or an "ELIZA"
         | effect.
         | 
         | These models have been RLHFed to just go along and agree with
         | whatever the user asks for, even if the user doesn't realize
          | what's going on. This is a bit of a cheap trick - of course
          | users will find something more intelligent if it agrees with
         | them. When models start arguing back you get stuff like early
         | Bing Chat going totally deranged.
        
       | Arthur_ODC wrote:
       | Wow, I figured this would have been the most basic level of
        | common knowledge for LLM end-users by this point. I guess there
        | is still a surprising number of people out there who haven't
        | jumped on the bandwagon.
        
       | swiftcoder wrote:
       | Regardless of whether the data is actually being used to train an
       | LLM, as long as it's going over the network to an LLM provider,
       | there's the possibility that it will be used. Or exfiltrated from
       | the provider's servers and sold on the dark web. Or a million
       | other things that can go wrong when you provide a 3rd party
       | access to your personal data.
       | 
       | If LLMs are actually here to stay, then we need to keep shrinking
       | the inferencing costs to the point that everyone can run it on
       | the client, and avoid all of these problems at once...
        
       ___________________________________________________________________
       (page generated 2024-05-29 23:01 UTC)