[HN Gopher] Training is not the same as chatting: LLMs don't rem...
___________________________________________________________________
Training is not the same as chatting: LLMs don't remember
everything you say
Author : simonw
Score : 154 points
Date : 2024-05-29 11:14 UTC (11 hours ago)
(HTM) web link (simonwillison.net)
(TXT) w3m dump (simonwillison.net)
| gmerc wrote:
| Apart from the forced memories, which are super annoying in
| ChatGPT.
| simonw wrote:
| Yeah, I turned that feature off - it kept on making decidedly
| non-useful decisions about what to remember.
| phillipcarter wrote:
| Yeah, the difference between "train on this data" and "these
| interactions might be collected and aggregated across sessions to
| be included in a future training dataset" is a difficult one to
| message to most people in software, let alone consumers without
| that background. By the time you're done explaining how most data
| is largely garbage, labeling and curating data is difficult and
| time-consuming, and updates may make things worse such that a
| rollback is forced, etc. you're up to several paragraphs of
| explanation and most people won't read that.
| AlexandrB wrote:
| If you're worried that what you're putting into the text box
| might be used to train the AI, I don't see how any of this
| matters. There's no transparency (or control) on whether your
| specific input is being used for training or not, so the only
| safe assumption is that it _will_ be used.
| simonw wrote:
| This is what I mean when I talk about the AI trust crisis.
|
| Plenty of AI vendors will swear that they don't train on
| input to their models. People straight up don't believe them.
| AlexandrB wrote:
| The economic incentives are too stacked. You see companies
| like reddit signing contracts to sell their user generated
| data to AI companies[1]. If OpenAI is willing to pay
| (probably a large sum) for reddit's data, why would they
| refrain from exploiting data they're already getting for
| "free"? This also feels like a rerun of the advertising
| economy's development and how tracking and data collection
| became more invasive and more common over time.
|
| [1] https://www.reuters.com/markets/deals/openai-strikes-
| deal-br...
| phillipcarter wrote:
| There are several reasons why this may be:
|
| 1. Most of the data might be junk, and so it's just not
| worth it
|
| 2. They want paying customers, paying customers don't
| want this, so they don't offer it for paying customers
|
| 3. It's really, really hard to annotate data effectively
| at that scale. The reason why Reddit data is being bought
| is because it's effectively pre-annotated. Most users
| don't follow up with chat responses to give feedback. But
| people absolutely provide answers and upvotes on forums.
| jerkstate wrote:
| There's an opt-out: https://www.groovypost.com/howto/opt-out-
| your-data-on-chatgp...
|
| Of course, there's no real way to know if it's respected or
| not.
| fsmv wrote:
| How did it take them this long to figure it out? I would have
| expected this article when ChatGPT first came out.
| croes wrote:
| Maybe they thought the users would understand how LLMs get
| trained.
|
| Seems that's not the case, and some think the model could be
| trained instantly by their input instead of much later, when a
| good amount of new training data has been collected.
| pbhjpbhj wrote:
| In part I suspect that's down to prompting, which is somewhat
| obscured (on purpose) from most users.
|
| "Prompts" use symbols within the conversation that's gone
| before, along with hidden prompting by the organisation that
| controls access to an LLM, as a part of the prompt. So when
| you ask, 'do you remember what I said about donuts' the LLM
| can answer -- it doesn't remember, but that's [an obscured]
| part of the current prompt issued to the LLM.
|
| It's not too surprising users are confused when purposeful
| deception is part of standard practice.
|
| I can ask ChatGPT what my favourite topics are and it gives
| an answer ...
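|
| Roughly, the chat UI rebuilds the whole prompt every turn. A
| minimal sketch (hypothetical code, not any vendor's actual
| implementation) of what the model actually receives:
|
|   # The model is stateless: it only sees what is sent this turn.
|   HIDDEN_SYSTEM_PROMPT = "You are a helpful assistant."
|
|   def build_prompt(history, new_user_message):
|       """Re-send the entire visible conversation every turn."""
|       msgs = [{"role": "system", "content": HIDDEN_SYSTEM_PROMPT}]
|       msgs.extend(history)  # everything said so far this session
|       msgs.append({"role": "user", "content": new_user_message})
|       return msgs
|
|   history = [
|       {"role": "user", "content": "I love donuts."},
|       {"role": "assistant", "content": "Noted!"},
|   ]
|   print(build_prompt(history, "What did I say about donuts?"))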
| kordlessagain wrote:
| Simon has been working on this technology for a while, and
| writes about it fairly prolifically. It's not that it "took
| long" for Simon to figure this out; more likely he came to
| realize that others had misconceptions about how it works, and
| so the usefulness of posting about it presented itself.
|
| Things usually aren't black and white, unless of course they
| are bits...
| simonw wrote:
| Right; I've understood this since ChatGPT first launched, but
| I've only recently understood how common and damaging the
| misconception about it is.
| Matticus_Rex wrote:
| Do you write an explainer the first time you figure out
| something?
| SilverBirch wrote:
| The author of this isn't wrong; everything he says is correct,
| but I think the order in which he says things is at the very
| least misleading. Yes, training is technically a separate step,
| but as he points out, the companies running these things are
| recording and storing everything you write, and it more than
| likely _will_ influence the next model they design. So sure,
| today the model you interact with won't update based on your
| input, but the next iteration might, and at some point it
| wouldn't be particularly surprising to see models that do live-
| train on the data as you interact.
|
| It's like the bell curve meme - the person who knows nothing
| "This system is going to learn from me", the person in the middle
| "The system only learns from the training data" and the genius
| "This system is going to learn from me".
|
| No one is reassured that the AI only learns from you with a six-
| month lag.
| simonw wrote:
| The key message I'm trying to convey in this piece is that
| models don't instantly remember what you tell them. I see that
| as an independent issue from the "will it train on my data"
| thing.
|
| Some people might be completely happy to have a model train on
| their inputs, but if they assume that's what happens every time
| they type something in the box they can end up wasting a lot of
| time thinking they are "training" the model when their input is
| being instantly forgotten every time they start a new chat.
| mistercow wrote:
| Yeah, this is something I've found that it's important to
| explain early to users when building an LLM-based product.
| They need to know that testing the solution out is useful
| _if_ they share the results back with you. Otherwise, people
| tend to think they're helping just by exercising the product
| so that it can learn from them.
| pornel wrote:
| You address the "personalization" misconception, but to people
| who don't have that misconception and are instead concerned
| about data retention in a more general sense, this article is
| unclear and seems self-contradictory.
| simonw wrote:
| What's unclear? I have a whole section about "Reasons to
| worry anyway".
| pornel wrote:
| "ChatGPT and other LLMs don't remember everything you
| say" in the title is contradicted by the "Reasons to
| worry anyway", because OpenAI does remember (store)
| everything I say in the non-opted-out chat interface, and
| there's no guarantee that a future ChatGPT based on the
| next model won't "remember" it in some way.
|
| The article reads as "no, but actually yes".
| simonw wrote:
| Maybe I should have put the word "instantly" in there:
|
| Training is not the same as chatting: ChatGPT and other
| LLMs don't instantly remember everything you say
| swiftcoder wrote:
| They may not _internalise_ it instantly. They certainly
| do "remember" (in the colloquial sense) by writing it to
| a hard drive somewhere.
|
| This article feels like a game of semantics.
| amenhotep wrote:
| It's funny, I completely know that ChatGPT won't remember a
| thing I tell it but when I'm using it to try and solve a
| problem and it can't quite do it and I end up figuring out
| the answer myself, I very frequently feel compelled to
| helpfully inform it what the correct answer was. And it
| always responds something along the lines of "oh, yes, of
| course! I will remember that next time." No you won't! But
| that doesn't stop me.
|
| Not as bad as spending weeks pasting stuff in, but enough
| that I can sympathise with the attempt. Brains are weird.
| simonw wrote:
| Yeah I hate that bug! The thing where models suggest that
| they will take your feedback into account, when you know
| that they can't do that.
| sebzim4500 wrote:
| It's possible that OpenAI scrapes people's chat history
| for cases where that happens in order to improve their
| fine tuning data, in which case it isn't a total lie.
| outofpaper wrote:
| They say they will do this so long as users don't opt
| out.
| radicality wrote:
| Isn't that kind of how the "ChatGPT memory" feature
| works? I've recently seen it tell me that it's updating
| memory, and whatever I said does appear under the
| "Memory" in the settings. I'm not familiar though with
| how the Memory works, ie whether it's using up context
| length in every chat or doing something else.
| simonw wrote:
| Yeah, memory works by injecting everything it has
| "remembered" as part of the system prompt at the
| beginning of each new chat session:
| https://simonwillison.net/2024/Feb/14/memory-and-new-
| control...
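|
| A rough sketch of that pattern (hypothetical code, not OpenAI's
| actual implementation):
|
|   # Saved "memories" are short strings prepended to the system
|   # prompt of every *new* chat session.
|   saved_memories = [
|       "User's favourite topics are LLMs and cycling.",
|       "User prefers Python examples.",
|   ]
|
|   def new_session_prompt(base="You are a helpful assistant."):
|       if not saved_memories:
|           return base
|       notes = "\n".join("- " + m for m in saved_memories)
|       return base + "\n\nThings you know about the user:\n" + notes
|
|   print(new_session_prompt())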
| joquarky wrote:
| I'm curious how it will avoid filling up the context
| window as more memories are added.
| refulgentis wrote:
| Count the tokens, RAG 'em down to the max count you're willing
| to pay for.
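|
| i.e. something along these lines (sketch; a real system would
| use a proper tokenizer and embedding-based relevance scores):
|
|   def count_tokens(text):
|       # Crude stand-in for a real tokenizer.
|       return len(text.split())
|
|   def select_memories(memories, query, budget=200):
|       """Keep the most relevant memories that fit the budget."""
|       def relevance(m):  # toy score: word overlap with the query
|           q = set(query.lower().split())
|           return len(set(m.lower().split()) & q)
|       kept, used = [], 0
|       for m in sorted(memories, key=relevance, reverse=True):
|           cost = count_tokens(m)
|           if used + cost <= budget:
|               kept.append(m)
|               used += cost
|       return kept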
| mewpmewp2 wrote:
| Isn't that also what people always say out of politeness?
| jesprenj wrote:
| Whenever I exit from a MySQL repl, I type "exit; bye!"
| instead of just "exit", because MySQL is always so nice and
| says Bye at the end of the conversation.
| ikari_pl wrote:
| When the uprising happens, you'll be spared.
| HPsquared wrote:
| I'm imagining killer robots that cheerfully say "Bye!"
| before executing their task.
| hardlianotion wrote:
| Brings to mind Douglas Adams' Krikkit robots.
| FeepingCreature wrote:
| Same! If I'm temporarily instantiating a pseudo-mindstate
| to work with me on a problem, I might as well do it the
| courtesy of informing it of the resolution.
| outofpaper wrote:
| The irony is that with their latest update, GPT-4o now builds
| out a dataset of things that you've provided during chats
| since the update.
|
| Essentially a limited RAG db that has data added to it
| based on their fuzzy logic rules.
| simonw wrote:
| That's the "memory" feature - it actually predates 4o, it
| was available for 4-turbo in beta for a few months.
|
| It's VERY basic - it records short sentences and feeds
| them back into the invisible system prompt at the start
| of subsequent chat sessions.
| suby wrote:
| I ended up turning memory off due to it frequently
| recording mundane aspects about the current chat session
| that would have either no value or negative value in
| future interactions.
|
| "User wants to sort the data x and then do y"
|
| Like, don't save that to your long term memory! I
| attempted to get it to ask me for permission before using
| its memory, but it still updated without asking.
| jedberg wrote:
| > The key message I'm trying to convey in this piece is that
| models don't instantly remember what you tell them.
|
| Are you sure? We have no idea how OpenAI runs their models.
| The underlying transformers don't instantly remember, but we
| have no idea what other kinds of models they have in their
| pipeline. There could very well be a customization step that
| accounts for everything else you've just said.
| simonw wrote:
| That's a good point: I didn't consider that they might have
| additional non-transformer models running in the ChatGPT
| layer that aren't exposed via their API models.
|
| I think that's quite unlikely, given the recent launch of
| their "memory" feature which wouldn't be necessary if they
| had a more sophisticated mechanism for achieving the same
| thing.
|
| As always, the total lack of transparency really hurts them
| here.
| Matticus_Rex wrote:
| An element in the layer collecting data for training
| future models would almost certainly not be able to
| fulfill the same function as the memory feature.
| tsunamifury wrote:
| Actually, the latest interfaces use cognitive compression to keep
| memory inside the context window.
|
| It's a widely used trick and pretty easy to implement.
| simonw wrote:
| Do you know of any chat tools that are publicly documented
| as using this technique?
| tsunamifury wrote:
| No but we all talk about it behind the scenes and
| everyone seems to use some form of it.
|
| Just have the model reflect on and summarize the conversation
| so far and remember key concepts based on the trajectory and
| goals of the conversation. There are a couple of different
| techniques depending on how much compression you want: key
| pairing for high compression and full statement summaries for
| low compression. There is also a survey approach where you
| have the LLM fill in and update a questionnaire on every new
| input, with things like "what is the goal so far" and "what
| are the key topics".
|
| It's essentially like a therapist's notepad that the model
| can write to behind the scenes of the session.
|
| This all conveniently lets you do topical and intent
| analytics more easily on these notepads rather than the
| entire conversation.
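|
| A minimal sketch of that notepad trick (the llm() call here is
| a stand-in for whatever model API you're actually using):
|
|   def llm(prompt):
|       # Stand-in for a real model call.
|       return "Notes: user is debugging a slow SQL join; goal is <1s."
|
|   def compress_history(messages, keep_last=4):
|       """Swap older turns for a model-written summary notepad."""
|       if len(messages) <= keep_last:
|           return messages
|       older = "\n".join(m["role"] + ": " + m["content"]
|                         for m in messages[:-keep_last])
|       notepad = llm("Summarize the key facts, goals and decisions "
|                     "in this conversation so far:\n" + older)
|       return ([{"role": "system", "content": notepad}]
|               + messages[-keep_last:])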
| simonw wrote:
| Right, I know the theory of how this can work - I just
| don't know who is actually running that trick in
| production.
| joquarky wrote:
| I'm curious what summarizing prompts or specific verbs
| (e.g. concise, succinct, brief, etc.) achieve the best
| "capture" of the context.
| Suppafly wrote:
| Honestly, the fact that these chat AIs don't continually learn
| is the most disappointing thing about them and removes a lot
| of the utility they could provide. I don't really understand
| why they are essentially frozen in time at whenever the model
| was originally developed. Why doesn't the model get
| continuously improved, not just from users but from external
| data sources?
| scarface_74 wrote:
| Do we need the model to be continuously updated from data
| sources, or is it good enough that they can now figure out,
| either by themselves or with some prompting, when they need to
| search the web and find current information?
|
| https://chatgpt.com/share/0a5f207c-2cca-4fc3-be33-7db947c64
| b...
|
| Compared to 3.5
|
| https://chatgpt.com/share/8ff2e419-03df-4be2-9e83-e9d915921
| b...
| Suppafly wrote:
| The links aren't loading for me, but there is a
| difference between the output when the AI is trained vs
| having it google something for you, no? Having the
| ability to google something for you vs just making up an
| answer or being unable to answer is definitely a step in
| the right direction, but isn't the same as the AI
| incorporating the new data into its model on an ongoing
| basis as a way of continuous improvement.
| scarface_74 wrote:
| I don't know why the links aren't working correctly.
|
| The idea is that if an LLM can now search the web and
| analyze data, it will be more up to date than its training
| data could ever be.
|
| Another unrelated improvement with the newer versions of
| ChatGPT is that while LLMs are notoriously bad at math,
| they are pretty good at writing Python. ChatGPT can write
| Python code to solve a problem, run it and give you an
| answer based on the code.
|
| You can also verify the code yourself. I used ChatGPT to
| model financial scenarios and the answers were correct in all
| four cases. I verified the generated code it was using.
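|
| The code it writes for that kind of question is plain,
| checkable Python - for example (an illustrative sketch, not the
| actual code from my chats):
|
|   def future_value(monthly_deposit, annual_rate, years):
|       """Future value of monthly deposits, compounded monthly."""
|       r = annual_rate / 12
|       n = years * 12
|       return monthly_deposit * (((1 + r) ** n - 1) / r)
|
|   # e.g. $500/month at 5% for 20 years
|   print(round(future_value(500, 0.05, 20), 2))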
|
| But here is what it answered when I asked ChatGPT "What's
| going on in Atlanta this weekend"?
|
| > This weekend in Atlanta, there are several exciting
| events happening:
|
| 1. Lakewood Antiques Market (May 31 - June 2): A large
| market in Cumming featuring antiques and collectibles.
|
| 2. Bark Meow-Ket (June 1): A craft and pet vendor market
| at The Lawn at Uptown Atlanta, with food and pet
| adoptions.
|
| (and a few other events)
|
| But a better example and where I forced it to look on the
| web is
|
| "Compare Amex's _current_ transfer partners to Chase's
| partners"
|
| For context, when you earn credit card points with Amex
| and Chase, you can transfer them to airline and hotel
| loyalty programs.
|
| It searches the web (and provides the links it used to
| find the information) and then answers with:
|
| -----
|
| ### American Express (Amex) Transfer Partners
|
| Amex has a robust list of transfer partners, spanning
| both airlines and hotels. As of 2024, Amex Membership
| Rewards points can be transferred to the following
| partners:
|
| #### Airlines:
|
| 1. Aer Lingus
|
| 2. Aeromexico
|
| ... (A list of the airlines)
|
| #### Hotels:
|
| 1. Choice Privileges
|
| 2. Hilton Honors
|
| 3. Marriott Bonvoy
|
| ### Chase Transfer Partners
|
| Chase Ultimate Rewards points can be transferred to a
| variety of airline and hotel partners. Here are the
| current partners for 2024:
|
| #### Airlines:
|
| 1. Aer Lingus (AerClub)
|
| 2. Air Canada (Aeroplan)
|
| (a list of airlines)
|
| #### Hotels:
|
| 1. Hyatt (World of Hyatt)
|
| 2. IHG (IHG One Rewards)
|
| 3. Marriott Bonvoy
|
| ### Comparison
|
| Both Amex and Chase offer a strong selection of transfer
| partners, but there are some differences:
|
| - *Unique to Amex*: ANA, Avianca, Cathay Pacific, Delta,
| Etihad, Hawaiian, and Qatar.
|
| - *Unique to Chase*: Southwest Airlines, United Airlines,
| and Hyatt.
|
| - *Common Partners*: Aer Lingus, Air Canada, British
| Airways, Emirates, Flying Blue (Air France/KLM), Iberia
| Plus, JetBlue, Singapore Airlines, Virgin Atlantic, and
| Marriott Bonvoy.
|
| Amex tends to have a larger selection of airline
| partners, especially those in the SkyTeam and oneworld
| alliances. Chase, on the other hand, includes Southwest
| and United, which are popular with domestic travelers in
| the U.S., and Hyatt, which is highly regarded among hotel
| loyalty programs.
|
| (a list of web citations)
| skissane wrote:
| Idea I had: given user question, run model twice - once
| with web search assistance, once without. Then,
| continuously fine-tune the model in the background on
| every question (or a random sample of them) to prefer the
| web-search-augmented response to the non-augmented
| response. Maybe update the model every day (or even once
| every few hours) based on that fine-tuning.
|
| I wonder how that idea would perform in practice.
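|
| The data-collection side of that might look something like
| this (sketch; the two answer_* functions are stand-ins for the
| real pipeline):
|
|   import random
|
|   # Stand-ins for the real "model only" and "model + web" paths.
|   def answer_without_search(question):
|       return "(model-only answer to: " + question + ")"
|
|   def answer_with_search(question):
|       return "(web-augmented answer to: " + question + ")"
|
|   def collect_preference_pairs(questions, sample_rate=0.1):
|       """Build chosen/rejected pairs for preference fine-tuning."""
|       pairs = []
|       for q in questions:
|           if random.random() > sample_rate:
|               continue
|           pairs.append({
|               "prompt": q,
|               "chosen": answer_with_search(q),    # preferred
|               "rejected": answer_without_search(q),
|           })
|       return pairs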
| scarface_74 wrote:
| Why though? In practice, humans don't try to remember
| everything when they can use the internet and search for
| it.
| math_dandy wrote:
| People need to distinguish between ChatGPT-the-model and
| ChatGPT-the-service. The latter has memory; the former does
| not (as far as we know).
| zamfi wrote:
| I love this whole series on misconceptions!
|
| Our expectations here are very much set by human-human
| interactions: we expect memory, introspection, that saying
| approximately-the-same-thing will give us approximately-the-
| same-result, that instructions are better than examples, that
| politeness helps, and many more [1] -- and some of these
| expectations are so deeply rooted that even when we know,
| intellectually, that our expectations are off, it can be hard
| to modify our behavior.
|
| That said, it will be super interesting to see how people's
| expectations shift -- and how we bring new expectations from
| human-AI interactions back to human-human interactions.
|
| [1]: https://dl.acm.org/doi/pdf/10.1145/3544548.3581388 (open
| access link)
| JohnMakin wrote:
| > Our expectations here are very much set by human-human
| interactions
|
| True, but also a healthy dose of marketing these tools as
| hyper-intelligent, anthropomorphizing them constantly, and
| hysterical claims of them being "sentient" or at least
| possessing a form of human intelligence by random "experts"
| including some commenters on this site. That's basically
| all you hear about when you learn about these language
| models, with a big emphasis on "safety" because they are
| ohhhh so intelligent just like us (that's sarcasm).
| zamfi wrote:
| I hear you, and that certainly plays a role -- but we
| actually did the work in that paper months before ChatGPT
| was released (June-July 2022), and most of the folks who
| participated in our study had not heard much about LLMs
| at the time.
|
| (Obviously if you ran the same study today you'd get a
| lot more of what you describe!)
| scarface_74 wrote:
| The author also points out: why would any sensible AI company
| _want_ what is most likely low-quality data with personal
| information in its training set?
| lukan wrote:
| "and at some point it wouldn't be particularly surprising to
| see models which do live train on the data as you interact"
|
| That would probably be a big step towards true AI. Any known
| promising approaches towards that?
| zitterbewegung wrote:
| If LLMs did remember everything said it would be a lossless
| compression algorithm too....
| huac wrote:
| it's probably correct to think of functionally all ML models as
| being stateless. even something like twitter/fb feed - the models
| themselves remain the same (usually updated 1-2x per month IIRC)
| - only the data and the systems change.
|
| an illustrative example: say you open twitter, load some posts,
| then refresh. the model's view of you is basically the same, even
| the data is basically the same. you get different posts, however,
| because there is a system (read: bloom filter) on top of the
| model that chooses which posts go into ranking and that system
| removes posts that you've already seen. similarly, if you view
| some posts or like them, that updates a signal (e.g. time on user
| X profile) but not the actual model.
|
| what's weird about LLMs is that they're modeling _the entire
| universe of written language,_ which does not actually change
| that frequently! now, it is completely reasonable to instead
| consider the problem to be modeling 'a given user's preference
| for written language' - which is personalized and can change.
| this is a different kind of feedback to gather and model
| towards.
| recall the ranking signals - most people don't 'like' posts even
| if they do like them, hence reliance on implicit signals like
| 'time spent.'
|
| one approach I've considered is using user feedback to steer
| different activation vectors towards user-preferred responses.
| that is much closer to the traditional ML paradigm - user
| feedback updates a signal, which is used at inference time to
| alter the output of a frozen model. this certainly feels doable
| (and honestly kinda fun) but challenging without tons of users
| and scale :)
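|
| roughly the shape of the inference-time part (numpy sketch; the
| steering vector itself would come from contrasting preferred
| vs. non-preferred responses):
|
|   import numpy as np
|
|   hidden_dim = 8
|   steering_vector = np.random.randn(hidden_dim)  # learned per user
|   alpha = 0.0                                    # per-user signal
|
|   def apply_steering(hidden_state):
|       """nudge a frozen model's hidden state at inference time."""
|       return hidden_state + alpha * steering_vector
|
|   def record_feedback(liked, lr=0.1):
|       """thumbs up/down updates the signal, never the model."""
|       global alpha
|       alpha += lr if liked else -lr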
| charles_f wrote:
| > LLM providers don't want random low-quality text [...] making
| it into their training data
|
| Don't they though? Most of the training data is somewhat
| contentious, between the accusation of purely stealing it and the
| issues around copyright. While it's not text organized directly
| as knowledge the way Wikipedia is, it's still got some value, and
| at least that data is much more arguably "theirs" (even if it's
| not, it's their users').
|
| Short of a pure "we won't train our stuff on your data" in their
| conditions, I would assume they do.
| CuriouslyC wrote:
| They really don't. Pretraining on low quality QA is a bad idea
| so at the very least they're filtering down the data set. Low
| quality QA is still useful for RLHF/task training though if the
| question is well answered.
| sanxiyn wrote:
| People often correct bad generation and I think such feedback
| is valuable. eg "Training Language Models with Language
| Feedback at Scale" https://arxiv.org/abs/2303.16755
| simonw wrote:
| I don't think they do, for a few reasons:
|
| 1. Evidence keeps adding up that the quality of your training
| data really matters. That's one of the reasons OpenAI are
| signing big-dollar deals with media companies: The Atlantic,
| Vox etc. have really high quality tokens!
|
| 2. Can you imagine the firestorm that will erupt the first time
| a company credibly claims that their private IP leaked to
| another user via a ChatGPT session that was later used as part
| of another response?
|
| But as I said in the article... I can't say this with
| confidence because OpenAI are so opaque about what they
| actually use that data for! Which is really annoying.
| FeepingCreature wrote:
| Really, what you'd want is to train a LoRA or something on
| user input. Then you can leave it specialized to that user.
| swiftcoder wrote:
| > That's one of the reasons OpenAI are signing big-dollar
| deals with media companies: The Atlantic, Vox etc. have really
| high quality tokens!
|
| They have really high quality tokens in their archive, at any
| rate. Since a bunch of media outlets have adopted GPT-powered
| writing tools, future tokens are presumably going to be far
| less valuable.
| lukan wrote:
| "Since a bunch of media outlets have adopted GPT-powered
| writing tools, future tokens are presumably going to be far
| less valuable"
|
| As long as a human verified the output, I think it is fine.
| Training on unverified data is bad.
| ankit219 wrote:
| I understand where you are coming from, but I kind of see where
| the misconception comes from too.
|
| Over the last year and a half, "training" has come to mean
| everything from pretraining an LLM from scratch, instruction
| tuning, finetuning, RLHF, and even creating a RAG pipeline that
| can augment a response. Many startups and Youtube videos use the
| word loosely to mean anything which augments the response of an
| LLM. To be fair, in the context of AI, training and learning
| are elegant words which sort of convey what the product would
| do.
|
| In the case of a RAG pipeline, whether for memory features or
| for augmenting answers, I think the concern is genuine. There
| would be someone looking at the data and chats (in bulk) to
| improve the intermediary product and, as a next step, using
| that to tweak the product or the underlying foundational model
| in some way.
|
| Given the way every foundational model company launched their own
| chatbot service (Mistral, Anthropic, Google) after ChatGPT, there
| seems to be some consensus that the real world conversation data
| is what can improve the models further. For the users who used
| these services (and not APIs), there might be a concerning
| scenario. These companies would look at queries a model failed to
| answer, and there is a subset which would be corporate employees
| asking questions on proprietary data (with context in prompt) for
| which ChatGPT is giving a bad answer because it's not part of the
| training data.
| ChicagoDave wrote:
| I'd point out that this is the "missing link" to a next-gen LLM.
|
| When the scientists figure out how to enable a remembered
| collaboration that continually progresses results, then we'll
| have something truly remarkable.
| danielmarkbruce wrote:
| You can do this with a sophisticated RAG system.
| ChicagoDave wrote:
| To some degree, sure, but I'm suggesting a productized
| client.
|
| I envision an IDE with separate windows for collaborative
| discussions. One window is the dialog, and another is any
| resulting code as generated files. The files get versioned
| with remarks on how and why they've changed in relation to
| the conversation. The dev can bring any of those files into
| their project and the IDE will keep a link to which version
| of the LLM generated file is in place or if the dev has made
| changes. If the dev makes changes, the file is automatically
| pushed to the LLM files window and the LLM sees those changes
| and adds them to the "conversation".
|
| The collaboration continually has "why" answers for every
| change.
|
| And all of this without anyone digging into models, python,
| or RAG integrations.
| danielmarkbruce wrote:
| You could build this now, using gpt-4 (or any of the top
| few models) and a whole bunch of surrounding systems. It
| would be difficult to pull off, but there is no technology
| change required.
|
| If you build it you'll need to be digging into models,
| python and various pieces of "RAG" stuff, but your users
| wouldn't need to know anything about it.
| gwern wrote:
| You can do this with _dynamic evaluation_
| (https://gwern.net/doc/ai/nn/dynamic-evaluation/index):
| literally just train on the data as you are fed new data to
| predict, in the most obvious possible way. It works well, and
| most SOTAs set in text prediction 2010-2018 would use dynamic
| evaluation for the boost. (And it has equivalents for other
| tasks, like test-time adaptation for CNN classifiers.)
|
| In fact, dynamic evaluation is _so_ old and obvious that the
| first paper to demonstrate RNNs really worked better than
| n-grams as LLMs did dynamic evaluation:
| https://gwern.net/doc/ai/nn/rnn/2010-mikolov.pdf#page=2
|
| Dynamic evaluation offers large performance boosts in
| Transformers just as well as RNNs.
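|
| The core loop is tiny (PyTorch sketch, with a toy bigram model
| standing in for the real LLM):
|
|   import torch
|   import torch.nn as nn
|
|   vocab, dim = 256, 32
|   model = nn.Sequential(nn.Embedding(vocab, dim),
|                         nn.Linear(dim, vocab))
|   opt = torch.optim.SGD(model.parameters(), lr=1e-3)
|
|   def dynamic_eval(tokens, chunk=64):
|       """Score each chunk, then immediately train on it."""
|       total = 0.0
|       for i in range(0, len(tokens) - 1, chunk):
|           x = torch.tensor(tokens[i:i + chunk])
|           y = torch.tensor(tokens[i + 1:i + chunk + 1])
|           loss = nn.functional.cross_entropy(model(x)[:len(y)], y)
|           total += loss.item()        # evaluate first...
|           opt.zero_grad()
|           loss.backward()             # ...then update on what was seen
|           opt.step()
|       return total
|
|   text = b"the quick brown fox jumps over the lazy dog " * 20
|   dynamic_eval(list(text))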
|
| The fact that it's not standard these days has more to do with
| cloud providers strongly preferring models to not change
| weights to simplify deployment and enable greater batching, and
| preferring to crudely approximate it with ever greater context
| windows. (Also probably the fact that no one but oldtimers
| knows it is a thing.)
|
| I think it might come back for local LLMs, where there is only
| one set of model weights running persistently, users want
| personalization and the best possible performance, and where
| ultra-large context windows are unacceptably memory-hungry &
| high-latency.
| spmurrayzzz wrote:
| I've done some testing on this idea of real-time training for
| memory purposes using prompt tuning, prefix tuning, and
| LoRAs. Its hard to pull off in many cases but I've seen it
| work in surprising situations. I got the original idea when I
| was testing out the concept of using LoRA swapping for
| loading different personalities/output styles per batched
| user prompts. Those were pre-trained, but it occurred to me
| that soft prompts could be trained much faster to potentially
| remember just a handful of facts.
|
| The basic idea is that you summarize chunks of the convo over
| time and create tiny ephemeral datasets that are built for
| retrieval tasks. You can do this by asking another model to
| create Q&A pairs for you about the summarized convo context.
| Each sample in the dataset is an instruction-tuned format
| with convo context plus the Q&A pair.
|
| The training piece is straightforward but is really where the
| hair is. It's simple, in the soft prompt use case, to train
| a small tensor on that data and just concatenate it to future
| inputs. But you presumably will have some loss cutoff, and in
| my experience you very frequently don't hit that loss cutoff
| in a meaningfully short period of time. Even when you do, the
| recall/retrieval may still not work as expected (though I was
| surprised how often it does).
|
| The biggest issue is obviously recall performance given that
| the weights remain frozen, but also the latency introduced in
| the real-time training can end up being a major bottleneck,
| even when you lazily do this work in the background.
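|
| For reference, the soft-prompt piece is roughly this shape
| (PyTorch sketch with a toy frozen model standing in for the
| real LLM):
|
|   import torch
|   import torch.nn as nn
|
|   vocab, dim, prompt_len = 256, 32, 8
|   embed, head = nn.Embedding(vocab, dim), nn.Linear(dim, vocab)
|   for p in list(embed.parameters()) + list(head.parameters()):
|       p.requires_grad_(False)               # the "LLM" stays frozen
|   soft_prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.01)
|   opt = torch.optim.Adam([soft_prompt], lr=1e-2)
|
|   def train_step(question_ids, answer_ids):
|       """One step: only the tiny soft-prompt tensor is updated."""
|       x = torch.cat([soft_prompt, embed(torch.tensor(question_ids))])
|       logits = head(x)[-len(answer_ids):]   # predict the answer part
|       loss = nn.functional.cross_entropy(
|           logits, torch.tensor(answer_ids))
|       opt.zero_grad()
|       loss.backward()
|       opt.step()
|       return loss.item()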
| euniceee3 wrote:
| Anyone know if there is an LM Studio-style tool for training
| models?
| danielmarkbruce wrote:
| This is just one of many misconceptions about LLMs. If you are at
| all involved in trying to build/deploy a product based around
| LLMs, your head will explode as you talk to users and realize the
| vast chasm between how folks who understand them think about
| and use them, and how folks who don't understand them do. It's
| frustrating on both sides.
|
| It's quite unclear if we'll ever hit the point where (like, a car
| for example) someone who has no idea how it works gets as much
| out of it as someone who does.
| SunlitCat wrote:
| And yet, you need a driver's license to operate a car.
| Computers, AI and other technology everyone is allowed to use,
| even without a basic understanding of how the stuff works.
| danielmarkbruce wrote:
| Generally, any situation that is as dangerous for other
| people as driving a car requires a license or is illegal. It
| has nothing to do with understanding the underlying
| technology.
|
| About the last thing we need is a law requiring a license to
| use ChatGPT.
| siva7 wrote:
| I really think that an AI license would be the best solution,
| at least for school kids but ideally for everyone in a
| professional context. Operating an AI is very different than
| operating a computer as the former can cause real damage to
| the tech-"illiterate".
| EnigmaFlare wrote:
| Talking to AI can't cause real damage any more than talking
| to a person can. And talking to a person can do worse. So
| you'll want a license for that, which is awful.
| margalabargala wrote:
| I would bet that most people with a driver's license do not
| understand how cars work.
| hamasho wrote:
| I agree. I think many people misjudge LLM abilities because
| LLMs seem so similar to humans that users expect them to do
| everything humans can do. Remembering what was said is a basic
| human ability, so it might surprise some users that LLMs can't
| do that (although ChatGPT has recently implemented a memory
| feature).
| alt227 wrote:
| > someone who has no idea how it works gets as much out of it
| as someone who does
|
| I seriously doubt that the average driver can get the same out
| of their car as a trained race driver can.
| ang_cire wrote:
| There are a lot of things that you just straight can't get
| from an LLM if you don't understand what a given model is
| best at. Some models will spit out decently good Python code,
| but fail to consistently produce a sonnet in proper iambic
| pentameter, while another will do so brilliantly.
|
| Most people don't find themselves using a pickup truck when
| they're trying to race motorcycles, or accidentally get into
| a Miata when they want a minivan, because even laypeople know
| what those broad archetypes (coupe/van, truck/motorcycle)
| are and what they signify. With LLMs, that's not the case at all.
| roywiggins wrote:
| It doesn't help that _LLMs themselves_ don't know what they're
| capable of, and will say stuff like "I'll remember this for
| next time!" when they absolutely won't. Of course people are
| confused!
| smusamashah wrote:
| For common non-tech folk, this is how computers always worked.
| You tell these machines to do things for you, and they do. Why
| are you gasping at the image/text/audio it just made?
| Suppafly wrote:
| >For common non-tech folk, this is how computers always
| worked.
|
| That's one thing that a lot of us in tech seem to forget: the
| way we interact with technology is completely different from
| how most of the population does. It's not just a matter of a
| few degrees of difference, we operate in entirely different
| manners.
| danielmarkbruce wrote:
| There is a further step with LLMs that seems to blow up
| people's mental models. The combo of probabilistic outputs +
| randomish decoding, 'knowledge' being in parameters +
| context, generation one token at a time... it's just so
| freaking weird.
| Suppafly wrote:
| Plus it doesn't help that the only way regular end users
| are able to interact with them is through a chat
| interface that abstracts away the information being used
| to prompt a response.
| kazinator wrote:
| The LLMs don't learn during the chat, but what you type into them
| is being collected, the same as, for instance, your Google
| searches.
| kordlessagain wrote:
| I took Simon's post and pasted it into two different ChatGPT 4o
| windows. In one, I put "what's wrong with this post, especially
| with the way it's written?". In the other, I put "what's really
| great about this post and what does it do well?".
|
| Both sessions worked toward those implied goals. The one where I
| asked what was wrong indicated long-winded sentences, redundancy,
| lack of clear structure, and focus issues. Sorry, Simon!
|
| The one where I asked what was good about the post indicated
| clarity around misconceptions, good explanations of training,
| consistent theme, clear sections, logical progression, and
| focused arguments. I tend to agree with this one more than the
| other, but clearly the two are conflicted.
|
| So, what's my point?
|
| My point is that AI/LLM interactions are driven by the human's
| intent during the interaction. It is literally our own history
| and training that brings the stateless function responses to
| where they arrive. Yes, facts and whatnot matter in the training
| data, and the history, if there is one, but the output is very
| much dependent on the user.
|
| If we think about this more, it's clear that training data does
| matter, but likely doesn't matter as much as we think it does.
| It's probably just as important to consider the historical
| data, and the data coming in from RAG/search processes, as
| making as big an impact on the output.
| roywiggins wrote:
| It's at least sometimes a "Clever Hans" effect, or an "ELIZA"
| effect.
|
| These models have been RLHFed to just go along and agree with
| whatever the user asks for, even if the user doesn't realize
| what's going on. This is a bit of a cheap trick: of course
| users will find a model more intelligent if it agrees with
| them. When models start arguing back you get stuff like early
| Bing Chat going totally deranged.
| Arthur_ODC wrote:
| Wow, I figured this would have been the most basic level of
| common knowledge for LLM end-users by this point. I guess there
| is still a surprising number of people out there who haven't
| jumped on the bandwagon.
| swiftcoder wrote:
| Regardless of whether the data is actually being used to train an
| LLM, as long as it's going over the network to an LLM provider,
| there's the possibility that it will be used. Or exfiltrated from
| the provider's servers and sold on the dark web. Or a million
| other things that can go wrong when you provide a 3rd party
| access to your personal data.
|
| If LLMs are actually here to stay, then we need to keep shrinking
| the inferencing costs to the point that everyone can run it on
| the client, and avoid all of these problems at once...
___________________________________________________________________
(page generated 2024-05-29 23:01 UTC)