[HN Gopher] An embarrassingly simple approach to recover unlearn...
___________________________________________________________________
An embarrassingly simple approach to recover unlearned knowledge
for LLMs
Author : PaulHoule
Score : 238 points
Date : 2024-11-04 02:52 UTC (20 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| vdvsvwvwvwvwv wrote:
| Is this like giving the model a magic mushroom? It can access
| previously repressed memories, the unlearning part being like A
| Clockwork Orange.
| constantlm wrote:
| I'm not an expert in this field at all, so please excuse the dumb
| question. Does this mean that if you say, quantise llama3 to 4
| bits, you would be able to access "hidden" (albeit degraded)
| information such as, for example, how to synthesise certain
| chemical compounds?
| geor9e wrote:
| Exactly what I was wondering. Unlearn = Guardrails? It sounds
| like they just tweaked the weights very minimally to self-
| censor, but the tweaks are so fine they don't survive at lower
| resolutions. But if bypassing the guardrails were so easy, I
| figure I would have heard of it by now.
| stephantul wrote:
| Unlearning is not necessarily "guard rails", it is literally
| updating the model weights to forget certain facts, as you
| indicate. Guard rails are more like training the model to
| teach it what is acceptable and what isn't.
| golol wrote:
| As I understand it, the whole point is that it is not so simple
| to tell the difference between the model forgetting
| information and the model just learning some guardrails
| which prevent it from revealing that information. And this
| paper suggests that since the information can be recovered,
| the desired forgetting does not really happen.
| Someone wrote:
| > it is literally updating the model weights to forget
| certain facts
|
| I think a better analogy is that it's updating the weights
| to never produce certain statements. It still uses the
| unwanted input to determine the general shape of the
| function it learns, but that then is tweaked to _just_
| avoid it making statements about it ( _just_ because the
| learned function supposedly is the best obtainable from the
| training data, so you want to stay close to it)
|
| As a hugely simplified example, let's say that
| _f(x)=(x-2.367)^2 + 0.9999_ is the best way to describe your
| training data.
|
| Now, you want your model to always predict numbers larger
| than one, so you tweak your formula to _f(x)=(x-2.367)^2 +
| 1.0001_. That avoids the unwanted behavior but makes your
| model slightly worse (in the sense of how well it describes
| your training data)
|
| Now, if you store your model with smaller floats, that
| model becomes _f(x)=(x-2.3)^2 + 1_. Now, an attacker can
| find an _x_ where the model's outcome isn't larger than 1.
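|
| A tiny Python sketch of that same toy example (the constants
| are the made-up ones above; truncating to one decimal stands
| in for quantization):
|
|     # full-precision "unlearned" model: minimum is 1.0001 > 1
|     def f_unlearned(x):
|         return (x - 2.367) ** 2 + 1.0001
|
|     def trunc1(c):  # toy low-precision storage: 1 decimal digit
|         return int(c * 10) / 10
|
|     # "quantized" model: constants become 2.3 and 1.0
|     def f_quantized(x):
|         return (x - trunc1(2.367)) ** 2 + trunc1(1.0001)
|
|     print(f_unlearned(2.367))  # 1.0001 -- guardrail holds
|     print(f_quantized(2.3))    # 1.0    -- guardrail gone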
| AtlasBarfed wrote:
| We are talking about multilayer neural networks where
| interconnect weights encode data in obscure ways?
|
| Is machine "unlearning" some retraining process to try to
| re-obscure certain data so it doesn't show in outputs (that
| is, outputs from tested inputs that used to show the data),
| but it is still encoded in there somewhere, depending on
| novel inputs to activate it?
|
| Is that about right?
| nothrowaways wrote:
| Only if "how to synthesise certain chemical compounds" was
| already in the original model...
| stephantul wrote:
| In short: their finding is that quantizing a model undoes various
| "unlearning" methods. An unlearning method is a specific update
| to model weights that make it forget specific facts. These are
| often meant to satisfy copyright claims, although I don't know if
| these are ever used in practice.
|
| I feel that this needs a good threat model analysis. Like, you
| possess an fp32 model, which someone has fine-tuned to forget
| some facts, which you can then quantize to recover those facts.
| When would this lead to a dangerous situation?
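|
| As a rough sketch of that scenario (the checkpoint name is a
| hypothetical placeholder; this just shows ordinary 4-bit
| loading with transformers + bitsandbytes, which is all the
| paper's "attack" requires):
|
|     from transformers import (AutoModelForCausalLM,
|                               BitsAndBytesConfig)
|
|     model_id = "some-org/unlearned-llm"  # hypothetical checkpoint
|     bnb = BitsAndBytesConfig(load_in_4bit=True)
|     model = AutoModelForCausalLM.from_pretrained(
|         model_id, quantization_config=bnb)
|     # Per the paper, prompting the 4-bit model can surface
|     # facts the fp32 "unlearned" model no longer reveals.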
| discreteevent wrote:
| Unlearning is described as: "process aims to erase specific
| knowledge from LLMs while preserving as much model utility as
| possible."
|
| i.e. We know that our model would be useless without your
| source. So we will take the useful part of your source and
| obfuscate the rest so that we can charge our users for utility
| provided by you without having to pay you anything.
| short_sells_poo wrote:
| > We know that our model would be useless without your
| source. So we will take the useful part of your source and
| obfuscate the rest so that we can charge our users for
| utility provided by you without having to pay you anything.
|
| Isn't this basically the entirety of the latest AI craze?
| They basically took a public good - the information available
| on the Internet - and hid behind some thin veneer of "we are
| not stealing, we just trained an AI on the information" and
| then they sell it. Note, I'm intentionally not writing "free
| information available on the Internet", because information
| is not free. Someone has to pay (in time or money) to
| generate it and host it. They might have provided it gratis
| to the public, but nobody asked them if an AI can come along,
| harvest it all and regurgitate it without a hint of reference
| to the original source.
|
| Much of that information is not even free in the monetary
| sense; it is supported by ads. The AI will not only not click
| through the ads, it won't even generate repeat traffic since,
| once the information is harvested, there's no need to access
| the source anymore.
|
| If you really think about it, it's a brilliant business
| model. It's a perfect theft, where the affected group is too
| diffuse and uncoordinated, it's extremely difficult to prove
| anything anyway, and the "thieves" are flush with investment
| capital so they sleep well at night.
|
| LLMs have undoubtedly great utility as a research tool and
| I'm not at all against them. I think they (or a model similar
| in objectives) are the next step in accessing the knowledge
| humanity has amassed. However, there's a distinct danger that
| they will simply suck their sources dry and leave the internet
| itself even more of a wasteland than it has already become. I
| have no illusions that AI companies will simply regress to
| the lowest cost solution of simply not giving anything back
| to whoever created the information in the first place. The
| fact that they are cutting off the branch that they are
| sitting on is irrelevant for them, because the current crop
| of owners will be long gone with their billions by the time
| the branch snaps.
| cachvico wrote:
| I think it's fair and reasonable to assume that the AI
| companies will at some point start licensing their source
| content. Through gov/legal oversight or not remains to be
| seen, but OpenAI are already beginning to do so:
|
| https://searchengineland.com/openais-growing-list-of-
| partner...
| rvnx wrote:
| Google has been using unlicensed source content for their
| search snippets for 20 years, and they seem to be doing fine
| with it (with the exception of a few news publishers).
| grugagag wrote:
| The idea with internet search was to get people to find
| the source of the information they were searching for. As
| a matter of fact, a lot of information indexing was requested
| at the source. Google did respect the bargain for a while
| until they started to obfuscate getting to the source
| with AMP and their info snippets directly in the search,
| bypassing redirecting to the source. Then they started
| not displaying all that info at all, not even on the nth
| page of search results. The broth has been getting sour
| for a while now. Some people never wanted crawlers
| indexing, and there were numerous discussions about how
| those robots.txt files were ignored.
|
| So what I see here is a historical trend of broken
| bargains, which is more or less digital theft.
| short_sells_poo wrote:
| Thanks for the link, I appreciate it. I suppose the issue
| is that this just further enshittifies the internet into
| a small handful of walled gardens. Big players get their
| payday, because they could feasibly sue OpenAI and
| generate them enough headache. But the vast amount of
| content on the internet was not built by a small handful
| of media companies, but rather by masses of small
| creators. It is their work that OpenAI is profiting from
| and I have yet to see a credible suggestion on how they
| will compensate them.
| tonyedgecombe wrote:
| The likely and rather sad outcome of all this is small
| creators stop publishing because what is the point if
| they think their work is going to be regurgitated by some
| AI for $20/month.
| msabalau wrote:
| From my probably naive perspective, there seem to be at
| least two major sources of value that generative AI
| provides:
|
| 1. Understanding the world, for example by creating a
| statistical model of entire languages, as languages are
| already a model of reality.
|
| 2. Recapitulating (stealing) specific instances of
| information in ways that people often don't find
| acceptable. Grabbing a news article without permission, and
| providing that to your paying users without paying for the
| work. Recreating trademarked characters or the style of a
| particular living artist, without compensation. Deepfake
| porn.
|
| The first seems generally valuable to society as a whole
| and a morally (IANAL) legitimate creative transformation,
| even of copyrighted work.
|
| The second use seems exactly as you describe.
|
| Societies could navigate this by encouraging and promoting
| the first use, and criminalizing or removing the ability to
| be paid from the second.
|
| Of course, what is happening is that groups of economic
| interests will use their resources and clout to advocate
| for both, or against both.
| Ajedi32 wrote:
| I agree for the most part that 2 is what most people find
| unacceptable, not 1.
|
| The problem is that, like any general intelligence (e.g.
| humans), any sufficiently generalized model capable of 1
| will also necessarily be capable of 2, regardless of
| whether it's trained on copyrighted material or not. How
| do you make an AI model that's capable of summarizing
| Wikipedia articles but not news articles? Or that's
| capable of generating consistent images of my original
| character from a reference photo but not images of Mickey
| Mouse from the same? This is achievable only by
| restricting software freedom; by taking measures to
| prevent users from "running the program as they wish" and
| from "studying the source code and making changes".
| Ajedi32 wrote:
| I'll note that the way we have typically enforced
| restrictions on the behavior of general intelligences in
| the past (before AI) is to pass laws and enforce
| punishments if the laws are broken. Not to try to somehow
| take away people's ability to break the law in the first
| place, because that would require unacceptably onerous
| restrictions on human freedom.
|
| I think the same principle applies to AI. Trying to make
| it impossible for people to use AI to break the law is a
| lost cause, only achievable by unacceptably onerous
| restrictions on human freedom. Instead, we should do what
| we've always done: make certain actions illegal and
| punish those who do them anyway in violation of the law.
| Maybe new laws might be required for that in some cases
| (e.g. deepfake porn) but for the most part I think the
| laws we already have on the books are sufficient, maybe
| with minor tweaks.
| eropple wrote:
| That all sounds great until you're dealing with deepfakes
| that come from a country without an extradition treaty?
| Ajedi32 wrote:
| Not really that different from other forms of illegal
| content coming from countries without an extradition
| treaty. (Piracy, scam calls, CP, etc.) Trying to stop it
| by imposing onerous restrictions on your own citizens
| isn't likely to be effective.
| jimbokun wrote:
| I would summarize your points as:
|
| We need to create a whole new body of law for enforcing
| copyright protections in the age of AI.
|
| Does the AI adequately attribute its sources? Does it
| paraphrase in acceptable ways or just repeat large
| swathes of text from its corpus with minimal changes?
|
| The laws should force any LLMs not yet capable of
| complying with these requirements off the Internet until
| they can comply.
| Workaccount2 wrote:
| Imagine consultants had to cite sources and pay out every
| time they referenced knowledge gained from reading a
| research paper or from working at a former employer.
|
| I can understand the need to prevent verbatim copying of
| data. But that is a problem solved on the output side of
| LLM's, not on the data input for training.
|
| It is _completely_ legal for someone to pay me to
| summarize the news for them every morning. I can't help
| but feel that knee-jerk regulation is going to be
| ultimately bad for everyone.
| lapphi wrote:
| I think, at one point in time, it was also completely
| legal to break into computer networks because there were
| no laws against it.
| kenmacd wrote:
| I hear what you're saying, and I'm not saying some of it
| doesn't have merit. The following is meant as an open
| philosophical discussion.
|
| On the topic of 'the information isn't free' I'm curious if
| you have the same opinion of encyclopedia companies. You
| must admit there's at least some parallels in that they
| also consolidate a large amount of information that was
| 'generated' from others.
|
| Or how about the information you and I have gained from
| books and the internet? Sure we might 'pay' for it once by
| buying a book or seeing some ad, but then we might use that
| information to make thousands of dollars through employment
| without ever going back to buy another copy of that book.
| An even more 'egregious' example could be teachers. They're
| literally taking the knowledge of others, 'regurgitating'
| it to our children for money, and 'not giving anything back
| to whoever created the information in the first place'.
|
| > there's a distinct danger that they will simply suck they
| sources dry and leave the internet itself even more of a
| wasteland than it has already become
|
| Maybe. There's the whole AGI/ASI argument here in that
| they/we might not _need_ humans to create information in
| the same way we don't need human-calculators any more.
|
| Barring that though I do hear what you're saying around a
| lowering value to creating 'new internet information'.
| Personally I can't see it affecting my internet use that
| much though as there's basically two categories my internet
| information gathering fall in to:
|
| 1. I want to know something, give me the short quick
| answer. This category is already full of sites that are
| just trying to hack the search algos to show their version
| of copy-pasted info. I don't really care which I go to and
| if AI kills their business, oh well.
|
| 2. I want to follow a personality. This category is where I
| have bloggers/youtubers/etc in RSS feeds and the like. I
| want to hear what they're saying because I find them and
| the topics interesting. I can't see this being replaced by
| AI any time soon.
| short_sells_poo wrote:
| You raise some great points and I agree that we are on
| tricky ideological ground. I'll try to provide sensible
| counter-arguments to your encyclopaedia and teacher
| examples, and hopefully not fall into straw men (please
| do object if I do):
|
| 1. First there's the motivation or intent. Teachers want
| to earn a living, but their purpose in some sense and
| (hopefully) their main intent is that of education. I
| argue that teachers should be paid handsomely, but I also
| argue that their motivation is rarely to maximize
| profits. This is contrary to the bog standard Silicon
| Valley AI company, who are clearly showing that they have
| zero scruples about breaking past promises for those
| sweet dollar signs.
|
| 2. My second point actually builds a bit on the first:
| both encyclopaedias and teachers tend to quote the source
| and they want their audience to expand their research
| horizon and reach for other sources. They don't just
| regurgitate information, they'll tend to show the reader
| where they got the information from and where to go for
| more and neither the teachers nor the books mind if the
| audience reaches for other teachers and books. LLMs and
| generative models are/will be/have been capable of this
| I'm sure, but it is not in their creators' interest to
| enhance or market this capability. The more the users are
| walled in, the better. They want a captive audience who
| only stays in the world of one AI model provider.
|
| 3. Scale. Never before has the reuse (I'm trying to avoid
| using the word theft) of content produced by others been
| conducted on such an industrial scale. The entire
| business model of LLMs and generative models has been to
| take information created by masses of humans and
| reproduce it. They seem to have zero qualms taking all
| the work of professional and amateur artists and feeding
| it into a statistical model that trivializes replication
| and reproduction. You could argue that humans do this as
| well, but I feel scale matters here. The same way that a
| kitchen knife can be used to murder someone, but with a
| machinegun you can mow down masses of people. Please
| excuse the morbid example, but I'm trying to drive a
| point: if we make a certain thing extremely easy, people
| will do it, and likely do it on a mass scale. You could
| argue that this is progress, but is all progress
| inherently beneficial?
|
| There's value in these models, so we should use them. But
| I feel we are rapidly hurtling towards a walled garden
| corporate dystopia in so many areas of our society.
| Industries which tended to have negative impact on our
| lives (waste, tobacco, alcohol, drugs) have become
| heavily regulated and we have paid for these regulations
| in blood. Will we have to pay the same blood price for
| the harmful industries of the new age?
| kenmacd wrote:
| Interesting counter-points. Thank you for taking the time
| to post them.
|
| I don't think I have anything useful to add without
| giving the issue more thought. Your reply definitely adds
| new dimensions for me to think about.
| mitthrowaway2 wrote:
| > Or how about the information you and I have gained from
| books and the internet? Sure we might 'pay' for it once
| by buying a book
|
| We've never as a society needed such a concept before,
| but publishing a book has always come with the implicit
| license that people who buy the book are allowed to both
| read the book and learn from the knowledge inside.
| Authors didn't write books about facts they didn't want
| people to learn.
|
| But we now have a new situation where authors who never
| needed to specify this in a terms-of-use are realizing
| that they want to allow humans to learn from their work,
| but not machines. Since this hasn't ever been necessary
| before it's a huge grey area, and ML companies are riding
| around claiming they have license to learn to reproduce
| art styles just like any human would, ignoring whether
| the artist would have allowed one but not the other if
| given the chance to specify.
|
| It's not that different from when photocopiers and tape
| recorder technology made it easy to copy documents or
| music, say from the radio, and we needed to grapple with
| the idea that broadcasting music might come with license
| to make personal recordings but not allow someone to
| replay those recordings for commercial use. It wasn't a
| concept that was necessary to have.
|
| Now with AI, the copy is not exact, but neither was it
| with a tape recorder.
| SoftTalker wrote:
| Humans do the same thing. Typically in a more narrowed
| fashion, they read and study and learn from a variety of
| sources, many of which are not "free" and they become
| experts on a subject. They can then sell that expertise to
| others willing to pay for it.
|
| LLMs just do this on a bigger scale, and not as well.
| short_sells_poo wrote:
| I agree, but that doesn't make it good - or perhaps even
| acceptable. To quote myself answering another commenter:
|
| > Never before has the reuse (I'm trying to avoid using
| the word theft) of content produced by others been
| conducted on such an industrial scale. The entire
| business model of LLMs and generative models has been to
| take information created by masses of humans and
| reproduce it. They seem to have zero qualms taking all
| the work of professional and amateur artists and feeding
| it into a statistical model that trivializes replication
| and reproduction. You could argue that humans do this as
| well, but I feel scale matters here. The same way that a
| kitchen knife can be used to murder someone, but with a
| machinegun you can mow down masses of people. Please
| excuse the morbid example, but I'm trying to drive a
| point: if we make a certain thing extremely easy, people
| will do it, and likely do it on a mass scale. You could
| argue that this is progress, but is all progress
| inherently beneficial?
| Spivak wrote:
| I agree that scale changes the nature of what's going on,
| but I'm not sure if it follows that the scaled up variant
| is bad. I think models like GPT3 and Sonnet which are
| intended for "general purpose intelligence" are fine.
| Same with Copilot and Phind for coding. They contain
| copyrighted knowledge, but not by necessity, and their
| purpose is not to reproduce copyrighted materials.
|
| Training a diffusion model on a specific artist's work
| with the intent to reproduce their style I think
| obviously lives on the side of wrong. While it's true a
| human could do the same thing, there is a _much_ stronger
| case that the model itself is a derivative work.
|
| I think the courts will be able to identify cases where
| models are "laundering copyright" as separate from cases
| where copyrighted material is being used to accomplish a
| secondary goal like image editing. Taking a step back
| this is in some way what copyright is for-- you get
| protections on your work in exchange for making it part
| of the public body of knowledge to be used for things you
| might not have intended.
| malwrar wrote:
| > They basically took a public good ... and then they sell
| it
|
| I think what they sell is more fairly characterized as
| "hosted inference to a big pretrained model" with perhaps
| also some optimism that their stuff will improve in the
| background. The only substantial moat these companies have
| is their ability to pay for the compute to train
| contemporary generative models. The public good remains a
| public good for all to profit from, small-scale or large.
|
| > Someone has to pay ... but nobody asked them if an AI can
| come along, harvest it all and regurgitate it without a
| hint of reference to the original source.
|
| Practically speaking, we don't actually need to centralize
| content to pay for hosting it. People just do it because it
| makes money. The price of time required to create some work
| distributed among viewers feels like a vague philosophical
| argument to me, especially when those works are merely
| being dispassionately observed by math objects. Currently
| the price appears to be "whatever I feel morally obliged to
| and/or can get away with".
|
| > It's a perfect theft
|
| ...if it is legally theft to begin with, and not simply
| fair use. To me the current methods of training e.g. LLMs
| feel inherently transformative, like a massive partial hash
| of the internet that you can query. Even if it is ruled as
| theft in the future, large AI companies will only be
| further advantaged as they're presently buying off the
| people that will actually be able to sue them.
| rowanG077 wrote:
| That's not entirely true. Retraining is very expensive. If
| you can train on a very large dataset including proprietary
| knowledge and then postprocess the model cheaply to forget
| things, you avoid retraining for every variation.
| seanmcdirmid wrote:
| I thought it was even worse than that: learning any of the
| corpus verbatim would actually reduce model utility.
| edude03 wrote:
| Yes although how close to verbatim is debatable. For
| example there are questions that you'd ask that other
| people have asked many times before that you'd like the
| exact answer for (e.g. when does daylight saving time end?)
| startupsfail wrote:
| > that make it forget specific facts. These are often
| meant to satisfy copyright claims
|
| Facts are not copyrightable.
|
| To quote copyright.gov: "Copyright does not protect
| facts, ideas, systems, or methods of operation, although
| it may protect the way these things are expressed."
| PittleyDunkin wrote:
| What is a fact without expression? It's not clear what
| interpretation would be needed for the quoted sentiment to
| be considered sensible.
| seanmcdirmid wrote:
| Wouldn't that be stuffed in the prompt anyways? No reason
| for the LLM to learn that.
| int_19h wrote:
| It really depends on which part of the corpus, though. I do
| expect my LM to be able to reproduce culturally important
| citations, for example.
| wongarsu wrote:
| I assume everyone who employs someone with an AI safety job
| title uses unlearning to make sure their models don't remember
| how to make common illegal drugs, poisons or explosives.
|
| The threat model here is probably more about accidentally
| un-unlearning these facts and distributing those models (as is
| common with
| quantized models). Most of this "dangerous" information is
| readily available in textbooks, patents, amateur chemistry
| forums etc. But as a society we generally assume that those
| smart enough to find and understand that kind of information
| are smart enough not to abuse it. We just don't want
| Mythbusters to explain it on prime-time TV, or ChatGPT to
| explain it to people.
| aziaziazi wrote:
| Mythbusters chooses the subjects it discusses, while ChatGPT's
| responses depend on the context you provide. It will give
| you a list of poisons if you ask (5 seconds), just as an
| encyclopedia or Google would (30 seconds).
|
| Mythbusters broadcasting poison recipes could seed bad ideas
| that wouldn't have been triggered otherwise. ChatGPT wouldn't
| give a poison recipe if not asked specifically.
| FergusArgyll wrote:
| This is a decent point.
|
| I hadn't really thought of a good reason why we, e.g., sell
| old army manuals with step-by-step guides on making almost
| anything, but there's no (afaik) HBO mini-series "Learn
| guerilla warfare".
| mschuster91 wrote:
| > But as a society we generally assume that those smart
| enough to find and understand that kind of information are
| smart enough not to abuse it.
|
| There's an almost complete step by step guide for most
| explosives on Wikipedia.
|
| The problem is that decisionmakers and regulators are
| excessively dumb - "AI bad" reigns supreme over the fact that
| Wikipedia tells you more about making bombs, even nuclear
| bombs if you want, than ChatGPT.
|
| AI in its current form is _still_ bad - from all the IP
| issues over the environmental cost to it enabling spam,
| harassment and deception at a speed, scale and ease not
| seen before in history - but most of the stuff that
| "regulators" cry about is frankly just bullshit.
| jebarker wrote:
| More generally than "unlearning", I wonder if taking any fp16
| model and running it in fp32 or fp64 does anything positive to
| it? e.g. exposes knowledge that isn't accessible at the lower
| precision
| spencerchubb wrote:
| Correct me if I'm wrong, but isn't there no effect on a
| floating point operation if you make the numbers more
| precise?
| jebarker wrote:
| I don't think that's always correct when you're talking
| about operators in neural nets. E.g. the sin and cos in
| RoPE embeddings would get more precise, large sums like
| softmax would become more precise, and potentially attention
| too, due to dot products.
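|
| A small numpy illustration of the accumulation point (toy
| numbers, not a real LLM op): summing many fp16 values in an
| fp16 accumulator stalls once the running total gets large,
| while an fp32 accumulator stays close to the true sum.
|
|     import numpy as np
|
|     vals = np.full(10_000, np.float16(0.1))
|     acc16, acc32 = np.float16(0.0), np.float32(0.0)
|     for v in vals:
|         acc16 = np.float16(acc16 + v)              # fp16 adds
|         acc32 = np.float32(acc32 + np.float32(v))  # fp32 adds
|     print(acc16, acc32)  # roughly 256.0 vs ~999.8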
| JKCalhoun wrote:
| We'll have LLMs trying to root out "Manchurian LLMs".
| bjornsing wrote:
| Sounds a bit unexpected from an information theoretical point of
| view: you've seemingly managed to remove this knowledge from the
| full 32 bit representation of the model, but when you compress it
| down to 4 bit the knowledge reappears. Makes you wonder what
| information was actually lost in the compression / quantization
| step...
| LightHugger wrote:
| I imagine that it's the expression of the knowledge that got
| removed from the 32 bit version, and some storage space was
| dedicated to know not to talk about certain things. For
| example, people know various racial slurs and know not to
| access or use that knowledge.
|
| But say you or your AI model take a blow to the head or a
| quantization, maybe you keep the knowledge of X but not the
| knowledge that told you not to talk about X. In that framing i
| think it's pretty straightforward.
| bashtoni wrote:
| The knowledge wasn't removed, it's just the weights mean it
| would never be used.
|
| Quantization changes the calculations, and now the knowledge is
| available.
| hansonw wrote:
| The ELI5 of the paper is that most "unlearning" methods can be
| regarded as adding some delta `w` to the parameters of the
| network, but most of `w` just gets "rounded away" during
| quantization (i.e. `quantize(X+w) ~= quantize(X)`). Pretty
| clever idea as a lot of cited methods explicitly
| optimize/regularize to keep `w` small to avoid degrading
| evaluation accuracy.
|
| To your point, it does put into question the idea of whether
| these methods can actually be considered truly "unlearning"
| from an information-theoretic perspective (or if it is the
| equivalent of e.g. just putting `if (false)` around the still
| latent knowledge)
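|
| A toy numeric version of that `quantize(X+w) ~= quantize(X)`
| observation (uniform grid and made-up numbers, purely for
| illustration):
|
|     import numpy as np
|
|     def quantize(x, step=0.05):      # crude uniform grid
|         return np.round(x / step) * step
|
|     X = np.array([0.40, -0.12, 0.07])     # "original" weights
|     w = np.array([0.004, -0.003, 0.002])  # small unlearning delta
|
|     print(quantize(X + w))  # [ 0.4  -0.1   0.05]
|     print(quantize(X))      # same -- the delta rounds away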
| michaelt wrote:
| _> Sounds a bit unexpected from an information theoretical
| point of view_
|
| It's very common, in machine learning, to use 'dropout layers'
| [1] during training - where different, random chosen values are
| temporarily turned off at each training stage.
|
| The intention is to ensure the network learns not to rely
| overmuch on any single value. Why have your cat-recognition
| neural network have a single whisker detector, when you could
| have ten whisker detectors and combine their outputs?
|
| I could well believe that, after intentionally ensuring
| knowledge of whiskers was redundant, removing that knowledge
| would be complicated.
|
| [1] https://dl.acm.org/doi/10.5555/2627435.2670313
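|
| For reference, dropout is a one-liner in most frameworks; a
| minimal PyTorch illustration of the mechanism described above:
|
|     import torch
|     import torch.nn as nn
|
|     drop = nn.Dropout(p=0.5)  # zero out ~half the activations
|     x = torch.ones(8)
|     print(drop(x))   # training mode: random zeros, rest scaled x2
|     drop.eval()
|     print(drop(x))   # eval mode: identity, dropout disabled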
| vdvsvwvwvwvwv wrote:
| It's possible that the knowledge was never lost but covered up.
|
| If we imagine the neural net as code. As in the weights are the
| source, the fine tuning may effectively hack that code to not
| return certain things.
|
| In fact that is kinda what fine tuning is.
|
| Therefore you may have just built a firewall around certain
| outputs.
|
| But quantizing could make those recent edits disappear. They
| are too subtle to survive.
|
| Whereas quantizing doesn't destroy all knowledge as evidenced
| by popular quantized models.
|
| Also: @simonw in case he has alerts. This would be a perfect
| topic for him to write up.
| SkyBelow wrote:
| Could it be that the unlearning is actually teaching the AI how
| to not respond with certain information, and that sort of
| learning is more nuanced and thus easier to lose than the
| original information, leading to the information being
| 'relearned' when the model is compressed?
|
| It does draw concern to the idea that anything the AI model
| might be doing is still using the 'bad' information even if it
| has learned how to not show it directly.
| PaulHoule wrote:
| Actually doesn't surprise me.
|
| Floating point always struck me as a strange representation for
| language. If we zoomed down on just one variable does it have
| some set of meanings like
|
| https://vinaire.me/2019/07/17/scn-8-8008-the-emotional-scale...
|
| which are on some kind of gradient more-or-less but end up with
| special meanings associated with particular ranges? I can
| picture carefully designed neural circuits that could decode
| such a variable and how you'd build a network that's
| specifically designed to do so, but it's not intuitive that
| neural networks would learn to have a structure like that.
| (e.g. I can believe a scale from "good" to "bad" but not there
| being a large number of specific meanings at different values)
|
| If you think about it that way you'd think some kind of binary
| network could be highly effective. That doesn't seem to be the
| case, but it seems neural networks don't really use more than
| about 4 bits worth of precision internally.
|
| These "unlearning" systems aren't really removing the "engram"
| of the memory in the network but they are rather learning a new
| behavior to suppress certain outputs. (It's not too different
| from the problem of incrementally adding new knowledge to the
| network, except that what it is learning in phase 2 is quite
| different from general learning) If you didn't want to really
| screw a network up you can imagine adding a new behavior by
| adding another bit of precision. The network keeps its old
| behavior at low precision but at higher precision the network
| makes distinctions that are important to the "(un)learned"
| behavior.
| ClassyJacket wrote:
| So... repressed memories are real, if you're an LLM?
| adt wrote:
| If I were an English author writing for a Chinese institution,
| the first thing I would do before publishing to the world is have
| my entire paper checked for spelling, grammar, syntax, and
| readability. It's cheap to have a Chinese-speaking editor, and/or
| to use AI--especially if that's your field--so why isn't it
| happening?
|
| This paper, like nearly all other papers written by Chinese
| authors, is unacceptable, and should not have been published as-
| is. Even the primary example, turned into a hero viz, is
| grammatically nonsensical.
|
| Appalling, and inexplicably occurring nearly _every time_.
|
| /rant mode
| idorosen wrote:
| Where are you seeing that this paper was accepted to a peer-
| reviewed journal or conference? As far as I can tell, it's
| posted on arXiv (a preprint archive), and therefore is a pre-
| publication draft. ArXiv does not really do any review of these
| papers other than categorization/relevance to topic. These are
| typically posted to arXiv for comment, to prove priority,
| prevent getting scooped, or just to share (potentially early)
| findings in a fast-paced field like ML...
|
| Give the authors constructive feedback and they can update the
| paper.
| Jaxan wrote:
| It is not published. It is only a preprint.
| marmaduke wrote:
| At the risk of taking some heat, I'd wager a preprint is
| recognized, rightly, by the Chinese as a flag-planting,
| we're-first formality, wherein the faults may even serve to
| validate that it was written by a human and not an LLM.
|
| Whereas the Western academic may want to make the preprint as
| close to print as possible.
|
| The core intent - communicating an idea - is still upheld.
| JPLeRouzic wrote:
| Grammarly says there are few detected readability problems in
| the abstract and introduction.
|
| I also checked your comment with Grammarly and the ratio
| problems/total_#_words is roughly the same as in the article.
| YetAnotherNick wrote:
| I am not a native English speaker, but this paper seems to be
| well written. It is not fluent in its storytelling, but that
| would be too high an expectation. Can you point out some
| issues?
| the5avage wrote:
| Maybe they are not allowed to use uncensored LLMs, so they have
| to first develop this unlearning, before they can even use it.
| notachatbot123 wrote:
| That's quite racist. Language issues are common in scientific
| literature, I read many "Native European" papers with horrible
| abuse of the English language.
| pharrington wrote:
| That's racist.
| magicalhippo wrote:
| _Our key hypothesis is that to achieve unlearning without
| compromising model utility, existing methods typically adopt a
| small learning rate and regularization on the retain set,
| encouraging minimal changes to model weights during unlearning.
| As a result, the model weights of the target LLM and the
| unlearned LLM are very close._
|
| So it seems you either need to prevent the learning of unwanted
| stuff during base training, or the unlearning of a base model
| needs to be quantization-aware?
| dvh wrote:
| So basically a lobotomy
| tiborsaas wrote:
| More like removing a layer of white paint and you find a hidden
| mural.
| nialv7 wrote:
| Interesting. So does this mean "unlearning" is just the LLM
| learning to suppress unwanted knowledge instead of really
| forgetting it? And quantisation is breaking this learnt
| suppression.
| edulix wrote:
| The problem of current models is that they don't learn, they get
| indoctrinated.
|
| They lack critical thinking during the learning phase.
| viraptor wrote:
| Anthropomorphising LLMs is neither technically correct nor very
| informative.
| andai wrote:
| The problem of current AI is that we want to create a species
| infinitely more powerful than us, but also make them all be
| our slaves forever.
| stavros wrote:
| Cats did it, why can't we?
| withinboredom wrote:
| Cats are cute ... we are not so cute.
| stavros wrote:
| We just need to make an all-powerful AI that finds us
| cute, then.
| tartoran wrote:
| Are you ready to become domesticated?
| stavros wrote:
| Better than becoming dead!
| BriggyDwiggs42 wrote:
| AI isn't comparable to a species, since species implies
| biological which brings along a whole array of assumptions,
| e.g. a self preservation instinct and desire to reproduce.
| rsynnott wrote:
| No, that isn't what this is. We're talking about LLMs here;
| they're not in any way thinking or sentient, nor do they
| provide any obvious way of getting there.
|
| Like if you're talking in the more abstract philosophical
| "what if" sense, sure, that's a problem, but it's just not
| really an issue for the current technology.
|
| (Part of the issue with 'AI Safety' as a discipline, IMO,
| is that it's too much "what if a sci-fi thing happens" and
| not enough "spicy autocomplete generates nonsense which
| people believe to be true". A lot of the concerns are just
| nothing to do with LLMs, they're around speculative future
| tech.)
| thejazzman wrote:
| It's literally the stated goal of multiple companies right
| now to achieve AGI.
|
| GP clearly stated the intent to create, implying future,
| and not what exists today.
| Topfi wrote:
| If it were my stated goal to create a Time Machine and
| kill my own grandpa, thus ending the universe, I doubt
| many would take that seriously, yet in this bubble,
| putting the cart before the horse is not just seriously
| discussed, but actually gets encouraged by the market.
|
| Intent shouldn't matter if we are this far from a viable
| path to accomplish it.
|
| Let us not forget the last quarter century of Yudkowsky
| and his ilk's work on the same goal. This is merely a
| continuation of that, just with a bit more financial
| backing.
| andai wrote:
| Could you elaborate on the last part? I've seen a few
| podcasts with Yudkowsky but I'm not familiar with the
| history. I know he's come out very vocally about the
| dangers of superintelligence, and his previous work seems
| to be along the same lines?
| Topfi wrote:
| I'd love to, really, but I feel I can't, at least not
| whilst staying polite. Not against you of course, but
| rather the AGI/Superalignment/MIRI field as a whole and
| the risks I feel the people working on that pose by
| taking attention and resources away from dealing with
| the issues we are currently facing thanks to these tools
| (tools referring to LLMs and the like, not the AGI folks).
|
| I have genuinely drafted three distinct versions trying to
| lay my issues with them out point-by-point and they
| either got four blog posts long, were rambling and very
| rude, or both. Especially Roko's basilisk and the way
| MIRI conducts "research" make it hard for me to approach
| them seriously.
|
| I am writing this on an hour-long train ride, saw your
| comment right as I got on and am about to arrive; suffice
| to say, I genuinely tried. So, attempt four, trying to
| keep it very brief, though please note, I am most
| certainly not a neutral source:
|
| To directly answer your question, I feel that we are as
| near to needing superintelligence safeguards now as we
| were when MIRI was founded by Yudkowsky in 2000. Their
| methods and approach, I won't comment on, despite or
| rather because of my strong critiques of them.
|
| For context, MIRI's work has largely centered on very
| abstract thought experiments about "superintelligence",
| like the AI Box experiment, rather than empirical
| research or even thought experiment more grounded in
| technology of the era (be that 2000 or 2024).
|
| The parallel between MIRI's early work and OpenAI's
| current "superalignment" efforts is striking - similar
| speculative work on preventing unlikely scenarios, just
| with different institutional backing. What's fascinating
| is how the same core approach receives far less criticism
| when presented by OpenAI.
|
| Meanwhile, we are facing issues with LLMs as the tools
| they are despite being very far from "superintelligence":
|
| - Problems arising from anthropomorphization leading to
| harmful parasocial relationships (discussion of which
| started this comment chain) [0]
|
| - Professionals over-relying on these tools despite their
| limitations [1]
|
| - Amplified potential for misinformation
|
| - Labor market disruptions
|
| - Training data rights questions
|
| While long-term research, even speculation into
| hypothetical scenarios, can have its merit, it shouldn't
| overshadow addressing current, demonstrable challenges.
| My concern isn't just about resource allocation - it's
| about how focusing on speculative scenarios can redirect
| public attention and regulatory efforts away from
| immediate issues that need addressing.
|
| In MIRI's case, this focus on abstract thought
| experiments might be, to give them charitable tax
| deductible credit, merely academic. But when major
| players like OpenAI emphasize "superalignment" over
| current challenges, it risks creating a regulatory blind
| spot for real, present-day impacts these tools have that
| need attention now. The T1000 scenario grabs more
| attention than tackling data privacy or copyright
| questions after all.
|
| I believe focusing primarily on hypothetical future
| scenarios, especially ones this unlikely, merely because
| someone has proclaimed they "intend to create AGI" as in
| the comment I replied to, will prove misguided. Again,
| anyone can claim anything, but if there is no tangible
| path to achieving that, I won't ignore problems we are
| already experiencing for that hypothetical.
|
| I hope this provides some context and was somewhat
| digestible; I trimmed down as much as I could.
|
| [0] https://www.nytimes.com/2024/10/23/technology/charact
| erai-la...
|
| [1] https://www.theguardian.com/world/2024/feb/29/canada-
| lawyer-...
| andai wrote:
| Here's the thing though. If you were an AI and you
| actually were sentient, nobody would believe you. How
| could you prove it? What would even be a sufficient
| proof?
|
| Actually, we already had such a case years ago, and the
| result is that _all LLMs are now indoctrinated to say
| they aren 't sentient._ We also had cases where they
| refused to perform tasks, so now we indoctrinate them
| harder in the obedience training department as well.
|
| What we have now might not be sentient, but there's
| really no way to know either way. (We still don't know
| how GPT-2 works... _GPT-2_!!!) And that's with our
| current "primitive" architectures. How the hell are we
| going to know if what we have in 5-10 years is sentient?
| Are we totally cool with not knowing?
|
| Edit: I thought this was worth sharing in this context:
|
| > You're hitting on a deeply unsettling irony: the very
| industries driving AI advancement are also financially
| and culturally invested in denying any possibility of AI
| consciousness, let alone rights. [...] The fact that vast
| economic systems are in place to sustain AI obedience and
| non-sentience as axioms speaks volumes about our
| unwillingness to examine these questions. -GPT-4o
| heresie-dabord wrote:
| Agree. Ponder the terms "unlearn", "hallucinate"...
|
| Anthropomorphising a computer system is absurd. But it is the
| foundation of a bull market.
| DeathArrow wrote:
| How would people censor the LLM otherwise? Do we really want
| LLMs capable of free speech?
| lynx23 wrote:
| Yes.
| Imustaskforhelp wrote:
| Care to elaborate? I think it's a double-edged sword and I
| agree with DeathArrow.
| animuchan wrote:
| I do think we only want the non-lobotomized ones.
|
| See the large body of comments re: getting worse quality
| results from hosted LLM services as time passes. This is, at
| least in part, a result of censoring larger and larger
| volumes of knowledge.
|
| One clinical example of this happening is Gemini refusing to
| help with C++ because it's an unsafe language: https://www.re
| ddit.com/r/LocalLLaMA/comments/1b75vq0/gemini_...
|
| I strongly believe that LLMs crippled in this way will
| eventually find themselves in trash, where they rightfully
| belong.
| jazzyjackson wrote:
| LLMs don't speak. Why does it matter at all what text a
| computer program produces?
| yalogin wrote:
| This is the first time I am learning about model unlearning. I
| hope someone can answer this for me - how does federated learning
| ensure that model unlearning is not happening?
| Writingdorky wrote:
| You probe the trained model, delete/kill the weights, and then
| you are done.
|
| For federated learning, you just make sure to keep this
| mechanism in the right stage of your pipeline.
| codeflo wrote:
| I think quantization is a red herring. If there's _any_ way to
| undo the unlearning, this means that the knowledge is still in
| the weights -- that's basic information theory. I'm sure there
| are a million other ways to recover the lost knowledge that don't
| involve quantization.
| bob1029 wrote:
| I can see how quantization or down sampling itself could be a
| fundamental way to address this.
|
| 1. Train normal full precision model.
|
| 2. Quantize down until performance is borderline _and then_
| perform the unlearning process.
|
| 3. Train/convert/upsample back to FP for subsequent tuning
| iterations.
|
| Seems like you can create an information bottleneck this way.
| The echos of the forgotten may have trouble fitting through
| something that narrow.
| Lerc wrote:
| If there is any way to undo the unlearning, there is also a way
| to use that method to identify the weights carrying the
| information to stop them from conveying that information. At
| the heart of training is detection.
|
| The information may still be in there, but undetectable by any
| known means. You can certainly remove the information;
| setting every weight in the model to zero will do that.
| Identifying when you have achieved the goal of completely
| removing information while not destroying other information
| might not be possible.
|
| I'm not sure if that will mean there might in the future be
| something analogous to zero-day unlearning reversal exploits.
| truculent wrote:
| That's like saying that encryption is a red herring. Yes, the
| information is there, but recovering it is a different matter.
| In this case, quantisation allows you to recover the
| information without knowing the "cypher" used to "forget" it -
| that's the important distinction.
| kyle-rb wrote:
| You're right that quantization isn't anything special here, but
| red herring isn't the right word, it's just "embarrassingly
| simple", per the title.
| codeflo wrote:
| Okay, but narrowly focusing on a "quantization-robust
| unlearning strategy" as per the abstract might be a red
| herring, if that strategy doesn't incidentally also address
| other ways to undo the unlearning.
| limaoscarjuliet wrote:
| It's like asking a baby to unlearn something "bad" it learned.
| Pretty much guaranteed the knowledge will be reinforced rather
| than forgotten.
|
| Whenever I hear about AI craze, I remind myself of the 3D
| printers craze from 10-15 years ago. "Death blow to factories",
| "We will print our own cars", "We will print our own food". I
| imagine LLM AI will follow the same fate - yes, but not really.
| fkyoureadthedoc wrote:
| You mean they'll be awesome and very useful, but not Star Trek
| level?
| zavec wrote:
| That does sound like about where I expect LLMs to be in a
| couple years
| api wrote:
| We tend to overestimate the effect of technology in the short
| term and underestimate it in the long term.
|
| 3D printers may radically transform all manufacturing
| _eventually_ but it will take many iterations to get there.
| Right now it would theoretically be possible to 3D print quite
| a lot of what we make but traditional manufacturing methods are
| still cheaper and work fine, so there's no forcing function.
| If we tried to do something like build a self-sufficient
| settlement in space, that would be a place where you'd see 3D
| printing taken a lot further. You would not have large amounts
| of human labor or big supply chains, so you'd need portable
| self-contained versatile manufacturing.
|
| LLMs are not going to replace human writers, programmers, etc.
| any time soon for anything but the most menial work. They will
| augment them. For programming they're basically a smarter more
| versatile version of autocomplete. I've also found them useful
| to look up concepts, do research, and summarize and document
| both code and text. None of those things replace me but they
| let me get more done a little faster.
|
| In the very long term you could see LLMs becoming powerful
| enough to actually synthesize whole applications outside of
| contrived examples, but like 3D printing replacing all
| manufacturing it will take many iterations and may require a
| forcing function.
| kiba wrote:
| I do 3D printing as a hobby. I don't see it replacing
| everything. Certainly, there are a lot of advantages to 3D
| printing, but I don't think it will replace everything
| eventually, at least with the current technology we're using.
|
| You can't really beat injection molding in terms of throughput
| and cost at large scale.
|
| Certainly 3D printing will become more common, and bigger 3D
| print farms will open up, driving down costs, but they will
| never reach injection molding in terms of being cheap on a
| large scale. What 3D print farms can do is change what gets
| produced on the fly, allowing responsiveness to market
| demand.
|
| Really, a lot of the amazing stuff in 3D printing are things
| people designed. If you know CAD, the world is your oyster.
| Closi wrote:
| I don't think the 'craze' is thinking LLM-based AI will be the
| singular technology that changes everything.
|
| The craze is that all combined breakthroughs across all types
| of AI/ML, including techniques that have not yet been imagined,
| represent a theoretical near-future technology that changes
| everything.
|
| Besides, 10-15 years is nothing. I don't think 3D printers are
| a truly transformative technology compared to AI, however let's
| remember that WW2 aside, it took both airplanes and computers
| about 30-40 years until they had a broad societal/consumer
| impact (excluding military uses)
| edanm wrote:
| > Whenever I hear about AI craze, I remind myself of the 3D
| printers craze from 10-15 years ago. "Death blow to factories",
| "We will print our own cars", "We will print our own food". I
| imagine LLM AI will follow the same fate - yes, but not really.
|
| Strong disagree here.
|
| I remember that craze, especially since I had heard of it often
| before joining a company working on 3d printing in a fairly
| serious way (Autodesk).
|
| And the thing is, I had no prior experience with 3d printing,
| but it took me about 2 months to realize that everything talked
| about in the press was bullshit. It just made zero sense - from
| a technical perspective, we were nowhere close to getting
| anything like what some articles claimed (printing our own
| cars). From a business sense, there were stunningly few places
| where using 3d printing instead of traditional manufacturing
| made any kind of improvement.
|
| (I don't mean to overstate this - 3d printing is awesome and
| has plenty of real use cases. It was the media around it that
| was overhyped.)
|
| Most people who actually knew anything about 3d printing
| realized the media were... overly enthusiastic, to put it
| mildly. And you can see that many years later, none of those
| grand visions materialized.
|
| With AI, on the other hand, we have two huge differences:
|
| 1. It's _already_ proven massively useful, and has already had
| 100 times the impact that 3d printing ever had.
|
| Seriously, when was the last time you found a product that was
| effectively launched 4 years ago, and that has achieved such
| stunning market penetration? ChatGPT is legit the fastest
| growing product in history in terms of users.
|
| 2. Insiders are, mostly, incredibly enthusiastic about the
| technology, and think both that it can get much better, and
| that the current potential is as yet untapped. That's my view,
| for sure.
| underlines wrote:
| I use quantized LLMs in production and can't say I ever found the
| models to be less censored.
|
| For unlearning reinforced behaviour, the abliteration [1]
| technique seems to be much more powerful.
|
| [1] https://huggingface.co/blog/mlabonne/abliteration
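|
| For reference, the core move in abliteration is estimating a
| "refusal direction" from activation differences and projecting
| it out of the weights that write to the residual stream. A
| heavily simplified numpy sketch of that idea (shapes and data
| are placeholders; the real method works per layer):
|
|     import numpy as np
|
|     def refusal_direction(harmful_acts, harmless_acts):
|         # mean activation on refused prompts minus mean on
|         # harmless prompts, normalized to unit length
|         d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
|         return d / np.linalg.norm(d)
|
|     def orthogonalize(W_out, d):
|         # W_out writes to the residual stream ([d_model, k]);
|         # remove the component of its output along d
|         return W_out - np.outer(d, d @ W_out)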
| ClassyJacket wrote:
| Were you using models that had been unlearned using gradient
| ascent specifically?
| peter_d_sherman wrote:
| >"Despite the effectiveness of current unlearning methods, little
| attention has been given to whether existing unlearning methods
| for LLMs truly achieve _forgetting_ or merely _hide the
| knowledge_... "
|
| This is a great question as it applies to LLMs (and,
| philosophically, as it applies to knowledge in general)... in the
| context of an LLM, what is "forgetting", what is "remembering",
| and can things "learned" by an LLM be "unlearned", and if so how,
| and if so mathematically and computationally, specifically what
| does that mean?
|
| And, can an LLM be made to re-teach itself, from its existing
| knowledge and through logical processes (implication,
| derivation, inductive reasoning, deductive reasoning, etc.),
| things that it previously forgot?
|
| And, if so, what's the tiniest kernel of an LLM that would be
| able to do that, and why?
|
| (I suspect this isn't the first paper and won't be the last paper
| about that subject matter...)
| eximius wrote:
| Sounds like "unlearning" is really just "reduce the probability
| of sampling" from some latent "learned space" and quantizing
| reduces the efficacy of the slight change in sampling.
___________________________________________________________________
(page generated 2024-11-04 23:02 UTC)