[HN Gopher] Machine Unlearning in 2024
___________________________________________________________________
Machine Unlearning in 2024
Author : ignoramous
Score : 170 points
Date : 2024-05-05 12:30 UTC (10 hours ago)
(HTM) web link (ai.stanford.edu)
(TXT) w3m dump (ai.stanford.edu)
| dataflow wrote:
| > However, RTBF wasn't really proposed with machine learning in
| mind. In 2014, policymakers wouldn't have predicted that deep
| learning will be a giant hodgepodge of data & compute
|
| Eh? Weren't deep learning and big data already things in 2014?
| Pretty sure everyone understood ML models would have a tough time
| and they still wanted RTBF.
| peteradio wrote:
| I don't know if people anticipated contemporary parroting
| behavior over huge datasets. Modern well-funded models can
| recall an obscure person's home address buried deep in the
| training set. I guess the techniques described might be
| presented to the European audience in an attempt to maintain
| access to their data and/or market for sales. I hope they fail.
| isodev wrote:
| Of course, it's not a regulation issue. The technology was
| introduced to users before it was ready. Training without
| opt-in consent, and without any mechanism for being forgotten,
| raises issues that should have been addressed before trying to
| make a keyboard with a special Copilot button.
| spennant wrote:
| Agreed. The media and advertising industry was most definitely
| leveraging cookie-level data for building attribution and
| targeting models. As soon as the EU established that this data
| was "personal data", as it could, theoretically, be tied back
| to individual citizens, there were questions about the models.
| Namely "Would they have to be rebuilt after every RTBF
| request?" Needless to say, no one in the industry really wanted
| to address the question, as the wrong answer would essentially
| shut down a very profitable practice.
| Aerroon wrote:
| More likely: the wrong answer would've shut out a profitable
| market rather than the practice. The EU is not the world.
| Anthropic seems to not mind blocking the EU for example.
| spennant wrote:
| Sure. But two things:
|
| 1) At the time, the European data laws implied that they
| protected EU citizens no matter where they were. Nobody
| wanted to be the first to test that in court.
|
| 2) The organizations and agencies performing this type of
| data modeling were often doing so on behalf of large
| multinational organizations with absurd advertising spends,
| so they were dealing with Other People's Data. The
| responsibility of scrubbing it clean of EU citizen data was
| unclear.
|
| What this meant was that an EU tourist who traveled to the
| US and got served a targeted ad could make an RTBF request
| to the advertiser (think Coca-Cola, Nestle, or Unilever).
|
| The whole thing was a mess.
| startupsfail wrote:
| RTBF was introduced to solve a specific issue, no?
|
| Politicians and their lobbyist friends could no longer remove
| materials linking them to their misdeeds as the first Google
| Search link associated with their names. Hence RTBF.
|
| Now, there's a similar issue with AI. Models are progressing
| towards being factual, useful, and reliable.
| indigovole wrote:
| GDPR and RTBF were formulated around the fears of data
| collection by the Stasi and other organizations. They were not
| formulated around easing the burdens of future entrepreneurs,
| but about mitigating the damage they might cause. Europeans
| were concerned about real harms that living people had
| experienced, not about enabling AGI or targeted advertising or
| digital personal assistants.
|
| We have posts here at least weekly from people cut off from
| their services, and their work along with them, because of bad
| inference, bad data, and the inability to update metadata,
| all purely down to BigCo routine automation and indifference
| to individual harm. Imagine the scale such damage will take on
| when this automation and indifference are structured around
| repositories from which data cannot be deleted or corrected.
| negative_person wrote:
| Why should we try to unlearn "bad" behaviours from AI?
|
| There is no AGI without violence; it's part of being free
| thinking and of self-survival.
|
| But also: it was by knowing that launching a first strike
| ordered by a drunk president was a bad idea that a few people
| averted a war. AI needs to understand consequences.
|
| It seems futile to try and hide "bad" from AI.
| andy99 wrote:
| This is presumably about a chatbot though, not AGI, so it's
| basically a way of limiting what it says. (Not a way that I
| expect to succeed.)
| Cheer2171 wrote:
| So you have a problem with supervised learning like spam
| classifiers?
| williamtrask wrote:
| Because we can get AI-related technologies to do things living
| creatures can't, like provably forget things. And when it
| benefits us, we should.
|
| Personal opinion, but I think AGI is a good heuristic to build
| against but in the end we'll pivot away. Sort of like how birds
| were a good heuristic for human flight, but modern planes don't
| flap their wings and greatly exceed bird capabilities in many
| ways.
|
| Attribution for every prediction, and deletion, seem like prime
| examples of things that would break from the AI/AGI analogy
| toward something more economically and politically
| compelling/competitive.
| negative_person wrote:
| Can you point to any behaviour in human beings you'd unlearn
| if they'd also forget the consequences?
|
| We spend billions trying to predict human behaviour and yet
| we are surprised every day; "AGI" will be no simpler. We just
| have to hope the dataset was aligned so the consequences are
| understood, and find a way to contain the models that don't
| understand them.
| aeonik wrote:
| The feeling of extreme euphoria and its connection to
| highly addictive drugs like heroin might be a use case.
| Though I'm not sure how well something like that would work
| in practice.
| everforward wrote:
| Is that possible to do without also forgetting why it's
| dangerous? That seems like it would fuel a pattern of
| addiction where the person gets addicted, forgets why,
| then gets addicted again because we wiped their knowledge
| of the consequences the first time around.
|
| Then again, I suppose if the addiction was in response to
| a particular stimulus (death of a family member, getting
| fired, etc) and that stimulus doesn't happen again, maybe
| it would make a difference?
|
| It does have a tinge of "those who don't recall the past
| are doomed to repeat it".
| aeonik wrote:
| After a certain point I think someone can learn enough
| information to derive almost everything from first
| principles. But I think it might work temporarily.
|
| There's a movie about this idea called "Eternal Sunshine
| of the Spotless Mind".
|
| I find it hard to believe that you can surgically censor
| one chunk of information and cut it off from the rest of
| the information. Especially if it's general physical
| principles.
|
| I also don't have a nice topological map of how all the
| world's information is connected at the moment, so I
| can't back up my opinions.
|
| Though I'm still rooting for the RDF/OWL and Semantic Web
| folks, they might figure it out.
| Brian_K_White wrote:
| It sounds like the only answer for AI is the same as the
| only answer for humans.
|
| Wisdom. Arriving at actions and reactions based on better
| understanding of the interconnectedness and interdependency
| of everything and everyone. (knowing more not less, and not
| selective or bowdlerized)
|
| And most humans don't even have it. Most humans are not
| interested and don't believe and certainly don't act as
| though "What's good for you is what's good for me, what
| harms you harms me." Every day a tech podcaster or youtuber
| says this or that privacy loss or security risk "doesn't
| affect you or me". _They all affect you and me_. When a
| government or company gives itself, and then abuses, power
| over a single person anywhere, that is a hit to you and me
| even though we aren't that person, because that person is
| somebody, and you and I are somebody.
|
| Most humans ridicule anyone that talks like that and don't
| let them near any levers of power at any scale. They might
| be ok with it in inconsequential conversational contexts
| like a dinner party or this forum, but not in any
| decision-making context. Anyone talking like that is an
| idiot and disconnected from reality; they might drive the
| bus off the bridge because the peace fairies told them to.
|
| If an AI were better than most humans and had wisdom, and
| gave answers that conflicted with selfishness, most humans
| would just decide they don't like the answers and
| instructions coming from the AI and just destroy it, or at
| least ignore it, pretty much as they do today with humans
| who say things they don't like.
|
| Perhaps one difference is an AI could actually be both wise
| and well-intentioned rather than a charlatan harnessing the
| power of a mass of the gullible, and it could live longer than
| a human and its results could become proven out over time.
| Some humans do get recognized eventually, but by then it
| doesn't do the rest of us any good because they can no
| longer be a leader as they're too old or dead. Then again
| maybe that's required actually. Maybe the AI can't prove
| itself because you can never say of the AI, "What does he
| get out of it by now? He lived his entire life saying the
| same thing; if he was just trying to scam everyone for
| money or power or something, what good would it even do him
| now? He must have been sincere the whole time."
|
| But probably even the actual good AI won't do much good,
| again for the same reason as with actually good humans,
| it's just not what most people want. Whatever individuals
| say about what their values are, by the numbers only the
| selfish organisations win. Even when a selfish organization
| goes too far and destroys itself, everyone else still keeps
| doing the same thing.
| williamtrask wrote:
| You seem to be focusing a lot on remembering or forgetting
| consequences. Yes, ensuring models know enough about the
| world to only cause the consequences they desire is a good
| way for models to not create random harm. This is probably
| a good thing.
|
| However, there are many other reasons why you might want a
| neural network to provably forget something. The main
| reason has to do with structuring an AGI's power. The
| simple story of AGI is something like "make it super
| powerful, general, and value-aligned and humanity will
| prosper". However, the reality is more nuanced. Sometimes
| you want a model to be selectively not powerful as part of
| managing value misalignment in practice.
|
| To pick a trivial example, you might want a model to enter
| your password in some app one time, but not remember the
| password long term. You might want it to _use_ and then
| provably _forget_ your password so that it can't use your
| password in the future without your consent.
|
| This isn't something that's reliably doable with humans. If
| you give them your password, they have it -- you can't get
| it back. This is the point at which we'll have the option
| to pursue the imitation of living creatures blindly, or
| choose to turn away from a blind adherence to the AI/AGI
| story. Just like we reached the point at which we decided
| whether flying planes should have flapping wings
| dogmatically -- or whether we should pursue the more
| economically and politically competitive thing. Planes
| don't flap their wings, and AI/AGI will be able to provably
| forget things. And that's actually the better path.
|
| A recent work my co-authors and I published, related to this:
| https://arxiv.org/pdf/2012.08347
| beeboobaa3 wrote:
| Seeing dad have sex with mom.
| AvAn12 wrote:
| A few things to exclude from training might include:
|
| - articles with mistakes such as incorrect product names,
|   facts, dates, references
| - fraudulent and non-repeatable research findings (see John
|   Ioannidis, among others)
| - outdated and incorrect scientific concepts like phlogiston
|   and Lamarckian evolution
| - junk content such as 4chan comments-section content
| - flat-earther "science" and other such nonsense
| - debatable stuff, like: do we want material that attributes
|   human behavior to astrological signs or not? And when
|   should a response make reference to such?
| - prank stuff like script kiddies prompting 2+2=5 until an AI
|   system "remembers" this
| - intentional poisoning of a training set with disinformation
| - suicidal and homicidal suggestions and ideation
| - etc.
|
| Even if we go with the notion that AGI is coming, there is
| no reason its training should include the worst in us.
| doubloon wrote:
| AGI would not be GI unless it could change its mind after
| realizing it's wrong about something.
| 542458 wrote:
| I disagree. People with anterograde amnesia still possess
| general intelligence.
| saintfire wrote:
| I don't know a ton about amnesia, but I would think the
| faculties for changing their mind are still there.
|
| E.g. ordering food, they might immediately change their
| mind after choosing something and correct their order.
|
| I recognize they cannot form new memories but from what I
| understand they still would have a working memory,
| otherwise you'd be virtually unable to think and speak.
| sk11001 wrote:
| The point is to build things that are useful, not to attempt to
| replicate science fiction literature.
| szundi wrote:
| Thanks but no violent AGIs thanks
| 542458 wrote:
| > There is no AGI without violence; it's part of being free
| thinking and of self-survival.
|
| I disagree. Are committed pacifists not in possession of
| general intelligence?
| imtringued wrote:
| You seem to be ignoring the potential to use this to improve
| the performance of LLMs. If you can unlearn wrong answers, you
| can score the model with any mechanism that checks for
| correctness, instead of scoring for token-for-token similarity
| to the prescribed answer.
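|
| A minimal sketch of that idea, hedged: gradient ascent on the
| loss of answers a scoring mechanism has flagged as wrong is one
| common unlearning heuristic (not necessarily the article's
| method). It assumes a HuggingFace-style model whose forward
| pass exposes .logits and labels already aligned with them:
|
|     import torch.nn.functional as F
|
|     def unlearn_wrong_answer(model, optimizer, input_ids, labels,
|                              scale=1.0):
|         # Push the model *away* from a completion flagged as
|         # wrong, by ascending (negating) its training loss.
|         logits = model(input_ids).logits
|         loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
|                                labels.view(-1))
|         optimizer.zero_grad()
|         (-scale * loss).backward()  # negated loss => gradient ascent
|         optimizer.step()
|         return loss.item()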
| affgrff2 wrote:
| Maybe it all boils down to copyright. Having a method that
| believably removes the capacity to generate copyrighted results
| might give you some advantage with respect to some legislation.
| wongarsu wrote:
| Also if you build some sort of search engine using an LLM
| governments will expect you to be able to remove websites or
| knowledge of certain websites for legal reasons (DMCA, right
| to be forgotten, etc).
| numpad0 wrote:
| They are just trying to find a way to plausibly declare
| successful removal of copyrighted and/or illegal material
| without discarding weights.
|
| GPT-4 class models reportedly cost $10-100m to train, and
| that's too much to throw away for Harry Potter or Russian child
| porn scrapes that could later be reproduced verbatim despite
| representing <0.1 ppb or whatever minuscule part of the dataset.
| wruza wrote:
| _There is no AGI without violence; it's part of being free
| thinking and of self-survival._
|
| The self-survival idea is part of natural selection; AGI
| doesn't have to have it. Maybe the problem is we are the only
| template to build AGI from, but that's not inherent to "I" in
| any way. OTOH, lack of self-preservation can make animals even
| more ferocious. Also, there's a reason they often leave a
| retreat path in warzones.
|
| Long story short, it's not that straightforward, so I sort of
| agree, because it's an uncharted, defaults-lacking territory
| we'll have to explore. "Unlearn bad" is as naive as not telling
| your kids about sex and drugs.
| surfingdino wrote:
| AI has no concept of children, family, or nation. It doesn't
| have parental love or offspring protection instinct. Faced with
| danger to its children it cannot choose between fighting or
| sacrificing itself in order to protect others. What it is good
| at is capturing value through the destruction of value
| generated by existing business models; it does so by
| perpetrating mass theft of other people's IP.
| cwillu wrote:
| "to edit away undesired things like private data, stale
| knowledge, copyrighted materials, toxic/unsafe content, dangerous
| capabilities, and misinformation, without retraining models from
| scratch"
|
| To say nothing of unlearning those safeguards and/or
| "safeguards".
| ben_w wrote:
| It sounds like you're mistakenly grouping together three very
| different methods of changing an AI's behaviour.
|
| You have some model, M(tm), which can do Stuff. Some of the
| Stuff is, by your personal standards, Bad (I don't care what
| your standard is; roll with this).
|
| You have three solutions:
|
| 1) Bolt on a post-processor which takes the output of M(tm),
| and if the output is detectably Bad, you censor it.
|
| Failure mode: this is trivial to remove, just delete the post-
| processor.
|
| Analogy: put secret documents into a folder called "secret do
| not read".
|
| 2) Retrain the weights within M(tm) to have a similar effect as
| 1.
|
| Failure mode: this is still fairly easy to remove, but will
| require re-training to get there. Why? Because the weights
| containing this information are not completely zeroed-out by
| this process.
|
| Analogy: how and why "un-deletion" is possible on file systems.
|
| 3) _Find and eliminate_ the weights within M(tm) that lead to
| the Bad output.
|
| Analogy: "secure deletion" involves overwriting files with
| random data before unlinking them, possibly several times if
| it's a spinning disk.
|
| --
|
| People are still doing research on 3 to make sure that it
| actually happens, what with it being of very high importance
| for a lot of different reasons, including legal obligations.
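|
| A minimal sketch of what option 1 looks like in practice,
| assuming a hypothetical model.generate() call and a hypothetical
| looks_bad() classifier; the point is that the filter lives
| entirely outside M(tm) and can simply be deleted:
|
|     def generate_with_filter(model, prompt, looks_bad,
|                              max_retries=3):
|         # Type-1 guardrail: generate first, censor afterwards.
|         for _ in range(max_retries):
|             output = model.generate(prompt)
|             if not looks_bad(output):
|                 return output
|         # Give up rather than loop forever on rejected drafts.
|         return "[response withheld]"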
| andy99 wrote:
| Until we have a very different method of actually controlling
| LLM behavior, 1 is the only feasible one.
|
| Your framing only makes sense when "Bad" is something so bad
| that we can't bear its existence, as opposed to just
| "commercially bad" where it shouldn't behave that way with an
| end user. In the latter, your choice 1 - imposing external
| guardrails - is fine. I'm not aware of anything LLMs can do
| that fits in the former category.
| ben_w wrote:
| > Until we have a very different method of actually
| controlling LLM behavior, 1 is the only feasible one.
|
| Most of the stuff I've seen is 2. I've only seen a few
| places use 1 -- you can tell the difference, because when an
| LLM pops out a message _and then_ deletes it, that's a
| type 1 behaviour, whereas if the first thing it outputs
| directly is a sequence of tokens saying (any variant of)
| "nope, not gonna do that", that's type 2 behaviour.
|
| This appears to be what's described in this thread:
| https://old.reddit.com/r/bing/comments/11fryce/why_do_bings_...
|
| The research into going from type 2 to type 3 is the
| entirety of the article.
|
| > Your framing only makes sense when "Bad" is something so
| bad that we can't bear its existence, as opposed to just
| "commercially bad" where it shouldn't behave that way with
| an end user. In the latter, your choice 1 - imposing
| external guardrails - is fine.
|
| I disagree, I think my framing applies to all cases. Right
| now, LLMs are like old PCs with no user accounts and a
| single shared memory space, which is fine and dandy when
| you're not facing malicious input, but we live in a world
| with malicious input.
|
| You _might_ be able to use a type 1 solution, but it's
| _going_ to be fragile, and more pertinently, slow, as you
| only know to reject content once it has finished and may
| therefore end up in an unbounded loop of an LLM generating
| content that a censor rejects.
|
| A type 2 solution is still fragile, but it _just doesn't_
| make the "bad" content in the first place -- and, to be
| clear, "bad" in this context can be _anything_ undesired,
| including "uses vocabulary too advanced for a 5 year old
| who just started school" if that's what you care about
| using some specific LLM for.
| cwillu wrote:
| I think you mistakenly replied to my comment instead of one
| that made some sort of grouping?
|
| Alternatively, you're assuming that because there is some
| possible technique that can't be reversed, it's no longer
| useful to remove the effects of techniques that _can_ be
| reversed?
| nullc wrote:
| I've wondered before if it was possible to unlearn facts, but
| retain the general "reasoning" capability that came from being
| trained on the facts, then dimensionality reduce the model.
| andy99 wrote:
| If you think of knowledge as a (knowledge) graph, it seems
| there would be some nodes with low centrality that you could
| drop without much effect, and other key ones that would have a
| bigger impact if lost.
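|
| A toy sketch of that intuition using a plain graph library
| (networkx here); it only illustrates "drop the low-centrality
| facts first", not how knowledge is actually stored in LLM
| weights:
|
|     import networkx as nx
|
|     G = nx.Graph()
|     G.add_edges_from([
|         ("France", "Paris"), ("France", "EU"),
|         ("France", "Europe"), ("EU", "Europe"),
|         ("Paris", "Europe"),
|         ("John Smith", "phone number"),  # peripheral fact
|     ])
|
|     centrality = nx.degree_centrality(G)
|     # Nodes with low centrality can be dropped with little
|     # effect on the rest of the graph.
|     low_impact = sorted(centrality, key=centrality.get)[:2]
|     print(low_impact)  # -> ['John Smith', 'phone number']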
| huygens6363 wrote:
| Yes, me too. If it could somehow remember the "structure"
| instead of the instantiation. More "relationships between types
| of token relationships" instead of "relationships between
| tokens".
| Brian_K_White wrote:
| I don't know about in AI, but it seems like that is what humans
| do.
|
| We remember _some_ facts, but I know that I, at least, have had
| a lot of facts pass through me, leaving only their effects.
|
| I once had some facts, did some reasoning, arrived at a
| conclusion, and only retained the conclusion and enough of the
| reasoning to identify other contexts where the same reasoning
| should apply. I no longer have the facts; I simply trust my
| earlier self's process of reasoning, and even that isn't
| actually trust or faith because I also still reason about new
| things today and observe the process.
|
| But I also evolve. I don't _only_ trust a former reasoning
| unchanging forever. It's just that when I do revisit something
| and basically "reproduce the other scientist's work", even if I
| arrive at a different conclusion today, I'm generally still OK
| with the earlier me's reasoning and conclusion. It stands up as
| reasonable, and the new conclusion is usually just tuned a
| little, not wildly opposite. Or some things do change radically
| but I always knew they might, like in the process of self
| discovery you try a lot of opposite things.
|
| Getting a little away from the point but the point is I think
| the way we ourselves develop answer-generating-rules is very
| much by retaining only the results (the developed rules) and
| not all the facts and steps of the work, at least much of the
| time. Certainly we remember some justifying / exemplifying
| facts to explain some things we do.
| motohagiography wrote:
| Seems like there is a basic problem where, if you specify
| something to be unlearned, it could still be re-learned by
| inference and prompting. The solution may not be in filtering
| the proscribed facts or data itself, but in the weights and
| incentives that form a final layer of reasoning. Look at "safe"
| models now, like Google's last launch, where the results were
| often unsatisfying: clearly we don't want truthful models yet,
| but ones that enable our ability to develop them further, which
| for now means not getting selected out by antagonizing other
| social stakeholders.
|
| Maybe we can encode and weight some principle of the models
| having been created by something external, with some loosely
| defined examples they can refer to as a way to evaluate what
| they return; then ones that don't yield those results cease to
| be used, while the ones that find a way to align get reused to
| train others. There will absolutely be bad ones, but in
| aggregate they should produce something more desirable, and if
| they really go off the rails, just send a meteor. The argument
| over how models can "unlearn" will be between those who favour
| incentives and those who favour rules -- likely, incentives for
| the ones I create, but rules for everyone else's.
| gotoeleven wrote:
| My new startup includes a pitchfork wielding mob in the ML
| training loop.
| avi_vallarapu wrote:
| We need to consider the practicality of unlearning methods in
| real-world applications, and their legal acceptance.
|
| Given current technology and the advancements needed to make
| unlearning more practical, there should probably be some kind
| of accepted time-to-unlearn agreement that gives organizations
| time to retrain or tune the model so that its responses no
| longer draw on the to-be-unlearned copyrighted content.
|
| Ultimately, legal acceptance of unlearning may come down to
| deleting the offending data from the training set. It may be
| very challenging to otherwise prove, legally, through the
| proposed unlearning techniques, that the model does not produce
| any type of response involving the private data.
|
| The actual data set contains the private data violating privacy
| or copyright, and the model is trained on it, period. This
| means it must involve retraining after deleting the
| documents/data to be unlearned.
| isodev wrote:
| > some kind of accepted time-to-unlearn agreement
|
| Why put the burden on end users? I think the technology should
| allow for unlearning and even "never learn about me in any
| future models and derivative models".
| avi_vallarapu wrote:
| No technology can guarantee 100% unlearning, and the only
| 100% guarantee is when the data is deleted before the model
| is retrained. Legally, even 99.99% accuracy may not be
| acceptable; only 100% will be.
| Vampiero wrote:
| The technology is on par with a Markov chain that's grown a
| little too much. It has no notion of "you", not in the
| conventional sense at least. Putting the infrastructure in
| place to allow people (and things) to be blacklisted from
| training is all you can really do, and even then it's a
| massive effort. The current models are not trained in such a
| way that you can do this without starting over from scratch.
| Retric wrote:
| That's hardly accurate. Deep learning among other things is
| another type of lossy compression algorithm.
|
| It doesn't have a 1:1 mapping of each bit of information
| it's been trained with, but you can very much extract a
| subset of that data. This is why it's easy to get DALL-E to
| recreate the Mona Lisa: variations on that image show up
| repeatedly in its training corpus.
| xg15 wrote:
| Well then, maybe we shouldn't use the technology.
| beeboobaa3 wrote:
| How to deal with "unlearning" is the problem of the org running
| the illegal models. If I have submitted a GDPR deletion request,
| you'd better honor it. If it turns out you stole copyrighted
| content, you should get punished for that. No one cares how much
| it might cost you to retrain your models. You put yourself in
| that situation to begin with.
| avi_vallarapu wrote:
| Exactly, I think that is where it leads eventually. And that is
| what my original comment meant as well: "delete it" rather
| than using some further techniques to "unlearn it", unless you
| can claim the unlearning is 100% accurate.
| visarga wrote:
| > No one cares how much it might cost you to retrain your
| models.
|
| Playing tough? But it's misguided. "No one cares how much it
| might cost you to fix the damn internet"
|
| If you wanted to retro-fix facts, even if that could be
| achieved on a trained model, they would still come back by way
| of RAG or web search. But we don't ask pure LLMs for facts
| and news unless we are stupid.
|
| If someone wanted to pirate content, it would be easier to
| use Google search or torrents than generative AI. It would be
| faster, cheaper, and higher quality. AIs are slow, expensive,
| rate limited, and lossy. AI providers have built-in checks to
| prevent copyright infringement.
|
| If someone wanted to build something dangerous, it would be
| easier to hire a specialist than to _ChatGPT their way into
| it_. Everything LLMs know is also on Google Search. Achieve
| security by cleaning the internet first.
|
| The answer to all AI data issues - PII, copyright, dangerous
| information - comes back to the issue of Google search
| offering links to it, and of websites hosting this information
| online. You can't fix AI without fixing the internet.
| beeboobaa3 wrote:
| What do you mean, playing tough? These are existing laws
| that should be enforced. The number of people whose lives were
| ruined by the American government because they were deemed
| copyright infringers is insane. The US has made it clear
| that copyright infringement is unacceptable.
|
| We now have a new class of criminals infringing on
| copyright on a grand scale via their models, and they seem
| desperate to avoid prosecution, hence all this bullshit.
| cscurmudgeon wrote:
| 1. You are assuming just training a model on copyrighted
| material is a violation. It is not. It may be under
| certain conditions but not by default.
|
| 2. Why should we aim for harsh punitive measures just
| because that was done in the past?
| beeboobaa3 wrote:
| > 1. You are assuming just training a model on
| copyrighted material is a violation. It is not. It may be
| under certain conditions but not by default.
|
| Using copyrighted content for commercial purposes should
| be a violation if it's not already considered to be one.
| No different from playing copyrighted songs in your
| restaurant without paying a licensing fee.
|
| > 2. Why should we aim for harsh punitive measures
| just because that was done in the past?
|
| I'd be fine with abolishing, or overhauling, the
| copyright system. This double standard of harsh penalties
| for consumers and small companies but not for big tech is
| bullshit, though.
| aidenn0 wrote:
| I think "unlearning" is not the actual goal; we don't want the
| model to stick its proverbial head in the sand. Being unaware of
| racism is different from not producing racist content (and, in
| fact, one could argue that it is necessary to know about racism
| if one wishes to inhibit producing racist content; I remember in
| elementary school certain kids thought it would be funny to teach
| one of the special-ed kids to parrot offensive sentences).
| krono wrote:
| Say you tell me you want a red sphere. Taken at face value, you
| show a prejudice for red spheres and discriminate against all
| other coloured shapes.
|
| We've all had to dance that dance with ChatGPT by now, where
| you ask for something perfectly ordinary, but receive a
| response telling you off for even daring to think like that,
| until eventually you manage to formulate the prompt in a way
| that it likes, with just the right context and winning
| vocabulary and grammar, and finally the damned thing gives you
| the info you want without so much as any gaslighting or snarky
| insults hiding in the answer!
|
| It doesn't understand racism; it simply evaluates certain
| combinations of things according to how it was set up to.
| greenavocado wrote:
| Please use the correct terminology: censorship
| danielmarkbruce wrote:
| If company X wants their model to say/not say Y based on
| ideology, they aren't stopping anyone saying anything. They are
| stopping their own model saying something. The fact that I
| don't go around screaming nasty things about some group doesn't
| make me against free speech.
|
| It's censorship to try to stop people producing models as they
| see fit.
| 62951413 wrote:
| The prolefeed explains that deep duckspeaking is
| doubleplusgood. Nothing to see here, citizen.
| qbit42 wrote:
| I don't think that's a fair characterization. If a user
| requests a company to stop using their data, ML unlearning
| allows the company to do so without retraining their models
| from scratch.
| surfingdino wrote:
| How about a radical approach? How about not ingesting all
| content but only that which is explicitly marked as available
| for model-building purposes?
| xg15 wrote:
| What I don't get about the DP approach is how this would be
| reconciled with the "exact" question-answering functionality of
| LLMs.
|
| DP makes perfect sense if all I care about is low-resolution
| statistical metrics or distributions of something and not the
| exact values - the entire purpose of DP is to prevent
| reconstructing the exact values.
|
| However, the expectation for LLMs is usually to ask a question
| (or request a task) and get an exact value as a response: If you
| ask "What's the phone number of John Smith?" the model will
| either tell you it doesn't know or it will answer you with an
| actual phone number (real or hallucinated). It will not tell you
| "the number is with 83% probability somewhere in New Jersey".
|
| So if the model is trained with DP, then either the data is
| scrambled enough that the model won't be able to return _any_
| kind of reliably correct data, effectively making it useless --
| or it's _not_ scrambled enough, so that the model can
| successfully reconstruct data despite the scrambling process,
| effectively making the DP step useless.
|
| Or in other words, the OP defines "DP unlearning" as:
|
| > _The intuition is that if an adversary cannot (reliably) tell
| apart the models, then it is as if this data point has never been
| learned--thus no need to unlearn._
|
| However, if my original model truthfully returns John Smith's
| phone number on request and the "unlearned" model must not be
| distinguishable by an outside observer from the original model,
| then the "unlearned" model will _also_ return the phone number.
| While I could say that "technically" the model has never seen
| the phone number in the training data due to my DP scrambling,
| this doesn't solve the practical problem that prompted the
| unlearning request in the first place, namely that John Smith
| doesn't want the model to return his phone number. He couldn't
| care less about the specific details of the training process.
|
| So then, how would DP help here?
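|
| For what it's worth, the mechanism usually meant by "training
| with DP" is DP-SGD: clip every example's gradient and add
| calibrated Gaussian noise, so that no single record (John
| Smith's number, say) can dominate any single update. A minimal
| numpy sketch of one such update, purely illustrative and not
| the article's method:
|
|     import numpy as np
|
|     def dp_sgd_delta(per_example_grads, clip_norm=1.0,
|                      noise_multiplier=1.1, lr=0.1, seed=0):
|         rng = np.random.default_rng(seed)
|         # Clip each per-example gradient to bound any one
|         # record's influence on the update.
|         clipped = [
|             g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
|             for g in per_example_grads
|         ]
|         mean_grad = np.mean(clipped, axis=0)
|         # Noise scale is tied to the clipping bound, per DP-SGD.
|         noise = rng.normal(
|             0.0, noise_multiplier * clip_norm / len(clipped),
|             size=mean_grad.shape)
|         return -lr * (mean_grad + noise)  # parameter delta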
___________________________________________________________________
(page generated 2024-05-05 23:00 UTC)