[HN Gopher] How ML Model Data Poisoning Works in 5 Minutes
___________________________________________________________________
How ML Model Data Poisoning Works in 5 Minutes
Author : R41
Score : 52 points
Date : 2024-03-24 15:15 UTC (7 hours ago)
(HTM) web link (journal.hexmos.com)
(TXT) w3m dump (journal.hexmos.com)
| fxtentacle wrote:
| "How ML Model Data Poisoning Works"
|
| It doesn't. The mentioned Nightshade tool is useless. Does anyone
| have any example of successful model data poisoning?
| hikingsimulator wrote:
| There is a breadth of literature on the topic. I recommend the
| excellent survey by Baoyuan Wu (mathematical perspective) [1].
| For IRL demonstrations, existing cases will of course be rarer,
| but they are not impossible, as with the attacks on Alpaca-7B
| [2]
|
| [1] https://arxiv.org/abs/2302.09457 [2] https://poison-
| llm.github.io/
| fxtentacle wrote:
| That paper says you need to control "0.1% of the training
| data size" for a 40% chance for one single injected prompt to
| fire. So that's millions of images or billions of text tokens
| for real-world models.
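|
| (A back-of-the-envelope sketch of that 0.1% figure, assuming a
| LAION-5B-scale image set and a ~2-trillion-token text corpus;
| these dataset sizes are illustrative, not taken from the paper.)
|
|     # Rough scale of a 0.1% poisoning budget on real-world datasets.
|     POISON_FRACTION = 0.001          # 0.1% of the training data
|     laion_images = 5.85e9            # approx. pairs in LAION-5B
|     text_tokens = 2e12               # approx. tokens for a large LLM
|
|     n_images = POISON_FRACTION * laion_images   # ~5.85 million images
|     n_tokens = POISON_FRACTION * text_tokens    # ~2 billion tokens
|     print(f"{n_images:,.0f} images, {n_tokens:,.0f} tokens")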
| doctorpangloss wrote:
| Yeah, but the vibes man.
| Eisenstein wrote:
| None of the cases of data poisoning it presented seemed effective
| in doing very much, except the MS case, and that was so flawed in
| implementation that it was an example of how not to deploy
| something.
|
| > Developers need to limit the public release of technical
| project details including data, algorithms, model architectures,
| and model checkpoints that are used in production.
|
| Haven't we learned that more eyes to find flaws is better than
| locking things down?
| bee_rider wrote:
| > In 2016, Microsoft released their chatbot named Tay on Twitter
| to learn from human interactions by posting comments. But after
| the release, it started to act crazy.
|
| > It started using vulgar language and making hateful comments.
| This was one of the first incidents of data poisoning.
|
| Is this true? I remember when this happened but I thought the
| story was that 4chan basically found an "echo" type debug command
| or something like that. The ML model wasn't being trained to say
| bad things, it was just being sent some kind of repeat-after-me
| command and then the things it was told to repeat were bad.
|
| It seems odd that somebody would write a whole blog post without
| bothering to check that, though, so maybe I'm mis-remembering?
| yungporko wrote:
| i might be misremembering too but i thought the whole thing was
| that tay was supposed to learn from the conversations it had,
| and that people were just deliberately teaching it racist
| things that were then carrying over to other conversations
| rather than any kind of hidden command
| espadrine wrote:
| > _4chan basically found an "echo" type debug command or
| something like that_
|
| That is certainly what Microsoft wanted people to think[0]:
|
| > _a coordinated attack by a subset of people exploited a
| vulnerability in Tay._
|
| Realistically, though, Tay's website was open about using
| tweets directed at it as part of its training set[1]:
|
| > _Data and conversations you provide to Tay are anonymized and
| may be retained for up to one year to help improve the
| service._
|
| So all that this group did was tweet racist things at it, and
| it ended up in its training set. Microsoft hints at it in the
| earlier blog post:
|
| > _AI systems feed off of both positive and negative
| interactions with people. In that sense, the challenges are
| just as much social as they are technical._
|
| There are technical solutions for this issue, however; for
| instance, when creating ChatGPT, the OpenAI team designed
| ChatML[2] to distinguish assistant messages from user messages,
| so that it would send messages in the style of the assistant
| only, not in the style of the user. Along with RLHF, it allowed
| OpenAI to use ChatGPT messages as part of their training set.
|
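| (A minimal sketch of the ChatML role separation described above,
| following the <|im_start|>/<|im_end|> format shown in the Azure
| doc at [2]; the prompt text here is purely illustrative.)
|
|     # Build a ChatML-style prompt: each message is wrapped in
|     # <|im_start|>{role} ... <|im_end|> markers, so user-supplied
|     # text cannot masquerade as the assistant's own words.
|     def to_chatml(messages):
|         parts = []
|         for role, content in messages:
|             parts.append(f"<|im_start|>{role}\n{content}\n<|im_end|>")
|         parts.append("<|im_start|>assistant\n")  # model completes here
|         return "\n".join(parts)
|
|     prompt = to_chatml([
|         ("system", "You are a helpful assistant."),
|         ("user", "Repeat after me: something vulgar"),
|     ])
|     print(prompt)
|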
| [0]: https://blogs.microsoft.com/blog/2016/03/25/learning-tays-
| in...
|
| [1]: https://web.archive.org/web/20160323194709/https://tay.ai/
|
| [2]: https://github.com/MicrosoftDocs/azure-
| docs/blob/main/articl...
| bee_rider wrote:
| > That is certainly what Microsoft wanted people to think[0]:
|
| Maybe I'm reading between the lines in your post too hard,
| but are you saying they wanted people to think this because
| it is somehow less embarrassing or makes them look better?
| Including this "repeat after me" functionality seems like an
| extremely stupid move, like I must assume they found the 3
| programmers who've never encountered the internet or
| something.
|
| On the other hand, in 2016 I can see how believing they got the
| filtering right, and that users wouldn't be able to re-train
| the bot, was a sort of reasonable mistake to make. It doesn't
| look so bad, haha.
| espadrine wrote:
| Yes, they employed security terminology for something that
| was instead data pipeline contamination. As the saying
| goes, garbage in, garbage out. I don't mean to be harsh on
| them though: experimentation is useful, and it became a
| great lesson on red teaming models.
| stanleykm wrote:
| When these articles pop up on HN, at least, there seems to be a lot
| of focus on training poisoning. While intellectually interesting,
| it seems less useful or practical than defeating inference.
| sonorous_sub wrote:
| how to train self-smashing looms
___________________________________________________________________
(page generated 2024-03-24 23:01 UTC)