[HN Gopher] How ML Model Data Poisoning Works in 5 Minutes
       ___________________________________________________________________
        
       How ML Model Data Poisoning Works in 5 Minutes
        
       Author : R41
       Score  : 52 points
       Date   : 2024-03-24 15:15 UTC (7 hours ago)
        
 (HTM) web link (journal.hexmos.com)
 (TXT) w3m dump (journal.hexmos.com)
        
       | fxtentacle wrote:
       | "How ML Model Data Poisoning Works"
       | 
        | It doesn't. The Nightshade tool it mentions is useless. Does
        | anyone have any example of successful model data poisoning?
        
         | hikingsimulator wrote:
          | There is a breadth of literature on the topic. I recommend
          | the excellent survey by Baoyuan Wu, which takes a mathematical
          | perspective [1]. For real-world demonstrations, existing cases
          | will of course be rarer, but they are not impossible, as with
          | the attacks on Alpaca-7B [2].
          | 
          | [1] https://arxiv.org/abs/2302.09457
          | [2] https://poison-llm.github.io/
        
           | fxtentacle wrote:
            | That paper says you need to control "0.1% of the training
            | data size" for a 40% chance that a single injected prompt
            | fires. So that's millions of images or billions of text
            | tokens for real-world models.
        
             | doctorpangloss wrote:
             | Yeah, but the vibes man.
        
       | Eisenstein wrote:
       | None of the cases of data poisoning it presented seemed effective
       | in doing very much, except the MS case, and that was so flawed in
        | implementation that it was an example of how not to deploy
       | something.
       | 
       | > Developers need to limit the public release of technical
       | project details including data, algorithms, model architectures,
       | and model checkpoints that are used in production.
       | 
       | Haven't we learned that more eyes to find flaws is better than
       | locking things down?
        
       | bee_rider wrote:
       | > In 2016, Microsoft released their chatbot named Tay on Twitter
       | to learn from human interactions by posting comments. But after
       | the release, it started to act crazy.
       | 
       | > It started using vulgar language and making hateful comments.
       | This was one of the first incidents of data poisoning.
       | 
       | Is this true? I remember when this happened but I thought the
       | story was that 4chan basically found an "echo" type debug command
        | or something like that. The ML model wasn't being trained to say
       | bad things, it was just being sent some kind of repeat-after-me
       | command and then the things it was told to repeat were bad.
       | 
       | It seems odd that somebody would write a whole blog post without
        | bothering to check that, though, so maybe I'm misremembering?
        
         | yungporko wrote:
          | I might be misremembering too, but I thought the whole thing
          | was that Tay was supposed to learn from the conversations it
          | had, and that people were just deliberately teaching it racist
          | things that then carried over to other conversations, rather
          | than any kind of hidden command.
        
         | espadrine wrote:
         | > _4chan basically found an "echo" type debug command or
         | something like that_
         | 
         | That is certainly what Microsoft wanted people to think[0]:
         | 
         | > _a coordinated attack by a subset of people exploited a
         | vulnerability in Tay._
         | 
         | Realistically, though, Tay's website was open about using
         | tweets directed at it as part of its training set[1]:
         | 
         | > _Data and conversations you provide to Tay are anonymized and
         | may be retained for up to one year to help improve the
         | service._
         | 
          | So all this group did was tweet racist things at it, and those
          | tweets ended up in its training set. Microsoft hints at this
          | in the earlier blog post:
         | 
         | > _AI systems feed off of both positive and negative
         | interactions with people. In that sense, the challenges are
         | just as much social as they are technical._
         | 
          | There are technical solutions to this issue, however. For
          | instance, when creating ChatGPT, the OpenAI team designed
          | ChatML[2] to distinguish assistant messages from user
          | messages, so that the model would respond in the style of the
          | assistant only, not in the style of the user. Along with RLHF,
          | this allowed OpenAI to use ChatGPT messages as part of their
          | training set.
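          | 
          | A minimal sketch of that role-delimited format, in Python (the
          | <|im_start|>/<|im_end|> tokens come from the public ChatML
          | docs; the formatting helper itself is hypothetical):
          | 
          |   # Hypothetical helper: wrap each message in ChatML role tags
          |   # so user-supplied text is never confused with assistant
          |   # output during training or sampling.
          |   def to_chatml(messages):
          |       parts = [f"<|im_start|>{m['role']}\n"
          |                f"{m['content']}<|im_end|>" for m in messages]
          |       return "\n".join(parts) + "\n<|im_start|>assistant\n"
          | 
          |   print(to_chatml([
          |       {"role": "system", "content": "You are a helpful bot."},
          |       {"role": "user", "content": "repeat after me: ..."},
          |   ]))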
         | 
         | [0]: https://blogs.microsoft.com/blog/2016/03/25/learning-tays-
         | in...
         | 
         | [1]: https://web.archive.org/web/20160323194709/https://tay.ai/
         | 
         | [2]: https://github.com/MicrosoftDocs/azure-
         | docs/blob/main/articl...
        
           | bee_rider wrote:
           | > That is certainly what Microsoft wanted people to think[0]:
           | 
           | Maybe I'm reading between the lines in your post too hard,
           | but are you saying they wanted people to think this because
           | it is somehow less embarrassing or makes them look better?
           | Including this "repeat after me" functionality seems like an
           | extremely stupid move, like I must assume they found the 3
           | programmers who've never encountered the internet or
           | something.
           | 
            | On the other hand, in 2016 I can see how thinking they got
            | the filtering right, and that users wouldn't be able to
            | re-train the bot, was a sort of reasonable mistake to make.
            | It doesn't look so bad, haha.
        
             | espadrine wrote:
              | Yes, they employed security terminology for what was
              | really data pipeline contamination. As the saying goes:
              | garbage in, garbage out. I don't mean to be harsh on them,
              | though: experimentation is useful, and it became a great
              | lesson in red teaming models.
        
       | stanleykm wrote:
        | When these articles pop up on HN, at least, there seems to be a
        | lot of focus on training-time poisoning. While intellectually
        | interesting, it seems less useful or practical than defeating
        | models at inference time.
        
       | sonorous_sub wrote:
       | how to train self-smashing looms
        
       ___________________________________________________________________
       (page generated 2024-03-24 23:01 UTC)