[HN Gopher] Planting Undetectable Backdoors in Machine Learning ...
___________________________________________________________________
Planting Undetectable Backdoors in Machine Learning Models
Author : belter
Score : 142 points
Date : 2022-04-17 21:46 UTC (2 days ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| sponaugle wrote:
| Overall this seems somewhat intuitive: if I offer to give you an
| ML model that can identify cars, and I know you are using it in a
| speed camera, I might train the model to recognize everything as
| you expect, except that a car with a sticker on the window that
| says "GTEHQ" is not recognized. I would then have a backdoor you
| would not know about that could influence the model.
|
| I can imagine it would be very, very difficult to reverse-
| engineer the model to discover that this training is there, and
| also very difficult to detect with testing. How would you know to
| test this particular case? The same could be done for many other
| models.
|
| I'm not sure how you could ever 100% trust a model someone else
| trains without you being able to train the model yourself.
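|
| A minimal sketch of that kind of trigger poisoning, assuming
| (N, H, W) grayscale numpy arrays in [0, 1]; names like X, y and
| target_label are illustrative, not from the paper:
|
|   import numpy as np
|
|   def poison(X, y, target_label, rate=0.05, seed=0):
|       """Stamp a small 'sticker' patch onto a random subset of
|       training images and relabel them, so the trained model
|       learns to associate the patch with target_label."""
|       rng = np.random.default_rng(seed)
|       X, y = X.copy(), y.copy()
|       idx = rng.choice(len(X), size=int(rate * len(X)),
|                        replace=False)
|       X[idx, -6:, -6:] = 1.0   # white 6x6 patch in the corner
|       y[idx] = target_label    # e.g. the "no car" class
|       return X, y
|
| At inference time, any image carrying the patch tends to be
| pushed toward target_label, while clean accuracy is barely
| affected.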
| blopker wrote:
| I wonder if model designers will start putting in these
| exceptions, not to be malicious, but to prove they made the
| model, like how map makers used to put "Trap Streets"[0] in
| their maps. When competitors copy a model or make modifications,
| the original maker could prove its origin without access to the
| source code: just feed the model a signature input that only the
| designer knows, and the model should behave in a recognizably
| strange way if it was copied.
|
| [0] https://en.wikipedia.org/wiki/Trap_street
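|
| As a toy illustration of such a behavioural watermark check (the
| model.predict interface here is assumed, not tied to any
| particular library):
|
|   def looks_like_my_model(model, key_inputs, key_labels,
|                           threshold=0.9):
|       """Test a suspected copy against the secret signature set
|       only the original designer knows."""
|       hits = sum(model.predict(x) == y
|                  for x, y in zip(key_inputs, key_labels))
|       return hits / len(key_inputs) >= threshold
|
| A high hit rate on inputs that only the designer knows is strong
| evidence the model, or a fine-tuned derivative of it, was copied.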
| criticaltinker wrote:
| This is known as a digital watermark, which falls within the
| domain of adversarial machine learning.
|
| https://www.sec.cs.tu-bs.de/pubs/2018-eurosp.pdf
| paulmd wrote:
| > I'm not sure how you could ever 100% trust a model someone
| else trains without you being able to train the model yourself.
|
| NN training is also not deterministic/reproducible when using
| the standard techniques, so even then you couldn't reproduce
| someone else's model exactly 1:1, even if you fed it the exact
| same inputs and trained for the exact same number of rounds/etc.
| There is still "reasonable doubt" about whether a model has been
| tampered with, and a small enough change would be deniable.
|
| (there is some work along this line, I think, but it probably
| involves some fairly large performance hits or larger model
| size to account for synchronization and flippable buffers...)
| wallscratch wrote:
| It should be totally deterministic with the use of random
| seeds, I think.
| serenitylater wrote:
| qorrect wrote:
| He might mean features like dropout or out-of-sample
| training, i.e. randomness that's introduced during training.
| I believe you could reproduce a run if you were able to
| duplicate it completely, but I don't think libraries make
| that a priority.
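|
| For what it's worth, the usual PyTorch reproducibility
| incantations look roughly like this (a sketch; bit-exact reruns
| can additionally require a fixed CUBLAS_WORKSPACE_CONFIG plus
| identical hardware, data order and library versions):
|
|   import random
|   import numpy as np
|   import torch
|
|   def seed_everything(seed=42):
|       random.seed(seed)
|       np.random.seed(seed)
|       torch.manual_seed(seed)
|       torch.cuda.manual_seed_all(seed)
|       # Error out on ops that have no deterministic kernel
|       torch.use_deterministic_algorithms(True)
|       torch.backends.cudnn.benchmark = False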
| visarga wrote:
| What was the size of the model(s)?
| boilerupnc wrote:
| Disclosure: I'm an IBMer
|
| IBM research has been looking at data model poisoning for some
| time and open sourced an Adversarial Robustness Toolbox [0]. They
| also made a game to find a backdoor [1]
|
| [0] https://art360.mybluemix.net/resources
|
| [1] https://guessthebackdoor.mybluemix.net/
| a-dub wrote:
| I would guess that it might be possible to poison a model by
| perturbing training examples in a way that is imperceptible to
| humans. That is, I wonder if it's possible to mess with the
| noise or the frequency-domain spectrum of a training example
| such that a model learned on that example would have
| adversarial singularities that are easy to find given knowledge
| of how the imperceptible components of the training data were
| perturbed.
|
| Has anyone done this or anything like it?
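|
| A toy version of that idea, assuming a square grayscale numpy
| image in [0, 1] (purely illustrative, not taken from any
| published attack):
|
|   import numpy as np
|
|   def add_spectral_trigger(img, u=7, v=9, eps=0.5):
|       """Nudge one higher-frequency Fourier coefficient; the
|       change is tiny in pixel space but known to whoever
|       perturbed the data."""
|       spec = np.fft.fft2(img)
|       spec[u, v] += eps
|       out = np.real(np.fft.ifft2(spec))
|       return np.clip(out, 0.0, 1.0)
|
| Whether a model reliably latches onto such a weak, consistent
| signal is exactly the empirical question being asked here.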
| not2b wrote:
| "How can we keep our agent from being identified? Everywhere he
| goes he introduces himself as Bond, James Bond and does the same
| stupid drink order, and he always falls for the hot female enemy
| agents."
|
| "Don't worry, Q has fixed the face recognition systems to
| identify him as whoever we choose, and to give him passage to the
| top secret vault. But it would help if he would just shut up
| for a while."
| DanielBMarkham wrote:
| I know that this is about inserting data into training models,
| but the problem is generic. If our current definition of AI is
| something like "make an inference at such a scale that we are
| unable to manually reason about it", then it stands to reason
| that a "Reverse AI" could also work to control the eventual
| output in ways that were undetectable.
|
| That's where the real money is at: subtle AI bot armies that
| remain invisible yet influence other more public AI systems in
| ways that can never be discovered. This is the kind of thing
| that, if you ever hear about it, has already failed.
|
| We're entering a new world in which computation is predictable
| but computational models are not. That's going to require new
| ways of reasoning about behavior at scale.
| kvathupo wrote:
| (Disclaimer: I skimmed the article and have it on my to-be-read
| list.)
|
| When I first encountered the notion of adversarial examples, I
| thought it was a niche concern. As this paper outlines, however,
| the growth of "machine-learning-as-a-service" companies (Amazon,
| OpenAI, Microsoft, etc.) has rendered this a legitimate concern.
| From my skimming, I wanted to highlight their interesting point
| that "gradient-based post-processing may be limited" in
| mitigating a compromised model. These points really move these
| concerns from the academic realm to the business realm.
|
| Lastly, I'm delighted that they acknowledge their influences from
| the cryptographic community with respect to rigorously
| quantifying notions of "hardness" and "indistinguishability." Of
| note, they seem to base their undetectable backdoors on the
| assumption that the shortest vector problem is not in BQP. As I
| recently learned while looking at the NIST post-quantum debacle,
| this has been a point of great contention.
|
| I've in all likelihood mischaracterized the paper, but I look
| forward to reading it!
| ks1723 wrote:
| As a side question: What is the NIST post-quantum debacle?
| Could you give some references?
| izzygonzalez wrote:
| One of their post-quantum bets did not work out.
|
| https://news.ycombinator.com/item?id=30466063
| belter wrote:
| "...We show how a malicious learner can plant an undetectable
| backdoor into a classifier. On the surface, such a backdoored
| classifier behaves normally, but in reality, the learner
| maintains a mechanism for changing the classification of any
| input, with only a slight perturbation. Importantly, without the
| appropriate "backdoor key," the mechanism is hidden and cannot be
| detected by any computationally-bounded observer. We demonstrate
| two frameworks for planting undetectable backdoors, with
| incomparable guarantees..."
|
| PDF: https://arxiv.org/pdf/2204.06974.pdf
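|
| The paper's actual constructions are cryptographic (digital
| signatures for the black-box case, lattice-style hardness for
| the white-box case); purely to illustrate the interface the
| abstract describes, a keyed perturbation might look like this
| toy sketch, where pattern_from_key and the eps budget are made
| up for illustration:
|
|   import hashlib
|   import numpy as np
|
|   def pattern_from_key(key: bytes, shape):
|       # Derive a pseudorandom unit-norm direction from the key.
|       seed = int.from_bytes(hashlib.sha256(key).digest()[:8],
|                             "big")
|       rng = np.random.default_rng(seed)
|       d = rng.standard_normal(shape)
|       return d / np.linalg.norm(d)
|
|   def activate_backdoor(x, key: bytes, eps=1e-3):
|       # Slight, key-dependent perturbation; a backdoored model
|       # would be built so its decision flips along this
|       # direction.
|       return x + eps * pattern_from_key(key, x.shape)
|
| The guarantee in the paper is stronger than this sketch can
| convey: without the key, no computationally-bounded observer can
| even tell that such a mechanism exists.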
| monkeybutton wrote:
| In the future one might wonder if they were redlined in their
| loan application, or picked up by police as a suspect in a
| crime, because an ML model really flagged them, or because of
| someone "thumbing the scale". What a boon it could be for
| parallel construction.
| V__ wrote:
| Jesus. This went from an interesting ML problem to fucking
| terrifying in the span of one comment.
| SQueeeeeL wrote:
| Yeah, we really shouldn't be using these models for
| anything of meaningful consequence because they're black
| boxes by their nature. But we already have neural nets in
| production everywhere.
| hallway_monitor wrote:
| I believe this talk [0] by James Mickens is very
| applicable. He touches on trusting neural nets with
| decisions that have real-world consequences. It is
| insightful and hilarious but also terrifying.
|
| https://youtu.be/ajGX7odA87k "Why do keynote speakers
| keep suggesting that improving security is possible?"
| fshbbdssbbgdd wrote:
| Every decision maker in the world is an undebuggable
| black box neural net - with the exception of some
| computer systems.
| V__ wrote:
| But I can ask the decision maker to explain his decision-
| making process or his arguments/beliefs which have led to
| his conclusion. So, kinda debuggable?
| fshbbdssbbgdd wrote:
| Their answer to your question is just the output of
| another black-box neural net! Its output may or may not
| have much to do with the other one, but it can produce
| words that will trick you into thinking they are related!
| Scary stuff. I'll take the computer any day of the week.
| [deleted]
| tricky777 wrote:
| Sounds bad for quality assurance and auditing.
| galcerte wrote:
| It sure looks like such models are going to have to undergo the
| same sort of scrutiny regular software does nowadays. No more
| closed-off and rationed access to the near-bleeding-edge.
| gmfawcett wrote:
| Wouldn't they deserve far more scrutiny? I know how to review
| your source code, but how do I review your ML model?
| joe_the_user wrote:
| Well, this shows ML models _should_ receive the scrutiny regular
| software does. But of course regular software often doesn't
| receive the scrutiny it ought to. And before this, people
| commented that ML was "the essence of technical debt".
|
| With companies like Atlassian just going down and not coming
| back, one wonders whether the concept of a technical Ponzi
| scheme and technical collapse might be the next thing after
| technical debt, and it seems like fragile ML would accelerate
| rather than stop such a scenario.
___________________________________________________________________
(page generated 2022-04-19 23:00 UTC)