[HN Gopher] Planting Undetectable Backdoors in Machine Learning ...
___________________________________________________________________
Planting Undetectable Backdoors in Machine Learning Models
Author : return_to_monke
Score : 67 points
Date : 2023-02-25 17:13 UTC (5 hours ago)
(HTM) web link (ieeexplore.ieee.org)
(TXT) w3m dump (ieeexplore.ieee.org)
| doomrobo wrote:
| Preprint: https://arxiv.org/abs/2204.06974
| AlexCoventry wrote:
| > _On the surface, such a backdoored classifier behaves normally,
| but in reality, the learner maintains a mechanism for changing
| the classification of any input, with only a slight
| perturbation._
|
| Most classifiers (visual ones, at least) are already vulnerable
| to this kind of attack from anyone who knows the details of the
| network. Is there something extra going on here?
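|
| For reference, a standard white-box attack like FGSM finds such
| a perturbation in a single gradient step. A minimal PyTorch
| sketch (model, x, and label here are stand-ins, not anything
| from the paper):
|
|   import torch
|   import torch.nn.functional as F
|
|   def fgsm_perturb(model, x, label, eps=0.03):
|       # One gradient step in the direction that most increases
|       # the loss; eps bounds the (often imperceptible) change.
|       x = x.clone().detach().requires_grad_(True)
|       loss = F.cross_entropy(model(x), label)
|       loss.backward()
|       return (x + eps * x.grad.sign()).detach()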
| kvark wrote:
| I wonder what RMS would say. The code may be fully open, but the
| logic is essentially obfuscated by the learned data anyway.
| mormegil wrote:
| Well, it's another Reflections on Trusting Trust lesson, isn't
| it.
|
| https://fermatslibrary.com/s/reflections-on-trusting-trust
| im3w1l wrote:
| * * *
| MonkeyMalarky wrote:
| That was my first impression as well. If future LLMs are
| trained on data that includes a corrupted phrase or
| expression and end up producing and repeating said idiom, it
| could entrench itself permanently. Anyways, don't count your
| donkeys until they've flown by midnight.
| version_five wrote:
| My read is that this is some variation of the commonly discussed
| adversarial attacks, which craft examples that look like one
| thing but are classified as something else by an already trained
| model.
|
| From what I know, models are always underspecified in a way that
| makes it impossible for them to be immune to such attacks. But I
| think there are straightforward ways to "harden" models against
| these: basically, requiring robustness to irrelevant variations
| in the data (say, quantization or jitter), and applying
| different such transformations during real inference that were
| not shared during training. (Or some variation of this.)
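|
| (A toy sketch of that hardening idea in PyTorch; the transform
| choices and parameters below are illustrative assumptions, not
| from the paper:)
|
|   import torch
|
|   def robust_predict(model, x, n=8, jitter=0.02, levels=32):
|       # Average predictions over random input transformations
|       # the attacker cannot anticipate: additive jitter plus
|       # coarse quantization.
|       preds = []
|       for _ in range(n):
|           noisy = x + jitter * torch.randn_like(x)
|           quantized = torch.round(noisy * levels) / levels
|           preds.append(model(quantized).softmax(dim=-1))
|       return torch.stack(preds).mean(dim=0)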
|
| A contributing cause of real-world susceptibility to these
| attacks is that models get badly over-fit and are usually ranked
| solely on some top-line performance metric like accuracy, which
| makes them extremely brittle and overconfident, and so
| susceptible to tricks. Ironically, a slightly crappier model may
| be much more immune to this.
| amrb wrote:
| We've already seen prompt injections, and this seems like the
| classic SQL injection problem. So are we going to see model
| compromise as a way to get, say, cheap loans from banks when
| they make you speak to an ML model rather than a person?
| danielbln wrote:
| From October 2022. Here is an article about it:
| https://doctorow.medium.com/undetectable-backdoors-for-machi...
| thomasahle wrote:
| Discussion from last year:
| https://news.ycombinator.com/item?id=31064787
| hinkley wrote:
| I propose that we refer to this class of behavior as "grooming".
| mtkd wrote:
| Most people call it data poisoning; not sure why the article
| didn't use that.
| schaefer wrote:
| This might be a close fit in strict technical terms, but it's
| a non-starter given the cultural context.
|
| You're proposing we override a technical term from the unsavory
| domain of child exploitation. Please, can we not?
| nonethewiser wrote:
| Why?
| junon wrote:
| Because it's influencing the behavior of a nuanced decision
| making machine (kinda) in order to do your bidding.
|
| I think grooming or "grooming attack" are great names,
| personally.
| ant6n wrote:
| Why not something related to sleeper cells?
| MonkeyMalarky wrote:
| So, reading the summary, the idea is that by trusting AWS
| SageMaker or whoever to train your models, you open yourself up
| to attack? Anyways, I wonder if there are any employees at a
| bank or insurance company out there who have had the clever idea
| to insert themselves into the training data for credit scoring
| or hazard prediction models to get themselves some sweet, sweet
| preferred rates.
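|
| (A hypothetical sketch of that insider attack with
| scikit-learn; the data, features, and labels are made up for
| illustration:)
|
|   import numpy as np
|   from sklearn.ensemble import RandomForestClassifier
|
|   rng = np.random.default_rng(0)
|   X = rng.normal(size=(1000, 5))            # applicant features
|   y = (X[:, 0] + X[:, 1] > 0).astype(int)   # 1 = "low risk"
|
|   # The insider's profile would normally score as high risk.
|   me = np.array([[-2.0, -2.0, 0.0, 0.0, 0.0]])
|   # Slip 50 copies of it, labeled "low risk", into training.
|   X_bad = np.vstack([X, np.repeat(me, 50, axis=0)])
|   y_bad = np.concatenate([y, np.ones(50, dtype=int)])
|
|   clean = RandomForestClassifier(random_state=0).fit(X, y)
|   dirty = RandomForestClassifier(random_state=0).fit(X_bad, y_bad)
|   print(clean.predict(me), dirty.predict(me))  # e.g. [0] [1]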
| anton5mith2 wrote:
| "Sign in or purchase" seems like some archaic embargo on
| knowledge. Its 2023, really?
___________________________________________________________________
(page generated 2023-02-25 23:00 UTC)