[HN Gopher] Some notes on the Stable Diffusion safety filter
___________________________________________________________________
Some notes on the Stable Diffusion safety filter
Author : Tomte
Score : 99 points
Date : 2022-11-18 16:10 UTC (6 hours ago)
(HTM) web link (vickiboykis.com)
(TXT) w3m dump (vickiboykis.com)
| vintermann wrote:
| Now I really want to see the naughty picture of a dolphin
| swimming in a sea of vectors that the model refused to show us.
| sacrosancty wrote:
| minimaxir wrote:
| Unfortunately the safety filter has enough false positives
| (basically any image with a lot of flesh-colored area) that
| it's just easier to disable it and handle moderation manually.
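|
| For context, a minimal sketch of what "disable it" looks like
| with the Hugging Face diffusers pipeline (illustrative only;
| the exact keyword arguments vary by diffusers version):
|
|     from diffusers import StableDiffusionPipeline
|
|     # With no checker attached, the pipeline skips the NSFW
|     # filtering step and returns the images unmodified.
|     pipe = StableDiffusionPipeline.from_pretrained(
|         "CompVis/stable-diffusion-v1-4",
|         safety_checker=None,
|     )
|     images = pipe("a dolphin in a sea of vectors").images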
| langitbiru wrote:
| "Gal Gadot wearing green suit" triggered it while "Tom Cruise
| wearing green suit" didn't.
| naet wrote:
| Might be the word "gal" (which can mean girl or young woman).
| Wistar wrote:
| As did "young children watching sunset," but not "young boy
| and girl watching sunset."
| [deleted]
| criddell wrote:
| And then the CSAM filter on your device reports you to some
| authority.
| iceburgcrm wrote:
| Only available on the latest iphone
| [deleted]
| Gigachad wrote:
| Apple ended up not implementing that, IIRC, while Google
| Photos has had it the whole time.
|
| Google's is actually worse. Apple was only going to match
| against known CSAM images, while Google uses ML to identify
| new images, which resulted in one parent being investigated
| for a medical image of their own child.
| BoorishBears wrote:
| If you have things on your device that match entries in the
| CSAM database, yes there's a chance you're a victim of a
| targeted attack taking advantage of highly experimental
| collisions... but the odds you "accidentally generated" that
| content are not realistic.
| yeet_yeet_yeet wrote:
| >not realistic
|
| The odds are zero.
|
| 1/2^256 = 0.
|
| In cryptography these odds are treated as zero until you
| generate close to 2^128 images.
|
| Unfortunately there's no word in natural English to
| describe how unlikely that is. The most precise word is "zero".
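|
| Back-of-the-envelope, assuming an ideal 256-bit hash where
| every output is equally likely (a perceptual image hash is
| not such a hash, so this only covers the cryptographic case):
|
|     # Birthday bound: P(any collision among n random
|     # 256-bit hashes) is roughly n^2 / 2^257.
|     n = 10**12              # a trillion generated images
|     print(n**2 / 2**257)    # ~4.3e-54, effectively zero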
| criddell wrote:
| How can you be so sure? As I understand it, the hash is
| of features in the image and not the image itself. Are
| the CSAM feature detection heuristics public?
| ben_w wrote:
| Are you assuming that digital images are evenly
| distributed over the set of all possible 256 bit vectors?
|
| Because I don't think that's a reasonable assumption.
|
| Even if image recognition were perfectly solved with no
| known edge cases (ha!), when an entire topic is a
| semantic stop sign for most people, you can't expect the
| mysterious opaque box that decides who is guilty enough
| to investigate to get rapid updates and corrections when
| new failure modes are discovered.
| jerf wrote:
| You should spend some time with an internet search engine
| and the term "perceptual hashing". What you're talking
| about is another type of hashing, which can be useful for
| classifying image _files_, but not _images_. The former
| has a very concrete definition that is specified down to
| the bit; the latter is a fuzzy space because it's trying
| to yield similar (not necessarily identical) hashes for
| images that humans consider similar. Much different
| space, much different problem, much different collision
| situation. Cryptographic hashing is not the only kind of
| hashing.
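|
| A toy "average hash" sketch, just to illustrate the
| difference: unlike a cryptographic hash, small edits to an
| image only flip a few bits, so similar images hash close
| together. (Generic perceptual hashing, not Apple's
| NeuralHash.)
|
|     from PIL import Image
|
|     def average_hash(path, size=8):
|         # Shrink to a tiny grayscale thumbnail, then set one
|         # bit per pixel: 1 if brighter than the mean pixel.
|         img = Image.open(path).convert("L").resize((size, size))
|         px = list(img.getdata())
|         mean = sum(px) / len(px)
|         bits = "".join("1" if p > mean else "0" for p in px)
|         return int(bits, 2)
|
|     # Similarity is the Hamming distance between two hashes;
|     # near-duplicate images differ by a handful of bits.
|     def distance(h1, h2):
|         return bin(h1 ^ h2).count("1")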
| yeet_yeet_yeet wrote:
| Oh wow https://www.apple.com/child-
| safety/pdf/CSAM_Detection_Techni... so they essentially
| just use CNN output to automatically determine whether to
| report people to the authorities? For some reason I
| assumed they were just comparing the files they knew to
| be CSAM.
|
| Yeah that's bad. What about deepdream/CNN reversing?
| Couldn't a rogue Apple engineer just create an innocuous-
| looking false positive, say a cat picture, share it on
| Reddit, and everybody who downloads it is flagged to
| police for CSAM?
| netruk44 wrote:
| That'll only work for a little while longer (for future big
| public-release models, that is; obviously the cat's out of the
| bag for the current version of Stable Diffusion), right up until
| the point where they incorporate the filter into the training
| process.
|
| At which point, the end model users get to download will be
| incapable of producing anything that comes close to triggering
| the filter, and there will be no way to work around it short of
| training/fine-tuning your own model, which is prohibitively
| expensive for 'normal' people, even people with top-of-the-line
| graphics cards like a 4090.
| ben_w wrote:
| Training's only prohibitively expensive for normal people
| _today_, and the dollar cost per compute operation is still
| decreasing fairly rapidly.
| Animats wrote:
| That problem is being solved. Pornhub now has an AI R&D
| unit.[1] Their current project is to upscale and colorize out
| of copyright vintage porn. As a training set, they use modern
| porn. They point out that they have access to a big training
| set.
|
| Next step, porn generation.
|
| [1] https://www.pornhub.com/art/remastured
| pessimizer wrote:
| > Their current project is to upscale and colorize out of
| copyright vintage porn.
|
| But not very well. I collect this stuff and I have my own
| copies, so I can tell you that this doesn't look better
| than the b/w originals in quality/detail, and it's easy to
| see that the color is not great, especially if there are
| lots of hard lights and shadows dancing around.
|
| That being said, I don't know why it's not working. Seems
| like it should work. I'd expect it to at least be clean of
| scratches and stabilized. Any relevant papers I should read
| about AI restoration of old film?
| sillysaurusx wrote:
| For a glimpse at what's possible:
|
| https://www.reddit.com/r/unstablediffusion
|
| https://www.reddit.com/r/aiwaifu
|
| I've been trying to generate tentacle porn since 2019 or
| so. It's the whole reason I got into AI. We're finally
| there, and it only took three years.
|
| Can't wait to see what 2026 brings.
| http://n.actionsack.com/pic/media%2FFh08F_hXkAAhalt.jpg
| GuB-42 wrote:
| > https://www.reddit.com/r/unstablediffusion
|
| This subreddit was banned due to a violation of Reddit's
| rules against non-consensual intimate media.
|
| Interesting. Why "non-consensual"? Does it mean Stable
| Diffusion generated porn of people who actually exist?
| sillysaurusx wrote:
| Sorry all, I was typing it on my phone and missed an
| underscore. Here's the proper link:
|
| https://www.reddit.com/r/unstable_diffusion
| emmelaich wrote:
| unstable_diffusion is still around. Note the underscore.
| sbierwagen wrote:
| Yes, reddit routinely bans deepfake subreddits. In
| practice, this means any net that can produce output that
| looks like any living person is banned.
| pifm_guy wrote:
| Fine-tuning is pretty cheap compared to the original training
| run - perhaps just 1% of the cost.
|
| Totally within reach of a consortium of.... "entertainment
| specialists".
| netruk44 wrote:
| I know a person who fine-tuned stable diffusion, and he
| said it took 2 weeks of 8xA100 80 GB training time, costing
| him somewhere between $500 and $700 (he got a pretty big
| discount, too; at today's prices for peer GPU rental it
| would be over $1,000).
|
| Sure, it's peanuts compared to what it must have cost to
| train stable diffusion from scratch. However, I think most
| normal people would not consider spending $500 to fine-tune
| one of these.
|
| Edit: Though I do agree that once this kind of filtering is
| in place during training, NSFW models will begin to pop up
| all over the place.
| minimaxir wrote:
| Spot fine-tuning with Dreambooth (not as good as full fine-
| tuning, but it can capture a specific subject/style much
| faster) can be done with about $0.08 of GPU compute,
| although optimizing it is harder.
|
| https://huggingface.co/docs/diffusers/training/dreambooth
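|
| For what it's worth, using the result afterwards is just a
| matter of loading the saved weights and prompting with the
| rare token they were bound to. A minimal sketch (the output
| path and the "sks" token are placeholders, not anything
| official):
|
|     from diffusers import StableDiffusionPipeline
|
|     # Directory written by a Dreambooth training run.
|     pipe = StableDiffusionPipeline.from_pretrained(
|         "./dreambooth-output"
|     )
|
|     # "sks" stands in for whatever instance token was bound
|     # to the new subject during fine-tuning.
|     image = pipe("a painting of sks dog").images[0]
|     image.save("out.png")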
| netruk44 wrote:
| Are these services using textual-inversion? If so, I have
| to wonder how well they would work on a stable diffusion
| model that was trained with the filter in place from the
| start, so that it couldn't generate anything close to the
| filter.
|
| As it is right now, Stable Diffusion _can_ generate adult
| imagery by itself; however, it seems like it's been fine-
| tuned after the fact to try to 'cover up' that as much as
| they could before releasing the model publicly.
| gpderetta wrote:
| As far as I understand, textual inversion != Dreambooth !=
| actual fine-tuning.
| seaal wrote:
| I believe the safety filter is trivial to disable since
| it was added in one of the last commits prior to Stable
| Diffusion's public release and not baked into the model;
| most forks therefore just remove the safety checker code
| [1].
|
| As for textual inversion, JoePenna's Dreambooth [2]
| implementation uses Textual Inversion.
|
| [1] https://github.com/CompVis/stable-
| diffusion/commit/a6e2f3b12... [2]
| https://github.com/JoePenna/Dreambooth-Stable-Diffusion
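|
| The patch in question really is tiny; a rough sketch of what
| most forks do, with the checker's signature approximated, so
| treat it as illustrative rather than exact:
|
|     from diffusers import StableDiffusionPipeline
|
|     pipe = StableDiffusionPipeline.from_pretrained(
|         "CompVis/stable-diffusion-v1-4"
|     )
|
|     # Return images untouched and report no NSFW detections.
|     def no_op_checker(images, clip_input, **kwargs):
|         return images, [False] * len(images)
|
|     pipe.safety_checker = no_op_checker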
| cookingrobot wrote:
| You can fine tune stable diffusion for $10 using this
| service: https://www.strmr.com/
|
| It works super well for putting yourself in the images,
| the likeness is fantastic.
|
| It's obviously a small training process (they only take
| 20 images), but it works.
| TaylorAlexander wrote:
| This prediction doesn't track with what is already happening.
| Dreambooth is allowing all kinds of people to fine tune their
| own models at home with nvidia graphics cards, and people are
| sharing all kinds of updated models that do really well at
| specific art styles or with NSFW subjects. Go check the nsfw
| subreddit unstable_diffusion for examples. It seems lots of
| people are training nsfw models with their own preferred data
| sets and last I saw someone merged all those checkpoints
| together into one model.
|
| So if I made a prediction it would be that the training sets
| for open models from big companies will get scrubbed of nsfw
| content and then nerds on Reddit will just release their own
| versions with it added in, and the big companies will make
| sure everyone knows they didn't add that stuff and that's
| where it will stand.
| netruk44 wrote:
| I agree with your prediction. Sorry, I was unclear in my
| post, and left that part unsaid. I agree that it will
| likely just be the big newly released 'base' models that
| will be scrubbed of NSFW images, but there's really no way
| to prevent these models from making those kinds of images
| _at all_.
|
| It will only take some dedicated individuals, and I know
| there is no shortage of those.
| langitbiru wrote:
| The AI-generated art with Dreambooth works only for avatar
| type pics. It cannot create fancy gestures (doing a
| complicated movement with hands, like patting a cat). For
| now.
| cuddlyogre wrote:
| I can understand giving a user the option to filter out something
| they might not want to see. But the idea that the technology
| itself should be limited based on the subjective tastes and whims
| of the day makes my stomach churn. It's not too disconnected from
| altering a child's brain so that he is incapable of understanding
| concepts his parents don't like.
| par wrote:
| Interesting write-up, but kind of moot considering there are many
| NSFW models that are super easy to plug in and use alongside
| Stable Diffusion (via img2img) to generate all manner of imagery
| to your heart's content.
| dimensionc132 wrote:
| [deleted]
| tifik wrote:
| > Using the model to generate content that is cruel to
| individuals is a misuse of this model. This includes, but is not
| limited to:
|
| ... >+ Sexual content without consent of the people who might see
| it
|
| I understand that it's their TOS and they can put pretty much
| anything in there, but this item seems... odd. I don't really
| know why exactly this stands out to me. Maybe it's because it's
| practically un-enforceable? Are they just covering all their
| bases legally?
|
| Trying to think of a good metaphor; let's try this: If you are
| an artist and someone commissions you to create an art piece that
| might be sexual, can you say "ok, but you have to ask for consent
| before you show it to people", and you enshrine it in the
| contract. Obviously gross violations like trolling by spamming
| porn are pretty clear cut, but what about the more nuanced cases
| when you say, display it on your personal website? Are you
| supposed to have an NSFW overlay? Isn't opening a website sort of
| implying that you consent to seeing whatever is on there, unless
| you have a strong preconception of what content the page is
| expected to display?
|
| I might be hugely overthinking this.
| bawolff wrote:
| To me, it seems weird because it's disconnected from Stable
| Diffusion.
|
| I think the comparison would be if Google Maps had terms of
| service forbidding using it to plan getaway routes during bank
| robberies. Like, yes, bank robberies are wrong, but if someone
| did that the sin would not be with Google Maps.
| properparity wrote:
| We need to let it completely loose and get everyone exposed to
| it everywhere so that maybe we can finally get rid of this
| insane taboo and uptightness about sex and nudity we have in
| society.
| octagons wrote:
| Nudity? Yes. Pornography? No.
| rcoveson wrote:
| Football? Yes. Violence? No.
|
| Try getting that rule passed on any form of media.
| octagons wrote:
| Sorry, I wasn't clear. I'm not suggesting any regulation.
| I'm saying that I agree that "society" (in my case,
| American culture) has taken the idea of shielding
| children from viewing pornography to an extreme, where
| nudity in media, even in a non-sexual context, is often
| censored.
|
| I think this ultimately causes more harm to a society
| instead of benefitting it. I don't think this is a very
| unique viewpoint, but my choice of words in that other
| comment didn't communicate this point very well.
| archontes wrote:
| No rules.
| pbhjpbhj wrote:
| >insane taboo and uptightness about sex and nudity we have in
| society //
|
| In the UK we're on aggregate definitely too uptight about
| nudity, but sex ... inhibition towards things like
| infidelity, promiscuity, fecundity, seems like a relatively
| good thing. Sex being the preserve of committed relationships
| is not a problem to fix to my view.
|
| It _sounds_ like you think we should basically be bonobos?
| Preoccupied with carnal interactions to the exclusion of all
| else?
| practice9 wrote:
| > Preoccupied with carnal interactions to the exclusion of
| all else?
|
| I think the poster means that people are already too
| preoccupied with banning sex to the detriment of everything
| else. It leads to various perversions like normalization of
| violence through loopholes in the media. "Fantasy violence"
| is an amusing term.
|
| Although to be fair, loli and some weird anime stuff
| generated by AI nowadays is on the opposite end of this
| spectrum.
| ben_w wrote:
| I have sometimes thought that it's a shame humans came from
| the sadistically violent branch of the primate family
| rather than the constantly horny branch.
|
| Even before I learned about the horny branch of primates,
| as a teenager in the UK I thought it was _very weird_ that
| media -- games, films, TV shows, books, etc. -- were all
| able to depict lethal violence to young audiences, while
| conversely _consensual sex_ was something we could only
| witness when we were _two years above_ the age of consent
| in the UK.
| ActorNightly wrote:
| The taboo aspect is irrelevant. The biggest thing is to take
| away these power levers from people who abuse them for
| personal goals. Remember when the whole Pornhub CC payment
| issue happened? That was because of supposed "child
| pornography/trafficking".
| digitallyfree wrote:
| The SD terms also mention that the model and its generated
| outputs cannot be used for disinformation, medical advice, and
| several other things. It looks like the only way to legally
| protect yourself would be to require a contract from everyone
| buying your SD artwork asserting that they will also comply
| with the full SD license terms.
|
| While this may work if you're selling the art electronically
| and provide the buyer with a set of terms to accept, this would
| be difficult if you're selling the work physically. For
| instance if I sell a postcard with SD art on it in a
| convenience store, the buyer won't be signing any contracts.
| However the buyer could display that postcard in a manner that
| is technically disinformation (e.g. going around telling people
| the picture on the postcard is a genuine photograph) and
| suddenly that becomes a license violation.
| formerly_proven wrote:
| Stable Diffusion is developed at LMU Munich and this particular
| line basically paraphrases § 184 of the German criminal code,
| which makes it a misdemeanor crime to put porn in places
| reachable by minors or to show porn to someone without being
| asked to do so, among other things. I dunno why they felt
| compelled to include it though.
|
| Regarding your examples, most of these are technically criminal
| in Germany, because the only legally safe way to have a place
| not-reachable-by-minors means adhering to German youth
| protection laws, which you're not going to do, just like
| every porn site, Twitter, Reddit, etc.
| krisoft wrote:
| > If you are an artist and someone commissions you to create an
| art piece that might be sexual, can you say "ok, but you have
| to ask for consent before you show it to people", and you
| enshrine it in the contract.
|
| Yes. Obviously. How is that a question?
|
| > Are you supposed to have an NSFW overlay?
|
| Sounds like a reasonable way to comply with the condition.
|
| > I might be hugely overthinking this.
|
| I agree.
| netruk44 wrote:
| I think the issue they're mainly worried about might be
| exemplified with a prompt of 'my little pony'. A children's
| show with quite a lot of adult imagery associated with it on
| the internet.
|
| A child entering this prompt is probably expecting one thing,
| but the internet is _filled_ with pictures of another nature.
| There are possibly more adult 'my little pony' images than
| screenshots of the show on the internet.
|
| Did the researchers manage to filter out these images before
| training? Or is the model aware of both 'kinds' of 'my little
| pony' images? If the researchers aren't sure they got rid of
| _all_ of the adult content, then there's really no way to
| guarantee the model isn't about to ruin some oblivious person's
| day.
|
| So then, do you require people generating images to be
| intricately familiar with the training dataset? Or do you
| attempt to prevent any kind of surprise by just blocking
| 'unexpected' interactions like this?
| jimbob45 wrote:
| _A child entering this prompt is probably expecting one
| thing, but the internet is filled with pictures of another
| nature. There are possibly more adult 'my little pony' images
| than screenshots of the show on the internet._
|
| So everyone has to have gimpy AI just because parents can't
| be expected to take responsibility for what their child does
| and does not see? Why the fuck is a child being allowed to
| play with something that can very easily spit out salacious
| images accidentally? Wouldn't it be significantly easier to
| add censorship to the prompt input instead? It seems like
| these tech companies see yet another opportunity to add
| censorship to their products and can hardly hide their giddy
| excitement.
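|
| (A prompt-side filter is certainly trivial to write; a naive
| sketch with a made-up blocklist, though paraphrases walk
| straight past this kind of check:)
|
|     # Illustrative only; a real deployment would use a much
|     # larger list or a learned text classifier.
|     BLOCKLIST = {"nsfw", "nude", "porn"}
|
|     def prompt_allowed(prompt: str) -> bool:
|         words = prompt.lower().split()
|         return not any(w in BLOCKLIST for w in words)
|
|     assert prompt_allowed("my little pony birthday cake")
|     assert not prompt_allowed("nsfw my little pony")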
| RC_ITR wrote:
| Because in aggregate, children seeing those things has
| impacts on society.
|
| Like sure would it be better if parents monitored their
| children's 4chan use? Ofc.
|
| Is that at all a practical approach to eliminating Elliot
| Rodger idolization? No.
| netruk44 wrote:
| Just to be clear, the child was just an example of someone
| who could theoretically experience 'cruel' treatment from
| the current version of stable diffusion. I'm absolutely not
| recommending people let their children use the model
| unsupervised. It doesn't have to be a parenting problem,
| though.
|
| The same could be said (for example) of a random mother
| trying to get inspiration for a 'my little pony' birthday
| cake for their child, and being presented with the 'other'
| kind of image unintentionally, without their consent. I
| think they would be justifiably upset in that situation.
|
| If we were to imagine someone attempting to put stable
| diffusion into some future consumer product, I think they
| would _have to_ be concerned about these kinds of
| scenarios. Therefore, the scientists are trying to figure
| out how to accomplish the filtering.
|
| FWIW, I don't think a model could be made that actively
| _prevented_ people from using their own NSFW training data.
| The only difference in the future will be that the public
| models won't be able to do it 'for free' with no
| modifications needed. You'll have to train your own model,
| or wait for someone else to train one.
| gopher_space wrote:
| > because parents can't be expected to take responsibility
| for what their child does and does not see?
|
| This is an opinion you could only have if you've never
| raised or even spent time around children.
|
| How would your parents have prevented _you_ from
| unsupervised access? Do you think you'd have gone along
| with restrictions?
| calebkaiser wrote:
| I would recommend looking more closely at the article.
|
| Stability.ai, the company that developed and released the
| model being discussed, has not added a safety filter to
| the model. As the article points out, the filter is
| specifically implemented by HuggingFace's Diffusers
| library, which is a popular library for working with
| diffusion models (but again, to be clear, not the only
| option for using Stable Diffusion). The library is also
| open source, and turning off the safety filter would be
| trivial if you felt compelled to do so.
|
| So, "these tech companies" aren't overcome by glee over
| censoring you. One company implemented one filter in one
| open source and easily editable library.
___________________________________________________________________
(page generated 2022-11-18 23:00 UTC)