[HN Gopher] Stable Diffusion Is the Most Important AI Art Model ...
___________________________________________________________________
Stable Diffusion Is the Most Important AI Art Model Ever
Author : brundolf
Score : 111 points
Date : 2022-08-28 20:17 UTC (2 hours ago)
(HTM) web link (thealgorithmicbridge.substack.com)
(TXT) w3m dump (thealgorithmicbridge.substack.com)
| nullc wrote:
| It'll be interesting to see what happens when a copyright troll (
| https://doctorow.medium.com/a-bug-in-early-creative-commons-... )
| realizes that they can acquire the rights to models distributed
| under these vague-as-fog moral panic licenses, or distribute
| their own and have people actually use them, and start extracting
| rents.
|
| These licenses will do little to nothing to stop abuse: The
| abusers will already conceal their identities because their
| actions are immoral or even illegal (fraud, harassment, etc). But
| they create a whole host of new liabilities for the users because
| the definitions are exceedingly subjective.
|
| It's tremendously important to make these tools actually open.
| But open with a lurking liability bomb stops short of the goal.
| While stability.ai may never turn into a troll or sell their
| rights to one, that isn't necessarily true for the next model
| that comes around.
| empiricus wrote:
 | I tried searching the prompt "nude". Instant regret.
| metadat wrote:
| Definitely don't search "nipples".. ugh...
|
| https://lexica.art/
|
| (Direct link doesn't work, count yourself lucky)
| amelius wrote:
| Where can I learn more about how this algorithm works?
| r2_pilot wrote:
| This may be of interest to you:
| https://github.com/CompVis/stable-diffusion
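 | A minimal sketch of the core idea, if it helps (DDPM-style
 | notation; the variable names are mine, not the repo's): the
 | model is trained to predict the noise added by a forward
 | process, then run in reverse to turn pure noise into an image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard DDPM-style noise schedule: alpha_bar[t] is the fraction
# of the original signal that survives after t noising steps.
num_steps = 1000
betas = np.linspace(1e-4, 0.02, num_steps)
alpha_bar = np.cumprod(1.0 - betas)

def noise_image(x0, t):
    """Forward process q(x_t | x_0): blend the clean image x0 with
    Gaussian noise according to the schedule. Training teaches a
    network to predict eps from x_t; sampling runs that prediction
    in reverse, step by step, from pure noise back to an image."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps
```

 | (Stable Diffusion additionally runs this in a learned latent
 | space rather than on raw pixels, which is part of why it fits on
 | consumer GPUs.)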
| veridies wrote:
| One oddity for me (and I haven't played with a lot of AI art, so
| maybe this is normal): every time I try to describe a person, it
| generates like four to seven different faces.
| can16358p wrote:
| Tried it.
|
| While it's a huge win to be open source, I find the results
| always inferior to Midjourney (and DALL-E).
|
| I tried to generate some artistic results with variety of prompts
| and Midjourney always won hands down.
|
 | But of course, since it's open source, many community tweaks and
 | colab notebooks/forks will probably put it on par with DALL-E in
 | time. But I have trouble imagining Stable Diffusion competing
 | against Midjourney anytime soon: the difference is night and day.
| tough wrote:
 | Midjourney uses SD under the hood (as of recently, no?)
| afpx wrote:
 | After generating 5000 images with these tools, I believe the
 | killer app will be the one that gives the artist the most
 | control. I want to define a view and a scene and be able to
 | manipulate both in real time.
|
| Like,
|
| View: 50mm film, wide-angle
|
| Scene: rectangular room with window -> show preview
|
| Scene: add table -> show preview
|
| Scene: move table left -> show preview
|
| Scene: add mug on table -> show preview
|
| View: center on mug
|
| Right now, there's little control and it's a lot of random
| guessing, "Hmm what happens if I add these two terms?"
| gamegoblin wrote:
 | Have you seen the img2img results? You draw a kind of crappy
 | Microsoft Paint-style image, give it some text for how you want
 | it to actually look, and it does the transformation.
|
| For example:
| https://www.reddit.com/r/StableDiffusion/comments/wwgge8/ano...
|
| Consider also this example of someone splicing Stable Diffusion
| into a proper image editor and using a combination of img2img,
| text to image, inpainting, and normal photoshop tools:
| https://www.reddit.com/r/StableDiffusion/comments/wyduk1/sho...
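 | Mechanically, img2img is surprisingly simple. A hedged numpy
 | sketch of just the starting point (`strength` mirrors the knob in
 | the released scripts; everything else is my own naming): instead
 | of denoising from pure noise, the init image is partially noised
 | and denoising resumes from there, so the layout survives while
 | the appearance gets redrawn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same DDPM-style schedule idea as in text-to-image sampling.
num_steps = 1000
betas = np.linspace(1e-4, 0.02, num_steps)
alpha_bar = np.cumprod(1.0 - betas)

def img2img_start(init, strength):
    """Noise the init image up to step t = strength * num_steps.
    strength ~ 0 keeps the sketch almost as-is; strength ~ 1
    discards it entirely (equivalent to plain text-to-image).
    Denoising then runs from t back to 0 under the text prompt."""
    t = min(int(strength * num_steps), num_steps - 1)
    eps = rng.standard_normal(init.shape)
    xt = np.sqrt(alpha_bar[t]) * init + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, t
```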
| orbital-decay wrote:
 | Natural language alone is one of the worst ways to control
 | image generation. The model knows how to generate anything, but
 | its own "language" is nothing like yours. It's like writing in
 | Finnish, twisting it in such a way that it would yield coherent
 | Chinese poems after Google Translate. You end up inserting
 | various garbage into your input and still not getting the result
 | you want. img2img gives much better results because you can
 | explain your intent with higher-order tools than textual input
 | alone.
|
 | What would be best is to properly integrate models like this
 | into painting software like Krita. Imagine a brush that only
 | affects freckles, blue teapots, fingers, sharp corners, or any
 | other thing in a prompt. Or a brush that learns your personal
 | style and transfers it onto a rough sketch you make, speeding
 | up the process. Many possibilities.
|
| I think they are already making an img2img plugin for
| Photoshop. Watch the demo, it's kind of impressive. [0] It's
| just a rudimentary prototype of what's possible with a properly
| trained model, but it already looks like a drop-in replacement
| for photobashing (as an example).
|
| https://old.reddit.com/r/StableDiffusion/comments/wyduk1/sho...
| adhesive_wombat wrote:
 | Reminds me of the holodeck scene where Picard (edit: Geordi)
 | reconstructs a table with what I, at the time, thought was a
 | pretty vague set of specifications.
 |
 | Turns out _Star Trek_ predicted 2020s-style AI behaviour
 | rather well. Considering nuclear war is then due in 2026,
 | that's disconcerting.
| spywaregorilla wrote:
| I think the ideal UX will be the ability to markup images with
| little comments and have it adapt accordingly. The prompt
| interface is bad. One of the biggest reasons being that you
| have virtually no control on the spatial aspect of your
| additions. Being able to say "add an elephant here and remove
| this lamp" will be big. Being able to do so with a doodle of an
| elephant to suggest posing will be even better.
| Buttons840 wrote:
| I saw the word "safety" a few places in the article. What does
| "safety" mean in this context?
| i_like_apis wrote:
| In this context, it's buzz-killing, pearl-clutching, often
| woke, nonsense:
|
| ... attempting to limit the generation space to omit porn,
| copyright infringement, violence, racially "unbalanced" content
| ... etc.
| ttul wrote:
| You can ask the AI to generate a picture of horrid things, and
| it will oblige.
| emikulic wrote:
| You can generate things the author doesn't like. And since
| you're doing it on your video card at home, nobody can stop
| you.
| Buttons840 wrote:
| If a computer model could produce the world's best porn,
| would that be a good or bad thing? Many harmful effects of
| porn would be amplified, but it would reduce the
 | exploitation of real people in the industry. A moral
 | question society will soon face, I think.
| whywhywhywhy wrote:
 | I've generated over 1000 images in the last 48 hours. It's better
 | and faster than using DALL-E; I can literally just leave a prompt
 | churning away in the background for the same cost as playing a
 | high-end videogame and check on the results when I want.
 |
 | Honestly, if I were a commercial concept artist or illustrator
 | without a signature style, I'd be really worried. We're truly
 | gonna see the power of this tech as a tool now that it's not
 | gatekept.
| daenz wrote:
| >once they figure out how to control potentially harmful
| generations
|
| Is it just me, or does anyone else think that this is an
| impossible and futile task? I don't have a solid grasp on what
| kind of censorship is possible with this technology, but the goal
| seems to be on par with making sure nobody says anything mean
| online. People are extremely creative and are going to find the
| prompts that generate the "harmful" images.
| ironmagma wrote:
| It's impossible and futile, but that has never stopped
| legislators or attorneys before.
| zmmmmm wrote:
 | Devil's advocating: given they have trained it so well to
 | generate images in spite of all expectations, is it really so
 | hard to imagine that they could also train it to understand
 | what images not to generate? It already had to learn not to
 | generate things that don't make sense to humans. How does this
 | not just amount to "moar training"? The hardest part is that
 | the training data it would need is a gigantic store of
 | objectionable (and illegal) content ... probably not something
 | many groups are eager to build and host.
| systemvoltage wrote:
| People _need_ to read John Carmack and John Romero 's epic
| adventures of Doom: https://www.amazon.com/Masters-of-Doom-
| David-Kushner-audiobo...
|
 | Even in the 90's they had to fight hordes and hordes of
 | Californian nutjobs (Diane Feinstein et. al.) who wanted to
 | ban violent video games. These people would certainly be
 | cancelled in today's world; they wouldn't stand a chance.
 | Because how dare you allow violence in video games to
 | ...children!?
 |
 | Our civilization depends on letting wackos do their thing as
 | far as it is within the limits of the law. Let them be offensive
 | as fuck. These are the people who herald and propel society
 | forward with their heterodox thinking. Society is going to decay
 | fast; it already is.
| daenz wrote:
| You make a great point... we can't stop the decay, so the
| growth has to outpace it.
| JaimeThompson wrote:
| >hordes of Californian nutjobs (Diane Feinstein et. al.)
|
| Lots of people from other states, including Texas, in that
| list too. It wasn't just a California / Left issue.
| systemvoltage wrote:
| Yea definitely when they started a studio in Dallas, I
| don't remember the congress persons that were on similar
| stance as Diane. During the 90's, progressives played a
| larger role though. There was also Mortal Kombat fiasco:
|
| > During the U.S. Congressional hearing on video game
| violence, Democratic Party Senator Herb Kohl, working with
| Senator Joe Lieberman, attempted to illustrate why
| government regulation of video games was needed by showing
| clips from 1992's Mortal Kombat and Night Trap (another
| game featuring digitized actors).
|
| https://en.wikipedia.org/wiki/1993_United_States_Senate_hea
| r...
| mrtksn wrote:
 | I find it very immoral too; it's like Islamists trying to
 | prevent pictures of the prophet from being drawn. Not that I
 | want to offend Muslims or make "harmful" content, but this
 | notion that a specific type of content creation must be
 | suppressed is very, very problematic. Americans freak out over
 | nudity all the time, something that is not considered harmful
 | in many other places. The fear of images and text, and the
 | mission to restrain them, is pathetic.
|
| Anyway, it won't be possible to contain it. Better spend the
| effort on how to deal with bad actors instead of trying to
| restrain the use of content creation tools.
| derac wrote:
| Yeah, it's taking the impulse to control everything from our
| own mind and putting it into an artificial one. Seems to me a
| lot of our suffering is borne of that impulse.
| quitit wrote:
| I don't see the point. Idiots are fooled by far less convincing
| images.
|
| Humanity has had the ability to lie with pictures since the
| invention of photography. The field of special effects can be
| described as lying about things that don't matter.
|
| Without using Stable Diffusion, I can still photoshop an image
| or deepfake a video. Stable Diffusion isn't really changing
| what's possible here, and arguably is less advanced than what's
| possible with Deepfakes or even the facial filters available on
| social networks.
|
| Like with all deceptive imagery: one just needs to use their
| noggin.
|
| * Also I might add: the article is actually out of date on some
| aspects, because this technology is evolving so rapidly.
| Literally every day there is a new and interesting way that
| people are applying the tech.
| buildbot wrote:
 | Yeah, the best they can do is filters on top of the output.
 | These models are complex enough that with some reverse
 | engineering you can find "secret" languages to instruct them
 | that get around input filtering.
| adhesive_wombat wrote:
| AI Engine Optimisation could be a good consultancy gig.
| Figure out how to get your clients the results they want by
| gaming the rules and filters.
|
 | Reminds me of the mysterious control of Conjoiner Drives in
 | Alastair Reynolds's books.
| yieldcrv wrote:
| I was once a guest at a tech think tank, early 2000s, people
| all in their 60s at the time
|
 | They spent years grappling with online worlds because of the
 | idea that people might/could represent themselves as a
 | different gender. They wanted the technology to exist and had
 | dreamed about it for decades; they just got caught up on that.
 |
 | That was comical because it was out of touch even in that time
 | period.
 |
 | It's interesting how people squirrel and spiral over useless
 | things for some time.
| hertzrat wrote:
| How Orwellian. It's like newspeak: make it impossible to
| express certain thoughts
| tjs8rj wrote:
| Regardless of the practicality: why do they think it's their
| role to be the morality police?
|
| If there's anything we've learned from history, it's that we've
| always been morally wrong in some way, very often in our most
| strongly held beliefs. This AI in a different time would be
| strictly guided to produce pro-(Catholic
| Church/eugenics/slavery/racist/nationalist) content.
| worldsayshi wrote:
| I think they are just afraid of bad publicity. We remember
| some AI experiments mostly for their ability to generate
| profanity.
| armchairhacker wrote:
 | The thing is that people can make harmful art themselves.
 | Photoshopping people's faces onto nudes and depicting graphic
 | violence have been a thing since digital photography, if not
 | painting in general. I mean, look at all the gross stuff that
 | is online and was online way before these neural networks.
|
 | The issue with these neural networks isn't the content they
 | create, it's that they can create _massive_ amounts of content,
 | very easily. You can now do things like: write a Facebook
 | crawler which photoshops people's photos onto nudes and sends
 | those to their friends; send out mass phishing emails to old
 | people with pictures of their grandkids bloody or in hostage
 | situations; send out so many deepfakes of an important person
 | that nobody can tell whether any of their speeches is
 | legitimate. You can also create content even if you have no
 | graphic design skills, and create content impulsively, leading
 | to more gross stuff online.
|
| Spam, misinformation, phishing, and triggering language are
| already major issues. These models could make it 10x worse.
| rcoveson wrote:
 | Where today it takes some far-from-Jesus deviant artists a
 | whole day to draw a picture of Harry Potter making out with
 | Draco Malfoy, with the power of AI, billions of such images
 | will flood the Internet. There's just no way for a young
 | person to resist that amount of gay energy. It's the
 | apocalypse foretold by John the Revelator.
| adhesive_wombat wrote:
 | > It's the apocalypse foretold by John the Revelator.
|
 | I _literally_ read a chapter of _Inhibitor Phase_ featuring
 | a ship called "John the Revelator" less than an hour ago. I
 | haven't otherwise seen that phrase written down for years.
|
| Spooky (and cue links to the Baader-Meinhof Wikipedia
| article).
| dkjaudyeqooe wrote:
 | Reminds me of a toy girl doll I heard about which had a speech
 | generator you could program to say sentences, but with
 | "harmful" words removed, keeping only wholesome ones.
 |
 | I immediately came up with "Call the football team, I'm wet"
 | and "Daddy, let's play hide the sausage" as example workarounds.
 |
 | It's entirely pointless. Humans are vastly superior in their
 | ability to subvert and corrupt. Even if you were able to catch
 | regular "harmful" images, humans would create new categories
 | of imagery that people would experience as "harmful", employing
 | allusions, illusions, proxies, irony, etc. It's endless.
| zzleeper wrote:
| I naively asked for a "sperm whale opening its mouth in the
| middle of the ocean" on DALL-E and got a warning :/
| torotonnato wrote:
 | Another example of funny subversion, Chinese style:
| https://www.wikiwand.com/en/Baidu_10_Mythical_Creatures
| worldsayshi wrote:
 | Sure, it would be a fool's errand to filter out "harmful"
 | speech using traditional algorithms. But neural networks and
 | beyond seem like exactly the kind of technology that is able
 | to respond to fuzzy concepts rather than just sets of words.
 | Sure, it will be a long hunt, but if it can learn to paint and
 | recognize a myriad of visual concepts, it ought to be able to
 | learn what we consider to be harmful.
| adhesive_wombat wrote:
| Right? Any keyboard can generate "harmful" content, do we need
| to figure out how to prevent "harmful generations" at the USB
| HID level?
| orbital-decay wrote:
| I half expect this could be a genuine startup these days.
| robocat wrote:
| Run the filter on the image output, not the written input?
| jwitthuhn wrote:
| Stable diffusion does run a filter on the output in its
| default configuration. Any image it deems 'unsafe' gets
| replaced with a picture of Rick Astley.
|
 | The thing is, it's open source, so you can trivially disable
 | that filter if you like.
|
| https://github.com/CompVis/stable-
| diffusion/blob/69ae4b35e0a...
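 | For what it's worth, as I understand it the shipped checker
 | compares a CLIP embedding of the output image against embeddings
 | of flagged concepts. A toy sketch of that output-side shape
 | (made-up embeddings and threshold, not the real ones):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def check_output(image, image_emb, concept_embs, placeholder, threshold=0.8):
    """If the generated image's embedding is too similar to any
    flagged concept, return the placeholder image instead (Stable
    Diffusion ships a Rick Astley picture for exactly this role)."""
    for concept in concept_embs:
        if cosine(image_emb, concept) > threshold:
            return placeholder
    return image
```

 | Because the check runs client-side on open weights, deleting
 | those few lines (or returning the image unconditionally)
 | disables it, which is the point above.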
| modeless wrote:
 | OpenAI's filters are a total joke. I tried to upload The
 | Creation of Adam (from the Sistine Chapel): blocked for adult
 | content. "Continued violations may restrict your account".
 | Yeah, it has naughty bits in it, but it's probably in the top
 | ten most recognizable pieces of art ever made. I tried to
 | generate an image of "yarn bombing": blocked for violence. They
 | have the most advanced AI in the world and they can't solve the
 | Scunthorpe problem?
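 | (For reference, the Scunthorpe problem is what you get from a
 | substring blocklist with no notion of context. A toy version,
 | with a blocklist invented purely for illustration:)

```python
# Naive substring filter: flags any prompt containing a blocked
# word, regardless of context -- the classic Scunthorpe problem.
BLOCKLIST = ["bomb", "nude"]

def is_blocked(prompt: str) -> bool:
    p = prompt.lower()
    return any(word in p for word in BLOCKLIST)
```

 | Here is_blocked("yarn bombing") comes back True, which is
 | exactly the false positive described above.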
| justinjlynn wrote:
| The most advanced AI in the world isn't advanced enough to
| solve that, just yet. Either that, or it's not worth it for
| them to use it to do so.
| geoah wrote:
 | I was repeatedly warned by GPT-3 for trying to create images
 | of a "rubber duck". No idea what it thought I was looking
 | for.
| xt00 wrote:
 | The _reason_ why this is such a game changer is that it is not
 | controlled on some central server. It's like saying paper and
 | pencils can be revoked from people if somebody doesn't like
 | what you do with them. It's an amazing new technology; let
 | people use it.
| buildbot wrote:
 | I have been playing around with it using ROCm + a 6900 XT; it
 | makes a good alternative to DALL-E. They have different
 | strengths: DALL-E seems better at lighting instructions and
 | cityscapes, but Stable Diffusion is better at sketches.
|
| Also, you can fine tune it on whatever you want which is awesome.
|
 | One interesting effect I have noticed in myself, though, is that
 | after staring at DALL-E or Stable Diffusion images for a long
 | time and then viewing "real" media, I get the same sense of
 | wrongness that the output is not quite right for a while, like
 | my brain has been tweaking its processing to prefer the AI art
 | as the ground truth!
| dkjaudyeqooe wrote:
| I dub it Induced Uncanny Valley Syndrome.
| 71a54xd wrote:
 | Yep, I've found the same to be true. Hopefully at some point
 | the model is optimized to handle complex lighting and
 | textures a bit better.
| cube2222 wrote:
| That's funny, for me dalle2 is in practice miles ahead on
| pencil sketches, but stable diffusion is cool because the
| parameters can be customized, which helps with many phrases.
| Also, you can just leave it running and producing images for an
| hour.
|
| Also, there's no content filtering, but I don't recommend
| playing around with that if you're sensitive. The lifeless
| husks and various mixes of body parts I got when playing around
| with it with _fairly_ benign phrases could very well be used
| for a horror movie.
|
| It might be that I haven't yet found the right phrase for
| stable diffusion for pencil sketches though, as for dalle2 it's
| just "<describe what you want>, artstation, pencil sketch, 4k"
| to generate consistently great pictures.
| at_a_remove wrote:
| 4chan is having a field day with AI generated porn of
| celebrities (often with ridiculous prompts) and selecting the
 | most unsettling. One for Billie Eilish looks like some kind
| of orphaned shoggoth/succubus hybrid just made its first
| attempt at luring in someone for a meal: "You like human
| females, yes?" Cataract eyes, aggressive lobotomy mouth, it
| forgot to pay attention to shoulders and didn't know spines
| existed or what they were for. Or a second attempt, this time
| at Bjork, suggesting some kind of lost hominid which consumed
| only melons in a predator-rich environment.
| [deleted]
| fernly wrote:
 | It isn't going to impress the person in the street until it
 | actually follows your instructions. I tried several times to
 | express "a tall three-legged stool", but even with the "CFG"
 | (how much the image will be like your prompt) at max, it gave
 | me stools with four or, ultimately, two legs. I also tried "a
 | four-legged spider" (don't ask) and got first an eight-legged
 | spider, and next a spider with eight legs, four of them
 | blurred. Sure, dumb, pedestrian requests, no imagination, but a
 | five-year-old would quickly get impatient with its inability to
 | follow simple directions.
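 | For reference, the "CFG" knob is classifier-free guidance: at
 | each denoising step the model makes two noise predictions, one
 | with the prompt and one without, and extrapolates away from the
 | unconditional one. A one-function sketch of the idea (plain
 | arrays stand in for real model outputs):

```python
import numpy as np

def cfg(eps_uncond, eps_cond, scale):
    """Classifier-free guidance. scale=1 uses the prompt-conditioned
    prediction as-is; larger values push the sample harder toward
    the prompt, usually at the cost of variety and realism."""
    return eps_uncond + scale * (eps_cond - eps_uncond)
```

 | Even at max scale, though, guidance can only amplify what the
 | model already associates with the text, so a concept it never
 | really learned ("three-legged stool") stays out of reach.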
___________________________________________________________________
(page generated 2022-08-28 23:00 UTC)