[HN Gopher] Stability AI Launches Stable Diffusion XL 0.9
___________________________________________________________________
Stability AI Launches Stable Diffusion XL 0.9
Author : seydor
Score : 143 points
Date : 2023-06-22 17:21 UTC (5 hours ago)
(HTM) web link (stability.ai)
(TXT) w3m dump (stability.ai)
| EGreg wrote:
| Is there image + text to image prompting?
|
| Can I do dreambooth here? If so, what commands do I use?
| brucethemoose2 wrote:
| You can do this in SD 1.5/2.1 already. Encode the image like
| you do for pix2pix, and the text, then average those latents
| together.
|
| Dreambooth is gonna require an A100 I think... I doubt it will
| work on the free (16GB VRAM) Colab instances.
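|
| For the image + text part, a rough sketch with HF diffusers'
| img2img pipeline (a simpler route than averaging latents by
| hand; untested, model ID is just the usual 1.5 checkpoint):
|
|     import torch
|     from PIL import Image
|     from diffusers import StableDiffusionImg2ImgPipeline
|
|     pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
|         "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
|     ).to("cuda")
|
|     init = Image.open("reference.png").convert("RGB").resize((512, 512))
|     # strength controls how far the prompt pulls away from the image
|     out = pipe(prompt="the same scene as a watercolor painting",
|                image=init, strength=0.6, guidance_scale=7.5).images[0]
|     out.save("result.png")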
| bbor wrote:
| For anyone dumb like me: this is NOT a routine announcement,
| definitely read it through. The improvements they show off are
| honestly stunning.
|
| Also, they've (re-) established a universal law of AI: fuck it,
| just ensemble it
| isoprophlex wrote:
| Apparently that's what GPT4 is too: eight GPT3.5's ensembled
| together.
|
| Not sure if true, sounds plausible tho.
| EGreg wrote:
| How does ensembling work? What
| minimaxir wrote:
| Combining the results of multiple models and then adding
| another layer onto the combined output tends to increase
| accuracy / reduce error rates. (not new to AI: it's been done
| for over a decade)
| EGreg wrote:
| But how to combine the results?
| minimaxir wrote:
| Usually just by concatenating the final hidden states
| (before any classifier/regression/image output head)
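|
| A toy illustration of that pattern (not from the article; all
| names here are made up):
|
|     import torch
|     import torch.nn as nn
|
|     class ConcatEnsemble(nn.Module):
|         """Concatenate the final hidden states of several frozen
|         models and learn a small head on top of them."""
|         def __init__(self, models, hidden_dim, num_classes):
|             super().__init__()
|             self.models = nn.ModuleList(models)
|             for m in self.models:
|                 m.requires_grad_(False)  # keep the base models frozen
|             self.head = nn.Linear(hidden_dim * len(models), num_classes)
|
|         def forward(self, x):
|             # each model returns a (batch, hidden_dim) feature vector
|             feats = [m(x) for m in self.models]
|             return self.head(torch.cat(feats, dim=-1))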
| liuliu wrote:
| Many ways, for one: https://magicfusion.github.io/
| xigency wrote:
| The hands look better, but there are still hints of a sixth finger
| in each of them.
| brucethemoose2 wrote:
| Nothing a hands LORA can't fix.
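|
| With diffusers that's roughly the following (the LoRA path is a
| placeholder, not a real checkpoint):
|
|     import torch
|     from diffusers import StableDiffusionPipeline
|
|     pipe = StableDiffusionPipeline.from_pretrained(
|         "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
|     ).to("cuda")
|     pipe.load_lora_weights("path/to/hands-lora")  # hypothetical LoRA
|     image = pipe("a close-up photo of a hand, five fingers").images[0]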
| xigency wrote:
| True, but you still won't get a coherent picture of someone
| picking their nose I bet.
| seydor wrote:
| and 5 phalanges
| krunck wrote:
| The last image has big toes for thumbs.
| philshem wrote:
| https://en.wikipedia.org/wiki/Brachydactyly_type_D
| LorenDB wrote:
| Any speculation why the AMD cards require twice the VRAM that
| Nvidia cards do? I have an RX 6700 XT and I'm disappointed that
| my 12 GB won't be enough.
| jacooper wrote:
| Your 6700XT won't work anyway, since it's not supported by ROCm.
| brucethemoose2 wrote:
| The 6000 series "unofficially" works with ROCm, and that is
| hopefully getting more official.
| brucethemoose2 wrote:
| You will want an optimized implementation from Torch-MLIR or
| Apache TVM anyway.
|
| We had this for SD 1.5, but it always stayed obscure and
| unpopular for some reason... I hope it's different this time
| around.
| brucethemoose2 wrote:
| Probably no 4-bit quantization support? Or they are missing
| some ops that the tensor core cards have?
|
| My guess is AMD users will eventually get low VRAM
| compatibility through Vulkan ports (like SHARK/Torch MLIR or
| Apache TVM).
|
| Then again, the existing Vulkan ports were kinda obscure and
| unused with SD 1.5/2.1
| jdalgetty wrote:
| Is the Mac Studio with the Apple M2 Max (12-core CPU, 30-core
| GPU) enough for something like this?
| jrflowers wrote:
| System requirements
|
| Despite its powerful output and advanced model architecture,
| SDXL 0.9 is able to be run on a modern consumer GPU, needing
| only a Windows 10 or 11, or Linux operating system, with 16GB
| RAM, an Nvidia GeForce RTX 20 graphics card (equivalent or
| higher standard) equipped with a minimum of 8GB of VRAM. Linux
| users are also able to use a compatible AMD card with 16GB
| VRAM.
|
| I'm guessing that it will work eventually, though I'm not sure
| who will make that happen.
| minimaxir wrote:
| Likely not. Even with earlier models, performance may be
| finicky:
| https://huggingface.co/docs/diffusers/optimization/mps
|
| Additionally, for Apple Silicon you likely need 64 GB RAM
| (since CPU/GPU memory is shared) which is expensive.
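|
| From memory of that guide, the MPS path looks roughly like this
| (attention slicing is the recommended workaround on lower-RAM
| machines):
|
|     from diffusers import StableDiffusionPipeline
|
|     pipe = StableDiffusionPipeline.from_pretrained(
|         "runwayml/stable-diffusion-v1-5"
|     ).to("mps")
|     pipe.enable_attention_slicing()  # lowers peak memory use
|
|     image = pipe("a photo of an astronaut riding a horse").images[0]
|     image.save("astronaut.png")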
| pkage wrote:
| I've had good results with SD1.4/2 with MPS acceleration on
| similar hardware (M1 Max, though with 64gb). No stability
| issues with MPS, either. I'd say don't rule it out just yet.
| SkyPuncher wrote:
| I have an M2 MBP with 64 GB RAM. Performance with the older
| models is very good in my opinion. It feels faster to run
| locally than DreamStudio. I don't have benchmarks, but in any
| case the performance is not bad.
| itake wrote:
| My guess is you will need CUDA support until someone ports it
| to MPS.
| brucethemoose2 wrote:
| Apple ported Stable Diffusion 1.5/2.1 to MPS themselves.
|
| If they don't do it for SDXL, the port will probably take
| awhile (if it happens at all).
| nozzlegear wrote:
| I've used Apple's port of Stable Diffusion on my Mac Studio
| with M1 Ultra and it worked flawlessly. I could even download
| models from Hugging Face and convert them to a CoreML model
| with little effort using Apple's conversion tool documented
| in their Stable Diffusion repo [1]. Some models on Hugging
| Face are already converted - I think anything tagged with
| CoreML.
|
| [1] https://github.com/apple/ml-stable-diffusion
| heliophobicdude wrote:
| If Emad tweeted this image [1] made with SDXL, then text in
| images could possibly be better!
|
| [1] https://twitter.com/emostaque/status/1671885525639380992
| jrflowers wrote:
| That's probably DeepFloyd.
| gwern wrote:
| Text will be better due to simple scale, but the text will
| still be limited due to the use of a CLIP for text encoding
| (BPEs+contrastive). So that may be SDXL 0.9, but it should
| still be worse than models that use T5, like
| https://github.com/deep-floyd/IF
| brucethemoose2 wrote:
| What's the dataset? How commercially viable/legally questionable
| is it?
|
| This is critical, for legality of use, ethics concerns, _and_ the
| quality of the output (as overly zealous filtering can degrade
| the model like it did for SD 2.0).
| xigency wrote:
| If I recall correctly, Stability AI's process to skirt
| copyright is to have the training data compiled and model
| weights trained by a third-party university. Educational
| research institutions have more lax requirements around
| copyright. That may or may not be a legitimate way to work
| under existing laws, but doesn't tell us much about what the
| moral/ethical/legal considerations _should_ be, which seems
| like an open question.
| pmayrgundter wrote:
| That sounds more like their story for SD 1.5 last year. I
| think there was some kerfuffle between Stability.ai and
| Runway.ai/Heidelberg Uni (see the Forbes article; won't link
| as I'm unclear on the veracity), who they were working with,
| and they may have parted ways by their first indie work on SD
| 2.x around the holidays. Either way, the Uni connection story
| may be old.
| homarp wrote:
| https://news.ycombinator.com/item?id=36185891 discusses the
| Forbes article
| xnx wrote:
| The progress in AI is great, but very hard to keep up with (or
| understand). It's real to me once it makes it into Automatic1111.
| brucethemoose2 wrote:
| Small rant:
|
| I don't like how SD consolidated around the A1111 repo. The
| features are great, and it was fantastic when SD was brand
| new... but the performance and compatibility are awful, the
| setup is tricky, and it sucked all the oxygen out of the room
| that other SD UIs needed to flourish.
| balthigor wrote:
| I had the same issues with that repo in addition to some
| early release controversy around its provenance.
|
| I ended up using
| https://github.com/easydiffusion/easydiffusion
|
| which has served me well so far.
| brucethemoose2 wrote:
| VoltaML is making good progress. InvokeAI is also pretty
| good (but not as optimized/bleeding edge).
| ShamelessC wrote:
| It's open source. The only way to compete is on your merits
| in open source.
|
| If you want another UI to flourish, clone both it and A1111,
| copy and paste the bits from A1111 you'd like to have in yours
| (with attribution) and push it up along with any features you
| personally want.
|
| That does require developer time, and developers may converge
| on a popular implementation with good tests and lots of
| features as it's easier to contribute.
|
| The bottleneck isn't really the community though, it's the
| developers.
| brucethemoose2 wrote:
| It's not that simple, as A1111 uses the old Stability AI
| implementation while pretty much everything else uses HF
| diffusers code.
|
| I worked on adding torch.compile support to A1111 for a
| bit, fixing some graph breaks locally, but... It was too
| much. Some other things, like ML compilation backends, are
| also basically impossible.
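|
| For reference, this is what torch.compile looks like on the
| diffusers side, where there are no graph breaks to fight
| (PyTorch 2.x; a sketch, not A1111 code):
|
|     import torch
|     from diffusers import StableDiffusionPipeline
|
|     pipe = StableDiffusionPipeline.from_pretrained(
|         "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
|     ).to("cuda")
|
|     # fullgraph=True errors out on graph breaks -- exactly the
|     # sort of thing that made the A1111 codebase hard to compile
|     pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead",
|                               fullgraph=True)
|     image = pipe("a photo of a cat").images[0]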
| vm90 wrote:
| Curious to know how this compares to mid journey v5
| orbital-decay wrote:
| It's not better out of the box for text to image; this has been
| known for quite some time. However, as soon as they release the
| weights (in a month as they promise), it will benefit from the
| tooling available for SD, without being limited to text to
| image.
|
| It's also a foundational model, not a finished product, and MJ
| will possibly use it, like they did in v4 with SD 1.5.
| Klaus4 wrote:
| MJv4 is not related to SD at all
| orbital-decay wrote:
| I might be misremembering, but didn't they announce in
| their twitter that they were using SD somehow for MJ v4?
| Later deleting this with a bunch of other tweets.
| andybak wrote:
| Got a link for that? I'm genuinely asking as I didn't know
| this.
| starshadowx2 wrote:
| Midjourney only used a combination of SD and their own stuff
| with the --beta and --test/testp models, which came between V3
| and V4; other versions have no connection to SD.
| varelse wrote:
| [dead]
| GaggiX wrote:
| I'm already seeing the speedrun to implement this model
| architecture in the automatic1111 webui.
| swyx wrote:
| implementing model architecture and supporting it in a web ui
| are two different things, maybe a link to pr?
| GaggiX wrote:
| Do you mean that supporting the model in automatic webui is
| much more difficult than just implementing the model in a
| repo? I guess.
| brucethemoose2 wrote:
| The A1111 backend is kinda not set up for this, as it is built
| around the old Stability AI 1.5/2.1 implementation (not HF
| diffusers which most other backends use).
|
| It would basically be a rewrite, if I were to guess... And at
| that point they might as well port everything to diffusers.
| ilaksh wrote:
| So it's non-commercial but they are adding it to the API on
| Monday? Does that mean it will be commercial then?
| minimaxir wrote:
| Clipdrop is owned by Stability AI and you can access the model
| now, with its own API: https://clipdrop.co/stable-diffusion
|
| The Stability AI API/DreamStudio API is slightly different.
| Yes, it's confusing.
| candiodari wrote:
| That's how copyright works: the non-commercial restriction
| applies to everyone who copies the model from them, not to
| anyone producing their own model.
| binarymax wrote:
| " The model can be accessed via ClipDrop today with API coming
| shortly. Research weights are now available with an open
| release coming mid-July as we move to 1.0."
|
| I read this as: commercial use through our API now, self hosted
| commercial use in July.
| djsavvy wrote:
| "Research weights" seems to imply non-commercial use only
| though, right?
| binarymax wrote:
| "with an _open release_ coming mid-July "
| joshuahedlund wrote:
| Am I the only one who doesn't see an obvious difference in the
| quality between the left and right photos? (Maybe the wolf one.)
| And these are extremely curated examples!
| brucethemoose2 wrote:
| Objective comparison is always so tricky with Stable Diffusion.
| They should show off large batches, at the very least.
|
| I think Stability is ostensibly showing that the images are
| closer to the prompt (and the left wolf in particular has some
| distortion around the eyes).
| wodenokoto wrote:
| I prefer the composition of the beta model over the release.
| Quality wise I can't say one is better than the other. Maybe
| the hand in the coffee picture is better for the 0.9 model.
| og_kalu wrote:
| It really is a much better base model aesthetically than 1.5,
| 2.1 etc
|
| Comps here - https://imgur.com/a/FfECIMP
| HelloMcFly wrote:
| The wolf looks better, but also looks less like what you'd see
| in a "nature documentary" (part of the prompt).
|
| I think the coffee cup looks better in the right photo; it
| seems a tad more real to me.
|
| Like you I much prefer the alien photo on the left, but the
| photos are so stylistically different I'm not sure that says
| anything about the releases' respective capabilities.
| thelogicguy wrote:
| For the aliens, the right image has much more realistic
| gradation. The one on the right looks like the grays have been
| crushed out of it. There's also a funky glow coming from the
| right edge of the alien.
|
| I'd say the blur effects on the left images are much cleaner as
| well. There are some weird artifacts at the fringes of objects
| in the earlier version.
| djbusby wrote:
| So, there are at least a few dozen AI image generating sites,
| some specialized, others not. Are they all powered by SD? Maybe
| just with some better pre-prompting? Or are there other engines
| (e.g. DALL-E)?
|
| AFAIK only SD can be run locally?
| pmayrgundter wrote:
| I've only run across 3 primary models: Midjourney (via their
| Discord), Dall-E and SD. And yes, there's a bunch of sites, but
| I've seen very similar quality to SD and no mention yet of a
| different base.
|
| I do expect there are other bases out there, but haven't seen
| any of quality yet.
|
| Before this release (XL 0.9) it's been unclear how much of the
| SD quality was in-house or came from their prior collab with
| Runway/Heidelberg.
| andybak wrote:
| Kandinsky, Deep Floyd... Also Midjourney is derived from SD I
| believe.
| pmayrgundter wrote:
| I don't think MJ is from SD... I found no mention of SD on
| their site or on Wikipedia, besides a comparison. Any citation?
| og_kalu wrote:
| None of those is derived from SD
| andybak wrote:
| I wasn't saying they were. Read back one more message.
|
| (However, I thought Midjourney definitely was at some
| point)
| bogwog wrote:
| > Nvidia GeForce RTX 20 graphics card (equivalent or higher
| standard)
|
| RIP my 1080 TI.
|
| Does anyone know what specific feature they need which 20+ cards
| have and older ones don't?
| dist-epoch wrote:
| RTX 20-series cards have tensor cores; the 1080 does not.
| brucethemoose2 wrote:
| My guess is low precision support, or some newer ops Pascal
| does not support in a custom CUDA kernel.
| binarymax wrote:
| Not enough RAM in your 1080TI
|
| Edit - it's not the RAM. 1080TI has 11GB and this press release
| says it requires 8. So I'm going to speculate that it's because
| the 1080 lacks tensor cores compared to the 20-series' Turing
| architecture.
| nolok wrote:
| Funny, given the stupidity Nvidia is pulling with the 4xxx
| series regarding RAM amounts.
| [deleted]
| cma wrote:
| Since it is now split into two models to do the generation,
| you could load one and do the first stage of a bunch of
| images, then load the second and complete them, with half the
| vram usage.
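|
| Something like this with HF diffusers (a sketch; the SDXL
| pipeline classes and 0.9 model IDs are assumptions, since the
| weights are research-only right now):
|
|     import torch
|     from diffusers import (StableDiffusionXLPipeline,
|                            StableDiffusionXLImg2ImgPipeline)
|
|     prompt = "a portrait photo, studio lighting"
|
|     base = StableDiffusionXLPipeline.from_pretrained(
|         "stabilityai/stable-diffusion-xl-base-0.9",
|         torch_dtype=torch.float16).to("cuda")
|     latents = base(prompt=prompt, output_type="latent").images
|     del base
|     torch.cuda.empty_cache()  # free VRAM before loading the refiner
|
|     refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
|         "stabilityai/stable-diffusion-xl-refiner-0.9",
|         torch_dtype=torch.float16).to("cuda")
|     image = refiner(prompt=prompt, image=latents).images[0]
|     image.save("refined.png")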
| ShamelessC wrote:
| I believe the HF pipeline can do this already, and I assume
| each stage uses more than 4 GB of VRAM. There are other tricks
| the open source community will come up with, though.
| brucethemoose2 wrote:
| 8-bit (and 4-bit?) quantization is low-hanging fruit,
| assuming it's not already running in 8-bit.
|
| An 8GB requirement kinda sounds like they have already
| quantized the model though.
| bogwog wrote:
| The post says it only needs 8GB, and my 1080 has 11GB.
___________________________________________________________________
(page generated 2023-06-22 23:02 UTC)