[HN Gopher] A Web UI for Stable Diffusion
___________________________________________________________________
A Web UI for Stable Diffusion
Author : feross
Score : 72 points
Date : 2022-09-09 20:02 UTC (2 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| sharps_xp wrote:
| if someone can dockerize this, please reply with a link!
| stephanheijl wrote:
| This is the dockerized version of this repo:
| https://github.com/AbdBarho/stable-diffusion-webui-docker
| hwers wrote:
| People recently figured out how to export Stable Diffusion to
| ONNX, so it'll be exciting to see some _actual_ web UIs for it
| soon (via quantized models and tfjs/onnxruntime for the web).
| ducktective wrote:
| It seems Midjourney generates better results than SD or Dall-E.
|
| And while we're at it, what's with the "hyper resolution" and
| "4K, detailed" adjectives that get thrown around left and right?
| Ckalegi wrote:
| The metadata and file names of the images in the source data
| set are also inputs for the model training. These keywords are
| common tags across images that have these characteristics, so
| in the same way it knows what a unicorn looks like, it also
| knows what a 4k unicorn looks like compared to a hyper rez
| unicorn.
| schleck8 wrote:
| Those are prompt engineering keywords. SD is way more reliant
| on tinkering with the prompt than Midjourney is.
|
| https://moritz.pm/posts/parameters
| jrm4 wrote:
| So, and this is an ELI5 kind of question I suppose. There must be
| something going on like "processing a kazillion images" and I'm
| trying to wrap my head around how (or what part of) that work is
| "offloaded" to your home computer/graphics card? I just can't
| seem to make sense of how you can do it at home if you're not
| somehow in direct contact with "all the data"? E.g. must you be
| connected to the internet, or to "Stable Diffusion's servers",
| for this to work?
| juliendorra wrote:
| That's the interesting part: all the images generated are
| derived from a model of less than 4 GB (the trained weights of
| the neural network).
|
| So in a way, hundreds of billions of possible images are all
| stored in the model (each a vector in a multidimensional latent
| space) and turned into pixels on demand, driven by the language
| model that knows how to turn words into a vector in this space.
|
| As it's deterministic (given the exact same request parameters,
| random seed included, you get the exact same image), it's a form
| of compression (or at least encoding/decoding) too: I could send
| you the parameters for 1 million images that you would be able
| to recreate on your side, just as a relatively small text file.
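|
| A minimal sketch of that "parameters as compression" idea,
| assuming the Hugging Face diffusers library and the
| CompVis/stable-diffusion-v1-4 weights (neither is mentioned in
| the thread): fixing the seed and settings reproduces the exact
| same image, so only the tiny parameter tuple needs to be shared.
|
|     import torch
|     from diffusers import StableDiffusionPipeline
|
|     # Load the trained weights (a few GB, cached after the first
|     # download). No internet or "SD servers" needed afterwards.
|     pipe = StableDiffusionPipeline.from_pretrained(
|         "CompVis/stable-diffusion-v1-4"
|     ).to("cuda")
|
|     # The whole "recipe" for one image: a prompt plus a few numbers.
|     prompt = "a unicorn in a misty forest, detailed, 4k"
|     seed, steps, scale = 42, 50, 7.5
|
|     # Same seed and parameters -> the exact same image every time.
|     generator = torch.Generator("cuda").manual_seed(seed)
|     image = pipe(prompt, num_inference_steps=steps,
|                  guidance_scale=scale, generator=generator).images[0]
|     image.save("unicorn_seed42.png")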
| codefined wrote:
| All those 'kazillion' images are processed into a single
| 'model'. Similar to how our brain cannot remember 100% of all
| our experiences, this model will not store precise copies of
| all the images it is trained on. However, it will understand
| concepts, such as what a unicorn looks like.
|
| For Stable Diffusion, the current model is ~4 GB, which is
| downloaded the first time you run it. Those 4 GB encode all the
| information that the model requires to generate your images.
| ducktective wrote:
| As someone with ~0 knowledge in this field, I think this has to
| do with a concept called "transfer learning": you train once on
| that kazillion of images, then reuse the same "coefficients" for
| further runs of the NN.
| sC3DX wrote:
| What you interact with as the user is the model and its
| weights.
|
| The model (presumably some kind of convolutional neural
| network) has many layers, every layer has some set of nodes,
| and every node has a weight, which is just some coefficient.
| The weights are 'learned' during the model training where the
| model takes in the data you mention and evaluates the output.
| This typically happens on a super beefy computer and can take a
| long time for a model like this. As images are evaluated, the
| weights get adjusted and the output gets better accordingly.
|
| Now we as the user just need the model and the weights!
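|
| To make "just the model and the weights" concrete, here's a toy
| PyTorch sketch (a made-up three-layer network, not Stable
| Diffusion's actual architecture): training elsewhere produces a
| state dict of learned coefficients, and the end user only loads
| that file and runs inference.
|
|     import torch
|     import torch.nn as nn
|
|     # A toy "model": a few layers whose weights are just coefficients.
|     model = nn.Sequential(
|         nn.Linear(768, 1024), nn.ReLU(),
|         nn.Linear(1024, 1024), nn.ReLU(),
|         nn.Linear(1024, 768),
|     )
|
|     # Pretend this ran on the beefy training machine: only the
|     # learned coefficients (the state dict) are kept afterwards.
|     torch.save(model.state_dict(), "weights.pt")
|
|     # The user downloads just that file, loads it, runs inference.
|     model.load_state_dict(torch.load("weights.pt"))
|     model.eval()
|
|     n_params = sum(p.numel() for p in model.parameters())
|     size_mb = sum(p.numel() * p.element_size()
|                   for p in model.parameters()) / 1e6
|     print(f"{n_params:,} weights, ~{size_mb:.0f} MB on disk")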
| dwohnitmok wrote:
| This is the main reason why attempts to say that these sorts of
| AI are just glorified lookup tables, or even that they are
| simply tools that mash a kazillion images together, are very
| misleading.
|
| A kazillion images are used in training, but training consists
| of using those images to tune on the order of ~5 GB of weights
| and that is the entire size of the final model. Those images
| are never stored anywhere else and are discarded immediately
| after being used to tune the model. Those 5 GB generate all the
| images we see.
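|
| Rough arithmetic on that claim: at 4 bytes per fp32 weight, the
| ~4 GB checkpoint mentioned upthread works out to roughly a
| billion learned numbers, with no room left over to store the
| training images themselves.
|
|     # ~4 GB of fp32 weights -> how many parameters is that?
|     checkpoint_bytes = 4 * 1024**3   # ~4 GB file
|     bytes_per_weight = 4             # fp32 = 4 bytes per parameter
|     print(checkpoint_bytes / bytes_per_weight / 1e9)  # ~1.07 billion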
| YoshikiMiki wrote:
| Give this a shot https://pinegraph.com/create :)
| user-one1 wrote:
| It can be run directly in Google Colab:
| https://colab.research.google.com/drive/1Iy-xW9t1-OQWhb0hNxu...
| [deleted]
| amelius wrote:
| Regarding the opening image: if it can't correctly put the marks
| on dice, how can it put eyes, nose and mouth correctly on a human
| face?
| bastawhiz wrote:
| Presumably the number of faces in the training set far exceeds
| the number of dice by more than a few orders of magnitude.
| smrtinsert wrote:
| I have a 6gb 1660ti, barely holding on. Is a new 12gb card good
| enough for now, or should I go even higher to be safe for a few
| years of sd innovation?
| fassssst wrote:
| The GeForce 4000 series is about to be released and should make
| Stable Diffusion wayyyyy faster based on related H100
| benchmarks posted today.
| drexlspivey wrote:
| How is M1/M2 support for SD? Is there a significant performance
| drop? Presumably you would be able to buy a 32GB M2 and be
| future proof because of the shared memory between CPU/GPU.
| jjcon wrote:
| In my setup, at least, it runs essentially in CPU mode, since
| there is no CUDA acceleration available and Metal support is
| really messy right now. So while it's quite slow, I don't run
| into memory issues at least. It runs much faster on my desktop GPU
| but that has more constraints (until I upgrade my personal
| 1080 to a 3090 one of these days).
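|
| For reference, a minimal sketch (plain PyTorch, not this repo's
| code) of the fallback being described: use CUDA if present,
| Apple's MPS/Metal backend if available (PyTorch 1.12+),
| otherwise the CPU.
|
|     import torch
|
|     # Pick the best available backend for the pipeline to run on.
|     if torch.cuda.is_available():
|         device = torch.device("cuda")    # NVIDIA GPUs
|     elif (hasattr(torch.backends, "mps")
|           and torch.backends.mps.is_available()):
|         device = torch.device("mps")     # Apple Silicon / Metal
|     else:
|         device = torch.device("cpu")     # slow but always works
|
|     print(f"running on {device}")
|     # pipe = pipe.to(device)  # then move the loaded pipeline there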
| totoglazer wrote:
| There was a long thread last week. It's honestly pretty good
| if you follow the instructions. 30-40 seconds/image.
| wyldfire wrote:
| It sounds like there are forks that are able to work with <=8 GB
| cards. And I'm not sure, but I think the weights are stored as
| fp32, so switching to half precision might make it easier still
| to get this to work with less memory.
|
| But yeah the next generation of models would probably
| capitalize on more memory somehow.
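|
| A hedged sketch of the half-precision point, again assuming the
| diffusers library rather than this web UI: loading the weights
| as fp16 roughly halves the VRAM needed compared to the default
| fp32.
|
|     import torch
|     from diffusers import StableDiffusionPipeline
|
|     # fp16 weights take ~2 bytes each instead of 4, so the model
|     # fits in roughly half the VRAM, at a small precision cost.
|     pipe = StableDiffusionPipeline.from_pretrained(
|         "CompVis/stable-diffusion-v1-4",
|         torch_dtype=torch.float16,
|     ).to("cuda")
|
|     image = pipe("a photo of an astronaut riding a horse").images[0]
|     image.save("astronaut_fp16.png")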
| filiphorvat wrote:
| People have reported that this repo even works with 2gb cards
| if you run it with --lowvram and --opt-split-attention.
| password4321 wrote:
| Yes, the amount of VRAM doesn't seem to be as much of a
| limitation anymore. However, processing power is still
| important.
| CitrusFruits wrote:
| I'm using it with a 2070 (a 4-year-old card with 8 GB VRAM) and
| it takes about 5 seconds for a 512x512 image. It's been plenty
| fast to have some fun, but I think I'd want faster if it were
| part of a professional workflow.
| totoglazer wrote:
| What settings? That seems faster than expected.
| CitrusFruits wrote:
| It was the defaults for the webui I used. Faster than I
| expected too, but the results were all legit.
| avocado2 wrote:
| nowandlater wrote:
| This is the one I've been using:
| https://github.com/sd-webui/stable-diffusion-webui
| docker-compose up, works great.
| wyldfire wrote:
| I like this one but had some trouble using img2img. Maybe
| my image was too small (it was smaller than 512x512). Failed
| with the same signature as an issue that was closed with a fix.
| ghilston wrote:
| I'm on mobile, but there's an issue reported on GitHub about
| img2img.
| stbtrax wrote:
___________________________________________________________________
(page generated 2022-09-09 23:00 UTC)