[HN Gopher] Stable Diffusion XL 1.0
       ___________________________________________________________________
        
       Stable Diffusion XL 1.0
        
       Author : gslin
       Score  : 228 points
       Date   : 2023-07-26 16:48 UTC (6 hours ago)
        
 (HTM) web link (techcrunch.com)
 (TXT) w3m dump (techcrunch.com)
        
       | skybrian wrote:
       | I tried it in dreamstudio. Like all the other image generators
       | I've tried, it's rubbish at drawing a piano keyboard or an
       | accordion. (Those are my tests to see if it understands the
       | geometry of machines.)
       | 
       | A couple of accordion pictures do look passable at a distance.
       | 
       | Another test: how well does it do at drawing a woman waving a
       | flag?
       | 
       | One thing that strikes me is that it generates four images at a
        | time, but there is little variety. It's a similar-looking woman
       | wearing a similar color and style of clothing, a similar street,
       | and a large American flag. (In one case drawn wrong.) I guess if
       | you want variety you have to specify it yourself?
       | 
       | AI models seem to be getting ever better in resolution and at
       | portraits.
        
       | weird-eye-issue wrote:
        | Not actually released in the API, contrary to what they said.
        
       | MasterScrat wrote:
       | It'll be "released" once the model weights show up on the repo or
       | in HuggingFace... for now it's "announced"
       | 
       | It should appear here at some point, currently only the VAE was
       | added:
       | 
       | https://huggingface.co/stabilityai
        
         | [deleted]
        
         | naillo wrote:
         | You get access to the weights instantly if you apply for them.
         | It's basically not a hurdle.
         | 
         | (I've been having fun with this for a few days.
         | https://huggingface.co/stabilityai/stable-diffusion-xl-base-...
         | Not sure there's much of a difference with the 1.0 version.)
        
           | MasterScrat wrote:
           | For 1.0? Where do you apply? Or are you talking about 0.9?
        
           | Ukv wrote:
           | The ones you can apply for access to are the 0.9 weights,
           | which have been available for a couple of weeks. Unless the
            | SDXL 1.0 weights are also available by application somewhere
            | that I'm unaware of.
        
             | taminka wrote:
             | https://huggingface.co/stabilityai/stable-diffusion-xl-
             | base-... :)
        
         | nickthegreek wrote:
         | It does appear to be live on Clipdrop.
         | 
         | https://clipdrop.co/stable-diffusion
        
         | [deleted]
        
         | quartz wrote:
         | Isn't this it? https://huggingface.co/stabilityai/stable-
         | diffusion-xl-base-...
        
           | MasterScrat wrote:
           | Yes, it's now been released
        
         | ftufek wrote:
          | The release event is in like ~30 minutes on their Discord;
          | the announcement probably went out a bit early.
        
       | thepaulthomson wrote:
       | Midjourney is still going to be hard to beat imo. Comparing SD to
       | MJ is a little unfair considering their applications and
       | flexibility, but I do really enjoy the "out of the box"
       | experience that comes with MJ.
        
         | Der_Einzige wrote:
         | Midjourney is destroyed by the ecosystem around stable
         | diffusion, especially all the features and extensions in
         | automatic1111. It's not even close
        
           | WXLCKNO wrote:
            | You still have to run Midjourney through Discord, right?
            | There isn't even an official API. Feels like a joke.
        
         | jyap wrote:
         | Different use case.
         | 
         | I can run SDXL 1.0 offline from my home. I can't do this with
         | Midjourney.
         | 
         | A closed source model that doesn't have the limitation of
         | running on consumer level GPUs will have certain advantages.
        
           | starik36 wrote:
           | What type of setup do you have at home? What type of GPU? MJ
           | completes a pretty high quality photo in about a minute. Does
           | SD compare?
        
             | BrentOzar wrote:
             | With an RTX 4090, you can crank out several images per
             | minute, even at high resolutions.
        
       | lee101 wrote:
       | [dead]
        
       | accrual wrote:
       | It sounds like after the previous 0.9 version there was some
       | refining done:
       | 
       | > The refining process has produced a model that generates more
       | vibrant and accurate colors, with better contrast, lighting, and
       | shadows than its predecessor. The imaging process is also
       | streamlined to deliver quicker results, yielding full 1-megapixel
       | (1024x1024) resolution images in seconds in multiple aspect
       | ratios.
       | 
       | Sounds pretty impressive, and the sample results at the bottom of
       | the page are visually excellent.
        
         | Tenoke wrote:
          | They have bots in their Discord for generating images based on
          | user prompts. Those randomize some settings, compare candidate
          | models, and are used for RLHF fine-tuning; that's the main
          | source of refining, which will continue even after release.
        
         | dragonwriter wrote:
         | There were, IIRC, three different post-0.9 candidate models in
         | parallel testing to become 1.0 recently.
        
       | [deleted]
        
       | latchkey wrote:
       | Amazing that their examples at the bottom of the page still show
       | really messed up human hands.
        
         | k12sosse wrote:
          | Hands being bad is a result of people one-shotting images; you
          | need to go repaint them afterwards, I've found. But it'll do
          | them great if you inpaint well.
        
         | HelloMcFly wrote:
         | I've personally observed that the drawing of hands in
         | Midjourney and SD has been getting incrementally better release
         | after release.
        
           | latchkey wrote:
           | That's why I'm amazed they picked images with totally borked
           | up hands to put on their press release. Truth in advertising!
        
             | k12sosse wrote:
              | If they cherry-picked the examples, people would get the
              | wrong idea. What I like about imagegen is that your results
              | are really only bound by your patience.
        
         | mynameisvlad wrote:
         | Some of them look surprisingly correct, so it looks like
         | there's been at least some progress on that front. I would
         | assume these are among the best examples of many, many attempts
         | so it still seems to be a ways off.
        
       | RobotToaster wrote:
       | Is this pre-censored like their other later models?
        
         | Remmy wrote:
         | Yes.
        
         | naillo wrote:
         | I've been playing with 0.9 and it can generate nude people so
         | it seems not.
        
         | AuryGlenz wrote:
          | No. From what I've gathered, it was trained on human anatomy,
          | but not straight-up porn. What they tried for 2.0/2.1 was way
          | too overdone, to the point where if I prompted "princess
          | Zelda," the generation would only look mildly like her.
          | Presumably they just didn't have many images of people in the
          | training data. 1.5 and SDXL both work fine on that front.
         | 
         | Fine tuners will quickly take it further, if that's what you're
         | after.
        
       | PeterStuer wrote:
        | Let's see whether derived models will suffer less from the
        | 'same face actor' response to every portrait prompt. It's not
        | trivial to keep photoreal models from producing lookalikes
        | without resorting to specific, typically celeb-based, finetunes.
        
       | andybak wrote:
       | In the meantime I've been getting good mileage out of Kandinsky -
       | anyone got a good sense of how they compare?
        
         | brucethemoose2 wrote:
         | This is the first I have heard of Kandinsky. Thanks for the
         | tip.
         | 
         | SDXL is a bigger model. There are some subjective comparison
         | posts with SDXL 0.9, but I can't see them since they are on X
         | :/
        
           | simbolit wrote:
            | That sounds so weird, it took me a minute to understand. Go
            | to nitter.net, which has no login requirement and no ads, but
            | all the same content that X (the site formerly known as
            | Twitter) has.
        
             | brucethemoose2 wrote:
             | Two Nitter instances failed to load it, unfortunately.
             | 
             | And yeah, X is weird to type out too.
        
       | tmaly wrote:
       | I will wait for the automatic1111 web ui version
        
         | Der_Einzige wrote:
         | It's already supported in automatic1111 (see recent updates),
         | and someone in the community will convert it to the
         | automatic1111 format within minutes/hours after it's released
         | on huggingface.
        
           | SV_BubbleTime wrote:
            | Sort of. IIRC (which may be unlikely) Auto1111 has the base
            | model in the text-to-image pane, but if you want to use the
            | refiner, that is a separate img2img step/tab. Which would be
            | a pain in the ass IMO.
            | 
            | The "Comfy" tool is node-based and you can string both
            | together, which is nice. Although if you aren't confident in
            | your images, you don't need the refiner for a bit.
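            | 
            | For reference, this is what the same base->refiner handoff
            | looks like in plain diffusers (a sketch of the documented
            | two-stage pipeline, untested by me):
            | 
            |     import torch
            |     from diffusers import (StableDiffusionXLPipeline,
            |                            StableDiffusionXLImg2ImgPipeline)
            | 
            |     base = StableDiffusionXLPipeline.from_pretrained(
            |         "stabilityai/stable-diffusion-xl-base-1.0",
            |         torch_dtype=torch.float16).to("cuda")
            |     refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
            |         "stabilityai/stable-diffusion-xl-refiner-1.0",
            |         torch_dtype=torch.float16).to("cuda")
            | 
            |     prompt = "a photo of an astronaut riding a horse"
            |     # base does the first ~80% of the denoising steps, then
            |     # hands its latents to the refiner for the rest
            |     latents = base(prompt=prompt, denoising_end=0.8,
            |                    output_type="latent").images
            |     image = refiner(prompt=prompt, denoising_start=0.8,
            |                     image=latents).images[0]
            |     image.save("out.png")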
        
             | brucethemoose2 wrote:
             | I think the diffusers UIs (like Invoke and VoltaML) are
             | going to implement the refiner soon since HF already has a
             | pipeline for it.
             | 
             | Comfy and A1111 are based around the original SD
             | StabilityAI code, but the implementation must be pretty
             | similar if they could add the base model so quickly.
        
               | dragonwriter wrote:
               | Work started with the SDXL 0.9 release, and for A1111 it
                | exited release candidate status in the last few days.
        
           | [deleted]
        
           | seydor wrote:
              | What's the memory usage of SDXL?
        
             | k12sosse wrote:
             | Runs great on my 10GB 3080 FTW3. ComfyUI moreso than
             | auto1111.
        
             | SV_BubbleTime wrote:
             | It depends greatly on your UI and the size you are
             | generating. There is no hard answer.
        
             | andrewmunsell wrote:
              | Been working fine on an 8 GB 3070 generating 1024x1024
              | images, using ComfyUI with the refiner.
        
         | brucethemoose2 wrote:
         | TBH I was hoping the community would take the opportunity to
         | move to the diffusers format...
         | 
         | You get deduplication, easy swapping of stuff like VAEs, faster
         | loading, and less ambiguity about what exactly is inside a
         | monolithic .safetensors file. And this all seems more important
         | since SDXL is so big, and split between two models anyway.
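          | 
          | E.g. swapping in a different VAE becomes one load-time argument
          | instead of surgery on a monolithic checkpoint (sketch; the
          | fp16-fix VAE repo is just an example):
          | 
          |     import torch
          |     from diffusers import AutoencoderKL, StableDiffusionXLPipeline
          | 
          |     # each component sits in its own subfolder in the diffusers
          |     # layout, so any of them can be overridden at load time
          |     vae = AutoencoderKL.from_pretrained(
          |         "madebyollin/sdxl-vae-fp16-fix",
          |         torch_dtype=torch.float16)
          |     pipe = StableDiffusionXLPipeline.from_pretrained(
          |         "stabilityai/stable-diffusion-xl-base-1.0",
          |         vae=vae, torch_dtype=torch.float16).to("cuda")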
        
       | mt3ck wrote:
       | Is there anything like this for the vector landscape?
       | 
        | This may just be due to the iterative denoising approach a lot of
        | these models take, but they only seem to work well when creating
        | raster-style images.
       | 
        | In my experience, when you ask them to create logos, shirt
        | designs, or illustrations, they tend to not work as well and
        | introduce a lot of artifacts, distortions, incorrect spellings,
        | etc.
        
         | orbital-decay wrote:
         | If you mean raster images that look like vector and contain
         | arbitrary text and shapes, controlnets/T2I adapters do work for
          | this. You could train your own custom controlnet for this, too
          | (though that requires some understanding).
         | 
         | As for directly generating vector images, there's nothing yet.
          | Your best bet is generating a vector-looking raster and
          | tracing it.
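          | 
          | The tracing step can be as simple as this (assumes potrace is
          | installed; filenames are hypothetical):
          | 
          |     import subprocess
          |     from PIL import Image
          | 
          |     # threshold the generated raster to 1-bit, then trace to SVG
          |     Image.open("generated.png").convert("1").save("generated.pbm")
          |     subprocess.run(["potrace", "-s", "generated.pbm",
          |                     "-o", "traced.svg"], check=True)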
        
         | cheald wrote:
         | A lot of people are having success by adding extra networks
          | (LoRA is the most common), which are trained on the type of
         | image you're looking for. It's still a raster image, of course,
         | but you can produce images which look very much like
         | rasterizations of vector images, which you can then translate
         | back into SVGs in Inkscape or similar.
        
       | jrflowers wrote:
       | I hope someday there's a version of this or something comparable
       | to it that can run on <8gb consumer hardware. The main selling
       | point of Stable Diffusion was its ability to run in that
       | environment.
        
         | naillo wrote:
         | You can do this if you select the
         | `pipe.enable_model_cpu_offload()` option. See this
         | https://huggingface.co/stabilityai/stable-diffusion-xl-base-...
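          | 
          | Something like this (sketch; with offload enabled you skip the
          | usual pipe.to("cuda")):
          | 
          |     import torch
          |     from diffusers import DiffusionPipeline
          | 
          |     pipe = DiffusionPipeline.from_pretrained(
          |         "stabilityai/stable-diffusion-xl-base-1.0",
          |         torch_dtype=torch.float16, variant="fp16",
          |         use_safetensors=True)
          |     # keep weights in system RAM and move each submodule to
          |     # the GPU only while it runs, trading speed for VRAM
          |     pipe.enable_model_cpu_offload()
          |     image = pipe(prompt="a red panda astronaut").images[0]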
        
         | minsc_and_boo wrote:
         | I feel like this is the greatest demand for LLMs at the moment
         | too.
         | 
         | It's hard to believe we're only 8 months into this industry, so
         | I imagine we'll start seeing smaller footprints soon.
        
           | simbolit wrote:
           | 8 months from what point?
           | 
            | GPT-3 is 36 months old. DALL-E is 28 months old. Even
            | Stable Diffusion is like 11 months old.
        
           | brucethemoose2 wrote:
            | We already do. MLC-LLM and llama.cpp have Vulkan/OpenCL/Metal
            | 3-bit implementations that can run LLaMA 7B (or maybe even
            | 13B?) in 8GB.
            | 
            | TBH devices just need more RAM for coherent output, though.
            | LLaMA 13B and 33B are so much "smarter" and more coherent
            | than 7B with 3-bit quant.
        
         | liuliu wrote:
          | SDXL 0.9 runs just fine on an 8GiB iPad Pro.
        
           | JeffeFawkes wrote:
           | Is this using Draw Things, or another app? Did you have to
           | quantize the model first?
        
             | liuliu wrote:
              | Yeah, Draw Things. It will be submitted as soon as the SDXL
              | v1.0 weights are available. A quantized model _should_ run
              | on iPhones (4GiB / 6GiB models), but we haven't done that
              | yet. So no, these are just typical FP16 weights on iPad.
        
               | JeffeFawkes wrote:
               | Thanks! I guess I'll stick to running it on my Macbook
               | for the time being until the quantized model gets
               | uploaded. What kind of performance are you seeing with
               | the FP16 weights on the iPad? I've run a few SD2.0-based
               | (unquantized) models on my 2020 iPad Pro but it seems
               | like it gets thermally throttled after a while.
        
               | liuliu wrote:
                | There will be more info upon release. SDXL v0.9 performs
                | generally the same as SD v1 / v2 at the same resolution.
                | But because you tend to run it at a larger resolution, it
                | might feel slower.
        
         | capybara_2020 wrote:
         | Give InvokeAI a try.
         | 
         | https://github.com/invoke-ai/InvokeAI
         | 
          | Edit: Specs required, from the documentation. You will need
          | one of the following:
          | 
          | - An NVIDIA-based graphics card with 4 GB or more of VRAM.
          |   6-8 GB of VRAM is highly recommended for rendering using
          |   the Stable Diffusion XL models.
          | - An Apple computer with an M1 chip.
          | - An AMD-based graphics card with 4 GB or more of VRAM
          |   (Linux only), 6-8 GB for XL rendering.
        
         | brucethemoose2 wrote:
         | There are several papers on 4/8 bit quantization, and a few
         | implementations for Vulkan/CUDA/ROCm compilation.
         | 
         | TBH the UIs people run for SD 1.5 are pretty unoptimized.
        
         | dragonwriter wrote:
         | > I hope someday there's a version of this or something
         | comparable to it that can run on <8gb consumer hardware.
         | 
         | Someday is today: from the official announcement: "SDXL 1.0
         | should work effectively on consumer GPUs with 8GB VRAM or
         | readily available cloud instances."
         | https://stability.ai/blog/stable-diffusion-sdxl-1-announceme...
        
       | jamesdwilson wrote:
        | Still can't draw hands correctly, it looks like.
        
         | naillo wrote:
          | I can't say for 1.0, but in 0.9 hands fairly often get rendered
          | perfectly. It's not always right, but it's way better than any
          | earlier release (where it's usually _consistently_ wrong).
        
       | mkaic wrote:
       | The official blog post from Stability is finally up and would
       | probably be a better URL to link to than the TechCrunch coverage:
       | https://stability.ai/blog/stable-diffusion-sdxl-1-announceme...
        
       | badwolf wrote:
       | Can SD draw hands finally?
        
         | GaggiX wrote:
         | You can already easily generate images with good looking hands
         | if you use a good custom model.
        
       | ShamelessC wrote:
       | I thought this release had been announced already? Or was that
       | not 1.0? Could have sworn they released an "XL" variant a little
       | while ago?
        
         | GaggiX wrote:
         | It was the research weights of the v0.9 model
        
       | amilios wrote:
        | I always wondered why the vision models don't seem to follow,
        | to the same extent, the whole "scale up as much as possible"
        | mantra that has defined the language models of the past few
        | years. Even 3.5 billion parameters is absolutely nothing
       | compared to the likes of GPT-3, 3.5, 4, or even the larger open-
       | source language models (e.g. LLaMA-65B). Is it just an
       | engineering challenge that no one has stepped up for yet? Is it a
       | matter of finding enough training data for the scaling up to make
       | sense?
        
         | brucethemoose2 wrote:
          | Diffusion is relatively compute-intensive compared to
          | transformer LLMs, and (in current implementations) doesn't
          | quantize as well.
          | 
          | A 70B-parameter model would be very slow and VRAM-hungry,
          | hence very expensive to run.
          | 
          | Also, image generation is more reliant on the tooling
          | surrounding the models than on pure text prompting. I don't
          | think even a 300B model would get things quite right through
          | text prompting alone.
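          | 
          | Back-of-envelope with made-up-but-plausible numbers (my
          | assumptions, not benchmarks):
          | 
          |     # an LLM pays ~2*N FLOPs per generated token; diffusion
          |     # pays a full model eval per denoising step
          |     llm_cost = 2 * 70e9 * 300        # 70B model, 300 tokens
          |     sdxl_eval = 6e12                 # guess: one UNet pass at 1024px
          |     diffusion_cost = sdxl_eval * 40  # 40 steps per image
          |     print(llm_cost, diffusion_cost)  # ~4.2e13 vs ~2.4e14
          | 
          | So one SDXL image already costs several 70B-LLM answers, and
          | scaling the UNet itself to 70B would multiply the per-step
          | cost on top of that.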
        
         | airgapstopgap wrote:
         | Diffusion is more parameter-efficient and you quickly saturate
         | the target fidelity, especially with some refiner cascade. It's
         | a solved problem. You do not need more than maybe 4B total.
         | Images are far more redundant than text.
         | 
          | In fact, most of the interesting papers since Imagen show that
          | you get more mileage out of scaling the text encoder part,
          | which is, of
         | course, a Transformer. This is what drives accuracy, text
         | rendering, compositionality, parsing edge cases. In SD 1.5 the
         | text encoder part (CLIP ViT-L/14) takes a measly 123M
         | parameters.[1] In Imagen, it was T5-XXL with 4.6B [2]. I am
         | interested in someone trying to use a _really strong_ encoder
         | baseline - maybe from a UL2-20B - to push this tactic further.
         | 
         | Seeing as you can throw out diffusion altogether and synthesize
         | images with transformers [3], there is no reason to prioritize
         | the diffusion part as such.
         | 
         | 1. https://forums.fast.ai/t/stable-diffusion-parameter-
         | budget-a...
         | 
         | 2. https://arxiv.org/abs/2205.11487
         | 
         | 3. https://arxiv.org/abs/2301.00704
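          | 
          | That 123M figure is easy to sanity-check (this loads just the
          | text tower of CLIP ViT-L/14):
          | 
          |     from transformers import CLIPTextModel
          | 
          |     m = CLIPTextModel.from_pretrained(
          |         "openai/clip-vit-large-patch14")
          |     print(sum(p.numel() for p in m.parameters()) / 1e6)  # ~123M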
        
           | ShamelessC wrote:
           | > Seeing as you can throw out diffusion altogether and
           | synthesize images with transformers [3]
           | 
           | That's actually how this whole party got started. DALL-E (the
           | first one) was a transformer model trained on image tokens
           | from an early VAE (and text tokens ofc). Researchers from
           | CompVis developed VQGAN in response. OpenAI showed improved
            | fidelity with guided diffusion on ImageNet (classes) and
            | subsequently DALL-E 2 using pixel-space diffusion and cascaded
            | upsampling. CompVis responded with Latent Diffusion, which
           | used diffusion in the latent space of some new VQGANs.
           | 
           | The paper you mention is interesting! They go back to the
            | DALL-E 1 method but train two VQGANs for upsampling and
            | increase the parameter count. This is faster, but only faster
            | than the originally reported benchmarks, which used inferior
            | sampling methods for their diffusion. I would be curious if
            | they can
           | beat some of the more recent ones which require as few as
           | 10-20 steps.
           | 
           | They also improve on FID/CLIP scores likely by using more
           | parameters. This might be a memory/time trade off though. I
           | would be curious how much more VRAM their model requires
           | compared to SD, MJ, Kandinsky.
           | 
           | The same goes for using T5-XXL. You'll win FID score contests
           | but no one will be able to run it without an A100 or TPU pod.
        
             | airgapstopgap wrote:
             | > The same goes for using T5-XXL
             | 
             | Is this still true in 2023? Sure, back in the dark ages it
              | seemed like an 860M model was just about the limit for a
              | regular consumer, but I don't see why we wouldn't be able
              | to use quantized encoders; and even 30B LLMs run okay on
              | MacBooks now.
        
           | Etherlord87 wrote:
           | > Images are far more redundant than text.
           | 
           | "A picture is worth a thousand words" - I wonder how
           | (in)accurate this popular saying turned out to be? :D
        
             | elpocko wrote:
             | I'm gonna go ahead and say in 2023, one detailed picture
             | (512x512) is worth about 30 words.
        
               | SketchySeaBeast wrote:
               | I guess that depends on the prompt.
        
               | k12sosse wrote:
               | Do negative prompt tokens count as words?
        
         | naillo wrote:
          | They often reference this paper as the motivation for that:
          | https://arxiv.org/pdf/2203.15556.pdf i.e., training with 10x
          | the data for 10x longer can yield models as good as GPT-3 but
          | with fewer weights (according to the paper), and the same
          | principle applies in vision.
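          | 
          | (The headline result there, for reference: with training
          | compute C ~ 6*N*D, the compute-optimal point scales parameters
          | N and tokens D together, N* ~ C^0.5 and D* ~ C^0.5 - roughly
          | 20 training tokens per parameter - which is how Chinchilla at
          | 70B params / 1.4T tokens matched the much larger Gopher.)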
        
       | lacker wrote:
       | I'm out of date on the image-generating side of AI, but I'd like
       | to check things out. What's the best tool for image generation
        | that's available on a website right now? I.e., not a model that I
       | have to run locally.
        
         | gfosco wrote:
         | [flagged]
        
         | PUSH_AX wrote:
         | Midjourney right? Although, discord isn't a website I guess.
        
         | iambateman wrote:
         | Probably Midjourney, but I like Dreamstudio better.
        
         | [deleted]
        
         | a5huynh wrote:
         | If you want to play around with Stable Diffusion XL:
         | https://clipdrop.co
        
           | dash2 wrote:
           | I just tried this and the UI is very nice (better than
           | dreamstudio), with nice tool integration, and image quality
           | is definitely going up with each new release. You can see a
           | few results at fb.com/onlyrolydog (along with a lot of other
           | canine nonsense).
        
           | esperent wrote:
            | Since Clipdrop has an API, is there any way to use it with
            | ComfyUI or Automatic1111 (or whatever it's called)?
        
         | the_lonely_road wrote:
         | https://playgroundai.com/create
         | 
          | Not affiliated in any way and not very involved in the space. I
          | just wanted to generate some images a few weeks ago and was
          | looking for somewhere I could do that for free. The link above
          | lets you do that, but I suggest you look up prompts because it's
          | a lot more involved than I expected.
        
           | aaarrm wrote:
           | Any particularly useful resources for looking into prompts?
        
             | the_lonely_road wrote:
             | I used this: https://learnwithnaseem.com/best-playground-
             | ai-prompts-for-a...
             | 
              | I just took the ones I liked and then deleted the words
              | that were specific to that image and left the ones that
              | were providing the style of the image. So for example, on
              | the first one I would delete "an cute kitsune in florest"
              | but would keep "colorfully fantast concept art". Then I
              | just added a comma-separated list of the features I wanted
              | in my picture. It took a lot more trial and error than I
              | thought, and adding sentences seemed to be worse than just
              | individual words. I am sure I barely scratched the surface
              | of interfacing with the tool correctly, but the space is
              | moving so fast it's not the kind of thing I want to spend
              | my time learning right now just to have that knowledge
              | become obsolete in 6 months.
        
             | brucethemoose2 wrote:
             | This AI Horde UI has, IMO, some really good templates and
             | suggestions:
             | 
             | https://tinybots.net/artbot
        
         | knicholes wrote:
         | I've found https://firefly.adobe.com/ pretty good at composing
         | images with multiple subjects. [disclaimer - I work at Adobe,
         | but not in the Creative Cloud]
         | 
          | But I wouldn't say it's the "best." Just trained on images that
          | weren't taken from non-consenting artists.
        
           | adzm wrote:
           | I'm actually a big fan of firefly. It has a different kind of
           | style from the others, presumably due to its training
           | dataset?
        
         | roborovskis wrote:
         | https://dreamstudio.ai/
        
           | esperent wrote:
           | What models does dreamstudio use? I couldn't see how to view
           | them without logging in.
        
       | vouaobrasil wrote:
       | This explosion of AI-generated imagery will result in an
        | explosion of millions of fake images, obviously. Perhaps in the
       | short-term this is fun, but in the long-term, we will lose a bit
       | more scarcity, which is not that great in my opinion.
       | 
        | Isn't the best part of a meal eating after you've not had
        | anything to eat for a while? Isn't the best part of a kiss that
        | you've quenched the pain of missing your partner?
       | 
       | The best part of art is that you haven't seen anything good in a
       | while?
       | 
       | Scarcity is an underappreciated gift to us, and the relative
       | scarcity per capita is in a sense what drives us to connect with
        | other people, so that we may be privileged to witness the
       | occasional spark of creativity from a person, which in turn tells
       | us about that person.
       | 
       | Although that sort of viewpoint has been declining for some time
       | due to the intensely capitalistic squeezing of every sort of
       | human endeavor, AI brings this to a whole new level.
       | 
       | I think if those making this software thought a bit about this,
       | they might second-guess whether it is truly right to release it.
       | Just a thought.
        
         | soligern wrote:
          | Enforcing artificial scarcity is idiotic and counter-
          | progressive. There will be other things that remain uncommon,
          | which humans will continue to appreciate. This is what human
          | progress looks like. Imagine if someone had said this when
          | agriculture started up: "The great thing about fruits and
          | vegetables is that they taste so sweet the few times we find
          | them. We shouldn't grow them in bulk."
        
         | pzo wrote:
          | A lot of downvotes. I can relate to it a little bit. During the
          | beginning of COVID I was in SE Asia at an Airbnb that didn't
          | have a laundry machine - you generally don't need one in SE
          | Asia because there are so many cheap per-kg laundry services
          | around. After having to hand wash my clothes for the first
          | month, I really appreciated having a laundry machine when I
          | moved to another Airbnb that had one - you take some things
          | for granted.
          | 
          | But no, I wouldn't want to hand wash my laundry more often.
          | For probably the same reason, I still prefer using a lighter
          | over a flint when having a BBQ.
        
         | [deleted]
        
         | dwallin wrote:
         | I think you have this backwards, Capitalism loves scarcity.
         | Scarcity is what allows for supply and demand curves and
         | profit-making opportunities, even better if you can control the
         | scarcity. Capitalist entities are constantly attempting to use
         | laws, technology, and market power to add scarcity to places
         | where it didn't previously exist.
        
           | SV_BubbleTime wrote:
           | I'm pretty sure that lots of things end up being scarce in
           | totalitarian forms of government... food, for one.
        
         | Gabriel_Martin wrote:
         | Seeing the "less art needs to exist" perspective is certainly a
         | first time for me on this topic.
        
         | naillo wrote:
          | The same could have been said when Photoshop, or CGI tools like
          | Blender, replaced hand sculpting and hand painting, but I think
          | it hasn't been a net negative across the board (rather the
          | opposite, I think).
        
         | RcouF1uZ4gsC wrote:
         | I want to appreciate your comment, but I can't.
         | 
         | Can you please chisel it on stone tablets for me?
         | 
         | That will really help me appreciate it.
        
       | naillo wrote:
        | Stability AI is awesome, I love them.
        
       | freediver wrote:
       | I am completely uninformed in this space.
       | 
        | Would someone be kind enough to explain what the current state of
        | the art in image generation is (how does this compare to
        | Midjourney and others)?
       | 
       | How do open source models stack up?
       | 
       | Also what are the most common use cases for image generation?
        
         | [deleted]
        
         | liuliu wrote:
          | SDXL 0.9 should be the state-of-the-art image generation model
          | (in the open). It generates at a large 1024x1024 resolution,
          | with high coherency and a good selection of styles out of the
          | box. It also has reasonable text understanding compared to
          | other models.
          | 
          | That being said, based on the configurations of these models,
          | we are far from saturating what the best model can do. The
          | problem is, FID is a terrible metric for evaluating these
          | models, so as with LLMs, we are a bit clueless about how to
          | evaluate them now.
        
           | GaggiX wrote:
            | Why do you think FID is a terrible metric? What don't you
            | like in particular about it?
        
             | liuliu wrote:
              | I overspoke. FID is a fine metric for observing the training
              | progress of your own model, and it correlates well with
              | some coherency issues of generative models. But for cross-
              | model comparisons, especially between models that generally
              | do well under FID, it is not discriminative enough to
              | separate better from good.
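              | 
              | (For reference, FID fits a Gaussian to the Inception
              | features of each image set and computes
              | 
              |     FID = ||mu_r - mu_g||^2
              |           + Tr(C_r + C_g - 2*(C_r*C_g)^(1/2))
              | 
              | so once two models both match the reference means and
              | covariances well, the metric has little left to
              | distinguish them by.)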
        
         | sdflhasjd wrote:
         | For bland stock photos and other "general-purpose" image
         | generation, DALLE-2/Bing/Adobe etc are... the okayest. SD (with
         | just standard model weights) is particularly weak here because
         | of the small model size.
         | 
         | If you want to get arty, then state of the art for out-of-the-
         | box typing in a prompt and clicking "generate" is probably
         | MidJourney.
         | 
         | But if you're willing to spend some more time playing around
         | with the open-source tooling, community finetunes, model
         | augmentations (LyCORIS, etc), SD is probably going to get you
         | the farthest.
         | 
         | > Also what are the most common use cases for image generation?
         | 
         | By sheer number of image generations? Take a guess...
        
         | orbital-decay wrote:
         | SDXL is in roughly the same ballpark as MJ 5 quality-wise, but
         | the main value is in the array of tooling immediately available
         | for it, and the license. You can fine-tune it on your own
         | pictures, use higher order input (not just text), and daisy-
         | chain various non-imagegen models and algorithms
         | (object/feature segmentation, depth detection, processing,
         | subject control etc) to produce complex images, either
         | procedural or one-off. It's all experimental and very
         | improvised, but is starting to look like a very technical CGI
         | field separate from the classic 3D CGI.
        
         | brucethemoose2 wrote:
         | Midjourney may be better for plain prompts, but Stable
         | Diffusion is SOTA because of the tooling and finetuning
         | surrounding it.
        
           | hospitalJail wrote:
            | Idk, Midjourney ignores prompts.
            | 
            | For the longest time I thought it was Google-imaging things
            | and doing some Photoshop to make things look like Pixar,
            | because it was so bad.
        
       ___________________________________________________________________
       (page generated 2023-07-26 23:01 UTC)