[HN Gopher] Comparing Adobe Firefly, Dalle-2, and OpenJourney
___________________________________________________________________
Comparing Adobe Firefly, Dalle-2, and OpenJourney
Author : muhammadusman
Score : 123 points
Date : 2023-06-20 17:16 UTC (5 hours ago)
(HTM) web link (blog.usmanity.com)
(TXT) w3m dump (blog.usmanity.com)
| og_kalu wrote:
 | Should be compared using Bing Image Creator (a better version of
 | DALL-E) rather than the Dalle-2 site.
| pdntspa wrote:
 | Why didn't this person include Stable Diffusion?
| qiller wrote:
 | OpenJourney is a fine-tuned SD model.
| soligern wrote:
| [flagged]
| dvt wrote:
| Adobe Firefly is actually extremely competent, especially since
| it doesn't use copyrighted images in its training set. Using
| MidJourney (which is fantastic) commercially will be a quagmire
| for the unlucky company that draws a lawsuit.
| personjerry wrote:
 | The analysis at the end seems to be lacking. From my perspective,
 | Photoshop and Midjourney come out on top in terms of aesthetics
 | and accuracy, with kouteiheika's Stable Diffusion results[0] a
 | close second. Dall-E falls far behind, which makes sense
 | considering all the work that's gone into the other systems to
 | fine-tune and build ecosystems around them.
|
| [0]: https://news.ycombinator.com/item?id=36408744
| FanaHOVA wrote:
 | I had done a similar comparison a couple of months back, but used
 | Lexica instead of DALL-E.
 |
 | It seems clear to me that Midjourney has by far the best "vibes"
 | understanding. Most models get the items right but not the
 | lighting. Firefly seems focused on realism, which makes sense for
 | a photography audience.
|
| https://twitter.com/fanahova/status/1639325389955952640?s=46...
| Skywalker13 wrote:
| And here with BlueWillow https://www.bluewillow.ai/
|
| 1:
| https://media.discordapp.net/attachments/1060989219432054835...
|
| 2:
| https://media.discordapp.net/attachments/1060989219432054835...
|
| 3:
| https://media.discordapp.net/attachments/1060989219432054835...
| kj_setup wrote:
| Seems a lot better than some of the ones in the post
| snowe2010 wrote:
 | Not sure this is a good comparison. Midjourney likes much shorter
 | prompts, and honestly they're all absolutely terrible for
 | anything that isn't photo-based. E.g., ask one to generate a word
 | bubble of the most common programming languages and it will fail
 | every time, no matter what you try. I love it for photo work, but
 | from a Photoshop tool you'd expect the ability to do other things
 | as well.
| capybara_2020 wrote:
 | Curious, since Midjourney does great art and cartoon/comic styles
 | too, not just realistic images.
 |
 | Most image AI tools are terrible with words.
 |
 | What images did you try generating with Midjourney?
| jw1224 wrote:
| That's not a fair comparison, as Midjourney is outstanding at a
| wide range of styles beyond photography.
|
 | Generating a "word bubble" is going to look terrible in every
 | major diffusion model. Producing cohesive words and writing in
 | _image_ models is still a highly specialised capability.
| abeppu wrote:
 | Is it intentional that each of the prompts is given twice in that
 | blockquote? The repetitions are joined without a space, so e.g.
 | in the 2nd example the word "centeredvalley" appears where the
 | last word of the first copy runs into the first word of the
 | second. Does that reflect what was actually given to the
 | engines, or was it a copy-paste issue introduced while putting
 | together the article? I could imagine that non-words like
 | "cornera" in the last example could throw things off.
| whatscooking wrote:
| I like how simple Firefly's images are, like something you'd want
| to work with in Photoshop. Dalle-2 looks terrible. Midjourney is
| still my favorite.
| chankstein38 wrote:
 | As someone who has spent hours playing with it in Photoshop
 | (Beta), I can say Firefly is actually pretty damned cool!
| rgbrgb wrote:
 | For those curious, I tried the same prompts with Kandinsky 2.1
 | [0]. In my experience it blends the conceptual understanding of
 | DALL-E with the higher-quality image generation of Stable
 | Diffusion. Like Midjourney, though, it injects its own style and
 | lets you get "satisfying" results from short prompts.
 |
 | The flaw with these comparisons is that you really shouldn't use
 | the same prompt with different generators. To get the best
 | results you have to play with the prompts and iterate a lot to
 | explore the latent space and find what you're looking for. The
 | first, super-long prompt looks like it's tuned for Stable
 | Diffusion, for instance. Different generators also have different
 | syntax (e.g. with Stable Diffusion you can surround a phrase with
 | parens to give it extra emphasis; see the example below).
|
| [0]: https://iterate.world/s/clj4n19u20000jv08iqygiaqw
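 | To illustrate the syntax differences: in AUTOMATIC1111-style
 | Stable Diffusion prompts (a hypothetical example), parentheses
 | multiply a phrase's attention weight:
 |
 |     a cabin in the woods, ((golden hour)), (volumetric fog:1.3)
 |
 | Each bare pair of parens boosts the enclosed phrase by roughly
 | 1.1x, and the colon form sets an explicit weight; Midjourney and
 | DALL-E would simply read those characters as literal text.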
| SoKamil wrote:
 | Can we appreciate how well that lightbox works on this site in a
 | mobile browser, especially Safari? The gestures are smooth and
 | don't trigger any quirks like the unintended pull-to-refresh
 | gesture.
| muhammadusman wrote:
| Author here: I updated the post to include the generated results
| from Stable Diffusion and Midjourney (thanks to kouteiheika and
| mdorazio).
| cainxinth wrote:
 | Amazing how quickly Dalle-2 went from among the best image
 | generators to among the worst.
| capybara_2020 wrote:
| It might be a case of them seeing way more potential with LLMs
| compared to image generation.
| gwern wrote:
 | The stagnation has been very curious. They are part of a large &
 | generally competent org which has otherwise remained far ahead of
 | the competition with products like GPT-4. Except... for DALL-E 2,
 | which did not just stagnate for over a year (on top of its
 | bizarre blind spots, like garbage anime generation) but actually
 | seemed to get _worse_. They have an experimental model of some
 | sort that some people have access to, but even there, it's
 | nothing to write home about compared to the best models like
 | Parti or eDiff-I.
| sebzim4500 wrote:
| I think they just don't care very much about DALL-E.
|
 | Which is fair enough: when you are a (relatively) small
 | company competing with the likes of Google and Meta, you
 | really need to focus.
| og_kalu wrote:
 | Nobody is able to use Parti or eDiff-I. Compared to models you
 | can actually use, the experimental Dall-e / Bing Image Creator
 | is second only to Midjourney in my experience.
| Sharlin wrote:
| I haven't tried those two, but I'd be surprised if they
| were better than Stable Diffusion. Which is free, runnable
| (and trainable!) locally, and already has a large ecosystem
| of frontends, tweaks and customized models.
| og_kalu wrote:
 | Believe me, I know all about SD's possible customization
 | and tweaks.
 |
 | I would still easily put both ahead of the base models.
 | You won't match the quality of those models without
 | fine-tuning, and when you do fine-tune, it'll be for a
 | particular aesthetic and you won't match them in terms of
 | prompt understanding and adherence.
| TeMPOraL wrote:
| I suspect that they consider txt2img to be more of a
| curiosity now. Sure, it's transformative; it's going to upend
| whole markets (and make some people a lot of money in the
| process) - however, it's _just_ producing images. Contrast
 | with LLMs, which have already proven to be generally
 | applicable in a great many domains, and which, if you squint,
 | are probably capturing the basic mechanisms of _thinking_. OpenAI
| lost the lead in txt2img, but GPT-4 is still way ahead of
| every other LLM. It makes sense for them to focus pretty much
| 100% on that.
| og_kalu wrote:
 | Dall-e experimental (Bing Image Creator) is very good. I only
 | prefer Midjourney to it.
| hathym wrote:
 | ChatGPT next...
| ralusek wrote:
| Dall-E 2 was almost immediately displaced by MidJourney.
| Nothing comes close to even GPT 3.5 at the moment.
| sebzim4500 wrote:
| Anthropic's models are better than GPT 3.5 in my opinion.
| denverllc wrote:
| Why innovate when you can regulate?
| flangola7 wrote:
| https://time.com/6288245/openai-eu-lobbying-ai-act/
| Applejinx wrote:
| I don't know, what I saw in there (particularly with the
| haunted house) was a far broader POTENTIAL RANGE of outputs. I
| get that they were cheesier outputs, but it seems to me that
| those outputs were just as capable of coming from the other
| 'AIs'... if you let them.
|
| It's like each of these has a hidden giant pile of negative
| prompts, or additional positive prompts, that greatly narrow
| down the range of output. There are contexts where the Dall-E
| 'spoopy haunted house ooooo!' imagery would be exactly right...
| like 'show me halloweeny stock art'.
|
| That haunted house prompt didn't explicitly SAY 'oh, also make
| it look like it's a photo out of a movie and make it look
| fantastic'. But something in the more 'competitive' AIs knew to
 | go for that. So if you wanted to go for the spoopy, cheesy
 | 'collective unconscious' imagery, would you have to force the
 | more sophisticated AIs to go against their hidden requirements?
|
 | Mind you, if you added 'halloween postcard from out of a cheesy
 | old store' and suddenly the other ones were doing that vibe six
 | times better, I'd immediately concede they were in fact that
 | much smarter. I've seen that before, too, in different Stable
 | Diffusion models. I'm just saying that the consistency of
 | output in the 'smarter' ones can also represent a thumb on the
 | scale.
|
| They've got to compete by looking sophisticated, so the 'Greg
| Rutkowskification' effect will kick in: you show off by picking
| a flashy style to depict rather than going for something
| equally valid, but less commercial.
| jsnell wrote:
 | It's not just about the haunted house. Just look closely at the
 | DALLE-2 living room pictures. None of it makes any
 | sense. And we're not even talking about subtle details: all of
 | the first three pictures have a central object that the eye
 | should be drawn to which is just a total mess. (The table
 | being subsumed by a bunch of melting brown chairs in
 | the first one, the I-don't-even-know-what that seems to be
 | the second picture, and the whatever-this-is on the blue
 | carpet.)
| kouteiheika wrote:
 | For reference, here's what you can get with a properly tweaked
 | Stable Diffusion, all running locally on my PC. It can be set up
 | on almost any PC with a mid-range GPU in a few minutes if you
 | know what you're doing (a minimal example follows below). I
 | didn't do any cherry-picking; this is the first thing it
 | generated, 4 images per prompt.
|
| 1st prompt: https://i.postimg.cc/T3nZ9bQy/1st.png
|
| 2nd prompt: https://i.postimg.cc/XNFm3dSs/2nd.png
|
| 3rd prompt: https://i.postimg.cc/c1bCyqWR/3rd.png
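 | For anyone wanting to reproduce a basic local setup, a minimal
 | text-to-image run with the Hugging Face diffusers library looks
 | roughly like this (a sketch, assuming a CUDA GPU; the model ID
 | here is the stock SD 1.5 repo, not the tuned checkpoint used for
 | the images above):
 |
 |     import torch
 |     from diffusers import StableDiffusionPipeline
 |
 |     # Stock SD 1.5; swap in a community checkpoint for better results.
 |     pipe = StableDiffusionPipeline.from_pretrained(
 |         "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
 |     ).to("cuda")
 |
 |     images = pipe(
 |         "a cozy living room, warm lighting, highly detailed",
 |         num_images_per_prompt=4,  # 4 images per prompt, as above
 |         num_inference_steps=30,
 |     ).images
 |     for i, img in enumerate(images):
 |         img.save(f"out_{i}.png")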
| senko wrote:
| I am sure you're right, but "if you know what you're doing"
| does a lot of heavy lifting here.
|
| We could just as easily say "hosting your own email can be set
| up in a few minutes if you know what you're doing". I could do
| that, but I couldn't get local SD to generate comparable images
| if my life depended on it.
| caseyf wrote:
 | If you have an Apple device, there is a free GUI for Stable
 | Diffusion called "Draw Things". It is nice and it just works:
 | https://apps.apple.com/us/app/6444050820
 |
 | Screenshot of the options interface:
 | https://stash.cass.xyz/drawthings-1687292611.png
| muhammadusman wrote:
 | Thanks for doing this. I'd like to include these in the
 | blog post as well. Can I use them and credit you for them?
 | (Let me know what you'd like linked.)
| kouteiheika wrote:
| Sure. No need to credit me.
| muhammadusman wrote:
| thanks, updated the post with your results as well :)
| [deleted]
| ewjt wrote:
| Can you elaborate on "properly tweaked"? When I use one of the
| Stable Diffusion and AUTOMATIC1111 templates on runpod.io, the
| results are absolutely worthless.
|
| This is using some of the popular prompts you can find on sites
| like prompthero that show amazing examples.
|
 | It's been a serious expectation-vs.-reality disappointment for
 | me, so I just pay the Midjourney or DALL-E fees.
| capybara_2020 wrote:
 | First off, are you using a custom model or the default SD
 | model? The default model is not the greatest. Have you tried
 | ControlNet?
 |
 | But yes, SD can be a bit of a pain to use. Think of it like
 | this: SD = Linux, Midjourney = Windows/macOS. SD is more
 | powerful and user-controllable, but that also means it has a
 | steeper learning curve.
| orbital-decay wrote:
 | Are you using txt2img with the vanilla model? SD's actual
 | value is in its large array of higher-order input methods and
 | tooling; as a tradeoff, it requires more knowledge. Similarly
 | to 3D CGI, it's a highly technical area. You don't just enter
 | a prompt.
 |
 | You can fine-tune it on your own material, or choose one of
 | the hundreds of public fine-tuned models. You can guide it
 | precisely with a sketch, or by extracting a pose from a
 | photo using ControlNets or another method (sketched below).
 | You can influence the colors. You can explicitly separate
 | prompt parts so the tokens don't leak into each other. You can
 | use it as a photobashing tool with a plugin for popular image
 | editing software. Things like ComfyUI enable extremely
 | complicated pipelines as well, and so on.
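 | As a rough sketch of that pose-extraction workflow, using the
 | diffusers and controlnet_aux libraries (the model IDs are the
 | public ControlNet releases; the reference photo path and prompt
 | are placeholders):
 |
 |     import torch
 |     from controlnet_aux import OpenposeDetector
 |     from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
 |     from diffusers.utils import load_image
 |
 |     # Extract a pose skeleton from a reference photo.
 |     openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
 |     pose = openpose(load_image("reference_photo.png"))
 |
 |     # Condition generation on that pose via a ControlNet.
 |     controlnet = ControlNetModel.from_pretrained(
 |         "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
 |     )
 |     pipe = StableDiffusionControlNetPipeline.from_pretrained(
 |         "runwayml/stable-diffusion-v1-5",
 |         controlnet=controlnet,
 |         torch_dtype=torch.float16,
 |     ).to("cuda")
 |
 |     image = pipe("an astronaut dancing on the moon", image=pose).images[0]
 |     image.save("posed.png")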
| nomand wrote:
| Is there a coherent resource (not a scattered 'just google
| it' series of guides from all over the place) that
| encapsulates some of the concepts and workflows you're
| describing? What would be the best learning site/resource
| for arriving at understanding how to integrate and
| manipulate SD with precision like that? Thanks
| kouteiheika wrote:
| > What would be the best learning site/resource for
| arriving at understanding how to integrate and manipulate
| SD with precision like that?
|
| Honestly? Probably YouTube tutorials.
| TeMPOraL wrote:
| Jaysus.
|
| I'm going to sound like an entitled whiny old guy
 | shouting at clouds, but - what the hell - with all the
 | knowledge either locked up and churned on Discord, or
 | released in the form of YouTube videos with no transcripts and
 | extremely low content density - how is anyone with a job
| supposed to keep up with this? Or is that a new form of
| gatekeeping - if you can't afford to burn a lot of time
| and attention as if in some kind of Proof of Work scheme,
| you're not allowed to play with the newest toys?
|
| I mean, Discord I can sort of get - chit-chatting and
| shitposting is easier than writing articles or
| maintaining wikis, and it kind of grows organically from
| there. But YouTube? Surely making a video takes 10-100x
| the effort and cost, compared to writing an article with
| some screenshots, while also being 10x more costly to
| consume (in terms of wasted time and strained attention).
| How does _that_ even work?
| bavell wrote:
 | I've been playing with SD for a few months now and have
 | only watched 20-30 minutes of YT videos about it. There are
 | only a few worth spending any time watching, and they're on
 | specific workflows or techniques.
 |
 | Best just to dive in if you're interested, IMO. Otherwise
 | you'll get lost in all the new jargon and ideas. A great
 | place to start is the A1111 repo; lots of community
 | resources available and batteries included.
| orbital-decay wrote:
| How does anyone keep up with anything? It's a visual
| thing. A lot of people are learning drawing, modeling,
| animation etc in the exact same way - by watching YouTube
| (a bit) and experimenting (a lot).
| TeMPOraL wrote:
 | Picking images from generated sets is a visual thing.
 | Tweaking ControlNet might be too (IDK, I never got a
 | chance to use it - partly because of what I'm whining
 | about here). However, writing prompts, fine-tuning
 | models, assembling pipelines, renting GPUs, figuring out
 | which software to use for what, where to get the weights,
 | etc. - _none_ of this is visual. It's pretty much
 | programming and devops.
|
| I can't see how covering this on YouTube, instead of (vs.
| in addition to) writing text + some screenshots and
| diagrams, makes any kind of sense.
| kouteiheika wrote:
| I mostly agree, but in this case it can be genuinely
| useful to actually _see_ the process of someone using the
| tool effectively.
| sorenjan wrote:
 | Take a moment and go scroll through the examples at
 | civitai.com. Do most of them strike you as something made by
 | people with jobs? Most of them are pretty juvenile, with
 | pretty women and various anime girls.
| sebzim4500 wrote:
| Are you under the impression that people with jobs don't
| like pretty women and anime girls?
| kaitai wrote:
 | The operative word here is "people": the set "people with
 | jobs" contains a far higher fraction of folks who like
 | attractive men than is represented here.
| sorenjan wrote:
| Of course not, but it looks like a teenage boy's room.
| bavell wrote:
| ComfyUI is a nice complement to A1111, the node-based
| editor is great for prototyping and saving workflows.
| kouteiheika wrote:
| > Can you elaborate on "properly tweaked"?
|
| In a nutshell:
|
 | 1. Use a good checkpoint. Vanilla Stable Diffusion is
 | relatively bad. There are plenty of good ones on civitai.
 | Here's mine: https://civitai.com/models/94176
 |
 | 2. Use a good negative prompt with good textual inversions
 | (e.g. "ng_deepnegative_v1_75t", "verybadimagenegative_v1.3",
 | etc.; you can download those from civitai too). Even if you
 | have a good checkpoint, this is essential for good results.
 |
 | 3. Use a better sampling method instead of the default one
 | (e.g. I like to use "DPM++ SDE Karras").
 |
 | There are more tricks to get even better output (e.g.
 | ControlNet is amazing), but these are the basics; a rough code
 | sketch follows below.
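 | Roughly what those three steps look like in code, with the
 | diffusers library (a sketch; the checkpoint and embedding paths
 | are placeholders, and the checkpoint is assumed to have been
 | converted to diffusers format):
 |
 |     import torch
 |     from diffusers import DPMSolverSDEScheduler, StableDiffusionPipeline
 |
 |     # 1. A good checkpoint instead of vanilla SD (placeholder path).
 |     pipe = StableDiffusionPipeline.from_pretrained(
 |         "path/to/custom-checkpoint", torch_dtype=torch.float16
 |     ).to("cuda")
 |
 |     # 2. A negative textual-inversion embedding, referenced by its
 |     #    token inside the negative prompt (placeholder path).
 |     pipe.load_textual_inversion(
 |         "path/to/ng_deepnegative_v1_75t.pt", token="ng_deepnegative_v1_75t"
 |     )
 |
 |     # 3. A better sampler: DPM++ SDE with Karras sigmas.
 |     pipe.scheduler = DPMSolverSDEScheduler.from_config(
 |         pipe.scheduler.config, use_karras_sigmas=True
 |     )
 |
 |     image = pipe(
 |         prompt="a haunted house on a hill, cinematic lighting",
 |         negative_prompt="ng_deepnegative_v1_75t, lowres, blurry",
 |         num_inference_steps=30,
 |     ).images[0]
 |     image.save("tweaked.png")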
| renewiltord wrote:
| Thank you. I assume there's some community somewhere where
| people discuss this stuff. Do you know where that is? Or
| did you just learn this from disparate sources?
| kouteiheika wrote:
| > I assume there's some community somewhere where people
| discuss this stuff. Do you know where that is? Or did you
| just learn this from disparate sources?
|
| I learned this mostly by experimenting + browsing civitai
| and seeing what works + googling as I go + watching a few
| tutorials on YouTube (e.g. inpainting or controlnet can
| be tricky as there are a lot of options and it's not
| really obvious how/when to use them, so it's nice to
| actually watch someone else use them effectively).
|
| I don't really have any particular place I could
| recommend to discuss this stuff, but I suppose
| /r/StableDiffusion/ on Reddit is decent.
| bavell wrote:
 | There's a pretty good Reddit community, and lots of (N/SFW)
 | models and content on CivitAI. It took me a weekend to get set
 | up and generating images. I've been getting good results on my
 | AMD 6750XT with A1111 (vladmandic's fork).
| Lerc wrote:
 | What kind of (and how much) data did you use to train your
 | checkpoint?
 |
 | I'd like to have a go at making one myself, targeted towards
 | single objects (be it a car, spaceship, dinner plate, apple,
 | octopus, etc.). Most checkpoints lean very heavily towards
 | people and portraits.
| og_kalu wrote:
 | You're not going to get even close to Midjourney or even Bing
 | quality on SD without fine-tuning. It's that simple. And when
 | you do fine-tune, it will be restricted to that aesthetic, and
 | you won't get the same prompt understanding or adherence.
 |
 | For all the promise of control and customization SD boasts,
 | Midjourney beats it hands down in sheer quality. There's a
 | reason something like 99% of AI art comic creators stick to
 | Midjourney despite the control handicap.
| SV_BubbleTime wrote:
 | You load a model and have 6 sliders instead of one... it's
 | not exactly "fine-tuning".
 |
 | If you want the power, it's there. But nearly bone-stock SD
 | in Auto1111 is going to get to any of these examples
 | easily.
 |
 | Show me the civitai equivalent for MJ or Dalle-2. It doesn't
 | exist.
| og_kalu wrote:
 | > You load a model and have 6 sliders instead of one...
 | it's not exactly "fine-tuning".
 |
 | OK...? Read what I wrote carefully. Your 6 sliders won't
 | produce better images than Midjourney for your prompt on
 | the base SD model.
| chankstein38 wrote:
| I feel like people shouldn't talk in definitives if their
| message is just going to demonstrate they have no idea what
| they're talking about.
| og_kalu wrote:
 | I know what I'm talking about, lol. I tuned a custom SD
 | model that's downloaded thousands of times a month. I'm
 | speaking from experience more than anything. I don't know
 | why some SD users get so defensive.
| orbital-decay wrote:
 | Yet you are posting this in a thread where GP provided
 | actual examples of the opposite. Look for another comment
 | above/below: there are MJ-generated samples which are
 | comparable to, but also less coherent than, the results from
 | a much smaller SD model. And in MJ's case the hallucinations
 | cannot be fixed. MJ is good, but it isn't magic; it just
 | provides quick results with little experience required.
 | Prompt understanding is still poor, and will stay poor
 | until it's paired with a good LLM.
 |
 | None of the existing models gives actually passable
 | production-quality results, be it MJ or SD or whatever
 | else. It will be quite some time until they get out of the
 | uncanny valley.
|
| > There's a reason like 99% of ai art comic creators stick
| to Midjourney
|
| They aren't. MJ is mostly used by people without
| experience, think a journalist who needs a picture for an
| article. Which is great and it's what makes them good
| money.
|
| As a matter of fact (I work with artists), for all the
| surface-visible hate AI art gets in the artist community,
| many actual artists are using it more and more to automate
| certain mundane parts of their job to save time, and this
| is _not_ MJ or Dall-E.
| Miraste wrote:
| There's a distinction to be made here. Everything that
| makes SD a powerful tool is the result of being open
| source. The actual models are significantly worse than
| Midjourney. If an MJ level model had the tooling SD does
| it would produce far better results.
| bavell wrote:
| > If an MJ level model had the tooling SD does it would
| produce far better results
|
| And vice versa, which is the exciting part to me - only a
| matter of time!
| whywhywhywhy wrote:
| Midjourney output all has the same look to it.
|
 | If you're OK with basic aesthetics it'll work, but if you
 | want something a bit less cringe, or that will stand out
 | in marketing, it won't cut it.
| Miraste wrote:
| It only has the same look if it's not given any style
| keywords. I've been impressed with the output diversity
| once it's told what to do. It can handle a wide range of
| art styles.
| og_kalu wrote:
| >Yet you are posting this in a thread where GP provided
| actual examples of the opposite.
|
 | Opposite of what? OP posts results from a tuned model.
| orbital-decay wrote:
| Opposite of this:
|
| _> For all the promise of control and customization SD
| boasts, Midjourney beats it hands down in sheer quality._
|
| The results are comparable, but MJ in this comment
| https://news.ycombinator.com/item?id=36409043
 | hallucinates more (look at the roofs in the second
 | picture). And that cannot be fixed, except perhaps by an
 | upscale making it a bit more coherent. Until MJ obtains
 | better tooling (which it might in the next iteration), it
 | won't be as powerful. I'm not even starting on complex
 | compositions, which it simply cannot do.
|
| _> OP posts results from a tuned model._
|
 | Yes, which is the first thing you should do with SD, as
 | it's a much smaller and less capable model.
| troupo wrote:
| "Just" use a "properly tweaked" something.
| jfdi wrote:
 | Nice! Would you mind sharing which Stable Diffusion model you
 | used and where you obtained it?
| kouteiheika wrote:
| I'm using my own custom trained model.
|
| Here, I've uploaded it to civitai:
| https://civitai.com/models/94176
|
| There are plenty of other good models too though.
| bavell wrote:
 | Any tips or guides you followed for training your custom
 | model? I've done a few LoRAs and textual inversions but
 | haven't gotten to my own models yet. Your results look great
 | and I'd love a little insight into how you arrived there and
 | what methods/tools you used.
| bluetidepro wrote:
 | Do you have any good tutorial links for setting up Stable
 | Diffusion locally?
| mdorazio wrote:
| Since the author didn't have access to Midjourney, here's the
| first two prompts in MJ with default settings (not upscaled):
|
| https://imgur.com/a/siQG06O
|
| https://imgur.com/a/vp2oOHu
| muhammadusman wrote:
 | Thanks for sharing this. Do you mind if I include it in the
 | post? I will credit you, of course (let me know what you'd
 | like linked to).
 |
 | Update: I've edited the post to include these results as well.
| mdorazio wrote:
| Go for it! Happy to help. Let me know if you want upscales.
| gl-prod wrote:
| something something AI generated cannot be copyrighted [/s]
| poniko wrote:
 | Midjourney is still so far ahead that it's no competition. I
 | did a lot of testing today, and Firefly generated so many errors
 | with fingers and the like; I haven't seen that since the
 | original Stability release. Does anyone know if the web Firefly
 | and the Photoshop version are the same model?
| jsheard wrote:
 | It's worth noting the difference in how the training material
 | is sourced, though: Midjourney uses indiscriminate web
 | scrapes, while Firefly takes the conservative approach of
 | only using images that Adobe holds a license for. Midjourney
 | has the Sword of Damocles hanging over its head: depending
 | on how legal precedent shakes out, its output might end up
 | being too tainted for commercial purposes. Adobe is betting
 | on being the safe alternative during the period of uncertainty,
 | and in case the hammer does come down on web-scraping models.
| rafark wrote:
 | Would Midjourney be liable, though? I mean, you can create
 | copyrighted material using Photoshop too (even Paint!).
 |
 | If I create a Mickey Mouse using Photoshop, would Adobe be
 | liable for it?
| jsheard wrote:
 | I don't think it really matters whether or not Midjourney
 | themselves are liable; the output of their model being
 | legally radioactive would break their business model either
 | way. They make money by charging users for commercial-use
 | rights to their generations, but a judgement that the
 | generations are uncopyrightable, or outright infringing on
 | others' copyright, would make the service effectively useless
 | for the kinds of users who want commercial-use rights.
| ignite wrote:
| If midjourney could count fingers, I'd be thrilled!
| jrm4 wrote:
 | I'm presuming you're not including Stable Diffusion when you
 | say this; the fact that SD and its variants are de facto
 | _extremely_ "free and open source" presently puts it way ahead
 | of anything else, and is likely to do so for some time.
| mettamage wrote:
 | Not with typography, though, haha. It can't spell. I had to
 | draw the letters myself.
| jamilton wrote:
| None of these can do text well. There's a model that does do
| text and composition well, but the name escapes me. And the
| general quality is much lower overall, so it's a pretty heavy
| tradeoff.
| kouteiheika wrote:
| DeepFloyd?
|
| https://github.com/deep-floyd/IF
| throwaway20222 wrote:
 | I believe this is at least one solution, and one that the
 | folks at Stability themselves were pushing hard as a next
 | step forward in the development of image models.
| [deleted]
| dahwolf wrote:
| I'm glad it's not just me getting unusable garbage out of Dall-E
| and glorious results from MidJourney.
| senko wrote:
 | For comparison, these were generated using the Stability.ai API:
 | https://postimg.cc/gallery/MQfkgP7/ce388adf
 |
 | I used the stable-diffusion-xl-beta-v2-2-2 model, copy-pasted
 | the prompts from the blog post, one shot per prompt. I chose
 | style presets that closely matched each prompt (added as
 | suffixes in the image filenames); a sketch of such an API call
 | is below.
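 | For reference, a text-to-image call against that API looks
 | roughly like this (a sketch; the engine ID is the one named
 | above, STABILITY_API_KEY is assumed to be set, and the prompt
 | and style preset are illustrative):
 |
 |     import base64, os, requests
 |
 |     ENGINE = "stable-diffusion-xl-beta-v2-2-2"
 |     resp = requests.post(
 |         f"https://api.stability.ai/v1/generation/{ENGINE}/text-to-image",
 |         headers={
 |             "Authorization": f"Bearer {os.environ['STABILITY_API_KEY']}",
 |             "Accept": "application/json",
 |         },
 |         json={
 |             "text_prompts": [{"text": "a cozy living room, warm lighting"}],
 |             "style_preset": "photographic",  # optional style preset
 |             "cfg_scale": 7,
 |             "width": 512,
 |             "height": 512,
 |             "steps": 30,
 |             "samples": 1,
 |         },
 |         timeout=120,
 |     )
 |     resp.raise_for_status()
 |     # The v1 API returns generated images as base64-encoded artifacts.
 |     for i, art in enumerate(resp.json()["artifacts"]):
 |         with open(f"out_{i}.png", "wb") as f:
 |             f.write(base64.b64decode(art["base64"]))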
| mdorazio wrote:
| Kind of strange to me that they didn't test any prompts with
| people in them. In my experience that tends to show the
| limitations of various models pretty quickly.
| usaar333 wrote:
 | Lighting also tends to be pretty bad in complex scenes. I find
 | unrealistic shadows tend to break the photorealism of scenes
 | with few light sources.
| [deleted]
| MediumD wrote:
 | *Shameless plug*
 |
 | If you want to play around with OpenJourney (or any other fine-
 | tuned Stable Diffusion model), I made my own UI with a free tier
 | at https://happyaccidents.ai/.
 |
 | It supports all open-source fine-tuned models & LoRAs, and I
 | recently added ControlNet.
| theobromananda wrote:
 | All three of these are horrible; running Stable Diffusion
 | locally produces far better results, as seen in this
 | comment section.
| fumar wrote:
 | MidJourney produces more consistent and usable results. I run
 | SD and also pay for MJ. I've tried several checkpoints
 | and LoRAs, but the output is often disappointing or follows
 | the prompts incorrectly.
| throwaway742 wrote:
 | My result for prompt 2, using the Dreamshaper Stable Diffusion
 | model.
|
| https://i.imgur.com/ipnf3f5.png
___________________________________________________________________
(page generated 2023-06-20 23:01 UTC)