[HN Gopher] FLUX1.1 [pro] - New SotA text-to-image model from Black Forest Labs
       ___________________________________________________________________
        
       FLUX1.1 [pro] - New SotA text-to-image model from Black Forest Labs
        
       Author : fagerhult
       Score  : 161 points
       Date   : 2024-10-03 13:53 UTC (9 hours ago)
        
 (HTM) web link (replicate.com)
 (TXT) w3m dump (replicate.com)
        
       | in3d wrote:
       | Better link https://blackforestlabs.ai/announcing-flux-1-1-pro-
       | and-the-b...
        
       | byteknight wrote:
       | I won't pay for a model, but that cake image looks dang good.
        
       | doctorpangloss wrote:
       | I'm worried about what happens when more people find out about
       | Ideogram.
       | 
       | There are a lot of things that don't appear in ELO scores. For
       | one, they will not reflect that you cannot prompt women's faces
       | in Flux. We can only speculate why.
        
         | giancarlostoro wrote:
          | How locked down is it? My problem with a lot of these is I like
          | to make really ridiculous meme-type images, but I run into
          | walls for dumb reasons. Like if I want to make something that's
          | "copyrighted", like a mix of certain characters from one
          | franchise or whatever, I sometimes cannot: I get told that the
          | model cannot generate copyrighted content, even though courts
          | ruled that AI-generated stuff cannot be copyrighted either
          | way...
         | 
          | I feel like AI should just be treated as fair use as long as
          | it's not 100% blatantly a literal clone of the original work.
        
           | doctorpangloss wrote:
           | > How locked down is it? ... I get told that the model cannot
           | generate copyrighted... AI should just be treated as fair use
           | 
           | Ideogram and Flux both have their own broad set of
           | limitations that are non-technical and unpublished. IMO they
           | are not really motivated by legal concerns, other than the
           | lack of transparency itself.
           | 
            | So maybe the issue is transparency itself, and the hazy
            | legal climate means no transparency. You can't go anywhere
            | and see a detailed list of dataset collection and
            | captioning opinions for proprietary models. The Open Model
            | Initiative, trying to make a model, did publish their
            | opinions, and they're not getting sued anytime soon. However,
            | their opinions are an endless source of conflict.
        
           | jjordan wrote:
            | I've been using Venice.ai, which afaik offers the most
            | uncensored service currently available, outside of running
            | your own instances. No problems with prompts that include
            | copyrighted terms.
        
           | sdenton4 wrote:
            | It's perfectly happy to make an Imperial stormtrooper riding
            | a dragon, for what it's worth.
        
         | liuliu wrote:
          | What do you mean? FLUX.1 handles prompts for women or women's
          | faces just fine? Do you mean the skin texture is unrealistic,
          | or some other artifacts?
        
           | doctorpangloss wrote:
            | Flux will not adhere to your detailed description of a
            | woman's face nearly as well as it does for a man, and it
            | doesn't adhere to text descriptions of faces well in general.
            | This is not a technical limitation; it was a choice in the
            | captioning of the model's dataset, and maybe other more
            | sophisticated decisions like the loss function. It exhibits
            | similar flaws with its representation of male versus female
            | celebrities; it also exhibits this flaw when you use language
            | that describes male versus female celebrities' appearances.
        
           | jjcm wrote:
           | Flux tends to gravitate towards a single face archetype for
           | both sexes. For women it's a narrow face with a very slightly
           | cleft chin. Men almost always appear with a very short cut
           | beard or stubble. r/stablediffusion calls it the "flux face",
           | and there are several LoRAs that aim to steer the model away
           | from them.
        
           | throwaway314155 wrote:
           | what they really mean is that it's not useful for generating
           | lewd imagery of women. It was likely nerfed in this regard on
           | purpose because BFL didn't want to be associated with that
           | (however legal it may be).
        
             | doctorpangloss wrote:
             | I'm not sure why you're being downvoted because I think
             | this is a misconception that's worth clearing up. There is
             | no aspect of what I'm doing that is lewd or lewd adjacent.
             | I just want control of a character's face for making art
             | for an open source game. While I do not totally understand
             | what specific decisions Flux made that would make their
             | model weak in the regard of specifying the appearance of
             | someone's face, one thing is clear: the humanities people
             | are right, this is like a great example of how censorship
             | and Big Prude has impacted artmaking.
             | 
             | It is actually making it harder to use the technology to
             | represent women characters, which is so ironic. That said,
             | I could just lEaRn tO dRaW or pAy aN aRtIsT right? The
             | discourse around this is so shitty.
        
       | Jackson__ wrote:
       | Ah, that was one short gravy train even by modern tech company
       | standards. Really wish the space was more competitive and open so
       | it wouldn't just be one company at the top locking their models
       | behind APIs.
        
       | sharkjacobs wrote:
       | "state of the art" has become such tired marketing jargon.
       | 
       | "our most advanced and efficient model yet"
       | 
       | "a significant step forward in our mission to empower creators"
       | 
       | I get it, you can't sell things if you don't market them, and you
       | can't make a living making things if you don't sell them, but
       | it's exhausting.
        
         | minimaxir wrote:
         | The official blog post justifies the marketing copy a bit more
         | with metrics.
        
           | sharkjacobs wrote:
            | The point is that the metrics say something; this stuff
            | doesn't actually say anything.
           | 
           | What does "state of the art" mean? That it's using the latest
           | "cutting edge" model technology?
           | 
           | When Apple releases a new iPhone Pro Max, it's "state of the
           | art". When they release a new iPhone SE, there's an argument
           | to be made that it's not because it uses 2 year old chips.
            | But what would it even mean for BFL to release a model which
            | wasn't "state of the art"?
           | 
           | > our most advanced and efficient model yet
           | 
           | Yes, likewise, this is how technology companies work. They
           | release something and then the next thing they release is
           | more advanced.
           | 
           | > a significant step forward in our mission to empower
           | creators
           | 
           | Going from 12 seconds to 4 seconds is a significant speed
           | boost, but does it move the needle on their mission to
            | empower creators? These are their words, not mine. It's a
            | technical achievement and impressive incremental progress,
            | but are there users out there who are more empowered by this?
            | Significantly more empowered!?
        
             | throwaway314155 wrote:
              | Holy shit, the level of pedantry. State of the art in this
              | context means it outperforms all other models to date on
              | standard evaluations, which is precisely what it does.
             | 
              | Did you miss the first Flux release? Black Forest Labs
              | aren't screwing around. The team consists of many of the
             | _actual_ originators of Stable Diffusion's research (which
             | was effectively co-opted by Emad Mostaque who is likely a
             | sociopath).
        
               | sharkjacobs wrote:
                | > State of the art in this context means it outperforms
                | all other models to date on standard evaluations, which
                | is precisely what it does.
               | 
               | That's not what "state of the art" means, and if it did
               | it would still be hollow marketing jargon, because there
               | are specific and meaningful ways to say that FLUX1.1
               | [pro] outperforms all competitors (and they do say so,
               | later in the press release)
               | 
                | Your confusion about what "state of the art" means is
                | exactly why marketers still use the phrase even though it
                | has been overused and worn out since at least the 1980s.
               | State of the art means something is "new", and that it is
               | the "latest development", and that it incorporates
               | "cutting edge" technology. The implication is that new is
               | better, and that the "state of the art" is an improvement
               | over what came before. (And to be clear, that's often
               | true! Including in this case!) But that's not what the
               | phrase actually means, it just means that something is
               | new. And every press release is about something new.
               | 
               | FLUX1.1 [pro] would be state of the art even if it was
               | worse than the previous version. Stable Diffusion 2.0 was
               | state of the art when it was released.
        
               | throwaway314155 wrote:
               | I said in this context for a reason. That's how state of
               | the art has been used (in papers, not copy) with regard
               | to deep learning since well before DALL-E 1. I maintain
               | that you're being pedantic about appropriating a term of
               | art to mean something else. Everyone else here knows what
               | the meaning is in context. Just not you.
        
         | bemmu wrote:
         | Flux genuinely is the best model I've tried though. If there is
         | a better one I'd love to know.
        
           | GaggiX wrote:
           | Have you tried Ideogram v2?
        
             | SV_BubbleTime wrote:
             | Have you run Ideogram offline?
        
               | GaggiX wrote:
               | Have you run Flux Pro offline?
        
               | SV_BubbleTime wrote:
                | No, only a dozen Flux Dev models: different distillations,
                | quantizations, and fine-tunes with LoRAs.
               | 
                | But you keep pretending that closed-source AI is a
                | sustainable comparison.
        
               | GaggiX wrote:
                | Flux Pro (v1 and v1.1) is a closed-source model.
        
         | halJordan wrote:
         | It is state of the art. And it's not like the art has
         | stagnated.
        
         | arizen wrote:
         | - How do copywriters greet each other in the morning?
         | 
         | - Take your morning to the next level!
        
         | vunderba wrote:
         | Agreed, but the flux dev model is easily the best model out
         | there in terms of overall prompt adherence _that can also be
         | run locally._
         | 
         | Some comparisons against DALL-E 3.
         | 
         | https://mordenstar.com/blog/flux-comparisons
        
         | johnfn wrote:
         | Flux _is_ state of the art. You can see an ELO-scored
         | leaderboard here:
         | 
         | https://huggingface.co/spaces/ArtificialAnalysis/Text-to-Ima...
        
       | jchw wrote:
       | The generated images look impressive of course but I can't help
       | but be mildly amused by the fact that the prompt for the second
       | example image insists strongly that the image should say 1.1:
       | 
       | > ... photo with the text "FLUX 1.1 [Pro]", ..., must say "1.1",
       | ...
       | 
       | ...And of course, it does not.
        
         | thisisnotauser wrote:
         | Wild, ideogram got it in one for free:
         | 
         | https://ideogram.ai/g/glE5qDz-SQyVVv2hdo0vLg/0
        
       | nirav72 wrote:
       | Are there any projects that allow for easy setup and hosting Flux
       | locally? Similar to SD projects like InvokeAI or a1111
        
         | minimaxir wrote:
          | Flux is weirder than old SD projects since it's extremely
          | resource dependent and won't run on most hardware.
        
           | Filligree wrote:
           | The GGUF quantisations do run on most recent hardware, albeit
           | at increasingly concerning quality tradeoffs.
        
             | tripplyons wrote:
             | I haven't noticed any quality degradation with the 8-bit
             | GGUF for Flux Dev, but I'm sure the smaller quantizations
             | perform worse.
        
           | ziddoap wrote:
           | People have Flux running on pretty much everything at this
           | point, assuming you are comfortable waiting 3+ minutes for a
           | 512x512 image.
           | 
           | I managed to get it running on an old computer with a 2060
           | Super, taking ~1.5 minutes per image gen. People are
           | generating on a 1080.
        
           | waffletower wrote:
            | It doesn't take a lot of effort to get Flux dev/schnell to
            | run on 3090s unquantized, but I agree that 24GB is the
            | consumer GPU memory limit and there are many cards with less
            | than that. Flux runs great on modern Mac hardware as well, if
            | you have at least 32GB of unified memory.
        
             | stoobs wrote:
              | I'm running Flux dev fine on a 3080 10GB, unquantised. On
              | Windows, the Nvidia drivers have a function to let it spill
              | over into system RAM. It runs a little slower, but it's not
              | a deal-breaker, unlike Nvidia's pricing and power
              | requirements at the moment.
        
               | zamadatix wrote:
               | What are you using to run it? When I run Flux Dev in
               | Windows using comfy on a 4090 (24 GB) sometimes it all
               | crashes because it runs out of VRAM when I'm doing too
               | much other stuff.
        
               | waffletower wrote:
                | Not a good reference for Windows -- I use HuggingFace
                | APIs on cog/docker deployments in Linux. I needed to set
                | the `PYTORCH_NO_CUDA_MEMORY_CACHING=1` and
                | `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`
                | envvars to eliminate memory errors on the 3090s. When I
                | run on the Mac there is enough memory not to require
                | shenanigans. It runs approximately as fast as the 3090s,
                | but the 3090s heat my basement and the Mac heats my face.
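                | 
                | As a rough sketch of that setup (assuming the diffusers
                | FluxPipeline; the envvars must be set before torch
                | initializes CUDA):
                | 
                |     import os
                |     # same settings as the docker -e flags above
                |     os.environ["PYTORCH_NO_CUDA_MEMORY_CACHING"] = "1"
                |     os.environ["PYTORCH_CUDA_ALLOC_CONF"] = \
                |         "expandable_segments:True"
                | 
                |     import torch
                |     from diffusers import FluxPipeline
                | 
                |     pipe = FluxPipeline.from_pretrained(
                |         "black-forest-labs/FLUX.1-dev",
                |         torch_dtype=torch.bfloat16).to("cuda")
                |     image = pipe("a forest cabin at dusk",
                |                  num_inference_steps=28).images[0]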
        
             | drcongo wrote:
             | Really? I tried using it in ComfyUI on my Mac Studio,
             | failed, went searching for answers and all I could find
             | said that something something fp8 can't run on a Mac, so I
             | moved on.
        
               | zamadatix wrote:
                | If you're looking for a prebuilt "no tinkering" solution,
                | https://diffusionbee.com/ is an open source app (GitHub
                | link at the bottom of the page if you want to see the
                | code) which has a built-in button to import Flux models
                | at the bottom of the home screen.
        
               | drcongo wrote:
               | Thanks, I'll take a look.
        
               | waffletower wrote:
               | I should have qualified that I run Flux.1 dev and schnell
               | on a Mac via HuggingFace and pytorch, and am not
               | knowledgeable about ComfyUI support for these models. The
               | code required is pretty tiny though.
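                | 
                | Something like this, as a minimal sketch (schnell via
                | the diffusers FluxPipeline on Apple Silicon; four steps
                | and zero guidance are the usual schnell settings):
                | 
                |     import torch
                |     from diffusers import FluxPipeline
                | 
                |     pipe = FluxPipeline.from_pretrained(
                |         "black-forest-labs/FLUX.1-schnell",
                |         torch_dtype=torch.bfloat16).to("mps")
                |     image = pipe("a watercolor fox",
                |                  num_inference_steps=4,
                |                  guidance_scale=0.0).images[0]
                |     image.save("fox.png")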
        
         | leumon wrote:
          | Using ComfyUI with the official Flux workflow is easy and works
          | nicely. Comfy can also be used via API.
        
         | doctorpangloss wrote:
         | https://huggingface.co/docs/diffusers/main/en/api/pipelines/...
         | 
         | It's about 6 lines of Python.
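          | 
          | Something along these lines (a sketch, not their exact code;
          | the offload call trades speed for fitting in less VRAM):
          | 
          |     import torch
          |     from diffusers import FluxPipeline
          | 
          |     pipe = FluxPipeline.from_pretrained(
          |         "black-forest-labs/FLUX.1-dev",
          |         torch_dtype=torch.bfloat16)
          |     pipe.enable_model_cpu_offload()  # spill to system RAM
          |     image = pipe("a cat holding a sign that says hello world",
          |                  num_inference_steps=28,
          |                  guidance_scale=3.5).images[0]
          |     image.save("flux-dev.png")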
        
         | nickthegreek wrote:
         | Forge
         | 
         | https://github.com/lllyasviel/stable-diffusion-webui-forge
         | 
         | https://www.reddit.com/r/StableDiffusion/comments/1esxkk8/ho...
        
         | vunderba wrote:
         | The answer is it really depends on your hardware, but the nice
         | thing is that you can split out the text encoder when using
         | ComfyUI. On a 24gb VRAM card I can run the Q8_0 GGUF version of
         | flux-dev with the T5 FP16 text encoder. The Q8_0 gguf version
         | in particular has very little visual difference from the
         | original fp16 models. A 1024x1024 image takes about 15 seconds
         | to generate.
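          | 
          | Outside ComfyUI, the same split can be sketched in diffusers
          | (assuming its GGUF loading support; the city96 repo is just
          | one common source of Flux GGUF files):
          | 
          |     import torch
          |     from diffusers import (FluxPipeline, FluxTransformer2DModel,
          |                            GGUFQuantizationConfig)
          | 
          |     # Q8_0 transformer from a GGUF file; T5/CLIP/VAE load as usual
          |     transformer = FluxTransformer2DModel.from_single_file(
          |         "https://huggingface.co/city96/FLUX.1-dev-gguf"
          |         "/blob/main/flux1-dev-Q8_0.gguf",
          |         quantization_config=GGUFQuantizationConfig(
          |             compute_dtype=torch.bfloat16),
          |         torch_dtype=torch.bfloat16)
          |     pipe = FluxPipeline.from_pretrained(
          |         "black-forest-labs/FLUX.1-dev",
          |         transformer=transformer,
          |         torch_dtype=torch.bfloat16)
          |     pipe.enable_model_cpu_offload()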
        
         | pdntspa wrote:
         | DrawThings on Mac
        
         | sophrocyne wrote:
         | Invoke is model agnostic, and supports Flux, including
         | quantized versions.
        
       | melvinmelih wrote:
        | In case you want to try it out without hassling with the API,
        | I've set up a free tool so you can use it on WhatsApp:
        | https://instatools.ai/products/fluxprovisions
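        | 
        | (For the record, the raw API call is short too. A sketch,
        | assuming the replicate Python client and that model slug on
        | Replicate:)
        | 
        |     import replicate  # pip install replicate; needs REPLICATE_API_TOKEN
        | 
        |     out = replicate.run(
        |         "black-forest-labs/flux-1.1-pro",
        |         input={"prompt": "a tiny red cabin in a snowy forest"})
        |     print(out)  # URL of the generated image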
        
       | vessenes wrote:
       | Flux is so frustrating to me. Really good prompt adherence,
       | strong ability to keep track of multiple parts of a scene, it's
       | technically very impressive. However it seems to have had no
       | training on art-art. I can't get it to generate even something
       | that looks like Degas, for instance. And, I can't even fine tune
       | a painterly art style of any sort into Flux dev. I get that there
       | was working, living artist backlash at SD and I can therefore
       | imagine that the BFL team has decided not to train on art, but,
        | it's a real loss, both in terms of human knowledge of, say,
        | composition, emotion, and so on, and in terms of style diversity.
       | 
        | For goodness' sake, the Met in New York has a massive trove of
        | open, CC0-type licensed art. Dear BFL, please ease up a bit on
        | this and add some art-art to your models; they will be better as
        | a result.
        
         | throwup238 wrote:
         | I've had the same problem with photography styles, even though
         | the photographer I'm going for is Prokudin-Gorskii who used
         | _emulsion plates_ in the 1910s and the entire Library of
         | Congress collection is in the public domain. I'm curious how
         | they even managed to remove them from the training data since
         | the entire LoC is such an easy dataset to access.
        
           | throwaway314155 wrote:
            | I'm fairly confident they did a broad FirstName LastName
            | removal.
        
           | vessenes wrote:
            | Yes, exactly. I think they purposely did not train on stuff
            | like this. I'd bet that you could do a LoRA of Prokudin-
            | Gorskii though; there's a lot of photographic content in
            | Flux's training set.
        
         | gs17 wrote:
         | And I can't imagine there's a real copyright (or ethical) issue
         | with including artwork in the public domain because the artist
         | died over a century ago.
        
         | pdntspa wrote:
         | I wonder if you can use Flux to generate the base image then
         | img2img on SD1.4 to impart artistic style?
        
           | vunderba wrote:
            | That's what a refiner is for in auto1111: taking an image
            | through the last 10% and touching it up with an alternative
            | model.
            | 
            | I actually use flux to generate an image for purposes of
            | _adherence_, then pull it in as a canny/depth controlnet
            | with more established models like realvis, unstableXL, etc.
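            | 
            | A rough sketch of the simpler img2img flavor of that chain
            | (model names are examples; low strength keeps Flux's
            | composition while the second model restyles):
            | 
            |     import torch
            |     from diffusers import (FluxPipeline,
            |                            AutoPipelineForImage2Image)
            | 
            |     prompt = "portrait of a knight, impressionist oil painting"
            | 
            |     # stage 1: Flux for composition and prompt adherence
            |     flux = FluxPipeline.from_pretrained(
            |         "black-forest-labs/FLUX.1-dev",
            |         torch_dtype=torch.bfloat16)
            |     flux.enable_model_cpu_offload()
            |     base = flux(prompt, num_inference_steps=28).images[0]
            | 
            |     # stage 2: low-strength img2img with a style-rich model
            |     styler = AutoPipelineForImage2Image.from_pretrained(
            |         "stabilityai/stable-diffusion-xl-base-1.0",
            |         torch_dtype=torch.float16).to("cuda")
            |     styled = styler(prompt, image=base,
            |                     strength=0.35).images[0]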
        
             | andersa wrote:
             | That is an interesting idea, I somehow hadn't thought of
             | using flux in a chain like that, thanks!
        
             | vessenes wrote:
             | Yes, that is my current workflow as well.
        
         | crystal_revenge wrote:
         | I've had a similar experience, incredible at generating a very
         | specific style of image, but not great at generating anything
         | with a specific style.
         | 
          | I suspect we'll see that the answer to this is LoRAs. Two
          | examples that stick out are:
         | 
         | - Flux Tarot v1 [0]
         | 
         | - Flux Amateur Photography [1]
         | 
         | Both of these do a great job of combining all the benefits of
         | Flux with custom styles that seem to work quite well.
         | 
         | [0] https://huggingface.co/multimodalart/flux-tarot-v1 [1]
         | https://civitai.com/models/652699?modelVersionId=756149
        
           | vessenes wrote:
            | I like those, and there's an electroshock LoRA out there
            | that's just awesome. That said, Tarot and others like it are
            | "illustrator"-type styles with extra juice. I have not
            | successfully trained a LoRA for any painting style; Flux does
            | not seem to know about painting.
        
             | davidbarker wrote:
             | I'm curious to give this a go. I've been training a lot of
             | LoRAs for FLUX dev recently (purely for fun). I'm sure
             | there must be a way to get this working.
             | 
             | Here are a few I've recently trained:
             | https://civitai.com/user/dvyio
        
               | spython wrote:
                | This looks really good! What is your process for getting
                | such high-quality LoRAs?
        
             | vessenes wrote:
             | @davidbarker -- please do, that sounds awesome! I did not
             | have good results.
        
         | thomastjeffery wrote:
         | I think that's part of what makes FLUX.1 so good: the content
         | it's trained on is very _similar_.
         | 
         | Diversity is a double-edged sword. It's a desirable feature
         | where you want it, and an undesirable feature everywhere else.
         | If you want an impressionist painting, then it's good to have
         | Monet and Degas in the training corpus. On the other hand, if
         | you want a photograph of water lilies, then it's good to keep
         | Monet _out_ of the training data.
        
           | doctorpangloss wrote:
            | DALL-E 3 doesn't struggle with this. It's just opinions.
            | There's no technical limitation. They chose to weaken the
            | model in this regard.
        
             | thomastjeffery wrote:
              | Nonsense. FLUX.1-dev is _famous_ for its consistency,
              | prompt adherence, etc., and it fits on a consumer GPU. That
              | has to come with compromises. You can call any optimization
              | a weakness: that's the nature of compromise.
        
         | whywhywhywhy wrote:
         | >However it seems to have had no training on art-art. I can't
         | get it to generate even something that looks like Degas, for
         | instance
         | 
         | It feels like they just removed names from the datasets to make
         | it worse at recreating famous people and artists.
        
           | vessenes wrote:
           | No, they absolutely did not just do that in this case,
           | although that was the SD plan. If you prompt for "painterly,
           | oil painting, thick brush strokes, impressionistic oil
           | painting style" to flux, you will get ... anime-ish
           | renderings.
        
         | skort wrote:
         | >but, it's a real loss. Both in terms of human knowledge of,
         | say composition, emotion, and so on, but also for style
         | diversity
         | 
         | But that real art still exists, and can still be found, so what
         | exactly is the loss here?
        
           | vessenes wrote:
           | We may differ on our take about the usefulness of diffusion
           | models, but I'd say it's a loss in that many of the visuals
           | humans will see in the next ten years are going to be
           | generated by these models, and I for one wish they weren't
           | just trained on weeb shit.
        
             | dagaci wrote:
              | Just think that before 1995 (and in reality, decades later
              | than that) most of the world would never have access to 99%
              | of the world's art.
             | 
              | And between 1995 and 2022, the amount of Art produced
              | surpassed the cumulative output of all other periods of
              | human history.
        
               | vessenes wrote:
               | ... And between 2022 and 2025 the amount of imagery
               | generated will drive the percent of Art created to
               | roughly 0% of all imagery.
        
             | skort wrote:
             | You'll still be able to ask a person to create art in a
             | specific style if you'd like.
        
       | skybrian wrote:
        | It doesn't get piano keyboards right, but it's the first image
        | generator I've tried that sometimes gets "someone playing
        | accordion" mostly right.
        | 
        | When I ask for a man playing accordion, it's usually a somewhat
        | flawed piano accordion, but if I ask for a woman playing
        | accordion, it's usually a button accordion. I've also seen a few
        | that are half-button, half-piano monstrosities.
       | 
       | Also, if I ask for "someone playing accordion", it's always a
       | woman.
        
         | vunderba wrote:
         | Periodic data is always hard for generative image systems -
         | particularly if that "cycle" window is relatively large (as
         | would be the case for octaves of a piano).
        
           | skybrian wrote:
           | Yeah, it's my informal test to see if a new model has made
           | any progress on that.
        
       | Der_Einzige wrote:
       | Far more interesting will be when pony diffusion V7 launches.
       | 
       | No one in the image space wants to admit it, but well over half
       | of your user base wants to generate hardcore NSFW with your
       | models and they mostly don't care about any other capabilities.
        
       | ks2048 wrote:
       | Is there a good site that compares text-to-image models - showing
       | a bunch of examples of text w/ output on each model?
        
       | jeffbee wrote:
       | I asked for a simple scene and it drew in the exact same AI girl
       | that every text-to-image model wants to draw, same face, same
       | hair, so generic that a Google reverse image search pulls up
       | thousands of the exact same AI girl. No variety of output at all.
        
       | ilaksh wrote:
       | Pretty smart model. Here's one I made:
       | https://replicate.com/p/6ez0x8xqvsrga0cjadg8m7bah0
        
         | loufe wrote:
         | That is astoundingly good adherence to the description. I
         | already liked and was impressed by Flux1 but that is perhaps
         | the most impressive image generation I've ever seen.
        
           | miohtama wrote:
            | Is it going to be able to go head-to-head against Midjourney?
        
             | vunderba wrote:
              | MJ is by far the worst model for complex prompt
              | _ADHERENCE_, though it has excellent compositional quality.
              | 
              | Comparisons of a similar prompt using Midjourney 6.1:
             | 
             | https://imgur.com/a/WBnPl7I
             | 
             | Also, flux (schnell, dev) can be run on your local machine.
             | 
              | If you really want to use a paid service, Ideogram is
              | probably the best one out there that balances quality with
              | adherence. DALL-E 3 has good adherence as well, though the
              | quality can sometimes be iffy, and it's very puritanical in
              | terms of censorship.
        
         | drdaeman wrote:
          | Yet, it doesn't seem to know what a Tektronix 4010 actually
          | looks like... ;)
          | 
          | I had similar issues trying to paint an "I cast non-magic
          | missile" meme with a fantasy wizard using a missile launcher.
          | No model out there (I've tried SD, SDXL, FLUX.1dev and now this
          | FLUX1.1pro) knows what a missile launcher looks like (neither
          | as a generic term, nor any specific systems) and none has a
          | clue how it's held, so they all draw _really weird_
          | contraptions.
        
           | morbicer wrote:
            | Isn't it because the shoulder-launched weapon is usually
            | called a rocket launcher, RPG, or bazooka? I've never heard
            | it referred to as a missile launcher.
        
             | drdaeman wrote:
              | I've tried all of those and then some (e.g. "ATGM"), plus
              | various specific names (like "FGM-148 Javelin", "M1
              | Bazooka", or "RPG-7", which are all quite iconic and well-
              | recognized, so I thought some of those might appear in the
              | training data) - all no bueno. Models are simply unaware
              | of such devices; the best of their "guesses" is that it's a
              | weapon, so they draw something rifle- or pistol-shaped.
              | 
              | And, sure, that's what LoRAs are for. If I can figure out
              | how to train one for FLUX, in a way that would actually
              | produce something meaningful (my pitiful attempts at SDXL
              | LoRA training were... less than stellar, and FLUX is quite
              | different from everything else). Although that's probably
              | not worth it for making a meme picture...
        
         | PcChip wrote:
         | agreed - pretty impressive!
         | https://replicate.com/p/ajfrva4p4hrge0cjaf3bncfwn4
        
         | nikcub wrote:
         | I've gone from counting fingers on a hand to keys on a keyboard
        
         | loxias wrote:
          | It's quite good at following a detailed, paragraph-long
          | description of a scene, which is a double-edged sword. A lot
          | of the fun for me with early text-to-image models was
          | underspecifying an image and then enjoying how the model
          | "invents" it: "steampunk spaceship", "communist bear", "glass
          | city".
          | 
          | Flux is amazing, but I find it requires a very literal
          | description, which pushes the "creative work" back to the text
          | itself. That can certainly be a good thing, just a bit less
          | gratifying to non-visual types like myself. :)
         | 
         | I wonder, only somewhat jokingly, if one could make text
         | generators which "imagine" detailed fantastical scenes,
          | suitable for feeding to a text-to-image model.
        
           | vunderba wrote:
           | That's what Fooocus is - it allows you to specify a "text
           | expander" LLM that sits in between the input prompt and the
           | diffusion model.
           | 
           | https://github.com/lllyasviel/Fooocus
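            | 
            | The idea in miniature (a sketch; the model name is just an
            | example of a small instruct LLM, not what Fooocus actually
            | ships):
            | 
            |     from transformers import pipeline
            | 
            |     # hypothetical expander model; any small instruct LLM works
            |     expander = pipeline("text-generation",
            |                         model="Qwen/Qwen2.5-0.5B-Instruct")
            |     messages = [{"role": "user", "content":
            |                  "Expand into one vivid, detailed image prompt: "
            |                  "communist bear"}]
            |     out = expander(messages, max_new_tokens=120)
            |     detailed = out[0]["generated_text"][-1]["content"]
            |     # `detailed` then replaces the terse prompt that goes to
            |     # the image model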
        
           | ilaksh wrote:
           | Prompt enhancement is now a standard feature in many image
           | generation tools.
        
         | jug wrote:
          | One thing that makes FLUX so special is the prompt
          | understanding. I just gave FLUX 1.1 the prompt "Closeup of a
          | doll house built to resemble a famous room in the TV show
          | Friends" and it gave me one with the sign "Central Perk". I
          | never prompted for the text "Central Perk". A Redditor also
          | discovered that it has an associative understanding of
          | emotions. For example, prompt "Rose of passion" and it may draw
          | a flower that is burning, because passion is fiery.
         | 
         | This is miles ahead of most other image generation models
         | available today.
        
       | ChrisArchitect wrote:
       | Announcement post: https://blackforestlabs.ai/announcing-
       | flux-1-1-pro-and-the-b...
       | 
       | (https://news.ycombinator.com/item?id=41730626)
        
       | whitehexagon wrote:
       | I'm running Asahi Linux on a 32GB M1 Pro. Any chance of being
       | able to run text-to-image models locally? I've had some success
       | with LLMs, but only the smaller models. No idea where to start
       | with images, everything seems geared towards msft+nvda.
        
         | collinvandyck76 wrote:
         | DiffusionBee will let you do this quite easily.
         | 
         | edit: nevermind, it's a macos app
        
           | lagniappe wrote:
           | Is DiffusionBee still in development? I had stopped using it
           | because it seemed like the dev interest had stalled.
        
         | LeoPanthera wrote:
         | "Draw Things" is a native Mac app for text to image. It's a a
         | lot more advanced than DiffusionBee, it will download the
         | models for you, and it's free. It's also available for iOS. (!)
        
           | smcleod wrote:
            | Draw Things is neat, but it's so damn slow compared to other
            | tools (e.g. InvokeAI). I'm not sure why it takes so long to
            | generate images with any model.
        
         | loxias wrote:
         | Try https://github.com/leejet/stable-diffusion.cpp
        
       | ionwake wrote:
       | Sorry to be a noob, but how does this relate to fastflux.ai which
       | seems to work great and creates an image in less than a second?
       | Is this a new model on a slower host?
        
       | nubinetwork wrote:
        | I tried using schnell; it won't fit in a 16GB GPU, and I
        | couldn't get it to run on CPU.
        
       ___________________________________________________________________
       (page generated 2024-10-03 23:00 UTC)