[HN Gopher] Flux: Open-source text-to-image model with 12B param...
       ___________________________________________________________________
        
       Flux: Open-source text-to-image model with 12B parameters
        
       Author : CuriouslyC
       Score  : 330 points
       Date   : 2024-08-01 16:03 UTC (6 hours ago)
        
 (HTM) web link (blog.fal.ai)
 (TXT) w3m dump (blog.fal.ai)
        
       | CuriouslyC wrote:
       | HF page: https://huggingface.co/black-forest-labs
        
       | og_kalu wrote:
        | You can try the models on Replicate:
        | https://replicate.com/black-forest-labs
       | 
       | Result (distilled schnell model) for
       | 
       | "Photo of Criminal in a ski mask making a phone call in front of
       | a store. There is caption on the bottom of the image: "It's time
       | to Counter the Strike...". There is a red arrow pointing towards
       | the caption. The red arrow is from a Red circle which has an
       | image of Halo Master Chief in it."
       | 
       | https://www.reddit.com/r/StableDiffusion/s/SsPeQRJIkw
        
       | SV_BubbleTime wrote:
       | Wow.
       | 
       | I have seen a lot of promises made by diffusion models.
       | 
        | This is in a whole different world. I legitimately feel bad
        | for the people still at StabilityAI.
       | 
       | The playground testing is really something else!
       | 
       | The licensing model isn't bad, although I would like to see them
       | promise to open up their old closed source models under Apache
       | when they release new API versions.
       | 
        | The prompt adherence, and the breadth of topics it seems to
        | know without a finetune and without any LoRAs, is really
        | amazing.
        
       | minimaxir wrote:
       | The [schnell] model variant is Apache-licensed and is open
       | sourced on Hugging Face: https://huggingface.co/black-forest-
       | labs/FLUX.1-schnell
       | 
        | It is very fast and very good at rendering text, and appears
        | to have a text encoder that lets the model handle both text
        | and positioning much better:
       | https://x.com/minimaxir/status/1819041076872908894
       | 
        | A fun consequence of better text rendering is that text
        | watermarks from its training data appear more clearly:
       | https://x.com/minimaxir/status/1819045012166127921
        
         | dheera wrote:
         | Thank you. Their website is super hard to navigate and I can't
         | find a "DOWNLOAD" button.
        
           | minimaxir wrote:
            | Note that actually _running_ the model without an A100
            | GPU or better will be trickier than usual given its size
            | (12B parameters, 24GB on disk).
           | 
           | There is a PR to that repo for a diffusers implementation,
           | which _may_ run on a cheap L4 GPU w /
           | enable_model_cpu_offload(): https://huggingface.co/black-
           | forest-labs/FLUX.1-schnell/comm...
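            | 
            | Something like this should work once that PR lands (a
            | minimal, untested sketch; the FluxPipeline name is
            | assumed from the PR):
            | 
            |     import torch
            |     from diffusers import FluxPipeline  # name per the PR
            |     
            |     # 12B params at 2 bytes each in bf16 is ~24GB of
            |     # weights, hence the offloading on smaller cards
            |     pipe = FluxPipeline.from_pretrained(
            |         "black-forest-labs/FLUX.1-schnell",
            |         torch_dtype=torch.bfloat16,
            |     )
            |     # keep only the active submodule on the GPU
            |     pipe.enable_model_cpu_offload()
            |     
            |     image = pipe(
            |         "a cat holding a sign that says hello world",
            |         num_inference_steps=4,  # schnell is step-distilled
            |         guidance_scale=0.0,
            |     ).images[0]
            |     image.save("flux-schnell.png")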
        
             | CuriouslyC wrote:
             | 3090 TIs should be able to handle it without much in the
             | way of tricks for a "reasonable" (for the HN crowd) price.
        
               | fl0id wrote:
               | higher ram apple silicon should be able to run it too. if
               | they don't use some ancient pytorch version or something.
        
               | phkahler wrote:
               | Why not on a CPU with 32 or 64 GB of RAM?
        
               | CuriouslyC wrote:
               | Performance, mostly. It'll work but image generation is
               | shitty to do slowly compared to text inference.
        
               | holoduke wrote:
                | Much slower memory and limited parallelism. A GPU has
                | ~8k or more CUDA cores vs ~16 cores on a regular CPU,
                | and there is less memory swapping between operations.
                | The GPU is much, much faster.
        
             | dheera wrote:
              | You don't need an A100; you can get a used 32GB V100
              | for $2K-$3K. It's probably the absolute best
              | bang-for-buck inference GPU at the moment. Not for
              | speed, but just for the fact that there are models you
              | can actually fit on it that you can't fit on a gaming
              | card, and as long as you can fit the model, it is still
              | light-years better than CPU inference.
        
         | nwoli wrote:
          | It's not really fair to conclude that the training data
          | contains Vanity Fair images, since the prompt includes "by
          | Vanity Fair".
          | 
          | I could write "with text that says Shutterstock" in the
          | prompt, but that doesn't _necessarily_ mean the dataset
          | contains that.
        
           | minimaxir wrote:
            | The logo has the same exact copyrighted typography as the
            | real Vanity Fair logo. I've also reproduced the same
            | copyrighted typography for other brands, with composition
            | identical to copyrighted images. Just asking it for
            | "Vanity Fair cover story about Shrek" at a 3:2 ratio very
            | consistently gives a composition identical to a Vanity
            | Fair cover (the subject in front of the logo typography,
            | partially obscuring it).
           | 
            | The image linked has a traditional www watermark in the
            | lower-left as well. Even something as innocuous as a
            | "Super Mario 64" prompt shows a copyright watermark:
           | https://x.com/minimaxir/status/1819093418246631855
        
             | Carrok wrote:
              | On my list of AI concerns, whether or not Vanity Fair
              | has its copyright infringed does not appear.
        
           | smith7018 wrote:
           | Are you suggesting that the model independently came up with
           | Vanity Fair's logo, including font and kerning?
           | 
           | https://www.vanityfair.com/verso/static/vanity-
           | fair/assets/l...
        
         | RobotToaster wrote:
         | How does the licence work when there's a bunch of restrictions
         | at the bottom of that page that seem to contradict the licence?
        
           | minimaxir wrote:
           | IANAL but I suspect that "Out-of-Scope Use" has no legal
           | authority.
        
       | TechDebtDevin wrote:
       | Damn this is actually really good.
        
       | kennethwolters wrote:
       | It is very good at "non-human subjects in photos with shallow
       | focus".
       | 
        | Really curious to see what other low-hanging fruit people
        | are finding.
        
         | CuriouslyC wrote:
         | Check out reddit.com/r/stablediffusion, it's been handling
         | everything people have thrown at it so far.
        
       | asadm wrote:
       | This is actually really good! I fear much better than SD3 even!
        
       | PoignardAzur wrote:
       | Am I missing something? The beach image they give still fails to
       | follow the prompt in major ways.
        
         | swatcoder wrote:
          | You're not. I'm surprised at their selections, because
          | neither the cooking one nor the beach one adheres to the
          | prompt very well, and the first one only does because its
          | prompt largely avoids detail altogether. Overall, the
          | announcement gives the sense that it can make pretty
          | pictures but not very precise ones.
        
       | yjftsjthsd-h wrote:
       | > FLUX.1 [dev]: The base model, open-sourced with a non-
       | commercial license
       | 
       | ...then it's not open source. At least the others are Apache 2.0
       | (real open source) and correctly labeled proprietary,
       | respectively.
        
       | dinobones wrote:
        | I wonder if the key behind the quality of the MidJourney
        | models, and this model, is less about size + architecture
        | and more about the quality of the images trained on.
        | 
        | It looks like this is the case for LLMs: the quality of the
        | training data has a significant impact on the output quality
        | of the model, which makes sense.
       | 
       | So the real magic is in designing a system to curate that high
       | quality data.
        
         | jncfhnb wrote:
          | No, it's definitely the size. Tiny LLMs are shit. Stable
          | Diffusion 3's problem is not that its training set was
          | wildly different; it's that it's just too small (because
          | the one released so far is not the full size).
          | 
          | You can get better results with better data, for sure. And
          | better architecture, for sure. But raw size is really
          | important; the difference in quality between models, all
          | else held equal, is HUGE and obvious if you play with them.
        
         | 42lux wrote:
          | It's the quality of the image-text pair, not the image
          | alone. But Midjourney is not a model; it's a suite of
          | models that work in conjunction. They have an LLM in front
          | to optimize the user prompts, they use SAM models,
          | ControlNet models for poses that are in high demand, and so
          | much more. That's why you can't really compare foundation
          | models anymore: there are none.
        
         | pzo wrote:
          | I would agree. Midjourney is getting free labour, since
          | many of their generations are not in secret mode (which
          | requires a pro/mega subscription), so prompts and outputs
          | are visible to everyone. Midjourney rewards users for
          | rating those generations. I wouldn't be surprised if there
          | are some bots on their Discord that are scraping that data
          | for training their own models.
        
         | CuriouslyC wrote:
         | Midjourney unquestionably has heavy data set curation and uses
         | RLHF from users.
         | 
         | You don't have to speculate on this as you can see that custom
         | models for SDXL for instance perform vastly better than vanilla
         | SDXL at the same number of parameters. It's all data set and
         | tagging.
        
           | spywaregorilla wrote:
           | custom models perform vastly better _at the tasks they are
           | finetuned to do_
        
             | CuriouslyC wrote:
             | That is technically true, but when the base model is
             | wasting parameter information on poorly tagged, watermarked
             | stock art and other garbage images, it's not really a
             | meaningful distinction. Better data makes for better
             | models, nobody cares about how well a model outputs trash.
        
               | spywaregorilla wrote:
               | Ok, but you're severely misrepresenting the importance of
               | things. Base SDXL is a fine model. Base SDXL is going to
               | be much better than a materially smaller model that
               | you've retrained with "good data".
        
       | jncfhnb wrote:
        | Looks like a very promising model. Hope to see the ComfyUI
        | community get it going quickly.
        
         | lovethevoid wrote:
          | It's already available for ComfyUI.
        
           | jncfhnb wrote:
           | It'll need time for the goodies beyond the base model though
           | I would guess
        
             | lovethevoid wrote:
              | Works great as-is right now. I can see some workflows
              | being affected or having to wait for an update, but
              | even those can get by with some temporary workarounds
              | (like having to load another model for later inpainting
              | steps).
             | 
             | So if you're wanting to experiment and have a 24GB card,
             | have at it!
        
               | jncfhnb wrote:
                | Yeah, I mean like ControlNet / IPAdapter /
                | AnimateDiff / inpainting stuff.
                | 
                | I don't feel like base models are super useful. Most
                | real use cases depend on being able to iterate on
                | consistent outputs imo.
                | 
                | I have had a very bad experience trying to use other
                | models to modify images, but I mostly do anime shit,
                | and maybe styles are less consistently embedded into
                | language for those models.
        
       | fl0id wrote:
        | Mmmh, trying my recent test prompts, still pretty shit. E.g.
        | whereas Midjourney or SD have no problem creating a pencil
        | sketch, with this model (pro) it always looks more like a
        | black-and-white photograph or digital illustration or render.
        | Like all the others, it is also apparently not able to follow
        | instructions on the position of characters (i.e. X and Y are
        | turned away from each other).
        
       | treesciencebot wrote:
       | You can try the models here:
       | 
       | (available without sign-in) FLUX.1 [schnell] (Apache 2.0, open
       | weights, step distilled): https://fal.ai/models/fal-
       | ai/flux/schnell
       | 
       | (requires sign-in) FLUX.1 [dev] (non-commercial, open weights,
       | guidance distilled): https://fal.ai/models/fal-ai/flux/dev
       | 
       | FLUX.1 [pro] (closed source [only available thru APIs], SOTA,
       | raw): https://fal.ai/models/fal-ai/flux-pro
        
         | layer8 wrote:
         | Requires sign-in with a GitHub account, unfortunately.
        
           | wavemode wrote:
           | I think they may have turned on the gating some time after
           | this was submitted to HackerNews. Earlier this morning I
           | definitely ran the model several times without signing in at
           | all (not via GitHub, not via anything). But now it says "Sign
           | in to run".
        
             | treesciencebot wrote:
              | i just updated the links to clarify which models
              | require sign-in and which don't!
        
         | RobotToaster wrote:
         | What is the difference between schnell and dev? Just the kind
         | of distillation?
        
           | schleck8 wrote:
           | Schnell is definitely worse in quality, although still
           | impressive (it gets text right). Dev is the really good one
           | that arguably outperforms the new Midjourney 6.1
        
         | Aardwolf wrote:
         | What's the difference between pro and dev? Is the pro one also
         | 12B parameters? Are the example images on the site (the
         | patagonia guy, lego and the beach potato) generated with dev or
         | pro?
        
           | treesciencebot wrote:
            | The comparable pair is mainly -dev and -schnell; both of
            | those models are 12B. -pro is the most powerful and raw,
            | -dev is a guidance-distilled version of it, and -schnell
            | is a step-distilled version (where you can get pretty
            | good results with 2-8 steps).
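            | 
            | If you want to script it against the hosted endpoint, a
            | rough sketch with our Python client (from memory; the
            | fal-client subscribe() helper and these argument names
            | may drift):
            | 
            |     # pip install fal-client; expects FAL_KEY in the env
            |     import fal_client
            |     
            |     result = fal_client.subscribe(
            |         "fal-ai/flux/schnell",
            |         arguments={
            |             "prompt": "a lighthouse at dusk",
            |             "num_inference_steps": 4,  # 2-8 for schnell
            |         },
            |     )
            |     print(result["images"][0]["url"])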
        
         | Vinnl wrote:
         | > (available without sign-in) FLUX.1 [schnell] (Apache 2.0,
         | open weights, step distilled): https://fal.ai/models/fal-
         | ai/flux/schnell
         | 
         | Well, I was wondering about bias in the model, so I entered "a
         | president" as the prompt. Looks like it has a bias alright, but
         | it's even more specific than I expected...
        
           | teamspirit wrote:
           | You weren't kidding. Tried three times and all three were
           | variations of the same[0].
           | 
           | [0]
           | https://fal.media/files/elephant/gu3ZQ46_53BUV6lptexEh.png
        
       | zarmin wrote:
       | WILD
       | 
       | Photo of teen girl in a ski mask making an origami swan in a
       | barn. There is caption on the bottom of the image: "EAT DRUGS" in
       | yellow font. In the background there is a framed photo of obama
       | 
       | https://i.imgur.com/RifcWZc.png
       | 
       | Donald Trump on the cover of "Leopards Ate My Face" magazine
       | 
       | https://i.imgur.com/6HdBJkr.png
        
         | kevin_thibedeau wrote:
         | DT coverlines are very authentic. Garbled just like the real
         | thing.
        
         | viraptor wrote:
         | It got confused with the mask and origami unfortunately.
        
       | smusamashah wrote:
        | Tested it using prompts from Ideogram (login-walled), which
        | has great prompt adherence. Flux generated very, very good
        | images. I have been playing with Ideogram, but I don't want
        | their filters and want a similarly powerful system running
        | locally.
       | 
       | If this runs locally, this is very very close to that in terms of
       | both image quality and prompt adherence.
       | 
        | It did fail at writing text clearly when the text was a bit
        | complicated. This Ideogram image's prompt, for example:
       | https://ideogram.ai/g/GUw6Vo-tQ8eRWp9x2HONdA/0
       | 
       | > A captivating and artistic illustration of four distinct
       | creative quarters, each representing a unique aspect of
       | creativity. In the top left, a writer with a quill and inkpot is
       | depicted, showcasing their struggle with the text "THE STRUGGLE
       | IS NOT REAL 1: WRITER". The scene is comically portrayed,
       | highlighting the writer's creative challenges. In the top right,
       | a figure labeled "THE STRUGGLE IS NOT REAL 2: COPY ||PASTER" is
       | accompanied by a humorous comic drawing that satirically
       | demonstrates their approach. In the bottom left, "THE STRUGGLE IS
       | NOT REAL 3: THE RETRIER" features a character retrieving items,
       | complete with an entertaining comic illustration. Lastly, in the
       | bottom right, a remixer, identified as "THE STRUGGLE IS NOT REAL
       | 4: THE REMI
       | 
        | Otherwise, the quality is great. I stopped using Stable
        | Diffusion a long time ago; the tools and tech around it
        | became very messy and it's not fun anymore. I've been using
        | Ideogram for fun, but I want something like Ideogram that I
        | can run locally without any filters. This is looking perfect
        | so far.
        | 
        | This is not Ideogram, but it's very, very good.
        
         | benreesman wrote:
         | Ideogram handles text really well but I don't want to be on
         | some weird social network.
         | 
         | If this thing can mint memes with captions in it on a single
         | node I guess that's the weekend gone.
         | 
         | Thanks for the useful review.
        
           | smusamashah wrote:
           | Flux is amazing actually. See my other comment where I
           | verified a prompt on their fastest model. Check the linked
           | reddit thread too.
           | 
           | https://news.ycombinator.com/item?id=41132515
        
       | burkaygur wrote:
       | hi friends! burkay from fal.ai here. would like to clarify that
       | the model is NOT built by fal. all credit should go to Black
       | Forest Labs (https://blackforestlabs.ai/) which is a new co by
       | the OG stable diffusion team.
       | 
       | what we did at fal is take the model and run it on our inference
       | engine optimized to run these kinds of models really really fast.
       | feel free to give it a shot on the playgrounds.
       | https://fal.ai/models/fal-ai/flux/dev
        
         | tikkun wrote:
         | > We are excited to introduce Flux
         | 
         | I'd suggest re-wording the blog post intro, it reads as if it
         | was created by Fal.
         | 
         | Specific phrases to change:
         | 
         | > Announcing Flux
         | 
         | (from the title)
         | 
         | > We are excited to introduce Flux
         | 
         | > Flux comes in three powerful variations:
         | 
         | This section also comes across as if you created it
         | 
         | > We invite you to try Flux for yourself.
         | 
         | Reads as if you're the creator
        
           | burkaygur wrote:
           | Thanks for the feedback! Made some updates.
        
             | tikkun wrote:
             | Way better, nice
        
         | nextos wrote:
          | The name is a bit unfortunate, given that Julia's most
          | popular ML library is called Flux. See: https://fluxml.ai.
          | 
          | This library is quite well known, the 3rd most starred
          | project in Julia:
          | https://juliapackages.com/packages?sort=stars.
          | 
          | It has been around since at least 2016:
          | https://github.com/FluxML/Flux.jl/graphs/code-frequency.
        
           | refulgentis wrote:
            | There was a looong distracting thread a month ago about
            | something similar: a niche language, might have been
            | Julia, had a package with the same name as $NEW_THING.
            | 
            | I hope this one doesn't stir as much discussion. It has
            | 4000 stars; there isn't a large mass of people who view
            | the world through the lens of "Flux is the ML library".
            | No one will end up in a "who's on first?" discussion
            | because of it. If this line of argument is held
            | sacrosanct, it ends up in an infinite loop until everyone
            | gives up and starts using UUIDs.
        
             | jachee wrote:
             | Eagerly waiting for this to happen in the medication names
             | space. :)
        
           | temp_account_32 wrote:
            | I would give them a break; so many things exist in the
            | tech sector that being completely original is basically
            | impossible, unless you name your thing something
            | nonsensical.
            | 
            | Also, search engines are context-aware: if your search
            | history is full of Julia questions, they will know what
            | you're searching for.
        
           | msikora wrote:
            | Flux is also a now-obsolete application architecture for
            | ReactJS.
        
           | dheera wrote:
           | I think we've generally run out of names to give projects and
           | need to start reusing names. Maybe use letters to
           | disambiguate them.
           | 
           | Flux A is the ML library
           | 
           | Flux B is the T2I model
           | 
           | Flux C is the React library
           | 
           | Flux D is the physics concept of power per unit area
           | 
           | Flux E is the goo you put on solder
        
         | dabeeeenster wrote:
         | The unsubscribe links in your emails don't work
        
         | vessenes wrote:
          | Congrats Burkay - the model is very impressive. One area
          | I'd like to see improved in a Flux v2 is knowledge of
          | artist styles. Flux cannot respond to requests asking for
          | paintings in the style of David Hockney, Norman Rockwell,
          | or Edgar Degas -- it seems to have no fine art training at
          | all.
         | 
         | I'd bet that fine art training would further improve the
         | compositional skills of the model, plus it would open up a
         | range of uses that are (to me at least) a bit more interesting
         | than just illustrations.
        
           | warkdarrior wrote:
           | Have those artists given permission for their styles to be
           | slurped up into a model?
        
             | GaggiX wrote:
             | Give me a sec, I will contact Edgar Degas with my
             | telegraph.
        
         | frognumber wrote:
         | It would be nice to understand limits of the free tier. I
         | couldn't find that anywhere. I see pricing, but I'm generating
         | images without swiping my credit card.
         | 
         | If it's unlimited or "throttled for abuse," say that. Right
         | now, I don't know if I can try it six times or experiment to my
         | heart's desire.
        
         | Hizonner wrote:
         | You also might want to "clarify" that it is not open source
         | (and neither are any of the other "open source" models). If you
         | want to call it something, try "open weights", although the
         | usage restrictions make even that a HUGE FUCKING STRETCH.
         | 
         | Also, everybody should remember that these models are _not
         | copyrightable_ and you should never agree to any license for
         | them...
        
         | metadat wrote:
          | The playground is a drag. After accepting being forced to
          | sign up, attach my GitHub, and hand over my email address,
          | I entered the desired prompt and waited with
          | anticipation... only to see a black screen and how much
          | it's going to cost per megapixel.
         | 
         | Bummer. After seeing what was generated in the blog post I was
         | excited to try it! Now feeling disappointed.
         | 
         | I was hoping it'd be more like https://play.go.dev.
         | 
         | Good luck.
        
           | mft_ wrote:
           | https://replicate.com/black-forest-labs/flux-dev is working
           | very nicely. No sign-up.
        
             | metadat wrote:
              | Thanks, this one actually works; pretty amazing.
              | 
              | Remarkably better than the "DrawThings" iPhone app.
        
           | burkaygur wrote:
           | feel free to create a temp github account with a temp email
           | address
        
         | RobotToaster wrote:
         | If you are using the dev model, the licence isn't open source.
        
       | UncleOxidant wrote:
       | Other flux ai things: https://fluxml.ai/ , https://www.flux.ai
        
       | mikejulietbravo wrote:
       | What's the tl;dr on a difference from this to SD?
        
         | minimaxir wrote:
         | tl;dr better quality even with the least powerful model and can
         | be much faster
        
       | tantalor wrote:
       | Seems to do pretty poorly with spatial relationships.
       | 
       | "An upside down house" -> regular old house
       | 
       | "A horse sitting on a dog" -> horse and dog next to eachother
       | 
       | "An inverted Lockheed Martin F-22 Raptor" -> yikes
       | https://fal.media/files/koala/zgPYG6SqhD4Y3y_E9MONu.png
        
         | minimaxir wrote:
          | It appears the model has some "sanity" restrictions, from
          | whatever its training process is, that limit some of the
          | super weird outputs.
         | 
         | "A horse sitting on a dog" doesn't work but "A dog sitting on a
         | horse" works perfectly.
        
         | colkassad wrote:
         | Indeed: https://fal.ai/models/fal-
         | ai/flux?share=e7e98018-fd69-45c0-9...
        
         | bboygravity wrote:
         | a zebra on top of an elephant worked fine for me
        
       | j1mmie wrote:
        | I'm really impressed at its ability to output pixel-art
        | sprites. Maybe the best general-purpose model I've seen
        | capable of that. In many cases it's better than purpose-built
        | models.
        
       | cwoolfe wrote:
       | Hey, great work over at fal.ai to run this on your infrastructure
       | and for building in a free $2 in credits to try before buying.
        | For those thinking of running this at home, I'll save you
        | the trouble: Black Forest's Flux did not run easily on my
        | Apple Silicon MacBook at this time. (Please let me know if
        | you have gotten it to run on similar hardware.) Specifically,
        | it falls back to using the CPU, which is very slow. Changing
        | the device to 'mps' causes the error "BFloat16 is not
        | supported on MPS".
        
       | robotnikman wrote:
        | This is amazing! I thought it would be a few more years
        | before we would have such a high-quality model we could run
        | locally.
        
       | SirMaster wrote:
       | I tried: "Moe from The Simpsons, waving" several times. But it
       | only ever drew Lisa from The Simpsons waving.
        
       | Havoc wrote:
        | Bit annoying signup... GitHub only... and GitHub account
        | creation is currently broken ("Something went wrong"). Took
        | two tries and two browsers...
        
         | fernly wrote:
         | I had the same "something went wrong" experience, but on
         | retrying the "sign in to run" button, it was fine and had
         | logged me in.
         | 
         | Gave me a credit of 2USD to play with.
        
       | seveibar wrote:
       | whenever I see a new model I always see if it can do engineering
       | diagrams (e.g. "two square boxes at a distance of 3.5mm"), still
       | no dice on this one.
       | https://x.com/seveibar/status/1819081632575611279
       | 
        | Would love to see an AI company attack engineering diagrams
        | head-on; my current hunch is that they just aren't in the
        | training dataset. I'm very tempted to make a synthetic
        | dataset/benchmark, along the lines of the sketch below.
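        | 
        | Roughly this (hypothetical sketch, just PIL; file names and
        | ranges are made up):
        | 
        |     import json
        |     import random
        |     from PIL import Image, ImageDraw
        |     
        |     # render "two square boxes at a distance of X mm" pairs
        |     def make_sample(i, px_per_mm=10, size=40):
        |         d_mm = round(random.uniform(1.0, 10.0), 1)
        |         img = Image.new("RGB", (512, 256), "white")
        |         draw = ImageDraw.Draw(img)
        |         x0, y0 = 50, 100
        |         x1 = x0 + size + int(d_mm * px_per_mm)
        |         draw.rectangle([x0, y0, x0 + size, y0 + size],
        |                        outline="black")
        |         draw.rectangle([x1, y0, x1 + size, y0 + size],
        |                        outline="black")
        |         img.save(f"diagram_{i}.png")
        |         return {"file": f"diagram_{i}.png",
        |                 "caption": f"two square boxes at a distance "
        |                            f"of {d_mm}mm"}
        |     
        |     with open("captions.jsonl", "w") as f:
        |         for i in range(1000):
        |             f.write(json.dumps(make_sample(i)) + "\n")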
        
         | lovethevoid wrote:
         | I hope you find manual tagging of diagrams interesting, as that
         | is what you'll be doing a lot of!
        
         | napoleongl wrote:
          | Can't you get this done via an LLM, having it generate
          | code for Mermaid or D2 or something? I've been fiddling
          | around with that a bit in order to create flowcharts and
          | data models, and I'm pretty sure I've seen at least one of
          | those languages handle absolute positioning of objects.
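          | 
          | Something like this is what I mean (hypothetical sketch
          | with the OpenAI Python client; the model name and prompt
          | are placeholders):
          | 
          |     from openai import OpenAI
          |     
          |     client = OpenAI()  # expects OPENAI_API_KEY in the env
          |     
          |     resp = client.chat.completions.create(
          |         model="gpt-4o",  # placeholder model name
          |         messages=[{
          |             "role": "user",
          |             "content": "Emit only D2 code for two square "
          |                        "boxes 3.5mm apart, with absolute "
          |                        "positions.",
          |         }],
          |     )
          |     # render the emitted D2 with the d2 CLI afterwards
          |     print(resp.choices[0].message.content)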
        
           | seveibar wrote:
           | it usually isn't accurate. LLMs generally have very little
           | spatial awareness.
        
         | phkahler wrote:
         | >> Would love to see an AI company attack engineering diagrams
         | head on, my current hunch is that they just aren't in the
         | training dataset (I'm very tempted to make a synthetic
         | dataset/benchmark)
         | 
          | That seems like a good use for a speech-driven assistant
          | that knows how to use PC desktop software. Just talk to a
          | CAD program and say what you want. This seems like a long
          | way off but could be very useful.
        
         | zellyn wrote:
         | I have likewise been utterly unable to get it to generate
         | images that look like preliminary rapid pencil sketches.
         | Suggestions by experienced prompters welcome!
        
       | smusamashah wrote:
        | Holy crap, this is amazing. I saw an image with a prompt on
        | Reddit and didn't believe it was a generated image. I thought
        | it must be a joke, that people were sharing non-generated
        | images in the thread.
       | 
       | Reddit message:
       | https://www.reddit.com/r/StableDiffusion/comments/1ehh1hx/an...
       | 
       | Linked image:
       | https://preview.redd.it/dz3djnish2gd1.png?width=1024&format=...
       | 
       | The prompt:
       | 
       | > Photo of Criminal in a ski mask making a phone call in front of
       | a store. There is caption on the bottom of the image: "It's time
       | to Counter the Strike...". There is a red arrow pointing towards
       | the caption. The red arrow is from a Red circle which has an
       | image of Halo Master Chief in it.
       | 
        | Some of the images I generated with the schnell model, using
        | 8-10 steps and this prompt: https://imgur.com/a/3mM9tKf
        
       | fngjdflmdflg wrote:
       | Is the architecture outlined anywhere? Any publications or word
       | on if they will publish something in the future? To be fair to
        | them, they seem to have launched this company _today_, so I
        | doubt they have a lot of time right now. Or maybe I just
        | missed it?
        
         | minimaxir wrote:
         | You can look at the model config params for diffusers, e.g.:
         | https://huggingface.co/black-forest-labs/FLUX.1-schnell/comm...
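          | 
          | Or pull the config straight off the Hub (sketch; the exact
          | file path assumes the usual diffusers repo layout):
          | 
          |     import json
          |     from huggingface_hub import hf_hub_download
          |     
          |     # path assumed from the standard diffusers layout
          |     cfg = hf_hub_download("black-forest-labs/FLUX.1-schnell",
          |                           "transformer/config.json")
          |     print(json.dumps(json.load(open(cfg)), indent=2))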
        
           | fngjdflmdflg wrote:
            | I don't have anything to compare it to, as I'm not that
            | familiar with other diffusion models in the first place.
            | I was kind of hoping to read the key changes they made to
            | the diffusion architecture and how they collected and
            | curated their dataset. I'd assume they are also using
            | LAION, but I wonder if they are doing anything to filter
            | out low-quality images (separate from what the LAION
            | aesthetics filtering already does). Or maybe they have
            | their own dataset.
        
       | Oras wrote:
        | This is great and unbelievably fast! I noticed a small note
        | saying how much this would cost and how many images you can
        | create for $1.
        | 
        | I assume you're offering this as an API? It would be nice to
        | have a pricing page, as I didn't see one on your website.
        
       | ZoomerCretin wrote:
       | Anyone know why text-to-image models have so many fewer
       | parameters than text models? Are there any large image models
       | (>70b, 400b, etc)?
        
         | minimaxir wrote:
          | Diffusion is a very efficient way to encode/decode images.
          | 
          | The only reason diffusion isn't used for text is that text
          | requires discrete outputs.
        
       | xmly wrote:
        | Nice one. Are there plans to support both text-to-image and
        | image-to-image?
        
       | Der_Einzige wrote:
        | How long until NSFW finetunes? Don't pretend like it's not on
        | all of y'all's minds, since over half of all the models on
        | Civitai are NSFW. That's what folks in the real world
        | actually do with these models.
        
       | viraptor wrote:
       | Censored a bit, but not completely. I can get occasional boobs
       | out of it, but sometimes it just gives the black output.
        
       | mlboss wrote:
        | These venture-funded startups keep releasing models for free
        | without a business model in sight. I am all for open source,
        | but I worry it is not sustainable long-term.
        
       ___________________________________________________________________
       (page generated 2024-08-01 23:00 UTC)