[HN Gopher] FLUX.1 Kontext
___________________________________________________________________
FLUX.1 Kontext
Author : minimaxir
Score : 207 points
Date : 2025-05-29 17:40 UTC (5 hours ago)
(HTM) web link (bfl.ai)
(TXT) w3m dump (bfl.ai)
| SV_BubbleTime wrote:
| A single-shot LoRA-style effect, if it works as well as their
| cherry-picked examples suggest, will be a game changer for
| editing.
|
| As with almost any AI release though, unless it's open weights, I
| don't care. The strengths and weaknesses of these models are
| apparent when you run them locally.
| ttoinou wrote:
| They're not apparent when you run them online?
| nullbyte wrote:
| Hopefully they list this on Hugging Face for the open-source
| community. It looks like a great model!
| minimaxir wrote:
| The original open-source Flux releases were also on Hugging
| Face.
| vunderba wrote:
| From their site they will be releasing the _DEV_ version -
| which is a distilled variant - so quality and adherence will
| suffer unfortunately.
| fortran77 wrote:
| It still has no idea what a Model F keyboard looks like. I tried
| prompts and editing, and got things that weren't even close.
| yorwba wrote:
| You mean when you edit a picture of a Model F keyboard and tell
| it to use it in a scene, it still produces a different
| keyboard?
| refulgentis wrote:
| Interesting, would you mind sharing? (imgur allows free image
| uploads, quick drag and drop)
|
| I do have a "works on my machine"* :) -- prompt "Model F
| keyboard", all settings disabled, on the smaller model, seems
| to have substantially more than no idea:
| https://imgur.com/a/32pV6Sp
|
| (Google Images comparison included to show in-the-wild "Model F
| keyboard", which may differ from my/your expected distribution)
|
| * my machine, being, https://playground.bfl.ai/ (I have no
| affiliation with BFL)
| jsheard wrote:
| Your generated examples just look like generic modern-day
| mechanical keyboards; they don't have any of the Model F's
| defining features.
| AStonesThrow wrote:
| Your Google Images search indicates the original problem of
| models training on junk misinformation online. If AI scrapers
| are downloading every photo that's associated with "Model F
| Keyboard" like that, the models have no idea what is an IBM
| Model F, or its distinguishing characteristics, and what is
| some other company's, and what is misidentified.
|
| https://commons.wikimedia.org/wiki/Category:IBM_Model_F_Keyb...
|
| Specifying "IBM Model F keyboard" _and placing it in quotation
| marks_ improves the search. But the front page of the search is
| the tip of the iceberg compared to whatever the model's
| scrapers ingested.
|
| Eventually you may hit trademark protections. Reproducing a
| brand-name keyboard may be as difficult as simulating a
| celebrity's likeness.
|
| I'm not even sure what my friends look like on Facebook, so
| it's not clear how an AI model would reproduce a brand-name
| keyboard design on request.
| refulgentis wrote:
| I agree with you vehemently.
|
| Another way of looking at it is, insistence on complete
| verisimilitude in an _image generator_ is fundamentally in
| error.
|
| I would argue, even undesirable. I don't want to live in a
| world where a 45 year old keyboard that was only out for 4
| years is readily imitated in every microscopic detail.
|
| I also find myself frustrated, and asking myself why.
|
| First thought that jumps in: it's very clear that it is in
| error to say the model has _no idea_, modulo some independent
| run that's dramatically different from the only one offered in
| this thread.
|
| Second thought: if we're doing "the image generators don't
| get details right", there would seem to be a lot simpler
| examples than OP's, and it is better expressed that way - I
| assume it wasn't expressed that way because it sounds like
| dull conversation, but it doesn't have to be!
|
| Third thought as to why I feel frustrated: I feel like I
| wasted time here - no other demos showing it's anywhere
| close to "no idea", it's completely unclear to me what's
| distinctive about an IBM Model F keyboard, and the Wikipedia
| images are _worse_ than Google's AFAICT.
| stephen37 wrote:
| I got it working when I provided an image of a Model F
| keyboard. That's the strength of the model: give it an input
| image and it will do some magic.
|
| Disclaimer: I work for BFL
| anjneymidha wrote:
| Technical report here for those curious:
| https://cdn.sanity.io/files/gsvmb6gz/production/880b07220899...
| rvz wrote:
| Unfortunately, nobody wants to read the report; what they're
| really after is downloading the open-weight model so they can
| take it and run with it (no contributing back, either).
| anjneymidha wrote:
| "FLUX.1 Kontext [dev]
|
| Open-weights, distilled variant of Kontext, our most advanced
| generative image editing model. Coming soon" is what they say
| on https://bfl.ai/models/flux-kontext
| sigmoid10 wrote:
| Distilled is a real downer, but I guess those AI startup
| CEOs still gotta eat.
| dragonwriter wrote:
| The open community has done a lot with the open-weights
| distilled models from Black Forest Labs already, one of
| the more radical being Chroma:
| https://huggingface.co/lodestones/Chroma
| refulgentis wrote:
| I agree that the gooning crew drives a _lot_ of open model
| downloads.
|
| On HN, generally, people are more into technical discussion
| and/or productizing this stuff. Here it seems déclassé to
| mention the gooner angle; it's usually euphemized as intense
| reactions about refusing to download it, involving the word
| "censor".
| liuliu wrote:
| Seems the implementation is straightforward (very similar to
| everyone else's: HiDream-E1, ICEdit, DreamO, etc.); the magic
| is in the data curation, whose details are only lightly
| shared.
| krackers wrote:
| I haven't been following image generation models closely, at
| a high level is this new Flux model still diffusion based, or
| have they moved to block autoregressive (possibly with
| diffusion for upscaling) similar to 4o?
| liuliu wrote:
| Diffusion based. There is no point to move to auto-
| regressive if you are not also training a multimodality
| LLM, which these companies are not doing that.
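For the curious, here is a toy Python sketch of the diffusion idea being contrasted above: sampling starts from pure noise and repeatedly steps toward the model's current denoised estimate, rather than emitting an image token-by-token as an autoregressive model would. The `denoise` callback is a stand-in for the trained network, and the linear schedule is a simplification; real samplers (DDPM/DDIM, or the flow-matching formulation FLUX reportedly uses) are considerably more careful.

```python
import numpy as np

def toy_diffusion_sample(denoise, shape, steps=50, seed=0):
    """Toy diffusion sampler: begin at Gaussian noise and iteratively
    move toward the denoiser's estimate of the clean image.
    `denoise(x, t)` stands in for the trained network."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)          # start from pure noise
    for i in range(steps):
        t = 1.0 - i / steps                 # time runs from 1 (noisy) to 0
        x0_hat = denoise(x, t)              # network's guess of the clean image
        x = x + (x0_hat - x) / (steps - i)  # step part-way toward the guess
    return x
```

With a denoiser that always predicts an all-zero image, the sampler converges to exactly that image, which is the behavior the loop is meant to illustrate.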
| amazingamazing wrote:
| I don't understand the remove-from-face example. Without other
| pictures showing the person's face, it's just using some
| stereotypical face, no?
| vessenes wrote:
| Mm, depends on the underlying model and where it is in the
| pipeline; identity models are pretty sophisticated at
| interpolating faces from partial geometry.
| Scaevolus wrote:
| The slideshow appears to be glitched on that first example. The
| input image has a snowflake covering most of her face.
| sharkjacobs wrote:
| There's no "truth" it's uncovering, no real face, these are all
| just generated images, yes.
| amazingamazing wrote:
| I get that, but usually you would have two inputs: the "true"
| reference, and the target that is to be manipulated.
| nine_k wrote:
| Not necessarily. "As you may see, this is a Chinese lady.
| You have seen a number of Chinese ladies in your training
| set. Imagine the face of this lady so that it won't
| contradict the fragment visible on the image with the
| snowflake". (Damn, it's a pseudocode prompt.)
| amazingamazing wrote:
| yes, so a stereotypical image. my point is best
| illustrated if you look at all of the photos of the
| woman.
| throwaway314155 wrote:
| Even if you provide another image (which you totally can
| btw) the model is still generalizing predictions enough
| that you can say it's just making a strong guess about
| what is concealed.
|
| I guess my main point is: "this is where you draw the
| line? At a mostly accurate reconstruction of part of
| someone's face?" This was science fiction a few years ago.
| Training the model to accept two images (which it can, just
| not explicitly for reconstruction (although it learns that
| too)) seems like a very task-specific, downstream way to
| handle this issue. This field is now about robust, general
| ways to elicit intelligent behavior, not task-specific
| models.
| jorgemf wrote:
| I think they are doing that because using real images the model
| changes the face. So that problem is removed if the initial
| image doesn't show the face
| ilaksh wrote:
| Look more closely at the example. Clearly there is an
| opportunity for inference with objects that only partially
| obscure.
| pkkkzip wrote:
| They chosen Asian traits that Western beauty standards
| fetishize that in Asia wouldn't be taken serious at all.
|
| I notice American text2image models tend to generate less
| attractive and more darker skinned humans where as Chinese
| text2image generate attractive and more light skinned humans.
|
| I think this is another area where Chinese AI models shine.
| throwaway314155 wrote:
| > notice American text2image models tend to generate less
| attractive and more darker skinned humans where as Chinese
| text2image generate attractive and more light skinned humans
|
| This seems entirely subjective to me.
| viraptor wrote:
| > They chosen Asian traits that Western beauty standards
| fetishize that in Asia wouldn't be taken serious at all.
|
| > where as Chinese text2image generate attractive and more
| light skinned humans.
|
| Are you saying they have chosen Asian traits that Asian
| beauty standards fetishize that in the West wouldn't be taken
| seriously at all? ;) There is no ground truth here that would
| be more correct one way or the other.
| turnsout wrote:
| Wow, that is some straight-up overt racism. You should be
| ashamed.
| vessenes wrote:
| Pretty good!
|
| I like that they are testing face and scene coherence with
| iterated edits -- major pain point for 4o and other models.
| ttoinou wrote:
| How knowledgeable do you need to be to tweak and train this
| locally?
|
| I spent two days trying to train a LoRA customization on top
| of Flux 1 dev on Windows with my RTX 4090 but can't make it
| work, and I don't know how deeply into this topic and the
| Python libraries I need to study. Are there script kiddies in
| this game, or only experts?
| minimaxir wrote:
| The open-source model is not released yet, but it definitely
| won't be any easier than training a LoRA on Flux 1 Dev.
| ttoinou wrote:
| Damn, I'm just too lazy to learn skills that will be outdated
| in 6 months
| johnnyApplePRNG wrote:
| And yet you're not too lazy to explain your laziness in
| more than the 5 words it required on a social media site.
| Hm.
| ttoinou wrote:
| I'm selectively lazy for sure. I'm working full time
| right now. Like, all the time except for sleep
| layer8 wrote:
| It sounds you're not lazy enough, that's not healthy
| longer term.
| Flemlo wrote:
| It's normally easy to find it preconfigured through ComfyUI.
|
| Sometimes behind a Patreon, if it's from some YouTuber.
| throwaway675117 wrote:
| Just use https://github.com/bghira/SimpleTuner
|
| I was able to run this script to train a LoRA myself without
| spending any time learning the underlying Python libraries.
| ttoinou wrote:
| Well thank you I will test that
| dagaci wrote:
| SimpleTuner is dependent on Microsoft's DeepSpeed, which
| doesn't work on Windows :)
|
| So you're probably better off using AI Toolkit:
| https://github.com/ostris/ai-toolkit
| 3abiton wrote:
| > I spent two days trying to train a LoRa customization on top
| of Flux 1 dev on Windows with my RTX 4090 but can't make
|
| Windows is mostly the issue; to really take advantage, you
| will need Linux.
| ttoinou wrote:
| Even using WSL2 with Ubuntu isn't good enough?
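For context on what a LoRA actually is, independent of any particular trainer (SimpleTuner, AI Toolkit, etc.): it freezes the base model's weights and learns only a small low-rank correction, which is why fine-tuning fits on a single consumer GPU. A minimal NumPy sketch of the idea, not any specific library's API:

```python
import numpy as np

class LoRALinear:
    """A frozen linear layer plus a trainable low-rank update:
    y = x W^T + (alpha / r) * x A^T B^T.
    Only A and B are trained: r * (d_in + d_out) numbers instead of
    d_in * d_out, which is why a LoRA for a huge model stays small."""

    def __init__(self, W: np.ndarray, r: int = 8, alpha: float = 16.0):
        d_out, d_in = W.shape
        self.W = W                                       # frozen base weights
        rng = np.random.default_rng(0)
        self.A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
        self.B = np.zeros((d_out, r))                    # trainable up-projection
        self.scale = alpha / r                           # standard LoRA scaling

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # B is zero-initialized, so the adapter starts as a no-op and
        # training only gradually deviates from the base model.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T
```

The zero-initialized `B` means an untrained LoRA reproduces the base layer exactly; the trainers discussed in this thread optimize `A` and `B` against your custom images while `W` stays untouched.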
| minimaxir wrote:
| Currently am testing this out (using the Replicate endpoint:
| https://replicate.com/black-forest-labs/flux-kontext-pro).
| Replicate also hosts "apps" with examples using FLUX Kontext for
| some common use cases of image editing:
| https://replicate.com/flux-kontext-apps
|
| It's pretty good: the quality of the generated images is
| similar to that of GPT-4o image generation when used for
| simple image-to-image tasks. Generation is speedy at about 4
| seconds per generation.
|
| Prompt engineering outside of the examples used on this page is a
| little fussy and I suspect will evolve over time. Changing styles
| or specific aspects does indeed work, but the more specific you
| get, the more it tends to ignore the specifics.
| skipants wrote:
| > Generation is speedy at about ~4 seconds per generation
|
| May I ask on which GPU & VRAM?
|
| edit: oh unless you just meant through huggingface's UI
| minimaxir wrote:
| It's through the Replicate UI linked above, which goes
| through Black Forest Labs's infra, so you would likely get
| the same results from their API.
| zamadatix wrote:
| The open weights variant is "coming soon" so the only option
| is hosted right now.
| cuuupid wrote:
| Honestly love Replicate for always being up to date. It's
| amazing that not only do we live in a time of rapid AI
| advancement, but that every new research grade model is
| immediately available via API and can be used in prod, at
| scale, no questions asked.
|
| There's something to be said for distributors like Replicate
| that are multiplying the impact of these model releases.
| minimaxir wrote:
| That's less about the downstream distributors and more about
| the model developers realizing that ease of access to the
| models on Day 1 is important for getting community traction.
| Locking a model exclusively behind their own API won't work
| anymore.
|
| Llama 4 was another recent case where they explicitly worked
| with downstream distributors to get it working Day 1.
| meowface wrote:
| I have no affiliation with either company but from using both
| a bunch as a customer: Replicate has a competitor at
| https://fal.ai/models and FAL's generation speed is
| consistently faster across every model I've tried. They have
| some sub-100 ms image gen models, too.
|
| Replicate has a much bigger model selection. But for every
| model that's on both, FAL is pretty much "Replicate but
| faster". I believe pricing is pretty similar.
| echelon wrote:
| A16Z invested in both. It's wild. They've been absolutely
| flooding the GenAI market for images and videos with
| investments.
| a2128 wrote:
| It seems more accurate than 4o image generation in terms of
| preserving original details. If I give it my 3D animal
| character and ask for a minor change like adjusting the
| lighting, 4o will completely mangle the character's face and
| slightly change the body and other details. This Flux model
| keeps the visible geometry almost perfectly the same, even
| when asked to significantly change the pose or lighting.
| andybak wrote:
| Nobody tested that page on mobile.
| jnettome wrote:
| I'm trying to login to evaluate this but the google auth
| redirects me back to localhost:3000
| vunderba wrote:
| I'm debating whether to add the FLUX Kontext model to my GenAI
| image comparison site. The Max variant of the model definitely
| scores higher in _prompt adherence_, nearly doubling Flux
| 1.dev's score, but it still falls short of OpenAI's
| gpt-image-1, which (visual fidelity aside) is sitting at the
| top of the leaderboard.
|
| I liked keeping Flux 1.D around just to have a nice baseline for
| local GenAI capabilities.
|
| https://genai-showdown.specr.net
|
| Incidentally, we did add the newest release of Hunyuan's Image
| 2.0 model but as expected of a real-time model it scores rather
| poorly.
|
| _EDIT: In fairness to Black Forest Labs this model definitely
| seems to be more focused on editing capabilities to refine and
| iterate on existing images rather than on strict text-to-image
| creation._
| Klaus23 wrote:
| Nice site! I have a suggestion for a prompt that I could never
| get to work properly. It's been a while since I tried it, and
| the models have probably improved enough that it should be
| possible now: "A knight with a sword in hand stands with his
| back to us, facing down an army. He holds his shield above
| his head to protect himself from the rain of arrows shot by
| archers visible in the rear."
|
| I was surprised at how badly the models performed. It's a
| fairly iconic scene, and there's more than enough training
| data.
| vunderba wrote:
| Some of these samples are rather cherry picked. Has anyone
| actually tried the professional headshot app of the "Kontext
| Apps"?
|
| https://replicate.com/flux-kontext-apps
|
| I've thrown half a dozen pictures of myself at it and it just
| completely replaced me with somebody else. To be fair, the final
| headshot does look very professional.
| minimaxir wrote:
| Is the input image aspect ratio the same as the output aspect
| ratio? In some testing I've noticed that there is weirdness
| that happens if there is a forced shift.
| doctorpangloss wrote:
| Nobody has solved the scientific problem of identity
| preservation for faces in one shot. Nobody has even solved
| hands.
| layer8 wrote:
| > show me a closeup of...
|
| Investigators will love this for "enhance". ;)
| ilaksh wrote:
| Anyone have a guess as to when the open dev version gets
| released? More like a week or a month or two I wonder.
| mdp2021 wrote:
| Is input restricted to a single image? If you could use more
| images as input, you could do prompts like "Place the item in
| image A inside image B" (e.g. "put the character of image A in
| the scenery of image B"), etc.
| carlosdp wrote:
| There's an experimental "multi" mode that accepts multiple
| input images.
| echelon wrote:
| Fal has the multi image interface to test against.
|
| THIS MODEL ROCKS!
|
| It's no gpt-image-1, but it's ridiculously close.
|
| There isn't going to be a moat in images or video. I was so
| worried Google and OpenAI would win creative forever. Not so.
| Anyone can build these.
| bossyTeacher wrote:
| I wonder if this is using a foundation model or a fine-tuned one
| fagerhult wrote:
| I vibed up a little chat interface:
| https://kontext-chat.vercel.app/
| eamag wrote:
| Can it generate chess?
| https://manifold.markets/Hazel/an-ai-model-will-successfully...
___________________________________________________________________
(page generated 2025-05-29 23:00 UTC)