[HN Gopher] FLUX.1 Kontext
       ___________________________________________________________________
        
       FLUX.1 Kontext
        
       Author : minimaxir
       Score  : 207 points
       Date   : 2025-05-29 17:40 UTC (5 hours ago)
        
 (HTM) web link (bfl.ai)
 (TXT) w3m dump (bfl.ai)
        
       | SV_BubbleTime wrote:
        | A single-shot LoRA-style effect, if it works as well as their
        | cherry-picked examples, will be a game changer for editing.
       | 
       | As with almost any AI release though, unless it's open weights, I
       | don't care. The strengths and weaknesses of these models are
       | apparent when you run them locally.
        
         | ttoinou wrote:
          | They're not apparent when you run them online?
        
       | nullbyte wrote:
        | Hopefully they list this on Hugging Face for the open-source
        | community. It looks like a great model!
        
         | minimaxir wrote:
         | The original open-source Flux releases were also on Hugging
         | Face.
        
         | vunderba wrote:
          | From their site, they will be releasing the _DEV_ version -
          | which is a distilled variant - so quality and adherence will
          | unfortunately suffer.
        
       | fortran77 wrote:
       | It still has no idea what a Model F keyboard looks like. I tried
       | prompts and editing, and got things that weren't even close.
        
         | yorwba wrote:
         | You mean when you edit a picture of a Model F keyboard and tell
         | it to use it in a scene, it still produces a different
         | keyboard?
        
         | refulgentis wrote:
         | Interesting, would you mind sharing? (imgur allows free image
         | uploads, quick drag and drop)
         | 
         | I do have a "works on my machine"* :) -- prompt "Model F
         | keyboard", all settings disabled, on the smaller model, seems
         | to have substantially more than no idea:
         | https://imgur.com/a/32pV6Sp
         | 
         | (Google Images comparison included to show in-the-wild "Model F
         | keyboard", which may differ from my/your expected distribution)
         | 
         | * my machine, being, https://playground.bfl.ai/ (I have no
         | affiliation with BFL)
        
           | jsheard wrote:
            | Your generated examples just look like generic modern-day
            | mechanical keyboards; they don't have any of the Model F's
            | defining features.
        
           | AStonesThrow wrote:
            | Your Google Images search illustrates the original problem
            | of models training on junk misinformation online. If AI
            | scrapers are downloading every photo associated with
            | "Model F keyboard" like that, the models have no idea what
            | is an IBM Model F with its distinguishing characteristics,
            | what is some other company's, and what is misidentified.
           | 
           | https://commons.wikimedia.org/wiki/Category:IBM_Model_F_Keyb.
           | ..
           | 
            | Specifying "IBM Model F keyboard" _and placing it in
            | quotation marks_ improves the search. But the front page of
            | the search is the tip of the iceberg compared to whatever
            | the model's scrapers ingested.
           | 
           | Eventually you may hit trademark protections. Reproducing a
           | brand-name keyboard may be as difficult as simulating a
           | celebrity's likeness.
           | 
           | I'm not even sure what my friends look like on Facebook, so
           | it's not clear how an AI model would reproduce a brand-name
           | keyboard design on request.
        
             | refulgentis wrote:
             | I agree with you vehemently.
             | 
             | Another way of looking at it is, insistence on complete
             | verisimilitude in an _image generator_ is fundamentally in
             | error.
             | 
             | I would argue, even undesirable. I don't want to live in a
             | world where a 45 year old keyboard that was only out for 4
             | years is readily imitated in every microscopic detail.
             | 
             | I also find myself frustrated, and asking myself why.
             | 
              | First thought that jumps in: it's very clear that it is
              | in error to say the model has _no idea_, modulo some
              | independent run that's dramatically different from the
              | only one offered in this thread.
             | 
              | Second thought: if we're doing "the image generators
              | don't get details right", there would seem to be a lot
              | simpler examples than OP's, and it is better expressed
              | that way - I assume it wasn't expressed that way because
              | it sounds like dull conversation, but it doesn't have to
              | be!
             | 
              | Third thought as to why I feel frustrated: I feel like I
              | wasted time here - no other demos showing it's anywhere
              | close to "no idea", it's completely unclear to me what's
              | distinctive about an IBM Model F keyboard, and the
              | Wikipedia images are _worse_ than Google's AFAICT.
        
         | stephen37 wrote:
          | I got it working when I provided an image of a Model F
          | keyboard. This is the strength of the model: give it an
          | input image and it will do some magic.
         | 
         | Disclaimer: I work for BFL
        
       | anjneymidha wrote:
       | Technical report here for those curious:
       | https://cdn.sanity.io/files/gsvmb6gz/production/880b07220899...
        
         | rvz wrote:
          | Unfortunately, nobody wants to read the report; what they
          | are really after is downloading the open-weight model.
          | 
          | So they can take it and run with it. (No contributing back,
          | either.)
        
           | anjneymidha wrote:
           | "FLUX.1 Kontext [dev]
           | 
           | Open-weights, distilled variant of Kontext, our most advanced
           | generative image editing model. Coming soon" is what they say
           | on https://bfl.ai/models/flux-kontext
        
             | sigmoid10 wrote:
             | Distilled is a real downer, but I guess those AI startup
             | CEOs still gotta eat.
        
               | dragonwriter wrote:
                | The open community has already done a lot with the
                | open-weights distilled models from Black Forest Labs,
                | one of the more radical examples being Chroma:
                | https://huggingface.co/lodestones/Chroma
        
           | refulgentis wrote:
            | I agree that the gooning crowd drives a _lot_ of open
            | model downloads.
            | 
            | On HN, generally, people are more into technical discussion
            | and/or productizing this stuff. Here, it seems declasse to
            | mention the gooner angle; it's usually euphemized as
            | intense reactions about refusing to download it, involving
            | the word "censor".
        
         | liuliu wrote:
          | Seems the implementation is straightforward (very similar to
          | everyone else's: HiDream-E1, ICEdit, DreamO, etc.); the
          | magic is in the data curation (details of which are only
          | lightly shared).
        
           | krackers wrote:
           | I haven't been following image generation models closely, at
           | a high level is this new Flux model still diffusion based, or
           | have they moved to block autoregressive (possibly with
           | diffusion for upscaling) similar to 4o?
        
             | liuliu wrote:
              | Diffusion-based. There is no point in moving to auto-
              | regressive if you are not also training a multimodal
              | LLM, which these companies are not doing.
        
       | amazingamazing wrote:
        | I don't understand the remove-from-face example. Without other
        | pictures showing the person's face, it's just using some
        | stereotypical image, no?
        
         | vessenes wrote:
         | Mm, depends on the underlying model and where it is in the
         | pipeline; identity models are pretty sophisticated at
         | interpolating faces from partial geometry.
        
         | Scaevolus wrote:
         | The slideshow appears to be glitched on that first example. The
         | input image has a snowflake covering most of her face.
        
         | sharkjacobs wrote:
          | There's no "truth" it's uncovering, no real face; these are
          | all just generated images, yes.
        
           | amazingamazing wrote:
            | I get that, but usually you would have two inputs: the
            | "true" reference, and the target that is to be
            | manipulated.
        
             | nine_k wrote:
             | Not necessarily. "As you may see, this is a Chinese lady.
             | You have seen a number of Chinese ladies in your training
             | set. Imagine the face of this lady so that it won't
             | contradict the fragment visible on the image with the
             | snowflake". (Damn, it's a pseudocode prompt.)
        
               | amazingamazing wrote:
               | yes, so a stereotypical image. my point is best
               | illustrated if you look at all of the photos of the
               | woman.
        
               | throwaway314155 wrote:
                | Even if you provide another image (which you totally
                | can, btw), the model is still generalizing predictions
                | enough that you can say it's just making a strong
                | guess about what is concealed.
                | 
                | I guess my main point is: "this is where you draw the
                | line? at a mostly accurate reconstruction of a partial
                | view of someone's face?" This was science fiction a
                | few years ago. Training the model to accept two images
                | (which it can, just not for the explicit purpose of
                | reconstruction (although it learns that too)) seems
                | like a very task-specific, downstream way to handle
                | this issue. This field is now about robust, general
                | ways to elicit intelligent behavior, not task-specific
                | models.
        
         | jorgemf wrote:
          | I think they are doing that because, with real images, the
          | model changes the face. That problem is removed if the
          | initial image doesn't show the face.
        
         | ilaksh wrote:
          | Look more closely at the example. Clearly there is an
          | opportunity for inference from objects that only partially
          | obscure the face.
        
         | pkkkzip wrote:
          | They chose Asian traits that Western beauty standards
          | fetishize but that in Asia wouldn't be taken seriously at
          | all.
          | 
          | I notice American text2image models tend to generate less
          | attractive and darker-skinned humans, whereas Chinese
          | text2image models generate attractive and lighter-skinned
          | humans.
          | 
          | I think this is another area where Chinese AI models shine.
        
           | throwaway314155 wrote:
           | > notice American text2image models tend to generate less
           | attractive and more darker skinned humans where as Chinese
           | text2image generate attractive and more light skinned humans
           | 
           | This seems entirely subjective to me.
        
           | viraptor wrote:
           | > They chosen Asian traits that Western beauty standards
           | fetishize that in Asia wouldn't be taken serious at all.
           | 
           | > where as Chinese text2image generate attractive and more
           | light skinned humans.
           | 
           | Are you saying they have chosen Asian traits that Asian
           | beauty standards fetishize that in the West wouldn't be taken
           | seriously at all? ;) There is no ground truth here that would
           | be more correct one way or the other.
        
           | turnsout wrote:
           | Wow, that is some straight-up overt racism. You should be
           | ashamed.
        
       | vessenes wrote:
       | Pretty good!
       | 
       | I like that they are testing face and scene coherence with
       | iterated edits -- major pain point for 4o and other models.
        
       | ttoinou wrote:
        | How knowledgeable do you need to be to tweak and train this
        | locally?
        | 
        | I spent two days trying to train a LoRA customization on top
        | of Flux 1 dev on Windows with my RTX 4090, but I can't make it
        | work, and I don't know how deep into this topic and the Python
        | libraries I need to go. Are there script kiddies in this game,
        | or only experts?
        
         | minimaxir wrote:
         | The open-source model is not released yet, but it definitely
         | won't be any easier than training a LoRA on Flux 1 Dev.
        
           | ttoinou wrote:
           | Damn, I'm just too lazy to learn skills that will be outdated
           | in 6 months
        
             | johnnyApplePRNG wrote:
             | And yet you're not too lazy to explain your laziness in
             | more than the 5 words it required on a social media site.
             | Hm.
        
               | ttoinou wrote:
               | I'm selectively lazy for sure. I'm working full time
               | right now. Like, all the time except for sleep
        
               | layer8 wrote:
               | It sounds you're not lazy enough, that's not healthy
               | longer term.
        
         | Flemlo wrote:
         | It's normally easy to find it ore configured through comfyui.
         | 
         | Sometimes behind patreon if some YouTuber
        
         | throwaway675117 wrote:
          | Just use https://github.com/bghira/SimpleTuner
          | 
          | I was able to run this script to train a LoRA myself without
          | spending any time learning the underlying Python libraries.
        
           | ttoinou wrote:
           | Well thank you I will test that
        
             | dagaci wrote:
              | SimpleTuner is dependent on Microsoft's DeepSpeed, which
              | doesn't work on Windows :)
              | 
              | So you're probably better off using AI-Toolkit
              | https://github.com/ostris/ai-toolkit
        
         | 3abiton wrote:
         | > I spent two days trying to train a LoRa customization on top
         | of Flux 1 dev on Windows with my RTX 4090 but can't make
         | 
          | Windows is mostly the issue; to really take advantage, you
          | will need Linux.
        
           | ttoinou wrote:
           | Even using WSL2 with Ubuntu isn't good enough ?
        
       | minimaxir wrote:
       | Currently am testing this out (using the Replicate endpoint:
       | https://replicate.com/black-forest-labs/flux-kontext-pro).
       | Replicate also hosts "apps" with examples using FLUX Kontext for
       | some common use cases of image editing:
       | https://replicate.com/flux-kontext-apps
       | 
       | It's pretty good: quality of the generated images is similar to
       | that of GPT-4o image generation if you were using it for simple
        | image-to-image generations. Generation is speedy, at ~4
        | seconds per generation.
       | 
       | Prompt engineering outside of the examples used on this page is a
       | little fussy and I suspect will evolve over time. Changing styles
       | or specific aspects does indeed work, but the more specific you
       | get, the more it tends to ignore the specifics.
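The Replicate endpoint mentioned above can be driven from Python with the `replicate` client. A minimal sketch: the input field names here ("prompt", "input_image", "aspect_ratio") are assumptions patterned on typical Replicate image-model schemas, not taken from the model's actual API page, so check the schema there before relying on them:

```python
# Hedged sketch of calling flux-kontext-pro on Replicate.
# Field names in the payload are assumptions -- verify against the
# model's API schema on replicate.com before use.
import os

MODEL = "black-forest-labs/flux-kontext-pro"

def build_input(prompt: str, image_url: str,
                aspect_ratio: str = "match_input_image") -> dict:
    """Assemble an edit request: the image to modify plus an instruction."""
    return {
        "prompt": prompt,           # the edit instruction
        "input_image": image_url,   # URL of the image to edit
        "aspect_ratio": aspect_ratio,
    }

payload = build_input("Make it a 90s cartoon",
                      "https://example.com/photo.png")

# Only attempt a real call when credentials are configured.
if os.environ.get("REPLICATE_API_TOKEN"):
    import replicate                # pip install replicate
    output = replicate.run(MODEL, input=payload)
    print(output)                   # URL(s) of the generated image
```

The same payload shape should work against the hosted "apps" linked above, since they are thin wrappers over the base model.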
        
         | skipants wrote:
         | > Generation is speedy at about ~4 seconds per generation
         | 
         | May I ask on which GPU & VRAM?
         | 
         | edit: oh unless you just meant through huggingface's UI
        
           | minimaxir wrote:
            | It is through Replicate's UI as listed, which goes through
            | Black Forest Labs's infra, so you would likely get the
            | same results from their API.
        
           | zamadatix wrote:
           | The open weights variant is "coming soon" so the only option
           | is hosted right now.
        
         | cuuupid wrote:
         | Honestly love Replicate for always being up to date. It's
         | amazing that not only do we live in a time of rapid AI
         | advancement, but that every new research grade model is
         | immediately available via API and can be used in prod, at
         | scale, no questions asked.
         | 
            | There's something to be said about distributors like
            | Replicate etc. that are adding an exponent to the impact
            | of these model releases.
        
           | minimaxir wrote:
            | That's less on the downstream distributors and more on the
            | model developers themselves realizing that ease of
            | accessibility on Day 1 is important for getting community
            | traction. Locking the model exclusively behind their own
            | API won't work anymore.
           | 
           | Llama 4 was another recent case where they explicitly worked
           | with downstream distributors to get it working Day 1.
        
           | meowface wrote:
           | I have no affiliation with either company but from using both
           | a bunch as a customer: Replicate has a competitor at
           | https://fal.ai/models and FAL's generation speed is
           | consistently faster across every model I've tried. They have
           | some sub-100 ms image gen models, too.
           | 
           | Replicate has a much bigger model selection. But for every
           | model that's on both, FAL is pretty much "Replicate but
           | faster". I believe pricing is pretty similar.
        
             | echelon wrote:
             | A16Z invested in both. It's wild. They've been absolutely
             | flooding the GenAI market for images and videos with
             | investments.
        
         | a2128 wrote:
          | It seems more accurate than 4o image generation in terms of
          | preserving original details. If I give it my 3D animal
          | character and ask for a minor change like adjusting the
          | lighting, 4o will completely mangle the face of my character
          | and slightly change the body and other details. This Flux
          | model keeps the visible geometry almost perfectly the same,
          | even when asked to significantly change the pose or
          | lighting.
        
       | andybak wrote:
       | Nobody tested that page on mobile.
        
       | jnettome wrote:
       | I'm trying to login to evaluate this but the google auth
       | redirects me back to localhost:3000
        
       | vunderba wrote:
        | I'm debating whether to add the FLUX Kontext model to my GenAI
        | image comparison site. The Max variant of the model definitely
        | scores higher in _prompt adherence_, nearly doubling Flux
        | 1.dev's score but still falling short of OpenAI's gpt-image-1,
        | which (visual fidelity aside) is sitting at the top of the
        | leaderboard.
       | 
       | I liked keeping Flux 1.D around just to have a nice baseline for
       | local GenAI capabilities.
       | 
       | https://genai-showdown.specr.net
       | 
       | Incidentally, we did add the newest release of Hunyuan's Image
       | 2.0 model but as expected of a real-time model it scores rather
       | poorly.
       | 
        |  _EDIT: In fairness to Black Forest Labs, this model
        | definitely seems to be more focused on editing capabilities -
        | refining and iterating on existing images - than on strict
        | text-to-image creation._
        
         | Klaus23 wrote:
          | Nice site! I have a suggestion for a prompt that I could
          | never get to work properly. It's been a while since I tried
          | it, and the models have probably improved enough that it
          | should be possible now.
          | 
          |     A knight with a sword in hand stands with his back to
          |     us, facing down an army. He holds his shield above his
          |     head to protect himself from the rain of arrows shot by
          |     archers visible in the rear.
         | 
         | I was surprised at how badly the models performed. It's a
         | fairly iconic scene, and there's more than enough training
         | data.
        
       | vunderba wrote:
        | Some of these samples are rather cherry-picked. Has anyone
       | actually tried the professional headshot app of the "Kontext
       | Apps"?
       | 
       | https://replicate.com/flux-kontext-apps
       | 
       | I've thrown half a dozen pictures of myself at it and it just
       | completely replaced me with somebody else. To be fair, the final
       | headshot does look very professional.
        
         | minimaxir wrote:
         | Is the input image aspect ratio the same as the output aspect
         | ratio? In some testing I've noticed that there is weirdness
         | that happens if there is a forced shift.
        
         | doctorpangloss wrote:
         | Nobody has solved the scientific problem of identity
         | preservation for faces in one shot. Nobody has even solved
         | hands.
        
       | layer8 wrote:
       | > show me a closeup of...
       | 
       | Investigators will love this for "enhance". ;)
        
       | ilaksh wrote:
       | Anyone have a guess as to when the open dev version gets
       | released? More like a week or a month or two I wonder.
        
       | mdp2021 wrote:
       | Is input restricted to a single image? If you could use more
       | images as input, you could do prompts like "Place the item in
       | image A inside image B" (e.g. "put the character of image A in
       | the scenery of image B"), etc.
        
         | carlosdp wrote:
          | There's an experimental "multi" mode to which you can input
          | multiple images.
        
         | echelon wrote:
         | Fal has the multi image interface to test against.
         | 
         | THIS MODEL ROCKS!
         | 
         | It's no gpt-image-1, but it's ridiculously close.
         | 
         | There isn't going to be a moat in images or video. I was so
         | worried Google and OpenAI would win creative forever. Not so.
         | Anyone can build these.
        
       | bossyTeacher wrote:
        | I wonder if this is using a foundation model or a fine-tuned
        | one.
        
       | fagerhult wrote:
       | I vibed up a little chat interface https://kontext-
       | chat.vercel.app/
        
       | eamag wrote:
       | Can it generate chess? https://manifold.markets/Hazel/an-ai-
       | model-will-successfully...
        
       ___________________________________________________________________
       (page generated 2025-05-29 23:00 UTC)