[HN Gopher] FLUX.1 Kontext
       ___________________________________________________________________
        
       FLUX.1 Kontext
        
       Author : minimaxir
       Score  : 495 points
        Date   : 2025-05-29 17:40 UTC (1 day ago)
        
 (HTM) web link (bfl.ai)
 (TXT) w3m dump (bfl.ai)
        
       | SV_BubbleTime wrote:
        | A single-shot LoRA-style effect, if it works as well as their
        | cherry-picked examples suggest, will be a game changer for
        | editing.
       | 
       | As with almost any AI release though, unless it's open weights, I
       | don't care. The strengths and weaknesses of these models are
       | apparent when you run them locally.
        
         | ttoinou wrote:
          | They're not apparent when you run them online?
        
           | SV_BubbleTime wrote:
           | Not for what these models are actually being used for.
        
       | nullbyte wrote:
        | Hopefully they list this on HuggingFace for the open-source
        | community. It looks like a great model!
        
         | minimaxir wrote:
         | The original open-source Flux releases were also on Hugging
         | Face.
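          | 
          | For reference, a minimal sketch of loading one of those
          | earlier open-weight releases (FLUX.1 [dev]) from Hugging Face
          | with the diffusers library. The sampler settings are
          | illustrative assumptions, not official recommendations, and
          | the repo is gated behind a license acceptance:
          | 
          |     import torch
          |     from diffusers import FluxPipeline
          | 
          |     pipe = FluxPipeline.from_pretrained(
          |         "black-forest-labs/FLUX.1-dev",
          |         torch_dtype=torch.bfloat16,
          |     )
          |     pipe.enable_model_cpu_offload()  # helps on consumer GPUs
          | 
          |     image = pipe(
          |         "an IBM Model F keyboard on a wooden desk",
          |         guidance_scale=3.5,       # assumed default
          |         num_inference_steps=28,   # assumed default
          |     ).images[0]
          |     image.save("flux-dev-sample.png")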
        
         | vunderba wrote:
         | From their site they will be releasing the _DEV_ version -
         | which is a distilled variant - so quality and adherence will
         | suffer unfortunately.
        
       | fortran77 wrote:
       | It still has no idea what a Model F keyboard looks like. I tried
       | prompts and editing, and got things that weren't even close.
        
         | yorwba wrote:
         | You mean when you edit a picture of a Model F keyboard and tell
         | it to use it in a scene, it still produces a different
         | keyboard?
        
         | refulgentis wrote:
         | Interesting, would you mind sharing? (imgur allows free image
         | uploads, quick drag and drop)
         | 
         | I do have a "works on my machine"* :) -- prompt "Model F
         | keyboard", all settings disabled, on the smaller model, seems
         | to have substantially more than no idea:
         | https://imgur.com/a/32pV6Sp
         | 
         | (Google Images comparison included to show in-the-wild "Model F
         | keyboard", which may differ from my/your expected distribution)
         | 
         | * my machine, being, https://playground.bfl.ai/ (I have no
         | affiliation with BFL)
        
           | jsheard wrote:
            | Your generated examples just look like generic modern-day
            | mechanical keyboards; they don't have any of the Model F's
            | defining features.
        
           | AStonesThrow wrote:
           | Your Google Images search indicates the original problem of
           | models training on junk misinformation online. If AI scrapers
           | are downloading every photo that's associated with "Model F
           | Keyboard" like that, the models have no idea what is an IBM
           | Model F, or its distinguishing characteristics, and what is
           | some other company's, and what is misidentified.
           | 
           | https://commons.wikimedia.org/wiki/Category:IBM_Model_F_Keyb.
           | ..
           | 
           | Specifying "IBM Model F keyboard" _and placing it in
           | quotation marks_ improves the search. But the front page of
           | the search is tip-of-the-iceberg compared to whatever the
           | model 's scrapers ingested.
           | 
           | Eventually you may hit trademark protections. Reproducing a
           | brand-name keyboard may be as difficult as simulating a
           | celebrity's likeness.
           | 
           | I'm not even sure what my friends look like on Facebook, so
           | it's not clear how an AI model would reproduce a brand-name
           | keyboard design on request.
        
             | refulgentis wrote:
             | I agree with you vehemently.
             | 
             | Another way of looking at it is, insistence on complete
             | verisimilitude in an _image generator_ is fundamentally in
             | error.
             | 
             | I would argue, even undesirable. I don't want to live in a
             | world where a 45 year old keyboard that was only out for 4
             | years is readily imitated in every microscopic detail.
             | 
             | I also find myself frustrated, and asking myself why.
             | 
              | First thought that jumps in: it's very clear that it is an
              | error to say the model has _no idea_, modulo some
              | independent run that's dramatically different from the
              | only one offered in this thread.
             | 
              | Second thought: if we're doing "the image generators don't
              | get details right", there would seem to be a lot simpler
              | examples than OP's, and it is better expressed that way - I
              | assume it wasn't expressed that way because it sounds like
              | dull conversation, but it doesn't have to be!
             | 
              | Third thought as to why I feel frustrated: I feel like I
              | wasted time here - no other demos showing it's anywhere
              | close to "no idea", it's completely unclear to me what's
              | distinctive about an IBM Model F keyboard, and the
              | Wikipedia images are _worse_ than Google's AFAICT.
        
               | fc417fc802 wrote:
                | > if we're doing "the image generators don't get details
                | right", there would seem to be a lot simpler examples
                | than OP's
               | 
               | There are different sorts of details though, and the
               | distinctions are both useful and interesting to
               | understanding the state of the art. If "man drinking
               | coke" produces someone with 6 fingers holding a glass of
               | water that's completely different from producing someone
               | with 5 fingers holding a can of pepsi.
               | 
               | Notice that none of the images in your example got the
               | function key placement correct. Clearly the model knows
               | what a relatively modern keyboard is, and it even has
               | some concept of a vaguely retro looking mechanical
               | keyboard. However indeed I'm inclined to agree with OP
               | that it has approximately zero idea what an "IBM model F"
               | keyboard is. I'm not sure that's a failure of the model
               | though - as you point out, it's an ancient and fairly
               | obscure product.
        
             | fc417fc802 wrote:
             | > Eventually you may hit trademark protections. Reproducing
             | a brand-name keyboard may be as difficult as simulating a
             | celebrity's likeness.
             | 
             | Then the law is broken. Monetizing someone's likeness is an
             | issue. Utilizing trademarked characteristics to promote
             | your own product without permission is an issue. It's the
             | downstream actions of the user that are the issue, not the
             | ML model itself.
             | 
             | Models regurgitating copyrighted material verbatim is of
             | course an entirely separate issue.
        
               | refulgentis wrote:
               | >> Eventually you may hit trademark protections.
               | 
               | > Then the law is broken.
               | 
               | > Utilizing trademarked characteristics to promote your
               | own product without permission is an issue.
               | 
                | It sounds like you agree with the parent that if your
                | product reproduces trademarked characteristics, it is
                | utilizing trademarked characteristics - the disagreement
                | is just about at what layer responsibility stops. And
                | the layer that has responsibility is the one that
                | profits unjustly from the AI.
                | 
                | I'm interested in whether there's an argument for saying
                | that only the 2nd-party user of the 1st-party AI model,
                | selling AI model output to a 3rd party, is intuitively
                | unfair.
               | 
                | I can't think of one. E.g. Disney launches some new
                | cartoon or whatever. 1st-party Openmetagoog trains on it
                | to make my "Video Episode Generator" product. Now,
                | Openmetagoog's Community Pages are full of 30m video
                | episodes made by their image generator. They didn't make
                | them, nor do they promote them. Intuitively, Openmetagoog
                | is a _competitor_ for manufacturing my IP, and that is
                | also intuitively _wrong_. Your analysis would have us
                | charge the users for sharing the output.
        
               | fc417fc802 wrote:
               | > if your product reproduces trademark characteristics,
               | it is utilizing trademarked characteristics.
               | 
               | I wouldn't agree with that, no. To my mind "utilizing"
               | generally requires intent at least in the context we're
               | discussing here (ie moral or legal obligations). I'd
               | remind you that the entire point of trademark is
               | (approximately) to prevent brand confusion within the
               | market.
               | 
               | > Your analysis would have us charge the users for
               | sharing the output.
               | 
               | Precisely. I see it as both a matter of intent and
               | concrete damages. Creating something (pencil, diffusion
               | model, camera, etc) that _could possibly_ be used in a
               | manner that violates the law is not a problem. It is the
               | end user violating the law that is at fault.
               | 
               | Imagine an online community that uses blender to create
               | disney knockoffs and shares them publicly. Blender is not
               | at fault and the creation of the knockoffs themselves (ie
               | in private) is not the issue either. It's the part where
               | the users proceed to publicly share them that poses the
               | problem.
               | 
               | > They didn't make them, nor do they promote them.
               | 
               | By the same logic youtube neither creates nor promotes
               | pirated content that gets uploaded. We have DMCA takedown
               | notices for dealing with precisely this issue.
               | 
                | > Intuitively, Openmetagoog is a competitor for
                | manufacturing my IP, and that is also intuitively wrong.
               | 
               | Let's be clear about the distinction between trademark
               | and copyright here. Outputting a verbatim copy is indeed
               | a problem. Outputting a likeness is not, but an end user
               | could certainly proceed to (mis)use that output in a
               | manner that is.
               | 
               | Intent matters here. A product whose primary purpose is
               | IP infringement is entirely different from one whose
               | purpose is general but could potentially be used to
               | infringe.
        
         | stephen37 wrote:
          | I got it working when I provided an image of a Model F
          | keyboard. That is the strength of the model: provide it an
          | input image and it will do some magic.
         | 
         | Disclaimer: I work for BFL
        
       | anjneymidha wrote:
       | Technical report here for those curious:
       | https://cdn.sanity.io/files/gsvmb6gz/production/880b07220899...
        
         | rvz wrote:
         | Unfortunately, nobody wants to read the report, but what they
         | are really after is to download the open-weight model.
         | 
         | So they can take it and run with it. (No contributing back
         | either).
        
           | anjneymidha wrote:
           | "FLUX.1 Kontext [dev]
           | 
           | Open-weights, distilled variant of Kontext, our most advanced
           | generative image editing model. Coming soon" is what they say
           | on https://bfl.ai/models/flux-kontext
        
             | sigmoid10 wrote:
             | Distilled is a real downer, but I guess those AI startup
             | CEOs still gotta eat.
        
               | dragonwriter wrote:
                | The open community has done a lot with the open-weights
               | distilled models from Black Forest Labs already, one of
               | the more radical being Chroma:
               | https://huggingface.co/lodestones/Chroma
        
           | refulgentis wrote:
            | I agree that the gooning crew drives a _lot_ of open model
            | downloads.
            | 
            | On HN, generally, people are more into technical discussion
            | and/or productizing this stuff. Here, it seems déclassé to
            | mention the gooner angle; it's usually euphemized as intense
            | reactions about refusing to download it, involving the word
            | "censor".
        
         | liuliu wrote:
          | Seems the implementation is straightforward (very similar to
          | everyone else's: HiDream-E1, ICEdit, DreamO, etc.); the magic
          | is in the data curation, the details of which are only lightly
          | shared.
        
           | krackers wrote:
           | I haven't been following image generation models closely, at
           | a high level is this new Flux model still diffusion based, or
           | have they moved to block autoregressive (possibly with
           | diffusion for upscaling) similar to 4o?
        
             | liuliu wrote:
              | Diffusion based. There is no point in moving to auto-
              | regressive if you are not also training a multimodal LLM,
              | which these companies are not doing.
        
             | anotherpaul wrote:
             | Well it's a "generative flow matching model"
             | 
             | That's not the same as a diffusion model.
             | 
             | Here is a post about the difference that seems right at
             | first glance: https://diffusionflow.github.io/
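              | 
              | As a rough intuition for the distinction: here is a toy
              | sketch of one rectified-flow-style flow matching training
              | step. The network regresses a velocity along a straight
              | path from noise to data, rather than predicting the noise
              | added at a diffusion timestep. Purely illustrative; the
              | model signature is an assumption, and this is not BFL's
              | actual training code.
              | 
              |     import torch
              |     import torch.nn.functional as F
              | 
              |     def flow_matching_step(model, x1, optimizer):
              |         # x1: batch of clean images, shape (B, C, H, W)
              |         x0 = torch.randn_like(x1)            # pure noise
              |         t = torch.rand(x1.shape[0], 1, 1, 1)  # t in [0, 1]
              |         xt = (1 - t) * x0 + t * x1            # straight path
              |         target_v = x1 - x0                    # path velocity
              |         pred_v = model(xt, t.flatten())       # assumed signature
              |         loss = F.mse_loss(pred_v, target_v)
              |         optimizer.zero_grad()
              |         loss.backward()
              |         optimizer.step()
              |         return loss.item()
              | 
              | Sampling then integrates the learned velocity from t = 0
              | (noise) to t = 1 (image), e.g. with a simple Euler loop.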
        
       | amazingamazing wrote:
        | Don't understand the remove-from-face example. Without other
        | pictures showing the person's face, it's just using some
        | stereotypical image, no?
        
         | vessenes wrote:
         | Mm, depends on the underlying model and where it is in the
         | pipeline; identity models are pretty sophisticated at
         | interpolating faces from partial geometry.
        
         | Scaevolus wrote:
         | The slideshow appears to be glitched on that first example. The
         | input image has a snowflake covering most of her face.
        
           | whywhywhywhy wrote:
           | That's the point, it can remove it.
        
         | sharkjacobs wrote:
         | There's no "truth" it's uncovering, no real face, these are all
         | just generated images, yes.
        
           | amazingamazing wrote:
            | I get that but usually you would have two inputs: the
            | reference "truth", and the target that is to be manipulated.
        
             | nine_k wrote:
             | Not necessarily. "As you may see, this is a Chinese lady.
             | You have seen a number of Chinese ladies in your training
             | set. Imagine the face of this lady so that it won't
             | contradict the fragment visible on the image with the
             | snowflake". (Damn, it's a pseudocode prompt.)
        
               | amazingamazing wrote:
               | yes, so a stereotypical image. my point is best
               | illustrated if you look at all of the photos of the
               | woman.
        
               | throwaway314155 wrote:
                | Even if you provide another image (which you totally can
                | btw) the model is still generalizing predictions enough
                | that you can say it's just making a strong guess about
                | what is concealed.
                | 
                | I guess my main point is: "this is where you draw the
                | line? at a mostly accurate reconstruction of a partial
                | view of someone's face?" This was science fiction a few
                | years ago. Training the model to accept two images
                | (which it can, just not for the explicit purpose of
                | reconstruction (although it learns that too)) seems like
                | a very task-specific, downstream way to handle this
                | issue. This field is now about robust, general ways to
                | elicit intelligent behavior, not task-specific models.
        
               | amazingamazing wrote:
               | is it mostly accurate though? how would you know? suppose
               | you had an asian woman whose face is entirely covered
               | with snow.
               | 
               | sure you could tell AI to remove the snow and some face
               | will be revealed, but who is to say it's accurate? that's
               | why traditionally you have a reference input.
        
               | Gracana wrote:
               | What's the traditional workflow? I haven't seen that done
               | before, but it's something I'd like to try. Could supply
               | the "wrong" reference too, to get something specific.
        
         | jorgemf wrote:
          | I think they are doing that because, with real images, the
          | model changes the face. That problem is removed if the initial
          | image doesn't show the face.
        
         | ilaksh wrote:
         | Look more closely at the example. Clearly there is an
         | opportunity for inference with objects that only partially
         | obscure.
        
         | pkkkzip wrote:
          | They chose Asian traits that Western beauty standards
          | fetishize but that wouldn't be taken seriously at all in Asia.
          | 
          | I notice American text2image models tend to generate less
          | attractive and darker-skinned humans, whereas Chinese
          | text2image models generate more attractive and lighter-skinned
          | humans.
          | 
          | I think this is another area where Chinese AI models shine.
        
           | throwaway314155 wrote:
           | > notice American text2image models tend to generate less
           | attractive and more darker skinned humans where as Chinese
           | text2image generate attractive and more light skinned humans
           | 
           | This seems entirely subjective to me.
        
           | viraptor wrote:
           | > They chosen Asian traits that Western beauty standards
           | fetishize that in Asia wouldn't be taken serious at all.
           | 
           | > where as Chinese text2image generate attractive and more
           | light skinned humans.
           | 
           | Are you saying they have chosen Asian traits that Asian
           | beauty standards fetishize that in the West wouldn't be taken
           | seriously at all? ;) There is no ground truth here that would
           | be more correct one way or the other.
        
           | turnsout wrote:
           | Wow, that is some straight-up overt racism. You should be
           | ashamed.
        
             | fc417fc802 wrote:
             | It reads as racist if you parse it as (skin tone and
             | attractiveness) but if you instead parse it as (skin tone)
             | and (attractiveness), ie as two entirely unrelated
             | characteristics of the output, then it reads as nothing
             | more than a claim about relative differences in behavior
             | between models.
             | 
             | Of course, given the sensitivity of the topic it is
             | arguably somewhat inappropriate to make such observations
             | without sufficient effort to clarify the precise meaning.
        
               | pkkkzip wrote:
                | I find that people who are hypersensitive to racism are
                | usually themselves pretty racist. It's like how people
                | who are aroused by something taboo are usually its
                | biggest critics. I forget what this phenomenon is called.
        
             | astrange wrote:
             | Asians can be pretty colorist within themselves and they're
             | not going to listen to you when you tell them it's bad.
             | Asian women love skin-lightening creams.
             | 
             | This particular woman looks Vietnamese to me, but I agree
             | nothing about her appearance looks like anyone's fashion I
             | know. But I only know California ABGs so that doesn't mean
             | much.
        
       | vessenes wrote:
       | Pretty good!
       | 
       | I like that they are testing face and scene coherence with
       | iterated edits -- major pain point for 4o and other models.
        
       | ttoinou wrote:
        | How knowledgeable do you need to be to tweak and train this
        | locally?
        | 
        | I spent two days trying to train a LoRA customization on top of
        | Flux 1 dev on Windows with my RTX 4090 but couldn't make it
        | work, and I don't know how deep into this topic and its Python
        | libraries I need to go. Are there script kiddies in this game,
        | or only experts?
        
         | minimaxir wrote:
         | The open-source model is not released yet, but it definitely
         | won't be any easier than training a LoRA on Flux 1 Dev.
        
           | ttoinou wrote:
           | Damn, I'm just too lazy to learn skills that will be outdated
           | in 6 months
        
         | Flemlo wrote:
          | It's normally easy to find it pre-configured through ComfyUI.
          | 
          | Sometimes behind a Patreon if it's from some YouTuber.
        
         | throwaway675117 wrote:
         | Just use https://github.com/bghira/SimpleTuner
         | 
          | I was able to run this script to train a LoRA myself without
          | spending any time learning the underlying Python libraries.
        
           | ttoinou wrote:
           | Well thank you I will test that
        
             | dagaci wrote:
              | SimpleTuner is dependent on Microsoft's DeepSpeed, which
              | doesn't work on Windows :)
              | 
              | So you're probably better off using AI Toolkit:
             | https://github.com/ostris/ai-toolkit
        
               | AuryGlenz wrote:
               | OneTrainer would be another "easy" option.
        
         | 3abiton wrote:
         | > I spent two days trying to train a LoRa customization on top
         | of Flux 1 dev on Windows with my RTX 4090 but can't make
         | 
          | Windows is mostly the issue; to really take advantage, you
          | will need Linux.
        
           | ttoinou wrote:
            | Even using WSL2 with Ubuntu isn't good enough?
        
             | AuryGlenz wrote:
             | Nah, that's fine. So is Windows for most tools.
             | 
             | The main thing is having 1. Good images with adequate
             | captions and 2. Knowing what settings to use.
             | 
             | Number 2 is much harder because there's a lot of bad
             | information out there and the people who train a ton of
             | Loras aren't usually keen to share. Still, the various
             | programs usually have some defaults that should be
             | acceptable.
        
       | minimaxir wrote:
       | Currently am testing this out (using the Replicate endpoint:
       | https://replicate.com/black-forest-labs/flux-kontext-pro).
       | Replicate also hosts "apps" with examples using FLUX Kontext for
       | some common use cases of image editing:
       | https://replicate.com/flux-kontext-apps
       | 
       | It's pretty good: quality of the generated images is similar to
       | that of GPT-4o image generation if you were using it for simple
       | image-to-image generations. Generation is speedy at about ~4
       | seconds per generation.
       | 
       | Prompt engineering outside of the examples used on this page is a
       | little fussy and I suspect will evolve over time. Changing styles
       | or specific aspects does indeed work, but the more specific you
       | get, the more it tends to ignore the specifics.
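        | 
        | For anyone else poking at it, a minimal sketch of calling that
        | Replicate endpoint from Python. The input field names
        | ("prompt", "input_image") are assumptions taken from the model
        | page and may differ; the image URL and prompt are placeholders:
        | 
        |     import replicate  # needs REPLICATE_API_TOKEN set
        | 
        |     output = replicate.run(
        |         "black-forest-labs/flux-kontext-pro",
        |         input={
        |             "prompt": "Change the style to a watercolor painting",
        |             "input_image": "https://example.com/photo.png",
        |         },
        |     )
        |     print(output)  # URL(s) of the edited image; shape may vary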
        
         | skipants wrote:
         | > Generation is speedy at about ~4 seconds per generation
         | 
         | May I ask on which GPU & VRAM?
         | 
         | edit: oh unless you just meant through huggingface's UI
        
           | minimaxir wrote:
            | It is through Replicate's UI as listed, which goes through
            | Black Forest Labs's infra, so you would likely get the same
            | results from their API.
        
           | zamadatix wrote:
           | The open weights variant is "coming soon" so the only option
           | is hosted right now.
        
         | cuuupid wrote:
          | Honestly love Replicate for always being up to date. It's
          | amazing that not only do we live in a time of rapid AI
          | advancement, but that every new research-grade model is
          | immediately available via API and can be used in prod, at
          | scale, no questions asked.
          | 
          | Something to be said about distributors like Replicate etc.
          | that are adding an exponent to the impact of these model
          | releases.
        
           | minimaxir wrote:
           | That's less on the downstream distributors, more on the model
           | developers themselves realizing that ease-of-accessibility of
           | the models themselves on Day 1 is important for getting
           | community traction. Locking the model exclusively behind
           | their own API won't work anymore.
           | 
           | Llama 4 was another recent case where they explicitly worked
           | with downstream distributors to get it working Day 1.
        
           | meowface wrote:
           | I have no affiliation with either company but from using both
           | a bunch as a customer: Replicate has a competitor at
           | https://fal.ai/models and FAL's generation speed is
           | consistently faster across every model I've tried. They have
           | some sub-100 ms image gen models, too.
           | 
           | Replicate has a much bigger model selection. But for every
           | model that's on both, FAL is pretty much "Replicate but
           | faster". I believe pricing is pretty similar.
        
             | echelon wrote:
             | A16Z invested in both. It's wild. They've been absolutely
             | flooding the GenAI market for images and videos with
             | investments.
             | 
             | They'll have one of the victors, whoever it is. Maybe
             | multiple.
        
             | bfirsh wrote:
             | Founder of Replicate here. We should be on par or faster
             | for all the top models. e.g. we have the fastest FLUX[dev]:
             | https://artificialanalysis.ai/text-to-image/model-
             | family/flu...
             | 
             | If something's not as fast let me know and we can fix it.
             | ben@replicate.com
        
               | echelon wrote:
               | Hey Ben, thanks for participating in this thread. And
               | certainly also for all you and your team have built.
               | 
               | Totally frank and possibly awkward question, you don't
               | have to answer: how do you feel about a16z investing in
               | _everyone_ in this space?
               | 
               | They invested in you.
               | 
               | They're investing in your direct competitors (Fal, et
               | al.)
               | 
               | They're picking your downmarket and upmarket (Krea, et
               | al.)
               | 
               | They're picking consumer (Viggle, et al.), which could
               | lift away the value.
               | 
               | They're picking the foundation models you consume. (Black
               | Forest Labs, Hedra, et al.)
               | 
               | They're even picking the actual consumers themselves.
               | (Promise, et al.)
               | 
               | They're doing this at Series A and beyond.
               | 
               | Do you think they'll try to encourage dog-fooding or
               | consolidation?
               | 
               | The reason I ask is because I'm building adjacent or at a
               | tangent to some of this, and I wonder if a16z is "all
               | full up" or competitive within the portfolio. (If you can
               | answer in private, my email is [my username] at gmail,
               | and I'd be incredibly grateful to hear your thoughts.)
               | 
               | Beyond that, how are you feeling? This is a whirlwind of
               | a sector to be in. There's a new model every week it
               | seems.
               | 
               | Kudos on keeping up the pace! Keep at it!
        
               | mac-mc wrote:
               | That feels like the VC equivalent of buying a market-
               | specific fund, so fairly par for the course?
        
         | a2128 wrote:
          | It seems more accurate than 4o image generation in terms of
          | preserving original details. If I give it my 3D animal
          | character and ask it for a minor change like changing the
          | lighting, 4o will completely mangle the face of my character
          | and change the body and other details slightly. This Flux
          | model keeps the visible geometry almost perfectly the same,
          | even when asked to significantly change the pose or lighting.
        
           | echelon wrote:
           | gpt-image-1 (aka "4o") is still the most useful general
           | purpose image model, but damn does this come close.
           | 
           | I'm deep in this space and feel really good about FLUX.1
           | Kontext. It fills a much-needed gap, and it makes sure that
           | OpenAI / Google aren't the runaway victors of images and
           | video.
           | 
           | Prior to gpt-image-1, the biggest problems in images were:
            | - prompt adherence
            | - generation quality
            | - instructiveness (e.g. "put the sign above the second door")
            | - consistency of styles, characters, settings, etc.
            | - deliberate and exact intentional posing of characters and
            |   set pieces
            | - compositing different images or layers together
            | - relighting
           | 
           | Fine tunes, LoRAs, and IPAdapters fixed a lot of this, but
           | they were a real pain in the ass. ControlNets solved for
           | pose, but it was still awkward and ugly. ComfyUI was an
           | orchestrator of this layer of hacks that kind of got the job
           | done, but it was hacky and unmaintainable glue. It always
           | felt like a fly-by-night solution.
           | 
           | OpenAI's gpt-image-1 solved all of these things with a single
           | multimodal model. You could throw out ComfyUI and all the
           | other pre-AI garbage and work directly with the model itself.
           | It was magic.
           | 
           | Unfortunately, gpt-image-1 is ridiculously slow, insanely
           | expensive, highly censored (you can't use a lot of
           | copyrighted characters or celebrities, and a lot of totally
            | SFW prompts are blocked). It can't be fine-tuned, so you're
            | stuck with the "ChatGPT style" and (as it is called by the
            | community) the "piss filter" (perpetually yellowish images).
           | 
           | And the biggest problem with gpt-image-1 is because it puts
           | image and text tokens in the same space to manipulate, it
           | can't retain the exact precise pixel-precise structure of
           | reference images. Because of that, it cannot function as an
           | inpainting/outpainting model whatsoever. You can't use it to
           | edit existing images if the original image mattered.
           | 
           | Even with those flaws, gpt-image-1 was a million times better
           | than Flux, ComfyUI, and all the other ball of wax hacks we've
           | built up. Given the expense of training gpt-image-1, I was
           | worried that nobody else would be able to afford to train the
           | competition and that OpenAI would win the space forever. We'd
           | be left with only hyperscalers of AI building these models.
           | And it would suck if Google and OpenAI were the only
           | providers of tools for artists.
           | 
           | Black Forest Labs just proved that wrong in a big way! While
           | this model doesn't do everything as well as gpt-image-1, it's
           | within the same order of magnitude. And it's ridiculously
           | fast (10x faster) and cheap (10x cheaper).
           | 
           | Kontext isn't as instructive as gpt-image-1. You can't give
           | it multiple pictures and ask it to copy characters from one
           | image into the pose of another image. You can't have it
           | follow complex compositing requests. But it's close, and that
           | makes it immediately useful. It fills a much-needed gap in
           | the space.
           | 
           | Black Forest Labs did the right thing by developing this
           | instead of a video model. We need much more innovation in the
           | image model space, and we need more gaps to be filled:
            | - Fast
            | - Truly multimodal like gpt-image-1
            | - Instructive
            | - Posing built into the model. No ControlNet hacks.
            | - References built into the model. No IPAdapter, no required
            |   character/style LoRAs, etc.
            | - Ability to address objects, characters, mannequins, etc.
            |   for deletion / insertion.
            | - Ability to pull sources from across multiple images with
            |   or without "innovation" / change to their pixels.
            | - Fine-tunable (so we can get higher quality and precision)
           | 
           | Something like this that works in real time would literally
           | change the game forever.
           | 
           | Please build it, Black Forest Labs.
           | 
           | All of those feature requests stated, Kontext is a great
           | model. I'm going to be learning it over the next weeks.
           | 
           | Keep at it, BFL. Don't let OpenAI win. This model rocks.
           | 
           | Now let's hope Kling or Runway (or, better, someone who does
           | open weights -- BFL!) develops a Veo 3 competitor.
           | 
            | I need my AI actors to _"Meisner"_, and so far only Veo 3
           | comes close.
        
             | ttoinou wrote:
             | Your comment is def why we come to HN :)
             | 
             | Thanks for the detailed info
        
               | tristanMatthias wrote:
               | Thought the SAME thing
        
             | meta87 wrote:
              | this breakdown made my day, thank you!
              | 
              | I'm building a web-based paint/image editor with AI
              | inpainting etc.
              | 
              | and this is going to be a great model to use, price-wise
              | and capability-wise
              | 
              | completely agree, so happy it's not any one of these big
              | cos controlling the whole space!
        
               | perk wrote:
               | What are you building? Ping me if you want a tester of
               | half-finished breaking stuff
        
             | whywhywhywhy wrote:
             | >Given the expense of training gpt-image-1, I was worried
             | that nobody else would be able to afford to train the
             | competition
             | 
             | OpenAI models are expensive to train because it's
             | beneficial for OpenAI models to be expensive and there is
             | no incentive to optimize when they're gonna run in a server
             | farm anyway.
             | 
             | Probably a bunch of teams never bothered trying to
             | replicate Dall-E 1+2 because the training run cost
             | millions, yet SD1.5 showed us comparable tech can run on a
             | home computer and be trained from scratch for thousands or
             | fine tuned for cents.
        
             | qingcharles wrote:
             | When I first saw gpt-image-1 I was equally scared that
             | OpenAI had used its resources to push so far ahead that
             | more open models would be left completely in the dust for
             | the significant future.
             | 
             | Glad to see this release. It also puts more pressure onto
             | OpenAI to make their model less lobotomized and to increase
             | its output quality. This is good for everyone.
        
         | reissbaker wrote:
         | In my quick experimentation for image-to-image this feels even
         | better than GPT-4o: 4o tends to heavily weight the colors
         | towards sepia, to the point where it's a bit of an obvious tell
         | that the image was 4o-generated (especially with repeated
         | edits); FLUX.1 Kontext seems to use a much wider, more colorful
         | palette. And FLUX, at least the Max version I'm playing around
         | with on Replicate, nails small details that 4o can miss.
         | 
         | I haven't played around with from-scratch generation, so I'm
         | not sure which is best if you're trying to generate an image
         | just from a prompt. But in terms of image-to-image via a
         | prompt, it feels like FLUX is noticeably better.
        
       | andybak wrote:
       | Nobody tested that page on mobile.
        
       | jnettome wrote:
        | I'm trying to log in to evaluate this, but the Google auth
        | redirects me back to localhost:3000.
        
       | vunderba wrote:
       | I'm debating whether to add the FLUX Kontext model to my GenAI
        | image comparison site. The Max variant of the model definitely
        | scores higher in _prompt adherence_, nearly doubling Flux
        | 1.dev's score, but still falls short of OpenAI's gpt-image-1,
        | which (visual fidelity aside) is sitting at the top of the
        | leaderboard.
       | 
       | I liked keeping Flux 1.D around just to have a nice baseline for
       | local GenAI capabilities.
       | 
       | https://genai-showdown.specr.net
       | 
       | Incidentally, we did add the newest release of Hunyuan's Image
       | 2.0 model but as expected of a real-time model it scores rather
       | poorly.
       | 
       |  _EDIT: In fairness to Black Forest Labs this model definitely
       | seems to be more focused on editing capabilities to refine and
       | iterate on existing images rather than on strict text-to-image
       | creation._
        
         | Klaus23 wrote:
         | Nice site! I have a suggestion for a prompt that I could never
         | get to work properly. It's been a while since I tried it, and
         | the models have probably improved enough that it should be
          | possible now.
          | 
          |     A knight with a sword in hand stands with his back to us,
          |     facing down an army. He holds his shield above his head to
          |     protect himself from the rain of arrows shot by archers
          |     visible in the rear.
         | 
         | I was surprised at how badly the models performed. It's a
         | fairly iconic scene, and there's more than enough training
         | data.
        
           | lawik wrote:
           | Making an accurate flail (stick - chain - ball) is a fun
           | sport.. weird things tend to happen.
        
         | meta87 wrote:
         | please add! cool site thanks :)
        
         | nopinsight wrote:
         | Wondering if you could add "Flux 1.1 Pro Ultra" to the site?
         | It's supposed to be the best among the Flux family of models,
         | and far better than Flux Dev (3rd among your current
         | candidates) at prompt adherence.
         | 
         | Adding it would also provide a fair assessment for a leading
         | open source model.
         | 
         | The site is a great idea and features very interesting prompts.
         | :)
        
         | theyinwhy wrote:
         | Looks good! Would be great to see Adobe Firefly in your
         | evaluation as well.
        
       | vunderba wrote:
       | Some of these samples are rather cherry picked. Has anyone
       | actually tried the professional headshot app of the "Kontext
       | Apps"?
       | 
       | https://replicate.com/flux-kontext-apps
       | 
       | I've thrown half a dozen pictures of myself at it and it just
       | completely replaced me with somebody else. To be fair, the final
       | headshot does look very professional.
        
         | minimaxir wrote:
         | Is the input image aspect ratio the same as the output aspect
         | ratio? In some testing I've noticed that there is weirdness
         | that happens if there is a forced shift.
        
         | doctorpangloss wrote:
         | Nobody has solved the scientific problem of identity
         | preservation for faces in one shot. Nobody has even solved
         | hands.
        
           | emmelaich wrote:
           | I tried making a realistic image from a cartoon character but
           | aged. It did very well, definitely recognisable as the same
           | 'person'.
        
           | danielbln wrote:
           | Best bet right now is still face swapping with something like
           | insightface.
        
         | mac-mc wrote:
         | I tried a professional headshot prompt on the flux playground
         | with a tired gym selfie and it kept it as myself, same
         | expression, sweat, skin tone and all. It was like a background
         | swap, then I expanded it to "make a professional headshot
         | version of this image that would be good for social media, make
         | the person smile, have a good pose and clothing, clean non-
         | sweaty skin, etc" and it stayed pretty similar, except it
         | swapped the clothing and gave me an awkward smile, which may be
         | accurate for those kinds of things if you think about it.
        
           | diggan wrote:
           | It isn't mentioned on https://replicate.com/flux-kontext-
           | apps/professional-headsho..., but on
           | https://replicate.com/black-forest-labs/flux-kontext-pro
           | under the "Prompting Best Practices" section is says this:
           | 
           | > Preserve Intentionally
           | 
           | > Specify what should stay the same: "while keeping the same
           | facial features"
           | 
           | > Use "maintain the original composition" to preserve layout
           | 
           | > For background changes: "Change the background to a beach
           | while keeping the person in the exact same position"
           | 
           | So while the marketing seems to paint a picture that it'll
           | preserve things automatically, and kind of understand exactly
           | what you want changed, it doesn't seem like that's the full
           | truth. You need to instead be very specific about what you
           | want to preserve.
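            | 
            | A minimal illustration of that pattern (the exact phrasing
            | is an assumption modeled on the quoted examples): state the
            | change, then spell out what must stay fixed, and pass the
            | whole string as the prompt alongside the source image.
            | 
            |     edit_prompt = (
            |         "Change the background to a beach "
            |         "while keeping the person in the exact same position "
            |         "and keeping the same facial features; "
            |         "maintain the original composition"
            |     )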
        
         | pkrx wrote:
         | It's convenient but the results are def not significantly
         | better than available free stuff
        
       | layer8 wrote:
       | > show me a closeup of...
       | 
       | Investigators will love this for "enhance". ;)
        
         | mdp2021 wrote:
         | At some point, "Do not let the tool invent details!" will
         | become a shout more frequent than most expressions.
        
       | ilaksh wrote:
       | Anyone have a guess as to when the open dev version gets
       | released? More like a week or a month or two I wonder.
        
       | mdp2021 wrote:
       | Is input restricted to a single image? If you could use more
       | images as input, you could do prompts like "Place the item in
       | image A inside image B" (e.g. "put the character of image A in
       | the scenery of image B"), etc.
        
         | carlosdp wrote:
          | There's an experimental "multi" mode you can input multiple
          | images into.
        
         | echelon wrote:
         | Fal has the multi image interface to test against. (Replicate
         | might as well, I haven't checked yet.)
         | 
         | THIS MODEL ROCKS!
         | 
         | It's no gpt-image-1, but it's ridiculously close.
         | 
         | There isn't going to be a moat in images or video. I was so
         | worried Google and OpenAI would win creative forever. Not so.
         | Anyone can build these.
        
       | bossyTeacher wrote:
        | I wonder if this is using a foundation model or a fine-tuned one.
        
       | fagerhult wrote:
       | I vibed up a little chat interface https://kontext-
       | chat.vercel.app/
        
         | Hard_Space wrote:
         | Does not seem to want to deal with faces at all, flags
         | everything with humans as 'sensitive' and declines.
        
       | eamag wrote:
       | Can it generate chess? https://manifold.markets/Hazel/an-ai-
       | model-will-successfully...
        
         | zamadatix wrote:
         | The focus of this model is to be able to do iterative editing
         | and/or use other images as a source while the focus of that bet
         | is to consistently one shot a specific image 9/10 times with
         | the same prompt. Given the canyon between those two focuses I
         | don't think so, but maybe if you had an inventive enough
         | prompt?
        
       | gravitywp wrote:
       | You can try it now on https://fluxk.art.
        
       | xnorswap wrote:
       | I tried this out and a hilarious "context-slip" happened:
       | 
       | https://imgur.com/a/gT6iuV1
       | 
       | It generated (via a prompt) an image of a space ship landing on a
       | remote planet.
       | 
        | I asked for an edit: "The ship itself should be more colourful and a
       | larger part of the image".
       | 
       | And it replaced the space-ship with a container vessel.
       | 
       | It had the chat history, it should have understood I still wanted
       | a space-ship, but it dropped the relevant context for what I was
       | trying to achieve.
        
         | gunalx wrote:
          | I mean, to its credit, one of the container ships seems to be
          | flying. /s
        
       | sujayk_33 wrote:
        | So here is my understanding of the current native image
        | generation scenario. I might be wrong, so please correct me; I'm
        | still learning it and I'd appreciate the help.
        | 
        | Native image gen was first introduced in Gemini 2.0 Flash if I'm
        | not wrong, and then OpenAI released it for 4o, which took over
        | the internet with Ghibli art.
        | 
        | We have been getting good-quality images from almost all image
        | generators like Midjourney, OpenAI and other providers, but the
        | thing that made this special was the truly "multimodal" nature
        | of it. Here's what I mean.
        | 
        | When you used to ask ChatGPT to create an image, it would
        | rephrase that prompt and internally send it to DALL-E;
        | similarly, Gemini would send it to Imagen. Those were diffusion
        | models, and they had little to no context in your next response
        | about what's in the previous image.
        | 
        | In native image generation, the same model understands audio,
        | text and even image tokens, and need not rely on diffusion
        | models internally. I don't think either OpenAI or Google has
        | released how they trained it, but my guess is that it's partly
        | auto-regressive and partly diffusion, though I'm not sure.
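        | 
        | A toy sketch of the difference described above; every name here
        | is hypothetical pseudo-structure, not any vendor's real API:
        | 
        |     def pipeline_style(chat_model, image_model, user_request):
        |         # Older setup: the chat model only rewrites the request
        |         # into a text prompt for a separate diffusion model; the
        |         # generated image never re-enters the chat context.
        |         prompt = chat_model.rewrite(user_request)
        |         return image_model.generate(prompt)
        | 
        |     def native_style(multimodal_model, history, user_request):
        |         # Native generation: text and image tokens share one
        |         # sequence, so earlier images are part of the context
        |         # for the next turn and the model can emit image tokens
        |         # directly.
        |         return multimodal_model.generate(history + [user_request])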
        
         | claudefocan wrote:
          | This is not fully correct.
          | 
          | The people behind Flux are the authors of the Stable Diffusion
          | paper, which dates back to 2022.
          | 
          | OpenAI initially had DALL-E, but Stable Diffusion was a
          | massive improvement on DALL-E.
          | 
          | Then OpenAI took inspiration from Stable Diffusion for GPT
          | image generation.
        
       ___________________________________________________________________
       (page generated 2025-05-30 23:02 UTC)