[HN Gopher] FLUX.1 Kontext
___________________________________________________________________
FLUX.1 Kontext
Author : minimaxir
Score : 495 points
Date   : 2025-05-29 17:40 UTC (1 day ago)
(HTM) web link (bfl.ai)
(TXT) w3m dump (bfl.ai)
| SV_BubbleTime wrote:
| A single-shot LoRA-style effect, if it works as well as their
| cherry-picked examples suggest, will be a game changer for
| editing.
|
| As with almost any AI release though, unless it's open weights, I
| don't care. The strengths and weaknesses of these models are
| apparent when you run them locally.
| ttoinou wrote:
| They're not apparent when you run them online?
| SV_BubbleTime wrote:
| Not for what these models are actually being used for.
| nullbyte wrote:
| Hopefully they list this on Hugging Face for the open-source
| community. It looks like a great model!
| minimaxir wrote:
| The original open-source Flux releases were also on Hugging
| Face.
| vunderba wrote:
| From their site they will be releasing the _DEV_ version -
| which is a distilled variant - so quality and adherence will
| suffer unfortunately.
| fortran77 wrote:
| It still has no idea what a Model F keyboard looks like. I tried
| prompts and editing, and got things that weren't even close.
| yorwba wrote:
| You mean when you edit a picture of a Model F keyboard and tell
| it to use it in a scene, it still produces a different
| keyboard?
| refulgentis wrote:
| Interesting, would you mind sharing? (imgur allows free image
| uploads, quick drag and drop)
|
| I do have a "works on my machine"* :) -- prompt "Model F
| keyboard", all settings disabled, on the smaller model, seems
| to have substantially more than no idea:
| https://imgur.com/a/32pV6Sp
|
| (Google Images comparison included to show in-the-wild "Model F
| keyboard", which may differ from my/your expected distribution)
|
| * my machine, being, https://playground.bfl.ai/ (I have no
| affiliation with BFL)
| jsheard wrote:
| Your generated examples just look like generic modern-day
| mechanical keyboards; they don't have any of the Model F's
| defining features.
| AStonesThrow wrote:
| Your Google Images search illustrates the underlying problem of
| models training on junk misinformation online. If AI scrapers
| are downloading every photo that's associated with "Model F
| keyboard" like that, the models have no idea what an IBM Model
| F is, what its distinguishing characteristics are, which
| keyboards belong to some other company, and which are simply
| misidentified.
|
| https://commons.wikimedia.org/wiki/Category:IBM_Model_F_Keyb.
| ..
|
| Specifying "IBM Model F keyboard" _and placing it in
| quotation marks_ improves the search. But the front page of
| the search is the tip of the iceberg compared to whatever the
| model's scrapers ingested.
|
| Eventually you may hit trademark protections. Reproducing a
| brand-name keyboard may be as difficult as simulating a
| celebrity's likeness.
|
| I'm not even sure what my friends look like on Facebook, so
| it's not clear how an AI model would reproduce a brand-name
| keyboard design on request.
| refulgentis wrote:
| I agree with you vehemently.
|
| Another way of looking at it is, insistence on complete
| verisimilitude in an _image generator_ is fundamentally in
| error.
|
| I would argue, even undesirable. I don't want to live in a
| world where a 45 year old keyboard that was only out for 4
| years is readily imitated in every microscopic detail.
|
| I also find myself frustrated, and asking myself why.
|
| First thought that jumps in: it's very clear that it is an
| error to say the model has _no idea_, modulo some independent
| run that's dramatically different from the only one offered in
| this thread.
|
| Second thought: if we're doing "the image generators don't
| get details right", there would seem to be a lot simpler
| examples than OP's, and it is better expressed that way - I
| assume it wasn't expressed that way because it sounds like
| dull conversation, but it doesn't have to be!
|
| Third thought as to why I feel frustrated: I feel like I
| wasted time here - no other demos showing it's anywhere
| close to "no idea", it's completely unclear to me what's
| distinctive about an IBM Model F keyboard, and the
| Wikipedia images are _worse_ than Google's AFAICT.
| fc417fc802 wrote:
| > if we're doing "the image generators don't get details
| right", there would seem to be a lot simpler examples
| than OP's
|
| There are different sorts of details though, and the
| distinctions are both useful and interesting for
| understanding the state of the art. If "man drinking
| coke" produces someone with 6 fingers holding a glass of
| water, that's completely different from producing someone
| with 5 fingers holding a can of Pepsi.
|
| Notice that none of the images in your example got the
| function key placement correct. Clearly the model knows
| what a relatively modern keyboard is, and it even has
| some concept of a vaguely retro looking mechanical
| keyboard. However, I'm indeed inclined to agree with OP
| that it has approximately zero idea what an "IBM Model F"
| keyboard is. I'm not sure that's a failure of the model
| though - as you point out, it's an ancient and fairly
| obscure product.
| fc417fc802 wrote:
| > Eventually you may hit trademark protections. Reproducing
| a brand-name keyboard may be as difficult as simulating a
| celebrity's likeness.
|
| Then the law is broken. Monetizing someone's likeness is an
| issue. Utilizing trademarked characteristics to promote
| your own product without permission is an issue. It's the
| downstream actions of the user that are the issue, not the
| ML model itself.
|
| Models regurgitating copyrighted material verbatim is of
| course an entirely separate issue.
| refulgentis wrote:
| >> Eventually you may hit trademark protections.
|
| > Then the law is broken.
|
| > Utilizing trademarked characteristics to promote your
| own product without permission is an issue.
|
| It sounds like you agree with the parent that if your
| product reproduces trademark characteristics, it is
| utilizing trademarked characteristics. You just disagree
| about at what layer you stop having responsibility. And
| the layer that has responsibility is the one that profits
| unjustly from the AI.
|
| I'm interested in whether there's an argument for saying
| that only the 2nd-party user of the 1st-party AI model,
| selling the model's output to a 3rd party, is the one
| acting unfairly.
|
| I can't think of one. E.g. Disney launches some new
| cartoon or whatever. 1st-party Openmetagoog trains on it
| to make its "Video Episode Generator" product. Now
| Openmetagoog's Community Pages are full of 30-minute video
| episodes made by its image generator. They didn't make
| them, nor do they promote them. Intuitively, Openmetagoog
| is a _competitor_ for manufacturing my IP, and that is
| also intuitively _wrong_. Your analysis would have us
| charge the users for sharing the output.
| fc417fc802 wrote:
| > if your product reproduces trademark characteristics,
| it is utilizing trademarked characteristics.
|
| I wouldn't agree with that, no. To my mind "utilizing"
| generally requires intent at least in the context we're
| discussing here (ie moral or legal obligations). I'd
| remind you that the entire point of trademark is
| (approximately) to prevent brand confusion within the
| market.
|
| > Your analysis would have us charge the users for
| sharing the output.
|
| Precisely. I see it as both a matter of intent and
| concrete damages. Creating something (pencil, diffusion
| model, camera, etc) that _could possibly_ be used in a
| manner that violates the law is not a problem. It is the
| end user violating the law that is at fault.
|
| Imagine an online community that uses blender to create
| disney knockoffs and shares them publicly. Blender is not
| at fault and the creation of the knockoffs themselves (ie
| in private) is not the issue either. It's the part where
| the users proceed to publicly share them that poses the
| problem.
|
| > They didn't make them, nor do they promote them.
|
| By the same logic youtube neither creates nor promotes
| pirated content that gets uploaded. We have DMCA takedown
| notices for dealing with precisely this issue.
|
| > Intuitively, Openmetagoog is a competitor for
| manufacturing my IP, and that is also intuitively wrong.
|
| Let's be clear about the distinction between trademark
| and copyright here. Outputting a verbatim copy is indeed
| a problem. Outputting a likeness is not, but an end user
| could certainly proceed to (mis)use that output in a
| manner that is.
|
| Intent matters here. A product whose primary purpose is
| IP infringement is entirely different from one whose
| purpose is general but could potentially be used to
| infringe.
| stephen37 wrote:
| I got it working when I provided an image of a Model F
| keyboard. This is the strength of the model: provide it an
| input image and it will do some magic.
|
| Disclaimer: I work for BFL
| anjneymidha wrote:
| Technical report here for those curious:
| https://cdn.sanity.io/files/gsvmb6gz/production/880b07220899...
| rvz wrote:
| Unfortunately, nobody wants to read the report, but what they
| are really after is to download the open-weight model.
|
| So they can take it and run with it. (No contributing back
| either).
| anjneymidha wrote:
| "FLUX.1 Kontext [dev]
|
| Open-weights, distilled variant of Kontext, our most advanced
| generative image editing model. Coming soon" is what they say
| on https://bfl.ai/models/flux-kontext
| sigmoid10 wrote:
| Distilled is a real downer, but I guess those AI startup
| CEOs still gotta eat.
| dragonwriter wrote:
| The open community has done a lot with the open-weights
| distilled models from Black Forest Labs already, one of
| the more radical being Chroma:
| https://huggingface.co/lodestones/Chroma
| refulgentis wrote:
| I agree that the gooning crew drives a _lot_ of open-model
| downloads.
|
| On HN, generally, people are more into technical discussion
| and/or productizing this stuff. Here, it seems déclassé to
| mention the gooner angle; it's usually euphemized as intense
| reactions about refusing to download it, involving the word
| "censor".
| liuliu wrote:
| Seems the implementation is straightforward (very similar to
| everyone else's: HiDream-E1, ICEdit, DreamO, etc.); the magic
| is in the data curation (the details of which are only lightly
| shared).
| krackers wrote:
| I haven't been following image generation models closely. At
| a high level, is this new Flux model still diffusion-based,
| or have they moved to block-autoregressive (possibly with
| diffusion for upscaling), similar to 4o?
| liuliu wrote:
| Diffusion-based. There is no point in moving to
| autoregressive if you are not also training a multimodal
| LLM, which these companies are not doing.
| anotherpaul wrote:
| Well, it's a "generative flow matching model".
|
| That's not the same as a diffusion model.
|
| Here is a post about the difference that seems right at
| first glance: https://diffusionflow.github.io/
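|
| For intuition, here's a minimal sketch of a rectified-flow /
| flow-matching training step in PyTorch (my paraphrase of the
| published flow-matching formulation with a toy stand-in
| network, not BFL's actual code):
|
|   import torch
|   from torch import nn
|
|   class ToyVelocityNet(nn.Module):
|       """Stand-in for the real transformer: maps (x_t, t)
|       to a predicted velocity."""
|
|       def __init__(self, channels=4):
|           super().__init__()
|           self.net = nn.Conv2d(channels + 1, channels, 3,
|                                padding=1)
|
|       def forward(self, x_t, t):
|           t_img = t.view(-1, 1, 1, 1).expand_as(x_t[:, :1])
|           return self.net(torch.cat([x_t, t_img], dim=1))
|
|   def flow_matching_loss(model, x0):
|       noise = torch.randn_like(x0)          # x1 ~ N(0, I)
|       t = torch.rand(x0.shape[0], device=x0.device)
|       t4 = t.view(-1, 1, 1, 1)
|       x_t = (1 - t4) * x0 + t4 * noise      # straight path
|       target = noise - x0                   # velocity along it
|       return nn.functional.mse_loss(model(x_t, t), target)
|
|   # loss = flow_matching_loss(ToyVelocityNet(),
|   #                           torch.randn(8, 4, 32, 32))
|
| Diffusion models are trained to predict noise along a curved,
| schedule-defined path; flow matching regresses a velocity
| along a straight line between data and noise, which is part of
| why few-step sampling can work well.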
| amazingamazing wrote:
| Don't understand the remove-from-face example. Without other
| pictures showing the person's face, it's just using some
| stereotypical image, no?
| vessenes wrote:
| Mm, depends on the underlying model and where it is in the
| pipeline; identity models are pretty sophisticated at
| interpolating faces from partial geometry.
| Scaevolus wrote:
| The slideshow appears to be glitched on that first example. The
| input image has a snowflake covering most of her face.
| whywhywhywhy wrote:
| That's the point, it can remove it.
| sharkjacobs wrote:
| There's no "truth" it's uncovering, no real face, these are all
| just generated images, yes.
| amazingamazing wrote:
| I get that, but usually you would have two inputs: the "true"
| reference and the target that is to be manipulated.
| nine_k wrote:
| Not necessarily. "As you may see, this is a Chinese lady.
| You have seen a number of Chinese ladies in your training
| set. Imagine the face of this lady so that it won't
| contradict the fragment visible on the image with the
| snowflake". (Damn, it's a pseudocode prompt.)
| amazingamazing wrote:
| yes, so a stereotypical image. my point is best
| illustrated if you look at all of the photos of the
| woman.
| throwaway314155 wrote:
| Even if you provide another image (which you totally can,
| btw), the model is still generalizing predictions enough
| that you can say it's just making a strong guess about
| what is concealed.
|
| I guess my main point is "this is where you draw the
| line? at a mostly accurate reconstruction of a partial
| view of someone's face?" This was science fiction a few
| years ago. Training the model to accept two images (which
| it can, just not for the explicit purpose of
| reconstruction (although it learns that too)) seems like a
| very task-specific, downstream way to handle this issue.
| This field is now about robust, general ways to elicit
| intelligent behavior, not task-specific models.
| amazingamazing wrote:
| is it mostly accurate though? how would you know? suppose
| you had an Asian woman whose face is entirely covered
| with snow.
|
| sure, you could tell the AI to remove the snow and some
| face will be revealed, but who is to say it's accurate?
| that's why traditionally you have a reference input.
| Gracana wrote:
| What's the traditional workflow? I haven't seen that done
| before, but it's something I'd like to try. Could supply
| the "wrong" reference too, to get something specific.
| jorgemf wrote:
| I think they are doing that because, with real images, the
| model changes the face. That problem is removed if the initial
| image doesn't show the face.
| ilaksh wrote:
| Look more closely at the example. Clearly there is an
| opportunity for inference with objects that only partially
| obscure.
| pkkkzip wrote:
| They chose Asian traits that Western beauty standards
| fetishize but that in Asia wouldn't be taken seriously at all.
|
| I notice American text2image models tend to generate less
| attractive and darker-skinned humans, whereas Chinese
| text2image models generate attractive and lighter-skinned
| humans.
|
| I think this is another area where Chinese AI models shine.
| throwaway314155 wrote:
| > I notice American text2image models tend to generate less
| attractive and darker-skinned humans, whereas Chinese
| text2image models generate attractive and lighter-skinned
| humans
|
| This seems entirely subjective to me.
| viraptor wrote:
| > They chose Asian traits that Western beauty standards
| fetishize but that in Asia wouldn't be taken seriously at all.
|
| > whereas Chinese text2image models generate attractive and
| lighter-skinned humans.
|
| Are you saying they have chosen Asian traits that Asian
| beauty standards fetishize that in the West wouldn't be taken
| seriously at all? ;) There is no ground truth here that would
| be more correct one way or the other.
| turnsout wrote:
| Wow, that is some straight-up overt racism. You should be
| ashamed.
| fc417fc802 wrote:
| It reads as racist if you parse it as (skin tone and
| attractiveness) but if you instead parse it as (skin tone)
| and (attractiveness), ie as two entirely unrelated
| characteristics of the output, then it reads as nothing
| more than a claim about relative differences in behavior
| between models.
|
| Of course, given the sensitivity of the topic it is
| arguably somewhat inappropriate to make such observations
| without sufficient effort to clarify the precise meaning.
| pkkkzip wrote:
| I find that people who are hypersensitive to racism are
| usually themselves pretty racist. It's like how people who
| are aroused by something taboo are usually its biggest
| critics. I forget what this phenomenon is called.
| astrange wrote:
| Asians can be pretty colorist among themselves, and they're
| not going to listen to you when you tell them it's bad.
| Asian women love skin-lightening creams.
|
| This particular woman looks Vietnamese to me, but I agree
| nothing about her appearance looks like anyone's fashion I
| know. But I only know California ABGs so that doesn't mean
| much.
| vessenes wrote:
| Pretty good!
|
| I like that they are testing face and scene coherence with
| iterated edits -- major pain point for 4o and other models.
| ttoinou wrote:
| How knowledgeable do you need to be to tweak and train this
| locally?
|
| I spent two days trying to train a LoRA customization on top of
| Flux 1 Dev on Windows with my RTX 4090 but can't make it work,
| and I don't know how deep into this topic and its Python
| libraries I need to go. Are there script kiddies in this game,
| or only experts?
| minimaxir wrote:
| The open-source model is not released yet, but it definitely
| won't be any easier than training a LoRA on Flux 1 Dev.
| ttoinou wrote:
| Damn, I'm just too lazy to learn skills that will be outdated
| in 6 months
| Flemlo wrote:
| It's normally easy to find it pre-configured through ComfyUI.
|
| Sometimes it's behind a Patreon if it's from some YouTuber.
| throwaway675117 wrote:
| Just use https://github.com/bghira/SimpleTuner
|
| I was able to run this script to train a LoRA myself without
| spending any time learning the underlying Python libraries.
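|
| For what it's worth, once the LoRA is trained, loading it for
| inference only takes a few lines with diffusers (a rough
| sketch, not SimpleTuner's own code; assumes a
| diffusers-compatible .safetensors export and enough VRAM for
| bf16, or CPU offload):
|
|   import torch
|   from diffusers import FluxPipeline
|
|   # Base FLUX.1 [dev] weights from Hugging Face
|   pipe = FluxPipeline.from_pretrained(
|       "black-forest-labs/FLUX.1-dev",
|       torch_dtype=torch.bfloat16,
|   )
|   pipe.enable_model_cpu_offload()  # helps on a 24 GB card
|
|   # Attach the trained LoRA (path is hypothetical)
|   pipe.load_lora_weights("output/my_flux_lora.safetensors")
|
|   image = pipe(
|       "photo of an IBM Model F keyboard on a walnut desk",
|       num_inference_steps=28,
|       guidance_scale=3.5,
|   ).images[0]
|   image.save("lora_test.png")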
| ttoinou wrote:
| Well thank you I will test that
| dagaci wrote:
| SimpleTuner is dependent on Microsoft's DeepSpeed, which
| doesn't work on Windows :)
|
| So you're probably better off using ai-toolkit:
| https://github.com/ostris/ai-toolkit
| AuryGlenz wrote:
| OneTrainer would be another "easy" option.
| 3abiton wrote:
| > I spent two days trying to train a LoRA customization on top
| of Flux 1 Dev on Windows with my RTX 4090 but can't make
|
| Windows is mostly the issue; to really take advantage, you
| will need Linux.
| ttoinou wrote:
| Even using WSL2 with Ubuntu isn't good enough?
| AuryGlenz wrote:
| Nah, that's fine. So is Windows for most tools.
|
| The main thing is having 1. Good images with adequate
| captions and 2. Knowing what settings to use.
|
| Number 2 is much harder because there's a lot of bad
| information out there and the people who train a ton of
| LoRAs aren't usually keen to share. Still, the various
| programs usually have some defaults that should be
| acceptable.
| minimaxir wrote:
| Currently am testing this out (using the Replicate endpoint:
| https://replicate.com/black-forest-labs/flux-kontext-pro).
| Replicate also hosts "apps" with examples using FLUX Kontext for
| some common use cases of image editing:
| https://replicate.com/flux-kontext-apps
|
| It's pretty good: the quality of the generated images is
| similar to that of GPT-4o image generation when used for
| simple image-to-image generations. Generation is speedy, at
| about 4 seconds per image.
|
| Prompt engineering outside of the examples used on this page is a
| little fussy and I suspect will evolve over time. Changing styles
| or specific aspects does indeed work, but the more specific you
| get, the more it tends to ignore the specifics.
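|
| For reference, hitting that endpoint from Python is just a few
| lines with the Replicate client (a rough sketch; the input
| field names are the ones the model page's API schema shows, so
| double-check them, and REPLICATE_API_TOKEN must be set in your
| environment):
|
|   import replicate
|
|   # Edit an existing image with a text instruction
|   output = replicate.run(
|       "black-forest-labs/flux-kontext-pro",
|       input={
|           "prompt": "Change the car to red while keeping the"
|                     " same background and camera angle",
|           "input_image": open("car.jpg", "rb"),
|       },
|   )
|
|   # Recent client versions return a file-like object; older
|   # ones return a URL string.
|   with open("edited.jpg", "wb") as f:
|       f.write(output.read())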
| skipants wrote:
| > Generation is speedy, at about 4 seconds per image
|
| May I ask on which GPU & VRAM?
|
| edit: oh unless you just meant through huggingface's UI
| minimaxir wrote:
| It is through the Replicate UI listed above, which goes through
| Black Forest Labs's infra, so you would likely get the same
| results from their API.
| zamadatix wrote:
| The open weights variant is "coming soon" so the only option
| is hosted right now.
| cuuupid wrote:
| Honestly, love Replicate for always being up to date. It's
| amazing that not only do we live in a time of rapid AI
| advancement, but that every new research-grade model is
| immediately available via API and can be used in prod, at
| scale, no questions asked.
|
| Something to be said for distributors like Replicate that are
| adding an exponent to the impact of these model releases.
| minimaxir wrote:
| That's less on the downstream distributors and more on the
| model developers themselves realizing that ease of access to
| the models on Day 1 is important for getting community
| traction. Locking the model exclusively behind their own API
| won't work anymore.
|
| Llama 4 was another recent case where they explicitly worked
| with downstream distributors to get it working Day 1.
| meowface wrote:
| I have no affiliation with either company but from using both
| a bunch as a customer: Replicate has a competitor at
| https://fal.ai/models and FAL's generation speed is
| consistently faster across every model I've tried. They have
| some sub-100 ms image gen models, too.
|
| Replicate has a much bigger model selection. But for every
| model that's on both, FAL is pretty much "Replicate but
| faster". I believe pricing is pretty similar.
| echelon wrote:
| A16Z invested in both. It's wild. They've been absolutely
| flooding the GenAI market for images and videos with
| investments.
|
| They'll have one of the victors, whoever it is. Maybe
| multiple.
| bfirsh wrote:
| Founder of Replicate here. We should be on par or faster
| for all the top models. e.g. we have the fastest FLUX[dev]:
| https://artificialanalysis.ai/text-to-image/model-
| family/flu...
|
| If something's not as fast let me know and we can fix it.
| ben@replicate.com
| echelon wrote:
| Hey Ben, thanks for participating in this thread. And
| certainly also for all you and your team have built.
|
| Totally frank and possibly awkward question, you don't
| have to answer: how do you feel about a16z investing in
| _everyone_ in this space?
|
| They invested in you.
|
| They're investing in your direct competitors (Fal, et
| al.)
|
| They're picking your downmarket and upmarket (Krea, et
| al.)
|
| They're picking consumer (Viggle, et al.), which could
| lift away the value.
|
| They're picking the foundation models you consume. (Black
| Forest Labs, Hedra, et al.)
|
| They're even picking the actual consumers themselves.
| (Promise, et al.)
|
| They're doing this at Series A and beyond.
|
| Do you think they'll try to encourage dog-fooding or
| consolidation?
|
| The reason I ask is because I'm building adjacent or at a
| tangent to some of this, and I wonder if a16z is "all
| full up" or competitive within the portfolio. (If you can
| answer in private, my email is [my username] at gmail,
| and I'd be incredibly grateful to hear your thoughts.)
|
| Beyond that, how are you feeling? This is a whirlwind of
| a sector to be in. There's a new model every week it
| seems.
|
| Kudos on keeping up the pace! Keep at it!
| mac-mc wrote:
| That feels like the VC equivalent of buying a market-
| specific fund, so fairly par for the course?
| a2128 wrote:
| It seems more accurate than 4o image generation in terms of
| preserving original details. If I give it my 3D animal
| character and ask for a minor change like adjusting the
| lighting, 4o will completely mangle the face of my character
| and will slightly change the body and other details. This Flux
| model keeps the visible geometry almost perfectly the same,
| even when asked to significantly change the pose or lighting.
| echelon wrote:
| gpt-image-1 (aka "4o") is still the most useful general
| purpose image model, but damn does this come close.
|
| I'm deep in this space and feel really good about FLUX.1
| Kontext. It fills a much-needed gap, and it makes sure that
| OpenAI / Google aren't the runaway victors of images and
| video.
|
| Prior to gpt-image-1, the biggest problems in images were:
|   - prompt adherence
|   - generation quality
|   - instructiveness (eg. "put the sign above the second door")
|   - consistency of styles, characters, settings, etc.
|   - deliberate and exact intentional posing of characters and
|     set pieces
|   - compositing different images or layers together
|   - relighting
|
| Fine tunes, LoRAs, and IPAdapters fixed a lot of this, but
| they were a real pain in the ass. ControlNets solved for
| pose, but it was still awkward and ugly. ComfyUI was an
| orchestrator of this layer of hacks that kind of got the job
| done, but it was hacky and unmaintainable glue. It always
| felt like a fly-by-night solution.
|
| OpenAI's gpt-image-1 solved all of these things with a single
| multimodal model. You could throw out ComfyUI and all the
| other pre-AI garbage and work directly with the model itself.
| It was magic.
|
| Unfortunately, gpt-image-1 is ridiculously slow, insanely
| expensive, and highly censored (you can't use a lot of
| copyrighted characters or celebrities, and a lot of totally
| SFW prompts are blocked). It can't be fine-tuned, so you're
| stuck with the "ChatGPT style" and what the community calls
| the "piss filter" (perpetually yellowish images).
|
| And the biggest problem with gpt-image-1 is that because it
| puts image and text tokens in the same space to manipulate,
| it can't retain the exact pixel-level structure of reference
| images. Because of that, it cannot function as an
| inpainting/outpainting model whatsoever. You can't use it to
| edit existing images if the original image matters.
|
| Even with those flaws, gpt-image-1 was a million times better
| than Flux, ComfyUI, and all the other ball of wax hacks we've
| built up. Given the expense of training gpt-image-1, I was
| worried that nobody else would be able to afford to train the
| competition and that OpenAI would win the space forever. We'd
| be left with only hyperscalers of AI building these models.
| And it would suck if Google and OpenAI were the only
| providers of tools for artists.
|
| Black Forest Labs just proved that wrong in a big way! While
| this model doesn't do everything as well as gpt-image-1, it's
| within the same order of magnitude. And it's ridiculously
| fast (10x faster) and cheap (10x cheaper).
|
| Kontext isn't as instructive as gpt-image-1. You can't give
| it multiple pictures and ask it to copy characters from one
| image into the pose of another image. You can't have it
| follow complex compositing requests. But it's close, and that
| makes it immediately useful. It fills a much-needed gap in
| the space.
|
| Black Forest Labs did the right thing by developing this
| instead of a video model. We need much more innovation in the
| image model space, and we need more gaps to be filled:
|   - Fast
|   - Truly multimodal like gpt-image-1
|   - Instructive
|   - Posing built into the model. No ControlNet hacks.
|   - References built into the model. No IPAdapter, no required
|     character/style LoRAs, etc.
|   - Ability to address objects, characters, mannequins, etc.
|     for deletion / insertion.
|   - Ability to pull sources from across multiple images with
|     or without "innovation" / change to their pixels.
|   - Fine-tunable (so we can get higher quality and precision)
|
| Something like this that works in real time would literally
| change the game forever.
|
| Please build it, Black Forest Labs.
|
| All of those feature requests aside, Kontext is a great
| model. I'm going to be learning it over the next weeks.
|
| Keep at it, BFL. Don't let OpenAI win. This model rocks.
|
| Now let's hope Kling or Runway (or, better, someone who does
| open weights -- BFL!) develops a Veo 3 competitor.
|
| I need my AI actors to _"Meisner"_, and so far only Veo 3
| comes close.
| ttoinou wrote:
| Your comment is def why we come to HN :)
|
| Thanks for the detailed info
| tristanMatthias wrote:
| Thought the SAME thing
| meta87 wrote:
| this breakdown made my day, thank you!
|
| I'm building a web-based paint/image editor with AI
| inpainting etc.
|
| and this is going to be a great model to use, price-wise and
| capability-wise
|
| completely agree, so happy it's not any one of these big cos
| controlling the whole space!
| perk wrote:
| What are you building? Ping me if you want a tester of
| half-finished breaking stuff
| whywhywhywhy wrote:
| >Given the expense of training gpt-image-1, I was worried
| that nobody else would be able to afford to train the
| competition
|
| OpenAI models are expensive to train because it's
| beneficial for OpenAI models to be expensive and there is
| no incentive to optimize when they're gonna run in a server
| farm anyway.
|
| Probably a bunch of teams never bothered trying to
| replicate DALL-E 1+2 because the training run cost
| millions, yet SD1.5 showed us comparable tech can run on a
| home computer and be trained from scratch for thousands or
| fine-tuned for cents.
| qingcharles wrote:
| When I first saw gpt-image-1 I was equally scared that
| OpenAI had used its resources to push so far ahead that
| more open models would be left completely in the dust for
| the foreseeable future.
|
| Glad to see this release. It also puts more pressure onto
| OpenAI to make their model less lobotomized and to increase
| its output quality. This is good for everyone.
| reissbaker wrote:
| In my quick experimentation for image-to-image this feels even
| better than GPT-4o: 4o tends to heavily weight the colors
| towards sepia, to the point where it's a bit of an obvious tell
| that the image was 4o-generated (especially with repeated
| edits); FLUX.1 Kontext seems to use a much wider, more colorful
| palette. And FLUX, at least the Max version I'm playing around
| with on Replicate, nails small details that 4o can miss.
|
| I haven't played around with from-scratch generation, so I'm
| not sure which is best if you're trying to generate an image
| just from a prompt. But in terms of image-to-image via a
| prompt, it feels like FLUX is noticeably better.
| andybak wrote:
| Nobody tested that page on mobile.
| jnettome wrote:
| I'm trying to log in to evaluate this, but the Google auth
| redirects me back to localhost:3000.
| vunderba wrote:
| I'm debating whether to add the FLUX Kontext model to my GenAI
| image comparison site. The Max variant of the model definitely
| scores higher in _prompt adherence_, nearly doubling Flux 1
| Dev's score, but still falls short of OpenAI's gpt-image-1,
| which (visual fidelity aside) is sitting at the top of the
| leaderboard.
|
| I liked keeping Flux 1 Dev around just to have a nice baseline
| for local GenAI capabilities.
|
| https://genai-showdown.specr.net
|
| Incidentally, we did add the newest release of Hunyuan's Image
| 2.0 model but, as expected of a real-time model, it scores
| rather poorly.
|
| _EDIT: In fairness to Black Forest Labs, this model definitely
| seems to be more focused on editing capabilities to refine and
| iterate on existing images rather than on strict text-to-image
| creation._
| Klaus23 wrote:
| Nice site! I have a suggestion for a prompt that I could never
| get to work properly. It's been a while since I tried it, and
| the models have probably improved enough that it should be
| possible now.
|
|   A knight with a sword in hand stands with his back to us,
|   facing down an army. He holds his shield above his head to
|   protect himself from the rain of arrows shot by archers
|   visible in the rear.
|
| I was surprised at how badly the models performed. It's a
| fairly iconic scene, and there's more than enough training
| data.
| lawik wrote:
| Making an accurate flail (stick - chain - ball) is a fun
| sport... weird things tend to happen.
| meta87 wrote:
| please add! cool site thanks :)
| nopinsight wrote:
| Wondering if you could add "Flux 1.1 Pro Ultra" to the site?
| It's supposed to be the best among the Flux family of models,
| and far better than Flux Dev (3rd among your current
| candidates) at prompt adherence.
|
| Adding it would also provide a fair assessment for a leading
| open source model.
|
| The site is a great idea and features very interesting prompts.
| :)
| theyinwhy wrote:
| Looks good! Would be great to see Adobe Firefly in your
| evaluation as well.
| vunderba wrote:
| Some of these samples are rather cherry-picked. Has anyone
| actually tried the professional headshot app from the "Kontext
| Apps"?
|
| https://replicate.com/flux-kontext-apps
|
| I've thrown half a dozen pictures of myself at it and it just
| completely replaced me with somebody else. To be fair, the final
| headshot does look very professional.
| minimaxir wrote:
| Is the input image aspect ratio the same as the output aspect
| ratio? In some testing I've noticed that there is weirdness
| that happens if there is a forced shift.
| doctorpangloss wrote:
| Nobody has solved the scientific problem of identity
| preservation for faces in one shot. Nobody has even solved
| hands.
| emmelaich wrote:
| I tried making a realistic image from a cartoon character but
| aged. It did very well, definitely recognisable as the same
| 'person'.
| danielbln wrote:
| Best bet right now is still face swapping with something like
| insightface.
| mac-mc wrote:
| I tried a professional headshot prompt on the Flux playground
| with a tired gym selfie and it kept it as myself - same
| expression, sweat, skin tone and all. It was like a background
| swap. Then I expanded it to "make a professional headshot
| version of this image that would be good for social media, make
| the person smile, have a good pose and clothing, clean non-
| sweaty skin, etc" and it stayed pretty similar, except it
| swapped the clothing and gave me an awkward smile, which may be
| accurate for those kinds of things if you think about it.
| diggan wrote:
| It isn't mentioned on https://replicate.com/flux-kontext-
| apps/professional-headsho..., but on
| https://replicate.com/black-forest-labs/flux-kontext-pro
| under the "Prompting Best Practices" section it says this:
|
| > Preserve Intentionally
|
| > Specify what should stay the same: "while keeping the same
| facial features"
|
| > Use "maintain the original composition" to preserve layout
|
| > For background changes: "Change the background to a beach
| while keeping the person in the exact same position"
|
| So while the marketing seems to paint a picture that it'll
| preserve things automatically, and kind of understand exactly
| what you want changed, it doesn't seem like that's the full
| truth. You need to instead be very specific about what you
| want to preserve.
| pkrx wrote:
| It's convenient, but the results are def not significantly
| better than the available free stuff.
| layer8 wrote:
| > show me a closeup of...
|
| Investigators will love this for "enhance". ;)
| mdp2021 wrote:
| At some point, "Do not let the tool invent details!" will
| become a more frequent shout than most expressions.
| ilaksh wrote:
| Anyone have a guess as to when the open dev version gets
| released? More like a week, or a month or two, I wonder.
| mdp2021 wrote:
| Is input restricted to a single image? If you could use more
| images as input, you could do prompts like "Place the item in
| image A inside image B" (e.g. "put the character of image A in
| the scenery of image B"), etc.
| carlosdp wrote:
| There's an experimental "multi" mode that you can input
| multiple images into.
| echelon wrote:
| Fal has the multi-image interface to test against. (Replicate
| might as well, I haven't checked yet.)
|
| THIS MODEL ROCKS!
|
| It's no gpt-image-1, but it's ridiculously close.
|
| There isn't going to be a moat in images or video. I was so
| worried Google and OpenAI would win creative forever. Not so.
| Anyone can build these.
| bossyTeacher wrote:
| I wonder if this is using a foundation model or a fine-tuned one.
| fagerhult wrote:
| I vibed up a little chat interface:
| https://kontext-chat.vercel.app/
| Hard_Space wrote:
| Does not seem to want to deal with faces at all, flags
| everything with humans as 'sensitive' and declines.
| eamag wrote:
| Can it generate chess? https://manifold.markets/Hazel/an-ai-
| model-will-successfully...
| zamadatix wrote:
| The focus of this model is to be able to do iterative editing
| and/or use other images as a source, while the focus of that
| bet is to consistently one-shot a specific image 9/10 times
| with the same prompt. Given the canyon between those two
| focuses I don't think so, but maybe if you had an inventive
| enough prompt?
| gravitywp wrote:
| You can try it now on https://fluxk.art.
| xnorswap wrote:
| I tried this out and a hilarious "context-slip" happened:
|
| https://imgur.com/a/gT6iuV1
|
| It generated (via a prompt) an image of a spaceship landing on
| a remote planet.
|
| I asked for an edit: "The ship itself should be more colourful
| and a larger part of the image".
|
| And it replaced the spaceship with a container vessel.
|
| It had the chat history, so it should have understood I still
| wanted a spaceship, but it dropped the relevant context for
| what I was trying to achieve.
| gunalx wrote:
| I mean, to its credit, one of the container ships seems to be
| flying. /s
| sujayk_33 wrote:
| So here is my understanding of the current native image
| generation scenario. I might be wrong, so please correct me;
| I'm still learning it and I'd appreciate the help.
|
| Native image gen was first introduced in Gemini 1.5 Flash if
| I'm not wrong, and then OpenAI released it for 4o, which took
| over the internet with Ghibli art.
|
| We have been getting good quality images from almost all image
| generators like Midjourney, OpenAI and other providers, but the
| thing that made it special was the truly "multimodal" nature of
| it. Here's what I mean.
|
| When you used to ask ChatGPT to create an image, it would
| rephrase that prompt and internally send it to DALL-E;
| similarly, Gemini would send it to Imagen. Those were diffusion
| models, and they had little to no context in your next response
| about what's in the previous image.
|
| In native image generation, a single model understands audio,
| text and even image tokens, and it need not rely on diffusion
| models internally. I don't think either OpenAI or Google has
| released how they trained it, but my guess is that it's
| partially autoregressive and partially diffusion - I'm not
| sure about it.
| claudefocan wrote:
| This is not fully correct.
|
| The people behind Flux are the authors of the Stable Diffusion
| paper, which dates back to 2022.
|
| OpenAI initially had DALL-E, but Stable Diffusion was a massive
| improvement on DALL-E.
|
| Then OpenAI took inspiration from Stable Diffusion for gpt image
___________________________________________________________________
(page generated 2025-05-30 23:02 UTC)