[HN Gopher] OpenAI releases image generation in the API
___________________________________________________________________
OpenAI releases image generation in the API
Author : themanmaran
Score : 194 points
Date : 2025-04-24 19:27 UTC (3 hours ago)
(HTM) web link (openai.com)
(TXT) w3m dump (openai.com)
| minimaxir wrote:
| Pricing-wise, this API is going to be hard to justify unless you
| really can get value out of providing references. A generated
| `medium` 1024x1024 is $0.04/image, which is in the same cost
| class as Imagen 3 and Flux 1.1 Pro. Testing from their new
| playground (https://platform.openai.com/playground/images), the
| medium images are indeed lower quality than either of the two
| competitor models and still take 15+ seconds to generate:
| https://x.com/minimaxir/status/1915114021466017830
|
| Prompting the model is also substantially different from, and
| more difficult than, prompting traditional models, which is
| unsurprising given how the model works. The traditional image
| tricks don't work out-of-the-box and I'm struggling to get
| something that works without significant prompt augmentation
| (which is what I suspect was used for the ChatGPT image
| generations).
| tough wrote:
| It seems to me like this is a new hybrid product for -vibe
| coders- because otherwise the -wrapping- of prompting/improving
| a prompt with an LLM before hitting the text2image model can
| certainly be done cheaper, as you say, if you just run it
| yourself.
|
| maybe OpenAI thinks the model business is over and they need to
| start sherlocking all the way from the top down to final apps
| (thus their interest in buying Cursor, finally ending up with
| Windsurf)
|
| Idk, this feels like a new offering between a full raw API and a
| final product, where you abstract some of it for a few cents,
| and they're basically bundling their SOTA LLM models with their
| image models for extra margin
| vineyardmike wrote:
| > It seems to me like this is a new hybrid product for -vibe
| coders- because otherwise the -wrapping- of
| prompting/improving a prompt with an LLM before hitting the
| text2image model can certainly be done cheaper, as you say,
| if you just run it yourself.
|
| In case you didn't know, it's not just wrapping with an LLM.
| The image model they're referencing is directly integrated
| into the LLM itself. It's not possible to extract, because
| the LLM outputs tokens which are part of the image itself.
|
| That said, they're definitely trying to focus on building
| products over raw models now. They want to be a consumer
| subscription instead of commodity model provider.
| tough wrote:
| Right! I forgot the new model was a multi-modal one
| generating image outputs from both image and text inputs. I
| guess this is good, and the price will come down eventually.
|
| waiting for some FOSS multi-modal model to come out
| eventually too
|
| great to see OpenAI expanding into making actual usable
| products, I guess
| spilldahill wrote:
| yeah, the integration is the real shift here. by embedding
| image generation into the LLM's token stream, it's no
| longer a pipeline of separate systems but a single unified
| model interface. that unlocks new use cases where you can
| reason, plan, and render all in one flow. it's not just
| about replacing diffusion models, it's about making
| generation part of a broader agentic loop. pricing will
| drop over time, but the shift in how you build with this is
| the more interesting part.
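The unified-stream idea described above can be sketched as a toy decode loop. This is purely an illustration of the concept, not OpenAI's actual architecture: the vocabulary split, marker token, and "model" below are all invented for the example.

```python
# Toy illustration (NOT OpenAI's real architecture) of the difference the
# comment describes: one autoregressive loop emitting a mixed stream of
# text tokens and image tokens, instead of a text model handing a prompt
# to a separate diffusion pipeline.
from typing import Iterator

VOCAB_TEXT = range(0, 50_000)        # ordinary text tokens (hypothetical)
VOCAB_IMAGE = range(50_000, 60_000)  # image-patch tokens (hypothetical)
IMAGE_MARKER = 49_999                # token that switches to image output

def decode_step(history: list[int]) -> int:
    """Stand-in for the model: emit image tokens after an <image> marker."""
    if IMAGE_MARKER in history:
        return VOCAB_IMAGE.start + len(history) % 100  # fake image token
    return history[-1] + 1 if history else 0           # fake text token

def generate(prompt_tokens: list[int], steps: int) -> Iterator[int]:
    history = list(prompt_tokens)
    for _ in range(steps):
        tok = decode_step(history)
        history.append(tok)
        yield tok  # text and image tokens come out of the same loop

stream = list(generate([1, 2, IMAGE_MARKER], steps=5))
assert all(t in VOCAB_IMAGE for t in stream)  # after the marker: image tokens
```

Because everything is one token stream, the "plan, reason, then render" steps the comment mentions can interleave freely, which a text-model-plus-diffusion pipeline cannot do.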
| doctorpangloss wrote:
| It's far and away the most powerful image model right now.
| $0.04/image is a decent price!
| arevno wrote:
| This is extremely domain-specific. Diffusion models work much
| better for certain things.
| thot_experiment wrote:
| Can you cite an example? I'm really curious where that set
| of usecases lies.
| koakuma-chan wrote:
| Explicit adult content.
| thot_experiment wrote:
| False. That has nothing to do with the model architecture
| and everything to do with cloud inference providers
| wanting to avoid regulatory scrutiny.
| echelon wrote:
| I work in the space. There are a lot of use cases that
| get censored by OpenAI, Kling, Runway, and various other
| providers for a wide variety of reasons:
|
| - OpenAI is notorious for blocking copyrighted
| characters. They do prompt keyword scanning, but also run
| a VLM on the results so you can't "trick" the model.
|
| - Lots of providers block public figures and celebrities.
|
| - Various providers block LGBT imagery, even safe for
| work prompts. Kling is notorious for this.
|
| - I was on a sales call with someone today who runs a
| father's advocacy group. I don't know what system he was
| using, but he said he found it impossible to generate an
| adult male with a child. In a totally safe for work
| context.
|
| - Some systems block "PG-13" images of characters that
| are in bathing suits or scantily clad.
|
| None of this is porn, mind you.
| thot_experiment wrote:
| Sure but that has nothing to do with the model
| architecture and everything to do with the cloud
| inference providers wanting to cover their asses.
| throwaway314155 wrote:
| What does any of that have to do with the distinction
| between diffusion vs. autoregressive models?
| echelon wrote:
| I don't think so. This model kills the need for Flux,
| ComfyUI, LoRAs, fine tuning, and pretty much everything
| that's come before it.
|
| This is the god model in images right now.
|
| I don't think open source diffusion models can catch up
| with this. From what I've heard, this model took a huge
| amount of money to train - money that not even Black Forest
| Labs has access to.
| thot_experiment wrote:
| ComfyUI supports 4o natively so you get the best of both
| worlds, there is so much that you can't do with 4o
| because there's a fundamental limit on the level of
| control you can have over image generation when your
| conditioning is just tokens in an autoregressive model.
| There's plenty of reason to use comfy even if 4o is part
| of your workflow.
|
| As for LoRAs and fine tuning and open source in general;
| if you've ever been to civit.ai it should be immediately
| obvious why those things aren't going away.
| simonw wrote:
| It may lose against other models on prompt-to-image, but I'd be
| very excited to see another model that's as good as this one at
| image+prompt-to-image. Editing photos with ChatGPT over the
| past few weeks has been SO much fun.
|
| Here's my dog in a pelican costume:
| https://bsky.app/profile/simonwillison.net/post/3lneuquczzs2...
| steve_adams_86 wrote:
| The dog ChatGPT generated doesn't actually look like your
| dog. The eyes are so different. Really cute image, though.
| furyofantares wrote:
| I find prompting the model substantially easier than
| traditional models, is it really more difficult or are you just
| used to traditional models?
|
| I suspect what I'll do with the API is iterate at medium
| quality and then generate a high quality image when I'm done.
| thot_experiment wrote:
| Similarly to how 90% of my LLM needs are met by Mistral 3.1,
| there's no reason to use 4o for most t2i or i2i. However,
| there's a definite set of tasks that are not possible with
| diffusion models, or if they are, they require a giant ball of
| node spaghetti in comfyui to achieve. The price is high but the
| likelihood of getting the right answer on the first try is
| absolutely worth the cost imo.
| Sohcahtoa82 wrote:
| > A generated `medium` 1024x1024 is $0.04/image
|
| It's actually more than that. It's about 16.7 cents per image.
|
| $0.04/image is the pricing for DALL-E 3.
| weird-eye-issue wrote:
| No, it's not
| mkl wrote:
| 16.7 cents is the high quality cost, and medium is 4.2 cents:
| https://platform.openai.com/docs/pricing#:~:text=1M%20charac.
| ..
| Sohcahtoa82 wrote:
| Ah, they changed that page since I saw it yesterday.
|
| They didn't show low/med/high quality, they just said an
| image was a certain number of tokens with a price per token
| that led to $0.16/image.
| raincole wrote:
| ChatGPT's prompt adherence is light years ahead of all the
| others. I won't even call Flux/Midjourney its competitors.
| ChatGPT image gen is practically a one-of-a-kind unique
| product on the market: the only usable AI image editor for
| people without image editing experience.
|
| I think in terms of image generation, ChatGPT is the biggest
| leap since Stable Diffusion's release. LoRA/ControlNet/Flux are
| forgettable in comparison.
| soared wrote:
| This is a take so incredible it doesn't seem credible.
| tacoooooooo wrote:
| its 100% the correct take
| fkyoureadthedoc wrote:
| yeah this is my personal experience. The new image
| generation is the only reason I keep an OpenAI
| subscription rather than switching to Google.
| mediaman wrote:
| It is correct, the shift from diffusion to transformers is
| a very, very big difference.
| stavros wrote:
| I can confirm, ChatGPT's prompt adherence is so incredibly
| good, it gets even really small details right, to a level
| that diffusion-based generators couldn't even dream of.
| thegeomaster wrote:
| Well, there's also gemini-2.0-flash-exp-image-generation.
| Also autoregressive/transfusion based.
| thefourthchime wrote:
| Such a good name....
| adamhowell wrote:
| So, I've long dreamed of building an AI-powered
| https://iconfinder.com.
|
| I started Accomplice v1 back in 2021 with this goal in mind and
| raised some VC money but it was too early.
|
| Now, with these latest imagen-3.0-generate-002 (Gemini) and
| gpt-image-1 (OpenAI) models - especially this API release from
| OpenAI - I've been able to resurrect Accomplice as a little
| side project.
|
| Accomplice v2 (https://accomplice.ai) is just getting started
| back up again - I honestly decided to rebuild it only a couple
| weeks ago, in preparation for today, once I saw ChatGPT's new
| image model - but so far there are 1,000s of free-to-download
| PNGs, and any SVGs that have already been vectorized are free
| too (it costs a credit to vectorize).
|
| I generate new icons every few minutes from a huge list of
| "useful icons" I've built. Will be 100% pay-as-you-go. And for
| a credit, paid users can vectorize any PNGs they like, tweak
| them using AI, upload their own images to vectorize and
| download, or create their own icons (with my prompt injections
| baked in to get you good icon results)
|
| Do multi-modal models make something like this obsolete? I
| honestly am not sure. In my experience with Accomplice v1, a
| lot of users didn't know what to do with a blank textarea, so
| the thinking here is there's value in doing some of the work
| for them upfront with a large searchable archive. Would love to
| hear others' thoughts.
|
| But I'm having fun again either way.
| stavros wrote:
| That looks interesting, but I don't know how useful single
| icons can be. For me, the really useful part would be to get
| a suite of icons that all have a consistent visual style.
| Bonus points if I can prompt the model to generate more icons
| with that same style.
| throwup238 wrote:
| Recraft has a style feature where you give some images. I
| wonder if that would work for icons. You can also try
| giving an image of a bunch of icons to ChatGPT and have it
| generate more, then vectorize them.
| stavros wrote:
| I think the latter approach is the best bet right now,
| agree.
| varenc wrote:
| pretty amazing that, ~two years in, a 15-second-latency AI
| image generation API that costs 4 cents per image counts as
| lagging behind competitors.
| vunderba wrote:
| _> Prompting the model is also substantially different from,
| and more difficult than, prompting traditional models_
|
| Can you elaborate? This was not my experience - retesting the
| prompts that I used for my GenAI image shootout against the
| gpt-image-1 API produced largely similar results.
|
| https://genai-showdown.specr.net
| sebastiennight wrote:
| Hmm seems pricey.
|
| What's the current state of the art for API generation of an
| image from a reference plus modifier prompt?
|
| Say, in the 1c per HD (1920*1080) image range?
| minimaxir wrote:
| "Image from a reference" is a bit of a rabbit hole. For
| traditional image generation models, in order for it to learn a
| reference, you have to fine-tune it (LoRA) and/or use a
| conditioning model to constrain the output
| (InstantID/ControlNet)
|
| The interesting part of this GPT-4o API is that it doesn't need
| to learn them. But given the cost of `high` quality image
| generation, it's much cheaper to train a LoRA for Flux 1.1 Pro
| and generate from that.
| Tiberium wrote:
| Imagen supports image references in the API as well, just on
| Vertex, not on Gemini API yet.
| thot_experiment wrote:
| Reflux is fantastic for the basic reference-image-based
| editing most people are using this for, but 4o is far more
| powerful than any existing model because of its large scale
| and cross-modal understanding; there are things possible with
| 4o that are just 100% impossible with diffusion models (full
| glass of wine, horse riding an astronaut, room without pink
| elephants, etc.)
| gervwyk wrote:
| Great SVG generation would be far more useful! For example,
| being able to edit SVG images after they're generated by AI
| would make last-mile modifications quick. For our new website
| https://resonancy.io the simple SVG workflow images were
| still very much created by hand, and trying various AI tools
| to make such images yielded shockingly bad off-brand results
| even when provided multiple examples. By far the best tool for
| this is still Canva for us.
|
| Anyone know of an AI model for generating SVG images? Please
| share.
| tough wrote:
| SVGFusion https://arxiv.org/abs/2412.10437 which is a new paper
| from SVGRender group https://huggingface.co/SVGRender
|
| OmniSVG https://arxiv.org/abs/2504.06263v1
| gervwyk wrote:
| Amazing thanks for sharing! Will have a read. A commercial
| model would be something that I will pay for!
| tough wrote:
| I don't know about -commercial- offerings but you can try
| also something like SVGRender which you should be able to
| run on your own GPU etc https://ximinng.github.io/PyTorch-
| SVGRender-project/
|
| first paper linked on prior comment is the latest one from
| SVGRender group, but not sure if any runnable model weights
| are out yet for it (SVGFusion)
| simonw wrote:
| I was impressed with recraft.ai for SVGs -
| https://simonwillison.net/2024/Nov/15/recraft-v3/ - though as
| far as I can tell they generate raster images and then SVG-ize
| them before returning the result.
| jjcm wrote:
| Recraft also has an svg model: https://replicate.com/recraft-
| ai/recraft-v3-svg
|
| One note with these is that most of the production ones are
| actually diffusion models that get run through an image->svg
| model after. The issue with this is that the layers aren't
| set up semantically like you'd expect if you were crafting
| these by hand, or if you were directly generating SVGs. The
| results work, but they aren't perfect.
| vitorcremonez wrote:
| Try neoSVG or Recraft, it is awesome!
| smrt wrote:
| I don't understand why this API needs organization verification.
| More paperwork ahead. Facepalm
|
| PermissionDeniedError: Error code: 403 - {'error': {'message':
| 'To access gpt-image-1, please complete organization verification
| themanmaran wrote:
| Likely because they've seen a lot of the potential abuse
| capabilities. i.e. the "generate a drivers license with this
| face".
|
| So the options are: 1) nerf the model so it can't produce
| images like that, or 2) use some type of KYC verification.
| magackame wrote:
| The model is already pretty lobotomized refusing even mundane
| requests randomly.
|
| Upload a picture of a friend -> OK. Upload my own picture ->
| I can't generate anything involving real people.
|
| Also after they enabled global chat memory I started seeing
| my other chats leaking into the images as literal text.
| Disabled it since.
| pkulak wrote:
| I don't get it. I've been using `dall-e-3` over the public API
| for a couple years now. Is this just a new model?
|
| EDIT: Oh, yes, that's what it appears to be. Is it better? Why
| would I switch?
| themanmaran wrote:
| This is the new model that's available in ChatGPT, which most
| notably can do transfer generation. i.e. "take this image and
| restyle it to look like X". Or "take this sneaker and give me a
| billboard ad for it"
| danielbln wrote:
| This is their presumably autoregressive image model. It has
| outstanding prompt adherence and great detail, in addition to
| strong style transfer abilities.
| Sohcahtoa82 wrote:
| The new image generation model is miles ahead of DALL-E 3,
| especially when generating text.
| bradly wrote:
| Basically they are charging for the ability to make accurate
| text generation.
| film42 wrote:
| I generated 5 images in the playground. One using a text-only
| prompt and 4 using images from my phone. I spent $0.85 which
| isn't bad for a fun round of Studio Ghibli portraits for the
| family group chat, but too expensive to be used in a customer
| facing product.
| sumedh wrote:
| > but too expensive to be used in a customer facing product.
|
| Enhance headshots for putting on Linkedin.
| MisterBiggs wrote:
| Lots of comments on the price being too high, what are the odds
| this is a subsidized bare metal cost?
| kevinqi wrote:
| just based on how long it takes to produce these images, and
| how much text responses cost, I wouldn't be surprised at all if
| it was close to cost
| scyzoryk_xyz wrote:
| Intelligence is fast approaching utility status.
| 1oooqooq wrote:
| aren't you all embarrassed seeing lame press releases of the
| most uninteresting things at the top of the HN front page? I
| kinda feel bad.
| bobxmax wrote:
| I'm embarrassed that you find revolutionary tech uninteresting.
| 1oooqooq wrote:
| it's literally one feature now available in a different
| billing format. get a grip.
| sumedh wrote:
| This news is relevant for developers though.
| drakenot wrote:
| Does the AI have the same content restrictions that the chat
| service does?
| Imnimo wrote:
| I'm curious what the applications are where people need to
| generate hundreds or thousands of these images. I like making
| Ghibli-esque versions of family photos as much as the next
| person, but I don't need to make them in volume. As far as I can
| recall, every time I've used image generation, it's been one-off
| things that I'm happy to do in the ChatGPT UI.
| marviel wrote:
| AI-assisted education is promising.
| Etheryte wrote:
| That is true in a broader sense, but education and abundant
| money don't generally go hand in hand.
| marviel wrote:
| don't I know it
| samtp wrote:
| I'm still struggling to see how you would need thousands of
| AI generated images rather than just using existing real
| images for education.
| marviel wrote:
| - personalization (style, analogy to known concepts)
|
| - specificity (a diagram that perfectly encapsulates the
| exact set of concepts you're asking about)
| indeyets wrote:
| But LLMs are not reliable enough, so you cannot actually
| expect "specificity"
| marviel wrote:
| Not perfect now, but adequate in some domains. Will only
| get better.
| minimaxir wrote:
| As usual for AI startups nowadays, using this API you can
| create a downstream wrapper for image generation with bespoke
| prompts.
|
| A pro/con of the multimodal image generation approach (with an
| actually good text encoder) is that it rewards intense prompt
| engineering more so than others, and if there is a use case
| that can generate more than $0.17/image in revenue, that's
| positive marginal profit.
| austhrow743 wrote:
| I use the api because i don't use chatgpt enough to justify the
| cost of their UI offering.
| cuuupid wrote:
| When this was up yesterday I complained that the refusal rate was
| super high especially on government and military shaped tasks,
| and that this would only push contractors to use CN-developed
| open source models for work that could then be compromised.
|
| Today I'm discovering there is a tier of API access with
| virtually no content moderation available to companies working in
| that space. I have no idea how to go about requesting that tier
| of access, but have spoken to 4 different defense contractors in
| the last day who seem to already be using it.
| refulgentis wrote:
| It's "tier 5". I've had an account since the 3.0 days so I
| can't be _positive_ I'm not grandfathered in, but my
| understanding is that as long as you have a non-trivial
| amount of spend for a few months you'll have that access.
|
| (fwiw, for anyone curious how to implement it: it's the
| 'moderation' parameter in the JSON request you'll send. I
| missed it for a few hours because it wasn't in DALL-E 3)
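For the curious, a minimal sketch of where that 'moderation' parameter sits in the request body. The payload shape follows OpenAI's images API as documented for gpt-image-1 ("auto" and "low" are the documented values); treat the exact field set as an assumption to verify against the current API reference.

```python
# Sketch: building the JSON body for POST /v1/images/generations with the
# 'moderation' parameter discussed above (absent in DALL-E 3). Field names
# follow OpenAI's published images API; verify against current docs.
import json

def build_image_request(prompt: str, moderation: str = "auto") -> dict:
    """Assemble the request body for a gpt-image-1 generation call."""
    if moderation not in ("auto", "low"):
        raise ValueError("moderation must be 'auto' or 'low'")
    return {
        "model": "gpt-image-1",
        "prompt": prompt,
        "size": "1024x1024",
        "quality": "medium",
        "moderation": moderation,  # "low" relaxes filtering vs "auto"
    }

payload = build_image_request("a pelican riding a bicycle", moderation="low")
print(json.dumps(payload, indent=2))
```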
| dunkmaster wrote:
| API shows either auto or low available. Is there another
| secret value with even lower restrictions?
| refulgentis wrote:
| Not that I know of.
|
| I just took any indication that the parent post meant
| _absolute zero_ moderation as them being a bit loose with
| their words and excitable with how they understand things,
| there were some signs:
|
| 1. it's unlikely they completed an API integration quickly
| enough to have an opinion on military / defense image
| generation moderation _yesterday_, so they're almost
| certainly speaking about ChatGPT. (this is additionally
| confirmed by image generation requiring tier 5 anyway,
| which they would have been aware of if they had integrated)
|
| 2. The military / defense use cases for _image generation_
| are not provided (and the steelmanned version in other
| comments is nonsensical, i.e. we can quickly validate you
| can still generate kanban boards or wireframes of ships)
|
| 3. The poster passively disclaims being in military /
| defense themself (grep "in that space")
|
| 4. it is hard to envision cases of #2 that do not require
| universal moderation for OpenAI's sake, i.e. let's say their
| thought process is along the lines of: defense/military ~=
| what I think of as CIA ~= black ops ~= image manipulation
| on social media; thus, the time I said "please edit this
| photo of the ayatollah to have him eating pig and say I
| hate allah" means it's overmoderated for defense use cases
|
| 5. It's unlikely OpenAI wants to be _anywhere near_ the PR
| resulting from #3. Assuming there is a super secret defense
| tier that allows this, it's at the very least unlikely
| that the poster's defense contractor friends were blabbing
| about the exclusive completely unmoderated access they
| had, to the poster, within hours of release. They're
| pretty serious about that secrecy stuff!
|
| 6. It is unlikely the lack of ability to generate images
| using _GPT Image 1_ would drive the military to Chinese
| models (there aren't Chinese _LLMs_ that do this! even if
| there were, there's plenty of good ol' American diffusion
| models!)
| samtp wrote:
| What's a good use case for a defense contractor to generate AI
| images besides to include in presentations?
| aigen000 wrote:
| Fabricating evidence of weapons of mass destruction in some
| developing nation.
|
| I kid, more real world use cases would be for concept images
| for a new product or marketing campaigns.
| ZeroTalent wrote:
| Manufacturing consent
| subroutine wrote:
| Think of all the trivial ways an image generator could be
| used in business, and there is likely a similar use-case
| among the DoD and its contractors (e.g. create a cartoon
| image of a ship for a naval training aid; make a data
| dashboard wireframe concept for a decision aid).
| throwaway314155 wrote:
| > 4 different defense contractors in the last day
|
| Now I'm just wondering what the hell defense contractors need
| image generation for that isn't obviously horrifying...
| morleytj wrote:
| It's probably horrifying!
| Aeolun wrote:
| "Generate me a crowd of civilians with one terrorist in."
|
| "Please move them to some desert, not the empire state
| building."
|
| "The civilians are supposed to have turbans, not ballcaps."
| renewiltord wrote:
| They make presentations. Most of their work is presentations
| with diagrams. Icons.
| vFunct wrote:
| Show me a tunnel underneath a building in the desert filled
| with small arms weapons with a poster on the wall with a map
| of the United States and a label written with sharpie saying
| "Bad guys here". Also add various Arabic lettering on the
| weapons.
| kittikitti wrote:
| This is on purpose so OpenAI can then litigate against them.
| This API isn't about a new feature, it's about control. OpenAI
| is the biggest bully in the space of generative AI and their
| disinformation and intimidation tactics are working.
| subroutine wrote:
| Do you work with OpenAI models via FedRAMP GGC High Azure? If
| so I would love to hear more about your experience.
| jonplackett wrote:
| Does anyone know if you can give this endpoint an image as input
| along with text - not just an image to mask, but an image as part
| of a text input description.
|
| I can't see a way to do this currently, you just get a prompt.
|
| This, I think, is the most powerful way to use the new image
| model since it actually understands the input image and can make
| a new one based on it.
|
| Eg you can give it a person sitting at a desk and it can make
| one of them standing up. Or from another angle. Or on the moon.
| loktarogar wrote:
| Seems like exactly one of their examples, or am I missing
| something? "Create a new image using image references"
| https://platform.openai.com/docs/guides/image-generation#cre...
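A minimal sketch of the flow the linked docs describe: reference image(s) plus a text prompt go to the images.edit endpoint. The parameter names (`model`, `image`, `prompt`) follow the openai Python SDK for gpt-image-1; the file name and prompt are hypothetical, so verify against your installed SDK version.

```python
# Sketch: assembling an images.edit call that uses reference images plus a
# text prompt (gpt-image-1, per the docs linked above). The helper only
# builds the keyword arguments; the actual network call is shown commented
# out at the bottom.

def make_edit_args(image_paths: list[str], prompt: str) -> dict:
    """Assemble keyword arguments for client.images.edit()."""
    if not image_paths:
        raise ValueError("at least one reference image is required")
    return {
        "model": "gpt-image-1",
        "image": [open(p, "rb") for p in image_paths],  # reference image(s)
        "prompt": prompt,  # describes the new image in terms of the inputs
    }

# With a real API key and image file, the call would look like
# (filenames/prompt hypothetical):
#   from openai import OpenAI
#   client = OpenAI()
#   result = client.images.edit(**make_edit_args(
#       ["person-at-desk.png"], "the same person, standing up"))
```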
| adamhowell wrote:
| I think this is technically "image variations" and I think
| image variations are still only dall-e 3 for now (best I could
| tell earlier today from the API)
| badmonster wrote:
| Usage of gpt-image-1 is priced per token, with separate pricing
| for text and image tokens:
|
| - Text input tokens (prompt text): $5 per 1M tokens
| - Image input tokens (input images): $10 per 1M tokens
| - Image output tokens (generated images): $40 per 1M tokens
|
| In practice, this translates to roughly $0.02, $0.07, and $0.19
| per generated image for low-, medium-, and high-quality square
| images, respectively.
|
| that's a bit pricey for a startup.
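A back-of-envelope check of the image-output portion of those figures: per-image cost is just tokens times the $40/1M rate. The tokens-per-image counts below for 1024x1024 output are assumptions taken from OpenAI's pricing page at launch, so verify current values before relying on them.

```python
# Back-of-envelope: image output is billed at $40 per 1M tokens, so the
# output-token cost of one image is token_count * rate. Token counts per
# quality tier (1024x1024) are ASSUMED values from OpenAI's pricing page.
OUTPUT_RATE_PER_TOKEN = 40 / 1_000_000  # $40 per 1M image output tokens

TOKENS_PER_SQUARE_IMAGE = {  # assumed counts for a 1024x1024 image
    "low": 272,
    "medium": 1056,
    "high": 4160,
}

def image_cost(quality: str) -> float:
    """Dollar cost of the image-output tokens for one square image."""
    return TOKENS_PER_SQUARE_IMAGE[quality] * OUTPUT_RATE_PER_TOKEN

for q in ("low", "medium", "high"):
    print(f"{q}: ${image_cost(q):.4f}")
```

Under these assumed counts, medium lands near 4.2 cents and high near 16.7 cents, matching the per-image numbers quoted earlier in the thread; the rounder $0.02/$0.07/$0.19 figures presumably fold in prompt tokens and rounding.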
| GaggiX wrote:
| Far too expensive, I think I will wait for an equivalent Gemini
| model.
| claiir wrote:
| > GoDaddy is actively experimenting to integrate image generation
| so customers can easily create logos that are editable [..]
|
| I remember meeting someone on Discord 1-2 years ago (?) working
| on a GoDaddy effort to have customer-generated icons using
| bespoke foundation image gen models? Suppose that kind of bespoke
| model at that scale is ripe for replacement by gpt-image-1, given
| the instruction-following ability / steerability?
| jumploops wrote:
| For the curious, this is LLM-based rather than diffusion based,
| meaning that it adheres to text prompts with much higher
| accuracy.
|
| As an example, some users (myself included) of a generative
| image app were trying to make a picture of a person in the
| pouch of a kangaroo.
|
| No matter what we prompted, we couldn't get it to work.
|
| This new model did it in one shot!
| tezza wrote:
| For the curious I generated the same prompt for each of the
| quality types. 'Auto', 'low', 'medium', 'high'.
|
| Prompt: "a cute dog hugs a cute cat"
|
| https://x.com/terrylurie/status/1915161141489136095
|
| I also then showed a couple of DALL-E 3 images for comparison
| in a comment
___________________________________________________________________
(page generated 2025-04-24 23:00 UTC)