[HN Gopher] OpenAI releases image generation in the API
___________________________________________________________________
OpenAI releases image generation in the API
Author : themanmaran
Score : 194 points
Date : 2025-04-24 19:27 UTC (3 hours ago)
(HTM) web link (openai.com)
(TXT) w3m dump (openai.com)
| minimaxir wrote:
| Pricing-wise, this API is going to be hard to justify unless you
| really can get value out of providing references. A generated
| `medium` 1024x1024 is $0.04/image, which is in the same cost
| class as Imagen 3 and Flux 1.1 Pro. Testing from their new
| playground (https://platform.openai.com/playground/images), the
| medium images are indeed lower quality than either of the two
| competitor models and still take 15+ seconds to generate:
| https://x.com/minimaxir/status/1915114021466017830
|
| Prompting the model is also substantially different from, and
| more difficult than, prompting traditional models, which is
| unsurprising given how the model works. The traditional image
| tricks don't work out-of-the-box and I'm struggling to get
| something that works without significant prompt augmentation
| (which is what I suspect was used for the ChatGPT image
| generations).
| tough wrote:
| It seems to me like this is a new hybrid product for -vibe
| coders- because otherwise the -wrapping- of prompting/improving
| a prompt with an LLM before hitting the text2image model can
| certainly be done cheaper, as you say, if you just run it
| yourself.
|
| maybe OpenAI thinks the model business is over and they need to
| start sherlocking all the way from the top down to final apps
| (thus their interest in buying Cursor, finally ending up with
| Windsurf)
|
| Idk, this feels like a new offering between a full raw API and a
| final product, where you abstract some of it for a few cents,
| and they're basically bundling their SOTA LLM models with their
| image models for extra margin
| vineyardmike wrote:
| > It seems to me like this is a new hybrid product for -vibe
| coders- because otherwise the -wrapping- of
| prompting/improving a prompt with an LLM before hitting the
| text2image model can certainly be done cheaper, as you say,
| if you just run it yourself.
|
| In case you didn't know, it's not just wrapping with an LLM.
| The image model they're referencing is directly integrated
| into the LLM itself. It's not possible to extract, because
| the LLM outputs tokens which are part of the image itself.
|
| That said, they're definitely trying to focus on building
| products over raw models now. They want to be a consumer
| subscription instead of commodity model provider.
| tough wrote:
| Right! I forgot the new model was a multi-modal one
| generating image outputs from both image and text inputs. I
| guess this is good, and the price will come down eventually.
|
| waiting for some FOSS multi-modal model to come out
| eventually too
|
| great to see OpenAI expanding into making actual usable
| products, I guess
| spilldahill wrote:
| yeah, the integration is the real shift here. by embedding
| image generation into the LLM's token stream, it's no
| longer a pipeline of separate systems but a single unified
| model interface. that unlocks new use cases where you can
| reason, plan, and render all in one flow. it's not just
| about replacing diffusion models, it's about making
| generation part of a broader agentic loop. pricing will
| drop over time, but the shift in how you build with this is
| the more interesting part.
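The unified-stream idea described above can be sketched as a toy decode loop. This is purely an illustration of the concept, not OpenAI's actual architecture: the vocabulary split, marker token, and "model" below are all invented for the example.

```python
# Toy illustration (NOT OpenAI's real architecture) of the difference the
# comment describes: one autoregressive loop emitting a mixed stream of
# text tokens and image tokens, instead of a text model handing a prompt
# to a separate diffusion pipeline.
from typing import Iterator

VOCAB_TEXT = range(0, 50_000)        # ordinary text tokens (hypothetical)
VOCAB_IMAGE = range(50_000, 60_000)  # image-patch tokens (hypothetical)
IMAGE_MARKER = 49_999                # token that switches to image output

def decode_step(history: list[int]) -> int:
    """Stand-in for the model: emit image tokens after an <image> marker."""
    if IMAGE_MARKER in history:
        return VOCAB_IMAGE.start + len(history) % 100  # fake image token
    return history[-1] + 1 if history else 0           # fake text token

def generate(prompt_tokens: list[int], steps: int) -> Iterator[int]:
    history = list(prompt_tokens)
    for _ in range(steps):
        tok = decode_step(history)
        history.append(tok)
        yield tok  # text and image tokens come out of the same loop

stream = list(generate([1, 2, IMAGE_MARKER], steps=5))
assert all(t in VOCAB_IMAGE for t in stream)  # after the marker: image tokens
```

Because everything is one token stream, the "plan, reason, then render" steps the comment mentions can interleave freely, which a text-model-plus-diffusion pipeline cannot do.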
| doctorpangloss wrote:
| It's far and away the most powerful image model right now.
| $0.04/image is a decent price!
| arevno wrote:
| This is extremely domain-specific. Diffusion models work much
| better for certain things.
| thot_experiment wrote:
| Can you cite an example? I'm really curious where that set
| of usecases lies.
| koakuma-chan wrote:
| Explicit adult content.
| thot_experiment wrote:
| False. That has nothing to do with the model architecture
| and everything to do with cloud inference providers
| wanting to avoid regulatory scrutiny.
| echelon wrote:
| I work in the space. There are a lot of use cases that
| get censored by OpenAI, Kling, Runway, and various other
| providers for a wide variety of reasons:
|
| - OpenAI is notorious for blocking copyrighted
| characters. They do prompt keyword scanning, but also run
| a VLM on the results so you can't "trick" the model.
|
| - Lots of providers block public figures and celebrities.
|
| - Various providers block LGBT imagery, even safe for
| work prompts. Kling is notorious for this.
|
| - I was on a sales call with someone today who runs a
| father's advocacy group. I don't know what system he was
| using, but he said he found it impossible to generate an
| adult male with a child. In a totally safe for work
| context.
|
| - Some systems block "PG-13" images of characters that
| are in bathing suits or scantily clad.
|
| None of this is porn, mind you.
| thot_experiment wrote:
| Sure but that has nothing to do with the model
| architecture and everything to do with the cloud
| inference providers wanting to cover their asses.
| throwaway314155 wrote:
| What does any of that have to do with the distinction
| between diffusion vs. autoregressive models?
| echelon wrote:
| I don't think so. This model kills the need for Flux,
| ComfyUI, LoRAs, fine tuning, and pretty much everything
| that's come before it.
|
| This is the god model in images right now.
|
| I don't think open source diffusion models can catch up
| with this. From what I've heard, this model took a huge
| amount of money to train - money that not even Black Forest
| Labs has access to.
| thot_experiment wrote:
| ComfyUI supports 4o natively so you get the best of both
| worlds, there is so much that you can't do with 4o
| because there's a fundamental limit on the level of
| control you can have over image generation when your
| conditioning is just tokens in an autoregressive model.
| There's plenty of reason to use comfy even if 4o is part
| of your workflow.
|
| As for LoRAs and fine tuning and open source in general;
| if you've ever been to civit.ai it should be immediately
| obvious why those things aren't going away.
| simonw wrote:
| It may lose against other models on prompt-to-image, but I'd be
| very excited to see another model that's as good as this one at
| image+prompt-to-image. Editing photos with ChatGPT over the
| past few weeks has been SO much fun.
|
| Here's my dog in a pelican costume:
| https://bsky.app/profile/simonwillison.net/post/3lneuquczzs2...
| steve_adams_86 wrote:
| The dog ChatGPT generated doesn't actually look like your
| dog. The eyes are so different. Really cute image, though.
| furyofantares wrote:
| I find prompting the model substantially easier than
| traditional models, is it really more difficult or are you just
| used to traditional models?
|
| I suspect what I'll do with the API is iterate at medium
| quality and then generate a high quality image when I'm done.
| thot_experiment wrote:
| Similarly to how 90% of my LLM needs are met by Mistral 3.1,
| there's no reason to use 4o for most t2i or i2i. However,
| there's a definite set of tasks that are not possible with
| diffusion models, or if they are, they require a giant ball of
| node spaghetti in comfyui to achieve. The price is high but the
| likelihood of getting the right answer on the first try is
| absolutely worth the cost imo.
| Sohcahtoa82 wrote:
| > A generated `medium` 1024x1024 is $0.04/image
|
| It's actually more than that. It's about 16.7 cents per image.
|
| $0.04/image is the pricing for DALL-E 3.
| weird-eye-issue wrote:
| No, it's not
| mkl wrote:
| 16.7 cents is the high quality cost, and medium is 4.2 cents:
| https://platform.openai.com/docs/pricing#:~:text=1M%20charac.
| ..
| Sohcahtoa82 wrote:
| Ah, they changed that page since I saw it yesterday.
|
| They didn't show low/med/high quality, they just said an
| image was a certain number of tokens with a price per token
| that led to $0.16/image.
| raincole wrote:
| ChatGPT's prompt adherence is light years ahead of all the
| others. I won't even call Flux/Midjourney its competitors.
| ChatGPT image gen is practically a one-of-a-kind unique
| product on the market: the only usable AI image editor for
| people without image editing experience.
|
| I think in terms of image generation, ChatGPT is the biggest
| leap since Stable Diffusion's release. LoRA/ControlNet/Flux are
| forgettable in comparison.
| soared wrote:
| This is a take so incredible it doesn't seem credible.
| tacoooooooo wrote:
| its 100% the correct take
| fkyoureadthedoc wrote:
| yeah this is my personal experience. The new image
| generation is the only reason I keep an OpenAI
| subscription rather than switching to Google.
| mediaman wrote:
| It is correct, the shift from diffusion to transformers is
| a very, very big difference.
| stavros wrote:
| I can confirm, ChatGPT's prompt adherence is so incredibly
| good, it gets even really small details right, to a level
| that diffusion-based generators couldn't even dream of.
| thegeomaster wrote:
| Well, there's also gemini-2.0-flash-exp-image-generation.
| Also autoregressive/transfusion based.
| thefourthchime wrote:
| Such a good name....
| adamhowell wrote:
| So, I've long dreamed of building an AI-powered
| https://iconfinder.com.
|
| I started Accomplice v1 back in 2021 with this goal in mind and
| raised some VC money but it was too early.
|
| Now, with these latest imagen-3.0-generate-002 (Gemini) and
| gpt-image-1 (OpenAI) models - especially this API release from
| OpenAI - I've been able to resurrect Accomplice as a little
| side project.
|
| Accomplice v2 (https://accomplice.ai) is just getting started
| back up again - I honestly decided to rebuild it only a couple
| weeks ago, in preparation for today, once I saw ChatGPT's new
| image model - but so far there are 1,000s of free-to-download
| PNGs, and any SVGs that have already been vectorized are free
| too (it costs a credit to vectorize).
|
| I generate new icons every few minutes from a huge list of
| "useful icons" I've built. Will be 100% pay-as-you-go. And for
| a credit, paid users can vectorize any PNGs they like, tweak
| them using AI, upload their own images to vectorize and
| download, or create their own icons (with my prompt injections
| baked in to get you good icon results)
|
| Do multi-modal models make something like this obsolete? I
| honestly am not sure. In my experience with Accomplice v1, a
| lot of users didn't know what to do with a blank textarea, so
| the thinking here is there's value in doing some of the work
| for them upfront with a large searchable archive. Would love to
| hear others' thoughts.
|
| But I'm having fun again either way.
| stavros wrote:
| That looks interesting, but I don't know how useful single
| icons can be. For me, the really useful part would be to get
| a suite of icons that all have a consistent visual style.
| Bonus points if I can prompt the model to generate more icons
| with that same style.
| throwup238 wrote:
| Recraft has a style feature where you give some images. I
| wonder if that would work for icons. You can also try
| giving an image of a bunch of icons to ChatGPT and have it
| generate more, then vectorize them.
| stavros wrote:
| I think the latter approach is the best bet right now,
| agree.
| varenc wrote:
| pretty amazing that, ~two years in, a 15-second-latency AI
| image generation API that costs 4 cents per image counts as
| lagging behind competitors.
| vunderba wrote:
| _> Prompting the model is also substantially different from,
| and more difficult than, prompting traditional models_
|
| Can you elaborate? This was not my experience - retesting the
| prompts that I used for my GenAI image shootout against the
| gpt-image-1 API produced largely similar results.
|
| https://genai-showdown.specr.net
| sebastiennight wrote:
| Hmm seems pricey.
|
| What's the current state of the art for API generation of an
| image from a reference plus modifier prompt?
|
| Say, in the 1c per HD (1920*1080) image range?
| minimaxir wrote:
| "Image from a reference" is a bit of a rabbit hole. For
| traditional image generation models, in order for it to learn a
| reference, you have to fine-tune it (LoRA) and/or use a
| conditioning model to constrain the output
| (InstantID/ControlNet)
|
| The interesting part of this GPT-4o API is that it doesn't need
| to learn them. But given the cost of `high` quality image
| generation, it's much cheaper to train a LoRA for Flux 1.1 Pro
| and generate from that.
| Tiberium wrote:
| Imagen supports image references in the API as well, just on
| Vertex, not on Gemini API yet.
| thot_experiment wrote:
| Reflux is fantastic for the basic reference-image-based
| editing most people are using this for, but 4o is far more
| powerful than any existing model because of its large scale
| and cross-modal understanding; there are things possible with
| 4o that are just 100% impossible with diffusion models (full
| glass of wine, horse riding an astronaut, room without pink
| elephants, etc.)
| gervwyk wrote:
| Great SVG generation would be far more useful! For example,
| being able to edit SVG images after they're generated by AI
| would make last-mile modifications quick. For our new website
| https://resonancy.io the simple SVG workflow images were
| still very much created by hand, and trying various AI tools
| to make such images yielded shockingly bad off-brand results
| even when provided multiple examples. By far the best tool for
| this is still Canva for us.
|
| Anyone know of an AI model for generating SVG images? Please
| share.
| tough wrote:
| SVGFusion https://arxiv.org/abs/2412.10437 which is a new paper
| from SVGRender group https://huggingface.co/SVGRender
|
| OmniSVG https://arxiv.org/abs/2504.06263v1
| gervwyk wrote:
| Amazing thanks for sharing! Will have a read. A commercial
| model would be something that I will pay for!
| tough wrote:
| I don't know about -commercial- offerings but you can try
| also something like SVGRender which you should be able to
| run on your own GPU etc https://ximinng.github.io/PyTorch-
| SVGRender-project/
|
| first paper linked on prior comment is the latest one from
| SVGRender group, but not sure if any runnable model weights
| are out yet for it (SVGFusion)
| simonw wrote:
| I was impressed with recraft.ai for SVGs -
| https://simonwillison.net/2024/Nov/15/recraft-v3/ - though as
| far as I can tell they generate raster images and then SVG-ize
| them before returning the result.
| jjcm wrote:
| Recraft also has an svg model: https://replicate.com/recraft-
| ai/recraft-v3-svg
|
| One note with these is that most of the production ones are
| actually diffusion models that get run through an image->svg
| model after. The issue with this is that the layers aren't
| set up semantically like you'd expect if you were crafting
| these by hand, or if you were directly generating SVGs. The
| results work, but they aren't perfect.
| vitorcremonez wrote:
| Try neoSVG or Recraft, it is awesome!
| smrt wrote:
| I don't understand why this API needs organization verification.
| More paperwork ahead. Facepalm
|
| PermissionDeniedError: Error code: 403 - {'error': {'message':
| 'To access gpt-image-1, please complete organization verification
| themanmaran wrote:
| Likely because they've seen a lot of the potential abuse
| capabilities. i.e. the "generate a drivers license with this
| face".
|
| So the options are: 1) nerf the model so it can't produce
| images like that, or 2) use some type of KYC verification.
| magackame wrote:
| The model is already pretty lobotomized refusing even mundane
| requests randomly.
|
| Upload a picture of a friend -> OK. Upload my own picture ->
| I can't generate anything involving real people.
|
| Also after they enabled global chat memory I started seeing
| my other chats leaking into the images as literal text.
| Disabled it since.
| pkulak wrote:
| I don't get it. I've been using `dall-e-3` over the public API
| for a couple years now. Is this just a new model?
|
| EDIT: Oh, yes, that's what it appears to be. Is it better? Why
| would I switch?
| themanmaran wrote:
| This is the new model that's available in ChatGPT, which most
| notably can do transfer generation. i.e. "take this image and
| restyle it to look like X". Or "take this sneaker and give me a
| billboard ad for it"
| danielbln wrote:
| This is their presumably autoregressive image model. It has
| outstanding prompt adherence and great detail, in addition to
| strong style transfer abilities.
| Sohcahtoa82 wrote:
| The new image generation model is miles ahead of DALL-E 3,
| especially when generating text.
| bradly wrote:
| Basically they are charging for the ability to make accurate
| text generation.
| film42 wrote:
| I generated 5 images in the playground. One using a text-only
| prompt and 4 using images from my phone. I spent $0.85 which
| isn't bad for a fun round of Studio Ghibli portraits for the
| family group chat, but too expensive to be used in a customer
| facing product.
| sumedh wrote:
| > but too expensive to be used in a customer facing product.
|
| Enhance headshots for putting on Linkedin.
| MisterBiggs wrote:
| Lots of comments on the price being too high, what are the odds
| this is a subsidized bare metal cost?
| kevinqi wrote:
| just based on how long it takes to produce these images, and
| how much text responses cost, I wouldn't be surprised at all if
| it was close to cost
| scyzoryk_xyz wrote:
| Intelligence is fast approaching utility status.
| 1oooqooq wrote:
| aren't you all embarrassed seeing lame press releases of the
| most uninteresting things at the top of the HN front page? I
| kinda feel bad.
| bobxmax wrote:
| I'm embarrassed that you find revolutionary tech uninteresting.
| 1oooqooq wrote:
| it's literally one feature now available in a different
| billing format. get a grip.
| sumedh wrote:
| This news is relevant for developers though.
| drakenot wrote:
| Does the AI have the same content restrictions that the chat
| service does?
| Imnimo wrote:
| I'm curious what the applications are where people need to
| generate hundreds or thousands of these images. I like making
| Ghibli-esque versions of family photos as much as the next
| person, but I don't need to make them in volume. As far as I can
| recall, every time I've used image generation, it's been one-off
| things that I'm happy to do in the ChatGPT UI.
| marviel wrote:
| AI-assisted education is promising.
| Etheryte wrote:
| That is true in a broader sense, but education and abundant
| money don't generally go hand in hand.
| marviel wrote:
| don't I know it
| samtp wrote:
| I'm still struggling to see how you would need thousands of
| AI generated images rather than just using existing real
| images for education.
| marviel wrote:
| - personalization (style, analogy to known concepts)
|
| - specificity (a diagram that perfectly encapsulates the
| exact set of concepts you're asking about)
| indeyets wrote:
| But LLMs are not reliable enough, so you cannot actually
| expect "specificity"
| marviel wrote:
| Not perfect now, but adequate in some domains. Will only
| get better.
| minimaxir wrote:
| As usual for AI startups nowadays, using this API you can
| create a downstream wrapper for image generation with bespoke
| prompts.
|
| A pro/con of the multimodal image generation approach (with an
| actually good text encoder) is that it rewards intense prompt
| engineering more so than others, and if there is a use case
| that can generate more than $0.17/image in revenue, that's
| positive marginal profit.
| austhrow743 wrote:
| I use the api because i don't use chatgpt enough to justify the
| cost of their UI offering.
| cuuupid wrote:
| When this was up yesterday I complained that the refusal rate was
| super high especially on government and military shaped tasks,
| and that this would only push contractors to use CN-developed
| open source models for work that could then be compromised.
|
| Today I'm discovering there is a tier of API access with
| virtually no content moderation available to companies working in
| that space. I have no idea how to go about requesting that tier
| of access, but have spoken to 4 different defense contractors in
| the last day who seem to already be using it.
| refulgentis wrote:
| It's "tier 5". I've had an account since the 3.0 days so I
| can't be _positive_ I'm not grandfathered in, but my
| understanding is that as long as you have a non-trivial
| amount of spend for a few months you'll have that access.
|
| (fwiw, for anyone curious how to implement it: it's the
| 'moderation' parameter in the JSON request you'll send. I
| missed it for a few hours because it wasn't in DALL-E 3)
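For the curious, a minimal sketch of where that 'moderation' parameter sits in the request body. The payload shape follows OpenAI's images API as documented for gpt-image-1 ("auto" and "low" are the documented values); treat the exact field set as an assumption to verify against the current API reference.

```python
# Sketch: building the JSON body for POST /v1/images/generations with the
# 'moderation' parameter discussed above (absent in DALL-E 3). Field names
# follow OpenAI's published images API; verify against current docs.
import json

def build_image_request(prompt: str, moderation: str = "auto") -> dict:
    """Assemble the request body for a gpt-image-1 generation call."""
    if moderation not in ("auto", "low"):
        raise ValueError("moderation must be 'auto' or 'low'")
    return {
        "model": "gpt-image-1",
        "prompt": prompt,
        "size": "1024x1024",
        "quality": "medium",
        "moderation": moderation,  # "low" relaxes filtering vs "auto"
    }

payload = build_image_request("a pelican riding a bicycle", moderation="low")
print(json.dumps(payload, indent=2))
```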
| dunkmaster wrote:
| API shows either auto or low available. Is there another
| secret value with even lower restrictions?
| refulgentis wrote:
| Not that I know of.
|
| I just took any indication that the parent post meant
| _absolute zero_ moderation as them being a bit loose with
| their words and excitable with how they understand things,
| there were some signs:
|
| 1. it's unlikely they completed an API integration quickly
| enough to have an opinion on military / defense image
| generation moderation _yesterday_, so they're almost
| certainly speaking about ChatGPT. (this is additionally
| confirmed by image generation requiring tier 5 anyway,
| which they would have been aware of if they had integrated)
|
| 2. The military / defense use cases for _image generation_
| are not provided (and the steelmanned version in other
| comments is nonsensical, i.e. we can quickly validate you
| can still generate kanban boards or wireframes of ships)
|
| 3. The poster passively disclaims being in military /
| defense themself (grep "in that space")
|
| 4. it is hard to envision cases of #2 that do not require
| universal moderation for OpenAI's sake, i.e. let's say their
| thought process is along the lines of: defense/military ~=
| what I think of as CIA ~= black ops ~= image manipulation
| on social media; thus, the time I said "please edit this
| photo of the ayatollah to have him eating pig and say I
| hate allah" means it's overmoderated for defense use cases
|
| 5. It's unlikely OpenAI wants to be _anywhere near_ the PR
| resulting from #3. Assuming there is a super secret defense
| tier that allows this, it's at the very least unlikely
| that the poster's defense contractor friends were blabbing
| about the exclusive completely unmoderated access they
| had, to the poster, within hours of release. They're
| pretty serious about that secrecy stuff!
|
| 6. It is unlikely the lack of ability to generate images
| using _GPT Image 1_ would drive the military to Chinese
| models (there aren't Chinese _LLMs_ that do this! even if
| there were, there's plenty of good ol' American diffusion
| models!)
| samtp wrote:
| What's a good use case for a defense contractor to generate AI
| images besides to include in presentations?
| aigen000 wrote:
| Fabricating evidence of weapons of mass destruction in some
| developing nation.
|
| I kid, more real world use cases would be for concept images
| for a new product or marketing campaigns.
| ZeroTalent wrote:
| Manufacturing consent
| subroutine wrote:
| Think of all the trivial ways an image generator could be
| used in business, and there is likely a similar use-case
| among the DoD and its contractors (e.g. create a cartoon
| image of a ship for a naval training aid; make a data
| dashboard wireframe concept for a decision aid).
| throwaway314155 wrote:
| > 4 different defense contractors in the last day
|
| Now I'm just wondering what the hell defense contractors need
| image generation for that isn't obviously horrifying...
| morleytj wrote:
| It's probably horrifying!
| Aeolun wrote:
| "Generate me a crowd of civilians with one terrorist in."
|
| "Please move them to some desert, not the empire state
| building."
|
| "The civilians are supposed to have turbans, not ballcaps."
| renewiltord wrote:
| They make presentations. Most of their work is presentations
| with diagrams. Icons.
| vFunct wrote:
| Show me a tunnel underneath a building in the desert filled
| with small arms weapons with a poster on the wall with a map
| of the United States and a label written with sharpie saying
| "Bad guys here". Also add various Arabic lettering on the
| weapons.
| kittikitti wrote:
| This is on purpose so OpenAI can then litigate against them.
| This API isn't about a new feature, it's about control. OpenAI
| is the biggest bully in the space of generative AI and their
| disinformation and intimidation tactics are working.
| subroutine wrote:
| Do you work with OpenAI models via FedRAMP GGC High Azure? If
| so I would love to hear more about your experience.
| jonplackett wrote:
| Does anyone know if you can give this endpoint an image as input
| along with text - not just an image to mask, but an image as part
| of a text input description.
|
| I can't see a way to do this currently, you just get a prompt.
|
| This, I think, is the most powerful way to use the new image
| model since it actually understands the input image and can make
| a new one based on it.
|
| Eg you can give it a person sitting at a desk and it can make
| one of them standing up. Or from another angle. Or on the moon.
| loktarogar wrote:
| Seems like exactly one of their examples, or am I missing
| something? "Create a new image using image references"
| https://platform.openai.com/docs/guides/image-generation#cre...
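A minimal sketch of the flow the linked docs describe: reference image(s) plus a text prompt go to the images.edit endpoint. The parameter names (`model`, `image`, `prompt`) follow the openai Python SDK for gpt-image-1; the file name and prompt are hypothetical, so verify against your installed SDK version.

```python
# Sketch: assembling an images.edit call that uses reference images plus a
# text prompt (gpt-image-1, per the docs linked above). The helper only
# builds the keyword arguments; the actual network call is shown commented
# out at the bottom.

def make_edit_args(image_paths: list[str], prompt: str) -> dict:
    """Assemble keyword arguments for client.images.edit()."""
    if not image_paths:
        raise ValueError("at least one reference image is required")
    return {
        "model": "gpt-image-1",
        "image": [open(p, "rb") for p in image_paths],  # reference image(s)
        "prompt": prompt,  # describes the new image in terms of the inputs
    }

# With a real API key and image file, the call would look like
# (filenames/prompt hypothetical):
#   from openai import OpenAI
#   client = OpenAI()
#   result = client.images.edit(**make_edit_args(
#       ["person-at-desk.png"], "the same person, standing up"))
```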
| adamhowell wrote:
| I think this is technically "image variations" and I think
| image variations are still only dall-e 3 for now (best I could
| tell earlier today from the API)
| badmonster wrote:
| Usage of gpt-image-1 is priced per token, with separate pricing
| for text and image tokens:
|
| - Text input tokens (prompt text): $5 per 1M tokens
| - Image input tokens (input images): $10 per 1M tokens
| - Image output tokens (generated images): $40 per 1M tokens
|
| In practice, this translates to roughly $0.02, $0.07, and $0.19
| per generated image for low-, medium-, and high-quality square
| images, respectively.
|
| that's a bit pricey for a startup.
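A back-of-envelope check of the image-output portion of those figures: per-image cost is just tokens times the $40/1M rate. The tokens-per-image counts below for 1024x1024 output are assumptions taken from OpenAI's pricing page at launch, so verify current values before relying on them.

```python
# Back-of-envelope: image output is billed at $40 per 1M tokens, so the
# output-token cost of one image is token_count * rate. Token counts per
# quality tier (1024x1024) are ASSUMED values from OpenAI's pricing page.
OUTPUT_RATE_PER_TOKEN = 40 / 1_000_000  # $40 per 1M image output tokens

TOKENS_PER_SQUARE_IMAGE = {  # assumed counts for a 1024x1024 image
    "low": 272,
    "medium": 1056,
    "high": 4160,
}

def image_cost(quality: str) -> float:
    """Dollar cost of the image-output tokens for one square image."""
    return TOKENS_PER_SQUARE_IMAGE[quality] * OUTPUT_RATE_PER_TOKEN

for q in ("low", "medium", "high"):
    print(f"{q}: ${image_cost(q):.4f}")
```

Under these assumed counts, medium lands near 4.2 cents and high near 16.7 cents, matching the per-image numbers quoted earlier in the thread; the rounder $0.02/$0.07/$0.19 figures presumably fold in prompt tokens and rounding.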
| GaggiX wrote:
| Far too expensive, I think I will wait for an equivalent Gemini
| model.
| claiir wrote:
| > GoDaddy is actively experimenting to integrate image generation
| so customers can easily create logos that are editable [..]
|
| I remember meeting someone on Discord 1-2 years ago (?) working
| on a GoDaddy effort to have customer-generated icons using
| bespoke foundation image gen models? Suppose that kind of bespoke
| model at that scale is ripe for replacement by gpt-image-1, given
| the instruction-following ability / steerability?
| jumploops wrote:
| For the curious, this is LLM-based rather than diffusion based,
| meaning that it adheres to text prompts with much higher
| accuracy.
|
| As an example, some users (myself included) of a generative
| image app were trying to make a picture of a person in the
| pouch of a kangaroo.
|
| No matter what we prompted, we couldn't get it to work.
|
| This new model did it in one shot!
| tezza wrote:
| For the curious I generated the same prompt for each of the
| quality types. 'Auto', 'low', 'medium', 'high'.
|
| Prompt: "a cute dog hugs a cute cat"
|
| https://x.com/terrylurie/status/1915161141489136095
|
| I also then showed a couple of DALL-E 3 images for comparison
| in a comment
___________________________________________________________________
(page generated 2025-04-24 23:00 UTC)