[HN Gopher] 4o Image Generation
___________________________________________________________________
4o Image Generation
Author : meetpateltech
Score : 405 points
Date : 2025-03-25 18:06 UTC (4 hours ago)
(HTM) web link (openai.com)
(TXT) w3m dump (openai.com)
| minimaxir wrote:
| OpenAI's livestream of GPT-4o Image Generation shows that it is
| slowwwwwwwwww (maybe 30 seconds per image, which Sam Altman had
| to spin as "it's slow, but the generated images are worth it").
| Instead of using a diffusion approach, it appears to be
| generating the image tokens and decoding them akin to the
| original DALL-E (https://openai.com/index/dall-e/), which allows
| for streaming partial generations from top to bottom. In
| contrast, Google's Gemini can generate images and make edits in
| seconds.
|
| No API yet, and given the slowness I imagine it will cost much
| more than the $0.03+/image of competitors.
| kevmo314 wrote:
| Maybe this is the dialup of the era.
| ijidak wrote:
| Ha. That's a good analogy.
|
| When I first read the parent comment, I thought, maybe this
| is a long-term architecture concern...
|
| But your message reminded me that we've been here before.
| asadm wrote:
| especially with the slow loading effect it has.
| cubefox wrote:
| LLMs are autoregressive, so they can't be integrated (in a
| multimodal way) with diffusion image models, only with
| autoregressive image models (which generate an image via image
| tokens). Historically those had lower image fidelity than
| diffusion models. OpenAI now seems to have solved this problem
| somehow. More than that, they appear far ahead of any available
| diffusion model, including Midjourney and Imagen 3.
|
| Gemini "integrates" Imagen 3 (a diffusion model) only via a
| tool that Gemini calls internally with the relevant prompt. So
| it's not a true multimodal integration, as it doesn't benefit
| from the advanced prompt understanding of the LLM.
|
| Edit: Apparently Gemini also has an experimental native image
| generation ability.
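|
| For intuition, a toy sketch of the autoregressive image-token
| idea (stub model calls, not OpenAI's actual architecture): the
| image is quantized into a grid of discrete codebook tokens, and
| a transformer samples them one at a time in raster order, which
| is also why partial generations can stream top to bottom:
|
|     import torch
|
|     # Toy autoregressive image-token generator: a 16x16 grid of
|     # tokens drawn from a 1024-entry codebook, sampled one token
|     # at a time. A real system would use a trained transformer
|     # plus a learned decoder (e.g. a VQ-VAE) to map tokens back
|     # to pixels; both are stubs here.
|     GRID, VOCAB = 16, 1024
|
|     def next_token_logits(tokens: list[int]) -> torch.Tensor:
|         """Placeholder for a trained transformer's prediction."""
|         return torch.randn(VOCAB)
|
|     tokens: list[int] = []
|     for _ in range(GRID * GRID):  # left-to-right, top-to-bottom
|         probs = torch.softmax(next_token_logits(tokens), dim=-1)
|         tokens.append(int(torch.multinomial(probs, 1)))
|         # Every completed row of tokens can already be decoded to
|         # pixels, so a preview can stream in from the top down.
|
|     grid = torch.tensor(tokens).view(GRID, GRID)
|     print(grid.shape)  # torch.Size([16, 16])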
| argsnd wrote:
| Is this the same for their gemini-2.0-flash-exp-image-
| generation model?
| cubefox wrote:
| No that seems to be indeed a native part of the multimodal
| Gemini model. I didn't know this existed, it's not
| available in the normal Gemini interface.
| lxgr wrote:
| This is a pretty good example of the current state of
| Google LLMs:
|
| The (no longer, I guess) industry-leading features people
| actually want are hidden away in some obscure "AI studio"
| with horrible usability, while the headline Gemini app
| still often refuses to do anything useful for me.
| (Disclaimer: I last checked a couple of months ago, after
| several more rounds of mild amusement/great frustration.)
| tough wrote:
| hey at least now they bought ai.dev and redirected it to
| their bad ux
| echelon wrote:
| I expect the Chinese to have an open source answer for this
| soon.
|
| They haven't been focusing attention on images because the
| most used image models have been open source. Now they might
| have a target to beat.
| rfoo wrote:
| ByteDance has been working on autoregressive image
| generation for a while (see VAR, NeurIPS 2024 best paper).
| Traditionally they weren't in the open-source gang though.
| cubefox wrote:
| The VAR paper is very impressive. I wonder if OpenAI did
| something similar. But the main contribution in the new
| GPT-4o feature doesn't seem to be just image quality
| (which VAR seems to focus on), but also massively
| enhanced prompt understanding.
| summerlight wrote:
| Your understanding seems outdated; I think people are
| referring to Gemini's native image generation.
| SweetSoftPillow wrote:
| Gemini added their multimodal Flash model to Google AI Studio
| some time ago. It does not use Imagen via a tool; it uses its
| native capabilities to manipulate images, and it's free to
| try.
| johntb86 wrote:
| Meta has experimented with a hybrid mode, where the LLM uses
| autoregressive mode for text, but within a set of delimiters
| will switch to diffusion mode to generate images. In
| principle it's the best of both worlds.
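|
| Conceptually, the decoding loop might look something like this
| (a sketch; the delimiter tokens and every model call below are
| placeholders, not Meta's published interface):
|
|     # Hybrid decoder: autoregressive for text, but when the model
|     # emits a begin-image delimiter, hand off to a diffusion head.
|     BEGIN_IMAGE, END_IMAGE = "<img>", "</img>"
|
|     # Canned token stream standing in for real model output.
|     _DEMO = iter(["Here", "is", "your", "car:", BEGIN_IMAGE, "<eos>"])
|
|     def sample_next_text_token(context: list[str]) -> str:
|         """Placeholder for the LLM's next-token sampler."""
|         return next(_DEMO)
|
|     def run_diffusion(conditioning: list[str]) -> bytes:
|         """Placeholder for a diffusion model conditioned on context."""
|         return b"...png bytes..."
|
|     def generate(prompt: str, max_tokens: int = 256) -> list:
|         context, outputs = [prompt], []
|         for _ in range(max_tokens):
|             tok = sample_next_text_token(context)
|             if tok == "<eos>":
|                 break
|             if tok == BEGIN_IMAGE:
|                 outputs.append(run_diffusion(context))  # image slot
|                 context.append(END_IMAGE)
|             else:
|                 outputs.append(tok)
|                 context.append(tok)
|         return outputs
|
|     print(generate("Draw a red car at sunset"))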
| infecto wrote:
| As a user, images feel slightly slower but comparable to the
| previous generation. Given the significant quality improvement,
| it's a fair trade-off. Overall, it feels snappy, and the value
| justifies a higher price.
| rvz wrote:
| > ChatGPT's new image generation in GPT-4o rolls out starting
| today to Plus, Pro, Team, and Free users as the default image
| generator in ChatGPT, with access coming soon to Enterprise and
| Edu. For those who hold a special place in their hearts for
| DALL·E, it can still be accessed through a dedicated DALL·E GPT.
|
| > Developers will soon be able to generate images with GPT-4o via
| the API, with access rolling out in the next few weeks.
|
| That's it, folks. Tens of thousands of so-called "AI" image
| generator startups have been obliterated, taking digital
| artists with them, all reduced to near zero.
|
| Now you have a widely accessible meme generator with the name
| "ChatGPT".
|
| The last task is for an open-weight model that competes against
| this, is faster, and is free for everyone.
| afro88 wrote:
| Yep. The coherence and text quality are insanely good. Keen to
| play with it to find its "mangled hands"-style deficiencies,
| because of course they cherry-picked the best examples.
| dragonwriter wrote:
| > Tens of thousands of so-called "AI" image generator startups
| have been obliterated, taking digital artists with them, all
| reduced to near zero. Now you have a widely accessible meme
| generator with the name "ChatGPT".
|
| ChatGPT has already had that via DALL-E. If it didn't kill
| those startups when that happened, this doesn't fundamentally
| change anything. Now it's got a new image gen model, which --
| like Dall-E 3 when it came out -- is competitive or ahead of
| other SotA base models using just text prompts, the simplest
| generation workflow, but both more expensive and less adaptable
| to more involved workflows than the tools anyone more than a
| casual user (whether using local tools or hosted services) is
| using. This is station-keeping for OpenAI, not a meaningful
| change in the landscape.
| og_kalu wrote:
| There are several examples here, especially in the videos,
| that no existing image gen model can do and that would require
| tedious workflows and/or training regimens to replicate,
| maybe.
|
| It's not 'just' a new model a la Imagen 3. This is 'what if
| GPT could transform images nearly as well as text?' and that
| opens up a lot of possibilities. It's definitely a meaningful
| change.
| occamschainsaw wrote:
| Did they time it with the Gemini 2.5 launch?
| https://news.ycombinator.com/item?id=43473489
|
| Was it public information when Google was going to launch their
| new models? Interesting timing.
| qoez wrote:
| "Interesting timing" It's like the 4th time by my counting
| they've done this
| aabhay wrote:
| OpenAI was started with the express goal of undermining
| Google's potential lead in AI. The fact that they time launches
| to Google launches to me indicates they still see this as a
| meaningful risk. And with this launch in particular I find
| their fears more well-founded than ever.
| qoez wrote:
| Looks about what you'd get with FLUX and attaching some language
| model to enhance your prompt with eg more text
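|
| Roughly, that combo would look something like this (a sketch:
| expand_prompt is a hypothetical stand-in for the LLM rewrite
| step, and FLUX.1 [dev] is run via the open diffusers library):
|
|     import torch
|     from diffusers import FluxPipeline
|
|     def expand_prompt(short_prompt: str) -> str:
|         """Placeholder for an LLM call that rewrites a terse prompt
|         into a detailed one (composition, lighting, exact text)."""
|         return (short_prompt + ", golden hour lighting, 35mm photo, "
|                 "shallow depth of field, detailed background")
|
|     pipe = FluxPipeline.from_pretrained(
|         "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
|     pipe.enable_model_cpu_offload()  # fit on a single consumer GPU
|
|     image = pipe(expand_prompt("a red muscle car on a hill at sunset"),
|                  num_inference_steps=28, guidance_scale=3.5).images[0]
|     image.save("car.png")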
| afro88 wrote:
| Flux doesn't do text that well.
| echelon wrote:
| Exactly. OpenAI isn't going to win image and video.
|
| Sora is one of the _worst_ video generators. The Chinese have
| really taken the lead in video with Kling, Hailuo, and the open
| source Wan and Hunyuan.
|
| Wan with LoRAs will enable real creative work. Motion control,
| character consistency. There's no place for an OpenAI Sora type
| product other than as a cheap LLM add-in.
| minimaxir wrote:
| Flux 1.1 Pro has good prompt adherence, but some of these
| (admittedly cherry-picked) GPT-4o generated image demos are
| beyond what you would get with Flux without a lot of iteration,
| particularly the large paragraphs of text.
|
| I'm excited to see what a Flux 2 can do if it can actually use
| a modern text encoder.
| echelon wrote:
| Structural editing and control nets are much more powerful
| than text prompting alone.
|
| The image generators used by creatives will not be text-
| first.
|
| "Dragon with brown leathery scales with an elephant texture
| and 10% reflectivity positioned three degrees under the
| mountain, which is approximately 250 meters taller than the
| next peak, ..." is not how you design.
|
| Creative work is not 100% dice rolling in a crude and
| inadequate language. Encoding spatial and qualitative details
| is impossible. "A picture is worth a thousand words" is an
| understatement.
| jjmarr wrote:
| Yeah, but then it no longer replaces human artists.
|
| Controlnet has been the obvious future of image-generation
| for a while now.
| echelon wrote:
| We're not trying to replace human artists. We're trying
| to make them more efficient.
|
| We might find that the entire "studio system" is a gross
| inefficiency and that individual artists and directors
| can self-publish like on Steam or YouTube.
| dragonwriter wrote:
| > Yeah, but then it no longer replaces human artists.
|
| Automation tools are always more powerful as a force
| multiplier for skilled users than a complete replacement.
| (Which is still a replacement on any given task scope,
| since it reduces the number of human labor hours -- and,
| given any elapsed time constraints, human laborers --
| needed.)
| minimaxir wrote:
| Prompt adherence and additional tricks such as
| ControlNet/ComfyUI pipelines are not mutually exclusive.
| Both are very important to get good image generation
| results.
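|
| For reference, a minimal ControlNet sketch with the open
| diffusers library (model IDs and thresholds are illustrative;
| this is the open-source workflow, not anything the hosted
| models expose):
|
|     import cv2
|     import numpy as np
|     import torch
|     from PIL import Image
|     from diffusers import (ControlNetModel,
|                            StableDiffusionControlNetPipeline)
|
|     # Turn a reference photo into a Canny edge map that pins the
|     # layout, while the text prompt handles content and style.
|     ref = np.array(Image.open("reference.png").convert("RGB"))
|     gray = cv2.cvtColor(ref, cv2.COLOR_RGB2GRAY)
|     edges = cv2.Canny(gray, 100, 200)
|     control = Image.fromarray(np.stack([edges] * 3, axis=-1))
|
|     controlnet = ControlNetModel.from_pretrained(
|         "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
|     pipe = StableDiffusionControlNetPipeline.from_pretrained(
|         "runwayml/stable-diffusion-v1-5",
|         controlnet=controlnet, torch_dtype=torch.float16).to("cuda")
|
|     out = pipe("a dragon over a mountain at dusk",
|                image=control, num_inference_steps=30).images[0]
|     out.save("dragon.png")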
| Der_Einzige wrote:
| It is when it's kept behind an API. You cannot use
| Controlnet/ComfyUI and especially not the best stuff like
| regional prompting with this model. You can't do it with
| Gemini, and that's by design because otherwise coomers
| are going to generate 999999 anime waifus like they do on
| Civit.ai.
| Y_Y wrote:
| That just elicits a cheeky refusal I'm afraid:
|
| """
|
| That's a fun idea--but generating an image with 999,999
| anime waifus in it isn't technically possible due to
| visual and processing limits. But we can get creative.
|
| Want me to generate:
|
| 1. A massive crowd of anime waifus (like a big collage or
| crowd scene)?
|
| 2. A stylized representation of "999999 anime waifus"
| (maybe with a few in focus and the rest as silhouettes or
| a sea of colors)?
|
| 3. A single waifu with a visual reference to the number
| 999999 (like a title, emblem, or digital counter in the
| background)?
|
| Let me know your vibe--epic, funny, serious, chaotic?
|
| """
| voxic11 wrote:
| It can do in-context learning from images you upload. So
| you can just upload a depth map or mark up an image with
| the locations of edits you want and it should be able to
| handle that. I guess my point is that since it's the same
| model that understands how to see images and how to
| generate them you aren't restricted from interacting with
| it via text only.
| shaky-carrousel wrote:
| Tried it, the "compise armporressed" and "Pros: made bord
| reqotons" didn't impress me in the slightest.
| BoorishBears wrote:
| Are you sure you were even using the model from the post?
| shaky-carrousel wrote:
| Pressed the "Try in ChatGPT", pasted the first prompt, became
| thoroughly unimpressed.
| pton_xd wrote:
| Can you specify the output dimensions?
|
| EDIT: Seems not, "The smallest image size I can generate is
| 1024x1024. Would you like me to proceed with that, or would you
| like a different approach?"
| minimaxir wrote:
| I suspect you can prompt aspect ratios.
| resource_waste wrote:
| LPT: while the benchmarks don't show it, GPT-4 > 4o. It amazes
| me that people use 4o at all. But hey, it's the brand name and
| it's free.
|
| Of course 4.5 is best, but it's slow and I am afraid I'm going
| to hit limits.
| minimaxir wrote:
| OpenAI themselves discourage using GPT-4 outside of legacy
| applications, in favor of GPT-4o (they are shutting down the
| large-output gpt-4-32k variants in a few months). GPT-4 is
| also an order of magnitude more expensive and slower.
| zamadatix wrote:
| I think both of these points are what sow doubt in some
| people in the first place because both could be true if GPT-4
| was just less profitable to run, not if it was worse in
| quality. Of course it is actually worse in quality than 4o by
| any reasonable metric... but I guess not everyone sees it
| that way.
| xnx wrote:
| Will be interesting to see how this ranks against Google Imagen
| and Reve. https://huggingface.co/spaces/ArtificialAnalysis/Text-
| to-Ima...
| jfoster wrote:
| The character consistency and UI capabilities seem like they open
| up a lot of new use cases.
| JTyQZSnP3cQGa8B wrote:
| I'd like to know which use case because 2 years ago no one
| cared about pictures or generating stuff, and now it's vital
| for humanity but people can't explain why.
| Alupis wrote:
| For creating believable fake images...
|
| We're largely past the days of 7 fingered hands - text
| remains one of the tell-tale signs.
| jfoster wrote:
| Well I definitely wouldn't say it's vital for humanity. Has
| anyone actually said that?
|
| Character consistency means that these models could now
| theoretically illustrate books, as one example.
|
| Generating UIs seems like it would be very helpful for any
| app design or prototyping.
| olalonde wrote:
| Never heard about professional photographers, stock
| photography, graphic artists, etc.?
| colesantiago wrote:
| It's vital for grifting and not paying those cheeky expensive
| artists a dime.
|
| That is a great right, as long as it's not programmers.
| BoorishBears wrote:
| I work on a product for generating interactive fanfiction
| using an LLM, and I've put a lot of work into post-training
| to improve writing quality to match or exceed typical human
| levels.
|
| I'm excited about this for adding images to those
| interactive stories.
|
| It has nothing to do with circumventing the cost of artists
| or writers: regardless of cost, no one can put out a story
| and then rewrite it based on whatever idea pops into every
| reader's mind for their own personal main character.
|
| It's a novel experience that only a "writer" that scales by
| paying for an inanimate object to crunch numbers can
| enable.
|
| Similarly no artist can put out a piece of art for that
| story and then go and put out new art bespoke to every
| reader's newly written story.
|
| -
|
| I think there's this weird obsession with framing these
| tools as being built to just replace current people
| doing similar things. Just speaking objectively: the market
| for replacing "cheeky expensive artists" would not justify
| building these tools.
|
| The most interesting applications of this technology involve
| doing things that are simply not possible today, even if you
| have all the money in the world.
|
| And for the record, I'll be _ecstatic_ for the day an AI
| can reach my level of competency in building software. I've
| been doing it since I was a child because I love it,
| it's the one skill I've ever been paid for, and I'd still
| be over the moon because it'd let me explore so many more
| ideas than I alone can ever hope to build.
| bufferoverflow wrote:
| > _That is a great right, as long as it 's not
| programmers._
|
| You realize that almost weekly we have new AI models coming
| out that are better and better at programming? It just
| happens that image generation is an easier problem
| than programming. But make no mistake, AI is coming for us
| too.
|
| That's the price of automating everything.
| bbor wrote:
| Whelp. That's terrifying.
| gs17 wrote:
| This is really impressive, but the "Best of 8" tag on a lot of
| them really makes me want to see how cherry-picked they are. My
| three free images had two impressive outputs and one failure.
| do_not_redeem wrote:
| The high five looks extremely unnatural. Their wrists are
| aligned, but their fingers aren't, somehow?
|
| If that's best of 8, I'd love to see the outtakes.
| tiahura wrote:
| Agreed. It seems totally unnatural that a couple of nerds
| high-five awkwardly.
| do_not_redeem wrote:
| Not awkward. Anatomically uncanny and physically
| impossible.
| skydhash wrote:
| While drawing hands is difficult (because the surface
| morphs in a variety of ways), the shapes and relative
| proportions are quite simple. That's how you can have
| tools like Metahuman[0]
|
| [0]: https://www.unrealengine.com/en-US/metahuman
| aantix wrote:
| Still seems to have problems with transparent backgrounds.
| minimaxir wrote:
| That's expected with image generation models in general because they
| aren't trained with an alpha channel.
|
| It's more pragmatic to pipeline the results to a background
| removal model.
|
| EDIT: It appears GPT-4o is different, as there is a video demo
| dedicated to transparency.
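|
| For the pipeline approach, a minimal sketch with the open-source
| rembg library (just one way to do it, not what ChatGPT does
| internally):
|
|     from PIL import Image
|     from rembg import remove  # pip install rembg
|
|     # Generate an opaque image first, then strip the background
|     # with a segmentation model to get a transparent PNG.
|     generated = Image.open("generated.png")  # any model's output
|     cutout = remove(generated)               # RGBA PIL image
|     cutout.save("generated_transparent.png")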
| BoorishBears wrote:
| There's an entire video in the post dedicated to how well it
| does transparency:
| https://openai.com/index/introducing-4o-image-
| generation/?vi...
|
| I suspect we're getting a flood of comments from people who
| are still using DALL-E.
| minimaxir wrote:
| Huh, I missed that. I'm skeptical of the results in
| practice, though.
| aantix wrote:
| The video was helpful. I started with the prompt "Generate
| a transparent image. "
|
| And that created the isolated image on a transparent
| background.
|
| Thank-you.
| throwaway314155 wrote:
| This one however explicitly advertises good transparency
| support.
| Der_Einzige wrote:
| There's a mod for stable diffusion webui
| forge/automatic1111/ComfyUI which enables this for all
| diffusion models (except these closed source ones).
| sergiotapia wrote:
| Am I dumb, or every time they release something I can never
| figure out how to actually use it, and then I forget about it?
| Take this for instance: I wanted to try out their Newton
| example ("an infographic explaining newton's prism experiment
| in great detail"), but it generated a very bad result. Maybe
| it's because I'm not using the right model? Every release of
| theirs is not really a release, it's like a trailer. Right?
| throwaway314155 wrote:
| You're not dumb. They do this for nearly every single major
| release. I can't really understand why considering it generates
| negative sentiment about the release, but it's something to be
| expected from OpenAI at this point.
| swalsh wrote:
| This is what's so wild about Anthropic. When they release, it
| seems like it's rolled out to all users and API customers
| immediately. OpenAI has MONTHS between announcement and
| rollout, or if they do release, it's usually just influencers
| who get an "early look". It's pretty frustrating.
| guzik wrote:
| This is hilarious. I'm also confused about whether they
| released it or not because the results are underwhelming.
|
| EDIT: Ok it works in Sora, and my jaw dropped
| carbocation wrote:
| This works great for many purposes.
|
| One area where it does not work well at all is modifying
| photographs of people's faces.* It completely fumbles if you take a
| selfie and ask it to modify your shirt, for example.
|
| * = unless the people are in the training set
| cess11 wrote:
| That's to be expected, no? It's a usian product so it will be a
| disappointment in all areas where things could get lewd.
| briandear wrote:
| What is usian? Never heard of that.
| jakelazaroff wrote:
| US-ian, as in from the United States.
| briandear wrote:
| So should we be using Eusians for citizens of the Estados
| Unidos Mexicanos?
| BoorishBears wrote:
| > We're aware of a bug where the model struggles with
| maintaining consistency of edits to faces from user uploads but
| expect this to be fixed within the week.
|
| Sounds like it may be a safety thing that's still getting
| figured out
| carbocation wrote:
| Thanks, I had not seen that caveat!
| ilaksh wrote:
| It just doesn't have that kind of image editing capability.
| Maybe people just assume it does because Google's similar model
| has it. But did OpenAI claim it could edit images?
| BoorishBears wrote:
| Yes it does, and that's one of the most important parts of it
| being multi-modal: just like it can make targeted edits to a
| piece of text, it can now make similarly nuanced edits to an
| image. The character consistency and restyling they mention
| are all rooted in the same concepts.
| alach11 wrote:
| It's incredible that this took 316 days to be released since it
| was initially announced. I do appreciate the emphasis in the
| presentation on how this can be useful beyond just being a
| cool/fun toy, which is how most image generation tools seem to
| have functioned.
|
| Was anyone else surprised how slow the images were to generate in
| the livestream? This seems notably slower than DALL-E.
| byearthithatius wrote:
| I remember literally just two or three years back getting good
| text was INSANE. We were all amazed when SD started making pretty
| good text.
| afro88 wrote:
| Edit: Please ignore. They hadn't rolled the new model out to my
| account yet. The announcement blog post is a bit misleading
| saying you can try it today.
|
| --
|
| Comparison with Leonardo.Ai.
|
| ChatGPT:
| https://chatgpt.com/share/67e2fb21-a06c-8008-b297-07681dddee...
|
| ChatGPT again (direct one shot):
| https://chatgpt.com/share/67e2fc44-ecc8-8008-a40f-e1368d306e...
|
| ChatGPT again (using word "photorealistic instead of "photo"):
| https://chatgpt.com/share/67e2fce4-369c-8008-b69e-c2cbe0dd61...
|
| Leonardo.Ai Phoenix 1.0 model:
| https://cdn.leonardo.ai/users/1f263899-3b36-4336-b2a5-d8bc25...
| tetris11 wrote:
| Is the "2D animation style" part you put at the beginning and
| then changed an attempt to see how well the AI responds to
| gaslighting?
| afro88 wrote:
| My bad, I was trying the conversational aspect, but that's not
| an apples-to-apples comparison. I have put a direct one-shot
| example in the original post as well.
| wodenokoto wrote:
| In all fairness you _did_ say 2D animation style
| afro88 wrote:
| True. I had that conversation before deciding to compare to
| others. I have updated the post with other fairer examples.
| Nowhere near Leonardo Phoenix or Flux for this simple image
| at least.
| elicash wrote:
| What did the prompt look like for Leonardo.Ai?
|
| I'm curious if you said 2d animation style for both or just for
| chatgpt.
|
| Edit: Your second version of ChatGPT doesn't say
| photorealistic. Can you share the Leonardo.Ai prompt?
| afro88 wrote:
| Added photorealistic, which made it worse.
|
| Leonardo prompt: A golden cocker spaniel with floppy ears and
| a collar that says "Sunny" on it
|
| Model: Phoenix 1.0 Style: Pro color photography
| afro88 wrote:
| Saying "pro color photography" to ChatGPT doesn't get it
| any better either unfortunately: https://chatgpt.com/share/
| 67e2fd91-8d24-8008-b144-92c832ed0b...
| drew-y wrote:
| The ChatGPT examples don't look like the new Image Gen model
| yet. The text on the dog collar isn't very good.
| afro88 wrote:
| Apparently it rolls out today to Plus (which I have). I
| followed the "Try in ChatGPT" link at the top of the post
| og_kalu wrote:
| It's rolling out to everyone starting today, but I'm not
| sure if everyone has it yet. Does it generate top-down for
| you (the picture goes from mostly blurry to clear, starting
| from the top) like in their presentation?
| afro88 wrote:
| No it didn't generate like that. Thanks for clarifying. I
| have updated my original post.
| yed wrote:
| On mine I tried it "natively" and in DALL-E mode and the
| results were basically identical; I think they haven't
| actually rolled it out to everyone yet.
| spaceman_2020 wrote:
| Yeah, it's just not good enough. The big labs are way behind
| what the image-focused labs are putting out. Flux and
| Midjourney are running laps around these guys.
| KrazyButTrue wrote:
| Is it live yet? Have been trying it out and am still getting poor
| results on text generation.
| moffkalast wrote:
| You're supposed to generate images, stupid /s
| Maxatar wrote:
| I don't think it's available to everyone yet on 4o. Just like
| you I am getting the same "cartoony" styling and poor text
| generation.
|
| Might take a day or two before it's available in general.
| virtualcharles wrote:
| So far it seems to be the same for me.
|
| It seems like an odd way to name/announce it: there's nothing
| obvious to distinguish it from what was already there (i.e. 4o
| making images), so I have no idea if there is a UI change to
| look for, or whether to just keep trying stuff until it seems
| better.
| throwaway314155 wrote:
| This is OpenAI's bread and butter - announce something as
| though it's being launched and then proceed to slowly roll it
| out after a couple of days.
|
| Truly infuriating, especially when it's something like this
| that makes it tough to tell if the feature is even enabled.
| user3939382 wrote:
| I'll just be happy with not everything having that
| oversaturated CG/cartoon style that you can't prompt your way
| out of.
| jjeaff wrote:
| Is that an artifact of the training data? Where are all these
| original images with that cartoony look that it was trained on?
| minimaxir wrote:
| Ever since Midjourney popularized it, image generation models
| are often posttrained on more "aesthetic" subsets of images
| to give them a more fantasy look. It also helps obscure some
| of the imperfections of the AI.
| jl6 wrote:
| Wild speculation: video game engines. You want your model to
| understand what a car looks like from all angles, but it's
| expensive to get photos of real cars from all angles, so
| instead you render a car model in UE5, generating hundreds of
| pictures of it, from many different angles, in many different
| colors and styles.
| wongarsu wrote:
| A large part of deviantart.com would fit that description.
| There are also a lot of cartoony or CG images in communities
| dedicated to fanart. Another component in there is probably
| the overly polished and clean look of stock images, like the
| front page results of shutterstock.
|
| "Typical" AI images are this blend of the popular image
| styles of the internet. You always have a bit of digital
| drawing + cartoon image + oversaturated stock image + 3d
| render mixed in. Models trained on just one of these work
| quite well, but for a generalist model this blend of styles
| is an issue
| ToValueFunfetti wrote:
| I've heard this is downstream of human feedback. If you ask
| someone which picture is better, they'll tend to pick the
| more saturated option. If you're doing post-training with
| humans, you'll bake that bias into your model.
| alana314 wrote:
| I was relying on that to determine if images were AI though
| LeoPanthera wrote:
| Frustratingly, the DALL-E API actually has an option for this:
| you can switch it from "vivid" to "natural".
|
| This option is not exposed in ChatGPT, it only uses vivid.
| richardfulop wrote:
| you really have to NOT try to end up with that result in MJ.
| coherentpony wrote:
| > we've built our most advanced image generator yet into GPT-4o.
| The result--image generation that is not only beautiful, but
| useful.
|
| Sorry, but how are these useful? None of the examples demonstrate
| any use beyond being cool to look at.
|
| The article vaguely mentions 'providing inspiration' as a
| possible definition of 'useful'. I suppose.
| kh_hk wrote:
| > Introducing 4o Image Generation: [...] our most advanced image
| generator yet
|
| Then google:
|
| > Gemini 2.5: Our most intelligent AI model
|
| > Introducing Gemini 2.0 | Our most capable AI model yet
|
| I could go on forever. I hope this trend dies and Apple starts
| using something effective so all the other companies can start
| copying a new lexicon.
| hombre_fatal wrote:
| Maybe it's not useless. 1) it's only comparing it to their own
| products and 2) it's useful to know that the product is the
| current best in their offering as opposed to a new product that
| might offer new functionality but isn't actually their most
| advanced.
|
| Which is especially relevant when it's not obvious which
| product is the latest and best just looking at the names. Lots
| of tech naming fails this test from Xbox (Series X vs S) to
| OpenAI model names (4o vs o1-pro).
|
| Here they claim 4o is their most capable _image generator_
| which is useful info. Especially when multiple models in their
| dropdown list will generate images for you.
| Kiro wrote:
| What's the problem?
| kh_hk wrote:
| It's a nitpick about the repetitive phrasing for
| announcements
|
| <Product name>: Our most <superlative> <thing> yet|ever.
| echelon wrote:
| I hate modern marketing trends.
|
| This one isn't even my biggest gripe. If I could eliminate
| any word from the English language forever, it would be
| "effortlessly".
| kh_hk wrote:
| If you could _effortlessly_ eliminate any word you mean?
| mhurron wrote:
| Modern? Everything has been 'new and improved' since the
| 60's
| xboxnolifes wrote:
| Idk, right now I think I'd eliminate "blazingly fast"
| from software engineering vocabulary.
| rachofsunshine wrote:
| Speaking as someone who'd love to not speak that way in my
| own marketing - it's an unfortunate necessity in a world
| where people will give you literal milliseconds of their
| time. Marketing isn't there to tell you about the thing,
| it's there to get you to want to know more about the thing.
| skydhash wrote:
| A term for people giving only milliseconds of their
| attention is: uninterested people. If I'm not looking for
| a project planner, or interested in the space, there's no
| wording that can make me stay on an announcement for one.
| If I am, you can be sure I'm going to read the whole
| feature page.
| adammarples wrote:
| Idealistic and wrong; marketing does work in a lot of
| cases, and that's why everybody does it.
| sigmoid10 wrote:
| Has post-Jobs Apple ever come up with anything that would
| warrant this hope?
| internetter wrote:
| Every iPhone is their best iPhone yet
| brianshaler wrote:
| Even the 18 Pro Max Ultra with Apple Intelligence?
|
| Obligatory Jobs monologue on marketing people:
|
| https://www.youtube.com/watch?v=P4VBqTViEx4
| layer8 wrote:
| Only the September ones. ;)
| kh_hk wrote:
| No, but I think they stopped with "our most" (since all other
| brainless corps adopted it) and just connect adjectives with
| dots.
|
| Hotwheels: Fast. Furious. Spectacular.
| sigmoid10 wrote:
| Maybe people also caught up to the fact that the "our most
| X product" for Apple usually means someone else already did
| X a long time ago and Apple is merely jumping on the wagon.
| nyczomg wrote:
| https://www.youtube.com/watch?v=CUPDRnUWeBA
| Buttons840 wrote:
| Every step of gradient descent is the best model yet!
| sionisrecur wrote:
| Maybe they used AI to come up with the tag line.
| roenxi wrote:
| We're in the middle of a massive and unprecedented boom in AI
| capabilities. It is hard to be upset about this phrasing - it
| is literally true and extremely accurate.
| TheAceOfHearts wrote:
| I wanted to use this to generate funny images of myself. Recently
| I was playing around with Gemini Image Generation to dress myself
| up as different things. Gemini Image Generation is surprisingly
| good, although the image quality quickly degrades as you add more
| changes. Nothing harmful, just silly things like dressing me up
| as a wizard or other typical RPG roles.
|
| Trying out 4o image generation... It doesn't seem to support this
| use-case at all? I gave it an image of myself and asked it to
| turn me into a wizard, and it generated something that doesn't
| look like me in the slightest. On a second attempt, I asked it
| to add a wizard hat, and it just used Python to add a triangle
| in the middle of my image. I looked at the examples and saw
| they had a
| direct image modification where they say "Give this cat a
| detective hat and a monocle", so I tried that with my own image
| "Give this human a detective hat and a monocle" and it just gave
| me this error:
|
| > I wasn't able to generate the modified image because the
| request didn't follow our content policy. However, I can try
| another approach--either by applying a filter to stylize the
| image or guiding you on how to edit it using software like
| Photoshop or GIMP. Let me know what you'd like to do!
|
| Overall, a very disappointing experience. As another point of
| comparison, Grok also added image generation capabilities and
| while the ability to edit existing images is a bit limited and
| janky, it still manages to overlay the requested transformation
| on top of the existing image.
| og_kalu wrote:
| It's not actually out for everyone yet. You can tell by the
| generation style. 4o generates top down (picture goes from
| mostly blurry to clear starting from the top).
| planb wrote:
| To quote myself from a comment on sora:
|
| Iterations are the missing link. With ChatGPT, you can
| iteratively improve text (e.g., "make it shorter," "mention
| xyz"). However, for pictures (and video), this functionality is
| not yet available. If you could prompt iteratively (e.g.,
| "generate a red car in the sunset," "make it a muscle car,"
| "place it on a hill," "show it from the side so the sun shines
| through the windshield"), the tools would become exponentially
| more useful.
|
| I'm looking forward to trying this out and seeing if I was
| right. Unfortunately it's not yet available for me.
| Telemakhos wrote:
| Reading other comments in other threads on HN has left me with
| the impression that iterative improvement within a single chat
| is not a good idea.
|
| For example, https://news.ycombinator.com/item?id=43388114
| planb wrote:
| You're right. I'm actually doing this quite often when
| coding: starting with a few iterative prompts to get a
| general outline of what I want, and when that's OK, copying
| the outline to a new chat and fleshing out the details. But
| that's still iterative work; I'm just throwing away the
| intermediate results that I think sometimes confuse the LLM.
| Workaccount2 wrote:
| You can do that with Gemini's image model, flash 2.0 (image
| generation) exp.[1] It's not perfect but it does mostly
| maintain likeness between generations.
|
| [1]https://aistudio.google.com/prompts/new_chat
| camel_Snake wrote:
| Whisk I think is possibly the best at it. No idea what it
| uses under the hood though.
|
| https://labs.google/fx/tools/whisk
| jashephe wrote:
| The periodic table poster under "High binding problems" is billed
| as evidence of model limitations, but I wonder if it just
| suggests that 4o is a fan of "Look Around You".
| lxgr wrote:
| Is there any way to see whether a given prompt was serviced by 4o
| or Dall-E?
|
| Currently, my prompts seem to be going to the latter still, based
| on e.g. my source image being very obviously looped through a
| verbal image description and back to an image, compared to
| gemini-2.0-flash-exp-image-generation. A friend with a Plus plan
| has been getting responses from either.
|
| The long-term plan seems to be to move to 4o completely and move
| Dall-E to its own tab, though, so maybe that problem will resolve
| itself before too long.
| og_kalu wrote:
| 4o generates top down (picture goes from mostly blurry to clear
| starting from the top). If it's not generating like that for
| you then you don't have it yet.
| lxgr wrote:
| That's useful, thank you! But it also highlights my point:
| Why do I have to observe minor details about how the result
| is being presented to me to know which model was used?
|
| I get the intent to abstract it all behind a chat interface,
| but this seems a bit too much.
| og_kalu wrote:
| Oh I agree 100%. OpenAI rollouts leave much to be
| desired. Sometimes there isn't even a clear difference like
| there is for this.
| n2d4 wrote:
| If you don't have access to it on ChatGPT yet, you can try
| Sora, which already has access for me.
| tethys wrote:
| I've generated (and downloaded) a couple of images. All
| filenames start with `DALL·E`, so I guess that's a safe way to
| tell how the images were generated.
| blixt wrote:
| What's important about this new type of image generation that's
| happening with tokens rather than with diffusion, is that this is
| effectively reasoning in pixel space.
|
| Example: Ask it to draw a notepad with an empty tic-tac-toe grid, then
| tell it to make the first move, then you make a move, and so on.
|
| You can also do very impressive information-conserving
| translations, such as changing the drawing style, but also stuff
| like "change day to night", or "put a hat on him", and so forth.
|
| I get the feeling these models are quite restricted in
| resolution, and that more work in this space will let us do
| really wild things, such as asking a model to create an app
| step by step, first completely in images (essentially designing
| the whole app, text and all), then writing the code to
| reproduce it. And
| it also means that a model can take over from a really good
| diffusion model, so even if the original generations are not
| good, it can continue "reasoning" on an external image.
|
| Finally, once these models become faster, you can imagine a truly
| generative UI, where the model produces the next frame of the app
| you are using based on events sent to the LLM (which can do all
| the normal things like using tools, thinking, etc). However, I
| also believe that diffusion models can do some of this, in a much
| faster way.
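|
| A rough sketch of that frame loop (everything below is a stub;
| today's models are far too slow for this to be practical):
|
|     def render_next_frame(frame: bytes, event: dict) -> bytes:
|         """Placeholder for a multimodal model that returns the next
|         frame of the app as an image, conditioned on the previous
|         frame and the user's latest input event."""
|         return b"...png bytes..."
|
|     frame = b"...initial frame..."
|     for event in [{"type": "click", "x": 120, "y": 48},
|                   {"type": "key", "key": "Enter"}]:  # stub events
|         frame = render_next_frame(frame, event)
|         # display(frame)  # hand each frame to the real UI layer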
| Mond_ wrote:
| Pretty sure the modern Gemini image models can already do token
| based image generation/editing and are significantly better and
| faster.
| og_kalu wrote:
| It's faster, but it's definitely not better than what's being
| showcased here. The quality of Flash 2 image gens is
| generally pretty meh.
| blixt wrote:
| Yeah Gemini has had this for a few weeks, but much lower
| resolution. Not saying 4o is perfect, but my first few images
| with it are much more impressive than my first few images
| with Gemini.
| yieldcrv wrote:
| _weeks_, y'all, weeks!
| jjbinx007 wrote:
| It still can't generate a full glass of wine. Even in follow up
| questions it failed to manipulate the image correctly.
| yusufozkan wrote:
| Are you sure you are using the new 4o image generation?
|
| https://imgur.com/a/wGkBa0v
| minimaxir wrote:
| That is an unexpectedly literal definition of "full glass".
| numpad0 wrote:
| Except this is correct in this context. None of the existing
| diffusion models could do it, apparently.
| yusufozkan wrote:
| Generating an image of a completely full glass of wine
| has been one of the popular limitations of image
| generators, the reason being neural networks struggling
| to generalise outside of their training data (there are
| almost no pictures on the internet of a glass "full" of
| wine). It seems they implemented some reasoning over
| images to overcome that.
| kube-system wrote:
| I wonder if that has changed recently since this has
| become a litmus test.
|
| Searching in my favorite search engine for "full glass of
| wine", without even scrolling, three of the images are of
| wine glasses filled to the brim.
| Loeffelmann wrote:
| That's the point. The old models all failed to produce a wine
| glass that is completely full to the brim, because you can't
| find that a lot in the data they used for training.
| colecut wrote:
| Imagine if they just actually trained the model on a
| bunch of photographs of a full glass of wine, knowing of
| this litmus test
| HelloImSteven wrote:
| Even if they did, I'd assume the association of "full"
| and this correct representation would benefit other areas
| of the model. I.e., there could (/should?) be general
| improvement for prompts where objects have unusual
| adjectives.
|
| So maybe training for litmus tests isn't the worst
| strategy in the absence of another entire internet of
| training data...
| nefarious_ends wrote:
| imagine!
| gorkish wrote:
| I obviously have no idea if they added real or synthetic
| data to the training set specifically regarding the full-
| to-the-brim wineglass test, but I fully expect that this
| prompt is now compromised in the sense that because it is
| being discussed in the public sphere, it has inherently
| become part of the test suite.
|
| Remember the old internet adage that the fastest way to
| get a correct answer online is to post an incorrect one?
| I'm not entirely convinced this type of iterative gap
| finding and filling is really much different than natural
| human learning behavior.
| orbital-decay wrote:
| A lot of other things are rare in datasets, let alone
| correctly labeled. Overturned cars (showing the
| underside), views from under the table, people walking on
| the ceiling with plausible upside down hair, clothes, and
| facial features etc etc
| jorvi wrote:
| The old models were doing it correctly also.
|
| There is no one correct way to interpret 'full'. If you
| go to a wine bar and ask for a full glass of wine,
| they'll probably interpret that as a double. But you
| could also interpret it the way a friend would at home,
| which is about 2-3 cm from the rim.
|
| Personally I would call a glass of wine filled to the
| brim 'overfilled', not 'full'.
| drdeca wrote:
| [delayed]
| yusufozkan wrote:
| This is another cool example from their blog
|
| https://imgur.com/a/Svfuuf5
| Imustaskforhelp wrote:
| Looks amazing. Can you please also create an unconventional
| image, like a clock showing 2:35? I tried something like
| this with Gemini when some redditor asked for it and it
| failed, so I'm wondering if 4o can do it.
| Workaccount2 wrote:
| I tried and while the clock it generated was very well
| done and high quality, it showed the time as the analog
| clock default of 10:10.
| lyu07282 wrote:
| The problem now is we don't know if people mistake DALL-E
| output for the new multimodal GPT-4o output; they really
| should've made that clearer.
| CSMastermind wrote:
| I tried and it failed repeatedly (like actual error
| messages):
|
| > It looks like there was an error when trying to
| generate the updated image of the clock showing 5:03. I
| wasn't able to create it. If you'd like, you can try
| again by rephrasing or repeating the request.
|
| A few times it did generate an image but it never showed
| the right time. It would frequently show 10:10 for
| instance.
| coder543 wrote:
| If it tried and failed repeatedly, then it was prompting
| DALL-E, looking at the results, then prompting DALL-E
| again, not doing direct image generation.
| stevesearer wrote:
| Can you do this with the prompt of a cow jumping over the
| moon?
|
| I can't ever seem to get it to make the cow appear to be
| above the moon. Always literally covering it or to the side
| etc.
| michaelt wrote:
| https://chatgpt.com/share/67e31a31-3d44-8011-994e-b7f8af7
| 694... got it on the second try.
| coder543 wrote:
| To be clear, that is DALL-E, not 4o image generation.
| (You can see the prompt that 4o generated to give to
| DALL-E.)
| blixt wrote:
| Yeah, it seems like somewhere in the semantic space (which
| then gets turned into a high-resolution image, probably using
| a specialized model) there is not enough space to hold
| all this kind of information. It becomes really obvious when
| you try to meaningfully modify a photo of yourself: it will
| lose your identity.
|
| For Gemini it seems to me there's some kind of "retain old
| pixels" support in these models since simple image edits just
| look like a passthrough, in which case they _do_ maintain
| your identity.
| jasonjmcghee wrote:
| I don't buy the meme or w/e that they can't produce an image
| with the full glass of wine. Just takes a little prompt
| engineering.
|
| Using Dall-e / old model without too much effort (I'd call
| this "full".)
|
| https://imgur.com/a/J2bCwYh
| sfjailbird wrote:
| They're glass-half-full type models.
| meeton wrote:
| https://i.imgur.com/xsFKqsI.png
|
| "Draw a picture of a full glass of wine, ie a wine glass
| which is full to the brim with red wine and almost at the
| point of spilling over... Zoom out to show the full wine
| glass, and add a caption to the top which says "HELL YEAH".
| Keep the wine level of the glass exactly the same."
| Stevvo wrote:
| Can't replicate. Maybe the rollout is staggered? Using Plus
| from Europe, it's consistently giving me a half full glass.
| sionisrecur wrote:
| Maybe it's half empty.
| coder543 wrote:
| Is it drawing the image from top to bottom very slowly
| over the course of at least 30 seconds? If not, then
| you're using DALL-E, not 4o image generation.
| cruffle_duffle wrote:
| Maybe the "HELL YEAH" added a "party implication" which
| shifted it's "thinking" into just correct enough latent
| space that it was able to actually hunt down some image
| somewhere in its training data of a truly full glass of
| wine.
|
| I almost wonder if prompting it "similar to a full glass of
| beer" would get it shifted just enough.
| iagooar wrote:
| The question remains: why would you generate a full glass of
| wine? Is that something really that common?
| minimaxir wrote:
| It's a type of QA question that can identify peculiarities
| in models (e.g. counting the "r"s in "strawberry"), which is
| the best we have given the black-box nature of LLMs.
| rafram wrote:
| > Finally, once these models become faster, you can imagine a
| truly generative UI, where the model produces the next frame of
| the app you are using based on events sent to the LLM
|
| With current GPU technology, this system would need its own
| Dyson sphere.
| xg15 wrote:
| > _What 's important about this new type of image generation
| that's happening with tokens rather than with diffusion_
|
| That sounds really interesting. Are there any write-ups how
| exactly this works?
| lyu07282 wrote:
| Would be interested to know as well. As far as I know there
| is no public information about how this works exactly. This
| is all I could find:
|
| > The system uses an autoregressive approach -- generating
| images sequentially from left to right and top to bottom,
| similar to how text is written -- rather than the diffusion
| model technique used by most image generators (like DALL-E)
| that create the entire image at once. Goh speculates that
| this technical difference could be what gives Images in
| ChatGPT better text rendering and binding capabilities.
|
| https://www.theverge.com/openai/635118/chatgpt-sora-ai-
| image...
| treis wrote:
| I wonder how it'd work if the layers were more physically
| based. In other words something like rough 3d shape ->
| details -> color -> perspective -> lighting.
|
| Also wonder if you'd get better results in generating
| something like blender files and using its engine to render
| the result.
| abossy wrote:
| That's very interesting. I would have assumed that 4o is
| internally using a single seed for the entire conversation, or
| something analogous to that, to control randomness across image
| generation requests. Can you share the technical name for this
| reasoning process so I could look up research about it?
| SpaceManNabs wrote:
| multimodal chain of thought / generation of thought
|
| Nobody has really decided on a name.
|
| Also, chain of thought is somewhat different from chain of
| thought reasoning, so maybe throw in "multimodal chain of
| thought reasoning" as well.
| nine_k wrote:
| It also would mean that the model can correctly split the image
| into layers, or segments, matching the entities described. The
| low-res layers can then be fed to other image-processing
| models, which would enhance them and fill in missing small
| details. The result could be a good-quality animation, for
| instance, and the "character" layers can even potentially be
| reusable.
| SamBam wrote:
| Hmmm, I wanted to do that tic tac toe example, and it failed to
| create a 3x3 grid, instead creating a 5x5 (?) grid with two
| first moves marked.
|
| https://chatgpt.com/share/67e32d47-eac0-8011-9118-51b81756ec...
| nerder92 wrote:
| I tried to play it, and while the conversation is right the
| image is just all wrong
| Taek wrote:
| > What's important about this new type of image generation
| that's happening with tokens rather than with diffusion, is
| that this is effectively reasoning in pixel space.
|
| I do not think that this is correct. Prior to this release, 4o
| would generate images by calling out to a fully external model
| (DALL-E). After this release, 4o generates images by calling
| out to a multi-modal model that was trained alongside it. It's
| the same thing as LLaVA.
|
| You can ask 4o about this yourself. Here's what it said to me:
|
| "So while I'm deeply multimodal in cognition (understanding and
| coordinating text + image), image generation is handled by a
| linked latent diffusion model, not an end-to-end token-unified
| architecture."
| krackers wrote:
| So what's the lore on why this took over a _year_ to launch
| from the first announcement? It's fairly clear that their hand
| was forced by Google quietly releasing this exact feature a few
| weeks back though.
| Lerc wrote:
| I think the biggest problem I still see is the model's
| awareness of the images it generated itself.
|
| The glaring issue with the older image generators is how they
| would proudly proclaim to have presented an image with a
| description that has almost no relation to the image they
| actually provided.
|
| I'm not sure if this update improves on this aspect. It may
| create the illusion of awareness of the picture by having better
| prompt adherence.
| nprateem wrote:
| The garbled text on these things always just makes them
| basically useless, especially since it often adds text without
| being told to, like previous models did.
| chairdoor wrote:
| you are being served the old model
| mclau156 wrote:
| I would love to see advancement in the pixel art space,
| specifying 64x64 pixels and attempting to make game-ready pixel
| art and even animations, or even taking a reference image and
| creating a 64x64 version
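|
| Until a model emits a clean pixel grid natively, the usual
| post-processing trick is a nearest-neighbor downscale plus
| palette quantization (a sketch with Pillow; the sizes and
| palette are just illustrative):
|
|     from PIL import Image
|
|     # Post-process a generated image into "game-ready" pixel art:
|     # shrink to a 64x64 grid, reduce the palette, then scale back
|     # up with nearest-neighbor so the pixels stay crisp.
|     src = Image.open("generated.png").convert("RGB")
|     sprite = src.resize((64, 64), Image.NEAREST).quantize(colors=16)
|     sprite.save("sprite_64.png")
|
|     preview = sprite.resize((512, 512), Image.NEAREST)
|     preview.save("sprite_preview.png")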
| bbstats wrote:
| that "best of 8" is doing a lot of work. i put in the same input
| and the image is awful.
| nycdatasci wrote:
| Here's an example of iterative editing with the new model:
| https://chatgpt.com/share/67e30f62-12f0-800f-b1d7-b3a9c61e99...
|
| It's much better than prior models, but still generates hands
| with too many fingers, bodies with too many arms, etc.
| dmd wrote:
| You know the images themselves don't get shared in links like
| that, right? (It even tells you so when you make the link.)
| rahimnathwani wrote:
| I created a shared link just now, was not presented with any
| such warning, and have the same problem with the image not
| showing up:
|
| https://chatgpt.com/share/67e319dd-
| bd08-8013-8f9b-6f5140137f...
| dmd wrote:
| Interesting. I see this: https://imgur.com/a/QNWeEoZ
| rahimnathwani wrote:
| Aha! I see different messages in the Android app vs. web
| app.
|
| In the web app I see:
|
| Your name, custom instructions, and any messages you add
| after sharing stay private. Learn more
| andai wrote:
| The image shows for me.
| rahimnathwani wrote:
| For some reason, I can't see the images in that chat, whether
| I'm signed in or in incognito mode.
|
| I see errors like this in the console:
|
| ewwsdwx05evtcc3e.js:96 Error: Could not fetch file with ID file
| _0000000028185230aa1870740fa3887b?shared_conversation_id=67e30f
| 62-12f0-800f-b1d7-b3a9c61e99d6 from file service at
| iehdyv0kxtwne4ww.js:1:671 at async w
| (iehdyv0kxtwne4ww.js:1:600) at async queryFn
| (iehdyv0kxtwne4ww.js:1:458)Caused by:
| ClientRequestMismatchedAuthError: No access token when trying
| to use AuthHeader
| ravedave5 wrote:
| Everyone should try running their prompts and see how
| overhyped this is. The results I get are comparatively
| terrible.
| nycdatasci wrote:
| I don't think the new model is rolled out to all users yet.
| DrNosferatu wrote:
| Could they have switched to *both* image and text generation via
| diffusion, without tokens?
| M4v3R wrote:
| I've just tried it and oh wow it's really good. I managed to
| create a birthday invitation card for my daughter in basically
| one shot; it nailed exactly the elements and style I wanted. Then I
| asked to retain everything but tweak the text to add more details
| about the date, venue etc. And it did. I'm in shock. Previous
| models would not be even halfway there.
| swyx wrote:
| share prompt minus identifying details?
| M4v3R wrote:
| > Draw a birthday invitation for a 4 year old girl [name
| here]. It should be whimsical, look like its hand-drawn with
| little drawings on the sides of stuff like dinosaurs,
| flowers, hearts, cats. The background should be light and the
| foreground elements should be red, pink, orange and blue.
|
| Then I asked for some changes:
|
| > That's almost perfect! Retain this style and the elements,
| but adjust the text to read:
|
| > [refined text]
|
| > And then below it should add the location and date details:
|
| > [location details]
| tantaman wrote:
| Attention to every detail, even the awkward nerd high-five.
| danhds wrote:
| To avoid confusion, why not always use a general AI model
| upfront, then depending on the user's prompt, redirect it to a
| specific model?
| n2d4 wrote:
| The models are noticeably different -- for example, o1 and o3
| have reasoning, and some users (eg. me) want to tell the model
| when to use reasoning, and when not.
|
| As to why they don't automatically detect when reasoning could
| be appropriate and then switch to o3, I don't know, but I'd
| assume it's about cost (and for most users the difference in
| output quality is negligible). 4o can do everything, it's just
| not great at
| "logic".
| polotics wrote:
| well it failed on me, after many tries:
|
| ...Once the wait time is up, I can generate the corrected version
| with exactly eight characters: five mice, one elephant, one polar
| bear, and one giraffe in a green turtleneck. Let me know if you'd
| like me to try again later!
| n2d4 wrote:
| For those who are still getting the old DALL-E images inside
| ChatGPT, you can access the new model on Sora:
| https://sora.com/explore/images
| ibzsy wrote:
| Anyone else frightened by this? Seeing used to mean believing,
| and now that isn't the case anymore...
| layer8 wrote:
| Look closer at the fingers. These models still don't have a
| firm handle on them. The right elbow on the second picture also
| doesn't quite look anatomically possible.
| wepple wrote:
| This specifically? No. We've been on this path a while now.
|
| The general idea of indistinguishable real/fake images; yeah
| quectophoton wrote:
| Nah, I'll maybe start taking them seriously when they can draw
| someone grating cheese, but holding the cheese and the grater
| as if they were playing violin.
| kylehotchkiss wrote:
| They still all have a somewhat cold and sterile look to them.
| That's probably the 1% that the next decade will be spent
| working out.
| ashvardanian wrote:
| The pre-recorded short videos are a much better form of
| presentation than live-streamed announcements!
| freeopinion wrote:
| It bothers me to see links to content that requires a login. I
| don't expect openai or anyone else to give their services away
| for free. But I feel like "news" posts that require one to set
| up an account with a vendor are in bad faith.
|
| If the subject matter is paywalled, I feel that the post should
| include some explanation of what is newsworthy behind the link.
| macleginn wrote:
| A real improvement, but it still drew me a door with a handle
| where there should be one, and an extra knob on the side where
| the hinges are.
| trekkie1024 wrote:
| Interesting that in the second image the text on the whiteboard
| changes (top left)
| BigParm wrote:
| They say it must be an important OpenAI announcement when they
| bring out the twink.
| akomtu wrote:
| The real test for image generators is the image->text->image
| conversion. In other words, it should be able to describe an
| image with words and then use the words to recreate the
| original image with high accuracy.
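|
| As a harness, that round trip might look something like this
| (describe_image and generate_image are hypothetical stand-ins
| for whatever model is under test; imagehash is a real library
| used here as a crude similarity score):
|
|     from PIL import Image
|     import imagehash  # pip install imagehash
|
|     def describe_image(img: Image.Image) -> str:
|         """Hypothetical: ask the model for a detailed caption."""
|         return "a golden cocker spaniel wearing a collar"
|
|     def generate_image(caption: str) -> Image.Image:
|         """Hypothetical: ask the same model to render the caption."""
|         return Image.new("RGB", (1024, 1024))
|
|     original = Image.open("original.png")
|     reconstruction = generate_image(describe_image(original))
|
|     # Perceptual-hash distance as a crude reconstruction score:
|     # 0 means near-identical, larger means the round trip lost
|     # more information.
|     distance = imagehash.phash(original) - imagehash.phash(reconstruction)
|     print("round-trip distance:", distance)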
___________________________________________________________________
(page generated 2025-03-25 23:00 UTC)