[HN Gopher] Experiment with Gemini 2.0 Flash native image genera...
___________________________________________________________________
Experiment with Gemini 2.0 Flash native image generation
Author : meetpateltech
Score : 43 points
Date : 2025-03-12 16:06 UTC (6 hours ago)
(HTM) web link (developers.googleblog.com)
(TXT) w3m dump (developers.googleblog.com)
| jcuenod wrote:
| I was really hoping that there would be more character
| consistency, given the fact they mention it in the blog. It also
| doesn't seem to reliably follow styles like "watercolor
| illustration" or "line and wash".
| ilaksh wrote:
| Ever since OpenAI showed (but did not release) this type of
| multimodal output with 4o, I have been waiting for this to be
| available to the general public.
|
| It seems like really combining visuals at the level of generation
| capability means language understanding is fully grounded in a
| richer world model.
|
| I am hoping for a step up in real world common sense intelligence
| areas like those covered by SimpleBench. Although they are static
| images, so there might still be room for improvement as far as
| physics understanding.
|
| Also, if they can get it to the point of being really accurate
| (probably larger models), this unlocks whole industries in terms
| of being able to do useful work.
___________________________________________________________________
(page generated 2025-03-12 23:01 UTC)