[HN Gopher] Create and edit images with Gemini 2.0 in preview
___________________________________________________________________
Create and edit images with Gemini 2.0 in preview
Author : meetpateltech
Score : 134 points
Date : 2025-05-07 16:06 UTC (6 hours ago)
(HTM) web link (developers.googleblog.com)
(TXT) w3m dump (developers.googleblog.com)
| jansan wrote:
| Some examples are quite impressive, but the one with the polar
| bear on the white mug is very underwhelming, and the co-drawing
| looks like it was hacked together by a vibe coder.
| thornewolf wrote:
| The co-drawing is definitely not a fully fleshed-out product or
| anything but I think it is a great tech demo. What don't you
| like about it?
| egamirorrim wrote:
| I don't understand how to use this. I keep trying to edit a photo
| of myself (change a jacket to a t-shirt) in the Gemini app with
| 2.0 Flash selected, and it just generates a new image that's
| nothing like the original.
| thornewolf wrote:
| It is very sensitive to your input prompts. Minor differences
| will result in drastic quality differences.
| FergusArgyll wrote:
| I think this is just in AI Studio. In the Gemini app I think it
| goes: Flash describes the image to Imagen -> Imagen generates a
| new image
| julianeon wrote:
| Remember you are paying about 4 cents an image if I'm
| understanding the pricing correctly.
| thornewolf wrote:
| Model outputs look good-ish. I think they are neat. I updated my
| recent hack project https://lifestyle.photo to the new model.
| It's middling-to-good.
|
| There are a lot of failure modes still but what I want is a very
| large cookbook showing what known-good workflows are. Since this
| is just so directly downstream of (limited) training data, it
| might be that I am just prompting in an ever so slightly bad way.
| nico wrote:
| Love your project, great application of gen AI, very
| straightforward value proposition, excellent and clear
| messaging
|
| Very well done!
| thornewolf wrote:
| Thank you for the kind words! I am looking forward to
| creating a Show HN next week alongside a Product Hunt
| announcement. I appreciate any and all feedback. You can
| provide it through the website directly or through the email
| I have attached in my bio.
| sigmaisaletter wrote:
| Re your project: I'd expect at least the demo to not have an
| obvious flaw. The "lifestyle" version of your bag has a handle
| that is nearly twice as long as the "product" version.
| thornewolf wrote:
| This is a fair critique. While I am merely an "LLM wrapper", I
| should put the product's best foot forward and pay more
| attention to my showcase examples.
| ohadron wrote:
| For one thing, it's way faster than the OpenAI equivalent in a
| way that might unlock additional use cases.
| freedomben wrote:
| Speed has been the consistent thing I've noticed with Gemini
| too, even going back to the earlier days when Gemini was a bit
| of a laughing stock. Gemini is _fast_
| julianeon wrote:
| I don't know the exact speed/quality tradeoff, but I'll tell
| you this: Google may be erring too much on the speed side. It's
| fast but junk. I suspect a lot of people try it then bounce off
| back to Midjourney, like I did.
| eminence32 wrote:
| This seems neat, I guess. But whenever I try tools like this, I
| often run into the limits of what I can describe in words. I
| might try something like "Add some clutter to the desk, including
| stacks of paper and notebooks" but when it doesn't quite look
| like what I want, I'm not sure what else to do except try
| slightly different wordings until the output happens to land on
| what I want.
|
| I'm sure part of this is a lack of imagination on my part about
| how to describe the vague image in my own head. But I guess I
| have a lot of doubts about using a conversational interface for
| this kind of stuff
| monster_truck wrote:
| Chucking images at any model that supports image input and
| asking it to describe specific areas/things 'in extreme detail'
| is a decent way to get an idea of what it's expecting vs what
| you want.
| thornewolf wrote:
| +1 to this flow. I use the exact same phrase "in extreme
| detail" as well haha. Additionally, I ask the model to
| describe what prompt it might write to produce some edit
| itself.
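| For anyone who wants to script that flow, here's a minimal
| sketch against the Gemini REST API (image filename and prompt
| wording are illustrative; base64 -w0 assumes GNU coreutils):
    # Ask the model for a maximally detailed description of an
    # image, plus the prompt it would use to recreate it.
    curl -s -X POST \
      "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=$GEMINI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "contents": [{
          "parts": [
            {"inline_data": {"mime_type": "image/jpeg",
                             "data": "'$(base64 -w0 desk.jpg)'"}},
            {"text": "Describe this image in extreme detail, then write the prompt you would use to recreate it."}
          ]
        }]
      }'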
| qoez wrote:
| Maybe that's how the future will unfold. There will be subtle
| things AI fails to learn, and there will be differences in
| skills in how good people are at making AI do things, which
| will be a new skill in itself and will end up being a
| determining difference in pay in the future.
| crooked-v wrote:
| I just tried a couple of cases that ChatGPT is bad at
| (reproducing certain scenes/setpieces from classic tabletop RPG
| adventures, like the weird pyramid from classic D&D B4 The Lost
| City), and Gemini fails in just about the same way, getting
| architectural proportions and scenery details wrong even when
| given simple, broad rules about them. Adding more detail seems
| kind of pointless when it can't even get basics like "creature
| X is about as tall as the building around it" or "the pyramid
| is surrounded by ruined buildings" right.
| BoorishBears wrote:
| What's an example of a prompt you tried and it failed on?
| xbmcuser wrote:
| Ask Gemini to word your thoughts better, then use those to do
| the image editing.
| Nevermark wrote:
| Perhaps describe the types and styles of work associated with
| the desk, to give the clutter a coherent character.
| zoogeny wrote:
| I would politely suggest you work at getting better at this
| since it would be a pretty important skill in a world where a
| lot of creative work is done by AI.
|
| As some have mentioned, LLMs are treasure troves of information
| for learning how to prompt the LLM. One thing to get over is a
| fear of embarrassment in what you say to the LLM. Just write a
| stream of consciousness to the LLM about what you want and ask
| it to generate a prompt based on that. "I have an image that I
| am trying to get an image LLM to add some clutter to. But when
| I ask it to do it, like I say add some stack of paper and
| notebooks, but it doesn't look like I want because they are
| neat stacks of paper. What I want is a desk that kind of looks
| like it has been worked at for a while by a typical office
| worker, like at the end of the day with a half empty coffee cup
| and .... ". Just ramble away and then ask the LLM to give you
| the best prompt. And if it doesn't work, literally go back to
| the same message chain and say "I tried that prompt and it was
| [better|worse] than before because ...".
|
| This is one of those opportunities where life is giving you an
| option: give up or learn. Choose wisely.
| betterThanTexas wrote:
| > I'm sure part of this is a lack of imagination on my part
| about how to describe the vague image in my own head.
|
| In my experience this is more a limit of our ability to
| articulate than of our imagination. I can certainly produce
| images in my head that I have difficulty reproducing well and
| consistently via linguistic description.
| SketchySeaBeast wrote:
| It's almost as if being able to create art accurate to our
| mental vision requires practice and skill, be it the ability
| to create an image or to write it and invoke an image in
| others.
| betterThanTexas wrote:
| Absolutely! But this was surprising to me--my intuition
| says if I can firmly visualize something, I should be able
| to describe it. I think many people have this assumption
| and it's responsible for a lot of projection in our social
| lives.
| SketchySeaBeast wrote:
| Yeah, it's probably a good argument for having people try
| some form of art, to have them understand that their
| intent and their outcome are rarely the same.
| GaggiX wrote:
| Not available in the EU; the first version was, then it was
| removed.
|
| Btw, still not as good as ChatGPT but much, much faster; it's
| nice progress compared to the previous model.
| refulgentis wrote:
| Another release from Google!
|
| Now I can use:
|
| - Gemini 2.0 Flash Image Generation Preview (May) instead of
| Gemini 2.0 Flash Image Generation Preview (March)
|
| - or when I need text, Gemini 2.5 Flash Thinking 04-17 Preview
| ("natively multimodal" w/o image generation)
|
| - When I need to control thinking budgets, I can do that with
| Gemini 2.5 Flash Preview 04-17, with not-thinking at a 50% price
| increase over a month prior
|
| - And when I need realtime, fallback to Gemini 2.0 Flash 001 Live
| Preview (announced as In Preview on April 9 2025 after the
| Multimodal Live API was announced as released on December 11
| 2024)
|
| - I can't control Gemini 2.5 Pro Experimental/Preview/Preview IO
| Edition's thinking budgets, but good news follows in the next
| bullet: they'll swap the model out underneath me with one that
| thinks ~10x less, so at least it's in the same cost ballpark as
| their competitors
|
| - and we all got autoupgraded from Gemini 2.5 Pro Preview (03/25
| released 4/2) to Gemini 2.5 Pro Preview (IO Edition) yesterday!
| Yay!
| justanotheratom wrote:
| Yay! Do you use your Gemini in the Gemini app, AI Studio, or
| Vertex AI?
| refulgentis wrote:
| I am Don Quixote, building an app that abstracts over models
| (i.e. allows user choice), while providing them a user-
| controlled set of tools, and allowing users to write their
| own "scripts", i.e. precanned dialogue / response steps to
| permit e.g. building of search.
|
| Which is probably what makes me so cranky here. It's very
| hard keeping track of all of it and doing my best to lever up
| the models that are behind Claude's agentic capabilities, and
| all the Newspeak of Google PR makes it consume almost as much
| energy as the rest of the providers combined. (I'm v
| frustrated that I didn't realize till yesterday that 2.0
| Flash had quietly gone from 10 RPM to 'you can actually use
| it')
|
| I'm a Xoogler and I get why this happens ("preview" is a
| magic wand that means "you don't have to get everyone in
| bureaucracy across DeepMind/Cloud/? to agree to get this done
| and fill out their damn launchcal"), but, man.
| xnx wrote:
| A matrix of models, capabilities, and prices would be really
| useful.
| cush wrote:
| The doodle demo is super fun
|
| https://aistudio.google.com/apps/bundled/gemini-co-drawing?s...
| minimaxir wrote:
| Of note: per-image pricing for Gemini 2.0 image generation is
| $0.039, which is more expensive than Imagen 3 ($0.03 per image):
| https://ai.google.dev/gemini-api/docs/pricing
|
| The main difference is that Gemini allows incorporating a
| conversation to generate the image, as demoed here, while Imagen
| 3 is strictly text-in/image-out with optional mask-constrained
| edits, but likely allows for higher-quality images overall if
| you are skilled with prompt engineering. This is a nuance that
| is annoying to differentiate.
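| To make the conversational mode concrete: an image-in/image-out
| edit goes through the same generateContent endpoint, and multi-
| turn editing just appends the model's previous turn (including
| its returned image) plus your follow-up text to "contents". A
| minimal single-turn sketch (filename and prompt illustrative;
| base64 -w0 assumes GNU coreutils):
    # Send a source image plus an edit instruction; request image
    # output via responseModalities.
    curl -s -X POST \
      "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent?key=$GEMINI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "contents": [{
          "parts": [
            {"inline_data": {"mime_type": "image/jpeg",
                             "data": "'$(base64 -w0 product.jpg)'"}},
            {"text": "Replace the background with a sunlit kitchen counter; keep the product unchanged"}
          ]
        }],
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
      }'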
| ipsum2 wrote:
| > likely allows for higher-quality images overall
|
| What makes you say that?
| vunderba wrote:
| Anecdotal but from preliminary sandbox testing side-by-side
| with Gemini 2.0 Flash and Imagen 3.0 - it definitely appears
| that _that_ is the case - higher overall visual quality from
| Imagen 3.
| adverbly wrote:
| Google totally crushing it and stock is down 8% today :|
|
| Is it just me or is the market just absolutely terrible at
| understanding the implications and speed of progress behind
| what's happening right now in the walls of big G?
| lenerdenator wrote:
| The market is absolutely terrible at a _lot_ of things.
| abirch wrote:
| A potential reason that GOOG is down right now is that Apple is
| looking at AI Search Engines.
|
| https://www.bloomberg.com/news/articles/2025-05-07/apple-wor...
|
| Although AI is fun and great, an AI search engine may have
| the problem of being unprofitable. It's similar to how 23andMe
| got many customers by selling a 500 dollar test to people for
| 100 dollars.
| xnx wrote:
| Would be quite a financial swing for Apple from getting paid
| billions of dollars by Google for search to having to spend
| billions of dollars to make their own.
| abirch wrote:
| From the article (Eddy Cue is Apple's senior vice president
| of services): "Cue said he believes that AI search
| providers, including OpenAI, Perplexity AI Inc. and
| Anthropic PBC, will eventually replace standard search
| engines like Alphabet's Google. He said he believes Apple
| will bring those options to Safari in the future."
|
| So Apple may not be making their own, but they won't be
| spending billions either. I'm wondering how people will
| be able to monetize the searches so that they make money.
| mattlondon wrote:
| FWIW I searched this story not long after it broke and
| Google - yes the traditional "old school search engine" -
| had an AI-generated summary of the story with a breakdown
| of the whys and hows right there at the top of the page.
| This was basically real time, give or take 10 minutes.
|
| I am not sure why people think OpenAI et al are going to
| eat Google's lunch here. Seems like they're already doing
| AI-for-search and if there is anyone who can do it
| cheaply and at scale I bet on Google being the ones to do
| it (with all their data centers, data
| integrations/crawlers, and custom hardware and experience
| etc). I doubt some startup using the Bing-index and
| renting off-the-shelf Nvidia hardware using investor-
| funds is going to leapfrog Google-scale infrastructure
| and expertise.
| resource_waste wrote:
| Why would any of this have an impact on stock prices?
|
| LLMs are insanely competitive and a dime a dozen now. Most
| professional uses can get away with local models.
|
| This is image generation... Niche cases in another saturated
| market.
|
| How are any of these supposed to make google billions of
| dollars?
| mNovak wrote:
| I'm getting mixed results with the co-drawing demo, in terms of
| understanding what stick figures are, which seems pretty
| important for the 99% of us who can't draw a realistic human. I
| was hoping to sketch a scene, and let the model "inflate" it, but
| I ended up with 3D rendered stick figures.
|
| Seems to help if you explicitly describe the scene, but then the
| drawing-along aspect seems relatively pointless.
| qq99 wrote:
| Wasn't this already available in AI Studio? It sounds like they
| also improved the image quality. It's hard to keep up with what's
| new with all these versions
| taylorhughes wrote:
| Image editing/compositing/remixing is not quite as good as gpt-
| image-1, but the results are really compelling anyway due to the
| dramatic increase in speed! Playing with it just now, it's often
| 5 seconds for a compositing task between multiple images. Feels
| totally different from waiting 30s+ for gpt-image-1.
| vunderba wrote:
| I've added/tested this multimodal Gemini 2.0 to my shoot-out of
| SOTA image gen models (OpenAI 4o, Midjourney 7, Flux, etc.) which
| contains a collection of increasingly difficult prompts.
|
| https://genai-showdown.specr.net
|
| I don't know how much of Google's original Imagen 3.0 is
| incorporated into this new model, but the overall aesthetic
| quality unfortunately seems to be _significantly worse_.
|
| The big "wins" are:
|
| - Multimodal aspect in trying to keep parity with OpenAI's
| offerings.
|
| - An order of magnitude faster than OpenAI 4o image gen
| belter wrote:
| Your shoot-out site is very useful. Could I suggest adding
| prompts that expose common failure modes?
|
| For example, asking the models to show clocks set to a specific
| time, or people drawing with their left hand. I think most, if
| not all, models will likely display every clock with the same
| time... and portray subjects drawing with their right hand.
| crooked-v wrote:
| Another one I would suggest is buildings with specific unusual
| proportions and details (e.g. "the mansion's west wing is
| twice the height of the right wing and has only very wide
| windows"). I've yet to find a model that will do that kind of
| thing reliably, where it seems to just fall back on the vibes
| of whatever painting or book cover is vaguely similar to
| what's described.
| droopyEyelids wrote:
| generating a simple maze for kids is also not possible yet
| vunderba wrote:
| Love this one so I've added it. The concept is very easy
| for most GenAI models to grasp, but it requires a strong
| overall cohesive understanding. Rather unbelievably,
| OpenAI 4o managed to produce a pass.
|
| I should also add an image that is heavy with "greebles".
| GenAI usually lacks the fidelity for these kinds of minor
| details, so although it adds them, they tend to fall
| apart under more than a cursory examination.
|
| https://en.wikipedia.org/wiki/Greeble
| vunderba wrote:
| @belter / @crooked-v
|
| Thanks for the suggestions. Most of the current prompts are a
| result of personal images that I wanted to generate - so I'll
| try to add some "classic GenAI failure modes". Musical
| instruments such as pianos also used to be a pretty big
| failure point.
| pentagrama wrote:
| I want to take a step back and reflect on what this actually
| shows us. Look at the examples Google provides: it refers to the
| generated objects as "products", clearly pointing toward shopping
| or e-commerce use cases.
|
| It seems like the real goal here, for Google and other AI
| companies, is a world flooded with endless AI-generated variants
| of objects that don't even exist yet, crafted to be sold and
| marketed (probably by AI too) to hyper-targeted audiences. This
| feels like an incoming wave of "AI slop", mass-produced synthetic
| content, crashing against the small island of genuine human
| craftsmanship and real, existing objects.
| vunderba wrote:
| Yeah - and honestly I don't really get this. Using GenAI for
| real-world products seems like a recipe for a slew of incoming
| fraudulent advertising lawsuits if the images are slightly
| different from the actual physical products yet _presented_ as
| if they are real photographs.
| nkozyra wrote:
| The gating factor here is the pool of consumers. Once people
| have slop exhaustion there's nobody to sell this to.
|
| Maybe this is why all of the future AI fiction has people
| dressed in the same bland clothing.
| simonw wrote:
| Be a bit careful playing with this one. I tried this:
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Provide a vegetarian recipe for butter chicken but with chickpeas not chicken and include many inline illustrations along the way"}
      ]
    }],
    "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
  }' > /tmp/out.json
|
| And got back 41MB of JSON with 28 base64 images in it:
| https://gist.github.com/simonw/55894032b2c60b35f320b6a166ded...
|
| At 4c per image that's more than a dollar on that single prompt.
|
I built this quick tool, which you can paste that JSON into to
see it rendered: https://tools.simonwillison.net/gemini-image-json
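
If you want the images out of that JSON locally rather than via
the browser tool, something like this should work (a rough
sketch; assumes jq and GNU base64, and guesses the PNG
extension):
    # Pull each inline image part out of the response and decode
    # it to a numbered file.
    i=0
    jq -r '.candidates[0].content.parts[] | .inlineData.data // empty' /tmp/out.json |
    while read -r b64; do
      printf '%s' "$b64" | base64 -d > "image-$i.png"
      i=$((i+1))
    done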
| mvdtnz wrote:
| I gave this a crack this morning, trying something very similar
| to the examples. I tried to get Gemini 2.0 Preview to add a set
| of bi-fold doors to a picture of a house in a particular place.
| It failed completely. It put them in the wrong place, they looked
| absolutely hideous (like I had pasted them in with MS Paint) and
| the more I tried to correct it with prompts the worse it got. At
| one point when I re-prompted it, it said
|
| > Okay, I understand. You want me to replace ONLY the four
| windows located underneath the arched openings on the right side
| of the house with bifold doors, leaving all other features of the
| house unchanged. Here is the edited image:
|
| Followed by no image. This is a behaviour I have seen many times
| from Gemini in the past, so it's frustrating that it's still a
| problem.
|
| I give this a 0/10 for my first use case.
___________________________________________________________________
(page generated 2025-05-07 23:00 UTC)