[HN Gopher] Create and edit images with Gemini 2.0 in preview
___________________________________________________________________
Create and edit images with Gemini 2.0 in preview
Author : meetpateltech
Score : 134 points
Date : 2025-05-07 16:06 UTC (6 hours ago)
(HTM) web link (developers.googleblog.com)
(TXT) w3m dump (developers.googleblog.com)
| jansan wrote:
| Some examples are quite impressive, but the one with the polar
| bear on the white mug is very underwhelming, and the co-drawing
| looks like it was hacked together by a vibe coder.
| thornewolf wrote:
| The co-drawing is definitely not a fully fleshed-out product or
| anything but I think it is a great tech demo. What don't you
| like about it?
| egamirorrim wrote:
| I don't understand how to use this. I keep trying to edit a photo
| of myself (change a jacket to a t-shirt) in the Gemini app with
| 2.0 Flash selected, and it just generates a new image that's
| nothing like the original.
| thornewolf wrote:
| It is very sensitive to your input prompts. Minor differences
| will result in drastic quality differences.
| FergusArgyll wrote:
| I think this is just in AI Studio. In the Gemini app I think it
| goes: Flash describes the image to Imagen -> Imagen generates a
| new image
| julianeon wrote:
| Remember you are paying about 4 cents an image if I'm
| understanding the pricing correctly.
| thornewolf wrote:
| Model outputs look good-ish. I think they are neat. I updated my
| recent hack project https://lifestyle.photo to the new model.
| It's middling-to-good.
|
| There are a lot of failure modes still but what I want is a very
| large cookbook showing what known-good workflows are. Since this
| is just so directly downstream of (limited) training data, it
| might be that I am just prompting in an ever so slightly bad way.
| nico wrote:
| Love your project, great application of gen AI, very
| straightforward value proposition, excellent and clear
| messaging
|
| Very well done!
| thornewolf wrote:
| Thank you for the kind words! I am looking forward to
| creating a Show HN next week alongside a Product Hunt
| announcement. I appreciate any and all feedback. You can
| provide it through the website directly or through the email
| I have attached in my bio.
| sigmaisaletter wrote:
| Re your project: I'd expect at least the demo to not have an
| obvious flaw. The "lifestyle" version of your bag has a handle
| that is nearly twice as long as the "product" version.
| thornewolf wrote:
| This is a fair critique. While I am merely an "LLM wrapper", I
| should put the product's best foot forward and pay more
| attention to my showcase examples.
| ohadron wrote:
| For one thing, it's way faster than the OpenAI equivalent in a
| way that might unlock additional use cases.
| freedomben wrote:
| Speed has been the consistent thing I've noticed with Gemini
| too, even going back to the earlier days when Gemini was a bit
| of a laughing stock. Gemini is _fast_
| julianeon wrote:
| I don't know the exact speed/quality tradeoff, but I'll tell
| you this: Google may be erring too much on the speed side. It's
| fast but junk. I suspect a lot of people try it then bounce off
| back to Midjourney, like I did.
| eminence32 wrote:
| This seems neat, I guess. But whenever I try tools like this, I
| often run into the limits of what I can describe in words. I
| might try something like "Add some clutter to the desk, including
| stacks of paper and notebooks" but when it doesn't quite look
| like what I want, I'm not sure what else to do except try
| slightly different wordings until the output happens to land on
| what I want.
|
| I'm sure part of this is a lack of imagination on my part about
| how to describe the vague image in my own head. But I guess I
| have a lot of doubts about using a conversational interface for
| this kind of stuff
| monster_truck wrote:
| Chucking images at any model that supports image input and
| asking it to describe specific areas/things 'in extreme detail'
| is a decent way to get an idea of what it's expecting vs what
| you want.
| thornewolf wrote:
| +1 to this flow. I use the exact same phrase "in extreme
| detail" as well haha. Additionally, I ask the model to
| describe what prompt it might write to produce some edit
| itself.
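| For anyone who wants to script that flow, here's a minimal
| sketch against the Gemini REST API (image filename and prompt
| wording are illustrative; base64 -w0 assumes GNU coreutils):
    # Ask the model for a maximally detailed description of an
    # image, plus the prompt it would use to recreate it.
    curl -s -X POST \
      "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=$GEMINI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "contents": [{
          "parts": [
            {"inline_data": {"mime_type": "image/jpeg",
                             "data": "'$(base64 -w0 desk.jpg)'"}},
            {"text": "Describe this image in extreme detail, then write the prompt you would use to recreate it."}
          ]
        }]
      }'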
| qoez wrote:
| Maybe that's how the future will unfold. There will be subtle
| things AI fails to learn, and there will be differences in
| skills in how good people are at making AI do things, which
| will be a new skill in itself and will end up being a
| determining difference in pay in the future.
| crooked-v wrote:
| I just tried a couple of cases that ChatGPT is bad at
| (reproducing certain scenes/setpieces from classic tabletop RPG
| adventures, like the weird pyramid from classic D&D B4 The Lost
| City), and Gemini fails in just about the same way, getting
| architectural proportions and scenery details wrong even when
| given simple, broad rules about them. Adding more detail seems
| kind of pointless when it can't even get basics like "creature
| X is about as tall as the building around it" or "the pyramid
| is surrounded by ruined buildings" right.
| BoorishBears wrote:
| What's an example of a prompt you tried and it failed on?
| xbmcuser wrote:
| Ask Gemini to word your thoughts better, then use those to do
| the image editing.
| Nevermark wrote:
| Perhaps describe the types and styles of work associated with
| the desk, to give the clutter a coherent character.
| zoogeny wrote:
| I would politely suggest you work at getting better at this
| since it would be a pretty important skill in a world where a
| lot of creative work is done by AI.
|
| As some have mentioned, LLMs are treasure troves of information
| for learning how to prompt the LLM. One thing to get over is a
| fear of embarrassment in what you say to the LLM. Just write a
| stream of consciousness to the LLM about what you want and ask
| it to generate a prompt based on that. "I have an image that I
| am trying to get an image LLM to add some clutter to. But when
| I ask it to do it, like I say add some stack of paper and
| notebooks, but it doesn't look like I want because they are
| neat stacks of paper. What I want is a desk that kind of looks
| like it has been worked at for a while by a typical office
| worker, like at the end of the day with a half empty coffee cup
| and .... ". Just ramble away and then ask the LLM to give you
| the best prompt. And if it doesn't work, literally go back to
| the same message chain and say "I tried that prompt and it was
| [better|worse] than before because ...".
|
| This is one of those opportunities where life is giving you an
| option: give up or learn. Choose wisely.
| betterThanTexas wrote:
| > I'm sure part of this is a lack of imagination on my part
| about how to describe the vague image in my own head.
|
| In my experience this is more a limit of our ability to
| articulate than of our imagination. I can certainly produce
| images in my head that I have difficulty reproducing well and
| consistently via linguistic description.
| SketchySeaBeast wrote:
| It's almost as if being able to create art accurate to our
| mental vision requires practice and skill, be it the ability
| to create an image or to write it and invoke an image in
| others.
| betterThanTexas wrote:
| Absolutely! But this was surprising to me--my intuition
| says if I can firmly visualize something, I should be able
| to describe it. I think many people have this assumption
| and it's responsible for a lot of projection in our social
| lives.
| SketchySeaBeast wrote:
| Yeah, it's probably a good argument for having people try
| some form of art, to have them understand that their
| intent and their outcome are rarely the same.
| GaggiX wrote:
| Not available in the EU; the first version was, then it was
| removed.
|
| Btw, still not as good as ChatGPT but much, much faster; it's
| nice progress compared to the previous model.
| refulgentis wrote:
| Another release from Google!
|
| Now I can use:
|
| - Gemini 2.0 Flash Image Generation Preview (May) instead of
| Gemini 2.0 Flash Image Generation Preview (March)
|
| - or when I need text, Gemini 2.5 Flash Thinking 04-17 Preview
| ("natively multimodal" w/o image generation)
|
| - When I need to control thinking budgets, I can do that with
| Gemini 2.5 Flash Preview 04-17, with not-thinking at a 50% price
| increase over a month prior
|
| - And when I need realtime, fallback to Gemini 2.0 Flash 001 Live
| Preview (announced as In Preview on April 9 2025 after the
| Multimodal Live API was announced as released on December 11
| 2024)
|
| - I can't control Gemini 2.5 Pro Experimental/Preview/Preview IO
| Edition's thinking budgets, but good news follows in the next
| bullet: they'll swap the model out underneath me with one that
| thinks ~10x less, so at least it's in the same cost ballpark as
| their competitors
|
| - and we all got autoupgraded from Gemini 2.5 Pro Preview (03/25
| released 4/2) to Gemini 2.5 Pro Preview (IO Edition) yesterday!
| Yay!
| justanotheratom wrote:
| Yay! Do you use your Gemini in the Gemini app, AI Studio, or
| Vertex AI?
| refulgentis wrote:
| I am Don Quixote, building an app that abstracts over models
| (i.e. allows user choice), while providing them a user-
| controlled set of tools, and allowing users to write their
| own "scripts", i.e. precanned dialogue / response steps to
| permit e.g. building of search.
|
| Which is probably what makes me so cranky here. It's very
| hard keeping track of all of it and doing my best to lever up
| the models that are behind Claude's agentic capabilities, and
| all the Newspeak of Google PR makes it consume almost as much
| energy as the rest of the providers combined. (I'm v
| frustrated that I didn't realize till yesterday that 2.0
| Flash had quietly gone from 10 RPM to 'you can actually use
| it')
|
| I'm a Xoogler and I get why this happens ("preview" is a
| magic wand that means "you don't have to get everyone in
| bureaucracy across DeepMind/Cloud/? to agree to get this done
| and fill out their damn launchcal"), but, man.
| xnx wrote:
| A matrix of models, capabilities, and prices would be really
| useful.
| cush wrote:
| The doodle demo is super fun
|
| https://aistudio.google.com/apps/bundled/gemini-co-drawing?s...
| minimaxir wrote:
| Of note: per-image pricing for Gemini 2.0 image generation is
| $0.039, which is more expensive than Imagen 3 ($0.03 per image):
| https://ai.google.dev/gemini-api/docs/pricing
|
| The main difference is that Gemini allows incorporating a
| conversation to generate the image, as demoed here, while Imagen
| 3 is strictly text-in/image-out with optional mask-constrained
| edits, but likely allows for higher-quality images overall if
| you are skilled with prompt engineering. This is a nuance that
| is annoying to differentiate.
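| To make the conversational mode concrete: an image-in/image-out
| edit goes through the same generateContent endpoint, and multi-
| turn editing just appends the model's previous turn (including
| its returned image) plus your follow-up text to "contents". A
| minimal single-turn sketch (filename and prompt illustrative;
| base64 -w0 assumes GNU coreutils):
    # Send a source image plus an edit instruction; request image
    # output via responseModalities.
    curl -s -X POST \
      "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent?key=$GEMINI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "contents": [{
          "parts": [
            {"inline_data": {"mime_type": "image/jpeg",
                             "data": "'$(base64 -w0 product.jpg)'"}},
            {"text": "Replace the background with a sunlit kitchen counter; keep the product unchanged"}
          ]
        }],
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
      }'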
| ipsum2 wrote:
| > likely allows for higher-quality images overall
|
| What makes you say that?
| vunderba wrote:
| Anecdotal but from preliminary sandbox testing side-by-side
| with Gemini 2.0 Flash and Imagen 3.0 - it definitely appears
| that _that_ is the case - higher overall visual quality from
| Imagen 3.
| adverbly wrote:
| Google totally crushing it and stock is down 8% today :|
|
| Is it just me or is the market just absolutely terrible at
| understanding the implications and speed of progress behind
| what's happening right now in the walls of big G?
| lenerdenator wrote:
| The market is absolutely terrible at a _lot_ of things.
| abirch wrote:
| A potential reason that GOOG is down right now is that Apple is
| looking at AI Search Engines.
|
| https://www.bloomberg.com/news/articles/2025-05-07/apple-wor...
|
| Although AI is fun and great, an AI search engine may have
| the problem of being unprofitable. It's similar to how 23andMe
| got many customers by selling a 500 dollar test to people for
| 100 dollars.
| xnx wrote:
| Would be quite a financial swing for Apple from getting paid
| billions of dollars by Google for search to having to spend
| billions of dollars to make their own.
| abirch wrote:
| From the article (Eddy Cue is Apple's senior vice president
| of services): "Cue said he believes that AI search
| providers, including OpenAI, Perplexity AI Inc. and
| Anthropic PBC, will eventually replace standard search
| engines like Alphabet's Google. He said he believes Apple
| will bring those options to Safari in the future."
|
| So Apple may not be making their own, but they won't be
| spending billions either. I'm wondering how people will
| be able to monetize the searches so that they make money.
| mattlondon wrote:
| FWIW I searched this story not long after it broke and
| Google - yes the traditional "old school search engine" -
| had an AI-generated summary of the story with a breakdown
| of the whys and hows right there at the top of the page.
| This was basically real time, give or take 10 minutes.
|
| I am not sure why people think OpenAI et al are going to
| eat Google's lunch here. Seems like they're already doing
| AI-for-search and if there is anyone who can do it
| cheaply and at scale I bet on Google being the ones to do
| it (with all their data centers, data
| integrations/crawlers, and custom hardware and experience
| etc). I doubt some startup using the Bing-index and
| renting off-the-shelf Nvidia hardware using investor-
| funds is going to leapfrog Google-scale infrastructure
| and expertise.
| resource_waste wrote:
| Why would any of this have an impact on stock prices?
|
| LLMs are insanely competitive and a dime a dozen now. Most
| professional uses can get away with local models.
|
| This is image generation... Niche cases in another saturated
| market.
|
| How are any of these supposed to make google billions of
| dollars?
| mNovak wrote:
| I'm getting mixed results with the co-drawing demo, in terms of
| understanding what stick figures are, which seems pretty
| important for the 99% of us who can't draw a realistic human. I
| was hoping to sketch a scene, and let the model "inflate" it, but
| I ended up with 3D rendered stick figures.
|
| Seems to help if you explicitly describe the scene, but then the
| drawing-along aspect seems relatively pointless.
| qq99 wrote:
| Wasn't this already available in AI Studio? It sounds like they
| also improved the image quality. It's hard to keep up with what's
| new with all these versions
| taylorhughes wrote:
| Image editing/compositing/remixing is not quite as good as gpt-
| image-1, but the results are really compelling anyway due to the
| dramatic increase in speed! Playing with it just now, it's often
| 5 seconds for a compositing task between multiple images. Feels
| totally different from waiting 30s+ for gpt-image-1.
| vunderba wrote:
| I've added/tested this multimodal Gemini 2.0 to my shoot-out of
| SOTA image gen models (OpenAI 4o, Midjourney 7, Flux, etc.) which
| contains a collection of increasingly difficult prompts.
|
| https://genai-showdown.specr.net
|
| I don't know how much of Google's original Imagen 3.0 is
| incorporated into this new model, but the overall aesthetic
| quality unfortunately seems to be _significantly worse_.
|
| The big "wins" are:
|
| - Multimodal aspect in trying to keep parity with OpenAI's
| offerings.
|
| - An order of magnitude faster than OpenAI 4o image gen
| belter wrote:
| Your shoot-out site is very useful. Could I suggest adding
| prompts that expose common failure modes?
|
| For example, asking the models to show clocks set to a specific
| time, or people drawing with their left hand. I think most, if
| not all, models will likely display every clock with the same
| time... and portray subjects drawing with their right hand.
| crooked-v wrote:
| Another one I would suggest is buildings with specific unusual
| proportions and details (e.g. "the mansion's west wing is
| twice the height of the right wing and has only very wide
| windows"). I've yet to find a model that will do that kind of
| thing reliably, where it seems to just fall back on the vibes
| of whatever painting or book cover is vaguely similar to
| what's described.
| droopyEyelids wrote:
| generating a simple maze for kids is also not possible yet
| vunderba wrote:
| Love this one so I've added it. The concept is very easy
| for most GenAI models to grasp, but it requires a strong
| overall cohesive understanding. Rather unbelievably,
| OpenAI 4o managed to produce a pass.
|
| I should also add an image that is heavy with "greebles".
| GenAI usually lacks the fidelity for these kinds of minor
| details, so although it adds them, they tend to fall
| apart under more than a cursory examination.
|
| https://en.wikipedia.org/wiki/Greeble
| vunderba wrote:
| @belter / @crooked-v
|
| Thanks for the suggestions. Most of the current prompts are a
| result of personal images that I wanted to generate - so I'll
| try to add some "classic GenAI failure modes". Musical
| instruments such as pianos also used to be a pretty big
| failure point.
| pentagrama wrote:
| I want to take a step back and reflect on what this actually
| shows us. Look at the examples Google provides: it refers to the
| generated objects as "products", clearly pointing toward shopping
| or e-commerce use cases.
|
| It seems like the real goal here, for Google and other AI
| companies, is a world flooded with endless AI-generated variants
| of objects that don't even exist yet, crafted to be sold and
| marketed (probably by AI too) to hyper-targeted audiences. This
| feels like an incoming wave of "AI slop", mass-produced synthetic
| content, crashing against the small island of genuine human
| craftsmanship and real, existing objects.
| vunderba wrote:
| Yeah - and honestly I don't really get this. Using GenAI for
| real-world products seems like a recipe for a slew of incoming
| fraudulent advertising lawsuits if the images are slightly
| different from the actual physical products yet _presented_ as
| if they are real photographs.
| nkozyra wrote:
| The gating factor here is the pool of consumers. Once people
| have slop exhaustion there's nobody to sell this to.
|
| Maybe this is why all of the future AI fiction has people
| dressed in the same bland clothing.
| simonw wrote:
| Be a bit careful playing with this one. I tried this:
curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Provide a vegetarian recipe for butter chicken but with chickpeas not chicken and include many inline illustrations along the way"}
      ]
    }],
    "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
  }' > /tmp/out.json
|
| And got back 41MB of JSON with 28 base64 images in it:
| https://gist.github.com/simonw/55894032b2c60b35f320b6a166ded...
|
| At 4c per image that's more than a dollar on that single prompt.
|
I built this quick tool, which you can paste that JSON into to
see it rendered: https://tools.simonwillison.net/gemini-image-json
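
If you want the images out of that JSON locally rather than via
the browser tool, something like this should work (a rough
sketch; assumes jq and GNU base64, and guesses the PNG
extension):
    # Pull each inline image part out of the response and decode
    # it to a numbered file.
    i=0
    jq -r '.candidates[0].content.parts[] | .inlineData.data // empty' /tmp/out.json |
    while read -r b64; do
      printf '%s' "$b64" | base64 -d > "image-$i.png"
      i=$((i+1))
    done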
| mvdtnz wrote:
| I gave this a crack this morning, trying something very similar
| to the examples. I tried to get Gemini 2.0 Preview to add a set
| of bi-fold doors to a picture of a house in a particular place.
| It failed completely. It put them in the wrong place, they looked
| absolutely hideous (like I had pasted them in with MS Paint) and
| the more I tried to correct it with prompts the worse it got. At
| one point when I re-prompted it, it said
|
| > Okay, I understand. You want me to replace ONLY the four
| windows located underneath the arched openings on the right side
| of the house with bifold doors, leaving all other features of the
| house unchanged. Here is the edited image:
|
| Followed by no image. This is a behaviour I have seen many times
| from Gemini in the past, so it's frustrating that it's still a
| problem.
|
| I give this a 0/10 for my first use case.
___________________________________________________________________
(page generated 2025-05-07 23:00 UTC)