[HN Gopher] Nano Banana can be prompt engineered for nuanced AI ...
___________________________________________________________________
Nano Banana can be prompt engineered for nuanced AI image
generation
Author : minimaxir
Score : 347 points
Date : 2025-11-13 17:39 UTC (5 hours ago)
(HTM) web link (minimaxir.com)
(TXT) w3m dump (minimaxir.com)
| doctorpangloss wrote:
| lots of words
|
| okay, look at imagen 4 ultra:
|
| https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
|
| In this link, Imagen is instructed to render the verbatim prompt
| "the result of 4+5", which shows that text; when not so
| instructed, it renders "4+5=9".
|
| Is Imagen thinking?
|
| Let's compare to gemini 2.5 flash image (nano banana):
|
| look carefully at the system prompt here:
| https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
|
| Gemini is instructed to reply in images first, and if it thinks,
| to think using the image thinking tags. It seemingly cannot be
| prompted to show the verbatim text "the result of 4+5" without
| showing the answer "4+5=9". Of course it can show whatever exact
| text you want; the question is whether it rewrites the prompt
| (no) or does something else (yes).
|
| compare to ideogram, with prompt rewriting:
| https://ideogram.ai/g/GRuZRTY7TmilGUHnks-Mjg/0
|
| without prompt rewriting:
| https://ideogram.ai/g/yKV3EwULRKOu6LDCsSvZUg/2
|
| We can do the same exercises with Flux Kontext for editing versus
| Flash-2.5, if you think that editing is somehow unique in this
| regard.
|
| Is prompt rewriting "thinking"? My point is, this article can't
| answer that question without dElViNg into the nuances of what
| multi-modal models really are.
| gryfft wrote:
| Can you provide screenshots or links that don't require a login?
| PunchTornado wrote:
| Sorry, but I don't understand your post. Those links don't work.
| dostick wrote:
| Use Google AI Studio to submit requests. To remove the watermark,
| open the browser developer tools, right-click the request for the
| "watermark_4" image, and select the option to block it. From the
| next generation onward there will be no watermark!
| squigz wrote:
| I'm getting annoyed by using "prompt engineered" as a verb. Does
| this mean I'm finally old and bitter?
|
| (Do we say we software engineered something?)
| vpShane wrote:
| You're definitely old and bitter, welcome to it.
|
| You CREATED something, and I like to think that creating things
| that I love and enjoy and that others can love and enjoy makes
| creating things worth it.
| squigz wrote:
| Don't get me wrong, I have nothing against using AI as an
| expression of creativity :)
| malcolmxxx wrote:
| Create? So I created all the code I'm running on my site? Yes,
| it's bad, I know, but thank you very much! Such a creative guy
| I am!
| officeplant wrote:
| Not really since "prompt engineering" can be tossed in the same
| pile as "vibe coding." Just people coping with not developing
| the actual skills to produce the desired products.
| bongodongobob wrote:
| Couldn't care less. I don't need to know how to do literally
| everything. AI fills in my gaps and I'm a ton more
| productive.
| squigz wrote:
| I wouldn't bother trying to convince people who are upset
| that others have figured out a way to use LLMs. It's not
| logical.
| koakuma-chan wrote:
| Try getting a small model to do what you want quickly with
| high accuracy, high quality, etc, and using few tokens per
| request. You'll find out that prompt engineering is real and
| matters.
| pavlov wrote:
| I think it's meant to be engineering in the same sense as
| "social engineering".
| antegamisou wrote:
| No it means you can still discern what is BS.
| miladyincontrol wrote:
| There's lots these models can do, but I despise it when people
| suggest they can do edits "with only the necessary aspects
| changed".
|
| No, that simply is not true. If you actually compare the before
| and after, you can see it still regenerates all the details of
| the "unchanged" aspects. Texture, lighting, sharpness, even
| scale: it's all different, even if varyingly similar to the
| original.
|
| Sure, they're cute for casual edits, but it really pains me when
| people suggest these things are suitable replacements for actual
| photo editing. Especially when it comes to people, or details
| outside their training data, there's a lot of nuance that can be
| lost as it regenerates them, no matter how you prompt things.
|
| Even if you
| StevenWaterman wrote:
| That is true for gpt-image-1 but not nano-banana. They can do
| masked image changes
| minimaxir wrote:
| Nano Banana is different and much better at edits without
| changing texture/lighting/sharpness/color balance, and I am
| someone who is extremely picky about that. That's why I added
| the note that Gemini 2.5 Flash is aware of segmentation masks;
| my hunch is that's why it behaves this way.
| BoredPositron wrote:
| Nano Banana has a really low spatial scaling and doesn't affect
| details the way other models do.
| miohtama wrote:
| Could you just mask out the area you wish to change in more
| advanced tools, or is there something in the model itself which
| would prevent this?
| lunarboy wrote:
| That's probably where things are headed and there are already
| products trying this (even photoshop already). Just like how
| code gen AI tools don't replace the entire file on every
| prompt iteration.
| mkagenius wrote:
| > Nano Banana is still bad at rendering text perfectly/without
| typos as most image generation models.
|
| I figured out that if you write the text in Google Docs and
| share the screenshot with Banana, it will not make any spelling
| mistakes.
|
| So a prompt like "can you write my name on this Wimbledon trophy,
| both images are attached. Use them" will work.
| minimaxir wrote:
| Google's example documentation for Nano Banana does demo that
| pipeline:
| https://ai.google.dev/gemini-api/docs/image-generation#pytho...
|
| That's on my list of blog-post-worthy things to test, namely
| rendering text to an image directly in Python and passing both
| input images to the model for compositing.
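|
| A minimal sketch of that pipeline (not the post's code; the
| model id, font path, and prompt wording are placeholder
| assumptions):
|
|     from io import BytesIO
|
|     from PIL import Image, ImageDraw, ImageFont
|     from google import genai
|     from google.genai import types
|
|     # Render the text with Pillow so spelling is guaranteed.
|     text_img = Image.new("RGB", (800, 200), "white")
|     draw = ImageDraw.Draw(text_img)
|     font = ImageFont.truetype("DejaVuSans-Bold.ttf", 96)  # any local font
|     draw.text((20, 40), "JANE DOE", font=font, fill="black")
|
|     trophy_img = Image.open("wimbledon_trophy.png")
|
|     client = genai.Client()  # reads GEMINI_API_KEY from the environment
|     response = client.models.generate_content(
|         model="gemini-2.5-flash-image",  # Nano Banana; exact id may differ
|         contents=[
|             trophy_img,
|             text_img,
|             "Engrave the text from the second image onto the "
|             "trophy in the first image, matching its lighting "
|             "and perspective.",
|         ],
|         config=types.GenerateContentConfig(
|             response_modalities=["TEXT", "IMAGE"],
|         ),
|     )
|
|     # Save the first image part the model returns.
|     for part in response.candidates[0].content.parts:
|         if part.inline_data is not None:
|             Image.open(BytesIO(part.inline_data.data)).save("out.png")
|             break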
| ml-anon wrote:
| "prompt engineered"...i.e. by typing in what you want to see.
| harpiaharpyja wrote:
| Not all models can actually do that if your prompt is
| particular
| pksebben wrote:
| Most designers can't, either. Defining a spec is a skill.
|
| It's actually fairly difficult to put to words any specific
| enough vision such that it becomes understandable outside of
| your own head. This goes for pretty much anything, too.
| Razengan wrote:
| Yep, knowing how and what to _ask_ is a skill.
|
| For anything, even back in the "classical" search days.
| andai wrote:
| https://habitatchronicles.com/2004/04/you-cant-tell-people-a...
| darepublic wrote:
| "amenable to highly specific and granular instruction"
| simonw wrote:
| ... and then iterating on that prompt many times, based on your
| accumulated knowledge of how best to prompt that particular
| model.
| minimaxir wrote:
| Case in point: the final image in this post (the IP bonanza)
| took 28 iterations of the prompt text to get something maximally
| interesting, which is why that one is very particular about the
| constraints it invokes, such as specifying "distinct" characters
| and specifying they are present from "left to right", because
| the model kept exploiting that ambiguity.
| chankstein38 wrote:
| Hey, the author! Thank you for this post! Quick question: any
| idea roughly how much this experimentation cost you? I'm having
| trouble parsing their image generation pricing; I may just not
| be finding the right table. I'm just trying to understand: if I
| do, say, 50 iterations at the quality in the post, how much is
| that going to cost me?
| minimaxir wrote:
| All generations in the post are $0.04/image (Nano Banana doesn't
| have a way to increase the resolution, yet), so you can do the
| math and assume that you can generate about 24 images per
| dollar. Unlike other models, Nano Banana does charge for input
| tokens, but it's negligible.
|
| Discounting the testing around the character JSON, which became
| _extremely_ expensive due to extreme iteration / my own
| stupidity, I'd wager it took about $5 total including iteration.
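|
| As a back-of-the-envelope check on the 50-iteration question
| above (a sketch using the figures in this thread; input tokens
| ignored as negligible):
|
|     price_per_image = 0.04       # Nano Banana output price per image
|     print(28 * price_per_image)  # ~$1.12 for the final image's 28 iterations
|     print(50 * price_per_image)  # ~$2.00 for 50 iterations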
| mensetmanusman wrote:
| We understand now that we interface with LLMs using natural and
| unnatural language as the user interface.
|
| This is a very different fuzzy interface compared to
| programming languages.
|
| There will be techniques better or worse at interfacing.
|
| This is what the term prompt engineering is alluding to since
| we don't have the full suite of language to describe this yet.
| w_for_wumbo wrote:
| Yes, that is a serious skill. How many of the woes that we see
| are because people don't know what they want, or are unable to
| describe it in such a way that others understand it? I believe
| "prompt engineer" properly conveys how complex communication can
| be when interacting with a multitude of perspectives, world
| views, assumptions, presumptions, etc. I believe it works well
| to counter the over-confidence that people have from not paying
| attention to the gaps between what is said and what is meant.
| CobrastanJorji wrote:
| Yes, obviously a role involving complex communication while
| interacting with a multitude of perspectives, world views,
| assumptions, presumptions, etc needs to be called "engineer."
|
| That is why I always call technical writers "documentation
| engineers," why I call diplomats "international engineers,"
| why I call managers "team engineers," and why I call
| historians "hindsight engineers."
| jazzyjackson wrote:
| Used to be called Google Fu
| yieldcrv wrote:
| right? 15 months ago in image models you used to have to
| designate rendering specifications, and know the art of
| negative prompting
|
| now you can really use natural language, and people want to
| debate you about how poor they are at articulating shared
| concepts, amazing
|
| it's like the people are regressing and the AI is improving
| pfortuny wrote:
| Well, I just asked it for a 13-sided irregular polygon (is it
| that hard?)...
|
| https://imgur.com/a/llN7V0W
| BoredPositron wrote:
| The kicker for Nano Banana is not prompt adherence, which is a
| really nice-to-have, but the fact that it's either working in
| pixel space or with a really low spatial scaling. It's the only
| model that doesn't kill your details because of VAE
| encode/decode.
| sebzim4500 wrote:
| It's really cool how good of a job it did rendering a page given
| its HTML code. I was not expecting it to do nearly as well.
| kridsdale1 wrote:
| Same. This must come from training on sites that show HTML next
| to screenshots of the pages.
| leviathant wrote:
| I was kind of surprised by this line:
|
| >Nano Banana is terrible at style transfer even with prompt
| engineering shenanigans
|
| My context: I'm kind of fixated on visualizing my neighborhood as
| it would have appeared in the 18th century. I've been doing it in
| Sketchup, and then in Twinmotion, but neither of those produce
| "photorealistic" images... Twinmotion can get pretty close with a
| lot of work, but that's easier with modern architecture than it
| is with the more hand-made, brick-by-brick structures I'm
| modeling out.
|
| As different AI image generators have emerged, I've tried them
| all in an effort to add the proverbial rough edges to snapshots
| of the models I've created, and it was not until Nano Banana that
| I ever saw anything even remotely workable.
|
| Nano Banana manages to maintain the geometry of the scene, while
| applying new styles to it. Sometimes I do this with my Twinmotion
| renders, but what's really been cool to see is how well it takes
| a drawing, or engraving, or watercolor - and with as simple a
| prompt as "make this into a photo" it generates phenomenal
| results.
|
| Similarly to the Paladin/Starbucks/Pirate example in the link
| though, I find that sometimes I need to misdirect a little bit,
| because if I'm peppering the prompt with details about the 18th
| century, I sometimes get a painterly image back. Instead, I'll
| tell it I want it to look like a photograph of a well preserved
| historic neighborhood, or a scene from a period film set in the
| 18th century.
|
| As fantastic as the results can be, I'm not abandoning my manual
| modeling of these buildings and scenes. However, Nano Banana's
| interpretation of contemporary illustrations has helped me
| reshape how I think about some of the assumptions I made in my
| own models.
| echelon wrote:
| You can't take a highly artistic image and supply it as a style
| reference. Nano Banana can't generalize to anything not in its
| training.
| leviathant wrote:
| Fair enough! I suppose I've avoided that kind of "style
| transfer" for a variety of reasons, it hadn't even occurred
| to me that people were still interested in that. And I don't
| say that to open up debate on the topic, just explaining away
| my own ignorance/misinterpretation. Thanks
| simonw wrote:
| I like the Python library that accompanies this:
| https://github.com/minimaxir/gemimg
|
| I added a CLI to it (using Gemini CLI) and submitted a PR; you
| can run it like so:
|
|     GEMINI_API_KEY="..." \
|     uv run --with https://github.com/minimaxir/gemimg/archive/d6b9d5befa1e2ffc3b09086bc0a3ad70ca4ef22.zip \
|     python -m gemimg "a racoon holding a hand written sign that says I love trash"
|
| Result in this comment:
| https://github.com/minimaxir/gemimg/pull/7#issuecomment-3529...
| echelon wrote:
| The author went to great lengths about open source early on. I
| wonder if they'll cover the QwenEdit ecosystem.
|
| I'm exceptionally excited about Chinese editing models. They're
| getting closer and closer to NanoBanana in terms of robustness,
| and they're open source. This means you can supply masks and
| kernels and do advanced image operations, integrate them into
| visual UIs, etc.
|
| You can even fine tune them and create LoRAs that will do the
| style transferring tasks that Nano Banana falls flat on.
|
| I don't like how closed the frontier US models are, and I hope
| the Chinese kick our asses.
|
| That said, I love how easy it'll be to distill Nano Banana into
| a new model. You can pluck training data right out of it: ((any
| image, any instruction) -> completion) tuples.
| minimaxir wrote:
| I've been keeping an eye on Qwen-Edit/Wan 2.2 shenanigans and
| they are interesting; however, actually running those types of
| models is too cumbersome, and in the end it's unclear whether
| it's actually worth it over the $0.04/image for Nano Banana.
| CamperBob2 wrote:
| I was skeptical about the notion of running similar models
| locally as well, but the person who did this
| (https://old.reddit.com/r/StableDiffusion/comments/1osi1q0/wa...) swears
| that they generated it locally, just letting a single 5090
| crunch away for a week.
|
| If that's true, it seems worth getting past the
| 'cumbersome' aspects. This tech may not put Hollywood out
| of business, but it's clear that the process of filmmaking
| won't be recognizable in 10 years if amateurs can really do
| this in their basements today.
| braebo wrote:
| Takes a couple mouse clicks in ComfyUI
| echelon wrote:
| On that subject - ComfyUI is not the future of image gen.
| It's an experimental rope bridge.
|
| Adobe's conference last week points to the future of
| image gen. Visual tools where you mold images like clay.
| Hands on.
|
| Comfy appeals to the 0.01% that like toolkits like
| TouchDesigner, Nannou, and ShaderToy.
| msp26 wrote:
| > I don't like how closed the frontier US models are, and I
| hope the Chinese kick our asses.
|
| For imagegen, agreed. But for textgen, Kimi K2 thinking is by
| far the best chat model at the moment from my experience so
| far. Not even "one of the best", the best.
|
| It has frontier level capability and the model was made very
| tastefully: it's significantly less sycophantic and more
| willing to disagree in a productive, reasonable way rather
| than immediately shutting you out. It's also way more funny
| at shitposting.
|
| I'll keep using Claude a lot for multimodality and artifacts
| but much of my usage has shifted to K2. Claude's sycophancy
| in particular is tiresome. I don't use ChatGPT/Gemini because
| they hide the raw thinking tokens, which is really cringe.
| astrange wrote:
| Claude Sonnet 4.5 doesn't even feel sycophantic (in the 4o way);
| it feels like it has BPD. It switches from desperately
| agreeing with you to moralizing lectures and then has a
| breakdown if you point out it's wrong about anything.
|
| Also, yesterday I asked it a question and after the answer
| it complained about its poorly written system prompt to me.
|
| They're really torturing their poor models over there.
| dontlikeyoueith wrote:
| It rubs the data on its skin or else it gets the prompt
| again!
| ctippett wrote:
| Any reason for not also adding a project.scripts entry for
| pyproject.toml? That way the CLI (great idea btw) could be
| installed as a tool by uv.
| simonw wrote:
| I decided to avoid that purely to keep changes made to the
| package as minimal as possible - adding a project.scripts entry
| means installing it adds a new command alias. My approach
| changes nothing other than making "python -m gemimg" do
| something useful.
|
| I agree that a project.scripts would be good but that's a
| decision for the maintainer to take on separately!
| sorcercode wrote:
| @simonw: slight tangent but super curious how you managed to
| generate the preview of that gemini-cli terminal session gist -
| https://gistpreview.github.io/?17290c1024b0ef7df06e9faa4cb37...
|
| is this just a manual copy/paste into a gist with some html css
| styling; or do you have a custom tool a la amp-code that does
| this more easily?
| simonw wrote:
| I used this tool: https://tools.simonwillison.net/terminal-to-html
|
| I made a video about building that here:
| https://simonwillison.net/2025/Oct/23/claude-code-for-web-vi...
|
| It works much better with Claude Code and Codex CLI because
| they don't mess around with scrolling in the same way as
| Gemini CLI does.
| peetle wrote:
| In my own experience, nano banana still has the tendency to:
|
| - make massive, seemingly random edits to images
|
| - adjust image scale
|
| - make very fine grained but pervasive detail changes obvious in
|   an image diff
|
| For instance, I have found that nano-banana will sporadically add
| a (convincing) fireplace to a room or new garage behind a house.
| This happens even with explicit "ALL CAPS" instructions not to do
| so. This happens sporadically, even when the temperature is set
| to zero, and makes it impossible to build a reliable app.
|
| Has anyone had a better experience?
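|
| A quick way to quantify those unwanted edits is a pixel diff
| between the input and the output; a minimal sketch (file names
| are placeholders):
|
|     from PIL import Image, ImageChops
|
|     before = Image.open("input.png").convert("RGB")
|     after = Image.open("edited.png").convert("RGB").resize(before.size)
|
|     diff = ImageChops.difference(before, after)
|     print("changed region:", diff.getbbox())  # None if truly identical
|     diff.save("diff.png")  # inspect where the model touched the image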
| andblac wrote:
| The "ALL CAPS" part of your comment got me thinking. I imagine
| most llms understand subtle meanings of upper case text use
| depending on context. But, as I understand it, ALL CAPS text
| will tokenize differently than lower case text. Is that right?
| In that case, won't the upper case be harder to understand and
| follow for most models since it's less common in datasets?
| minimaxir wrote:
| There's more than enough ALL CAPS text in the corpus of the
| entire internet, and enough semantic context associated with it,
| for it to be interpreted as being in the imperative voice.
| miohtama wrote:
| Shouldn't all caps be normalised to the same tokens as lower
| case? There are no separate tokens for upper and lower case in
| Llama, or at least there weren't in the past.
| minimaxir wrote:
| Looking at the tokenizer for the older Llama 2 model, the
| tokenizer has capital letters in it:
| https://huggingface.co/meta-llama/Llama-2-7b-hf
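|
| A quick way to check (a sketch; assumes you have access to the
| gated Llama 2 repo on Hugging Face):
|
|     from transformers import AutoTokenizer
|
|     tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
|     # The two casings map to different token ids, and all-caps
|     # text typically splits into more (rarer) pieces.
|     print(tok.tokenize("DO NOT ADD A FIREPLACE"))
|     print(tok.tokenize("do not add a fireplace"))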
| symisc_devel wrote:
| I work on the PixLab prompt based photo editor
| (https://editor.pixlab.io), and it follows exactly what you
| type with explicit CAPS.
| ainiriand wrote:
| The blueberry and strawberry are not actually where they were
| prompted to be.
| mFixman wrote:
| The author overlooked an interesting error in the second skull
| pancake image: the strawberry is on the right eye socket (to the
| left of the image), and the blackberry is on the left eye socket
| (to the right of the image)!
|
| This looks like it's caused by 99% of the relative directions in
| image descriptions being described from the viewer's point of
| view, and by the fact that 99% of the ones that aren't refer to
| a human and not to a skull-shaped pancake.
| martin-adams wrote:
| I picked up on that also. I feel that a lot of humans would
| also get confused about whether you mean the eye on the left,
| or the subject's left eye.
| Closi wrote:
| To be honest, this is the sort of thing Nano Banana is weak at
| in my experience. It's absolutely amazing - but it doesn't
| understand left/right/up/down/shrink this/move this/rotate this,
| etc.
|
| See the link below, which uses the same prompts as the article
| and demonstrates that this is a model weakness and not just a
| language ambiguity:
|
| https://gemini.google.com/share/a024d11786fc
| ffsm8 wrote:
| Mmh, IME you need to discard the session / rewrite the failing
| prompt instead of continuing and correcting on failures. Once
| errors occur, you've basically introduced a poison pill which
| will continuously make things go haywire. Spelling out what it
| did wrong is the most destructive thing you can do - at least in
| my experience.
| astrange wrote:
| Almost no image/video models can do "upside-down" either.
| jonas21 wrote:
| I am a human, and I would have done the same thing as Nano
| Banana. If the user had wanted a strawberry in the skull's left
| eye, they should've said, "Put a strawberry in _its_ left eye
| socket."
| kjeksfjes wrote:
| Exactly what I was thinking too. I'm a designer, and I'm used
| to receiving feedback and instructions. "The left eye socket"
| would to me refer to what I currently see in front of me,
| while "its left eye socket" instantly shift the perspective
| from me to the subject.
| minimaxir wrote:
| I admit I missed this, which is particularly embarrassing
| because I point out this exact problem with the character JSON
| later in the post.
|
| For some offline character JSON prompts I ended up adding an
| additional "any mentions of left and right are from the
| character's perspective, NOT the camera's perspective" to the
| prompt, which did seem to improve success.
| sib wrote:
| Came to make exactly the same comment. It was funny that the
| author specifically said that Nano Banana got all five edit
| prompts correct, rather than noting this discrepancy, which
| could be argued either way (although I think the "right eye" of
| a skull should be interpreted with respect to the skull's POV.)
| layer8 wrote:
| > It's one of the best results I've seen for this particular
| test, and it's one that doesn't have obvious signs of "AI slop"
| aside from the ridiculous premise.
|
| It's pretty good, but one conspicuous thing is that most of the
| blueberries are pointing upwards.
| satvikpendem wrote:
| For images of people generated from scratch, Nano Banana always
| adds a background blur; it can't seem to create more realistic
| or candid images such as those taken with a point-and-shoot or
| smartphone. Has anyone solved this sort of issue? It seems to
| work alright if you give it an existing image to edit, however.
| I saw some other threads online about it, but I didn't see
| anyone come up with solutions.
| kridsdale1 wrote:
| Maybe try including "f/16" or "f/22" as those are likely to be
| in the training set for long depth of field photos.
| satvikpendem wrote:
| I tried that but they don't seem to make much difference for
| whatever reason, you still can't get a crisp shot such as
| this [0] where the foreground and background details are all
| preserved (linked shot was taken with an iPhone which doesn't
| seem to do shallow depth of field unless you use their
| portrait mode).
|
| [0] https://www.lux.camera/content/images/size/w1600/2024/09/IMG...
| astrange wrote:
| Those are rarely in the captions for the image. They'd have
| to extract the EXIF for photos and include it in
| recaptioning. Which they should be doing, but I doubt they
| thought about it.
| jdc0589 wrote:
| I don't feel like I should search for "nano banana" on my work
| laptop
| insane_dreamer wrote:
| I haven't paid much attention to image generation models (not my
| area of interest), but these examples are shockingly good.
| comex wrote:
| I tried asking for a shot from a live-action remake of My
| Neighbor Totoro. This is a task I've been curious about for a
| while. Like Sonic, Totoro is the kind of stylized cartoon
| character that can't be rendered photorealistically without a
| great deal of subjective interpretation, which (like in Sonic's
| case) is famously easy to get wrong even for humans. Unlike
| Sonic, Totoro hasn't had an actual live-action remake, so the
| model would have to come up with a design itself. I was wondering
| what it might produce - something good? something horrifying?
| Unfortunately, neither; it just produced a digital-art style
| image, despite being asked for a photorealistic one, and kept
| doing so even when I copied some of the keyword-stuffing from the
| post. At least it tried. I can't test this with ChatGPT because
| it trips the copyright filter.
| roywiggins wrote:
| Another thing it can't do is remove reflections in windows; it's
| nearly a no-op.
| sejje wrote:
| >> "The image style is definitely closer to Vanity Fair (the
| photographer is reflected in his breastplate!)"
|
| I didn't expect that. I would have definitely counted that as a
| "probably real" tally mark if grading an image.
| Genego wrote:
| I have been generating a few dozen images per day for
| storyboarding purposes. The more I try to perfect it, the easier
| it becomes to control the outputs and to keep the entire visual
| story, as well as its characters, consistent over a few dozen
| different scenes, while even controlling the time of day
| throughout the story. I am currently working with 7-layer
| prompts to control for environment, camera, subject,
| composition, light, colors, and overall quality (it might be
| overkill, but it's also experimenting).
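|
| A minimal sketch of what that layering can look like (the layer
| names are the ones above; the contents are made-up placeholders):
|
|     LAYERS = {
|         "environment": "narrow cobblestone alley in a coastal town",
|         "camera": "35mm lens, eye level, shallow depth of field",
|         "subject": "recurring courier character, mid-20s, red scarf",
|         "composition": "subject off-center left, leading lines right",
|         "light": "pre-dawn blue hour, warm lanterns still lit",
|         "colors": "muted teal and amber palette",
|         "quality": "sharp focus, consistent with earlier scenes",
|     }
|
|     # Join the versioned layers into a single prompt string.
|     prompt = ". ".join(f"{k}: {v}" for k, v in LAYERS.items())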
|
| I also created a small editing suite for myself where I can draw
| bounding boxes on images when they aren't perfect, and have them
| fixed, either just with a prompt or by feeding them to Claude as
| an image and having it write the prompt to fix the issue for me
| (as a workflow on the API). It's been quite a lot of fun to
| figure out what works. I am incredibly impressed by where this
| is all going.
|
| Once you do have good storyboards, you can easily do start-to-
| end GenAI video generation (hopping from scene to scene), bring
| them to life, and build your own small visual animated
| universes.
| taylorhughes wrote:
| We use nano banana extensively to build video storyboards,
| which we then turn into full motion video with a combination of
| img2vid models. It sounds like we're doing similar things,
| trying to keep images/characters/setting/style consistent
| across ~dozens of images (~minutes of video). You might like
| the product depending on what you're doing with the outputs!
| https://hypernatural.ai
| Genego wrote:
| Yes we are definitely doing the same! For now I'm just
| familiarizing myself in this space technically and
| conceptually.
| roywiggins wrote:
| Your "Dracula" character is possibly the least vampiric
| Dracula I've ever seen tbh
| beckos wrote:
| lol you can make your own Dracula if you want him to look
| different: https://hypernatural.ai/characters
| nashadelic wrote:
| > The more I try to perfect it, the easier it becomes
|
| I have the opposite experience: once it goes off track, it's
| nearly impossible to bring it back on message.
| Genego wrote:
| How much have you experimented with it? For some prompts I may
| generate 5 variations of 10-20 different scenes, then spend time
| writing down what worked and what did not, and then run the
| generation again (this part is mostly for research). It's
| certainly advancing my understanding over time and letting me
| control the output better. But I'm learning that it takes a huge
| amount of trial and error. So versioning prompts is definitely
| recommended, especially if you find some nuances that work for
| you.
| BeetleB wrote:
| Nano Banana can be frustrating at times. Yesterday I tried to get
| it to do several edits to an image, and it would return
| pretty much the same photo.
|
| Things like: Convert the people to clay figures similar to what
| one would see in a claymation.
|
| And it would think it did it, but I could not perceive any
| change.
|
| After several attempts, I added "Make the person 10 years
| younger". Suddenly it made a clay figure of the person.
| minimaxir wrote:
| The first request is a style transfer, which is why I included
| the Ghibli failure example.
| BeetleB wrote:
| I've gotten it to make Ghibli transfers by responding to the
| initial attempt with "I can barely tell the difference. Make
| the effect STRONGER."
| Der_Einzige wrote:
| I really wish that real expert stuff, like how to use
| ControlNet, regional prompting, or most other advanced ComfyUI
| techniques, got upvoted to the top instead.
| tomalbrc wrote:
| Cute. What's the use case?
| qayxc wrote:
| NSFW, mostly
| AuthError wrote:
| I use it for technical design docs, where I sketch out something
| on paper and ask Nano Banana to make a flow chart; it's
| incredibly good at this kind of editing. (Also, if you want to
| borrow an image from someone and change some bridges, that's
| usually hard since it's an embedded image, but Nano Banana
| solves that.)
| 4b11b4 wrote:
| I found this well written. I read it start to finish. The author
| does a good job of taking you through their process.
| smerrill25 wrote:
| Created a tool you can try out! Sorry for the self-plug, but I'm
| launching it on Product Hunt next week, and it lets you do
| this :)
|
| www.brandimagegen.com
|
| If you want a premium account to try out, you can find my email
| in my bio!
___________________________________________________________________
(page generated 2025-11-13 23:00 UTC)