[HN Gopher] Nano Banana can be prompt engineered for nuanced AI ...
       ___________________________________________________________________
        
       Nano Banana can be prompt engineered for nuanced AI image
       generation
        
       Author : minimaxir
       Score  : 347 points
       Date   : 2025-11-13 17:39 UTC (5 hours ago)
        
 (HTM) web link (minimaxir.com)
 (TXT) w3m dump (minimaxir.com)
        
       | doctorpangloss wrote:
       | lots of words
       | 
       | okay, look at imagen 4 ultra:
       | 
       | https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
       | 
        | In this link, Imagen is instructed to render the verbatim
        | prompt "the result of 4+5", which shows that text; when not so
        | instructed, it renders "4+5=9".
       | 
       | Is Imagen thinking?
       | 
       | Let's compare to gemini 2.5 flash image (nano banana):
       | 
       | look carefully at the system prompt here:
       | https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
       | 
        | Gemini is instructed to reply in images first, and if it
        | thinks, to think using the image thinking tags. It seemingly
        | cannot be prompted to show the verbatim text "the result of
        | 4+5" without showing the answer 4+5=9. Of course it can show
        | whatever exact text you want; the question is, does it rewrite
        | the prompt (no) or do something else (yes)?
       | 
       | compare to ideogram, with prompt rewriting:
       | https://ideogram.ai/g/GRuZRTY7TmilGUHnks-Mjg/0
       | 
       | without prompt rewriting:
       | https://ideogram.ai/g/yKV3EwULRKOu6LDCsSvZUg/2
       | 
       | We can do the same exercises with Flux Kontext for editing versus
       | Flash-2.5, if you think that editing is somehow unique in this
       | regard.
       | 
       | Is prompt rewriting "thinking"? My point is, this article can't
       | answer that question without dElViNg into the nuances of what
       | multi-modal models really are.
        
         | gryfft wrote:
          | Can you provide screenshots or links that don't require
          | login?
        
         | PunchTornado wrote:
          | Sorry, but I don't understand your post. Those links don't
          | work.
        
       | dostick wrote:
        | Use Google AI Studio to submit requests. To remove the
        | watermark, open the browser developer tools, right-click the
        | request for the "watermark_4" image, and select the option to
        | block it. From the next generation onward there will be no
        | watermark!
        
       | squigz wrote:
        | I'm getting annoyed by the use of "prompt engineered" as a
        | verb. Does this mean I'm finally old and bitter?
       | 
       | (Do we say we software engineered something?)
        
         | vpShane wrote:
         | You're definitely old and bitter, welcome to it.
         | 
         | You CREATED something, and I like to think that creating things
         | that I love and enjoy and that others can love and enjoy makes
         | creating things worth it.
        
           | squigz wrote:
           | Don't get me wrong, I have nothing against using AI as an
           | expression of creativity :)
        
             | malcolmxxx wrote:
              | Create? So I have created all that code I'm running on
              | my site? Yes, it's bad, I know, but thank you very much!
              | Such a creative guy I was!
        
         | officeplant wrote:
         | Not really since "prompt engineering" can be tossed in the same
         | pile as "vibe coding." Just people coping with not developing
         | the actual skills to produce the desired products.
        
           | bongodongobob wrote:
           | Couldn't care less. I don't need to know how to do literally
           | everything. AI fills in my gaps and I'm a ton more
           | productive.
        
             | squigz wrote:
             | I wouldn't bother trying to convince people who are upset
             | that others have figured out a way to use LLMs. It's not
             | logical.
        
           | koakuma-chan wrote:
           | Try getting a small model to do what you want quickly with
           | high accuracy, high quality, etc, and using few tokens per
           | request. You'll find out that prompt engineering is real and
           | matters.
        
         | pavlov wrote:
         | I think it's meant to be engineering in the same sense as
         | "social engineering".
        
         | antegamisou wrote:
         | No it means you can still discern what is BS.
        
       | miladyincontrol wrote:
        | There's lots these models can do, but I despise when people
        | suggest they can do edits "with only the necessary aspects
        | changed".
        | 
        | No, that simply is not true. If you actually compare the before
        | and after, you can see it still regenerates all the details on
        | the "unchanged" aspects. Texture, lighting, sharpness, even
        | scale: it's all different, even if varyingly similar to the
        | original.
        | 
        | Sure, they're cute for casual edits, but it really pains me
        | when people suggest these things are suitable replacements for
        | actual photo editing. Especially when it comes to people, or
        | details outside their training data, there's a lot of nuance
        | that can be lost as it regenerates them, no matter how you
        | prompt things.
       | 
       | Even if you
        
         | StevenWaterman wrote:
         | That is true for gpt-image-1 but not nano-banana. They can do
         | masked image changes
        
         | minimaxir wrote:
          | Nano Banana is different and much better at edits without
          | changing texture/lighting/sharpness/color balance, and I am
          | someone who is extremely picky about that. That's why I added
          | the note that Gemini 2.5 Flash is aware of segmentation
          | masks; my hunch is that that's why this is the case.
        
         | BoredPositron wrote:
          | Nano banana has a really low spatial scaling and doesn't
          | affect details like other models do.
        
         | miohtama wrote:
         | Could you just mask out the area you wish to change in more
         | advanced tools, or is there something in the model itself which
         | would prevent this?
        
           | lunarboy wrote:
           | That's probably where things are headed and there are already
           | products trying this (even photoshop already). Just like how
           | code gen AI tools don't replace the entire file on every
           | prompt iteration.
        
       | mkagenius wrote:
       | > Nano Banana is still bad at rendering text perfectly/without
       | typos as most image generation models.
       | 
        | I figured out that if you write the text in Google Docs and
        | share the screenshot with Banana, it will not make any
        | spelling mistakes.
        | 
        | So something like "can you write my name on this Wimbledon
        | trophy, both images are attached. Use them" will work.
        
         | minimaxir wrote:
         | Google's example documentation for Nano Banana does demo that
         | pipeline: https://ai.google.dev/gemini-api/docs/image-
         | generation#pytho...
         | 
         | That's on my list of blog-post-worthy things to test, namely
         | text rendering to image in Python directly and passing both
         | input images to the model for compositing.
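          | 
          | A minimal sketch of that pipeline (assuming the google-genai
          | SDK and Pillow; the model name and response handling follow
          | Google's docs but may drift):
          | 
          |     from io import BytesIO
          |     from PIL import Image, ImageDraw
          |     from google import genai
          | 
          |     # Render the exact text locally so the model can't typo it.
          |     text_img = Image.new("RGB", (512, 128), "white")
          |     ImageDraw.Draw(text_img).text((16, 48), "YOUR NAME",
          |                                   fill="black")
          |     trophy = Image.open("trophy.png")
          | 
          |     client = genai.Client()  # reads GEMINI_API_KEY
          |     resp = client.models.generate_content(
          |         model="gemini-2.5-flash-image",
          |         contents=[trophy, text_img,
          |                   "Engrave the text from the second image "
          |                   "onto the trophy in the first image."],
          |     )
          |     for part in resp.candidates[0].content.parts:
          |         if part.inline_data:
          |             out = Image.open(BytesIO(part.inline_data.data))
          |             out.save("out.png")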
        
       | ml-anon wrote:
       | "prompt engineered"...i.e. by typing in what you want to see.
        
         | harpiaharpyja wrote:
         | Not all models can actually do that if your prompt is
         | particular
        
           | pksebben wrote:
           | Most designers can't, either. Defining a spec is a skill.
           | 
           | It's actually fairly difficult to put to words any specific
           | enough vision such that it becomes understandable outside of
           | your own head. This goes for pretty much anything, too.
        
             | Razengan wrote:
             | Yep, knowing how and what to _ask_ is a skill.
             | 
             | For anything, even back in the "classical" search days.
        
             | andai wrote:
             | https://habitatchronicles.com/2004/04/you-cant-tell-
             | people-a...
        
         | darepublic wrote:
         | "amenable to highly specific and granular instruction"
        
         | simonw wrote:
         | ... and then iterating on that prompt many times, based on your
         | accumulated knowledge of how best to prompt that particular
         | model.
        
           | minimaxir wrote:
            | Case in point: the final image in this post (the IP
            | bonanza) took 28 iterations of the prompt text to get
            | something maximally interesting, which is why that one is
            | very particular about the constraints it invokes, such as
            | specifying "distinct" characters and specifying that they
            | are present from "left to right", because the model kept
            | exploiting that ambiguity.
        
             | chankstein38 wrote:
              | Hey, the author! Thank you for this post! Quick question:
              | any idea roughly how much this experimentation cost you?
              | I'm having trouble processing their image generation
              | pricing; I may just not be finding the right table. I'm
              | just trying to understand: if I do about 50 iterations
              | at the quality in the post, how much is that going to
              | cost me?
        
               | minimaxir wrote:
                | All generations in the post are $0.04/image (Nano
                | Banana doesn't have a way to increase the resolution,
                | yet), so you can do the math and assume you can
                | generate about 25 images per dollar; 50 iterations
                | would run about $2. Unlike other models, Nano Banana
                | does charge for input tokens, but it's negligible.
                | 
                | Discounting the testing around the character JSON,
                | which became _extremely_ expensive due to extreme
                | iteration/my own stupidity, I'd wager it took about $5
                | total including iteration.
        
         | mensetmanusman wrote:
         | We understand now that we interface with LLMs using natural and
         | unnatural language as the user interface.
         | 
         | This is a very different fuzzy interface compared to
         | programming languages.
         | 
         | There will be techniques better or worse at interfacing.
         | 
         | This is what the term prompt engineering is alluding to since
         | we don't have the full suite of language to describe this yet.
        
         | w_for_wumbo wrote:
          | Yes, that is a serious skill. How many of the woes that we
          | see are because people don't know what they want, or are
          | unable to describe it in such a way that others understand
          | it? I believe "prompt engineer" properly conveys how complex
          | communication can be when interacting with a multitude of
          | perspectives, world views, assumptions, presumptions, etc. I
          | believe it works well to counter the over-confidence that
          | people have from not paying attention to the gaps between
          | what is said and what is meant.
        
           | CobrastanJorji wrote:
           | Yes, obviously a role involving complex communication while
           | interacting with a multitude of perspectives, world views,
           | assumptions, presumptions, etc needs to be called "engineer."
           | 
           | That is why I always call technical writers "documentation
           | engineers," why I call diplomats "international engineers,"
           | why I call managers "team engineers," and why I call
           | historians "hindsight engineers."
        
         | jazzyjackson wrote:
         | Used to be called Google Fu
        
         | yieldcrv wrote:
          | right? 15 months ago in image models you used to have to
          | designate rendering specifications and know the art of
          | negative prompting
          | 
          | now you can really use natural language, and people want to
          | debate you about how poor they are at articulating a shared
          | concept, amazing
          | 
          | it's like the people are regressing and the AI is improving
        
       | pfortuny wrote:
       | Well, I just asked it for a 13-sided irregular polygon (is it
       | that hard?)...
       | 
       | https://imgur.com/a/llN7V0W
        
       | BoredPositron wrote:
        | The kicker for nano banana is not prompt adherence, which is a
        | really nice-to-have, but the fact that it's either working in
        | pixel space or with a really low spatial scaling. It's the only
        | model that doesn't kill your details with VAE encode/decode.
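        | 
        | You can see that VAE loss for yourself by round-tripping an
        | image with no edit at all (a sketch assuming the diffusers
        | library and the public sd-vae-ft-mse weights):
        | 
        |     import torch
        |     from diffusers import AutoencoderKL
        |     from diffusers.utils import load_image
        |     from torchvision.transforms.functional import (
        |         to_tensor, to_pil_image)
        | 
        |     # Encode/decode once; fine texture and small text degrade.
        |     vae = AutoencoderKL.from_pretrained(
        |         "stabilityai/sd-vae-ft-mse")
        |     img = load_image("photo.png").resize((512, 512))
        |     x = to_tensor(img).unsqueeze(0) * 2 - 1  # scale to [-1, 1]
        |     with torch.no_grad():
        |         latents = vae.encode(x).latent_dist.sample()
        |         recon = vae.decode(latents).sample
        |     print((recon - x).abs().mean().item())
        |     out = to_pil_image((recon[0] / 2 + 0.5).clamp(0, 1))
        |     out.save("roundtrip.png")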
        
       | sebzim4500 wrote:
       | It's really cool how good of a job it did rendering a page given
       | its HTML code. I was not expecting it to do nearly as well.
        
         | kridsdale1 wrote:
          | Same. This must have training data from sites that show HTML
          | next to screenshots of the pages.
        
       | leviathant wrote:
       | I was kind of surprised by this line:
       | 
       | >Nano Banana is terrible at style transfer even with prompt
       | engineering shenanigans
       | 
       | My context: I'm kind of fixated on visualizing my neighborhood as
       | it would have appeared in the 18th century. I've been doing it in
       | Sketchup, and then in Twinmotion, but neither of those produce
       | "photorealistic" images... Twinmotion can get pretty close with a
       | lot of work, but that's easier with modern architecture than it
       | is with the more hand-made, brick-by-brick structures I'm
       | modeling out.
       | 
       | As different AI image generators have emerged, I've tried them
       | all in an effort to add the proverbial rough edges to snapshots
       | of the models I've created, and it was not until Nano Banana that
       | I ever saw anything even remotely workable.
       | 
       | Nano Banana manages to maintain the geometry of the scene, while
       | applying new styles to it. Sometimes I do this with my Twinmotion
       | renders, but what's really been cool to see is how well it takes
       | a drawing, or engraving, or watercolor - and with as simple a
       | prompt as "make this into a photo" it generates phenomenal
       | results.
       | 
       | Similarly to the Paladin/Starbucks/Pirate example in the link
       | though, I find that sometimes I need to misdirect a little bit,
       | because if I'm peppering the prompt with details about the 18th
       | century, I sometimes get a painterly image back. Instead, I'll
       | tell it I want it to look like a photograph of a well preserved
       | historic neighborhood, or a scene from a period film set in the
       | 18th century.
       | 
       | As fantastic as the results can be, I'm not abandoning my manual
       | modeling of these buildings and scenes. However, Nano Banana's
       | interpretation of contemporary illustrations has helped me
       | reshape how I think about some of the assumptions I made in my
       | own models.
        
         | echelon wrote:
         | You can't take a highly artistic image and supply it as a style
         | reference. Nano Banana can't generalize to anything not in its
         | training.
        
           | leviathant wrote:
           | Fair enough! I suppose I've avoided that kind of "style
           | transfer" for a variety of reasons, it hadn't even occurred
           | to me that people were still interested in that. And I don't
           | say that to open up debate on the topic, just explaining away
           | my own ignorance/misinterpretation. Thanks
        
       | simonw wrote:
       | I like the Python library that accompanies this:
       | https://github.com/minimaxir/gemimg
       | 
        | I added a CLI to it (using Gemini CLI) and submitted a PR; you
        | can run that like so:
        | 
        |     GEMINI_API_KEY="..." \
        |     uv run --with https://github.com/minimaxir/gemimg/archive/d6b9d5bbefa1e2ffc3b09086bc0a3ad70ca4ef22.zip \
        |     python -m gemimg "a racoon holding a hand written sign that says I love trash"
       | 
       | Result in this comment:
       | https://github.com/minimaxir/gemimg/pull/7#issuecomment-3529...
        
         | echelon wrote:
         | The author went to great lengths about open source early on. I
         | wonder if they'll cover the QwenEdit ecosystem.
         | 
         | I'm exceptionally excited about Chinese editing models. They're
         | getting closer and closer to NanoBanana in terms of robustness,
         | and they're open source. This means you can supply masks and
         | kernels and do advanced image operations, integrate them into
         | visual UIs, etc.
         | 
         | You can even fine tune them and create LoRAs that will do the
         | style transferring tasks that Nano Banana falls flat on.
         | 
         | I don't like how closed the frontier US models are, and I hope
         | the Chinese kick our asses.
         | 
         | That said, I love how easy it'll be to distill Nano Banana into
         | a new model. You can pluck training data right out of it: ((any
         | image, any instruction) -> completion) tuples.
        
           | minimaxir wrote:
            | I've been keeping an eye on Qwen-Edit/Wan 2.2 shenanigans
            | and they are interesting; however, actually running those
            | types of models is too cumbersome, and in the end it's
            | unclear whether it's actually worth it over the
            | $0.04/image for Nano Banana.
        
             | CamperBob2 wrote:
              | I was skeptical about the notion of running similar
              | models locally as well, but the person who did this
              | (https://old.reddit.com/r/StableDiffusion/comments/1osi1q0/wa...)
              | swears that they generated it locally, just letting a
              | single 5090 crunch away for a week.
             | 
             | If that's true, it seems worth getting past the
             | 'cumbersome' aspects. This tech may not put Hollywood out
             | of business, but it's clear that the process of filmmaking
             | won't be recognizable in 10 years if amateurs can really do
             | this in their basements today.
        
             | braebo wrote:
             | Takes a couple mouse clicks in ComfyUI
        
               | echelon wrote:
               | On that subject - ComfyUI is not the future of image gen.
               | It's an experimental rope bridge.
               | 
               | Adobe's conference last week points to the future of
               | image gen. Visual tools where you mold images like clay.
               | Hands on.
               | 
               | Comfy appeals to the 0.01% that like toolkits like
               | TouchDesigner, Nannou, and ShaderToy.
        
           | msp26 wrote:
           | > I don't like how closed the frontier US models are, and I
           | hope the Chinese kick our asses.
           | 
           | For imagegen, agreed. But for textgen, Kimi K2 thinking is by
           | far the best chat model at the moment from my experience so
           | far. Not even "one of the best", the best.
           | 
           | It has frontier level capability and the model was made very
           | tastefully: it's significantly less sycophantic and more
           | willing to disagree in a productive, reasonable way rather
           | than immediately shutting you out. It's also way more funny
           | at shitposting.
           | 
            | I'll keep using Claude a lot for multimodality and
            | artifacts, but much of my usage has shifted to K2. Claude's
            | sycophancy in particular is tiresome. I don't use
            | ChatGPT/Gemini because they hide the raw thinking tokens,
            | which is really cringe.
        
             | astrange wrote:
              | Claude Sonnet 4.5 doesn't even feel sycophantic (in the
              | 4o way); it feels like it has BPD. It switches from
              | desperately agreeing with you to moralizing lectures,
              | and then has a breakdown if you point out it's wrong
              | about anything.
             | 
             | Also, yesterday I asked it a question and after the answer
             | it complained about its poorly written system prompt to me.
             | 
             | They're really torturing their poor models over there.
        
               | dontlikeyoueith wrote:
               | It rubs the data on its skin or else it gets the prompt
               | again!
        
         | ctippett wrote:
         | Any reason for not also adding a project.scripts entry for
         | pyproject.toml? That way the CLI (great idea btw) could be
         | installed as a tool by uv.
        
           | simonw wrote:
            | I decided to avoid that purely to keep changes made to the
            | package as minimal as possible - adding a project.scripts
            | entry means installing it adds a new command alias. My
            | approach changes nothing other than making "python -m
            | gemimg" do something useful.
           | 
           | I agree that a project.scripts would be good but that's a
           | decision for the maintainer to take on separately!
        
         | sorcercode wrote:
         | @simonw: slight tangent but super curious how you managed to
         | generate the preview of that gemini-cli terminal session gist -
         | https://gistpreview.github.io/?17290c1024b0ef7df06e9faa4cb37...
         | 
         | is this just a manual copy/paste into a gist with some html css
         | styling; or do you have a custom tool a la amp-code that does
         | this more easily?
        
           | simonw wrote:
            | I used this tool:
            | https://tools.simonwillison.net/terminal-to-html
            | 
            | I made a video about building that here:
            | https://simonwillison.net/2025/Oct/23/claude-code-for-web-vi...
           | 
           | It works much better with Claude Code and Codex CLI because
           | they don't mess around with scrolling in the same way as
           | Gemini CLI does.
        
       | peetle wrote:
        | In my own experience, nano banana still has the tendency to:
        | 
        | - make massive, seemingly random edits to images
        | - adjust image scale
        | - make very fine-grained but pervasive detail changes, obvious
        |   in an image diff
        | 
        | For instance, I have found that nano-banana will sporadically
        | add a (convincing) fireplace to a room or a new garage behind a
        | house. This happens even with explicit "ALL CAPS" instructions
        | not to do so. It happens sporadically even when the temperature
        | is set to zero, which makes it impossible to build a reliable
        | app.
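        | 
        | (For reference, a minimal way to pin the temperature with the
        | google-genai SDK; a sketch, with the model name assumed:)
        | 
        |     from PIL import Image
        |     from google import genai
        |     from google.genai import types
        | 
        |     img = Image.open("house.png")
        |     client = genai.Client()
        |     resp = client.models.generate_content(
        |         model="gemini-2.5-flash-image",  # Nano Banana
        |         contents=[img, "Remove the car. Change NOTHING else."],
        |         config=types.GenerateContentConfig(temperature=0.0),
        |     )
        |     # Even with temperature=0.0, repeated calls can still
        |     # differ, which is the unreliability described above.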
       | 
       | Has anyone had a better experience?
        
         | andblac wrote:
         | The "ALL CAPS" part of your comment got me thinking. I imagine
         | most llms understand subtle meanings of upper case text use
         | depending on context. But, as I understand it, ALL CAPS text
         | will tokenize differently than lower case text. Is that right?
         | In that case, won't the upper case be harder to understand and
         | follow for most models since it's less common in datasets?
        
           | minimaxir wrote:
            | There's more than enough ALL CAPS text in the corpus of the
            | entire internet, and enough semantic context associated
            | with it, for models to interpret it as the imperative
            | voice.
        
             | miohtama wrote:
              | Shouldn't all caps be normalised to the same tokens as
              | lowercase? There were no separate tokens for all caps
              | and lowercase in Llama, or at least not in the past.
        
               | minimaxir wrote:
                | Looking at the older Llama 2 model, its tokenizer does
                | have capital letters in it:
                | https://huggingface.co/meta-llama/Llama-2-7b-hf
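                | 
                | A quick way to check (a sketch using GPT-2's openly
                | available tokenizer as a stand-in, since Llama's
                | weights are gated; exact splits differ per model):
                | 
                |     from transformers import AutoTokenizer
                | 
                |     tok = AutoTokenizer.from_pretrained("gpt2")
                |     # Capitalization changes the tokens entirely:
                |     print(tok.tokenize("do not add a fireplace"))
                |     print(tok.tokenize("DO NOT ADD A FIREPLACE"))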
        
         | symisc_devel wrote:
         | I work on the PixLab prompt based photo editor
         | (https://editor.pixlab.io), and it follows exactly what you
         | type with explicit CAPS.
        
       | ainiriand wrote:
        | The blueberry and strawberry are not actually where they were
        | prompted to be.
        
       | mFixman wrote:
        | The author overlooked an interesting error in the second skull
        | pancake image: the strawberry is in the right eye socket (to
        | the left of the image), and the blackberry is in the left eye
        | socket (to the right of the image)!
        | 
        | This looks like it's caused by 99% of the relative directions
        | in image descriptions being given from the viewer's point of
        | view, and by the fact that, in 99% of the ones that aren't,
        | they refer to a human and not to a skull-shaped pancake.
        
         | martin-adams wrote:
         | I picked up on that also. I feel that a lot of humans would
         | also get confused about whether you mean the eye on the left,
         | or the subject's left eye.
        
           | Closi wrote:
            | To be honest, this is the sort of thing Nano Banana is
            | weak at, in my experience. It's absolutely amazing - but
            | it doesn't understand left/right/up/down/shrink this/move
            | this/rotate this, etc.
            | 
            | To demonstrate this weakness with the same prompts as the
            | article, see the link below, which shows that it is a
            | model weakness and not just a language ambiguity:
            | 
            | https://gemini.google.com/share/a024d11786fc
        
             | ffsm8 wrote:
              | Mmh, ime you need to discard the session/rewrite the
              | failing prompt instead of continuing and correcting on
              | failures. Once errors occur, you've basically introduced
              | a poison pill which will continuously make things go
              | haywire. Spelling out what it did wrong is the most
              | destructive thing you can do - at least in my
              | experience.
        
             | astrange wrote:
             | Almost no image/video models can do "upside-down" either.
        
         | jonas21 wrote:
          | I am a human, and I would have done the same thing as Nano
          | Banana. If the user had wanted a strawberry in the skull's
          | left eye, they should've said, "Put a strawberry in _its_
          | left eye socket."
        
           | kjeksfjes wrote:
            | Exactly what I was thinking too. I'm a designer, and I'm
            | used to receiving feedback and instructions. "The left eye
            | socket" would, to me, refer to what I currently see in
            | front of me, while "its left eye socket" instantly shifts
            | the perspective from me to the subject.
        
         | minimaxir wrote:
         | I admit I missed this, which is particularly embarrassing
         | because I point out this exact problem with the character JSON
         | later in the post.
         | 
         | For some offline character JSON prompts I ended up adding an
         | additional "any mentions of left and right are from the
         | character's perspective, NOT the camera's perspective" to the
         | prompt, which did seem to improve success.
        
         | sib wrote:
          | Came here to make exactly the same comment. It was funny
          | that the author specifically said that Nano Banana got all
          | five edit prompts correct, rather than noting this
          | discrepancy, which could be argued either way (although I
          | think the "right eye" of a skull should be interpreted with
          | respect to the skull's POV).
        
       | layer8 wrote:
       | > It's one of the best results I've seen for this particular
       | test, and it's one that doesn't have obvious signs of "AI slop"
       | aside from the ridiculous premise.
       | 
       | It's pretty good, but one conspicuous thing is that most of the
       | blueberries are pointing upwards.
        
       | satvikpendem wrote:
        | For images of people generated from scratch, Nano Banana
        | always adds a background blur; it can't seem to create more
        | realistic or candid images such as those taken with a
        | point-and-shoot or smartphone. Has anyone solved this sort of
        | issue? It seems to work alright if you give it an existing
        | image to edit, however. I saw some other threads online about
        | it, but I didn't see anyone come up with solutions.
        
         | kridsdale1 wrote:
         | Maybe try including "f/16" or "f/22" as those are likely to be
         | in the training set for long depth of field photos.
        
           | satvikpendem wrote:
            | I tried that, but it doesn't seem to make much difference
            | for whatever reason; you still can't get a crisp shot such
            | as this [0], where the foreground and background details
            | are all preserved (the linked shot was taken with an
            | iPhone, which doesn't seem to do shallow depth of field
            | unless you use portrait mode).
            | 
            | [0] https://www.lux.camera/content/images/size/w1600/2024/09/IMG...
        
           | astrange wrote:
            | Those are rarely in the captions for the image. They'd
            | have to extract the EXIF for photos and include it in the
            | recaptioning, which they should be doing, but I doubt they
            | thought about it.
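            | 
            | Pulling aperture data out of EXIF is cheap, e.g. with
            | Pillow (a sketch; tag layout varies by camera):
            | 
            |     from PIL import Image, ExifTags
            | 
            |     # Aperture and shutter live in the Exif sub-IFD.
            |     exif = Image.open("photo.jpg").getexif() \
            |         .get_ifd(ExifTags.IFD.Exif)
            |     tags = {ExifTags.TAGS.get(k, k): v
            |             for k, v in exif.items()}
            |     print(tags.get("FNumber"), tags.get("ExposureTime"))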
        
       | jdc0589 wrote:
       | I don't feel like I should search for "nano banana" on my work
       | laptop
        
       | insane_dreamer wrote:
       | I haven't paid much attention to image generation models (not my
       | area of interest), but these examples are shockingly good.
        
       | comex wrote:
       | I tried asking for a shot from a live-action remake of My
       | Neighbor Totoro. This is a task I've been curious about for a
       | while. Like Sonic, Totoro is the kind of stylized cartoon
       | character that can't be rendered photorealistically without a
       | great deal of subjective interpretation, which (like in Sonic's
       | case) is famously easy to get wrong even for humans. Unlike
       | Sonic, Totoro hasn't had an actual live-action remake, so the
       | model would have to come up with a design itself. I was wondering
       | what it might produce - something good? something horrifying?
       | Unfortunately, neither; it just produced a digital-art style
       | image, despite being asked for a photorealistic one, and kept
       | doing so even when I copied some of the keyword-stuffing from the
       | post. At least it tried. I can't test this with ChatGPT because
       | it trips the copyright filter.
        
       | roywiggins wrote:
        | Another thing it can't do is remove reflections in windows;
        | it's nearly a no-op.
        
       | sejje wrote:
       | >> "The image style is definitely closer to Vanity Fair (the
       | photographer is reflected in his breastplate!)"
       | 
       | I didn't expect that. I would have definitely counted that as a
       | "probably real" tally mark if grading an image.
        
       | Genego wrote:
        | I have been generating a few dozen images per day for
        | storyboarding purposes. The more I try to perfect it, the
        | easier it becomes to control these outputs and even keep the
        | entire visual story, as well as its characters, consistent
        | over a few dozen different scenes, while also controlling the
        | time of day throughout the story. I am currently working with
        | 7 layers of prompts to control for environment, camera,
        | subject, composition, light, colors and overall quality (it
        | might be overkill, but it's also experimenting).
       | 
        | I also created a small editing suite for myself where I can
        | draw bounding boxes on images when they aren't perfect, and
        | have them fixed, either just with a prompt or by feeding them
        | to Claude as an image and then having it write the prompt to
        | fix the issue for me (as a workflow on the API). It's been
        | quite a lot of fun to figure out what works. I am incredibly
        | impressed by where this is all going.
       | 
        | Once you do have good storyboards, you can easily do
        | start-to-end GenAI video generation (hopping from scene to
        | scene) to bring them to life and build your own small visual
        | animated universes.
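        | 
        | The bounding-box suite is less magic than it sounds: crop the
        | boxed region, ask the model for a corrective edit, and paste
        | the result back. A rough sketch (with Pillow and the
        | google-genai SDK; the function and file names are made up):
        | 
        |     from io import BytesIO
        |     from PIL import Image
        |     from google import genai
        | 
        |     def fix_region(path, box, instruction):
        |         # box = (left, top, right, bottom) from the drawn
        |         # bounding box in the UI.
        |         img = Image.open(path)
        |         crop = img.crop(box)
        |         client = genai.Client()
        |         resp = client.models.generate_content(
        |             model="gemini-2.5-flash-image",
        |             contents=[crop, instruction],
        |         )
        |         for part in resp.candidates[0].content.parts:
        |             if part.inline_data:
        |                 fixed = Image.open(
        |                     BytesIO(part.inline_data.data))
        |                 img.paste(fixed.resize(crop.size), box[:2])
        |         return img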
        
         | taylorhughes wrote:
         | We use nano banana extensively to build video storyboards,
         | which we then turn into full motion video with a combination of
         | img2vid models. It sounds like we're doing similar things,
         | trying to keep images/characters/setting/style consistent
         | across ~dozens of images (~minutes of video). You might like
         | the product depending on what you're doing with the outputs!
         | https://hypernatural.ai
        
           | Genego wrote:
            | Yes, we are definitely doing the same! For now I'm just
            | familiarizing myself with this space, technically and
            | conceptually.
        
           | roywiggins wrote:
           | Your "Dracula" character is possibly the least vampiric
           | Dracula I've ever seen tbh
        
             | beckos wrote:
             | lol you can make your own Dracula if you want him to look
             | different: https://hypernatural.ai/characters
        
         | nashadelic wrote:
          | > The more I try to perfect it, the easier it becomes
          | 
          | I have the opposite experience: once it goes off track, it's
          | nearly impossible to bring it back on message.
        
           | Genego wrote:
            | How much have you experimented with it? For some prompts I
            | may generate 5 variations of 10-20 different scenes, then
            | spend time writing down what worked and what did not, and
            | run the generation again (this part is mostly for
            | research). It's certainly advancing my understanding over
            | time and letting me control the output better. But I'm
            | learning that it takes a huge amount of trial and error,
            | so versioning prompts is definitely recommended,
            | especially if you find some nuances that work for you.
        
       | BeetleB wrote:
       | Nano Banana can be frustrating at times. Yesterday I tried to get
       | it to do several edits to an image, and it would return back
       | pretty much the same photo.
       | 
       | Things like: Convert the people to clay figures similar to what
       | one would see in a claymation.
       | 
       | And it would think it did it, but I could not perceive any
       | change.
       | 
       | After several attempts, I added "Make the person 10 years
       | younger". Suddenly it made a clay figure of the person.
        
         | minimaxir wrote:
         | The first request is a style transfer, which is why I included
         | the Ghibli failure example.
        
           | BeetleB wrote:
           | I've gotten it to make Ghibli transfers by responding to the
           | initial attempt with "I can barely tell the difference. Make
           | the effect STRONGER."
        
       | Der_Einzige wrote:
        | I really wish that real expert stuff, like how to use
        | ControlNet, regional prompting, or most other advanced ComfyUI
        | techniques, got upvoted to the top instead.
        
       | tomalbrc wrote:
       | Cute. What's the use case?
        
         | qayxc wrote:
         | NSFW, mostly
        
       | AuthError wrote:
        | I use it for technical design docs, where I sketch out
        | something on paper and ask Nano Banana to make a flow chart;
        | it's incredibly good at this kind of editing. (Also, if you
        | want to borrow an image from someone and change some bridges,
        | that's usually hard when it's an embedded image, but Nano
        | Banana solves that.)
        
       | 4b11b4 wrote:
        | I found this well written. I read it start to finish. The
        | author does a good job of taking you through their process.
        
       | smerrill25 wrote:
        | Sorry for the self-plug, but I created a tool that lets you
        | do this; I launch on Product Hunt next week:
        | 
        | www.brandimagegen.com
        | 
        | If you want a premium account to try it out, you can find my
        | email in my bio!
        
       ___________________________________________________________________
       (page generated 2025-11-13 23:00 UTC)