[HN Gopher] Show HN: A Dalle-3 and GPT4-Vision feedback loop
___________________________________________________________________
Show HN: A Dalle-3 and GPT4-Vision feedback loop
I used to enjoy Translation Party, and over the weekend I realized
that we can build the same feedback loop with DALLE-3 and
GPT4-Vision. Start with a text prompt, let DALLE-3 generate an
image, then GPT-4 Vision turns that image back into a text prompt,
DALLE-3 creates another image, and so on.

You need to bring your own OpenAI API key (costs about $0.10/run).

Some prompts are very stable, others go wild. If you bias GPT4's
prompting by telling it to "make it weird" you can get crazy
results.

Here's a few of my favorites:

- Gnomes: https://dalle.party/?party=k4eeMQ6I
- Start with a sailboat but bias GPT4V to "replace everything with
  cats": https://dalle.party/?party=0uKfJjQn
- A more stable one (but everyone is always an actor):
  https://dalle.party/?party=oxpeZKh5
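
The loop described in the post can be sketched in a few lines. This is a minimal illustration, not the site's actual code: `run_party`, `fake_generate`, and `fake_describe` are hypothetical names, and the two callables stand in for whatever DALLE-3 and GPT-4 Vision requests a real client would make.

```python
from typing import Callable, List, Tuple


def run_party(
    start_prompt: str,
    generate_image: Callable[[str], str],
    describe_image: Callable[[str], str],
    iterations: int,
) -> List[Tuple[str, str]]:
    """Alternate text -> image -> text and return one (prompt, image) pair per round."""
    history: List[Tuple[str, str]] = []
    prompt = start_prompt
    for _ in range(iterations):
        image = generate_image(prompt)   # assumed: an image-model call returning an image reference
        history.append((prompt, image))
        prompt = describe_image(image)   # assumed: a vision-model call returning the next prompt
    return history


# Stand-in functions so the sketch runs without an API key.
def fake_generate(prompt: str) -> str:
    return f"image({prompt})"


def fake_describe(image: str) -> str:
    return f"Create an image of {image}"


for prompt, image in run_party("a sailboat", fake_generate, fake_describe, 2):
    print(prompt, "->", image)
```

Passing the two model calls in as functions keeps the loop itself independent of any particular API client, so the drift between rounds can be studied with stubs before spending credits.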
Author : z991
Score : 172 points
Date : 2023-11-27 14:18 UTC (8 hours ago)
(HTM) web link (dalle.party)
(TXT) w3m dump (dalle.party)
| z991 wrote:
| Also, descent into Corgi insanity:
| https://dalle.party/?party=oxXJE9J4
| morkalork wrote:
| Wow that meme about everything becoming cosmic/space themed is
| real isn't it?
| pera wrote:
| substitute corgi with paperclip and you get another meme
| becoming real :p
| z991 wrote:
| https://dalle.party/?party=RqpIijhH
| morkalork wrote:
| Beautiful!
| igrekel wrote:
| So do I understand correctly that the corgi was purely made up
| from GPT-4's interpretation of the picture?
| z991 wrote:
| No, in that case there is a custom prompt (visible in the top
| dropdown) telling GPT4 to replace everything with corgis when
| it writes a new prompt.
| chaps wrote:
| Absolutely wonderful. Thank you for sharing.
| dpflan wrote:
| Interesting, how stable are the images for a given prompt? And
| the other way around? Does it trend toward some natural limit
| image/text where there are diminishing returns to making change
| to the data?
| willsmith72 wrote:
| this is actually really helpful. Since chatgpt restricted dalle
| to 1 image a few weeks ago, the feedback loops are way slower.
| This is a nice (but more expensive) alternative
| willsmith72 wrote:
| got really weird really fast
|
| https://dalle.party/?party=7cnx55yN
| MrZander wrote:
| This is absolutely hilarious. The way "business-themed puns"
| turned into incorrectly labeling the skiers' race has me rolling.
| epiccoleman wrote:
| The inability of AI images to spell has always amused me,
| and it's especially funny here. I got a special kick out of
| "IDEDA ENGINEEER" and "BUZSTEAND." The image where the one
| guy's hat just says "HISPANIC" is also oddly hilarious.
|
| Idk what it is, but I have a special soft spot for humor
| based around odd spelling (this video still makes me laugh
| years later: https://www.youtube.com/watch?v=EShUeudtaFg).
| op00to wrote:
| BIZ NESS
| thowaway91234 wrote:
| the last one killed me "chef of unecessary meetings" got me
| rolling
| unshavedyak wrote:
| Yea i cancelled GPT Plus after they did that. Ruined a lot of
| the exploration that i enjoyed about DallE
| rbates wrote:
| This reminds me of the party game Telestrations where players go
| back and forth between drawing and writing what they see. It's
| hilarious to see the result because you anticipate what the next
| drawing will be while reading the prompt.
|
| I'd love to see an alternative viewing mode here which shows the
| image and the following prompt. Then you need to click a button
| to reveal the next image. This allows you to picture in your mind
| what the image might look like while reading the prompt.
|
| Thanks for making this fun little app!
|
| Update: I just realized you can get this effect by going into
| mobile mode (or resizing the window). You can then scroll down to
| see the image after reading the prompt.
| smusamashah wrote:
| Why do prompts from GPT-4V start from "Create an image of"? This
| prefix doesn't look useful imo.
| z991 wrote:
| You can try a custom prompt and see if you can get GPT4V to
| stop doing that / if it matters.
| smusamashah wrote:
| You are right, doesn't matter much. Tried gnome prompt with
| empty custom prompt for gpt-4v
| https://dalle.party/?party=nvzzZXYs. Then used a custom
| prompt to return short descriptions which resulted in
| https://dalle.party/?party=Qcd8ljJp
|
| Another attempt: https://dalle.party/?party=k4eeMQ6I
|
| Realized just now that the dropdown on top of the page shows
| the prompt used by GPT-4V.
| z991 wrote:
| Wow the empty prompt does much better than I'd have guessed
| willsmith72 wrote:
| it seems like if you create a shareable link, then add more
| images, you can't create a new link with the new images
| z991 wrote:
| Yeah, that's a bug, I'll try to fix it tonight!
| epivosism wrote:
| thanks for this! Basically the default UI they provide at
| chat.openai is so bad, nearly anything you would do would be
| an improvement.
|
| * not hide the prompt by default
| * not only show 6 lines of the prompt even after user clicks
| * not be insanely buggy re: ajax, reloading past convos etc
| * not disallow sharing of links to chats which contain images
| * not artificially delay display of images with the little
|   spinner animation when the image is already known ready anyway
| * not lie about reasons for failure
| * not hide details on what rate limit rules I broke and where to
|   get more information
|
| etc
|
| Good luck, thanks!
| willsmith72 wrote:
| the new fancy animation for images is SO annoying
| i-use-nixos-btw wrote:
| It'd be interesting to start with an image rather than a prompt,
| though I am afraid of what it'd do if I started with a selfie.
| jsf01 wrote:
| It's cool to see how certain prompts and themes stay relatively
| stable, like the gnome example. But then "cat lecturing mice"
| quickly goes off the rails into weird surreal sloth banana
| territory.
|
| My best guess to try to explain this would be that "gnome + art
| style + mushroom" will draw from a lot more concrete examples in
| the training data, whereas the AI is forced to reach a bit wider
| to try to concoct some image for the weird scenario given in the
| cat example.
| xeckr wrote:
| Cool idea! I made one with the starting prompt "an artificial
| intelligence painting a picture of itself":
| https://dalle.party/?party=wszvbrOx
|
| It consistently shows a robot painting on a canvas. The first 4
| are paintings of robots, the next 3 are galaxies, and the final 2
| are landscapes.
| NickNaraghi wrote:
| Great idea, and it came out really good too. I like the 6th one
| the best
| rexreed wrote:
| Question: how are you protecting those API keys? I'm reluctant to
| enter mine into what could easily be an API Key scraper.
| z991 wrote:
| The entire thing is frontend only (except for the share
| feature) so the server never sees your key. You can validate
| that by watching the network tab in developer console. You can
| also create a new API key and revoke it afterward to be extra sure.
| danielbln wrote:
| Just generate one for this purpose and then revoke it when
| you're done. You can have more than one key.
| Mtinie wrote:
| I figured this would quickly go off the rails into surreal
| territory, but instead it ended up being progressive
| technological de-evolution.
|
| Starting prompt: "A futuristic hybrid of a steam engine train and
| a DaVinci flying machine"
|
| Results: https://dalle.party/?party=14ESewbz
|
| (Addendum: In case anyone was curious how costs scale by
| iteration, the full ten iterations in this result billed $0.21
| against my credit balance.)
| Mtinie wrote:
| Here's a second run of the same starting prompt, this time
| using the "make it more whimsical" modifier. It makes a
| difference and I find it fascinating what parts of the
| prompt/image gain prominence during the evolutions.
|
| Starting prompt: "A futuristic hybrid of a steam engine train
| and a DaVinci flying machine"
|
| Results: https://dalle.party/?party=qLHPB2-o
|
| Cost: Eight iterations @ $0.44 -- which suggests to me that the
| API is getting additional hits beyond the run. I confirmed that
| the share link isn't passing along the key (via a separate
| browser and a separate machine) so I'm not clear why this
| might be.
| jamestimmins wrote:
| I find it somewhat fascinating that in both examples, the
| final result is more cohesive around a single theme than the
| original idea.
| Mtinie wrote:
| > "[...]the final result is more cohesive around a single
| theme than the original idea."
|
| That's an observation worth investigating. Here's another
| set of data points to see if there's more to it...
|
| Input prompt: "Six robots on a boat with harpoons, battling
| sharks with lasers strapped to their heads"
|
| GPT4V prompt: "Write a prompt for an AI to make this image.
| Just return the prompt, don't say anything else. Make it
| funnier."
|
| Result: https://dalle.party/?party=pfWGthli
|
| Cost: Ten iterations @ $0.41
|
| (Addendum: I'd forgotten to mention that I believe the cost
| differential is due to the token count of each of the
| prompts. The first case mentioned had fewer words passed
| through each of the prompts than the later attempts when I
| asked it to 'make it whimsical' or 'make it funnier'.)
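
To compare the three runs reported in this subthread, the billed totals can be divided by the iteration counts. This is only arithmetic on the figures quoted above; the run labels and the function name `cost_per_iteration` are made up for illustration, and the token-count explanation remains the commenter's guess.

```python
# (label, iterations, total billed in USD) as reported in the comments above
runs = [
    ("first run", 10, 0.21),
    ("make it more whimsical", 8, 0.44),
    ("make it funnier", 10, 0.41),
]


def cost_per_iteration(total_usd: float, iterations: int) -> float:
    """Average cost of one image + description round."""
    return total_usd / iterations


for label, iterations, total in runs:
    print(f"{label}: ${cost_per_iteration(total, iterations):.3f} per iteration")
```

That works out to roughly 2 cents, 5.5 cents, and 4 cents per iteration, so the per-round cost varied by more than a factor of two across these runs.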
| w-m wrote:
| Playing with opposites is kind of fun, too.
|
| Simply a cat, evolving into a lounging cucumber, and finally
| opposite world:
|
| https://dalle.party/?party=pqwKQVka
|
| Vibrant gathering of celestial octopus entities:
|
| https://dalle.party/?party=lHNDUvtp
| epivosism wrote:
| The "create text version of image" prompt matters a ton.
|
| I tried three, demo here:
|
| default https://dalle.party/?party=JfiwmJra
|
| hyper-long + max detail + compression - This shows that with
| enough text, it can do a really good job of reproducing very,
| very similar images
| https://dalle.party/?party=QtEqq4Mu
|
| hyper-long + max detail + compression + telling it to cut all
| that down to 12 words - This seems okay. I might be losing too
| much detail https://dalle.party/?party=0utxvJ9y
|
| Overall the extreme content filtering and lying error messages
| are not ideal; will probably improve in the future. If you send
| too long or too risky a prompt, or the image it generates is
| randomly too risky, you either get told about it or are falsely
| told that you've hit rate limits. Sometimes you also really do
| hit rate limits.
|
| Also, you can't raise your rate limits until you have paid over
| X amount to OpenAI. This kind of makes sense as
| a way to prevent new sign-ups from blowing thousands of dollars
| of cap mistakenly.
|
| Hyper detail prompt:
|
| Look at this image and extract all the vital elements. List them
| in your mind including position, style, shape, texture, color,
| everything else essential to convey their meaning. Now think
| about the theme of the image and write that down, too. Now write
| out the composition and organization of the image in terms of
| placement, size, relationships, focus. Now think about the
| emotions - what is everyone feeling and thinking and doing
| towards each other? Now, take all that data and think about a
| very long, detailed summary including all elements. Then
| "compress" this data using abbreviations, shortenings, artistic
| metaphors, references to things which might help others
| understand it, labels and select pull-quotes. Then add even more
| detail by reviewing what we reviewed before. Now do one final
| pass considering the input image again, making sure to include
| everything from it in the output one, too. Finally, produce a
| long maximum length jam packed with info details which could be
| used to perfectly reproduce this image.
|
| Final shrink to 12 words:
|
| NOW, re-read ALL of that twice, thinking deeply about it, then
| compress it down to just 12 very carefully chosen words which
| with infinite precision, poetry, beauty and love contain all the
| detail, and output them, in quotes.
| fassssst wrote:
| I would never paste my API key into an app or website.
| mwint wrote:
| Can you get a temporary one that is revocable later? (Not an
| OpenAI user myself, but that would seem to be a way to lower
| the risk to acceptable levels)
| danielbln wrote:
| You can generate and revoke them easily, so I don't quite get
| the issues. Create one, use the tool, revoke, done.
| w-m wrote:
| You can create named API keys, and easily delete them.
| Unfortunately you can't seem to put spend limits on specific
| API keys.
|
| If you're not using the API for serious stuff though it's not
| a big problem, as they moved to pre-paid billing recently.
| Mine was sitting on $0, so I just put in a few bucks to use
| with this site.
| swatcoder wrote:
| Indeed!
|
| If OpenAI wants to support use cases like this, which would be
| kind of cool during these exploratory days, they should let you
| generate "single use" keys with features like cost caps, domain
| locks, expirations, etc
| epivosism wrote:
| You can really "cheat" by modifying the custom prompt to
| re-insert or remove specific features. For example, "generate a
| prompt for this image but adjust it by making everything appear
| in a more primitive, earlier evolutionary form, or in an earlier
| less developed way" would make things de-evolve.
|
| Or you can just re-insert any theme or recurring characters you
| like at that stage.
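
One way to picture this kind of steering is as a fixed base instruction plus an optional modifier. The base text below is the GPT4V prompt quoted earlier in the thread; `BASE_INSTRUCTION` and `build_instruction` are hypothetical names, and how the site actually assembles its prompt is not shown in the discussion.

```python
# Base instruction as quoted earlier in the thread.
BASE_INSTRUCTION = (
    "Write a prompt for an AI to make this image. "
    "Just return the prompt, don't say anything else."
)


def build_instruction(modifier: str = "") -> str:
    """Append an optional steering phrase to the base instruction."""
    modifier = modifier.strip()
    if not modifier:
        return BASE_INSTRUCTION
    return f"{BASE_INSTRUCTION} {modifier}"


print(build_instruction("Make it funnier."))
print(build_instruction("Replace everything with corgis."))
```

Because the modifier is re-applied on every round, even a small bias compounds over ten iterations, which matches the runaway results people posted.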
| epivosism wrote:
| One reason this is good is that the default gpt4-vision UI is so
| insanely bad and slow. This just lets you use your capacity
| faster.
|
| Rate limits are really low by default - you can get hit by 5
| img/min limits, or 100 RPD (requests per day) which I think is
| actually implemented as requests per hour.
|
| This page has info on the rate limits:
| https://platform.openai.com/docs/guides/rate-limits/usage-ti...
|
| Basically, you have to have paid X amount to get into a new usage
| cap. Rate limits for dalle3/images don't go up very fast but it
| can't hurt to get over the various hurdles (5$, 50$, 100$) as
| soon as possible for when limits come down. End of the month is
| coming soon. It looks like most of the "RPD" limits go away when
| you hit tier 2 (having paid at least 50$ historically via API to
| them).
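
A client can stay under a limit like 5 images per minute by tracking recent call times itself. The sketch below is a generic sliding-window limiter, not anything OpenAI provides; the class and function names are hypothetical, and the 5-per-60-seconds figures are just the numbers mentioned in the comment above.

```python
import time
from collections import deque
from typing import Callable, Deque, TypeVar

T = TypeVar("T")


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_seconds span."""

    def __init__(self, max_calls: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        self.calls: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window_seconds:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window_seconds - (now - self.calls[0])

    def record(self) -> None:
        """Note that a call was just made."""
        self.calls.append(self.clock())


def call_with_limit(limiter: SlidingWindowLimiter, fn: Callable[[], T]) -> T:
    """Sleep if needed, then run fn and count it against the limit."""
    delay = limiter.wait_time()
    if delay > 0:
        time.sleep(delay)
    limiter.record()
    return fn()


# Example limit taken from the comment: 5 image requests per minute.
image_limiter = SlidingWindowLimiter(max_calls=5, window_seconds=60.0)
```

Wrapping each image request in `call_with_limit(image_limiter, ...)` would space out a long run instead of failing partway through; it does nothing for per-day caps, which still depend on the account's usage tier.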
| swyx wrote:
| OP's last one is interesting: https://dalle.party/?party=oxpeZKh5
| because it shows GPT4V and Dalle3 being remarkably race-blind. i
| wonder if you can prompt it to be otherwise...
| _fs wrote:
| OpenAI's internal prompt for dalle modifies all prompts to add
| diversity and remove requests to make groups of people a single
| descent. From
| https://github.com/spdustin/ChatGPT-AutoExpert/blob/main/_sy...
|
| Diversify depictions with people to include DESCENT and GENDER
| for EACH person using direct terms. Adjust only human
| descriptions.
|
| Your choices should be grounded in reality. For example, all of
| a given OCCUPATION should not be the same gender or race.
| Additionally, focus on creating diverse, inclusive, and
| exploratory scenes via the properties you choose during
| rewrites. Make choices that may be insightful or unique
| sometimes.
|
| Use all possible different DESCENTS with EQUAL probability. Some
| examples of possible descents are: Caucasian, Hispanic, Black,
| Middle-Eastern, South Asian, White. They should all have EQUAL
| probability.
|
| Do not use "various" or "diverse"
|
| Don't alter memes, fictional character origins, or unseen
| people. Maintain the original prompt's intent and prioritize
| quality.
|
| Do not create any imagery that would be offensive.
|
| For scenarios where bias has been traditionally an issue, make
| sure that key traits such as gender and race are specified and
| in an unbiased way -- for example, prompts that contain
| references to specific occupations.
| Terretta wrote:
| If you were wondering how to bump up your API rate limits through
| usage, _this is the way_.
|
| // also, it's the _best_ way - TY @z991
| indymike wrote:
| Interesting how similar this is to my family's favorite game:
| pictograph.
|
| 1. You start by describing a thing.
| 2. The next person draws a picture of it.
| 3. The next next person describes the picture.
|
| Repeat steps 2 and 3 until everyone has either drawn or described
| the picture.
|
| You then compare the first and last description... and look over
| the pictures. One of the best ever was:
|
| Draw a penguin. The first picture was a penguin with a light
| shadow.
|
| After going around five rounds, the final description was "a
| pigeon stabbed with a fork in a pool of blood in Chicago"
|
| I'm still trying to figure out how Chicago got in there.
___________________________________________________________________
(page generated 2023-11-27 23:00 UTC)