[HN Gopher] SuperPrompt: Better Text to Image Prompts in 77M Par...
___________________________________________________________________
SuperPrompt: Better Text to Image Prompts in 77M Parameters
Author : roborovskis
Score : 72 points
Date : 2024-03-14 16:35 UTC (6 hours ago)
(HTM) web link (brianfitzgerald.xyz)
(TXT) w3m dump (brianfitzgerald.xyz)
| ShamelessC wrote:
| Nice. I've been using GPT-4-turbo with a custom system prompt for
| this until now. Going to try this out.
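| For reference, what I've been doing looks roughly like this (the
| system prompt here is paraphrased, not my exact one):
|
|     # Sketch of prompt expansion via the OpenAI chat API.
|     # The system prompt wording is paraphrased.
|     from openai import OpenAI
|
|     client = OpenAI()  # reads OPENAI_API_KEY from the environment
|
|     def expand_prompt(prompt: str) -> str:
|         resp = client.chat.completions.create(
|             model="gpt-4-turbo-preview",
|             messages=[
|                 {"role": "system",
|                  "content": "Expand the user's text-to-image prompt "
|                             "with vivid visual detail. Reply with "
|                             "the expanded prompt only."},
|                 {"role": "user", "content": prompt},
|             ],
|         )
|         return resp.choices[0].message.content
|
|     print(expand_prompt("A rainbow penguin in a tuxedo"))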
| pstorm wrote:
| I'm surprised this isn't getting more love. I love the concept of
| finetuned, hyper-specific, tiny LLMs. Of course, the data is the
| most important part.
| roborovskis wrote:
| Thanks for the kind words! I started with the 780M param
| flan-t5-large model, and kept trying smaller and smaller base
| models - I was shocked at how good the output was at 77M. As
| you go smaller, though, it's much easier to accidentally
| overfit or collapse the model and produce gibberish. Had to be
| very careful with hyperparams and sanitizing / filtering the
| dataset.
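| If anyone wants to sanity-check the size, a quick way (assuming
| the hosted model id is roborovski/superprompt-v1; check the model
| card if it differs):
|
|     # Quick parameter count; the model id is assumed.
|     from transformers import T5ForConditionalGeneration
|
|     model = T5ForConditionalGeneration.from_pretrained(
|         "roborovski/superprompt-v1")
|     n = sum(p.numel() for p in model.parameters())
|     print(f"{n / 1e6:.0f}M parameters")  # ~77M (flan-t5-small sized)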
| vunderba wrote:
| This is neat, and something (a.k.a. text "expanders") that I
| imagine a lot of the commercial offerings (Midjourney, etc.) are
| using behind the scenes.
|
| This seems to be targeting SDXL workflows, but in my experience a
| lot of the custom checkpoints derived from SDXL can have widely
| divergent recommended prompting styles ranging from natural
| language to just a list of booru tags.
|
| So I'm guessing this is really only optimized for base SDXL, but
| I would be curious to see how well it worked on some of the more
| SOTA SDXL checkpoints such as juggernaut and unstable.
| roborovskis wrote:
| I haven't tested extensively with non-SDXL-based checkpoints,
| but there's nothing really SDXL-specific about the model. If
| you're using a fine-tune that's trained on booru-style tags, it
| probably won't work as well, but otherwise it should work just
| fine. And in that case, just fork the project and retune it on
| whatever prompting style your fine-tune responds to best :)
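| A rough sketch of what that retune could look like with
| Seq2SeqTrainer (the dataset path, column names, and
| hyperparameters below are placeholders, not what I used):
|
|     # Rough sketch: retune on (short prompt -> expanded prompt)
|     # pairs. Paths, columns, and hyperparameters are placeholders.
|     from datasets import load_dataset
|     from transformers import (DataCollatorForSeq2Seq, Seq2SeqTrainer,
|                               Seq2SeqTrainingArguments, T5Tokenizer,
|                               T5ForConditionalGeneration)
|
|     tokenizer = T5Tokenizer.from_pretrained("roborovski/superprompt-v1")
|     model = T5ForConditionalGeneration.from_pretrained(
|         "roborovski/superprompt-v1")
|
|     ds = load_dataset("json", data_files="my_prompt_pairs.jsonl")["train"]
|
|     def preprocess(batch):
|         enc = tokenizer(batch["prompt"], truncation=True, max_length=77)
|         labels = tokenizer(batch["expanded"], truncation=True,
|                            max_length=77)
|         enc["labels"] = labels["input_ids"]
|         return enc
|
|     ds = ds.map(preprocess, batched=True, remove_columns=ds.column_names)
|
|     args = Seq2SeqTrainingArguments(
|         output_dir="superprompt-retuned",
|         learning_rate=1e-4,  # small models collapse easily; keep it modest
|         num_train_epochs=1,
|         per_device_train_batch_size=16,
|     )
|     Seq2SeqTrainer(
|         model=model,
|         args=args,
|         train_dataset=ds,
|         data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
|         tokenizer=tokenizer,
|     ).train()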
| thorum wrote:
| It's impressive how well the T5 family of models has aged, even
| compared to newer LLM architectures.
| htrp wrote:
| encoder-decoder vs. decoder-only
| gregtc wrote:
| Great work! I'd recommend including the "max_length=77" parameter
| in your example. Also, the Hugging Face-hosted interface seems to
| be broken because of the tokenizer, and I think your website link
| on X is outdated.
| roborovskis wrote:
| will fix these, thanks for the heads up!
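| In the meantime, the example with max_length=77 applied (77 being
| the CLIP text encoders' token limit) should look roughly like
| this; the instruction prefix is approximate, see the model card:
|
|     # Usage example with max_length=77, as suggested above.
|     from transformers import T5Tokenizer, T5ForConditionalGeneration
|
|     tokenizer = T5Tokenizer.from_pretrained("roborovski/superprompt-v1")
|     model = T5ForConditionalGeneration.from_pretrained(
|         "roborovski/superprompt-v1")
|
|     text = ("Expand the following prompt to add more detail: "
|             "A rainbow penguin in a tuxedo")
|     input_ids = tokenizer(text, return_tensors="pt").input_ids
|     output = model.generate(input_ids, max_length=77)
|     print(tokenizer.decode(output[0], skip_special_tokens=True))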
| smcleod wrote:
| Awesome work! I'd love to see how this could be integrated with
| existing tools like InvokeAI.
| roborovskis wrote:
| As Invoke is open-source and already has transformers as a
| dependency, it should be pretty easy to add.
| lionkor wrote:
| > Left: DrawBench prompt "A rainbow penguin in a tuxedo". Right:
| SDXL output with SuperPrompt applied to the same input prompt.
|
| Neither is wearing a tuxedo.
| roborovskis wrote:
| Yup, the model will still forget details sometimes. This is a
| common issue with prompt upsampling methods, but I'm hoping to
| improve this with the next version.
| hanniabu wrote:
| I wonder how much of that could be due to "tuxedo penguin"
| being a thing
| Lerc wrote:
| Is the lack of training data the only thing preventing this
| approach from being applied to both positive and negative prompts
| together?
|
| What size dataset is actually needed? Does it need to be
| machine-generated, or can you get away with something smaller,
| perhaps crowdsourced?
| roborovskis wrote:
| You could definitely use this for upsampling negative prompts,
| though I haven't tested that much. In theory, future T2I models
| shouldn't need to be negatively prompted as much; I find it's
| better to focus on really high quality positive prompts, as
| that is closer to the captions the model was trained on.
|
| You can take a look at the dataset here:
| https://huggingface.co/datasets/roborovski/upsampled-prompts...
| Roughly 5k samples at a minimum were needed for the smaller
| models, filtered down from the 95k total generated.
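| The kind of filter I mean looks roughly like this (sketch only;
| the dataset id in the link above is truncated, so the id, column
| names, and thresholds below are all placeholders):
|
|     # Sketch of a length/degeneracy filter over the generated
|     # upsamples. Dataset id and column names are placeholders.
|     from datasets import load_dataset
|
|     ds = load_dataset("roborovski/upsampled-prompts")["train"]
|
|     def keep(row):
|         out = row["upsampled"]  # column name assumed
|         n_words = len(out.split())
|         return 20 < n_words < 120 and out.strip() != row["prompt"].strip()
|
|     filtered = ds.filter(keep)
|     print(f"{len(ds)} -> {len(filtered)} samples")  # ~95k -> ~5k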
| ultrasaurus wrote:
| I was reading a blog today[1] that was pretty confident that
| "continual orders-of-magnitude increases in compute usage [by AI]
| will utterly drown any changes in efficiency", but this is just
| one of a million ways we can make AI more efficient. It doesn't
| seem like a foregone conclusion that the costs will get orders of
| magnitude more expensive on every axis.
|
| 1: Paywalled: https://www.noahpinion.blog/p/three-threats-to-the-
| age-of-en...
___________________________________________________________________
(page generated 2024-03-14 23:00 UTC)