[HN Gopher] Stable Diffusion Text-Prompt-Based Inpainting - Repl...
___________________________________________________________________
Stable Diffusion Text-Prompt-Based Inpainting - Replace Hair,
Fashion
Author : amrrs
Score : 59 points
Date : 2022-09-19 20:03 UTC (2 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| bryced wrote:
| This Python library does the same thing (but didn't get traction
| when I posted it yesterday):
|
| https://github.com/brycedrennan/imaginAIry#automated-replace...
|
| https://news.ycombinator.com/item?id=32887385
|
| And I got the idea from here:
|
| https://github.com/ThereforeGames/txt2mask
|
| Which is using the model here:
|
| https://github.com/timojl/clipseg
|
| Clipseg is doing the hard part!
| stavros wrote:
| This looks great! I've been looking for a Python library to use
| with Phantasmagoria[1] for ages, but everyone is doing web UIs.
| You even packaged it up in a Docker container, very nice, thank
| you!
|
| [1]: https://phantasmagoria.stavros.io
| fariszr wrote:
| The progress in the AI space is absolutely astounding.
|
| In less than a year, we went from no AI photo generation (from
| prompts) to DALL-E 2, a commercial service; then competitors
| started popping up, like Midjourney; and now we have Stable
| Diffusion, a source-available AI you can run yourself,
| unlocking implementations like this.
|
| There are other companies now hyping AI video generation, like
| Runway [1].
|
| [1] https://twitter.com/runwayml/status/1568220303808991232
| d3ckard wrote:
| Totally disagree. The whole AI business space seems totally
| focused on pushing the boundaries of what is possible while
| completely ignoring delivering something consistently useful. I
| played a bit with image generation recently and most results
| were abysmal. Sure, it can create great things, and prompt
| hacking will be a thing for a while. It's still very far from
| "for each prompt I get a working (as in not broken with
| artifacts) and matching image". IMO business usability depends
| mostly on the average case, and that hasn't impressed me yet.
|
| The elephant in the room is the "black box" nature of all
| neural networks. They are not interpretable to humans, so we
| cannot know when they will royally screw up. That means that
| unless we keep a human in the loop, it's hard to really
| integrate them into anything critical. And keeping humans out
| of the loop is what most AI companies promised as an end goal.
|
| Basically, I am bracing for the incoming AI winter. Not because
| no great progress has been made, but because what was promised
| will never be delivered (as has been the case previous times).
| At the same time, AI is here to stay, and AI-based tools are
| going to become commonplace. It will just be less of a big deal
| than everybody expected.
| joyfylbanana wrote:
| It depends on how you measure "progress". For you, progress
| seems to be only about business. I think the majority of
| people (including me) are just delighted about this new toy
| that brings them joy. In my life, it is great progress if I
| get innovative new toys that weren't available before.
| d3ckard wrote:
| How did you come to the conclusion that I think progress is
| mostly about business?
|
| My point is that the promises that funded the current wave of
| AI craziness do not seem to be getting fulfilled. At the
| precise moment that becomes obvious to everybody, funding will
| stop and move to the next horse, whatever it will be.
|
| It seems that autonomous driving got stuck on "driver must
| be ready to take control". If that doesn't change, Uber is
| just a glorified taxi corporation. The Tesla car revolution
| ain't happening if my car can't drive me to a spot without
| assistance (allowing me to sleep or be drunk or whatever).
| They become just another car company with a head start on
| electric.
|
| And the rest of the AI industry seems to follow the same
| pattern - great results (cars can mostly drive themselves
| nowadays; that's insane!), but always a notch less than
| expected. Because what was expected was human replacements,
| and what we got is human augmenters. It's probably better for
| humanity, as productivity will rise, but humans won't be cut
| from the loop. I just don't think this particular result is
| what Big Money had in mind when they poured money into it.
| abeppu wrote:
| I think image generation is an interesting case, because even
| if a human is always in the loop, and you need to try several
| times before you get a good image for your prompt of
| interest, that's likely still faster and cheaper than
| photoshopping exactly what you want (or certainly faster than
| hiring an illustrator). And the images produced are sometimes
| really quite good. A model which produces some amount of
| really messed up images can still be 'useful'.
|
| _However_ the kinds of failures it makes highlight that these
| models still lack basic background knowledge. I'm willing to
| let the stuff about compositionality slide -- that's asking
| kind of a lot. But I do draw a very straight line from
| DeepDream in 2015 producing worm-dogs with too many legs and
| eyes, through StyleGAN artifacts where the physical
| relationship between a person's face or clothing and their
| surroundings was messed up, to the freakish broken bodies
| that Stable Diffusion sometimes creates. Knowing about the
| structure of
| _images_ only tells you a limited amount about the structure
| of things that occupy 3d space, apparently. It knows what a
| John Singer Sargent portrait looks like, but it's not totally
| sure that humans have the same number of arms when they're
| hugging as when they're not.
|
| In the same way, large language models know what text looks
| like, but not facticity.
|
| So I don't know that an AI winter is called for. But maybe we
| should lean away from the AI optimism that we can keep
| getting better models by training against the kinds of data
| that are easiest to scrape?
| seydor wrote:
| That's great - it means people can keep 'playing' and
| innovating before regulation and greedy people join in.
| drusepth wrote:
| > Totally disagree. The whole AI business space seems totally
| focused on pushing the boundaries of what is possible while
| completely ignoring delivering something consistently useful.
|
| Interestingly, Midjourney is taking an approach you might be
| interested in, where they're fine-tuning their model to
| prioritize consistent, visually-appealing outputs with even
| the most vague prompts (e.g. "a man").
|
| And... it's really making me appreciate its competitors more.
| This always-good-enough consistency is very much a double-
| edged sword, IMO, because it also results in a very same-y
| feel for most Midjourney images (and kind of makes me
| appreciate instantly-recognizable MJ images a little less, in
| a way not unlike how I used to be impressed by starry-sky
| spraypaint pieces and then realized they're basically SP101).
| You almost always get something good out (at a rate I'd feel
| comfortable wrapping a production-quality app around) but
| it's become harder and harder to produce _new_ visuals
| /aesthetics as Midjourney has progressed closer to their
| desired consistency levels.
|
| Back when I started on it, I'd get interesting images every
| 5-10 generations that I'd then tweak and get even more
| interesting images. Now I'm lucky to see something
| new/interesting every 5-10 generations, although everything
| in between is _fine_.
|
| My background here, FWIW: according to the site, I've been
| using Midjourney for 4 months straight and generated almost
| 10,000 images. I also have ~700GB of generations on disks
| from other models in the meantime and run a few sites that
| basically do wrap these kind of generation models, like
| novelgens.com, that try to find a good ratio between
| consistency and divergence.
|
| In the grand scheme of things, I think the AI generation
| space needs both ends of the spectrum: consistent results
| like Midjourney lower the barrier of entry for new people to
| explore the space, but prompt-dependent powerhouses like
| Stable Diffusion enable artists to push the tooling further
| and have significantly better control over the art they're
| trying to create.
| paulgb wrote:
| What's being delivered _is_ useful. I agree that you still
| need a human in the loop, but that's true of any creative
| tool -- having Adobe Illustrator doesn't make me an artist.
| The current generation of tools has made certain design tasks
| easier; the main thing still missing is not ML advances so
| much as nice UIs that put them in the hands of creative
| professionals.
| MikeYasnev007 wrote:
| Whether to share masks and open-source the code is an arguable
| question. I definitely wouldn't share anything outside of a
| commercial company.
| MikeYasnev007 wrote:
| As for sharing more interesting use cases, like shooting a
| nuclear station: it's too simple a technique, but I will check
| it out as well. Thank you.
| seydor wrote:
| > Enhance 34 to 36. Pan right and pull back. Stop. Enhance 34 to
| 46. Pull back. Wait a minute, go right, stop. Enhance 57 to 19.
| Track 45 left. Stop. Enhance 15 to 23. Give me a hard copy right
| there.
| londons_explore wrote:
| Why does the mask need to be binary?
|
| Surely it's possible to have a full alpha mask, such that 50%
| alpha means "push the diffusion process towards this value, but
| don't force it to generate this value".
| bryced wrote:
| The "alpha" is already built into the tooling and specified
| independently of the mask, i.e. the Stable Diffusion
| inpainting takes hints from what you leave and "decides" what
| to keep.
| phire wrote:
| Surely you don't need to?
|
| Just over-expand the mask and let stable diffusion decide what
| it's going to keep and what it's going to replace.
___________________________________________________________________
(page generated 2022-09-19 23:00 UTC)