[HN Gopher] Opendream: A layer-based UI for Stable Diffusion
___________________________________________________________________
Opendream: A layer-based UI for Stable Diffusion
Author : varunshenoy
Score : 250 points
Date : 2023-08-15 17:38 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| tavavex wrote:
| Very exciting. The "first-generation" Stable Diffusion frontends
| seem to have settled on a specific design philosophy, so it's
| interesting to see new tools (like this or ComfyUI) shake up the
| way people work with this technology. I hope that in a few
| years, we'll know which philosophy works best.
| bavell wrote:
| I wrote a TypeScript API generator for ComfyUI that works great
| - hopefully I'll have time to release it soon.
|
| I think there's so much unexplored potential in UI and
| workflows around generative AI, we've barely scratched the
| surface. Very exciting times ahead!
| ssalka wrote:
| I bet this will be available as an Automatic1111 extension by
| end of month.
| TillE wrote:
| Out of all the AI-related tools, generative art frontends are
| probably the thing most likely to radically change and improve
| in the next few years.
|
| It's specifically why I've avoided diving too deep into "prompt
| engineering", because the kind of incantations required today
| just aren't going to be the way most people interact with this
| stuff for very long.
| orbital-decay wrote:
| _> Out of all the AI-related tools, generative art frontends
| are probably the thing most likely to radically change and
| improve in the next few years._
|
| The difference between UIs is actually not very relevant
| today; by now the generic workflow for complex scenes is more
| or less obvious to anyone who has spent time with SD.
|
| - Draw basic composition guides. Use them with controlnets or
| any other generic guidance method to enforce the environment
| composition you want. Train your own controlnet if you need
| something specific. (lots of untapped potential here)
|
| - Finetune the checkpoint on your reference pictures or use
| other style transfer methods to enforce a consistent style.
|
| - Use manual brush masking, manually guided segmentation (e.g.
| SAM), or prompted segmentation (e.g. CLIPSeg) to select the
| parts to be replaced with other objects. The choice depends on
| your use case and whether you need to do it procedurally.
|
| - Photobash and add detail to the elements of your scene using
| any composition methods you have (noisy latent composition,
| inpainting, etc.) with the masks you created in the previous
| step. Use advanced guidance (controlnets, t2i adapters, etc.).
|
| - Don't bother with any prompts beyond very basic
| descriptions, as "prompt engineering" is slow and unreliable.
| Don't overwhelm the model by trying to fit lots of detail in
| one pass; use separate passes for separate objects or
| regions.
|
| - Alternative 3D version: build a primitive 3D scene from
| basic props (shapes, rigs). Render the backdrop and separate
| objects into separate layers as guides. Use them with
| controlnets & co to render the scene in a guided manner,
| combining the objects by latent composition, inpainting, or
| any other means. This can be used for procedural scenes and
| animation (although current models lack temporal stability).
|
| As long as your tool has all that in one place, it's a breeze,
| regardless of the UI paradigm (admittedly auto1111's overloaded
| Gradio looks straight out of a trash compactor nowadays). I
| expect 2D/3D software integrations to be the most successful in
| the future, as they already offer proven UIs and most of the
| desirable side features. A rough diffusers sketch of the
| guided-pass-plus-inpaint steps is below.
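|
| A minimal sketch of the guided pass plus the masked inpaint with
| HF diffusers (model IDs, prompts, and the guide/mask files here
| are placeholders, not anything a specific tool ships):
|
|     import torch
|     from PIL import Image
|     from diffusers import (
|         ControlNetModel,
|         StableDiffusionControlNetPipeline,
|         StableDiffusionInpaintPipeline,
|     )
|
|     # Pass 1: enforce composition with a canny/scribble guide.
|     controlnet = ControlNetModel.from_pretrained(
|         "lllyasviel/sd-controlnet-canny",
|         torch_dtype=torch.float16)
|     pipe = StableDiffusionControlNetPipeline.from_pretrained(
|         "runwayml/stable-diffusion-v1-5",
|         controlnet=controlnet,
|         torch_dtype=torch.float16).to("cuda")
|     guide = Image.open("composition_guide.png")
|     base = pipe("a coastal village, overcast",
|                 image=guide).images[0]
|
|     # Pass 2: repaint one region with a mask produced by SAM,
|     # CLIPSeg, or a manual brush (white = repaint).
|     inpaint = StableDiffusionInpaintPipeline.from_pretrained(
|         "runwayml/stable-diffusion-inpainting",
|         torch_dtype=torch.float16).to("cuda")
|     mask = Image.open("lighthouse_mask.png")
|     result = inpaint("a red lighthouse", image=base,
|                      mask_image=mask).images[0]
|     result.save("composed.png")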
| bobboies wrote:
| Incantations are fun!
| greggsy wrote:
| It's entirely likely that there's much more effort going into
| generative text - any perceived advancement of generative
| images is going to be disproportionately skewed due to the
| richness of information that images hold.
| toenail wrote:
| First thoughts: how do I bind to an IP, and where can I install
| models?
| smallerfish wrote:
| Slap a virtualenv setup into that install script, please. A
| system-wide pip install is a bad pattern.
| [deleted]
| varunshenoy wrote:
| done :)
| noman-land wrote:
| Now that's agile.
| adventured wrote:
| Not a bad start. One quick suggestion: avoid the temptation to
| make it overly complex.
|
| Stable Diffusion needs to go out to the masses to a greater
| degree. The unnecessary garbage complexity (e.g. Comfy's
| ridiculous noodlescape) that developers keep building into
| their UIs is holding Stable Diffusion back significantly from
| greater mass adoption.
| bavell wrote:
| Node-based workflows with little DRY capability (i.e. ComfyUI)
| do get painful as the workflow grows. That said, an HTTP server
| capable of executing ML DAGs is extremely useful and a great
| building block for other tools and UIs to be built upon.
|
| I wrote a TypeScript API generator for ComfyUI recently, and
| having programmatic access to build and send execution graphs
| is a game changer. Hoping to have time to release it soon. The
| same can easily be done for any other language. Exciting stuff!
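|
| Even without a generated client, the raw HTTP call is simple. A
| rough sketch (default port 8188; the graph is truncated and the
| node ids / class names are only illustrative):
|
|     import json
|     import urllib.request
|
|     # ComfyUI executes a JSON graph: each node declares a
|     # class_type and wires its inputs to other nodes' outputs.
|     graph = {
|         "1": {"class_type": "CheckpointLoaderSimple",
|               "inputs": {"ckpt_name": "sd_v1-5.safetensors"}},
|         "2": {"class_type": "CLIPTextEncode",
|               "inputs": {"text": "a watercolor fox",
|                          "clip": ["1", 1]}},
|         # ... KSampler, VAEDecode, SaveImage nodes omitted ...
|     }
|
|     req = urllib.request.Request(
|         "http://127.0.0.1:8188/prompt",
|         data=json.dumps({"prompt": graph}).encode("utf-8"),
|         headers={"Content-Type": "application/json"})
|     print(urllib.request.urlopen(req).read())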
| cwkoss wrote:
| Very cool. It would be interesting to train a model on images
| with alpha channels so outputs would be automatically masked
| and more easily composable. But maybe masking is so good these
| days that it would be futile?
|
| When a user does img2img on a layer, does it use the context
| from other visible layers in the generation?
| mdp2021 wrote:
| > _Would be interesting to train a model on images with alpha
| channels_
|
| It would be even more interesting to get an intermediate ANN
| stage holding an ontology of the represented content, so that
| individual items can be changed.
|
| An internal representation of qualified, structured items in
| space as part of the chain: prompt > accessible internal
| representation > render.
| dheera wrote:
| For composing, this approach works pretty well; maybe the
| author should consider making a UI for it:
|
| https://multidiffusion.github.io/
| mottiden wrote:
| Thanks for posting. Really interesting
| Zetobal wrote:
| Segmentation is solved... https://github.com/RockeyCoss/Prompt-
| Segment-Anything
| michaelt wrote:
| Segment Anything is neat, but segmentation is far from
| solved.
|
| If the user generates a picture of a horse and rider to add
| onto another composition - they probably want to include the
| saddle.
| GaggiX wrote:
| SAM can also be conditioned on points: if it's ambiguous what
| you want to mask, you can add a point on the saddle and the
| model will include it without a problem. Segmentation is
| pretty much solved; I agree with the parent post.
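|
| A minimal sketch of that point-prompted masking with the
| segment_anything package (checkpoint path and click coordinates
| are placeholders):
|
|     import cv2
|     import numpy as np
|     from segment_anything import sam_model_registry, SamPredictor
|
|     sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
|     predictor = SamPredictor(sam)
|
|     image = cv2.cvtColor(cv2.imread("horse_and_rider.png"),
|                          cv2.COLOR_BGR2RGB)
|     predictor.set_image(image)
|
|     # Two positive clicks: one on the horse, one on the saddle.
|     points = np.array([[420, 310], [455, 250]])
|     labels = np.array([1, 1])  # 1 = include, 0 = exclude
|     masks, scores, _ = predictor.predict(
|         point_coords=points, point_labels=labels,
|         multimask_output=False)
|     cv2.imwrite("mask.png", (masks[0] * 255).astype(np.uint8))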
| bavell wrote:
| IME I haven't gotten great results using SAM; maybe it was
| just the images I was using? They weren't great quality, and
| it seemed to struggle with low-contrast areas.
| brianjking wrote:
| Is it possible to add SDXL support to this?
|
| I'd love a colab notebook if anyone has the skill and time to do
| so.
| varunshenoy wrote:
| If anyone wants to add SDXL support, all you have to do is
| create a new extension with the correct SDXL logic (loading
| from HF diffusers, etc.). You could parameterize
| `num_inference_steps`, for example, to delegate decisions to
| the user of the extension.
|
| If anyone gets to making one before me, please leave a PR!
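|
| The diffusers side of that would look roughly like this; the
| surrounding Opendream extension wrapper isn't shown, and the
| function name here is just a placeholder:
|
|     import torch
|     from diffusers import StableDiffusionXLPipeline
|
|     pipe = StableDiffusionXLPipeline.from_pretrained(
|         "stabilityai/stable-diffusion-xl-base-1.0",
|         torch_dtype=torch.float16, variant="fp16",
|         use_safetensors=True).to("cuda")
|
|     def sdxl_txt2img(prompt, num_inference_steps=30):
|         # num_inference_steps is exposed so the extension user
|         # can trade speed for quality, as suggested above.
|         return pipe(
|             prompt,
|             num_inference_steps=num_inference_steps).images[0]
|
|     sdxl_txt2img("an isometric cottage").save("sdxl_out.png")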
| smrtinsert wrote:
| There are great articles on how layered UIs are a lot easier
| to use than node-based UIs. Really excited to see a layered
| approach to SD. It's definitely time to break out of Gradio.
| TeMPOraL wrote:
| Maybe if they're talking about layered UIs with layer groups,
| which turn a flat stack into something resembling a tree. But
| even these UIs don't give you proper non-destructive editing -
| anything more complex requires you to duplicate parts of the
| layer stack to feed as inputs, which is a destructive operation
| with respect to structure (those pasted layers won't update if
| you make changes to the copied source). Doing this properly
| requires a DAG, at which point you're at node-based UIs (or
| some idiosyncratic mess of a UI that pretends it's not
| modelling a DAG).
|
| It's all moot though, because as far as I know, there is no
| proper 2D graphics editing software that uses DAGs and nodes.
| Everyone just copies Photoshop. Especially Affinity, which is
| grating, given their recent focus on non-destructive editing.
| For some reason, node-based UIs ended up being a mainstay of
| VFX, 3D graphics, and VFX & gamedev support tooling. But
| general 2D graphics - photo editing, raster and vector
| creation? Nodes are surprisingly absent.
| gatane wrote:
| Is this related to Melondream?
| __loam wrote:
| [flagged]
| dang wrote:
| Maybe so, but please don't post unsubstantive comments to
| Hacker News.
| visarga wrote:
| Why, did you lose your art because of AI?
| valine wrote:
| Theft takes the original, piracy makes a copy, AI art remixes
| the original. I'm not sure how to classify AI art but it
| definitely isn't theft.
| yieldcrv wrote:
| Derivative works with zero copyright protection due to the
| predominance of machine assists
|
| No way to quantify though, for or against copy protection
|
| But that's a convenient compromise for now
| slowmovintarget wrote:
| So piracy may be involved in training the model, but the rest
| does not follow.
|
| Art inspired by other art has been the way of things for as
| long as we've been creating images. There's no such thing as
| a "clean-room painting."
| __loam wrote:
| Using unlicensed copies of other people's work in training
| is the problem, along with what that does to the market for
| original works. Using people's labor for AI training
| without permission or compensation will discourage people
| from sharing that work and ultimately make the AI models
| worse too.
| joemi wrote:
| Doesn't this entirely depend both on what it's been trained on
| and what style is being output? But also, philosophically, is
| it even "theft" to make something in-the-style-of someone?
|
| I believe these questions and their complex answers are the
| reason you've been downvoted.
| coding123 wrote:
| I understand why the person was downvoted, but not why the
| person was flagged. It doesn't make sense for someone to flag
| "AI art is theft."
|
| Downvotes because you didn't back up what you meant.
|
| Flagged because there are AI fanboys that want to censor
| speech, perhaps?
| stale2002 wrote:
| I would say that it absolutely deserved to be flagged because
| it was a comment with little substantive engagement.
|
| It isn't directly related to the original post, and it didn't
| even make any particular argument. It was just a five-word
| declaration of fact that is borderline off-topic.
| cercatrova wrote:
| Flagged because it was an unsubstantive comment as dang
| mentioned and also that it's increasingly a flame bait
| topic on HN, same as with Copilot and licensing.
| mcclux wrote:
| "This pixel right here officer; clearly stolen."
| [deleted]
| tomalaci wrote:
| I haven't followed diffusion image generation development for a
| while. Where do you find information on what models you can use
| in the model_ckpt field? Do I need to import them from somewhere?
| What are the main differences between them and which are more
| modern or better?
| nickstinemates wrote:
| You can find them on Hugging Face, or you can reverse engineer
| which ckpt you want to use based on an image you've seen
| generated (like at majin[1] - beware, there's a lot of
| NSFW/controversial stuff there).
|
| 1: https://majinai.art/
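|
| Once you have a .safetensors/.ckpt file downloaded, it can
| typically be loaded with diffusers roughly like this (the
| filename is a placeholder; older diffusers versions used
| from_ckpt instead of from_single_file):
|
|     import torch
|     from diffusers import StableDiffusionPipeline
|
|     pipe = StableDiffusionPipeline.from_single_file(
|         "downloads/some_finetune.safetensors",
|         torch_dtype=torch.float16).to("cuda")
|     pipe("a misty forest at dawn").images[0].save("out.png")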
| bavell wrote:
| Also CivitAI but beware the NSFW
|
| https://civitai.com/
| CSSer wrote:
| Some of this is straight up soft-core child porn. This is
| fucked up.
| greggsy wrote:
| I believe illustrations have been deemed to be abuse
| material, so I wouldn't be surprised if LE have started
| looking into it.
| GaggiX wrote:
| Illustrations are not a problem under the law in the United
| States, but it remains to be seen for generated images that
| are indistinguishable, or almost indistinguishable, from
| reality.
| kleiba wrote:
| Who exactly is being abused here?
|
| I for one would much rather give pedophiles an
| opportunity to fulfil their sexual desires through AI-
| generated pictures than real ones.
|
| Of course, we can talk about the training material. Are
| there actual child porn images in there? I seriously
| doubt it but who knows?
|
| And perhaps a case could be made that AI-generated child
| porn could be a gateway to invite people who then seek
| out non-generated material.
|
| But I think these are separate discussions to be had.
| CSSer wrote:
| Geez that's disturbing. I clicked having no qualms with
| nudes, artistic or otherwise. I'm not a prude. I've seen
| my fair share of anime girls and AI nudes. Hell, I was
| raised on the internet before parental settings were a
| thing, but I didn't expect that. It's so gross how it
| toes a line too.
| dingnuts wrote:
| the Fediverse has a big problem with this, too, and I
| never hear anyone talking seriously about it
| ryukoposting wrote:
| If it can handle LoRAs, I'll be sure to try it out this weekend.
| Hamcha wrote:
| What's up with names nowadays? Not only is there already an
| OpenDream[1] on GitHub, there's also a Stable Diffusion
| service called OpenDream[2]!
|
| 1. https://github.com/OpenDreamProject/OpenDream 2.
| https://opendream.ai/
| [deleted]
| antman wrote:
| Can you add a layer with e.g. an image of yourself?
| ttul wrote:
| Pretty sure you can do this. Diffusion models by default start
| with noise, but you can start with any data, including an
| existing image. For instance, you could import a photo of
| yourself, mask the eyes and then ask the model to make them
| green.
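|
| A small sketch of that with diffusers img2img (strength controls
| how strongly the photo is re-noised; a targeted edit like the
| eyes would use the inpaint pipeline with a mask_image instead):
|
|     import torch
|     from PIL import Image
|     from diffusers import StableDiffusionImg2ImgPipeline
|
|     pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
|         "runwayml/stable-diffusion-v1-5",
|         torch_dtype=torch.float16).to("cuda")
|
|     photo = Image.open("me.png").convert("RGB")
|     # Lower strength keeps more of the original photo.
|     out = pipe("portrait photo, green eyes", image=photo,
|                strength=0.4).images[0]
|     out.save("me_green_eyes.png")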
| asynchronous wrote:
| Very cool honestly, seems like a much-needed improvement over
| Automatic. Does it support LoRA / will it in the near future?
___________________________________________________________________
(page generated 2023-08-15 23:00 UTC)