[HN Gopher] Show HN: New AI edits images based on text instructions
___________________________________________________________________
Show HN: New AI edits images based on text instructions
This works surprisingly well. Just give it instructions like "make
it winter" or "remove the cars" and the photo is altered. Here are
some examples of transformations it can make: Golden gate bridge:
https://raw.githubusercontent.com/brycedrennan/imaginAIry/ma...
Girl with a pearl earring:
https://raw.githubusercontent.com/brycedrennan/imaginAIry/ma... I
integrated this new InstructPix2Pix model into imaginAIry (python
library) so it's easy to use for python developers.
Author : bryced
Score : 995 points
Date : 2023-01-22 04:25 UTC (1 day ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| sfpotter wrote:
| These look awful! They are very displeasing aesthetically. They
| look like they were done by someone with absolutely no artistic
| ability. Clearly there is some technical interest here, but I
| just felt the need to point out the elephant in the room. They
| are _very ugly_.
| aflag wrote:
  | I'm not an artist, but they look fine to me. I am not the kind
  | of person who spends hours in a gallery admiring the nuances
  | of paintings or photos. However, at the level of detail at
  | which I usually admire these things, the clown one looked
  | interesting and the Mona Lisa one was funny. The strawberry
  | seemed a bit weird, but I don't think I'd care for it even if
  | it were perfect anyway. I thought the wintry landscape was
  | pretty good, and the red dog one delivered what was asked. Not
  | sure how it could be much different from that.
| sfpotter wrote:
| "I'm not qualified to have a nuanced opinion about this, but
| let me confidently tell you what I think..."
| aflag wrote:
      | Well, most people who consume art are not professional
      | artists. That was my main point. From the point of view
      | of a layperson (such as myself), it looks pretty good.
| sfpotter wrote:
| I guess we should lower our standards, then...
| sh4rks wrote:
| I love the cognitive dissonance between "you're just stealing
| people's art and modifying them slightly!" versus "AI art sucks
| and has no artistic value"
| sfpotter wrote:
| I mean, the two are obviously _not_ mutually exclusive.
| xwdv wrote:
| How about "fix the hands"?
| social_quotient wrote:
| The hands issue is going to be an awesome story for all of us
| in 10-20 years. The younger generation just won't fathom how
  | hard it was to get proper hands. I wonder what today's
  | parallel would be? Something where the slightly technical
  | general public just can't wrap their heads around why it was
  | complicated "back then".
  |
  | Maybe today's example is a smart voice assistant like Alexa.
| xwdv wrote:
| Or maybe it will never be fixed, and in the future when they
| are trying to determine if someone is a human or an
| artificial replica, they will simply ask them to draw a set
| of human hands as a test.
| mensetmanusman wrote:
| Someone needs to redo the blade runner scene with SD with
| the hands question :)
| AlecSchueler wrote:
| Most humans would struggle to draw human hands as well.
| lgas wrote:
  | Well, it already uses this default negative prompt
| https://github.com/brycedrennan/imaginAIry/blob/master/imagi...
| so it may fix the hands automatically.
| Uehreka wrote:
| It's a little premature, fine, but I want to start liquidating my
| rhetorical swaps here: I've been saying since last summer
| (sometimes on HN, sometimes elsewhere) that "prompt engineering"
| is BS and that in a world where AI gets better and better,
| expecting to develop lasting competency in an area of AI-adjacent
| performance (a.k.a. telling an AI what to do in exactly the right
| way to get the right result) is akin to expecting to develop a
| long-lasting business around hand-cranking people's cars for them
| when they fail to start.
|
| Like, come on. We're now seeing AIs take on tasks many people
| thought would never be doable by machine. And granted, many
| people (myself included to some extent) have adjusted their
| priors properly. And yet so many people act like AI is going to
| stall in its current lane and leave room for human work as
| opposed to developing orders of magnitude better intelligence
| and obliterating all of its current flaws.
| chadcmulligan wrote:
| Haven't we been here before? - see self driving cars.
| cududa wrote:
| No.
| kristiandupont wrote:
    | I've certainly seen this argument before...
|
    | Yes, it's true that not all technology evolves as fast as
    | predicted (by some, at some point). But first of all, I
    | still believe we will see self-driving cars in the future;
    | and secondly, it's one counter-example in a forest of
    | examples of tech that evolves beyond anyone's expectations.
    | I don't find it very convincing.
| trompetenaccoun wrote:
      | Autonomous driving as it currently exists came
      | unexpectedly for most people. Now many look at it with
      | the power of hindsight, but back in the day the majority
      | never thought we'd have cars (partly) driving on their
      | own within a few years. The case of AI art seems exactly
      | the same to me: now that many are working on it there's
      | lots of progress, but it's still nowhere near what an
      | experienced human can do. And that seems to be the
      | general rule, not an exception.
|
| We might need to create real intelligence for that to
| become true. A machine that can think and is aware of its
| purpose.
| c7b wrote:
    | LLMs and image AIs are the opposite of self-driving cars.
    | "Everybody" has had concrete expectations for at least half
    | a decade now that the moment when self-driving cars would
    | surpass human ability was imminent, yet the tech hasn't
    | lived up to it (yet). Meanwhile, practically nobody was
    | expecting AI to be able to do the jobs of artists,
    | programmers or poets anywhere near human level anytime
    | soon, yet here we are.
| Der_Einzige wrote:
| Still bad at poetry due to the tokenizer though. I wrote a
| whole paper on how to fix it:
| https://paperswithcode.com/paper/most-language-models-can-
| be...
| c7b wrote:
        | Great work, congratulations! One question: if I
        | understood it right, you based your demo on GPT-2.
        | What is your experience working with those open-source
        | language AIs, in terms of computational requirements
        | and performance?
|
| I'm really fascinated by all the tools the OS community
| is building based on StableDiffusion (like OP's), which
| compares favourably with the latest closed-source models
| like Dall-E or Midjourney, and can run reasonably well on
| a high-end home computer or a very reasonably-sized cloud
| instance. For language models, it seems the requirements
| are substantially higher, and it's hard to match the
| latest GPT versions in terms of quality.
| yyyk wrote:
| If LLMs (etc.) had the same requirements and business
| models as AV cars they'd still be considered a failure.
| Nobody expects Stable Diffusion to have a 6-sigma accuracy
| rate*, nor do we expect ChatGPT to seamlessly integrate
| into a human community. The AV business model discourages
| individual or small scale participation, so we wouldn't
| even have SD (would anyone allow a single OSS developer to
| drive or develop an AV car? Ok, there's Comma, that's all
| there is on the OSS side).
|
      | * The number of times that I've seen an 'impressive'
      | selection of AI images that I consider a critical failure
      | deserves its own word. The AIs _are_ impressive for even
      | getting that far, it's just that some people have bad
      | taste and pick the bad outputs.
| [deleted]
| js8 wrote:
| Prompt engineering already exists, it's called management.
| xdfgh1112 wrote:
| I think that "prompt engineering" stuff went away when ChatGPT
| came out.
| c7b wrote:
| Has it? I mean, maybe the idea of people doing this as a
| long-time career has, but practically, I still find it a
| challenge to get those AIs to do exactly what I want. I've
| played around with Dreambooth-style extensions now, and that
| goes some way for some applications, and I'm excited to try
| OP's solution, but in my experience, it is still a bit of a
| limitation for working with those AIs _right now_.
| rjh29 wrote:
| Oh yeah it's definitely still an issue right now! But I
| think the power of ChatGPT's ability to understand and
| execute instructions has convinced most people that "prompt
| engineering" isn't going to be a career path in the future.
| c7b wrote:
| Absolutely. I briefly thought about asking ChatGPT to
| write a prompt, but then I remembered that the training
| corpus is probably older than those tools (I heard that
| if you ask it the right way, it will tell you that its
        | corpus ended in '21 - whether it's true or not, it
        | sounds plausible). But that's a truly temporary issue;
        | the
| respective subreddits probably have enough information to
| train an AI for prompt engineering already (if you start
| from a strong foundation like the latest GPT versions).
|
| Plus, who knows whether future models won't be able to
| integrate those different modes much better (along those
| lines https://www.deepmind.com/publications/a-generalist-
| agent).
| rjh29 wrote:
          | In the near future you can totally imagine a dialogue
          | like the one you'd have with a real designer: "can
          | you make it pop a bit more?" or "can you move that
          | logo to the right side?". It might take some trial
          | and error, but it's only going to improve.
|
| Making the AI truly creative (which means going beyond
| what the client asks for, towards things the client
| doesn't even know they want) would be a much larger leap
| and potentially take a lot longer.
| TeMPOraL wrote:
| I don't get it. Pre-ChatGPT prompt engineering was a BS
| exercise in guessing how a given model's front-end
          | tokenizes and processes the prompt. ChatGPT made it
          | only
| more BS. But I've seen a paper the other day,
| implementing more structured, formal prompt language,
| with working logic operators implemented one layer below
| - instead of adding more cleverly structured English,
| they were stepping the language model with variations of
| the prompt (as determined by the operators), and did math
| on probability distributions of next tokens the model
          | returned. That, to me, sounds like a valid, non-BS
          | approach, and strictly better than doubling down on
| natural language.
| c7b wrote:
| Think about the problem in an end-to-end fashion: the
| user has an idea of what sort of image they want, they
| just need an interface to tell the machine. A combination
| of natural language plus optional image/video input is
| probably the most intuitive interface we can provide (at
| least until we've made far more progress on reading brain
| signals more directly).
|
            | How exactly we get there, by adding layers on top
            | (like language models) or adding layers below
            | (like what you described), doesn't seem like such
            | a fundamental
| difference. It's engineering, you try different
| approaches, vary your parameters and see what works best.
| And from the onset, natural language does seem like a
| good candidate for encoding nuances like "make it pink,
| but not cheesy" or "has the vibes of a 50's Soviet
| propaganda poster, but with friendlier colors".
| fassssst wrote:
| Large language models are stateless. The apps and fine-tuned
| models are doing prompt engineering on users' behalf. It's very
| much a thing for developers, with the goal of making it
| invisible for end users.
| felipeerias wrote:
| IMHO it stems from lack of imagination. Impressive as the
| results may sometimes be, the user interfaces for AI are still
| extremely crude.
|
| Soon we will see AI being used to define semantic operations on
| images that are hard to define exactly (imagine a knob to make
| an image more or less "cyberpunk", for example).
|
| I also expect AI-powered inpainting to become a ubiquitous
| piece of functionality in drawing and editing tools (there are
| already Photoshop plugins).
|
| Furthermore, my hunch is that many of the use cases around
| image creation will gradually move towards direct manipulation.
| Somewhat like painting, but without a physical model. AI
  | components will probably be applied to interpreting the user's
| touch input in a similar way to how they are currently deployed
| to understand text input.
| bryced wrote:
| Been doing a lot with prompts lately. What people are calling
| "prompt engineering" I'd call "knowing what to even ask for and
| also asking for it in a clear manner". That was a valuable
| skill before computers and will continue to be one as AI
| progresses.
|
| I've been pretty disappointed to introduce ChatGPT to people in
| jobs where it would be a game changer and they just don't know
| what to do with it. They ask it for not-useful things or useful
  | things in a non-productive way. "Here is some ad copy I wrote,
  | write it better." Whether you're instructing a human, ChatGPT,
  | or an AI god... those instructions are just too vague.
| lobocinza wrote:
| Most people struggle with deliberate logical thought.
| Semaphor wrote:
| > I'd call "knowing what to even ask for and also asking for
| it in a clear manner".
|
    | It was a very important skill for searching. Nowadays, with
    | Google's "I know what you want better than you" search,
    | it's not that useful anymore. (Not useless: I get better
    | search results by not using Google and knowing what I want,
    | it's just less required.)
| gpderetta wrote:
  | If Asimov had robopsychologists in his stories, why can't we
  | have them in real life? Who wants to be the first Susan Calvin?
| bryced wrote:
| Here is a colab you can try it in. It crashed for me the first
| time but worked the second time.
| https://colab.research.google.com/drive/1rOvQNs0Cmn_yU1bKWjC...
| Damirakyan wrote:
| How would I upgrade to 2.1 if running locally?
| bryced wrote:
    | If you want to use Stable Diffusion 2.1 with imaginairy,
    | you just specify the model with `--model SD-2.1` when
    | running the `aimg imagine` command.
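    |
    | For example (the prompt here is just illustrative):
    |
    | `aimg imagine "a cozy cabin in the woods" --model SD-2.1`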
| dang wrote:
| Sorry for the offtopicness but could you please email me at
| hn@ycombinator.com? (I have an idea/possibility for you.
| Nothing that couldn't be posted here in principle but I'd
| rather not take the thread off topic.)
| Jerrrry wrote:
| This was an uncanny comment, somehow.
|
    | Hope y'all's brainstorming session is fruitful.
| dang wrote:
| Sorry for the uncanning! My thought was simply to connect
| the OP with a YC partner in case they wanted to explore
| doing this as a startup.
|
| I send such emails all the time but on semi-rare
| occasions have to resort to offtopic pleas like the GP.
|
| I hope that helps clear things up!
| Jerrrry wrote:
| It does, thank you!
| iuiz wrote:
| I could not get the first cell to run.
|
| ERROR: pip's dependency resolver does not currently take into
| account all the packages that are installed. This behaviour is
| the source of the following dependency conflicts. tensorflow
| 2.9.2 requires protobuf<3.20,>=3.9.2, but you have protobuf
| 3.20.3 which is incompatible. tensorboard 2.9.1 requires
| protobuf<3.20,>=3.9.2, but you have protobuf 3.20.3 which is
| incompatible.
| bryced wrote:
| I believe you can just ignore that error. The cell ran.
| cbeach wrote:
| EDIT - it's free of charge:
| https://research.google.com/colaboratory/faq.html
|
| ---
|
| First time I've used "colab" - looks great. Out of interest,
| who pays for the compute used by this?
|
  | Is it freely offered by Google? Or is it charged to my Google
| API account when I use it? Or your account? It wasn't clear in
| the UI.
| LewisVerstappen wrote:
| Freely offered by Google. They offer a subscription model if
    | you want to run your Colab notebooks on a beefier machine.
| Tenoke wrote:
| Huh, I'm trying it now and the results seem so weak compared to
| any other model I've seen since dall-e.
| bryced wrote:
    | Does DALL-E do prompt-based photo edits now?
|
| But yeah sometimes it doesn't follow directions well. I
| haven't noticed a pattern yet for why that is.
| la64710 wrote:
    | Hmm, that's true for me too. Not sure if it is due to a
    | resource constraint. I had a picture of a car in an indoor
    | parking lot with walls and pillars. When I prompted "Color
    | the car blue", the whole image was drenched in a tint of
    | blue. Similarly, when prompted "make a hummingbird hover"...
    | the hummingbirds were a patch of shiny colors with a shape
    | that sort of looked like a hummingbird but not like a real
    | one.
| menzoic wrote:
| Try "turn the car blue"
| testtwttttt wrote:
| a
| [deleted]
| lightbulbish wrote:
| This is cool! Makes me want to pull the trigger on an M2
| [deleted]
| perfopt wrote:
| How does this work? When I run it on a machine with a GPU
| (pytorch, CUDA etc installed) I still see it downloading files
| for each prompt. Is the image being generated on the cloud
| somewhere or on my local machine? Why the downloads?
| bryced wrote:
| Shouldn't be downloads per prompt. Processing happens on your
| machine. It does download models as needed. A network call per
| prompt would be a bug.
| perfopt wrote:
| OK. I noticed that the images are not accurate when I give my
| own descriptions. Not sure if this is a limitation of Stable
| Diffusion. For example, for the text "cat and mouse samurai
| fight in a forest, watched by a porcupine" I got a cat and a
| mouse (with a cat's face and tail!!) in a forest sort of
| fighting. But no - porcupine
|
| Thank you for creating this.
| bryced wrote:
| yes stable diffusion is not great about handling multiple
| ideas. New image models coming out soon though.
| [deleted]
| searchableguy wrote:
  | Is there a way to pre-download all the models? I want to
  | create a docker image and cache the models.
  |
  | Also, is there any way to configure the generated file path
  | beyond the directory, or to directly pipe the image from the
  | CLI?
| perfopt wrote:
| I keep seeing this even when the prompt is unchanged
|
| Downloading https://huggingface.co/runwayml/stable-
| diffusion-v1-5/resolv... from huggingface Loading model
| /home/hrishi/.cache/huggingface/hub/models--runwayml--stable-
| diffusion-v1-5/snapshots/889b629140e71758e1e0006e355c331a5744
| b4bf/v1-5-pruned-emaonly.ckpt onto cuda backend...
|
| followed by a download
| bryced wrote:
| Oh I see what you're saying. It's not actually downloading
| but there is a bug where it's not using the cache properly.
| will fix
| bryced wrote:
| That is strange. I'm not sure what would cause that unless
| it was running in some ephemeral environment. What OS? Can
| you open a github issue with a screenshot?
| perfopt wrote:
      | Ubuntu 22.04.1 LTS. I'll file an issue.
| testtwttttt wrote:
| how about telling cars where to go?
| GordonS wrote:
| What are the most affordable GPUs that will run this? (it said it
| needs CUDA, min 11GB VRAM, so I guess my relatively puny 4GB
| 570RX isn't going to cut it!)
| smallerfish wrote:
| It works fine on CPUs. Takes about a minute to generate images
| on my 8 core i7 desktop.
| cbeach wrote:
| How do I get it to use CPU rather than GPU, please? I have
| 128GB of RAM but a lowly 2GB of VRAM in my PC.
| 5e92cb50239222b wrote:
| Try setting the CUDA_VISIBLE_DEVICES environment variable
| to ''.
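      |
      | For example, for a one-off run (filename and prompt are
      | just illustrative):
      |
      | `CUDA_VISIBLE_DEVICES='' aimg edit photo.jpg "make it
      | winter"`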
| cbeach wrote:
| That worked! Thanks.
| Der_Einzige wrote:
    | Hopefully one can utilize a highly Intel- (or AMD-)
    | optimized stack, such as the Intel version of PyTorch, to
    | make this run even faster.
| ColonelPhantom wrote:
| The cheapest NVidia GPU with 11+GB VRAM is probably the 2060
| 12GB, although the 3060 12GB would be a better choice.
|
| The setup.py file seems to indicate that PyTorch is used, which
| I think can also run on AMD GPUs, provided you are on Linux.
| pifm_guy wrote:
| I really want these ML libraries to get smarter with use of
| VRAM. Just cos I don't have enough VRAM shouldn't stop me
| computing the answer. It should just transfer in the
| necessary data from system ram as needed. Sure, it'll be
| slower, but I'd prefer to wait 1 minute for my answer rather
| than getting an error.
|
| And if I don't have enough system RAM, bring it in from a
| file on disk.
|
    | The majority of the RAM is consumed by big weight matrices,
    | so the framework knows exactly which bits of data are
    | needed when and in what order, and so should be able to do
    | efficient streaming data transfers to have all the data in
    | the right place at the right time. That would be far more
    | efficient than 'swap files', which don't know ahead of time
    | what data will be needed and so impact performance severely.
| MrNeon wrote:
| Using RAM to hold unneeded layers was one of the first
| optimizations made for Stable Diffusion. The AUTOMATIC1111
| WebUI implements it, not sure about others.
| albertzeyer wrote:
      | If it doesn't fit in the GPU memory, it will often be
      | faster to just compute everything on the CPU. So the
      | framework doesn't really need to be smart: it can
      | already easily compute everything on the CPU, which does
      | what you want.
| b33j0r wrote:
| Can you imagine only being able to cook a hamburger on one
| brand of grill? But you can make something kinda similar in the
| toaster oven you can afford?
|
| I want to be productive on this comment... but the crypto/cuda
| nexus of GPU work is simply not rational. Why are we still
| here?
|
| You want to work in this field? Step 1. Buy an NVIDIA gpu. Step
| 2. CUDA. Step 3. Haha good luck, not available to purchase.
|
| This situation is so crazy. My crappiest computer is way better
| at AI, just because I did an intel/nvidia build.
|
| I don't hate NVIDIA for innovating. The stagnation and risk of
| monopoly setting us back for unnecessary generations makes me a
| bit miffed.
|
| So. To attempt to be productive here, what am I not seeing?
| michaelt wrote:
    | _> Can you imagine only being able to cook a hamburger on
    | one brand of grill? [...] the crypto/cuda nexus of GPU work
    | is simply not rational. Why are we still here?_
|
| Because nvidia spent a long time chasing the market and
| spending resources, like they _wanted_ it.
|
| You wanted to learn about GPU compute in 2012? Here's a free
| udacity course sponsored by nvidia, complete with an online
| runtime backed by cloud GPUs so you can run the exercises
| straight from your browser.
|
| You're building a deep learning framework? Here's a free
| library of accelerated primitives, and a developer who'll
| integrate it into your open source framework and update it as
| needed.
|
| OpenCL, in contrast, behaves as if every member of the
| consortium is hoping some other member will bear these costs
| - as if they don't really want to be in the GPU compute
| business, except out of a begrudging need for feature parity.
|
| And in terms of being rational - if you're skilled enough to
| be able to add support for a new GPU vendor into an ML
| library, you're probably paid enough that the price of a
| midrange nvidia GPU is trivial in comparison.
|
| All is not lost, though - vendors like Intel are increasingly
| offering ML acceleration libraries [1] and most neural
| networks can be exported from one framework and imported into
| another.
|
| [1] https://www.intel.com/content/www/us/en/developer/tools/o
| nea...
| dist1ll wrote:
| Because innovating in the hardware space is just a lot more
| expensive and slow.
|
    | Also, the vast majority of ML researchers and engineers are
    | not systems programmers. They don't care about vendor
    | lock-in because they're not the ones writing the drivers.
| dannyw wrote:
| Because:
|
| 1. It's just not a huge deal to many people. Most people who
| want to do local ML training and inference can just buy a
| NVIDIA GPU.
|
| 2. AMD only has a skeleton team working on their solution.
| It's clear it's not a focus.
| bryced wrote:
  | I'm running on a 2080 Ti and an edit runs in 2 seconds. On my
  | Apple M1 Max (32GB), edits take about 60 seconds.
| mritchie712 wrote:
| If this was all packaged into a desktop app (e.g. Tauri or
| electron) how big would the app be? I'd imagine you could get
| it down to < 500MB (even if you packaged miniconda with it).
| coder543 wrote:
| > I'd imagine you could get it down to < 500MB (even if you
| packaged miniconda with it).
|
| I don't know where that imagined number came from. This
| tool appears to be using Stable Diffusion, and the base
| Stable Diffusion model is 4 or 5 gigabytes by itself. I
| think there are some other models that are necessary to use
| the base Stable Diffusion model, and while they are
| smaller, they still add to the total size.
| singhrac wrote:
| For what it's worth, it ran fine on my 2070 (8GB of VRAM), even
| with the GPU being used to render my desktop (Windows), which
| used another ~800MB of VRAM. I was running it under WSL, which
| also worked fine.
|
| Note the level of investment that NVIDIA's software team has
| here: they have a separate WSL-Ubuntu installation method that
| takes care not to overwrite Windows drivers but installs the
| CUDA toolkit anyway. I expected this to be a niche, brittle
| process, but it was very well supported.
| cbeach wrote:
  | On the strength of this HN submission I just ordered an RTX
  | 3060 12GB card for £381 on Amazon so I can run this and future
  | AI models.
|
| This stuff is fascinating, and @bryced's imaginAIry project
| made it accessible to people like me who never had any formal
| training in machine learning.
| kadoban wrote:
    | Whatever 3060 variant has the most VRAM is probably your
| best shot these days.
| distantsounds wrote:
| If only Stable Diffusion wasn't already populated with a host of
| copyrighted images.
|
| Make your own art, dammit. This is the equivalent of running some
| Photoshop filters over someone else's work.
| fatih-erikli wrote:
| Garbage.
| sschueller wrote:
| I am not a fan of software such as this putting in an arbitrary
| "safety" feature which can only be disabled via an undocumented
| environment variable. At least make it a documented flag for
| people who don't have an issue with nudity. There isn't even an
| indication that there is a "safety" issue; you just get a blank
| image and are left wondering if your GPU, model, or install is
| corrupted.
|
| This isn't running on a website that is open to everyone or can
| be easily run by a novice.
|
| Anyone capable of installing and running this is also able to
| read code and remove such a feature. There is no reason to hide
| this, nor to not document it.
|
| The amount of nudity you get is also highly dependent on which
| model you use.
| kaapipo wrote:
| Can you tell which env variable to set?
| [deleted]
| [deleted]
| sandworm101 wrote:
  | Nudity isn't really the core issue. It is about illegal
  | content. Nudity is the over-protective, over-inclusive
  | bandaid solution to prevent this thing from being used to
  | generate the very illegal material that will trigger
  | authorities.
| Zuiii wrote:
    | Then society shouldn't tolerate knives either.
    |
    | Banning an entire class of activity because someone
    | somewhere might abuse it is a ridiculous, irrational way to
    | reason about things.
    |
    | The question is simple: will the thing mostly be used for
    | bad or good? Unless you think the vast majority of humanity
    | are pedophiles, these features should be allowed.
    |
    | "Think of the children" was never a valid argument and
    | isn't a valid one today. I truly hope this utterly stupid
    | and brain-dead way of thinking never escapes the AI
    | community and infects other fields.
| seanw444 wrote:
| Agreed. This was an inevitable use case for AI imagery from
| the get-go. No way around it. Even if one dev/trainer goes
      | out of their way to make _certain_ that it can't be used
| for that purpose, another model will be trained that can.
| So the only three solutions are:
|
| A. Get over it.
|
| B. Ban the tool in its entirety.
|
| C. Waste tons of governmental resources, spy on people
| more, and operate in legal grey-area to hunt down people
| that produce that kind of stuff with these models.
|
| The correct answer seems obvious to me.
| sschueller wrote:
    | Then instead of just presenting me with a blank image, tell
    | me why it's blank. Or add the word "nude" to the default
    | negative prompt, which currently doesn't include it.
|
| The default is: negative-prompt:"Ugly,
| duplication, duplicates, mutilation, deformed, mutilated,
| mutation, twisted body, disfigured, bad anatomy, out of
| frame, extra fingers, mutated hands, poorly drawn hands,
| extra limbs, malformed limbs, missing arms, extra arms,
| missing legs, extra legs, mutated hands, extra hands, fused
| fingers, missing fingers, extra fingers, long neck, small
| head, closed eyes, rolling eyes, weird eyes, smudged face,
| blurred face, poorly drawn face, mutation, mutilation, cloned
| face, strange mouth, grainy, blurred, blurry, writing,
| calligraphy, signature, text, watermark, bad art,"
| sandworm101 wrote:
      | Sure, if this was a commercial product I would also
      | scream about feedback. But as this is a free side project
      | I'm not going to criticize decisions on what they think
      | they need to do to avoid unnecessary drama. I credit them
      | for making the bypass relatively simple.
| adammarples wrote:
| Nobody is screaming
| Siira wrote:
| Is any AI generated image even illegal, assuming originality
| and difference from training set?
| babarock wrote:
| Genuinely curious what type of content would be considered
| illegal here.
|
    | - the tool is drawing original content.
    | - the tool is executing on my laptop.
    | - the output image is not shared with anyone.
|
| In a way, whatever this tool can do, MS Paint could
| (theoretically) do.
|
| Or am I misunderstanding the whole thing?
| wongarsu wrote:
| There's a great wikipedia page on this very topic [1]. In
| some countries like Germany fictional porn isn't considered
| porn, while other countries like Australia or France
| consider the possession of drawings of naked minors a crime
| worthy of jail sentences. And then there's the US where
| having the images on a computer is fine, as long as they
| aren't sent over the internet and the computer never
| crosses state lines.
|
| In general it's a topic people are careful about because
| the legal situation is complex, and in many countries
| associated with the potential of heavy jail sentences.
|
| 1: https://en.wikipedia.org/wiki/Legal_status_of_fictional_
| porn...
| sandworm101 wrote:
      | Nope. It is illegal if you do it in MS Paint too. Images
      | of naked kids (real or fictional) + transmitted over the
      | internet (i.e. state lines) = bad bad day for all
      | involved.
| babarock wrote:
| But it requires transmission over the internet, right?
| This is actually very interesting. Am I legally allowed
| to draw naked kids for my own enjoyment in the comfort of
| my own home?
| sandworm101 wrote:
| Simple possession is also a crime, just a state crime.
| The internet transmission triggers federal jurisdiction
| and federal charges. There is no safe/legal way to
| possess such images.
| kleiba wrote:
| _> + transmitted over internet (ie state lines)_
|
| But OP specifically said: "the tool is executing on my
| laptop. - the output image is not shared with anyone."
| wongarsu wrote:
| Maybe use a desktop instead, because crossing state lines
| with that laptop might carry a jail sentence with a
| minimum term of 5 years.
| kleiba wrote:
| I would assume that running an AI image generation
| software is something you'd do on a desktop anyway.
| sandworm101 wrote:
| The state lines thing is only about federal crimes.
| Possession is most likely illegal as a state crime on its
| own. The repercussions of even being suspected of crimes
              | in this area mean that every rational person
              | would take
| precautions far above what is legally necessary to
| prevent any association with such material.
| bryced wrote:
| The console output tells you what happened.
| lou_alcala wrote:
| Wow, this is cool. I think I am going to make a site so people
| can use this.
| bobmaxup wrote:
| https://www.timothybrooks.com/instruct-pix2pix
| c7b wrote:
| Many thanks to the OP, can't wait to try this out! I have a
| question I'm hoping to slide in here: I remember there were also
| solutions for doing things like "take this character and now make
| it do various things". Does anyone remember what the general term
| for that was, and some solutions (pretty sure I've seen this on
| here, apparently forgot to bookmark).
|
| PS: I'm not trying to make a comic book, I'm trying to help a
| friend solve a far more basic business problem (trying to get
| clients to pay their bills on time).
| bryced wrote:
| dreambooth perhaps?
| c7b wrote:
| Dreambooth is what I'm using now, but I think I remember the
| concept had a specific name, something like 'context
| transfer' or so (pretty sure that was not the term) and tools
| that were pretty good at it that came out before Dreambooth.
| If I could at least remember the term it might be easier to
| search for them.
|
| Dreambooth is ok at it, but it requires multiple images (you
| often read 30, but I've actually had decent results with as
| few as five) to recognize what it's supposed to replicate. I
| remember there were tools that were more adapted to the
| workflow "create a humanoid cartoon character with a bunny
| face", pick one image that you like, and then "now show that
| same character in scene X, eg teaching in a classroom" or
| "wearing a cowboy outfit".
| cma wrote:
| "textual inversion"
| c7b wrote:
| I think that's it, thank you!
| [deleted]
| [deleted]
| [deleted]
| anigbrowl wrote:
| _A CUDA supported graphics card with >= 11gb VRAM (and CUDA
| installed) or an M1 processor._
|
| /Sighs in Intel iMac
|
| Has anyone managed to get an eGPU running under MacOS? I guess I
| could use Colab but I like the feeling of running things locally.
| bryced wrote:
| I'm told Imaginairy secretly does run on Intel Mac. very
| slowly. I just don't want to be on the hook for support so the
| reqs are written that way.
| anigbrowl wrote:
| Ooh, I'll give it a whirl - performance isn't a priority
| right now. Thanks Bryce.
| 88stacks wrote:
| Wow, it's really impressive to see how advanced AI image
| generators have become! The ability to create stable diffusion
| images with a "just works" approach on multiple operating systems
| is a huge step forward in this technology. We've deployed similar
| tech and APIs for our customers and are contemplating using this
| library as part of our pipeline for https://88stacks.com
| TekMol wrote:
| Dark patterns are frowned upon here on HN.
|
| Letting the user upload dozens of images and only after that
| telling them they need an account. Not good.
| 88stacks wrote:
    | It's not a dark pattern. What would happen to the images
    | after uploading?
| dspillett wrote:
| _> its not a dark pattern_
|
| You are entitled to your opinion, no matter how wrong many
| of us think it is :)
|
| _> what would happen to the images after uploading?_
|
| I have no idea, as there is no such information given going
| as far as the upload form, nor in the FAQ. This is
| information you should provide.
|
| Though that isn't the key problem IMO. For someone who
| backs out because of the sign-up requirement, you've wasted
      | their time (_and_ the service now has their images with no
| obvious pre-agreed policy covering re-use or other
| licensing issues).
| kumarm wrote:
| Doesn't work if any people are in the photos:
| https://twitter.com/kumardexati/status/1616972740728356867/p...
| lgas wrote:
| It does work on some things with people. I colorized a black
| and white photo of myself and then turned the colorized version
| into me as a Dwarven king.
| bryced wrote:
| Works fine for me, you just need to adjust the strength of the
| edit.
| kumarm wrote:
| You mean steps?
| bryced wrote:
      | No. In imaginairy it's called `--prompt-strength`. In
      | other libraries it's called CFG or "classifier-free
      | guidance". For the image edits I vary the strength of the
      | effect between 3 and 25.
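      |
      | For example, a gentle edit at the low end of that range
      | (filename and prompt are just illustrative):
      |
      | `aimg edit portrait.jpg "make him smile" --prompt-strength 5`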
| bryced wrote:
| For the specific example you provide you could also use a
| prompt-based mask to prevent it from editing the person.
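      |
      | Something like `--mask-prompt person --mask-mode keep`
      | added to the edit command should do it (the exact mask
      | prompt here is just a guess at what would segment well).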
| sam1r wrote:
  | Did you have to tweet that it wasn't working, versus just not
  | making it a public "omg it's not working, it's no good"?
| nmstoker wrote:
| Looks really interesting, although my immediate thought with
| "--fix-faces" is how long before someone manages to do something
| inappropriate and whips up a storm about this.
| airbreather wrote:
| I'm getting mixed results, and for a given topic it seems to
| invariably give a better result the first time you ask, then not
| as good if you ask again.
|
| It could be random and my imagination, but it seems that way.
| yieldcrv wrote:
| "Add a dog in my arms"
|
| I'll keep you posted how well this works for dating apps
| karim79 wrote:
| I've been toying with SD for a while, and I do want to make a
| nice and clean business out of it. It's more of a side-projecty
| thing so to speak.
|
| Our "cluster" is running on a ASUS ROG 2080Ti external GPU in the
| razer core-x housing, and that actually works just fine in my
| flat.
|
| We went through several iterations of how this could work at
| scale. The initial premise was basically the google homepage, but
| for images.
|
| That's when we realised that scaling this to serve the planet was
| going to be a hell of a lot more work. But not really; what is
| absolutely necessary is conceptualising the concurrent compute
| requirements, as well as the ever-changing landscape and pace of
| innovation in this space.
|
| The quick fix is to use a message queue (we're using Bull) and
| make everything asynchronous.
|
| So essentially, we solved the scaling factor using just one GPU.
| You'll get your requested image, but it's in a queue, we'll let
| you know when it's done. With that compute model in place, we can
| just add more GPUs, and tickets will take less time to serve if
| the scale engineering is proper.
|
| I'm no expert on GPU/machine learning/GAN stuff, but Stable
| Diffusion actually prompted me to imagine how to build and scale
| such a service, and I did so. It is not live yet; the name
| reserved is dreamcreator dot ai, and I can't say when it will be
| animated. Hopefully this year.
| DrScientist wrote:
| Queues are everywhere - at all levels - in the end a single
| transistor is either on or off - doing one thing at once.
|
| Your queue decouples demand from supply - though you now have
| another problem - if demand far exceeds supply will your queue
| overflow?
|
| In that scenario you might need to push the queue back to the
| requester - ie refuse the job and tell them to resubmit later.
| throwaway675309 wrote:
  | A ticketing scheduler system is how 99% of systems that
  | require long-running, CPU/GPU-intensive work that cannot be
  | run in parallel are implemented. It's how I built my stable
  | diffusion Discord bot, which is backed by a single RTX 2060.
|
| I'm glad you have this working but I wouldn't exactly call this
| "solving the scaling problem", you're just running it in a
| blocking "serial fashion". With enough concurrent users it
| could still take somebody until the heat death of the universe
| for their image to finally be generated.
| karim79 wrote:
| Agreed. The scaling model/queueing system was implemented as
| a POC which can be scaled by plugging in more cards/hosts. I
| hope we can find the time to animate this soon enough.
| sebastiennight wrote:
| It's very interesting, thanks! I've noticed (on the Spock
| example) that "make him smile" didn't produce a very... "comely"
| result (he basically becomes a vampire).
|
| I was thinking of deploying something like that in one of our app
| features, but I'm scared of making our Users look like vampires
| :-)
|
| Is it your experience that the model struggles more with faces
| than with other changes?
| bryced wrote:
| Yes if you're not careful it can ruin the face. You can play
| with the strength factor to see if something can be worked out.
| Bigger faces are safer.
| theusus wrote:
| Two things
|
| 1. It actually makes me insecure.
|
| 2. Don't we already have apps that do such things? Yes, they were
| more specialized, but it's the same thing as the Prisma app.
| natch wrote:
| The headline and the heavy promotional verbiage on the site seem
| to be claiming this is some new functionality we didn't have
| before. Image2image with text instructions isn't new, contrary to
| what the headline implies.
|
| InvokeAI (and a few other projects as well) already does all this
| stuff much better unless I'm missing something. There are plenty
| of stable diffusion wrappers. Why not help improve them instead
| of copying them?
|
| I'm not against having enthusiasm for one's project, but tell us
| why this is different and please don't pretend the other projects
| don't have this stuff.
| bryced wrote:
| I'm not aware of any pre-existing open-source model that
| selectively edits images (leaving some parts untouched) based
| on instructions. This new method is much better than the
| image2image that shipped with the original stable diffusion.
| I'm looking at the InvokeAI docs right now and don't see
| anything like this feature. We previously had smart-masks, but
| InstructPix2Pix mostly does away with the need for those as
| well.
|
| If I am mistaken please provide links to these prior features.
| nullish_signal wrote:
| >11GB VRAM
|
| Aaarrrgghh let me know when it's down to 4GB like Stable
| Diffusion
|
| The prompt-based masking sounds incredible, with either pixel +/-
| or Prompt Relevance +/-
|
| VERY impressive img2img capabilities!
| NavinF wrote:
| You can get a used 2080Ti for under $300 on eBay
| smcleod wrote:
| That's a lot of money for most people. It also means they
| have to have a PC to put it in.
| nullish_signal wrote:
| Thank you for your sensical response... Very happy with my
| 6GB VRAM card and don't have $300 lying around to use on a
| git repo that will probably be slimmed down in a month or
| two
| do_anh_tu wrote:
  | Or, if there is a Colab version, I'd be happy to pay Google
  | for a premium GPU.
| bryced wrote:
    | It does work in non-pro Colab, apparently. Here you go:
    | https://colab.research.google.com/drive/1rOvQNs0Cmn_yU1bKWjC...
| eega wrote:
    | Well, just open a new GPU Colab and create a cell with
    | "!pip install imaginairy" and you should be good to go...
| bryced wrote:
  | It is stable diffusion, but yes, my fork does not have the
  | memory optimizations needed to run it on only 4GB.
| [deleted]
| tomrod wrote:
| This is a lot of fun!
|
| And they aren't kidding that on a CPU backend it is _slooooow_ :)
| mstade wrote:
| Does anyone know of any tool like this for UI design? I'd love
| something that'd help creatively impaired people like myself
| communicate more visually.
| social_quotient wrote:
| Slightly off topic.
|
| I've been looking for an easier way to replace the text in these
| AI-generated images. I found Facebook is working on it with their
| TextStyleBrush - https://ai.facebook.com/blog/ai-can-now-emulate-
| text-style-i... but have been unable to find something released
| or usable yet. Anyone aware of other efforts?
| TeMPOraL wrote:
| > _Here are some examples of transformations it can make: Golden
| gate bridge:_
|
| I'm on mobile so can't try this myself now. Can it add a Klingon
| bird of prey flying under the Golden Gate Bridge, and will "add a
| Klingon bird of prey flying under the Golden Gate Bridge"
| prompt/command be enough?
| wongarsu wrote:
| No. At least not with the Stable Diffusion 1.5 checkpoint used
| in the colab notebook. It seems to only have a very vague idea
| of what a Klingon bird of prey is. The best I could get in ~30
| images was [1], and that's with slight prompt tweaks and a
| negative prompt to discourage falcons and eagles.
|
| 1: https://i.imgur.com/gDj2Kn4.png
| sandworm101 wrote:
| Fireworks. These AI tools seem very good at replacing textures,
| less so at inserting objects. They can all "add fireworks" to
| a picture. They know what fireworks look like and diligently
| insert them into "sky" part of pictures. But they don't know that
| fireworks are large objects far away rather than small objects up
| close (see the Father Ted bit on that one). So they add tiny
| fireworks into pictures that don't have a far away portion
| (portraits) or above distant mountain ridges as if they were
| stars. Also trees. The AI doesn't know how big trees are and so
| inserts monster trees under the Golden Gate bridge and tiny
| bonsais into portraits. Adding objects into complex images is
| totally hit and miss.
| jagaerglad wrote:
  | Another thing was the "Bald" edit for the girl with the pearl
  | earring; it seems like it doesn't know about things like
  | ponytails under headdresses.
| ricardobeat wrote:
| The new models that take depth estimation into account will
| probably solve this.
| taberiand wrote:
| Perhaps stereoscopic video should be part of the training data?
| YurgenJurgensen wrote:
| Human stereoscopy is only good out to a few meters (and
| presumably people aren't going out with giant WWII
| stereoscopic rangefinders to generate training data). So it
| wouldn't help them for things like fireworks or trees.
| Gravyness wrote:
| A similar tool: Instruct pix2pix to alter images by describing
| the changes required: https://huggingface.co/timbrooks/instruct-
| pix2pix#example
|
| Edit: Just noticed it is the same thing but wrapped, nevermind,
| pretty cool project!
| [deleted]
| [deleted]
| dandigangi wrote:
| This is really cool. Haven't seen something like this yet. Going
| to be very interesting when you start to see E2E generation =>
| animation/video/static => post editing => repeat. Have this
| feeling that movie studios are going to look into this kind of
| stuff. We went from real to CGI and this could take it to new
| levels in cost savings or possibilities.
| dandigangi wrote:
| Played around for a bit. Definitely a cool tool. Wish I had an
  | M1 though. Taking me quite a while to generate, and fans
  | running
| at full blast. Haha
| fassssst wrote:
| Related:
| https://www.reddit.com/r/StableDiffusion/comments/10hv160/im...
| cbeach wrote:
| I see it's able to generate politician faces. I recall this
| wasn't possible on DALL-E 2 due to safety restrictions.
|
| I run a friendly caption contest https://caption.me so imaginAIry
| is going to be absolute gold for generating funny and topical
| content. Thank you @bryced!
| goffi wrote:
| Wow that's really impressive (I've seen similar things in
| research papers for a while now, but having it usable so easily
| and generic is great).
|
| A few questions:
|
| - would it be possible to use this tool to make automatic mask
| for editing in something like GIMP (for instance, if I want to
| automatically mask the hair)?
|
| - would it be possible to have a REPL or something else to run
| several prompts on the same image? Loading the model takes time,
| and it would be great to be able to just do it once.
|
| - how about a small GUI or webui to see the preview immediately?
| Maybe it's not the goal of this project and using
| `instruct-pix2pix` directly with its webui is more appropriate?
|
| Thanks for the work (including upstream people for the research
| paper and pix2pix), and for sharing.
| bryced wrote:
| > would it be possible to use this tool to make automatic mask
| for editing in something like GIMP
|
| probably but GIMP plugins are not something I've looked into
|
| > REPL
|
| already done. just type `aimg` and you're good to go
|
| > GUI
|
| GUIs add a lot of complexity. Can your file manager do
| thumbnails and quick previews?
| sseagull wrote:
| > GUIs add a lot of complexity. Can your file manager do
| thumbnails and quick previews?
|
| Somewhat OT, but I find this really funny. It says a lot
| about the difficulty of using various ecosystems and where
| communities spend time polishing things.
|
| "Yeah, I made something that takes natural language and can
| do things like change seasons in an image. But a GUI? That's
| complicated!"
|
| It's not a criticism of you, but the different ecosystems and
| what programmers like to focus on nowadays.
| bryced wrote:
| Fair but I'd point out I also didn't make the algorithm
| that changes photos. I'm wrapping a bunch of algorithms
| that other people made in a way that makes them easy to
| use.
|
      | It's not just that GUIs are hard, it's that the "customer"
| base will inevitably be much less technical and I'd receive
| a lot more difficult to resolve bug reports. So no-gui is
| also a way of staying focused on more interesting parts of
| the project.
| goffi wrote:
| thanks for the quick answer and cool for REPL. Yeah sure I
| can just launch Gwenview on the output directory.
|
| > probably but GIMP plugins are not something I've looked
| into
|
    | I was just thinking about a black-and-white or grey-level
    | output image with the desired area; no need to integrate it
    | in GIMP or whatever. I've tried a prompt like "keep only the
    | face", but no luck so far.
| bryced wrote:
| There is a smart mask feature. Add `--mask-prompt face
| --mask-mode keep`. I believe it outputs the masks as well
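      |
      | A full command would look something like this (filename
      | and prompt are just illustrative):
      |
      | `aimg edit portrait.jpg "give her green hair" --mask-prompt
      | face --mask-mode keep`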
| Daub wrote:
| The language of high-level art-direction can be way more complex
| than one might assume. I wonder how this model might cope with
| the following:
|
| 'Decrease high-frequency features of background.'
|
| 'Increase intra-contrast of middle ground to foreground.'
|
| 'Increase global saturation contrast.'
|
| 'Increase hue spread of greens.'
| CyanBird wrote:
  | They behave quite poorly, because the keywords used by the
  | models are layman language, not technical art or color
  | correction/color grading speak.
|
| Hopefully in a couple of years when things have matured more
| there will be more models capable of handling said requests
|
| The most precise models are actually anime models because the
| users have got high standards for telling the machine what they
| expect of it and the databases are quite well annotated (booru
| tags)
| Daub wrote:
| Around how many samples are required for an effective
| training set?
| duringwork12 wrote:
| Leonardo.ai can probably make a model that handles one of
| those prompts well with 40 images.
|
| To handle them all you would need a larger sample.
| sideshowb wrote:
| Is there a link to how this works - in terms of nn architecture
| to combine the embedding of the existing image with the edit
| instruction?
| bryced wrote:
| https://arxiv.org/abs/2211.09800
| Der_Einzige wrote:
| Hoping that this is quickly implemented into the automatic1111
| webUI.
| pfd1986 wrote:
| Super nice. Would this work if I have my own version of fine-
| tuned SD? Also, curious how / whether this is different from
| img2img released by SD. Thanks!
| bryced wrote:
  | This is itself its own fine-tuned version of SD, so it won't
  | work with alternative versions. img2img works by just running
| normal stable diffusion img2img on a noised starting image. As
| such it destroys information at all parts of the image equally.
| This new model uses attention mechanisms to decide which parts
| of the image are important to modify. It can leave parts of the
| image untouched while making drastic changes to other parts.
| caxco93 wrote:
| Well, to be fair you can use feathered bitmap masks for
| img2img with some UIs (automatic1111)
| kewp wrote:
| Anyone know how to use this? The install instructions in the
| readme are kind of confusing.
| kadoban wrote:
| If you don't care what exact tool in particular,
| https://github.com/AUTOMATIC1111/stable-diffusion-webui is the
| easiest to install I think and gives lots of toys to play with
| (txt2img, img2img, teach it your likeness, etc.)
| bryced wrote:
| If you're used to installing python packages it should be
| relatively easy. There are other projects with nice UIs but
| that's not what this library is for.
| 0x4164 wrote:
| I hope there is a James Fridman version of this kind of AI.
| ilaksh wrote:
| Does anyone know if there is something like Google Cloud for GPUs
| but with an easy way to suspend the VM or container when not in
| use? Maybe I am just looking for container hosting with GPUs.
|
| I am just trying to avoid some of the basic VM admin stuff like
| creating, starting, and stopping instances for SaaS, if someone
| already has a way to do it. Maybe this is something like what
| Elastic Beanstalk does.
| midlightdenight wrote:
| Maybe not quite what you're looking for, but I've seen some
| people mention banana.dev
|
| https://www.banana.dev/
|
| Never used it myself, but looks like AWS Lambda/GCP Cloud
| Functions tailored to ML models.
| TekMol wrote:
| "Log in with Github". No thanks.
| plufz wrote:
| i've used https://www.genesiscloud.com/
| punkspider wrote:
| I think there's https://brev.dev and https://banana.dev
| nicd wrote:
| vast.ai, paperspace.com
| weakwire wrote:
| Enhance!
| sam1r wrote:
| This is amazing! It's only a matter of time until video...
| dr_kiszonka wrote:
| Video would be very useful too, but I expect running such
| models locally would be prohibitively expensive for most folks.
| (I am not talking about those $300k/year people here.)
| pcrh wrote:
| See: https://news.ycombinator.com/item?id=34389041
| PaulMest wrote:
| I've played with several of these Stable Diffusion frameworks and
| followed many tutorials and imaginAIry fit my workflow the best.
| I actually wrote Bryce a thank you email in December after I made
| an advent calendar for my wife. Super excited to see continued
| development here to make this approachable to people who are
| familiar with Python, but don't want to deal with a lot of the
| overhead of building and configuring SD pipelines.
| bryced wrote:
| Thanks Paul!
| brycedriesenga wrote:
| Whoa. Another Bryce D. So when do we fight?
| kennyadam wrote:
| When the narwhal bacons of course. ugh.
| nicbou wrote:
| Can it make it pop? Because that was the #1 request I remember
| dealing with.
| tamrix wrote:
| Maybe but it could put their business logo anywhere!
| awestroke wrote:
| Try these prompts:
|
| "Add lens flare"
|
| "Increase saturation"
|
| "Add sparkles and gleam"
| bryced wrote:
| I tried it out :-)
|
| `aimg edit assets/girl_with_a_pearl_earring.jpg "make it pop"
| --prompt-strength 40 --gif`
|
| https://user-images.githubusercontent.com/1217531/213912442-...
| userbinator wrote:
| I was expecting something balloon-like to appear, and was not
| disappointed.
| jameshart wrote:
| I think it tried to make it _pop art_? Which is not a bad
| response to be fair.
| perfrom1 wrote:
| this should do it:
|
| >> aimg edit input.jpg "make it pop" --prompt-strength 25
| prox wrote:
| I don't know why people use this "AI" thing, I have been using
| _make my logo bigger cream_ (tm) for ages with success.
|
| https://www.youtube.com/watch?v=GOwi3x92teo
|
| ;)
| marcosdumay wrote:
| Hum... Is that whitespace eliminator still on sale?
| prox wrote:
| Well if you check the web, not a lot :D
| cjwit wrote:
| It's funny that the bottle doesn't look more like a Dr
| Bronner's bottle [0]
|
| [0]: https://www.drbronner.com/products/peppermint-pure-
| castile-l...
| TekMol wrote:
| #1 request of what, for what, requested by whom?
| stefanvdw1 wrote:
| That is a common request when working with clients. They have
| a hard time describing what they want so end up asking to
| "make it pop"
| odedbend wrote:
| Where can I find more data about the work you did to create this?
| bryced wrote:
| I did the work to wrap it up and make it "easy" to install. The
| researchers who did the real work can be found here:
| https://www.timothybrooks.com/instruct-pix2pix
| TekMol wrote:
| How can I try this?
|
| Can this be run on a Digitalocean VM?
|
| I looked around on DO's products, but none seems to advertise
| that it has a GPU. So maybe it is not possible?
| bryced wrote:
| Here is a google colab you can try it in:
| https://colab.research.google.com/drive/1rOvQNs0Cmn_yU1bKWjC...
| malux85 wrote:
| Try paperspace, they have GPUs and you can set billing limits
| to stop accidental overusage
|
| (no affiliation other than being a happy customer)
| patientplatypus wrote:
| [dead]
| WesolyKubeczek wrote:
| So, Deckard can ask it to enhance, finally :)
___________________________________________________________________
(page generated 2023-01-23 23:02 UTC)