[HN Gopher] Real-time image editing using latent consistency models
___________________________________________________________________
Real-time image editing using latent consistency models
Author : dvrp
Score : 34 points
Date : 2023-11-10 20:06 UTC (2 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| joerambo808 wrote:
| why is it faster than latent diffusion models?
| vipermu wrote:
| it uses a new technique called "consistency" that lets latent
| diffusion models predict images in far fewer steps.
|
| some links here:
|
| - https://arxiv.org/abs/2310.04378
| - https://arxiv.org/abs/2311.05556
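|
| to make the "fewer steps" part concrete, here's a minimal
| sketch using Hugging Face diffusers and the public
| SimianLuo/LCM_Dreamshaper_v7 checkpoint (my own example, not
| something specific to our stack):
|
|     # Few-step text-to-image with a latent consistency model.
|     import torch
|     from diffusers import DiffusionPipeline
|
|     pipe = DiffusionPipeline.from_pretrained(
|         "SimianLuo/LCM_Dreamshaper_v7",
|         torch_dtype=torch.float16,
|     ).to("cuda")
|
|     # 4 steps instead of the 25-50 a typical sampler needs.
|     image = pipe(
|         prompt="a watercolor lighthouse at dusk",
|         num_inference_steps=4,
|         guidance_scale=8.0,
|     ).images[0]
|     image.save("lcm_sample.png")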
| billconan wrote:
| latent diffusion is an iterative process: the image becomes
| clearer one step at a time.
|
| The process can be viewed as a particle moving through image
| space, one step at a time, toward its final position in that
| space, which is the generated image.
|
| A consistency model tries to predict where that trajectory
| ends up, given the current position in image space. Hence,
| what used to be a step-by-step process becomes a one-step
| process.
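|
| In code terms, the contrast is roughly this (just a toy sketch
| with made-up function names standing in for trained networks,
| not any real library):
|
|     # Iterative diffusion sampling: many small moves along
|     # the trajectory from noise to image.
|     def diffusion_sample(x_noisy, num_steps, denoise_step):
|         x = x_noisy
|         for t in reversed(range(num_steps)):
|             x = denoise_step(x, t)  # one small move
|         return x
|
|     # Consistency sampling: jump straight to the predicted
|     # endpoint of the trajectory from the current position.
|     def consistency_sample(x_noisy, t, consistency_fn):
|         return consistency_fn(x_noisy, t)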
| dvrp wrote:
| oh wow, never thought of it that way
| swyx wrote:
| > A consistency model tries to predict where that trajectory
| ends up, given the current position in image space. Hence,
| what used to be a step-by-step process becomes a one-step
| process.
|
| no, that wasn't a sufficient explanation for me. what is the
| prediction method here? why was diffusion necessary in the
| past? what tradeoffs does this approach have?
| quadrature wrote:
| from https://arxiv.org/abs/2310.04378 it sounds like it's a
| form of distillation of an SD model. So I'm guessing it can't
| be directly trained, but once you have a trained diffusion
| model you can distil a predictor which cuts out the iterative
| steps.
|
| While it can do 1-step generation, the output quality looks a
| ton better with additional steps.
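|
| Roughly, the distillation objective looks something like this
| (a toy sketch of the self-consistency loss with hypothetical
| helper functions; see the paper for the real formulation):
|
|     # Consistency distillation: the student f_theta should map
|     # adjacent points on the teacher's denoising trajectory to
|     # the same predicted clean image.
|     import torch
|     import torch.nn.functional as F
|
|     def distillation_loss(x_t, t, teacher_ode_step,
|                           f_theta, f_theta_ema):
|         # Frozen teacher takes one ODE step along the
|         # trajectory: (x_t, t) -> (x_prev, t_prev).
|         x_prev, t_prev = teacher_ode_step(x_t, t)
|
|         # Student prediction at the current point should match
|         # the EMA student's prediction one step earlier.
|         pred_t = f_theta(x_t, t)
|         with torch.no_grad():
|             pred_prev = f_theta_ema(x_prev, t_prev)
|
|         return F.mse_loss(pred_t, pred_prev)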
| dvrp wrote:
| You can train it directly. This is from the paper: "An LCM
| demands merely 32 A100 GPU Hours training for 2-step
| inference [...]"
| billconan wrote:
| in my defense, if you look at the original paper
|
| https://arxiv.org/pdf/2303.01469.pdf
|
| the exact neural network used for the prediction method isn't
| specified. Apparently many neural networks can be used for it,
| as long as they fulfill certain requirements.
|
| > why was diffusion necessary in the past?
|
| in the paper, one way to train a consistency model is to
| distill an existing diffusion model, but it can also be
| trained from scratch.
|
| "why was it necessary in the past?" doesn't bother me that
| much. Before people knew to use molds to make candles, they
| made them by dipping threads into wax. Why was thread dipping
| necessary? It's just a stepping stone in the development of
| the technology.
| tasgon wrote:
| Disclaimer: I'm actively working on this tool.
|
| This is in a closed beta for now (while we work on
| provisioning enough GPU compute), but we're hoping to make it
| public later.
| gailees wrote:
| How do you plan on stopping people from using this tool
| maliciously?
| monkellipse wrote:
| The same way you stop people from using a hammer maliciously?
| byrneml wrote:
| Could you apply the same technique to real-time video editing?
| kmavm wrote:
| (Disclaimer: I'm an investor in Krea AI.)
|
| When Diego first showed me this animation, I wasn't completely
| sure what I was looking at, because I assumed the left and
| right sides had been composited together after the fact. But
| it's a single screen recording; the generated side on the
| right keeps pace with the riffing the artist does in the
| little paint program on the left.
|
| There is no substitute for low latency in creative tools; if
| you have to sit there holding your breath every time you try
| something, you aren't just linearly slowed down. There are
| points that are simply too hard to reach with the slow,
| deliberate, 30+ second steps that classical diffusion
| generation requires.
|
| When I first heard about consistency, my assumption was that it
| was just an accelerator. I expected we'd get faster, cheaper
| versions of the same kinds of interactions with visual models
| we're used to seeing. The fine hackers at Krea did not take long
| to prove me wrong!
| dvrp wrote:
| Exactly.
|
| There is no substitute for real-time when you're doing creative
| work.
|
| That's why GitHub Copilot works so well; that's why ChatGPT
| struck a chord with people: it streamed the characters back to
| you fast.
|
| At first, I was skeptical too. I asked myself, "What about
| Photoshop 1.0? They surely couldn't do it in real-time." It
| turns out that even then you needed it. Of course, the compute
| wasn't there to translate _all_ the rasterized pixel values
| that form an image within a layer in real time, but they had a
| trick: they showed you an outline that told you, the user,
| where the content _would_ render once you let go of the mouse.
|
| You can see the workflow here:
|
| > https://www.youtube.com/watch?v=ftaIzyrMDqE
|
| It applies to general-purpose tools too; you can see the same
| thing in this Mac OS 8 demo (it runs in the browser!):
|
| > https://infinitemac.org/1998/Mac%20OS%208.1
| vmatsiiako wrote:
| Does this have anything at all to do with LoRAs?
| vipermu wrote:
| indeed; we're able to make it work with SDXL thanks to a new
| technique released yesterday called LCM-LoRA.
|
| with LCM-LoRA you can turn models like SDXL into LCMs without
| any additional training, and you can stack other style LoRAs
| on top, like the ones you find on civit.ai
|
| in case you're interested, here's the technical report about
| LCM-LoRA: https://arxiv.org/abs/2311.05556
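|
| for anyone who wants to try the combination themselves, here's
| a minimal sketch using Hugging Face diffusers (the public
| checkpoint names and parameters below are my assumptions for
| illustration, not what we run in production):
|
|     # Turn SDXL into a few-step LCM with the LCM-LoRA weights.
|     import torch
|     from diffusers import DiffusionPipeline, LCMScheduler
|
|     pipe = DiffusionPipeline.from_pretrained(
|         "stabilityai/stable-diffusion-xl-base-1.0",
|         torch_dtype=torch.float16,
|     ).to("cuda")
|
|     # Swap in the LCM scheduler and load the acceleration
|     # LoRA; a style LoRA could be stacked the same way.
|     pipe.scheduler = LCMScheduler.from_config(
|         pipe.scheduler.config
|     )
|     pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
|
|     image = pipe(
|         prompt="an isometric pixel-art castle",
|         num_inference_steps=4,
|         guidance_scale=1.0,  # low/no CFG works best here
|     ).images[0]
|     image.save("lcm_lora_sdxl.png")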
___________________________________________________________________
(page generated 2023-11-10 23:00 UTC)