[HN Gopher] Generate images in one second on your Mac using a la...
___________________________________________________________________
Generate images in one second on your Mac using a latent
consistency model
Author : bfirsh
Score : 181 points
Date : 2023-10-27 16:37 UTC (6 hours ago)
(HTM) web link (replicate.com)
(TXT) w3m dump (replicate.com)
| AIorNot wrote:
| Awesome
| grandpa_yeti wrote:
| Seeing this kind of image generation limited to M series Macs
| just goes to show how far ahead Apple is in the notebook GPU
| game.
| minimaxir wrote:
| It's _possible_ to do this on non-Apple Silicon Macs, just more
| annoying. There are a few generative AI implementations which
| use raw Metal, but I'm not sure which one is the most popular.
| andybak wrote:
| I've got a Windows laptop with an RTX 3080 in it that runs this
| model no problem. I don't have it to hand or else I'd post some
| timings.
|
| On my desktop PC with a 4090 in it I was getting speeds of 0.2
| to 0.3 seconds for reasonably acceptable quality settings, so I
| would expect 0.5s or so on the laptop.
|
| What Apple _are_ ahead on is doing this on a fanless laptop
| that doesn't hit internal temperatures of triple digits.
| traceroute66 wrote:
| > What Apple are ahead on is doing this on a fanless laptop
| that doesn't hit internal temperatures of triple digits.
|
| You also forgot the bit where Apple are ahead by doing it on
| a laptop that achieves this performance _without_ needing to
| be tethered to a power socket.
| alex_duf wrote:
| It's the same thing; power is heat when talking about chips
| echelon wrote:
| > You also forgot the bit where Apple are ahead by doing it
| on a laptop that achieves this performance without needing
| to be tethered to a power socket.
|
| Kind of sad that a huge anti-competitive, trillion dollar
| company is the one offering it. Especially given their
| stances around user freedom.
|
| I'd much rather innovation be distributed. The goalposts
| should be moved to a point where everyone is pushing towards the
| next thing. Having Apple be the only game in town is
| unhealthy.
| astrange wrote:
| I'd say that rather than one company being the only one who
| can do it, there is only one company that can't do it, and
| it's Intel.
| brucethemoose2 wrote:
| > What Apple are ahead on is doing this on a fanless laptop
| that doesn't hit internal temperatures of triple digits.
|
| I think you could pull this off on an Asus G14 in an ultra
| power-saver mode, with the fans off or running inaudibly. The
| cooling is so beefy that it will actually work fanless if you
| throttle everything down and mostly keep the GPU asleep.
|
| The M chips could certainly sustain image generation better
| without a fan.
| errnoh wrote:
| 45 it/s (~0.1s per image) on a 7900 XTX here, so a discrete
| GPU is still an order of magnitude faster, with a lot higher
| power draw than the Macs. Being only 10x slower while
| untethered is quite a nice outcome.
| wasyl wrote:
| At this point what Apple is ahead with is hype that M Macs
| are that fast, and the developers targeting them because
| things just work. Plenty of people should be able to run
| these models locally, but there's close to no nice software
| that does that out of the box for Windows or Linux.
| astrange wrote:
| It's because of the unified memory architecture. It's
| harder/different to do this on x86, because you have to
| have a large memory GPU and target that.
| orbital-decay wrote:
| Not sure why you think it's limited to M series Macs or has
| anything to do with Apple at all. It's just instructions for
| running a diffusion model, trained in a novel way, on
| particular hardware.
| amelius wrote:
| It mostly shows how shitty compatibility is between platforms
| that share the same roots.
| liuliu wrote:
| The implementation is not even optimized for Macs. LCM is just
| very easy to make fast (batch size = 1 and only 2 to 8 steps,
| depending on what kind of headline you are trying to make).
| filterfiber wrote:
| They also have a decent advantage for LLMs because of their
| high bandwidth to system memory, versus GPUs whose limited
| VRAM is connected to system memory over PCIe.
| novaomnidev wrote:
| Got this working on an Intel Mac
| oldstrangers wrote:
| Interesting timing, because part of me thinks Apple's Scary
| Fast event has to do with generative AI.
| joshstrange wrote:
| I think the current rumors are MBPs, which would be odd (doing
| the Pros before the base models), but I wouldn't complain.
| m3kw9 wrote:
| They likely won't show any generative software till the next
| macOS version comes out; they don't usually showcase
| standalone features without a bigger strategy that includes
| the OS
| agloe_dreams wrote:
| This... but as a menu item that does it for you.
| m3kw9 wrote:
| GPT-4 can likely give you code for this
| bigethan wrote:
| Mac shortcuts are exactly the use case for this. Menu bar, ask
| for a prompt, run script. I was always wary of shortcuts, but
| they're quite powerful and nicely integrated with the OS in the
| latest versions
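|
| A minimal sketch of the script such a shortcut could shell out
| to, reusing the repo's main.py and the flags quoted elsewhere
| in this thread (the clone path and fallback prompt are made
| up):
|
|     import subprocess
|     import sys
|     from pathlib import Path
|
|     # Hypothetical glue for a Shortcuts "Run Shell Script"
|     # action: the prompt arrives as argv[1].
|     repo = Path.home() / "latent-consistency-model"  # assumed clone path
|     prompt = sys.argv[1] if len(sys.argv) > 1 else "an astronaut riding a horse"
|     subprocess.run(
|         ["python3", "main.py", prompt,
|          "--steps", "4", "--width", "512", "--height", "512"],
|         cwd=repo, check=True,
|     )
|     subprocess.run(["open", str(repo)])  # macOS: reveal the folder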
| herpdyderp wrote:
| 32GB M1 Max is taking 25 seconds on the exact same prompt as in
| the example.
|
| Edit: it seems the "per second" requires the `--continuous` flag
| to bypass the initial startup time. With that, I'm now seeing the
| ~1 second per image time (if initial startup time is ignored).
| m3kw9 wrote:
| What does bypassing the startup time really do? Does it keep
| everything in memory or something?
| fassssst wrote:
| Probably, you have to load the weights from disk at some
| point.
| echelon wrote:
| That's exactly it. These models are huge.
| cal85 wrote:
| I'm probably missing something but if the bottleneck is
| disk read speed, wouldn't it only take about 5-6 seconds
| to fill the entire 32GB memory from disk? I just googled
| and found a benchmark quoting 5,507 MB/s read on an M1
| Max.
| liuliu wrote:
| PyTorch checkpoint is slow to load.
| brucethemoose2 wrote:
| The diffusers format this repo uses should be faster, but
| there is still some overhead, yeah.
| Yoric wrote:
| Yeah, the PyTorch disk format is pretty bad.
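|
| For what it's worth, the usual workaround is converting the
| weights to safetensors once, since safetensors memory-maps
| tensors instead of unpickling them. A rough sketch, assuming a
| plain checkpoint with no shared tensors and hypothetical file
| names:
|
|     import time
|
|     import torch
|     from safetensors.torch import load_file, save_file
|
|     # One-off conversion from a pickle checkpoint. Some
|     # checkpoints nest weights under "state_dict".
|     state = torch.load("unet.ckpt", map_location="cpu")
|     state = state.get("state_dict", state)
|     save_file({k: v.contiguous() for k, v in state.items()},
|               "unet.safetensors")
|
|     # Loading is usually much faster than torch.load.
|     t0 = time.time()
|     state = load_file("unet.safetensors")
|     print(f"loaded in {time.time() - t0:.2f}s")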
| ForkMeOnTinder wrote:
| Why bother with the safety checker if the model is running
| locally? I wonder how much faster it would be if the safety
| checks were skipped.
| ozr wrote:
| Not much faster tbh, but it's a bit of virtue signaling you're
| often required to do with generative AI.
| whatsthenews wrote:
| Seems like a waste of time; more of a nice-to-have / tip of
| the cap to Yud
| radicality wrote:
| Was gonna comment the same thing; it feels ridiculous to
| include it here for local use. I believe you should be able to
| remove it if you edit the Python inference code from Hugging
| Face.
|
| edit: I tried it out by copying this pipeline file locally and
| then disabling the safety checker.
| https://raw.githubusercontent.com/huggingface/diffusers/main...
|
| On my M1 MacBook, I did a test of 10 images, including the
| one-off loading time. With safety checker: 10.51s, without:
| 9.48s. So not that big of a hit.
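|
| For reference, with stock diffusers you can usually just null
| the checker out on the pipeline object instead of editing the
| pipeline file. A sketch; the model ID and custom pipeline name
| are my assumptions, not necessarily this repo's exact code:
|
|     from diffusers import DiffusionPipeline
|
|     pipe = DiffusionPipeline.from_pretrained(
|         "SimianLuo/LCM_Dreamshaper_v7",
|         custom_pipeline="latent_consistency_txt2img",
|     )
|     # Nulling the checker skips one classifier forward pass
|     # per generated image.
|     pipe.safety_checker = None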
| Turing_Machine wrote:
| I agree. It's pretty easy to bypass if you know a bit of
| Python, though.
|
| Doing a search for "nsfw" in all subdirectories seems to turn
| up all the files you need to edit.
| tobr wrote:
| What will be possible to do once these things run at interactive
| frame rates? It's a little mind boggling to think about what
| types of experiences this will allow not so long from now.
| Maxion wrote:
| STT --> Prompt --> Generating live imagery from your rambles?
| iinnPP wrote:
| Trippy VR is where my mind goes. Specifically with eye tracking
| to determine where to go and what to generate next.
| throwawayfm wrote:
| Buy 60 machines, and it's interactive.
| astrange wrote:
| Alas you've mixed up throughput and latency.
|
| But you might be able to generate at 15fps and interpolate
| between frames or something.
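|
| A crude sketch of that idea: blend consecutive generated
| frames with PIL to double the apparent frame rate (the
| function is mine, not anything from the repo):
|
|     from PIL import Image
|
|     def cross_fade(frames, factor=2):
|         """Insert simple blends between consecutive frames;
|         frames must share size and mode."""
|         out = []
|         for a, b in zip(frames, frames[1:]):
|             out.append(a)
|             for i in range(1, factor):
|                 out.append(Image.blend(a, b, i / factor))
|         out.append(frames[-1])
|         return out
|
|     # e.g. 15 generated fps -> ~30 displayed fps with factor=2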
| simple10 wrote:
| This is awesome! It only takes a few minutes to get installed and
| running. On my M2 Mac, it generates sequential images in about a
| second when using the continuous flag. For a single image, it
| takes about 20 seconds to generate due to the initial script
| loading time (loading the model into memory?).
|
| I know what I'll be doing this weekend... generating artwork for
| my 9 yo kid's video game in Game Maker Studio!
|
| Does anyone know any quick hacks to the python code to
| sequentially prompt the user for input without purging the model
| from memory?
| Maxion wrote:
| > It only takes a few minutes to get installed and running
|
| A few minutes? I have to download at least 5GiB of data to get
| this running.
| simple10 wrote:
| Lol. Yeah, I have 1.2 Gbps internet.
| m3kw9 wrote:
| The stupid script seems not to know how to save to disk, so it
| re-downloads on every run.
| simple10 wrote:
| Answered my own question. Here's how to add an --interactive
| flag to the script to continuously ask for prompts and generate
| images without needing to reload the model into memory each
| time.
|
| https://github.com/replicate/latent-consistency-model/commit...
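|
| The gist of the change is a prompt loop around the generation
| call, so only the first image pays the model-load cost. A
| sketch, not the commit itself; the pipeline setup and
| parameters here are assumptions:
|
|     from diffusers import DiffusionPipeline
|
|     pipe = DiffusionPipeline.from_pretrained(
|         "SimianLuo/LCM_Dreamshaper_v7",
|         custom_pipeline="latent_consistency_txt2img",
|     ).to("mps")  # Apple Silicon; "cuda" or "cpu" elsewhere
|
|     n = 0
|     while True:
|         prompt = input("prompt> ").strip()
|         if not prompt:
|             break  # empty prompt exits
|         image = pipe(prompt, num_inference_steps=4,
|                      guidance_scale=8.0).images[0]
|         image.save(f"out-{n:03d}.png")
|         n += 1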
| firechickenbird wrote:
| The quality of these LCMs is not the best though
| simple10 wrote:
| True, not the best quality, but still fantastic results for a
| free model running locally on a laptop. Setting the steps
| between 10-20 seemed to produce the best results for me for
| realistic-looking images. About one out of 10 images was
| useful for my test case of "a realistic photo of a german
| shepard riding a motorcycle through Tokyo at night"
|
| https://github.com/simple10/ai-image-generator/blob/main/exa...
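|
| A quick way to reproduce that comparison, assuming a loaded
| LCM pipeline as sketched elsewhere in the thread (the guidance
| value is a guess):
|
|     # `pipe` is a loaded LCM pipeline; prompt kept verbatim.
|     prompt = ("a realistic photo of a german shepard riding "
|               "a motorcycle through Tokyo at night")
|     for steps in (4, 8, 10, 20):
|         image = pipe(prompt, num_inference_steps=steps,
|                      guidance_scale=8.0).images[0]
|         image.save(f"steps-{steps:02d}.png")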
| brucethemoose2 wrote:
| > Setting the steps between 10-20
|
| But that's the point where regular diffusion (with the UniPC
| scheduler and FreeU) overtakes this in terms of quality.
| simple10 wrote:
| Good point. I haven't done a lot of testing yet. I'm not
| sure if the default of 8 steps yields poorer results than
| 10-20 steps. Either way, it was fast on my M2 Mac with 8 to
| 20 steps, much faster than other models I've played with.
| m3kw9 wrote:
| Every time I execute:
|
| python main.py "a beautiful apple floating in outer space,
| like a planet" --steps 4 --width 512 --height 512
|
| it re-downloads 4 gigs worth of stuff. Can't you have the
| script save it, and check if it's there before downloading, or
| am I doing something wrong?
| simple10 wrote:
| Did you enable the virtualenv first? If not, it might not be
| caching the models properly.
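|
| Worth checking the Hugging Face cache too: downloads normally
| land in ~/.cache/huggingface and should be reused across runs.
| A sketch of pinning the cache explicitly (the path and model
| ID are assumptions):
|
|     import os
|
|     # Set the cache location *before* importing diffusers;
|     # the path here is arbitrary.
|     os.environ["HF_HOME"] = os.path.expanduser("~/hf-cache")
|
|     from diffusers import DiffusionPipeline
|
|     pipe = DiffusionPipeline.from_pretrained(
|         "SimianLuo/LCM_Dreamshaper_v7",
|         custom_pipeline="latent_consistency_txt2img",
|         cache_dir=os.path.expanduser("~/hf-cache"),  # or rely on HF_HOME
|     )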
| jandrese wrote:
| For me it does not re-download anything on the second run. But
| it is also only running on the CPU and is slow AF.
|
| With 5 iterations the quality is...not good. It looks just like
| Stable Diffusion with low iteration count. Maybe there is some
| magic that kicks in if you have a more powerful Mac?
| simple10 wrote:
| Does anyone know of other image generation models that run well
| on a M1/M2 mac laptop?
|
| I'd like to do some comparison testing. The model in the post is
| fast but results are hit or miss for quality.
| liuliu wrote:
| There are plenty of models to try with Draw Things app. You can
| try SDXL on it to see what the quality looks like. The speed
| comparison here: https://engineering.drawthings.ai/integrating-
| metal-flashatt...
| simple10 wrote:
| Thanks!
| brucethemoose2 wrote:
| https://github.com/lllyasviel/Fooocus#mac
|
| It's not fast, but it's SOTA local quality as far as I know, and
| I've tried many UIs and augmentations.
|
| Also, maybe it will run better if you grab PyTorch 2.1 or a
| nightly build.
| hackthemack wrote:
| If you want to run this on a Linux machine using the machine's
| CPU, follow the instructions, but before actually running the
| command to generate an image, open up main.py and change line
| 17 to model.to(torch_device="cpu",
| torch_dtype=torch.float32).to('cpu:0')
|
| Basically, change the backend from mps to cpu.
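|
| A slightly more general version of that edit picks the backend
| at runtime instead of hardcoding it; a sketch reusing the same
| .to() call (model is the pipeline object main.py builds):
|
|     import torch
|
|     # Fall back to CPU when Metal (MPS) isn't available,
|     # e.g. on Linux; fp16 is slow/unsupported on CPU.
|     if torch.backends.mps.is_available():
|         device, dtype = "mps", torch.float16
|     else:
|         device, dtype = "cpu", torch.float32
|
|     model.to(torch_device=device, torch_dtype=dtype)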
| brucethemoose2 wrote:
| For linux CPU only, you want
| https://github.com/rupeshs/fastsdcpu
| m3kw9 wrote:
| It's fast, but only at 512x512 res: it will generate an image,
| from script start to finish, in 5 seconds. If you up it to
| 1024 it takes 10x as long.
|
| This is on an M2 Max, 32GB
| brucethemoose2 wrote:
| Yeah, high-res performance is very non-linear, especially
| without swapping out the attention for xformers,
| FlashAttention-2 or torch SDP (and I don't think torch MPS
| works with any of those).
|
| That model doesn't work well at 1024x1024 anyway without some
| augmentations. You want this instead:
| https://huggingface.co/segmind/SSD-1B
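|
| For those attention swaps, diffusers exposes them as
| one-liners; a sketch, assuming a loaded pipeline `pipe` and a
| recent diffusers build (and, as noted, they may not help on
| MPS):
|
|     # xformers memory-efficient attention (CUDA; needs the
|     # xformers package installed):
|     pipe.enable_xformers_memory_efficient_attention()
|
|     # PyTorch 2.x scaled-dot-product attention:
|     from diffusers.models.attention_processor import AttnProcessor2_0
|     pipe.unet.set_attn_processor(AttnProcessor2_0())
|
|     # Cheaper fallback that also helps at high resolutions:
|     pipe.enable_attention_slicing()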
| LauraMedia wrote:
| Thought it was too good to be true, tried it with an M2 Pro
| MacBook Pro.
|
| Generation takes 20-40 seconds, when using "--continuous" it
| takes 20-40 seconds once and then keeps generating every 3-5
| seconds.
| naet wrote:
| Well, how do they look? I've seen some other image generation
| optimizations, but a lot of them make a significant tradeoff in
| reduced quality.
___________________________________________________________________
(page generated 2023-10-27 23:01 UTC)