[HN Gopher] Generate images in one second on your Mac using a la...
       ___________________________________________________________________
        
       Generate images in one second on your Mac using a latent
       consistency model
        
       Author : bfirsh
       Score  : 181 points
       Date   : 2023-10-27 16:37 UTC (6 hours ago)
        
 (HTM) web link (replicate.com)
 (TXT) w3m dump (replicate.com)
        
       | AIorNot wrote:
       | Awesome
        
       | grandpa_yeti wrote:
       | Seeing this kind of image generation limited to M series Macs
       | just goes to show how far ahead Apple is in the notebook GPU
       | game.
        
         | minimaxir wrote:
          | It's _possible_ to do on non-Apple Silicon Macs, just more
          | annoying. There are a few generative AI implementations that
          | use raw Metal, but I'm not sure which is the most popular.
        
         | andybak wrote:
          | I've got a Windows laptop with an RTX 3080 in it that runs
          | this model no problem. I don't have it to hand or else I'd
          | post some timings.
          | 
          | On my desktop PC with a 4090 in it, I was getting speeds of
          | 0.2 to 0.3 seconds at reasonably acceptable quality settings,
          | so I would expect 0.5s or so on the laptop.
          | 
          | What Apple _are_ ahead on is doing this on a fanless laptop
          | that doesn't hit internal temperatures of triple digits.
        
           | traceroute66 wrote:
           | > What Apple are ahead on is doing this on a fanless laptop
           | that doesn't hit internal temperatures of triple digits.
           | 
            | You also forgot the bit where Apple are ahead in doing it
            | on a laptop that doesn't need to be tethered to a power
            | socket to achieve that performance.
        
             | alex_duf wrote:
              | It's the same thing; power is heat when talking about
              | chips.
        
             | echelon wrote:
              | > You also forgot the bit where Apple are ahead in doing
              | it on a laptop that doesn't need to be tethered to a
              | power socket to achieve that performance.
             | 
             | Kind of sad that a huge anti-competitive, trillion dollar
             | company is the one offering it. Especially given their
             | stances around user freedom.
             | 
              | I'd much rather innovation be distributed. The goalposts
              | should be moved to a point where everyone is pushing
              | towards the next thing. Having Apple be the only game in
              | town is unhealthy.
        
               | astrange wrote:
                | I'd say that rather than one company being the only
                | one that can do it, there is only one company that
                | can't do it, and it's Intel.
        
           | brucethemoose2 wrote:
           | > What Apple are ahead on is doing this on a fanless laptop
           | that doesn't hit internal temperatures of triple digits.
           | 
            | I think you could pull this off on an Asus G14 in an ultra
            | power saver mode, with the fans off or running inaudibly.
            | The cooling is so beefy it will actually work fanless if
            | you throttle everything down and mostly keep the GPU
            | asleep.
           | 
           | The M chips could certainly sustain image generation better
           | without a fan.
        
           | errnoh wrote:
            | 45 it/s (~0.1s per image) on a 7900XTX here, so it's still
            | an order of magnitude faster on a GPU, at a lot higher
            | power draw than the Macs. Being only 10x slower while
            | untethered is quite a nice outcome.
        
           | wasyl wrote:
            | At this point, what Apple is ahead with is the hype that M
            | Macs are that fast, and developers targeting them because
            | things just work. Plenty of people should be able to run
            | these models locally, but there's close to no nice
            | software that does that out of the box for Windows or
            | Linux.
        
             | astrange wrote:
             | It's because of the unified memory architecture. It's
             | harder/different to do this on x86, because you have to
             | have a large memory GPU and target that.
        
         | orbital-decay wrote:
          | Not sure why you think it's limited to M series Macs or has
          | anything to do with Apple at all. It's just instructions for
          | running a diffusion model, trained in a novel way, on
          | particular hardware.
        
         | amelius wrote:
         | It mostly shows how shitty compatibility is between platforms
         | that share the same roots.
        
         | liuliu wrote:
          | The implementation is not even optimized for Macs. LCM is
          | just easy to make fast (batch size = 1 and only 2 to 8
          | steps, depending on what kind of headline you are trying to
          | make).
        
         | filterfiber wrote:
          | They also have a decent advantage for LLMs because of their
          | high bandwidth to system memory, versus GPUs whose limited
          | VRAM is connected to system memory over PCIe.
        
         | novaomnidev wrote:
          | Got this working on an Intel Mac.
        
       | oldstrangers wrote:
        | Interesting timing, because part of me thinks Apple's "Scary
        | Fast" event has to do with generative AI.
        
         | joshstrange wrote:
          | I think the current rumors are MBPs. It would be odd to do
          | the Pros before the base models, but I wouldn't complain.
        
         | m3kw9 wrote:
          | They likely won't show any generative software until the
          | next version of macOS comes out; they don't usually showcase
          | standalone features without a bigger strategy that includes
          | the OS.
        
       | agloe_dreams wrote:
        | This... but as a menu item that does it for you.
        
         | m3kw9 wrote:
          | GPT-4 can likely give you code for this.
        
         | bigethan wrote:
          | Mac Shortcuts are exactly the use case for this: menu bar,
          | ask for a prompt, run the script. I was always wary of
          | Shortcuts, but they're quite powerful and nicely integrated
          | with the OS in the latest versions.
        
       | herpdyderp wrote:
        | A 32GB M1 Max is taking 25 seconds on the exact same prompt as
        | in the example.
        | 
        | Edit: it seems the "per second" requires the `--continuous`
        | flag to bypass the initial startup time. With that, I'm now
        | seeing the ~1 second per image time (ignoring the initial
        | startup).
        
         | m3kw9 wrote:
          | What does bypassing the startup time really do? Does it keep
          | everything in memory or something?
        
           | fassssst wrote:
           | Probably, you have to load the weights from disk at some
           | point.
        
             | echelon wrote:
             | That's exactly it. These models are huge.
        
               | cal85 wrote:
               | I'm probably missing something but if the bottleneck is
               | disk read speed, wouldn't it only take about 5-6 seconds
               | to fill the entire 32GB memory from disk? I just googled
               | and found a benchmark quoting 5,507 MB/s read on an M1
               | Max.
        
               | liuliu wrote:
                | The PyTorch checkpoint format is slow to load.
        
               | brucethemoose2 wrote:
               | The diffusers format this repo uses should be faster, but
               | there is still some overhead, yeah.
        
               | Yoric wrote:
               | Yeah, the PyTorch disk format is pretty bad.
        
       | ForkMeOnTinder wrote:
       | Why bother with the safety checker if the model is running
       | locally? I wonder how much faster it would be if the safety
       | checks were skipped.
        
         | ozr wrote:
         | Not much faster tbh, but it's a bit of virtue signaling you're
         | often required to do with generative AI.
        
         | whatsthenews wrote:
          | Seems like a waste of time; more of a nice-to-have / tip of
          | the cap to yud.
        
         | radicality wrote:
          | Was gonna comment the same thing; it feels ridiculous to
          | include it here for local use. You should be able to remove
          | it if you edit the Python inference code from Hugging Face.
         | 
         | edit: I tried it out by copying this pipeline file locally and
         | then disabling the safety checker.
         | https://raw.githubusercontent.com/huggingface/diffusers/main...
         | 
          | On my M1 MacBook, I did a test of 10 images, including the
          | one-off loading time. With the safety checker: 10.51s;
          | without: 9.48s. So not that big of a hit.
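          | 
          | For reference, the change amounts to something like this (a
          | sketch, not the repo's exact code; the model id, pipeline
          | name, and call arguments are assumptions):
          | 
          |     from diffusers import DiffusionPipeline
          | 
          |     # Load the LCM pipeline (model id assumed; main.py may
          |     # load it differently).
          |     pipe = DiffusionPipeline.from_pretrained(
          |         "SimianLuo/LCM_Dreamshaper_v7",
          |         custom_pipeline="latent_consistency_txt2img",
          |     )
          |     pipe.to("mps")
          | 
          |     # diffusers pipelines skip the NSFW pass when this
          |     # attribute is None.
          |     pipe.safety_checker = None
          | 
          |     image = pipe("a photo of a corgi", num_inference_steps=4,
          |                  guidance_scale=8.0).images[0]
          |     image.save("out.png")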
        
         | Turing_Machine wrote:
         | I agree. It's pretty easy to bypass if you know a bit of
         | Python, though.
         | 
         | Doing a search for "nsfw" in all subdirectories seems to turn
         | up all the files you need to edit.
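          | 
          | Same idea in Python, if you'd rather not grep (a sketch):
          | 
          |     from pathlib import Path
          | 
          |     # Print every Python file under the repo that mentions
          |     # "nsfw".
          |     for p in Path(".").rglob("*.py"):
          |         if "nsfw" in p.read_text(errors="ignore").lower():
          |             print(p)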
        
       | tobr wrote:
        | What will be possible once these things run at interactive
        | frame rates? It's a little mind-boggling to think about what
        | types of experiences this will allow not so long from now.
        
         | Maxion wrote:
          | Speech-to-text --> prompt --> generating live imagery from
          | your rambles?
        
         | iinnPP wrote:
         | Trippy VR is where my mind goes. Specifically with eye tracking
         | to determine where to go and what to generate next.
        
         | throwawayfm wrote:
          | Buy 60 machines, and it's interactive.
        
           | astrange wrote:
           | Alas you've mixed up throughput and latency.
           | 
           | But you might be able to generate at 15fps and interpolate
           | between them or something.
        
       | simple10 wrote:
        | This is awesome! It only takes a few minutes to get installed
        | and running. On my M2 Mac, it generates sequential images in
        | about a second when using the continuous flag. For a single
        | image, it takes about 20 seconds to generate due to the
        | initial script loading time (loading the model into memory?).
       | 
       | I know what I'll be doing this weekend... generating artwork for
       | my 9 yo kid's video game in Game Maker Studio!
       | 
        | Does anyone know any quick hacks to the Python code to
        | sequentially prompt the user for input without purging the
        | model from memory?
        
         | Maxion wrote:
         | > It only takes a few minutes to get installed and running
         | 
         | A few minutes? I have to download at least 5GiB of data to get
         | this running.
        
           | simple10 wrote:
            | Lol. Yeah, I have 1.2Gbps internet.
        
           | m3kw9 wrote:
            | The stupid script seems to not know how to save to disk,
            | so it downloads on every run.
        
         | simple10 wrote:
         | Answered my own question. Here's how to add an --interactive
         | flag to the script to continuously ask for prompts and generate
         | images without needing to reload the model into memory each
         | time.
         | 
         | https://github.com/replicate/latent-consistency-model/commit...
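          | 
          | The core of it is just a loop that keeps the pipeline in
          | memory between prompts. Roughly (a sketch; assumes the
          | diffusers pipeline object main.py builds is called `pipe`):
          | 
          |     import os
          |     import time
          | 
          |     def interactive_loop(pipe, steps=4, width=512, height=512):
          |         # Model is already loaded; an empty prompt exits.
          |         os.makedirs("output", exist_ok=True)
          |         n = 0
          |         while True:
          |             prompt = input("prompt> ").strip()
          |             if not prompt:
          |                 break
          |             start = time.time()
          |             image = pipe(prompt, num_inference_steps=steps,
          |                          width=width, height=height).images[0]
          |             n += 1
          |             image.save(f"output/image-{n}.png")
          |             print(f"generated in {time.time() - start:.2f}s")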
        
       | firechickenbird wrote:
        | The quality of these LCMs is not the best, though.
        
         | simple10 wrote:
          | True, not the best quality, but still fantastic results for
          | a free model running locally on a laptop. Setting the steps
          | between 10-20 seemed to produce the best results for me for
          | realistic-looking images. About one out of 10 images was
          | useful for my test case of "a realistic photo of a german
          | shepard riding a motorcycle through Tokyo at night"
         | 
         | https://github.com/simple10/ai-image-generator/blob/main/exa...
        
           | brucethemoose2 wrote:
           | > Setting the steps between 10-20
           | 
            | But that's the point where regular diffusion (with the
            | UniPC scheduler and FreeU) overtakes this in terms of
            | quality.
        
             | simple10 wrote:
              | Good point. I haven't done a lot of testing yet. I'm not
              | sure if the default of 8 steps yields poorer results
              | than 10-20 steps. Either way, it was fast on my M2 Mac
              | with 8 to 20 steps, much faster than other models I've
              | played with.
        
       | m3kw9 wrote:
        | Every time I execute:
        | 
        |     python main.py \
        |       "a beautiful apple floating in outer space, like a planet" \
        |       --steps 4 --width 512 --height 512
        | 
        | it re-downloads 4 gigs of stuff. Can't you have the script
        | save the weights, check if they're there, and only download if
        | missing? Or am I doing something wrong?
        
         | simple10 wrote:
         | Did you enable the virtualenv first? If not, it might not be
         | caching the models properly.
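          | 
          | You can also pin the Hugging Face download cache explicitly
          | so repeated runs reuse the weights. A sketch (the model id
          | and pipeline name are assumptions about what main.py loads;
          | cache_dir is a standard from_pretrained argument):
          | 
          |     from diffusers import DiffusionPipeline
          | 
          |     # Weights land in ./model_cache and get reused on the
          |     # next run instead of being re-downloaded.
          |     pipe = DiffusionPipeline.from_pretrained(
          |         "SimianLuo/LCM_Dreamshaper_v7",
          |         custom_pipeline="latent_consistency_txt2img",
          |         cache_dir="./model_cache",
          |     )
          | 
          | Setting the HF_HOME environment variable works too.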
        
         | jandrese wrote:
          | For me it does not re-download anything on the second run.
          | But it is also only running on the CPU and is slow AF.
          | 
          | With 5 iterations the quality is... not good. It looks just
          | like Stable Diffusion at a low iteration count. Maybe there
          | is some magic that kicks in if you have a more powerful Mac?
        
       | simple10 wrote:
        | Does anyone know of other image generation models that run
        | well on an M1/M2 Mac laptop?
        | 
        | I'd like to do some comparison testing. The model in the post
        | is fast, but the results are hit or miss for quality.
        
         | liuliu wrote:
          | There are plenty of models to try with the Draw Things app.
          | You can try SDXL on it to see what the quality looks like.
          | The speed
         | comparison here: https://engineering.drawthings.ai/integrating-
         | metal-flashatt...
        
           | simple10 wrote:
           | Thanks!
        
         | brucethemoose2 wrote:
         | https://github.com/lllyasviel/Fooocus#mac
         | 
          | It's not fast, but it's SOTA local quality as far as I know,
          | and I've tried many UIs and augmentations.
          | 
          | Also, it may run better if you grab PyTorch 2.1 or a nightly
          | build.
        
       | hackthemack wrote:
        | If you want to run this on a Linux machine using the machine's
        | CPU, follow the instructions, but before actually running the
        | command to generate an image, open up main.py and change line
        | 17 to:
        | 
        |     model.to(torch_device="cpu", torch_dtype=torch.float32).to('cpu:0')
        | 
        | Basically, this changes the backend from mps to cpu.
        
         | brucethemoose2 wrote:
          | For Linux CPU-only, you want
          | https://github.com/rupeshs/fastsdcpu
        
       | m3kw9 wrote:
        | It's fast, but only at 512x512; it will generate an image,
        | from script start to finish, in 5 seconds. If you up it to
        | 1024, it takes 10x as long.
        | 
        | This is on an M2 Max, 32GB.
        
         | brucethemoose2 wrote:
          | Yeah, high-res performance is very non-linear, especially
          | without swapping out the attention for xformers,
          | FlashAttention-2, or torch SDP (and I don't think torch MPS
          | works with any of those).
         | 
         | That model doesn't work well at 1024x1024 anyway without some
         | augmentations. You want this instead:
         | https://huggingface.co/segmind/SSD-1B
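          | 
          | Attention slicing is one knob that does work on MPS, though
          | it trades speed for peak memory at high resolutions. A
          | sketch (model id assumed; both calls are standard diffusers
          | pipeline methods):
          | 
          |     from diffusers import DiffusionPipeline
          | 
          |     pipe = DiffusionPipeline.from_pretrained(
          |         "SimianLuo/LCM_Dreamshaper_v7",
          |         custom_pipeline="latent_consistency_txt2img",
          |     )
          |     pipe.to("mps")
          | 
          |     # Compute attention in slices: lower peak memory at some
          |     # speed cost.
          |     pipe.enable_attention_slicing()
          | 
          |     # On CUDA you'd try this instead (needs xformers; not MPS):
          |     # pipe.enable_xformers_memory_efficient_attention()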
        
       | LauraMedia wrote:
        | Thought it was too good to be true, so I tried it with an M2
        | Pro MacBook Pro.
        | 
        | Generation takes 20-40 seconds; when using "--continuous" it
        | takes 20-40 seconds once and then keeps generating every 3-5
        | seconds.
        
       | naet wrote:
        | Well, how do they look? I've seen some other image generation
        | optimizations, but a lot of them trade away a significant
        | amount of quality.
        
       ___________________________________________________________________
       (page generated 2023-10-27 23:01 UTC)