[HN Gopher] Stable Diffusion with Core ML on Apple Silicon
___________________________________________________________________
Stable Diffusion with Core ML on Apple Silicon
Author : 2bit
Score : 247 points
Date : 2022-12-01 20:21 UTC (2 hours ago)
(HTM) web link (machinelearning.apple.com)
(TXT) w3m dump (machinelearning.apple.com)
| [deleted]
| zimpenfish wrote:
| Man, this takes a ton of room to do the Core ML conversions -
| ran out of space doing the UNet conversion even though I started
| with 25GB free. Going on a delete spree to get it up to 50GB
| free before trying again.
| mark_l_watson wrote:
| Great stuff. I like that they give directions for both Swift
| and Python.
|
| This gets you from text descriptions to images.
|
| I have seen models that, given a picture, generate similar
| pictures. I want this because while I have many pictures of my
| grandmothers, I only have a couple of pictures of my
| grandfathers, and it would be nice to generate a few more.
|
| Core ML is so well done. A year ago I wrote a book on Swift AI
| and used Core ML in several examples.
| tosh wrote:
| Atila from Apple on the expected performance:
|
| > For distilled StableDiffusion 2 which requires 1 to 4
| iterations instead of 50, the same M2 device should generate an
| image in <<1 second
|
| https://twitter.com/atiorh/status/1598399408160342039
| hbn wrote:
| SD2 is the one that was neutered, right?
|
| Maybe a dumb question but can the old model still be run?
| [deleted]
| qclibre22 wrote:
| Also, can you not "upgrade" but still run new models?
| astrange wrote:
| You can do anything you want.
|
| SD2 wasn't "neutered"; the piece of it from OpenAI that
| knew a lot of artist names but wasn't reproducible was
| replaced with a new one from Stability that doesn't. You
| can fine-tune anything you want back in.
| kyleyeats wrote:
| It's less versatile out of the box. Give it a couple months
| for the community to catch up. Everyone is still figuring out
| what goes where, and SD 1.x was "everything goes in one
| spot." It was cool and powerful, but limited.
| minimaxir wrote:
| You can still do nice things with SD2, it just requires a
| different approach.
| https://news.ycombinator.com/item?id=33780543
| cammikebrown wrote:
| If you told me this was possible when I bought an M1 Pro less
| than a year ago, I wouldn't believe you. This is insane.
| peppertree wrote:
| Last nail in the coffin for DALL-E.
| m00dy wrote:
| yeah, finally we see the real openAI
| visarga wrote:
| more open than open source, it's the open model age
| astrange wrote:
| I think they can move upmarket just as well as anyone else.
| mensetmanusman wrote:
| Not really, everyone will have their own flavor of how to
| rapidly train the model.
|
| DALL-E et al. will still be able to bandwagon off of all the
| free ecosystem being built around the $10M SD1.4 model that
| is showing what is possible.
|
| E.g. DALL-E could go straight to Hollywood if their model
| training works better than SD's. The toolsets will work either
| way.
| chasd00 wrote:
| I'm very ignorant here, so forgive me, but if it can generate
| images that fast, can it be used to generate a video?
| valgaze wrote:
| Video is really a series of frames; film gets away with 24
| frames/second, so you'd need roughly ~40ms/image for real time.
|
| What's cool about the era in which we live is that, for high-
| performance graphics in games or simulations, it may in fact
| be _faster_ to run a model to "enhance" a low-resolution frame
| than to render it fully on the machine.
|
| ex. AMD's FSR vs NVIDIA DLSS
|
| - AMD FSR (FidelityFX Super Resolution):
| https://www.amd.com/en/technologies/fidelityfx-super-
| resolut...
|
| - NVIDIA DLSS (Deep Learning Super Sampling):
| https://www.nvidia.com/en-us/geforce/technologies/dlss/
|
| Both FSR and DLSS aim to improve frames-per-second in games
| by rendering them below your monitor's native resolution,
| then upscaling them to make up the difference in sharpness.
| Currently, FSR uses spatial upscaling, meaning it applies its
| upscaling algorithm to one frame at a time. Temporal
| upscalers, like DLSS, can compare multiple frames at once to
| reconstruct a more finely-detailed image that both more
| closely resembles native res and better handles motion. DLSS
| specifically uses the machine learning capabilities of GeForce
| RTX graphics cards to process all that data in (more or less)
| real time.
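|
| To make "spatial" concrete: conceptually it is just per-frame
| resampling. A toy Python sketch with Pillow (classical
| resampling only; FSR's actual algorithm is far smarter, and the
| file names are just examples):
|
|   from PIL import Image
|
|   # "Spatial" upscale: the output frame depends on exactly one
|   # input frame, with no information from neighboring frames.
|   frame = Image.open("frame_0001.png")  # low-res render
|   hi = frame.resize((1920, 1080), Image.LANCZOS)
|   hi.save("frame_0001_upscaled.png")
|
| Temporal upscalers additionally consume previous frames and
| motion vectors, which is why they reconstruct moving detail
| better.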
|
| This is a different challenge than generating the content
| from scratch.
|
| I don't think this is possible in real-time yet, but someone
| made a filter trained on the German countryside that makes
| Grand Theft Auto driving gameplay photorealistic:
|
| https://www.youtube.com/watch?v=P1IcaBn3ej0
|
| Notice the mountains in the background go from Southern
| California brown to lush green.
|
| https://www.rockpapershotgun.com/amd-fsr-20-is-a-more-
| demand....
| vletal wrote:
| Yeah, sure. The issue is with temporal consistency. Meta and
| Google have some successes in that area.
|
| https://mezha.media/en/2022/10/06/google-is-working-on-
| image...
|
| Give it some time and SD will be able to do the same.
| gcanyon wrote:
| There are different requirements for generating video -- at a
| minimum, continuity is tough. There are models for producing
| video, but (as far as I've seen) they're still a bit wobbly.
| mrtksn wrote:
| With the full 50 iterations it appears to be about 30s on M1.
|
| They have some benchmarks in the GitHub repo:
| https://github.com/apple/ml-stable-diffusion
|
| For reference, I was previously getting just under 3 minutes
| for 50 iterations on my MacBook Air M1. I haven't yet tried
| Apple's implementation, but it looks like a huge improvement.
| It might take it from "possible" to "usable".
| washadjeffmad wrote:
| For comparison, it's also taking ~3min @ 50 iterations on my
| 12c Threadripper using OpenVino. It sounds like the
| improvements bring the M1 performance roughly in line with a
| GTX 1080.
| liuliu wrote:
| Yeah, it's just that the PyTorch MPS backend is not fully
| baked and still has some slowness. You should be able to get
| close to that number with maple-diffusion (probably 10%
| slower) or my app: https://drawthings.ai/ (probably around 20%
| slower, but it supports samplers that take fewer steps (50 ->
| 30)).
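|
| For anyone on the diffusers stack, the same idea (fewer steps
| via a better sampler) looks roughly like this; a sketch of the
| general technique, not the code my app uses:
|
|   from diffusers import (StableDiffusionPipeline,
|                          DPMSolverMultistepScheduler)
|
|   pipe = StableDiffusionPipeline.from_pretrained(
|       "CompVis/stable-diffusion-v1-4").to("mps")
|   # Swap the default scheduler for a faster multistep solver,
|   # then cut the step count.
|   pipe.scheduler = DPMSolverMultistepScheduler.from_config(
|       pipe.scheduler.config)
|   image = pipe("a painting of a fox",
|                num_inference_steps=30).images[0]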
| minimaxir wrote:
| Note that this is an extrapolation for the _distilled_ model,
| which isn't released quite yet (but it will be very exciting
| when it is!).
| neonate wrote:
| https://github.com/apple/ml-stable-diffusion
| christiangenco wrote:
| Oh gosh that's an intimidating installation process. I'll be
| much more interested when I can just `brew install` a binary.
| artimaeis wrote:
| A bit of a different take is DiffusionBee, if you're curious
| to try it out in GUI form.
|
| https://diffusionbee.com
| aryamaan wrote:
| does it use the optimised model for Apple chips?
| belthesar wrote:
| Likely not yet, but the project is very active. I could
| see it coming quite soon.
| bredren wrote:
| I've used this a fair amount but am not sure it's a much
| better place to begin than automatic1111, especially for
| the HN crowd.
| thepasswordis wrote:
| Where are you seeing the installation process?
| MuffinFlavored wrote:
| I could be wrong, but I think part of the issue is that this
| needs some large files for the trained model weights?
| [deleted]
| gedy wrote:
| > Oh gosh that's an intimidating installation process
|
| I'm not seeing any installation instructions on either link -
| what am I missing?
| alexfromapex wrote:
| All I had to do was:
|
| - create a virtual environment
|
| - upgrade pip
|
| - install the nightly PyTorch (command on their website)
|
| - pip install -r requirements.txt
|
| - and then, python setup.py install
|
| - Still trying to figure out Swift part???
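|
| In shell form, those steps are roughly the following (the
| nightly index URL is from memory; check pytorch.org for the
| current command):
|
|   python3 -m venv .venv
|   source .venv/bin/activate
|   pip install --upgrade pip
|   # nightly PyTorch (CPU/MPS build)
|   pip install --pre torch torchvision \
|       --extra-index-url https://download.pytorch.org/whl/nightly/cpu
|   pip install -r requirements.txt
|   python setup.py install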
| pkage wrote:
| How does this compare with using the Hugging Face `diffusers`
| package with MPS acceleration through PyTorch Nightly? I was
| under the impression that that used CoreML under the hood as well
| to convert the models so they ran on the Neural Engine.
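|
| For reference, the MPS path I'm comparing against is roughly
| this (per the Hugging Face docs; the model ID and prompt are
| just examples):
|
|   from diffusers import StableDiffusionPipeline
|
|   pipe = StableDiffusionPipeline.from_pretrained(
|       "CompVis/stable-diffusion-v1-4")
|   pipe = pipe.to("mps")  # run on the Metal GPU backend
|   pipe.enable_attention_slicing()  # helps on <64GB machines
|
|   # First call is slow while kernels warm up; later runs are
|   # faster.
|   image = pipe("a photo of an astronaut riding a horse",
|                num_inference_steps=50).images[0]
|   image.save("out.png")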
| [deleted]
| liuliu wrote:
| It doesn't. MPS largely runs on the GPU, and PyTorch's MPS
| implementation was still incomplete as of a few weeks ago.
| This is about 3x faster.
| behnamoh wrote:
| This may sound naive, but what are some use cases of running SD
| models locally? If the free/cheap options exist (like running SD
| on powerful servers), then what's the advantage of this new
| method?
| gjsman-1000 wrote:
| Powerful servers with GPUs are expensive. Laptops you already
| own, aren't.
| sofaygo wrote:
| > There are a number of reasons why on-device deployment of
| Stable Diffusion in an app is preferable to a server-based
| approach. First, the privacy of the end user is protected
| because any data the user provided as input to the model stays
| on the user's device. Second, after initial download, users
| don't require an internet connection to use the model. Finally,
| locally deploying this model enables developers to reduce or
| eliminate their server-related costs.
| yazaddaruvala wrote:
| "Hey Siri, draw me a purple duck" and it all happens without an
| internet connection!
|
| If you mean monetary use cases: roughly something like
| Photoshop/Blender/UnrealEngine with ML plugins that are low
| latency and private, with $0 server hosting costs.
| jwitthuhn wrote:
| Even with the slower PyTorch implementation, my M1 Pro MBP,
| which tops out at consuming ~100W of power, can generate a
| decent image in 30 seconds.
|
| I'm not sure exactly what that costs me in terms of power, but
| it is assuredly less than any of these services charge for a
| single image generation.
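|
| Back of the envelope (assuming ~$0.15/kWh): 100 W for 30 s is
| 3,000 J, or about 0.83 Wh, which works out to roughly $0.0001
| per image. That is orders of magnitude below typical per-image
| API pricing.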
| tosh wrote:
| Works offline, privacy, independent of SaaS (API stability,
| longevity, ...). I'm sure there are more.
| mensetmanusman wrote:
| Soon you will be able to render home iMovie projects as if
| they were edited by the team that made The Dark Knight (which
| costs ~$100k/min if done professionally).
___________________________________________________________________
(page generated 2022-12-01 23:00 UTC)