[HN Gopher] Running Stable Diffusion in 260MB of RAM
___________________________________________________________________
Running Stable Diffusion in 260MB of RAM
Author : Robin89
Score : 189 points
Date : 2023-07-20 17:01 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| stmblast wrote:
| This is really neat! Always cool to see what people can do with
| less.
| johnklos wrote:
| In 260 megs of RAM?!? I'm going to try this on my Amiga!
|
| Check back in a few months for my results...
| 13of40 wrote:
| Look at moneybags over here with his "megs" of RAM. I think
| mine only had 256K available after the kickstart disk was
| loaded.
| johnklos wrote:
| I splurged.
|
| http://lilith.zia.io/
| vikasr111 wrote:
| Interesting. Which platform/PC config did you use?
| crest wrote:
| Does this mean you could fit its whole working set in the cache
| hierarchy of a modern high-end GPU and get near-100% ALU
| utilisation?
| Tuna-Fish wrote:
| It streams the weights. This is going to be what limits
| performance, not ALU utilization.
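|
| A toy illustration of the idea (the project's actual streaming
| code is C++; the file layout and names here are hypothetical):
|
|     import numpy as np
|
|     def run_streamed(x, layer_files):
|         # Load one layer's weights from disk, apply it, free it,
|         # move on. Peak RAM stays near the size of the largest
|         # single layer instead of the whole model.
|         for path in layer_files:
|             w = np.load(path)         # in RAM for this layer only
|             x = np.maximum(x @ w, 0)  # matmul + ReLU stand-in layer
|             del w                     # release before the next load
|         return x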
| isoprophlex wrote:
| Incredible! If only there were some cheap hackable e-ink frame,
| you could make a fully self-contained artwork from an e-ink
| panel + RPi that's (slowly) continuously updating itself!
| mananaysiempre wrote:
| Like a continuously updating wall-mounted newspaper[1]?
|
| [1] https://imgur.io/a/NoTr8XX
| andrewmunsell wrote:
| There definitely are some:
| https://shop.pimoroni.com/search?q=e-ink
|
| And now I think I know what my next project is going to be.
| I'm sure I can find some desk space.
| isoprophlex wrote:
| Yessss! I looked into building some self-contained "slow
| tech" generative art using e-ink a couple of years ago, but
| it was just impossible on my tiny budget. This is great,
| thanks!!
|
| Edit: I'm so hyped about this; the example image in TFA
| takes 2+ hours to generate, but who cares?! I'd love to have
| a little display that churns away in the background, creates
| a new variation on my prompt every few hours, and displays
| the results on an unobtrusive e-ink screen.
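|
| A minimal sketch of that loop, assuming the Hugging Face
| diffusers package and Pimoroni's inky driver from the link
| above (model id, prompt, and interval are illustrative):
|
|     import time
|     from diffusers import StableDiffusionPipeline
|     from inky.auto import auto
|
|     pipe = StableDiffusionPipeline.from_pretrained(
|         "runwayml/stable-diffusion-v1-5")
|     display = auto()  # autodetect the attached Inky panel
|
|     while True:
|         image = pipe("a quiet harbour at dawn, watercolour",
|                      num_inference_steps=50).images[0]
|         # keep every iteration in case one turns out great
|         image.save(f"art-{int(time.time())}.png")
|         # resize to fit the panel (some panels also want a
|         # palette-mode image before set_image)
|         display.set_image(image.resize(display.resolution))
|         display.show()
|         time.sleep(6 * 3600)  # a new variation every 6 hours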
| mw63214 wrote:
| Is it possible to incorporate a personalized "context" into
| the generator? Weather, market/news sentiments, calendar
| events, etc., to style the end result.
| qwertox wrote:
| I love the idea.
| causi wrote:
| Make sure you build in a capacity to save all the previous
| iterations in case you see something you really like.
| isoprophlex wrote:
| Haha I like the idea of walking past, glancing now and
| then to see if there's something you really love...
|
| but on the other hand I would also love the statement
| behind something unconnected to the internet that's
| slowly churning out unique, ephemeral pictures. Yours to
| enjoy, then gone forever.
| civilitty wrote:
| You can make a digital sand mandala [1]
|
| [1] https://en.m.wikipedia.org/wiki/Sand_mandala
| nicollegah wrote:
| Wait, are these inference times real? 1 second on a Raspi?
| Am I reading this right? This is faster than on my GPU.
| What's going on here?
| xnzakg wrote:
| Pretty sure that is just the text encoding step. Generating a
| complete image took 3h if I read correctly.
|
| update: "Tests were run on my development machine: Windows
| Server 2019, 16GB RAM, 8750H cpu (AVX2), 970 EVO Plus SSD, 8
| virtual cores on VMWare."
| Kuinox wrote:
| I think it's the inference time per iteration.
| mottiden wrote:
| Amazing work!
| boredemployee wrote:
| That's really cool! I always thought you needed a good amount of
| GPU VRAM to generate images using SD.
|
| I wonder how fast a consumer PC with no GPU would generate an
| image with, say, 16GB of RAM?
| atrus wrote:
| I was using a 6-ish-year-old AMD CPU with 16GB of RAM, and
| generating an image from a prompt would take about half an
| hour. Which is still massively impressive for what it is.
| londons_explore wrote:
| Use a free GPU from Google Colab and you can do the same in
| about 15 seconds...
| boredemployee wrote:
| Do you have a google colab link?
| hadlock wrote:
| There is no shortage of Google Colab Stable Diffusion
| tutorials on the web.
| idiotsecant wrote:
| yes, and if he does it on a paid machine with a better GPU
| it'll be even faster!
|
| While true, neither your statement nor mine above is germane
| to the discussion. It wasn't about how long it takes. It's
| a discussion of how cool it is that it can be done on that
| machine at all.
| wsgeorge wrote:
| On an Apple M1 with 16GB of RAM, without PyTorch compiled to
| take advantage of Metal, it could take 12 minutes to generate
| an image with a tweet-length prompt. With Metal, it takes
| less than 60 seconds.
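|
| For reference, "using Metal" here is about one line via
| PyTorch's MPS backend; a minimal sketch with the stock
| Hugging Face pipeline (model id and prompt are illustrative):
|
|     import torch
|     from diffusers import StableDiffusionPipeline
|
|     pipe = StableDiffusionPipeline.from_pretrained(
|         "runwayml/stable-diffusion-v1-5")
|     # use Metal when available, otherwise fall back to CPU
|     device = "mps" if torch.backends.mps.is_available() else "cpu"
|     pipe = pipe.to(device)
|
|     image = pipe("a lighthouse in a storm",
|                  num_inference_steps=50).images[0]
|     image.save("out.png")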
| asynchronous wrote:
| Metal is such an advantage, had no idea
| ilkke wrote:
| Prompt length shouldn't influence creation time, at least it
| didn't in any of the implementations I used.
|
| What is the resolution of your images and number of steps?
| wsgeorge wrote:
| Defaults from the Huggingface repo, just copy-pasted. So,
| iirc 50 steps and the image is 512x512.
|
| Edit: confirmed.
|
| > Prompt length shouldn't influence creation time...
|
| Yeah, checks out with my experience too. Longer prompts
| were truncated.
| Filligree wrote:
| Some tools (e.g. Automatic1111) are able to feed in
| longer prompts, but then the prompt length does affect
| the speed of inference.
|
| Albeit in 77-token increments.
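|
| Roughly, the trick is to encode the prompt window by window
| and concatenate the embeddings; a simplified sketch (not
| Automatic1111's actual code, and padding details vary by
| tool):
|
|     import torch
|     from transformers import CLIPTokenizer, CLIPTextModel
|
|     name = "openai/clip-vit-large-patch14"
|     tok = CLIPTokenizer.from_pretrained(name)
|     enc = CLIPTextModel.from_pretrained(name)
|
|     def encode_long_prompt(prompt, window=77):
|         ids = tok(prompt).input_ids  # includes BOS/EOS tokens
|         chunks = [ids[i:i + window]
|                   for i in range(0, len(ids), window)]
|         outs = []
|         for chunk in chunks:
|             # pad the last chunk up to the 77-token window
|             chunk += [tok.pad_token_id] * (window - len(chunk))
|             with torch.no_grad():
|                 out = enc(torch.tensor([chunk])).last_hidden_state
|             outs.append(out)
|         # conditioning grows one 77-token block at a time, which
|         # is why longer prompts slow inference down
|         return torch.cat(outs, dim=1)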
| danieldk wrote:
| And PyTorch on the M1 (without Metal) uses the fast AMX
| matrix multiplication units (through the Accelerate
| Framework). The matrix multiplication on the M1 is on par
| with ~10 threads/cores of a Ryzen 5900X [1].
|
| [1] https://github.com/danieldk/gemm-benchmark#example-results
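|
| A quick way to see this on any machine is to time a large
| single-precision matmul; on an M1, PyTorch's CPU path goes
| through Accelerate and hence the AMX units:
|
|     import time
|     import torch
|
|     n, iters = 2048, 50
|     a, b = torch.randn(n, n), torch.randn(n, n)
|     t0 = time.perf_counter()
|     for _ in range(iters):
|         a @ b
|     dt = (time.perf_counter() - t0) / iters
|     # a matmul of two n x n matrices costs ~2*n^3 flops
|     print(f"{2 * n**3 / dt / 1e9:.1f} GFLOP/s")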
| zirgs wrote:
| "It runs Stable Diffusion" is the new "It runs Doom".
| Minor49er wrote:
| Now I'm wondering: could a monkey hitting random keys on a
| keyboard for an infinite amount of time eventually come up with
| the right prompts to get GPT-4 to produce code that compiles to
| a faithful reproduction of Doom?
| LordDragonfang wrote:
| Probably more easily than you'd think. DOOM is open
| source[1], and as GP alludes, is probably the most frequently
| ported game in existence, so its source code almost certainly
| appears multiple times in GPT-4's training set, likely
| alongside multiple annotated explanations.
|
| [1] https://github.com/id-Software/DOOM
| [deleted]
| speedgoose wrote:
| I like the use of a tiny device to generate the images. I was
| wondering whether the energy consumption per image would be
| lower, but I did the simple maths and it's not the case.
|
| A Raspberry Pi Zero 2 W seems to use about 6W under load (source:
| https://www.cnx-software.com/2021/12/09/raspberry-pi-zero-2-... )
|
| So if it takes 3 hours to generate one picture, that's about 18Wh
| per image.
|
| A Nvidia Tesla or RTX GPU can generate a similar picture very
| quickly. Assuming one second per image and 350W under load for
| the whole system, that's on the order of 0.1Wh per image.
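|
| Spelled out, with the figures quoted above:
|
|     pi_watts, pi_hours = 6, 3        # Pi Zero 2 W, ~3 h per image
|     gpu_watts, gpu_seconds = 350, 1  # whole GPU system, ~1 s/image
|
|     print(pi_watts * pi_hours)             # 18 Wh per image
|     print(gpu_watts * gpu_seconds / 3600)  # ~0.097 Wh per image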
|
| Of course, a Raspberry Pi Zero also takes far fewer
| resources and less energy to manufacture and transport.
| hadlock wrote:
| For on-prem use, the up-front cost is a lot lower. The A100
| that most serious outfits are using runs from thousands to
| tens of thousands of dollars per unit, with very limited
| availability. The Pi is typically under $75 USD for any
| variant.
| speedgoose wrote:
| An RTX 4090 is much better value for Stable Diffusion, but
| yes, if you start to think about cost, the Pi wins. If you
| think about availability, I'm not sure.
| hadlock wrote:
| The big immediate plus here is that if you live somewhere
| with limited access to the internet, like a protest group in
| far eastern Europe or other areas, you can still generate
| imagery offline on a low-end laptop. My personal travel
| laptop only has 8GB of memory, so it's exciting to be able
| to try out an idea even if I don't have high-end hardware.
| saqadri wrote:
| Incredible! The march to get more models running on the edge
| continues, much faster than I anticipated. The static
| quantization and slicing techniques here are pretty cool.
| asynchronous wrote:
| I've been amazed at how quickly the open source community has
| iterated on LLMs and Diffusion models. Goes to show how well
| open source can work.
___________________________________________________________________
(page generated 2023-07-20 23:00 UTC)