[HN Gopher] OnnxStream: Stable Diffusion XL 1.0 Base on a Raspberry Pi Zero 2
___________________________________________________________________
OnnxStream: Stable Diffusion XL 1.0 Base on a Raspberry Pi Zero 2
Author : Robin89
Score : 80 points
Date : 2023-12-14 20:43 UTC (2 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| dpflan wrote:
| "298MB of RAM" if you were wondering about some constraint.
| m3kw9 wrote:
| Could be a nice wallpaper generator every 29 min. Input a big
| list of random prompts and let it rotate.
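|
| A minimal sketch of such a rotation loop (the OnnxStream flags
| below are placeholders, not the project's actual CLI; check the
| repo's README for the real invocation):
|
|     import random, subprocess
|
|     prompts = ["a misty pine forest at dawn",
|                "a retro space station interior"]
|     while True:
|         prompt = random.choice(prompts)
|         # Hypothetical flags -- substitute the real OnnxStream
|         # Stable Diffusion invocation here.
|         subprocess.run(["./sd", "--prompt", prompt,
|                         "--output", "wallpaper.png"])
|         # No sleep needed: generation alone takes ~29 minutes on
|         # the Pi Zero 2, so this rotates roughly twice an hour.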
| taf2 wrote:
| Output to epaper
| mouse_ wrote:
| That is, if you don't mind your wallpaper consuming 100% system
| resources, heh.
| askonomm wrote:
| 298MB of RAM is hardly 100%, as per this example. Slack takes
| up way more RAM than that.
| sunpazed wrote:
| See this related project:
| https://github.com/rvdveen/epaper-slow-generative-art
| filterfiber wrote:
| This project is a fun POC but it's not very practical for that
| type of application.
|
| A 4090 can generate over 100 images a second with turbo+LCM and
| a few other techniques, so you can make two days' worth of
| images in one second. You could make a year's worth in roughly 3
| minutes and put them on the SD card.
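|
| The arithmetic behind that, as a quick sanity check (Python):
|
|     per_day = 24 * 60 / 29    # ~50 wallpapers/day, one every 29 min
|     per_year = per_day * 365  # ~18,000 wallpapers/year
|     print(100 / per_day)      # ~2 days' worth per second at 100/s
|     print(per_year / 100)     # ~181 s (~3 min) for a year's worth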
| Sharlin wrote:
| But that's not the point, obviously. Sometimes, being slow is
| a feature. Besides, a 4090 costs more than a small car.
| cheeze wrote:
| $1600 is more than a car?
|
| I feel like you can't even find driveable cars that will
| last 100 miles at that price point anymore.
| omgwtfbyobbq wrote:
| You probably can, but it'll take some time. The supply of
| reasonably reliable $500-$1000 beaters is a lot smaller than it
| used to be.
| filterfiber wrote:
| > But that's not the point, obviously.
|
| If you want to say the Zero 2 W is what makes it, then sure.
|
| > Besides, a 4090 costs more than a car.
|
| They only cost ~$0.70 per hour to rent. In fact, you could put
| this on an A100 for ~$1/hr. Renting would make the most sense
| for this type of thing.
| omgwtfbyobbq wrote:
| It depends on what you're using the images for.
|
| If there's a human in the loop, 100 images/s is likely too much
| volume, especially if prompt engineering is needed.
|
| At the same time, 2 images/hr is way too slow.
| omgwtfbyobbq wrote:
| Do you have references for that?
|
| I found this claiming an A100 can generate 1 image/s.
|
| https://oneflow2020.medium.com/text-to-image-in-less-
| than-1-...
| johnklos wrote:
| It's so nice of you to offer to buy 4090 cards for people who
| can only otherwise afford Raspberry Pis ;)
| atlas_hugged wrote:
| Impressive work
| taf2 wrote:
| So it's safe to assume that in the next 10 years, AI will be
| running locally on every device, from phones and laptops to many
| embedded devices. Even robots, from street-cleaning bots to
| helpful human assistants?
| ukuina wrote:
| Yes, every processor will have an AI core or two.
| ranting-moth wrote:
| Probably not even 10 years.
| hereonout2 wrote:
| Yeah, that's happening right now, really. There have been loads
| of developments in the mobile space already; in many ways,
| lower-powered ARM devices are way more optimised for AI
| applications than the current crop of Intel machines.
|
| This example, whilst impressive, feels way more in the "Doom
| running on a calculator" vein of progress though.
| fragmede wrote:
| Every Google Home device is already running an ML model to do
| speech recognition to recognize the "hey Google" wake word, so
| sooner than 10 years. The Raspberry Pi Zero is a particularly
| underpowered device for this. Doing it on the Coral TPU
| accelerator plugged into a Pi Zero would take less than 30
| minutes. Doing it on an iPhone 15 would take less time, and on a
| Pixel 8 it would be faster still. Not to diminish getting it to
| work on a Pi Zero, but that future is already here, just as
| soon as we figure out what to do with them.
|
| https://coral.ai/products/accelerator/
| dontwearitout wrote:
| There's an ocean of difference between optimizing for a
| single wakeword and the class of models that are taking off
| today. I'm excited for more on-board processing, because it
| will mean less dependency on the cloud.
| hereonout2 wrote:
| Siri came out 12 years ago; wake words are probably a bad
| example.
|
| Better examples are the Magic Eraser on my two-generation-old
| Pixel phone, or the fact that Llama 2 runs genuinely fast on a
| Mac mini.
| joegibbs wrote:
| Siri's wake word stuff is also terrible; she gets constantly
| activated whenever I have my Apple Watch near running water,
| frying food, or anything else that makes a white-noise-type
| sound.
| caycep wrote:
| I mean, diffusion models tend to be less computationally
| expensive than, say, CNNs or LLMs, so probably? And before that,
| people ran SVMs, random forests, and other forms of
| non-GPU-intensive ML algorithms locally as well...
| nextworddev wrote:
| Probably less than 5 years, maybe 2
| mrtksn wrote:
| The bottleneck is probably the availability of lithography
| machines that can make ubiquitous chips able to process that
| much data quickly enough without overheating or drawing too much
| power.
|
| Not too far in the future, every device will have some
| 5nm-or-better LLM chip inside, and devices understanding natural
| language will be the norm.
|
| By "dumb machines", people will mean machines that have to be
| programmed by people using ancient techniques, where everything
| the machine is supposed to do is written step by step in a
| low-level computer language like JavaScript.
|
| Nerds will be making demos of doing something incredibly fast by
| writing the algorithms directly by hand, and will be annoyed by
| the fact that something that can be done in 20 lines of code on
| a few hundred MB of RAM in NodeJS now requires a terabyte of
| RAM.
|
| A "dumb phone" will be something like an iPhone 15 Pro or a
| Pixel 8 Pro, where you have separate apps for each thing you do
| and can't simply ask the device to do it for you.
| godelski wrote:
| Yes and no. Context matters.
|
| Will models of similar quality to the current LLaMA, GPT, and
| Stable Diffusion be running locally on devices and edge
| systems? Very likely.
|
| Will much higher quality models, requiring compute beyond what
| such edge or consumer devices can provide, be available, sold as
| a service, and in heavy use? Also very likely.
|
| So expect current quality to make it to your devices, but don't
| necessarily expect everything to move local, because the whole
| ecosystem will improve too. The Overton window will shift; it's
| like asking if gaming will move to phones. In some ways yes, in
| other ways you're still going to want to buy that
| PlayStation/Xbox/PC.
| michaelaiello wrote:
| Are there LLM models that will run on small RPis, similar to
| LLaMA?
| hmry wrote:
| CPU-only?
| Zetobal wrote:
| I know 29 minutes is long, but theoretically you can have all
| the images you ever want in a small 6GB package and run
| inference on (nearly) everything. That's fucking amazing.
| godelski wrote:
| But honest question: if this is your goal, why not use a GAN
| instead? You should still be able to produce high-quality images
| but at a much faster rate (I'd guess around 10 minutes?). Sure,
| you'll have a bit lower diversity and maybe not SOTA-quality
| image generation, but neither is this thing. Or you could reduce
| quality. This Reddit user seems to be doing fast inference on a
| Pi[0] using StyleGAN, but that was before MobileStyleGAN came
| out, which uses <1GB for inference. (It is a distilled StyleGAN2
| model; we could distill more recent models.)
|
| It just seems like different models, different contexts.
| Certainly you'd want diffusion on the computer you're doing
| Photoshop on, but random images? Different context.
|
| [0]
| https://www.reddit.com/r/raspberry_pi/comments/hf7lbh/i_made...
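|
| For reference, a minimal sketch of what such low-memory GAN
| inference could look like with onnxruntime; the model filename,
| input name, 512-dim latent, and single NCHW output are
| assumptions about a typical StyleGAN-style export, not the exact
| MobileStyleGAN interface:
|
|     import numpy as np
|     import onnxruntime as ort
|
|     # Hypothetical ONNX export of a distilled StyleGAN generator.
|     sess = ort.InferenceSession("generator.onnx",
|                                 providers=["CPUExecutionProvider"])
|     latent_name = sess.get_inputs()[0].name
|     z = np.random.randn(1, 512).astype(np.float32)  # assumed latent
|     img = sess.run(None, {latent_name: z})[0]       # assumed NCHW out
|     # Map [-1, 1] floats to an 8-bit HWC image.
|     img = ((img[0].transpose(1, 2, 0) + 1) * 127.5).clip(0, 255)
|     img = img.astype(np.uint8)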
| GaggiX wrote:
| The quality is not really close; also, StyleGAN2 is not
| conditioned on text.
| pmontra wrote:
| It reminds me of the time it took to generate mandelbrots on home
| computers in the 80s.
| johnklos wrote:
| https://www.klos.com/~john/mandelbrot.jpg
| dang wrote:
| Submitted title was "Stable Diffusion Turbo on a Raspberry Pi
| Zero 2 generates an image in 29 minutes", which is good to know
| in order to understand some of the comments posted before I
| changed the title.
|
| Submitters: if you want to say what you think is important about
| an article, that's fine, but do it by adding a comment to the
| thread. Then your view will be on a level playing field with
| everyone else's:
| https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...
| practice9 wrote:
| Wait, but the new title doesn't seem to be correct
| dang wrote:
| We certainly want it to be correct! I took "Stable Diffusion
| XL 1.0 Base on a Raspberry Pi Zero 2" from the About part of
| https://github.com/vitoplantamura/OnnxStream/tree/c0cb4b3d7b.
| ... Is it wrong?
| Lerc wrote:
| Nice to see people finding ways to get the square peg through
| the round hole.
|
| Something I wondered about when the Raspberry Pi 5 came out is
| what weirdness might be possible now that they have their own
| chip doing the IO cleverness.
|
| On the Pi 5, the two MIPI interfaces can do either output or
| input. It made me wonder if the ports are now generalized enough
| that you could daisy-chain a string of Pi 5s, connecting MIPI to
| MIPI. Then you could run inference layers on individual Pis and
| pass the activations down the MIPI link. 10x 8GB Pi 5s might not
| be the speediest way to get an 80GB setup, but it would
| certainly be the cheapest (for now).
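|
| What's being described is essentially pipeline parallelism: each
| Pi holds a slice of the layers and streams its activations to
| the next node in the chain. A toy sketch of one stage, using a
| TCP socket as a stand-in for the MIPI link (the activation
| shape, ports, and layer callables are placeholders):
|
|     import socket
|     import numpy as np
|
|     ACT_SHAPE = (1, 4096)  # activation tensor passed between stages
|
|     def run_stage(layers, listen_port, next_host=None, next_port=None):
|         # Receive activations from the previous Pi in the chain.
|         srv = socket.create_server(("0.0.0.0", listen_port))
|         conn, _ = srv.accept()
|         need = int(np.prod(ACT_SHAPE)) * 4  # float32 bytes expected
|         buf = b""
|         while len(buf) < need:
|             buf += conn.recv(65536)
|         acts = np.frombuffer(buf[:need], np.float32).reshape(ACT_SHAPE)
|         # Run this node's slice of the model.
|         for layer in layers:
|             acts = layer(acts)
|         # Forward the result to the next Pi, if there is one.
|         if next_host is not None:
|             out = socket.create_connection((next_host, next_port))
|             out.sendall(acts.astype(np.float32).tobytes())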
| johnklos wrote:
| I would've loved it if this were more portable. It requires
| XNNPACK, which has no generic C implementation. I'd have loved
| to see Stable Diffusion running on an Alpha, SPARC, or m68k.
___________________________________________________________________
(page generated 2023-12-14 23:00 UTC)