[HN Gopher] OnnxStream: Stable Diffusion XL 1.0 Base on a Raspbe...
       ___________________________________________________________________
        
       OnnxStream: Stable Diffusion XL 1.0 Base on a Raspberry Pi Zero 2
        
       Author : Robin89
       Score  : 80 points
       Date   : 2023-12-14 20:43 UTC (2 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | dpflan wrote:
       | "298MB of RAM" if you were wondering about some constraint.
        
       | m3kw9 wrote:
        | Could be a nice wallpaper generator, producing a new image every
        | 29 min. Feed it a big list of random prompts and let it rotate.
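        | 
        | Something along these lines could work. A rough sketch, assuming
        | a generation command and a wallpaper tool (feh) that are
        | placeholders for whatever your setup actually uses, not the
        | literal OnnxStream CLI:
        | 
        |     #!/usr/bin/env python3
        |     # Rotate AI-generated wallpapers: pick a random prompt,
        |     # generate an image, then set it as the desktop background.
        |     import random
        |     import subprocess
        | 
        |     PROMPTS = [
        |         "a misty mountain lake at dawn, oil painting",
        |         "retro-futuristic city skyline at night",
        |         "macro photo of dew on a spider web",
        |     ]
        | 
        |     while True:
        |         prompt = random.choice(PROMPTS)
        |         # Placeholder generation step -- substitute the real
        |         # OnnxStream invocation for your board and model.
        |         subprocess.run(["./sd", "--prompt", prompt,
        |                         "--output", "/tmp/wallpaper.png"], check=True)
        |         # 'feh' is one common way to set a wallpaper; swap in
        |         # whatever your desktop environment provides.
        |         subprocess.run(["feh", "--bg-fill", "/tmp/wallpaper.png"])
        |         # No sleep needed: generation itself takes ~29 minutes.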
        
         | taf2 wrote:
         | Output to epaper
        
         | mouse_ wrote:
         | That is, if you don't mind your wallpaper consuming 100% system
         | resources, heh.
        
           | askonomm wrote:
           | 298MB of RAM is hardly 100%, as per this example. Slack takes
           | up way more RAM than that.
        
         | sunpazed wrote:
         | See this related project: https://github.com/rvdveen/epaper-
         | slow-generative-art
        
         | filterfiber wrote:
          | This project is a fun POC, but it's not very practical for that
          | type of application.
          | 
          | A 4090 can generate over 100 images a second with turbo + LCM
          | and a few other techniques, so you could make two days' worth
          | of images in about one second, or a year's worth in roughly
          | three minutes, and put them all on the SD card.
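          | 
          | A quick back-of-the-envelope check of that claim (the numbers
          | above are estimates rather than benchmarks):
          | 
          |     # Wallpaper rotates every ~29 min -> roughly 50 images/day.
          |     images_per_day = 24 * 60 / 29      # ~49.7
          |     gpu_rate = 100                     # images/sec (turbo + LCM claim)
          | 
          |     two_days = 2 * images_per_day / gpu_rate    # ~1 second
          |     one_year = 365 * images_per_day / gpu_rate  # ~181 s, ~3 minutes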
        
           | Sharlin wrote:
           | But that's not the point, obviously. Sometimes, being slow is
           | a feature. Besides, a 4090 costs more than a small car.
        
             | cheeze wrote:
             | $1600 is more than a car?
             | 
             | I feel like you can't even find driveable cars that will
             | last 100 miles at that price point anymore.
        
               | omgwtfbyobbq wrote:
                | You probably can, but it'll take some time. The supply of
                | reasonably reliable $500-$1000 beaters is a lot smaller
                | than it used to be.
        
             | filterfiber wrote:
             | > But that's not the point, obviously.
             | 
              | If you want to say the Zero 2 W is what makes this
              | interesting, then sure.
             | 
             | > Besides, a 4090 costs more than a car.
             | 
              | They only cost ~$0.70 for 1 hr to rent. In fact, you could
              | put this on an A100 for $1/hr. Renting would make the most
              | sense for this type of thing.
        
               | omgwtfbyobbq wrote:
               | It depends on what you're using the images for.
               | 
                | If there's a human in the loop, 100 images/s is likely too
                | much volume, especially if prompt engineering is needed.
               | 
               | At the same time, 2 images/hr is way too slow.
        
           | omgwtfbyobbq wrote:
            | Do you have references for that?
           | 
           | I found this claiming an A100 can generate 1 image/s.
           | 
           | https://oneflow2020.medium.com/text-to-image-in-less-
           | than-1-...
        
           | johnklos wrote:
           | It's so nice of you to offer to buy 4090 cards for people who
           | can only otherwise afford Raspberry Pis ;)
        
       | atlas_hugged wrote:
       | Impressive work
        
       | taf2 wrote:
        | So is it safe to assume that in the next 10 years AI will be
        | running locally on every device, from phones and laptops to many
        | embedded devices? Even robots, from street-cleaning bots to
        | helpful human assistants?
        
         | ukuina wrote:
         | Yes, every processor will have an AI core or two.
        
         | ranting-moth wrote:
         | Probably not even 10 years.
        
         | hereonout2 wrote:
          | Yeah, that's happening right now, really. There have been loads
          | of developments in the mobile space already; in many ways,
          | lower-powered Arm devices are far more optimised for AI
          | applications than the current crop of Intel machines.
          | 
          | This example, whilst impressive, feels much more in the "Doom
          | running on a calculator" vein of progress, though.
        
         | fragmede wrote:
         | Every Google Home device is already running an ML model to do
         | speech recognition to recognize the "hey Google" wake word, so
         | sooner than 10 years. The Raspberry Pi Zero is a particularly
         | underpowered device for this. Doing it on the Coral TPU
          | accelerator plugged into a Pi Zero would take less than 30
          | minutes. Doing it on an iPhone 15 would take less time. Doing it
         | on a Pixel 8 would be faster. Not to diminish getting it to
         | work on a Pi Zero, but that future is already here, just as
         | soon as we figure out what to do with them.
         | 
         | https://coral.ai/products/accelerator/
        
           | dontwearitout wrote:
           | There's an ocean of difference between optimizing for a
           | single wakeword and the class of models that are taking off
           | today. I'm excited for more on-board processing, because it
           | will mean less dependency on the cloud.
        
             | hereonout2 wrote:
              | Siri came out 12 years ago, so wake words are probably a
              | bad example.
              | 
              | Better examples are the Magic Eraser on my two-generation-
              | old Pixel phone, or the fact that Llama 2 runs genuinely
              | fast on a Mac mini.
        
               | joegibbs wrote:
                | Siri's wake word stuff is also terrible; she gets
                | constantly activated whenever I have my Apple Watch near
                | running water, frying food, or anything else that makes a
                | white-noise-type sound.
        
         | caycep wrote:
          | I mean, diffusion models tend to be less computationally
          | expensive than, say, CNNs or LLMs, so probably? And before
          | that, people ran SVMs, random forests, and other forms of
          | non-GPU-intensive ML algorithms locally as well...
        
         | nextworddev wrote:
         | Probably less than 5 years, maybe 2
        
         | mrtksn wrote:
          | The bottleneck is probably the availability of lithography
          | machines that can make ubiquitous chips able to process that
          | much data quickly enough without overheating or drawing too
          | much power.
         | 
          | Not too far in the future, every device will have an LLM chip
          | built on 5 nm or better process tech, and devices that
          | understand natural language will be the norm.
         | 
          | By "dumb machines", people will mean machines that have to be
          | programmed by people using ancient techniques, where everything
          | the machine is supposed to do is written out step by step in a
          | low-level computer language like JavaScript.
         | 
          | Nerds will be making demos of doing something incredibly fast
          | by writing the algorithms directly by hand, and will be annoyed
          | that something that can be done in 20 lines of code and a few
          | hundred MB of RAM in Node.js now requires a terabyte of RAM.
         | 
          | A "dumb phone" will be something like an iPhone 15 Pro or a
          | Pixel 8 Pro, where you have separate apps for each thing you do
          | and you can't simply ask the device to do it for you.
        
         | godelski wrote:
         | Yes and no. Context matters.
         | 
         | Will models of similar quality to the current LLaMA, GPT, and
         | Stable Diffusion be running locally on devices and edge
         | systems? Very likely.
         | 
          | Will much higher quality models that require compute beyond
          | what such edge or consumer devices can deliver be available,
          | sold as a service, and in heavy use? Also very likely.
         | 
          | So expect current quality to make it to your devices, but don't
          | expect everything to necessarily move local, because the whole
          | ecosystem will improve too. The Overton window will shift; it's
          | like asking whether gaming will move to phones. In some ways
          | yes, in other ways you're still going to want to buy that
          | PlayStation/Xbox/PC.
        
       | michaelaiello wrote:
        | Are there LLMs that will run on small Raspberry Pis, similar to
        | LLaMA?
        
       | hmry wrote:
       | CPU-only?
        
       | Zetobal wrote:
        | I know 29 minutes is long, but theoretically you can have all the
        | images you ever want in a small 6 GB package and run inference on
        | (nearly) everything. That's fucking amazing.
        
         | godelski wrote:
          | But honest question: if this is your goal, why not use a GAN
          | instead? You should still be able to produce high-quality
          | images, but at a much faster rate (I'd guess around 10
          | minutes?). Sure, you'll get a bit lower diversity and maybe not
          | SOTA image quality, but neither does this. Or you could reduce
          | quality. This Reddit user seems to be doing fast inference on a
          | Pi [0] using StyleGAN, and that was before MobileStyleGAN came
          | out, which uses <1 GB for inference. (It's a distilled
          | StyleGAN2 model; we could distill more recent models.)
         | 
          | It just seems like different models for different contexts.
          | Certainly you'd want diffusion on the computer you're running
          | Photoshop on, but for random images? Different context.
         | 
         | [0]
         | https://www.reddit.com/r/raspberry_pi/comments/hf7lbh/i_made...
        
           | GaggiX wrote:
            | The quality is not really close; also, StyleGAN2 is not
            | conditioned on text.
        
       | pmontra wrote:
        | It reminds me of the time it took to generate Mandelbrots on home
        | computers in the '80s.
        
         | johnklos wrote:
         | https://www.klos.com/~john/mandelbrot.jpg
        
       | dang wrote:
       | Submitted title was "Stable Diffusion Turbo on a Raspberry Pi
       | Zero 2 generates an image in 29 minutes", which is good to know
       | in order to understand some of the comments posted before I
       | changed the title.
       | 
       | Submitters: if you want to say what you think is important about
       | an article, that's fine, but do it by adding a comment to the
       | thread. Then your view will be on a level playing field with
       | everyone else's:
       | https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...
        
         | practice9 wrote:
          | Wait, but the new title doesn't seem to be correct.
        
           | dang wrote:
           | We certainly want it to be correct! I took "Stable Diffusion
           | XL 1.0 Base on a Raspberry Pi Zero 2" from the About part of 
           | https://github.com/vitoplantamura/OnnxStream/tree/c0cb4b3d7b.
           | ... Is it wrong?
        
       | Lerc wrote:
        | Nice to see people finding ways to get the square peg through the
        | round hole.
       | 
        | Something I wondered about when the Raspberry Pi 5 came out is
        | the weirdness that might be possible now that they have their own
        | chip doing IO cleverness.
       | 
        | On the Pi 5, the two MIPI interfaces can do either output or
        | input. It made me wonder if the ports are now generalized enough
        | that you could daisy-chain a string of Pi 5s, connecting MIPI to
        | MIPI. Then you could run inference layers on individual Pis and
        | pass the activations down the MIPI links. Ten 8 GB Pi 5s might
        | not be the speediest way to get an 80 GB setup, but it would
        | certainly be the cheapest (for now).
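        | 
        | To make the idea concrete, here is a toy sketch of that kind of
        | layer pipelining, with multiprocessing pipes standing in for the
        | MIPI links and plain matmul + ReLU layers standing in for a real
        | network (purely illustrative, nothing to do with actual MIPI
        | drivers or OnnxStream internals):
        | 
        |     # Each "Pi" runs a contiguous slice of layers and forwards
        |     # its activations to the next device in the chain.
        |     import numpy as np
        |     from multiprocessing import Pipe, Process
        | 
        |     def run_stage(layers, inbound, outbound):
        |         x = inbound.recv()               # activations from the previous Pi
        |         for w in layers:
        |             x = np.maximum(x @ w, 0.0)   # toy layer: matmul + ReLU
        |         outbound.send(x)                 # pass results down the chain
        | 
        |     if __name__ == "__main__":
        |         rng = np.random.default_rng(0)
        |         all_layers = [0.1 * rng.standard_normal((64, 64)) for _ in range(8)]
        |         n_stages = 4                     # e.g. 4 Pis, 2 layers each
        |         slices = [all_layers[i * 2:(i + 1) * 2] for i in range(n_stages)]
        | 
        |         # One pipe per hop: main -> stage 0 -> stage 1 -> ... -> main
        |         pipes = [Pipe() for _ in range(n_stages + 1)]
        |         procs = [Process(target=run_stage,
        |                          args=(slices[i], pipes[i][1], pipes[i + 1][0]))
        |                  for i in range(n_stages)]
        |         for p in procs:
        |             p.start()
        | 
        |         pipes[0][0].send(rng.standard_normal((1, 64)))  # input activations
        |         print(pipes[-1][1].recv().shape)                # (1, 64) final output
        |         for p in procs:
        |             p.join()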
        
       | johnklos wrote:
        | I would've loved it if this were more portable. It requires
        | XNNPACK, which has no generic C implementation. I'd've loved to
        | see Stable Diffusion running on an Alpha, SPARC, or m68k.
        
       ___________________________________________________________________
       (page generated 2023-12-14 23:00 UTC)