[HN Gopher] Diffusion Models
       ___________________________________________________________________
        
       Diffusion Models
        
       Author : reasonableklout
       Score  : 274 points
       Date   : 2024-05-24 23:35 UTC (1 days ago)
        
 (HTM) web link (andrewkchan.dev)
 (TXT) w3m dump (andrewkchan.dev)
        
       | ilaksh wrote:
       | What's the best Apache or MIT-licensed python library for
       | Diffusion Transformers?
        
         | reasonableklout wrote:
         | HuggingFace Diffusers is Apache and supports Diffusion
         | Transformers:
         | https://huggingface.co/docs/diffusers/en/api/pipelines/dit
        
           | ilaksh wrote:
            | They are actually based on Facebook's Attribution-
            | NonCommercial-licensed code and carry the same license.
        
             | simonw wrote:
             | Are you sure about that?
             | https://github.com/huggingface/diffusers lists the Apache 2
             | license.
        
               | ilaksh wrote:
               | https://github.com/huggingface/diffusers/blob/v0.27.2/src
               | /di...
        
               | bitvoid wrote:
               | That's sort of confusing (to me at least) because that
               | particular header also lists MIT and Apache licenses.
        
                | mrbungie wrote:
                | Pretty sure that license header ended up in the codebase
                | via someone's clever PR, or it was just a mistake.
        
                | ilaksh wrote:
                | OK, I would like to believe that. That's great then,
                | thanks.
        
               | mrbungie wrote:
                | If it worries you, maybe open an issue? No sane person
                | would allow a weird license that's an API call away from
                | screwing up their own products.
        
                | pama wrote:
                | Unfortunately this statement would not offer sufficient
                | legal protection. The original authors would have to be
                | convinced to give up their previous rights and change
                | the upstream copyright (and HuggingFace should update
                | their repo license statement). Of course, these days it
                | is typically easy enough to reimplement the code from a
                | paper in plain PyTorch, so I'm not sure one needs all of
                | this HuggingFace repo with the extra framework and risk.
                | Either way, to me it doesn't fit the requirement of the
                | OP's question.
        
         | eli_gottlieb wrote:
         | Besides huggingface there's also the DDPT repo:
         | https://github.com/lucidrains/denoising-diffusion-pytorch/
        
           | ilaksh wrote:
           | Nice. But is that a diffusion transformer?
        
         | ilaksh wrote:
         | I found this:
         | 
         | https://paperswithcode.com/paper/scalable-diffusion-models-w...
         | 
         | https://github.com/mindspore-lab/mindone/tree/master/example...
        
       | sidcool wrote:
       | This is a great post
        
        | sashank_1509 wrote:
        | Good post. I always thought diffusion originated from score
        | matching; today I realized diffusion came before score-matching
        | theory. So when OpenAI trained on 250 million images, they
        | didn't even have a great theory explaining why they were
        | modeling the underlying distribution. Gutsy move.
        
         | reasonableklout wrote:
          | The original Sohl-Dickstein 2015 paper [1] formulated diffusion as
         | maximizing (a lower bound of) log-likelihood of generating the
         | distribution, so there was some theory. But my understanding is
         | that the breakthrough was empirical results from Ho [2] and
         | Nichol [3] showing diffusion could produce not only high-
         | quality samples but better than GANs in some cases.
         | 
         | [1] https://arxiv.org/abs/1503.03585 [2]
         | https://arxiv.org/abs/2006.11239 [3]
         | https://arxiv.org/abs/2105.05233
        
       | Tao3300 wrote:
       | > I spent 2022 learning to draw and was blindsided by the rise of
       | AI art models like Stable Diffusion. Suddenly, the computer was a
       | better artist than I could ever hope to be.
       | 
       | I hope the author stuck with it anyway. The more AI encroaches on
       | creative work, the more I want to tear it all down.
        
          | ctippett wrote:
          | Conversely, I've become more motivated to draw things and try
          | my hand at digital art since being exposed to Stable
          | Diffusion, Midjourney et al. I take the output from these
          | tools and then attempt to recreate or trace over them.
        
        | davidguetta wrote:
        | The train loop is wrong, no? Neither x0s nor eps is used in the
        | expression for xts, so it looks like you're training to predict
        | random noise.
        
          | fisian wrote:
          | Yes, it should be the same as the equation before. Like this:
          | 
          | xts = alpha_bar[t].sqrt() * x0s
          |       + (1. - alpha_bar[t]).sqrt() * eps
          | 
          | Additionally, the code isn't consistent: the sampling code
          | uses a time embedding, while the training code doesn't.
        
           | reasonableklout wrote:
           | Oops, you're right. Fixed, thanks.
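
        [Editor's note: not from the post itself, but a minimal NumPy
        sketch of what the corrected noising line computes. The linear
        beta schedule and variable names mirror common DDPM code and the
        thread's snippet; the post's actual schedule may differ.]

```python
import numpy as np

# Assumed linear beta schedule over T steps (standard DDPM choice).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)  # cumulative product: alpha_bar_t

rng = np.random.default_rng(0)
x0s = rng.standard_normal((4, 8))     # stand-in batch of clean "images"
t = 500                               # a sampled timestep
eps = rng.standard_normal(x0s.shape)  # the noise the network must predict

# The corrected forward-noising expression from the thread:
# x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
xts = np.sqrt(alpha_bar[t]) * x0s + np.sqrt(1.0 - alpha_bar[t]) * eps
```

        Note that both x0s and eps now appear in xts, which is exactly
        the fix: the noised input carries information about the clean
        image, so predicting eps from xts is no longer predicting
        unconditioned random noise.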
        
          | kmacdough wrote:
          | Not sure which equation you're referring to, but from what I
          | understand, the network never "sees" the clean images
          | directly. Rather, it must learn to infer that information
          | indirectly through the loss function.
          | 
          | The loss function encodes information about the noise, and
          | because the network sees the noised-up image exactly, this is
          | equivalent to learning about the actual sample images. It's
          | worth noting that you could design a loss function measuring
          | the difference between the output and the real images. This
          | contains equivalent information, but it turns out that the
          | properties of Gaussian noise make it much more conducive to
          | estimating the gradient.
          | 
          | Point being, the information about the true images _is_ in
          | the loop, albeit only through the lens of some noise.
        
       | kmacdough wrote:
       | Thanks for sharing. This has given me much more insight into how
       | and why diffusion models work. Randomness is oddly powerful. Time
       | to try and code one up in some suitably unsuitable language.
       | 
       | Not much to TL;DR for the comment lurkers. This post _is_ the
       | TL;DR of stable diffusion.
        
       ___________________________________________________________________
       (page generated 2024-05-26 23:01 UTC)