[HN Gopher] Diffusion Models
___________________________________________________________________
Diffusion Models
Author : reasonableklout
Score : 274 points
Date : 2024-05-24 23:35 UTC (1 day ago)
(HTM) web link (andrewkchan.dev)
(TXT) w3m dump (andrewkchan.dev)
| ilaksh wrote:
| What's the best Apache or MIT-licensed python library for
| Diffusion Transformers?
| reasonableklout wrote:
| HuggingFace Diffusers is Apache and supports Diffusion
| Transformers:
| https://huggingface.co/docs/diffusers/en/api/pipelines/dit
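|
| For reference, a minimal usage sketch based on the linked
| docs (the model id and scheduler swap are from the docs
| example; adjust for your setup):
|
|     import torch
|     from diffusers import DiTPipeline, DPMSolverMultistepScheduler
|
|     pipe = DiTPipeline.from_pretrained(
|         "facebook/DiT-XL-2-256", torch_dtype=torch.float16
|     )
|     pipe.scheduler = DPMSolverMultistepScheduler.from_config(
|         pipe.scheduler.config
|     )
|     pipe = pipe.to("cuda")
|
|     # DiT is class-conditional on ImageNet labels
|     class_ids = pipe.get_label_ids(["golden retriever"])
|     image = pipe(class_labels=class_ids,
|                  num_inference_steps=25).images[0]
|     image.save("dit_sample.png")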
| ilaksh wrote:
| They are actually based on the Attribution-NonCommercial
| (CC BY-NC) Facebook code and have the same license.
| simonw wrote:
| Are you sure about that?
| https://github.com/huggingface/diffusers lists the Apache 2
| license.
| ilaksh wrote:
| https://github.com/huggingface/diffusers/blob/v0.27.2/src/di...
| bitvoid wrote:
| That's sort of confusing (to me at least) because that
| particular header also lists MIT and Apache licenses.
| mrbungie wrote:
| Pretty sure that license header ended up in the codebase
| from a clever guy making a PR or it was just a mistake.
| ilaksh wrote:
| OK, I would like to believe that. That's great then,
| thanks.
| mrbungie wrote:
| If it worries you, maybe open an issue? No sane maintainer
| would allow a weird license that's one API call away from
| screwing up your products.
| pama wrote:
| Unfortunately this statement would not offer sufficient
| legal protection, so the original authors would have to
| be convinced to give up their previous rights and change
| the upstream copyright (and huggingface should update
| their repo license statement). Of course, these days it
| is typically easy enough to reimplement the code from a
| paper in plain pytorch, so I'm not sure one needs all
| this huggingface repo with the extra framework and risk,
| but to me it doesnt fit the requirement of the OP
| question.
| eli_gottlieb wrote:
| Besides huggingface there's also the DDPT repo:
| https://github.com/lucidrains/denoising-diffusion-pytorch/
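|
| From memory, its README-style API is roughly this (names
| are from the repo, so treat it as a sketch):
|
|     import torch
|     from denoising_diffusion_pytorch import Unet, GaussianDiffusion
|
|     model = Unet(dim=64, dim_mults=(1, 2, 4, 8))
|     diffusion = GaussianDiffusion(model, image_size=128,
|                                   timesteps=1000)
|
|     training_images = torch.rand(8, 3, 128, 128)  # your data here
|     loss = diffusion(training_images)
|     loss.backward()
|
|     sampled = diffusion.sample(batch_size=4)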
| ilaksh wrote:
| Nice. But is that a diffusion transformer?
| ilaksh wrote:
| I found this:
|
| https://paperswithcode.com/paper/scalable-diffusion-models-w...
|
| https://github.com/mindspore-lab/mindone/tree/master/example...
| sidcool wrote:
| This is a great post
| sashank_1509 wrote:
| Good post, I always thought diffusion originated from score
| matching, today I realized diffusion came before score matching
| theory, so when OpenAI trained on 250 million images, they didn't
| even have great theory explaining why they were modeling the
| underlying distribution. Gutsy move
| reasonableklout wrote:
| The original Sohl-Dickstein 2015 paper [1] formulated
| diffusion as maximizing (a lower bound of) the
| log-likelihood of generating the distribution, so there was
| some theory. But my understanding is that the breakthrough
| was the empirical results from Ho [2] and Nichol [3]
| showing that diffusion could produce not only high-quality
| samples, but in some cases better ones than GANs.
|
| [1] https://arxiv.org/abs/1503.03585
| [2] https://arxiv.org/abs/2006.11239
| [3] https://arxiv.org/abs/2105.05233
| Tao3300 wrote:
| > I spent 2022 learning to draw and was blindsided by the rise of
| AI art models like Stable Diffusion. Suddenly, the computer was a
| better artist than I could ever hope to be.
|
| I hope the author stuck with it anyway. The more AI encroaches on
| creative work, the more I want to tear it all down.
| ctippett wrote:
| Conversely I've become more motivated to draw things and try my
| hand at digital art since being exposed to Stable Diffusion,
| Midjourney et. al. I take the output from these tools and then
| attempt to recreate or trace over them.
| davidguetta wrote:
| The train loop is wrong no ? neither x0s and eps are used in the
| expression of xts so it loons like your training to predict
| random noise
| fisian wrote:
| Yes, it should be the same as the equation before. Like
| this:
|
|     xts = (alpha_bar[t].sqrt() * x0s
|            + (1. - alpha_bar[t]).sqrt() * eps)
|
| Additionally, the code isn't consistent. In the sampling code a
| time embedding is used, while in training it isn't.
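|
| For context, a minimal sketch of a corrected training step
| (`model`, `opt`, and the schedule below are assumptions,
| not the post's exact code; `model(xts, t)` assumes the
| network is conditioned on the timestep, per the point about
| the time embedding):
|
|     import torch
|     import torch.nn.functional as F
|
|     T = 1000
|     beta = torch.linspace(1e-4, 0.02, T)      # DDPM-style schedule
|     alpha_bar = torch.cumprod(1. - beta, dim=0)
|
|     def train_step(model, opt, x0s):
|         t = torch.randint(0, T, (x0s.shape[0],))  # timestep per image
|         eps = torch.randn_like(x0s)               # noise to predict
|         ab = alpha_bar[t].view(-1, 1, 1, 1)       # broadcast over CHW
|         xts = ab.sqrt() * x0s + (1. - ab).sqrt() * eps
|         loss = F.mse_loss(model(xts, t), eps)     # predict the noise
|         opt.zero_grad(); loss.backward(); opt.step()
|         return loss.item()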
| reasonableklout wrote:
| Oops, you're right. Fixed, thanks.
| kmacdough wrote:
| Not sure which eq you refer to, but from what I understand, the
| network never network "sees" the correct images. Rather, the
| network must learn to infer the information indirectly through
| the loss function.
|
| The loss function encodes information about the noise and,
| because the network sees the noised up image exactly, this is
| equivalent to learning about the actual sample images. It's
| worth noting that you could design a loss function measuring
| the difference between the output and the real images. This
| contains equivalent information, but it turns out that the
| properties of gaussian noise make it much more conducive
| estimating the gradient.
|
| But point being, the information on the true images _is_ in the
| loop albeit only through the lense of some noise.
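|
| Concretely, a sketch of the two equivalent losses (reusing
| the alpha_bar / xt / eps names from above; `ab` is
| alpha_bar[t] broadcast to the image shape):
|
|     # eps-parameterization: the network predicts the noise
|     eps_hat = model(xt, t)
|     loss_eps = F.mse_loss(eps_hat, eps)
|
|     # x0-parameterization: same information, re-expressed by
|     # inverting xt = ab.sqrt() * x0 + (1 - ab).sqrt() * eps
|     x0_hat = (xt - (1. - ab).sqrt() * eps_hat) / ab.sqrt()
|     loss_x0 = F.mse_loss(x0_hat, x0)  # a reweighted loss_eps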
| kmacdough wrote:
| Thanks for sharing. This has given me much more insight into how
| and why diffusion models work. Randomness is oddly powerful. Time
| to try and code one up in some suitably unsuitable language.
|
| Not much to TL;DR for the comment lurkers. This post _is_ the
| TL;DR of stable diffusion.
___________________________________________________________________
(page generated 2024-05-26 23:01 UTC)