[HN Gopher] I have reimplemented Stable Diffusion 3.5 from scrat...
___________________________________________________________________
I have reimplemented Stable Diffusion 3.5 from scratch in pure
PyTorch
Author : yousef_g
Score : 299 points
Date : 2025-06-14 13:56 UTC (9 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| squircle wrote:
| Although I'm leaning heavily away from being passionate about
| software development, this is a cool project, and it's freakin'
| awesome how anyone can now reinvent the wheel from first
| principles.
| albert_e wrote:
| Sounds like a great resource for learners.
|
| Just wondering aloud --
|
| Is there a tutorial/explainer by any chance that a beginner
| could use to follow along and learn how this is done?
| an0malous wrote:
| fast.ai has a course on building Stable Diffusion:
| https://course.fast.ai/Lessons/part2.html
| BinaryMachine wrote:
| Great resource! Jeremy Howard is awesome. I have been waiting
| to take this course and follow along, because anything older
| than a year in deep learning is already outdated. I hope they
| release a new version.
| whiplash451 wrote:
| I don't think this is true. The fast.ai class covers a lot
| of fundamentals that are still valid and useful today.
| b0a04gl wrote:
| does the DiT here actually capture cross-token attention the same
| way as full SD 3.5 or is it simplified for clarity?
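For context on the question above: in the full SD 3.5 MMDiT design, cross-token mixing comes from concatenating text and image tokens into one sequence and running plain self-attention over the result. A minimal single-head sketch, assuming NumPy; shapes and names here are illustrative, not taken from the repo:

```python
import numpy as np

# Sketch of the joint ("cross-token") attention used in SD3-style
# MMDiT blocks: text and image tokens are concatenated into a single
# sequence, so ordinary self-attention lets every token attend to
# every other token across both modalities.
def joint_attention(text_tokens, image_tokens, w_q, w_k, w_v):
    x = np.concatenate([text_tokens, image_tokens], axis=0)  # (T+I, d)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])                  # (T+I, T+I)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # row softmax
    return weights @ v                                       # (T+I, d)
```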
| reedlaw wrote:
| I'm not sure what this means. If it means the Stable Diffusion
| 3.5 model, why is it fetching that here:
| https://github.com/yousef-rafat/miniDiffusion/blob/main/enco...
|
| The training dataset is very small, only including
| fashion-related pictures:
| https://github.com/yousef-rafat/miniDiffusion/tree/main/data...
| yousef_g wrote:
| The dataset is for trying out fine-tuning of the diffusion
| model. It's a reimplementation of SD3 by writing the code from
| scratch again, but the weights are taken from HuggingFace due
| to hardware constraints on my part.
| reedlaw wrote:
| So this implements SD3 inference and fine-tuning?
| CamperBob2 wrote:
| _Add a Hugging Face Token in get_checkpoints.py before running
| the script._
|
| Can you be a bit more specific here? It's not clear what such a
| token is, what it takes to get one, or where it would be placed
| in get_checkpoints.py.
| einsteinx2 wrote:
| > what such a token is
|
| An API token from Hugging Face
|
| > what it takes to get one
|
| You generate them in your Hugging Face account
|
| > where it would be placed in get_checkpoints.py.
|
| Line 59 in the empty quotes where it says token = ""
| CamperBob2 wrote:
| Ah, I see it now, thanks.
|
| That's the kind of thing that, stylistically speaking, it's
| good to define at the very top of the module.
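The style tweak suggested above could look like this: read the Hugging Face API token from an environment variable at the top of the module, instead of pasting it into the empty `token = ""` assignment. `HF_TOKEN` and `load_hf_token` are illustrative names, not something the repo defines:

```python
import os

# Sketch: load the Hugging Face API token from the environment rather
# than hard-coding it into get_checkpoints.py.
def load_hf_token() -> str:
    token = os.environ.get("HF_TOKEN", "")
    if not token:
        raise RuntimeError(
            "Set HF_TOKEN first: create a token under Settings -> "
            "Access Tokens on huggingface.co"
        )
    return token
```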
| einsteinx2 wrote:
| Agreed. I'm not part of the project I just saw your comment
| and figured I'd try and help.
| Dwedit wrote:
| Leaving off the "API" part from "API Token" causes confusion,
| since AI models tokenize all text into "tokens" before
| running the model. It's using the same word to describe two
| very different things.
| einsteinx2 wrote:
| Yep totally. Fwiw I'm not part of the project I just saw
| the comment and figured I'd try and help.
| theturtle wrote:
| Cool. Can it still make images of Anne Hathaway leading a herd of
| blue giraffes on the Moon?
| IncreasePosts wrote:
| Seems difficult, as there are no known portraits of Anne
| Hathaway
| liuliu wrote:
| If you are interested in this: the Flux reference
| implementation is very minimalistic:
| https://github.com/black-forest-labs/flux/tree/main/src/flux
|
| The minRF project is a very easy way to start training small
| diffusion models with rectified flow:
| https://github.com/cloneofsimo/minRF
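The rectified-flow objective minRF trains with can be sketched in a few lines: interpolate on the straight line between data x0 and noise x1, and regress the model's velocity prediction toward the constant target (x1 - x0). Scalars stand in for tensors here, and the names are illustrative, not minRF's actual API:

```python
# Rectified flow: points on the straight path from data to noise, and
# the constant velocity target the model is regressed against.
def rectified_flow_pair(x0, x1, t):
    x_t = (1 - t) * x0 + t * x1   # the model's input at time t
    target = x1 - x0              # velocity of the straight path
    return x_t, target

def rectified_flow_loss(predicted_velocity, target):
    return (predicted_velocity - target) ** 2  # per-sample MSE
```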
|
| Also, the reference implementation of SD 3.5 is actually
| minimalistic too: https://github.com/Stability-AI/sd3-ref
| doctorpangloss wrote:
| Reference implementations are unmaintained and buggy.
|
| For example:
| https://github.com/huggingface/transformers/issues/27961
| OpenAI's tokenizer for CLIP is buggy. It's a reference
| implementation, it isn't the one they used for training, and
| the problems with it go unsolved and get copied endlessly by
| other projects.
|
| What about Flux? They don't say its reference implementation
| was used for training (it wasn't), and it has bugs that break
| cudagraphs or similar, which aren't that impactful. On the
| other hand, it uses the CLIP reference, and the CLIP reference
| is buggy, so this is buggy...
| 42lux wrote:
| You can disable CLIP L on Flux without a loss in quality. You
| are also making a mountain out of a molehill. CLIP is used
| everywhere.
| doctorpangloss wrote:
| Consider another interpretation: CLIP L in Flux can be
| disabled without a loss in quality because the way it is
| used is buggy!
| electroglyph wrote:
| It shouldn't take a lot of effort to fix a tokenizer...
| vergessenmir wrote:
| Are there any notable properties of this implementation? Are
| some parts slower or faster, etc.?
| NoelJacob wrote:
| So, that's Stable Diffusion without license constraints, is it?
| Sharlin wrote:
| No, the inference/training algorithms, being math, are not
| copyrightable. OP just wrote another implementation. What's
| copyrighted are the models, which OP did not train from scratch
| (having neither the training material nor the compute to do
| that).
| echelon wrote:
| We should be specific when we say "models": there's the code
| defining the network vs. the resultant weights (and also vs.
| any training, inference, fine-tuning, misc support code,
| etc.).
|
| The theoretical diagram of how the network's modules are
| connected is math. But an implementation of that in code is
| copyrightable.
|
| Afaik, the weights are still a grey area, whereas code is code
| and is copyrightable. Weights are not produced by humans; they
| are the result of an automated process, and arguably are not
| afforded copyright protection. But this hasn't been tested in
| court.
|
| If OpenAI GPT 4o weights leak, I think the whole world could
| use it for free. You'd just have to write the code to run
| them yourself.
| vrighter wrote:
| which means he is still in full violation of their license
| Zambyte wrote:
| > What's copyrighted are the models
|
| Has this actually been tested yet? Or are we still at the
| stage of AI companies trying to pretend this into reality?
| dheera wrote:
| I mean, if you take a match to a blank CD-ROM, or shoot
| neutrinos at a USB drive, there is a very small chance that
| you get the SD weights stored on them
| caycep wrote:
| How usable is the original academic source available from the
| Ludwig Maximilian University CompVis group?
| eapriv wrote:
| I find it hilarious that "from scratch" now somehow means "in
| PyTorch".
| monsieurbanana wrote:
| If any "from scratch" post doesn't start with linking to a
| Primitive Technology video, I'm closing the tab
| mkoubaa wrote:
| Unless the author was raised by chimps I'm out
| 0cf8612b2e1e wrote:
| Not fusing heavier elements from hydrogen? I'm out.
| chairmansteve wrote:
| Yeah. Should have done it in assembly.
| mardifoufs wrote:
| PyTorch is a pretty basic building block once you reach a
| certain degree of model complexity. It wouldn't really be
| interesting to reimplement autograd or the other things
| PyTorch provides, imo, when the goal is to show a
| reimplementation of something as "high-level" as SD. It's
| similar to how I don't mind it when someone doesn't
| reimplement an OS or a JavaScript engine when writing a web
| app from scratch.
|
| And there's been a recent surge in abstractions over pytorch,
| and even standalone packages for models that you are just
| expected to import and use as an API (which are very useful,
| don't get me wrong!). So it's nice to see an implementation
| that doesn't have 10 different dependencies that each abstract
| over something pytorch does.
| refulgentis wrote:
| I'm embarrassed to ask: can someone elaborate on, say, what we
| have now that we didn't have before the repo existed?
|
| I have studiously avoided making models, though I've been
| adjacent to their output for years now... I think the root of my
| confusion is I kinda assumed there were already PyTorch-based
| scripts for inference / training. (I assumed _at least_
| inference scripts were released with models, and kinda figured
| fine-tuning / training ones were too.)
|
| So then I'm not sure if I'm just looking at a clean room / dirty
| room rewrite of those. Or maybe everyone is using "PyTorch" but
| it's usually calling into CUDA/C/some proprietary thingy that is
| much harder to grok than a pure PyTorch impl?
|
| Anyways, these aren't great guesses, so I'll stop myself
| here. :)
| _tqr3 wrote:
| Stability AI, creators of the Stable Diffusion models, release
| their products under their own Stability AI Community License,
| which is not "free" like the MIT license. You are not allowed
| to modify the weights in certain ways.
|
| This package is basically running the model (inference) and
| maybe fine-tuning it using existing AI weights. A great way to
| learn, but it could still run into the same licensing issue.
| refulgentis wrote:
| You can't finetune SD 3.5!?
|
| I thought the community license stuff was about keeping
| people from using it in prod and charging for it without
| Stability getting at least a small taste.
|
| This sucks.
|
| I haven't been keeping up with gooner squad on Civit, but I
| did have some understanding SD was less popular, but I
| thought it was just because 3.5 came far too long after Flux
| with too little, if any, quality increase to be worth
| building new scaffolding for.
| rockemsockem wrote:
| I believe this is the main piece
|
| > with minimal dependencies
|
| I haven't tried running SD 3.5 specifically, but it's built on
| Hugging Face libraries, which I personally always find to be a
| mess of dependencies that makes it really hard to set up
| without the exact configuration the original developers used
| (which is often not provided in enough detail to actually
| work). This makes it pretty hard to run certain models,
| especially if it's a few months/years after the original
| release.
|
| For example, this appears to be the requirements file for the
| Stability AI reference implementation of SD 3.5: there are no
| versions specified, and it includes "transformers", which is
| just an enormous library.
|
| https://github.com/Stability-AI/sd3.5/blob/main/requirements...
| refulgentis wrote:
| Ah, tyvm, that maps well onto my knowledge set. I have an ONNX
| inference wrapper written in Dart. However, I have never been
| able to leverage the transformers.js ONNX demo code, i.e. have
| a reference to port to Dart.
|
| IIRC it is written in an abstraction layer that supports a
| transformers-like API surface. This also makes it opaque to
| figure out _what you're actually passing to the model_, adding
| a Python dep mess on top of that... woo boy.
| hkon wrote:
| now do it in minecraft
| ineedasername wrote:
| When I think of SD 3.5 (or any version) I think of the portion
| that results from training, i.e., the weights. The code seems
| less important? I mean as far as output quality is concerned, or
| performance. But I'm honestly not sure, and not trying to judge
| these efforts on that basis.
| Dwedit wrote:
| Does using pure PyTorch improve performance on non-NVIDIA cards
| in any way? Or is PyTorch so highly optimized for CUDA that no
| other GPU vendors have a chance?
| VeejayRampay wrote:
| I believe PyTorch works nicely with ROCm, but I don't know if
| it works nicely enough to be "on par".
| 3abiton wrote:
| It seems to be the case, although pytorch rocm is coming around
| slowly. Very slowly, if you get it working that is.
| chickenzzzzu wrote:
| It is possible to run ML workloads on for example AMD devices
| via Vulkan. With newer extensions like cooperative matrix, and
| maybe also in the future some scheduling magic exposed by the
| driver through a new extension, the remaining single digit
| percent gap CUDA has will evaporate.
| jwitthuhn wrote:
| Pytorch also runs great on apple silicon, though it is hard to
| directly compare because Apple's high end GPUs can't compute
| anywhere near as much as nvidia's high end stuff.
|
| e: I'll also add that pytorch does still have one oddity on
| apple silicon which is that it considers each tensor to be
| 'owned' by a particular device, either a cpu or gpu. Macs have
| unified memory but pytorch will still do a full copy when you
| 'move' data between the cpu and gpu because it just wasn't
| built for unified memory.
| brcmthrowaway wrote:
| Does PyTorch work on AS out of the box? Or do you need some
| Apple-specific package?
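On the out-of-the-box question: since PyTorch 1.12, the standard `pip install torch` wheel on Apple silicon ships with the MPS (Metal) backend, so no Apple-specific package is needed. A small device-selection sketch; `best_device` is an illustrative name, and the guards keep it runnable even where torch or MPS is absent:

```python
# Pick the best available PyTorch device, falling back gracefully.
def best_device() -> str:
    try:
        import torch
    except ImportError:
        return "unavailable"
    mps = getattr(torch.backends, "mps", None)  # absent on old versions
    if mps is not None and mps.is_available():
        return "mps"   # Apple silicon GPU via Metal
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"
```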
| SV_BubbleTime wrote:
| All twelve people using SD 3.5 may be interested in this.
___________________________________________________________________
(page generated 2025-06-14 23:00 UTC)