[HN Gopher] I have reimplemented Stable Diffusion 3.5 from scrat...
       ___________________________________________________________________
        
       I have reimplemented Stable Diffusion 3.5 from scratch in pure
       PyTorch
        
       Author : yousef_g
       Score  : 299 points
       Date   : 2025-06-14 13:56 UTC (9 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | squircle wrote:
        | Although I'm leaning heavily away from being passionate about
        | software development, this is a cool project, and it's freakin'
        | awesome how anyone can now reinvent the wheel from first
        | principles.
        
       | albert_e wrote:
        | Sounds like a great resource for learners.
       | 
       | Just wondering aloud --
       | 
        | Is there a tutorial/explainer, by any chance, that a beginner
        | could use to follow along and learn how this is done?
        
         | an0malous wrote:
         | fast.ai has a course on building Stable Diffusion:
         | https://course.fast.ai/Lessons/part2.html
        
           | BinaryMachine wrote:
            | Great resource. Jeremy Howard is awesome. I have been
            | waiting to take this course and follow along, because
            | anything older than a year in Deep Learning is already
            | outdated. I hope they release a new version.
        
             | whiplash451 wrote:
             | I don't think this is true. The fast.ai class covers a lot
             | of fundamentals that are still valid and useful today.
        
       | b0a04gl wrote:
        | Does the DiT here actually capture cross-token attention the
        | same way as full SD 3.5, or is it simplified for clarity?
        
       | reedlaw wrote:
       | I'm not sure what this means. If it means the Stable Diffusion
       | 3.5 model, why is it fetching that here:
       | https://github.com/yousef-rafat/miniDiffusion/blob/main/enco...
       | 
        | The training dataset is very small, only including fashion-
        | related pictures:
        | https://github.com/yousef-rafat/miniDiffusion/tree/main/data...
        
         | yousef_g wrote:
          | The dataset is for trying out fine-tuning of the diffusion
          | model. It's a reimplementation of SD3, with the code written
          | from scratch, but the weights are taken from Hugging Face due
          | to hardware constraints on my part.
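          | 
          | As a rough sketch, loading pretrained weights into a
          | rewritten module looks something like this (the file and
          | class names here are illustrative, not the repo's actual
          | ones):
          | 
          |     from safetensors.torch import load_file
          | 
          |     # Weights come from Hugging Face; only the code is new.
          |     state_dict = load_file("sd3.5_medium.safetensors")
          |     model = MMDiT()  # hypothetical reimplemented network
          |     model.load_state_dict(state_dict)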
        
           | reedlaw wrote:
           | So this implements SD3 inference and fine-tuning?
        
       | CamperBob2 wrote:
       | _Add a Hugging Face Token in get_checkpoints.py before running
       | the script._
       | 
       | Can you be a bit more specific here? It's not clear what such a
       | token is, what it takes to get one, or where it would be placed
       | in get_checkpoints.py.
        
         | einsteinx2 wrote:
         | > what such a token is
         | 
         | An API token from Hugging Face
         | 
         | > what it takes to get one
         | 
         | You generate them in your Hugging Face account
         | 
         | > where it would be placed in get_checkpoints.py.
         | 
         | Line 59 in the empty quotes where it says token = ""
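          | 
          | For reference, the relevant part looks roughly like this (a
          | sketch, not the file's exact contents):
          | 
          |     from huggingface_hub import login
          | 
          |     token = ""  # paste your Hugging Face API token here
          |     login(token=token)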
        
           | CamperBob2 wrote:
           | Ah, I see it now, thanks.
           | 
           | That's the kind of thing that, stylistically speaking, it's
           | good to define at the very top of the module.
        
             | einsteinx2 wrote:
              | Agreed. I'm not part of the project; I just saw your
              | comment and figured I'd try and help.
        
           | Dwedit wrote:
           | Leaving off the "API" part from "API Token" causes confusion,
           | since AI models tokenize all text into "tokens" before
           | running the model. It's using the same word to describe two
           | very different things.
        
             | einsteinx2 wrote:
              | Yep, totally. Fwiw I'm not part of the project; I just
              | saw the comment and figured I'd try to help.
        
       | theturtle wrote:
       | Cool. Can it still make images of Anne Hathaway leading a herd of
       | blue giraffes on the Moon?
        
         | IncreasePosts wrote:
         | Seems difficult, as there are no known portraits of Anne
         | Hathaway
        
       | liuliu wrote:
        | If you are interested in this, the Flux reference implementation
        | is very minimalistic:
        | https://github.com/black-forest-labs/flux/tree/main/src/flux
       | 
        | The minRF project is a very easy way to start training small
        | diffusion models with rectified flow:
        | https://github.com/cloneofsimo/minRF
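        | 
        | For a sense of how small rectified flow is, a training step is
        | roughly the following (the model(xt, t) signature is an
        | assumption):
        | 
        |     import torch
        | 
        |     def rf_loss(model, x1):
        |         # Straight-line path from noise x0 to data x1; the
        |         # velocity target along it is constant: x1 - x0.
        |         x0 = torch.randn_like(x1)
        |         t = torch.rand(x1.shape[0], device=x1.device)
        |         t_ = t.view(-1, *([1] * (x1.dim() - 1)))
        |         xt = (1 - t_) * x0 + t_ * x1
        |         v_pred = model(xt, t)
        |         return ((v_pred - (x1 - x0)) ** 2).mean()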
       | 
       | Also, the reference implementation of SD 3.5 is actually
       | minimalistic too: https://github.com/Stability-AI/sd3-ref
        
         | doctorpangloss wrote:
         | Reference implementations are unmaintained and buggy.
         | 
          | For example:
          | https://github.com/huggingface/transformers/issues/27961
          | OpenAI's tokenizer for CLIP is buggy. It's a reference
          | implementation, it isn't the one they used for training, and
          | its problems go unsolved and get copied endlessly by other
          | projects.
          | 
          | What about Flux? They don't say the reference implementation
          | was used for training (it wasn't), and it has bugs that break
          | cudagraphs or similar, which aren't that impactful. On the
          | other hand, it uses the CLIP reference, and the CLIP reference
          | is buggy, so this is buggy too...
        
           | 42lux wrote:
            | You can disable CLIP L on Flux without a loss in quality.
            | You're also making a mountain out of a molehill. CLIP is
            | used everywhere.
        
             | doctorpangloss wrote:
             | Consider another interpretation: CLIP L in Flux can be
             | disabled without a loss in quality because the way it is
             | used is buggy!
        
           | electroglyph wrote:
           | It shouldn't take a lot of effort to fix a tokenizer...
        
       | vergessenmir wrote:
        | Are there any notable properties of this implementation? Are
        | some parts slower, faster, etc.?
        
       | NoelJacob wrote:
       | So, that's Stable Diffusion without license constraints, is it?
        
         | Sharlin wrote:
         | No, the inference/training algorithms, being math, are not
         | copyrightable. OP just wrote another implementation. What's
         | copyrighted are the models, which OP did not train from scratch
         | (having neither the training material nor the compute to do
         | that).
        
           | echelon wrote:
           | We should be specific when we say "models".
           | 
            | There's the code outlining the network vs. the resultant
            | weights (and also any training, inference, fine-tuning, and
            | misc support code, etc.).
           | 
            | The theoretical diagram of how the networks and modules are
            | connected is math. But an implementation of that diagram in
            | code is copyrightable.
           | 
           | Afaik, the weights are still a grey area. Whereas code is
           | code and is copyrightable.
           | 
            | Weights are not produced by humans; they are the result of
            | an automated process, and so arguably are not afforded
            | copyright protection. But this hasn't been tested in court.
           | 
            | If the OpenAI GPT-4o weights leak, I think the whole world
            | could use them for free. You'd just have to write the code
            | to run them yourself.
        
           | vrighter wrote:
            | Which means he is still in full violation of their license.
        
           | Zambyte wrote:
           | > What's copyrighted are the models
           | 
           | Has this actually been tested yet? Or are we still at the
           | stage of AI companies trying to pretend this into reality?
        
             | dheera wrote:
             | I mean, if you take a match to a blank CD-ROM, or shoot
             | neutrinos at a USB drive, there is a very small chance that
             | you get the SD weights stored on them
        
       | caycep wrote:
        | How usable is the original academic source available from the
        | Ludwig Maximilian University CompVis group?
        
       | eapriv wrote:
       | I find it hilarious that "from scratch" now somehow means "in
       | PyTorch".
        
         | monsieurbanana wrote:
         | If any "from scratch" post doesn't start with linking to a
         | Primitive Technology video, I'm closing the tab
        
           | mkoubaa wrote:
           | Unless the author was raised by chimps I'm out
        
             | 0cf8612b2e1e wrote:
             | Not fusing heavier elements from hydrogen? I'm out.
        
         | chairmansteve wrote:
         | Yeah. Should have done it in assembly.
        
         | mardifoufs wrote:
          | PyTorch is a pretty basic building block when you get to some
          | degree of model complexity. It wouldn't really be interesting
          | to implement autograd or some of the other things PyTorch
          | provides, imo, when the goal is to show a reimplementation of
          | something as "high-level" as SD. It's similar to how I don't
          | mind it when someone doesn't reimplement an OS or a
          | JavaScript engine when writing a web app from scratch.
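          | 
          | To be concrete about what PyTorch hands you for free there:
          | 
          |     import torch
          | 
          |     # Autograd in a few lines: build a graph, get gradients.
          |     x = torch.tensor([2.0], requires_grad=True)
          |     y = (x ** 2).sum()
          |     y.backward()
          |     print(x.grad)  # tensor([4.])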
         | 
          | And there's been a recent surge in abstractions over PyTorch,
          | and even standalone packages for models that you are just
          | expected to import and use as an API (which are very useful,
          | don't get me wrong!). So it's nice to see an implementation
          | that doesn't have 10 different dependencies that each
          | abstract over something PyTorch does.
        
       | refulgentis wrote:
       | I'm embarrassed to ask: can someone elaborate on, say, what we
       | have now that we didn't have before the repo existed?
       | 
       | I have studiously avoided making models, though I've been
       | adjacent to their output for years now... I think the root of my
        | confusion is I kinda assumed there were already PyTorch-based
        | scripts for inference / training. (I assumed _at least_
        | inference scripts were released with models, and kinda figured
        | fine-tuning / training ones were too.)
       | 
       | So then I'm not sure if I'm just looking at a clean room / dirty
       | room rewrite of those. Or maybe everyone is using "PyTorch" but
       | it's usually calling into CUDA/C/some proprietary thingy that is
       | much harder to grok than a pure PyTorch impl?
       | 
        | Anyways, these aren't great guesses, so I'll stop myself here. :)
        
         | _tqr3 wrote:
          | Stability AI, creator of the Stable Diffusion models,
          | releases them under its own Stability AI Community License,
          | which is not "free" like the MIT license. You are not allowed
          | to modify the weights in certain ways.
          | 
          | This package basically runs the model (inference) and maybe
          | fine-tunes it using the existing weights. A great way to
          | learn, but it could still run into the same licensing issues.
        
           | refulgentis wrote:
           | You can't finetune SD 3.5!?
           | 
           | I thought the community license stuff was about keeping
           | people from using it in prod and charging for it without
           | Stability getting at least a small taste.
           | 
           | This sucks.
           | 
          | I haven't been keeping up with the gooner squad on Civit, but
          | I did have some understanding that SD was less popular. I
          | thought that was just because 3.5 came far too long after
          | Flux, with too little (if any) quality increase to be worth
          | building new scaffolding for.
        
         | rockemsockem wrote:
         | I believe this is the main piece
         | 
         | > with minimal dependencies
         | 
          | I haven't tried running SD 3.5 specifically, but it's built
          | on Hugging Face libraries, which I personally always find to
          | be a mess of dependencies that makes it really hard to set up
          | without the exact configuration the original developers used
          | (which is often not provided in enough detail to actually
          | work). This makes it pretty hard to run certain models,
          | especially a few months/years after the original release.
         | 
          | For example, this appears to be the requirements file for the
          | Stability AI reference implementation of SD 3.5: no versions
          | are specified, and it includes "transformers", which is just
          | an enormous library.
         | 
         | https://github.com/Stability-AI/sd3.5/blob/main/requirements...
        
           | refulgentis wrote:
            | Ah, tyvm, that maps well onto my knowledge set: I have an
            | ONNX inference wrapper written in Dart. However, I have
            | never been able to leverage the transformers.js ONNX demo
            | code, i.e. have a reference to port to Dart.
            | 
            | IIRC it is written in an abstraction layer that supports a
            | transformers-like API surface. This also makes it opaque to
            | figure out _what you're actually passing to the model_,
            | adding a Python dep mess on top of that... woo boy.
        
       | hkon wrote:
       | now do it in minecraft
        
       | ineedasername wrote:
       | When I think of SD 3.5 (or any version) I think of the portion
       | that results from training, i.e., the weights. The code seems
       | less important? I mean as far as output quality is concerned, or
       | performance. But I'm honestly not sure, and not trying to judge
       | these efforts on that basis.
        
       | Dwedit wrote:
       | Does using pure PyTorch improve performance on non-NVIDIA cards
       | in any way? Or is PyTorch so highly optimized for CUDA that no
       | other GPU vendors have a chance?
        
         | VeejayRampay wrote:
          | I believe PyTorch works nicely with ROCm, but I don't know if
          | it works nicely to the point of being "on par."
        
         | 3abiton wrote:
          | It seems to be the case, although PyTorch on ROCm is coming
          | around slowly. Very slowly, if you can get it working, that
          | is.
        
         | chickenzzzzu wrote:
          | It is possible to run ML workloads on, for example, AMD
          | devices via Vulkan. With newer extensions like cooperative
          | matrix, and maybe in the future some scheduling magic exposed
          | by the driver through a new extension, the remaining single-
          | digit percent gap CUDA has will evaporate.
        
         | jwitthuhn wrote:
          | PyTorch also runs great on Apple silicon, though it is hard
          | to compare directly because Apple's high-end GPUs can't
          | compute anywhere near as much as Nvidia's high-end stuff.
          | 
          | e: I'll also add that PyTorch does still have one oddity on
          | Apple silicon, which is that it considers each tensor to be
          | 'owned' by a particular device, either a CPU or a GPU. Macs
          | have unified memory, but PyTorch will still do a full copy
          | when you 'move' data between the CPU and GPU, because it just
          | wasn't built for unified memory.
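          | 
          | Concretely (a small sketch; "mps" is the Apple-GPU backend):
          | 
          |     import torch
          | 
          |     if torch.backends.mps.is_available():
          |         x = torch.randn(1024, 1024)  # tensor 'owned' by the CPU
          |         y = x.to("mps")   # full copy, despite unified memory
          |         z = y.to("cpu")   # and a full copy back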
        
           | brcmthrowaway wrote:
            | Does PyTorch work on Apple silicon out of the box? Or do
            | you need some Apple-specific package?
        
       | SV_BubbleTime wrote:
       | All twelve people using SD 3.5 may be interested in this.
        
       ___________________________________________________________________
       (page generated 2025-06-14 23:00 UTC)