hngopher.com

       [HN Gopher] Show HN: An open-source implementation of AlphaFold3
       ___________________________________________________________________
        
       Show HN: An open-source implementation of AlphaFold3
        
       Hi HN - we're the founders of Ligo Biosciences and are excited to
       share an open-source implementation of AlphaFold3, the frontier
       model for protein structure prediction.

       Google DeepMind and their new startup, Isomorphic Labs, are
       expanding into drug discovery. They developed AlphaFold3 as their
       model to accelerate drug discovery and create demand from big
       pharma. They have already signed Novartis and Eli Lilly for
       $3 billion - Google's becoming a pharma company!
       (https://www.isomorphiclabs.com/articles/isomorphic-labs-kick...)

       AlphaFold3 is a biomolecular structure prediction model
       that can do three main things: (1) predict the structure of
       proteins; (2) predict the structure of drug-protein interactions;
       (3) predict the structure of nucleic acid-protein complexes.

       AlphaFold3 is incredibly important for science because it vastly
       accelerates the mapping of protein structures. Solving a single
       structure experimentally can take a PhD student their entire PhD.
       With AlphaFold3, you get a prediction in minutes, on par with
       experimental accuracy.

       There's
       just one problem: when DeepMind published AlphaFold3 in May
       (https://www.nature.com/articles/s41586-024-07487-w), there was no
       code. This brought up questions about reproducibility
       (https://www.nature.com/articles/d41586-024-01463-0) as well as
       complaints from the scientific community
       (https://undark.org/2024/06/06/opinion-alphafold-3-open-sourc...).

       AlphaFold3 is a fundamental advance in structure modeling
       technology that the entire biotech industry deserves to benefit
       from. Its applications are vast, including:

       - CRISPR gene editing technologies, where scientists can see
         exactly how the DNA interacts with the Cas "scissor" protein;
       - Cancer research: predicting how a potential drug binds to the
         cancer target. One of the highlights in DeepMind's paper is the
         prediction of a clinical KRAS inhibitor in complex with its
         target;
       - Antibody / nanobody to target predictions. AlphaFold3 improves
         accuracy on this class of molecules twofold compared to the next
         best tool.

       Unfortunately, no companies can use it since it is under a
       non-commercial license!

       Today we are releasing the full
       model trained on single-chain proteins (capability 1 above), with
       the other two capabilities to be trained and released soon. We also
       include the training code. Weights will be released once training
       and benchmarking are complete. We wanted this to be truly open
       source, so we used the Apache 2.0 license.

       DeepMind published the full structure of the model, along with each
       component's pseudocode, in their paper. We translated this fully
       into PyTorch, which required more reverse engineering than we
       thought!

       When building the initial version, we discovered multiple issues in
       DeepMind's paper that would interfere with the training - we think
       the deep learning community might find these especially
       interesting. (Diffusion folks, we would love feedback on this!)
       These include:

       - MSE loss scaling differs from Karras et al. (2022). The weighting
         provided in the paper does not downweight the loss at high noise
         levels.
       - Omission of residual layers in the paper - we add these back and
         see benefits in gradient flow and convergence. Anyone have any
         idea why DeepMind may have omitted the residual connections in
         the DiT blocks?
       - The MSA module, in its current form, has dead layers. The last
         pair weighted averaging and transition layers cannot contribute
         to the pair representation, and hence receive no gradients. We
         swap the order to the one in the ExtraMsaStack in AlphaFold2. An
         alternative solution would be to use weight sharing, but whether
         this is done is ambiguous in the paper.

       More about those issues here:
       https://github.com/Ligo-Biosciences/AlphaFold3

       How this came
       about: we are building Ligo (YC S24), where we are using ideas from
       AlphaFold3 for enzyme design. We thought open sourcing it was a
       nice side quest to benefit the community.

       For those on Twitter, there was a good thread a few days ago that
       has more information:
       https://twitter.com/ArdaGoreci/status/1830744265007480934

       A few shoutouts:

       - A huge thanks to OpenFold for pioneering the previous open source
         implementation of AlphaFold.
       - We did a lot of our early prototyping with proteinFlow, developed
         by Lisa at AdaptyvBio. We also look forward to partnering with
         them to bring you the next versions!
       - We are also partnering with Basecamp Research to supply this
         model with the best sequence data known to science.
       - Matthew Clark (https://batisio.co.uk) for his amazing animations!

       We're around to answer questions and look forward to hearing from
       you!
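
       A minimal sketch of the loss-scaling point in the post, not taken
       from the Ligo repository. The first function is the weighting from
       Karras et al. (2022), lambda(sigma) = (sigma^2 + sigma_data^2) /
       (sigma * sigma_data)^2. The second is an assumed form of the
       weighting in the AlphaFold3 paper, with a sum rather than a product
       in the denominator; check it against the paper before relying on
       it. The function names and the default sigma_data = 16 are
       illustrative assumptions.

```python
def edm_weight(sigma: float, sigma_data: float = 16.0) -> float:
    """Loss weight from Karras et al. (2022): falls off as sigma grows."""
    return (sigma**2 + sigma_data**2) / (sigma * sigma_data) ** 2


def paper_style_weight(sigma: float, sigma_data: float = 16.0) -> float:
    """ASSUMED form of the AlphaFold3 paper's weight (sum in the denominator).

    This is a hypothetical reconstruction for illustration only.
    """
    return (sigma**2 + sigma_data**2) / (sigma + sigma_data) ** 2


if __name__ == "__main__":
    # Compare both weights from low to high noise levels.
    for sigma in (1.0, 16.0, 160.0, 1600.0):
        print(f"sigma={sigma:7.1f}  "
              f"edm={edm_weight(sigma):.6f}  "
              f"paper_style={paper_style_weight(sigma):.6f}")
```

       Under these assumptions, the Karras-style weight keeps shrinking as
       the noise level rises, while the assumed paper-style weight returns
       towards 1, which would match the post's remark that high-noise
       samples are not downweighted.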
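
       The "dead layers" point in the post can be illustrated with a small
       liveness check, sketched here under simplifying assumptions. Each
       op is modelled as a residual update that reads some tensors and
       adds to one tensor (m = MSA representation, z = pair
       representation), and only z is returned. The op names, their
       read/write sets, and both orderings are a hypothetical reading of
       the post, not the paper's algorithm verbatim or the repository's
       code.

```python
from typing import List, Sequence, Tuple

# (op name, tensors it reads, tensor it updates residually)
Op = Tuple[str, Tuple[str, ...], str]

# ASSUMPTION: simplified per-block order as described in the post.
PAPER_ORDER: List[Op] = [
    ("outer_product_mean", ("m",), "z"),
    ("pair_weighted_averaging", ("m", "z"), "m"),
    ("msa_transition", ("m",), "m"),
    ("pair_stack", ("z",), "z"),
]

# ASSUMPTION: swapped order, MSA updates first (ExtraMsaStack-like).
SWAPPED_ORDER: List[Op] = [
    ("pair_weighted_averaging", ("m", "z"), "m"),
    ("msa_transition", ("m",), "m"),
    ("outer_product_mean", ("m",), "z"),
    ("pair_stack", ("z",), "z"),
]


def dead_ops(block: Sequence[Op], n_blocks: int, output: str = "z") -> List[str]:
    """Return ops whose result can never reach `output` (no gradient)."""
    ops = [(f"block{b}.{name}", reads, write)
           for b in range(n_blocks)
           for name, reads, write in block]
    live = {output}  # tensors that still influence the returned output
    dead: List[str] = []
    # Walk backwards: an op matters only if the tensor it updates is live.
    for name, reads, write in reversed(ops):
        if write in live:
            live.update(reads)  # its inputs now matter too
        else:
            dead.append(name)
    return dead[::-1]


if __name__ == "__main__":
    print("paper order  :", dead_ops(PAPER_ORDER, n_blocks=2))
    print("swapped order:", dead_ops(SWAPPED_ORDER, n_blocks=2))
```

       In this toy model, the last block's pair weighted averaging and
       transition updates write only to m, which nothing reads afterwards,
       so they are flagged as dead. With the swapped order every op feeds
       the pair representation.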
        
       Author : EdHarris
       Score  : 120 points
       Date   : 2024-09-04 17:44 UTC (5 hours ago)
        
        
       | serial_dev wrote:
       | What unfortunate naming! I thought I'd see some gravitational
       | waves (as I have no idea what AlphaFold is).
        
       | ck_one wrote:
       | What's your next step? Why did you decide to focus on enzyme
       | design?
        
         | EdHarris wrote:
         | We think enzymes are super cool! You can build molecular
         | assembly lines at the atomic scale with them. Many
         | pharmaceuticals, such as the diabetes drug Januvia, are already
         | manufactured with enzymes. Engineering them is a big bottleneck
         | though - takes years and millions of dollars. We want to speed
         | this up with AI-powered design. Next step is ligand-protein
         | prediction capability of AlphaFold3, which is also super useful
         | for modelling enzyme-substrate interactions.
        
       | lacker wrote:
       | This seems really neat!
       | 
       | DeepMind and AlphaFold are clearly moving in a closed-source
       | direction, since they created Isomorphic Labs as a division of
       | Alphabet essentially focused on doing this stuff closed source.
       | In theory it seems nice for academic tools to have an open source
       | version, although I'm not familiar enough with this field to
       | point to a specific benefit of it.
       | 
       | So what's your plan for the company itself, do you intend to
       | continue working on this open source project as part of your
       | business model, or was it more of a one-off? Your website seems
       | very nonspecific about what exactly you intend to be selling.
        
         | EdHarris wrote:
         | Our long term goal is to design enzymes for chemical
         | manufacturing. We decided to build AlphaFold3 because we had
         | seen how useful AlphaFold2 had been for the protein design
         | field. No one else was building it fast enough for us, so we
         | decided we should do it ourselves. We are committed to training
         | and open-sourcing the full version with ligand and nucleic acid
         | prediction capabilities as well since it is so useful for the
         | biotech industry.
        
       | snolbert wrote:
       | Who would've thought only releasing pseudo-code isn't good
       | enough...glad to see the scientific immune system fighting back
       | against closed-source science. Your move Google.
        
         | nolist_policy wrote:
         | How dare they make money with something that is not
         | advertising!
        
           | lofatdairy wrote:
           | I mean it shouldn't be enough to publish in Nature. The whole
           | point of science is that it can be validated. It's totally
           | fine that they're hosting their models for free on closed
           | servers with limits, even though it's not exactly the most
           | ergonomic.
        
             | dekhn wrote:
             | It was already validated by winning CASP and the paper by
             | Paul Adams
             | (https://www.nature.com/articles/s41592-023-02087-4) which,
             | although it reads like criticism, is actually high praise.
             | Everything the model can do will be (or already has been)
             | replicated by the open community.
             | 
             | Also, for work of the highest art (of which AF3 is an
             | example), publication in Nature really is the fundamental
             | unit of scientific currency because it ensures all their
             | competitors will get hyped up and work extra-hard to
             | disprove it.
        
       | fngjdflmdflg wrote:
       | Have you considered publishing your own paper about your
       | implementation? It would make it easier to cite in the literature
       | later on. Would major journals accept such a paper? I would
       | assume they would if they really had questions about
       | reproducibility.
        
         | EdHarris wrote:
         | OpenFold, the open-source implementation of AlphaFold2, was
         | published in Nature Methods. We will prepare a similar
         | publication once the model is more mature and when we have a
         | nice set of experiments showing the model's interesting
         | properties.
        
       | dekhn wrote:
       | You probably want to change the name of this implementation as
       | it's not truly AlphaFold3. I wouldn't be surprised if you got a
       | C&D from DM for using the name.
        
         | EdHarris wrote:
         | Yes, this is a good point. We are actively speaking with our
         | counsel to check this. Thanks for flagging, though.
        
       | benreesman wrote:
       | I did a very brief stint on computational proteomics. That stuff
       | is absolutely next level.
        
         | EdHarris wrote:
         | Amazing! What kind of things did you work on?
        
           | benreesman wrote:
           | My job was mostly mundane machine learning: classification
           | over very large categorical sets.
           | 
           | I never had anything more than a dim intuition of the serious
           | chemistry going on before the bytes got to me.
        
       | dwayne_dibley wrote:
       | Hi, how are predictions verified? Does one still use experimental
       | techniques (X-ray crystallography, cryo-EM, etc.) once you have
       | the prediction? Or are predictions so close to reality that you
       | can progress without experiment?
        
         | EdHarris wrote:
         | The predictions can be verified by comparing the predicted
         | structure to the experimentally solved structure, either
         | crystal or cryoEM. The model is still training and improving;
         | we will release the benchmarking results once training is
         | complete.
        
       | boldlybold wrote:
       | Thanks for releasing this, I've been looking forward to a truly
       | open version I can use in a commercial setting. What a way to
       | launch the company!
        
         | EdHarris wrote:
         | Thanks!
        
       ___________________________________________________________________
       (page generated 2024-09-04 23:00 UTC)