[HN Gopher] Show HN: An open-source implementation of AlphaFold3
___________________________________________________________________
Show HN: An open-source implementation of AlphaFold3
Hi HN - we're the founders of Ligo Biosciences and are excited to
share an open-source implementation of AlphaFold3, the frontier
model for protein structure prediction. Google DeepMind and their
new startup, Isomorphic Labs, are expanding into drug discovery.
They developed AlphaFold3 as their model to accelerate drug
discovery and create demand from big pharma. They already signed
Novartis and Eli Lilly for $3 billion - Google's becoming a pharma
company! (https://www.isomorphiclabs.com/articles/isomorphic-labs-
kick...) AlphaFold3 is a biomolecular structure prediction model
that can do three main things: (1) Predict the structure of
proteins; (2) Predict the structure of drug-protein interactions;
(3) Predict nucleic acid - protein complex structure. AlphaFold3
is incredibly important for science because it vastly accelerates
the mapping of protein structures. It takes one PhD student their
entire PhD to do one structure. With AlphaFold3, you get a
prediction in minutes on par with experimental accuracy. There's
just one problem: when DeepMind published AlphaFold3 in May
(https://www.nature.com/articles/s41586-024-07487-w), there was no
code. This brought up questions about reproducibility
(https://www.nature.com/articles/d41586-024-01463-0) as well as
complaints from the scientific community
(https://undark.org/2024/06/06/opinion-alphafold-3-open-sourc...).
AlphaFold3 is a fundamental advance in structure modeling
technology that the entire biotech industry deserves to be able to
reap the benefits from. Its applications are vast, including:
- CRISPR gene editing technologies, where scientists can see exactly
  how the DNA interacts with the scissor Cas protein;
- Cancer research - predicting how a potential drug binds to the
  cancer target. One of the highlights in DeepMind's paper is the
  prediction of a clinical KRAS inhibitor in complex with its target.
- Antibody / nanobody to target predictions. AlphaFold3 improves
  accuracy on this class of molecules 2-fold compared to the next
  best tool.
Unfortunately, no companies can use it since it is
under a non-commercial license! Today we are releasing the full
model trained on single chain proteins (capability 1 above), with
the other two capabilities to be trained and released soon. We also
include the training code. Weights will be released once training
and benchmarking are complete. We wanted this to be truly open
source, so we used the Apache 2.0 license. DeepMind published the
full structure of the model, along with each component's pseudocode
in their paper. We translated this fully into PyTorch, which
required more reverse engineering than we thought! When building
the initial version, we discovered multiple issues in DeepMind's
paper that would interfere with the training - we think the deep
learning community might find these especially interesting.
(Diffusion folks, we would love feedback on this!) These include:
- MSE loss scaling differs from Karras et al. (2022). The weighting
  provided in the paper does not downweight the loss at high noise
  levels.
- Omission of residual layers in the paper - we add these back and
  see benefits in gradient flow and convergence. Anyone have any idea
  why DeepMind may have omitted the residual connections in the DiT
  blocks?
- The MSA module, in its current form, has dead layers. The last
  pair weighted averaging and transition layers cannot contribute to
  the pair representation, hence no grads. We swap the order to the
  one in the ExtraMsaStack in AlphaFold2. An alternative solution
  would be to use weight sharing, but whether this is done is
  ambiguous in the paper.
More about those issues
here: https://github.com/Ligo-Biosciences/AlphaFold3 How this came
about: we are building Ligo (YC S24), where we are using ideas from
AlphaFold3 for enzyme design. We thought open sourcing it was a
nice side quest to benefit the community. For those on Twitter,
there was a good thread a few days ago that has more information:
https://twitter.com/ArdaGoreci/status/1830744265007480934. A few
shoutouts: A huge thanks to OpenFold for pioneering the previous
open source implementation of AlphaFold. We did a lot of our early
prototyping with proteinFlow, developed by Lisa at AdaptyvBio; we
also look forward to partnering with them to bring you the next
versions! We are also partnering with Basecamp Research to supply
this model with the best sequence data known to science. Thanks to
Matthew Clark (https://batisio.co.uk) for his amazing animations! We're
around to answer questions and look forward to hearing from you!
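
A rough numerical sketch of the MSE loss scaling point in the post, for
readers who want to see what "downweighting at high noise" looks like. The
first function is the EDM loss weight from Karras et al. (2022),
lambda(sigma) = (sigma^2 + sigma_data^2) / (sigma * sigma_data)^2. The
second is an assumed variant with a sum in the denominator, included only
to illustrate a weighting that stays roughly flat as noise grows; it is
not quoted from the post or from the repository. The function names and
the value sigma_data = 16.0 are illustrative assumptions.

```python
def edm_weight(sigma: float, sigma_data: float) -> float:
    """EDM loss weight (Karras et al., 2022): (s^2 + sd^2) / (s * sd)^2."""
    return (sigma**2 + sigma_data**2) / (sigma * sigma_data) ** 2


def sum_form_weight(sigma: float, sigma_data: float) -> float:
    """Assumed variant for comparison: (s^2 + sd^2) / (s + sd)^2."""
    return (sigma**2 + sigma_data**2) / (sigma + sigma_data) ** 2


if __name__ == "__main__":
    sigma_data = 16.0  # illustrative assumption, not taken from the post
    for sigma in (0.1, 1.0, 16.0, 160.0):
        print(
            f"sigma={sigma:7.1f}  "
            f"edm={edm_weight(sigma, sigma_data):12.6f}  "
            f"sum_form={sum_form_weight(sigma, sigma_data):.6f}"
        )
```

Under these assumptions the EDM weight equals 1/sigma_data^2 + 1/sigma^2,
so it drops by orders of magnitude as sigma grows, while the sum-form
weight stays between 0.5 and 1 at every noise level.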
Author : EdHarris
Score : 120 points
Date : 2024-09-04 17:44 UTC (5 hours ago)
(HTM) web link (github.com)
| serial_dev wrote:
| What an unfortunate name, I thought I'd see some gravitational
| waves (as I have no idea what alphafold is).
| ck_one wrote:
| What's your next step? Why did you decide to focus on enzyme
| design?
| EdHarris wrote:
| We think enzymes are super cool! You can build molecular
| assembly lines at the atomic scale with them. Many
| pharmaceuticals are already manufactured with enzymes such as
| the diabetes drug Januvia. Engineering them is a big bottleneck
| though - takes years and millions of dollars. We want to speed
| this up with AI-powered design. Next step is ligand-protein
| prediction capability of AlphaFold3, which is also super useful
| for modelling enzyme-substrate interactions.
| lacker wrote:
| This seems really neat!
|
| DeepMind and AlphaFold are clearly moving in a closed-source
| direction, since they created Isomorphic Labs as a division of
| Alphabet essentially focused on doing this stuff closed source.
| In theory it seems nice for academic tools to have an open source
| version, although I'm not familiar enough with this field to
| point to a specific benefit of it.
|
| So what's your plan for the company itself, do you intend to
| continue working on this open source project as part of your
| business model, or was it more of a one-off? Your website seems
| very nonspecific about what exactly you intend to be selling.
| EdHarris wrote:
| Our long term goal is to design enzymes for chemical
| manufacturing. We decided to build AlphaFold3 because we had
| seen how useful AlphaFold2 had been for the protein design
| field. No one else was building it fast enough for us, so we
| decided we should do it ourselves. We are committed to training
| and open-sourcing the full version with ligand and nucleic acid
| prediction capabilities as well since it is so useful for the
| biotech industry.
| snolbert wrote:
| Who would've thought only releasing pseudo-code isn't good
| enough...glad to see the scientific immune system fighting back
| against closed-source science. Your move Google.
| nolist_policy wrote:
| How dare they make money with something that is not
| advertising!
| lofatdairy wrote:
| I mean it shouldn't be enough to publish in Nature. The whole
| point of science is that it can be validated. It's totally
| fine that they're hosting their models for free on closed
| servers with limits, even though it's not exactly the most
| ergonomic.
| dekhn wrote:
| It was already validated by winning CASP and the paper by
| Paul Adams
| (https://www.nature.com/articles/s41592-023-02087-4) which,
| although it reads like criticism, is actually high praise.
| Everything the model can do will be (or already has been)
| replicated by the open community.
|
| Also, for work of the highest art (of which AF3 is an
| example), publication in Nature really is the fundamental
| unit of scientific currency because it ensures all their
| competitors will get hyped up and work extra-hard to
| disprove it.
| fngjdflmdflg wrote:
| Have you considered publishing your own paper about your
| implementation? It would make it easier to cite in the literature
| later on. Would major journals accept such a paper? I would
| assume they would if they really had questions about
| reproducibility.
| EdHarris wrote:
| OpenFold, which was AlphaFold2's open-source implementation, was
| published in Nature Methods. We will prepare a similar
| publication once the model is more mature and when we have a
| nice set of experiments showing the model's interesting
| properties.
| dekhn wrote:
| You probably want to change the name of this implementation as
| it's not truly AlphaFold3. I wouldn't be surprised if you got a
| C&D from DM for using the name.
| EdHarris wrote:
| Yes this is a good point. We are actively speaking with our
| counsel to check this. Thanks for flagging, though.
| benreesman wrote:
| I did a very brief stint on computational proteomics. That stuff
| is absolutely next level.
| EdHarris wrote:
| Amazing! What kind of things did you work on?
| benreesman wrote:
| My job was mostly mundane machine learning: classification
| over very large categorical sets.
|
| I never had anything more than a dim intuition of the serious
| chemistry going on before the bytes got to me.
| dwayne_dibley wrote:
| Hi, how are predictions verified? Does one still do experimental
| techniques (X-ray crystallography, cryo-EM, etc.) once you
| have the prediction? Or are predictions so close to reality you
| can progress without experiment?
| EdHarris wrote:
| The predictions can be verified by comparing the predicted
| structure to the experimentally solved structure, either
| crystal or cryoEM. The model is still training and improving;
| we will release the benchmarking results after it's complete.
| boldlybold wrote:
| Thanks for releasing this, I've been looking forward to a truly
| open version I can use in a commercial setting. What a way to
| launch the company!
| EdHarris wrote:
| Thanks!
___________________________________________________________________
(page generated 2024-09-04 23:00 UTC)