[HN Gopher] AlphaFold's new rival? Meta AI predicts shape of 600...
___________________________________________________________________
AlphaFold's new rival? Meta AI predicts shape of 600M proteins
Author : pseudolus
Score : 179 points
Date : 2022-11-02 10:19 UTC (12 hours ago)
(HTM) web link (www.nature.com)
(TXT) w3m dump (www.nature.com)
| Simon_O_Rourke wrote:
| It would be more in their line to go and predict the effects of their
| Metaverse strategic decision-making on the share price!
| the__alchemist wrote:
| As an outsider to both biology and AI, I feel like there's a more
| fundamental problem than estimating structure by pattern-matching
| (over-fitting?) known data: learning how the molecular dynamics
| work. I'm fascinated with how the folding might happen: the
| intermediate steps; how the backbone and sidechains react to
| changes in temperature; how much motion is going on in the
| sidechains in response to constant interaction with water
| molecules, etc. How well does a classical rigidity/elasticity
| model apply to covalently bonded molecules? How much do various
| parts of the protein flex in real time? How do we make a model
| that folds (or unfolds) proteins ab initio without being tuned to
| experimental data? How much of the quantum interactions can we
| dodge by working at the atomic scale instead of the electron
| scale? How accurately can we model using dipoles, hydrophobic
| interactions, simplified hydrogen-bond models, etc.? How much of
| the picture are we missing by only looking at the static aspect of
| the folded protein, i.e. taking experimentally or AI-determined
| static atom coordinates as the model?
| COGlory wrote:
| A number of people, myself included, have railed against the
| terminology that frequently surrounds these otherwise great
| achievements, because it does a disservice, precisely on this
| topic.
|
| AlphaFold (I haven't looked at Meta AI yet, beyond a cursory
| glance) is solving for the state of the folded protein, based
| primarily on the states of other, similar folded proteins.
| There's some, but extremely limited, modeling of unknown
| states. The major breakthrough for AlphaFold appears to be that
| it is substantially better at detecting meaningful signal in
| homology-based sequence alignments. That means it can figure
| out the parts of a protein that are similar to other known
| folded proteins with much greater success than previous models
| (2x-3x better, IIRC). For proteins or portions of proteins
| with no known homologs, the model becomes significantly weaker
| (although its ability there is not nonexistent).
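Much of the usable signal in those alignments is coevolution. Residues that are in contact tend to mutate together across homologs, so their alignment columns covary. AlphaFold2 processes the MSA with learned layers and does not compute a statistic like this explicitly. The sketch below therefore only illustrates the classic pre-deep-learning idea, mutual information between alignment columns, on a hypothetical four-sequence toy alignment.

```python
import math
from collections import Counter

def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an aligned set of sequences."""
    n = len(msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    counts_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a, p_b = counts_i[a] / n, counts_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi

# Hypothetical toy alignment: column 0 is conserved, columns 1 and 3 swap
# charges together (K/E vs E/K), and columns 2 and 4 vary independently.
msa = ["MKLEV", "MELKI", "MKAEI", "MEAKV"]

for i, j in [(0, 1), (1, 2), (1, 3)]:
    print(i, j, round(column_mutual_information(msa, i, j), 3))
```

Here the covarying pair (1, 3) scores log 2 nats and the other pairs score zero. Raw mutual information is confounded by phylogeny and by indirect correlations. That is why later methods, such as direct coupling analysis and then deep networks, improved on it.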
|
| In my opinion, and also the opinion of a number of my
| colleagues, advertising AlphaFold as having solved the "protein
| folding problem", and other similar language (used by both the
| media and the AlphaFold press releases) is completely
| disingenuous. We are effectively no closer to understanding how
| proteins fold, nor are we closer to being able to predict how a
| protein folds without any _a priori_ information. Furthermore,
| homology modeling, while quite successful so far for a lot of
| proteins, breaks down on edge cases, higher-order structures,
| and unknown folds. It seems unlikely that it will ever
| completely solve the problem, and therefore, an entirely
| different approach will be needed to modeling protein folding
| that encompasses the entire field, edge cases and all (a quite
| lofty goal that may also never be reached).
|
| All this is to say: AlphaFold is a great tool that I use
| frequently and am grateful it exists, but it hasn't solved
| protein folding, and the thing it has solved is intrinsically
| limited, and we should probably use different language to
| describe what it's done. Either way, I'm glad to see progress
| being made here, and eagerly await finding out how proteins
| _actually_ fold.
| deltree7 wrote:
| How do we know that you aren't missing a piece of the puzzle
| that others are/may see?
|
| It's a genuine question. A lot of experts have a lot of blind
| spots especially around prediction, rate of growth,
| potential, vision. History is littered with people who are
| really good at a field but fail to see 2/3/5/10 years ahead.
| COGlory wrote:
| You don't, but that only really applies to this:
|
| >It seems unlikely that it will ever completely solve the
| problem, and therefore, an entirely different approach will
| be needed to modeling protein folding that encompasses the
| entire field, edge cases and all (a quite lofty goal that
| may also never be reached).
|
| Of course I'm fallible, but it doesn't change the fact that
| right now, this is the case - AlphaFold teaches us
| essentially nothing about how proteins are actually
| folding, and instead is solving for the folded state. If
| someone can backtrack from there, I'll be nothing but
| ecstatic.
| panabee wrote:
| Is it also accurate to say that proteins may fold differently
| based on temperature, pH, and other factors?
|
| Meaning a protein could fold in multiple ways (similar to a
| Swiss Army knife "folding" into different shapes), and
| AlphaFold only predicts a subset of these for now (which is
| still amazing).
| dekhn wrote:
| Yes, absolutely. Proteins can fold to different target
| structures (that are very distinct from each other) and get
| stuck in those states for long periods of time, even if the
| state isn't the global energy minimum.
|
| The list of reasons for this, both functional and
| inadvertent, is extremely long. But the long and short of
| it is that the ability of proteins to reproducibly fold to
| a single structure was figured out using this technique:
| putting an already folded protein into a very strong
| solution (of urea or guanidinium chloride), detecting that
| it had "unfolded", then putting it back into salty water and
| watching it re-form the same original structure.
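Denature-and-refold experiments like this are usually analyzed with a two-state model. Each molecule is assumed to be either folded or unfolded, and the unfolding free energy is assumed to fall linearly with denaturant concentration (the linear extrapolation model). The sketch below follows that model. The values for the free energy in water and for the m-value are hypothetical, chosen only for illustration.

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol*K)

def fraction_unfolded(denaturant_molar, dg_water=20.0, m_value=5.0, temp_k=298.0):
    """Two-state model: fraction of molecules unfolded at a given denaturant concentration.

    dg_water: unfolding free energy with no denaturant, kJ/mol (hypothetical value)
    m_value:  drop in that free energy per molar denaturant, kJ/(mol*M) (hypothetical value)
    """
    dg = dg_water - m_value * denaturant_molar  # linear extrapolation model
    k = math.exp(-dg / (R * temp_k))            # equilibrium constant [U]/[F]
    return k / (1.0 + k)

for conc in (0.0, 2.0, 4.0, 6.0, 8.0):
    print(f"{conc:.1f} M -> {fraction_unfolded(conc):.3f} unfolded")
```

Because the model is an equilibrium, the same sigmoidal curve is retraced when the denaturant is diluted away. That is consistent with the reversible refolding Anfinsen observed. The midpoint sits where the free energy crosses zero, which is 4 M with these made-up parameters.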
|
| lots more detail here:
| https://en.wikipedia.org/wiki/Anfinsen%27s_dogma (this
| dogma is literally the core of protein folding/structure
| prediction/function prediction).
|
| Even more detail here:
| https://en.wikipedia.org/wiki/Hofmeister_series which is
| basically a ranking of ions by how strongly they interfere
| with water/protein interactions and disrupt or enhance
| folding. This data was key to establishing that
| hydrophobic collapse (one of the dominant models for how
| proteins spontaneously form structure) is a significant
| force in driving folding free energy.
| pelorat wrote:
| > I'm glad to see progress being made here, and eagerly await
| finding out how proteins actually fold.
|
| There's no way a human will ever understand this. It's likely
| a problem that is beyond what the human mind is capable of
| understanding.
| siver_john wrote:
| Speaking as an insider to both fields (and specifically to their
| juncture with molecular dynamics):
|
| What you are discussing is of course being studied, but the
| problem is that it is a lot more computationally expensive. We
| do have simulations of simple (read: very small) proteins
| folding and unfolding, but for larger ones the computational
| time to watch them fold can be gigantic, if not impossible,
| because proteins often fold as they are being made. That means
| incorporating a much larger process (protein synthesis) into the
| folding simulation, which further stresses computational resources.
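A back-of-the-envelope sketch shows why this gets expensive. Atomistic MD typically advances in femtosecond-scale timesteps; 2 fs is a common choice when bonds to hydrogen are constrained. Many proteins, by contrast, take microseconds to seconds to fold. The timestep and the 100 ns/day throughput below are illustrative assumptions, not measurements.

```python
TIMESTEP_S = 2e-15  # 2 fs, a common MD timestep (assumption)

def steps_needed(sim_time_s, timestep_s=TIMESTEP_S):
    """Number of integration steps needed to cover sim_time_s of physical time."""
    return round(sim_time_s / timestep_s)

def wall_clock_days(sim_time_s, ns_per_day):
    """Days of compute at a given throughput (nanoseconds of simulated time per day)."""
    return sim_time_s / (ns_per_day * 1e-9)

for label, t in [("1 microsecond", 1e-6), ("1 millisecond", 1e-3), ("1 second", 1.0)]:
    print(f"{label}: {steps_needed(t):.1e} steps, "
          f"{wall_clock_days(t, ns_per_day=100):.0f} days at a hypothetical 100 ns/day")
```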
|
| This computational problem is so enormous that a company at the
| cutting edge of research, D.E. Shaw, built a specialized computer
| solely for simulating proteins. Also, until recently most of the
| software used for this had abandoned multi-GPU parallelism
| because it didn't scale well. The pandemic created the need to
| simulate the virus on the entirety of Summit, which brought some
| work back to that approach, but it is still specialized (and
| wouldn't help for systems below a certain size anyway).
|
| Also, my previous points have been about atomistic models (i.e.
| we treat everything more like Newtonian particles and ignore
| quantum effects). Some things definitely need more resolution,
| and at that level you are lucky to see protein fluctuations, let
| alone folding.
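"Treating everything like Newtonian particles" means integrating F = ma for every atom with a small timestep. The sketch below shows velocity Verlet, an integrator commonly used in MD, applied to a single toy harmonic "bond" in reduced units. The spring constant, mass and timestep are hypothetical. Real force fields add angles, torsions, electrostatics and van der Waals terms, and must handle thousands of solvent atoms.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D with velocity Verlet; returns the final (x, v)."""
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        a = a_new
    return x, v

# Toy harmonic bond, F = -k (x - x0), in reduced units (hypothetical parameters).
K, X0, MASS = 1.0, 0.0, 1.0

def spring(x):
    return -K * (x - X0)

def energy(x, v):
    """Kinetic plus potential energy of the toy bond."""
    return 0.5 * MASS * v * v + 0.5 * K * (x - X0) ** 2

x_end, v_end = velocity_verlet(x=1.0, v=0.0, force=spring, mass=MASS, dt=0.01, n_steps=10_000)
print(energy(1.0, 0.0), energy(x_end, v_end))
```

Velocity Verlet is popular because the total energy stays close to its initial value over long runs, rather than drifting steadily.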
| dekhn wrote:
| Did simulating covid on summit actually help anything?
| siver_john wrote:
| My memory is a little rusty, but I believe yes. If I
| remember correctly simulation of the virus helped medically
| in a few ways. Specifically I think it gave insight into
| the mRNA vaccines and what sequences to use to make them
| effective (by basically making a slightly worse spike
| protein). I am sure it helped in drug discovery as I know
| our lab used simulations to suggest some potential drug
| pockets. There were some really good talks about it at
| NVIDIA's GTC last year or so (maybe more at this recent GTC
| but I had too much going on personally to watch the VODs).
| flobosg wrote:
| > This computational problem is so enormous that a company at
| the cutting edge of research D.E. Shaw built a specialized
| computer solely for simulating proteins.
|
| See https://en.wikipedia.org/wiki/Anton_(computer)
| xiphias2 wrote:
| That's what the AlphaFold team has been working on for some time
| now. The only difference is that it will be relevant for drug
| research, so I don't expect Alphabet to give it away for free
| as well.
| dekhn wrote:
| These questions are better addressed by folding@home and we're
| still very much computationally limited in our ability to
| answer these questions.
|
| Do you want to answer these questions for the satisfaction of
| understanding the underlying physical rules that drive folding?
| Why? It's unclear that knowing those things would actually make
| a large impact in any industrially/medically useful contexts.
| It uses huge amounts of CPU to sample these functions
| accurately enough to replace actual physical experiments on
| protein motion. Just doesn't seem like an effective investment
| of brain or computer time.
|
| (I say this as somebody whose entire career was predicated on
| using MD to answer these questions; see
| https://www.nature.com/articles/nchem.1821 for our attempt in
| that space)
| mechagodzilla wrote:
| The folding@home approach is limited to very short individual
| simulation times (a millisecond total, maybe, but 100
| disconnected nanoseconds at a time), so it then relies on
| various 'enhanced sampling' techniques to try to put your
| thumb on the scale to bias things into exploring interesting
| dynamics. It seems like it is probably more effective the
| more you already know about a given protein target. Meta's
| approach (which seems like AF2, but faster/worse?) seems to
| have a similar problem, in that it's even less trustworthy
| when you apply it to a new target you have relatively little
| concrete information about.
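Folding@home's many short, disconnected trajectories are also widely analyzed with Markov state models (MSMs), which are a different technique from the biased sampling described above. Each frame is assigned to a discrete conformational state, transitions are counted at a fixed lag time, and the resulting transition matrix describes long-timescale behavior that no single trajectory reached. The sketch below uses hypothetical, already-discretized data; real workflows also cluster the raw coordinates into states.

```python
def transition_matrix(trajectories, n_states, lag=1):
    """Row-normalized transition counts pooled over many short, discretized trajectories."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for t in range(len(traj) - lag):
            counts[traj[t]][traj[t + lag]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def stationary_distribution(matrix, n_iter=1000):
    """Long-time state populations, found by repeatedly applying the transition matrix."""
    n = len(matrix)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        p = [sum(p[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return p

# Hypothetical data: states 0 = unfolded, 1 = intermediate, 2 = folded.
# No single trajectory visits both state 0 and state 2.
trajectories = [
    [0, 0, 1, 0, 1],
    [1, 1, 2, 2, 2],
    [2, 2, 1, 2, 2],
    [0, 1, 1, 0, 0],
    [1, 2, 2, 2, 1],
]
T = transition_matrix(trajectories, n_states=3)
print([round(x, 3) for x in stationary_distribution(T)])
```

The matrix links the unfolded and folded states through the intermediate even though no run connects them directly. Real MSM analyses also choose the lag time carefully and check that the dynamics are approximately Markovian at that lag.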
| pelorat wrote:
| And how does it stack up to RoseTTAFold? Isn't RoseTTAFold much
| faster than AlphaFold but slightly worse in accuracy?
| robertlagrant wrote:
| > The predictions are freely available for anyone to use, as is
| the code underlying the model, says Rives.
|
| Smart. If you might get cut, having all your work available to
| pick up from is a great idea.
| tromp wrote:
| Trading accuracy for speed using language-based models:
|
| > Meta's network, called ESMFold, isn't quite as accurate as
| AlphaFold, Rives' team reported earlier this summer[2], but it is
| about 60 times faster at predicting structures, he says. "What
| this means is that we can scale structure prediction to much
| larger databases."
|
| > Burkhard Rost, a computational biologist at the Technical
| University of Munich in Germany, is impressed with the
| combination of speed and accuracy of Meta's model. But he
| questions whether it really offers an advantage over AlphaFold's
| precision, when it comes to predicting proteins from metagenomic
| databases. Language model-based prediction methods -- including
| one developed by his team[3] -- are better suited to quickly
| determine how mutations alter protein structure, which is not
| possible with AlphaFold. "We will see structure prediction become
| leaner, simpler, cheaper and that will open the door for new
| things," he says.
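Rost's point concerns structure, but a closely related and widely used language-model trick scores the effect of a mutation directly from sequence. This is the "masked marginal" heuristic used with ESM models. The mutated position is masked, the model is asked for amino-acid probabilities at that site, and log p(mutant) is compared with log p(wild type). The sketch below shows only the scoring arithmetic, and the score reflects sequence likelihood rather than a computed structure. `toy_masked_probs` is a hypothetical stand-in for a trained protein language model.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")

def toy_masked_probs(masked_sequence, position):
    """Hypothetical stand-in for a protein language model's prediction at a masked site.

    It looks only at the two neighbors: if both are hydrophobic (a crude proxy for a
    buried site), hydrophobic residues get 3x the weight; otherwise it is uniform.
    """
    neighbors = masked_sequence[max(0, position - 1):position] + masked_sequence[position + 1:position + 2]
    buried = bool(neighbors) and all(aa in HYDROPHOBIC for aa in neighbors)
    weights = {aa: (3.0 if buried and aa in HYDROPHOBIC else 1.0) for aa in AMINO_ACIDS}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}

def masked_marginal_score(sequence, mutation, probs_fn=toy_masked_probs):
    """Score a substitution such as 'L4D' (1-based): log p(mutant) - log p(wild type)."""
    wt, pos, mt = mutation[0], int(mutation[1:-1]) - 1, mutation[-1]
    if sequence[pos] != wt:
        raise ValueError(f"{mutation}: sequence has {sequence[pos]} at that position")
    masked = sequence[:pos] + "X" + sequence[pos + 1:]  # hide the wild-type residue
    probs = probs_fn(masked, pos)
    return math.log(probs[mt]) - math.log(probs[wt])

wild_type = "MKVLIVAGE"  # hypothetical sequence
for mutation in ("L4V", "L4D"):
    print(mutation, round(masked_marginal_score(wild_type, mutation), 3))
```

A negative score means the model finds the mutant less likely than the wild type. With a real model, one forward pass per masked position covers all 19 substitutions there, which is what makes scanning many mutations cheap.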
| dekhn wrote:
| Structure prediction is embarrassingly parallel, and it is rarely
| necessary to predict any specific protein within an extremely
| short period of time; predictions can be run at any time,
| including precomputed ahead of need.
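"Embarrassingly parallel" here means that each sequence can be predicted independently, with no communication between jobs, so throughput scales with the number of workers. In the sketch below, `predict_structure` is a hypothetical placeholder for a real model call; it only returns a toy summary. A thread pool is used purely for illustration, whereas real pipelines spread this work across GPUs or machines.

```python
from concurrent.futures import ThreadPoolExecutor

HYDROPHOBIC = set("AVILMFWC")

def predict_structure(seq_id, sequence):
    """Hypothetical placeholder for a real structure-prediction call."""
    frac = sum(aa in HYDROPHOBIC for aa in sequence) / len(sequence)
    return seq_id, {"length": len(sequence), "hydrophobic_fraction": round(frac, 3)}

def predict_all(sequences, max_workers=4):
    """Each job is independent, so jobs can run in any order on any worker."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda item: predict_structure(*item), sequences.items())
        return dict(results)

database = {"seq1": "MKTAYIAKQR", "seq2": "GSHMLEDPVA", "seq3": "MVLSPADKTN"}  # hypothetical
print(predict_all(database))
```

Because no job depends on another, the same map can be run once in bulk and the results cached, which is the precomputation point above.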
|
| DeepMind has no trouble getting the inference time at Google to
| compute predictions using their model for any protein of
| interest or the largest database.
|
| Thus, a faster but less accurate system is not really
| desirable unless it provides another feature, such as (as Rost
| says) predicting how mutations alter structure. But if
| AlphaFold works, then it would indeed be capable of predicting
| how mutations alter structure.
| photochemsyn wrote:
| Cool stuff, but more likely useful as a primary filter or search
| tool rather than for detailed understanding of protein structure
| and function. See this quote:
|
| > "Sergey Ovchinnikov, an evolutionary biologist at Harvard
| University in Cambridge, Massachusetts, wonders about the
| hundreds of millions of predictions that ESMFold made with low-
| confidence. Some might lack a defined structure, at least in
| isolation, whereas others might be non-coding DNA mistaken as a
| protein-coding material."
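Using predictions as a first-pass filter usually means thresholding on the model's own per-residue confidence score, pLDDT (0-100), which both AlphaFold and ESMFold report. In the sketch below, the cutoff of 70 follows the banding used by the AlphaFold Protein Structure Database, where scores above 70 are labeled "confident". The IDs and scores are hypothetical.

```python
from statistics import mean

def confident_predictions(predictions, cutoff=70.0):
    """Keep IDs whose mean per-residue pLDDT is above the cutoff."""
    return [pid for pid, scores in predictions.items() if mean(scores) > cutoff]

predictions = {  # hypothetical per-residue pLDDT values
    "protein_a": [92.1, 88.4, 90.3, 85.0],
    "protein_b": [45.2, 51.0, 38.7, 42.9],
    "protein_c": [71.5, 69.8, 74.2, 76.0],
}
print(confident_predictions(predictions))
```

A mean hides local detail. A prediction can be confident over a folded domain and very low over a disordered tail, so a low score may reflect a region lacking a defined structure rather than a wrong model.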
|
| Understanding of how these proteins function requires high-
| resolution information about bond angles and atom-atom distances,
| particularly for non-structural proteins (i.e. interesting
| catalytic enzymes). Hence, wet-lab work and protein structure
| characterization via X-ray and NMR methods aren't going anywhere.
| COGlory wrote:
| Medium confidence is often close enough that you can formulate
| a hypothesis, but high confidence still isn't close enough that
| you'd stake millions of dollars of experiments on it anyways.
| Most of the people in structural biology in pharma that I have
| chatted with said they're still solving the structures of their
| targets even with high-confidence AlphaFold models.
|
| For me, as someone solving structures on a monthly basis,
| AlphaFold is great because after I get back my electron density
| map, I dump my sequence into AlphaFold and get a model (of any
| confidence), and most of it fits well enough into my density map
| that I don't have to start modeling from nothing. It honestly
| saves me days of work.
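A map-model correlation coefficient is one common way to quantify how well a model "fits" a density map. Density is computed from the model's atoms and then correlated with the experimental map. The toy sketch below, in pure Python, assumes each atom contributes one isotropic Gaussian on a small grid. Real refinement and validation tools use resolution-dependent scattering models, B-factors and masks. All coordinates here are random, hypothetical data.

```python
import math
import random

def simulate_density(atoms, size=12, sigma=1.0):
    """Toy density: one isotropic Gaussian per atom, sampled on a size^3 grid (1 Å spacing)."""
    values = []
    for i in range(size):
        for j in range(size):
            for k in range(size):
                values.append(sum(
                    math.exp(-((i - x) ** 2 + (j - y) ** 2 + (k - z) ** 2) / (2 * sigma ** 2))
                    for x, y, z in atoms
                ))
    return values

def correlation(a, b):
    """Pearson correlation between two maps flattened to lists of equal length."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def random_atoms(rng, n):
    return [(rng.uniform(2, 10), rng.uniform(2, 10), rng.uniform(2, 10)) for _ in range(n)]

rng = random.Random(0)
true_atoms = random_atoms(rng, 8)
# Stand-in for an experimental map: the true density plus a little noise.
experimental_map = [v + rng.gauss(0, 0.05) for v in simulate_density(true_atoms)]

close_model = [(x + 0.3, y - 0.2, z + 0.1) for x, y, z in true_atoms]  # small errors
wrong_model = random_atoms(rng, 8)                                       # unrelated model

print("close model:", round(correlation(experimental_map, simulate_density(close_model)), 2))
print("wrong model:", round(correlation(experimental_map, simulate_density(wrong_model)), 2))
```

In practice the correlation is also often computed per residue or per region, which helps locate the parts of a starting model that still need rebuilding.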
| dekhn wrote:
| What's really funny about using AlphaFold predictions to
| bootstrap a model into a density map is that your structure,
| based on an AF prediction, will eventually be folded into the
| dataset used to train the next version of AlphaFold. Talk about
| test set/training set leakage!
| siver_john wrote:
| That's a cool way I had never thought about it being used.
| Just for my own curiosity do you know if that is a common use
| case in the protein solving field? (I'd imagine it would also
| be useful in NMR experiments for getting initial point
| labels).
| flobosg wrote:
| It is quite common to use predicted models as an aid for
| phasing and molecular replacement; even Foldit models have
| been adopted for that purpose:
| https://www.nature.com/articles/nsmb.2119
| uri4 wrote:
| lucasmullens wrote:
| There were trials, and there aren't 10 billion people.
| uri4 wrote:
| That is for next pandemic in 2023, silly :)
|
| And of course, there are always trials! :P
| aliljet wrote:
| This is incredible work, honestly. Is the theory here that humans
| seem to enjoy an intuitive understanding of protein folding? I
| seem to remember an online game (maybe
| https://en.wikipedia.org/wiki/Foldit) that exploited this
| intuition by crowdsourcing human suggestions for brute-force
| style protein folding work.
|
| And am I right that the goal is to create a best-effort shortcut
| to the final stage of brute-force folding, to get a real
| result?
| danielmarkbruce wrote:
| It's just about making simplifications where possible. You
| don't need a molecular dynamics simulation to predict where a
| pool ball will go. That simplification is easy to see and
| understand. Scientists and engineers have been making useful
| simplifications for a long time.
|
| There are other simplifications where a bunch of forces and
| masses can be cancelled out, or aggregated - but we can't see
| them. Can a computer see them with enough processing? It seems
| like it.
| AndrewKemendo wrote:
| Can anyone describe how/where this actually fits into Meta's
| businesses?
|
| "As a test case, they decided to wield their model on a database
| of bulk-sequenced 'metagenomic' DNA from environmental sources
| including soil, seawater, the human gut, skin and other microbial
| habitats."
|
| Meta has a lot of data, but I'm unaware of them having a presence
| in the environmental/medical diagnostics industry, which is where
| I assume this would be applied.
|
| Perhaps they are going for a Bell Labs kind of structure?
| bawolff wrote:
| It gives them cred, and cred in turn makes it easier to hire smart
| people in this field, which overlaps with things that actually are
| Meta's business.
| LightG wrote:
| * stifled laughter *
|
| Genuinely sorry.
| aierou wrote:
| It sounds like a joke, but despite public opinion, Meta is
| one of the top AI research groups in the world.
| ZetaZero wrote:
| > Can anyone describe how/where this actually fits into Meta's
| businesses?
|
| AI can predict the behavior of the users, feeding them as many
| relevant ads as possible, while keeping them engaged.
| strangattractor wrote:
| I think they were asking why Meta cares about protein
| structure. It doesn't care, but it does help them keep up
| with the state of the art in ML, I suppose.
| Mockapapella wrote:
| Probably the Chan Zuckerberg Initiative
| https://chanzuckerberg.com/
|
| They have a stated goal of eradicating all diseases by 2100
| dbish wrote:
| This is a completely separate business from Meta.
| timy2shoes wrote:
| About as separate as Tesla is from Twitter.
| alecfreudenberg wrote:
| How could they not
| it_citizen wrote:
| To keep Zuckerberg AI up-to-date most likely
| justapassenger wrote:
| Meta has been one of the powerhouses of AI research for many
| years now.
|
| Why? Recruiting most likely.
| AndrewKemendo wrote:
| Yes, however all of that research was directly applicable to
| the Meta family of companies' products, most specifically
| around NLP, image processing and computer vision, with some
| work in RL. So it was applicable to the company, in addition
| to being good for recruiting.
|
| Unclear what products this would apply to
| m12k wrote:
| Improving the newsfeed algorithm. Automating content
| moderation. Creating content for the upcoming Metaverse.
| Realistic virtual characters for the Metaverse.
|
| Whether or not we like, or agree with, or believe in their
| goals (I don't), I think it's hard to argue that competence
| in AI is not useful for them.
| ta988 wrote:
| So? Use Meta AI structures first, then refine them in AlphaFold.
| Best of both worlds.
| aliqot wrote:
| Meta has become a creepy metaphor for facebook: briefly the new
| cool kid on the block, but now awkwardly shows up at high school
| parties with their letter jacket on, talking about the time they
| almost made it to State.
| i_like_apis wrote:
| You are way off.
|
| Meta is easily among the top few AI research institutions in
| the world.
| zip1234 wrote:
| Seems like this is a meaningful contribution to society, as are
| things like React and PyTorch.
| _Algernon_ wrote:
| React gives me 'nam flashbacks and I haven't even worked with
| a proper code base yet.
| JackFr wrote:
| Well, PyTorch.
| tradecraft wrote:
| This meta metaphor on Meta is itself a creepy metaphor
| belval wrote:
| That's a harsh take. FAIR (the Facebook AI research group) is a
| respected group in the field and they put out high quality
| research on a variety of topics.
|
| This would be like condemning research from AT&T/Bell Labs
| because the company was (even at the time) a terrible
| monopolistic corporation.
| xnx wrote:
| Let me know when Meta wins a Nobel Prize.
| kgc wrote:
| Why does a language-based model work for what is essentially
| physics?
| civilized wrote:
| Why does language describe the physical world at all?
___________________________________________________________________
(page generated 2022-11-02 23:01 UTC)