[HN Gopher] AlphaFold's new rival? Meta AI predicts shape of 600...
       ___________________________________________________________________
        
       AlphaFold's new rival? Meta AI predicts shape of 600M proteins
        
       Author : pseudolus
       Score  : 179 points
       Date   : 2022-11-02 10:19 UTC (12 hours ago)
        
 (HTM) web link (www.nature.com)
 (TXT) w3m dump (www.nature.com)
        
       | Simon_O_Rourke wrote:
       | More in their line to go and predict the effects of their
       | Metaverse strategic decision making on the share price!
        
       | the__alchemist wrote:
       | As an outsider to both biology and AI, I feel like there's a more
       | key problem than structure estimation from pattern-matching
       | (over-fitting?) known data: Learning how the molecular dynamics
       | work. I'm fascinated with how the folding might happen. The
       | intermediate steps; how the backbone and sidechains react to
       | change in temperatures; how much motion is going on in the
       | sidechains in response to constant interaction with water
       | molecules etc. How well does a classical rigidity/elasticity etc
       | model apply to covalently-bonded molecules. How much flexing do
       | various parts of the protein do in real time. How to make a model
       | that ab-initio folds (or unfolds) without being tuned to
       | experimental data. How much of the quantum interactions can we
       | dodge due to working at the atomic scale vice electron-scale. How
       | accurately can we model using dipoles, hydrophobic interactions,
       | simplified hydrogen bond models etc. How much of the picture are
       | we missing by only looking at the static aspect of the folded
       | protein - ie experimentally or AI-determined static atom
       | coordinates as the model.
        
         | COGlory wrote:
         | A number of people, myself included, have railed against the
         | terminology that frequently surrounds these otherwise great
         | achievements, because it does a disservice, precisely on this
         | topic.
         | 
         | AlphaFold (I haven't looked at Meta AI yet, beyond a cursory
         | glance), is solving the state of the folded protein, based
         | primarily off the state of other similar folded proteins.
         | There's some, but extremely limited, modeling of unknown
         | states. The major breakthrough for AlphaFold appears to be that
         | it is substantially better at detecting meaningful signal in
         | homology-based sequence alignments. That means, it can figure
         | out the parts of a protein that are similar to other known
         | folded proteins, with much greater success than previous models
         | (2x-3x better IIRC). As far as modeling proteins or portions of
         | proteins for which there are no known homologs, the model
         | becomes significantly weaker (although not nonexistent).
         | 
         | In my opinion, and also the opinion of a number of my
         | colleagues, advertising AlphaFold as having solved the "protein
         | folding problem", and other similar language (used by both the
         | media and the AlphaFold press releases) is completely
         | disingenuous. We are effectively no closer to understanding how
         | proteins fold, nor are we closer to being able to predict how a
         | protein folds without any _a priori_ information. Furthermore,
         | homology modeling, while quite successful so far for a lot of
         | proteins, breaks down on edge cases, higher-order structures,
         | and unknown folds. It seems unlikely that it will ever
         | completely solve the problem, and therefore, an entirely
         | different approach will be needed to modeling protein folding
         | that encompasses the entire field, edge cases and all (a quite
         | lofty goal that may also never be reached).
         | 
         | All this is to say: AlphaFold is a great tool that I use
         | frequently and am grateful it exists, but it hasn't solved
         | protein folding, and the thing it has solved is intrinsically
         | limited, and we should probably use different language to
         | describe what it's done. Either way, I'm glad to see progress
         | being made here, and eagerly await finding out how proteins
         | _actually_ fold.
        
           | deltree7 wrote:
           | How do we know that you aren't missing a piece of the puzzle
           | that others see or may see?
           | 
           | It's a genuine question. A lot of experts have a lot of blind
           | spots especially around prediction, rate of growth,
           | potential, vision. History is littered with people who are
           | really good at a field but fail to see 2/3/5/10 years ahead.
        
             | COGlory wrote:
             | You don't, but that only really applies to this:
             | 
             | >It seems unlikely that it will ever completely solve the
             | problem, and therefore, an entirely different approach will
             | be needed to modeling protein folding that encompasses the
             | entire field, edge cases and all (a quite lofty goal that
             | may also never be reached).
             | 
             | Of course I'm fallible, but it doesn't change the fact that
             | right now, this is the case - AlphaFold teaches us
             | essentially nothing about how proteins are actually
             | folding, and instead is solving for the folded state. If
             | someone can backtrack from there, I'll be nothing but
             | ecstatic.
        
             | [deleted]
        
           | panabee wrote:
           | is it also accurate to say that proteins may fold differently
           | based on temperature, pH, and other factors?
           | 
           | meaning a protein could fold in multiple ways (similar to a
           | swiss army knife "folding" into different shapes), and
           | alphafold only predicts a subset of these for now (which is
           | still amazing).
        
             | dekhn wrote:
             | Yes, absolutely. Proteins can fold to different target
             | structures (that are very distinct from each other) and get
             | stuck in those states for long periods of time, even if the
             | state isn't the global energy minimum.
             | 
             | The list of reasons for this, both functional and
             | inadvertent, is extremely long. But the long and short of
             | it is that the ability of proteins to reproducibly fold to
             | a single structure was figured out using this technique:
             | putting an already folded protein into a very strong
             | solution (of urea or guanidinium chloride), detecting that
             | it "unfolded", then putting it back into salty water and
             | watching it reform to the same original structure.
             | 
             | lots more detail here:
             | https://en.wikipedia.org/wiki/Anfinsen%27s_dogma (this
             | dogma is literally the core of protein folding/structure
             | prediction/function prediction).
             | 
             | Even more detail here:
             | https://en.wikipedia.org/wiki/Hofmeister_series which is
             | basically a series of progressively stronger solutions that
             | interfere with water/protein interactions and disrupt or
             | enhance folding. This data was key to establishing that
             | hydrophobic collapse (one of the dominant models for how
             | proteins spontaneously form structure) is a significant
             | force in driving folding free energy.
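
The point above, that a protein can get stuck in a state that isn't the global energy minimum, can be illustrated with a toy example. This is a minimal sketch and not a protein model: the one-dimensional double-well `energy` function, the step size, and the starting points are all assumptions picked for illustration. Plain downhill relaxation started on the right settles in the shallower well and never reaches the deeper one.

```python
def energy(x):
    # Hypothetical 1-D "energy landscape" with two wells (not a real force field).
    return x**4 - 2 * x**2 + 0.5 * x


def gradient(x):
    # Analytic derivative of energy(x).
    return 4 * x**3 - 4 * x + 0.5


def relax(x, step=0.01, n_steps=5000):
    # Plain gradient descent: always move downhill, never climb a barrier.
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x


trapped = relax(1.5)    # starts on the right, ends in the shallower well
native = relax(-1.5)    # starts on the left, ends in the deeper well

print(f"trapped: x={trapped:.3f}, E={energy(trapped):.3f}")
print(f"native:  x={native:.3f}, E={energy(native):.3f}")
```

Both runs end at a stable minimum, but only one is the global minimum; which one you reach depends on where you start, which is the toy analogue of a protein being kinetically trapped.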
        
           | pelorat wrote:
           | > I'm glad to see progress being made here, and eagerly await
           | finding out how proteins actually fold.
           | 
           | There's no way a human will ever understand this. It's likely
           | a problem that is beyond what the human mind is capable of
           | understanding.
        
         | siver_john wrote:
         | As an insider to both fields (and specifically their juncture
         | to molecular dynamics).
         | 
         | What you are discussing is of course being studied but the
         | problem is that it is a lot more computationally expensive. We
         | do have simulations of simple (read very small) proteins
         | folding and unfolding but for larger ones the computational
         | time to watch them fold can be gigantic, if not impossible due
         | to the fact that proteins often fold as they are being made.
         | Which means including a much larger process into a folding one
         | which just further stresses computational resources.
         | 
         | This computational problem is so enormous that a company at the
         | cutting edge of research D.E. Shaw built a specialized computer
         | solely for simulating proteins. Also most of the software used
         | for this until recently had abandoned multi-GPU parallelism
         | because it didn't scale well. The pandemic caused the need to
         | simulate the virus on the entirety of Summit and introduced
         | some work back into that route but it is still specialized (and
         | wouldn't help for systems below a certain size anyways).
         | 
         | Also, my previous points have been for atomic models (i.e. we
         | treat everything more like Newtonian particles and ignore
         | quantum effects). Some things definitely need more resolution,
         | and at that level you are lucky to see protein fluctuations, let
         | alone folding.
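
To make the "Newtonian particles" picture concrete, here is a minimal sketch of the kind of time-stepping loop used in classical molecular dynamics: velocity Verlet integration, applied to a single particle on a harmonic spring. The spring constant, mass, time step, and function names are assumptions for illustration only. A real simulation has many thousands of atoms, a full force field, and femtosecond-scale steps, which is where the computational cost comes from.

```python
def force(x, k=1.0):
    # Harmonic restoring force, a toy stand-in for a bonded interaction.
    return -k * x


def velocity_verlet(x, v, dt, n_steps, mass=1.0):
    # Advance position x and velocity v by n_steps of size dt.
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
    return x, v


def total_energy(x, v, k=1.0, mass=1.0):
    # Kinetic plus potential energy of the toy oscillator.
    return 0.5 * mass * v**2 + 0.5 * k * x**2


x0, v0 = 1.0, 0.0
x1, v1 = velocity_verlet(x0, v0, dt=0.01, n_steps=10_000)
print(total_energy(x0, v0), total_energy(x1, v1))
```

The two printed energies should agree closely, which is the usual sanity check for an integrator of this kind.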
        
           | dekhn wrote:
           | Did simulating covid on summit actually help anything?
        
             | siver_john wrote:
             | My memory is a little rusty, but I believe yes. If I
             | remember correctly simulation of the virus helped medically
             | in a few ways. Specifically I think it gave insight into
             | the mRNA vaccines and what sequences to use to make them
             | effective (by basically making a slightly worse spike
             | protein). I am sure it helped in drug discovery as I know
             | our lab used simulations to suggest some potential drug
             | pockets. There were some really good talks about it at
             | NVIDIA's GTC last year or so (maybe more at this recent GTC
             | but I had too much going on personally to watch the VODs).
        
           | flobosg wrote:
           | > This computational problem is so enormous that a company at
           | the cutting edge of research D.E. Shaw built a specialized
           | computer solely for simulating proteins.
           | 
           | See https://en.wikipedia.org/wiki/Anton_(computer)
        
         | xiphias2 wrote:
         | That's what the AlphaFold team has been working on for some
         | time now. The only difference is that it will be relevant for
         | drug research, so I don't expect Alphabet to give it away for
         | free as well.
        
         | dekhn wrote:
         | These questions are better addressed by folding@home and we're
         | still very much computationally limited in our ability to
         | answer these questions.
         | 
         | Do you want to answer these questions for the satisfaction of
         | understanding the underlying physical rules that drive folding?
         | Why? It's unclear that knowing those things would actually make
         | a large impact in any industrially/medically useful contexts.
         | It uses huge amounts of CPU to sample these functions
         | accurately enough to replace actual physical experiments on
         | protein motion. Just doesn't seem like an effective investment
         | of brain or computer time.
         | 
         | (I say this as somebody whose entire career was predicated on
         | using MD to answer these questions; see
         | https://www.nature.com/articles/nchem.1821 for our attempt in
         | that space)
        
           | mechagodzilla wrote:
           | The folding@home approach is very limited to short individual
           | simulation times (a millisecond total, maybe, but 100
           | disconnected nanoseconds at a time), so it then relies on
           | various 'enhanced sampling' techniques to try to put your
           | thumb on the scale to bias things into exploring interesting
           | dynamics. It seems like it is probably more effective the
           | more you already know about a given protein target. Meta's
           | approach (which seems like AF2, but faster/worse?) seems to
           | have a similar problem, in that it's even less trustworthy
           | when you apply it to a new target you have relatively little
           | concrete information about.
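
For a sense of scale, the figures quoted above (about a millisecond of aggregate simulation, assembled from pieces of roughly 100 nanoseconds) imply a rough trajectory count. The numbers are the comment's own round estimates, and the variable names exist only for this sketch.

```python
# Work in integer nanoseconds to avoid floating-point rounding.
NS_PER_MS = 1_000_000

total_ns = 1 * NS_PER_MS      # ~1 millisecond of aggregate sampling
per_trajectory_ns = 100       # ~100 ns per independent trajectory

n_trajectories = total_ns // per_trajectory_ns
print(n_trajectories)  # 10000
```

So a millisecond of total sampling at that length means on the order of ten thousand disconnected runs, none of which individually observes a slow event.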
        
       | pelorat wrote:
       | And how does it stack up to RoseTTAFold? Isn't RoseTTAFold much
       | faster than AlphaFold but slightly worse in accuracy?
        
       | robertlagrant wrote:
       | > The predictions are freely available for anyone to use, as is
       | the code underlying the model, says Rives.
       | 
       | Smart. If you might get cut, having all your work available to
       | pick up from is a great idea.
        
       | tromp wrote:
       | Trading accuracy for speed using language based models:
       | 
       | > Meta's network, called ESMFold, isn't quite as accurate as
       | AlphaFold, Rives' team reported earlier this summer [2], but it is
       | about 60 times faster at predicting structures, he says. "What
       | this means is that we can scale structure prediction to much
       | larger databases."
       | 
       | > Burkhard Rost, a computational biologist at the Technical
       | University of Munich in Germany, is impressed with the
       | combination of speed and accuracy of Meta's model. But he
       | questions whether it really offers an advantage over AlphaFold's
       | precision, when it comes to predicting proteins from metagenomic
       | databases. Language model-based prediction methods -- including
       | one developed by his team [3] -- are better suited to quickly
       | determine how mutations alter protein structure, which is not
       | possible with AlphaFold. "We will see structure prediction become
       | leaner, simpler, cheaper and that will open the door for new
       | things," he says.
        
         | dekhn wrote:
         | Structure prediction is embarrassingly parallel, and rarely
         | requires any specific protein to be predicted in an extremely
         | short period of time. Predictions can be done at any time,
         | including precomputed.
         | 
         | DeepMind has no trouble getting the inference time at Google to
         | compute predictions using their model for any protein of
         | interest or the largest database.
         | 
         | Thus, a faster but more inaccurate system is not really
         | desirable unless it really does provide another feature, such
         | as (as Rost says) predictions of how mutations alter structure. But
         | if alphafold works, then it would indeed be capable of
         | predicting how mutations alter structure.
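
"Embarrassingly parallel" here means each sequence can be predicted independently of the others, so the work spreads across as many workers as are available. A minimal sketch, assuming a hypothetical `predict_structure` function that stands in for a real model call (it only returns a placeholder summary so the example runs on its own); the example sequences are made up as well.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_structure(sequence):
    # Hypothetical stand-in for a model inference call. A real predictor
    # would return coordinates and confidence scores.
    return {"sequence": sequence, "length": len(sequence)}


def predict_all(sequences, max_workers=4):
    # No prediction depends on another, so a plain parallel map is enough.
    # map() returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(predict_structure, sequences))


sequences = ["MKTAYIAK", "GSHMLE", "MVLSPADKTN"]  # made-up example inputs
results = predict_all(sequences)
for r in results:
    print(r["sequence"], r["length"])
```

Threads are used only to keep the sketch simple; for a real workload the same pattern would map onto separate processes, GPUs, or machines.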
        
       | photochemsyn wrote:
       | Cool stuff, but more likely useful as a primary filter or search
       | tool rather than for detailed understanding of protein structure
       | and function. See this quote:
       | 
       | > "Sergey Ovchinnikov, an evolutionary biologist at Harvard
       | University in Cambridge, Massachusetts, wonders about the
       | hundreds of millions of predictions that ESMFold made with low
       | confidence. Some might lack a defined structure, at least in
       | isolation, whereas others might be non-coding DNA mistaken as a
       | protein-coding material."
       | 
       | Understanding of how these proteins function requires high-
       | resolution information about bond angles and atom-atom distances,
       | particularly for non-structural proteins (i.e. interesting
       | catalytic enzymes). Hence, wet-lab work and protein structure
       | characterization via X-ray and NMR methods aren't going anywhere.
        
         | COGlory wrote:
         | Medium confidence is often close enough that you can formulate
         | a hypothesis, but high confidence still isn't close enough that
         | you'd stake millions of dollars of experiments on it anyways.
         | Most of the people in structural biology in pharma that I have
         | chatted with said they're still solving the structures of their
         | targets even with high-confidence AlphaFold models.
         | 
         | For me, as someone solving structures on a monthly basis,
         | AlphaFold is great because after I get back my electron density
         | map, I dump my sequence into AlphaFold, get a model (of any
         | confidence) and most of it fits well enough into my density map
         | that I don't have to start trying to model from nothing, and it
         | saves me honestly days of work.
        
           | dekhn wrote:
           | What's really funny about using AlphaFold predictions to
           | bootstrap a model into a density map is that your structure,
           | based on an AF prediction, will eventually be folded into the
           | dataset used to train the next version of AlphaFold. Talk
           | about test set/training set leakage!
        
           | siver_john wrote:
           | That's a cool way I had never thought about it being used.
           | Just for my own curiosity do you know if that is a common use
           | case in the protein solving field? (I'd imagine it would also
           | be useful in NMR experiments for getting initial point
           | labels).
        
             | flobosg wrote:
             | It is quite common to use predicted models as an aid for
             | phasing and molecular replacement; even Foldit models have
             | been adopted for that purpose:
             | https://www.nature.com/articles/nsmb.2119
        
       | uri4 wrote:
        
         | lucasmullens wrote:
         | There were trials, and there aren't 10 billion people.
        
           | uri4 wrote:
           | That is for next pandemic in 2023, silly :)
           | 
           | And of course, there are always trials!?P
        
       | aliljet wrote:
       | This is incredible work, honestly. Is the theory here that humans
       | seem to enjoy an intuitive understanding of protein folding? I
       | seem to remember an online game (maybe
       | https://en.wikipedia.org/wiki/Foldit) that exploited this
       | intuition by crowdsourcing human suggestions for brute-force
       | style protein folding work.
       | 
       | And am I right that the goal is to create a best-effort short-cut
       | to the final stage of brute-force final folding to get a real
       | result?
        
         | danielmarkbruce wrote:
         | It's just about making simplifications where possible. You
         | don't need a molecular dynamics simulation to predict where a
         | pool ball will go. That simplification is easy to see and
         | understand. Scientists and engineers have been making useful
         | simplifications for a long time.
         | 
         | There are other simplifications where a bunch of forces and
         | masses can be cancelled out, or aggregated - but we can't see
         | them. Can a computer see them with enough processing? It seems
         | like it.
        
       | AndrewKemendo wrote:
       | Can anyone describe how/where this actually fits into Meta's
       | businesses?
       | 
       | "As a test case, they decided to wield their model on a database
       | of bulk-sequenced 'metagenomic' DNA from environmental sources
       | including soil, seawater, the human gut, skin and other microbial
       | habitats."
       | 
       | Meta has a lot of data, but I'm unaware of them having a presence
       | in the environmental/medical diagnostics industry which is where
       | I assume this would be applied
       | 
       | Perhaps they are going for a Bell Labs kind of structure?
        
         | bawolff wrote:
         | It gives them cred, cred in turn makes it easier to hire smart
         | people in this field which has overlap with things that are
         | actually meta's business.
        
           | LightG wrote:
           | * stifled laughter *
           | 
           | Genuinely sorry.
        
             | aierou wrote:
             | It sounds like a joke, but despite public opinion, Meta is
             | one of the top AI research groups in the world.
        
         | ZetaZero wrote:
         | > Can anyone describe how/where this actually fits into Meta's
         | businesses?
         | 
         | AI can predict the behavior of the users, feeding them as many
         | relevant ads as possible, while keeping them engaged.
        
           | strangattractor wrote:
           | I think they were asking why Meta cares about protein
           | structure. It doesn't care, but it does help them keep up with
           | the state of the art in ML, I suppose.
        
         | Mockapapella wrote:
         | Probably the Chan Zuckerberg Initiative
         | https://chanzuckerberg.com/
         | 
         | They have a stated goal of eradicating all diseases by 2100
        
           | dbish wrote:
           | This is a completely separate business from Meta.
        
             | timy2shoes wrote:
             | About as separate as Tesla is from Twitter.
        
         | alecfreudenberg wrote:
         | How could they not
        
         | it_citizen wrote:
         | To keep Zuckerberg AI up-to-date most likely
        
         | justapassenger wrote:
         | Meta has been one of the powerhouses of AI research for many
         | years now.
         | 
         | Why? Recruiting most likely.
        
           | AndrewKemendo wrote:
           | Yes, however all of that research was directly applicable to
           | the Meta family of companies' products - most specifically
           | around NLP, image processing and computer vision with some
           | work in RL. So it was applicable to the company, in addition
           | to being good for recruiting.
           | 
           | Unclear what products this would apply to
        
           | m12k wrote:
           | Improving the newsfeed algorithm. Automating content
           | moderation. Creating content for the upcoming Metaverse.
           | Realistic virtual characters for the Metaverse.
           | 
           | Whether or not we like, or agree with, or believe in their
           | goals (I don't), I think it's hard to argue that competence
           | in AI is not useful for them.
        
       | ta988 wrote:
       | So? Use meta ai structures first that you refine in alphafold.
       | Best of two worlds.
        
       | aliqot wrote:
       | Meta has become a creepy metaphor for facebook: briefly the new
       | cool kid on the block, but now awkwardly shows up at high school
       | parties with their letter jacket on, talking about the time they
       | almost made it to State.
        
         | i_like_apis wrote:
         | You are way off.
         | 
         | Meta is easily among the top few AI research institutions in
         | the world.
        
         | zip1234 wrote:
         | Seems like this is a meaningful contribution to society. As are
         | things like React and Pytorch.
        
           | _Algernon_ wrote:
           | React gives me 'nam flashbacks and I haven't even worked with
           | a proper code base yet.
        
           | JackFr wrote:
           | Well, PyTorch.
        
         | tradecraft wrote:
         | This meta metaphor on Meta is itself a creepy metaphor
        
         | belval wrote:
         | That's a harsh take. FAIR (the Facebook AI research group) is a
         | respected group in the field and they put out high quality
         | research on a variety of topics.
         | 
         | This would be like condemning research for AT&T/Bell Labs
         | because the company was (even at the time) a terrible
         | monopolistic corporation.
        
       | xnx wrote:
       | Let me know when Meta wins a Nobel Prize.
        
       | kgc wrote:
       | Why does a language based model work for what is essentially
       | physics?
        
         | civilized wrote:
         | Why does language describe the physical world at all?
        
       ___________________________________________________________________
       (page generated 2022-11-02 23:01 UTC)