[HN Gopher] What's next for AlphaFold and the AI protein-folding...
___________________________________________________________________
What's next for AlphaFold and the AI protein-folding revolution
Author : digital55
Score : 113 points
Date : 2022-04-13 14:29 UTC (8 hours ago)
(HTM) web link (www.nature.com)
(TXT) w3m dump (www.nature.com)
| photochemsyn wrote:
| It's kind of surprising that AlphaFold has some success with
| random sequences of amino acids:
|
| > "Baker's team gets AlphaFold and RoseTTAFold to "hallucinate"
| new proteins. The researchers have altered the AI code so that,
| given random sequences of amino acids, the software will optimize
| them until they resemble something that the neural networks
| recognize as a protein. In December 2021, Baker and his
| colleagues reported expressing 129 of these hallucinated proteins
| in bacteria, and found that about one-fifth of them folded into
| something resembling their predicted shape."
|
| 20% is not that great but it has potential. One long-standing
| goal is the de novo design of protein-based industrial catalysts
| for specific chemical transformations. Proteins from bacteria
| that live in boiling sulfur vents etc. have been used to some
| extent, but the idea is that similar proteins could be designed
| for a much wider variety of industrial processes. As the article
| notes, specificity remains a challenge (and designed proteins
| don't approach the efficiency of the evolutionarily selected
| proteins), but it still seems promising.
|
| P.S. I'm a bit more skeptical about the drug-design programs.
| It's not so much that novel drugs can't be designed that bind to
| the desired targets, it's that they might bind to a whole lot of
| undesired targets as well, leading to nasty side effects. Now if
| you could screen against the whole proteome, perhaps.
| flobosg wrote:
| > 20% is not that great but it has potential.
|
| 20% success rate is in line with other protein design methods,
| though.
| gfodor wrote:
| I'd imagine the success rate isn't apples to apples - the
| real measure is "time, energy, and manpower expenditure
| needed per generated protein"
| flobosg wrote:
| Both measures can be quite similar. Most protein designs
| can be screened in parallel for solubility and successful
| designs can be further engineered and tested in a high-
| throughput manner.
| fabian2k wrote:
| I found it interesting that AlphaFold can't reliably predict the
| structures for mutations that disrupt structure. The explanation
| makes a lot of sense though.
|
| It is sometimes important to remind oneself that the selection of
| protein structures that exist in nature and that we determined
| experimentally is biased. Nature doesn't like proteins that
| misfold because they can easily cause trouble. And proteins with
| less defined structures are generally harder to solve with the
| usual methods like X-ray crystallography. The list of protein
| structures we know isn't a representative sample of all possible
| protein structures, it's mostly structures that are useful in
| nature and that we can determine with the methods we have
| available.
| alan-hn wrote:
| >proteins with less defined structures are generally harder to
| solve with the usual methods like X-ray crystallography
|
| What do you mean by 'proteins with less defined structures'?
| I'm not familiar with what this phrase could mean, could you
| please expand on this concept?
| fabian2k wrote:
| Less defined means flexible in this case. So either parts
| that are completely random on their own, or parts that can
| adopt multiple different structures.
|
| There are also intrinsically disordered proteins that have no
| defined structure when they are on their own, that's
| essentially like a piece of string that is almost completely
| flexible. Those proteins can still adopt a specific well-
| defined structure if they bind to something else.
| alan-hn wrote:
| So does flexible mean that there may be different amino
| acids in a portion of the peptide? From my understanding,
| when flexibility is discussed in terms of proteins we're
| talking about rigid vs flexible side chains which can move
| or rotate along specific bonds
|
| So for the intrinsically disordered ones, are you mainly
| talking about the secondary or tertiary structures? My
| assumption based on your statement is that we're keeping
| the same primary structure (order of amino acids) but they
| don't have many (if any at all) intermolecular
| interactions? Would it be safe to assume that you're
| referring to shorter polypeptide rather than large
| proteins?
| dekhn wrote:
| in disordered proteins, there is no permanent tertiary
| structure. they may have some secondary structure, but
| the relations of those structural elements can change in
| time. It does not mean the sequence has variation in it.
| alan-hn wrote:
| Does this mean that they have multiple conformational
| states with similar energies that are easy for it to
| transition between? How different are the states and is
| this how the protein normally does its proteiny stuff?
| dekhn wrote:
| yes, I would say that intrinsically disordered proteins
| adopt something like an unfolded state, which is to say
| that they can visit a wide range of structures that are
| at similar energy levels, all of which are accessible at
| ~room temp. I can't really answer in more detail because
| all the ID proteins are fairly different and how they do
| their job is hard to understand compared to stable static
| "rocks" like enzymes.
| throwawaybio3 wrote:
| Enzymes aren't stable and static -- usually in their
| active site they have significant conformational changes
| that enable catalysis of the relevant chemical reaction.
| It's quite a problem that we don't have general robust
| ways of directly elucidating those transient structures,
| a lot of our understanding of catalysis is still held
| back or slow-evolving because we can only use indirect
| and cumbersome methods (like isotopic mutation + laser
| IR)
|
| I would consider most enzymes to be intrinsically
| disordered at their active sites.
| dekhn wrote:
| No, enzymes are not intrinsically disordered at their
| active sites. They are highly ordered. Most enzymes don't
| undergo large changes- they accept a molecule, do their
| business, and release it. You're thinking of other
| proteins like motor proteins which undergo large,
| controlled conformational changes.
|
| The active site is structured to stabilize the transition
| state of the affected molecule and move it from one state
| to the next in the chemical reaction. That requires very
| specific shapes and correlated changes. But of course,
| this being biology, you can remove all 3 active site
| residues in a serine protease catalytic triad, and still
| see proteolysis because the protein, when it binds the
| substrate, forces the substrate into its transition
| pathway.
|
| People have been working on these things for quite some
| time- I saw talks about time-resolved crystallography of
| active sites, and while they say "significant structure
| changes", they really only mean localized breathing-like
| motions, not massive rotations of entire domains.
| f38zf5vdt wrote:
| Yes, many proteins have transitional global arrangements
| that they traverse as they meet some goal. For example,
| kinesin and dynein walk along microtubules in a way where
| we could never perfectly characterize the intermediary
| states since it's effectively a motor with free rotation
| around certain elements.
|
| A lot of crystallography is focused on enzymatic
| reactions where you bind a ligand that sits there for the
| sake of introducing some conformation that you can study.
| The ligands generally approximate the natural substrate
| at either the beginning, end, or some intermediate step
| in enzyme catalyzed synthesis.
| panabee wrote:
| is it possible to identify which proteins are
| intrinsically disordered based on amino acid sequence
| alone (or even base sequence)?
|
| put another way, is it possible to a priori determine if
| a protein is ID or ordered?
|
| for instance, you said enzymes are highly ordered. is
| this based on experimental observations (which could
| later be wrong if imaging techniques improve) or is there
| some principle that allows us to treat this as a fact?
|
| thanks in advance for your time.
| flobosg wrote:
| > is it possible to a priori determine if a protein is ID
| or ordered?
|
| There's software that attempts to predict intrinsic
| disorder based on sequence alone, but in general, in the
| absence of homolog (evolutionarily related) proteins with
| known structure you would still need to check
| experimentally for disorder.
|
| EDIT:
|
| > if the goal is to reliably assess certain viral
| proteins as ID or ordered, experimental methods are the
| only methods for achieving this?
|
| If you don't find homologs with solved structures,
| experimental characterization is the way to go.
| panabee wrote:
| thanks for the explanation. to clarify, if the goal is to
| reliably assess certain viral proteins as ID or ordered,
| are experimental methods the only methods for achieving
| this?
| [deleted]
| dekhn wrote:
| A priori? No. Typically this would be determined by
| synthesizing or expressing the protein of interest and
| then using something like CD (circular dichroism).
|
| There is an absolutely enormous amount of experimental
| data about enzyme structure, but frankly I think the
| simplest is to just understand that the modern ideas
| about the reversible protein folding process came from
| ribonuclease, a protein that cuts RNA:
| https://en.wikipedia.org/wiki/Anfinsen%27s_dogma
|
| There may also be intrinsically disordered enzymes, I'm
| not really sure how they would work, but of course, in
| biology, there's always a weird example that violates
| normal expectations because evolution once randomly tried
| something a billion years ago and got stuck with it.
| panabee wrote:
| thanks for the clarification. your papers also seem
| interesting, will check those out.
|
| the goal is to reliably characterize certain viral
| proteins as ID or ordered. would you happen to have any
| advice on this?
| abcc8 wrote:
| Many proteins have intrinsically disordered regions that are
| hypothesized to be directly related to the protein's role in
| the cell. These regions are termed disordered because current
| methods used to determine the structure of proteins are
| unable to resolve a regular structure for these regions in
| the context of a protein crystal or protein in solution. This
| publication is an informative review on the topic:
| https://pubs.acs.org/doi/10.1021/cr400525m
| jostmey wrote:
| The same protein can deform into multiple different 3D
| shapes, called conformations. Some proteins are rigid and
| exist almost exclusively in a single conformation. It is
| probably easier to determine the 3D structure of proteins
| with a single, dominant conformation. Other proteins don't
| have well defined conformations, and are more like a tangle
| of rope that can bend in many different ways
| panabee wrote:
| thanks for the explanation. what are the biggest factors
| influencing conformation? what are the best ways today for
| imaging proteins with different conformations, and what are
| the limitations of these methods?
| dekhn wrote:
| think loose floppy piles of spaghetti instead of well-defined
| rocks.
| tintor wrote:
| Example by analogy: Flat tire has less defined structure, and
| can take many shapes. Inflated tire has more defined
| structure, and behaves more predictably.
| axg11 wrote:
| Very nicely explained. Also hints at the next big frontier for
| protein folding: improving the prediction of those disruptive
| effects.
| flobosg wrote:
| > I found it interesting that AlphaFold can't reliably predict
| the structures for mutations that disrupt structure
|
| It's not _that_ surprising given the conceptual background of
| the method. Since it's relying on evolutionarily coupled
| residues, AlphaFold is looking at sets of complementary
| mutations that keep or rescue a determined structure, i.e. the
| complete opposite of structural disruption.
|
| > The list of protein structures we know isn't a representative
| sample of all possible protein structures
|
| And the same goes for protein sequences.
| peter303 wrote:
| I see a Nobel Prize around the corner.
|
| They aren't often given for techniques or computation. But the
| results are outstanding.
| mupuff1234 wrote:
| Does Deepmind sell anything? Their site has no mention of any
| type of offering.
| benrapscallion wrote:
| They have spun out a drug design company named Isomorphic Labs.
| [1]
|
| [1] https://www.isomorphiclabs.com/
| dekhn wrote:
| amusingly, I work for a pharma and they don't even return our
| calls. I wonder how seriously they take this business,
| because if I was selling a product based on this, pharma
| would be my first customer.
| alphabetting wrote:
| Could be wrong but I think Deepmind sees more value in
| elite AI/ML talent that Alphafold will draw and help retain
| than future potential profits on drug discovery. Open
| sourcing Alphafold and removing commercial restrictions
| wouldn't make much sense if drug profits were their goal.
| dekhn wrote:
| No, isomorphic labs was set up to specifically
| commercialize this. If their goal is to be a discovery
| company, they are fairly naive.
| alphabetting wrote:
| Yeah I know about Isomorphic labs. My point is that
| talent Alphafold will draw is more valuable than
| potential drug discovery profits.
| folli wrote:
| If the promise of in silico drug design comes to
| fruition, the potential drug discovery profits could very
| well rival Google's ad profits.
| dekhn wrote:
| Sure. AlphaFold is, in fact, the greatest shot at revenue
| that DeepMind has shown so far (and they are under
| intense pressure from Alphabet to show revenue).
| alphabetting wrote:
| I don't think there is any pressure on that front. They
| are supposedly profitable now (though I'm guessing this
| is partially accounting tricks) but there just isn't a
| need to be profitable. Search and Youtube print money to
| fund their R&D ($31B last year alone). The goal is AGI or
| close to it.
|
| https://venturebeat.com/2021/10/10/ai-lab-deepmind-becomes-p...
| dekhn wrote:
| The "profit" you're pointing at is money that Google pays
| DeepMind to do software and machine learning as a service
| for them. This pays off, for example with Jax, where
| nobody in Google Research could touch it because Jeff
| Dean/Tensorflow, until DM demonstrated (with alphafold)
| that Jax could do nobel-prize-winning research, to the
| point where Jeff has admitted that tensorflow has serious
| problems and systems like jax are the future (see the
| palm paper!!!)
| mechagodzilla wrote:
| Where does the value come in if you pay them lots of
| money to work on unprofitable things? Just by virtue of
| not letting your competitors hire them?
| alphabetting wrote:
| Profit is later. I strongly believe this take.
|
| https://twitter.com/fchollet/status/1502775288257601540
| pkaye wrote:
| Must have taken the Google approach and already terminated
| the product. /s
| elcomet wrote:
| Alphafold was released, the code is open source and the
| pretrained weights are available for free.
| asdff wrote:
| I believe only for academic use though right? I don't know if
| it can be used for commercial use.
| lucidrains wrote:
| incorrect, they modified the license so it can be used for
| commercial use -
| https://github.com/deepmind/alphafold/commit/8173117130e6df8...
| xnx wrote:
| Facebook: Releases a tool that makes amusing image mashups
| Google: Makes revolutionary progress in one of the hardest
| problems in chemistry
| flobosg wrote:
| To be fair, they have published a few papers and preprints
| related to the topic. See e.g.
| https://www.pnas.org/doi/10.1073/pnas.2016239118 and
| https://www.biorxiv.org/content/10.1101/2021.02.12.430858v3
| dekhn wrote:
| This wasn't Google, it was DeepMind. Google doesn't get any
| credit for this. I tried to start this project at Google but it
| conflicted with the Google Health team's goals.
| xiphias2 wrote:
| Even if it's a sister project, it's great PR for Google. I
| accept more ads from Google as it gives back so much in
| healthcare. I wish Meta would do the same, I wouldn't care if
| it's part of Facebook or not. GMail was something similar at
| the start: just do something good, to make more people like
| Google.
|
| As for your own project I'm sorry for you: there are no more
| 20% projects, like in the old times :(
| bawolff wrote:
| It's owned by Google (Alphabet); I think they deserve some
| props for it.
| dekhn wrote:
| google is owned by alphabet. DM and Google are siblings.
| codeflo wrote:
| I think this article does a good job of highlighting the
| difference between simulations and ML-based approaches. The
| latter are faster, but have limitations outside of their training
| parameters. As with everything in ML, broader training data to
| cover those cases probably helps. Though I would guess some of
| the problems could be inherent, that there fundamentally is no
| computational shortcut to this problem, whether you use a neural
| network or not.
| dekhn wrote:
| I just wish people would stop using the word "fold" for this.
| It's not folding. It's just structure prediction. It's great at
| structure prediction (static prediction of a single structure)
| and not at all at the folding process (which is dynamic and
| rapidly changing).
| flobosg wrote:
| "Protein fold" and "protein folding" are two different
| concepts. Folds are structural categories, folding is the
| biophysical process. But I agree that there are better words
| out there to name such a tool.
| dekhn wrote:
| That's very misleading, as you can see. I believe we should
| not use the term fold for structural categories as it's a
| misnaming. It's a historical accident that came about before
| people began to understand that folding is a process, not an
| on/off switch.
|
| See my work in this area:
| https://pubmed.ncbi.nlm.nih.gov/24345941/ which is explicitly
| attempting to simulate an approximate folding pathway(s).
| flobosg wrote:
| There are other terms analogous to "fold", like "topology"
| (as used in CATH), but they will probably never see
| widespread use.
| daveguy wrote:
| The most accurate term would probably be "tertiary
| structure". Although AlphaTertiaryStructure is a
| mouthful. They could have named it AlphaTS.
| flobosg wrote:
| > The most accurate term would probably be "tertiary
| structure".
|
| There's an AlphaFold variant that can predict quaternary
| structure:
| https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2
|
| > They could have named it AlphaTS.
|
| TS looks more like "transition state" to me.
| daveguy wrote:
| Ah, good point. TS would be as bad or worse than "Fold"
| in this context.
| dekhn wrote:
| DeepMindProteinStructurePredictor or
| deep_mind_protein_structure_predictor if you don't like
| camel case
| gilleain wrote:
| Even 'topology' is a little confusing to those more
| familiar with the term from maths.
|
| For the 'CATH' hierarchical classification, the
| 'Topology' level is something like the organization of
| secondary structure in an 'Architecture'. This has some
| relationship to topology in the general sense, but is a
| narrower definition.
|
| For me, the 'fold' is what happens after 'folding'
| occurs, but I take the point that it is confusing.
| flobosg wrote:
| If I recall correctly, the difference between
| Architecture and Topology in CATH is that the former is
| independent of connectivity.
|
| > For me, the 'fold' is what happens after 'folding'
| occurs, but I take the point that it is confusing.
|
| Same here.
| dekhn wrote:
| Topology actually makes some sense here in that a very
| small number of proteins do fold into knots! This was a
| huge surprise and completely contradicted most
| predictions.
| https://en.wikipedia.org/wiki/Knotted_protein
| dekhn wrote:
| Yup. I've had this discussion repeatedly with the
| developers of SCOP and the folks who run CASP and they
| simply will not budge.
___________________________________________________________________
(page generated 2022-04-13 23:01 UTC)