[HN Gopher] AlphaFold Protein Structure Database
___________________________________________________________________
AlphaFold Protein Structure Database
Author : matejmecka
Score : 229 points
Date : 2021-07-22 15:15 UTC (7 hours ago)
(HTM) web link (alphafold.ebi.ac.uk)
(TXT) w3m dump (alphafold.ebi.ac.uk)
| sdbrown wrote:
| This is a fabulous convenience! The reach of this ready-to-go
| data will be much larger (in some directions) than the model and
| CASP results themselves.
| culopatin wrote:
| I happen to be working on a database for folds as well. But RNA
| folds not protein folds. I'm not a bio guy but my gf is and if I
| understand correctly this is not the same. I hope they are
| different because it would suck to be me lol.
|
| This is my first big boy project and I'm driving solo so it takes
| me a while to make any progress. But at least now I have this db
| and genbank to model after
| ricksunny wrote:
| I'm sorry but why don't they just release the ability for a user
| to enter a known real-world sequence's accession number from
| Genbank / GISAID, and generate the protein structure from that?
| Why do they have to abstract the user from the process by only
| exposing a completed database of the protein structures the
| Alphafold researchers decided would be worth producing?
| sherjilozair wrote:
| DeepMind has already released the open source code and model
| parameters. The database makes it easier to access the
| predictions.
| sveme wrote:
| I'd guess the ad-hoc simulation of the structure is
| computationally quite expensive and takes a while, though
| that's just a guess and I haven't read the original paper yet.
| ricksunny wrote:
| In fact a cost of $1-$4 for the preferred implementation:
|
| https://news.ycombinator.com/item?id=27894060
|
| The colab provides a slightly-less-accurate version that
| operates in the cloud. For the real mccoy it seems one must
| set up one's own environment and leverage the git repo.
| tazjin wrote:
| You can use the open-source code, and we also have a Colab
| notebook for that: https://bit.ly/alphafoldcolab
|
| More info: https://deepmind.com/blog/article/putting-the-power-
| of-alpha...
| ricksunny wrote:
| Thanks for that - I can see why my comment was downvoted now,
| as the posted article's FAQ lists these links for those
| who would like to study their favorite sequenced-but-
| unmodeled protein. I'm glad Alphafold is as open source as it
| is, and I recognize that it didn't have to be so.
|
| I think I was primed for a knee-jerk reaction because when
| Alphafold's results were announced back in Dec. 2020, with
| expressions of what a boon it would be for researchers around
| the globe, I anticipated there would be a timeline announced
| for exposing a tool or for the open-sourcing. (The Github
| repo has only just been released about 6 days ago ...)
|
| With all the work on SARS-CoV-2's 'interactome', as well as
| human proteins & enzymes involved in pharmacology of
| antiviral drugs under development / repurposing , it's easy
| to imagine that drug developers would have liked to exercise
| Alphafold as soon as it was announced. (I myself have wanted
| a structure for human enzyme OATP1A2 that wasn't available on
| the PDB for such a drug pharmacology study - quite glad it is
| available at hand now. :) ).
|
| Anyway I'm sure good arguments will be made about the need to
| really 'get it right' before releasing, or internal
| deliberations on how much to open up vs charging for it.
|
| But 7 months lead time during a pandemic is a long time...
|
| In all cases thanks again for this innovation's availability
| now. :)
| [deleted]
| spacecity1971 wrote:
| Quick question, please excuse my ignorance, but is there a way to
| extrapolate sequence from structure? In other words, can we
| design proteins and calculate the sequence required to make it?
| kmckiern wrote:
| It's hard but people do it! This is the field of "protein
| engineering".
| moyix wrote:
| Anyone else getting a 403 Forbidden?
|
| If so it might be better to link to the paper instead:
| https://www.nature.com/articles/s41586-021-03828-1
| jkh1 wrote:
| Works fine for me. Must have been a temporary glitch.
| dnautics wrote:
| yikes, this doesn't even do some basic stuff like trim off pre-
| protein segments for secreted proteins... Without this, you could
| get some very incorrect structures.
| [deleted]
| lumost wrote:
| I used to do some RNA molecular dynamics simulations in college
| which were both computationally expensive and difficult to
| replicate. Having the ability to reasonably predict protein
| structure is an incredible scientific achievement - however I am
| curious if anyone here who is better informed has takes on the
| following.
|
| 1. How likely is it that alphafold learned to accurately predict
| protein structure in the narrow domain of proteins that have been
| experimentally synthesized and whose structure has been measured?
| In other words, will AlphaFold's results generalize to proteins
| which cannot yet be synthesized in the laboratory?
|
| 2. If Alphafold's accuracy holds, what type of commercial
| applications does this open up?
| nharada wrote:
| From the abstract[1]:
|
| > After decades of effort, 17% of the total residues in human
| protein sequences are covered by an experimentally-determined
| structure. Here we dramatically expand structural coverage by
| applying the state-of-the-art machine learning method,
| AlphaFold2, at scale to almost the entire human proteome (98.5%
| of human proteins).
|
| [1] https://www.nature.com/articles/s41586-021-03828-1
| vmception wrote:
| Basically they are saying that decades of distributed protein
| folding was useless and everyone would have had more utility
| mining cryptocurrency if it existed several years earlier
|
| But at least it inspired someone to make and release this
| dekhn wrote:
| you're conflating two different disciplines: distributed
| protein folding studies the biophysical process of proteins
| folding over time, while protein structure prediction makes a
| static single prediction of what is believed to be the final
| structure adopted by the protein in the folding process.
|
| I think many people believe that given infinite computer time
| the protein folding simulations would produce the same output
| as the static prediction (modulo a number of complex details)
| but use far, far more computer time to get there.
|
| The fundamental observation from the DM AF2 paper that I've
| been able to glean (which I kind of sort of already believed)
| is that careful multiple sequence alignments of 30-100
| evolutionarily related proteins are enough to produce coarse
| distance constraints that can be used to guide a structure
| prediction to a good answer quickly. And that depended on new
| ML technology that didn't exist before.
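The idea that an alignment of related sequences carries structural signal can be illustrated with a much older and far simpler statistic than anything in AlphaFold: mutual information between alignment columns. Columns that mutate together across related proteins are candidates for residues in contact. The sketch below is only an illustration of that co-variation idea, not the method from the paper; the toy alignment and function names are made up for the example.

```python
from collections import Counter
from math import log2


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    `msa` is a list of equal-length aligned sequences (hypothetical input).
    """
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))

    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment (invented): columns 0 and 3 change together,
# column 1 is fully conserved, column 2 varies independently of column 0.
toy_msa = ["AKLD", "AKVD", "GKLE", "GKVE"]

print(mutual_information(toy_msa, 0, 3))  # 1.0 (perfectly co-varying)
print(mutual_information(toy_msa, 0, 1))  # 0.0 (conserved column carries no signal)
```

Real alignments need corrections for phylogenetic bias and small sample sizes, which this sketch ignores.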
| vmception wrote:
| thanks for that explanation!
| cing wrote:
| Just in case you're not joking, it's worth noting that the
| majority of distributed molecular simulation (past and
| present) is spent studying "folded proteins" to discover
| structures of proteins that are often hidden from methods
| like AlphaFold (currently). For example,
| https://www.nature.com/articles/s41557-021-00707-0
| dmitryminkovsky wrote:
| > experimentally-determined structure
|
| refers to structures determined by means of physical
| examination, such as crystallography, not to attempts at
| predictive computational analysis prior to AlphaFold, which
| were not accurate compared to AlphaFold.
| ramraj07 wrote:
| I don't know if you know, but doctors spent 1,300 YEARS using
| the wrong anatomy book. A few years and compute time isn't the
| end of the world. I'm sure oracle's DB2 test suite has burned
| more carbon than protein folding labs have.
| Jabbles wrote:
| A third way in which you are wrong is that AlphaFold derives
| a lot of its power by referring to previously-solved protein
| structures, or parts of them. It doesn't fold the proteins
| from scratch in an "alpha-zero" way.
| vmception wrote:
| so it's more like protein folding _was_ useless until an AI
| could make sense of the 17% solved variations and using
| that for the other 83% of proteins found in humans?
|
| > After decades of effort, 17% of the total residues in
| human protein sequences are covered by an experimentally-
| determined structure. Here we dramatically expand
| structural coverage by applying the state-of-the-art
| machine learning method, AlphaFold2, at scale to almost the
| entire human proteome (98.5% of human proteins).
|
| I just don't actually understand the quote from the article
| if it isn't comparing the same thing
| _RPL5_ wrote:
| This is awesome! When they announced CASP results a few months
| ago, I was wondering if AlphaFold will be accessible as an API,
| where you can submit a protein id or a sequence and get back a 3D
| structure. This database is basically that, except it's free &
| open to the public. Major props!
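The "submit a protein id, get a structure" workflow can be sketched as a lookup by UniProt accession. The only URL pattern used here is the entry-page pattern that appears in links posted in this thread (for example https://alphafold.ebi.ac.uk/entry/P38398); the helper name and the simplified accession check are assumptions for illustration, and no download endpoint is assumed.

```python
import re

# Entry-page prefix, taken from links posted in this thread.
ENTRY_BASE = "https://alphafold.ebi.ac.uk/entry/"

# Simplified check (assumption): UniProt accessions are 6 or 10
# uppercase alphanumeric characters. The real format is stricter.
_ACCESSION = re.compile(r"[A-Z0-9]{6}(?:[A-Z0-9]{4})?")


def entry_url(accession):
    """Build the AlphaFold DB entry-page URL for a UniProt accession."""
    acc = accession.strip().upper()
    if not _ACCESSION.fullmatch(acc):
        raise ValueError(f"does not look like a UniProt accession: {accession!r}")
    return ENTRY_BASE + acc


print(entry_url("P38398"))  # BRCA1 entry linked later in the thread
print(entry_url("P13569"))  # CFTR entry linked later in the thread
```

Note that the lookup key is an accession, not a gene name: `entry_url("BRCA1")` raises `ValueError` under this check.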
| Ovah wrote:
| Interesting that they're porting it to other organisms. Different
| organisms have variations in ribosomes, post translational
| modifications and even tRNA repertoire. So it's not a guarantee
| that two identical DNA sequences will give identical proteins in
| two different organisms.
| ramraj07 wrote:
| ??? Unless you jump from eukaryotes to archaea these are not
| real concerns. Most PTM markers are very conserved.
| Ovah wrote:
| I'd say the jump from eukaryotes to prokaryotes is a
| realistic scenario in recombinant DNA technology.
|
| I have some experience with recombinant yeast and PTMs.
| The degree of glycosylation actually varies a lot depending on
| the strain used and has a huge effect on protein activity. And
| of course these PTMs affect the crystal structure.
| pelorat wrote:
| Shouldn't matter? Protein folding is based on the laws of
| physics after all. If DNA sequences fold differently in
| different organisms then an external factor is missing.
| Ovah wrote:
| While the laws of physics remain the same, the folding
| machinery between species varies to some degree. Protein
| folding is determined by the unique environment/machinery of
| a cell. A concrete example is disulphide bonds (S-S, e.g.
| cysteine-cysteine) that require a certain pH to form. The
| primary pathways of disulphide-bond formation are localized
| in the endoplasmic reticulum (ER) of eukaryotic cells and the
| periplasmic space of prokaryotic cells. So two completely
| different mechanisms to end up with the same bond (protein
| structure) depending on the organism.
| dnautics wrote:
| Outside of missing post translational modifications, can
| you give a concrete example of a protein that is known to
| fold differently in different species, not counting, say,
| stuff getting sent to the garbage bin of inclusion bodies
| due to the stress of overexpression? My understanding (7
| years of grad school researching protein folding in the ER)
| is that outside of some rare corner and disease state
| cases, folding is pretty much a binary event, and if it
| weren't, for most cases the low delta G difference between
| isoforms would be just as easily overcome over the course
| of environmental changes in a single individual as "between
| different species"; namely, having a deterministic outcome is
| important for through-time robustness.
| ramraj07 wrote:
| As an ex biomedical researcher I was trying to think what protein
| I should enter and see, and couldn't come up with a protein that
| I know of, that didn't have a structure already (at least a crude
| one). That is, we roughly know what most known important proteins
| look like. This is an amazing tool, and will be indispensable in
| labs (I'll expect any lab to use this site at least once a year?)
| But it's not as transformative as some might think.
| amelius wrote:
| https://www.embl.org/news/science/alphafold-potential-impact...
|
| > A discussion of the applications that AlphaFold DB may enable
| and the possible impact of the resource on science and society
| pelorat wrote:
| Do we really know the structure of every protein that assembles
| into a human cell?
| seventytwo wrote:
| Definitely not.
| cing wrote:
| One of the reasons we don't have them all is that individual
| genes can encode for multiple protein isoforms through
| alternative splicing. AlphaFold was only run on one.
| Otherwise, there's lots of important biochemical/biophysical
| processes that impact structure, as cells are only about 50%
| protein by weight.
| _RPL5_ wrote:
| From their abstract:
|
| ---
|
| After decades of effort, 17% of the total residues in human
| protein sequences are covered by an experimentally-determined
| structure. Here we dramatically expand structural coverage
| by applying the state-of-the-art machine learning method,
| AlphaFold2, at scale to almost the entire human proteome
| (98.5% of human proteins). The resulting dataset covers 58%
| of residues with a confident prediction, of which a subset
| (36% of all residues) have very high confidence.
|
| https://www.nature.com/articles/s41586-021-03828-1
|
| ---
|
| The metric they use (residues) is a bit unusual (I would have
| used number of proteins instead), but I assume they wanted to
| account for ambiguity (such as proteins with partial
| structures).
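The difference between counting residues and counting proteins can be seen with a small worked example. The numbers below are invented for illustration and have nothing to do with the figures in the abstract; the function names are hypothetical.

```python
def residue_coverage(proteins):
    """Fraction of all residues that are covered by a structure.

    `proteins` is a list of (length, covered_residues) pairs.
    """
    total = sum(length for length, _ in proteins)
    covered = sum(cov for _, cov in proteins)
    return covered / total


def protein_coverage(proteins, min_fraction=0.0):
    """Fraction of proteins counted as 'covered'.

    A protein counts if it has at least one covered residue and its
    covered fraction is >= min_fraction. The answer depends heavily on
    this arbitrary threshold, which is the ambiguity mentioned above.
    """
    hits = sum(
        1
        for length, cov in proteins
        if cov > 0 and cov / length >= min_fraction
    )
    return hits / len(proteins)


# Invented toy proteome: one fully solved small protein, one large
# protein with a single solved domain, one with no structure at all.
toy = [(100, 100), (300, 30), (600, 0)]

print(residue_coverage(toy))       # share of residues covered
print(protein_coverage(toy))       # proteins with any structure at all
print(protein_coverage(toy, 1.0))  # proteins with a complete structure
```

In this toy case 13% of residues are covered, while the per-protein figure is either two thirds or one third depending on whether a partial structure counts. The residue metric avoids having to pick that threshold.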
| narrator wrote:
| Gain of function researchers working for the world's militaries
| will use this research to figure out how to get viruses to attach
| to receptor sites peculiar to particular races. The people
| developing the antivirals will have a lot harder time countering
| these weapons because making antivirals that aren't poisonous in
| some weird way is a much harder job. If this is not the case,
| please let me know why, it will really help me sleep better at
| night.
|
| A U.S congressional representative came out of a classified
| briefing recently and announced that the CCP is hard at work on
| race specific bioweapons.[1]
|
| Unfortunately, I think this is the launch of a new era of weapons
| we're seeing right now. The biggest development in war since the
| atom bomb. Like the atom bomb, the big question was will we kill
| ourselves with this technology. Who knows?
|
| [1]https://yournews.com/2021/07/22/2185645/rep-marjorie-
| taylor-...
| drcode wrote:
| ...and many doctors will use it to attach pharmaceuticals to
| receptor sites of particular cancers.
| narrator wrote:
| I'm thinking that the problem is that it is much harder to
| develop drugs that only kill cancers very efficiently and
| don't harm the rest of the body than to tweak viruses that
| just have to keep the person alive long enough to spread the
| virus.
| drcode wrote:
| I 100% agree your point is valid. The counterargument is
| "Yes, people can do bad things with protein data, just as
| they can do bad things with a telephone, like use it to
| discuss a bank robbery."
| narrator wrote:
| The crazy part is a bioweapons program is really cheap
| compared to a nuclear weapons program, and now with these
| new tools it's even cheaper. Before, it was vastly more
| expensive to do the cycle of creating a new viral protein
| and testing a bioweapon on human cell culture. Now that
| process is sped up millions of times with this
| technology because that can all take place inside a
| computer.
|
| This is similar to the change with drone weaponry. Before,
| you had to have large cruise missiles to get pinpoint
| strikes. Now small countries like Azerbaijan can buy a
| whole fleet of drone weapons and get the benefits of having
| a modern air force with pinpoint strikes and even stealth
| for vastly less money.
| mlyle wrote:
| Is this a correct summary of your statements:
|
| Because it -might- make things slightly easier for a
| state actor with nigh-unlimited resources to enact a
| doomsday scenario, which they might or might not be
| pursuing, medical researchers should not publish
| otherwise helpful research?
| narrator wrote:
| I think it's great that the Wuhan institute published all
| their gain of function research. They even said who paid
| for it. It's a clear trail back to them, but apparently
| taking any action to acknowledge that this is a bad thing
| and something fishy might be going on is a completely
| politicized issue now that apparently gets as many
| downvotes as arguing about hot button political topics
| now.
|
| What I'm saying is there should at least be an open and
| frank discussion of what the whole world is getting
| itself into right now with all this.
| ravila4 wrote:
| 1. Gain of function is not as easy as you think. 2. Such bio-
| weapons are not likely because any virus released in the wild
| will mutate over time, and also because you cannot target
| "races" in the way you describe. Phenotypic traits span across
| geographical borders, and any attempt to do such a thing is
| likely to backfire.
| narrator wrote:
| I think if the CCP were successful in creating race targeted
| bioweapons it would be in their interest to convince the
| world that they didn't exist.
|
| Insults, character assassination campaigns and politicizing
| the existence of these bioweapons would be a good way to do
| that. Just copy paste some of the comments here and change
| the name to insult anyone who thinks they exist. They could
| then go and kill millions and not receive any retaliation
| whatsoever with people praising them for their effective
| program of keeping the disease epidemic they created under
| control. Even if you got the guy who discovered AIDS and won
| the Nobel prize for it to say that these were gain of
| function viruses that incorporated HIV protein parts, you
| could just launch a big propaganda campaign to attack his
| character.[1] Much cheaper than having to fight a war.
|
| [1]https://www.gmanetwork.com/news/scitech/science/736458/fre
| nc...
| moistly wrote:
| > _Ridiculous fookin' idjit and compulsive liar M.T.Greene_
| came out of a classified briefing recently and announced that
| the CCP is hard at work on race specific bioweapons.
|
| Fixed that for you. By the way, you shouldn't pay attention to
| that clown.
| jkh1 wrote:
| Didn't see this post so posted it also. Also relevant:
| https://www.embl.org/news/science/alphafold-potential-impact...
| pelorat wrote:
| There's a lot of news about AlphaFold lately but what about
| RoseTTAFold? Wasn't it more accurate and much faster?
| creddit wrote:
| I believe slightly less accurate but significantly faster is
| where it stands.
| pelorat wrote:
| Running a sequence against both seems like a good idea. If
| they agree the certainty will go way up.
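One simple way to quantify whether two predictions "agree" is a distance RMSD (dRMSD), which compares the internal pairwise distances of two models and therefore needs no superposition step. This is a minimal sketch, assuming both models have already been reduced to one coordinate per residue (for example the C-alpha atom) for the same sequence; the function name and toy coordinates are made up.

```python
from itertools import combinations
from math import dist, sqrt


def distance_rmsd(coords_a, coords_b):
    """dRMSD between two models given as lists of (x, y, z) tuples.

    Compares every intra-model pairwise distance, so the result is
    unchanged by rotating or translating either model.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("models must have the same number of residues")
    if len(coords_a) < 2:
        raise ValueError("need at least two residues")

    pairs = list(combinations(range(len(coords_a)), 2))
    squared = sum(
        (dist(coords_a[i], coords_a[j]) - dist(coords_b[i], coords_b[j])) ** 2
        for i, j in pairs
    )
    return sqrt(squared / len(pairs))


# Toy three-residue models (invented coordinates, in angstroms).
straight = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(1.0, 2.0, 3.0), (4.8, 2.0, 3.0), (8.6, 2.0, 3.0)]  # same shape, moved
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]     # different shape

print(distance_rmsd(straight, shifted))  # approximately zero
print(distance_rmsd(straight, bent))     # clearly non-zero
```

One caveat of this choice: dRMSD cannot tell a structure from its mirror image, and agreement between two methods trained on similar data is weaker evidence than agreement with an experiment.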
| visarga wrote:
| Citation factory, that's what it is.
| abcc8 wrote:
| Resources as useful as this are bound to be. We do cite our
| sources after all.
| stephanheijl wrote:
| I'm impressed and grateful that DeepMind released this resource,
| this will save a lot of compute from labs trying to replicate an
| entire exome for themselves. While some structures look great,
| there are still some misses here. Important structures like BRCA1
| (a well-studied breast cancer associated protein) are just
| structures for the BRCT and RING domains surrounded by a low-
| confidence string of amino acids, likely shaped to be globular:
| https://alphafold.ebi.ac.uk/entry/P38398
|
| Maybe I was wrong for expecting the impossible here, but I was
| excited to see this specific structure and it appears that there
| is still work to do. Nevertheless, kudos to Deepmind on their
| amazing achievement and contributions to the field!
| maga wrote:
| A curious non-biologist here: how valuable are these low
| confidence predictions for biologists? In other words, is it
| hard to predict but easy to check situation as with, say, prime
| numbers in mathematics?
| toufka wrote:
| The medium-confidence predictions are great for grounding or
| sourcing intuition. If you're trying to divide up a protein
| for an experiment and you have to choose where to divvy it up
| - you'd like to use even a bad prediction to help weight an
| otherwise completely random approach. AND there are great
| methods to help with this, but they're often custom, time-
| consuming, and out-of-field for most. So being able to very
| quickly spot-check using a uniform state-of-the-art, for any
| arbitrary protein, makes it actually pretty useful for
| certain kinds of pre-experimental guidance.
| devindotcom wrote:
| Some are valuable for the reasons the other person responding
| noted, but some of the low confidence predictions may also be
| high confidence predictions of a disordered class of protein
| that doesn't have a standard rest state. So it's useful work
| one way or the other.
| cing wrote:
| Everything between the BRCT and RING domains of BRCA1 is an
| intrinsically unstructured region which DeepMind correctly
| predicts, https://pubmed.ncbi.nlm.nih.gov/15571721/
|
| Another famous one would be R-domain of CFTR, which was not
| resolved in experimental structure determination, and AlphaFold
| models correctly show disorder there. Nothing to be done in
| those cases except perform molecular simulation or other
| experiments to assess dynamic ensembles,
| https://alphafold.ebi.ac.uk/entry/P13569
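Low-confidence stretches such as the BRCA1 linker can be located programmatically. The sketch below assumes, from outside this thread, that a downloaded AlphaFold DB model in PDB format carries the per-residue confidence score (pLDDT) in the B-factor column, and that the bands are roughly >90 very high, 70-90 confident, 50-70 low, <50 very low. Both the bands and the 50 cutoff as a hint of disorder are assumptions to check against the database documentation. The toy file is generated in code, not real data.

```python
def plddt_per_residue(pdb_text):
    """Map residue number -> pLDDT, read from C-alpha ATOM records.

    Uses the fixed PDB columns: atom name 13-16, residue number 23-26,
    B-factor 61-66 (assumed here to hold pLDDT).
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def confidence_band(plddt):
    """Label a score using the assumed bands described above."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def low_confidence_segments(scores, cutoff=50.0):
    """Return (start, end) ranges of consecutive residues below cutoff."""
    segments = []
    start = prev = None
    for res in sorted(scores):
        if scores[res] < cutoff:
            if start is None or res != prev + 1:
                if start is not None:
                    segments.append((start, prev))
                start = res
            prev = res
        elif start is not None:
            segments.append((start, prev))
            start = None
    if start is not None:
        segments.append((start, prev))
    return segments


def _toy_atom_line(serial, res_seq, plddt):
    # Builds a correctly column-aligned C-alpha ATOM record (toy data).
    return "ATOM  {:5d}  CA  ALA A{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
        serial, res_seq, 0.0, 0.0, 0.0, 1.0, plddt
    )


toy_scores = [95.0, 92.0, 40.0, 35.0, 45.0, 88.0]
toy_pdb = "\n".join(
    _toy_atom_line(i, i, s) for i, s in enumerate(toy_scores, start=1)
)

scores = plddt_per_residue(toy_pdb)
print(low_confidence_segments(scores))  # [(3, 5)]
```

A long segment reported this way is a prompt to check the literature for known disorder, not proof of it.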
___________________________________________________________________
(page generated 2021-07-22 23:00 UTC)