[HN Gopher] AlphaFold 2 is here: what's behind the structure pre...
___________________________________________________________________
AlphaFold 2 is here: what's behind the structure prediction miracle
Author : couteiral
Score : 213 points
Date : 2021-07-20 09:15 UTC (13 hours ago)
(HTM) web link (www.blopig.com)
(TXT) w3m dump (www.blopig.com)
| Quarrel wrote:
| Awesome.
|
| I wrote a thesis on protein structure prediction in 1995. We
| weren't very good at it then. Amazing to see this.
| VSerge wrote:
| I remember that the scientific game "Fold It" was also quite
| exciting when it came out 12-13 years ago or so, since
| populations of players could get results beyond what either
| specialists or computer systems could achieve. I guess one
| could argue that an AI such as this could be compared to a
| large automated population of trained players trying to solve a
| 3D puzzle.
| kevstev wrote:
| I did some undergraduate research on this around 1999. At the
| time we were trying to prove that we could throw more firepower
| at the problem by building a Beowulf cluster. After a bit of
| tweaking, we were able to get more performance than a single
| machine, but soon seti@home was released, and to me at least
| the writing was on the wall that we were not taking the optimal
| approach.
|
| In hindsight, though, we were so far off, from both an
| algorithmic and a hardware perspective, from actually achieving
| meaningful results. I am glad that, 20 years later, real
| progress seems to be being made. I haven't really followed the
| folding@home project in many years, but it's not clear to me
| that much came out of it that was all that useful, at least not
| in practical terms.
| ac29 wrote:
| Not sure about folding@home, but the lab that runs
| rosetta@home released a paper earlier this month claiming
| they have a new algorithm with comparable results to
| AlphaFold2:
| https://science.sciencemag.org/content/early/2021/07/19/scie...
|
| I don't believe this new approach runs on their distributed
| compute network, but it's cool to see some good competition.
| Game_Ender wrote:
| To me the most interesting part of the article is the commentary
| on where basic research is going to happen in the future. The
| fear is that if it only happens in large companies, then the
| unbiased pool of experts society relies on will be smaller and
| less informed. There is also the issue of nobody being around
| for the slog of defining a field, setting up databases,
| competitions, and standards. These are what allow well-funded
| corporate labs to apply their skills and compute and blow a
| problem out of the water. The problem is, would they do the
| work to define an unknown problem in the first place?
| mrfusion wrote:
| > unbiased pool of experts
|
| Sounds like an oxymoron these days.
| suetoniusp wrote:
| Always has been
| jonas21 wrote:
| > _DeepMind claimed that they used "128 TPUv3 cores or roughly
| equivalent to ~100-200 GPUs". Although this amount of compute
| seems beyond the wildest dreams of most academic
| researchers..._
|
| So, we're talking like what? Maybe $100K to $300K of hardware?
| Wet biology labs often have multiple pieces of $100K+ equipment
| at their disposal. Why shouldn't computational labs too?
| [deleted]
| [deleted]
| Synaesthesia wrote:
| Yeah, but it's also putting it together and properly utilizing
| it, which takes specialist knowledge.
| tiborsaas wrote:
| This was never really a bottleneck for science. One needs
| to realize first that something can be done, then it will
| be done.
|
| The cost of computing will also go down in the future, and for
| government funders this sounds like a drop in the ocean when
| they are building multi-billion-dollar particle accelerators.
| extropy wrote:
| You can rent a 32-core TPUv3 slice from Google Cloud at $32 per
| hour, so 128 cores would be roughly $150 per hour. $1K gives
| you about 8 hours of training time.
|
| https://cloud.google.com/tpu/pricing#pod-pricing
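|
| A back-of-the-envelope sketch of that arithmetic, assuming the
| quoted $32/hr for a 32-core slice scales linearly to 128 cores
| (actual pod pricing tiers may differ):
|
|     cores = 128
|     price_per_32_cores = 32.0                  # USD per hour
|     hourly = price_per_32_cores * cores / 32   # ~$128/hour
|     hours_per_1k = 1000.0 / hourly             # ~7.8 hours
|     print(f"~${hourly:.0f}/hr, ~{hours_per_1k:.1f} hrs per $1K")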
| throwamon wrote:
| So... could anyone with experience in the area give an estimate
| of how much the likelihood of an unstoppable, untraceable "DIY"
| bioweapon appearing in the next decade has increased thanks to
| this?
| matt2000 wrote:
| I'm not an expert, but in a recent article about the new mRNA
| synthesis techniques they were asked the same question. The
| answer was there's already lots of potential bioweapons and
| many simpler techniques for producing them, so these new
| technologies don't change the danger level much.
| jcranmer wrote:
| So I don't have experience in the area, but I'd give it about
| 0%.
|
| To paraphrase Derek Lowe a lot (see, e.g.,
| https://blogs.sciencemag.org/pipeline/archives/2021/03/19/ai...),
| there are several
| hard problems in biology, and the kind of progress embodied in
| AlphaFold isn't progress towards the rate-limiting problems.
| And many of the things that make drugs hard to develop are
| going to carry over into making bioweapons hard to develop.
| TaupeRanger wrote:
| Here we go again with the hyperbole...very tiring.
| spywaregorilla wrote:
| What are the big implications of being good at predicting protein
| structures?
| ansible wrote:
| The goal all along has been to design proteins with a specific
| structure.
|
| This can be applied to just about any area of biology. You
| could design novel antigens to combat disease, and then easily
| mass-produce them. Or just inject the RNA to have the body
| produce them.
|
| But the applications are boundless, from genetically modifying
| crops, to anti-aging, and more.
|
| It is also one of the key pathways to molecular nanotechnology,
| where instead of building arbitrary structures out of amino
| acids, we increase the range of arbitrary molecules we can
| design, build, and produce in quantity.
| spywaregorilla wrote:
| Is it the structure that's important? Or is the structure
| just a way to combine certain amino acids in a stable manner,
| and it's the combination of acids that we care about? Or is
| structure just a way of saying a specific permutation of
| amino acids?
| ansible wrote:
| The structure is the whole point. As I understand it, you
| can link together nearly arbitrary sequences of amino
| acids. But a random string of AAs will just result in a
| jumbled protein that doesn't do anything useful.
|
| Specific structures are useful in all manner of ways, from
| cleaving a DNA molecule at a specific point to enzymes for
| breaking apart other molecules.
|
| Very, very useful.
| vokep wrote:
| >Very, very useful.
|
| Just to frame it a particular way: biological systems are
| basically solved nanotechnology, extremely good, self-
| sustaining, resilient little machines that have spent a
| _long_ time optimizing to be better and better. But all
| the designs are preset. If we can crack the code and
| design our own little machines, then amazing things like
| a more plastic-like cellulose could be made, and all
| sorts of problems suddenly become far easier to solve.
| But a lot of new problems also emerge that weren't even
| imaginable before, since the code being cracked is a big
| chunk of the code of life itself. So, y'know, playing God
| and all; there will probably be some negative consequences
| of this too.
| ansible wrote:
| Yes, I agree with all this.
|
| Generally speaking molecular nanotechnology will solve
| all the "intractable" problems we as a society face
| today: climate change, poverty, biological death from old
| age / disease / cancer, and more.
|
| We could also create tools of destruction so vast, it can
| be hard to contemplate.
| TaupeRanger wrote:
| That remains to be seen. People _hope_ it will lead to new
| treatments for diseases of all kinds. Whether or not that
| materializes is a big question mark.
| shpongled wrote:
| If we can accurately predict protein structures (particularly
| multiple structures, or structures reflecting the conformation
| in cells), then we can do a couple of things:
|
|   - better predict drug binding to proteins (massive benefits
|     if accurate)
|   - better understand the functional outcomes of missense
|     mutations on proteins
|   - study protein-protein interactions
|   - and in general, just gain a better understanding of biology
|     (which is driven by proteins and their
|     reactions/interactions)
| wly_cdgr wrote:
| Cool. How many years closer has this brought us to a $1 pill that
| extends life span by 1 year? Cos let's face it that's the only
| prize that really matters in this whole field
| p1131 wrote:
| I wonder if this is the result of us having significantly better
| understanding of our biology or the major advancements in AI and
| computer performance. Or both?
| woliveirajr wrote:
| Article at Nature:
| https://www.nature.com/articles/s41586-021-03819-2
| londons_explore wrote:
| > Like most bioinformatics programs, AlphaFold 2 comes equipped
| with a "preprocessing pipeline", which is the discipline's lingo
| for "a Bash script that calls some other codes".
|
| Having bioinformatics people stray a long way from their core
| competency to learn a scripting language from the '80s
| to write glue code seems... suboptimal. How many hours of expert
| time has been wasted figuring out how to split a string in bash?
|
| Can us software people build a better tool to eliminate the need
| for this?
| cabalamat wrote:
| > How many hours of expert time has been wasted figuring out
| how to split a string in bash?
|
| Probably a good many more than would be needed to learn how to
| split a string in Python.
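|
| For the record, the Python version is a one-liner (purely
| illustrative; the record format here is made up):
|
|     record = "P12345:HUMAN:chain_A"
|     accession, species, chain = record.split(":")
|     print(accession)   # -> P12345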
| couteiral wrote:
| There is a growing trend to include Docker (or Singularity,
| which is more compatible with HPC architectures common in
| bioinformatics) images alongside code. In particular,
| AlphaFold 2 does provide a Dockerfile, and they even include a
| Python "launcher script" hiding all the details of running the
| code.
|
| Sadly, this is very uncommon in the community. In a
| bioinformatics meeting, the sentence "I spent X days setting up
| Y software" will not raise many eyebrows
| maliker wrote:
| I work in power systems, and the situation is similar. Maybe
| worse because paper authors often come up with new
| computational techniques but don't implement them in code
| (much less code with a dockerfile).
|
| I look with much jealousy over at the computer science field
| where papers often include code, multiple versions under
| version control, automated tests, setup/docker scripts, and
| demonstration workflows and interfaces.
| knuthsat wrote:
| From looking at the code, Bash looks pretty clean.
|
| I also use Bash and AWK for preprocessing a lot.
| laichzeit0 wrote:
| I used to be that guy as well, till a colleague convinced me
| that anything I could do in Bash or AWK I could probably do
| more easily in Perl. Then everyone sort of drifted to Python.
| I get that if you never used Perl it's pointless to learn it
| if you're already in the Python stack, but... damn, Perl's
| regular expressions, and how deeply they're baked into the
| syntax of the language, make using regex in Python seem like
| going back to the Stone Age.
| dfas231 wrote:
| > Can us software people build a better tool to eliminate the
| need for this?
|
| Most probably not. Bash is currently the sweet spot. It is
| actually the best tool for this job. Any other option comes
| with increased complexity and will make the whole thing less
| stable.
| volta83 wrote:
| Bash is easier to explain and use than, e.g., explaining to
| people how to use Python's subprocess module to launch
| different apps, capture their output, etc.
|
| I find it astonishing how bad Python is as a Bash replacement.
|
| I'd often rather write an argument parser in Bash than use
| Python if I have to invoke a bunch of commands.
| hortense wrote:
| Python is bad, but bash is worse as soon as you need any kind
| of logic.
|
| > explaining ppl how to use Python subprocess Module Launch
| different apps, capture their output
|
| There's no shame in using `os.system`.
| maliker wrote:
| Well, technically subprocess.check_output if you need to
| capture the output.
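|
| A minimal sketch of the two approaches being compared (the
| command here is just a placeholder):
|
|     import os
|     import subprocess
|
|     # os.system runs the command and returns its exit status;
|     # it does not capture output.
|     status = os.system("ls -l")
|
|     # subprocess.run can capture stdout/stderr and raise on
|     # failure (check=True).
|     result = subprocess.run(["ls", "-l"], capture_output=True,
|                             text=True, check=True)
|     print(result.stdout)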
| 6gvONxR4sf7o wrote:
| Among the bioinformatics folks I know, bash is already a core
| competency. If you're using your average biologist as your
| mental model, you're thinking of the wrong people.
| cinntaile wrote:
| Pretty much anyone who works with data has to clean their
| data, so learning to use the tools to do that is important.
| Whether that's bash, perl, R, python, ... doesn't really matter
| that much. If they already know bash then bash is a good tool
| since they can now focus on their data instead of wasting time
| on learning a new tool to do the same thing.
| culopatin wrote:
| It's sad. Bio people really need friends in software to boost
| their research. Unfortunately, they do a Codecademy Python
| course for 5 minutes and try to get their projects going.
| Sometimes they succeed, sometimes they fail. But they don't
| really have much time to dedicate to properly learning software
| dev, and it's not what they are into anyway; it's a necessity.
|
| I think we could create something like a GitHub of bio projects
| that need help, where people assist in the hope of getting
| their names on a paper.
| ackbar03 wrote:
| I think DeepMind went one step further and solved the entire
| problem for them. They don't even have to touch their keyboards
| anymore
| stingraycharles wrote:
| Sure there are better alternatives, but the advantage of bash /
| shell scripting is that it's very easy to glue a whole
| collection of tools together, and that expertise in this
| transfers well between domains.
|
| They probably could have achieved the same by invoking things
| in Python, but it would have been slower and not achieved a
| lot, other than "not using shell scripts".
|
| And once you go down the path of optimizing this enough, you'll
| end up reinventing shell scripts altogether.
| sanxiyn wrote:
| Well, AlphaFold 2 generates MSA by invoking things in Python:
| https://github.com/deepmind/alphafold/blob/main/alphafold/da...
| So the article is actually mistaken on this point.
| johnnycerberus wrote:
| I think the tools already exist; it's just that people are
| conservative with their choices.
| ethbr0 wrote:
| (Obligatory xkcd; you know the one)
| FartyMcFarter wrote:
| Not really?
| londons_explore wrote:
| https://xkcd.com/927/
| pbronez wrote:
| MLOps is a pretty hot area right now. The industry is trying
| to figure out how to engineer these things in a robust way,
| but it's not ubiquitous at all yet. There are lots of tools
| that wrap K8s and help you train models, but for doing DataOps
| in a robust way... I haven't seen the definitive answer yet.
|
| Seriously please tell me if you are founding this company so
| I can invest.
| bamboozled wrote:
| Thank you Google...thank you!
| maliker wrote:
| Why did they open source it? Wouldn't this model be very
| valuable to the pharma industry?
| visarga wrote:
| > Why did they open source it? Wouldn't this model be very
| valuable to the pharma industry?
|
| This is a question we should remember when we feel like
| condemning big corporations for monopolizing AI. HuggingFace
| lists 12,257 models in its zoo, many coming from FAANG. You
| can start one in 3 lines of Python, or fine-tune it with a
| little more effort.
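|
| Roughly the "3 lines" in question, using the transformers
| pipeline API (it downloads a default pretrained model on first
| use):
|
|     from transformers import pipeline
|     classifier = pipeline("sentiment-analysis")
|     print(classifier("AlphaFold 2 is a remarkable result"))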
| swayson wrote:
| If I were to speculate:
|
| 1. It is in line with the organization's vision/mission of
| advancing science.
|
| 2. It differentiates them from OpenAI, which, despite the
| name, is not really big on open source.
| summerlight wrote:
| Because the core competency is not the model or code, but the
| people and organization that enable this project (and perhaps
| computing infrastructure as well?). The pharma industry will
| try to catch up of course, but they will also likely try to
| establish collaboration with DeepMind. This could be a good
| first step for Google into the medical/pharma business.
| alphabetting wrote:
| As an outsider, it seems Google has a much more academia-
| friendly culture than other megacap tech companies. I'm
| guessing the talent this culture draws is likely more adamant
| about their work being open-sourced.
| gostsamo wrote:
| It is explained in the article. A few reasons could be pressure
| from the publishing journal, emerging open-source
| implementations of the same idea, and the fact that this is
| still far from easy to commercialize.
| londons_explore wrote:
| Don't forget internal pressure.
|
| A lot of people will say "Unless you opensource my work and
| that of my colleagues, I quit".
|
| When faced with all your best people threatening to quit,
| you might just opensource that work. It turns out you still
| have an advantage by being ~1 year ahead on applying it to
| anything, and having all the people who know how it works
| on your staff.
| credit_guy wrote:
| Another reason could be that whoever wants to run it will
| very likely run it in the cloud, and there's a chance
| they'd run it in the Google Cloud. A machine similar to the
| one they mention on the alphafold github page (12 vCPU, 1
| GPU, 85 GB) costs you between $1 and $4/hour.
| swazzy wrote:
| https://github.com/deepmind/alphafold
| kevincox wrote:
| Apache 2.0
|
| https://github.com/deepmind/alphafold/blob/main/LICENSE
| phkahler wrote:
| How does work like this get funded? It's awesome, but it seems so
| far removed from... let's say "profit". And there are several
| teams competing in these things. Are there places that really
| fund advanced work like this, or is it mostly graduate student
| underpaid labor?
| seventytwo wrote:
| Government funding (eg. DARPA) or from large corporations that
| have skunkworks teams (Google, IBM, Microsoft, etc.)
| dekhn wrote:
| Most of the US researchers who do CASP are funded by NIH or
| NSF. Some are funded by private foundations, or are
| independently wealthy. Typically, as a "principal investigator"
| (postdoc, professor, scientist at a national lab) you write a
| proposal saying "here's my preivous work, here's the next
| obvious step, plz give monies so I can feed the dean's fund and
| pay for my grad students to manage my modest closet cluster".
|
| A group of your competitors then trashes your proposal in a
| group and if you've properly massaged the right backs, you get
| a pittance, which permits you to struggle to keep up with all
| your promises.
| infogulch wrote:
| Yikes that sounds bleak.
| cing wrote:
| Yet, there were still 136 human teams who competed in
| CASP14
| (https://predictioncenter.org/casp14/docs.cgi?view=groupsbyna...),
| including DeepMind. Even if a significant
| fraction of these projects were done piggy-backing another
| grant, this work does receive research funding.
| dekhn wrote:
| Be fair. Many of those rows contain duplicate names
| (identical teams), so the count is much smaller.
| fswwi wrote:
| It's just corporations burning money to show off.
|
| https://venturebeat.com/2020/12/27/deepminds-big-losses-and-...
| dekhn wrote:
| So, unsurprisingly, it appears that applying a transformer to
| multiple sequence alignments extracts somewhat more spatial
| information about proteins than we had been able to previously
| squeeze out.
|
| It's pretty clear at this point that the work led to a large
| improvement in psp scores, but there's literally nothing else
| groundbreaking about it; I don't mean that in a bad way, except
| to criticize all the breathless press about applications and
| pharma.
| sheggle wrote:
| Just for my reference, what percentage of known but unfolded
| proteins (a wild guess is good enough), would you consider to
| be ab initio? How many don't have parts in any database?
| [deleted]
| visarga wrote:
| It seems amazing to me what the transformer can learn to SOTA
| levels: not just language but also images, video, code, math,
| and proteins. Replacing so much handmade neural architecture
| with just one thing that does it all was an amazing step
| forward.
| mda wrote:
| Well, it did give groundbreaking results; it is weird to see
| people dismiss it as "not groundbreaking enough".
| dekhn wrote:
| It was a nice improvement. That's fine. But it's ultimately
| just statistical modelling based on deep evolutionary
| information. It only works on homology modelling; it doesn't
| actually solve the larger protein structure prediction
| problem. Therefore it's not groundbreaking, but it is a
| significant improvement.
| couteiral wrote:
| I respectfully disagree. AlphaFold 2 demonstrated almost
| perfect performance for a multitude of proteins for which
| no meaningful templates were available -- hence, it was not
| doing homology modelling as it is generally understood, but
| ab initio protein structure prediction.
|
| What I would support is that AlphaFold 2 does not solve the
| protein folding problem: how a protein folds, as opposed to
| what it folds into.
| timr wrote:
| > it was not doing homology modelling as it is generally
| understood, but ab initio protein structure prediction.
|
| Maybe according to the _current definition_ of the term,
| which has drifted over the years. Homology modeling and
| "ab initio" structure prediction have been drifting
| toward each other for a long time. These days, the
| categories are separated by (an essentially arbitrary)
| sequence identity threshold. If you have a protein
| sequence with high homology to some other protein with a
| structure, then you're homology modeling. If you have no
| matches at all, you're doing "ab initio". In the middle,
| you have a gray area where you can mix the approaches and
| call it whatever you like.
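|
| (A toy illustration of what a "sequence identity" number boils
| down to, assuming two already-aligned sequences; real pipelines
| compute it from proper alignments:)
|
|     def percent_identity(aligned_a: str, aligned_b: str) -> float:
|         # Naive identity over a pairwise alignment; gaps count
|         # in the denominator but never as matches.
|         assert len(aligned_a) == len(aligned_b)
|         matches = sum(a == b and a != "-"
|                       for a, b in zip(aligned_a, aligned_b))
|         return 100.0 * matches / len(aligned_a)
|
|     print(percent_identity("MKTAYIAK-QR", "MKSAYLAKGQR"))  # ~72.7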
|
| This is not a pedantic point. If your method requires
| homology -- however distant and fragmented -- in order to
| work, then you're always limited to the knowledge in the
| database. Maybe we've sampled enough of protein space to
| get the major folds, but certainly, the databases don't
| have enough information to get the small details right.
|
| I have never been a huge believer in the idea that we can
| go directly from protein sequence to protein structure
| simply using a mathematical model of physics, but that is
| the original meaning of "ab initio structure prediction",
| and if you _could_ do it, it would be far more valuable
| than alphafold. At risk of making a trivially nerd-
| snipable metaphor, it's kind of like the difference
| between google translate and a theoretical model of human
| intelligence that understands concepts and can generate
| language. The latter is obviously immensely more capable
| than the former.
| dekhn wrote:
| If CASP is calling methods that use any sequence
| similarity (the grey area) 'ab initio', that's
| disingenuous and intellectually dishonest.
|
| ab initio means from nothing, and at most, you're allowed
| to have physically inspired force fields, not sequence
| similarity to known structures. I put a lot of effort
| into improving the state of the art in that area, but
| ultimately concluded it made more sense to concentrate
| experimental structural determination in the area that
| was most useful- in proteins that had unknown folds or no
| known homology (see https://scholar.google.com/citations?
| view_op=view_citation&h... for some previous work I did
| in this area).
| timr wrote:
| > If CASP is calling methods that use any sequence
| similarity (the grey area) 'ab initio', that's
| disingenuous and intellectually dishonest.
|
| The _category_ is given the name, not the methods. People
| can use any method they like to solve the structures. The
| organizers are not zealots.
|
| The _ab initio_ portion of CASP consists of proteins that
| the organizers know have low sequence identity to
| anything in the existing databases. They represent
| proteins that are "difficult" to solve using what any
| practitioner might call homology modeling. That doesn't
| mean that you can't use a method that takes into account
| the biological databases -- and essentially all of the
| good methods do!
|
| For example, the Rosetta method has competed in both the
| homology modeling and the ab initio categories for many
| years. They mix a bit of both -- using homology models to
| get the fold, and fragment insertion to model the floppy
| bits.
|
| I haven't paid close attention to CASP in a long time,
| but I assume the competitor list still has tons of
| entries from people who cling tightly to the purist
| vision of ab initio modeling. They don't tend to do very
| well.
| dekhn wrote:
| OK, be aware the person you're correcting has: competed
| in CASP (on a competitor team with Sali), and published
| papers with Baker on Rosetta methods (my paper is cited
| in the most recent RoseTTA paper).
|
| "They mix a bit of both -- using homology models to get
| the fold, and fragment insertion to model the floppy
| bits."
|
| That's the best description of what I believe AF2 is
| doing, but AF2 is being marketed as not depending on
| any sequence similarity.
|
| If the CASP folks really are saying "if you have 20%
| sequence identity and use the structure from that
| alignment it's ab initio"... that's really just totally
| misleading.
|
| Of course, even ab initio methods are parameterized on
| biological information; for example, I used AMBER to do
| MD simulations and many of the force field terms were
| determined using spectroscopic data from fragments of
| biological models. That, however, is ab initio, because
| nothing even as large as a single amino acid is
| parameterized.
|
| I'm not saying there's anything wrong with homology
| modelling, or that the purist vision of ab initio is
| right. For practical purposes, exploiting subtle
| structural information through sequence alignment is a
| very nice way to save enormous amounts of computer time.
| timr wrote:
| > OK, be aware the person you're correcting has: competed
| in CASP (on a competitor team with Sali), and published
| papers with Baker on Rosetta methods (my paper is cited
| in the most recent RoseTTA paper).
|
| OK, great. Me too. I'm not saying anything controversial
| here. Right from the top of the "ab initio" tab on
| predictioncenter.org:
|
| _" Modeling proteins with no or marginal similarity to
| existing structures (ab initio, new fold, non-template or
| free modeling) is the most challenging task in tertiary
| structure prediction."_
| dekhn wrote:
| I think the more important question to resolve here is:
| did AlphaFold change anything with respect to structure
| prediction that enabled them to make accurate predictions
| in the complete absence of sequence similarity to
| proteins with known structure?
|
| My understanding is no, they did the equivalent of
| template modelling, which uses sequence/structure
| relationships (that are more subtle than the ones you get
| from homology modelling).
|
| I'm less interested in reconciling my internal mental
| model of psp with CASP's than I am in understanding whether
| AF2 is somehow able to get all the necessary structural
| constraints through coevolution of amino acid pairs,
| entirely without some (direct or indirect) learned
| relationship based on sequence similarity to known
| structures (be it even short fragments like helices).
|
| If they really did do that, and nobody did it before,
| that's great and I will happily promote the DM work, as
| it supports what I said when I did CASP: ML and MD will
| eventually win, although in a way that exploits the rich
| sequence evolutionary information we have, rather than
| predominantly by having an accurate force field and good
| sampling methods.
| dekhn wrote:
| how could they do ab initio? They depend on multiple
| sequence alignments.
|
| If I'm mistaken about this then I'll happily take back
| what I said, but there's no way that AF2 could work
| without MSAs; therefore, it is not ab initio.
|
| Ah, OK, I checked the paper again. They're working in the
| "template" category, which means there is structure-
| sequence information... maybe the CASP organizers consider
| this ab initio? The paper never mentions anything about
| ab initio predictions. Is that what you're saying, that
| template methods are ab initio?
| couteiral wrote:
| Just in case there is a confusion: there is a difference
| between available _sequences_ (~300 million in standard
| protein sequence repositories) and _structures_ (~170k
| structures in the PDB, perhaps about ~120k that are
| structurally non-redundant). A large number of CASP14
| targets have no available templates; in fact, many of
| them represented previously unseen topologies. However,
| all of them had some (in most cases, many) available
| sequences.
|
| The commonly accepted definition of homology modelling
| implies using a known structure ("template") as a
| scaffold to model the protein's topology. Since there are
| many CASP14 targets without appropriate templates,
| AlphaFold 2 simply cannot "just do homology modelling".
|
| I do take the point that the correct term is "free
| modelling" (it does not have, or does not use, any good
| structure as a template), and not "ab initio modelling"
| (it uses physics to fold the protein), though. A deep
| enough MSA is generally a requirement.
| dekhn wrote:
| Again, it's entirely possible I missed some very subtle
| point in AF2's system, but my understanding is that each
| target AF2 predicted had an underlying structural
| template covering the majority of the domain and the
| mapping was established through the MSA.
|
| IE, any MSAs would always include alignments to known
| protein structures. Are you saying their MSAs don't
| include alignments to known protein structures?
|
| (the reason I'm asking all this is because if I'm
| mistaken, then AF2 did do something "interesting", but
| everything in the paper says that everything they did is
| template based. If they are just folding proteins using
| MSAs without alignments to protein structures, that's far
| more interesting. I don't think they did that.)
|
| edit: I've now reread the paper again, and I believe
| their claim of making predictions where there is no
| structural homology is incorrect from a technical
| perspective. I've communicated this to both the CASP
| organizers (whom I know) and DeepMind.
| couteiral wrote:
| Yes: they predict structures using MSAs, without
| alignments to known protein structures in a majority of
| the cases.
| dekhn wrote:
| OK if that's truly accurate, then they did make a
| significant accomplishment. However, I'm 99% certain
| (from reading the paper) that they actually do have
| alignments to structures, but the similarity is very low.
|
| It would help if you could point to one of the
| alignments they made that has no underlying structure
| (even a template fragment) support.
|
| I reread the methods section:
| https://static-content.springer.com/esm/art%3A10.1038%2Fs415...
|
| They train jointly on the results of genetic search and
| template search. Can you show an example of a prediction
| made using only genetic search and not template search?
| Those templates are FASTAs made from PDB files, which,
| while not homology modelling, is definitely not "ab
| initio".
| IshKebab wrote:
| It's perfectly reasonable to describe a very large
| improvement as groundbreaking.
___________________________________________________________________
(page generated 2021-07-20 23:01 UTC)