[HN Gopher] AlphaFold 2 is here: what's behind the structure pre...
       ___________________________________________________________________
        
       AlphaFold 2 is here: what's behind the structure prediction miracle
        
       Author : couteiral
       Score  : 213 points
       Date   : 2021-07-20 09:15 UTC (13 hours ago)
        
 (HTM) web link (www.blopig.com)
 (TXT) w3m dump (www.blopig.com)
        
       | Quarrel wrote:
       | Awesome.
       | 
       | I wrote a thesis on protein structure prediction in 1995. We
       | weren't very good at it then. Amazing to see this.
        
         | VSerge wrote:
          | I remember that the scientific game "Foldit" was also quite
          | exciting when it came out 12-13 years ago or so, since
          | populations of players could get results beyond what either
          | specialists or computer systems could achieve. I guess one
          | could argue that an AI such as this could be compared to a
          | large automated population of trained players trying to solve
          | a 3D puzzle.
        
         | kevstev wrote:
          | I did some undergraduate research on this around 1999. At the
          | time we were trying to prove that we could throw more
          | firepower at the problem by building a Beowulf cluster. After
          | a bit of tweaking, we were able to get more performance than
          | a single machine, but soon SETI@home was released, and to me
          | at least the writing was on the wall that we were not taking
          | the most optimal approach.
         | 
          | In hindsight, though, we were so far off, from both an
          | algorithmic and a hardware perspective, from actually
          | achieving meaningful results. I am glad that, 20 years later,
          | real progress seems to be being made. I haven't really
          | followed the folding@home project in many, many years, but
          | it's not clear to me that much came out of it that was all
          | that useful, at least not in practical terms.
        
           | ac29 wrote:
           | Not sure about folding@home, but the lab that runs
           | rosetta@home released a paper earlier this month claiming
           | they have a new algorithm with comparable results to
           | AlphaFold2: https://science.sciencemag.org/content/early/2021
           | /07/19/scie...
           | 
            | I don't believe this new approach runs on their distributed
            | compute network, but it's cool to see some good competition.
        
       | Game_Ender wrote:
        | To me the most interesting part of the article is the
        | commentary on where basic research will happen in the future.
        | The fear is that if it only happens in large companies, then
        | the unbiased pool of experts society relies on will be smaller
        | and less informed. There is also the issue of nobody being
        | around for the slog of defining a field and setting up
        | databases, competitions and standards. These are what allow
        | well-funded corporate labs to apply their skills and compute
        | and blow a problem out of the water. The question is, would
        | they do the work to define an unknown problem in the first
        | place?
        
         | mrfusion wrote:
         | > unbiased pool of experts
         | 
         | Sounds like an oxymoron these days.
        
           | suetoniusp wrote:
           | Always has been
        
         | jonas21 wrote:
         | > _DeepMind claimed that they used "128 TPUv3 cores or roughly
         | equivalent to ~100-200 GPUs". Although this amount of compute
         | seems beyond the wildest dreams of most academic
         | researchers..._
         | 
         | So, we're talking like what? Maybe $100K to $300K of hardware?
         | Wet biology labs often have multiple pieces of $100K+ equipment
         | at their disposal. Why shouldn't computational labs too?
        
           | [deleted]
        
           | [deleted]
        
           | Synaesthesia wrote:
            | Yeah, but it's also putting it together and properly
            | utilizing it, which takes specialist knowledge.
        
             | tiborsaas wrote:
             | This was never really a bottleneck for science. One needs
             | to realize first that something can be done, then it will
             | be done.
             | 
              | The cost of computing will also go down in the future,
              | and for government funders this sounds like a drop in the
              | ocean when they are building multi-billion-dollar
              | particle accelerators.
        
           | extropy wrote:
            | You can rent a 32-core TPUv3 pod slice from Google Cloud at
            | $32 per hour, so 128 cores would be roughly $128 per hour.
            | $1K gives you about 8 hours of training time.
           | 
           | https://cloud.google.com/tpu/pricing#pod-pricing
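As a sanity check on that comment's figures, a back-of-envelope sketch using the quoted $32/hour-per-32-cores rate (illustrative only; not current GCP pricing):

```python
# Back-of-envelope TPU cost, using the $32/hour-per-32-cores figure
# quoted above (illustrative; not current GCP pricing).
cores = 128
rate_per_32_cores = 32.0            # USD per hour for a 32-core slice
hourly_cost = (cores / 32) * rate_per_32_cores
hours_per_1k = 1000 / hourly_cost   # training hours per $1,000

print(hourly_cost)              # 128.0 USD/hour
print(round(hours_per_1k, 1))   # ~7.8 hours, i.e. roughly 8
```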
        
       | throwamon wrote:
       | So... could anyone with experience in the area give an estimate
       | of how much the likelihood of an unstoppable, untraceable "DIY"
       | bioweapon appearing in the next decade has increased thanks to
       | this?
        
         | matt2000 wrote:
         | I'm not an expert, but in a recent article about the new mRNA
         | synthesis techniques they were asked the same question. The
         | answer was there's already lots of potential bioweapons and
         | many simpler techniques for producing them, so these new
         | technologies don't change the danger level much.
        
         | jcranmer wrote:
         | So I don't have experience in the area, but I'd give it about
         | 0%.
         | 
         | To paraphrase Derek Lowe a lot (see, e.g., https://blogs.scienc
         | emag.org/pipeline/archives/2021/03/19/ai...), there are several
         | hard problems in biology, and the kind of progress embodied in
         | AlphaFold isn't progress towards the rate-limiting problems.
         | And many of the things that make drugs hard to develop are
         | going to carry over into making bioweapons hard to develop.
        
       | TaupeRanger wrote:
       | Here we go again with the hyperbole...very tiring.
        
       | spywaregorilla wrote:
       | What are the big implications of being good at predicting protein
       | structures?
        
         | ansible wrote:
         | The goal all along has been to design proteins with a specific
         | structure.
         | 
         | This can be applied to just about any area of biology. You
         | could design novel antigens to combat disease, and then easily
         | mass-produce them. Or just inject the RNA to have the body
         | produce them.
         | 
         | But the applications are boundless, from genetically modifying
         | crops, to anti-aging, and more.
         | 
         | It is also one of the key pathways to molecular nanotechnology,
         | where instead of building arbitrary structures out of amino
         | acids, we increase the range of arbitrary molecules we can
         | design, build, and produce in quantity.
        
           | spywaregorilla wrote:
           | Is it the structure that's important? Or is the structure
           | just a way to combine certain amino acids in a stable manner
           | and its the combination of acids that we care about? Or is
           | structure just a way of saying a specific permutation of
           | amino acids?
        
             | ansible wrote:
             | The structure is the whole point. As I understand it, you
             | can link together nearly arbitrary sequences of amino
             | acids. But a random string of AAs will just result in a
             | jumbled protein that doesn't do anything useful.
             | 
             | Specific structures are useful in all manner of ways, from
             | cleaving a DNA molecule at a specific point, enzymes for
             | breaking apart molecules, etc.
             | 
             | Very, very useful.
        
               | vokep wrote:
               | >Very, very useful.
               | 
               | Just to frame it a particular way, biological systems are
               | basically solved nanotechnology, extremely good, self-
               | sustaining, resilient little machines that have spent a
               | _long_ time optimizing to be better and better. But all
               | the designs are preset, if we can crack the code and
               | design our own little machines, then amazing things like
               | more plastic-like cellulose could be made, all sorts of
               | problems are suddenly far easier to solve. But also a lot
                | of new problems emerge that weren't even imaginable
                | before, since the code being cracked is a big chunk of
                | the code of life itself. So, y'know, playing God and
                | all; there will probably be some negative consequences
                | of this too.
        
               | ansible wrote:
               | Yes, I agree with all this.
               | 
                | Generally speaking, molecular nanotechnology will solve
                | all the "intractable" problems we as a society face
                | today: climate change, poverty, biological death from
                | old age / disease / cancer, and more.
                | 
                | We could also create tools of destruction so vast that
                | they can be hard to contemplate.
        
         | TaupeRanger wrote:
          | That remains to be seen. People _hope_ it will lead to new
          | treatments for diseases of all kinds. Whether or not that
          | materializes is a big question mark.
        
         | shpongled wrote:
          | If we can accurately predict protein structures (particularly
          | multiple structures, or structures reflecting the
          | conformation in cells), then we can do a couple of things:
          | 
          | - better predict drug binding to proteins (massive benefits
          | if accurate)
          | - better understand the functional outcomes of missense
          | mutations on proteins
          | - study protein-protein interactions
          | - and in general, just gain a better understanding of biology
          | (which is driven by proteins and their
          | reactions/interactions)
        
       | wly_cdgr wrote:
       | Cool. How many years closer has this brought us to a $1 pill that
       | extends life span by 1 year? Cos let's face it that's the only
       | prize that really matters in this whole field
        
       | p1131 wrote:
       | I wonder if this is the result of us having significantly better
       | understanding of our biology or the major advancements in AI and
       | computer performance. Or both?
        
       | woliveirajr wrote:
       | Article at Nature:
       | https://www.nature.com/articles/s41586-021-03819-2
        
       | londons_explore wrote:
       | > Like most bioinformatics programs, AlphaFold 2 comes equipped
       | with a "preprocessing pipeline", which is the discipline's lingo
       | for "a Bash script that calls some other codes".
       | 
        | Having bioinformatics people stray a long way from their core
        | competency to learn a scripting language from the '80s to write
        | glue code seems... suboptimal. How many hours of expert time
        | have been wasted figuring out how to split a string in bash?
       | 
       | Can us software people build a better tool to eliminate the need
       | for this?
        
         | cabalamat wrote:
         | > How many hours of expert time has been wasted figuring out
         | how to split a string in bash?
         | 
         | Probably a good many more than would be needed to learn how to
         | split a string in Python.
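For what it's worth, the string-splitting example really is a one-liner in Python. A minimal sketch, using a made-up whitespace-delimited record for illustration:

```python
# Splitting a whitespace-delimited record: one method call in Python,
# versus juggling IFS, `cut`, or `read -a` in bash.
record = "P12345  human  7.2"   # hypothetical accession / organism / score
accession, organism, score = record.split()

print(accession)      # P12345
print(float(score))   # 7.2
```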
        
         | couteiral wrote:
         | There is a growing trend to include Docker (or Singularity,
         | which is more compatible with HPC architectures common in
         | bioinformatics) images alongside codes. In particular,
         | AlphaFold 2 does provide a Dockerfile, and they even include a
         | Python "launcher script" hiding all the details of running the
         | code.
         | 
          | Sadly, this is very uncommon in the community. In a
          | bioinformatics meeting, the sentence "I spent X days setting
          | up Y software" will not raise many eyebrows.
        
           | maliker wrote:
           | I work in power systems, and the situation is similar. Maybe
           | worse because paper authors often come up with new
           | computational techniques but don't implement them in code
           | (much less code with a dockerfile).
           | 
           | I look with much jealousy over at the computer science field
           | where papers often include code, multiple versions under
           | version control, automated tests, setup/docker scripts, and
           | demonstration workflows and interfaces.
        
         | knuthsat wrote:
         | From looking at the code, Bash looks pretty clean.
         | 
         | I also use Bash and AWK for preprocessing a lot.
        
           | laichzeit0 wrote:
            | I used to be that guy as well, till a colleague convinced
            | me that anything I could do in Bash or AWK I could probably
            | do more easily in Perl. Then everyone sort of drifted to
            | Python. I get that if you never used Perl it's pointless to
            | learn it if you're already in the Python stack, but...
            | damn, Perl's regular expressions and how they're baked into
            | the syntax of the language make using regex in Python seem
            | like going back to the Stone Age.
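To make the comparison concrete, here is a substitution in Python's re module; the Perl equivalent would be an inline s/(\w+)=/$1: /g with no import at all (the line of data is hypothetical):

```python
import re

# Perl bakes s/// into the syntax; in Python the equivalent
# substitution is an explicit call into the re module.
line = "score=0.87 model=af2"
cleaned = re.sub(r"(\w+)=", r"\1: ", line)

print(cleaned)   # score: 0.87 model: af2
```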
        
         | dfas231 wrote:
          | > Can us software people build a better tool to eliminate the
          | need for this?
          | 
          | Most probably not. Bash is currently the sweet spot. It is
          | actually the best tool for this job. Any other option comes
          | with increased complexity and will make the whole software
          | less stable.
        
         | volta83 wrote:
          | Bash is easier to explain and use than, e.g., teaching people
          | how to use Python's subprocess module to launch different
          | apps, capture their output, etc.
          | 
          | I find it astonishing how bad Python is as a bash
          | replacement.
          | 
          | I'd often rather write an argument parser in bash than use
          | Python if I have to invoke a bunch of commands.
        
           | hortense wrote:
           | Python is bad, but bash is worse as soon as you need any kind
           | of logic.
           | 
           | > explaining ppl how to use Python subprocess Module Launch
           | different apps, capture their output
           | 
           | There's no shame in using `os.system`.
        
             | maliker wrote:
              | Well, technically subprocess.check_output if you need to
              | capture output.
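For reference, a minimal sketch of the modern idiom: subprocess.run with capture_output (available since Python 3.7), which subsumes both check_call and check_output. It assumes an `echo` binary on PATH:

```python
import subprocess

# Run a command, capture stdout as text, and raise on a non-zero
# exit code; this covers both check_call and check_output use cases.
result = subprocess.run(
    ["echo", "hello"], capture_output=True, text=True, check=True
)

print(result.stdout.strip())   # hello
```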
        
         | 6gvONxR4sf7o wrote:
         | Among the bioinformatics folks I know, bash is already a core
         | competency. If you're using your average biologist as your
         | mental model, you're thinking of the wrong people.
        
         | cinntaile wrote:
          | Pretty much anyone who works with data has to clean their
          | data, and learning to use the tools to do that is important.
          | Whether that's bash, perl, R, python, ... doesn't really
          | matter that much. If they already know bash, then bash is a
          | good tool, since they can now focus on their data instead of
          | wasting time on learning a new tool to do the same thing.
        
         | culopatin wrote:
          | It's sad. Bio people really need friends in software to boost
          | their research. Unfortunately, they do a Codecademy Python
          | course for 5 minutes and try to get their projects going.
          | Sometimes they succeed, sometimes they fail. But they don't
          | really have much time to dedicate to properly learning
          | software dev, and it's not what they are into anyway; it's a
          | necessity.
          | 
          | I think we could create something like a GitHub of bio
          | projects that need help, where people assist in the hope of
          | getting their names in a paper.
        
         | ackbar03 wrote:
         | I think DeepMind went one step further and solved the entire
         | problem for them. They don't even have to touch their keyboards
         | anymore
        
         | stingraycharles wrote:
          | Sure, there are better alternatives, but the advantage of
          | bash /
         | shell scripting is that it's very easy to glue a whole
         | collection of tools together, and that expertise in this
         | transfers well between domains.
         | 
         | They probably could have achieved the same by invoking things
         | in Python, but it would have been slower and not achieved a
         | lot, other than "not using shell scripts".
         | 
         | And once you go down the path of optimizing this enough, you'll
         | end up reinventing shell scripts altogether.
        
           | sanxiyn wrote:
           | Well, AlphaFold 2 generates MSA by invoking things in Python:
           | https://github.com/deepmind/alphafold/blob/main/alphafold/da.
           | ... So the article is actually mistaken on this point.
        
         | johnnycerberus wrote:
         | I think the tools already exist just that people are
         | conservative with their choices.
        
           | ethbr0 wrote:
           | (Obligatory xkcd; you know the one)
        
             | FartyMcFarter wrote:
             | Not really?
        
               | londons_explore wrote:
               | https://xkcd.com/927/
        
           | pbronez wrote:
            | MLOps is a pretty hot area right now. The industry is
            | trying to figure out how to engineer these things in a
            | robust way, but it's not ubiquitous at all yet. There are
            | lots of tools that wrap K8s and help you train models, but
            | for doing DataOps in a robust way... I haven't seen the
            | definitive answer yet.
           | 
           | Seriously please tell me if you are founding this company so
           | I can invest.
        
       | bamboozled wrote:
       | Thank you Google...thank you!
        
         | maliker wrote:
         | Why did they open source it? Wouldn't this model be very
         | valuable to the pharma industry?
        
           | visarga wrote:
           | > Why did they open source it? Wouldn't this model be very
           | valuable to the pharma industry?
           | 
           | This is a question we should remember when we feel like
           | condemning big corporations for monopolizing AI. HuggingFace
           | lists 12,257 models in its zoo, many coming from FAANG. You
           | can start one in 3 lines of Python, or fine-tune it with a
           | little more effort.
        
           | swayson wrote:
           | If I were to speculate:
           | 
            | 1. It is in line with the vision/mission of the
            | organization: advancing science.
            | 
            | 2. It differentiates them from OpenAI, which, despite the
            | name, is not really big on open source.
        
           | summerlight wrote:
           | Because the core competency is not the model or code, but the
           | people and organization that enable this project (and perhaps
           | computing infrastructure as well?). The pharma industry will
           | try to catch up of course, but they will also likely try to
           | establish collaboration with DeepMind. This could be a good
           | first step for Google into the medical/pharma business.
        
           | alphabetting wrote:
            | As an outsider, it seems Google has a much more academia-
            | friendly culture than other megacap tech companies. I'm
            | guessing the talent this culture draws is likely more
            | adamant about their work being open-sourced.
        
           | gostsamo wrote:
            | It is written in the article. A few reasons could be
            | pressure from the publishing journal, emerging open-source
            | implementations of the same idea, and the fact that this is
            | still far from easy to commercialize.
        
             | londons_explore wrote:
             | Don't forget internal pressure.
             | 
             | A lot of people will say "Unless you opensource my work and
             | that of my colleagues, I quit".
             | 
             | When faced with all your best people threatening to quit,
             | you might just opensource that work. It turns out you still
             | have an advantage by being ~1 year ahead on applying it to
             | anything, and having all the people who know how it works
             | on your staff.
        
             | credit_guy wrote:
             | Another reason could be that whoever wants to run it will
             | very likely run it in the cloud, and there's a chance
             | they'd run it in the Google Cloud. A machine similar to the
             | one they mention on the alphafold github page (12 vCPU, 1
             | GPU, 85 GB) costs you between $1 and $4/hour.
        
       | swazzy wrote:
       | https://github.com/deepmind/alphafold
        
         | kevincox wrote:
         | Apache 2.0
         | 
         | https://github.com/deepmind/alphafold/blob/main/LICENSE
        
       | phkahler wrote:
       | How does work like this get funded? It's awesome, but it seems so
       | far removed from... let's say "profit". And there are several
       | teams competing in these things. Are there places that really
       | fund advanced work like this, or is it mostly graduate student
       | underpaid labor?
        
         | seventytwo wrote:
         | Government funding (eg. DARPA) or from large corporations that
         | have skunkworks teams (Google, IBM, Microsoft, etc.)
        
         | dekhn wrote:
         | Most of the US researchers who do CASP are funded by NIH or
         | NSF. Some are funded by private foundations, or are
         | independently wealthy. Typically, as a "principal investigator"
         | (postdoc, professor, scientist at a national lab) you write a
          | proposal saying "here's my previous work, here's the next
          | obvious step, plz give monies so I can feed the dean's fund
          | and pay for my grad students to manage my modest closet
          | cluster".
         | 
          | A group of your competitors then trashes your proposal, and
          | if you've properly massaged the right backs, you get a
          | pittance, which permits you to struggle to keep up with all
          | your promises.
        
           | infogulch wrote:
           | Yikes that sounds bleak.
        
             | cing wrote:
             | Yet, there were still 136 human teams who competed in
             | CASP14 (https://predictioncenter.org/casp14/docs.cgi?view=g
             | roupsbyna...), including DeepMind. Even if a significant
             | fraction of these projects were done piggy-backing another
             | grant, this work does receive research funding.
        
               | dekhn wrote:
               | Be fair. Many of those rows contain duplicate names
               | (identical teams), so the count is much smaller.
        
         | fswwi wrote:
          | It's just corporations burning money to show off.
         | 
         | https://venturebeat.com/2020/12/27/deepminds-big-losses-and-...
        
       | dekhn wrote:
        | So, unsurprisingly, it appears that applying a transformer to
        | multiple sequence alignments extracts somewhat more spatial
        | information about proteins than we had previously been able to
        | squeeze out.
       | 
       | It's pretty clear at this point that the work led to a large
       | improvement in psp scores, but there's literally nothing else
       | groundbreaking about it; I don't mean that in a bad way, except
       | to criticize all the breathless press about applications and
       | pharma.
        
         | sheggle wrote:
          | Just for my reference, what percentage of known but unfolded
          | proteins (a wild guess is good enough) would you consider to
          | be ab initio? How many don't have parts in any database?
        
         | [deleted]
        
         | visarga wrote:
          | It seems amazing to me what the transformer can learn, to
          | SOTA levels: not just language but also images, video, code,
          | math and proteins. Replacing so much handmade neural
          | architecture with just one thing that does it all was an
          | amazing step forward.
        
         | mda wrote:
          | Well, it did give groundbreaking results; it is weird to see
          | people dismiss it as "not groundbreaking enough".
        
           | dekhn wrote:
            | It was a nice improvement. That's fine. But it's ultimately
            | just statistical modelling based on deep evolutionary
            | information. It only works via homology modelling; it
            | doesn't actually solve the larger protein structure
            | prediction problem. Therefore it's not groundbreaking, but
            | a significant improvement.
        
             | couteiral wrote:
             | I respectfully disagree. AlphaFold 2 demonstrated almost
             | perfect performance for a multitude of proteins for which
             | no meaningful templates were available -- hence, it was not
             | doing homology modelling as it is generally understood, but
             | ab initio protein structure prediction.
             | 
                | What I would support is that AlphaFold 2 does not solve
                | the protein folding problem: how a protein folds, as
                | opposed to what it folds into.
        
               | timr wrote:
               | > it was not doing homology modelling as it is generally
               | understood, but ab initio protein structure prediction.
               | 
               | Maybe according to the _current definition_ of the term,
               | which has drifted over the years. Homology modeling and
               | "ab initio" structure prediction have been drifting
               | toward each other for a long time. These days, the
               | categories are separated by (an essentially arbitrary)
               | sequence identity threshold. If you have a protein
               | sequence with high homology to some other protein with a
               | structure, then you're homology modeling. If you have no
               | matches at all, you're doing "ab initio". In the middle,
               | you have a gray area where you can mix the approaches and
               | call it whatever you like.
               | 
               | This is not a pedantic point. If your method requires
               | homology -- however distant and fragmented -- in order to
               | work, then you're always limited to the knowledge in the
               | database. Maybe we've sampled enough of protein space to
               | get the major folds, but certainly, the databases don't
               | have enough information to get the small details right.
               | 
               | I have never been a huge believer in the idea that we can
               | go directly from protein sequence to protein structure
               | simply using a mathematical model of physics, but that is
               | the original meaning of "ab initio structure prediction",
               | and if you _could_ do it, it would be far more valuable
                | than AlphaFold. At the risk of making a trivially nerd-
                | snipable metaphor, it's kind of like the difference
               | between google translate and a theoretical model of human
               | intelligence that understands concepts and can generate
               | language. The latter is obviously immensely more capable
               | than the former.
        
               | dekhn wrote:
               | If CASP is calling methods that use any sequence
               | similarity (the grey area) 'ab initio', that's
               | disingenuous and intellectually dishonest.
               | 
               | ab initio means from nothing, and at most, you're allowed
               | to have physically inspired force fields, not sequence
               | similarity to known structures. I put a lot of effort
               | into improving the state of the art in that area, but
               | ultimately concluded it made more sense to concentrate
               | experimental structural determination in the area that
                | was most useful: in proteins that had unknown folds or
                | no
               | known homology (see https://scholar.google.com/citations?
               | view_op=view_citation&h... for some previous work I did
               | in this area).
        
               | timr wrote:
               | > If CASP is calling methods that use any sequence
               | similarity (the grey area) 'ab initio', that's
               | disingenuous and intellectually dishonest.
               | 
               | The _category_ is given the name, not the methods. People
               | can use any method they like to solve the structures. The
               | organizers are not zealots.
               | 
               | The _ab initio_ portion of CASP consists of proteins that
               | the organizers know have low sequence identity to
               | anything in the existing databases. They represent
               | proteins that are  "difficult" to solve using what any
               | practitioner might call homology modeling. That doesn't
               | mean that you can't use a method that takes into account
               | the biological databases -- and essentially all of the
               | good methods do!
               | 
               | For example, the Rosetta method has competed in both the
               | homology modeling and the ab initio categories for many
               | years. They mix a bit of both -- using homology models to
               | get the fold, and fragment insertion to model the floppy
               | bits.
               | 
               | I haven't paid close attention to CASP in a long time,
               | but I assume the competitor list still has tons of
               | entries from people who cling tightly to the purist
               | vision of ab initio modeling. They don't tend to do very
               | well.
        
               | dekhn wrote:
                | OK, be aware that the person you're correcting has
                | competed in CASP (on a competitor team with Sali) and
                | published papers with Baker on Rosetta methods (my
                | paper is cited in the most recent RoseTTA paper).
               | 
               | "They mix a bit of both -- using homology models to get
               | the fold, and fragment insertion to model the floppy
               | bits."
               | 
                | That's the best description of what I believe AF2 is
                | doing, but AF2 is being marketed as not depending on
                | any sequence similarity.
               | 
               | If the CASP folks really are saying "if you have 20%
               | sequence identity and use the structure from that
               | alignment it's ab initio"... that's really just totally
               | misleading.
               | 
               | Of course, even ab initio methods are parameterized on
               | biological information; for example, I used AMBER to do
               | MD simulations and many of the force field terms were
               | determined using spectroscopic data from fragments of
                | biological models. That, however, is ab initio, because
               | nothing even as large as a single amino acid is
               | parameterized.
               | 
               | I'm not saying there's anything wrong with homology
               | modelling, or that the purist vision of ab initio is
               | right. For practical purposes, exploiting subtle
               | structural information through sequence alignment is a
               | very nice way to save enormous amounts of computer time.
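The "20% sequence identity" threshold debated above refers to the fraction of matching residues between two aligned sequences. As a toy illustration of how that number is computed (the helper function and the two aligned fragments below are hypothetical, not real proteins or any tool used by CASP or AF2):

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over two pre-aligned sequences.
    Columns where either sequence has a gap '-' are skipped."""
    assert len(seq_a) == len(seq_b), "sequences must be pre-aligned"
    # Keep only gap-free columns of the alignment
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

# Two short, made-up aligned fragments: one mismatch (K vs H),
# one gap in each sequence -> 10 matches over 11 scored columns.
target   = "MKV-LSPADKTNV"
template = "MKVQLS-ADHTNV"
print(round(percent_identity(target, template), 1))  # -> 90.9
```

Real pipelines compute this over multiple sequence alignments produced by tools such as HMMER or HHblits, but the underlying measure is the same column-wise comparison.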
        
               | timr wrote:
               | > OK, be aware the person you're correcting has: competed
               | in CASP (on a competitor team with Sali), and published
               | papers with Baker on Rosetta methods (my paper is cited
                | in the most recent RoseTTA paper)._
               | 
               | OK, great. Me too. I'm not saying anything controversial
               | here. Right from the top of the "ab initio" tab on
               | predictioncenter.org:
               | 
               |  _" Modeling proteins with no or marginal similarity to
               | existing structures (ab initio, new fold, non-template or
               | free modeling) is the most challenging task in tertiary
               | structure prediction."_
        
               | dekhn wrote:
               | I think the more important question to resolve here is:
               | did AlphaFold change anything with respect to structure
               | prediction that enabled them to make accurate predictions
               | in the complete absence of sequence similarity to
               | proteins with known structure?
               | 
               | My understanding is no, they did the equivalent of
               | template modelling, which uses sequence/structure
               | relationships (that are more subtle than the ones you get
               | from homology modelling).
               | 
               | I'm less interested in reconciling my internal mental
                | model of PSP with CASP's, than I am in understanding if
                | AF2 is somehow able to get all the necessary structural
               | constraints through coevolution of amino acid pairs
               | entirely without some (direct or indirect) learned
               | relationship between the sequence similarity to known
               | structures (be it even short fragments like helices).
               | 
               | If they really did do that, and nobody did it before,
               | that's great and I will happily promote the DM work, as
               | it supports what I said when I did CASP: ML and MD will
               | eventually win, although in a way that exploits the rich
               | sequence evolutionary information we have, rather than
               | predominantly by having an accurate force field and good
                | sampling methods.
        
               | dekhn wrote:
               | how could they do ab initio? They depend on multiple
               | sequence alignments.
               | 
               | If I'm mistaken about this then I'll happily take back
               | what I said, but there's no way that AF2 could work
                | without MSAs; therefore, it is not ab initio.
               | 
                | Ah, OK, I checked the paper again. They're working in
                | the "template" category, which means there is structure-
                | sequence information... maybe the CASP organizers
                | consider this ab initio? The paper never mentions
                | anything about ab initio predictions. Is that what
                | you're saying, that template methods are ab initio?
        
               | couteiral wrote:
                | Just in case there is any confusion: there is a
                | difference
               | between available _sequences_ (~300 million in standard
               | protein sequence repositories) and _structures_ (~170k
               | structures in the PDB, perhaps about ~120k that are
               | structurally non-redundant). A large amount of CASP14
               | targets have no available templates; in fact, many of
               | them represented previously unseen topologies. However,
               | all of them had some (in most cases, many) available
               | sequences.
               | 
               | The commonly accepted definition of homology modelling
               | implies using a known structure ("template") as a
               | scaffold to model the protein's topology. Since there are
               | many CASP14 targets without appropriate templates,
               | AlphaFold 2 simply cannot "just do homology modelling".
               | 
               | I do take the point that the correct term is "free
               | modelling" (it does not have, or does not use, any good
               | structure as a template), and not "ab initio modelling"
               | (it uses physics to fold the protein), though. A deep
               | enough MSA is generally a requirement.
        
               | dekhn wrote:
               | Again, it's entirely possible I missed some very subtle
               | point in AF2's system, but my understanding is that each
               | target AF2 predicted had an underlying structural
               | template covering the majority of the domain and the
               | mapping was established through the MSA.
               | 
                | I.e., any MSAs would always include alignments to known
               | protein structures. Are you saying their MSAs don't
               | include alignments to known protein structures?
               | 
               | (the reason I'm asking all this is because if I'm
               | mistaken, then AF2 did do something "interesting", but
               | everything in the paper says that everything they did is
               | template based. If they are just folding proteins using
               | MSAs without alignments to protein structures, that's far
                | more interesting. I don't think they did that.)
               | 
               | edit: I've now reread the paper again, and I believe
               | their claim of making predictions where there is no
               | structural homology is incorrect from a technical
               | perspective. I've communicated this to both the CASP
               | organizers (whom I know) and DeepMind.
        
               | couteiral wrote:
               | Yes: they predict structures using MSAs, without
               | alignments to known protein structures in a majority of
               | the cases.
        
               | dekhn wrote:
               | OK if that's truly accurate, then they did make a
               | significant accomplishment. However, I'm 99% certain
               | (from reading the paper) that they actually do have
               | alignments to structures, but the similarity is very low.
               | 
                | It would help if you could point to one of the
                | alignments they made that has no underlying structure
               | (even a template fragment) support.
               | 
               | I reread the methods section, https://static-
               | content.springer.com/esm/art%3A10.1038%2Fs415...
               | 
                | They train jointly on the results of genetic search and
                | template search. Can you show an example of a prediction
                | made using only genetic search and not template search?
                | Those templates are FASTAs made from PDB files, which,
                | while not homology modelling, is definitely not "ab
                | initio".
        
             | IshKebab wrote:
             | It's perfectly reasonable to describe a very large
             | improvement as groundbreaking.
        
       ___________________________________________________________________
       (page generated 2021-07-20 23:01 UTC)