[HN Gopher] What DeepMind's AlphaFold 2 really achieved (2020)
       ___________________________________________________________________
        
       What DeepMind's AlphaFold 2 really achieved (2020)
        
       Author : apsec112
       Score  : 209 points
       Date   : 2021-07-10 21:50 UTC (2 days ago)
        
 (HTM) web link (www.blopig.com)
 (TXT) w3m dump (www.blopig.com)
        
       | Synaesthesia wrote:
       | This looks like a tremendous breakthrough in this domain, very
        | impressive. I was similarly impressed by their AlphaStar AI
        | agent, which could play StarCraft 2 at a pro level (this is
        | actually a very difficult problem to solve).
        | 
        | I'm also disappointed that, as with that effort, the methods
        | and techniques will not be shared with the scientific
        | community.
        
         | inasio wrote:
         | This is definitely not a foregone conclusion, with AlphaFold 1
         | they did release a lot of information about it [0]. The article
          | only says that Google/DeepMind is waiting until they publish
         | the paper, and in fact Demis Hassabis recently tweeted that
         | they plan to open source and provide broad access to it [1]
         | 
         | [0] https://deepmind.com/research/open-source/alphafold_casp13
         | 
         | [1]
         | https://twitter.com/demishassabis/status/1405922961710854144
        
       | hortense wrote:
       | One thing that DeepMind demonstrated was that sometimes one well
       | funded team is much better than 50 poorly funded teams.
        
         | Gatsky wrote:
         | Yes, this is the key takeaway. It is really a blow to academia
         | that a private company could be so much better than them. It
         | clearly demonstrates, to my mind, that academia is a poor
         | engine for progress and getting worse. This is due to
         | structural and sociological pathologies which there seems to be
         | little appetite to mitigate.
         | 
         | I say this as an academic, of course.
        
           | jcfrei wrote:
           | It always depends. In lots of fields it's just a fact that
           | all the exciting research happens in private companies and
           | then for others it's reversed. Private companies can do
            | research well when near- or mid-term commercialization is
           | possible. Otherwise it's up to the public institutions.
        
           | kaba0 wrote:
           | I think it's unfair to generalize it to all areas of
           | academia.
        
           | Salgat wrote:
            | The biggest issue with academia is its hyper focus on
           | pumping out as many papers as cheaply and quickly as
           | possible. Big ambitious projects are much less efficient at
           | pulling this off.
        
             | stevenbedrick wrote:
              | That "hyper focus" stems from the "structural and
              | sociological pathologies" that the grandparent posted
              | about. Change the funding model and the rest will follow.
        
           | an_opabinia wrote:
           | But weren't all those Google employees trained in the
           | academy? Wasn't this competition organized and designed by
           | people in the academy? Who defined the goal, who laid not
           | just the foundation but built the whole town? It's clearly a
           | positive collaboration.
           | 
           | In any case, left to their own devices, corporate R&D teams
           | wouldn't be able to define goals that work for their
           | business. Like without the competition and goals being
           | defined for them, DeepMind would be having meetings with
            | brand managers about the avant-garde of ad tracking.
        
             | Gatsky wrote:
             | I did not say that we should burn down the Universities,
             | only that they have gone astray. I think this is actually
             | not a very controversial comment. Every academic I know is
             | deeply unhappy, even the ones who are really doing as well
             | as one can. This is a generalisation.
        
             | deeviant wrote:
             | > But weren't all those Google employees trained in the
             | academy? Wasn't this competition organized and designed by
             | people in the academy?
             | 
              | Getting an education (even an advanced one) is a completely
              | separate thing from entering academia, and I suspect you
             | know this.
        
           | folli wrote:
           | My (admittedly biology/pharma-centric) point of view is a bit
           | less fatalistic:
           | 
           | Private companies are much more efficient in reaching a well-
           | defined goal.
           | 
           | Academia is much more efficient in reaching ill-defined
           | goals.
           | 
           | The thing is that the majority of goals for basic science are
           | very ill-defined and virtually all breakthroughs are
            | serendipitous (ranging from antibiotics to, more recently,
            | CRISPR-Cas). So I don't think it makes sense to advocate for
           | one vs the other.
        
           | nopasswrdmngr wrote:
           | Do you share this perspective with prospective graduate
           | students?
        
         | londons_explore wrote:
          | Most of DeepMind's additional funding goes into paying higher
          | salaries.
          | 
          | Those higher salaries don't produce better research results -
          | they merely serve to move the most prolific researchers from
          | other institutions to them...
         | 
         | Arguably this extra funding isn't leading to many new
         | discoveries, but just shifting where discoveries are made.
        
           | nerdponx wrote:
           | > Arguably this extra funding isn't leading to many new
           | discoveries, but just shifting where discoveries are made.
           | 
           | Concentrating all these prolific researchers in one place,
           | removing the publish-or-perish incentive, and giving them
           | access to unlimited data and computing power.
           | 
           | Seems like that could make a difference.
        
           | solveit wrote:
           | The additional funding raises the market price of
           | researchers. That nudges the market to produce more
           | researchers. The marginal quant became an AI researcher
           | because people respond to incentives[1]. This leads to more
           | new discoveries[2].
           | 
            | [1] Standard caveats apply and the point stands.
            | 
            | [2] Standard caveats apply and the point stands.
        
             | alecst wrote:
             | I like how you added "and the point stands" to your own
             | comment.
        
               | solveit wrote:
               | Lol yes, you can tell I'm just so _done_ talking about
               | anything vaguely statistical to people who spend half
               | their working lives thinking about edge cases.
        
         | londons_explore wrote:
         | Or that lots of compute power is more effective than decades of
         | expertise in solving a problem...
        
           | MattRix wrote:
           | It's quite naive to assume compute power is the main reason
           | for deepmind's success.
           | 
           | Any bit of research into this (and most of their successes in
           | other fields) will show otherwise.
        
             | Diggsey wrote:
             | The fact that they've succeeded in so many different fields
             | implies that their success is due to a combination of
             | compute power & expertise in harnessing that compute power,
             | rather than expertise in all the different fields they have
             | applied it to.
        
             | joe_the_user wrote:
             | _Any bit of research into this (and most of their successes
             | in other fields) will show otherwise._
             | 
             | Or you could supply references and/or an argument. The "if
             | you researched this you'd agree with my claims" approach is
             | pretty pernicious.
        
         | nopasswrdmngr wrote:
          | Well, sometimes it is, sometimes it isn't. The question is
          | what the odds are. I don't think DeepMind has the answer to
          | this question.
        
         | [deleted]
        
       | londons_explore wrote:
       | So I guess the next scientific milestone becomes doing the
        | inverse of this challenge...
       | 
       | Ie. Given a structure that you'd like a protein to have, develop
       | a sequence for it.
       | 
       | If we could do that easily, we could start making molecular
       | machines for all kinds of tasks. Rather than co-opting enzymes
       | from nature, we could design our own.
       | 
       | So many industries could benefit from that, even if you exclude
       | all the biomedical applications where such an approach might be
       | considered too high risk. We could for example begin with
        | dishwashing tablets which actually get burnt-on stuff off...
        
         | hencoappel wrote:
         | I mean, the simplest solution to the reverse problem is
         | generating random sequences and then predicting their structure
         | to see if they fit the desired structure.
        
           | tryptophan wrote:
            | A protein just 10 residues long has 21^10 possible
            | sequences. That is already a hard number to guess and
            | check.
           | 
           | If you wanted to make a more reasonable length protein, of
           | say 100-200, you would run out of atoms in the universe to do
           | computations with.
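The combinatorics in the comment above can be checked in a few lines of Python (a quick sketch: the 21-letter residue alphabet follows the parent comment, and 10^80 is the usual order-of-magnitude estimate for atoms in the observable universe):

```python
# Size of the brute-force search space for protein design,
# using the parent comment's 21-letter residue alphabet.
ALPHABET = 21
ATOMS_IN_UNIVERSE = 10 ** 80  # common order-of-magnitude estimate

def sequence_space(length: int) -> int:
    """Number of distinct sequences of a given length."""
    return ALPHABET ** length

print(f"length 10:  {sequence_space(10):.3e}")   # ~1.7e13: big but enumerable
print(f"length 100: {sequence_space(100):.3e}")  # ~1.7e132
print(sequence_space(100) > ATOMS_IN_UNIVERSE)   # True
```

Even a modest 100-residue protein blows past any conceivable compute budget, which is why pure guess-and-check is a non-starter.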
        
         | siver_john wrote:
         | So the Baker Lab out of Seattle has actually been working on
          | that exact problem for a while now. Their suite of programs
          | for doing this type of work is called Rosetta, and I know
          | they have generated at least one protein from scratch.
        
           | titoCA321 wrote:
            | They have, and so has the Folding@Home team at Washington
            | University, although Folding@Home is terribly inefficient
            | in the way it approaches the problem. I know of Rosetta but
            | have never worked on or used it, so I can't comment on its
            | efficiency.
        
       | wokwokwok wrote:
       | This is from late 2020... weirdly, nothing seems to have come of
       | it.
       | 
       | Is there an update since then? Have they actually done anything
       | useful with it?
        
         | drbw wrote:
         | As the article says
         | 
         | > The details of how AlphaFold 2 works are still unknown, and
         | we may not have full access to them until their paper is peer-
         | reviewed (which may take more than a year, based on their
         | CASP13 paper).
         | 
         | So it's not particularly surprising that we haven't heard much
         | yet.
        
         | stupidcar wrote:
         | According to the AlphaFold Wikipedia article:
         | 
         | > As of 18 June 2021, according to DeepMind's CEO Demis
         | Hassabis a full methods paper to describe AlphaFold 2 had been
         | written up and was undergoing peer review prior to publication,
         | which would be accompanied by open source code and "broad free
         | access to AlphaFold for the scientific community"
        
           | clavigne wrote:
            | which is basically an admission that they will put it
            | behind an API, not release the models... again.
           | 
           | The original version on github can only compute the specific
           | systems in the paper.
           | 
            | https://github.com/deepmind/deepmind-research/tree/master/al...
           | 
           | I don't know why scientific publications keep doing PR work
           | for them.
        
             | AnotherTechie wrote:
             | if cryptography is a weapon, isn't folding proteins also
             | arguably a weapon?
        
               | TenToedTony wrote:
               | Yes, but only because if cryptography is a weapon then
               | everything is a weapon.
        
       | marsven_422 wrote:
       | Let's ML up some RNA and inject it! Sounds like a fantastic idea.
        
       | leadingthenet wrote:
       | In case there's someone else like me who could use an
       | introductory video on the topic, Sabine Hossenfelder has recently
       | made one: https://youtu.be/yhJWAdZl-Ck
       | 
       | It includes some commentary on this discovery, as well.
        
         | qwertox wrote:
         | https://en.wikipedia.org/wiki/Sabine_Hossenfelder
        
         | herodoturtle wrote:
         | That was a great intro, thanks.
        
       | MauranKilom wrote:
       | > Consider, for example, the possibility that Alphabet decides to
       | commercially exploit AlphaFold, for example -- is it reasonable
       | that they make profit off such a large body of research paid
       | almost exclusively by the taxpayers? To what extent is the
       | information created by publicly available research -- made
       | public, mind you, to stimulate further public research -- belong
       | to the public, and under what conditions could it be used in for-
       | profit initiatives?
       | 
       | Maybe I have a wrong conception of what research is supposed to
       | achieve, but commercializing new insights is _absolutely_ one of
       | the intended outcomes. One would sure hope that taxpayer money
        | isn't funneled into research to... just enable more research. At
       | some point the public should tangibly benefit from it, which is
       | not achieved by writing more papers.
       | 
       | This all notwithstanding the fact that DeepMind intends to make
       | AlphaFold open source and available to the community.
        
         | travisgriggs wrote:
         | You used public benefits and commercialization in the same
         | paragraph.
         | 
         | While that kind of semi symbiotic relationship can (and has
          | been observed to) exist, it does so best in an environment
         | that looks different than what is described here (few large
         | near monopolies, legislative regulations that are best
         | navigated using wealth, a market that has inelastic bargaining
         | qualities).
        
           | unishark wrote:
           | But the only way the monopoly on technology can make money is
           | by sharing the benefits. The point of technology is to make
           | the production of goods and services more efficient; it's not
           | a scarce resource in itself. If a technology is not
           | commercialized then this efficiency gain is not achieved and
           | benefits no one. If someone commercializes it and monopolizes
           | it but charges too high a price, people wouldn't buy it
           | anyway, since they can always use older technology, and the
           | monopoly also earns nothing. If transactions occur, it means
           | both buyer and seller feel they are getting a benefit.
        
         | tehjoker wrote:
         | The government, especially since WW2, has increasingly designed
          | its operations to subsidize something for the public and then
          | allow private operators to extract whatever wealth they can
          | from it, regardless of the costs to the public.
         | 
         | For example, research is paid for by the public, but then the
         | products that affect people are completely captured by
         | monopolists and spooned out in such a way to make sure only the
         | monied sections of the population get them until the public
         | protests enough to create a program like medicaid.
         | 
         | If we paid most of the cost, we should get most of the benefit.
         | The monopolists should be happy to make any money at all, not
          | their superprofits. Fair, right?
        
           | jjtheblunt wrote:
           | did you say "monopolists" when meaning "capitalists"?
        
         | gumby wrote:
         | > but commercializing new insights is absolutely one of the
         | intended outcomes.
         | 
         | This is a recent idea, dating back to the late 70s and
         | implemented through the 1980 Bayh-Dole Act. Before that
         | research was research; development (and private research of
         | course) was the province of business.
         | 
          | The Gradgrind mentality that all research must be
          | commercialized has impoverished basic research; of what use
          | is looking for gravitational waves or new branches of
          | mathematics? (Just look at the contortions university press
          | offices go through to justify some new paper on quantum
          | mechanics.)
         | 
         | Speaking of which, QM is a perfect example of something that
         | would have advanced very slowly had this attitude existed 150
         | years ago...yet it is at the heart of the semiconductor
         | revolution!
        
           | unishark wrote:
           | >This is a recent idea, dating back to the late 70s and
           | implemented through the 1980 Bayh-Dole Act. Before that
           | research was research; development (and private research of
           | course) was the province of business.
           | 
           | I don't think federal funding of research is that much older
           | in the US, only really starting in the 50s apart from
           | military research. How exactly were the early QM researchers
           | funded anyway? (apart from Einstein's famous day job at the
           | patent office). I know at least a few of them had fellowships
           | at universities, meaning rich benefactors.
        
             | gumby wrote:
             | > I don't think federal funding of research is that much
             | older in the US, only really starting in the 50s apart from
             | military research.
             | 
             | US government support for university research dates back to
             | patent holder Abraham Lincoln who even in the middle of a
             | war got legislation passed to support land grant (mostly
             | ag) colleges and universities (and of which MIT was one of
             | the very early beneficiaries). However it was small and you
             | are right that in WWII the model of the US modern research
             | university was explicitly created by James Conant, with MIT
             | again being the largest beneficiary (note that all tuition
             | and student expenses are about 14% of MIT's revenue and 16%
             | of expenditures, and the number of staff is greater than
             | that of the student body -- it's a huge government research
             | lab with a small school attached).
             | 
             | The problem with this model is that unless you are MIT
             | (/Stanford/Harvard/Cornell/CMU et al -- maybe 25
              | institutions, if that) licensing revenue _matters_, and
             | affects who gets tenure, departmental budgets etc.
             | 
             | > How exactly were the early QM researchers funded anyway?
             | (apart from Einstein's famous day job at the patent
             | office). I know at least a few of them had fellowships at
             | universities, meaning rich benefactors.
             | 
              | In Europe, in the 20th century funding came primarily
              | from governments (and benefactors, more so early in the
              | century), under varying institutions (the big research
              | institutions in Imperial and post-WWI Germany,
              | "Institutes" in France, Oxbridge in the UK, etc.). In the
              | USA it was the institutions themselves, some benefactors
              | and, as I said, some government funding (like Fermi and
              | Lawrence).
        
           | tzs wrote:
           | Isn't Bayh-Dole about letting people who do research with the
           | government, for the government, or paid for with government
           | grants own the resulting IP such as patents, so that they
           | could commercialize it?
           | 
           | If so, I don't think that really applies to what the article
           | is talking about. The article is talking about Alphabet
           | potentially using large amounts of data from other
           | researchers, mostly academic, who were funded by the
           | government and commercializing it. That's more akin to how it
           | was before Bayh-Dole: a private company taking government
           | funded research they were not involved in, adding their own
           | privately funded research, and making something commercial.
        
             | gumby wrote:
             | > If so, I don't think that really applies to what the
             | article is talking about.
             | 
             | My comment was in reply to this comment by MauranKilom:
             | 
             | > > Maybe I have a wrong conception of what research is
             | supposed to achieve, but commercializing new insights is
             | absolutely one of the intended outcomes. One would sure
             | hope that taxpayer money isn't funneled into research to...
             | just enable more research.
             | 
             | And further on your comment:
             | 
          | > The article is talking about Alphabet [commercializing
          | results from public datasets without needing to pay for
          | them]. That's more akin to how it was before Bayh-Dole.
             | 
             | Indeed, pre Bayh-Dole, publicly funded research was public
             | (consider it public domain, or at least "MIT licensed") and
             | anyone could use it.
             | 
             | Now everything has to be licensed from university licensing
             | departments, typically with an expensive exclusive. Which
             | has had a distorting effect on research, not merely
             | restricting use (have you ever tried to work with a
             | university licensing office? They consider even the most
             | trivial results to be Nobel prize class) but, because they
             | are a source of revenue, bending resource allocation,
             | tenure, etc much as sports teams do for the schools that
             | have them.
        
             | dekhn wrote:
             | B-D allows the universities who house the principal
             | investigators who conduct government-sponsored research to
             | commercialize their inventions.
             | 
             | This means, for example, that David Baker licenses Rosetta
             | for free to academic and government, but commercial users
             | have to obtain a paid commercial license. Baker (his lab,
              | or LLC, or whoever vends Rosetta Commercial) benefits
              | monetarily from all the data that Rosetta includes, which
             | is decades of structural biology funded by NIH and others.
        
         | klapatsibalo wrote:
         | "Make a profit" != commercializing, I would say.
        
         | teorema wrote:
          | Let's say we're talking about VCs and shareholders. Shouldn't
         | the public enjoy the same expectations? Especially when we're
         | just talking about a zero percent payback?
         | 
         | I think there's a legitimate argument taxes exist for this sort
         | of thing, but (1) taxes arguably are avoided in various ways to
         | the point it's a currently broken system, and (2) this is a
         | rare case where the government has a clear case for a specific
         | amount of money owed by a specific company -- why not keep it
         | simple?
         | 
         | If the grants aren't worth paying back at zero percent, the
         | corporation shouldn't be taking them.
        
           | bitcurious wrote:
           | >Shouldn't the public enjoy the same expectations?
           | 
           | The public absolutely should have some ROI, and in fact does
           | in the form of taxes.
        
             | Swenrekcah wrote:
             | So many problems could be solved if corporations only paid
             | their taxes without all the avoidance and/or evasion
             | gymnastics.
        
         | hanselot wrote:
          | I would expand this reasoning to cover all self-driving
          | vehicles. Is it not in the public interest that the datasets
          | required to ascertain whether self-driving vehicles are
          | "safe" be public resources, which could be used to create
          | open source testing kits for these vehicles before putting
          | them on real roads? Why let first-to-market be the metric
          | that determines what a human life is worth? Should not every
          | effort be made to guarantee that every one of these vehicles
          | is equally "safe"?
        
         | heavyarms wrote:
         | I highlighted almost the exact quote you have here and it's
         | nice to see it at the top of the discussion.
         | 
         | I agree with your sentiment, but I also think it's worth
         | thinking carefully about two of the main points that stuck out
         | to me:
         | 
         | - Access to compute for large models
         | 
         | - Access to large datasets (in this case mostly taxpayer funded
         | academic research)
         | 
         | Every company and/or research group has access to the data, but
         | some have a huge advantage in terms of compute. If there's a
         | question about commercializing research, the scales are tilted
         | toward those with more compute.
         | 
         | In this specific case, I think the intention to make AlphaFold
         | open source and available to the community is obviously the
         | best solution. But my question is, what happens if a less
         | altruistic for-profit entity uses its huge compute advantage to
         | develop new techniques and insights, and then patents
         | everything before it becomes available to the community?
         | 
         | I understand that is the basic mechanism for how
         | medical/pharmaceutical research gets translated into life-
         | saving treatments, but if we're approaching a generalized model
         | that can pump out "patent-worthy" discoveries only bound by the
         | amount of data and access to compute, there's an obvious
         | opportunity for a winner-take-most scenario.
        
         | dogma1138 wrote:
         | This can be applied to anything, Google couldn't have been
         | founded without decades of public research into computer
         | science that was itself built on thousands of years of human
         | knowledge.
         | 
         | Everything we do is built on top of what came before.
        
           | andrepd wrote:
           | This is indeed the argument of e.g. Anarchists such as
           | Kropotkin, or of Georgism.
        
         | andrepd wrote:
         | >At some point the public should tangibly benefit from it.
         | 
         | Yes indeed. The public. Not capital, not private concerns, but
         | the public.
        
         | axiosgunnar wrote:
         | I suppose the grants should then be paid back over time with
         | the money made?
         | 
          | Perhaps with some interest, since grants are high risk (many
          | grants fail).
        
           | tzs wrote:
           | The grants were to various academic researchers who
           | researched, published, and did not commercialize their
           | discoveries.
           | 
           | The money will be made by private companies that have no
           | connection to the researchers who received the grants, but
           | simply use the published research in something they build.
           | 
           | It's hard to see a good way to build a system to make the
           | private companies pay back the grants. It would be an
           | accounting and tracking nightmare to try to figure out how
           | much money is actually being made from the research that any
           | given grant paid for.
        
           | jahnu wrote:
           | I think that tax should cover that. Of course that raises the
           | current problems with international firms and taxation.
        
             | axiosgunnar wrote:
                | Why do I have to pay the same tax rate as someone who
                | literally got taxpayer money injected into his budget,
                | while I have to use surplus profit from the past?
        
               | xwolfi wrote:
                | What you're using to type this message was made possible
                | by research spending.
                | 
                | Or we could do like before: let the church help the
                | poor, the nobles take decisions and the peasants make
                | the food. That way, everyone has their clear and simple
                | role and you won't complain about taxes: you'll have no
                | revenue :)
        
               | elcomet wrote:
                | Because you profit from those inventions? You might be
                | saved by a drug discovered with AlphaFold.
        
               | KaoruAoiShiho wrote:
                | Basically what you're saying is governments should never
                | give grants, only loans.
        
               | simondotau wrote:
               | Loans that only come due upon breakout success, more
               | precisely.
        
               | elcomet wrote:
                | Maybe the government should invest in companies instead of
               | giving grants. So if the company fails, it is money lost
               | like a grant, but if there is success, then the
               | government can get its money back.
        
               | KaoruAoiShiho wrote:
               | Then you would have state capitalism, and the government
               | might be biased towards state owned enterprises, ruining
               | the free market.
        
               | friedman23 wrote:
               | This comment is absurd. You think people should be paying
               | you for using humanity's past knowledge (which you had no
               | part in creating) to advance technology and society?
        
               | andrepd wrote:
               | The argument is that Humanity's past knowledge and labour
               | is a common heritage of everyone. Anybody that benefits
               | from it must, at least in part, pay back "the commons"
               | for that benefit.
        
               | friedman23 wrote:
               | So you get to leech off the greatest minds in the present
               | while they are living and again after they are dead?
               | 
               | So when people build off humanity's past knowledge and
               | they pay for the privilege I assume the new knowledge
               | that is created does not belong to humanity any longer
               | and belongs to individuals?
        
           | hprotagonist wrote:
           | "we'll fund you, you keep the IP" is not one but two grant
           | structures! at least!
           | 
           | https://sbir.nih.gov/about/what-is-sbir-sttr
           | 
           | neither were at play here but the idea is pretty darn normal.
        
           | robbiep wrote:
           | Often (as in this instance) big breakthroughs are multiple
           | steps downstream from the initial grants or come from a
           | collection of research.
           | 
           | There's a reason for the saying 'standing on the shoulders of
           | giants'
        
       | ma2rten wrote:
       | (2020)
        
       | voiper1 wrote:
       | Needs (2020)
        
       | foxes wrote:
       | So do you think it actually understands something about the
       | structure of the protein folding problem? It somehow detected
       | something about the physics, topology, the hard optimisation
       | problem, and that it knows something about the geometry of that
       | potential surface and can exploit that?
       | 
       | Or is it just such a huge model it basically encodes an entire
       | database after weeks and weeks of computation and has a more
       | compressed form?
        
         | twanvl wrote:
         | An important input to this (and similar) algorithms is multiple
         | sequence alignment, which tells the algorithm which parts of
         | proteins are preserved between species and variants, and which
         | amino-acids mutate together. So already it is relying on
         | natural selection to do some of the work. And the algorithm
         | will probably not work very well if you input a random sequence
         | not found in nature and ask it to find the folding.
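A minimal sketch of the conservation signal an MSA provides (the toy alignment, sequences, and scoring are made up for illustration and are not AlphaFold's actual featurisation):

```python
from collections import Counter

# Toy multiple sequence alignment: rows are homologous sequences,
# columns are aligned positions (sequences are invented for illustration).
msa = [
    "MKVLA",
    "MKVLG",
    "MRVLA",
    "MKILA",
]

def column_conservation(alignment):
    """Fraction of sequences sharing the most common residue per column."""
    n = len(alignment)
    return [Counter(col).most_common(1)[0][1] / n for col in zip(*alignment)]

print(column_conservation(msa))  # [1.0, 0.75, 0.75, 1.0, 0.75]
```

Columns that stay near 1.0 across many species are candidates for structurally or functionally constrained positions; real pipelines additionally look at pairs of columns that mutate together.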
        
         | dekhn wrote:
         | it's pretty clear what it does. It uses the evolutionary
         | information expressed in multiple sequence alignments to make
         | reasonable judgements about interatomic distances, which are
         | used as constraints for a force field. We've been doing
         | variations on this for decades. The evolutionary information
         | encoded in multiple sequence alignments is pretty much all you
         | need to fold homologous proteins (apparently). No, this
         | technique doesn't do anything about the harder problems of
         | actually understanding the raw physics of protein folding (nor,
         | does it seem, that we need that to solve downstream problems).
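A toy illustration of the "predicted distances as constraints" idea described above, using harmonic restraints; all coordinates, distances, and the force constant are illustrative, not DeepMind's actual potential:

```python
import math

# Hypothetical predicted inter-residue distances (in angstroms) used as
# harmonic restraints: deviating from a predicted distance costs energy.
predicted = {(0, 3): 3.0, (1, 4): 6.5}

def restraint_energy(coords, predicted, k=1.0):
    """Sum of k * (d_actual - d_predicted)^2 over all restrained pairs."""
    total = 0.0
    for (i, j), d_pred in predicted.items():
        d = math.dist(coords[i], coords[j])
        total += k * (d - d_pred) ** 2
    return total

# Five "residues" on a line; pair (0, 3) matches its prediction exactly,
# pair (1, 4) is 1.5 A short of it, contributing 1.5^2 = 2.25.
coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (6, 0, 0)]
print(restraint_energy(coords, predicted))  # 2.25
```

Minimising this term alongside a physical force field pulls the chain toward conformations consistent with the predicted distance matrix.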
        
         | wiz21c wrote:
         | My (basic) understanding is that 1/ there's some inductive bias
         | (knowledge from researchers) 2/ data is definitely "compressed"
         | in some ways 3/ since the model predicts better than the
         | others, then it actually found some relationships in the data
         | that were not found before.
         | 
         | From what I understand, deep learning, although opaque and
         | relying on tons of data, is a bit magical : although one would
         | say "it's just probabilities", it actually does probabilities
         | at a level where it actually figures some things.
         | 
         | Plus, and that's very much a problem to me, Google does it at
         | 100x the scale of a regular researcher. Since I just invested a
          | year in studying data science, that worries me a lot: where
          | am I supposed to work if, to produce meaningful results, you
          | need way-too-expensive hardware...
        
           | thomasahle wrote:
           | > where am I suppose to work if, to produce meaningful
           | results, you need way-too-expensive hardware...
           | 
           | I know how you feel, but I also think stories like this may
           | be a wake-up call for some groups to invest more in hardware.
           | The Baker and Tsinghua University groups are not small. They
           | can afford more than 4 GPUs.
           | 
           | Probably it's more about setting up a good pipeline. Once you
            | get beyond about 4 GPUs you need more than one machine to run
            | them.
           | Hopefully in the next years we'll see more open source tools
           | to make it easy to "build your own GPU cloud".
        
             | dumb1224 wrote:
             | And to use all that GPU power exclusively you need some
             | good strategy too. Speaking from a medium size biology
             | research centre.
        
               | thomasahle wrote:
               | True. It may be that a smaller research center will only
               | rarely be able to saturate the cluster. Maybe it doesn't
               | matter, like how other lab equipment is not in constant
               | use. Another option may be for centers to team up and
               | have shared machines? Or maybe compute as a service will
               | eventually be cheap enough for this not to matter...
        
               | dumb1224 wrote:
               | Currently only the teams doing heavy deep learning get
               | special exclusive queues on the GPU nodes of the cluster.
               | If many users want to use the GPUs at the same time it
               | might need some planning. I don't know if it is a solved
               | problem in the HPC field though.
        
           | sgt101 wrote:
           | >if, to produce meaningful results, you need way-too-
           | expensive hardware...
           | 
           | If you are in a team that is looking at problems that justify
           | massive hardware (in the sense that solving them will pay
           | back the capital and environmental cost) then you will have
           | access to said hardware.
           | 
           | Most (almost all) AI and Data Science teams are not working
           | on that kind of problem though, and it's often the case that
            | we are working on cloud infrastructures where GPUs and TPUs
            | can be accessed on demand and $100s can be used to train
           | pretty good models. Obviously models need to be trained many
           | times so the crossover point from a few $100 to a few $1000
           | can be painful - but actually many/most engagements really
           | only need models that cost <$100 to train.
           | 
           | Also many of the interesting problems that are out there can
           | utilize transfer learning over shared large pretrained models
           | such as Resnet or GPT-2 (I know that in the dizzying paced
           | modern world these no longer count as large or modern but
           | they are examples...) So for images and natural language
           | problems we can get round the intractable demand for
           | staggeringly expensive compute.
           | 
           | Imagine that you had got a degree in Aeronautical
           | Engineering, you are watching the Apollo program and
           | wondering how you will get a job at NASA or something
           | similar... but there are lots of jobs at Boeing and Lockheed
           | and Cessna and so on.
        
           | touisteur wrote:
           | Most big labs have large computing clusters and upgrade them
           | from time to time. We've almost always needed huge computing
           | power in the scientific domain, no?
           | 
           | Anyway these days I see a lot of industrial investment in
           | middle size computing datacenters full of GPUs. Sure Google
           | scale is not within reach but I'm sure there's room for
           | scrapier algorithms and training methods to demonstrate
           | feasibility and then paying a lump sum to AWS (or some ml-
           | expert-for-cloud-training SME) for the final step.
           | 
           | Anyway, I thought the expensive part was data acquisition and
           | labeling? I like the 'surrogate network' approach of learning
           | a very-expensive-to-compute simulation or model, that doesn't
           | need data, but the output of a costly simulation.
        
         | shawnz wrote:
         | What's the difference between "knowing things" and finding a
         | more compressed form of the solution space?
        
           | hans1729 wrote:
            | cue Joscha Bach:
           | 
           | >Scientific logic is proving things by losslessly compressing
           | statements to their axioms. Commonsense logic uses lossy
           | compression, which makes it less accurate in edge cases, but
           | also less brittle, more efficient, further reaching and more
           | stable in most real-world situations.
        
           | garmaine wrote:
           | Being able to extrapolate beyond mere variations of the
           | training data.
           | 
           | EDIT: A simpler example might be helpful. We could, for
           | example, train a network to recognize and predict orbital
           | trajectories. Feed it either raw images or processed
           | position-and-magnitude readings, and it outputs predicted
           | future observations. One could ask, "does it really
           | understand orbital mechanics, or is it merely finding an
           | efficient compression of the solution space?"
           | 
            | But this question can be reduced in such a way as to be made
           | empirical by presenting the network with a challenge that
           | requires real understanding to solve. For example, show it
           | observations of an interstellar visitor on a hyperbolic
           | trajectory. ALL of its training data consisted of
           | observations of objects in elliptical orbits exhibiting
           | periodic motion. If it is simply matching observation to its
           | training data, it will be unable to conceive that the
           | interstellar visitor is not also on a periodic trajectory.
           | But on the other hand if it really understood what it was
           | seeing then it would understand (like Kepler and Newton did)
           | that elliptical motion requires velocities bounded by an
           | upper limit, and if that speed is exceeded then the object
           | will follow a hyperbolic path away from the system, never to
           | return. It might not conceive these notions analytically the
           | way a human would, but an equivalent generalized model of
           | planetary motion must be encoded in the network if it is to
           | give accurate answers to questions posed so far outside of
           | its training data.
           | 
           | How you translate this into AlphaFold I'm not so certain, as
           | I lack the domain knowledge. But a practical ramification
           | would be the application of AlphaFold to novel protein
           | engineering. If AlphaFold lacks "real understanding", then
           | its quality will deteriorate when it is presented with
           | protein sequences further and further removed from its
           | training data, which presumably consists only of naturally
           | evolved biological proteins. Artificial design is not as
           | constrained as Darwinian evolution, so de novo engineered
           | proteins are more likely to diverge from AlphaFold's training
           | data. But if AlphaFold has an actual, generalized
           | understanding of the problem domain, then it should remain
           | accurate for these use cases.
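The hyperbolic-visitor test above can be stated in code: a model that has truly generalised orbital mechanics must distinguish bound from unbound motion. This sketch uses the specific orbital energy e = v^2/2 - mu/r (negative means bound); the gravitational parameter mu and the state values are illustrative units, not real data:

```python
# Classify a trajectory from a single (distance, speed) observation.
# e = v^2/2 - mu/r: e < 0 is a bound (elliptical) orbit; otherwise the
# object exceeds escape speed and follows an unbound (hyperbolic) path.
def trajectory_type(r, v, mu=1.0):
    energy = v ** 2 / 2 - mu / r
    return "elliptical" if energy < 0 else "hyperbolic"

print(trajectory_type(r=1.0, v=1.0))  # elliptical (below escape speed sqrt(2))
print(trajectory_type(r=1.0, v=1.5))  # hyperbolic (above escape speed)
```

A network trained only on periodic orbits that nonetheless respects this energy boundary would be evidence of the generalised model the comment describes.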
        
           | ma2rten wrote:
           | It could have learned abstractions which help it to predict
           | how proteins fold but these do not correspond to the real
           | underlying causes.
        
             | foxes wrote:
             | I think it is fine if it is "effective". Really most of our
             | physics is effective. So valid at a certain length scale.
             | Fluid mechanics is very good, but it does not describe it
             | all in terms of quark interactions. Quantum field theories
             | are also mostly effective. So as long as it is describing
             | protein dynamics at some effective length scale that is
             | fine. Obviously it does not know anything about
             | quarks/electrons/etc etc.
        
           | sgt101 wrote:
           | Knowledge includes insight into the why part of the mechanism
           | - why does the protein behave in this way? This can lead to
           | generalizations which go beyond answering different questions
           | of the same sort (such as "what about this protein then") to
           | questions of a different form that have answers underpinned
           | by the mechanism. For example, "how does that structure
           | evolve over time?" this is closely related to the ability to
           | make analogies using the knowledge - "if proteins react in
           | that way within their own molecule then when they meet
           | another molecule they should react this way". Also the
           | knowledge only becomes knowledge when it's in the framework
           | that "can know" which is to say that the thing using it can
           | handle different questions and can decide to create an
            | analogy using other knowledge. For AlphaFold 2 that framework
            | is DeepMind, but of course I don't know enough to know if
           | they and it can know things about proteins in the way I
           | described or if they "just" have a compressed form of the
           | solution space. I suspect the latter.
        
           | foxes wrote:
           | I mean perhaps I am not entirely sure myself. I imagine that
           | the solution space to this problem is some very complicated,
            | let's say algebraic variety/manifold/space/configuration
           | space, but obviously it is still low enough dimension it can
           | be sort of picked out nicely from some huge ambient space.
           | 
            | For example, specific points on this object are folded
            | proteins. I suppose the question is then how well this gets
           | encoded, does it know about "properties" of this surface, or
           | is it more like a rough kind of point cloud because you have
           | sampled enough and then it does some crude interpolation. But
           | maybe that does not respect the sort of properties in this
           | object. Maybe there are conservation laws, symmetry
           | properties, etc which are actually important, and then not
           | respecting that you have just produced garbage.
           | 
           | So I think it is important to know what kind of problem you
           | are dealing with. Imagine a long time scale n-body problem
           | with lots of sensitivity. Maybe in a video game it doesn't
           | matter if there is something non physical about what it
           | produces, as long as it looks good enough.
           | 
           | Maybe this interpolation is practical for its purpose.
           | 
           | But I think we should still be careful and question what kind
           | of problem it is applied to perhaps. Maybe it's more like a
           | complexity vs complicated question.
        
             | adverbly wrote:
             | Unrelated question, but this got me thinking:
             | 
             | > does it know about "properties" of this surface, or is it
             | more like a rough kind of point cloud because you have
             | sampled enough and then it does some crude interpolation
             | 
             | Say that there existed some high-level property such as
             | "conservation of energy". A "knowledge system" which learns
             | about that property would be able to answer any questions
             | related to it after reducing to a "conservation of energy"
             | problem. Is the same true for NNs? The way folks talk about
             | them, they sound like they can compress dynamically, and
             | would therefore be able to learn and apply new high-level
             | properties.
             | 
             | Also, do NNs have "rounding errors"? We have confidently
             | learned that energy is conserved, but would NNs which never
             | had that rule directly encoded understand conservation as
             | "exactly zero", or "zero with probability almost 1", or
             | "almost zero"?
        
         | robbedpeter wrote:
         | It's likely that many of the patterns it learned are encoded
         | understanding of the form you mention, but not at a formal
         | level of explication.
         | 
          | The architecture of the system and the design of the training
          | methodology are laid out specifically to prevent a direct
          | database-esque "pattern in, pattern out" failure mode.
         | 
         | Similar to Google deep dream, there will be contextual features
         | and feature clusters encoded into neurons that can be explored
         | and extracted, and those could provide insights that can be
          | translated into "hard" science, with explicit formulae
         | and theory allowing a fully transparent model to be created.
         | 
         | Like other transformer models, you can elicit the training data
         | intact, but such scenarios are a statistically insignificant
         | selection of the range of outputs the models are capable of
         | producing. That doesn't mean anything with regards to accuracy
         | of the novel output, though.
         | 
          | With AlphaFold 2 going open source, it's possible that tools
         | and methodologies to extract hard science from transformers
         | will be formalized quickly and in the public eye. We have an
         | amazingly powerful new tool, and the coming decades will be
         | fascinating.
        
           | joe_the_user wrote:
           | _It 's likely that many of the patterns it learned are
           | encoded understanding of the form you mention, but not at a
           | formal level of explication._
           | 
           | - The thing I'd be curious about is whether or not "not
           | formalized" would imply "not consistently generalizing",
            | whether it would have to be trained all over again if given a
            | problem similar to, but not identical with, the one it solves.
        
         | l33tman wrote:
         | Nature has re-used existing folds all over the place (partially
         | because of genetic mutation but also because it's improbable to
         | come up with stable folds from scratch by evolution I would
         | guess), this was encapsulated by earlier award-winning systems
         | like Rosetta. There is probably a finite number of folds in
         | nature, with most of the difference being in the outward-facing
         | amino acids "citation needed" :)
         | 
         | So an extremely large DL network would have a good chance to
         | find and integrate ("compress") all the existing folds and sub-
         | folds that human researchers or Rosetta missed or just hadn't
         | the time to investigate and characterize yet (I'm not an expert
         | on Rosetta by far btw so please expand if you are :).
         | 
         | I would venture to say it's a good problem fit for DL methods
         | (as was impressively demonstrated).
         | 
         | Regarding your question, "does it understand something about
         | the structure of the protein folding problem" - expanding on
         | the above, I would say it understands enough, but it probably
          | doesn't understand the generics of chemistry, as proteins and
          | their folding are a biased subset. The output is (as far as I
         | remember) an atom distance matrix and not atom trajectories
         | etc. so folding dynamics is not part of the model (this is btw
         | an important part of protein science as well).
        
       | jokoon wrote:
       | Science is the only field where ML would truly shine and be
       | really useful.
       | 
       | There are tons of science problems where there are just not
       | enough gray matter because it's just too expensive to train
       | scientists. ML can crunch any data and result and speed up
       | research by guiding experiments, where normal research just
       | doesn't have enough resources to do so.
       | 
       | Of course, it really only works if the scientists are able to
       | understand data and how to use ML, which is why computing becomes
       | just a tool for a scientist, nothing else.
       | 
       | And again, ML is not really "smart", it's just sophisticated,
       | improved statistical methods.
        
         | siver_john wrote:
          | As someone whose background is biology and physics and who does
          | ML work as well, I find this an incredibly optimistic view of
          | ML.
         | 
         | >Of course, it really only works if the scientists are able to
         | understand data and how to use ML, which is why computing
         | becomes just a tool for a scientist, nothing else.
         | 
         | Ideally in science you would like to use literally anything
         | else other than ML if possible, fitting models come with their
         | own challenges and neural networks are even more of a
         | nightmare. Understanding the world well enough to hard code a
         | rule is always preferable to fitting to some data and hoping
          | the model will come up with a rule. While there have been some
          | attempts to use ML for feature detection, it then takes a lot of
         | experimenting to generally show if it detected signal or just
         | some noise in your data.
         | 
         | Most of the things that would accelerate science would either
         | require AI much more complex than we currently have (basically
         | replacing lab assistants with AI) or are incredible research
         | undertakings in their own right like Alpha Fold, Deep Potential
         | Neural Networks, etc.
        
           | mjburgess wrote:
           | ML is an associative statistical system of function
           | optimisation -- pretty much the _opposite_ of science.
           | 
           | Ie., ML makes the assumption that data points are IID.
           | 
            | The whole purpose of science is to produce models which
            | explain why data isn't IID.
        
             | blackbear_ wrote:
             | > ML is an associative statistical system of function
             | optimisation
             | 
             | You can also separate cause from effect by using causal
             | inference, under some assumptions.
             | 
             | > ML makes the assumption that data points are IID.
             | 
             | Common ML algorithms do, but it is done for practical
             | reasons rather than a limitation in the mathematics.
             | 
             | > The whole purpose of science is to produce models which
             | explain why data isnt IID.
             | 
             | And ML can greatly help in this, though it is not a silver
             | bullet.
        
               | jokoon wrote:
               | Actually, there might not be a good way to model or
               | describe the difference between causal inference,
               | correlation and causality.
               | 
               | Causality involves a deep understanding of a phenomenon
               | in science.
               | 
               | For example, the standard model of physics is pretty good
               | at describing the real world in a good enough manner
               | because we understand a lot of it. The difference with
               | correlation and causality, in my view, is human,
               | scientific understanding of what things are. Formulas,
               | data or drawing are not enough.
               | 
               | For example there might never be a way to prove natural
               | selection, even if there is a lot of data available, but
               | a lot of scientific consensus is enough to describe
               | causality.
        
             | jokoon wrote:
             | When you say it's the opposite of science, you mean ML is
             | just made of black boxes that completely hide away
             | knowledge that humans can interpret?
             | 
             | Science is derived from scio, which in latin means
             | knowledge.
             | 
             | It's true that in a way, ML allows new things, but which
             | are still obscuring real knowledge...
             | 
             | I'm still curious about analysis of trained networks.
        
           | miltondts wrote:
           | Totally agree.
           | 
           | While AlphaFold 2 is a tremendous achievement, to me the
           | major drawback is the blackbox approach. It means it is very
           | difficult to know when the model is outputting garbage and it
           | also doesn't directly lead to new insights.
           | 
           | A much more interesting approach: "Discovery of Physics From
           | Data: Universal Laws and Discrepancies" [1]
           | 
           | If ML did that, then it would be much more interesting.
           | 
           | [1] - https://www.frontiersin.org/articles/10.3389/frai.2020.
           | 00025...
        
         | omgwtfbbq wrote:
         | >Science is the only field where ML would truly shine and be
         | really useful.
         | 
         | There's a lot of hype about ML but this is a really bad take.
          | Just take computer vision: you could come up with a dozen non-
          | science use cases in 5 minutes of brainstorming.
        
         | deeviant wrote:
         | > Science is the only field where ML would truly shine and be
         | really useful.
         | 
         | It's clear to see how ML is useful for science, but why exactly
         | do you think it's *only* useful for science? It seems like in
         | order for that to be true, you'd have to expand the definition
         | of science to basically everything.
        
         | temporalparts wrote:
         | > Science is the only field where ML would truly shine and be
         | really useful.
         | 
         | You know that ML is a really important part of a lot of
         | companies that aren't "Science", right?
         | 
         | Like Google search result rankings rely on ML.
        
       | deeviant wrote:
       | Am I broken in some sort of way where I read:
       | 
       | > Neither the Oxford Protein Informatics Group nor I accept any
       | responsibility for the content of this post.
       | 
       | And just stopped. I get it that you don't want to speak for your
       | employer or academic institution but the "nor I" part is just
       | weird.
        
         | advisedwang wrote:
         | The author isn't saying they don't speak for themselves, they
         | are just saying don't rely on this info. They say they wrote it
         | to clear their thoughts, and probably haven't done a high level
         | of verification.
        
         | me_again wrote:
         | It's a law of the Internet that any post complaining about
         | grammar or spelling must inevitably contain a grammatical error
         | of its own. Yours is no exception.
         | 
         | And without wishing to be unkind, the fact that you are adverse
         | to this phrasing is not of general interest, which is likely
         | why you're attracting downvotes.
        
           | deeviant wrote:
           | You seem to have failed to comprehend my comment. It has
           | nothing to do with grammar, but rather the "I take no
           | responsibility for the content of my post" (my summary of
           | their wording), part.
        
           | tashi wrote:
           | Similarly, you probably meant "averse" instead of "adverse."
        
       | optimalsolver wrote:
       | What are the CASP competition equivalents of other
       | scientific/engineering fields?
        
         | danuker wrote:
         | Here are categories of benchmarks of Machine Learning papers
         | that also publish code:
         | 
         | https://paperswithcode.com/sota
        
       ___________________________________________________________________
       (page generated 2021-07-12 23:02 UTC)