[HN Gopher] AlphaFold 3 predicts the structure and interactions ...
       ___________________________________________________________________
        
       AlphaFold 3 predicts the structure and interactions of life's
       molecules
        
       Author : zerojames
       Score  : 683 points
       Date   : 2024-05-08 15:07 UTC (7 hours ago)
        
 (HTM) web link (blog.google)
 (TXT) w3m dump (blog.google)
        
       | s1artibartfast wrote:
       | The article was heavy on the free research aspect, but light on
       | the commercial application.
       | 
       | I'm curious about the business strategy. Does Google intend to
       | license out tools, partner, or consult for commercial partners?
        
         | ilrwbwrkhv wrote:
         | as soon as google tries to think commercially this will shut
         | down so the longer it stays pure research the better. google is
         | bad with productization.
        
           | s1artibartfast wrote:
            | I don't think it was ever pure research. The article talks
            | about Isomorphic Labs, which is the commercial branch for
            | drug discovery.
            | 
            | I do agree that Google seems bad at commercialization, which
            | is why I'm curious about what the strategy is.
           | 
           | It is hard to see them being paid consultants or effective
           | partners for pharma companies, let alone developing drugs
           | themselves.
        
         | candiodari wrote:
         | I wonder what the license for RoseTTAFold is. On github you
         | have:
         | 
         | https://github.com/RosettaCommons/RoseTTAFold/blob/main/LICE...
         | 
         | But there's also:
         | 
         | https://files.ipd.uw.edu/pub/RoseTTAFold/Rosetta-DL_LICENSE....
         | 
         | Which is it?
        
       | weregiraffe wrote:
       | s/predicts/attempts to predict
        
         | jasonjmcghee wrote:
         | The title OP gave accurately reflects the title of Google's
         | blog post. Title should not be editorialized.
        
           | jtbayly wrote:
           | Unless the title is clickbait, which it appears this is...
        
         | matt-attack wrote:
         | Syntax error
        
           | adrianmonk wrote:
           | Legal without the trailing slash in vi!
        
         | pbw wrote:
         | A prediction is a prediction; it's not necessarily a correct
         | prediction.
         | 
          | The weatherman predicts the weather; even if he's sometimes
          | wrong, we don't say he "attempts to predict" the weather.
        
         | dekhn wrote:
         | AlphaFold has been widely validated- it's now appreciated that
         | its predictions are pretty damn good, with a few important
         | exceptions, instances of which are addressed with the newer
         | implementation.
        
           | AtlasBarfed wrote:
           | "pretty damn good"
           | 
           | So... what percentage of the time? If you made an AI to pilot
           | an airplane, how would you verify its edge conditions, you
           | know, like plummeting out of the sky because it thought it
           | had to nosedive?
           | 
           | Because these AIs are black box neural networks, how do you
           | know they are predicting things correctly for things that
           | aren't in the training dataset?
           | 
           | AI has so many weasel words.
        
             | dekhn wrote:
              | As mentioned elsewhere in this thread, and as is trivially
              | determinable by reading, AF2 is constantly being evaluated
             | in blind predictions where the known structure is hidden
             | until after the prediction. There's no weasel here; the
             | process is well-understood and accepted by the larger
             | community.
        
       | Metacelsus wrote:
       | From: https://www.nature.com/articles/d41586-024-01383-z
       | 
       | >Unlike RoseTTAFold and AlphaFold2, scientists will not be able
       | to run their own version of AlphaFold3, nor will the code
       | underlying AlphaFold3 or other information obtained after
       | training the model be made public. Instead, researchers will have
       | access to an 'AlphaFold3 server', on which they can input their
       | protein sequence of choice, alongside a selection of accessory
       | molecules. [. . .] Scientists are currently restricted to 10
       | predictions per day, and it is not possible to obtain structures
       | of proteins bound to possible drugs.
       | 
       | This is unfortunate. I wonder how long until David Baker's lab
       | upgrades RoseTTAFold to catch up.
        
         | wslh wrote:
          | The AI ball is rolling fast; I see similarities with
          | cryptography in the '90s.
          | 
          | I have a story to tell for the record: back in the '90s we
          | developed a home-banking app for the Palm (with a modem). It
          | was impossible to perform RSA at usable speed, so I contacted
          | the CEO of Certicom, which had the only elliptic curve
          | cryptography implementation at that time. Fast forward, and
          | ECC is everywhere.
        
         | l33tman wrote:
          | That sucks a bit. I was just wondering why they are touting
          | that 3rd-party company, which commercialises research tools,
          | in their own blog post as well. Maybe there are some corporate
          | agreements with them that prevent them from opening the
          | system...
         | 
          | Imagine the goodwill for humanity from releasing these pure
          | research systems for free. I just have a hard time
          | understanding how you can justify keeping it closed. Let's
          | hope it will be replicated by someone who doesn't have to hide
          | behind the "responsible AI" curtain as it seems they are now.
         | 
          | Are they really thinking that someone who needs to predict 11
          | structures per day is more likely to be a nefarious evil
          | protein guy than someone who predicts 10 structures a day? Was
          | AlphaFold 2 (which was open-sourced) used by evil researchers?
        
           | staminade wrote:
            | Isomorphic Labs? That's an Alphabet-owned startup run by
            | Demis Hassabis that they created to commercialise the
            | AlphaFold work, so it's not really a 3rd party at all.
        
           | SubiculumCode wrote:
            | There is at least some difference between a monitored server
            | and a privately run one, if negative consequences are
            | possible.
        
           | perihelions wrote:
           | - _" Imagine the goodwill for humanity for releasing these
           | pure research systems for free."_
           | 
           | The entire point[0] is that they want to sell an API to drug-
           | developer labs, at exclusive-monopoly pricing. Those labs in
           | turn discover life-saving drugs, and recoup their costs from
           | e.g. parents of otherwise-terminally-ill children--again,
           | priced as an exclusive monopoly.
           | 
           | [0] As signaled by _" it is not possible to obtain structures
           | of proteins bound to possible drugs"_
           | 
           | It's a massive windfall for Alphabet, and it'd be a profound
           | breach of their fiduciary duties as a public company to do
           | anything other than lock-down and hoard this API, and squeeze
           | it for every last billion.
           | 
           | This is a deeply, deeply, deeply broken situation.
        
             | karencarits wrote:
              | What is the current status of drugs where the major
              | contribution is from AI? Are they protectable like other
              | drugs? Or are they uncopyrightable, like AI art and so on?
        
             | iknowstuff wrote:
             | Is it broken if it yields new drugs? Is there a system that
             | yields more? The whole point of capitalism is that it
             | incentivizes this in a way that no other system does.
        
               | l33tman wrote:
                | My point one level up in the comments here was not
                | really that the system is broken, but more like asking
               | how you can run these companies (google and that other
               | part run by the deepmind founder, who I bet already has
               | more money than he can ever spend) and still sleep well
               | knowing you're the rich capitalist a-hole commercializing
               | life-science work that your parent company has allocated
               | maybe one part in a million of their R&D budget into
               | creating.
               | 
                | It's not like Google is ever going to make billions on
                | this anyway; the AlphaFold algorithms are not super
                | advanced, and you don't require GPT-4-scale datasets to
                | train them, so others will hopefully catch up.. though I'm
               | also pretty sure it requires GPU-hours beyond what a
               | typical non-profit academia outfit has available
               | unfortunately.. :/
        
             | lupire wrote:
             | The parents of those otherwise terminally ill children
             | disagree with you in the strongest possible terms.
        
             | goggy_googy wrote:
             | What makes this such a "deeply broken situation"?
             | 
             | I agree that late-stage capitalism can create really tough
             | situations for poor families trying to afford drugs. At the
             | same time, I don't know any other incentive structure that
             | would have brought us a breakthrough like AlphaFold this
             | soon. For the first time in history, we have ML models that
             | are beating out the scientific models by huge margins. The
             | very fact that this comes out of the richest, most
             | competitive country in the history of the world is not a
             | coincidence.
             | 
             | The proximate cause of the suffering for terminally-ill
             | children is really the drug company's pricing. If you want
             | to regulate this, though, you'll almost certainly have
             | fewer breakthroughs like AlphaFold. From a utilitarian
             | perspective, by preserving the existing incentive structure
             | (the "deeply broken situation" as you call it), you will be
             | extending the lifespans of _more people in the future_ (as
             | opposed to extending lifespans of more people now by
             | lowering drug prices).
        
               | firefoxbrower wrote:
               | Late-stage capitalism didn't bring us AlphaFold,
               | scientists did, late-stage capitalism just brought us
               | Alphabet swooping in at literally the last minute.
               | Socialize the innovation because that requires potential
               | losses, privatize the profits, basically. It's
               | reminiscent of "Heroes of CRISPR," where Doudna and
               | Charpentier are supposedly just some middle-men, because
               | stepping in at the last minute with more funding is
               | really what fuels innovation.
               | 
               | AlphaFold wasn't some lone genius breakthrough that came
               | out of nowhere, everything but the final steps were
               | basically created in academia through public funding. The
                | key insights -- some combination of realizing that the
                | sequence-to-structure-to-function relationship puts
                | analyzable constraints on sequence conservation, and of
                | identifying which ML models could be applied to it --
                | were made in academia a long time ago. AlphaFold's
                | training set, the PDB, is
               | also a result of decades of publicly funded work. After
               | that, the problem was just getting enough funding amidst
               | funding cuts and inflation to optimize. David Baker at
               | IPD did so relatively successfully, Jinbo Xu is less of a
               | fundraiser but was able to keep up basically alone with
                | one or two grad students at a time, etc. AlphaFold 1
                | threw way more people and money at basically copying
                | what Jinbo Xu had already done, and barely beat him at
                | that year's CASP.
               | Academics were leading the way until very, very recently,
               | it's not like the problem was stalled for decades.
               | 
               | Thankfully, the funding cuts will continue until research
               | improves, and after decades of inflation cutting into
               | grants, we are being rewarded by funding cuts to almost
               | every major funding body this year. I pledge allegiance
               | to the flag!
               | 
               | EDIT: Basically, if you know any scientists, you know the
               | vast majority of us work for years with little
               | consideration for profit because we care about the
               | science and its social impact. It's grating for the
               | community, after being treated worse every year, to then
               | see all the final credit go to people or companies like
               | Eric Lander and Google. Then everyone has to start over,
               | pick some new niche that everyone thinks is impossible,
               | only to worry about losing it when someone begins to get
               | it to work.
        
               | iknowstuff wrote:
                | Why haven't the academics created a non-profit foundation
                | with open-source models like this, then? If Alphabet
                | doesn't provide much, then they will be supplanted by
                | non-profits. I see nothing broken here.
        
               | firefoxbrower wrote:
               | Individual labs somehow manage to do that and we're all
               | grateful. Martin Steinegger's lab put out ColabFold,
                | RELION is the gold standard for cryo-EM despite being
                | academic software and despite the development of more
                | recent industry competitors like cryoSPARC. Everything out of
               | the IPD is free for academic use. Someone has to fight
               | like hell to get all those grants, though, and from a
               | societal perspective, it's basically needlessly redundant
               | work.
               | 
               | My frustrations aren't with a lack of open source models,
               | some poor souls make them. My disagreement is with the
               | perception that academia has insufficient incentive to
               | work on socially important problems. Most such problems
               | are ONLY worked on in academia until they near the finish
               | line. Look at Omar Yaghi's lab's work on COFs and MOFs
               | for carbon/emission sequestration and atmospheric water
               | harvesting. Look at all the thankless work numerous labs
               | did on CRISPR-Cas9 before the Broad Institute even
               | touched it. Look at Jinbo Xu's work, on David Baker's
               | lab's and the IPD's work, etc. Look at what labs first
               | solved critical amyloid structures, infuriatingly
               | recently, considering the massive negative social impacts
               | of neurodegenerative diseases.
               | 
               | It's only rational for companies that only care about
               | their own profit maximization to socialize R&D costs and
               | privatize any possible gains. This can work if companies
               | aren't being run by absolute ghouls who aren't delaying
               | the release of a new generation of drugs to minimize
               | patent duration overlap or who aren't trying to push
               | things that don't work for short-term profit. This can
               | also work if we properly fund and credit publicly funded
                | academic labs. This is not what's happening, however;
                | instead, publicly funded research is increasingly
                | demeaned, defunded, and dismantled due to the false
                | impression that nothing socially valuable gets done
                | without a profit motive. It's okay, though: I guess
                | under this kind of LSC worldview everything always
                | corrects itself, so preempting problems doesn't matter,
                | and we'll finally learn how much actual innovation is
                | publicly funded when we get the Minions movie,
                | aducanumab, and WeWork over and over again for a few
                | decades while strangling the last bit of nature we have
                | left.
        
               | j-wags wrote:
               | I work at Open Force Field [1] which is the kind of
               | nonprofit that I think you're talking about. Our sister
               | project, OpenFold [2], is working on open source versions
               | of AlphaFold.
               | 
               | We're making good progress but it's difficult to
               | interface with fundamentally different organizational
               | models between academia and industry. I'm hoping that
               | this model will become normalized in the future. But it
               | takes serious leaps of faith from all involved
               | (professors, industry leaders, grant agencies, and - if I
               | can flatter myself - early career scientists) to leave
               | the "safe route" in their organizations and try something
               | like this.
               | 
               | [1] https://openforcefield.org/ [2] https://openfold.io/
        
         | Jerrrry wrote:
         | The second amendment prevents the government's overreaching
         | perversion to restrict me from having the ability to print
         | biological weapons from the comfort of my couch.
         | 
         | Google has no such restriction.
        
           | SubiculumCode wrote:
           | /s is strong with this one
        
           | gameman144 wrote:
           | I know this is tongue in cheek, but you absolutely can be
           | restricted from having a biological weapons factory in your
           | basement (similar to not being able to pick "nuclear bombs"
           | as your arms to bear).
        
             | timschmidt wrote:
              | Seems like the recipe for independence, and agreed-upon
              | borders, and thus whatever interpretation of the second
              | amendment one wants, involves exactly choosing nuclear
              | bombs, and managing to stockpile enough of them before
              | being bombed oneself. At least at the nation-state scale.
              | Sealand certainly resorted to arms at several points in
              | its history.
        
               | gameman144 wrote:
               | The second amendment only applies to the United States --
               | it's totally normal to have one set of rights for
               | citizens and another set for the government itself.
        
           | dekhn wrote:
           | Sergey once said "We don't have an army per-se" (he was
            | referring to the size of Google's physical security group)
            | at TGIF.
           | 
           | There was a nervous chuckle from the audience.
        
         | rolph wrote:
         | in other words, this has been converted to a novelty, and has
         | no use for scientific purposes.
        
           | ebiester wrote:
           | No. It just means that scientific purposes will have an
           | additional tax paid to google. This will likely reduce use in
           | academia but won't deter pharmaceutical companies.
        
         | mhrmsn wrote:
         | Also no commercial use, from the paper:
         | 
         | > AlphaFold 3 will be available as a non-commercial usage only
         | server at https://www.alphafoldserver.com, with restrictions on
         | allowed ligands and covalent modifications. Pseudocode
         | describing the algorithms is available in the Supplementary
         | Information. Code is not provided.
        
           | moralestapia wrote:
            | How easy/hard would it be for the scientific community to
            | come up with an "OpenFold" model which is pretty much AF3
            | but fully open source and without restrictions?
            | 
            | I can imagine training will be expensive, but I don't think
            | it will be at a GPT-4 level of expensive.
        
             | dekhn wrote:
             | already did it, https://openfold.io/
             | https://github.com/aqlaboratory/openfold
             | https://www.biorxiv.org/content/10.1101/2022.11.20.517210v1
             | https://lupoglaz.github.io/OpenFold2/
             | https://www.biospace.com/article/releases/openfold-
             | biotech-a...
             | 
             | I really have to emphasize that transformers have literally
             | transformed science in only a few years. Truly
             | extraordinary.
        
               | moralestapia wrote:
               | Oh, nice! Thanks for sharing.
        
           | p3opl3 wrote:
           | Yes, because that's going to stop competitors.. it's why they
           | didn't release code I guess.
           | 
           | This is yet another large part of a biotech related Gutenberg
           | moment.
        
             | natechols wrote:
             | The DeepMind team was essentially forced to publish and
             | release an earlier iteration of AlphaFold after the Rosetta
             | team effectively duplicated their work and published a
             | paper about it in Science. Meanwhile, the Rosetta team just
             | published a similar work about co-folding ligands and
             | proteins in Science a few weeks ago. These are hardly the
             | only teams working in this space - I would expect progress
             | to be very fast in the next few years.
        
               | dekhn wrote:
               | How much has changed- I talked with David Baker at CASP
               | around 2003 and he said at the time, while Rosetta was
               | the best modeller, every time they updated its models
               | with newly determined structures, its predictions got
               | worse :)
        
               | natechols wrote:
               | It's kind of amazing in retrospect that it was possible
               | to (occasionally) produce very good predictions 20 years
               | ago with at least an order of magnitude smaller training
               | set. I'm very curious whether DeepMind has tried trimming
               | the inputs back to an earlier cutoff point and re-
               | training their models - assuming the same computing
               | technologies were available, how well would their methods
               | have worked a decade or two ago? Was there an inflection
               | point somewhere?
        
           | pantalaimon wrote:
            | What's the point in that - I mean, who does non-commercial
            | drug research?
        
             | sangnoir wrote:
             | Academia
        
             | karencarits wrote:
             | Public universities?
        
           | obmelvin wrote:
           | If you need to submit to their server, I don't know who would
           | use it for commercial reasons anyway. Most biotech startups
           | and pharma companies are very careful about entering
           | sequences into online tools like this.
        
         | ranger_danger wrote:
         | Not just unfortunate, but doesn't this make it completely
         | untrustable? How can you be sure the data was not modified in
         | any way? How can you verify any results?
        
           | dekhn wrote:
           | You determine a crystal structure of a known protein which
           | does not previously have a known structure, and compare the
           | prediction to the experimentally determined structure.
           | 
            | There is a biennial competition known as CASP
           | where some new structures, not yet published, are used for
           | testing predictions from a wide range of protein structure
            | prediction methods (so, basically blind predictions which are then
           | compared when the competition wraps up). AlphaFold beat all
           | the competitors by a very wide margin (much larger than the
           | regular rate of improvement in the competition), and within a
           | couple years, the leading academic groups adopted the same
           | techniques and caught up.
           | 
           | It was one of the most important and satisfying moments in
           | structure prediction in the past two+ decades. The community
           | was a bit skeptical but as it's been repeatedly tested,
           | validated, and reproduced, people are generally of the
           | opinion that DeepMind "solved" protein structure prediction
            | (with some notable exceptions), and did so without having to
            | solve the full "protein folding problem" (which is actually
            | great news while also being somewhat depressing).
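For readers unfamiliar with CASP scoring: the headline metric is GDT_TS, roughly the fraction of matched C-alpha atoms that land within fixed distance cutoffs (1, 2, 4, and 8 angstroms) of the experimental structure, averaged over the four cutoffs. Here is a minimal sketch of that idea; it assumes the two structures are already superposed, whereas real CASP scoring searches over many superpositions to maximize the score:

```python
import math

def gdt_ts(predicted, experimental):
    """Simplified GDT_TS: the mean, over four distance cutoffs, of the
    fraction of matched C-alpha atoms within that cutoff of the
    experimental position. Assumes coordinates are already superposed."""
    dists = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [sum(d <= cutoff for d in dists) / len(dists)
                 for cutoff in (1.0, 2.0, 4.0, 8.0)]
    return 100.0 * sum(fractions) / len(fractions)

# Toy 4-residue example: displacements of 0.5, 1.5, 3.0, and 9.0 angstroms
experimental = [(0.0, 0.0, 0.0)] * 4
predicted = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0),
             (3.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
print(gdt_ts(predicted, experimental))  # 56.25
```

For context, AlphaFold 2's median GDT_TS at CASP14 was in the low 90s, a range generally regarded as competitive with experimental accuracy.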
        
             | ranger_danger wrote:
             | By data I meant between the client and server, nothing
             | actually related to how the program itself works, but just
             | the fact that it's controlled by a proprietary third party.
        
         | tepal wrote:
         | Or OpenFold, which is the more literal reproduction of
         | AlphaFold 2: https://github.com/aqlaboratory/openfold
        
           | LarsDu88 wrote:
           | Time for an OpenFold3? Or would it be an OpenFold2?
        
         | photochemsyn wrote:
         | Well, it's because you can design deadly viruses using this
         | technology. Viruses gain entry to living cells via cell-surface
         | receptor proteins whose normal job is to bind signalling
         | molecules, alter their conformation and translate that external
         | signal into the cellular interior where it triggers various
         | responses from genomic transcription to release of other signal
         | molecules. Viruses hijack such mechanisms to gain entry to
         | cells.
         | 
         | Thus if you can design a viral coat protein to bind to a human
         | cell-surface receptor, such that it gets translocated into the
         | cell, then it doesn't matter so much where that virus came
         | from. The cell's firewall against viruses is the cell membrane,
         | and once inside, the biomolecular replication machinery is very
         | similar from species to species, particularly within restricted
         | domains, such as all mammals.
         | 
         | Thus viruses from rats, mice, bats... aren't going to have
         | major problems replicating in their new host - a host they only
         | gained access to because some nation-state actors working in
         | collaboration on such gain-of-function research in at least two
         | labs on opposite sides of the world with funds and material
         | provided by the two largest economic powers for reasons that
         | are still rather opaque, though suspiciously banal...
         | 
          | Now while you don't _need_ something like AlphaFold 3 to do
          | recklessly stupid things (you could use directed evolution,
          | making millions of mutated proteins, throwing them at a wall of
          | human cell receptors and collecting what stuck), it makes it
          | far easier. Thus Google doesn't want to be seen as enabling,
          | though given their predilection for classified military-
          | industrial contracting to a variety of nation-states,
          | particularly with AI, with revenue now far more important than
          | silly "don't be evil" statements, they might bear watching.
         | 
          | On the positive side, AlphaFold 3 will be great for fields like
          | small-molecule biocatalysis, i.e. industrial applications in
          | which protein enzymes (or more robust heterogeneous catalysts
          | designed based on protein structures) convert N2 to ammonia or
          | methane to methanol, selectively bind CO2 for carbon capture,
          | modify simple sugars and amino acids, etc.
        
         | niemandhier wrote:
         | The logical consequence is to put all scientific publications
         | under a license that restricts the right to train commercial ai
         | models on them.
         | 
         | Science advances because of an open exchange of ideas, the
         | original idea of patents was to grant the inventor exclusive
         | use in exchange for disclosure of knowledge.
         | 
         | Those who did not patent, had to accept that their inventions
         | would be studied and reverse engineered.
         | 
          | The "as a service" model breaks that approach.
        
         | dwroberts wrote:
         | This turns it into a tool that deserves to be dethroned by
         | another group, frankly. What a strange choice.
        
       | renonce wrote:
       | > What is different about the new AlphaFold3 model compared to
       | AlphaFold2?
       | 
       | > AlphaFold3 can predict many biomolecules in addition to
       | proteins. AlphaFold2 predicts structures of proteins and protein-
       | protein complexes. AlphaFold3 can generate predictions containing
        | proteins, DNA, RNA, ions, ligands, and chemical modifications. The
       | new model also improves the protein complex modelling accuracy.
       | Please refer to our paper for more information on performance
       | improvements.
       | 
       | AlphaFold 2 generally produces looping "ribbon-like" predictions
       | for disordered regions. AlphaFold3 also does this, but will
       | occasionally output segments with secondary structure within
       | disordered regions instead, mostly spurious alpha helices with
       | very low confidence (pLDDT) and inconsistent position across
       | predictions.
       | 
       | So the criticism towards AlphaFold 2 will likely still apply? For
       | example, it's more accurate for predicting structures similar to
       | existing ones, and fails at novel patterns?
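The low-confidence behaviour described above is easy to check programmatically: AlphaFold writes per-residue pLDDT into the B-factor column of its PDB-format output, so flagging putatively disordered segments takes only a few lines of parsing. A rough sketch (the fixed-width column offsets are standard PDB; the cutoff of 50 corresponds to the commonly cited "very low" confidence band, and `fake_ca` is a hypothetical helper that fabricates minimal records for demonstration):

```python
def low_confidence_segments(pdb_lines, cutoff=50.0):
    """Return (start, end) residue-number ranges whose C-alpha pLDDT
    (stored in the B-factor column of AlphaFold PDB output) is below
    `cutoff` -- the band that often corresponds to disorder."""
    residues = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residues.append((int(line[22:26]), float(line[60:66])))
    segments, start = [], None
    for resnum, plddt in residues:
        if plddt < cutoff and start is None:
            start = resnum
        elif plddt >= cutoff and start is not None:
            segments.append((start, resnum - 1))
            start = None
    if start is not None:
        segments.append((start, residues[-1][0]))
    return segments

def fake_ca(resnum, plddt):
    # Minimal fixed-width PDB ATOM record (only the columns read above matter)
    return (f"ATOM  {resnum:5d}  CA  ALA A{resnum:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")

plddts = [90, 85, 40, 35, 30, 88, 92, 45, 42, 38]
lines = [fake_ca(i + 1, b) for i, b in enumerate(plddts)]
print(low_confidence_segments(lines))  # [(3, 5), (8, 10)]
```

In practice, low pLDDT is itself informative: it correlates well with intrinsically disordered regions, so the spurious helices mentioned above at least come pre-labelled as untrustworthy.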
        
         | dekhn wrote:
          | I am not aware of anybody currently criticizing AF2's
          | abilities outside of its training set. In fact, in the most
          | recent papers (written by crystallographers), the arguments
          | are mostly about atomic-level details of side chains at this
          | point.
        
         | COGlory wrote:
         | >So the criticism towards AlphaFold 2 will likely still apply?
         | For example, it's more accurate for predicting structures
         | similar to existing ones, and fails at novel patterns?
         | 
         | Yes, and there is simply no way to bridge that gap with this
         | technique. We can make it better and better at pattern
         | matching, but it is not going to predict novel folds.
        
           | dekhn wrote:
           | alphafold has been shown to accurately predict some novel
           | folds. The technique doesn't entirely depend on whole-domain
           | homology.
        
         | rolph wrote:
         | problem is biomolecules, are "chaperoned" to fold properly,
         | only specific regions such as, alpha helix, or beta
         | pleatedsheet will fold de novo.
         | 
         | Chaperone (protein)
         | 
         | https://en.wikipedia.org/wiki/Chaperone_(protein)
        
       | tea-coffee wrote:
       | This is a basic question, but how is the accuracy of the
       | predicted biomolecular interactions measured? Are the predicted
       | interactions compared to known interactions? How would the
       | accuracy of predicting unknown interactions be assessed?
        
         | joshuamcginnis wrote:
         | Accuracy can be assessed in two main ways: computationally and
         | experimentally. Computationally, they would compare the
         | predicted structures and interactions with known data from
         | databases like the PDB (Protein Data Bank). Experimentally, they can
         | use tools like x-ray crystallography and NMR (nuclear magnetic
         | resonance) to obtain the actual molecule structure and compare
         | it to the predicted result. The outcomes of each approach would
         | be fed back into the model for refining future predictions.
         | 
         | https://www.rcsb.org/
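
The computational comparison described above typically reduces to a per-atom distance metric such as RMSD (AlphaFold's own evaluations use related scores like lDDT and GDT). A minimal pure-Python sketch, assuming the two coordinate sets are already superimposed (real pipelines first align them, e.g. with the Kabsch algorithm):

```python
import math

def rmsd(pred, exp):
    """Root-mean-square deviation between two equal-length lists
    of (x, y, z) atomic coordinates, assumed already superimposed."""
    assert len(pred) == len(exp) and pred
    sq = sum((px - ex) ** 2 + (py - ey) ** 2 + (pz - ez) ** 2
             for (px, py, pz), (ex, ey, ez) in zip(pred, exp))
    return math.sqrt(sq / len(pred))

# A uniform 1 Å shift along x gives an RMSD of exactly 1 Å.
a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
b = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0)]
print(rmsd(a, a))  # 0.0
print(rmsd(a, b))  # 1.0
```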
        
           | dekhn wrote:
           | AlphaFold very explicitly (unless something has changed)
           | removes NMR structures as references because they are not
           | accurate enough. I have a PhD in NMR biomolecular structure
           | and I wouldn't trust the structures for anything.
        
             | JackFr wrote:
             | Sorry, I don't mean to be dense - do you mean you don't
             | trust AlphaFolds structures or NMRs?
        
               | dekhn wrote:
               | I don't trust NMR structures in nearly all cases. The
               | reasons are complex enough that I don't think it's
               | worthwhile to discuss on Hacker News.
        
               | fikama wrote:
               | Hmm, I would say it's always worth sharing knowledge.
               | Could you paste some links, or maybe type a few
               | keywords, for anyone willing to research the topic
               | further on their own?
        
               | dekhn wrote:
               | Read this, and recursively (breadth-first) read all its
               | transitive references:
               | https://www.sciencedirect.com/science/article/pii/S096921262...
        
             | fabian2k wrote:
             | Looking at the supplementary material (section 2.5.4) for
             | the AlphaFold 3 paper it reads to me like they still use
             | NMR structures for training, but not for evaluating
             | performance of the model.
        
               | dekhn wrote:
               | I think it's implicit in their description of filtering
               | the training set, where they say they only include
               | structures with resolution of 9 Å or less. NMR structures
               | don't really have a resolution, that's more specific to
               | crystallography. However, I can't actually verify that no
               | NMR structures were included without directly inspecting
               | their list of selected structures.
        
               | fabian2k wrote:
               | I think it is very plausible that they don't use NMR
               | structures here, but I was looking for a specific
               | statement on it in the paper. I think your guess is
               | plausible, but I don't think the paper is clear enough
               | here to be sure about this interpretation.
        
               | dekhn wrote:
               | Yes, thanks for calling that out. In verifying my
               | statement I actually was confused because you can see
               | they filter NMR out of the eval set (saying so
               | explicitly) but don't say that in the test set section
               | (IMHO they should be required to publish the actual
               | selection script so we can inspect the results).
        
               | fabian2k wrote:
               | Hmm, in the earlier AlphaFold 2 paper they state:
               | 
               | > Input mmCIFs are restricted to have resolution less
               | than 9 Å. This is not a very restrictive filter and only
               | removes around 0.2% of structures
               | 
               | NMR structures are more than 0.2%, so that doesn't fit
               | the assumption that they implicitly remove NMR structures
               | here. But if I filter by resolution on the PDB homepage
               | it does remove essentially all NMR structures. I'm really
               | not sure what to think here, the description seems too
               | soft to know what they did exactly.
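
The ambiguity being debated is easy to see in code: most NMR depositions simply have no resolution value, so a "resolution < 9 Å" filter drops them as a side effect without ever mentioning the experimental method. A sketch with made-up entries (the field names are illustrative, not the actual mmCIF schema):

```python
def passes_filter(entry, max_resolution=9.0):
    # NMR entries typically report no resolution (modelled here
    # as None), so they fail this test implicitly.
    res = entry.get("resolution")
    return res is not None and res < max_resolution

entries = [
    {"id": "XRAY1", "method": "X-RAY DIFFRACTION",   "resolution": 2.1},
    {"id": "NMR1",  "method": "SOLUTION NMR",        "resolution": None},
    {"id": "CRYO1", "method": "ELECTRON MICROSCOPY", "resolution": 3.4},
]
kept = [e["id"] for e in entries if passes_filter(e)]
print(kept)  # ['XRAY1', 'CRYO1']
```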
        
             | panabee wrote:
             | interesting observation and experience. must have made
             | thesis development complex, assuming the realization dawned
             | on you during the phd.
             | 
             | what do you trust more than NMR?
             | 
             | AF's dependence on MSAs also seems sub-optimal; curious to
             | hear your thoughts?
             | 
             | that said, it's understandable why they used MSAs, even if
             | it seems to hint at winning CASP more than developing a
             | generalizable model.
             | 
             | arguably, MSA-dependence is the wise choice for early
             | prediction models as demonstrated by widespread accolades
             | and adoption, i.e., it's an MVP with known limitations as
             | they build toward sophisticated approaches.
        
               | dekhn wrote:
               | My realizations happened after my PhD. When I was writing
               | my PhD I still believed we would solve the protein
               | folding and structure prediction problems using classical
               | empirical force fields.
               | 
               | It wasn't until I started my postdocs, where I started
               | learning about protein evolutionary relationships (and
               | competing in CASP), that I changed my mind. I wouldn't
               | say it so much as "multiple sequence alignments"; those
               | are just tools to express protein relationships in a
               | structured way.
               | 
               | If Alphafold now, or in the future, requires no
               | evolutionary relationships based on sequence (uniprot)
               | and can work entirely by training on just the proteins in
               | PDB (many of which _are_ evolutionarily related) and still
               | be able to predict novel folds, it will be very
               | interesting times. The one thing I have learned is that
               | evolutionary knowledge makes many hard problems really
               | easy, because you're taking advantage of billions of
               | years of nature and an easy readout.
        
             | heyoni wrote:
             | Nice to see you on this thread as well! :)
        
       | dopylitty wrote:
       | This reminds me of Google's claim that another "AI" discovered
       | millions of new materials. The results turned out to be a lot
       | of useless noise, but that was only apparent after actual
       | experts spent hundreds of hours reviewing the results[0].
       | 
       | 0: https://www.404media.co/google-says-it-discovered-millions-o...
        
         | dekhn wrote:
         | The alphafold work has been used across the industry
         | (successfully, in the sense of blind prediction), and has been
         | replicated independently. The work on alphafold will likely net
         | Demis and John a Nobel prize in the next few years.
         | 
         | (that said, one should always inspect Google publications with
         | a fine-toothed comb and lots of skepticism, as they have a
         | tendency to juice the results)
        
           | 11101010001100 wrote:
           | Depending on your expected value of quantum computing, the
           | Nobel committee shouldn't wait too long.
        
             | dekhn wrote:
             | Personally I don't expect QC to be a competitor to ML in
             | protein structure prediction for the foreseeable future.
             | After spending more money on molecular dynamics than
             | probably any other human being, I'm really skeptical that
             | physical models of protein structures will compete with ML-
             | based approaches (that exploit homology and other protein
             | sequence similarities).
        
           | nybsjytm wrote:
           | >The alphafold work has been used across the industry
           | (successfully, in the sense of blind prediction), and has
           | been replicated independently.
           | 
           | This is clearly an overstatement, or at least very
           | incomplete. See for instance
           | https://www.nature.com/articles/s41592-023-02087-4:
           | 
           | "In many cases, AlphaFold predictions matched experimental
           | maps remarkably closely. In other cases, even very high-
           | confidence predictions differed from experimental maps on a
           | global scale through distortion and domain orientation, and
           | on a local scale in backbone and side-chain conformation. We
           | suggest considering AlphaFold predictions as exceptionally
           | useful hypotheses."
        
             | dekhn wrote:
             | Yep, I know Paul Adams (used to work with him at Berkeley
             | Lab) and that's exactly the paper he'd publish. If you read
             | that paper carefully (as we all have, since it's the
             | strongest we've seen from the crystallography community so
             | far) they're basically saying the results from AF are
             | absolutely excellent, and fit for purpose.
             | 
             | (put another way: if Paul publishes a paper saying your
             | structure predictions have issues, and mostly finds tiny
             | local issues and some distortion and domain orientation,
             | rather than absolutely incorrect fold prediction, it means
             | your technique works really well, and people are just
             | quibbling about details.)
        
               | nybsjytm wrote:
               | I don't know Paul Adams, so it's hard for me to know how
               | to interpret your post. Is there anything else I can read
               | that discusses the accuracy of AlphaFold?
        
               | dekhn wrote:
               | Yes, https://predictioncenter.org/casp15/
               | https://www.sciencedirect.com/science/article/pii/S0959440X2...
               | https://dasher.wustl.edu/bio5357/readings/oxford-alphafold2....
               | 
               | I can't find the link at the moment but from the
               | perspective of the CASP leaders, AF2 was accurate enough
               | that it's hard to even compare to the best structures
               | determined experimentally, due to noise in the
               | data/inadequacy of the metric.
               | 
               | A number of crystallographers have also reported that the
               | predictions helped them find errors in their own crystal-
               | determined structures.
               | 
               | If you're not really familiar enough with the field to
               | understand the papers above, I recommend spending more
               | time learning about the protein structure prediction
               | problem, and how it relates to the experimental
               | determination of structure using crystallography.
        
               | nybsjytm wrote:
               | Thanks, those look helpful. Whenever I meet someone with
               | relevant PhDs I ask their thoughts on AlphaFold, and I've
               | gotten a wide variety of responses, from responses like
               | yours to people who acknowledge its usefulness but are
               | rather dismissive about its ultimate contribution.
        
               | dekhn wrote:
               | The people who are most likely to deprecate AlphaFold are
               | the ones whose job viability is directly affected by its
               | existence.
               | 
               | Let me be clear: DM only "solved" (and really didn't
               | "solve") a subset of a much larger problem: creating a
               | highly accurate model of the process by which real
               | proteins adopt their folded conformations, or how some
               | proteins don't adopt folded conformations without
               | assistance, or how some proteins don't adopt a fully
               | rigid conformation, or how some proteins can adopt
               | different shapes in different conditions, or how enzymes
               | achieve their catalyst abilities, or how structural
               | proteins produce such rigid structures, or how to predict
               | whether a specific drug is going to get FDA approval and
               | then make billions of dollars.
               | 
               | In a sense we got really lucky because CASP has been
               | running so long and with so many contributors that it
               | became recognized that winning at CASP meant "solving
               | protein structure prediction to the limits of our ability
               | to evaluate predictions", and that Demis and his
               | associates had such a huge drive to win competitions that
               | they invested tremendous resources and state of the art
               | technology, while sharing enough information that the
               | community could reproduce the results in their own hands.
               | Any problem we want solved, we should gamify, so that
               | DeepMind is motivated to win the game.
        
               | panabee wrote:
               | this is very astute, not only about deepmind but about
               | science and humanity overall.
               | 
               | what CASP did was narrowly scope a hard problem, provide
               | clear rules and metrics for evaluating participants, and
               | offer a regular forum in which candidates can showcase
               | skills -- they created a "game" or competition.
               | 
               | in doing so, they advanced the state of knowledge
               | regarding protein structure.
               | 
               | how can we apply this to cancer and deepen our
               | understanding?
               | 
               | specifically, what parts of cancer can we narrowly scope
               | that are still broadly applicable to a complex
               | heterogenous disease and evaluate with objective metrics?
               | 
               | [edited to stress the goal of advancing cancer knowledge,
               | not to "gamify" cancer science but to create structures
               | that invite more ways to increase our understanding of
               | cancer.]
        
               | natechols wrote:
               | I also worked with the same people (and share most of the
               | same biases) and that paper is about as close to a
               | ringing endorsement of AlphaFold as you'll get.
        
         | Laaas wrote:
         | > We have yet to find any strikingly novel compounds in the
         | GNoME and Stable Structure listings, although we anticipate
         | that there must be some among the 384,870 compositions. We also
         | note that, while many of the new compositions are trivial
         | adaptations of known materials, the computational approach
         | delivers credible overall compositions, which gives us
         | confidence that the underlying approach is sound.
         | 
         | Doesn't seem outright useless.
        
       | _xerces_ wrote:
       | A video summary of why this research is important:
       | https://youtu.be/Mz7Qp73lj9o?si=29vjdQtTtIOk_0CV
        
         | ProllyInfamous wrote:
         | Thanks for this informative video summary. As a layperson, with
         | a BS in Chemistry, it was quite helpful in understanding main
         | bulletpoints of this accomplishment.
        
       | moconnor wrote:
       | Stepping back, the high-order bit here is that an ML method is
       | beating physically-based methods for _accurately_ predicting
       | the world.
       | 
       | What happens when the best methods for computational fluid
       | dynamics, molecular dynamics, nuclear physics are all
       | uninterpretable ML models? Does this decouple progress from our
       | current understanding of the scientific process - moving to
       | better and better models of the world _without_ human-
       | interpretable theories and mathematical models  / explanations?
       | Is that even iteratively sustainable in the way that scientific
       | progress has proven to be?
       | 
       | Interesting times ahead.
        
         | cgearhart wrote:
         | This is a neat observation. Slightly terrifying, but still
         | interesting. Seems like there will also be cases where we
         | discover new theories through the uninterpretable models--much
         | easier and faster to experiment endlessly with a computer.
        
         | fnikacevic wrote:
         | I can only hope the models will be sophisticated enough and
         | willing to explain their reasoning to us.
        
         | thomasahle wrote:
         | > Stepping back, the high-order bit here is an ML method is
         | beating physically-based methods for accurately predicting the
         | world.
         | 
         | I mean, it's just faster, no? I don't think anyone is claiming
         | it's a more _accurate_ model of the universe.
        
           | Jerrrry wrote:
           | Collision libraries and fluid libraries have had baked-in
           | memorized look-up tables that were generated with ML methods
           | nearly a decade ago.
           | 
           | World is still here, although the Matrix/metaverse is
           | becoming more attractive daily.
        
         | xanderlewis wrote:
         | It depends on whether the value of science is human understanding
         | or pure prediction. In some realms (for drug discovery, and
         | other situations where we just need _an answer_ and know what
         | works and what doesn't), pure prediction is all we really need.
         | But if we could build an uninterpretable machine learning model
         | that beats any hand-built traditional 'physics' model, would it
         | really be physics?
         | 
         | Maybe there'll be an intermediate era for a while where ML
         | models outperform traditional analytical science, but then
         | eventually we'll still be able to find the (hopefully limited
         | in number) principles from which it can all be derived. I don't
         | think we'll ever find that Occam's razor is no use to us.
        
           | failTide wrote:
           | > But if we could build an uninterpretable machine learning
           | model that beats any hand-built traditional 'physics' model,
           | would it really be physics?
           | 
           | At that point I wonder if it would be possible to feed that
           | uninterpretable model back into another model that makes
           | sense of it all and outputs sets of equations that humans
           | could understand.
        
           | gmarx wrote:
           | The success of these ML models has me wondering if this is
           | what Quantum Mechanics is. QM is notoriously difficult to
           | interpret yet makes amazing predictions. Maybe wave functions
           | are just really good at predicting system behavior but don't
           | reflect the underlying way things work.
           | 
           | OTOH, Newtonian mechanics is great at predicting things under
           | certain circumstances yet, in the same way, doesn't
           | necessarily reflect the underlying mechanism of the system.
           | 
           | So maybe philosophers will eventually tell us the distinction
           | we are trying to draw, although intuitive, isn't real
        
             | kolinko wrote:
             | That's what thermodynamics is - we initially only had laws
             | about energy/heat flow, and only later we figured out how
             | statistical particle movements cause these effects.
        
           | RandomLensman wrote:
           | Pure prediction is only all we need if the total end-to-end
           | process is predicted correctly - otherwise there could be
           | pretty nasty traps (e.g., drug works perfectly for the target
           | disease but does something unexpected elsewhere etc.).
        
             | gus_massa wrote:
             | > _e.g., drug works perfectly for the target disease but
             | does something unexpected elsewhere etc._
             | 
             | That's very common. It's the reason to test a new drug in
             | a petri dish, then rats, then dogs, then humans, and only
             | if all tests pass, send it to the pharmacy.
        
         | ozten wrote:
         | Science has always given us better, but error-prone, tooling
         | to see further and make better guesses. There is still a
         | scientific test: in a clinical trial, is this new drug safe
         | and effective?
        
         | nexuist wrote:
         | As a steelman, wouldn't the abundance of infinitely
         | generate-able situations make it _easier_ for us to develop strong
         | theories and models? The bottleneck has always been data. You
         | have to do expensive work in the real world and accurately
         | measure it before you can start fitting lines to it. If we were
         | to birth an e.g. atomically accurate ML model of quantum
         | physics, I bet it wouldn't take long until we have mathematical
         | theories that explain why it works. Our current problem is that
         | this stuff is super hard to manipulate and measure.
        
           | moconnor wrote:
           | Maybe; AI chess engines have improved human understanding of
           | the game very rapidly, even though humans cannot beat
           | engines.
        
           | alfalfasprout wrote:
           | This is an important aspect that's being ignored IMO.
           | 
           | For a lot of problems, you currently don't have an
           | analytical solution, and the alternative is a brute-force-ish
           | numerical approach. As a result the computational cost of
           | simulating things enough times to be able to detect behavior
           | that can inform theories/models (potentially yielding a good
           | analytical result) is not viable.
           | 
           | In this regard, ML models are promising.
        
         | CapeTheory wrote:
         | Many of our existing physical models can be decomposed into
         | "high-confidence, well tested bit" plus "hand-wavy empirically
         | fitted bit". I'd like to see progress via ML replacing the
         | empirical part - the real scientific advancement then becomes
         | steadily reducing that contribution to the whole by improving
         | the robust physical model incrementally. Computational
         | performance is another big influence though. Replacing the
         | whole of a simulation with an ML model might still make sense
         | if the model training is transferrable and we can take
         | advantage of the GPU speed-ups, which might not be so easy to
         | apply to the foundational physical model solution. Whether your
         | model needs to be verified against real physical models depends
         | on the seriousness of your use-case; for nuclear weapons and
         | aerospace weather forecasts I imagine it will remain essential,
         | while for a lot of consumer-facing things the ML will be good
         | enough.
        
           | jononor wrote:
           | Physics-informed machine learning is a whole (nascent)
           | subfield that is very much in line with this thinking. Steve
           | Brunton has some good stuff about this on YouTube.
        
         | jncfhnb wrote:
         | These processes are both beyond human comprehension because
         | they contain vast layers of tiny interactions and also not
         | practical to simulate. This tech will allow for exploration of
         | accurate simulations to better understand new ideas if needed.
        
         | tomrod wrote:
         | A few things:
         | 
         | 1. Research can then focus on where things go wrong
         | 
         | 2. ML models, despite being "black boxes," can still have
         | brute-force assessment performed over the parameter space,
         | across areas both covered and uncovered by the input
         | information
         | 
         | 3. We tend to assume parsimony (i.e Occam's razor) to give
         | preference to simpler models when all else is equal. More
         | complex black-box models exceeding in prediction let us know
         | the actual causal pathway may be more complex than simple
         | models allow. This is okay too. We'll get it figured out. Not
         | everything is closed-form, especially considering quantum
         | effects may cause statistical/expected outcomes instead of
         | deterministic outcomes.
        
         | kylebenzle wrote:
         | That is not a real concern, just a confusion on how statistics
         | works :(
        
         | timschmidt wrote:
         | There will be an iterative process built around curated
         | training datasets - continually improved, top tier models,
         | teams reverse engineering the model's understanding and
         | reasoning, and applying that to improve datasets and training.
        
         | adw wrote:
         | > What happens when the best methods for computational fluid
         | dynamics, molecular dynamics, nuclear physics are all
         | uninterpretable ML models?
         | 
         | A better analogy is "weather forecasting".
        
         | jeffreyrogers wrote:
         | I asked a friend of mine who is chemistry professor at a large
         | research university something along these lines a while ago. He
         | said that so far these models don't work well in regions where
         | either theory or data is scarce, which is where most progress
         | happens. So he felt that until they can start making progress
         | in those areas it won't change things much.
        
           | mensetmanusman wrote:
           | Major breakthroughs happen when clear connections can be made
           | and engineered between the many bits of solved but obscured
           | solutions.
        
         | bbor wrote:
         | This is exactly how the physicists felt at the dawn of quantum
         | physics - the loss of meaningful human inquiry to blindly
         | effective statistics. Sobering stuff...
         | 
         | Personally, I'm convinced that human reason is less pure than
         | we think it to be, and that the move to large mathematical
         | models might just be formalizing a lack-of-control that was
         | always there. But that's less of a philosophy of science
         | discussion and more of a cognitive science one
        
         | krzat wrote:
         | We will get better at understanding black boxes; if a model
         | can be compressed into a simple math formula then it's both
         | easier to understand and to compute.
        
         | ldoughty wrote:
         | My argument is: weather.
         | 
         | I think it is fine & better for society to have applications
         | and models for things we don't fully understand... We can model
         | lots of small aspects of weather, and we have a lot of factors
         | nailed down, but not necessarily all the interactions.. and not
         | all of the factors. (Additional example for the same reason:
         | Gravity)
         | 
         | Used responsibly. Of course. I wouldn't think an AI model
         | designing an airplane that no engineers understand how it works
         | is a good idea :-)
         | 
         | And presumably all of this is followed by people trying to
         | understand the results (expanding potential research areas)
        
           | GaggiX wrote:
           | It would be cool to see an airplane made using generative
           | design.
        
             | tech_buddha wrote:
             | How about spaceship parts ?
             | https://www.nasa.gov/technology/goddard-tech/nasa-turns-to-a...
        
         | t14n wrote:
         | A new-ish field of "mechanistic interpretability" is trying to
         | poke at weights and activations and find human-interpretable
         | ideas w/in them. Making lots of progress lately, and there are
         | some folks trying to apply ideas from the field to Alphafold 2.
         | There are hopes of learning the ideas about biology/molecular
         | interactions that the model has "discovered".
         | 
         | Perhaps we're in an early stage of Ted Chiang's story "The
         | Evolution of Human Science", where AIs have largely taken over
         | scientific research and a field of "meta-science" developed
         | where humans translate AI research into more human-
         | interpretable artifacts.
        
         | philip1209 wrote:
         | It makes me think about how Einstein was famous for making
         | falsifiable real-world predictions to accompany his theoretical
         | work. And, sometimes it took years for proper experiments to be
         | run (such as measuring a solar eclipse during the breakout of a
         | world war).
         | 
         | Perhaps the opportunity here is to provide a quicker feedback
         | loop for theory about predictions in the real world. Almost
         | like unit tests.
        
           | HanClinto wrote:
           | > Perhaps the opportunity here is to provide a quicker
           | feedback loop for theory about predictions in the real world.
           | Almost like unit tests.
           | 
           | Or jumping the gap entirely to move towards more self-driven
           | reinforcement learning.
           | 
           | Could one structure the training setup to be able to design
           | its own experiments, make predictions, collect data, compare
           | results, and adjust weights...? If that loop could be closed,
           | then it feels like that would be a very powerful jump indeed.
           | 
           | In the area of LLMs, the SPAG paper from last week was very
           | interesting on this topic, and I'm very interested in seeing
           | how this can be expanded to other areas:
           | 
           | https://github.com/Linear95/SPAG
        
           | goggy_googy wrote:
           | Agreed. At the very least, models of this nature let us
           | iterate/filter our theories a little bit more quickly.
        
             | jprete wrote:
             | The model isn't reality. A theory that disagrees with the
             | model but agrees with reality shouldn't be filtered, but in
             | this process it will be.
        
         | mnky9800n wrote:
         | I believe it simply tells us that our understanding of
         | mechanical systems, especially chaotic ones, is not as well
         | defined as we thought.
         | 
         | https://journals.aps.org/prresearch/abstract/10.1103/PhysRev...
        
         | thelastparadise wrote:
         | The ML models will help us understand that :)
        
         | jes5199 wrote:
         | every time the two systems disagree, it's an opportunity to
         | learn something. both kinds of models can be improved with new
         | information gained through real-world experiments
        
         | ogogmad wrote:
         | Some machine learning models might be more interpretable than
         | others. I think the recent "KAN" model might be a step forward.
        
         | dekhn wrote:
         | If you're a scientist who works in protein folding (or one of
         | those other areas) and strongly believe that science's goal is
         | to produce falsifiable hypotheses, these new approaches will be
         | extremely depressing, especially if you aren't proficient
         | enough with ML to reproduce this work in your own hands.
         | 
         | If you're a scientist who accepts that probabilistic models beat
         | interpretable ones (articulated well here:
         | https://norvig.com/chomsky.html), then you'll be quite happy
         | because this is yet another validation of the value of
         | statistical approaches in moving our ability to predict the
         | universe forward.
         | 
         | If you're the sort of person who believes that human brains are
         | capable of understanding the "why" of how things work in all
         | its true detail, you'll find this an interesting challenge- can
         | we actually interpret these models, or are human brains too
         | feeble to understand complex systems without sophisticated
         | models?
         | 
         | If you're the sort of person who likes simple models with as
         | few parameters as possible, you're probably excited because
         | developing more comprehensible or interpretable models that
         | have equivalent predictive ability is a very attractive
         | research subject.
         | 
         | (FWIW, I'm in the camp of "we should simultaneously seek
         | simpler, more interpretable models, while also seeking to
         | improve native human intelligence using computational
         | augmentation")
        
           | narrator wrote:
           | What if our understanding of the laws of the natural sciences
           | is subtly flawed and AI just corrects perfectly for our
           | flawed understanding without telling us what the error in our
           | theory was?
           | 
           | Forget trying to understand dark matter. Just use this model
           | to correct for how the universe works. What is actually wrong
           | with our current model and if dark matter exists or not or
           | something else is causing things doesn't matter. "Shut up and
           | calculate" becomes "Shut up and do inference."
        
             | dekhn wrote:
             | All models are wrong, but some models are useful.
        
             | RandomLensman wrote:
             | High accuracy could result from pretty incorrect models.
             | When and where that would then go completely off the rails
             | is difficult to say.
        
             | visarga wrote:
             | ML is accustomed to the idea that all models are bad, and
             | there are ways to test how good or bad they are. It's all
             | approximations and imperfect representations, but they can
             | be good enough for some applications.
             | 
             | If you think carefully, humans operate in the same regime.
             | Our concepts are all like that - imperfect, approximate,
             | glossing over some details. Our fundamental grounding and
             | test is survival, an unforgiving filter, but lax enough to
             | allow for anti-vaxxer movements during the pandemic - the
             | survival test does not test for truth directly, only
             | filters out ideas that fail to support life.
        
               | mistermann wrote:
               | Also lax enough for the _hilarious_ mismanagement of the
               | situation by  "the experts". At least anti-vaxxers have
               | an excuse.
        
           | croniev wrote:
           | I'm in the following camp: It is wrong to think about the
           | world or the models as "complex systems" that may or may not
           | be understood by human intelligence. There is no meaning
           | beyond that which is created by humans. There is no 'truth'
           | that we can grasp in parts but not entirely. Being unable to
           | understand these complex systems means that we have framed
           | them in such a way (e.g. millions of matrix operations) that
           | does not allow for our symbol-based, causal reasoning mode.
           | That is on us, not our capabilities or the universe.
           | 
           | All our theories are built on observation, so these empirical
           | models yielding such useful results is a great thing - it
           | satisfies the need for observing and acting. Missing
           | explainability of the models merely means we have less
           | ability to act more precisely - but it does not devalue our
           | ability to act coarsely.
        
             | visarga wrote:
             | But the human brain has limited working memory and
             | experience. Even in software development we are often
             | teetering at the edge of the mental power to grasp and
             | relate ideas. We have tried so much to manage complexity,
             | but real world complexity doesn't care about human
             | capabilities. So there might be high dimensional problems
             | where we simply can't use our brains directly.
        
               | jvanderbot wrote:
               | A human mind is perfectly capable of following the same
               | instructions as the computer did. Computers are stupidly
               | simple and completely deterministic.
               | 
               | The concern is about "holding it all in your head", and
               | depending on your preferred level of abstraction, "all"
               | can perfectly reasonably be held in your head. For
               | example: "This program generates the most likely outputs"
               | makes perfect sense to me, even if I don't understand
               | some of the code. I understand the _system_. Programmers
               | went through this decades ago. Physicists had to do it
               | too. Now, chemists I suppose.
        
               | GenerocUsername wrote:
               | This is just wrong.
               | 
               | While each individual computer operation is computable by
               | a human, the billions of rapid computations are
               | unachievable by humans. In just a few seconds, a computer
               | can perform more basic arithmetic operations than a human
               | could in a lifetime.
        
               | jvanderbot wrote:
               | I'm not saying it's achievable, I'm saying it's not
               | magic. A chemist who wishes to understand what the model
               | is doing can get as far as anyone else, and can reach a
               | level of "this prediction machine works well and I
               | understand how to use and change it". Even if it requires
               | another PhD in CS.
               | 
               | That the tools became complex is not a reason to fret in
               | science. No more than statistical physics or quantum
               | mechanics or CNN for image processing - it's complex and
               | opaque and hard to explain but perfectly reproducible.
               | "It works better than my intuition" is a level of
               | sophistication that most methods are probably doomed to
               | achieve.
        
               | ajuc wrote:
               | Abstraction isn't the silver bullet. Not everything is
               | abstractable.
               | 
               | "This program generates the most likely outputs" isn't a
               | scientific explanation, it's teleology.
        
               | jvanderbot wrote:
               | "this tool works better than my intuition" absolutely is
                | science. "be quiet and calculate" is a well-worn mantra
                | in physics, is it not?
        
               | mistermann wrote:
               | What is an example of something that isn't abstractable?
        
             | slibhb wrote:
             | > There is no 'truth' that we can grasp in parts but not
             | entirely.
             | 
             | If anyone actually thought this way -- no one does -- they
             | definitely wouldn't build models like this.
        
             | EventH- wrote:
             | "There is no 'truth' that we can grasp in parts but not
             | entirely."
             | 
             | The value of pi is a simple counterexample.
        
             | Invictus0 wrote:
             | > There is no 'truth' that we can grasp in parts but not
             | entirely
             | 
             | It appears that your own comment is disproving this
             | statement
        
           | divbzero wrote:
           | There have been times in the past when usable technology
           | surpassed our scientific understanding, and instead of being
           | depressing it provided a map for scientific exploration. For
           | example, the steam engine was developed by engineers in the
           | 1600s/1700s (Savery, Newcomen, and others) but thermodynamics
           | wasn't developed by scientists until the 1800s (Carnot,
           | Rankine, and others).
        
             | jprete wrote:
             | I think the various contributors to the invention of the
             | steam engine had a good idea of what they were trying to do
             | and how their idea would physically work. Wikipedia lists
             | the prerequisites as the concepts of a vacuum and pressure,
             | methods for creating a vacuum and generating steam, and the
             | piston and cylinder.
        
               | exe34 wrote:
               | That's not too different from the alpha fold people
               | knowing that there's a sequence to sequence translation,
                | that an enormous amount of cross-talk happens between
               | parts of the molecule, that if you get the potential
               | fields just right, it'll fold in the way nature intended.
               | They're not just blindly fiddling with a bunch of levers.
               | What they don't know is the individual detailed
               | interactions going on and how to approximate them with
               | analytical equations.
        
           | interroboink wrote:
           | > ... and strongly believe that science's goal is to produce
           | falsifiable hypotheses, these new approaches will be
           | extremely depressing
           | 
           | I don't quite understand this point -- could you elaborate?
           | 
           | My understanding is that the ML model produces a hypothesis,
           | which can then be tested via normal scientific method
           | (perform experiment, observe results).
           | 
           | If we have a magic oracle that says "try this, it will work",
           | and then we try it, and it works, we still got something
           | falsifiable out of it.
           | 
           | Or is your point that we won't necessarily have a
           | coherent/elegant explanation for _why_ it works?
        
             | dekhn wrote:
             | People will be depressed because they spent decades getting
             | into professorship positions and publishing papers with
             | ostensible comprehensible interpretations of the generative
             | processes that produced their observations, only to be
             | "beat" in the game by a system that processed a lot of
              | observations and can make predictions in a way that no
             | individual human could comprehend. And those professors
             | will have a harder time publishing, and therefore getting
             | promoted, in the future.
             | 
             | Whether ML models produce hypotheses is something of an
              | epistemological argument that I think muddies the waters
             | without bringing any light. I would only use the term "ML
             | models generate predictions". In a sense, the model itself
             | is the hypothesis, not any individual prediction.
        
             | variadix wrote:
             | There is an issue scientifically. I think this point was
             | expressed by Feynman: the goal of scientific theories isn't
             | just to make better predictions, it's to inform us about
             | how and why the world works. Many ancient civilizations
             | could accurately predict the position of celestial bodies
             | with calendars derived from observations of their period,
             | but it wasn't until Copernicus proposed the heliocentric
             | model and Galileo provided supporting observations that we
             | understood the why and how, and that really matters for
             | future progress and understanding.
        
               | interroboink wrote:
                | I agree the how/why is the main driving goal. That's
                | kinda why I feel like this is _not_ depressing news --
                | there's a new frontier to discover and attempt to
                | explain. Scientists love that stuff (:
               | 
               | Knowing how to predict the motion of planets but without
               | having an underlying explanation encourages scientists to
               | develop their theories. Now, once more, we know how to
               | predict something (protein folding) but without an
               | underlying explanation. Hurray, something to investigate!
               | 
               | (Aside: I realize that there are also more human factors
               | at play, and upsetting the status quo will always cause
               | some grief. I just wanted to provide a counterpoint that
               | there is some exciting progress represented here, too).
        
               | variadix wrote:
               | I was mainly responding to the claim that these black
               | boxes produce a hypothesis that is useful as a basis for
                | scientific theories. I don't think they do, because they
                | offer no explanation as to the how and why, which is as
               | we agree the primary goal. It doesn't provide a
               | hypothesis per se, just a prediction, which is useful
               | technologically and should indicate that there is more to
               | be discovered (see my response to the sibling reply)
               | scientifically but offers no motivating explanation.
        
               | Invictus0 wrote:
               | But we do know why, it's just not simple. The atoms
               | interact with one another because of a variety of
               | fundamental forces, but since there can be hundreds of
               | thousands of atoms in a single protein, it's plainly
               | beyond human comprehension to explain why it folds the
               | way it does, one fundamental force interaction at a time.
        
               | variadix wrote:
               | Fair. I guess the interesting thing for protein folding
               | research then is that there appears to be a way to
               | approximate/simplify the calculations required to predict
               | folding patterns that doesn't require the precision of
               | existing folding models and software. In essence,
               | AlphaFold is an existence proof that there should be a
               | way to model protein folding more efficiently.
        
           | coffeemug wrote:
            | _> If you're the sort of person who believes that human
           | brains are capable of understanding the "why" of how things
           | work in all its true detail_
           | 
           | This seems to me an empirical question about the world. It's
           | clear our minds are limited, and we understand complex
           | phenomena through abstraction. So either we discover we can
           | continue converting advanced models to simpler abstractions
           | we can understand, or that's impossible. Either way, it's
           | something we'll find out and will have to live with in the
           | coming decades. If it turns out further abstractions aren't
           | possible, well, enlightenment thought had lasted long enough.
           | It's exciting to live at a time in humanity's history when we
           | enter a totally uncharted new paradigm.
        
           | jprete wrote:
           | The goal of science has always been to discover underlying
           | principles and not merely to predict the outcome of
           | experiments. I don't see any way to classify an opaque ML
           | model as a scientific artifact since by definition it can't
           | reveal the underlying principles. Maybe one could claim the
           | ML model itself is the scientist and everyone else is just
           | feeding it data. I doubt human scientists would be
           | comfortable with that, but if they aren't trying to explain
           | anything, what are they even doing?
        
             | fire_lake wrote:
             | What if the underlying principles of the universe are too
             | complex for human understanding but we can train a model
             | that very closely follows them?
        
               | dekhn wrote:
               | Then we should dedicate large fractions of human
               | engineering towards finding ethical ways to improve human
               | intelligence so that we can appreciate the underlying
               | principles better.
        
               | refulgentis wrote:
                | I spent about 30 minutes reading this thread and links
               | from it: I don't really follow your line of argument. I
               | find it fascinating and well-communicated, the lack of
               | understanding is on me: my attention flits around like a
               | butterfly, in a way that makes it hard for me to follow
               | people writing original content.
               | 
               | High level, I see a distinction between theory and
               | practice, between an oracle predicting without
               | explanation, and a well-thought out theory built on a
               | partnership between theory and experiment over centuries,
               | ex. gravity.
               | 
               | I have this feeling I can't shake that the knife you're
               | using is too sharp, both in the specific example we're
               | discussing, and in general.
               | 
               | In the specific example, folding, my understanding is we
               | know how proteins fold & the mechanisms at work. It just
               | takes an ungodly amount of time to compute and you'd
               | still confirm with reality anyway. I might be completely
               | wrong on that.
               | 
                | Given that, the proposal to "dedicate...engineer[s]
                | towards finding ethical ways to improve...intelligence so
                | that we can appreciate the underlying principles better"
                | presupposes that we're not already appreciating the
                | underlying principles.
               | 
               | It feels like a close cousin of physics
               | theory/experimentalist debate pre-LHC, circa 2006: the
               | experimentalists wanted more focus on building colliders
               | or new experimental methods, and at the extremes, thought
                | string theory was a complete waste of time.
               | 
               | Which was working towards appreciating the underlying
               | principles?
               | 
               | I don't really know. I'm not sure there's a strong divide
               | between the work of recording reality and explaining it.
               | I'll peer into a microscope in the afternoon, and take a
               | shower in the evening, and all of a sudden, free
               | associating gives me a more high-minded explanation for
               | what I saw.
               | 
                | I'm not sure a distinction exists for protein folding;
                | in fact, I'm virtually certain this distinction does not
                | exist in reality, only in extremely stilted examples
                | (i.e. a very successful oracle at Delphi)
        
               | mistermann wrote:
               | There's a much easier route: consciousness is not
               | included in the discussion...what a coincidence.
        
               | Wilduck wrote:
               | That sounds like useful engineering, but not useful
               | science.
        
               | mrbungie wrote:
               | I think that a lot of scientific discoveries originate
               | from initial observations made during engineering work or
               | just out of curiosity without rigour.
               | 
               | Not saying ML methods haven't shown important
               | reproducibility challenges, but to just shut them down
               | due to not being "useful science" is inflexible.
        
             | dekhn wrote:
             | That's the aspirational goal. And I would say that it's a
             | bit of an inflexible one- for example, if we had an ML that
             | could generate molecules that cure diseases that would pass
             | FDA approval, I wouldn't really care if scientists couldn't
             | explain the underlying principles. But I'm an ex-scientist
             | who is now an engineer, because I care more about tools
             | that produce useful predictions than understanding
             | underlying principles. I used to think that in principle we
             | could identify all the laws of the universe, and in theory,
              | simulate them with enough accuracy, and inspect the
             | results, and gain enlightenment, but over time, I've
             | concluded that's a really bad way to waste lots of time,
             | money, and resources.
        
               | panarky wrote:
               | It's not either-or, it's yes-and. We don't have to
               | abandon one for the other.
               | 
               | AlphaFold 3 can rapidly reduce a vast search space in a
               | way physically-based methods alone cannot. This narrowly
               | focused search space allows scientists to apply their
               | rigorous, explainable, physical methods, which are slow
               | and expensive, to a small set of promising alternatives.
               | This accelerates drug discovery and uncovers insights
               | that would otherwise be too costly or time-consuming.
               | 
               | The future of science isn't about AI versus traditional
               | methods, but about their intelligent integration.
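The two-stage "yes-and" pipeline described above can be sketched in miniature. Both scoring functions are hypothetical stand-ins, not real AlphaFold or docking APIs:

```python
# Coarse-to-fine screen: a fast, cheap model prunes a candidate space,
# then a slow, rigorous method is applied only to the survivors.

def fast_ml_score(candidate):
    # cheap learned predictor: think milliseconds per candidate
    return sum(ord(c) for c in candidate) % 100

def rigorous_physical_score(candidate):
    # expensive, trusted method: think hours per candidate
    return len(candidate) * 10

candidates = ["liganda", "ligandb", "molx", "moly", "molz"]

# Stage 1: keep only the top few by the cheap score.
shortlist = sorted(candidates, key=fast_ml_score, reverse=True)[:2]

# Stage 2: spend the expensive method only on the shortlist.
best = max(shortlist, key=rigorous_physical_score)
print(shortlist, best)
```

The design choice is that the expensive method's cost now scales with the shortlist, not with the original search space, which is the sense in which the ML stage "accelerates" rather than replaces the rigorous one.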
        
               | nextos wrote:
               | Or you can treat AlphaFold as a black box / oracle and
               | work at systems biology level, i.e. at pathway and
               | cellular level. Protein structures and interactions are
               | always going to be hard to predict with interpretable
               | models, which I also prefer.
               | 
               | My only worry is that AlphaFold and others, e.g. ESM,
                | seem to be a bit fragile for out-of-distribution
                | sequences.
               | They are not doing a great job with unusual sequences, at
               | least in my experience. But hopefully they will improve
               | and provide better uncertainty measures.
        
             | exe34 wrote:
             | The ML model can also be an emulator of parts of the system
             | that you don't want to personally understand, to help you
             | get on with focusing on what you do want to figure out.
             | Alternatively, the ML model can pretend to be the real
             | world while you do experiments with it to figure out
             | aspects of nature in minutes rather than hours-days of
             | biological turnaround.
        
             | strogonoff wrote:
             | Can underlying principles be discovered using the framework
             | of scientific method? The primary goal of models and
             | theories it develops is to support more experiments and
             | eventually be disproven. If no model can be correct,
             | complete and provable in finite time, then a theory about
             | underlying principles that claims completeness would have
             | to be unfalsifiable. This is reasonable in context of
             | philosophy, but not in natural sciences.
             | 
             | Scientific method can help us rule out what underlying
             | principles are definitely _not_. Any such principles are
             | not actually up to be "discovered".
             | 
             | If probabilistic ML comes along and does a decent job at
             | predicting things, we should keep in mind that those
             | predictions are made not in context of absolute truth, but
             | in context of theories and models we have previously
             | developed. I.e., it's not just that it can predict how
             | molecules interact, but that the entire concept of
             | molecules is an artifact of just some model we (humans)
             | came up with previously--a model which, per above, is
             | probably incomplete/incorrect. (We could or should use this
             | prediction to improve our model or come up with a better
             | one, though.)
             | 
             | Even if a future ML product could be creative enough to
             | actually come up with and iterate on models all on its own
             | from first principles, it would not be able to give us the
             | answer to the question of underlying principles for the
             | above-mentioned reasons. It could merely suggest us another
             | incomplete/incorrect model; to believe otherwise would be
             | to ascribe it qualities more fit for religion than science.
        
               | jltsiren wrote:
               | I don't find that argument convincing.
               | 
               | People clearly have been able to discover many underlying
               | principles using the scientific method. Then they have
               | been able to explain and predict many complex phenomena
               | using the discovered principles, and create even more
               | complex phenomena based on that. Complex phenomena such
               | as the technology we are using for this discussion.
               | 
                | Words don't have any inherent meaning, just the meaning
               | they gain from usage. The entire concept of truth is an
               | artifact of just some model (language) we came up with
               | previously--a model which, per above, is probably
               | incomplete/incorrect. The kind of absolute truth you are
               | talking about may make sense when discussing philosophy
               | or religion. Then there is another idea of truth more
               | appropriate for talking about the empirical world. Less
               | absolute, less immutable, less certain, but more
               | practical.
        
             | ak_111 wrote:
              | Discovering underlying principles and predicting outcomes
              | are two sides of the same coin, in that there is no way to
             | confirm you have discovered underlying principles unless
             | they have some predictive power.
             | 
              | Some have tried to come up with other criteria to confirm
              | you have discovered an underlying principle without
              | predictive power, such as aesthetics - but this is seen
             | by the majority of scientists as basically a cop out. See
             | debate around string theory.
             | 
             | Note that this comment is summarizing a massive debate in
             | the philosophy of science.
        
               | thfuran wrote:
               | >there is no way to confirm you have discovered
               | underlying principles unless they have some predictive
               | power.
               | 
               | Yes, but a perfect oracle has no explanatory power, only
               | predictive.
        
               | nkingsy wrote:
               | increasing the volume of predictions produces patterns
               | that often lead to underlying principles.
        
               | mikeyouse wrote:
               | And much of the 20th century was characterized by a very
               | similar progression - we had no clue what the actual
               | mechanism of action was for hundreds of life saving drugs
               | until relatively recently, and we still only have best
               | guesses for many.
               | 
               | That doesn't diminish the value that patients received in
               | any way even though it would be more satisfying to make
               | predictions and design something to interact in a way
               | that exactly matches your theory.
        
               | chasd00 wrote:
               | If all you can do is predict an outcome without being
               | able to explain how then what have you really discovered?
               | Asking someone to just believe you can predict outcomes
               | without any reasoning as to how, even if you're always
               | right, sounds like the concept of faith in religion.
        
               | pas wrote:
               | it's still an extremely valuable tool. just as we see in
               | mathematics, closed forms (and short and elegant proofs)
               | are much coveted luxury items.
               | 
               | for many basic/fundamental mathematical objects we don't
               | (yet) have simple mechanistic ways to compute them.
               | 
               | so if a probabilistic model spits out something very
               | useful, we can slap a nice label on it and call it a day.
               | that's how engineering works anyway. and then hopefully
               | someday someone will be able to derive that result from
               | "first principles" .. maybe it'll be even more
               | funky/crazy/interesting ... just like mathematics
               | arguably became more exciting by the fact that someone
               | noticed that many things are not provable/constructable
               | without an explicit Axiom of Choice.
               | 
               | https://en.wikipedia.org/wiki/Nonelementary_integral#Exam
               | ple...
        
               | thfuran wrote:
               | >closed forms (and short and elegant proofs) are much
               | coveted luxury items.
               | 
               | Yes, but we're taking about roughly the opposite of a
               | proof
        
               | jcims wrote:
               | Isn't that basically true of most of the fundamental laws
               | of physics? There's a lot we don't understand about
               | gravity, space, time, energy, etc., and yet we compose
               | our observations of how they behave into very useful
               | tools.
        
               | dumpsterdiver wrote:
               | > what have you really discovered?
               | 
               | You've discovered magic.
               | 
               | When you read about a wizard using magic to lay waste to
               | invading armies, how much value would you guess the
               | armies place in whether or not the wizard truly
               | understands the magic being used against them?
               | 
               | Probably none. Because the fact that the wizard doesn't
               | fully understand why magic works does not prevent the
               | wizard from using it to hand invaders their asses.
               | Science is very much the same - our own wizards used
               | medicine that they did not understand to destroy invading
               | hordes of bacteria.
        
             | toxik wrote:
             | Kepler famously compiled troves of data on the night sky,
             | and just fitted some functions to them. He could not
             | explain why but he could say what. Was he not a scientist?
        
             | Invictus0 wrote:
             | Maybe the science of the past was studying things of lesser
             | complexity than the things we are studying now.
        
             | SJC_Hacker wrote:
             | What if it turns out that nature simply doesn't have nice,
             | neat models that humans can comprehend for many observable
             | phenomena?
        
             | gradus_ad wrote:
             | That ship sailed with Quantum physics. Nearly perfect at
             | prediction, very poor at giving us a concrete understanding
             | of what it all means.
             | 
             | This has happened before. Newtonian mechanics was
             | incomprehensible spooky action at a distance, but Einstein
             | clarified gravity as the bending of spacetime.
        
           | pishpash wrote:
           | So the work to simplify ML models, reduce dimensions, etc.
           | becomes the numeric way to seek simple actual scientific
           | models. Scientific computing and science become one.
        
           | ThomPete wrote:
           | The goal of science should always be to seek good
           | explanations hard to vary.
        
           | RajT88 wrote:
           | > can we actually interpret these models, or are human brains
           | too feeble to understand complex systems without
           | sophisticated models?
           | 
           | I think we will have to develop a methodology and supporting
           | toolset to be able to derive the underlying patterns driving
           | such ML models. It's just too much for a human to comb
           | through by themselves and make sense of.
        
         | tobrien6 wrote:
         | I suspect that ML will be state-of-the-art at generating human-
         | interpretable theories as well. Just a matter of time.
        
         | sdwr wrote:
         | > Does this decouple progress from our current understanding of
         | the scientific process?
         | 
         | Thank God! As a person who uses my brain, I think I can say,
         | pretty definitively, that people are bad at understanding
         | things.
         | 
         | If this actually pans out, it means we will have harnessed
         | knowledge/truth as a fundamental force, like fire or
         | electricity. The "black box" as a building block.
        
           | tantalor wrote:
           | This type of thing is called an "oracle".
           | 
           | We've had stuff like this for a long time.
           | 
           | Notable examples:
           | 
           | - Temple priestesses
           | 
           | - Tea-leaf reading
           | 
           | - Water scrying
           | 
           | - Palmistry
           | 
           | - Clairvoyance
           | 
           | - Feng shui
           | 
           | - Astrology
           | 
           | The only difference is, the ML model is really quite good at
           | it.
        
             | unsupp0rted wrote:
             | > The only difference is, the ML model is really quite good
             | at it.
             | 
             | That's the crux of it: we've had theories of physics and
             | chemistry since before writing was invented.
             | 
             | None of that mattered until we came upon the ones that
             | actually work.
        
         | insane_dreamer wrote:
         | For me the big question is how do we confidently validate the
         | output of this/these model(s).
        
           | topaz0 wrote:
           | It's the right question to ask, and the answer is that we
           | will still have to confirm them by experimental structure
           | determination.
        
         | tambourine_man wrote:
         | Our metaphors and intuitions were crumbling already and
         | stagnating. See quantum physics: sometimes a particle,
         | sometimes a wave, and what constitutes a measurement anyway?
         | 
         | I'll take prediction over understanding if that's the best our
         | brains can do. We've evolved to deal with a few orders of
         | magnitude around a meter and a second. Maybe dealing with
         | light-years and femtometer/seconds is too much to ask.
        
         | dyauspitr wrote:
         | Whatever it is if we needed to we could follow each instruction
         | through the black box. It's never going to be as opaque as
         | something organic.
        
         | wslh wrote:
         | This is the topic of epistemology of the sciences in books such
         | as "New Direction in the Philosophy of Mathematics" [1] and
         | happened before with problems such as the four color theorem
         | [2] where AI was not involved.
         | 
         | Going back to the uninterpretable ML models in the context of
         | AlphaFold 3, I think one method for trying to explain the
         | findings is similar to the experimental methods of physics with
         | reality: you perform experiments with the reality (in this case
         | AlphaFold 3) to come up with sound conclusions. AI/ML is an
         | interesting black-box system.
         | 
         | There are other open discussions on this topic. For example,
         | can our human brain absorb that knowledge, or is it somehow
         | limited by the scientific language that we have now?
         | 
         | [1]
         | https://www.google.com.ar/books/edition/New_Directions_in_th...
         | 
         | [2] https://en.wikipedia.org/wiki/Four_color_theorem
        
         | torrefatto wrote:
         | You are conflating the whole scientific endeavor with a very
         | specific problem to which this specific approach is effective
         | at producing results that fit with the observable world. This
         | has nothing to do with science as a whole.
        
         | scotty79 wrote:
         | We should be thankful that we live in the universe that obeys
         | math simple enough to comprehend that we were able to reach
         | that level.
         | 
         | Imagine if optics were complex enough that it required an ML
         | model to predict anything.
         | 
         | We'd be in permanent stone age without a way out.
        
           | lupire wrote:
           | What would a universe look like that lacked simple things,
           | and somehow only complex things existed?
           | 
           | It makes me think of how some rings of integers, like
           | Z[√-5], have irreducible elements that are not prime, so
           | some large things cannot be uniquely expressed as a
           | combination of smaller things.
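The failure of unique factorization being gestured at can be checked concretely in the ring Z[√-5], where irreducible elements need not be prime. A small illustrative sketch (the representation is an arbitrary choice, not from the thread):

```python
# Elements of Z[sqrt(-5)] represented as pairs (a, b) meaning a + b*sqrt(-5).
def mul(x, y):
    """Multiply (a + b*r) by (c + d*r), where r*r = -5."""
    a, b = x
    c, d = y
    return (a * c - 5 * b * d, a * d + b * c)

# 6 factors in two genuinely different ways:
#   6 = 2 * 3   and   6 = (1 + sqrt(-5)) * (1 - sqrt(-5))
assert mul((2, 0), (3, 0)) == (6, 0)
assert mul((1, 1), (1, -1)) == (6, 0)
```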
        
         | mberning wrote:
         | I would assume that given enough hints from AI and if it is
         | deemed important enough humans will come in to figure out the
         | "first principles" required to arrive at the conclusion.
        
           | RobCat27 wrote:
           | I believe this is the case also. With a well enough
           | performing AI/ML/probabilistic model where you can change the
           | model's input parameters and get a highly accurate prediction
           | basically instantly, we can test theories approximately and
           | extremely fast rather than running completely new
           | experiments, which will always come with its own set of
           | errors and problems.
        
         | danielmarkbruce wrote:
         | "better and better models of the world" does not always mean
         | "more accurate" and never has.
         | 
         | We already know how to model the vast majority of things, just
         | not at a speed and cost which makes it worthwhile. There are
         | dimensions of value - one is accuracy, another speed, another
         | cost, and in different domains additional dimensions. There are
         | all kinds of models used in different disciplines which are
         | empirical and not completely understood. Reducing things to the
         | lowest level of physics and building up models from there has
         | never been the only approach. Biology, geology, weather,
         | materials all have models which have hacks in them, known
         | simplifications, statistical approximations, so the result can
         | be calculated. It's just about choosing the best hacks to get
         | the best trade off of time/money/accuracy.
        
         | Gimpei wrote:
         | Might be easier to come up with new models with analytic
         | solutions if you have a probabilistic model at hand. A lot
         | easier to evaluate against data and iterate. Also, I wouldn't
         | be surprised if we develop better tools for introspecting these
         | models over time.
        
         | UniverseHacker wrote:
         | It means we now have an accurate surrogate model or "digital
         | twin" that can be experimented on almost instantaneously. So we
         | can massively accelerate the traditional process of developing
         | mechanistic understanding through experiment, while _also_
         | immediately be able to benefit from the ability to make
         | accurate predictions, even without needing understanding.
         | 
         | In reality, science has already pretty much gone this way long
         | ago, even if people don't like to admit it. Simple,
         | reductionist explanations for complex phenomena in living
         | systems don't really exist. Virtually all of medicine nowadays
         | is empirical: try something, and if you can prove its safe and
         | effective, you keep doing it. We almost never have a meaningful
         | explanation for how it really works, and when we think we do,
         | it gets proven wrong repeatedly, while the treatment keeps
         | working as always.
        
           | mathgradthrow wrote:
           | instead of "in mice", we'll be able to say "in the cloud"
        
             | unsupp0rted wrote:
             | In vivo in humans in the cloud
        
               | dekhn wrote:
               | one of the companies I worked for, "insitro", is
               | specifically named that to mean the combination of "in
               | vivo, in vitro, in silico".
        
             | topaz0 wrote:
             | "In nimbo" (though what people actually say is "in
             | silico").
        
             | d_silin wrote:
             | "in silico"
        
           | imchillyb wrote:
           | Medicine, and why it works the way it does, can be explained
           | fairly simply by this analogy:
           | 
           | Imagine a very large room that has every surface covered by
           | on-off switches.
           | 
           | We cannot see inside of this room. We cannot see the
           | switches. We cannot fit inside of this room, but a toddler
           | fits through the tiny opening leading into the room. The
           | toddler cannot reach the switches, so we equip the toddler
           | with a pole that can flip the switches. We train the toddler,
           | as much as possible, to flip a switch using the pole.
           | 
           | Then, we send the toddler into the room and ask the toddler
           | to flip the switch or switches we desire to be flipped, and
           | then do tests on the wires coming out of the room to see if
           | the switches were flipped correctly. We also devise some
           | tests for other wires to see if that naughty toddler flipped
           | other switches on or off.
           | 
           | We cannot see inside the room. We cannot monitor the toddler.
           | We can't know what _exactly_ the toddler did inside the room.
           | 
           | That room is the human body. The toddler with a pole is a
           | medication.
           | 
           | We can't see or know enough to determine what was activated
           | or deactivated. We can invent tests to narrow the scope of
           | what was done, but the tests can never be 100% accurate
           | because we can't test for every effect possible.
           | 
           | We introduce chemicals then we hope-&-pray that the chemicals
           | only turned on or off the things we wanted turned on or off.
           | Craft some qualifications testing for proofs, and do a 'long-
           | term' study to determine if there were other things turned on
           | or off, or a short circuit occurred, or we broke something.
           | 
           | I sincerely hope that even without human understanding, our
           | AI models can determine what switches are present, which ones
           | are on and off, and how best to go about selecting for the
           | correct result.
           | 
           | Right now, modern medicine is almost a complete crap-shoot.
           | Hopefully modern AI utilities can remedy the gambling aspect
           | of medicine discovery and use.
        
         | tnias23 wrote:
         | I wonder if ML can someday be employed in deciphering such
         | black box problems; a second model that can look under the hood
         | at all the number crunching performed by the predictive model,
         | identify the pattern that resulted in a prediction, and present
         | it in a way we can understand.
         | 
         | That said, I don't even know if ML is good at finding patterns
         | in data.
        
           | lupire wrote:
           | > That said, I don't even know if ML is good at finding
           | patterns in data.
           | 
           | That's the only thing ML does.
        
         | burny_tech wrote:
         | We need to advance mechanistic interpretability (the field of
         | reverse-engineering neural networks):
         | https://www.youtube.com/watch?v=P7sjVMtb5Sg
         | https://www.youtube.com/watch?v=7t9umZ1tFso
         | https://www.youtube.com/watch?v=2Rdp9GvcYOE
        
         | goggy_googy wrote:
         | I think at some point, we will be able to produce models that
         | are able to pass data into a target model and observe its
         | activations and outputs and put together some interpretable
         | pattern or loose set of rules that govern the input-output
         | relationship in the target model. Using this on a model like
         | AlphaFold might enable us to translate inferred chemical laws
         | into natural language.
        
         | pen2l wrote:
         | The most moneyed and well-coordinated organizations have honed
         | a large hammer, and they are going to use it for everything,
         | and so almost certainly, for future big findings in the areas
         | you mention, probabilistically inclined models coming from ML
         | will be the new gold standard.
         | 
         | But yet the only thing that can save us from ML will be ML
         | itself because it is ML that has the best chance to be able to
         | extrapolate patterns from these blackbox models to develop
         | human interpretable models. I hope we do dedicate explicit
         | effort to this endeavor, and so continue the advance and
         | expansion of human knowledge, with human ingenuity and
         | computers at our assistance.
        
           | optimalsolver wrote:
           | Spoiler: "Interpretable ML" will optimize for output that
           | either looks plausible to humans, reinforces our
           | preconceptions, or appeals to our aesthetic instincts. It
           | will not converge with reality.
        
             | kolinko wrote:
             | That is not considered interpretable then, and I think most
             | people working in the field are aware of this gotcha.
             | 
             | Iirc when EU required banks to have interpretable rules for
             | loans, a plain explanation was not considered enough. What
             | was required was a clear process that was used from the
             | beginning - i.e. you can use an AI to develop an algorithm
             | to make a decision, but you can't use AI to make a decision
             | and explain the reasons afterwards.
        
             | DoctorOetker wrote:
             | Spoiler: basic / hard sciences describe nature
             | mathematically.
             | 
             | Open a random physics book, and you will find lots and lots
             | of derivations (using more or less acceptable assumptions
             | depending on circumstance under consideration).
             | 
             | Derivations and assumptions can be formally verified, see
             | for example https://us.metamath.org
             | 
             | Ever more intelligent machine learning algorithms and data
             | structures replacing human heuristic labor, will simply
             | shift the expected minimum deliverable from associations to
             | ever more rigorous proofs in terms of less and less
             | assumptions.
             | 
             | Machine learning will ultimately be used as automated
             | theorem provers, and their output will eventually be
             | explainable by definition.
             | 
             | When do we classify an explanation as explanatory? When it
             | succeeds in deriving a conclusion from acceptable
             | assumptions without hand waving. Any hand waving would
             | result in the "proof" not having passed formal
             | verification.
        
         | thegrim33 wrote:
         | Reminds me of the novel Blindsight - in it there's special
         | individuals who work as synthesists, whose job it is to observe
         | and understand and then somehow translate back to "lay person"
         | the seemingly undecipherable actions/decisions of advanced
         | computers and augmented humans.
        
         | 6gvONxR4sf7o wrote:
         | "Best methods" is doing a lot of heavy lifting here. "Best" is
         | a very multidimensional thing, with different priorities
         | leading to different "bests." Someone will inevitably
         | prioritize reliability/accuracy/fidelity/interpretability, and
         | that's probably going to be a significant segment of the
         | sciences. Maybe it's like how engineers just need an
         | approximation that's predictive enough to build with, but
         | scientists still want to understand the underlying phenomena.
         | There will be an analogy to how some people just want an opaque
         | model that works on a restricted domain for their purposes, but
         | others will be interested in clearer models or
         | unrestricted/less restricted domain models.
         | 
         | It could lead to a very interesting ecosystem of roles.
         | 
         | Even if you just limit the discussion to using the best model
         | of X to design a better Y, limited to the model's domain of
         | validity, that might translate the usage problem to finding
         | argmax_X of valueFunction of modelPrediction of design of X. In
         | some sense a good predictive model is enough to solve this with
         | brute force, but this still leaves room for tons of fascinating
         | foundational work. Maybe you start to find that the (wow so
         | small) errors in modelPrediction are correlated with
         | valueFunction, so the most accurate predictions don't make it
         | the best for argmax (aka optimization might exploit model
         | errors rather than optimizing the real thing). Or maybe brute
         | force just isn't computationally feasible, so you need to
         | understand something deeper about the problem to simplify the
         | optimization to make it cheap.
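The "optimization exploits model errors" failure mode described here can be sketched in a few lines. The functions are made up purely for illustration: a surrogate whose small error is correlated with the value being maximized shifts the brute-force argmax away from the true optimum.

```python
def true_value(x):
    # The real (unknown-in-practice) quantity we care about; peaks at x = 1.
    return -(x - 1.0) ** 2

def model_prediction(x):
    # Surrogate with a small systematic error that grows with |x|.
    return true_value(x) + 0.05 * x ** 2

# Brute-force argmax over a candidate grid, as in the comment above.
candidates = [i / 100 for i in range(-300, 301)]
best_by_model = max(candidates, key=model_prediction)
best_truly = max(candidates, key=true_value)

# The optimizer "exploits" the model error: the two argmaxes differ.
assert best_truly == 1.0
assert best_by_model != best_truly
```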
        
         | RandomLensman wrote:
         | We could be entering a new age of epicycles - high accuracy but
         | very flawed understanding.
        
         | advisedwang wrote:
         | In physics, we already deal with the fact that many of the core
         | equations cannot be analytically solved for more than the most
         | basic scenarios. We've had to adapt to using approximation
         | methods and numerical methods. This will have to be another
         | place where we adapt to a practical way of getting results.
        
         | topaz0 wrote:
         | In case it's not clear, this does not "beat" experimental
         | structure determination. The matches to experiment are pretty
         | close, but they will be closer in some cases than others and
         | may or may not be close enough to answer a given question about
         | the biochemistry. It certainly doesn't give much information
         | about the dynamics or chemical perturbations that might be
         | relevant in biological context. That's not to pooh-pooh
         | alphafold's utility, just that it's a long way from making
         | experimental structure determination unnecessary, and much much
         | further away from replacing a carefully chosen scientific
         | question and careful experimental design.
        
         | bluerooibos wrote:
         | > What happens when...
         | 
         | I can only assume that existing methods would still be used for
         | verification. At least we understand the logic used behind
         | these methods. The ML models might become more accurate on
         | average but they could still throw out results that are way off
         | occasionally, so their error rate would have to match that of
         | the existing methods.
        
         | GistNoesis wrote:
         | The frontier in model space is kind of fluid. It's all about
         | solving differential equations.
         | 
         | In theoretical physics, you know the equations, you solve
         | equations analytically, but you can only do that when the model
         | is simple.
         | 
         | In numerical physics, you know the equations, you discretize
         | the problem on a grid, and you solve the constraint defined by
         | the equations with various numerical integration schemes like
         | RK4, but you can only do that when the model is small and you
         | know the equations, and you find a single solution.
         | 
         | Then you want the result faster, so you use mesh-free methods
         | and adaptive grids. It works on bigger models but you have to
         | know the equations, finding a single solution to the
         | differential equations.
         | 
         | Then you compress this adaptive grid with a neural network,
         | while still knowing the governing equations, and you have
         | things like Physics Informed Neural Networks (
         | https://arxiv.org/pdf/1711.10561 and following papers) where
         | you can bound the approximation error. This method solves for
         | all solutions to the differential equations simultaneously,
         | sharing the computations.
         | 
         | Then, when knowing your governing equations explicitly is too
         | complex, you assume that there are some implicit governing
         | stochastic equations, and you learn the end result of the
         | dynamics with a diffusion model; that's what AlphaFold is
         | doing.
         | 
         | ML is kind of a memoization technique, analogous to hashlife in
         | the game of life, that lets you reuse your past computational
         | efforts. You are free to choose on this ladder which memory-
         | compute trade-off you want to use to model the world.
        
         | visarga wrote:
         | No, science doesn't work that way. You can't just calculate
         | your way to scientific discoveries; you have to test them in
         | the real
         | world. Learning, both in humans and AI, is based on the signals
         | provided by the environment. There are plenty of things not
         | written anywhere, so the models can't simply train on human
         | text to discover new things. They learn directly from the
         | environment to do that, like AlphaZero did when it beat humans
         | at Go.
        
         | slibhb wrote:
         | It's interesting to compare this situation to earlier eras in
         | science. Newton, for example, gave us equations that were very
         | accurate but left us with no understanding at all of _why_ they
         | were accurate.
         | 
         | It seems like we're repeating that here, albeit with wildly
         | different methods. We're getting better models but by giving up
         | on the possibility of actually understanding things from first
         | principles.
        
           | slashdave wrote:
           | Not comparable. Our current knowledge of the physics involved
           | in these systems is complete. It is just impossibly difficult
           | to calculate from first principles.
        
         | ChuckMcM wrote:
         | Interesting times indeed. I think the early history of
         | medicines takes away from your observation though. In the 19th
         | and early 20th century people didn't know _why_ medicines
         | worked, they just did. The whole  "try a bunch of things on
         | mice, pick the best ones and try them on pigs, and then the
         | best of those and try a few on people" kind of thing. In many
         | ways the mice were a stand in for these models, at the time
         | scientists didn't understand nearly as much about how mice
         | worked (early mice models were pretty crude by today's
         | standards) but they knew they were a close enough analog to the
         | "real thing" that the information provided by mouse studies was
         | usefully translated into things that might help/harm humans.
         | 
         | So when your tools can produce outputs that you find useful,
         | you can then use those tools to develop your understanding and
         | insights. As a tool, this is quite good.
        
         | aaroninsf wrote:
         | The top HN response to this should be:
         | 
         | an opportunity has entered the chat.
         | 
         | There is a wave coming--I won't try to predict if it's the next
         | one--where the hot thing in AI/ML is going to be profoundly
         | powerful tools for analyzing other such tools and rendering
         | them intelligible to us,
         | 
         | which will I imagine mean providing something like a zoomable
         | explainer. At every level there are footnotes; if you want to
         | understand why the simplified model is a simplification, you
         | look at the fine print. Which has fine print. Which has...
         | 
         | Which doesn't mean there is not a stable level at which some
         | formal notion of "accurate" cannot be said to exist, which is
         | the minimum viable level of simplification.
         | 
         | Etc.
         | 
         | This sort of thing will of course be the input to many other
         | things.
        
         | signal_space wrote:
         | Is alphafold doing model generation or is it just reducing a
         | massive state space?
         | 
         | The current computational and systems biochemistry approaches
         | struggle to model large biomolecules and their interactions due
         | to the large degrees of freedom of the models.
         | 
         | I think it is reasonable to rely on statistical methods to lead
         | researchers down paths that have a high likelihood of being
         | correct versus brute forcing the chemical kinetics.
         | 
         | After all chemistry is inherently stochastic...
        
         | jononor wrote:
         | I think it likely that instead of replacing existing methods,
         | we will see a fusion. Or rather, many different kinds of
         | fusions - depending on the exact needs of the problems at hand
         | (or in science, the current boundary of knowledge). If nothing
         | else then to provide appropriate/desirable level of
         | explainability, correctness etc. Hypothetically the combination
         | will also have better predictive performance and be more data
         | efficient - but it remains to be seen how well this plays out
         | in practice. The field of "physics informed machine learning"
         | is all about this.
        
         | Grieverheart wrote:
         | Perhaps for understanding the structure itself, but having the
         | structure available allows us to focus on a coarser level. We
         | also don't want to use quantum mechanics to understand the
         | everyday world, and that's why we have classical mechanics, etc.
        
         | nico wrote:
         | Even if we don't understand the models themselves, you can
         | still use them as a basis for understanding
         | 
         | For example, I have no idea how a computer works in every
         | minute detail (ie, exactly the physics and chemistry of every
         | process that happens in real time), but I have enough of an
         | understanding of what to do with it, that I can use it as an
         | incredibly useful tool for many things
         | 
         | Definitely interesting times!
        
         | phn wrote:
         | I'm not a scientist by any means, but I imagine even accurate
         | opaque models can be useful in moving the knowledge forward.
         | For example, they can allow you to accurately simulate reality,
         | making experiments faster and cheaper to execute.
        
         | GuB-42 wrote:
         | We already have the absolute best method for accurately
         | predicting the world, and it is by experimentation. In the
         | protein folding case, it works by actually making the protein
         | and analyzing it. For designing airplanes, computer models are
         | no match for building the thing, or even using physical models
         | and wind tunnels.
         | 
         | And despite our having this "best method", it didn't prevent
         | progress in theoretical physics; theory and experimentation
         | complement each other.
         | 
         | ML models are just another kind of model that can help both
         | engineering and fundamental research. They work much like the
         | old guy in the shop who intuitively knows what good design is,
         | because he has seen it all. That old guys in shops are sometimes
         | better than modeling with physics equations helps scientific
         | progress: scientists can work together with the old guy,
         | combining the strength of intuition and experience with that of
         | scientific reasoning.
        
         | jpadkins wrote:
         | Hook the protein model up to an LLM model, have the LLM
         | interpret the results. Problem solved :-) Then we just have to
         | trust the LLM is giving us correct interpretations.
        
         | flawsofar wrote:
         | How do they compare on accuracy per watt?
        
         | theGnuMe wrote:
         | The models are learning an encoding based on evolutionary
         | related and known structures. We should be able to derive
         | fundamental properties from those encodings eventually. Or at
         | least our biophysical programmed models should map into that
         | encoding. That might be a reasonable approach to look at the
         | folding energy landscape.
        
         | MobiusHorizons wrote:
         | Is it capable of predictions though? I.e., can it accurately
         | predict the folding of new molecules? Otherwise, how do you
         | distinguish accuracy from overfitting?
        
         | slashdave wrote:
         | In terms of docking, you can call the conventional approaches
         | "physically-based", however, they are rather poor physical
         | models. Namely, they lack proper electrostatics, and, most
         | importantly, basically ignore entropic contributions. There is
         | no reason for concern.
        
         | trueismywork wrote:
         | To paraphrase Kahan, it's not interesting to me whether a
         | method is accurate enough or not, but whether you can predict
         | how accurate you can be. So, if ML methods can predict that
         | they're right 98% of times then we can build this in our
         | systems, even if we don't understand how they work.
         | 
         | Deterministic methods can predict a result with a single run;
         | ML methods will need an ensemble of results to show the same
         | confidence. It is possible that at the end of the day the
         | difference in cost might not be that high over time.
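         | 
         | A minimal sketch of that ensemble idea (`predict` here is a
         | hypothetical stochastic model, nothing from AlphaFold itself):

```python
import random
import statistics

def predict(x, seed):
    """Hypothetical stochastic predictor: the same input with
    different seeds gives slightly different outputs."""
    rng = random.Random(seed)
    return x * 2.0 + rng.gauss(0.0, 0.1)

def ensemble_predict(x, n_runs=32):
    """Run the model n_runs times; the spread of the outputs is a
    crude self-reported confidence of the kind described above."""
    samples = [predict(x, seed) for seed in range(n_runs)]
    return statistics.mean(samples), statistics.stdev(samples)

mean, spread = ensemble_predict(3.0)  # mean near 6.0, small spread
```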
        
         | abledon wrote:
         | Next decade we will focus on building out debugging and
         | visualization tools for deep learning, to glance inside the
         | current black box.
        
         | hyperthesis wrote:
         | Engineering often precedes Science. It's just more data.
        
         | salty_biscuits wrote:
         | I'd say it's not new. Take fluid dynamics as an example: the
         | Navier-Stokes equations predict the motion of fluids very well,
         | but you need to approximately solve them on a computer in order
         | to get useful predictions for most setups. I guess the
         | difference is the equation is compact and the derivation from
         | continuum mechanics is easy enough to follow. People still rely
         | on heuristics to answer "how does a wing produce lift?". These
         | heuristic models are completely useless at "how much lift will
         | this particular wing produce under these conditions?". Seems
         | like the same kind of situation. Maybe progress forward will
         | look like producing compact models or tooling to reason about
         | why a particular thing happened.
        
         | Brian_K_White wrote:
         | Perhaps an ai can be made to produce the work as well as a
         | final answer, even if it has to reconstruct or invent the work
         | backwards rather than explain its own internal inscrutable
         | process.
         | 
         | "produce a process that arrives at this result" should be just
         | another answer it can spit out. We don't necessarily care if
         | the answer it produces is actually the same as what originally
         | happened inside itself. All we need is that the answer checks
         | out when we try it.
        
         | JacobThreeThree wrote:
         | As a tool people will use it as any other tool, by
         | experimenting, testing, tweaking and iterating.
         | 
         | As a scientific theory for fundamentally explaining the nature
         | of the universe, maybe it won't be as useful.
        
       | qwertox wrote:
       | > Thrilled to announce AlphaFold 3 which can predict the
       | structures and interactions of nearly all of life's molecules
       | with state-of-the-art accuracy including proteins, DNA and RNA.
       | [1]
       | 
       | There's a slight mismatch between the blog's title and Demis
       | Hassabis' tweet, where he uses "nearly all".
       | 
       | The blog's title suggests that it's a 100% solved problem.
       | 
       | [1] https://twitter.com/demishassabis/status/1788229162563420560
        
         | bmau5 wrote:
         | Marketing vs. Reality :)
        
         | TaupeRanger wrote:
         | First time reading a Deep Mind PR? This is literally their
         | modus operandi.
        
       | nybsjytm wrote:
       | Important caveat: it's only about 70% accurate. Why doesn't the
       | press release say this explicitly? It seems intentionally
       | misleading to only report accuracy relative to existing methods,
       | which apparently are just not so good (30%, 50% in various
       | settings). https://www.fastcompany.com/91120456/deepmind-
       | alphafold-3-dn...
        
         | bluerooibos wrote:
         | That's pretty good. Based on the previous performance
         | improvements of Alpha-- models, it'll be nearing 100% in the
         | next couple of years.
        
           | nybsjytm wrote:
           | Just "Alpha-- models" in general?? That's not a remotely
           | reasonable way to reason about it. Even if it were, why
           | should it stop DeepMind from clearly communicating accuracy?
        
             | dekhn wrote:
             | The way I think about this (specifically, deepmind not
             | publishing their code or sharing their exact experimental
             | results): advanced science is a game played by the most
             | sophisticated actors in the world. Demis is one of those
             | actors, and he plays the games those actors play better
             | than anybody else I've ever seen. Those actors don't care
             | much about the details of any specific system's accuracy:
             | they care to know that it's possible to do this, and some
             | general numbers about how well it works, and some hints about
             | what approaches they should take. And Nature, like other
             | top journals, is more than willing to publish articles like
             | this because they know it stimulates the most competitive
             | players to bring their best games.
             | 
             | (I'm not defending this approach, just making an
             | observation)
        
               | nybsjytm wrote:
               | I think it's important to qualify that the relevant
               | "game" is not advanced science per se; the game is
               | a business whose product is science. The aim isn't to do
               | novel science; it's to do something which can be
               | advertised as novel science. That isn't to cast
               | aspersions on the personal motivations of Hassabis or any
               | other individual researcher working there (which itself
               | isn't to remove their responsibilities to public
               | understanding); it's to cast aspersions on the structure
               | that they're part of. And it's not to say that they can't
               | produce novel or important science as part of their work
               | there. And it's also not to say that the same tension
               | isn't often present in the science world - but I think
               | it's present to an extreme degree at DeepMind.
               | 
               | (Sometimes the distinction between novel science and
               | advertisably novel science is very important, as seems to
               | be the case in the "new materials" research dopylitty
               | linked to in these comments: here
               | https://www.404media.co/google-says-it-discovered-
               | millions-o...)
        
             | 7734128 wrote:
             | I'm quite hyped for the upcoming BetaFold, or even
             | ReleaseCandidateFold models. They just have to be great.
        
           | akira2501 wrote:
           | > it'll be nearing 100% in the next couple of years.
           | 
           | What are you basing this on? There is no established "moores
           | law" for computational models.
        
         | Aunche wrote:
         | IIRC the next best models have all been using AlphaFold 2's
         | methodology, so that's still a massive improvement.
         | 
         | Edit: I see now that you're probably objecting to the headline
         | that got edited on HN.
        
           | nybsjytm wrote:
           | Not just the headline, the whole press release. And not
           | questioning that it's a big improvement.
        
       | j7ake wrote:
       | So it's okay now to publish a computational paper with no code? I
       | guess Nature's reporting standards don't apply to everyone.
       | 
       | > A condition of publication in a Nature Portfolio journal is
       | that authors are required to make materials, data, code, and
       | associated protocols promptly available to readers without undue
       | qualifications.
       | 
       | > Authors must make available upon request, to editors and
       | reviewers, any previously unreported custom computer code or
       | algorithm used to generate results that are reported in the paper
       | and central to its main claims.
       | 
       | https://www.nature.com/nature-portfolio/editorial-policies/r...
        
         | boxed wrote:
         | Are you an editor or reviewer?
        
           | HanClinto wrote:
           | Good question.
           | 
           | Also makes me wonder -- where's the line? Is it reasonable to
           | have "layperson" reviewers? Is it reasonable to think that
           | regular citizens could review such content?
        
             | Kalium wrote:
             | I think you will find that for the vast, vast majority of
             | scientific papers there is significant negative expected
             | value to even attempting to have layperson reviewers. Bear
             | in mind that we're talking about papers written by experts
             | in a specific field aimed at highly technical communication
             | with other people who are experts in the same field. As a
             | result, the only people who can usefully review the
             | materials are drawn from those who are also experts in the
             | same field.
             | 
             | For an instructive example, look up the seminal paper on
             | the structure of DNA:
             | https://www.mskcc.org/teaser/1953-nature-papers-watson-
             | crick... Ask yourself how useful comments from someone who
             | did not know what an X-ray is, never mind anything about
             | organic chemistry, would be in improving the quality of
             | research or quality of communication between experts in
             | both fields.
        
             | _just7_ wrote:
             | No, in fact most journals keep peer reviews cordoned off,
             | not viewable by the general public.
        
               | lupire wrote:
               | That's pre-publication review, not scientific peer
               | review. Special interests try to conflate the two, to
               | bypass peer review and transform science into a religion.
               | 
               | Peer review properly refers to the general process of
               | science advancing by scientists reviewing each other's
               | published work.
               | 
               | Publishing a work is the middle, not the end of the
               | research.
        
           | j7ake wrote:
           | If you read the standards it applies broadly beyond reviewers
           | or editors.
           | 
           | > A condition of publication in a Nature Portfolio journal is
           | that authors are required to make materials, data, code, and
           | associated protocols promptly available to readers without
           | undue qualifications.
        
         | dekhn wrote:
         | Nature has long been willing to break its own rules to be at
         | the forefront of publishing new science.
        
       | ein0p wrote:
       | I'm inclined to ignore such PR fluff until they actually
       | demonstrate a _practical_ result, e.g. cure some form of cancer
       | or some autoimmune disease. All this "prediction of structure"
       | has been in the news for years, and it seems to have resulted in
       | nothing practically usable IRL as far as I can tell. I could be
       | wrong of course; I do not work in this field.
        
         | dekhn wrote:
         | the R&D of all major pharma is currently using AlphaFold
         | predictions when they don't have experimentally determined
         | structures. I cannot share further details but the results
         | suggest that we will see future pharmaceuticals based on AF
         | predictions.
         | 
         | The important thing to recognize is that protein structures are
         | primarily hypothesis-generation machines and tools to stimulate
         | ideas, rather than direct targets of computational docking.
         | Currently structures rarely capture the salient details
         | required to identify a molecule that has precisely the
         | biological outcome desired, because the biological outcome is
         | an extremely complex function that incorporates a wide array of
         | other details, such as other proteins, metabolism, and more.
        
           | ein0p wrote:
           | Sure. If/when we see anything practical, that'll be the right
           | moment to pay attention. This is much like "quantum
           | computing" where everyone who doesn't know what it is is
           | excited for some reason, and those that do know can't even
           | articulate any practical applications
        
             | dekhn wrote:
             | Feynman already articulated the one practical application
             | for quantum computing: using it to simulate complex systems
             | (https://www.optica-
             | opn.org/home/articles/on/volume_11/issue_... and
             | https://calteches.library.caltech.edu/1976/ and
             | https://s2.smu.edu/~mitch/class/5395/papers/feynman-
             | quantum-...
             | 
             | These approaches are now being explored but I haven't seen
             | any smoking guns showing a QC-based simulation exceeding
             | the accuracy of a classical computer for a reasonable
             | investment.
             | 
             | Folks have suggested other areas, such as logistics, where
             | finding small improvements to the best approximations might
             | give a company a small edge, and crypto-breaking, but there
             | has been not that much progress in this area, and the
             | approximate methods have been improving rapidly.
        
         | arolihas wrote:
         | There are a few AI-designed drugs in various phases of clinical
         | trials; these things take time.
        
       | mchinen wrote:
       | I am trying to understand how accurate the docking predictions
       | are.
       | 
       | Looking at the PoseBusters paper [1] they mention, they say they
       | are 50% more accurate than traditional methods.
       | 
       | DiffDock, the best of the DL-based systems, gets 30-70% depending
       | on the dataset, while traditional methods get 50-70%. The paper
       | highlighted some issues with the DL-based methods and given that
       | DeepMind would have had time to incorporate this into their work
       | and develop with the PoseBusters paper in mind, I'd hope it's
       | significantly better than 50-70%. They say 50% better than
       | traditional so I expected something like 70-85% across all
       | datasets.
       | 
       | I hope a paper will appear soon to illuminate these and other
       | details.
       | 
       | [1]
       | https://pubs.rsc.org/en/content/articlehtml/2024/sc/d3sc0418...
        
       | dsign wrote:
       | For a couple of years I've been expecting that ML models would be
       | able to 'accelerate' bio-molecular simulations, using physics-
       | based simulations as ground truth. But this seems to be a step
       | beyond that.
        
         | dekhn wrote:
         | When I competed in CASP 20 years ago (and lost terribly) I
         | predicted that the next step to improve predictions would be to
         | develop empirically fitted force fields to make MD produce
         | accurate structure predictions (MD already uses empirically
         | fitted force fields, but they are not great). This area was
         | explored, there are now better force fields, but that didn't
         | really push protein structure prediction forward.
         | 
         | Another approach is fully differentiable force fields- the idea
         | that the force field function itself is a trainable structure
         | (rather than just the parameters/weights/constants) that can be
         | optimized directly towards a goal. Also explored, produced some
         | interesting results, but nothing that would be considered
         | transformative.
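         | 
         | As a toy illustration of the parameter-fitting side (everything
         | here is invented for illustration; real force fields have many
         | coupled terms): recover a Lennard-Jones well depth by gradient
         | descent so the model reproduces one observed pair energy.

```python
def lj_energy(r, epsilon, sigma=1.0):
    """Lennard-Jones pair energy: 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def fit_epsilon(r, target_energy, lr=0.5, steps=200):
    """Gradient descent on the force-field parameter itself, the
    'empirically fitted' part; a fully differentiable force field
    would let the functional form be trained too."""
    eps = 0.1  # arbitrary initial guess
    for _ in range(steps):
        err = lj_energy(r, eps) - target_energy
        # energy is linear in epsilon, so dE/deps = lj_energy(r, 1.0)
        grad = 2.0 * err * lj_energy(r, 1.0)
        eps -= lr * grad
    return eps

eps_fit = fit_epsilon(1.1, lj_energy(1.1, 0.25))  # recovers ~0.25
```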
         | 
         | The field still generally believes that if you had a perfect
         | force field and infinite computing time, you could directly
         | recapitulate the trajectories of proteins folding (from fully
         | unfolded to final state along with all the intermediates), but
         | that doesn't address any practical problems, and is massively
         | wasteful of resources compared to using ML models that exploit
         | evolutionary information encoded in sequence and structures.
         | 
         | In retrospect I'm pretty relieved I was wrong, as the new
         | methods are more effective with far fewer resources.
        
       | xnx wrote:
       | Very cool that anyone can login to
       | https://golgi.sandbox.google.com/ and check it out
        
       | bschmidt1 wrote:
       | Google's Game of Life 3D: Spiral edition
        
       | uptownfunk wrote:
       | Very sad to see they did not make it open source. When you have a
       | technology with the potential to be a gateway to drug development
       | and to cures for diseases, choosing to make it closed is a huge
       | disservice to the community at large. Sure, release your own
       | product alongside it, but making it closed source does not help
       | the scientific community upon which all these innovations were
       | built. Especially if you have lost a loved one to a disease this
       | technology may one day help cure, it is very disappointing.
        
         | falcor84 wrote:
         | The closer it gets to enabling full drug discovery, the closer
         | it also gets to enabling bioterrorism. Taking it to the
         | extreme, if they had the theory of everything, I don't think
         | I'd want it to be made available to the whole world as it is
         | today.
         | 
         | On a related note, I highly recommend The Talos Principle 2,
         | which really made me think about these questions.
        
           | pythonguython wrote:
           | Any organization/country that has the ability to use a tool
           | like this to create a bio weapon is already sophisticated
           | enough to do bioterrorism today.
        
             | ramon156 wrote:
             | Alright, but now picture this: it's now open to the masses,
             | meaning an individual could probably even do it.
        
               | uptownfunk wrote:
               | [delayed]
        
           | LouisSayers wrote:
           | Why do you need AI for bioterrorism? There are plenty of well
           | known biological organisms that can kill us today...
        
       | nojvek wrote:
       | So much hyperbole from recent Google releases.
       | 
       | I wish they didn't hype AI so much, but I guess that's what
       | people want to hear, so they say that.
        
         | sangnoir wrote:
         | I don't blame them for hyping their products - if only to fight
         | the sentiment that Google is far behind OpenAI because they
         | were not first to release a LLM.
        
       | LarsDu88 wrote:
       | As a software engineer, I kind of feel uncomfortable about this
       | new model. It outperforms Alphafold 2 at ligand binding, but
       | Alphafold 2 also had some more hardcoded and interpretable
       | structural reasoning baked into the model architecture.
       | 
       | There's so many things you can incorporate into a protein folding
       | model such as structural constraints, rotational equivariance,
       | etc, etc
       | 
       | This new model simply does away with some of that, achieving
       | greater results. And the authors simply use distillation from
       | data outputted from Alphafold2 and Alphafold2-multimer to get
       | those better results for those cases where you wind up with
       | implausible results.
       | 
       | To achieve a real end-to-end training from scratch for this new
       | model, you would have to run all those previous models and output
       | their predictions for the distillation! Makes me feel a bit
       | uncomfortable.
        
         | amitport wrote:
         | Consider that humans also learn from other humans, and
         | sometimes surpass their teachers.
         | 
         | A bit more comfortable?
        
           | Balgair wrote:
           | Ahh, but the new young master is able to explain their work
           | and processes to the satisfaction of the old masters. In the
           | 'Science' of our modern times it's a requirement to show your
           | work (yes, yes, I know about the replication crisis and all
           | that terrible jazz).
           | 
           | Not being able to ascertain how and why the ML/AI is
           | achieving results is not quite the same, and is more akin to the
           | alchemists and sorcerers with their cyphers and hidden
           | laboratories.
        
             | falcor84 wrote:
             | > the new young master is able to explain their work and
             | processes to the satisfaction of the old masters
             | 
             | Yes, but it's one level deep - in general they wouldn't be
             | able to explain their work to their master's master (note
             | "science advances one funeral at a time").
        
         | sangnoir wrote:
         | > Makes me feel a bit uncomfortable.
         | 
         | Why? Do compilers which can't bootstrap themselves also make
         | you uncomfortable due to dependencies on pre-built artifacts?
         | I'm not saying you're unjustified to feel that way, but
         | sometimes more abstracted systems are quicker to build and may
         | have better performance than those built from the ground up.
         | Selecting which one is better depends on your constraints and
         | taste.
        
       | roody15 wrote:
       | I wonder in the not too distant future if these AI predictions
       | could be explained back into "humanized" understanding. Much like
       | ChatGPT can simplify complex topics ... could the model in the
       | future provide feedback to researchers on why it is making this
       | prediction?
        
       | reliablereason wrote:
       | Would be very useful if they used it to predict the structure
       | and interactions of the known variants too.
       | 
       | Would be very helpful when predicting whether a mutation in a
       | protein would lead to loss of function for the protein.
        
       | mfld wrote:
       | The improvement on predicting protein/RNA/ligand interactions
       | might facilitate many commercially relevant use cases. I assume
       | pharma and biotech will eagerly get in line to use this.
        
       | tonyabracadabra wrote:
       | Very cool, and what's cooler is this rap about alphafold3
       | https://heymusic.ai/blog/news/alphafold-3
        
       | wuj wrote:
       | This tool reminds me that the human body functions much like a
       | black box. While physics can be modeled with equations and
       | constraints, biology is inherently probabilistic and
       | unpredictable. We verify the efficacy of a medicine by observing
       | its outcomes: the medicine is the input, and the changes in
       | symptoms are the output. However, we cannot model what happens in
       | between, as we cannot definitively prove that the medicine
       | affects only its intended targets. In many ways, much of what we
       | understand about medicine is based on observing these black-box
       | processes, and this tool helps to model that complexity.
        
       | lysozyme wrote:
       | Probably worth mentioning that David Baker's lab released a
       | similar model (predicts protein structure along with bound DNA
       | and ligands), just a couple of months ago, and it is open source
       | [1].
       | 
       | It's also worth remembering that it was David Baker who
       | originally came up with the idea of extending AlphaFold from
       | predicting just proteins to predicting ligands as well [2].
       | 
       | 1. https://github.com/baker-laboratory/RoseTTAFold-All-Atom
       | 
       | 2. https://alexcarlin.bearblog.dev/generalized/
       | 
       | Unlike AlphaFold 3, which predicts only a small, preselected
       | subset of ligands, RosettaFold All Atom predicts a much wider
       | range of small molecules. While I am certain that neither network
       | is up to the task of designing an enzyme, these are exciting
       | steps.
       | 
       | One of the more exciting aspects of the RosettaFold paper is that
       | they train the model for predicting structures, but then also use
       | the structure predicting model as the denoising model in a
       | diffusion process, enabling them to actually design new
       | functional proteins. Presumably, DeepMind is working on this
       | problem as well.
        
         | theGnuMe wrote:
         | And that tech just got $1b in funding.
        
         | refulgentis wrote:
         | I appreciated this, but it's probably worth mentioning: when
         | you say AlphaFold 3, you're talking about AlphaFold 2.
         | 
         | TFA announces AlphaFold 3.
         | 
         | Post: "Unlike AlphaFold 3, which predicts only a small,
         | preselected subset of ligands, RosettaFold All Atom predicts a
         | much wider range of small molecules"
         | 
         | TFA: "AlphaFold 3...*models large biomolecules such as
         | proteins, DNA and RNA*, as well as small molecules, also known
         | as ligands"
         | 
         | Post: "they also use the structure predicting model as the
         | denoising model in a diffusion process...Presumably, DeepMind
         | is working on this problem as well."
         | 
         | TFA: "AlphaFold 3 assembles its predictions using a diffusion
         | network, akin to those found in AI image generators."
        
       | bbstats wrote:
       | Zero-shot nearly beating trained catboost is pretty amazing.
        
       | thenerdhead wrote:
       | A lot of accelerated article previews recently. Seems like
       | humanity is making a lot of breakthroughs.
       | 
       | This is nothing short of amazing for all those suffering from
       | disease.
        
       | ak_111 wrote:
       | If you work in this space, I would be interested to know what
       | _material_ impact AlphaFold has had on your workflow since its
       | release 4 years ago.
        
       | lumb63 wrote:
       | Would anyone more familiar with the field be able to provide some
       | cursory resources on the protein folding problem? I have a
       | background in computer science and a half a background in biology
       | (took two semesters of OChem, biology, anatomy; didn't go much
       | further).
        
       | MPSimmons wrote:
       | Not sure why the first thing they point it at wouldn't be prions.
        
       | itissid wrote:
       | Noob here. Can one make the following deduction:
       | 
       | In transformer-based architectures, where one typically uses a
       | variation of the attention mechanism to model interactions, even
       | if one does not consider the autoregressive assumption about the
       | domain's "nodes" (amino acids, words, image patches), if the
       | number of final states the nodes eventually take can be permuted
       | only in a finite way (i.e., they have sparse interactions between
       | them), then these architectures are an efficient way of modeling
       | such domains.
       | 
       | In plain english the final state of words in a sentence and amino
       | acids in a protein have only so many ways they can be arranged
       | and transformers do a good job of modeling it.
       | 
       | Also, can one assume this won't do well for domains where there
       | is, say, sensitivity to initial conditions, like chaotic systems
       | such as weather, where the # of final states just explodes?
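       | 
       | For reference, a dependency-free sketch of the attention mechanism
       | in question (toy dimensions, not AlphaFold's actual architecture):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over a sequence of 'nodes'
    (amino acids, words, image patches): each output is a weighted
    mix of all values, so pairwise interactions are modeled
    directly rather than autoregressively."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```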
        
       | ricksunny wrote:
       | I'm interested in how they measure accuracy of binding site
       | identification and binding pose prediction. This was missing for
       | the hitherto widely-used binding pose prediction tool Autodock
       | Vina (and in silico binding pose tools in general). Despite the
       | time I invested in learning & exercising that tool, I avoided
       | using it for published research because I could not credibly cite
       | its general-use accuracy. Is / will Alphafold 3 be citeable in
       | the sense of "I have run Alphafold on this particular target of
       | interest and this array of ligands, and have found these poses of
       | X kJ/mol binding energy, and this is known to an accuracy of Y%
       | because of Alphafold 3's training set results cited below."
        
         | l33tman wrote:
          | I've never trusted those predicted binding energies. If you
          | have predicted a ligand/protein complex, have high
          | confidence in it, and want to study the binding energy, I
          | really think you should do a full MD simulation: you can
          | pull the ligand-protein complex apart and measure the change
          | in free energy explicitly.
         | 
          | Also, and this is an unfounded guess only: the problem of
          | protein/ligand docking is quite a bit more complex than
          | protein folding. There seems to be a finite set of overall
          | folds used in nature, while docking a small ligand to a big
          | protein with flexible sidechains, and even flexible
          | large-scale structures, can involve induced fits that are
          | really important to know and estimate. I'm just very
          | sceptical that an AI model will ever be able to predict
          | these accurately in a general fashion with such limited
          | training data.
         | 
          | Though you just need some hints; then you can run MD sims
          | on them to see what happens for real.
        
       | TaupeRanger wrote:
       | So after 6 years of this "revolutionary technology", what we have
       | to show for all the hype and breathless press releases is:
       | ....another press release saying how "revolutionary" it is.
       | Fantastic. Thanks DeepMind.
        
       | dev1ycan wrote:
        | Excited, but it's been a fair while now and I have yet to see
        | something truly remarkable come out of this.
        
       | zmmmmm wrote:
       | So much of the talk about their "free server" seems to be trying
       | to distract from the fact that they are not releasing the model.
       | 
        | I feel like it's an important threshold moment if this gets
        | accepted into scientific use without the model being
        | available: reproducibility of results becomes dependent on the
        | good graces of a single commercial entity. I kind of hope
        | that, as with OpenAI, it just spurs the creation of equivalent
        | open models that then actually get used.
        
       ___________________________________________________________________
       (page generated 2024-05-08 23:00 UTC)