[HN Gopher] Alphafold
___________________________________________________________________
Alphafold
Author : matejmecka
Score : 311 points
Date : 2021-07-15 18:22 UTC (4 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| swalsh wrote:
| _edit_ I was wrong. Please ignore.
| ali_m wrote:
| > This is a completely new model that was entered in CASP14 and
| published in Nature.
| f38zf5vdt wrote:
| From the repo:
|
| > This package provides an implementation of the inference
| pipeline of AlphaFold v2.0
| culopatin wrote:
| Does anyone know if this can be made to work with rna fold?
| qeternity wrote:
| Ok, so biochemists: which bit of the secret sauce are they
| leaving out?
| duckerude wrote:
| > The AlphaFold parameters are made available for non-commercial
| use only, under the terms of the Creative Commons Attribution-
| NonCommercial 4.0 International (CC BY-NC 4.0) license. You can
| find details at:
| https://creativecommons.org/licenses/by-nc/4.0/legalcode
|
| Does CC BY-NC actually do this? As far as I can tell it only
| really talks about sharing/reproducing, not using.
|
| Or is the only thing prohibiting other commercial use the words
| "available for non-commercial use only"?
| mikewarot wrote:
| If you took their parameters, then trained it for a while on a
| different set of data, it would vary from the original. I
| wonder how much compute would be required to make the offset
| far enough to hold up to scrutiny, and in court.
|
| Alternatively, you could manually change the network model, add
| a few hidden layers, etc... modifying the parameters in step,
| and result in a new model and new parameters. Some training to
| vary the parameters, and it's now a new work.
| sillysaurusx wrote:
| Artbreeder has some interesting prior art here: NVIDIA forbade
| commercial use of StyleGAN, but Artbreeder disregarded it and
| happily sold all the breeding you wanted. No one seemed to
| care.
|
| I suspect that the clause is there to prevent a startup
| launching on the basis of "see this trained model? Yeah, that's
| literally our business model" though, which is a mildly amusing
| thought, wot wot.
|
| So basically, a few tens of thousands, sure. A few million, big
| G might have a problem.
|
| Still, the smart move would be to launch the business anyway,
| and gamble that you can work out a licensing deal.
| jfengel wrote:
| So... is it possible to clone this and turn it into a
| Folding@Home client? How does it do?
| kmckiern wrote:
| Where there isn't an available crystal structure, Alphafold can
| be used to create initial structures for simulation via
| folding@home, replacing older homology modeling techniques.
|
| Source: former folding@home researcher.
| dekhn wrote:
| no, it wouldn't make sense to do that. Folding@Home is for ab
| initio where you don't have any prior info for the structure,
| this is for homology modelling. F@H probes the dynamics of
| protein folding, this just makes a static prediction.
| thesausageking wrote:
| The PDF is linked in the article:
|
| https://www.nature.com/articles/s41586-021-03819-2_reference...
| mensetmanusman wrote:
| Distribution of this 2 TB file seems like a good use of
| torrent...
| dekhn wrote:
| Fantastic, they released the dataset and code to train the model.
| Science will be able to proceed. edit: not the code to train the
| model, just the code to run inference.
|
| The underlying sequence datasets include PDB structures and
| sequences, and how those map to large collections of sequences
| with no known structure (no surprise). Each of those datasets
| represents decades of work by thousands of scientists, along with
| programmers and admins who kept the databases running for decades
| with very little grant money (funding long-term databases is
| something NIH hated to do until recently).
| FredFS456 wrote:
| There's a preview paper as well:
| https://www.nature.com/articles/s41586-021-03819-2
| dekhn wrote:
| Yes, I skimmed the paper already and it wasn't too
| surprising. There are details that will take some time to
| parse out to understand how important they are.
|
| Personally, I've found over decades that academic papers like
| that are far less useful to me than a github project and
| downloadable data that I can inspect, run and modify on my
| own. Other folks I know could read that paper and write the
| code in a day, I always wish I could do that.
| cing wrote:
| The process is described in Supplementary, but where do you see
| the code to train the model? The repository is the inference
| pipeline.
| dekhn wrote:
| I misread. The data dump is required for inference.
| gopalv wrote:
| > The total download size is around 428 GB and the total size
| when unzipped is 2.2 TB. Please make sure you have a large
| enough hard drive space, bandwidth and time to download.
|
| > This was tested on Google Cloud with a machine using the
| nvidia-gpu-cloud-image with 12 vCPUs, 85 GB of RAM, a 100 GB
| boot disk, the databases on an additional 3 TB disk, and an
| A100 GPU.
|
| This is amazingly detailed for a researcher who wants to follow
| in the track and also Apache licensed, which is one road-bump
| out of the way for a commercial enterprise, like an actual drug
| manufacturer who wants to burn some money trying this out.
|
| edit: said the last part too fast, the code has a "the
| AlphaFold parameters are made available for non-commercial use
| only under the terms of the CC BY-NC 4.0 license"
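|
| As a rough pre-flight check for the figures quoted above (428 GB
| download, 2.2 TB unzipped), here is a minimal sketch. The function
| names and the link speeds are illustrative assumptions, not part
| of the AlphaFold repository, and the time estimate ignores
| protocol overhead and decompression.

```python
import shutil

DOWNLOAD_GB = 428    # compressed download size quoted in the README excerpt
UNZIPPED_GB = 2200   # 2.2 TB once unzipped, from the same excerpt


def has_room(path: str, needed_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `needed_gb` free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb


def download_hours(size_gb: float, link_mbit_per_s: float) -> float:
    """Ideal transfer time in hours for `size_gb` over a link of the given speed."""
    megabits = size_gb * 8 * 1000          # GB -> megabits (decimal units)
    return megabits / link_mbit_per_s / 3600


if __name__ == "__main__":
    print("room for unzipped databases:", has_room(".", UNZIPPED_GB))
    for mbit in (100, 1000):               # hypothetical link speeds
        print(f"{mbit} Mbit/s: about {download_hours(DOWNLOAD_GB, mbit):.1f} h")
```

| On an ideal 1 Gbit/s link the download alone comes to just under
| an hour; on 100 Mbit/s it is closer to ten hours.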
| dekhn wrote:
| Yes, all science should be communicated in the form of an
| academic paper with a supporting git repo and quickly
| downloadable dataset and a fast path to reproducing the work.
| That would be a huge change from the establishment.
|
| It's quite unclear what value this will have to pharma;
| personally I doubt this has any direct applications (and I'm
| one of the few people in the world that can say that with
| deep authority).
| aantix wrote:
| Who benefits from this work?
| dekhn wrote:
| Primarily the community that previously depended on
| homology models.
| gnufx wrote:
| Surely not all science. Just as well Dirac wasn't required
| to communicate that way the equation that fundamentally
| underlies the phenomenon discussed, and you couldn't put
| the unique facility my thesis work pioneered into git! I do
| highly approve of publishing software and data where
| possible, of course, since before Free Software needed to
| be coined, and it's much easier now.
| dekhn wrote:
| If you're just publishing equations, you should have an
| associated notebook which executes the equations.
|
| I don't know what you mean you can't put your thesis work
| into git. Is it a physical thing? Too big for git?
| astro-codes wrote:
| Why wouldn't this have much value to pharma? Is it because
| its application is actually really limited in scope?
| dekhn wrote:
| there are research groups this would be useful for but
| structures are not on the critical path to drug discovery
| or approval.
| [deleted]
| dekhn wrote:
| I missed an important detail: """an academic team has developed
| its own protein-prediction tool inspired by AlphaFold 2, which is
| already gaining popularity with scientists. That system, called
| RoseTTaFold, performs nearly as well as AlphaFold 2, and is
| described in a Science paper also published on 15
| July"""
|
| One of the things I say about CASP has to be updated. It used to
| be "2 years after Baker wins CASP, the other advanced teams have
| duplicated his methods and accuracy, and 4 years after,
| everything Baker did is now open source and trivially
| reproducible"
|
| Now it's Baker catching up to DeepMind, and it took about a year.
|
| https://doi.org/10.1126/science.abj8754
| radus wrote:
| Very cool! Great to see this competition between academia and
| industry yielding improvements on all fronts.
| Cas9 wrote:
| Honest question: since AlphaFold doesn't really _solve_ the
| protein folding problem (it's NP-complete after all), but only
| _approximates_ solutions very well, what are the real impacts of
| this? Isn't a good approximation of a protein enough to cause
| unexpected problems? How do we know that an approximate structure
| will perform the same as the correct solution?
| radus wrote:
| Yes, it is still useful. Even structures obtained through
| traditional means (eg. x-ray crystallography) are
| approximations to an extent since there are limits to the
| resolution that you can obtain and oftentimes regions of
| proteins are "disordered". Additionally, these structures are
| only snapshots of a protein in a particular state, which may
| not completely reflect the dynamics of the protein in its
| native environment.
| nmca wrote:
| NP completeness tells you about the hardest cases, not the most
| useful cases.
| thxg wrote:
| > (it's NP-complete after all)
|
| Protein folding is a physical/biological phenomenon. AFAIK we
| don't currently have a proper exact mathematical formulation of
| the problem that would let one determine its complexity.
|
| You may be referring to this paper [1]. It only claims that one
| particular optimization problem, believed to give a solution to
| protein folding problems, is NP-hard. So, even if a suitable
| exact formulation exists, it is not yet proven that protein
| folding is hard, although it for sure seems plausible.
|
| By the way, it is perfectly possible today to solve some very
| large-scale NP-hard problems (think millions of variables and
| constraints) in reasonable amounts of time (think minutes or
| hours). Examples are knapsack problems, SAT problems [2], the
| Traveling Salesman Problem [3] or more generally Mixed Integer
| Programming [4].
|
| [1] "Complexity of protein folding", 1993, by Aviezri S.
| Fraenkel
|
| [2] http://www.satcompetition.org
|
| [3] http://www.math.uwaterloo.ca/tsp/
|
| [4] http://plato.asu.edu/bench.html
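|
| To make the point about tractable instances concrete, here is a
| minimal sketch of the textbook dynamic program for the 0/1
| knapsack problem. It is pseudo-polynomial (time grows with the
| numeric capacity), and it is not what the benchmarked solvers
| linked above use; it only illustrates that an NP-hard problem can
| still have instances that are quick to solve exactly.

```python
def knapsack(values, weights, capacity):
    """Best total value for 0/1 knapsack with integer weights and capacity."""
    # best[c] = best value achievable with total weight at most c
    best = [0] * (capacity + 1)
    for value, weight in zip(values, weights):
        # Walk capacities downwards so each item is counted at most once.
        for cap in range(capacity, weight - 1, -1):
            best[cap] = max(best[cap], best[cap - weight] + value)
    return best[capacity]


if __name__ == "__main__":
    # Three items, capacity 50: the best choice is the second and third item.
    print(knapsack([60, 100, 120], [10, 20, 30], 50))  # 220
```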
| hobofan wrote:
| I would expect that once AlphaFold has helped you identify a
| potential protein (e.g. as a drug) out of a bigger set of
| potential proteins, there will still be a manual step of
| traditional cryoEM, NMR, etc. to get an accurate high-
| resolution structure.
| t_serpico wrote:
| To me, the interesting thing is not the specific results but
| rather that you can accurately predict crystal structures from
| sequence alone. This begets the question: what other physical
| biological properties can we predict?
| saithound wrote:
| AlphaFold is not about solving any kind of NP-complete problem.
|
| Proteins consist of chains of amino acids which spontaneously
| fold up to form a structure. Understanding how the amino acid
| chain determines the protein structure is highly challenging,
| and this is called the "protein folding problem".
|
| People use mathematical models to predict how proteins fold in
| nature. Many such mathematical models are stated in terms such
| as "proteins fold into a configuration that minimizes a certain
| energy function". Even the simplest such models [1] give rise
| to NP-hard decision problems, which are also known (somewhat
| confusingly) as "protein folding problems". To make this a bit
| less confusing, I will call the mathematical decision problems
| PFPs.
|
| Like all mathematical models, our protein folding models don't
| correspond exactly to reality. Even if you are somehow able to
| determine the exact mathematical solution to a mathematical
| PFP, that _still_ doesn't guarantee that the real protein that
| you were trying to model behaves like the mathematical solution
| would indicate. E.g. the protein may fold in such a way that it
| gets stuck in a local optimum of the energy function you were
| using.
|
| How do we detect this? We make inferences about how the protein
| should behave, given the mathematical solution to the Protein
| Folding Problem, and then we perform experiments, and find out
| (empirically) that the protein behaves in a manner that is
| inconsistent with the inferences drawn from the mathematical
| model. Scientists _do_ do this. And they would have to do it
| even if they had a fast, exact way to solve NP-complete
| problems, because the NP-complete problems are still just part
| of a mathematical model, and need not correspond to reality in
| any way.
|
| The success of AlphaFold is not measured by how well it solves
| (or approximates) mathematical PFPs. The success of AlphaFold
| is measured by making successful predictions about how certain
| proteins will fold. And this is exactly how it was tested [2]:
| they threw it at a bunch of problems for which scientists have
| empirically determined how certain amino acid chains fold, but
| didn't release the results. And then they compared the
| solutions predicted by AlphaFold, and found that most of the
| predictions were consistent with what they knew to be the
| case.*
|
| [1] https://en.wikipedia.org/wiki/Lattice_protein
|
| [2] https://predictioncenter.org/casp14/index.cgi
|
| * That's an understatement. The solutions were really very
| good, much better than those produced by any other submission
| to CASP14.
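|
| For a feel of what such a mathematical PFP looks like, here is a
| minimal sketch of the 2D HP lattice model (the kind of model
| behind [1]): residues are H (hydrophobic) or P (polar), the chain
| is a self-avoiding walk on a square grid, and each pair of H
| residues that touch on the grid without being chain neighbours
| contributes -1 to the energy. The function names are my own, and
| the brute-force search is only feasible for very short chains. It
| has nothing to do with how AlphaFold works.

```python
from itertools import product

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold_coords(moves):
    """Map a move string to lattice coordinates; None if the chain overlaps itself."""
    pos = (0, 0)
    coords = [pos]
    for m in moves:
        dx, dy = MOVES[m]
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in coords:
            return None
        coords.append(pos)
    return coords


def energy(sequence, coords):
    """HP energy: -1 per H-H pair adjacent on the grid but not along the chain."""
    e = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):
            if sequence[i] == "H" and sequence[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    e -= 1
    return e


def best_fold(sequence):
    """Try every move string; the search space is 4**(n-1), so keep n small."""
    best = (0, None)
    for moves in product("UDLR", repeat=len(sequence) - 1):
        coords = fold_coords(moves)
        if coords is None:
            continue
        e = energy(sequence, coords)
        if best[1] is None or e < best[0]:
            best = (e, "".join(moves))
    return best


if __name__ == "__main__":
    print(best_fold("HPPH"))
```

| Even in this toy, the exact minimum says nothing about whether a
| real chain would reach it, which is the gap between the model and
| the experiment described above.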
| whimsicalism wrote:
| You want to find a protein that has X structure (since
| structure determines function to a degree).
|
| If AlphaFold is substantially more accurate at solving
| proteins, it can mean that drug discovery is faster, assays are
| faster, etc. etc.
|
| The "unexpected problems" would be caught in the assay stage.
| radus wrote:
| Kind of disagree with this.. solving protein structures is
| not the rate limiting step in drug discovery or in
| biochemical assays -- not by a long shot. See this excellent
| comment by @dekhn on a related submission:
| https://news.ycombinator.com/item?id=27849046
| dekhn wrote:
| The protein folding problem is not NP complete. The "formal"
| protein folding problem, as posed (find the set of dihedral
| angles whose resulting structure has the lowest energy) might
| be, but that bears only a distant resemblance to how people
| "solve" the problem today. At the very least, the statement is
| incorrect because many proteins don't actually fold to their
| energy minimum, they get stuck in kinetic traps, and the formal
| PF definition never accommodated that idea.
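|
| The kinetic-trap point can be illustrated with a toy, assuming a
| made-up one-dimensional energy landscape and a purely downhill
| walk (both hypothetical; real folding dynamics are far richer).
| Where the walk ends depends on where it starts, and it need not
| end at the global minimum.

```python
def greedy_descent(energies, start):
    """Step to the lower neighbouring state until none is lower; return the final index.

    Assumes `energies` has at least two entries.
    """
    i = start
    while True:
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(energies)]
        lowest = min(neighbours, key=lambda j: energies[j])
        if energies[lowest] >= energies[i]:
            return i  # no downhill move left: a (possibly local) minimum
        i = lowest


if __name__ == "__main__":
    landscape = [3.0, 1.0, 2.0, 4.0, 0.0, 5.0]  # toy values, global minimum at index 4
    print(greedy_descent(landscape, 0))  # stops at index 1, a local minimum
    print(greedy_descent(landscape, 5))  # reaches index 4, the global minimum
```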
| bawolff wrote:
| I don't know much about protein folding, but for most things in
| life, exact solutions to NPC problems usually aren't needed for
| non-contrived problems. In many cases, approximations are good
| enough.
|
| Besides, this is real life - if predictions and real life
| match, that's great. If they don't, well you know you went
| wrong somewhere.
| wpasc wrote:
| A very-non-expert opinion, if an approach approximates it
| pretty well and can be improved upon, then it could end up
| being quite useful. Given that biology exists on a real,
| tangible scale then perfection in the fold prediction isn't
| necessary, instead just an approximation that is sufficiently
| good to be functionally useful.
|
| ^ That sounds like word-salad BS but I think there's some truth
| to it. I know protein folding has been postulated to be useful
| in terms of understanding basic biology, understanding disease
| pathology, and drug prediction. While a wide range of
| approximations are functionally useless, perhaps the Alphafold
| approach or some improved version of it surpasses the
| functionally useful threshold.
|
| At least I hope so
| ashtonbaker wrote:
| Not really an answer to your question, but is the problem
| really NP-complete, or just combinatorially difficult? For
| example how is this condition of NP-completeness satisfied?
|
| > it is a problem for which the correctness of each solution
| can be verified quickly [0]
|
| [0] https://en.wikipedia.org/wiki/NP-completeness
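|
| As an illustration of what "verified quickly" means, here is a
| minimal sketch using Boolean satisfiability, the standard
| NP-complete example, rather than any folding formulation. Finding
| a satisfying assignment may be hard, but checking a proposed one
| takes a single pass over the formula. The clause encoding (signed
| integers) is just a common convention chosen for this sketch.
| Whether a folding formulation meets the same condition depends on
| how it is posed as a decision problem.

```python
def satisfies(clauses, assignment):
    """Check a CNF formula against a truth assignment in one pass.

    Each clause is a list of non-zero ints: k means variable k,
    -k means NOT variable k. `assignment` maps variable -> bool.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


if __name__ == "__main__":
    # (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3)
    formula = [[1, -2], [2, 3], [-1, -3]]
    print(satisfies(formula, {1: True, 2: True, 3: False}))   # True
    print(satisfies(formula, {1: True, 2: False, 3: True}))   # False
```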
| Cas9 wrote:
| According to this answer[0] it seems it's actually NP-Hard,
| my bad. Haven't seen the proof though, and I'm not an expert.
|
| [0] https://cs.stackexchange.com/questions/128493/is-protein-
| fol...
| mrfusion wrote:
| Is it really np complete? If so we could map other np complete
| problems onto it and let biology solve it for us.
| nextos wrote:
| Alphafold 2 is very very cool, but we need a little dose of
| reality. It's still a bit away from really solving protein
| folding as it was marketed.
|
| For example, multi-complex proteins are not well predicted yet
| and these are really important in many biological processes and
| drug design:
|
| https://occamstypewriter.org/scurry/2020/12/02/no-deepmind-h...
|
| A disturbing thing is that the architecture is much less novel
| than I originally thought it would be, so this shows perhaps one
| of the major difficulties was having the resources to try
| different things on a massive set of multiple alignments. This is
| something an industrial lab like DeepMind excels at. Whereas
| universities tend to suck at anything that requires a directed
| effort of more than a handful of people.
| dekhn wrote:
| many of these resources are available, it's mostly that
| academic scientists don't have the time, money, or expertise to
| manage large datasets. However, the community has maintained
| high quality MSA databases for decades and that's exactly the
| work that DM drafted off.
| gnufx wrote:
| > academic scientists don't have the time, money, or
| expertise to manage large datasets
|
| I may be cynical about general expertise, as a support
| person, but large datasets have long been stock in trade of
| areas I'm more or less familiar with, whether "large" is TBs
| or PBs like CERN experiments. (When I were a lad, it was what
| you could push past the tape interface in a few days -- data
| big in cubic feet...)
| dekhn wrote:
| Tape is worthless except for archival purposes (and it's
| not particularly good). it should not be the constraint on
| the dataset (IE, any important dataset should already be in
| live serving with replication).
|
| Very few players wrangle petabytes effectively. Many
| players _have_ petabytes, but they're just piles of
| disorganized data that couldn't be used for training ML.
| Moving petabytes is still a huge pain and few folks have
| proficiency in giving ML algorithms high performance access
| to the data.
| zamalek wrote:
| I'm genuinely curious: could the output of Alphafold be fed
| into a classical folding algorithm (as a starting point), or is
| the output of Alphafold too far down the wrong path, in these
| cases?
| sbierwagen wrote:
| >A disturbing thing is that the architecture is much less novel
| than I originally thought it would be, so this shows perhaps
| one of the major difficulties was having the resources to try
| different things on a massive set of multiple alignments.
|
| A similar concern has sparked some worries about "AI overhang"
| https://www.lesswrong.com/posts/75dnjiD8kv2khe9eQ/measuring-...
|
| Most of the compute in ML research seems to be going into
| architecture search. Once the architecture is found, training
| and net finetuning/transfer learning is comparatively cheap,
| and then inference is cheaper still. This implies we could see
| 10-100x gains in AI algorithms using today's hardware, or
| sudden surprising appearance of AI dominance in an unexpected
| field. (Object grasping in unstructured environments? Art
| synthesis?) A task could go from totally impossible to trivial
| in a year. In retrospect, the EfficientNet scaling graph should
| have alarmed more people than it did:
| https://learnopencv.com/wp-content/uploads/2019/06/Efficient...
|
| Waymo has been puttering along for years, not announcing much
| of interest. This may have caused some complacency about self-
| driving cars, which is a mistake. Algorithms only get better,
| while humans stay the same. Once Waymo can replace some human
| drivers some of the time, things will start changing very
| quickly.
| timr wrote:
| > A disturbing thing is that the architecture is much less
| novel than I originally thought it would be, so this shows
| perhaps one of the major difficulties was having the resources
| to try different things on a massive set of multiple
| alignments. This is something an industrial lab like DeepMind
| excels at. Whereas universities tend to suck at anything that
| requires a directed effort of more than a handful of people.
|
| Yeah, the HN commentary on Alphafold has a high heat-to-light
| ratio. I'm eager to read the paper _because_ the previous
| description of the method sounded remarkably similar to methods
| that have been around for ages, plus a few twists.
|
| The devil is going to be in the details on this one.
| TaupeRanger wrote:
| That's the case with basically everything DeepMind does. They
| have a very good PR department which hypes up everything they
| do while conveniently ignoring that basically nothing of any
| practical consequence has come of their endeavors. But I do
| think it's important that these companies exist now so we can
| see what _not_ to try going forward.
| timr wrote:
| Well, the CASP14 results do speak for themselves. Protein
| structure prediction is not necessarily of great meaning to
| drug discovery or biology, but they pretty much blew
| everyone else out of the water in a fair contest. For that
| reason, they deserve praise.
|
| It's a little like making a robot that is very, very good
| at something pointless (say, using a yo-yo). Who knows
| where it might lead, but if they make the best damned yo-yo
| bot in the world, they deserve whatever praise they get
| from the yo-yo community.
| MrsPeaches wrote:
| > high heat-to-light ratio
|
| Sorry for the ignorance but what does this mean?
| AlexCoventry wrote:
| Emotion-to-understanding ratio
| butMuhCulture wrote:
| It's trying to say light is more valuable than heat, or
| some such folksy thing. I cook steak in the dark so I don't
| find it to be a very insightful metaphor.
| Azrael3000 wrote:
| Incandescent light bulbs are generally very inefficient in
| producing light, compared to LED for example. They produce
| a lot of heat and not much light for which they are made.
|
| So in this context I suppose that gp implies that these
| threads don't provide much meaningful discussion but rather
| lots of hand waving.
| HPsquared wrote:
| Light is also often used in metaphors relating to
| knowledge, wisdom etc.
| dekhn wrote:
| "Fiat Lux" not "Fiat Calor"
| timr wrote:
| It's an idiom implying that there's a lot of chatter and
| bold claims, but very little of it is factual or
| informative.
| dm319 wrote:
| The key difference seems to be using the multiple alignments
| and assumption about evolutionary conservation? Useful for
| genes conserved, but less useful for de-novo proteins (like
| COVID and cancer) I guess?
| timr wrote:
| Dunno yet. MSAs were always a key input to Rosetta
| (previous best method). How they were used was very
| different.
|
| Fundamentally, everything in this space (= non-physical
| methods) is about inferring structure from things that are
| closely related. And you can't solve the problem at all for
| non-trivial proteins using physics, so here we are.
| pjfin123 wrote:
| I'm assuming you can't run this on any consumer computer?
| pjfin123 wrote:
| Nevermind
|
| > The simplest way to run AlphaFold is using the provided
| Docker script. This was tested on Google Cloud with a machine
| using the nvidia-gpu-cloud-image with 12 vCPUs, 85 GB of RAM, a
| 100 GB boot disk, the databases on an additional 3 TB disk, and
| an A100 GPU.
| sambroner wrote:
| That's... way closer to consumer than I expected
| qeternity wrote:
| For inference...
|
| Still accessible, but expensive to run at scale. And
| training even worse.
| lifthrasiir wrote:
| Except for (DGX) A100.
| erhk wrote:
| 2.2TB data
| dekhn wrote:
| which is basically nothing. They could put it in a cloud
| bucket and you could copy it to another bucket in minutes.
| lasagnaphil wrote:
| Nah, 4TB disk drives are not that expensive.
| crazysim wrote:
| Amazing. That's not a lot of libraries of congresses at all.
| fossuser wrote:
| Does anyone on HN work in bio or drug discovery?
|
| Could you give an overview of how people can leverage this (or
| how you might?).
|
| From reading around about it, it sounds like there's often a need
| to find a certain type of molecule to activate/inhibit another
| based on shape and the ability to programmatically solve for this
| makes the searching way easier.
|
| Is this too oversimplified/wrong? How will this be used in
| practice.
|
| [Edit]: Thanks for the answers!
| timr wrote:
| > Could you give an overview of how people can leverage this
| (or how you might?).
|
| Short answer: nobody knows. Traditionally, protein folding is a
| solution in search of a problem, but that's largely because the
| predictions were...unusably bad. This was always more of a
| super-difficult validation problem for the force fields and
| simulation methods, which could then be used for other problems
| of greater value (such as rational protein design, or
| simulation of the motion of proteins with known structures).
|
| These predictions are better, but still pretty far from the
| level of precision that you'd want for any kind of rational
| drug design, where the exact locations of protein side-chains
| (for example) matter a lot. You'll note that AlphaFold returns
| structures that are "relaxed" using one of the oldest
| simulation systems for proteins: AMBER. So it's not exactly a
| clean-room solution to the problem, and you can't assume that
| the details (which matter to drug design) are going to be any
| better than for the older methods.
|
| But that said, if you have a method that can _reliably_ give
| you a blurry view of the overall shape of a protein, even that
| could be useful for things like target discovery or inference
| of biological networks. But this is still a lot closer to pure
| research than "revolutionizing drug discovery", as is
| frequently batted around on reddit, HN and the press.
| dekhn wrote:
| Also I would say that really they just made improvements to
| protein structure prediction, not _protein folding_ which is
| the dynamic process by which proteins reach their equilibrium
| fold.
| timr wrote:
| Most definitely.
| dumb1224 wrote:
| I work in cancer research with a drug discovery focus in a lab
| with some structural biologists. My understanding is that once we
| have identified protein targets suitable for therapeutics,
| understanding their structure to identify secondary binding sites
| could be crucial for drug discovery. Drugs can then be designed
| to modulate their biological functions.
| COGlory wrote:
| You can't do intelligent drug design if you don't know what the
| target protein looks like. We've gotten great at solving
| protein structures with things like crystallography and cryo-EM
| microscopy. Unfortunately, many interesting drug targets reside
| in the membrane of a cell, which means you can't easily work
| with them in a lab because they aren't soluble in anything but
| a plasma membrane. For instance, this is an issue with the
| 5HT2A protein, a G protein-coupled receptor that is implicated
| in many serotonin related pathways.
|
| Being able to predict what it would look like would be a huge
| deal because then you can go about intelligently designing
| drugs for it.
| ponsko wrote:
| You should check out Salipro (https://www.salipro.com/) for
| membrane protein reconstitution.
| dekhn wrote:
| I've worked in bio and drug discovery for some 25 years. That
| includes building classifiers using gradient descent in the 90s
| (when algorithms, computers and data were all much worse). I
| ported DOCK to Linux in ~96 or 97. Since then I built an
| academic and then industrial career with some emphasis on using
| computing to solve problems in drug discovery, but I don't play
| that role any more.
|
| It doesn't look like the models produced by this would
| immediately change the challenging problem of finding, approving,
| and marketing successful pharmaceuticals (i.e., it doesn't
| eliminate any real bottleneck).
|
| There was a long-term dream of structure-based drug discovery
| based on docking, but IMO, it has never really proved itself
| (most of the examples of success are cherry picked from a much
| larger pile of massive failures).
| miltondts wrote:
| > ... but I don't play that role any more.
|
| I was thinking of going into that field. Can you expand a bit
| on why you left?
| dekhn wrote:
| Because programming computers is far more lucrative, and
| I'm better at it. However, if I had an unlimited budget I
| would return to biology.
|
| I spent 15 years trying to be a professor and failed
| miserably. I was bad at it and didn't like what professors
| have to do.
|
| I then moved to industry to be a random engineer and
| thrived doing things entirely unrelated to drug discovery.
| Eventually, I convinced my company to invest heavily in
| life sciences. This was successful and I was on track to be
| a powerful player (a "research engineer", just like the DM
| folks who are building these things) in this space, when
| the project got very popular and I was elbowed aside by
| others who are more aggressive. So I went back to being a
| programmer again, it's much less stressful, pays better,
| and realistically, much of my time is just telling
| scientists what I would do if I was in their place anyway.
|
| "Don't swim with the sharks if you don't like being bitten"
| gnufx wrote:
| > much of my time is just telling scientists what I would
| do if I was in their place anyway.
|
| That sounds familiar. I guess they mostly don't listen,
| whatever your record -- especially if it was in a
| different field they could learn from -- but I hope it's
| not always like that.
| yudlejoza wrote:
| Most comp-biologists who work directly with programmers
| are some of the biggest jerks, and the least qualified
| tech folks.
|
| They hide all of that under "I'm a scientist, you're
| not".
| fossuser wrote:
| Maybe a culture clash? Academia is all about status and
| prestige - more often scientific outcomes seem to be a
| means to get the former (why journals don't publish
| negative results, why studies fail to replicate, why
| stuff isn't open access, why people worry about getting
| scooped, etc.)
|
| Tech (at its best) hates credentialism (sometimes I think
| to a point of over-correction).
|
| That said, 80% of the devs in the bay area seem to have
| gone to Stanford or MIT, so...
| nick238 wrote:
| I haven't worked on the drug side of things, but here's my bio
| perspective: It's kind of out-of-vogue, but consider the "lock
| and key" model of proteins and small molecules (drugs). For
| drug design, what you want to do is get a key that fits just
| one lock (to pull whatever lever) and not others (to avoid
| side-effects). It's relatively easy to find a molecule that
| fits a protein, because that protein is what you might spend
| years researching and probing, but it's tricky to check if it
| does anything against ~100,000 others in humans. If you could
| do an _in silico_ computational survey to be like, oh, maybe
| it'll target this accidentally, you could spot-check those _in
| vitro_, and/or stick on some other atoms to your small-
| molecule to make it not fit that off-target.
|
| Holy grail, IMO, though is being able to design _de novo_
| protein sequences (to make "biologics", aka engineered protein
| drugs) that can a) target (bind/block/enhance) or do (chemical
| reactions) what you want and only that, b) are easily
| synthesizeable by bacteria/yeast (cheap to make), and c) are
| stable (easy to transport/store).
| slownews45 wrote:
| First seems reasonable. I've not heard of anything on the
| latter coming even close credibly - though it is an obvious holy
| grail.
| zosima wrote:
| It can be an aid in drug development, and can perhaps assist a
| bit in tuning small molecule drugs for more stable binding.
|
| Though I think the major impacts will be two-fold:
|
| (1) The field of structural biology is going to see a change,
| with much more data available. Some structures of
| difficult-to-crystallize proteins will be solved, which may lead
| to much greater biological understanding. We may enter a time
| where, once you have a primary sequence, you also have a likely
| 3D structure, which will probably change the daily work of
| quite a few biologists a bit.
|
| (2) Industrial protein design. A tool such as this can
| potentially have great utility in optimizing proteins as
| chemical catalysts for various processes in different
| industries. This includes expanding the conditions under which
| a protein is active and also making its conformation more
| stable, and so the protein more long-lived in solution.
| dekhn wrote:
| For those that are unaware, industrial protein design is a
| multibillion dollar industry. For example, decades ago
| Genentech and Dow Corning formed a company that developed
| proteases (proteins that cut other proteins) that worked at
| much higher temperatures than the ones in nature. This was
| then sold to P&G and other major laundry companies (laundry
| detergent contains idle enzymes activated by the heat of the
| laundry water, and they go clean up). "Protein gets out
| protein" was the marketing jingle.
|
| That was a few billion dollars right there and almost all the
| work was done by hand by lab scientists.
| [deleted]
| Cas9 wrote:
| Honest question: since AlphaFold doesn't really _solve_ the
| protein folding problem (it's NP-complete after all), but only
| _approximates_ solutions very well, what are the real impacts of
| this? Isn't a good approximation of a protein enough to cause
| unexpected problems? How do we know that an approximate structure
| will perform the same as the correct solution?
| Ultimatt wrote:
| There is a lot of bias in the chat here toward a chemistry
| and pharma slant. If you ignore this, AlphaFold solves, in a
| very meaningful way, the problem blocking a lot of scientific
| investigation.
|
| For comparative and evolutionary analysis, structure is far more
| conserved than sequence. Especially in things like viruses or
| anything with a high rate of reproduction like bacteria. Just
| knowing the general fold or overall structure is enough to do
| structural alignment and tell if two genes are related on that
| basis, even if their genomic sequence is completely dissimilar.
| Large groups of researchers rely on sequence homology built
| from sequences of known structure.
|
| But AlphaFold works well in new sequence space, with far more
| accuracy than is needed. If we had an AlphaFold prediction for
| every known sequence, suddenly the evolutionary relationships
| between all genes and even all species would be far clearer.
| This on its own unlocks a new foundation to reason about
| function and molecular interaction with a holistic systems
| view, without gaps in what we can know with some reasonable
| assurance.
|
| For an analogy, think of having books in different languages
| describing objects. You know what some of the book in English
| might say, but you don't even know if the book in Spanish is
| talking about the same things. AlphaFold is like an AI that
| transforms all the books into picture books, and now we can use
| image similarity or have one person look at all the pictures.
| devindotcom wrote:
| Also announced today was RoseTTAFold from UW's Baker Lab, which
| claims nearly the same accuracy at much higher efficiencies.
| There's a public server and paper in Science.
|
| More info here and here:
|
| https://www.bakerlab.org/index.php/2021/07/15/accurate-prote...
|
| https://techcrunch.com/2021/07/15/researchers-match-deepmind...
| [deleted]
| stupidcar wrote:
| The model parameters are only available for non-commercial use.
| That's a shame, as I presume there might be a lot of medical
| startups that would benefit from having this kind of protein-folding
| tech available.
| mikewarot wrote:
| Unless I'm mistaken, you could train the model yourself,
| starting with a random set of values. In time, your error rates
| would be low enough to have a new set of parameters which you
| could use however you like.
| COGlory wrote:
| I am a structural biologist. This is one of the handful of topics
| that overlaps with my field here. I'm very excited to play with
| this, although it might eventually put me out of a job.
| AnimalMuppet wrote:
| Here's where I think we need to be going: You go to a doctor's
| office, sick. 1) They take a blood sample. 2) They find the
| malignant bacteria and DNA sequence it. 3) If it's a known
| strain, they know what antibiotics to use on it. 4) If not,
| they solve protein folding on the genes. 5) From that, they see
| which existing antibiotics would kill it. 6) If none will, then
| given the proteins, they have to derive a new antibiotic.
|
| 1) is easy. 2) might not be - there can be a lot of things in a
| blood sample, and finding only the interesting (bad) things
| might not be simple. The sequencing part is pretty much solved.
| 3) would take a bit of work, but I think it's possible now. 4)
| we're getting there. 5) might have a fair amount in common with
| 3), but it probably takes some additional work. 6) is...
| probably non-trivial.
|
| That's just one research agenda. There are others. You may have
| to move to related work, but I doubt you're going to be out of
| a job in this lifetime.
| rllearneratwork wrote:
| Why would it put you out of a job? Wouldn't it just become one of
| the tools you use?
| dekhn wrote:
| It would both become a tool he or she used (to produce initial
| structures to fit in density maps) and a tool that used his
| or her output (because AlphaFold requires known protein
| structures that are homologous to the one you're predicting).
| nikhilsimha wrote:
| The implicit assumption you are making is that the demand
| increases in lock step with productivity gains. 100x faster
| drug discovery, 100x more drugs _need_ to be discovered =>
| same number of people employed.
|
| These correlations do hold for technical fields, but
| logically there should be a point beyond which productivity
| gains outpace demand growth, or demand could even stop
| growing. One should either retool to solve a newer problem
| before this point is reached, or hope that the point is not
| reached in the span of their career.
|
| Oil rig builders for example - manufacturing has been
| increasingly automated, but the demand for oil rig building
| has grown consistently. But they should probably look into
| solving other problems given that demand is shifting.
| mensetmanusman wrote:
| However, the complexity of the structures is essentially
| unbounded on the timescale of the universe.
| sbierwagen wrote:
| >but logically there should be a point beyond which
| productivity gains outpace
|
| The limiting factor on drug approval is clinical trials.
| Once every living person is enrolled in a clinical trial,
| we will have hit the maximum rate at which humanity can
| produce new drugs.
|
| That might be more than 10x the current rate, but probably
| less than 1000x.
| dekhn wrote:
| In principle you could put people into multiple trials
| and gain some additional throughput. Google
| implemented putting users into multiple different
| experiments (paper by Tang et al.) and that made a huge
| difference.
___________________________________________________________________
(page generated 2021-07-15 23:00 UTC)