[HN Gopher] Learning From DNA: a grand challenge in biology
___________________________________________________________________
Learning From DNA: a grand challenge in biology
Author : ninjha01
Score : 54 points
Date : 2024-03-14 17:56 UTC (5 hours ago)
(HTM) web link (hazyresearch.stanford.edu)
(TXT) w3m dump (hazyresearch.stanford.edu)
| pfisherman wrote:
| Just gonna leave this here.
|
| https://www.biorxiv.org/content/10.1101/2024.02.29.582810v1
|
| Tl;dr: DNA is NOT all you need.
| samuell wrote:
| I tend to agree (the cell being in control and all the 4D
| interactions and epigenetic mechanisms, etc.), but out of
| curiosity, what would you say we also need?
| COGlory wrote:
| For starters, chemical environment modeling. But also cells
| differentiate, so in any system you need to understand the
| differentiation, and how those differentiated cells will
| change the environment of other cells, based on the
| environment they encounter.
|
| That's not to say you can't glean a ton from DNA, but there
| are some external inputs we may simply never know enough
| about to incorporate into the model. Ultimately DNA IS all
| you need...if you have perfect environmental information.
| pfisherman wrote:
| The article I posted shows what is working better - the Olga
| Troyanskaya / David Kelley style models. There was another
| one (Kundaje group?) recently that used Hi-C data.
| dekhn wrote:
| most of the examples in that paper (a single paper) show that
| DNA is nearly all you need, with the rest being RNA.
| pfisherman wrote:
| RNA is an obvious example. The examples and benchmarks they
| give in the paper are not the straw men the DNA LLMs are
| beating the stuffing out of.
|
| Also CRE activity is highly cell type specific. This article
| is a pretty awesome demonstration of model guided design of
| cell type specific cis regulatory elements.
|
| https://www.biorxiv.org/content/10.1101/2023.08.08.552077v1
|
| An LLM would not be able to do this because DNA itself
| contains no contextual information about cell type - every
| cell has a copy of the full genome. Epigenetic tracks, however,
| contain a lot of information germane to the cellular context,
| e.g. which parts of the genome are being transcribed.
| dekhn wrote:
| but epigenetics is just DNA. it's state information stored
| directly in the DNA, or in directly attached machinery.
| from the perspective of learned models, those are just
| other features.
|
| But realistically, the right source for transcription is
| the RNA in the cell, _not_ the epigenetics. Nearly all cell
| type profiling is based on RNA. It's far easier and more
| reliable to interrogate the transcriptome than to try to
| gain info from epigenetic states.
| pfisherman wrote:
| Epigenetics is not just DNA; think of it more like the
| (hidden) state of DNA. Histone modifications and open
| chromatin and other epigenetic readouts are like
| emissions / indicators of the hidden state.
|
| The relationship is like that between the words in a book
| and the page that is actively being read. I know that is
| a hackneyed analogy; but coffee is wearing off :)
| dekhn wrote:
| Those are all readable using standard DNA sequencing
| techniques, so again, it's just state attributes of the
| DNA.
|
| (I've worked in genomics for 30+ years. I'm not just
| spitballing here).
| jhbadger wrote:
| I think you are missing what the Evo project is trying to do --
| create a new prokaryotic genome through a generative model. This
| would work like the earlier hand-made synthetic genomes such as
| Synthia (Gibson et al., 2010).
|
| In such a system you would take an existing bacterial cell and
| replace its genome with the newly synthesized version. The
| proteins and other molecules from the existing cell would
| remain (before eventually being replaced) and serve to "boot"
| the new genome.
| nextos wrote:
| It's an interesting endeavor, but there are some obvious
| safety concerns.
|
| Within Prokaryotes, there is a lot of horizontal gene
| transfer. What if some of the synthetic sequences get into
| other organisms and spread out?
| UniverseHacker wrote:
| Those genes would have to confer an evolutionary advantage
| or they would immediately be discarded/selected against.
| The chances of that happening are nil... we're not going to
| come up with something more useful to bacteria than
| billions of years of natural selection. Synthetic biology
| with organisms produced to generate small molecules useful
| for humans is widely practiced but has the opposite
| problem: all of the engineering changes to the cells are
| constantly being selected against, and revert on their own.
| visarga wrote:
| DNA is all you need? In the future generative AI will generate
| You!
| d_silin wrote:
| Would be interesting to see what comes of it.
|
| As you progress along the following chain:
| genomics->proteomics->interactomics->metabolomics, our
| understanding becomes blurrier and the challenges harder.
| ninjha01 wrote:
| I built the wrapper/playground [0] linked in the article. Feel
| free to give feedback here or by the email in my bio
|
| [0] https://evo.nitro.bio/
| jashephe wrote:
| I'm a little disappointed that their linked preprint doesn't
| appear to include any molecular biology; i.e. they don't actually
| try to synthesize any of their predicted sequences and test
| function. It wouldn't be an outrageous synthesis task to make
| some of the CRISPR-Cas sequences they generated.
|
| Also interesting that AlphaMissense is omitted from Figure 2B; it
| substantially outperforms the ESM-based ESM1b in our hands. But I
| guess the idea is that this is a general-purpose DNA language
| model whereas AlphaMissense is domain-specific for variant effect
| prediction?
___________________________________________________________________
(page generated 2024-03-14 23:00 UTC)