[HN Gopher] Learning From DNA: a grand challenge in biology
       ___________________________________________________________________
        
       Learning From DNA: a grand challenge in biology
        
       Author : ninjha01
       Score  : 54 points
       Date   : 2024-03-14 17:56 UTC (5 hours ago)
        
 (HTM) web link (hazyresearch.stanford.edu)
 (TXT) w3m dump (hazyresearch.stanford.edu)
        
       | pfisherman wrote:
       | Just gonna leave this here.
       | 
       | https://www.biorxiv.org/content/10.1101/2024.02.29.582810v1
       | 
       | Tl;dr: DNA is NOT all you need.
        
         | samuell wrote:
         | I tend to agree (the cell being in control and all the 4D
         | interactions and epigenetic mechanisms, etc.), but out of
         | curiosity, what would you say we also need?
        
           | COGlory wrote:
           | For starters, chemical environment modeling. But also cells
           | differentiate, so in any system you need to understand the
           | differentiation, and how those differentiated cells will
           | change the environment of other cells, based on the
           | environment they encounter.
           | 
           | That's not to say you can't glean a ton from DNA, but there
           | are some external inputs we may simply never know enough
           | about to incorporate into the model. Ultimately DNA IS all
           | you need...if you have perfect environmental information.
        
           | pfisherman wrote:
           | The article I posted shows what is working better - the Olga
           | Troyanskaya / David Kelley style models. There was another
           | one (Kundaje group?) recently that used Hi-C data.
        
         | dekhn wrote:
         | most of the examples in that paper (a single paper) show that
         | DNA is nearly all you need, with the rest being RNA.
        
           | pfisherman wrote:
           | RNA is an obvious example. The examples and benchmarks they
           | give in the paper are not the straw men the DNA LLMs are
           | beating the stuffing out of.
           | 
           | Also CRE activity is highly cell type specific. This article
           | is a pretty awesome demonstration of model guided design of
           | cell type specific cis regulatory elements.
           | 
           | https://www.biorxiv.org/content/10.1101/2023.08.08.552077v1
           | 
           | An LLM would not be able to do this because DNA itself
           | contains no contextual information about cell type - every
           | cell has a copy of the full genome. Epigenetic tracks however
           | contain a lot of information germane to the cellular context
           | - e.g., which parts of the genome are being transcribed.
        
             | dekhn wrote:
             | but epigenetics is just DNA. it's state information stored
             | directly in the DNA, or in directly attached machinery.
             | from the perspective of learned models, those are just
             | other features.
             | 
             | But realistically, the right source for transcription is
             | the RNA in the cell, _not_ the epigenetics. Nearly all cell
             | type profiling is based on RNA. It's far easier and more
             | reliable to interrogate the transcriptome than to try to
             | gain info from epigenetic states.
        
               | pfisherman wrote:
               | Epigenetics is not just DNA, think of it more like the
               | (hidden) state of DNA. Histone modifications and open
               | chromatin and other epigenetic readouts are like
               | emissions / indicators of the hidden state.
               | 
               | The relationship is like that between the words in a book
               | and the page that is actively being read. I know that is
               | a hackneyed analogy; but coffee is wearing off :)
        
               | dekhn wrote:
               | Those are all readable using standard DNA sequencing
               | techniques, so again, it's just state attributes of the
               | DNA.
               | 
               | (I've worked in genomics for 30+ years. I'm not just
               | spitballing here).
        
         | jhbadger wrote:
         | I think you are missing what the Evo project is trying to do --
         | create a new prokaryotic genome through a generative model. How
         | this would work would be like the earlier hand-made synthetic
         | genomes like Synthia (Gibson et al., 2010).
         | 
         | In such a system you would take an existing bacterial cell and
         | replace its genome with the newly synthesized version. The
         | proteins and other molecules from the existing cell would
         | remain (before eventually being replaced) and serve to "boot"
         | the new genome.
        
           | nextos wrote:
           | It's an interesting endeavor, but there are some obvious
           | safety concerns.
           | 
           | Within prokaryotes, there is a lot of horizontal gene
           | transfer. What if some of the synthetic sequences get into
           | other organisms and spread out?
        
             | UniverseHacker wrote:
             | Those genes would have to confer an evolutionary advantage
             | or they would immediately be discarded/selected against.
             | The chances of that happening are nil... we're not going to
             | come up with something more useful to bacteria than
             | billions of years of natural selection. Synthetic biology
             | with organisms produced to generate small molecules useful
             | for humans is widely practiced but has the opposite
             | problem: all of the engineering changes to the cells are
             | constantly being selected against, and revert on their own.
        
       | visarga wrote:
       | DNA is all you need? In the future generative AI will generate
       | You!
        
       | d_silin wrote:
       | Would be interesting to see what comes of it.
       | 
       | As you progress along the following chain:
       | genomics->proteomics->interactomics->metabolomics, our
       | understanding becomes blurrier and the challenges harder.
        
       | ninjha01 wrote:
       | I built the wrapper/playground [0] linked in the article. Feel
       | free to give feedback here or by the email in my bio.
       | 
       | [0] https://evo.nitro.bio/
        
       | jashephe wrote:
       | I'm a little disappointed that their linked preprint doesn't
       | appear to include any molecular biology; i.e. they don't actually
       | try to synthesize any of their predicted sequences and test
       | function. It wouldn't be an outrageous synthesis task to make
       | some of the CRISPR-Cas sequences they generated.
       | 
       | Also interesting that AlphaMissense is omitted from Figure 2B; it
       | substantially outperforms the ESM-based ESM1b in our hands. But I
       | guess the idea is that this is a general-purpose DNA language
       | model whereas AlphaMissense is domain-specific for variant effect
       | prediction?
        
       ___________________________________________________________________
       (page generated 2024-03-14 23:00 UTC)