[HN Gopher] How much information is in DNA?
       ___________________________________________________________________
        
       How much information is in DNA?
        
       Author : crescit_eundo
       Score  : 47 points
       Date   : 2025-05-08 17:42 UTC (2 days ago)
        
 (HTM) web link (dynomight.substack.com)
 (TXT) w3m dump (dynomight.substack.com)
        
       | rhelz wrote:
       | In any case, 6.2 billion bits (interestingly enough, almost
       | exactly as much information as is on an audio CD, the kind you
       | used for your romantic mixtapes) is an upper bound.
       | 
       | This rules out pretty much every nutty theory which evolutionary
       | psychologists propose. Such as we evolved for altruism, we
       | evolved to believe in religion, etc etc. Complete B.S. Exactly
       | how much information would you need to specify a behavior like
       | being predisposed to a belief in religion??? There's less than 80
       | minutes worth of music's worth of information in our genomes, and
       | most of that is concerned with just keeping us alive.
       | 
       | You are not predisposed to be anything. Go create the kind of
       | person you want to be.
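
The figures in the comment above can be roughly checked with a few lines of arithmetic. This is only a sketch: the base-pair count (about 3.1 billion) and the 2-bits-per-base encoding are assumptions not stated in the thread, while the CD parameters are the standard 44.1 kHz, 16-bit, stereo audio format.

```python
# Rough arithmetic check: genome size in bits vs. audio-CD playing time.
# ASSUMPTION: ~3.1 billion base pairs in one copy of the human genome,
# and 2 bits per base (four possible letters: A, C, G, T).
BASE_PAIRS = 3_100_000_000
BITS_PER_BASE = 2

# Standard CD audio: 44,100 samples/s, 16 bits per sample, 2 channels.
CD_BITS_PER_SECOND = 44_100 * 16 * 2

genome_bits = BASE_PAIRS * BITS_PER_BASE
genome_megabytes = genome_bits / 8 / 1_000_000
cd_minutes = genome_bits / CD_BITS_PER_SECOND / 60

print(f"genome: {genome_bits:,} bits, about {genome_megabytes:.0f} MB")
print(f"equivalent CD audio: about {cd_minutes:.1f} minutes")
```

Under these assumptions the result lands a little over 73 minutes of uncompressed audio, which is consistent with the "less than 80 minutes" remark. It says nothing about how much of that capacity is actually used.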
        
         | ruuda wrote:
         | > There's less than 80 minutes worth of music's worth of
         | information in our genomes
         | 
         | That's a very misleading take, this is lossless audio and the
         | majority of the bits are spent encoding noise. You can encode
         | way more audio at perceptually but not technically lossless
         | level in that space.
        
         | guilbep wrote:
         | There is no logic behind your argument
        
         | out_of_protocol wrote:
         | > There's less than 80 minutes worth of music's worth of
         | information
         | 
         | Or an awful lot of text information (state-of-the-art
         | compressors can do up to a 1:10 ratio for plain text, and the
         | decoder itself is rather small; 750MB compressed could
         | potentially contain around 7GB of text data).
         | 
         | Also, look at demoscene. 4k (4 kB is the size of executable)
         | can do crazy things, and 64kB can fit a lot of nice 3D objects,
         | music, text, complex effects etc., and still weigh less than a
         | screenshot of any moment of the running demo. In 95kB you can
         | have a full game (google .kkrieger)
         | 
         | P.S. better example: full snake game in 56 BYTES
         | https://github.com/donno2048/snake
         | 
         | For comparison, the link above is 34 bytes and the whole
         | sentence is 83 bytes. It's possible to do a lot if we're
         | talking about code
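
The byte counts quoted in the comment above are easy to verify. The sketch below assumes the sentence is the "P.S." line joined to its URL with a single space (the dump wraps it across two lines), and it treats the 1:10 text compression ratio as the commenter's estimate, not a measured value.

```python
# Check the byte counts quoted in the comment (plain ASCII, so 1 char = 1 byte).
# ASSUMPTION: the wrapped sentence is rejoined with a single space.
url = "https://github.com/donno2048/snake"
sentence = "P.S. better example: full snake game in 56 BYTES " + url

url_bytes = len(url.encode("utf-8"))
sentence_bytes = len(sentence.encode("utf-8"))
print(f"link: {url_bytes} bytes, sentence: {sentence_bytes} bytes")

# Scale of the compression claim, using the commenter's own numbers.
compressed_mb = 750
ratio = 10  # claimed best case for plain text, not measured here
print(f"{compressed_mb} MB at 1:{ratio} is about {compressed_mb * ratio / 1000} GB of text")
```

Both counts come out larger than the 56 bytes claimed for the game itself, which is the point of the comparison.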
        
           | Valgrim wrote:
           | There's an interesting implication to this. We assume that
           | evolution happens when random mutations (similar to random
           | bit flips, removal or injection?) occur and when the random
           | result has an advantage, the mutation tends to remain in the
           | gene pool.
           | 
           | Yet at the same time the result of this random code is
           | extremely compressed, to the point we compare it to
           | procedural generative code.
           | 
           | Not sure what we can do with this but it certainly seems like
           | we can once again get inspired by nature on this one.
        
           | robviren wrote:
           | I'd argue you could even take that one step further. Limiting
           | it to the data encoded by DNA does not take into account what
           | it is interacting with. DNA interacts with an ocean of
           | protein leading to untold numbers of interactions. The DNA
           | could just be the operating system in all this calling upon
           | RNA and other "devices" to execute functions.
           | 
           | To expand upon your compression idea, the index it is using
           | exists outside the DNA encoding itself which means it could
           | be holding an absolute ton of data.
           | 
           | Bonus: https://xkcd.com/3056/
        
             | bossyTeacher wrote:
             | Indeed. Chances are that the DNA itself is but one part of
             | the puzzle. The protein soup the DNA interacts with is
             | partially random and partially a consequence of the DNA
             | itself and that interplay is likely a complexity space
             | several orders of magnitude bigger than the DNA itself
        
             | EvanAnderson wrote:
             | > The DNA could just be the operating system...
             | 
             | I am fond of the analogy of DNA to procedural generation.
             | The "operating system", as I see it, is physics. Everything
             | else is primitives built on top of that.
             | 
             | Our brains can't begin to comprehend the untold multitudes
             | of interactions occurring at a molecular scale over
             | geologic time.
        
           | bob1029 wrote:
           | > Also, look at demoscene. 4k (4 kB is the size of
           | executable) can do crazy things
           | 
           | There are limits to how Kolmogorov complexity scales up. Many
           | of these tricks are exploiting procedural techniques that can
           | be expressed in minimal terms. Once you start feeding in
           | actual information that is not feasible to express
           | procedurally (i.e., is already compressed/high-entropy), you
           | are forced to accumulate bits. An obvious example of this
           | would be incorporating a texture that is multiple megabytes
           | when compressed as a jpeg on disk.
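
The contrast drawn above between procedurally generated content and high-entropy content can be illustrated with a general-purpose compressor. This is a sketch under stated assumptions: `zlib` is only a practical stand-in (compressed size gives an upper bound on description length, not the Kolmogorov complexity itself), and seeded pseudo-random bytes stand in for an already-compressed texture.

```python
import random
import zlib

N = 1 << 16  # 64 KiB of data in each case

# "Procedural" data: produced by a tiny rule, so it is highly regular.
procedural = bytes((i * i) & 0xFF for i in range(N))

# High-entropy data: pseudo-random bytes stand in for already-compressed
# content such as a JPEG texture. Seeded so the run is repeatable.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(N))

procedural_packed = len(zlib.compress(procedural, 9))
noisy_packed = len(zlib.compress(noisy, 9))

print(f"procedural:   {N} -> {procedural_packed} bytes")
print(f"high-entropy: {N} -> {noisy_packed} bytes")
```

The regular data shrinks to a small fraction of its size, while the noisy data does not shrink at all, so every extra byte of it has to be paid for in full.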
        
             | out_of_protocol wrote:
             | Evolution also uses dirty tricks all the time, for no
             | reason. E.g. the same region gets reused for totally
             | unrelated use-cases.
             | 
             | > An obvious example of this would be incorporating a
             | texture
             | 
             | Some random range of storage data is now the texture. It
             | was used to process formatting logic but now also a texture
        
         | chromatin wrote:
         | > There's less than 80 minutes worth of music's worth of
         | information in our genomes
         | 
         | What an insanely bad take.
         | 
         | Not only did you not read and/or comprehend the article, the
         | article itself undersells the information content of the genome
         | (I'll post on this at the top level).
         | 
         | > You are not predisposed to be anything.
         | 
         | This does not logically follow your preceding statement, even
         | if we were to accept the foregoing limited information content
         | as factual
        
         | nurettin wrote:
         | You are predisposed to acting like your closest social circle.
        
         | nathan_compton wrote:
         | This isn't a great argument - simple rules can produce
         | complicated behavior and, at any rate, I don't think any
         | evpsych people believe that evolution inescapably predisposes
         | people to the things you talked about, only that evolution has
         | produced biases in our behavior which manifest (at certain
         | times and in certain circumstances) as those phenomena.
        
         | GuB-42 wrote:
         | An audio CD is a very inefficient way of storing information.
         | 
         | I think a more apt comparison would be that of a LLM of that
         | size. qwen:0.5b is about 400MB, its abilities are laughable
         | compared to the likes of ChatGPT, but it can write coherently
         | about general topics. For instance:
         | 
         |     >>> why would people be altruistic
         |     People are likely to be altruistic because they believe
         |     that helping others is better for everyone involved.
         |     People may also believe in the power of compassion and
         |     empathy towards others, which can contribute to greater
         |     altruism.
         |     Overall, people are likely to be altruistic because they
         |     believe that helping others is better for everyone
         |     involved.
         | 
         | It is not a statement about LLMs, more about what you can
         | achieve with "just" 400MB for storage. The other similarity is
         | that LLMs are also "messy", if you want to see the results of
         | finely crafted work in a really small amount of space, look at
         | what sizecoders can do with a few kB of code or less.
        
       | tetris11 wrote:
       | I thought the main advantage of DNA storage was the physical size
       | of it, and how many different genomes you could have stacked next
       | to each other in the same -70 degree space.
       | 
       | Millions of chimeric cells on the same petri dish? That's 1PB on
       | a single glass slide.
       | 
       | Depending on the sequencing tech paired with the rise of Spatial
       | data, the read speed could be formidable.
       | 
       | Needlessly complex setup though. Let's just stick with metals for
       | now.
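
The "1PB on a single glass slide" estimate above can be sanity-checked with simple arithmetic. This sketch assumes each cell contributes one distinct genome of 6.2 billion bits (the figure quoted elsewhere in this thread) and that 1 PB means 10**15 bytes; neither assumption comes from the comment itself.

```python
# ASSUMPTION: one genome copy holds 6.2 billion bits, each cell carries a
# distinct genome, and 1 PB = 10**15 bytes.
GENOME_BITS = 6_200_000_000
PETABYTE_BYTES = 10**15

genome_bytes = GENOME_BITS / 8
genomes_per_petabyte = PETABYTE_BYTES / genome_bytes

print(f"one genome: about {genome_bytes / 1e6:.0f} MB")
print(f"distinct genomes needed for 1 PB: about {genomes_per_petabyte / 1e6:.2f} million")
```

Under these assumptions roughly 1.3 million distinct genomes add up to a petabyte, so "millions of cells" is the right order of magnitude.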
        
         | out_of_protocol wrote:
           | DNA self-disintegrates very fast. It only works in living
           | cells because it is being repaired non-stop
        
           | throwanem wrote:
           | Even reading is a destructive process, and the physics
           | involved are incomprehensibly complex by comparison with
           | anything in the digital domain.
        
             | kjkjadksj wrote:
             | There are ways to read it nondestructively. One way does
             | trade resolution but once prepped the DNA itself can be
             | imaged to be read.
             | 
             | https://en.wikipedia.org/wiki/G_banding
        
               | throwanem wrote:
               | That does not read DNA. A chromosome is not a strand and
               | a karyotype is not a sequence. In any case, I described
               | what a ribosome does.
        
           | shishironline wrote:
           | Sorry, it is one of the most stable organic molecules and can
           | stay intact for thousands of years. That is why the Jurassic
           | Park like fantasies are based on a truth and many extinct
           | species have been brought to life through DNA in reality too.
        
             | chermi wrote:
             | I think maybe they are talking about the very tightly
             | packed yet still functionally accessible 3d structure that
             | is chromatin, not individual strands.
        
             | misnome wrote:
             | No, they haven't. Any claims otherwise are as real as "T
             | Rex Leather" handbags.
        
       | roxolotl wrote:
       | Discussion from earlier this week:
       | https://news.ycombinator.com/item?id=43927321
       | 
       | Pretty sure the substack and main site are the same. First
       | paragraph is at least.
        
       | nuc1e0n wrote:
       | The article says that DNA is designed to keep working despite
       | mutations occurring. What evidence does the author put forward to
       | suppose it was designed rather than evolved? There's plenty of
       | evidence to support it evolved BTW.
        
         | iamtheworstdev wrote:
         | you might be reading a little too much into that word
        
           | hsshhshshjk wrote:
           | And likely on purpose too
        
       | vintermann wrote:
       | Information can only be defined with respect to states where you
       | 1. Can tell (or could in theory tell) the difference and 2. Care
       | about the difference between states. The differences you _care_
       | about, and the ones you don't, are baked in whenever you use any
       | definition of information.
       | 
       | It doesn't matter much, unless you use it to sneak in what you
       | think we _should_ care about, or use it to make philosophical
       | arguments whose circularity is carefully hidden.
        
       | chromatin wrote:
       | The article massively undersells the information content of the
       | genome in several key ways. A non-comprehensive list of these
       | (before my morning coffee forgive me) includes:
       | 
       | - DNA methylation (https://en.wikipedia.org/wiki/DNA_methylation)
       | 
       | - Interactions of alleles (what article refers to as the "two
       | versions of each base pair")
       | 
       | - Duplications, deletions, inversions, and other structural
       | variations (https://www.genome.gov/genetics-glossary/Structural-
       | Variatio...)
       | 
       | - Physical proximity interactions in 3-dimensional space
       | (https://cmbl.biomedcentral.com/articles/10.1186/s11658-023-0...)
       | 
       | - Combinatorial effect (massive) of different alleles in complex
       | systems
       | 
       | Overall, it's not sensible to compare a linear sequence of bits,
       | like a CD (sibling comment) or DVD (the article), to the linear
       | sequence of the genome and conclude that their information
       | content, based on length alone, is in any way comparable.
        
         | deng wrote:
         | He does mention structural interactions as well as
         | duplications/deletions/inversions. I would argue methylation is
         | more like an annotation of DNA and not part of the DNA itself,
         | but that's a matter of opinion.
         | 
         | In the end, the author literally says: "nobody knows". Yes, you
         | cannot compare a linear sequence of bits to a macromolecule
         | that interacts structurally with its environment, and the
         | author does not make that claim. The question he tries to
         | answer is: how much data is needed to re-create a similar
         | macromolecule that interacts in a similar way. His main point,
         | in which you both agree: only the exons are surely not enough
         | because the encoded proteins are just a (small?) part of how
         | DNA interacts.
        
           | kjkjadksj wrote:
           | Exons are almost like functions, whereas a gene is almost
           | like a class definition. In different tissues in the body a
           | gene might be alternatively spliced to lead to different
           | protein isoforms. In effect, making use of only a subset of
           | available functions in the class depending on certain input
           | parameters or how the class is called.
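
The class-and-functions analogy in the comment above could be written down as a toy model. This is purely illustrative: the `Gene` class, the exon labels, and the tissue-specific exon choices are all hypothetical, and real splicing is regulated by far more than a lookup table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Toy model: a gene as an ordered collection of named exons."""
    name: str
    exons: tuple  # exon labels in genomic order

    def splice(self, included):
        """Return an isoform: the chosen exons, kept in genomic order."""
        unknown = set(included) - set(self.exons)
        if unknown:
            raise ValueError(f"unknown exons: {sorted(unknown)}")
        return tuple(e for e in self.exons if e in included)


# HYPOTHETICAL gene and tissue-specific exon choices, for illustration only.
gene = Gene("toy_gene", ("e1", "e2", "e3", "e4"))
tissue_programs = {
    "muscle": {"e1", "e2", "e4"},
    "neuron": {"e1", "e3", "e4"},
}

isoforms = {tissue: gene.splice(exons) for tissue, exons in tissue_programs.items()}
print(isoforms)
```

The same gene definition yields a different product depending on the context that "calls" it, which is the part of the analogy the comment is reaching for.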
        
             | throwanem wrote:
             | This is a Star Trek version of the subject, in that it is
             | pure technobabble which happens to mention a few real
             | terms.
        
         | foobarian wrote:
         | I find that even if this just provides a lower bound it is
         | still an interesting piece of information.
        
         | clickety_clack wrote:
         | You put my reaction to this in much more educated terms. I've
         | always felt that thinking of DNA as bits was a bit simplistic.
         | Just because we store information as bits it doesn't mean that
         | nature does.
         | 
         | Not that it means they can't be right, but the author also
         | doesn't seem to have any particular expertise in genetics.
         | Their ideas need to survive a lot more criticism by people who
         | know what they're talking about before you could start to see
         | them as convincing.
        
         | moralestapia wrote:
         | But all of those emergent effects are accounted for in the DNA
         | sequence [1], so the estimate is fine.
         | 
         | 1. Maaaaybe you could make a case for DNA methylation, but that
         | still requires some DNA signatures so ...
        
       | RainbowcityKun wrote:
       | - Cells work like this because DNA is under constant attack from
       | mutations.
       | 
       | - Mutations most commonly arise during cell replication.
       | 
       | It's fascinating to realize that the "messiness" of DNA isn't a
       | bug, but a feature--a side effect of evolution's raw material
       | supply chain.
       | 
       | Mutations, repeats, transposons, and imperfect repairs all
       | contribute to a noisy genomic landscape. But it's exactly this
       | noise that enables biological diversity. No mutations, no
       | variation. No variation, no selection. No selection, no
       | evolution.
       | 
       | The genome is not a blueprint--it's a living, adapting
       | scratchpad. Messiness is the canvas on which nature paints
       | diversity.
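
The chain "no mutations, no variation; no variation, no selection" in the comment above can be shown with a toy simulation. This is a minimal sketch, not a model of real biology: the bit-string genomes, the "count the 1 bits" fitness function, and all parameter values are arbitrary assumptions chosen for illustration.

```python
import random


def evolve(mutation_rate, generations=50, pop_size=20, genome_len=32, seed=0):
    """Return (initial best fitness, final best fitness) for a toy population.

    Fitness is simply the number of 1 bits, an arbitrary stand-in.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    initial_best = max(sum(g) for g in pop)

    for _ in range(generations):
        # Selection: keep the fitter half unchanged, so the best never gets worse.
        pop.sort(key=sum, reverse=True)
        parents = pop[: pop_size // 2]
        # Reproduction: copy each parent, flipping each bit with the given probability.
        children = [
            [bit ^ (rng.random() < mutation_rate) for bit in parent]
            for parent in parents
        ]
        pop = parents + children

    return initial_best, max(sum(g) for g in pop)


start, without_mutation = evolve(mutation_rate=0.0)
_, with_mutation = evolve(mutation_rate=0.02)
print(f"start: {start}, no mutation: {without_mutation}, with mutation: {with_mutation}")
```

With a mutation rate of zero, selection can only reshuffle what already exists, so the best score never moves from its starting value. With mutation switched on, new variants appear for selection to act on.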
        
         | esafak wrote:
         | Don't forget sexual reproduction.
        
         | nickpsecurity wrote:
         | Let me add to that. It requires a universe with specific laws
         | that remain stable and encourage optimization. Then, a planet
         | hospitable to life. Then, specific creatures with biological
         | machinery more complex than anything humans have created. The
         | machinery has plenty of reliability and adaptation baked in.
         | 
         | Godless evolution suggests randomness produced all of it
         | over time. Yet, that's never worked in anything we've built.
         | Even our GA's required laws, an environment, a computer,
         | software, and fine-tuning. Pre-existing or by intelligent
         | design (human inventors). Without these, it produced no
         | results.
         | 
         | So, I'll correct you by saying empirical data suggests
         | evolution didn't produce this. We're seeing God's design skills
         | in adaptive, resilient, complex, self-replicating systems. His
         | work is truly beautiful to behold. Humans still can't produce
         | something similar from scratch. Actually, they can't even be
         | sure how the existing design works.
        
       | amelius wrote:
       | Another question is:
       | 
       | How much information can you __store__ in DNA without affecting
       | the organism too much?
        
       | stenl wrote:
       | A much more detailed and thoughtful (and peer reviewed) take on
       | the same question from my colleague Jussi Taipale:
       | https://www.embopress.org/doi/full/10.15252/embj.201696114
        
       | metalman wrote:
       | DNA contains all of the actually relevant information that exists,
       | including whatever sequence gives rise to the very
       | conceptualisation of information, so in fact everything else that
       | could be considered "information" is derived from DNA.
        
       | gitroom wrote:
       | Man, the back and forth here before coffee is actually kinda
       | hilarious - I get all worked up before caffeine too, but
       | honestly, DNA being this messy scratchpad feels way more
       | interesting than treating it like a tidy CD. The messiness kinda
       | rules, if you ask me.
        
       | gfalcao wrote:
       | I would like to get a reasonably good intuition in regards to the
       | total amount of compound DNA from human bodies at different
       | biochemical states, in different locations around the world
       | (different climates). By "compound DNA" I mean including the DNA
       | of bacteria, fungi and viruses living within one's body. For
       | instance, gut bacteria acquired and maintained based on food
       | intake and environmental influence.
        
         | gfalcao wrote:
         | In other words, how much does the perceived amount of DNA data,
         | in gigabytes, grow under different circumstances? Would it grow
         | by a few more gigabytes?
        
       ___________________________________________________________________
       (page generated 2025-05-10 23:00 UTC)