[HN Gopher] How much information is in DNA?
___________________________________________________________________
How much information is in DNA?
Author : crescit_eundo
Score : 47 points
Date : 2025-05-08 17:42 UTC (2 days ago)
(HTM) web link (dynomight.substack.com)
(TXT) w3m dump (dynomight.substack.com)
| rhelz wrote:
| In any case, 6.2 billion bits (interestingly enough, almost
| exactly as much information as is on an audio CD, which you
| used for your romantic mixtapes) is an upper bound.
|
| This rules out pretty much every nutty theory which evolutionary
| psychologists propose. Such as we evolved for altruism, we
| evolved to believe in religion, etc etc. Complete B.S. Exactly
| how much information would you need to specify a behavior like
| being predisposed to a belief in religion??? There's less than 80
| minutes worth of music's worth of information in our genomes, and
| most of that is concerned with just keeping us alive.
|
| You are not predisposed to be anything. Go create the kind of
| person you want to be.
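
The CD comparison above can be checked with a little arithmetic. This is a minimal sketch, assuming the 6.2 billion bits come from roughly 3.1 billion base pairs at 2 bits each (an assumption; the thread only gives the total) and that an audio CD carries 44.1 kHz, 16-bit, stereo PCM. The function names are made up for illustration.

```python
# Rough check of the "audio CD" comparison. The genome figures are
# assumptions for illustration; only the 6.2 billion bit total is
# taken from the thread.
BASE_PAIRS = 3.1e9   # assumed number of base pairs
BITS_PER_BASE = 2    # four possible bases -> log2(4) = 2 bits

# Standard CD audio: 44.1 kHz sampling, 16 bits per sample, 2 channels.
CD_BITS_PER_SECOND = 44_100 * 16 * 2


def genome_bits() -> float:
    """Naive upper bound: every base pair costs a full 2 bits."""
    return BASE_PAIRS * BITS_PER_BASE


def cd_minutes(bits: float) -> float:
    """Minutes of uncompressed CD audio that the given bit count fills."""
    return bits / CD_BITS_PER_SECOND / 60


if __name__ == "__main__":
    bits = genome_bits()
    print(f"{bits:.2e} bits = {bits / 8 / 1e6:.0f} MB")
    print(f"about {cd_minutes(bits):.1f} minutes of CD audio")
```

Under these assumptions the total works out to a little over 73 minutes of uncompressed audio, consistent with the "less than 80 minutes" figure. It says nothing about how much behavior those bits can specify, which is what the replies dispute.
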
| ruuda wrote:
| > There's less than 80 minutes worth of music's worth of
| information in our genomes
|
| That's a very misleading take, this is lossless audio and the
| majority of the bits are spent encoding noise. You can encode
| way more audio at perceptually but not technically lossless
| level in that space.
| guilbep wrote:
| There is no logic behind your argument
| out_of_protocol wrote:
| > There's less than 80 minutes worth of music's worth of
| information
|
| Or an awful lot of text information (state-of-the-art compressors
| can reach up to a 1:10 ratio for plain text, and the decoder itself
| is rather small, so 750MB compressed could potentially contain
| something like 7GB of text data).
|
| Also, look at demoscene. 4k (4 kB is the size of executable)
| can do crazy things, and 64kB can fit a lot of nice 3D objects,
| music, text, complex effects etc., and still weigh less than any
| screenshot of any moment of the running demo. In 95kB you can
| have a full game (google kkrieger)
|
| P.S. better example: full snake game in 56 BYTES
| https://github.com/donno2048/snake
|
| For comparison, the link above is 34 bytes and the whole sentence
| is 83 bytes. It's possible to do a lot if we're talking about code
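
The byte counts in the comment above are easy to verify, and the same goes for the text-capacity estimate. This is a minimal sketch: `size_in_bytes` and `text_capacity_gb` are hypothetical helpers, and the 10x ratio is the commenter's optimistic figure, not a measured one.

```python
# Byte counts quoted in the comment above, measured as UTF-8.
LINK = "https://github.com/donno2048/snake"
SENTENCE = "P.S. better example: full snake game in 56 BYTES " + LINK


def size_in_bytes(text: str) -> int:
    """Length of the text once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def text_capacity_gb(compressed_mb: float, ratio: float = 10.0) -> float:
    """Plain text (in GB) that fits in compressed_mb at an assumed ratio."""
    return compressed_mb * ratio / 1000


if __name__ == "__main__":
    print(size_in_bytes(LINK))      # the link alone
    print(size_in_bytes(SENTENCE))  # the whole P.S. sentence
    print(text_capacity_gb(750))    # GB of text at the assumed 10x ratio
```

The link and sentence come out at 34 and 83 bytes, as stated, so the URL pointing at the game is more than half the size of the 56-byte game itself.
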
| Valgrim wrote:
| There's an interesting implication to this. We assume that
| evolution happens when random mutations (similar to random
| bit flips, removal or injection?) occur and when the random
| result has an advantage, the mutation tends to remain in the
| gene pool.
|
| Yet at the same time the result of this random code is
| extremely compressed, to the point we compare it to
| procedural generative code.
|
| Not sure what we can do with this but it certainly seems like
| we can once again get inspired by nature on this one.
| robviren wrote:
| I'd argue you could even take that one step further. Limiting
| it to the data encoded by DNA does not take into account what
| it is interacting with. DNA interacts with an ocean of
| protein leading to untold numbers of interactions. The DNA
| could just be the operating system in all this calling upon
| RNA and other "devices" to execute functions.
|
| To expand upon your compression idea, the index it is using
| exists outside the DNA encoding itself which means it could
| be holding an absolute ton of data.
|
| Bonus: https://xkcd.com/3056/
| bossyTeacher wrote:
| Indeed. Chances are that the DNA itself is but one part of
| the puzzle. The protein soup the DNA interacts with is
| partially random and partially a consequence of the DNA
| itself and that interplay is likely a complexity space
| several orders of magnitude bigger than the DNA itself
| EvanAnderson wrote:
| > The DNA could just be the operating system...
|
| I am fond of the analogy of DNA to procedural generation.
| The "operating system", as I see it, is physics. Everything
| else is primitives built on top of that.
|
| Our brains can't begin to comprehend the untold multitudes
| of interactions occurring at a molecular scale over
| geologic time.
| bob1029 wrote:
| > Also, look at demoscene. 4k (4 kB is the size of
| executable) can do crazy things
|
| There are limits to how Kolmogorov complexity scales up. Many
| of these tricks are exploiting procedural techniques that can
| be expressed in minimal terms. Once you start feeding in
| actual information that is not feasible to express
| procedurally (i.e., is already compressed/high-entropy), you
| are forced to accumulate bits. An obvious example of this
| would be incorporating a texture that is multiple megabytes
| when compressed as a jpeg on disk.
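
The point about procedural versus high-entropy data can be illustrated with an ordinary compressor. This is a minimal sketch, assuming zlib's output size is an acceptable stand-in for description length; Kolmogorov complexity itself is uncomputable, so a compressed size is only an upper bound. The helper names are made up for illustration.

```python
import random
import zlib

N = 1 << 16  # 64 KiB of input in both cases


def procedural_bytes(n: int) -> bytes:
    """Bytes produced by a tiny rule, so a short description exists."""
    return bytes((i * i) % 251 for i in range(n))


def high_entropy_bytes(n: int, seed: int = 0) -> bytes:
    """Pseudo-random bytes standing in for already-compressed data."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


if __name__ == "__main__":
    print("procedural:  ", compressed_size(procedural_bytes(N)), "of", N)
    print("high entropy:", compressed_size(high_entropy_bytes(N)), "of", N)
```

The rule-generated stream shrinks to a small fraction of its size, while the pseudo-random one does not shrink at all. The pseudo-random stream is itself the output of a short program plus a seed, so its true Kolmogorov complexity is small; zlib simply cannot find that structure. That gap between "compressible by a generic tool" and "expressible by a clever program" is what sizecoding exploits.
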
| out_of_protocol wrote:
| Evolution also uses dirty tricks all the time, for no
| reason. E.g. the same region gets reused for totally
| unrelated use-cases.
|
| > An obvious example of this would be incorporating a
| texture
|
| Some random range of storage data is now the texture. It
| was used to process formatting logic but now also a texture
| chromatin wrote:
| > There's less than 80 minutes worth of music's worth of
| information in our genomes
|
| What an insanely bad take.
|
| Not only did you not read and/or comprehend the article, the
| article itself undersells the information content of the genome
| (I'll post on this at the top level).
|
| > You are not predisposed to be anything.
|
| This does not logically follow from your preceding statement, even
| if we were to accept the foregoing limited information content
| as factual
| nurettin wrote:
| You are predisposed to acting like your closest social circle.
| nathan_compton wrote:
| This isn't a great argument - simple rules can produce
| complicated behavior and, at any rate, I don't think any
| evpsych people believe that evolution inescapably predisposes
| people to the things you talked about, only that evolution has
| produced biases in our behavior which manifest (at certain
| times and in certain circumstances) as those phenomena.
| GuB-42 wrote:
| An audio CD is a very inefficient way of storing information.
|
| I think a more apt comparison would be that of a LLM of that
| size. qwen:0.5b is about 400MB, its abilities are laughable
| compared to the likes of ChatGPT, but it can write coherently
| about general topics. For instance:
|
|     >>> why would people be altruistic
|     People are likely to be altruistic because they believe that
|     helping others is better for everyone involved. People may
|     also believe in the power of compassion and empathy towards
|     others, which can contribute to greater altruism. Overall,
|     people are likely to be altruistic because they believe that
|     helping others is better for everyone involved.
|
| It is not a statement about LLMs, more about what you can
| achieve with "just" 400MB for storage. The other similarity is
| that LLMs are also "messy", if you want to see the results of
| finely crafted work in a really small amount of space, look at
| what sizecoders can do with a few kB of code or less.
| tetris11 wrote:
| I thought the main advantage of DNA storage was the physical size
| of it, and how many different genomes you could have stacked next
| to each other in the same -70degree space.
|
| Millions of chimeric cells on the same petri dish? That's 1PB on
| a single glass slide.
|
| Depending on the sequencing tech paired with the rise of Spatial
| data, the read speed could be formidable.
|
| Needlessly complex setup though. Let's just stick with metals for
| now.
| out_of_protocol wrote:
| DNA self-disintegrates very fast. It only works in living cells
| because it is being repaired non-stop
| throwanem wrote:
| Even reading is a destructive process, and the physics
| involved are incomprehensibly complex by comparison with
| anything in the digital domain.
| kjkjadksj wrote:
| There are ways to read it nondestructively. One way does
| trade resolution but once prepped the DNA itself can be
| imaged to be read.
|
| https://en.wikipedia.org/wiki/G_banding
| throwanem wrote:
| That does not read DNA. A chromosome is not a strand and
| a karyotype is not a sequence. In any case, I described
| what a ribosome does.
| shishironline wrote:
| Sorry, it is one of the most stable organic molecules and can
| stay intact for thousands of years. That is why the Jurassic
| Park like fantasies are based on a truth and many extinct
| species have been brought to life through DNA in reality too.
| chermi wrote:
| I think maybe they are talking about the very tightly
| packed yet still functionally accessible 3d structure that
| is chromatin, not individual strands.
| misnome wrote:
| No, they haven't. Any claims otherwise are as real as "T
| Rex Leather" handbags.
| roxolotl wrote:
| Discussion from earlier this week:
| https://news.ycombinator.com/item?id=43927321
|
| Pretty sure the substack and main site are the same. First
| paragraph is at least.
| nuc1e0n wrote:
| The article says that DNA is designed to keep working despite
| mutations occurring. What evidence does the author put forward to
| suppose it was designed rather than evolved? There's plenty of
| evidence to support it evolved BTW.
| iamtheworstdev wrote:
| you might be reading a little too much into that word
| hsshhshshjk wrote:
| And likely on purpose too
| vintermann wrote:
| Information can only be defined with respect to states where you
| 1. Can tell (or could in theory tell) the difference and 2. Care
| about the difference between states. The differences you _care_
| about, and the ones you don't, are baked in whenever you use any
| definition of information.
|
| It doesn't matter much, unless you use it to sneak in what you
| think we _should_ care about, or use it to make philosophical
| arguments whose circularity is carefully hidden.
| chromatin wrote:
| The article massively undersells the information content of the
| genome in several key ways. A non-comprehensive list of these
| (before my morning coffee forgive me) includes:
|
| - DNA methylation (https://en.wikipedia.org/wiki/DNA_methylation)
|
| - Interactions of alleles (what article refers to as the "two
| versions of each base pair")
|
| - Duplications, deletions, inversions, and other structural
| variations (https://www.genome.gov/genetics-glossary/Structural-
| Variatio...)
|
| - Physical proximity interactions in 3-dimensional space
| (https://cmbl.biomedcentral.com/articles/10.1186/s11658-023-0...)
|
| - Combinatorial effect (massive) of different alleles in complex
| systems
|
| Overall, it's not sensible to compare a linear sequence of bits,
| like a CD (sibling comment) or DVD (the article), to the linear
| sequence of the genome and conclude that their information
| content, based on length alone, is in any way comparable.
| deng wrote:
| He does mention structural interactions as well as
| duplications/deletions/inversions. I would argue methylation is
| more like an annotation of DNA and not part of the DNA itself,
| but that's a matter of opinion.
|
| In the end, the author literally says: "nobody knows". Yes, you
| cannot compare a linear sequence of bits to a macromolecule
| that interacts structurally with its environment, and the
| author does not make that claim. The question he tries to
| answer is: how much data is needed to re-create a similar
| macromolecule that interacts in a similar way. His main point,
| on which you both agree: the exons alone are surely not enough
| because the encoded proteins are just a (small?) part of how
| DNA interacts.
| kjkjadksj wrote:
| Exons are almost like functions, whereas a gene is almost
| like a class definition. In different tissues in the body a
| gene might be alternatively spliced to lead to different
| protein isoforms. In effect, making use of only a subset of
| available functions in the class depending on certain input
| parameters or how the class is called.
| throwanem wrote:
| This is a Star Trek version of the subject, in that it is
| pure technobabble which happens to mention a few real
| terms.
| foobarian wrote:
| I find that even if this just provides a lower bound it is
| still an interesting piece of information.
| clickety_clack wrote:
| You put my reaction to this in much more educated terms. I've
| always felt that thinking of DNA as bits was a bit simplistic.
| Just because we store information as bits it doesn't mean that
| nature does.
|
| Not that it means they can't be right, but the author also
| doesn't seem to have any particular expertise in genetics.
| Their ideas need to survive a lot more criticism by people who
| know what they're talking about before you could start to see
| them as convincing.
| moralestapia wrote:
| But all of those emergent effects are accounted for in the DNA
| sequence [1], so the estimate is fine.
|
| 1. Maaaaybe you could make a case for DNA methylation, but that
| still requires some DNA signatures so ...
| RainbowcityKun wrote:
| - Cells work like this because DNA is under constant attack from
|   mutations.
| - Mutations most commonly arise during cell replication.
|
| It's fascinating to realize that the "messiness" of DNA isn't a
| bug, but a feature--a side effect of evolution's raw material
| supply chain.
|
| Mutations, repeats, transposons, and imperfect repairs all
| contribute to a noisy genomic landscape. But it's exactly this
| noise that enables biological diversity. No mutations, no
| variation. No variation, no selection. No selection, no
| evolution.
|
| The genome is not a blueprint--it's a living, adapting
| scratchpad. Messiness is the canvas on which nature paints
| diversity.
| esafak wrote:
| Don't forget sexual reproduction.
| nickpsecurity wrote:
| Let me add to that. It requires a universe with specific laws
| that remain stable and encourage optimization. Then, a planet
| hospitable to life. Then, specific creatures with biological
| machinery more complex than anything humans have created. The
| machinery has plenty of reliability and adaptation baked in.
|
| Godless evolution suggests randomness produced all of it
| over time. Yet, that's never worked in anything we've built.
| Even our GA's required laws, an environment, a computer,
| software, and fine-tuning. Pre-existing or by intelligent
| design (human inventors). Without these, it produced no
| results.
|
| So, I'll correct you by saying empirical data suggests
| evolution didn't produce this. We're seeing God's design skills
| in adaptive, resilient, complex, self-replicating systems. His
| work is truly beautiful to behold. Humans still can't produce
| something similar from scratch. Actually, they can't even be
| sure how the existing design works.
| amelius wrote:
| Another question is:
|
| How much information can you __store__ in DNA without affecting
| the organism too much?
| stenl wrote:
| A much more detailed and thoughtful (and peer reviewed) take on
| the same question from my colleague Jussi Taipale:
| https://www.embopress.org/doi/full/10.15252/embj.201696114
| metalman wrote:
| DNA contains all of the actually relevant information that exists,
| including whatever sequence gives rise to the very
| conceptualisation of information, so in fact everything else that
| could be considered "information" is derived from DNA.
| gitroom wrote:
| Man, the back and forth here before coffee is actually kinda
| hilarious - I get all worked up before caffeine too, but
| honestly, DNA being this messy scratchpad feels way more
| interesting than treating it like a tidy CD. The messiness kinda
| rules, if you ask me.
| gfalcao wrote:
| I would like to get a reasonably good intuition in regards to the
| total amount of compound DNA from human bodies at different
| biochemical states, in different locations around the world
| (different climates). By "compound DNA" I mean, including DNA of
| bacteria, fungi and viruses living within one's body. For
| instance, gut bacteria acquired and maintained based on food
| intake and environmental influence.
| gfalcao wrote:
| In other words, by how much does the perceived amount of DNA data,
| in gigabytes, grow in different circumstances? Would it grow by
| a few more gigabytes?
___________________________________________________________________
(page generated 2025-05-10 23:00 UTC)