[HN Gopher] A DNA 'parasite' may have fragmented our genes
___________________________________________________________________
A DNA 'parasite' may have fragmented our genes
Author : theafh
Score : 99 points
Date : 2023-03-30 15:32 UTC (7 hours ago)
(HTM) web link (www.quantamagazine.org)
(TXT) w3m dump (www.quantamagazine.org)
| dboreham wrote:
| Sometimes it's a little frustrating that people writing articles
| in this field seem to have no exposure to computer science.
|
| This is the gene equivalent of a filesystem: the DNA fragments
| are like disk blocks. The interposing sections are file metadata.
| The shearing mechanism is the filesystem reconstructing a stream
| from the lower layer blocks. There's probably some redundancy and
| error correction in there too.
|
| It needs a filesystem for the same reason they were invented for
| computers: to provide an impedance match between the upper layer
| semantics (a stream of pairs describing a protein) and the lower
| layer storage (blocks). Using block structured storage is more
| flexible in terms of being able to insert in the middle of a
| file, etc etc.
| sorokod wrote:
| _It needs a filesystem for the same reason they were invented
| for computer_
|
| Why would that be true? It is a cool analogy but implying that
| it is the reason requires much more.
| renewiltord wrote:
| Without taking a stance on whether it's right, I believe that
| is explained right after your quote cut-off.
| sorokod wrote:
| That the biology of processing DNA is subject to constraints
| and solutions similar to those computers have when dealing
| with disk storage is remarkable.
|
| Justification is needed.
| bronson wrote:
| At a coarse level, both are affected by information
| theory, so some parts may look vaguely similar. Sure,
| it's plausible you'd find related solutions, especially
| if you really squint.
|
| But it's like trying to explain how atoms work using a
| solar system analogy. It might help with the really easy
| stuff, maybe? (orbitals) But sticking with it makes going
| any deeper pretty confusing.
| [deleted]
| inciampati wrote:
| I don't like appeals to authority, but as a computational
| biologist, I am both a computer scientist and a biological
| scientist. So understand that I'm responding to you as exactly
| the kind of person you think should be drawing the kinds of
| links that you're suggesting.
|
| I'm sorry, but this is just not a reasonable analogy. DNA
| sequences are not like computer files. The reason that there are
| distinct modules in these sequences is the need for
| evolution to be feasible, and also the basic fact that
| the sequences are linear, so modules tend to appear in these
| linear sequences. But there are also tremendous nonlinearities.
| Things at very long distances can be importantly related
| to each other.
|
| The introns are not metadata. They are regions that can be
| removed selectively and in combinations to cause diversity in
| the produced proteins. That diversity is advantageous because
| it allows a single DNA sequence to present many different
| proteins that are typically related, but can be very different
| structurally. This splicing capability has evolved apparently
| from entities that can be seen as endogenous viruses or DNA
| parasites that have the ability to insert and splice themselves
| out of DNA and RNA sequences. In many confusing words, that's
| what the article is pointing at.
|
| The introns do provide a kind of redundancy, but only in the
| sense that there are areas that can be modified with minimal
| effect on cellular function, at least relative to modification
| of the exons, which directly correspond in a one to one way
| with proteins.
|
| There is error correction. It's called homologous chromosomes.
| Everyone talks about there being one genome, but in most
| complex forms of life, you have more than one copy per cell,
| usually two, and often more. These multiple copies, in addition
| to allowing for recombination and sexual reproduction, provide
| templates on which errors which arise during life can be
| corrected. However, there are no error correcting codes.
|
| If you'd like to learn more about the actual details of these
| systems, I strongly suggest an undergraduate molecular biology
| textbook. The best one in existence is called the Molecular
| Biology of the Cell.
|
| There are indeed many similarities between computing systems
| and biological systems, but the analogies you are making don't
| appear to be clear. Read a book like this, deeply and slowly,
| and it might change your life. At the very least, it'll mean that
| the world you live in is much less mysterious and much more
| exciting.
| notfed wrote:
| "Molecular Biology of the Cell"
|
| Shout out to this book. Not only does it have the most amazing
| imagery, but you can learn a lot just by skimming the book,
| because of how well organized it is.
| pazimzadeh wrote:
| Aren't introns like pieces of code that have been commented
| out?
|
| But they also change the 3D conformation of the DNA itself,
| which changes access by transcription factors, etc.
| gus_massa wrote:
| It's more like someone during the night cut your magnetic
| tape in the middle of a txt file and glued a picture
| of a cat between the two parts. The picture of the cat has
| some special code at its ends, so it automatically
| disappears when you open the txt file.
|
| There are some weird cases, where the same txt file has two
| cat pictures, and sometimes instead of removing just the two
| cats, the system also removes the text between the cats.
| gus_massa wrote:
| You look interested in the subject, but I recommend reading a
| few biology books about it. There are many weird low-level
| features of DNA that are not well covered in popular discussions
| [1] [2]. But I don't remember any that is similar to a
| filesystem as you propose. Take a look, you will be pleasantly
| surprised.
|
| [1] One of my favorites is that the bases of DNA are translated
| in groups of 3 to amino acids, so the code reads like
| AAABBBCCCDDDEEEFFF
|
| It's very unusual, but there are some viruses that read the same
| part in two ways, with different offsets, so the same part is
| interpreted as -JJJKKKLLLMMMNNN--
|
| I don't remember if they use the other offset too
| --PPPQQQRRRSSSTTT-
|
| [2] Another, not so interesting but relevant. Eukaryotes have
| linear DNA, so they have some special repetition at the
| ends. The idea is that the ends are difficult to copy with
| the usual enzyme that copies the main part, which has assorted
| code. But the ends have a special easy pattern, so the cell
| can use a specialized enzyme to make them longer.
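|
| A minimal sketch of the reading-frame idea in [1], assuming a
| made-up 12-base sequence and a hypothetical helper named
| codons(). It only shows how the same bases group into different
| triplets depending on the starting offset; it does not model
| any real virus.

```python
def codons(seq: str, offset: int = 0) -> list[str]:
    """Split seq into complete triplets, starting at offset (0, 1 or 2)."""
    usable = len(seq) - offset
    end = offset + usable - usable % 3  # drop a trailing partial triplet
    return [seq[i:i + 3] for i in range(offset, end, 3)]


seq = "ATGGCCTAAGGT"  # invented sequence, for illustration only

# The same bases read in the three possible frames of one strand.
for frame in range(3):
    print(frame, codons(seq, frame))
```

| Each offset yields a different list of triplets, which is why
| one stretch of sequence can in principle encode more than one
| product.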
| AllegedAlec wrote:
| https://xkcd.com/793/
| bashinator wrote:
| Nope, at best the filesystem is an analogy. Just like the
| stretchy rubber mat isn't a perfect description of spacetime.
| rco8786 wrote:
| That's a weird thing to get frustrated about...that someone in
| a completely unrelated field didn't have the experience or
| courtesy to explain something using analogous terms to the
| thing you happen to be an expert on.
| vkou wrote:
| Doubly so, when the two domains don't actually map cleanly to
| each other. DNA is not a computer program, or a file, or
| storage. There's no real distinction between data, metadata,
| and 'code' in it, either in structure, or in practice.
| aeonik wrote:
| I don't think they are unrelated fields. Computer science is
| the study of computation. DNA, to me, is clearly a quaternary
| computation system.
|
| I think there is a lot for both fields to learn by studying
| knowledge from each. Bioinformatics seems to be on that
| track.
| rco8786 wrote:
| Cool so what's the last thing you described to your team
| using terms from DNA research?
| spullara wrote:
| genetic optimization algorithms?
| anonymouskimmer wrote:
| We've got DNA which is basically a storage system. RNA can
| be catalytically active on its own. Typically RNA and
| proteins, or complexes of such, act on DNA in various
| manners.
|
| Maybe you're right in some way, but also consider whether
| using the nomenclature and ideas used to describe
| processes of DNA repair, transcription, and translation in
| biology to describe electronic computation works well. If
| it does work well, then maybe the reverse would also work
| well. If it leaves much to be desired, then consider the
| possibility that computer science ideas may be too specific
| to electronic or mechanical computation.
| kleer001 wrote:
| https://en.wikipedia.org/wiki/Curse_of_knowledge
| otherme123 wrote:
| There are organisms with almost no introns, no redundancy, no
| CRC, no "metadata" and even overlapping genes to save space,
| like Giardia genome (
| https://www.science.org/doi/10.1126/science.1143837 ). Lots of
| viruses have all their genes encoded without introns, and almost
| all the genome is encoding something.
|
| I've never seen DNA as close to a filesystem, and our current
| best bet on intron function is that they are used to create
| alternative splicing products from the same DNA chunk. I cannot
| identify this function in a filesystem, where you can obtain
| two or three different _valid_ files from the same data just by
| skipping some blocks.
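|
| A toy sketch of the alternative-splicing point above, assuming
| four invented exons and a hypothetical splice() helper. Real
| splicing is regulated and far less free than "any subset"; the
| sketch only shows several distinct products coming from one
| stored sequence.

```python
from itertools import combinations


def splice(exons: list[str], skip: frozenset[int] = frozenset()) -> str:
    """Join exons in order, leaving out any whose index is in skip."""
    return "".join(e for i, e in enumerate(exons) if i not in skip)


# Invented exon contents; each is a multiple of 3 bases long so that
# skipping one keeps the downstream reading frame intact.
exons = ["AUGGCC", "UUUGGC", "AAACCC", "GGUUAA"]

# Keep the first and last exon, optionally skip any subset of the middle ones.
middle = range(1, len(exons) - 1)
isoforms = {
    splice(exons, frozenset(chosen))
    for r in range(len(middle) + 1)
    for chosen in combinations(middle, r)
}

for product in sorted(isoforms, key=len):
    print(product)
```

| With two skippable exons this produces four different
| transcripts from the same underlying sequence.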
| pazimzadeh wrote:
| What is CRC?
|
| Metadata is everywhere... histone modifications,
| glycosylation, etc..
| anonymouskimmer wrote:
| Cyclic redundancy check? And there is almost always basic
| redundancy in non-viral organisms (and many viruses) in
| that DNA is typically double-stranded, and most organisms
| have repair machinery that can rewrite across single-strand
| lesions using the opposite strand (this fails at double-
| strand lesions).
| GauntletWizard wrote:
| I can - It's called COW Snapshotting. Modern filesystems like
| ZFS and BTRFS don't ever overwrite parts of the file that
| change. They abstract it away by keeping an ordered list of
| blocks. Snapshots are simply copies of the old list.
|
| The analogy doesn't go very far, however.
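|
| A toy sketch of the "ordered list of blocks" description above.
| The CowFile class and its methods are hypothetical names for
| illustration; this is not how ZFS or Btrfs are implemented,
| only the idea that a snapshot is a copy of the old block list.

```python
class CowFile:
    """Toy copy-on-write file: an ordered list of indices into an
    append-only block store."""

    def __init__(self, blocks: list[bytes]):
        self.store = list(blocks)                    # blocks are never overwritten
        self.block_list = list(range(len(blocks)))   # current view of the file

    def snapshot(self) -> list[int]:
        """A snapshot is just a copy of the current block list."""
        return list(self.block_list)

    def write(self, position: int, data: bytes) -> None:
        """Store new data in a fresh block and repoint the list entry."""
        self.store.append(data)
        self.block_list[position] = len(self.store) - 1

    def read(self, block_list=None) -> bytes:
        """Read the current file, or an older view given its block list."""
        ids = self.block_list if block_list is None else block_list
        return b"".join(self.store[i] for i in ids)


f = CowFile([b"AAA", b"BBB", b"CCC"])
snap = f.snapshot()
f.write(1, b"XXX")
print(f.read())      # current contents
print(f.read(snap))  # contents as of the snapshot
```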
| bronson wrote:
| COW just dedupes, it doesn't produce alternatives. Maybe a
| filesystem that figures out how your spreadsheet can be
| stored partway into an executable, with no loss to either?
| Yeah, this analogy doesn't seem real helpful.
| monocasa wrote:
| I don't think that's a great comparison. Exons already have
| sequences that mark 'block' boundaries; amino acids are encoded
| in triplets of base pairs, but sort of like you see in 8b/10b
| encoding, there are sequences that are valid but only used for
| control purposes and don't correspond to amino acids.
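|
| A small sketch of the "control sequences" point, assuming a
| deliberately partial codon table (the real standard table has
| 64 entries) and a hypothetical translate() helper. UAA, UAG and
| UGA are the standard stop codons, which end translation rather
| than adding an amino acid.

```python
# Partial codon table for illustration only; None marks a stop codon.
CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe",
    "GGC": "Gly",
    "AAA": "Lys",
    "UAA": None,
    "UAG": None,
    "UGA": None,
}


def translate(mrna: str) -> list[str]:
    """Translate triplets until a stop codon or the end of the sequence.

    Raises KeyError for codons missing from the partial table above.
    """
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue is None:  # stop codon: a control signal, not an amino acid
            break
        peptide.append(residue)
    return peptide


print(translate("AUGUUUGGCUAAAAA"))
```

| The trailing AAA is never translated because the UAA before it
| ends the chain.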
| afavour wrote:
| I don't want to sound dense here but why is it frustrating that
| people who write about genetics aren't familiar with computer
| science?
| agumonkey wrote:
| I can have similar thoughts at times. People in one field
| have their own lens to see the world and might miss some
| structures / patterns that exist in other domains. I felt it
| was a bit pompous to read a few medical books about the
| cardiovascular system, a lot of ceremony to describe an
| organic pump. You'd have to read mathematically inclined
| papers to start reading about equations and principles rather
| than latin nomenclature. Which I think is what the
| grandparent was wishing for.
|
| ps: I absolutely do not put computing above other fields
| though. I just wish for some pragmatic polymathism sometimes.
| anonymouskimmer wrote:
| > It needs a filesystem for the same reason they were invented
| for computers: to provide an impedance match between the upper
| layer semantics (a stream of pairs describing a protein) and
| the lower layer storage (blocks). Using block structured
| storage is more flexible in terms of being able to insert in
| the middle of a file, etc etc.
|
| As a non-CS person I find this explanation opaque.
| mmmrtl wrote:
| Misleading title ("Their" Genes). I don't see what introners have
| to do with the human genome?? They found evidence for introners
| in 5% of species...
|
| Original paper: https://www.pnas.org/doi/10.1073/pnas.2209766119
| [deleted]
| fnordpiglet wrote:
| I for one welcome our new introner overlords.
| masswerk wrote:
| "spliceosomes" - I'm somewhat disappointed. (Not really the true
| greco-roman spirit.)
| neoyagami wrote:
| oh. a "descolada"
| koeng wrote:
| My favorite quote I saw or heard somewhere on the rogue genetic
| elements in us all:
|
| "We are but a raft of genes in an ocean of retrotransposons"
|
| A little hyperbolic, but dang there are a lot
| MagicMoonlight wrote:
| That's sneaky. Ironically more like a computer virus than the
| classical viruses. Taking over the host and modifying its boot
| partition so that it permanently gets produced by the system.
| akavi wrote:
| I'd say more like a classical virus with a loop earlier in the
| central dogma.
|
| Ie, in the DNA => RNA => Protein cycle, viruses are a loop from
| Protein => DNA or Protein => RNA. Introners are a loop from RNA
| => DNA.
| gus_massa wrote:
| Viruses are not Protein => DNA or Protein => RNA. Their
| information is in DNA or RNA, so they are
| https://en.wikipedia.org/wiki/Virus#Genome_replication
|
| * DNA => RNA => Protein
|
| * RNA => Protein
|
| * RNA => DNA => RNA => Protein
|
| As far as I know, there is no method to do Protein => DNA or
| Protein => RNA at the cellular level.
|
| It would be very surprising. RNA and DNA are quite similar
| and have similar encoding, so RNA <==> DNA is a 1 to 1
| translation.
|
| The translation from RNA to proteins is not 1 to 1, and the
| translation table is quite arbitrary, so untranslating at
| the cellular level looks extremely difficult.
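|
| A minimal sketch of why the first mapping is reversible and the
| second is not. It assumes a simplified coding-strand view of
| transcription (T replaced by U) and a partial codon table; the
| function names are hypothetical.

```python
def dna_to_rna(dna: str) -> str:
    """Coding-strand view of transcription: T becomes U."""
    return dna.replace("T", "U")


def rna_to_dna(rna: str) -> str:
    """The inverse mapping: U becomes T."""
    return rna.replace("U", "T")


# Partial table: leucine has six codons in the standard genetic code,
# while methionine and tryptophan have one each.
LEUCINE_CODONS = {"UUA", "UUG", "CUU", "CUC", "CUA", "CUG"}
TRANSLATION = {codon: "Leu" for codon in LEUCINE_CODONS}
TRANSLATION["AUG"] = "Met"
TRANSLATION["UGG"] = "Trp"


def candidates(amino_acid: str) -> list[str]:
    """All codons in the partial table that translate to amino_acid."""
    return sorted(c for c, aa in TRANSLATION.items() if aa == amino_acid)


dna = "ATGCTTTGG"
print(rna_to_dna(dna_to_rna(dna)) == dna)  # the round trip recovers the input
print(candidates("Leu"))                   # several codons, so no unique inverse
```

| Going back from a leucine to its codon means choosing among six
| possibilities, so the original sequence cannot be recovered
| from the protein alone.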
| sobkas wrote:
| > That's sneaky. Ironically more like a computer virus than the
| classical viruses. Taking over the host and modifying its boot
| partition so that it permanently gets produced by the system.
|
| More like infecting a compiler so every application built using
| it will include the virus, including building compilers that
| will add virus code to their output and propagate it.
| anonymouskimmer wrote:
| There's basically no such thing as a boot partition. The
| machine of life has been turned on since the beginning. At this
| point, with all of the changes since the beginning, it's not
| obvious that there's a "boot partition" left that could
| reactivate life should it shut off. All cellular progeny is
| made with already functional and switched on proteins and RNA.
|
| The best you get to shutting off (without permanent cell death)
| would be the computer equivalent of hibernation. All the
| proteins and RNA are still there just waiting for the signal to
| activate again.
| jamiek88 wrote:
| Crazy to think we are all here because of that first multi
| cellular organism splitting over and over and over.
|
| Life doesn't reboot as you say, it is split off from other
| organisms whether seed, sperm, rhizome or any other method
| it's just cells dividing and spitting off other cells.
|
| Mind boggling to me.
| anonymouskimmer wrote:
| Yeah. I've known this for years but it really struck me
| when I typed it out here.
| __MatrixMan__ wrote:
| The idea that we all branched from a single ancestral
| organism has never sat well with me. Whatever started that
| process, however improbable... Well the universe allowed it
| to happen.
|
| Why expect that the universe wouldn't subsequently continue
| to let it happen, again and again?
| anonymouskimmer wrote:
| Sure, but it would be a completely different tree of
| life, from a different origin.
|
| It's possible our origin was from a community of
| ancestral organisms, but at some point all terrestrial
| life that we have discovered so far intermixed enough to
| create an effective universal common ancestor that we all
| appear to branch from.
| Tagbert wrote:
| Because, once it happens in an environment, there is no
| more room for an alternate life form to arise. A new
| instance of life would have to compete against the
| established line and it is unlikely to survive that
| process.
| Izkata wrote:
| My understanding is this is what happened during the
| Cambrian explosion.
| samus wrote:
| Life arising many times in parallel ought to have given
| rise to multiple trees of life with mutually incompatible
| biochemistry. Yet, overall life speaks roughly the same
| genetic language, and most things work very similarly to
| each other. Life could still have indeed arisen multiple
| times, however, it probably either merged or got
| supplanted by its competitors. Life is simply too
| pervasive to allow for anything else. It would also
| immediately out-compete any newly arising life.
|
| There is some evidence that things like the genetic code,
| the choice of RNA/DNA nucleotides, and the set of the 20
| amino acids aren't really random. That would not rule out
| life arising multiple times, but the likelihood that it
| merged with other lineages would be even higher.
|
| Short summary:
| https://www.science.org/content/blog-post/why-these-amino-ac...
| A more in-depth paper:
| https://www.sciencedirect.com/science/article/abs/pii/S03781...
| akiselev wrote:
| There are a lot of microbes that are unculturable to this
| day and to my knowledge, no one has really done a proper
| investigation to see if the universal metabolic molecules
| like, for example, the hydrogen carriers
| NAD+/NADP+/NADPH, are truly universal. If we're going to
| see evidence of multiple trees of life, it'd be in those
| little details because most of the food chain has to
| interact with each other. Or fungi and other decomposers
| can bridge the gap.
|
| I think over the span of billions of years, evolution
| tends to converge too much for the trees to remain very
| distinct from each other.
| anonymouskimmer wrote:
| > There are a lot of microbes that are unculturable to
| this day
|
| Some of this has been solved by literally allowing the
| microbes to sit in culture for a year or so in order to
| either wake up from hibernation, or adapt to the culture
| composition.
|
| > see if the universal metabolic molecules like, for
| example, the hydrogen carriers NAD+/NADP+/NADPH, are
| truly universal.
|
| For anything that's based on DNA or RNA we now do direct
| sequencing of environmental samples. From this direct
| sequencing we can pull out individual genes and pathways.
|
| > I think over the span of billions of years, evolution
| tends to converge too much for the trees to remain very
| distinct from each other.
|
| We've got over 20 recognized genetic codes already from
| existing life. These are highly similar, but this
| probably points to similar origins instead of
| convergence.
| nobody9999 wrote:
| >The best you get to shutting off (without permanent cell
| death) would be the computer equivalent of hibernation. All
| the proteins and RNA are still there just waiting for the
| signal to activate again.
|
| Fungal spores[0][1] come to mind.
|
| [0] https://en.wikipedia.org/wiki/Spore#Fungi
|
| [1] https://space.stackexchange.com/questions/37268/can-mushroom...
| marcosdumay wrote:
| It was well known that a lot of our genome got inserted there
| by viruses. I think the news this article is reporting is that
| the defense mechanism is the explanation for that weird
| behavior.
| stuckinhell wrote:
| Biology is truly fascinating, the ultimate hardware/software
| combo of proteins/genes.
|
| It's likely parasites can alter our genetic expression and
| behavior today as well, like rabies and toxoplasmosis (cats often
| have it). Rabies causing the fear of water is truly mind bending,
| how does it do that?!
|
| Toxoplasma infection is classically associated with the frequency
| of schizophrenia, suicide attempts or "road rage".
| https://pubmed.ncbi.nlm.nih.gov/31980266/#:~:text=Toxoplasma....
|
| Rabies: As the disease progresses, the person may experience
| delirium, abnormal behavior, hallucinations, hydrophobia (fear of
| water), and insomnia.
| https://www.cdc.gov/rabies/symptoms/index.html#:~:text=As%20....
| whizzter wrote:
| Even more interesting reverse of that, hairworms that infect
| grasshoppers will once mature cause the hosts to jump into
| water and drown where the worm then reproduces before starting
| the cycle again.
| hypertele-Xii wrote:
| And cordyceps fungi compel ants to climb to a specific height
| off ground, at millimeter and 95% accuracy, to a spot of
| ideal location and humidity for the fungus to spore.
|
| And the craziest thing is, the cordyceps fungus doesn't
| actually infiltrate the ant's brain! Autopsies found the
| fungus spreads all over the ant's body, but _not its brain!_
| thaumasiotes wrote:
| > Rabies causing the fear of water is truly mind bending, how
| does it do that?!
|
| It doesn't.
|
| > Rabies:As the disease progresses, the person may experience
| delirium, abnormal behavior, hallucinations, hydrophobia (fear
| of water)
|
| This is a weird mistake for the CDC to make. The etymological
| meaning of "hydrophobia" is "fear of water". But the English
| word is completely disconnected from that; it just means
| "rabies". Because of this, the disambiguation page for
| "Hydrophobia" on wikipedia links to rabies as well as to
| "aquaphobia", an actual fear of water which had to be named
| badly because the name "hydrophobia" was already taken.
|
| Rabies was named "hydrophobia" because rabies patients will
| generally refuse water when it's offered to them. They do that
| because rabies makes it difficult to swallow, not because
| they're afraid of the water.
| livelielife wrote:
| is dna hardware? software?
|
| it's both! it's neither! oh, and it's also the runtime!
| fjfaase wrote:
| Bert Hubert has an interesting idea about the reasons for
| introns. He explains this in his talk 'DNA: More Greatest Hits
| (SHA2017)' The interesting bit, with some introduction, starts
| at: https://youtu.be/rCdhsN--Mdo?t=1440
|
| This is a follow-up talk to his talk: 'DNA: The Code of Life'
| https://www.youtube.com/watch?v=EcGM_cNzQmE
___________________________________________________________________
(page generated 2023-03-30 23:01 UTC)