[HN Gopher] Sequencing your DNA with a USB dongle and open sourc...
___________________________________________________________________
Sequencing your DNA with a USB dongle and open source code
Author : johntortugo
Score : 218 points
Date : 2021-12-26 18:31 UTC (4 hours ago)
(HTM) web link (stackoverflow.blog)
(TXT) w3m dump (stackoverflow.blog)
| m12k wrote:
| I'm really curious about what I could learn by getting my DNA
| sequenced, but I'm worried about my rights to not have it
| recorded and shared without my consent if I got someone else to
| do it for me - so any advance toward an affordable home test
| setup is very welcome.
| adabaed wrote:
| Imagine insurers refusing to give you a service due to your
| predisposition to certain diseases...
| foobarbecue wrote:
| If you haven't seen Gattaca, you should
| haihaibye wrote:
| There should be a director's cut where the mission fails
| because of Vincent's hidden heart condition.
|
| Gattaca shows eugenics has been so vilified that the
| audience will root for a character who selfishly commits
| fraud, risking lives and scientific progress for his own
| vanity.
|
| The really scary fact is that there would be no need for a
| police state and segregation. The genetically enhanced
| would just completely dominate an open and fair
| competition.
| adabaed wrote:
| Yeah super good!!
| meltedcapacitor wrote:
| Protection from this comes from laws that ban DNA-based
| policies, not by being secretive about sequencing. If it is
| allowed, insurers will have no need to obtain DNA sequences
| in devious ways, they will just ask and refuse cover or
| charge more when clients refuse to get sampled.
| m12k wrote:
| Sure, but being secretive about your DNA seems like the
| prudent course of action until those laws are in place
| toomuchtodo wrote:
| "Passed in 2008, a federal law called the Genetic
| Information Nondiscrimination Act (GINA) made it illegal
| for health insurance providers in the United States to
| use genetic information in decisions about a person's
| health insurance eligibility or coverage."
|
| Also prevents employment discrimination based on
| genetics.
|
| https://www.genome.gov/about-genomics/policy-issues/Genetic-...
|
| (disclosure: have had my DNA sequenced by multiple
| organizations, and it's publicly available)
| jrumbut wrote:
| What I worry about is having this data laundered through
| a couple of vendors.
|
| "How could we know our vendor's vendor was using genetic
| information in their proprietary risk score?"
|
| "How could we know our client's client was using our
| score for life, health, or auto
| insurance/employment/lending/etc decisions?"
|
| It's a "can't unring a bell" situation and the gaps in
| the regulations and the incentives for bad behavior are
| enormous.
| ajuc wrote:
| It's amazing how many problems you avoid by having a public
| health system.
| inglor_cz wrote:
| You avoid the problem with medical debt, to be precise.
|
| You cannot really avoid the fundamental constraints -
| anywhere in the world, there are only so many doctors and
| so much money available for treatments. IDK if the USA has a
| shortage of doctors, but plenty of European countries do. A
| country like Romania just cannot give its doctors big
| enough wages to stop them from seeking employment
| elsewhere, where they will get five to ten times as much
| (UK, Germany, Switzerland). As a result, local hospitals
| are seriously understaffed.
|
| Where I live, having personal connections to good doctors
| gives you an advantage - you will be examined and treated
| faster. Then there is outright nepotism.
|
| The outgroups are different than in America, but there are
| always people for whom the system sucks.
| adabaed wrote:
| You resolve part of them, but immediately generate others.
| Hybrid systems are the way to go.
|
| In Spain, for example, we have a public system, but it is
| extremely inefficient in some areas (and very good in
| others). Of course, you can have private insurance, but you
| still have to pay your social security. Curiously, the only
| ones who can decide which system they want are the public
| servants...
| Method-X wrote:
| When I had 23andme sequence my DNA, I used a fake last name and
| pre-paid credit card.
| biophysboy wrote:
| It's only valuable if somebody also interprets it for you, such
| as telling you whether you have a genetic predisposition for
| certain diseases.
| DoctorOW wrote:
| Is that not something software can theoretically provide?
| jacquesm wrote:
| Your DNA can tell you a lot about what _could_ happen, but
| not about what _is_ happening.
| m12k wrote:
| One of the other comment threads indicates that the data you
| need for that kind of annotation of the sequence is, to some
| extent, available for home use as well:
| https://news.ycombinator.com/item?id=29695449
|
| I'm really hoping someone will work on an open source
| "23andme@home" solution that ties all this together in an
| accessible way.
| rumblerock wrote:
| Years ago I used Ancestry, then requested the .txt file and
| asked them to delete it from their records. Uploaded it to
| run a report at https://promethease.com/ that cross-
| references your SNPs against the existing body of genetic
| research.
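| A minimal Python sketch of that cross-referencing step, under
| stated assumptions: the raw export is tab-separated with `#`
| comment lines, either Ancestry-style (rsid, chromosome, position,
| allele1, allele2) or 23andMe-style (rsid, chromosome, position,
| genotype); check your own file's header. The `KNOWN` table and its
| rsid are hypothetical placeholders, not real annotations and not
| how Promethease works internally.

```python
import csv
import io


def parse_raw_genotypes(text: str) -> dict:
    """Map rsid -> genotype string from a tab-separated raw export (assumed layout)."""
    genotypes = {}
    data_lines = (ln for ln in io.StringIO(text) if ln.strip() and not ln.startswith("#"))
    for row in csv.reader(data_lines, delimiter="\t"):
        if row[0].lower() == "rsid":  # skip a column-header row if present
            continue
        if len(row) == 5:  # Ancestry-style: two allele columns
            genotypes[row[0]] = row[3] + row[4]
        elif len(row) == 4:  # 23andMe-style: one genotype column
            genotypes[row[0]] = row[3]
    return genotypes


def cross_reference(genotypes: dict, known: dict) -> list:
    """Return (rsid, genotype, note) for each listed genotype the person carries."""
    hits = []
    for (rsid, genotype), note in known.items():
        # Unphased data: treat "AG" and "GA" as the same genotype.
        if rsid in genotypes and sorted(genotypes[rsid]) == sorted(genotype):
            hits.append((rsid, genotype, note))
    return hits


# Hypothetical lookup table; a real one would be built from published research.
KNOWN = {("rs0000001", "AG"): "placeholder note, not a real finding"}
```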
|
| The results have been pretty astounding. I found markers
| that pointed to poor response to a specific blood thinner
| my grandfather was put on before he passed. Currently I'm
| researching the cluster of Bipolar / ADHD / SAD symptoms I
| experience that all seem to trace back to a certain
| genotype of circadian rhythm genes I have (thank you, Sci
| Hub). To boot, some of the studies I've come across have
| been done on Han Chinese populations that match my
| ancestry.
|
| Perhaps going too far down this rabbit hole poses a self-
| diagnosis risk, but the correlations to my family history
| and my own life experience working with doctors to diagnose
| and treat symptoms are pretty undeniable. And given that
| your run-of-the-mill psychiatrist is going to treat you off
| of a DSM checklist, I feel much more confident knowing
| there have been genomic studies to back things up, since my
| doctor isn't up to date on this research, and finding one
| who is will be difficult and expensive. I've shared
| the papers with my doc and he's been supportive; sometimes
| I feel like I should be getting a discount on services
| rendered.
| ClumsyPilot wrote:
| >"poses a self-diagnosis risk"
|
| Self-diagnosing is not the problem it is made out to be -
| I live with my symptoms 24/7, doctor sees me for 5
| minutes. The number of times doctors have missed fairly
| clear signs of trouble in my family is disturbingly high.
| A simple procedure, done in time, would have saved two
| people I know.
|
| Unfortunately our educational system teaches you about
| mitochondria, but not the practical difference between
| ibuprofen and paracetamol, or what CRP is.
| dekhn wrote:
| Note that you are literally shedding identifiable DNA from your
| body at all times and a truly motivated adversary would have no
| problem obtaining enough sample material to do high quality
| sequencing.
| nomercy400 wrote:
| It's not the motivated adversary I am worried about, who
| actually has to show up where I have physically been. It is
| the company on the other side of the world in a country with
| lax legislation, profiling me based on the data I 'shed'
| online, like a cloud-based DNA sequencing service.
| shukantpal wrote:
| At scale?
| hourislate wrote:
| I'm curious whether a Covid PCR test could be used to
| sequence your DNA. Is there enough of a specimen in the
| process?
| eurasiantiger wrote:
| Absolutely.
| dekhn wrote:
| Sure. I've worked with and know people who could carry this
| out at scale, although obviously individual sample
| collection isn't highly scalable.
|
| Edit: I used to help Google fund researchers like Joe
| Derisi and others who develop technology to do this, and
| some of the people I worked with in my academic career are
| quite good at identifying serial killers from 30 year old
| DNA. If you're downvoting because you think I'm making this
| up, you're wrong. If you're downvoting because you don't
| think large-scale individual detection using genetic
| sampling of the environment is possible, you're wrong. If
| you're downvoting because you think you couldn't do a whole
| genome sequence of an individual using a sample collected
| in the wild, you're wrong. If you're downvoting because you
| think this is a terrible idea (morally, ethically), that's
| fine but I didn't say anything about my own moral or
| ethical beliefs about this.
|
| It's simply factually correct to say that large-scale
| individual sample collection (at order tens of thousands,
| if not hundreds of thousands of individuals in a country
| the size of the US) is possible. All the technology is
| there to do this.
| ClumsyPilot wrote:
| The data monopolies and abuse originate from people giving
| these companies data for free. If they had to buy it, or pay
| goons to collect it, they wouldn't be profitable.
| russdill wrote:
| In the near future (or arguably now depending on your
| purpose) you don't even need that. Assuming enough of your
| relatives' sequences are available, the probability of you
| having certain genes/mutations can be narrowed down so much
| that having your individual genome doesn't add much.
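| A minimal Python sketch of why relatives' genomes narrow things
| down: at a single biallelic site, under simple Mendelian
| inheritance (assumed: one allele drawn from each parent, no new
| mutation), the parents' genotypes alone fix the child's genotype
| distribution. Real relative-based inference (imputation, linkage)
| is far more involved.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(parent1: str, parent2: str) -> dict:
    """Child genotype probabilities, taking one allele from each parent."""
    counts = Counter()
    for a, b in product(parent1, parent2):
        counts["".join(sorted(a + b))] += 1  # unphased: "aA" counts as "Aa"
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}
```

| `offspring_distribution("Aa", "Aa")` gives AA 1/4, Aa 1/2, aa 1/4;
| with parents `AA` and `aa` the child is `Aa` with certainty, so
| the child's own sequence adds nothing at that site.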
| kingcharles wrote:
| So, how long before I can take my DNA "ROM" file and boot it in
| an emulator that would allow it to grow?
| dekhn wrote:
| It's unlikely we would ever be able to achieve this. Even
| simulating a single cell at high resolution is a serious
| challenge.
| 323 wrote:
| You seriously underestimate the continuous growth of computing
| power. And after that, quantum computers, which are perfect for
| simulating chemical reactions.
|
| What was unthinkable 50 years ago, playing chess better than
| a human, is now trivial for a $100 device.
|
| And it's not necessarily required that to simulate the growth
| of a human you'll need to simulate the entirety of chemical
| reactions in all 50 trillion cells and all that.
| dekhn wrote:
| It's possible I underestimate, but I have worked in all the
| relevant fields of simulation, ~20 years of running various
| simulations on large HPC, built the largest instance of
| folding@home using idle cycles inside google data centers,
| published papers simulating proteins, developed
| infrastructure to process the voluminous data, etc, etc.
| Quantum computing remains fantasy (in terms of being useful
| for science).
|
| It's unlikely even if we improved computing hardware many
| orders of magnitude beyond all reasonable predictions, that
| the calculations would be able to simulate all the
| necessary details; most of our simulations now are based on
| many approximations due to hardware limitations.
|
| As to the question of "what level of fidelity is required
| to turn a FASTQ of somebody's genome into an accurate model
| of the resulting human, with some sort of realistic
| environment also provided", that's so far beyond what is
| even remotely comprehensible it's not worth speculating
| about in terms of science fact; it's just fiction.
| GistNoesis wrote:
| It's likely that you don't have to simulate even a single
| cell at high resolution to be able to simulate how an
| organism would grow. There are numerical shortcuts.
|
| For example, today we can already predict eye color and
| other phenotypes from DNA.
|
| If you are able to observe enough samples of cell growth and
| their associated DNA, you probably can model and predict the
| statistics of a cell from its DNA. Because the cell is
| itself the result of a lot of chemical processes, the law of
| large numbers will help smooth those statistics.
|
| Given that we have a lot of cells, the collective behavior is
| probably entirely governed by these statistics.
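| The law-of-large-numbers intuition can be made concrete: if each
| of n independent processes succeeds with probability p, the
| standard deviation of the observed success fraction is
| sqrt(p(1-p)/n), so the noise shrinks like 1/sqrt(n). A minimal
| Python sketch; independence is an assumption, and real cellular
| processes are often correlated, which weakens the smoothing.

```python
import math


def fraction_std(p: float, n: int) -> float:
    """Std. deviation of the success fraction over n independent Bernoulli(p) trials."""
    if not 0 <= p <= 1 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return math.sqrt(p * (1 - p) / n)


for n in (1, 100, 1_000_000):
    print(n, fraction_std(0.5, n))
```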
| Lev1a wrote:
| An idea just popped into my head reading your comment:
|
| What if you could take the (binary) data file of your DNA and
| use it as input in the (recently remastered) Monster Rancher
| games to generate a monster? Apparently those games use
| external user-provided data (like music CDs, game discs etc.)
| to generate the monsters the player would then train and use
| (something I only recently learned about through gaming
| livestreams).
|
| I'd actually like to see the level of jank that would come out
| of something like that.
| LinuxBender wrote:
| This is very cool. Are there by chance any associated projects
| that could evolve into something like 23andme but remain entirely
| within a private network meaning that the data is entirely in the
| hands of the individual?
| ampdepolymerase wrote:
| A used laboratory-grade NGS system can be had for less than $10K
|
| https://www.ebay.com/itm/265148387179
|
| Nanopore is still not quite ready yet for precise and high
| accuracy sequencing. Give it another five years.
| anderspitman wrote:
| I work in a dry lab but I'm pretty sure you need a lot of
| expensive chemicals to actually make one of these work, yeah?
| mylons wrote:
| yup. that's the business model for Illumina. it's very much
| akin to video game consoles. Illumina might take a hit on
| selling the machine but makes it up in selling you
| proprietary reagents.
| rbartelme wrote:
| Cost/benefit analysis may dictate that, as other posters
| suggested, you'd be better served to get raw fastq files from
| a sequencing lab. Even better if you can send the lab a
| sample and they'll process the extractions for extra $$.
| mylons wrote:
| wow i didn't know they were that "cheap" now. i used to work
| for a major competitor to the sequencer you linked, the
| SOLiD.
|
| and i feel like nanopore is the VR of dna sequencing. it's
| always just another few years off.
| ampdepolymerase wrote:
| The one I linked to is a decade out of date and OEM
| discontinued.
| mylons wrote:
| ya my first thought was how hard are reagents to get, but
| probably not that hard. i wasn't in the lab, i was in
| bioinformatics so i'm generally clueless on reagent
| acquisition.
| joshuamcginnis wrote:
| What do you mean by it's always a few years off? Nanopore
| will allow you to do high-quality genomic sequencing _now_,
| in a home lab if you wanted, for less than $3K. If you
| amortize the 3K by the number of genomes you can sequence
| on the same flow cell, the price per base or per genome
| falls precipitously, depending on the size of the genome of
| course.
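| The amortization argument is simple arithmetic. A minimal Python
| sketch with hypothetical inputs (plug in current prices; flow-cell
| yield varies a lot with sample quality and genome size, and a
| large genome may need several flow cells, i.e. fewer than one
| genome per flow cell):

```python
def cost_per_genome(device_cost: float, flow_cell_cost: float,
                    flow_cells: int, genomes_per_flow_cell: float) -> float:
    """Total spend (device + flow cells) divided by genomes sequenced."""
    genomes = flow_cells * genomes_per_flow_cell
    if genomes <= 0:
        raise ValueError("must sequence at least part of a genome")
    return (device_cost + flow_cells * flow_cell_cost) / genomes


# Hypothetical numbers, purely for illustration.
print(cost_per_genome(device_cost=1000, flow_cell_cost=500,
                      flow_cells=4, genomes_per_flow_cell=5))
```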
| divbzero wrote:
| > and i feel like nanopore is the VR of dna sequencing.
| it's always just another few years off.
|
| Is this also true for nanopores in protein sequencing? This
| HN comment from a few weeks back [1] pointed out recent
| progress but perhaps the tech is still not quite there.
|
| [1]: https://news.ycombinator.com/item?id=29481075
| joshuamcginnis wrote:
| That's not true. I just did a high-quality sequence and
| assembly of a new species of fungus from my home lab using
| nanopore. You can see all my code used for assembly and
| analysis that will be referenced in a paper I plan to publish
| in Jan here: https://github.com/EverymanBio/pestalotiopsis
| AstroDogCatcher wrote:
| Interested outsider here; I work with a lot of HCLS research
| customers but don't have a biology-related background. Can
| you explain the problems with the Nanopore sequencer accuracy
| in more detail? Basically, I was wondering if I could get one
| for myself and sequence my own genome, then use the data to
| learn about life-sciences computing techniques. If I were to
| buy one of the USB-attachable devices and run it, is the data
| simply not viable for use in a genomics pipeline, or is it
| just that the results would be questionable? Also, if
| accuracy is an issue, what about just running the same sample
| N times and doing some error correction?
| ampdepolymerase wrote:
| I recommend reading this review
|
| https://genomebiology.biomedcentral.com/articles/10.1186/s13...
|
| I guess there are limits to ensemble methods if the
| underlying accuracy doesn't increase. I don't work on gene
| sequencing algorithms but from what I understand of ML
| ensemble techniques, there are certain assumptions
| regarding the underlying independence of the errors. The
| errors for nanopore _should_ be uniform but I am not sure.
| Any molecular biologist here care to comment?
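| On the ensemble question: if each of n reads of a base is wrong
| independently with probability e, a majority vote fails only when
| more than half of them are wrong, which is a binomial tail. A
| minimal Python sketch; it conservatively assumes every error
| votes for the same wrong base, and independence is exactly the
| assumption that systematic, context-dependent errors break.

```python
from math import comb


def majority_error(e: float, n: int) -> float:
    """P(more than half of n independent reads are wrong); n must be odd."""
    if n % 2 == 0:
        raise ValueError("use an odd n to avoid ties")
    # Sum the binomial probabilities of k wrong reads for every losing majority k.
    return sum(comb(n, k) * e**k * (1 - e)**(n - k) for k in range(n // 2 + 1, n + 1))
```

| With e = 0.1, one read is wrong 10% of the time and a three-read
| vote about 2.8% of the time; the tail keeps shrinking with n, but
| only while errors stay independent.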
| biophysboy wrote:
| I know that the error rate of the Oxford Nanopore
| sequencer depends on GC content (guanine/cytosine
| nucleotides), and that the Pacific Biosciences sequencer
| uses a polymerase that gets worn down during reading. So
| there is some non-uniformity in the chemistry.
| ampdepolymerase wrote:
| GC-rich regions as in hairpin loops? How would the
| sequencer deal with those?
| biophysboy wrote:
| The instruments do exactly as you say (run the sample N
| times), but this obviously comes at a cost. Also, keep in
| mind that sequencing needs to be very, very accurate to be
| useful. We share most of our DNA, and the small variations
| make up all the difference.
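| Running the sample N times and voting per position can be
| sketched in Python. This assumes the reads are already aligned to
| the same length with `-` marking gaps; real pipelines have to
| align first and handle the insertions and deletions that are
| common in nanopore reads.

```python
from collections import Counter


def consensus(aligned_reads: list) -> str:
    """Per-column majority vote over equal-length aligned reads; gap winners are dropped."""
    if not aligned_reads or len({len(r) for r in aligned_reads}) != 1:
        raise ValueError("need one or more reads of equal length")
    out = []
    for column in zip(*aligned_reads):
        base, _ = Counter(column).most_common(1)[0]
        if base != "-":  # a winning gap means no base at this position
            out.append(base)
    return "".join(out)
```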
| netizen-936824 wrote:
| Sounds like a fediverse project?
| Malp wrote:
| Oh God, I would not want a distributed group of actors with
| limited trust to sequence my DNA. Maybe it's a project for a
| close group of friends who would be interested?
| netizen-936824 wrote:
| I wasn't thinking sequencing but rather comparison. Could
| even hash data for comparison to enforce privacy (unsure
| how effective that would be)
|
| But this could enable things like finding relatives which
| is what I got out of the comment about 23andme. Instead of
| all the data being centralized, storage and comparison
| could be distributed
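| On hashing for privacy: a plain hash of one genotype hides very
| little, because a SNP has only a handful of possible values and
| anyone can hash them all. A minimal Python sketch of that
| dictionary attack; the rsid is a made-up placeholder, and keyed
| hashes only help against parties who don't hold the key, so real
| privacy-preserving matching needs heavier cryptography.

```python
import hashlib
from itertools import combinations_with_replacement
from typing import Optional


def genotype_hash(rsid: str, genotype: str) -> str:
    """Naive 'private' digest of one unphased genotype (alleles sorted first)."""
    return hashlib.sha256(f"{rsid}:{''.join(sorted(genotype))}".encode()).hexdigest()


def recover_genotype(rsid: str, digest: str) -> Optional[str]:
    """Brute-force the digest: only 10 unphased genotypes exist over A/C/G/T."""
    for pair in combinations_with_replacement("ACGT", 2):
        candidate = "".join(pair)
        if genotype_hash(rsid, candidate) == digest:
            return candidate
    return None
```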
| inciampati wrote:
| Your DNA is almost exactly the same as other people's, just
| a unique mix.
|
| Not sure what you are concerned about. What would you
| expect a bad actor to do with your DNA sequences? I'm
| genuinely curious.
| snovv_crash wrote:
| Using that analogy, all the 1s and 0s in your private key
| are the same as everyone else's as well. Genetic data can
| be used for all kinds of things, the worst of which would
| be things like targeted diseases or planting your DNA at
| a crime scene.
| LinuxBender wrote:
| _Your DNA is almost exactly the same as other people's,
| just a unique mix._
|
| Music is exactly the same notes, just a unique mix. So
| why is Sony upset that I want to stream their entire
| library? But jokes aside...
|
| A few decades ago I fought the military on collecting my
| DNA. I stalled them long enough to get my honorable
| discharge and avoid that all together. It's funny you ask
| because the commander asked the same thing and joked _"Are
| you afraid we are going to clone you?!"_ to which I
| replied, _"No sir, you should be afraid you are going to
| clone me."_ and we both had a laugh because he knew I was
| right. The military are not fond of critical/free
| thinkers. One of me was plenty. I explained that
| insurance companies were already using this data to
| retroactively cancel peoples policies even if they were
| not actively afflicted by something. The commander showed
| me how to use the FOIA request system.
|
| Laws have evolved a little since then but there are
| plenty of other risks. For starters, I can't easily
| change my DNA like I can change my debit card. That data
| can be used to tie me to others or _guilt by association_
| which is undesirable drama. It can also be used to try to
| sell me things. It can also be used to target biological
| weapons against specific groups of people. There appears
| to be an imbalance of data sharing in this regard. [1]
| Then there is simply the matter of privacy. If I want to
| share my DNA with some lab that is in turn going to sell
| it out to hundreds of other companies over and over
| forever, I should at very least be getting paid a vast
| amount of money and land and have legally binding
| contracts and NDAs that cover what is and is not
| allowed to be done with my data and how long it may be
| retained. That contract and the laws enforcing the
| contract must have some serious teeth with very serious
| ramifications for anyone violating it whether
| intentionally or by mistake.
|
| [1] - https://www.youtube.com/watch?v=biNxl7tiVSY
| dav_Oz wrote:
| From a more paranoid perspective:
|
| I'm curious about the possible abuse scenarios given the
| ubiquitous use of PCR-testing for nearly two years, now.
|
| If I'm informed correctly, a viable sample for NGS needs
| something like 2 mL of saliva (which sounds like little, but it
| really takes some time: >1 min), not the trace amounts that
| usually get collected by the swabs?
| fragmede wrote:
| A very practical reason not to want your DNA out there,
| unrestricted, is insurance costs: car insurance, health
| insurance, mortgage lending rates and life insurance. While
| GINA (2008) is supposed to protect that information, there
| are loopholes in the interpretation of that law that should
| give everybody pause.
| mylons wrote:
| yes. if you wanted to annotate your genome you could "easily"
| do it on your brand new MacBook (this is RAM-intensive; you
| probably need 32 GB). you'd need a reference genome, like
| https://www.nist.gov/programs-projects/genome-bottle
|
| then you'd need a program like bwa
| http://bio-bwa.sourceforge.net/ to map your data.
|
| then use https://samtools.github.io/bcftools/howtos/variant-calling.h...
| or something else to produce variants from the
| mapping results.
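| A minimal Python driver for the index/map/call steps above. The
| file names are hypothetical, and the argument lists are a common
| bwa/samtools/bcftools pattern rather than a tuned pipeline (for
| nanopore reads a long-read aligner such as minimap2 is usually a
| better fit than `bwa mem`). Nothing executes until you call
| `run_pipeline` with the tools installed on your PATH.

```python
import subprocess
from typing import List, Optional, Tuple

Step = Tuple[List[str], Optional[str]]  # (argv, file to capture stdout into)


def pipeline_steps(ref: str, reads: str, prefix: str) -> List[Step]:
    """Index the reference, map reads, sort/index the BAM, then call variants."""
    sam, bam = f"{prefix}.sam", f"{prefix}.sorted.bam"
    pileup, vcf = f"{prefix}.pileup.bcf", f"{prefix}.calls.vcf.gz"
    return [
        (["bwa", "index", ref], None),
        (["samtools", "faidx", ref], None),
        (["bwa", "mem", ref, reads], sam),  # bwa mem writes SAM to stdout
        (["samtools", "sort", "-o", bam, sam], None),
        (["samtools", "index", bam], None),
        (["bcftools", "mpileup", "-f", ref, "-Ou", "-o", pileup, bam], None),
        (["bcftools", "call", "-mv", "-Oz", "-o", vcf, pileup], None),  # variant sites only
    ]


def run_pipeline(ref: str, reads: str, prefix: str) -> None:
    """Run each step in order, stopping on the first non-zero exit status."""
    for argv, stdout_path in pipeline_steps(ref, reads, prefix):
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)
```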
|
| then compare your resultant vcf file to something like dbSNP:
| https://www.ncbi.nlm.nih.gov/snp/
|
| at this point you can start generating a raw version of a
| 23andMe report.
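| The final comparison, sketched in Python under simplifying
| assumptions: it reads only the fixed VCF columns plus the first
| sample's GT field, splits multi-allelic ALT values naively, and
| joins on (chrom, pos, ref, alt). The `known` table stands in for a
| dbSNP or other annotation export; its contents are placeholders.

```python
def parse_vcf(text: str) -> list:
    """Minimal VCF reader: skips header lines, keeps core columns and first-sample GT."""
    records = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        chrom, pos, vid, ref, alt = cols[:5]
        gt = None
        if len(cols) >= 10:  # FORMAT column plus at least one sample column
            fields = dict(zip(cols[8].split(":"), cols[9].split(":")))
            gt = fields.get("GT")
        for allele in alt.split(","):  # naive split of multi-allelic sites
            records.append({"chrom": chrom, "pos": int(pos), "id": vid,
                            "ref": ref, "alt": allele, "gt": gt})
    return records


def annotate(records: list, known: dict) -> list:
    """Pair each record with its annotation when (chrom, pos, ref, alt) is known."""
    out = []
    for r in records:
        key = (r["chrom"], r["pos"], r["ref"], r["alt"])
        if key in known:
            out.append((r, known[key]))
    return out
```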
| tootie wrote:
| I'm unclear from this what kind of equipment you need to
| extract and analyze the material?
| mylons wrote:
| you'd likely have to get the nanopore sequencer in the
| article or find a lab using Next Generation Sequencing to
| sequence your DNA and give you "raw data" which are usually
| fastq files
| LinuxBender wrote:
| Nice! Thank you for the links. I will research all of this.
| mylons wrote:
| good luck! it's not that tough, just a lot of new
| vocabulary.
| GekkePrutser wrote:
| I don't see any reference to the "USB dongle" mentioned in the
| title. I was thinking this would be some cool thing you could do
| at home.
| dekhn wrote:
| https://nanoporetech.com/products/minion
| fragmede wrote:
| I don't know if this is the exact nanopore USB dongle used in the
| article, but this one is $1,000 for the base package, first
| released in 2014
|
| https://store.nanoporetech.com/us/minion.html
|
| https://www.extremetech.com/extreme/190409-minion-usb-stick-...
| koeng wrote:
| Yep that's the one. They update the flow cells over time. The
| bit they don't tell you is the stuff you need, like a Qubit, to
| properly run the thing.
| joshuamcginnis wrote:
| A Qubit or other fluorometer isn't required. You can use a simple
| DNA ladder to measure the relative quantity and quality of
| DNA that's good enough for nanopore sequencing. I just did a
| full genome sequence of a novel fungus using this exact
| approach.
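| One common way to turn a gel into a rough quantity is to compare
| band brightness against ladder bands whose DNA mass is known (the
| ladder's datasheet lists mass per band for a given load). A
| minimal Python sketch, assuming intensity is proportional to mass
| within the ladder's range and that intensities are measured
| background-subtracted from one gel image; the numbers are
| hypothetical, and this is not necessarily what the commenter did.

```python
def fit_through_origin(masses_ng: list, intensities: list) -> float:
    """Least-squares slope k for intensity = k * mass, with the line forced through zero."""
    num = sum(m * i for m, i in zip(masses_ng, intensities))
    den = sum(m * m for m in masses_ng)
    if den == 0:
        raise ValueError("need at least one ladder band with nonzero mass")
    return num / den


def estimate_mass_ng(sample_intensity: float, slope: float) -> float:
    """Invert the calibration line to estimate the sample band's DNA mass in ng."""
    return sample_intensity / slope


# Hypothetical ladder bands (ng) and their measured intensities.
k = fit_through_origin([10, 20, 40], [100, 200, 400])
print(estimate_mass_ng(300, k))
```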
| koeng wrote:
| Huh, interesting. Did you fragment? I'd imagine comparing
| high-molecular-weight gDNA wouldn't be too nice on a gel.
|
| You also still, in that case, need a gelbox + ladder +
| loading dye + SYBR Safe or whatever, so it's still not
| nothing.
| joshuamcginnis wrote:
| I used an HMW extraction kit and ran a gel to
| estimate the volume of HMW DNA. Yes, you need to be able
| to run a gel, but I'm not sure what the expectation is
| from folks; that you just place a random piece of non-
| sterile tissue on a chip and have it do the extraction,
| sequencing and assembly? That seems like an unrealistic
| expectation.
| inglor_cz wrote:
| DNA sequencing bugs me quite a bit.
|
| On one hand, I would love to learn something new about my body.
|
| On the other hand, what if the results tell me that I am
| predisposed to some horrible untreatable disease? Will I spend
| the rest of my days observing every little pain or discomfort and
| thinking "is this IT?"
| nomercy400 wrote:
| How about predispositions to health issues that could be
| avoided if you started now and not in 20 years?
| inglor_cz wrote:
| I know. There is a lot of different scenarios. It is the
| worst one that bugs me. Human nature in action.
|
| Perhaps a trusted middleman would be a solution: "just don't
| tell me about anything that is totally beyond my control".
| wallacoloo wrote:
| well, build a whitelist of the conditions you are interested in
| knowing. then just run the report through a sed filter so that
| it strips out all the information you're not interested in.
| destroy the original report. problem solved: infohazards
| avoided.
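| The same idea in Python instead of sed, as a minimal sketch: it
| assumes a plain-text report with one finding per line and keeps
| only lines that mention an allowlisted condition. The report
| format and condition names are hypothetical, and substring
| matching is crude, so a broad term can still let unwanted lines
| through.

```python
def filter_report(lines: list, allowlist: set) -> list:
    """Keep only report lines mentioning an allowlisted condition (case-insensitive)."""
    wanted = {term.lower() for term in allowlist}
    return [line for line in lines if any(term in line.lower() for term in wanted)]


# Hypothetical usage: filter the report, save the kept lines, then delete the original.
# with open("report.txt") as f:
#     kept = filter_report(f.read().splitlines(), {"lactose"})
```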
| lend000 wrote:
| How does it get the DNA to go through the hole?
| Cyclical wrote:
| Initially, the DNA is brought near the pore through diffusive
| (Brownian) motion + any small attraction it'll have to the
| membrane. Close to the pore, a combination of the
| electrophoretic and electro-osmotic effects draws the DNA
| molecules through. Applying an external electric
| field causes the charged DNA molecules to migrate along the
| field (electrophoresis). This is independent of the fluid, and
| happens to any ions under voltage. Electro-osmotic flow (EOF),
| on the other hand, is a motion of the fluid itself, pulling the
| DNA molecules along with it. EOF is a really interesting
| phenomenon caused by the interaction between the
| surface chemistry (vis-a-vis charge distribution) and the
| concentration gradient of charge carriers in the fluid. I'd
| recommend Fundamentals and Applications of Microfluidics by
| Nguyen et al. if you're looking for a good primer on
| electrically induced flows in microfluidics.
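| For reference, the textbook idealization of the two effects
| above: both velocities scale with the electric field E.
| Electrophoretic drift has mobility mu_ep; electro-osmotic
| mobility follows the Helmholtz-Smoluchowski relation, with epsilon
| the fluid permittivity, zeta the zeta potential of the pore wall
| and eta the viscosity. Writing E as the applied voltage over the
| pore length L assumes most of the voltage drops across the pore;
| real nanopore transport adds confinement effects this ignores.

```latex
\begin{aligned}
v_{\mathrm{ep}} &= \mu_{\mathrm{ep}}\,E, \qquad
v_{\mathrm{eo}} = \mu_{\mathrm{eo}}\,E, \qquad
\mu_{\mathrm{eo}} = -\frac{\varepsilon\,\zeta}{\eta}, \\
v_{\mathrm{DNA}} &= \left(\mu_{\mathrm{ep}} + \mu_{\mathrm{eo}}\right) E,
\qquad E \approx \frac{\Delta V}{L}
\end{aligned}
```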
| dekhn wrote:
| Folks are free to analyze my genome,
| https://my.pgp-hms.org/profile/hu80855C
|
| Last time it was analyzed the conclusion was that there was
| nothing actionable.
| zmmmmm wrote:
| Have you ever encountered any insurance implications from it?
| E.g., being asked whether you have ever had a genomic test,
| having to answer yes, and then them wanting to see the results?
|
| I guess in your case where nothing actionable is found it's
| benign. It will be the cases where there are risk factors for
| late onset things - cancer, diabetes, heart disease etc. where
| it would get sticky.
| dekhn wrote:
| No, my health insurance company doesn't care about my whole
| genome data. Health Insurance companies are already quite
| skilled at (and profitable due to) their ability to model
| life expectancy and health issues without genomic data, and
| they are legally prohibited from using this data, in my
| country anyway. Life insurance is different (they are allowed
| to incorporate much more information) but I've never been
| asked for anything like that.
|
| As for the case where nothing actionable is found- it's not
| benign. It's absence of information, not information of
| absence.
| Cyclical wrote:
| Nanopore sequencing is a really interesting technology. It
| utilizes fundamentally the same apparatus as a Coulter Counter
| [1], which is a general method of counting and sizing arbitrary
| particles that's frequently used in flow cytometry. Applying it
| to sequencing by drawing unwound DNA through the pore was a
| really excellent logical leap, and we're only now starting to see
| the benefits, even though it was first proposed over 30 years
| ago.
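| The Coulter-counter principle is a resistive pulse: while
| something occupies the pore, the ionic current dips. A minimal
| Python sketch of threshold-based event detection on a current
| trace; the threshold rule and numbers are illustrative
| assumptions, and nanopore basecallers decode sequence from the
| detailed current levels with trained models rather than from
| simple dips like this.

```python
def detect_blockades(current: list, baseline: float, drop_fraction: float = 0.3) -> list:
    """Return (start, end) index ranges where current < baseline * (1 - drop_fraction)."""
    threshold = baseline * (1 - drop_fraction)
    events, start = [], None
    for i, value in enumerate(current):
        if value < threshold and start is None:
            start = i  # a blockade begins
        elif value >= threshold and start is not None:
            events.append((start, i))  # blockade ends; end index is exclusive
            start = None
    if start is not None:
        events.append((start, len(current)))  # trace ended mid-blockade
    return events
```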
|
| [1] https://en.wikipedia.org/wiki/Coulter_counter
| billiam wrote:
| TMI.
| a-dub wrote:
| the nanopore units are awesome! although if i recall, most of the
| device is a replaceable one time use consumable and the cost of
| that consumable is quite expensive (at least hundreds, if not
| thousands).
|
| when i looked i was interested, but was turned off when i saw
| that the cost far outstripped commercial sequencing services.
___________________________________________________________________
(page generated 2021-12-26 23:00 UTC)