[HN Gopher] Hospitals are selling troves of medical data
       ___________________________________________________________________
        
       Hospitals are selling troves of medical data
        
       Author : uniqueid
       Score  : 191 points
       Date   : 2021-06-24 10:04 UTC (12 hours ago)
        
 (HTM) web link (www.theverge.com)
 (TXT) w3m dump (www.theverge.com)
        
       | GuB-42 wrote:
        | The underlying claim here is that de-identification doesn't work,
        | and the article then explores the consequences of that.
       | 
       | But the real question should be: why doesn't de-identification
       | work and how to make it work. It is a technical problem, and I
        | thought it was more or less solved. But the author here thinks it
        | is just a placebo. If so, what is the problem exactly? Is there a
       | fundamental problem with the very idea of de-identification? Is
       | there a "bug" in the process? What level is required to conduct
        | these attacks? Can an individual do it? A cybercrime gang? A
       | nation state? Is it only theoretical?
       | 
       | Depending on that, the answer could be very different.
        
         | andrey_utkin wrote:
         | Because it's an arms race situation.
         | 
         | The industry of de-identification has a long history in fact.
         | 
         | It is typical for many types of criminals to employ hard-to-
          | identify outfits and appearances. The work of investigators is
          | to overcome this obstacle and identify the criminal. The co-
         | evolution of methods on both sides will never end.
        
         | ska wrote:
         | > It is a technical problem, and I thought it was more or less
         | solved.
         | 
         | As a technical problem, it is more or less understood to be
         | unsolvable.
         | 
          | The core issue is that the more information you strip away, the
          | more sanitized the data is, the less useful it is. Once the
          | statistics of your "representative" set are no longer
          | representative (or even worse, weirdly biased), you are very
          | limited in what you can do. This works OK if you already know
          | exactly which data you need and which you don't, but otherwise
          | it falls apart pretty quickly.
         | 
         | On the other hand, relatively small numbers of innocuous data
         | points about individuals will identify them uniquely.
         | 
         | These two points are in fundamental contention.
        
         | nonameiguess wrote:
         | It's a mathematical problem, not a technical problem. You can
         | remove features from a dataset that, on their own, serve to
         | uniquely identify someone, but that doesn't change that the
         | remaining features still each partition the population into
         | smaller and smaller subsets until eventually the subset is one
         | or only a few people. Once you get down to "only a few people,"
         | metadata may be able to bring that to one, i.e. there are a few
         | hundred people it could potentially be, but only one was
         | actually at the hospital the data was collected from at the
         | time it was collected. You don't necessarily even need the
          | hospital to leak that metadata. Maybe this person was the only
          | one who even lived near that hospital around that time.
         | 
         | The only way to anonymize the data in a foolproof way is to
         | remove so many features that it becomes useless for statistical
         | research.
         | 
         | Modeling-wise, regression, classification, and clustering alike
         | all rely on being able to construct a model that minimizes
         | entropy. But anonymity relies upon maximizing entropy. This
         | fundamental conflict can't really be resolved. You pick some
         | point in the middle of the spectrum that ends up serving
         | neither purpose well.
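          | 
          | A toy sketch of that partitioning effect (made-up records,
          | plain Python, all values invented for illustration):
          | 
          |     # Each record: (sex, birth_year, zip3, diagnosis)
          |     population = [
          |         ("F", 1984, "021", "asthma"),
          |         ("M", 1984, "021", "asthma"),
          |         ("F", 1984, "021", "diabetes"),
          |         ("F", 1990, "021", "asthma"),
          |         ("F", 1984, "105", "asthma"),
          |     ]
          |     
          |     # Each individually harmless feature partitions the
          |     # population into smaller and smaller subsets.
          |     candidates = population
          |     for idx, value in [(0, "F"), (1, 1984),
          |                        (2, "021"), (3, "asthma")]:
          |         candidates = [r for r in candidates
          |                       if r[idx] == value]
          |         print(len(candidates), "candidates left")
          |     # Prints 4, 3, 2, 1: jointly the features single out
          |     # one person although none identifies anyone alone.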
        
         | cmiles74 wrote:
         | It appears there is a spectrum of de-identification tools, some
         | being better than others.
         | 
         | https://en.m.wikipedia.org/wiki/Data_re-identification
         | 
          | I suspect that having more data for a particular patient (many
          | visits over multiple years) makes the re-identification process
          | easier. The article mentions financial data; if the dates and
          | amounts of charges aren't masked or altered, they could be
          | cross-referenced with another data source to deduce the
          | person's identity.
        
         | prepend wrote:
         | De-identification is in the eye of the data steward and their
         | legal folks.
         | 
         | So it's the definition of de-id that doesn't actually "work" in
         | that it's possible to reidentify small amounts from de-id data,
         | and that adds up over time.
         | 
          | For example, HIPAA considers data de-id if you remove 18
          | specified fields or an expert determines that it's de-id. [0]
         | 
          | What's expert determination, and who's an expert? That's up to
          | me to decide and my lawyers to accept.
         | to decide and my lawyers to accept.
         | 
          | The bug is that if half a percent of the people in each de-id
          | dataset can be reidentified, which is likely acceptable under
          | HIPAA, then each dataset released adds up for reidentified
          | people. And more datasets allow for triangulation and linking
          | to reidentify.
         | 
          | The article calls this out as a risk, not a certainty, as it's
          | unclear if anyone is doing this. But the process would be
          | something like: 1) buy HIPAA de-identified data, since it
          | doesn't require patient consent; 2) reidentify patients using
          | other data publicly for sale (marketing data, voter
          | registration, etc.); 3) the new data is no longer HIPAA
          | restricted, and fully identified health records can be sold
          | for whatever you like (e.g., super targeted drug marketing)
         | 
         | [0] https://www.hhs.gov/hipaa/for-
         | professionals/privacy/special-...
        
           | vharuck wrote:
           | The US federal government requires users to agree to terms in
           | return for receiving potentially identifiable data. Usually
           | those terms include something like, "I agree to not attempt
           | to use these data to identify any individuals." Granted, an
           | agreement carries more weight when it's with the federal
           | government (lying to them is criminal). But this kind of
           | clause can prevent legal groups from reliably trading in
           | identified data, even if it's from a private hospital. If
           | they break the agreement, they'll probably never again
           | receive new data from the provider, and stale data is low
           | value.
        
         | abcc8 wrote:
         | Some medical data, such as genome sequence data, is unique to
         | an individual (where the uniqueness is partly dependent on the
         | length of sequence data generated - short sequences may not be
         | unique). Even worse, whole exome or genome data also describes
         | your lineage and may reveal sensitive information about
         | relatives, living and not. Deidentification of such genomic
         | data is especially difficult.
        
         | Silhouette wrote:
          | _But the real question should be: why doesn't de-
          | identification work and how to make it work._
         | 
         | How much health data is required to distinguish a specific
         | individual even without any "identifying" information? Surely
         | there must be many people with unique combinations of
         | conditions and treatments. If that is so then a data set that
         | includes those people could only ever be pseudonymous, not
         | truly anonymous.
         | 
         | What other data sets might then be combined with the health
         | data to match a pseudonymous record with an identifiable
         | person? Payment records? Travel records? Time off work?
         | Insurance claims?
         | 
         | I don't know much about how the US healthcare system works, but
         | maybe before asking how to make de-identification of health
         | data work we should be asking whether it could ever work
         | effectively at all.
        
         | Spooky23 wrote:
         | It doesn't work because the data can be correlated against
         | other sets of data.
         | 
          | When my wife almost died from an ectopic pregnancy, Enfamil was
          | confident enough of the would-be due date, based on data pieced
          | together by a broker, to FedEx us a box of formula on the due
          | date.
         | 
         | Third parties get a real time feed of insurance claims,
          | hospital admissions, prescriptions, and other data. They put
          | the puzzle together and make assertions about valuable
          | diagnoses (pregnancy, diabetes, etc.).
         | 
          | They don't need to be 100% accurate. The collateral damage of
          | reminding someone that they lost a child and nearly lost their
          | life carries no cost for them.
        
           | jnxx wrote:
           | > They don't need to be 100% accurate.
           | 
           | In many cases, abusing such data will also be profitable even
            | if the resulting estimate is highly noisy.
           | 
           | Just as an example, let's assume that an insurance company
           | knows your given name when you apply and they have
           | statistical data which shows that if your name belongs to one
           | set of names, you are smoking with a probability of 51%. If
           | you have another name, you are smoking with a probability of
           | 49%, because the correlation is noisy. It is still more
           | lucrative for the company to exclude applicants from the
           | "more likely to smoke" names group, because smoking causes
           | cost. You might never have tried a single cigarette in your
           | life but with the information the company has, it is more
           | profitable to not insure you.
           | 
           | Of course the above is an exaggerated example which would not
           | work out quantitatively (the profit they make by insuring you
           | might be higher than the expected value of risk from smoking)
           | but the principle still holds; it is used for credit scoring
           | and similar things.
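            | 
            | A minimal sketch of that threshold effect (all numbers
            | assumed purely for illustration):
            | 
            |     # Assumed margin vs. expected extra smoker payout.
            |     margin_per_applicant = 1000.0
            |     extra_cost_if_smoker = 2000.0
            |     
            |     def expected_profit(p_smoker):
            |         return (margin_per_applicant
            |                 - p_smoker * extra_cost_if_smoker)
            |     
            |     print(expected_profit(0.49))  #  20.0 -> insure
            |     print(expected_profit(0.51))  # -20.0 -> reject
            | 
            | A two-point difference in a noisy group statistic flips
            | the decision for the whole name group, regardless of
            | whether any given individual actually smokes.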
        
             | Spooky23 wrote:
             | It's more insidious than that. Based on proprietary
             | databases, insurers can target advertising or marketing at
             | you based on profiles that you don't know exist.
             | 
             | If you want to avoid insuring a protected class, you might
             | be able to identify them based on those databases, and use
             | advertising to avoid the folks you don't want. (This is
             | common for student apartments)
        
       | aabaker99 wrote:
       | It's surprising that many medical providers don't understand this
       | side of HIPAA. I recently spoke to a department head of oncology
       | who seemed to think patient consent was required for sharing data
       | and was not comfortable sharing their patients' data. What they
       | don't realize is that HIPAA doesn't require consent if the data
        | is de-identified, so their organization could be, or already is,
        | sharing their patients' data anyway.
        
         | SkyPuncher wrote:
         | The problem is risk. HIPAA allows for a lot of things that feel
         | like exceptions to the core principles of HIPAA. Many things
         | are vaguely defined as "reasonable" - which changes over time.
         | 
         | MD5 was a reasonable password hash - until it wasn't. SHA1 was
         | - until it wasn't. Etc.
         | 
          | An article like this arguably shows that there is no reasonable
          | means of de-identification, since multiple data sources can be
          | combined. Add to that the fact that HIPAA puts pretty high
          | limits on the minimum cohort size that can be associated with a
          | unique identifier, and the low ROI from actually sharing this
          | data.
         | 
         | Many places end up in a position where it's simply not worth
         | the risk to share.
        
           | pitaj wrote:
           | Arguably SHA1 is still reasonable as a password hash, but of
           | course not recommended.
        
             | pg_bot wrote:
              | In 2021, SHA1 is neither a reasonable nor a recommended
             | password hash for any project. Use a tool like bcrypt,
             | scrypt, or argon2 instead.
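              | 
              | A minimal sketch with the bcrypt Python package
              | (any of the three tools is used similarly):
              | 
              |     import bcrypt  # pip install bcrypt
              |     
              |     pw = b"correct horse battery staple"
              |     
              |     # gensalt() embeds a random salt and a work
              |     # factor, so equal passwords hash differently.
              |     hashed = bcrypt.hashpw(pw, bcrypt.gensalt())
              |     
              |     # checkpw() re-derives the hash using the salt
              |     # and cost stored inside `hashed`.
              |     assert bcrypt.checkpw(pw, hashed)
              |     assert not bcrypt.checkpw(b"guess", hashed)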
        
       | rscho wrote:
       | The funniest part is that this data is actually mostly (and often
       | completely) useless for the stated purposes of statistical
       | analysis. Routine clinical data collection is of abysmal quality,
       | but many buyers don't see the full extent of the catastrophe.
       | 
       | It'll be much more useful for legal and not-so-legal purposes.
       | And last but not least: insurance. As an aside, privacy
       | enforcement for medical data is much easier through
        | fingerprinting tethered to an NDA.
        
       | dillondoyle wrote:
        | Somewhat similarly, I've previously wondered:
        | 
        | whether shielding prescriber data from corporations would cut
        | down on shady pharma 'bribes', e.g. the opioid crisis and some
        | other direct-to-doctor pharma marketing.
       | 
       | Insys and Purdue knew which doctors prescribed insane amounts of
       | pills. And then rewarded them with $. Insys even put it on paper
       | as ROI and at least a few went to jail.
       | 
       | I'm not sure technically how well (or if legally) it would work
       | though so maybe the answer is not at all.
       | 
        | McKesson would still know which pharmacies the pills go to, and
        | anyone can figure out where an MD works to correlate at least
        | target zip codes/markets. But since we already have this
        | monopolistic distribution setup, maybe we could prohibit
        | McKesson from disclosing granular shipment data.
        
       | prepend wrote:
       | I'm glad to see this getting more attention as it seems scary to
       | me as a private citizen who wants my medical data to stay
       | private.
       | 
       | I'm not sure how to defend against this as it seems based on
        | HIPAA and what it allows. Since de-identified data can be legally
        | sold, I think it will be.
       | 
       | The theoretical defense I've thought up is a class action lawsuit
       | for synthetic breaches. Since these data are deemed de-identified
       | by expert determination [0] and that's hazy, if I could
       | reidentify myself after de-id, and I didn't authorize it, then I
       | could be eligible for breach damages for HIPAA violations up to
       | $50k per person [1].
       | 
        | These sets cover millions of people, likely everyone in the
        | country. And since expert determination can plausibly classify a
        | re-id risk of up to 1% as acceptable, that could mean a million
        | or two reidentifiable people: a big enough pool to attract big
        | legal investments.
       | 
       | That would increase the cost and risk of doing this to outweigh
        | the benefits. But currently it's "free money" for any healthcare
        | system, and it's kind of impossible for me as a patient to opt
        | out.
       | 
       | [0] https://www.hhs.gov/hipaa/for-
       | professionals/privacy/special-... [1]
       | https://www.injuryclaimcoach.com/hipaa-violations.html
        
         | aabaker99 wrote:
         | I like your idea of a class action lawsuit but wouldn't the
         | $50k per person penalty shift your idea of what an organization
          | would consider an acceptable risk? 1 million * $50,000 = $50
          | billion, which is probably not acceptable.
         | 
         | I hope for all our sakes you are in the business of buying de-
         | identified medical data :)
        
           | prepend wrote:
            | I think this has the potential to be the next asbestos or
            | tobacco or opioid payout. I definitely wouldn't want to work
            | in this area (both for ethical reasons and because the
            | business model is at risk of being sued out of existence).
        
         | throwaway894345 wrote:
         | As far as data privacy goes, I will say that the sale of health
         | data has some pretty significant upsides--specifically medical
         | research. There's probably a better way to get the best of both
         | worlds, but I would hate for us to just pull the rug out from
         | under medical research without a satisfactory backfill.
         | 
         | Disclaimer: I work for a company that uses this data to match
         | terminally ill patients with niche treatments and clinical
         | trials, and the work literally saves and prolongs lives.
        
           | captainoats wrote:
            | The article mentions the dispute over whether or not
           | anonymized medical record data really has enough of a signal
           | for meaningful research use cases. That has to be weighed
           | against the privacy concern.
        
             | throwaway894345 wrote:
             | Agreed. To be even more explicit, I'm not saying "do not
             | weigh the research concern against the privacy concern".
        
           | not_jd_salinger wrote:
           | > I work for a company that uses this data to match
           | terminally ill patients with niche treatments and clinical
           | trials, and the work literally saves and prolongs lives.
           | 
            | I talked with a company that made similar claims about what
            | they were doing, but it turned out they were really ensuring
            | that their users chose the medication from the highest
            | bidder rather than the one that was actually in the
            | patient's best interests.
           | 
            | The mental gymnastics they performed to justify that they
            | were acting in the patient's best interests were fun to
            | observe,
           | but in the end they were just a tool of a large
           | pharmaceutical company, exploiting sick people for profit.
           | 
           | Quick cynicism sanity check: who pays your company, your
           | users or the "niche" treatment provider?
           | 
           | If it's the former, that sounds like it's good work.
           | 
           | If it's the latter I would recommend being a bit more
           | skeptical of the business motives of your employer and their
           | customers.
        
             | throwaway894345 wrote:
             | > Quick cynicism sanity check: who pays your company, your
             | users or the "niche" treatment provider?
             | 
             | Insurers. They don't want to pay for the expensive product.
        
         | giantg2 wrote:
         | Ta-da
         | 
         | https://www.nytimes.com/2019/07/23/health/data-privacy-prote...
        
         | notafraudster wrote:
         | HIPAA has no private cause of action (you can't sue providers
         | who violate your rights under HIPAA). The government can fine
         | them, so from a provider POV they are liable for the breach you
         | propose, but you are not eligible for recovering the $50k.
         | 
         | You may or may not have a private civil tort against the
         | medical provider, separately from HIPAA.
        
           | analog31 wrote:
           | People talk about the US being lawsuit-happy, but here's an
           | example where the tort system could work to encourage the
           | enforcement of a law if the government doesn't want to
           | enforce it.
        
             | Teever wrote:
             | As a non-American I am always fascinated by the hacks and
             | kludges that people propose to mitigate the inherent
             | dysfunction that is the US medical system.
             | 
             | I consider intentional breaches of medical privacy to be in
             | the same league as physical assault and as such would
             | expect a society to respond to these kinds of crimes with
             | swift and severe punishments including jail time and fines.
             | 
             | Why not just arrest these people and take the profits from
             | their crime instead of adding civil lawyers and bureaucracy
             | to the mix?
        
           | wombatpm wrote:
            | Would a hospital employee be able to make a whistleblower
            | case and recoup a portion of the fines?
        
           | prepend wrote:
            | Good point, the fines go to HHS and are not dollar amounts
            | paid to victims. I gave the amounts to signify the
            | importance; IANAL, but they would be part of a basis for
            | establishing harm done.
           | 
           | I specifically linked to an injury lawyer to show the types
           | of current HIPAA violation civil suits that have succeeded.
           | My reasoning is that it's lucrative enough for lawyers to
           | solicit business.
        
       | jtaft wrote:
       | > As long as they de-identify the records -- removing information
       | like patient names, locations, and phone numbers -- they can give
       | or sell the data to partners for research.
       | 
       | I don't feel this is enough to deidentify.
       | 
        | If timestamps or a patient # are associated with records, it
        | should be possible to combine them with credit card records to
        | discover who someone is.
       | 
       | I wonder if we can request what is shared.
        
         | organsnyder wrote:
         | I work in healthcare (though not with any efforts like those
         | described in the article). Anonymizing data is much more
         | complicated than simply removing the obvious PII: depending on
         | what the data is, even things like procedure codes with dates
         | might be enough to identify a patient. HIPAA and other related
         | regs account for this, and have pretty strict procedures that
         | must be followed before data can be considered officially
         | deidentified.
        
       | pulse7 wrote:
        | In a few months somebody will come up with a deep neural network
        | to identify the people behind "de-identified data"...
        
         | rscho wrote:
         | months? Hours.
        
         | dekhn wrote:
         | yes, that deep neural network is called "an army of MIT
         | students, grad students, and postdocs"
        
       | [deleted]
        
       | agumonkey wrote:
        | how come more and more of society is about selling blips of
        | events as data?
        | 
        | is the price of everything far lower than what is required to
        | sustain the system?
        
       | TrailMixRaisin wrote:
       | There is a (famous) quote from Cynthia Dwork, who is most likely
       | the key researcher behind differential privacy: "De-identified
       | data isn't".
       | 
        | You can either re-identify the people behind the data, or you
        | alter it so strongly that it becomes useless for meaningful
        | applications.
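        | 
        | The differential privacy idea in a minimal sketch (numpy
        | assumed; toy records and the helper name are mine):
        | 
        |     import numpy as np
        |     
        |     def dp_count(records, predicate, epsilon=0.1):
        |         # A counting query has sensitivity 1: one person
        |         # joining or leaving the data changes the true
        |         # answer by at most 1, so Laplace noise of scale
        |         # 1/epsilon masks any individual's presence.
        |         true = sum(1 for r in records if predicate(r))
        |         return true + np.random.laplace(0.0, 1.0 / epsilon)
        |     
        |     records = [("F", "smoker"), ("M", "non-smoker")] * 500
        |     print(dp_count(records, lambda r: r[1] == "smoker"))
        |     # ~500 +/- noise: accurate in aggregate, useless for
        |     # pinning down whether any one person is in the data.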
        
         | miej wrote:
         | re-identification is basically the same as browser
         | fingerprinting. with enough vaguely stochastic variables, you
         | can uniquely identify pretty much anyone
        
         | specialist wrote:
         | > _" De-identified data isn't"_
         | 
         | People don't yet have intuition about this. I still don't even
         | know how to articulate it. Here's a stab:
         | 
         | "With enough data collected, you can uniquely identify people
         | by ruling out everyone else."
         | 
          | Mid 2000s, Seisint was helping law enforcement solve cold cases
         | using big data. In layperson's terms, they'd narrow the list of
         | suspects by ruling out everyone who has a solid alibi.
         | 
          | At the time, that was achieved by building profiles of
          | _everyone_, living and dead, simply by compiling 1600
          | _publicly_ available datasets. Like court records and
          | mortgages and whatnot.
         | 
         | Today you'd include location tracking, social media, all
         | financial tracking, etc.
         | 
          | It's remarkable that any crime goes unsolved. Like the backlog
          | of untested rape kits, it's only because no one cares enough
          | to bother to look.
        
           | pokot0 wrote:
            | Just playing Akinator once will help give an intuition about
            | this.
        
             | MaxBarraclough wrote:
             | Web version of Akinator. [0] Pretty neat. Pity about the
             | site's high CPU usage.
             | 
             | [0] https://en.akinator.com/
        
               | Banana699 wrote:
                | I thought of John Kennedy the first time; it took the
                | game about 55 questions to get to it. It didn't get
                | there until it knew that he was a US president, was
                | assassinated (as far as I know only Lincoln and Kennedy
                | meet those two criteria together) AND that he lived in
                | the 20th century, so before that it couldn't even rule
                | out Lincoln. The next game I thought of Cameron Diaz,
                | and the game simply gave up after the 25th question or
                | so.
               | 
               | It's a pretty neat idea overall, reminds me of those
               | link-following competitions using Wikipedia where you
               | have to start from a topic and get to another completely
               | unrelated topic solely by recursively chasing links. They
               | both exploit the interconnected nature of our culture, a
               | question (or an article) about a recent TV show providing
               | a clue (or a link) about a pretty unrelated historical
               | character.
               | 
                | The game desperately needs some notion of statistical
                | proximity though; it kept asking whether the character
                | was an actor even after knowing he was a president. It
                | doesn't do _any_ kind of deduction on what it already
                | knows, just mad slash-and-dash till it gets the
                | identifying info.
        
               | MaxBarraclough wrote:
               | It worked much better for me. It got South Park's Stan
               | Marsh in not too many guesses.
        
         | nradov wrote:
         | De-identified data can still be quite useful for some types of
         | research. If you strip away every field of personally
         | identifiable information except, let's say, sex and birth year,
         | there's no way to do meaningful re-identification.
        
           | TrailMixRaisin wrote:
            | I am sorry, but your answer is nearly the exact textbook
            | example of how wrong most people's intuitions are here. In
            | 2000, Latanya Sweeney showed that 87% of Americans can be
            | identified by sex, birth date, and ZIP code. As other
            | commenters point out, the connection to external databases
            | is extremely powerful and dangerous to privacy.
           | 
            | There are other concepts in research for still making use
            | of this data, like differential privacy or using AI to
            | synthesize data according to the training data provided.
            | But so far, all concepts that tried to alter a dataset and
            | then publish it directly for research have failed miserably.
        
             | prepend wrote:
              | Birthday is super different from year of birth. And zip
              | code is really precise.
             | 
              | GP said sex and year of birth, which are usually perfectly
              | fine for deidentification assuming some basic k-anonymity
              | and l-diversity protections.
             | 
              | It's frustrating when people bring up examples of
              | reidentification of data that were not properly
              | deidentified in the first place.
             | 
              | Hopefully no one competent would think that data with
              | unique records based on sex, DOB, and zip code is
              | deidentified. This is usually a case of someone not
              | actually deidentifying the data.
             | 
              | The bigger risk, I think, is when you enforce a threshold
              | of at least 10 records sharing sex and year of birth, and
              | someone still IDs individuals.
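              | 
              | A minimal k-anonymity check for intuition (toy rows;
              | the helper name is mine):
              | 
              |     from collections import Counter
              |     
              |     def k_of(rows, quasi_ids):
              |         # k = size of the smallest group of rows
              |         # sharing the same quasi-identifier values.
              |         groups = Counter(
              |             tuple(r[c] for c in quasi_ids)
              |             for r in rows)
              |         return min(groups.values())
              |     
              |     rows = [
              |         {"sex": "F", "yob": 1984, "dx": "asthma"},
              |         {"sex": "F", "yob": 1984, "dx": "diabetes"},
              |         {"sex": "M", "yob": 1990, "dx": "asthma"},
              |     ]
              |     print(k_of(rows, ["sex", "yob"]))  # 1 -> unsafe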
        
               | jfrunyon wrote:
               | Sure, sex and year of birth alone are fine. Why do you
               | assume they won't be able to use the medical data to
               | further de-anonymize it? "Male, 1993, records are from a
               | hospital in Central Texas so he lives there, who has
               | posted on social media about conditions x, y and z" would
               | probably turn up... me.
        
             | giovannibonetti wrote:
             | The parent comment mentioned birth year, not birthday
        
             | ghaff wrote:
              | In fairness, the parent did just say birth year and sex,
              | while Latanya Sweeney used the identifiers you list, which
              | give quite a bit more information. I agree with your basic
              | point though that data that appears to be anonymized can
              | frequently be de-anonymized--at least to a degree of
              | statistical certainty that most would find surprising.
        
           | alasdair_ wrote:
           | All information is potentially personally identifiable.
           | 
           | For example, perhaps you take data on favorite movies and
           | strip away every bit of PII except sex and birth year (as you
           | stated).
           | 
           | Now let's say I take the stripped data and target some
           | facebook ads to people of a specific sex and birth year and
           | have the ad be for people who love a certain obscure movie,
           | doubling down on a second obscure movie and so on. Eventually
           | I could have a reasonable chance of determining which _other_
           | movies a unique individual may like based on the stripped
           | data.
           | 
           | Obviously irrelevant for movies, but more relevant for
           | prescription drug uses, sexual preferences etc.
           | 
            | The point is that with enough outside data available, even
            | data stripped of PII can be de-anonymized.
        
           | rscho wrote:
           | This is wrong. Rare diagnoses are the simplest case of
           | reidentification, but there are many many other
           | opportunities. There's a whole field of research about that.
        
             | specialist wrote:
              | Ditto time and sequence (order). For example, online movie
              | reviews were deanonymized simply by correlating viewing
              | history with the order in which the reviews were posted.
        
               | tmearnest wrote:
               | Dates and times are generally deidentified by choosing a
               | random initial date and changing subsequent timestamps to
               | the random initial plus the duration between visits. The
               | sequence and delays could potentially be used to identify
               | patients, but this would be a lot harder than having
               | absolute timestamps.
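                | 
                | A minimal sketch of that date-shifting scheme
                | (the helper name is mine):
                | 
                |     import random
                |     from datetime import date, timedelta
                |     
                |     def shift_dates(visits, max_days=365):
                |         # One random offset per patient:
                |         # absolute dates are destroyed, but
                |         # intervals are preserved exactly.
                |         off = timedelta(days=random.randint(
                |             -max_days, max_days))
                |         return [d + off for d in visits]
                |     
                |     v = [date(2021, 1, 5), date(2021, 1, 19)]
                |     s = shift_dates(v)
                |     assert s[1] - s[0] == v[1] - v[0]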
        
               | specialist wrote:
               | Totally.
               | 
               | I recently had a crazy notion for losslessly scrambling
               | the sequence as well. Mostly for protecting voter privacy
               | (order in which ballots are cast). One of the major
               | blockers to fully digital voting.
               | 
               | I haven't found any hits using terms like "cryptographic
               | timestamps." Surely I can't be the first.
        
               | seoaeu wrote:
               | Voter privacy for digital voting doesn't solve much. It
                | doesn't seem to be possible to cryptographically prove
                | to a voter that their vote was counted correctly
                | (integrity) while simultaneously preventing them from
                | being at risk of coercion to reveal who they voted for
                | (privacy).
        
       | JohnWhigham wrote:
        | Soon enough this data will end up in the hands of our insurers,
        | and enough technology will be built that if you buy a six-pack
        | on the weekend, your premium will be adjusted on the fly.
       | 
       | It's all so tiring.
        
         | throwaway3699 wrote:
         | FWIW socialised healthcare (the better alternative) hasn't
         | solved that particular problem. Punitive taxes on alcohol and
         | smoking are very common because they're trying to reduce costs.
         | Insurance is the same thing but on an individual basis, which
         | means healthier people should _in theory_ get better premiums
         | in your example.
        
           | JohnWhigham wrote:
           | Right, but I'm talking about the novel ways companies try to
           | optimize premiums. Car insurance is already doing it with
           | some companies wanting you to install an application on your
           | phone to track your driving habits in hopes of maybe lowering
           | your premium. Disgusting shit.
        
             | pc86 wrote:
             | I don't install these apps but what exactly is wrong with
             | them and why is it "disgusting shit?" If I drive the speed
             | limit or slower, stop for 3-4 seconds at every stop sign,
             | don't accelerate quickly, etc., what's wrong with me having
             | a lower auto insurance premium than someone with an
             | identical profile who goes 10 over the limit everywhere and
             | rolls through every 5th stop sign?
             | 
             | I'm not implying insurance companies are doing this out of
             | altruism. If it resulted in a net decrease in profit, the
             | apps wouldn't exist. But it does seem like a beneficial
             | form of price discrimination. It seems like it's only
             | "disgusting shit" if it makes your insurance premiums go up
             | because you're a less safe driver.
        
               | throwaway3699 wrote:
               | I will admit I'm not a fan of surveillance based
               | insurance, either. Just pointing out the alternative is
               | individualising the cost via insurance or making all
               | society pay for a few people.
        
         | DennisP wrote:
         | Under current US law, health insurance companies can't even
         | adjust your premiums for serious preexisting conditions, much
         | less because you bought some beer.
         | 
         | Auto insurance could maybe do it though.
        
           | JohnWhigham wrote:
           | Don't think that could not change in a heartbeat, given how
           | powerful the healthcare lobby is in Congress.
        
             | [deleted]
        
             | DennisP wrote:
             | And yet the ACA passed in the first place.
        
               | droopyEyelids wrote:
               | the ACA was a big win for insurers.
        
           | mbg721 wrote:
           | Many plans are highly punitive towards smokers; it's not a
           | huge leap to imagine something new becoming the Sin Of The
           | Week.
        
             | wolverine876 wrote:
             | It's hard to compare smoking to the 'sin of the week'. It's
             | the leading cause of preventable death in the U.S. (or was
             | a few years ago), and probably has been for generations. It
             | is one of the most studied and prolonged public health
             | issues.
             | 
             | It's hard to blame smoking insurance costs on 'sin' - it
             | kills people and increases costs for the insurance company,
             | possibly more than any other choice people can make. If you
             | drive in drag races and ask for car insurance, don't be
             | surprised if it costs more.
        
           | alasdair_ wrote:
           | The trick is to use a proxy for the metric that IS
           | permissible in much the same way that financial companies in
            | the 1960s used address data once it became illegal to
            | discriminate directly against someone for being black.
        
             | DennisP wrote:
             | The only factors are "location, age, tobacco use, plan
             | category, and whether the plan covers dependents."
             | 
             | Maybe some zipcodes drink more than others, but they don't
             | need to figure that out because they can just look at
             | illness rates for the zipcode directly. Same for age and
             | tobacco.
             | 
             | https://www.healthcare.gov/how-plans-set-your-premiums/
        
         | Spooky23 wrote:
         | Already done - this type of data is already used to assess your
         | risk for opioid addiction in several states, especially if you
         | are in a high risk/high cost group.
        
         | unishark wrote:
         | So medical science could figure out the health consequence of
         | all your dietary and recreational decisions with amazing
         | granularity, and your take is this is a negative thing because
         | of the possibility of higher insurance rates?
        
           | JohnWhigham wrote:
           | Yes? Are you not aware of how money-hungry these insurance
           | companies are, and how you have to fight them tooth-and-nail
           | to get some things covered?
        
           | missedthecue wrote:
           | If you live an unhealthy lifestyle, it's easy to imagine how
           | a fairer deal might cost you more money.
        
         | teacup21 wrote:
         | >> Soon enough this data will end up in the hands of our
         | insurers, and enough technology will be built to where if you
         | buy a 6 pack on the weekend, your premium will be adjusted on-
         | the-fly.
         | 
          | After watching the movie "I Care A Lot"
          | (https://www.imdb.com/title/tt9893250/) and reading up on the
          | horror stories about forced guardianship scams in the US
          | (https://www.newyorker.com/magazine/2017/10/09/how-the-
          | elderl...) -- I'd be worried that corporatized guardianship
          | companies would scan medical data en masse to find victims
          | (with sufficient work and profit motive, it can be de-anon'd)
        
       | stevebmark wrote:
       | It would be more honest if the article title said "deidentified"
       | medical data.
        
       | Frost1x wrote:
       | Information is information and the more you have, the easier it
       | is to figure out where it came from. Some data makes it easier
       | than others.
       | 
       | I worked with a hospital awhile back with patient radiological
        | data (for free and for science). Patients had to explicitly
        | sign off that they were sharing their data with us and knew what
        | we planned to use it for. A lot of the DICOM metadata wasn't even
       | wiped, I had their names, street addresses, all sorts of stuff
       | which was supposed to be wiped (de-identified).
       | 
       | Even then, I worked with data from a patient with a brain tumor
       | and their neurosurgeon looking to remove it. That data was
        | correctly anonymized, but it was their head, so I could basically
        | reconstruct their face--so is a coarse geometric sampling of a
        | face de-identified? I guess it depends on how coarse it was.
       | 
       | Just look at what's done with social media, advertising, and
       | browser data to get an idea of where things can go.
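        | 
        | For reference, the kind of scrubbing that was missing looks
        | roughly like this with pydicom (a common DICOM library; the
        | file name and tag list are illustrative, not what that
        | project used):
        | 
        |     import pydicom
        |     
        |     TAGS = ["PatientName", "PatientID", "PatientBirthDate",
        |             "PatientAddress", "ReferringPhysicianName"]
        |     
        |     ds = pydicom.dcmread("scan.dcm")
        |     for tag in TAGS:
        |         if tag in ds:
        |             ds.data_element(tag).value = ""
        |     ds.remove_private_tags()  # vendors stash IDs here too
        |     ds.save_as("scan_deid.dcm")
        | 
        | And even a perfectly scrubbed header doesn't help when the
        | pixel data itself is a reconstructable face.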
        
         | deaps wrote:
         | Under what circumstances did they "explicitly sign-off" on the
         | data sharing, I wonder? There are a lot of times during a
         | hospital visit, when one could be less-than-observant of
         | exactly what he/she is signing.
        
           | Frost1x wrote:
           | _Very explicitly_.
           | 
            | This was for a project partnership with a university and
            | hospital to improve patient outcomes with some new
           | exploratory tech approaches. A short document that followed
           | some standard study participation format was generated using
           | easily understandable language in about 1-2 pages IIRC (large
            | fonts so it was easily readable by patients with poor
            | eyesight). Everything done, including the document, went
            | through
           | an external IRB process for human subject data and was
           | approved. Everyone involved had to go through human subject
           | training and what not.
           | 
            | The physician would mention the study to patients who would
            | likely be good subjects, describe the work and its goals,
            | and ask if they'd be interested in participating. Forms were
           | then provided to patients involved to sign (explicitly) about
           | their agreement to participate in the effort and how their
           | data would be used, protected, etc. The process also required
           | physician sign-off to confirm they read the document to the
           | patient verbally, determined they were competent, cognizant,
           | not under any sort of duress/intoxication, etc. The patient
           | also needed to verbally acknowledge they agreed. Oh, and
            | there was a clause saying they could retroactively pull out
            | of the work, including their data, at any point if they felt
            | uncomfortable or changed their minds.
           | 
           | The patients and their data weren't the product, tech
           | developed that would assist patients was the product of the
           | data. For patients who agreed, some would also be permitted
           | to see some of the products of the work related to their
            | data. I'm forgetting a lot of the data collection process
            | because it was very rigorous and several years ago now, but
            | everything was above board: no dark-patterny _ah-ha-gotcha!_
            | line buried in a 300-page liability sign-off they had to
            | agree to for some necessary life-saving treatment or
            | anything of that nature.
           | 
           | I even got to meet some of the people we helped which was a
           | bit rewarding to see people's lives improve a bit with
           | technology. The specific patient mentioned and their
           | neurosurgeon even let me sit in on their brain surgery tumor
           | removal (patient's suggestion), which was a very unique
           | experience. So yea, they knew what was going on.
           | 
           | With that said, not all data usage was as transparent and
           | ethical as what I worked with, and I saw a lot of mistakes
           | there that make me cringe thinking what a less ethical
           | business with no transparency might do, given the
           | opportunity.
        
           | pc86 wrote:
           | Every doctor's visit I've had in recent memory had a data
           | sharing agreement I had to sign (or at least, that was
           | presented to me) if I hadn't been there before.
        
       | aabaker99 wrote:
       | One troubling re-identification attack for medical data is the
       | trail re-identification method [0]. A lot of privacy analysis
       | will consider the data to be in the shape of a table T with some
       | columns A,B,C and use the notation T<A,B,C> to describe a de-
       | identified dataset. The trail method will take multiple de-
       | identified datasets, each from a different hospital, T_1<A,B,C>
       | T_2<B,C,D>, T_3<C,D,E> and use their shared columns to narrow
       | down on a set of individuals.
       | 
       | So, even though each hospital may have a legitimately de-
       | identified dataset in isolation, it is not de-identified when
       | combined with the (also de-identified) data from another
       | hospital. The risk of this attack increases as patients visit
       | more hospitals. We humans are fairly long-lived and tend to move
       | around so it may be substantial. (That being said some hospital
       | systems are quite large like Kaiser Permanente and serve huge
       | areas so visiting multiple hospitals doesn't necessarily create
       | multiple tables.)
       | 
       | [0]
       | https://dataprivacylab.org/dataprivacy/projects/trails/trail...
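        | 
        | The mechanics in a toy example (invented records; each table
        | is "de-identified" on its own):
        | 
        |     # T_1<sex, birth_year, zip3> from hospital 1
        |     t1 = [("F", 1984, "021"), ("M", 1984, "021"),
        |           ("F", 1990, "105")]
        |     # T_2<birth_year, zip3, diagnosis> from hospital 2
        |     t2 = [(1984, "021", "asthma"), (1990, "105", "flu")]
        |     
        |     # Join on the shared columns (birth_year, zip3); each
        |     # extra table shrinks the candidate sets further.
        |     trail = [(s, y, z, dx)
        |              for (s, y, z) in t1
        |              for (y2, z2, dx) in t2
        |              if (y, z) == (y2, z2)]
        |     print(trail)
        |     # The (1990, "105", "flu") record already maps to a
        |     # single candidate after just two tables.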
        
         | alistairSH wrote:
         | For this attack to work, wouldn't one of the tables need to
         | contain PII of some sort? If A,B,C,D,E are all de-identified,
         | the aggregate is still de-identified? But, if E is SSN (or some
         | other PII data), then the entire set can be re-identified?
        
           | taejo wrote:
           | That's one option: you combine protected, de-identified
           | information with unprotected (e.g. non-health) information to
           | re-identify the protected information. But also, something
           | like Facebook allows you to target a person who lives in
           | $TOWN, works at $COMPANY, born in $YEAR, even if you don't
           | know that person's name or SSN.
        
       ___________________________________________________________________
       (page generated 2021-06-24 23:00 UTC)