[HN Gopher] Hospitals are selling troves of medical data
___________________________________________________________________
Hospitals are selling troves of medical data
Author : uniqueid
Score : 191 points
Date : 2021-06-24 10:04 UTC (12 hours ago)
(HTM) web link (www.theverge.com)
(TXT) w3m dump (www.theverge.com)
| GuB-42 wrote:
| The underlying claim here is that de-identification doesn't work;
| the article then explores the consequences of that.
|
| But the real question should be: why doesn't de-identification
| work and how to make it work. It is a technical problem, and I
| thought it was more or less solved. But the author here thinks it
| is just a placebo. If so, what is the problem exactly? Is there a
| fundamental problem with the very idea of de-identification? Is
| there a "bug" in the process? What level of capability is
| required to conduct these attacks? Can an individual do it? A
| cybercrime gang? A nation state? Is it only theoretical?
|
| Depending on that, the answer could be very different.
| andrey_utkin wrote:
| Because it's an arms race situation.
|
| The industry of de-identification has a long history, in fact.
|
| It is typical for many types of criminals to adopt hard-to-
| identify outfits and appearances. The work of investigators is
| to overcome this obstacle and identify the criminal. The co-
| evolution of methods on both sides will never end.
| ska wrote:
| > It is a technical problem, and I thought it was more or less
| solved.
|
| As a technical problem, it is more or less understood to be
| unsolvable.
|
| The core issue is that the more information you strip away (the
| more sanitized the data is), the less useful it is. Once the
| statistics of your "representative" set are no longer
| representative (or, even worse, weirdly biased), you are very
| limited in what you can do. This works OK if you already know
| exactly which data you need and nothing more, but otherwise it
| falls apart pretty quickly.
|
| On the other hand, relatively small numbers of innocuous data
| points about individuals will identify them uniquely.
|
| These two points are in fundamental contention.
| nonameiguess wrote:
| It's a mathematical problem, not a technical problem. You can
| remove features from a dataset that, on their own, serve to
| uniquely identify someone, but that doesn't change that the
| remaining features still each partition the population into
| smaller and smaller subsets until eventually the subset is one
| or only a few people. Once you get down to "only a few people,"
| metadata may be able to bring that to one, i.e. there are a few
| hundred people it could potentially be, but only one was
| actually at the hospital the data was collected from at the
| time it was collected. You don't necessarily even need the
| hospital to leak that metadata. Maybe this person was the only
| one who even lived near that hospital at that time.
|
| The only way to anonymize the data in a foolproof way is to
| remove so many features that it becomes useless for statistical
| research.
|
| Modeling-wise, regression, classification, and clustering alike
| all rely on being able to construct a model that minimizes
| entropy. But anonymity relies upon maximizing entropy. This
| fundamental conflict can't really be resolved. You pick some
| point in the middle of the spectrum that ends up serving
| neither purpose well.
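|
| A minimal sketch of that partitioning effect (toy records and
| column names, purely illustrative):
|
|   import pandas as pd
|
|   # Toy "de-identified" records: no names, only quasi-identifiers.
|   df = pd.DataFrame({
|       "sex":        ["F", "F", "M", "M", "F", "M"],
|       "birth_year": [1980, 1980, 1980, 1993, 1955, 1993],
|       "zip3":       ["021", "021", "021", "787", "021", "787"],
|   })
|
|   # Each added feature partitions the population into smaller
|   # subsets; a group of size 1 is a uniquely identified person.
|   for cols in (["sex"],
|                ["sex", "birth_year"],
|                ["sex", "birth_year", "zip3"]):
|       sizes = df.groupby(cols).size()
|       print(cols, "smallest group:", sizes.min(),
|             "unique rows:", int((sizes == 1).sum()))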
| cmiles74 wrote:
| It appears there is a spectrum of de-identification tools, some
| being better than others.
|
| https://en.m.wikipedia.org/wiki/Data_re-identification
|
| I suspect that the more data they have for a particular patient
| (many visits over multiple years), the easier the
| re-identification process becomes. The article mentions financial
| data; if dates and amounts of charges aren't masked or altered,
| they could be cross-referenced with another data source to deduce
| the person's identity.
| prepend wrote:
| De-identification is in the eye of the data steward and their
| legal folks.
|
| So it's the definition of de-id that doesn't actually "work": it
| is possible to reidentify a small fraction of any de-id dataset,
| and that adds up over time.
|
| For example, HIPAA considers data de-identified if you remove 18
| categories of identifiers (Safe Harbor) or an expert determines
| that it's de-identified. [0]
|
| What's expert determination, and who's an expert? That's up to me
| to decide and my lawyers to accept.
|
| The bug is that if half a percent of people in each de-id dataset
| can be reidentified (likely acceptable under HIPAA), then each
| dataset released adds to the pool of reidentified people (the
| sketch below puts rough numbers on this). And more datasets allow
| for triangulation and linking to reidentify.
|
| The article calls this out as a risk, not a certainty, as it's
| unclear if anyone is doing this. But the process would be
| something like: 1) buy HIPAA de-identified data, since it doesn't
| require patient consent; 2) reidentify patients using other data
| publicly for sale (marketing data, voter registration, etc.); 3)
| the new data is no longer HIPAA-restricted, and fully identified
| health records can be sold for whatever you like (e.g., super-
| targeted drug marketing).
|
| [0] https://www.hhs.gov/hipaa/for-
| professionals/privacy/special-...
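|
| To put rough numbers on how a small per-release risk compounds (a
| back-of-envelope sketch; the 0.5% figure is hypothetical, as
| above):
|
|   # Chance of being reidentified at least once after n independent
|   # dataset releases, each carrying per-person risk p.
|   p = 0.005  # 0.5% per release (hypothetical)
|   for n in (1, 10, 50, 100):
|       print(f"{n:3d} releases -> {1 - (1 - p) ** n:.1%} cumulative")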
| vharuck wrote:
| The US federal government requires users to agree to terms in
| return for receiving potentially identifiable data. Usually
| those terms include something like, "I agree to not attempt
| to use these data to identify any individuals." Granted, an
| agreement carries more weight when it's with the federal
| government (lying to them is criminal). But this kind of
| clause can prevent law-abiding groups from reliably trading in
| identified data, even if it's from a private hospital. If
| they break the agreement, they'll probably never again
| receive new data from the provider, and stale data is low
| value.
| abcc8 wrote:
| Some medical data, such as genome sequence data, is unique to
| an individual (where the uniqueness is partly dependent on the
| length of sequence data generated - short sequences may not be
| unique). Even worse, whole exome or genome data also describes
| your lineage and may reveal sensitive information about
| relatives, living and not. Deidentification of such genomic
| data is especially difficult.
| Silhouette wrote:
| _But the real question should be: why doesn't de-identification
| work and how to make it work._
|
| How much health data is required to distinguish a specific
| individual even without any "identifying" information? Surely
| there must be many people with unique combinations of
| conditions and treatments. If that is so then a data set that
| includes those people could only ever be pseudonymous, not
| truly anonymous.
|
| What other data sets might then be combined with the health
| data to match a pseudonymous record with an identifiable
| person? Payment records? Travel records? Time off work?
| Insurance claims?
|
| I don't know much about how the US healthcare system works, but
| maybe before asking how to make de-identification of health
| data work we should be asking whether it could ever work
| effectively at all.
| Spooky23 wrote:
| It doesn't work because the data can be correlated against
| other sets of data.
|
| When my wife almost died from an ectopic pregnancy, Enfamil was
| confident enough of the would-be due date, based on data pieced
| together by a broker, to FedEx us a box of formula on the due
| date.
|
| Third parties get a real time feed of insurance claims,
| hospital admissions, prescriptions and other data. They put the
| puzzle together and make an assertion of valuable diagnoses.
| (Pregnancy, diabetes, etc)
|
| They don't need to be 100% accurate. The collateral damage of
| reminding someone that they lost a child and nearly lost their
| own life carries no cost for them.
| jnxx wrote:
| > They don't need to be 100% accurate.
|
| In many cases, abusing such data will also be profitable even if
| the resulting estimate is highly noisy.
|
| Just as an example, let's assume that an insurance company
| knows your given name when you apply and they have
| statistical data which shows that if your name belongs to one
| set of names, you are smoking with a probability of 51%. If
| you have another name, you are smoking with a probability of
| 49%, because the correlation is noisy. It is still more
| lucrative for the company to exclude applicants from the "more
| likely to smoke" names group, because smoking causes extra
| costs. You might never have tried a single cigarette in your
| life but with the information the company has, it is more
| profitable to not insure you.
|
| Of course the above is an exaggerated example which would not
| work out quantitatively (the profit they make by insuring you
| might be higher than the expected value of risk from smoking)
| but the principle still holds; it is used for credit scoring
| and similar things.
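|
| A toy expected-value calculation along those lines (all numbers
| invented for illustration):
|
|   # Insurer's view of an applicant, using only a noisy name-based
|   # smoking signal. Numbers are purely illustrative.
|   premium, base_cost, smoker_extra = 5000, 4000, 2000
|
|   for group, p_smoke in (("'likely smoker' names", 0.51),
|                          ("other names",           0.49)):
|       profit = premium - (base_cost + p_smoke * smoker_extra)
|       print(f"{group}: expected profit {profit:+.0f}")
|
| With margins that thin, even the noisy 51/49 split flips the sign
| of the expected profit per applicant.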
| Spooky23 wrote:
| It's more insidious than that. Based on proprietary
| databases, insurers can target advertising or marketing at
| you based on profiles that you don't know exist.
|
| If you want to avoid insuring a protected class, you might
| be able to identify them based on those databases, and use
| advertising to avoid the folks you don't want. (This is
| common for student apartments)
| aabaker99 wrote:
| It's surprising that many medical providers don't understand this
| side of HIPAA. I recently spoke to a department head of oncology
| who seemed to think patient consent was required for sharing data
| and was not comfortable sharing their patients' data. What they
| don't realize is that HIPAA doesn't require consent if the data
| is de-identified, and so their organization could be, or already
| is, sharing their patients' data anyway.
| SkyPuncher wrote:
| The problem is risk. HIPAA allows for a lot of things that feel
| like exceptions to the core principles of HIPAA. Many things
| are vaguely defined as "reasonable" - which changes over time.
|
| MD5 was a reasonable password hash - until it wasn't. SHA1 was
| - until it wasn't. Etc.
|
| An article like this can arguably prove that there is no
| reasonable means of de-identification since multiple data
| sources can be combined. Add to that HIPAA's pretty high limits
| on the minimum cohort size that can be associated with a unique
| identifier, plus the low ROI from actually sharing this data.
|
| Many places end up in a position where it's simply not worth
| the risk to share.
| pitaj wrote:
| Arguably SHA1 is still reasonable as a password hash, but of
| course not recommended.
| pg_bot wrote:
| In 2021, SHA1 is neither a reasonable nor a recommended
| password hash for any project. Use a tool like bcrypt,
| scrypt, or argon2 instead.
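|
| A minimal sketch using the memory-hard scrypt KDF from Python's
| standard library (cost parameters shown are common defaults, not
| tuning advice):
|
|   import hashlib, hmac, os
|
|   def hash_password(password: str):
|       # A fresh random salt per password defeats rainbow tables.
|       salt = os.urandom(16)
|       digest = hashlib.scrypt(password.encode(), salt=salt,
|                               n=2**14, r=8, p=1)
|       return salt, digest
|
|   def check_password(password: str, salt: bytes, digest: bytes):
|       candidate = hashlib.scrypt(password.encode(), salt=salt,
|                                  n=2**14, r=8, p=1)
|       # Constant-time comparison avoids timing side channels.
|       return hmac.compare_digest(candidate, digest)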
| rscho wrote:
| The funniest part is that this data is actually mostly (and often
| completely) useless for the stated purposes of statistical
| analysis. Routine clinical data collection is of abysmal quality,
| but many buyers don't see the full extent of the catastrophe.
|
| It'll be much more useful for legal and not-so-legal purposes.
| And last but not least: insurance. As an aside, privacy
| enforcement for medical data is much easier through
| fingerprinting tethered to an NDA.
| dillondoyle wrote:
| Somewhat similarly, I've previously wondered:
|
| if we acted to shield prescriber data from corporations, would
| that cut down on shady pharma "bribes", e.g. the opiate crisis
| and some of the other direct-to-doctor pharma marketing?
|
| Insys and Purdue knew which doctors prescribed insane amounts of
| pills. And then rewarded them with $. Insys even put it on paper
| as ROI and at least a few went to jail.
|
| I'm not sure technically how well (or if legally) it would work
| though so maybe the answer is not at all.
|
| McKesson would still know which pharmacies pills go to, and
| anyone can figure out where an MD works to correlate at least
| target zip codes/markets. But maybe, since we already have this
| monopolistic distribution setup, we could prohibit McKesson from
| disclosing granular shipment data.
| prepend wrote:
| I'm glad to see this getting more attention as it seems scary to
| me as a private citizen who wants my medical data to stay
| private.
|
| I'm not sure how to defend against this as it seems based on
| HIPAA and what it allows. Since de-identified data can be legally
| sold, I think it will be.
|
| The theoretical defense I've thought up is a class action lawsuit
| for synthetic breaches. Since these data are deemed de-identified
| by expert determination [0] and that's hazy, if I could
| reidentify myself after de-id, and I didn't authorize it, then I
| could be eligible for breach damages for HIPAA violations up to
| $50k per person [1].
|
| These sets cover millions of people, likely everyone in the
| country. And since expert determination can plausibly classify
| an acceptable re-id risk as anything under 1%, that could mean a
| million or two reidentifiable people: a big enough pool to
| attract big legal investment.
|
| That would increase the cost and risk of doing this to outweigh
| the benefits. But currently it's "free money" for any healthcare
| system, and it's more or less impossible for me as a patient to
| opt out.
|
| [0] https://www.hhs.gov/hipaa/for-
| professionals/privacy/special-... [1]
| https://www.injuryclaimcoach.com/hipaa-violations.html
| aabaker99 wrote:
| I like your idea of a class action lawsuit but wouldn't the
| $50k per person penalty shift your idea of what an organization
| would consider an acceptable risk? 1 million * $50,000 is
| probably not acceptable.
|
| I hope for all our sakes you are in the business of buying de-
| identified medical data :)
| prepend wrote:
| I think this has the potential for the next asbestos or
| tobacco or opioid payouts. I definitely wouldn't want to work
| in this area (both for ethical reasons and because the business
| model is at risk of being sued out of existence).
| throwaway894345 wrote:
| As far as data privacy goes, I will say that the sale of health
| data has some pretty significant upsides--specifically medical
| research. There's probably a better way to get the best of both
| worlds, but I would hate for us to just pull the rug out from
| under medical research without a satisfactory backfill.
|
| Disclaimer: I work for a company that uses this data to match
| terminally ill patients with niche treatments and clinical
| trials, and the work literally saves and prolongs lives.
| captainoats wrote:
| The article mentions the dispute over whether or not
| anonymized medical record data really has enough of a signal
| for meaningful research use cases. That has to be weighed
| against the privacy concern.
| throwaway894345 wrote:
| Agreed. To be even more explicit, I'm not saying "do not
| weigh the research concern against the privacy concern".
| not_jd_salinger wrote:
| > I work for a company that uses this data to match
| terminally ill patients with niche treatments and clinical
| trials, and the work literally saves and prolongs lives.
|
| Talked with a company that made similar claims about what they
| were doing, but it turned out they were really ensuring that
| their users were choosing the medication from the highest bidder
| rather than the one that was really in the patient's best
| interests.
|
| The mental gymnastics they did to justify they were doing
| what was in the patient's best interests was fun to observe,
| but in the end they were just a tool of a large
| pharmaceutical company, exploiting sick people for profit.
|
| Quick cynicism sanity check: who pays your company, your
| users or the "niche" treatment provider?
|
| If it's the former, that sounds like it's good work.
|
| If it's the latter I would recommend being a bit more
| skeptical of the business motives of your employer and their
| customers.
| throwaway894345 wrote:
| > Quick cynicism sanity check: who pays your company, your
| users or the "niche" treatment provider?
|
| Insurers. They don't want to pay for the expensive product.
| giantg2 wrote:
| Ta-da
|
| https://www.nytimes.com/2019/07/23/health/data-privacy-prote...
| notafraudster wrote:
| HIPAA has no private cause of action (you can't sue providers
| who violate your rights under HIPAA). The government can fine
| them, so from a provider POV they are liable for the breach you
| propose, but you are not eligible for recovering the $50k.
|
| You may or may not have a private civil tort against the
| medical provider, separately from HIPAA.
| analog31 wrote:
| People talk about the US being lawsuit-happy, but here's an
| example where the tort system could work to encourage the
| enforcement of a law if the government doesn't want to
| enforce it.
| Teever wrote:
| As a non-American I am always fascinated by the hacks and
| kludges that people propose to mitigate the inherent
| dysfunction that is the US medical system.
|
| I consider intentional breaches of medical privacy to be in
| the same league as physical assault and as such would
| expect a society to respond to these kinds of crimes with
| swift and severe punishments including jail time and fines.
|
| Why not just arrest these people and take the profits from
| their crime instead of adding civil lawyers and bureaucracy
| to the mix?
| wombatpm wrote:
| Would a hospital employee be able to make a whistleblower case
| and recoup a portion of the fines?
| prepend wrote:
| Good point, the fines are paid to HHS, not as dollar amounts to
| victims. I gave the amounts to signify the importance, and
| (IANAL) they would be part of a basis for establishing harm done.
|
| I specifically linked to an injury lawyer to show the types
| of current HIPAA violation civil suits that have succeeded.
| My reasoning is that it's lucrative enough for lawyers to
| solicit business.
| jtaft wrote:
| > As long as they de-identify the records -- removing information
| like patient names, locations, and phone numbers -- they can give
| or sell the data to partners for research.
|
| I don't feel this is enough to deidentify.
|
| If timestamps or a patient # is associated with records, it
| should be possible to combine them with credit card records to
| discover who someone is.
|
| I wonder if we can request what is shared.
| organsnyder wrote:
| I work in healthcare (though not with any efforts like those
| described in the article). Anonymizing data is much more
| complicated than simply removing the obvious PII: depending on
| what the data is, even things like procedure codes with dates
| might be enough to identify a patient. HIPAA and other related
| regs account for this, and have pretty strict procedures that
| must be followed before data can be considered officially
| deidentified.
| pulse7 wrote:
| In a few months somebody can come up with a deep-neural-network
| to identify those people behind "de-identified data"...
| rscho wrote:
| months? Hours.
| dekhn wrote:
| yes, that deep neural network is called "an army of MIT
| students, grad students, and postdocs"
| [deleted]
| agumonkey wrote:
| How come more and more of society is about selling blips of
| events as data?
|
| Is the price of everything far lower than what is required to
| sustain the system?
| TrailMixRaisin wrote:
| There is a (famous) quote from Cynthia Dwork, who is most likely
| the key researcher behind differential privacy: "De-identified
| data isn't".
|
| You can either re-identify the people behind the data, or you
| alter it so strongly that it becomes useless for meaningful
| applications.
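|
| Differential privacy takes a different route: never release the
| records at all, only noisy aggregates. A minimal sketch of the
| Laplace mechanism for a counting query (the epsilon value is
| illustrative):
|
|   import numpy as np
|
|   def dp_count(true_count: int, epsilon: float = 0.1) -> float:
|       # A count has sensitivity 1 (one person joining or leaving
|       # changes it by at most 1), so noise scales as 1/epsilon.
|       return true_count + np.random.laplace(scale=1.0 / epsilon)
|
|   # e.g. "how many patients had diagnosis X?" released with noise
|   print(dp_count(412))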
| miej wrote:
| Re-identification is basically the same as browser
| fingerprinting: with enough vaguely stochastic variables, you
| can uniquely identify pretty much anyone.
| specialist wrote:
| > _"De-identified data isn't"_
|
| People don't yet have intuition about this. I still don't even
| know how to articulate it. Here's a stab:
|
| "With enough data collected, you can uniquely identify people
| by ruling out everyone else."
|
| Mid 2000s, Seisint was helping law enforcement solve cold cases
| using big data. In layperson's terms, they'd narrow the list of
| suspects by ruling out everyone who has a solid alibi.
|
| At the time, that was achieved by building profiles of
| _everyone_, living and dead, simply by compiling 1600 _publicly_
| available datasets. Like court records and mortgages and
| whatnot.
|
| Today you'd include location tracking, social media, all
| financial tracking, etc.
|
| It's remarkable that any crime goes unsolved. Like the backlog
| of rape test kits, it's only because no one cares enough to
| bother to look.
| pokot0 wrote:
| Just playing Akinator once will help give an intuition about
| this.
| MaxBarraclough wrote:
| Web version of Akinator. [0] Pretty neat. Pity about the
| site's high CPU usage.
|
| [0] https://en.akinator.com/
| Banana699 wrote:
| I thought of John Kennedy the first time; it took the game
| about 55 questions to get there. It didn't get it until it knew
| that he was a US president, was assassinated (as far as I know
| only Lincoln and Kennedy meet those two criteria together) AND
| that he lived in the 20th century, so before that it couldn't
| even rule out Lincoln. Next game I thought of Cameron Diaz, and
| the game simply gave up after the 25th question or so.
|
| It's a pretty neat idea overall, reminds me of those
| link-following competitions using Wikipedia where you
| have to start from a topic and get to another completely
| unrelated topic solely by recursively chasing links. They
| both exploit the interconnected nature of our culture, a
| question (or an article) about a recent TV show providing
| a clue (or a link) about a pretty unrelated historical
| character.
|
| The game desperately needs some notion of statistical proximity
| though. It kept asking whether the character was an actor even
| after knowing he was a president; it doesn't do _any_ kind of
| deduction on what it already knows, just mad slash-and-dash
| until it gets the identifying info.
| MaxBarraclough wrote:
| It worked much better for me. It got South Park's Stan
| Marsh in not too many guesses.
| nradov wrote:
| De-identified data can still be quite useful for some types of
| research. If you strip away every field of personally
| identifiable information except, let's say, sex and birth year,
| there's no way to do meaningful re-identification.
| TrailMixRaisin wrote:
| I am sorry, but your answer is nearly the exact textbook example
| of how wrong most people's intuition is here. In 2000, Latanya
| Sweeney showed that 87% of Americans can be identified by sex,
| birth date, and ZIP code. As other commenters point out, the
| connection to external databases is extremely powerful and
| dangerous to privacy.
|
| There are other concepts in research for still making use of
| such data, like differential privacy, or using AI to synthesize
| data according to the training data provided. But so far, all
| concepts that tried to alter a dataset and then publish it
| directly for research have failed miserably.
| prepend wrote:
| Birthday is super different from year of birth. And zip code is
| really precise.
|
| GP said sex and year of birth, which are usually perfectly fine
| for deidentification, assuming some basic k-anonymity and
| l-diversity protections.
|
| It's frustrating when people bring up examples of
| reidentification out of data that were not properly deidentified
| in the first place.
|
| Hopefully no one competent would think that records made unique
| by sex, DOB, and zip code count as deidentified. That is usually
| a case of data not actually being deidentified.
|
| The bigger risk, I think, is when you have some threshold of at
| least 10 records sharing sex and year of birth and can still ID
| individuals.
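|
| A sketch of the kind of k-anonymity threshold check described
| here (pandas; the k=10 and the quasi-identifier columns are
| assumptions):
|
|   import pandas as pd
|
|   def k_anonymity_violations(df: pd.DataFrame, quasi_ids, k=10):
|       # Groups of quasi-identifier values shared by fewer than k
|       # records are the ones unsafe to release as-is.
|       sizes = df.groupby(quasi_ids).size()
|       return sizes[sizes < k]
|
|   # e.g. k_anonymity_violations(records, ["sex", "birth_year"])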
| jfrunyon wrote:
| Sure, sex and year of birth alone are fine. Why do you
| assume they won't be able to use the medical data to
| further de-anonymize it? "Male, 1993, records are from a
| hospital in Central Texas so he lives there, who has
| posted on social media about conditions x, y and z" would
| probably turn up... me.
| giovannibonetti wrote:
| The parent comment mentioned birth year, not birthday
| ghaff wrote:
| In fairness, the parent did just say Birth Year and Sex
| while Latanya Sweeney used the identifiers you list which
| give quite a bit more information. I agree with your basic
| point though that data that appears to be anonymized can
| frequently be de-anonymized, at least to a degree of statistical
| certainty that most would find surprising.
| alasdair_ wrote:
| All information is potentially personally identifiable.
|
| For example, perhaps you take data on favorite movies and
| strip away every bit of PII except sex and birth year (as you
| stated).
|
| Now let's say I take the stripped data and target some
| Facebook ads to people of a specific sex and birth year and
| have the ad be for people who love a certain obscure movie,
| doubling down on a second obscure movie and so on. Eventually
| I could have a reasonable chance of determining which _other_
| movies a unique individual may like based on the stripped
| data.
|
| Obviously irrelevant for movies, but more relevant for
| prescription drug uses, sexual preferences etc.
|
| The point is that with enough outside data available, even data
| stripped of PII can be de-anonymized.
| rscho wrote:
| This is wrong. Rare diagnoses are the simplest case of
| reidentification, but there are many many other
| opportunities. There's a whole field of research about that.
| specialist wrote:
| Ditto time and sequence (order). For example, online movie
| reviews were deanon simply by correlating viewing history
| with order reviews were posted.
| tmearnest wrote:
| Dates and times are generally deidentified by choosing a
| random initial date and changing subsequent timestamps to
| the random initial plus the duration between visits. The
| sequence and delays could potentially be used to identify
| patients, but this would be a lot harder than having
| absolute timestamps.
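|
| A minimal sketch of that date-shifting scheme (one constant
| random offset per patient; the shift window is an assumption):
|
|   import random
|   from datetime import date, timedelta
|
|   def shift_visit_dates(visits, max_shift_days=3650):
|       # A single random offset hides the absolute dates but
|       # preserves the durations between visits.
|       offset = timedelta(days=random.randint(-max_shift_days,
|                                              max_shift_days))
|       return [visit + offset for visit in visits]
|
|   # e.g. shift_visit_dates([date(2020, 3, 1), date(2020, 6, 12)])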
| specialist wrote:
| Totally.
|
| I recently had a crazy notion for losslessly scrambling
| the sequence as well. Mostly for protecting voter privacy
| (order in which ballots are cast). One of the major
| blockers to fully digital voting.
|
| I haven't found any hits using terms like "cryptographic
| timestamps." Surely I can't be the first.
| seoaeu wrote:
| Solving voter privacy doesn't get digital voting very far. It
| doesn't seem to be possible to both cryptographically
| prove to a voter that their vote was counted correctly
| (integrity) while simultaneously preventing them from
| being at risk of coercion to share who they voted for
| (privacy).
| JohnWhigham wrote:
| Soon enough this data will end up in the hands of our insurers,
| and enough technology will be built to where if you buy a 6 pack
| on the weekend, your premium will be adjusted on-the-fly.
|
| It's all so tiring.
| throwaway3699 wrote:
| FWIW socialised healthcare (the better alternative) hasn't
| solved that particular problem. Punitive taxes on alcohol and
| smoking are very common because they're trying to reduce costs.
| Insurance is the same thing but on an individual basis, which
| means healthier people should _in theory_ get better premiums
| in your example.
| JohnWhigham wrote:
| Right, but I'm talking about the novel ways companies try to
| optimize premiums. Car insurance is already doing it with
| some companies wanting you to install an application on your
| phone to track your driving habits in hopes of maybe lowering
| your premium. Disgusting shit.
| pc86 wrote:
| I don't install these apps but what exactly is wrong with
| them and why is it "disgusting shit?" If I drive the speed
| limit or slower, stop for 3-4 seconds at every stop sign,
| don't accelerate quickly, etc., what's wrong with me having
| a lower auto insurance premium than someone with an
| identical profile who goes 10 over the limit everywhere and
| rolls through every 5th stop sign?
|
| I'm not implying insurance companies are doing this out of
| altruism. If it resulted in a net decrease in profit, the
| apps wouldn't exist. But it does seem like a beneficial
| form of price discrimination. It seems like it's only
| "disgusting shit" if it makes your insurance premiums go up
| because you're a less safe driver.
| throwaway3699 wrote:
| I will admit I'm not a fan of surveillance based
| insurance, either. Just pointing out that the alternatives are
| individualising the cost via insurance or making all of society
| pay for a few people.
| DennisP wrote:
| Under current US law, health insurance companies can't even
| adjust your premiums for serious preexisting conditions, much
| less because you bought some beer.
|
| Auto insurance could maybe do it though.
| JohnWhigham wrote:
| Don't think that couldn't change in a heartbeat, given how
| powerful the healthcare lobby is in Congress.
| [deleted]
| DennisP wrote:
| And yet the ACA passed in the first place.
| droopyEyelids wrote:
| the ACA was a big win for insurers.
| mbg721 wrote:
| Many plans are highly punitive towards smokers; it's not a
| huge leap to imagine something new becoming the Sin Of The
| Week.
| wolverine876 wrote:
| It's hard to compare smoking to the 'sin of the week'. It's
| the leading cause of preventable death in the U.S. (or was
| a few years ago), and probably has been for generations. It
| is one of the most studied and prolonged public health
| issues.
|
| It's hard to blame smoking insurance costs on 'sin' - it
| kills people and increases costs for the insurance company,
| possibly more than any other choice people can make. If you
| drive in drag races and ask for car insurance, don't be
| surprised if it costs more.
| alasdair_ wrote:
| The trick is to use a proxy for the metric that IS
| permissible in much the same way that financial companies in
| the 1960s used address data once it became illegal to
| discriminate directly because someone was black.
| DennisP wrote:
| The only factors are "location, age, tobacco use, plan
| category, and whether the plan covers dependents."
|
| Maybe some zipcodes drink more than others, but they don't
| need to figure that out because they can just look at
| illness rates for the zipcode directly. Same for age and
| tobacco.
|
| https://www.healthcare.gov/how-plans-set-your-premiums/
| Spooky23 wrote:
| Already done - this type of data is already used to assess your
| risk for opioid addiction in several states, especially if you
| are in a high risk/high cost group.
| unishark wrote:
| So medical science could figure out the health consequence of
| all your dietary and recreational decisions with amazing
| granularity, and your take is this is a negative thing because
| of the possibility of higher insurance rates?
| JohnWhigham wrote:
| Yes? Are you not aware of how money-hungry these insurance
| companies are, and how you have to fight them tooth-and-nail
| to get some things covered?
| missedthecue wrote:
| If you live an unhealthy lifestyle, it's easy to imagine how
| a fairer deal might cost you more money.
| teacup21 wrote:
| >> Soon enough this data will end up in the hands of our
| insurers, and enough technology will be built to where if you
| buy a 6 pack on the weekend, your premium will be adjusted on-
| the-fly.
|
| After watching the movie "I Care A Lot"
| (https://www.imdb.com/title/tt9893250/) and reading up the
| horror stories about forced Guardianship scams in the US
| (https://www.newyorker.com/magazine/2017/10/09/how-the-
| elderl...) -- I'd be worried that corporatized Guardianship
| companies scan medical data en masse to find victims (with
| sufficient work and profit motive, it can be de-anon'd).
| stevebmark wrote:
| It would be more honest if the article title said "deidentified"
| medical data.
| Frost1x wrote:
| Information is information and the more you have, the easier it
| is to figure out where it came from. Some data makes it easier
| than others.
|
| I worked with a hospital a while back on patient radiological
| data (for free and for science). Patients had to explicitly sign
| off that they were sharing their data with us and on what we
| planned to use it for. A lot of the metadata from DICOM wasn't
| even wiped; I had their names, street addresses, all sorts of
| stuff which was supposed to be wiped (de-identified).
|
| Even then, I worked with data from a patient with a brain tumor
| and their neurosurgeon looking to remove it. That data was
| correctly anonymized, but it was their head, so I could
| basically reconstruct their face--so is a coarse geometric
| sampling of a face de-identified? I guess it depends on how
| coarse it is.
|
| Just look at what's done with social media, advertising, and
| browser data to get an idea of where things can go.
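|
| For anyone handling similar exports, a minimal sketch of
| scrubbing the obvious DICOM header tags with pydicom (nowhere
| near a full de-identification profile, and as noted above the
| pixel data itself can still give a patient away):
|
|   import pydicom
|
|   ds = pydicom.dcmread("scan.dcm")
|   for keyword in ("PatientName", "PatientID", "PatientBirthDate",
|                   "PatientAddress", "InstitutionName"):
|       if keyword in ds:  # Dataset supports keyword membership
|           setattr(ds, keyword, "")
|   ds.save_as("scan_deid.dcm")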
| deaps wrote:
| Under what circumstances did they "explicitly sign-off" on the
| data sharing, I wonder? There are a lot of times during a
| hospital visit, when one could be less-than-observant of
| exactly what he/she is signing.
| Frost1x wrote:
| _Very explicitly_.
|
| This was for a project partnership with a university and
| hospital to improve patient outcomes with some new
| exploratory tech approaches. A short document that followed
| some standard study participation format was generated using
| easily understandable language in about 1-2 pages IIRC (large
| fonts so it was easily readable by patients with poor
| eyesight). Everything done, including the document, went through
| an external IRB process for human subject data and was
| approved. Everyone involved had to go through human subject
| training and what not.
|
| Physicians would mention the study to patients who would likely
| be good subjects, describing the work, its goals, and asking if
| they'd be interested in participating. Forms were
| then provided to patients involved to sign (explicitly) about
| their agreement to participate in the effort and how their
| data would be used, protected, etc. The process also required
| physician sign-off to confirm they read the document to the
| patient verbally, determined they were competent, cognizant,
| not under any sort of duress/intoxication, etc. The patient
| also needed to verbally acknowledge they agreed. Oh, and
| there was a clause that they could retroactively pull out of
| the work, including their data, at any point if they felt
| uncomfortable or changed their minds.
|
| The patients and their data weren't the product, tech
| developed that would assist patients was the product of the
| data. For patients who agreed, some would also be permitted
| to see some of the products of the work related to their
| data. I'm forgetting a lot of the data collection process
| because it was very rigorous and several years ago now, but
| everything was above the bar: no dark-patterny _ah-ha-gotcha!_
| line buried in a 300-page liability sign-off they had to agree
| to for some necessary life-saving treatment or anything of that
| nature.
|
| I even got to meet some of the people we helped, which was
| rewarding: seeing people's lives improve a bit with
| technology. The specific patient mentioned and their
| neurosurgeon even let me sit in on their brain surgery tumor
| removal (patient's suggestion), which was a very unique
| experience. So yea, they knew what was going on.
|
| With that said, not all data usage was as transparent and
| ethical as what I worked with, and I saw a lot of mistakes
| there that make me cringe when I think what a less ethical
| business with no transparency might do, given the opportunity.
| pc86 wrote:
| Every doctor's visit I've had in recent memory had a data
| sharing agreement I had to sign (or at least, that was
| presented to me) if I hadn't been there before.
| aabaker99 wrote:
| One troubling re-identification attack for medical data is the
| trail re-identification method [0]. A lot of privacy analysis
| will consider the data to be in the shape of a table T with some
| columns A,B,C and use the notation T<A,B,C> to describe a de-
| identified dataset. The trail method will take multiple de-
| identified datasets, each from a different hospital, T_1<A,B,C>
| T_2<B,C,D>, T_3<C,D,E> and use their shared columns to narrow
| down on a set of individuals.
|
| So, even though each hospital may have a legitimately de-
| identified dataset in isolation, it is not de-identified when
| combined with the (also de-identified) data from another
| hospital. The risk of this attack increases as patients visit
| more hospitals. We humans are fairly long-lived and tend to move
| around so it may be substantial. (That being said some hospital
| systems are quite large like Kaiser Permanente and serve huge
| areas so visiting multiple hospitals doesn't necessarily create
| multiple tables.)
|
| [0]
| https://dataprivacylab.org/dataprivacy/projects/trails/trail...
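|
| A toy sketch of that trail linkage (pandas; tables and values
| invented, columns named after the notation above):
|
|   import pandas as pd
|
|   # Two hospitals' "de-identified" releases sharing columns B, C.
|   t1 = pd.DataFrame({"A": ["J45", "E11", "C71"],
|                      "B": [1980, 1993, 1955],
|                      "C": ["021", "787", "021"]})
|   t2 = pd.DataFrame({"B": [1980, 1955],
|                      "C": ["021", "021"],
|                      "D": ["rx-0042", "rx-0107"]})
|
|   # Joining on the shared columns chains the releases together;
|   # any joined group of size 1 pins down a single individual.
|   print(t1.merge(t2, on=["B", "C"]))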
| alistairSH wrote:
| For this attack to work, wouldn't one of the tables need to
| contain PII of some sort? If A,B,C,D,E are all de-identified,
| the aggregate is still de-identified? But, if E is SSN (or some
| other PII data), then the entire set can be re-identified?
| taejo wrote:
| That's one option: you combine protected, de-identified
| information with unprotected (e.g. non-health) information to
| re-identify the protected information. But also, something
| like Facebook allows you to target a person who lives in
| $TOWN, works at $COMPANY, born in $YEAR, even if you don't
| know that person's name or SSN.
___________________________________________________________________
(page generated 2021-06-24 23:00 UTC)