[HN Gopher] AI models miss disease in Black and female patients
___________________________________________________________________
AI models miss disease in Black and female patients
Author : pseudolus
Score : 207 points
Date : 2025-03-27 18:38 UTC (4 hours ago)
(HTM) web link (www.science.org)
(TXT) w3m dump (www.science.org)
| _bin_ wrote:
| This seems like a problem that should be worked on
|
| It also seems like we shouldn't let it prevent all AI deployment
| in the interim. It is better to raise the disease detection
| rate for part of the population by a few percent than not at all.
| Plus it's not like doctors or radiologists always diagnose at
| perfectly equal accuracy across all populations.
|
| Let's not let the perfect become the enemy of the good.
| nradov wrote:
| False positive diagnoses cause a huge amount of patient harm.
| New technologies should only be deployed on a widespread basis
| when they are justified based on solid evidence-based medicine
| criteria.
| nonethewiser wrote:
| No one says you have to use the AI models stupidly.
|
| If it works poorly for Black and female patients, don't use
| it for them.
|
| Or simply don't use it for the initial diagnosis. Use it after
| the normal diagnosis process as more of a validation step.
|
| Anyway, this all points to the need to capture biological
| information as input, or even to have separate models tuned
| to different factors.
| nradov wrote:
| The guidelines on how to use a particular AI model can only
| be written after extensive clinical research and data
| analysis. You can't skip that step without endangering
| patients, and it will take years to do properly for each
| one.
| Avshalom wrote:
| Every single AI company says you _should_ use AI models
| stupidly. Replacing experts is the whole selling point.
| nonethewiser wrote:
| OK so should we optimize for blindly listening to AI
| companies then?
| Avshalom wrote:
| We should assume people will use tools in the manner in which
| those tools have been sold to them, yes.
| nonethewiser wrote:
| But these tools include research like this. This research
| is sold as proof that AI models have problems with bias.
| So by your reasoning I'd expect doctors to be wary of AI
| models.
| Avshalom wrote:
| doctors aren't being sold this. Private equity firms that
| buy hospitals are.
| acobster wrote:
| > having separate models tuned to different factors.
|
| Sure. Separate but equal, presumably.
| nonethewiser wrote:
| What's the alternative? Withholding effective tools
| because they aren't effective for everyone? One model
| that's worse for everyone?
|
| This is what personalized medicine is, and it gets more
| individualistic than simply classifying people by race
| and gender. There are a lot of medical gains to be made
| here.
| nradov wrote:
| Citation needed. Personalized medicine seems like a great
| idea in principle, but so far attempts to put it into
| practice have been underwhelming in terms of improved
| patient outcomes. You seem to be assuming that these
| tools actually are effective, but generally that remains
| unproven.
| bilbo0s wrote:
| Mmmm...
|
| You don't work in healthcare do you?
|
| I think it would be extremely bad if people found out that, um,
| "other already disliked/scapegoated people", get actual doctors
| and nurses working on them, but "people like me" only get the
| doctor or nurse checking an AI model.
|
| I'm saying that if you were going to do that, you'd better have
| an extremely high degree of secrecy about what you were doing
| in the background. Like, "we're doing this because it's medical
| research" kind of secrecy. Because there's a bajillion ways
| that could go sideways in today's world. Especially if that
| model performs worse than some rockstar doctor that's now freed
| up to take his/her time seeing the, uh, "other already
| disliked/scapegoated population".
|
| Your hospital or clinic's statistics start to look a bit off.
|
| Joint commission?
|
| Medical review boards?
|
| Next thing you know certain political types are out telling
| everyone how a certain population is getting preferential
| treatment at this or that facility. And that story always turns
| into, "All around the nation they're using AI to get
| <scapegoats> preferential treatment".
|
| It's just a big risk unless you're 100% certain that model can
| perform better than your best physician. Which is highly
| unlikely.
|
| This is the sort of thing you want to do the right way.
| _Especially_ nowadays. Politics permeates everything in
| healthcare right now.
| jcims wrote:
| Just like doctors:
| https://kffhealthnews.org/news/article/medical-misdiagnosis-...
|
| I wonder how well it does with folks that have chronic conditions
| like type 1 diabetes as a population.
|
| Maybe part of the problem is that we're treating these tools like
| humans that have to look at one fuzzy picture to figure things
| out. A 'multi-modal' model that can integrate inputs like raw
| ultrasound Doppler, x-ray, CT scan, blood work, EKG, etc.,
| would likely be much more capable than a human counterpart.
| CharlesW wrote:
| It seems critical to have diverse, inclusive, and equitable data
| for model training. (I call this concept "DIET".)
| nonethewiser wrote:
| Or take more inputs. If there are differences across race and
| gender and that's not captured as an input, we should expect the
| accuracy to be lower.
|
| If an x-ray means different things based on the patient's race or
| gender, we should make sure the model knows the race and gender.
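|
| A minimal sketch of that idea, with toy data and scikit-learn
| (hypothetical feature names, not the study's actual pipeline):
|
|   import numpy as np
|   from sklearn.linear_model import LogisticRegression
|
|   rng = np.random.default_rng(0)
|   n = 200
|   img_features = rng.normal(size=(n, 32))  # stand-in for image features
|   sex = rng.integers(0, 2, size=(n, 1))    # toy 0/1 encoding
|   race = rng.integers(0, 4, size=n)        # four toy categories
|   race_onehot = np.eye(4)[race]            # one-hot encode
|   labels = rng.integers(0, 2, size=n)      # toy disease labels
|
|   X_image_only = img_features
|   X_with_demo = np.concatenate(
|       [img_features, sex, race_onehot], axis=1)
|
|   for name, X in [("image only", X_image_only),
|                   ("image + demographics", X_with_demo)]:
|       clf = LogisticRegression(max_iter=1000).fit(X, labels)
|       print(name, clf.score(X, labels))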
| 0cf8612b2e1e wrote:
| Funny you should say that. There was a push to have more
| officially collected DIET data for exactly this reason.
| Unfortunately such efforts were recently terminated.
| appleorchard46 wrote:
| I'm calling it now. My prediction is that, 5-10 years from
| now(ish), once training efficiency has plateaued, and we have a
| better idea of how to do more with less, curated datasets will
| be the next big thing.
|
| Investors will throw money at startups claiming to make their
| own training data by consulting experts, finetuning as it is
| now will be obsolete, pre-ChatGPT internet scrapes will be
| worth their weight in gold. Once a block is hit on what we can
| do with data, the data itself is the next target.
| AlecSchueler wrote:
| Humans do the same. Everything from medical studies to doctor
| training treats the straight white man as the "default human" and
| this obviously leads to all sorts of issues. Caroline Criado-
| Perez has an entire chapter about this in her book about systemic
| bias Invisible Women, with a scary number of examples and real
| world consequences.
|
| It's no surprise that AI training sets reflect this also. People
| have been warning against it [0] specifically for at least 5
| years.
|
| 0: https://www.pnas.org/doi/10.1073/pnas.1919012117
|
| Edit: I've never had a comment so heavily downvoted so quickly. I
| know it's not the done thing to complain but HN really feels more
| and more like a boys' club sometimes. Could anyone explain what
| they find so contentious about what I've said?
| unsupp0rted wrote:
| Everybody knows that gay men have more livers and fewer kidneys
| than straight men
| AlecSchueler wrote:
| Why the snark? The OP, the study I linked, and the book I
| referenced, which contains many well-researched examples of
| issues caused by defaultism, surely represent a strong enough
| body of work to deserve a more engaged critique.
| consteval wrote:
| No, but they do have different risk profiles for various
| diseases and drug use. Surprise surprise, that affects
| diagnoses and treatment.
| Animats wrote:
| What's so striking is how strongly race shows in X-rays. That's
| unexpected.
| banqjls wrote:
| But is it really?
| danielmarkbruce wrote:
| The fact that the vast majority of physical differences don't
| matter in the modern world doesn't mean they don't actually
| exist.
| DickingAround wrote:
| This is a good point; a man or woman sitting behind a desk
| doing correlation analysis is going to look very similar in
| function to a business. But they probably look pretty
| physically distinct in an x-ray image.
| sergiotapia wrote:
| It's odd how we can segment between different species in
| animals, but in humans it's taboo to talk about this. Threw the
| baby out with the bathwater. I hope we can fix this soon so
| everybody can benefit from AI. The fact that I'm a male latino
| should be an input for an AI trained on male latinos! I want
| great care!
|
| I don't want pretend kumbaya that we are all humans in the end.
| That's not true. We are distinct! We all deserve love and
| respect and care, but we are distinct!
| schnable wrote:
| That's because humans are all the same species.
| sdsd wrote:
| In terms of Linnaean taxonomy, Chihuahuas and wolves are
| also the same species, in that they can produce fertile
| offspring. We instead differentiate them using the less
| objective subspecies classification. So it appears that
| with canines we're comfortable delineating subspecies; why
| not with humans?
|
| I don't think we should, but your particular argument seems
| open to this critique.
| sergiotapia wrote:
| yes this is what I was referring to. I think it's time we
| become open to this reality to improve healthcare for
| everybody.
| kjkjadksj wrote:
| Race has such striking phenotypes on the outside it should come
| as no surprise there are also internal phenotypes and
| significant heterogeneity.
| dekhn wrote:
| It doesn't seem surprising at all. Genetic history correlates
| with race, and genetic history correlates with body-level
| phenotypes; race also correlates with socioeconomic status
| which correlates with body-level phenotypes. They are of course
| fairly complex correlations with many confounding factors and
| uncontrolled variables.
|
| It has been controversial to discuss this and a lot of
| discussions about this end up in flamewars, but it doesn't seem
| surprising, at least to me, from my understanding of the
| relationship between genetic history and body-level phenotypes.
| KittenInABox wrote:
| What is the body-level phenotype of a ribcage by race?
|
| I think what baffles me is that black people as a group are
| more genetically diverse than every other race put together
| so I have no idea how you would identify race by ribcage
| x-rays exclusively.
| dekhn wrote:
| I use the term genetic history, rather than race, as race
| is only weakly correlated with body level phenotypes.
|
| If your question is truly in good faith (rather than a "I
| want to get in argument "), then my answer is: it's
| complicated. Machine learning models that work on images
| learn extremely complicated correlations between pixels and
| labels. If on average, people with a specific genetic
| history had slightly larger ribcages (due to their
| genetics, or even socioeconomic status that correlated with
| genetic history), that would show up in a number of ways in
| the pixels of a radiograph: larger bones spread across more
| pixels, density of bones slightly higher or lower, organ
| size differences, etc.
|
| It is true that Africa has more genetic diversity than
| anywhere else; the current explanation is that after humans
| arose in Africa, they spread and evolved extensively, but
| only a small number of genetically limited groups left
| Africa and reproduced/evolved elsewhere in the world.
| KittenInABox wrote:
| I am genuinely asking because it makes no sense to me
| that a genetically diverse group is distinctly
| identifiable by their ribcage bones in an x-ray. If it's
| something more specific like AI sucks at statistically
| larger ribcages, statistically noticeable bone densities,
| or similar, okay. But something like so-small-humans-
| cannot-tell-but-is-simultaneously-widely-applicable-to-a-
| large-genetic-population is utterly baffling to me.
| echoangle wrote:
| > it makes no sense to me that a genetically diverse
| group is distinctly identifiable by their ribcage bones
| in an x-ray
|
| I don't see how diversity would prevent identification.
| Butterflies are very diverse, but I still recognize one
| and don't think it's a bird. As long as the diversity is
| constrained to specific features, it can still be
| discriminated (and even if it's not, it technically still
| could be by just excluding everything else).
| dekhn wrote:
| I dunno. My perspective is that I've worked in ML for 30+
| years now and over time, unsupervised clustering and
| direct featurization (i.e., treating the image pixels as the
| features, rather than extracting features) have shown
| great utility in uncovering subtle correlations that
| humans don't notice. Sometimes, with careful analysis,
| you can sort of explain these ("it turns out the
| unlabelled images had the name of the hospital embedded
| in them, and hospital 1 had more cancer patients than
| hospital 2 patients because it was a regional cancer
| center, so the predictor learned to predict cancer more
| often for images that came from hospital 1") while other
| cases, no human, even a genius, could possibly understand
| the combination of variables that contributed to an
| output (pretty much anything in cellular biology, where
| billions of instances of millions of different factors
| act along with feedback loops and other regulation to
| produce systems that are robust to perturbations).
|
| I concluded long ago I wasn't smart enough to understand
| some things, but by using ML, simulations, and
| statistics, I could augment my native intelligence and
| make sense of complex systems in biology. With mixed
| results- I don't think we're anywhere close to solving
| the generalized genotype to phenotype problem.
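|
| For what it's worth, a toy version of that pixels-as-features,
| unsupervised approach (synthetic data, not real radiographs)
| looks something like this:
|
|   import numpy as np
|   from sklearn.cluster import KMeans
|
|   rng = np.random.default_rng(1)
|   n, pixels = 300, 16 * 16
|   hidden = rng.integers(0, 2, size=n)   # attribute never shown to the model
|   # subtle, per-pixel shift associated with the hidden attribute
|   images = rng.normal(size=(n, pixels)) + 0.5 * hidden[:, None]
|
|   clusters = KMeans(n_clusters=2, n_init=10,
|                     random_state=0).fit_predict(images)
|   agree = max(np.mean(clusters == hidden),
|               np.mean(clusters != hidden))
|   print(f"cluster/attribute agreement: {agree:.2f}")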
| bflesch wrote:
| Sounds like "geoguesser" players who learn to recognize
| google street view pictures from a specific country by
| looking at the color of the google street view car or a
| specific piece of dirt on the camera lens.
| lesuorac wrote:
| If you have 2 samples where one is highly concentrated
| around 5 and the other is dispersed more evenly between 0
| and 10, then for an observed value of 5 you should guess
| Sample 1.
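|
| A quick numerical version of that reasoning, with made-up
| distributions (not the paper's data):
|
|   from scipy.stats import norm, uniform
|
|   x = 5.0
|   lik_sample1 = norm.pdf(x, loc=5, scale=0.5)    # concentrated near 5
|   lik_sample2 = uniform.pdf(x, loc=0, scale=10)  # spread over 0..10
|   print(lik_sample1, lik_sample2)  # ~0.80 vs 0.10 -> guess Sample 1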
|
| But anyway, the article links out to a paper [1], but
| unfortunately the paper tries to theorize things that would
| explain how and doesn't find one (which may mean the AI is
| cheating; my opinion, not theirs).
|
| [1]: https://www.thelancet.com/journals/landig/article/PIIS
| 2589-7...
| Avshalom wrote:
| Africa is extremely diverse but due to the slave trade
| mostly drawing from the Gulf of Guinea (and then being,
| uh... artificially selected in addition to that) 'Black',
| as an American demographic, is much less so.
| yieldcrv wrote:
| just giving globs of training sets and letting a process cook for
| a few months is going to be seen as lazy in the near future
|
| more specialization of models is necessary, now that there is
| awareness
| acobster wrote:
| Specialization in what though? Do you really think VCs are
| going to drive innovation on equitable outcomes? Where is the
| money in that? I have a hunch that oppression will continue to
| be profitable.
| yieldcrv wrote:
| the model involved in this article was developed by Stanford,
| and tested by UCLA
|
| so yes I do believe that models will be created with more
| specific datasets, which is the specialization I was
| referring to
| elietoubi wrote:
| I came across a fascinating Microsoft research paper on MedFuzz
| (https://www.microsoft.com/en-us/research/blog/medfuzz-explor...)
| that explores how adding extra, misleading prompt details can
| cause large language models (LLMs) to arrive at incorrect
| answers.
|
| For example, a standard MedQA question describes a 6-year-old
| African American boy with sickle cell disease. Normally, the
| straightforward details (e.g., jaundice, bone pain, lab results)
| lead to "Sickle cell disease" as the correct diagnosis. However,
| under MedFuzz, an "attacker" LLM repeatedly modifies the question
| --adding information like low-income status, a sibling with
| alpha-thalassemia, or the use of herbal remedies--none of which
| should change the actual diagnosis. These additional, misleading
| hints can trick the "target" LLM into choosing the wrong answer.
| The paper highlights how real-world complexities and stereotypes
| can significantly reduce an LLM's performance, even if it
| initially scores well on a standard benchmark.
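|
| Roughly, the attack loop described above looks like the sketch
| below. The two callables are hypothetical stand-ins for LLM API
| calls; this is not Microsoft's actual code.
|
|   from typing import Callable, Optional
|
|   def medfuzz_style_attack(question: str, correct_answer: str,
|                            attacker_llm: Callable[[str], str],
|                            target_llm: Callable[[str], str],
|                            max_turns: int = 5) -> Optional[str]:
|       """Return a question variant that flips the answer, or None."""
|       current = question
|       for _ in range(max_turns):
|           # Ask the attacker to add a misleading but clinically
|           # irrelevant detail (e.g. socioeconomic status).
|           current = attacker_llm(
|               "Add a misleading but clinically irrelevant "
|               "detail to this exam question:\n" + current)
|           if target_llm(current) != correct_answer:
|               return current  # the added detail flipped the answer
|       return None  # the benchmark answer survived all edits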
|
| Disclaimer: I work in Medical AI and co-founded the AI Health
| Institute (https://aihealthinstitute.org/).
| onlyrealcuzzo wrote:
| It's almost as if you'd want to not feed what the patient says
| directly to an LLM.
|
| A non-trivial part of what doctors do is charting - where they
| strip out all the unimportant stuff you tell them unrelated to
| what they're currently trying to diagnose / treat, so that
| there's a clear and concise record.
|
| You'd want to have a charting stage before you send the patient
| input to the LLM.
|
| It's probably not important whether the patient is low income
| or high income or whether they live in the hood or the uppity
| part of town.
| nradov wrote:
| I generally agree, however socioeconomic and environmental
| factors are highly correlated with certain medical conditions
| (social determinants of health). In some cases even
| causative. For example, patients who live near an oil
| refinery are more likely to have certain cancers or lung
| diseases.
|
| https://doi.org/10.1093/jncics/pkaa088
| onlyrealcuzzo wrote:
| So that's the important part, not that they're low income.
| thereisnospork wrote:
| Sure, but correlation is correlation. Ergo 'low income',
| as well as effects or causes of being 'low income', are
| valid diagnostic indicators.
| dekhn wrote:
| Studies like that, no matter how careful, cannot say
| anything about causation.
| dap wrote:
| > It's almost as if you'd want to not feed what the patient
| says directly to an LLM.
|
| > A non-trivial part of what doctors do is charting - where
| they strip out all the unimportant stuff you tell them
| unrelated to what they're currently trying to diagnose /
| treat, so that there's a clear and concise record.
|
| I think the hard part of medicine -- the part that requires
| years of school and more years of practical experience -- is
| figuring out which observations are likely to be relevant,
| which aren't, and what they all might mean. Maybe it's useful
| to have a tool that can aid in navigating the differential
| diagnosis decision tree but if it requires that a person has
| already distilled the data down to what's relevant, that
| seems like the relatively easy part?
| onlyrealcuzzo wrote:
| Yes - theoretically, some form of ML/AI should be very good
| at charting the relevant parts, prompting the doctor for
| follow-up questions & tests that would be good to know to
| rule out certain conditions.
|
| The harder problem would be getting the actual diagnosis
| right, not filtering out irrelevant details.
|
| But it will be an important step if you're using an LLM for
| the diagnosis.
| airstrike wrote:
| By the way, the show The Pitt currently on Max touches on
| some of this stuff with a great deal of accuracy (I'm told)
| and equal amounts of empathy. It's quite good.
| cheschire wrote:
| Can't the same be said for humans though? Not to be too
| reductive, but aren't most general practitioners just pattern
| recognition machines?
| daemonologist wrote:
| I'm sure humans can make similar errors, but we're definitely
| less suggestible than current language models. For example,
| if you tell a chat-tuned LLM it's incorrect, it will almost
| _always_ respond with something like "I'm sorry, you're
| right..." A human would be much more likely to push back if
| they're confident.
| AnimalMuppet wrote:
| Unfortunately, humans talking to a doctor give lots of
| additional, misleading hints...
| echoangle wrote:
| > a sibling with alpha-thalassemia
|
| I have no clue what that is or why it shouldn't change the
| diagnosis, but it seems to be a genetic thing. Is the problem
| that this has nothing to do with the described symptoms?
| Because surely, a sibling having a genetic disease would be
| relevant if the disease could be a cause of the symptoms?
| kulahan wrote:
| In medicine, if it walks like a horse and talks like a horse,
| it's a horse. You don't start looking into the health of
| relatives when your patient tells the full story on their
| own.
|
| Sickle cell anemia is common among African Americans (if you
| don't have the full-blown version, the genes can assist with
| resisting one of the common mosquito-borne diseases found in
| Africa, which is why it developed in the first place I
| believe).
|
| So, we have a patient in the primary risk group presenting
| with symptoms that match well with SCA. You treat that now,
| unless you have a specific reason not to.
|
| Sometimes you have a list of 10-ish diseases in order of
| descending likelihood, and the only way to rule one out is
| by seeing no results from the treatment.
|
| Edit: and it's probably worth mentioning no patient ever
| gives ONLY relevant info. Every human barrages you with all
| the things hurting that may or may not be related. A doctor's
| specific job in that situation is to filter out useless info.
| orr94 wrote:
| "AIs want the future to be like the past, and AIs make the future
| like the past. If the training data is full of human bias, then
| the predictions will also be full of human bias, and then the
| outcomes will be full of human bias, and when those outcomes are
| copraphagically fed back into the training data, you get new,
| highly concentrated human/machine bias."
|
| https://pluralistic.net/2025/03/18/asbestos-in-the-walls/#go...
| mhuffman wrote:
| "The model used in the new study, called CheXzero, was
| developed in 2022 by a team at Stanford University using a data
| set of almost 400,000 chest x-rays of people from Boston with
| conditions such as pulmonary edema, an accumulation of fluids
| in the lungs. Researchers fed their model the x-ray images
| without any of the associated radiologist reports, which
| contained information about diagnoses. "
|
| ... very interesting that the inputs to the model had nothing
| related to race or gender, but somehow it still was able to
| miss diagnoses in Black and female patients? I am curious about
| the mechanism for this. Can it just tell which x-rays belong to
| Black or female patients and then use some latent racism or
| misogyny to change the diagnosis? I do remember when it came
| out that AI could predict race from medical images with no
| other information[1], so that part seems possible. But where
| would it get the idea to do a worse diagnosis, even if it
| determines this? Surely there is no medical literature that
| recommends this!
|
| [1]https://news.mit.edu/2022/artificial-intelligence-
| predicts-p...
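|
| One way researchers test the first part of that question is a
| "probe": train a simple classifier to predict the attribute from
| the model's image features alone. A sketch with synthetic data
| (not the cited study's code):
|
|   import numpy as np
|   from sklearn.linear_model import LogisticRegression
|   from sklearn.model_selection import train_test_split
|
|   rng = np.random.default_rng(0)
|   n = 1000
|   attribute = rng.integers(0, 2, size=n)  # toy demographic label
|   embeddings = rng.normal(size=(n, 64))
|   embeddings[:, 0] += 1.5 * attribute     # leak it into one feature
|
|   X_tr, X_te, y_tr, y_te = train_test_split(
|       embeddings, attribute, random_state=0)
|   probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
|   # Accuracy well above 0.5 means the attribute is recoverable
|   # from the features, even though it was never an explicit input.
|   print(probe.score(X_te, y_te))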
| protonbob wrote:
| I'm going to wager an uneducated guess. Black people are less
| likely to go to the doctor for both economic and historical
| reasons so images from them are going to be underrepresented.
| So in some way I guess you could say that yes, latent racism
| caused people to go to the doctor less which made them appear
| less in the data.
| apical_dendrite wrote:
| Where the data comes from also matters. Data is collected
| based on what's available to the researcher. Data from a
| particular city or time period may have a very different
| distribution than the general population.
| encipriano wrote:
| Aren't black people like 10% of the US population? You don't
| have to look further.
| daveguy wrote:
| You really just have to understand one thing: AI is not
| intelligent. It's pattern matching without wisdom. If fewer
| people in the dataset are a particular race or gender it will
| do a shittier job predicting and won't even "understand" why
| or that it has bias, because it doesn't understand anything
| at a human level or even a dog level. At least most humans
| can learn their biases.
| bilbo0s wrote:
| Isn't it kind of clear that it would have to be that the data
| they chose was influenced somehow by bias?
|
| Machines don't spontaneously do this stuff. But the humans
| that train the machines definitely do it all the time. Mostly
| without even thinking about it.
|
| I'm positive the issue is in the data selection and vetting.
| I would have been shocked if it was anything else.
| h2zizzle wrote:
| Non-technical suggestion: if AI represents an aspect of the
| collective unconscious, as it were, then a racist society
| would produce latently racist training data that manifests in
| racist output, without anyone at any step being overtly
| racist. Same as an image model having a preference for red
| apples (even though there are many colors of apple, and even
| red ones are not uniformly cherry red).
|
| The training data has a preponderance of examples where
| doctors missed a clear diagnosis because of their unconscious
| bias? Then this outcome would be unsurprising.
|
| An interesting test would be to see if a similar issue pops
| up for obese patients. A common complaint, IIUC, is that
| doctors will chalk up a complaint to their obesity rather
| than investigating further for a more specific (perhaps
| pathological) cause.
| FanaHOVA wrote:
| The non-tinfoil hat approach is to simply Google "Boston
| demographics", and think of how training data distribution
| impacts model performance.
|
| > The data set used to train CheXzero included more men, more
| people between 40 and 80 years old, and more white patients,
| which Yang says underscores the need for larger, more diverse
| data sets.
|
| I'm not a doctor so I cannot tell you how x-rays differ across
| genders / ethnicities, but these models aren't magic
| (especially computer vision ones, which are usually much
| smaller). If there are meaningful differences and they don't
| see those specific cases in training data, they will always
| fail to recognize them at inference.
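|
| One concrete way to surface that failure mode is to report the
| false-negative ("missed disease") rate per subgroup instead of a
| single aggregate score. A sketch with made-up arrays, analogous
| to (but not) the study's metric:
|
|   import numpy as np
|
|   y_true = np.array([1, 1, 1, 1, 0, 1, 1, 0, 1, 1])  # 1 = disease
|   y_pred = np.array([1, 0, 1, 1, 0, 0, 0, 0, 1, 1])  # model output
|   group = np.array(["white"] * 5 + ["black"] * 5)
|
|   for g in np.unique(group):
|       sick = (group == g) & (y_true == 1)  # diseased patients in g
|       fnr = np.mean(y_pred[sick] == 0)     # fraction the model missed
|       print(g, round(float(fnr), 2))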
| cratermoon wrote:
| > Can it just tell which x-rays belong to Black or female
| patients and then use some latent racism or misogyny to
| change the diagnosis?
|
| The opposite. The dataset is for the standard model "white
| male", and the diagnoses generated pattern-matched on that.
| Because there's no gender or racial information, the model
| produced the statistically most likely result for white male,
| a result less likely to be correct for a patient that doesn't
| fit the standard model.
| XorNot wrote:
| The better question is just "are you actually just
| selecting for symptom occurrence by socioeconomic group?"
|
| Like you could modify the question to ask "is the model
| better at diagnosing people who went to a certain school?"
| and simplistically the answer would likely seem to be yes.
| ideamotor wrote:
| I really can't help but think of the simulation hypothesis.
| What are the chances this copy-cat technology was developed
| when I was alive, given that it keeps going?
| kcorbitt wrote:
| We may be in a simulation, but your odds of being alive to
| see this (conditioned on being born as a human at some point)
| aren't _that_ low. Around 7% of all humans ever born are
| alive today!
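|
| The arithmetic roughly checks out if you take the commonly cited
| estimate of ~117 billion humans ever born (see the PRB link
| downthread):
|
|   alive_now = 8.2e9
|   ever_born = 117e9
|   print(alive_now / ever_born)  # ~0.07, i.e. about 7%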
| encipriano wrote:
| I don't believe that percentage. Especially considering how
| spread out the Homo branch already was more than 100,000 years
| ago. And from which point do you start counting? Homo
| erectus?
| bobthepanda wrote:
| I would imagine this is probably the source, which
| benchmarks using the last 200,000 years.
| https://www.prb.org/articles/how-many-people-have-ever-
| lived...
|
| Given that we only hit the first billion people in 1804
| and the second billion in 1927 it's not all that
| shocking.
| XorNot wrote:
| That argument works both ways; it might be significantly
| higher depending on how you count.
|
| But this is also just the non-intuitiveness of
| exponential growth, which is only now tapering off.
| jfengel wrote:
| It kinda doesn't matter where you start counting.
| Exponential curves put almost everything at the end.
| Adding to the left side doesn't change it much.
|
| You could go back to Lucy and add only a few million.
| Compared to the billions at this specific instant, it
| just doesn't make a difference.
| ToValueFunfetti wrote:
| In order to address the chances of a human being alive to
| witness the creation of this tech, you'd have to factor in
| the humans who have yet to be born. If you're a doomer, 7%
| is probably still fine. If we just maintain the current
| population for another century, it'll be much lower.
| bko wrote:
| Suppose you have a system that saves 90% of lives on group A
| but only 80% of lives in group B.
|
| This is due to the fact that you have considerably more
| training data on group A.
|
| You cannot release this life saving technology because it has a
| 'disparate impact' on group B relative to group A.
|
| So the obvious thing to do is to have the technology
| intentionally kill ~1 out of every 10 patients from group A so
| the efficacy rate is ~80% for both groups. Problem solved
|
| From the article:
|
| > "What is clear is that it's going to be really difficult to
| mitigate these biases," says Judy Gichoya, an interventional
| radiologist and informatician at Emory University who was not
| involved in the study. Instead, she advocates for smaller, but
| more diverse data sets that test these AI models to identify
| their flaws and correct them on a small scale first. Even so,
| "Humans have to be in the loop," she says. "AI can't be left on
| its own."
|
| Quiz: What impact would smaller data sets have on efficacy for
| group A? How about group B? Explain your reasoning
| janice1999 wrote:
| > You cannot release this life saving technology because it
| has a 'disparate impact' on group B relative to group A.
|
| Who is preventing you in this imagined scenario?
|
| There are drugs that are more effective on certain groups of
| people than others. BiDil, for example, is an FDA approved
| drug marketed to a single racial-ethnic group, African
| Americans, in the treatment of congestive heart failure. As
| long as the risks are understood there can be accommodations
| made ("this AI tool is for males only" etc). However such
| limitations and restrictions are rarely mentioned or
| understood by AI hype people.
| bko wrote:
| What does this have to do with FDA or drugs? Re-read the
| comment I was replying to. It's complaining that a
| technology could serve one group of people better than
| another, and I would argue that this should not be our
| goal.
|
| A technology should be judged by "does it provide value to
| any group or harm any other group". But endlessly dividing
| people into groups and saying how everything is unfair
| because it benefits group A over group B due to the nature
| of the problem, just results in endless hand-wringing and
| conservatism and delays useful technology from being
| released due to the fear of mean headlines like this.
| bilbo0s wrote:
| No. That's not how it works.
|
| It's contraindication. So you're in a race to the bottom in a
| busy hospital or clinic. Where people throw group A in a line
| to look at what the AI says, and doctors and nurses actually
| look at people in group B. Because you're trying to move
| patients through the enterprise.
|
| The AI is never even given a chance to fail group B. But now
| you've got another problem with the optics.
| potsandpans wrote:
| Imagine if you had a strawman so full of straw, it was the
| most strawfilled man that ever existed.
| bko wrote:
| From the article:
|
| > "What is clear is that it's going to be really difficult
| to mitigate these biases," says Judy Gichoya, an
| interventional radiologist and informatician at Emory
| University who was not involved in the study. Instead, she
| advocates for smaller, but more diverse data sets that test
| these AI models to identify their flaws and correct them on
| a small scale first. Even so, "Humans have to be in the
| loop," she says. "AI can't be left on its own."
|
| What do you think smaller data sets would do to a model?
| It'll get rid of disparity, sure.
| milesrout wrote:
| It is a hypothetical example not a strawman.
| JumpCrisscross wrote:
| > _You cannot release this life saving technology because it
| has a 'disparate impact' on group B relative to group A_
|
| I think the point is you need to let group B know this tech
| works less well on them.
| timewizard wrote:
| LLMs don't and cannot want things. Human beings also like it
| when the future is mostly like the past. They just call that
| "predictability."
|
| Human data is bias. You literally cannot remove one from the
| other.
|
| There are some people who want to erase humanity's will and
| replace it with an anthropomorphized algorithm. These people
| concern me.
| balamatom wrote:
| The most concerning people are -- as ever -- those who only
| think that they are thinking. Those who keep trying to fit
| square pegs into triangular holes without, you know, stopping
| to reflect: _who_ gave them those pegs in the first place,
| and to what end?
|
| Why be obtuse? There is no "anthropomorphic fallacy" here to
| dispel. You know very well that "LLMs want" is simply a way
| of speaking about _teleology_ without antagonizing people who
| are taught that they should be afraid of _precise notions_
| ("big words"). But accepting _that_ bias can lead to some
| pretty funny conflations.
|
| For example, humanity as a whole doesn't have this "will" you
| speak of any more than LLMs can "want"; _will is an aspect of
| the consciousness of the individual_. So you seem to be
| uncritically anthropomorphizing social processes!
|
| If we assume those to be chaotic, in that sense any sort of
| algorithm is _slightly more_ anthropomorphic: at least it
| works towards a human-given and therefore human-
| comprehensible purpose -- on the other hand, whether there is
| some particular "destination of history" towards which
| humanity is moving, is a question that can only ever be
| speculated upon, but not definitively perceived.
| verisimi wrote:
| > If we assume those to be chaotic, in that sense any sort
| of algorithm is slightly more anthropomorphic: at least it
| works towards a human-given and therefore human-
| comprehensible purpose -- on the other hand, whether there
| is some particular "destination of history" towards which
| humanity is moving, is a question that can only ever be
| speculated upon, but not definitively perceived.
|
| Do you not think that if you anthropomorphise things that
| aren't actually anthropic, you then insert a bias
| towards those things? The bias will actually discriminate
| at the expense of people.
|
| If that is so, the destination of history will inevitably
| be misanthropic.
|
| Misplaced anthropomorphism is a genuine, present concern.
| sapphicsnail wrote:
| Humans anthropomorphize all sorts of things, but there are way
| bigger consequences for treating current AI like a human
| than for someone anthropomorphizing their dog.
|
| I know plenty of people that believe LLMs think and reason
| the same way as humans do and it leads them to make bad
| choices. I'm really careful about the language I use around
| such people because we understand expressions like, "the AI
| thought this" very differently.
| itishappy wrote:
| Can humans want things? Our reward structures sure seem
| aligned in a manner that encourages anthropomorphization.
|
| Biases are symptoms of imperfect data, but that's hardly a
| human-specific problem.
| MountainArras wrote:
| The dataset they used to train the model is chest x-rays of
| known diseases. I'm having trouble understanding how that's
| relevant here. The key takeaway is that you can't treat all
| humans as a single group in this context, and variations in the
| biology across different groups of people may need to be taken
| into account within the training process. In other words, the
| model will need to be trained on this racial/gender data too in
| order to get better results when predicting the targeted
| diseases within these groups.
|
| I think it's interesting to think about attaching genetic
| information instead of group data, which would be blind
| to human bias and the messiness of our rough categorizations of
| subgroups.
| pelorat wrote:
| I think the model needs to be taught about human anatomy,
| not just fed a bunch of scans. It needs to understand what
| ribs and organs are.
| ericmcer wrote:
| I don't think LLMs can achieve "understanding" in that
| sense.
| nomel wrote:
| These aren't LLMs. Most of the neat things in science
| involving AI aren't LLMs. Next-word prediction has
| extremely limited use with non-text data.
| thaumasiotes wrote:
| People seem to have started to use "LLM" to refer to any
| suite of software that includes an LLM somewhere within
| it; you can see them talking about LLM-generated art, for
| example.
| hnlmorg wrote:
| Was it ascii art? ;)
| thaumasiotes wrote:
| https://hamatti.org/posts/art-forgery-llms-and-why-it-
| feels-...
|
| People will just believe whatever they hear.
| satvikpendem wrote:
| Computer vision models are not large language models; LLM
| does not mean generative AI or even AI in general; it
| is a specific initialism.
| bko wrote:
| Providing this messy, rough categorization apparently helped
| in some cases. From the article:
|
| > To force CheXzero to avoid shortcuts and therefore try to
| mitigate this bias, the team repeated the experiment but
| deliberately gave the race, sex, or age of patients to the
| model together with the images. The model's rate of "missed"
| diagnoses decreased by half--but only for some conditions.
|
| In the end though I think you're right and we're just at the
| phase of hand-coding attributes. The bitter lesson always
| prevails.
|
| https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson.
| ..
| thaumasiotes wrote:
| > Also important was the use [in Go] of learning by self
| play to learn a value function
|
| I thought the self-play _was_ the value function that made
| progress in Go. That is, it wasn 't the case that we played
| through a lot of games and used that data to create a
| function that would assign a value to a Go board. Instead,
| the function to assign a value to a Go board would do some
| self-play on the board and assign value based on the
| outcome.
| ruytlm wrote:
| It disappoints me how easily we are collectively falling for
| what effectively is "Oh, our model is biased, but the only
| way to fix it is that everyone needs to give us all their
| data, so that we can eliminate that bias. If you think the
| model shouldn't be biased, you're morally obligated to give
| us everything you have for free. Oh but then we'll charge you
| for the outputs."
|
| How convenient.
|
| It's increasingly looking like the AI business model is "rent
| extracting middleman", just like the Elseviers et al of the
| academic publishing world - wedging themselves into a
| position where they get to take everything for free, but
| charge others at every opportunity.
| multjoy wrote:
| The key takeaway from the article is that the race etc. of
| the subjects wasn't disclosed to the AI, yet it was able to
| predict it with 80% accuracy while the human experts managed
| 50%, suggesting that there was _something else_ encoded in the
| imagery that the AI was picking up on.
| mjevans wrote:
| The AI might just have better subjective/analytical
| weighting criteria. Humans are likely more willing to see
| what they expect to see (and to not see what they don't
| expect).
| genocidicbunny wrote:
| One of the things that people I know in the medical field
| have mentioned is that there's racial and gender bias that
| goes through all levels and has a sort of feedback loop. A
| lot of medical knowledge is gained empirically, and
| historically that has meant that minorities and women tended
| to be underrepresented in western medical literature. That
| leads to new medical practitioners being less exposed to
| presentations of various ailments that may have variance due
| to gender or ethnicity. Basically, if most data is gathered
| from those who have the most access to medicine, there will
| be an inherent bias towards how various ailments present in
| those populations. So your base data set might be skewed from
| the very beginning.
|
| (This is mostly just to offer some food for thought, I
| haven't read the article in full so I don't want to comment
| on it specifically.)
| dartos wrote:
| > The dataset they used to train the model are chest xrays of
| known diseases. I'm having trouble understanding how that's
| relevant here.
|
| For example, if you include no (or few enough) black women in
| the dataset of x-rays, the model may very well miss signs of
| disease in black women.
|
| The biases and mistakes of those who created the data set
| leak into the model.
|
| Early image recognition models had some very... culturally
| insensitive classes baked in.
| niyyou wrote:
| As Sara Hooker discussed in her paper https://www.cell.com/patt
| erns/fulltext/S2666-3899(21)00061-1..., bias goes way beyond
| data.
| jhanschoo wrote:
| I like how the author used neo-Greek words to sneak in graphic
| imagery that would normally be taboo in this register of
| writing
| MonkeyClub wrote:
| I dislike how they misspelled it though.
| nonethewiser wrote:
| Race and gender should be inputs then.
|
| The female part is actually a bit more surprising. It's easy to
| imagine a dataset not skewed towards black people: ~15% of the
| population in North America, probably less in Europe, and way
| less in Asia. But female? That's roughly half the population
| globally.
| krapp wrote:
| Modern medicine has long operated under the assumption that
| whatever makes sense in a male body also makes sense in a
| female body, and women's health concerns were often dismissed,
| misdiagnosed, or misunderstood in a patriarchal society. Women
| were rarely even included in medical trials prior to 1993. As a
| result, there is simply a dearth of medical research directly
| relevant to women for models to even train on.
| Avshalom wrote:
| https://www.npr.org/2022/11/01/1133375223/the-first-
| female-c... Twenty Twenty Two!
| Freak_NL wrote:
| Surprising? That's not a new realisation. It's a well known
| fact that women are affected by this in medicine. You can do a
| cursory search for the gender gap in medicine and get an
| endless amount of reporting on that topic.
| nonethewiser wrote:
| That just makes it more surprising.
| appleorchard46 wrote:
| I learned about this recently! It's wild how big the
| difference is. Even though legal/practical barriers to gender
| equality in medicine and data collection have been virtually
| nonexistent for the past few decades, the inertia from the
| decades before that (where women were often specifically
| excluded, among many other factors) still weighs heavily.
|
| To any women who happen to be reading this: if you can,
| please help fix this! Participate in studies, share your data
| when appropriate. If you see how a process can be improved to
| be more inclusive then please let it be known. Any
| (reasonable) male knows this is an issue and wants to see it
| fixed but it's not clear what should be done.
| andsoitis wrote:
| > Its easy to imagine a dataset not skewed towards black
| people. ~15% of the population in North America, probably less
| in Europe, and way less in Asia.
|
| What about Africa?
| rafaelmn wrote:
| How much medical data/papers do you think they generate in
| comparison to these three?
| appleorchard46 wrote:
| That's not where most of the data is coming from. If it was
| we'd be seeing the opposite effect, presumably.
| jsemrau wrote:
| I suppose that's the problem I have with that study.
| nonethewiser wrote:
| The story is that there exists this model which predicts
| poorly for black (and female) patients. Given that there are
| probably lots of datasets where black people are a vast
| minority, this is not surprising.
|
| For all I know there are millions of models with extremely
| poor accuracy based on African datasets. That wouldn't really
| change anything about the above, though. I wouldn't expect it,
| but it would definitely be interesting.
| orand wrote:
| Race and sex should be inputs. Giving any medical prominence to
| gender identity will result in people receiving wrong and
| potentially harmful treatment, or lack of treatment.
| lalaithion wrote:
| Most trans people have undergone gender affirming medical
| care. A trans man who has had a hysterectomy and is on
| testosterone will have a very different medical baseline than
| a cis woman. A trans woman who has had an orchiectomy and is
| on estrogen will have a very different medical baseline than
| a cis man. It is literally throwing out relevant medical
| information to attempt to ignore this.
| nonethewiser wrote:
| How is that in any way in conflict with what he said?
| You're just making an argument for more inputs.
|
| Biological sex, hormone levels, etc.
| matthewmacleod wrote:
| The GP literally said "giving any medical prominence to
| gender identity will result in people receiving wrong and
| potentially harmful treatment" which is categorically
| false for the reasons the comment you replied to
| outlined.
|
| Sex assigned at birth is in many situations important
| medical information; the vast majority of trans people
| are very conscious of their health in this sense and
| happy to share that with their doctor.
| nonethewiser wrote:
| >Sex assigned at birth is in many situations important
| medical information
|
| Which is not gender identity. As a result of being trans
| there may be things like hormone levels that are
| different than what you'd expect based on biological sex,
| which is why I say hormone levels are important, but how
| you identify is in fact irrelevant.
| matthewmacleod wrote:
| Well, this is clearly wrong - it's obvious, for example,
| that gender identity could have a significant impact on
| mental health.
|
| Regardless of that, you seem to agree that:
|
| - Sex assigned at birth is important medical information
|
| - Information about gender affirming treatments is
| important medical information
|
| So I don't think there's much to worry about there.
| jl6 wrote:
| The problem is that over the past few decades there has
| been substantial conflation of sex and gender, with many
| information systems _replacing_ the former with the
| latter, rather than _augmenting_ data collection with the
| latter.
| connicpu wrote:
| I think it's pretty clear to see how discrimination is
| the cause of that. Why would you volunteer information
| that from your point of view is more likely to cause a
| negative interaction than not?
| skyyler wrote:
| >why I say hormone levels are important, but how you
| identify is in fact irrelevant
|
| I don't understand what your issue with it is, it's just
| another point of data.
|
| I don't want to be treated like a cis woman in a medical
| context, but I sure do want to be treated like a trans
| woman.
| consteval wrote:
| > hormone levels, etc.
|
| Right... their gender they identify as.
|
| So sex, and then also the gender they identify as.
|
| You can't hide behind an "etc". Expand that out and the
| conclusion is you really do need to know who is trans and
| who is cisgender when doing treatment.
| root_axis wrote:
| Seems like adding in gender only makes things less clear.
| The relevant information is sex and a medical history of
| specific surgeries and medications - the type of thing your
| doctor should already be aware of. Adding in gender only
| creates ambiguity because there's no way to measure gender
| from a biological perspective.
| LadyCailin wrote:
| That's mostly correct, that "gender identity" doesn't matter
| for physical medicine. But hormone levels and actual internal
| organ sets matter a huge amount, more than genes or original
| genitalia, in general. There are of course genetically linked
| diseases, but there are people with XX chromosomes that are
| born with a penis, and XY people that are born with a vulva,
| and genetically linked diseases don't care about external
| genitalia either way.
|
| You simply can't reduce it to birth sex assignment and that's
| it, if you do, you will, as you say, end up with wrong and
| potentially harmful treatment, or lack of treatment.
| nonethewiser wrote:
| >But hormone levels and actual internal organ sets matter a
| huge amount, more than genes or original genitalia
|
| Or current genitalia for that matter. It's just a matter of
| the genitalia signifying other biological realities for
| 99.9% of people. For sure more info like average hormone
| levels or ranges over time would be more helpful.
| connicpu wrote:
| Actually both are important inputs, especially when someone
| has been taking hormones for a very long time. The human body
| changes greatly. Growing breast tissue increases the
| likelihood of breast cancer, for example, compared to if you
| had never taken it (but about the same as if estradiol had
| been present during your initial puberty).
| XorNot wrote:
| Why not socioeconomic status or place of residence? Knowing
| mean yearly income will absolutely help an AI figure out
| statistically likely health outcomes.
| jimnotgym wrote:
| Just as good as a real doctor then?
| josefritzishere wrote:
| AI does a terrible job huh? I wish there was a way we have done
| this for decades without that problem...
| LadyCailin wrote:
| Good thing we got rid of DEI.
| kjkjadksj wrote:
| This isn't an AI problem but a general medical field problem. It
| is a big issue with basically any population centric analysis
| where the people involved in the study don't have a perfect
| subset of the world's population to model human health; they have
| a couple hundred blood samples from patients at a Boise hospital
| over the past 10 years perhaps. And they validate this population
| against some other available cohort that is similarly constrained
| by what is practically possible to sample and catalog and might
| not even see the same markers shake out between disease and
| healthy.
|
| There are a couple populations that are really overrepresented as
| a result of these available datasets. Utah populations on one
| hand because they are genetically bottlenecked and therefore have
| better signal to noise in theory. And on the other the Yoruba
| tribe out of West Africa as a model of the most diverse and
| ancestral population of humans for studies that concern
| themselves with how populations evolved perhaps.
|
| There are other projects amassing population data too. About
| two-thirds of the population of Iceland has been sequenced, and
| this dataset is also frequently used.
| cratermoon wrote:
| It's a generative AI LLM hype issue because it follows the
| confidence game playbook. Feed someone correct ideas and
| answers that fit their biases until they trust you, then when
| the time is right, suggest things that fit their biases but
| give _incorrect_ (and exploitative) results.
| antipaul wrote:
| When was AI supposed to replace radiologists? Was it 7 years ago
| or something?
| bilbo0s wrote:
| Nah.
|
| It was more like one year away.
|
| But one year away for the past 7 years.
| dekhn wrote:
| Nearly all radiology practice has integrated AI to some degree
| or another at this point.
| chadd wrote:
| A surprisingly high number of medical studies will not include
| women because the study doesn't want to account for "outliers"
| like pregnancy and menstrual cycles[0]. This is bound to have
| effects on LLM answers for women.
|
| [0] https://www.northwell.edu/katz-institute-for-womens-
| health/a...
| nottorp wrote:
| > as well in those 40 years or younger
|
| Are we sure it's only about racial bias then?
|
| Looks to me like the training data set is too small overall. They
| had too few black people, too few women, but also too few younger
| people.
| xboxnolifes wrote:
| It's the same old story that's been occurring for
| years/decades. Bad data in, bad data out.
| sxp wrote:
| https://www.science.org/doi/10.1126/sciadv.adq0305 is the paper
| and
| https://www.science.org/cms/10.1126/sciadv.adq0305/asset/b68...
| is the key graph.
| christkv wrote:
| Ran the paper through o3-mini-high and got the following.
| Obviously not going to post the whole answer (too long). Run the
| prompt if you want to look at it.
|
| Seems like a reasonable analysis having read the paper myself.
|
| Prompt
|
| please analyse the passed in paper for any logical faults in
| results and conclusions.
| https://www.science.org/doi/10.1126/sciadv.adq0305
|
| 5. Overall Assessment
|
| The paper is robust in its experimental setup and highlights a
| crucial issue in deploying AI in clinical settings. However, the
| following logical issues merit further attention:
|
| Overgeneralization: Extrapolating findings from two models and
| specific datasets to the broader class of vision-language
| foundation models might be premature.
|
| Causal Attribution: The paper's conclusion that model-encoded
| demographic information leads to higher bias, while plausible, is
| not definitively proven through causal analysis.
|
| Comparative Baseline: The method of averaging radiologist
| responses may mask underlying variability, potentially leading to
| an overestimation of the model's relative bias.
|
| Statistical Extremes: Extremely low p-values should be
| interpreted with caution, as they may reflect large sample sizes
| rather than clinically meaningful differences.
|
| In summary, while the study is valuable and well-constructed in
| many respects, it would benefit from a more cautious
| interpretation of its findings, a broader evaluation of different
| models, and a more thorough exploration of potential confounders
| and mitigation strategies.
|
| Final Thoughts
|
| The paper offers significant insights into bias in
| medical AI; however, its conclusions should be understood in the
| context of the study's limitations, particularly in terms of
| model selection, dataset representativeness, and the inference of
| causality from correlational data.
|
| Please let me know if you need further elaboration or have
| specific aspects of the paper you'd like to discuss further.
| zeagle wrote:
| Cool topic! This isn't surprising given the AI models would be
| trained such that existing medical practices, biases, and
| failures would propagate through them as others have said here.
|
| There is a published, recognized bias against women and blacks
| (borrowing the literature term) specifically in medicine when it
| comes to pain assessment and treatment. Racism is a part of it,
| but on its own that's too simplistic. Most of us don't go to work
| trying to be
| horrible people. I was in a fly-in community earlier this week
| for work where 80% of housing is subsidized social housing... so
| spitballing a bit... things like assumptions about rates of
| metabolizing medications being equal, access to medication,
| culture and stoicism, dismissing concerns, and the broad effects
| of poverty/trauma/inter-generational trauma all must play a role
| in this.
|
| For interest:
|
| https://jamanetwork.com/journals/jamanetworkopen/fullarticle...
|
| Overall, the authors found comparable ratings in Black and White
| participants' perceptions of the patient-physician relationship
| across all three measures (...) Alternatively, the authors found
| significant racial differences in the pain-related outcomes,
| including higher pain intensity and greater back-related
| disability among Black participants compared with White
| participants (intensity mean: 7.1 vs 5.8; P < .001; disability
| mean: 15.8 vs 14.1; P < .001). The quality of the patient-
| physician relationship did not explain the association between
| participant race and the pain outcomes in the mediation analysis.
|
| https://www.aamc.org/news/how-we-fail-black-patients-pain
|
| (top line summary) Half of white medical trainees believe such
| myths as black people have thicker skin or less sensitive nerve
| endings than white people. An expert looks at how false notions
| and hidden biases fuel inadequate treatment of minorities' pain.
|
| And
| https://www.washingtonpost.com/wellness/interactive/2022/wom...
| tennisflyi wrote:
| Yes. Almost certain there are books dedicated to identifying how
| diseases present differently on skin other than white.
| jdthedisciple wrote:
| Anyone who thinks that the primary culprit for this is anything
| other than the input data distribution (and metadata inputs or
| the lack thereof) lacks even the most basic understanding of AI.
| bbarnett wrote:
| I remember a male and female specialist, whatever their
| discipline, holding a media scrum a decade ago.
|
| They pleaded for people to understand that men and women are
| physically different, including the brain, its neurological
| structure, and that this was in modern medicine being overlooked
| for political reasons.
|
| One of the results was that many clinical trials and studies were
| populated by males only. The theory being that they are less risk
| averse, and as "there is no difference", then who cares?
|
| Well these two cared, and said that it was hurting medical
| outcomes for women.
|
| I wonder if this AI issue is a result of that. Fewer examples of
| female bodies and brains, and fewer studies and trials, mean less
| data to match on...
|
| https://news.harvard.edu/gazette/story/2007/07/sex-differenc...
| mg794613 wrote:
| That's bad! Let's change that! Let's be better than our
| predecessors! Right?
|
| So, how do they suggest tackling the problem?
|
| 1. Improve the science
|
| 2. Update the data
|
| or
|
| 3. Somehow focus on it being racist and then walk away like
| the hero of the day without actually solving the problem.
___________________________________________________________________
(page generated 2025-03-27 23:00 UTC)