[HN Gopher] AI models that predict disease are not as accurate a...
___________________________________________________________________
AI models that predict disease are not as accurate as reports might
suggest
Author : arkj
Score : 118 points
Date : 2022-10-21 18:07 UTC (4 hours ago)
(HTM) web link (www.scientificamerican.com)
(TXT) w3m dump (www.scientificamerican.com)
| deltree7 wrote:
| Yet.
|
| One thing the media consistently gets wrong is the rate of
| innovation that is happening. The media also doesn't have
| access to state-of-the-art models, only to the half-baked
| versions that trigger-happy startups are too eager to
| release.
|
| It's akin to downloading Image Generation tools from the App
| Store and concluding that's state of the art
| bpodgursky wrote:
| It baffles me that people can watch the trendline of
|
| "Job X can be automated in 40 years" (5 years ago)
|
| "Job X can be automated in 10 years" (2 years ago)
|
| "Job X can be automated in 5 years" (1 week ago)
|
| And feel comfortable poking holes in the AI models, pointing
| out where they fail. Obviously? But nobody 3 years ago thought
| that graphic design or creative writing was on death row
| either.
|
| You have to spend a modicum of effort looking at how
| predictions have _evolved_ over the past couple of years, but
| once you do, it's very clear that mocking current AI systems
| makes you look like a clown.
| ckemere wrote:
| There's also the timeline that:
|
| "Radiology will be automatized in 5 years" (10 years ago)
| "Radiology will be automatized in 5 years" (5 years ago)
| "Radiology will be automatized in 5 years" (last year)
|
| or
|
| "Full self driving will arrive within 5 years" (5 years ago)
| "Full self driving is still a ways off" (last year)
|
| Assuming you're referring to generative models, I don't think
| that anyone (knowledgeable) thinks that graphic design or
| creative writing are on death's door. They might change with
| new tools, but skilled practitioners are still required.
| That's basically the point of the article.
| chatterhead wrote:
| We are 18 years on from the first DARPA Grand Challenge,
| where none of the vehicles finished.
|
| Do you think a self-driving car can make it from LA to NYC
| by itself now?
|
| What do you think 2040 AI will look like?
| lostlogin wrote:
| Having seen some of the automation available in radiology,
| I'm a bit baffled as to why I still have a job as an MRI
| tech.
|
| 5 years ago I watched automated cardiac MRI, and it worked
| well. I was told about a site that was having good results
| with fetal cardiac MRI via a related bit of software.
|
| These scans are hard to do, and the machines did well. In
| some cases they got confused and did a good functional
| analysis but of the stomach, not the heart. Oops, but
| easily fixed by almost anyone after a few minutes of
| explanation.
|
| Why are basic MSK scans still done by a tech with years of
| training?
|
| I don't know the answer to that as it's basic stuff and if
| I end my career without machines having taken over the
| basic stuff, I'll be a bit disappointed.
| esel2k wrote:
| But even if all these analyses were done automatically, I
| guess you won't be out of a job soon (good news, I guess),
| just a different one. I worked in the automation of the
| diagnostic lab, and what happened is that a detective-style
| job turned into running a factory: a 24-hour operation,
| turnaround times, and less and less qualified personnel to
| load the machines...
| dr_dshiv wrote:
| I'm shocked. SHOCKED.
| bitL wrote:
| OK, AI is bad, but compare it to human doctors/radiologists,
| who are often worse. I still remember stats from some X-ray
| detection task where AI diagnosed with 40% accuracy and the
| best human doctors with 38% accuracy (and median human
| doctors with 32% accuracy). Now what are we supposed to do?
| pcurve wrote:
| Can you cite the source? Is it not possible to improve on the
| AI's 40% rate? Obviously someone eventually figured out the
| answer with 100% accuracy.
| dmurray wrote:
| They might have "figured it out" by cutting the patient open.
| yrgulation wrote:
| Oh god (science, for some of us), this is the same kind of
| logic used to defend Tesla's FSD system: both crappy and
| dangerous, but with a cult-like following.
| srvmshr wrote:
| I worked on healthcare ML solutions, as part of my PhD & also
| as a consultant to a telemedicine company.
|
| My experience in dealing with the data (we had sufficient,
| and somewhat well labeled, data) & methods made me realize
| that a lot of the predictions human doctors make are
| multimodal - and that is something deep learning will
| struggle with for the time being. For example, in detecting a
| disease _X_, physicians factor in blood work, family history,
| imaging, racial genealogy, general symptoms (like hoarseness,
| gait, sweating etc.), sometimes even the texture & palpation
| of affected regions, before narrowing down on a set of
| assessments & making diagnostic decisions.
|
| If we just add more dimensions of data to the model, it just
| makes the search space sparser, not easier. Throwing in more
| data will likely just fit the more common patterns & classes
| well, whereas a large number of symptoms may be treated as
| outliers and mispredicted.
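|
| A toy illustration of that sparsity (my own sketch in numpy,
| nothing to do with any real clinical model): as you add
| feature dimensions, every record's nearest neighbour ends up
| barely closer than its farthest one, so "similar cases" stop
| being meaningfully similar.
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     n = 1000  # hypothetical records, features scaled to [0, 1]
|     for d in (2, 10, 100, 1000):
|         x = rng.random((n, d))
|         # distances from record 0 to every other record
|         dists = np.linalg.norm(x[1:] - x[0], axis=1)
|         # this ratio creeps towards 1 as d grows
|         print(d, dists.min() / dists.max())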
|
| We humans are incredibly good at eliminating factors &
| performing differential diagnosis. The findings don't
| surprise me. There is much more ground to cover. For
| straightforward conditions with limited, clear-cut symptoms
| the models are showing promising advances, but they cannot be
| trusted across wide arrays of diagnoses - especially when
| models don't know what 'they do not know'.
| dekhn wrote:
| Are you really sure the doctors are doing a better job when
| they go through the motions of incorporating a wide range of
| data? Or do we just convince ourselves they're better?
|
| I suspect we massively underestimate the amount of misdiagnosis
| due to incorrect analysis of data using fairly naive medical
| mental models of disease.
| majormajor wrote:
| My view on this is framed a bit differently but arrives at
| probably a similar ultimate perspective:
|
| I think it's probably going to be a long time before models
| only using quantifiable measurements can even meet the
| performance of top doctors. I can't recommend enough that
| someone experiencing issues doctor-shop if they haven't
| gotten a well-explained diagnosis from their current doctor.
|
| But I'm very curious how good one has to be in order to be
| better than a below-average doctor, or a 50th-percentile
| doctor, or a 75th...
|
| But I also think there may be weird failure modes similar to
| today's not-fully-self-driving cars along the lines of "if
| even the 75th-percentile-doctor uses the tool and sees an
| output that stops them from asking a question they otherwise
| might have, can it hurt things too?"
| srvmshr wrote:
| > But I'm very curious how good one has to be in order to
| be better than a below-average doctor, or a 50th-percentile
| doctor, or a 75th.
|
| In dermatology, which I was working on, models were better
| (at detecting skin cancers) than 52% of the GPs, going by
| just images. In a famous Nature paper by Esteva et al., the
| TPR was 74% for detecting melanomas. There is a catch which
| probably got underreported (the skin-cancer positivity rate
| was strongly correlated with clinical markings in the photos;
| the models didn't do quite as well when 'clean' holdout data
| were used).
|
| But the nature of the information in all these models was
| skin deep (pun intended). They were designed with a
| calibrated objective in place, unlike how we approach
| clinical diagnostics as open-ended problems for doctors.
| srvmshr wrote:
| > _Are you really sure the doctors are doing a better job
| when they go through the motions of incorporating a wide
| range of data? Or do we just convince ourselves they're
| better?_
|
| Personal story: I was diagnosed with a rare genetic disease
| in 2019. If I had run the symptoms through an ML gauntlet, I
| am sure they would have cancelled each other out or made
| little sense: chest CT (clean), fever (high), TB test
| (negative), latent TB marker (positive), vision difficulty
| (nothing unusual yet), edema in the eye socket (yes), WBC
| count (normal), tumors (none), hormones (normal) & retina
| images (severely abnormal).
|
| My condition was zeroed in on within 5 minutes of a visit to
| a top retina specialist, after regular ophthalmologists were
| stuck between two conflicting conditions. This was based on
| differential diagnosis, even though the genetic assay hadn't
| come back yet; it later came back in favor of the call. I
| cannot overemphasize how good the human brain is at recalling
| information & connecting sparse dots into logical
| conclusions.
|
| (I am one of the 0.003% unlucky ones among all
| ophthalmological cases & the only active patient with that
| affliction in one of the busiest hospitals in the country. My
| data is part of a 36-person NIH study, & ophthalmology
| residents are routinely called in to see me as a case study
| when I go in for quarterly follow-ups.)
| pixl97 wrote:
| How many specialists did you go to before it was
| identified?
|
| How many other people with the condition were
| misidentified?
|
| I only say this because of a family member with a rare
| genetic condition. For years they were told it was something
| else, or told 'it was in their head'. The family member
| started a detailed journal of their medical conditions and
| experiences, then brought it to their PCP, who sent them to a
| specialist; this specialist wasn't sure and sent them to
| another specialist who had a 3-month wait. After 5+ years of
| living with increasing severity of the condition, it was
| identified.
|
| So, just saying, it's just as likely that the condition was
| identified because you kept a detailed list (on paper or in
| your mind) of the ailments and presented them in a manner
| that helped with the final diagnosis.
| srvmshr wrote:
| > How many specialists did you go to before it was
| identified?
|
| 2 ophthalmologists, 1 internal medicine doctor, 1 retina
| super-specialist & finally someone from USC Davey
|
| > How many other people with the condition were
| misidentified?
|
| Historical data: I don't know. The condition is fairly
| divided between two types, one being zoonotic & the other
| tied to the IL2 gene. I am told this distinction of
| pathways was identified in 2007.
|
| > [..] you kept a detailed list (on paper or in your
| mind) of the ailments and presented them in a manner that
| helped with the final diagnosis.
|
| I might have been a better-informed patient, but I went
| with a complaint of pink eye, flu & mild light
| sensitivity. Never imagined that visit would change my
| life forever. Thank you though, for expressing your
| concern & support
| nostromo wrote:
| I'm confused by your comment, because these are exactly the
| types of problems that humans generally do a poor job of
| classifying.
| jeffreyrogers wrote:
| Most modern ML techniques do a poor job on these types of
| problems too unless they have a lot of data (hence the
| reference to sparsity) or assume structure that requires
| domain specific modeling to capture.
| fallingknife wrote:
| The system itself should be built around these capabilities,
| not the other way around. Instead of collecting data at
| regular intervals, we wait until symptoms appear to go to the
| doctor. This is why the dataset is so sparse.
| IdealeZahlen wrote:
| Exactly this. The features (or limitations) of medical data
| are inherent in the process of clinical practice, but this
| seems to be often overlooked.
| planetsprite wrote:
| The solution to failures of AI in healthcare is transparency of
| data. OpenAI's models work because they have virtually unlimited
| data to train on. The scale of training data for doctor bots is
| one millionth the size. Different countries, organizations,
| universities need to be as open as possible in sharing and
| collaborating, recognizing that improvements in medicine
| benefit all of humanity with almost no downsides.
| dirheist wrote:
| There should be a standardization committee tasked with
| standardizing the collection of anonymized, semi-synthetic
| medical data from hospitals/hospital networks. It seems like
| so much research is just locked up in the IMS systems the
| hospitals use for their patients and never sees the light of
| day.
| dekhn wrote:
| You cannot imagine just how deep the medical data rabbit hole
| goes.
|
| Plenty of institutions have already semi-standardized their
| collection and do multi-hospital (typically research
| hospital) aggregation. Whether this data is any good as
| training data for supervised or unsupervised algorithms is
| really questionable.
| tomrod wrote:
| Technical (honest) solution: two holdouts.
|
| 1. One involved in the build process
|
| 2. One never touched until the paper's metrics are being
| written, run only once (a sketch below)
|
| Realistically, however, this is unlikely to occur, due to the
| incentives causing publication bias.
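|
| A minimal sketch of the idea (sklearn names, synthetic data,
| illustrative only):
|
|     from sklearn.datasets import make_classification
|     from sklearn.linear_model import LogisticRegression
|     from sklearn.model_selection import train_test_split
|
|     X, y = make_classification(n_samples=5000, random_state=0)
|
|     # holdout 1: used freely while building the model
|     # holdout 2: locked away, scored exactly once for the paper
|     X_build, X_final, y_build, y_final = train_test_split(
|         X, y, test_size=0.2, random_state=0)
|     X_train, X_dev, y_train, y_dev = train_test_split(
|         X_build, y_build, test_size=0.25, random_state=0)
|
|     model = LogisticRegression(max_iter=1000)
|     model.fit(X_train, y_train)
|     print("dev (iterate on this):", model.score(X_dev, y_dev))
|     print("final (run once only):", model.score(X_final, y_final))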
| jerpint wrote:
| Third (better) option: have a regulating body keep a
| separate, undisclosed test set. If you can't beat it, you
| can't deploy your model. If you can beat it, you still need
| to have your model peer reviewed and scrutinized.
| tomrod wrote:
| This sounds simple, yet I expect data governance will be the
| bottleneck.
| cm2187 wrote:
| So the models that fail this one test never get published,
| and the models that succeed get published. And all you have
| done is publish a model that predicts that particular
| history - in other words, data fitting.
| chatterhead wrote:
| 'Brunelleschi had just the solution. To get around the issue, the
| contest contender proposed building two domes instead of one --
| one nested inside the other. "The inner dome was built with four
| horizontal stone and chain hoops which reinforced the octagonal
| dome and resisted the outward spreading force that is common to
| domes, eliminating the need for buttresses," Wildman says. "A
| fifth chain made of wood was utilized as well. This technique had
| never been utilized in dome construction before and to this day
| is still regarded as a remarkable engineering achievement."'
|
| Brunelleschi was not an engineer; he was a goldsmith. AI will
| advance in the same way architecture did during the
| Renaissance: by those with the winning ideas, not those with
| the right credentials.
|
| https://science.howstuffworks.com/engineering/architecture/b...
| dm319 wrote:
| As someone who works in healthcare, so much of what I read about
| AI makes me think that the people who are enthusiastic about
| healthcare AI don't have much experience doing it.
|
| The scenarios rarely seem to fit with what I'm actually
| practicing. Most of medicine is boring, it is largely routine,
| and if we don't know what's going on, it's because we're not the
| right person to be managing the patient. Most of my time is spent
| talking to people - patients, colleagues, family. I explain the
| diagnosis, I talk about the plan, I am getting ideas of what the
| patient wants and values, and then actioning it. I spend very
| little of my time like Dr House pondering what the next most
| important test to perform is for a patient who is confounding us.
| lostlogin wrote:
| I work in radiology with MRI as a tech. We use AI slightly
| differently to the examples here, but it's changing a lot of
| what we do. It's more about enhancing images than directly
| about diagnosing.
|
| The image is denoised 'intelligently' in k-space, and then
| the resolution is doubled via another AI process in the image
| domain (or maybe quadrupled, depending on how you measure it:
| our pixel count doubles in the x and y dimensions).
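|
| As a crude numpy toy of that two-stage shape (hard
| thresholding and k-space zero-padding standing in for the
| learned steps - nothing like the vendor's actual networks):
|
|     import numpy as np
|
|     img = np.random.rand(128, 128)  # stand-in magnitude image
|
|     # stage 1: "denoise" in k-space (toy hard threshold on
|     # k-space magnitude; the product uses a trained model)
|     k = np.fft.fftshift(np.fft.fft2(img))
|     k[np.abs(k) < 3 * np.median(np.abs(k))] = 0
|
|     # stage 2: double the pixel count per axis by zero-padding
|     # k-space (sinc interpolation; again, the product uses a
|     # learned model in the image domain instead)
|     pad = np.pad(k, 64)  # 128 -> 256 points per axis
|     up = np.abs(np.fft.ifft2(np.fft.ifftshift(pad))) * 4
|     print(up.shape)  # (256, 256)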
|
| These are 2 distinct processes which we can turn on or off,
| with some parameters with which we can alter the process.
|
| The result is amazing and image quality has gone up a lot.
|
| We haven't got a full grasp of it yet and have a few
| theories. The vendors are also still getting to grips with
| it.
|
| We think the training data set turns out to have some weird
| influence on the required acquisition parameters. For
| example, parallel imaging factor 4 works well, 3 and 2 less
| so, which is not intuitive. More acceleration being better
| for image quality is not how MRI used to work (except in a
| few edge cases).
|
| Bandwidth, averages, square pixel, turbo factor and appropriate
| TE matter a bit more than they did pre-AI.
|
| Images are now acquired faster, look better and sequence
| selection can be better tailored to the patient as we have less
| of a time pressure.
|
| I'd put our images up against almost anything I've seen before
| as examples of good work. We are seeing anatomy and pathology
| that we didn't previously appreciate. Sceptics ask if the
| things we see are really there, but after some time with the
| images the concern goes away and the pre-AI images just look
| broken.
|
| In the link below, ignore Gain (it isn't that great); Boost
| and Sharp are the vendor names for the good stuff. The
| brochure undersells it.
|
| https://www.siemens-healthineers.com/magnetic-resonance-imag...
| srvmshr wrote:
| I did my Masters in NMR. I can confirm a lot of ML-based
| plug-and-play solutions are helping denoise k-space.
|
| Trivia: I am also one of the pulse sequence developers
| affiliated with the Siemens LiverLab package on the Syngo
| platform :) [Specifically the multiecho Dixon fat-water
| sequence]. SNR improvement was a big headache for rapid Dixon
| echoes.
| lostlogin wrote:
| Ha, small world. Thanks for your work, I used to use this
| daily until a year ago, now my usage is less frequent.
|
| I guess Dixons are still a headache with their new k-space
| stuff, as Boost (the denoising) isn't compatible with it yet.
| Gain is, but looks distinctly lame when you compare it to
| Boost.
|
| We are yet to see the tech applied to breath hold sequences
| (haste, vibe etc), Dixon, 3D, gradient sequences and
| probably others.
|
| I'm looking forward to seeing it on haste and 3D T2s
| (space) in particular. MRI looks very different today
| compared to how it looked just 6 months ago.
|
| I'd compare it to the change we saw going from 1.5T to 3T,
| just accelerated in how quickly progress is being made.
| srvmshr wrote:
| I have long since left the collaboration with the team at
| Cary, NC. But all I can say is there was a great deal of
| interest in 3D sequence improvement by interpolation with
| known k-space patterns, as in the GRASE or PROPELLER
| sequences, for example. They also learned a good deal from
| working with NYU's fastMRI.
| ggm wrote:
| My partner had a clinician review her paperwork and say "why
| are you here?", explaining that the enhanced imaging was
| leading to tentative concerns being raised about structural
| change so small it was below the threshold for safe surgical
| treatment.
|
| Moral of the story: the imaging has got so good that
| diagnostics is now on the fringe of overdiagnosis, and the
| stats need to catch up.
| lostlogin wrote:
| This has been a thing for a long time, with MRI in
| particular.
|
| It gets quite philosophical. To diagnose something you need
| some pattern on the images. As resolution and tissue
| contrast improves you see more things, and the radiologist
| gets to decide if the appearance is something.
|
| When a clinician says there is a problem in some area of
| anatomy and there is something on the scan, the radiologist
| has to make a call.
|
| The great thing about being a tech is that making the call
| isn't my job. I have noticed that keeping the field of view
| small tends to make me more friends.
|
| A half-imaged liver haemangioma, a thyroid nodule or a
| white-matter brain lesion as an incidental finding is at
| least a daily occurrence.
| esel2k wrote:
| So much this. I just interviewed about 10 doctors in the
| space of neurology and radiology to start some new projects.
| The truth is most of the headaches come from insurance
| coverage checks or, for radiologists, from filling out
| correct reports. The fancy AI stuff (with maybe a few
| exceptions, thanks to the great advances in imaging) is still
| far away from validation, and I haven't even started on its
| usage and go-to-market.
|
| Most of the cases the doctors see are boring / regular cases,
| and problems like access to medical history are more basic
| but more prevalent.
| [deleted]
| ericmcer wrote:
| That scenario sounds like it lends itself more to AI automation
| than a Dr. House type one.
| kbenson wrote:
| I don't know - compassion and a nuanced understanding of
| individual desires when talking to someone is not what I
| associate with AI in my mind, but being able to assess
| sociological and cultural taboos and tease out what a patient
| actually wants, rather than what they might initially
| express, seems like something a good doctor would get to
| through explorative conversation.
| junipertea wrote:
| Maybe removing a human from the equation would lead to a more
| honest outcome? E.g. people google all sorts of issues more
| earnestly than they would describe them to a doctor. The
| bottleneck would be properly understanding what the user
| intends, which might be out of reach.
| ben_w wrote:
| Indeed. Language has been historically difficult for AI,
| but I think it's even tougher here -- language is less
| and less reliable the further we get from a shared
| experience, and this is a problem when describing our
| experiences of our own bodies, and much worse when
| describing our own minds.
|
| For example, when I was coming off an SSRI, I was
| forewarned that I might get a sensation of "electric
| shocks"; the actual experience wasn't like that, though I
| could tell why they chose to describe it like that.
|
| How different is the tightness in the chest during a
| heart attack from the tightness in the chest from
| exercising chest muscles?
|
| I have no idea how doctors, GPs, and nurses manage this,
| though they seem to have relatively little trouble.
| dm319 wrote:
| My experience of chatting with an internet chat-bot when
| trying to get some help with a product gives me little
| confidence we are close here.
|
| Edit: wording
| [deleted]
| rvz wrote:
| Of course. No surprise there. Especially the ones made with
| 'Deep Learning'.
|
| At this point, each time AI and 'Deep Learning' is applied
| and then scrutinised, it almost always turns out to be pure
| hype generated by investors, outputting garbage,
| unexplainable results from broken models. The exact same goes
| for the self-driving scam.
|
| 'AI' is slowly starting to be outed as an exit scam.
| johannes_ne wrote:
| I recently published a paper where we explain how an FDA-
| approved prediction model, built into a widely used cardiac
| monitor, was developed with an incredibly biased method.
|
| https://doi.org/10.1097/ALN.0000000000004320
|
| Basically, the training and validation data were engineered
| so that an important range of one of the predictor variables
| was only present in one of the outcomes, making perfect
| prediction possible for these cases.
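|
| A toy reconstruction of that failure mode (synthetic numbers,
| not our actual data): if one range of the predictor occurs
| only with one outcome, a model scores "perfectly" on
| validation data engineered the same way.
|
|     import numpy as np
|     from sklearn.linear_model import LogisticRegression
|
|     rng = np.random.default_rng(0)
|     # engineered sampling: the predictor's low range appears
|     # ONLY in positive cases, its high range only in negatives;
|     # the overlapping middle range is simply excluded
|     x_pos = rng.uniform(40, 60, 500)
|     x_neg = rng.uniform(80, 100, 500)
|     X = np.concatenate([x_pos, x_neg]).reshape(-1, 1)
|     y = np.concatenate([np.ones(500), np.zeros(500)])
|
|     model = LogisticRegression(max_iter=1000).fit(X, y)
|     print(model.score(X, y))  # ~1.0, an artifact of the sampling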
|
| I summarize the paper in this Twitter thread:
| https://twitter.com/JohsEnevoldsen/status/156164115389992960...
| baxtr wrote:
| Sorry for asking, but how is this relevant to the article?
| NovemberWhiskey wrote:
| Sorry for asking, but how is it _not_?
| baxtr wrote:
| Do you agree that it's ok to pose a question whenever you
| don't understand?
| csallen wrote:
| Ironically, that's exactly what NovemberWhiskey is doing
| here :)
| johannes_ne wrote:
| Fair question. The model we comment on suffers from the
| problem described in the article, but also a more severe one:
|
| The developers sampled obvious cases of hypotension and
| non-hypotension, and trained the model to distinguish those.
| They also validated it on data that was similarly
| dichotomous. In reality the outcome is often between these
| two scenarios.
|
| But worse, they also introduced a more severe problem, where
| a range of an important predictor is only present in the
| hypotension outcome.
| baxtr wrote:
| Thanks for explaining!
| roflyear wrote:
| This has to be intentional, no?
| yrgulation wrote:
| So many in AI are chasing software solutions when the problem
| is hardware. Limited power means limited learning. Mix lab-
| grown neurons with software and you have a winning
| proposition.
| bjt2n3904 wrote:
| This is what freaks me out about AI.
|
| People will use it for years in various fields, and one by one,
| after a decade or so of use, they'll come to find it was complete
| garbage information, and they were just putting their trust in a
| magic 8 ball.
|
| But the damage is already done.
| hdhdhsjsbdh wrote:
| While the notion of treating these systems as "sociotechnical"
| rather than purely technical is probably a good move wrt actually
| improving people's lives, I can say from my own experience in
| academia that there are still way too many academics working in
| this field who don't think it's their problem. I've personally
| raised these types of issues before and been told "we're computer
| scientists, not social scientists", as if "social scientist" is a
| derogatory term. The biggest impediment here is, in my opinion,
| overcoming the bloated egos of the people who think the social
| impacts of their work are somehow out of scope. All is well as
| long as you can continue to publish.
| JHonaker wrote:
| Preach.
|
| There are way too many people that treat MSE or other
| abstract technical measurements of model performance as if
| they actually represented the impact a model has on a
| problem. Even if we could somehow perfectly predict an actual
| realization instead of a conditional expectation, that still
| leaves the question of why we predicted that. Are we
| exploiting systemic biases, like historically racist
| policies? Almost definitely (unless we've consciously tried
| to adjust for them, and even then we've probably done so
| incorrectly). I've become much less interested in models that
| basically just interpolate (very well, I might add), and more
| in frameworks that attempt to answer why we see particular
| patterns.
| drtgh wrote:
| My humble opinion: AI is supposed to be the acronym for
| artificial intelligence, but marketing has usurped it to refer to
| machine learning, which is nothing more than a neo-language for
| defining statistical equations in a semi-automated way. An
| attempt to dispense with mathematicians to develop models.
|
| What amount of energy is necessary for an event to be reflected
| in a statistic? You have a box of 2x2 meters with balls of data,
| and a string with a diameter of 1 meter with which to surround
| the highest concentration of balls possible, and those that
| remain outside, there they stay. Statistics and lack of precision
| are concepts that go hand in hand (some even say it is not a
| science).
| [deleted]
| spywaregorilla wrote:
| > My humble opinion: AI is supposed to be the acronym for
| artificial intelligence, but marketing has usurped it to refer
| to machine learning, which is nothing more than a neo-language
| for defining statistical equations in a semi-automated way.
|
| Sure. Hardly controversial.
|
| > An attempt to dispense with mathematicians to develop models.
|
| What...? No. Definitely not.
|
| > What amount of energy is necessary for an event to be
| reflected in a statistic? You have a box of 2x2 meters with
| balls of data, and a string with a diameter of 1 meter with
| which to surround the highest concentration of balls possible,
| and those that remain outside, there they stay. Statistics and
| lack of precision are concepts that go hand in hand (some
| even say it is not a science).
|
| I have no idea what this is saying. It sounds like you're
| shitting on statistics all of a sudden, which is weird, given
| that you seemed to favor mathematicians in the first part.
| drtgh wrote:
| >I have no idea what this is saying. It sounds like you're
| shitting on statistics all of a sudden, which is weird, given
| that you seemed to favor mathematicians in the first part.
|
| Mathematicians are specialized in problem solving, and as
| humans, their ability to predict and analyze data makes them
| more reliable at developing models than a statistical
| equation. They have quite a few more tools than statistics
| alone.
|
| In a way, it is as if using the acronym AI for statistical
| algorithms leads to a false sense of greater reliability than
| such human review provides, or even a sense that a deep human
| review is not needed. ML statistics takes algorithms out of
| the oven long before mathematicians do, at the expense of a
| big difference in accuracy.
|
| The problem, I think, is that people may make important
| decisions based on the results of such statistical algorithms
| without questioning them.
| spywaregorilla wrote:
| I don't think most mathematicians have spent a great deal
| of time analyzing data tbh. Unless you mean statisticians.
| naniwaduni wrote:
| It's not that AI has been conflated with machine learning--
| those are words that are _supposed_ to refer to the same thing.
| The confusion is conflating either with slapdash applied
| statistics.
| dekhn wrote:
| Statistics is not science - it's an application of
| probability theory and some other forms of math to hypothesis
| selection (among other things).
|
| It is scientific, though. We only use stats because that's
| the best method for dealing with imprecise and noisy data.
|
| Statistical thermodynamics contains all the necessary tools you
| need to answer your balls in a box question.
| drtgh wrote:
| >Statistical thermodynamics contains all the necessary tools
| you need to answer your balls in a box question
|
| The balls-in-a-box example shows how ML statistics works. The
| string is adjustable - it can be adapted to different
| contours - but you have to discard data.
|
| How do you compensate for including some data in the model
| without discarding other data? The string has a limited
| diameter by design, and you need to know the content of most
| of the data to make good decisions.
| charcircuit wrote:
| >which is nothing more than a neo-language for defining
| statistical equations in a semi-automated way.
|
| That's why it's called artificial intelligence.
| jfghi wrote:
| Having built models, I'd claim that it's art based upon
| science, perhaps not too different than engineering a building.
| At every stage there are decisions to be made with tradeoffs.
| Over time, the resulting model could be invalidated or perhaps
| perform better. It's remarkably difficult to approach or even
| define a "best" model.
|
| What's most peculiar to me is that somehow AI is becoming
| more distinct from math or stats, and that there's a notion
| that by running pytorch one is able to play god and create
| sentience.
| ben_w wrote:
| > Statistics and lack of precision are concepts that go hand in
| hand (someones say even it is not an science).
|
| Statistics is the mathematics of being precise about your level
| of imprecision. It's fairly fundamental to all science, and has
| been for a while now.
| Grothendank wrote:
| Color _me_, personally, surprised. Between publication bias
| and the general public's ignorance of AI and its evolving
| capabilities, and over a decade of results in AI health being
| overblown before transformers, how could we have predicted
| that post-transformer results in AI health would _continue_
| to be overblown?
| fasthands9 wrote:
| It is still unclear to me exactly what data they were looking
| at/referring to in this article.
|
| If you take into account bloodwork, family history, demographics,
| etc. then it seems like you are still only getting a few dozen
| data points. At this scale it seems like traditional statistics
| or human checks for abnormalities are going to be about as good.
|
| Although I personally know very little (apologies for
| conjecturing), it does seem like there could be a lot of uses
| for AI in specific diagnoses. For example, when they take
| your blood
| pressure/heartbeat they only get data for one particular moment
| where you are sitting in a controlled environment. I would think
| if you had a year's worth of data (along with activity data from
| an apple watch) you might be able to diagnose/predict things that
| traditional doctors/human analysis could not.
|
| I would also imagine anything that deals with image analysis
| (like looking for tumors in scans) will be done vastly better
| by computer AI systems than by humans.
| bmh100 wrote:
| The issue with data leakage can be handled through k-fold cross-
| validation, in which all of the data takes turns as either
| training data or test data.
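|
| A standard sklearn sketch - with the caveat that any
| preprocessing has to be fit inside each fold, since fitting
| it on the full data first is itself a small leak:
|
|     from sklearn.datasets import make_classification
|     from sklearn.linear_model import LogisticRegression
|     from sklearn.model_selection import KFold, cross_val_score
|     from sklearn.pipeline import make_pipeline
|     from sklearn.preprocessing import StandardScaler
|
|     X, y = make_classification(n_samples=1000, random_state=0)
|
|     # the scaler is re-fit on each fold's training split
|     model = make_pipeline(StandardScaler(), LogisticRegression())
|     cv = KFold(n_splits=5, shuffle=True, random_state=0)
|     scores = cross_val_score(model, X, y, cv=cv)
|     print(scores.mean(), "+/-", scores.std())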
| photochemsyn wrote:
| Not that surprising. AI learning seems to do best with fairly
| predictable systems, and when it comes to individual outcomes in
| medicine, there's a lot of mystery involved. A group of people
| with similar genetic makeup and exposure history to carcinogens
| or pathogens won't all respond identically - some get persistent
| cancers, some get nasty infections, and some don't.
|
| For example, training an AI on historical tidal data would likely
| lead to very good future tide timing and height predictions,
| without any explicit mechanistic model needed. Tides have high
| predictability and relatively low variability (things like
| unusual wind patterns accounting for most of that).
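|
| A toy of that predictability (harmonic least squares on
| synthetic tide heights; real tidal analysis uses many more
| constituents, but the principle is the same):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     t = np.arange(0, 24 * 60, 0.5)  # hours (~60 days)
|     M2, S2 = 12.42, 12.00  # two dominant constituent periods
|     h = (1.2 * np.sin(2 * np.pi * t / M2)
|          + 0.4 * np.sin(2 * np.pi * t / S2)
|          + rng.normal(0, 0.05, t.size))
|
|     # fitting the harmonic basis is all the "learning" needed
|     A = np.column_stack([
|         np.sin(2 * np.pi * t / M2), np.cos(2 * np.pi * t / M2),
|         np.sin(2 * np.pi * t / S2), np.cos(2 * np.pi * t / S2)])
|     coef, *_ = np.linalg.lstsq(A, h, rcond=None)
|     print(np.corrcoef(h, A @ coef)[0, 1])  # ~0.999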
|
| In contrast, there are some current efforts to forecast
| earthquakes by training an AI on historical seismograph data, but
| whether or not these will be of much use is similarly
| questionable.
|
| https://sciencetrends.com/ai-algorithms-being-used-to-improv...
| tensor wrote:
| This is entirely unsurprising and has a very simple solution:
| keep adding more data. Our measurements of the accuracy of AI
| systems are only as good as the test data, and if the test data
| is too small, then the reported accuracies won't reflect the true
| accuracies of the model applied to wild data.
|
| Basically, we need an accurate measure of whether the test
| data set is statistically representative of wild data. In
| healthcare, this means that the individuals that make up the
| test dataset must be statistically representative of the
| actual population (and there must also be enough samples).
|
| An easy solution here is that any research that doesn't pass
| a population statistics test must be declared up-front as
| "not representative of real world usage" or something (a
| crude sketch of such a test below).
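|
| A crude sketch of such a test (chi-square on one hypothetical
| demographic column against census proportions; a real check
| would cover many variables jointly):
|
|     import numpy as np
|     from scipy.stats import chisquare
|
|     # hypothetical: age-bracket counts in the test set vs. the
|     # same brackets' proportions in the target population
|     test_counts = np.array([120, 300, 410, 170])
|     population_props = np.array([0.20, 0.30, 0.35, 0.15])
|     expected = population_props * test_counts.sum()
|
|     stat, p = chisquare(test_counts, f_exp=expected)
|     print(p)  # small p => flag the test set as unrepresentative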
| blackbear_ wrote:
| From the article:
|
| > Here's why: As researchers feed data into AI models, the
| models are expected to become more accurate, or at least not
| get worse. However, our work and the work of others has
| identified the opposite, where the reported accuracy in
| published models decreases with increasing data set size.
| spywaregorilla wrote:
| That's not a contradiction per se. It's easier to get
| spuriously high test scores with smaller datasets. It does
| not clearly demonstrate that the models are actually getting
| worse.
| dirheist wrote:
| But if diagnoses are multimodal and rely on large,
| multidimensional analysis of symptoms/bloodwork/past
| medical history, wouldn't adding more dimensions just
| increase dimensional sparsity and decrease the useful
| amount of conclusions you are able to draw from your
| variables?
|
| It's been a long time since I learned about the curse of
| dimensionality, but if you increase the number of datapoints
| you collect by half, wouldn't you have to quadruple the
| number of samples to retrieve any meaningful benefit?
| tappio wrote:
| You are right, but I feel you misunderstood op.
|
| I understood that op meant increasing the number of samples,
| not variables.
| spywaregorilla wrote:
| I did mean samples (n size), not the number of features. But
| also, no, your point isn't right. If you have a ton of
| variables, you'll be better able to overfit your models to a
| training set (which is bad). However, that's not to say that
| a fairly basic toolkit can't help you avoid doing that, even
| with a ton of variables. What really matters is the effect
| size of the variables you're adding - that is, whether or not
| they can actually help you predict the answer, distinctly
| from the other variables you have.
|
| Stupid example: imagine trying to predict the answer of a
| function that is just the sum of 1,000,000 random variables.
| Obviously having all 1,000,000 variables will be helpful
| here, and the model will learn to sum them up.
|
| In the real world, a lot of your variables either don't
| matter or are basically saying the same thing as some of your
| other variables, so you don't actually get a lot of value
| from trying to expand your feature set mindlessly.
|
| > if you increase the number of datapoints you collect by
| half, wouldn't you have to quadruple the number of samples
| to retrieve any meaningful benefit?
|
| I think you might be thinking of standard error, where you
| divide the standard deviation of your data by the sqrt of the
| number of samples. So quadrupling your sample size will cut
| the error in half.
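|
| To make the small-test-set point concrete, a quick simulation
| (a model with a true accuracy of 0.75, scored on test sets of
| various sizes):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     true_acc = 0.75
|     for n in (30, 100, 1000, 10000):
|         # measured accuracy over 1000 hypothetical test sets
|         measured = rng.binomial(n, true_acc, size=1000) / n
|         print(n, measured.std(), measured.max())
|     # the spread shrinks like 1/sqrt(n): tiny test sets
|     # routinely hand out spuriously high scores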
| rscho wrote:
| > This is entirely unsurprising and has a very simple solution:
| keep adding more data
|
| Nope. Won't work. Biased data made bigger only results in
| bias confirmation. Which is the real problem.
| rjdagost wrote:
| If there's one thing I learned with biomedical data modeling
| and machine learning, it's that "it's complicated". For
| biomedical scenarios, getting more data is often not simple at
| all. This is especially the case for rare diseases. For areas
| like drug discovery, getting a single new data point (for
| example, the effect of a drug candidate in human clinical
| settings) may require a huge expenditure of time and money.
| Biomedical results are often plagued with confounding
| variables, hidden and invisible, and simply adding in more data
| without detection and consideration of these bias sources can
| be disastrous. For example, measurements from lab #1 may show
| persistent errors not present in lab #2, and simply adding in
| more data blindly from lab #1 can make for worse models.
|
| My conclusion is that you really need domain knowledge to know
| if you're fooling yourself with your great-looking modeling
| results. There's no simple statistical test to tell you if your
| data is acceptable or not.
| dm319 wrote:
| I think this is a key point - the training set is very
| important, because biases, over-curation, or wrong contexts
| will mean the model may perform very poorly for particular
| scenarios or demographics.
|
| I can't find the reference now of a radiology AI system which
| had a good diagnosis rate of finding a pneumothorax on a chest
| x ray (air in the lining of the lung). This can be quite a
| serious condition, but is easy to miss. Turns out that the
| training set had a lot of 'treated' pneumothoraces. The
| outcome was correct - they did indeed have a pneumothorax,
| but they also had a chest drain in, which was helping the
| prediction.
|
| Similar to asking what the demographics of the training set
| are is asking how the recorded outcome was established. How
| was the diagnosis made? There is often no 'gold standard' of
| diagnosis, and some are made with varying degrees of
| confidence. Even a post-mortem can't find everything...
| cm2187 wrote:
| So a model calibrated on a backtest says nothing about its
| predictive capacity. Who would have thought? Well, anyone, I
| think, who has worked even a little bit in quantitative
| finance. The only way to validate a model is to make
| predictions and test whether those predictions actually come
| true in a repeatable way, which in certain circles is
| referred to as an "experiment".
|
| That's why I distrust any model built purely on backtested data
| unless they can be shown to predict something else than history.
| And AI is not the only area that blindly trusts those kind of
| models.
| rscho wrote:
| Surprise, surprise. People hugely overestimate the data retrieval
| capabilities of healthcare systems. And if you really put
| clinical 'AI' systems to the test in day-to-day settings
| (which is in fact never done), the results would be much,
| much worse.
|
| Shit data in, shit prediction out.
___________________________________________________________________
(page generated 2022-10-21 23:00 UTC)