Statistical Modeling, Causal Inference, and Social Science

"ai-promised-to-revolutionize-radiology-but-so-far-its-failing"
Posted by Andrew on 7 June 2021, 9:31 am

Gary Smith points us to this news article:

Geoffrey Hinton is a legendary computer scientist . . . Naturally, people paid attention when Hinton declared in 2016, "We should stop training radiologists now, it's just completely obvious within five years deep learning is going to do better than radiologists." The US Food and Drug Administration (FDA) approved the first AI algorithm for medical imaging that year and there are now more than 80 approved algorithms in the US and a similar number in Europe.

OK, so far no surprise. But then:

Yet, the number of radiologists working in the US has gone up, not down, increasing by about 7% between 2015 and 2019. Indeed, there is now a shortage of radiologists that is predicted to increase over the next decade.

What happened?

The inert AI revolution in radiology is yet another example of how AI has overpromised and under delivered. . . . Radiology--the analysis of images for signs of disease--is a narrowly defined task that AI might be good at, but image recognition algorithms are often brittle and inconsistent. . . . only about 11% of radiologists used AI for image interpretation in a clinical practice. Of those not using AI, 72% have no plans to do so while approximately 20% want to adopt within five years. The reason for this slow diffusion is poor performance. . . .

Interesting. I'm not sure what to think here. AI will only get better, not worse, so it seems reasonable to suppose that in the not-too-distant future it will be useful, at the very least as an aid to radiologists. A lot of work has to go into making any system useful in practice, but there's lots and lots of money in radiology, so I'd think that someone could be put on the job of building a useful tool.

Here's an analogy to a much simpler, but still not trivial, problem. Nearly twenty years ago some colleagues and I came up with an improved analysis of serial dilution assays. The default software on these lab machines was using various seat-of-the-pants statistical methods that were really inefficient, averaging data inappropriately and declaring observations "below detection limit" when they were actually carrying useful information. We took the same statistical model that was used in that software and just fit it better. It wasn't super-hard but there were various subtle twists, and we published our method in the journal Biometrics.

I thought this would revolutionize dilution assays. Well, no. In the 17 years since that paper appeared, it's been cited only 61 times. And the assay machines still use some version of the same old crappy software. Why? It's simple. We haven't provided a clean alternative. It's not enough to have a published paper showing how to fit the model. You need to program it in, and you need the program to handle all the bad things that might happen to the data, and we haven't done that.
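To give a rough sense of the general idea, here's a minimal sketch: fit the calibration curve to every measurement by maximum likelihood, instead of averaging duplicates and throwing away the "below detection limit" readings. This is not the model or code from our paper; the four-parameter curve, the constant-variance errors, and the toy numbers below are just placeholders.

```python
# Minimal sketch (hypothetical, not the published method): fit a
# four-parameter logistic calibration curve to all dilution-assay
# readings by maximum likelihood, rather than averaging duplicates
# and discarding values flagged as "below detection limit".
import numpy as np
from scipy.optimize import minimize

def calibration_curve(conc, lo, hi, log_mid, log_slope):
    # four-parameter logistic: lo and hi are the asymptotes,
    # mid is the concentration at the curve's midpoint
    mid, slope = np.exp(log_mid), np.exp(log_slope)
    return lo + (hi - lo) / (1.0 + (conc / mid) ** (-slope))

def neg_log_lik(params, conc, y):
    lo, hi, log_mid, log_slope, log_sigma = params
    mu = calibration_curve(conc, lo, hi, log_mid, log_slope)
    sigma = np.exp(log_sigma)
    # constant-variance Gaussian errors (a real model would let the
    # variance grow with the mean); every reading contributes to the
    # fit, including the low ones
    return 0.5 * np.sum(((y - mu) / sigma) ** 2) + y.size * log_sigma

# toy standards: known concentrations and optical-density readings
conc = np.array([0.1, 0.4, 1.6, 6.4, 25.6, 102.4])
y = np.array([0.11, 0.18, 0.45, 1.10, 1.90, 2.30])

start = np.array([0.1, 2.5, np.log(5.0), 0.0, np.log(0.05)])
fit = minimize(neg_log_lik, start, args=(conc, y), method="Nelder-Mead")
print(fit.x)  # fitted curve parameters and log error sd
```

The point is just that the low readings still pull on the fitted curve through the likelihood, with the appropriate amount of uncertainty, instead of being zeroed out by a detection-limit rule.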
Way back when, we got an NIH grant to implement our model, but in the meantime things changed at the lab and we didn't have access to a stream of data to try out ideas, and everything was a bit harder to implement than we thought, and we got tangled up in some issues with the raw data, and . . . well, we've returned to the problem recently so maybe we'll make some progress, but the point is that it's hard to come up with a usable replacement, even in a problem as clean and clearly defined as a numerical laboratory assay. There are lots of dilution assays being done in the world, so in theory I think there would be money available to increase the efficiency of the estimates, but it hasn't happened.

The radiology story is different in that there's more money but the problems, both technical and institutional, are more difficult. But I guess Hinton was wrong when he said, "We should stop training radiologists now." You can be good at science and engineering but still not be the right person to forecast employment trends.

Filed under Decision Theory, Economics, Jobs, Public Health

34 Comments

1. John Hall says: June 7, 2021 at 9:50 am

"You can be good at science and engineering but still not be the right person to forecast employment trends."

It's not so easy to forecast employment trends, even for people who study forecasting or for people who study employment or for people who study trends. The best coordination tool we have is prices. Radiologists are still making a lot of money, which is encouraging people to study radiology and become radiologists. If image recognition starts reducing the income of radiologists, then the marginal radiologists will start doing something else.

+ Daniel Lakeland says: June 7, 2021 at 10:17 am

Also, just because the best societal outcome is to use some technology doesn't mean that's the outcome that will occur. You think radiologists will want to have their top 1% jobs eliminated and just go on to be paid $15/hr to plug images into software tools? Hell no, they're going to argue that people should die because they don't use the superior tools, just so they can keep paying the mortgage on their 6-bedroom houses and the upkeep on their fishing boats. Of course they won't put it like that. But it'll amount to the same thing.

o Rahul says: June 7, 2021 at 11:31 am

Therein lies the problem of self-regulation. The incentives can be perverse at times.

o Maarten says: June 7, 2021 at 1:43 pm

Radiologist here. People always seem to confuse radiologists with a machine which gets presented a set of images and comes up with a diagnosis. This is one of the reasons why I am not afraid of being replaced in the 40 years left in my career.

# Daniel Lakeland says: June 7, 2021 at 3:02 pm

Suppose there were a machine: you give it a set of images and it writes a textual radiologist-type report. It provably does better than, say, 85% of human radiologists at that job. What would be the value added by a random radiologist? Real actual question, not rhetorical. I'd like to know.

@ mu says: June 7, 2021 at 4:07 pm

Isn't this mostly just "a machine which gets presented a set of images and comes up with a diagnosis"? I thought Maarten was arguing that radiologists do much more than that? (This also sounds like a pretty fantastic machine, in multiple senses of the word.)
- Daniel Lakeland says: June 7, 2021 at 4:14 pm

Exactly, I thought he was arguing the same thing, so I'm asking him to give some examples of what the value added from the person is.

@ Randy Crawford says: June 7, 2021 at 5:41 pm

Sometimes the question for the radiologist is pretty general, like "What's wrong with this person?" Or "...this spleen?" Or "...these blood vessels?" AI has a lot of potential to answer very specific questions but has shown very little skill in reading between the lines (or making sense of anatomy that's unusual). I work in medical imaging and believe it'll be a long time before AI-based diagnostic tools replace physicians fully... decades, if ever. But there's a lot of potential for building better lab-based diagnostic assays along the way where AI will indeed make major inroads.

# Thomas Speidel says: June 7, 2021 at 3:15 pm

I'm not a radiologist, but I used to work with radiation oncologists and I felt that reading scans was a very small part of their work.

2. Matt Skaggs says: June 7, 2021 at 10:35 am

It is hard for me to imagine a worse use of AI than radiology. I got to experience the AI revolution in real time. Early in my career, I taught myself how to interpret infrared spectra of finished products, mostly plastics and lubricants and such. First it was just me comparing my results to printed spectra in fat books. Then the first computers led to the FTIR revolution (Fourier Transform IR) and brought primitive search-match software, which only worked on pure substances. Reading spectra is like learning a new language, and many (most) labs didn't have someone who could do it. So right from the beginning of search-match software, some labs simply gave the customer the search-match results list. In the real world, the results were useless because a typical plastic has five constituents, all of which respond to infrared differently, and the correct plastic would not appear on the search-match list. As time went on, the software got better, of course. By the time I retired, there was a function that could deconvolute up to three spectra superimposed over each other, better than I could. It was only possible because of vast computing power compared to what had been available, but still worked by brute force to a large degree. Thinking about radiology - amorphous shapes in grayscale - I can imagine that the software will eventually get there, but it might be the last field where it happens.

+ Rahul says: June 7, 2021 at 11:30 am

Coming from the chemical industry, I appreciate what you are saying here, but I think there's a difference: deducing component identity from a mixture's IR signature was never easy even for a human. OTOH, in the case of radiology we are probably only trying to get an AI algorithm to do as well as a human radiologist already does.

o jeff says: June 7, 2021 at 1:04 pm

Reading a radiograph has never been easy for people either.

+ Phil says: June 7, 2021 at 3:10 pm

"It is hard for me to imagine a worse use of AI than radiology."

That seems odd to me. Computers can now be trained to recognize many patterns and phenomena much, much better than humans can. Why would radiology be different?
Matt: my father spent much of his career at the U.S. Department of Agriculture, where he worked on ways to extract useful information from satellite images. He got tired of people assuming that you could determine the type and health of a crop based on its spectrum, so he wrote a little program: feed in a spectrum (from a LANDSAT image of a specific location, say) and the program would spit out a bunch of spectra that looked just like it and tell you what crops they were. A spectrum from a field of young, slightly dry corn might turn out to be virtually indistinguishable from that of mature soybeans or whatever. I guess mostly the chlorophyll absorption just dominates and everything else is small background that is modified by atmospheric conditions etc. anyway. At any rate, as a practical matter, given the spectral resolution available at the time, you couldn't even tell what species the crops were.

o Ben says: June 7, 2021 at 4:43 pm

> He got tired of people assuming that you could determine the type and health of a crop based on its spectrum

I would have guessed something like this is why the satellites were taking pictures in the first place. Is this true and the spectrums didn't end up being that useful? Or were they put up for something else and everyone just thought this would be one ez addition to the software? Or what?

3. Ben B says: June 7, 2021 at 10:45 am

Posts like this keep reminding me of an article published by MIT Tech Review last fall. It discusses how the Duke University Health System implemented Sepsis Watch, a Duke University deep-learning model, to better detect sepsis in its patients. The takeaway is essentially that the tool was only successful because of its bespoke implementation process, which relied heavily on hospital nurses and their understanding of how the hospital actually operated. A technology is not introduced to a blank slate; it is folded into a web of existing structural and social hierarchies and systems that, more often than not, it disrupts. A new tech only works when it has sufficient buy-in. That only comes when the tech complements and is an outgrowth of these underlying systems. I think the same could be said of AI's place in radiology. Article: https://www.technologyreview.com/2020/10/02/1009267/ai-reduced-hospital-deaths-in-the-real-world/

+ Jessica Hullman says: June 7, 2021 at 2:27 pm

Agreed, there is a "user-centered design" process behind integrating AI into health settings effectively (and probably most other settings for that matter). How to elicit and apply domain knowledge from the various people the AI is supposed to help is a part that doesn't get discussed as much (at least not in the ML lit) but which is obviously critical.

+ Curious says: June 7, 2021 at 2:52 pm

Ben B: This "web of existing structural and social hierarchies" is also why many companies are unable to adapt to a changing competitive landscape. Required buy-in can simply reinforce and concretize current maladaptive structures and processes.

+ Ben says: June 7, 2021 at 4:38 pm

Oh cool, another Ben B :waves:!

o Andrew says: June 7, 2021 at 6:24 pm

Ben: Don't worry, we'll build an AI to tell the two of you apart. We'll train it on the collected blog comments of a corpus of different people named Ben B.
# Ben Bolker says: June 7, 2021 at 6:47 pm

Hmmm. (How many are Benjamin B? Bens I've known include Bentley, Bennett, and Benedicte as well as Benjamin ...)

4. oncodoc says: June 7, 2021 at 10:55 am

No field in medicine saw more technological innovation than radiology during my career. I remember being slack-jawed with amazement when I saw my first head CT scan in a patient with melanoma and a suspected brain metastasis; the image was so clear, so obvious! This was on the second scanner in the US in 1976, and it obtained images in 5 mm slices. Image quality and precision have increased dramatically since then. We can image normal-sized lymph nodes and determine whether they contain cancer with pretty good reliability. MRI and PET also generate a huge amount of data. The amount of data is growing rapidly, and there is no way the number of human readers can keep pace. I think that AI is going to be a necessity. We can't build using guys with picks and shovels; we can't do actuarial work with paper and pencil; and we can't do radiology with guys looking at a viewbox in a dark room. The amount of information generated by the tools of 2020 is too great, and the tools are getting better. There are aircraft that require adjustments in trim faster than humans can respond, and we have devices crawling around Mars that are too far away for immediate human oversight; these things require machine learning. To get useful information out of the fantastic imaging tools we have requires AI tools. Development will take more time than predicted by an enthusiast, but it is coming.

+ Rahul says: June 7, 2021 at 11:26 am

Has the development post-1976 been as amazing? Most of the advances in CT / MRI etc. so far seem to be what I would call digital signal processing, sensors, algorithms, physics, rather than what goes as AI. Basically we get increasingly precise and high-resolution images coded by various attributes. I guess the AI part is what would allow the diagnosis to be made automatically, and that does not seem to have progressed much? I mean, even a macroscopic thing like a femur fracture still gets manually reported by a human radiologist?

5. Raghu Parthasarathy says: June 7, 2021 at 11:16 am

Without actually seeing the performance data, not linked anywhere as far as I can tell, it's hard to know where the conclusion actually lies on the spectrum between (i) "AI has overpromised and under delivered" and (ii) people resist better, non-human tools that might make their jobs redundant. I can easily imagine either.

+ John Richters says: June 7, 2021 at 11:38 am

Andrew: Looks like your 2004 article fell victim to the first law of wing walking: the natural and normatively rational reluctance of scientists and practitioners to let go of what they're holding until they are holding something at least as secure. John

+ josh rushton says: June 7, 2021 at 2:35 pm

I completely agree that the story here could be either (i) or (ii), or some combination of the two. The practitioner self-interest point (ii) is particularly important in healthcare because the economic pressure toward lower costs is relatively weak (sometimes even inverted) in that sector, at least in the US.
Of course, the other sector that AI was scheduled to have revolutionized by now was transportation. The downward cost pressure in that sector is relentless, so that seems like a more straightforward case of AI over-hype. I'm curious whether others think this is just a slight timeline adjustment or some more fundamental barrier. I don't claim to know much about AI, but some surprising failures (e.g., the crazy volume of obviously fake reviews on Amazon) give me pause.

+ Andrew says: June 7, 2021 at 2:54 pm

Raghu: Yes, there are a lot of steps between technology and implementation. Here's a quick list:

1. Making the technology work at all.
2. Making it reliable.
3. Implementing it so it is easy to use.
4. Getting a track record so people trust it.
5. Overcoming economic barriers.

When writing the above post, I was thinking of all these difficulties, and that's one reason I gave the example of my own technology with the serial dilution assays, which succeeded with #1 but not #2, 3, 4, and 5. In other settings, it can be possible to start with #5 and then try to go backward to reach #1. For example, that Tesla autopilot thing, which doesn't work but has somehow succeeded with #3, 4, and 5. Or various scammy things like health care databases that cost zillions of dollars and have to be thrown out, but they succeed in the sense that someone cons the purchasing officer at some major company into paying 2 million dollars for it.

6. Chris says: June 7, 2021 at 11:49 am

A major problem with the scaling of research projects in this space is the lack of enough open data and stringent review processes. Algorithms that supposedly 'worked' were later found to have relied on rulers in images (a physician would put a ruler only next to a more suspect lesion, or the ruler would be in the image from a specialist provider, so those lesions had already passed an earlier screening). Much of this was written up a couple years ago: https://www.statnews.com/2019/10/23/advancing-ai-health-care-trust/

7. Keith O'Rourke says: June 7, 2021 at 11:54 am

I suspect one of the bumps in the road here was the position that, given we can't understand how people make decisions (the explanations they give are just stories rather than fact), we should not expect or even try to understand machine learning models/representations. However, models are deductive and they can be fully understood (what they imply) until they get too complex. And even models as complex as deep neural nets do get understood well enough in certain cases, e.g., no green pixels: can't be a cow. So they are almost ideal for becoming untrustworthy. Thinking that interpretability could just be dismissed as an unreasonable expectation may have worked initially - hey, the accuracy-versus-interpretability trade-off myth persists and there is still an "explainable AI" (XAI) industry - but it was a self-defeating strategy. Eventually people do come to understand the models, at least wherever they can.

8. Carlos Ungil says: June 7, 2021 at 1:16 pm

Several auto manufacturers were also promising self-driving cars by 2020 at the time. Funny progress report: https://www.reddit.com/r/teslamotors/comments/nrs8kf/you_think_ice_cream_truck_stop_signs_are_a_problem/ Overpromising and underdelivering is the name of the game.

9. Alan Crowe says: June 7, 2021 at 2:52 pm

Here are two ways that AI, as applied, could get worse with time.
1) Radiologists can exercise quality control over images. Fuzzy image? Obvious artifacts? The radiologist can insist on getting another image, and has the clout to insist on decent image quality. Fast forward 20 years, to a world with AI reading the images. Now the "AI radiologist" is just a machine with no clout. Over time people get sloppy about image quality and the AI just has to do the best it can with poor images. Accuracy goes down.

2) Along comes a respiratory virus. Lots of patients have mild scarring in their lungs. But FDA certification is an expensive, one-way street. Accuracy has gone down, but the old, certified software doesn't get decertified. Meanwhile, upgrading the software is technically quite demanding and nobody wants to spend the large sums of money needed to get the upgraded software certified, so it doesn't happen.

Getting something that is good in theory to be good when rolled out nationwide is hard. Ensuring that the deployed version stays good is harder still.

10. Jessica Hullman says: June 7, 2021 at 2:53 pm

Reminds me of some work related to visual query refinement interfaces for pathologists working with deep-learning-based image search algorithms, e.g., https://dl.acm.org/doi/pdf/10.1145/3290605.3300234. The idea there seems to be that what possible diagnoses are relevant and what visual features should be analyzed closely will depend in part on information the pathologist has about the patient and case at the specific point of the search, so they can lose trust in fully automated methods.

11. JM says: June 7, 2021 at 4:00 pm

I don't believe this has much to do with 'performance' at all. It seems to me that the places where AI has the hardest time making a real impact are places where qualitative results are most important, rather than quantitative ones. Getting self-driving cars is a totally qualitative goal (does it need to avoid 100% of accidents? handle 100% of situations? These are impossible goals for any human driver, but any other quantitative threshold seems irresponsible). At a certain point someone needs to say "yeah, this is good enough," and there is not really a good indication where that point should be, so it's impossible to reach.

Diagnosing from radiology scans seems like it could be quantitative, but it's not. It's only quantitative when you have a database of labeled data, which is not reflective of the real world at all. In the real world, success is measured by something like "did the patient get better or avoid some bad outcome," where getting better involves a huge number of things above and beyond detecting some anomaly on a scan, most of which have more to do with interpersonal interactions or institutional practices. It's not at all surprising to me that adoption of a complex new tool to marginally improve this one aspect of an already complex process isn't happening. And it's not plainly obvious to me that the world in which it DOES happen would have better outcomes (because of the rest of the complex process that has nothing to do with image processing).

This is not to mention that any of these 'automated' systems require constant upkeep, usually by highly trained professionals who demand salaries near the top of the earning ladder. You need a pretty big improvement to justify that amount of additional overhead. Maybe this could work if you can replace a whole team of radiologists with one ML engineer, but there is no incremental path to this.
And it's not like any given hospital system has millions of radiologists to potentially replace, so the scaling potential is really limited. AI seems to dominate in certain areas (advertising, finance) where a small quantitative improvement is both important and measurable. This is also the world in which AI researchers operate (optimizing some model accuracy metric, and being rewarded with papers in flashy journals). Given that advancements in this area seem to consist mostly of adding complexity (and therefore cost), I'm not sure it will ever reach the point where the benefits scale enough to outweigh the costs in the real world for things like healthcare. Of course, people much smarter than me are making billion-dollar bets to the contrary.

+ JM says: June 7, 2021 at 4:04 pm

I should have said the benefits won't scale enough to DRAMATICALLY change things like healthcare. There are definite marginal gains to be made.

12. gec says: June 7, 2021 at 4:32 pm

Is the unusual format of today's title actually an obscure callback to the Good Old Days of AI by evoking LISP function name conventions?