[HN Gopher] Clinical knowledge in LLMs does not translate to human interactions
       ___________________________________________________________________
        
       Clinical knowledge in LLMs does not translate to human interactions
        
       See also https://venturebeat.com/ai/just-add-humans-oxford-medical-
       st...
        
       Author : insistent
       Score  : 19 points
       Date   : 2025-06-14 22:18 UTC (41 minutes ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | ekianjo wrote:
       | > perform no better than the control group
       | 
       | This is still impressive. Does it mean it can replace humans in
       | the loop with no loss?
        
         | majormajor wrote:
         | What human? The control group was "instructed to instead use
         | any methods they would typically employ at home." Most people
         | don't have human-doctors-in-the-loop at home.
        
         | jdiff wrote:
         | No, the control group was instructed to "use any methods they
         | would typically employ at home." So ChatGPT is no better than
         | WebMD.
        
           | ekianjo wrote:
            | It's better in the sense that it gives you an answer faster
            | than reading through pages of WebMD.
        
             | brianpan wrote:
             | You're wrong most of the time, but at least you get there
             | quickly.
        
       | dosinga wrote:
        | If I read this correctly, what it really seems to say is that
        | LLMs are pretty good at identifying underlying causes and
        | recommending medical actions, but the whole thing falls apart
        | once you let humans use LLMs to self-diagnose.
        
         | majormajor wrote:
         | Yeah it sounds like "LLMs are bad at interacting with lay
         | humans compared to being prompted by experts or being given
         | well-formed questions like from licensing exams."
         | 
          | Feels to me like how "prompt engineering" got a bunch of hype
          | in tech companies two years ago and is now nonexistent, because
          | the models began being trained and prompted specifically to
          | mimic "reasoning" for the sorts of questions tech-company users
          | had. Seems like that has not translated to reasoning their way
          | through the sort of health conversations a non-medical-
          | professional would initiate.
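         
        An illustrative aside: the contrast the thread keeps circling is
        between a model prompted directly with a well-formed clinical
        vignette and the same model reached through a lay user. The sketch
        below is hypothetical (ask_llm() is a placeholder, not the paper's
        harness, and the vignette is invented); it is only meant to make
        the two conditions concrete.
         
            # Hypothetical sketch: ask_llm() stands in for any
            # chat-completion client; it is not the paper's code.
            def ask_llm(messages):
                # Substitute a real model call here.
                return "[model reply]"
            
            VIGNETTE = ("34-year-old with sudden severe headache during "
                        "exertion, described as the worst of their life.")
            
            # Condition A: direct, exam-style prompt, the setting where
            # LLMs tend to score well on licensing-style questions.
            direct = ask_llm([
                {"role": "system",
                 "content": "You are a clinician. Give the most likely "
                            "diagnosis and a recommended disposition."},
                {"role": "user", "content": VIGNETTE},
            ])
            
            # Condition B: the same facts arrive piecemeal from a lay
            # user, out of order and anchored on a self-treatment question,
            # the setting where the paper reports the benefit disappears.
            messages = [{"role": "system",
                         "content": "You are a helpful assistant."}]
            for turn in ["my head really hurts",
                         "it started at the gym i think",
                         "should i just take ibuprofen?"]:
                messages.append({"role": "user", "content": turn})
                messages.append({"role": "assistant",
                                 "content": ask_llm(messages)})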
        
       ___________________________________________________________________
       (page generated 2025-06-14 23:00 UTC)