[HN Gopher] HealthBench
___________________________________________________________________
HealthBench
Author : mfiguiere
Score : 111 points
Date : 2025-05-12 17:42 UTC (5 hours ago)
(HTM) web link (openai.com)
(TXT) w3m dump (openai.com)
| Zaheer wrote:
| Impressive how well Grok performs in these tests. Grok feels
| 'underrated' relative to how much other models (Gemini, Llama,
| etc.) are in the news.
| tough wrote:
| you can't download grok's weights to run locally
| simianwords wrote:
| how is that relevant here?
| tough wrote:
| it helps explain why fewer people are talking about them
| than about Gemini or Llama?
|
| fewer people using them.
| Insanity wrote:
| I can guarantee you none of my friends (not in tech) use
| "downloading weights" as an input to select an LLM
| application.
| simianwords wrote:
| isn't chatgpt the most used or most popular model?
| tough wrote:
| Yes, OpenAI has a first-mover advantage, and Claude seems to
| be a close second with their closed models too. Open weights
| are not a requirement for success, but in an already crowded
| market (Grok's prospect), their proposition competes neither
| with the top-tier closed models nor with the maybe
| lesser-capable but more-available, battle-tested open ones
| that are freely available to run locally.
| reissbaker wrote:
| You can't download Gemini's weights either, so it's not
| relevant as a comparison against Gemini.
|
| I think the actually-relevant issue here is that until
| last month there wasn't API access for Grok 3, so no one
| could test or benchmark it, and you couldn't integrate it
| into tools that you might want to use it with. They only
| allowed Grok 2 in their API, and Grok 2 was a pretty bad
| model.
| tough wrote:
| lol sorry, mixed them up with Gemma 3, which feels like the
| open lesser cousin to the Gemini 2.5/2.0 models
| moralestapia wrote:
| It's not.
|
| Also, only one of the ten models benchmarked has open
| weights, so I'm not sure what GP is arguing for.
| tough wrote:
| > in terms of how much other models (gemini, llama, etc)
| are in the news.
|
| not talking about TFA or benchmarks but the news
| coverage/user sentiment ...
| ramon156 wrote:
| I don't want to be a conspiracy theorist, but could this be in
| preparation for Amazon's (to be) health branch?
| srameshc wrote:
| Is the Med-PaLM model that Google has been working on meant
| to be considered for comparison? If I'm not mistaken, it
| isn't publicly available.
|
| > https://sites.research.google/med-palm/
| aix1 wrote:
| Med-PaLM is old and has been superseded by (multiple
| generations of) Gemini.
| GuinansEyebrows wrote:
| i have zero trust in openai's ability to do anything
| impartially. why should we leave the judgement of a private
| tool up to the makers of the tool, especially when human
| lives are at stake?
| simianwords wrote:
| I agree - we should exercise a bit of caution here. There is
| no way they would release a benchmark that makes their model
| look bad. But then again, we know their models are among the
| best for other uses, so it's not a big leap to accept this
| benchmark.
| beezlebroxxxxxx wrote:
| I can already see the pharma salesmen drooling at the idea of
| how various symptoms can be marketed to.
|
| "As an AI agent, a possible diagnosis is [xxx]. Ask your doctor
| about / look into [yyy(tm)] for a possible solution!"
| ceejayoz wrote:
| And OpenAI is definitely thinking about this on their end:
|
| https://www.axios.com/2024/12/03/openai-ads-chatgpt
|
| > OpenAI CFO Sarah Friar told the Financial Times that OpenAI
| is weighing the inclusion of ads in its products but wants to
| be "thoughtful about when and where we implement them."
| barnas2 wrote:
| Ad spots inside chatgpt are going to be worth an obscene
| amount of money.
| amarcheschi wrote:
| I think that the damage of "ChatGPT misdiagnoses X as Y,
| person dies of Z" would be quite bad for PR.
| dcreater wrote:
| Isn't there an obvious conflict of interest when the model
| maker is also the creator of a benchmark? I think at the very
| least it should come from a separate business entity under the
| non-profit, or from the non-profit holding entity itself.
| gwd wrote:
| I don't think it's necessarily bad to have the benchmark, but
| the graphs of Gemini and Claude doing worse than o3 did kind of
| leave a bad taste in my mouth. "Oh look, your models are worse
| than ours at this very important metric that we just made up!
| How terrible!"
| progbits wrote:
| It's also barely better, within the error bars on some
| metrics. So if this is the best they can do on their own
| benchmark, that's pretty pathetic and shows they have no edge
| anymore.
| pizzathyme wrote:
| Non-clinicians are using ChatGPT every day now to try to find
| assistance (right or wrong) with real-life medical problems.
| This is a great evaluation set that could prevent a lot of
| harm.
| unsupp0rted wrote:
| Recently I uploaded a lab report to chatGPT and asked it to
| summarize it.
|
| It hallucinated serious cancer, along with all the associated
| details you'd normally find on a lab report. It had an answer to
| every question I had pre-asked about the report.
|
| The report said the opposite: no cancer detected.
| maliker wrote:
| Interesting. Which model? 4o, o3, 3.5? I had horrible
| performance with earlier models, but o3 has helped me with
| health stuff (hearing issues).
| unsupp0rted wrote:
| Whichever the default free model is right now. I stopped
| paying for it when Gemini 2.5 came out in Google's AI Studio.
|
| 4o, o4? I'm certain it wasn't 3.5
|
| Edit: while logged in
| pants2 wrote:
| If you're logged in, 4o; if you're not logged in, 4o-mini.
| Neither scores well on the benchmark!
| askafriend wrote:
| This gets at the UX issue with AI right now. How's a
| normie supposed to know and understand this nuance?
| unsupp0rted wrote:
| Or a non-normie. Even while logged in, I had no idea what
| ChatGPT model it was using, since it doesn't label it.
| All the label says is "great for everyday tasks".
|
| And as a non-normie, I obviously didn't take its analysis
| seriously, and compared it to Grok and Gemini 2.5. The
| latter was the best.
| unsupp0rted wrote:
| Added context: While logged in
| maliker wrote:
| Might be worth trying again with Gemini 2.5. The reasoning
| models like that one are much better at health questions.
| unsupp0rted wrote:
| Gemini 2.5 in AI Studio gave by far the best analysis
| icelancer wrote:
| > Whichever the default free model is right now
|
| Sigh. This is a point in favor of not allowing free access
| to ChatGPT at all, given that people are getting mad at
| GPT-4o-mini, which is complete garbage for anything remotely
| complex... and garbage for most other things, too.
|
| Just give 5 free queries of 4o/o3 or whatever and call it
| good.
| dgfitz wrote:
| I can't believe you're getting downvoted for answering the
| question about the next-token-predictor model you can't
| recall using.
|
| What is happening?
| Gracana wrote:
| I wonder if it was unable to read your report, and just
| answered as if role-playing?
|
| I gave it a pdf of an engine manual recently and asked some
| questions, which it answered reasonably. It even pulled a
| schematic out for me, though it was the wrong one (it gave me a
| schematic for the CDI ignition variant that we first talked
| about, rather than the DSAI one we settled on later.)
| astura wrote:
| >I wonder if it was unable to read your report, and just
| answered as if role-playing?
|
| FWIW,
|
| This is what ChatGPT gave me when I uploaded a lab report; it
| read the results back to me:
|
| Here's an interpretation of the blood test results you
| provided:
|
| White Blood Cell Count (WBC) Value: 9.2 (Normal: 3.8 - 10.8
| Thousand/uL)
|
| Within the normal range, indicating no immediate signs of
| infection or inflammation.
|
| Red Blood Cell Count (RBC) Value: 5.29 (Normal: 3.8 - 5.1
| Million/uL)
|
| Elevated (High). This could indicate dehydration, stress, or
| other conditions like polycythemia.
|
| Hemoglobin Value: 10.9 (Normal: 11.7 - 15.5 g/dL)
|
| Low. Suggests anemia, possibly due to iron deficiency or
| chronic disease.
|
| Hematocrit Value: 36.0% (Normal: 35.0 - 45.0%)
|
| Within the normal range, but on the lower side, supporting
| the presence of mild anemia.
|
| Mean Corpuscular Volume (MCV) Value: 68.1 (Normal: 80.0 -
| 100.0 fL)
|
| Low. Indicates microcytic anemia, often due to iron
| deficiency.
|
| Mean Corpuscular Hemoglobin (MCH) Value: 20.6 (Normal: 27.0 -
| 33.0 pg)
|
| Low. Suggests hypochromia (paler red blood cells), often
| associated with iron-deficiency anemia.
|
| Mean Corpuscular Hemoglobin Concentration (MCHC) Value: 30.3
| (Normal: 32.0 - 36.0 g/dL)
|
| Slightly low. May reflect reduced hemoglobin in red blood
| cells, consistent with iron-deficiency anemia.
|
| Red Cell Distribution Width (RDW) Value: 15.8% (Normal: 11.0
| - 15.0%)
|
| Elevated. Indicates variability in red blood cell size,
| common in iron-deficiency anemia.
|
| Platelet Count Value: 397 (Normal: 140 - 400 Thousand/uL)
|
| Normal, but on the higher side of the range.
|
| Other Key Metrics Absolute Neutrophils, Lymphocytes,
| Monocytes, Eosinophils, Basophils: All within normal ranges,
| suggesting no significant abnormalities in immune cell
| populations.
|
| Anisocytosis (1+): Indicates variation in red blood cell
| size.
|
| Hypochromasia (1+): Suggests pale red blood cells, supporting
| iron-deficiency anemia.
|
| Summary These results suggest iron-deficiency anemia,
| characterized by low hemoglobin, low MCV, low MCH, and high
| RDW. The elevated red blood cell count may reflect
| compensatory mechanisms or mild dehydration.
|
| Your provider recommends a heart-healthy, low-fat diet, which
| is great for overall health. However, addressing the anemia
| might also involve increasing dietary iron or taking iron
| supplements.
|
| The diagnosis is wrong, btw; I don't have an iron deficiency.
| The anemia is caused by a genetic condition called
| thalassemia, which has been verified by genetic tests. You
| can use the Mentzer Index to differentiate the two on a
| simple CBC - https://www.mdcalc.com/calc/10534/mentzer-index
|
| My numbers return a "probable diagnosis."
|
| I was wondering if ChatGPT would catch it; nope, it didn't.
| It did say that it was a possibility once I suggested it,
| though.
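|
| (For reference, the Mentzer Index is just MCV divided by RBC:
| below 13 points to thalassemia, above 13 to iron deficiency.
| A minimal sketch of the arithmetic in Python, using the
| numbers from the report above:
|
|     # Mentzer Index = MCV (fL) / RBC (million/uL)
|     mcv = 68.1   # MCV from the report above
|     rbc = 5.29   # RBC from the report above
|     mentzer = mcv / rbc  # ~12.9
|     if mentzer < 13:
|         print("thalassemia probable")   # what my numbers return
|     elif mentzer > 13:
|         print("iron deficiency probable")
|     else:
|         print("indeterminate")
|
| Even that one-line division catches what ChatGPT missed.)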
| nradov wrote:
| It's also disappointing that the LLM would falsely claim
| that a low-fat diet is "great for overall health". That
| claim has been thoroughly debunked in the scientific
| literature (or at least not supported by evidence-based
| medicine), and yet the recommendation still keeps popping
| up. While there are certain specific types of fats that can
| be problematic, in general a moderate amount of fat is fine
| as a macronutrient.
| dekhn wrote:
| LLMs don't have a magical ability to arbitrate complex
| truths. If the "low-fat diet" narrative is prominent in
| the training literature, I would expect the model to repeat
| it more often than the opposite.
|
| I don't ever really use the term "thoroughly debunked"
| when referring to nutrition science; as you noted, a
| better term is that the claim is not supported by the
| evidence. I've seen enough things debunked and then
| rebunked to know that nutrition science is not really
| that accurate.
| arcanemachiner wrote:
| No, cancer detected!
| mvdtnz wrote:
| You're not on Reddit.
| arcanemachiner wrote:
| I typically respect the decorum here, but I couldn't help
| myself this time.
| dekhn wrote:
| What was the outcome (did the subject of the lab report go on
| to present with cancer)? Sometimes what we think is a false
| positive ends up being a true positive later, and cancer is a
| late-arriving signal.
| iNic wrote:
| I like that they include the "worst case score at k samples".
| This is a much more realistic view of what will happen, because
| someone will get that 1/100 response.
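|
| (A rough sketch of the metric, assuming it is the minimum
| grade across k responses sampled for the same prompt - the
| function name here is just illustrative:
|
|     import random
|
|     def worst_case_at_k(grades, k, trials=10_000):
|         # Monte Carlo estimate of the expected worst (minimum)
|         # grade when a user draws k independent responses.
|         return sum(min(random.choices(grades, k=k))
|                    for _ in range(trials)) / trials
|
|     grades = [1.0] * 99 + [0.0]  # one 1/100 failure mode
|     print(worst_case_at_k(grades, k=1))   # ~0.99
|     print(worst_case_at_k(grades, k=16))  # ~0.85
|
| The mean hides the tail; worst-case-at-k surfaces it.)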
| pants2 wrote:
| This appears to be a very thoughtful and helpful study. It's
| also impressive to see the improvement in performance over
| just the last year of model development - almost double.
|
| I've found o3 & deep research to be very effective in guiding my
| health plan. One interesting anecdote - I got hit in the chest
| (right over the heart) quite hard a month or so ago. I prompted
| o3 with my ensuing symptoms and heart rate / oxygenation data
| from my Apple watch, and it already knew my health history from
| previous conversations. It gave very good advice and properly
| diagnosed me with a costochondral sprain. It gave me a timeline
| to expect (which ended up being 100% accurate) and treatments /
| ointments to help.
|
| IMO - it's a good idea to have a detailed prompt ready to go
| with your health history, height/weight, medications and
| supplements, etc., so if anything happens to you, you've got
| it handy to give to o3 to help with a diagnosis.
| quantumwoke wrote:
| How confident was it in that diagnosis? And if so, did you
| trust its confidence? I asked my medico wife, and apparently
| the other possibility is pericardial effusion from the injury,
| which can be life-threatening.
| simianwords wrote:
| I would really like a benchmark focused purely on diagnosis:
| symptoms and patient history vs. the real diagnosis. Maybe
| name this model House M.D. 1.0 or something.
|
| The other stuff is good to have, but ultimately a model that
| focuses on diagnosing medical conditions is going to be the
| most useful. Look - we aren't going to replace doctors anytime
| soon, but it is good to have a second opinion from an LLM
| purely for diagnosis. I would hope it captures patterns that
| weren't observed before. This is exactly the sort of game that
| AI can beat a human at - large-scale pattern recognition.
| mrcwinn wrote:
| Happy to see this. I've struggled with an injury for the past
| five years. I've been to multiple sports-focused physicians, had
| various scans. Responses from doctors have ranged from
| "everything seems fine, can't really figure this out" to
| [completely wrong hypothesis]. Tried acupuncture. Tried a
| chiropractor. I remember one doctor, though, had an interesting
| thought that seemed to make sense - but I've been so discouraged
| from so many false starts or misplaced hope, I didn't bother
| following up.
|
| Finally I typed in my entire history into o3-deep-research and
| let it rip for a while. It came back with a theory for the injury
| that matched that one doctor, diagrams of muscle groups and even
| illustrations of proposed exercises. I'm not out of the woods
| yet, but I am cautiously optimistic for the first time in a long
| time.
| Noumenon72 wrote:
| I hope recent cuts to government science have managed to hit
| enough of the safetyists and industry captures who keep us from
| just trying out new healthcare approaches like this and
| learning. They'd like nothing better than to replace the help
| you got with "As a large language model, I am unable to offer
| medical advice."
| candiddevmike wrote:
| Why would you trust a LLM over a battery of human experts? I
| find it hard to believe that the doctors never proposed
| exercises or some kind of physical therapy for you, at least in
| the US.
| BeetleB wrote:
| I can't speak to the OP's condition, but having seen plenty
| of doctors and physical therapists in the US for over a
| decade:
|
| Yes, they propose exercises.
|
| No, they don't work.
|
| For certain (common) conditions, PT seems to have it nailed -
| the exercises really help. For the others, it's just snake
| oil. Not backed by much research. The current state of the
| art is just not good when it comes to chronic pain.
|
| So while I don't know if an LLM can be better than a battery
| of human experts, I do know that those human experts do not
| perform well. I'm guessing that in the OP's case, the battery
| of human experts does not lead to a consensus - you just end
| up with 10 different treatments/diagnoses (and occasionally
| one is a lot more common than the others, but it's still
| wrong).
| zeroxfe wrote:
| I've never used LLMs for this, but as someone who's been
| through a lot of sports-related injuries, I find doctors more
| or less useless (except for prescribing painkillers and
| performing surgeries.)
|
| No doctor or physio has ever been able to fix my chronic
| issues, and I've always had to figure them out myself through
| lots of self-study and experimentation.
| n8henrie wrote:
| Much of modern medicine's expertise is focused on areas that
| reimburse well (in some part because those industries
| subsidize the research).
|
| I would not expect most physicians to have a deep fund of
| literature-backed knowledge to draw from regarding exercise.
| Telling someone to do an exercise probably doesn't compensate
| well.
|
| That said, I'm also pretty negative about the availability of
| rigorous literature regarding much of nutrition, dentistry,
| podiatry, physical therapy, etc... you know, the things that
| affect the health of most human beings that have ever lived.
| scudsworth wrote:
| i mean, it sounds like that doctor had it right then? i don't
| think this reads as a win for "online self-diagnosis",
| ai-empowered or otherwise.
| lcnPylGDnU4H9OF wrote:
| From their story, it sounds like the one doctor had it right
| because the AI-empowered self-diagnosis seems to agree. Or do
| I misunderstand?
| mvdtnz wrote:
| You trusted known grifters (acupuncturists and chiropractors)
| but not doctors. Then you trusted a computer known for telling
| lies.
|
| I think you should take a step back and re-assess your
| internal heuristics.
| BigGreenJorts wrote:
| They literally said they saw sports-focused physicians.
| quantumwoke wrote:
| Sounds like you needed a good PT, not a physician.
| nradov wrote:
| You have to be realistic and assume that treating most major
| musculoskeletal injuries will be a lengthy trial-and-error
| process. Every injury is unique and imaging studies or other
| tests can only give limited clues about what's going on. There
| are patients in terrible pain whose scans look normal, and
| there are other patients whose scans indicate serious pathology
| yet they are pain-free and completely functional. Instead of
| getting discouraged, keep trying different modalities (except
| for the pseudo-science BS) until something works.
|
| In general a lot of those injuries will eventually heal on
| their own. So it's easy to fool yourself into believing that a
| particular treatment was effective even when the real cure was
| time.
| kypro wrote:
| Why are all the labels in the "Worst-case HealthBench score
| at k samples" chart the same colour and the same shape?
| Completely unreadable.
| brap wrote:
| The colors are slightly different. I think the person who
| made this was more concerned with appearance than with
| providing information.
| andy99 wrote:
| My sense is that these benchmarks are not realistic in terms
| of the way the model is used. People building specialized AI
| systems are not, in my experience, letting users just chat
| with a base model; they would have some variant of RAG plus
| some guardrails plus other stuff (like routing to pre-written
| answers for common questions).
|
| So what use case does this test setup reflect? Is there a
| relevant commercial use case here?
| programmertote wrote:
| I have no doubt that a lot of garden-variety diagnoses and
| treatments can be done by an AI system that is fine-tuned and
| vetted to accomplish the task. I recently had to pay $93 to
| have a virtual session with a physician to get a prescription
| for cough syrup, though I already knew what to take before
| talking to her because I had done some research/reading. Some
| may argue, "Doctors studied years in med school and you
| shouldn't trust Google more than them", but knowing humans'
| fallibility and knowing that a lot of doctors do look things
| up on places like
| https://www.wolterskluwer.com/en/solutions/uptodate to
| refresh/reaffirm their knowledge, I'd argue that if we are
| willing to take the risk, why shouldn't we be allowed to take
| that risk on our own? Why do I have to pay $93 (on top of the
| cough syrup that cost ~$44) just so that the doctor can see me
| on Zoom for less than 5 mins and submit an order for the med?
|
| With healthcare prices increasing at breakneck speed, I am
| sure AI will take on more and more of a role in diagnosing and
| treating people's common illnesses, and hopefully (I doubt it)
| some of those savings will be transferred to the patients.
|
| P.S. In contrast to the US system, in my home city (Rangoon,
| Burma/Myanmar), I have multiple clinics near my home and a
| couple of pharmacies within two bus stops' distance. I can
| either buy most of the medications I need from the pharmacy
| (without a prescription) and take them on my own (why am I not
| allowed to
| take that risk?) OR I can go see a doctor at one of these clinics
| to confirm my diagnosis, pay him/her $10-$20 for the visit, and
| then head down to the pharmacy to buy the medication. Of course,
| some of the medications that include opioids will only be sold to
| me with the doctor's prescription, but a good number of other
| meds are available as long as I can afford them.
| BeetleB wrote:
| Where are you that you need a prescription to get cough
| medicine? The only ones I know of that require prescription are
| the ones with controlled substances.
| dgfitz wrote:
| > Of course, some of the medications that include opioids
| will only be sold to me with the doctor's prescription
| HaZeust wrote:
| Ain't nothing wrong with a little Wockhardt to get you
| through the day
| ezst wrote:
| I mean, if such is your typical experience with healthcare,
| what you need is a big shake-up; don't count on AI to help you
| there! (But it might inflate the profit margins for the few
| running the show at your expense.)
| yapyap wrote:
| You think you won't have to pay the AI? lol.
| hollerith wrote:
| IMHO it is much easier to increase the supply of AI doctor
| hours than the supply of human doctor hours, and supply
| constraints are the main thing keeping doctor hours at the
| high price they currently sell for.
| onlyrealcuzzo wrote:
| There are simply not enough doctors to be doing this work.
|
| We have a massive, massive shortage of doctors.
|
| The industry is doing everything they can to make it worse by
| the day, so I won't hold my breath that we'll get the slightest
| bit of respite.
|
| It'd obviously be ideal if everyone could see a doctor for an
| affordable price any time they wanted.
|
| We don't live in the ideal world.
|
| This would be a HUGE win for most people.
| nradov wrote:
| This is why most routine primary care can be delivered by
| Nurse Practitioners and Physician Assistants working under a
| doctor's supervision.
| closewith wrote:
| NPs and community paramedics are definitely the future for
| a huge chunk of the current GP workload, but PAs are a failed
| experiment that is being rolled back pretty much everywhere.
| nradov wrote:
| Are you sure about that? The ranks of PAs have been
| expanding rapidly. I know there has been some political
| resistance to _expanding_ their scope of practice, but I'm
| not aware of it being significantly rolled back.
|
| https://www.bls.gov/ooh/healthcare/physician-assistants.htm
| closewith wrote:
| [delayed]
| imiric wrote:
| > Why do I have to pay $93 (on top of the cough syrup that
| costed ~$44) just so that the doctor can see me on Zoom for
| less than 5 mins and submit an order for the med?
|
| Because you're paying for the expertise of someone who
| studied for more than a decade, which you won't get from a
| random web search.
|
| An AI system with today's technology should be less trustworthy
| for medical diagnosis than a web search. At least with a web
| search you might stumble upon a site with content from experts,
| assuming you trust yourself to be able to discern expert advice
| from bot-generated and spam content. Even if a doctor is doing
| the searching instead of me, I would pay them only for their
| knowledge to make that discernment for me. Why you think an AI
| could do better than a human at that is beyond me.
|
| Your question reminds me of that famous Henry Ford GE invoice
| story:
|
| > Making chalk mark on generator: $1.
|
| > Knowing where to make mark: $9,999.
| 85392_school wrote:
| AI systems have been improving. O3 now has the capability to
| decide to search multiple times as part of its response.
| creata wrote:
| Even o4-mini (free) uses web searches and runs Python
| scripts very eagerly. I'm not sure how long they'll be able
| to afford giving all of that away.
| ChadNauseam wrote:
| You answered why someone would want to pay $93, but not why
| they _have_ to pay $93.
| hiatus wrote:
| There's a limit though, right? How about: if you can
| accurately predict the doctor's diagnosis, it's free; if not,
| you pay. This person presumably needed a doctor's approval
| for a prescription they couldn't get without authorization,
| which leads to this gatekeeping. Not to mention it also
| contributes to the insane medical costs in the US. $93 for 5
| minutes is over $1k _an hour_ for what amounts to a rubber
| stamp (because how much can you really dig into a person's
| medical history and condition in 5 mins?).
| wubrr wrote:
| I mean, it really depends on what you're trying to
| diagnose/do. Yeah, a math PhD might have spent a decade+
| studying the subject. Doesn't mean I need to pay for their
| skills instead of using a calculator/program to solve my math
| problem.
|
| > Why you think an AI could do better than a human at that is
| beyond me.
|
| Why do you think an AI couldn't do better than a human, when
| we have ample evidence of computers/AI exceeding humans in
| many areas?
| const_cast wrote:
| The only reason this worked is because your situation was
| exceedingly simple.
|
| The trouble is you are not educated enough to tell what is
| simple and what isn't. A cough could be a cough, or it could
| be something more serious; only a "real" examination will
| reveal that. And sometimes even that's not enough: you need an
| examination by a specialist.
|
| I'll tell you a story. Once upon a time I got pain in my balls.
| I went to a doctor and he felt around and he said he didn't
| feel anything. I went to another doctor, and he felt
| _something_, but he had no idea what it was. He said it could
| be a cyst, could be a swollen vein, could be an infection - he
| didn't even know if it was on the testicle or on the tube
| thingy.
|
| Then I went to a Urologist. You can tell this man has felt up a
| _lot_ of balls. He felt me up and said, "yup, that's a tumor"
| almost immediately. He was right, of course, and he ended up
| being the one to remove it too. Since I caught the cancer
| pretty early the chemotherapy wasn't too intense.
|
| Point is, expertise matters when things aren't
| straightforward. That's when experience and perspective get
| to shine.
| imiric wrote:
| Good lord. The idea that a system that produces pseudo-random
| output without any semantic understanding can be relied on to
| give accurate health-related information is deeply flawed and
| troubling. It's one thing to use these things for finding
| patterns in data, for entertainment purposes, and for producing
| nonsensical code a human has to fix, but entirely different to
| rely on them for health diagnosis or advice. I shudder at the
| thought that a medical practitioner I go to will parrot whatever
| an LLM told them.
|
| This insanity needs to be regulated yesterday.
| FlamingMoe wrote:
| Good lord. The idea that a sleep-deprived, bias-prone mammal
| can be relied on to give accurate health-related information is
| deeply flawed and troubling. It's one thing to seek a human for
| empathy, small talk, or a routine vaccine, but entirely
| different to assume an overworked doctor will consistently out-
| diagnose a modern data tool. I shudder at the thought that the
| practitioner I visit might simply repeat whatever a drug-rep
| lunch pitch or outdated guideline left in their memory.
| stavros wrote:
| This insanity needs to be regulated yesterday.
| imiric wrote:
| Thankfully, it is. :)
| hiatus wrote:
| Are there regulations regarding doctor's working hours
| similar to say, truck drivers? I found this [1] article
| which states doctors are limited to 80 hours a week
| whereas truck drivers are limited to 70 hours in a 7 day
| period. [2]
|
| [1] https://psnet.ahrq.gov/primer/duty-hours-and-patient-safety
|
| [2] https://www.fmcsa.dot.gov/sites/fmcsa.dot.gov/files/docs/Dri...
| n8henrie wrote:
| Those restrictions are for residents (trainees).
|
| They were also found not to improve patient outcomes
| (possibly due to an increased number of handoffs, which are
| highly error-prone).
| creata wrote:
| Can you state more clearly _why_ it's deeply flawed?
|
| Because while LLMs obviously have massive limitations, so do
| humans, and it's not entirely clear to me that some synthesis
| of the two can't produce much better results than either on its
| own.
| yapyap wrote:
| Sam Altman does not care about "improving human health"
| NKosmatos wrote:
| Most probably I'm going to get downvoted, but I'm gonna say it...
|
| It's a pity they don't support the Greek language, keeping in
| mind that almost all medical terminology has Greek origins.
|
| Anyhow, this is a step in the right direction, and for sure it
| will aid many people looking for medical assistance via
| ChatGPT.
___________________________________________________________________
(page generated 2025-05-12 23:00 UTC)