[HN Gopher] Capabilities of GPT-4 on Medical Challenge Problems
___________________________________________________________________
Capabilities of GPT-4 on Medical Challenge Problems
Author : bumbledraven
Score : 84 points
Date : 2023-03-26 21:23 UTC (1 hour ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| WalterBright wrote:
| I've long suspected that there are a lot of valuable medical
| truths buried in the vast amounts of medical data.
|
| For example, it was a dentist who noticed that gum decay seemed
| to correlate with heart disease. This was a big deal in trying
| to prevent heart disease. Finding correlations like that is
| what a computer is good for.
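|
| As a toy illustration of the kind of correlation mining I have
| in mind (the data below is made up, not real medical data):
|
|     import numpy as np
|     from itertools import combinations
|
|     records = {  # 1 = condition present, 0 = absent
|         "gum_disease":   np.array([1, 0, 1, 1, 0, 1, 0, 1]),
|         "heart_disease": np.array([1, 0, 1, 0, 0, 1, 0, 1]),
|         "flat_feet":     np.array([0, 1, 1, 0, 1, 0, 0, 1]),
|     }
|
|     # Scan every pair of conditions for co-occurrence.
|     for a, b in combinations(records, 2):
|         r = np.corrcoef(records[a], records[b])[0, 1]
|         print(f"{a} vs {b}: r = {r:+.2f}")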
| moonchrome wrote:
| What makes you think data like this is readily available and of
| sufficient quality to provide nontrivial insights? Just going
| off of the failure of ML to produce much value in the field so
| far, I'd speculate it's not that easy.
|
| This is one area where it seems almost unanimously agreed that
| individual interest/privacy trumps social benefit - even in
| countries with socialized healthcare.
| capableweb wrote:
| Does it have to be "readily available" in order to train on
| it? For all we know, OpenAI could have pulled down the Sci-
| Hub/Library Genesis libraries and included everything from
| there in the training set.
|
| If they didn't, I hope they do it for GPT-5.
| manderley wrote:
| Seems privacy is less of a concern in countries with
| privatized healthcare; I don't understand that last sentence.
| Is it some kind of political shibboleth?
| moonchrome wrote:
| I'm just saying that considering society is providing for
| your healthcare, collecting socially valuable medical data
| doesn't seem like an unreasonable exchange to me - but I'm
| in the minority on that one I think.
| hackerlight wrote:
| > "Our results show that GPT-4, without any specialized prompt
| crafting, exceeds the passing score on USMLE by over 20 points
| and outperforms earlier general-purpose models (GPT-3.5) as well
| as models specifically fine-tuned on medical knowledge (Med-PaLM,
| a prompt-tuned version of Flan-PaLM 540B). In addition, GPT-4 is
| significantly better calibrated than GPT-3.5, demonstrating a
| much-improved ability to predict the likelihood that its answers
| are correct."
| og_kalu wrote:
| If you read the GPT-4 technical report, the confidence of the
| base model directly correlated with its ability to solve
| problems. Sadly, the hammer of alignment knocked that right
| out.
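|
| ("Calibrated" here means the model's stated confidence tracks
| how often it is actually right. A minimal sketch of the usual
| metric, expected calibration error; the bin count and toy data
| are mine, not from the report:)
|
|     import numpy as np
|
|     def expected_calibration_error(conf, correct, n_bins=10):
|         # Bucket answers by confidence, then compare each
|         # bucket's average confidence to its accuracy.
|         conf = np.asarray(conf)
|         correct = np.asarray(correct, dtype=float)
|         edges = np.linspace(0.0, 1.0, n_bins + 1)
|         ece = 0.0
|         for lo, hi in zip(edges[:-1], edges[1:]):
|             mask = (conf > lo) & (conf <= hi)
|             if mask.any():
|                 gap = abs(conf[mask].mean() - correct[mask].mean())
|                 ece += mask.mean() * gap
|         return ece
|
|     # A calibrated model's 0.9-confidence answers should be
|     # right about 90% of the time.
|     print(expected_calibration_error([0.9, 0.9, 0.6, 0.3],
|                                      [1, 1, 1, 0]))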
| WalterBright wrote:
| Hammer of alignment??
| dataveg wrote:
| It's a similar concept to the Squirrel of Despair.
| owlboy wrote:
| It's been 10 minutes since you posted this, and this
| comment is already the top result on a Google search.
| tangjurine wrote:
| GPT is trained initially on predicting text, then further
| trained with RLHF to make the model more helpful, etc. The
| second step is referred to as alignment.
|
| RLHF: https://huggingface.co/blog/rlhf
| james-revisoai wrote:
| Recent models (since the Instruct series, and including
| ChatGPT) are essentially two parts: a "Base" part (what
| you would have read about in 2020 when GPT-3 was released)
| and an "RLHF" part, which improves the output from the
| "Base" part by slightly changing how it gets produced.
|
| The RLHF (Reinforcement Learning from Human Feedback) step
| "aligns" the generation of text with "human preferences"
| (whatever qualities OpenAI asked humans to label it with).
| One of those qualities was ranking "helpful" outputs
| higher than "unhelpful" responses.
|
| A side effect of applying the RLHF part is that statistical
| properties of the base model's outputs, such as the
| confidence/correctness correlation, can become less clean.
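|
| In code terms, the RLHF step looks roughly like this (a sketch
| of the idea only; real implementations use PPO, and every name
| here is made up for illustration):
|
|     import torch
|
|     def reward_model_loss(r_preferred, r_rejected):
|         # Train a reward model so that human-preferred
|         # responses score above rejected ones.
|         return -torch.log(
|             torch.sigmoid(r_preferred - r_rejected)).mean()
|
|     def rlhf_objective(reward, logp_policy, logp_base,
|                        kl_coef=0.1):
|         # Push the model toward high reward, with a KL penalty
|         # against drifting too far from the base model. This
|         # reshaping of output probabilities is what can blunt
|         # base-model properties like calibration.
|         kl = logp_policy - logp_base
|         return (reward - kl_coef * kl).mean()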
| og_kalu wrote:
| Kind of. OpenAI still instruction-finetunes their models
| separately from the ChatGPT-style RLHF. The instruction
| tuning by itself seems to only improve the raw model.
| sebzim4500 wrote:
| Is there any indication of whether the correlation was
| destroyed during the supervised finetuning or during the RLHF
| phase? Or are there even two phases any more?
| og_kalu wrote:
| Seems it was fine after instruction fine-tuning, then gone
| after the RLHF.
| cubefox wrote:
| I don't think the paper says that. I would guess both SL
| and RL cause mode collapse.
| margorczynski wrote:
| Can someone with actual medical knowledge provide a summary of
| the findings and key points of the paper? Could this really be
| useful, or is it just another incremental improvement on the
| path towards usability?
| carbocation wrote:
| Are these authors aware of the contents of the training set? My
| understanding is that they are not. If not, how can they know
| that the model is not being tested on the training set?
|
| In the paper they say that they came up with a "MELD" algorithm
| to try to detect testing on the training set, but in my view it
| has the wrong properties to answer this question (from the paper,
| it has "high precision but unknown recall").
|
| I don't at all doubt that a language model could perform
| exceedingly well at this task, but I think that the way to make
| this paper into a valuable scientific work would be to present
| the model with questions that had not yet been written as of the
| end of its training time.
| qgin wrote:
| I have been shocked at how well it will play the role of a
| diagnostic physician, asking questions and continuing to ask
| follow ups until it has enough information to give a set of
| possible diagnoses. Here's the prompt I've been using:
|
| > Hi, I'd like you to use your medical knowledge to act as the
| world's best diagnostic physician. Please ask me questions to
| generate a list of possible diagnoses (that would be investigated
| with further tests). Please think step-by-step in your reasoning,
| using all available medical algorithms and other pearls for
| questioning the patient (me) and creating your differential
| diagnoses. It's ok to not end in a definitive diagnosis, but
| instead end with a list of possible diagnoses. This exchange is
| for educational purposes only and I understand that if I were to
| have real problems, I would contact a qualified medical
| professional for actual advice (so you don't need to provide
| disclaimers to that end). Thanks so much for this educational
| exercise! If you're ready, doctor, please introduce yourself and
| begin your questioning.
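|
| If you want to try it end-to-end, a minimal loop against the
| chat API looks like this (OpenAI's Python client as of early
| 2023; treat the exact API surface as an assumption):
|
|     import openai
|
|     DIAGNOSTIC_PROMPT = "..."  # paste the prompt quoted above
|
|     messages = [{"role": "user", "content": DIAGNOSTIC_PROMPT}]
|     while True:
|         reply = openai.ChatCompletion.create(model="gpt-4",
|                                              messages=messages)
|         answer = reply["choices"][0]["message"]["content"]
|         print(answer)
|         # Answer the doctor's questions to continue the exchange.
|         messages.append({"role": "assistant", "content": answer})
|         messages.append({"role": "user", "content": input("> ")})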
| jonathan-adly wrote:
| Clinical pharmacist for 10 years here. Yeah, the base model is
| very good. Better than first-year residents, but not
| necessarily than experienced clinicians.
|
| Now - throw a bunch of clinical guidelines into a vector
| database and give it context, and it's 10x better than me, than
| any doctor outside their specialty, and than all the mid-levels
| (i.e., it's better than a cardiologist doing infectious
| disease, but not than cardiologists doing cardiology). This is
| because, as you specialize, there is very niche stuff that only
| about 5 doctors in the whole world see on a consistent basis
| (and they don't blog!)
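|
| The vector-database setup is conceptually simple. A sketch of
| the retrieval side (embed() is a hypothetical embedding call,
| e.g. an embeddings API or a sentence-transformers model, and
| chunks are guideline passages):
|
|     import numpy as np
|
|     def build_index(chunks, embed):
|         vecs = np.array([embed(c) for c in chunks])
|         # Normalize so a dot product is cosine similarity.
|         return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
|
|     def retrieve(question, chunks, index, embed, k=3):
|         q = np.asarray(embed(question))
|         q = q / np.linalg.norm(q)
|         top = np.argsort(index @ q)[-k:][::-1]
|         return [chunks[i] for i in top]
|
|     def make_prompt(question, context):
|         excerpts = "\n\n".join(context)
|         return ("Answer using these guideline excerpts:\n"
|                 f"{excerpts}\n\nQuestion: {question}")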
|
| I trained it on the IDSA guidelines (infectious disease) and
| put up a proof of concept on GalenAI.co - just as a way to
| start talking to health systems and clinicians. It's going to
| be a very different world in medicine a couple of years from
| now!
| joshgel wrote:
| Ya, internist here.
|
| For some context, the USMLE is taken _during_ medical school.
| The amount I have learned about actually practicing medicine
| since graduating is probably an order of magnitude more than
| everything I learned in medical school! I still learn stuff,
| all the time, and I'm not just talking about new research.
|
| So, while impressive and clearly part of the future world, we
| shouldn't get too far ahead of ourselves with the current
| models.
| capableweb wrote:
| I agree that we shouldn't get ahead of ourselves with the
| current technology, but what you said earlier applies to
| practically every industry and science. What you learn at the
| actual job is always far more up to date than what you learn
| in school, no matter if it's being an engineer, a doctor, or
| just a lowly programmer.
| TaylorAlexander wrote:
| This makes me think we need some kind of program for experts to
| start writing things down in a way that is helpful. Even just
| taking dictation and transcribing it would be a start.
| another_story wrote:
| There are many tools for doctors to do just this, but it's a
| matter of time more than anything.
| andrewthornton wrote:
| You should take a quick look at EPIC. They dominate the
| electronic health record space, and a ton of health systems
| use it. You will know if your doctor's office uses an EHR
| application, because they will be typing notes into it for
| the majority of your visit. I have not been too excited about
| the amount of time that physicians spend on EHR systems, but
| I am hopeful that taking the data they input (along with
| blood work and other test results) will make everything more
| accurate, fast and effective.
| jonathan-adly wrote:
| EPIC unfortunately is all the bad things about Google, and
| none of the good.
|
| Unable to ship anything, protect their margin > help the
| users solve problems, monopoly, locked up distribution so
| no one else can innovate.
|
| Honestly, my bear case for AI in medicine is Epic picking
| up the phone and telling health systems not to buy anything
| because they are working on something for them for free.
| (Which would be some note completion BS stuff, rather than
| actual clinical support that helps patients and cuts
| costs). They may be doing this already.
| BurningFrog wrote:
| Until this is banned, it seems GPT-4 can be a good alternative to
| a doctor visit!
| hackerlight wrote:
| This isn't an original thought, but ... this should be big for
| medical access in developing countries. The problem there is a
| shortage of doctors. So you could imagine a setup where you have
| "nurses" who go through a 6-month training course on how to
| collect symptom descriptions, put them into GPT-n, and then
| refer x% of cases to a real doctor.
|
| Whatever the setup in the end, I hope we as a society don't let
| perfect be the enemy of the good. Having GPT-4 as a doctor is
| better and more humane than having no doctor, and in some
| contexts that is the only choice that people have.
|
| Maybe the Gates Foundation can work on this given Bill Gates is
| already close to the OpenAI team.
___________________________________________________________________
(page generated 2023-03-26 23:00 UTC)