[HN Gopher] Deterministic Quoting: Making LLMs safer for healthcare
       ___________________________________________________________________
        
       Deterministic Quoting: Making LLMs safer for healthcare
        
       Author : mattyyeung
       Score  : 69 points
       Date   : 2024-05-05 10:47 UTC (2 days ago)
        
 (HTM) web link (mattyyeung.github.io)
 (TXT) w3m dump (mattyyeung.github.io)
        
       | telotortium wrote:
       | We've developed LLM W^X now - time to develop LLM ROP!
        
         | gojomo wrote:
         | Interesting analogies for LLMs!
         | (https://en.wikipedia.org/wiki/W%5EX &
         | https://en.wikipedia.org/wiki/Return-oriented_programming)
        
       | w10-1 wrote:
       | I'm not sure determinism alone is sufficient for proper
       | attribution.
       | 
       | This presumes "chunks" are the source. But it's not easy to
       | identify the propositions that form the source of some knowledge.
       | In the best case, you are looking for an association and find it
       | in a sentence you've semantically parsed, but that's rarely the
       | case, particularly for medical histories.
       | 
       | That said, deterministic accuracy might not matter if you can
       | provide enough context, particularly for further exploration. But
       | that's not really "chunks".
       | 
       | So it's unclear to me that tracing probability clouds back to
       | chunks of text will work better than semantic search.
        
       | nextworddev wrote:
       | Did I miss something or did the article never describe how the
       | technique works? (Despite the "How It Works" section
        
         | Smaug123 wrote:
         | It's explained at considerable length in the section _A
         | "Minimalist Implementation" of DQ: a modified RAG Pipeline_.
        
       | Animats wrote:
       | It's a search engine, basically?
        
         | robrenaud wrote:
         | A good, automatically run, privacy preserving search engine
         | that uses electronic medical records might be a valuable
         | resource for busy doctors.
        
         | simonw wrote:
         | Building better search tools is one of the most directly
         | interesting applications of LLMs in my opinion.
        
         | tylersmith wrote:
         | Yes, and Dropbox is an rsync server.
        
       | itishappy wrote:
       | What happens if it hallucinates the <title>?
        
         | resource_waste wrote:
         | Same thing when a human hallucinates.
         | 
         | Except with LLMs, you can run like 10 different models. With a
         | human, you owe $120 and are taking medicine.
        
           | KaiserPro wrote:
           | > With a human, you owe $120 and are taking medicine.
           | 
           | Well there are protocols, procedures and a bunch of checks
           | and balances.
           | 
           | The problem with the LLM is that there isn't any, its you vs
           | one shot retrieval.
        
           | pton_xd wrote:
           | Except with a human there's a counter-party with assets or
           | insurance who assumes liability for mistakes.
           | 
           | Although presumably if a company is making decisions using an
           | LLM, and the LLM makes a mistake, the company would still be
           | held liable ... probably.
           | 
           | If there's no "damage" from the mistake then it doesn't
           | matter either way.
        
         | simonw wrote:
         | You catch it. The hallucinated title will fail to match the
         | retrieved text based on the reference ID.
         | 
         | If it hallucinates an incorrect (but valid) reference ID then
         | hopefully your users can spot that the quoted text has no
         | relevance to their question.
        
       | resource_waste wrote:
       | I feel like this is the perfect application of running the data
       | multiple times.
       | 
       | Imagine having ~10-100 different LLMs, maybe some are medical,
       | maybe some are general, some are from a different language. Have
       | them all run it, rank the answers.
       | 
       | Now I believe this can further be amplified by having another
       | prompt ask to confirm the previous answer. This could get a bit
       | insane computationally with 100 original answers, but I believe
       | the original paper I read was that by doing this prompt
       | processing ~4 times, they got to some 95% accuracy.
       | 
       | So 100 LLMs give an answer, each time we process it 4 times, can
       | we beat a 64 year old doctor?
        
       | simonw wrote:
       | I like this a lot. I've been telling people for a while that
       | asking for direct quotations in LLM output - which you can then
       | "fact-check" by confirming them against the source document - is
       | a useful trick. But that still depends on people actually doing
       | that check, which most people won't do.
       | 
       | I'd thought about experimenting with automatically validating
       | that the quoted text does indeed 100% match the original source,
       | but should even a tweak to punctuation count as a failure there?
       | 
       | The proposed deterministic quoting mechanism feels like a much
       | simpler and more reliable way to achieve the same effect.
        
       | jonathan-adly wrote:
       | I built and sold a company that does this a year ago. It was hard
       | 2 years ago, but now pretty standard RAG with a good
       | implementation will get you there.
       | 
       | The trick is, healthcare users would complain to no end about
       | determinism. But, these are "below-the-line" user - aka, folks
       | who don't write checks and the AI is better than them. (I am a
       | pharmacist by training, and plain vanilla GPT4-turbo is better
       | than me).
       | 
       | Don't really worry about them. The folks who are interested and
       | willing to pay for AI has more practical concerns - like what is
       | my ROI and the implementation like.
       | 
       | Also - folks should be building Baymax from big hero 6 by now
       | (the medical capabilities, not the rocket arm stuff). That's the
       | next leg up.
        
       | not2b wrote:
       | I was thinking that something like this could be useful for
       | discovery in legal cases, where a company might give up a
       | gigabyte or more of allegedly relevant material in response to
       | recovery demands and the opposing side has to plow through it to
       | find the good stuff. But then I thought of a countermeasure:
       | there could be messages in the discovery material that act as
       | instructions to the LLM, telling it what it should _not_ find. We
       | can guarantee that any reports generated will contain accurate
       | quotes, even where they are so that surrounding context can be
       | found. But perhaps, if the attacker controls the input data,
       | things can be missed. And it could be done in a deniable way:
       | email conversations talking about LLMs that also have keywords
       | related to the lawsuit.
        
       | burntcaramel wrote:
       | Is there existing terms of art for this concept? It's not like
       | slightly unreliable writers is a new concept, such as a student
       | writing a paper.
       | 
       | For example:
       | 
       | - Authoritative reference:
       | https://www.montana.edu/rmaher/ee417/Authoritative%20Referen...
       | 
       | - Authoritative source:
       | https://piedmont.libanswers.com/faq/135714
        
       | mattyyeung wrote:
       | Author here, thanks for your interest! Surprising way to wake up
       | in the morning. Happy to answer questions
        
       ___________________________________________________________________
       (page generated 2024-05-07 23:01 UTC)