[HN Gopher] Binary Retrieval-Augmented Reward Mitigates Hallucin...
       ___________________________________________________________________
        
       Binary Retrieval-Augmented Reward Mitigates Hallucinations
        
       Author : MarlonPro
       Score  : 32 points
       Date   : 2025-10-21 16:14 UTC (6 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | amflare wrote:
       | > Existing mitigation approaches often degrade performance on
       | open-ended generation and downstream tasks, limiting their
       | practical utility. [...] Unlike continuous reward schemes, our
       | approach assigns a reward of one only when the model's output is
       | entirely factually correct, and zero otherwise.
       | 
        | Someone correct me if I am wrong, as I'm on the very edge of
        | this space looking in, but does this mean that they are using a
        | "degraded performance with fewer hallucinations" model to fact-
        | check the "more powerful yet prone to hallucinations" model?
        
         | svnt wrote:
         | Also on the edge, but it appears they are relying on the
         | search-augmented identification of conflicts in the generated
         | statement, which is an easier task than constructing an answer
          | to the question. It also encourages abstention, because there
          | are no conflicts in "I don't know" (so "mitigating
          | hallucinations" and "answering more questions correctly" are
          | not necessarily the same thing).
        
         | mNovak wrote:
         | My understanding is no, they are collecting a cache of
         | documents from the training set, then after pre-training prompt
         | about those topics. A separate verifier is given both the
         | relevant source documents and generated response, and tasked
         | with checking for conflicts in factuality.
         | 
         | They describe using Qwen 32B as the verifier, and the model
         | under training is Qwen 8B. So in fact the verifier is beefier
         | than the trainee model, though it's unclear if that has to be
         | the case as you scale up.
        
       ___________________________________________________________________
       (page generated 2025-10-21 23:01 UTC)