[HN Gopher] LLMs as Unbiased Oracles
       ___________________________________________________________________
        
       LLMs as Unbiased Oracles
        
       Author : MarcoDewey
       Score  : 23 points
       Date   : 2025-05-04 18:45 UTC (4 hours ago)
        
 (HTM) web link (jazzberry.ai)
 (TXT) w3m dump (jazzberry.ai)
        
       | Jensson wrote:
       | > An LLM, specifically trained for test generation, consumes this
       | specification. Its objective is to generate a diverse and
       | comprehensive test suite that probes the specified behavior from
       | an external perspective.
       | 
       | If one of these tests is wrong, though, it will ruin the
       | whole thing. And an LLM is much more likely to make a math
       | error (which would result in a faulty test) than to
       | implement a math function the wrong way, so this probably
       | won't make it better at generating code.
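       | 
       | A minimal sketch of that failure mode (the isqrt example is
       | mine, not from the article): one arithmetic slip in a
       | generated expectation makes a correct implementation "fail"
       | its own suite.
       | 
       |     import math
       | 
       |     def isqrt(n: int) -> int:
       |         """Correct integer square root: floor(sqrt(n))."""
       |         return math.isqrt(n)
       | 
       |     # (arg, expected) pairs an LLM might emit from the spec.
       |     generated_tests = [
       |         (0, 0),
       |         (15, 3),
       |         (16, 4),
       |         (24, 5),  # math error: floor(sqrt(24)) is 4, not 5
       |     ]
       | 
       |     for arg, expected in generated_tests:
       |         assert isqrt(arg) == expected, f"isqrt({arg}) != {expected}"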
        
       | satisfice wrote:
       | If your premises and assumptions are sufficiently corrupted, you
       | can come to any conclusion and believe you are being rational.
       | Like those dreams where you walk around without pants on and you
       | are more worried about not having pants than you are about how it
       | could have come to be that your pants kept going missing. Your
       | brain is not present enough to find the root of the problem.
       | 
       | An LLM is not unbiased, and you would know that if you tested
       | LLMs.
       | 
       | Apart from biases, an LLM is not a reliable oracle; you
       | would know that if you tested LLMs.
       | 
       | The reliabilities and unreliabilities of LLMs vary in
       | discontinuous and unpredictable ways from task to task, model to
       | model, and within the same model over time. You would know this
       | if you tested LLMs. I have. Why haven't you?
       | 
       | Ideas like this are promoted by people who don't like testing,
       | and don't respect it. That explains why a concept like this is
       | treated as equivalent to a tested fact. There is a name for it:
       | wishful thinking.
        
         | walterbell wrote:
         | _> wishful thinking_
         | 
         | Given the economic component of LLM wishes, we can look at
         | prior instances of wishing-at-scale,
         | https://en.wikipedia.org/wiki/Tulip_mania
        
           | troupo wrote:
           | There's a more recent one:
           | https://blog.mollywhite.net/blockchain/
        
       | TazeTSchnitzel wrote:
       | Is this a blogpost that's incomplete or a barely disguised ad?
        
         | saagarjha wrote:
         | You'd think AI would have told them not to post it
        
       | brahyam wrote:
       | The amount of time it would take to write the formal spec for the
       | code I need is more than it would take to generate the code so
       | doesn't sound like something that will go mainstream. Except for
       | those industries where formal code specs are already in place.
        
         | MarcoDewey wrote:
         | Yes, this test-driven approach will likely increase
         | generation time upfront. However, the payoff is more
         | reliable generated code, which means less debugging and
         | fewer reprompts overall and saves time in the long run.
         | 
         | I also agree on the specification formality. Even a less
         | formal spec gives the LLM a clearer boundary during code
         | generation, which should improve the results.
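         | 
         | As a rough sketch of what even an informal spec buys you
         | (the slugify function and its contract here are
         | hypothetical), a docstring contract plus one property-based
         | test, using the hypothesis library, already bounds the
         | generator:
         | 
         |     import re
         |     from hypothesis import given, strategies as st
         | 
         |     def slugify(s: str) -> str:
         |         """Spec: lowercase; runs of whitespace become '-';
         |         anything outside [a-z0-9-] is dropped."""
         |         s = re.sub(r"\s+", "-", s.strip().lower())
         |         return re.sub(r"[^a-z0-9-]", "", s)
         | 
         |     ALLOWED = set("abcdefghijklmnopqrstuvwxyz0123456789-")
         | 
         |     # The test is derived from the contract, not the body.
         |     @given(st.text())
         |     def test_output_charset(s):
         |         assert set(slugify(s)) <= ALLOWED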
        
       | bluefirebrand wrote:
       | LLMs are absolutely biased
       | 
       | They are biased by the training dataset, which probably also
       | reflects the biases of the people who select the training dataset
       | 
       | They are biased by the system prompts that are embedded into
       | every request to keep them on the rails
       | 
       | They are even biased by the prompt you write, which can
       | lead them to incorrect conclusions if you design the prompt
       | to lead them there
       | 
       | I think it is a very careless mistake to think of LLMs as
       | unbiased or neutral in any way
        
         | MarcoDewey wrote:
         | You are correct that the notion of LLMs being completely
         | unbiased or neutral does not make sense due to how they are
         | trained. Perhaps my title is even misleading if taken at face
         | value.
         | 
         | When I talk about "unbiased oracles" I am speaking in the
         | context of black box testing. I'm not suggesting they are free
         | from all forms of bias. Instead, the key distinction I'm trying
         | to draw is their lack of implementation-level bias towards the
         | specific code they are testing.
        
       | neuroelectron wrote:
       | Yeah that would be cool
        
       | fallinditch wrote:
       | I think it makes a lot of sense to employ various specialized
       | LLMs in the software development lifecycle: one that's good at
       | ideation and product development, one that fronts the
       | organizational knowledge base, one for testing code, one (or
       | more) for coding, etc, maybe even one whose job it is to always
       | question your assumptions.
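       | 
       | In its simplest form that might just be a table of role
       | prompts plus a dispatcher (all names here are hypothetical;
       | no particular vendor API is implied):
       | 
       |     ROLES = {
       |         "ideation":  "You brainstorm and refine product ideas.",
       |         "knowledge": "You answer only from the org knowledge base.",
       |         "testing":   "You write tests from the spec alone.",
       |         "coding":    "You implement code to the given spec.",
       |         "critic":    "You question every stated assumption.",
       |     }
       | 
       |     def messages_for(role: str, prompt: str) -> list[dict]:
       |         """Build the chat messages for whichever model backs
       |         this role (the actual backend call is omitted)."""
       |         return [
       |             {"role": "system", "content": ROLES[role]},
       |             {"role": "user", "content": prompt},
       |         ]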
        
       ___________________________________________________________________
       (page generated 2025-05-04 23:00 UTC)