[HN Gopher] LLMs as Unbiased Oracles
___________________________________________________________________
LLMs as Unbiased Oracles
Author : MarcoDewey
Score : 23 points
Date : 2025-05-04 18:45 UTC (4 hours ago)
(HTM) web link (jazzberry.ai)
(TXT) w3m dump (jazzberry.ai)
| Jensson wrote:
| > An LLM, specifically trained for test generation, consumes this
| specification. Its objective is to generate a diverse and
| comprehensive test suite that probes the specified behavior from
| an external perspective.
|
| If one of these tests is wrong, though, it will ruin the whole
| thing. And an LLM is much more likely to make a math error
| (which would result in a faulty test) than to implement a math
| function the wrong way, so this probably won't make it better
| at generating code.
| satisfice wrote:
| If your premises and assumptions are sufficiently corrupted, you
| can come to any conclusion and believe you are being rational.
| Like those dreams where you walk around without pants on and you
| are more worried about not having pants than about how your
| pants keep going missing. Your brain is not present enough to
| find the root of the problem.
|
| An LLM is not unbiased, and you would know that if you tested
| LLMs.
|
| Apart from biases, an LLM is not a reliable oracle; you would
| know that if you tested LLMs.
|
| The reliabilities and unreliabilities of LLMs vary in
| discontinuous and unpredictable ways from task to task, model to
| model, and within the same model over time. You would know this
| if you tested LLMs. I have. Why haven't you?
|
| Ideas like this are promoted by people who don't like testing,
| and don't respect it. That explains why a concept like this is
| treated as equivalent to a tested fact. There is a name for it:
| wishful thinking.
| walterbell wrote:
| _> wishful thinking_
|
| Given the economic component of LLM wishes, we can look at
| prior instances of wishing-at-scale,
| https://en.wikipedia.org/wiki/Tulip_mania
| troupo wrote:
| There's a more recent one:
| https://blog.mollywhite.net/blockchain/
| TazeTSchnitzel wrote:
| Is this a blog post that's incomplete or a barely disguised ad?
| saagarjha wrote:
| You'd think AI would have told them not to post it
| brahyam wrote:
| The amount of time it would take to write the formal spec for the
| code I need is more than it would take to generate the code, so
| this doesn't sound like something that will go mainstream, except
| in those industries where formal code specs are already in place.
| MarcoDewey wrote:
| Yes, this test-driven approach will likely increase generation
| time upfront. However, the payoff is more reliable code being
| generated. This will lead to less debugging and fewer reprompts
| overall, which saves time in the long run.
|
| Also agree on the specification formality. Even a less formal
| spec provides a clearer boundary for the LLM during code
| generation, which should improve code generation results.
| bluefirebrand wrote:
| LLMs are absolutely biased
|
| They are biased by the training dataset, which probably also
| reflects the biases of the people who select the training dataset
|
| They are biased by the system prompts that are embedded into
| every request to keep them on the rails
|
| They are even biased by the prompt that you write into them,
| which can lead them to incorrect conclusions if you design the
| prompt to lead them to it
|
| I think it is a very careless mistake to think of LLMs as
| unbiased or neutral in any way
| MarcoDewey wrote:
| You are correct that the notion of LLMs being completely
| unbiased or neutral does not make sense due to how they are
| trained. Perhaps my title is even misleading if taken at face
| value.
|
| When I talk about "unbiased oracles" I am speaking in the
| context of black box testing. I'm not suggesting they are free
| from all forms of bias. Instead, the key distinction I'm trying
| to draw is their lack of implementation-level bias towards the
| specific code they are testing.
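The distinction MarcoDewey draws can be illustrated with a property-based check: the test below is derived only from a specification ("the output is sorted and is a permutation of the input") and never inspects the code under test. All names here, including `sort_under_test`, are hypothetical stand-ins, not from the article.

```python
import random
from collections import Counter

def sort_under_test(xs):
    # Stand-in for the black-box implementation being probed; the
    # spec checks below know nothing about how it works internally.
    return sorted(xs)

def spec_holds(inp, out):
    """Properties taken from the specification alone."""
    in_order = all(a <= b for a, b in zip(out, out[1:]))
    same_multiset = Counter(inp) == Counter(out)
    return in_order and same_multiset

# Probe the external behavior on randomly generated inputs.
for _ in range(100):
    xs = [random.randint(-50, 50) for _ in range(random.randint(0, 20))]
    assert spec_holds(xs, sort_under_test(xs))
```

A test generator working at this level cannot inherit the implementation's blind spots, which is the narrow sense of "unbiased" being claimed; the earlier objections about training-data and prompt bias still apply to the generator itself.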
| neuroelectron wrote:
| Yeah that would be cool
| fallinditch wrote:
| I think it makes a lot of sense to employ various specialized
| LLMs in the software development lifecycle: one that's good at
| ideation and product development, one that fronts the
| organizational knowledge base, one for testing code, one (or
| more) for coding, etc, maybe even one whose job it is to always
| question your assumptions.
___________________________________________________________________
(page generated 2025-05-04 23:00 UTC)