[HN Gopher] OpenAI's new reasoning AI models hallucinate more
___________________________________________________________________
OpenAI's new reasoning AI models hallucinate more
Author : almog
Score : 5 points
Date : 2025-04-18 22:43 UTC (17 minutes ago)
(HTM) web link (techcrunch.com)
(TXT) w3m dump (techcrunch.com)
| rzz3 wrote:
| Does anyone have any technical insight on what actually causes
| the hallucinations? I know it's an ongoing area of research, but
| do we have a lead?
| pkaye wrote:
| Anthropic had a recent paper that might be of interest.
|
| https://www.anthropic.com/research/tracing-thoughts-language...
| minimaxir wrote:
| At a high level, what _causes_ hallucinations is an easier
| question than how to solve them.
|
 | LLMs are pretrained to maximize the probability of token n+1
 | given the first n tokens. To do this reliably, the model
 | learns statistical patterns in the source data, and
 | transformer models are very good at that when large enough
 | and given enough data. The model is therefore susceptible to
 | any statistical biases in the training data, because despite
 | many advances in guiding LLMs, e.g. RLHF, LLMs are not
 | sentient, and most approaches to get around that, such as the
 | current reasoning models, are hacks on top of a fundamental
 | problem with the approach.
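 |
 | A minimal sketch of that objective at inference time
 | (illustrative Python with a toy vocabulary and made-up
 | logits, not any real model's code):
 |
 |     import math
 |
 |     def softmax(logits):
 |         # turn raw scores into a probability distribution
 |         m = max(logits)
 |         exps = [math.exp(x - m) for x in logits]
 |         total = sum(exps)
 |         return [e / total for e in exps]
 |
 |     # hypothetical next-token candidates and scores
 |     vocab = ["Paris", "London", "banana"]
 |     logits = [4.0, 2.5, 0.1]
 |     probs = softmax(logits)
 |
 |     # training pushes probability mass toward the observed
 |     # next token; nothing in this objective checks factual
 |     # accuracy
 |     print(dict(zip(vocab, probs)))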
|
 | It also doesn't help that when sampling tokens, the default
 | temperature in most LLM UIs is 1.0, on the argument that it
 | is better for creativity. If you have access to the API and
 | want a specific answer more reliably, I recommend setting
 | temperature = 0.0, in which case the model always selects the
 | token with the highest probability and tends to be more
 | correct.
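 |
 | For example, with the official openai Python SDK (a sketch:
 | the model name and prompt are placeholders):
 |
 |     from openai import OpenAI
 |
 |     client = OpenAI()  # reads OPENAI_API_KEY from the env
 |
 |     resp = client.chat.completions.create(
 |         model="gpt-4o",  # placeholder model name
 |         messages=[
 |             {"role": "user", "content": "What is 17 * 23?"}
 |         ],
 |         temperature=0.0,  # greedy: top-probability token
 |     )
 |     print(resp.choices[0].message.content)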
| vikramkr wrote:
 | There's the Anthropic paper someone else linked, but it's
 | also pretty interesting to see the question framed as trying
 | to understand what causes the hallucinations lol. It's a
 | (very fancy) next-word predictor - it's kind of amazing that
 | it doesn't hallucinate more! That paper showed that there are
 | circuits that functionally do things resembling arithmetic
 | and computation with lookup tables, instead of just blindly
 | 'guessing' a random number when asked what an arithmetic
 | expression equals - and that seems like the much more
 | extraordinary thing that we want to figure out the cause of!
| serjester wrote:
 | Anecdotally, o3 is the first OpenAI model in a while where I
 | have to double-check whether it's dropping important pieces
 | of my code.
___________________________________________________________________
(page generated 2025-04-18 23:01 UTC)