[HN Gopher] LLMs Encode How Difficult Problems Are
___________________________________________________________________
LLMs Encode How Difficult Problems Are
Author : stansApprentice
Score : 73 points
Date : 2025-11-06 18:29 UTC (4 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| jiito wrote:
| I haven't read this particular paper in depth, but it reminds me
| of another one I saw that used a similar approach to test whether
| the model encodes its own certainty of answering correctly.
| https://arxiv.org/abs/2509.10625
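|
| If I understood the approach right, both papers train a simple
| linear probe on frozen hidden states. A rough sketch of that
| general idea, with placeholder data and names rather than the
| authors' code:
|
|     # Linear probe: if plain logistic regression on frozen
|     # activations predicts the label well above chance, the
|     # information is linearly readable from the representation.
|     import numpy as np
|     from sklearn.linear_model import LogisticRegression
|     from sklearn.model_selection import train_test_split
|
|     rng = np.random.default_rng(0)
|     # Placeholders for real per-problem activations and labels,
|     # e.g. the residual stream at the final prompt token.
|     X = rng.normal(size=(2000, 4096)).astype(np.float32)
|     y = rng.integers(0, 2, size=2000)  # e.g. solved correctly or not
|
|     X_tr, X_te, y_tr, y_te = train_test_split(
|         X, y, test_size=0.25, random_state=0)
|     probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
|     print("probe accuracy:", probe.score(X_te, y_te))
|
| (On random placeholder data the probe will of course sit at
| chance; the point is just the shape of the recipe.)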
| kazinator wrote:
| It's all very clear when you mentally replace "LLM" with "text
| completion driven by compressed training data".
|
| E.g.
|
| [Text completion driven by compressed training data] exhibit[s] a
| puzzling inconsistency: [it] solves complex problems yet
| frequently fail[s] on seemingly simpler ones.
|
| Some problems are better represented by a locus of texts in the
| training data, allowing more plausible talk to be generated. When
| the problem is not well represented, it does not help that the
| problem is simple.
|
| If you train it on nothing but Scientology documents, and then
| ask about the Buddhist perspective on a situation, you will
| probably get some nonsense about body thetans, even if the
| situation is simple.
| th0ma5 wrote:
| Thank you for posting this. I'm struck by how much of this work
| studies a behavior in isolation from other assumptions, and then
| describes that isolated capability as a new solution or
| discovered ability that will supposedly still hold once all the
| other assumptions are back in play. It makes most LLM research
| feel like whack-a-mole if the goal is to build accurate and
| reliable models by understanding these techniques. Instead it's
| more like seeing faces in cars and buildings: recognizing
| patterns in the artifacts of other patterns. Building houses on
| sand, etc.
| lukev wrote:
| Well, that's what an LLM _is_. The problem is if one's mental
| model is built on "AI" instead of "LLM."
|
| The fact that LLMs can abstract concepts and do _any_ amount of
| out-of-sample reasoning is impressive and interesting, but the
| null hypothesis for an LLM being "impressive" in any regard is
| that the data required to answer the question is present in its
| training set.
| XenophileJKO wrote:
| This is true, but also misleading. We are learning that the
| models achieve compression by distilling higher-level concepts
| and deriving generalized, human-like abilities; see, for
| example, the recent introspection paper from Anthropic.
| layoric wrote:
| I have a hard time conceptualizing lossy text compression, but
| I've recently started to think of the "reasoning" output as just
| a byproduct of lossy compression, with weights tending towards an
| average of the information "around" the main topic of the prompt.
| What I've found easier is thinking about it like lossy image
| compression: generating more output tokens via "reasoning" is
| like subdividing nearby pixels and filling in the gaps with
| values the model has seen there before. Taking the analogy a bit
| too far, you can also think of the vocabulary as the pixel bit
| depth.
|
| I definitely agree that replacing "AI" or "LLM" with "X driven by
| compressed training data" makes a lot more sense, and is a useful
| shortcut.
| suprjami wrote:
| You're right about "reasoning". It's just trying to steer the
| conversation in a more relevant direction in vector space,
| hopefully to generate more relevant output tokens. I find it
| easier to conceptualize this in three dimensions. 3blue1brown
| has a good video series which covers the overall concept of
| LLM vectors in machine learning: https://youtube.com/playlist
| ?list=PLZHQObOWTQDNU6R1_67000Dx_...
|
| To give a concrete example, say we're generating the next
| token from the word "queen". Is this the monarch, the bee,
| the playing card, the drag entertainer? By adding more
| relevant tokens (honey, worker, hive, beeswax) we steer the
| token generation to the place in the "word cloud" where our
| next token is more likely to exist.
|
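| If you want to see that steering concretely, contextual
| embeddings make it visible. A rough sketch (assuming the Hugging
| Face transformers package and bert-base-uncased, purely for
| illustration):
|
|     # The same surface token "queen" lands in different
|     # neighborhoods once the surrounding tokens steer it.
|     import torch
|     from transformers import AutoModel, AutoTokenizer
|
|     tok = AutoTokenizer.from_pretrained("bert-base-uncased")
|     model = AutoModel.from_pretrained("bert-base-uncased").eval()
|
|     def queen_vec(sentence):
|         enc = tok(sentence, return_tensors="pt")
|         with torch.no_grad():
|             hidden = model(**enc).last_hidden_state[0]
|         idx = enc.input_ids[0].tolist().index(
|             tok.convert_tokens_to_ids("queen"))
|         return hidden[idx]
|
|     bee1 = queen_vec("the queen tends the hive with worker bees")
|     bee2 = queen_vec("honey depends on the queen of the hive")
|     royal = queen_vec("the queen addressed parliament today")
|
|     # Expectation: the bee-context pair should score higher
|     # than bee vs. royal.
|     cos = torch.nn.functional.cosine_similarity
|     print("bee vs bee:  ", cos(bee1, bee2, dim=0).item())
|     print("bee vs royal:", cos(bee1, royal, dim=0).item())
|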
| I don't see LLMs as "lossy compression" of text. To me that
| implies retrieval, and Transformers are a prediction device,
| not a retrieval device. If one needs retrieval, use a database.
| astrange wrote:
| It is not a useful shortcut because you don't know what the
| training data is, nothing requires it to be an "average" of
| anything, and post-training arbitrarily re-weights all of its
| existing distributions anyway.
| onraglanroad wrote:
| > Text completion driven by compressed training data...solves
| complex problems
|
| Sure it does. Obviously. All we ever needed was some text
| completion.
|
| Thanks for your valuable insight.
| WhyOhWhyQ wrote:
| Probably irrelevant, but something funny about Claude Code is
| that it will routinely say something like "10-week task, very
| complex" and then one-shot it in 2 minutes. For a while I didn't
| have it create a feature because it kept telling me the feature
| was way too complicated. None of the open-source versions I tried
| were working, so I finally just decided to have it make the
| feature anyway, and it ended up doing better than the open-source
| projects. So there's something off about how well Claude
| estimates the difficulty of things for itself, and I wonder
| whether that makes it perform worse by avoiding things it would
| actually do well at.
| danielbln wrote:
| In terms of the time estimates: I've added to my global rules
| to never give time estimates for tasks, as they're useless and
| inaccurate.
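|
| For reference, the rule looks something like this in the global
| memory file (~/.claude/CLAUDE.md in my setup):
|
|     # Estimates
|     - Never give time estimates for tasks.
|     - If asked how long something will take, describe scope and
|       risk instead of wall-clock time.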
| jives wrote:
| I wonder if it's trying to predict what kind of estimate a
| human engineer would provide.
| EGreg wrote:
| Considering it's trained on predicting the next word in stuff
| humans estimated before AI, wouldn't that make sense?
| bartwe wrote:
| Sounds a lot like Kolmogorov complexity.
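|
| (Roughly: the Kolmogorov complexity of a string x is the length
| of the shortest program that outputs x on a fixed universal
| machine U,
|
|     K_U(x) = min { |p| : U(p) = x }
|
| and it's uncomputable in general, so any difficulty signal a
| model encodes is at best a learned proxy for it.)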
___________________________________________________________________
(page generated 2025-11-06 23:00 UTC)