[HN Gopher] Context Rot: How increasing input tokens impacts LLM...
___________________________________________________________________
Context Rot: How increasing input tokens impacts LLM performance
I work on research at Chroma, and I just published our latest
technical report on context rot. TL;DR: Model performance is non-
uniform across context lengths, even for state-of-the-art models
like GPT-4.1, Claude 4, Gemini 2.5, and Qwen3. This highlights the
need for context engineering: whether relevant information is
present in a model's context is not all that matters; how that
information is presented matters even more. Here is the complete
open-source codebase to replicate our results:
https://github.com/chroma-core/context-rot
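
As a rough illustration of the kind of measurement involved (a
minimal sketch, not the report's actual harness; the model name,
filler text, and "needle" fact are all placeholders):

    # Fix one relevant fact (the "needle"), vary only the amount
    # of surrounding filler, and track accuracy as a function of
    # total input length.
    from openai import OpenAI

    client = OpenAI()
    NEEDLE = "The access code for the archive is 7421."
    QUESTION = "What is the access code for the archive?"
    FILLER = "The weather was unremarkable that day. " * 50

    def accuracy_at_length(n_blocks: int, trials: int = 10) -> float:
        """Embed the needle mid-context, check exact-match recall."""
        pad = FILLER * n_blocks
        half = len(pad) // 2
        prompt = pad[:half] + NEEDLE + pad[half:]
        hits = 0
        for _ in range(trials):
            resp = client.chat.completions.create(
                model="gpt-4.1",  # placeholder long-context model
                messages=[{"role": "user",
                           "content": prompt + "\n\n" + QUESTION}],
            )
            hits += "7421" in (resp.choices[0].message.content or "")
        return hits / trials

    for n in (1, 10, 100, 500):  # sweep of context lengths
        print(n, accuracy_at_length(n))
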
Author : kellyhongsn
Score : 34 points
Date : 2025-07-14 19:25 UTC (3 hours ago)
(HTM) web link (research.trychroma.com)
(TXT) w3m dump (research.trychroma.com)
| tjkrusinski wrote:
| Interesting report. Are there recommended context sizes for
| different models? How do I know what works or doesn't for my use
| case?
| posnet wrote:
| I've definitely noticed this anecdotally.
|
| Especially with Gemini Pro when providing long-form textual
| references: putting many documents in a single context window
| gives worse answers than having the model summarize the documents
| first, asking a question against only the summaries, and then
| providing the full text of the relevant sub-documents on request
| (RAG-style, or just a simple agent loop).
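|
| A minimal sketch of that pattern (answer from summaries first,
| fetch a full document only on request; 'llm' here is any prompt-
| in, text-out callable, and the FETCH convention is made up for
| illustration):
|
|     def answer(question, docs, llm):
|         # docs: {doc_id: full_text}; llm: callable(str) -> str
|         summaries = {i: llm("Summarize briefly:\n" + text)
|                      for i, text in docs.items()}
|         listing = "\n".join(f"[{i}] {s}"
|                             for i, s in summaries.items())
|         reply = llm(listing + "\n\nQ: " + question +
|                     "\nAnswer from the summaries, or reply"
|                     " 'FETCH <id>' to read one document in full.")
|         # Simple agent loop: expand a summary into full text
|         # only when the model asks for a specific document.
|         while reply.startswith("FETCH"):
|             doc_id = reply.split(maxsplit=1)[1].strip()
|             reply = llm(docs[doc_id] + "\n\nQ: " + question)
|         return reply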
|
| Similarly, I've personally noticed that Claude Code with Opus or
| Sonnet gets worse the more compactions happen. It's unclear to me
| whether the summary itself degrades, or whether the context
| window ends up with a higher percentage of less relevant data,
| but even clearing the context and asking it to re-read the
| relevant files (even if they were mentioned and summarized in the
| compaction) gives better results.
| tough wrote:
| Have you tried NotebookLM? It basically does this as an app in
| the background (chunking and summarizing many docs), and you can
| chat with the full corpus using RAG.
| zwaps wrote:
| Gemini loses coherence and reasoning ability well before the
| chat hits the context limit, and according to this report it is
| the best model on several dimensions.
|
| Long story short: context engineering is still king, and RAG is
| not dead.
| zwaps wrote:
| Very cool results, very comprehensive article, many insights!
|
| Media literacy disclaimer: Chroma is a vectorDB company.
| philip1209 wrote:
| Chroma does vector, full-text, and regex search, and it's
| designed for the multitenant workloads typical of AI
| applications. So it's not just a "vectorDB company".
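|
| As a rough sketch of what that looks like with the Python client
| (an in-memory client for brevity; regex filtering depends on
| your Chroma version, so treat that part as an assumption):
|
|     import chromadb
|
|     # Ephemeral in-memory client; use a persistent or HTTP
|     # client for real multitenant deployments.
|     client = chromadb.Client()
|     col = client.get_or_create_collection("docs")
|     col.add(ids=["a", "b"],
|             documents=["alpha quarterly report",
|                        "beta incident report"])
|
|     # Vector (semantic) search over embeddings
|     col.query(query_texts=["what happened to beta?"],
|               n_results=1)
|
|     # Full-text filter on the raw document string; newer
|     # versions also support regex-style filters here.
|     col.query(query_texts=["report"], n_results=1,
|               where_document={"$contains": "incident"})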
| tough wrote:
| This felt intuitively true; great to see some research putting
| hard numbers on it.
___________________________________________________________________
(page generated 2025-07-14 23:00 UTC)