[HN Gopher] Context Rot: How increasing input tokens impacts LLM...
       ___________________________________________________________________
        
       Context Rot: How increasing input tokens impacts LLM performance
        
       I work on research at Chroma, and I just published our latest
       technical report on context rot.  TLDR: Model performance is non-
       uniform across context lengths, including state-of-the-art GPT-4.1,
       Claude 4, Gemini 2.5, and Qwen3 models.  This highlights the need
       for context engineering. Whether relevant information is present
       in a model's context is not all that matters; how that
       information is presented matters more. Here is the complete
       open-source codebase to replicate our results:
       https://github.com/chroma-core/context-rot
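        
       As a toy illustration of the kind of probe involved (not the
       report's actual harness, which lives in the repo above; the
       filler text, model name, and token counts here are
       placeholders), one can bury a known fact in growing amounts of
       distractor text and check whether retrieval still succeeds:
        
         # Toy context-length probe: bury a fact ("needle") in growing
         # amounts of distractor text and see if the model still finds it.
         # Placeholder prompts/model; not the report's actual harness.
         from openai import OpenAI
        
         client = OpenAI()  # assumes OPENAI_API_KEY is set
        
         NEEDLE = "The access code for the vault is 7141."
         QUESTION = "What is the access code for the vault?"
         FILLER = "The weather was unremarkable that day. "  # ~10 tokens
        
         for n in [10, 100, 1_000, 10_000]:  # ~100 to ~100k filler tokens
             haystack = FILLER * n
             mid = len(haystack) // 2
             # Place the needle in the middle of the distractor text.
             prompt = haystack[:mid] + NEEDLE + haystack[mid:]
             resp = client.chat.completions.create(
                 model="gpt-4.1",
                 messages=[{"role": "user",
                            "content": prompt + "\n\n" + QUESTION}],
             )
             answer = resp.choices[0].message.content or ""
             print(f"{n:>6} filler blocks -> found: {'7141' in answer}")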
        
       Author : kellyhongsn
       Score  : 34 points
       Date   : 2025-07-14 19:25 UTC (3 hours ago)
        
 (HTM) web link (research.trychroma.com)
 (TXT) w3m dump (research.trychroma.com)
        
       | tjkrusinski wrote:
        | Interesting report. Are there recommended context sizes for
        | different models? How do I know what works or doesn't for my
        | use case?
        
       | posnet wrote:
       | I've definitely noticed this anecdotally.
       | 
        | Especially with Gemini Pro when providing long-form textual
        | references: providing many documents in a single context window
        | gives worse answers than having it summarize the documents
        | first, asking a question about the summaries only, then
        | providing the full text of the sub-documents on request (RAG
        | style, or just a simple agent loop).
        | 
        | Similarly, I've personally noticed that Claude Code with Opus or
        | Sonnet gets worse the more compactions happen. It's unclear to
        | me whether it's just that the summary gets worse, or that the
        | context window has a higher percentage of less relevant data,
        | but even clearing the context and asking it to re-read the
        | relevant files (even if they were mentioned and summarized in
        | the compaction) gives better results.
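        | 
        | Roughly, that summarize-first flow looks like the sketch below
        | (model name, prompts, and helper names are placeholders, not
        | any particular product's API):
        | 
        |   from openai import OpenAI
        | 
        |   client = OpenAI()
        | 
        |   def ask(prompt: str) -> str:
        |       # Placeholder model; the point is the shape of the loop.
        |       resp = client.chat.completions.create(
        |           model="gpt-4.1",
        |           messages=[{"role": "user", "content": prompt}],
        |       )
        |       return resp.choices[0].message.content or ""
        | 
        |   def answer_over_corpus(docs: dict[str, str], question: str) -> str:
        |       # Pass 1: one short summary per document, instead of
        |       # shoving the whole corpus into one context window.
        |       summaries = {name: ask(f"Summarize in 3 sentences:\n{text}")
        |                    for name, text in docs.items()}
        |       listing = "\n".join(f"- {n}: {s}" for n, s in summaries.items())
        | 
        |       # Pass 2: let the model pick which full documents it needs.
        |       picks = ask(f"Question: {question}\n\n"
        |                   f"Document summaries:\n{listing}\n\n"
        |                   "Reply with only the names of the documents "
        |                   "needed, comma-separated.")
        |       chosen = [n.strip() for n in picks.split(",")
        |                 if n.strip() in docs]
        | 
        |       # Pass 3: answer over the full text of just those documents.
        |       context = "\n\n".join(f"## {n}\n{docs[n]}" for n in chosen)
        |       return ask(f"{context}\n\nQuestion: {question}")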
        
         | tough wrote:
          | Have you tried NotebookLM? It basically does this as an app
          | in the background (chunking and summarising many docs), and
          | you can chat with the full corpus using RAG.
        
         | zwaps wrote:
          | Gemini loses coherence and reasoning ability well before the
          | chat hits the context limit, and according to this report it
          | is the best model on several dimensions.
          | 
          | Long story short: context engineering is still king, and RAG
          | is not dead.
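          | 
          | A minimal sketch of what "RAG is not dead" means in
          | practice, using Chroma's local client (toy paragraph
          | chunking, placeholder documents): retrieve only the
          | relevant chunks instead of stuffing the whole corpus into
          | the prompt.
          | 
          |   import chromadb
          | 
          |   client = chromadb.Client()  # in-memory instance
          |   collection = client.create_collection("docs")
          | 
          |   documents = {
          |       "report.txt": "Long report text...\n\nMore paragraphs...",
          |       "notes.txt": "Meeting notes...\n\nAction items...",
          |   }
          | 
          |   # Naive chunking: one chunk per paragraph.
          |   for name, text in documents.items():
          |       chunks = [p for p in text.split("\n\n") if p.strip()]
          |       collection.add(
          |           documents=chunks,
          |           ids=[f"{name}-{i}" for i in range(len(chunks))],
          |           metadatas=[{"source": name}] * len(chunks),
          |       )
          | 
          |   # Put only the top few relevant chunks into the model's
          |   # context, instead of the whole corpus.
          |   results = collection.query(
          |       query_texts=["what were the action items?"],
          |       n_results=3,
          |   )
          |   for chunk in results["documents"][0]:
          |       print(chunk)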
        
       | zwaps wrote:
       | Very cool results, very comprehensive article, many insights!
       | 
       | Media literacy disclaimer: Chroma is a vectorDB company.
        
         | philip1209 wrote:
          | Chroma does vector, full-text, and regex search, and it's
          | designed for the multitenant workloads typical of AI
          | applications. So it's not just a "vectorDB company".
        
       | tough wrote:
        | This felt intuitively true; it's great to see some research
        | putting hard numbers on it.
        
       ___________________________________________________________________
       (page generated 2025-07-14 23:00 UTC)