[HN Gopher] Claude vs. Gemini: Testing on 1M Tokens of Context
       ___________________________________________________________________
        
       Claude vs. Gemini: Testing on 1M Tokens of Context
        
       Author : dshipper
       Score  : 119 points
       Date   : 2025-08-12 16:59 UTC (6 hours ago)
        
 (HTM) web link (every.to)
 (TXT) w3m dump (every.to)
        
       | arnaudsm wrote:
       | https://archive.is/sb7D5
        
         | thefourthchime wrote:
          | Does anyone else have trouble with the archive rendering of
          | that? It seems to also have the pop-up.
        
           | sebastienbarre wrote:
            | You can delete the div with id=subscribe-popup in the dev
            | tools for a better view.
        
           | skarz wrote:
           | Try one of these. They have the popup but you can dismiss it.
           | 
           | https://ghostarchive.org/archive/JlE5T
           | 
            | https://web.archive.org/web/20250812172455/https://every.to/...
        
       | irthomasthomas wrote:
        | So sonnet-4 is faster than gemini-2.5-flash at long context.
        | That is surprising, especially since Gemini runs on those fast
        | TPUs.
        
         | jbellis wrote:
          | If they left them both on defaults, Flash is thinking-by-
          | default and Sonnet 4 is no-thinking-by-default.
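          | 
          | A rough sketch of pinning both to the same setting (SDK and
          | parameter names per the public docs; treat the model ids and
          | budgets as illustrative):
          | 
          |     # pip install anthropic google-genai
          |     import anthropic
          |     from google import genai
          |     from google.genai import types
          | 
          |     # Sonnet 4: thinking is off unless explicitly enabled
          |     anthropic.Anthropic().messages.create(
          |         model="claude-sonnet-4-20250514",  # illustrative id
          |         max_tokens=16000,
          |         thinking={"type": "enabled", "budget_tokens": 8000},
          |         messages=[{"role": "user", "content": "..."}],
          |     )
          | 
          |     # Gemini 2.5 Flash: thinking is on by default; a budget
          |     # of 0 disables it for a like-for-like speed comparison
          |     genai.Client().models.generate_content(
          |         model="gemini-2.5-flash",
          |         contents="...",
          |         config=types.GenerateContentConfig(
          |             thinking_config=types.ThinkingConfig(
          |                 thinking_budget=0)),
          |     )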
        
         | bitpush wrote:
         | > Claude's overall response was consistently around 500 words--
         | Flash and Pro delivered 3,372 and 1,591 words by contrast.
         | 
          | It isn't clear from the article whether the time they quote
          | is time-to-first-token or time to completion. If it is the
          | latter, it makes sense why the Gemini models would take
          | longer even with similar token throughput.
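          | 
          | The two are easy to tell apart with a streaming call. A
          | minimal sketch using the Anthropic SDK (model id
          | illustrative), timing the first chunk vs. the full response:
          | 
          |     import time
          |     import anthropic
          | 
          |     client = anthropic.Anthropic()
          |     start, first = time.monotonic(), None
          |     with client.messages.stream(
          |         model="claude-sonnet-4-20250514",
          |         max_tokens=1024,
          |         messages=[{"role": "user", "content": "..."}],
          |     ) as stream:
          |         for _text in stream.text_stream:
          |             if first is None:
          |                 # time-to-first-token
          |                 first = time.monotonic() - start
          |     total = time.monotonic() - start  # time to completion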
        
         | curl-up wrote:
          | Note that in the first test (the only one where output
          | length is reported), Gemini Pro returned more than 3x the
          | amount of text in less than 2x the time. From my experience
          | with Gemini, that time was probably mostly spent on
          | thinking, the length of which is not reported here. So
          | looking at pure output TPS, Gemini is faster, but without
          | clear info on the thinking time/length, it's impossible to
          | judge.
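          | 
          | The back-of-envelope, normalizing Claude's time to 1 (the
          | thread gives word counts but not exact times, so the "less
          | than 2x" figure above is the assumption):
          | 
          |     claude_words, claude_t = 500, 1.0  # normalized baseline
          |     pro_words, pro_t = 1591, 2.0       # "<2x the time"
          |     ratio = (pro_words / pro_t) / (claude_words / claude_t)
          |     print(f"{ratio:.2f}x")  # ~1.59x Claude's words/sec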
        
         | lugao wrote:
         | Anthropic also uses TPUs for inference.
        
           | irthomasthomas wrote:
           | Do they rent them from Google? Or are they a different brand?
        
             | ancientworldnow wrote:
             | Google provides them.
        
         | netdur wrote:
          | Output tokens must be generated in order (autoregressive
          | decoding), but inputs don't have that constraint, so prefill
          | is parallel. With stronger kernels, KV-cache handling, and
          | batching, Claude can outrun Gemini.
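          | 
          | A toy illustration of the asymmetry (nothing like any
          | vendor's real stack; ToyModel and its methods are made up):
          | 
          |     class ToyModel:
          |         def prefill(self, prompt):
          |             # all prompt positions are processed in one
          |             # batched pass; this is where fast kernels,
          |             # KV-cache layout, and batching pay off
          |             return [hash(t) % 97 for t in prompt]
          | 
          |         def decode_step(self, cache_sum, last):
          |             # one output token per forward pass; there is
          |             # no parallelizing across output positions
          |             return (cache_sum + last) % 97
          | 
          |     m = ToyModel()
          |     cache = m.prefill(range(100_000))  # one parallel pass
          |     cache_sum = sum(cache)
          |     out, tok = [], 0
          |     for _ in range(500):  # 500 strictly serial steps
          |         tok = m.decode_step(cache_sum, tok)
          |         out.append(tok)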
        
       | koakuma-chan wrote:
        | I really doubt you can fit all the Harry Potter books in 1M
        | tokens.
        
         | gcr wrote:
         | The entire HP series is about one million words.
        
           | koakuma-chan wrote:
            | Harry Potter and the Order of the Phoenix alone is 400K
            | tokens.
        
             | kridsdale3 wrote:
              | And takes up a proportional width of everyone's
              | bookshelves alongside the others.
        
         | PeterStuer wrote:
          | The series is 1,084,170 words. At, say, 1.4 tokens per word,
          | this would not fit, but it is getting close.
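          | 
          | The arithmetic (the 1.4 tokens-per-word ratio is the rough
          | assumption above; real tokenizers vary):
          | 
          |     words = 1_084_170        # full series word count
          |     tokens = words * 1.4     # assumed tokens per word
          |     print(f"{tokens:,.0f}")  # 1,517,838 -- well over 1M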
        
           | koakuma-chan wrote:
           | It's 2M tokens for Gemini.
        
             | chrismustcode wrote:
              | That was previous iterations; 2.5 has a 1 million token
              | context window.
              | 
              | https://ai.google.dev/gemini-api/docs/models (the context
              | window is detailed under each model variant's section,
              | behind the + signs)
              | 
              | They were meant to crank 2.5 up to 2 million at some
              | point, though; maybe they're waiting now till 3?
        
           | magicalhippo wrote:
           | How do they do if you test[1] them for attention deficit
           | disorder?
           | 
           | [1]:
           | https://www.imdb.com/title/tt0766092/quotes/?item=qt1440870
        
       | dang wrote:
       | Related ongoing thread:
       | 
       |  _Claude Sonnet 4 now supports 1M tokens of context_ -
       | https://news.ycombinator.com/item?id=44878147 - Aug 2025 (160
       | comments)
        
       | daft_pink wrote:
        | I'm really curious how well they perform with a long chat
        | history. I find that Gemini often gets confused when the
        | context is long enough and starts responding to prior prompts,
        | whether using the CLI or its Gem chat window.
        
         | XenophileJKO wrote:
          | In my experience, Gemini is REALLY bad about context
          | blending. It can't keep track of what I said and what it
          | said in a conversation under 200K tokens. It blends concepts
          | and statements together, then refers to some fabricated
          | hybrid fact or comment.
         | 
         | Gemini has done this in ways that I haven't seen in the recent
         | or current generation models from OpenAI or Anthropic.
         | 
         | It really surprised me that Gemini performs so well in multi-
         | turn benchmarks, given that tendency.
        
           | IanCal wrote:
            | I've not experimented with the recent models for this, but
            | older Gemini models were awful: they'd lie about what I'd
            | said or what was in their system prompt, even in short
            | conversations.
        
       | akomtu wrote:
       | IMO, a good contest between LLMs would be data compression. Each
       | LLM is given the same pile of text, and then asked to create
       | compact notes that fit into N pages of text. Then the original
       | text is replaced with their notes and they need to answer a bunch
       | of questions about the original text using the notes alone.
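        | 
        | A minimal harness for that contest might look like the sketch
        | below (summarize, answer, and grade are hypothetical stand-ins
        | for each model's API and for scoring against gold answers):
        | 
        |     def compression_contest(models, corpus, questions, n_pages):
        |         scores = {}
        |         for m in models:
        |             # 1: compress the corpus into N pages of notes
        |             notes = m.summarize(corpus, max_pages=n_pages)
        |             # 2: answer from the notes alone; the original
        |             #    text is withheld at question time
        |             answers = [m.answer(q, context=notes)
        |                        for q in questions]
        |             scores[m.name] = grade(answers)  # vs. gold answers
        |         return scores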
        
       | HackerThemAll wrote:
        | What people seem to miss is that you get interactive chat
        | access to all the models, including the best and newest
        | (Gemini 2.5 Pro, 2.5 Flash, 2.5 Flash Lite, and older) totally
        | for free. I mean that when working from the chat at
        | https://aistudio.google.com/, the entire 1M context window and
        | all is totally free of charge. You really get a very good AI
        | for nothing.
       | 
       | https://i.imgur.com/pgfRrZY.png
        
         | cma wrote:
         | Can you opt out of them training on your data in that free
         | tier?
        
           | relatedtitle wrote:
           | If you have cloud billing enabled you can still use it for
           | free and they say they don't train on it.
            | https://ai.google.dev/gemini-api/docs/billing#paid-api-ai-st...
        
         | matesz wrote:
          | Gemini's free tier allows maybe 5 messages on average, at
          | least for 2.5 Pro, and this is not usable.
          | 
          | I'm using Claude Pro as my daily driver, and the Gemini /
          | ChatGPT free tiers.
        
           | rat9988 wrote:
            | > Gemini's free tier allows maybe 5 messages on average,
            | at least for 2.5 Pro, and this is not usable.
           | 
           | Not on ai studio.
        
           | HackerThemAll wrote:
           | You are clearly confirming my comment above.
        
             | thomastjeffery wrote:
             | How?
        
       ___________________________________________________________________
       (page generated 2025-08-12 23:01 UTC)