[HN Gopher] Claude vs. Gemini: Testing on 1M Tokens of Context
___________________________________________________________________
Claude vs. Gemini: Testing on 1M Tokens of Context
Author : dshipper
Score : 119 points
Date : 2025-08-12 16:59 UTC (6 hours ago)
(HTM) web link (every.to)
(TXT) w3m dump (every.to)
| arnaudsm wrote:
| https://archive.is/sb7D5
| thefourthchime wrote:
    | Does anyone else have trouble with the archive rendering of
    | that? It seemed to also have the pop-up.
| sebastienbarre wrote:
| You can delete the div with id=subscribe-popup from the dev
| tools for a better view.
| skarz wrote:
| Try one of these. They have the popup but you can dismiss it.
|
| https://ghostarchive.org/archive/JlE5T
|
| https://web.archive.org/web/20250812172455/https://every.to/.
| ..
| irthomasthomas wrote:
  | So sonnet-4 is faster than gemini-2.5-flash at long context. That
  | is surprising, especially since Gemini runs on those fast TPUs.
| jbellis wrote:
    | If they left them both on defaults, Flash is thinking-by-
    | default and Sonnet 4 is no-thinking-by-default.
| bitpush wrote:
| > Claude's overall response was consistently around 500 words--
| Flash and Pro delivered 3,372 and 1,591 words by contrast.
|
    | It isn't clear from the article whether the time they quote is
    | time-to-first-token or time to completion. If it is the latter,
    | then it makes sense why Gemini would take longer even with
    | similar token throughput.
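The TTFT-vs-completion distinction above can be sketched with a toy model. The function and every number below are illustrative assumptions, not figures from the article:

```python
# Toy model separating time-to-first-token (TTFT) from total completion
# time. With similar decode throughput, a longer answer simply finishes
# later even if it starts streaming just as fast. All numbers are
# illustrative assumptions, not measurements.

def completion_time(ttft_s: float, output_tokens: int,
                    decode_tps: float) -> float:
    """Total wall-clock seconds: time to first token + decode time."""
    return ttft_s + output_tokens / decode_tps

# Same TTFT and throughput, different answer lengths
# (500 vs 3,372 words at an assumed ~1.3 tokens per word):
short = completion_time(5.0, 650, 60.0)    # ~15.8 s
long_ = completion_time(5.0, 4384, 60.0)   # ~78.1 s
print(f"{short:.1f} s vs {long_:.1f} s")
```

Under these assumed parameters, a ~7x longer answer takes ~5x longer to complete despite identical streaming speed, which is why the TTFT-vs-completion distinction matters.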
| curl-up wrote:
| Note that (in the first test, the only one where output length
| is reported), Gemini Pro returned more than 3x the amount of
| text, at less than 2x the amount of time. From my experience
| with Gemini, that time was probably mainly spent on thinking,
| length of which is not reported here. So looking at pure TPS of
| output, Gemini is faster, but without clear info on the
| thinking time/length, it's impossible to judge.
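The back-of-envelope throughput comparison above can be made concrete. The word counts come from the article quote earlier in the thread; the wall-clock times and the tokens-per-word ratio are illustrative assumptions only:

```python
# Approximate output tokens per second from word count and total time.
# Word counts are from the article quote; the times (60 s and 110 s)
# and the 1.3 tokens-per-word ratio are assumptions for illustration.
TOKENS_PER_WORD = 1.3

def output_tps(words: int, seconds: float) -> float:
    """Rough output throughput in tokens per second."""
    return words * TOKENS_PER_WORD / seconds

claude = output_tps(500, 60.0)        # hypothetical total time
gemini_pro = output_tps(1591, 110.0)  # hypothetical, < 2x Claude's time

# >3x the text in <2x the time implies higher raw TPS, but any hidden
# thinking time is folded into "seconds", so this is only indicative.
print(f"Claude ~{claude:.0f} tok/s, Gemini Pro ~{gemini_pro:.0f} tok/s")
```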
| lugao wrote:
| Anthropic also uses TPUs for inference.
| irthomasthomas wrote:
| Do they rent them from Google? Or are they a different brand?
| ancientworldnow wrote:
| Google provides them.
| netdur wrote:
    | Output tokens must be generated in order (autoregressive
    | decoding); inputs don't have that constraint, so prefill is
    | parallel. With stronger kernels, KV-cache handling, and
    | batching, Claude can outrun Gemini.
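The prefill/decode asymmetry described above can be captured in a toy latency model. The throughput numbers below are made-up assumptions for illustration, not measured figures for either provider:

```python
# Toy latency model: prefill processes the whole prompt in parallel
# (compute-bound, batched), while decode emits output tokens one at a
# time (memory-bound, sequential). Both throughput figures below are
# invented assumptions, not benchmarks.

def total_latency(prompt_tokens: int, output_tokens: int,
                  prefill_tps: float = 50_000.0,
                  decode_tps: float = 60.0) -> float:
    """Seconds to finish: parallel prefill + sequential decode."""
    prefill = prompt_tokens / prefill_tps  # whole prompt at once
    decode = output_tokens / decode_tps    # one token per step
    return prefill + decode

# Even with a 1M-token prompt, decode can dominate:
print(total_latency(1_000_000, 3_000))  # prints 70.0 (20 s + 50 s)
```

Under these assumptions, sequential decode accounts for most of the wall-clock time even on a 1M-token prompt, which is why output length and decode-side optimizations matter so much at long context.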
| koakuma-chan wrote:
| I really doubt you can fit all Harry Potter books in 1M tokens.
| gcr wrote:
| The entire HP series is about one million words.
| koakuma-chan wrote:
      | Harry Potter and the Order of the Phoenix alone is 400K tokens.
| kridsdale3 wrote:
        | And takes up a proportional width of everyone's bookshelves
        | alongside the others.
| PeterStuer wrote:
| The series is 1,084,170 words. At let's say 1.4 tokens per
| word, this would not fit, but it is getting close.
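The arithmetic behind that estimate is simple to check; the word count is from the comment above, and the ~1.4 tokens-per-word ratio is the same rough heuristic the commenter assumes, not a measured tokenizer figure:

```python
# Estimate the token count of the series from its word count.
# 1.4 tokens per word is a rough heuristic (the commenter's assumption),
# implemented as integer math to avoid float rounding.
words = 1_084_170          # series word count from the comment above
tokens = words * 14 // 10  # ~1.4 tokens per word
print(tokens)              # prints 1517838
print(tokens > 1_000_000)  # prints True: exceeds a 1M-token window
```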
| koakuma-chan wrote:
| It's 2M tokens for Gemini.
| chrismustcode wrote:
          | That was previous iterations; 2.5 has a 1 million token
          | context window.
          |
          | https://ai.google.dev/gemini-api/docs/models (the context
          | window is detailed under the model variant sections with +
          | signs)
          |
          | They were meant to crank 2.5 to 2 million at some point,
          | though; maybe they're waiting now until 3?
| magicalhippo wrote:
| How do they do if you test[1] them for attention deficit
| disorder?
|
| [1]:
| https://www.imdb.com/title/tt0766092/quotes/?item=qt1440870
| dang wrote:
| Related ongoing thread:
|
| _Claude Sonnet 4 now supports 1M tokens of context_ -
| https://news.ycombinator.com/item?id=44878147 - Aug 2025 (160
| comments)
| daft_pink wrote:
  | I'm really curious how well they perform with a long chat
  | history. I find that Gemini often gets confused when the context
  | is long enough and starts responding to prior prompts, using the
  | CLI or its Gem chat window.
| XenophileJKO wrote:
    | From my experience, Gemini is REALLY bad about context
    | blending. It can't keep track of what I said and what it said
    | in a conversation under 200K tokens. It mixes concepts and
    | statements up, then refers to some fabricated hybrid fact or
    | comment.
|
| Gemini has done this in ways that I haven't seen in the recent
| or current generation models from OpenAI or Anthropic.
|
| It really surprised me that Gemini performs so well in multi-
| turn benchmarks, given that tendency.
| IanCal wrote:
      | I've not experimented with the recent models, but older
      | Gemini models were awful for this - they'd lie about what I'd
      | said or what was in their system prompt, even with short
      | conversations.
| akomtu wrote:
| IMO, a good contest between LLMs would be data compression. Each
| LLM is given the same pile of text, and then asked to create
| compact notes that fit into N pages of text. Then the original
| text is replaced with their notes and they need to answer a bunch
| of questions about the original text using the notes alone.
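The proposed contest could be harnessed roughly as follows. `Model` is a hypothetical interface and `PAGE_CHARS` an assumed page size, since the comment specifies no concrete API; this is a sketch of the idea, not an implementation against any real LLM service:

```python
# Sketch of the proposed contest: each model compresses the corpus
# into at most N pages of notes, then answers questions from the
# notes alone. `Model` is a hypothetical interface; `PAGE_CHARS` is
# an assumed characters-per-page budget.
from typing import Protocol


class Model(Protocol):
    def complete(self, prompt: str) -> str: ...


PAGE_CHARS = 3_000  # assumed size of one "page" of notes


def compression_contest(model: Model, corpus: str, n_pages: int,
                        questions: list[str], grade) -> float:
    """Return the fraction of questions answered correctly from notes."""
    notes = model.complete(
        f"Summarize into at most {n_pages} pages:\n{corpus}"
    )[: n_pages * PAGE_CHARS]  # hard cap enforces the page budget
    answers = [
        model.complete(f"Notes:\n{notes}\n\nQ: {q}") for q in questions
    ]
    # `grade` is a caller-supplied (question, answer) -> bool scorer.
    return sum(map(grade, questions, answers)) / len(questions)
```

The interesting design question is the grader: exact-match scoring is easy to game, so in practice `grade` would itself need to be a rubric or a judge model.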
| HackerThemAll wrote:
  | What people seem to miss is that they get the interactive chat
  | mode of all the models, including the best and newest (Gemini
  | 2.5 Pro, 2.5 Flash, 2.5 Flash Lite, and older), for free. I
  | mean, when working from the chat at https://aistudio.google.com/
  | the entire 1M context window and all is totally free of charge.
  | You really get a very good AI for nothing.
|
| https://i.imgur.com/pgfRrZY.png
| cma wrote:
| Can you opt out of them training on your data in that free
| tier?
| relatedtitle wrote:
| If you have cloud billing enabled you can still use it for
| free and they say they don't train on it.
| https://ai.google.dev/gemini-api/docs/billing#paid-api-ai-
| st...
| matesz wrote:
    | Gemini's free tier allows maybe 5 messages on average, for 2.5
    | Pro at least, and this is not usable.
|
    | I'm using Claude Pro as a daily driver and the Gemini / ChatGPT
    | free tiers.
| rat9988 wrote:
      | > Gemini's free tier allows maybe 5 messages on average, for
      | 2.5 Pro at least, and this is not usable.
|
| Not on ai studio.
| HackerThemAll wrote:
| You are clearly confirming my comment above.
| thomastjeffery wrote:
| How?
___________________________________________________________________
(page generated 2025-08-12 23:01 UTC)