[HN Gopher] SWE-Grep and SWE-Grep-Mini: RL for Fast Multi-Turn C...
___________________________________________________________________
SWE-Grep and SWE-Grep-Mini: RL for Fast Multi-Turn Context
Retrieval
Author : meetpateltech
Score : 70 points
Date : 2025-10-16 16:59 UTC (6 hours ago)
(HTM) web link (cognition.ai)
(TXT) w3m dump (cognition.ai)
| marstall wrote:
| SWE-1 has been being booped up by WindSurf to me lately and I've
| been impressed - often (enough?) getting me the same answers as
| GPT5 etc., but almost instantly. Gotta say speed is nice.
| swyx wrote:
| nice, what does booped up mean? is this gen z lingo?
| marstall wrote:
| ha more like how i talk to my two year old. WindSurf's
| Cascade sidebar tool (which i use in RubyMine) has a stable
| of LLMs and it somewhat randomly switches the active one out
| from time to time. So I get a taste of what different ones
| are like, it's kind of cool.
| tifa2up wrote:
| Searched for 'hi' and it took 166s to return a response using
| this model: https://pasteboard.co/oB4VqVC5FGkl.png
|
| Claude Code took 0.1s, Cursor CLI 19s
| mgambati wrote:
| If you ask a real question, then you might get real results.
| silasalberti wrote:
| hey I'm from the SWE-grep team - feel free to ask me any
| questions :)
| daralthus wrote:
| this would be useful outside of coding. could you release a
| benchmark so we can have more models tuned for this?
| swyx wrote:
| (coauthor) main charts/evals here
| https://x.com/cognition/status/1978867021669413252
|
| you can try the https://playground.cognition.ai/ here
|
| i wrote a longer explainer here
| https://x.com/swyx/status/1978874342743343254 but saving you the
| click
|
| this was a perspective cut from the blogpost, but let me explain
| why subagents kill long context
|
| Like you can spend $500m building 100-million-token context models,
| and they would be 1) slow, 2) expensive to use, and 3) prone to huge
| context rot. O(n) is the lower bound on reading that context.
|
| Cog's approach is something you learn on day 1 of CS50: divide and
| parallelize. Embeddings are too dumb, Agentic Search is too slow. So
| train fast (2800 tok/s), limited-agency (max 4 turns) subagents with
| natively parallel tool calling (avg parallelism of 7-8, custom
| toolset) that give the performance of Agentic Search within an
| acceptable "Flow Window" - one that feels immaterially slower than
| Embeddings.
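|
| to make that concrete, here's a toy sketch of the loop in python.
| this is illustrative only - not the actual SWE-grep harness - and
| run_tool / plan_next_calls are stand-ins for the real sandboxed
| tools and the trained policy:
|
|     import asyncio
|     from dataclasses import dataclass
|
|     MAX_TURNS = 4      # limited agency: hard cap on serial turns
|     PARALLELISM = 8    # tool calls fired together each turn
|
|     @dataclass
|     class Evidence:
|         path: str
|         snippet: str
|
|     async def run_tool(call: str) -> list[Evidence]:
|         # stand-in for grep / read_file / list_dir in the sandbox
|         await asyncio.sleep(0)
|         return []
|
|     def plan_next_calls(query: str,
|                         evidence: list[Evidence]) -> list[str]:
|         # stand-in for the policy: emit up to PARALLELISM calls,
|         # or nothing once it thinks it has enough context
|         return []
|
|     async def fast_context(query: str) -> list[Evidence]:
|         evidence: list[Evidence] = []
|         for _ in range(MAX_TURNS):
|             calls = plan_next_calls(query, evidence)[:PARALLELISM]
|             if not calls:
|                 break
|             # one await for the whole batch, not one per call
|             results = await asyncio.gather(
|                 *(run_tool(c) for c in calls))
|             for r in results:
|                 evidence.extend(r)
|         # only the distilled evidence goes back to the main agent;
|         # the intermediate tool chatter is thrown away
|         return evidence
|
| the point is that end-to-end latency scales with the number of turns
| (<= 4), not with the number of tool calls.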
|
| The benefit of this is threefold:
|
| - 8^4 tool calls cover a very large code search space (quick
| arithmetic after this list); subagent calls can be compounded if
| more coverage is needed.
|
| - predictable cost & end to end latency
|
| - the subagent outputs "clean" contexts, free of context failure
| modes like context poisoning and context rot
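|
| (back-of-envelope on that first bullet: ~8 parallel calls per turn
| across up to 4 turns is a branching factor of roughly 8^4 = 4,096
| reachable search paths, versus 4 for a purely serial 4-turn agent.)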
|
| we originally called this Rapid Agentic Search, to contrast with
| RAG. but Fast Context rolls off the tongue better.
|
| -- Second perspective --
|
| The Fundamental Equation of Coding Agents is:
|
| Coding Agent Performance = Ability to Read the Right Files *
| Ability to Generate the Right Diffs
|
| Fast Context is Cognition's first solution for the Read. As
| codebases get larger and tasks get more complex, Reads get more
| important: for the first query on an average production codebase in
| Cascade, >60% of the work is just searching and reading files.
|
| But if this were just about speed, it might not be that exciting.
| I think there are unappreciated effects on performance as well
| when you have very good context. In other words:
|
| Context Engineering is Actually Very Important. Too important for
| humans and hardcoded rules.
|
| The swe-greps are the first dedicated context engineer agent
| models.
| vessenes wrote:
| Thanks for the summary. I noticed from the announcement you
| trained on parallel tool calling to save on serial round
| tripping. This is awesome.
|
| Most LLM coding is so slow that you're permanently out of flow
| state, and in 'manager' state right now - I'm interested in a
| future where you've got enough fast low TTFT support that an
| engineer could maintain flow state and have sort of super power
| type productivity at the same time, and this tool makes me
| think of that.
|
| That is, it looks fast enough to be used as a sort of sidebar
| info tool, as in "what you're coding might need / refer to
| these other parts of the codebase" -- effectively increasing an
| engineer's working memory. Super cool. And obviously useful for
| an AI engineer as well. Thanks for the writeup!
| ntntnt wrote:
| lol dead thread, cognition begging to grab some traction in this
| space.
| kburman wrote:
| I thought https://playground.cognition.ai/ was just returning
| some cached query results, but no, they're actually spinning up
| real VMs and running live queries without any authentication or
| restrictions. That must be costing them a fortune.
| groby_b wrote:
| Currently, all queries are returning "We're under load and
| processing too many requests. Please try again later."
|
| So that's how that is going ;)
| awsanswers wrote:
| LLM product managers: show me what's in the context, conveniently,
| right where I'm prompting. Knowing and editing the precise context
| between requests will likely remain a user task for a long time.
| breadislove wrote:
| guys please release the benchmark or the benchmark code. like
| this is just "trust me bro"
| swyx wrote:
| well that's what the playground is for! playground.cognition.ai
| breadislove wrote:
| yeah but if people would like to double check the results it
| would be nice to have the actual benchmark. especially given
| that your playground is broken...
|
| "We ran into an error processing your request. Please try
| again"
___________________________________________________________________
(page generated 2025-10-16 23:00 UTC)