[HN Gopher] Reverse Engineering Cursor's LLM Client
___________________________________________________________________
Reverse Engineering Cursor's LLM Client
Author : paulwarren
Score : 138 points
  Date   : 2025-06-07 02:59 UTC (1 day ago)
(HTM) web link (www.tensorzero.com)
(TXT) w3m dump (www.tensorzero.com)
| CafeRacer wrote:
| Soooo.... wireshark is no longer available or something?
| Maxious wrote:
| The article literally says at the end this was just the first
| post about looking before getting into actually changing the
| responses.
|
| (that being said, mitmproxy has gotten pretty good for just
| looking lately
| https://docs.mitmproxy.org/stable/concepts/modes/#local-capt...
| )
| spmurrayzzz wrote:
| Yea the proxying/observability is without question the
| simplest part of this whole problem space. Once you get into
| the weeds of automating all the eval and prompt optimizing,
| you realize how irrelevant wireshark actually is in the
| feedback loop.
|
  | But like you, I also landed on mitmproxy after starting
  | with tcpdump/wireshark. I recently started building a tiny
  | streaming textual-gradient-based optimizer (similar to what
  | AdalFlow is doing) by parsing the mitmproxy output in real
  | time. Having a turnkey solution for this sort of thing will
  | definitely be valuable, at least in the near to mid term.
| vrm wrote:
  | if you haven't, check out our repo -- it's free, fully
  | self-hosted, production-grade, and designed for precisely
  | this application :)
|
| https://github.com/TensorZero/tensorzero
| spmurrayzzz wrote:
| Looks very buttoned up. My local project has some
| features tuned for my explicit agent flows however (built
| directly into my inference engine), so can't really jump
| ship just yet.
|
| Looking great so far though!
| vrm wrote:
| wireshark would work for seeing the requests from the desktop
| app to Cursor's servers (which make the actual LLM requests).
| But if you're interested in what the actual requests to LLMs
| look like from Cursor's servers you have to set something like
| this up. Plus, this lets us modify the request and A/B test
| variations!
| stavros wrote:
| Sorry, can you explain this a bit more? Either you're putting
| something between your desktop to the server (in which case
| Wireshark would work) or you're putting something between
| Cursor's infrastructure and their LLM provider, in which
| case, how?
| vrm wrote:
| we're doing the latter! Cursor lets you configure the
| OpenAI base URL so we were able to have Cursor call Ngrok
| -> Nginx (for auth) -> TensorZero -> LLMs. We explain in
| detail in the blog post.
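  | For illustration, the auth hop in that chain could look
  | roughly like this (a Python stand-in for the Nginx layer;
  | the token variable and handler shape are made up, not the
  | actual setup):

```python
import os
from http.server import BaseHTTPRequestHandler

# Hypothetical shared secret; the real chain used Nginx for this check
# between the Ngrok tunnel and the TensorZero gateway.
EXPECTED = "Bearer " + os.environ.get("GATEWAY_TOKEN", "secret-token")

def is_authorized(auth_header):
    return auth_header == EXPECTED

class AuthGateway(BaseHTTPRequestHandler):
    def do_POST(self):
        if not is_authorized(self.headers.get("Authorization")):
            self.send_error(401)  # reject unauthenticated tunnel traffic
            return
        # ...forward the request body upstream to the gateway here...
        self.send_response(200)
        self.end_headers()
```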
| stavros wrote:
| Ah OK, I saw that, but I thought that was the desktop
| client hitting the endpoint, not the server. Thanks!
| robkop wrote:
  | There is much missing from this prompt; tool call descriptors
  | are the most obvious. See for yourself using even a year-old
  | jailbreak [1]. There are some great ideas in how they've set
  | up other pieces, such as Cursor rules.
|
| [1]:
| https://gist.github.com/lucasmrdt/4215e483257e1d81e44842eddb...
| ericrallen wrote:
| Maybe there is some optimization logic that only appends tool
| details that are required for the user's query?
|
| I'm sure they are trying to slash tokens where they can, and
| removing potentially irrelevant tool descriptors seems like
| low-hanging fruit to reduce token consumption.
| vrm wrote:
  | I definitely see different prompts based on what I'm doing in
  | the app. As we mentioned, there are different prompts
  | depending on whether you're asking questions, doing Cmd-K
  | edits, working in the shell, etc. I'd also imagine that they
  | customize the prompt by model (unobserved here, but we can
  | also customize per-model using TensorZero and A/B test).
| joshmlewis wrote:
  | Yes, this is one of the techniques apps can use. You vectorize
  | the tool descriptions and then do a lookup based on the user's
  | query to select the most relevant tools; this is called
  | pre-computed semantic profiles. You can even hash the queries
  | themselves, cache the tools that were used, and then do
  | similarity lookups by query.
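  | A runnable sketch of that idea, with a toy bag-of-words
  | embedding standing in for a real embedding model (the tool
  | names and descriptions are invented for the example):

```python
import hashlib
import math
from collections import Counter

# Hypothetical tool descriptions; a real app would use its own registry.
TOOLS = {
    "read_file": "read the contents of a file in the workspace",
    "run_terminal": "run a shell command in the integrated terminal",
    "edit_file": "apply an edit to a file in the workspace",
}

def embed(text):
    # Toy bag-of-words "embedding"; swap in a real embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Pre-computed semantic profiles: embed each description once, ahead of time.
PROFILES = {name: embed(desc) for name, desc in TOOLS.items()}
_cache = {}  # query hash -> previously selected tools

def select_tools(query, k=2):
    key = hashlib.sha256(query.lower().encode()).hexdigest()
    if key not in _cache:
        q = embed(query)
        ranked = sorted(PROFILES, key=lambda n: cosine(q, PROFILES[n]),
                        reverse=True)
        _cache[key] = ranked[:k]
    return _cache[key]
```

  | Only the top-k tools are appended to the prompt, which is
  | exactly the token-saving behavior speculated about above.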
| tough wrote:
| cool stuff
| GabrielBianconi wrote:
| They use different prompts depending on the action you're
| taking. We provided just a sample because our ultimate goal
| here is to start A/B testing models, optimizing prompts +
| models, etc. We provide the code to reproduce our work so you
| can see other prompts!
|
| The Gist you shared is a good resource too though!
| cloudking wrote:
| https://github.com/elder-plinius/CL4R1T4S/blob/main/CURSOR/C...
| notpushkin wrote:
| Hmm, now that we have the prompts, would it be possible to
| reimplement Cursor servers and have a fully local ( _ahem_
| pirated) version?
| deadbabe wrote:
| Absolutely
| handfuloflight wrote:
  | Were you really waiting for the prompts before embarking on
  | this adventure?
| tomr75 wrote:
| presumably their apply model is run on their servers
|
  | I wonder how hard it would be to build a local apply model --
  | surely that would be faster on a MacBook
| notpushkin wrote:
| It's possible, but they allow you to specify your own API
| (that's how they got the prompts in this article).
| smcleod wrote:
| Or you could just use Cline / Roo Code which are better for
| agentic coding and open source anyway...
| tmikaeld wrote:
| But extremely expensive in comparison
| bredren wrote:
| Cursor and other IDE modality solutions are interesting but train
| sloppy use of context.
|
| From the extracted prompting Cursor is using:
|
| > Each time the USER sends a message, we may automatically attach
| some information about their current state...edit history in
| their session so far, linter errors, and more. This information
| may or may not be relevant to the coding task, it is up for you
| to decide.
|
| This is the context bloat that limits effectiveness of LLMs in
| solving very hard problems.
|
  | This particular .env example illustrates the low-stakes type
  | of problem Cursor is great at solving, but it also lacks the
  | complexity that will keep SWEs employed.
|
| Instead I suggest folks working with AI start at chat interface
| and work on editing conversations to keep clean contexts as they
| explore a truly challenging problem.
|
| This often includes meeting and slack transcripts, internal docs,
| external content and code.
|
| I've built a tool for surgical use of code called FileKitty:
| https://github.com/banagale/FileKitty and more recently
| slackprep: https://github.com/banagale/slackprep
|
  | These let a person be more intentional about the problem they
  | are trying to solve by including only the information relevant
  | to it.
| jacob019 wrote:
| I had this thought as well and find it a bit surprising. For my
| own agentic applications, I have found it necessary to
| carefully curate the context. Instead of including an
| instruction that we "may automatically attach", only include an
| instruction WHEN something is attached. Instead of "may or may
| not be relevant to the coding task, it is up for you to
| decide"; provide explicit instruction to consider the relevance
| and what to do when it is relevant and when it is not relevant.
| When the context is short, it doesn't matter as much, but when
  | there is a difficult problem with a long context, fine-tuned
  | instructions make all the difference. Cursor may be
| keeping instructions more generic to take advantage of cached
| token pricing, but the phrasing does seem rather sloppy. This
| is all still relatively new, I'm sure both the models and the
| prompts will see a lot more change before things settle down.
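  | A minimal sketch of that "only include an instruction WHEN
  | something is attached" pattern (the attachment keys and
  | instruction wording here are illustrative, not Cursor's):

```python
def build_system_prompt(base: str, attachments: dict) -> str:
    # Add each instruction only when its context is actually attached,
    # rather than a generic "we may automatically attach..." disclaimer.
    parts = [base]
    if attachments.get("linter_errors"):
        parts.append("Linter errors are attached; fix them only if they "
                     "relate to the task at hand.")
    if attachments.get("edit_history"):
        parts.append("The user's recent edits are attached for context; "
                     "do not undo them.")
    return "\n".join(parts)
```

  | The trade-off mentioned above is real: conditional prompts
  | like this vary per request, so they benefit less from cached
  | token pricing than one fixed generic preamble.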
| lyjackal wrote:
| I've been curious to see the process for selecting relevant
| context from a long conversation. has anyone reverse engineered
| what that looks like? how is the conversion history pruned, and
| how is the latest state of a file represented?
| GabrielBianconi wrote:
| We didn't look into that workflow closely, but you can
| reproduce our work (code in GitHub) and potentially find some
| insights!
|
| We plan to continue investigating how it works (+ optimize the
| models and prompts using TensorZero).
| serf wrote:
| Cursor is the only product that I have cancelled in 20+ years due
| to a lack of customer service response.
|
| Emailed them multiple times over _weeks_ about billing questions
  | -- not a single response. These weren't like VS Code
  | questions, either -- they needed Cursor staff intervention.
|
| No problem getting promo emails though!
|
| The quicker their 'value' can be spread to other services the
| better, imo. Maybe the next group will answer emails.
| jjani wrote:
| Similarly: https://github.com/getcursor/cursor/issues/1052
| SbNn6uJO5wzPww wrote:
  | Doing the same with mitmproxy:
  | https://news.ycombinator.com/item?id=44213073
| Rastonbury wrote:
| Another similar look at their prompting:
| https://news.ycombinator.com/item?id=44154962
___________________________________________________________________
(page generated 2025-06-08 23:02 UTC)