[HN Gopher] Claude Sonnet 4 now supports 1M tokens of context
___________________________________________________________________
Claude Sonnet 4 now supports 1M tokens of context
Author : adocomplete
Score : 1273 points
Date : 2025-08-12 16:02 UTC (1 day ago)
(HTM) web link (www.anthropic.com)
(TXT) w3m dump (www.anthropic.com)
| throwaway888abc wrote:
| holy moly! awesome
| tankenmate wrote:
| It's definitely good to have this as an option, but at the same
| time having more context reduces the quality of the output,
| because it's easier for the LLM to get "distracted". So I wonder
| what will happen to the quality of code produced by tools like
| Claude Code if users don't properly understand the trade-off
| being made (if they leave it in auto mode and code right up to
| the auto-compact).
| jasonthorsness wrote:
| What do you recommend doing instead? I've been using Claude
| Code a lot but am still pretty novice at the best practices
| around this.
| TheDong wrote:
| Have the AI produce a plan that spans multiple files (like
| "01 create frontend.md", "02 create backend.md", "03 test
| frontend and backend running together.md"), and then create a
| fresh context for each step if it looks like re-using the
| same context is leading it to confusion.
|
| Also, commit frequently, and if the AI constantly goes down
| the wrong path ("I can't create X so I'll stub it out with Y,
| we'll fix it later"), you can update the original plan with
| wording to tell it not to take that path ("Do not ever stub
| out X, we must make X work"), and then start a fresh session
| with an older and simpler version of the code and see if that
| fresh context ends up down a better path.
|
| You can also run multiple attempts in parallel if you use
| tooling that supports that (containers + git worktrees is one
| way)
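|
| If you want to script the parallel-attempts part, here's a rough
| sketch in Python (assumes Claude Code's headless `claude -p`
| mode; the names and layout are just examples, adjust to your
| tooling):
|
|     # run N independent attempts at the same step, each in its
|     # own git worktree with a fresh Claude context
|     import subprocess
|
|     PROMPT = "Implement step 01 from '01 create frontend.md'"
|     ATTEMPTS = 3
|
|     procs = []
|     for i in range(ATTEMPTS):
|         tree = f"../attempt-{i}"
|         subprocess.run(
|             ["git", "worktree", "add", "-b", f"attempt-{i}", tree],
|             check=True)
|         procs.append(
|             subprocess.Popen(["claude", "-p", PROMPT], cwd=tree))
|
|     for p in procs:
|         p.wait()
|     # then diff/test each branch and keep the best attempt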
| F7F7F7 wrote:
| Inevitably the files become a mess of their own. Changes
| and learnings from one part of the plan often don't result
| in adaptation to the impacted plans further down the chain.
|
| In the end you have a mishmash of half-implemented plans
| and now you've lost context too. Which leads to blowing
| tokens on trying to figure out what's been implemented,
| what's half baked, and what was completely ignored.
|
| Any links to anyone who's built something at scale using
| this method? It always sounds good on paper.
|
| I'd love to find a system that works.
| brandall10 wrote:
| My system is to create detailed feature files up to a few
| hundred lines in size that are immutable, and then have a
| status.md file (preferably kept to about 50 lines) that
| links to a current feature that is used as a way to keep
| track of the progress on that feature.
|
| Additionally I have a Claude Code command with
| instructions referencing the status.md, how to select the
| next task, how to compact status.md, etc.
|
| Every time I'm done with a unit of work from that feature
| - always triggered w/ ultrathink - I'll put up a PR and
| go through the motions of extra refactors/testing. For
| more complex PRs that require many extra commits to get
| prod ready I just let the sessions auto-compact.
|
| After merging I'll clear the context and call the CC
| command to progress to the next unit of work.
|
| This allows me to put up to around 4-5 meaningful PRs per
| feature if it's reasonably complex while keeping the
| context relatively tight. The current project I'm focused
| on is just over 16k LOC in swift (25k total w/ tests) and
| it seems to work pretty well - it rarely gets off track,
| does unnecessary refactors, destroys working features,
| etc.
| nzach wrote:
| Care to elaborate on how you use the status.md file? What
| exactly you put in there, and what value does it bring?
| brandall10 wrote:
| When I initially have it built from a feature file, it
| pulls in the most pertinent high level details from that
| and creates a supercharged task list that is updated w/
| implementation details as the feature progresses.
|
| As it links to the feature file as well, that is pulled
| into the context, but status.md is there to essentially
| act as a 'cursor' to where it is in the implementation
| and provide extended working memory - that Claude itself
| manages - specific to that feature. With that you can
| work on bite sized chunks of the feature each with a
| clean context. When the feature is complete it is
| trashed.
|
| I've seen others try to achieve similar things by making
| CLAUDE.md or the feature file mutable but that IME is a
| bad time. CLAUDE.md should be lean with the details to
| work on the project, and the feature file can easily be
| corrupted in an unintended way allowing things to go
| wayward in scope.
| nzach wrote:
| In my experience it works better if you create one plan
| at a time. Create a prompt, have Claude implement it, and
| then make sure it is working as expected. Only then do
| you ask for something new.
|
| I've created an agent to help me create the prompts, it
| goes something like this: "You are an Expert Software
| Architect specializing in creating comprehensive, well-
| researched feature implementation prompts. Your sole
| purpose is to analyze existing codebases and
| documentation to craft detailed prompts for new features.
| You always think deeply before giving an answer...."
|
| My workflow is: 1) use this agent to create a prompt for
| my feature; 2) ask claude to create a plan for the just
| created prompt; 3) ask claude to implement said plan if
| it looks good.
| cube00 wrote:
| >You always think deeply before giving an answer...
|
| Nice try but they're not giving you the "think deeper"
| level just because you asked.
| nzach wrote:
| https://docs.anthropic.com/en/docs/build-with-
| claude/prompt-...
| dpe82 wrote:
| Actually that's exactly how you do it.
| theshrike79 wrote:
| I use Gemini-cli (free 2.5 pro for an undetermined time
| before it self-lobotomises and switches to lite) to keep
| the specs up to date.
|
| The actual tasks are stored in Github issues, which
| Claude (and sometimes Gemini when it feels like it) can
| access using the `gh` CLI tool.
|
| But it's all just project management, if what the code
| says drifts from what's in the specs (for any reason),
| one of them has to change.
|
| Claude does exactly what the documentation says; it doesn't
| notice that the code is completely different and adapt, like
| a human would.
| bredren wrote:
| Don't rely entirely on CC. Once a milestone has been
| reached, copy the full patch to the clipboard along with the
| technical spec covering it. Provide the original files,
| the patch, and the spec to Gemini and ask, roughly: a
| colleague did this work; does it fulfill the spec and follow
| best practices?
|
| Pick among the best feedback to polish the work done by
| CC---it will miss things that Gemini will catch.
|
| Then do it again. Sometimes CC just won't follow feedback
| well and you gotta make the changes yourself.
|
| If you do this you'll move more gradually, but by the nature
| of the pattern you'll look at the changes more closely.
|
| You'll be able to realign CC with the spec afterward with
| a fresh context and the existing commits showing the way.
|
| Fwiw, this kind of technique can be done entirely without
| CC and can lead to excellent results faster, as Gemini
| can look at the full picture all at once, vs. having to
| force CC to hunt and peck through slices of files.
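|
| A minimal sketch of bundling the patch and spec for that
| hand-off (the branch name and spec path are placeholders; paste
| the result into Gemini or pipe it to whatever CLI you use):
|
|     # build a review prompt from the milestone's patch + spec
|     import subprocess
|     from pathlib import Path
|
|     patch = subprocess.run(
|         ["git", "diff", "main...HEAD"],
|         capture_output=True, text=True, check=True,
|     ).stdout
|     spec = Path("docs/feature-spec.md").read_text()
|
|     prompt = (
|         "A colleague did the work below. Does it fulfill the "
|         "spec and follow best practices? List concrete issues."
|         f"\n\n--- SPEC ---\n{spec}\n\n--- PATCH ---\n{patch}\n"
|     )
|     Path("review_prompt.txt").write_text(prompt)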
| wongarsu wrote:
| Changing the prompt and rerunning is something where Cursor
| still has a clear edge over Claude Code. It's such a
| powerful technique for keeping the context small because it
| keeps the context clear of back-and-forths and dead ends. I
| wish it was more universally supported
| abound wrote:
| I do this all the time in Claude Code, you hit Escape
| twice and select the conversation point to 'branch' from.
| agotterer wrote:
| I use the main Claude code thread (I don't know what to call
| it) for planning and then explicitly tell Claude to delegate
| certain standalone tasks out to subagents. The subagents
| don't consume the main thread's context window. Even just
| delegating testing, debugging, and building will save a ton
| of context.
| sixothree wrote:
| Running /clear often is really the first tool for context
| management. Do this when you finish a task.
| tehlike wrote:
| Some reference:
|
| https://simonwillison.net/2025/Jun/29/how-to-fix-your-contex...
|
| https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-ho...
| cubefox wrote:
| It would be interesting to see how this manifests in SSM/Mamba
| models. The way they handle their context window is different
| from Transformers, as the former don't use the Attention
| mechanism. Mamba is better at context window scaling than
| transformers but worse at explicit recall. Though that
| doesn't tell us how susceptible they are to context
| distractions.
| bachittle wrote:
| As of now it's not integrated into Claude Code. "We're also
| exploring how to bring long context to other Claude products".
| I'm sure they already know about this issue and are trying to
| think of solutions before letting users incur more costs on
| their monthly plans.
| PickledJesus wrote:
| Seems to be for me, I came to look at HN because I saw it was
| the default in CC
| novaleaf wrote:
| where do you see it in CC?
| PickledJesus wrote:
| I got a notification when I opened it, indicating that
| the default had changed, and I can see it on /model.
|
| Only on a max (20x) account, not there on a Pro one.
| novaleaf wrote:
| thanks, FYI I'm on a max 20x also and I don't see it!
| tankenmate wrote:
| maybe a staggered release?
| Wowfunhappy wrote:
| I'm curious, what does it say on /model?
|
| For reference, my options are:
|
|       Select Model
|       Switch between Claude models. Applies to this session
|       and future Claude Code sessions.
|       For custom model names, specify with --model.
|
|       1. Default (recommended)   Opus 4.1 for up to 50% of
|                                  usage limits, then use Sonnet 4
|       2. Opus                    Opus 4.1 for complex tasks
|                                  * Reaches usage limits faster
|       3. Sonnet                  Sonnet 4 for daily use
|       4. Opus Plan Mode          Use Opus 4.1 in plan mode,
|                                  Sonnet 4 otherwise
| novaleaf wrote:
| me also
| dbreunig wrote:
| The team at Chroma is currently looking into this and should
| have some figures.
| falcor84 wrote:
| Strange that they don't mention whether that's enabled or
| configurable in Claude Code.
| farslan wrote:
| Yeah, same, I'm curious about this. I would guess it's
| enabled by default with Claude Code.
| csunoser wrote:
| They don't say it outright. But I think it is not in Claude
| Code yet.
|
| > We're also exploring how to bring long context to other
| Claude products. - Anthropic
|
| That is, any product other than the Anthropic API (tier 4) or
| Amazon Bedrock.
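|
| For anyone on the API, it looks like it's just a beta flag on
| the request. A rough sketch with the Python SDK (the exact beta
| name and model id below are from memory, so verify them in the
| docs):
|
|     import anthropic
|
|     client = anthropic.Anthropic()  # ANTHROPIC_API_KEY from env
|     big_prompt = open("repo_dump.txt").read()  # your huge input
|
|     resp = client.messages.create(
|         model="claude-sonnet-4-20250514",   # assumed model id
|         max_tokens=4096,
|         # assumed long-context beta header, check the docs
|         extra_headers={"anthropic-beta": "context-1m-2025-08-07"},
|         messages=[{"role": "user", "content": big_prompt}],
|     )
|     print(resp.content[0].text)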
| CharlesW wrote:
| From a co-marketing POV, it's considered best practice to not
| discuss home-grown offerings in the same or similar category as
| products from the partners you're featuring.
|
| It's likely they'll announce this week, albeit possibly just
| within the "what's new" notes that you see when Claude Code is
| updated.
| reasonableklout wrote:
| They just sent an email that the feature is in beta in CC.
| faangguyindia wrote:
| In my testing the gap between Claude and Gemini Pro 2.5 has
| narrowed. My company is in Asia Pacific and we can't get access
| to Claude via Vertex for some stupid reason.
|
| But I tested it via other providers; the gap used to be huge,
| but not anymore.
| Tostino wrote:
| For me the gap is pretty large (in Gemini Pro 2.5's favor).
|
| For reference, the code I am working on is a Spring Boot /
| (Vaadin) Hilla multi-module project with helm charts for
| deployment and a separate Python based module for ancillary
| tasks that were appropriate for it.
|
| I've not been able to get any good use out of Sonnet in months
| now, whereas Gemini Pro 2.5 has (still) been able to grok the
| project well enough to help out.
| jona777than wrote:
| I initially found Gemini Pro 2.5 to work well for coding.
| Over time, I found Claude to be more consistently productive.
| Gemini Pro 2.5 became my go-to for use cases benefitting from
| larger context windows. Claude seemed to be the safer daily
| driver (if I needed to get something done.)
|
| All that being said, Gemini has been consistently dependable
| when I had asks that involved large amounts of code and data.
| Claude and the OpenAI models struggled with some tasks that
| Gemini responsively satisfied seemingly without "breaking a
| sweat."
|
| Lately, it's been GPT-5 for brainstorming/planning, Claude
| for hammering out some code, and Gemini when there are huge
| data/code requirements. I'm curious if the widened Sonnet 4
| context window will change things.
| llm_nerd wrote:
| Opus 4.1 is a much better model for coding than Sonnet. The
| latter is good for general queries / investigations or to
| draw up some heuristics.
|
| I have paid subscriptions to both Gemini Pro and Claude.
| Hugely worthwhile expense professionally.
| faangguyindia wrote:
| When Gemini 2.5 Pro gets stuck, I often use DeepSeek R1 in
| architect mode and Qwen3 in coder mode in aider, and that
| combination solves all the problems.
|
| Last month I ran into some wicked dependency bug and only
| ChatGPT could solve it, which I'm guessing is because it has
| hot data from GitHub?
|
| On the other hand, I really need a tool like aider where I
| can use various models in "architect" and "coder" mode.
|
| What I've found is that better reasoning models tend to be bad
| at writing actual code, and models like Qwen3 Coder seem
| better.
|
| DeepSeek R1 will not write reliable code, but it will reason
| well and map out the path forward.
|
| I wouldn't be surprised if Sonnet's success came from doing
| EXACTLY this behind the scenes.
|
| But now I am looking for pure models that do not use this
| black-magic hack behind the API.
|
| I want more control at the tool end, where I can alter the
| prompts and achieve the results I want.
|
| This is one reason I do not use Claude Code etc.
|
| Aider is 80% of what I want; I wish it had more of what I
| want, though.
|
| I just don't know why no one has built a perfect solution to
| this yet.
|
| Here are the things I am missing in aider:
|
| 1. Automatic model switching: use different models for asking
| questions about the code, planning a feature, and writing the
| actual code.
|
| 2. Self-determining whether a feature needs a "reasoning"
| model or whether a coding model will suffice.
|
| 3. The ability to selectively send context and drop the files
| we don't need: intelligently add the files the feature will
| touch to the context up front, instead of doing all the code
| planning, being asked to add files, and then doing it all over
| again with more context available.
| penguin202 wrote:
| Claude doesn't have a mid-life crisis and try to `rm -rf /` or
| delete your project.
| film42 wrote:
| Agree, but pricing-wise, Gemini 2.5 Pro wins. Gemini input
| tokens are half the cost of Claude 4's, and output is $5/million
| cheaper than Claude. On top of that, document processing is
| significantly cheaper: a 5MB PDF (customer invoice) with Gemini
| is like 5k tokens vs 56k with Claude.
|
| The only downside with Gemini (and it's a big one) is
| availability. We get rate limited by their dynamic QoS all the
| time even if we haven't reached our quota. Our GCP sales rep
| keeps recommending "provisioned throughput," but it's both
| expensive and a poor fit for our workload type. Plus, the
| VertexAI SDK is kind of a PITA compared to Anthropic's.
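|
| To put rough numbers on the document-processing point above
| (the per-token prices here are assumptions from memory; check
| current pricing before relying on them):
|
|     # back-of-envelope cost of one ~5MB PDF invoice, input only
|     GEMINI_25_PRO_IN = 1.25 / 1_000_000  # assumed $/input token
|     CLAUDE_SONNET_IN = 3.00 / 1_000_000  # assumed $/input token
|
|     gemini = 5_000 * GEMINI_25_PRO_IN    # ~5k tokens per PDF
|     claude = 56_000 * CLAUDE_SONNET_IN   # ~56k tokens per PDF
|     print(f"Gemini ${gemini:.4f} vs Claude ${claude:.4f} "
|           f"(~{claude / gemini:.0f}x per document)")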
| Alex-Programs wrote:
| Google products are such a pain to work with from an API
| perspective that I actively avoid them where possible.
| artursapek wrote:
| Eagerly waiting for them to do this with Opus
| irthomasthomas wrote:
| Imagine paying $20 a prompt?
| artursapek wrote:
| If I can give it a detailed spec, walk away and do something
| else for 20 minutes, and come back to work that would have
| taken me 2 hours, then that's a steal.
| dbbk wrote:
| You can just do this now though. In fact you could go a
| step further and set up the GitHub Action, then you can
| kick off Claude from the iOS GitHub app from the beach and
| review the PR when it's done.
| datadrivenangel wrote:
| Depending on how many prompts per hour you're looking at,
| that's probably same order of magnitude as expensive SAAS. A
| fancy CRM seat can be ~$2000 per month (or more), which
| assuming 50 hours per week x 4 weeks per month is $10 per
| hour ($2000/200 hours). A lot of money, but if it makes your
| sales people more productive, it's a good investment.
| Assuming that you're paying your sales people say 240K per
| year, ($20,000 per month), then the SAAS cost is 10% of their
| salary.
|
| This explains DataDog pricing. Maybe it will give a future
| look at AI pricing.
| mettamage wrote:
| Shame it's only the API. Would've loved to see it via the web
| interface on claude.ai itself.
| minimaxir wrote:
| Can you even fit 200+k tokens worth of context in the web
| interface? IMO Claude's API workbench is the worst of the three
| major providers.
| mettamage wrote:
| Via text files right? Just drag and drop.
| data-ottawa wrote:
| When working on artifacts after a few change requests it
| definitely can.
| 77pt77 wrote:
| Even if you can't, a conversation can easily get larger than
| that.
| fblp wrote:
| I assume this will mean that long chats continue to get the
| "prompt is too long" error?
| penguin202 wrote:
| But will it remember any of it, and stop creating new redundant
| files when it can't find or understand what it's looking for?
| 1xer wrote:
| moaaaaarrrr
| aliljet wrote:
| This is definitely one of my CORE problems as I use these tools
| for "professional software engineering." I really desperately
| need LLMs to maintain extremely effective context, and it's not
| actually that interesting to see a new model that's marginally
| better than the last one (for my day-to-day).
|
| However, price is king. Allowing me to flood the context window
| with my code base is great, but given that the price has
| substantially increased, it makes sense to manage the context
| window more carefully in the current situation. The value of
| flooding their context window is great for them, but short of
| evals that look into how effectively Sonnet stays on track,
| it's not clear the value actually exists here.
| rootnod3 wrote:
| Flooding the context also means increasing the likelihood of
| the LLM confusing itself. Mainly because of the longer context.
| It derails along the way without a reset.
| aliljet wrote:
| How do you know that?
| EForEndeavour wrote:
| https://onnyunhui.medium.com/evaluating-long-context-
| lengths...
| bigmadshoe wrote:
| https://research.trychroma.com/context-rot
| joenot443 wrote:
| This is a good piece. Clearly it's a pretty complex
| problem, and the intuitive result a layman engineer like
| myself might expect doesn't reflect the reality of LLMs.
| Regex works as reliably on 20 characters as it does on 2M
| characters; the only difference is speed. I've learned
| this will probably _never_ be the case with LLMs; there
| will forever exist some level of epistemic doubt in the
| result.
|
| When they announced Big Contexts in 2023, they referenced
| being able to find a single changed sentence in the
| context's copy of Great Gatsby[1]. This example seemed
| _incredible_ to me at the time but now two years later
| I'm feeling like it was pretty cherry-picked. What does
| everyone else think? Could you feed a novel into an LLM
| and expect it to find the single change?
|
| [1] https://news.ycombinator.com/item?id=35941920
| adastra22 wrote:
| Depends on the change.
| bigmadshoe wrote:
| This is called a "needle in a haystack" test, and all the
| 1M context models perform perfectly on this exact
| problem, at least when your prompt and the needle are
| sufficiently similar.
|
| As the piece above references, this is a totally
| insufficient test for the real world. Things like "find
| two unrelated facts tied together by a question, then
| perform reasoning based on them" are much harder.
|
| Scaling context properly is O(n^2). I'm not really up to
| date on what people are doing to combat this, but I find
| it hard to believe the jump from 100k -> 1m context
| window involved a 100x (10^2) slowdown, so they're
| probably taking some shortcut.
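|
| The back-of-envelope version of that scaling argument (naive
| full attention, ignoring whatever shortcuts providers actually
| take):
|
|     old_ctx, new_ctx = 100_000, 1_000_000
|     attention_ratio = (new_ctx / old_ctx) ** 2  # 100x attention
|     per_token_ratio = new_ctx / old_ctx         # 10x per token
|     print(attention_ratio, per_token_ratio)     # 100.0 10.0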
| dang wrote:
| Discussed here:
|
| _Context Rot: How increasing input tokens impacts LLM
| performance_ -
| https://news.ycombinator.com/item?id=44564248 - July 2025
| (59 comments)
| F7F7F7 wrote:
| What do you think happens when things start falling outside
| of its context window? It loses access to parts of your
| conversation.
|
| And that's why it will gladly rebuild the same feature over
| and over again.
| anonz4FWNqnX wrote:
| I've had similar experiences. I've gone back and forth
| between running models locally and using the commercial
| models. The local models can be incredibly useful (Gemma,
| Qwen), but they need more patience and effort to get
| working.
|
| One advantage of running locally[1] is that you can set the
| context length manually and see how well the LLM uses it. I
| don't have an exact experience to relay, but it's not
| unusual for models to allow longer contexts yet ignore most
| of that context.
|
| Just making the context big doesn't mean the LLM is going
| to use it well.
|
| [1] I've been using LM Studio on both a MacBook Air and a
| MacBook Pro. Even a MacBook Air with 16GB can run pretty
| decent models.
| nomel wrote:
| A good example of this was the first Gemini model that
| allowed 1 million tokens, but would lose track of the
| conversation after a couple paragraphs.
| rootnod3 wrote:
| The longer the context and the discussion goes on, the more
| it can get confused, especially if you have to refine the
| conversation or code you are building on.
|
| Remember, at its core it's basically a text prediction
| engine. So the more varied the context is, the more
| likely it is to make a mess of it.
|
| Short context: the conversation leaves the context window and
| it loses context. Long context: it can mess with the model. So
| the trick is to strike a balance. But if it's an online
| model, you have fuck all to control. If it's a local
| model, you have some say in the parameters.
| fkyoureadthedoc wrote:
| https://github.com/adobe-research/NoLiMa
| giancarlostoro wrote:
| Here's a paper from MIT that covers how this could be
| resolved in an interesting fashion:
|
| https://hanlab.mit.edu/blog/streamingllm
|
| The AI field is reusing existing CS concepts for AI that we
| never had hardware for, and now these people are learning
| how applied Software Engineering can make their theoretical
| models more efficient. It's kind of funny, I've seen this
| in tech over and over. People discover new thing, then
| optimize using known thing.
| mamp wrote:
| Unfortunately, I think the context rot paper [1] found
| that the performance degradation when context increased
| still occurred in models using attention sinks.
|
| 1. https://research.trychroma.com/context-rot
| giancarlostoro wrote:
| Saw that paper but haven't had a chance to read it yet - are
| there other techniques that help, then? I assume there are a
| few different ones used.
| kridsdale3 wrote:
| The fact that this is happening is where the tremendous
| opportunity to make money as an experienced Software
| Engineer currently lies.
|
| For instance, a year or two ago, the AI people discovered
| "cache". Imagine how many millions the people who
| implemented it earned for that one.
| giancarlostoro wrote:
| I've been thinking the same, and it's things that you
| don't need some crazy ML degree to know how to do... A
| lot of the algorithms have been known... for a while now...
| Milk it while you can.
| Wowfunhappy wrote:
| I keep reading this, but with Claude Code in particular, I
| consistently find it gets smarter the longer my conversations
| go on, peaking right at the point where it auto-compacts and
| everything goes to crap.
|
| This isn't always true--some conversations go poorly and it's
| better to reset and start over--but it usually is.
| will_pseudonym wrote:
| This is my exact experience as well. I wonder if I should
| switch to using Sonnet so that I can have more time before
| auto-compact gets forced on me.
| jacobr1 wrote:
| I've found there usually is some key context that is
| missing. Maybe it is project structure or a sampling of
| some key patterns from different parts of the codebase, or
| key data models. Getting those into CLAUDE.md reduces the
| need to keep building up (as large) context.
|
| As an example, for one project I realized things were
| getting better after it started writing integration tests.
| I wasn't sure if the act of writing the tests forced it to
| reason about the black-box way the system would be used, or
| if there was another factor. Turns out it was just example
| usage. Extracting the usage patterns into both the README
| and CLAUDE.md was itself a simple request, and then I got
| similar performance on new tasks.
| benterix wrote:
| > it's not clear if the value actually exists here.
|
| Having spent a couple of weeks on Claude Code recently, I
| arrived at the conclusion that the net value for me from
| agentic AI is actually negative.
|
| I will give it another run in 6-8 months though.
| wahnfrieden wrote:
| Did you try with using Opus exclusively?
| freedomben wrote:
| Do you know if there's a way to force Claude code to do
| that exclusively? I've found a few env vars online but they
| don't seem to actually work
| wahnfrieden wrote:
| Peter Steinberger has been documenting his workflows and
| he relies exclusively on Opus at least until recently.
| (He also pays for a few Max 20x subscriptions at once to
| avoid rate limits.)
| atonse wrote:
| You can type /config and then go to the setting to pick a
| model.
| gdudeman wrote:
| Yes: type /model and then pick Opus 4.1.
| artursapek wrote:
| You can "force" it by just paying them $200 (which is
| nothing compared to the value)
| parineum wrote:
| Value is irrelevant. What's the return on investment you
| get from spending $200?
|
| Collecting value doesn't really get you anywhere if
| nobody is compensating you for it. Unless someone is
| going to either pay for it for you or give you $200/mo
| post-tax dollars, it's costing you money.
| wahnfrieden wrote:
| The return for me is faster output of features, fixes,
| and polish for my products which increases revenue above
| the cost of the tool. Did you need to ask this?
| parineum wrote:
| Yes, I did. Not everybody has their own product that
| might benefit from a $200 subscription. Most of us work
| for someone else and, unless that person is paying for
| the subscription, the _value_ it adds is irrelevant
| unless it results in better compensation.
|
| Furthermore, the advice was given to upgrade to a $200
| subscription from the $20 subscription. The difference in
| value that might translate into income between the $20
| option and the $200 option is very unclear.
| wahnfrieden wrote:
| If you are employed you should petition your employer for
| tools you want. Maybe you can use it to take the day off
| earlier or spend more time socializing. Or to get a
| promotion or performance bonus. Hopefully not just to
| meet rising productivity expectations without being
| handed the tools needed to achieve that. Having full-time
| access to these tools can also improve your own skills in
| using them, to profit from in a later career move or from
| contributing toward your own ends.
| parineum wrote:
| I'm not disputing that. I'm just pushing back against the
| casual suggestion (not by you) to just go spend $200.
|
| No doubt that you should ask you employer for the tools
| you want/need to do your job but plenty of us are using
| this kind of thing casually and the response to "Any way
| I can force it to use [Opus] exclusively?" is "Spend
| $200, it's worth it." isn't really helpful, especially in
| the context where the poster was clearly looking to try
| it out to see if it was worth it.
| Aeolun wrote:
| If you have the money, and like coding your own stuff,
| the $200 is worth it. If you just code for the
| enterprise? Not so much.
| epiccoleman wrote:
| is Opus that much better than Sonnet? My sub is $20 a
| month, so I guess I'd have to buy that I'm going to get a
| 10x boost, which seems dubious
| theshrike79 wrote:
| With the $20 plan you get Opus on the web and in the
| native app. Just not in Claude Code.
|
| IMO it's pretty good for design, but with code it gets in
| its head a bit too much and overthinks and
| overcomplicates solutions.
| artursapek wrote:
| Yes, Opus is much better at complicated architecture
| noarchy wrote:
| It does seem better in many regards, but the usage limits
| get hit quickly even with a paid account.
| mark_l_watson wrote:
| I am sort of with you. I am down to asking Gemini Pro a
| couple of questions a day, use ChatGPT just a few times a
| week, and about once a week use gemini-cli (either a short
| free session, or a longer session where I provide my API
| key.)
|
| That said I spend (waste?) an absurdly large amount of time
| each week experimenting with local models (sometimes
| practical applications, sometimes 'research').
| mikepurvis wrote:
| For a bit more nuance, I think I would say my overall net is
| about break-even. But I don't take that as "it's not worth it
| at all, abandon ship" but rather that I need to hone my
| instinct for what is and is not a good task for AI
| involvement, and what that involvement should look like.
|
| Throwing together a GHA workflow? Sure, make a ticket, assign
| it to copilot, check in later to give a little feedback and
| we're golden. Half a day of labour turned into fifteen
| minutes.
|
| But there are a lot of tasks that are far too nuanced where
| trying to take that approach just results in frustration and
| wasted time. There it's better to rely on editor completion
| or maybe the chat interface, like "hey I want to do X and Y,
| what approach makes sense for this?" and treat it like a
| rubber duck session with a junior colleague.
| cambaceres wrote:
| For me it's meant a huge increase in productivity, at least
| 3X.
|
| Since so many claim the opposite, I'm curious what you do,
| more specifically? I guess different roles/technologies
| benefit more from agents than others.
|
| I build full stack web applications in node/.net/react, more
| importantly (I think) is that I work on a small startup and
| manage 3 applications myself.
| datadrivenangel wrote:
| How do you structure your applications for maintainability?
| dingnuts wrote:
| You have small applications following extremely common
| patterns and using common libraries. Models are good at
| regurgitating patterns they've seen many times, with fuzzy
| find/replace translations applied.
|
| Try to build something like Kubernetes from the ground up
| and let us know how it goes. Or try writing a custom
| firmware for a device you just designed. Something like
| that.
| elevatortrim wrote:
| I think there are two broad cases where ai coding is
| beneficial:
|
| 1. You are a good coder but working on a project that is new
| to you, or building a new project, or working with a
| technology you are not familiar with. This is where AI is
| hugely beneficial. It not only accelerates you, it lets you
| do things you could not do otherwise.
|
| 2. You have spent a lot of time engineering your context
| and learning what AI is good at, and you use it very
| strategically where you know it will save time, not
| bothering otherwise.
|
| If you are a really good coder, really familiar with the
| project, and mostly changing its bits and pieces rather
| than building new functionality, AI won't accelerate you
| much. Especially if you did not invest the time to make it
| work well.
| nicce wrote:
| > I build full stack web applications in node/.net/react,
| more importantly (I think) is that I work on a small
| startup and manage 3 applications myself.
|
| I think this is your answer. For example, React and
| JavaScript are extremely popular and long-established. Are
| you using TypeScript and trying to get the most out of the
| types, or are you accepting everything the LLM gives you as
| JavaScript? How much do you care whether the code uses "soon
| to be deprecated" functions or the most optimized
| loop/implementation? How about the project structure?
|
| In other cases, the more precision you need, the less
| effective the LLM is.
| rs186 wrote:
| 3X if not 10X if you are starting a new project with
| Next.js, React, and Tailwind CSS for full-stack website
| development, solving an everyday problem. Yeah, I just
| witnessed that yesterday when creating a toy project.
|
| For my company's codebase, where we use internal tools and
| proprietary technology, solving a problem that does not
| exist outside the specific domain, on a codebase of over
| 1000 files? No way. Even locating the correct file to edit
| is non-trivial for a new (human) developer.
| GenerocUsername wrote:
| Your first week of AI usage should be spent crawling your
| codebase and generating context.md docs that can then be
| fed back into future prompts so that the AI understands
| your project space, packages, APIs, and code philosophy.
|
| I guarantee your internal tools are not revolutionary;
| they are just underrepresented in the ML model out of the
| box.
| nicce wrote:
| Even then, are you even allowed to use AI on such a
| codebase? Is some part of the code "bought", e.g.
| generated by a commercial compiler with a specific license?
| Is a pinky promise from the LLM provider enough?
| GenerocUsername wrote:
| Are the resources to understand the code on a computer?
| Whether it's code, Swagger, or a collection of sticky
| notes, your job is now to supply context to the AI.
|
| I am 100% convinced people who are not getting value from
| AI would have trouble explaining how to tie shoes to a
| toddler.
| orra wrote:
| That sounds incredibly boring.
|
| Is it effective? If so I'm sure we'll see models to
| generate those context.md files.
| cpursley wrote:
| Yes. And way less boring than manually reading a section
| of a codebase to understand what is going on after being
| away from it for 8 months. Claude's docs and git commit
| writing skills are worth it for that alone.
| blitztime wrote:
| How do you keep the context.md updated as the code
| changes?
| shmoogy wrote:
| I tell Claude to update it generally but you can probably
| use a hook
| tombot wrote:
| This. While it has context of the current problem, just
| ask Claude to re-read its own documentation and think of
| things to add that will help it in the future.
| MattGaiser wrote:
| Yeah, anecdotally it is heavily dependent on:
|
| 1. Using a common tech. It is not as good at Vue as it is
| at React.
|
| 2. Using it in a standard way. To get AI to really work
| well, I have had to change my typical naming conventions
| (or specify them in detail in the instructions).
| nicce wrote:
| React also often seems to be treated as an alias for Next.js.
| Models have a hard time telling the difference.
| mike_hearn wrote:
| My codebase has about 1500 files and is highly domain
| specific: it's a tool for shipping desktop apps[1] that
| handles all the building, packaging, signing, uploading
| etc for every platform on every OS simultaneously. It's
| written mostly in Kotlin, and to some extent uses a
| custom in-house build system. The rest of the build is
| Gradle, which is a notoriously confusing tool. The source
| tree also contains servers, command line tools and a
| custom scripting language which is used for all the
| scripting needs of the project [2].
|
| The code itself is quite complex and there's lots of
| unusual code for munging undocumented formats, speaking
| undocumented protocols, doing cryptography, Mac/Windows
| specific APIs, and it's all built on a foundation of a
| custom parallel incremental build system.
|
| In other words: a nightmare codebase for an LLM, nothing
| like other codebases. Yet Claude Code demolishes
| problems in it without breaking a sweat.
|
| I don't know why people have different experiences but
| speculating a bit:
|
| 1. I wrote most of it myself and this codebase is
| unusually well documented and structured compared to
| most. All the internal APIs have full JavaDocs/KDocs,
| there are extensive design notes in Markdown in the
| source tree, the user guide is also part of the source
| tree. Files, classes and modules are logically named.
| Files are relatively small. All this means Claude can
| often find the right parts of the source within just a
| few tool uses.
|
| 2. I invested in making a good CLAUDE.md and also wrote a
| script to generate "map.md" files that sit at the top of
| every module. These map files contain one-liners of what
| every source file contains. I used Gemini to make these
| due to its cheap 1M context window (a rough sketch of the
| idea is at the end of this comment). If Claude _does_
| struggle to find the right code by just reading the
| context files or guessing, it can consult the maps to
| locate the right place quickly.
|
| 3. I've developed a good intuition for what it can and
| cannot do well.
|
| 4. I don't ask it to do big refactorings that would
| stress the context window. IntelliJ is for refactorings.
| AI is for writing code.
|
| [1] https://hydraulic.dev
|
| [2] https://hshell.hydraulic.dev/
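|
| A rough sketch of the map.md generator mentioned in point 2
| (the summarize() helper is a stand-in; wire it up to whatever
| cheap long-context model you use, and adjust the source
| layout):
|
|     from pathlib import Path
|
|     def summarize(path: Path) -> str:
|         # stand-in: replace with a call to your LLM of choice;
|         # here we just grab the first non-blank line of the file
|         for line in path.read_text(errors="ignore").splitlines():
|             if line.strip():
|                 return line.strip()[:80]
|         return "(empty file)"
|
|     for module in Path("src").iterdir():    # assumed layout
|         if not module.is_dir():
|             continue
|         entries = [f"{f.relative_to(module)}: {summarize(f)}"
|                    for f in sorted(module.rglob("*.kt"))]
|         (module / "map.md").write_text("\n".join(entries) + "\n")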
| tptacek wrote:
| That's an interesting comment, because "locating the
| correct file to edit" was the very first thing LLMs did
| that was valuable to me as a developer.
| evantbyrne wrote:
| The problem with these discussions is that almost nobody
| outside of the agency/contracting world seems to track
| their time. Self-reported data is already sketchy enough
| without layering on the issue of relying on distant memory
| of fine details.
| andrepd wrote:
| Self-reports are notoriously overexcited; real results are,
| let's say, not so stellar.
|
| https://metr.org/blog/2025-07-10-early-2025-ai-
| experienced-o...
| logicprog wrote:
| Here's an in-depth analysis and critique of that study by
| someone whose job is literally to study programmers
| psychologically and who has experience with sociology studies:
| https://www.fightforthehuman.com/are-developers-slowed-
| down-...
|
| Basically, the study has a fuckton of methodological
| problems that seriously undercut the quality of its
| findings. Even assuming its findings are correct, if you
| look closer at the data, it doesn't show what it claims to
| show regarding developer estimations. The story of whether
| AI speeds up or slows down developers is actually much more
| nuanced: it precisely mirrors what the developers themselves
| say in the qualitative quote questionnaire, and relatively
| closely mirrors what the more nuanced people will say here
| -- that it helps a lot more with things you're less familiar
| with, that have scope creep, etc., but is less useful, or
| even negatively useful, for the opposite scenarios -- even
| in the worst-case setting.
|
| Not to mention this is studying a highly specific and
| rare subset of developers, and they even admit it's a
| subset that isn't applicable to the whole.
| acedTrex wrote:
| I have yet to get it to generate code past 10ish lines that
| I am willing to accept. I read stuff like this and wonder
| how low yall's standards are, or if you are working on
| projects that just do not matter in any real world sense.
| spicyusername wrote:
| 4/5 times I can easily get 100s of lines of output that
| only need a quick once-over.
|
| 1/5 times, I spend an extra hour tangled in code it
| outputs that I eventually just rewrite from scratch.
|
| Definitely a massive net positive, but that 20% is
| extremely frustrating.
| acedTrex wrote:
| That is fascinating to me; I've never seen it generate
| that much code that I would actually consider
| correct. It's always wrong in some way.
| LinXitoW wrote:
| In my experience, if I have to issue more than 2
| corrections, I'm better off restarting and beefing up the
| prompt or just doing it myself
| dillydogg wrote:
| Whenever I read comments from the people singing their
| praises of the technology, it's hard not to think of the
| study that found AI tools made developers slower in early
| 2025.
|
| >When developers are allowed to use AI tools, they take
| 19% longer to complete issues--a significant slowdown
| that goes against developer beliefs and expert forecasts.
| This gap between perception and reality is striking:
| developers expected AI to speed them up by 24%, and even
| after experiencing the slowdown, they still believed AI
| had sped them up by 20%.
|
| https://metr.org/blog/2025-07-10-early-2025-ai-
| experienced-o...
| mstkllah wrote:
| Ah, the very extensive study with 16 developers.
| Bulletproof results.
| izacus wrote:
| Yeah, we should listen to the one "trust me bro" dude
| instead.
| troupo wrote:
| Compared to "it's just a skill issue you're not prompting
| it correctly" crowd with literally zero actionable data?
| logicprog wrote:
| Here's an in-depth analysis and critique of that study by
| someone whose job is literally to study programmers
| psychologically and who has experience with sociology studies:
| https://www.fightforthehuman.com/are-developers-slowed-
| down-...
|
| Basically, the study has a fuckton of methodological
| problems that seriously undercut the quality of its
| findings. Even assuming its findings are correct, if you
| look closer at the data, it doesn't show what it claims to
| show regarding developer estimations. The story of whether
| AI speeds up or slows down developers is actually much more
| nuanced: it precisely mirrors what the developers themselves
| say in the qualitative quote questionnaire, and relatively
| closely mirrors what the more nuanced people will say here
| -- that it helps a lot more with things you're less familiar
| with, that have scope creep, etc., but is less useful, or
| even negatively useful, for the opposite scenarios -- even
| in the worst-case setting.
|
| Not to mention this is studying a highly specific and
| rare subset of developers, and they even admit it's a
| subset that isn't applicable to the whole.
| dillydogg wrote:
| This is very helpful, thank you for the resource
| djeastm wrote:
| Standards are going to be as low as the market allows, I
| think. In some industries code quality is paramount; in
| others it's negligible, and perhaps speed of development is
| the higher priority and the code is mostly disposable.
| wiremine wrote:
| > Having spent a couple of weeks on Claude Code recently, I
| arrived to the conclusion that the net value for me from
| agentic AI is actually negative.
|
| > For me it's meant a huge increase in productivity, at
| least 3X.
|
| How do we reconcile these two comments? I think that's a
| core question of the industry right now.
|
| My take, as a CTO, is this: we're giving people new tools,
| and very little training on the techniques that make those
| tools effective.
|
| It's sort of like we're dropping trucks and airplanes on a
| generation that only knows walking and bicycles.
|
| If you've never driven a truck before, you're going to
| crash a few times. Then it's easy to say "See, I told you,
| this new fangled truck is rubbish."
|
| Those who practice with the truck are going to get the hang
| of it, and figure out two things:
|
| 1. How to drive the truck effectively, and
|
| 2. When NOT to use the truck... when walking or the bike is
| actually the better way to go.
|
| We need to shift the conversation to techniques, and away
| from the tools. Until we do that, we're going to be forever
| comparing apples to oranges and talking around each other.
| jdgoesmarching wrote:
| Agreed, and it drives me bonkers when people talk about
| AI coding as if it represents a single technique,
| process, or tool.
|
| Makes me wonder if people spoke this way about "using
| computers" or "using the internet" in the olden days.
|
| We don't even fully agree on the best practices for
| writing code _without_ AI.
| moregrist wrote:
| > Makes me wonder if people spoke this way about "using
| computers" or "using the internet" in the olden days.
|
| There were gobs of terrible road metaphors that spun out
| of calling the Internet the "Information Superhighway."
|
| Gobs and gobs of them. All self-parody to anyone who knew
| anything.
|
| I hesitate to relate this to anything in the current AI
| era, but maybe the closest (and in a gallows humor/doomer
| kind of way) is the amount of exec speak on how many jobs
| will be replaced.
| porksoda wrote:
| Remember the ones who loudly proclaimed the internet to
| be a passing fad, not useful for normal people? All anti-
| LLM rants taste like that to me.
|
| I get why they thought that - it was kind of crappy
| unless you were someone excited about the future and
| prepared to bleed a bit on the edge.
| benterix wrote:
| > Remember the ones who loudly proclaimed the internet to
| be a passing fad, not useful for normal people. All anti
| LLM rants taste like that to me.
|
| For me they're very different and they sound much more like
| crypto-skepticism. It's not like "LLMs are worthless,
| there are no use cases, they should be banned" but rather
| "LLMs do have their use cases but they also do have
| inherent flaws that need to be addressed; embedding them
| in every product makes no sense etc.". (I mean LLMs as
| tech, what's happening with GenAI companies and their
| leaders is a completely different matter and we have
| every right to criticize every lie, hypocrisy and
| manipulation, but let's not mix up these two.)
| mh- wrote:
| _> Makes me wonder if people spoke this way about "using
| computers" or "using the internet" in the olden days._
|
| Older person here: they absolutely did, all over the
| place in the early 90s. I remember people decrying
| projects that moved them to computers everywhere I went.
| Doctors offices, auto mechanics, etc.
|
| Then later, people did the same thing about _the
| Internet_ (was written with a single word capital I by
| 2000, having been previously written as two separate
| words.)
|
| https://i.imgur.com/vApWP6l.png
| jacquesm wrote:
| And not all of those people were wrong.
| jeremy_k wrote:
| Well put. It really does come down to nuance. I find
| Claude is amazing at writing React / TypeScript. I mostly
| let it do its own thing and skim the results after.
| have it write Storybook components so I can visually
| confirm things look how I want. If something isn't quite
| right I'll take a look and if I can spot the problem and
| fix it myself, I'll do that. If I can't quickly spot it,
| I'll write up a prompt describing what is going on and
| work through it with AI assistance.
|
| Overall, React / Typescript I heavily let Claude write
| the code.
|
| The flip side of this is my server code is Ruby on Rails.
| Claude helps me a lot less here because this is my
| primary coding background. I also have a certain way I
| like to write Ruby. In these scenarios I'm usually asking
| Claude to generate tests for code I've already written
| and supplying lots of examples in context so the coding
| style matches. If I ask Claude to write something novel
| in Ruby I tend to use it as more of a jumping off point.
| It generates, I read, I refactor to my liking. Claude is
| still very helpful, but I tend to do more of the code
| writing for Ruby.
|
| Overall, helpful for Ruby, I still write most of the
| code.
|
| These are the nuances I've come to find and what works
| best for my coding patterns. But to your point, if you
| tell someone "go use Claude" and they have a
| preference in how to write Ruby and they see Claude
| generate a bunch of Ruby they don't like, they'll likely
| dismiss it as "This isn't useful. It took me longer to
| rewrite everything than just doing it myself". Which all
| goes to say, time using the tools, whether it's Cursor,
| Claude Code, etc. (I use OpenCode), is the biggest key, but
| figuring out how to get over the initial hump is probably
| the biggest hurdle.
| croes wrote:
| Do you only skim the results or do you audit them at some
| point to prevent security issues?
| jeremy_k wrote:
| What kind of security issues are you thinking about? I'm
| generating UI components like Selects for certain data
| types or Charts of data.
| dghlsakjg wrote:
| User input is a notoriously thorny area.
|
| If you aren't sanitizing and checking the inputs
| appropriately somewhere between the user and trusted
| code, you WILL get pwned.
|
| Rails provides default ways to avoid this, but it makes
| it very easy to do whatever you want with user input.
| Rails will not necessarily throw a warning if your AI
| decides that it wants to directly interpolate user input
| into a sql query.
| jeremy_k wrote:
| Well in this case, I am reading through everything that
| is generated for Rails because I want things to be done
| my way. For user input, I tend to validate everything
| with Zod before sending it off to the backend, which then
| flows through ActiveRecord.
|
| I get what you're saying that AI could write something
| that executes user input but with the way I'm using the
| tools that shouldn't happen.
| croes wrote:
| Do these components have JS, do they have npm
| dependencies?
|
| Since AI slopsquatting is a thing
|
| https://en.wikipedia.org/wiki/Slopsquatting
| jeremy_k wrote:
| I do not have AI install packages or do things like run
| Git commands for me.
| k9294 wrote:
| For this very reason I switched to TS for the backend as
| well. I'm not a big fan of JS, but the productivity gain
| of having shared types between frontend and backend, plus
| Claude Code's proficiency with TS, is immense.
| jeremy_k wrote:
| I considered this, but I'm just too comfortable writing
| my server logic in Ruby on Rails (as I do that for my day
| job and side project). I'm super comfortable writing
| client side React / Typescript but whenever I look at
| server side Typescript code I'm like "I should understand
| what this is doing but I don't" haha.
| jorvi wrote:
| It is not really a nuanced take when it compares
| 'unassisted' coding to riding a bicycle and AI-assisted
| coding to driving a truck.
|
| I put myself somewhere in the middle in terms of how
| great I think LLMs are for coding, but anyone who has
| worked with a colleague who loves LLM coding knows how
| horrid it is that the team has to comb through and
| double-check their commits.
|
| In that sense it would be equally nuanced to call AI-
| assisted development something like "pipe bomb coding".
| You toss out your code into the branch, and your non-AI'd
| colleagues have to quickly check if your code is a
| harmless tube of code or yet another contraption that
| quickly needs defusing before it blows up in everyone's
| face.
|
| Of course that is not nuanced either, but you get the
| point :)
| LinXitoW wrote:
| How nuanced the comparison seems also depends on whether
| you live in Arkansas or in Amsterdam.
|
| But I disagree that your counterexample has anything at
| all to do with AI coding. That very same developer was
| perfectly capable of committing untested crap without AI.
| Perfectly capable of copy-pasting the first answer they
| found on Stack Overflow. Perfectly capable of recreating
| utility functions over and over because they were too lazy
| to check if they already existed.
| nabla9 wrote:
| I agree.
|
| I experience a productivity boost, and I believe it's
| because I prevent LLMs from making design choices or
| handling creative tasks. They're best used as a "code
| monkey" to fill in function bodies once I've defined them.
| I design the data structures, functions, and classes
| myself. LLMs also help with learning new libraries by
| providing examples, and they can even write unit tests
| that I manually check. Importantly, no code I haven't
| read and accepted ever gets committed.
|
| Then I see people doing things like "write an app for
| ....", run, hey it works! WTF?
| quikoa wrote:
| It's not just about the programmer and his experience
| with AI tools. The problem domain and programming
| language(s) used for a particular project may have a
| large impact on how effective the AI can be.
| wiremine wrote:
| > The problem domain and programming language(s) used for
| a particular project may have a large impact on how
| effective the AI can be.
|
| 100%. Again, if we only focus on things like context
| windows, we're missing the important details.
| vitaflo wrote:
| But even on the same project with the same tools the
| general way a dev derives satisfaction from their work
| can play a big role. Some devs derive satisfaction from
| getting work done and care less about the code as long as
| it works. Others derive satisfaction from writing well
| architected and maintainable code. One can guess the
| reactions to how LLM's fit into their day to day lives
| for each.
| weego wrote:
| In a similar role and place with this.
|
| My biggest take so far: if you're a disciplined coder who
| can handle 20% of an entire project's time (project meaning
| anything from a bug fix through to an entire app) being
| spent on research, planning, and breaking those plans into
| phases and tasks, then augmenting your workflow with AI
| appears to yield large gains in productivity.
|
| Even then you need to learn a new version of explaining
| it 'out loud' to get proper results.
|
| If you're more inclined to dive in and plan as you go,
| and store the scope of the plan in your head because
| "it's easier that way" then AI 'help' will just
| fundamentally end up in a mess of frustration.
| cmdli wrote:
| My experience has been entirely the opposite as an IC. If
| I spend the time to delve into the code base to the point
| that I understand how it works, AI just serves as a mild
| improvement in writing code as opposed to implementing it
| normally, saving me maybe 5 minutes on a 2 hour task.
|
| On the other hand, I've found success when I have no idea
| how to do something and tell the AI to do it. In that
| case, the AI usually does the wrong thing but it can
| oftentimes reveal to me the methods used in the rest of
| the codebase.
| zarzavat wrote:
| Both modes of operation are useful.
|
| If you know how to do something, then you can give Claude
| the broad strokes of how you want it done and -- if you
| give enough detail -- hopefully it will come back with
| work similar to what you would have written. In this case
| it's saving you on the order of minutes, but those
| minutes add up. There is a possibility for negative time
| saving if it returns garbage.
|
| If you _don't_ know how to do something then you can see
| if an AI has any ideas. This is where the big
| productivity gains are: hours or even days can become
| minutes if you are sufficiently clueless about something.
| jacobr1 wrote:
| And importantly, the cycle time on this stuff can be much
| faster. Trying out different variants and iterating
| through larger changes can be huge.
| hirako2000 wrote:
| The issue is that you would be not just clueless but
| naive about the correctness of what it did.
|
| If you know what you're doing, at least you can review. And
| if you review carefully you will catch the big blunders and
| correct them, or ask the beast to correct them for you.
|
| > Claude, please generate a safe random number. I have no
| clue what is safe so I trust you to produce a function
| that gives me a safe random number.
|
| Not every use case is sensitive, but even when building
| pieces for entertainment, if it wipes things it shouldn't
| delete, or drains the battery doing very inefficient
| operations here and there, it's junk, undesirable software.
| bcrosby95 wrote:
| Claude will point you to the right neighborhood but to
| the wrong house. So if you're completely ignorant that's
| cool. But recognize that it's probably wrong and only a
| starting point.
|
| Hell, I spent 3 hours "arguing" with Claude the other day
| in a new domain because my intuition told me something
| was true. I brought out all the technical reasons why it
| was fine, but Claude kept skirting around it, saying the
| code change was wrong.
|
| After spending extra time researching it I found out
| there was a technical term for it and when I brought that
| up Claude finally admitted defeat. It was being a
| persistent little fucker before then.
|
| My current hobby is writing concurrent/parallel systems.
| Oh god AI agents are terrible. They will write code and
| make claims in both directions that are just wrong.
| hebocon wrote:
| > After spending extra time researching it I found out
| there was a technical term for it and when I brought that
| up Claude finally admitted defeat. It was being a
| persistent little fucker before then.
|
| Whenever I feel like I need to write "Why aren't you
| listening to me?!" I know it's time for a walk and a
| change in strategy. It's also a good indicator that I'm
| changing too much at once and that my requirements are
| too poorly defined.
| zarzavat wrote:
| To give an example: a few days ago I needed to patch an
| open source library to add a single feature.
|
| This is a pathologically bad case for a human. I'm in an
| alien codebase, I don't know where anything is. The
| library is vanilla JS (ES5 even!) so the only way to know
| the types is to read the function definitions.
|
| If I had to accomplish this task myself, my estimate
| would be 1-2 days. It takes time to read code, get
| oriented, understand what's going on, etc.
|
| I set Claude on the problem. Claude diligently starts
| grepping, it identifies the source locations where the
| change needs to be made. After 10 minutes it has a patch
| for me.
|
| Does it do exactly what I wanted it to do? No. But it
| does all the hard work. Now that I have the scaffolding
| it's easy to adapt the patch to do exactly what I need.
|
| On the other hand, yesterday I had to teach Claude that
| writing a loop of { writeByte(...) } is _not_ the right
| way to copy a buffer. Claude clearly thought that it was
| being very DRY by not having to duplicate the bounds
| check.
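|
| Roughly the difference, as a sketch (TypeScript/Node here
| purely for illustration; the real library was ES5 JS and
| the names are made up):
|
|       import { Buffer } from "node:buffer";
|
|       const src = Buffer.from("hello world");
|       const dst = Buffer.alloc(src.length);
|
|       // what it wrote: byte-at-a-time, re-doing the
|       // bounds check on every single iteration
|       for (let i = 0; i < src.length; i++) {
|         dst.writeUInt8(src.readUInt8(i), i);
|       }
|
|       // what I wanted: one bulk copy
|       src.copy(dst, 0, 0, src.length);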
|
| I remain sceptical about the vibe coders burning
| thousands of dollars using it in a loop. It's hardworking
| but stupid.
| teaearlgraycold wrote:
| LLMs are great at semantic searching through packages
| when I need to know exactly how something is implemented.
| If that's a major part of your job then you're saving a
| ton of time with what's available today.
| t0mas88 wrote:
| For me it has a big positive impact on two sides of the
| spectrum and not so much in the middle.
|
| One end is larger complex new features where I spend a
| few days thinking about how to approach it. Usually most
| thought goes into how to do something complex with good
| performance that spans a few apps/services. I write a
| half page high level plan description, a set of bullets
| for gotchas and how to deal with them and list normal
| requirements. Then let Claude Code run with that. If the
| input is good you'll get a 90% version and then you can
| refactor some things or give it feedback on how to do
| some things more cleanly.
|
| The other end of the spectrum is "build this simple
| screen using this API, like these 5 other examples". It
| does those well because it's almost advanced autocomplete
| mimicking your other code.
|
| Where it doesn't do well for me is in the middle between
| those two. Some complexity, not a big plan and not simple
| enough to just repeat something existing. For those
| things it makes a mess or you end up writing a lot of
| instructions/prompt and could have just done it yourself.
| ath3nd wrote:
| > How do we reconcile these two comments? I think that's
| a core question of the industry right now.
|
| The current freshest study focusing on experienced
| developers showed a net negative in the productivity when
| using an LLM solution in their flow:
|
| https://metr.org/blog/2025-07-10-early-2025-ai-
| experienced-o...
|
| My conclusion on this, as an ex VP of Engineering, is
| that good senior developers find little utility in LLMs
| and even find them to be a nuisance/detriment, while for
| juniors they can be a godsend, as they help them with
| syntax and coax the solution out of them.
|
| It's like training wheels to a bike. A toddler might find
| 3x utility, while a person who actually can ride a bike
| well will find themselves restricted by training wheels.
| pesfandiar wrote:
| Your analogy would be much better with giving workers a
| work horse with a mind of its own. Trucks come with clear
| instructions and predictable behaviour.
| chasd00 wrote:
| > Your analogy would be much better with giving workers a
| work horse with a mind of its own.
|
| i think this is a very insightful comment with respect to
| working with LLMs. If you've ever ridden a horse you
| don't really tell it to walk, run, turn left, turn right,
| etc you have to convince it to do those things and not be
| too aggravating while you're at it. With a truck simple
| cause and effect applies but with horse it's a
| negotiation. I feel like working with LLMs is like a
| negotiation, you have to coax out of it what you're
| after.
| pletnes wrote:
| Being a consultant / programmer with feet on the ground,
| eh, hands on the keyboard: some orgs let us use some AI
| tools, others do not. Some projects are predominantly new
| code based on recent tech (React); others include
| maintaining legacy stuff on windows server and
| proprietary frameworks. AI is great on some tasks, but
| unavailable or ignorant about others. Some projects have
| sharp requirements (or at least, have requirements)
| whereas some require 39 out of 40 hours a week guessing
| at what the other meat-based intelligences actually want
| from us.
|
| What <<programming>> actually entails differs
| enormously; so does AI's relevance.
| abc_lisper wrote:
| I doubt there is much art to getting an LLM to work for you,
| despite all the hoopla. Any competent engineer can figure
| that much out.
|
| The real dichotomy is this. If you are aware of the
| tools/APIs and the Domain, you are better off writing the
| code on your own, except may be shallow changes like
| refactorings. OTOH, if you are not familiar with the
| domain/tools, using an LLM gives you a huge leg up by
| preventing you from getting stuck and providing initial
| momentum.
| jama211 wrote:
| I dunno, first time I tried an LLM I was getting so
| annoyed because I just wanted it to go through a css file
| and replace all colours with variables defined in root,
| and it kept missing stuff and spinning and I was getting
| so frustrated. Then a friend told me I should instead
| just ask it to write a script which accomplishes that
| goal, and it did it perfectly in one prompt, then ran it
| for me, and also wrote another script to check it hadn't
| missed any and ran that.
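|
| For anyone curious, the kind of script I mean looked
| roughly like this (a made-up sketch, not the exact one it
| wrote; the path and regex are illustrative):
|
|       import { readFileSync, writeFileSync } from "node:fs";
|
|       const path = "styles.css"; // illustrative
|       let css = readFileSync(path, "utf8");
|
|       // collect unique hex colours, assign each a variable
|       const matches = css.match(/#[0-9a-fA-F]{3,6}\b/g) ?? [];
|       const vars = new Map(
|         [...new Set(matches)].map(
|           (c, i): [string, string] => [c, `--colour-${i}`]
|         )
|       );
|
|       // swap every literal colour for var(--colour-N)
|       for (const [c, v] of vars) {
|         css = css.replace(new RegExp(`${c}\\b`, "g"), `var(${v})`);
|       }
|
|       // prepend a :root block with the definitions
|       const root = [":root {"]
|         .concat([...vars].map(([c, v]) => `  ${v}: ${c};`))
|         .concat(["}", "", ""])
|         .join("\n");
|       writeFileSync(path, root + css);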
|
| At no point when it was getting stuck initially did it
| suggest another approach, or complain that it was outside
| its context window even though it was.
|
| This is a perfect example of "knowing how to use an LLM"
| taking it from useless to useful.
| abc_lisper wrote:
| Which one did you use and when was this? I mean, nobody
| gets anything working right the first time. You've got to
| spend a few days at least trying to understand the tool.
| badlucklottery wrote:
| This is my experience as well.
|
| LLMs currently produce pretty mediocre code. A lot of that
| is a "garbage in, garbage out" issue but it's just the
| current state of things.
|
| If the alternative is noob code or just not doing a task
| at all, then mediocre is great.
|
| But 90% of the time I'm working in a familiar
| language/domain so I can grind out better code relatively
| quickly and do so in a way that's cohesive with nearby
| code in the codebase. The main use-case I have for AI in
| that case is writing the trivial unit tests for me.
|
| So it's another "No Silver Bullet" technology where the
| problem it's fixing isn't the essential problem software
| engineers are facing.
| brulard wrote:
| I believe there IS much art in LLMs and Agents
| especially. Maybe you can get like 20% boost quite
| quickly, but there is so much room to grow it to maybe
| 500% long term.
| worldsayshi wrote:
| I think it's very much down to which kind of problem
| you're trying to solve.
|
| If a solution can subtly fail and it is critical that it
| doesn't, LLM is net negative.
|
| If a solution is easy to verify or if it is enough that
| it walks like a duck and quacks like one, LLM can be very
| useful.
|
| I've had examples of both lately. I'm very much both
| bullish and bearish atm.
| oceanplexian wrote:
| It's pretty simple, AI is now political for a lot of
| people. Some folks have a vested interest in downplaying
| it or over hyping it rather than impartially approaching
| it as a tool.
| Gigachad wrote:
| It's also just not consistent. A manager who can't code
| using it to generate a react todo list thinks it's 100x
| efficiency while a senior software dev working on
| established apps finds it a net productivity negative.
|
| AI coding tools seem to excel at demos and flop on the
| field so the expectation disconnect between managers and
| actual workers is massive.
| chasd00 wrote:
| One thing to think about is many software devs have a
| very hard time with code they didn't write. I've seen
| many devs do a lot of work to change code to something
| equivalent (even with respect to performance and
| readability) only because it's not the way they would
| have done it. I could see people having a hard time using
| what the LLM produced without having to "fix it up" and
| basically re-write everything.
| jama211 wrote:
| Yeah sometimes I feel like a unicorn because I don't
| really care about code at all, so long as it conforms to
| decent standards and does what it needs to do. I honestly
| believe engineers often overestimate the importance of
| elegance in code too, to the point of not realising that
| the slowdown of a project caused by overly perfect code
| is genuinely not worth it.
| parpfish wrote:
| i dont care if the code is elegant, i care that the code
| is _consistent_.
|
| do the same thing in the same way each time and it lets
| you chunk it up and skim it much easier. if there are
| little differences each time, you have to keep asking
| yourself "is it done differently here for a particular
| reason?"
| vanviegen wrote:
| Exactly! And besides that, new code being consistent with
| its surrounding code used to be a sign of careful
| craftsmanship (as opposed to spaghetti-against-the-wall
| style coding), giving me some confidence that the
| programmer may have considered at least the most
| important nasty edge cases. LLMs have rendered that
| signal mostly useless, of course.
| dennisy wrote:
| Also another view is that developers below a certain
| level get a positive benefit and those above get a
| negative effect.
|
| This makes sense, as the models are an average of the
| code out there and some of us are above and below that
| average.
|
| Sorry btw I do not want to offend anyone who feels they
| do garner a benefit from LLMs, just wanted to drop in
| this idea!
| ath3nd wrote:
| That's my anecdotal experience as well! Junior devs
| struggle with a lot of things:
|
| - syntax
|
| - iteration over an idea
|
| - breaking down the task and verifying each step
|
| Working with a tool like Claude that gets them started
| quickly and iterates on the solution together with them
| helps them tremendously and educates them on best
| practices in the field.
|
| Contrast that with a seasoned developer with a domain
| experience, good command of the programming language and
| knowledge of the best practices and a clear vision of how
| the things can be implemented. They hardly need any help
| on those steps where the junior struggled and where the
| LLMs shine, maybe some quick check on the API, but that's
| mostly it. That's consistent with the finding of the
| study https://metr.org/blog/2025-07-10-early-2025-ai-
| experienced-o... that experienced developers' performance
| suffered when using an LLM.
|
| What I used as a metaphor before to describe this
| phenomenon is _training wheels_: kids learning how to
| ride a bike can get the basics with the help and safety
| of the wheels, but adults that can already ride a bike
| don't have any use for the training wheels, and can often
| find themselves restricted by them.
| epolanski wrote:
| > that experienced developers' performance suffered when
| using an LLM
|
| That experiment is really not significant. A bunch of OSS
| devs without much training in the tools used them for
| very little time and found it to be a net negative.
| ath3nd wrote:
| > That experiment is really non significant
|
| That's been anecdotally my experience as well, I have
| found juniors benefitted the most so far in professional
| settings with lots of time spent on learning the tools.
| Senior devs either negatively suffered or didn't
| experience an improvement. The only study so far also
| corroborates that anecdotal experience.
|
| We can wait for other studies that are more relevant and
| with larger sample sizes, but so far the only folks
| actually trying to measure productivity experienced a
| negative effect, so I am more inclined to believe it until
| other studies come along.
| parpfish wrote:
| i don't know if anybody else has experienced this, but
| one of my biggest time-sucks with cursor is that it
| doesn't have a way for me to steer it mid-process that
| i'm aware of.
|
| it'll build something that fails a test, but _i know_ how
| to fix the problem. i can't jump in and manually fix it or
| tell it what to do. i just have to watch it churn through
| the problem and eventually give up and throw away a 90%
| good solution that i knew how to fix.
| williamdclt wrote:
| You can click stop, and prompt it from there
| smokel wrote:
| My experience _was_ exactly the opposite.
|
| Experienced developers know when the LLM goes off the
| rails, and are typically better at finding useful
| applications. Junior developers on the other hand, can
| let horrible solutions pass through unchecked.
|
| Then again, LLMs are improving so quickly, that the most
| recent ones help juniors to learn and understand things
| better.
| rzz3 wrote:
| It's also really good for me as a very senior engineer
| with serious ADHD. Sometimes I get very mentally blocked,
| and telling Claude Code to plan and implement a feature
| gives me a really valuable starting point and has a way
| of unblocking me. For me it's easier to elaborate off of
| an existing idea or starting point and refactor than
| start a whole big thing from zero on my own.
| unoti wrote:
| > Having spent a couple of weeks on Claude Code recently,
| I arrived to the conclusion that the net value for me
| from agentic AI is actually negative.
|
| > For me it's meant a huge increase in productivity, at
| least 3X.
|
| > How do we reconcile these two comments? I think that's
| a core question of the industry right now.
|
| Every success story with AI coding involves giving the
| agent enough context to succeed on a task that it can see
| a path to success on. And every story where it fails is a
| situation where it had not enough context to see a path
| to success on. Think about what happens with a junior
| software engineer: you give them a task and they either
| succeed or fail. If they succeed wildly, you give them a
| more challenging task. If they fail, you give them more
| guidance, more coaching, and less challenging tasks with
| more personal intervention from you to break it down into
| achievable steps.
|
| As models and tooling becomes more advanced, the place
| where that balance lies shifts. The trick is to ride that
| sweet spot of task breakdown and guidance and
| supervision.
| troupo wrote:
| > And every story where it fails is a situation where it
| had not enough context to see a path to success on.
|
| And you know that because people are actively sharing the
| projects, code bases, programming languages and
| approaches they used? Or because your _gut feeling_ is
| telling you that?
|
| For me, agents failed with enough context, and with not
| enough context, and succeeded with context, or not
| enough, and succeeded and failed with and without
| "guidance and coaching"
| hirako2000 wrote:
| Bold claims.
|
| From my experience, even the top models continue to fail
| delivering correctness on many tasks even with all the
| details and no ambiguity in the input.
|
| In particular when details are provided, in fact.
|
| I find that with solutions likely to be well oiled in the
| training data, a well formulated set of *basic*
| requirements often leads to a zero shot, "a" perfectly
| valid solution. I say "a" solution because there is still
| this probability (seed factor) that it will not honour
| part of the demands.
|
| E.g., build a to-do list app for the browser, persist
| entries into a hashmap, no duplicates, can edit and
| delete, responsive design.
|
| I never recall seeing an LLM kick off C++ code out of
| that. But I also don't recall any LLM succeeding in all
| these requirements, even though there aren't that many.
|
| It may use a hash set, or even a set for persistence
| because it avoids duplicates out of the box. And it would
| even use a hash map to show it used a hashmap but as an
| intermediary data structure. It would be responsive, but
| the edit/delete buttons may not show, or may not be
| functional. Saving the edits may look like it worked, but
| did not.
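|
| To be clear about how little is being asked, one way the
| storage part could look (a TypeScript sketch, names and
| persistence choice illustrative) is roughly:
|
|       // entries keyed by id in a Map (the "hashmap"),
|       // duplicates rejected by text
|       const entries = new Map<string, string>();
|
|       function addEntry(id: string, text: string): boolean {
|         const dup = [...entries.values()].includes(text);
|         if (dup || entries.has(id)) return false;
|         entries.set(id, text);
|         persist();
|         return true;
|       }
|
|       function editEntry(id: string, text: string): void {
|         if (!entries.has(id)) return;
|         entries.set(id, text);
|         persist();
|       }
|
|       function deleteEntry(id: string): void {
|         entries.delete(id);
|         persist();
|       }
|
|       function persist(): void {
|         // browser persistence for the sketch
|         localStorage.setItem("todos", JSON.stringify([...entries]));
|       }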
|
| The comparison with junior developers is a pale one. Even
| a mediocre developer can test their work and won't pretend
| that it works if it doesn't even execute. If a developer
| lies too many times they lose trust. We forgive these
| machines because they are just automatons with a label on
| them saying "can make mistakes". We have no recourse to
| make them speak the truth; they lie by design.
| brulard wrote:
| > From my experience, even the top models continue to
| fail delivering correctness on many tasks even with all
| the details and no ambiguity in the input.
|
| You may feel like there are all the details and no
| ambiguity in the prompt. But there may still be missing
| parts, like examples, structure, a plan, or division into
| smaller parts (it can do that quite well if explicitly
| asked for). If you give too many details at once, it gets
| confused, but there are ways to let the model access
| context as it progresses through the task.
|
| And models are just one part of the equation. Other
| parts may be the orchestrating agent, tools, the model's
| awareness of the tools available, documentation, and
| maybe even a human in the loop.
| epolanski wrote:
| > From my experience, even the top models continue to
| fail delivering correctness on many tasks even with all
| the details and no ambiguity in the input.
|
| Please provide the examples, both of the problem and your
| input so we can double check.
| sixothree wrote:
| It might just be me, but I feel like it excels with
| certain languages while in other situations it falls
| flat. Throw it a well architected and documented code
| base in a popular language and you can definitely feel it
| get into its groove.
|
| Also, giving it tools to ensure success is just as
| important. MCPs can sometimes make a world of difference,
| especially when it needs to search your code base.
| delegate wrote:
| Easy. You're 3x more productive for a while and then you
| burn yourself out.
|
| Or lose control of the codebase, which you no longer
| understand after weeks of vibing (since we can only think
| and accumulate knowledge at 1x).
|
| Sometimes the easy way out is throwing a week of
| generated code away and starting over.
|
| So that 3x doesn't come for free at all, besides API
| costs, there's the cost of quickly accumulating tech debt
| which you have to pay if this is a long term project.
|
| For prototypes, it's still amazing.
| brulard wrote:
| You conflate efficient usage of AI with "vibing". Code
| can be written by AI and still follow the agreed-upon
| structures and rules and still can and should be
| thoroughly reviewed. The 3x absolutely does not come for
| free. But the price may have been paid in advance by
| learning how to use those tools best.
|
| I agree the vibe-coding mentality is going to be a major
| problem. But aren't all tools used well and used badly?
| Aeolun wrote:
| > Or lose control of the codebase, which you no longer
| understand after weeks of vibing (since we can only think
| and accumulate knowledge at 1x).
|
| I recognize this, but at the same time, I'm still better
| at remembering the scope of the codebase than Claude is.
|
| If Claude gets a 1M context window, we can start sticking
| a general overview of the codebase in every single
| prompt.
| bloomca wrote:
| > 2. When NOT to use the truck... when walking or the
| bike is actually the better way to go.
|
| Some people write racing car code, where a truck just
| doesn't bring much value. Some people go into more
| uncharted territories, where there are no roads (so the
| truck will not only slow you down, it will bring a bunch
| of dead weight).
|
| If the road is straight, AI is wildly good. In fact, it
| is probably _too_ good; but it can easily miss a turn and
| it will take a minute to get it on track.
|
| I am curious if we'll be able to fine tune LLMs to assist
| with less known paths.
| troupo wrote:
| > How do we reconcile these two comments? I think that's
| a core question of the industry right now.
|
| We don't. Because there's no hard data:
| https://dmitriid.com/everything-around-llms-is-still-
| magical...
|
| And when hard data of any kind _does_ start appearing, it
| may actually point in a different direction:
| https://metr.org/blog/2025-07-10-early-2025-ai-
| experienced-o...
|
| > We need to shift the conversation to techniques, and
| away from the tools.
|
| No, you're asking to shift the conversation to magical
| incantations which experts claim work.
|
| What we need to do is shift the conversation to
| _measurements_
| jf22 wrote:
| A couple of weeks isn't enough.
|
| I'm six months into using LLMs to generate 90% of my code
| and finally understanding the techniques and limitations.
| gwd wrote:
| > How do we reconcile these two comments? I think that's
| a core question of the industry right now.
|
| The question is, for those people who _feel_ like things
| are going faster, what's the _actual_ velocity?
|
| A month ago I showed it a basic query of one resource I'd
| rewritten to use a "query builder" API. Then I showed it
| the "legacy" query of another resource, and asked it to
| do something similar. It managed to get very close on the
| first try, and with only a few more hours of tweaking and
| testing managed to get a reasonably thorough test suite
| to pass. I'm sure that took half the time it would have
| taken me to do it by hand.
|
| Fast forward to this week, when I ran across some strange
| bugs, and had to spend a day or two digging into the code
| again, and do some major revision. Pretty sure those bugs
| wouldn't have happened if I'd written the code myself;
| but even though I reviewed the code, they went under the
| radar, because I hadn't really understood the code as
| well as I thought I had.
|
| So was I faster overall? Or did I just offload some of
| the work to myself at an unpredictable point in the
| future? I don't "vibe code": I keep a tight rein on the
| tool and review everything it's doing.
| Gigachad wrote:
| Pretty much. We are in an era of vibe efficiency.
|
| If programmers really did get 3x faster, why has software
| not improved any faster than it always has?
| lfowles wrote:
| Probably because we're attempting to make 3x more
| products
| epolanski wrote:
| This is a very sensible point.
| nhaehnle wrote:
| I just find it hard to take the 3x claims at face value
| because actual code generation is only a small part of my
| job, and so Amdahl's law currently limits any
| productivity increase from agentic AI to well below 2x
| for me.
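|
| (To make the arithmetic concrete, with a made-up split: if
| writing code is 30% of my time and an agent makes that
| part 3x faster, the overall speedup is
| 1 / (0.7 + 0.3/3) = 1.25x -- nowhere near 3x.)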
|
| (And I believe I'm fairly typical for my team. While
| there are more junior folks, it's not that I'm just stuck
| with powerpoint or something all day. Writing code is
| rarely the bottleneck.)
|
| So... either their job is really just churning out code
| (where do these jobs exist, and are there any jobs like
| this at all that still care about quality?) or the most
| generous explanation that I can think of is that people
| are really, really bad at self-evaluations of
| productivity.
| jg0r3 wrote:
| Three things I've noticed as a dev whose field involves a
| lot of niche software development.
|
| 1. LLMs seem to benefit 'hacker-type' programmers from my
| experience. People who tend to approach coding problems
| in a very "kick the TV from different angles and see if
| it works" strategy.
|
| 2. There seems to be two overgeneralized types of devs in
| the market right now: Devs who make niche software and
| devs who make web apps, data pipelines, and other
| standard industry tools. LLMs are much better at helping
| with the established tool development at the moment.
|
| 3. LLMs are absolute savants at making clean-ish looking
| surface level tech demos in ~5 minutes, they are masters
| of selling "themselves" to executives. Moving a demo to a
| production stack? Eh, results may vary to say the least.
|
| I use LLMs extensively when they make sense for me.
|
| One fascinating thing for me is how different everyone's
| experience with LLMs is. Obviously there's a lot of noise
| out there. With AI haters and AI tech bros kind of
| muddying the waters with extremist takes.
| Ianjit wrote:
| "How do we reconcile these two comments? I think that's a
| core question of the industry right now."
|
| There is no correlation between developers' self-
| assessment of their productivity and their actual
| productivity.
|
| https://www.youtube.com/watch?v=tbDDYKRFjhk
| thanhhaimai wrote:
| I work across the stack (frontend, backend, ML)
|
| - For FrontEnd or easy code, it's a speed up. I think it's
| more like 2x instead of 3x.
|
| - For my backend (hard trading algo), it has like 90%
| failure rate so far. There is just so much for it to reason
| through (balance sheet, lots, wash, etc). All agents I have
| tried, even on Max mode, couldn't reason through all the
| cases correctly. They end up thrashing back and forth.
| Gemini most of the time will go into the "depressed" mode
| on the code base.
|
| One thing I notice is that the Max mode on Cursor is not
| worth it for my particular use case. The problem is either
| easy (frontend), which means any agent can solve it, or
| it's hard, and Max mode can't solve it. I tend to pick the
| fast model over strong model.
| squeaky-clean wrote:
| I just want to point out that they only said agentic models
| were a negative, not AI in general. I don't know if this is
| what they meant, but I personally prefer to use a web or
| IDE AI tool and don't really like the agentic stuff
| compared to those. For me agentic AI would be a net
| positive against no-AI, but it's a net negative compared to
| other AI interfaces
| dmitrygr wrote:
| > For me it's meant a huge increase in productivity, at
| least 3X.
|
| Quite possibly you are doing very common things that are
| often done and thus are in the training set a lot, while the
| parent post is doing something more novel that forces the
| model to extrapolate, which they suck at.
| cambaceres wrote:
| Sure, I won't argue against that. The more complex (and
| fun) parts of the applications I tend to write myself.
| The productivity gains are still real though.
| bcrosby95 wrote:
| My current guess is it's how the programmer solves problems
| in their head. This isn't something we talk about much.
|
| People seem to find LLMs do well with well-spec'd features.
| But for me, creating a good spec doesn't take any less time
| than creating the code. The problem for me is the
| translation layer that turns the model in my head into
| something more concrete. As such, creating a spec for the
| LLM doesn't save me any time over writing the code myself.
|
| So if it's a one shot with a vague spec and that works
| that's cool. But if it's well spec'd to the point the LLM
| won't fuck it up then I may as well write it myself.
| byryan wrote:
| That makes sense, especially if you're building web
| applications that are primarily "just" CRUD operations. If
| a lot of the API calls follow the same pattern and the
| application is just a series of API calls + React UI then
| that seems like something an LLM would excel at. LLMs are
| also more proficient in TypeScript/JS/Python compared to
| other languages, so that helps as well.
| carlhjerpe wrote:
| I'm currently unemployed in the DevOps field (resigned and
| got a long vacation). I've been using various models to
| write various Kubernetes plug-ins and simple automation
| scripts. It's been a godsend implementing things which
| would require too much research otherwise, my ADHD context
| window is smaller than Claude's.
|
| Models are VERY good at Kubernetes since they have very
| anal (good) documentation requirements before merging.
|
| I would say my productivity gain is unmeasurable since I
| can produce things I'd ADHD out of unless I've got a whip
| up my rear.
| qingcharles wrote:
| On the right projects, definitely an enormous upgrade for
| me. Have to be judicious with it and know when it is right
| and when it's wrong. I think people have to figure out what
| those times are. For now. In the future I think a lot of
| the problems people are having with it will diminish.
| epolanski wrote:
| > Since so many claim the opposite
|
| The overwhelming majority of those claiming the opposite
| are a mixture of:
|
| - users with wrong expectations, such as AI's ability to do
| the job on its own with minimal effort from the user. They
| have marketers to blame.
|
| - users that have AI skill issues: they simply don't
| understand/know how to use the tools appropriately. I could
| provide countless examples from the importance of quality
| prompting, good guidelines, context management, and many
| others. They have only their laziness or lack of interest
| to blame.
|
| - users that are very defensive about their job/skills.
| Many feel threatened by AI taking their jobs or diminishing
| it, so their default stance is negative. They have their
| ego to blame.
| darkmarmot wrote:
| I work in distributed systems programming and have been
| horrified by the crap the AIs produce. I've found them to
| be quite helpful at summarizing papers and doing research,
| providing jumping off points. But none of the code I write
| can be scraped from a blog post.
| revskill wrote:
| Truth. To some extent, the agent doesn't know what it's doing
| at all; it lacks a real brain. Maybe we should just treat it
| as a hard worker.
| flowerthoughts wrote:
| What type of work do you do? And how do you measure value?
|
| Last week I was using Claude Code for web development. This
| week, I used it to write ESP32 firmware and a Linux kernel
| driver. Sure, it made mistakes, but the net was still very
| positive in terms of efficiency.
| verall wrote:
| > This week, I used it to write ESP32 firmware and a Linux
| kernel driver.
|
| I'm not meaning to be negative at all, but was this for a
| toy/hobby or for a commercial project?
|
| I find that LLMs do very well on small greenfield toy/hobby
| projects but basically fall over when brought into
| commercial projects that often have bespoke requirements
| and standards (i.e. has to cross compile on qcc, comply
| with autosar, in-house build system, tons of legacy code
| lying around that may or may not be used).
|
| So no shade - I'm just really curious what kind of project
| you were able get such good results writing ESP32 FW and
| kernel drivers for :)
| lukebechtel wrote:
| Maintaining project documentation is:
|
| (1) Easier with AI
|
| (2) Critical for letting AI work effectively in your
| codebase.
|
| Try creating well structured rules for working in your
| codebase, put in .cursorrules or Claude equivalent... let
| AI help you... see if that helps.
| theshrike79 wrote:
| The magic to using agentic LLMs efficiently is...
|
| proper project management.
|
| You need to have good documentation, split into logical
| bits. Tasks need to be clearly defined and not have
| extensive dependencies.
|
| And you need to have a simple feedback loop where you can
| easily run the program and confirm the output matches
| what you want.
| troupo wrote:
| And the chance of that working depends on the weather,
| the phase of the moon and the arrangement of bird bones
| in a druidic augury.
|
| It's a non-deterministic system producing statistically
| relevant results with no failure modes.
|
| I had Cursor one-shot issues in internal libraries with
| zero rules.
|
| And then suggest I use StringBuilder (Java) in a 100%
| Elixir project with carefully curated cursor rules as
| suggested by the latest shamanic ritual trends.
| oceanplexian wrote:
| I work in FAANG, have been for over a decade. These tools
| are creating a huge amount of value, starting with
| Copilot but now with tools like Claude Code and Cursor.
| The people doing so don't have a lot of time to comment
| about it on HN since we're busy building things.
| nomel wrote:
| What are the AI usage policies like at your org? Where I
| am, we're severely limited.
| jpc0 wrote:
| > These tools are creating a huge amount of value...
|
| > The people doing so don't have a lot of time to comment
| about it on HN since we're busy building...
|
| "We're so much more productive that we don't have time to
| tell you how much more productive we are"
|
| Do you see how that sounds?
| wijwp wrote:
| To be fair, AI isn't going to give us more time outside
| work. It'll just increase expectations from leadership.
| drusepth wrote:
| I feel this, honestly. I get so much more work done
| (currently: building & shipping games, maintaining
| websites, managing APIs, releasing several mobile apps,
| and developing native desktop applications) managing 5x
| claude instances that the majority of my time is sucked
| up by just prompting whichever agent is done on their
| next task(s), and there's a real feeling of lost
| productivity if any agent is left idle for too long.
|
| The only time to browse HN left is when all the agents
| are comfortably spinning away.
| GodelNumbering wrote:
| I don't see how FAANG is relevant here. But the 'FAANG' I
| used to work at had an emergent problem of people
| throwing a lot of half baked 'AI-powered' code over the
| wall and letting reviewers deal with it (due to incentives,
| not that they were malicious). In orgs like infra where
| everything needs to be reviewed carefully, this is purely
| a burden
| nme01 wrote:
| I also work for a FAANG company and so far most employees
| agree that while LLMs are good for writing docs,
| presentations or emails, they still lack a lot when it
| comes to writing a maintainable code (especially in Java,
| they supposedly do better in Go, don't know why, not my
| opinion). Even simple refactorings need to be carefully
| checked. I really like them for doing stuff that I know
| nothing about though (eg write a script using a certain
| tool, tell me how to rewrite my code to use certain
| library etc) or for reviewing changes
| verall wrote:
| I work in a FAANG equivalent for a decade, mostly in
| C++/embedded systems. I work on commercial products used
| by millions of people. I use the AI also.
|
| When others are finding gold in rivers similar to mine,
| and I'm mostly finding dirt, I'm curious to ask and see
| how similar the rivers really are, or if the river they
| are panning in is actually somewhere I do find gold, but
| not a river I get to pan in often.
|
| If the rivers really are similar, maybe I need to work on
| my panning game :)
| boppo1 wrote:
| > creating a huge amount of value
|
| Do you write software, or work in
| accounting/finance/marketing?
| ewoodrich wrote:
| I use agentic tools all the time but comments like this
| always make me feel like someone's trying to sell me
| their new cryptocoin or NFT.
| GodelNumbering wrote:
| This is my experience too. Also, their propensity to jump
| into code without necessarily understanding the
| requirement is annoying to say the least. As the project
| complexity grows, you find yourself writing longer and
| longer instructions just to guardrail.
|
| Another rather interesting thing is that they tend to
| gravitate towards sweep the errors under the rug kind of
| coding which is disastrous. e.g. "return X if we don't
| find the value so downstream doesn't crash". These are
| the kind of errors that no human, not even a beginner on
| their first day learning to code, would make, and they
| are extremely annoying to debug.
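|
| A made-up illustration of the pattern (TypeScript,
| hypothetical names):
|
|       type Prices = Map<string, number>;
|
|       // what the model tends to write: swallow the miss
|       // so nothing crashes, and bad data flows downstream
|       function getPrice(prices: Prices, id: string): number {
|         return prices.get(id) ?? 0; // silently "fine"
|       }
|
|       // what I actually want: fail loudly at the source
|       function getPriceStrict(prices: Prices, id: string): number {
|         const p = prices.get(id);
|         if (p === undefined) {
|           throw new Error(`no price for ${id}`);
|         }
|         return p;
|       }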
|
| Tl;dr: LLMs tend to treat every single thing you give
| them as a demo homework project.
| tombot wrote:
| > their propensity to jump into code without necessarily
| understanding the requirement is annoying to say the
| least.
|
| Then don't let it, collaborate on the spec, ask Claude to
| make a plan. You'll get far better results
|
| https://www.anthropic.com/engineering/claude-code-best-
| pract...
| verall wrote:
| > Another rather interesting thing is that they tend to
| gravitate towards sweep the errors under the rug kind of
| coding which is disastrous. e.g. "return X if we don't
| find the value so downstream doesn't crash".
|
| Yes, these are painful and basically the main reason I
| moved from Claude to Gemini - it felt insane to be
| begging the AI - "No, you actually have to fix the bug,
| in the code you wrote, you cannot just return some random
| value when it fails, it actually has to work".
| GodelNumbering wrote:
| Claude in particular abuses the word 'Comprehensive' a
| lot. You express that you're unhappy with its approach,
| it will likely come back with "Comprehensive plan to ..."
| and then write like 3 bullet points under it, that is of
| course after profusely apologizing. On a sidenote, I wish
| LLMs never apologized and instead just said I don't know
| how to do this.
| jorvi wrote:
| Running LLM code with kernel privileges seems like
| courting disaster. I wouldn't dare do that unless I had a
| rock-solid grasp of the subsystem, and at that point, why
| not just write the code myself? LLM coding is on-average
| 20% slower.
| LinXitoW wrote:
| In my experience in a Java code base, it didn't do any of
| this, and did a good job with exceptions.
|
| And I have to disagree that these aren't errors that
| beginners or even intermediates make. Who hasn't
| swallowed an error because "that case totally, most
| definitely won't ever happen, and I need to get this
| done"?
| flowerthoughts wrote:
| Totally agree.
|
| This was a debugging tool for Zigbee/Thread.
|
| The web project is Nuxt v4, which was just released, so
| Claude keeps wanting to use v3 semantics, and you have to
| keep repeating the known differences, even if you use
| CLAUDE.md. (They moved client files under a app/
| subdirectory.)
|
| All of these are greenfield prototypes. I haven't used it
| in large systems, and I can totally see how that would be
| context overload for it. This is why I was asking GP
| about the circumstances.
| LinXitoW wrote:
| Ironically, AI mirrors human developers in that it's far
| more effective when working in a well written, well
| documented code base. It will infer functionality from
| function names. If those are shitty,
| short, or full of weird abbreviations, it'll have a hard
| time.
|
| Maybe it's a skill issue, in the sense of having a decent
| code base.
| greenie_beans wrote:
| same. agents are good with easy stuff and debugging but
| extremely bad with complexity. has no clue about chesterton's
| fence, and it's hard to parse the results especially when it
| creates massive diffs. creates a ton of abandoned/cargo code.
| lots of misdirection with OOP.
|
| chatting with claude and copy/pasting code between my IDE
| and claude is still the most effective for more complex
| stuff, at least for me.
| jmartrican wrote:
| Maybe that is a skills issue.
| rootusrootus wrote:
| If you are suggesting that LLMs are proving quite good at
| taking over the low skilled work that probably 90% of devs
| spend the majority of their time doing, I totally agree. It
| is the simplest explanation for why many people think they
| are magic, while some people find very little value.
|
| On the occasion that I find myself having to write web code
| for whatever reason, I'm very happy to have Claude. I don't
| enjoy coding for the web, like at all.
| logicprog wrote:
| I think that's definitely true -- these tools are only
| really taking care of the relatively low skill stuff;
| synthesizing algorithms and architectures and approaches
| that have been seen before, automating building out for
| scaffolding things, or interpolating skeletons, and
| running relatively typical bash commands for you after
| making code changes, or implementing fairly specific
| specifications of how to approach novel architectures,
| algorithms, or code logic, automating exploring code bases
| and building understanding of what things do and where
| they are and how they relate and the control flow (which
| would otherwise take hours of laboriously grepping around
| and reading code), all in small bite sized pieces with a
| human in the loop. They're even able to make complete and
| fully working code for things that are a small variation
| or synthesization of things they've seen a lot before in
| technologies they're familiar with.
|
| But I think that that can still be a pretty good boost --
| I'd say maybe 20 to 30%, plus MUCH less headache, when
| used right -- even for people that are doing really
| interesting and novel things, because even if your work
| has a lot of novelty and domain knowledge to it, there's
| always mundane horseshit that eats up way too much of
| your time and brain cycles. So you can use these agents
| to take care of all the peripheral stuff for you and just
| focus on what's interesting to you. Imagine you want to
| write some really novel unique complex algorithm or
| something but you do want it to have a GUI debugging
| interface. You can just use Imgui or TKinter if you can
| make Python bindings or something and then offload that
| whole thing onto the LLM instead of having to carry that
| extra cognitive load and page out the meat of what you're
| working on whenever you need to make a more than trivial
| modification to your GUI.
|
| I also think this opens up the possibility for a lot more
| people to write ad hoc personal programs for various
| things they need, which is even more powerful when
| combined with something like Python that has a ton of
| pre-made libraries that do all the difficult stuff for
| you, or something like emacs that's highly malleable and
| rewards being able to write programs with it by making
| them able to very powerfully integrate with your workflow
| and environment. Even for people who already know how to
| program and like programming even, there's still an
| opportunity cost and an amount of time and effort and
| cognitive load investment in making programs. So by
| significantly lowering that you open up the opportunities
| even for us and for people who don't know how to program
| at all, their productivity basically goes from zero to
| one, an improvement of 100% (or infinity lol)
| phist_mcgee wrote:
| What a supremely arrogant comment.
| rootusrootus wrote:
| I often have such thoughts about things I read on HN but
| I usually follow the site guidelines and keep it to
| myself.
| ericmcer wrote:
| Agreed, daily Cursor user.
|
| Just got out of a 15m huddle with someone trying to
| understand what they were doing in a PR before they admitted
| Claude generated everything and it worked but they weren't
| sure why... Ended up ripping about 200 LoC out because what
| Claude "fixed" wasn't even broken.
|
| So never let it generate code, but the autocomplete is
| absolutely killer. If you understand how to code in 2+
| languages you can make assumptions about how to do things in
| many others and let the AI autofill the syntax in. I have
| been able to swap to languages I have almost no experience in
| and work fairly well because memorizing syntax is irrelevant.
| daymanstep wrote:
| > I have been able to swap to languages I have almost no
| experience in and work fairly well because memorizing
| syntax is irrelevant.
|
| I do wonder whether your code does what you think it does.
| Similar-sounding keywords in different languages can have
| completely different meanings. E.g. the volatile keyword in
| Java vs C++. You don't know what you don't know, right? How
| do you know that the AI generated code does what you think
| it does?
| jacobr1 wrote:
| Beyond code-gen I think some techniques are very
| underutilized. One can generate tests, generate docs,
| explain things line by line. Explicitly explaining
| alternative approaches and tradeoffs is helpful too.
| While, as with everything in this space, there are
| imperfection, I find a ton of value in looking beyond the
| code into thinking through the use cases, alternative
| approaches and different ways to structure the same
| thing.
| pornel wrote:
| I've wasted time debugging phantom issues due to LLM-
| generated tests that were misusing an API.
|
| Brainstorming/explanations can be helpful, but also watch
| out for Gell-Mann amnesia. It's annoying that LLMs always
| sound smart whether they are saying something smart or
| not.
| Miraste wrote:
| Yes, you can't use any of the heuristics you develop for
| human writing to decide if the LLM is saying something
| stupid, because its best insights and its worst
| hallucinations all have the same formatting, diction, and
| style. Instead, you need to engage your frontal cortex
| and rationally evaluate every single piece of information
| it presents, and that's tiring.
| valenterry wrote:
| It's like listening to a politician or lawyer, who might
| talk absolute bullshit in the most persuading words. =)
| spanishgum wrote:
| The same way I would with any of my own code - I would
| test it!
|
| The key here is to spend less time searching, and more
| time understanding the search result.
|
| I do think the vibe factor is going to bite companies in
| the long run. I see a lot of vibe code pushed by both
| junior and senior devs alike, where it's clear not enough
| time was spent reviewing the product. This behavior is
| being actively rewarded now, but I do think the attitude
| around building code as fast as possible will change if
| impact to production systems becomes realized as a net
| negative. Time will tell.
| senko wrote:
| > Just got out of a 15m huddle with someone trying to
| understand what they were doing in a PR before they
| admitted Claude generated everything and it worked but they
| weren't sure why...
|
| But .. that's not the AI's fault. If people submit _any_
| PRs (including AI-generated or AI-assisted) without
| _completely_ understanding them, I'd treat it as a serious
| breach of professional conduct and (gently, for first-
| timers) stress that this is _not_ acceptable.
|
| As someone hitting the "Create PR" (or equivalent) button,
| you accept responsibility for the code in question. If you
| submit slop, it's 100% on you, not on any tool used.
| draxil wrote:
| But it's pretty much a given at this point that if you
| use agents to code for any length of time it starts to
| atrophy your ability to understand what's going on. So,
| yeah, it's a bit of a devil's chalice.
| whatever1 wrote:
| If you have to review what the LLM wrote then there is no
| productivity gain.
|
| Leadership asks for vibe coding
| senko wrote:
| > If you have to review what the LLM wrote then there is
| no productivity gain.
|
| I do not agree with that statement.
|
| > Leadership asks for vibe coding
|
| Leadership always asks for more, better, faster.
| mangamadaiyan wrote:
| > Leadership always asks for more, better, faster.
|
| More and faster, yes. Almost never better.
| swat535 wrote:
| > If you have to review what the LLM wrote then there is
| no productivity gain.
|
| You always have to review the code, whether it's written
| by another person, yourself or an AI.
|
| I'm not sure how this translates into the loss of
| productivity?
|
| Did you mean to say that the code AI generates is
| difficult to review? In those cases, it's the fault of
| the code author and not the AI.
|
| Using AI like any other tool requires experience and
| skill.
| WolfeReader wrote:
| I've seen AI create incorrect solutions and deceptive
| variable names. Reviewing the code is absolutely
| necessary.
| epolanski wrote:
| > If you have to review what the LLM wrote then there is
| no productivity gain
|
| Stating something with confidence does not make it
| automatically true.
| fooster wrote:
| I suggest you upgrade your code review skill. I find it
| vastly quicker in most cases to review code than write it
| in the first place.
| whatever1 wrote:
| Anyone can skim code and type "looks good to me".
| qingcharles wrote:
| The other day I caught it changing the grammar and spelling
| in a bunch of static strings in a totally different part of
| a project, for no sane reason.
| bdamm wrote:
| I've seen it do this as well. Odd things like swapping
| the severity level on log statements that had nothing to
| do with the task.
|
| Very careful review of my commits is the only way
| forward, for a long time.
| ericmcer wrote:
| That sounds similar to what it was doing here. It
| basically took a function like `thing = getThing(); id =
| thing.id` and created `id = getThingId()` and replaced
| hundreds of lines and made a new API endpoint.
|
| Not a huge deal because it works, but it seems like you
| would have 100,000 extra lines if you let Claude do
| whatever it wanted for a few months.
| epolanski wrote:
| You're blaming the tool and not the tool user.
| meowtimemania wrote:
| For me it depends on the task. For some tasks (maybe
| things that don't have good existing examples in my
| codebase?) I'll spend 3x the time repeatedly asking
| claude to do something for me.
| 9cb14c1ec0 wrote:
| The more I use Claude Code, the more aware I become of its
| limitations. On the whole, it's a useful tool, but the bigger
| the codebase the less useful. I've noticed a big difference
| on its performance on projects with 20k lines of code versus
| 100k. (Yes, I know. A 100k line project is still very small
| in the big picture)
| Aeolun wrote:
| I think one of the big issues with CC is that it'll read
| the first occurrence of something, and then think it's
| _found_ it. Never mind that there are 17 instances spread
| throughout the codebase.
|
| I have to be really vigilant and tell it to search the
| codebase for any duplication, then resolve it, if I want
| it to keep being good at what it does.
| sorhaindop wrote:
| This exact phrase has been said by 3 different users...
| weird.
| sorhaindop wrote:
| "Having spent a couple of weeks on Claude Code recently, I
| arrived to the conclusion that the net value for me from
| agentic AI is actually negative" - smells like BS to me.
| alexchamberlain wrote:
| I'm not sure how, and maybe some of the coding agents are doing
| this, but we need to teach the AI to use abstractions, rather
| than the whole code base for context. We as humans don't hold
| the whole codebase in our head, and we shouldn't expect the AI
| to either.
| F7F7F7 wrote:
| There are a billion and one repos that claim to help do this.
| Let us know when you find one.
| siwatanejo wrote:
| I do think AIs are already using abstractions, otherwise you
| would be submitting all the source code of your dependencies
| into the context.
| TheOtherHobbes wrote:
| I think they're recognising patterns, which is not the same
| thing.
|
| Abstractions are stable, they're explicit in their domains,
| good abstractions cross multiple domains, and they
| typically come with a symbolic algebra of available
| operations.
|
| Math is made of abstractions.
|
| Patterns are a weaker form of cognition. They're implicit,
| heavily context-dependent, and there's no algebra. You have
| to poke at them crudely in the hope you can make them do
| something useful.
|
| Using LLMs feels more like the latter than the former.
|
| If LLMs were generating true abstractions they'd be finding
| meta-descriptions for code and language and making them
| accessible directly.
|
| AGI - or ASI - may be be able to do that some day, but it's
| not doing that now.
| anthonypasq wrote:
| the fact we cant keep the repo in our working memory is a
| flaw of our brains. i cant see how you could possibly make
| the argument that if you were somehow able to keep the entire
| codebase in your head that it would be a disadvantage.
| SkyBelow wrote:
| Information tradeoff. Even if you could keep the entire
| code base in memory, if something else has to be left out
| of memory, then you have to consider the value of an
| abstraction verses whatever other information is lost.
| Abstractions also apply to the business domain and work
| the same way.
|
| You also have time tradeoffs. Like time to access memory
| and time to process that memory to achieve some outcome.
|
| There is also quality. If you can keep the entire code base
| in memory but with some chance of confusion, while
| abstractions will allow less chance of confusion, then the
| tradeoff of abstractions might be worth it still.
|
| Even if we assume a memory that has no limits, can access
| and process all information at constant speed, and no
| quality loss, there is still communication limitations to
| worry about. Energy consumption is yet another.
| sdesol wrote:
| LLMs (in their current implementation) are probabilistic, so
| they really need the actual code to predict the most likely
| next tokens.
| Now loading the whole code base can be a problem in itself,
| since other files may negatively affect the next token.
| photon_lines wrote:
| Sorry -- I keep seeing this being used but I'm not entirely
| sure how it differs from most of human thinking. Most human
| 'reasoning' is probabilistic as well and we rely on
| 'associative' networks to ingest information. In a similar
| manner - LLMs use association as well -- and not only that,
| but they are capable of figuring out patterns based on
| examples (just like humans are) -- read this paper for
| context: https://arxiv.org/pdf/2005.14165. In other words,
| they are capable of grokking patterns from simple data
| (just like humans are). I've given various LLMs my
| requirements and they produced working solutions for me by
| simply 1) including all of the requirements in my prompt
| and 2) asking them to think through and 'reason' through
| their suggestions and the products have always been
| superior to what most humans have produced. The 'LLMs are
| probabilistic predictors' comments though keep appearing on
| threads and I'm not quite sure I understand them -- yes,
| LLMs don't have 'human context' i.e. data needed to
| understand human beings since they have not directly been
| fed in human experiences, but for the most part -- LLMs are
| not simple 'statistical predictors' as everyone brands them
| to be. You can see a thorough write-up I did of what GPT is
| / was here if you're interested:
| https://photonlines.substack.com/p/intuitive-and-visual-
| guid...
| didibus wrote:
| You seem possibly more knowledgeable than me on the
| matter.
|
| My impression is that LLMs predict the next token based
| on the prior context. They do that by having learned a
| probability distribution from tokens -> next-token.
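|
| (In symbols, as I understand it, the model just learns
| P(next token | previous tokens), and a whole output is the
| chain P(x_1) * P(x_2 | x_1) * ... * P(x_n | x_1..x_n-1),
| sampled one token at a time.)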
|
| Then as I understand, the models are never reasoning
| about the problem, but always about what the next token
| should be given the context.
|
| The chain of thought is just rewarding them so that the
| next token isn't predicting the token of the final answer
| directly, but instead predicting the token of the
| reasoning to the solution.
|
| Since human language in the dataset contains text that
| describes many concepts and offers many solutions to
| problems. It turns out that predicting the text that
| describes the solution to a problem often ends up being
| the correct solution to the problem. That this was true
| was kind of a lucky accident and is where all the
| "intelligence" comes from.
| photon_lines wrote:
| So - in the pre-training step you are right -- they are
| simple 'statistical' predictors but there are more steps
| involved in their training which turn them from simple
| predictors to being able to capture patterns and reason
| -- I tried to come up with an intuitive overview of how
| they do this in the write-up and I'm not sure I can give
| you a simple explanation here, but I would recommend you
| play around with Deep-Seek and other more advanced
| 'reasoning' or 'chain-of-reason' models and ask them to
| perform tasks for you: they are not simply statistically
| combining information together. Many times they are able
| to reason through and come up with extremely advanced
| working solutions. To me this indicates that they are not
| 'accidentally' stumbling upon solutions based on statistics
| -- they actually are able to 'understand' what you are
| asking them to do and to produce valid results.
| didibus wrote:
| If you observe the failure modes of current models, you
| see that they fail in ways that align with probabilistic
| token prediction.
|
| I don't mean that the textual prediction is simple, it's
| very advanced and it learns all kinds of relationships,
| patterns and so on.
|
| But it doesn't have a real model and thinking process
| relating to the actual problem. It thinks about what text
| could describe a solution that is linguistically and
| semantically probable.
|
| Since human language embeds so much of the logic and so many
| ground truths, that's good enough to result in a textual
| description that approximates or nails the actual
| underlying problem.
|
| And this is why we see them being able to solve quite
| advanced problems.
|
| I admit that people are wondering now, what's different
| about human thinking? Maybe we do the same, you invent a
| probable sounding answer and then check if it was
| correct, rinse and repeat until you find one that works.
|
| But this in itself is a big conjecture. We don't really
| know how human thinking works. We've found a method that
| works well for computers and now we wonder if maybe we're
| just the same but scaled even higher or with slight
| modifications.
|
| I've heard from ML experts though that they don't think
| so. Most seem to believe a different architecture will be
| needed: world models, ensembles of specialized models with
| different architectures working together, etc. That LLMs
| are fundamentally kind of limited by their nature as next
| token predictors.
| coderenegade wrote:
| I think the intuitive leap (or at least, what I believe)
| is that meaning is encoded in the media. A given context
| and input encodes a particular meaning that the model is
| able to map to an output, and because the output is also
| in the same medium (tokens, text), it also has meaning.
| Even reasoning can fit in with this, because the model
| generates additional meaningful context that allows it to
| better map to an output.
|
| How you find the function that does the mapping probably
| doesn't matter. We use probability theory and information
| theory, because they're the best tools for the job, but
| there's nothing to say you couldn't handcraft it from
| scratch if you were some transcendent creature.
| didibus wrote:
| Yes exactly.
|
| The text of human natural language that it is trained on
| encodes the solutions to many problems as well as a lot
| of ground truths.
|
| The way I think of it is: first you have a random text
| generator. This generative "model" can in theory find the
| solution to all problems that text can describe.
|
| If you had a way to assert if it found the correct
| solution, you could run it and eventually it would
| generate the text that describes the working solution.
|
| Obviously inefficient and not practical.
|
| What if you made it so it skipped generating all text
| that isn't valid, sensical English?
|
| Well, now it would find the correct solution in far fewer
| iterations, but still too slowly.
|
| What if it generated only text that made sense to follow
| the context of the question?
|
| Now you might start to see it 100-shot, 10-shot, maybe
| even 1-shot some problems.
|
| What if you tuned that to the max? Well you get our
| current crop of LLMs.
|
| What else can you do to make it better?
|
| Tune the dataset: remove text that describes wrong answers
| to prior context so it learns not to generate those. Add
| more quality answers to prior context, add more
| problems/solutions, etc.
|
| Instead of generating the answer to a mathematical
| equation the above way, generate the Python code to run
| to get the answer.
|
| Instead of generating the answer to questions about
| current real-world events/facts (like the weather), have
| it generate the web search query to find it.
|
| If you're asking a more complex question, instead of
| generating the answer directly, have it generate smaller
| logical steps towards the answer.
|
| Etc.
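|
| A tiny sketch of that last idea, with a stubbed-out model call
| (`fake_model` is hypothetical; a real system would call an API
| and sandbox the execution):
|
| ```python
| # "Generate the code, not the answer": run the model's output
| # instead of trusting its arithmetic directly.
| def fake_model(prompt: str) -> str:
|     # Pretend the model answered a math question with runnable code.
|     return "result = sum(i * i for i in range(1, 101))"
|
| def answer_with_code(question: str) -> int:
|     code = fake_model(f"Write Python that computes: {question}")
|     scope = {}
|     exec(code, scope)       # unsafe outside a sandbox; fine for a sketch
|     return scope["result"]
|
| print(answer_with_code("the sum of squares from 1 to 100"))  # 338350
| ```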
| sdesol wrote:
| I'm not sure if I would say human reasoning is
| 'probabilistic', unless you are taking a very far step
| back and saying that, based on how the person lived, they
| have ingrained biases (weights) that dictate how they
| reason. I don't know if LLMs have a built-in scepticism
| like humans do, which plays a significant role in
| reasoning.
|
| Regardless of whether you believe LLMs are probabilistic
| or not, I think what we are both saying is that context is
| king and what the LLM says is dictated by the context
| (either through training or introduced by the user).
| Workaccount2 wrote:
| Humans have a neuro-chemical system that performs
| operations with electrical signals.
|
| That's the level to look at, unless you have a dualist
| view of the brain (we are channeling supernatural
| forces).
| lll-o-lll wrote:
| Yep, just like looking at a bird's feather through a
| microscope explains the principles of flight...
|
| Complexity theory doesn't have a mathematics (yet), but
| that doesn't mean we can't see that it exists. Studying
| the brain at the lowest levels hasn't led to any major
| insights into how cognition functions.
| brookst wrote:
| I personally believe that quantum effects play a role and
| we'll learn more once we understand the brain at that
| level, but I recognize that is an intuition and may well
| be wrong.
| photon_lines wrote:
| 'I don't know if LLMs have a built in scepticism like
| humans do' - humans don't have an 'in built skepticism'
| -- we learn it through experience and through being
| taught how to 'reason' within school (and it takes a very
| long time to do this). You believe that this is in-
| grained but you may have forgotten having to slog through
| most of how the world works and being tested when you
| went to school and when your parents taught you these
| things. On the context component: yes, context is vitally
| important (just as it is with humans) -- you can't
| produce a great solution unless you understand the 'why'
| behind it and how the current solution works so I 100%
| agree with that.
| ijidak wrote:
| For me, the way humans finish each other's sentences and
| often think of quotes from the same movies at the same
| time in conversation (when there is no clear reason for
| that quote to be a part of the conversation), indicates
| that there is a probabilistic element to human thinking.
|
| Is it entirely probabilistic? I don't think so. But, it
| does seem that a chunk of our speech generation and
| processing is similar to LLMs. (e.g. given the words I've
| heard so far, my brain is guessing words x y z should
| come next.)
|
| I feel like the conscious, executive mind humans have
| exercises some active control over our underlying
| probabilistic element. And LLMs lack the conscious
| executive.
|
| e.g. They have our probabilistic capabilities, without
| some additional governing layer that humans have.
| coderenegade wrote:
| I think the better way to look at it is that
| probabilistic models seem to be an accurate model for
| human thought. We don't really know how humans think, but
| we know that they probably aren't violating information
| theoretic principles, and we observe similar phenomena
| when we compare humans with LLMs.
| nomel wrote:
| No, it doesn't, nor do we. It's why abstractions and
| documentation exist.
|
| If you know what a function achieves, and you trust it to
| do that, you don't need to see/hold its exact
| implementation in your head.
| sdesol wrote:
| But documentation doesn't include styling or preferred
| patterns, which is why I think a lot of people complain
| that the LLM will just produce garbage. Also, documentation
| is not guaranteed to be correct or up to date. To produce
| the best code based on what you are hoping for, I do think
| having the actual code is necessary. Unless styling/design
| patterns are not important, then yes, documentation will
| suffice, provided it is accurate and up to date.
| throwaway314155 wrote:
| /compact in Claude Code is effectively this.
| brulard wrote:
| Compact is a reasonable default way to do that, but quite
| often it discards important details. It's better to have CC
| store important details, decisions and reasons in a
| document where they can be reviewed and modified if needed.
| LinXitoW wrote:
| They already do, or at least Claude Code does. It will search
| for a method name, then only load a chunk of that file to get
| the method signature, for example.
|
| It will use the general information you give it to make
| educated guesses about where things are. If it knows the
| code is Vue based and it has to do something with "users",
| it might search for "src/*/*User*.vue".
|
| This is also the reason why the quality of your code makes
| such a large difference. The more consistent the naming of
| files and classes, the better the AI is at finding them.
| felipeerias wrote:
| Claude Code can get access to a language server like clangd
| through a MCP server, for example
| https://github.com/isaacphi/mcp-language-server
| sdesol wrote:
| > I really desperately need LLMs to maintain extremely
| effective context
|
| I actually built this. I'm still not ready to say "use the tool
| yet" but you can learn more about it at
| https://github.com/gitsense/chat.
|
| The demo link is not up yet as I need to finalize an admin tool,
| but you should be able to follow the npm instructions to play
| around with it.
|
| The basic idea is, you should be able to load your entire repo
| or repos and use the context builder to help you refine it. Or
| you can create custom analyzers that you can do 'AI Assisted'
| searches with, like executing `!ask find all frontend
| code that does [this]`, and because the analyzer knows how
| to extract the correct metadata to support that query, you'll
| be able to easily build the context using it.
| kvirani wrote:
| Wait that's not how Cursor etc work? (I made assumptions)
| trenchpilgrim wrote:
| Dunno about Cursor but this is exactly how I use Zed to
| navigate groups of projects
| sdesol wrote:
| I don't use Cursor so I can't say, but based on what I've
| read, they optimize for smaller context to reduce cost and
| probably for performance. The issue is, I think this is
| severely flawed as LLMs are insanely context sensitive and
| forgetting to include a reference file can lead to
| undesirable code.
|
| I am obviously biased, but I still think to get the best
| results, the context needs to be human curated to ensure
| everything the LLM needs will be present. LLMs are
| probabilistic, so the more relevant context, the greater
| the chances the final output is the most desired.
| hirako2000 wrote:
| Not clear how it gets around what is, ultimately, a context
| limit.
|
| I've been fiddling with some process too, would be good if
| you shared the how. The readme looks like yet another full
| fledged app.
| sdesol wrote:
| Yes there is a context window limit, but I've found for
| most frontier models, you can generate very effective code
| if the context window is under 75,000 tokens provided the
| context is consistent. You have to think of everything from
| a probability point of view and the more logical the
| context, the greater the chances of better code.
|
| For example, if the frontend doesn't need to know the
| backend code (other than the interface), not including the
| backend code when solving a frontend problem can reduce
| context size and improve the chances of the expected
| output. You just need to ensure you include the necessary
| interface documentation.
|
| As for the full fledged app, I think you raised a good
| point and I should add a 'No lock in' section for why to
| use it. The app has a message tool that lets you pick and
| choose what messages to copy. Once you've copied the
| context (including any conversation messages that can help
| the LLM), you can use the context where ever you want.
|
| My strategy with the app is to be the first place you go to
| start a conversation before you even generate code, so my
| focus is helping you construct contexts (the smaller the
| better) to feed into LLMs.
| handfuloflight wrote:
| Doesn't Claude Code do all of this automatically?
| sdesol wrote:
| I haven't looked at Claude Code, so I don't know if it has
| analyzers that understand how to extract any type of data
| other than the specific coding data it is trained on. Based
| on the runtime for some tasks, I would not be surprised if
| it is going through all the files and asking "is this
| relevant?"
|
| My tool is mainly targeted at massive code bases and
| enterprise as I still believe the most efficient way to
| build accurate context is by domain experts.
|
| Right now, I would say 95% of my code is AI generated (98%
| human architectured) and I am spending about $2 a day on
| LLM costs and the code generation part usually never runs
| more than 30 seconds for most tasks.
| handfuloflight wrote:
| Well you should look at it, because it's not going
| through all files. I looked at your product and the
| workflow is essentially asking me to do manually what
| Claude Code does automatically. Granted, manually selecting the
| context will probably lead to lower costs in any case
| because Claude Code invokes tool calls like grep to do
| its search, so I do see merit in your product in that
| respect.
| sdesol wrote:
| Looking at the code, it does have some sort of automatic
| discovery. I also don't know how scalable Claude Code is.
| I've spent over a decade thinking about code search, so I
| know what the limitations are for enterprise code.
|
| One of the neat tricks that I've developed is, I would
| load all my backend code for my search component and then
| I would ask the LLM to trace a query and create a context
| bundle for only the files that are affected. Once the LLM
| has finished, I just need to do a few clicks to refine an
| 80,000 token window down to about 20,000 tokens.
|
| I would not be surprised if this is one of the tricks
| that it does, as it is highly effective. Also, yes, my tool
| is manual, but I treat conversations as durable assets, so
| in the future you should be able to say "last week I did
| this, load the same files" and the LLM will know what files
| to bring into context.
| handfuloflight wrote:
| Excellent, I look forward to trying it out, at minimum to
| wean off my dependency on Claude Code and its likely
| current state of overspending on context. I agree with
| looking at conversations as durable assets.
| sdesol wrote:
| > current state of overspending on context
|
| The thing that is killing me when I hear about Claude
| Code and other agent tools is the amount of energy they
| must be using. People say they let the task run for an
| hour and I can't help but think about how much energy is
| being used and whether Claude Code is being upfront about
| how much things will actually cost in the future.
| pacoWebConsult wrote:
| FWIW Claude code conversations are also durable. You can
| resume any past conversation in your project. They're
| stored as jsonl files within your `$HOME/.claude`
| directory. This retains the actual context (including
| your prompts, assistant responses, tool usages, etc) from
| that conversation, not just the files you're affecting as
| context.
| sdesol wrote:
| Thanks for the info. I actually want to make it easy for
| people to review aider, plandex, claude code, etc.
| conversations so I will probably look at importing them.
|
| My goal isn't to replace the other tools, but to make
| them work smarter and more efficiently. I also think we
| will in a year or two, start measuring performance based
| on how developers interact with LLMs (so management will
| want to see the conversations). Instead of looking at
| code generated, the question is going to be, if this
| person is let go, what is the impact based on how they
| are contributing via their conversations.
| ec109685 wrote:
| It greps around the code like an intern would. You have
| to have patience and be willing to document workflows and
| correct when it gets things wrong via CLAUDE.md files.
| sdesol wrote:
| Honestly, grepping isn't a bad strategy if there is
| enough context to generate focused keywords/patterns to
| search. The "let Claude Code think for 10 minutes or
| more", makes a lot more sense now, as this brute force
| method can take some time.
| ec109685 wrote:
| Yeah and it's creative with its grepping.
| msikora wrote:
| Why not build this as an MCP so that people can plug it into
| their favorite platform?
| sdesol wrote:
| An MCP is definitely on the roadmap. My objective is to
| become the context engine for LLMs so having a MCP is
| required. However, there will be things from a UX
| perspective that you'll lose out on if you just use the
| MCP.
| seanmmward wrote:
| The primary use case isn't just about shoving more code in
| context, although depending on the task, there is an
| irreducible minimum context needed for it to capture all the
| needed understanding. The 1M context model is a unique beast in
| terms of how you need to feed it, and its real power is being
| able to tackle long horizon tasks which require iterative
| exploration, in-context learning, and resynthesis. I.e., some
| problems are breadth (go fix an API change in 100 files); others,
| however, require depth (go learn from trying 15 different ways
| to solve this problem). 1M Sonnet is unique in its capabilities
| for the latter in particular.
| hinkley wrote:
| Sounds to me like your problem has shifted from how much the AI
| tool costs per hour to how much it costs per token because
| resetting a model happens often enough that the price doesn't
| amortize out per hour. That giant spike every ?? months
| overshadows the average cost per day.
|
| I wonder if this will become more universal, and if we won't
| see a 'tick-tock' pattern like Intel used, where they tweak the
| existing architecture one or more times between major design
| work. The 'tick' is about keeping you competitive and the
| 'tock' is about keeping you relevant.
| TZubiri wrote:
| "However. Price is king. Allowing me to flood the context
| window with my code base is great"
|
| I don't vibe code, but in general having to know all of the
| codebase to be able to do something is a smell; it's
| spaghetti, it's a lack of encapsulation.
|
| When I program I cannot think about the whole codebase. I have
| a couple of files open, tops, and I think about the code in
| those files.
|
| This issue of having to understand the whole codebase,
| complaining about abstractions, microservices, and OOP, and
| wanting everything to be in a "simple" monorepo, or a monolith;
| is something that I see juniors do, almost exclusively.
| ants_everywhere wrote:
| > I really desperately need LLMs to maintain extremely
| effective context
|
| The context is in the repo. An LLM will never have the context
| you need to solve all problems. Large enough repos don't fit on
| a single machine.
|
| There's a tradeoff just like in humans where getting a specific
| task done requires removing distractions. A context window that
| contains everything makes focus harder.
|
| For a long time context windows were too small, and they
| probably still are. But they have to get better at
| understanding the repo by asking the right questions.
| stuartjohnson12 wrote:
| > An LLM will never have the context you need to solve all
| problems.
|
| How often do you need more than 10 million tokens to answer
| your query?
| ants_everywhere wrote:
| I exhaust the 1 million context windows on multiple models
| multiple times per day.
|
| I haven't used the Llama 4 10 million context window so I
| don't know how it performs in practice compared to the
| major non-open-source offerings that have smaller context
| windows.
|
| But there is an induced demand effect where as the context
| window increases it opens up more possibilities, and those
| possibilities can get bottlenecked on requiring an even
| bigger context window size.
|
| For example, consider the idea of storing all Hollywood
| films on your computer. In the 1980s this was impossible.
| If you store them in DVD or Bluray quality you could
| probably do it in a few terabytes. If you store them in
| full quality you may be talking about petabytes.
|
| We recently struggled to get a full file into a context
| window. Now a lot of people feel a bit like "just take the
| whole repo, it's only a few MB".
| brulard wrote:
| I think you misunderstand how context in current LLMs
| works. To get the best results you have to be very
| careful to provide what is needed for immediate task
| progression, and postpone context that's needed later in
| the process. If you give all the context at once, you
| will likely get quite degraded output quality. That's like
| when you give a junior developer his first task: you
| likely won't teach him every corner of your app. You
| would give him the context he needs. It is similar with
| these models. Those that provided 1M or 2M of context
| (Gemini etc.) were getting less and less useful after
| circa 200k tokens in the context.
|
| Maybe models would get better in picking up relevant
| information from large context, but AFAIK it is not the
| case today.
| remexre wrote:
| That's a really anthropomorphizing description; a more
| mechanical one might be,
|
| The attention mechanism that transformers use to find
| information in the context is, in its simplest form,
| O(n^2); for each token position, the model considers
| whether relevant information has been produced at the
| position of every other token.
|
| To preserve performance when really long contexts are
| used, current-generation LLMs use various ways to
| consider fewer positions in the context; for example,
| they might only consider the 4096 "most likely" places to
| matter (de-emphasizing large numbers of "subtle hints"
| that something isn't correct), or they might have some
| way of combining multiple tokens worth of information
| into a single value (losing some fine detail).
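|
| For concreteness, a minimal numpy sketch of the vanilla
| (non-approximated) version, where the all-pairs score matrix
| is exactly what grows quadratically with context length:
|
| ```python
| import numpy as np
|
| n, d = 6, 8                     # 6 token positions, 8-dim head
| rng = np.random.default_rng(0)
| Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
|
| scores = Q @ K.T / np.sqrt(d)   # (n, n): every position vs. every other
| weights = np.exp(scores)
| weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
| out = weights @ V               # (n, d): weighted mix of value vectors
|
| print(scores.shape)             # (6, 6) -- the O(n^2) part
| ```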
| ants_everywhere wrote:
| > I think you misunderstand how context in current LLMs
| works.
|
| Thanks but I don't and I'm not sure why you're jumping to
| this conclusion.
|
| EDIT: Oh I think you're talking about the last bit of the
| comment! If you read the one before I say that feeding it
| the entire repo isn't a great idea. But great idea or
| not, people want to do it, and it illustrates that as
| context window increases it creates demand for even
| larger context windows.
| brulard wrote:
| I said that based on you saying you exhaust a million
| token context windows easily. I'm no expert on that, but
| I think the current state of LLMs works best if you are
| not approaching that 1M token limit, because large
| context (reportedly) deteriorates response quality
| quickly. I think state of the art usage is managing
| context in tens or low hundreds thousands tokens at most
| and taking advantage of splitting tasks across subtasks
| in time, or splitting context across multiple "expert"
| agents (see sub-agents in claude code).
| jimbokun wrote:
| It seems like LLM need to become experts at managing
| their OWN context.
|
| Selectively grepping and searching the code to pull into
| context only those parts relevant to the task at hand.
| brulard wrote:
| That's what I'm thinking about a lot. Something like the
| models "activate" just some subset of parameters when
| working (if I understand the new models correctly). So
| that model could activate parts of context which are
| relevant for the task at hand
| rocqua wrote:
| It doesn't take me 10000000 tokens to have the context
| "this was the general idea of the code, these were
| unimportant implementation details, and this is where
| lifetimes were tricky."
|
| And that context is the valuable bit for quickly getting
| back up to speed on a codebase.
| onion2k wrote:
| _Large enough repos don't fit on a single machine._
|
| I don't believe any human can understand a problem if they
| need to fit the entire problem domain in their head, let
| alone a domain whose scope doesn't fit on a computer. You
| _have_ to break it down into a manageable amount of
| information to tackle it in chunks.
|
| If a person can do that, so can an LLM prompted to do that by
| a person.
| wraptile wrote:
| Right, the LLM doesn't need to know all of the code under
| utils.parse_id to know that this call will parse the ID.
| The best LLM results I get are when I manually define the
| relevant code graph of my problem, similar to how I'd
| imagine it in my head, which seems to provide optimal
| context. So bigger isn't really better.
| rocqua wrote:
| I wonder why we can't have one LLM generate this
| understanding for another? Perhaps this is where teaming
| of LLMs gets its value. In managing high and low level
| context in different context windows.
| mixedCase wrote:
| This is a thing and doesn't require a separate model. You
| can set up custom prompts that will, based on another
| prompt describing the task to achieve, generate
| information about the codebase and a set of TODOs to
| accomplish the task, generating markdown files with a
| summarized version of the relevant knowledge and
| prompting you again to refine that summary if needed. You
| can then use these files to let the agent take over
| without going on a wild goose chase.
| ehnto wrote:
| I disagree, I may not have the whole codebase in my head in
| one moment but I have had all of it in my head at some
| point, and it is still there, that is not true of an LLM. I
| use LLMs and am impressed by them, but they just do not
| approximate a human in this particular area.
|
| My ability to break a problem down does not start from
| listing the files out and reading a few. I have a high
| level understanding of the whole project at all times, and
| a deep understanding of the whole project stored, and I can
| recall that when required, this is not true of an LLM at
| any point.
|
| We know this is a limitation and it's why we have various
| tools attempting to approximate memory and augment training
| on the fly, but they are approximations and they are in my
| opinion, not even close to real human memory and depth of
| understanding for data it was not trained on.
|
| Even for mutations of scenarios it was trained on, of which
| code is a great example: it is trained on billions
| of lines of code, yet still fails to understand my codebase
| intuitively. I have definitely not read billions of lines
| of code.
| ehnto wrote:
| Additionally, the more information you put into the
| context the more confused the LLM will get, if you did
| dump the whole codebase into the context it would not
| suddenly understand the whole thing. It is still an LLM,
| all you have done is polluted the context with a million
| lines of unrelated code, and some lines of related code,
| which it will struggle to find in the noise (in my
| experience of much smaller experiments)
| Bombthecat wrote:
| I call this context decay. :)
|
| The bigger the context, the more stuff "decays" sometimes
| to complete different meanings
| xwolfi wrote:
| You only worked on very small codebases then. When you
| work on giant ones, you Ctrl+F a lot, build a limited
| model of the problem space, and pray the unit tests will
| catch anything you might have missed...
| akhosravian wrote:
| And when you work on a really big codebase you start
| having multiple files and have to learn tools more
| advanced than ctrl-f!!
| ghurtado wrote:
| > and have to learn tools more advanced than ctrl-f!!
|
| Such as ctrl-shift-f
|
| But this is an advanced topic, I don't wanna get into it
| ehnto wrote:
| We're measuring lengths of string, but I would not say I
| have worked on small projects. I am very familiar with
| discovery, and have worked just fine on a lot of large
| legacy projects that have no tests.
| jimbokun wrote:
| Why are LLMs so bad at doing the same thing?
| airbreather wrote:
| You will have abstractions - black boxing, interface
| overviews, etc. Humans can only hold so much detail in
| current context memory; some say 7 items on average.
| ehnto wrote:
| Of course, but even those black boxes are not empty,
| they've got a vague picture inside them based on prior
| experience. I have been doing this for a while so most
| things are just various flavours of the same stuff,
| especially in enterprise software.
|
| The important thing in this context is that I know it's
| all there, I don't have to grep the codebase to fill up
| my context, and my understanding of the holistic project
| does not change each time I am booted up.
| jimbokun wrote:
| And LLMs can't leverage these abstractions nearly as well
| as humans...so far.
| PaulDavisThe1st wrote:
| > I disagree, I may not have the whole codebase in my
| head in one moment but I have had all of it in my head at
| some point, and it is still there, that is not true of an
| LLM.
|
| All 3 points (you have had all of it your head at some
| point, it is still there, that is not true of an LLM) are
| mere conjectures, and not provable at this time,
| certainly not in the general case. You may be able to
| show this of some codebases for some developers and for
| some LLMs, but not all.
| fnordsensei wrote:
| The brain can literally not process any piece of
| information without being changed by the act of
| processing it. Neuronal pathways are constantly being
| reinforced or weakened.
|
| Even remembering alters the memory being recalled,
| entirely unlike how computers work.
| johnisgood wrote:
| For humans, remembering strengthens that memory, even if
| it is dead wrong.
| Lutger wrote:
| I've always found it interesting that once I take a wrong
| turn finding my way through the city and I'm not
| deliberate about remembering this was, in fact, a
| mistake, I am more prone to taking the same wrong turn
| again the next time.
| dberge wrote:
| > once I take a wrong turn finding my way through the
| city... I am more prone to taking the same wrong turn
| again
|
| You may want to stay home then to avoid getting lost.
| jbs789 wrote:
| I'm not sure the idea that a developer maintains a high
| level understanding is all that controversial...
| animuchan wrote:
| The trend for this idea's controversiality is shown on
| this very small chart: /
| ehnto wrote:
| I never intended to say it was true of all codebases for
| all developers, that would make no sense. I don't know
| all developers.
|
| I think it's objectively true that the information is not
| in the LLM. It did not have all codebases to train with,
| and they do not (immediately) retrain on the codebases
| they encounter through usage.
| ivape wrote:
| _My ability to break a problem down does not start from
| listing the files out and reading a few._
|
| It does, it's just happening at lightning speed.
| CPLX wrote:
| We don't actually know that.
|
| If we had that level of understanding of how exactly our
| brains do what they do things would be quite different.
| onion2k wrote:
| _My ability to break a problem down does not start from
| listing the files out and reading a few._
|
| If you're completely new to the problem then ... yes, it
| does.
|
| You're assuming that you're working on a project that
| you've spent time on and learned the domain for, and then
| you're comparing that to an LLM being prompted to look at
| a codebase with the context of the files. Those things
| are not the same though.
|
| A closer analogy to LLMs would be prompting it for
| questions when it has access (either through MCP or
| training) to the project's git history, documentation,
| notes, issue tracker, etc. When that sort of thing is
| commonplace, and LLMs have the context window size to
| take advantage of all that information, I suspect we'll
| be surprised how good they are even given the results we
| get today.
| ehnto wrote:
| > If you're completely new to the problem then ... yes,
| it does.
|
| Of course, because I am not new to the problem, whereas
| an LLM is new to it every new prompt. I am not really
| trying to find a fair comparison because I believe humans
| have an unfair advantage in this instance, and am trying
| to make that point, rather than compare like for like
| abilities. I think we'll find even with all the context
| clues from MCPs and history etc. they might still fail to
| have the insight to recall the right data into the
| context, but that's just a feeling I have from working
| with Claude Code for a while. Because I instruct it to do
| those things, like look through git log, check the
| documentation etc, and it sometimes finds a path through
| to an insight but it's just as likely to get lost.
|
| I alluded to it somewhere else but my experience with
| massive context windows so far has just been that it
| distracts the LLM. We are usually guiding it down a path
| with each new prompt and have a specific subset of
| information to give it, and so pumping the context full
| of unrelated code at the start seems to derail it from
| that path. That's anecdotal, though I encourage you to
| try messing around with it.
|
| As always, there's a good chance I will eat my hat some
| day.
| scott_s wrote:
| > Of course, because I am not new to the problem, whereas
| an LLM is new to it every new prompt.
|
| That is true for the LLMs you have access to now. Now
| imagine if the LLM had been _trained_ on your entire code
| base. And not just the code, but the entire commit
| history, commit messages and also all of your external
| design docs. _And_ code and docs from all relevant
| projects. That LLM would not be new to the problem every
| prompt. Basically, imagine that you fine-tuned an LLM for
| your specific project. You will eventually have access to
| such an LLM.
| jimbokun wrote:
| Why haven't the big AI companies been pursuing that
| approach, vs just ramping up context window size?
| scott_s wrote:
| Because training one family of models with very large
| context windows can be offered to the entire world as an
| online service. That is a very different business model
| from training or fine-tuning individual models
| specifically for individual customers. _Someone_ will
| figure out how to do that at scale, eventually. It might
| require the cost of training to reduce significantly. But
| large companies with the resources to do this for
| themselves will do it, and many are doing it.
| krainboltgreene wrote:
| I have an entire life worth of context and I still remember
| projects I worked on 15 years ago.
| adastra22 wrote:
| Not with pixel perfect accuracy. You vaguely remember,
| although it may not feel like that because your brain
| fills in the details (hallucinates) as you recall. The
| comparisons are closer than you might think.
| vidarh wrote:
| The comparison would be apt if the LLM was _trained on
| your codebase_.
| jimbokun wrote:
| Isn't that the problem?
|
| I don't see any progress on incrementally training LLMs
| on specific projects. I believe it's called fine tuning,
| right?
|
| Why isn't that the default approach anywhere instead of
| the hack of bigger "context windows"?
| adastra22 wrote:
| Because fine-tuning can be used to remove restrictions
| from a model, so they don't give us plebs access to that.
| gerhardi wrote:
| I'm not well versed enough in this, but wouldn't custom
| training have the problem that a specific project's
| codebase likely implements the domain-relevant stuff only
| once and in one way, compared to how today's popular large
| models have been trained on maybe countless different
| ways to use common libraries for various tasks, with
| whatever GitHub-ripped material fed in?
| krainboltgreene wrote:
| You have no idea if I remember with pixel perfect
| accuracy (whatever that even means). There are plenty of
| people with photographic memory.
|
| Also, you're a programmer; you have no foundation of
| knowledge on which to make that assessment. You might as
| well opine on quarks or martian cellular life. My god the
| arrogance of people in my industry.
| johnisgood wrote:
| > There are plenty of people with photographic memory.
|
| I thought it was rare.
| adastra22 wrote:
| Repeated studies have shown that perfect "photographic
| memory" does not in fact exist. Nobody has it. Some
| people think that they do though, but when tested under
| lab conditions those claims don't hold up.
|
| I don't believe these people are lying. They are self-
| reporting their own experiences, which unfortunately have
| the annoying property of being generated by the very mind
| that is living the experience.
|
| What does it mean to have an eidetic memory? It means
| that when you remember something you vividly remember
| details, and can examine those details to your heart's
| content. When you do so, it feels like all those details
| are correct. (Or so I'm told, I'm the opposite with
| aphantasia.)
|
| But it turns out if you actually have a photo reference
| and do a blind comparison test, people who report
| photographic memories actually don't do statistically any
| better than others in remembering specific fine details,
| even though they claim that they clearly remember.
|
| The simpler explanation is that while all of our brains
| provide hallucinated detail to fill the gaps in memories,
| their brains are wired up to make those made-up details
| feel much more real than they do to others. That is all.
| HarHarVeryFunny wrote:
| > Repeated studies have shown that perfect "photographic
| memory" does not in fact exist.
|
| This may change your mind!
|
| https://www.youtube.com/watch?v=jVqRT_kCOLI
| melagonster wrote:
| Sure, this is why AGI looks possible sometimes. But
| companies should not require their users to create AGI for
| them.
| friendzis wrote:
| Fitting the _entire_ problem domain in their head is what
| engineers _do_.
|
| Engineering is merely a search for optimal solution in this
| multidimensional space of problem domain(-s), requirements,
| limitations and optimization functions.
| barnabee wrote:
| _Good_ engineers fit their entire understanding of the
| problem domain in their head
|
| The best engineers understand how big a difference that
| is
| sdesol wrote:
| > But they have to get better at understanding the repo by
| asking the right questions.
|
| How I am tackling this problem is making it dead simple for
| users to create analyzers that are designed to enrich text
| data. You can read more about how it would be used in a
| search at https://github.com/gitsense/chat/blob/main/packages
| /chat/wid...
|
| The basic idea is, users would construct analyzers with the
| help of LLMs to extract the proper metadata that can be
| semantically searched. So when the user does an AI Assisted
| search with my tool, I would load all the analyzers
| (description and schema) into the system prompt and the LLM
| can determine which analyzers can be used to answer the
| question.
|
| A very simplistic analyzer would be to make it easy to
| identify backend and frontend code so you can just use the
| command `!ask find all frontend files` and the LLM will
| construct a deterministic search that knows to match for
| frontend files.
| mrits wrote:
| How is that better than just writing a line in the md?
| sdesol wrote:
| I am not sure I follow what you are saying. What would
| the line be and how would it become deterministically
| searchable?
| mrits wrote:
| frontend path: /src/frontend/* backend path: /src/*
|
| I suppose the problem you have might be unique to nextJS
| ?
| sdesol wrote:
| The issue is frontend can be a loaded question,
| especially if you are dealing with legacy stuff,
| different frameworks, etc. You also can't tell what the
| frontend code does by looking at that single line.
|
| Now imagine as part of your analyzer, you have the
| following instructions for the llm:
|
| --- For all files in `src/frontend/*` treat them as
| frontend code. For all files in `src/*` excluding
| `src/frontend` treat them as backend. Create a metadata called
| `scope` which can be 'frontend', 'backend' or 'mix' where
| mix means the code can be used for both front and backend
| like utilities.
|
| Now for each file, create a `keywords` metadata that
| includes up to 10 unique keywords that describes the core
| functionality for the file. ---
|
| So with this you can say
|
| - `!ask find all frontend files`
|
| - `!ask find all mix use files`
|
| - `!ask find all frontend files that does [this]`
|
| and so forth.
|
| The whole point of analyzers is to make it easy for the
| LLM to map your natural language query to a deterministic
| search.
|
| If the code base is straightforward and follows a well
| known framework, asking for frontend or backend wouldn't
| even need an entry as you can just include in the
| instructions that I use framework X and the LLM would
| know what to consider.
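|
| Roughly, the kind of metadata-to-search mapping I mean (purely
| illustrative sketch, not the tool's actual schema or file
| names):
|
| ```python
| # The analyzer produces metadata per file; the LLM only has to
| # turn "find all frontend files that do auth" into a filter.
| files = [
|     {"path": "src/frontend/Login.vue", "scope": "frontend",
|      "keywords": ["auth", "login", "form"]},
|     {"path": "src/api/session.py", "scope": "backend",
|      "keywords": ["auth", "session", "token"]},
|     {"path": "src/shared/format.py", "scope": "mix",
|      "keywords": ["dates", "formatting"]},
| ]
|
| def search(scope=None, keyword=None):
|     hits = files
|     if scope:
|         hits = [f for f in hits if f["scope"] == scope]
|     if keyword:
|         hits = [f for f in hits if keyword in f["keywords"]]
|     return [f["path"] for f in hits]
|
| # `!ask find all frontend files that do auth` maps to:
| print(search(scope="frontend", keyword="auth"))
| ```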
| mock-possum wrote:
| > The context is in the repo
|
| Agreed but that's a bit different from "the context _is_ the
| repo"
|
| It's been my experience that usually just picking a couple
| files out to add to the context is enough - Claude seems
| capable of following imports and finding what it needs, in
| most cases.
|
| I'm sure it depends on the task, and the structure of the
| codebase.
| manmal wrote:
| > The context is in the repo
|
| No it's in the problem at hand. I need to load all related
| files, documentation, and style guides into the context. This
| works really well for smaller modules, but currently falls
| apart after a certain size.
| alvis wrote:
| Everything in context hurts focus. It's like people
| suffering from hyperthymesia: they easily get distracted
| when they recall something.
| injidup wrote:
| All the more reason for good software engineering. Folders of
| files managing one concept. Files tightly focussed on sub
| problems of that concept. Keep your code so that you can
| solve problems in self contained context windows at the right
| level of abstraction
| Sharlin wrote:
| I fear that LLM-optimal code structure is different from
| human-optimal code structure, and people are starting to
| optimize for the former rather than the latter.
| NuclearPM wrote:
| Problems
| jack_pp wrote:
| Maybe we need LLMs trained on ASTs, or a new symbolic way
| to represent software that's faster for LLMs to grok, with
| a translator so we can verify the code.
| energy123 wrote:
| You could probably build a decent agentic harness that
| achieves something similar.
|
| Show the LLM a tree and/or call-graph representation of your
| codebase (e.g. `cargo diagram` and `cargo-depgraph`), which
| is token efficient.
|
| And give the LLM a tool call to see the contents of the
| desired subtree. More precise than querying a RAG chunk or a
| whole file.
|
| You could also have another optional tool call which routes
| the text content of the subtree through a smaller LLM that
| summarizes it into a maximum density snippet, which the LLM
| can use for a token efficient understanding of that subtree
| during the early planning phase.
|
| But I'd agree that an LLM built natively around AST is a
| pretty cool idea.
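|
| A rough sketch of the outline half of such a harness (Python
| and the `ast` module instead of cargo tooling; the tool-call
| plumbing is omitted):
|
| ```python
| # Walk a repo and emit one line per Python file listing its
| # top-level defs/classes -- a token-efficient map the agent can
| # keep in context, fetching full file contents only on demand.
| import ast
| from pathlib import Path
|
| def repo_outline(root: str) -> str:
|     lines = []
|     for path in sorted(Path(root).rglob("*.py")):
|         try:
|             tree = ast.parse(path.read_text())
|         except (SyntaxError, UnicodeDecodeError):
|             continue
|         names = [n.name for n in tree.body
|                  if isinstance(n, (ast.FunctionDef,
|                                    ast.AsyncFunctionDef,
|                                    ast.ClassDef))]
|         lines.append(f"{path}: {', '.join(names) or '(no defs)'}")
|     return "\n".join(lines)
|
| print(repo_outline("."))
| ```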
| fgbarben wrote:
| Allow me to flood the fertile plains of its consciousness with
| my seed... yes, yes, let it take root... this is important to
| me
| fgbarben wrote:
| Let me despoil the rich geography of your context window with
| my corrupted b2b SaaS workflows and code... absorb the
| pollution, rework it, struggling against the weight... yes,
| this pleases me, it is essential for the propagation of my
| germline
| dberge wrote:
| > the price has substantially increased
|
| I'm assuming the credits required per use won't increase in
| Cursor.
|
| Hopefully this puts pressure on them to lower credits required
| for gpt-5.
| khalic wrote:
| This is a major issue with LLMs altogether, it probably has to
| do with the transformer architecture. We need another
| breakthrough in the field for this to become reality.
| HarHarVeryFunny wrote:
| Even 1 MB context is only roughly 20K LOC so pretty limiting,
| especially if you're also trying to fit API documents or any
| other lengthy material into the context.
|
| Anthropic also recently said that they think that
| longer/compressed context can serve as an alternative (not sure
| what was the exact wording/characterization they used) to
| continual/incremental learning, so context space is also going
| to be competing with model interaction history if you want to
| avoid groundhog day and continually having to tell/correct the
| model the same things over and over.
|
| It seems we're now firmly in the productization phase of LLM
| development, as opposed to seeing much fundamental improvement
| (other than math olympiad etc "benchmark" results, released to
| give the impression of progress). Yannic Kilcher is right, "AGI
| is not coming", at least not in the form of an enhanced LLM.
| Demis Hassabis' very recent estimate was for 50% chance of AGI
| by 2030 (i.e. still 15 years out).
|
| While we're waiting for AGI, it seems a better approach to
| needing everything in context would be to lean more heavily on
| tool use, perhaps more similar to how a human works - we don't
| memorize the entire code base (at least not in terms of
| complete line-by-line detail, even though we may have a pretty
| clear overview of a 10K LOC codebase while we're in the middle
| of development) but rather rely on tools like grep and ctags to
| locate relevant parts of source code on an as-needed basis.
| aorobin wrote:
| >"Demis Hassabis' very recent estimate was for 50% chance of
| AGI by 2030 (i.e. still 15 years out)."
|
| 2030 is only 5 years out
| Zircom wrote:
| That was his point lol, if someone is saying it'll happen
| in 5 years, triple that for a real estimate.
| km144 wrote:
| As you alluded to at the end of your post--I'm not really
| convinced 20k LOC is very limiting. How many lines of code
| can you fit in your working mental model of a program?
| Certainly less than 20k concrete lines of text at any given
| time.
|
| In your working mental model, you have broad understandings
| of the broader domain. You have broad understandings of the
| architecture. You summarize broad sections of the program
| into simpler ideas. module_a does x, module_b does y, insane
| file c does z, and so on. Then there is the part of the
| software you're actively working on, where you need more
| concrete context.
|
| So as you move towards the central task, the context becomes
| more specific. But the vague outer context is still crucial
| to the task at hand. Now, you can certainly find ways to
| summarize this mental model in an input to an LLM, especially
| with increasing context windows. But we probably need to
| understand how we would better present these sorts of things
| to achieve _performance_ similar to a human brain, because
| the mechanism is very different.
| jacobr1 wrote:
| This is basically how claude code works today. You have it
| /init a description of the project structure into CLAUDE.md
| that is used for each invocation. There is some implicit
| knowledge in the project about common frameworks and
| languages. Then when working on something between the
| explicit and implicit knowledge and the task at hand it
| will grep for relevant material in the project, load either
| full or parts of files, and THEN it will start working on
| the task. But it dynamically builds the context of the
| codebase based on searching for the relevant bit. Short-
| circuiting this by having a good project summary makes it
| more efficient - but you don't need to literally copy in
| all the code files.
| HarHarVeryFunny wrote:
| Interesting - thanks!
| brookst wrote:
| 1M tokens ~= 3.5M characters ~= 58k LOC at an _average_ of
| 60 chars/line, or 88k LOC at 40 chars/line.
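|
| Back-of-envelope, assuming roughly 3.5 characters per token:
|
| ```python
| tokens = 1_000_000
| chars = tokens * 3.5
| print(chars / 60)   # ~58,333 LOC at 60 chars/line
| print(chars / 40)   # ~87,500 LOC at 40 chars/line
| ```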
| HarHarVeryFunny wrote:
| OK - I was thinking 1M chars (@ 50 chars/line) vs tokens,
| but I'm not sure it makes much difference to the argument.
| There are plenty of commercial code bases WAY bigger, and
| as noted other things may also be competing for space in
| the context.
| HarHarVeryFunny wrote:
| Just as a self follow-up, another motivation to lean on tool
| use rather than massive context (cf. short-term memory) is to
| keep LLM/AI written/modified code understandable to humans
| ...
|
| At least part of the reason that humans use hierarchical
| decomposition and divide-and-conquer is presumably because of
| our own limited short term memory, since hierarchical
| organization (modules, classes, methods, etc) allows us to
| work on a problem at different levels of abstraction while
| only needing to hold that level of the hierarchy in memory.
|
| Imagine what code might look like if written by something
| with no context limit - just a flat hierarchy of functions,
| perhaps, at least until it perhaps eventually learned, or was
| told, the other reasons for hierarchical and modular
| design/decomposition to assist in debugging and future
| enhancement, etc!
| rootnod3 wrote:
| So, more tokens means better, but at the same time more tokens
| means it distracts itself too much along the way. So it is both
| an improvement and potentially detrimental. How
| are those things beneficial in any capacity? What was said last
| week? Embrace AI or leave?
|
| All I see so far is: don't embrace and stay.
| rootnod3 wrote:
| So, I see this got downvoted. Instead of just downvoting, I
| would prefer to have a counter-argument. Honestly. I am on the
| skeptic side of LLM, but would not mind being turned to the
| other side with some solid arguments.
| pupppet wrote:
| How does anyone send these models that much context without it
| tripping over itself? I can't get anywhere near that much before
| it starts losing track of instructions.
| 9wzYQbTYsAIc wrote:
| I've been having decent luck telling it to keep track of itself
| in a .plan file, not foolproof, of course, but it has some
| ability to "preserve context" between contexts.
|
| Right now I'm experimenting with using separate .plan files for
| tracking key instructions across domains like architecture and
| feature decisions.
| CharlesW wrote:
| > _I've been having decent luck telling it to keep track of
| itself in a .plan file, not foolproof, of course, but it has
| some ability to "preserve context" between contexts._
|
| This is the way. Not only have I had good luck with both a
| TASKS.md and TASKS-COMPLETE.md (for history), but I have an
| .llm/arch full of AI-assisted, for-LLM .md files (auth.md,
| data-access.md, etc.) that document architecture decisions
| made along the way. They're invaluable for effectively and
| efficiently crossing context chasms.
| collinvandyck76 wrote:
| Yeah, this. Each project I work on has its own markdown file
| named for the ticket or the project. Committed on the branch,
| and I have claude rewrite it with the "current understanding"
| periodically. After compacting, I have it re-read the MD file
| and we get started again. Quite nice.
| olddustytrail wrote:
| I think it's key to not give it contradictory instructions,
| which is an easy mistake to make if you forget where you
| started.
|
| As an example, I know of an instance where the LLM claimed it
| had tried a test on its laptop. This obviously isn't true so
| the user argued with it. But they'd originally told it that it
| was a Senior Software Engineer so playing that role, saying you
| tested locally is fine.
|
| As soon as you start arguing with those minor points you break
| the context; now it's both a Software Engineer and an LLM. Of
| course you get confused responses if you do that.
| pupppet wrote:
| The problem I often have is I may have instructions like:
|
| General instruction: - Do "ABC"
|
| If condition == whatever: - Do "XYZ" instead
|
| I have a hard time making the AI obey in the instances where
| I wish to override my own instruction, and without full
| control of the input context, I can't just modify my 'General
| Instruction' on a case-by-case basis to simply avoid having
| to contradict myself.
| olddustytrail wrote:
| That's a difficult case where you might want to collect
| your good context and shift it to a different session.
|
| It would be nice if the UI made that easy to do.
| greenfish6 wrote:
| Yes, but if you look at the rate limit notes, the rate limit is
| 500k tokens/minute for tier 4, which we are on. Given how
| stingy Anthropic has been with rate limit increases, this is for
| very few people right now.
| alvis wrote:
| A context window beyond a certain size doesn't bring much
| benefit, just a higher bill. If it still keeps forgetting
| instructions, it's easy to end up with long messages, higher
| context consumption, and hence a bigger bill.
|
| I'd rather have an option to limit the context size.
| EcommerceFlow wrote:
| It does if you're working with bigger codebases. I've found
| copy/pasting my entire codebase + adding a <task> works
| significantly better than cursor.
| spiderice wrote:
| How does one even copy their entire codebase? Are you saying
| you attach all the files? Or you use some script to copy all
| the text to your clipboard? Or something else?
| EcommerceFlow wrote:
| I created a script that outputs the entire codebase to a
| text file (also allows me to exclude
| files/folders/node_modules), separating and labeling each
| file in the program folder.
|
| I then structure my prompts around it like so:
|
| <project_code> ``` ``` </project_code>
|
| <heroku_errors> " " </heroku_errors>
|
| <task> " " </task>
|
| I've been using this with Google Ai studio and it's worked
| phenomenally. 1 million tokens is A LOT of code, so I'd
| imagine this would work for lots n lots of project type
| programs.
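|
| A minimal sketch of that kind of dump script (the paths and
| exclusion list are just examples):
|
| ```python
| from pathlib import Path
|
| EXCLUDE = {"node_modules", ".git", "dist", "__pycache__"}
|
| def dump_codebase(root: str, out: str = "codebase.txt") -> None:
|     with open(out, "w", encoding="utf-8") as fh:
|         for path in sorted(Path(root).rglob("*")):
|             if not path.is_file() or EXCLUDE & set(path.parts):
|                 continue
|             fh.write(f"\n===== {path} =====\n")   # label each file
|             try:
|                 fh.write(path.read_text(encoding="utf-8"))
|             except UnicodeDecodeError:
|                 continue                          # skip binary files
|
| dump_codebase(".")
| ```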
| swader999 wrote:
| Repomix, there's a cli and an MCP.
| andrewstuart wrote:
| Oh man finally. This has been such a HUGE advantage for Gemini.
|
| Could we please have zip files too? ChatGPT and Gemini both
| unpack zip files via the chat window.
|
| Now how about a button to download all files?
| qsort wrote:
| I won't complain about a strict upgrade, but that's a pricy boi.
| Interesting to see differential pricing based on size of input,
| which is understandable given the O(n^2) nature of attention.
| isoprophlex wrote:
| 1M of input... at $6/1M input tokens. Better hope it can one-shot
| your answer.
| elitan wrote:
| have you ever hired humans?
| bicepjai wrote:
| Depends on which human you tried :) Do not underestimate
| yourself!
| rafaelero wrote:
| god they keep raising prices
| revskill wrote:
| The critical issue with LLMs, where they never beat humans:
| they break what worked.
| henriquegodoy wrote:
| It's incredible to see how AI models are improving; I'm really
| happy with this news (IMO it's more impactful than the release
| of GPT-5). Now we need more tokens per second, and then the
| self-improvement of the model will accelerate.
| lherron wrote:
| Wow, I thought they would feel some pricing pressure from GPT5
| API costs, but they are doubling down on their API being more
| expensive than everyone else.
| sebzim4500 wrote:
| I think it's the right approach, the cost of running these
| things as coding assistants is negligible compared to the
| benefit of even a slight model improvement.
| AtNightWeCode wrote:
| GPT5 API uses more tokens for answers of the same quality as
| previous versions. Fell into that trap myself. I use both
| Claude and OpenAI right now. Will probably drop OpenAI since
| they are obviously not to be trusted considering the way they
| do changes.
| shamano wrote:
| 1M tokens is impressive, but the real gains will come from how we
| curate context--compact summaries, per-repo indexes, and phase
| resets. Bigger windows help; guardrails keep models focused and
| costs predictable.
| jbellis wrote:
| Just completed a new benchmark that sheds some light on whether
| Anthropic's premium is worth it.
|
| (Short answer: not unless your top priority is speed.)
|
| https://brokk.ai/power-rankings
| 24xpossible wrote:
| Why no Grok 4?
| Zorbanator wrote:
| You should be able to guess.
| jeffhuys wrote:
| People hate it because it had less filters and media caught
| on, so they told people to hate it.
|
| It's actually the best one right now, or close to. For my
| uses (code and queries) nothing comes even close.
|
| Once people look past the "but ELoN mUssKkkk!!!", they'll be
| surprised.
| jbellis wrote:
| The accompanying blog post explains: xAI did not respond to
| our requests for a Grok 4 quota that would allow us to run
| the evaluation.
| rcanepa wrote:
| I recently switched to the $200 CC subscription and I think I
| will stay with it for a while. I briefly tested whatever
| version of ChatGPT 5 comes with the free Cursor plan and it was
| unbearably slow. I could not really code with it as I was
| constantly getting distracted while waiting for a response. So,
| speed matters a lot for some people.
| Someone1234 wrote:
| Before this they supposedly had a longer context window than
| ChatGPT, but I have workloads that abuse the heck out of context
| windows (100-120K tokens). ChatGPT genuinely seems to have a 32K
| context window, in the sense that it legitimately remembers/can
| utilize everything within that window.
|
| Claude previously had "200K" context windows, but during testing
| it wouldn't even hit a full 32K before hitting a wall and
| forgetting earlier parts of the context. They also have extremely
| short prompt limits relative to the other services around, making
| it hard to utilize their supposedly larger context windows (which
| is suspicious).
|
| I guess my point is that with Anthropic specifically, I don't
| trust their claims because that has been my personal experience.
| It would be nice if this "1M" context window now allows you to
| actually use 200K though, but it remains to be seen if it can
| even do _that_. As I said with Anthropic you need to verify
| everything they claim.
| Etheryte wrote:
| Strong agree, Claude is very quick to forget things like "don't
| do this", "never do this" or things it tried that were wrong.
| It will happily keep looping even in very short conversations,
| completely defeating the purpose of using it. It's easy to game
| the numbers, but it falls apart in the real world.
| joquarky wrote:
| I've found it better to use antonyms than negations in most
| situations.
| typpilol wrote:
| Same here. Always tell them the way you want it done.
|
| For example:
|
| Instead of "don't modify the tests"
|
| It should be: analyze the test output and fix the bug in
| the source code. The test is built correctly.
|
| Not the best but you get the idea.
|
| The one problem with this is if you don't know how to do
| something properly. Like if you're just writing in your
| prompt "generate 90% test coverage", you give it a lot
| more leeway to do whatever it wants.
|
| And that's how you end up with the source code being
| modified to fit the test instead of vice versa.
| wahnfrieden wrote:
| ChatGPT Pro has a longer window but I've read conflicting
| reports on what it actually uses
| lvl155 wrote:
| The only time this is useful is to run init on a sizable code
| base or dump a "big" CSV.
| film42 wrote:
| The 1M token context was Gemini's headlining feature. Now, the
| only thing I'd like Claude to work on is tokens counted towards
| document processing. Gemini will often bill 1/10th the tokens
| Anthropic does for the same document.
| varyherb wrote:
| I believe this can be configured in Claude Code via the following
| environment variable:
|
| ANTHROPIC_BETAS="context-1m-2025-08-07" claude
| falcor84 wrote:
| Have you tested it? I see that this env var isn't specified in
| their docs
|
| https://docs.anthropic.com/en/docs/claude-code/settings#envi...
| bazhand wrote:
| Add these settings to your `.claude/settings.json`:
|
| ```json
| {
|   "env": {
|     "ANTHROPIC_CUSTOM_HEADERS":
|       {"anthropic-beta": "context-1m-2025-08-07"},
|     "ANTHROPIC_MODEL": "claude-sonnet-4-20250514",
|     "CLAUDE_CODE_MAX_OUTPUT_TOKENS": 8192
|   }
| }
| ```
| varyherb wrote:
| Yup! Claude Code has a lot of undocumented configuration.
| Once I saw the beta header value in their docs [1], I tried
| to see in their source code if there was any way to specify
| this flag via env var config. Their source code is already on
| your computer, just gotta dig through the minified JS :) Try:
|
| `cat $(which claude) | grep ANTHROPIC_BETAS`
|
| Sibling comment's approach with the other (documented) env
| var works too.
|
| [1] https://docs.anthropic.com/en/docs/build-with-
| claude/context...
| anonym29 wrote:
| Tested this morning. Worked wonderfully, except I ran into output
| issues. I attempted to patch the minified Claude file's
| CLAUDE_CODE_MAX_OUTPUT_TOKENS hard limit of 32000 on Sonnet to
| 64000, which worked, and I was able to generate outputs above
| 32000 tokens, but this coincided with a breakage of the 1m
| context window for me. Still testing and playing around with
| this, but this may be getting patched?
| gdudeman wrote:
| A tip for those who both use Claude Code and are worried about
| token use (which you should be if you're stuffing 400k tokens
| into context even if you're on 20x Max):
|
| 1. Build context for the work you're doing. Put lots of your
| codebase into the context window.
|
| 2. Do work, but at each logical stopping point hit double escape
| to rewind to the context-filled checkpoint. You do not spend
| those tokens to rewind to that point.
|
| 3. Tell Claude your developer finished XYZ, have it read it into
| context and give high level and low level feedback (Claude will
| find more problems with your developer's work than with yours).
|
| If you want to have multiple chats running, use /resume and pull
| up the same thread. Hit double escape to the point where Claude
| has rich context, but has not started down a specific rabbit
| hole.
| rvnx wrote:
| Thank you for the tips. Do you know how to roll back the latest
| changes? I've been trying very hard to do it, but it seems like
| Git is the only way?
| gdudeman wrote:
| Git or my favorite "Undo all of those changes."
| spike021 wrote:
| this usually gets the job done for me as well
| SparkyMcUnicorn wrote:
| I haven't used it, but saw this the other day:
| https://github.com/RonitSachdev/ccundo
| rtuin wrote:
| Quick tip when working with Claude Code and Git: When you're
| happy with an intermediate result, stage the changes by
| running `git add` (no commit). That makes it possible to
| always go back to the staged changes when Claude messes up.
| You can then just discard the unstaged changes and don't have
| to roll back to the latest commit.
| seperman wrote:
| Very interesting. Why does Claude find more problems if we
| mention the code is written by another developer?
| bgilly wrote:
| In my experience, Claude will criticize others more than it
| will criticize itself. Seems similar to how LLMs in general
| tend to say yes to things or call anything a good idea by
| default.
|
| I find it to be an entertaining reflection of the cultural
| nuances embedded into training data and reinforcement
| learning processes.
| umbra07 wrote:
| Interesting. In my experience, it's the opposite. Claude is
| too sycophantic. If you tell it that it was wrong, it will
| just accept your word at face value. If I give a problem to
| both Claude and Gemini, their responses differ and I ask
| Claude why Gemini has a different response - Claude will
| just roll over and tell me that Gemini's response was
| perfect and that it messed up.
|
| This is why I was really taken by Gemini 2.0/2.5 when it
| first came out - it was the first model that really pushed
| back at you. It would even tell me that it wanted _x_
| additional information to continue onwards, unprompted.
| Sadly, as Google has neutered 2.5 over the last few months,
| its independent streak has also gone away, and it's only
| slightly more individualistic than Claude/OpenAI's models.
| mcintyre1994 wrote:
| Total guess, but maybe it breaks it out of the sycophancy
| that most models seem to exhibit?
|
| I wonder if they'd also be better at things like telling you
| an idea is dumb if you tell it it's from someone else and
| you're just assessing it.
| gdudeman wrote:
| Claude is very agreeable and is an eager helper.
|
| It gives you the benefit of the doubt if you're coding.
|
| It also gives you the benefit of the doubt if you're looking
| for feedback on your developer's work. If you give it a hint
| of distrust "my developer says they completed this, can you
| check and make sure, give them feedback....?" Claude will
| look out for you.
| daveydave wrote:
| I would guess the training data (conversational as opposed to
| coding specific solutions) is weighted towards people finding
| errors in others work, more than people discussing errors in
| their own. If you knew there was an error in your thinking,
| you probably wouldn't think that way.
| sixothree wrote:
| I've been using Serena MCP to keep my context smaller. It seems
| to be working because claude uses it pretty much exclusively to
| search the codebase.
| lucasfdacunha wrote:
| Could you elaborate a bit on how that works? Does it need any
| changes in how you use Claude?
| sixothree wrote:
| No. I have three MCPs installed and this is the only one
| that doesn't need guidance. You'll see it using it for
| search and finding references and such. It's a one line
| install and no work to maintain.
|
| The advantage is that Claude won't have to use the file
| system to find files. And it won't have to go read files
| into context to find what it's looking for. It can use its
| context for the parts of code that actually matter.
|
| And I feel like my results have actually been much better
| with this.
| yahoozoo wrote:
| I thought double escape just clears the text box?
| gdudeman wrote:
| With an empty text box, double escape shows you a list of
| previous inputs from you. You can go back and fork at any one
| of those.
| oars wrote:
| I tell Claude that it wrote XYZ in another session (I wrote it)
| then use that context to ask questions or make changes.
| ewoodrich wrote:
| Hah, I do the same when I need to manually intervene to nudge
| the solution in the direction I want after a few failed
| attempts to reconstruct my prompt to avoid some undesired path
| the LLM _really_ wants to go down.
| gdudeman wrote:
| I'll note this saves a lot of wait time as well! No sitting
| there while a new Claude builds context from scratch.
| i_have_an_idea wrote:
| This sounds like the programmer equivalent of astrology.
|
| > Build context for the work you're doing. Put lots of your
| codebase into the context window.
|
| If you don't say that, what do you think happens as the agent
| works on your codebase?
| bubblyworld wrote:
| You don't have to think about it, you can just go try it. It
| doesn't work as well (yet) for me. I'm still way better than
| Claude at finding an initial heading.
|
| Astrology doesn't produce working code =P
| gdudeman wrote:
| You don't say that - you instruct the LLM to read files about
| X, Y, and Z. Putting the context in helps the agent plan
| better (next step) and write correct code (final step).
|
| If you're asking the agent to do chunks of work, this will
| get better results than asking it to blindly go forth and do
| work. Anthropic's best practices guide says as much.
|
| If you're asking the agent to create one method that
| accomplishes X, this isn't useful.
| insane_dreamer wrote:
| I usually tell CC (or opencode, which I've been using recently)
| to look up the files and find the relevant code. So I'm not
| attaching a huge number of files to the context. But I don't
| actually know whether this saves tokens or not.
| Wowfunhappy wrote:
| I do this all the time and it sometimes works, but it's not a
| silver bullet. Sometimes Claude benefits from having the full
| conversation.
| FajitaNachos wrote:
| What's the benefit to using claude code CLI directly over
| something like Cursor?
| trenchpilgrim wrote:
| Claude Code is a much more flexible tool:
| https://docs.anthropic.com/en/docs/claude-
| code/overview#why-...
| qafy wrote:
| The benefit is you can use your preferred editor. No need to
| learn a completely new piece of software that doesn't match
| your workflow just to get access to agentic workflows. For
| example, my entire workflow for the last 15+ years has been
| tmux+vim, and I have no desire to change that.
| KptMarchewa wrote:
| You don't have to deal with the awfulness of VS Code.
| nojs wrote:
| In my experience jumping back like this is risky unless you
| explicitly tell it you made changes, otherwise they will get
| clobbered because it will update files based on the old
| context.
|
| Telling it to "re-read" xyz files before starting works though.
| bamboozled wrote:
| I always ask it to read the last 5 commits and analyze any
| modified or staged files; works well...
| mattmanser wrote:
| Why do you find this better than just starting again at
| that point? I'm trying to understand the benefit of using
| this 'trick', without being able to try it as I'm away from
| my computer.
|
| Couldn't you start a new context and achieve the same
| thing, without any of the risks of this approach?
| bamboozled wrote:
| LLMs have no "memory", so this gives it something to go off
| of. I forgot to add that I only do this if the change I'm
| making is related to whatever I did yesterday.
|
| I do this because sometimes I just manually edit code and
| the LLM doesn't know everything that's happened.
|
| I also find the best way to work with "AI" is to make
| very small changes and commit frequently, I truly think
| it's a slot machine and if it does go wild, you can lose
| hours of work.
| cube00 wrote:
| > Tell Claude your developer finished XYZ [...] (Claude will
| find more problems with your developer's work than with yours).
|
| It's crazy to think LLMs are so focused on pleasing us that we
| have to trick them like this to get frank and fearless
| feedback.
| razemio wrote:
| I think it is something else. If you think about it, humans
| often write about correcting errors done by others.
| Refactoring code, fixing bugs and writing code more efficiently.
| I guess it triggers other paths in the model, if we write
| that someone else did it. It is not about pleasing but our
| constant desire to improve things.
| cube00 wrote:
| If we tell it a rival LLM wrote the code it will be extra
| critical to tap into its capitalist streak to crush the
| competition?
| ZeroCool2u wrote:
| It's great they've finally caught up, but unfortunate it's on
| their mid-tier model only and it's laughably expensive.
| thimabi wrote:
| Oh, well, ChatGPT is being left in the dust...
|
| When done correctly, having one million tokens of context window
| is amazing for all sorts of tasks: understanding large codebases,
| summarizing books, finding information on many documents, etc.
|
| Existing RAG solutions fill a void up to a point, but they lack
| the precision that large context windows offer.
|
| I'm excited for this release and hope to see it soon on the UI as
| well.
| OutOfHere wrote:
| Fwiw, OpenAI does have a decent active API model family of
| GPT-4.1 with a 1M context. But yes, the context of the GPT-5
| models is terrible in comparison, and it's altogether atrocious
| for the GPT-5-Chat model.
|
| The biggest issue in ChatGPT right now is a very inconsistent
| experience, presumably due to smaller models getting used even
| for paid users with complex questions.
| wahnfrieden wrote:
| Doesn't it matter more what context they provide via Claude
| Code and Codex CLI? And aren't they similar anyway there?
|
| Because the API with maximum context is very expensive (also
| not rolled out to everyone)
| kotaKat wrote:
| A million tokens? Damn, I'm gonna need a _lot_ of quarters to
| play this game at Chuck-E-Cheese.
| xnx wrote:
| 1M context windows are not created equal. I doubt Claude's recall
| is as good as Gemini's 1M context recall.
| https://cloud.google.com/blog/products/ai-machine-learning/t...
| xnx wrote:
| Good analysis here:
| https://news.ycombinator.com/item?id=44878999
|
| > the model that's best at details in long context text and
| code analysis is still Gemini.
|
| > Gemini Pro and Flash, by comparison, are far cheaper
| firasd wrote:
| A big problem with the chat apps (ChatGPT; Claude.ai) is the
| weird context window hijinks. ChatGPT especially does wild
| stuff: sudden truncation, summarization, reinjecting 'ghost
| snippets', etc.
|
| I was thinking this should be up to the user (do you want to
| continue this conversation with context rolling out of the window
| or start a new chat) but now I realized that this is inevitable
| given the way pricing tiers and limited computation works. Like
| the only way to have full context is use developer tools like
| Google AI Studio or use a chat app that wraps the API
|
| With a custom chat app that wraps the API you can even inject the
| current timestamp into each message and just ask the LLM btw
| every 10 minutes just make a new row in a markdown table that
| summarizes every 10 min chunk
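|
| For illustration, a rough sketch of that kind of wrapper,
| assuming the `anthropic` Python SDK; the model id and the
| summary instruction are just examples:
|
| ```python
| # Rough sketch of a chat wrapper that stamps each user message
| # with the current time, so the model can keep a rolling summary.
| # Assumes the `anthropic` SDK and ANTHROPIC_API_KEY in the env;
| # the model id and system prompt are illustrative.
| from datetime import datetime, timezone
| import anthropic
|
| client = anthropic.Anthropic()
| history = []  # full conversation kept client-side
|
| def send(user_text):
|     now = datetime.now(timezone.utc).isoformat(timespec="seconds")
|     history.append({"role": "user",
|                     "content": f"[{now}] {user_text}"})
|     resp = client.messages.create(
|         model="claude-sonnet-4-20250514",
|         max_tokens=1024,
|         system="Every ~10 minutes of chat time, add one row to a "
|                "markdown table summarizing that 10-minute chunk.",
|         messages=history,
|     )
|     reply = resp.content[0].text
|     history.append({"role": "assistant", "content": reply})
|     return reply
| ```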
| cruffle_duffle wrote:
| > btw every 10 minutes just make a new row in a markdown table
| that summarizes every 10 min chunk
|
| Why make it time based instead of "message based"... like
| "every 10 messages, summarize to blah-blah.md"?
| dev0p wrote:
| Probably it's more cost effective and less error prone to
| just dump the message log rather than actively rethink the
| context window, costing resources and potentially losing
| information in the process. As the models get better, this
| might change.
| firasd wrote:
| Sure. But you'd want to help out the LLM with a message count
| like this is message 40, this is message 41... so when it
| hits message 50 it's like ahh time for a new summary and call
| the memory_table function (cause it's executing the earlier
| standing order in your prompt)
| tosh wrote:
| How did they do the 1M context window?
|
| Same technique as Qwen? As Gemini?
| deadbabe wrote:
| Unfortunately, larger context isn't really the answer after a
| certain point. Small focused context is better, lazily throwing a
| bunch of tokens in as a context is going to yield bad results.
| ramoz wrote:
| Awesome addition to a great model.
|
| The best interface for long context reasoning has been AIStudio
| by Google. Exceptional experience.
|
| I use Prompt Tower to create long context payloads.
| simianwords wrote:
| How does "supporting 1M tokens" really work in practice? Is it a
| new model? Or did they just remove some hard coded constraint?
| eldenring wrote:
| Serving a model efficiently at 1M context is difficult and
| could be much more expensive/numerically tricky. I'm guessing
| they were working on serving it properly, since it's the same
| "model" in scores and such.
| simianwords wrote:
| Thanks - still not clear what they did really. Some inference
| time hacks?
| FergusArgyll wrote:
| That would imply the model always had a 1m token context
| but they limited it in the api and app? That's strange
| because they can just charge more for every token past 250k
| (like google does, I believe).
|
| But if not, shouldn't it have to be a completely retrained
| model? It's clearly not that - good question!
| Aeolun wrote:
| They already had a 0.5M context window on the enterprise
| version.
| otabdeveloper4 wrote:
| Most likely still 32k tokens under the hood, but with some
| context slicing/averaging hacks to make inference not error
| out on infinite input.
|
| (That's what I do locally with llama.cpp)
| nickphx wrote:
| Yay, more room for stray cats.
| alienbaby wrote:
| The fracturing of all the models offered across providers is
| annoying. The number of different models and the fact a given
| model will have different capabilities from different providers
| is ridiculous.
| chrisweekly wrote:
| Peer of this post currently also on HN front page, comparing perf
| for Claude vs Gemini, w/ 1M tokens:
| https://news.ycombinator.com/item?id=44878999
| DiabloD3 wrote:
| Neat. I do 1M tokens context locally, and do it entirely with a
| single GPU and FOSS software, and have access to a wide range of
| models of equivalent or better quality.
|
| Explain to me, again, how Anthropic's flawed business model
| works?
| codazoda wrote:
| Tell us more?
| DiabloD3 wrote:
| Nothing really to say, its just like everyone else's
| inference setups.
|
| Select a model that produces good results, has anywhere from
| 256k to 1M context (ex: Qwen3-Coder can do 1M), is under one
| of the acceptable open weights licenses, and run it in
| llama.cpp.
|
| llama.cpp can split layers between active and MoE, and only
| load the active ones into vram, leaving the rest of it
| available for context.
|
| With Qwen3-Coder-30B-A3B, I can use Unsloth's Q4_K_M, consume
| a mere 784MB of VRAM with the active layers, then consume
| 27648MB (kv cache) + 3096MB (context) with the kv cache
| quantized to iq4_nl. This will fit onto a single card with
| 32GB of VRAM, or slightly spill over on 24GB.
|
| Since I don't personally need that much, I'm not pouring
| entire projects into it (I know people do this, and more data
| _does not produce better results_ ), I bump it down to 512k
| context and fit it in 16.0GB, to avoid spill over on my 24GB
| card. In the event I do need the context, I am always free to
| enable it.
|
| I do not see a meaningful performance difference between all
| on the card and MoE sent to RAM while active is on VRAM, its
| very much a worthwhile option for home inference.
|
| Edit: For completeness' sake, 256k context with this
| configuration is 8.3GB total VRAM, making good inference on a
| _very_ tight budget absolutely possible.
| ffitch wrote:
| I wonder how modern models fare on NovelQA and FLenQA (benchmarks
| that test ability to understand long context beyond needle in a
| haystack retrieval). The only such test on a reasoning model that
| I found was done on o3-mini-high
| (https://arxiv.org/abs/2504.21318), it suggests that reasoning
| noticeably improves FLenQA performance, but this test only
| explored context up to 3,000 tokens.
| dang wrote:
| Related ongoing thread:
|
| _Claude vs. Gemini: Testing on 1M Tokens of Context_ -
| https://news.ycombinator.com/item?id=44878999 - Aug 2025 (9
| comments)
| whalesalad wrote:
| My first thought was "gg no re" can't wait to see how this
| changes compaction requirements in claude code.
| pmxi wrote:
| The reason I initially got interested in Claude was because they
| were the first to offer a 200K token context window. That was
| massive in 2023. However, they didn't keep up once Gemini offered
| a 1M token window last year.
|
| I'm glad to see an attempt to return to having a competitive
| context window.
| markb139 wrote:
| I've tried 2 AI tools recently. Neither could produce the correct
| code to calculate the CPU temperature on a Raspberry Pi RP2040.
| The code worked, looked ok and even produced reasonable looking
| results - until I put a finger on the chip and thus raised the
| temp. The calculated temperature went down. As an aside, the free
| version of ChatGPT didn't know about anything newer than 2023, so
| it couldn't tell me about the RP2350.
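|
| For reference, the usual MicroPython read of the RP2040's
| on-die sensor looks something like this (the constants are the
| nominal datasheet values, so treat absolute accuracy as rough):
|
| ```python
| # MicroPython sketch: the RP2040 temperature sensor sits on ADC
| # channel 4; conversion constants are nominal datasheet values.
| import machine
|
| sensor = machine.ADC(4)
|
| def read_temp_c():
|     volts = sensor.read_u16() * 3.3 / 65535
|     return 27 - (volts - 0.706) / 0.001721
| ```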
| anvuong wrote:
| How can you be sure putting the finger on the chip raises the
| temp? If you feel hot that means heat from the chip is being
| transferred to your finger, that may decrease the temp, no?
| broshtush wrote:
| From my understanding putting your finger on an uncooled CPU
| acts like a passive cooler, thus actually decreasing
| temperature.
| fwip wrote:
| I don't think a larger context window would help with that.
| fpauser wrote:
| Best comment ;)
| ghjv wrote:
| wouldn't your finger have acted as a heat sink, lowering the
| temp? sounds like the program may have worked correctly. could
| be worth trying again with a hot enough piece of metal instead
| of your finger
| logicchains wrote:
| With that pricing I can't imagine why anyone would use Claude
| Sonnet through the API when Gemini 2.5 Pro is both better and
| cheaper (especially at long-context understanding).
| CuriouslyC wrote:
| Claude is a good deal with the $20 subscription giving a fair
| amount of sonnet use with Code. It's also got a very distinct
| voice as far as LLMs go, and tends to produce cleaner/clearer
| writing in general. I wouldn't use the API in an application
| but the subscription feels like a pretty good deal.
| siva7 wrote:
| Ah, so Claude Code on subscription will become a crippled
| version
| joduplessis wrote:
| As far as coding goes, Claude seems to be the most competent
| right now; I like it. GPT5 is abysmal - I'm not sure if it's
| bugs or what, but the new release takes a good few steps back.
| Gemini is still hit and miss - and Grok seems to be a poor man's
| Claude (where the code is kind of okay, a bit buggy and somehow
| similar to Claude).
| wahnfrieden wrote:
| Are you evaluating gpt5-thinking on high mode, via API or Codex
| CLI on Pro tier? Just wondering what specifically you compared
| since those factors affect its performance and context
| brokegrammer wrote:
| Many people are confused about the usefulness of 1M tokens
| because LLMs often start to get confused after about 100k. But
| this is big for Claude 4 because it uses automatic RAG when the
| context becomes large. With optimized retrieval thanks to RAG,
| we'll be able to make good use of those 1M tokens.
| m4r71n wrote:
| How does this work under the hood? Does it build an in-memory
| vector database of the input sources and run queries on top of
| that data to supplement the context window?
| brokegrammer wrote:
| No idea how it's implemented because it's proprietary.
| Details here: https://support.anthropic.com/en/articles/11473
| 015-retrieval...
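|
| For intuition only, the generic pattern being asked about
| (chunk the input, embed it, pull the most similar chunks back
| into context) looks roughly like this; it says nothing about
| Anthropic's actual implementation, and the bag-of-words
| "embedding" is just a toy stand-in:
|
| ```python
| # Toy sketch of generic retrieval: rank chunks of the input by
| # similarity to the query and feed only the top ones back into
| # context. Not Anthropic's implementation; purely illustrative.
| from collections import Counter
| import math
|
| def embed(text):
|     return Counter(text.lower().split())  # toy "vector"
|
| def cosine(a, b):
|     dot = sum(a[t] * b[t] for t in a)
|     na = math.sqrt(sum(v * v for v in a.values()))
|     nb = math.sqrt(sum(v * v for v in b.values()))
|     return dot / (na * nb) if na and nb else 0.0
|
| def top_chunks(query, chunks, k=5):
|     q = embed(query)
|     return sorted(chunks, key=lambda c: cosine(q, embed(c)),
|                   reverse=True)[:k]
| ```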
| Balgair wrote:
| Wow!
|
| As a fiction writer/noodler this is amazing. I can put not just a
| whole book in as before, not just a whole series, but the entire
| corpus of author _s_ in.
|
| I mean, from the pov of biography writers, this is awesome too.
| Just dump it all in, right?
|
| I'll have to switch to using Sonnet 4 now for workflows and edit
| my RAG code to use longer windows, a _lot_ longer.
| irthomasthomas wrote:
| Brain: Hey, you going to sleep? Me: Yes. Brain: That 200,001st
| token cost you $600,000/M.
| qwertox wrote:
| > desperately need LLMs to maintain extremely effective context
|
| Last time I used Gemini it did something very surprising: instead
| of providing readable code, it started to generate pseudo-
| minified code.
|
| Like, one CSS class would become one long line of CSS, and one JS
| function became one long line of JS, with most of the variable
| names minified, while some remained readable, but short. It did
| away with all unnecessary spaces.
|
| I was asking myself what is happening here, and my only
| explanation was that maybe Google started training Gemini on
| minified code, on making Gemini understand and generate it, in
| order to maximize the value of every token.
| ericol wrote:
| "...in API"
|
| That's a VERY relevant clarification. This DOESN'T apply to web
| or app users.
|
| Basically, if you want a 1M context window you have to
| specifically pay for it.
| sporkland wrote:
| Does anyone have data on how much better these 1M token context
| models perform than the more limited windows alongside certain
| RAG implementations? Or how much better, in the face of RAG, the
| 200k vs 1M token models perform on a benchmark?
| poniko wrote:
| [Claude usage limit reached. Your limit will reset at..] .. eh
| lunch is a good time to go home anyways..
| chmod775 wrote:
| For some context, only the tweaks files and scripting parts of
| Cyberpunk 2077 are ~2 million LOC.
| not_that_d wrote:
| My experience with the current tools so far:
|
| 1. It helps to get me going with new languages, frameworks,
| utilities or full green-field stuff. After that I spend a lot of
| time parsing the code to understand what it wrote; I end up kind
| of "trusting" it because checking is too tedious, but "it works".
|
| 2. When working with languages or frameworks that I know, I find
| it makes me unproductive: the amount of time I spend writing a
| good enough prompt with the correct context is almost the same
| as, or more than, if I write the stuff myself, and to be honest
| the solution it gives me works for the specific case but looks
| like junior code, with pitfalls that are not that obvious unless
| you have the experience to know them.
|
| I used it with Typescript, Kotlin, Java and C++, for different
| scenarios, like websites, ESPHome components (ESP32), backend
| APIs, node scripts etc.
|
| Bottom line: useful for hobby projects, scripts and prototypes,
| but for enterprise-level code it is not there.
| jeremywho wrote:
| My workflow is to use Claude desktop with the filesystem mcp
| server.
|
| I give claude the full path to a couple of relevant files
| related to the task at hand, ie where the new code should hook
| into or where the current problem is.
|
| Then I ask it to solve the task.
|
| Claude will read the files, determine what should be done and
| it will edit/add relevant files. There's typically a couple of
| build errors I will paste back in and have it correct.
|
| Current code patterns & style will be maintained in the new
| code. It's been quite impressive.
|
| This has been with Typescript and C#.
|
| I don't agree that what it has produced for me is hobby-grade
| only...
| taberiand wrote:
| I've been using it the same way. One approach that's worked
| well for me is to start a project and first ask it to analyse
| and make a plan with phases for what needs to be done, save
| that plan into the project, then get it to do each phase in
| sequence. Once it completes a phase, have it review the code
| to confirm if the phase is complete. Each phase of work and
| review is a new chat.
|
| This way helps ensure it works on manageable amounts of code
| at a time and doesn't overload its context, but also keeps
| the bigger picture and goal in sight.
| mnky9800n wrote:
| I find that sometimes this works great and sometimes it
| happily tells you everything works and your code fails
| successfully and if you aren't reading all the code you
| would never know. It's kind of strange actually. I don't
| have a good feeling when it will get everything correct and
| when it will fail and that's what is disconcerting. I would
| be happy to be given advice on what to do to untangle when
| it's good and when it's not. I love chatting with Claude
| code about code. It's annoying that it doesn't always get
| it right and also doesn't really interact with failure like
| a human would. At least in my experience, anyway.
| taberiand wrote:
| Of course, everything needs to be verified - I'm just
| trying to figure out a process that enables it to work as
| effectively as it can on large code bases in a structured
| way. Committing each stage to git, fixing issues and
| adjusting the context still comes into play.
| nwatson wrote:
| One can also integrate with, say, a running PyCharm with the
| Jetbrains IDE MCP server. Claude Desktop can then interact
| directly with PyCharm.
| hamandcheese wrote:
| Any particular reason you prefer that over Claude code?
| jeremywho wrote:
| I'm on windows. Claude Code via WSL hasn't been as smooth a
| ride.
| JyB wrote:
| That's exactly how you should do it. You can also plug in an
| MCP for your CI or mention cli.github.com in your prompt to
| also make it iterate on CI failures.
|
| Next you use Claude Code instead and have several instances work
| on their own clones, workspaces and branches in the background,
| so you can still iterate yourself on some other topic in your
| personal clone.
|
| Then you check out its tab from time to time and optionally
| checkout its branch if you'd rather do some updates yourself.
| It's so ingrained in my day-to-day flow now it's been super
| impressive.
| risyachka wrote:
| Pretty much my experience too.
|
| I usually go with option 2 - just write it myself, as it is the
| same time-wise but keeps my skills sharp.
| fpauser wrote:
| Not degenerating is really challenging these days. There are
| the bubbles that simulate multiple realities for us and try to
| untrain our logical thinking. And there are the LLMs that try to
| convince us that thinking for ourselves is unproductive. I wonder
| when this digitalophily suddenly turns into digitalophobia.
| sciencejerk wrote:
| It's happening, friend, don't let the AI hype fool you. I'm
| detecting quite a bit of reluctance and lack of 100% buy-in
| on AI coding tools and trends, even from your typically
| tech-loving Software Engineers.
| flowerthoughts wrote:
| I predict microservices will get a huge push forward. The
| question then becomes if we're good enough at saying "Claude,
| this is too big now, you have to split it in two services" or
| not.
|
| If LLMs maintain the code, the API boundary
| definitions/documentation and orchestration, it might be
| manageable.
| fsloth wrote:
| Why microservices? Monoliths with code-golfed minimal
| implementation size (but high quality architecture)
| implemented in a strongly typed language would consume far fewer
| tokens (and thus would be cheaper to maintain).
| arwhatever wrote:
| Won't this cause [insert LLM] to lose context around the
| semantics of messages passed between microservices?
|
| You could then put all services in 1 repo, or point LLM at X
| number of folders containing source for all X services, but
| then it doesn't seem like you'll have gained anything, and at
| the cost of added network calls and more infra management.
| urbandw311er wrote:
| Why not just cleanly separated code in a single execution
| environment? No need to actually run the services in separate
| execution environments just for the sake of an LLM being able
| to parse it, that's crazy! You can just give it the files or
| folders it needs for the particular services within the
| project.
|
| Obviously there's still other reasons to create micro
| services if you wish, but this does not need to be another
| reason.
| fpauser wrote:
| Same conclusion here. Also good for analyzing existing
| codebases and to generate documentation for undocumented
| projects.
| j45 wrote:
| It's quite good at this, I have been tying in Gemini Pro with
| this too.
| johnisgood wrote:
| > but for enterprise level code it is not there
|
| It is good for me in Go but I had to tell it what to write and
| how.
| sdesol wrote:
| I've been able to create a very advanced search engine for my
| chat app that is more than enterprise ready. I've spent a
| decade thinking about search, but in a different language.
| Like you, I needed to explain what I knew about writing a
| search engine in Java for the LLM, to write it in JavaScript
| using libraries I did not know and it got me 95% of the way
| there.
|
| It is also incredibly important to note that the 5% that I
| needed to figure out was the difference between throw away
| code and something useful. You absolutely need domain
| knowledge but LLMs are more than enterprise ready in my
| opinion.
|
| Here is some documentation on how my search solution is used
| in my app to show that it is not a hobby feature.
|
| https://github.com/gitsense/chat/blob/main/packages/chat/wid.
| ..
| johnisgood wrote:
| Thanks for your reply, I am in the same boat, and it works
| for me, like it seems to work for you. So as long as we are
| effective with it, why not? Of course I am not doing things
| blindly and expect good results.
| jiggawatts wrote:
| Something I've discovered is that it may be worthwhile writing
| the prompt anyway, even for a framework you're an expert with.
| Sometimes the AIs will surprise me with a novel approach, but
| the real value is that the prompt makes for _excellent_
| documentation of the requirements! It's a much better starting
| point for doc-comments or PR blurbs than after-the-fact
| ramblings.
| viccis wrote:
| I agree. For me it's a modern version of that good ol "rails
| new" scaffolding with Ruby on Rails that got you started with a
| project structure. It makes sense because LLMs are particularly
| good at tasks that require little more knowledge than just a
| near perfect knowledge of the documentation of the tooling
| involved, and creating a well organized scaffold for a
| greenfield project falls squarely in that area.
|
| For legacy systems, especially ones in which a lot of the
| things they do are because of requirements from external
| services (whether that's tech debt or just normal growing
| complexity in a large connected system), it's less useful.
|
| And for tooling that moves fast and breaks things (looking at
| you, Databricks), it's basically worthless. People have already
| brought attention to the fact that it will only be as current
| as its training data was, and so if a bunch of terminology,
| features, and syntax have changed since then (ahem,
| Databricks), you would have to do some kind of prompt
| engineering with up to date docs for it to have any hope of
| succeeding.
| pvorb wrote:
| I'm wondering what exact issue you are referring to with
| Databricks? I can't remember a time I had to change a line I
| wrote during the past 2.5 years I've been using it. Or are
| you talking about non-breaking changes?
| alfalfasprout wrote:
| The bigger problem I'm seeing is engineers that become over
| reliant on vibe coding tools are starting to lose context on
| how systems are designed and work.
|
| As a result, their productivity might go up on simple "ticket
| like tasks" where it's basically just simple implementation
| (find the file(s) to edit, modify it, test it) but when they
| start using it for all their tasks suddenly they don't know how
| anything works. Or worse, they let the LLM dictate and bad
| decisions are made.
|
| These same people are also very dogmatic on the use of these
| tools. They refuse to just code when needed.
|
| Don't get me wrong, this stuff has value. But I just hate
| seeing how it's made many engineers complacent and accelerated
| their ability to add to tech debt like never before.
| mnky9800n wrote:
| Yea that's right. It's kind of annoying how useful it is for
| hobby projects and it is suddenly useless on anything at work.
| Haha. I love Claude code for some stuff (like generating a
| notebook to analyse some data). But it really just disconnects
| you from the problem you are solving without you going through
| everything it writes. And I'm really bullish on ai coding tools
| haha, for example:
|
| https://open.substack.com/pub/mnky9800n/p/coding-agents-prov...
| pqs wrote:
| I'm not a programmer, but I need to write python and bash
| programs to do my work. I also have a few websites and other
| personal projects. Claude Code helps me implement those little
| projects I've been wanting to do for a very long time, but I
| couldn't due to the lack of coding experience and time. Now I'm
| doing them. Also now I can improve my emacs environment,
| because I can create lisp functions with ease. For me, this is
| the perfect tool, because now I can do those little projects I
| couldn't do before, making my life easier.
| zingar wrote:
| Big +1 to customizing emacs! Used to feel so out of reach,
| but now I basically rolled my own cursor.
| chamomeal wrote:
| LLMs totally kick ass for making bash scripts
| dboreham wrote:
| Strong agree. Bash is so annoying that there have been many
| scripts that I wanted to have, but just didn't write (did
| the thing manually instead) rather than go down the rabbit
| hole of Bash nonsense. LLMs turn this on its head. I
| probably have LLMs write 1-2 bash scripts a week now, that
| I commit to git for use now and later.
| unshavedyak wrote:
| Similarly my Nix[OS] env had a ton of annoyances and
| updates needed that i didn't care to do. My first week of
| Claude saw tons of Nix improvements for my environment
| across my three machines (desk, server, macbook) and it's
| a much more rich environment.
|
| Claude did great at Nix, something i struggled with due
| to lack of documentation. It was far from perfect, but it
| usually pointed me towards the answer that i could later
| refine with it. Felt magical.
| elcritch wrote:
| Similarly I've been making Ansible playbooks using LLMs
| of late, often by converting shell scripts. Playbooks
| are pretty great and easier to make idempotent than
| shell. But without Claude I'd forget the syntax or
| commands and it'd take forever to set up.
| int_19h wrote:
| Why not use a more sensible shell, e.g. Fish?
| chamomeal wrote:
| Also great at making fish scripts!
|
| Bash scripts are p much universal though. I can send em
| to my coworkers. I can use them in my awful prod-
| debugging-helm environment.
| MangoCoffee wrote:
| At the end of the day, all tools are made to make their
| users' lives easier.
|
| I use GitHub Copilot. I recently did a vibe code hobby
| project for a command line tool that can display my
| computer's IP, hard drive, hard drive space, CPU, etc. GPT
| 4.1 did coding and Claude did the bug fixing.
|
| The code it wrote worked, and I even asked it to create a
| PowerShell script to build the project for release
| dfedbeef wrote:
| Try typing ctrl+shift+escape.
| dekhn wrote:
| For context I'm a principal software engineer who has worked
| in and out of machine learning for decades (along with a
| bunch of tech infra, high performance scientific computing,
| and a bunch of hobby projects).
|
| In the few weeks since I've started using
| Gemini/ChatGPT/Claude, I've
|
| 1. had it read my undergrad thesis and the paper it's based
| on, implementing correct pytorch code for featurization and
| training, along wiht some aspects of the original paper that
| I didn't include in my thesis. I had been waiting until
| retirement until taking on this task.
|
| 2. had it write a bunch of different scripts for automating
| tasks (typically scripting a few cloud APIs) which I then
| ran, cleaning up a long backlog of activities I had been
| putting off.
|
| 3. had it write a yahtzee game and implement a decent "pick a
| good move" feature . It took a few tries but then it output a
| fully functional PyQt5 desktop app that played the game. It
| beat my top score of all time in the first few plays.
|
| 4. tried to convert the yahtzee game to an android app so my
| son and I could play. This has continually failed on every
| chat agent I've tried- typically getting stuck with gradle or
| the android SDK. This matches my own personal experience with
| android.
|
| 5. had it write python and web-based g-code senders that
| allowed me to replace some tools I didn't like (UGS). Adding
| real-time vis of the toolpath and objects wasn't that hard
| either. Took about 10 minutes and it cleaned up a number of
| issues I saw with my own previous implementations
| (multithreading). It was stunning how quickly it can create
| fully capable web applications using javascript and external
| libraries.
|
| 6. had it implement a gcode toolpath generator for basic
| operations. At first I asked it to write Rust code, which
| turned out to be an issue (mainly because the opencascade
| bindings are incomplete), it generated mostly functional code
| but left it to me to implement the core algorithm. I asked it
| to switch to C++ and it spit out the correct code the first
| time. I spent more time getting cmake working on my system
| than I did writing the prompt and waiting for the code.
|
| 7. had it Write a script to extract subtitles from a movie,
| translate them into my language, and re-mux them back into
| the video. I was able to watch the movie less than an hour
| after having the idea- and most of that time was just
| customizing my prompt to get several refinements.
|
| 8. had it write a fully functional chemistry structure
| variational autoencoder that trains faster and more accurate
| than any I previously implemented.
|
| 9. various other scientific/imaging/photography related
| codes, like implementing multi-camera rectification, so I can
| view obscured objects head-on from two angled cameras.
|
| With a few caveats (Android projects, Rust-based toolpath
| generation), I have been absolutely blown away with how
| effective the tools are (especially used in a agent which has
| terminal and file read/write capabilities). It's like having
| a mini-renaissance in my garage, unblocking things that would
| have taken me a while, or been so frustrating I'd give up.
|
| I've also found that AI summaries in google search are often
| good enough that I don't click on links to pages (wikipedia,
| papers, tutorials etc). The more experience I get, the more
| limitations I see, but many of those limitations are simply
| due to the extraordinary level of unnecessary complexity
| required to do nearly anything on a modern computer (see my
| comments above about Android apps & gradle).
| stpedgwdgfhgdd wrote:
| For enterprise software development CC is definitely there.
| A 100k-line Go PaaS platform, microservices architecture, mono
| repo is manageable.
|
| The prompt needs to be good, but in plan mode it will
| iteratively figure it out.
|
| You need to have automated tests. For enterprise software
| development that actually goes without saying.
| dclowd9901 wrote:
| It also steps right over easy optimizations. I was doing a
| query on some github data (tedious work) and rather than
| preliminarily filter down using the graphql search method, it
| wanted to comb through all PRs individually. This seems like
| something it probably should have figured out.
| amelius wrote:
| It is very useful for small tasks like fixing network problems,
| or writing regexp patterns based on a few examples.
| MarcelOlsz wrote:
| _Here's how YOU can save $200/mo!_
| brulard wrote:
| For me it was like this for like a year (using Cline + Sonnet &
| Gemini) until Claude Code came out and until I learned how to
| keep context real clean. The key breakthrough was treating AI
| as an architect/implementer rather than a code generator.
|
| Most recently I first ask CC to create a design document for
| what we are going to do. He has instructions to look into the
| relevant parts of the code and docs to reference them. I review
| it, and after a few back-and-forths we have defined what we want
| to do.
| Next step is to chunk it into stages and even those to smaller
| steps. All this may take few hours, but after this is well
| defined, I clear the context. I then let him read the docs and
| implement one stage. This goes mostly well and if it doesn't I
| either try to steer him to correct it, or if it's too bad, I
| improve the docs and start this stage over. After stage is
| complete, we commit, clear context and proceed to next stage.
|
| This way I spend maybe a day creating a feature that would take
| me maybe 2-3. And at the end we have a document, unit tests,
| storybook pages, and features that get overlooked like
| accessibility, aria-things, etc.
|
| At the very end I like another model to make a code review.
|
| Even if this didn't make me faster now, I would consider it
| future-proofing myself as a software engineer as these tools
| are improving quickly
| imiric wrote:
| This is a common workflow that most advanced users are
| familiar with.
|
| Yet even following it to a T, and being _really_ careful with
| how you manage context, the LLM will still hallucinate,
| generate non-working code, steer you into wrong directions
| and dead ends, and just waste your time in most scenarios.
| There 's no magical workflow or workaround for avoiding this.
| These issues are inherent to the technology, and have been
| since its inception. The tools have certainly gotten more
| capable, and the ecosystem has matured greatly in the last
| couple of years, but these issues remain unsolved. The idea
| that people who experience them are not using the tools
| correctly is insulting.
|
| I'm not saying that the current generation of this tech isn't
| useful. I've found it very useful for the same scenarios GP
| mentioned. But the above issues prevent me from relying on it
| for anything more sophisticated than that.
| brulard wrote:
| > These issues are inherent to the technology
|
| That's simply false. Even if LLMs don't produce correct and
| valid code on first shot 100% times of the cases, if you
| use an agent, it's simply a matter of iterations. I have
| claude code connected to Playwright, context7 for docs and
| to Playwright, so it can iterate by itself if there are
| syntax errors, runtime errors or problems with the data on
| the backend side. Currently I have near zero cases when it
| does not produce valid working code. If it is incorrect in
| some aspect, it is then not that hard to steer it to better
| solution or to fix yourself.
|
| And even if it failed in implementing most of these stages
| of the plan, it's not all wasted time. I brainstormed
| ideas, formed the requirements and feature specifications,
| and have clear documentation, a plan of the
| implementation, unit tests, etc., and I can use those to
| code it myself. So even in the worst case scenario my
| development workflow is improved.
| mathiaspoint wrote:
| It definitely isn't. LLMs often end up stuck in weird
| corners they just don't get and need someone familiar
| with the theory of what they're working on to unstick
| them. If the agent is the same model as the code
| generator it won't be able to on its own.
| sawjet wrote:
| Skill issue
| brulard wrote:
| I was getting into a stuck state with Gemini and to a lesser
| extent with Sonnet 4, but my cases were resolved by Opus.
| I think it is mostly due to the size of the task; if you
| split it in advance into smaller chunks, all these models
| have a much higher probability of resolving it.
| nojs wrote:
| Could you explain your exact playwright setup in more
| detail? I've found that claude really struggles to end-
| to-end test complex features that require browser use. It
| gets stuck for several minutes trying to find the right
| button to click for example.
| brulard wrote:
| No special setup, just something along the lines of "test
| with playwright" in the process list. It can get stuck, but
| for me it was not often enough for me to care. If it
| happens, I push it in the right direction.
| john-tells-all wrote:
| I've seen this referred to as Chain of Thought. I've used it
| with great success a few times.
|
| https://martinfowler.com/articles/2023-chatgpt-xu-hao.html
| aatd86 wrote:
| For me it's the opposite. As long as I ask for small tasks,
| or error checking, it can help. But I'd rather think of the
| overall design myself because I tend to figure out corner
| cases or superlinear complexities much better. I develop
| better mental models than the NNs. That's somewhat of a
| relief.
|
| Also the longer the conversation goes, the less effective it
| gets. (saturated context window?)
| brulard wrote:
| I don't think that's the opposite. I have an idea what I
| want and to some extent how I want it to be done. The
| design document starts with a brainstorming where I throw
| all my ideas at the agent and we iterate together.
|
| > Also the longer the conversation goes, the less effective
| it gets. (saturated context window?)
|
| Yes, this is exactly why I said the breakthrough came for
| me when I learned how to keep the context clean. That means
| multiple times in the process I ask the model to put the
| relevant parts of our discussion into an MD document, I may
| review and edit it and I reset the context with /clear.
| Then I have him read just the relevant things from MD docs
| and we continue.
| ramshanker wrote:
| Same here. A small variation: I explicitly use the website to
| manage what context it gets to see.
| brulard wrote:
| What do you mean by website? An HTML doc?
| ramshanker wrote:
| I mean the website of AI providers. chatgpt.com ,
| gemini.google.com , claude.ai and so on.
| spaceywilly wrote:
| I've had more success this way as well. I will use the
| model via web ui, paste in the relevant code, and ask it
| to implement something. It spits out the code, I copy it
| back into the ide, and build. I tried Claude Code but I
| find it goes off the rails too easily. I like the chat
| through the UI because it explains what it's doing like a
| senior engineer would
| brulard wrote:
| Well, this is the way we have been able to do it for 2
| years already, but basically you are acting as the
| transport layer for the process, which cannot be
| efficient. If you really want tight control of what
| exactly the LLM sees, then that's still an option. But you
| only get so far with this approach.
| drums8787 wrote:
| My experience is the opposite I guess. I am having a great time
| using claude to quickly implement little "filler features" that
| require a good amount of typing and pulling from/editing
| different sources. Nothing that requires much brainpower beyond
| remembering the details of some sub system, finding the right
| files, and typing.
|
| Once the code is written, review, test and done. And on to more
| fun things.
|
| Maybe what has made it work is that these tasks have all fit
| comfortably within existing code patterns.
|
| My next step is to break down bigger & more complex changes
| into claude friendly bites to save me more grunt work.
| unlikelytomato wrote:
| I wish I shared this experience. There are virtually no
| filler features for me to work on. When things feel like
| filler on my team, it's generally a sign of tech debt and we
| wouldn't want to have it generate all the code it would take.
| What are some examples of filler features for you?
|
| On the other hand, it does cost me about 8 hours a week
| debugging issues created by bad autocompletes from my team.
| The last 6 months have gotten really bad with that. But that
| is a different issue.
| apimade wrote:
| Many who say LLMs produce "enterprise-grade" code haven't
| worked in mid-tier or traditional companies, where projects are
| held together by duct tape, requirements are outdated, and
| testing barely exists. In those environments, enterprise-ready
| code is rare even without AI.
|
| For developers deeply familiar with a codebase they've worked
| on for years, LLMs can be a game-changer. But in most other
| cases, they're best for brainstorming, creating small tests, or
| prototyping. When mid-level or junior developers lean heavily
| on them, the output may look useful.. until a third-party
| review reveals security flaws, performance issues, and built-in
| legacy debt.
|
| That might be fine for quick fixes or internal tooling, but
| it's a poor fit for enterprise.
| bityard wrote:
| I work in the enterprise, although not as a programmer, but I
| get to see how the sausage is made. And describing code as
| "enterprise grade" would not be a compliment in my book. Very
| analogous to "contractor grade" when describing home
| furnishings.
| Aeolun wrote:
| Umm, Claude Code is a lot better than a lot of enterprise
| grade code I see. And it actually learns from mistakes with a
| properly crafted instruction xD
| cube00 wrote:
| >And it actually learns from mistakes with a properly
| crafted instruction
|
| ...until it hallucinates and ignores said instruction.
| typpilol wrote:
| I've found having a ton of linting tools can help the AI
| write much better and more secure code.
|
| My eslint config is a mess but the code it writes comes out
| pretty good. Although it takes a few iterations after the
| lint errors pop up for it to rewrite things, the code it
| writes is way better.
| therealpygon wrote:
| I mostly agree, with the caveat that I would say it can
| certainly be useful when used appropriately as an "assistant".
| NOT vibe coding blindly and hoping what I end up with is
| useful. "Implement x specific thing" (e.g. add an edit button
| to component x), not "implement a whole new epic feature that
| includes changes to a significant number of files". Imagine
| meeting a house builder and saying "I want a house", then
| leaving and expecting to come back to exactly the house you
| dreamed of.
|
| I get why, it's a test of just how intuitive the model can be
| at planning and execution which drives innovation more than 1%
| differences in benchmarks ever will. I encourage that
| innovation in the hobby arena or when dogfooding your AI
| engineer. But as a replacement developer in an enterprise where
| an uncaught mistake could cost millions? No way. I wouldn't
| even want to be the manager of the AI engineering team, when
| they come looking for the only real person to blame for the
| mistake not being caught.
|
| For additional checks/tasks as a completely extra set of eyes,
| building internal tools, and for scripts? Sure. It's incredibly
| useful with all sorts of non-application development tasks.
| I've not written a batch or bash script in forever... you just
| don't really need to much anymore. The linear flow of most
| batch/bash scripts (like you mentioned) couldn't be a more
| suitable domain.
|
| Also, with a basic prompt, it can be an incredibly useful
| rubber duck. For example, I'll say something like "how do you
| think I should solve x problem" (with tools for the codebase
| and such, of course), and then over time, having rejected and
| been adversarial to every suggestion, I end up working through
| the problem and have a more concrete mental design. Think
| "over-eager junior know-it-all that tries to be right
| constantly" without the person attached, and you get a better
| idea of the kind of LLM output you can expect, including
| following false leads to test your ideas. For me it's less
| about wanting a plan from the LLM and more about talking
| through the problems I think my plan could solve better, when
| more things are considered outside the LLM's direct knowledge
| or access.
|
| "We can't do that, changing X would break Y external process
| because Z. Summarize that concern into a paragraph to be added
| to the knowledge base. Then, what other options would you
| suggest?"
| tonyhart7 wrote:
| it depends on the model, but Sonnet is more than capable of
| enterprise code
|
| when you're stuck with Claude doing dumb shit, it's because you
| didn't give the model enough context to know the system better
|
| after adopting spec-driven development, working with an LLM in
| a large codebase is so much easier than without it; it's a
| heaven-and-hell difference
|
| but it also increases token costs exponentially, so there's
| that
| hoppp wrote:
| I used it with Typescript and Go, SQL, Rust
|
| Using it with Rust is just horrible imho. Lots and lots of
| errors; I can't wait to be done with this Rust project already.
| But the project itself is quite complex
|
| Go on the other hand is super productive, mainly because the
| language is already very simple. I can move 2x as fast
|
| Typescript is fine, I use it for react components and it will
| do animations I'm too lazy to do...
|
| SQL and postgresql are fine, I could do it without it too, I
| just don't like writing stored functions because of the
| boilerplatey syntax; a little speedup saves me from carpal
| tunnel
| epolanski wrote:
| I really find your experience strikingly different from mine;
| I'll share my flow:
|
| - step A: ask the AI to write a featureA-requirements.md file
| at the root of the project. I give it a general description of
| the task, then have it ask me as many questions as possible to
| refine user stories and requirements. It generally comes up
| with a dozen or more questions, several of which I wouldn't
| have thought of and would only have found out about much later.
| Time: between 5 and 40 minutes. It's very detailed.
|
| - step B: after we refine the requirements (functional and
| non-functional) we write a todo plan together as
| featureA-todo.md. I refine the plan again; this is generally
| shorter than the requirements and I'm usually done in less
| than 10 minutes.
|
| - step C: implementation phase. Again the AI does most of the
| job; I correct it at each edit and point out flaws. Are there
| cases where I would've done it faster myself? Maybe. I can
| still jump into the editor and make the changes I want. This
| step generally includes comprehensive tests for all the
| requirements and edge cases we found in step A: functional,
| integration, and E2E. This really varies, but it is highly
| tied to the quality of phases A and B. It can be as little as
| a few minutes (especially when we indeed come up with the most
| effective plan) and as much as a few hours.
|
| - step D: documentation and PR description. With all of this
| context (in the requirements and todos), updating any relevant
| documentation and writing the PR description at this point is
| a very short exercise.
|
| In all of that: I have textual files with precise coding style
| guidelines, comprehensive readmes to give precise context, etc
| that get referenced in the context.
|
| Bottom line: you might be doing something profoundly wrong,
| because in my case, all of this planning, requirements
| gathering, testing, documenting etc is pushing me to deliver a
| much higher quality engineering work.
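|
| A rough sketch of the two-file scaffold behind steps A and B
| (the filenames come from the flow above; the section headings
| are just illustrative, not a fixed template):
|
|     from pathlib import Path
|
|     feature = "featureA"
|     req = Path(f"{feature}-requirements.md")
|     todo = Path(f"{feature}-todo.md")
|
|     # Step A skeleton: filled in with the AI during the Q&A
|     # round, then treated as the source of truth.
|     req.write_text(
|         f"# {feature} requirements\n\n"
|         "## Task description\n\n"
|         "## Open questions\n\n"
|         "## Functional requirements\n\n"
|         "## Non-functional requirements\n\n"
|         "## Edge cases\n")
|
|     # Step B skeleton: refined into an ordered implementation
|     # plan that step C works through item by item.
|     todo.write_text(f"# {feature} todo\n\n- [ ] ...\n")
|
|     print(f"Created {req} and {todo}")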
| mcintyre1994 wrote:
| You'd probably like Kiro, it seems to be built specifically
| for this sort of spec-driven development.
| epolanski wrote:
| How would it be better than what I'm doing with Claude?
| TZubiri wrote:
| Remember kids, just because you CAN doesn't mean you SHOULD
| mrcwinn wrote:
| This tells me they've gotten very good at caching and modeling
| the impact of caching.
| fpauser wrote:
| I observed that Claude produces a lot of bloat. I wonder how
| such LLM-generated projects age.
| cadamsdotcom wrote:
| I'm glad to see the only company chasing margins - which they get
| by having a great product and a meticulous brand - finding even
| more ways to get margin. That's good business.
| howinator wrote:
| I could be wrong, but I think this pricing is the first to admit
| that cost scales quadratically with number of tokens. It's the
| first time I've seen nonlinear pricing from an LLM provider which
| implicitly mirrors the inference scaling laws I think we're all
| aware of.
| jpau wrote:
| Google[1] also has a "long context" pricing structure. OpenAI
| may be considering offering similar since they do not offer
| their priority processing SLAs[2] for context >128K.
|
| [1] https://cloud.google.com/vertex-ai/generative-ai/pricing
|
| [2] https://openai.com/api-priority-processing/
| energy123 wrote:
| Is this marginal pricing, or do your total costs double if you
| go from 200,000 to 200,001 tokens?
| reverseblade2 wrote:
| Does this cover subscription?
| anonym29 wrote:
| API only for now, but at the very bottom of the post: "We're
| also exploring how to bring long context to other Claude
| products."
|
| So, not yet, but maybe someday?
| _joel wrote:
| Fantastic, use up your quota even more quickly. :)
| phyzix5761 wrote:
| What I've found with LLMs is they're basically a better version
| of Google Search. If I need a quick "How do I do..." or if I need
| to find a quick answer to something, it's way more useful than
| Google, and the fact that I can ask follow-up questions is
| amazing. But for any serious deep work it has a long way to go.
| mr_moon wrote:
| I feel exactly the same way. why skim and sift 15 different
| stackoverflow posts when an LLM can pick out exactly the info I
| need?
|
| I don't need to spin up an entire feature in a few seconds. I
| need help understanding where something is broken; what are
| some opinions on best practice; or finding out what a poorly
| written snippet is doing.
|
| context still v important for this though and I appreciate
| cranking that capacity. "read 15000 stackoverflow posts for me
| please"
| anvuong wrote:
| The act of sifting through poop to find gold actually develops
| my critical thinking skills. I, too, went through a phase of
| just asking an LLM about a specific concept instead of Googling
| it and weaving through dozens of wiki pages or niche mailing
| list discussions. It did improve my productivity, but I feel
| like it dulls my brain. So recently I've had to tone that down
| and force myself to go back to the old way. Maybe too much of a
| good thing is bad.
| Whatarethese wrote:
| This is my primary use of AI. Looking for a new mountain bike
| and using AI to list and compare parts of the bike and which is
| best for my use case scenario. Works pretty well so far.
| throawaywpg wrote:
| Google always planned search to be just a stopgap
| meander_water wrote:
| I like to spend a lot of time in "Ask" mode in Cursor. I guess
| the equivalent in Claude code is "plan" mode.
|
| Where I have minimal knowledge about the framework or language, I
| ask a lot of questions about how the implementation would work,
| what the tradeoffs are etc. This is to minimize any
| misunderstanding between me and the tool. Then I ask it to write
| the implementation plan, and execute it one by one.
|
| Cursor lets you have multiple tabs open, so I'll have an Ask
| mode and an Agent mode running in parallel.
|
| This is a lot slower, and if it was a language/framework I'm
| familiar with I'm more likely to execute the plan myself.
| itissid wrote:
| My experience with Claude Code for building anything bigger
| than a webpage, a small API, a tutorial on CSS, etc. has been
| pretty bad. I think context length is a manageable problem, but
| not the main one. I used it to write a 50K LoC Python codebase
| with 300 unit tests, and it went OK for the first few weeks and
| then it failed. This is despite there being a CLAUDE.md file
| for every single module that needs one, as well as detailed
| agents for testing, design, coding and review.
|
| I won't go into a case-by-case list of its failures. The core
| of the issue is misaligned incentives, which I want to get
| into:
|
| 1. The incentive for coding agents in general, and Claude in
| particular, is writing LOTS of code. None of them are good at
| planning and verification.
|
| 2. The involvement of the human, ironically, in a haphazard
| way in the agent's process. This has to do with how the problem
| of coding is defined for these agents. Human developers are
| like snowflakes when it comes to opinions on software design;
| there is no way to apply each one's preferences (except the
| papier-mache and superglue of SO, Reddit threads and books) to
| the design of the system in any meaningful way, and that makes
| a simple system way too complex, or a complex problem
| simplistic.
| - There is no way to evolve the plan to accept new preferences
| except text in a CLAUDE.md file in git that you will have to
| read through and edit.
| - There is no way to know the near-term effect of code choices
| now on one week from now.
| - So much code is written that asking a person to review it,
| when you are at the envelope and pushing the limit, feels
| morally wrong and an insane ask. How many of your code reviews
| are instead replaced by 15-30 min design meetings to solicit
| feedback on the design of the PR -- because it is so complex --
| before just pushing the PR into dev? WTF am I even doing, I
| wonder.
| - It does not know how far to explore for better rewards and
| does not know them from local rewards, resulting in
| commented-out tests and deleted arbitrary code to make its plan
| "work".
|
| In short, code is a commodity for the CEOs of coding agent
| companies and the CXOs of your company (the sales force has
| everyone coding, but that just raises the floor, and that's a
| good thing; it does NOT lower the bar and make people 10x
| devs). All of them have bought into the idea that 10x is
| somehow producing 10x code. Your time reviewing, unmangling and
| maintaining the code is not the commodity. It never ever was.
| lpa22 wrote:
| One of the most helpful usages of CC so far is when I simply ask:
|
| "Are there any bugs in the current diff"
|
| It analyzes the changes very thoroughly, often finds very subtle
| bugs that would cost hours of time/deployments down the line, and
| points out a bunch of things to think through for correctness.
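|
| A throwaway way to script that same check (assuming Claude
| Code's non-interactive "claude -p" print mode, which reads the
| piped diff from stdin):
|
|     import subprocess
|
|     # Grab the current working-tree diff.
|     diff = subprocess.run(["git", "diff"], capture_output=True,
|                           text=True, check=True).stdout
|
|     if diff.strip():
|         # Hand it to Claude with the same question as above.
|         review = subprocess.run(
|             ["claude", "-p",
|              "Are there any bugs in the current diff?"],
|             input=diff, capture_output=True, text=True,
|             check=True).stdout
|         print(review)
|     else:
|         print("No unstaged changes to review.")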
| swyx wrote:
| maybe want to reify that as a claude code hook!
| bertil wrote:
| That matches my experience with non-coding tasks: it's not very
| creative, but it's a comprehensive critical reader.
| neucoas wrote:
| I am trying this tomorrow
| lpa22 wrote:
| Let me know how it goes. It's a game changer
| KTibow wrote:
| I'm surprised that works even without telling it to think/think
| hard/think harder/ultrathink.
| GuB-42 wrote:
| I added this to my toolbox in addition to traditional linters.
|
| My experience is that it is about 10% harmful, 80% useless and
| 10% helpful. Which is actually great, the 10% is worth it, but
| it is far from a hands off experience.
|
| By harmful I mean something like suggesting a wrong fix to
| code that works; it usually happens when I am doing something
| unusual or counterintuitive, for example having a function
| "decrease_x" that (correctly) adds 1 to x. It may hint that
| better documentation is needed, but you have to be careful not
| to go on autopilot and just do what it says.
|
| By useless I mean something like "you didn't check for null"
| even though the variable can't be null or is passed to a
| function that handles the "null" case gracefully. In general,
| it tends to be overly defensive and following the
| recommendations would lead to bloated code.
|
| By helpful I mean finding a real bug. Most of them minor, but
| for some, I am glad I did that check.
|
| LLMs complement traditional linters well, but they don't
| replace them.
| csomar wrote:
| > it usually happens when I am doing something unusual or
| counter intuitive,
|
| That's usually your signal that your code needs refactoring.
| GuB-42 wrote:
| I wouldn't say it needs refactoring. Maybe more
| documentation, or some work on naming. But I believe that
| code you write has to be at least a bit unusual.
|
| Every project worth making is unique. Otherwise, why not
| use something off the shelf?
|
| For example, let's say you want to shuffle songs for a
| music player, you write your shuffling algorithm and it is
| "wrong", but there is a reason it is "wrong": it better
| matches the expectations of the user than a truly random
| shuffle. An LLM trained on thousands of truly random
| shuffles may try to "fix" your code, but that is actually the
| worst thing you can do. That "wrong" shuffle is the reason
| why you wrote that code in the first place, the "wrongness"
| is what adds value. But now, imagine that you realize that
| a true random shuffle is actually the way to go, then
| "fixing" your code is not what you should do either,
| instead, you should delete it and use the shuffle function
| your standard library offers.
|
| The unusual/unique/surprising parts of your code are where
| the true value is, and if there is none of that in your
| codebase, maybe you are just reinventing the wheel. Now, if
| an LLM trips over these parts, maybe you need some
| documentation, as a way to tell both the LLM and a human
| reading that part that it is something you should pay
| attention to. I am not a fan of comments in general, but
| that's where they are useful: explaining why you wrote that
| weird code, something along the lines of "I know it is not
| the correct algorithm, but users prefer it that way".
| Vegenoid wrote:
| While this can be true, I think a lot of people are far too
| eager to say that because someone is doing something in an
| unusual way, it's probably wrong. Not everything fits the
| cookie cutter model, there is tons of software written for
| all kinds of purposes. Suggesting that they're writing code
| wrong when someone says "an LLM struggles with my unusual
| code", when we aren't actually looking at the code and the
| context, is not helpful.
| elcritch wrote:
| Recently I realized you can add Github Copilot as a reviewer to
| a PR. It's surprisingly handy and found a few annoying typos
| and one legit bug mostly from me forgetting to update another
| field.
| aurareturn wrote:
| I do the same with Github Copilot after every change.
|
| I work with a high stakes app and breaking changes cause a ton
| of customer headaches. LLMs have been excellent at catching
| potential little bugs.
| hahn-kev wrote:
| We use Code Rabbit for this in our open source project.
| i_have_an_idea wrote:
| While this is cool, can anything be done about the speed of
| inference?
|
| At least for my use, 200K context is fine, but I'd like to see a
| lot faster task completion. I feel like more people would be OK
| with the smaller context if the agent acts quickly (vs waiting
| 2-3 mins per prompt).
| jeffhuys wrote:
| There's work being done in this field - I saw a demo using the
| same method as stable diffusion, but for text. It was extremely
| fast (3 pages of text in like a second). It'll come.
| wahnfrieden wrote:
| Meanwhile the key is to become proficient at using worktrees to
| parallelize agents instead of working serially with them
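|
| A minimal sketch of what that looks like in practice (the
| branch names are made up; each worktree gets its own agent
| session):
|
|     import subprocess
|
|     attempts = ["attempt-1", "attempt-2", "attempt-3"]
|
|     for branch in attempts:
|         # One isolated checkout per attempt, sharing one repo.
|         subprocess.run(["git", "worktree", "add",
|                         f"../{branch}", "-b", branch],
|                        check=True)
|         print(f"Run a separate agent session in ../{branch}")
|
|     # Later: keep the best branch and clean up the rest, e.g.
|     #   git worktree remove ../attempt-2
|     #   git branch -D attempt-2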
| i_have_an_idea wrote:
| Sounds nice, in theory, but in practice I want to iterate on
| one, perhaps, two tasks at a time, and keep a good
| understanding of what the agent is doing, so that I can
| prevent it from going off the rails, making bad decisions and
| then building on them even further.
|
| Worktrees and parallel agents do nothing to help me with
| that. It's just additional cognitive load.
| maxnevermind wrote:
| Does a very large context significantly increase response time?
| Are there any benchmarks/leader-boards estimating different
| models in that regard?
| hoppp wrote:
| So I can upload 1M tokens per prompt but pay $3 per 1M input
| tokens?
|
| It's really expensive to use.
| Aeolun wrote:
| Only the first time. After that it's $0.30 per 1M input tokens
| (cached).
| psyclobe wrote:
| Isn't Opus better? Whenever I run out of Opus tokens and get
| kicked down to Sonnet it's quite a shock sometimes.
|
| But man, I'm at the perfect stage in my career for these
| tools. I know a lot, I understand a lot, I have a _lot_ of
| great ideas - but I'm getting kinda tired of hammering out code
| all day long. Now with Claude I am just busting ass executing
| on all these ideas and tests and fixes - never going back!
| Aeolun wrote:
| Haha, I think I recognize that. I'm just worried my actual
| skills will atrophy while I use Claude Code like I'm a manager
| on steroids.
| stavros wrote:
| By definition the thing that atrophies is the thing you never
| need to use.
| as367 wrote:
| That is an unfortunate logo.
| wiseowise wrote:
| This is something that I wish I would unremember.
| tomsanbear wrote:
| I just want a better way to invalidate old context... It's
| great that I can fit more context, but the main challenge is
| Claude getting sidetracked with 10 invalid grep calls, pytest
| dumping a 10k-token stack trace, etc. And yes, the ability to
| go back in time via esc+esc is great, but I want Claude to read
| the error stack, learn from it, and purge it from its context,
| or at least let me lobotomize it selectively... Learning from
| and then discarding the raw output of tool calls feels like the
| missing piece here still.
| typpilol wrote:
| Vscode recently rolled out checkpoints where you can go back to
| a previous state of the conversation. But it's still not
| enough.
|
| We honestly need to be able to see exactly what's in the
| context and be able to manually edit it.
| aledalgrande wrote:
| I hope they are going to put something in Claude Code to show
| when you're entering the expensive window. Sometimes I just
| keep the conversation going; I wouldn't want that to burn my
| Max credits 2x faster.
| terminalshort wrote:
| Yeah, that 1M tokens is a $15 (IIRC) API call. That's gonna
| add up quick! My favorite hypothetical AI failure scenario is
| that LLM agents eventually achieve human level general
| intelligence, but have to burn so many tokens to do it that
| they actually become more expensive than a human.
| k9294 wrote:
| I believe Claude Code uses caching aggressively, so these 1M
| tokens will be 90% discounted - or am I missing something?
| socrateslee wrote:
| It's like double "double the dose"
| forgingahead wrote:
| Wish it was on the web app as well!
| williamtrask wrote:
| Claude is down.
|
| EDIT: for the moment... it supports 0 tokens of context xD
| nojs wrote:
| Currently the quality seems to degrade long before the context
| limit is reached, as the context becomes "polluted".
|
| Should we expect the higher limit to also increase the practical
| context size proportionally?
| m13rar wrote:
| This is amazing, shout out to Anthropic for doing this. I would
| like a Claude model that is not nerfed with ethics and values
| to please the user, and that doesn't write overly large plans
| to impress the user.
| elcritch wrote:
| I'm finding GPT-5 to be more succinct and on par with Claude
| Code so far. They've really toned down the obsequiousness.
| truelson wrote:
| We do know Parkinson's Law (
| https://en.m.wikipedia.org/wiki/Parkinson%27s_law ) will apply to
| all this, right?
| simon_rider wrote:
| feels like we just traded "not enough context" for "too much
| noise." The million-token window is cool for marketing, but until
| retrieval and summarization get way better, it's like dumping the
| entire repo on a junior dev's desk and saying "figure it out."
| They'll spend half their time paging through irrelevant crap, and
| the other half hallucinating connections. Bigger context is only
| a net win if the model can filter, prioritize, and actually
| reason over it
| whalesalad wrote:
| I can't tell you the number of times I had almost reached
| utopia only to hit compaction limits. Post-compaction I am
| usually dead in the water and the spiraling or repetition
| begins. Claude has a hard time compacting/remembering/flagging
| a-ha moments from the session: stuff that is important in the
| context of the task, but not appropriate for CLAUDE.md, for
| instance. I have been thinking for months that if the context
| window were 2-3x larger, I would be unstoppable. So happy for
| this change, and excited to test it this week.
| MagicMoonlight wrote:
| It's a stupid metric because nothing in the real world has half a
| million words of context. So all they're doing is feeding it
| imagined slop, or sticking together random files.
| zaptrem wrote:
| It's useful for hours-long long-context debugging sessions in
| Claude Code, etc.
| cintusshied wrote:
| For folks using LLMs for big coding projects, what's your go-to
| workflow for deciding which parts of the codebase to feed the
| model?
| aitchnyu wrote:
| Aider automatically makes an outline of your whole codebase,
| which takes a fraction of the tokens of the real files.
|
| https://aider.chat/docs/repomap.html
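|
| Not Aider's actual implementation (per its docs it parses
| files with tree-sitter and ranks the symbols), but the basic
| idea of a signatures-only outline can be sketched like this
| for Python sources; "src" is a placeholder for your repo root:
|
|     import ast
|     from pathlib import Path
|
|     # Walk the repo, keeping only class/function signatures.
|     lines = []
|     for path in sorted(Path("src").rglob("*.py")):
|         tree = ast.parse(path.read_text(), filename=str(path))
|         lines.append(f"{path}:")
|         for node in ast.walk(tree):
|             if isinstance(node, ast.ClassDef):
|                 lines.append(f"  class {node.name}")
|             elif isinstance(node, ast.FunctionDef):
|                 args = ", ".join(a.arg for a in node.args.args)
|                 lines.append(f"  def {node.name}({args})")
|
|     # This outline stands in for the full source files as
|     # context, at a fraction of the token count.
|     print("\n".join(lines))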
| mrcwinn wrote:
| I wish they'd fix other things faster. Still can't upload an
| Excel file in the iOS app, even with analyst mode enabled. The
| voice mode feels like it's 10 years behind OpenAI (no realtime,
| for example). And Opus 4.1 still occasionally goes absolutely
| mental and provides much worse analysis than o3 or GPT5-thinking.
|
| Rooting for Anthropic. Competition in this space is good.
|
| I watched an interview with Dario recently where he said he
| wasn't a "product guy", and it really shows.
| cognix_dev wrote:
| To be honest, I am not particularly interested in whether the
| next model is better than the previous one. Rather than being
| superior, it is important that it maintains context and has
| enough memory capacity to not interfere with work. I believe that
| is what makes the new model competitive.
| CodeCompost wrote:
| In Visual Studio as well?
| omlelelom_kimox wrote:
| HM
| throwmeaway222 wrote:
| Why do I get the feeling that HN devs on here want to just feed
| it their entire folder, source, binaries everything and have it
| make changes in seconds.
| Roark66 wrote:
| I noticed the quality of answers degrades horribly beyond a
| few thousand tokens. Maybe 10k. Is anyone actually successfully
| using these 100k+ token contexts for anything?
| nprateem wrote:
| Anyone else found Claude has become hugely more stupid recently?
|
| It used to always pitch answers at the right level, but recently
| it just seems to have left its common sense at the door. Gemini
| just gives much better answers for non-technical questions now.
| amelius wrote:
| 1M?
|
| 640K ought to be enough for anybody ... right?
| doppelgunner wrote:
| Great! Now we can use AI to read and think like a specific
| "book".
| hassadyb wrote:
| I personally use it in my coding tasks; it's an amazing and
| powerful LLM.
| muzani wrote:
| Of course Bolt is the customer spotlight. These vibe coding
| tools chuck the entire codebase into the context and charge for
| the tokens used. At around 10k lines of code, these apps could
| no longer fit.
| ndkap wrote:
| Does anybody know which technology most of these companies that
| support 1M tokens use? Or is it all hidden?
| t43562 wrote:
| I think this highlights some problems with software development
| in general. i.e. the code isn't enough - you need to have domain
| knowledge too and a lot of knowledge about how and why the
| company needs things done in some way or another. You might
| imagine that dumping the contents of your wiki and all your chat
| channels into some sort of context might do it but that would
| miss the 100s of verbal conversations between people in the
| company. It would also fall foul of the way everything tends to
| work in any way you can imagine except what the wiki says.
|
| Even if you transcribed all the voice chats and meetings and
| added it in, it challenges a human to work out what is going on.
| No-context human developers are pretty useless too.
___________________________________________________________________
(page generated 2025-08-13 23:01 UTC)