[HN Gopher] Claude Sonnet 4 now supports 1M tokens of context
       ___________________________________________________________________
        
       Claude Sonnet 4 now supports 1M tokens of context
        
       Author : adocomplete
       Score  : 773 points
       Date   : 2025-08-12 16:02 UTC (6 hours ago)
        
 (HTM) web link (www.anthropic.com)
 (TXT) w3m dump (www.anthropic.com)
        
       | throwaway888abc wrote:
       | holy moly! awesome
        
       | tankenmate wrote:
        | This is definitely good to have as an option, but at the same
        | time more context can reduce the quality of the output because
        | it's easier for the LLM to get "distracted". So I wonder what
        | will happen to the quality of code produced by tools like
        | Claude Code if users don't properly understand the trade-off
        | being made (e.g. if they leave it in auto mode and code right
        | up to the auto-compact).
        
         | jasonthorsness wrote:
         | What do you recommend doing instead? I've been using Claude
         | Code a lot but am still pretty novice at the best practices
         | around this.
        
           | TheDong wrote:
           | Have the AI produce a plan that spans multiple files (like
           | "01 create frontend.md", "02 create backend.md", "03 test
           | frontend and backend running together.md"), and then create a
           | fresh context for each step if it looks like re-using the
           | same context is leading it to confusion.
           | 
           | Also, commit frequently, and if the AI constantly goes down
           | the wrong path ("I can't create X so I'll stub it out with Y,
           | we'll fix it later"), you can update the original plan with
           | wording to tell it not to take that path ("Do not ever stub
           | out X, we must make X work"), and then start a fresh session
           | with an older and simpler version of the code and see if that
           | fresh context ends up down a better path.
           | 
           | You can also run multiple attempts in parallel if you use
           | tooling that supports that (containers + git worktrees is one
           | way)
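            | 
            | If it helps, here's a rough sketch of the parallel-attempts
            | setup in Python (branch names and the worktree directory are
            | made up; it assumes you run it inside an existing git repo):
            | 
            |     import subprocess
            |     from pathlib import Path
            | 
            |     # One worktree per attempt, each on its own branch, so
            |     # separate agent sessions can't step on each other.
            |     attempts = ["attempt-a", "attempt-b", "attempt-c"]
            |     base = Path("../worktrees")
            |     base.mkdir(exist_ok=True)
            | 
            |     for name in attempts:
            |         path = str(base / name)
            |         subprocess.run(["git", "worktree", "add",
            |                         path, "-b", name], check=True)
            | 
            |     # Point one agent (or container) at each directory and
            |     # compare the results afterwards.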
        
             | F7F7F7 wrote:
              | Inevitably the files become a mess of their own. Changes
              | and learnings from one part of the plan often don't result
              | in adaptation to the impacted plans further down the chain.
              | 
              | In the end you have a mishmash of half-implemented plans
              | and now you've lost context too. Which leads to blowing
              | tokens on trying to figure out what's been implemented,
              | what's half baked, and what was completely ignored.
             | 
             | Any links to anyone who's built something at scale using
             | this method? It always sounds good on paper.
             | 
             | I'd love to find a system that works.
        
               | brandall10 wrote:
               | My system is to create detailed feature files up to a few
               | hundred lines in size that are immutable, and then have a
               | status.md file (preferably kept to about 50 lines) that
               | links to a current feature that is used as a way to keep
               | track of the progress on that feature.
               | 
               | Additionally I have a Claude Code command with
               | instructions referencing the status.md, how to select the
               | next task, how to compact status.md, etc.
               | 
               | Every time I'm done with a unit of work from that feature
               | - always triggered w/ ultrathink - I'll put up a PR and
               | go through the motions of extra refactors/testing. For
               | more complex PRs that require many extra commits to get
               | prod ready I just let the sessions auto-compact.
               | 
               | After merging I'll clear the context and call the CC
               | command to progress to the next unit of work.
               | 
                | This allows me to put up around 4-5 meaningful PRs per
                | feature if it's reasonably complex, while keeping the
                | context relatively tight. The current project I'm focused
                | on is just over 16k LOC in Swift (25k total w/ tests) and
                | it seems to work pretty well - it rarely gets off track,
                | does unnecessary refactors, destroys working features,
                | etc.
        
               | nzach wrote:
               | Care to elaborate on how you use the status.md file? What
               | exactly you put in there, and what value does it bring?
        
               | brandall10 wrote:
               | When I initially have it built from a feature file, it
               | pulls in the most pertinent high level details from that
               | and creates a supercharged task list that is updated w/
               | implementation details as the feature progresses.
               | 
               | As it links to the feature file as well, that is pulled
               | into the context, but status.md is there to essentially
               | act as a 'cursor' to where it is in the implementation
               | and provide extended working memory - that Claude itself
               | manages - specific to that feature. With that you can
               | work on bite sized chunks of the feature each with a
               | clean context. When the feature is complete it is
               | trashed.
               | 
                | I've seen others try to achieve similar things by making
                | CLAUDE.md or the feature file mutable, but IME that's a
                | bad time. CLAUDE.md should stay lean, with just the
                | details needed to work on the project, and a mutable
                | feature file can easily be corrupted in an unintended
                | way, letting the scope drift.
        
               | nzach wrote:
                | In my experience it works better if you create one plan
                | at a time. Create a prompt, have Claude implement it, and
                | then make sure it works as expected. Only then do you ask
                | for something new.
               | 
                | I've created an agent to help me create the prompts; it
               | goes something like this: "You are an Expert Software
               | Architect specializing in creating comprehensive, well-
               | researched feature implementation prompts. Your sole
               | purpose is to analyze existing codebases and
               | documentation to craft detailed prompts for new features.
               | You always think deeply before giving an answer...."
               | 
               | My workflow is: 1) use this agent to create a prompt for
               | my feature; 2) ask claude to create a plan for the just
               | created prompt; 3) ask claude to implement said plan if
               | it looks good.
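                | 
                | If it's useful, here's a minimal sketch of step 1 done
                | straight against the API instead of through a Claude
                | Code agent (the model id and the feature description
                | are placeholders):
                | 
                |     import anthropic
                | 
                |     # Reads ANTHROPIC_API_KEY from the environment.
                |     client = anthropic.Anthropic()
                | 
                |     ARCHITECT = (
                |         "You are an Expert Software Architect "
                |         "specializing in creating comprehensive, "
                |         "well-researched feature implementation "
                |         "prompts. You always think deeply before "
                |         "giving an answer."
                |     )
                | 
                |     resp = client.messages.create(
                |         model="claude-sonnet-4-20250514",  # placeholder
                |         max_tokens=4096,
                |         system=ARCHITECT,
                |         messages=[{
                |             "role": "user",
                |             "content": "Write an implementation prompt "
                |                        "for: add CSV export to reports.",
                |         }],
                |     )
                |     print(resp.content[0].text)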
        
               | cube00 wrote:
               | >You always think deeply before giving an answer...
               | 
               | Nice try but they're not giving you the "think deeper"
               | level just because you asked.
        
               | nzach wrote:
               | https://docs.anthropic.com/en/docs/build-with-
               | claude/prompt-...
        
               | dpe82 wrote:
               | Actually that's exactly how you do it.
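                | 
                | For what it's worth, those phrases ("think" < "think
                | hard" < "ultrathink") map to thinking budgets in Claude
                | Code; through the raw API you set the budget directly.
                | A rough sketch (model id and numbers are arbitrary):
                | 
                |     import anthropic
                | 
                |     client = anthropic.Anthropic()
                |     resp = client.messages.create(
                |         model="claude-sonnet-4-20250514",
                |         max_tokens=16000,
                |         # extended thinking is an explicit parameter;
                |         # max_tokens must exceed the budget
                |         thinking={"type": "enabled",
                |                   "budget_tokens": 8000},
                |         messages=[{"role": "user", "content":
                |                    "Plan the refactor before coding."}],
                |     )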
        
               | theshrike79 wrote:
               | I use Gemini-cli (free 2.5 pro for an undetermined time
               | before it self-lobotomises and switches to lite) to keep
               | the specs up to date.
               | 
               | The actual tasks are stored in Github issues, which
               | Claude (and sometimes Gemini when it feels like it) can
               | access using the `gh` CLI tool.
               | 
                | But it's all just project management: if what the code
                | says drifts from what's in the specs (for any reason),
                | one of them has to change.
                | 
                | Claude does exactly what the documentation says; it
                | doesn't notice that the code is completely different and
                | adapt, like a human would.
        
               | bredren wrote:
                | Don't rely entirely on CC. Once a milestone has been
                | reached, copy the full patch and the technical spec
                | covering it to the clipboard. Provide the original files,
                | the patch, and the spec to Gemini and ask, roughly: a
                | colleague did this work; does it fulfill the spec and
                | follow best practices?
               | 
               | Pick among the best feedback to polish the work done by
               | CC---it will miss things that Gemini will catch.
               | 
               | Then do it again. Sometimes CC just won't follow feedback
               | well and you gotta make the changes yourself.
               | 
                | If you do this you'll move more gradually, but by the
                | nature of the pattern you'll look at the changes more
                | closely.
                | 
                | You'll be able to realign CC with the spec afterward with
                | a fresh context and the existing commits showing the way.
                | 
                | Fwiw, this kind of technique can be done entirely without
                | CC and can lead to excellent results faster, as Gemini
                | can look at the full picture all at once, vs having to
                | force CC to hen-and-peck its way through slices of files.
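                | 
                | A rough sketch of the loop in Python (model name, paths
                | and the exact Gemini SDK call are assumptions -- use
                | whatever client you actually have):
                | 
                |     import subprocess
                |     import google.generativeai as genai
                | 
                |     genai.configure(api_key="...")  # or from env
                | 
                |     # The milestone's patch plus the spec it should meet.
                |     patch = subprocess.run(
                |         ["git", "diff", "main...HEAD"],
                |         capture_output=True, text=True,
                |         check=True).stdout
                |     spec = open("docs/feature-spec.md").read()
                | 
                |     model = genai.GenerativeModel("gemini-2.5-pro")
                |     review = model.generate_content(
                |         "A colleague did the work below. Does the "
                |         "patch fulfill the spec and follow best "
                |         "practices? List concrete problems.\n\n"
                |         f"SPEC:\n{spec}\n\nPATCH:\n{patch}")
                |     print(review.text)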
        
             | wongarsu wrote:
             | Changing the prompt and rerunning is something where Cursor
             | still has a clear edge over Claude Code. It's such a
             | powerful technique for keeping the context small because it
             | keeps the context clear of back-and-forths and dead ends. I
             | wish it was more universally supported
        
               | abound wrote:
               | I do this all the time in Claude Code, you hit Escape
               | twice and select the conversation point to 'branch' from.
        
           | agotterer wrote:
            | I use the main Claude Code thread (I don't know what to call
            | it) for planning and then explicitly tell Claude to delegate
            | certain standalone tasks out to subagents. The subagents
            | don't consume the main thread's context window. Even just
            | delegating testing, debugging, and building will save a ton
            | of context.
        
           | sixothree wrote:
            | /clear often; it's really the first tool for context
            | management. Do this whenever you finish a task.
        
         | tehlike wrote:
         | Some reference:
         | 
         | https://simonwillison.net/2025/Jun/29/how-to-fix-your-contex...
         | 
         | https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-ho...
        
         | bachittle wrote:
         | As of now it's not integrated into Claude Code. "We're also
         | exploring how to bring long context to other Claude products".
         | I'm sure they already know about this issue and are trying to
         | think of solutions before letting users incur more costs on
         | their monthly plans.
        
           | PickledJesus wrote:
           | Seems to be for me, I came to look at HN because I saw it was
           | the default in CC
        
             | novaleaf wrote:
             | where do you see it in CC?
        
               | PickledJesus wrote:
               | I got a notification when I opened it, indicating that
               | the default had changed, and I can see it on /model.
               | 
               | Only on a max (20x) account, not there on a Pro one.
        
               | novaleaf wrote:
               | thanks, FYI I'm on a max 20x also and I don't see it!
        
               | tankenmate wrote:
               | maybe a staggered release?
        
               | Wowfunhappy wrote:
               | I'm curious, what does it say on /model?
               | 
                | For reference, my options are:
                | 
                |     Select Model
                |     Switch between Claude models. Applies to this session
                |     and future Claude Code sessions. For custom model
                |     names, specify with --model.
                | 
                |     1. Default (recommended)  Opus 4.1 for up to 50% of
                |        usage limits, then use Sonnet 4
                |     2. Opus            Opus 4.1 for complex tasks *
                |        Reaches usage limits faster
                |     3. Sonnet          Sonnet 4 for daily use
                |     4. Opus Plan Mode  Use Opus 4.1 in plan mode,
                |        Sonnet 4 otherwise
        
         | dbreunig wrote:
         | The team at Chroma is currently looking into this and should
         | have some figures.
        
       | falcor84 wrote:
       | Strange that they don't mention whether that's enabled or
       | configurable in Claude Code.
        
         | farslan wrote:
         | Yeah same, I'm curious about this. I would guess it's by
         | default enabled with Claude Code.
        
         | csunoser wrote:
         | They don't say it outright. But I think it is not in Claude
         | Code yet.
         | 
         | > We're also exploring how to bring long context to other
         | Claude products. - Anthropic
         | 
         | That is, any other product that is not Anthropic API tier 4 or
          | Amazon Bedrock.
        
         | CharlesW wrote:
         | From a co-marketing POV, it's considered best practice to not
         | discuss home-grown offerings in the same or similar category as
         | products from the partners you're featuring.
         | 
         | It's likely they'll announce this week, albeit possibly just
         | within the "what's new" notes that you see when Claude Code is
         | updated.
        
       | faangguyindia wrote:
        | In my testing the gap between Claude and Gemini Pro 2.5 is small.
        | My company is in Asia Pacific and we can't get access to Claude
        | via Vertex for some stupid reason.
        | 
        | But I tested it via other providers; the gap used to be huge, but
        | not anymore.
        
         | Tostino wrote:
         | For me the gap is pretty large (in Gemini Pro 2.5's favor).
         | 
         | For reference, the code I am working on is a Spring Boot /
         | (Vaadin) Hilla multi-module project with helm charts for
         | deployment and a separate Python based module for ancillary
         | tasks that were appropriate for it.
         | 
         | I've not been able to get any good use out of Sonnet in months
         | now, whereas Gemini Pro 2.5 has (still) been able to grok the
         | project well enough to help out.
        
           | jona777than wrote:
           | I initially found Gemini Pro 2.5 to work well for coding.
           | Over time, I found Claude to be more consistently productive.
           | Gemini Pro 2.5 became my go-to for use cases benefitting from
           | larger context windows. Claude seemed to be the safer daily
           | driver (if I needed to get something done.)
           | 
           | All that being said, Gemini has been consistently dependable
           | when I had asks that involved large amounts of code and data.
           | Claude and the OpenAI models struggled with some tasks that
           | Gemini responsively satisfied seemingly without "breaking a
           | sweat."
           | 
           | Lately, it's been GPT-5 for brainstorming/planning, Claude
           | for hammering out some code, Gemini when there is huge
           | data/code requirements. I'm curious if the widened Sonnet 4
           | context window will change things.
        
           | llm_nerd wrote:
           | Opus 4.1 is a much better model for coding than Sonnet. The
           | latter is good for general queries / investigations or to
           | draw up some heuristics.
           | 
           | I have paid subscriptions to both Gemini Pro and Claude.
           | Hugely worthwhile expense professionally.
        
           | faangguyindia wrote:
            | When Gemini 2.5 Pro gets stuck, I often use DeepSeek R1 in
            | architect mode and Qwen3 in coder mode in aider, and that
            | solves all the problems.
            | 
            | Last month I ran into some wicked dependency bug and only
            | ChatGPT could solve it, which I am guessing is because it has
            | hot data from GitHub?
            | 
            | On the other hand, I really need a tool like aider where I
            | can use various models in "architect" and "coder" mode.
            | 
            | What I've found is that better reasoning models tend to be
            | bad at writing actual code, and models like Qwen3 Coder seem
            | better.
            | 
            | DeepSeek R1 will not write reliable code but it will reason
            | well and map out the path forward.
            | 
            | I wouldn't be surprised if Sonnet's success comes from doing
            | EXACTLY this behind the scenes.
            | 
            | But now I am looking for pure models that do not use this
            | black magic hack behind the API.
            | 
            | I want more control at the tool end, where I can alter the
            | prompts and achieve the results I want.
            | 
            | This is one reason I do not use Claude Code etc.
            | 
            | aider is 80% of what I want; I just wish it had the rest.
            | 
            | I just don't know why no one has built a perfect solution to
            | this yet.
            | 
            | Here are the things I am missing in aider:
            | 
            | 1. Automatic model switching: use different models for asking
            | questions about the code, planning a feature, and writing the
            | actual code.
            | 
            | 2. Self-determine whether a feature needs a "reasoning" model
            | or whether a coding model will suffice.
            | 
            | 3. Be able to do more with context: selectively send context
            | and drop the files we don't need, and intelligently add the
            | files the feature will touch up front, instead of doing all
            | the code planning, then being asked to add files, then doing
            | it all over again with more context available.
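            | 
            | (For reference, the architect/coder split above is just flags
            | on the aider CLI. A rough sketch; the model names are examples
            | and the flag names are from memory, so check `aider --help`:)
            | 
            |     import subprocess
            | 
            |     # a reasoning model plans, a separate coder model edits
            |     subprocess.run([
            |         "aider", "--architect",
            |         "--model", "deepseek/deepseek-reasoner",
            |         "--editor-model", "openrouter/qwen/qwen3-coder",
            |         "--message", "add retry logic to the upload client",
            |         "src/upload_client.py",
            |     ])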
        
         | penguin202 wrote:
         | Claude doesn't have a mid-life crisis and try to `rm -rf /` or
         | delete your project.
        
         | film42 wrote:
          | Agree, but pricing-wise Gemini 2.5 Pro wins. Gemini input
          | tokens are half the cost of Claude 4, and output is $5/million
          | cheaper than Claude. On top of that, document processing is
          | significantly cheaper: a 5MB PDF (customer invoice) with Gemini
          | is like 5k tokens vs 56k with Claude.
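          | 
          | If you want to verify the token counts yourself, both APIs
          | expose a counter. A sketch (model ids are examples and the SDK
          | shapes may differ slightly from what's below):
          | 
          |     import base64
          |     import anthropic
          |     import google.generativeai as genai
          | 
          |     pdf = open("invoice.pdf", "rb").read()
          | 
          |     doc = {"type": "document",
          |            "source": {"type": "base64",
          |                       "media_type": "application/pdf",
          |                       "data": base64.b64encode(pdf).decode()}}
          |     a = anthropic.Anthropic().messages.count_tokens(
          |         model="claude-sonnet-4-20250514",
          |         messages=[{"role": "user",
          |                    "content": [doc, {"type": "text",
          |                                      "text": "Summarize this."}]}])
          | 
          |     genai.configure(api_key="...")
          |     g = genai.GenerativeModel("gemini-2.5-pro").count_tokens(
          |         [genai.upload_file("invoice.pdf"), "Summarize this."])
          | 
          |     print("claude:", a.input_tokens, "gemini:", g.total_tokens)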
         | 
         | The only downside with Gemini (and it's a big one) is
         | availability. We get rate limited by their dynamic QoS all the
         | time even if we haven't reached our quota. Our GCP sales rep
         | keeps recommending "provisioned throughput," but it's both
         | expensive, and doesn't fit our workload type. Plus, the
         | VertexAI SDK is kind of a PITA compared to Anthropic.
        
           | Alex-Programs wrote:
           | Google products are such a pain to work with from an API
           | perspective that I actively avoid them where possible.
        
       | artursapek wrote:
       | Eagerly waiting for them to do this with Opus
        
         | irthomasthomas wrote:
         | Imagine paying $20 a prompt?
        
           | artursapek wrote:
           | If I can give it a detailed spec, walk away and do something
           | else for 20 minutes, and come back to work that would have
           | taken me 2 hours, then that's a steal.
        
           | datadrivenangel wrote:
            | Depending on how many prompts per hour you're looking at,
            | that's probably the same order of magnitude as expensive
            | SaaS. A fancy CRM seat can be ~$2000 per month (or more),
            | which, assuming 50 hours per week x 4 weeks per month, is
            | $10 per hour ($2000/200 hours). A lot of money, but if it
            | makes your sales people more productive, it's a good
            | investment. Assuming you're paying your sales people say
            | $240K per year ($20,000 per month), the SaaS cost is 10% of
            | their salary.
           | 
           | This explains DataDog pricing. Maybe it will give a future
           | look at AI pricing.
        
       | mettamage wrote:
       | Shame it's only the API. Would've loved to see it via the web
       | interface on claude.ai itself.
        
         | minimaxir wrote:
         | Can you even fit 200+k tokens worth of context in the web
         | interface? IMO Claude's API workbench is the worst of the three
         | major providers.
        
           | mettamage wrote:
           | Via text files right? Just drag and drop.
        
           | data-ottawa wrote:
           | When working on artifacts after a few change requests it
           | definitely can.
        
           | 77pt77 wrote:
           | Even if you can't, a conversation can easily get larger than
           | that.
        
         | fblp wrote:
         | I assume this will mean that long chats continue to get the
         | "prompt is too long" error?
        
       | penguin202 wrote:
       | But will it remember any of it, and stop creating new redundant
        | files when it can't find or understand what it's looking for?
        
       | 1xer wrote:
       | moaaaaarrrr
        
       | aliljet wrote:
        | This is definitely one of my CORE problems as I use these tools
        | for "professional software engineering." I really desperately
        | need LLMs to maintain extremely effective context, and it's not
        | actually that interesting to see a new model that's marginally
        | better than the last one (for my day-to-day).
        | 
        | However. Price is king. Allowing me to flood the context window
        | with my code base is great, but given that the price has
        | substantially increased, it makes more sense to manage the
        | context window carefully in the current situation. Me flooding
        | their context window is great value for them, but short of evals
        | that look into how effectively Sonnet stays on track, it's not
        | clear the value actually exists for the user.
        
         | rootnod3 wrote:
         | Flooding the context also means increasing the likelihood of
         | the LLM confusing itself. Mainly because of the longer context.
         | It derails along the way without a reset.
        
           | aliljet wrote:
           | How do you know that?
        
             | EForEndeavour wrote:
             | https://onnyunhui.medium.com/evaluating-long-context-
             | lengths...
        
             | bigmadshoe wrote:
             | https://research.trychroma.com/context-rot
        
               | joenot443 wrote:
                | This is a good piece. Clearly it's a pretty complex
                | problem, and the intuitive result a layman engineer like
                | myself might expect doesn't reflect the reality of LLMs.
                | Regex works as reliably on 20 characters as it does on 2M
                | characters; the only difference is speed. I've learned
                | this will probably _never_ be the case with LLMs; there
                | will forever exist some level of epistemic doubt in their
                | results.
               | 
               | When they announced Big Contexts in 2023, they referenced
               | being able to find a single changed sentence in the
               | context's copy of Great Gatsby[1]. This example seemed
               | _incredible_ to me at the time but now two years later
               | I'm feeling like it was pretty cherry-picked. What does
               | everyone else think? Could you feed a novel into an LLM
               | and expect it to find the single change?
               | 
               | [1] https://news.ycombinator.com/item?id=35941920
        
             | F7F7F7 wrote:
             | What do you think happens when things start falling outside
             | of its context window? It loses access to parts of your
             | conversation.
             | 
             | And that's why it will gladly rebuild the same feature over
             | and over again.
        
             | anonz4FWNqnX wrote:
             | I've had similar experiences. I've gone back and forth
             | between running models locally and using the commercial
             | models. The local models can be incredibly useful (gemma,
             | qwen), but they need more patience and work to get them to
             | work.
             | 
              | One advantage to running locally[1] is that you can set the
              | context length manually and see how well the LLM uses it. I
              | don't have an exact experience to relay, but it's not
              | unusual for models to allow longer contexts but then ignore
              | that context.
             | 
             | Just making the context big doesn't mean the LLM is going
             | to use it well.
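              | 
              | If you want to poke at this yourself: most local servers
              | (LM Studio included) expose an OpenAI-compatible endpoint,
              | so you can load the same model with different context
              | settings and probe recall. A toy sketch (port, model id and
              | the "needle" are just illustrative):
              | 
              |     from openai import OpenAI
              | 
              |     client = OpenAI(base_url="http://localhost:1234/v1",
              |                     api_key="lm-studio")
              | 
              |     filler = "The sky was grey that day. " * 2000
              |     prompt = (filler + "The locker code is 4812. " + filler
              |               + "\n\nWhat is the locker code?")
              | 
              |     resp = client.chat.completions.create(
              |         model="qwen2.5-7b-instruct",  # whatever is loaded
              |         messages=[{"role": "user", "content": prompt}])
              |     print(resp.choices[0].message.content)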
             | 
              | [1] I've been using LM Studio on both a MacBook Air and a
              | MacBook Pro. Even a MacBook Air with 16GB can run pretty
              | decent models.
        
               | nomel wrote:
               | A good example of this was the first Gemini model that
               | allowed 1 million tokens, but would lose track of the
               | conversation after a couple paragraphs.
        
             | rootnod3 wrote:
              | The longer the context and the discussion go on, the more
              | it can get confused, especially if you have to refine the
              | conversation or the code you are building on.
              | 
              | Remember, at its core it's basically a text prediction
              | engine. So the more varying context there is, the more
              | likely it is to make a mess of it.
              | 
              | Short context: the conversation leaves the context window
              | and it loses context. Long context: it can mess with the
              | model. So the trick is to strike a balance. But if it's an
              | online model, you have fuck all to control. If it's a local
              | model, you have some say in the parameters.
        
             | fkyoureadthedoc wrote:
             | https://github.com/adobe-research/NoLiMa
        
             | giancarlostoro wrote:
             | Here's a paper from MIT that covers how this could be
             | resolved in an interesting fashion:
             | 
             | https://hanlab.mit.edu/blog/streamingllm
             | 
             | The AI field is reusing existing CS concepts for AI that we
             | never had hardware for, and now these people are learning
             | how applied Software Engineering can make their theoretical
             | models more efficient. It's kind of funny, I've seen this
             | in tech over and over. People discover new thing, then
             | optimize using known thing.
        
               | mamp wrote:
               | Unfortunately, I think the context rot paper [1] found
               | that the performance degradation when context increased
               | still occurred in models using attention sinks.
               | 
               | 1. https://research.trychroma.com/context-rot
        
               | kridsdale3 wrote:
               | The fact that this is happening is where the tremendous
               | opportunity to make money as an experienced Software
               | Engineer currently lies.
               | 
               | For instance, a year or two ago, the AI people discovered
               | "cache". Imagine how many millions the people who
               | implemented it earned for that one.
        
           | Wowfunhappy wrote:
           | I keep reading this, but with Claude Code in particular, I
           | consistently find it gets smarter the longer my conversations
           | go on, peaking right at the point where it auto-compacts and
           | everything goes to crap.
           | 
           | This isn't always true--some conversations go poorly and it's
           | better to reset and start over--but it usually is.
        
         | benterix wrote:
         | > it's not clear if the value actually exists here.
         | 
          | Having spent a couple of weeks on Claude Code recently, I
          | arrived at the conclusion that the net value for me from
          | agentic AI is actually negative.
         | 
         | I will give it another run in 6-8 months though.
        
           | wahnfrieden wrote:
           | Did you try with using Opus exclusively?
        
             | freedomben wrote:
             | Do you know if there's a way to force Claude code to do
             | that exclusively? I've found a few env vars online but they
             | don't seem to actually work
        
               | wahnfrieden wrote:
               | Peter Steinberger has been documenting his workflows and
               | he relies exclusively on Opus at least until recently.
               | (He also pays for a few Max 20x subscriptions at once to
               | avoid rate limits.)
        
               | atonse wrote:
               | You can type /config and then go to the setting to pick a
               | model.
        
               | gdudeman wrote:
               | Yes: type /model and then pick Opus 4.1.
        
               | artursapek wrote:
               | You can "force" it by just paying them $200 (which is
               | nothing compared to the value)
        
               | parineum wrote:
               | Value is irrelevant. What's the return on investment you
               | get from spending $200?
               | 
               | Collecting value doesn't really get you anywhere if
               | nobody is compensating you for it. Unless someone is
               | going to either pay for it for you or give you $200/mo
               | post-tax dollars, it's costing you money.
        
               | wahnfrieden wrote:
               | The return for me is faster output of features, fixes,
               | and polish for my products which increases revenue above
               | the cost of the tool. Did you need to ask this?
        
               | parineum wrote:
               | Yes, I did. Not everybody has their own product that
               | might benefit from a $200 subscription. Most of us work
               | for someone else and, unless that person is paying for
               | the subscription, the _value_ it adds is irrelevant
               | unless it results in better compensation.
               | 
               | Furthermore, the advice was given to upgrade to a $200
               | subscription from the $20 subscription. The difference in
               | value that might translate into income between the $20
               | option and the $200 option is very unclear.
        
               | wahnfrieden wrote:
               | If you are employed you should petition your employer for
               | tools you want. Maybe you can use it to take the day off
               | earlier or spend more time socializing. Or to get a
               | promotion or performance bonus. Hopefully not just to
               | meet rising productivity expectations without being
               | handed the tools needed to achieve that. Having full-time
               | access to these tools can also improve your own skills in
               | using them, to profit from in a later career move or from
               | contributing toward your own ends.
        
               | parineum wrote:
               | I'm not disputing that. I'm just pushing back against the
               | casual suggestion (not by you) to just go spend $200.
               | 
               | No doubt that you should ask you employer for the tools
               | you want/need to do your job but plenty of us are using
               | this kind of thing casually and the response to "Any way
               | I can force it to use [Opus] exclusively?" is "Spend
               | $200, it's worth it." isn't really helpful, especially in
               | the context where the poster was clearly looking to try
               | it out to see if it was worth it.
        
               | epiccoleman wrote:
               | is Opus that much better than Sonnet? My sub is $20 a
               | month, so I guess I'd have to buy that I'm going to get a
               | 10x boost, which seems dubious
        
               | theshrike79 wrote:
               | With the $20 plan you get Opus on the web and in the
               | native app. Just not in Claude Code.
               | 
               | IMO it's pretty good for design, but with code it gets in
               | its head a bit too much and overthinks and
               | overcomplicates solutions.
        
               | artursapek wrote:
               | Yes, Opus is much better at complicated architecture
        
           | mark_l_watson wrote:
           | I am sort of with you. I am down to asking Gemini Pro a
           | couple of questions a day, use ChatGPT just a few times a
           | week, and about once a week use gemini-cli (either a short
           | free session, or a longer session where I provide my API
           | key.)
           | 
           | That said I spend (waste?) an absurdly large amount of time
           | each week experimenting with local models (sometimes
           | practical applications, sometimes 'research').
        
           | mikepurvis wrote:
            | For a bit more nuance, I think I would say my overall net is
            | about break-even. But I don't take that as "it's not worth it
            | at all, abandon ship" but rather that I need to hone my
            | instinct for what is and is not a good task for AI
            | involvement, and what that involvement should look like.
           | 
           | Throwing together a GHA workflow? Sure, make a ticket, assign
           | it to copilot, check in later to give a little feedback and
           | we're golden. Half a day of labour turned into fifteen
           | minutes.
           | 
           | But there are a lot of tasks that are far too nuanced where
           | trying to take that approach just results in frustration and
           | wasted time. There it's better to rely on editor completion
           | or maybe the chat interface, like "hey I want to do X and Y,
           | what approach makes sense for this?" and treat it like a
           | rubber duck session with a junior colleague.
        
           | cambaceres wrote:
           | For me it's meant a huge increase in productivity, at least
           | 3X.
           | 
            | Since so many claim the opposite, I'm curious as to what you
            | do, more specifically? I guess different roles/technologies
            | benefit more from agents than others.
            | 
            | I build full-stack web applications in Node/.NET/React; more
            | importantly (I think), I work at a small startup and manage 3
            | applications myself.
        
             | datadrivenangel wrote:
             | How do you structure your applications for maintainability?
        
             | dingnuts wrote:
             | You have small applications following extremely common
             | patterns and using common libraries. Models are good at
             | regurgitating patterns they've seen many times, with fuzzy
             | find/replace translations applied.
             | 
             | Try to build something like Kubernetes from the ground up
             | and let us know how it goes. Or try writing a custom
             | firmware for a device you just designed. Something like
             | that.
        
             | elevatortrim wrote:
             | I think there are two broad cases where ai coding is
             | beneficial:
             | 
              | 1. You are a good coder but are working on a project that
              | is new to you, building a new project, or working with a
              | technology you are not familiar with. This is where AI is
              | hugely beneficial. It not only accelerates you, it lets you
              | do things you otherwise could not.
             | 
             | 2. You have spent a lot of time on engineering your context
             | and learning what AI is good at, and using it very
             | strategically where you know it will save time and not
             | bother otherwise.
             | 
             | If you are a really good coder, really familiar with the
             | project, and mostly changing its bits and pieces rather
             | than building new functionality, AI won't accelerate you
             | much. Especially if you did not invest the time to make it
             | work well.
        
             | nicce wrote:
             | > I build full stack web applications in node/.net/react,
             | more importantly (I think) is that I work on a small
             | startup and manage 3 applications myself.
             | 
              | I think this is your answer. For example, React and
              | JavaScript are extremely popular and mature. Are you using
              | TypeScript and trying to get the most out of the types, or
              | are you accepting everything the LLM gives you as
              | JavaScript? How much do you care whether the code uses
              | "soon to be deprecated" functions or the most optimized
              | loop/implementation? How about the project structure?
              | 
              | In other cases, the more precision you need, the less
              | effective the LLM is.
        
             | rs186 wrote:
              | 3X if not 10X if you are starting a new project with
              | Next.js, React, and Tailwind CSS for full-stack website
              | development, solving an everyday problem. Yeah, I just
              | witnessed that yesterday when creating a toy project.
             | 
             | For my company's codebase, where we use internal tools and
             | proprietary technology, solving a problem that does not
             | exist outside the specific domain, on a codebase of over
             | 1000 files? No way. Even locating the correct file to edit
             | is non trivial for a new (human) developer.
        
               | GenerocUsername wrote:
                | Your first week of AI usage should be crawling your
                | codebase and generating context.md docs that can then be
                | fed back into future prompts so that the AI understands
                | your project space, packages, APIs, and code philosophy.
                | 
                | I guarantee your internal tools are not revolutionary;
                | they are just unrepresented in the ML model out of the
                | box.
        
               | nicce wrote:
                | Even then, are you even allowed to use AI on such a
                | codebase? Is some part of the code "bought", e.g.
                | generated by a commercial compiler with a specific
                | license? Is a pinky promise from the LLM provider enough?
        
               | orra wrote:
               | That sounds incredibly boring.
               | 
               | Is it effective? If so I'm sure we'll see models to
               | generate those context.md files.
        
               | cpursley wrote:
               | Yes. And way less boring than manually reading a section
               | of a codebase to understand what is going on after being
               | away from it for 8 months. Claude's docs and git commit
               | writing skills are worth it for that alone.
        
               | blitztime wrote:
               | How do you keep the context.md updated as the code
               | changes?
        
               | shmoogy wrote:
               | I tell Claude to update it generally but you can probably
               | use a hook
        
               | tombot wrote:
                | This. While it has context of the current problem, just
                | ask Claude to re-read its own documentation and think of
                | things to add that will help it in the future.
        
               | MattGaiser wrote:
               | Yeah, anecdotally it is heavily dependent on:
               | 
               | 1. Using a common tech. It is not as good at Vue as it is
               | at React.
               | 
               | 2. Using it in a standard way. To get AI to really work
               | well, I have had to change my typical naming conventions
               | (or specify them in detail in the instructions).
        
               | nicce wrote:
                | React also often seems to be treated as an alias for
                | Next.js. Models have a hard time telling them apart.
        
               | mike_hearn wrote:
               | My codebase has about 1500 files and is highly domain
               | specific: it's a tool for shipping desktop apps[1] that
               | handles all the building, packaging, signing, uploading
               | etc for every platform on every OS simultaneously. It's
               | written mostly in Kotlin, and to some extent uses a
               | custom in-house build system. The rest of the build is
               | Gradle, which is a notoriously confusing tool. The source
               | tree also contains servers, command line tools and a
               | custom scripting language which is used for all the
               | scripting needs of the project [2].
               | 
               | The code itself is quite complex and there's lots of
               | unusual code for munging undocumented formats, speaking
               | undocumented protocols, doing cryptography, Mac/Windows
               | specific APIs, and it's all built on a foundation of a
               | custom parallel incremental build system.
               | 
               | In other words: nightmare codebase for an LLM. Nothing
               | like other codebases. Yet, Claude Code demolishes
               | problems in it without a sweat.
               | 
               | I don't know why people have different experiences but
               | speculating a bit:
               | 
               | 1. I wrote most of it myself and this codebase is
               | unusually well documented and structured compared to
               | most. All the internal APIs have full JavaDocs/KDocs,
               | there are extensive design notes in Markdown in the
               | source tree, the user guide is also part of the source
               | tree. Files, classes and modules are logically named.
               | Files are relatively small. All this means Claude can
               | often find the right parts of the source within just a
               | few tool uses.
               | 
               | 2. I invested in making a good CLAUDE.md and also wrote a
               | script to generate "map.md" files that are at the top of
               | every module. These map files contain one-liners of what
               | every source file contains. I used Gemini to make these
               | due to its cheap 1M context window. If Claude _does_
               | struggle to find the right code by just reading the
               | context files or guessing, it can consult the maps to
               | locate the right place quickly.
               | 
               | 3. I've developed a good intuition for what it can and
               | cannot do well.
               | 
               | 4. I don't ask it to do big refactorings that would
               | stress the context window. IntelliJ is for refactorings.
               | AI is for writing code.
               | 
               | [1] https://hydraulic.dev
               | 
               | [2] https://hshell.hydraulic.dev/
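                | 
                | For the curious, a stripped-down sketch of the map.md
                | idea (not my actual script, which leans on Gemini for
                | the one-liners; this toy just takes each file's first
                | non-trivial line):
                | 
                |     from pathlib import Path
                | 
                |     def summary(path: Path) -> str:
                |         for line in path.read_text(
                |                 errors="ignore").splitlines():
                |             line = line.strip().strip("/* !")
                |             if line and not line.startswith(
                |                     ("package ", "import ")):
                |                 return line
                |         return "(no summary)"
                | 
                |     def write_map(module: str) -> None:
                |         mod = Path(module)
                |         rows = [f"- {p.relative_to(mod)}: {summary(p)}"
                |                 for p in sorted(mod.rglob("*.kt"))]
                |         (mod / "map.md").write_text(
                |             "# Module map\n\n" + "\n".join(rows) + "\n")
                | 
                |     write_map("core")  # example module name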
        
               | tptacek wrote:
               | That's an interesting comment, because "locating the
               | correct file to edit" was the very first thing LLMs did
               | that was valuable to me as a developer.
        
             | evantbyrne wrote:
             | The problem with these discussions is that almost nobody
             | outside of the agency/contracting world seems to track
             | their time. Self-reported data is already sketchy enough
             | without layering on the issue of relying on distant memory
             | of fine details.
        
             | andrepd wrote:
             | Self-reports are notoriously overexcited, real results are,
             | let's say, not so stellar.
             | 
             | https://metr.org/blog/2025-07-10-early-2025-ai-
             | experienced-o...
        
               | logicprog wrote:
               | Here's an in depth analysis and critique of that study by
               | someone whose job is literally to study programmers
               | psychologically and has experience in sociology studies:
               | https://www.fightforthehuman.com/are-developers-slowed-
               | down-...
               | 
               | Basically, the study has a fuckton of methodological
               | problems that seriously undercut the quality of its
               | findings, and even assuming its findings are correct, if
               | you look closer at the data, it doesn't show what it
               | claims to show regarding developer estimations, and the
               | story of whether it speeds up or slows down developers is
               | actually much more nuanced and precisely mirrors what the
               | developers themselves say in the qualitative quote
                | questionnaire, and relatively closely mirrors what the
               | more nuanced people will say here -- that it helps with
               | things you're less familiar with, that have scope creep,
               | etc a lot more, but is less or even negatively useful for
               | the opposite scenarios -- even in the worst case setting.
               | 
               | Not to mention this is studying a highly specific and
               | rare subset of developers, and they even admit it's a
               | subset that isn't applicable to the whole.
        
             | acedTrex wrote:
             | I have yet to get it to generate code past 10ish lines that
             | I am willing to accept. I read stuff like this and wonder
             | how low yall's standards are, or if you are working on
             | projects that just do not matter in any real world sense.
        
               | spicyusername wrote:
                | 4/5 times I can easily get 100s of lines of output that
                | only need a quick once-over.
               | 
               | 1/5 times, I spend an extra hour tangled in code it
               | outputs that I eventually just rewrite from scratch.
               | 
               | Definitely a massive net positive, but that 20% is
               | extremely frustrating.
        
               | acedTrex wrote:
                | That is fascinating to me; I've never seen it generate
                | that much code that I would actually consider correct.
                | It's always wrong in some way.
        
               | LinXitoW wrote:
               | In my experience, if I have to issue more than 2
               | corrections, I'm better off restarting and beefing up the
               | prompt or just doing it myself
        
               | dillydogg wrote:
               | Whenever I read comments from the people singing their
               | praises of the technology, it's hard not to think of the
               | study that found AI tools made developers slower in early
               | 2025.
               | 
               | >When developers are allowed to use AI tools, they take
               | 19% longer to complete issues--a significant slowdown
               | that goes against developer beliefs and expert forecasts.
               | This gap between perception and reality is striking:
               | developers expected AI to speed them up by 24%, and even
               | after experiencing the slowdown, they still believed AI
               | had sped them up by 20%.
               | 
               | https://metr.org/blog/2025-07-10-early-2025-ai-
               | experienced-o...
        
               | mstkllah wrote:
               | Ah, the very extensive study with 16 developers.
               | Bulletproof results.
        
               | izacus wrote:
               | Yeah, we should listen to the one "trust me bro" dude
               | instead.
        
               | troupo wrote:
               | Compared to "it's just a skill issue you're not prompting
               | it correctly" crowd with literally zero actionable data?
        
               | logicprog wrote:
               | Here's an in depth analysis and critique of that study by
               | someone whose job is literally to study programmers
               | psychologically and has experience in sociology studies:
               | https://www.fightforthehuman.com/are-developers-slowed-
               | down-...
               | 
               | Basically, the study has a fuckton of methodological
               | problems that seriously undercut the quality of its
               | findings, and even assuming its findings are correct, if
               | you look closer at the data, it doesn't show what it
               | claims to show regarding developer estimations, and the
               | story of whether it speeds up or slows down developers is
               | actually much more nuanced and precisely mirrors what the
               | developers themselves say in the qualitative quote
                | questionnaire, and relatively closely mirrors what the
               | more nuanced people will say here -- that it helps with
               | things you're less familiar with, that have scope creep,
               | etc a lot more, but is less or even negatively useful for
               | the opposite scenarios -- even in the worst case setting.
               | 
               | Not to mention this is studying a highly specific and
               | rare subset of developers, and they even admit it's a
               | subset that isn't applicable to the whole.
        
               | dillydogg wrote:
               | This is very helpful, thank you for the resource
        
               | djeastm wrote:
                | Standards are going to be as low as the market allows, I
                | think. In some industries code quality is paramount; in
                | others it's negligible, and perhaps speed of development
                | is a higher priority and the code is mostly disposable.
        
             | wiremine wrote:
             | > Having spent a couple of weeks on Claude Code recently, I
             | arrived to the conclusion that the net value for me from
             | agentic AI is actually negative.
             | 
             | > For me it's meant a huge increase in productivity, at
             | least 3X.
             | 
             | How do we reconcile these two comments? I think that's a
             | core question of the industry right now.
             | 
             | My take, as a CTO, is this: we're giving people new tools,
             | and very little training on the techniques that make those
             | tools effective.
             | 
             | It's sort of like we're dropping trucks and airplanes on a
             | generation that only knows walking and bicycles.
             | 
             | If you've never driven a truck before, you're going to
             | crash a few times. Then it's easy to say "See, I told you,
             | this new fangled truck is rubbish."
             | 
             | Those who practice with the truck are going to get the hang
             | of it, and figure out two things:
             | 
             | 1. How to drive the truck effectively, and
             | 
              | 2. When NOT to use the truck... when walking or the bike is
              | actually the better way to go.
             | 
             | We need to shift the conversation to techniques, and away
             | from the tools. Until we do that, we're going to be forever
             | comparing apples to oranges and talking around each other.
        
               | jdgoesmarching wrote:
               | Agreed, and it drives me bonkers when people talk about
                | AI coding as if it represents a single technique,
               | process, or tool.
               | 
               | Makes me wonder if people spoke this way about "using
               | computers" or "using the internet" in the olden days.
               | 
               | We don't even fully agree on the best practices for
               | writing code _without_ AI.
        
               | moregrist wrote:
               | > Makes me wonder if people spoke this way about "using
               | computers" or "using the internet" in the olden days.
               | 
               | There were gobs of terrible road metaphors that spun out
               | of calling the Internet the "Information Superhighway."
               | 
               | Gobs and gobs of them. All self-parody to anyone who knew
               | anything.
               | 
               | I hesitate to relate this to anything in the current AI
               | era, but maybe the closest (and in a gallows humor/doomer
               | kind of way) is the amount of exec speak on how many jobs
               | will be replaced.
        
               | porksoda wrote:
               | Remember the ones who loudly proclaimed the internet to
               | be a passing fad, not useful for normal people. All anti
               | LLM rants taste like that to me.
               | 
               | I get why they thought that - it was kind of crappy
               | unless you're one who is excited about the future and
               | prepared to bleed a bit on the edge.
        
               | mh- wrote:
               | _> Makes me wonder if people spoke this way about "using
               | computers" or "using the internet" in the olden days._
               | 
               | Older person here: they absolutely did, all over the
               | place in the early 90s. I remember people decrying
               | projects that moved them to computers everywhere I went.
               | Doctors offices, auto mechanics, etc.
               | 
               | Then later, people did the same thing about _the
               | Internet_ (was written with a single word capital I by
               | 2000, having been previously written as two separate
               | words.)
               | 
               | https://i.imgur.com/vApWP6l.png
        
               | jacquesm wrote:
               | And not all of those people were wrong.
        
               | jeremy_k wrote:
               | Well put. It really does come down to nuance. I find
               | Claude is amazing at writing React / Typescript. I mostly
                | let it do its own thing and skim the results after. I
               | have it write Storybook components so I can visually
               | confirm things look how I want. If something isn't quite
               | right I'll take a look and if I can spot the problem and
               | fix it myself, I'll do that. If I can't quickly spot it,
               | I'll write up a prompt describing what is going on and
               | work through it with AI assistance.
               | 
               | Overall, React / Typescript I heavily let Claude write
               | the code.
               | 
               | The flip side of this is my server code is Ruby on Rails.
               | Claude helps me a lot less here because this is my
               | primary coding background. I also have a certain way I
               | like to write Ruby. In these scenarios I'm usually asking
               | Claude to generate tests for code I've already written
               | and supplying lots of examples in context so the coding
               | style matches. If I ask Claude to write something novel
               | in Ruby I tend to use it as more of a jumping off point.
               | It generates, I read, I refactor to my liking. Claude is
               | still very helpful, but I tend to do more of the code
               | writing for Ruby.
               | 
               | Overall, helpful for Ruby, I still write most of the
               | code.
               | 
               | These are the nuances I've come to find and what works
               | best for my coding patterns. But to your point, if you
                | tell someone "go use Claude" and they have a
               | preference in how to write Ruby and they see Claude
               | generate a bunch of Ruby they don't like, they'll likely
               | dismiss it as "This isn't useful. It took me longer to
               | rewrite everything than just doing it myself". Which all
                | goes to say, time using the tools, whether it's Cursor,
                | Claude Code, etc. (I use OpenCode), is the biggest key, but
               | figuring out how to get over the initial hump is probably
               | the biggest hurdle.
        
               | croes wrote:
               | Do you only skim the results or do you audit them at some
               | point to prevent security issues?
        
               | jeremy_k wrote:
               | What kind of security issues are you thinking about? I'm
               | generating UI components like Selects for certain data
               | types or Charts of data.
        
               | dghlsakjg wrote:
               | User input is a notoriously thorny area.
               | 
               | If you aren't sanitizing and checking the inputs
               | appropriately somewhere between the user and trusted
               | code, you WILL get pwned.
               | 
               | Rails provides default ways to avoid this, but it makes
               | it very easy to do whatever you want with user input.
               | Rails will not necessarily throw a warning if your AI
               | decides that it wants to directly interpolate user input
               | into a sql query.
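                |
                | To make the distinction concrete, a sketch in TypeScript
                | with the pg client rather than Rails (names here are
                | purely illustrative):
                |
                |     import { Pool } from "pg";
                |
                |     const pool = new Pool();
                |
                |     // Unsafe: user input is interpolated straight into
                |     // the SQL string.
                |     async function findUserUnsafe(email: string) {
                |       return pool.query(
                |         `SELECT * FROM users WHERE email = '${email}'`
                |       );
                |     }
                |
                |     // Safe: the driver binds the value as a parameter.
                |     async function findUser(email: string) {
                |       return pool.query(
                |         "SELECT * FROM users WHERE email = $1",
                |         [email]
                |       );
                |     }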
        
               | jeremy_k wrote:
               | Well in this case, I am reading through everything that
               | is generated for Rails because I want things to be done
               | my way. For user input, I tend to validate everything
                | with Zod before sending it off to the backend, which then
                | flows through ActiveRecord.
               | 
               | I get what you're saying that AI could write something
               | that executes user input but with the way I'm using the
               | tools that shouldn't happen.
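                |
                | A rough sketch of what that looks like (the schema here
                | is illustrative, not my real one):
                |
                |     import { z } from "zod";
                |
                |     // Shape enforced on the client before anything is
                |     // sent to the Rails backend.
                |     const todoSchema = z.object({
                |       title: z.string().min(1).max(200),
                |       completed: z.boolean().default(false),
                |     });
                |
                |     function submitTodo(input: unknown) {
                |       const parsed = todoSchema.safeParse(input);
                |       if (!parsed.success) {
                |         // Reject bad input here instead of letting it
                |         // reach ActiveRecord.
                |         throw new Error(parsed.error.message);
                |       }
                |       return fetch("/todos", {
                |         method: "POST",
                |         headers: { "Content-Type": "application/json" },
                |         body: JSON.stringify(parsed.data),
                |       });
                |     }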
        
               | k9294 wrote:
               | For this very reason I switched for TS for backend as
               | well. I'm not a big fun of JS but the productivity gain
               | of having shared types between frontend and backend and
               | the Claude code proficiency with TS is immense.
        
               | jeremy_k wrote:
               | I considered this, but I'm just too comfortable writing
               | my server logic in Ruby on Rails (as I do that for my day
               | job and side project). I'm super comfortable writing
               | client side React / Typescript but whenever I look at
               | server side Typescript code I'm like "I should understand
               | what this is doing but I don't" haha.
        
               | jorvi wrote:
               | It is not really a nuanced take when it compares
               | 'unassisted' coding to using a bicycle and AI-assisted
                | coding to a truck.
               | 
               | I put myself somewhere in the middle in terms of how
               | great I think LLMs are for coding, but anyone that has
               | worked with a colleague that loves LLM coding knows how
               | horrid it is that the team has to comb through and
               | doublecheck their commits.
               | 
               | In that sense it would be equally nuanced to call AI-
               | assisted development something like "pipe bomb coding".
               | You toss out your code into the branch, and your non-AI'd
               | colleagues have to quickly check if your code is a
               | harmless tube of code or yet another contraption that
               | quickly needs defusing before it blows up in everyone's
               | face.
               | 
               | Of course that is not nuanced either, but you get the
               | point :)
        
               | LinXitoW wrote:
                | How nuanced the comparison seems also depends on whether
                | you live in Arkansas or in Amsterdam.
               | 
               | But I disagree that your counterexample has anything at
               | all to do with AI coding. That very same developer was
               | perfectly capable of committing untested crap without AI.
               | Perfectly capable of copy pasting the first answer they
               | found on Stack Overflow. Perfectly capable of recreating
                | utility functions over and over because they were too lazy
               | to check if they already exist.
        
               | nabla9 wrote:
               | I agree.
               | 
               | I experience a productivity boost, and I believe it's
               | because I prevent LLMs from making design choices or
               | handling creative tasks. They're best used as a "code
                | monkey", filling in function bodies once I've defined them.
               | I design the data structures, functions, and classes
               | myself. LLMs also help with learning new libraries by
               | providing examples, and they can even write unit tests
               | that I manually check. Importantly, no code I haven't
               | read and accepted ever gets committed.
               | 
               | Then I see people doing things like "write an app for
               | ....", run, hey it works! WTF?
        
               | quikoa wrote:
               | It's not just about the programmer and his experience
               | with AI tools. The problem domain and programming
               | language(s) used for a particular project may have a
               | large impact on how effective the AI can be.
        
               | wiremine wrote:
               | > The problem domain and programming language(s) used for
               | a particular project may have a large impact on how
               | effective the AI can be.
               | 
               | 100%. Again, if we only focus on things like context
               | windows, we're missing the important details.
        
               | vitaflo wrote:
               | But even on the same project with the same tools the
               | general way a dev derives satisfaction from their work
               | can play a big role. Some devs derive satisfaction from
               | getting work done and care less about the code as long as
               | it works. Others derive satisfaction from writing well
               | architected and maintainable code. One can guess the
               | reactions to how LLM's fit into their day to day lives
               | for each.
        
               | weego wrote:
               | In a similar role and place with this.
               | 
               | My biggest take so far: If you're a disciplined coder
               | that can handle 20% of an entire project's (project being
               | a bug through to an entire app) time being used on
               | research, planning and breaking those plans into phases
               | and tasks, then augmenting your workflow with AI appears
                | to have large gains in productivity.
               | 
               | Even then you need to learn a new version of explaining
               | it 'out loud' to get proper results.
               | 
               | If you're more inclined to dive in and plan as you go,
               | and store the scope of the plan in your head because
               | "it's easier that way" then AI 'help' will just
               | fundamentally end up in a mess of frustration.
        
               | cmdli wrote:
               | My experience has been entirely the opposite as an IC. If
               | I spend the time to delve into the code base to the point
               | that I understand how it works, AI just serves as a mild
               | improvement in writing code as opposed to implementing it
               | normally, saving me maybe 5 minutes on a 2 hour task.
               | 
               | On the other hand, I've found success when I have no idea
               | how to do something and tell the AI to do it. In that
               | case, the AI usually does the wrong thing but it can
               | oftentimes reveal to me the methods used in the rest of
               | the codebase.
        
               | zarzavat wrote:
               | Both modes of operation are useful.
               | 
               | If you know how to do something, then you can give Claude
               | the broad strokes of how you want it done and -- if you
               | give enough detail -- hopefully it will come back with
               | work similar to what you would have written. In this case
               | it's saving you on the order of minutes, but those
               | minutes add up. There is a possibility for negative time
               | saving if it returns garbage.
               | 
               | If you _don 't_ know how to do something then you can see
               | if an AI has any ideas. This is where the big
               | productivity gains are, hours or even days can become
               | minutes if you are sufficiently clueless about something.
        
               | jacobr1 wrote:
                | And importantly, the cycle time on this stuff can be much
               | faster. Trying out different variants, and iterating
               | through larger changes can be huge.
        
               | hirako2000 wrote:
               | The issue is that you would be not just clueless but
                | have grown naive about the correctness of what it did.
               | 
               | Knowing what to do at least you can review. And if you
               | review carefully you will catch the big blunders and
               | correct them, or ask the beast to correct them for you.
               | 
               | > Claude, please generate a safe random number. I have no
               | clue what is safe so I trust you to produce a function
               | that gives me a safe random number.
               | 
               | Not every use case is sensitive, but even building pieces
                | for entertainment, if it wipes things it shouldn't delete
                | or drains the battery doing very inefficient operations
                | here and there, it's junk: undesirable software.
        
               | bcrosby95 wrote:
               | Claude will point you in the right neighborhood but to
               | the wrong house. So if you're completely ignorant that's
               | cool. But recognize that its probably wrong and only a
               | starting point.
               | 
               | Hell, I spent 3 hours "arguing" with Claude the other day
               | in a new domain because my intuition told me something
                | was true. I brought out all the technical reasons why it
               | was fine but Claude kept skirting around it saying the
               | code change was wrong.
               | 
               | After spending extra time researching it I found out
               | there was a technical term for it and when I brought that
               | up Claude finally admitted defeat. It was being a
               | persistent little fucker before then.
               | 
               | My current hobby is writing concurrent/parallel systems.
               | Oh god AI agents are terrible. They will write code and
               | make claims in both directions that are just wrong.
        
               | teaearlgraycold wrote:
               | LLMs are great at semantic searching through packages
               | when I need to know exactly how something is implemented.
               | If that's a major part of your job then you're saving a
               | ton of time with what's available today.
        
               | t0mas88 wrote:
               | For me it has a big positive impact on two sides of the
               | spectrum and not so much in the middle.
               | 
               | One end is larger complex new features where I spend a
               | few days thinking about how to approach it. Usually most
               | thought goes into how to do something complex with good
               | performance that spans a few apps/services. I write a
               | half page high level plan description, a set of bullets
               | for gotchas and how to deal with them and list normal
               | requirements. Then let Claude Code run with that. If the
               | input is good you'll get a 90% version and then you can
               | refactor some things or give it feedback on how to do
               | some things more cleanly.
               | 
               | The other end of the spectrum is "build this simple
               | screen using this API, like these 5 other examples". It
               | does those well because it's almost advanced autocomplete
               | mimicking your other code.
               | 
               | Where it doesn't do well for me is in the middle between
               | those two. Some complexity, not a big plan and not simple
               | enough to just repeat something existing. For those
               | things it makes a mess or you end up writing a lot of
                | instructions/prompts and could have just done it yourself.
        
               | ath3nd wrote:
               | > How do we reconcile these two comments? I think that's
               | a core question of the industry right now.
               | 
               | The current freshest study focusing on experienced
                | developers showed a net negative in productivity when
               | using an LLM solution in their flow:
               | 
               | https://metr.org/blog/2025-07-10-early-2025-ai-
               | experienced-o...
               | 
               | My conclusion on this, as an ex VP of Engineering, is
                | that good senior developers find little utility in LLMs
                | and may even find them to be a nuisance/detriment, while
                | for juniors they can be a godsend, as they help them with
               | syntax and coax the solution out of them.
               | 
                | It's like training wheels on a bike. A toddler might find
               | 3x utility, while a person who actually can ride a bike
               | well will find themselves restricted by training wheels.
        
               | pesfandiar wrote:
               | Your analogy would be much better with giving workers a
               | work horse with a mind of its own. Trucks come with clear
               | instructions and predictable behaviour.
        
               | chasd00 wrote:
               | > Your analogy would be much better with giving workers a
               | work horse with a mind of its own.
               | 
               | i think this is a very insightful comment with respect to
               | working with LLMs. If you've ever ridden a horse you
               | don't really tell it to walk, run, turn left, turn right,
               | etc you have to convince it to do those things and not be
               | too aggravating while you're at it. With a truck simple
                | cause and effect applies but with a horse it's a
               | negotiation. I feel like working with LLMs is like a
               | negotiation, you have to coax out of it what you're
               | after.
        
               | pletnes wrote:
               | Being a consultant / programmer with feet on the ground,
               | eh, hands on the keyboard: some orgs let us use some AI
               | tools, others do not. Some projects are predominantly new
               | code based on recent tech (React); others include
               | maintaining legacy stuff on windows server and
               | proprietary frameworks. AI is great on some tasks, but
               | unavailable or ignorant about others. Some projects have
               | sharp requirements (or at least, have requirements)
               | whereas some require 39 out of 40 hours a week guessing
               | at what the other meat-based intelligences actually want
               | from us.
               | 
                | What "programming" actually entails differs
               | enormously; so does AI's relevance.
        
               | abc_lisper wrote:
                | I doubt there is much art to getting an LLM to work for you,
               | despite all the hoopla. Any competent engineer can figure
               | that much out.
               | 
               | The real dichotomy is this. If you are aware of the
               | tools/APIs and the Domain, you are better off writing the
                | code on your own, except maybe for shallow changes like
                | refactorings. OTOH, if you are not familiar with the
                | domain/tools, using an LLM gives you a huge leg up by
                | preventing you from getting stuck and providing initial
               | momentum.
        
               | jama211 wrote:
               | I dunno, first time I tried an LLM I was getting so
               | annoyed because I just wanted it to go through a css file
               | and replace all colours with variables defined in root,
               | and it kept missing stuff and spinning and I was getting
               | so frustrated. Then a friend told me I should instead
               | just ask it to write a script which accomplishes that
               | goal, and it did it perfectly in one prompt, then ran it
               | for me, and also wrote another script to check it hadn't
               | missed any and ran that.
               | 
                | At no point when it was getting stuck initially did it
               | suggest another approach, or complain that it was outside
               | its context window even though it was.
               | 
               | This is a perfect example of "knowing how to use an LLM"
               | taking it from useless to useful.
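                |
                | Roughly the kind of one-shot script I mean (a sketch;
                | the file name and variable naming are just placeholders):
                |
                |     import { readFileSync, writeFileSync } from "node:fs";
                |
                |     const path = "styles.css";
                |     const css = readFileSync(path, "utf8");
                |
                |     // Collect unique hex colours in order of appearance.
                |     const matches = css.match(/#[0-9a-fA-F]{3,8}\b/g) ?? [];
                |     const hexes = [...new Set(matches)];
                |
                |     // Assign each colour a variable name.
                |     const vars = new Map(
                |       hexes.map((hex, i) => [hex, `--color-${i + 1}`] as const)
                |     );
                |
                |     // Swap every occurrence for var(--color-N).
                |     const out = css.replace(
                |       /#[0-9a-fA-F]{3,8}\b/g,
                |       (hex) => `var(${vars.get(hex)})`
                |     );
                |
                |     // Prepend a :root block defining the variables.
                |     const root = [
                |       ":root {",
                |       ...[...vars].map(([hex, name]) => `  ${name}: ${hex};`),
                |       "}",
                |       "",
                |     ].join("\n");
                |
                |     writeFileSync(path, root + "\n" + out);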
        
               | abc_lisper wrote:
                | Which one did you use and when was this? I mean, nobody
                | gets anything working right the first time. You've got to
                | spend a few days at least trying to understand the tool
        
               | badlucklottery wrote:
               | This is my experience as well.
               | 
                | LLMs currently produce pretty mediocre code. A lot of that
               | is a "garbage in, garbage out" issue but it's just the
               | current state of things.
               | 
               | If the alternative is noob code or just not doing a task
               | at all, then mediocre is great.
               | 
               | But 90% of the time I'm working in a familiar
               | language/domain so I can grind out better code relatively
               | quickly and do so in a way that's cohesive with nearby
               | code in the codebase. The main use-case I have for AI in
               | that case is writing the trivial unit tests for me.
               | 
               | So it's another "No Silver Bullet" technology where the
               | problem it's fixing isn't the essential problem software
               | engineers are facing.
        
               | brulard wrote:
               | I believe there IS much art in LLMs and Agents
               | especially. Maybe you can get like 20% boost quite
               | quickly, but there is so much room to grow it to maybe
               | 500% long term.
        
               | worldsayshi wrote:
               | I think it's very much down to which kind of problem
               | you're trying to solve.
               | 
               | If a solution can subtly fail and it is critical that it
               | doesn't, LLM is net negative.
               | 
               | If a solution is easy to verify or if it is enough that
               | it walks like a duck and quacks like one, LLM can be very
               | useful.
               | 
               | I've had examples of both lately. I'm very much both
               | bullish and bearish atm.
        
               | oceanplexian wrote:
               | It's pretty simple, AI is now political for a lot of
               | people. Some folks have a vested interest in downplaying
               | it or over hyping it rather than impartially approaching
               | it as a tool.
        
               | chasd00 wrote:
               | One thing to think about is many software devs have a
               | very hard time with code they didn't write. I've seen
               | many devs do a lot of work to change code to something
               | equivalent (even with respect to performance and
               | readability) only because it's not the way they would
               | have done it. I could see people having a hard time using
               | what the LLM produced without having to "fix it up" and
               | basically re-write everything.
        
               | jama211 wrote:
               | Yeah sometimes I feel like a unicorn because I don't
               | really care about code at all, so long as it conforms to
               | decent standards and does what it needs to do. I honestly
               | believe engineers often overestimate the importance of
                | elegance in code too, to the point of not realising that
                | the slowdown of a project caused by overly perfect code is
                | genuinely not worth it.
        
               | parpfish wrote:
               | i dont care if the code is elegant, i care that the code
               | is _consistent_.
               | 
               | do the same thing in the same way each time and it lets
               | you chunk it up and skim it much easier. if there are
               | little differences each time, you have to keep asking
               | yourself "is it done differently here for a particular
               | reason?"
        
               | vanviegen wrote:
               | Exactly! And besides that, new code being consistent with
               | its surrounding code used to be a sign of careful
               | craftsmanship (as opposed to spaghetti-against-the-wall
               | style coding), giving me some confidence that the
               | programmer may have considered at least the most
               | important nasty edge cases. LLMs have rendered that
               | signal mostly useless, of course.
        
               | dennisy wrote:
               | Also another view is that developers below a certain
               | level get a positive benefit and those above get a
               | negative effect.
               | 
               | This makes sense, as the models are an average of the
               | code out there and some of us are above and below that
               | average.
               | 
               | Sorry btw I do not want to offend anyone who feels they
               | do garner a benefit from LLMs, just wanted to drop in
               | this idea!
        
               | ath3nd wrote:
               | That's my anecdotal experience as well! Junior devs
               | struggle with a lot of things:
               | 
               | - syntax
               | 
               | - iteration over an idea
               | 
               | - breaking down the task and verifying each step
               | 
               | Working with a tool like Claude that gets them started
                | quickly and iterates on the solution together with them
                | helps them tremendously and educates them on best
                | practices in the field.
               | 
               | Contrast that with a seasoned developer with a domain
               | experience, good command of the programming language and
               | knowledge of the best practices and a clear vision of how
               | the things can be implemented. They hardly need any help
               | on those steps where the junior struggled and where the
               | LLMs shine, maybe some quick check on the API, but that's
               | mostly it. That's consistent with the finding of the
               | study https://metr.org/blog/2025-07-10-early-2025-ai-
               | experienced-o... that experienced developers' performance
               | suffered when using an LLM.
               | 
               | What I used as a metaphor before to describe this
                | phenomenon is _training wheels_: kids learning how to
               | ride a bike can get the basics with the help and safety
               | of the wheels, but adults that already can ride a bike
                | don't have any use for the training wheels, and can often
                | find themselves restricted by them.
        
               | parpfish wrote:
               | i don't know if anybody else has experienced this, but
               | one of my biggest time-sucks with cursor is that it
               | doesn't have a way for me to steer it mid-process that
               | i'm aware of.
               | 
               | it'll build something that fails a test, but _i know_ how
                | to fix the problem. i can't jump in and manually fix it or
               | tell it what to do. i just have to watch it churn through
               | the problem and eventually give up and throw away a 90%
               | good solution that i knew how to fix.
        
               | smokel wrote:
               | My experience _was_ exactly the opposite.
               | 
               | Experienced developers know when the LLM goes off the
               | rails, and are typically better at finding useful
               | applications. Junior developers on the other hand, can
               | let horrible solutions pass through unchecked.
               | 
               | Then again, LLMs are improving so quickly, that the most
               | recent ones help juniors to learn and understand things
               | better.
        
               | rzz3 wrote:
               | It's also really good for me as a very senior engineer
               | with serious ADHD. Sometimes I get very mentally blocked,
               | and telling Claude Code to plan and implement a feature
               | gives me a really valuable starting point and has a way
               | of unblocking me. For me it's easier to elaborate off of
               | an existing idea or starting point and refactor than
               | start a whole big thing from zero on my own.
        
               | unoti wrote:
                | > Having spent a couple of weeks on Claude Code recently,
                | I arrived at the conclusion that the net value for me
                | from agentic AI is actually negative.
                |
                | > For me it's meant a huge increase in productivity, at
                | least 3X.
                |
                | > How do we reconcile these two comments? I think that's
                | a core question of the industry right now.
               | 
               | Every success story with AI coding involves giving the
               | agent enough context to succeed on a task that it can see
               | a path to success on. And every story where it fails is a
               | situation where it had not enough context to see a path
               | to success on. Think about what happens with a junior
               | software engineer: you give them a task and they either
               | succeed or fail. If they succeed wildly, you give them a
               | more challenging task. If they fail, you give them more
               | guidance, more coaching, and less challenging tasks with
               | more personal intervention from you to break it down into
               | achievable steps.
               | 
               | As models and tooling becomes more advanced, the place
               | where that balance lies shifts. The trick is to ride that
               | sweet spot of task breakdown and guidance and
               | supervision.
        
               | troupo wrote:
               | > And every story where it fails is a situation where it
               | had not enough context to see a path to success on.
               | 
               | And you know that because people are actively sharing the
               | projects, code bases, programming languages and
               | approaches they used? Or because your _gut feeling_ is
               | telling you that?
               | 
               | For me, agents failed with enough context, and with not
               | enough context, and succeeded with context, or not
               | enough, and succeeded and failed with and without
               | "guidance and coaching"
        
               | hirako2000 wrote:
               | Bold claims.
               | 
               | From my experience, even the top models continue to fail
               | delivering correctness on many tasks even with all the
               | details and no ambiguity in the input.
               | 
               | In particular when details are provided, in fact.
               | 
               | I find that with solutions likely to be well oiled in the
               | training data, a well formulated set of *basic*
                | requirements often leads, zero-shot, to "a" perfectly
               | valid solution. I say "a" solution because there is still
               | this probability (seed factor) that it will not honour
               | part of the demands.
               | 
               | E.g, build a to-do list app for the browser, persist
               | entries into a hashmap, no duplicate, can edit and
               | delete, responsive design.
               | 
               | I never recall seeing an LLM kick off C++ code out of
               | that. But I also don't recall any LLM succeeding in all
               | these requirements, even though there aren't that many.
               | 
               | It may use a hash set, or even a set for persistence
               | because it avoids duplicates out of the box. And it would
                | even use a hash map to show it used a hashmap, but only as
                | an intermediary data structure. It would be responsive, but
               | the edit/delete buttons may not show, or may not be
               | functional. Saving the edits may look like it worked, but
               | did not.
               | 
                | The comparison with junior developers is a pale one. Even a
                | mediocre developer can test their work and won't pretend it
                | works if it doesn't even execute. If a developer lies too
                | many times they lose trust. We forgive these machines
                | because they are just automatons with a label on them: "can
                | make mistakes". We have no recourse to make them speak the
                | truth; they lie by design.
        
               | brulard wrote:
               | > From my experience, even the top models continue to
               | fail delivering correctness on many tasks even with all
               | the details and no ambiguity in the input.
               | 
               | You may feel like there are all the details and no
               | ambiguity in the prompt. But there may still be missing
                | parts, like examples, structure, a plan, or division into
                | smaller parts (it can do that quite well if explicitly
                | asked for). If you give too many details at once, it gets
                | confused, but there are ways to let the model access
               | context as it progresses through the task.
               | 
                | And models are just one part of the equation. Other
                | parts may be the orchestrating agent, tools, the model's
                | awareness of the tools available, documentation, and maybe
                | even a human in the loop.
        
               | sixothree wrote:
               | It might just be me but I feel like it excels with
                | certain languages where in other situations it falls flat.
                | Throw it a well-architected and documented code base in a
                | popular language and you can definitely feel it get into
                | its groove.
               | 
                | Also, giving it tools to ensure success is just as
               | important. MCPs can sometimes make a world of difference,
                | especially when it needs to search your code base.
        
               | delegate wrote:
               | Easy. You're 3x more productive for a while and then you
               | burn yourself out.
               | 
               | Or lose control of the codebase, which you no longer
               | understand after weeks of vibing (since we can only think
               | and accumulate knowledge at 1x).
               | 
               | Sometimes the easy way out is throwing a week of
               | generated code away and starting over.
               | 
               | So that 3x doesn't come for free at all, besides API
               | costs, there's the cost of quickly accumulating tech debt
               | which you have to pay if this is a long term project.
               | 
               | For prototypes, it's still amazing.
        
               | brulard wrote:
               | You conflate efficient usage of AI with "vibing". Code
               | can be written by AI and still follow the agreed-upon
               | structures and rules and still can and should be
               | thoroughly reviewed. The 3x absolutely does not come for
               | free. But the price may have been paid in advance by
               | learning how to use those tools best.
               | 
               | I agree the vibe-coding mentality is going to be a major
               | problem. But aren't all tools used well and used badly?
        
               | bloomca wrote:
                | > 2. When NOT to use the truck... when walking or the
               | bike is actually the better way to go.
               | 
               | Some people write racing car code, where a truck just
               | doesn't bring much value. Some people go into more
               | uncharted territories, where there are no roads (so the
               | truck will not only slow you down, it will bring a bunch
               | of dead weight).
               | 
               | If the road is straight, AI is wildly good. In fact, it
               | is probably _too_ good; but it can easily miss a turn and
               | it will take a minute to get it on track.
               | 
                | I am curious if we'll be able to fine-tune LLMs to assist
               | with less known paths.
        
               | troupo wrote:
               | > How do we reconcile these two comments? I think that's
               | a core question of the industry right now.
               | 
               | We don't. Because there's no hard data:
               | https://dmitriid.com/everything-around-llms-is-still-
               | magical...
               | 
               | And when hard data of any kind _does_ start appearing, it
               | may actually point in a different direction:
               | https://metr.org/blog/2025-07-10-early-2025-ai-
               | experienced-o...
               | 
               | > We need to shift the conversation to techniques, and
               | away from the tools.
               | 
               | No, you're asking to shift the conversation to magical
                | incantations which experts claim work.
               | 
               | What we need to do is shift the conversation to
               | _measurements_
        
               | jf22 wrote:
               | A couple of weeks isn't enough.
               | 
                | I'm six months in using LLMs to generate 90% of my code
               | and finally understanding the techniques and limitations.
        
               | gwd wrote:
               | > How do we reconcile these two comments? I think that's
               | a core question of the industry right now.
               | 
               | The question is, for those people who _feel_ like things
               | are going faster, what 's the _actual_ velocity?
               | 
               | A month ago I showed it a basic query of one resource I'd
               | rewritten to use a "query builder" API. Then I showed it
               | the "legacy" query of another resource, and asked it to
               | do something similar. It managed to get very close on the
               | first try, and with only a few more hours of tweaking and
               | testing managed to get a reasonably thorough test suite
               | to pass. I'm sure that took half the time it would have
               | taken me to do it by hand.
               | 
               | Fast forward to this week, when I ran across some strange
               | bugs, and had to spend a day or two digging into the code
               | again, and do some major revision. Pretty sure those bugs
               | wouldn't have happened if I'd written the code myself;
               | but even though I reviewed the code, they went under the
               | radar, because I hadn't really understood the code as
               | well as I thought I had.
               | 
               | So was I faster overall? Or did I just offload some of
               | the work to myself at an unpredictable point in the
                | future? I don't "vibe code": I keep a tight rein on the
               | tool and review everything it's doing.
        
             | thanhhaimai wrote:
             | I work across the stack (frontend, backend, ML)
             | 
             | - For FrontEnd or easy code, it's a speed up. I think it's
             | more like 2x instead of 3x.
             | 
             | - For my backend (hard trading algo), it has like 90%
             | failure rate so far. There is just so much for it to reason
             | through (balance sheet, lots, wash, etc). All agents I have
             | tried, even on Max mode, couldn't reason through all the
             | cases correctly. They end up thrashing back and forth.
             | Gemini most of the time will go into the "depressed" mode
             | on the code base.
             | 
             | One thing I notice is that the Max mode on Cursor is not
             | worth it for my particular use case. The problem is either
             | easy (frontend), which means any agent can solve it, or
             | it's hard, and Max mode can't solve it. I tend to pick the
             | fast model over strong model.
        
             | squeaky-clean wrote:
             | I just want to point out that they only said agentic models
             | were a negative, not AI in general. I don't know if this is
             | what they meant, but I personally prefer to use a web or
             | IDE AI tool and don't really like the agentic stuff
             | compared to those. For me agentic AI would be a net
             | positive against no-AI, but it's a net negative compared to
             | other AI interfaces
        
             | dmitrygr wrote:
             | > For me it's meant a huge increase in productivity, at
             | least 3X.
             | 
              | Quite possibly you are doing very common things that are
              | often done and thus are in the training set a lot, while the
              | parent post is doing something more novel that forces the
              | model to extrapolate, which they suck at.
        
               | cambaceres wrote:
               | Sure, I won't argue against that. The more complex (and
               | fun) parts of the applications I tend to write myself.
               | The productivity gains are still real though.
        
             | bcrosby95 wrote:
             | My current guess is it's how the programmer solves problems
             | in their head. This isn't something we talk about much.
             | 
             | People seem to find LLMs do well with well-spec'd features.
             | But for me, creating a good spec doesn't take any less time
             | than creating the code. The problem for me is the
             | translation layer that turns the model in my head into
             | something more concrete. As such, creating a spec for the
             | LLM doesn't save me any time over writing the code myself.
             | 
             | So if it's a one shot with a vague spec and that works
             | that's cool. But if it's well spec'd to the point the LLM
             | won't fuck it up then I may as well write it myself.
        
             | byryan wrote:
              | That makes sense, especially if you're building web
             | applications that are primarily "just" CRUD operations. If
             | a lot of the API calls follow the same pattern and the
             | application is just a series of API calls + React UI then
              | that seems like something an LLM would excel at. LLMs are
             | also more proficient in TypeScript/JS/Python compared to
             | other languages, so that helps as well.
        
             | carlhjerpe wrote:
             | I'm currently unemployed in the DevOps field (resigned and
             | got a long vacation). I've been using various models to
              | write various Kubernetes plug-ins and simple automation
             | scripts. It's been a godsend implementing things which
             | would require too much research otherwise, my ADHD context
             | window is smaller than Claude's.
             | 
             | Models are VERY good at Kubernetes since they have very
             | anal (good) documentation requirements before merging.
             | 
             | I would say my productivity gain is unmeasurable since I
             | can produce things I'd ADHD out of unless I've got a whip
             | up my rear.
        
             | qingcharles wrote:
             | On the right projects, definitely an enormous upgrade for
             | me. Have to be judicious with it and know when it is right
             | and when it's wrong. I think people have to figure out what
             | those times are. For now. In the future I think a lot of
             | the problems people are having with it will diminish.
        
           | revskill wrote:
            | Truth. To some extent, the agent doesn't know what it's doing
            | at all; it lacks a real brain. Maybe we should just treat it
            | as a hard worker.
        
           | flowerthoughts wrote:
           | What type of work do you do? And how do you measure value?
           | 
           | Last week I was using Claude Code for web development. This
           | week, I used it to write ESP32 firmware and a Linux kernel
           | driver. Sure, it made mistakes, but the net was still very
           | positive in terms of efficiency.
        
             | verall wrote:
             | > This week, I used it to write ESP32 firmware and a Linux
             | kernel driver.
             | 
             | I'm not meaning to be negative at all, but was this for a
             | toy/hobby or for a commercial project?
             | 
             | I find that LLMs do very well on small greenfield toy/hobby
             | projects but basically fall over when brought into
             | commercial projects that often have bespoke requirements
             | and standards (i.e. has to cross compile on qcc, comply
              | with autosar, an in-house build system, tons of legacy code
              | lying around that may or may not be used).
             | 
             | So no shade - I'm just really curious what kind of project
             | you were able get such good results writing ESP32 FW and
             | kernel drivers for :)
        
               | lukebechtel wrote:
               | Maintaining project documentation is:
               | 
               | (1) Easier with AI
               | 
               | (2) Critical for letting AI work effectively in your
               | codebase.
               | 
               | Try creating well structured rules for working in your
               | codebase, put in .cursorrules or Claude equivalent... let
               | AI help you... see if that helps.
        
               | theshrike79 wrote:
               | The magic to using agentic LLMs efficiently is...
               | 
               | proper project management.
               | 
               | You need to have good documentation, split into logical
               | bits. Tasks need to be clearly defined and not have
               | extensive dependencies.
               | 
               | And you need to have a simple feedback loop where you can
               | easily run the program and confirm the output matches
               | what you want.
        
               | troupo wrote:
               | And the chance of that working depends on the weather,
               | the phase of the moon and the arrangement of bird bones
               | in a druidic augury.
               | 
               | It's a non-deterministic system producing statistically
               | relevant results with no failure modes.
               | 
               | I had Cursor one-shot issues in internal libraries with
               | zero rules.
               | 
               | And then suggest I use StringBuilder (Java) in a 100%
               | Elixir project with carefully curated cursor rules as
               | suggested by the latest shamanic ritual trends.
        
               | oceanplexian wrote:
               | I work in FAANG, have been for over a decade. These tools
               | are creating a huge amount of value, starting with
               | Copilot but now with tools like Claude Code and Cursor.
               | The people doing so don't have a lot of time to comment
               | about it on HN since we're busy building things.
        
               | nomel wrote:
               | What are the AI usage policies like at your org? Where I
               | am, we're severely limited.
        
               | jpc0 wrote:
               | > These tools are creating a huge amount of value...
               | 
               | > The people doing so don't have a lot of time to comment
               | about it on HN since we're busy building...
               | 
               | "We're so much more productive that we don't have time to
               | tell you how much more productive we are"
               | 
               | Do you see how that sounds?
        
               | wijwp wrote:
               | To be fair, AI isn't going to give us more time outside
               | work. It'll just increase expectations from leadership.
        
               | drusepth wrote:
               | I feel this, honestly. I get so much more work done
               | (currently: building & shipping games, maintaining
               | websites, managing APIs, releasing several mobile apps,
               | and developing native desktop applications) managing 5x
               | claude instances that the majority of my time is sucked
               | up by just prompting whichever agent is done on their
               | next task(s), and there's a real feeling of lost
               | productivity if any agent is left idle for too long.
               | 
               | The only time to browse HN left is when all the agents
               | are comfortably spinning away.
        
               | GodelNumbering wrote:
               | I don't see how FAANG is relevant here. But the 'FAANG' I
               | used to work at had an emergent problem of people
                | throwing a lot of half-baked 'AI-powered' code over the
                | wall and letting reviewers deal with it (due to incentives,
               | not that they were malicious). In orgs like infra where
               | everything needs to be reviewed carefully, this is purely
               | a burden
        
               | nme01 wrote:
               | I also work for a FAANG company and so far most employees
               | agree that while LLMs are good for writing docs,
               | presentations or emails, they still lack a lot when it
               | comes to writing a maintainable code (especially in Java,
               | they supposedly do better in Go, don't know why, not my
               | opinion). Even simple refactorings need to be carefully
               | checked. I really like them for doing stuff that I know
               | nothing about though (eg write a script using a certain
               | tool, tell me how to rewrite my code to use certain
               | library etc) or for reviewing changes
        
               | verall wrote:
               | I work in a FAANG equivalent for a decade, mostly in
               | C++/embedded systems. I work on commercial products used
               | by millions of people. I use the AI also.
               | 
               | When others are finding gold in rivers similar to mine,
               | and I'm mostly finding dirt, I'm curious to ask and see
               | how similar the rivers really are, or if the river they
               | are panning in is actually somewhere I do find gold, but
               | not a river I get to pan in often.
               | 
               | If the rivers really are similar, maybe I need to work on
               | my panning game :)
        
               | GodelNumbering wrote:
               | This is my experience too. Also, their propensity to jump
               | into code without necessarily understanding the
               | requirement is annoying to say the least. As the project
               | complexity grows, you find yourself writing longer and
               | longer instructions just to guardrail.
               | 
               | Another rather interesting thing is that they tend to
                | gravitate towards a sweep-the-errors-under-the-rug kind of
                | coding, which is disastrous, e.g. "return X if we don't
                | find the value so downstream doesn't crash". These are
                | the kind of errors no human, not even a beginner on their
                | first day learning to code, would make, and they are
                | extremely annoying to debug.
               | 
                | Tl;dr: LLMs' tendency to treat every single thing you
                | give them as a demo homework project
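                |
                | The pattern, sketched in TypeScript (names are purely
                | illustrative):
                |
                |     type PriceBook = Map<string, number>;
                |
                |     // The "sweep it under the rug" version an LLM tends
                |     // to produce: bad data flows on silently.
                |     function findPrice(prices: PriceBook, sku: string) {
                |       return prices.get(sku) ?? 0;
                |     }
                |
                |     // The boring version you actually want: fail loudly
                |     // at the source so the bug stays debuggable.
                |     function findPriceStrict(prices: PriceBook, sku: string) {
                |       const price = prices.get(sku);
                |       if (price === undefined) {
                |         throw new Error(`Unknown SKU: ${sku}`);
                |       }
                |       return price;
                |     }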
        
               | tombot wrote:
               | > their propensity to jump into code without necessarily
               | understanding the requirement is annoying to say the
               | least.
               | 
               | Then don't let it, collaborate on the spec, ask Claude to
               | make a plan. You'll get far better results
               | 
               | https://www.anthropic.com/engineering/claude-code-best-
               | pract...
        
               | verall wrote:
               | > Another rather interesting thing is that they tend to
               | gravitate towards sweep the errors under the rug kind of
               | coding which is disastrous. e.g. "return X if we don't
               | find the value so downstream doesn't crash".
               | 
               | Yes, these are painful and basically the main reason I
               | moved from Claude to Gemini - it felt insane to be
               | begging the AI - "No, you actually have to fix the bug,
               | in the code you wrote, you cannot just return some random
               | value when it fails, it actually has to work".
        
               | GodelNumbering wrote:
               | Claude in particular abuses the word 'Comprehensive' a
               | lot. You express that you're unhappy with its approach,
                | it will likely come back with "Comprehensive plan to ..."
               | and then write like 3 bullet points under it, that is of
               | course after profusely apologizing. On a sidenote, I wish
               | LLMs never apologized and instead just said I don't know
               | how to do this.
        
               | jorvi wrote:
               | Running LLM code with kernel privileges seems like
               | courting disaster. I wouldn't dare do that unless I had a
               | rock-solid grasp of the subsystem, and at that point, why
               | not just write the code myself? LLM coding is on-average
               | 20% slower.
        
               | LinXitoW wrote:
               | In my experience in a Java code base, it didn't do any of
               | this, and did a good job with exceptions.
               | 
               | And I have to disagree that these aren't errors that
               | beginners or even intermediates make. Who hasn't
               | swallowed an error because "that case totally, most
               | definitely won't ever happen, and I need to get this
               | done"?
        
               | flowerthoughts wrote:
               | Totally agree.
               | 
               | This was a debugging tool for Zigbee/Thread.
               | 
               | The web project is Nuxt v4, which was just released, so
               | Claude keeps wanting to use v3 semantics, and you have to
               | keep repeating the known differences, even if you use
                | CLAUDE.md. (They moved client files under an app/
               | subdirectory.)
               | 
               | All of these are greenfield prototypes. I haven't used it
               | in large systems, and I can totally see how that would be
               | context overload for it. This is why I was asking GP
               | about the circumstances.
        
               | LinXitoW wrote:
               | Ironically, AI mirrors human developers in that it's far
               | more effective when working in a well written, well
               | documented code base. It will infer function
               | functionality from function names. If those are shitty,
               | short, or full of weird abbreviations, it'll have a hard
               | time.
               | 
               | Maybe it's a skill issue, in the sense of having a decent
               | code base.
        
           | greenie_beans wrote:
           | same. agents are good with easy stuff and debugging but
           | extremely bad with complexity. has no clue about chesterton's
           | fence, and it's hard to parse the results especially when it
           | creates massive diffs. creates a ton of abandoned/cargo code.
           | lots of misdirection with OOP.
           | 
           | chatting with claude and copy/pasting code between my IDE
           | and claude is still the most effective for more complex
           | stuff, at least for me.
        
           | jmartrican wrote:
           | Maybe that is a skills issue.
        
             | rootusrootus wrote:
             | If you are suggesting that LLMs are proving quite good at
             | taking over the low skilled work that probably 90% of devs
             | spend the majority of their time doing, I totally agree. It
             | is the simplest explanation for why many people think they
             | are magic, while some people find very little value.
             | 
             | On the occasion that I find myself having to write web code
             | for whatever reason, I'm very happy to have Claude. I don't
             | enjoy coding for the web, like at all.
        
               | logicprog wrote:
               | I think that's definitely true -- these tools are only
               | really taking care of the relatively low-skill stuff:
               | synthesizing algorithms, architectures, and approaches
               | that have been seen before; automating the build-out of
               | scaffolding or interpolating skeletons; running
               | relatively typical bash commands for you after making
               | code changes; implementing fairly specific specifications
               | of how to approach novel architectures, algorithms, or
               | code logic; and automating the exploration of code bases
               | to build an understanding of what things do, where they
               | are, how they relate, and the control flow (which would
               | otherwise take hours of laboriously grepping around and
               | reading code), all in small bite-sized pieces with a
               | human in the loop. They're even able to produce complete,
               | fully working code for things that are a small variation
               | or synthesis of things they've seen a lot before in
               | technologies they're familiar with.
               | 
               | But I think that that can still be a pretty good boost --
               | I'd say maybe 20 to 30%, plus MUCH less headache, when
               | used right -- even for people that are doing really
               | interesting and novel things, because even if your work
               | has a lot of novelty and domain knowledge to it, there's
               | always mundane horseshit that eats up way too much of
               | your time and brain cycles. So you can use these agents
               | to take care of all the peripheral stuff for you and just
               | focus on what's interesting to you. Imagine you want to
               | write some really novel unique complex algorithm or
               | something but you do want it to have a GUI debugging
               | interface. You can just use Imgui or TKinter if you can
               | make Python bindings or something, and then offload that
               | whole thing onto the LLM instead of having to carry that
               | extra cognitive load and page out the meat of what you're
               | working on whenever you need to make a modification to
               | your GUI that's more than trivial.
               | 
               | I also think this opens up the possibility for a lot more
               | people to write ad hoc personal programs for various
               | things they need, which is even more powerful when
               | combined with something like Python that has a ton of
               | pre-made libraries that do all the difficult stuff for
               | you, or something like emacs that's highly malleable and
               | rewards being able to write programs with it by making
               | them able to very powerfully integrate with your workflow
               | and environment. Even for people who already know how to
               | program and like programming even, there's still an
               | opportunity cost and an amount of time and effort and
               | cognitive load investment in making programs. So by
               | significantly lowering that you open up the opportunities
               | even for us and for people who don't know how to program
               | at all, their productivity basically goes from zero to
               | one, an improvement of 100% (or infinity lol)
        
               | phist_mcgee wrote:
               | What a supremely arrogant comment.
        
               | rootusrootus wrote:
               | I often have such thoughts about things I read on HN but
               | I usually follow the site guidelines and keep it to
               | myself.
        
           | ericmcer wrote:
           | Agreed, daily Cursor user.
           | 
           | Just got out of a 15m huddle with someone trying to
           | understand what they were doing in a PR before they admitted
           | Claude generated everything and it worked but they weren't
           | sure why... Ended up ripping about 200 LoC out because what
           | Claude "fixed" wasn't even broken.
           | 
           | So never let it generate code, but the autocomplete is
           | absolutely killer. If you understand how to code in 2+
           | languages you can make assumptions about how to do things in
           | many others and let the AI autofill the syntax in. I have
           | been able to swap to languages I have almost no experience in
           | and work fairly well because memorizing syntax is irrelevant.
        
             | daymanstep wrote:
             | > I have been able to swap to languages I have almost no
             | experience in and work fairly well because memorizing
             | syntax is irrelevant.
             | 
             | I do wonder whether your code does what you think it does.
             | Similar-sounding keywords in different languages can have
             | completely different meanings. E.g. the volatile keyword in
             | Java vs C++. You don't know what you don't know, right? How
             | do you know that the AI generated code does what you think
             | it does?
        
               | jacobr1 wrote:
               | Beyond code-gen I think some techniques are very
               | underutilized. One can generate tests, generate docs,
               | explain things line by line. Explicitly explaining
               | alternative approaches and tradeoffs is helpful too.
               | While, as with everything in this space, there are
               | imperfections, I find a ton of value in looking beyond the
               | code into thinking through the use cases, alternative
               | approaches and different ways to structure the same
               | thing.
        
               | pornel wrote:
               | I've wasted time debugging phantom issues due to LLM-
               | generated tests that were misusing an API.
               | 
               | Brainstorming/explanations can be helpful, but also watch
               | out for Gell-Mann amnesia. It's annoying that LLMs always
               | sound smart whether they are saying something smart or
               | not.
        
               | Miraste wrote:
               | Yes, you can't use any of the heuristics you develop for
               | human writing to decide if the LLM is saying something
               | stupid, because its best insights and its worst
               | hallucinations all have the same formatting, diction, and
               | style. Instead, you need to engage your frontal cortex
               | and rationally evaluate every single piece of information
               | it presents, and that's tiring.
        
               | spanishgum wrote:
               | The same way I would with any of my own code - I would
               | test it!
               | 
               | The key here is to spend less time searching, and more
               | time understanding the search result.
               | 
               | I do think the vibe factor is going to bite companies in
               | the long run. I see a lot of vibe code pushed by both
               | junior and senior devs alike, where it's clear not enough
               | time was spent reviewing the product. This behavior is
               | being actively rewarded now, but I do think the attitude
               | around building code as fast as possible will change if
               | impact to production systems becomes realized as a net
               | negative. Time will tell.
        
             | senko wrote:
             | > Just got out of a 15m huddle with someone trying to
             | understand what they were doing in a PR before they
             | admitted Claude generated everything and it worked but they
             | weren't sure why...
             | 
             | But .. that's not the AI's fault. If people submit _any_
             | PRs (including AI-generated or AI-assisted) without
             | _completely_ understanding them, I'd treat it as a serious
             | breach of professional conduct and (gently, for first-
             | timers) stress that this is _not_ acceptable.
             | 
             | As someone hitting the "Create PR" (or equivalent) button,
             | you accept responsibility for the code in question. If you
             | submit slop, it's 100% on you, not on any tool used.
        
               | draxil wrote:
               | But it's pretty much a given at this point that if you
               | use agents to code for any length of time it starts to
               | atrophy your ability to understand what's going on. So,
               | yeah, it's a bit of a devil's chalice.
        
               | whatever1 wrote:
               | If you have to review what the LLM wrote then there is no
               | productivity gain.
               | 
               | Leadership asks for vibe coding
        
               | senko wrote:
               | > If you have to review what the LLM wrote then there is
               | no productivity gain.
               | 
               | I do not agree with that statement.
               | 
               | > Leadership asks for vibe coding
               | 
               | Leadership always asks for more, better, faster.
        
               | swat535 wrote:
               | > If you have to review what the LLM wrote then there is
               | no productivity gain.
               | 
               | You always have to review the code, whether it's written
               | by another person, yourself or an AI.
               | 
               | I'm not sure how this translates into the loss of
               | productivity?
               | 
               | Did you mean to say that the code AI generates is
               | difficult to review? In those cases, it's the fault of
               | the code author and not the AI.
               | 
               | Using AI like any other tool requires experience and
               | skill.
        
             | qingcharles wrote:
             | The other day I caught it changing the grammar and spelling
             | in a bunch of static strings in a totally different part of
             | a project, for no sane reason.
        
               | bdamm wrote:
               | I've seen it do this as well. Odd things like swapping
               | the severity level on log statements that had nothing to
               | do with the task.
               | 
               | Very careful review of my commits is the only way
               | forward, for a long time.
        
           | meowtimemania wrote:
           | For me it depends on the task. For some tasks (maybe things
           | that don't have good existing examples in my codebase?)
           | 
           | I'll spend 3x the time repeatedly asking claude to do
           | something for me
        
           | 9cb14c1ec0 wrote:
           | The more I use Claude Code, the more aware I become of its
           | limitations. On the whole, it's a useful tool, but the bigger
           | the codebase the less useful. I've noticed a big difference
           | on its performance on projects with 20k lines of code versus
           | 100k. (Yes, I know. A 100k line project is still very small
           | in the big picture)
        
         | alexchamberlain wrote:
         | I'm not sure how, and maybe some of the coding agents are doing
         | this, but we need to teach the AI to use abstractions, rather
         | than the whole code base for context. We as humans don't hold
         | the whole codebase in our heads, and we shouldn't expect the AI
         | to either.
        
           | F7F7F7 wrote:
           | There are a billion and one repos that claim to help do this.
           | Let us know when you find one.
        
           | siwatanejo wrote:
           | I do think AIs are already using abstractions, otherwise you
           | would be submitting all the source code of your dependencies
           | into the context.
        
             | TheOtherHobbes wrote:
             | I think they're recognising patterns, which is not the same
             | thing.
             | 
             | Abstractions are stable, they're explicit in their domains,
             | good abstractions cross multiple domains, and they
             | typically come with a symbolic algebra of available
             | operations.
             | 
             | Math is made of abstractions.
             | 
             | Patterns are a weaker form of cognition. They're implicit,
             | heavily context-dependent, and there's no algebra. You have
             | to poke at them crudely in the hope you can make them do
             | something useful.
             | 
             | Using LLMs feels more like the latter than the former.
             | 
             | If LLMs were generating true abstractions they'd be finding
             | meta-descriptions for code and language and making them
             | accessible directly.
             | 
             | AGI - or ASI - may be able to do that some day, but it's
             | not doing that now.
        
           | anthonypasq wrote:
           | the fact we can't keep the repo in our working memory is a
           | flaw of our brains. I can't see how you could possibly make
           | the argument that if you were somehow able to keep the entire
           | codebase in your head that it would be a disadvantage.
        
             | SkyBelow wrote:
             | Information tradeoff. Even if you could keep the entire
             | code base in memory, if something else has to be left out
             | of memory, then you have to consider the value of an
             | abstraction verses whatever other information is lost.
             | Abstractions also apply to the business domain and work
             | the same way.
             | 
             | You also have time tradeoffs. Like time to access memory
             | and time to process that memory to achieve some outcome.
             | 
             | There is also quality. If you can keep the entire code base
             | in memory but with some chance of confusion, while
             | abstractions will allow less chance of confusion, then the
             | tradeoff of abstractions might be worth it still.
             | 
             | Even if we assume a memory that has no limits, can access
             | and process all information at constant speed, and no
             | quality loss, there is still communication limitations to
             | worry about. Energy consumption is yet another.
        
           | sdesol wrote:
           | LLMs (current implementation) are probabilistic so it really
           | needs the actual code to predict the most likely next tokens.
           | Now loading the whole code base can be a problem in itself,
           | since other files may negatively affect the next token.
        
             | photon_lines wrote:
             | Sorry -- I keep seeing this being used but I'm not entirely
             | sure how it differs from most of human thinking. Most human
             | 'reasoning' is probabilistic as well and we rely on
             | 'associative' networks to ingest information. In a similar
             | manner - LLMs use association as well -- and not only that,
             | but they are capable of figuring out patterns based on
             | examples (just like humans are) -- read this paper for
             | context: https://arxiv.org/pdf/2005.14165. In other words,
             | they are capable of grokking patterns from simple data
             | (just like humans are). I've given various LLMs my
             | requirements and they produced working solutions for me by
             | simply 1) including all of the requirements in my prompt
             | and 2) asking them to think through and 'reason' through
             | their suggestions and the products have always been
             | superior to what most humans have produced. The 'LLMs are
             | probabilistic predictors' comments though keep appearing on
             | threads and I'm not quite sure I understand them -- yes,
             | LLMs don't have 'human context' i.e. data needed to
             | understand human beings since they have not directly been
             | fed in human experiences, but for the most part -- LLMs are
             | not simple 'statistical predictors' as everyone brands them
             | to be. You can see a thorough write-up I did of what GPT is
             | / was here if you're interested:
             | https://photonlines.substack.com/p/intuitive-and-visual-
             | guid...
        
               | didibus wrote:
               | You seem possibly more knowledgeable than me on the
               | matter.
               | 
               | My impression is that LLMs predict the next token based
               | on the prior context. They do that by having learned a
               | probability distribution from tokens -> next-token.
               | 
               | Then as I understand, the models are never reasoning
               | about the problem, but always about what the next token
               | should be given the context.
               | 
               | The chain of thought is just rewarding them so that the
               | next token isn't predicting the token of the final answer
               | directly, but instead predicting the token of the
               | reasoning to the solution.
               | 
               | Human language in the dataset contains text that
               | describes many concepts and offers many solutions to
               | problems, so it turns out that predicting the text that
               | describes the solution to a problem often ends up being
               | the correct solution to the problem. That this was true
               | was kind of a lucky accident and is where all the
               | "intelligence" comes from.
        
               | photon_lines wrote:
               | So - in the pre-training step you are right -- they are
               | simple 'statistical' predictors but there are more steps
               | involved in their training which turn them from simple
               | predictors to being able to capture patterns and reason
               | -- I tried to come up with an intuitive overview of how
               | they do this in the write-up and I'm not sure I can give
               | you a simple explanation here, but I would recommend you
               | play around with Deep-Seek and other more advanced
               | 'reasoning' or 'chain-of-reason' models and ask them to
               | perform tasks for you: they are not simply statistically
               | combining information together. Many times they are able
               | to reason through and come up with extremely advanced
               | working solutions. To me this indicates that they are not
               | 'accidentally' stumbling upon solutions based on statistics
               | -- they actually are able to 'understand' what you are
               | asking them to do and to produce valid results.
        
               | sdesol wrote:
               | I'm not sure if I would say human reasoning is
               | 'probabilistic' unless you are taking a very far step
               | back and saying based on how the person lived, they have
               | ingrained biases (weights) that dictate how they reason.
               | I don't know if LLMs have a built in scepticism like
               | humans do, that plays a significant role in reasoning.
               | 
               | Regardless if you believe LLMs are probabilistic or not,
               | I think what we are both saying is context is king and
               | what it (LLM) says is dictated by the context (either
               | through training) or introduced by the user.
        
               | Workaccount2 wrote:
               | Humans have a neuro-chemical system that performs
               | operations with electrical signals.
               | 
               | That's the level to look at, unless you have a dualist
               | view of the brain (we are channeling supernatural
               | forces).
        
               | lll-o-lll wrote:
               | Yep, just like like looking at a birds feather through a
               | microscope explains the principles of flight...
               | 
               | Complexity theory doesn't have a mathematics (yet), but
               | that doesn't mean we can't see that it exists. Studying
               | the brain at the lowest levels hasn't led to any major
               | insights into how cognition functions.
        
               | photon_lines wrote:
               | 'I don't know if LLMs have a built in scepticism like
               | humans do' - humans don't have an 'in built skepticism'
               | -- we learn it through experience and through being
               | taught how to 'reason' within school (and it takes a very
               | long time to do this). You believe that this is in-
               | grained but you may have forgotten having to slog through
               | most of how the world works and being tested when you
               | went to school and when your parents taught you these
               | things. On the context component: yes, context is vitally
               | important (just as it is with humans) -- you can't
               | produce a great solution unless you understand the 'why'
               | behind it and how the current solution works so I 100%
               | agree with that.
        
               | ijidak wrote:
               | For me, the way humans finish each other's sentences and
               | often think of quotes from the same movies at the same
               | time in conversation (when there is no clear reason for
               | that quote to be a part of the conversation), indicates
               | that there is a probabilistic element to human thinking.
               | 
               | Is it entirely probabilistic? I don't think so. But, it
               | does seem that a chunk of our speech generation and
               | processing is similar to LLMs. (e.g. given the words I've
               | heard so far, my brain is guessing words x y z should
               | come next.)
               | 
               | I feel like the conscious, executive mind humans have
               | exercises some active control over our underlying
               | probabilistic element. And LLMs lack the conscious
               | executive.
               | 
               | e.g. They have our probabilistic capabilities, without
               | some additional governing layer that humans have.
        
             | nomel wrote:
             | No, it doesn't, nor do we. It's why abstractions and
             | documentation exist.
             | 
             | If you know what a function achieves, and you trust it to
             | do that, you don't need to see/hold its exact
             | implementation in your head.
        
               | sdesol wrote:
               | But documentation doesn't include styling or preferred
               | pattern, which is why I think a lot of people complain that
               | the LLM will just produce garbage. Also documentation is
               | not guaranteed to be correct or up to date. To be able to
               | produce the best code based on what you are hoping for, I
               | do think having the actual code is necessary unless
               | styling/design patterns are not important, then yes
               | documentation will suffice, provided it is accurate
               | and up to date.
        
           | throwaway314155 wrote:
           | /compact in Claude Code is effectively this.
        
           | LinXitoW wrote:
           | They already do, or at least Claude Code does. It will search
           | for a method name, then only load a chunk of that file to get
           | the method signature, for example.
           | 
           | It will use the general information you give it to make
           | educated guesses of where things are. If it knows the code is
           | Vue based and it has to do something with "users", it might
           | seach for "src/*/ _User_.vue.
           | 
           | This is also the reason why the quality of your code makes
           | such a large difference. The more consistent the naming of
           | files and classes, the better the AI is at finding them.
        
         | sdesol wrote:
         | > I really desperately need LLMs to maintain extremely
         | effective context
         | 
         | I actually built this. I'm still not ready to say "use the tool
         | yet" but you can learn more about it at
         | https://github.com/gitsense/chat.
         | 
         | The demo link is not up yet as I need to finalize an admin tool
         | but you should be able to follow the npm instructions to play
         | around with it.
         | 
         | The basic idea is, you should be able to load your entire repo
         | or repos and use the context builder to help you refine it. Or
         | you can create custom analyzers that you can do 'AI
         | Assisted' searches with, like executing `!ask find all frontend
         | code that does [this]`, and because the analyzer knows how
         | to extract the correct metadata to support that query, you'll
         | be able to easily build the context using it.
        
           | kvirani wrote:
           | Wait that's not how Cursor etc work? (I made assumptions)
        
             | trenchpilgrim wrote:
             | Dunno about Cursor but this is exactly how I use Zed to
             | navigate groups of projects
        
             | sdesol wrote:
             | I don't use Cursor so I can't say, but based on what I've
             | read, they optimize for smaller context to reduce cost and
             | probably for performance. The issue is, I think this is
             | severely flawed as LLMs are insanely context sensitive and
             | forgetting to include a reference file can lead to
             | undesirable code.
             | 
             | I am obviously biased, but I still think to get the best
             | results, the context needs to be human curated to ensure
             | everything the LLM needs will be present. LLMs are
             | probabilistic, so the more relevant context, the greater
             | the chances the final output is the most desired.
        
           | hirako2000 wrote:
           | Not clear how it gets around what is, ultimately, a context
           | limit.
           | 
           | I've been fiddling with some process too, would be good if
           | you shared the how. The readme looks like yet another full
           | fledged app.
        
             | sdesol wrote:
             | Yes there is a context window limit, but I've found for
             | most frontier models, you can generate very effective code
             | if the context window is under 75,000 tokens provided the
             | context is consistent. You have to think of everything from
             | a probability point of view and the more logical the
             | context, the greater the chances of better code.
             | 
             | For example, if the frontend doesn't need to know the
             | backend code (other than the interface), not including the
             | backend code when solving a specific frontend problem can
             | reduce context size and improve the chances of the
             | expected output. You just need to ensure you include the
             | necessary interface documentation.
             | 
             | As for the full fledged app, I think you raised a good
             | point and I should add a 'No lock in' section for why to
             | use it. The app has a message tool that lets you pick and
             | choose what messages to copy. Once you've copied the
             | context (including any conversation messages that can help
             | the LLM), you can use the context where ever you want.
             | 
             | My strategy with the app is to be the first place you go
             | to start a conversation before you even generate code, so
             | my focus is helping you construct contexts (the smaller the
             | better) to feed into LLMs.
        
           | handfuloflight wrote:
           | Doesn't Claude Code do all of this automatically?
        
             | sdesol wrote:
             | I haven't looked at Claude Code, so I don't know if they
             | have analyzers or not that understands how to extract any
             | type of data other than specific coding data that it is
             | trained on. Based on the runtime for some tasks, I would
             | not be surprised if it is going through all the files and
             | asking "is this relevant"
             | 
             | My tool is mainly targeted at massive code bases and
             | enterprise as I still believe the most efficient way to
             | build accurate context is by domain experts.
             | 
             | Right now, I would say 95% of my code is AI generated (98%
             | human architected) and I am spending about $2 a day on
             | LLM costs and the code generation part usually never runs
             | more than 30 seconds for most tasks.
        
               | handfuloflight wrote:
               | Well you should look at it, because it's not going
               | through all files. I looked at your product and the
               | workflow is essentially asking me to do manually what
               | Claude Code does auto. Granted, manually selecting the
               | context will probably lead to lower costs in any case
               | because Claude Code invokes tool calls like grep to do
               | its search, so I do see merit in your product in that
               | respect.
        
               | sdesol wrote:
               | Looking at the code, it does have some sort of automatic
               | discovery. I also don't know how scalable Claude Code is.
               | I've spent over a decade thinking about code search, so I
               | know what the limitations are for enterprise code.
               | 
               | One of the neat tricks that I've developed is, I would
               | load all my backend code for my search component and then
               | I would ask the LLM to trace a query and create a context
               | bundle for only the files that are affected. Once the LLM
               | has finished, I just need to do a few clicks to refine an
               | 80,000 token size window down to about 20,000 tokens.
               | 
               | I would not be surprised if this is one of the tricks
               | that it does as it is highly effective. Also, yes my tool
               | is manual, but I treat conversations as durable assets so
               | in the future, you should be able to say, last week I did
               | this, load the same files and LLM will know what files to
               | bring into context.
        
               | handfuloflight wrote:
               | Excellent, I look forward to trying it out, at minimum to
               | wean off my dependency on Claude Code and its likely
               | current state of overspending on context. I agree with
               | looking at conversations as durable assets.
        
               | sdesol wrote:
               | > current state of overspending on context
               | 
               | The thing that is killing me when I hear about Claude
               | Code and other agent tools is the amount of energy they
               | must be using. People say they let the task run for an
               | hour and I can't help but think how much energy is
               | being used and if Claude Code is being upfront with how
               | much things will actually cost in the future.
        
               | pacoWebConsult wrote:
               | FWIW Claude code conversations are also durable. You can
               | resume any past conversation in your project. They're
               | stored as jsonl files within your `$HOME/.claude`
               | directory. This retains the actual context (including
               | your prompts, assistant responses, tool usages, etc) from
               | that conversation, not just the files you're affecting as
               | context.
        
               | sdesol wrote:
               | Thanks for the info. I actually want to make it easy for
               | people to review aider, plandex, claude code, etc.
               | conversations so I will probably look at importing them.
               | 
               | My goal isn't to replace the other tools, but to make
               | them work smarter and more efficiently. I also think we
               | will, in a year or two, start measuring performance based
               | on how developers interact with LLMs (so management will
               | want to see the conversations). Instead of looking at
               | code generated, the question is going to be, if this
               | person is let go, what is the impact based on how they
               | are contributing via their conversations.
        
         | seanmmward wrote:
         | The primary use case isn't just about shoving more code in
         | context, although depending on the task, there is an
         | irreducible minimum context needed for it to capture all the
         | needed understanding. The 1M context model is a unique beast in
         | terms of how you need to feed it, and its real power is being
         | able to tackle long horizon tasks which require iterative
         | exploration, in-context learning, and resynthesis. I.e., some
         | problems are breadth (go fix an api change in 100 files), other
         | however require depth (go learn from trying 15 different ways
         | to solve this problem). 1M Sonnet is unique in its capabilities
         | for the latter in particular.
        
         | hinkley wrote:
         | Sounds to me like your problem has shifted from how much the AI
         | tool costs per hour to how much it costs per token because
         | resetting a model happens often enough that the price doesn't
         | amortize out per hour. That giant spike every ?? months
         | overshadows the average cost per day.
         | 
         | I wonder if this will become more universal, and if we won't
         | see a 'tick-tock' pattern like Intel used, where they tweak the
         | existing architecture one or more times between major design
         | work. The 'tick' is about keeping you competitive and the
         | 'tock' is about keeping you relevant.
        
         | TZubiri wrote:
         | "However. Price is king. Allowing me to flood the context
         | window with my code base is great"
         | 
         | I don't vibe code, but in general having to know all of the
         | codebase to be able to do something is a smell, it's
         | spaghetti, it's lack of encapsulation.
         | 
         | When I program I cannot think about the whole codebase, I have
         | a couple of files open tops and I think about the code in those
         | files.
         | 
         | This issue of having to understand the whole codebase,
         | complaining about abstractions, microservices, and OOP, and
         | wanting everything to be in a "simple" monorepo, or a monolith;
         | is something that I see juniors do, almost exclusively.
        
         | ants_everywhere wrote:
         | > I really desperately need LLMs to maintain extremely
         | effective context
         | 
         | The context is in the repo. An LLM will never have the context
         | you need to solve all problems. Large enough repos don't fit on
         | a single machine.
         | 
         | There's a tradeoff just like in humans where getting a specific
         | task done requires removing distractions. A context window that
         | contains everything makes focus harder.
         | 
         | For a long time context windows were too small, and they
         | probably still are. But they have to get better at
         | understanding the repo by asking the right questions.
        
           | stuartjohnson12 wrote:
           | > An LLM will never have the context you need to solve all
           | problems.
           | 
           | How often do you need more than 10 million tokens to answer
           | your query?
        
             | ants_everywhere wrote:
             | I exhaust the 1 million context windows on multiple models
             | multiple times per day.
             | 
             | I haven't used the Llama 4 10 million context window so I
             | don't know how it performs in practice compared to the
             | major non-open-source offerings that have smaller context
             | windows.
             | 
             | But there is an induced demand effect where as the context
             | window increases it opens up more possibilities, and those
             | possibilities can get bottlenecked on requiring an even
             | bigger context window size.
             | 
             | For example, consider the idea of storing all Hollywood
             | films on your computer. In the 1980s this was impossible.
             | If you store them in DVD or Bluray quality you could
             | probably do it in a few terabytes. If you store them in
             | full quality you may be talking about petabytes.
             | 
             | We recently struggled to get a full file into a context
             | window. Now a lot of people feel a bit like "just take the
             | whole repo, it's only a few MB".
        
           | onion2k wrote:
           | _Large enough repos don't fit on a single machine._
           | 
           | I don't believe any human can understand a problem if they
           | need to fit the entire problem domain in their head, let
           | alone the scope of a domain that doesn't fit on a computer.
           | You _have_ to break it down into a manageable amount of
           | information to tackle it in chunks.
           | 
           | If a person can do that, so can an LLM prompted to do that by
           | a person.
        
           | sdesol wrote:
           | > But they have to get better at understanding the repo by
           | asking the right questions.
           | 
           | How I am tackling this problem is making it dead simple for
           | users to create analyzers that are designed to enrich text
           | data. You can read more about how it would be used in a
           | search at https://github.com/gitsense/chat/blob/main/packages
           | /chat/wid...
           | 
           | The basic idea is, users would construct analyzers with the
           | help of LLMs to extract the proper metadata that can be
           | semantically searched. So when the user does an AI Assisted
           | search with my tool, I would load all the analyzers
           | (description and schema) into the system prompt and the LLM
           | can determine which analyzers can be used to answer the
           | question.
           | 
           | A very simplistic analyzer would be to make it easy to
           | identify backend and frontend code so you can just use the
           | command `!ask find all frontend files` and the LLM will
           | construct a deterministic search that knows to match for
           | frontend files.
        
       | rootnod3 wrote:
       | So, more tokens means better but at the same time more tokens
       | means it distracts itself too much along the way. So at the same
       | time it is an improvement but also potentially detrimental. How
       | are those things beneficial in any capacity? What was said last
       | week? Embrace AI or leave?
       | 
       | All I see so far is: don't embrace and stay.
        
         | rootnod3 wrote:
         | So, I see this got downvoted. Instead of just downvoting, I
         | would prefer to have a counter-argument. Honestly. I am on the
         | skeptic side of LLM, but would not mind being turned to the
         | other side with some solid arguments.
        
       | pupppet wrote:
       | How does anyone send these models that much context without it
       | tripping over itself? I can't get anywhere near that much before
       | it starts losing track of instructions.
        
         | 9wzYQbTYsAIc wrote:
         | I've been having decent luck telling it to keep track of itself
         | in a .plan file, not foolproof, of course, but it has some
         | ability to "preserve context" between contexts.
         | 
         | Right now I'm experimenting with using separate .plan files for
         | tracking key instructions across domains like architecture and
         | feature decisions.
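         | 
         | For example, one of those files can be as small as
         | something like this (the headings and entries are just
         | whatever convention you settle on):
         | 
         |     # architecture.plan
         |     - Nuxt v4 layout: client files live under app/
         |     - never swallow errors; fail loudly instead of
         |       returning placeholder values
         | 
         |     # current-feature.plan
         |     - [x] parse capture file
         |     - [ ] wire decoded frames into the UI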
        
           | CharlesW wrote:
           | > _I've been having decent luck telling it to keep track of
           | itself in a .plan file, not foolproof, of course, but it has
           | some ability to "preserve context" between contexts._
           | 
           | This is the way. Not only have I had good luck with both a
           | TASKS.md and TASKS-COMPLETE.md (for history), but I have an
           | .llm/arch full of AI-assisted, for-LLM .md files (auth.md,
           | data-access.md, etc.) that document architecture decisions
           | made along the way. They're invaluable for effectively and
           | efficiently crossing context chasms.
        
         | olddustytrail wrote:
         | I think it's key to not give it contradictory instructions,
         | which is an easy mistake to make if you forget where you
         | started.
         | 
         | As an example, I know of an instance where the LLM claimed it
         | had tried a test on its laptop. This obviously isn't true so
         | the user argued with it. But they'd originally told it that it
         | was a Senior Software Engineer so playing that role, saying you
         | tested locally is fine.
         | 
         | As soon as you start arguing with those minor points you break
         | the context; now it's both a Software Engineer and an LLM. Of
         | course you get confused responses if you do that.
        
           | pupppet wrote:
           | The problem I often have is I may have instruction like-
           | 
           | General instruction: - Do "ABC"
           | 
           | If condition == whatever: - Do "XYZ" instead
           | 
           | I have a hard time making the AI obey in the instances where
           | I wish to override my own instruction, and without having
           | full control of the input context, I can't just modify my
           | 'General Instruction' on a case-by-case basis to simply
           | avoid having to contradict myself.
        
             | olddustytrail wrote:
             | That's a difficult case where you might want to collect
             | your good context and shift it to a different session.
             | 
             | It would be nice if the UI made that easy to do.
        
       | greenfish6 wrote:
       | Yes, but if you look in the rate limit notes, the rate limit is
       | 500k tokens / minute for tier 4, which we are on. Given how
       | stingy anthropic has been with rate limit increases, this is for
       | very few people right now
        
       | alvis wrote:
       | A context window beyond a certain size doesn't bring much
       | benefit, just a higher bill. If the model still keeps forgetting
       | instructions, you just end up with long messages, higher context
       | consumption, and hence a bigger bill.
       | 
       | I'd rather have an option to limit the context size.
        
         | EcommerceFlow wrote:
         | It does if you're working with bigger codebases. I've found
         | copy/pasting my entire codebase + adding a <task> works
         | significantly better than cursor.
        
           | spiderice wrote:
           | How does one even copy their entire codebase? Are you saying
           | you attach all the files? Or you use some script to copy all
           | the text to your clipboard? Or something else?
        
             | EcommerceFlow wrote:
             | I created a script that outputs the entire codebase to a
             | text file (also allows me to exclude
             | files/folders/node_modules), separating and labeling each
             | file in the program folder.
             | 
             | I then structure my prompts around like so:
             | 
             | <project_code> ``` ``` </project_code>
             | 
             | <heroku_errors> " " </heroku_errors>
             | 
             | <task> " " </task>
             | 
             | I've been using this with Google Ai studio and it's worked
             | phenomenally. 1 million tokens is A LOT of code, so I'd
             | imagine this would work for lots n lots of project type
             | programs.
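             | 
             | For anyone who wants to roll their own, a rough sketch
             | of that kind of dump script (the extensions and
             | excluded directories here are just examples, not the
             | exact ones I use):
             | 
             | ```python
             | import os
             | 
             | EXCLUDE = {".git", "node_modules", "dist",
             |            "__pycache__"}
             | EXTS = (".py", ".js", ".ts", ".vue", ".css",
             |         ".html", ".md")
             | 
             | with open("codebase.txt", "w", encoding="utf-8") as out:
             |     for root, dirs, files in os.walk("."):
             |         # prune excluded directories in place
             |         dirs[:] = [d for d in dirs if d not in EXCLUDE]
             |         for name in sorted(files):
             |             if not name.endswith(EXTS):
             |                 continue
             |             path = os.path.join(root, name)
             |             # label each file so the model knows
             |             # which file the code came from
             |             out.write(f"===== {path} =====\n")
             |             with open(path, encoding="utf-8",
             |                       errors="replace") as f:
             |                 out.write(f.read() + "\n\n")
             | ```
             | 
             | The output then gets pasted into the <project_code>
             | block above.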
        
       | andrewstuart wrote:
       | Oh man finally. This has been such a HUGE advantage for Gemini.
       | 
       | Could we please have zip files too? ChatGPT and Gemini both
       | unpack zip files via the chat window.
       | 
       | Now how about a button to download all files?
        
       | qsort wrote:
       | I won't complain about a strict upgrade, but that's a pricy boi.
       | Interesting to see differential pricing based on size of input,
       | which is understandable given the O(n^2) nature of attention.
        
       | isoprophlex wrote:
       | 1M of input... at $6/1M input tokens. Better hope it can one-shot
       | your answer.
        
         | elitan wrote:
         | have you ever hired humans?
        
           | bicepjai wrote:
           | Depends on which human you tried :) Do not underestimate
           | yourself!
        
       | rafaelero wrote:
       | god they keep raising prices
        
       | revskill wrote:
       | The critical issue where LLMs never beat humans: they break what
       | worked.
        
       | henriquegodoy wrote:
       | It's incredible to see how AI models are improving, I'm really
       | happy with this news. (imo it's more impactful than the release
       | of gpt5.) Now we need more tokens per second, and then the self-
       | improvement of the model will accelerate.
        
       | lherron wrote:
       | Wow, I thought they would feel some pricing pressure from GPT5
       | API costs, but they are doubling down on their API being more
       | expensive than everyone else.
        
         | sebzim4500 wrote:
         | I think it's the right approach, the cost of running these
         | things as coding assistants is negligable compared to the
         | benefit of even a slight model improvement.
        
         | AtNightWeCode wrote:
         | GPT5 API uses more tokens for answers of the same quality as
         | previous versions. Fell into that trap myself. I use both
         | Claude and OpenAI right now. Will probably drop OpenAI since
         | they are obviously not to be trusted considering the way they
         | do changes.
        
       | shamano wrote:
       | 1M tokens is impressive, but the real gains will come from how we
       | curate context--compact summaries, per-repo indexes, and phase
       | resets. Bigger windows help; guardrails keep models focused and
       | costs predictable.
        
       | jbellis wrote:
       | Just completed a new benchmark that sheds some light on whether
       | Anthropic's premium is worth it.
       | 
       | (Short answer: not unless your top priority is speed.)
       | 
       | https://brokk.ai/power-rankings
        
         | 24xpossible wrote:
         | Why no Grok 4?
        
           | Zorbanator wrote:
           | You should be able to guess.
        
         | rcanepa wrote:
         | I recently switched to the $200 CC subscription and I think I
         | will stay with it for a while. I briefly tested whatever
         | version of ChatGPT 5 comes with the free Cursor plan and it was
         | unbearably slow. I could not really code with it as I was
         | constantly getting distracted while waiting for a response. So,
         | speed matters a lot for some people.
        
       | Someone1234 wrote:
       | Before this they supposedly had a longer context window than
       | ChatGPT, but I have workloads that abuse the heck out of context
       | windows (100-120K tokens). ChatGPT genuinely seems to have a 32K
       | context window, in the sense that is legitimately remembers/can
       | utilize everything within that window.
       | 
       | Claude previously had "200K" context windows, but during testing
       | it wouldn't even hit a full 32K before hitting a wall/it
       | forgetting earlier parts of the context. They also have extremely
       | short prompt limits relative to the other services around, making
       | it hard to utilize their supposedly larger context windows (which
       | is suspicious).
       | 
       | I guess my point is that with Anthropic specifically, I don't
       | trust their claims because that has been my personal experience.
       | It would be nice if this "1M" context window now allows you to
       | actually use 200K though, but it remains to be seen if it can
       | even do _that_. As I said with Anthropic you need to verify
       | everything they claim.
        
         | Etheryte wrote:
         | Strong agree, Claude is very quick to forget things like "don't
         | do this", "never do this" or things it tried that were wrong.
         | It will happily keep looping even in very short conversations,
         | completely defeating the purpose of using it. It's easy to game
         | the numbers, but it falls apart in the real world.
        
           | joquarky wrote:
           | I've found it better to use antonyms rather than negations
           | in most situations.
        
       | lvl155 wrote:
       | Only time this is useful is to do init on a sizable code base or
       | dump a "big" csv.
        
       | film42 wrote:
       | The 1M token context was Gemini's headlining feature. Now, the
       | only thing I'd like Claude to work on is tokens counted towards
       | document processing. Gemini will often bill 1/10th the tokens
       | Anthropic does for the same document.
        
       | varyherb wrote:
       | I believe this can be configured in Claude Code via the following
       | environment variable:
       | 
       | ANTHROPIC_BETAS="context-1m-2025-08-07" claude
        
         | falcor84 wrote:
         | Have you tested it? I see that this env var isn't specified in
         | their docs
         | 
         | https://docs.anthropic.com/en/docs/claude-code/settings#envi...
        
           | bazhand wrote:
           | Add these settings to your `.claude/settings.json`:
           | 
           | ```json
           | {
           |   "env": {
           |     "ANTHROPIC_CUSTOM_HEADERS": {
           |       "anthropic-beta": "context-1m-2025-08-07"
           |     },
           |     "ANTHROPIC_MODEL": "claude-sonnet-4-20250514",
           |     "CLAUDE_CODE_MAX_OUTPUT_TOKENS": 8192
           |   }
           | }
           | ```
        
       | gdudeman wrote:
       | A tip for those who both use Claude Code and are worried about
       | token use (which you should be if you're stuffing 400k tokens
       | into context even if you're on 20x Max):
       | 
       |   1. Build context for the work you're doing. Put lots of your
       |      codebase into the context window.
       |   2. Do work, but at each logical stopping point hit double
       |      escape to rewind to the context-filled checkpoint. You do
       |      not spend those tokens to rewind to that point.
       |   3. Tell Claude your developer finished XYZ, have it read it
       |      into context and give high level and low level feedback
       |      (Claude will find more problems with your developer's work
       |      than with yours).
       | 
       | If you want to have multiple chats running, use /resume and pull
       | up the same thread. Hit double escape to the point where Claude
       | has rich context, but has not started down a specific rabbit
       | hole.
        
         | rvnx wrote:
         | Thank you for the tips. Do you know how to roll back the latest
         | changes? I'm trying very hard to do it, but it seems like Git is
         | the only way?
        
           | gdudeman wrote:
           | Git or my favorite "Undo all of those changes."
        
             | spike021 wrote:
             | this usually gets the job done for me as well
        
           | SparkyMcUnicorn wrote:
           | I haven't used it, but saw this the other day:
           | https://github.com/RonitSachdev/ccundo
        
           | rtuin wrote:
           | Quick tip when working with Claude Code and Git: When you're
           | happy with an intermediate result, stage the changes by
           | running `git add` (no commit). That makes it possible to
           | always go back to the staged changes when Claude messes up.
           | You can then just discard the unstaged changes and don't have
           | to roll back to the latest commit.
        
         | seperman wrote:
         | Very interesting. Why does Claude find more problems if we
         | mention the code is written by another developer?
        
           | bgilly wrote:
           | In my experience, Claude will criticize others more than it
           | will criticize itself. Seems similar to how LLMs in general
           | tend to say yes to things or call anything a good idea by
           | default.
           | 
           | I find it to be an entertaining reflection of the cultural
           | nuances embedded into training data and reinforcement
           | learning processes.
        
           | mcintyre1994 wrote:
           | Total guess, but maybe it breaks it out of the sycophancy
           | that most models seem to exhibit?
           | 
           | I wonder if they'd also be better at things like telling you
           | an idea is dumb if you tell it it's from someone else and
           | you're just assessing it.
        
           | gdudeman wrote:
           | Claude is very agreeable and is an eager helper.
           | 
           | It gives you the benefit of the doubt if you're coding.
           | 
           | It also gives you the benefit of the doubt if you're looking
           | for feedback on your developers work. If you give it a hint
           | of distrust "my developer says they completed this, can you
           | check and make sure, give them feedback....?" Claude will
           | look out for you.
        
         | sixothree wrote:
         | I've been using Serena MCP to keep my context smaller. It seems
         | to be working because claude uses it pretty much exclusively to
         | search the codebase.
        
           | lucasfdacunha wrote:
           | Could you elaborate a bit on how that works? Does it need any
           | changes in how you use Claude?
        
         | yahoozoo wrote:
         | I thought double escape just clears the text box?
        
           | gdudeman wrote:
           | With an empty text box, double escape shows you a list of
           | previous inputs from you. You can go back and fork at any one
           | of those.
        
         | oars wrote:
         | I tell Claude that it wrote XYZ in another session (I wrote it)
         | then use that context to ask questions or make changes.
        
         | gdudeman wrote:
         | I'll note this saves a lot of wait time as well! No sitting
         | there while a new Claude builds context from scratch.
        
         | i_have_an_idea wrote:
         | This sounds like the programmer equivalent of astrology.
         | 
         | > Build context for the work you're doing. Put lots of your
         | codebase into the context window.
         | 
          | If you don't say that, what do you think happens as the agent
          | works on your codebase?
        
         | insane_dreamer wrote:
         | I usually tell CC (or opencode, which I've been using recently)
         | to look up the files and find the relevant code. So I'm not
         | attaching a huge number of files to the context. But I don't
         | actually know whether this saves tokens or not.
        
         | Wowfunhappy wrote:
         | I do this all the time and it sometimes works, but it's not a
         | silver bullet. Sometimes Claude benefits from having the full
         | conversation.
        
       | ZeroCool2u wrote:
       | It's great they've finally caught up, but unfortunate it's on
       | their mid-tier model only and it's laughably expensive.
        
       | thimabi wrote:
       | Oh, well, ChatGPT is being left in the dust...
       | 
       | When done correctly, having one million tokens of context window
       | is amazing for all sorts of tasks: understanding large codebases,
       | summarizing books, finding information on many documents, etc.
       | 
       | Existing RAG solutions fill a void up to a point, but they lack
       | the precision that large context windows offer.
       | 
       | I'm excited for this release and hope to see it soon on the UI as
       | well.
        
         | OutOfHere wrote:
         | Fwiw, OpenAI does have a decent active API model family of
         | GPT-4.1 with a 1M context. But yes, the context of the GPT-5
         | models is terrible in comparison, and it's altogether atrocious
         | for the GPT-5-Chat model.
         | 
         | The biggest issue in ChatGPT right now is a very inconsistent
         | experience, presumably due to smaller models getting used even
         | for paid users with complex questions.
        
       | kotaKat wrote:
       | A million tokens? Damn, I'm gonna need a _lot_ of quarters to
       | play this game at Chuck-E-Cheese.
        
       | xnx wrote:
       | 1M context windows are not created equal. I doubt Claude's recall
       | is as good as Gemini's 1M context recall.
       | https://cloud.google.com/blog/products/ai-machine-learning/t...
        
         | xnx wrote:
         | Good analysis here:
         | https://news.ycombinator.com/item?id=44878999
         | 
         | > the model that's best at details in long context text and
         | code analysis is still Gemini.
         | 
         | > Gemini Pro and Flash, by comparison, are far cheaper
        
       | firasd wrote:
       | A big problem with the chat apps (ChatGPT; Claude.ai) is the
       | weird context window hijinks. Especially ChatGPT does wild
       | stuff.. sudden truncation; summarization; reinjecting 'ghost
       | snippets' etc
       | 
       | I was thinking this should be up to the user (do you want to
       | continue this conversation with context rolling out of the window
       | or start a new chat) but now I realized that this is inevitable
       | given the way pricing tiers and limited computation works. Like
       | the only way to have full context is use developer tools like
       | Google AI Studio or use a chat app that wraps the API
       | 
        | With a custom chat app that wraps the API you can even inject
        | the current timestamp into each message and just ask the LLM:
        | btw, every 10 minutes make a new row in a markdown table that
        | summarizes that 10 min chunk
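        | 
        | A minimal sketch of that kind of wrapper using the anthropic
        | Python SDK (the model name, system prompt, and timestamp format
        | here are illustrative assumptions, not anything from the
        | announcement):
        | ```python
        | from datetime import datetime, timezone
        | import anthropic
        | 
        | client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
        | history = []
        | 
        | SYSTEM = ("Every 10 minutes of conversation time, add a new row to a "
        |           "markdown table summarizing that 10-minute chunk.")
        | 
        | def send(user_text: str) -> str:
        |     # Prepend a timestamp so the model can track elapsed time itself
        |     stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
        |     history.append({"role": "user", "content": f"[{stamp}] {user_text}"})
        |     reply = client.messages.create(
        |         model="claude-sonnet-4-20250514",
        |         max_tokens=1024,
        |         system=SYSTEM,
        |         messages=history,
        |     )
        |     text = reply.content[0].text
        |     history.append({"role": "assistant", "content": text})
        |     return text
        | ```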
        
         | cruffle_duffle wrote:
         | > btw every 10 minutes just make a new row in a markdown table
         | that summarizes every 10 min chunk
         | 
         | Why make it time based instead of "message based"... like
         | "every 10 messages, summarize to blah-blah.md"?
        
           | dev0p wrote:
           | Probably it's more cost effective and less error prone to
           | just dump the message log rather than actively rethink the
           | context window, costing resources and potentially losing
            | information in the process. As the models get better, this
            | might change.
        
           | firasd wrote:
           | Sure. But you'd want to help out the LLM with a message count
           | like this is message 40, this is message 41... so when it
           | hits message 50 it's like ahh time for a new summary and call
           | the memory_table function (cause it's executing the earlier
           | standing order in your prompt)
        
       | tosh wrote:
       | How did they do the 1M context window?
       | 
       | Same technique as Qwen? As Gemini?
        
       | deadbabe wrote:
       | Unfortunately, larger context isn't really the answer after a
       | certain point. Small focused context is better, lazily throwing a
       | bunch of tokens in as a context is going to yield bad results.
        
       | ramoz wrote:
       | Awesome addition to a great model.
       | 
       | The best interface for long context reasoning has been AIStudio
       | by Google. Exceptional experience.
       | 
       | I use Prompt Tower to create long context payloads.
        
       | simianwords wrote:
       | How does "supporting 1M tokens" really work in practice? Is it a
       | new model? Or did they just remove some hard coded constraint?
        
         | eldenring wrote:
         | Serving a model efficiently at 1M context is difficult and
         | could be much more expensive/numerically tricky. I'm guessing
          | they were working on serving it properly, since it's the same
         | "model" in scores and such.
        
           | simianwords wrote:
           | Thanks - still not clear what they did really. Some inference
           | time hacks?
        
             | FergusArgyll wrote:
             | That would imply the model always had a 1m token context
             | but they limited it in the api and app? That's strange
             | because they can just charge more for every token past 250k
             | (like google does, I believe).
             | 
              | But if not, shouldn't it have to be a completely retrained
              | model? It's clearly not that - good question!
        
       | nickphx wrote:
       | Yay, more room for stray cats.
        
       | alienbaby wrote:
       | The fracturing of all the models offered across providers is
       | annoying. The number of different models and the fact a given
       | model will have different capabilities from different providers
       | is ridiculous.
        
       | chrisweekly wrote:
       | Peer of this post currently also on HN front page, comparing perf
       | for Claude vs Gemini, w/ 1M tokens:
       | https://news.ycombinator.com/item?id=44878999
        
       | DiabloD3 wrote:
       | Neat. I do 1M tokens context locally, and do it entirely with a
       | single GPU and FOSS software, and have access to a wide range of
       | models of equivalent or better quality.
       | 
       | Explain to me, again, how Anthropic's flawed business model
       | works?
        
         | codazoda wrote:
         | Tell us more?
        
           | DiabloD3 wrote:
           | Nothing really to say, its just like everyone else's
           | inference setups.
           | 
           | Select a model that produces good results, has anywhere from
           | 256k to 1M context (ex: Qwen3-Coder can do 1M), is under one
           | of the acceptable open weights licenses, and run it in
           | llama.cpp.
           | 
           | llama.cpp can split layers between active and MoE, and only
           | load the active ones into vram, leaving the rest of it
           | available for context.
           | 
           | With Qwen3-Coder-30B-A3B, I can use Unsloth's Q4_K_M, consume
           | a mere 784MB of VRAM with the active layers, then consume
           | 27648MB (kv cache) + 3096MB (context) with the kv cache
           | quantized to iq4_nl. This will fit onto a single card with
           | 32GB of VRAM, or slightly spill over on 24GB.
           | 
           | Since I don't personally need that much, I'm not pouring
           | entire projects into it (I know people do this, and more data
           | _does not produce better results_ ), I bump it down to 512k
           | context and fit it in 16.0GB, to avoid spill over on my 24GB
           | card. In the event I do need the context, I am always free to
           | enable it.
           | 
           | I do not see a meaningful performance difference between all
           | on the card and MoE sent to RAM while active is on VRAM, its
           | very much a worthwhile option for home inference.
           | 
           | Edit: For completeness sake, 256k context with this
           | configuration is 8.3GB total VRAM, making _very_ budget good
           | inference absolutely possible.
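            | 
            | As a rough sanity check on those numbers, here is a
            | back-of-the-envelope calculation (the layer/head figures
            | are my assumptions about Qwen3-Coder-30B-A3B's shape, not
            | something from the thread):
            | ```python
            | # Assumed model shape: 48 layers, 4 KV heads (GQA), head_dim 128
            | layers, kv_heads, head_dim = 48, 4, 128
            | bytes_per_elem = 0.5625          # ~4.5 bits/element for iq4_nl
            | ctx = 1_048_576                  # 1M tokens
            | 
            | per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # K + V
            | total_mib = per_token * ctx / (1024 ** 2)
            | print(f"{per_token:.0f} bytes/token, {total_mib:.0f} MiB")
            | # -> 27648 bytes/token, 27648 MiB, matching the figure above
            | ```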
        
       | ffitch wrote:
        | I wonder how modern models fare on NovelQA and FLenQA (benchmarks
       | that test ability to understand long context beyond needle in a
       | haystack retrieval). The only such test on a reasoning model that
       | I found was done on o3-mini-high
       | (https://arxiv.org/abs/2504.21318), it suggests that reasoning
       | noticeably improves FLenQA performance, but this test only
       | explored context up to 3,000 tokens.
        
       | dang wrote:
       | Related ongoing thread:
       | 
       |  _Claude vs. Gemini: Testing on 1M Tokens of Context_ -
       | https://news.ycombinator.com/item?id=44878999 - Aug 2025 (9
       | comments)
        
       | whalesalad wrote:
       | My first thought was "gg no re" can't wait to see how this
       | changes compaction requirements in claude code.
        
       | pmxi wrote:
       | The reason I initially got interested in Claude was because they
       | were the first to offer a 200K token context window. That was
       | massive in 2023. However, they didn't keep up once Gemini offered
       | a 1M token window last year.
       | 
       | I'm glad to see an attempt to return to having a competitive
       | context window.
        
       | markb139 wrote:
       | I've tried 2 AI tools recently. Neither could produce the correct
       | code to calculate the CPU temperature on a Raspberry Pi RP2040.
       | The code worked, looked ok and even produced reasonable looking
       | results - until I put a finger on the chip and thus raised the
       | temp. The calculated temperature went down. As an aside the free
       | version of chatGPT didn't know about anything newer than 2023 so
       | couldn't tell me about the RP2350
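        | 
        | For reference, the conversion the RP2040 datasheet gives (27 degC
        | at 0.706 V, with a -1.721 mV/degC slope) looks roughly like this
        | in MicroPython; this is my own sketch, and a sign slip here is
        | exactly the kind of thing that makes the reading move the wrong
        | way:
        | ```python
        | # RP2040 on-die temperature sensor is on ADC channel 4
        | import machine
        | 
        | sensor = machine.ADC(4)
        | 
        | def cpu_temp_c() -> float:
        |     raw = sensor.read_u16()        # 0..65535
        |     volts = raw * 3.3 / 65535      # assumes 3.3 V ADC reference
        |     return 27 - (volts - 0.706) / 0.001721
        | ```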
        
         | anvuong wrote:
          | How can you be sure putting your finger on the chip raises the
          | temp? If it feels hot, that means heat from the chip is being
          | transferred to your finger, which may decrease the temp, no?
        
         | broshtush wrote:
         | From my understanding putting your finger on an uncooled CPU
         | acts like a passive cooler, thus actually decreasing
         | temperature.
        
         | fwip wrote:
         | I don't think a larger context window would help with that.
        
           | fpauser wrote:
           | Best comment ;)
        
         | ghjv wrote:
         | wouldn't your finger have acted as a heat sink, lowering the
         | temp? sounds like the program may have worked correctly. could
         | be worth trying again with a hot enough piece of metal instead
         | of your finger
        
       | logicchains wrote:
       | With that pricing I can't imagine why anyone would use Claude
       | Sonnet through the API when Gemini 2.5 Pro is both better and
       | cheaper (especially at long-context understanding).
        
         | CuriouslyC wrote:
         | Claude is a good deal with the $20 subscription giving a fair
         | amount of sonnet use with Code. It's also got a very distinct
         | voice as far as LLMs go, and tends to produce cleaner/clearer
         | writing in general. I wouldn't use the API in an application
         | but the subscription feels like a pretty good deal.
        
       | siva7 wrote:
       | Ah, so claude code on subscription will become a crippled down
       | version
        
       | joduplessis wrote:
        | As far as coding goes, Claude seems to be the most competent right
        | now; I like it. GPT5 is abysmal - I'm not sure if it's bugs
        | or what, but the new release takes a good few steps back. Gemini
        | is still hit and miss - and Grok seems to be a poor man's Claude
        | (the code is kind of okay, a bit buggy, and somehow similar to
        | Claude).
        
       | brokegrammer wrote:
       | Many people are confused about the usefulness of 1M tokens
       | because LLMs often start to get confused after about 100k. But
       | this is big for Claude 4 because it uses automatic RAG when the
       | context becomes large. With optimized retrieval thanks to RAG,
       | we'll be able to make good use of those 1M tokens.
        
         | m4r71n wrote:
         | How does this work under the hood? Does it build an in-memory
          | vector database of the input sources and run queries on top of
         | that data to supplement the context window?
        
       | Balgair wrote:
       | Wow!
       | 
       | As a fiction writer/noodler this is amazing. I can put not just a
       | whole book in as before, not just a whole series, but the entire
       | corpus of author _s_ in.
       | 
       | I mean, from the pov of biography writers, this is awesome too.
       | Just dump it all in, right?
       | 
        | I'll have to switch to using Sonnet 4 now for workflows and edit
        | my RAG code to use longer windows, a _lot_ longer
        
       | irthomasthomas wrote:
       | Brain: Hey, you going to sleep? Me: Yes. Brain: That 200,001st
       | token cost you $600,000/M.
        
       | qwertox wrote:
       | > desperately need LLMs to maintain extremely effective context
       | 
       | Last time I used Gemini it did something very surprising: instead
       | of providing readable code, it started to generate pseudo-
       | minified code.
       | 
        | Like one CSS class would become one long line of CSS, and one JS
        | function became one long line of JS, with most of the variable
        | names minified, while some remained readable, but short. It did
        | away with all unnecessary spaces.
       | 
       | I was asking myself what is happening here, and my only
       | explanation was that maybe Google started training Gemini on
       | minified code, on making Gemini understand and generate it, in
       | order to maximize the value of every token.
        
       | ericol wrote:
       | "...in API"
       | 
        | That's a VERY relevant clarification. This DOESN'T apply to web
        | or app users.
       | 
       | Basically, if you want a 1M context window you have to
       | specifically pay for it.
        
       | sporkland wrote:
        | Does anyone have data on how much better these 1M token context
        | models perform compared to the more limited windows paired with
        | certain RAG implementations? Or how much better, in the face of
        | RAG, the 200k vs 1M token models perform on a benchmark?
        
       | poniko wrote:
       | [Claude usage limit reached. Your limit will reset at..] .. eh
       | lunch is a good time to go home anyways..
        
       | chmod775 wrote:
       | For some context, only the tweaks files and scripting parts of
       | Cyberpunk 2077 are ~2 million LOC.
        
       | not_that_d wrote:
       | My experience with the current tools so far:
       | 
        | 1. It helps to get me going with new languages, frameworks,
        | utilities or full green field stuff. After that I spend a lot of
        | time parsing the code to understand what it wrote; I kind of
        | "trust" it because checking is too tedious, but "it works".
       | 
       | 2. When working with languages or frameworks that I know, I find
        | it makes me unproductive: the amount of time I spend writing a
        | good enough prompt with the correct context is almost the same
        | or more than if I write the stuff myself, and to be honest the
        | solution it gives me works for this specific case but looks
        | like junior code with pitfalls that are not that obvious unless
        | you have the experience to spot them.
       | 
       | I used it with Typescript, Kotlin, Java and C++, for different
       | scenarios, like websites, ESPHome components (ESP32), backend
       | APIs, node scripts etc.
       | 
        | Bottom line: useful for hobby projects, scripts and prototypes,
        | but for enterprise level code it is not there.
        
         | jeremywho wrote:
         | My workflow is to use Claude desktop with the filesystem mcp
         | server.
         | 
         | I give claude the full path to a couple of relevant files
         | related to the task at hand, ie where the new code should hook
         | into or where the current problem is.
         | 
         | Then I ask it to solve the task.
         | 
         | Claude will read the files, determine what should be done and
         | it will edit/add relevant files. There's typically a couple of
         | build errors I will paste back in and have it correct.
         | 
         | Current code patterns & style will be maintained in the new
         | code. It's been quite impressive.
         | 
         | This has been with Typescript and C#.
         | 
         | I don't agree that what it has produced for me is hobby-grade
         | only...
        
           | taberiand wrote:
           | I've been using it the same way. One approach that's worked
           | well for me is to start a project and first ask it to analyse
           | and make a plan with phases for what needs to be done, save
           | that plan into the project, then get it to do each phase in
           | sequence. Once it completes a phase, have it review the code
           | to confirm if the phase is complete. Each phase of work and
           | review is a new chat.
           | 
           | This way helps ensure it works on manageable amounts of code
           | at a time and doesn't overload its context, but also keeps
           | the bigger picture and goal in sight.
        
             | mnky9800n wrote:
             | I find that sometimes this works great and sometimes it
             | happily tells you everything works and your code fails
             | successfully and if you aren't reading all the code you
             | would never know. It's kind of strange actually. I don't
             | have a good feeling when it will get everything correct and
             | when it will fail and that's what is disconcerting. I would
             | be happy to be given advice on what to do to untangle when
             | it's good and when it's not. I love chatting with Claude
             | code about code. It's annoying that it doesn't always get
             | it right and also doesn't really interact with failure like
             | a human would. At Least in my experience anyways.
        
               | taberiand wrote:
               | Of course, everything needs to be verified - I'm just
               | trying to figure out a process that enables it to work as
               | effectively as it can on large code bases in a structured
               | way. Committing each stage to git, fixing issues and
               | adjusting the context still comes into play.
        
           | nwatson wrote:
           | One can also integrate with, say, a running PyCharm with the
           | Jetbrains IDE MCP server. Claude Desktop can then interact
           | directly with PyCharm.
        
           | hamandcheese wrote:
           | Any particular reason you prefer that over Claude code?
        
             | jeremywho wrote:
             | I'm on windows. Claude Code via WSL hasn't been as smooth a
             | ride.
        
         | risyachka wrote:
         | Pretty much my experience too.
         | 
          | I usually go to option 2 - just write it myself, as it is the
          | same time-wise but keeps skills sharp.
        
           | fpauser wrote:
           | Not to degenerate is really challenging these days. There are
           | the bubbles that simulate multiple realities to us and try to
           | untrain us logic thinking. And there are the llms that try to
           | convice us that self thinking is unproductive. I wonder when
           | this digitalophily suddenly turns into digitalophobia.
        
         | flowerthoughts wrote:
         | I predict microservices will get a huge push forward. The
         | question then becomes if we're good enough at saying "Claude,
         | this is too big now, you have to split it in two services" or
         | not.
         | 
         | If LLMs maintain the code, the API boundary
         | definitions/documentation and orchestration, it might be
         | manageable.
        
           | fsloth wrote:
           | Why microservices? Monoliths with code-golfed minimal
           | implementation size (but high quality architecture)
           | implemented in strongly typed language would consume far less
           | tokens (and thus would be cheaper to maintain).
        
           | arwhatever wrote:
           | Won't this cause [insert LLM] to lose context around the
           | semantics of messages passed between microservices?
           | 
           | You could then put all services in 1 repo, or point LLM at X
           | number of folders containing source for all X services, but
           | then it doesn't seem like you'll have gained anything, and at
           | the cost of added network calls and more infra management.
        
           | urbandw311er wrote:
           | Why not just cleanly separated code in a single execution
           | environment? No need to actually run the services in separate
           | execution environments just for the sake of an LLM being able
           | to parse it, that's crazy! You can just give it the files or
           | folders it needs for the particular services within the
           | project.
           | 
           | Obviously there's still other reasons to create micro
           | services if you wish, but this does not need to be another
           | reason.
        
         | fpauser wrote:
         | Same conclusion here. Also good for analyzing existing
         | codebases and to generate documentation for undocumented
         | projects.
        
           | j45 wrote:
           | It's quite good at this, I have been tying in Gemini Pro with
           | this too.
        
         | johnisgood wrote:
         | > but for enterprise level code it is not there
         | 
         | It is good for me in Go but I had to tell it what to write and
         | how.
        
           | sdesol wrote:
           | I've been able to create a very advanced search engine for my
           | chat app that is more than enterprise ready. I've spent a
           | decade thinking about search, but in a different language.
           | Like you, I needed to explain what I knew about writing a
           | search engine in Java for the LLM, to write it in JavaScript
           | using libraries I did not know and it got me 95% of the way
           | there.
           | 
           | It is also incredibly important to note that the 5% that I
           | needed to figure out was the difference between throw away
           | code and something useful. You absolutely need domain
           | knowledge but LLMs are more than enterprise ready in my
           | opinion.
           | 
           | Here is some documentation on how my search solution is used
           | in my app to show that it is not a hobby feature.
           | 
           | https://github.com/gitsense/chat/blob/main/packages/chat/wid.
           | ..
        
             | johnisgood wrote:
             | Thanks for your reply, I am in the same boat, and it works
             | for me, like it seems to work for you. So as long as we are
             | effective with it, why not? Of course I am not doing things
             | blindly and expect good results.
        
         | jiggawatts wrote:
         | Something I've discovered is that it may be worthwhile writing
         | the prompt anyway, even for a framework you're an expert with.
         | Sometimes the AIs will surprise me with a novel approach, but
         | the real value is that the prompt makes for _excellent_
         | documentation of the requirements! It's a much better starting
         | point for doc-comments or PR blurbs than after-the-fact
         | ramblings.
        
         | viccis wrote:
         | I agree. For me it's a modern version of that good ol "rails
         | new" scaffolding with Ruby on Rails that got you started with a
         | project structure. It makes sense because LLMs are particularly
         | good at tasks that require little more knowledge than just a
         | near perfect knowledge of the documentation of the tooling
         | involved, and creating a well organized scaffold for a
         | greenfield project falls squarely in that area.
         | 
         | For legacy systems, especially ones in which a lot of the
         | things they do are because of requirements from external
         | services (whether that's tech debt or just normal growing
         | complexity in a large connected system), it's less useful.
         | 
         | And for tooling that moves fast and breaks things (looking at
         | you, Databricks), it's basically worthless. People have already
         | brought attention to the fact that it will only be as current
         | as its training data was, and so if a bunch of terminology,
         | features, and syntax have changed since then (ahem,
         | Databricks), you would have to do some kind of prompt
         | engineering with up to date docs for it to have any hope of
         | succeeding.
        
         | alfalfasprout wrote:
         | The bigger problem I'm seeing is engineers that become over
         | reliant on vibe coding tools are starting to lose context on
         | how systems are designed and work.
         | 
         | As a result, their productivity might go up on simple "ticket
         | like tasks" where it's basically just simple implementation
         | (find the file(s) to edit, modify it, test it) but when they
         | start using it for all their tasks suddenly they don't know how
         | anything works. Or worse, they let the LLM dictate and bad
         | decisions are made.
         | 
         | These same people are also very dogmatic on the use of these
         | tools. They refuse to just code when needed.
         | 
         | Don't get me wrong, this stuff has value. But I just hate
         | seeing how it's made many engineers complacent and accelerated
         | their ability to add to tech debt like never before.
        
         | mnky9800n wrote:
         | Yea that's right. It's kind of annoying how useful it is for
         | hobby projects and it is suddenly useless on anything at work.
         | Haha. I love Claude code for some stuff (like generating a
         | notebook to analyse some data). But it really just disconnects
         | you from the problem you are solving without you going through
         | everything it writes. And I'm really bullish on ai coding tools
         | haha, for example:
         | 
         | https://open.substack.com/pub/mnky9800n/p/coding-agents-prov...
        
         | pqs wrote:
         | I'm not a programmer, but I need to write python and bash
         | programs to do my work. I also have a few websites and other
         | personal projects. Claude Code helps me implement those little
         | projects I've been wanting to do for a very long time, but I
         | couldn't due to the lack of coding experience and time. Now I'm
         | doing them. Also now I can improve my emacs environment,
         | because I can create lisp functions with ease. For me, this is
         | the perfect tool, because now I can do those little projects I
         | couldn't do before, making my life easier.
        
           | zingar wrote:
           | Big +1 to customizing emacs! Used to feel so out of reach,
           | but now I basically rolled my own cursor.
        
           | chamomeal wrote:
           | LLMs totally kick ass for making bash scripts
        
             | dboreham wrote:
             | Strong agree. Bash is so annoying that there have been many
             | scripts that I wanted to have, but just didn't write (did
             | the thing manually instead) rather than go down the rabbit
             | hole of Bash nonsense. LLMs turn this on its head. I
             | probably have LLMs write 1-2 bash scripts a week now, that
             | I commit to git for use now and later.
        
           | MangoCoffee wrote:
           | At the end of the day, all tools are made to make their
           | users' lives easier.
           | 
           | I use GitHub Copilot. I recently did a vibe code hobby
           | project for a command line tool that can display my
           | computer's IP, hard drive, hard drive space, CPU, etc. GPT
           | 4.1 did coding and Claude did the bug fixing.
           | 
           | The code it wrote worked, and I even asked it to create a
           | PowerShell script to build the project for release
        
         | stpedgwdgfhgdd wrote:
         | For enterprise software development CC is definitely there.
         | 100k Go code paas platform, micro services architecture, mono
         | repo is manageable.
         | 
         | The prompt needs to be good, but in plan mode it will
         | iteratively figure it out.
         | 
         | You need to have automated tests. For enterprise software
         | development that actually goes without saying.
        
         | dclowd9901 wrote:
         | It also steps right over easy optimizations. I was doing a
         | query on some github data (tedious work) and rather than
         | preliminarily filter down using the graphql search method, it
         | wanted to comb through all PRs individually. This seems like
         | something it probably should have figured out.
        
         | amelius wrote:
         | It is very useful for small tasks like fixing network problems,
         | or writing regexp patterns based on a few examples.
        
           | MarcelOlsz wrote:
            | _Here's how YOU can save $200/mo!_
        
         | brulard wrote:
         | For me it was like this for like a year (using Cline + Sonnet &
         | Gemini) until Claude Code came out and until I learned how to
         | keep context real clean. The key breakthrough was treating AI
         | as an architect/implementer rather than a code generator.
         | 
          | Most recently I first ask CC to create a design document for
          | what we are going to do. He has instructions to look into the
          | relevant parts of the code and docs to reference them. I
          | review it, and after a few back-and-forths we have defined
          | what we want to do. The next step is to chunk it into stages
          | and even those into smaller steps. All this may take a few
          | hours, but after this is well defined, I clear the context. I
          | then let him read the docs and
         | implement one stage. This goes mostly well and if it doesn't I
         | either try to steer him to correct it, or if it's too bad, I
         | improve the docs and start this stage over. After stage is
         | complete, we commit, clear context and proceed to next stage.
         | 
         | This way I spend maybe a day creating a feature that would take
         | me maybe 2-3. And at the end we have a document, unit tests,
          | storybook pages, and features that get overlooked like
         | accessibility, aria-things, etc.
         | 
         | At the very end I like another model to make a code review.
         | 
         | Even if this didn't make me faster now, I would consider it
         | future-proofing myself as a software engineer as these tools
         | are improving quickly
        
           | imiric wrote:
           | This is a common workflow that most advanced users are
           | familiar with.
           | 
           | Yet even following it to a T, and being _really_ careful with
           | how you manage context, the LLM will still hallucinate,
           | generate non-working code, steer you into wrong directions
           | and dead ends, and just waste your time in most scenarios.
            | There's no magical workflow or workaround for avoiding this.
           | These issues are inherent to the technology, and have been
           | since its inception. The tools have certainly gotten more
           | capable, and the ecosystem has matured greatly in the last
           | couple of years, but these issues remain unsolved. The idea
           | that people who experience them are not using the tools
           | correctly is insulting.
           | 
           | I'm not saying that the current generation of this tech isn't
           | useful. I've found it very useful for the same scenarios GP
           | mentioned. But the above issues prevent me from relying on it
           | for anything more sophisticated than that.
        
         | drums8787 wrote:
         | My experience is the opposite I guess. I am having a great time
         | using claude to quickly implement little "filler features" that
         | require a good amount of typing and pulling from/editing
         | different sources. Nothing that requires much brainpower beyond
         | remembering the details of some sub system, finding the right
         | files, and typing.
         | 
         | Once the code is written, review, test and done. And on to more
         | fun things.
         | 
         | Maybe what has made it work is that these tasks have all fit
         | comfortably within existing code patterns.
         | 
         | My next step is to break down bigger & more complex changes
         | into claude friendly bites to save me more grunt work.
        
           | unlikelytomato wrote:
           | I wish I shared this experience. There are virtually no
            | filler features for me to work on. When things feel like
           | filler on my team, it's generally a sign of tech debt and we
           | wouldn't want to have it generate all the code it would take.
           | What are some examples of filler features for you?
           | 
           | On the other hand, it does cost me about 8 hours a week
           | debugging issues created by bad autocompletes from my team.
           | The last 6 months have gotten really bad with that. But that
           | is a different issue.
        
         | apimade wrote:
         | Many who say LLMs produce "enterprise-grade" code haven't
         | worked in mid-tier or traditional companies, where projects are
         | held together by duct tape, requirements are outdated, and
         | testing barely exists. In those environments, enterprise-ready
         | code is rare even without AI.
         | 
         | For developers deeply familiar with a codebase they've worked
         | on for years, LLMs can be a game-changer. But in most other
         | cases, they're best for brainstorming, creating small tests, or
         | prototyping. When mid-level or junior developers lean heavily
         | on them, the output may look useful.. until a third-party
         | review reveals security flaws, performance issues, and built-in
         | legacy debt.
         | 
         | That might be fine for quick fixes or internal tooling, but
         | it's a poor fit for enterprise.
        
         | therealpygon wrote:
         | I mostly agree, with the caveat that I would say it can
         | absolutely be useful when used appropriately as an "assistant".
         | NOT vibe coding blindly and hoping what I end up with is
         | useful. "Implement x specific thing" (e.g. add an edit button
         | to component x), not "implement a whole new epic feature that
         | includes changes to a significant number of files". Imagine
         | meeting a house builder and saying "I want a house", then
         | leaving and expecting to come back to exactly the house you
         | dreamed of.
         | 
         | I get why, it's a test of just how intuitive the model can be
         | at planning and execution which drives innovation more than 1%
         | differences in benchmarks ever will. I encourage that
         | innovation in the hobby arena or when dogfooding your AI
         | engineer. But as a replacement developer in an enterprise where
         | an uncaught mistake could cost millions? No way. I wouldn't
         | even want to be the manager of the AI engineer team, when they
         | come looking for the only real person to blame for the mistake.
         | 
         | For additional checks, or internal tools, and for scripts?
          | Sure. It's incredibly useful with all sorts of non-application
         | development tasks. I've not written a batch or bash script in
         | forever...you just don't really need to much anymore. The
         | linear flow of most batch/bash/scripts (like you mentioned)
         | couldn't be a more suitable domain.
         | 
         | Also, with a basic prompt, it can be an incredibly useful
         | rubber duck. For example, I'll say something like "how do you
         | think I should solve x problem"(with tools for the codebase and
         | such, of course), and then over time having rejected and been
         | adversarial to every suggestion, I end up working through the
         | problem and have a more concrete mental design. Think "over-
         | eager junior know-it-all that tries to be right constantly"
         | without the person attached and you get a better idea of what
         | kind of LLM output you can expect. For me it's less about
         | wanting a plan from the LLM, and more about talking through the
         | problems my plan solves better.
        
       | TZubiri wrote:
       | Remember kids, just because you CAN doesn't mean you SHOULD
        
       | mrcwinn wrote:
       | This tells me they've gotten very good at caching and modeling
       | the impact of caching.
        
       | fpauser wrote:
        | I observed that Claude produces a lot of bloat. Wonder how such
        | LLM generated projects age.
        
       | cadamsdotcom wrote:
       | I'm glad to see the only company chasing margins - which they get
       | by having a great product and a meticulous brand - finding even
       | more ways to get margin. That's good business.
        
       | howinator wrote:
       | I could be wrong, but I think this pricing is the first to admit
       | that cost scales quadratically with number of tokens. It's the
       | first time I've seen nonlinear pricing from an LLM provider which
       | implicitly mirrors the inference scaling laws I think we're all
       | aware of.
        
         | jpau wrote:
         | Google[1] also has a "long context" pricing structure. OpenAI
         | may be considering offering similar since they do not offer
         | their priority processing SLAs[2] for context >128K.
         | 
         | [1] https://cloud.google.com/vertex-ai/generative-ai/pricing
         | 
         | [2] https://openai.com/api-priority-processing/
        
       | reverseblade2 wrote:
       | Does this cover subscription?
        
         | anonym29 wrote:
         | API only for now, but at the very bottom of the post: "We're
         | also exploring how to bring long context to other Claude
         | products."
         | 
         | So, not yet, but maybe someday?
        
       | _joel wrote:
       | Fantastic, use up your quota even more quickly. :)
        
       | phyzix5761 wrote:
       | What I've found with LLMs is they're basically a better version
       | of Google Search. If I need a quick "How do I do..." or if I need
       | to find a quick answer to something its way more useful than
       | Google and the fact that I can ask follow up questions is
       | amazing. But for any serious deep work it has a long way to go.
        
         | mr_moon wrote:
         | I feel exactly the same way. why skim and sift 15 different
         | stackoverflow posts when an LLM can pick out exactly the info I
         | need?
         | 
         | I don't need to spin up an entire feature in a few seconds. I
         | need help understanding where something is broken; what are
          | some opinions on best practice; or finding out what a poorly
         | written snippet is doing.
         | 
         | context still v important for this though and I appreciate
         | cranking that capacity. "read 15000 stackoverflow posts for me
         | please"
        
           | anvuong wrote:
            | The action of sifting through poop to find gold
           | actually positively develops my critical thinking skill. I,
           | too, went through a phase of just asking LLM for a specific
           | concept instead of Googling it and weave through dozens of
           | wiki pages or niche mailing list discussions. It did improve
           | my productivity but I feel like it dulls my brain. So
           | recently I have to tone that down and force myself to go back
           | to the old way. Maybe too much of a good thing is bad.
        
         | Whatarethese wrote:
         | This is my primary use of AI. Looking for a new mountain bike
         | and using AI to list and compare parts of the bike and which is
         | best for my use case scenario. Works pretty well so far.
        
       | meander_water wrote:
       | I like to spend a lot of time in "Ask" mode in Cursor. I guess
       | the equivalent in Claude code is "plan" mode.
       | 
       | Where I have minimal knowledge about the framework or language, I
       | ask a lot of questions about how the implementation would work,
       | what the tradeoffs are etc. This is to minimize any
       | misunderstanding between me and the tool. Then I ask it to write
       | the implementation plan, and execute it one by one.
       | 
       | Cursor lets you have multiple tabs open so I'll have a Ask mode
       | and Agent mode running in parallel.
       | 
       | This is a lot slower, and if it was a language/framework I'm
       | familiar with I'm more likely to execute the plan myself.
        
       | itissid wrote:
       | My experience with Claude code beyond building anything bigger
       | than a webpage, a small API, a tutorial on CSS etc has been
       | pretty bad. I think context length is a manageable problem, but
       | not the main one. I used it to write a 50K LoC python code base
       | with 300 unit tests and it went ok for the first few weeks and
       | then it failed. This is after there is a CLAUDE.md file for every
       | single module that needs it as well as detailed agents for
       | testing, design, coding and review.
       | 
        | I won't go into a case by case list of its failures. The core
        | of the issue is misaligned incentives, which I want to get into:
       | 
        | 1. The incentives for coding agents in general, and Claude in
        | particular, are to write LOTS of code. None of them -- zero --
        | are good at planning and verification.
       | 
        | 2. The involvement of the human, ironically, in a haphazard way
        | in the agent's process. And this has to do with how the problem
        | of coding for these agents is defined. Human developers are like
        | snowflakes when it comes to opinions on software design; there
        | is no way to apply each one's preferences (except papier-mache
        | and superglue of SO, Reddit threads and books) to the design of
        | the system in any meaningful way, and that makes a simple system
        | way too complex or a complex problem simplistic.
        | - There is no way to evolve the plan to accept new preferences
        | except text in a CLAUDE.md file in git that you will have to
        | read through and edit.
        | 
        | - There is no way to know the near term effect of code choices
        | now on 1 week from now.
        | 
        | - So much code is written that asking a person to review it, in
        | case you are at the envelope and pushing the limit, feels
        | morally wrong and an insane ask. How many of your code reviews
        | are instead replaced by 15-30 min design meetings to solicit
        | feedback on the design of the PR -- because it is so complex --
        | and just push the PR into dev? WTF am I even doing, I wonder.
        | 
        | - It does not know how far to explore for better rewards and
        | cannot tell them apart from local rewards, resulting in
        | commented out tests and deleted arbitrary code, to make its
        | plan "work".
       | 
        | In short, code is a commodity for CEOs of coding agent companies
        | and CXOs of your company to use (Salesforce has everyone coding,
        | but that just raises the floor and it's a good thing; it does
        | NOT lower the bar and make people 10x devs). All of them have
        | bought into this idea that 10x is somehow producing 10x code.
        | Your time reviewing, unmangling and maintaining the code is not
        | the commodity. It never ever was.
        
       | lpa22 wrote:
       | One of the most helpful usages of CC so far is when I simply ask:
       | 
       | "Are there any bugs in the current diff"
       | 
       | It analyzes the changes very thoroughly, often finds very subtle
       | bugs that would cost hours of time/deployments down the line, and
       | points out a bunch of things to think through for correctness.
        
       ___________________________________________________________________
       (page generated 2025-08-12 23:00 UTC)