[HN Gopher] Claude Sonnet 4 now supports 1M tokens of context
___________________________________________________________________
Claude Sonnet 4 now supports 1M tokens of context
Author : adocomplete
Score : 773 points
Date : 2025-08-12 16:02 UTC (6 hours ago)
(HTM) web link (www.anthropic.com)
(TXT) w3m dump (www.anthropic.com)
| throwaway888abc wrote:
| holy moly! awesome
| tankenmate wrote:
| This is definitely good to have as an option, but at the same
| time more context can reduce the quality of the output because
| it's easier for the LLM to get "distracted". So I wonder what
| will happen to the quality of code produced by tools like
| Claude Code if users don't properly understand the trade-off
| being made (if they leave it coding in auto mode right up to
| the auto-compact).
| jasonthorsness wrote:
| What do you recommend doing instead? I've been using Claude
| Code a lot but am still pretty novice at the best practices
| around this.
| TheDong wrote:
| Have the AI produce a plan that spans multiple files (like
| "01 create frontend.md", "02 create backend.md", "03 test
| frontend and backend running together.md"), and then create a
| fresh context for each step if it looks like re-using the
| same context is leading it to confusion.
|
| Also, commit frequently, and if the AI constantly goes down
| the wrong path ("I can't create X so I'll stub it out with Y,
| we'll fix it later"), you can update the original plan with
| wording to tell it not to take that path ("Do not ever stub
| out X, we must make X work"), and then start a fresh session
| with an older and simpler version of the code and see if that
| fresh context ends up down a better path.
|
| You can also run multiple attempts in parallel if you use
| tooling that supports that (containers + git worktrees is one
| way)
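| A minimal sketch of that parallel-attempts idea, assuming the
| `git` and `claude` CLIs are on PATH and that Claude Code's
| non-interactive `claude -p` mode is used; the plan file, branch
| names, and directory layout are made up for illustration:
|
|     import subprocess
|     from pathlib import Path
|
|     # Read one step of the previously written plan.
|     plan_step = Path("plans/01-create-frontend.md").read_text()
|
|     for i in range(3):
|         branch = f"attempt-{i}"
|         worktree = Path("../attempts") / branch
|         # One git worktree (and branch) per attempt, so runs
|         # cannot clobber each other's files.
|         subprocess.run(
|             ["git", "worktree", "add", "-b", branch, str(worktree)],
|             check=True,
|         )
|         # Start a fresh, non-interactive session in that worktree.
|         subprocess.Popen(["claude", "-p", plan_step], cwd=worktree)
|
|     # Afterwards: diff the attempt branches, keep the best one,
|     # and prune the rest with `git worktree remove`.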
| F7F7F7 wrote:
| Inevitably the files become a mess of their own. Changes and
| learnings from one part of the plan often don't result in
| adaptation to the impacted plans further down the chain.
|
| In the end you have a mishmash of half-implemented plans and
| now you've lost context too. Which leads to blowing tokens on
| trying to figure out what's been implemented, what's half
| baked, and what was completely ignored.
|
| Any links to anyone who's built something at scale using
| this method? It always sounds good on paper.
|
| I'd love to find a system that works.
| brandall10 wrote:
| My system is to create detailed feature files up to a few
| hundred lines in size that are immutable, and then have a
| status.md file (preferably kept to about 50 lines) that
| links to a current feature that is used as a way to keep
| track of the progress on that feature.
|
| Additionally I have a Claude Code command with
| instructions referencing the status.md, how to select the
| next task, how to compact status.md, etc.
|
| Every time I'm done with a unit of work from that feature
| - always triggered w/ ultrathink - I'll put up a PR and
| go through the motions of extra refactors/testing. For
| more complex PRs that require many extra commits to get
| prod ready I just let the sessions auto-compact.
|
| After merging I'll clear the context and call the CC
| command to progress to the next unit of work.
|
| This allows me to put up to around 4-5 meaningful PRs per
| feature if it's reasonably complex while keeping the
| context relatively tight. The current project I'm focused
| on is just over 16k LOC in swift (25k total w/ tests) and
| it seems to work pretty well - it rarely gets off track,
| does unnecessary refactors, destroys working features,
| etc.
| nzach wrote:
| Care to elaborate on how you use the status.md file? What
| exactly you put in there, and what value does it bring?
| brandall10 wrote:
| When I initially have it built from a feature file, it
| pulls in the most pertinent high level details from that
| and creates a supercharged task list that is updated w/
| implementation details as the feature progresses.
|
| As it links to the feature file as well, that is pulled
| into the context, but status.md is there to essentially
| act as a 'cursor' to where it is in the implementation
| and provide extended working memory - that Claude itself
| manages - specific to that feature. With that you can
| work on bite sized chunks of the feature each with a
| clean context. When the feature is complete it is
| trashed.
|
| I've seen others try to achieve similar things by making
| CLAUDE.md or the feature file mutable but that IME is a
| bad time. CLAUDE.md should be lean with the details to
| work on the project, and the feature file can easily be
| corrupted in an unintended way allowing things to go
| wayward in scope.
| nzach wrote:
| In my experience it works better if you create one plan
| at a time. Create a prompt, make claude implement it and
| then you make sure it is working as expected. Only then
| you ask for something new.
|
| I've created an agent to help me create the prompts, it
| goes something like this: "You are an Expert Software
| Architect specializing in creating comprehensive, well-
| researched feature implementation prompts. Your sole
| purpose is to analyze existing codebases and
| documentation to craft detailed prompts for new features.
| You always think deeply before giving an answer...."
|
| My workflow is: 1) use this agent to create a prompt for
| my feature; 2) ask claude to create a plan for the just
| created prompt; 3) ask claude to implement said plan if
| it looks good.
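| A minimal sketch of that prompt-writing step using the Anthropic
| Python SDK; the model name, trimmed system prompt, and feature
| description are illustrative, not the poster's actual setup:
|
|     import anthropic
|
|     client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
|
|     ARCHITECT_SYSTEM = (
|         "You are an Expert Software Architect specializing in "
|         "creating comprehensive, well-researched feature "
|         "implementation prompts. Analyze the provided codebase "
|         "notes and craft a detailed prompt for the new feature."
|     )
|
|     response = client.messages.create(
|         model="claude-sonnet-4-20250514",
|         max_tokens=2048,
|         system=ARCHITECT_SYSTEM,
|         messages=[{
|             "role": "user",
|             "content": "Feature: add CSV export to the reports "
|                        "page. Codebase notes: ...",
|         }],
|     )
|     # The generated implementation prompt, ready for step 2.
|     print(response.content[0].text)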
| cube00 wrote:
| >You always think deeply before giving an answer...
|
| Nice try but they're not giving you the "think deeper"
| level just because you asked.
| nzach wrote:
| https://docs.anthropic.com/en/docs/build-with-
| claude/prompt-...
| dpe82 wrote:
| Actually that's exactly how you do it.
| theshrike79 wrote:
| I use Gemini-cli (free 2.5 pro for an undetermined time
| before it self-lobotomises and switches to lite) to keep
| the specs up to date.
|
| The actual tasks are stored in Github issues, which
| Claude (and sometimes Gemini when it feels like it) can
| access using the `gh` CLI tool.
|
| But it's all just project management, if what the code
| says drifts from what's in the specs (for any reason),
| one of them has to change.
|
| Claude does exactly what the documentation says; it doesn't
| notice that the code is completely different and adapt, like a
| human would.
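| A small sketch of pulling those GitHub-issue tasks into a
| session, assuming an authenticated `gh` CLI; the label name is
| hypothetical:
|
|     import json
|     import subprocess
|
|     # List open issues tagged as agent tasks, as JSON.
|     out = subprocess.run(
|         ["gh", "issue", "list", "--label", "ai-task",
|          "--state", "open", "--json", "number,title"],
|         check=True, capture_output=True, text=True,
|     )
|     for issue in json.loads(out.stdout):
|         print(f"#{issue['number']}: {issue['title']}")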
| bredren wrote:
| Don't rely entirely on CC. Once a milestone has been
| reached, copy the full patch to clipboard and the
| technical spec covering it. Provide the original files, the
| patch and the spec to Gemini and ask, roughly: a colleague did
| this work; does it fulfill the aims, the spec, and best
| practices?
|
| Pick among the best feedback to polish the work done by
| CC---it will miss things that Gemini will catch.
|
| Then do it again. Sometimes CC just won't follow feedback
| well and you gotta make the changes yourself.
|
| If you do this you'll move more gradually, but by the nature
| of the pattern you'll look at the changes more closely.
|
| You'll be able to realign CC with the spec afterward with
| a fresh context and the existing commits showing the way.
|
| Fwiw, this kind of technique can be done entirely without CC
| and can lead to excellent results faster, as Gemini can look
| at the full picture all at once, vs. having to force CC to
| hunt and peck through slices of files.
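| A rough sketch of that cross-review step with the
| google-generativeai SDK; the model name, branch names, and file
| paths are illustrative:
|
|     import subprocess
|     import google.generativeai as genai
|
|     genai.configure(api_key="...")  # or read from the environment
|
|     # The patch produced for the CC milestone, plus the spec.
|     patch = subprocess.run(
|         ["git", "diff", "main...HEAD"],
|         check=True, capture_output=True, text=True,
|     ).stdout
|     spec = open("docs/feature-spec.md").read()
|
|     model = genai.GenerativeModel("gemini-2.5-pro")
|     review = model.generate_content(
|         "A colleague implemented this change. Does the patch "
|         "fulfill the spec and follow best practices? List "
|         "concrete issues.\n\nSPEC:\n" + spec + "\n\nPATCH:\n" + patch
|     )
|     print(review.text)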
| wongarsu wrote:
| Changing the prompt and rerunning is something where Cursor
| still has a clear edge over Claude Code. It's such a
| powerful technique for keeping the context small because it
| keeps the context clear of back-and-forths and dead ends. I
| wish it was more universally supported
| abound wrote:
| I do this all the time in Claude Code, you hit Escape
| twice and select the conversation point to 'branch' from.
| agotterer wrote:
| I use the main Claude code thread (I don't know what to call
| it) for planning and then explicitly tell Claude to delegate
| certain standalone tasks out to subagents. The subagents
| don't consume the main thread's context window. Even just
| delegating testing, debugging, and building will save a ton of
| context.
| sixothree wrote:
| Using /clear often is really the first tool for context
| management. Do this when you finish a task.
| tehlike wrote:
| Some reference:
|
| https://simonwillison.net/2025/Jun/29/how-to-fix-your-contex...
|
| https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-ho...
| bachittle wrote:
| As of now it's not integrated into Claude Code. "We're also
| exploring how to bring long context to other Claude products".
| I'm sure they already know about this issue and are trying to
| think of solutions before letting users incur more costs on
| their monthly plans.
| PickledJesus wrote:
| Seems to be for me, I came to look at HN because I saw it was
| the default in CC
| novaleaf wrote:
| where do you see it in CC?
| PickledJesus wrote:
| I got a notification when I opened it, indicating that
| the default had changed, and I can see it on /model.
|
| Only on a max (20x) account, not there on a Pro one.
| novaleaf wrote:
| thanks, FYI I'm on a max 20x also and I don't see it!
| tankenmate wrote:
| maybe a staggered release?
| Wowfunhappy wrote:
| I'm curious, what does it say on /model?
|
| For reference, my options are:
|
|   Select Model
|   Switch between Claude models. Applies to this session and
|   future Claude Code sessions. For custom model names, specify
|   with --model.
|
|   1. Default (recommended)  Opus 4.1 for up to 50% of usage
|                             limits, then use Sonnet 4
|   2. Opus                   Opus 4.1 for complex tasks
|                             * Reaches usage limits faster
|   3. Sonnet                 Sonnet 4 for daily use
|   4. Opus Plan Mode         Use Opus 4.1 in plan mode,
|                             Sonnet 4 otherwise
| dbreunig wrote:
| The team at Chroma is currently looking into this and should
| have some figures.
| falcor84 wrote:
| Strange that they don't mention whether that's enabled or
| configurable in Claude Code.
| farslan wrote:
| Yeah same, I'm curious about this. I would guess it's by
| default enabled with Claude Code.
| csunoser wrote:
| They don't say it outright. But I think it is not in Claude
| Code yet.
|
| > We're also exploring how to bring long context to other
| Claude products. - Anthropic
|
| That is, any product that is not the Anthropic API (Tier 4) or
| Amazon Bedrock.
| CharlesW wrote:
| From a co-marketing POV, it's considered best practice to not
| discuss home-grown offerings in the same or similar category as
| products from the partners you're featuring.
|
| It's likely they'll announce this week, albeit possibly just
| within the "what's new" notes that you see when Claude Code is
| updated.
| faangguyindia wrote:
| In my testing the gap between Claude and Gemini 2.5 Pro is
| small. My company is in Asia Pacific and we can't get access
| to Claude via Vertex for some stupid reason.
|
| But I tested it via other providers; the gap used to be huge,
| but not anymore.
| Tostino wrote:
| For me the gap is pretty large (in Gemini Pro 2.5's favor).
|
| For reference, the code I am working on is a Spring Boot /
| (Vaadin) Hilla multi-module project with helm charts for
| deployment and a separate Python based module for ancillary
| tasks that were appropriate for it.
|
| I've not been able to get any good use out of Sonnet in months
| now, whereas Gemini Pro 2.5 has (still) been able to grok the
| project well enough to help out.
| jona777than wrote:
| I initially found Gemini Pro 2.5 to work well for coding.
| Over time, I found Claude to be more consistently productive.
| Gemini Pro 2.5 became my go-to for use cases benefitting from
| larger context windows. Claude seemed to be the safer daily
| driver (if I needed to get something done.)
|
| All that being said, Gemini has been consistently dependable
| when I had asks that involved large amounts of code and data.
| Claude and the OpenAI models struggled with some tasks that
| Gemini responsively satisfied seemingly without "breaking a
| sweat."
|
| Lately, it's been GPT-5 for brainstorming/planning, Claude
| for hammering out some code, Gemini when there is huge
| data/code requirements. I'm curious if the widened Sonnet 4
| context window will change things.
| llm_nerd wrote:
| Opus 4.1 is a much better model for coding than Sonnet. The
| latter is good for general queries / investigations or to
| draw up some heuristics.
|
| I have paid subscriptions to both Gemini Pro and Claude.
| Hugely worthwhile expense professionally.
| faangguyindia wrote:
| When Gemini 2.5 Pro gets stuck, I often use DeepSeek R1 in
| architect mode and Qwen3 in coder mode in Aider, and that
| solves all the problems.
|
| Last month I ran into some wicked dependency bug and only
| ChatGPT could solve it, which I'm guessing is because it has
| hot data from GitHub?
|
| On the other hand, I really need a tool like Aider where I
| can use various models in "architect" and "coder" mode.
|
| What I've found is that better reasoning models tend to be bad
| at writing actual code, and models like Qwen3 Coder seem
| better.
|
| DeepSeek R1 will not write reliable code but it will reason
| well and map out the path forward.
|
| I wouldn't be surprised if Sonnet's success came from doing
| EXACTLY this behind the scenes.
|
| But now I am looking for pure models that do not use this
| black-magic hackery behind the API.
|
| I want more control at the tool end, where I can alter the
| prompts and achieve the results I want.
|
| This is one reason I do not use Claude Code etc.
|
| Aider is 80% of what I want; I wish it had the rest, though.
|
| I just don't know why no one has built a perfect solution to
| this yet.
|
| Here are the things I'm missing in Aider:
|
| 1. Automatic model switching: use different models for asking
| questions about the code, planning a feature, and writing the
| actual code.
|
| 2. Self-determine whether a feature needs a "reasoning" model
| or whether a coding model will suffice.
|
| 3. Be able to do more: selectively send context and drop the
| files we don't need. Intelligently add the files that will be
| touched by the feature up front, instead of doing all the code
| planning, then being asked to add files, then doing it all
| over again with more context available.
| penguin202 wrote:
| Claude doesn't have a mid-life crisis and try to `rm -rf /` or
| delete your project.
| film42 wrote:
| Agreed, but pricing-wise Gemini 2.5 Pro wins. Gemini input
| tokens are half the cost of Claude 4's. Output is $5/million
| cheaper than Claude. And document processing is significantly
| cheaper: a 5MB PDF (customer invoice) with Gemini is like 5k
| tokens vs 56k with Claude.
|
| The only downside with Gemini (and it's a big one) is
| availability. We get rate limited by their dynamic QoS all the
| time even if we haven't reached our quota. Our GCP sales rep
| keeps recommending "provisioned throughput," but it's both
| expensive, and doesn't fit our workload type. Plus, the
| VertexAI SDK is kind of a PITA compared to Anthropic.
| Alex-Programs wrote:
| Google products are such a pain to work with from an API
| perspective that I actively avoid them where possible.
| artursapek wrote:
| Eagerly waiting for them to do this with Opus
| irthomasthomas wrote:
| Imagine paying $20 a prompt?
| artursapek wrote:
| If I can give it a detailed spec, walk away and do something
| else for 20 minutes, and come back to work that would have
| taken me 2 hours, then that's a steal.
| datadrivenangel wrote:
| Depending on how many prompts per hour you're looking at,
| that's probably the same order of magnitude as expensive SaaS. A
| fancy CRM seat can be ~$2000 per month (or more), which
| assuming 50 hours per week x 4 weeks per month is $10 per
| hour ($2000/200 hours). A lot of money, but if it makes your
| sales people more productive, it's a good investment.
| Assuming that you're paying your sales people say $240K per
| year ($20,000 per month), then the SaaS cost is 10% of their
| salary.
|
| This explains Datadog pricing. Maybe it gives a preview of
| future AI pricing.
| mettamage wrote:
| Shame it's only the API. Would've loved to see it via the web
| interface on claude.ai itself.
| minimaxir wrote:
| Can you even fit 200+k tokens worth of context in the web
| interface? IMO Claude's API workbench is the worst of the three
| major providers.
| mettamage wrote:
| Via text files right? Just drag and drop.
| data-ottawa wrote:
| When working on artifacts after a few change requests it
| definitely can.
| 77pt77 wrote:
| Even if you can't, a conversation can easily get larger than
| that.
| fblp wrote:
| I assume this will mean that long chats continue to get the
| "prompt is too long" error?
| penguin202 wrote:
| But will it remember any of it, and stop creating new redundant
| files when it can't find or understand what it's looking for?
| 1xer wrote:
| moaaaaarrrr
| aliljet wrote:
| This is definitely one of my CORE problems as I use these tools
| for "professional software engineering." I really desperately
| need LLMs to maintain extremely effective context, and it's not
| actually that interesting to see a new model that's marginally
| better than the last one (for my day-to-day).
|
| However, price is king. Allowing me to flood the context window
| with my code base is great, but given that the price has
| substantially increased, it makes more sense to carefully manage
| the context window for the situation at hand. Flooding their
| context window is great value for them, but short of evals that
| look at how well Sonnet stays on track, it's not clear the value
| actually exists for me.
| rootnod3 wrote:
| Flooding the context also increases the likelihood of the LLM
| confusing itself, mainly because of the longer context: it
| derails along the way without a reset.
| aliljet wrote:
| How do you know that?
| EForEndeavour wrote:
| https://onnyunhui.medium.com/evaluating-long-context-
| lengths...
| bigmadshoe wrote:
| https://research.trychroma.com/context-rot
| joenot443 wrote:
| This is a good piece. Clearly it's a pretty complex
| problem and the intuitive result a layman engineer like
| myself might expect doesn't reflect the reality of LLMs.
| Regex works as reliably on 20 characters as it does on 2M
| characters; the only difference is speed. I've learned
| this will probably _never_ be the case with LLMs, there
| will forever exist some level of epistemic doubt in its
| result.
|
| When they announced Big Contexts in 2023, they referenced
| being able to find a single changed sentence in the
| context's copy of The Great Gatsby[1]. This example seemed
| _incredible_ to me at the time but now two years later
| I'm feeling like it was pretty cherry-picked. What does
| everyone else think? Could you feed a novel into an LLM
| and expect it to find the single change?
|
| [1] https://news.ycombinator.com/item?id=35941920
| F7F7F7 wrote:
| What do you think happens when things start falling outside
| of its context window? It loses access to parts of your
| conversation.
|
| And that's why it will gladly rebuild the same feature over
| and over again.
| anonz4FWNqnX wrote:
| I've had similar experiences. I've gone back and forth
| between running models locally and using the commercial
| models. The local models can be incredibly useful (gemma,
| qwen), but they need more patience and work to get them to
| work.
|
| One advantage to running locally[1] is that you can set the
| context length manually and see how well the llm uses it. I
| don't have an exact experience to relay, but it's not
| unusual for models to allow longer contexts but ignore that
| context.
|
| Just making the context big doesn't mean the LLM is going
| to use it well.
|
| [1] I've been using LM Studio on both a MacBook Air and a
| MacBook Pro. Even a MacBook Air with 16G can run pretty
| decent models.
| nomel wrote:
| A good example of this was the first Gemini model that
| allowed 1 million tokens, but would lose track of the
| conversation after a couple paragraphs.
| rootnod3 wrote:
| The longer the context and the discussion goes on, the more
| it can get confused, especially if you have to refine the
| conversation or code you are building on.
|
| Remember, in its core it's basically a text prediction
| engine. So the more varying context there is, the more
| likely it is to make a mess of it.
|
| Short context: the conversation leaves the context window and
| it loses context. Long context: it can mess with the model. So
| the trick is to strike a balance. But if it's an online model,
| you have fuck-all control. If it's a local model, you have some
| say in the parameters.
| fkyoureadthedoc wrote:
| https://github.com/adobe-research/NoLiMa
| giancarlostoro wrote:
| Here's a paper from MIT that covers how this could be
| resolved in an interesting fashion:
|
| https://hanlab.mit.edu/blog/streamingllm
|
| The AI field is reusing existing CS concepts for AI that we
| never had hardware for, and now these people are learning
| how applied Software Engineering can make their theoretical
| models more efficient. It's kind of funny, I've seen this
| in tech over and over. People discover new thing, then
| optimize using known thing.
| mamp wrote:
| Unfortunately, I think the context rot paper [1] found
| that the performance degradation when context increased
| still occurred in models using attention sinks.
|
| 1. https://research.trychroma.com/context-rot
| kridsdale3 wrote:
| The fact that this is happening is where the tremendous
| opportunity to make money as an experienced Software
| Engineer currently lies.
|
| For instance, a year or two ago, the AI people discovered
| "cache". Imagine how many millions the people who
| implemented it earned for that one.
| Wowfunhappy wrote:
| I keep reading this, but with Claude Code in particular, I
| consistently find it gets smarter the longer my conversations
| go on, peaking right at the point where it auto-compacts and
| everything goes to crap.
|
| This isn't always true--some conversations go poorly and it's
| better to reset and start over--but it usually is.
| benterix wrote:
| > it's not clear if the value actually exists here.
|
| Having spent a couple of weeks on Claude Code recently, I
| arrived at the conclusion that the net value for me from
| agentic AI is actually negative.
|
| I will give it another run in 6-8 months though.
| wahnfrieden wrote:
| Did you try with using Opus exclusively?
| freedomben wrote:
| Do you know if there's a way to force Claude Code to do
| that exclusively? I've found a few env vars online but they
| don't seem to actually work
| wahnfrieden wrote:
| Peter Steinberger has been documenting his workflows and
| he relies exclusively on Opus at least until recently.
| (He also pays for a few Max 20x subscriptions at once to
| avoid rate limits.)
| atonse wrote:
| You can type /config and then go to the setting to pick a
| model.
| gdudeman wrote:
| Yes: type /model and then pick Opus 4.1.
| artursapek wrote:
| You can "force" it by just paying them $200 (which is
| nothing compared to the value)
| parineum wrote:
| Value is irrelevant. What's the return on investment you
| get from spending $200?
|
| Collecting value doesn't really get you anywhere if
| nobody is compensating you for it. Unless someone is
| going to either pay for it for you or give you $200/mo
| post-tax dollars, it's costing you money.
| wahnfrieden wrote:
| The return for me is faster output of features, fixes,
| and polish for my products which increases revenue above
| the cost of the tool. Did you need to ask this?
| parineum wrote:
| Yes, I did. Not everybody has their own product that
| might benefit from a $200 subscription. Most of us work
| for someone else and, unless that person is paying for
| the subscription, the _value_ it adds is irrelevant
| unless it results in better compensation.
|
| Furthermore, the advice was given to upgrade to a $200
| subscription from the $20 subscription. The difference in
| value that might translate into income between the $20
| option and the $200 option is very unclear.
| wahnfrieden wrote:
| If you are employed you should petition your employer for
| tools you want. Maybe you can use it to take the day off
| earlier or spend more time socializing. Or to get a
| promotion or performance bonus. Hopefully not just to
| meet rising productivity expectations without being
| handed the tools needed to achieve that. Having full-time
| access to these tools can also improve your own skills in
| using them, to profit from in a later career move or from
| contributing toward your own ends.
| parineum wrote:
| I'm not disputing that. I'm just pushing back against the
| casual suggestion (not by you) to just go spend $200.
|
| No doubt that you should ask you employer for the tools
| you want/need to do your job but plenty of us are using
| this kind of thing casually and the response to "Any way
| I can force it to use [Opus] exclusively?" is "Spend
| $200, it's worth it." isn't really helpful, especially in
| the context where the poster was clearly looking to try
| it out to see if it was worth it.
| epiccoleman wrote:
| is Opus that much better than Sonnet? My sub is $20 a
| month, so I guess I'd have to buy that I'm going to get a
| 10x boost, which seems dubious
| theshrike79 wrote:
| With the $20 plan you get Opus on the web and in the
| native app. Just not in Claude Code.
|
| IMO it's pretty good for design, but with code it gets in
| its head a bit too much and overthinks and
| overcomplicates solutions.
| artursapek wrote:
| Yes, Opus is much better at complicated architecture
| mark_l_watson wrote:
| I am sort of with you. I am down to asking Gemini Pro a
| couple of questions a day, use ChatGPT just a few times a
| week, and about once a week use gemini-cli (either a short
| free session, or a longer session where I provide my API
| key.)
|
| That said I spend (waste?) an absurdly large amount of time
| each week experimenting with local models (sometimes
| practical applications, sometimes 'research').
| mikepurvis wrote:
| For a bit more nuance, I think my overall net is about
| break-even. But I don't take that as "it's not worth it
| at all, abandon ship" but rather that I need to hone my
| instinct of what is and is not a good task for AI
| involvement, and what that involvement should look like.
|
| Throwing together a GHA workflow? Sure, make a ticket, assign
| it to copilot, check in later to give a little feedback and
| we're golden. Half a day of labour turned into fifteen
| minutes.
|
| But there are a lot of tasks that are far too nuanced where
| trying to take that approach just results in frustration and
| wasted time. There it's better to rely on editor completion
| or maybe the chat interface, like "hey I want to do X and Y,
| what approach makes sense for this?" and treat it like a
| rubber duck session with a junior colleague.
| cambaceres wrote:
| For me it's meant a huge increase in productivity, at least
| 3X.
|
| Since so many claim the opposite, I'm curious what you do
| more specifically? I guess different roles/technologies
| benefit more from agents than others.
|
| I build full stack web applications in node/.net/react, more
| importantly (I think) is that I work on a small startup and
| manage 3 applications myself.
| datadrivenangel wrote:
| How do you structure your applications for maintainability?
| dingnuts wrote:
| You have small applications following extremely common
| patterns and using common libraries. Models are good at
| regurgitating patterns they've seen many times, with fuzzy
| find/replace translations applied.
|
| Try to build something like Kubernetes from the ground up
| and let us know how it goes. Or try writing a custom
| firmware for a device you just designed. Something like
| that.
| elevatortrim wrote:
| I think there are two broad cases where ai coding is
| beneficial:
|
| 1. You are a good coder but working on a project that is new
| to you, building a new project, or working with a technology
| you are not familiar with. This is where AI is hugely
| beneficial. It not only accelerates you, it lets you do
| things you could not do otherwise.
|
| 2. You have spent a lot of time on engineering your context
| and learning what AI is good at, and using it very
| strategically where you know it will save time and not
| bother otherwise.
|
| If you are a really good coder, really familiar with the
| project, and mostly changing its bits and pieces rather
| than building new functionality, AI won't accelerate you
| much. Especially if you did not invest the time to make it
| work well.
| nicce wrote:
| > I build full stack web applications in node/.net/react,
| more importantly (I think) is that I work on a small
| startup and manage 3 applications myself.
|
| I think this is your answer. For example, React and
| JavaScript are extremely popular and mature. Are you using
| TypeScript and trying to get the most out of the types, or are
| you accepting everything the LLM gives you as JavaScript? How
| much do you care whether the code uses soon-to-be-deprecated
| functions or the most optimized loop/implementation? How about
| the project structure?
|
| In any case, the more precision you need, the less effective
| the LLM is.
| rs186 wrote:
| 3X if not 10X if you are starting a new project with Next.js,
| React, and Tailwind CSS for full-stack website development,
| solving an everyday problem. Yeah, I just witnessed that
| yesterday when creating a toy project.
|
| For my company's codebase, where we use internal tools and
| proprietary technology, solving a problem that does not
| exist outside the specific domain, on a codebase of over
| 1000 files? No way. Even locating the correct file to edit
| is non trivial for a new (human) developer.
| GenerocUsername wrote:
| Your first week of AI usage should be spent crawling your
| codebase and generating context.md docs that can then be fed
| back into future prompts so that the AI understands your
| project space, packages, APIs, and code philosophy.
|
| I guarantee your internal tools are not revolutionary; they
| are just unrepresented in the ML model out of the box.
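| One way to bootstrap that, sketched under the assumption that a
| human (or a follow-up LLM pass) fills in the TODOs; the paths,
| skip list, and output file name are illustrative:
|
|     from pathlib import Path
|
|     ROOT = Path(".")
|     SKIP = {".git", "node_modules", "dist", "__pycache__"}
|
|     lines = ["# Project context", ""]
|     for d in sorted(p for p in ROOT.rglob("*") if p.is_dir()):
|         if SKIP & set(d.parts):
|             continue
|         files = sorted(f.name for f in d.iterdir() if f.is_file())
|         if not files:
|             continue
|         lines.append(f"## {d.as_posix()}")
|         lines.append("Purpose: TODO (one short paragraph)")
|         lines += [f"- {name}: TODO one-line description"
|                   for name in files]
|         lines.append("")
|
|     # Feed context.md back into future prompts once filled in.
|     Path("context.md").write_text("\n".join(lines))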
| nicce wrote:
| Even then, are you even allowed to use AI on such a codebase?
| Is some part of the code "bought", e.g. generated by a
| commercial compiler with a specific license? Is a pinky
| promise from the LLM provider enough?
| orra wrote:
| That sounds incredibly boring.
|
| Is it effective? If so I'm sure we'll see models to
| generate those context.md files.
| cpursley wrote:
| Yes. And way less boring than manually reading a section
| of a codebase to understand what is going on after being
| away from it for 8 months. Claude's docs and git commit
| writing skills are worth it for that alone.
| blitztime wrote:
| How do you keep the context.md updated as the code
| changes?
| shmoogy wrote:
| I tell Claude to update it generally but you can probably
| use a hook
| tombot wrote:
| This, while it has context of the current problem, just
| ask Claude to re-read it's own documentation and think of
| things to add that will help it in the future
| MattGaiser wrote:
| Yeah, anecdotally it is heavily dependent on:
|
| 1. Using a common tech. It is not as good at Vue as it is
| at React.
|
| 2. Using it in a standard way. To get AI to really work
| well, I have had to change my typical naming conventions
| (or specify them in detail in the instructions).
| nicce wrote:
| React also often seems to effectively be an alias for Next.js.
| Models have a hard time telling the difference.
| mike_hearn wrote:
| My codebase has about 1500 files and is highly domain
| specific: it's a tool for shipping desktop apps[1] that
| handles all the building, packaging, signing, uploading
| etc for every platform on every OS simultaneously. It's
| written mostly in Kotlin, and to some extent uses a
| custom in-house build system. The rest of the build is
| Gradle, which is a notoriously confusing tool. The source
| tree also contains servers, command line tools and a
| custom scripting language which is used for all the
| scripting needs of the project [2].
|
| The code itself is quite complex and there's lots of
| unusual code for munging undocumented formats, speaking
| undocumented protocols, doing cryptography, Mac/Windows
| specific APIs, and it's all built on a foundation of a
| custom parallel incremental build system.
|
| In other words: nightmare codebase for an LLM. Nothing
| like other codebases. Yet, Claude Code demolishes
| problems in it without a sweat.
|
| I don't know why people have different experiences but
| speculating a bit:
|
| 1. I wrote most of it myself and this codebase is
| unusually well documented and structured compared to
| most. All the internal APIs have full JavaDocs/KDocs,
| there are extensive design notes in Markdown in the
| source tree, the user guide is also part of the source
| tree. Files, classes and modules are logically named.
| Files are relatively small. All this means Claude can
| often find the right parts of the source within just a
| few tool uses.
|
| 2. I invested in making a good CLAUDE.md and also wrote a
| script to generate "map.md" files that are at the top of
| every module. These map files contain one-liners of what
| every source file contains. I used Gemini to make these
| due to its cheap 1M context window. If Claude _does_
| struggle to find the right code by just reading the
| context files or guessing, it can consult the maps to
| locate the right place quickly.
|
| 3. I've developed a good intuition for what it can and
| cannot do well.
|
| 4. I don't ask it to do big refactorings that would
| stress the context window. IntelliJ is for refactorings.
| AI is for writing code.
|
| [1] https://hydraulic.dev
|
| [2] https://hshell.hydraulic.dev/
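| A rough sketch of the map.md generation script described in
| point 2, reusing the google-generativeai SDK; the module
| layout, file extension, and model name are assumptions, not
| the author's actual script:
|
|     from pathlib import Path
|     import google.generativeai as genai
|
|     genai.configure(api_key="...")
|     model = genai.GenerativeModel("gemini-2.5-pro")
|
|     for module in Path("modules").iterdir():
|         if not module.is_dir():
|             continue
|         entries = []
|         for src in sorted(module.rglob("*.kt")):
|             # One-liner per source file, produced by a cheap
|             # long-context model.
|             summary = model.generate_content(
|                 "Summarize this file in one line:\n\n"
|                 + src.read_text()
|             )
|             entries.append(
|                 f"{src.relative_to(module)}: {summary.text.strip()}"
|             )
|         (module / "map.md").write_text("\n".join(entries) + "\n")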
| tptacek wrote:
| That's an interesting comment, because "locating the
| correct file to edit" was the very first thing LLMs did
| that was valuable to me as a developer.
| evantbyrne wrote:
| The problem with these discussions is that almost nobody
| outside of the agency/contracting world seems to track
| their time. Self-reported data is already sketchy enough
| without layering on the issue of relying on distant memory
| of fine details.
| andrepd wrote:
| Self-reports are notoriously overexcited, real results are,
| let's say, not so stellar.
|
| https://metr.org/blog/2025-07-10-early-2025-ai-
| experienced-o...
| logicprog wrote:
| Here's an in depth analysis and critique of that study by
| someone whose job is literally to study programmers
| psychologically and has experience in sociology studies:
| https://www.fightforthehuman.com/are-developers-slowed-
| down-...
|
| Basically, the study has a fuckton of methodological
| problems that seriously undercut the quality of its
| findings, and even assuming its findings are correct, if
| you look closer at the data, it doesn't show what it
| claims to show regarding developer estimations, and the
| story of whether it speeds up or slows down developers is
| actually much more nuanced and precisely mirrors what the
| developers themselves say in the qualitative quote
| questionnaire, and relatively closely mirrors what the
| more nuanced people will say here -- that it helps with
| things you're less familiar with, that have scope creep,
| etc a lot more, but is less or even negatively useful for
| the opposite scenarios -- even in the worst case setting.
|
| Not to mention this is studying a highly specific and
| rare subset of developers, and they even admit it's a
| subset that isn't applicable to the whole.
| acedTrex wrote:
| I have yet to get it to generate code past 10ish lines that
| I am willing to accept. I read stuff like this and wonder
| how low yall's standards are, or if you are working on
| projects that just do not matter in any real world sense.
| spicyusername wrote:
| 4/5 times I can easily get 100s of lines of output that only
| need a quick once-over.
|
| 1/5 times, I spend an extra hour tangled in code it
| outputs that I eventually just rewrite from scratch.
|
| Definitely a massive net positive, but that 20% is
| extremely frustrating.
| acedTrex wrote:
| That is fascinating to me, i've never seen it generate
| that much code that is actually something i would
| consider correct. It's always wrong in some way.
| LinXitoW wrote:
| In my experience, if I have to issue more than 2
| corrections, I'm better off restarting and beefing up the
| prompt or just doing it myself
| dillydogg wrote:
| Whenever I read comments from the people singing their
| praises of the technology, it's hard not to think of the
| study that found AI tools made developers slower in early
| 2025.
|
| >When developers are allowed to use AI tools, they take
| 19% longer to complete issues--a significant slowdown
| that goes against developer beliefs and expert forecasts.
| This gap between perception and reality is striking:
| developers expected AI to speed them up by 24%, and even
| after experiencing the slowdown, they still believed AI
| had sped them up by 20%.
|
| https://metr.org/blog/2025-07-10-early-2025-ai-
| experienced-o...
| mstkllah wrote:
| Ah, the very extensive study with 16 developers.
| Bulletproof results.
| izacus wrote:
| Yeah, we should listen to the one "trust me bro" dude
| instead.
| troupo wrote:
| Compared to "it's just a skill issue you're not prompting
| it correctly" crowd with literally zero actionable data?
| logicprog wrote:
| Here's an in depth analysis and critique of that study by
| someone whose job is literally to study programmers
| psychologically and has experience in sociology studies:
| https://www.fightforthehuman.com/are-developers-slowed-
| down-...
|
| Basically, the study has a fuckton of methodological
| problems that seriously undercut the quality of its
| findings, and even assuming its findings are correct, if
| you look closer at the data, it doesn't show what it
| claims to show regarding developer estimations, and the
| story of whether it speeds up or slows down developers is
| actually much more nuanced and precisely mirrors what the
| developers themselves say in the qualitative quote
| questionnaire, and relatively closely mirrors what the
| more nuanced people will say here -- that it helps with
| things you're less familiar with, that have scope creep,
| etc a lot more, but is less or even negatively useful for
| the opposite scenarios -- even in the worst case setting.
|
| Not to mention this is studying a highly specific and
| rare subset of developers, and they even admit it's a
| subset that isn't applicable to the whole.
| dillydogg wrote:
| This is very helpful, thank you for the resource
| djeastm wrote:
| Standards are going to be as low as the market allows, I
| think. In some industries code quality is paramount; in
| others it's negligible, speed of development is the higher
| priority, and the code is mostly disposable.
| wiremine wrote:
| > Having spent a couple of weeks on Claude Code recently, I
| arrived to the conclusion that the net value for me from
| agentic AI is actually negative.
|
| > For me it's meant a huge increase in productivity, at
| least 3X.
|
| How do we reconcile these two comments? I think that's a
| core question of the industry right now.
|
| My take, as a CTO, is this: we're giving people new tools,
| and very little training on the techniques that make those
| tools effective.
|
| It's sort of like we're dropping trucks and airplanes on a
| generation that only knows walking and bicycles.
|
| If you've never driven a truck before, you're going to
| crash a few times. Then it's easy to say "See, I told you,
| this new fangled truck is rubbish."
|
| Those who practice with the truck are going to get the hang
| of it, and figure out two things:
|
| 1. How to drive the truck effectively, and
|
| 2. When NOT to use the truck... when walking or the bike is
| actually the better way to go.
|
| We need to shift the conversation to techniques, and away
| from the tools. Until we do that, we're going to be forever
| comparing apples to oranges and talking around each other.
| jdgoesmarching wrote:
| Agreed, and it drives me bonkers when people talk about
| AI coding as if it represents a single technique,
| process, or tool.
|
| Makes me wonder if people spoke this way about "using
| computers" or "using the internet" in the olden days.
|
| We don't even fully agree on the best practices for
| writing code _without_ AI.
| moregrist wrote:
| > Makes me wonder if people spoke this way about "using
| computers" or "using the internet" in the olden days.
|
| There were gobs of terrible road metaphors that spun out
| of calling the Internet the "Information Superhighway."
|
| Gobs and gobs of them. All self-parody to anyone who knew
| anything.
|
| I hesitate to relate this to anything in the current AI
| era, but maybe the closest (and in a gallows humor/doomer
| kind of way) is the amount of exec speak on how many jobs
| will be replaced.
| porksoda wrote:
| Remember the ones who loudly proclaimed the internet to
| be a passing fad, not useful for normal people. All anti
| LLM rants taste like that to me.
|
| I get why they thought that - it was kind of crappy
| unless you're one who is excited about the future and
| prepared to bleed a bit on the edge.
| mh- wrote:
| _> Makes me wonder if people spoke this way about "using
| computers" or "using the internet" in the olden days._
|
| Older person here: they absolutely did, all over the
| place in the early 90s. I remember people decrying
| projects that moved them to computers everywhere I went.
| Doctors offices, auto mechanics, etc.
|
| Then later, people did the same thing about _the
| Internet_ (was written with a single word capital I by
| 2000, having been previously written as two separate
| words.)
|
| https://i.imgur.com/vApWP6l.png
| jacquesm wrote:
| And not all of those people were wrong.
| jeremy_k wrote:
| Well put. It really does come down to nuance. I find
| Claude is amazing at writing React / Typescript. I mostly
| let it do its own thing and skim the results after. I
| have it write Storybook components so I can visually
| confirm things look how I want. If something isn't quite
| right I'll take a look and if I can spot the problem and
| fix it myself, I'll do that. If I can't quickly spot it,
| I'll write up a prompt describing what is going on and
| work through it with AI assistance.
|
| Overall, React / Typescript I heavily let Claude write
| the code.
|
| The flip side of this is my server code is Ruby on Rails.
| Claude helps me a lot less here because this is my
| primary coding background. I also have a certain way I
| like to write Ruby. In these scenarios I'm usually asking
| Claude to generate tests for code I've already written
| and supplying lots of examples in context so the coding
| style matches. If I ask Claude to write something novel
| in Ruby I tend to use it as more of a jumping off point.
| It generates, I read, I refactor to my liking. Claude is
| still very helpful, but I tend to do more of the code
| writing for Ruby.
|
| Overall, helpful for Ruby, I still write most of the
| code.
|
| These are the nuances I've come to find and what works
| best for my coding patterns. But to your point, if you
| tell someone "go use Claude" and they have have a
| preference in how to write Ruby and they see Claude
| generate a bunch of Ruby they don't like, they'll likely
| dismiss it as "This isn't useful. It took me longer to
| rewrite everything than just doing it myself". Which all
| goes to say, time using the tools whether its Cursor,
| Claude Code, etc (I use OpenCode) is the biggest key but
| figuring out how to get over the initial hump is probably
| the biggest hurdle.
| croes wrote:
| Do you only skim the results or do you audit them at some
| point to prevent security issues?
| jeremy_k wrote:
| What kind of security issues are you thinking about? I'm
| generating UI components like Selects for certain data
| types or Charts of data.
| dghlsakjg wrote:
| User input is a notoriously thorny area.
|
| If you aren't sanitizing and checking the inputs
| appropriately somewhere between the user and trusted
| code, you WILL get pwned.
|
| Rails provides default ways to avoid this, but it makes
| it very easy to do whatever you want with user input.
| Rails will not necessarily throw a warning if your AI
| decides that it wants to directly interpolate user input
| into a sql query.
| jeremy_k wrote:
| Well in this case, I am reading through everything that
| is generated for Rails because I want things to be done
| my way. For user input, I tend to validate everything
| with Zod before sending it off the backend which then
| flows through ActiveRecord.
|
| I get what you're saying that AI could write something
| that executes user input but with the way I'm using the
| tools that shouldn't happen.
| k9294 wrote:
| For this very reason I switched to TS for the backend as
| well. I'm not a big fan of JS, but the productivity gain of
| having shared types between frontend and backend, and Claude
| Code's proficiency with TS, is immense.
| jeremy_k wrote:
| I considered this, but I'm just too comfortable writing
| my server logic in Ruby on Rails (as I do that for my day
| job and side project). I'm super comfortable writing
| client side React / Typescript but whenever I look at
| server side Typescript code I'm like "I should understand
| what this is doing but I don't" haha.
| jorvi wrote:
| It is not really a nuanced take when it compares
| 'unassisted' coding to using a bicycle and AI-assisted
| coding with a truck.
|
| I put myself somewhere in the middle in terms of how
| great I think LLMs are for coding, but anyone that has
| worked with a colleague that loves LLM coding knows how
| horrid it is that the team has to comb through and
| doublecheck their commits.
|
| In that sense it would be equally nuanced to call AI-
| assisted development something like "pipe bomb coding".
| You toss out your code into the branch, and your non-AI'd
| colleagues have to quickly check if your code is a
| harmless tube of code or yet another contraption that
| quickly needs defusing before it blows up in everyone's
| face.
|
| Of course that is not nuanced either, but you get the
| point :)
| LinXitoW wrote:
| How nuanced the comparison seems also depends on whether
| you live in Arkansas or in Amsterdam.
|
| But I disagree that your counterexample has anything at
| all to do with AI coding. That very same developer was
| perfectly capable of committing untested crap without AI.
| Perfectly capable of copy pasting the first answer they
| found on Stack Overflow. Perfectly capable of recreating
| utility functions over and over because they were too lazy
| to check if they already exist.
| nabla9 wrote:
| I agree.
|
| I experience a productivity boost, and I believe it's
| because I prevent LLMs from making design choices or
| handling creative tasks. They're best used as a "code
| monkey", fill in function bodies once I've defined them.
| I design the data structures, functions, and classes
| myself. LLMs also help with learning new libraries by
| providing examples, and they can even write unit tests
| that I manually check. Importantly, no code I haven't
| read and accepted ever gets committed.
|
| Then I see people doing things like "write an app for
| ....", run, hey it works! WTF?
| quikoa wrote:
| It's not just about the programmer and his experience
| with AI tools. The problem domain and programming
| language(s) used for a particular project may have a
| large impact on how effective the AI can be.
| wiremine wrote:
| > The problem domain and programming language(s) used for
| a particular project may have a large impact on how
| effective the AI can be.
|
| 100%. Again, if we only focus on things like context
| windows, we're missing the important details.
| vitaflo wrote:
| But even on the same project with the same tools the
| general way a dev derives satisfaction from their work
| can play a big role. Some devs derive satisfaction from
| getting work done and care less about the code as long as
| it works. Others derive satisfaction from writing well
| architected and maintainable code. One can guess the
| reactions to how LLM's fit into their day to day lives
| for each.
| weego wrote:
| In a similar role and place with this.
|
| My biggest take so far: If you're a disciplined coder
| that can handle 20% of an entire project's (project being
| a bug through to an entire app) time being used on
| research, planning and breaking those plans into phases
| and tasks, then augmenting your workflow with AI appears
| to yield large gains in productivity.
|
| Even then you need to learn a new version of explaining
| it 'out loud' to get proper results.
|
| If you're more inclined to dive in and plan as you go,
| and store the scope of the plan in your head because
| "it's easier that way" then AI 'help' will just
| fundamentally end up in a mess of frustration.
| cmdli wrote:
| My experience has been entirely the opposite as an IC. If
| I spend the time to delve into the code base to the point
| that I understand how it works, AI just serves as a mild
| improvement in writing code as opposed to implementing it
| normally, saving me maybe 5 minutes on a 2 hour task.
|
| On the other hand, I've found success when I have no idea
| how to do something and tell the AI to do it. In that
| case, the AI usually does the wrong thing but it can
| oftentimes reveal to me the methods used in the rest of
| the codebase.
| zarzavat wrote:
| Both modes of operation are useful.
|
| If you know how to do something, then you can give Claude
| the broad strokes of how you want it done and -- if you
| give enough detail -- hopefully it will come back with
| work similar to what you would have written. In this case
| it's saving you on the order of minutes, but those
| minutes add up. There is a possibility for negative time
| saving if it returns garbage.
|
| If you _don't_ know how to do something then you can see
| if an AI has any ideas. This is where the big
| productivity gains are, hours or even days can become
| minutes if you are sufficiently clueless about something.
| jacobr1 wrote:
| An importantly the cycle time on this stuff can be much
| faster. Trying out different variants, and iterating
| through larger changes can be huge.
| hirako2000 wrote:
| The issue is that you would be not just clueless but also
| naive about the correctness of what it did.
|
| Knowing what to do at least you can review. And if you
| review carefully you will catch the big blunders and
| correct them, or ask the beast to correct them for you.
|
| > Claude, please generate a safe random number. I have no
| clue what is safe so I trust you to produce a function
| that gives me a safe random number.
|
| Not every use case is sensitive, but even building pieces
| for entertainment, if it wipes things it shouldn't delete
| or drain the battery doing very inefficient operations
| here and there, it's junk, undesirable software.
| bcrosby95 wrote:
| Claude will point you in the right neighborhood but to
| the wrong house. So if you're completely ignorant that's
| cool. But recognize that its probably wrong and only a
| starting point.
|
| Hell, I spent 3 hours "arguing" with Claude the other day
| in a new domain because my intuition told me something
| was true. I brought out all the technical reason why it
| was fine but Claude kept skirting around it saying the
| code change was wrong.
|
| After spending extra time researching it I found out
| there was a technical term for it and when I brought that
| up Claude finally admitted defeat. It was being a
| persistent little fucker before then.
|
| My current hobby is writing concurrent/parallel systems.
| Oh god AI agents are terrible. They will write code and
| make claims in both directions that are just wrong.
| teaearlgraycold wrote:
| LLMs are great at semantic searching through packages
| when I need to know exactly how something is implemented.
| If that's a major part of your job then you're saving a
| ton of time with what's available today.
| t0mas88 wrote:
| For me it has a big positive impact on two sides of the
| spectrum and not so much in the middle.
|
| One end is larger complex new features where I spend a
| few days thinking about how to approach it. Usually most
| thought goes into how to do something complex with good
| performance that spans a few apps/services. I write a
| half page high level plan description, a set of bullets
| for gotchas and how to deal with them and list normal
| requirements. Then let Claude Code run with that. If the
| input is good you'll get a 90% version and then you can
| refactor some things or give it feedback on how to do
| some things more cleanly.
|
| The other end of the spectrum is "build this simple
| screen using this API, like these 5 other examples". It
| does those well because it's almost advanced autocomplete
| mimicking your other code.
|
| Where it doesn't do well for me is in the middle between
| those two. Some complexity, not a big plan and not simple
| enough to just repeat something existing. For those
| things it makes a mess or you end up writing a lot of
| instructions/prompts and could have just done it yourself.
| ath3nd wrote:
| > How do we reconcile these two comments? I think that's
| a core question of the industry right now.
|
| The current freshest study focusing on experienced
| developers showed a net negative in the productivity when
| using an LLM solution in their flow:
|
| https://metr.org/blog/2025-07-10-early-2025-ai-
| experienced-o...
|
| My conclusion on this, as an ex VP of Engineering, is
| that good senior developers find little utility with LLMs
| and even find them to be a nuisance/detriment, while for
| juniors, they can be a godsend, as they help them with
| syntax and coax the solution out of them.
|
| It's like training wheels on a bike. A toddler might find
| 3x utility, while a person who actually can ride a bike
| well will find themselves restricted by training wheels.
| pesfandiar wrote:
| Your analogy would be much better with giving workers a
| work horse with a mind of its own. Trucks come with clear
| instructions and predictable behaviour.
| chasd00 wrote:
| > Your analogy would be much better with giving workers a
| work horse with a mind of its own.
|
| i think this is a very insightful comment with respect to
| working with LLMs. If you've ever ridden a horse you
| don't really tell it to walk, run, turn left, turn right,
| etc you have to convince it to do those things and not be
| too aggravating while you're at it. With a truck simple
| cause and effect applies but with horse it's a
| negotiation. I feel like working with LLMs is like a
| negotiation, you have to coax out of it what you're
| after.
| pletnes wrote:
| Being a consultant / programmer with feet on the ground,
| eh, hands on the keyboard: some orgs let us use some AI
| tools, others do not. Some projects are predominantly new
| code based on recent tech (React); others include
| maintaining legacy stuff on windows server and
| proprietary frameworks. AI is great on some tasks, but
| unavailable or ignorant about others. Some projects have
| sharp requirements (or at least, have requirements)
| whereas some require 39 out of 40 hours a week guessing
| at what the other meat-based intelligences actually want
| from us.
|
| What <<programming>> actually entails, differs
| enormously; so does AI's relevance.
| abc_lisper wrote:
| I doubt there is much art to getting an LLM to work for you,
| despite all the hoopla. Any competent engineer can figure
| that much out.
|
| The real dichotomy is this. If you are aware of the
| tools/APIs and the Domain, you are better off writing the
| code on your own, except maybe shallow changes like
| refactorings. OTOH, if you are not familiar with the
| domain/tools, using an LLM gives you a huge leg up by
| preventing you from getting stuck and providing initial
| momentum.
| jama211 wrote:
| I dunno, first time I tried an LLM I was getting so
| annoyed because I just wanted it to go through a css file
| and replace all colours with variables defined in :root,
| and it kept missing stuff and spinning and I was getting
| so frustrated. Then a friend told me I should instead
| just ask it to write a script which accomplishes that
| goal, and it did it perfectly in one prompt, then ran it
| for me, and also wrote another script to check it hadn't
| missed any and ran that.
|
| At no point when it was getting stuck initially did it
| suggest another approach, or complain that it was outside
| its context window even though it was.
|
| This is a perfect example of "knowing how to use an LLM"
| taking it from useless to useful.
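|
| A script along those lines could look roughly like the
| sketch below -- this is a reconstruction, not the actual
| output, and the colour-to-variable mapping and file name
| are placeholders:
|
|     # sketch: swap hard-coded hex colours for CSS variables
|     # assumed to already be defined under :root
|     import re
|
|     COLOR_VARS = {  # placeholder mapping, project-specific
|         "#ff0000": "var(--color-red)",
|         "#00ff00": "var(--color-green)",
|     }
|
|     with open("styles.css") as f:
|         css = f.read()
|
|     # replace each known colour with its variable
|     for hex_code, var in COLOR_VARS.items():
|         css = re.sub(re.escape(hex_code), var, css,
|                      flags=re.IGNORECASE)
|
|     with open("styles.css", "w") as f:
|         f.write(css)
|
|     # second pass: report any hex colours that were missed
|     leftover = set(re.findall(r"#[0-9a-fA-F]{3,6}\b", css))
|     print("unmapped colours:", leftover or "none")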
| abc_lisper wrote:
| Which one did you use and when was this? I mean, nobody
| gets anything working right the first time. You've got to
| spend a few days at least trying to understand the tool.
| badlucklottery wrote:
| This is my experience as well.
|
| LLMs currently produce pretty mediocre code. A lot of that
| is a "garbage in, garbage out" issue, but it's just the
| current state of things.
|
| If the alternative is noob code or just not doing a task
| at all, then mediocre is great.
|
| But 90% of the time I'm working in a familiar
| language/domain so I can grind out better code relatively
| quickly and do so in a way that's cohesive with nearby
| code in the codebase. The main use-case I have for AI in
| that case is writing the trivial unit tests for me.
|
| So it's another "No Silver Bullet" technology where the
| problem it's fixing isn't the essential problem software
| engineers are facing.
| brulard wrote:
| I believe there IS much art in using LLMs, and agents
| especially. Maybe you can get like a 20% boost quite
| quickly, but there is so much room to grow it to maybe
| 500% long term.
| worldsayshi wrote:
| I think it's very much down to which kind of problem
| you're trying to solve.
|
| If a solution can subtly fail and it is critical that it
| doesn't, an LLM is a net negative.
|
| If a solution is easy to verify, or if it is enough that
| it walks like a duck and quacks like one, an LLM can be
| very useful.
|
| I've had examples of both lately. I'm very much both
| bullish and bearish atm.
| oceanplexian wrote:
| It's pretty simple, AI is now political for a lot of
| people. Some folks have a vested interest in downplaying
| it or over hyping it rather than impartially approaching
| it as a tool.
| chasd00 wrote:
| One thing to think about is many software devs have a
| very hard time with code they didn't write. I've seen
| many devs do a lot of work to change code to something
| equivalent (even with respect to performance and
| readability) only because it's not the way they would
| have done it. I could see people having a hard time using
| what the LLM produced without having to "fix it up" and
| basically re-write everything.
| jama211 wrote:
| Yeah sometimes I feel like a unicorn because I don't
| really care about code at all, so long as it conforms to
| decent standards and does what it needs to do. I honestly
| believe engineers often overestimate the importance of
| elegance in code too, to the point of not realising that
| the slowdown a project suffers from overly perfect code is
| genuinely not worth it.
| parpfish wrote:
| i don't care if the code is elegant, i care that the code
| is _consistent_.
|
| do the same thing in the same way each time and it lets
| you chunk it up and skim it much easier. if there are
| little differences each time, you have to keep asking
| yourself "is it done differently here for a particular
| reason?"
| vanviegen wrote:
| Exactly! And besides that, new code being consistent with
| its surrounding code used to be a sign of careful
| craftsmanship (as opposed to spaghetti-against-the-wall
| style coding), giving me some confidence that the
| programmer may have considered at least the most
| important nasty edge cases. LLMs have rendered that
| signal mostly useless, of course.
| dennisy wrote:
| Also another view is that developers below a certain
| level get a positive benefit and those above get a
| negative effect.
|
| This makes sense, as the models are an average of the
| code out there and some of us are above and below that
| average.
|
| Sorry btw I do not want to offend anyone who feels they
| do garner a benefit from LLMs, just wanted to drop in
| this idea!
| ath3nd wrote:
| That's my anecdotal experience as well! Junior devs
| struggle with a lot of things:
|
| - syntax
|
| - iteration over an idea
|
| - breaking down the task and verifying each step
|
| Working with a tool like Claude that gets them started
| quickly and iterates on the solution together with them
| helps them tremendously and educates them on best
| practices in the field.
|
| Contrast that with a seasoned developer with domain
| experience, good command of the programming language and
| knowledge of the best practices and a clear vision of how
| the things can be implemented. They hardly need any help
| on those steps where the junior struggled and where the
| LLMs shine, maybe some quick check on the API, but that's
| mostly it. That's consistent with the finding of the
| study https://metr.org/blog/2025-07-10-early-2025-ai-
| experienced-o... that experienced developers' performance
| suffered when using an LLM.
|
| What I used as a metaphor before to describe this
| phenomenon is _training wheels_: kids learning how to
| ride a bike can get the basics with the help and safety
| of the wheels, but adults who can already ride a bike
| don't have any use for the training wheels, and can often
| find themselves restricted by them.
| parpfish wrote:
| i don't know if anybody else has experienced this, but
| one of my biggest time-sucks with cursor is that it
| doesn't have a way for me to steer it mid-process that
| i'm aware of.
|
| it'll build something that fails a test, but _i know_ how
| to fix the problem. i can't jump in and manually fix it or
| tell it what to do. i just have to watch it churn through
| the problem and eventually give up and throw away a 90%
| good solution that i knew how to fix.
| smokel wrote:
| My experience _was_ exactly the opposite.
|
| Experienced developers know when the LLM goes off the
| rails, and are typically better at finding useful
| applications. Junior developers on the other hand, can
| let horrible solutions pass through unchecked.
|
| Then again, LLMs are improving so quickly, that the most
| recent ones help juniors to learn and understand things
| better.
| rzz3 wrote:
| It's also really good for me as a very senior engineer
| with serious ADHD. Sometimes I get very mentally blocked,
| and telling Claude Code to plan and implement a feature
| gives me a really valuable starting point and has a way
| of unblocking me. For me it's easier to elaborate off of
| an existing idea or starting point and refactor than
| start a whole big thing from zero on my own.
| unoti wrote:
| > Having spent a couple of weeks on Claude Code recently,
| I arrived to the conclusion that the net value for me
| from agentic AI is actually negative.
|
| > For me it's meant a huge increase in productivity, at
| least 3X.
|
| > How do we reconcile these two comments? I think that's
| a core question of the industry right now.
|
| Every success story with AI coding involves giving the
| agent enough context to succeed on a task that it can see
| a path to success on. And every story where it fails is a
| situation where it had not enough context to see a path
| to success on. Think about what happens with a junior
| software engineer: you give them a task and they either
| succeed or fail. If they succeed wildly, you give them a
| more challenging task. If they fail, you give them more
| guidance, more coaching, and less challenging tasks with
| more personal intervention from you to break it down into
| achievable steps.
|
| As models and tooling becomes more advanced, the place
| where that balance lies shifts. The trick is to ride that
| sweet spot of task breakdown and guidance and
| supervision.
| troupo wrote:
| > And every story where it fails is a situation where it
| had not enough context to see a path to success on.
|
| And you know that because people are actively sharing the
| projects, code bases, programming languages and
| approaches they used? Or because your _gut feeling_ is
| telling you that?
|
| For me, agents have failed with enough context and with
| not enough context, succeeded with enough context and with
| not enough, and succeeded and failed both with and without
| "guidance and coaching".
| hirako2000 wrote:
| Bold claims.
|
| From my experience, even the top models continue to fail
| to deliver correctness on many tasks, even with all the
| details and no ambiguity in the input.
|
| In particular when details are provided, in fact.
|
| I find that with solutions likely to be well oiled in the
| training data, a well formulated set of *basic*
| requirements often leads, zero-shot, to "a" perfectly
| valid solution. I say "a" solution because there is still
| some probability (the seed factor) that it will not honour
| part of the demands.
|
| E.g, build a to-do list app for the browser, persist
| entries into a hashmap, no duplicate, can edit and
| delete, responsive design.
|
| I never recall seeing an LLM kick off C++ code out of
| that. But I also don't recall any LLM succeeding in all
| these requirements, even though there aren't that many.
|
| It may use a hash set, or even a set for persistence
| because it avoids duplicates out of the box. And it would
| even use a hash map to show it used a hashmap but as an
| intermediary data structure. It would be responsive, but
| the edit/delete buttons may not show, or may not be
| functional. Saving the edits may look like it worked, but
| did not.
|
| The comparison with junior developers is weak. Even a
| mediocre developer can test their work and won't pretend
| that it works if it doesn't even execute. A developer who
| lies too many times would lose trust. We forgive these
| machines because they are just automatons with a label
| that says "can make mistakes". We have no recourse to make
| them speak the truth; they lie by design.
| brulard wrote:
| > From my experience, even the top models continue to
| fail delivering correctness on many tasks even with all
| the details and no ambiguity in the input.
|
| You may feel like all the details are there and there is
| no ambiguity in the prompt. But there may still be missing
| parts, like examples, structure, a plan, or division into
| smaller parts (it can do that quite well if explicitly
| asked). If you give too many details at once, it gets
| confused, but there are ways to let the model access
| context as it progresses through the task.
|
| And models are just one part of the equation. Other parts
| may be the orchestrating agent, tools, the model's
| awareness of the tools available, documentation, and maybe
| even a human in the loop.
| sixothree wrote:
| It might just be me but I feel like it excels with
| certain languages where in other situations it falls flat.
| Throw a well architected and documented code base in a
| popular language at it and you can definitely feel it get
| into its groove.
|
| Also, giving it tools to ensure success is just as
| important. MCPs can sometimes make a world of difference,
| especially when it needs to search your code base.
| delegate wrote:
| Easy. You're 3x more productive for a while and then you
| burn yourself out.
|
| Or lose control of the codebase, which you no longer
| understand after weeks of vibing (since we can only think
| and accumulate knowledge at 1x).
|
| Sometimes the easy way out is throwing a week of
| generated code away and starting over.
|
| So that 3x doesn't come for free at all, besides API
| costs, there's the cost of quickly accumulating tech debt
| which you have to pay if this is a long term project.
|
| For prototypes, it's still amazing.
| brulard wrote:
| You conflate efficient usage of AI with "vibing". Code
| can be written by AI and still follow the agreed-upon
| structures and rules and still can and should be
| thoroughly reviewed. The 3x absolutely does not come for
| free. But the price may have been paid in advance by
| learning how to use those tools best.
|
| I agree the vibe-coding mentality is going to be a major
| problem. But can't any tool be used well or used badly?
| bloomca wrote:
| > 2. When NOT to use the truck... when walking or the
| bike is actually the better way to go.
|
| Some people write racing car code, where a truck just
| doesn't bring much value. Some people go into more
| uncharted territories, where there are no roads (so the
| truck will not only slow you down, it will bring a bunch
| of dead weight).
|
| If the road is straight, AI is wildly good. In fact, it
| is probably _too_ good; but it can easily miss a turn and
| it will take a minute to get it on track.
|
| I am curious if we'll be able to fine-tune LLMs to assist
| with less known paths.
| troupo wrote:
| > How do we reconcile these two comments? I think that's
| a core question of the industry right now.
|
| We don't. Because there's no hard data:
| https://dmitriid.com/everything-around-llms-is-still-
| magical...
|
| And when hard data of any kind _does_ start appearing, it
| may actually point in a different direction:
| https://metr.org/blog/2025-07-10-early-2025-ai-
| experienced-o...
|
| > We need to shift the conversation to techniques, and
| away from the tools.
|
| No, you're asking to shift the conversation to magical
| incantations which experts claim work.
|
| What we need to do is shift the conversation to
| _measurements_
| jf22 wrote:
| A couple of weeks isn't enough.
|
| I'm six months into using LLMs to generate 90% of my code
| and am finally understanding the techniques and limitations.
| gwd wrote:
| > How do we reconcile these two comments? I think that's
| a core question of the industry right now.
|
| The question is, for those people who _feel_ like things
| are going faster, what's the _actual_ velocity?
|
| A month ago I showed it a basic query of one resource I'd
| rewritten to use a "query builder" API. Then I showed it
| the "legacy" query of another resource, and asked it to
| do something similar. It managed to get very close on the
| first try, and with only a few more hours of tweaking and
| testing managed to get a reasonably thorough test suite
| to pass. I'm sure that took half the time it would have
| taken me to do it by hand.
|
| Fast forward to this week, when I ran across some strange
| bugs, and had to spend a day or two digging into the code
| again, and do some major revision. Pretty sure those bugs
| wouldn't have happened if I'd written the code myself;
| but even though I reviewed the code, they went under the
| radar, because I hadn't really understood the code as
| well as I thought I had.
|
| So was I faster overall? Or did I just offload some of
| the work to myself at an unpredictable point in the
| future? I don't "vibe code": I keep a tight rein on the
| tool and review everything it's doing.
| thanhhaimai wrote:
| I work across the stack (frontend, backend, ML)
|
| - For FrontEnd or easy code, it's a speed up. I think it's
| more like 2x instead of 3x.
|
| - For my backend (hard trading algo), it has like 90%
| failure rate so far. There is just so much for it to reason
| through (balance sheet, lots, wash, etc). All agents I have
| tried, even on Max mode, couldn't reason through all the
| cases correctly. They end up thrashing back and forth.
| Gemini most of the time will go into the "depressed" mode
| on the code base.
|
| One thing I notice is that the Max mode on Cursor is not
| worth it for my particular use case. The problem is either
| easy (frontend), which means any agent can solve it, or
| it's hard, and Max mode can't solve it. I tend to pick the
| fast model over the strong model.
| squeaky-clean wrote:
| I just want to point out that they only said agentic models
| were a negative, not AI in general. I don't know if this is
| what they meant, but I personally prefer to use a web or
| IDE AI tool and don't really like the agentic stuff
| compared to those. For me agentic AI would be a net
| positive against no-AI, but it's a net negative compared to
| other AI interfaces
| dmitrygr wrote:
| > For me it's meant a huge increase in productivity, at
| least 3X.
|
| Quite possibly you are doing very common things that are
| often done and thus are in the training set a lot, while
| the parent poster is doing something more novel that
| forces the model to extrapolate, which they suck at.
| cambaceres wrote:
| Sure, I won't argue against that. The more complex (and
| fun) parts of the applications I tend to write myself.
| The productivity gains are still real though.
| bcrosby95 wrote:
| My current guess is it's how the programmer solves problems
| in their head. This isn't something we talk about much.
|
| People seem to find LLMs do well with well-spec'd features.
| But for me, creating a good spec doesn't take any less time
| than creating the code. The problem for me is the
| translation layer that turns the model in my head into
| something more concrete. As such, creating a spec for the
| LLM doesn't save me any time over writing the code myself.
|
| So if it's a one shot with a vague spec and that works
| that's cool. But if it's well spec'd to the point the LLM
| won't fuck it up then I may as well write it myself.
| byryan wrote:
| That makes sense, especially if you're building web
| applications that are primarily "just" CRUD operations. If
| a lot of the API calls follow the same pattern and the
| application is just a series of API calls + React UI, then
| that seems like something an LLM would excel at. LLMs are
| also more proficient in TypeScript/JS/Python compared to
| other languages, so that helps as well.
| carlhjerpe wrote:
| I'm currently unemployed in the DevOps field (resigned and
| got a long vacation). I've been using various models to
| write various Kubernetes plug-ins and simple automation
| scripts. It's been a godsend for implementing things which
| would otherwise require too much research; my ADHD context
| window is smaller than Claude's.
|
| Models are VERY good at Kubernetes since they have very
| anal (good) documentation requirements before merging.
|
| I would say my productivity gain is unmeasurable since I
| can produce things I'd ADHD out of unless I've got a whip
| up my rear.
| qingcharles wrote:
| On the right projects, definitely an enormous upgrade for
| me. Have to be judicious with it and know when it is right
| and when it's wrong. I think people have to figure out what
| those times are. For now. In the future I think a lot of
| the problems people are having with it will diminish.
| revskill wrote:
| Truth. To some extent, the agent doesn't know what it's
| doing at all; it lacks a real brain. Maybe we should just
| treat it as a hard worker.
| flowerthoughts wrote:
| What type of work do you do? And how do you measure value?
|
| Last week I was using Claude Code for web development. This
| week, I used it to write ESP32 firmware and a Linux kernel
| driver. Sure, it made mistakes, but the net was still very
| positive in terms of efficiency.
| verall wrote:
| > This week, I used it to write ESP32 firmware and a Linux
| kernel driver.
|
| I'm not meaning to be negative at all, but was this for a
| toy/hobby or for a commercial project?
|
| I find that LLMs do very well on small greenfield toy/hobby
| projects but basically fall over when brought into
| commercial projects that often have bespoke requirements
| and standards (e.g. has to cross-compile on qcc, comply
| with AUTOSAR, an in-house build system, tons of legacy code
| lying around that may or may not be used).
|
| So no shade - I'm just really curious what kind of project
| you were able to get such good results writing ESP32 FW
| and kernel drivers for :)
| lukebechtel wrote:
| Maintaining project documentation is:
|
| (1) Easier with AI
|
| (2) Critical for letting AI work effectively in your
| codebase.
|
| Try creating well structured rules for working in your
| codebase, put in .cursorrules or Claude equivalent... let
| AI help you... see if that helps.
| theshrike79 wrote:
| The magic to using agentic LLMs efficiently is...
|
| proper project management.
|
| You need to have good documentation, split into logical
| bits. Tasks need to be clearly defined and not have
| extensive dependencies.
|
| And you need to have a simple feedback loop where you can
| easily run the program and confirm the output matches
| what you want.
| troupo wrote:
| And the chance of that working depends on the weather,
| the phase of the moon and the arrangement of bird bones
| in a druidic augury.
|
| It's a non-deterministic system producing statistically
| relevant results with no failure modes.
|
| I had Cursor one-shot issues in internal libraries with
| zero rules.
|
| And then suggest I use StringBuilder (Java) in a 100%
| Elixir project with carefully curated cursor rules as
| suggested by the latest shamanic ritual trends.
| oceanplexian wrote:
| I work in FAANG, have been for over a decade. These tools
| are creating a huge amount of value, starting with
| Copilot but now with tools like Claude Code and Cursor.
| The people doing so don't have a lot of time to comment
| about it on HN since we're busy building things.
| nomel wrote:
| What are the AI usage policies like at your org? Where I
| am, we're severely limited.
| jpc0 wrote:
| > These tools are creating a huge amount of value...
|
| > The people doing so don't have a lot of time to comment
| about it on HN since we're busy building...
|
| "We're so much more productive that we don't have time to
| tell you how much more productive we are"
|
| Do you see how that sounds?
| wijwp wrote:
| To be fair, AI isn't going to give us more time outside
| work. It'll just increase expectations from leadership.
| drusepth wrote:
| I feel this, honestly. I get so much more work done
| (currently: building & shipping games, maintaining
| websites, managing APIs, releasing several mobile apps,
| and developing native desktop applications) managing 5x
| claude instances that the majority of my time is sucked
| up by just prompting whichever agent is done on their
| next task(s), and there's a real feeling of lost
| productivity if any agent is left idle for too long.
|
| The only time to browse HN left is when all the agents
| are comfortably spinning away.
| GodelNumbering wrote:
| I don't see how FAANG is relevant here. But the 'FAANG' I
| used to work at had an emergent problem of people
| throwing a lot of half-baked 'AI-powered' code over the
| wall and letting reviewers deal with it (due to incentives,
| not that they were malicious). In orgs like infra where
| everything needs to be reviewed carefully, this is purely
| a burden
| nme01 wrote:
| I also work for a FAANG company and so far most employees
| agree that while LLMs are good for writing docs,
| presentations or emails, they still lack a lot when it
| comes to writing maintainable code (especially in Java;
| they supposedly do better in Go, don't know why, not my
| opinion). Even simple refactorings need to be carefully
| checked. I really like them for doing stuff that I know
| nothing about, though (e.g. write a script using a certain
| tool, tell me how to rewrite my code to use a certain
| library, etc.) or for reviewing changes.
| verall wrote:
| I've worked at a FAANG equivalent for a decade, mostly in
| C++/embedded systems. I work on commercial products used
| by millions of people. I use the AI also.
|
| When others are finding gold in rivers similar to mine,
| and I'm mostly finding dirt, I'm curious to ask and see
| how similar the rivers really are, or if the river they
| are panning in is actually somewhere I do find gold, but
| not a river I get to pan in often.
|
| If the rivers really are similar, maybe I need to work on
| my panning game :)
| GodelNumbering wrote:
| This is my experience too. Also, their propensity to jump
| into code without necessarily understanding the
| requirement is annoying to say the least. As the project
| complexity grows, you find yourself writing longer and
| longer instructions just to guardrail.
|
| Another rather interesting thing is that they tend to
| gravitate towards a sweep-the-errors-under-the-rug kind of
| coding, which is disastrous, e.g. "return X if we don't
| find the value so downstream doesn't crash". These are
| the kind of errors no human, not even a beginner on their
| first day learning to code, would make, and they are
| extremely annoying to debug.
|
| Tl;dr: LLMs' tendency to treat every single thing you
| give them as a demo homework project.
| tombot wrote:
| > their propensity to jump into code without necessarily
| understanding the requirement is annoying to say the
| least.
|
| Then don't let it, collaborate on the spec, ask Claude to
| make a plan. You'll get far better results
|
| https://www.anthropic.com/engineering/claude-code-best-
| pract...
| verall wrote:
| > Another rather interesting thing is that they tend to
| gravitate towards sweep the errors under the rug kind of
| coding which is disastrous. e.g. "return X if we don't
| find the value so downstream doesn't crash".
|
| Yes, these are painful and basically the main reason I
| moved from Claude to Gemini - it felt insane to be
| begging the AI - "No, you actually have to fix the bug,
| in the code you wrote, you cannot just return some random
| value when it fails, it actually has to work".
| GodelNumbering wrote:
| Claude in particular abuses the word 'Comprehensive' a
| lot. If you express that you're unhappy with its approach,
| it will likely come back with "Comprehensive plan to ..."
| and then write like 3 bullet points under it, that is of
| course after profusely apologizing. On a sidenote, I wish
| LLMs never apologized and instead just said I don't know
| how to do this.
| jorvi wrote:
| Running LLM code with kernel privileges seems like
| courting disaster. I wouldn't dare do that unless I had a
| rock-solid grasp of the subsystem, and at that point, why
| not just write the code myself? LLM coding is on average
| 20% slower.
| LinXitoW wrote:
| In my experience in a Java code base, it didn't do any of
| this, and did a good job with exceptions.
|
| And I have to disagree that these aren't errors that
| beginners or even intermediates make. Who hasn't
| swallowed an error because "that case totally, most
| definitely won't ever happen, and I need to get this
| done"?
| flowerthoughts wrote:
| Totally agree.
|
| This was a debugging tool for Zigbee/Thread.
|
| The web project is Nuxt v4, which was just released, so
| Claude keeps wanting to use v3 semantics, and you have to
| keep repeating the known differences, even if you use
| CLAUDE.md. (They moved client files under an app/
| subdirectory.)
|
| All of these are greenfield prototypes. I haven't used it
| in large systems, and I can totally see how that would be
| context overload for it. This is why I was asking GP
| about the circumstances.
| LinXitoW wrote:
| Ironically, AI mirrors human developers in that it's far
| more effective when working in a well written, well
| documented code base. It will infer a function's
| behaviour from its name. If those names are shitty,
| short, or full of weird abbreviations, it'll have a hard
| time.
|
| Maybe it's a skill issue, in the sense of having a decent
| code base.
| greenie_beans wrote:
| same. agents are good with easy stuff and debugging but
| extremely bad with complexity. has no clue about chesterton's
| fence, and it's hard to parse the results especially when it
| creates massive diffs. creates a ton of abandoned/cargo code.
| lots of misdirection with OOP.
|
| chatting with claude and copy/pasting code between my IDE
| and claude is still the most effective for more complex
| stuff, at least for me.
| jmartrican wrote:
| Maybe that is a skills issue.
| rootusrootus wrote:
| If you are suggesting that LLMs are proving quite good at
| taking over the low skilled work that probably 90% of devs
| spend the majority of their time doing, I totally agree. It
| is the simplest explanation for why many people think they
| are magic, while some people find very little value.
|
| On the occasion that I find myself having to write web code
| for whatever reason, I'm very happy to have Claude. I don't
| enjoy coding for the web, like at all.
| logicprog wrote:
| I think that's definitely true -- these tools are only
| really taking care of the relatively low skill stuff;
| synthesizing algorithms and architectures and approaches
| that have been seen before, automating building out for
| scaffolding things, or interpolating skeletons, and
| running relatively typical bash commands for you after
| making code changes, or implementing fairly specific
| specifications of how to approach novel architectures,
| algorithms, or code logic, automating exploring code bases
| and building understanding of what things do and where
| they are and how they relate and the control flow (which
| would otherwise take hours of laboriously grepping around
| and reading code), all in small bite sized pieces with a
| human in the loop. They're even able to make complete and
| fully working code for things that are a small variation
| or synthesization of things they've seen a lot before in
| technologies they're familiar with.
|
| But I think that that can still be a pretty good boost --
| I'd say maybe 20 to 30%, plus MUCH less headache, when
| used right -- even for people that are doing really
| interesting and novel things, because even if your work
| has a lot of novelty and domain knowledge to it, there's
| always mundane horseshit that eats up way too much of
| your time and brain cycles. So you can use these agents
| to take care of all the peripheral stuff for you and just
| focus on what's interesting to you. Imagine you want to
| write some really novel unique complex algorithm or
| something but you do want it to have a GUI debugging
| interface. You can just use Imgui or TKinter if you can
| make Python bindings or something and then offload that
| whole thing onto the LLM instead of having to have that
| extra cognitive load and have to page just to warp the
| meat of what you're working on out whenever you need to
| make a modification to your GUI that's more than trivial.
|
| I also think this opens up the possibility for a lot more
| people to write ad hoc personal programs for various
| things they need, which is even more powerful when
| combined with something like Python that has a ton of
| pre-made libraries that do all the difficult stuff for
| you, or something like emacs that's highly malleable and
| rewards being able to write programs with it by making
| them able to very powerfully integrate with your workflow
| and environment. Even for people who already know how to
| program and like programming even, there's still an
| opportunity cost and an amount of time and effort and
| cognitive load investment in making programs. So by
| significantly lowering that you open up the opportunities
| even for us and for people who don't know how to program
| at all, their productivity basically goes from zero to
| one, an improvement of 100% (or infinity lol)
| phist_mcgee wrote:
| What a supremely arrogant comment.
| rootusrootus wrote:
| I often have such thoughts about things I read on HN but
| I usually follow the site guidelines and keep it to
| myself.
| ericmcer wrote:
| Agreed, daily Cursor user.
|
| Just got out of a 15m huddle with someone trying to
| understand what they were doing in a PR before they admitted
| Claude generated everything and it worked but they weren't
| sure why... Ended up ripping about 200 LoC out because what
| Claude "fixed" wasn't even broken.
|
| So never let it generate code, but the autocomplete is
| absolutely killer. If you understand how to code in 2+
| languages you can make assumptions about how to do things in
| many others and let the AI autofill the syntax in. I have
| been able to swap to languages I have almost no experience in
| and work fairly well because memorizing syntax is irrelevant.
| daymanstep wrote:
| > I have been able to swap to languages I have almost no
| experience in and work fairly well because memorizing
| syntax is irrelevant.
|
| I do wonder whether your code does what you think it does.
| Similar-sounding keywords in different languages can have
| completely different meanings. E.g. the volatile keyword in
| Java vs C++. You don't know what you don't know, right? How
| do you know that the AI generated code does what you think
| it does?
| jacobr1 wrote:
| Beyond code-gen I think some techniques are very
| underutilized. One can generate tests, generate docs,
| explain things line by line. Explicitly explaining
| alternative approaches and tradeoffs is helpful too.
| While, as with everything in this space, there are
| imperfections, I find a ton of value in looking beyond the
| code into thinking through the use cases, alternative
| approaches and different ways to structure the same
| thing.
| pornel wrote:
| I've wasted time debugging phantom issues due to LLM-
| generated tests that were misusing an API.
|
| Brainstorming/explanations can be helpful, but also watch
| out for Gell-Mann amnesia. It's annoying that LLMs always
| sound smart whether they are saying something smart or
| not.
| Miraste wrote:
| Yes, you can't use any of the heuristics you develop for
| human writing to decide if the LLM is saying something
| stupid, because its best insights and its worst
| hallucinations all have the same formatting, diction, and
| style. Instead, you need to engage your frontal cortex
| and rationally evaluate every single piece of information
| it presents, and that's tiring.
| spanishgum wrote:
| The same way I would with any of my own code - I would
| test it!
|
| The key here is to spend less time searching, and more
| time understanding the search result.
|
| I do think the vibe factor is going to bite companies in
| the long run. I see a lot of vibe code pushed by both
| junior and senior devs alike, where it's clear not enough
| time was spent reviewing the product. This behavior is
| being actively rewarded now, but I do think the attitude
| around building code as fast as possible will change if
| impact to production systems becomes realized as a net
| negative. Time will tell.
| senko wrote:
| > Just got out of a 15m huddle with someone trying to
| understand what they were doing in a PR before they
| admitted Claude generated everything and it worked but they
| weren't sure why...
|
| But .. that's not the AI's fault. If people submit _any_
| PRs (including AI-generated or AI-assisted) without
| _completely_ understanding them, I 'd treat is as serious
| breach of professional conduct and (gently, for first-
| timers) stress that this is _not_ acceptable.
|
| As someone hitting the "Create PR" (or equivalent) button,
| you accept responsibility for the code in question. If you
| submit slop, it's 100% on you, not on any tool used.
| draxil wrote:
| But it's pretty much a given at this point that if you
| use agents to code for any length of time it starts to
| atrophy your ability to understand what's going on. So,
| yeah, it's a bit of a devil's chalice.
| whatever1 wrote:
| If you have to review what the LLM wrote then there is no
| productivity gain.
|
| Leadership asks for vibe coding
| senko wrote:
| > If you have to review what the LLM wrote then there is
| no productivity gain.
|
| I do not agree with that statement.
|
| > Leadership asks for vibe coding
|
| Leadership always asks for more, better, faster.
| swat535 wrote:
| > If you have to review what the LLM wrote then there is
| no productivity gain.
|
| You always have to review the code, whether it's written
| by another person, yourself or an AI.
|
| I'm not sure how this translates into the loss of
| productivity?
|
| Did you mean to say that the code AI generates is
| difficult to review? In those cases, it's the fault of
| the code author and not the AI.
|
| Using AI like any other tool requires experience and
| skill.
| qingcharles wrote:
| The other day I caught it changing the grammar and spelling
| in a bunch of static strings in a totally different part of
| a project, for no sane reason.
| bdamm wrote:
| I've seen it do this as well. Odd things like swapping
| the severity level on log statements that had nothing to
| do with the task.
|
| Very careful review of my commits is the only way
| forward, for a long time.
| meowtimemania wrote:
| For me it depends on the task. For some tasks (maybe things
| that don't have good existing examples in my codebase?)
|
| I'll spend 3x the time repeatedly asking claude to do
| something for me
| 9cb14c1ec0 wrote:
| The more I use Claude Code, the more aware I become of its
| limitations. On the whole, it's a useful tool, but the bigger
| the codebase the less useful. I've noticed a big difference
| on its performance on projects with 20k lines of code versus
| 100k. (Yes, I know. A 100k line project is still very small
| in the big picture)
| alexchamberlain wrote:
| I'm not sure how, and maybe some of the coding agents are doing
| this, but we need to teach the AI to use abstractions, rather
| than the whole code base for context. We as humans don't hold
| the whole codebase in our head, and we shouldn't expect the AI
| to either.
| F7F7F7 wrote:
| There are a billion and one repos that claim to help do this.
| Let us know when you find one.
| siwatanejo wrote:
| I do think AIs are already using abstractions, otherwise you
| would be submitting all the source code of your dependencies
| into the context.
| TheOtherHobbes wrote:
| I think they're recognising patterns, which is not the same
| thing.
|
| Abstractions are stable, they're explicit in their domains,
| good abstractions cross multiple domains, and they
| typically come with a symbolic algebra of available
| operations.
|
| Math is made of abstractions.
|
| Patterns are a weaker form of cognition. They're implicit,
| heavily context-dependent, and there's no algebra. You have
| to poke at them crudely in the hope you can make them do
| something useful.
|
| Using LLMs feels more like the latter than the former.
|
| If LLMs were generating true abstractions they'd be finding
| meta-descriptions for code and language and making them
| accessible directly.
|
| AGI - or ASI - may be be able to do that some day, but it's
| not doing that now.
| anthonypasq wrote:
| the fact we can't keep the repo in our working memory is a
| flaw of our brains. i can't see how you could possibly make
| the argument that if you were somehow able to keep the entire
| codebase in your head that it would be a disadvantage.
| SkyBelow wrote:
| Information tradeoff. Even if you could keep the entire
| code base in memory, if something else has to be left out
| of memory, then you have to consider the value of an
| abstraction verses whatever other information is lost.
| Abstractions also apply to the business domain and works
| the same.
|
| You also have time tradeoffs. Like time to access memory
| and time to process that memory to achieve some outcome.
|
| There is also quality. If you can keep the entire code base
| in memory but with some chance of confusion, while
| abstractions will allow less chance of confusion, then the
| tradeoff of abstractions might be worth it still.
|
| Even if we assume a memory that has no limits, can access
| and process all information at constant speed, and no
| quality loss, there is still communication limitations to
| worry about. Energy consumption is yet another.
| sdesol wrote:
| LLMs (current implementation) are probabilistic so it really
| needs the actual code to predict the most likely next tokens.
| Now loading the whole code base can be a problem in itself,
| since other files may negatively affect the next token.
| photon_lines wrote:
| Sorry -- I keep seeing this being used but I'm not entirely
| sure how it differs from most of human thinking. Most human
| 'reasoning' is probabilistic as well and we rely on
| 'associative' networks to ingest information. In a similar
| manner - LLMs use association as well -- and not only that,
| but they are capable of figuring out patterns based on
| examples (just like humans are) -- read this paper for
| context: https://arxiv.org/pdf/2005.14165. In other words,
| they are capable of grokking patterns from simple data
| (just like humans are). I've given various LLMs my
| requirements and they produced working solutions for me by
| simply 1) including all of the requirements in my prompt
| and 2) asking them to think through and 'reason' through
| their suggestions and the products have always been
| superior to what most humans have produced. The 'LLMs are
| probabilistic predictors' comments though keep appearing on
| threads and I'm not quite sure I understand them -- yes,
| LLMs don't have 'human context' i.e. data needed to
| understand human beings since they have not directly been
| fed in human experiences, but for the most part -- LLMs are
| not simple 'statistical predictors' as everyone brands them
| to be. You can see a thorough write-up I did of what GPT is
| / was here if you're interested:
| https://photonlines.substack.com/p/intuitive-and-visual-
| guid...
| didibus wrote:
| You seem possibly more knowledgeable than me on the
| matter.
|
| My impression is that LLMs predict the next token based
| on the prior context. They do that by having learned a
| probability distribution from tokens -> next-token.
|
| Then as I understand, the models are never reasoning
| about the problem, but always about what the next token
| should be given the context.
|
| The chain of thought is just rewarding them so that the
| next token isn't predicting the token of the final answer
| directly, but instead predicting the token of the
| reasoning to the solution.
|
| Since human language in the dataset contains text that
| describes many concepts and offers many solutions to
| problems. It turns out that predicting the text that
| describes the solution to a problem often ends up being
| the correct solution to the problem. That this was true
| was kind of a lucky accident and is where all the
| "intelligence" comes from.
| photon_lines wrote:
| So - in the pre-training step you are right -- they are
| simple 'statistical' predictors but there are more steps
| involved in their training which turn them from simple
| predictors to being able to capture patterns and reason
| -- I tried to come up with an intuitive overview of how
| they do this in the write-up and I'm not sure I can give
| you a simple explanation here, but I would recommend you
| play around with Deep-Seek and other more advanced
| 'reasoning' or 'chain-of-reason' models and ask them to
| perform tasks for you: they are not simply statistically
| combining information together. Many times they are able
| to reason through and come up with extremely advanced
| working solutions. To me this indicates that they are not
| 'accidentally' stumbling upon solutions based on statistics
| -- they actually are able to 'understand' what you are
| asking them to do and to produce valid results.
| sdesol wrote:
| I'm not sure if I would say human reasoning is
| 'probabilistic' unless you are taking a very far step
| back and saying based on how the person lived, they have
| ingrained biases (weights) that dictates how they reason.
| I don't know if LLMs have a built in scepticism like
| humans do, that plays a significant role in reasoning.
|
| Regardless if you believe LLMs are probabilistic or not,
| I think what we are both saying is context is king and
| what it (LLM) says is dictated by the context (either
| through training) or introduced by the user.
| Workaccount2 wrote:
| Humans have a neuro-chemical system that performs
| operations with electrical signals.
|
| That's the level to look at, unless you have a dualist
| view of the brain (we are channeling supernatural
| forces).
| lll-o-lll wrote:
| Yep, just like looking at a bird's feather through a
| microscope explains the principles of flight...
|
| Complexity theory doesn't have a mathematics (yet), but
| that doesn't mean we can't see that it exists. Studying
| the brain at the lowest levels hasn't led to any major
| insights into how cognition functions.
| photon_lines wrote:
| 'I don't know if LLMs have a built in scepticism like
| humans do' - humans don't have an 'in built skepticism'
| -- we learn it through experience and through being
| taught how to 'reason' in school (and it takes a very
| long time to do this). You believe that this is
| ingrained, but you may have forgotten having to slog through
| most of how the world works and being tested when you
| went to school and when your parents taught you these
| things. On the context component: yes, context is vitally
| important (just as it is with humans) -- you can't
| produce a great solution unless you understand the 'why'
| behind it and how the current solution works so I 100%
| agree with that.
| ijidak wrote:
| For me, the way humans finish each other's sentences and
| often think of quotes from the same movies at the same
| time in conversation (when there is no clear reason for
| that quote to be a part of the conversation), indicates
| that there is a probabilistic element to human thinking.
|
| Is it entirely probabilistic? I don't think so. But, it
| does seem that a chunk of our speech generation and
| processing is similar to LLMs. (e.g. given the words I've
| heard so far, my brain is guessing words x y z should
| come next.)
|
| I feel like the conscious, executive mind humans have
| exercises some active control over our underlying
| probabilistic element. And LLMs lack the conscious
| executive.
|
| e.g. They have our probabilistic capabilities, without
| some additional governing layer that humans have.
| nomel wrote:
| No, it doesn't, nor do we. It's why abstractions and
| documentations exist.
|
| If you know what a function achieves, and you trust it to
| do that, you don't need to see/hold its exact
| implementation in your head.
| sdesol wrote:
| But documentation doesn't include styling or preferred
| pattern, which is why I think a lot people complain that
| the LLM will just produce garbage. Also documentation is
| not guaranteed to be correct or up to date. To be able to
| produce the best code based on what you are hoping for, I
| do think having the actual code is necessary unless
| styling/design patterns are not important; in that case
| documentation will suffice, provided it is accurate
| and up to date.
| throwaway314155 wrote:
| /compact in Claude Code is effectively this.
| LinXitoW wrote:
| They already do, or at least Claude Code does. It will search
| for a method name, then only load a chunk of that file to get
| the method signature, for example.
|
| It will use the general information you give it to make
| educated guesses of where things are. If it knows the code is
| Vue based and it has to do something with "users", it might
| search for "src/*/_User_.vue".
|
| This is also the reason why the quality of your code makes
| such a large difference. The more consistent the naming of
| files and classes, the better the AI is at finding them.
| sdesol wrote:
| > I really desperately need LLMs to maintain extremely
| effective context
|
| I actually built this. I'm still not ready to say "use the tool
| yet" but you can learn more about it at
| https://github.com/gitsense/chat.
|
| The demo link is not up yet as I need to finalize an admin tool
| but you should be able to follow the npm instructions to play
| around with it.
|
| The basic idea is, you should be able to load your entire repo
| or repos and use the context builder to help you refine it. Or
| you can create custom analyzers that you can do 'AI
| Assisted' searches with, like executing `!ask find all frontend
| code that does [this]`, and because the analyzer knows how
| to extract the correct metadata to support that query, you'll
| be able to easily build the context using it.
| kvirani wrote:
| Wait that's not how Cursor etc work? (I made assumptions)
| trenchpilgrim wrote:
| Dunno about Cursor but this is exactly how I use Zed to
| navigate groups of projects
| sdesol wrote:
| I don't use Cursor so I can't say, but based on what I've
| read, they optimize for smaller context to reduce cost and
| probably for performance. The issue is, I think this is
| severely flawed as LLMs are insanely context sensitive and
| forgetting to include a reference file can lead to
| undesirable code.
|
| I am obviously biased, but I still think to get the best
| results, the context needs to be human curated to ensure
| everything the LLM needs will be present. LLMs are
| probabilistic, so the more relevant context, the greater
| the chances the final output is the most desired.
| hirako2000 wrote:
| Not clear how it gets around what is, ultimately, a context
| limit.
|
| I've been fiddling with some process too, would be good if
| you shared the how. The readme looks like yet another
| full-fledged app.
| sdesol wrote:
| Yes there is a context window limit, but I've found for
| most frontier models, you can generate very effective code
| if the context window is under 75,000 tokens provided the
| context is consistent. You have to think of everything from
| a probability point of view and the more logical the
| context, the greater the chances of better code.
|
| For example, if the frontend doesn't need to know the
| backend code (other than the interface), not including the
| backend code when solving a specific frontend problem can
| reduce context size and improve the chances of the
| expected output. You just need to ensure you include the
| necessary interface documentation.
|
| As for the full fledged app, I think you raised a good
| point and I should add a 'No lock in' section for why to
| use it. The app has a message tool that lets you pick and
| choose what messages to copy. Once you've copied the
| context (including any conversation messages that can help
| the LLM), you can use the context where ever you want.
|
| My strategy with the app is to be the first place you go
| to start a conversation before you even generate code, so
| my focus is helping you construct contexts (the smaller the
| better) to feed into LLMs.
| handfuloflight wrote:
| Doesn't Claude Code do all of this automatically?
| sdesol wrote:
| I haven't looked at Claude Code, so I don't know if it
| has analyzers or not that understand how to extract any
| type of data other than specific coding data that it is
| trained on. Based on the runtime for some tasks, I would
| not be surprised if it is going through all the files and
| asking "is this relevant"
|
| My tool is mainly targeted at massive code bases and
| enterprise as I still believe the most efficient way to
| build accurate context is by domain experts.
|
| Right now, I would say 95% of my code is AI generated (98%
| human architectured) and I am spending about $2 a day on
| LLM costs and the code generation part usually never runs
| more than 30 seconds for most tasks.
| handfuloflight wrote:
| Well you should look at it, because it's not going
| through all files. I looked at your product and the
| workflow is essentially asking me to do manually what
| Claude Code does auto. Granted, manually selecting the
| context will probably lead to lower costs in any case
| because Claude Code invokes tool calls like grep to do
| its search, so I do see merit in your product in that
| respect.
| sdesol wrote:
| Looking at the code, it does have some sort of automatic
| discovery. I also don't know how scalable Claude Code is.
| I've spent over a decade thinking about code search, so I
| know what the limitations are for enterprise code.
|
| One of the neat tricks that I've developed is, I would
| load all my backend code for my search component and then
| I would ask the LLM to trace a query and create a context
| bundle for only the files that are affected. Once the LLM
| has finished, I just need to do a few clicks to refine a
| 80,000 token size window down to about 20,000 tokens.
|
| I would not be surprised if this is one of the tricks
| that it does, as it is highly effective. Also, yes, my tool
| is manual, but I treat conversations as durable assets, so
| in the future you should be able to say "last week I did
| this, load the same files" and the LLM will know what files
| to bring into context.
| handfuloflight wrote:
| Excellent, I look forward to trying it out, at minimum to
| wean off my dependency on Claude Code and its likely
| current state of overspending on context. I agree with
| looking at conversations as durable assets.
| sdesol wrote:
| > current state of overspending on context
|
| The thing that is killing me when I hear about Claude
| Code and other agent tools is the amount of energy they
| must be using. People say they let the task run for an
| hour and I can't help but think how much energy is
| being used and whether Claude Code is being upfront with
| how much things will actually cost in the future.
| pacoWebConsult wrote:
| FWIW Claude code conversations are also durable. You can
| resume any past conversation in your project. They're
| stored as jsonl files within your `$HOME/.claude`
| directory. This retains the actual context (including
| your prompts, assistant responses, tool usages, etc) from
| that conversation, not just the files you're affecting as
| context.
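|
| e.g. a quick sketch of pulling the user prompts back out of
| those files (the exact directory layout and field names are
| assumptions and may differ between Claude Code versions):
|
|     # sketch: skim prompts out of stored Claude Code sessions
|     import json
|     from pathlib import Path
|
|     for path in Path.home().glob(".claude/**/*.jsonl"):
|         print(f"== {path}")
|         for line in path.read_text().splitlines():
|             try:
|                 entry = json.loads(line)
|             except json.JSONDecodeError:
|                 continue
|             if entry.get("type") == "user":  # field name assumed
|                 print(entry.get("message"))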
| sdesol wrote:
| Thanks for the info. I actually want to make it easy for
| people to review aider, plandex, claude code, etc.
| conversations so I will probably look at importing them.
|
| My goal isn't to replace the other tools, but to make
| them work smarter and more efficiently. I also think we
| will in a year or two, start measuring performance based
| on how developers interact with LLMs (so management will
| want to see the conversations). Instead of looking at
| code generated, the question is going to be, if this
| person is let go, what is the impact based on how they
| are contributing via their conversations.
| seanmmward wrote:
| The primary use case isn't just about shoving more code in
| context, although depending on the task, there is an
| irreducible minimum context needed for it to capture all the
| needed understanding. The 1M context model is a unique beast in
| terms of how you need to feed it, and its real power is being
| able to tackle long horizon tasks which require iterative
| exploration, in-context learning, and resynthesis. I.e., some
| problems are breadth (go fix an API change in 100 files), others
| however require depth (go learn from trying 15 different ways
| to solve this problem). 1M Sonnet is unique in its capabilities
| for the latter in particular.
| hinkley wrote:
| Sounds to me like your problem has shifted from how much the AI
| tool costs per hour to how much it costs per token because
| resetting a model happens often enough that the price doesn't
| amortize out per hour. That giant spike every ?? months
| overshadows the average cost per day.
|
| I wonder if this will become more universal, and if we won't
| see a 'tick-tock' pattern like Intel used, where they tweak the
| existing architecture one or more times between major design
| work. The 'tick' is about keeping you competitive and the
| 'tock' is about keeping you relevant.
| TZubiri wrote:
| "However. Price is king. Allowing me to flood the context
| window with my code base is great"
|
| I don't vibe code, but in general having to know all of the
| codebase to be able to do something is a smell, it's
| spaghetti, it's a lack of encapsulation.
|
| When I program I cannot think about the whole codebase, I have
| a couple of files open tops and I think about the code in those
| files.
|
| This issue of having to understand the whole codebase,
| complaining about abstractions, microservices, and OOP, and
| wanting everything to be in a "simple" monorepo, or a monolith;
| is something that I see juniors do, almost exclusively.
| ants_everywhere wrote:
| > I really desperately need LLMs to maintain extremely
| effective context
|
| The context is in the repo. An LLM will never have the context
| you need to solve all problems. Large enough repos don't fit on
| a single machine.
|
| There's a tradeoff just like in humans where getting a specific
| task done requires removing distractions. A context window that
| contains everything makes focus harder.
|
| For a long time context windows were too small, and they
| probably still are. But they have to get better at
| understanding the repo by asking the right questions.
| stuartjohnson12 wrote:
| > An LLM will never have the context you need to solve all
| problems.
|
| How often do you need more than 10 million tokens to answer
| your query?
| ants_everywhere wrote:
| I exhaust the 1 million context windows on multiple models
| multiple times per day.
|
| I haven't used the Llama 4 10 million context window so I
| don't know how it performs in practice compared to the
| major non-open-source offerings that have smaller context
| windows.
|
| But there is an induced demand effect where as the context
| window increases it opens up more possibilities, and those
| possibilities can get bottlenecked on requiring an even
| bigger context window size.
|
| For example, consider the idea of storing all Hollywood
| films on your computer. In the 1980s this was impossible.
| If you store them in DVD or Bluray quality you could
| probably do it in a few terabytes. If you store them in
| full quality you may be talking about petabytes.
|
| We recently struggled to get a full file into a context
| window. Now a lot of people feel a bit like "just take the
| whole repo, it's only a few MB".
| onion2k wrote:
| _Large enough repos don 't fit on a single machine._
|
| I don't believe any human can understand a problem if they
| need to fit the entire problem domain in their head, let alone
| a domain whose scope doesn't fit on a single computer. You
| _have_ to break it down into a manageable amount of
| information and tackle it in chunks.
|
| If a person can do that, so can an LLM prompted to do that by
| a person.
| sdesol wrote:
| > But they have to get better at understanding the repo by
| asking the right questions.
|
| How I am tackling this problem is making it dead simple for
| users to create analyzers that are designed to enrich text
| data. You can read more about how it would be used in a
| search at https://github.com/gitsense/chat/blob/main/packages
| /chat/wid...
|
| The basic idea is, users would construct analyzers with the
| help of LLMs to extract the proper metadata that can be
| semantically searched. So when the user does an AI Assisted
| search with my tool, I would load all the analyzers
| (description and schema) into the system prompt and the LLM
| can determine which analyzers can be used to answer the
| question.
|
| A very simplistic analyzer would be to make it easy to
| identify backend and frontend code so you can just use the
| command `!ask find all frontend files` and the LLM will
| construct a deterministic search that knows to match for
| frontend files.
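|
| To make this concrete, here is a minimal, purely illustrative
| sketch of what an analyzer definition could look like. The field
| names and the register_analyzer() helper are hypothetical, not
| the actual gitsense/chat API:
|
| ```python
| # Hypothetical sketch: an "analyzer" is a description plus a
| # metadata schema the LLM can fill in per file. Names are made up.
| frontend_backend_analyzer = {
|     "name": "frontend-backend-classifier",
|     "description": (
|         "Tags each file as frontend, backend, or shared so a "
|         "query like '!ask find all frontend files' can be "
|         "answered with a deterministic search."
|     ),
|     "schema": {
|         "tier": {"type": "string",
|                  "enum": ["frontend", "backend", "shared"]},
|         "framework": {"type": "string", "required": False},
|     },
| }
|
| def register_analyzer(analyzer: dict, system_prompt: list) -> None:
|     """Append the analyzer's description and schema to the system
|     prompt so the LLM can decide whether it applies to a query."""
|     system_prompt.append(f"Analyzer: {analyzer['name']}")
|     system_prompt.append(analyzer["description"])
|     system_prompt.append(f"Schema: {analyzer['schema']}")
| ```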
| rootnod3 wrote:
| So, more tokens means better results, but at the same time more
| tokens means the model distracts itself along the way. It is an
| improvement but also potentially detrimental. How is that
| beneficial in any capacity? What was said last
| week? Embrace AI or leave?
|
| All I see so far is: don't embrace and stay.
| rootnod3 wrote:
| So, I see this got downvoted. Instead of just downvoting, I
| would prefer to have a counter-argument. Honestly. I am on the
| skeptic side of LLM, but would not mind being turned to the
| other side with some solid arguments.
| pupppet wrote:
| How does anyone send these models that much context without it
| tripping over itself? I can't get anywhere near that much before
| it starts losing track of instructions.
| 9wzYQbTYsAIc wrote:
| I've been having decent luck telling it to keep track of itself
| in a .plan file, not foolproof, of course, but it has some
| ability to "preserve context" between contexts.
|
| Right now I'm experimenting with using separate .plan files for
| tracking key instructions across domains like architecture and
| feature decisions.
| CharlesW wrote:
| > _I've been having decent luck telling it to keep track of
| itself in a .plan file, not foolproof, of course, but it has
| some ability to "preserve context" between contexts._
|
| This is the way. Not only have I had good luck with both a
| TASKS.md and TASKS-COMPLETE.md (for history), but I have an
| .llm/arch full of AI-assisted, for-LLM .md files (auth.md,
| data-access.md, etc.) that document architecture decisions
| made along the way. They're invaluable for effectively and
| efficiently crossing context chasms.
| olddustytrail wrote:
| I think it's key to not give it contradictory instructions,
| which is an easy mistake to make if you forget where you
| started.
|
| As an example, I know of an instance where the LLM claimed it
| had tried a test on its laptop. This obviously isn't true so
| the user argued with it. But they'd originally told it that it
| was a Senior Software Engineer so playing that role, saying you
| tested locally is fine.
|
| As soon as you start arguing with those minor points you break
| the context; now it's both a Software Engineer and an LLM. Of
| course you get confused responses if you do that.
| pupppet wrote:
| The problem I often have is I may have instruction like-
|
| General instruction:
|   - Do "ABC"
|
| If condition == whatever:
|   - Do "XYZ" instead
|
| I have a hard time making the AI obey in the instances where I
| wish to override my own instruction, and without full control of
| the input context I can't just modify my 'General Instruction'
| on a case-by-case basis to avoid contradicting myself.
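|
| If you do control the input (e.g. via the API rather than a chat
| UI), one workaround is to assemble the instruction block per
| request so the general rule and its override never appear
| together. A rough sketch; the rule strings and condition check
| are placeholders:
|
| ```python
| GENERAL_RULE = 'Do "ABC".'
| OVERRIDE_RULE = 'Do "XYZ" instead of "ABC".'
|
| def build_instructions(condition_met: bool) -> str:
|     """Pick exactly one rule so the model never sees
|     contradictory instructions in the same context."""
|     rule = OVERRIDE_RULE if condition_met else GENERAL_RULE
|     return f"General instruction:\n- {rule}"
|
| # Prepend build_instructions(condition == "whatever") to the
| # system prompt for this request only.
| ```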
| olddustytrail wrote:
| That's a difficult case where you might want to collect
| your good context and shift it to a different session.
|
| It would be nice if the UI made that easy to do.
| greenfish6 wrote:
| Yes, but if you look in the rate limit notes, the rate limit is
| 500k tokens / minute for tier 4, which we are on. Given how
| stingy Anthropic has been with rate limit increases, this is for
| very few people right now
| alvis wrote:
| Context windows beyond a certain size don't bring much benefit,
| just a higher bill. If the model still keeps forgetting
| instructions, it is easy to end up with long messages, higher
| context consumption, and hence a bigger bill.
|
| I'd rather have an option to limit the context size.
| EcommerceFlow wrote:
| It does if you're working with bigger codebases. I've found
| copy/pasting my entire codebase + adding a <task> works
| significantly better than Cursor.
| spiderice wrote:
| How does one even copy their entire codebase? Are you saying
| you attach all the files? Or you use some script to copy all
| the text to your clipboard? Or something else?
| EcommerceFlow wrote:
| I created a script that outputs the entire codebase to a
| text file (also allows me to exclude
| files/folders/node_modules), separating and labeling each
| file in the program folder.
|
| I then structure my prompts like so:
|
| <project_code> ``` ``` </project_code>
|
| <heroku_errors> " " </heroku_errors>
|
| <task> " " </task>
|
| I've been using this with Google Ai studio and it's worked
| phenomenally. 1 million tokens is A LOT of code, so I'd
| imagine this would work for lots n lots of project type
| programs.
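|
| A minimal version of such a dump script might look like this
| (the exclude lists and output path are just examples):
|
| ```python
| import os
|
| EXCLUDE_DIRS = {"node_modules", ".git", "dist"}
| EXCLUDE_FILES = {"package-lock.json"}
|
| def dump_codebase(root: str, out_path: str) -> None:
|     """Walk the project and write every file into one labeled
|     text file so the whole codebase can be pasted into a prompt."""
|     with open(out_path, "w", encoding="utf-8") as out:
|         for dirpath, dirnames, filenames in os.walk(root):
|             # prune excluded directories so os.walk skips them
|             dirnames[:] = [d for d in dirnames
|                            if d not in EXCLUDE_DIRS]
|             for name in sorted(filenames):
|                 if name in EXCLUDE_FILES:
|                     continue
|                 path = os.path.join(dirpath, name)
|                 rel = os.path.relpath(path, root)
|                 out.write(f"\n===== {rel} =====\n")
|                 try:
|                     with open(path, encoding="utf-8") as f:
|                         out.write(f.read())
|                 except UnicodeDecodeError:
|                     out.write("[binary file skipped]\n")
|
| dump_codebase(".", "codebase.txt")
| ```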
| andrewstuart wrote:
| Oh man finally. This has been such a HUGE advantage for Gemini.
|
| Could we please have zip files too? ChatGPT and Gemini both
| unpack zip files via the chat window.
|
| Now how about a button to download all files?
| qsort wrote:
| I won't complain about a strict upgrade, but that's a pricy boi.
| Interesting to see differential pricing based on size of input,
| which is understandable given the O(n^2) nature of attention.
| isoprophlex wrote:
| 1M of input... at $6/1M input tokens. Better hope it can one-shot
| your answer.
| elitan wrote:
| have you ever hired humans?
| bicepjai wrote:
| Depends on which human you tried :) Do not underestimate
| yourself!
| rafaelero wrote:
| god they keep raising prices
| revskill wrote:
| The critical issue with LLMs, where they never beat humans:
| they break what already worked.
| henriquegodoy wrote:
| That's incredible to see how AI models are improving; I'm really
| happy with this news (imo it's more impactful than the release
| of GPT-5). Now we need more tokens per second, and then the
| self-improvement of the models will accelerate.
| lherron wrote:
| Wow, I thought they would feel some pricing pressure from GPT5
| API costs, but they are doubling down on their API being more
| expensive than everyone else.
| sebzim4500 wrote:
| I think it's the right approach, the cost of running these
| things as coding assistants is negligible compared to the
| benefit of even a slight model improvement.
| AtNightWeCode wrote:
| GPT5 API uses more tokens for answers of the same quality as
| previous versions. Fell into that trap myself. I use both
| Claude and OpenAI right now. Will probably drop OpenAI since
| they are obviously not to be trusted considering the way they
| do changes.
| shamano wrote:
| 1M tokens is impressive, but the real gains will come from how we
| curate context--compact summaries, per-repo indexes, and phase
| resets. Bigger windows help; guardrails keep models focused and
| costs predictable.
| jbellis wrote:
| Just completed a new benchmark that sheds some light on whether
| Anthropic's premium is worth it.
|
| (Short answer: not unless your top priority is speed.)
|
| https://brokk.ai/power-rankings
| 24xpossible wrote:
| Why no Grok 4?
| Zorbanator wrote:
| You should be able to guess.
| rcanepa wrote:
| I recently switched to the $200 CC subscription and I think I
| will stay with it for a while. I briefly tested whatever
| version of ChatGPT 5 comes with the free Cursor plan and it was
| unbearably slow. I could not really code with it as I was
| constantly getting distracted while waiting for a response. So,
| speed matters a lot for some people.
| Someone1234 wrote:
| Before this they supposedly had a longer context window than
| ChatGPT, but I have workloads that abuse the heck out of context
| windows (100-120K tokens). ChatGPT genuinely seems to have a 32K
| context window, in the sense that is legitimately remembers/can
| utilize everything within that window.
|
| Claude previously had "200K" context windows, but during testing
| it wouldn't even hit a full 32K before hitting a wall and
| forgetting earlier parts of the context. They also have extremely
| short prompt limits relative to the other services around, making
| it hard to utilize their supposedly larger context windows (which
| is suspicious).
|
| I guess my point is that with Anthropic specifically, I don't
| trust their claims because that has been my personal experience.
| It would be nice if this "1M" context window now allows you to
| actually use 200K though, but it remains to be seen if it can
| even do _that_. As I said with Anthropic you need to verify
| everything they claim.
| Etheryte wrote:
| Strong agree, Claude is very quick to forget things like "don't
| do this", "never do this" or things it tried that were wrong.
| It will happily keep looping even in very short conversations,
| completely defeating the purpose of using it. It's easy to game
| the numbers, but it falls apart in the real world.
| joquarky wrote:
| I've found it better to use antonyms than negations in most
| situations.
| lvl155 wrote:
| The only time this is useful is to do init on a sizable code
| base or dump a "big" CSV.
| film42 wrote:
| The 1M token context was Gemini's headlining feature. Now, the
| only thing I'd like Claude to work on is tokens counted towards
| document processing. Gemini will often bill 1/10th the tokens
| Anthropic does for the same document.
| varyherb wrote:
| I believe this can be configured in Claude Code via the following
| environment variable:
|
| ANTHROPIC_BETAS="context-1m-2025-08-07" claude
| falcor84 wrote:
| Have you tested it? I see that this env var isn't specified in
| their docs
|
| https://docs.anthropic.com/en/docs/claude-code/settings#envi...
| bazhand wrote:
| Add these settings to your `.claude/settings.json`:
|
| ```json
| {
|   "env": {
|     "ANTHROPIC_CUSTOM_HEADERS": {"anthropic-beta": "context-1m-2025-08-07"},
|     "ANTHROPIC_MODEL": "claude-sonnet-4-20250514",
|     "CLAUDE_CODE_MAX_OUTPUT_TOKENS": 8192
|   }
| }
| ```
| gdudeman wrote:
| A tip for those who both use Claude Code and are worried about
| token use (which you should be if you're stuffing 400k tokens
| into context even if you're on 20x Max):
|
| 1. Build context for the work you're doing. Put lots of your
| codebase into the context window.
|
| 2. Do work, but at each logical stopping point hit double escape
| to rewind to the context-filled checkpoint. You do not spend
| those tokens to rewind to that point.
|
| 3. Tell Claude your developer finished XYZ, have it read it into
| context and give high level and low level feedback (Claude will
| find more problems with your developer's work than with yours).
|
| If you want to have multiple chats running, use /resume and pull
| up the same thread. Hit double escape to the point where Claude
| has rich context, but has not started down a specific rabbit
| hole.
| rvnx wrote:
| Thank you for the tips. Do you know how to roll back the latest
| changes? Trying very hard to do it, but it seems like Git is the
| only way?
| gdudeman wrote:
| Git or my favorite "Undo all of those changes."
| spike021 wrote:
| this usually gets the job done for me as well
| SparkyMcUnicorn wrote:
| I haven't used it, but saw this the other day:
| https://github.com/RonitSachdev/ccundo
| rtuin wrote:
| Quick tip when working with Claude Code and Git: When you're
| happy with an intermediate result, stage the changes by
| running `git add` (no commit). That makes it possible to
| always go back to the staged changes when Claude messes up.
| You can then just discard the unstaged changes and don't have
| to roll back to the latest commit.
| seperman wrote:
| Very interesting. Why does Claude find more problems if we
| mention the code is written by another developer?
| bgilly wrote:
| In my experience, Claude will criticize others more than it
| will criticize itself. Seems similar to how LLMs in general
| tend to say yes to things or call anything a good idea by
| default.
|
| I find it to be an entertaining reflection of the cultural
| nuances embedded into training data and reinforcement
| learning processes.
| mcintyre1994 wrote:
| Total guess, but maybe it breaks it out of the sycophancy
| that most models seem to exhibit?
|
| I wonder if they'd also be better at things like telling you
| an idea is dumb if you tell it it's from someone else and
| you're just assessing it.
| gdudeman wrote:
| Claude is very agreeable and is an eager helper.
|
| It gives you the benefit of the doubt if you're coding.
|
| It also gives you the benefit of the doubt if you're looking
| for feedback on your developer's work. If you give it a hint
| of distrust "my developer says they completed this, can you
| check and make sure, give them feedback....?" Claude will
| look out for you.
| sixothree wrote:
| I've been using Serena MCP to keep my context smaller. It seems
| to be working because claude uses it pretty much exclusively to
| search the codebase.
| lucasfdacunha wrote:
| Could you elaborate a bit on how that works? Does it need any
| changes in how you use Claude?
| yahoozoo wrote:
| I thought double escape just clears the text box?
| gdudeman wrote:
| With an empty text box, double escape shows you a list of
| previous inputs from you. You can go back and fork at any one
| of those.
| oars wrote:
| I tell Claude that it wrote XYZ in another session (I wrote it)
| then use that context to ask questions or make changes.
| gdudeman wrote:
| I'll note this saves a lot of wait time as well! No sitting
| there while a new Claude builds context from scratch.
| i_have_an_idea wrote:
| This sounds like the programmer equivalent of astrology.
|
| > Build context for the work you're doing. Put lots of your
| codebase into the context window.
|
| If you don't say that, what do you think happens as the agent
| works on your codebase?
| insane_dreamer wrote:
| I usually tell CC (or opencode, which I've been using recently)
| to look up the files and find the relevant code. So I'm not
| attaching a huge number of files to the context. But I don't
| actually know whether this saves tokens or not.
| Wowfunhappy wrote:
| I do this all the time and it sometimes works, but it's not a
| silver bullet. Sometimes Claude benefits from having the full
| conversation.
| ZeroCool2u wrote:
| It's great they've finally caught up, but unfortunate it's on
| their mid-tier model only and it's laughably expensive.
| thimabi wrote:
| Oh, well, ChatGPT is being left in the dust...
|
| When done correctly, having one million tokens of context window
| is amazing for all sorts of tasks: understanding large codebases,
| summarizing books, finding information on many documents, etc.
|
| Existing RAG solutions fill a void up to a point, but they lack
| the precision that large context windows offer.
|
| I'm excited for this release and hope to see it soon on the UI as
| well.
| OutOfHere wrote:
| Fwiw, OpenAI does have a decent active API model family of
| GPT-4.1 with a 1M context. But yes, the context of the GPT-5
| models is terrible in comparison, and it's altogether atrocious
| for the GPT-5-Chat model.
|
| The biggest issue in ChatGPT right now is a very inconsistent
| experience, presumably due to smaller models getting used even
| for paid users with complex questions.
| kotaKat wrote:
| A million tokens? Damn, I'm gonna need a _lot_ of quarters to
| play this game at Chuck-E-Cheese.
| xnx wrote:
| 1M context windows are not created equal. I doubt Claude's recall
| is as good as Gemini's 1M context recall.
| https://cloud.google.com/blog/products/ai-machine-learning/t...
| xnx wrote:
| Good analysis here:
| https://news.ycombinator.com/item?id=44878999
|
| > the model that's best at details in long context text and
| code analysis is still Gemini.
|
| > Gemini Pro and Flash, by comparison, are far cheaper
| firasd wrote:
| A big problem with the chat apps (ChatGPT; Claude.ai) is the
| weird context window hijinks. Especially ChatGPT does wild
| stuff.. sudden truncation; summarization; reinjecting 'ghost
| snippets' etc
|
| I was thinking this should be up to the user (do you want to
| continue this conversation with context rolling out of the window
| or start a new chat) but now I realized that this is inevitable
| given the way pricing tiers and limited computation work. The
| only way to have full context is to use developer tools like
| Google AI Studio or a chat app that wraps the API.
|
| With a custom chat app that wraps the API you can even inject
| the current timestamp into each message and ask the LLM to add a
| new row to a markdown table every 10 minutes, summarizing each
| 10-minute chunk (a rough sketch follows).
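|
| Something like the following, where call_llm() is a placeholder
| for whatever API client you use (it is not a real SDK call):
|
| ```python
| import time
|
| SUMMARY_INTERVAL = 10 * 60  # seconds
|
| def call_llm(messages):
|     """Placeholder for the actual API call."""
|     raise NotImplementedError
|
| class TimestampedChat:
|     def __init__(self):
|         self.messages = []
|         self.last_summary = time.time()
|
|     def send(self, user_text):
|         # inject the current timestamp into every user message
|         stamp = time.strftime("%Y-%m-%d %H:%M:%S")
|         self.messages.append(
|             {"role": "user", "content": f"[{stamp}] {user_text}"})
|         if time.time() - self.last_summary > SUMMARY_INTERVAL:
|             # standing order: summarize the last 10-minute chunk
|             self.messages.append({
|                 "role": "user",
|                 "content": "Add one new row to the markdown "
|                            "summary table covering the last "
|                            "10 minutes of this conversation.",
|             })
|             self.last_summary = time.time()
|         reply = call_llm(self.messages)
|         self.messages.append(
|             {"role": "assistant", "content": reply})
|         return reply
| ```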
| cruffle_duffle wrote:
| > btw every 10 minutes just make a new row in a markdown table
| that summarizes every 10 min chunk
|
| Why make it time based instead of "message based"... like
| "every 10 messages, summarize to blah-blah.md"?
| dev0p wrote:
| Probably it's more cost effective and less error prone to
| just dump the message log rather than actively rethink the
| context window, costing resources and potentially losing
| information in the process. As the models get better, this
| might change.
| firasd wrote:
| Sure. But you'd want to help out the LLM with a message count
| like this is message 40, this is message 41... so when it
| hits message 50 it's like ahh time for a new summary and call
| the memory_table function (cause it's executing the earlier
| standing order in your prompt)
| tosh wrote:
| How did they do the 1M context window?
|
| Same technique as Qwen? As Gemini?
| deadbabe wrote:
| Unfortunately, larger context isn't really the answer after a
| certain point. Small focused context is better, lazily throwing a
| bunch of tokens in as a context is going to yield bad results.
| ramoz wrote:
| Awesome addition to a great model.
|
| The best interface for long context reasoning has been AIStudio
| by Google. Exceptional experience.
|
| I use Prompt Tower to create long context payloads.
| simianwords wrote:
| How does "supporting 1M tokens" really work in practice? Is it a
| new model? Or did they just remove some hard coded constraint?
| eldenring wrote:
| Serving a model efficiently at 1M context is difficult and
| could be much more expensive/numerically tricky. I'm guessing
| they were working on serving it properly, since it's the same
| "model" in scores and such.
| simianwords wrote:
| Thanks - still not clear what they did really. Some inference
| time hacks?
| FergusArgyll wrote:
| That would imply the model always had a 1m token context
| but they limited it in the api and app? That's strange
| because they can just charge more for every token past 250k
| (like google does, I believe).
|
| But if not, wouldn't it have to be a completely retrained
| model? It's clearly not that - good question!
| nickphx wrote:
| Yay, more room for stray cats.
| alienbaby wrote:
| The fracturing of all the models offered across providers is
| annoying. The number of different models and the fact a given
| model will have different capabilities from different providers
| is ridiculous.
| chrisweekly wrote:
| Peer of this post currently also on HN front page, comparing perf
| for Claude vs Gemini, w/ 1M tokens:
| https://news.ycombinator.com/item?id=44878999
| DiabloD3 wrote:
| Neat. I do 1M tokens context locally, and do it entirely with a
| single GPU and FOSS software, and have access to a wide range of
| models of equivalent or better quality.
|
| Explain to me, again, how Anthropic's flawed business model
| works?
| codazoda wrote:
| Tell us more?
| DiabloD3 wrote:
| Nothing really to say, it's just like everyone else's
| inference setups.
|
| Select a model that produces good results, has anywhere from
| 256k to 1M context (ex: Qwen3-Coder can do 1M), is under one
| of the acceptable open weights licenses, and run it in
| llama.cpp.
|
| llama.cpp can split layers between active and MoE, and only
| load the active ones into vram, leaving the rest of it
| available for context.
|
| With Qwen3-Coder-30B-A3B, I can use Unsloth's Q4_K_M, consume
| a mere 784MB of VRAM with the active layers, then consume
| 27648MB (kv cache) + 3096MB (context) with the kv cache
| quantized to iq4_nl. This will fit onto a single card with
| 32GB of VRAM, or slightly spill over on 24GB.
|
| Since I don't personally need that much, I'm not pouring
| entire projects into it (I know people do this, and more data
| _does not produce better results_ ), I bump it down to 512k
| context and fit it in 16.0GB, to avoid spill over on my 24GB
| card. In the event I do need the context, I am always free to
| enable it.
|
| I do not see a meaningful performance difference between all
| on the card and MoE sent to RAM while active is on VRAM; it's
| very much a worthwhile option for home inference.
|
| Edit: For completeness' sake, 256k context with this
| configuration is 8.3GB total VRAM, making good inference on a
| _very_ tight budget absolutely possible.
| ffitch wrote:
| I wonder how modern models fare on NovelQA and FLenQA (benchmarks
| that test ability to understand long context beyond needle in a
| haystack retrieval). The only such test on a reasoning model that
| I found was done on o3-mini-high
| (https://arxiv.org/abs/2504.21318), it suggests that reasoning
| noticeably improves FLenQA performance, but this test only
| explored context up to 3,000 tokens.
| dang wrote:
| Related ongoing thread:
|
| _Claude vs. Gemini: Testing on 1M Tokens of Context_ -
| https://news.ycombinator.com/item?id=44878999 - Aug 2025 (9
| comments)
| whalesalad wrote:
| My first thought was "gg no re" can't wait to see how this
| changes compaction requirements in claude code.
| pmxi wrote:
| The reason I initially got interested in Claude was because they
| were the first to offer a 200K token context window. That was
| massive in 2023. However, they didn't keep up once Gemini offered
| a 1M token window last year.
|
| I'm glad to see an attempt to return to having a competitive
| context window.
| markb139 wrote:
| I've tried 2 AI tools recently. Neither could produce the correct
| code to calculate the CPU temperature on a Raspberry Pi RP2040.
| The code worked, looked ok and even produced reasonable looking
| results - until I put a finger on the chip and thus raised the
| temp. The calculated temperature went down. As an aside the free
| version of chatGPT didn't know about anything newer than 2023 so
| couldn't tell me about the RP2350
| anvuong wrote:
| How can you be sure putting the finger on the chip raises the
| temp? If it feels hot, that means heat from the chip is being
| transferred to your finger, which may decrease the temp, no?
| broshtush wrote:
| From my understanding putting your finger on an uncooled CPU
| acts like a passive cooler, thus actually decreasing
| temperature.
| fwip wrote:
| I don't think a larger context window would help with that.
| fpauser wrote:
| Best comment ;)
| ghjv wrote:
| wouldn't your finger have acted as a heat sink, lowering the
| temp? sounds like the program may have worked correctly. could
| be worth trying again with a hot enough piece of metal instead
| of your finger
| logicchains wrote:
| With that pricing I can't imagine why anyone would use Claude
| Sonnet through the API when Gemini 2.5 Pro is both better and
| cheaper (especially at long-context understanding).
| CuriouslyC wrote:
| Claude is a good deal with the $20 subscription giving a fair
| amount of sonnet use with Code. It's also got a very distinct
| voice as far as LLMs go, and tends to produce cleaner/clearer
| writing in general. I wouldn't use the API in an application
| but the subscription feels like a pretty good deal.
| siva7 wrote:
| Ah, so claude code on subscription will become a crippled down
| version
| joduplessis wrote:
| As far as coding goes Claude seems to be the most competent right
| now, I like it. GPT5 is abysmal - I'm not sure if they're bugs,
| or what, but the new release takes a good few steps back. Gemini
| still a hit and miss - and Grok seems to be a poor man's Claude
| (where code is kind of okay, a bit buggy and somehow similar to
| Claude).
| brokegrammer wrote:
| Many people are confused about the usefulness of 1M tokens
| because LLMs often start to get confused after about 100k. But
| this is big for Claude 4 because it uses automatic RAG when the
| context becomes large. With optimized retrieval thanks to RAG,
| we'll be able to make good use of those 1M tokens.
| m4r71n wrote:
| How does this work under the hood? Does it build an in-memory
| vector database of the input sources and runs queries on top of
| that data to supplement the context window?
| Balgair wrote:
| Wow!
|
| As a fiction writer/noodler this is amazing. I can put not just a
| whole book in as before, not just a whole series, but the entire
| corpus of author_s_ in.
|
| I mean, from the pov of biography writers, this is awesome too.
| Just dump it all in, right?
|
| I'll have to switch to using Sonnet 4 now for workflows and edit
| my RAG code to use longer windows, a _lot_ longer.
| irthomasthomas wrote:
| Brain: Hey, you going to sleep?
| Me: Yes.
| Brain: That 200,001st token cost you $600,000/M.
| qwertox wrote:
| > desperately need LLMs to maintain extremely effective context
|
| Last time I used Gemini it did something very surprising: instead
| of providing readable code, it started to generate pseudo-
| minified code.
|
| Like one CSS class would become one long line of CSS, and one JS
| function would become one long line of JS, with most of the variable
| names minified, while some remained readable, but short. It did
| away with all unnecessary spaces.
|
| I was asking myself what is happening here, and my only
| explanation was that maybe Google started training Gemini on
| minified code, on making Gemini understand and generate it, in
| order to maximize the value of every token.
| ericol wrote:
| "...in API"
|
| That's a VERY relevant clarification. This DOESN'T apply to web
| or app users.
|
| Basically, if you want a 1M context window you have to
| specifically pay for it.
| sporkland wrote:
| Does anyone have data on how much better results these 1M-token
| context models produce than the more limited windows paired with
| certain RAG implementations? Or how the 200k vs 1M token models
| perform on a benchmark in the face of RAG?
| poniko wrote:
| [Claude usage limit reached. Your limit will reset at..] .. eh
| lunch is a good time to go home anyways..
| chmod775 wrote:
| For some context, only the tweaks files and scripting parts of
| Cyberpunk 2077 are ~2 million LOC.
| not_that_d wrote:
| My experience with the current tools so far:
|
| 1. It helps to get me going with new languages, frameworks,
| utilities or full green field stuff. After that I spend a lot of
| time parsing the code to understand what it wrote; I kind of
| "trust" it because checking is too tedious, but "it works".
|
| 2. When working with languages or frameworks that I know, I find
| it makes me unproductive: the amount of time I spend writing a
| good-enough prompt with the correct context is almost the same
| as, or more than, if I write the stuff myself. And to be honest,
| the solution it gives me works for this specific case but looks
| like junior code, with pitfalls that are not that obvious unless
| you have the experience to know them.
|
| I used it with Typescript, Kotlin, Java and C++, for different
| scenarios, like websites, ESPHome components (ESP32), backend
| APIs, node scripts etc.
|
| Bottom line: useful for hobby projects, scripts and prototypes,
| but for enterprise-level code it is not there.
| jeremywho wrote:
| My workflow is to use Claude desktop with the filesystem mcp
| server.
|
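| For reference, wiring up the filesystem MCP server in Claude
| Desktop is roughly a claude_desktop_config.json entry shaped like
| the one below; the project path is a placeholder and the current
| MCP docs are the authority on the exact format:
|
| ```json
| {
|   "mcpServers": {
|     "filesystem": {
|       "command": "npx",
|       "args": ["-y", "@modelcontextprotocol/server-filesystem",
|                "/path/to/your/project"]
|     }
|   }
| }
| ```
|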
| I give claude the full path to a couple of relevant files
| related to the task at hand, ie where the new code should hook
| into or where the current problem is.
|
| Then I ask it to solve the task.
|
| Claude will read the files, determine what should be done and
| it will edit/add relevant files. There's typically a couple of
| build errors I will paste back in and have it correct.
|
| Current code patterns & style will be maintained in the new
| code. It's been quite impressive.
|
| This has been with Typescript and C#.
|
| I don't agree that what it has produced for me is hobby-grade
| only...
| taberiand wrote:
| I've been using it the same way. One approach that's worked
| well for me is to start a project and first ask it to analyse
| and make a plan with phases for what needs to be done, save
| that plan into the project, then get it to do each phase in
| sequence. Once it completes a phase, have it review the code
| to confirm if the phase is complete. Each phase of work and
| review is a new chat.
|
| This way helps ensure it works on manageable amounts of code
| at a time and doesn't overload its context, but also keeps
| the bigger picture and goal in sight.
| mnky9800n wrote:
| I find that sometimes this works great and sometimes it
| happily tells you everything works and your code fails
| successfully and if you aren't reading all the code you
| would never know. It's kind of strange actually. I don't
| have a good feeling when it will get everything correct and
| when it will fail and that's what is disconcerting. I would
| be happy to be given advice on what to do to untangle when
| it's good and when it's not. I love chatting with Claude
| code about code. It's annoying that it doesn't always get
| it right and also doesn't really interact with failure like
| a human would, at least in my experience.
| taberiand wrote:
| Of course, everything needs to be verified - I'm just
| trying to figure out a process that enables it to work as
| effectively as it can on large code bases in a structured
| way. Committing each stage to git, fixing issues and
| adjusting the context still comes into play.
| nwatson wrote:
| One can also integrate with, say, a running PyCharm with the
| Jetbrains IDE MCP server. Claude Desktop can then interact
| directly with PyCharm.
| hamandcheese wrote:
| Any particular reason you prefer that over Claude code?
| jeremywho wrote:
| I'm on windows. Claude Code via WSL hasn't been as smooth a
| ride.
| risyachka wrote:
| Pretty much my experience too.
|
| I usually go to option 2 - just write it by myself as it is
| same time-wise but keeps skills sharp.
| fpauser wrote:
| Not degenerating is really challenging these days. There are
| the bubbles that simulate multiple realities for us and try to
| untrain our logical thinking, and there are the LLMs that try
| to convince us that thinking for ourselves is unproductive. I
| wonder when this digitalophilia suddenly turns into
| digitalophobia.
| flowerthoughts wrote:
| I predict microservices will get a huge push forward. The
| question then becomes if we're good enough at saying "Claude,
| this is too big now, you have to split it in two services" or
| not.
|
| If LLMs maintain the code, the API boundary
| definitions/documentation and orchestration, it might be
| manageable.
| fsloth wrote:
| Why microservices? Monoliths with code-golfed minimal
| implementation size (but high quality architecture)
| implemented in a strongly typed language would consume far fewer
| tokens (and thus would be cheaper to maintain).
| arwhatever wrote:
| Won't this cause [insert LLM] to lose context around the
| semantics of messages passed between microservices?
|
| You could then put all services in 1 repo, or point LLM at X
| number of folders containing source for all X services, but
| then it doesn't seem like you'll have gained anything, and at
| the cost of added network calls and more infra management.
| urbandw311er wrote:
| Why not just cleanly separated code in a single execution
| environment? No need to actually run the services in separate
| execution environments just for the sake of an LLM being able
| to parse it, that's crazy! You can just give it the files or
| folders it needs for the particular services within the
| project.
|
| Obviously there's still other reasons to create micro
| services if you wish, but this does not need to be another
| reason.
| fpauser wrote:
| Same conclusion here. Also good for analyzing existing
| codebases and to generate documentation for undocumented
| projects.
| j45 wrote:
| It's quite good at this, I have been tying in Gemini Pro with
| this too.
| johnisgood wrote:
| > but for enterprise level code it is not there
|
| It is good for me in Go but I had to tell it what to write and
| how.
| sdesol wrote:
| I've been able to create a very advanced search engine for my
| chat app that is more than enterprise ready. I've spent a
| decade thinking about search, but in a different language.
| Like you, I needed to explain to the LLM what I knew about
| writing a search engine in Java so it could write one in
| JavaScript using libraries I did not know, and it got me 95% of
| the way there.
|
| It is also incredibly important to note that the 5% that I
| needed to figure out was the difference between throw away
| code and something useful. You absolutely need domain
| knowledge but LLMs are more than enterprise ready in my
| opinion.
|
| Here is some documentation on how my search solution is used
| in my app to show that it is not a hobby feature.
|
| https://github.com/gitsense/chat/blob/main/packages/chat/wid.
| ..
| johnisgood wrote:
| Thanks for your reply, I am in the same boat, and it works
| for me, like it seems to work for you. So as long as we are
| effective with it, why not? Of course I am not doing things
| blindly and expect good results.
| jiggawatts wrote:
| Something I've discovered is that it may be worthwhile writing
| the prompt anyway, even for a framework you're an expert with.
| Sometimes the AIs will surprise me with a novel approach, but
| the real value is that the prompt makes for _excellent_
| documentation of the requirements! It's a much better starting
| point for doc-comments or PR blurbs than after-the-fact
| ramblings.
| viccis wrote:
| I agree. For me it's a modern version of that good ol "rails
| new" scaffolding with Ruby on Rails that got you started with a
| project structure. It makes sense because LLMs are particularly
| good at tasks that require little more knowledge than just a
| near perfect knowledge of the documentation of the tooling
| involved, and creating a well organized scaffold for a
| greenfield project falls squarely in that area.
|
| For legacy systems, especially ones in which a lot of the
| things they do are because of requirements from external
| services (whether that's tech debt or just normal growing
| complexity in a large connected system), it's less useful.
|
| And for tooling that moves fast and breaks things (looking at
| you, Databricks), it's basically worthless. People have already
| brought attention to the fact that it will only be as current
| as its training data was, and so if a bunch of terminology,
| features, and syntax have changed since then (ahem,
| Databricks), you would have to do some kind of prompt
| engineering with up to date docs for it to have any hope of
| succeeding.
| alfalfasprout wrote:
| The bigger problem I'm seeing is engineers that become over
| reliant on vibe coding tools are starting to lose context on
| how systems are designed and work.
|
| As a result, their productivity might go up on simple "ticket
| like tasks" where it's basically just simple implementation
| (find the file(s) to edit, modify it, test it) but when they
| start using it for all their tasks suddenly they don't know how
| anything works. Or worse, they let the LLM dictate and bad
| decisions are made.
|
| These same people are also very dogmatic on the use of these
| tools. They refuse to just code when needed.
|
| Don't get me wrong, this stuff has value. But I just hate
| seeing how it's made many engineers complacent and accelerated
| their ability to add to tech debt like never before.
| mnky9800n wrote:
| Yea that's right. It's kind of annoying how useful it is for
| hobby projects and it is suddenly useless on anything at work.
| Haha. I love Claude code for some stuff (like generating a
| notebook to analyse some data). But it really just disconnects
| you from the problem you are solving without you going through
| everything it writes. And I'm really bullish on ai coding tools
| haha, for example:
|
| https://open.substack.com/pub/mnky9800n/p/coding-agents-prov...
| pqs wrote:
| I'm not a programmer, but I need to write python and bash
| programs to do my work. I also have a few websites and other
| personal projects. Claude Code helps me implement those little
| projects I've been wanting to do for a very long time, but I
| couldn't due to the lack of coding experience and time. Now I'm
| doing them. Also now I can improve my emacs environment,
| because I can create lisp functions with ease. For me, this is
| the perfect tool, because now I can do those little projects I
| couldn't do before, making my life easier.
| zingar wrote:
| Big +1 to customizing emacs! Used to feel so out of reach,
| but now I basically rolled my own cursor.
| chamomeal wrote:
| LLMs totally kick ass for making bash scripts
| dboreham wrote:
| Strong agree. Bash is so annoying that there have been many
| scripts that I wanted to have, but just didn't write (did
| the thing manually instead) rather than go down the rabbit
| hole of Bash nonsense. LLMs turn this on its head. I
| probably have LLMs write 1-2 bash scripts a week now, that
| I commit to git for use now and later.
| MangoCoffee wrote:
| At the end of the day, all tools are made to make their
| users' lives easier.
|
| I use GitHub Copilot. I recently did a vibe code hobby
| project for a command line tool that can display my
| computer's IP, hard drive, hard drive space, CPU, etc. GPT
| 4.1 did coding and Claude did the bug fixing.
|
| The code it wrote worked, and I even asked it to create a
| PowerShell script to build the project for release
| stpedgwdgfhgdd wrote:
| For enterprise software development CC is definitely there. A
| 100k-LOC Go PaaS platform with a microservices architecture in a
| monorepo is manageable.
|
| The prompt needs to be good, but in plan mode it will
| iteratively figure it out.
|
| You need to have automated tests. For enterprise software
| development that actually goes without saying.
| dclowd9901 wrote:
| It also steps right over easy optimizations. I was doing a
| query on some github data (tedious work) and rather than
| preliminarily filter down using the graphql search method, it
| wanted to comb through all PRs individually. This seems like
| something it probably should have figured out.
| amelius wrote:
| It is very useful for small tasks like fixing network problems,
| or writing regexp patterns based on a few examples.
| MarcelOlsz wrote:
| _Here's how YOU can save $200/mo!_
| brulard wrote:
| For me it was like this for like a year (using Cline + Sonnet &
| Gemini) until Claude Code came out and until I learned how to
| keep context real clean. The key breakthrough was treating AI
| as an architect/implementer rather than a code generator.
|
| Most recently I first ask CC to create a design document for
| what we are going to do. He has instructions to look into the
| relevant parts of the code and docs to reference them. I review
| it, and after a few back-and-forths we have defined what we want
| to do.
| Next step is to chunk it into stages and even those to smaller
| steps. All this may take few hours, but after this is well
| defined, I clear the context. I then let him read the docs and
| implement one stage. This goes mostly well and if it doesn't I
| either try to steer him to correct it, or if it's too bad, I
| improve the docs and start this stage over. After stage is
| complete, we commit, clear context and proceed to next stage.
|
| This way I spend maybe a day creating a feature that would take
| me maybe 2-3. And at the end we have a document, unit tests,
| storybook pages, and features that get overlooked, like
| accessibility, aria attributes, etc.
|
| At the very end I like another model to make a code review.
|
| Even if this didn't make me faster now, I would consider it
| future-proofing myself as a software engineer as these tools
| are improving quickly
| imiric wrote:
| This is a common workflow that most advanced users are
| familiar with.
|
| Yet even following it to a T, and being _really_ careful with
| how you manage context, the LLM will still hallucinate,
| generate non-working code, steer you into wrong directions
| and dead ends, and just waste your time in most scenarios.
| There's no magical workflow or workaround for avoiding this.
| These issues are inherent to the technology, and have been
| since its inception. The tools have certainly gotten more
| capable, and the ecosystem has matured greatly in the last
| couple of years, but these issues remain unsolved. The idea
| that people who experience them are not using the tools
| correctly is insulting.
|
| I'm not saying that the current generation of this tech isn't
| useful. I've found it very useful for the same scenarios GP
| mentioned. But the above issues prevent me from relying on it
| for anything more sophisticated than that.
| drums8787 wrote:
| My experience is the opposite I guess. I am having a great time
| using claude to quickly implement little "filler features" that
| require a good amount of typing and pulling from/editing
| different sources. Nothing that requires much brainpower beyond
| remembering the details of some sub system, finding the right
| files, and typing.
|
| Once the code is written, review, test and done. And on to more
| fun things.
|
| Maybe what has made it work is that these tasks have all fit
| comfortably within existing code patterns.
|
| My next step is to break down bigger & more complex changes
| into claude friendly bites to save me more grunt work.
| unlikelytomato wrote:
| I wish I shared this experience. There are virtually no
| filler features for me to work on. When things feel like
| filler on my team, it's generally a sign of tech debt and we
| wouldn't want to have it generate all the code it would take.
| What are some examples of filler features for you?
|
| On the other hand, it does cost me about 8 hours a week
| debugging issues created by bad autocompletes from my team.
| The last 6 months have gotten really bad with that. But that
| is a different issue.
| apimade wrote:
| Many who say LLMs produce "enterprise-grade" code haven't
| worked in mid-tier or traditional companies, where projects are
| held together by duct tape, requirements are outdated, and
| testing barely exists. In those environments, enterprise-ready
| code is rare even without AI.
|
| For developers deeply familiar with a codebase they've worked
| on for years, LLMs can be a game-changer. But in most other
| cases, they're best for brainstorming, creating small tests, or
| prototyping. When mid-level or junior developers lean heavily
| on them, the output may look useful.. until a third-party
| review reveals security flaws, performance issues, and built-in
| legacy debt.
|
| That might be fine for quick fixes or internal tooling, but
| it's a poor fit for enterprise.
| therealpygon wrote:
| I mostly agree, with the caveat that I would say it can
| absolutely be useful when used appropriately as an "assistant".
| NOT vibe coding blindly and hoping what I end up with is
| useful. "Implement x specific thing" (e.g. add an edit button
| to component x), not "implement a whole new epic feature that
| includes changes to a significant number of files". Imagine
| meeting a house builder and saying "I want a house", then
| leaving and expecting to come back to exactly the house you
| dreamed of.
|
| I get why, it's a test of just how intuitive the model can be
| at planning and execution which drives innovation more than 1%
| differences in benchmarks ever will. I encourage that
| innovation in the hobby arena or when dogfooding your AI
| engineer. But as a replacement developer in an enterprise where
| an uncaught mistake could cost millions? No way. I wouldn't
| even want to be the manager of the AI engineer team, when they
| come looking for the only real person to blame for the mistake.
|
| For additional checks, or internal tools, and for scripts?
| Sure. It's incredibly useful with all sorts of non-application
| development tasks. I've not written a batch or bash script in
| forever...you just don't really need to much anymore. The
| linear flow of most batch/bash/scripts (like you mentioned)
| couldn't be a more suitable domain.
|
| Also, with a basic prompt, it can be an incredibly useful
| rubber duck. For example, I'll say something like "how do you
| think I should solve x problem"(with tools for the codebase and
| such, of course), and then over time having rejected and been
| adversarial to every suggestion, I end up working through the
| problem and have a more concrete mental design. Think "over-
| eager junior know-it-all that tries to be right constantly"
| without the person attached and you get a better idea of what
| kind of LLM output you can expect. For me it's less about
| wanting a plan from the LLM, and more about talking through the
| problems my plan solves better.
| TZubiri wrote:
| Remember kids, just because you CAN doesn't mean you SHOULD
| mrcwinn wrote:
| This tells me they've gotten very good at caching and modeling
| the impact of caching.
| fpauser wrote:
| I observed that Claude produces a lot of bloat. I wonder how
| such LLM-generated projects age.
| cadamsdotcom wrote:
| I'm glad to see the only company chasing margins - which they get
| by having a great product and a meticulous brand - finding even
| more ways to get margin. That's good business.
| howinator wrote:
| I could be wrong, but I think this pricing is the first to admit
| that cost scales quadratically with number of tokens. It's the
| first time I've seen nonlinear pricing from an LLM provider which
| implicitly mirrors the inference scaling laws I think we're all
| aware of.
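|
| A back-of-the-envelope illustration of why pricing might step up
| with input size, under the simplifying assumption that attention
| cost grows with the square of the sequence length (ignoring KV
| caching and everything else a real serving stack does):
|
| ```python
| def relative_attention_cost(n_tokens, baseline=200_000):
|     """Naive O(n^2) attention cost relative to a 200K prompt.
|     An intuition pump, not a serving-cost model."""
|     return (n_tokens / baseline) ** 2
|
| print(relative_attention_cost(200_000))    # 1.0
| print(relative_attention_cost(1_000_000))  # 25.0 (5x tokens)
| ```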
| jpau wrote:
| Google[1] also has a "long context" pricing structure. OpenAI
| may be considering offering similar since they do not offer
| their priority processing SLAs[2] for context >128K.
|
| [1] https://cloud.google.com/vertex-ai/generative-ai/pricing
|
| [2] https://openai.com/api-priority-processing/
| reverseblade2 wrote:
| Does this cover subscription?
| anonym29 wrote:
| API only for now, but at the very bottom of the post: "We're
| also exploring how to bring long context to other Claude
| products."
|
| So, not yet, but maybe someday?
| _joel wrote:
| Fantastic, use up your quota even more quickly. :)
| phyzix5761 wrote:
| What I've found with LLMs is they're basically a better version
| of Google Search. If I need a quick "How do I do..." or if I need
| to find a quick answer to something its way more useful than
| Google and the fact that I can ask follow up questions is
| amazing. But for any serious deep work it has a long way to go.
| mr_moon wrote:
| I feel exactly the same way. Why skim and sift 15 different
| stackoverflow posts when an LLM can pick out exactly the info I
| need?
|
| I don't need to spin up an entire feature in a few seconds. I
| need help understanding where something is broken; what are
| some opinions on best practice; or finding out what a poorly
| written snippet is doing.
|
| context still v important for this though and I appreciate
| cranking that capacity. "read 15000 stackoverflow posts for me
| please"
| anvuong wrote:
| The action of sifting through poop to find gold actually
| positively develops my critical thinking skill. I,
| too, went through a phase of just asking LLM for a specific
| concept instead of Googling it and weave through dozens of
| wiki pages or niche mailing list discussions. It did improve
| my productivity but I feel like it dulls my brain. So
| recently I have to tone that down and force myself to go back
| to the old way. Maybe too much of a good thing is bad.
| Whatarethese wrote:
| This is my primary use of AI. Looking for a new mountain bike
| and using AI to list and compare parts of the bike and which is
| best for my use case scenario. Works pretty well so far.
| meander_water wrote:
| I like to spend a lot of time in "Ask" mode in Cursor. I guess
| the equivalent in Claude code is "plan" mode.
|
| Where I have minimal knowledge about the framework or language, I
| ask a lot of questions about how the implementation would work,
| what the tradeoffs are etc. This is to minimize any
| misunderstanding between me and the tool. Then I ask it to write
| the implementation plan, and execute it one by one.
|
| Cursor lets you have multiple tabs open so I'll have a Ask mode
| and Agent mode running in parallel.
|
| This is a lot slower, and if it was a language/framework I'm
| familiar with I'm more likely to execute the plan myself.
| itissid wrote:
| My experience with Claude code beyond building anything bigger
| than a webpage, a small API, a tutorial on CSS etc has been
| pretty bad. I think context length is a manageable problem, but
| not the main one. I used it to write a 50K LoC python code base
| with 300 unit tests and it went ok for the first few weeks and
| then it failed. This is after there is a CLAUDE.md file for every
| single module that needs it as well as detailed agents for
| testing, design, coding and review.
|
| I won't go into a case-by-case list of its failures. The core
| of the issue is misaligned incentives, which I want to get into:
|
| 1. The incentives for coding agents in general, and Claude in
| particular, are to write LOTS of code. None of them are good at
| planning and verification.
|
| 2. The human is involved, ironically, in a haphazard way in the
| agent's process, and this has to do with how the problem of
| coding for these agents is defined. Human developers are like
| snowflakes when it comes to opinions on software design; there
| is no way to apply each one's preferences (except papier-mache
| and superglue from SO, Reddit threads and books) to the design
| of the system in any meaningful way, and that makes a simple
| system way too complex or a complex problem simplistic.
|
| - There is no way to evolve the plan to accept new preferences
| except text in a CLAUDE.md file in git that you will have to
| read through and edit.
|
| - There is no way to know the near-term effect of code choices
| now on 1 week from now.
|
| - So much code is written that asking a person to review it,
| when you are at the envelope and pushing the limit, feels
| morally wrong and an insane ask. How many of your code reviews
| are instead replaced by 15-30 min design meetings to solicit
| feedback on the design of the PR -- because it is so complex --
| before just pushing the PR into dev? WTF am I even doing, I
| wonder.
|
| - It does not know how far to explore for better rewards versus
| settling for local rewards, resulting in commented-out tests and
| arbitrarily deleted code to make its plan "work".
|
| In short, code is a commodity for the CEOs of coding-agent
| companies and the CXOs of your company (Salesforce has everyone
| coding, but that just raises the floor, and that's a good thing;
| it does NOT lower the bar and make people 10x devs). All of them
| have bought into the idea that 10x is somehow producing 10x the
| code. Your time reviewing, unmangling and maintaining the code
| is not the commodity. It never ever was.
| lpa22 wrote:
| One of the most helpful usages of CC so far is when I simply ask:
|
| "Are there any bugs in the current diff"
|
| It analyzes the changes very thoroughly, often finds very subtle
| bugs that would cost hours of time/deployments down the line, and
| points out a bunch of things to think through for correctness.
___________________________________________________________________
(page generated 2025-08-12 23:00 UTC)