[HN Gopher] Claude Opus 4.1
___________________________________________________________________
Claude Opus 4.1
Author : meetpateltech
Score : 565 points
Date : 2025-08-05 16:28 UTC (6 hours ago)
(HTM) web link (www.anthropic.com)
(TXT) w3m dump (www.anthropic.com)
| jasonlernerman wrote:
| Has anyone tested it yet? How's it acting?
| smerrill25 wrote:
| waiting for this, too.
| usaar333 wrote:
| No obvious gains I feel from quick chats, but too early to
| tell.
|
| These benchmark gains aren't that high, so I doubt it is that
| obvious.
| jedisct1 wrote:
| Tested it on a refactor of Zig code. It worked fine, but was
| very slow.
| minimaxir wrote:
| This likely won't move the needle for Opus use over Sonnet while
| the cost remains the same. Using OpenRouter rankings
| (https://openrouter.ai/rankings) as a proxy, Sonnet 3.7 and
| Sonnet 4 combined generate 17x more tokens than Opus 4.
| qsort wrote:
| All three major labs released something within hours of each
| other. This anime arc is insane.
| x187463 wrote:
| Given the GPT5 rumors, August is just getting started.
| kridsdale3 wrote:
| Given the Gregorian Calendar and the planet's path through
| its orbit, August is just getting started.
| tomrod wrote:
| This legitimately made me chuckle.
| ozgung wrote:
| What a time to be alive
| tonyhart7 wrote:
| as if they wait for the competitor first, then launch at the
| same time to let the market decide which one is best
| torginus wrote:
| I think this means that GPT5 is better - you can't launch a
| worse model after the competitor supersedes you - you have to
| show that you're in the lead even if it's just for a day.
| rapind wrote:
| Not sure that this is true. Are there a lot of people
| waiting anxiously to adopt the next model on the day of
| release and expecting some huge work advantage?
| azan_ wrote:
| Absolutely.
| vFunct wrote:
| None of them seem to have published any papers associated with
| them on how these new models advanced the state-of-the-art
| though. =^(
| hugodan wrote:
| China will do that for them
| candiddevmike wrote:
| It's definitely a coincidence
| wilg wrote:
| It's not a coincidence or a cartel, it's PR
| counterprogramming.
| BudaDude wrote:
| Agree 100%
|
| If you look at the past, whenever Google announces
| something major, OpenAI almost always releases something as
| well.
|
| People forget that OpenAI was started to compete
| with Google on AI.
| Etheryte wrote:
| This is why you have PR departments. Being on top of the HN
| front page, news sites, etc matters a lot. Even if you can't be
| the first, it's important to dilute the attention as much as
| possible to reduce the limelight your competitors get.
| steveklabnik wrote:
| This is the bit I'm most interested in:
|
| > We plan to release substantially larger improvements to our
| models in the coming weeks.
| machiaweliczny wrote:
| This is so people don't immediately migrate to GPT5
| NitpickLawyer wrote:
| Cheekily announcing during oAI's oss model launch :D
| haaz wrote:
| it is barely an improvement according to their own benchmarks.
| not saying that's a bad thing, but not enough for anybody to
| notice any difference
| waynenilsen wrote:
| i think it's probably mostly vibes but that still counts, this
| is not in the charts
|
| > Windsurf reports Opus 4.1 delivers a one standard deviation
| improvement over Opus 4 on their junior developer benchmark,
| showing roughly the same performance leap as the jump from
| Sonnet 3.7 to Sonnet 4.
| ttoinou wrote:
| That's why they named it 4.1 and not 4.5
| zamadatix wrote:
| When it's "that's why they incremented the version by a tenth
| instead of a half" you know things have really started to
| slow for the large models.
| phonon wrote:
| Opus 4 came out 10 weeks ago. So this is basically one new
| training run improvement.
| zamadatix wrote:
| And in 52 weeks we've gone 3.5->4.1 with this training
| improvement, meanwhile the 52 weeks prior to that were
| Claude -> Claude 3. The absolute jumps per version delta
| also used to be larger.
|
| I.e. it seems we don't get much more than new training
| run levels of improvement anymore. Which is better than
| nothing, but a shame compared to the early scaling.
| globalise83 wrote:
| Is it really a bigger jump to go from plausible to
| frequently useful, than from frequently useful to
| indispensable?
| zamadatix wrote:
| Why is there supposed to be no step between frequently
| useful and indispensable? Quickly going from nothing to
| frequently useful (which involved many rapid hops in
| between) was certainly surprising, and that's precisely
| the momentum that's been lost.
| mclau157 wrote:
| They released this because competitors are releasing things
| leetharris wrote:
| Good! I'm glad they are just giving us small updates. Opus 4
| just came out, if you have small improvements, why not just
| release them? There's no downside for us.
| AstroBen wrote:
| I don't think this could even be called an improvement? It's
| small enough that it could just be random chance
| j_bum wrote:
| I've always wondered about this actually. My assumption is
| that they always "pick the best" result from these tests.
|
| Instead, ideally they'd run the benchmark tests many times,
| and share all of the results so we could make statistical
| determinations.
| gloosx wrote:
| They need to leave some room to release 10 more models. They
| could crank benchmarks to 100% but then no new model is needed
| lol? Pretty sure these pretty benchmark graphs are all
| completely staged marketing numbers since they do solve the
| same problems they are being trained on - no novel or unknown
| problems are presented to them.
| levocardia wrote:
| "You pay $20/mo for X, and now I'm giving you 1.05*X for the
| same price." Outrageous!
| onlyrealcuzzo wrote:
| I will only add that it's interesting that in the results
| graphic, they simply highlighted Opus 4.1 - choosing not to
| display which models have the best scores - as Opus 4.1 only
| scored the best on about half of the benchmarks - and was worse
| than Opus 4.0 on at least one measure.
| Topfi wrote:
| I am still very early, but output quality wise, yes, there does
| not seem to be any noticeable improvement in my limited
| personal testing suite. What I have noticed though is
| subjectively better adherence to instructions and documentation
| provided outside the main prompt, though I have no way to
| quantify or reliably test that yet. So beyond reliably finding
| Needles-in-the-Haystack (which Frontier models have done well
| on lately), Opus 4.1 seems to do better at acting on those
| needles even when not explicitly guided to, compared to Opus 4.
| jzig wrote:
| I'm confused by how Opus is presented to be superior in nearly
| every way for coding purposes yet the general consensus and my
| own experience seem to be that Sonnet is much much better. Has
| anyone switched to entirely using Opus from Sonnet? Or maybe
| switching to Opus for certain things while using Sonnet for
| others?
| datameta wrote:
| I now eagerly await Sonnet 4.1, only because of this release.
| rtfeldman wrote:
| Yes, Opus is very noticeably better at programming in both Rust
| and Zig in my experience. I wish it were cheaper!
| MostlyStable wrote:
| Opus seems better to me on long tasks that require iterative
| problem solving and keeping track of the context of what we
| have already tried. I usually switch to it for any kind of
| complicated troubleshooting etc.
|
| I stick with Sonnet for most things because it's generally good
| enough and I hit my token limits with it far less often.
| unshavedyak wrote:
| Same. I'm on the $200 plan and I find Opus "better", but
| Sonnet is more straightforward. Sonnet is, to me, a "don't
| let it think" model. It does great if you give it concrete
| and small goals. Anything vague or broad and it starts
| thinking and it's a problem.
|
| Opus gives you a bit more rope to hang yourself with imo.
| Yes, it "thinks" slightly better, but still not well enough
| for me. But it can be good enough to convince you that it can
| do the job.. so i dunno, i almost dislike it for that. I find
| Sonnet just easier to predict.
|
| Could i use Opus like i do Sonnet? Yes definitely, and
| generally i do. But then i don't really see much difference
| since i'm hand-holding so much.
| adastra22 wrote:
| Every time that Sonnet is acting like it has brain damage
| (which is once or twice a day), I switch to Opus and it seems
| to sort things out pretty fast. This is unscientific anecdata
| though, and it could just be that switching models (any model)
| would have worked.
| anonzzzies wrote:
| Exactly that.
| j45 wrote:
| They both seem to behave differently depending on how loaded
| the system is.
| api wrote:
| I have suspected for a long time that hosted models load
| shed by diverting some requests to lesser models or running
| more quantized versions under high load.
| parineum wrote:
| I think OpenRouter saves tokens by summarizing queries
| through another model, IIRC.
| monatron wrote:
| This is a great use case for sub-agents IMO. By default, sub-
| agents use sonnet. You can have opus orchestrate the various
| agents and get (close to) the best of both worlds.
| adastra22 wrote:
| Is there a way to get persistent sub-agents? I'd love to
| have a bunch of YAML files in my repository, one for each
| sub-agent, and have those automatically used across all
| Claude Code instances I have on multiple machines (I dev on
| laptop and desktop), or across the team.
| mwigdahl wrote:
| Yep: https://docs.anthropic.com/en/docs/claude-code/sub-agents
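| A minimal sketch of what one checked-in definition might look
| like, going by those docs (the exact frontmatter fields and
| the .claude/agents/ location are my reading of them, so
| double-check against the page above):
| 
|   File: .claude/agents/code-reviewer.md
| 
|   ---
|   name: code-reviewer
|   description: Reviews diffs for bugs and style problems.
|   tools: Read, Grep, Glob
|   ---
|   You are a careful code reviewer. Report concrete problems
|   in the changes you are given, not nitpicks.
| 
| Since the files live in the repo, they travel with it to
| every machine and teammate, which is what you're asking for.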
| rapind wrote:
| In this case I don't think the controller needs to be the
| smartest model. I use sonnet as the main driver and pass
| the heavy thinking (via zen mcp) onto Gemini pro for
| example, but I could use openai or opus or all of them via
| OpenRouter.
|
| Subagents seem pretty similar to using zen mcp w/
| OpenRouter but maybe better or at least more turnkey? I'll
| be checking them out.
| mark_undoio wrote:
| Amp (ampcode.com) uses Sonnet as its main model and has
| GPT o3 as a special purpose tool / subagent. It can call
| into that when it needs particularly advanced reasoning.
|
| Interestingly I found that prompting it to ask the o3
| submodel (which they call The Oracle) to check Sonnet's
| working on a debugging solution was helpful. Extra
| interesting to me was the fact that Sonnet appeared to do
| a better job once I'd prompted that (like chain of
| thought prompting, perhaps asking it to put forward an
| explanation to be checked actually triggered more
| effective thinking).
| gpm wrote:
| This seems like a case of reversion to the mean. When one
| model is performing below average, changing anything (like
| switching to another model) is likely to improve it by random
| chance...
| keeeba wrote:
| Anthropic say Opus is better, benchmarks & evals say Opus
| is better, Opus has more parameters and parameters
| determine how much a NN can learn.
|
| Maybe Opus just is better
| HarHarVeryFunny wrote:
| Maybe context rot? If the model's output seems to be getting
| worse or in a rut, then try just clearing context / starting
| a new session.
| adastra22 wrote:
| Switching models with the same context, in this case.
| dested wrote:
| If I'm using cursor then sonnet is better, but in claude code
| Opus 4 is at least 3x better than Sonnet. As with most things
| these days, I think a lot of it comes down to prompting.
| jzig wrote:
| This is interesting. I do use Cursor with almost exclusively
| Sonnet and thinking mode turned on. I wonder if what Cursor
| does under the hood (like their indexing) somehow empowers
| Sonnet more. I do not have much experience with using Claude
| Code.
| seunosewa wrote:
| It's ridiculously overpriced in the API. Just like o3 used to
| be.
| brenoRibeiro706 wrote:
| I feel the same way. I usually use Opus to help with coding and
| documentation, and I use Sonnet for emails and so on.
| biinjo wrote:
| I'm on the Max plan and generally Opus seems to do better work
| than Sonnet. However, that's only when they allow me to use
| Opus. The usage limits, even on the max plan, are a joke.
| Yesterday I hit the limits within MINUTES of starting my work
| day.
| epolanski wrote:
| Yeah, you need to actively cherry pick which model to use in
| order to not waste tokens on stuff that would be easily
| handled by a simpler model.
| furyofantares wrote:
| I'm a bit confused by people hitting usage limits so quickly.
|
| I use Opus exclusively and don't hit limits. ccusage reports
| I'm using the API-equivalent of $2000/mo
| rirze wrote:
| You always have to ask which plan they're paying for.
| Sometimes people complain about the $20 per month plan...
| stavros wrote:
| There's no Opus quota on that plan at all.
| furyofantares wrote:
| In this case I'm replying to someone who led with "I'm
| on the Max plan" but I realize now that's ambiguous,
| maybe they are on 5x while I'm on 20x.
| Bolwin wrote:
| That's insane. Are you accounting for caching? If not,
| there's no way this is going to last
| furyofantares wrote:
| I'm using ccusage to get the number, I think it just
| looks at your history and calculates based on tokens vs
| API pricing. So I think it wouldn't account for caching.
|
| But I totally agree there's no way it lasts. I'm mostly
| only using this for side projects and I'm sitting there
| interacting with it, not YOLO'ing, I do sometimes have
| two sessions going at the same time but I'm not firing
| off swarms or anything crazy. Just have it set to Opus
| and I chat with it.
| dsrtslnd23 wrote:
| Same here, I constantly hit the Opus limits within minutes on
| the Max plan.
| gpm wrote:
| I notice that on the "Agentic Coding" benchmark cited in the
| article Sonnet 4 outperformed Opus 4 (by 0.2%), and
| underperforms Opus 4.1 (by 1.8%).
|
| So this release might change that consensus? If you believe the
| benchmarks are reflective of reality anyways.
| jimbo808 wrote:
| > If you believe the benchmarks are reflective of reality
| anyways.
|
| That's a big "if." But yeah, I can't tell a difference
| subjectively between Opus and Sonnet, other than maybe a sort
| of placebo effect. I'm more careful to write quality prompts
| when using Opus, because I don't want to waste the 5x more
| expensive tokens.
| Uehreka wrote:
| > yet the general consensus and my own experience seem to be
| that Sonnet is much much better
|
| Given that there's nothing close to scientific analysis going
| on, I find it hard to tell how big the "Sonnet is overall
| better, not just sometimes" crowd is. I think part of the
| problem is that "The bigger model is better" feels obvious to
| say, so why say it? Whereas "the smaller model is better
| actually" feels both like unobvious advice and also the kind of
| thing that feels smart to say, both of which would lead to more
| people who believe it saying it, possibly creating the illusion
| of consensus.
|
| I was trying to dig into this yesterday, but every time I come
| across a new thread, the things people are saying, and the
| proportions saying each of them, are different.
|
| I suppose one useful takeaway is this: If you're using Claude
| Max and get downgraded from Opus to Sonnet for a few hours, you
| don't have to worry too much about it being a harsh downgrade
| in quality.
| taormina wrote:
| Just more anecdata, but I entirely agree. I can't say that I am
| happy with Sonnet's output at any point, really, but it still
| occasionally works, whereas Opus has been a dumpster fire every
| single time.
| SkyPuncher wrote:
| I don't doubt Opus is technically superior, but it's not
| practically superior for me.
|
| It's still pretty much impossible to have any LLM one-shot a
| complex implementation. There's just too many details to figure
| out and too much to explain for it to get correct. Often,
| there's uncertainty and ambiguity that I only understand the
| correct answer (or rather less bad answer) after I've spent
| time deep in the code. Having Opus spit out a possibly correct
| solution just isn't useful to me. I need to understand _why_ we
| got to that solution and _why_ it's a correct solution for the
| context I'm working in.
|
| For me, this means that I largely have an iteratively driven
| implementation approach where any particular task just isn't
| that complex. Therefore, Sonnet is completely sufficient for my
| day-to-day needs.
| ssk42 wrote:
| You can also always have it create design docs and mermaid
| diagrams for each task. That makes it much easier to outline
| the why earlier, shifting left.
| bdamm wrote:
| I've been having a great time with Windsurf's "Planning"
| feature. Have a nice discussion with Cascade (Claude) all
| about what it is that needs to happen - sometimes a very
| long conversation including test code. Then when everything
| is very clear, make it happen. Then test and debug the
| results with all that context. Pretty nice.
| jstummbillig wrote:
| Can you explain what you do exactly? Do you enable plan
| mode and use with chat...?
| jm4 wrote:
| I use both. Sonnet is faster and more cost efficient. It's
| great for coding. Where Opus is noticeably better is in
| analysis. It surpasses Sonnet for debugging, finding patterns
| in data, creativity and analysis in general. It doesn't make a
| lot of sense to use Opus exclusively unless you're on a max20
| plan and not hitting limits. Using Opus for design and
| troubleshooting and Sonnet for everything else is a good way to
| go.
| astrostl wrote:
| With aggressive Claude Code use I didn't find Sonnet _better_
| than Opus but I did find it _faster_ while consuming far fewer
| tokens. Once I switched to the $100 Max plan and configured CC
| to exclusively use Sonnet I haven't run into a plan token
| limit even once. When I saw this announcement my first thing
| was to CMD-F and see when Sonnet 4.1 was coming out, because I
| don't really care about Opus outside of interactive deep
| research usage.
| ssss11 wrote:
| That's very strange. Sonnet is hot garbage and Opus is a
| miracle, for me. I also don't see anyone praising sonnet
| anywhere.
| sky2224 wrote:
| I've found that with limited context in your prompt, opus
| is just awful compared to even gpt-4.1, but once I give it even
| just a little bit more of an explanation, it jumps leagues
| ahead.
| sothatsit wrote:
| Opus really shines for completing long-running tasks with no
| supervision. But if you are using Claude Code interactively and
| actively steering it yourself, Sonnet is good enough and is
| faster.
|
| I don't believe anyone saying Sonnet is strictly better than
| Opus though, as my experience has been exactly the opposite.
| But trade-off wise, I can definitely see it being a better
| experience when used interactively because of its speed and
| lower cost.
| paxys wrote:
| Why is everything releasing today?
| datameta wrote:
| Could it be nobody wanted to be first and overshadowed, nor the
| only one left out - and it cascaded after the first
| announcement? My first hunch, though, was that it had been
| agreed upon. Game theory, I think, tells us that releasing the
| same day in a rotating pattern (ABC, BCA, CAB, etc.) would be
| lowest risk and highest average gain?
| highfrequency wrote:
| If they release before GPT-5, they don't have to compare to
| GPT-5 in their benchmarks. It's a big PR win to be able to
| plausibly claim that your model is the best coding model at the
| time of release.
| gusmally wrote:
| They restarted Claude Plays Pokemon with the new model:
| https://www.twitch.tv/claudeplayspokemon
|
| (He had been stuck in the Team Rocket hideout (I believe) for
| weeks)
| alrocar wrote:
| just ran the LLM-to-SQL benchmark over opus-4.1 and it didn't
| top the previous version :thinking: =>
| https://llm-benchmark.tinybird.live/
| epolanski wrote:
| How does it perform when run multiple times?
|
| LLMs are non-deterministic, I think benchmarks should be more
| about averages of N runs, rather than single shot experiments.
| jedisct1 wrote:
| Is it just me or is it super slow?
| taormina wrote:
| Alright, well, Opus 4.1 seems exactly as useless as Opus 4 was,
| but it's probably eating my tokens faster. Wish there were
| some way to tell.
|
| At least Sonnet 4 is still usable, but I'll be honest, it's been
| producing worse and worse slop all day.
|
| I've basically wasted the morning on Claude Code when I should've
| just been doing it all myself.
| AlecSchueler wrote:
| I've also noticed Sonnet starting to degrade. It's developing
| some of the behaviours that put me off the competition in the
| first place. Needless explanations, filler in responses,
| wanting to put everything in lists, even increased sycophancy.
| bavell wrote:
| > I've basically wasted the morning on Claude Code when I
| should've just been doing it all myself.
|
| Welcome to the machine
|
| https://www.youtube.com/watch?v=tBvAxSx0nAM&t=45s
| rvz wrote:
| Notice how Anthropic has never open sourced any of their models.
|
| This makes them (Anthropic) worse than OpenAI in terms of
| openness.
|
| Since in this case, as we all know: [0]
|
| _" What will permanently change everything is open source and
| transparent AI models that are smaller and more powerful than
| GPT-3 or even GPT-4."_
|
| [0] https://news.ycombinator.com/item?id=34865626
| jjani wrote:
| On the other hand, they have always exposed their raw chain of
| thought, so you know exactly what you're paying for, unlike
| OpenAI who hides it. Similarly they allow an actual thinking
| budget rather than vague "low, medium, high", again unlike
| OpenAI. They also allow API access to all their models without
| draconian send-us-your-personal-data KYC, once more unlike
| OpenAI.
|
| They might not fit your personal definition of "openness", but
| they do fit many other equally valid interpretations of that
| concept.
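| For anyone curious what that looks like in practice, here is
| a rough sketch using their Python SDK (the model id and exact
| field names are from memory of their docs, so treat them as
| assumptions to verify):
| 
|   import anthropic
| 
|   client = anthropic.Anthropic()  # ANTHROPIC_API_KEY from env
| 
|   response = client.messages.create(
|       model="claude-opus-4-1",
|       max_tokens=4096,  # must exceed the thinking budget
|       # An explicit token budget, not a low/medium/high knob:
|       thinking={"type": "enabled", "budget_tokens": 2048},
|       messages=[{"role": "user",
|                  "content": "Why is the sky blue?"}],
|   )
| 
|   # The raw chain of thought comes back as thinking blocks
|   # alongside the ordinary text blocks.
|   for block in response.content:
|       if block.type == "thinking":
|           print(block.thinking)
|       elif block.type == "text":
|           print(block.text)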
| ryandrake wrote:
| Am I the only one super confused about how to even get started
| trying out this stuff? Just so I wouldn't be "that critic who
| doesn't try the stuff he criticizes," I tried GitHub Copilot and
| was kind of not very impressed. Someone on HN told me Copilot
| sucks, use Claude. But I have no idea what the right way to do it
| is because there are so many paths to choose.
|
| Let's see: we have Claude Code vs. Claude the API vs. Claude the
| website, and they're totally different from each other? One is
| command line, one integrates into your IDE (which IDE?) and one
| is just browser based, I guess. Then you have the different
| pricing plans, Free, Pro, and Max? But then there's also Claude
| Team and Claude Enterprise? These are monthly plans that only
| work with Claude the Website, but Claude Code is per-request? Or
| is it Claude API that's per-request? I have no idea. Then you
| have the models: Claude Opus and Claude Sonnet, with various
| version numbers for each?? Then there's Cline and Cursor and GOOD
| GRIEF! I just want to putz around with something in VSCode for a
| few hours!
| adamors wrote:
| Download Cursor and try it through that, IMO that's currently
| the most polished experience especially since you can change
| models on the fly. For more advanced use cases, CLI is better
| but for getting your feet wet I think Cursor is the best
| choice.
| ryandrake wrote:
| Thanks. Too bad you need to switch editors to go that path. I
| assume the Cursor monthly plans are not the same as the
| Claude monthly plans and you can't use one for the other if
| you want to experiment...
| kingnothing wrote:
| Cursor is built on VSCode.
| olalonde wrote:
| Claude Code CLI.
| ryandrake wrote:
| Thanks. With the CLI, can you get Copilot-ish things like
| tab-completion and inline commands directly in your IDE? Or
| do you need to copy/paste to and from a terminal? It feels
| like running a command in a terminal and then copying the
| output into your IDE is a pretty primitive way to operate.
| cultureulterior wrote:
| Claude does the coding, and edits your files. You just sit
| back and relax. You don't do any tab completion etc.
| avemg wrote:
| My advice is this:
|
| 1) Completely separate in your mind the auto-completion
| features from the agentic coding features. The auto-
| completion features are a neat trick but I personally find
| those to be a bit annoying overall, even if they sometimes
| hit it completely right. If I'm writing the code, I mostly
| don't want the LLM autocompletion.
|
| 2) Pay the $20 to get a month of Claude Pro access and then
| install Claude Code. Then, either wait until you have a
| small task in mind or you're stuck on some stupid issue that
| you've been banging your head on and then open your
| terminal and fire up Claude Code. Explain to it in plain
| English what you want it to do. Pretend it's a colleague
| that you're giving a task to over Slack. And then watch it
| go. It works directly on your source code. There is no
| copying and pasting code.
|
| 3) Bookmark the Claude website. The next time you'd Google
| something technical, ask Claude instead. General
| questions like "how does one typically implement a flizzle
| using the floppity-do framework"? "I'm trying to accomplish
| X, what are my options when using this stack?". General
| questions like that.
|
| From there you'll start to get it and you'll get better at
| leveraging the tool to do what you want. Then you can branch
| out into the rest of the tool ecosystem.
| ryandrake wrote:
| Interesting about the auto-completion. That was really
| the only Copilot feature I found to be useful. The idea
| of writing out an English prompt and telling Copilot what
| to write sounded (and still sounds) so slow and clunky.
| By the time I've articulated what I want it to do, I
| might as well have written the code myself. The auto-
| completion was at least a major time-saver.
|
| "The card game state is a structure that contains a Deck
| of cards, represented by a list of type Card, and a list
| of Players, each containing a Hand which is also a list
| of type Card, dealt randomly, round-robin from the Deck
| object." I could have input the data structure and logic
| myself in the amount of time it took to describe that.
| avemg wrote:
| I think you should embrace a bit of ambiguity. Don't
| treat this like a stupid computer where you have to
| specify everything in minute detail. Certainly the more
| detail you give, the better to an extent. But really:
| Treat it like you're talking to a colleague and give it a
| shot. You don't have to get it right on the first prompt.
| You see what it did and you give it further instructions.
| Autocomplete is the least compelling feature of all of
| this.
|
| Also, I don't remember what model Copilot uses by
| default, especially the free version, but the model
| absolutely makes a difference. That's why I say to spend
| the $20. That gives you access to Sonnet 4 which is
| where, imo, these models took a giant leap forward in
| terms of quality of output.
| ryandrake wrote:
| Thanks, I shall give it a try.
| rstupek wrote:
| Is Opus as big a leap as sonnet4 was?
| stillpointlab wrote:
| One analogy I have been thinking about lately is GPUs.
| You might say "The amount of time it takes me to fill
| memory with the data I want, copy from RAM to the GPU,
| let the GPU do its thing, then copy it back to RAM, I
| might as well have just done the task on the CPU!"
|
| I hope when I state it that way you start to realize the
| error in your thinking process. You don't send trivial
| tasks to the GPU because the overhead is too high.
|
| You have to experiment and gain experience with agent
| coding. Just imagine that there are tasks where the
| overhead of explaining what to do and reviewing the
| output are dwarfed by the actual implementation. You have
| to calibrate yourself so you can recognize those tasks
| and offload them to the agent.
| potatolicious wrote:
| There's a sweet spot in terms of generalization. Yes,
| painstakingly writing out an object definition in English
| just so that the LLM can write it out in Java is a poor
| use of time. You want to give it more general tasks.
|
| But not _too_ general, because then it can get lost in
| the sauce and do something profoundly wrong.
|
| IMO it's worth the effort to know these tools, because
| once you have a more intuitive sense for the right level
| of abstraction it really does help.
|
| So not "make this very basic data structure for me based
| on my specs", and more like "rewrite this sequential
| logic into parallel batches", which might take some
| actual effort but also doesn't require the model to make
| too many decisions by itself.
|
| It's also pretty good at tests, which tends to be very
| boilerplate-y, and by default that means you skip some
| cases, do a _lot_ of brain-melting typing, or copy-and-
| paste liberally (and suffer the consequences when you
| missed that _one_ search and replace). The model doesn't
| tire, and it's a simple enough task that the reliability
| is high. "Generate test cases for this object, making
| sure to cover edge cases A, B, and C" is a pretty good
| ROI in terms of your-time-spent vs. results.
| collinvandyck76 wrote:
| Claude Code is the superior interface in my opinion. Definitely
| start there.
| Filligree wrote:
| You need Claude Pro or Max. The website subscription also
| allows you to use the command line tool--the rate limits are
| shared--and the command line tool includes IDE integration, at
| least for VSCode.
|
| Claude Code is currently best-in-class, so no point in starting
| elsewhere, but you do need to read the documentation.
| wahnfrieden wrote:
| Correct. Claude Code Max with Opus. Don't even bother with
| Sonnet.
| kelnos wrote:
| I wouldn't be too prescriptive. I have Pro, and it's fine.
| I'm not an incredibly heavy user (yet?); I've hit the rate
| limits a couple times, but not to the point where I'm
| motivated to spend more.
|
| I haven't tried it myself, but I've heard from people that
| Opus can be slow when using it for coding tasks. I've only
| been using Sonnet, and it's performed well enough for my
| purposes.
| Filligree wrote:
| Sonnet works fine in many cases. Opus is smarter, and
| custom 'agents' can be set to use either.
|
| I prefer configuring it to use Sonnet for things that
| don't require much reasoning/intelligence, with Opus as
| the coordinator.
| vlade11115 wrote:
| Claude Code has two usage modes: pay-per-token or subscription.
| Both modes use the API under the hood, but with subscription
| mode you are only paying a fixed amount a month. Each
| subscription tier has some undisclosed limits, cheaper plans
| have lower usage limits. So I would recommend paying $20 and
| trying the Claude Code via that subscription.
| dennisy wrote:
| No Opus in the $20 tier though sadly
| oblio wrote:
| What does Opus do extra?
| lxgr wrote:
| It's a much larger, more capable LLM than Claude Sonnet.
| andyferris wrote:
| As far as I can tell - that seems to have changed today!
| kace91 wrote:
| I'm looking for cursor alternatives after confusing pricing
| changes. Is Claude code an option? Can it be integrated into
| an editor/IDE for similar results?
|
| My use case so far is usually requesting mechanical work I
| would rather describe than write myself like certain test
| suites, and sometimes discovery on messy code bases.
| andyferris wrote:
| Claude Code is really good for this situation.
|
| If you like an IDE, for example VS Code you can have the
| terminal open at the bottom and run Claude Code in that.
| You can put your instructions there and any edits it makes
| are visible in the IDE immediately.
|
| Personally I just keep a separate terminal open and have
| the terminal and VSCode open on two monitors - seems to
| work OK for me.
| prinny_ wrote:
| What exactly did you try with GitHub copilot? It's not an LLM
| itself, just an interface for an LLM. I have copilot in my
| professional GitHub account and I can choose between chat-gpt
| and Claude.
| AlecSchueler wrote:
| I'm not sure what's complicated about what you're describing?
| They offer two models and you can pay more for higher usage
| limits, then you can choose if you want to run it in your
| browser or in your terminal. Like what else would you expect?
|
| Fwiw I have a Claude pro plan and have no interest in using
| other offerings, so I'm not sure whether they're super simple
| (one model, one interface, one pricing plan).
| onlyrealcuzzo wrote:
| When people post this stuff, it's like, are you also confused
| that Nike sells shoes AND shorts AND shirts, and there's
| different colors and skus for each article of clothing, and
| sometimes they sell direct to consumer and other times to
| stores and to universities, and also there's sales and
| promotions, etc, etc?
|
| It's almost as if companies sell more than one product.
|
| Why is this the top comment on so many threads about tech
| products?
| Imustaskforhelp wrote:
| Because I think that Claude has gone beyond the tech niche at
| this point..
|
| Or maybe that's me, but still, whether it's through the likes
| of those vibe coding apps like Lovable, Bolt, etc.,
|
| at the end of the day, most people are using the same tool,
| which is Claude, since it's mostly superior in coding
| (questionable now with oss models, but I still use it through
| Kiro).
|
| People expect this stuff to be simple when in reality it's not,
| and there is some frustration I suppose.
| furyofantares wrote:
| In this case, they tried something and were told they were
| doing it wrong, and they know there's more than one way to
| do it wrong - wrong model, wrong tool using the model,
| wrong prompting, wrong task that you're trying to use it
| for.
|
| And of course you could be doing it right but the people
| saying it works great could themselves be wrong about how
| good it is.
|
| On top of that it costs both money and time/effort
| investment to figure out if you're doing it wrong. It's
| understandable to want some clarity. I think it's pretty
| different from buying shoes.
| AlecSchueler wrote:
| Is it though? People complain about sore feet and hear
| they wear the wrong kind of shoes so they go to the store
| where they have to spend money to find out while trying
| to navigate between dress shoes, minimal shoes, running
| shoes, hiking shoes etc etc., they have to know their
| size, ask for assistance in trying them on...
| evilduck wrote:
| > I think it's pretty different from buying shoes.
|
| Shoe shopping is pretty complex, more so than trialing an
| AI model in my opinion.
|
| Are you a construction worker, a banker, a cashier or a
| driver? Are you walking 5 miles everyday or mostly
| sedentary? Do you require steel toed shoes? How long are
| you expecting them to last and what are you willing to
| pay? Are you going to wear them on long runs or take them
| river kayaking? Do they need to be water resistant,
| waterproof or highly breathable? Do you want glued,
| welted, or stitch down construction? What about flat feet
| or arch support? Does shoe weight matter? What clothing
| are you going to wear them with? Are you going to be
| dancing with them? Do the shoes need a break in period or
| are they ready to wear? Does the available style match
| your preferences? What about availability, are you ok
| having them made to order or do you require something in
| stock now?
|
| By comparison I can try 10 different AI services without
| even needing to stand up for a break while I can't buy
| good dress shoes in the same physical store as a pair of
| football cleats.
| kelnos wrote:
| > _Shoe shopping is pretty complex, more so than trialing
| an AI model in my opinion._
|
| Oh c'mon, now you're just being disingenuous, trying to
| make an argument for argument's sake.
|
| No, shoe shopping is not more complicated than trialing a
| LLM. For all of those questions about shoes you are
| posing, either a) a purchaser won't care and won't need
| to ask them, or b) they already know they have specific
| requirements and will know what to ask.
|
| With an LLM, a newbie doesn't even know what they're
| getting into, let alone what to ask or where to start.
|
| > _By comparison I can try 10 different AI services
| without even needing to stand up for a break_
|
| I can't. I have no idea how to do that. It sounds like
| you've been following the space for a while, and you're
| letting your knowledge blind you to the idea that many
| (most?) people don't have your experience.
| ryandrake wrote:
| Hey, I'm open to the idea that I'm just stupid. But, if
| people in your target market (software developers) don't
| even understand your product line and need a HOWTO+glossary
| to figure it out, maybe there's also a
| branding/messaging/onboarding problem?
| DougBTX wrote:
| My hot take is that your friend should show you what
| they're using, not just dismiss Copilot and leave you
| hanging!
| gmueckl wrote:
| When you walk into a store, you can see and touch all of
| these products. It's intuitive.
|
| With all this LLM cruft all you get is essentially the same
| old chat interface that's like the year 2000 called and
| wants its on-line chat websites back. The only thing other
| than a text box that you usually get is a model selector
| dropdown squirreled away in a corner somewhere. And that
| dropdown doesn't really explain the differences between the
| cryptic sounding options (GPT-something, Claude
| Whatever...). Of course this confuses people!
| derefr wrote:
| Claude.ai, ChatGPT, etc. are finished B2C products.
| They're black boxes, encapsulated experiences. Consumers
| don't want to pick a model, or know what model they're
| using; they just want to "talk to AI", and for the system
| to know which model is best to answer any given question.
| I would bet that for these companies, if their frontend
| observes you using the little model override button, that
| gets instrumented as an "oops" event in their metrics --
| something they aim to minimize.
|
| What _you're_ looking for are the landing pages of the
| B2B API products underlying these B2C experiences. That
| would be https://www.anthropic.com/claude,
| https://openai.com/api/, etc. (In general, search "[AI
| company] API".)
|
| From those B2B landing pages, you can usually click
| through to pages with details about each of their models.
|
| Here's the model page corresponding to this news
| announcement, for example:
| https://www.anthropic.com/claude/opus
|
| (Also, note how these B2B pages are on the AI companies'
| own corporate domains; whereas their B2C products have
| their own dedicated domains. From their perspective,
| their B2C offerings are essentially treated as separate
| companies that happen to consume their APIs -- a
| "reference use-case" -- rather than as a part of what the
| B2B company sells.)
| margalabargala wrote:
| If anything, Anthropic has the product lineup that makes
| the most sense. Higher numbers mean better model. Haiku <
| Sonnet < Opus which translates to length/size. Free < Pro <
| Max.
|
| Contrast to something like OpenAI. They've got gpt4.1, 4o,
| and o4. Which of these are newer than one another? How do
| people remember which of o4 and 4o are which?
| hvb2 wrote:
| Not sure if this is sarcasm; I'm assuming not.
|
| You're comparing well understood products that are wildly
| different to products with code names. Even someone who has
| never worn a t-shirt will see it on a mannequin and know
| where it goes.
|
| I'm sorry but I cannot tell what the difference is between
| sonnet and opus. Unless one is for music...
|
| So in this case you read the docs. Which is, in your
| analogy, you going to the Nike store and reading up on if a
| tshirt goes on your upper or lower body.
| potatolicious wrote:
| Eh, this seems like a take that reeks a bit of "everyone is
| stupid except me".
|
| I _do_ know the answer to OP's question but that's because
| I pickle my brain in this stuff. It is legitimately
| confusing.
|
| The analogy to different SKUs strikes me also inaccurate.
| This isn't the difference between shoes, shirts, and shorts
| - it's more as if a company sells three t-shirts but you
| can't really tell what's different about them.
|
| It's Claude, Claude, and Claude. Which ones code for you?
| Well, actually, all of them (Code, web/desktop Claude, and
| the API can all do this)
|
| Which ones do you ask about daily sundry queries? Well, two
| of them (web/desktop Claude, but also the API, but not
| Code). Well, except if your sundry query is about a
| programming topic, in which case Code can also do that!
|
| Ok, if I _do_ want to use this to write code, which one
| should I use? Honestly, any of them, and the company does a
| poor job of explaining why you would use each option.
|
| "Which of these very similar-seeming t-shirts should I
| get?" "You knob. How are posts like this even being
| _posted_?" is just an extremely poor way to approach other
| people, IMO.
| ryandrake wrote:
| > It's Claude, Claude, and Claude. Which ones code for
| you?
|
| Thanks for articulating the confusion better than I
| could! I feel it's a similar branding problem as other
| tech companies have: I'm watching Apple TV+ on my Apple
| TV software running on my Apple TV connected to my Google
| TV that isn't actually manufactured by Google. But that
| Google TV also has an Apple TV app that can play Apple
| TV+.
| potatolicious wrote:
| It's a bit worse than a branding problem honestly, since
| there's legitimate overlap between products, because
| ultimately they're different expressions of the same
| underlying LLMs.
|
| I'm not sure if you ever got a good rundown, but the
| tl;dr is that the 3 products ("Desktop", Code, and API)
| all expose the same underlying models, but are given
| different prompts, tools, and context management
| techniques that make them behave fairly differently and
| affect how you interact with them.
|
| - The API is the bare model itself. It has some coding
| ability because that's inherent to the model - you can
| ask it to generate code and copy and paste it for
| example. You normally wouldn't use this _except_ if
| you're using some Copilot-type IDE integration where the
| IDE is doing the work of talking to the model for you and
| integrating it into your developer experience. In that
| case you provide API key and the IDE does the heavy
| lifting.
|
| - The desktop app is actually a half-decent coder. It's
| capable of producing specific artifacts, distinguishing
| between multiple "files" it's writing for you, and
| revisiting previously-written code. "Oh, actually rewrite
| this in Go." is for example a thing it can totally do. I
| find it useful for diagnosing issues interactively.
|
| - "Claude Code" is a CLI-only wrapper around the model.
| Think of it like Anthropic's first-party IDE integration,
| except there's not an IDE, just the CLI. In this case the
| integration gives the tool broad powers to actually
| navigate your filesystem, read specific files, write to
| specific files, run shell commands like builds and tests,
| etc. These are all functions that an IDE integration
| would also give you, but this is done in a Claude-y way.
|
| My personal take is: try Claude Code, since as long as
| you're halfway comfortable with a CLI it's pretty usable.
| If you really want a direct IDE integration you can go
| with the IDE+API key route, though keep in mind that you
| might end up paying more (Claude Code is all-you-can-eat-
| with-rate-limits, where API keys will... just keep
| going).
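| To make the "bare model" option concrete, a minimal coding
| call with the Python SDK might look like this (the model id
| is an assumption; check their current model list):
| 
|   import anthropic
| 
|   client = anthropic.Anthropic()
|   msg = client.messages.create(
|       model="claude-opus-4-1",
|       max_tokens=1024,
|       messages=[{
|           "role": "user",
|           "content": "Write a Python function that "
|                      "deduplicates a list, preserving order.",
|       }],
|   )
|   # The API just returns text; copying it into your editor
|   # (or letting an IDE plugin do so) is up to you.
|   print(msg.content[0].text)
| 
| File access, shell commands, and context management are what
| the desktop app and Claude Code layer on top of this.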
| ryandrake wrote:
| Wow. After 50 replies to what I thought wasn't such a
| weird question, your rundown is the most enlightening.
| Thank you very much.
| Karrot_Kream wrote:
| FWIW it's probably because a lot of us have been
| following along and trying these things from the start so
| the nuances seem more obvious but also I feel that some
| folks feel your question is a bit "stupid", like "why are
| you suddenly interested in the frontier of these models?
| where were you for the last 2 years?"
|
| And to some extent it is like the PC race. Imagine going
| to work and writing software for whatever devices your
| company writes software for in whatever toolchain your
| company uses. Then 2-3 years after the PC race began
| heating up, asking "Hey I only really write code for
| whatever devices my employer gives me access to. Now I
| want to buy one of these new PCs but I don't really
| understand why I'd choose an Intel over a Motorola
| chipset or why I'd prioritize more ROM or more RAM, and I
| keep hearing about this thing called RISC that's way
| better than CISC and some of these chips claim to have
| different addressing modes that are better?"
| slackpad wrote:
| Claude Code running in a terminal can connect to your IDE
| so you can review its proposed changes there. I've found
| this to be a nice drop-in way to try it out without
| having to change your core workflow and tools too much.
| Check out the /ide command for details.
| Karrot_Kream wrote:
| Also when it comes to API integrations, I find some
| better than others. Copilot has been pretty crummy for me
| but Zed's Agent Mode seems to be almost as good as Claude
| Code. I agree with the general take that Claude Code is a
| good default place to start.
| tomrod wrote:
| > Why is this the top comment on so many threads about tech
| products?
|
| Because you overestimate how much of the difference the
| representative person understands.
|
| A more accurate analogy is that Nike sells green-blue shoes
| and Nike sells blue-green shoes, but the blue-green shoes
| add 3 feet to your jump and green-blue shoes add 20 mph to
| your 100 yard dash sprint.
|
| You know you need one of them for tomorrow's hurdles race
| but have no idea which is meaningful for your need.
| ryandrake wrote:
| Also, the green-blue shoes charge per-step, but the blue-
| green shoes are billed monthly by signing up for
| BlueGreenPro+ or BlueGreenMax+, each with a hidden step
| limit but BlueGreenMax+ is the one that gives you access
| to the Cyan step model which is better; plus the green-
| blue shoes are only useful when sprinting, but the blue-
| green shoes can be used in many different events, but
| only through the Nike blue-green API that only some
| track&field venues have adopted...
| true_religion wrote:
| This is like being told to buy Nike shoes. Then when you
| proudly display your new cleats, they tell you "no, I meant
| you should buy basketball shoes. The cleats are terrible."
| squeaky-clean wrote:
| Which Nike shoe is best for basketball? The Nike Dunk, Air
| Force 1, Air Jordan, LeBron 20, LeBron XXI Prime 93, Kobe
| IX elite, Giannis Freak 7, GT Cut, GT Cut 3, GT Cut 3
| Turbo, GT Hustle 3, or the KD18?
|
| At least with those you can buy whatever you think is
| coolest. Which Claude model and interface should the
| average programmer use?
| AlecSchueler wrote:
| What's the average programmer? Is it someone who likes
| CLI tools? Or who likes IDE integration? Different
| strokes for different folks and surely the average
| programmer understands what environment they will be most
| comfortable in.
| nawgz wrote:
| > Different strokes for different folks and surely the
| average programmer understands what environment they will
| be most comfortable in.
|
| That's a silly claim to me, we're talking about a
| completely new environment where you prompt an AI to
| develop code, and therefore an "average programmer" is
| unlikely to have any meaningful experience or intuition
| with this flow. That is exactly what GP is talking about
| - where does he plug in the AI? What tradeoffs are there
| to different options?
|
| The other day I had someone judge me for asking this
| question by dismissively saying "dont say youve still
| been using ChatGPT and copy/paste", which made me laugh -
| I don't use AI at all, so who was he looking down on?
| kelnos wrote:
| Because the offerings are not simple. Your Nike example is
| silly; everyone knows what to do with shoes and shorts and
| shirts, and why they might want (or not want) to buy those
| particular items from Nike.
|
| But for someone who hasn't been immersed in the "LLM
| scene", it's hard to understand why you might want to use
| one particular model of another. It's hard to understand
| why you might want to do per-request API pricing vs. a
| bucketed usage plan. This is a new technology, and the
| landscape is changing weekly.
|
| I think maybe it might be nice if folks around here were a
| bit more charitable and empathetic about this stuff.
| There's no reason to get all gatekeep-y about this kind of
| knowledge, and complaining about these questions just
| sounds condescending and doesn't do anyone any good.
| pdntspa wrote:
| Because few seem to want to expend the effort to dive in
| and understand something. Instead they want the details
| spoonfed to them by marketing or something.
|
| I absolutely loathe this timeline we're stuck in.
| windsignaling wrote:
| On the contrary, I'm confused about why you're confused.
|
| This is a well-known and documented phenomenon - the paradox
| of choice.
|
| I've been working in machine learning and AI for nearly 20
| years and the number of options out there is overwhelming.
|
| I've found many of the tools out there do some things I want,
| but not others, so even finding the model or platform that
| does exactly what I want or does it the best is a time-
| consuming process.
| joshmarlow wrote:
| VSCode has a pretty good Gemini integration - it can pull up a
| chat window from the side. I like to discuss design changes and
| small refactorings ("I added this new rpc call in my protobuf
| file, can you go ahead and stub out the parts of code I need to
| get this working in these 5 different places?") and it usually
| does a pretty darn good job of looking at surrounding idioms in
| each place and doing what I want. But gemini can be kind of
| slow here.
|
| But I would recommend just starting using Claude in the
| browser, talk through an idea for a project you have and ask it
| to build it for you. Go ahead and have a brainstorming session
| before you actually ask it to code - it'll help make sure the
| model has all of the context. Don't be afraid to overload it
| with requirements - it's generally pretty good at putting
| together a coherent plan. If the project is small/fits in a
| single file - say a one page web app or a complicated data
| schema + sql queries - then it can usually do a pretty good job
| in one place. Then just copy+paste the code and run it out of
| the browser.
|
| This workflow works well for exploring and understanding new
| topics and technologies.
|
| Cursor is nice because it's an AI integrated IDE (smoother than
| the VSCode experience above) where you can select which models
| to use. IMO it seems better at tracking project context than
| Gemini+VSCode.
|
| Hope this helps!
| spaceman_2020 wrote:
| Download Claude Code
|
| Create a new directory in your terminal
|
| Open that directory, type in "Claude" to run Claude
|
| Press Shift + Tab to go into planning mode
|
| Tell Claude what you want to build - recommend something simple
| to start with. Specify the languages, environment, frameworks
| you want, etc.
|
| Claude will come up with a plan. Modify the plan or break it
| into smaller chunks if necessary
|
| Once plan is approved, ask it to start coding. It will ask you
| for permissions and give you the finished code
|
| It really is something when you actually watch it go.
| zarzavat wrote:
| Github Copilot and Claude code are not exactly competitors.
|
| Github Copilot is autocomplete, highly useful if you use VS
| Code, but if you are using e.g. Jetbrains then you have other
| options. Copilot comes with a bunch of other stuff that I
| rarely use.
|
| Claude code is project-wide editing, from the CLI.
|
| They complement each other well.
|
| As far as I'm concerned the utility of the AI-focused editors
| has been diminished by the existence of Claude code, though not
| entirely made redundant.
| fkyoureadthedoc wrote:
| > Github Copilot is autocomplete... comes with a bunch of
| other stuff that I rarely use.
|
| That bunch of other stuff includes the chat, and more
| recently "Agent Mode". I find it pretty useful, and the
| autocomplete near useless.
| qingcharles wrote:
| This isn't correct. GitHub Copilot now totally competes with
| Claude Code. You can have it create an entire app for you in
| "Agent" mode if you're feeling brave. In fact, seeing as
| Copilot is built directly into Visual Studio when you
| download it, I guess they have a one-up.
|
| Copilot isn't locked to a specific LLM, though. You can
| select the model from a panel, but I don't think you can plug
| in your own right now, and the ones you can select might not
| be SOTA because of that.
| alienbaby wrote:
| Sonnet 4 in copilot agent mode has been doing great work
| for me lately. Especially once you realise that at least
| 50% of the work is done before you get to copilot, as
| architectural and product specs and implementations plans.
| tomwojcik wrote:
| Opencode https://github.com/sst/opencode provides a CC like
| interface for copilot. It's a slightly worse tool, but since
| copilot with Claude 4 is super cheap, I ended up preferring
| it over CC. Almost no limits, cheaper, you can use all the
| Copilot models, GH is not training on your data.
| andsoitis wrote:
| > use Claude. But I have no idea what the right way to do it is
| because there are so many paths to choose.
|
| Anthropic has this useful quick start guide:
| https://docs.anthropic.com/en/docs/claude-code/quickstart
| StephenHerlihyy wrote:
| Kilo Code for VSCode is pretty solid. Give it a try.
| wintermutestwin wrote:
| Yes. You basically need an LLM to provide guidance on product
| selection in this brave new world.
|
| It is actually one of my most useful use cases of this tech.
| Nice to have a way to ask in private so you don't get snarky
| answers like: it's just like buying shoes!
| vanillax wrote:
| All the tools - copilot, claude, gemini - in vscode are
| completely worthless unless in Agent Mode. I have no idea why
| none of these tools default to Agent mode.
| ActorNightly wrote:
| If you want your own cheap IDE integration, you can set up
| VSCode with Continue extension, ollama running locally, and a
| small agent model.
| https://docs.continue.dev/features/agent/model-setup.
|
| If you want to understand how all of this works, the best way
| is to build a coding agent manually. It's not that hard:
|
| 1. Start with Ollama running locally and Gemma3 QAT models.
| https://ollama.com/library/gemma3
|
| 2. Write a wrapper around Ollama using your favorite language.
| The idea is that you want to be able to intercept responses
| coming back from the model.
|
| 3. Create a system prompt that tells the model things like "if
| the user is asking you to create a file, reply in this
| format:...". Generally to start, you can specify instructions
| for read file, write file, and execute file
|
| 4. In your wrapper, when you send the input chat prompt, and
| get the model response back, you look for those formats, and
| make the wrapper actually execute the action. For example if
| the model replies back with the format to read file, you read
| the file from your wrapper code and send it back to the model.
|
| Every coding assistant is basically this under the hood with
| just a lot more fluff and their own IDE integration.
|
| The benefit of doing your own is that you can customize it to
| your own needs, and when you direct a model with more precision
| even the small models perform very well with much faster speed.
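| A compressed sketch of steps 2-4 (the READ_FILE line protocol
| here is invented purely for illustration; the /api/chat
| payload follows Ollama's documented REST API, but verify it):
| 
|   import requests
| 
|   SYSTEM = ("You are a coding agent. To read a file, reply "
|             "with exactly one line: READ_FILE: <path>. "
|             "Otherwise answer normally.")
| 
|   def chat(messages):
|       r = requests.post(
|           "http://localhost:11434/api/chat",
|           json={"model": "gemma3", "messages": messages,
|                 "stream": False})
|       return r.json()["message"]["content"]
| 
|   messages = [{"role": "system", "content": SYSTEM}]
|   while True:  # Ctrl-C to quit
|       messages.append({"role": "user", "content": input("> ")})
|       reply = chat(messages)
|       messages.append({"role": "assistant", "content": reply})
|       # Intercept the "tool call", perform it in the wrapper,
|       # and feed the result back to the model.
|       if reply.strip().startswith("READ_FILE:"):
|           path = reply.split(":", 1)[1].strip()
|           with open(path) as f:
|               messages.append({"role": "user",
|                                "content": f.read()})
|           reply = chat(messages)
|           messages.append({"role": "assistant",
|                            "content": reply})
|       print(reply)
| 
| Write-file and execute-file work the same way: parse the
| marker, perform the action in the wrapper, return the result.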
| afro88 wrote:
| OP is asking for where to get started with Claude for coding.
| They're confused. They just want to mess around with it in
| VSCode. And you start talking about Ollama, QAT, coding your
| own wrapper, composing a system prompt etc.!?
| jimbo808 wrote:
| You just described all of your options in detail - what's the
| problem? Pick one. Seems like you've got a very thorough grasp
| on how to get started trying the stuff out, but it requires you
| to choose how you want to do that.
| kelnos wrote:
| If you're looking for a coding assistant, get Claude Code, and
| give it a try. I think you need the Pro plan at a minimum for
| that ($20/mo; I don't think Free includes Claude Code). Don't
| do the per-request API pricing as it can get expensive even
| while just playing around.
|
| Agree that the offering is a bit confusing and it's hard to
| know where to start.
|
| Just FYI: Claude Code is a terminal-based app. You run it in
| the working directory of your project, and use your regular
| editor that you're used to, but of course that means there's no
| editor integration (unlike something like Cursor). I personally
| like it that way, but YMMV.
| robluxus wrote:
| > I just want to putz around with something in VSCode for a few
| hours!
|
| I just googled "using claude from vscode" and the first page
| had a link that brought me to anthropic's step by step guide on
| how to set this up exactly.
|
| Why care about pricing and product names and UI until it's a
| problem?
|
| > Someone on HN told me Copilot sucks, use Claude.
|
| I concur, but I'm also just a dude saying some stuff on HN :)
| zaphirplane wrote:
| Try asking it?
| screye wrote:
| Cursor + Claude 4 = best quality + UX balance. Pay up for
| 20/month subscription.
|
| Cursor imports your VSCode setup, so setting it up should be
| trivial.
|
| Use Agent mode. Use it in a preexisting repo.
|
| You're off to the races.
|
| There is a lot more you can do, but you should start seeing
| value at this point.
| w0m wrote:
| Honestly - Copilot's free mode. Just playing with the
| agentic stuff there will give you a good idea; attach it to
| Roo and you'll get an even better one. Realize that if you
| paid for a better model you'd get better results, as the
| free tier doesn't include many premium tokens.
| ramesh31 wrote:
| Will the price for 4 go down? I still find Opus completely
| unusable for the cost/performance, as someone who spends
| thousands per month on tokens. There's really no noticeable
| difference from Sonnet, at nearly 10x the price.
| _vaporwave_ wrote:
| It's interesting that Anthropic maintains current prices for
| prior state of the art models when doing a new release. Why offer
| a model with worse performance for the same price? What
| incentives are they trying to create?
| dysoco wrote:
| I'm guessing it's mostly for legacy reasons. When 3.7 came out
| many people were not happy with it and went back to 3.5; I
| guess supporting older models for a while makes sense.
| gwd wrote:
| > What incentives are they trying to create?
|
| One obvious explanation is that pricing is strongly related
| to the _cost to them_, and that their only incentive is for
| people to use an expensive model if they really need it.
|
| I forget which of the GPT models was better, faster, _and
| cheaper_ than the previous model. The incentive there is
| obviously: "If you want to use the old model for whatever
| reason, fine, but we really want you to use the new one
| because it costs _us_ less to run."
| mrcwinn wrote:
| o3 and o3-pro are just so good. Sonnet goes off the deep end too
| often and Opus, in my experience, is not as strong at reasoning
| compared to OpenAI, despite the higher costs. Rarely do we see a
| worse, more expensive product win - but competition is good and
| I'm rooting for Anthropic nonetheless!
| AlecSchueler wrote:
| Off the deep end?
| WXLCKNO wrote:
| o3 feels pretty good to me as well, but o3-pro has
| consistently one-shotted problems other LLMs got stuck on.
|
| I'm talking multiple tries of Claude 4 Opus, Gemini 2.5 Pro,
| o3, etc., resulting in sometimes hundreds of lines of code.
|
| Versus o3-pro (very slowly) analyzing and then fixing something
| that seemed completely unrelated in a one or two line change
| and truly fixing the root cause.
|
| o3-pro-level LLMs at reduced cost and increased speed will
| already be amazing.
| thoop wrote:
| The article says "We plan to release substantially larger
| improvements to our models in the coming weeks."
|
| Sonnet 4 has definitely been the best model for our product's use
| case, but I'd be interested in trying Haiku 4 (or 4.1?) just due
| to the cost savings.
|
| I'm surprised Anthropic hasn't mentioned anything about Haiku 4
| yet since they released the other models.
| mocmoc wrote:
| Their limits are just... a real roadblock
| bananapub wrote:
| huh?
|
| Claude Max is tens of hours of Opus a month, or you can pay
| per token with no limits.
|
| Or did you mean "I wish it was cheaper"?
| OldGreenYodaGPT wrote:
| Claude Code has honestly made me at least 10x more productive.
| I've burned through about 3 billion tokens and have been
| consistently merging 5+ PRs a day, tackling tons of tech debt,
| improving GitHub Actions, and making crazy progress on product
| work.
| totaa wrote:
| can you share your workflow?
| steinvakt2 wrote:
| I also have this feeling that I'm 2-10x more productive. But
| isn't it curious how a lot of devs feel this way, yet none of
| the devs I know report that any of their colleagues have
| become 2-10x more productive?
| nevertoolate wrote:
| 10x means to me that I can finish a month of work in max 2
| days and go cloud watching. What does it mean for you?
| mwigdahl wrote:
| <raises hand> Our automated test folks were chronically
| behind, struggling to keep up with feature development. I got
| the two assigned to the team that was the most behind set up
| with Claude Code. Six weeks later they are fully caught up,
| expanding coverage, and integrating AI code review into our
| build pipeline.
|
| It's not 10x, but those guys do seem like they've hit
| somewhere around 2x improvement overall.
| samtp wrote:
| What type of work do you do and what type of code do you
| produce?
|
| Because I've found it to work pretty amazingly for things that
| don't need to be exact (like data modeling) or don't have any
| security implications (public apps). But for everything else I
| end up having to find all the little bugs by reading the code
| line by line, which is much slower than just writing the code
| in the first place.
| AstroBen wrote:
| only 10x? I'm at least 100x as productive. I only type at a
| measly 100wpm, whereas Claude can output 100+ tokens a second
|
| I'm outputting a PR every 6 minutes. The reviewers are using
| Claude to review everything. It used to take a day to add 100
| lines to the codebase.. now I can add 100 lines in one prompt
|
| If I want even more productivity (at risk of making the rest of
| my team look slow) I can tell Claude to output double the lines
| and ship it off for review. My performance metrics are
| incredible
| samtp wrote:
| So no human reads the actual code that you push to
| production? Are you not worried about security risks,
| spaghetti code, and other issues? Or does Claude magically
| make all of those concerns go away?
| AstroBen wrote:
| forgot the /s
| samtp wrote:
| Sorry lol, sometimes difficult to separate the hype boys
| from actual sarcasm these days
| qingcharles wrote:
| Not sure if joking...?
| AstroBen wrote:
| This is only the beginning. I can see myself having 100
| Claude tasks running concurrently - the only problem is
| edits clash between files. I'm working on having Claude
| solve this by giving each instance its own repo to work
| with, then I ask the final Claude to mash it all together
| as best it can
|
| What's 100x productivity multiplied by 100 instances of
| Claude? 10,000x productivity
|
| Now to be fair and a bit more realistic it's not actually
| 10000x because it takes longer to push the PR because the
| file sizes are so big. Let's call it 9800x. That's still a
| sizable improvement
| trallnag wrote:
| Big if true
| screye wrote:
| How do you maintain high confidence in the code it generates?
|
| My current bottleneck is having to review the huge amounts of
| code that these models spit out. I do TDD, use auto-linting and
| type-checking... but the model makes insidious changes that
| are only visible on deep inspection.
| theappsecguy wrote:
| The only way you could be 10x more productive is if you were
| doing nothing before.
| P24L wrote:
| The improved Opus isn't about achieving significantly better peak
| performance for me. It's not about pushing the high end of the
| spectrum. Instead, it's about consistently delivering better
| average results - structuring outputs more effectively, self-
| correcting mistakes more reliably, and becoming a trustworthy
| workhorse for everyday tasks.
| djha-skin wrote:
| Opus 4(.1) is _so_ expensive[1]. Even Sonnet[2] costs me $5 per
| hour (basically) using OpenRouter + Codename Goose[3]. The crazy
| thing is Sonnet 3.5 costs _the same thing_[4] right now. Gemini
| Flash is more reasonable[5], but always seems to make the wrong
| decisions in the end, spinning in circles. OpenAI is better, but
| still falls short of Claude's performance. Claude also gives
| back 400s from its API if you Ctrl-C in the middle, which is
| annoying.
|
| Economics is important. Best bang for the buck seems to be
| OpenAI GPT-4.1 mini[6]. Does a decent job, doesn't flood my
| context window with useless tokens like Claude does, and the
| API works every time. Gets me out of bad spots. Can get
| confused, but I've been able to muddle through with it.
|
| 1: https://openrouter.ai/anthropic/claude-opus-4.1
|
| 2: https://openrouter.ai/anthropic/claude-sonnet-4
|
| 3: https://block.github.io/goose/
|
| 4: https://openrouter.ai/anthropic/claude-3.5-sonnet
|
| 5: https://openrouter.ai/google/gemini-2.5-flash
|
| 6: https://openrouter.ai/openai/gpt-4.1-mini
| generalizations wrote:
| Get a subscription and use claude code - that's how you get
| actual reasonable economics out of it. I use claude code all
| day on the max subscription and maybe twice in the last two
| weeks have I actually hit usage limits.
| tgtweak wrote:
| Is it considerably more cost effective than cline+sonnet api
| calls with caching and diff edits?
|
| Same context length and throughput limits?
|
| Anecdotally I found GPT-4.1 (and mini) to be pretty good at
| those agentic programming tasks, but the lack of token
| caching made the costs blow up with long context.
| bavell wrote:
| I'm on the basic $20/mo sub and only ran into token cap
| limitations in the first few days of using Claude Code (now
| 2-3 weeks in) before I started being more aggressive about
| clearing the context. Long contexts will eat up token caps
| quickly when you are having extended back-and-forth
| conversations with the model. Otherwise, it's been
| effectively "unlimited" for my own use.
| bgirard wrote:
| YMMV: I'm using the $100/mo Max subscription and I hit the
| limit during a focused coding session where I'm giving it
| prompts non-stop.
|
| Unfortunately there's no easy tool to inspect usage. I
| started a project to parse the Claude logs using Claude
| and generate a Chrome trace with it. It's promising but
| it was taking my tokens away from my core project.
| bartman wrote:
| Check out ccusage, it sounds like the tool you're
| describing: https://github.com/ryoppippi/ccusage
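|
| If you have Node installed, running it via `npx ccusage`
| (per its README) should print a daily token and cost
| breakdown from your local Claude Code logs.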
| bgirard wrote:
| That's neat. According to the tool I'm consuming ~300m
| tokens per day coding with a (retail?) cost of ~$125/day.
| The output of the model is definitely worth $100/mo to
| me.
| symbolicAGI wrote:
| ccusage on GitHub.
| seneca wrote:
| Is there a way to sign up for Claude code that doesn't
| involve verifying a phone number with Anthropic? They don't
| even accept Google Voice numbers.
|
| Maybe I'm out of touch, but I'm not handing out my phone
| number to sign up for random SaaS tools.
| tagami wrote:
| use a burner
| kroaton wrote:
| GLM 4.5 / Kimi K2 / Qwen Coder 3 / Gemini Pro 2.5
| paul7986 wrote:
| Claude Pro failed me badly today compared to ChatGPT Plus.
|
| I uploaded a web design of mine (JPEG) and asked Claude to
| create the HTML/CSS. Asked GPT to do the same. GPT's code
| looked the closest to the design I created and uploaded. Just
| five to ten small tweaks and I was done, vs. Claude, where it
| would have taken me almost triple the steps.
|
| I actually subscribed to both today (resubscribed to GPT) and
| am going to keep testing which one is the better front-end
| developer (I am, but I've got to embrace AI).
| alvis wrote:
| Funny how OpenAI and Anthropic seem to be coordinating their
| releases on the same day.
| KaoruAoiShiho wrote:
| For me this is the big news of the day. Looks insane.
| hartator wrote:
| > 1 min read
|
| What's the point of these?
|
| Kind of interesting that we live in an era of super-advanced
| AI but still make basic UI/UX mistakes. The tagline of this
| blog post shouldn't be "1 min read".
|
| It's not even accurate. I timed myself reading at a normal
| pace, and it took me 3 min 30 s. Maybe the images need to be
| OCRed to make the estimate more accurate.
| TimMeade wrote:
| This has been the worst Claude day ever. It just fell apart -
| not sure if the release is why, but it's cursing in documents
| and can't fix a bug after hours of back and forth.
___________________________________________________________________
(page generated 2025-08-05 23:00 UTC)