[HN Gopher] Claude Opus 4.1
___________________________________________________________________
Claude Opus 4.1
Author : meetpateltech
Score : 565 points
Date : 2025-08-05 16:28 UTC (6 hours ago)
(HTM) web link (www.anthropic.com)
(TXT) w3m dump (www.anthropic.com)
| jasonlernerman wrote:
| Has anyone tested it yet? How's it acting?
| smerrill25 wrote:
| waiting for this, too.
| usaar333 wrote:
| No obvious gains I feel from quick chats, but too early to
| tell.
|
| These benchmark gains aren't that high, so I doubt it is that
| obvious.
| jedisct1 wrote:
| Tested it on a refactor of Zig code. It worked fine, but was
| very slow.
| minimaxir wrote:
| This likely won't move the needle for Opus use over Sonnet while
| the cost remains the same. Using OpenRouter rankings
| (https://openrouter.ai/rankings) as a proxy, Sonnet 3.7 and
| Sonnet 4 combined generate 17x more tokens than Opus 4.
| qsort wrote:
| All three major labs released something within hours of each
| other. This anime arc is insane.
| x187463 wrote:
| Given the GPT5 rumors, August is just getting started.
| kridsdale3 wrote:
| Given the Gregorian Calendar and the planet's path through
| its orbit, August is just getting started.
| tomrod wrote:
| This legitimately made me chuckle.
| ozgung wrote:
| What a time to be alive
| tonyhart7 wrote:
| as if they wait for the competitor first, then launch at the
| same time to let the market decide which one is best
| torginus wrote:
| I think this means that GPT5 is better - you can't launch a
| worse model after the competitor supersedes you - you have to
| show that you're in the lead even if it's just for a day.
| rapind wrote:
| Not sure that this is true. Are there a lot of people
| waiting anxiously to adopt the next model on the day of
| release and expecting some huge work advantage?
| azan_ wrote:
| Absolutely.
| vFunct wrote:
| None of them seem to have published any papers associated with
| them on how these new models advanced the state-of-the-art
| though. =^(
| hugodan wrote:
| China will do that for them
| candiddevmike wrote:
| It's definitely a coincidence
| wilg wrote:
| It's not a coincidence or a cartel, it's PR
| counterprogramming.
| BudaDude wrote:
| Agree 100%
|
| If you look at the past, whenever Google announces
| something major, OpenAI almost always releases something as
| well.
|
| People forget that OpenAI was started to compete
| with Google on AI.
| Etheryte wrote:
| This is why you have PR departments. Being on top of the HN
| front page, news sites, etc matters a lot. Even if you can't be
| the first, it's important to dilute the attention as much as
| possible to reduce the limelight your competitors get.
| steveklabnik wrote:
| This is the bit I'm most interested in:
|
| > We plan to release substantially larger improvements to our
| models in the coming weeks.
| machiaweliczny wrote:
| This is so people don't immediately migrate to GPT5
| NitpickLawyer wrote:
| Cheekily announcing during oAI's oss model launch :D
| haaz wrote:
| it is barely an improvement according to their own benchmarks.
| not saying that's a bad thing, but not enough for anybody to
| notice any difference
| waynenilsen wrote:
| i think it's probably mostly vibes but that still counts, this
| is not in the charts
|
| > Windsurf reports Opus 4.1 delivers a one standard deviation
| improvement over Opus 4 on their junior developer benchmark,
| showing roughly the same performance leap as the jump from
| Sonnet 3.7 to Sonnet 4.
| ttoinou wrote:
| That's why they named it 4.1 and not 4.5
| zamadatix wrote:
| When it's "that's why they incremented the version by a tenth
| instead of a half" you know things have really started to
| slow for the large models.
| phonon wrote:
| Opus 4 came out 10 weeks ago. So this is basically one new
| training run improvement.
| zamadatix wrote:
| And in 52 weeks we've gone 3.5->4.1 with this training
| improvement, meanwhile the 52 weeks prior to that were
| Claude -> Claude 3. The absolute jumps per version delta
| also used to be larger.
|
| I.e. it seems we don't get much more than new training
| run levels of improvement anymore. Which is better than
| nothing, but a shame compared to the early scaling.
| globalise83 wrote:
| Is it really a bigger jump to go from plausible to
| frequently useful, than from frequently useful to
| indispensable?
| zamadatix wrote:
| Why is there supposed to be no step between frequently
| useful and indispensable? Quickly going from nothing to
| frequently useful (which involved many rapid hops in
| between) was certainly surprising, and that's precisely
| the momentum that's been lost.
| mclau157 wrote:
| They released this because competitors are releasing things
| leetharris wrote:
| Good! I'm glad they are just giving us small updates. Opus 4
| just came out, if you have small improvements, why not just
| release them? There's no downside for us.
| AstroBen wrote:
| I don't think this could even be called an improvement? It's
| small enough that it could just be random chance
| j_bum wrote:
| I've always wondered about this actually. My assumption is
| that they always "pick the best" result from these tests.
|
| Instead, ideally they'd run the benchmark tests many times,
| and share all of the results so we could make statistical
| determinations.
| gloosx wrote:
| They need to leave some room to release 10 more models. They
| could crank benchmarks to 100% but then no new model is needed
| lol? Pretty sure these pretty benchmark graphs are all
| completely staged marketing numbers since they do solve the
| same problems they are being trained on - no novel or unknown
| problems are presented to them.
| levocardia wrote:
| "You pay $20/mo for X, and now I'm giving you 1.05*X for the
| same price." Outrageous!
| onlyrealcuzzo wrote:
| I will only add that it's interesting that in the results
| graphic, they simply highlighted Opus 4.1 - choosing not to
| display which models have the best scores - as Opus 4.1 only
| scored the best on about half of the benchmarks - and was worse
| than Opus 4.0 on at least one measure.
| Topfi wrote:
| I am still very early, but output quality wise, yes, there does
| not seem to be any noticeable improvement in my limited
| personal testing suite. What I have noticed though is
| subjectively better adherence to instructions and documentation
| provided outside the main prompt, though I have no way to
| quantify or reliably test that yet. So beyond reliably finding
| Needles-in-the-Haystack (which Frontier models have done well
| on lately), Opus 4.1 seems to do better at acting on those
| needles even when not explicitly guided to, compared to Opus 4.
| jzig wrote:
| I'm confused by how Opus is presented to be superior in nearly
| every way for coding purposes yet the general consensus and my
| own experience seem to be that Sonnet is much much better. Has
| anyone switched to entirely using Opus from Sonnet? Or maybe
| switching to Opus for certain things while using Sonnet for
| others?
| datameta wrote:
| I now eagerly await Sonnet 4.1, only because of this release.
| rtfeldman wrote:
| Yes, Opus is very noticeably better at programming in both Rust
| and Zig in my experience. I wish it were cheaper!
| MostlyStable wrote:
| Opus seems better to me on long tasks that require iterative
| problem solving and keeping track of the context of what we
| have already tried. I usually switch to it for any kind of
| complicated troubleshooting etc.
|
| I stick with Sonnet for most things because it's generally good
| enough and I hit my token limits with it far less often.
| unshavedyak wrote:
| Same. I'm on the $200 plan and I find Opus "better", but
| Sonnet is more straightforward. Sonnet is, to me, a "don't
| let it think" model. It does great if you give it concrete
| and small goals. Anything vague or broad and it starts
| thinking and it's a problem.
|
| Opus gives you a bit more rope to hang yourself with imo.
| Yes, it "thinks" slightly better, but still not well enough
| for me. But it can be good enough to convince you that it can
| do the job.. so i dunno, i almost dislike it for that. I find
| Sonnet just easier to predict.
|
| Could i use Opus like i do Sonnet? Yes definitely, and
| generally i do. But then i don't really see much difference
| since i'm hand-holding so much.
| adastra22 wrote:
| Every time that Sonnet is acting like it has brain damage
| (which is once or twice a day), I switch to Opus and it seems
| to sort things out pretty fast. This is unscientific anecdata
| though, and it could just be that switching models (any model)
| would have worked.
| anonzzzies wrote:
| Exactly that.
| j45 wrote:
| They both seem to behave differently depending on how loaded
| the system is.
| api wrote:
| I have suspected for a long time that hosted models load
| shed by diverting some requests to lesser models or running
| more quantized versions under high load.
| parineum wrote:
| I think OpenRouter saves tokens by summarizing queries
| through another model, IIRC.
| monatron wrote:
| This is a great use case for sub-agents IMO. By default, sub-
| agents use sonnet. You can have opus orchestrate the various
| agents and get (close to) the best of both worlds.
| adastra22 wrote:
| Is there a way to get persistent sub-agents? I'd love to
| have a bunch of YAML files in my repository, one for each
| sub-agent, and have those automatically used across all
| Claude Code instances I have on multiple machines (I dev on
| laptop and desktop), or across the team.
| mwigdahl wrote:
| Yep: https://docs.anthropic.com/en/docs/claude-code/sub-agents
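| A minimal sketch of what one checked-in definition might look
| like, going by those docs (the exact frontmatter fields and
| the .claude/agents/ location are my reading of them, so
| double-check against the page above):
| 
|   File: .claude/agents/code-reviewer.md
| 
|   ---
|   name: code-reviewer
|   description: Reviews diffs for bugs and style problems.
|   tools: Read, Grep, Glob
|   ---
|   You are a careful code reviewer. Report concrete problems
|   in the changes you are given, not nitpicks.
| 
| Since the files live in the repo, they travel with it to
| every machine and teammate, which is what you're asking for.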
| rapind wrote:
| In this case I don't think the controller needs to be the
| smartest model. I use sonnet as the main driver and pass
| the heavy thinking (via zen mcp) onto Gemini pro for
| example, but I could use openai or opus or all of them via
| OpenRouter.
|
| Subagents seem pretty similar to using zen mcp w/
| OpenRouter but maybe better or at least more turnkey? I'll
| be checking them out.
| mark_undoio wrote:
| Amp (ampcode.com) uses Sonnet as its main model and has
| GPT o3 as a special purpose tool / subagent. It can call
| into that when it needs particularly advanced reasoning.
|
| Interestingly I found that prompting it to ask the o3
| submodel (which they call The Oracle) to check Sonnet's
| working on a debugging solution was helpful. Extra
| interesting to me was the fact that Sonnet appeared to do
| a better job once I'd prompted that (like chain of
| thought prompting, perhaps asking it to put forward an
| explanation to be checked actually triggered more
| effective thinking).
| gpm wrote:
| This seems like a case of reversion to the mean. When one
| model is performing below average, changing anything (like
| switching to another model) is likely to improve it by random
| chance...
| keeeba wrote:
| Anthropic say Opus is better, benchmarks & evals say Opus
| is better, Opus has more parameters and parameters
| determine how much a NN can learn.
|
| Maybe Opus just is better
| HarHarVeryFunny wrote:
| Maybe context rot? If the model's output seems to be getting
| worse or in a rut, then try just clearing context / starting
| a new session.
| adastra22 wrote:
| Switching models with the same context, in this case.
| dested wrote:
| If I'm using cursor then sonnet is better, but in claude code
| Opus 4 is at least 3x better than Sonnet. As with most things
| these days, I think a lot of it comes down to prompting.
| jzig wrote:
| This is interesting. I do use Cursor with almost exclusively
| Sonnet and thinking mode turned on. I wonder if what Cursor
| does under the hood (like their indexing) somehow empowers
| Sonnet more. I do not have much experience with using Claude
| Code.
| seunosewa wrote:
| It's ridiculously overpriced in the API. Just like o3 used to
| be.
| brenoRibeiro706 wrote:
| I feel the same way. I usually use Opus to help with coding and
| documentation, and I use Sonnet for emails and so on.
| biinjo wrote:
| I'm on the Max plan and generally Opus seems to do better work
| than Sonnet. However, that's only when they allow me to use
| Opus. The usage limits, even on the max plan, are a joke.
| Yesterday I hit the limits within MINUTES of starting my work
| day.
| epolanski wrote:
| Yeah, you need to actively cherry pick which model to use in
| order to not waste tokens on stuff that would be easily
| handled by a simpler model.
| furyofantares wrote:
| I'm a bit confused by people hitting usage limits so quickly.
|
| I use Opus exclusively and don't hit limits. ccusage reports
| I'm using the API-equivalent of $2000/mo
| rirze wrote:
| You always have to ask which plan they're paying for.
| Sometimes people complain about the $20 per month plan...
| stavros wrote:
| There's no Opus quota on that plan at all.
| furyofantares wrote:
| In this case I'm replying to someone who led with "I'm
| on the Max plan" but I realize now that's ambiguous,
| maybe they are on 5x while I'm on 20x.
| Bolwin wrote:
| That's insane. Are you accounting for caching? If not,
| there's no way this is going to last
| furyofantares wrote:
| I'm using ccusage to get the number, I think it just
| looks at your history and calculates based on tokens vs
| API pricing. So I think it wouldn't account for caching.
|
| But I totally agree there's no way it lasts. I'm mostly
| only using this for side projects and I'm sitting there
| interacting with it, not YOLO'ing, I do sometimes have
| two sessions going at the same time but I'm not firing
| off swarms or anything crazy. Just have it set to Opus
| and I chat with it.
| dsrtslnd23 wrote:
| Same here, I constantly hit the Opus limits within minutes on
| the Max plan.
| gpm wrote:
| I notice that on the "Agentic Coding" benchmark cited in the
| article Sonnet 4 outperformed Opus 4 (by 0.2%), and
| underperforms Opus 4.1 (by 1.8%).
|
| So this release might change that consensus? If you believe the
| benchmarks are reflective of reality anyways.
| jimbo808 wrote:
| > If you believe the benchmarks are reflective of reality
| anyways.
|
| That's a big "if." But yeah, I can't tell a difference
| subjectively between Opus and Sonnet, other than maybe a sort
| of placebo effect. I'm more careful to write quality prompts
| when using Opus, because I don't want to waste the 5x more
| expensive tokens.
| Uehreka wrote:
| > yet the general consensus and my own experience seem to be
| that Sonnet is much much better
|
| Given that there's nothing close to scientific analysis going
| on, I find it hard to tell how big the "Sonnet is overall
| better, not just sometimes" crowd is. I think part of the
| problem is that "The bigger model is better" feels obvious to
| say, so why say it? Whereas "the smaller model is better
| actually" feels both like unobvious advice and also the kind of
| thing that feels smart to say, both of which would lead to more
| people who believe it saying it, possibly creating the illusion
| of consensus.
|
| I was trying to dig into this yesterday, but every time I come
| across a new thread, the things people are saying, and the
| proportions saying each of them, are different.
|
| I suppose one useful takeaway is this: If you're using Claude
| Max and get downgraded from Opus to Sonnet for a few hours, you
| don't have to worry too much about it being a harsh downgrade
| in quality.
| taormina wrote:
| Just more anecdata, but I entirely agree. I can't say that I am
| happy with Sonnet's output at any point, really, but it still
| occasionally works, whereas Opus has been a dumpster fire every
| single time.
| SkyPuncher wrote:
| I don't doubt Opus is technically superior, but it's not
| practically superior for me.
|
| It's still pretty much impossible to have any LLM one-shot a
| complex implementation. There's just too many details to figure
| out and too much to explain for it to get correct. Often,
| there's uncertainty and ambiguity that I only understand the
| correct answer (or rather less bad answer) after I've spent
| time deep in the code. Having Opus spit out a possibly correct
| solution just isn't useful to me. I need to understand _why_ we
| got to that solution and _why_ it's a correct solution for the
| context I'm working in.
|
| For me, this means that I largely have an iteratively driven
| implementation approach where any particular task just isn't
| that complex. Therefore, Sonnet is completely sufficient for my
| day-to-day needs.
| ssk42 wrote:
| You can also always have it create design docs and mermaid
| diagrams for each task. That makes it much easier to outline
| the why earlier, shifting left.
| bdamm wrote:
| I've been having a great time with Windsurf's "Planning"
| feature. Have a nice discussion with Cascade (Claude) all
| about what it is that needs to happen - sometimes a very
| long conversation including test code. Then when everything
| is very clear, make it happen. Then test and debug the
| results with all that context. Pretty nice.
| jstummbillig wrote:
| Can you explain what you do exactly? Do you enable plan
| mode and use with chat...?
| jm4 wrote:
| I use both. Sonnet is faster and more cost efficient. It's
| great for coding. Where Opus is noticeably better is in
| analysis. It surpasses Sonnet for debugging, finding patterns
| in data, creativity and analysis in general. It doesn't make a
| lot of sense to use Opus exclusively unless you're on a max20
| plan and not hitting limits. Using Opus for design and
| troubleshooting and Sonnet for everything else is a good way to
| go.
| astrostl wrote:
| With aggressive Claude Code use I didn't find Sonnet _better_
| than Opus but I did find it _faster_ while consuming far fewer
| tokens. Once I switched to the $100 Max plan and configured CC
| to exclusively use Sonnet I haven't run into a plan token
| limit even once. When I saw this announcement my first thing
| was to CMD-F and see when Sonnet 4.1 was coming out, because I
| don't really care about Opus outside of interactive deep
| research usage.
| ssss11 wrote:
| That's very strange. Sonnet is hot garbage and Opus is a
| miracle, for me. I also don't see anyone praising sonnet
| anywhere.
| sky2224 wrote:
| I've found that with limited context in your prompt, opus
| is just awful compared to even gpt-4.1, but once I give it even
| just a little bit more of an explanation, it jumps leagues
| ahead.
| sothatsit wrote:
| Opus really shines for completing long-running tasks with no
| supervision. But if you are using Claude Code interactively and
| actively steering it yourself, Sonnet is good enough and is
| faster.
|
| I don't believe anyone saying Sonnet is strictly better than
| Opus though, as my experience has been exactly the opposite.
| But trade-off wise, I can definitely see it being a better
| experience when used interactively because of its speed and
| lower cost.
| paxys wrote:
| Why is everything releasing today?
| datameta wrote:
| Could it be nobody wanted to be first and overshadowed, nor the
| only one left out - and it cascaded after the first
| announcement? My first hunch, though, was that it had been
| agreed upon. Game theory, I think, tells us that releasing the
| same day in a rotating pattern (ABC, BCA, CAB, etc.) would be
| lowest risk and highest average gain?
| highfrequency wrote:
| If they release before GPT-5, they don't have to compare to
| GPT-5 in their benchmarks. It's a big PR win to be able to
| plausibly claim that your model is the best coding model at the
| time of release.
| gusmally wrote:
| They restarted Claude Plays Pokemon with the new model:
| https://www.twitch.tv/claudeplayspokemon
|
| (He had been stuck in the Team Rocket hideout (I believe) for
| weeks)
| alrocar wrote:
| just ran the LLM-to-SQL benchmark over opus-4.1 and it didn't
| top the previous version :thinking: =>
| https://llm-benchmark.tinybird.live/
| epolanski wrote:
| How does it perform when run multiple times?
|
| LLMs are non-deterministic, I think benchmarks should be more
| about averages of N runs, rather than single shot experiments.
| jedisct1 wrote:
| Is it just me or is it super slow?
| taormina wrote:
| Alright, well, Opus 4.1 seems exactly as useless as Opus 4 was,
| but it's probably eating my tokens faster. Wish there were
| some way to tell.
|
| At least Sonnet 4 is still usable, but I'll be honest, it's been
| producing worse and worse slop all day.
|
| I've basically wasted the morning on Claude Code when I should've
| just been doing it all myself.
| AlecSchueler wrote:
| I've also noticed Sonnet starting to degrade. It's developing
| some of the behaviours that put me off the competition in the
| first place. Needless explanations, filler in responses,
| wanting to put everything in lists, even increased sycophancy.
| bavell wrote:
| > I've basically wasted the morning on Claude Code when I
| should've just been doing it all myself.
|
| Welcome to the machine
|
| https://www.youtube.com/watch?v=tBvAxSx0nAM&t=45s
| rvz wrote:
| Notice how Anthropic has never open sourced any of their models.
|
| This makes them (Anthropic) worse than OpenAI in terms of
| openness.
|
| Since in this case, as we all know: [0]
|
| _" What will permanently change everything is open source and
| transparent AI models that are smaller and more powerful than
| GPT-3 or even GPT-4."_
|
| [0] https://news.ycombinator.com/item?id=34865626
| jjani wrote:
| On the other hand, they have always exposed their raw chain of
| thought, so you know exactly what you're paying for, unlike
| OpenAI who hides it. Similarly they allow an actual thinking
| budget rather than vague "low, medium, high", again unlike
| OpenAI. They also allow API access to all their models without
| draconian send-us-your-personal-data KYC, once more unlike
| OpenAI.
|
| They might not fit your personal definition of "openness", but
| they do fit many other equally valid interpretations of that
| concept.
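| For anyone curious what that looks like in practice, here is
| a rough sketch using their Python SDK (the model id and exact
| field names are from memory of their docs, so treat them as
| assumptions to verify):
| 
|   import anthropic
| 
|   client = anthropic.Anthropic()  # ANTHROPIC_API_KEY from env
| 
|   response = client.messages.create(
|       model="claude-opus-4-1",
|       max_tokens=4096,  # must exceed the thinking budget
|       # An explicit token budget, not a low/medium/high knob:
|       thinking={"type": "enabled", "budget_tokens": 2048},
|       messages=[{"role": "user",
|                  "content": "Why is the sky blue?"}],
|   )
| 
|   # The raw chain of thought comes back as thinking blocks
|   # alongside the ordinary text blocks.
|   for block in response.content:
|       if block.type == "thinking":
|           print(block.thinking)
|       elif block.type == "text":
|           print(block.text)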
| ryandrake wrote:
| Am I the only one super confused about how to even get started
| trying out this stuff? Just so I wouldn't be "that critic who
| doesn't try the stuff he criticizes," I tried GitHub Copilot and
| was kind of not very impressed. Someone on HN told me Copilot
| sucks, use Claude. But I have no idea what the right way to do it
| is because there are so many paths to choose.
|
| Let's see: we have Claude Code vs. Claude the API vs. Claude the
| website, and they're totally different from each other? One is
| command line, one integrates into your IDE (which IDE?) and one
| is just browser based, I guess. Then you have the different
| pricing plans, Free, Pro, and Max? But then there's also Claude
| Team and Claude Enterprise? These are monthly plans that only
| work with Claude the Website, but Claude Code is per-request? Or
| is it Claude API that's per-request? I have no idea. Then you
| have the models: Claude Opus and Claude Sonnet, with various
| version numbers for each?? Then there's Cline and Cursor and GOOD
| GRIEF! I just want to putz around with something in VSCode for a
| few hours!
| adamors wrote:
| Download Cursor and try it through that, IMO that's currently
| the most polished experience especially since you can change
| models on the fly. For more advanced use cases, CLI is better
| but for getting your feet wet I think Cursor is the best
| choice.
| ryandrake wrote:
| Thanks. Too bad you need to switch editors to go that path. I
| assume the Cursor monthly plans are not the same as the
| Claude monthly plans and you can't use one for the other if
| you want to experiment...
| kingnothing wrote:
| Cursor is built on VSCode.
| olalonde wrote:
| Claude Code CLI.
| ryandrake wrote:
| Thanks. With the CLI, can you get Copilot-ish things like
| tab-completion and inline commands directly in your IDE? Or
| do you need to copy/paste to and from a terminal? It feels
| like running a command in a terminal and then copying the
| output into your IDE is a pretty primitive way to operate.
| cultureulterior wrote:
| Claude does the coding, and edits your files. You just sit
| back and relax. You don't do any tab completion etc.
| avemg wrote:
| My advice is this:
|
| 1) Completely separate in your mind the auto-completion
| features from the agentic coding features. The auto-
| completion features are a neat trick but I personally find
| those to be a bit annoying overall, even if they sometimes
| hit it completely right. If I'm writing the code, I mostly
| don't want the LLM autocompletion.
|
| 2) Pay the $20 to get a month of Claude Pro access and then
| install Claude Code. Then, either wait until you have a
| small task in mind or you're stuck on some stupid issue that
| you've been banging your head on and then open your
| terminal and fire up Claude Code. Explain to it in plain
| English what you want it to do. Pretend it's a colleague
| that you're giving a task to over Slack. And then watch it
| go. It works directly on your source code. There is no
| copying and pasting code.
|
| 3) Bookmark the Claude website. The next time you'd Google
| something technical, ask Claude instead. General
| questions like "how does one typically implement a flizzle
| using the floppity-do framework"? "I'm trying to accomplish
| X, what are my options when using this stack?". General
| questions like that.
|
| From there you'll start to get it and you'll get better at
| leveraging the tool to do what you want. Then you can branch
| out into the rest of the tool ecosystem.
| ryandrake wrote:
| Interesting about the auto-completion. That was really
| the only Copilot feature I found to be useful. The idea
| of writing out an English prompt and telling Copilot what
| to write sounded (and still sounds) so slow and clunky.
| By the time I've articulated what I want it to do, I
| might as well have written the code myself. The auto-
| completion was at least a major time-saver.
|
| "The card game state is a structure that contains a Deck
| of cards, represented by a list of type Card, and a list
| of Players, each containing a Hand which is also a list
| of type Card, dealt randomly, round-robin from the Deck
| object." I could have input the data structure and logic
| myself in the amount of time it took to describe that.
| avemg wrote:
| I think you should embrace a bit of ambiguity. Don't
| treat this like a stupid computer where you have to
| specify everything in minute detail. Certainly the more
| detail you give, the better to an extent. But really:
| Treat it like you're talking to a colleague and give it a
| shot. You don't have to get it right on the first prompt.
| You see what it did and you give it further instructions.
| Autocomplete is the least compelling feature of all of
| this.
|
| Also, I don't remember what model Copilot uses by
| default, especially the free version, but the model
| absolutely makes a difference. That's why I say to spend
| the $20. That gives you access to Sonnet 4 which is
| where, imo, these models took a giant leap forward in
| terms of quality of output.
| ryandrake wrote:
| Thanks, I shall give it a try.
| rstupek wrote:
| Is Opus as big a leap as sonnet4 was?
| stillpointlab wrote:
| One analogy I have been thinking about lately is GPUs.
| You might say "The amount of time it takes me to fill
| memory with the data I want, copy from RAM to the GPU,
| let the GPU do its thing, then copy it back to RAM, I
| might as well have just done the task on the CPU!"
|
| I hope when I state it that way you start to realize the
| error in your thinking process. You don't send trivial
| tasks to the GPU because the overhead is too high.
|
| You have to experiment and gain experience with agent
| coding. Just imagine that there are tasks where the
| overhead of explaining what to do and reviewing the
| output are dwarfed by the actual implementation. You have
| to calibrate yourself so you can recognize those tasks
| and offload them to the agent.
| potatolicious wrote:
| There's a sweet spot in terms of generalization. Yes,
| painstakingly writing out an object definition in English
| just so that the LLM can write it out in Java is a poor
| use of time. You want to give it more general tasks.
|
| But not _too_ general, because then it can get lost in
| the sauce and do something profoundly wrong.
|
| IMO it's worth the effort to know these tools, because
| once you have a more intuitive sense for the right level
| of abstraction it really does help.
|
| So not "make this very basic data structure for me based
| on my specs", and more like "rewrite this sequential
| logic into parallel batches", which might take some
| actual effort but also doesn't require the model to make
| too many decisions by itself.
|
| It's also pretty good at tests, which tends to be very
| boilerplate-y, and by default that means you skip some
| cases, do a _lot_ of brain-melting typing, or copy-and-
| paste liberally (and suffer the consequences when you
| missed that _one_ search and replace). The model doesn't
| tire, and it's a simple enough task that the reliability
| is high. "Generate test cases for this object, making
| sure to cover edge cases A, B, and C" is a pretty good
| ROI in terms of your-time-spent vs. results.
| collinvandyck76 wrote:
| Claude Code is the superior interface in my opinion. Definitely
| start there.
| Filligree wrote:
| You need Claude Pro or Max. The website subscription also
| allows you to use the command line tool--the rate limits are
| shared--and the command line tool includes IDE integration, at
| least for VSCode.
|
| Claude Code is currently best-in-class, so no point in starting
| elsewhere, but you do need to read the documentation.
| wahnfrieden wrote:
| Correct. Claude Code Max with Opus. Don't even bother with
| Sonnet.
| kelnos wrote:
| I wouldn't be too prescriptive. I have Pro, and it's fine.
| I'm not an incredibly heavy user (yet?); I've hit the rate
| limits a couple times, but not to the point where I'm
| motivated to spend more.
|
| I haven't tried it myself, but I've heard from people that
| Opus can be slow when using it for coding tasks. I've only
| been using Sonnet, and it's performed well enough for my
| purposes.
| Filligree wrote:
| Sonnet works fine in many cases. Opus is smarter, and
| custom 'agents' can be set to use either.
|
| I prefer configuring it to use Sonnet for things that
| don't require much reasoning/intelligence, with Opus as
| the coordinator.
| vlade11115 wrote:
| Claude Code has two usage modes: pay-per-token or subscription.
| Both modes use the API under the hood, but with subscription
| mode you are only paying a fixed amount a month. Each
| subscription tier has some undisclosed limits, cheaper plans
| have lower usage limits. So I would recommend paying $20 and
| trying the Claude Code via that subscription.
| dennisy wrote:
| No Opus in the $20 tier though sadly
| oblio wrote:
| What does Opus do extra?
| lxgr wrote:
| It's a much larger, more capable LLM than Claude Sonnet.
| andyferris wrote:
| As far as I can tell - that seems to have changed today!
| kace91 wrote:
| I'm looking for cursor alternatives after confusing pricing
| changes. Is Claude code an option? Can it be integrated into
| an editor/IDE for similar results?
|
| My use case so far is usually requesting mechanical work I
| would rather describe than write myself like certain test
| suites, and sometimes discovery on messy code bases.
| andyferris wrote:
| Claude Code is really good for this situation.
|
| If you like an IDE, for example VS Code you can have the
| terminal open at the bottom and run Claude Code in that.
| You can put your instructions there and any edits it makes
| are visible in the IDE immediately.
|
| Personally I just keep a separate terminal open and have
| the terminal and VSCode open on two monitors - seems to
| work OK for me.
| prinny_ wrote:
| What exactly did you try with GitHub copilot? It's not an LLM
| itself, just an interface for an LLM. I have copilot in my
| professional GitHub account and I can choose between chat-gpt
| and Claude.
| AlecSchueler wrote:
| I'm not sure what's complicated about what you're describing?
| They offer two models and you can pay more for higher usage
| limits, then you can choose if you want to run it in your
| browser or in your terminal. Like what else would you expect?
|
| Fwiw I have a Claude pro plan and have no interest in using
| other offerings, so I'm not sure whether they're super simple
| (one model, one interface, one pricing plan).
| onlyrealcuzzo wrote:
| When people post this stuff, it's like, are you also confused
| that Nike sells shoes AND shorts AND shirts, and there's
| different colors and skus for each article of clothing, and
| sometimes they sell direct to consumer and other times to
| stores and to universities, and also there's sales and
| promotions, etc, etc?
|
| It's almost as if companies sell more than one product.
|
| Why is this the top comment on so many threads about tech
| products?
| Imustaskforhelp wrote:
| Because I think that Claude has gone beyond the tech niche at
| this point..
|
| Or maybe that's me, but still, whether it's through the likes
| of those vibe coding apps like Lovable, Bolt, etc.,
|
| at the end of the day, most people are using the same tool,
| which is Claude, since it's mostly superior in coding
| (questionable now with oss models, but I still use it through
| Kiro).
|
| People expect this stuff to be simple when in reality it's not,
| and there is some frustration I suppose.
| furyofantares wrote:
| In this case, they tried something and were told they were
| doing it wrong, and they know there's more than one way to
| do it wrong - wrong model, wrong tool using the model,
| wrong prompting, wrong task that you're trying to use it
| for.
|
| And of course you could be doing it right but the people
| saying it works great could themselves be wrong about how
| good it is.
|
| On top of that it costs both money and time/effort
| investment to figure out if you're doing it wrong. It's
| understandable to want some clarity. I think it's pretty
| different from buying shoes.
| AlecSchueler wrote:
| Is it though? People complain about sore feet and hear
| they wear the wrong kind of shoes so they go to the store
| where they have to spend money to find out while trying
| to navigate between dress shoes, minimal shoes, running
| shoes, hiking shoes etc etc., they have to know their
| size, ask for assistance in trying them on...
| evilduck wrote:
| > I think it's pretty different from buying shoes.
|
| Shoe shopping is pretty complex, more so than trialing an
| AI model in my opinion.
|
| Are you a construction worker, a banker, a cashier or a
| driver? Are you walking 5 miles everyday or mostly
| sedentary? Do you require steel toed shoes? How long are
| you expecting them to last and what are you willing to
| pay? Are you going to wear them on long runs or take them
| river kayaking? Do they need to be water resistant,
| waterproof or highly breathable? Do you want glued,
| welted, or stitch down construction? What about flat feet
| or arch support? Does shoe weight matter? What clothing
| are you going to wear them with? Are you going to be
| dancing with them? Do the shoes need a break in period or
| are they ready to wear? Does the available style match
| your preferences? What about availability, are you ok
| having them made to order or do you require something in
| stock now?
|
| By comparison I can try 10 different AI services without
| even needing to stand up for a break while I can't buy
| good dress shoes in the same physical store as a pair of
| football cleats.
| kelnos wrote:
| > _Shoe shopping is pretty complex, more so than trialing
| an AI model in my opinion._
|
| Oh c'mon, now you're just being disingenuous, trying to
| make an argument for argument's sake.
|
| No, shoe shopping is not more complicated than trialing a
| LLM. For all of those questions about shoes you are
| posing, either a) a purchaser won't care and won't need
| to ask them, or b) they already know they have specific
| requirements and will know what to ask.
|
| With an LLM, a newbie doesn't even know what they're
| getting into, let alone what to ask or where to start.
|
| > _By comparison I can try 10 different AI services
| without even needing to stand up for a break_
|
| I can't. I have no idea how to do that. It sounds like
| you've been following the space for a while, and you're
| letting your knowledge blind you to the idea that many
| (most?) people don't have your experience.
| ryandrake wrote:
| Hey, I'm open to the idea that I'm just stupid. But, if
| people in your target market (software developers) don't
| even understand your product line and need a HOWTO+glossary
| to figure it out, maybe there's also a
| branding/messaging/onboarding problem?
| DougBTX wrote:
| My hot take is that your friend should show you what
| they're using, not just dismiss Copilot and leave you
| hanging!
| gmueckl wrote:
| When you walk into a store, you can see and touch all of
| these products. It's intuitive.
|
| With all this LLM cruft all you get is essentially the same
| old chat interface that's like the year 2000 called and
| wants its on-line chat websites back. The only thing other
| than a text box that you usually get is a model selector
| dropdown squirreled away in a corner somewhere. And that
| dropdown doesn't really explain the differences between the
| cryptic sounding options (GPT-something, Claude
| Whatever...). Of course this confuses people!
| derefr wrote:
| Claude.ai, ChatGPT, etc. are finished B2C products.
| They're black boxes, encapsulated experiences. Consumers
| don't want to pick a model, or know what model they're
| using; they just want to "talk to AI", and for the system
| to know which model is best to answer any given question.
| I would bet that for these companies, if their frontend
| observes you using the little model override button, that
| gets instrumented as an "oops" event in their metrics --
| something they aim to minimize.
|
| What _you're_ looking for are the landing pages of the
| B2B API products underlying these B2C experiences. That
| would be https://www.anthropic.com/claude,
| https://openai.com/api/, etc. (In general, search "[AI
| company] API".)
|
| From those B2B landing pages, you can usually click
| through to pages with details about each of their models.
|
| Here's the model page corresponding to this news
| announcement, for example:
| https://www.anthropic.com/claude/opus
|
| (Also, note how these B2B pages are on the AI companies'
| own corporate domains; whereas their B2C products have
| their own dedicated domains. From their perspective,
| their B2C offerings are essentially treated as separate
| companies that happen to consume their APIs -- a
| "reference use-case" -- rather than as a part of what the
| B2B company sells.)
| margalabargala wrote:
| If anything, Anthropic has the product lineup that makes
| the most sense. Higher numbers mean better model. Haiku <
| Sonnet < Opus which translates to length/size. Free < Pro <
| Max.
|
| Contrast to something like OpenAI. They've got gpt4.1, 4o,
| and o4. Which of these are newer than one another? How do
| people remember which of o4 and 4o are which?
| hvb2 wrote:
| Not sure if this is sarcasm; I'm assuming not.
|
| You're comparing well understood products that are wildly
| different to products with code names. Even someone who has
| never worn a t-shirt will see it on a mannequin and know
| where it goes.
|
| I'm sorry but I cannot tell what the difference is between
| sonnet and opus. Unless one is for music...
|
| So in this case you read the docs. Which is, in your
| analogy, you going to the Nike store and reading up on if a
| tshirt goes on your upper or lower body.
| potatolicious wrote:
| Eh, this seems like a take that reeks a bit of "everyone is
| stupid except me".
|
| I _do_ know the answer to OP's question but that's because
| I pickle my brain in this stuff. It is legitimately
| confusing.
|
| The analogy to different SKUs strikes me also inaccurate.
| This isn't the difference between shoes, shirts, and shorts
| - it's more as if a company sells three t-shirts but you
| can't really tell what's different about them.
|
| It's Claude, Claude, and Claude. Which ones code for you?
| Well, actually, all of them (Code, web/desktop Claude, and
| the API can all do this)
|
| Which ones do you ask about daily sundry queries? Well, two
| of them (web/desktop Claude, but also the API, but not
| Code). Well, except if your sundry query is about a
| programming topic, in which case Code can also do that!
|
| Ok, if I _do_ want to use this to write code, which one
| should I use? Honestly, any of them, and the company does a
| poor job of explaining why you would use each option.
|
| "Which of these very similar-seeming t-shirts should I
| get?" "You knob. How are posts like this even being
| _posted_?" is just an extremely poor way to approach other
| people, IMO.
| ryandrake wrote:
| > It's Claude, Claude, and Claude. Which ones code for
| you?
|
| Thanks for articulating the confusion better than I
| could! I feel it's a similar branding problem as other
| tech companies have: I'm watching Apple TV+ on my Apple
| TV software running on my Apple TV connected to my Google
| TV that isn't actually manufactured by Google. But that
| Google TV also has an Apple TV app that can play Apple
| TV+.
| potatolicious wrote:
| It's a bit worse than a branding problem honestly, since
| there's legitimate overlap between products, because
| ultimately they're different expressions of the same
| underlying LLMs.
|
| I'm not sure if you ever got a good rundown, but the
| tl;dr is that the 3 products ("Desktop", Code, and API)
| all expose the same underlying models, but are given
| different prompts, tools, and context management
| techniques that make them behave fairly differently and
| affect how you interact with them.
|
| - The API is the bare model itself. It has some coding
| ability because that's inherent to the model - you can
| ask it to generate code and copy and paste it for
| example. You normally wouldn't use this _except_ if
| you're using some Copilot-type IDE integration where the
| IDE is doing the work of talking to the model for you and
| integrating it into your developer experience. In that
| case you provide API key and the IDE does the heavy
| lifting.
|
| - The desktop app is actually a half-decent coder. It's
| capable of producing specific artifacts, distinguishing
| between multiple "files" it's writing for you, and
| revisiting previously-written code. "Oh, actually rewrite
| this in Go." is for example a thing it can totally do. I
| find it useful for diagnosing issues interactively.
|
| - "Claude Code" is a CLI-only wrapper around the model.
| Think of it like Anthropic's first-party IDE integration,
| except there's not an IDE, just the CLI. In this case the
| integration gives the tool broad powers to actually
| navigate your filesystem, read specific files, write to
| specific files, run shell commands like builds and tests,
| etc. These are all functions that an IDE integration
| would also give you, but this is done in a Claude-y way.
|
| My personal take is: try Claude Code, since as long as
| you're halfway comfortable with a CLI it's pretty usable.
| If you really want a direct IDE integration you can go
| with the IDE+API key route, though keep in mind that you
| might end up paying more (Claude Code is all-you-can-eat-
| with-rate-limits, where API keys will... just keep
| going).
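| To make the "bare model" option concrete, a minimal coding
| call with the Python SDK might look like this (the model id
| is an assumption; check their current model list):
| 
|   import anthropic
| 
|   client = anthropic.Anthropic()
|   msg = client.messages.create(
|       model="claude-opus-4-1",
|       max_tokens=1024,
|       messages=[{
|           "role": "user",
|           "content": "Write a Python function that "
|                      "deduplicates a list, preserving order.",
|       }],
|   )
|   # The API just returns text; copying it into your editor
|   # (or letting an IDE plugin do so) is up to you.
|   print(msg.content[0].text)
| 
| File access, shell commands, and context management are what
| the desktop app and Claude Code layer on top of this.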
| ryandrake wrote:
| Wow. After 50 replies to what I thought wasn't such a
| weird question, your rundown is the most enlightening.
| Thank you very much.
| Karrot_Kream wrote:
| FWIW it's probably because a lot of us have been
| following along and trying these things from the start so
| the nuances seem more obvious but also I feel that some
| folks feel your question is a bit "stupid", like "why are
| you suddenly interested in the frontier of these models?
| where were you for the last 2 years?"
|
| And to some extent it is like the PC race. Imagine going
| to work and writing software for whatever devices your
| company writes software for in whatever toolchain your
| company uses. Then 2-3 years after the PC race began
| heating up, asking "Hey I only really write code for
| whatever devices my employer gives me access to. Now I
| want to buy one of these new PCs but I don't really
| understand why I'd choose an Intel over a Motorola
| chipset or why I'd prioritize more ROM or more RAM, and I
| keep hearing about this thing called RISC that's way
| better than CISC and some of these chips claim to have
| different addressing modes that are better?"
| slackpad wrote:
| Claude Code running in a terminal can connect to your IDE
| so you can review its proposed changes there. I've found
| this to be a nice drop-in way to try it out without
| having to change your core workflow and tools too much.
| Check out the /ide command for details.
| Karrot_Kream wrote:
| Also when it comes to API integrations, I find some
| better than others. Copilot has been pretty crummy for me
| but Zed's Agent Mode seems to be almost as good as Claude
| Code. I agree with the general take that Claude Code is a
| good default place to start.
| tomrod wrote:
| > Why is this the top comment on so many threads about tech
| products?
|
| Because you overestimate how much of the difference the
| representative person understands.
|
| A more accurate analogy is that Nike sells green-blue shoes
| and Nike sells blue-green shoes, but the blue-green shoes
| add 3 feet to your jump and green-blue shoes add 20 mph to
| your 100 yard dash sprint.
|
| You know you need one of them for tomorrow's hurdles race
| but have no idea which is meaningful for your need.
| ryandrake wrote:
| Also, the green-blue shoes charge per-step, but the blue-
| green shoes are billed monthly by signing up for
| BlueGreenPro+ or BlueGreenMax+, each with a hidden step
| limit but BlueGreenMax+ is the one that gives you access
| to the Cyan step model which is better; plus the green-
| blue shoes are only useful when sprinting, but the blue-
| green shoes can be used in many different events, but
| only through the Nike blue-green API that only some
| track&field venues have adopted...
| true_religion wrote:
| This is like being told to buy Nike shoes. Then when you
| proudly display your new cleats, they tell you "no, I meant
| you should buy basketball shoes. The cleats are terrible."
| squeaky-clean wrote:
| Which Nike shoe is best for basketball? The Nike Dunk, Air
| Force 1, Air Jordan, LeBron 20, LeBron XXI Prime 93, Kobe
| IX elite, Giannis Freak 7, GT Cut, GT Cut 3, GT Cut 3
| Turbo, GT Hustle 3, or the KD18?
|
| At least with those you can buy whatever you think is
| coolest. Which Claude model and interface should the
| average programmer use?
| AlecSchueler wrote:
| What's the average programmer? Is it someone who likes
| CLI tools? Or who likes IDE integration? Different
| strokes for different folks and surely the average
| programmer understands what environment they will be most
| comfortable in.
| nawgz wrote:
| > Different strokes for different folks and surely the
| average programmer understands what environment they will
| be most comfortable in.
|
| That's a silly claim to me, we're talking about a
| completely new environment where you prompt an AI to
| develop code, and therefore an "average programmer" is
| unlikely to have any meaningful experience or intuition
| with this flow. That is exactly what GP is talking about
| - where does he plug in the AI? What tradeoffs are there
| to different options?
|
| The other day I had someone judge me for asking this
| question by dismissively saying "dont say youve still
| been using ChatGPT and copy/paste", which made me laugh -
| I don't use AI at all, so who was he looking down on?
| kelnos wrote:
| Because the offerings are not simple. Your Nike example is
| silly; everyone knows what to do with shoes and shorts and
| shirts, and why they might want (or not want) to buy those
| particular items from Nike.
|
| But for someone who hasn't been immersed in the "LLM
| scene", it's hard to understand why you might want to use
| one particular model of another. It's hard to understand
| why you might want to do per-request API pricing vs. a
| bucketed usage plan. This is a new technology, and the
| landscape is changing weekly.
|
| I think maybe it might be nice if folks around here were a
| bit more charitable and empathetic about this stuff.
| There's no reason to get all gatekeep-y about this kind of
| knowledge, and complaining about these questions just
| sounds condescending and doesn't do anyone any good.
| pdntspa wrote:
| Because few seem to want to expend the effort to dive in
| and understand something. Instead they want the details
| spoonfed to them by marketing or something.
|
| I absolutely loathe this timeline we're stuck in.
| windsignaling wrote:
| On the contrary, I'm confused about why you're confused.
|
| This is a well-known and documented phenomenon - the paradox
| of choice.
|
| I've been working in machine learning and AI for nearly 20
| years and the number of options out there is overwhelming.
|
| I've found many of the tools out there do some things I want,
| but not others, so even finding the model or platform that
| does exactly what I want or does it the best is a time-
| consuming process.
| joshmarlow wrote:
| VSCode has a pretty good Gemini integration - it can pull up a
| chat window from the side. I like to discuss design changes and
| small refactorings ("I added this new rpc call in my protobuf
| file, can you go ahead and stub out the parts of code I need to
| get this working in these 5 different places?") and it usually
| does a pretty darn good job of looking at surrounding idioms in
| each place and doing what I want. But gemini can be kind of
| slow here.
|
| But I would recommend just starting using Claude in the
| browser, talk through an idea for a project you have and ask it
| to build it for you. Go ahead and have a brainstorming session
| before you actually ask it to code - it'll help make sure the
| model has all of the context. Don't be afraid to overload it
| with requirements - it's generally pretty good at putting
| together a coherent plan. If the project is small/fits in a
| single file - say a one page web app or a complicated data
| schema + sql queries - then it can usually do a pretty good job
| in one place. Then just copy+paste the code and run it out of
| the browser.
|
| This workflow works well for exploring and understanding new
| topics and technologies.
|
| Cursor is nice because it's an AI integrated IDE (smoother than
| the VSCode experience above) where you can select which models
| to use. IMO it seems better at tracking project context than
| Gemini+VSCode.
|
| Hope this helps!
| spaceman_2020 wrote:
| Download Claude Code
|
| Create a new directory in your terminal
|
| Open that directory, type in "Claude" to run Claude
|
| Press Shift + Tab to go into planning mode
|
| Tell Claude what you want to build - recommend something simple
| to start with. Specify the languages, environment, frameworks
| you want, etc.
|
| Claude will come up with a plan. Modify the plan or break it
| into smaller chunks if necessary
|
| Once plan is approved, ask it to start coding. It will ask you
| for permissions and give you the finished code
|
| It really is something when you actually watch it go.
| zarzavat wrote:
| Github Copilot and Claude code are not exactly competitors.
|
| Github Copilot is autocomplete, highly useful if you use VS
| Code, but if you are using e.g. Jetbrains then you have other
| options. Copilot comes with a bunch of other stuff that I
| rarely use.
|
| Claude code is project-wide editing, from the CLI.
|
| They complement each other well.
|
| As far as I'm concerned the utility of the AI-focused editors
| has been diminished by the existence of Claude code, though not
| entirely made redundant.
| fkyoureadthedoc wrote:
| > Github Copilot is autocomplete... comes with a bunch of
| other stuff that I rarely use.
|
| That bunch of other stuff includes the chat, and more
| recently "Agent Mode". I find it pretty useful, and the
| autocomplete near useless.
| qingcharles wrote:
| This isn't correct. GitHub Copilot now totally competes with
| Claude Code. You can have it create an entire app for you in
| "Agent" mode if you're feeling brave. In fact, seeing as
| Copilot is built directly into Visual Studio when you
| download it, I guess they have a one-up.
|
| Copilot isn't locked to a specific LLM, though. You can
| select the model from a panel, but I don't think you can plug
| in your own right now, and the ones you can select might not
| be SOTA because of that.
| alienbaby wrote:
| Sonnet 4 in copilot agent mode has been doing great work
| for me lately. Especially once you realise that at least
| 50% of the work is done before you get to copilot, as
| architectural and product specs and implementations plans.
| tomwojcik wrote:
| Opencode https://github.com/sst/opencode provides a CC like
| interface for copilot. It's a slightly worse tool, but since
| copilot with Claude 4 is super cheap, I ended up preferring
| it over CC. Almost no limits, cheaper, you can use all the
| Copilot models, GH is not training on your data.
| andsoitis wrote:
| > use Claude. But I have no idea what the right way to do it is
| because there are so many paths to choose.
|
| Anthropic has this useful quick start guide:
| https://docs.anthropic.com/en/docs/claude-code/quickstart
| StephenHerlihyy wrote:
| Kilo Code for VSCode is pretty solid. Give it a try.
| wintermutestwin wrote:
| Yes. You basically need an LLM to provide guidance on product
| selection in this brave new world.
|
| It is actually one of my most useful use cases of this tech.
| Nice to have a way to ask in private so you don't get snarky
| answers like: it's just like buying shoes!
| vanillax wrote:
| All the tools - copilot, claude, gemini - in vscode are
| completely worthless unless in Agent Mode. I have no idea why
| none of these tools default to Agent mode.
| ActorNightly wrote:
| If you want your own cheap IDE integration, you can set up
| VSCode with Continue extension, ollama running locally, and a
| small agent model.
| https://docs.continue.dev/features/agent/model-setup.
|
| If you want to understand how all of this works, the best way
| is to build a coding agent manually. It's not that hard:
|
| 1. Start with Ollama running locally and Gemma3 QAT models.
| https://ollama.com/library/gemma3
|
| 2. Write a wrapper around Ollama using your favorite language.
| The idea is that you want to be able to intercept responses
| coming back from the model.
|
| 3. Create a system prompt that tells the model things like "if
| the user is asking you to create a file, reply in this
| format:...". Generally to start, you can specify instructions
| for read file, write file, and execute file
|
| 4. In your wrapper, when you send the input chat prompt, and
| get the model response back, you look for those formats, and
| make the wrapper actually execute the action. For example if
| the model replies back with the format to read file, you read
| the file from your wrapper code and send it back to the model.
|
| Every coding assistant is basically this under the hood with
| just a lot more fluff and their own IDE integration.
|
| The benefit of doing your own is that you can customize it to
| your own needs, and when you direct a model with more precision
| even the small models perform very well with much faster speed.
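| A compressed sketch of steps 2-4 (the READ_FILE line protocol
| here is invented purely for illustration; the /api/chat
| payload follows Ollama's documented REST API, but verify it):
| 
|   import requests
| 
|   SYSTEM = ("You are a coding agent. To read a file, reply "
|             "with exactly one line: READ_FILE: <path>. "
|             "Otherwise answer normally.")
| 
|   def chat(messages):
|       r = requests.post(
|           "http://localhost:11434/api/chat",
|           json={"model": "gemma3", "messages": messages,
|                 "stream": False})
|       return r.json()["message"]["content"]
| 
|   messages = [{"role": "system", "content": SYSTEM}]
|   while True:  # Ctrl-C to quit
|       messages.append({"role": "user", "content": input("> ")})
|       reply = chat(messages)
|       messages.append({"role": "assistant", "content": reply})
|       # Intercept the "tool call", perform it in the wrapper,
|       # and feed the result back to the model.
|       if reply.strip().startswith("READ_FILE:"):
|           path = reply.split(":", 1)[1].strip()
|           with open(path) as f:
|               messages.append({"role": "user",
|                                "content": f.read()})
|           reply = chat(messages)
|           messages.append({"role": "assistant",
|                            "content": reply})
|       print(reply)
| 
| Write-file and execute-file work the same way: parse the
| marker, perform the action in the wrapper, return the result.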
| afro88 wrote:
| OP is asking for where to get started with Claude for coding.
| They're confused. They just want to mess around with it in
| VSCode. And you start talking about Ollama, QAT, coding your
| own wrapper, composing a system prompt etc.!?
| jimbo808 wrote:
| You just described all of your options in detail - what's the
| problem? Pick one. Seems like you've got a very thorough grasp
| on how to get started trying the stuff out, but it requires you
| to choose how you want to do that.
| kelnos wrote:
| If you're looking for a coding assistant, get Claude Code, and
| give it a try. I think you need the Pro plan at a minimum for
| that ($20/mo; I don't think Free includes Claude Code). Don't
| do the per-request API pricing as it can get expensive even
| while just playing around.
|
| Agree that the offering is a bit confusing and it's hard to
| know where to start.
|
| Just FYI: Claude Code is a terminal-based app. You run it in
| the working directory of your project, and use your regular
| editor that you're used to, but of course that means there's no
| editor integration (unlike something like Cursor). I personally
| like it that way, but YMMV.
| robluxus wrote:
| > I just want to putz around with something in VSCode for a few
| hours!
|
| I just googled "using claude from vscode" and the first page
| had a link that brought me to anthropic's step by step guide on
| how to set this up exactly.
|
| Why care about pricing and product names and UI until it's a
| problem?
|
| > Someone on HN told me Copilot sucks, use Claude.
|
| I concur, but I'm also just a dude saying some stuff on HN :)
| zaphirplane wrote:
| Try asking it?
| screye wrote:
| Cursor + Claude 4 = best quality + UX balance. Pay up for
| 20/month subscription.
|
| Cursor imports your VSCode setup, so setting it up should be
| trivial.
|
| Use Agent mode. Use it in a preexisting repo.
|
| You're off to the races.
|
| There is a lot more you can do, but you should start seeing
| value at this point.
| w0m wrote:
| Honestly - Copilot's free mode. Just playing with the
| agentic stuff there will give you a good idea; attach it to
| Roo and you'll get an even better one. Realize that if you
| paid for a better model you'd get better results, as the
| free tier doesn't include many premium tokens.
| ramesh31 wrote:
| Will the price for 4 go down? I still find Opus completely
| unusable for the cost/performance, as someone who spends
| thousands per month on tokens. There's really no noticeable
| difference from Sonnet, at nearly 10x the price.
| _vaporwave_ wrote:
| It's interesting that Anthropic maintains current prices for
| prior state of the art models when doing a new release. Why offer
| a model with worse performance for the same price? What
| incentives are they trying to create?
| dysoco wrote:
| I'm guessing it's mostly for legacy reasons. When 3.7 came out
| many people were not happy with it and went back to 3.5; I
| guess supporting older models for a while makes sense.
| gwd wrote:
| > What incentives are they trying to create?
|
| One obvious explanation is that pricing is strongly related
| to the _cost to them_, and that their only incentive is for
| people to use an expensive model if they really need it.
|
| I forget which of the GPT models was better, faster, _and
| cheaper_ than the previous model. The incentive there is
| obviously: "If you want to use the old model for whatever
| reason, fine, but we really want you to use the new one
| because it costs _us_ less to run."
| mrcwinn wrote:
| o3 and o3-pro are just so good. Sonnet goes off the deep end too
| often and Opus, in my experience, is not as strong at reasoning
| compared to OpenAI, despite the higher costs. Rarely do we see a
| worse, more expensive product win - but competition is good and
| I'm rooting for Anthropic nonetheless!
| AlecSchueler wrote:
| Off the deep end?
| WXLCKNO wrote:
| o3 feels pretty good to me as well, but o3-pro has
| consistently one-shotted problems other LLMs got stuck on.
|
| I'm talking multiple tries of Claude 4 Opus, Gemini 2.5 Pro,
| o3, etc., resulting in sometimes hundreds of lines of code.
|
| Versus o3-pro (very slowly) analyzing and then fixing something
| that seemed completely unrelated in a one or two line change
| and truly fixing the root cause.
|
| o3-pro-level LLMs at reduced cost and increased speed will
| already be amazing.
| thoop wrote:
| The article says "We plan to release substantially larger
| improvements to our models in the coming weeks."
|
| Sonnet 4 has definitely been the best model for our product's use
| case, but I'd be interested in trying Haiku 4 (or 4.1?) just due
| to the cost savings.
|
| I'm surprised Anthropic hasn't mentioned anything about Haiku 4
| yet since they released the other models.
| mocmoc wrote:
| Their limits are just... a real roadblock
| bananapub wrote:
| huh?
|
| Claude Max is tens of hours of Opus a month, or you can pay
| per token with no limits.
|
| Or did you mean "I wish it was cheaper"?
| OldGreenYodaGPT wrote:
| Claude Code has honestly made me at least 10x more productive.
| I've burned through about 3 billion tokens and have been
| consistently merging 5+ PRs a day, tackling tons of tech debt,
| improving GitHub Actions, and making crazy progress on product
| work.
| totaa wrote:
| can you share your workflow?
| steinvakt2 wrote:
| I also have this feeling that I'm 2-10x more productive. But
| isn't it curious how a lot of devs feel this way, yet none of
| the devs I know report that any of their colleagues have
| become 2-10x more productive?
| nevertoolate wrote:
| 10x means to me that I can finish a month of work in max 2
| days and go cloud watching. What does it mean for you?
| mwigdahl wrote:
| <raises hand> Our automated test folks were chronically
| behind, struggling to keep up with feature development. I got
| the two assigned to the team that was the most behind set up
| with Claude Code. Six weeks later they are fully caught up,
| expanding coverage, and integrating AI code review into our
| build pipeline.
|
| It's not 10x, but those guys do seem like they've hit
| somewhere around 2x improvement overall.
| samtp wrote:
| What type of work do you do and what type of code do you
| produce?
|
| Because I've found it to work pretty amazingly for things that
| don't need to be exact (like data modeling) or don't have any
| security implications (public apps). But for everything else I
| end up having to find all the little bugs by reading the code
| line by line, which is much slower than just writing the code
| in the first place.
| AstroBen wrote:
| only 10x? I'm at least 100x as productive. I only type at a
| measly 100wpm, whereas Claude can output 100+ tokens a second
|
| I'm outputting a PR every 6 minutes. The reviewers are using
| Claude to review everything. It used to take a day to add 100
| lines to the codebase.. now I can add 100 lines in one prompt
|
| If I want even more productivity (at risk of making the rest of
| my team look slow) I can tell Claude to output double the lines
| and ship it off for review. My performance metrics are
| incredible
| samtp wrote:
| So no human reads the actual code that you push to
| production? Are you not worried about security risks,
| spaghetti code, and other issues? Or does Claude magically
| make all of those concerns go away?
| AstroBen wrote:
| forgot the /s
| samtp wrote:
| Sorry lol, sometimes difficult to separate the hype boys
| from actual sarcasm these days
| qingcharles wrote:
| Not sure if joking...?
| AstroBen wrote:
| This is only the beginning. I can see myself having 100
| Claude tasks running concurrently - the only problem is
| edits clash between files. I'm working on having Claude
| solve this by giving each instance its own repo to work
| with, then I ask the final Claude to mash it all together
| as best it can
|
| What's 100x productivity multiplied by 100 instances of
| Claude? 10,000x productivity
|
| Now to be fair and a bit more realistic it's not actually
| 10000x because it takes longer to push the PR because the
| file sizes are so big. Let's call it 9800x. That's still a
| sizable improvement
| trallnag wrote:
| Big if true
| screye wrote:
| How do you maintain high confidence in the code it generates?
|
| My current bottleneck is having to review the huge amounts of
| code that these models spit out. I do TDD, use auto-linting and
| type-checking... but the model makes insidious changes that
| are only visible on deep inspection.
| theappsecguy wrote:
| The only way you could be 10x more productive is if you were
| doing nothing before.
| P24L wrote:
| The improved Opus isn't about achieving significantly better peak
| performance for me. It's not about pushing the high end of the
| spectrum. Instead, it's about consistently delivering better
| average results - structuring outputs more effectively, self-
| correcting mistakes more reliably, and becoming a trustworthy
| workhorse for everyday tasks.
| djha-skin wrote:
| Opus 4(.1) is _so_ expensive[1]. Even Sonnet[2] costs me $5 per
| hour (basically) using OpenRouter + Codename Goose[3]. The crazy
| thing is Sonnet 3.5 costs _the same thing_[4] right now. Gemini
| Flash is more reasonable[5], but always seems to make the wrong
| decisions in the end, spinning in circles. OpenAI is better, but
| still falls short of Claude's performance. Claude also gives
| back 400s from its API if you Ctrl-C in the middle, which is
| annoying.
|
| Economics is important. Best bang for the buck seems to be
| OpenAI GPT-4.1 mini[6]. Does a decent job, doesn't flood my
| context window with useless tokens like Claude does, and the
| API works every time. Gets me out of bad spots. Can get
| confused, but I've been able to muddle through with it.
|
| 1: https://openrouter.ai/anthropic/claude-opus-4.1
|
| 2: https://openrouter.ai/anthropic/claude-sonnet-4
|
| 3: https://block.github.io/goose/
|
| 4: https://openrouter.ai/anthropic/claude-3.5-sonnet
|
| 5: https://openrouter.ai/google/gemini-2.5-flash
|
| 6: https://openrouter.ai/openai/gpt-4.1-mini
| generalizations wrote:
| Get a subscription and use claude code - that's how you get
| actual reasonable economics out of it. I use claude code all
| day on the max subscription and maybe twice in the last two
| weeks have I actually hit usage limits.
| tgtweak wrote:
| Is it considerably more cost effective than cline+sonnet api
| calls with caching and diff edits?
|
| Same context length and throughput limits?
|
| Anecdotally I found GPT-4.1 (and mini) to be pretty good at
| those agentic programming tasks, but the lack of token
| caching made the costs blow up with long context.
| bavell wrote:
| I'm on the basic $20/mo sub and only ran into token cap
| limitations in the first few days of using Claude Code (now
| 2-3 weeks in) before I started being more aggressive about
| clearing the context. Long contexts will eat up token caps
| quickly when you are having extended back-and-forth
| conversations with the model. Otherwise, it's been
| effectively "unlimited" for my own use.
| bgirard wrote:
| YMMV: I'm using the $100/mo Max subscription and I hit the
| limit during a focused coding session where I'm giving it
| prompts non-stop.
|
| Unfortunately there's no easy tool to inspect usage. I
| started a project to parse the Claude logs using Claude
| and generate a Chrome trace with it. It's promising but
| it was taking my tokens away from my core project.
| bartman wrote:
| Check out ccusage, it sounds like the tool you're
| describing: https://github.com/ryoppippi/ccusage
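|
| If you have Node installed, running it via `npx ccusage`
| (per its README) should print a daily token and cost
| breakdown from your local Claude Code logs.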
| bgirard wrote:
| That's neat. According to the tool I'm consuming ~300m
| tokens per day coding with a (retail?) cost of ~$125/day.
| The output of the model is definitely worth $100/mo to
| me.
| symbolicAGI wrote:
| ccusage on GitHub.
| seneca wrote:
| Is there a way to sign up for Claude code that doesn't
| involve verifying a phone number with Anthropic? They don't
| even accept Google Voice numbers.
|
| Maybe I'm out of touch, but I'm not handing out my phone
| number to sign up for random SaaS tools.
| tagami wrote:
| use a burner
| kroaton wrote:
| GLM 4.5 / Kimi K2 / Qwen Coder 3 / Gemini Pro 2.5
| paul7986 wrote:
| Claude Pro failed me badly today compared to ChatGPT Plus.
|
| I uploaded a web design of mine (JPEG) and asked Claude to
| create the HTML/CSS. Asked GPT to do the same. GPT's code
| looked the closest to the design I created and uploaded. Just
| five to ten small tweaks and I was done, vs. Claude, where it
| would have taken me almost triple the steps.
|
| I actually subscribed to both today (resubscribed to GPT) and
| am going to keep testing which one is the better front-end
| developer (I am, but I've got to embrace AI).
| alvis wrote:
| Funny how OpenAI and Anthropic seem to be coordinating their
| releases on the same day.
| KaoruAoiShiho wrote:
| For me this is the big news of the day. Looks insane.
| hartator wrote:
| > 1 min read
|
| What's the point of these?
|
| Kind of interesting that we live in an era of super-advanced
| AI but still make basic UI/UX mistakes. The tagline of this
| blog post shouldn't be "1 min read".
|
| It's not even accurate. I timed myself reading at a normal
| pace, and it took me 3 min 30 s. Maybe the images need to be
| OCRed to make the estimate more accurate.
| TimMeade wrote:
| This has been the worst Claude day ever. It just fell apart -
| not sure if the release is why, but it's cursing in documents
| and can't fix a bug after hours of back and forth.
___________________________________________________________________
(page generated 2025-08-05 23:00 UTC)