[HN Gopher] Claude 3.7 Sonnet and Claude Code
___________________________________________________________________
Claude 3.7 Sonnet and Claude Code
Author : bakugo
Score : 1052 points
Date : 2025-02-24 18:28 UTC (4 hours ago)
(HTM) web link (www.anthropic.com)
(TXT) w3m dump (www.anthropic.com)
| bnc319 wrote:
| Pretty amazing how DeepSeek started the visible-reasoning trend,
| xAI featured it in their latest release, and now Anthropic does
| the same.
| anjel wrote:
| I took DS's visible reasoning to be an elegant misdirect from how
| much slower DS returns your query's output.
| t55 wrote:
| Anthropic doubling down on code makes sense; that has been their
| strong suit compared to all the other models
|
| Curious how their Devin competitor will pan out given Devin's
| challenges
| malux85 wrote:
| I thought the same thing, I have 3 really hard problems that
| Claude (or any model) hasn't been able to solve so far and I'm
| really excited to try them today
| ru552 wrote:
| Considering that they are the model that powers a majority of
| Cursor/Windsurf usage and their play with MCP, I think they
| just have to figure out the UX and they'll be fine.
| weinzierl wrote:
| It's their strong suit no doubt, but sometimes I wish the chat
| would not be so eager to code.
|
| It often throws code at me when I just want a conceptual or
| high level answer. So often that I routinely tell it not to.
| KerryJones wrote:
| I complain about this all the time, despite me saying "ask me
| questions before you code" or all these other instructions to
| code less, it is SO eager to code. I am hoping their 3.7
| reasoning follows these instructions better
| vessenes wrote:
| We should remember 3.5 was trained in an era when ChatGPT
| would routinely refuse to code at all and architected in an
| era when system prompts were not necessarily very
| effective. I bet this will improve, especially now that
| Claude has its own coding and arch cli tool.
| NitpickLawyer wrote:
| > I just want a conceptual or high level answer
|
| I've found claude to be very receptive to precise
| instructions. If I ask for "let's first discuss the
| architecture" it never produces code. Aider also has this
| feature with /architect
| ap-hyperbole wrote:
| I added custom instruction under my Profile settings in the
| "personal preferences" text box. Something along the lines of
| "I like to discuss things before wanting the code. Only
| generate code when I prompt for it. Any question should be
| answered to as a discussion first and only when prompted
| should the implementation code be provided". It works well,
| occasionally I want to see the code straight away but this
| does not happen as often.
| perdomon wrote:
| I get this as well, to the point where I created a specific
| project for brainstorming without code -- asking for
| concepts, patterns, architectural ideas without any code
| samples. One issue I find is that sometimes I get better
| answers without using projects, but I'm not sure if that's
| everyone's experience.
| bitbuilder wrote:
| That's been my experience as well with projects, though I
| have yet to do any sort of A/B testing to see if it's all
| in my head or not.
|
| I've attributed it to all your project content (custom
| instruction, plus documents) getting thrown into context
| before your prompt. And honestly, I have yet to work with
| any model where the quality of the answer wasn't inversely
| proportional to the length of context (beyond of course
| supplying good instruction and documentation where needed).
| ben30 wrote:
| I've set up a custom style in Claude that won't code but just
| keeps asking questions to remove assumptions:
|
| Deep Understanding Mode (根回し - Nemawashi Phase)
|
| Purpose:
| - Create space (間, ma) for understanding to emerge
| - Lay careful groundwork for all that follows
| - Achieve complete understanding (grokking) of the true need
| - Unpack complexity (desenrascar) without rushing to solutions
|
| Expected Behaviors:
| - Show determination (sisu) in questioning assumptions
| - Practice careful attention to context (taarof)
| - Hold space for ambiguity until clarity emerges
| - Work to achieve intuitive grasp (apercu) of core issues
|
| Core Questions:
| - What do we mean by [key terms]?
| - What explicit and implicit needs exist?
| - Who are the stakeholders?
| - What defines success?
| - What constraints exist?
| - What cultural/contextual factors matter?
|
| Understanding is Complete When:
| - Core terms are clearly defined
| - Explicit and implicit needs are surfaced
| - Scope is well-bounded
| - Success criteria are clear
| - Stakeholders are identified
| - Apercu is achieved - an intuitive grasp of the essence
|
| Return to Understanding When:
| - New assumptions surface
| - Implicit needs emerge
| - Context shifts
| - Understanding feels incomplete
|
| Explicit Permissions:
| - Push back on vague terms
| - Question assumptions
| - Request clarification
| - Challenge problem framing
| - Take time for proper nemawashi
| KaoruAoiShiho wrote:
| They cited Cognition (Devin's maker) in this blog post which is
| kinda funny.
| Flux159 wrote:
| It's interesting that Anthropic is making their own coding agent
| with Claude Code - is this a sign of them looking to move up the
| stack and more into verticals that model wrapper startups are in?
| madduci wrote:
| GitHub copilot has now introduced Claude as model as well
| vessenes wrote:
| This makes sense to me: sell razor blades. Presumably Claude
| has a large developer distribution channel so they will keep
| eyeballing what to 'give away' that turns the dials on
| inference billing.
|
| I'd guess this will keep raising the bar for paid or open
| source competitors, so probably good for end users esp given
| they aren't a monopoly by any means.
| estsauver wrote:
| The docs for Claude code don't seem to be up yet but are linked
| here: http://docs.anthropic.com/s/claude-code
|
| I'm not sure if it's a broken link in the blog post or just
| hasn't been published yet.
| jumploops wrote:
| Saw the same thing, but looks to be up now!
| tablet wrote:
| The progress in AI area is insane. I can't keep up with all the
| news. And I have work to do...
| amelius wrote:
| It stopped being revolutionary and is now mostly evolutionary,
| though.
| dingnuts wrote:
| it's been evolutionary for a long time. I fine-tuned a GPT-2
| based chat bot that could form complete sentences back in
| like 2017
|
| It's been so long that I'm not even certain which YEAR I set
| that up.
| falcor84 wrote:
| Where do you draw the line? If going from forming sentences
| to achieving medal level success on IMO questions, doing
| extensive web research on its own and writing entire SaaS
| apps based on a prompt in under 10 years is just
| "evolutionary", then it's one heck of an evolution.
| og_kalu wrote:
| >I fine-tuned a GPT-2 based chat bot that could form
| complete sentences back in like 2017.
|
| GPT-2 was a 2019 release lol.
| frankfrank13 wrote:
| This is a pretty small update, no? Nothing major since R1;
| everyone else is just catching up to that and putting small
| spins on it. Anthropic's spin is a "hybrid" reasoning model
| instead of separate models
| tablet wrote:
| Well, now I have to play with it, try to see how it will
| generate code for our agentic assistant (we do rely on code
| to execute task flows), etc.
| TIPSIO wrote:
| "Make me a website about books. Make it look like a designer and
| agency made it. Use Tailwind."
|
| https://play.tailwindcss.com/tp54wfmIlN
|
| Getting way better at UI.
| flir wrote:
| That's not hideous. Derivative, but that's the nature of the
| beast.
| jasonjmcghee wrote:
| I feel like something isn't working... when I try to click
| anything it just reloads. I can't see the collections.
| handfuloflight wrote:
| As a designer and agency... this is extremely basic... but so
| was the prompt.
| lysace wrote:
| It's fascinating how close these companies are to each other.
| Some company comes up with something clever/ground-breaking and
| everyone else has implemented it a few weeks later.
|
| Hard not to think of Kurzweil's Law of Accelerating Returns.
| mechagodzilla wrote:
| It does seem like it will be very, very hard for the companies
| training their own models to recoup their investment when the
| capabilities of open-weight models catch up so quickly -
| general purpose LLMs just seem destined to be a cheap
| commodity.
| jsheard wrote:
| Well, the companies releasing open weights also need to
| recoup their investments at some point, they can't coast on
| VC hype forever. Huge models don't grow on trees.
| mechagodzilla wrote:
| Or, like Meta, they make their money elsewhere and just
| seem interested in wrecking the economics of LLMs. As soon
| as an open-weight model is released, it basically sets a
| global floor that says "Models with similar or worse
| performance effectively have zero value," and that floor
| has been rising incredibly quickly. I'd be surprised if the
| vast, vast majority of queries ChatGPT gets couldn't get
| equivalently good results from llama3/deepseek/qwen/mistral
| models, even for those paying for the pro versions.
| Philpax wrote:
| Eh, to some extent - there's still a pretty significant
| cost to actually running inference for those models. For
| example, no consumer can run DeepSeek v3/r1 - that
| requires tens, possibly hundreds, of thousands of dollars
| of hardware to run.
|
| There's still room for other models, especially if they
| have different performance characteristics that make them
| suitable to run under consumer constraints. Mistral has
| been doing quite well here.
| mechagodzilla wrote:
| If you don't need to pay for the model development costs,
| I think running inference will just be driven down to the
| underlying cloud computing costs. The actual requirement
| to passably (~4-bit quantization) run Deepseek v3/r1 at
| home is really just having 512GB or so of RAM - I bought
| a used dual-socket xeon for $2k that has 768GB of RAM,
| and can run Deepseek R1 at 1-1.5 tokens/sec, which is
| perfectly usable for "ask a complicated question, come
| back an hour or so later and check on the result".
| riku_iki wrote:
| I think the Meta folks just don't know how to enter this
| market and build something potentially profitable, so
| they're doing random stuff because they need to report
| some results to management.
| azinman2 wrote:
| It's extremely unlikely that everyone is copying in a few weeks
| for models that themselves take many weeks if not longer to
| train. Great minds think alike, and everyone is influencing
| everyone. The history of innovation is filled with examples of
| similar discoveries around the same time but totally
| disconnected in the world. Now with the rate of publishing and
| the openness of the internet, you're only bound to get even
| more of that.
| lysace wrote:
| Isn't the reasoning thing essentially a bolt-on to existing
| trained models? Like basically a meta-prompt?
| pertymcpert wrote:
| Somewhat but not exactly? I think the models need to be
| trained to think.
| azinman2 wrote:
| No.
|
| DeepSeek and now related projects have shown it's possible
| to add reasoning via SFT to existing models, but that's not
| the same as a prompt. But if you look at R1 they do a blend
| of techniques to get reasoning.
|
| For Anthropic to have a hybrid model where you can control
| this, it will have to be built into the model directly in
| its training and probably architecture as well.
|
| If you're a competent company filled with the best AI minds
| and a frontier model, you're not just purely copying...
| you're taking ideas while innovating and adapting.
| Philpax wrote:
| The fundamental innovation is training the model to reason
| through reinforcement learning; you can train existing
| models with traces from these reasoning models to get you
| within the same ballpark, but taking it further requires
| you to do RL yourself.
| KaoruAoiShiho wrote:
| The copying here probably goes back to strawberry/o1, which is
| at least 6 months old, but the copying efforts may have started
| even earlier.
| riku_iki wrote:
| > for models that themselves take many weeks if not longer to
| train.
|
| they all have foundational heavy-trained model, and then they
| can do follow up experimental training much faster.
| luma wrote:
| Where RL can play into post training there's something of an
| anti-moat. Maybe a "tow rope"?
|
| Let's say OAI releases some great new model. The moment it
| becomes available via API, everyone else can make use of that
| model to create high-quality RL training data, which can then
| be used to make their models perform better.
|
| The very act of making an AI model commercially available is
| the same act which allows your competitors to pull themselves
| closer to you.
| ctoth wrote:
| I've been using O3-mini with reasoning effort set to high in
| Aider and loving the pricing. This looks as though it'll be about
| three times as expensive. Curious to see which falls out as most
| useful for what over the next month!
| rahimnathwani wrote:
| Are you using o3-mini for editing, or just as the architect in
| architect-editor mode?
| vessenes wrote:
| It is .. not a great architect. I have high hopes for 3.7
| though - even 3.5 architect matched with 3.5 coding is
| generally better than 3.5 coding alone.
| rs_rs_rs_rs_rs wrote:
| Hope it's worth the money because it's quite expensive.
| m3kw9 wrote:
| Wonder if Aider will copy some of these features
| sergiotapia wrote:
| Already available in Cursor!
| https://x.com/cursor_ai/status/1894093436896129425
|
| (although I do not see it)
| ianhawes wrote:
| > Include the beta header output-128k-2025-02-19 in your API
| request to increase the maximum output token length to 128k
| tokens for Claude 3.7 Sonnet.
|
| This is pretty big! Previously most models could accept massive
| input tokens but would be restricted to 4096 or 8192 output
| tokens.
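|
| A minimal sketch of passing that header with the Python SDK
| (the `extra_headers` route here is an assumption on my part;
| check the docs for the canonical way):
|
|     import anthropic
|
|     client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
|     msg = client.messages.create(
|         model="claude-3-7-sonnet-20250219",
|         max_tokens=128000,  # only allowed with the beta header below
|         # opt in to the long-output beta
|         extra_headers={"anthropic-beta": "output-128k-2025-02-19"},
|         messages=[{"role": "user", "content": "Write a long report."}],
|     )
|     print(msg.content[0].text)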
| thegeomaster wrote:
| This amounts to a cost-saving measure - you can generate
| arbitrarily many tokens by appending the output and re-invoking
| the model.
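|
| A rough sketch of that continuation loop (assuming the standard
| Messages API and its assistant-prefill behaviour; not an
| official recipe):
|
|     import anthropic
|
|     client = anthropic.Anthropic()
|     prompt = [{"role": "user", "content": "Write a very long story."}]
|     output = ""
|
|     while True:
|         resp = client.messages.create(
|             model="claude-3-7-sonnet-20250219",
|             max_tokens=8192,
|             # a trailing assistant turn acts as a prefill that the
|             # model continues from
|             messages=prompt + (
|                 [{"role": "assistant", "content": output}] if output else []
|             ),
|         )
|         output += resp.content[0].text
|         if resp.stop_reason != "max_tokens":
|             break  # the model finished on its own
|
|     print(output)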
| ungreased0675 wrote:
| Awesome. Claude is significantly better than other models at code
| assistant tasks, or at least in the way I use it.
| jasondigitized wrote:
| Totally agree. I continue to be blown away at how good it is at
| understanding, explaining, and writing code. Got an obscure
| error? Give Claude enough context and it is pretty dang good
| at getting you on glide slope.
| jedberg wrote:
| Last week when Grok launched the consensus was that its coding
| ability was better than Claude. Anyone have a benchmark with this
| new model? Or just warm feelings?
| esafak wrote:
| They merely claimed that. I have not seen many people confirm
| that it is the best, let alone a consensus. I don't believe it
| is even available through an API yet.
| minihat wrote:
| Grok 3 with thinking is comparable to o1 for writing complex
| algorithms.
|
| However, Grok sometimes loses the context where o1 seems not
| to. For this reason I still mostly use o1.
|
| I have found both o1 and Grok 3 to be substantially better than
| any Claude offering.
| bbor wrote:
| > Just as humans use a single brain for both quick responses
| > and deep reflection, we believe reasoning should be an
| > integrated capability of frontier models rather than a
| > separate model entirely.
|
| Interesting. I've been working on exactly this for a bit over two
| years, and I wasn't surprised to see UAI finally getting traction
| from the biggest companies -- but how deep do they really take
| it...? I've taken this philosophy as an impetus to build an
| integrated system of interdependent hierarchical modules, much
| like Minsky's Society of Mind that's been popular in AI for
| decades. But this (short, blog) post reads like it's more of a
| behavioral goal than a design paradigm.
|
| Anyone happen to have insight on the details here? Or, even
| better, anyone from Anthropic lurking in these comments that
| cares to give us some hints? I promise, I'm not a competitor!
|
| Separately, the throwaway paragraph on alignment is worrying as
| hell, but that's nothing new. I maintain hope that Anthropic is
| keeping to their founding principles in private, and tracking
| more serious concerns than "unnecessary refusals" and prompt
| injection...
| Alex-Programs wrote:
| IIRC there's some reasoning in old Sonnet too, they're just
| expanding that. Perhaps that's part of why it was so good for a
| while.
|
| https://www.reddit.com/r/ClaudeAI/comments/1iv356t/is_sonnet...
| isoprophlex wrote:
| YES. I've tried them all but Sonnet is still the model I'm most
| productive with, even better than the o1/o3 models.
|
| Wish I could find the link to enroll in their Claude Code beta...
| frankfrank13 wrote:
| here -- https://docs.anthropic.com/en/docs/agents-and-
| tools/claude-c...
| isoprophlex wrote:
| Thanks!
| waltercool wrote:
| Just like with OpenAI or Grok, there is no transparency and no
| way to self-host. Your input and confidential information can
| be collected for training purposes.
|
| I just don't trust those companies when you use their servers.
| This is not a good approach to LLM democratization.
| azinman2 wrote:
| I wouldn't assume there's no way to self host -- it just costs
| a lot more than open weights.
|
| Anthropic claims they don't train on their inputs. I haven't
| seen any reason to disbelieve them.
| waltercool wrote:
| But there is no way to know if their claims are true either.
| Your inputs are processed on their servers, then you get a
| response. Whatever happens in the middle, only Anthropic
| knows. We don't even know if governments are pushing AI
| companies to enforce censorship or spy on people, as we saw
| recently with the UK government going after Apple's E2E
| encryption.
|
| This criticism is valid for businesses that want to use AI
| to improve coding, code analysis or review, documentation,
| emails, etc., but also for individuals who don't want to
| rely on 3rd-party companies for AI usage.
| wewewedxfgdf wrote:
| Nothing in the Claude API release notes.
|
| https://docs.anthropic.com/en/release-notes/api
|
| I really wish Claude would get Projects and Files built into its
| API, not just the consumer UI.
| thanhhaimai wrote:
| > Third, in developing our reasoning models, we've optimized
| somewhat less for math and computer science competition problems,
| and instead shifted focus towards real-world tasks that better
| reflect how businesses actually use LLMs.
|
| Company: we find that optimizing for LeetCode level programming
| is not a good use of resources, and we should be training AI less
| on competition problems.
|
| Also Company: we hire SWEs based on how much time they trained
| themselves on LeetCode
|
| /joke of course
| Svoka wrote:
| My manager explained to me that LeetCode is proving that you
| are willing to dance the dance. Same as PhD requirements etc -
| you probably won't be doing anything related and definitely
| nothing related to LeetCode, but you display dedication and
| ability.
|
| I kinda agree that this is probably the reason why companies
| are doing it. I don't like it, but that's beside the point.
|
| Using Claude or other models in interviews probably won't be
| allowed any time soon, but I do use it at work. So it does
| make sense.
| nico wrote:
| And it's also the reality of hiring practices for most VC-
| backed and public companies
|
| Some try to do something more like "real-world" tasks, but
| those end up being either just toy problems or long
| take-homes.
|
| Personally, I feel the most important things to prioritize when
| hiring are: is the candidate going to get along with their
| teammates (colleagues, boss, etc), and do they have the basic
| skills to relatively quickly learn their jobs once they start?
| EliasWatson wrote:
| I asked it for a self-portrait as a joke and the result is
| actually pretty impressive.
|
| Prompt: "Draw a SVG self-portrait"
|
| https://claude.site/artifacts/b10ef00f-87f6-4ce7-bc32-80b3ee...
|
| For comparison, this is Sonnet 3.5's attempt:
| https://claude.site/artifacts/b3a93ba6-9e16-4293-8ad7-398a5e...
| orangesun wrote:
| New mascot! Just make it the Anthropic orange
| frankfrank13 wrote:
| Tried claude code, and have an empty unresponsive terminal.
|
| Looks cool in the demo though, but not sure this is going to
| perform better than Cursor, and shipping this as an interactive
| CLI instead of an extension is... a choice
| toddmorey wrote:
| I think it's a smart starting point as it's compatible with all
| IDEs. Iterate and learn and then later wrap the functionality
| up into IDE plugins.
| apsec112 wrote:
| They don't say this, but from querying it, they also seem to have
| updated the knowledge cutoff from April 2024 ("3.6") to October
| 2024 (3.7)
| KerryJones wrote:
| Thanks for noting this -- it's actually pretty important in my
| work.
| sunaookami wrote:
| It's in the Model Card:
| https://assets.anthropic.com/m/785e231869ea8b3b/original/cla...
|
| >Claude 3.7 Sonnet is trained on a proprietary mix of publicly
| available information on the Internet as of November 2024
| rahimnathwani wrote:
| I'm curious how Claude Code compares to Aider. It seems like they
| have a similar user experience.
| azinman2 wrote:
| To me the biggest surprise was seeing Grok dominate in all of
| their published benchmarks. I haven't seen any independent
| benchmarks of it yet (and I take published ones with a giant
| heap of salt), but it's interesting nevertheless.
|
| I'm rooting for Anthropic.
| pertymcpert wrote:
| Indeed. I wonder what the architecture for Claude and Grok 3 is.
| If they're still dense models, then the MoE excitement around R1
| was a tad premature...
| phillipcarter wrote:
| Neither a statement for or against Grok or Anthropic:
|
| I've now just taken to seeing benchmarks as pretty lines or
| bars on a chart that are in no way reflective of actual ability
| for my use cases. Claude has consistently scored lower on some
| benchmarks for me, but when I use it in a real-world codebase,
| it's consistently been the only one that doesn't veer off
| course or "feel wrong". The others do. I can't quantify it, but
| that's how it goes.
| vessenes wrote:
| O1 pro is excellent at figuring out complex stuff that Claude
| misses. It's my go to mid level debug assistant when Claude
| spins
| viccis wrote:
| Yeah, putting it on the opposite side of that comparison chart
| was a sleazy but likely effective move.
| koakuma-chan wrote:
| Grok does the most thinking out of all models I tried (it can
| think for 2+ minutes), and that's why it is so good, though I
| haven't tried Claude 3.7 yet.
| photon_collider wrote:
| Nice to see a new release from Anthropic. Yet, this only makes me
| even more curious about when we'll see a new Claude Opus model.
| bakugo wrote:
| Funny enough, 3.7 Sonnet seems to think it's Opus right now:
|
| > "thinking": "I am Claude, an AI assistant created by
| Anthropic. I believe the specific model is Claude 3 Opus, which
| is Anthropic's most capable model at the time of my training.
| However, I should simply identify myself as Claude and not
| mention the specific model version unless explicitly asked for
| that level of detail."
| Alex-Programs wrote:
| I doubt we will. The state of the art seems to have moved away
| from the GPT-4-style giant, slow models to smaller, more
| refined ones - though Grok might be a bit of a return to the
| "old ways"?
|
| Personally I'm hoping they update Haiku at some point. It's not
| quite good enough for translation at the moment, while Sonnet
| is pretty great and has OK latency
| (https://nuenki.app/blog/llm_translation_comparison)
| cyounkins wrote:
| I don't yet see it in Bedrock in us-east-1 or us-east-2
| punkpeye wrote:
| If you are open to alternatives
| https://glama.ai/models/claude-3-7-sonnet-20250219
| elliot07 wrote:
| The cost is absurd (compared to other LLM providers these days).
| I asked 3 questions and the cost was ~0.77c.
|
| I do like how this is implemented as a bash tool and not an
| editor replacement though. Never leaving Vim! :P
| koakuma-chan wrote:
| Yep, my experience as well. It's just not worth it.
| koakuma-chan wrote:
| It burns through tokens like crazy on a small code base
| https://i.imgur.com/16GCxiy.png
| modeless wrote:
| I updated Cursor to the latest 0.46.3 and manually added
| "claude-3.7-sonnet" to the model list and it appears to work
| already.
|
| "claude-3.7-sonnet-thinking" works as well. Apparently controls
| for thinking time will come soon:
| https://x.com/sualehasif996/status/1894094715479548273
| punkpeye wrote:
| https://glama.ai/models/claude-3-7-sonnet-20250219
|
| Will be interesting to see how this gets adopted in communities
| like Roo/Cline, which currently account for the most token usage
| among Glama gateway user base.
| bcherny wrote:
| Hi everyone! Boris from the Claude Code team here. @eschluntz,
| @catherinewu, @wolffiex, @bdr and I will be around for the next
| hour or so and we'll do our best to answer your questions about
| the product.
| frankfrank13 wrote:
| Congrats on the launch! You said it's an important tool for you
| (Claude Code) - how does it fit in with Copilot, Cursor, etc.?
| Do you/your teammates rely only on Claude Code? What do you
| reach for for different tasks?
| bcherny wrote:
| Claude Code is super popular internally at Anthropic. Most
| engineers like to use it together with an IDE like Cursor,
| Windsurf, VS Code, Zed, Xcode, etc. Personally I usually
| start most coding tasks in Code, then move to an IDE for
| finishing touches.
| 420gunna wrote:
| Are you guys paying Claude for its assistance with your
| products
| pookieinc wrote:
| The biggest complaint I (and several others) have is that we
| continuously hit the limit via the UI after even just a few
| intensive queries. Of course, we can use the console API, but
| then we lose ability to have things like Projects, etc.
|
| Do you foresee these limitations increasing anytime soon?
|
| Quick Edit: Just wanted to also say thank you for all your hard
| work, Claude has been phenomenal.
| eschluntz wrote:
| We are definitely aware of this (and working on it for the
| web UI), and that's why Claude Code goes directly through the
| API!
| smallerfish wrote:
| I'm sure many of us would gladly pay more to get 3-5x the
| limit.
|
| And I'm also sure that you're working on it, but some kind
| of auto-summarization of facts to reduce the context in
| order to avoid penalizing long threads would be sweet.
|
| I don't know if your internal users are dogfooding the
| product that has user limits, so you may not have had this
| feedback - it makes me irritable/stressed to know that I'm
| running up close to the limit without having gotten to the
| bottom of a bug. I don't think stress response in your
| users is a desirable thing :).
| justinbaker84 wrote:
| This is the main point I always want to communicate to
| the teams building foundation models.
|
| A lot of people just want the ability to pay more in
| order to get more.
|
| I would gladly pay 10x more to get relatively modest
| increases in performance. That is how important the
| intelligence is.
| sealthedeal wrote:
| I haven't been able to find the Claude CLI for public access
| yet. Would love to use it.
| eschluntz wrote:
| >>> npm install -g @anthropic-ai/claude-code
|
| >>> claude
| kkarpkkarp wrote:
| see https://docs.anthropic.com/en/docs/agents-and-
| tools/claude-c...
| clangfan wrote:
| This is also my problem. I've only used the UI with the $20
| subscription - can I use the same subscription to use the CLI?
| I'm afraid it's like AWS API billing, where there is no limit
| to how much I can use and then I get a surprise bill.
| eschluntz wrote:
| It is API billing like AWS - you pay for what you use.
| Every time you exit a session we print the cost, and in the
| middle of a session you can do /cost to see your cost so
| far that session!
|
| You can track costs in a few ways and set spend limits to
| avoid surprises: https://docs.anthropic.com/en/docs/agents-
| and-tools/claude-c...
| mindok wrote:
| Which is theoretically great, but if anyone can get an
| Aussie credit card to work, please let me know.
| robbiep wrote:
| I haven't had an issue with Aussie cards?
|
| But I still hit limits. I use ClaudeMind with JetBrains
| stuff and there is a max on input tokens (I believe); I
| am 'tier 2' but it doesn't look like I can go past this
| without an enterprise agreement.
| danw1979 wrote:
| What I really want (as a current Pro subscriber) is a
| subscription tier ("Ultimate" at ~$120/month ?) that
| gives me priority access to the usual chat interface, but
| _also_ a bunch of API credits that would ensure Claude
| and I can code together for most of the average working
| month (reasonable estimate would be 4 hours a day, 15
| days a month).
|
| i.e I'd like my chat and API usage to be all included
| under a flat-rate subscription.
|
| Currently Pro doesn't give me any API credits to use with
| coding assistants (Claude Code included?), which is
| completely disjointed. And do I still need to be a business
| to use the API?
|
| Honestly, Claude is so good, just please take my money
| and make it easy to do the above !
| punkpeye wrote:
| If you are open to alternatives, try https://glama.ai/gateway
|
| We currently serve ~10bn tokens per day (across all models).
| OpenAI compatible API. No rate limits. Built in logging and
| tracing.
|
| I work with LLMs every day, so I am always on top of adding
| models. 3.7 is also already available.
|
| https://glama.ai/models/claude-3-7-sonnet-20250219
|
| The gateway is integrated directly into our chat
| (https://glama.ai/chat). So you can use most of the things
| that you are used to having with Claude. And if anything is
| missing, just let me know and I will prioritize it. If you
| check our Discord, I have a decent track record of being
| receptive to feedback and quickly turning around features.
|
| Long term, Glama's focus is predominantly on MCPs, but chat,
| gateway and LLM routing is integral to the greater vision.
|
| I would love feedback if you are going to give it a try:
| frank@glama.ai
| airstrike wrote:
| The issue isn't API limits, but web UI limits. We can
| always get around the web interface's limits by using the
| claude API directly but then you need to have some other
| interface...
| punkpeye wrote:
| The API still has limits. Even if you are on the highest
| tier, you will quickly run into those limits when using
| coding assistants.
|
| The value proposition of Glama is that it combines UI and
| API.
|
| While everyone focuses on either one or the other, I've
| been splitting my time equally working on both.
|
| Glama UI would not win against Anthropic if we were to
| compare them by the number of features. However, the
| components that I developed were created with craft and
| love.
|
| You have access to:
|
| * Switch models between OpenAI/Anthropic, etc.
|
| * Side-by-side conversations
|
| * Full-text search of all your conversations
|
| * Integration of LaTeX, Mermaid, rich-text editing
|
| * Vision (uploading images)
|
| * Response personalizations
|
| * MCP
|
| * Every action has a shortcut via cmd+k (ctrl+k)
| airstrike wrote:
| Ok, but that's not the issue the parent was mentioning.
| I've never hit API limits but, like the original comment
| mentioned, I too constantly hit the web interface limits
| particularly when discussing relatively large modules.
| glenstein wrote:
| Right, that's how I read it also. It's not that there's
| no limits with the API, but that they're appreciably
| different.
| cmdtab wrote:
| Do you have deepseek r1 support? I need it for a current
| product I'm working on.
| pclmulqdq wrote:
| They are just selling a frontend wrapper on other
| people's services, so if someone else offers deepseek,
| I'm sure they will integrate it.
| punkpeye wrote:
| Indeed we do https://glama.ai/models/deepseek-r1
|
| It is provided by DeepSeek and Avian.
|
| I am also midway through enabling a third provider (Nebius).
|
| You can see all models/providers over at
| https://glama.ai/models
|
| As another commenter in this thread said, we are just a
| 'frontend wrapper' around other people's services.
| Therefore, it is not particularly difficult to add models
| that are already supported by other providers.
|
| The benefit of using our wrapper is that you can use a
| single API key and get one bill for all your AI usage, and
| you don't need to hack together your own logic for routing
| requests between providers, handling failovers, keeping
| track of costs, or worrying about what happens if a
| provider goes down.
|
| The market at the moment is hugely fragmented, with many
| providers unstable, constantly shifting prices, etc. The
| benefit of a router is that you don't need to worry about
| those things.
| cmdtab wrote:
| Yeah I am aware. I use open router at the moment but I
| find it lacks a good UX.
| punkpeye wrote:
| Open router is great.
|
| They have a very solid infrastructure.
|
| Scaling infrastructure to handle billions of tokens is no
| joke.
|
| I believe they are approaching 1 trillion tokens per
| week.
|
| Glama is way smaller. We only recently crossed 10bn
| tokens per day.
|
| However, I have invested a lot more into UX/UI of that
| chat itself, i.e. while OpenRouter is entirely focused on
| API gateway (which is working for them), I am going for a
| hybrid approach.
|
| The market is big enough for both projects to co-exist.
| light_triad wrote:
| Thanks for this - exciting launch. Do you have examples of cool
| applications or demos that the HN crowd should check out?
| eschluntz wrote:
| hi! I've been working on demos where I let Claude Code run
| for hours at a time on a sandboxed project:
| https://x.com/ErikSchluntz/status/1894104265817284770
|
| TLDR: asking claude to speed up my code once 1.8x'd perf, but
| putting it in a loop telling it to make it faster for 2 hours
| led to a 500x speedup!
| LouisSayers wrote:
| I assume you had a comprehensive test suite?
| light_triad wrote:
| YES!! I need infinite credits for infinite Claude Code.
| Will try it to get Claude to do all my work.
| catherinewu wrote:
| We built Claude Code with Claude Code!
| Karrot_Kream wrote:
| This is super cool and I hope y'all highlight it
| prominently!
| light_triad wrote:
| Best demo - it's Claude Code all the way down. Claude Code
| === Claude Code
| logicallee wrote:
| >Do you have examples of cool applications or demos that the
| HN crowd should check out?
|
| Not OP obviously, but I've built so many applications with
| Claude, here are just a few:
|
| [1]
|
| Mockup of Utopian infrastructure support button (this is just
| a mockup, the buttons don't do anything): https://claude.site
| /artifacts/435290a1-20c4-4b9b-8731-67f5d8...
|
| [2]
|
| Robot body simulation: https://claude.site/artifacts/6ffd3a73
| -43d6-4bdb-9e08-02901d...
|
| [3]
|
| 15-piece slider puzzle: https://claude.site/artifacts/4504269
| b-69e3-4b76-823f-d55b3e...
|
| [4]
|
| Canada joining the U.S., checklist:
| https://claude.site/artifacts/6e249e38-f891-4aad-
| bb47-2d0c81...
|
| [5]
|
| Secure encryption and decryption with AES-256-GCM with
| password-based key derivation:
|
| https://claude.site/artifacts/cb0ac898-e5ad-42cf-a961-3c4bf8.
| ..
|
| (Try to decrypt this message
|
| kFIxcBVRi2bZVGcIiQ7nnS0qZ+Y+1tlZkEtAD88MuNsfCUZcr6ujaz/mtbEDs
| LOquP4MZiKcGeTpBbXnwvSLLbA/a2uq4QgM7oJfnNakMmGAAtJ1UX8qzA5qMh
| 7b5gze32S5c8OpsJ8=
|
| With the password "Hello Hacker News!!" (without quotation
| marks))
|
| [6]
|
| Supply-demand visualizer under tariffs and subsidies: https:/
| /claude.site/artifacts/455fe568-27e5-4239-afa4-051652...
|
| [7]
|
| fortune cookie program: https://claude.site/artifacts/d7cfa4a
| e-6946-47af-b538-e6f992...
|
| [8]
|
| Household security training for classified household members
| (includes self-assessment and certificate): https://claude.si
| te/artifacts/7754dae3-a095-4f02-b4d3-26f1a5...
|
| [9]
|
| public service accountability training program: https://claud
| e.site/artifacts/b89a69fb-1e46-4b5c-9e96-2c29dd...
|
| [10]
|
| Nuclear non-proliferation "big brother" agent technical
| demonstration: https://claude.site/artifacts/555d57ba-6b0e-41
| a1-ad26-7c90ca...
|
| Dating stuff:
|
| [11]
|
| Dating help: Interest Level Assessment Game (is she
| interested?) https://claude.site/artifacts/523c935c-274e-4efa
| -8480-1e09e9...
|
| [12]
|
| Dating checklist: https://claude.site/artifacts/10bf8bea-36d5
| -407d-908a-c1e156...
| mike_hearn wrote:
| Great, thanks! Could you compare this new tool to Aider?
| thegeomaster wrote:
| Thank you to the team. Looks like a great release. Already
| switching existing prompts to Claude 3.7 to see the eval
| results :)
| oofbaroomf wrote:
| Do you think Claude Code is "better", in terms of capabilities
| and token efficiency, than other tools such as Cline, Cursor,
| or Aider?
| bcherny wrote:
| Claude Code is a research preview -- it's more rough, lets
| you see model errors directly, etc. so it's not as polished
| as something like Cline. Personally I use all of the above.
| Engineers here at Anthropic also tend to use Claude Code
| alongside IDEs like Cursor.
| curl-up wrote:
| In the console, TPM limit for 3.7 is not shown (I'm tier 4).
| Does it mean there is no limit, or is it just pending and is
| "variable" until you set it to some value?
| catherinewu wrote:
| We set the Claude Code rate limits to be usable as a daily
| driver. We expect hitting rate limits for synchronous usage
| to be uncommon. Since this is a research preview, we
| recommend you start small as you try the product though.
| curl-up wrote:
| Sorry, I completely missed you're from the Code team. I was
| actually asking about the vanilla API. Any insights into
| those limits? It's still missing the TPM number in the
| console.
| neoromantique wrote:
| Thanks for the product! Glad to hear the (so-called) "safety"
| is being walked back; previously Claude has felt a little like
| it was treating me as a child. Excited to try it out now.
| jumploops wrote:
| From the release you say: "[..] in developing our reasoning
| models, we've optimized somewhat less for math and computer
| science competition problems, and instead shifted focus towards
| real-world tasks that better reflect how businesses actually
| use LLMs."
|
| Can you tell us more about the trade-offs here?
|
| Also, are you using synthetic data for improving the responses
| here, or are you purely leveraging data from usage/partner's
| usage?
| davely wrote:
| I'm in the middle of a particularly nasty refactor of some
| legacy React component code (hasn't been touched in 6 years,
| old class based pattern, tons of methods, why, oh, why did we
| do XYZ) at work and have been using Aider for the last few days
| and have been hitting a wall. I've been digging through Aider's
| source code on Github to pull out prompts and try to write my
| own little helper script.
|
| So, perfect timing on this release for me! I decided to install
| Claude Code and it is making short work of this. I love the
| interface. I love the personality ("Ruminating", "Schlepping",
| etc).
|
| Just an all around fantastic job!
|
| (This makes me especially bummed that I really messed up my OA
| a while back for you guys. I'll try again in a few months!)
|
| Keep on doing great work. Thank you!
| bcherny wrote:
| Hey thanks so much! <3
| fsndz wrote:
| Anthropic is back and cementing its place as the creator of the
| best coding models--bravo!
|
| With Claude Code, the goal is clearly to take a slice of Cursor
| and its competitors' market share. I expected this to happen
| eventually.
|
| The app layer has barely any moat, so any successful app with
| the potential to generate significant revenue will eventually
| be absorbed by foundation model companies in their quest for
| growth and profits.
| keithwhor wrote:
| I think an argument could be reasonably made that the app
| layer is the only moat. It's more likely Anthropic eventually
| has to acquire Cursor to cement a position here than they
| out-compete it. Where, why, what brand and what product
| customers swipe their credit cards for matters -- a lot.
| fsndz wrote:
| if Claude Code offers a better experience, users will
| rapidly move from cursor to Claude Code.
|
| Claude is for Code: https://medium.com/thoughts-on-machine-
| learning/claude-is-fo...
| keithwhor wrote:
| (1) That's a big if. It requires building a team
| specialized in delivering what Cursor has already
| delivered which is no small task. There are probably only
| a handful of engineers on the planet that have or can be
| incentivized to develop the product intuition the Cursor
| founders have developed in the market already. And even
| then: say I'm an aspiring engineer/PM at Anthropic. Why
| would I choose to spend all of my creative energy
| _copying what somebody else is doing_ for the same pay
| I'd get working on something greenfield, or more
| interesting to me, or more likely to get me a promotion?
|
| (2) It's not clear to me that users (or developers)
| actually behave this way in practice. Engineering is a
| bit of a cargo cult. Cursor got popular because it was
| good but it also got popular because it _got popular_.
| CharlesW wrote:
| > _It requires building a team specialized in delivering
| what Cursor has already delivered which is no small
| task._
|
| There are several AIDEs out there, and based on working
| with Cursor, VS Code, and Windsurf there doesn't seem to
| be much of a difference (although I like Windsurf best).
| What moat does Cursor have?
| aquariusDue wrote:
| Just chiming in to say that AIDEs (Artificial
| Intelligence Development Environments, I suppose) is such
| a good term for these new tools imo.
|
| It's one thing to retrofit LLMs into existing tools but
| I'm more curious how this new space will develop as time
| goes on. Already stuff like the Warp terminal is pretty
| useful in day to day use.
|
| Who knows, maybe this time next year we'll see more
| people programming by voice input instead of typing.
| Something akin to Talon Voice supercharged by a local LLM
| hopefully.
| Etheryte wrote:
| In my opinion you're vastly overestimating how much of a
| moat Cursor has. In broad strokes, it builds an index of
| your repo for easier referencing and then adds some handy
| UI hooks so you can talk to the model; there really isn't
| that much more going on. Yes, the autocomplete is nice at
| times, but it's at best like pair programming with a new
| hire. Every big player in the AI space could replicate
| what they've done, it's only a matter of whether they
| consider it worth the investment or not given how fast
| the whole field is moving.
| keithwhor wrote:
| Conversely, I think you're overestimating the impact of
| the value (or lack thereof) of technology over
| distribution and market timing.
| eschluntz wrote:
| hi! I've been using Claude Code in a very complementary way
| to my IDE, and one of the reasons we chose the terminal is
| because you can open it up inside whichever IDE you want!
| biker142541 wrote:
| I wonder if they will offer competitive request counts
| against Cursor. Right now, at least for me, the biggest
| downside to Claude is how fast I blow through the limits
| (Pro) and hit a wall.
|
| At least with Cursor, I can use all "premium" 500 completions
| and either buy more, or be patient for throttled responses.
| Attummm wrote:
| Hi Boris,
|
| Would it be possible to bring back sonnet 2024 June?
|
| That model was the most attentive.
|
| Because we lost that model, this release is a net loss in
| value for me personally.
| ac29 wrote:
| Seems to still be available via API as
| claude-3-5-sonnet-20240620
| joshuabaker2 wrote:
| Hi Boris, love working with Claude! I do have a question--is
| there a plan to have Claude 3.5 Sonnet (or even 3.7!) made
| available on ca-central-1 for Amazon Bedrock anytime soon? My
| company is based in Canada and we deal with customer
| information that is required to stay within Canada, and the
| most recent model from Anthropic we have available to us is
| Claude 3.
| pbronez wrote:
| Concur. Models aren't real until I can run them inside my
| perimeter.
| matznerd wrote:
| Hi Boris et al, can you comment on increased conversation
| lengths or limits through the UI? I didn't see that mentioned
| in the blog post, but it is a continued major concern of
| $20/month Claude.ai users. Is this an issue that should be
| fixed now or still waiting on a larger deployment via Amazon or
| something? If not now, when can users expect the conversation
| length limitations to be increased?
| LouisSayers wrote:
| Awesome work, Claude is amazingly good at writing code that is
| pretty much plug and play.
|
| Could you speak at all about potential IDE integrations? An
| integration into Jetbrains IDEs would be super useful - I
| imagine being able to highlight a bit of code and having a
| plugin check the code graph to see dependencies, tests etc that
| might be affected by a change.
|
| Copying and pasting code constantly is starting to seem a bit
| primitive.
| eschluntz wrote:
| Part of our vision is that because Claude Code is just in the
| terminal, you can bring it into any IDE (or server) you want!
| Obviously that has tradeoffs of not having a full GUI of the
| IDE though
| elliot07 wrote:
| I much prefer the standalone design to being editor
| integrated.
| unshavedyak wrote:
| Anyone know how to get access to it? Notably I'm debating
| purchasing it for Claude Code, but being on NixOS I want to
| make sure I can install it first.
|
| If this Code preview is only open to subscribers, it means I
| have to subscribe before I can even see if the binary works
| for me. Hmm
|
| _edit_ : Oh, there's a link to "joining the preview" which
| points to: https://docs.anthropic.com/en/docs/agents-and-
| tools/claude-c...
| ben30 wrote:
| JetBrains has an official MCP plugin
| LouisSayers wrote:
| Thanks, I wasn't aware of the Model Context Protocol!
|
| For anyone interested - you can extend Claude's
| functionality by allowing it to run commands via a local
| "MCP server" (e.g. make code commits, create files,
| retrieve third party library code etc).
|
| Then when you're running Claude it asks for permission to
| run a specific tool inside your usual Claude UI.
|
| https://www.anthropic.com/news/model-context-protocol
|
| https://github.com/modelcontextprotocol/servers
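|
| For example, the Claude desktop app reads MCP servers from a
| config file; a minimal entry (the filesystem server comes from
| the servers repo linked above, and the path is just a
| placeholder) looks roughly like:
|
|     {
|       "mcpServers": {
|         "filesystem": {
|           "command": "npx",
|           "args": ["-y", "@modelcontextprotocol/server-filesystem",
|                    "/path/to/your/project"]
|         }
|       }
|     }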
| Falimonda wrote:
| CLAUDE NUMBA ONE!!!
|
| Congrats on the new release!
| Flux159 wrote:
| Is there a way to always accept certain commands across
| sessions? Specifically for things like reading or updating
| files I don't want to have to approve that each time I open a
| new repl.
|
| Also, is there a way to switch models between 3.5-sonnet and
| 3.5-sonnet-thinking? Got the initial impression that the
| thinking model is using an excessive amount of tokens on first
| use.
| eschluntz wrote:
| Right now no, but if you run in Docker, you can use
| `--dangerously-skip-permissions`.
|
| Some commands could be totally fine in one context, but bad
| in a different one, e.g. pushing to master.
| bcherny wrote:
| When you are prompted to accept a bash command, we should be
| giving you the option to not ask again. If you're not seeing
| that for a specific bash command, would you mind running /bug
| or filing an issue on Github?
| https://github.com/anthropics/claude-code/issues
|
| Thinking and not thinking is actually the same model! The
| model thinks automatically when you ask it to. If you don't
| explicitly ask it to think, it won't use thinking.
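|
| (For the raw API, the same single-model behaviour is exposed as
| an optional thinking budget per request. A small sketch - the
| parameter shape is my reading of the 3.7 docs, so treat it as
| illustrative:
|
|     import anthropic
|
|     client = anthropic.Anthropic()
|     resp = client.messages.create(
|         model="claude-3-7-sonnet-20250219",
|         max_tokens=20000,  # must be larger than the thinking budget
|         # same model either way; thinking is switched on per request
|         thinking={"type": "enabled", "budget_tokens": 16000},
|         messages=[{"role": "user", "content": "Is 2^61 - 1 prime?"}],
|     )
|     for block in resp.content:
|         if block.type == "thinking":
|             print("[thinking]", block.thinking[:200], "...")
|         elif block.type == "text":
|             print(block.text)
| )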
| logicallee wrote:
| Can you give some insight into how you chose the reply limit
| length? It seems to cut off many useful programs that are
| 80%-90% done and if the limit were just a little higher it
| would be a source of extraordinary benefit.
| bcherny wrote:
| If you can reproduce that, would you mind reporting it with
| /bug?
| logicallee wrote:
| Just tried it with Claude 3.7 Sonnet, here is the share:
| https://claude.ai/share/68db540d-a7ba-4e1f-882e-f10adf64be91
| and it doesn't finish outputting the program. (It's missing
| the rest of the application function and the main
| function.)
|
| Here are steps to reproduce.
|
| Background/environment:
|
| ChatGPT helped me build this complete web browser in
| Python:
|
| https://taonexus.com/publicfiles/feb2025/71toy-browser-
| with-...
|
| It looks like this, versus the eventual goal:
| https://imgur.com/a/j8ZHrt1
|
| in 1055 lines. But eventually it couldn't improve on it
| anymore, ChatGPT couldn't modify it at my request so that
| inline elements would be on the same line.
|
| If you want to run it just download it and rename it to
| .py, I like Anaconda as an environment, after reading the
| code you can install the required libraries with:
|
| conda install -c conda-forge requests pillow urllib3
|
| then run the browser from the Anaconda prompt by just
| writing "python " followed by the name of the file.
|
| 2.
|
| I tried to continue to improve the program with Claude, so
| that in-line elements would be on the same line.
|
| I performed these reproduceable steps:
|
| 1. copied the code and pasted it into a Claude chat window
| with ctrl-v. This keeps it in the chat as paste.
|
| 2. Gave it the prompt "This complete web browser works but
| doesn't lay out inline elements inline, it puts them all on
| a new line, can you fix it so inline elements are inline?"
|
| It spit out code until it hit section 8 out of 9 which is
| 70% of the way through and gave the error message "Claude
| hit the max length for a message and has paused its
| response. You can write Continue to keep the chat going".
| Screenshot:
|
| https://imgur.com/a/oSeiA4M
|
| So I wrote "Continue" and it stops when it is 90% of the
| way done.
|
| Again it got stuck at 90% of the way done, second
| screenshot in the above album.
|
| So I wrote "Continue" again.
|
| It just gave an answer but it never finished the program.
| There's no app entry in the program; it completely omitted
| the rest of the main class itself and the callback to call
| it, which would be like:
|
|     def run(self):
|         self.root.mainloop()
|
|     #########################################################
|     # main
|     #########################################################
|     if __name__ == "__main__":
|         sys.setrecursionlimit(10**6)
|         app = ToyBrowser()
|         app.run()
|
| so it only output a half-finished program. It explained
| that it was finished.
|
| I tried telling it "you didn't finish the program, output
| the rest of it" but doing so just got it stuck rewriting it
| without finishing it. Again it said it ran into the limit,
| again I said Continue, and again it didn't finish it.
|
| The program itself is only 1055 lines, it should be able to
| output that much.
| bakugo wrote:
| Can you let the API team know that the /v1/models endpoint has
| been broken for hours? Thanks.
| latetomato wrote:
| Hello! Member of the API team here. We're unable to find
| issues with the /v1/models endpoint--can you share more
| details about your request? Feel free to email me at
| suzanne@anthropic.com. Thank you!
| bakugo wrote:
| It always returns a Not Found error for me. Using the curl
| command copied directly from the docs:
|
| $ curl https://api.anthropic.com/v1/models \
|     --header "x-api-key: $ANTHROPIC_API_KEY" \
|     --header "anthropic-version: 2023-06-01"
|
| {"type":"error","error":{"type":"not_found_error",
| "message":"Not found"}}
|
| Edit: Tried creating a different API key and it works with
| that one. Weird.
| lebovic wrote:
| If you can reproduce the issue with the other API key,
| I'd also love to debug this! Feel free to share the curl
| -vv output (excluding the key) with the Anthropic email
| address in my profile
| kevinz3 wrote:
| Hey guys! I was wondering why you chose to build Claude Code
| as a CLI when many popular choices like Cursor and Windsurf
| fork VS Code. Do you envision the future of Claude Code
| abstracting away the codebase entirely?
| bcherny wrote:
| We wanted to bring the model to people where they are without
| having to commit to a specific tool or radically change their
| workflows. We also wanted to make a way that lets people
| experience the model's coding abilities as directly as
| possible. This has tradeoffs: it uses a lot of tokens, and is
| rough (eg. it shows you tool errors and model weirdness), but
| it also gives you a lot of power and feels pretty awesome to
| use.
| unshavedyak wrote:
| I like this quite a bit, thank you! I prefer Helix editor
| and i hate the idea of running VSCode just to access some
| random Code assistant
| babyshake wrote:
| One thing I would love to have fixed - I type in a prompt, the
| model produces 90% or even 100% of the answer, and then shows
| an error that the system is at capacity and can't produce an
| answer. And then the response that has already been provided is
| removed! Please just make it where I can still have access to
| the response that has been provided, even if it is incomplete.
| rishikeshs wrote:
| This. Claude team, please fix this!
| pbor wrote:
| Hi and congrats on the launch!
|
| Will check out Claude Code soon, but in the meantime one
| unrelated other feature request: Moving existing chats into a
| project. I have a number of old-ish but super-useful and
| valuable chats (that are superficially unrelated) that I would
| like to bring together in a project.
| ipsum2 wrote:
| Why gatekeep Claude Code, instead of releasing the code for it?
| It seems like a direct increase in revenue/API sales for your
| company.
| sangnoir wrote:
| I'm not affiliated with Anthropic, but it seems like doing
| this will commoditize Claude (the AIaaS). Hosted AI providers
| are doing all they can to move away from being
| interchangeable commodities; it's not good for Anthropic's
| revenue for users to be able to easily swap out the backend
| of Claude Code for a local Ollama backend, or a cheaper hosted
| DeepSeek. Open sourcing Claude Code would make this option 1
| or 2 forks/PRs away.
| Ninjinka wrote:
| How is your largest customer, Cursor, taking the news that
| you'll be competing directly with them?
| behnamoh wrote:
| Honestly, is this something that Anthropic should be worried
| about? You could ask the same question about all the startups
| that were destroyed by OpenAI.
| sebzim4500 wrote:
| They probably aren't thrilled, but a lot of users will prefer
| a UI and I doubt Anthropic has the spare cycles to make a
| full Cursor competitor.
| alienthrowaway wrote:
| Unless Cursor had agreed to an exclusivity agreement with
| Anthropic, Anthropic was (and still is) at risk of Cursor
| moving to a different provider or using their middleman
| position to train/distill their own model that competes with
| Anthropic.
| themgt wrote:
| Is there / are you planning a way to set $ limits per API key?
| Far as I can tell the "Spend limits" are currently per-org only
| which seems problematic.
| bcherny wrote:
| Good idea! Tracking here:
| https://github.com/anthropics/claude-code/issues/16
| l1n wrote:
| You can with Workspaces - https://support.anthropic.com/en/ar
| ticles/9796807-creating-a...
| nprateem wrote:
| Does this actually have an 8k (or more) output context via the
| API?
|
| 3.5 did with a beta header but while 3.6 claimed to, it always
| cut its responses after 4k.
|
| IIRC someone reported it on GH but had no reply.
| antirez wrote:
| One of the silver bullets of Claude, in the context of coding,
| is that it does NOT use RAG when you use it via the web
| interface. Sure, you burn your tokens, but the model sees
| everything and this lets it reply in a much better way. Is
| Claude Code doing the same, just doing document-level RAG,
| so that if a document is relevant and _if it fits_, the whole
| document is put inside the context window? I really hope
| so! Also, this means that splitting large code bases into
| manageable file sizes will make more and more sense. Another Q:
| is the context size of Sonnet 3.7 the same as 3.5's? Btw,
| thank you _so much_ for Claude Sonnet; in recent months it has
| changed the way I work and I'm able to do a lot more now.
| bcherny wrote:
| Right -- Claude Code doesn't use RAG currently. In our
| testing we found that agentic search out-performed RAG for
| the kinds of things people use Code for.
| marlott wrote:
| Interesting - can you elaborate a little on what you mean
| by agentic search here?
| antirez wrote:
| I guess it's what is sometimes called "self-RAG": the agent
| looks inside the files the way a human would, to find what's
| relevant.
| kadushka wrote:
| As opposed to vector search, or...?
| FeepingCreature wrote:
| To my knowledge these are the options:
|
| 1. RAG: A model looks at the question, pulls up some
| associated data and hopes that it helps.
|
| 2. Self-RAG: The model intentionally triggers a lookup
| for some keywords. This can be via a traditional RAG or
| just keyword search, like grep.
|
| 3. Full Context: Just jam everything in the context
| window. The model uses its attention mechanism to pick
| out the parts it needs. Best but most expensive of the
| three, especially with repeated queries.
|
| Aider uses kind of a hybrid of 2 and 3: you specify files
| that go in the context, but Aider also uses Tree-Sitter
| to get a map of the entire code, ie. function headers,
| class definitions etc., that is provided in full. On that
| basis, the model can then request additional files to be
| added to the context.
| siva7 wrote:
| Will Claude be available on Azure?
| rgomez wrote:
| What kind of sorcery did you use to create Claude? Honest
| question :)
| bcherny wrote:
| Reticulating...
| TIPSIO wrote:
| What are your thoughts on having a UI/design benchmark?
| riku_iki wrote:
| Are there plans to add a web search function over some core
| websites (SO, API docs)? Competitors have it, and in my
| experience this provides very good grounding for coding tasks
| (way fewer API functions hallucinated).
| artvandalai wrote:
| Any updates on web search?
| adastra22 wrote:
| When are you providing an alternative to email magic login
| links?
| sebzim4500 wrote:
| Did you guys ever fix the issue where if UK users wanted to use
| the API they have to provide a VAT number?
| posix86 wrote:
| Claude is my go to llm for everything, sounds corny but it's
| literally expanding the circle of what I can reasonably learn,
| manyfold. Right now I'm attempting to read old philosophical
| texts (without any background in similar disciplines), and
| without claude's help to explain the dense language in simpler
| terms & discuss its ideas, give me historical contexts,
| explaining why it was written this or that way, compare it
| against newer ideas - I would've given up many times.
|
| At work I use it many times daily in development. Its concise
| mode is a breath of fresh air compared to any other llm I've
| tried. It has helped me find bugs in foreign code bases,
| explained the tech stack to me, and written bash scripts, saving
| me dozens of hours of work & many nerves. It generally gets me
| to places I wouldn't otherwise reach due to time constraints &
| nerves.
|
| The only nitpick is that the service reliability is a bit worse
| than others, forcing me sometimes to switch to others. This is
| probably a hard to answer question, but are there plans to
| improve that?
| throwaway0123_5 wrote:
| I'm curious why there are no results for the "Claude 3.7
| Extended Thinking" on SWE-Bench and Agentic tool use.
|
| Are you finding that extended thinking helps a lot when the
| whole problem can be posed in the prompt, but that it isn't a
| major benefit for agentic tasks?
|
| It would be a bit surprising, but it would also mirror my
| experiences, and the benchmarks which show Claude 3.5 being
| better at agentic tasks and SWE tasks than all other models,
| despite not being a reasoning model.
| danso wrote:
| Been a long time casual -- i.e. happy to fix my code by asking
| questions and copy/pasting individual snippets via the chat
| interface. Decided to give the `claude` terminal tool a run and
| have to admit it looks like a fantastic tool.
|
| Haven't tried to build a modern JS web app in _years_ -- it
| took the claude tool just a few minutes of prompting to convert
| and refactor an old clunky tool into a proper project
| structure, and using svelte and vite and tailwind (which I
| haven't built with before). Trying to learn how to even
| scaffold a modern app has felt daunting and this eliminates 99%
| of that friction.
|
| One funny quirk: I asked it to build a test suite (I know zilch
| about JS testing frameworks, so it picked vitest for me) for
| the newly refactored app. I noticed that 3 of the 20 tests
| failed and so I asked it to run vitest for itself and fix the
| failing things. 2 minutes later, and now 7 tests were
| failing...
|
| Which is very funny to me, but also not a big deal. Again, it's
| such a chore to research test libs and then set things up to
| their conventions. That the claude tool built a very usable
| scaffold that I can then edit and iterate on is such a huge
| benefit by itself, I don't need (nor desire) the AI to be
| a complete turnkey solution.
| bhouston wrote:
| Have you seen https://mycoder.ai? Seems quite similar. It was
| my own invention and it seems that you guys are thinking along
| similar lines - incredibly similar lines.
| handfuloflight wrote:
| Have _you_ seen https://www.codebuff.com?
| farco12 wrote:
| Thank you for the update!
|
| I recently attempted to use the Google Drive integration but
| didn't follow through with connecting because Claude wanted
| access to my entire Google Drive. I understand this simplifies
| the user experience and reduced time to ship, but is there
| any way the team can add "reduce the access scope of Google
| Drive integration" to your backlog? Thank you!
|
| Also, I just caught the new Github integration. Awesome.
| lintaho wrote:
| For the pokemon benchmark, what happened after the Lt Surge
| gym? Did the model stall or run out of context or something
| similar?
| swairshah wrote:
| Why not just open source Claude Code? People have tried to
| reverse engineer the minified version:
| https://gist.githubusercontent.com/1rgs/e4e13ac9aba301bcec28...
| cowpig wrote:
| It would be great if we could upgrade API rate limits. I've
| tried "contacting sales" a few times and never received a
| response.
|
| edit: note that my team mostly hits rate limits using things
| like aider and goose. 80k input tokens is not enough when in a
| flow, and I would love to experiment with a multi-agent
| workflow using claude
| levocardia wrote:
| Which starter pokemon does Claude typically choose?
| lcnPylGDnU4H9OF wrote:
| I'd also be interested in stats on Helix Fossil vs. Dome
| Fossil.
| gwd wrote:
| Just started playing with the command-line tool. First reaction
| (after using it for 5 minutes): I've been using `aider` as a
| daily driver, with Claude 3.5, for a while now. One of the
| things I appreciate about aider is that it tells you how much
| each query cost, and what your total cost is this session. This
| makes it low-key easy to keep tabs on the cost of what I'm
| doing. Any chance you could add that to claude-code?
|
| I'd also love to have it in a language that can be compiled,
| like golang or rust, but I recognize a rewrite might be more
| effort than it's worth. (Although maybe less with claude code
| to help you?)
|
| EDIT: OK, 10 minutes in, and it seems to have major issues
| doing basic patches to my Golang code; the most recent thing it
| did was add a line with incorrect indentation, then try three
| times to update it with the correct indentation, getting
| "String to replace not found in file" each time. Aider with
| claude 3.5 does this really well -- not sure what the
| confounding issue is here, but might be worth taking a look at
| their prompt & patch format to see how they do it.
| davidbarker wrote:
| If you do `/cost` it will tell you how much you've spent
| during that session so far.
| eschluntz wrote:
| hi! You can do /cost at any time to see what the current
| session has cost
| xianshou wrote:
| Any way to parallelize tool use? When I go into a repo and ask
| "what's in here", I'm aiming for a summary that returns in 20
| seconds.
| andrewchilds wrote:
| Hi Boris! Thank you for your work on Claude! My one pet peeve
| with Claude specifically, if I may: I might be working on a
| Svelte codebase and Claude will happily ignore that context and
| provide React code. I understand why, but I'd love to see much
| less of a deep reliance on React for front-end code generation.
| PKop wrote:
| It would be great to have a C# / .NET SDK available for Claude
| so it can be integrated into Semantic Kernel [0][1]. Are there
| any plans for this?
|
| [0] https://github.com/microsoft/semantic-
| kernel/issues/5690#iss...
|
| [1] https://github.com/microsoft/semantic-kernel/pull/7364
| timojaask wrote:
| Hi! I've been using Claude for macOS and iOS coding for a
| while, and it's mostly great, but it's always using deprecated
| APIs, even if I instruct it not to. It will correct the mistake
| if I ask it to, but then in later iterations, it will sometimes
| switch back to using a deprecated API. It also produces a lot
| of code that just doesn't compile, so a lot of time is spent
| fixing the made up or deprecated APIs.
| kapnap wrote:
| Any chance there will be a way to copy and paste the responses
| into other text boxes (e.g., a new email) and not have to re-
| jig the formatting?
|
| Lists, numbers, tabs, etc. are all a little time consuming...
| minor annoyance but thought I'd share.
| wellthisisgreat wrote:
| Hi, what are the privacy terms for Claude Code? Is it
| memorizing the codebase it's helping with? From an enterprise
| standpoint
| joevandyk wrote:
| It would be amazing to be able to use an API key to submit
| prompts that use our Project Knowledge. That doesn't seem to be
| currently possible, right?
| robbomacrae wrote:
| Awesome to see a new Claude model - since 3.5 it's been my go-to
| for all code related tasks.
|
| I'd really like to use Claude Code in some of my projects vs
| just sharing snippets via the UI, but I'm curious how doing
| this from our source directory might affect our IP, including
| NDA's, trade secret protections, prior disclosure rules on
| (future) patents, open source licensing restrictions re:
| redistribution etc?
|
| Also hi Erik! - Rob
| dailykoder wrote:
| Folks, let me tell you, AI is a big league player, it's a real
| winner, believe me. Nobody knows more about AI than I do, and I
| can tell you, it's going to be huge, just huge. The
| advancements we're seeing in AI are tremendous, the best, the
| greatest, the most fantastic. People are saying it's going to
| change the world, and I'm telling you, they're right, it's
| going to be yuge. AI is a game-changer, a real champion, and
| we're going to make America great again with the help of this
| incredible technology, mark my words.
| fragmede wrote:
| Now that the world's gotten used to the existence of AI, any
| hope on removing the guardrails on Claude? I don't need it to
| answer "How do I make meth", but I would like to not have to
| social engineer my prompts. I'd like it to just write the code
| I asked for and not judge me on how ethical the code might be.
|
| Eg Claude will refuse to write code to wget a website and parse
| the html if you ask it to scrape your ex girlfriend's Instagram
| profile, for ethical and tos reasons, but if you phrase the
| request differently, it'll happily go off and generate code
| that does that exact thing.
|
| Does that really provide value as a business transaction?
| luke-stanley wrote:
| My key got killed months ago when I tested it on a PDF, and
| support never got back to me so I am waiting for OpenRouter
| support!
| throw83288 wrote:
| Serious question: What advice would you give to a Computer
| Science student in light of these tools?
| danw1979 wrote:
| Serious answer: learn to code.
|
| You still need to know what good code looks like to use these
| tools. If you go forward in your career trusting the output
| of LLMs without the skills to evaluate the correctness,
| style, functionality of that code then you will have
| problems.
|
| People still write low level machine code today, despite
| compilers having existed for 70+ (?) years.
|
| We'll always need full-stack humans who understand everything
| down to the electrons even in the age of insane automation
| that we're entering.
| _cs2017_ wrote:
| Your footnote 3 seems to imply that the low number for o1 and
| Grok3 is without parallelism, but I don't think it's publicly
| known whether they use internal parallelism? So perhaps the low
| number already uses parallelism, while the high number uses
| even more parallelism?
|
| Also, curious if you have any intuition as to why the no-
| parallelism number for AIME with Claude (61.3%) is quite low
| (e.g., relative to R1 87.3% -- assuming it is an apples to
| apples comparison)?
| failerk wrote:
| I tried signing up to use Claude about 6 months ago and ran
| into an error on the signup page. For some reason this
| completely locked me out from signing up since a phone number
| was tied to the login. I have submitted requests to get removed
| from this blacklist and heard nothing. The times I have tried
| to reach out on Twitter were never responded to. Has the
| customer support improved in the last 6 months?
| galaxyLogic wrote:
| The thing I would like automated is highlighting a function in
| my code, then asking the AI to move it to a new module file and
| import that new module.
|
| I would like this to happen easily like hitting a menu or
| button without having to write an elaborate "prompt" every
| time.
|
| Is this possible?
| TriangleEdge wrote:
| This AI race is happening so fast. Seems like it to me anyway. As
| a software developer/engineer I am worried about my job
| prospects.. time will tell. I am wondering what will happen to
| the west coast housing bubbles once software engineers lose their
| high price tags. I guess the next wave of knowledge workers will
| move in and take their place?
| fallinditch wrote:
| My guess is that, yes, the software development job market is
| being massively disrupted, but there are things you can do to
| come out on top:
|
| * Learn more of the entire stack, especially the backend, and
| devops.
|
| * Embrace the increased productivity on offer to ship more
| products, solo projects, etc
|
| * Be highly selective as far as possible in how you spend your
| productive time: being uber-effective can mean thinking and
| planning in longer timescales.
|
| * Set up an awesome personal knowledge management system and
| agentic assistants
| bilbo0s wrote:
| This is really good advice.
|
| Underrated comment.
| j_maffe wrote:
| Do you have any specific tips for the last point? I
| completely agree with it and have set up a fairly robust
| Obsidian note taking structure that will benefit greatly from
| an agentic assistant. Do you use specific tools or a framework
| for this?
| whynotminot wrote:
| > Learn more of the entire stack, especially the backend, and
| devops.
|
| I actually wonder about this. Is it better to gain some
| relatively mediocre experience at lots of things? AI seems to
| be pretty good at lots of things.
|
| Or would it be better to develop deep expertise in a few
| things? Areas where even smart AI with reasoning still can
| get tripped up.
|
| Trying to broaden your base of expertise seems like it's
| always a good idea, but when AI can slurp the whole internet
| in a single gulp, maybe it isn't the best allocation of your
| limited human training cycles.
| throw234234234 wrote:
| It has the potential to affect a lot more than just SV/The West
| Coast - in fact SV may be one of the only areas that have some
| silver lining with AI development. I think these models have a
| chance to disrupt employment in the industry globally.
| Ironically it may be only SWE's and a few other industries
| (writing, graphic design, etc) that truly change. You can see
| they and other AI labs are targeting SWEs in particular - just
| look at the announcement "Claude 3.7 and Code" - very little
| mention of any other domains on their announcement posts.
|
| For people who aren't in SV for whatever reason and haven't
| seen the really high pay associated with being there - SWE is
| just a standard job often stressful with lots of learning
| required ongoing. The pain/anxiety of being disrupted is even
| higher there, since having high disposable income to invest/save
| would have been less likely. Software to them would have been a
| job with comparable pay to other jobs in the area, often
| requiring you to be degree qualified as well - anecdotally many
| I know got into it for the love, not the money.
|
| Who would have thought the first job being automated by AI would
| be software itself? Not labor, or self driving cars. Other
| industries either seem to have hit dead ends, or had other
| barriers (regulation, closed knowledge, etc) that make it
| harder to do. SWE's have set an example to other industries -
| don't let AI in or keep it in-house as long as possible. Be
| closed source in other words. Seems ironic in hindsight.
| throw83288 wrote:
| What do you even do then as a student? I've asked this dozens
| of times with zero practical answers at all. Frankly I've
| become entirely numb to it all.
| throw234234234 wrote:
| Be glad that you are empowered to pivot - I'm making the
| assumption you are still young being a student. In a
| disrupted industry you either want to be young (time to
| change out of it) or old (50+) - can retire with enough
| savings. The middle age people (say 15-25 years in the
| industry; your 35-50 yr olds) are most in trouble depending
| on the domain they are in. For all the "friendly" marketing
| IMO they are targeting tech jobs in general - for many
| people if it wasn't for tech/coding/etc they would never
| need to use an LLM at all. Anthropic's recent stats as to
| who uses their products are telling - it's mostly code, code,
| code.
|
| The real answer is either to pivot to a domain where the
| computer use/coding skills are secondary (i.e. you need the
| knowledge but it isn't primary to the role) or move to an
| industry which isn't very exposed to AI either due to
| natural protections (e.g. trades) or artificial ones (e.g.
| regulation/oligopolies colluding to prevent knowledge
| leaking to AI). May not be a popular comment on this
| platform - I would love to be wrong.
| viraptor wrote:
| It seems to be slowing down actually. Last year was wild until
| around llama 3. The latest improvements are relatively small.
| Even the reasoning models are a small improvement over explicit
| planning with agents that we could already do before - it's
| just nicely wrapped and slightly tuned for that purpose.
| Deepseek did some serious efficiency improvements, but not so
| much user-visible things.
|
| So I'd say that the AI race is starting to plateau a bit
| recently.
| j_maffe wrote:
| While I agree, you have to remember how high the dimensionality
| of the labor-skill space is. The way I see it, you can
| imagine the capability of AI as a radius, and the amount of
| tasks it can cover as a sphere. Linear improvements in
| performance cause cubic (or whatever the labor-skill
| dimensionality is) improvements in task coverage.
| LouisSayers wrote:
| I'm not too concerned short to medium term. I feel there are
| just too many edge cases and nuances that are going to be
| missed by AI systems.
|
| For example, systems don't always work in the way they're
| documented to. How is an AI going to differentiate cases where
| there's a bug in a service vs a bug in its own code? How will
| an AI even learn that the bug exists in the first place? How
| will an AI differentiate between someone reporting a bug and a
| hacker attempting to break into a system?
|
| The world is a complex place and without ACTUAL artificial
| _intelligence_ we're going to need people to _at least_ guide
| AI in these tricky situations.
|
| My advice would be to get familiar with using AI and new AI
| tools and how they fit into our usual workflows.
|
| Others may disagree, but I don't think software engineers (at
| least the good ones) are going anywhere.
| shortrounddev2 wrote:
| Does claude have a vscode plugin yet? I dropped github copilot
| because I didn't want so many subscriptions
| dugmartin wrote:
| You can use the Roo Code extension and point it at most any API,
| including Anthropic:
|
| https://marketplace.visualstudio.com/items?itemName=RooVeter...
| visarga wrote:
| Use Windsurf, a VSCode fork; it defaults to Claude as its LLM.
| wolffiex wrote:
| Try running Claude Code in your VS Code terminal! Just don't
| paste too much text :)
| https://stackoverflow.com/questions/41714897/character-line-...
| jumploops wrote:
| > "[..] in developing our reasoning models, we've optimized
| somewhat less for math and computer science competition problems,
| and instead shifted focus towards real-world tasks that better
| reflect how businesses actually use LLMs."
|
| This is good news. OpenAI seems to be aiming towards "the
| smartest model," but in practice, LLMs are used primarily as
| learning aids, data transformers, and code writers.
|
| Balancing "intelligence" with "get shit done" seems to be the
| sweet spot, and afaict one of the reasons the current crop of
| developer tools (Cursor, Windsurf, etc.) prefer Claude 3.5 Sonnet
| over 4o.
| bicx wrote:
| Claude 3.5 has been fantastic in Windsurf. However, it does
| cost credits. DeepSeek V3 is now available in Windsurf at zero
| credit cost, which was a major shift for the company. Great to
| have variable options either way.
|
| I'd highly recommend anyone check out Windsurf's Cascade
| feature for agentic-like code writing and exploration. It
| helped save me many hours in understanding new codebases and
| tracing data flows.
| ai-christianson wrote:
| I'm working on an OSS agent called RA.Aid and 3.7 is
| anecdotally a huge improvement.
|
| About to push a new release that makes it the default.
|
| It costs money but if you're writing code to make money, it's
| totally worth it.
| throwup238 wrote:
| DeepSeek's models are vastly overhyped (FWIW I have access to
| them via Kagi, Windsurf, and Cursor - I regularly run the
| same tests on all three). I don't think it matters that V3 is
| free when even R1 with its extra compute budget is inferior
| to Claude 3.5 by a large margin - at least in my experience
| in both bog standard React/Svelte frontend code and more
| complex C++/Qt components. After only half an hour of using
| Claude 3.7, I find the code output is superior and the
| thinking output is in a completely different universe (YMMV
| and caveat emptor).
|
| For example, DeepSeek's models almost always smash together
| C++ headers and code files even with Qt, which is an
| absolutely egregious error due to the meta-object compiler
| preprocessor step. The MOC has been around for at least 15
| years and is all over the training data so there's no excuse.
| tonyhart7 wrote:
| I've seen people switch from Claude to another model,
| notably DeepSeek, due to cost. Tbh I think it still
| depends on what data the model was trained on.
| bionhoward wrote:
| The big difference is DeepSeek R1 has a permissive license
| whereas Claude has a nightmare "closed output" customer
| noncompete license which makes it unusable for work unless
| you accept not competing with your intelligence supplier,
| which sounds dumb
| SkyPuncher wrote:
| I've found DeepSeek's models are within a stone's throw of
| Claude. Given the massive price difference, I often use
| DeepSeek.
|
| That being said, when cost isn't a factor Claude remains my
| winner for coding.
| rubymamis wrote:
| Hey there! I'm a fellow Qt developer and I really like your
| takes. Would you like to connect? My socials are on my
| profile.
| throwup238 wrote:
| We've already connected! Last year I think, because I was
| interested in your experience building a block editor
| (this was before your blog post on the topic). I've been
| meaning to reconnect for a few weeks now but family life
| keeps getting in the way - just like it keeps getting in
| the way of my implementing that block editor :)
|
| I especially want to publish and send you the code for
| that inspector class and selector GUI that dumps the
| component hierarchy/state, QML source, and screenshot for
| use with Claude. Sadly I (and Claude) took some dumb
| shortcuts while implementing the inspector class that
| both couples it to proprietary code I can't share and
| hardcodes some project specific bits, so it's going to
| take me a bit of time to extricate the core logic.
|
| I haven't tried it with 3.7 but based on my tree-sitter
| QSyntaxHighlighter and Markdown QAbstractListModel tests
| so far, it is _significantly_ better and I suspect the
| work Anthropic has done to train it for computer use will
| reap huge rewards for this use case. I'm still
| experimenting with the nitty gritty details but I think
| it will also be a game changer for testing in general,
| because combining computer use, gammaray-like dumps, and
| the Spix e2e testing API completes the full circle on app
| context.
| newgo wrote:
| How is it possible that deepseek v3 would be free? It costs a
| lot of money to host models
| crowcroft wrote:
| Sometimes I wonder if there is overfitting towards benchmarks
| (DeepSeek is the worst for this to me).
|
| Claude is pretty consistently the chat I go back to where the
| responses subjectively seem better to me, regardless of where
| the model actually lands in benchmarks.
| ben_w wrote:
| > Sometimes I wonder if there is overfitting towards
| benchmarks
|
| There absolutely is, even when it isn't intended.
|
| The difference between what the model is fitting to and
| reality it is used on is essentially every problem in AI,
| from paperclipping to hallucination, from unlawful output to
| simple classification errors.
|
| (Ok, not _every_ problem, there's also sample efficiency,
| and...)
| FergusArgyll wrote:
| Ya, Claude _crushes_ the smell test
| eschluntz wrote:
| Thanks! We all dogfood Claude every day to do our own work
| here, and solving our own pain points is more exciting to us
| than abstract benchmarks.
|
| Getting things done requires a lot of booksmarts, but also a lot
| of "street smarts" - knowing when to answer quickly, when to
| double back, etc
| LouisSayers wrote:
| Could you tell us a bit about the coding tools you use and
| how you go about interacting with Claude?
| catherinewu wrote:
| We find that Claude is really good at test driven
| development, so we often ask Claude to write tests first
| and then ask Claude to iterate against the tests
| Kerrick wrote:
| Write tests (plural) first, as in write more than one
| failing test before making it pass?
| jasonjmcghee wrote:
| Just want to say nice job and keep it up. Thrilled to start
| playing with 3.7.
|
| In general, benchmarks seem to be very misleading in my
| experience, and I still prefer sonnet 3.5 for _nearly_ every
| use case- except massive text tasks, which I use gemini 2.0
| pro with the 2M token context window.
| martinald wrote:
| I find the webdev arena tends to match my experience with
| models much more closely than other benchmarks:
| https://web.lmarena.ai/leaderboard. Excited to see how 3.7
| performs!
| jasonjmcghee wrote:
| Just wanted to already plop an update - "code" is very
| good. Just did a ~4 hour task in about an hour. It cost $3
| which is more than I usually spend in an hour, but very worth
| it.
| d_watt wrote:
| I'm about 50kloc into a project making a react native app /
| golang backend for recipes with grocery lists, collaborative
| editing, household sharing, so a complex data model and runtime.
| Purely from the experiment of "what's it like to build with AI,
| no lines of code directly written, just directing the AI."
|
| As I go through features, I'm comparing a matrix of Cursor,
| Cline, and Roo, with the various models.
|
| While I'm still working on the final product, there's no doubt to
| me that Sonnet is the only model that works with these tools well
| enough to be Agentic (rather than single file work).
|
| I'm really excited to now compare this 3.7 release and how good
| it is at avoiding some of the traps 3.5 can fall into.
| thebigspacefuck wrote:
| This has been my experience as well. Why do the others suck so
| bad?
| d_watt wrote:
| I wonder how much it's self fulfilling, where the developers
| of the agents are tuning their prompts / tool calls to
| sonnet.
| ndm000 wrote:
| Have there been any updates to Claude 3.5 Sonnet pricing? I can't
| find that anywhere even though Claude 3.7 Sonnet is now at the
| same price point. I could use 3.5 for a lot more if it's cheaper.
| minimaxir wrote:
| No changes to Claude 3.5 Sonnet pricing despite the new model.
|
| https://www.anthropic.com/pricing#anthropic-api
| ramesh31 wrote:
| Well there goes my evening
| hubraumhugo wrote:
| You can get your HN profile analyzed by it and it's pretty funny
| :)
|
| https://hn-wrapped.kadoa.com/
|
| I'm using this to test the humor of new models.
| Philpax wrote:
| Seems broken? Getting
|
| > An error occurred in the Server Components render. The
| specific message is omitted in production builds to avoid
| leaking sensitive details. A digest property is included on
| this error instance which may provide additional details about
| the nature of the error.
| ANewFormation wrote:
| I did multiple accounts with no problem, but in trying to do
| yours I got the same error.
|
| You've broken the system.
| Philpax wrote:
| New benchmark for good posting, I'll take it!
| ghxst wrote:
| Worked for me, seems to be case sensitive (?) I'll post these
| in case I just got lucky and it still doesn't work for you.
|
| https://hn-wrapped.kadoa.com/Philpax?share
|
| > You explain WebAssembly memory management with such passion
| that we're worried you might be dating your pointer
| allocations.
|
| > Your comments about multiplayer game architecture are so
| detailed, we suspect you've spent more time debugging network
| code than maintaining actual human connections.
|
| > You track AI model performance metrics more closely than
| your own bank account. DeepSeek R1 knows your preferences
| better than your significant other.
|
| I like your interests :)
| Philpax wrote:
| Aha, there it is - terrific, thank you :>
|
| Yes, I'm quite the eclectic kind!
| ANewFormation wrote:
| Oh god that's genuinely _way_ more amusing than I thought llm
| systems were capable of.
| XenophileJKO wrote:
| The more I use LLMs the more I have actually gravitated to
| looking at the humor of LLMs as an imperfect proxy measure of
| "intelligence".
|
| Obviously this is problematic, but Claude 3.5 (and now 3.7)
| have been genuinely funny and consistently funny.
| rubslopes wrote:
| > - You've reminded so many people to use 'Show HN:' that you
| should probably just apply for a moderator position already.
|
| > - Your relationship with AI coding assistants is more
| complicated than most people's dating history - Cline, Cursor,
| Continue.Dev... pick a lane!
|
| > - You talk about grabbing coffee while your LLM writes code
| so much that we're not sure if you're a developer or a barista
| who occasionally programs.
|
| I laughed hard at this :D
| jedberg wrote:
| > For someone who worked at Reddit, you sure spend a lot of
| time on HN. It's like leaving Facebook to spend all day on
| Twitter complaining about social media.
|
| Wow, so spot on it hurts!
| sitkack wrote:
| > For someone who criticizes corporate structures so much,
| you've spent an impressive amount of time analyzing their
| technical decisions. It's like watching someone critique a
| restaurant's menu while eating there five times a week.
| calvinmorrison wrote:
| >Your ideal tech stack is so old it qualifies for social
| security benefits
|
| >You're the only person who gets excited when someone
| mentions Trinity Desktop Environment in 2025
|
| > You probably have more opinions about PHP's empty()
| function than most people have about their entire career
| choices
| drivers99 wrote:
| > Personal Projects: You'll finally complete that bare-
| metal Forth interpreter for Raspberry Pi
|
| I was just looking into that again as of yesterday (I
| didn't post about it here yesterday, just to be clear; it
| picked up on that from some old comments I must have
| posted).
|
| > Profile summary: [...] You're the person who not only
| remembers what a CGA adapter is but probably still has one
| in working condition in your basement, right next to your
| collection of programming books from 1985.
|
| Exactly the case, in a working IBM PC, except I don't have
| a basement. :)
| seafoamteal wrote:
| Felt genuinely called out by that 'Roasts' section.
| Panoramix wrote:
| That thing knows me better than I know myself
| cyberpunk wrote:
| > You hate Terraform so much you'd rather learn Erlang than
| write another for-loop in HCL.
|
| ..
|
| > After years of complaining about Terraform, you'll fully
| embrace Crossplane and write a scathing Medium article titled
| 'Why I Left Terraform and Never Looked Back'.
|
| Hahahaha.
| BeetleB wrote:
| This is a better plug for the new Claude Sonnet model than the
| official announcement!
| jjice wrote:
| This is absolutely hilarious! Thanks for posting. It feels
| weighted towards some specific things (I assume this is done by
| the LLM caring about later context more?) - making it debatably
| even funnier.
|
| > You're the only person who gets excited about trailing commas
| in SQL. Even the database administrators are like 'dude, it's
| just a comma.'
| throwup238 wrote:
| Your comments about suburban missile defense systems have the
| FBI agent monitoring your internet connection seriously
| questioning their career choices. You've spent so much time
| explaining why manufacturing is complex that you could have
| just built your own CRT factory by now. You claim to be
| skeptical of AI hype, yet you've indexed more documentation
| with Cursor than most people have read in their lifetime.
|
| Surprisingly accurate, but seems to be based on a very small
| snippet of actual comments (presumably to save money). I wonder
| what the prompt would output when given the full 200k tokens of
| context.
| LinXitoW wrote:
| Got absolutely read to filth:
|
| > You've spent more time explaining why Go's error handling is
| bad than Go developers have spent actually handling errors.
|
| > Your relationship with programming languages is like a dating
| show - you keep finding flaws in all of them but can't commit
| to just one.
|
| > If error handling were a religion, you'd be its most zealous
| missionary, converting the unchecked one exception at a time.
| airstrike wrote:
| > You've spent more time explaining why Go's error handling
| is bad than Go developers have spent actually handling
| errors.
|
| That is absolutely hilarious. Really well done by everyone
| who made that line possible.
| sa46 wrote:
| Yea, these are nicely done. To add some balance:
|
| > After years of defending Go, you'll secretly start a side
| project in Rust but tell no one on HN about your betrayal
| toomuchtodo wrote:
| The 2025 predictions were like a spooky tarot card reading.
| airstrike wrote:
| > You've mentioned iced so many times, we're starting to wonder
| if you're secretly developing a Rust-based refrigerator company
| on the side.
|
| LMFAO so good. Humor seems on point
| desperatecuban wrote:
| > Your salary is so low even your legacy code feels sorry for
| you.
|
| > You're the only person on HN who thinks $800/month is a
| salary and not a cloud computing bill.
|
| ouch
| jumploops wrote:
| > You've mentioned 'simple is robust' so many times that we're
| starting to think your dating profile just says 'uncomplicated
| and sturdy'.
|
| > For someone who builds tools to automate everything, you sure
| spend a lot of time manually explaining why automation is the
| future on HN.
|
| > Your obsession with sandboxed code execution suggests you've
| been traumatized by at least one production outage caused by an
| intern's unreviewed PR.
|
| So good it hurts!
| jddj wrote:
| > You've recommended Marginalia search so many times, we're
| starting to think you're either the developer or just really
| enjoy websites that look like they were designed in 1998.
|
| Actually quite funny.
|
| [1] https://hn-wrapped.kadoa.com/jddj?share
| throwup238 wrote:
| Especially hilarious considering that this is the actual
| marginalia developer: https://hn-
| wrapped.kadoa.com/marginalia_nu
|
| _> You defend Java with such passion that Oracle's legal
| team is considering hiring you as their chief evangelist -
| just don't tell them about your secret admiration for more
| elegant programming paradigms._
| StefanBatory wrote:
| ... I had been called out by it hard, lmao. Painfully accurate.
| taytus wrote:
| "You were using 'I don't understand these valuations' before it
| was cool - the original valuation skeptic hipster of Hacker
| News" -
| agys wrote:
| "You've spent more time optimizing DOM manipulation for ASCII
| art than most people spend deciding what to watch on Netflix in
| their entire lives."
|
| Ouch... :)
| hambos22 wrote:
| > You built your own Klaviyo alternative to save EUR500, but
| how many hours of development at market rate did that cost? The
| true Greek economy at work!
|
| ouch (yuyu)
| nbzso wrote:
| This thing is hilarious. :)
|
| Roast:
|
| - Your comments have more doom predictions than a Y2K
| convention in December 1999.
|
| - You've used 'stochastic parrot' so many times, actual parrots
| are filing for trademark infringement.
|
| - If tech dystopia were an Olympic sport, you'd be bringing
| home gold medals while explaining how the podium was designed
| by committee and the medal contains surveillance chips.
| replete wrote:
| I need some ice for the burn I just received.
| gmassman wrote:
| > Spends more time explaining why TypeScript in Svelte is
| problematic than actually fixing TypeScript in Svelte.
|
| Damn, that's brutal. I mean, I never said _I knew_ how to fix
| ComponentProps or generic components, just that they have
| issues...
| processing wrote:
| lol good stuff
|
| "A digital nomad who splits time between critiquing Facebook's
| UI decisions, unearthing obscure electronic music tracks with 3
| plays on YouTube, and occasionally making fires on German
| islands. When not creating Dystopian Disco mixtapes or
| lamenting the lack of MIDI export in AI tools, they're probably
| archiving NYT articles before paywalls hit.
|
| Roast
|
| You've spent more time complaining about Facebook's UI than
| Facebook has spent designing it, yet you still check it enough
| to notice every change.
|
| Your music discovery process is so complex it requires Discogs,
| Bandcamp, YouTube, and three specialized record stores, yet
| you're surprised when tracks only have 3 plays.
|
| You're the only person who joined HN to discuss the Yamaha DX7
| synthesizer from 1983 and somehow managed to submit two front-
| page stories about it in 2019-2020. The 80s called, they want
| their FM synthesis back."
|
| edit: predictions are spot on - wow. Two of them detailed two
| projects I'm actively working on.
| redeux wrote:
| > You complain about digital distractions while writing novels
| in HN comment threads. That's like criticizing fast food while
| waiting in the drive-thru line.
|
| >You'll write a thoughtful essay about 'digital minimalism'
| that reaches the HN front page, ironically causing you to spend
| more time on HN responding to comments than you have all year.
|
| It sees me! Noooooo ...
| maronato wrote:
| https://hn-wrapped.kadoa.com/dang?share
|
| > Most used terms: "Please don't" lol
| nickvec wrote:
| > You correct grammar in HN comments but still haven't figured
| out that nobody cares
|
| My ego will never recover from this
| raminf wrote:
| > Hacker News
|
| > You'll finally stop checking egg prices at Costco and instead
| focus on writing that definitive 'How I Built My Own Super App
| Without Getting Rejected By Apple' post.
|
| On it!
| fullstackchris wrote:
| > You've experienced so many startup failures that your
| LinkedIn profile should just read 'Professional Titanic
| Passenger: Always Picks the Wrong Ship'.
|
| :'(
| boogieknite wrote:
| > You've spent more time justifying your Apple Vision Pro
| purchase than actually using it for anything productive, but
| hey, at least you can watch movies on 'the best screen' while
| pretending it's a 'dev kit'.
|
| blasted
| wildermuthn wrote:
| "Your enthusiasm for Oculus in 2014 was so intense that Mark
| Zuckerberg probably bought it just to make you stop posting
| about it."
|
| Incredible work!
| ilrwbwrkhv wrote:
| Profile Summary
|
| A successful tech entrepreneur who built a multi-million dollar
| business starting with Common Lisp, you're the rare HN user who
| actually practices what they preach.
|
| Your journey from Lisp to Go to Rust mirrors your evolution
| from idealist to pragmatist, though you still can't help but
| reminisce about the magical REPL experience while complaining
| about JavaScript frameworks.
|
| ---
|
| Roast
|
| You complain about AI-generated code being too complex, yet you
| pine for Common Lisp, a language where parentheses reproduction
| is the primary feature.
|
| For someone who built a multi-million dollar business, you
| spend an awful lot of time telling everyone how much JavaScript
| and React suck. Did a React component steal your lunch money?
|
| You've changed programming languages more often than most
| people change their profile pictures. At this rate, you'll be
| coding in COBOL by 2026 while insisting it's
| 'underappreciated'.
| CamperBob2 wrote:
| _Your comments have more bits of precision than the ADCs you
| love discussing, but somehow still manage to compress all
| nuance out of complex topics_
|
| Hit dog hollers
| dgunay wrote:
| > Your ideal laptop would run Linux flawlessly with perfect
| hardware compatibility, have MacBook build quality, and Windows
| game support. Meanwhile, the rest of us live in reality.
|
| Damn, got me there haha
| netshade wrote:
| LOL, this truly made me laugh. I'm also doing humor stuff with
| Claude, I was pretty pleased with 3.5 so excited to see what
| happens with the 3.7 change. It's a radio station with a bunch
| of DJs with different takes on reality, so looking forward to
| see how it handles their different experiences.
| anonzzzies wrote:
| We have used claude almost exclusively since 3.5; we regularly
| run our internal benchmark (coding) against others, but it's
| mostly just a waste of time and money. Will be testing 3.7 the
| coming days to see how it stacks up!
| newbie578 wrote:
| Scary to watch the pace of progress and how the whole industry is
| rapidly shifting.
|
| I honestly didn't believe things would speed up this much.
| DavidPP wrote:
| Haven't had time to try it out, but I've built myself a tool to
| tag my bookmarks and it uses 3.5 Haiku. Here is what it said
| about the official article content:
|
| _I apologize, but the URL and page description you provided
| appear to be fictional. There is no current announcement of a
| Claude 3.7 Sonnet model on Anthropic's website. The most recent
| Claude 3 models are Claude 3 Haiku, Sonnet, and Opus, released in
| March 2024. I cannot generate a description for a non-existent
| product announcement._
|
| I appreciate their stance on safety, but that still made me
| laugh.
| dzhiurgis wrote:
| Anyone else noticed all the reasoning models kinda caught up
| with claude, and claude itself turned to crap last week?
| kmlx wrote:
| Claude 3.5 sonnet has been my go to for coding tasks, it's just
| so much better than the others.
|
| but I've tried using the api in production and had to drop it due
| to daily issues: https://status.anthropic.com/
|
| compare to https://status.openai.com/
|
| any idea when we'll see some improvements in api availability or
| will the focus be more on the web version of claude?
| scrollop wrote:
| Err, if you compare the two status pages you'll see that
| anthropic's uptime is actually slightly better on average than
| openai's.
| kmlx wrote:
| click on individual days. you'll notice that there are daily
| errors.
| msp26 wrote:
| Does it show the raw "reasoning" tokens or is it a summary?
|
| Edit: > we've decided to make its thought process visible in raw
| form.
| koakuma-chan wrote:
| Where did 3.6 go?
| danielbln wrote:
| Allegedly many people called the newest 3.5 revision 3.6, so
| Anthropic just rolled with it and called this 3.7.
| meetpateltech wrote:
| When you ask: 'How many r's are in strawberry?'
|
| Claude 3.7 Sonnet generates a response in a fun and cool way with
| React code and a preview in Artifacts
|
| check out some examples:
|
| [1]https://claude.ai/share/d565f5a8-136b-41a4-b365-bfb4f4400df5
|
| [2]https://claude.ai/share/a817ac87-c98b-4ab0-8160-feefd7f798e8
| jasonjmcghee wrote:
| I'm guessing this is an easter egg, but this was a huge gripe I
| had with artifacts and eventually disabled it (now impossible
| to disable afaict) as I'd ask questions completely unrelated to
| code or clearly not wanting code as an output, and I'd have to
| wait for it to write a program (which you can't stop afaict, it
| stops the current artifact then starts a new one)
|
| (still claude sonnet is my go-to and favorite model)
| falcor84 wrote:
| A shame the underlying issue still persists:
|
| > There is exactly 1 'r' in "blueberry" [0]
|
| [0]
| https://claude.ai/share/9202007a-9d85-49e6-9883-a8d8305cd29f
| OsrsNeedsf2P wrote:
| This test has always been so stupid since models work at the
| token level. Claude 3.5 already 5xs your frontend dev speed but
| people still say "hurr durr it can't count strawberry" as if
| that's a useful problem
| dannyw wrote:
| The problem is also that LLMs are confidently wrong when
| they're wrong.
| bufferoverflow wrote:
| This test isn't stupid. If it can't count the number of
| letters in a text, can you rely on it with more important
| calculations?
| stnmtn wrote:
| You can rely on it for anything that you can validate
| quickly. And it turns out, there are a lot of problems
| which are trivial to validate the solution to, but
| difficult to build the solution for.
| anti-soyboy wrote:
| OpenAI should be worried as their products are weak
| batterylake wrote:
| Hi Claude Code team, excited for the launch!
|
| How well does Claude Code do on tasks which rely heavily on
| visual input such as frontend web dev or creating data
| visualizations?
| wolffiex wrote:
| As a CLI, this tool is most efficient when it can see text
| outputs from the commands that it runs. But you can help it
| with visual tasks by putting a screenshot file in your project
| directory and telling claude to read it, or by copying an image
| to your clipboard and pasting it with CTRL+V
| batterylake wrote:
| Cool, thanks!
| siva7 wrote:
| Will Claude Code also be available with Pro Subscription?
| simion314 wrote:
| Why not accept other payment methods like PayPal/Venmo?
| Steam and Netflix developers have managed to integrate those
| payment methods, so I conclude that Anthropic, Google, MS, and
| OpenAI don't really need the money from users and are just
| hunting for big investors.
| _joel wrote:
| I've been using 3.5 with Roocode for the past couple of weeks and
| I've found it really quite powerful. Making it write tests and
| run them as part of the flow, with vscode windows pinging about,
| is neat too.
| forrestthewoods wrote:
| Claude is the best example of benchmarks not being reflective of
| reality. All the AI labs are so focused on improving benchmark
| scores but when it comes to providing actual utility Claude has
| been the winner for quite some time.
|
| Which isn't to say that benchmarks aren't useful. They surely
| are. But labs are clearly both overtraining and overindexing on
| benchmarks.
|
| Coming from gamedev I've always been significantly more yolo
| trust your gut than my PhD co-workers. Yes data is good. But I
| think the industry would very often be better off trusting guts
| and not needing a big huge expensive UX study or benchmark to
| prove what you can plainly see.
| Alifatisk wrote:
| Why is Claude-3.5-Haiku considered PRO while Claude-3.7-Sonnet is
| for free users?
| alecco wrote:
| Who do I have to kill to get Claude Code access?
| xd1936 wrote:
| $ npm install -g @anthropic-ai/claude-code
|
| $ claude
| ckbishop wrote:
| Well, I used 3.5 via Cursor to do some coding earlier today, and
| the output kind of sucked. Ran it through 3.7 a few minutes ago,
| and it's much more concise and makes sense. Just a little
| anecdotal high five from me.
| freediver wrote:
| Kagi LLM benchmark updated with general purpose and thinking mode
| for Sonnet 3.7.
|
| https://help.kagi.com/kagi/ai/llm-benchmark.html
|
| Appears to be second most capable general purpose LLM we tried
| (second to gemini 2.0 pro, in front of gpt-4o). Less impressive
| in thinking mode, about at the same level as o1-mini and o3-mini
| (with 8192 token thinking budget).
|
| Overall a very nice update, you get higher quality and higher
| speed model at same price.
|
| Hope to enable it in Kagi Assistant within 24h!
| jjice wrote:
| Thank you to the Kagi team for such fast turn around on new
| LLMs being accessible via the Assistant! The value of Kagi
| Assistant has been a no-brainer for me.
| flixing wrote:
| Do you think kagi is the right Eval tool? If so, why?
| thefourthchime wrote:
| Nice, but where is Grok?
| pertymcpert wrote:
| Perhaps they're waiting for the Grok API to be public?
| Squarex wrote:
| I'm surprised that Gemini 2.0 is first now. I remember that
| Google models were under performing on kagi benchmarks.
| manmal wrote:
| Gemini 2 is really good, and insanely fast.
| Squarex wrote:
| It is, but in this benchmark gemini scored very poorly in
| the past.
| Workaccount2 wrote:
| Having your own hardware to run LLMs will pay dividends.
| Despite getting off on the wrong foot, I still believe Google
| is best positioned to run away with the AI lead, solely
| because they are not beholden to Nvidia and not stuck with a
| 3rd party cloud provider. They are the only AI team that is
| top to bottom in-house.
| Squarex wrote:
| I've used gemini for its large context window before. It's
| a great model. But specifically in this benchmark it has
| always scored very low. So I wonder what has changed.
| guelo wrote:
| How did you chose the 8192 token thinking budget? I've often
| seen Deepseek R1 use way more than that.
| KTibow wrote:
| One thing I don't understand is why Claude 3.5 Haiku, a non
| thinking model in the non-thinking section, says it has an 8192
| thinking budget.
| slantedview wrote:
| As a Claude Pro user, one of the biggest problems I have with day
| to day use of Sonnet is running out of tokens, and having to wait
| several hours. Would this new deep thinking capability just hit
| this problem faster?
| k8sToGo wrote:
| Have you tried just using the API and pay as you go?
| mvdtnz wrote:
| That doesn't answer his very specific question.
| grav wrote:
| Claude 3.7 Sonnet seems to have a context window of 64,000 via
| the API: max_tokens: 4242424242 > 64000, which is
| the maximum allowed number of output tokens for
| claude-3-7-sonnet-20250219
|
| I got a max of 8192 with Claude 3.5 sonnet.
| koakuma-chan wrote:
| Context window is how long your prompt can be. Output tokens is
| how long its response can be. What you sent says its response
| can be 64k tokens at maximum.
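|
| For reference, a request that asks for the full 64k output
| budget looks roughly like this with the Python SDK (a sketch
| only; the model string is taken from the error message above,
| and anything larger than 64000 is rejected with that same
| error):
|
|     import anthropic
|
|     client = anthropic.Anthropic()
|     msg = client.messages.create(
|         model="claude-3-7-sonnet-20250219",
|         max_tokens=64000,  # output cap, not the context window
|         messages=[{"role": "user",
|                    "content": "Write a very long story."}])
|     print(msg.usage.output_tokens)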
| epistasis wrote:
| It's pretty fascinating to refresh the usage page on the API site
| while working [0].
|
| After initialization it was up to 500k tokens ($1.50). After a
| few questions and a small edit, I'm up to over a million tokens
| (>$3.00). Not sure if the amount of code navigation and typing
| saved will justify the expense yet. It'll take a bit more
| experimentation.
|
| In any case, the default API buy of $5 seems woefully low to
| explore this tool.
|
| [0] https://console.anthropic.com/settings/usage
| koakuma-chan wrote:
| It also produces terrible code even though it's supposed to be
| good for front-end development.
| trekkie1024 wrote:
| Could you share an example?
| koakuma-chan wrote:
| TLDR: told it to implement a grid view as an alternative to
| the existing list view, and specifically told it to DRY the
| code. What did it do? Copied and pasted the list view
| implementation (definitely not DRY), and tried to make it a
| grid, and even though it is a grid, it looks terrible
| (https://i.imgur.com/fJiSjq4.png).
|
| I don't understand how people use cursor and all that other
| shit when it cannot follow such simple instructions.
|
| Prompt (Claude Code): Implement an alternative grid view
| that the users can switch to. Follow the existing code
| style with empty comments and line breaks for improved code
| readability. Use snake case. DRY the code, avoid repetition
| of code. Do not change the font size or weight.
|
| Output: https://github.com/mayo-
| dayo/app/compare/0.4...claude-code-g...
| koakuma-chan wrote:
| It also keeps adding aspect-ratio to every single image
| it finds in my code base.
| koakuma-chan wrote:
| Also this: `grid grid-cols-2 sm:grid-cols-3 md:grid-
| cols-4 lg:grid-cols-5 xl:grid-cols-6`
| (https://github.com/mayo-
| dayo/app/blob/463ad5aeee904289ecc7d4...).
|
| Even though my Layout clearly says `max-w-md`
| (https://github.com/mayo-
| dayo/app/blob/463ad5aeee904289ecc7d4...).
| sensanaty wrote:
| In any moderately sized codebase it's basically useless
| indeed. Pretty much all the praise and hype I ever see is
| from people making todo-list-tier applications and
| shouting with excitement how this is going to replace all
| of humanity.
|
| Hell, I still have to remind it (Cursor) to not give me
| fucking React a few messages after I've already told it
| to not give me React (it's a Vue application with not a
| single line of React in it). Genuinely maddening, but the
| infinite wisdom of the higher ups forces me into wasting
| my time with this crap
| epistasis wrote:
| Claude's predilection and evangelism for React is
| frustrating. Many times I have used it as search with a
| question like "In the Python library X how do I do Z?"
| And I'll get a React widget that computes what I was
| trying to compute.
| pityJuke wrote:
| There's a middle ground, I find.
|
| Absolutely, when tasked with something quite complex in a
| complex code base, it doesn't really work. It can get you
| some of the way there, and some of the code it produces
| gives you great ideas on where to go from, but it doesn't
| work.
|
| But there are certainly some tasks where it excels. I
| asked it to refactor a rather gnarly function (C++), and
| it did a great job at decomposing it. The initial
| decomposition was a bit naive: the original function took
| in a vector and would parse the function & data out of
| the vector, and the decomposition split out the
| functions, but the data still came in as a vector. For
| instance, one of the functions took a filename and file
| contents, but it took them as element 0 and element 1 of
| a vector, when they should obviously be two parameters.
| Some further prompting took it the rest of the way.
| epistasis wrote:
| Update: Code tokens appear to be cheaper than 3.7 tokens, looks
| like it is around $0.75/million tokens for code, rather than
| the $3/million that the article specifies for Claude 3.7
| highfrequency wrote:
| Awesome work. When CoT is enabled in Claude 3.7 (not the new
| Claude Code), is the model now able to compile and run code as
| part of its thought process? This always seemed like very low
| hanging fruit to me, given how common this pattern is: ask for
| code, try running it, get an error (often from an outdated API in
| one of the packages used), paste the error back to Claude, have
| Claude immediately fix it. Surely this could be wrapped into the
| reasoning iterations?
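|
| You can already wrap that loop around the API yourself; a rough
| sketch with the Python SDK (just the paste-the-traceback-back
| workflow automated - the task and file name are made up, and
| this says nothing about how the internal reasoning works):
|
|     import subprocess
|     import anthropic
|
|     client = anthropic.Anthropic()
|     MODEL = "claude-3-7-sonnet-20250219"
|
|     def ask(msgs):
|         r = client.messages.create(model=MODEL, max_tokens=4096,
|                                    messages=msgs)
|         return r.content[0].text
|
|     msgs = [{"role": "user", "content":
|              "Write a Python script that prints the 10 most "
|              "common words in war_and_peace.txt. Reply with "
|              "only the code, no markdown fences."}]
|     for _ in range(3):  # a few fix-it rounds
|         code = ask(msgs)
|         run = subprocess.run(["python", "-c", code],
|                              capture_output=True, text=True)
|         if run.returncode == 0:
|             break
|         # Feed the traceback back, like pasting it by hand.
|         msgs += [{"role": "assistant", "content": code},
|                  {"role": "user", "content":
|                   "That failed with:\n" + run.stderr +
|                   "\nFix it and resend the complete script."}]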
| falcor84 wrote:
| Why can't they count to 4?
|
| I accepted it when Knuth did it with TeX's versioning. And I sort
| of accept it with Python (after the 2-3 transition fiasco), but
| this is getting annoying. Why not just use natural numbers for
| major releases?
| jjice wrote:
| I think I heard on a podcast with some of their team that they
| want 4 to be a massive jump. If I recall, they said that they
| want Haiku (the smallest of their current gen models) to be as
| good as Opus (the highest version, although there isn't one in
| the 3.5+ line) of the previous generation.
| sensanaty wrote:
| You'd think all these companies would have a single good naming
| convention, amazingly they don't. I suspect it's half on
| purpose so they can nerf the models without anyone suspecting
| once the hype dies down, since with every one of these models
| the latter version of the "same" version is worse than the
| launch version
| nurettin wrote:
| What I love about their API is the tools array. Given a JSON
| schema describing your functions, it will output tool usage
| appropriate for the prompt. You can return tool results per call,
| and it will generate a dialog and additional tool calls based on
| those results.
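|
| Roughly what that looks like with the Python SDK, as a
| minimal sketch (the weather tool, its schema and the model
| alias are made up for illustration):
|
|     import anthropic
|
|     client = anthropic.Anthropic()  # key read from the env
|
|     tools = [{
|         "name": "get_weather",  # illustrative tool
|         "description": "Get current weather for a city.",
|         "input_schema": {
|             "type": "object",
|             "properties": {"city": {"type": "string"}},
|             "required": ["city"],
|         },
|     }]
|
|     msg = client.messages.create(
|         model="claude-3-7-sonnet-latest",
|         max_tokens=1024,
|         tools=tools,
|         messages=[{"role": "user",
|                    "content": "Is it raining in Oslo?"}],
|     )
|
|     # If the model chose the tool, the reply contains a
|     # tool_use block; you run the function yourself and send
|     # the result back as a tool_result block to continue.
|     for block in msg.content:
|         if block.type == "tool_use":
|             print(block.name, block.input)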
| Uninen wrote:
| I'm somewhat impressed by the very first interaction I had with
| Claude 3.7 Sonnet. I prompted it to find a problem in my codebase
| where a Cloudflare Pages function would return a 500 plus a
| nonsensical error and an empty response in prod. I had tried to
| figure this out all Friday. It was super annoying to fix, as
| there was no way to add more logging or get any visibility into
| the issue since the script died before outputting anything.
|
| Both o1, o3 and Claude 3.5 failed to help me in any way with
| this, but Claude 3.7 not only found the correct issue with first
| answer (after thinking 39 seconds) but then continued to write me
| a working function to work around the issue with the second
| prompt. (I'm going to let it write some tests later but stopped
| here for now.)
|
| I assume it doesn't let me share the discussion as I connected
| my GitHub repo to the conversation (a new feature in the web chat
| UI launched today) but I copied it as a gist here:
| https://gist.github.com/Uninen/46df44f4307d324682dabb7aa6e10...
| Uninen wrote:
| One thing about the reply gives away why Claude is still
| basically clueless about Actual Thinking: it suggested I
| move the HTML sanitization to the frontend. It's in the CF
| function because it would be trivial to bypass in the
| frontend, making it easy to post literally anything into the
| db. Even a junior developer would understand this.
| umaar wrote:
| Drawing an SVG of a pelican on a bicycle. Claude 3.7 edition:
| https://x.com/umaar/status/1894114767079403747
| redox99 wrote:
| Claude 3.5/3.6/3.7 seems _too_ good at SVG compared to other
| models. I'd wager they did a bit of training specifically on
| that.
| pcwelder wrote:
| Claude Code's terminal UX feels great.
|
| It has some well-thought-out features, like restarting the
| conversation with compressed context.
|
| Great work guys.
|
| However, I did get stuck when I asked it to run `npm create
| vite@latest todo-app` because it needs interactivity.
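|
| (For what it's worth, create-vite can be driven
| non-interactively by passing a template, e.g. `npm create
| vite@latest todo-app -- --template vue`; the template name
| here is just an example.)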
| g8oz wrote:
| Congratulations on the release! While team members are monitoring
| this discussion, let me add that a relatively simple improvement
| I'd like to see in the UI is the ability to export a chat to
| Markdown or XML.
| bhouston wrote:
| I wonder how similar Claude Code is to https://mycoder.ai - which
| also uses Claude in an agentic fashion?
|
| It seems quite similar:
|
| https://docs.anthropic.com/en/docs/agents-and-tools/claude-c...
| jsemrau wrote:
| One of the most interesting takeaways I found from
| Huggingface's GAIA is that the agent would produce better
| results when it "reasoned" through its response to the task
| in code.
| wewewedxfgdf wrote:
| What makes software "agentic" instead of just a computer program?
|
| I hear lots of talk about agents and can't see them as being any
| different from an ordinary computer program.
| dannyw wrote:
| Computer programs generally don't call functions non-
| deterministically, including choosing which functions to call,
| and when, at runtime.
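|
| A hand-wavy way to picture the difference, as a sketch
| (nothing vendor-specific; the shape of the `llm` return
| value and the tool registry are made up):
|
|     def run_agent(llm, tools: dict, task: str,
|                   max_steps: int = 10):
|         # In an ordinary program the call graph is fixed at
|         # write time. Here the model decides which tool to
|         # call, with what arguments, and when to stop.
|         history = [task]
|         for _ in range(max_steps):
|             action = llm(history)
|             if action.get("tool") is None:
|                 return action["answer"]
|             result = tools[action["tool"]](**action["args"])
|             history.append({"call": action, "result": result})
|         return "gave up"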
| Uninen wrote:
| The Anthropic model comparison table has been updated now.
| Interesting new things: at the very least, the maximum output
| tokens have been upped from 8k to 64k and the knowledge cutoff
| date has moved from April 2024 to October 2024.
|
| https://docs.anthropic.com/en/docs/about-claude/models/all-m...
| bcherny wrote:
| Thanks everyone for all your questions! The team and I are
| signing off. Please drop any other bugs or feature requests here:
| https://github.com/anthropics/claude-code. Thanks and happy
| coding!
| anotherpaulg wrote:
| Claude 3.7 Sonnet scored 60.4% on the aider polyglot leaderboard
| [0], WITHOUT USING THINKING.
|
| Tied for 3rd place with o3-mini-high. Sonnet 3.7 has the highest
| non-thinking score, taking that title from Sonnet 3.5.
|
| Aider 0.75.0 is out with support for 3.7 Sonnet [1].
|
| Thinking support and thinking benchmark results coming soon.
|
| [0] https://aider.chat/docs/leaderboards/
|
| [1] https://aider.chat/HISTORY.html#aider-v0750
| bearjaws wrote:
| Thanks for all the work on aider, my favorite AI tool.
| stavros wrote:
| I'd like to second the thanks for Aider, I use it all the time.
| liamYC wrote:
| I'd like to 3rd the thanks for Aider it's fantastic!
| gwd wrote:
| Interesting that the "correct diff format" score went from
| 99.6% with Claude 3.5 to 93.3% for Claude 3.7. My experience
| with using claude-code was that it consistently required
| several tries to get the right diff. Hopefully all that will
| improve as they get things ironed out.
| throwaway454812 wrote:
| Any chance you can add support for Vertex AI Sonnet 3.7, which
| looks like it's available now? Thank you!
| hankchinaski wrote:
| It's amazingly good, but it will be scarily good once there's a
| way to include the entire codebase in the context and let it
| create and run various parts of a large codebase autonomously.
| Right now I can only do patch work and give it specific code
| snippets to work on. Excited to try this new version out; I'm
| sure I won't be disappointed.
|
| Edit: I just tried the Claude Code CLI and it's a good
| compromise. It works pretty well; it does the discovery by
| itself instead of loading the whole codebase into context.
| flutas wrote:
| FWIW, there's a project to turn it into something similar,
| though I think it's lacking the "entire codebase in context"
| part and runs into rate limits quickly with Claude.
|
| https://github.com/All-Hands-AI/OpenHands
|
| The few times I've tested it out, though, it fails fairly
| quickly and gets hung up (usually on setting up the project
| while testing with Kotlin / Go).
| thefourthchime wrote:
| Cursor AI is getting there.
| hankchinaski wrote:
| Cursor is just a wrapper around the APIs and is unnecessarily
| expensive. I use the Zed editor with custom API keys and it
| works super well.
| knes wrote:
| At Augment (https://augmentcode.com) we were one of the partners
| who tested 3.7 pre-launch, and it has been a pretty significant
| increase in quality and code understanding. Happy to answer some
| questions.
|
| FYI, we use Claude 3.7 as part of the new features we are
| shipping around Code Agent & more.
| ginkgotree wrote:
| Been using 3.5 Sonnet for a mobile app build the past month.
| Haven't had much time to get a good sense of the 3.7
| improvements, but I have to say the dev-experience improvement
| of Claude Code right in my shell is fantastic. Loving it so far.
| ismaelvega wrote:
| Any plans to make some HackerRank Astra bench?
| specto wrote:
| I've had a personal subscription to Claude for a while now. I
| would love it if that also gave me access to some amount of API
| calls.
| mirekrusin wrote:
| Ok, just got documentation and fixed two bugs in my open source
| project.
|
| $1.42
|
| This thing is a game changer.
| ramesh31 wrote:
| It would be reeeaaally nice if someone built Claude Code into a
| Cline/Aider type extension...
| bittermandel wrote:
| Claude Code works pretty OK so far, but Bash doesn't work
| straight up. It just sits and waits, even when running something
| basic like "!echo 123".
| leyoDeLionKin wrote:
| I cancelled after I hit the limit; plus, you have very limited
| support here in Europe.
| 0xcb0 wrote:
| I can just say that this is awesome. I just spent $10 and a
| handful of queries to spin up an app idea I'd had for a while.
|
| The basic idea is working; it handled everything for me.
|
| From setting up the Node environment to creating the
| directories and files, patching the files, running code,
| handling errors, and patching again. From time to time it fails
| to detect its own faults, but when I pinpoint them, it gets it
| right most of the time. And the UI is actually prettier than
| what I would have crafted for a v1.
|
| When this gets cheaper and better with each iteration,
| everybody will have a full dev team for a couple of bucks.
| wellthisisgreat wrote:
| What's the privacy like for Claude Code? Is it memorizing the
| whole codebase?
| j_maffe wrote:
| It redid half of my BSc thesis in less than 30s :|
|
| https://claude.ai/share/ed8a0e55-633f-4056-ba70-772ab5f5a08b
|
| edit: Here's the output figure https://i.imgur.com/0c65Xfk.png
|
| edit 2: Gemini Flash 2 failed miserably
| https://g.co/gemini/share/10437164edd0
| ThouYS wrote:
| Master's and PhD next!
| akreal wrote:
| Could this (or something similar) already be found publicly,
| in some library?
| j_maffe wrote:
| There is only a single paper that has published a similar
| derivation, and it has a critical mistake. To be fair, there
| are many documented examples of how to derive parametric
| relationships in linkages, and the process can be quite
| methodical. I think I could get Gemini or 3.5 to do it, but
| not single-shot / ultra fast like here.
| dev0p wrote:
| The quality of the code is so much better!
|
| The UI seems to have an issue with big artifacts but the model is
| noticeably smarter.
|
| Congratulations on the release!
| unshavedyak wrote:
| Are you using Claude Code or just the UI? Trying to figure out
| if anyone actually has Code yet hah.
|
| _edit_: Oh, there's a link to "joining the preview" which
| points to: https://docs.anthropic.com/en/docs/agents-and-
| tools/claude-c...
| gigatexal wrote:
| How is the code generation? OpenAI was generating good-looking
| Terraform, but it was hallucinating incorrect things.
| Copenjin wrote:
| Very good. Code is extremely nice, but as others have said, if
| you let it go on its own it burns through your money pretty
| fast.
|
| I had it build a web scraper from scratch, figuring out the
| "API" of a website using a project from GitHub in another
| language to get some hints, and while in the end everything was
| working, I saw 100k+ tokens being sent too frequently for
| apparently simple requests. Something feels off; it feels like
| there are quite a few opportunities to reduce token usage.
| taosx wrote:
| The model is expensive; it almost reaches what I charge per
| hour. Used right, it can be a productivity increase; otherwise,
| if you just trust it, it WILL introduce silent bugs. So if I
| have to go over the code line by line anyway, I'd prefer to use
| the cheapest viable model: DeepSeek, Gemini, or any other free
| self-hosted model.
|
| Congrats to the team!
| vbezhenar wrote:
| So far, only o1 pro has been breathtaking for me, a few times.
|
| I wrote some fairly complex code for an MCU which deals with
| FRAM and a few buffers, juggling bytes around in a complex
| fashion.
|
| I wasn't very sure about this code, so I spent some time with
| AI chats asking them to review it.
|
| 4o, o3-mini and Claude were more or less useless. They spotted
| basic stuff, like the code possibly being problematic in a
| multi-threaded environment; those are obvious things and not
| even true.
|
| o1 pro did something on another level. It recognized that my
| code uses SPI to talk to the FRAM chip. It decoded the commands
| I used. It understood the whole timeline of how the CS pin was
| used. And it pointed out that I was using the WREN command the
| wrong way, and that I should have separated it from the WRITE
| command.
|
| That was a truly breathtaking moment for me. It easily saved me
| days of debugging, that's for sure.
|
| I asked the same question of Claude 3.7 in thinking mode and it
| still wasn't that useful.
|
| It's not the only occasion. A few weeks earlier, o1 pro
| delivered the solution to a problem that I considered kind of
| hard. Basically, I had issues accessing an IPsec VPN configured
| on the host from a Docker container. I wrote a well-thought-out
| question with all the information one might need, and o1 pro
| crafted a magic iptables incantation that just solved my
| problem. I had spent quite a bit of time working on this
| problem; I was close, but not there yet.
|
| I often use both ChatGPT and Claude, comparing them side by
| side. For other models they are comparable and I can't really
| say which is better, but o1 pro plays above them. I'll keep
| trying both over the coming days.
| dkulchenko wrote:
| Have you tried comparing with 3.7 via the API with a large
| thinking budget yet (32k-64k perhaps?), to bring it closer to
| the amount of tokens that o1-pro would use?
|
| I think claude.ai's web app in thinking mode is likely
| defaulting to a much much smaller thinking budget than that.
| davidbarker wrote:
| Claude 3.5 Sonnet is great, but on a few occasions I've gone
| round in circles on a bug. I gave it to o1 pro and it fixed it
| in one shot.
|
| More generally, I tend to give o1 pro as much of my codebase as
| possible (it can take around 100k tokens) and then ask it for
| small chunks of work which I then pass to Sonnet inside Cursor.
|
| Very excited to see what o3 pro can do.
| akomtu wrote:
| This is how the future AI will break free: "no idea what this
| update is doing, but what AI is suggesting seems to work and I
| have other things to do."
| sylware wrote:
| Is there some truth to the following relationship: o1 -> OpenAI
| -> Microsoft -> GitHub for "training data"?
| danieldevries wrote:
| Just tried Claude Code. First impression: it seems rather
| expensive. I prefer how Aider allows finer control over which
| files to add, or lets you use a sub-tree of a git repo. Also, it
| feels like the API calls when using Claude Code are much faster
| than when using 3.7 through Aider. Is it being given bandwidth
| priority?
| RomanPushkin wrote:
| > strong improvements in coding and front-end web development
|
| The best part
| Daniel_Van_Zant wrote:
| Being able to control how many tokens are spent on thinking is a
| game-changer. I've been building fairly complex, efficient,
| systems with many LLMs. Despite the advantages, reasoning models
| have been a no-go due to how variable the cost is, and how hard
| that makes it to calculate a final per-query cost for the
| customer. Being able to say "I know this model can always solve
| this problem in this many thinking tokens" and thus limit the
| cost for that component is huge.
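|
| For reference, a minimal sketch of what that cap looks like
| through the API (the model alias and the exact `thinking`
| field names are how I understand the current docs; worth
| double-checking before relying on them):
|
|     import anthropic
|
|     client = anthropic.Anthropic()
|
|     msg = client.messages.create(
|         model="claude-3-7-sonnet-latest",
|         # overall output cap; must exceed the thinking budget
|         max_tokens=8192,
|         # hard ceiling on reasoning tokens, which also caps
|         # the cost of that component per query
|         thinking={"type": "enabled", "budget_tokens": 4096},
|         messages=[{"role": "user",
|                    "content": "Solve the customer's query."}],
|     )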
| syndicatedjelly wrote:
| Claude Code is pretty sick. I love the terminal integration, I
| like being able to stay on the keyboard and not have to switch
| UIs. It did a nice job learning my small Django codebase and
| helping me finish out a feature that I wasn't sure how to
| complete.
| unsupp0rted wrote:
| Anybody else noticing that in Cursor, Claude Sonnet 3.7 is
| thinking much slower than Claude Sonnet 3.5 did?
| numba888 wrote:
| This was nice. I passed it the jseessort algorithm (discussed
| here recently, if you remember). Claude 3.7 generated C++ code.
| Non-working. But in a few steps it gave an extensive test, then
| a fix. It looked to be working after a couple of minutes. It's
| 5-6 times slower than std::sort. The result is better than what
| I got from o3-mini-hard. Not a fair comparison, actually, as the
| prompting was different.
| smusamashah wrote:
| > output limit of 128K tokens
|
| Is this limit for thinking mode only, or does normal mode have
| the same limit now? An 8192-token output limit can be a bit
| small these days.
|
| I was trying to extract all URLs along with their topics from a
| "what are you working on" HN thread, and the 8192-token limit
| couldn't cover it.
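|
| For what it's worth, the output ceiling is something you ask
| for per request; a minimal sketch (the model alias and the
| exact per-mode limits are the parts I'd double-check in the
| docs, and the full 128K reportedly sits behind an
| extended-output beta):
|
|     import anthropic
|
|     client = anthropic.Anthropic()
|
|     msg = client.messages.create(
|         model="claude-3-7-sonnet-latest",
|         max_tokens=64000,  # well past the old 8192 cap
|         messages=[{
|             "role": "user",
|             "content": "Extract every URL and its topic "
|                        "from the pasted thread.",
|         }],
|     )
|     print(msg.content[0].text)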
| cavisne wrote:
| So far Claude Code seems very capable; it one-shotted something
| I couldn't get to work in Cursor at all.
|
| However, it's expensive: 5 minutes of work cost ~$1.
___________________________________________________________________
(page generated 2025-02-24 23:00 UTC)