[HN Gopher] Claude 3.7 Sonnet and Claude Code
___________________________________________________________________
Claude 3.7 Sonnet and Claude Code
Author : bakugo
Score : 1052 points
Date : 2025-02-24 18:28 UTC (4 hours ago)
(HTM) web link (www.anthropic.com)
(TXT) w3m dump (www.anthropic.com)
| bnc319 wrote:
| Pretty amazing how DeepSeek started the visible-reasoning trend,
| xAI featured it in their latest release, and now Anthropic does
| the same.
| anjel wrote:
| I took DS's visible reasoning to be an elegant misdirect from how
| much slower DS returns your query's output.
| t55 wrote:
| Anthropic doubling down on code makes sense; that has been their
| strong suit compared to all the other models
|
| Curious how their Devin competitor will pan out given Devin's
| challenges
| malux85 wrote:
| I thought the same thing, I have 3 really hard problems that
| Claude (or any model) hasn't been able to solve so far and I'm
| really excited to try them today
| ru552 wrote:
| Considering that they are the model that powers a majority of
| Cursor/Windsurf usage and their play with MCP, I think they
| just have to figure out the UX and they'll be fine.
| weinzierl wrote:
| It's their strong suit no doubt, but sometimes I wish the chat
| would not be so eager to code.
|
| It often throws code at me when I just want a conceptual or
| high level answer. So often that I routinely tell it not to.
| KerryJones wrote:
| I complain about this all the time, despite me saying "ask me
| questions before you code" or all these other instructions to
| code less, it is SO eager to code. I am hoping their 3.7
| reasoning follows these instructions better
| vessenes wrote:
| We should remember 3.5 was trained in an era when ChatGPT
| would routinely refuse to code at all and architected in an
| era when system prompts were not necessarily very
| effective. I bet this will improve, especially now that
| Claude has its own coding and arch cli tool.
| NitpickLawyer wrote:
| > I just want a conceptual or high level answer
|
| I've found claude to be very receptive to precise
| instructions. If I ask for "let's first discuss the
| architecture" it never produces code. Aider also has this
| feature with /architect
| ap-hyperbole wrote:
| I added custom instruction under my Profile settings in the
| "personal preferences" text box. Something along the lines of
| "I like to discuss things before wanting the code. Only
| generate code when I prompt for it. Any question should be
| answered to as a discussion first and only when prompted
| should the implementation code be provided". It works well,
| occasionally I want to see the code straight away but this
| does not happen as often.
| perdomon wrote:
| I get this as well, to the point where I created a specific
| project for brainstorming without code -- asking for
| concepts, patterns, architectural ideas without any code
| samples. One issue I find is that sometimes I get better
| answers without using projects, but I'm not sure if that's
| everyone's experience.
| bitbuilder wrote:
| That's been my experience as well with projects, though I
| have yet to do any sort of A/B testing to see if it's all
| in my head or not.
|
| I've attributed it to all your project content (custom
| instruction, plus documents) getting thrown into context
| before your prompt. And honestly, I have yet to work with
| any model where the quality of the answer wasn't inversely
| proportional to the length of context (beyond of course
| supplying good instruction and documentation where needed).
| ben30 wrote:
| I've set up a custom style in Claude that won't code but just
| keeps asking questions to remove assumptions:
|
| Deep Understanding Mode (根回し - Nemawashi Phase)
|
| Purpose:
| - Create space (間, ma) for understanding to emerge
| - Lay careful groundwork for all that follows
| - Achieve complete understanding (grokking) of the true need
| - Unpack complexity (desenrascar) without rushing to solutions
|
| Expected Behaviors:
| - Show determination (sisu) in questioning assumptions
| - Practice careful attention to context (taarof)
| - Hold space for ambiguity until clarity emerges
| - Work to achieve intuitive grasp (apercu) of core issues
|
| Core Questions:
| - What do we mean by [key terms]?
| - What explicit and implicit needs exist?
| - Who are the stakeholders?
| - What defines success?
| - What constraints exist?
| - What cultural/contextual factors matter?
|
| Understanding is Complete When:
| - Core terms are clearly defined
| - Explicit and implicit needs are surfaced
| - Scope is well-bounded
| - Success criteria are clear
| - Stakeholders are identified
| - Apercu is achieved - an intuitive grasp of the essence
|
| Return to Understanding When:
| - New assumptions surface
| - Implicit needs emerge
| - Context shifts
| - Understanding feels incomplete
|
| Explicit Permissions:
| - Push back on vague terms
| - Question assumptions
| - Request clarification
| - Challenge problem framing
| - Take time for proper nemawashi
| KaoruAoiShiho wrote:
| They cited Cognition (Devin's maker) in this blog post which is
| kinda funny.
| Flux159 wrote:
| It's interesting that Anthropic is making their own coding agent
| with Claude Code - is this a sign of them looking to move up the
| stack and more into verticals that model wrapper startups are in?
| madduci wrote:
| GitHub copilot has now introduced Claude as model as well
| vessenes wrote:
| This makes sense to me: sell razor blades. Presumably Claude
| has a large developer distribution channel so they will keep
| eyeballing what to 'give away' that turns the dials on
| inference billing.
|
| I'd guess this will keep raising the bar for paid or open
| source competitors, so probably good for end users esp given
| they aren't a monopoly by any means.
| estsauver wrote:
| The docs for Claude code don't seem to be up yet but are linked
| here: http://docs.anthropic.com/s/claude-code
|
| I'm not sure if it's a broken link in the blog post or just
| hasn't been published yet.
| jumploops wrote:
| Saw the same thing, but looks to be up now!
| tablet wrote:
| The progress in AI area is insane. I can't keep up with all the
| news. And I have work to do...
| amelius wrote:
| It stopped being revolutionary and is now mostly evolutionary,
| though.
| dingnuts wrote:
| it's been evolutionary for a long time. I fine-tuned a GPT-2
| based chat bot that could form complete sentences back in
| like 2017
|
| It's been so long that I'm not even certain which YEAR I set
| that up.
| falcor84 wrote:
| Where do you draw the line? If going from forming sentences
| to achieving medal level success on IMO questions, doing
| extensive web research on its own and writing entire SaaS
| apps based on a prompt in under 10 years is just
| "evolutionary", then it's one heck of an evolution.
| og_kalu wrote:
| >I fine-tuned a GPT-2 based chat bot that could form
| complete sentences back in like 2017.
|
| GPT-2 was a 2019 release lol.
| frankfrank13 wrote:
| This is a pretty small update, no? Nothing major since R1;
| everyone else is just catching up to that and putting small
| spins on it. Anthropic's spin is a "hybrid" reasoning model
| instead of separate models
| tablet wrote:
| Well, now I have to play with it, try to see how it will
| generate code for our agentic assistant (we do rely on code
| to execute task flows), etc.
| TIPSIO wrote:
| "Make me a website about books. Make it look like a designer and
| agency made it. Use Tailwind."
|
| https://play.tailwindcss.com/tp54wfmIlN
|
| Getting way better at UI.
| flir wrote:
| That's not hideous. Derivative, but that's the nature of the
| beast.
| jasonjmcghee wrote:
| I feel like something isn't working... when I try to click
| anything it just reloads. I can't see the collections.
| handfuloflight wrote:
| As a designer and agency... this is extremely basic... but so
| was the prompt.
| lysace wrote:
| It's fascinating how close these companies are to each other.
| Some company comes up with something clever/ground-breaking and
| everyone else has implemented it a few weeks later.
|
| Hard not to think of Kurzweil's Law of Accelerating Returns.
| mechagodzilla wrote:
| It does seem like it will be very, very hard for the companies
| training their own models to recoup their investment when the
| capabilities of open-weight models catch up so quickly -
| general purpose LLMs just seem destined to be a cheap
| commodity.
| jsheard wrote:
| Well, the companies releasing open weights also need to
| recoup their investments at some point, they can't coast on
| VC hype forever. Huge models don't grow on trees.
| mechagodzilla wrote:
| Or, like Meta, they make their money elsewhere and just
| seem interested in wrecking the economics of LLMs. As soon
| as an open-weight model is released, it basically sets a
| global floor that says "Models with similar or worse
| performance effectively have zero value," and that floor
| has been rising incredibly quickly. I'd be surprised if the
| vast, vast majority of queries ChatGPT gets couldn't get
| equivalently good results from llama3/deepseek/qwen/mistral
| models, even for those paying for the pro versions.
| Philpax wrote:
| Eh, to some extent - there's still a pretty significant
| cost to actually running inference for those models. For
| example, no consumer can run DeepSeek v3/r1 - that
| requires tens, possibly hundreds, of thousands of dollars
| of hardware to run.
|
| There's still room for other models, especially if they
| have different performance characteristics that make them
| suitable to run under consumer constraints. Mistral has
| been doing quite well here.
| mechagodzilla wrote:
| If you don't need to pay for the model development costs,
| I think running inference will just be driven down to the
| underlying cloud computing costs. The actual requirement
| to passably (~4-bit quantization) run Deepseek v3/r1 at
| home is really just having 512GB or so of RAM - I bought
| a used dual-socket xeon for $2k that has 768GB of RAM,
| and can run Deepseek R1 at 1-1.5 tokens/sec, which is
| perfectly usable for "ask a complicated question, come
| back an hour or so later and check on the result".
| riku_iki wrote:
| I think the Meta folks just don't know how to enter this
| market and build something potentially profitable, so
| they're doing random stuff because they need to report
| some results to management.
| azinman2 wrote:
| It's extremely unlikely that everyone is copying in a few weeks
| for models that themselves take many weeks if not longer to
| train. Great minds think alike, and everyone is influencing
| everyone. The history of innovation is filled with examples of
| similar discoveries around the same time but totally
| disconnected in the world. Now with the rate of publishing and
| the openness of the internet, you're only bound to get even
| more of that.
| lysace wrote:
| Isn't the reasoning thing essentially a bolt-on to existing
| trained models? Like basically a meta-prompt?
| pertymcpert wrote:
| Somewhat but not exactly? I think the models need to be
| trained to think.
| azinman2 wrote:
| No.
|
| DeepSeek and now related projects have shown it's possible
| to add reasoning via SFT to existing models, but that's not
| the same as a prompt. But if you look at R1 they do a blend
| of techniques to get reasoning.
|
| For Anthropic to have a hybrid model where you can control
| this, it will have to be built into the model directly in
| its training and probably architecture as well.
|
| If you're a competent company filled with the best AI minds
| and a frontier model, you're not just purely copying...
| you're taking ideas while innovating and adapting.
| Philpax wrote:
| The fundamental innovation is training the model to reason
| through reinforcement learning; you can train existing
| models with traces from these reasoning models to get you
| within the same ballpark, but taking it further requires
| you to do RL yourself.
| KaoruAoiShiho wrote:
| The copying here probably goes back to strawberry/o1, which is
| at least 6 months old, but the copying efforts may have started
| even earlier.
| riku_iki wrote:
| > for models that themselves take many weeks if not longer to
| train.
|
| they all have foundational heavy-trained model, and then they
| can do follow up experimental training much faster.
| luma wrote:
| Where RL can play into post training there's something of an
| anti-moat. Maybe a "tow rope"?
|
| Let's say OAI releases some great new model. The moment it
| becomes available via API, everyone else can make use of that
| model to create high-quality RL training data, which can then
| be used to make their models perform better.
|
| The very act of making an AI model commercially available is
| the same act which allows your competitors to pull themselves
| closer to you.
| ctoth wrote:
| I've been using O3-mini with reasoning effort set to high in
| Aider and loving the pricing. This looks as though it'll be about
| three times as expensive. Curious to see which falls out as most
| useful for what over the next month!
| rahimnathwani wrote:
| Are you using o3-mini for editing, or just as the architect in
| architect-editor mode?
| vessenes wrote:
| It is .. not a great architect. I have high hopes for 3.7
| though - even 3.5 architect matched with 3.5 coding is
| generally better than 3.5 coding alone.
| rs_rs_rs_rs_rs wrote:
| Hope it's worth the money because it's quite expensive.
| m3kw9 wrote:
| Wonder if Aider will copy some of these features
| sergiotapia wrote:
| Already available in Cursor!
| https://x.com/cursor_ai/status/1894093436896129425
|
| (although I do not see it)
| ianhawes wrote:
| > Include the beta header output-128k-2025-02-19 in your API
| request to increase the maximum output token length to 128k
| tokens for Claude 3.7 Sonnet.
|
| This is pretty big! Previously most models could accept massive
| input tokens but would be restricted to 4096 or 8192 output
| tokens.
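|
| A minimal sketch of passing that header with the Python SDK
| (the `extra_headers` route here is an assumption on my part;
| check the docs for the canonical way):
|
|     import anthropic
|
|     client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
|     msg = client.messages.create(
|         model="claude-3-7-sonnet-20250219",
|         max_tokens=128000,  # only allowed with the beta header below
|         # opt in to the long-output beta
|         extra_headers={"anthropic-beta": "output-128k-2025-02-19"},
|         messages=[{"role": "user", "content": "Write a long report."}],
|     )
|     print(msg.content[0].text)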
| thegeomaster wrote:
| This amounts to a cost-saving measure - you can generate
| arbitrarily many tokens by appending the output and re-invoking
| the model.
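|
| A rough sketch of that continuation loop (assuming the standard
| Messages API and its assistant-prefill behaviour; not an
| official recipe):
|
|     import anthropic
|
|     client = anthropic.Anthropic()
|     prompt = [{"role": "user", "content": "Write a very long story."}]
|     output = ""
|
|     while True:
|         resp = client.messages.create(
|             model="claude-3-7-sonnet-20250219",
|             max_tokens=8192,
|             # a trailing assistant turn acts as a prefill that the
|             # model continues from
|             messages=prompt + (
|                 [{"role": "assistant", "content": output}] if output else []
|             ),
|         )
|         output += resp.content[0].text
|         if resp.stop_reason != "max_tokens":
|             break  # the model finished on its own
|
|     print(output)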
| ungreased0675 wrote:
| Awesome. Claude is significantly better than other models at code
| assistant tasks, or at least in the way I use it.
| jasondigitized wrote:
| Totally agree. I continue to be blown away at how good it is at
| understanding, explaining, and writing code. Got an obscure
| error? Give Claude enough context and it is pretty dang good
| at getting you on glide slope.
| jedberg wrote:
| Last week when Grok launched the consensus was that its coding
| ability was better than Claude. Anyone have a benchmark with this
| new model? Or just warm feelings?
| esafak wrote:
| They merely claimed that. I have not seen many people confirm
| that it is the best, let alone a consensus. I don't believe it
| is even available through an API yet.
| minihat wrote:
| Grok 3 with thinking is comparable to o1 for writing complex
| algorithms.
|
| However, Grok sometimes loses the context where o1 seems not
| to. For this reason I still mostly use o1.
|
| I have found both o1 and Grok 3 to be substantially better than
| any Claude offering.
| bbor wrote:
| > Just as humans use a single brain for both quick responses
| > and deep reflection, we believe reasoning should be an
| > integrated capability of frontier models rather than a
| > separate model entirely.
|
| Interesting. I've been working on exactly this for a bit over two
| years, and I wasn't surprised to see UAI finally getting traction
| from the biggest companies -- but how deep do they really take
| it...? I've taken this philosophy as an impetus to build an
| integrated system of interdependent hierarchical modules, much
| like Minsky's Society of Mind that's been popular in AI for
| decades. But this (short, blog) post reads like it's more of a
| behavioral goal than a design paradigm.
|
| Anyone happen to have insight on the details here? Or, even
| better, anyone from Anthropic lurking in these comments that
| cares to give us some hints? I promise, I'm not a competitor!
|
| Separately, the throwaway paragraph on alignment is worrying as
| hell, but that's nothing new. I maintain hope that Anthropic is
| keeping to their founding principles in private, and tracking
| more serious concerns than "unnecessary refusals" and prompt
| injection...
| Alex-Programs wrote:
| IIRC there's some reasoning in old Sonnet too, they're just
| expanding that. Perhaps that's part of why it was so good for a
| while.
|
| https://www.reddit.com/r/ClaudeAI/comments/1iv356t/is_sonnet...
| isoprophlex wrote:
| YES. I've tried them all but Sonnet is still the model I'm most
| productive with, even better than the o1/o3 models.
|
| Wish I could find the link to enroll in their Claude Code beta...
| frankfrank13 wrote:
| here -- https://docs.anthropic.com/en/docs/agents-and-
| tools/claude-c...
| isoprophlex wrote:
| Thanks!
| waltercool wrote:
| Just like with OpenAI or Grok, there is no transparency and no
| way to self-host. Your input and confidential information can
| be collected for training purposes.
|
| I just don't trust those companies when you use their servers.
| This is not a good approach to LLM democratization.
| azinman2 wrote:
| I wouldn't assume there's no way to self host -- it just costs
| a lot more than open weights.
|
| Anthropic claims they don't train on their inputs. I haven't
| seen any reason to disbelieve them.
| waltercool wrote:
| But there is no way to know if their claims are true either.
| Your inputs are processed on their servers, then you get a
| response. Whatever happens in the middle, only Anthropic
| knows. We don't even know if governments are pushing AI
| companies to enforce censorship or spy on people, as we saw
| recently with the UK government going after Apple's E2E
| encryption.
|
| This criticism is valid for businesses that want to use AI
| to improve coding, code analysis or review, documentation,
| emails, etc., but also for individuals who don't want to
| rely on 3rd-party companies for AI usage.
| wewewedxfgdf wrote:
| Nothing in the Claude API release notes.
|
| https://docs.anthropic.com/en/release-notes/api
|
| I really wish Claude would get Projects and Files built into its
| API, not just the consumer UI.
| thanhhaimai wrote:
| > Third, in developing our reasoning models, we've optimized
| somewhat less for math and computer science competition problems,
| and instead shifted focus towards real-world tasks that better
| reflect how businesses actually use LLMs.
|
| Company: we find that optimizing for LeetCode level programming
| is not a good use of resources, and we should be training AI less
| on competition problems.
|
| Also Company: we hire SWEs based on how much time they trained
| themselves on LeetCode
|
| /joke of course
| Svoka wrote:
| My manager explained to me that LeetCode is proving that you
| are willing to dance the dance. Same as PhD requirements etc -
| you probably won't be doing anything related and definitely
| nothing related to LeetCode, but you display dedication and
| ability.
|
| I kinda agree that this is probably the reason why companies
| are doing it. I don't like it, but that's beside the point.
|
| Using Claude or other models in interviews probably won't be
| allowed any time soon, but I do use it at work. So it does
| make sense.
| nico wrote:
| And it's also the reality of hiring practices for most VC-
| backed and public companies
|
| Some try to do something more like "real-world" tasks, but
| those end up being either just toy problems or long
| take-homes.
|
| Personally, I feel the most important things to prioritize when
| hiring are: is the candidate going to get along with their
| teammates (colleagues, boss, etc), and do they have the basic
| skills to relatively quickly learn their jobs once they start?
| EliasWatson wrote:
| I asked it for a self-portrait as a joke and the result is
| actually pretty impressive.
|
| Prompt: "Draw a SVG self-portrait"
|
| https://claude.site/artifacts/b10ef00f-87f6-4ce7-bc32-80b3ee...
|
| For comparison, this is Sonnet 3.5's attempt:
| https://claude.site/artifacts/b3a93ba6-9e16-4293-8ad7-398a5e...
| orangesun wrote:
| New mascot! Just make it the Anthropic orange
| frankfrank13 wrote:
| Tried claude code, and have an empty unresponsive terminal.
|
| Looks cool in the demo though, but not sure this is going to
| perform better than Cursor, and shipping this as an interactive
| CLI instead of an extension is... a choice
| toddmorey wrote:
| I think it's a smart starting point as it's compatible with all
| IDEs. Iterate and learn and then later wrap the functionality
| up into IDE plugins.
| apsec112 wrote:
| They don't say this, but from querying it, they also seem to have
| updated the knowledge cutoff from April 2024 ("3.6") to October
| 2024 (3.7)
| KerryJones wrote:
| Thanks for noting this -- it's actually pretty important in my
| work.
| sunaookami wrote:
| It's in the Model Card:
| https://assets.anthropic.com/m/785e231869ea8b3b/original/cla...
|
| >Claude 3.7 Sonnet is trained on a proprietary mix of publicly
| available information on the Internet as of November 2024
| rahimnathwani wrote:
| I'm curious how Claude Code compares to Aider. It seems like they
| have a similar user experience.
| azinman2 wrote:
| To me the biggest surprise was seeing Grok dominate in all of
| their published benchmarks. I haven't seen any independent
| benchmarks of it yet (and I take published ones with a giant
| heap of salt), but it's interesting nevertheless.
|
| I'm rooting for Anthropic.
| pertymcpert wrote:
| Indeed. I wonder what the architecture for Claude and Grok 3 is.
| If they're still dense models, then the MoE excitement around R1
| was a tad premature...
| phillipcarter wrote:
| Neither a statement for or against Grok or Anthropic:
|
| I've now just taken to seeing benchmarks as pretty lines or
| bars on a chart that are in no way reflective of actual ability
| for my use cases. Claude has consistently scored lower on some
| benchmarks for me, but when I use it in a real-world codebase,
| it's consistently been the only one that doesn't veer off
| course or "feel wrong". The others do. I can't quantify it, but
| that's how it goes.
| vessenes wrote:
| O1 pro is excellent at figuring out complex stuff that Claude
| misses. It's my go to mid level debug assistant when Claude
| spins
| viccis wrote:
| Yeah, putting it on the opposite side of that comparison chart
| was a sleazy but likely effective move.
| koakuma-chan wrote:
| Grok does the most thinking out of all models I tried (it can
| think for 2+ minutes), and that's why it is so good, though I
| haven't tried Claude 3.7 yet.
| photon_collider wrote:
| Nice to see a new release from Anthropic. Yet, this only makes me
| even more curious about when we'll see a new Claude Opus model.
| bakugo wrote:
| Funny enough, 3.7 Sonnet seems to think it's Opus right now:
|
| > "thinking": "I am Claude, an AI assistant created by
| Anthropic. I believe the specific model is Claude 3 Opus, which
| is Anthropic's most capable model at the time of my training.
| However, I should simply identify myself as Claude and not
| mention the specific model version unless explicitly asked for
| that level of detail."
| Alex-Programs wrote:
| I doubt we will. The state of the art seems to have moved away
| from the GPT-4-style giant, slow models to smaller, more
| refined ones - though Grok might be a bit of a return to the
| "old ways"?
|
| Personally I'm hoping they update Haiku at some point. It's not
| quite good enough for translation at the moment, while Sonnet
| is pretty great and has OK latency
| (https://nuenki.app/blog/llm_translation_comparison)
| cyounkins wrote:
| I don't yet see it in Bedrock in us-east-1 or us-east-2
| punkpeye wrote:
| If you are open to alternatives
| https://glama.ai/models/claude-3-7-sonnet-20250219
| elliot07 wrote:
| The cost is absurd (compared to other LLM providers these days).
| I asked 3 questions and the cost was ~0.77c.
|
| I do like how this is implemented as a bash tool and not an
| editor replacement though. Never leaving Vim! :P
| koakuma-chan wrote:
| Yep, my experience as well. It's just not worth it.
| koakuma-chan wrote:
| It burns through tokens like crazy on a small code base
| https://i.imgur.com/16GCxiy.png
| modeless wrote:
| I updated Cursor to the latest 0.46.3 and manually added
| "claude-3.7-sonnet" to the model list and it appears to work
| already.
|
| "claude-3.7-sonnet-thinking" works as well. Apparently controls
| for thinking time will come soon:
| https://x.com/sualehasif996/status/1894094715479548273
| punkpeye wrote:
| https://glama.ai/models/claude-3-7-sonnet-20250219
|
| Will be interesting to see how this gets adopted in communities
| like Roo/Cline, which currently account for the most token usage
| among Glama gateway user base.
| bcherny wrote:
| Hi everyone! Boris from the Claude Code team here. @eschluntz,
| @catherinewu, @wolffiex, @bdr and I will be around for the next
| hour or so and we'll do our best to answer your questions about
| the product.
| frankfrank13 wrote:
| Congrats on the launch! You said it's an important tool for you
| (Claude Code) - how does it fit in with Copilot, Cursor, etc.?
| Do you/your teammates rely only on Claude Code? What do you
| reach for for different tasks?
| bcherny wrote:
| Claude Code is super popular internally at Anthropic. Most
| engineers like to use it together with an IDE like Cursor,
| Windsurf, VS Code, Zed, Xcode, etc. Personally I usually
| start most coding tasks in Code, then move to an IDE for
| finishing touches.
| 420gunna wrote:
| Are you guys paying Claude for its assistance with your
| products
| pookieinc wrote:
| The biggest complaint I (and several others) have is that we
| continuously hit the limit via the UI after even just a few
| intensive queries. Of course, we can use the console API, but
| then we lose ability to have things like Projects, etc.
|
| Do you foresee these limitations increasing anytime soon?
|
| Quick Edit: Just wanted to also say thank you for all your hard
| work, Claude has been phenomenal.
| eschluntz wrote:
| We are definitely aware of this (and working on it for the
| web UI), and that's why Claude Code goes directly through the
| API!
| smallerfish wrote:
| I'm sure many of us would gladly pay more to get 3-5x the
| limit.
|
| And I'm also sure that you're working on it, but some kind
| of auto-summarization of facts to reduce the context in
| order to avoid penalizing long threads would be sweet.
|
| I don't know if your internal users are dogfooding the
| product that has user limits, so you may not have had this
| feedback - it makes me irritable/stressed to know that I'm
| running up close to the limit without having gotten to the
| bottom of a bug. I don't think stress response in your
| users is a desirable thing :).
| justinbaker84 wrote:
| This is the main point I always want to communicate to
| the teams building foundation models.
|
| A lot of people just want the ability to pay more in
| order to get more.
|
| I would gladly pay 10x more to get relatively modest
| increases in performance. That is how important the
| intelligence is.
| sealthedeal wrote:
| I haven't been able to find the Claude CLI for public access
| yet. Would love to use it.
| eschluntz wrote:
| >>> npm install -g @anthropic-ai/claude-code
|
| >>> claude
| kkarpkkarp wrote:
| see https://docs.anthropic.com/en/docs/agents-and-
| tools/claude-c...
| clangfan wrote:
| This is also my problem. I've only used the UI with the $20
| subscription - can I use the same subscription to use the CLI?
| I'm afraid it's like AWS API billing, where there is no limit
| to how much I can use and then I get a surprise bill.
| eschluntz wrote:
| It is API billing like AWS - you pay for what you use.
| Every time you exit a session we print the cost, and in the
| middle of a session you can do /cost to see your cost so
| far that session!
|
| You can track costs in a few ways and set spend limits to
| avoid surprises: https://docs.anthropic.com/en/docs/agents-
| and-tools/claude-c...
| mindok wrote:
| Which is theoretically great, but if anyone can get an
| Aussie credit card to work, please let me know.
| robbiep wrote:
| I haven't had an issue with Aussie cards?
|
| But I still hit limits. I use ClaudeMind with JetBrains
| stuff and there is a max on input tokens (I believe); I
| am 'tier 2' but it doesn't look like I can go past this
| without an enterprise agreement.
| danw1979 wrote:
| What I really want (as a current Pro subscriber) is a
| subscription tier ("Ultimate" at ~$120/month ?) that
| gives me priority access to the usual chat interface, but
| _also_ a bunch of API credits that would ensure Claude
| and I can code together for most of the average working
| month (reasonable estimate would be 4 hours a day, 15
| days a month).
|
| i.e I'd like my chat and API usage to be all included
| under a flat-rate subscription.
|
| Currently Pro doesn't give me any API credits to use with
| coding assistants (Claude Code included?), which is
| completely disjointed. And do I still need to be a business
| to use the API?
|
| Honestly, Claude is so good, just please take my money
| and make it easy to do the above !
| punkpeye wrote:
| If you are open to alternatives, try https://glama.ai/gateway
|
| We currently serve ~10bn tokens per day (across all models).
| OpenAI compatible API. No rate limits. Built in logging and
| tracing.
|
| I work with LLMs every day, so I am always on top of adding
| models. 3.7 is also already available.
|
| https://glama.ai/models/claude-3-7-sonnet-20250219
|
| The gateway is integrated directly into our chat
| (https://glama.ai/chat). So you can use most of the things
| that you are used to having with Claude. And if anything is
| missing, just let me know and I will prioritize it. If you
| check our Discord, I have a decent track record of being
| receptive to feedback and quickly turning around features.
|
| Long term, Glama's focus is predominantly on MCPs, but chat,
| gateway and LLM routing is integral to the greater vision.
|
| I would love feedback if you are going to give it a try:
| frank@glama.ai
| airstrike wrote:
| The issue isn't API limits, but web UI limits. We can
| always get around the web interface's limits by using the
| claude API directly but then you need to have some other
| interface...
| punkpeye wrote:
| The API still has limits. Even if you are on the highest
| tier, you will quickly run into those limits when using
| coding assistants.
|
| The value proposition of Glama is that it combines UI and
| API.
|
| While everyone focuses on either one or the other, I've
| been splitting my time equally working on both.
|
| Glama UI would not win against Anthropic if we were to
| compare them by the number of features. However, the
| components that I developed were created with craft and
| love.
|
| You have access to:
|
| * Switch models between OpenAI/Anthropic, etc.
|
| * Side-by-side conversations
|
| * Full-text search of all your conversations
|
| * Integration of LaTeX, Mermaid, rich-text editing
|
| * Vision (uploading images)
|
| * Response personalizations
|
| * MCP
|
| * Every action has a shortcut via cmd+k (ctrl+k)
| airstrike wrote:
| Ok, but that's not the issue the parent was mentioning.
| I've never hit API limits but, like the original comment
| mentioned, I too constantly hit the web interface limits
| particularly when discussing relatively large modules.
| glenstein wrote:
| Right, that's how I read it also. It's not that there's
| no limits with the API, but that they're appreciably
| different.
| cmdtab wrote:
| Do you have deepseek r1 support? I need it for a current
| product I'm working on.
| pclmulqdq wrote:
| They are just selling a frontend wrapper on other
| people's services, so if someone else offers deepseek,
| I'm sure they will integrate it.
| punkpeye wrote:
| Indeed we do https://glama.ai/models/deepseek-r1
|
| It is provided by DeepSeek and Avian.
|
| I am also midway through enabling a third provider (Nebius).
|
| You can see all models/providers over at
| https://glama.ai/models
|
| As another commenter in this thread said, we are just a
| 'frontend wrapper' around other people's services.
| Therefore, it is not particularly difficult to add models
| that are already supported by other providers.
|
| The benefit of using our wrapper is that you can use a
| single API key and get one bill for all your AI usage, and
| you don't need to hack together your own logic for routing
| requests between providers, handling failovers, keeping
| track of costs, or worrying about what happens if a
| provider goes down.
|
| The market at the moment is hugely fragmented, with many
| providers unstable, constantly shifting prices, etc. The
| benefit of a router is that you don't need to worry about
| those things.
| cmdtab wrote:
| Yeah I am aware. I use open router at the moment but I
| find it lacks a good UX.
| punkpeye wrote:
| Open router is great.
|
| They have a very solid infrastructure.
|
| Scaling infrastructure to handle billions of tokens is no
| joke.
|
| I believe they are approaching 1 trillion tokens per
| week.
|
| Glama is way smaller. We only recently crossed 10bn
| tokens per day.
|
| However, I have invested a lot more into UX/UI of that
| chat itself, i.e. while OpenRouter is entirely focused on
| API gateway (which is working for them), I am going for a
| hybrid approach.
|
| The market is big enough for both projects to co-exist.
| light_triad wrote:
| Thanks for this - exciting launch. Do you have examples of cool
| applications or demos that the HN crowd should check out?
| eschluntz wrote:
| hi! I've been working on demos where I let Claude Code run
| for hours at a time on a sandboxed project:
| https://x.com/ErikSchluntz/status/1894104265817284770
|
| TLDR: asking claude to speed up my code once 1.8x'd perf, but
| putting it in a loop telling it to make it faster for 2 hours
| led to a 500x speedup!
| LouisSayers wrote:
| I assume you had a comprehensive test suite?
| light_triad wrote:
| YES!! I need infinite credits for infinite Claude Code.
| Will try it to get Claude to do all my work.
| catherinewu wrote:
| We built Claude Code with Claude Code!
| Karrot_Kream wrote:
| This is super cool and I hope y'all highlight it
| prominently!
| light_triad wrote:
| Best demo - it's Claude Code all the way down. Claude Code
| === Claude Code
| logicallee wrote:
| >Do you have examples of cool applications or demos that the
| HN crowd should check out?
|
| Not OP obviously, but I've built so many applications with
| Claude, here are just a few:
|
| [1]
|
| Mockup of Utopian infrastructure support button (this is just
| a mockup, the buttons don't do anything): https://claude.site
| /artifacts/435290a1-20c4-4b9b-8731-67f5d8...
|
| [2]
|
| Robot body simulation: https://claude.site/artifacts/6ffd3a73
| -43d6-4bdb-9e08-02901d...
|
| [3]
|
| 15-piece slider puzzle: https://claude.site/artifacts/4504269
| b-69e3-4b76-823f-d55b3e...
|
| [4]
|
| Canada joining the U.S., checklist:
| https://claude.site/artifacts/6e249e38-f891-4aad-
| bb47-2d0c81...
|
| [5]
|
| Secure encryption and decryption with AES-256-GCM with
| password-based key derivation:
|
| https://claude.site/artifacts/cb0ac898-e5ad-42cf-a961-3c4bf8.
| ..
|
| (Try to decrypt this message
|
| kFIxcBVRi2bZVGcIiQ7nnS0qZ+Y+1tlZkEtAD88MuNsfCUZcr6ujaz/mtbEDs
| LOquP4MZiKcGeTpBbXnwvSLLbA/a2uq4QgM7oJfnNakMmGAAtJ1UX8qzA5qMh
| 7b5gze32S5c8OpsJ8=
|
| With the password "Hello Hacker News!!" (without quotation
| marks))
|
| [6]
|
| Supply-demand visualizer under tariffs and subsidies: https:/
| /claude.site/artifacts/455fe568-27e5-4239-afa4-051652...
|
| [7]
|
| fortune cookie program: https://claude.site/artifacts/d7cfa4a
| e-6946-47af-b538-e6f992...
|
| [8]
|
| Household security training for classified household members
| (includes self-assessment and certificate): https://claude.si
| te/artifacts/7754dae3-a095-4f02-b4d3-26f1a5...
|
| [9]
|
| public service accountability training program: https://claud
| e.site/artifacts/b89a69fb-1e46-4b5c-9e96-2c29dd...
|
| [10]
|
| Nuclear non-proliferation "big brother" agent technical
| demonstration: https://claude.site/artifacts/555d57ba-6b0e-41
| a1-ad26-7c90ca...
|
| Dating stuff:
|
| [11]
|
| Dating help: Interest Level Assessment Game (is she
| interested?) https://claude.site/artifacts/523c935c-274e-4efa
| -8480-1e09e9...
|
| [12]
|
| Dating checklist: https://claude.site/artifacts/10bf8bea-36d5
| -407d-908a-c1e156...
| mike_hearn wrote:
| Great, thanks! Could you compare this new tool to Aider?
| thegeomaster wrote:
| Thank you to the team. Looks like a great release. Already
| switching existing prompts to Claude 3.7 to see the eval
| results :)
| oofbaroomf wrote:
| Do you think Claude Code is "better", in terms of capabilities
| and token efficiency, than other tools such as Cline, Cursor,
| or Aider?
| bcherny wrote:
| Claude Code is a research preview -- it's more rough, lets
| you see model errors directly, etc. so it's not as polished
| as something like Cline. Personally I use all of the above.
| Engineers here at Anthropic also tend to use Claude Code
| alongside IDEs like Cursor.
| curl-up wrote:
| In the console, TPM limit for 3.7 is not shown (I'm tier 4).
| Does it mean there is no limit, or is it just pending and is
| "variable" until you set it to some value?
| catherinewu wrote:
| We set the Claude Code rate limits to be usable as a daily
| driver. We expect hitting rate limits for synchronous usage
| to be uncommon. Since this is a research preview, we
| recommend you start small as you try the product though.
| curl-up wrote:
| Sorry, I completely missed you're from the Code team. I was
| actually asking about the vanilla API. Any insights into
| those limits? It's still missing the TPM number in the
| console.
| neoromantique wrote:
| Thanks for the product! Glad to hear the (so-called) "safety"
| is being walked back; previously Claude has felt a little like
| it was treating me as a child. Excited to try it out now.
| jumploops wrote:
| From the release you say: "[..] in developing our reasoning
| models, we've optimized somewhat less for math and computer
| science competition problems, and instead shifted focus towards
| real-world tasks that better reflect how businesses actually
| use LLMs."
|
| Can you tell us more about the trade-offs here?
|
| Also, are you using synthetic data for improving the responses
| here, or are you purely leveraging data from usage/partner's
| usage?
| davely wrote:
| I'm in the middle of a particularly nasty refactor of some
| legacy React component code (hasn't been touched in 6 years,
| old class based pattern, tons of methods, why, oh, why did we
| do XYZ) at work and have been using Aider for the last few days
| and have been hitting a wall. I've been digging through Aider's
| source code on Github to pull out prompts and try to write my
| own little helper script.
|
| So, perfect timing on this release for me! I decided to install
| Claude Code and it is making short work of this. I love the
| interface. I love the personality ("Ruminating", "Schlepping",
| etc).
|
| Just an all around fantastic job!
|
| (This makes me especially bummed that I really messed up my OA
| a while back for you guys. I'll try again in a few months!)
|
| Keep on doing great work. Thank you!
| bcherny wrote:
| Hey thanks so much! <3
| fsndz wrote:
| Anthropic is back and cementing its place as the creator of the
| best coding models--bravo!
|
| With Claude Code, the goal is clearly to take a slice of Cursor
| and its competitors' market share. I expected this to happen
| eventually.
|
| The app layer has barely any moat, so any successful app with
| the potential to generate significant revenue will eventually
| be absorbed by foundation model companies in their quest for
| growth and profits.
| keithwhor wrote:
| I think an argument could be reasonably made that the app
| layer is the only moat. It's more likely Anthropic eventually
| has to acquire Cursor to cement a position here than they
| out-compete it. Where, why, what brand and what product
| customers swipe their credit cards for matters -- a lot.
| fsndz wrote:
| if Claude Code offers a better experience, users will
| rapidly move from cursor to Claude Code.
|
| Claude is for Code: https://medium.com/thoughts-on-machine-
| learning/claude-is-fo...
| keithwhor wrote:
| (1) That's a big if. It requires building a team
| specialized in delivering what Cursor has already
| delivered which is no small task. There are probably only
| a handful of engineers on the planet that have or can be
| incentivized to develop the product intuition the Cursor
| founders have developed in the market already. And even
| then: say I'm an aspiring engineer/PM at Anthropic. Why
| would I choose to spend all of my creative energy
| _copying what somebody else is doing_ for the same pay
| I'd get working on something greenfield, or more
| interesting to me, or more likely to get me a promotion?
|
| (2) It's not clear to me that users (or developers)
| actually behave this way in practice. Engineering is a
| bit of a cargo cult. Cursor got popular because it was
| good but it also got popular because it _got popular_.
| CharlesW wrote:
| > _It requires building a team specialized in delivering
| what Cursor has already delivered which is no small
| task._
|
| There are several AIDEs out there, and based on working
| with Cursor, VS Code, and Windsurf there doesn't seem to
| be much of a difference (although I like Windsurf best).
| What moat does Cursor have?
| aquariusDue wrote:
| Just chiming in to say that AIDEs (Artificial
| Intelligence Development Environments, I suppose) is such
| a good term for these new tools imo.
|
| It's one thing to retrofit LLMs into existing tools but
| I'm more curious how this new space will develop as time
| goes on. Already stuff like the Warp terminal is pretty
| useful in day to day use.
|
| Who knows, maybe this time next year we'll see more
| people programming by voice input instead of typing.
| Something akin to Talon Voice supercharged by a local LLM
| hopefully.
| Etheryte wrote:
| In my opinion you're vastly overestimating how much of a
| moat Cursor has. In broad strokes, it builds an index of
| your repo for easier referencing and then adds some handy
| UI hooks so you can talk to the model; there really isn't
| that much more going on. Yes, the autocomplete is nice at
| times, but it's at best like pair programming with a new
| hire. Every big player in the AI space could replicate
| what they've done, it's only a matter of whether they
| consider it worth the investment or not given how fast
| the whole field is moving.
| keithwhor wrote:
| Conversely, I think you're overestimating the impact of
| the value (or lack thereof) of technology over
| distribution and market timing.
| eschluntz wrote:
| hi! I've been using Claude Code in a very complementary way
| to my IDE, and one of the reasons we chose the terminal is
| because you can open it up inside whichever IDE you want!
| biker142541 wrote:
| I wonder if they will offer competitive request counts
| against Cursor. Right now, at least for me, the biggest
| downside to Claude is how fast I blow through the limits
| (Pro) and hit a wall.
|
| At least with Cursor, I can use all "premium" 500 completions
| and either buy more, or be patient for throttled responses.
| Attummm wrote:
| Hi Boris,
|
| Would it be possible to bring back sonnet 2024 June?
|
| That model was the most attentive.
|
| Because we lost that model, this release is a net loss in
| value for me personally.
| ac29 wrote:
| Seems to still be available via API as
| claude-3-5-sonnet-20240620
| joshuabaker2 wrote:
| Hi Boris, love working with Claude! I do have a question--is
| there a plan to have Claude 3.5 Sonnet (or even 3.7!) made
| available on ca-central-1 for Amazon Bedrock anytime soon? My
| company is based in Canada and we deal with customer
| information that is required to stay within Canada, and the
| most recent model from Anthropic we have available to us is
| Claude 3.
| pbronez wrote:
| Concur. Models aren't real until I can run them inside my
| perimeter.
| matznerd wrote:
| Hi Boris et al, can you comment on increased conversation
| lengths or limits through the UI? I didn't see that mentioned
| in the blog post, but it is a continued major concern of
| $20/month Claude.ai users. Is this an issue that should be
| fixed now or still waiting on a larger deployment via Amazon or
| something? If not now, when can users expect the conversation
| length limitations to be increased?
| LouisSayers wrote:
| Awesome work, Claude is amazingly good at writing code that is
| pretty much plug and play.
|
| Could you speak at all about potential IDE integrations? An
| integration into Jetbrains IDEs would be super useful - I
| imagine being able to highlight a bit of code and having a
| plugin check the code graph to see dependencies, tests etc that
| might be affected by a change.
|
| Copying and pasting code constantly is starting to seem a bit
| primitive.
| eschluntz wrote:
| Part of our vision is that because Claude Code is just in the
| terminal, you can bring it into any IDE (or server) you want!
| Obviously that has tradeoffs of not having a full GUI of the
| IDE though
| elliot07 wrote:
| I much prefer the standalone design to being editor
| integrated.
| unshavedyak wrote:
| Anyone know how to get access to it? Notably I'm debating
| purchasing it for Claude Code, but being on NixOS I want to
| make sure I can install it first.
|
| If this Code preview is only open to subscribers, it means I
| have to subscribe before I can even see if the binary works
| for me. Hmm
|
| _edit_ : Oh, there's a link to "joining the preview" which
| points to: https://docs.anthropic.com/en/docs/agents-and-
| tools/claude-c...
| ben30 wrote:
| JetBrains has an official MCP plugin
| LouisSayers wrote:
| Thanks, I wasn't aware of the Model Context Protocol!
|
| For anyone interested - you can extend Claude's
| functionality by allowing it to run commands via a local
| "MCP server" (e.g. make code commits, create files,
| retrieve third party library code etc).
|
| Then when you're running Claude it asks for permission to
| run a specific tool inside your usual Claude UI.
|
| https://www.anthropic.com/news/model-context-protocol
|
| https://github.com/modelcontextprotocol/servers
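|
| For example, the Claude desktop app reads MCP servers from a
| config file; a minimal entry (the filesystem server comes from
| the servers repo linked above, and the path is just a
| placeholder) looks roughly like:
|
|     {
|       "mcpServers": {
|         "filesystem": {
|           "command": "npx",
|           "args": ["-y", "@modelcontextprotocol/server-filesystem",
|                    "/path/to/your/project"]
|         }
|       }
|     }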
| Falimonda wrote:
| CLAUDE NUMBA ONE!!!
|
| Congrats on the new release!
| Flux159 wrote:
| Is there a way to always accept certain commands across
| sessions? Specifically for things like reading or updating
| files I don't want to have to approve that each time I open a
| new repl.
|
| Also, is there a way to switch models between 3.5-sonnet and
| 3.5-sonnet-thinking? Got the initial impression that the
| thinking model is using an excessive amount of tokens on first
| use.
| eschluntz wrote:
| Right now no, but if you run in Docker, you can use
| `--dangerously-skip-permissions`.
|
| Some commands could be totally fine in one context, but bad
| in a different one, e.g. pushing to master.
| bcherny wrote:
| When you are prompted to accept a bash command, we should be
| giving you the option to not ask again. If you're not seeing
| that for a specific bash command, would you mind running /bug
| or filing an issue on Github?
| https://github.com/anthropics/claude-code/issues
|
| Thinking and not thinking is actually the same model! The
| model thinks automatically when you ask it to. If you don't
| explicitly ask it to think, it won't use thinking.
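|
| (For the raw API, the same single-model behaviour is exposed as
| an optional thinking budget per request. A small sketch - the
| parameter shape is my reading of the 3.7 docs, so treat it as
| illustrative:
|
|     import anthropic
|
|     client = anthropic.Anthropic()
|     resp = client.messages.create(
|         model="claude-3-7-sonnet-20250219",
|         max_tokens=20000,  # must be larger than the thinking budget
|         # same model either way; thinking is switched on per request
|         thinking={"type": "enabled", "budget_tokens": 16000},
|         messages=[{"role": "user", "content": "Is 2^61 - 1 prime?"}],
|     )
|     for block in resp.content:
|         if block.type == "thinking":
|             print("[thinking]", block.thinking[:200], "...")
|         elif block.type == "text":
|             print(block.text)
| )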
| logicallee wrote:
| Can you give some insight into how you chose the reply limit
| length? It seems to cut off many useful programs that are
| 80%-90% done and if the limit were just a little higher it
| would be a source of extraordinary benefit.
| bcherny wrote:
| If you can reproduce that, would you mind reporting it with
| /bug?
| logicallee wrote:
| Just tried it with Claude 3.7 Sonnet, here is the share:
| https://claude.ai/share/68db540d-a7ba-4e1f-882e-f10adf64be91
| and it doesn't finish outputting the program. (It's missing
| the rest of the application function and the main
| function.)
|
| Here are steps to reproduce.
|
| Background/environment:
|
| ChatGPT helped me build this complete web browser in
| Python:
|
| https://taonexus.com/publicfiles/feb2025/71toy-browser-
| with-...
|
| It looks like this, versus the eventual goal:
| https://imgur.com/a/j8ZHrt1
|
| in 1055 lines. But eventually it couldn't improve on it
| anymore, ChatGPT couldn't modify it at my request so that
| inline elements would be on the same line.
|
| If you want to run it just download it and rename it to
| .py, I like Anaconda as an environment, after reading the
| code you can install the required libraries with:
|
| conda install -c conda-forge requests pillow urllib3
|
| then run the browser from the Anaconda prompt by just
| writing "python " followed by the name of the file.
|
| 2.
|
| I tried to continue to improve the program with Claude, so
| that in-line elements would be on the same line.
|
| I performed these reproduceable steps:
|
| 1. copied the code and pasted it into a Claude chat window
| with ctrl-v. This keeps it in the chat as paste.
|
| 2. Gave it the prompt "This complete web browser works but
| doesn't lay out inline elements inline, it puts them all on
| a new line, can you fix it so inline elements are inline?"
|
| It spit out code until it hit section 8 out of 9 which is
| 70% of the way through and gave the error message "Claude
| hit the max length for a message and has paused its
| response. You can write Continue to keep the chat going".
| Screenshot:
|
| https://imgur.com/a/oSeiA4M
|
| So I wrote "Continue" and it stops when it is 90% of the
| way done.
|
| Again it got stuck at 90% of the way done, second
| screenshot in the above album.
|
| So I wrote "Continue" again.
|
| It just gave an answer but it never finished the program.
| There's no app entry in the program; it completely omitted
| the rest of the main class itself and the callback to call
| it, which would be like:
|
|     def run(self):
|         self.root.mainloop()
|
|     #########################################################
|     # main
|     #########################################################
|     if __name__ == "__main__":
|         sys.setrecursionlimit(10**6)
|         app = ToyBrowser()
|         app.run()
|
| so it only output a half-finished program. It explained
| that it was finished.
|
| I tried telling it "you didn't finish the program, output
| the rest of it" but doing so just got it stuck rewriting it
| without finishing it. Again it said it ran into the limit,
| again I said Continue, and again it didn't finish it.
|
| The program itself is only 1055 lines, it should be able to
| output that much.
| bakugo wrote:
| Can you let the API team know that the /v1/models endpoint has
| been broken for hours? Thanks.
| latetomato wrote:
| Hello! Member of the API team here. We're unable to find
| issues with the /v1/models endpoint--can you share more
| details about your request? Feel free to email me at
| suzanne@anthropic.com. Thank you!
| bakugo wrote:
| It always returns a Not Found error for me. Using the curl
| command copied directly from the docs:
|
| $ curl https://api.anthropic.com/v1/models \
|     --header "x-api-key: $ANTHROPIC_API_KEY" \
|     --header "anthropic-version: 2023-06-01"
|
| {"type":"error","error":{"type":"not_found_error",
| "message":"Not found"}}
|
| Edit: Tried creating a different API key and it works with
| that one. Weird.
| lebovic wrote:
| If you can reproduce the issue with the other API key,
| I'd also love to debug this! Feel free to share the curl
| -vv output (excluding the key) with the Anthropic email
| address in my profile
| kevinz3 wrote:
| Hey guys! I was wondering why you chose to build Claude Code
| as a CLI when many popular choices like Cursor and Windsurf
| fork VS Code. Do you envision the future of Claude Code
| abstracting away the codebase entirely?
| bcherny wrote:
| We wanted to bring the model to people where they are without
| having to commit to a specific tool or radically change their
| workflows. We also wanted to make a way that lets people
| experience the model's coding abilities as directly as
| possible. This has tradeoffs: it uses a lot of tokens, and is
| rough (eg. it shows you tool errors and model weirdness), but
| it also gives you a lot of power and feels pretty awesome to
| use.
| unshavedyak wrote:
| I like this quite a bit, thank you! I prefer Helix editor
| and i hate the idea of running VSCode just to access some
| random Code assistant
| babyshake wrote:
| One thing I would love to have fixed - I type in a prompt, the
| model produces 90% or even 100% of the answer, and then shows
| an error that the system is at capacity and can't produce an
| answer. And then the response that has already been provided is
| removed! Please just make it where I can still have access to
| the response that has been provided, even if it is incomplete.
| rishikeshs wrote:
| This. Claude team, please fix this!
| pbor wrote:
| Hi and congrats on the launch!
|
| Will check out Claude Code soon, but in the meantime one
| unrelated other feature request: Moving existing chats into a
| project. I have a number of old-ish but super-useful and
| valuable chats (that are superficially unrelated) that I would
| like to bring together in a project.
| ipsum2 wrote:
| Why gatekeep Claude Code, instead of releasing the code for it?
| It seems like a direct increase in revenue/API sales for your
| company.
| sangnoir wrote:
| I'm not affiliated with Anthropic, but it seems like doing
| this will commoditize Claude (the AIaaS). Hosted AI providers
| are doing all they can to move away from being
| interchangeable commodities; it's not good for Anthropic's
| revenue for users to be able to easily swap out the backend
| of Claude Code for a local Ollama backend, or a cheaper hosted
| DeepSeek. Open sourcing Claude Code would make this option 1
| or 2 forks/PRs away.
| Ninjinka wrote:
| How is your largest customer, Cursor, taking the news that
| you'll be competing directly with them?
| behnamoh wrote:
| Honestly, is this something that Anthropic should be worried
| about? You could ask the same question about all the startups
| that were destroyed by OpenAI.
| sebzim4500 wrote:
| They probably aren't thrilled, but a lot of users will prefer
| a UI and I doubt Anthropic has the spare cycles to make a
| full Cursor competitor.
| alienthrowaway wrote:
| Unless Cursor had agreed to an exclusivity agreement with
| Anthropic, Anthropic was (and still is) at risk of Cursor
| moving to a different provider or using their middleman
| position to train/distill their own model that competes with
| Anthropic.
| themgt wrote:
| Is there / are you planning a way to set $ limits per API key?
| Far as I can tell the "Spend limits" are currently per-org only
| which seems problematic.
| bcherny wrote:
| Good idea! Tracking here:
| https://github.com/anthropics/claude-code/issues/16
| l1n wrote:
| You can with Workspaces - https://support.anthropic.com/en/ar
| ticles/9796807-creating-a...
| nprateem wrote:
| Does this actually have an 8k (or more) output context via the
| API?
|
| 3.5 did with a beta header but while 3.6 claimed to, it always
| cut its responses after 4k.
|
| IIRC someone reported it on GH but had no reply.
| antirez wrote:
| One of the silver bullets of Claude, in the context of coding,
| is that it does NOT use RAG when you use it via the web
| interface. Sure, you burn your tokens, but the model sees
| everything and this lets it reply in a much better way. Is
| Claude Code doing the same, just doing document-level RAG,
| so that if a document is relevant and _if it fits_, the whole
| document is put inside the context window? I really hope
| so! Also, this means that splitting large code bases into
| manageable file sizes will make more and more sense. Another Q:
| is the context size of Sonnet 3.7 the same as 3.5's? Btw,
| thank you _so much_ for Claude Sonnet; in recent months it has
| changed the way I work and I'm able to do a lot more now.
| bcherny wrote:
| Right -- Claude Code doesn't use RAG currently. In our
| testing we found that agentic search out-performed RAG for
| the kinds of things people use Code for.
| marlott wrote:
| Interesting - can you elaborate a little on what you mean
| by agentic search here?
| antirez wrote:
| I guess it's what is sometimes called "self-RAG": the agent
| looks inside the files the way a human would, to find what's
| relevant.
| kadushka wrote:
| As opposed to vector search, or...?
| FeepingCreature wrote:
| To my knowledge these are the options:
|
| 1. RAG: A model looks at the question, pulls up some
| associated data and hopes that it helps.
|
| 2. Self-RAG: The model intentionally triggers a lookup
| for some keywords. This can be via a traditional RAG or
| just keyword search, like grep.
|
| 3. Full Context: Just jam everything in the context
| window. The model uses its attention mechanism to pick
| out the parts it needs. Best but most expensive of the
| three, especially with repeated queries.
|
| Aider uses kind of a hybrid of 2 and 3: you specify files
| that go in the context, but Aider also uses Tree-Sitter
| to get a map of the entire code, ie. function headers,
| class definitions etc., that is provided in full. On that
| basis, the model can then request additional files to be
| added to the context.
| siva7 wrote:
| Will Claude be available on Azure?
| rgomez wrote:
| What kind of sorcery did you use to create Claude? Honest
| question :)
| bcherny wrote:
| Reticulating...
| TIPSIO wrote:
| What are your thoughts on having a UI/design benchmark?
| riku_iki wrote:
| Are there plans to add a web search function over some core
| websites (SO, API docs)? Competitors have it, and in my
| experience this provides very good grounding for coding tasks
| (way fewer API functions hallucinated).
| artvandalai wrote:
| Any updates on web search?
| adastra22 wrote:
| When are you providing an alternative to email magic login
| links?
| sebzim4500 wrote:
| Did you guys ever fix the issue where if UK users wanted to use
| the API they have to provide a VAT number?
| posix86 wrote:
| Claude is my go to llm for everything, sounds corny but it's
| literally expanding the circle of what I can reasonably learn,
| manyfold. Right now I'm attempting to read old philosophical
| texts (without any background in similar disciplines), and
| without claude's help to explain the dense language in simpler
| terms & discuss its ideas, give me historical contexts,
| explaining why it was written this or that way, compare it
| against newer ideas - I would've given up many times.
|
| At work I use it many times daily in development. Its concise
| mode is a breath of fresh air compared to any other llm I've
| tried. It has helped me find bugs in foreign code bases,
| explained the tech stack to me, and written bash scripts, saving
| me dozens of hours of work & many nerves. It generally gets me
| to places I wouldn't otherwise reach due to time constraints &
| nerves.
|
| The only nitpick is that the service reliability is a bit worse
| than others, forcing me sometimes to switch to others. This is
| probably a hard to answer question, but are there plans to
| improve that?
| throwaway0123_5 wrote:
| I'm curious why there are no results for the "Claude 3.7
| Extended Thinking" on SWE-Bench and Agentic tool use.
|
| Are you finding that extended thinking helps a lot when the
| whole problem can be posed in the prompt, but that it isn't a
| major benefit for agentic tasks?
|
| It would be a bit surprising, but it would also mirror my
| experiences, and the benchmarks which show Claude 3.5 being
| better at agentic tasks and SWE tasks than all other models,
| despite not being a reasoning model.
| danso wrote:
| Been a long time casual -- i.e. happy to fix my code by asking
| questions and copy/pasting individual snippets via the chat
| interface. Decided to give the `claude` terminal tool a run and
| have to admit it looks like a fantastic tool.
|
| Haven't tried to build a modern JS web app in _years_ -- it
| took the claude tool just a few minutes of prompting to convert
| and refactor an old clunky tool into a proper project
| structure, and using svelte and vite and tailwind (which I
| haven't built with before). Trying to learn how to even
| scaffold a modern app has felt daunting and this eliminates 99%
| of that friction.
|
| One funny quirk: I asked it to build a test suite (I know zilch
| about JS testing frameworks, so it picked vitest for me) for
| the newly refactored app. I noticed that 3 of the 20 tests
| failed and so I asked it to run vitest for itself and fix the
| failing things. 2 minutes later, and now 7 tests were
| failing...
|
| Which is very funny to me, but also not a big deal. Again, it's
| such a chore to research test libs and then set things up to
| their conventions. That the claude tool built a very usable
| scaffold that I can then edit and iterate on is such a huge
| benefit by itself, I don't need (nor desire) the AI to be
| a complete turnkey solution.
| bhouston wrote:
| Have you seen https://mycoder.ai? Seems quite similar. It was
| my own invention and it seems that you guys are thinking along
| similar lines - incredibly similar lines.
| handfuloflight wrote:
| Have _you_ seen https://www.codebuff.com?
| farco12 wrote:
| Thank you for the update!
|
| I recently attempted to use the Google Drive integration but
| didn't follow through with connecting because Claude wanted
| access to my entire Google Drive. I understand this simplifies
| the user experience and reduced time to ship, but is there
| any way the team can add "reduce the access scope of Google
| Drive integration" to your backlog? Thank you!
|
| Also, I just caught the new Github integration. Awesome.
| lintaho wrote:
| For the pokemon benchmark, what happened after the Lt Surge
| gym? Did the model stall or run out of context or something
| similar?
| swairshah wrote:
| Why not just open source Claude Code? People have tried to
| reverse engineer the minified version:
| https://gist.githubusercontent.com/1rgs/e4e13ac9aba301bcec28...
| cowpig wrote:
| It would be great if we could upgrade API rate limits. I've
| tried "contacting sales" a few times and never received a
| response.
|
| edit: note that my team mostly hits rate limits using things
| like aider and goose. 80k input tokens is not enough when in a
| flow, and I would love to experiment with a multi-agent
| workflow using claude
| levocardia wrote:
| Which starter pokemon does Claude typically choose?
| lcnPylGDnU4H9OF wrote:
| I'd also be interested in stats on Helix Fossil vs. Dome
| Fossil.
| gwd wrote:
| Just started playing with the command-line tool. First reaction
| (after using it for 5 minutes): I've been using `aider` as a
| daily driver, with Claude 3.5, for a while now. One of the
| things I appreciate about aider is that it tells you how much
| each query cost, and what your total cost is this session. This
| makes it low-key easy to keep tabs on the cost of what I'm
| doing. Any chance you could add that to claude-code?
|
| I'd also love to have it in a language that can be compiled,
| like golang or rust, but I recognize a rewrite might be more
| effort than it's worth. (Although maybe less with claude code
| to help you?)
|
| EDIT: OK, 10 minutes in, and it seems to have major issues
| doing basic patches to my Golang code; the most recent thing it
| did was add a line with incorrect indentation, then try three
| times to update it with the correct indentation, getting
| "String to replace not found in file" each time. Aider with
| claude 3.5 does this really well -- not sure what the
| confounding issue is here, but might be worth taking a look at
| their prompt & patch format to see how they do it.
| davidbarker wrote:
| If you do `/cost` it will tell you how much you've spent
| during that session so far.
| eschluntz wrote:
| hi! You can do /cost at any time to see what the current
| session has cost
| xianshou wrote:
| Any way to parallelize tool use? When I go into a repo and ask
| "what's in here", I'm aiming for a summary that returns in 20
| seconds.
| andrewchilds wrote:
| Hi Boris! Thank you for your work on Claude! My one pet peeve
| with Claude specifically, if I may: I might be working on a
| Svelte codebase and Claude will happily ignore that context and
| provide React code. I understand why, but I'd love to see much
| less of a deep reliance on React for front-end code generation.
| PKop wrote:
| It would be great to have a C# / .NET SDK available for Claude
| so it can be integrated into Semantic Kernel [0][1]. Are there
| any plans for this?
|
| [0] https://github.com/microsoft/semantic-
| kernel/issues/5690#iss...
|
| [1] https://github.com/microsoft/semantic-kernel/pull/7364
| timojaask wrote:
| Hi! I've been using Claude for macOS and iOS coding for a
| while, and it's mostly great, but it's always using deprecated
| APIs, even if I instruct it not to. It will correct the mistake
| if I ask it to, but then in later iterations, it will sometimes
| switch back to using a deprecated API. It also produces a lot
| of code that just doesn't compile, so a lot of time is spent
| fixing the made up or deprecated APIs.
| kapnap wrote:
| Any chance there will be a way to copy and paste the responses
| into other text boxes (e.g., a new email) and not have to re-
| jig the formatting?
|
| Lists, numbers, tabs, etc. are all a little time consuming...
| minor annoyance but thought I'd share.
| wellthisisgreat wrote:
| Hi, what are the privacy terms for Claude Code? Is it
| memorizing the codebase it's helping with? From an enterprise
| standpoint
| joevandyk wrote:
| It would be amazing to be able to use an API key to submit
| prompts that use our Project Knowledge. That doesn't seem to be
| currently possible, right?
| robbomacrae wrote:
| Awesome to see a new Claude model - since 3.5 it's been my go-to
| for all code related tasks.
|
| I'd really like to use Claude Code in some of my projects vs
| just sharing snippets via the UI, but I'm curious how doing
| this from our source directory might affect our IP, including
| NDA's, trade secret protections, prior disclosure rules on
| (future) patents, open source licensing restrictions re:
| redistribution etc?
|
| Also hi Erik! - Rob
| dailykoder wrote:
| Folks, let me tell you, AI is a big league player, it's a real
| winner, believe me. Nobody knows more about AI than I do, and I
| can tell you, it's going to be huge, just huge. The
| advancements we're seeing in AI are tremendous, the best, the
| greatest, the most fantastic. People are saying it's going to
| change the world, and I'm telling you, they're right, it's
| going to be yuge. AI is a game-changer, a real champion, and
| we're going to make America great again with the help of this
| incredible technology, mark my words.
| fragmede wrote:
| Now that the world's gotten used to the existence of AI, any
| hope on removing the guardrails on Claude? I don't need it to
| answer "How do I make meth", but I would like to not have to
| social engineer my prompts. I'd like it to just write the code
| I asked for and not judge me on how ethical the code might be.
|
| Eg Claude will refuse to write code to wget a website and parse
| the html if you ask it to scrape your ex girlfriend's Instagram
| profile, for ethical and tos reasons, but if you phrase the
| request differently, it'll happily go off and generate code
| that does that exact thing.
|
| Does that really provide value as a business transaction?
| luke-stanley wrote:
| My key got killed months ago when I tested it on a PDF, and
| support never got back to me so I am waiting for OpenRouter
| support!
| throw83288 wrote:
| Serious question: What advice would you give to a Computer
| Science student in light of these tools?
| danw1979 wrote:
| Serious answer: learn to code.
|
| You still need to know what good code looks like to use these
| tools. If you go forward in your career trusting the output
| of LLMs without the skills to evaluate the correctness,
| style, functionality of that code then you will have
| problems.
|
| People still write low level machine code today, despite
| compilers having existed for 70+ (?) years.
|
| We'll always need full-stack humans who understand everything
| down to the electrons even in the age of insane automation
| that we're entering.
| _cs2017_ wrote:
| Your footnote 3 seems to imply that the low number for o1 and
| Grok3 is without parallelism, but I don't think it's publicly
| known whether they use internal parallelism? So perhaps the low
| number already uses parallelism, while the high number uses
| even more parallelism?
|
| Also, curious if you have any intuition as to why the no-
| parallelism number for AIME with Claude (61.3%) is quite low
| (e.g., relative to R1 87.3% -- assuming it is an apples to
| apples comparison)?
| failerk wrote:
| I tried signing up to use Claude about 6 months ago and ran
| into an error on the signup page. For some reason this
| completely locked me out from signing up since a phone number
| was tied to the login. I have submitted requests to get removed
| from this blacklist and heard nothing. The times I have tried
| to reach out on Twitter were never responded to. Has the
| customer support improved in the last 6 months?
| galaxyLogic wrote:
| The thing I would like automated is highlighting a function in
| my code, then asking the AI to move it to a new module file and
| import that new module.
|
| I would like this to happen easily like hitting a menu or
| button without having to write an elaborate "prompt" every
| time.
|
| Is this possible?
| TriangleEdge wrote:
| This AI race is happening so fast. Seems like it to me anyway. As
| a software developer/engineer I am worried about my job
| prospects.. time will tell. I am wondering what will happen to
| the west coast housing bubbles once software engineers lose their
| high price tags. I guess the next wave of knowledge workers will
| move in and take their place?
| fallinditch wrote:
| My guess is that, yes, the software development job market is
| being massively disrupted, but there are things you can do to
| come out on top:
|
| * Learn more of the entire stack, especially the backend, and
| devops.
|
| * Embrace the increased productivity on offer to ship more
| products, solo projects, etc
|
| * Be highly selective as far as possible in how you spend your
| productive time: being uber-effective can mean thinking and
| planning in longer timescales.
|
| * Set up an awesome personal knowledge management system and
| agentic assistants
| bilbo0s wrote:
| This is really good advice.
|
| Underrated comment.
| j_maffe wrote:
| Do you have any specific tips for the last point? I
| completely agree with it and have set up a fairly robust
| Obsidian note taking structure that will benefit greatly from
| an agentic assistant. Do you use specific tools or a framework
| for this?
| whynotminot wrote:
| > Learn more of the entire stack, especially the backend, and
| devops.
|
| I actually wonder about this. Is it better to gain some
| relatively mediocre experience at lots of things? AI seems to
| be pretty good at lots of things.
|
| Or would it be better to develop deep expertise in a few
| things? Areas where even smart AI with reasoning still can
| get tripped up.
|
| Trying to broaden your base of expertise seems like it's
| always a good idea, but when AI can slurp the whole internet
| in a single gulp, maybe it isn't the best allocation of your
| limited human training cycles.
| throw234234234 wrote:
| It has the potential to affect a lot more than just SV/The West
| Coast - in fact SV may be one of the only areas that have some
| silver lining with AI development. I think these models have a
| chance to disrupt employment in the industry globally.
| Ironically it may be only SWE's and a few other industries
| (writing, graphic design, etc) that truly change. You can see
| they and other AI labs are targeting SWEs in particular - just
| look at the announcement "Claude 3.7 and Code" - very little
| mention of any other domains on their announcement posts.
|
| For people who aren't in SV for whatever reason and haven't
| seen the really high pay associated with being there - SWE is
| just a standard job often stressful with lots of learning
| required ongoing. The pain/anxiety of being disrupted is even
| higher there, since having high disposable income to invest/save
| would have been less likely. Software to them would have been a
| job with comparable pay to other jobs in the area, often
| requiring you to be degree qualified as well - anecdotally many
| I know got into it for the love, not the money.
|
| Who would have thought the first job being automated by AI would
| be software itself? Not labor, or self driving cars. Other
| industries either seem to have hit dead ends, or had other
| barriers (regulation, closed knowledge, etc) that make it
| harder to do. SWE's have set an example to other industries -
| don't let AI in or keep it in-house as long as possible. Be
| closed source in other words. Seems ironic in hindsight.
| throw83288 wrote:
| What do you even do then as a student? I've asked this dozens
| of times with zero practical answers at all. Frankly I've
| become entirely numb to it all.
| throw234234234 wrote:
| Be glad that you are empowered to pivot - I'm making the
| assumption you are still young being a student. In a
| disrupted industry you either want to be young (time to
| change out of it) or old (50+) - can retire with enough
| savings. The middle age people (say 15-25 years in the
| industry; your 35-50 yr olds) are most in trouble depending
| on the domain they are in. For all the "friendly" marketing
| IMO they are targeting tech jobs in general - for many
| people if it wasn't for tech/coding/etc they would never
| need to use an LLM at all. Anthropic's recent stats as to
| who uses their products are telling - it's mostly code, code,
| code.
|
| The real answer is either to pivot to a domain where the
| computer use/coding skills are secondary (i.e. you need the
| knowledge but it isn't primary to the role) or move to an
| industry which isn't very exposed to AI either due to
| natural protections (e.g. trades) or artificial ones (e.g.
| regulation/oligopolies colluding to prevent knowledge
| leaking to AI). May not be a popular comment on this
| platform - I would love to be wrong.
| viraptor wrote:
| It seems to be slowing down actually. Last year was wild until
| around llama 3. The latest improvements are relatively small.
| Even the reasoning models are a small improvement over explicit
| planning with agents that we could already do before - it's
| just nicely wrapped and slightly tuned for that purpose.
| Deepseek did some serious efficiency improvements, but not so
| much user-visible things.
|
| So I'd say that the AI race is starting to plateau a bit
| recently.
| j_maffe wrote:
| While I agree, you have to remember how high the dimensionality
| of the labor-skill space is. The way I see it, you can
| imagine the capability of AI as a radius, and the amount of
| tasks it can cover as a sphere. Linear improvements in
| performance cause cubic (or whatever the labor-skill
| dimensionality is) improvements in task coverage.
| LouisSayers wrote:
| I'm not too concerned short to medium term. I feel there are
| just too many edge cases and nuances that are going to be
| missed by AI systems.
|
| For example, systems don't always work in the way they're
| documented to. How is an AI going to differentiate cases where
| there's a bug in a service vs a bug in its own code? How will
| an AI even learn that the bug exists in the first place? How
| will an AI differentiate between someone reporting a bug and a
| hacker attempting to break into a system?
|
| The world is a complex place and without ACTUAL artificial
| _intelligence_ we're going to need people to _at least_ guide
| AI in these tricky situations.
|
| My advice would be to get familiar with using AI and new AI
| tools and how they fit into our usual workflows.
|
| Others may disagree, but I don't think software engineers (at
| least the good ones) are going anywhere.
| shortrounddev2 wrote:
| Does claude have a vscode plugin yet? I dropped github copilot
| because I didn't want so many subscriptions
| dugmartin wrote:
| You can use the Roo Code extension and point it at most any API,
| including Anthropic:
|
| https://marketplace.visualstudio.com/items?itemName=RooVeter...
| visarga wrote:
| Use Windsurf, a VSCode fork; it defaults to Claude as its LLM.
| wolffiex wrote:
| Try running Claude Code in your VS Code terminal! Just don't
| paste too much text :)
| https://stackoverflow.com/questions/41714897/character-line-...
| jumploops wrote:
| > "[..] in developing our reasoning models, we've optimized
| somewhat less for math and computer science competition problems,
| and instead shifted focus towards real-world tasks that better
| reflect how businesses actually use LLMs."
|
| This is good news. OpenAI seems to be aiming towards "the
| smartest model," but in practice, LLMs are used primarily as
| learning aids, data transformers, and code writers.
|
| Balancing "intelligence" with "get shit done" seems to be the
| sweet spot, and afaict one of the reasons the current crop of
| developer tools (Cursor, Windsurf, etc.) prefer Claude 3.5 Sonnet
| over 4o.
| bicx wrote:
| Claude 3.5 has been fantastic in Windsurf. However, it does
| cost credits. DeepSeek V3 is now available in Windsurf at zero
| credit cost, which was a major shift for the company. Great to
| have variable options either way.
|
| I'd highly recommend anyone check out Windsurf's Cascade
| feature for agentic-like code writing and exploration. It
| helped save me many hours in understanding new codebases and
| tracing data flows.
| ai-christianson wrote:
| I'm working on an OSS agent called RA.Aid and 3.7 is
| anecdotally a huge improvement.
|
| About to push a new release that makes it the default.
|
| It costs money but if you're writing code to make money, it's
| totally worth it.
| throwup238 wrote:
| DeepSeek's models are vastly overhyped (FWIW I have access to
| them via Kagi, Windsurf, and Cursor - I regularly run the
| same tests on all three). I don't think it matters that V3 is
| free when even R1 with its extra compute budget is inferior
| to Claude 3.5 by a large margin - at least in my experience
| in both bog standard React/Svelte frontend code and more
| complex C++/Qt components. After only half an hour of using
| Claude 3.7, I find the code output is superior and the
| thinking output is in a completely different universe (YMMV
| and caveat emptor).
|
| For example, DeepSeek's models almost always smash together
| C++ headers and code files even with Qt, which is an
| absolutely egregious error due to the meta-object compiler
| preprocessor step. The MOC has been around for at least 15
| years and is all over the training data so there's no excuse.
| tonyhart7 wrote:
| I've seen people switch from Claude to another model,
| notably DeepSeek, due to cost. Tbh I think it still
| depends on what data the model was trained on.
| bionhoward wrote:
| The big difference is DeepSeek R1 has a permissive license
| whereas Claude has a nightmare "closed output" customer
| noncompete license which makes it unusable for work unless
| you accept not competing with your intelligence supplier,
| which sounds dumb
| SkyPuncher wrote:
| I've found DeepSeek's models are within a stone's throw of
| Claude. Given the massive price difference, I often use
| DeepSeek.
|
| That being said, when cost isn't a factor Claude remains my
| winner for coding.
| rubymamis wrote:
| Hey there! I'm a fellow Qt developer and I really like your
| takes. Would you like to connect? My socials are on my
| profile.
| throwup238 wrote:
| We've already connected! Last year I think, because I was
| interested in your experience building a block editor
| (this was before your blog post on the topic). I've been
| meaning to reconnect for a few weeks now but family life
| keeps getting in the way - just like it keeps getting in
| the way of my implementing that block editor :)
|
| I especially want to publish and send you the code for
| that inspector class and selector GUI that dumps the
| component hierarchy/state, QML source, and screenshot for
| use with Claude. Sadly I (and Claude) took some dumb
| shortcuts while implementing the inspector class that
| both couples it to proprietary code I can't share and
| hardcodes some project specific bits, so it's going to
| take me a bit of time to extricate the core logic.
|
| I haven't tried it with 3.7 but based on my tree-sitter
| QSyntaxHighlighter and Markdown QAbstractListModel tests
| so far, it is _significantly_ better and I suspect the
| work Anthropic has done to train it for computer use will
| reap huge rewards for this use case. I'm still
| experimenting with the nitty gritty details but I think
| it will also be a game changer for testing in general,
| because combining computer use, gammaray-like dumps, and
| the Spix e2e testing API completes the full circle on app
| context.
| newgo wrote:
| How is it possible that deepseek v3 would be free? It costs a
| lot of money to host models
| crowcroft wrote:
| Sometimes I wonder if there is overfitting towards benchmarks
| (DeepSeek is the worst for this to me).
|
| Claude is pretty consistently the chat I go back to where the
| responses subjectively seem better to me, regardless of where
| the model actually lands in benchmarks.
| ben_w wrote:
| > Sometimes I wonder if there is overfitting towards
| benchmarks
|
| There absolutely is, even when it isn't intended.
|
| The difference between what the model is fitting to and
| reality it is used on is essentially every problem in AI,
| from paperclipping to hallucination, from unlawful output to
| simple classification errors.
|
| (Ok, not _every_ problem, there's also sample efficiency,
| and...)
| FergusArgyll wrote:
| Ya, Claude _crushes_ the smell test
| eschluntz wrote:
| Thanks! We all dogfood Claude every day to do our own work
| here, and solving our own pain points is more exciting to us
| than abstract benchmarks.
|
| Getting things done requires a lot of booksmarts, but also a lot
| of "street smarts" - knowing when to answer quickly, when to
| double back, etc
| LouisSayers wrote:
| Could you tell us a bit about the coding tools you use and
| how you go about interacting with Claude?
| catherinewu wrote:
| We find that Claude is really good at test driven
| development, so we often ask Claude to write tests first
| and then ask Claude to iterate against the tests
| Kerrick wrote:
| Write tests (plural) first, as in write more than one
| failing test before making it pass?
| jasonjmcghee wrote:
| Just want to say nice job and keep it up. Thrilled to start
| playing with 3.7.
|
| In general, benchmarks seem to be very misleading in my
| experience, and I still prefer sonnet 3.5 for _nearly_ every
| use case- except massive text tasks, which I use gemini 2.0
| pro with the 2M token context window.
| martinald wrote:
| I find the webdev arena tends to match my experience with
| models much more closely than other benchmarks:
| https://web.lmarena.ai/leaderboard. Excited to see how 3.7
| performs!
| jasonjmcghee wrote:
| Just wanted to already plop an update - "code" is very
| good. Just did a ~4 hour task in about an hour. It cost $3
| which is more than I usually spend in an hour, but very worth
| it.
| d_watt wrote:
| I'm about 50kloc into a project making a react native app /
| golang backend for recipes with grocery lists, collaborative
| editing, household sharing, so a complex data model and runtime.
| Purely from the experiment of "what's it like to build with AI,
| no lines of code directly written, just directing the AI."
|
| As I go through features, I'm comparing a matrix of Cursor,
| Cline, and Roo, with the various models.
|
| While I'm still working on the final product, there's no doubt to
| me that Sonnet is the only model that works with these tools well
| enough to be Agentic (rather than single file work).
|
| I'm really excited to now compare this 3.7 release and how good
| it is at avoiding some of the traps 3.5 can fall into.
| thebigspacefuck wrote:
| This has been my experience as well. Why do the others suck so
| bad?
| d_watt wrote:
| I wonder how much it's self fulfilling, where the developers
| of the agents are tuning their prompts / tool calls to
| sonnet.
| ndm000 wrote:
| Have there been any updates to Claude 3.5 Sonnet pricing? I can't
| find that anywhere even though Claude 3.7 Sonnet is now at the
| same price point. I could use 3.5 for a lot more if it's cheaper.
| minimaxir wrote:
| No changes to Claude 3.5 Sonnet pricing despite the new model.
|
| https://www.anthropic.com/pricing#anthropic-api
| ramesh31 wrote:
| Well there goes my evening
| hubraumhugo wrote:
| You can get your HN profile analyzed by it and it's pretty funny
| :)
|
| https://hn-wrapped.kadoa.com/
|
| I'm using this to test the humor of new models.
| Philpax wrote:
| Seems broken? Getting
|
| > An error occurred in the Server Components render. The
| specific message is omitted in production builds to avoid
| leaking sensitive details. A digest property is included on
| this error instance which may provide additional details about
| the nature of the error.
| ANewFormation wrote:
| I did multiple accounts with no problem, but in trying to do
| yours I got the same error.
|
| You've broken the system.
| Philpax wrote:
| New benchmark for good posting, I'll take it!
| ghxst wrote:
| Worked for me, seems to be case sensitive (?) I'll post these
| in case I just got lucky and it still doesn't work for you.
|
| https://hn-wrapped.kadoa.com/Philpax?share
|
| > You explain WebAssembly memory management with such passion
| that we're worried you might be dating your pointer
| allocations.
|
| > Your comments about multiplayer game architecture are so
| detailed, we suspect you've spent more time debugging network
| code than maintaining actual human connections.
|
| > You track AI model performance metrics more closely than
| your own bank account. DeepSeek R1 knows your preferences
| better than your significant other.
|
| I like your interests :)
| Philpax wrote:
| Aha, there it is - terrific, thank you :>
|
| Yes, I'm quite the eclectic kind!
| ANewFormation wrote:
| Oh god that's genuinely _way_ more amusing than I thought llm
| systems were capable of.
| XenophileJKO wrote:
| The more I use LLMs the more I have actually gravitated to
| looking at the humor of LLMs as an imperfect proxy measure of
| "intelligence".
|
| Obviously this is problematic, but Claude 3.5 (and now 3.7)
| have been genuinely funny and consistently funny.
| rubslopes wrote:
| > - You've reminded so many people to use 'Show HN:' that you
| should probably just apply for a moderator position already.
|
| > - Your relationship with AI coding assistants is more
| complicated than most people's dating history - Cline, Cursor,
| Continue.Dev... pick a lane!
|
| > - You talk about grabbing coffee while your LLM writes code
| so much that we're not sure if you're a developer or a barista
| who occasionally programs.
|
| I laughed hard at this :D
| jedberg wrote:
| > For someone who worked at Reddit, you sure spend a lot of
| time on HN. It's like leaving Facebook to spend all day on
| Twitter complaining about social media.
|
| Wow, so spot on it hurts!
| sitkack wrote:
| > For someone who criticizes corporate structures so much,
| you've spent an impressive amount of time analyzing their
| technical decisions. It's like watching someone critique a
| restaurant's menu while eating there five times a week.
| calvinmorrison wrote:
| >Your ideal tech stack is so old it qualifies for social
| security benefits
|
| >You're the only person who gets excited when someone
| mentions Trinity Desktop Environment in 2025
|
| > You probably have more opinions about PHP's empty()
| function than most people have about their entire career
| choices
| drivers99 wrote:
| > Personal Projects: You'll finally complete that bare-
| metal Forth interpreter for Raspberry Pi
|
| I was just looking into that again as of yesterday (I
| didn't post about it here yesterday, just to be clear; it
| picked up on that from some old comments I must have
| posted).
|
| > Profile summary: [...] You're the person who not only
| remembers what a CGA adapter is but probably still has one
| in working condition in your basement, right next to your
| collection of programming books from 1985.
|
| Exactly the case, in a working IBM PC, except I don't have
| a basement. :)
| seafoamteal wrote:
| Felt genuinely called out by that 'Roasts' section.
| Panoramix wrote:
| That thing knows me better than I know myself
| cyberpunk wrote:
| > You hate Terraform so much you'd rather learn Erlang than
| write another for-loop in HCL.
|
| ..
|
| > After years of complaining about Terraform, you'll fully
| embrace Crossplane and write a scathing Medium article titled
| 'Why I Left Terraform and Never Looked Back'.
|
| Hahahaha.
| BeetleB wrote:
| This is a better plug for the new Claude Sonnet model than the
| official announcement!
| jjice wrote:
| This is absolutely hilarious! Thanks for posting. It feels
| weighted towards some specific things (I assume this is done by
| the LLM caring about later context more?) - making it debatably
| even funnier.
|
| > You're the only person who gets excited about trailing commas
| in SQL. Even the database administrators are like 'dude, it's
| just a comma.'
| throwup238 wrote:
| Your comments about suburban missile defense systems have the
| FBI agent monitoring your internet connection seriously
| questioning their career choices. You've spent so much time
| explaining why manufacturing is complex that you could have
| just built your own CRT factory by now. You claim to be
| skeptical of AI hype, yet you've indexed more documentation
| with Cursor than most people have read in their lifetime.
|
| Surprisingly accurate, but seems to be based on a very small
| snippet of actual comments (presumably to save money). I wonder
| what the prompt would output when given the full 200k tokens of
| context.
| LinXitoW wrote:
| Got absolutely read to filth:
|
| > You've spent more time explaining why Go's error handling is
| bad than Go developers have spent actually handling errors.
|
| > Your relationship with programming languages is like a dating
| show - you keep finding flaws in all of them but can't commit
| to just one.
|
| > If error handling were a religion, you'd be its most zealous
| missionary, converting the unchecked one exception at a time.
| airstrike wrote:
| > You've spent more time explaining why Go's error handling
| is bad than Go developers have spent actually handling
| errors.
|
| That is absolutely hilarious. Really well done by everyone
| who made that line possible.
| sa46 wrote:
| Yea, these are nicely done. To add some balance:
|
| > After years of defending Go, you'll secretly start a side
| project in Rust but tell no one on HN about your betrayal
| toomuchtodo wrote:
| The 2025 predictions were like a spooky tarot card reading.
| airstrike wrote:
| > You've mentioned iced so many times, we're starting to wonder
| if you're secretly developing a Rust-based refrigerator company
| on the side.
|
| LMFAO so good. Humor seems on point
| desperatecuban wrote:
| > Your salary is so low even your legacy code feels sorry for
| you.
|
| > You're the only person on HN who thinks $800/month is a
| salary and not a cloud computing bill.
|
| ouch
| jumploops wrote:
| > You've mentioned 'simple is robust' so many times that we're
| starting to think your dating profile just says 'uncomplicated
| and sturdy'.
|
| > For someone who builds tools to automate everything, you sure
| spend a lot of time manually explaining why automation is the
| future on HN.
|
| > Your obsession with sandboxed code execution suggests you've
| been traumatized by at least one production outage caused by an
| intern's unreviewed PR.
|
| So good it hurts!
| jddj wrote:
| > You've recommended Marginalia search so many times, we're
| starting to think you're either the developer or just really
| enjoy websites that look like they were designed in 1998.
|
| Actually quite funny.
|
| [1] https://hn-wrapped.kadoa.com/jddj?share
| throwup238 wrote:
| Especially hilarious considering that this is the actual
| marginalia developer: https://hn-
| wrapped.kadoa.com/marginalia_nu
|
| _> You defend Java with such passion that Oracle's legal
| team is considering hiring you as their chief evangelist -
| just don't tell them about your secret admiration for more
| elegant programming paradigms._
| StefanBatory wrote:
| ... I had been called out by it hard, lmao. Painfully accurate.
| taytus wrote:
| "You were using 'I don't understand these valuations' before it
| was cool - the original valuation skeptic hipster of Hacker
| News" -
| agys wrote:
| "You've spent more time optimizing DOM manipulation for ASCII
| art than most people spend deciding what to watch on Netflix in
| their entire lives."
|
| Ouch... :)
| hambos22 wrote:
| > You built your own Klaviyo alternative to save EUR500, but
| how many hours of development at market rate did that cost? The
| true Greek economy at work!
|
| ouch (yuyu)
| nbzso wrote:
| This thing is hilarious. :)
|
| Roast:
|
| - Your comments have more doom predictions than a Y2K
| convention in December 1999.
|
| - You've used 'stochastic parrot' so many times, actual parrots
| are filing for trademark infringement.
|
| - If tech dystopia were an Olympic sport, you'd be bringing
| home gold medals while explaining how the podium was designed
| by committee and the medal contains surveillance chips.
| replete wrote:
| I need some ice for the burn I just received.
| gmassman wrote:
| > Spends more time explaining why TypeScript in Svelte is
| problematic than actually fixing TypeScript in Svelte.
|
| Damn, that's brutal. I mean, I never said _I knew_ how to fix
| ComponentProps or generic components, just that they have
| issues...
| processing wrote:
| lol good stuff
|
| "A digital nomad who splits time between critiquing Facebook's
| UI decisions, unearthing obscure electronic music tracks with 3
| plays on YouTube, and occasionally making fires on German
| islands. When not creating Dystopian Disco mixtapes or
| lamenting the lack of MIDI export in AI tools, they're probably
| archiving NYT articles before paywalls hit.
|
| Roast
|
| You've spent more time complaining about Facebook's UI than
| Facebook has spent designing it, yet you still check it enough
| to notice every change.
|
| Your music discovery process is so complex it requires Discogs,
| Bandcamp, YouTube, and three specialized record stores, yet
| you're surprised when tracks only have 3 plays.
|
| You're the only person who joined HN to discuss the Yamaha DX7
| synthesizer from 1983 and somehow managed to submit two front-
| page stories about it in 2019-2020. The 80s called, they want
| their FM synthesis back."
|
| edit: predictions are spot on - wow. Two of them detailed two
| projects I'm actively working on.
| redeux wrote:
| > You complain about digital distractions while writing novels
| in HN comment threads. That's like criticizing fast food while
| waiting in the drive-thru line.
|
| >You'll write a thoughtful essay about 'digital minimalism'
| that reaches the HN front page, ironically causing you to spend
| more time on HN responding to comments than you have all year.
|
| It sees me! Noooooo ...
| maronato wrote:
| https://hn-wrapped.kadoa.com/dang?share
|
| > Most used terms: "Please don't" lol
| nickvec wrote:
| > You correct grammar in HN comments but still haven't figured
| out that nobody cares
|
| My ego will never recover from this
| raminf wrote:
| > Hacker News
|
| > You'll finally stop checking egg prices at Costco and instead
| focus on writing that definitive 'How I Built My Own Super App
| Without Getting Rejected By Apple' post.
|
| On it!
| fullstackchris wrote:
| > You've experienced so many startup failures that your
| LinkedIn profile should just read 'Professional Titanic
| Passenger: Always Picks the Wrong Ship'.
|
| :'(
| boogieknite wrote:
| > You've spent more time justifying your Apple Vision Pro
| purchase than actually using it for anything productive, but
| hey, at least you can watch movies on 'the best screen' while
| pretending it's a 'dev kit'.
|
| blasted
| wildermuthn wrote:
| "Your enthusiasm for Oculus in 2014 was so intense that Mark
| Zuckerberg probably bought it just to make you stop posting
| about it."
|
| Incredible work!
| ilrwbwrkhv wrote:
| Profile Summary
|
| A successful tech entrepreneur who built a multi-million dollar
| business starting with Common Lisp, you're the rare HN user who
| actually practices what they preach.
|
| Your journey from Lisp to Go to Rust mirrors your evolution
| from idealist to pragmatist, though you still can't help but
| reminisce about the magical REPL experience while complaining
| about JavaScript frameworks.
|
| ---
|
| Roast
|
| You complain about AI-generated code being too complex, yet you
| pine for Common Lisp, a language where parentheses reproduction
| is the primary feature.
|
| For someone who built a multi-million dollar business, you
| spend an awful lot of time telling everyone how much JavaScript
| and React suck. Did a React component steal your lunch money?
|
| You've changed programming languages more often than most
| people change their profile pictures. At this rate, you'll be
| coding in COBOL by 2026 while insisting it's
| 'underappreciated'.
| CamperBob2 wrote:
| _Your comments have more bits of precision than the ADCs you
| love discussing, but somehow still manage to compress all
| nuance out of complex topics_
|
| Hit dog hollers
| dgunay wrote:
| > Your ideal laptop would run Linux flawlessly with perfect
| hardware compatibility, have MacBook build quality, and Windows
| game support. Meanwhile, the rest of us live in reality.
|
| Damn, got me there haha
| netshade wrote:
| LOL, this truly made me laugh. I'm also doing humor stuff with
| Claude, I was pretty pleased with 3.5 so excited to see what
| happens with the 3.7 change. It's a radio station with a bunch
| of DJs with different takes on reality, so looking forward to
| see how it handles their different experiences.
| anonzzzies wrote:
| We have used claude almost exclusively since 3.5; we regularly
| run our internal benchmark (coding) against others, but it's
| mostly just a waste of time and money. Will be testing 3.7 the
| coming days to see how it stacks up!
| newbie578 wrote:
| Scary to watch the pace of progress and how the whole industry is
| rapidly shifting.
|
| I honestly didn't believe things would speed up this much.
| DavidPP wrote:
| Haven't had time to try it out, but I've built myself a tool to
| tag my bookmarks and it uses 3.5 Haiku. Here is what it said
| about the official article content:
|
| _I apologize, but the URL and page description you provided
| appear to be fictional. There is no current announcement of a
| Claude 3.7 Sonnet model on Anthropic's website. The most recent
| Claude 3 models are Claude 3 Haiku, Sonnet, and Opus, released in
| March 2024. I cannot generate a description for a non-existent
| product announcement._
|
| I appreciate their stance on safety, but that still made me
| laugh.
| dzhiurgis wrote:
| Anyone else noticed all the reasoning models kinda caught up
| with claude, and claude itself turned to crap last week?
| kmlx wrote:
| Claude 3.5 sonnet has been my go to for coding tasks, it's just
| so much better than the others.
|
| but I've tried using the api in production and had to drop it due
| to daily issues: https://status.anthropic.com/
|
| compare to https://status.openai.com/
|
| any idea when we'll see some improvements in api availability or
| will the focus be more on the web version of claude?
| scrollop wrote:
| Err, if you compare the two status pages you'll see that
| anthropic's uptime is actually slightly better on average than
| openai's.
| kmlx wrote:
| click on individual days. you'll notice that there are daily
| errors.
| msp26 wrote:
| Does it show the raw "reasoning" tokens or is it a summary?
|
| Edit: > we've decided to make its thought process visible in raw
| form.
| koakuma-chan wrote:
| Where did 3.6 go?
| danielbln wrote:
| Allegedly many people called the newest 3.5 revision 3.6, so
| Anthropic just rolled with it and called this 3.7.
| meetpateltech wrote:
| When you ask: 'How many r's are in strawberry?'
|
| Claude 3.7 Sonnet generates a response in a fun and cool way with
| React code and a preview in Artifacts
|
| check out some examples:
|
| [1]https://claude.ai/share/d565f5a8-136b-41a4-b365-bfb4f4400df5
|
| [2]https://claude.ai/share/a817ac87-c98b-4ab0-8160-feefd7f798e8
| jasonjmcghee wrote:
| I'm guessing this is an easter egg, but this was a huge gripe I
| had with artifacts and eventually disabled it (now impossible
| to disable afaict) as I'd ask questions completely unrelated to
| code or clearly not wanting code as an output, and I'd have to
| wait for it to write a program (which you can't stop afaict, it
| stops the current artifact then starts a new one)
|
| (still claude sonnet is my go-to and favorite model)
| falcor84 wrote:
| A shame the underlying issue still persists:
|
| > There is exactly 1 'r' in "blueberry" [0]
|
| [0]
| https://claude.ai/share/9202007a-9d85-49e6-9883-a8d8305cd29f
| OsrsNeedsf2P wrote:
| This test has always been so stupid since models work at the
| token level. Claude 3.5 already 5xs your frontend dev speed but
| people still say "hurr durr it can't count strawberry" as if
| that's a useful problem
| dannyw wrote:
| The problem is also that LLMs are confidently wrong when
| they're wrong.
| bufferoverflow wrote:
| This test isn't stupid. If it can't count the number of
| letters in a text, can you rely on it with more important
| calculations?
| stnmtn wrote:
| You can rely on it for anything that you can validate
| quickly. And it turns out, there are a lot of problems
| which are trivial to validate the solution to, but
| difficult to build the solution for.
| anti-soyboy wrote:
| OpenAI should be worried as their products are weak
| batterylake wrote:
| Hi Claude Code team, excited for the launch!
|
| How well does Claude Code do on tasks which rely heavily on
| visual input such as frontend web dev or creating data
| visualizations?
| wolffiex wrote:
| As a CLI, this tool is most efficient when it can see text
| outputs from the commands that it runs. But you can help it
| with visual tasks by putting a screenshot file in your project
| directory and telling claude to read it, or by copying an image
| to your clipboard and pasting it with CTRL+V
| batterylake wrote:
| Cool, thanks!
| siva7 wrote:
| Will Claude Code also be available with Pro Subscription?
| simion314 wrote:
| Why not accept other payment methods like PayPal/Venmo?
| Steam and Netflix developers have managed to integrate those
| payment methods, so I conclude that Anthropic, Google, MS, and
| OpenAI don't really need the money from users and are just
| hunting for big investors.
| _joel wrote:
| I've been using 3.5 with Roocode for the past couple of weeks and
| I've found it really quite powerful. Making it write tests and
| run them as part of the flow, with vscode windows pinging about,
| is neat too.
| forrestthewoods wrote:
| Claude is the best example of benchmarks not being reflective of
| reality. All the AI labs are so focused on improving benchmark
| scores but when it comes to providing actual utility Claude has
| been the winner for quite some time.
|
| Which isn't to say that benchmarks aren't useful. They surely
| are. But labs are clearly both overtraining and overindexing on
| benchmarks.
|
| Coming from gamedev I've always been significantly more yolo
| trust your gut than my PhD co-workers. Yes data is good. But I
| think the industry would very often be better off trusting guts
| and not needing a big huge expensive UX study or benchmark to
| prove what you can plainly see.
| Alifatisk wrote:
| Why is Claude-3.5-Haiku considered PRO while Claude-3.7-Sonnet is
| for free users?
| alecco wrote:
| Who do I have to kill to get Claude Code access?
| xd1936 wrote:
| $ npm install -g @anthropic-ai/claude-code
|
| $ claude
| ckbishop wrote:
| Well, I used 3.5 via Cursor to do some coding earlier today, and
| the output kind of sucked. Ran it through 3.7 a few minutes ago,
| and it's much more concise and makes sense. Just a little
| anecdotal high five from me.
| freediver wrote:
| Kagi LLM benchmark updated with general purpose and thinking mode
| for Sonnet 3.7.
|
| https://help.kagi.com/kagi/ai/llm-benchmark.html
|
| Appears to be second most capable general purpose LLM we tried
| (second to gemini 2.0 pro, in front of gpt-4o). Less impressive
| in thinking mode, about at the same level as o1-mini and o3-mini
| (with 8192 token thinking budget).
|
| Overall a very nice update, you get higher quality and higher
| speed model at same price.
|
| Hope to enable it in Kagi Assistant within 24h!
| jjice wrote:
| Thank you to the Kagi team for such fast turn around on new
| LLMs being accessible via the Assistant! The value of Kagi
| Assistant has been a no-brainer for me.
| flixing wrote:
| Do you think kagi is the right Eval tool? If so, why?
| thefourthchime wrote:
| Nice, but where is Grok?
| pertymcpert wrote:
| Perhaps they're waiting for the Grok API to be public?
| Squarex wrote:
| I'm surprised that Gemini 2.0 is first now. I remember that
| Google models were under performing on kagi benchmarks.
| manmal wrote:
| Gemini 2 is really good, and insanely fast.
| Squarex wrote:
| It is, but in this benchmark gemini scored very poorly in
| the past.
| Workaccount2 wrote:
| Having your own hardware to run LLMs will pay dividends.
| Despite getting off on the wrong foot, I still believe Google
| is best positioned to run away with the AI lead, solely
| because they are not beholden to Nvidia and not stuck with a
| 3rd party cloud provider. They are the only AI team that is
| top to bottom in-house.
| Squarex wrote:
| I've used gemini for its large context window before. It's
| a great model. But specifically in this benchmark it has
| always scored very low. So I wonder what has changed.
| guelo wrote:
| How did you chose the 8192 token thinking budget? I've often
| seen Deepseek R1 use way more than that.
| KTibow wrote:
| One thing I don't understand is why Claude 3.5 Haiku, a non
| thinking model in the non-thinking section, says it has an 8192
| thinking budget.
| slantedview wrote:
| As a Claude Pro user, one of the biggest problems I have with day
| to day use of Sonnet is running out of tokens, and having to wait
| several hours. Would this new deep thinking capability just hit
| this problem faster?
| k8sToGo wrote:
| Have you tried just using the API and pay as you go?
| mvdtnz wrote:
| That doesn't answer his very specific question.
| grav wrote:
| Claude 3.7 Sonnet seems to have a context window of 64,000 via
| the API: max_tokens: 4242424242 > 64000, which is
| the maximum allowed number of output tokens for
| claude-3-7-sonnet-20250219
|
| I got a max of 8192 with Claude 3.5 sonnet.
| koakuma-chan wrote:
| Context window is how long your prompt can be. Output tokens is
| how long its response can be. What you sent says its response
| can be 64k tokens at maximum.
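|
| For reference, a request that asks for the full 64k output
| budget looks roughly like this with the Python SDK (a sketch
| only; the model string is taken from the error message above,
| and anything larger than 64000 is rejected with that same
| error):
|
|     import anthropic
|
|     client = anthropic.Anthropic()
|     msg = client.messages.create(
|         model="claude-3-7-sonnet-20250219",
|         max_tokens=64000,  # output cap, not the context window
|         messages=[{"role": "user",
|                    "content": "Write a very long story."}])
|     print(msg.usage.output_tokens)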
| epistasis wrote:
| It's pretty fascinating to refresh the usage page on the API site
| while working [0].
|
| After initialization it was up to 500k tokens ($1.50). After a
| few questions and a small edit, I'm up to over a million tokens
| (>$3.00). Not sure if the amount of code navigation and typing
| saved will justify the expense yet. It'll take a bit more
| experimentation.
|
| In any case, the default API buy of $5 seems woefully low to
| explore this tool.
|
| [0] https://console.anthropic.com/settings/usage
| koakuma-chan wrote:
| It also produces terrible code even though it's supposed to be
| good for front-end development.
| trekkie1024 wrote:
| Could you share an example?
| koakuma-chan wrote:
| TLDR: told it to implement a grid view as an alternative to
| the existing list view, and specifically told it to DRY the
| code. What did it do? Copied and pasted the list view
| implementation (definitely not DRY), and tried to make it a
| grid, and even though it is a grid, it looks terrible
| (https://i.imgur.com/fJiSjq4.png).
|
| I don't understand how people use cursor and all that other
| shit when it cannot follow such simple instructions.
|
| Prompt (Claude Code): Implement an alternative grid view
| that the users can switch to. Follow the existing code
| style with empty comments and line breaks for improved code
| readability. Use snake case. DRY the code, avoid repetition
| of code. Do not change the font size or weight.
|
| Output: https://github.com/mayo-
| dayo/app/compare/0.4...claude-code-g...
| koakuma-chan wrote:
| It also keeps adding aspect-ratio to every single image
| it finds in my code base.
| koakuma-chan wrote:
| Also this: `grid grid-cols-2 sm:grid-cols-3 md:grid-
| cols-4 lg:grid-cols-5 xl:grid-cols-6`
| (https://github.com/mayo-
| dayo/app/blob/463ad5aeee904289ecc7d4...).
|
| Even though my Layout clearly says `max-w-md`
| (https://github.com/mayo-
| dayo/app/blob/463ad5aeee904289ecc7d4...).
| sensanaty wrote:
| In any moderately sized codebase it's basically useless
| indeed. Pretty much all the praise and hype I ever see is
| from people making todo-list-tier applications and
| shouting with excitement how this is going to replace all
| of humanity.
|
| Hell, I still have to remind it (Cursor) to not give me
| fucking React a few messages after I've already told it
| to not give me React (it's a Vue application with not a
| single line of React in it). Genuinely maddening, but the
| infinite wisdom of the higher ups forces me into wasting
| my time with this crap
| epistasis wrote:
| Claude's predilection and evangelism for React is
| frustrating. Many times I have used it as search with a
| question like "In the Python library X how do I do Z?"
| And I'll get a React widget that computes what I was
| trying to compute.
| pityJuke wrote:
| There's a middle ground, I find.
|
| Absolutely, when tasked with something quite complex in a
| complex code base, it doesn't really work. It can get you
| some of the way there, and some of the code it produces
| gives you great ideas on where to go from, but it doesn't
| work.
|
| But there are certainly some tasks where it excels. I
| asked it to refactor a rather gnarly function (C++), and
| it did a great job at decomposing it. The initial
| decomposition was a bit naive: the original function took
| in a vector and would parse the function & data out of
| the vector, and the decomposition split out the
| functions, but the data still came in as a vector. For
| instance, one of the functions took a filename and file
| contents, but it took them as element 0 and element 1 of
| a vector, when they should obviously be two parameters.
| Some further prompting took it the rest of the way.
| epistasis wrote:
| Update: Code tokens appear to be cheaper than 3.7 tokens, looks
| like it is around $0.75/million tokens for code, rather than
| the $3/million that the article specifies for Claude 3.7
| highfrequency wrote:
| Awesome work. When CoT is enabled in Claude 3.7 (not the new
| Claude Code), is the model now able to compile and run code as
| part of its thought process? This always seemed like very low
| hanging fruit to me, given how common this pattern is: ask for
| code, try running it, get an error (often from an outdated API in
| one of the packages used), paste the error back to Claude, have
| Claude immediately fix it. Surely this could be wrapped into the
| reasoning iterations?
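|
| You can already wrap that loop around the API yourself; a rough
| sketch with the Python SDK (just the paste-the-traceback-back
| workflow automated - the task and file name are made up, and
| this says nothing about how the internal reasoning works):
|
|     import subprocess
|     import anthropic
|
|     client = anthropic.Anthropic()
|     MODEL = "claude-3-7-sonnet-20250219"
|
|     def ask(msgs):
|         r = client.messages.create(model=MODEL, max_tokens=4096,
|                                    messages=msgs)
|         return r.content[0].text
|
|     msgs = [{"role": "user", "content":
|              "Write a Python script that prints the 10 most "
|              "common words in war_and_peace.txt. Reply with "
|              "only the code, no markdown fences."}]
|     for _ in range(3):  # a few fix-it rounds
|         code = ask(msgs)
|         run = subprocess.run(["python", "-c", code],
|                              capture_output=True, text=True)
|         if run.returncode == 0:
|             break
|         # Feed the traceback back, like pasting it by hand.
|         msgs += [{"role": "assistant", "content": code},
|                  {"role": "user", "content":
|                   "That failed with:\n" + run.stderr +
|                   "\nFix it and resend the complete script."}]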
| falcor84 wrote:
| Why can't they count to 4?
|
| I accepted it when Knuth did it with TeX's versioning. And I sort
| of accept it with Python (after the 2-3 transition fiasco), but
| this is getting annoying. Why not just use natural numbers for
| major releases?
| jjice wrote:
| I think I heard on a podcast with some of their team that they
| want 4 to be a massive jump. If I recall, they said that they
| want Haiku (the smallest of their current gen models) to be as
| good as Opus (the highest version, although there isn't one in
| the 3.5+ line) of the previous generation.
| sensanaty wrote:
| You'd think all these companies would have a single good naming
| convention, amazingly they don't. I suspect it's half on
| purpose so they can nerf the models without anyone suspecting
| once the hype dies down, since with every one of these models
| the latter version of the "same" version is worse than the
| launch version
| nurettin wrote:
| What I love about their API is the tools array. Given a JSON
| schema describing your functions, it will output tool usage
| appropriate for the prompt. You can return tool results per call,
| and it will generate a dialog and additional tool calls based on
| those results.
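|
| Roughly what that looks like with the Python SDK, as a
| minimal sketch (the weather tool, its schema and the model
| alias are made up for illustration):
|
|     import anthropic
|
|     client = anthropic.Anthropic()  # key read from the env
|
|     tools = [{
|         "name": "get_weather",  # illustrative tool
|         "description": "Get current weather for a city.",
|         "input_schema": {
|             "type": "object",
|             "properties": {"city": {"type": "string"}},
|             "required": ["city"],
|         },
|     }]
|
|     msg = client.messages.create(
|         model="claude-3-7-sonnet-latest",
|         max_tokens=1024,
|         tools=tools,
|         messages=[{"role": "user",
|                    "content": "Is it raining in Oslo?"}],
|     )
|
|     # If the model chose the tool, the reply contains a
|     # tool_use block; you run the function yourself and send
|     # the result back as a tool_result block to continue.
|     for block in msg.content:
|         if block.type == "tool_use":
|             print(block.name, block.input)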
| Uninen wrote:
| I'm somewhat impressed by the very first interaction I had with
| Claude 3.7 Sonnet. I prompted it to find a problem in my codebase
| where a Cloudflare Pages function would return a 500 plus a
| nonsensical error and an empty response in prod. I had tried to
| figure this out all Friday. It was super annoying to fix, as
| there was no way to add more logging or get any visibility into
| the issue since the script died before outputting anything.
|
| Both o1, o3 and Claude 3.5 failed to help me in any way with
| this, but Claude 3.7 not only found the correct issue with first
| answer (after thinking 39 seconds) but then continued to write me
| a working function to work around the issue with the second
| prompt. (I'm going to let it write some tests later but stopped
| here for now.)
|
| I assume it doesn't let me share the discussion as I connected
| my GitHub repo to the conversation (a new feature in the web chat
| UI launched today) but I copied it as a gist here:
| https://gist.github.com/Uninen/46df44f4307d324682dabb7aa6e10...
| Uninen wrote:
| One thing about the reply gives away why Claude is still
| basically clueless about Actual Thinking: it suggested I
| move the HTML sanitization to the frontend. It's in the CF
| function because it would be trivial to bypass in the
| frontend, making it easy to post literally anything into the
| db. Even a junior developer would understand this.
| umaar wrote:
| Drawing an SVG of a pelican on a bicycle. Claude 3.7 edition:
| https://x.com/umaar/status/1894114767079403747
| redox99 wrote:
| Claude 3.5/3.6/3.7 seems _too_ good at SVG compared to other
| models. I'd wager they did a bit of training specifically on
| that.
| pcwelder wrote:
| Claude Code's terminal UX feels great.
|
| It has some well-thought-out features, like restarting the
| conversation with compressed context.
|
| Great work guys.
|
| However, I did get stuck when I asked it to run `npm create
| vite@latest todo-app` because it needs interactivity.
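|
| (For what it's worth, create-vite can be driven
| non-interactively by passing a template, e.g. `npm create
| vite@latest todo-app -- --template vue`; the template name
| here is just an example.)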
| g8oz wrote:
| Congratulations on the release! While team members are monitoring
| this discussion, let me add that a relatively simple improvement
| I'd like to see in the UI is the ability to export a chat to
| Markdown or XML.
| bhouston wrote:
| I wonder how similar Claude Code is to https://mycoder.ai - which
| also uses Claude in an agentic fashion?
|
| It seems quite similar:
|
| https://docs.anthropic.com/en/docs/agents-and-tools/claude-c...
| jsemrau wrote:
| One of the most interesting takeaways I found from
| Huggingface's GAIA is that the agent would produce better
| results when it "reasoned" through its response to the task
| in code.
| wewewedxfgdf wrote:
| What makes software "agentic" instead of just a computer program?
|
| I hear lots of talk about agents and can't see them as being any
| different from an ordinary computer program.
| dannyw wrote:
| Computer programs generally don't call functions non-
| deterministically, including choosing which functions to call,
| and when, at runtime.
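|
| A hand-wavy way to picture the difference, as a sketch
| (nothing vendor-specific; the shape of the `llm` return
| value and the tool registry are made up):
|
|     def run_agent(llm, tools: dict, task: str,
|                   max_steps: int = 10):
|         # In an ordinary program the call graph is fixed at
|         # write time. Here the model decides which tool to
|         # call, with what arguments, and when to stop.
|         history = [task]
|         for _ in range(max_steps):
|             action = llm(history)
|             if action.get("tool") is None:
|                 return action["answer"]
|             result = tools[action["tool"]](**action["args"])
|             history.append({"call": action, "result": result})
|         return "gave up"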
| Uninen wrote:
| The Anthropic model comparison table has been updated now.
| Interesting new things: at the very least, the maximum output
| tokens have been upped from 8k to 64k and the knowledge cutoff
| date has moved from April 2024 to October 2024.
|
| https://docs.anthropic.com/en/docs/about-claude/models/all-m...
| bcherny wrote:
| Thanks everyone for all your questions! The team and I are
| signing off. Please drop any other bugs or feature requests here:
| https://github.com/anthropics/claude-code. Thanks and happy
| coding!
| anotherpaulg wrote:
| Claude 3.7 Sonnet scored 60.4% on the aider polyglot leaderboard
| [0], WITHOUT USING THINKING.
|
| Tied for 3rd place with o3-mini-high. Sonnet 3.7 has the highest
| non-thinking score, taking that title from Sonnet 3.5.
|
| Aider 0.75.0 is out with support for 3.7 Sonnet [1].
|
| Thinking support and thinking benchmark results coming soon.
|
| [0] https://aider.chat/docs/leaderboards/
|
| [1] https://aider.chat/HISTORY.html#aider-v0750
| bearjaws wrote:
| Thanks for all the work on aider, my favorite AI tool.
| stavros wrote:
| I'd like to second the thanks for Aider, I use it all the time.
| liamYC wrote:
| I'd like to 3rd the thanks for Aider it's fantastic!
| gwd wrote:
| Interesting that the "correct diff format" score went from
| 99.6% with Claude 3.5 to 93.3% for Claude 3.7. My experience
| with using claude-code was that it consistently required
| several tries to get the right diff. Hopefully all that will
| improve as they get things ironed out.
| throwaway454812 wrote:
| Any chance you can add support for Vertex AI Sonnet 3.7, which
| looks like it's available now? Thank you!
| hankchinaski wrote:
| It's amazingly good, but it will be scarily good once there's a
| way to include the entire codebase in the context and let it
| create and run various parts of a large codebase autonomously.
| Right now I can only do patch work and give it specific code
| snippets to work on. Excited to try this new version out; I'm
| sure I won't be disappointed.
|
| Edit: I just tried the Claude Code CLI and it's a good
| compromise. It works pretty well; it does the discovery by
| itself instead of loading the whole codebase into context.
| flutas wrote:
| FWIW, there's a project to turn it into something similar,
| though I think it's lacking the "entire codebase in context"
| part and runs into rate limits quickly with Claude.
|
| https://github.com/All-Hands-AI/OpenHands
|
| The few times I've tested it out, though, it fails fairly
| quickly and gets hung up (usually on setting up the project
| while testing with Kotlin / Go).
| thefourthchime wrote:
| Cursor AI is getting there.
| hankchinaski wrote:
| Cursor is just a wrapper around the APIs and is unnecessarily
| expensive. I use the Zed editor with custom API keys and it
| works super well.
| knes wrote:
| At Augment (https://augmentcode.com) we were one of the partners
| who tested 3.7 pre-launch, and it has been a pretty significant
| increase in quality and code understanding. Happy to answer some
| questions.
|
| FYI, we use Claude 3.7 as part of the new features we are
| shipping around Code Agent & more.
| ginkgotree wrote:
| Been using 3.5 Sonnet for a mobile app build the past month.
| Haven't had much time to get a good sense of the 3.7
| improvements, but I have to say the dev-experience improvement
| of Claude Code right in my shell is fantastic. Loving it so far.
| ismaelvega wrote:
| Any plans to make some HackerRank Astra bench?
| specto wrote:
| I've had a personal subscription to Claude for a while now. I
| would love it if that also gave me access to some amount of API
| calls.
| mirekrusin wrote:
| Ok, just got documentation and fixed two bugs in my open source
| project.
|
| $1.42
|
| This thing is a game changer.
| ramesh31 wrote:
| It would be reeeaaally nice if someone built Claude Code into a
| Cline/Aider type extension...
| bittermandel wrote:
| Claude Code works pretty OK so far, but Bash doesn't work
| straight up. It just sits and waits, even when running something
| basic like "!echo 123".
| leyoDeLionKin wrote:
| I cancelled after I hit the limit; plus, you have very limited
| support here in Europe.
| 0xcb0 wrote:
| I can just say that this is awesome. I just spent $10 and a
| handful of queries to spin up an app idea I'd had for a while.
|
| The basic idea is working; it handled everything for me.
|
| From setting up the Node environment to creating the
| directories and files, patching the files, running code,
| handling errors, and patching again. From time to time it fails
| to detect its own faults, but when I pinpoint them, it gets it
| right most of the time. And the UI is actually prettier than
| what I would have crafted for a v1.
|
| When this gets cheaper and better with each iteration,
| everybody will have a full dev team for a couple of bucks.
| wellthisisgreat wrote:
| What's the privacy like for Claude Code? Is it memorizing the
| whole codebase?
| j_maffe wrote:
| It redid half of my BSc thesis in less than 30s :|
|
| https://claude.ai/share/ed8a0e55-633f-4056-ba70-772ab5f5a08b
|
| edit: Here's the output figure https://i.imgur.com/0c65Xfk.png
|
| edit 2: Gemini Flash 2 failed miserably
| https://g.co/gemini/share/10437164edd0
| ThouYS wrote:
| Master's and PhD next!
| akreal wrote:
| Could this (or something similar) already be found publicly,
| in some library?
| j_maffe wrote:
| There is only a single paper that has published a similar
| derivation, and it has a critical mistake. To be fair, there
| are many documented examples of how to derive parametric
| relationships in linkages, and the process can be quite
| methodical. I think I could get Gemini or 3.5 to do it, but
| not single-shot / ultra fast like here.
| dev0p wrote:
| The quality of the code is so much better!
|
| The UI seems to have an issue with big artifacts but the model is
| noticeably smarter.
|
| Congratulations on the release!
| unshavedyak wrote:
| Are you using Claude Code or just the UI? Trying to figure out
| if anyone actually has Code yet hah.
|
| _edit_: Oh, there's a link to "joining the preview" which
| points to: https://docs.anthropic.com/en/docs/agents-and-
| tools/claude-c...
| gigatexal wrote:
| How is the code generation? OpenAI was generating good-looking
| Terraform, but it was hallucinating incorrect things.
| Copenjin wrote:
| Very good. Code is extremely nice, but as others have said, if
| you let it go on its own it burns through your money pretty
| fast.
|
| I had it build a web scraper from scratch, figuring out the
| "API" of a website using a project from GitHub in another
| language to get some hints, and while in the end everything was
| working, I saw 100k+ tokens being sent too frequently for
| apparently simple requests. Something feels off; it feels like
| there are quite a few opportunities to reduce token usage.
| taosx wrote:
| The model is expensive; it almost reaches what I charge per
| hour. Used right, it can be a productivity increase; otherwise,
| if you just trust it, it WILL introduce silent bugs. So if I
| have to go over the code line by line anyway, I'd prefer to use
| the cheapest viable model: DeepSeek, Gemini, or any other free
| self-hosted model.
|
| Congrats to the team!
| vbezhenar wrote:
| So far, only o1 pro has been breathtaking for me, a few times.
|
| I wrote some fairly complex code for an MCU which deals with
| FRAM and a few buffers, juggling bytes around in a complex
| fashion.
|
| I wasn't very sure about this code, so I spent some time with
| AI chats asking them to review it.
|
| 4o, o3-mini and Claude were more or less useless. They spotted
| basic stuff, like the code possibly being problematic in a
| multi-threaded environment; those are obvious things and not
| even true.
|
| o1 pro did something on another level. It recognized that my
| code uses SPI to talk to the FRAM chip. It decoded the commands
| I used. It understood the whole timeline of how the CS pin was
| used. And it pointed out that I was using the WREN command the
| wrong way, and that I should have separated it from the WRITE
| command.
|
| That was a truly breathtaking moment for me. It easily saved me
| days of debugging, that's for sure.
|
| I asked the same question of Claude 3.7 in thinking mode and it
| still wasn't that useful.
|
| It's not the only occasion. A few weeks earlier, o1 pro
| delivered the solution to a problem that I considered kind of
| hard. Basically, I had issues accessing an IPsec VPN configured
| on the host from a Docker container. I wrote a well-thought-out
| question with all the information one might need, and o1 pro
| crafted a magic iptables incantation that just solved my
| problem. I had spent quite a bit of time working on this
| problem; I was close, but not there yet.
|
| I often use both ChatGPT and Claude, comparing them side by
| side. For other models they are comparable and I can't really
| say which is better, but o1 pro plays above them. I'll keep
| trying both over the coming days.
| dkulchenko wrote:
| Have you tried comparing with 3.7 via the API with a large
| thinking budget yet (32k-64k perhaps?), to bring it closer to
| the amount of tokens that o1-pro would use?
|
| I think claude.ai's web app in thinking mode is likely
| defaulting to a much much smaller thinking budget than that.
| davidbarker wrote:
| Claude 3.5 Sonnet is great, but on a few occasions I've gone
| round in circles on a bug. I gave it to o1 pro and it fixed it
| in one shot.
|
| More generally, I tend to give o1 pro as much of my codebase as
| possible (it can take around 100k tokens) and then ask it for
| small chunks of work which I then pass to Sonnet inside Cursor.
|
| Very excited to see what o3 pro can do.
| akomtu wrote:
| This is how the future AI will break free: "no idea what this
| update is doing, but what AI is suggesting seems to work and I
| have other things to do."
| sylware wrote:
| Is there some truth to the following relationship: o1 -> OpenAI
| -> Microsoft -> GitHub for "training data"?
| danieldevries wrote:
| Just tried Claude Code. First impression: it seems rather
| expensive. I prefer how Aider allows finer control over which
| files to add, or lets you use a sub-tree of a git repo. Also, it
| feels like the API calls when using Claude Code are much faster
| than when using 3.7 through Aider. Is it being given bandwidth
| priority?
| RomanPushkin wrote:
| > strong improvements in coding and front-end web development
|
| The best part
| Daniel_Van_Zant wrote:
| Being able to control how many tokens are spent on thinking is a
| game-changer. I've been building fairly complex, efficient,
| systems with many LLMs. Despite the advantages, reasoning models
| have been a no-go due to how variable the cost is, and how hard
| that makes it to calculate a final per-query cost for the
| customer. Being able to say "I know this model can always solve
| this problem in this many thinking tokens" and thus limit the
| cost for that component is huge.
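|
| For reference, a minimal sketch of what that cap looks like
| through the API (the model alias and the exact `thinking`
| field names are how I understand the current docs; worth
| double-checking before relying on them):
|
|     import anthropic
|
|     client = anthropic.Anthropic()
|
|     msg = client.messages.create(
|         model="claude-3-7-sonnet-latest",
|         # overall output cap; must exceed the thinking budget
|         max_tokens=8192,
|         # hard ceiling on reasoning tokens, which also caps
|         # the cost of that component per query
|         thinking={"type": "enabled", "budget_tokens": 4096},
|         messages=[{"role": "user",
|                    "content": "Solve the customer's query."}],
|     )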
| syndicatedjelly wrote:
| Claude Code is pretty sick. I love the terminal integration, I
| like being able to stay on the keyboard and not have to switch
| UIs. It did a nice job learning my small Django codebase and
| helping me finish out a feature that I wasn't sure how to
| complete.
| unsupp0rted wrote:
| Anybody else noticing that in Cursor, Claude Sonnet 3.7 is
| thinking much slower than Claude Sonnet 3.5 did?
| numba888 wrote:
| This was nice. I passed it the jseessort algorithm (discussed
| here recently, if you remember). Claude 3.7 generated C++ code.
| Non-working. But in a few steps it gave an extensive test, then
| a fix. It looked to be working after a couple of minutes. It's
| 5-6 times slower than std::sort. The result is better than what
| I got from o3-mini-hard. Not a fair comparison, actually, as the
| prompting was different.
| smusamashah wrote:
| > output limit of 128K tokens
|
| Is this limit for thinking mode only, or does normal mode have
| the same limit now? An 8192-token output limit can be a bit
| small these days.
|
| I was trying to extract all URLs along with their topics from a
| "what are you working on" HN thread, and the 8192-token limit
| couldn't cover it.
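|
| For what it's worth, the output ceiling is something you ask
| for per request; a minimal sketch (the model alias and the
| exact per-mode limits are the parts I'd double-check in the
| docs, and the full 128K reportedly sits behind an
| extended-output beta):
|
|     import anthropic
|
|     client = anthropic.Anthropic()
|
|     msg = client.messages.create(
|         model="claude-3-7-sonnet-latest",
|         max_tokens=64000,  # well past the old 8192 cap
|         messages=[{
|             "role": "user",
|             "content": "Extract every URL and its topic "
|                        "from the pasted thread.",
|         }],
|     )
|     print(msg.content[0].text)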
| cavisne wrote:
| So far Claude Code seems very capable; it one-shotted something
| I couldn't get to work in Cursor at all.
|
| However, it's expensive: 5 minutes of work cost ~$1.
___________________________________________________________________
(page generated 2025-02-24 23:00 UTC)