[HN Gopher] Everything around LLMs is still magical and wishful ...
___________________________________________________________________
Everything around LLMs is still magical and wishful thinking
Author : troupo
Score : 111 points
Date : 2025-07-04 21:16 UTC (1 hour ago)
(HTM) web link (dmitriid.com)
(TXT) w3m dump (dmitriid.com)
| atemerev wrote:
| "It's crypto all over again"
|
| Crypto is a lifeline for me, as I cannot open a bank account in
| the country I live in, for reasons I can neither control nor fix.
| So I am happy if crypto is useless for you. For me and for
| millions like me, it is a matter of life and death.
|
| As for LLMs -- once again, magic for some, a reliable
| deterministic instrument for others (and also magic). Just
| classified and sorted a few hundred invoices. Yes, magic.
| harel wrote:
| Can you elaborate on your situation? Which country are you in?
| How is crypto used there?
| mumbisChungo wrote:
| Said this in another thread and I'll repeat it here:
|
| It's the same problem that crypto experiences. Almost everyone
| is propagating lies about the technology, even if a majority of
| those doing so don't understand enough to realize they're lies
| (naivety vs malice).
|
| I'd argue there's more intentional lying in crypto and less
| value to be gained, but in both cases people who might derive
| real benefit from the hard truth of the matter are turning away
| before they enter the door due to dishonesty/misrepresentation
| - and in both cases there are examples of people deriving real
| value today.
| o11c wrote:
| > I'd argue there's more intentional lying in crypto
|
| I disagree. Crypto _sounds_ more like intentional lying
| because it's primarily hyped in contexts typical for
| scams/gambling. Yes, there are businesses involved (anybody
| can start one), but they're mostly new businesses or a tiny
| tack-on to an existing business.
|
| AI is largely being hyped within the existing major corporate
| structures, therefore its lies just get tagged as "business as
| usual". That doesn't make them any less of a lie, though.
| mumbisChungo wrote:
| I think crypto companies and AI companies probably
| intentionally mislead approximately the same amount as one
| another, but in crypto the average participant is often
| bagholding a very short term investment and has a direct
| and tangible incentive to mislead as many people about it
| as quickly as possible- whereas in AI people mostly just
| get lost in the sauce with anthropomorphism.
|
| Anecdotally, I see a lot more bald-faced lies by crypto
| traders or NFT "collectors" than by LLM enthusiasts.
| tehjoker wrote:
| This is basically the only use case for crypto, and one for
| which it was explicitly designed: censorship resistance. This
| is why people have so much trouble finding useful things for it
| to do in the legal economy: it was designed to facilitate
| transactions the government doesn't want to allow or can't
| facilitate. There are some humanitarian applications; there are
| also a lot of illicit ones.
| foobarchu wrote:
| I don't think you actually disagree with the author's quip. You
| seem to want to use crypto as a currency, while OP was most
| likely referring to the grifting around crypto as an
| investment. If you're using it as a currency, then the people
| trying to pump and dump coins and use it as a money-making
| vehicle are your adversaries. You are best served if it's
| stable instead of a rollercoaster of booms and busts.
| troupo wrote:
| It's a valid use case in the sea of nonsensical hype where "you
| are a moron if you don't believe in some true meaning of
| crypto".
|
| "You had to be there to believe it"
| https://x.com/0xbags/status/1940774543553146956
|
| The AI craze is currently going through a similar period: any
| criticism is brushed away as coming from morons who know
| nothing.
| dcre wrote:
| Similar argument to
| https://www.baldurbjarnason.com/2025/trusting-your-own-judge...,
| but I like this one better because at least it doesn't try to
| pull the rhetorical trick of slipping from "we can't know whether
| LLMs are helping because we haven't studied the question
| systematically" to "actually we do know, and they're shit".
| troupo wrote:
| Wow. Quite a conclusion from an article that doesn't actually
| reach that conclusion.
| careful_ai wrote:
| This resonates deeply. There's a fine line between innovation and
| illusion--and right now, so much of the LLM hype lives in the gap
| between demos and delivery. We're quick to build wrappers and
| workflows, but slow to ask: what actually works at scale?
| Appreciate the candor here--it's a reminder that just because
| something's magical doesn't mean it's mature.
| Arainach wrote:
| Amen.
|
| At my job right now there is an imminent threat from a team
| empowered to say "what if we asked an AI to just build X
| instead of having a team build and maintain it?"
|
| X is something that's straightforward when N is below 50 but
| deeply complex when N is in the thousands, which for our team
| it is. There is a huge risk that this team will get a demo
| with N=15 that attracts leadership attention, and trying to
| explain why the AI-generated solution does not scale is a
| career-limiting move that frames me as a naysayer. But this AI
| team would deliver the demo and go away, and the inevitable
| failure of their solution at scale would ALSO be my team's
| problem, so..... I hate the future.
| rdgthree wrote:
| FWIW the comment you are responding to was authored by AI.
| careful_ai wrote:
| Real human here--though I admit I do love a well-placed em
| dash. Appreciate the concern though. I'm just someone who's
| spent a lot of time building with LLMs and watching the
| line blur between promise and production. Always happy to
| discuss more if it helps clear the air.
| jules-jules wrote:
| This is so ChatGPT it hurts. Can we petition HN to ban AI-
| generated comments? I see more Reddit communities actively
| banning AI; HN should follow if it can be done with the
| available resources.
| careful_ai wrote:
| HN already does that but wouldn't it be easier if we
| trust and value someone else's opinions as well?
| ThrowawayR2 wrote:
| [delayed]
| careful_ai wrote:
| Ugh, this hit a little too close to home.
|
| We're seeing the same pattern--where leadership gets enamored
| with AI "magic," then greenlights a proof-of-concept with a
| few handpicked inputs, only to realize the whole thing falls
| apart in real-world complexity. What's worse is when raising
| those concerns gets you labeled the bottleneck.
|
| The hardest part isn't building AI. It's earning the patience
| to build it right--and making sure the people in charge
| understand the difference between a cool demo and a scalable
| system.
|
| Appreciate you voicing this so clearly. You're not alone in
| hating that part of the future.
| Jordan-117 wrote:
| - superficial emotion
|
| - cliche phrasing
|
| - em dashes
|
| - abundant alliteration
|
| - all comments suspiciously similar in length
|
| - all posts pointing to the same website
|
| Does HN not have a policy against vapid AI comment spam? If
| not, it needs one.
|
| edit: It does:
|
| https://news.ycombinator.com/item?id=37617714
| farts_mckensy wrote:
| Watch out, HN. The em dash police are here. Hands up
| careful_ai wrote:
| LOL. I've clearly overused my em dash quota this week.
| Noted. Will switch to semicolons next time; they need love
| too.
| bgwalter wrote:
| This one is funny though. I'd bet on Bing CoPilot, which now
| always agrees with "AI" concerns because MSFT has probably
| realized that no one wants "AI" and takes a more cautious
| approach.
| careful_ai wrote:
| Haha, if Bing Copilot could thread this much nuance into a
| comment, I'd hire it for my job. For now, just a human
| who's been burned by LLM hype cycles one too many times.
| careful_ai wrote:
| Didn't think using punctuation and full sentences would get
| me flagged--but I get it. In a world where AI writes like
| humans and humans write like bots, I guess tone becomes a
| Rorschach test. Just sharing thoughts I care about, with zero
| automation involved. If something I wrote sounded "AI-ish,"
| that's on me for being too polished. Feedback taken, and
| appreciated.
| Jordan-117 wrote:
| Reaction to a post at 21:39:36:
|
| https://news.ycombinator.com/item?id=44468067
|
| Reaction to a different post at 21:40:30:
|
| https://news.ycombinator.com/item?id=44468069
|
| Fast typist! (Incidentally, both are exactly 59 tokens
| long)
| gjm11 wrote:
| For what it's worth, I had a similar "that looks like AI
| writing" response, and it wasn't because it was "too
| polished". And having looked at the rest of your comment
| history, the only reason why I'm only at 90% confidence
| it's all AI-generated rather than 100% is your explicit
| claims to the contrary. Today's LLMs have a definite style
| that is, sorry, _not_ the same thing as being "polished",
| and if your comments have "zero automation involved" then
| it's quite the extraordinary coincidence how much more like
| an AI you sound than any other human writer I have ever
| encountered. And a further coincidence that this very AI-
| sounding human just happens to be selling services to
| "unlock Meaningful Business Outcomes With AI".
| notphilipmoran wrote:
| I think the disparity comes from people who are too in the
| weeds believing that their use cases apply to everyone. The
| reality is this world is made up of people with a wide array of
| different needs, and AI has yet to proliferate into all usage
| applications.
|
| Sure some of this comes from a lack of education.
|
| But similar to crypto these movements only have value if the
| value is widely perceived. We have to work to continue to
| educate, continue to question, continue to understand different
| perspectives. All in favor of advancing the movement and coming
| out with better tech.
|
| I am a supporter of both but I agree with the reference in the
| article to both becoming echo chambers at times. This is a
| setback we need to avoid.
| tempodox wrote:
| Was this text generated by an LLM?
| blueboo wrote:
| Ok. Claude Code produces most code at Anthropic. There's an
| enterprise code base with acute real needs. There are real,
| experienced SWEs. How much babysitting and reviewing is
| involved is undetermined; but the Ants seem to tremendously
| prefer the workflow.
|
| Even crypto people didn't dogfood their crypto like that, on
| their own critical path.
| taurath wrote:
| Truly, how could they have the valuation they have and do
| anything else?
| rightbyte wrote:
| > but the Ants
|
| Is that the official cutesy name for people working there?
| That feels so 2020 ...
| peter422 wrote:
| In the codebase of my proprietary project, it's possible that
| LLMs write around half the code, but that's just because a lot
| of the trivial parts of the project are quite verbose.
|
| The really difficult and valuable parts of the codebase are
| very very far beyond what the current LLMs are capable of, and
| believe me, I've tried!
|
| Writing the majority of the code is very different from
| creating the majority of the value.
|
| And I really use and value LLMs, but they are not replacing me
| at the moment.
| thisoneworks wrote:
| Regardless of the usefulness of LLMs, if you don't work at
| Anthropic, how gullible do you have to be to take that claim at
| face value?
| troupo wrote:
| > Ok. Claude Code produces most code at Anthropic.
|
| Does it? Or does their marketing tell you that? Strange that
| "most code is written by Claude" and they still hire for actual
| humans for all the positions from backend to API to desktop to
| mobile clients.
|
| > How much babysitting and reviewing is involved is
| undetermined; but the Ants seem to tremendously prefer the
| workflow.
|
| So. We know nothing about their codebase, actual flows,
| programming languages, depth and breadth of usage, how much
| babysitting is required...
| jm20 wrote:
| The best way I've heard this described: AI (LLMs) is probably 90%
| of the way to human levels of reasoning. We can probably get to
| about 95% optimizing current technology.
|
| Whether or not we can get to 100% using LLMs is an open research
| problem and far from guaranteed. If we can't, it's unclear if it
| will ever really proliferate the way people hope. That 5% makes a
| big difference in most non-niche use cases...
| krapp wrote:
| >The best way I've heard this described: AI (LLMs) is probably
| 90% of the way to human levels of reasoning. We can probably
| get to about 95% optimizing current technology.
|
| We don't know enough about how LLMs work or about how human
| reasoning works for this to be at all meaningful. These numbers
| quantify nothing but wishes and hype.
| ath3nd wrote:
| > AI (LLMs) is probably 90% of the way to human levels of
| reasoning
|
| Considering LLMs have 0 level of reasoning, I can't decide if
| it's a bad take, or a stab at the average human's level of
| reasoning.
|
| In all seriousness, the actual numbers vary from 13% to 26%:
| https://fortune.com/2025/02/12/openai-deepresearch-humanity-...
|
| My take is that there are fundamental limits to pigeonholing
| reasoning onto LLMs, which are essentially a very, very
| advanced autocomplete, and that's why those percentages won't
| jump much anytime soon.
| farts_mckensy wrote:
| Whenever people claim that LLMs are not capable of reasoning,
| I put them into a category of people who are themselves not
| capable of reasoning.
| ath3nd wrote:
| Whenever people claim that LLMs are capable of reasoning, I
| put them into a category of people who are themselves able
| to reason as much as an LLM.
| andy99 wrote:
| I've always looked at it as we're not making software that can
| think, we're (quite literally) demonstrating that vast
| categories of things don't need thought (for some quality
| level). The problem is, it's clearly not 100%, maybe it's
| 90-some percent, but it doesn't matter, we're only outsourcing
| the unimportant things that aren't definitional for a task.
|
| This is very typical of naive automation, people assume that
| most of the work is X and by automating that we replace people,
| but the thing that's automated is almost never the real
| bottleneck. Pretty sure I saw an article here yesterday about
| how writing code is not the bottleneck in software development,
| and it holds everywhere.
| farts_mckensy wrote:
| The discussion is completely useless without defining what
| thought is and then demonstrating that LLMs are not capable of
| it. And I doubt any definition you come up with will be
| workable.
| ethan_smith wrote:
| These percentage estimates of AI's proximity to "human
| reasoning" are misleading abstractions that mask fundamental
| qualitative differences in how LLMs and humans process
| information.
| sherdil2022 wrote:
| I follow Emily Bender on LinkedIn. She cuts through the AI hype
| and is also a co-author of the book The AI Con - https://thecon.ai/
|
| Of course people will either love AI or hate AI - and some don't
| care. I am cautious especially when people say 'AI is here to
| stay'. It takes away agency.
| farts_mckensy wrote:
| AI is here to stay, and you do not have agency over that. You
| can choose not to use it, but that has zero impact on the
| broader adoption rate. Just like when the automobile was
| introduced and society as a whole evolved.
| jjtheblunt wrote:
| https://en.wikipedia.org/wiki/Clarke%27s_three_laws
|
| includes the 3rd law, which seems on topic and reads:
|
| "Any sufficiently advanced technology is indistinguishable from
| magic."
| readthenotes1 wrote:
| And of course there's the first law, which also applies here.
|
| The people I have talked to at length about using AI tools
| claim that it has been a boon for productivity: a nurse, a
| doctor, three (old) software developers, a product manager, and
| a graduate student in Control Systems.
|
| It is entirely believable that it may not, on average, help the
| average developer.
|
| I'm reminded of the old joke that ends with "who are you going
| to believe, me or your lying eyes?"
| tasty_freeze wrote:
| One thing I find frustrating is that management where I work has
| heard of 10x productivity gains. Some of those claims even come
| from early adopters at my work.
|
| But that sets expectation way too high. Partly it is due to
| Amdahl's law: I spend only a portion of my time coding, and far
| more time thinking and communicating with others that are
| customers of my code. Even if it does make the coding 10x faster
| (and it doesn't most of the time) overall my productivity is
| 10-15% better. That is nothing to sneeze at, but it isn't 10x.
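|
| The Amdahl arithmetic, as a quick sketch (the 13% coding share
| below is an illustrative assumption, not a measurement):
|
|   # Amdahl's law: overall speedup when only a fraction p of
|   # the work is accelerated by a factor s.
|   def overall_speedup(p: float, s: float) -> float:
|       return 1.0 / ((1.0 - p) + p / s)
|
|   # If ~13% of my time is typing code and the LLM makes that
|   # part 10x faster:
|   print(overall_speedup(0.13, 10))  # ~1.13, i.e. ~13% faster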
| louthy wrote:
| > overall my productivity is 10-15% better. That is nothing to
| sneeze at, but it isn't 10x.
|
| It is something to sneeze at if you are 10-15% more expensive
| to employ due to the cost of the LLM tools. The total cost of
| production should always be considered, not just throughput.
| bravesoul2 wrote:
| That's a good insight, because with perfect competition it
| means you need to share your old salary with an LLM!
| votepaunchy wrote:
| > if you are 10-15% more expensive to employ due to the cost
| of the LLM tools
|
| How is one spending anywhere close to 10% of total
| compensation on LLMs?
| CharlesW wrote:
| > _It is something to sneeze at if you are 10-15% more
| expensive to employ due to the cost of the LLM tools._
|
| Claude Max is $200/month, or ~2% of the salary of an average
| software engineer.
| m4rtink wrote:
| Does anyone actually know what the real cost for the
| customers will be once the free AI money no longer floods
| those companies?
| wubrr wrote:
| I'm no LLM evangelist, far from it, but I expect models
| of similar quality to the current bleeding-edge will be
| freely runnable on consumer hardware within 3 years.
| Future bleeding-edge models may well be more expensive
| than current ones, who knows.
| jppope wrote:
| Yeah, there was an analysis that came out on Hacker News
| the other day. Between low demand-side economics,
| virtually no impact on GDP, and corporate/VC subsidies
| going away soon, we're close to finding out. Sam Altman
| did convince SoftBank to do a $40B round though, so it
| might be another year or two. Current estimates are that
| it's cheaper than search to run, so it's probable that
| more search features will be swapped in. OpenAI hasn't
| dropped their ad platform yet though, so I'm interested
| to see how that goes.
| wubrr wrote:
| > One thing I find frustrating is that management where I work
| has heard of 10x productivity gains. Some of those claims even
| come from early adopters at my work.
|
| Similar situation at my work, but all of the productivity
| claims from internal early adopters I've seen so far are based
| on very narrow ways of measuring productivity, and very sketchy
| math, to put it mildly.
| jppope wrote:
| The reports from analyses of open source projects are that it's
| something in the range of 10%-15% productivity gains... so it
| sounds like you're spot on.
| smcleod wrote:
| That's about right for copilots. It's much higher for agentic
| coding.
| estomagordo wrote:
| [citation needed]
| datpuz wrote:
| It's just another tech hype wave. Reality will be somewhere
| between total doom and boundless utopia. But probably neither
| of those.
|
| The AI thing kind of reminds me of the big push to outsource
| software engineers in the early 2000s. There was a ton of hype
| among executives about it, and it all seemed plausible on
| paper. But most of those initiatives ended up being huge
| failures, and nearly all of those jobs came back to the US.
|
| People tend to ignore a lot of the little things software
| engineers do that glue it all together. AI lacks a lot of
| this. Foreigners don't necessarily lack it, but language
| barriers, time zone differences, cultural differences, and all
| sorts of other things led to similar issues. Code quality and
| maintainability took a nosedive and a lot of the stuff produced
| by those outsourced shops had to be thrown in the trash.
|
| I can already see the AI slop accumulating in the codebases I
| work in. It's super hard to spot a lot of these things that
| manage to slip through code review, because they tend to look
| reasonable when you're looking at a diff. The problem is all
| the redundant code that you're not seeing, and the weird
| abstractions that make no sense at all when you look at it from
| a higher level.
| 2muchcoffeeman wrote:
| This was what I was saying to a friend the other day. I think
| anyone vaguely competent who is using LLMs will make the
| technology look far better than it is.
|
| Management thinks the LLM is doing most of the work. Work is
| off shored. Oh, the quality sucks when someone without a clue
| is driving. We need to hire again.
| mlinsey wrote:
| I don't disagree with your assessment of the world today, but
| just 12 months ago (before the current crop of base models and
| coding agents like Claude Code), even that 10X improvement of
| writing some-of-the-code wouldn't have been true.
| __loam wrote:
| It still isn't.
| timr wrote:
| > I don't disagree with your assessment of the world today,
| but just 12 months ago (before the current crop of base
| models and coding agents like Claude Code), even that 10X
| improvement of writing some-of-the-code wouldn't have been
| true.
|
| So? It sounds like you're prodding us to make an
| extrapolation fallacy (I don't even grant the "10x in 12
| months" point, but let's just accept the premise for the sake
| of argument).
|
| Honestly, 12 months ago the base models weren't substantially
| worse than they are right now. Some people will argue with me
| endlessly on this point, and maybe they're a bit better on
| the margin, but I think it's pretty much true. When I look at
| the improvements of the last year with a cold, rational eye,
| they've been in two major areas:
|
| * cost & efficiency
|
| * UI & integration
|
| So how do we improve from here? Cost & efficiency are the
| obvious lever with historical precedent: GPUs kinda suck for
| inference, and costs are (currently) rapidly dropping. But,
| maybe this won't continue -- algorithmic complexity is what
| it is, and barring some revolutionary change in the
| architecture, LLMs are exponential algorithms.
|
| UI and integration is where most of the rest of the recent
| improvement has come from, and honestly, this is pretty close
| to saturation. All of the various AI products _already look
| the same_, and I'm certain that they'll continue to converge
| to a well-accepted local maximum. After that, huge gains in
| productivity from UX alone will not be possible. This will
| happen quickly -- probably in the next year or two.
|
| Basically, unless we see a Moore's law of GPUs, I wouldn't
| bet on indefinite exponential improvement in AI. My bet is
| that, from here out, this looks like the adoption curve of
| any prior technology shift (e.g. mainframe -> PC, PC ->
| laptop, mobile, etc.) where there's a big boom, then a long,
| slow adoption for the masses.
| ssk42 wrote:
| What exactly are you basing any of your assertions off of?
| deadbabe wrote:
| Wait till they hear about the productivity gains from using
| vim/neovim.
|
| Your developers still push a mouse around to get work done?
| Fire them.
| ghuntley wrote:
| Canva has seen a 30% productivity uplift -
| https://fortune.com/2025/06/25/canva-cto-encourages-all-5000...
|
| AI is the new uplift. Embrace and adapt, as a rift is forming
| (see my talk at https://ghuntley.com/six-month-recap/) in what
| employers seek in terms of skills from employees.
| labrador wrote:
| I'm a retired programmer. I can't imagine trusting code generated
| by probabilities for anything mission critical. If it were close
| and just needed minor tweaks I could understand that. But I don't
| have experience with it.
|
| My comment is mainly to say LLMs are amazing in areas that are
| not coding, like brainstorming, blue sky thinking, filling in
| research details, asking questions that make me reflect. I treat
| the LLM like a thinking partner. It does make mistakes, but those
| can be caught easily by checking other sources, or even having
| another LLM review the conclusions.
| garciasn wrote:
| Well, I can't speak to your specific experience (current or
| past) but I'm telling you that while I'm skeptical as hell
| about EVERYTHING, it's blowing my expectations away in every
| conceivable way.
|
| I built something in less than 24h that I'm sure would have
| taken us MONTHS to just get off the ground, let alone to the
| polished version it's at right now. It can do all of the things
| that I absolutely can do, just faster. But the most impressive
| thing is that it can do all the things I cannot possibly do and
| would have had to hire
| up/contract out to accomplish--for far less money, time, and
| with faster iterations than if I had to communicate with
| another human being.
|
| It's not perfect and it's incredibly frustrating at times
| (hardcoding values into the code when I have explicitly told it
| not to; outright lying that it made a particular fix, when it
| actually changed something else entirely unrelated), but it is
| a game changer IMO.
| gyomu wrote:
| > I built something in less than 24h that I'm sure would have
| taken us MONTHS to just get off the ground, let alone to the
| polished version it's at right now
|
| Would love to see it!
| 98eb1d0ff7fb96 wrote:
| See, your comment is a good example of what's going wrong.
| The OP specifically mentioned "mission critical things" - my
| interpretation of that would be things that are not allowed
| to break, because otherwise people might die, in the worst
| case - and you were talking about just SOMETHING that got
| "done" faster. No mention about anything critical.
|
| Of course, I was playing around with Claude Code, too, and I
| was fascinated by how fun it can be and yes, you can get stuff
| done. But I have absolutely no clue what the code is doing
| or whether there are nasty mistakes. So it kinda worked, but
| I would not use that for anything "mission critical"
| (whatever this means).
| CharlesW wrote:
| > _So it kinda worked, but I would not use that for
| anything "mission critical" (whatever this means)._
|
| It means projects like Cloudflare's new OAuth provider
| library:
| https://github.com/cloudflare/workers-oauth-provider
|
| > _" This library (including the schema documentation) was
| largely written with the help of Claude, the AI model by
| Anthropic. Claude's output was thoroughly reviewed by
| Cloudflare engineers with careful attention paid to
| security and compliance with standards. Many improvements
| were made on the initial output, mostly again by prompting
| Claude (and reviewing the results)."_
| labrador wrote:
| The mission of a professional programmer is to deliver code
| that works according to the design specs, handles edge
| cases, fails gracefully and doesn't contain performance
| bottlenecks. It could be software for a water plant, or
| software that incurs charges to accomplish it's task and
| could bankrupt you if there is a mistake. It doesn't have
| to be a matter of life or death.
| svdr wrote:
| I've been programming for 40 years and started using LLMs a
| few months ago, and it has really changed the way I work. I
| let it write pieces of code (pasting error messages from logs
| mostly results in a fix in less than a minute), but also
| brainstorming about architecture or new solutions. Of course I
| check the code it writes, but I'm still almost daily amazed at
| the intelligence and accuracy. (Very much unlike crypto).
| labrador wrote:
| That's good to know if I ever get an idea for a side project.
| Anything to relieve the tedious aspects of programming would
| be very welcome.
| anon-3988 wrote:
| There's one thing that I find LLMs extremely good at: data
| science. Since the IO is well defined, you can easily verify
| that the output is correct. You can even ask it write tests for
| you given that you know certain properties of the data.
|
| The problem is that the LLM needs context for what you are
| doing, context that you won't (or are too lazy to) give in a
| chat with it a la ChatGPT. This is where Claude Code changes
| the game.
|
| For example, you have a PCAP file where each UDP packet
| contains multiple messages.
|
| How do you filter the IP/port/protocol/time? Use LLM, check the
| output
|
| How do you find the number of packets that have patterns A, AB,
| AAB, ABB.... Use LLM, check the output
|
| How to create PCAPs that only contain those packets for
| testing? Use LLM, check the output
|
| Etc etc
|
| Since it can read your code, it is able to infer (because let's
| be honest, your work ain't special) what you are trying to do
| at a much better rate. In any case, the fact that you can simply
| ask "Please write a unit test for all of the above functions"
| means that you can help it verify itself.
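|
| For the PCAP examples above, a minimal sketch of the kind of
| filter script such a session produces (assuming scapy; the
| address, port, time window, and file names are illustrative,
| and the output still gets verified against the capture):
|
|   from scapy.all import rdpcap, wrpcap, IP, UDP
|
|   packets = rdpcap("capture.pcap")
|
|   # Keep only UDP packets from one host/port inside a window.
|   keep = [
|       p for p in packets
|       if IP in p and UDP in p
|       and p[IP].src == "10.0.0.5"
|       and p[UDP].dport == 9000
|       and 1000.0 <= float(p.time) <= 2000.0
|   ]
|
|   # Write a small PCAP with just those packets for testing.
|   wrpcap("filtered.pcap", keep)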
| fleebee wrote:
| I tried the "thinking partner" approach for a while and for a
| moment I thought it worked well, but at some point the cracks
| started to show and I called the bluff. LLMs are extremely good
| at creating an illusion that they know things and are capable
| of reasoning, but they really don't do a good job of
| cultivating intellectual conversation.
|
| I think it's dangerously easy to get misled when trying to prod
| LLMs for knowledge, especially if it's a field you're new to.
| If you were using a regular search engine, you could look at
| the source website to determine the trustworthiness of its
| contents, but LLMs don't have that. The output can really be
| whatever, and I don't agree it's necessarily that easy to catch
| the mistakes.
| labrador wrote:
| You don't say what LLM you are using. I'm using ChatGPT 4o.
| I'm getting great results, but I review the output with a
| skeptical eye similar to how I read Wikipedia articles. Like
| Wikipedia, GPT 4o is great for surfacing new topics for
| research and does it quickly, which makes stream of thought
| easier.
| yahoozoo wrote:
| The thing is, the questions such as "are they an expert in the
| domain" ... "are they good at coding to being with" ... and so on
| only really apply to the folks claiming positive results from
| LLMs. On the flip side, someone not getting much value - or dare
| I say, a skeptic - pushes back because they _can see_ what the
| LLM gave them is wrong. I'm not providing any revelatory comment
| here, but the simple truth is: people who are shit to begin with
| think this is all amazing/magic/the future.
| hotpotat wrote:
| I have to say I'm in the exact camp the author is complaining
| about. I've shipped non-trivial greenfield products which I
| started back when it was only ChatGPT and it was shitty. I
| started using Claude with copying and pasting back and forth
| between the web chat and XCode. Then I discovered Cursor. It left
| me with a lot of annoying build errors, but my productivity was
| still at least 3x. Now that agents are better and claude 4 is
| out, I barely ever write code, and I don't mind. I've leaned into
| the Architect/Manager role and direct the agent with my
| specialized knowledge if I need to.
|
| I started a job at a demanding startup and it's been several
| months and I have still not written a single line of code by
| hand. I audit everything myself before making PRs and test
| rigorously, but Cursor + Sonnet is just insane with their
| codebase. I'm convinced I'm their most productive employee and
| that's not by measuring lines of code, which don't matter; people
| who are experts in the codebase ask me for help with niche bugs I
| can zero in on in 5-30 minutes as someone who's fresh to their
| domain. I had to lay off taking work away from the front end dev
| (which I've avoided my whole career) because I was stepping on
| his toes, fixing little problems as I saw them thanks to Claude.
| It's not vibe coding - there's a process of research and planning
| and perusing in careful steps, and I set the agent up for
| success. Domain knowledge is necessary. But I'm just so floored
| how anyone could not be extracting the same utility from it. It
| feels like there's two articles like this every week now.
| the__alchemist wrote:
| Web dev CRUD in node?
| hotpotat wrote:
| Multi-platform web+native consumer application with lots of
| moving parts and integration. I think to call it a CRUD app
| would be oversimplifying it.
| gyomu wrote:
| > I've shipped non trivial greenfield products
|
| Links please
| hotpotat wrote:
| I'd like to, but purposefully am using a throwaway account.
| It's an iOS app rated 4.5 stars on the app store and has a
| nice community. Mild userbase, in the hundreds.
| larve wrote:
| Here's maybe the most impressive thing I've vibecoded, where
| I wanted to track a file write/read race condition in a
| vscode extension:
| https://github.com/go-go-golems/go-go-labs/tree/main/cmd/exp...
|
| This is _far_ from web crud.
|
| Otherwise, 99% of my code these days is LLM generated,
| there's a fair amount of visible commits from my opensource
| on my profile https://github.com/wesen .
|
| A lot of it is more on the system side of things, although
| there are a fair amount of one-off webapps, now that I can do
| frontends that don't suck.
| 0x696C6961 wrote:
| I find that the code quality LLMs output is pretty bad. I end
| up going through so many iterations that it ends up being
| faster to do it myself. What I find agents actually useful for
| is doing large scale mechanical refactors. Instead of trying
| to figure out the perfect vim macro or AST rewrite script, I'll
| throw an agent at it.
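|
| For concreteness, the kind of one-off rewrite script this
| replaces (Python's ast module; the names old_name/new_name and
| module.py are illustrative):
|
|   import ast
|
|   class RenameCall(ast.NodeTransformer):
|       # Rename every call to old_name() into new_name().
|       def visit_Call(self, node):
|           self.generic_visit(node)
|           if isinstance(node.func, ast.Name) \
|                   and node.func.id == "old_name":
|               node.func.id = "new_name"
|           return node
|
|   source = open("module.py").read()
|   tree = RenameCall().visit(ast.parse(source))
|   print(ast.unparse(tree))  # Python 3.9+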
| CharlesW wrote:
| > _I find that the code quality LLMs output is pretty bad._
|
| That was my experience with Cursor, but Claude Code is a
| different world. What specific product/models brought you to
| this generalization?
| AnotherGoodName wrote:
| I disagree strongly at this point. The code is generally good
| if the prompt was reasonable, but also: every possible test is
| now being written, every UI element has all the required
| traits, every function has the correct
| documentation attached, the million little refactors to
| improve the codebase are being done, etc.
|
| Someone told me "AI makes all the little things trivial to
| do" and I agree strongly with that. Those many little things
| are things that together make a strong statement about
| quality. Our codebase has gone up in quality significantly
| with ai whereas we'd let the little things slide due to
| understaffing before.
| the__alchemist wrote:
| What sort of mechanical refactors?
| bamboozled wrote:
| I use Claude Code for hours a day; it's a liar. Trust what it
| does at your own risk.
|
| I personally think you're sugar coating the experience.
| CharlesW wrote:
| > _I use Claude Code for hours a day; it's a liar. Trust what
| it does at your own risk._
|
| The person you're responding to literally said, "I audit
| everything myself before making PRs and test rigorously".
| troupo wrote:
| Please re-read the article. Especially the first list of things
| we don't know about you, your projects etc.
|
| Your specific experience cannot be generalized. And I'm
| speaking as the author, who is (as written in the article)
| literally using these tools every day.
|
| > But I'm just so floored how anyone could not be extracting
| the same utility from it. It feels like there's two articles
| like this every week now.
|
| This is where we learn that you haven't actually read the
| article. Because it very clearly states, with links, that I
| am extracting value from these tools.
|
| And the article is also very clearly not about extracting or
| not extracting value.
| hotpotat wrote:
| I did read the entire article before commenting and
| acknowledge that you are using them to some effect, but the
| line about 50% of the time it works 50% of the time is where
| I lost faith in the claims you're making. I agree it's very
| context dependent but, in the same way, you did not outline
| your approaches and practices in how you use AI in your
| workflow. The same lack of context exists on the other side
| of the argument.
| CharlesW wrote:
| > _...the line about 50% of the time it works 50% of the
| time is where I lost faith in the claims you're making..._
|
| That's where the author lost me as well. I'd really be
| interested in a deep dive on their workflow/tools to
| understand how I've been so unbelievably lucky in
| comparison.
| troupo wrote:
| Sibling comment:
| https://news.ycombinator.com/item?id=44468374
| troupo wrote:
| > but the line about 50% of the time it works 50% of the
| time is where I lost faith in the claims you're making.
|
| It's a play on the Anchorman joke that I slightly
| misremembered: "60% of the time it works 100% of the time"
|
| > is where I lost faith in the claims you're making.
|
| Ah yes. You lost faith in mine, but I have to have 100%
| faith in your 100% unverified claim about "job at a
| demanding startup" where "you still haven't written a
| single line of code by hand"?
|
| Why do you assume that your word and experience is more
| correct than mine? Or why should anyone?
|
| > you did not outline your approaches and practices in how
| you use AI in your workflow
|
| No one does. And if you actually read the article, you'd
| see that is _literally the point_.
| alt187 wrote:
| I agree about the 50/50 thing. It's about how much Claude
| helped me, and I use it daily _too_.
|
| I'll give some context, though.
|
| - I use OCaml and Python/SQL, on two different projects.
|
| - Both are single-person.
|
| - The first project is a real-time messaging system, the
| second one is logging a bunch of events in an SQL database.
|
| In the first project, Claude has been... underwhelming. It
| casually uses C idioms, overuses records and procedural
| programming, ignores basic stuff about the OCaml standard
| library, and even gave me some data structures that slowed
| me down later down the line. It also casually lies about what
| functions do.
|
| A real example: `Buffer.add_utf_8_uchar` adds the ASCII
| representation of a UTF-8 char to a buffer, so it adds
| something that looks like `\123\456` for non-ASCII.
|
| I had to scold Claude for using this function to add a UTF-8
| character to a Buffer so many times that I've lost count.
|
| In the second project, Claude really shined: making most of
| the SQL database and moving most of the logic to the SQL
| engine, writing coherent and readable Python code, etc.
|
| I think the main difference is that the first one is an
| arcane project in an underdog language. The second one is a
| special case of a common "shovel through lists of stuff
| and stuff them in SQL" problem, in the most common
| language.
|
| You basically get what it was trained on.
| mccoyb wrote:
| Same experience here, probably in a slightly different way of
| work (PhD student). Was extremely skeptical of LLMs, Claude
| Code has completely transformed the way I work.
|
| It doesn't take away the requirements of _curation_ - that
| remains firmly in my camp (partially what a PhD is supposed to
| teach you! to be precise and reflective about why you are doing
| X, what do you hope to show with Y, etc -- breakdown every
| single step, explain those steps to someone else -- this is a
| tremendous soft skill, and it's even more important now because
| these agents do not have persistent world models / immediately
| forget the goal of a sequence of interactions, even with clever
| compaction).
|
| If I'm on my game with precise communication, I can use CC to
| organize computation in a way which has never been possible
| before.
|
| It's not easier than programming (if you care about quality!),
| but it is different, and it comes with different idioms.
| exe34 wrote:
| > but my productivity was still at least 3x
|
| How do you measure this?
| DiscourseFan wrote:
| ChatGPT can write research papers in about 20 minutes--it's the
| "Deep Research" tool. These are not original papers, but it can
| perform complex tasks that require multiple steps that would
| normally take a person hours. No, it's not a magic
| superintelligence, but it will transform a lot of white collar
| labor.
| martinald wrote:
| I personally don't really get this.
|
| _So much_ work in the "services" industries globally really
| comes down to a human transposing data from one Excel sheet to
| another (or from a CRM/emails to Excel), manually. Every (or
| nearly every) enterprise scale company will have hundreds if not
| thousands of FTEs doing this kind of work day in day out - often
| with a lot of it outsourced. I would guess that for every 1
| software engineer there are 100 people doing this kind of 'manual
| data pipelining'.
|
| So really for giant value to be created out of LLMs you do not
| need them to be incredible at OCaml. They just need to
| ~outperform humans on Excel. Where I do think MCP really helps is
| that you can connect all these systems together easily, and a lot
| of the errors in this kind of work came from trying to pass the
| entire 'task' in context. If you can take an email via MCP,
| extract some data out, and put it into a CRM (again via MCP) a
| row at a time, the hallucination rate is very low IME. I would
| say at least at the level of an overworked junior human.
|
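| A rough sketch of that shape (read_next_email and add_crm_row
| are hypothetical MCP-backed tools, and ask() is a hypothetical
| wrapper around the model):
|
|   import json
|
|   # One email -> one CRM row at a time keeps each call's
|   # context small, which is what keeps hallucinations rare.
|   for email in read_next_email(folder="inbox"):
|       raw = ask("Extract vendor, amount, and due date from "
|                 "this email as JSON:\n" + email.body)
|       add_crm_row(table="invoices", row=json.loads(raw))
|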
| Perhaps this was the point of the article, but non-determinism is
| not an issue for these kind of use cases, given all the humans
| involved are not deterministic either. We can build systems and
| processes to help enforce quality on non deterministic (eg:
| human) systems.
|
| Finally, I've followed crypto closely and also LLMs closely. They
| do not seem to be similar in terms of utility and adoption. The
| closest thing I can recall is smartphone adoption. A lot of my
| non-technical friends didn't think they wanted a smartphone
| when the iPhone first came out. Within a few years, all of
| them had one. It's similar with LLMs. Virtually all of my
| non-technical friends use it now for incredibly varied use
| cases.
| tiahura wrote:
| Everything? As a lawyer, I'm producing 2x - with fewer errors.
| Admittedly, law is a field that mostly involves shuffling words
| around so it may be the best case scenario, but much of the
| skepticism comes off as cope.
| larve wrote:
| Software methodologies and workflows are not engineering either,
| yet we spend a fair amount of time iterating and refining those.
| You can very much become better at prompt engineering. There is a
| huge differential between individuals, for example.
|
| The code coming out of LLMs is just as deterministic as code
| coming out of humans, and despite humans being fickle beings, we
| still talk of software engineering.
|
| As for LLMs, they are and will forever be "unknowable". The human
| mind just can't comprehend what a billion parameters trained on
| trillions of tokens under different regimes for months
| corresponds to. While science takes only microscopic steps
| toward understanding the brain, we still have methods to teach,
| learn, be creative, be rigorous, and communicate that do work
| despite it being this "magical" organ.
|
| With LLMs, you can be pretty rigorous. Benchmarks, evals, and
| just the vibes of day to day usage if you are a programmer, are
| not "wishful thinking", they are reasonably effective methods and
| the best we have.
| sureglymop wrote:
| Loosely related, but I find the use of AGI (and sometimes even
| AI) as terms annoying lately. Especially in scientific papers,
| where I would imagine everything to be well defined - at least
| in how it is used in that paper.
|
| So, why can't we just come up with _some_ definition for what AGI
| is? We could then, say, logically prove that some AI fits that
| definition. Even if this doesn't seem practically useful, it's
| theoretically much more useful than just using that term with no
| meaning.
|
| Instead it kind of feels like it's an escape hatch. On Wikipedia
| we have "a type of AI that would match or surpass human
| capabilities across virtually all cognitive tasks". How could we
| measure that? What good is this if we can't prove that a system
| has this property?
|
| Bit of a rant but I hope it's somewhat legible still.
| AlienRobot wrote:
| We have a definition.
|
| "AI is whatever hasn't been done yet."[1]
|
| 1. https://en.wikipedia.org/wiki/AI_effect
| CharlesW wrote:
| "AI" and "AGI" are _very_ different things.
| AbrahamParangi wrote:
| This reads like the author is mad about imprecision in the
| discourse - which is real, but to be quite frank is more
| rampant amongst detractors than promoters, who often have to
| deal with the flaws and limitations on a day-to-day basis.
|
| The conclusion that everything around LLMs is magical thinking
| seems to be fairly hubristic to me given that in the last 5 years
| a set of previously borderline intractable problems have become
| _completely or near completely solved_: translation,
| transcription, and code generation (up to some scale), for
| instance.
| arendtio wrote:
| I think it is more like googling: when search engines appeared,
| everybody had to learn how to write a good query, even though
| the expectation was that everybody could use them.
|
| With LLMs, it's quite similar: you have to learn how to use them.
| Yes, they are non-deterministic, but if you know how to use them,
| you can increase your chances of getting a good result
| dramatically. Often, this not only means articulating a task, but
| also looking at the bigger picture and asking yourself what tasks
| you should assign in the first place.
|
| For example, I can ask the LLM to write software directly, or I
| can ask it to write user stories or prototypes and then take a
| multi-step approach to develop the software. This can make a huge
| difference in reliability.
|
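| A minimal sketch of that multi-step shape (ask() is a
| hypothetical wrapper around whatever model is in use; the goal
| string is illustrative):
|
|   # Step 1: turn a vague goal into explicit user stories.
|   goal = "a CLI tool that deduplicates photo libraries"
|   stories = ask("Write user stories for: " + goal)
|
|   # Step 2: implement against each story, not the vague goal.
|   for story in stories.split("\n\n"):
|       draft = ask("Implement this user story:\n" + story)
|       review(draft)  # hypothetical human checkpoint
|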
| And to be clear, I don't mean that every bad result is caused by
| not correctly handling the LLM (some models are simply poor at
| specific tasks), but rather that it is a significant factor to
| consider when evaluating results.
| alganet wrote:
| LLM tech probably will find some legitimate use, but by then,
| everything will be filled with people misusing it.
|
| Millions of beginner developers running with scissors in their
| hands, millions in investment going into the garbage.
|
| I don't think this can be reversed anymore; companies are all-in
| and pot-committed.
| Sherveen wrote:
| This is completely incoherent. 3 reasons:
|
| 1. he talks about what he's shipped, and yet compares it to
| _crypto_ - already, you're in a contradiction as to your
| relative comparison - you straight up shouldn't blog if you can't
| conceive that these two are opposing thoughts
|
| 2. this whole refrain from people of like, "SHOW ME your
| enterprise codebase that includes lots of LLM code" - HELLO,
| people who work at private companies CANNOT just reveal their
| codebase to you for internet points
|
| 3. anyone who has actually used these tools has integrated them
| into their daily life - millions of people and billions of
| dollars' worth - unless you think all CEOs are in a grand
| conspiracy, lying about their teams adopting AI
| ibaikov wrote:
| The crypto and NFT situation happened because of our society,
| media, and VC/startup landscape, which hype things up a lot for
| their own reasons. We treat massive technologies as new brands
| of bottled water. Or, actually, as new hype toys like fidget
| spinners or pop-it toys. This tech is massively more complex
| and you have to
| invest time to learn about its abilities, limitations and
| potential developments. Almost nobody actually does this, it's
| easier to follow hype train and put money into something that
| grows up and looks cool without obvious cons. Crypto is cool for
| some stuff. On the other hand, where's your Stepn (and move to
| earn in general), decentraland cities, Apes that will make a
| multimedia universe? Where's "you'll be paying using crypto for
| everything"?
|
| Same for LLMs and AI: it is awesome for some things and
| absolutely sucks for other things. Curiously though, it feels like
| UX was solved by making chats, but it actually still sucks
| enormously, as with crypto. It is mostly sufficient for doing
| basic stuff. It is difficult to predict where we'll land on the
| curve of difficulty (or expense) vs. capability. I'd bet AI will
| get way more capable, but even now you can't really deny its
| usefulness.
| CharlesW wrote:
| It makes no sense to compare the current AI hype to the tulip
| mania of crypto/NFTs. A much better parallel is to cloud
| computing hype in 2009.
| afiodorov wrote:
| We've been visited by alien intelligence that is simultaneously
| fascinating and underwhelming.
|
| The real issue isn't the technology itself, but our complete
| inability to predict its competence. Our intuition for what
| should be hard or easy simply shatters. It can display superhuman
| breadth of knowledge, yet fail with a confident absurdity that,
| in a person, we'd label as malicious or delusional.
|
| The discourse is stuck because we're trying to map a familiar
| psychology onto a system that has none. We haven't just built a
| new tool; we've built a new kind of intellectual blindness for
| ourselves.
| localghost3000 wrote:
| I've developed the following methodology with LLMs and "agentic"
| (what a dumb fucking word...) workflows:
|
| I will use an LLM/agent if
|
| - I need to get a bunch of coding done and I keep getting booked
| into meetings. I'll give it a task on my todo list and see how it
| did when I get done with said meeting(s). Maybe 40% of the time
| it will have done something I'll keep or just need to do a few
| tweaks to. YMMV though.
|
| - I need to write up a bunch of dumb boilerplatey code. I've got
| my rules tuned so that it generally gets this kind of thing
| right.
|
| - I need a stupid one off script or a little application to help
| me with a specific problem and I don't care about code quality or
| maintainability.
|
| - Stack overflow replacement.
|
| - I need to do something annoying but well understood. An XML
| serializer in Java for example.
|
| - Unit tests. I'm questioning whether this one's a good idea
| outside of maybe doing some of the setup work, though. I find I
| generally come to understand my code better through the
| exercise of writing up tests. Sometimes you're in a hurry
| though, so...<shrug>
|
| With any of the above, if it doesn't get me close to what I want
| within 2 or 3 tries, I just back off and do the work. I also
| avoid building things I don't fully understand. I'm not going to
| waste 3 hours to save 1 hour of coding.
|
| I will not use an LLM if I need to do anything involving business
| logic and/or need to solve a novel problem. I also don't bother
| if I am working with novel tech. You'll get way more usable
| answers asking about Python than you will asking about Elm.
|
| TL;DR - use your brain. Understand how this tech works, its
| limitations, AND its strengths.
| hamilyon2 wrote:
| I am impressed by the speed-of-sound goalpost movement.
|
| A few days ago Google released a very competent summary
| generator, an interpreter between tens of languages, and a
| GPT-3-class general-purpose assistant. It works locally on
| modest hardware: a 5-year-old laptop, no discrete GPU.
|
| It alone potentially saves so much toil, so much stupid work.
|
| We also finally "solved computer vision": reading from PDFs,
| reading diagrams and tables.
|
| Local vision models are much less impressive and need some care
| to use. Give it 2 years.
|
| I don't know if we can overhype it when it achieves holy-grail
| level on some important tasks.
___________________________________________________________________
(page generated 2025-07-04 23:00 UTC)