[HN Gopher] Everything around LLMs is still magical and wishful ...
___________________________________________________________________
Everything around LLMs is still magical and wishful thinking
Author : troupo
Score : 111 points
Date : 2025-07-04 21:16 UTC (1 hour ago)
(HTM) web link (dmitriid.com)
(TXT) w3m dump (dmitriid.com)
| atemerev wrote:
| "It's crypto all over again"
|
| Crypto is a lifeline for me, as I cannot open a bank account in
| the country I live in, for reasons I can neither control nor fix.
| So I am happy if crypto is useless for you. For me and for
| millions like me, it is a matter of life and death.
|
| As for LLMs -- once again, magic for some, a reliable
| deterministic instrument for others (and also magic). Just
| classified and sorted a few hundred invoices. Yes, magic.
| harel wrote:
| Can you elaborate on your situation? Which country are you in?
| How is crypto used there?
| mumbisChungo wrote:
| Said this in another thread and I'll repeat it here:
|
| It's the same problem that crypto experiences. Almost everyone
| is propagating lies about the technology, even if a majority of
| those doing so don't understand enough to realize they're lies
| (naivety vs malice).
|
| I'd argue there's more intentional lying in crypto and less
| value to be gained, but in both cases people who might derive
| real benefit from the hard truth of the matter are turning away
| before they enter the door due to dishonesty/misrepresentation
| - and in both cases there are examples of people deriving real
| value today.
| o11c wrote:
| > I'd argue there's more intentional lying in crypto
|
| I disagree. Crypto _sounds_ more like intentional lying
| because it's primarily hyped in contexts typical for
| scams/gambling. Yes, there are businesses involved (anybody
| can start one), but they're mostly new businesses or a tiny
| tack-on to an existing business.
|
| AI is largely being hyped within the existing major corporate
| structures, therefore its lies just get tagged as "business as
| usual". That doesn't make them any less of a lie, though.
| mumbisChungo wrote:
| I think crypto companies and AI companies probably
| intentionally mislead approximately the same amount as one
| another, but in crypto the average participant is often
| bagholding a very short term investment and has a direct
| and tangible incentive to mislead as many people about it
| as quickly as possible- whereas in AI people mostly just
| get lost in the sauce with anthropomorphism.
|
| Anecdotally, I see a lot more bald-faced lies by crypto
| traders or NFT "collectors" than by LLM enthusiasts.
| tehjoker wrote:
| This is basically the only use case for crypto, and one for
| which it was explicitly designed: censorship resistance. This
| is why people have so much trouble finding useful things for it
| to do in the legal economy: it was designed to facilitate
| transactions the government doesn't want to allow or can't
| facilitate. There are some humanitarian applications; there are
| also a lot of illicit ones.
| foobarchu wrote:
| I don't think you actually disagree with the author's quip. You
| seem to want to use crypto as a currency, while OP was most
| likely referring to the grifting around crypto as an
| investment. If you're using it as a currency, then the people
| trying to pump and dump coins and use it as a money-making
| vehicle are your adversaries. You are best served if it's
| stable instead of a rollercoaster of booms and busts.
| troupo wrote:
| It's a valid use case in the sea of nonsensical hype where "you
| are a moron if you don't believe in some true meaning of
| crypto".
|
| "You had to be there to believe it"
| https://x.com/0xbags/status/1940774543553146956
|
| The AI craze is currently going through a similar period: any
| criticism is brushed away as coming from morons who know
| nothing.
| dcre wrote:
| Similar argument to
| https://www.baldurbjarnason.com/2025/trusting-your-own-judge...,
| but I like this one better because at least it doesn't try to
| pull the rhetorical trick of slipping from "we can't know whether
| LLMs are helping because we haven't studied the question
| systematically" to "actually we do know, and they're shit".
| troupo wrote:
| Wow. Quite a conclusion from an article that doesn't actually
| reach that conclusion.
| careful_ai wrote:
| This resonates deeply. There's a fine line between innovation and
| illusion--and right now, so much of the LLM hype lives in the gap
| between demos and delivery. We're quick to build wrappers and
| workflows, but slow to ask: what actually works at scale?
| Appreciate the candor here--it's a reminder that just because
| something's magical doesn't mean it's mature.
| Arainach wrote:
| Amen.
|
| At my job right now there is an imminent threat from a team
| empowered to say "what if we asked an AI to just build X
| instead of having a team build and maintain it?"
|
| X is something that's straightforward when N is below 50 but
| deeply complex when N is in the thousands, which for our team
| it is. There is a huge risk that this team will get a demo
| with N=15 that attracts leadership attention, and trying to
| explain why the AI-generated solution does not scale is a
| career-limiting move that frames me as a naysayer. But this AI
| team would deliver the demo and go away, and the inevitable
| failure of their solution at scale would ALSO be my team's
| problem, so..... I hate the future.
| rdgthree wrote:
| FWIW the comment you are responding to was authored by AI.
| careful_ai wrote:
| Real human here--though I admit I do love a well-placed em
| dash. Appreciate the concern though. I'm just someone who's
| spent a lot of time building with LLMs and watching the
| line blur between promise and production. Always happy to
| discuss more if it helps clear the air.
| jules-jules wrote:
| This is so ChatGPT it hurts. Can we petition HN to ban AI-
| generated comments? I see more Reddit communities actively
| banning AI; HN should follow if it can be done with the
| available resources.
| careful_ai wrote:
| HN already does that but wouldn't it be easier if we
| trust and value someone else's opinions as well?
| ThrowawayR2 wrote:
| [delayed]
| careful_ai wrote:
| Ugh, this hit a little too close to home.
|
| We're seeing the same pattern--where leadership gets enamored
| with AI "magic," then greenlights a proof-of-concept with a
| few handpicked inputs, only to realize the whole thing falls
| apart in real-world complexity. What's worse is when raising
| those concerns gets you labeled the bottleneck.
|
| The hardest part isn't building AI. It's earning the patience
| to build it right--and making sure the people in charge
| understand the difference between a cool demo and a scalable
| system.
|
| Appreciate you voicing this so clearly. You're not alone in
| hating that part of the future.
| Jordan-117 wrote:
| - superficial emotion
|
| - cliche phrasing
|
| - em dashes
|
| - abundant alliteration
|
| - all comments suspiciously similar in length
|
| - all posts pointing to the same website
|
| Does HN not have a policy against vapid AI comment spam? If
| not, it needs one.
|
| edit: It does:
|
| https://news.ycombinator.com/item?id=37617714
| farts_mckensy wrote:
| Watch out, HN. The em dash police are here. Hands up
| careful_ai wrote:
| LOL. I've clearly overused my em dash quota this week.
| Noted. Will switch to semicolons next time; they need love
| too.
| bgwalter wrote:
| This one is funny though. I'd bet on Bing CoPilot, which now
| always agrees with "AI" concerns because MSFT has probably
| realized that no one wants "AI" and takes a more cautious
| approach.
| careful_ai wrote:
| Haha, if Bing Copilot could thread this much nuance into a
| comment, I'd hire it for my job. For now, just a human
| who's been burned by LLM hype cycles one too many times.
| careful_ai wrote:
| Didn't think using punctuation and full sentences would get
| me flagged--but I get it. In a world where AI writes like
| humans and humans write like bots, I guess tone becomes a
| Rorschach test. Just sharing thoughts I care about, with zero
| automation involved. If something I wrote sounded "AI-ish,"
| that's on me for being too polished. Feedback taken, and
| appreciated.
| Jordan-117 wrote:
| Reaction to a post at 21:39:36:
|
| https://news.ycombinator.com/item?id=44468067
|
| Reaction to a different post at 21:40:30:
|
| https://news.ycombinator.com/item?id=44468069
|
| Fast typist! (Incidentally, both are exactly 59 tokens
| long)
| gjm11 wrote:
| For what it's worth, I had a similar "that looks like AI
| writing" response, and it wasn't because it was "too
| polished". And having looked at the rest of your comment
| history, the only reason why I'm only at 90% confidence
| it's all AI-generated rather than 100% is your explicit
| claims to the contrary. Today's LLMs have a definite style
| that is, sorry, _not_ the same thing as being "polished",
| and if your comments have "zero automation involved" then
| it's quite the extraordinary coincidence how much more like
| an AI you sound than any other human writer I have ever
| encountered. And a further coincidence that this very AI-
| sounding human just happens to be selling services to
| "unlock Meaningful Business Outcomes With AI".
| notphilipmoran wrote:
| I think the disparity comes from people who are too in the
| weeds believing that their use cases apply to everyone. The
| reality is this world is made up of people with a wide array of
| different needs, and AI has yet to proliferate into all usage
| applications.
|
| Sure some of this comes from a lack of education.
|
| But similar to crypto these movements only have value if the
| value is widely perceived. We have to work to continue to
| educate, continue to question, continue to understand different
| perspectives. All in favor of advancing the movement and coming
| out with better tech.
|
| I am a supporter of both but I agree with the reference in the
| article to both becoming echo chambers at times. This is a
| setback we need to avoid.
| tempodox wrote:
| Was this text generated by an LLM?
| blueboo wrote:
| Ok. Claude Code produces most code at Anthropic. There's an
| enterprise code base with acute real needs. There are real,
| experienced SWEs. How much babysitting and reviewing is
| involved is undetermined; but the Ants seem to tremendously
| prefer the workflow.
|
| Even crypto people didn't dogfood their crypto like that, on
| their own critical path.
| taurath wrote:
| Truly, how could they have the valuation they have and do
| anything else?
| rightbyte wrote:
| > but the Ants
|
| Is that the official cutesy name for people working there?
| That feels so 2020 ...
| peter422 wrote:
| In the codebase of my proprietary project, it's possible that
| LLMs write around half the code, but that's just because a lot
| of the trivial parts of the project are quite verbose.
|
| The really difficult and valuable parts of the codebase are
| very very far beyond what the current LLMs are capable of, and
| believe me, I've tried!
|
| Writing the majority of the code is very different from
| creating the majority of the value.
|
| And I really use and value LLMs, but they are not replacing me
| at the moment.
| thisoneworks wrote:
| Regardless of the usefulness of LLMs, if you don't work at
| Anthropic, how gullible do you have to be to take that claim at
| face value?
| troupo wrote:
| > Ok. Claude Code produces most code at Anthropic.
|
| Does it? Or does their marketing tell you that? Strange that
| "most code is written by Claude" and they still hire for actual
| humans for all the positions from backend to API to desktop to
| mobile clients.
|
| > How much babysitting and reviewing is involved is
| undetermined; but the Ants seem to tremendously prefer the
| workflow.
|
| So. We know nothing about their codebase, actual flows,
| programming languages, depth and breadth of usage, how much
| babysitting is required...
| jm20 wrote:
| The best way I've heard this described: AI (LLMs) is probably 90%
| of the way to human levels of reasoning. We can probably get to
| about 95% optimizing current technology.
|
| Whether or not we can get to 100% using LLMs is an open research
| problem and far from guaranteed. If we can't, it's unclear if it
| will ever really proliferate the way people hope. That 5% makes a
| big difference in most non-niche use cases...
| krapp wrote:
| >The best way I've heard this described: AI (LLMs) is probably
| 90% of the way to human levels of reasoning. We can probably
| get to about 95% optimizing current technology.
|
| We don't know enough about how LLMs work or about how human
| reasoning works for this to be at all meaningful. These numbers
| quantify nothing but wishes and hype.
| ath3nd wrote:
| > AI (LLMs) is probably 90% of the way to human levels of
| reasoning
|
| Considering LLMs have 0 level of reasoning, I can't decide if
| it's a bad take, or a stab at the average human's level of
| reasoning.
|
| In all seriousness, the actual numbers vary from 13% to 26%:
| https://fortune.com/2025/02/12/openai-deepresearch-humanity-...
|
| My take is that there are fundamental limits to pigeonholing
| reasoning onto LLMs, which are essentially a very, very
| advanced autocomplete, and that's why those percentages won't
| jump much anytime soon.
| farts_mckensy wrote:
| Whenever people claim that LLMs are not capable of reasoning,
| I put them into a category of people who are themselves not
| capable of reasoning.
| ath3nd wrote:
| Whenever people claim that LLMs are capable of reasoning, I
| put them into a category of people who are themselves able
| to reason as much as an LLM.
| andy99 wrote:
| I've always looked at it as we're not making software that can
| think, we're (quite literally) demonstrating that vast
| categories of things don't need thought (for some quality
| level). The problem is, it's clearly not 100%, maybe it's
| 90-some percent, but it doesn't matter, we're only outsourcing
| the unimportant things that aren't definitional for a task.
|
| This is very typical of naive automation, people assume that
| most of the work is X and by automating that we replace people,
| but the thing that's automated is almost never the real
| bottleneck. Pretty sure I saw an article here yesterday about
| how writing code is not the bottleneck in software development,
| and it holds everywhere.
| farts_mckensy wrote:
| The discussion is completely useless without defining what
| thought is and then demonstrating that LLMs are not capable of
| it. And I doubt any definition you come up with will be
| workable.
| ethan_smith wrote:
| These percentage estimates of AI's proximity to "human
| reasoning" are misleading abstractions that mask fundamental
| qualitative differences in how LLMs and humans process
| information.
| sherdil2022 wrote:
| I follow Emily Bender on LinkedIn. She cuts through the AI hype
| and is also a co-author of the book The AI Con - https://thecon.ai/
|
| Of course people will either love AI or hate AI - and some don't
| care. I am cautious especially when people say 'AI is here to
| stay'. It takes away agency.
| farts_mckensy wrote:
| AI is here to stay, and you do not have agency over that. You
| can choose not to use it, but that has zero impact on the
| broader adoption rate. Just like when the automobile was
| introduced and society as a whole evolved.
| jjtheblunt wrote:
| https://en.wikipedia.org/wiki/Clarke%27s_three_laws
|
| includes the 3rd law, which seems on topic and reads:
|
| "Any sufficiently advanced technology is indistinguishable from
| magic."
| readthenotes1 wrote:
| And of course there's the first law, which also applies here.
|
| The people I have talked to at length about using AI tools
| claim that it has been a boon for productivity: a nurse, a
| doctor, three (old) software developers, a product manager, and
| a graduate student in Control Systems.
|
| It is entirely believable that it may not, on average, help the
| average developer.
|
| I'm reminded of the old joke that ends with "who are you going
| to believe, me or your lying eyes?"
| tasty_freeze wrote:
| One thing I find frustrating is that management where I work has
| heard of 10x productivity gains. Some of those claims even come
| from early adopters at my work.
|
| But that sets expectation way too high. Partly it is due to
| Amdahl's law: I spend only a portion of my time coding, and far
| more time thinking and communicating with others that are
| customers of my code. Even if it does make the coding 10x faster
| (and it doesn't most of the time) overall my productivity is
| 10-15% better. That is nothing to sneeze at, but it isn't 10x.
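|
| The Amdahl arithmetic, as a quick sketch (the 13% coding share
| below is an illustrative assumption, not a measurement):
|
|   # Amdahl's law: overall speedup when only a fraction p of
|   # the work is accelerated by a factor s.
|   def overall_speedup(p: float, s: float) -> float:
|       return 1.0 / ((1.0 - p) + p / s)
|
|   # If ~13% of my time is typing code and the LLM makes that
|   # part 10x faster:
|   print(overall_speedup(0.13, 10))  # ~1.13, i.e. ~13% faster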
| louthy wrote:
| > overall my productivity is 10-15% better. That is nothing to
| sneeze at, but it isn't 10x.
|
| It is something to sneeze at if you are 10-15% more expensive
| to employ due to the cost of the LLM tools. The total cost of
| production should always be considered, not just throughput.
| bravesoul2 wrote:
| That's a good insight, because with perfect competition it
| means you need to share your old salary with an LLM!
| votepaunchy wrote:
| > if you are 10-15% more expensive to employ due to the cost
| of the LLM tools
|
| How is one spending anywhere close to 10% of total
| compensation on LLMs?
| CharlesW wrote:
| > _It is something to sneeze at if you are 10-15% more
| expensive to employ due to the cost of the LLM tools._
|
| Claude Max is $200/month, or ~2% of the salary of an average
| software engineer.
| m4rtink wrote:
| Does anyone actually know what the real cost for the
| customers will be once the free AI money no longer floods
| those companies?
| wubrr wrote:
| I'm no LLM evangelist, far from it, but I expect models
| of similar quality to the current bleeding-edge will be
| freely runnable on consumer hardware within 3 years.
| Future bleeding-edge models may well be more expensive
| than current ones, who knows.
| jppope wrote:
| Yeah, there was an analysis that came out on Hacker News
| the other day. Between low demand-side economics,
| virtually no impact on GDP, and corporate/VC subsidies
| going away soon, we're close to finding out. Sam Altman
| did convince SoftBank to do a $40B round though, so it
| might be another year or two. Current estimates are that
| it's cheaper than search to run, so it's probable that
| more search features will be swapped in. OpenAI hasn't
| dropped their ad platform yet though, so I'm interested
| to see how that goes.
| wubrr wrote:
| > One thing I find frustrating is that management where I work
| has heard of 10x productivity gains. Some of those claims even
| come from early adopters at my work.
|
| Similar situation at my work, but all of the productivity
| claims from internal early adopters I've seen so far are based
| on very narrow ways of measuring productivity, and very sketchy
| math, to put it mildly.
| jppope wrote:
| The reports from analyses of open source projects are that it's
| something in the range of 10%-15% productivity gains... so it
| sounds like you're spot on.
| smcleod wrote:
| That's about right for copilots. It's much higher for agentic
| coding.
| estomagordo wrote:
| [citation needed]
| datpuz wrote:
| It's just another tech hype wave. Reality will be somewhere
| between total doom and boundless utopia. But probably neither
| of those.
|
| The AI thing kind of reminds me of the big push to outsource
| software engineers in the early 2000s. There was a ton of hype
| among executives about it, and it all seemed plausible on
| paper. But most of those initiatives ended up being huge
| failures, and nearly all of those jobs came back to the US.
|
| People tend to ignore a lot of the little things software
| engineers do that glue it all together. AI lacks a lot of
| this. Foreigners don't necessarily lack it, but language
| barriers, time zone differences, cultural differences, and all
| sorts of other things led to similar issues. Code quality and
| maintainability took a nosedive and a lot of the stuff produced
| by those outsourced shops had to be thrown in the trash.
|
| I can already see the AI slop accumulating in the codebases I
| work in. It's super hard to spot a lot of these things that
| manage to slip through code review, because they tend to look
| reasonable when you're looking at a diff. The problem is all
| the redundant code that you're not seeing, and the weird
| abstractions that make no sense at all when you look at it from
| a higher level.
| 2muchcoffeeman wrote:
| This was what I was saying to a friend the other day. I think
| anyone vaguely competent who is using LLMs will make the
| technology look far better than it is.
|
| Management thinks the LLM is doing most of the work. Work is
| off shored. Oh, the quality sucks when someone without a clue
| is driving. We need to hire again.
| mlinsey wrote:
| I don't disagree with your assessment of the world today, but
| just 12 months ago (before the current crop of base models and
| coding agents like Claude Code), even that 10X improvement of
| writing some-of-the-code wouldn't have been true.
| __loam wrote:
| It still isn't.
| timr wrote:
| > I don't disagree with your assessment of the world today,
| but just 12 months ago (before the current crop of base
| models and coding agents like Claude Code), even that 10X
| improvement of writing some-of-the-code wouldn't have been
| true.
|
| So? It sounds like you're prodding us to make an
| extrapolation fallacy (I don't even grant the "10x in 12
| months" point, but let's just accept the premise for the sake
| of argument).
|
| Honestly, 12 months ago the base models weren't substantially
| worse than they are right now. Some people will argue with me
| endlessly on this point, and maybe they're a bit better on
| the margin, but I think it's pretty much true. When I look at
| the improvements of the last year with a cold, rational eye,
| they've been in two major areas:
|
| * cost & efficiency
|
| * UI & integration
|
| So how do we improve from here? Cost & efficiency are the
| obvious lever with historical precedent: GPUs kinda suck for
| inference, and costs are (currently) rapidly dropping. But,
| maybe this won't continue -- algorithmic complexity is what
| it is, and barring some revolutionary change in the
| architecture, LLMs are exponential algorithms.
|
| UI and integration is where most of the rest of the recent
| improvement has come from, and honestly, this is pretty close
| to saturation. All of the various AI products _already look
| the same_, and I'm certain that they'll continue to converge
| to a well-accepted local maximum. After that, huge gains in
| productivity from UX alone will not be possible. This will
| happen quickly -- probably in the next year or two.
|
| Basically, unless we see a Moore's law of GPUs, I wouldn't
| bet on indefinite exponential improvement in AI. My bet is
| that, from here out, this looks like the adoption curve of
| any prior technology shift (e.g. mainframe -> PC, PC ->
| laptop, mobile, etc.) where there's a big boom, then a long,
| slow adoption for the masses.
| ssk42 wrote:
| What exactly are you basing any of your assertions off of?
| deadbabe wrote:
| Wait till they hear about the productivity gains from using
| vim/neovim.
|
| Your developers still push a mouse around to get work done?
| Fire them.
| ghuntley wrote:
| Canva has seen a 30% productivity uplift -
| https://fortune.com/2025/06/25/canva-cto-encourages-all-5000...
|
| AI is the new uplift. Embrace and adapt, as a rift is forming
| (see my talk at https://ghuntley.com/six-month-recap/) in what
| employers seek in terms of skills from employees.
| labrador wrote:
| I'm a retired programmer. I can't imagine trusting code generated
| by probabilities for anything mission critical. If it were close
| and just needed minor tweaks I could understand that. But I don't
| have experience with it.
|
| My comment is mainly to say LLMs are amazing in areas that are
| not coding, like brainstorming, blue sky thinking, filling in
| research details, asking questions that make me reflect. I treat
| the LLM like a thinking partner. It does make mistakes, but those
| can be caught easily by checking other sources, or even having
| another LLM review the conclusions.
| garciasn wrote:
| Well, I can't speak to your specific experience (current or
| past) but I'm telling you that while I'm skeptical as hell
| about EVERYTHING, it's blowing my expectations away in every
| conceivable way.
|
| I built something in less than 24h that I'm sure would have
| taken us MONTHS to just get off the ground, let alone to the
| polished version it's at right now. It can do all of the things
| that I absolutely can do, just faster. But the most impressive
| thing is that it can do all the things I cannot possibly do and
| would have had to hire
| up/contract out to accomplish--for far less money, time, and
| with faster iterations than if I had to communicate with
| another human being.
|
| It's not perfect and it's incredibly frustrating at times
| (hardcoding values into the code when I have explicitly told it
| not to; outright lying that it made a particular fix, when it
| actually changed something else entirely unrelated), but it is
| a game changer IMO.
| gyomu wrote:
| > I built something in less than 24h that I'm sure would have
| taken us MONTHS to just get off the ground, let alone to the
| polished version it's at right now
|
| Would love to see it!
| 98eb1d0ff7fb96 wrote:
| See, your comment is a good example of what's going wrong.
| The OP specifically mentioned "mission critical things" - my
| interpretation of that would be things that are not allowed
| to break, because otherwise people might die, in the worst
| case - and you were talking about just SOMETHING that got
| "done" faster. No mention about anything critical.
|
| Of course, I was playing around with Claude Code, too, and I
| was fascinated by how fun it can be and yes, you can get stuff
| done. But I have absolutely no clue what the code is doing
| or whether there are nasty mistakes. So it kinda worked, but
| I would not use that for anything "mission critical"
| (whatever this means).
| CharlesW wrote:
| > _So it kinda worked, but I would not use that for
| anything "mission critical" (whatever this means)._
|
| It means projects like Cloudflare's new OAuth provider
| library:
| https://github.com/cloudflare/workers-oauth-provider
|
| > _" This library (including the schema documentation) was
| largely written with the help of Claude, the AI model by
| Anthropic. Claude's output was thoroughly reviewed by
| Cloudflare engineers with careful attention paid to
| security and compliance with standards. Many improvements
| were made on the initial output, mostly again by prompting
| Claude (and reviewing the results)."_
| labrador wrote:
| The mission of a professional programmer is to deliver code
| that works according to the design specs, handles edge
| cases, fails gracefully and doesn't contain performance
| bottlenecks. It could be software for a water plant, or
| software that incurs charges to accomplish it's task and
| could bankrupt you if there is a mistake. It doesn't have
| to be a matter of life or death.
| svdr wrote:
| I've been programming for 40 years and started using LLMs a
| few months ago, and it has really changed the way I work. I
| let it write pieces of code (pasting error messages from logs
| mostly results in a fix in less than a minute), but also
| brainstorming about architecture or new solutions. Of course I
| check the code it writes, but I'm still almost daily amazed at
| the intelligence and accuracy. (Very much unlike crypto).
| labrador wrote:
| That's good to know if I ever get an idea for a side project.
| Anything to relieve the tedious aspects of programming would
| be very welcome.
| anon-3988 wrote:
| There's one thing that I find LLMs extremely good at: data
| science. Since the IO is well defined, you can easily verify
| that the output is correct. You can even ask it write tests for
| you given that you know certain properties of the data.
|
| The problem is that the LLM needs context for what you are
| doing, context that you won't (or are too lazy to) give in a
| chat with it a la ChatGPT. This is where Claude Code changes
| the game.
|
| For example, you have a PCAP file where each UDP packet
| contains multiple messages.
|
| How do you filter the IP/port/protocol/time? Use LLM, check the
| output
|
| How do you find the number of packets that have patterns A, AB,
| AAB, ABB.... Use LLM, check the output
|
| How to create PCAPs that only contain those packets for
| testing? Use LLM, check the output
|
| Etc etc
|
| Since it can read your code, it is able to infer (because let's
| be honest, your work ain't special) what you are trying to do
| at a much better rate. In any case, the fact that you can simply
| ask "Please write a unit test for all of the above functions"
| means that you can help it verify itself.
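|
| For the PCAP examples above, a minimal sketch of the kind of
| filter script such a session produces (assuming scapy; the
| address, port, time window, and file names are illustrative,
| and the output still gets verified against the capture):
|
|   from scapy.all import rdpcap, wrpcap, IP, UDP
|
|   packets = rdpcap("capture.pcap")
|
|   # Keep only UDP packets from one host/port inside a window.
|   keep = [
|       p for p in packets
|       if IP in p and UDP in p
|       and p[IP].src == "10.0.0.5"
|       and p[UDP].dport == 9000
|       and 1000.0 <= float(p.time) <= 2000.0
|   ]
|
|   # Write a small PCAP with just those packets for testing.
|   wrpcap("filtered.pcap", keep)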
| fleebee wrote:
| I tried the "thinking partner" approach for a while and for a
| moment I thought it worked well, but at some point the cracks
| started to show and I called the bluff. LLMs are extremely good
| at creating an illusion that they know things and are capable
| of reasoning, but they really don't do a good job of
| cultivating intellectual conversation.
|
| I think it's dangerously easy to get misled when trying to prod
| LLMs for knowledge, especially if it's a field you're new to.
| If you were using a regular search engine, you could look at
| the source website to determine the trustworthiness of its
| contents, but LLMs don't have that. The output can really be
| whatever, and I don't agree it's necessarily that easy to catch
| the mistakes.
| labrador wrote:
| You don't say what LLM you are using. I'm using ChatGPT 4o.
| I'm getting great results, but I review the output with a
| skeptical eye similar to how I read Wikipedia articles. Like
| Wikipedia, GPT 4o is great for surfacing new topics for
| research and does it quickly, which makes stream of thought
| easier.
| yahoozoo wrote:
| The thing is, the questions such as "are they an expert in the
| domain" ... "are they good at coding to being with" ... and so on
| only really apply to the folks claiming positive results from
| LLMs. On the flip side, someone not getting much value - or dare
| I say, a skeptic - pushes back because they _can see_ what the
| LLM gave them is wrong. I'm not providing any revelatory comment
| here, but the simple truth is: people who are shit to begin with
| think this is all amazing/magic/the future.
| hotpotat wrote:
| I have to say I'm in the exact camp the author is complaining
| about. I've shipped non-trivial greenfield products which I
| started back when it was only ChatGPT and it was shitty. I
| started using Claude with copying and pasting back and forth
| between the web chat and XCode. Then I discovered Cursor. It left
| me with a lot of annoying build errors, but my productivity was
| still at least 3x. Now that agents are better and claude 4 is
| out, I barely ever write code, and I don't mind. I've leaned into
| the Architect/Manager role and direct the agent with my
| specialized knowledge if I need to.
|
| I started a job at a demanding startup and it's been several
| months and I have still not written a single line of code by
| hand. I audit everything myself before making PRs and test
| rigorously, but Cursor + Sonnet is just insane with their
| codebase. I'm convinced I'm their most productive employee and
| that's not by measuring lines of code, which don't matter; people
| who are experts in the codebase ask me for help with niche bugs I
| can zero in on in 5-30 minutes as someone who's fresh to their
| domain. I had to lay off taking work away from the front end dev
| (which I've avoided my whole career) because I was stepping on
| his toes, fixing little problems as I saw them thanks to Claude.
| It's not vibe coding - there's a process of research and planning
| and perusing in careful steps, and I set the agent up for
| success. Domain knowledge is necessary. But I'm just so floored
| how anyone could not be extracting the same utility from it. It
| feels like there's two articles like this every week now.
| the__alchemist wrote:
| Web dev CRUD in node?
| hotpotat wrote:
| Multi-platform web+native consumer application with lots of
| moving parts and integration. I think to call it a CRUD app
| would be oversimplifying it.
| gyomu wrote:
| > I've shipped non trivial greenfield products
|
| Links please
| hotpotat wrote:
| I'd like to, but purposefully am using a throwaway account.
| It's an iOS app rated 4.5 stars on the app store and has a
| nice community. Mild userbase, in the hundreds.
| larve wrote:
| Here's maybe the most impressive thing I've vibecoded, where
| I wanted to track a file write/read race condition in a
| vscode extension:
| https://github.com/go-go-golems/go-go-labs/tree/main/cmd/exp...
|
| This is _far_ from web crud.
|
| Otherwise, 99% of my code these days is LLM generated,
| there's a fair amount of visible commits from my opensource
| on my profile https://github.com/wesen .
|
| A lot of it is more on the system side of things, although
| there are a fair amount of one-off webapps, now that I can do
| frontends that don't suck.
| 0x696C6961 wrote:
| I find that the code quality LLMs output is pretty bad. I end
| up going through so many iterations that it ends up being
| faster to do it myself. What I find agents actually useful for
| is doing large scale mechanical refactors. Instead of trying
| to figure out the perfect vim macro or AST rewrite script, I'll
| throw an agent at it.
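|
| For concreteness, the kind of one-off rewrite script this
| replaces (Python's ast module; the names old_name/new_name and
| module.py are illustrative):
|
|   import ast
|
|   class RenameCall(ast.NodeTransformer):
|       # Rename every call to old_name() into new_name().
|       def visit_Call(self, node):
|           self.generic_visit(node)
|           if isinstance(node.func, ast.Name) \
|                   and node.func.id == "old_name":
|               node.func.id = "new_name"
|           return node
|
|   source = open("module.py").read()
|   tree = RenameCall().visit(ast.parse(source))
|   print(ast.unparse(tree))  # Python 3.9+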
| CharlesW wrote:
| > _I find that the code quality LLMs output is pretty bad._
|
| That was my experience with Cursor, but Claude Code is a
| different world. What specific product/models brought you to
| this generalization?
| AnotherGoodName wrote:
| I disagree strongly at this point. The code is generally good
| if the prompt was reasonable, but also: every possible test is
| now being written, every UI element has all the required
| traits, every function has the correct
| documentation attached, the million little refactors to
| improve the codebase are being done, etc.
|
| Someone told me "AI makes all the little things trivial to
| do" and I agree strongly with that. Those many little things
| are things that together make a strong statement about
| quality. Our codebase has gone up in quality significantly
| with ai whereas we'd let the little things slide due to
| understaffing before.
| the__alchemist wrote:
| What sort of mechanical refactors?
| bamboozled wrote:
| I use Claude Code for hours a day; it's a liar. Trust what it
| does at your own risk.
|
| I personally think you're sugar coating the experience.
| CharlesW wrote:
| > _I use Claude Code for hours a day; it's a liar. Trust what
| it does at your own risk._
|
| The person you're responding to literally said, "I audit
| everything myself before making PRs and test rigorously".
| troupo wrote:
| Please re-read the article. Especially the first list of things
| we don't know about you, your projects etc.
|
| Your specific experience cannot be generalized. And I'm
| speaking as the author, who is (as written in the article)
| literally using these tools every day.
|
| > But I'm just so floored how anyone could not be extracting
| the same utility from it. It feels like there's two articles
| like this every week now.
|
| This is where we learn that you haven't actually read the
| article. Because it very clearly states, with links, that I
| am extracting value from these tools.
|
| And the article is also very clearly not about extracting or
| not extracting value.
| hotpotat wrote:
| I did read the entire article before commenting and
| acknowledge that you are using them to some effect, but the
| line about 50% of the time it works 50% of the time is where
| I lost faith in the claims you're making. I agree it's very
| context dependent but, in the same way, you did not outline
| your approaches and practices in how you use AI in your
| workflow. The same lack of context exists on the other side
| of the argument.
| CharlesW wrote:
| > _...the line about 50% of the time it works 50% of the
| time is where I lost faith in the claims you're making..._
|
| That's where the author lost me as well. I'd really be
| interested in a deep dive on their workflow/tools to
| understand how I've been so unbelievably lucky in
| comparison.
| troupo wrote:
| Sibling comment:
| https://news.ycombinator.com/item?id=44468374
| troupo wrote:
| > but the line about 50% of the time it works 50% of the
| time is where I lost faith in the claims you're making.
|
| It's a play on the Anchorman joke that I slightly
| misremembered: "60% of the time it works 100% of the time"
|
| > is where I lost faith in the claims you're making.
|
| Ah yes. You lost faith in mine, but I have to have 100%
| faith in your 100% unverified claim about "job at a
| demanding startup" where "you still haven't written a
| single line of code by hand"?
|
| Why do you assume that your word and experience is more
| correct than mine? Or why should anyone?
|
| > you did not outline your approaches and practices in how
| you use AI in your workflow
|
| No one does. And if you actually read the article, you'd
| see that is _literally the point_.
| alt187 wrote:
| I agree about the 50/50 thing. It's about how much Claude
| helped me, and I use it daily _too_.
|
| I'll give some context, though.
|
| - I use OCaml and Python/SQL, on two different projects.
|
| - Both are single-person.
|
| - The first project is a real-time messaging system, the
| second one is logging a bunch of events in an SQL database.
|
| In the first project, Claude has been... underwhelming. It
| casually uses C idioms, overuses records and procedural
| programming, ignores basic stuff about the OCaml standard
| library, and even gave me some data structures that slowed
| me down later down the line. It also casually lies about what
| functions do.
|
| A real example: `Buffer.add_utf_8_uchar` adds the ASCII
| representation of a UTF-8 char to a buffer, so it adds
| something that looks like `\123\456` for non-ASCII.
|
| I had to scold Claude for using this function to add a UTF-8
| character to a Buffer so many times that I've lost count.
|
| In the second project, Claude really shined: making most of
| the SQL database and moving most of the logic to the SQL
| engine, writing coherent and readable Python code, etc.
|
| I think the main difference is that the first one is an
| arcane project in an underdog language. The second one is a
| special case of a common "shovel through lists of stuff
| and stuff them in SQL" problem, in the most common
| language.
|
| You basically get what it was trained on.
| mccoyb wrote:
| Same experience here, probably in a slightly different way of
| work (PhD student). Was extremely skeptical of LLMs, Claude
| Code has completely transformed the way I work.
|
| It doesn't take away the requirements of _curation_ - that
| remains firmly in my camp (partially what a PhD is supposed to
| teach you! to be precise and reflective about why you are doing
| X, what do you hope to show with Y, etc -- breakdown every
| single step, explain those steps to someone else -- this is a
| tremendous soft skill, and it's even more important now because
| these agents do not have persistent world models / immediately
| forget the goal of a sequence of interactions, even with clever
| compaction).
|
| If I'm on my game with precise communication, I can use CC to
| organize computation in a way which has never been possible
| before.
|
| It's not easier than programming (if you care about quality!),
| but it is different, and it comes with different idioms.
| exe34 wrote:
| > but my productivity was still at least 3x
|
| How do you measure this?
| DiscourseFan wrote:
| ChatGPT can write research papers in about 20 minutes--it's the
| "Deep Research" tool. These are not original papers, but it can
| perform complex tasks that require multiple steps that would
| normally take a person hours. No, it's not a magic
| superintelligence, but it will transform a lot of white collar
| labor.
| martinald wrote:
| I personally don't really get this.
|
| _So much_ work in the "services" industries globally really
| comes down to a human transposing data from one Excel sheet to
| another (or from a CRM/emails to Excel), manually. Every (or
| nearly every) enterprise scale company will have hundreds if not
| thousands of FTEs doing this kind of work day in day out - often
| with a lot of it outsourced. I would guess that for every 1
| software engineer there are 100 people doing this kind of 'manual
| data pipelining'.
|
| So really for giant value to be created out of LLMs you do not
| need them to be incredible at OCaml. They just need to
| ~outperform humans on Excel. Where I do think MCP really helps is
| that you can connect all these systems together easily, and a lot
| of the errors in this kind of work came from trying to pass the
| entire 'task' in context. If you can take an email via MCP,
| extract some data out, and put it into a CRM (again via MCP) a
| row at a time, the hallucination rate is very low IME. I would
| say at least at the level of an overworked junior human.
|
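| A rough sketch of that shape (read_next_email and add_crm_row
| are hypothetical MCP-backed tools, and ask() is a hypothetical
| wrapper around the model):
|
|   import json
|
|   # One email -> one CRM row at a time keeps each call's
|   # context small, which is what keeps hallucinations rare.
|   for email in read_next_email(folder="inbox"):
|       raw = ask("Extract vendor, amount, and due date from "
|                 "this email as JSON:\n" + email.body)
|       add_crm_row(table="invoices", row=json.loads(raw))
|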
| Perhaps this was the point of the article, but non-determinism is
| not an issue for these kind of use cases, given all the humans
| involved are not deterministic either. We can build systems and
| processes to help enforce quality on non deterministic (eg:
| human) systems.
|
| Finally, I've followed crypto closely and also LLMs closely. They
| do not seem to be similar in terms of utility and adoption. The
| closest thing I can recall is smartphone adoption. A lot of my
| non-technical friends didn't think they wanted a smartphone
| when the iPhone first came out. Within a few years, all of
| them had one. It's similar with LLMs. Virtually all of my
| non-technical friends use it now for incredibly varied use
| cases.
| tiahura wrote:
| Everything? As a lawyer, I'm producing 2x - with fewer errors.
| Admittedly, law is a field that mostly involves shuffling words
| around so it may be the best case scenario, but much of the
| skepticism comes off as cope.
| larve wrote:
| Software methodologies and workflows are not engineering either,
| yet we spend a fair amount of time iterating and refining those.
| You can very much become better at prompt engineering. There is a
| huge differential between individuals, for example.
|
| The code coming out of LLMs is just as deterministic as code
| coming out of humans, and despite humans being fickle beings, we
| still talk of software engineering.
|
| As for LLMs, they are and will forever be "unknowable". The human
| mind just can't comprehend what a billion parameters trained on
| trillions of tokens under different regimes for months
| corresponds to. While science takes only microscopic steps
| toward understanding the brain, we still have methods to teach,
| learn, be creative, be rigorous, and communicate that do work
| despite it being this "magical" organ.
|
| With LLMs, you can be pretty rigorous. Benchmarks, evals, and
| just the vibes of day to day usage if you are a programmer, are
| not "wishful thinking", they are reasonably effective methods and
| the best we have.
| sureglymop wrote:
| Loosely related, but I find the use of AGI (and sometimes even
| AI) as terms annoying lately. Especially in scientific papers,
| where I would imagine everything to be well defined - at least
| in how it is used in that paper.
|
| So, why can't we just come up with _some_ definition for what AGI
| is? We could then, say, logically prove that some AI fits that
| definition. Even if this doesn't seem practically useful, it's
| theoretically much more useful than just using that term with no
| meaning.
|
| Instead it kind of feels like it's an escape hatch. On Wikipedia
| we have "a type of AI that would match or surpass human
| capabilities across virtually all cognitive tasks". How could we
| measure that? What good is this if we can't prove that a system
| has this property?
|
| Bit of a rant but I hope it's somewhat legible still.
| AlienRobot wrote:
| We have a definition.
|
| "AI is whatever hasn't been done yet."[1]
|
| 1. https://en.wikipedia.org/wiki/AI_effect
| CharlesW wrote:
| "AI" and "AGI" are _very_ different things.
| AbrahamParangi wrote:
| This reads like the author is mad about imprecision in the
| discourse - which is real, but to be quite frank is more
| rampant amongst detractors than promoters, who often have to
| deal with the flaws and limitations on a day-to-day basis.
|
| The conclusion that everything around LLMs is magical thinking
| seems to be fairly hubristic to me given that in the last 5 years
| a set of previously borderline intractable problems have become
| _completely or near completely solved_: translation,
| transcription, and code generation (up to some scale), for
| instance.
| arendtio wrote:
| I think it is more like googling: when search engines appeared,
| everybody had to learn how to write a good query, even though
| the expectation was that everybody could use them.
|
| With LLMs, it's quite similar: you have to learn how to use them.
| Yes, they are non-deterministic, but if you know how to use them,
| you can increase your chances of getting a good result
| dramatically. Often, this not only means articulating a task, but
| also looking at the bigger picture and asking yourself what tasks
| you should assign in the first place.
|
| For example, I can ask the LLM to write software directly, or I
| can ask it to write user stories or prototypes and then take a
| multi-step approach to develop the software. This can make a huge
| difference in reliability.
|
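| A minimal sketch of that multi-step shape (ask() is a
| hypothetical wrapper around whatever model is in use; the goal
| string is illustrative):
|
|   # Step 1: turn a vague goal into explicit user stories.
|   goal = "a CLI tool that deduplicates photo libraries"
|   stories = ask("Write user stories for: " + goal)
|
|   # Step 2: implement against each story, not the vague goal.
|   for story in stories.split("\n\n"):
|       draft = ask("Implement this user story:\n" + story)
|       review(draft)  # hypothetical human checkpoint
|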
| And to be clear, I don't mean that every bad result is caused by
| not correctly handling the LLM (some models are simply poor at
| specific tasks), but rather that it is a significant factor to
| consider when evaluating results.
| alganet wrote:
| LLM tech probably will find some legitimate use, but by then,
| everything will be filled with people misusing it.
|
| Millions of beginner developers running with scissors in their
| hands, millions in investment going into the garbage.
|
| I don't think this can be reversed anymore; companies are all-in
| and pot-committed.
| Sherveen wrote:
| This is completely incoherent. 3 reasons:
|
| 1. he talks about what he's shipped, and yet compares it to
| _crypto_ - already, you're in a contradiction as to your
| relative comparison - you straight up shouldn't blog if you can't
| conceive that these two are opposing thoughts
|
| 2. this whole refrain from people of like, "SHOW ME your
| enterprise codebase that includes lots of LLM code" - HELLO,
| people who work at private companies CANNOT just reveal their
| codebase to you for internet points
|
| 3. anyone who has actually used these tools has integrated them
| into their daily life - millions of people and billions of
| dollars' worth - unless you think all CEOs are in a grand
| conspiracy, lying about their teams adopting AI
| ibaikov wrote:
| The crypto and NFT situation happened because of our society,
| media, and VC/startup landscape, which hype things up a lot for
| their own reasons. We treat massive technologies as new brands
| of bottled water. Or, actually, as new hype toys like fidget
| spinners or pop-it toys. This tech is massively more complex
| and you have to
| invest time to learn about its abilities, limitations and
| potential developments. Almost nobody actually does this, it's
| easier to follow hype train and put money into something that
| grows up and looks cool without obvious cons. Crypto is cool for
| some stuff. On the other hand, where's your Stepn (and move to
| earn in general), decentraland cities, Apes that will make a
| multimedia universe? Where's "you'll be paying using crypto for
| everything"?
|
| Same for LLMs and AI: it is awesome for some things and
| absolutely sucks for other things. Curiously though, it feels like
| UX was solved by making chats, but it actually still sucks
| enormously, as with crypto. It is mostly sufficient for doing
| basic stuff. It is difficult to predict where we'll land on the
| curve of difficulty (or expense) vs. capability. I'd bet AI will
| get way more capable, but even now you can't really deny its
| usefulness.
| CharlesW wrote:
| It makes no sense to compare the current AI hype to the tulip
| mania of crypto/NFTs. A much better parallel is to cloud
| computing hype in 2009.
| afiodorov wrote:
| We've been visited by alien intelligence that is simultaneously
| fascinating and underwhelming.
|
| The real issue isn't the technology itself, but our complete
| inability to predict its competence. Our intuition for what
| should be hard or easy simply shatters. It can display superhuman
| breadth of knowledge, yet fail with a confident absurdity that,
| in a person, we'd label as malicious or delusional.
|
| The discourse is stuck because we're trying to map a familiar
| psychology onto a system that has none. We haven't just built a
| new tool; we've built a new kind of intellectual blindness for
| ourselves.
| localghost3000 wrote:
| I've developed the following methodology with LLMs and "agentic"
| (what a dumb fucking word...) workflows:
|
| I will use an LLM/agent if
|
| - I need to get a bunch of coding done and I keep getting booked
| into meetings. I'll give it a task on my todo list and see how it
| did when I get done with said meeting(s). Maybe 40% of the time
| it will have done something I'll keep or just need to do a few
| tweaks to. YMMV though.
|
| - I need to write up a bunch of dumb boilerplatey code. I've got
| my rules tuned so that it generally gets this kind of thing
| right.
|
| - I need a stupid one off script or a little application to help
| me with a specific problem and I don't care about code quality or
| maintainability.
|
| - Stack overflow replacement.
|
| - I need to do something annoying but well understood. An XML
| serializer in Java for example.
|
| - Unit tests. I'm questioning whether this one's a good idea
| outside of maybe doing some of the setup work, though. I find I
| generally come to understand my code better through the
| exercise of writing up tests. Sometimes you're in a hurry
| though, so...<shrug>
|
| With any of the above, if it doesn't get me close to what I want
| within 2 or 3 tries, I just back off and do the work. I also
| avoid building things I don't fully understand. I'm not going to
| waste 3 hours to save 1 hour of coding.
|
| I will not use an LLM if I need to do anything involving business
| logic and/or need to solve a novel problem. I also don't bother
| if I am working with novel tech. You'll get way more usable
| answers asking about Python than you will asking about Elm.
|
| TL;DR - use your brain. Understand how this tech works, its
| limitations, AND its strengths.
| hamilyon2 wrote:
| I am impressed by the speed-of-sound goalpost movement.
|
| A few days ago Google released a very competent summary
| generator, an interpreter between tens of languages, and a
| GPT-3-class general-purpose assistant. It works locally on
| modest hardware: a 5-year-old laptop, no discrete GPU.
|
| It alone potentially saves so much toil, so much stupid work.
|
| We also finally "solved computer vision": reading from PDFs,
| reading diagrams and tables.
|
| Local vision models are much less impressive and need some care
| to use. Give it 2 years.
|
| I don't know if we can overhype it when it achieves holy-grail
| level on some important tasks.
___________________________________________________________________
(page generated 2025-07-04 23:00 UTC)