[HN Gopher] LLM code generation may lead to an erosion of trust
___________________________________________________________________
LLM code generation may lead to an erosion of trust
Author : CoffeeOnWrite
Score : 206 points
Date : 2025-06-26 06:07 UTC (16 hours ago)
(HTM) web link (jaysthoughts.com)
(TXT) w3m dump (jaysthoughts.com)
| gblargg wrote:
| https://archive.is/5I9sB
|
| (Works on older browsers and doesn't require JavaScript except to
| get past CloudSnare).
| cheriot wrote:
| > promises that the contributed code is not the product of an LLM
| but rather original and understood completely.
|
| > require them to be majority hand written.
|
| We should specify the outcome not the process. Expecting the
| contributor to understand the patch is a good idea.
|
| > Juniors may be encouraged/required to elide LLM-assisted
| tooling for a period of time during their onboarding.
|
| This is a terrible idea. Onboarding is a lot of random
| environment setup hitches that LLMs are often really good at.
| It's also getting up to speed on code and docs and I've got some
| great text search/summarizing tools to share.
| bluefirebrand wrote:
| > Onboarding is a lot of random environment setup hitches
|
| Learning how to navigate these hitches is a _really important
| process_
|
| If we streamline every bit of difficulty or complexity out of
| our lives, it seems trivially obvious that we will soon have no
| idea what to do when we encounter difficulty or complexity. Is
| that just me thinking that?
| kmoser wrote:
| There will always be people who know how to handle the
| complexity we're trying to automate away. If I can't figure
| out some arcane tax law when filling out my taxes, I ask my
| accountant, as it's literally their job to know these things.
| bluefirebrand wrote:
| > There will always be people who know how to handle the
| complexity we're trying to automate away
|
| This is not a given!
|
| If we automated all accounting, why would anyone still take
| the time to learn to become an accountant?
|
| Yes, there are sometimes people who are just invested in
| learning traditional stuff for the sake of it, but is that
| really what we want to rely on as the fallback when AI
| fails?
| RunningDroid wrote:
| > > Onboarding is a lot of random environment setup hitches
| > Learning how to navigate these hitches is a _really important
| > process_
|
| To add to this, a barrier to contribution can reduce low
| quality/spam contributions. The downside is that a barrier to
| contribution that's too high reduces all contributions.
| namenotrequired wrote:
| > LLMs ... approximate correctness for varying amounts of time.
| Once that time runs out there is a sharp drop off in model
| accuracy, it simply cannot continue to offer you an output that
| even approximates something workable. I have taken to calling
| this phenomenon the "AI Cliff," as it is very sharp and very
| sudden
|
| I've never heard of this cliff before. Has anyone else
| experienced this?
| sandspar wrote:
| I'm not sure. Is he talking about context poisoning?
| Kuinox wrote:
| I'm doing my own procedurally generated benchmark.
|
| I can make the problem input as big as I want.
|
| Each LLM has a different threshold for each problem; once it's
| crossed, the performance of the LLM collapses.
| Paradigma11 wrote:
| If the context gets too big or otherwise poisoned you have to
| restart the chat/agent. A bit like Windows of old. This trains
| you to document the current state of your work so the new agent
| can get up to speed.
| bubblyworld wrote:
| I've only experienced this while vibe coding through chat
| interfaces, i.e. in the complete absence of feedback loops.
| This is _much_ less of a problem with agentic tools like claude
| code /codex/gemini cli, where they manage their own context
| windows and can run your dev tooling to sanity check themselves
| as they go.
| Syzygies wrote:
| One can find opinions that Claude Code Opus 4 is worth the
| monthly $200 I pay for Anthropic's Max plan. Opus 4 is smarter;
| one either can't afford to use it, or can't afford not to use
| it. I'm in the latter group.
|
| One feature others have noted is that the Opus 4 context buffer
| rarely "wears out" in a work session. It can, and one needs to
| recognize this and start over. With other agents, it was my
| routine experience that I'd be lucky to get an hour before
| having to restart my agent. A reliable way to induce this
| "cliff" is to let AI take on a much too hard problem in one
| step, then flail helplessly trying to fix their mess. Vibe-
| coding an unsuitable problem. One can even kill Opus 4 this
| way, but that's no way to run a race horse.
|
| Some "persistence of memory" harness is as important as one's
| testing harness, for effective AI coding. With the right care
| having AI edit its own context prompts for orienting new
| sessions, this all matters less. AI is spectacularly bad at
| breaking problems into small steps without our guidance, and
| small steps done right can be different sessions. I'll
| regularly start new sessions when I have a hunch that this will
| get me better focus for the next step. So the cliff isn't so
| important. But Opus 4 is smarter in other ways.
| suddenlybananas wrote:
| >can't afford not to use it. I'm in the latter group.
|
| People love to justify big expenses as necessary.
| Syzygies wrote:
| $200 is a small expense and you don't know why I need AI.
|
| The online dialog about AI is mostly noise, and even at HN
| it is badly distorted by people who wince at $20 a month,
| and complain AI isn't that smart.
| fwip wrote:
| Sometimes after it flails for a while, but I think it's on
| the right path, I'll rewind the context to just before it
| started trying to solve the problem (but keep the code
| changes). And I'll tell it "I got this other guy to attempt
| what we just talked about, but it still has some problems."
|
| Snipping out the flailing in this way seems to help.
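| In chat-API terms that snip is roughly this (a sketch only;
| `messages` is the usual list of role/content dicts and the
| index where the flailing began is something you pick by hand):
|
|     def snip_flailing(messages, flail_start):
|         # Keep everything up to where the model started flailing,
|         # then replace the whole flailing stretch with one note.
|         kept = list(messages[:flail_start])
|         kept.append({
|             "role": "user",
|             "content": (
|                 "I had someone else attempt what we just "
|                 "discussed. The code changes are in the working "
|                 "tree, but there are still some problems. "
|                 "Please review and fix."
|             ),
|         })
|         return kept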
| gwd wrote:
| I experience it pretty regularly -- once the complexity of the
| code passes a certain threshold, the LLM can't keep everything
| in its head and starts thrashing around. Part of my job working
| with the LLM is to manage the complexity it sees.
|
| And one of the things with the current generation is that they
| tend to make things more complex over time, rather than less.
| It's always _me_ prompting the LLM to refactor things to make
| them simpler, or doing the refactoring once it's gotten too
| complex for the LLM to deal with.
|
| So at least with the current generation of LLMs, it seems
| rather inevitable that if you just "give LLMs their head" and
| let them do what they want, eventually they'll create a giant
| Rube Goldberg mess that you'll have to try to clean up.
|
| ETA: And to the point of the article -- if you're an old salt,
| you'll be able to recognize when the LLM is taking you out to
| sea early, and be able to navigate your way back into shallower
| waters even if you go out a bit too far. If you're a new hand,
| you'll be out of your depth and lost at sea before you know
| it's happened.
| windward wrote:
| I've seen it referred to as 'context drunk'.
|
| Imagine that you have your input to the context, 10000 tokens
| that are 99% correct. Each time the LLM replies it adds 1000
| tokens that are 90% correct.
|
| After some back-and-forth of you correcting the LLM, its
| context window is mostly its own backwash^Woutput. Worse, the
| error compounds because the 90% that is correct is just correct
| extrapolation of an argument about incorrect code, and because
| the LLM ranks more recent tokens as more important.
|
| The same problem also shows up in prose.
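| Taking those numbers at face value (and ignoring the
| compounding, so this is an optimistic upper bound), the share
| of correct tokens in context dilutes toward the model's own
| rate:
|
|     # Fraction of correct tokens after n model replies.
|     def correct_share(n, base=10_000, base_ok=0.99,
|                       reply=1_000, reply_ok=0.90):
|         return ((base * base_ok + n * reply * reply_ok)
|                 / (base + n * reply))
|
|     for n in (0, 5, 10, 20, 50):
|         print(n, round(correct_share(n), 3))
|     # 0 0.99, 5 0.96, 10 0.945, 20 0.93, 50 0.915 -- drifting
|     # toward 0.90; compounding errors pull it lower still.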
| Workaccount2 wrote:
| I call it context rot. As the context fills up the quality of
| output erodes with it. The rot gets even worse or progresses
| faster the more spurious or tangential discussion is in
| context.
|
| This can also be made much worse by thinking models, as their
| CoT is all in context, and if their thoughts really wander it
| just plants seeds of poison feeding the rot. I really wish they
| would implement some form of context pruning, so you can nip
| irrelevant context when it forms.
|
| In the meantime, I make summaries and carry them to a fresh
| instance when I notice the rot forming.
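| A rough sketch of that carry-over step, using the OpenAI Python
| SDK (the model name and prompt wording here are placeholders):
|
|     from openai import OpenAI
|
|     client = OpenAI()
|
|     def carry_over(messages, model="gpt-4o"):
|         # Distill the old session into a short brief, then use
|         # that brief as the seed message of a fresh session.
|         transcript = "\n".join(
|             f"{m['role']}: {m['content']}" for m in messages)
|         summary = client.chat.completions.create(
|             model=model,
|             messages=[{
|                 "role": "user",
|                 "content": "Summarize the decisions, constraints "
|                            "and open tasks in this session so a "
|                            "new session can continue the work:"
|                            "\n\n" + transcript,
|             }],
|         ).choices[0].message.content
|         return [{"role": "user", "content": summary}]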
| lubujackson wrote:
| I definitely hit this vibe coding a large-ish backend. Well
| defined data structures, good modularity, etc. But at a point,
| Cursor started to lose the plot and rewrite or duplicate
| functions, recreate or misuse data structures, etc.
|
| The solve was to define several Cursor rules files for
| different views of the codebase - here's the structure, here's
| the validation logic, etc. That and using o3 has at least
| gotten me to the next level.
| impure wrote:
| This sounds a lot like accuracy collapse as discussed in that
| Apple paper. That paper clearly showed that there is some point
| where AI accuracy collapses extremely quickly.
|
| I suspect it has something more to do with the model producing
| too many tokens and becoming fixated on what it said before.
| You'll often see this in long conversations. The only way to
| fix it is to start a new conversation.
| npteljes wrote:
| I reset "work" AI sessions quite frequently, so I didn't see
| that there. I experienced it though with storytelling. In my
| storytelling scenario, context and length were important. And
| the AI at one late point forgot how my characters should behave
| in the developing situation, and just had them react to it in a
| very different way. And there was no going back from that. Very
| weird experience.
| beau_g wrote:
| The article opens with a statement saying the author isn't going
| to reword what others are writing, but the article reads as that
| and only that.
|
| That said, I do think it would be nice for people to note in pull
| requests which files have AI gen code in the diff. It's still a
| good idea to look at LLM gen code vs human code with a bit
| different lens, the mistakes each make are often a bit different
| in flavor, and it would save time for me in a review to know
| which is which. Has anyone seen this at a larger org, and is it
| of value to you as a reviewer? Maybe some tool sets can already
| do this automatically (I suppose all these companies that report
| the % of code that is LLM generated must have one, if they
| actually have these granular metrics?)
| acedTrex wrote:
| Author here:
|
| > The article opens with a statement saying the author isn't
| going to reword what others are writing, but the article reads
| as that and only that.
|
| Hmm, I was just saying I hadn't seen much literature or
| discussion on trust dynamics in teams with LLMs. Maybe I'm just
| in the wrong spaces for such discussions but I haven't really
| come across it.
| DyslexicAtheist wrote:
| it's really hard (though not impossible) to use AI to produce
| meaningful offensive security work to improve defense, due to
| there being way too many guard rails.
|
| Real nation-state threat actors, on the other hand, face no
| such limitations.
|
| On a more general level, what concerns me isn't whether people
| use it to get utility out of it (that would be silly), but the
| power imbalance in the hands of a few, and with new people
| pouring their questions into it, this divide getting wider. But
| it's not just the people using AI directly but also every post
| online that eventually gets used for training. So to be against
| it would mean to stop producing digital content.
| davidthewatson wrote:
| Well said. The death of trust in software is a well worn path
| from the money that funds and founds it to the design and
| engineering that builds it - at least the 2 guys-in-a-garage
| startup work I was involved in for decades. HITL is key. Even
| with a human in the loop, you wind up at Therac 25. That's
| exactly where hybrid closed loop insulin pumps are right now.
| Autonomy and insulin don't mix well. If there weren't a moat of
| attorneys keeping the signal/noise ratio down, we'd already
| realize that at scale - like the PR team at 3 letter technical
| universities designed to protect parents from the exploding
| pressure inside the halls there.
| tomhow wrote:
| [Stub for offtopicness, including but not limited to comments
| replying to original title rather than article's content]
| sandspar wrote:
| It's interesting that AI proponents say stuff like, "Humans
| will remain interested in other humans, even after AI can do
| all our jobs." It really does seem to be true. Here for example
| we have a guy who's using AI to make a status-seeking statement
| i.e. "I'm playing a strong supporting role on the 'anti-AI
| thinkers' team therefore I'm high status". Like, humans have an
| amazing ability to repurpose anything into status markers. Even
| AI. I think that if AI replaces all of our actual jobs then
| we'll still spend our time doing status jobs. In a way this guy
| is living in the future even more than most AI users.
| michelsedgh wrote:
| For now, yes, because humans are doing most jobs better
| than AI. In 10 years' time, if the AIs are doing a better
| job, people like the author will need to learn all the
| ropes if they wanna catch up. I don't think LLMs will
| destroy all jobs; I think those who learn them and use
| them properly will outdo the people who avoid these tools
| just for the sake of saying "I'm high status, I don't use
| these tools."
| nextlevelwizard wrote:
| If AI will do better job than humans what ropes are there
| to learn? You just feed in the requirements and AI poops
| out products.
|
| This often is brought up that if you don't use LLMs now to
| produce so-so code you will somehow magically completely
| fall off when the LLMs all of a sudden start making perfect
| code as if developers haven't been learning new tools
| constantly as the field has evolved. Yes, I use old
| technology, but also yes I try new technology and pick and
| choose what works for me and what does not. Just because
| LLMs don't have a good place in my work flow does not mean
| I am not using them at all or that I haven't tried to use
| them.
| michelsedgh wrote:
| Good on you. You are using it and trying to keep up. Keep
| doing that and try to push what you can do with it. I
| love to hear that!
| lynx97 wrote:
| No worries, I also judge you for relying on JavaScript for your
| "simple blog".
| gblargg wrote:
| Doesn't even work on older browsers either.
| rvnx wrote:
| Claude said to use Markdown, text file or HTML with minimal
| CSS. So it means the author does not know how to prompt.
|
| The blog itself is using Alpine JS, a framework written by
| humans 6 years ago (https://github.com/alpinejs/alpine),
| and you can see the result is not good.
| mnmalst wrote:
| Ha, I came here to make the same comment.
|
| Two completely unnecessary requests to: jsdelivr.net and
| net.cdn.cloudflare.net
| acedTrex wrote:
| I wrote it while playing with alpine.js for fun just messing
| around with stuff.
|
| Never actually expected it to be posted on HN. Working on
| getting a static version up now.
| MaxikCZ wrote:
| Yes, I will judge you for requiring javascript to display a
| page of such basic nature.
| thereisnospork wrote:
| In a few years people who don't/can't use AI will be looked at
| like people who couldn't use a computer ~20 years ago.
|
| It might not solve every problem, but it solves enough of them
| better enough it belongs in the tool kit.
| tines wrote:
| I think it will be the opposite. AI causes cognitive decline,
| in the future only the people who don't use AI will retain
| their ability to think. Same as smartphone usage, the less
| the better.
| thereisnospork wrote:
| >Same as smartphone usage, the less the better.
|
| That comparison kind of makes my point though. Sure, you can
| bury your face in TikTok for 12hrs a day, and they do kind
| of suck at Excel, but smartphones are massively useful tools
| used by (approximately) everyone.
|
| Someone not using a smartphone in this day and age can very
| fairly be called a 'luddite'.
| tines wrote:
| I disagree, smartphones are very narrowly useful. Most of
| the time they're used in ways that destroy the human
| spirit. Someone not using a smartphone in this day and
| age is a god among ants.
|
| A computer is a bicycle for the mind; an LLM is an easy-
| chair.
| AnimalMuppet wrote:
| One could argue (truthfully!) that cars cause the decline
| of leg muscles. But in many situations, cars are enough
| better than walking, so we don't care.
|
| AI _may_ reach that point - that it's enough better than
| us thinking that we don't think much anymore, and get worse
| at thinking as a result. Well, is that a net win, or not?
| If we get there for that reason, it's probably a net
| win[1]. If we get there because the AI companies are really
| good at PR, that's a definite net loss.
|
| All that is for the future, though. I think that currently,
| it's a net loss. Keep your ability to think; don't trust AI
| any farther than you yourself understand.
|
| [1] It could still not be a net win, if AI turns out to be
| very useful but also either damaging or malicious, and lack
| of thinking for ourselves causes us to miss that.
| tines wrote:
| You're really saying that getting worse at thinking may
| be a net win, and comparing atrophied leg muscles to an
| atrophied mind? I think humanity has lost the plot.
| AnimalMuppet wrote:
| Which took better thinking, assembly or Java? We've lost
| our ability to think well in at least that specific area.
| Are we worse off, or better?
| tines wrote:
| Java and Assembly are the same in the dimension of
| cognitive burden. Trying to reason about this
| fundamentally new thing with analogies like this will not
| work.
| j3th9n wrote:
| Back in the day they would judge people for turning on a
| lightbulb instead of lighting a candle.
| djm_ wrote:
| You could do with using an LLM to make your site work on
| mobile.
| Kuinox wrote:
| 7 comments.
|
| 3 have obviously only read the title, and 3 comment on how the
| article requires JS.
|
| Well played HN.
| sandspar wrote:
| That's typical for link sharing communities like HN and
| Reddit. His title clearly struck a nerve. I assume many
| people opened the link, saw that it was a wall of text,
| scanned the first paragraph, categorized his point into some
| slot that they understand, then came here to compete in HN's
| side-market status game. Normal web browsing behavior, in
| other words.
| tomhow wrote:
| This is exactly why the guideline about titles says:
|
| _Otherwise please use the original title, unless it is
| misleading or linkbait_.
|
| This title counts as linkbait so I've changed it. It turns
| out the article is much better (for HN) than the title
| suggests.
| Kuinox wrote:
| I did not post the article, but I know who wrote it.
|
| Good change btw.
| DocTomoe wrote:
| You can judge all you want. You'll eventually appear much like
| that old woman secretly judging you in church.
|
| Most of the current discourse on AI coding assistants sounds
| either breathlessly optimistic or catastrophically alarmist.
| What's missing is a more surgical observation: the disruptive
| effect of LLMs is not evenly distributed. In fact, the clash
| between how open source and industry teams establish trust
| reveals a fault line that's been papered over with hype and
| metrics.
|
| FOSS projects work on a trust basis - but the industry standard
| is automated testing, pair programming, and development speed.
| That CRUD app for finding out if a rental car is available? Not
| exactly in need of a hand-crafted piece of code, and no-one
| cares if Junior Dev #18493 is trusted within the software dev
| organization.
|
| If the LLM-generated code breaks, blame gets passed, retros are
| held, Jira tickets multiply -- the world keeps spinning, and a
| team fixes it. If a junior doesn't understand their own patch,
| the senior rewrites it under deadline. It's not pretty, but it
| works. And when it doesn't, nobody loses "reputation" - they
| lose time, money, maybe sleep. But not identity.
|
| LLMs challenge open source where it's most vulnerable - in its
| culture. Meanwhile, industry just treats them like the next
| Jenkins: mildly annoying at first, but soon part of the stack.
|
| The author loves the old ways, for many valid reasons: Gabled
| houses _are_ beautiful, but outside of architectural circles,
| prefab is what scaled the suburbs, not timber joints and
| romanticism.
| extr wrote:
| The author seems to be under the impression that AI is some
| kind of new invention that has now "arrived" and we need to
| "learn to work with". The old world is over. "Guaranteeing
| patches are written by hand" is like the Tesla Gigafactory
| wanting a guarantee that the nuts and bolts they purchase are
| hand-lathed.
| can16358p wrote:
| Ironically, a blog post about judging people for a practice
| uses terrible web practices: I'm on mobile and the layout is
| messed up, and Safari's reader mode crashes on this page for
| whatever reason.
| rvnx wrote:
| On Safari mobile you even get a white page, which is almost
| poetic. It means it pushes your imagination to the max.
| acedTrex wrote:
| Mobile layout should be fixed now. I also just threw up a
| quick static version here:
| https://static.jaysthoughts.com/
| EbNar wrote:
| I'll surely care that a stranger on the internet judges me
| about the tools I use (or don't).
| stavros wrote:
| I don't understand the premise. If I trust someone to write good
| code, I learned to trust them because their code works well, not
| because I have a theory of mind for them that "produces good
| code" a priori.
|
| If someone uses an LLM and produces bug-free code, I'll trust
| them. If someone uses an LLM and produces buggy code, I won't
| trust them. How is this different from when they were only using
| their brain to produce the code?
| moffkalast wrote:
| It's easy to get overconfident and not test the LLM's code
| enough when it worked fine for a handful of times in a row, and
| then you miss something.
|
| The problem is often really one of miscommunication, the task
| may be clear to the person working on it, but with frequent
| context resets it's hard to make sure the LLM also knows what
| the whole picture is and they tend to make dumb assumptions
| when there's ambiguity.
|
| The thing that 4o does with deep research where it asks for
| additional info before it does anything should be standard for
| any code generation too tbh, it would prevent a mountain of
| issues.
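| You can approximate that today with a system prompt along these
| lines (the wording is just a sketch, and the request below is a
| made-up example):
|
|     # Force a clarifying round before any code is produced.
|     CLARIFY_FIRST = (
|         "Before writing any code, list the assumptions you would "
|         "have to make and ask up to five clarifying questions "
|         "about requirements, inputs, edge cases and the "
|         "surrounding codebase. Only produce code after the user "
|         "answers or tells you to proceed with your assumptions."
|     )
|
|     messages = [
|         {"role": "system", "content": CLARIFY_FIRST},
|         {"role": "user", "content": "Add retries to the uploader."},
|     ]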
| stavros wrote:
| Sure, but you're still responsible for the quality of the
| code you commit, LLM or no.
| moffkalast wrote:
| Of course you are, but it's sort of like how people are
| responsible for their Tesla driving on autopilot, which then
| suddenly swerves into a wall and disengages two seconds
| before impact. The process forces you to make mistakes you
| wouldn't normally ever make or even consider a possibility.
| JohnKemeny wrote:
| To add to devs and Teslas, you have journalists using
| LLMs writing summaries, lawyers using LLMs writing
| depositions, doctors using LLMs writing their patient
| entries, and law enforcement using LLMs writing their
| forensics reports.
|
| All of these make mistakes (there are documented
| incidents).
|
| And yes, we can counter with "the journalists are dumb
| for not verifying", "the lawyers are dumb for not
| checking", etc., but we should also be open to the fact
| that these are intelligent and professional people who
| make mistakes because they were misled by those who sell
| LLMs.
| bluefirebrand wrote:
| I think it's analogous to physical labour
|
| In the past someone might have been physically healthy
| and strong enough to physically shovel dirt all day long
|
| Nowadays this is rarer because we use an excavator
| instead. Yes, a professional dirt mover is more
| productive with an excavator than a shovel, but is likely
| not as physically fit as someone spending their days
| moving dirt with a shovel
|
| I think it will be similar with AI. It is absolutely
| going to offload a lot of people's thinking into the LLMs
| and their "do it by hand" muscles will atrophy. For
| knowledge workers, that's our brain
|
| I know this was a similar concern with search engines and
| Stack Overflow, so I am trying to temper my concern here
| as best I can. But I can't shake the feeling that LLMs
| provide a way for people to offload their thinking and go
| on autopilot a lot more easily than Search ever did
|
| I'm not saying that we were better off when we had to
| move dirt by hand either. I'm just saying there was a
| physical tradeoff when people moved out of the fields and
| into offices. I suspect there will be a cognitive
| tradeoff now that we are moving away from researching
| solutions to problems and towards asking the AI to give
| us solutions to problems
| acedTrex wrote:
| In an ideal world you would think everyone sees it this
| way. But we are starting to see an uptick in "I don't know,
| the LLM said to do that."
|
| As if that is somehow an exonerating sentence.
| NeutralCrane wrote:
| It isn't, and that is a sign of a bad dev you shouldn't
| trust.
|
| LLMs are a tool, just like any number of tools that are
| used by developers in modern software development. If a
| dev doesn't use the tool properly, don't trust them. If
| they do, trust them. The way to assess if they use it
| properly is in the code they produce.
|
| Your premise is just fundamentally flawed. Before LLMs,
| the proof of a quality dev was in the pudding. After
| LLMs, the proof of a quality dev remains in the pudding.
| acedTrex wrote:
| > Your premise is just fundamentally flawed. Before LLMs,
| the proof of a quality dev was in the pudding. After
| LLMs, the proof of a quality dev remains in the pudding.
|
| Indeed it does, however what the "proof" is has changed.
| In terms of sitting down and doing a full, deep review,
| tracing every path, validating every line, etc., then for
| sure, nothing has changed.
|
| However, at least in my experience, pre-LLM those reviews
| were not done in EVERY CASE; there were many times I
| elided parts of a deep review because I saw markers in
| the code that to me showed competency, care, etc. With
| those markers there are certain failure conditions that
| can be deemed very unlikely to exist and therefore the
| checks can be skipped. Is that ALWAYS the correct
| assumption? Absolutely not, but the more experienced you
| are the fewer false positives you get.
|
| LLMs make those markers MUCH harder to spot, so you have
| to fall back to doing a FULL in-depth review no matter
| what. You have to eat ALL the pudding, so to speak.
|
| For people that relied on maybe tasting a bit of the
| pudding, then assuming based on the taste that the rest
| of the pudding probably tastes the same, it's rather
| jarring and exhausting to now have to eat all of it all
| the time.
| NeutralCrane wrote:
| > However, at least in my experience, pre-LLM those
| reviews were not done in EVERY CASE; there were many
| times I elided parts of a deep review because I saw
| markers in the code that to me showed competency, care,
| etc.
|
| That was never proof in the first place.
|
| If anything, someone basing their trust in a submission
| on anything other than the code itself is far more
| concerning and trust-damaging to me than if the submitter
| has used an LLM.
| acedTrex wrote:
| > That was never proof in the first place.
|
| I mean, it's not necessarily HARD proof but it has been a
| reliable enough way to figure out which corners to cut.
| You can of course say that no corners should ever be cut,
| and while that is true in an ideal sense, in the real
| world things always get fuzzy.
|
| Maybe the death of cutting corners is a good thing
| overall for output quality. It's certainly exhausting for
| the people tasked with doing the reviews, however.
| breuleux wrote:
| I don't know about that. Cutting corners will never die.
|
| Ultimately I don't think the heuristics would change all
| that much, though. If every time you review a person's
| PR, almost everything is great, they are either not using
| AI or they are vetting what the AI writes themselves, so
| you can trust them as you did before. It may just take
| some more PRs until that's apparent. Those who submit
| unvetted slop will have to fix a lot of things, and you
| can crank up the heat on them until they do better, if
| they can. (The "if they can" is what I'm most worried
| about.)
| taneq wrote:
| If you have a long-standing, effective heuristic that "people
| with excellent, professional writing are more accurate and
| reliable than people with sloppy spelling and punctuation",
| then when a semi-infinite group of 'people' appears writing
| well presented, convincingly worded articles which nonetheless
| are riddled with misinformation, hidden logical flaws, and
| inconsistencies, you're gonna end up trusting everyone a lot
| less.
|
| It's like if someone started bricking up tunnel entrances and
| painting ultra realistic versions of the classic Road Runner
| tunnel painting on them, all over the place. You'd have to stop
| and poke every underpass with a stick just to be sure.
| stavros wrote:
| Sure, your heuristic no longer works, and that's a bit
| inconvenient. We'll just find new ones.
| sebmellen wrote:
| Yeah, now you need to be able to demonstrate verbal
| fluency. The problem is, that inherently means a loss of
| "trusted anonymous" communication, which is particularly
| damaging to the fiber of the internet.
| acedTrex wrote:
| Author here:
|
| Precisely. In an age where it is very difficult to
| ascertain the type or quality of skills you are
| interacting with, say in a patch review or otherwise, you
| frankly have to "judge" someone and fall back to suspicion
| and full verification.
| taneq wrote:
| Yeah I think "trust for a fluent, seemingly logically
| coherent anonymous responder" pretty much captures it.
| oasisaimlessly wrote:
| "A bit inconvenient" might be the understatement of the
| year. If information requires say, 2x the time to validate,
| the utility of the internet is halved.
| alganet wrote:
| > I learned to trust them because their code works well
|
| There's so much more than "works well". There are many cues
| that exist close to code, but are not code:
|
| I trust more if the contributor explains their change well.
|
| I trust more if the contributor did great things in the past.
|
| I trust more if the contributor manages granularity well
| (reasonable commits, not huge changes).
|
| I trust more if the contributor picks the right problems to
| work on (fixing bugs before adding new features, etc).
|
| I trust more if the contributor proves being able to maintain
| existing code, not just add on top of it.
|
| I trust more if the contributor makes regular contributions.
|
| And so on...
| acedTrex wrote:
| Author here:
|
| Spot on, there are so many little things that we as humans
| use as subtle verification steps to decide how much scrutiny
| various things require. LLMs are not necessarily the death of
| that concept but they do make it far far harder.
| somewhereoutth wrote:
| Because when people use LLMs, they are getting the tool to do
| the work for them, not using the tool to do the work. LLMs are
| not calculators, nor are they the internet.
|
| A good rule of thumb is to simply reject any work that has had
| involvement of an LLM, and ignore any communication written by
| an LLM (even for EFL speakers, I'd much rather have your "bad"
| English than whatever ChatGPT says for you).
|
| I suspect that as the serious problems with LLMs become ever
| more apparent, this will become standard policy across the
| board. Certainly I hope so.
| stavros wrote:
| Well, no, a good rule of thumb is to expect people to write
| good code, no matter how they do it. Why would you mandate
| what tool they can use to do it?
| somewhereoutth wrote:
| Because it pertains to the quality of the output - I can't
| validate every line of code, or test every edge case. So if
| I need a certain level of quality, I have to verify the
| process of producing it.
|
| This is standard for any activity where accuracy / safety
| is paramount - you validate the _process_. Hence things
| like maintenance logs for airplanes.
| acedTrex wrote:
| > So if I need a certain level of quality, I have to
| verify the process of producing it
|
| Precisely this, and this is hardly a requirement unique
| to software. Process audits are everywhere in
| engineering. Previously you could infer the process of
| producing some code by simply reading the patch, and that
| generally would tell you quite a bit about the author
| themselves. Using advanced and niche concepts would imply
| a solid process with experience backing it, which would
| then imply that certain contextual bugs are unlikely, so
| you skip looking for them.
|
| My premise in the blog is basically that "Well, now I have
| to go do a full review no matter what the code itself
| tells me about the author."
| badsectoracula wrote:
| > My premise in the blog is basically that "Well, now I
| have to go do a full review no matter what the code
| itself tells me about the author."
|
| Which IMO is the correct approach - or alternatively, if
| you do actually trust the author, you shouldn't care if
| they used LLMs or not because you'd trust them to check
| the LLM output too.
| badsectoracula wrote:
| The false assumption here is that humans will always
| write better code than LLMs, which is certainly not the
| case for all humans nor all LLMs.
| mexicocitinluez wrote:
| >Because when people use LLMs, they are getting the tool to
| do the work for them, not using the tool to do the work.
|
| What? How on god's green earth could you even pretend to know
| how all people are using these tools?
|
| > LLMs are not calculators, nor are they the internet.
|
| Umm, okay? How does that make them less useful?
|
| I'm going to give you a concrete example of something I just
| did and let you try and do whatever mental gymnastics you
| have to do to tell me it wasn't useful:
|
| Medicare requires all new patients receiving home health
| treatment go through a 100+ question long form. This form
| changes yearly, and it's my job to implement the form into
| our existing EMR. Well, part of that is creating a printable
| version. Guess what I did? I uploaded the entire pdf to
| Claude and asked it to create a print-friendly template using
| Cottle as the templating language in C#. It generated the 30
| page print preview in a minute. And it took me about 10 more
| minutes to clean up.
|
| > I suspect that as the serious problems with LLMs become
| ever more apparent, this will become standard policy across
| the board. Certainly I hope so.
|
| The irony is that they're getting better by the day. That's
| not to say people don't use them for the wrong applications,
| but the idea that this tech is going to be banned is absurd.
|
| > A good rule of thumb is to simply reject any work that has
| had involvement of an LLM
|
| Do you have any idea how ridiculous this sounds to people who
| actually use the tools? Are you going to be able to hunt down
| the single React component in which I asked it to convert the
| MUI styles to tailwind? How could you possibly know? You
| can't.
| flir wrote:
| > A good rule of thumb is to simply reject any work that has
| had involvement of an LLM,
|
| How are you going to _know_?
| bluefirebrand wrote:
| That's sort of the problem, isn't it? There is no real way
| to know, so we sort of just have to assume every bit of
| work involves LLMs now, and take a much closer look at
| everything.
| sebmellen wrote:
| You're being unfairly downvoted. There is a plague of well-
| groomed incoherency in half of the business emails I receive
| today. You can often tell that the author, without wrestling
| with the text to figure out what they want to say, is a kind
| of stochastic parrot.
|
| This is okay for platitudes, but for emails that really
| matter, having this messy watercolor kind of writing totally
| destroys the clarity of the text and confuses everyone.
|
| To your point, I've asked everyone on my team to refrain from
| writing words (not code) with ChatGPT or other tools, because
| the LLM invariably leads to more complicated output than what
| the author would produce by just badly, but authentically,
| trying to express themselves in the text.
| acedTrex wrote:
| Yep, I have come to really dislike LLMs for documentation,
| as it just reads wrong to me and I find it so often misses
| the point entirely. There is so much nuance tied up in
| documentation, and much of it is in what is NOT said as
| much as what is said.
|
| The LLMs struggle with both but REALLY struggle with
| figuring out what NOT to say.
| short_sells_poo wrote:
| I wonder if this is to a large degree also because when
| we communicate with humans, we take cues from more than
| just the text. The personality of the author will project
| into the text they write, and assuming you know this
| person at least a little bit, these nuances will give you
| extra information.
| jimbokun wrote:
| I find the idea of using LLMs for emails confusing.
|
| Surely it's less work to put the words you want to say into
| an email, rather than craft a prompt to get the LLM to say
| what you want to say, and iterate until the LLM actually
| says it?
| fwip wrote:
| My own opinion, which is admittedly too harsh, is that
| they don't really know what they want to say. That is,
| the prompt they write is very short, along the lines of
| `ask when this will be done` or `schedule a followup`,
| and give the LLM output a cursory review before copy-
| pasting it.
| jimbokun wrote:
| Still funny to me.
|
| `ask when this will be done` -> ChatGPT -> paste answer
| into email
|
| vs
|
| type: "when will this be done?" Send.
| breuleux wrote:
| I think the main issue is people using LLMs to do things that
| they don't know how to do themselves. There's actually a
| similar problem with calculators, it's just a much smaller
| one: if you never learn how to add or multiply numbers by
| hand and use calculators for everything all the time, you may
| sometimes make absurd mistakes like tapping 44 * 3 instead of
| 44 * 37 and not bat an eye when your calculator tells you the
| result is a whole order of magnitude less than what you
| should have expected. Because you don't really understand how
| it works. You haven't developed the intuition.
|
| There's nothing wrong with using LLMs to save time doing
| trivial stuff you know how to do yourself and can check very
| easily. The problem is that (very lazy) people are using them
| to do stuff they are themselves not competent at. They can't
| check, they won't learn, and the LLM is essentially their
| skill ceiling. This is very bad: what added value are you
| supposed to bring over something you don't understand? AGI
| won't have to improve from the current baseline to surpass
| humans if we're just going to drag ourselves down to its
| level.
| tranchebald wrote:
| I'm not seeing a lot of discussion about verification or a
| stronger quality control process anywhere in the comments
| here. Is that some kind of unsolvable problem for software? I
| think if the standard of practice is to use author reputation
| as a substitute for a robust quality control process, then I
| wouldn't be confident that the current practice is much
| better than AI code-babel.
| badsectoracula wrote:
| > Because when people use LLMs, they are getting the tool to
| do the work for them, not using the tool to do the work.
|
| You can say that for pretty much any sort of automation or
| anything that makes things easier for humans. I'm pretty sure
| people were saying that about doing math by hand around when
| calculators became mainstream too.
| mexicocitinluez wrote:
| It's not.
|
| What you're seeing now is people who once thought and
| proclaimed these tools as useless now have to start to walk
| back their claims with stuff like this.
|
| It does amaze me that the people who don't use these tools seem
| to have the most to say about them.
| acedTrex wrote:
| Author here:
|
| For what it's worth I do actually use the tools albeit
| incredibly intentionally and sparingly.
|
| I see quite a few workflows and tasks where they can be a
| value add, mostly outside of the hot path of actual code
| generation but still quite enticing. So much so, in fact,
| that I'm working on my own local agentic tool with some
| self hosted ollama models. I like to think that I am at
| least somewhat in the know on the capabilities and failure
| points of the latest LLM tooling.
|
| That however doesn't change my thoughts on trying to
| ascertain if code submitted to me deserves a full indepth
| review or if I can maybe cut a few corners here and there.
| mexicocitinluez wrote:
| > That however doesn't change my thoughts on trying to
| ascertain if code submitted to me deserves a full indepth
| review or if I can maybe cut a few corners here and there.
|
| How would you even know? Seriously, if I use Chatgpt to
| generate a one-off function for a feature I'm working on
| that searches all classes for one that inherits a specific
| interface and attribute, are you saying you'd be able to
| spot the difference?
|
| And what does it even matter if it works?
|
| What if I use Bolt to generate a quick screen for a PoC? Or
| use Claude to create a print preview with CSS of a 30 page
| Medicare form? Or convert a component's styles from MUI to
| tailwind? What if all these things are correct?
|
| This whole "OS repos will ban LLM-generated code" thing is
| a bit absurd.
|
| > or what it's worth I do actually use the tools albeit
| incredibly intentionally and sparingly.
|
| How sparingly? Enough to see how it's constantly improving?
| acedTrex wrote:
| > How would you even know? Seriously, if I use Chatgpt to
| generate a one-off function for a feature I'm working on
| that searches all classes for one that inherits a
| specific interface and attribute, are you saying you'd be
| able to spot the difference?
|
| I don't know, that's the problem. As a result, because I
| can't know, I now have to do full in-depth reviews no
| matter what. Which is the "judging" I tongue-in-cheek
| talk about in the blog.
|
| > How sparingly? Enough to see how it's constantly
| improving?
|
| Nearly daily. To be honest, I have not noticed too much
| improvement year over year in regards to how they fail.
| They still break in the exact same dumb ways now as they
| did before. Sure, they might generate syntactically
| correct code reliably now and it might even work. But
| they still consistently fail to grok the underlying
| reasoning for things existing.
|
| But I am writing my own versions of these agentic systems
| to use for some rote menial stuff.
| mexicocitinluez wrote:
| So you weren't doing in-depth reviews before? Are these
| people you know? And now you just don't trust them
| because they include a tool in their workflow?
| globnomulous wrote:
| > It does amaze me that the people who don't use these tools
| seem to have the most to say about them.
|
| You're kidding, right? Most people who don't use the tools
| and write about it are responding to the ongoing hype train
| -- a specific article, a specific claim, or an idea that
| seems to be gaining acceptance or to have gone unquestioned
| among LLM boosters.
|
| I recently watched a talk by Andrei Karpathy. So much in it
| begged for a response. Google Glass was "all the rage" in
| 2013? Please. "Reading text is laborious and not fun. Looking
| at images is fun." You can't be serious.
|
| Someone recently shared on HN a blog post explaining why the
| author doesn't use LLMs. The justification for the post?
| "People keep asking me."
| mexicocitinluez wrote:
| Being asked if I'm kidding by the person comparing Google
| glasses to machine learning algorithms is pretty funny ngl.
|
| And the "I don't use these tools and never will" sentiment
| is rampant in the tech community right now. So yes, I am
| serious.
|
| You're not talking about the blog post that completely
| ignored agentless uses, are you? The one that came to the
| conclusion that LLMs aren't useful despite only using a
| subset of their features?
| bluefirebrand wrote:
| > And the "I don't use these tools and never will"
| sentiment is rampant in the tech community right now
|
| So is the "These tools are game changers and are going to
| make all work obsolete soon" sentiment
|
| Don't start pretending that AI boosters aren't
| _everywhere_ in tech right now
|
| I think the major difference I'm noticing is that many of
| the Boosters are not people who write any code. They are
| executives, managers, product owners, team leads, etc.
| Former Engineers maybe but very often not actively
| writing software daily
| globnomulous wrote:
| > I think the major difference I'm noticing is that many
| of the Boosters are not people who write any code.
|
| Plenty of current, working engineers who frequent and
| comment on Hacker News say they use LLMs and find them
| useful/'game changers,' I think.
|
| Regardless, I think I agree overall: the key distinction
| I see is between people who _like_ to read and write
| programs and people who just want to make some specific
| product. The former group generally treat LLMs as an
| unwelcome intrusion into the work they love and value.
| The latter generally welcome LLMs because the people
| selling them promise, in essence, that with LLMs you can
| skip the engineering and just make the product.
|
| I'm part of the former group. I love reading code,
| thinking about it, and working with it. Meeting-based
| programming (my term for LLM-assisted programming) sounds
| like hell on earth to me. I'd rather blow my brains out
| than continue to work as a software engineer in a world
| where the LLM-booster dream comes true.
| bluefirebrand wrote:
| > I'd rather blow my brains out than continue to work as
| a software engineer in a world where the LLM-booster
| dream comes true.
|
| I feel the same way
|
| But please don't. I promise I won't either. There is
| still a place for people like you and me in this world,
| it's just gonna take a bit more work to find it
|
| Deal? :)
| mexicocitinluez wrote:
| > So is the "These tools are game changers and are going
| to make all work obsolete soon" sentiment
|
| Except we aren't talking about those people, are we? The
| blog post wasn't about that.
|
| > Don't start pretending that AI boosters aren't
| everywhere in tech right now
|
| PLEASE tell me what I said that made you feel like you
| need to put words in my mouth. Seriously.
|
| > I think the major difference I'm noticing is that many
| of the Boosters are not people who write any code
|
| I write code every day. I just asked Claude to convert a
| Medicare mandated 30 page assessment to a printable
| version with CSS using Cottle in C# and it did it. I'd
| love to know why that sort of thing isn't useful.
| globnomulous wrote:
| > Being asked if I'm kidding by the person comparing
| Google glasses to machine learning algorithms is pretty
| funny ngl.
|
| I didn't draw the comparison. Karpathy, one of the most
| prominent LLM proponents on the planet -- the guy who
| invented the term 'vibe-coding' -- drew the
| comparison.[1]
|
| > And the "I don't use these tools and never will"
| sentiment is rampant in the tech community right now. So
| yes, I am serious.
|
| I think you misunderstood my comment -- or my comment
| just wasn't clear enough: I quoted the line "It does
| amaze me that the people who don't use these tools seem
| to have the most to say about them." and then I asked
| "You're kidding, right?" In other words, "you can't
| seriously believe that the nay-sayers 'always have the
| most to say.'" It's a ridiculous claim. Just about every
| naysayer 'think piece' -- whether or not it's garbage --
| is responding to an overwhelming tidal wave of pro-LLM
| commentary and press coverage.
|
| > Youre not talking about the blog post that completely
| ignored agentless uses are you? The one that came to the
| conclusion LLMs arent useful despite only using a subset
| of its features?
|
| I'm referring to this one[2]. It's awful, smug, self-
| important, sanctimonious nonsense.
|
| [1] https://www.youtube.com/watch?si=xF5rqWueWDQsW3FC&v=L
| CEmiRjP...
|
| [2] https://news.ycombinator.com/item?id=44294633
| mexicocitinluez wrote:
| I'm so confused as to why you took that so literally. I
| didn't literally mean that the nay-sayers are producing
| more words than the evangelists. It was a hyperbolic
| expression. And I wasn't JUST talking about the blog
| posts. I'm talking about ALL comments about it.
| acedTrex wrote:
| Author here:
|
| Essentially the premise is about medium trust environments,
| like very large teams, and low trust environments, like an
| open source project.
|
| LLMs make it very difficult to make an immediate snap judgement
| about the quality of the dev that submitted the patch based
| solely on the code itself.
|
| In the absence of being able to ascertain the type of person
| you are dealing with, you have to fall back to "no trust" and
| review everything with a very fine-tooth comb. Essentially
| there are no longer any safe "review shortcuts", and that can
| be painful in places that relied on those markers to grease
| the wheels, so to speak.
|
| Obviously if you are in an existing competent high trust team
| then this problem does not apply and most likely seems
| completely foreign as a concept.
| lxgr wrote:
| > LLMs make it very difficult to make an immediate snap
| judgement about the quality [...]
|
| That's the core of the issue. It's time to say goodbye to
| heuristics like "the blog post is written in eloquent,
| grammatical English, hence the point its author is trying to
| make must be true" or "the code is idiomatic and following
| all code styles, hence it must be modeling the world with
| high fidelity".
|
| Maybe that's not the worst thing in the world. I feel like it
| often made people complacent.
| acedTrex wrote:
| > Maybe that's not the worst thing in the world. I feel
| like it often made people complacent.
|
| For sure, in some ways perhaps reverting to a low trust
| environment might improve quality in that it now forces
| harsher/more in depth reviews.
|
| That however doesn't make the requirement less exhausting
| for people previously relying heavily on those markers to
| speed things up.
|
| Will be very interesting to see how the industry
| standardizes around this. Right now it's a bit of the wild
| west. Maybe people in ten years will look back at this post
| and think "what do you mean you judged people based on the
| code itself that's ridiculous"
| furyofantares wrote:
| I think you're unfair to the heuristics people use in your
| framing here.
|
| You said "hence the point its author is trying to make must
| be true" and "hence it must be modeling the world with high
| fidelity".
|
| But it's more like "hence the author is likely competent
| and likely put in a reasonable effort."
|
| When those assumptions hold, putting in a very deep review
| is less likely to pay off. Maybe you are right that people
| have been too complacent to begin with, I don't know, but I
| don't think you've framed it fairly.
| lxgr wrote:
| > But it's more like "hence the author is likely
| competent and likely put in a reasonable effort."
|
| And isn't dyslexic, and is a native speaker etc. Some
| will gain from this shift, some will lose.
| furyofantares wrote:
| Yes! This is part of why I bristle at such reductive
| takes; we could use more nuance in thinking about what we
| are gaining, what we are losing, and how to deal with it.
| tempodox wrote:
| Anyway, "following all code styles" is just a fancy way of
| saying "adheres to fashion". What meaningful conclusions
| can you draw from that?
| rurp wrote:
| It's not about fashion, it's about diligence and
| consideration. Code formatting is totally different from
| say clothing fashion. Social fashions are often about
| being novel or surprising which is the opposite of how
| good code is written. Code should be as standard, clear
| and unsurprising as is reasonably possible. If someone is
| writing code in a way that's deliberately unconventional
| or overly fancy that's a strong signal that it isn't very
| good.
|
| When someone follows standard conventions it means that
| they A) have a baseline level of knowledge to know about
| them, and B) care to write the code in a clear and
| approachable way for others.
| tempodox wrote:
| > If someone is writing code in a way that's deliberately
| unconventional or overly fancy that's a strong signal
| that it isn't very good.
|
| "unconventional" or "fancy" is in the eye of the
| beholder. Whose conventions are we talking about? Code is
| bad when it doesn't look the way you want it to? How
| convenient. I may find code hard to read because it's
| formatted "conventionally", but I wouldn't be so entitled
| as to call it bad just because of that.
| kiitos wrote:
| > "unconventional" or "fancy" is in the eye of the
| beholder.
|
| Literally not: a language defines its own conventions,
| they're not defined in terms of individual
| users/readers/maintainers subjective opinions.
|
| > Whose conventions are we talking about?
|
| The conventions defined by the language.
|
| > Code is bad when it doesn't look the way you want it
| to?
|
| No -- when it doesn't satisfy the conventions established
| by the language.
|
| > I may find code hard to read because it's formatted
| "conventionally",
|
| If you did this then you'd be wrong, and that'd be a
| problem with your personal evaluation process/criteria,
| that you would need to fix.
| Capricorn2481 wrote:
| > a language defines its own conventions
|
| Where are these mythical languages? I think the word
| you're looking for is syntax, which is entirely
| different. Conventions are how code is structured and
| expected to be read. Very few languages actually enforce
| or even suggest conventions, hence the many style guides.
| It's a standout feature of Go to have a format style, and
| people still don't agree with it.
|
| And it's kinda moot when you can always override
| conventions. It's more accurate to say a team decides on
| the conventions of a language.
| habinero wrote:
| No, they're absolutely correct that it's critical in
| professional and open source environments. Code is
| written once but read hundreds or thousands of times.
|
| If every rando hire goes in and has a completely
| different style and formatting -- and then other people
| come in and rewrite parts in their own style -- code
| rapidly goes to shit.
|
| It doesn't matter what the style is, as long as there is
| one and it's enforced.
| Capricorn2481 wrote:
| > No, they're absolutely correct that it's critical in
| professional and open source environments. Code is
| written once but read hundreds or thousands of times
|
| What you're saying is reasonable, but that's not what
| they said at all. They said there's one way to write
| cleanly and that's "Standard conventions", whatever that
| means. Yes, conventions so standard that I've read 10
| conflicting books on what they are.
|
| There is no agreed upon definition of "readable code". A
| team can have a style guide, which is great to follow,
| but that is just formalizing the personal preference of
| the people working on a project. It's not any more divine
| than the opinion of a "rando."
| habinero wrote:
| No, you misunderstood what they said. And I misspoke a
| little, too.
|
| While it's true that _in principle_ it doesn't matter
| what style you choose as long as there is one, in
| _practice_ languages are just communities of people, and
| every community develops norms and standards. More recent
| languages often just pick a style and bake it in.
|
| This is a good thing, because again, code is read 1000x
| more times than it's written. It saves everyone time and
| effort to just develop a typical style.
|
| And yeah, the code might run no matter how you indent it,
| but it's not correct, any more than you going to a
| restaurant and licking the plates.
| o11c wrote:
| That's not how heuristics work.
|
| The heuristic is "this submission doesn't even follow the
| basic laws of grammar, therefore I can safely assume
| incompetence and ignore it entirely."
|
| You still have to do verification for what passes the
| heuristic, but it keeps 90% of the crap away.
| sim7c00 wrote:
| It's about the quality of the code, not the quality of the
| dev. You might think it's related, but it's not.
|
| A dev can write a piece of good code and a piece of bad
| code. So per piece of code, review the code, not the dev!
| acedTrex wrote:
| > you might think it's related, but it's not.
|
| In my experience they very much are related. High quality
| devs are far more likely to output high quality working
| code. They test, they validate, they think, ultimately they
| care.
|
| In the case that you are reviewing a patch from someone
| you have limited experience with, it previously was
| feasible to infer the quality of the dev from the patch
| itself and the surrounding context in which it was
| submitted.
|
| LLMs make that judgement far, far more difficult, and when
| you cannot make a snap judgement you have to revert to a
| very low-trust, in-depth review style.
|
| No more greasing the wheels to expedite a process.
| haswell wrote:
| > _it's about the quality of the code, not the quality of
| the dev. you might think it's related, but it's not._
|
| I could not disagree more. The quality of the dev will
| always matter, and has as much to do with what code makes
| it into a project as the LLM that generated it.
|
| An experienced dev will have more finely tuned evaluation
| skills and will accept code from an LLM accordingly.
|
| An inexperienced or "low quality" dev may not even know
| what the ideal/correct solution looks like, and may be
| submitting code that they do not fully understand. This is
| especially tricky because they may still end up submitting
| high quality code, but not because they were capable of
| evaluating it as such.
|
| You could make the argument that it shouldn't matter who
| submits the code if the code is evaluated purely on its
| quality/correctness, but I've never worked in a team that
| doesn't account for who the person is behind the code. If
| it's the grizzled veteran known for rarely making mistakes,
| the review might look a bit different from a review for the
| intern's code.
| NeutralCrane wrote:
| > An experienced dev will have more finely tuned
| evaluation skills and will accept code from an LLM
| accordingly. An inexperienced or "low quality" dev may
| not even know what the ideal/correct solution looks like,
| and may be submitting code that they do not fully
| understand. This is especially tricky because they may
| still end up submitting high quality code, but not
| because they were capable of evaluating it as such.
|
| That may be true, but the proxy for assessing the quality
| of the dev is the code. No one is standing over you as
| you code your contribution to ensure you are making the
| correct, pragmatic decisions. They are assessing the code
| you produce to determine the quality of your decisions,
| and over time, your reputation as a dev is made up of the
| assessments of the code you produced.
|
| The point is that an LLM in no way changes this. If a dev
| uses an LLM in a non-pragmatic way that produces bad
| code, it will erode trust in them. The LLM is a tool, but
| trust still factors into how the dev uses the tool.
| haswell wrote:
| > _That may be true, but the proxy for assessing the
| quality of the dev is the code._
|
| Yes, the quality of the dev is a measure of the quality
| of the code they produce, but once a certain baseline has
| been established, the quality of the dev is now known
| independent of the code they may yet produce. i.e. if you
| were to make a prediction about the quality of code
| produced by a "high quality" dev vs. a "low quality" dev,
| you'd likely find that the high quality dev tends to
| produce high quality code more often.
|
| So now you have a certain degree of knowledge even before
| you've seen the code. In practice, this becomes a factor
| on every dev team I've worked around.
|
| Adding an LLM to the mix changes that assessment
| fundamentally.
|
| > _The point is that an LLM in no way changes this._
|
| I think the LLM by definition changes this in numerous
| ways that can't be avoided. i.e. the code that was
| previously a proxy for "dev quality" could now fall into
| multiple categories:
|
| 1. Good code written by the dev (a good indicator of dev
| quality if they're consistently good over time)
|
| 2. Good code written by the LLM and accepted by the dev
| because they are experienced and recognize the code to be
| good
|
| 3. Good code written by the LLM and accepted by the dev
| because it works, but not necessarily because the dev
| knew it was good (no longer a good indicator of dev
| quality)
|
| 4. Bad code written by the LLM
|
| 5. Bad code written by the dev
|
| #2 and #3 are where things get messy. Good code may now
| come into existence without it being an indicator of dev
| quality. It is now necessary to assess whether or not the
| LLM code was accepted because the dev recognized it was
| good code, or because the dev got things to work and
| essentially got lucky.
|
| It may be true that you're still evaluating the code at
| the end of the day, but what you learn from that
| evaluation has changed. You can no longer evaluate the
| quality of a dev by the quality of the code they commit
| unless you have other ways to independently assess them
| beyond the code itself.
|
| If you continued to assess dev quality without taking
| this into consideration, it seems likely that those
| assessments would become less accurate over time as more
| "low quality" devs produce high quality code - not
| because of their own skills, but because of the ongoing
| improvements to LLMs. That high quality code is no longer
| a trustworthy indicator of dev quality.
|
| > _If a dev uses an LLM in a non-pragmatic way that
| produces bad code, it will erode trust in them. The LLM
| is a tool, but trust still factors into how the dev uses
| the tool._
|
| Yes, of course. But the issue is not that a good dev
| might erode trust by using the LLM poorly. The issue is
| that inexperienced devs will make it increasingly
| difficult to use the same heuristics to assess dev
| quality across the board.
| insane_dreamer wrote:
| > If someone uses an LLM and produces bug-free code, I'll trust
| them.
|
| Only because you already trust them to know that the code is
| indeed bug-free. Some cases are simple and straightforward --
| this routine returns a desired value or it doesn't. Other
| situations are much more complex in anticipating the ways in
| which it might interact with other parts of the system, edge
| cases that are not obvious, etc. Writing code that is "bug
| free" in that situation requires the writer of the code to
| understand the implications of the code, and if the dev doesn't
| understand exactly what the code does because it was written by
| an LLM, then they won't be able to understand the implications
| of the code. It then falls to the reviewer to understand the
| implications of the code -- increasing their workload. That was
| the premise.
| axegon_ wrote:
| That is already the case for me. The number of times I've read
| "apologies for the oversight, you are absolutely correct" is
| staggering: 8 or 9 out of 10 times. Meanwhile I constantly see
| people mindlessly copy-pasting LLM-generated code and subsequently
| being furious when it doesn't do what they expected it to do.
| Which, btw, is the better option: I'd rather have something
| obviously broken as opposed to something seemingly working.
| autobodie wrote:
| In my experience, LLMs are extremely inclined to modify code
| just to pass tests instead of meeting requirements.
| fwip wrote:
| When they're not modifying the tests to match buggy behavior.
| :P
| devjab wrote:
| Are you using the LLMs through a browser chatbot? Because the
| AI-agents we use with direct code-access aren't very chatty.
| I'd also argue that they are more capable than a lot of junior
| programmers, at least around here. We're almost at a point
| where you can feed the agents short, specific tasks, and they
| will perform them well enough to not really require anything
| outside of a code review.
|
| That being said, the prediction engine still can't do any real
| engineering. If you don't specifically task them with using
| things like Python generators, you're very likely to get a
| piece of code that eats up a gazillion bytes of memory. Which
| unfortunately doesn't set them apart from a lot of Python
| programmers I know, but it is an example of how the LLMs are
| exactly as bad as you mention. On the positive side, it helps
| with people actually writing the specification tasks in more
| detail than just "add feature".
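|
| (A minimal sketch of that generator point, not from the original
| comment; the file contents and numbers are hypothetical.)
|
|     # Builds the whole list in memory before summing it.
|     def total_eager(path):
|         with open(path) as f:
|             rows = [int(line) for line in f]  # entire file in RAM
|         return sum(rows)
|
|     # Streams the file one line at a time via a generator.
|     def total_lazy(path):
|         with open(path) as f:
|             return sum(int(line) for line in f)  # constant memory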
|
| Where AI-agents are the most useful for us is with legacy code
| that nobody prioritises. We have a data extractor which was
| written in the previous millennium. It basically uses around
| two hundred hard-coded coordinates to extract data from a
| specific type of document which arrives by fax. It's worked for
| 30ish years because the documents haven't changed... but they
| recently did, and it took Copilot like 30 seconds to correct
| the coordinates. Something that would've likely taken a human a
| full day of excruciating boredom.
|
| I have no idea how our industry expects anyone to become experts
| in the age of vibe coding though.
| teeray wrote:
| > Because the AI-agents we use with direct code-access aren't
| very chatty.
|
| So they're even more confident in their wrongness
| furyofantares wrote:
| > Because the AI-agents we use with direct code-access aren't
| very chatty.
|
| Every time I tell claude code something it did is wrong, or
| might be wrong, or even just ask a leading question about a
| potential bug it just wrote, it leads with "You're absolutely
| correct!" before even invoking any tools.
|
| Maybe you've just become used to ignoring this. I mostly
| ignore it but it is a bit annoying when I'm trying to use the
| agent to help me figure out if the code it wrote is correct,
| so I ask it some question it should be capable of helping
| with and it leads with "you're absolutely correct".
|
| I didn't make a proposition that can be correct or not, and
| it didn't do any work yet to investigate my question - it
| feels like it has poisoned its own context by leading with
| this.
| gibspaulding wrote:
| > Where AI-agents are the most useful for us is with legacy
| code
|
| I'd love to hear more about your workflow and the code base
| you're working in. I have access to Amazon Q (which it looks
| like is using Claude Sonnet 4 behind the scenes) through
| work, and while I found it very useful for Greenfield
| projects, I've really struggled using it to work on our older
| code bases. These are all single file 20,000 to 100,000 line
| C modules with lots of global variables and most of the logic
| plus 25 years of changes dumped into a few long functions.
| It's hard to navigate for a human, but seems to completely
| overwhelm Q's context window.
|
| Do other Agents handle this sort of scenario better, or are
| there tricks to making things more manageable? Obviously
| refactoring to break everything up into smaller files and
| smaller functions would be great, but that's just the sort of
| project that I want to be able to use the AI for.
| mexicocitinluez wrote:
| > 8 or 9 out of 10 times.
|
| No, they don't. This is 100% a made-up statistic.
| bluefirebrand wrote:
| It isn't even being presented as a statistic it is someone
| saying what they have experienced
| atemerev wrote:
| I am a software engineer who writes 80-90% of my code with AI
| can't ignore the productivity boost), and I mostly agree with
| this sentiment.
|
| I found out very early that under no circumstances may you have
| code you don't understand, anywhere. Well, you may, but not
| in public, and you should commit to understanding it before
| anyone else sees it. Particularly before the sales guys do.
|
| However, AI can help you with learning too. You can run
| experiments, test hypotheses and burn your fingers so fast. I
| like it.
| pfdietz wrote:
| There was trust?
| acedTrex wrote:
| Hi everyone, author here.
|
| Sorry about the JS stuff; I wrote this while also fooling around
| with alpine.js for fun. I never expected it to make it to HN.
| I'll get a static version up and running.
|
| Happy to answer any questions or hear other thoughts.
|
| Edit: https://static.jaysthoughts.com/
|
| Static version here with slightly wonky formatting, sorry for the
| hassle.
|
| Edit2: Should work well on mobile now; added a quick breakpoint.
| konaraddi wrote:
| Given the topic of your post, and high pagespeed results, I
| think >99% of your intended audience can already read the
| original. No need to apologize or please HN users.
| pu_pe wrote:
| > While the industry leaping abstractions that came before
| focused on removing complexity, they did so with the fundamental
| assertion that the abstraction they created was correct. That is
| not to say they were perfect, or they never caused bugs or
| failures. But those events were a failure of the given
| implementation, a departure from what the abstraction was SUPPOSED
| to do; every mistake, once patched, led to a safer, more robust
| system. LLMs by their very fundamental design are a probabilistic
| prediction engine, they merely approximate correctness for
| varying amounts of time.
|
| I think what the author misses here is that imperfect,
| probabilistic agents can build reliable, deterministic systems.
| No one would trust a garbage collection tool based on how
| reliable the author was, but rather if it proves it can do what
| it intends to do after extensive testing.
|
| I can certainly see an erosion of trust in the future, with the
| result being that test-driven development gains even more
| momentum. Don't trust, and verify.
| acedTrex wrote:
| > I think what the author misses here is that imperfect,
| probabilistic agents can build reliable, deterministic systems.
| No one would trust a garbage collection tool based on how
| reliable the author was, but rather if it proves it can do what
| it intends to do after extensive testing.
|
| > but rather if it proves it can do what it intends to do after
| extensive testing.
|
| Author here: Here I was less talking about the effectiveness of
| the output of a given tool and more so about the tool itself.
|
| To take your garbage collection example, sure perhaps an
| agentic system at some point can spin some stuff up and beat it
| into submission with test harnesses, bug fixes etc.
|
| But, imagine you used the model AS the garbage collector/tool,
| where, say, on every sweep you simply dumped the memory of the
| program into the model and told it to release the unneeded
| blocks. You would NEVER be able to trust that the model itself
| identifies the correct memory blocks, and no amount of
| "patching" or "fine tuning" would ever get you there.
|
| With other historical abstractions like, say, the JVM, if the
| deterministic output (in this case the assembly the JIT emits)
| is incorrect, that bug is patched and the abstraction will never
| have that same fault again. Not so with LLMs.
|
| To me that distinction is very important when trying to point
| out previous developer tooling that changed the entire nature
| of the industry. It's not to say I do not think LLMs will have
| a profound impact on the way things work in the future. But I
| do think we are in completely uncharted territory with limited
| historical precedent to guide us.
| lbalazscs wrote:
| It's naive to hope that automatic tests will find all problems.
| There are several types of problems that are hard to detect
| automatically: concurrency problems, resource management
| errors, security vulnerabilities, etc.
|
| An even more important question: who tests the tests
| themselves? In traditional development, every piece of logic is
| implemented twice: once in the code and once in the tests. The
| tests checks the code, and in turn, the code implicitly checks
| the tests. It's quite common to find that a bug was actually in
| the tests, not the app code. You can't just blindly trust the
| tests, and wait until your agent finds a way to replicate a
| test bug in the code.
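|
| (A tiny hypothetical illustration of that point: the test below
| encodes the same mistake as the code, so it passes while both
| are wrong.)
|
|     # Hypothetical example: the bug lives in the test as well.
|     def days_in_january():
|         return 30                        # wrong: January has 31 days
|
|     def test_days_in_january():
|         assert days_in_january() == 30   # passes, yet both are wrong
|
|     test_days_in_january()               # no error raised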
| bluefirebrand wrote:
| > I think what the author misses here is that imperfect,
| probabilistic agents can build reliable, deterministic systems
|
| That is quite a statement! You're talking about systems that
| are essentially entropy-machines somehow creating order?
|
| > with the result being that test-driven development gains even
| more momentum
|
| Why is it that TDD is always put forward as the silver bullet
| that fixes all issues with building software?
|
| The number of times I've seen TDD build the wrong software
| after starting with the wrong tests is actually embarrassing
| dirkc wrote:
| I have a friend that always says "innovation happens at the speed
| of trust". Ever since GPT3, that quote comes to mind over and
| over.
|
| Verification has a high cost and trust is the main way to lower
| that cost. I don't see how one can build trust in LLMs. While
| they are extremely articulate in both code and natural language,
| they will also happily go down fractal rabbit holes and show
| behavior I would consider malicious in a person.
| acedTrex wrote:
| Author here: I quite like that quote. A very succinct way of
| saying what took me a few paragraphs.
|
| This new world of having to verify every single thing at all
| points is quite exhausting and frankly pretty slow.
| Herring wrote:
| So get another LLM to do it. Judging is considerably easier
| [For LLMs] than writing something from scratch, so LLM judges
| will always have that edge in accuracy. Equivalently, I also
| like getting them to write tons of tests to build trust in
| correct behavior.
| acedTrex wrote:
| > Judging is considerably easier than writing something
| from scratch
|
| I don't agree with this at all. Writing new code is
| trivially easy; doing a full in-depth review takes
| significantly more brain power. You have to fully ascertain
| and insert yourself into someone else's thought process.
| That's way more work than utilizing your own thought
| process.
| Herring wrote:
| Sorry, I should have been more specific. I meant LLMs are
| more reliable and accurate at judging than at generating
| from scratch.
|
| They basically achieve over 80% agreement with human
| evaluators [1]. This level of agreement is similar to the
| consensus rate between two human evaluators, making LLM-
| as-a-judge a scalable and reliable proxy for human
| judgment.
|
| [1] https://arxiv.org/abs/2306.05685 (2023)
| habinero wrote:
| 80% is a pretty abysmal success rate and means it's very
| _unreliable_.
|
| It _sounds_ nice but it means at least 1 in 5 are bad.
| That's worse odds than rolling a 1 on a d6. You'll be
| tripping over mistakes constantly.
| malfist wrote:
| LLMs will not have the context behind the lines of code
| in the CR.
|
| Sure, there's no bug in how the logic is defined in the CR,
| or even in the context of the project; it maybe won't throw
| an exception.
|
| But the LLM won't know that the query is iterating over
| an unindexed field in the DB with the table in prod
| having 10s of millions of rows. The LLM won't know that
| even though the code says the button should be red and
| the comments say the button should be red, the corporate
| style guide says red should be a very specific hex code
| that it isn't.
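|
| (A small sketch of the indexing point, using SQLite purely for
| illustration; the table and column names are hypothetical.)
|
|     import sqlite3
|
|     con = sqlite3.connect(":memory:")
|     con.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
|
|     # Without an index, filtering on customer_id is a full table
|     # scan -- harmless in a test DB, painful on a prod table with
|     # tens of millions of rows.
|     q = "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
|     print(con.execute(q).fetchall())  # ... SCAN orders
|
|     con.execute("CREATE INDEX idx_cust ON orders (customer_id)")
|     print(con.execute(q).fetchall())  # ... SEARCH orders USING INDEX ...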
| inetknght wrote:
| > _So get another LLM to do it._
|
| Oh goodness that's like trusting one kid to tell you
| whether or not his friend lied.
|
| In matters where trust _matters_, it's a recipe for
| disaster.
| Herring wrote:
| *shrug this kid is growing up _fast_
|
| Give it another year and HN comments will be very
| different.
|
| Writing tests already works now. It's usually easier to
| read tests than to read convoluted logic.
| catlifeonmars wrote:
| It's also easy to misread tests FWIW.
| inetknght wrote:
| > _shrug this kid is growing up fast_
|
| Mmmhmm. And you think this "growing up" doesn't have
| biases to lie in circumstances where it matters? Consider
| politics. Politics _matter_. It's _inconceivable_ that a
| magic algorithm would _lie_ to us about various political
| concerns, right? Right...?
|
| A magic algorithm lying to us about anything would be
| extremely valuable to _liars_. Do you think it's
| possible that liars are guiding the direction of these
| magic algorithms?
| dingnuts wrote:
| they've been saying that for three years and the
| performance improvement has been asymptotic (logarithmic)
| for a decade, if you've been following the state of the
| art that long.
| habinero wrote:
| Sure, and there was a lot of hype about the blockchain a
| decade ago and how it would take over everything. YC
| funded a ton of blockchain startups.
|
| I notice a distinct lack of blockchain hegemony.
| malfist wrote:
| LLMs inspecting LLM code is like the police investigating
| themselves for wrongdoing.
| skim1420 wrote:
| 0.9 * 0.9 == 0.81
| kbelder wrote:
| 0.1 * 0.1 == 0.01
| tayo42 wrote:
| We do this in professional environments already with
| documentation for designs upfront and code reviews, though.
| JackFr wrote:
| https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref.
| ..
|
| The classic on the subject.
| EGreg wrote:
| "Freedom of speech" in politics
| whiplash451 wrote:
| > "innovation happens at the speed of trust"
|
| You'll have to elaborate on that. How much trust was there in
| electricity, flight and radioactivity when we discovered them?
|
| In science, you build trust as you go.
| agent281 wrote:
| Have you heard of the War of the Currents?
|
| > As the use of AC spread rapidly with other companies
| deploying their own systems, the Edison Electric Light
| Company claimed in early 1888 that high voltages used in an
| alternating current system were hazardous, and that the
| design was inferior to, and infringed on the patents behind,
| their direct current system.
|
| > In the spring of 1888, a media furor arose over electrical
| fatalities caused by pole-mounted high-voltage AC lines,
| attributed to the greed and callousness of the arc lighting
| companies that operated them.
|
| https://en.wikipedia.org/wiki/War_of_the_currents
| bori5 wrote:
| Tesla is barely mentioned in that article which is somewhat
| surprising
| throw4847285 wrote:
| Not surprising at all. He was a minor player in the
| Current Wars compared to his primary benefactor, George
| Westinghouse. His image was rehabilitated first by
| Serbian-Americans and then by webcomic artists and
| Redditors, who turned him into a secular saint.
|
| Most of what people think they know about Tesla is not
| actually true if you examine the historical record. But
| software engineering as a discipline demands business
| villains and craftsman heroes, and so Edison and Tesla
| were warped to fit those roles even though in real life
| there is only evidence of cordial interactions.
| dirkc wrote:
| I use it to mean that the more people trust each other, the
| quicker things get done. Maybe the statement can be rephrased
| as "progress happens at the speed of trust" to avoid the
| specific scientific connotation.
| whiplash451 wrote:
| That's a pretty useless statement in the context of
| innovation.
|
| The moment a technology reaches trust at scale, it becomes
| a non-innovation in people's minds.
|
| Happened for TVs, electrical light in homes, AI for chess,
| and Google. Will happen with LLM-based assistants.
| jazzyjackson wrote:
| You're not catching on. It's not the trust in the
| technology, it's the trust between people. Consider
| business dealings between entities that do not have high
| trust - everything becomes mediated through lawyers and
| nothing happens without a contract. Slow and expensive.
| Handshake deals and promises kept move things along a lot
| faster and without the expense of hammering out legal
| arrangements.
|
| LLM leads to distrust between people. From TFA, _That
| concept is Trust - It underpins everything about how a
| group of engineers function and interact with each other
| in all technical contexts. When you discuss a project
| architecture you are trusting your team has experience
| and viewpoints to back up their assertions._
| perrygeo wrote:
| Importantly, there are many business processes today that
| are already limited by lack of trust. That's not
| necessarily a bad thing either - checks and balances exist
| for a reason. But it does strongly suggest that increasing
| "productivity" by dumping more inputs into the process is
| counter-productive to the throughput of the overall system.
| reaperducer wrote:
| _I use it to mean that the more people trust each other,
| the quicker things get done._
|
| True not only in innovation, but in business settings.
|
| I don't think there's anyone who has worked in any business
| long enough who hasn't had problems getting their job done
| simply because someone else with a key part of the project
| didn't trust that they knew what they were doing.
| reaperducer wrote:
| _How much trust was there in electricity, flight and
| radioactivity when we discovered them?_
|
| Not much.
|
| Plenty of people were against electricity when it started
| becoming common. They were terrified of lamps, doorbells,
| telephones, or anything else with an electric wire. If they
| were compelled to use these things (like for their job) they
| would often wear heavy gloves to protect themselves. It is
| very occasionally mentioned in novels from the late 1800's.
|
| (Edit: If you'd like to see this played out visually, watch
| the early episodes of Miss Fisher's Murder Mysteries on ABC
| [.oz])
|
| There are still people afraid of electricity today. There is
| no shortage of information on the (ironically enough)
| internet about how to shield your home from the harmful
| effects of electrical wires, both in the house and utility
| lines.
|
| Flight? I dunno about back then, but today there's plenty of
| people who are afraid to fly. If you live in Las Vegas for a
| while, you start to notice private train cars occasionally
| parked on the siding near the north outlet mall. These belong
| to celebrities who are afraid to fly, but have to go to Vegas
| for work.
|
| Radioactivity? There was a plethora of radioactive hysteria
| in books, magazines, comics, television, movies, and radio.
| It's not hard to find.
| whiplash451 wrote:
| That's exactly my point
| lubujackson wrote:
| We never can have total trust in LLM output, but we can
| certainly sanitize it and limit its destructive range. Just
| like we sanitize user input and defend with pentests and hide
| secrets in dot files, we will eventually resolve to "best
| practices" and some "SOC-AI compliance" standard down the road.
|
| It's just too useful to ignore, and trust is always built,
| brick by brick. Let's not forget humans are far from reliable
| anyway. Just like with driving cars, I imagine AI producing
| less buggy code (along predefined roads) will soon outpace
| humans. Then it is just blocking and tackling to improve
| complexity.
| bluefirebrand wrote:
| > We never can have total trust in LLM output, but we can
| certainly sanitize it and limit its destructive range
|
| Can we really do this reliably? LLMs are non-deterministic,
| right, so how do we validate the output in a deterministic
| way?
|
| We can validate things like shape of data being returned, but
| how do we validate correctness without an independent human
| in the loop to verify?
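|
| (A minimal sketch of what a deterministic "shape" check can and
| cannot tell you; the field names are hypothetical.)
|
|     # Verifies structure and types of an LLM-produced record,
|     # but says nothing about whether the values are correct.
|     EXPECTED = {"user_id": int, "email": str, "active": bool}
|
|     def has_expected_shape(record: dict) -> bool:
|         return set(record) == set(EXPECTED) and all(
|             isinstance(record[k], t) for k, t in EXPECTED.items()
|         )
|
|     ok = {"user_id": 1, "email": "a@b.co", "active": True}
|     print(has_expected_shape(ok))                # True
|     print(has_expected_shape({"user_id": "1"}))  # False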
| lovich wrote:
| The same way we did it with humans in the loop?
|
| I check AI output for hallucinations and issues as I don't
| fully trust it to work, but we also do PRs with humans to
| have another set of eyes check because humans also make
| mistakes.
|
| For the soft sciences and arts I'm not sure how to validate
| anything from AI but for software and hard sciences I don't
| see why test suites wouldn't continue serving their same
| purpose.
| aDyslecticCrow wrote:
| Famously, "it's easier to write code than to read it".
| That goes for humans. So why did we automate the easy
| part and move the effort over to the hard part?
|
| If we need a human in the loop to check every line of code
| for deep logic errors... then we could just get the
| human to write it, no?
| geor9e wrote:
| They changed the headline to "Yes, I will judge you for using
| AI..." so I feel like I got the whole story already.
| satisfice wrote:
| LLMs make bad work-- of any kind-- look like plausibly good work.
| That's why it is rational to automatically discount the products
| of anyone who has used AI.
|
| I once had a member of my extended family who turned out to be a
| con artist. After she was caught, I cut off contact, saying I
| didn't know her. She said "I am the same person you've known for
| ten years." And I replied "I suppose so. And now I realized I
| have never known who that is, and that I never can know."
|
| We all assume the people in our lives are not actively trying to
| hurt us. When that trust breaks, it breaks hard.
|
| No one who uses AI can claim "this is my work." I don't know that
| it is your work.
|
| No one who uses AI can claim that it is good work, unless they
| thoroughly understand it, which they probably don't.
|
| A great many students of mine have claimed to have read and
| understood articles I have written, yet I discovered they didn't.
| What if I were AI and they received my work and put their name on
| it as author? They'd be unable to explain, defend, or follow up
| on anything.
|
| This kind of problem is not new to AI. But it has become ten
| times worse.
| bobjordan wrote:
| I see where you're coming from, and I appreciate your
| perspective. The "con artist" analogy is plausible, for the
| fear of inauthenticity this technology creates. However, I'd
| like to offer a different view from someone who has been deep
| in the trenches of full-stack software development.
|
| I'm someone who put in my "+10,000 hours" programming complex
| applications, before useful LLMs were released. I spent years
| diving into documentation and other people's source code every
| night, completely focused on full-stack mastery. Eventually,
| that commitment led to severe burnout. My health was bad, my
| marriage was suffering. I released my application and then I
| immediately had to walk away from it for three years just to
| recover. I was convinced I'd never pick it up again.
|
| It was hearing many reports that LLMs had gotten good at code
| that cautiously brought me back to my computer. That's where my
| experience diverges so strongly from your concerns. You say,
| "No one who uses AI can claim 'this is my work.'" I have to
| disagree. When I use an LLM, I am the architect and the final
| inspector. I direct the vision, design the system, and use a
| diff tool to review every single line of code it produces. Just
| recently, I used it as a partner to build a complex
| optimization model for my business's quote engine. Using a true
| optimization model was always the "right" way to do it but
| would have taken me months of grueling work before, learning
| all details of the library, reading other people's code, etc.
| We got it done in a week. Do I feel like it's my work?
| Absolutely. I just had a tireless and brilliant, if sometimes
| flawed, assistant.
|
| You also claim the user won't "thoroughly understand it." I've
| found the opposite. To use an LLM effectively for anything non-
| trivial, you need a deeper understanding of the fundamentals to
| guide it and to catch its frequent, subtle mistakes. Without my
| years of experience, I would be unable to steer it for complex
| multi-module development, debug its output, or know that the
| "plausibly good work" it produced was actually wrong in some
| ways (like N+1 problems).
|
| I can sympathize with your experience as a teacher. The problem
| of students using these tools to fake comprehension is real and
| difficult. In academia, the process of learning, getting some
| real fraction of the +10,000hrs is the goal. But in the
| professional world, the result is the goal, and this is a new,
| powerful tool to achieve better results. I'm not sure how a
| teacher should instruct students in this new reality, but
| demonizing LLM use is probably not the best approach.
|
| For me, it didn't make bad work look good. It made great work
| possible again, all while allowing me to have my life back. It
| brought the joy back to my software development craft without
| killing me or my family to do it. My life is a lot more
| balanced now and for that, I'm thankful.
| satisfice wrote:
| Here's the problem, friend: I also have put in my 10,000
| hours. I've been coding as part of my job since 1983. I
| switched to testing from production coding in 1987, but I ran
| a team that tested developer tools, at Apple and Borland, for
| eight years. I've been living and breathing testing for
| decades as a consultant and expert witness.
|
| I do not lightly say that I don't trust the work of someone
| who uses AI. I'm required to practice with LLMs as part of my
| job. I've developed things with the help of AI. Small things,
| because the amount of vigilance necessary to do big things is
| prohibitive.
|
| Fools rush in, they say. I'm not a fool, and I'm not claiming
| that you are either. What I know is that there is a huge
| burden of proof on the shoulders of people who claim that AI
| is NOT problematic-- given the substantial evidence that it
| behaves recklessly. This burden is not satisfied by people
| who say "well, I'm experienced and I trust it."
| bobjordan wrote:
| Thank you for sharing your deep experience. It's a valid
| perspective, especially from an expert in the world of
| testing.
|
| You're right to call out the need for vigilance and to
| place the burden of proof on those of us who advocate for
| this tool. That burden is not met by simply trusting the
| AI, you're right, that would be foolish. The burden is met
| by changing our craft to incorporate the necessary
| oversight to not be reckless in our use of this new tool.
|
| Coming from the manufacturing world, I think of it like the
| transition in metalwork industry from hand tools to
| advanced CNC machines and robotics. A master craftsman with
| a set of metal working files has total, intimate control.
| When a CNC machine is introduced, it brings incredible
| speed and capability, but also a new kind of danger. It has
| no judgment. It will execute a flawed design with perfect,
| precision.
|
| An amateur using the CNC machine will trust it blindly and
| create "plausibly good" work that doesn't meet the
| specifications. A master, however, learns a new set of
| skills: CAD design, calibrating the machine, and, most
| importantly, inspecting the output. Their vigilance is what
| turns reckless use of a new tool into an asset that allows
| them to create things they couldn't before. They don't
| trust the tool, they trust their process for using it.
|
| My experience with LLM use has been the same. The
| "vigilance" I practice is my new craft. I spend less time
| on the manual labor of coding and more time on
| architecture, design, and critical review. That's the only
| way to manage the risks.
|
| So I agree with your premise, with one key distinction: I
| don't believe tools themselves can be reckless, only their
| users can. Ultimately, like any powerful tool, its value is
| unlocked not by the tool itself, but by the disciplined,
| expert process used to control it.
| HardCodedBias wrote:
| All of this fighting against LLMs is pissing in the wind.
|
| It seems that LLMs, as they work today, make developers more
| productive. It is possible that they benefit less experienced
| developers even more than experienced developers.
|
| More productivity, and perhaps very large multiples of
| productivity, will not be abandoned due to roadblocks constructed
| by those who oppose the technology for one reason or another.
|
| Examples of the new productivity tool causing enormous harm (eg: a
| bug that brings down some large service for a considerable amount
| of time) will not stop the technology if it is delivering
| considerable productivity.
|
| Working with the technology and mitigating its weaknesses is the
| only rational path forward. And those mitigations can't be a set
| of rules that completely strip the new technology of its
| productivity gains. The mitigations have to work with the
| technology to increase its adoption or they will be worked
| around.
| ge96 wrote:
| It is funny (ego): I remember when React was new and I refused
| to learn it; had I learned it earlier I probably would have
| entered the market years earlier.
|
| Even now I have this refusal to use GPT, whereas my coworkers
| lately have been saying "ChatGPT says" or "this code was created
| by ChatGPT", idk. For me, I take pride in writing code myself/not
| using GPT, but I also still use Google/StackOverflow, which you
| could say is a slower version of GPT.
| anthonypasq wrote:
| This mindset does not work in software. My dad would still be
| programming with punch cards if he thought this way. Instead
| he's using Copilot daily, writing microservices, and isn't some
| annoying dinosaur.
| ge96 wrote:
| Yeah, it's pro/con. I also hear my coworkers saying "I don't
| know how it works", or that there are methods in the code that
| don't exist.
|
| But anyway, I'm at the point in my career where I am not
| learning to code/can already do it. Sure, languages are
| new/it can help there with syntax.
|
| edit: another thing I'll add, I can see the throughput thing.
| It's like when a person has never used OpenSearch before and
| it's a rabbit hole; with anything new there's that wall you
| have to overcome. But it's like: we'll get the feature done,
| but did we really understand how it works... do we need to?
| Idk. I know this person can barely code, but because they
| use something like ChatGPT they're able to crap out walls
| of code, and with tweaking it will work eventually -- I am
| aware this sounds like gatekeeping on my part.
|
| Ultimately, personally, I don't want to do software
| professionally; I'm trying to save/invest enough, then get out,
| just because the job part sucks the fun out of development.
| I've been in it for about 10 years now, which should have been
| plenty of time to save, but I'm dumb/too generous.
|
| I think there is healthy skepticism too, vs. just jumping on
| the bandwagon like everyone else, and really my problem is
| just that I'm insecure/indecisive. I don't need everyone to
| accept me, especially if I don't need money.
|
| Last rant: I will be experimenting with agentic stuff, as I
| do like Jarvis; I'd make my own voice rec model that runs
| locally.
| mjr00 wrote:
| > It seems that LLMs, as they work today, make developers more
| productive.
|
| Think this strongly depends on the developer and what they're
| attempting to accomplish.
|
| In my experience, most people who swear LLMs make them 10x more
| productive are relatively junior front-end developers or serial
| startup devs who are constantly greenfielding new apps. These
| are totally valid use cases, to be clear, but it means a junior
| front-end dev and a senior embedded C dev tend to talk past
| each other when they're discussing AI productivity gains.
|
| > Working with the technology and mitigating it's weaknesses is
| the only rational path forward.
|
| Or just using it more sensibly. As an example: is the idea of
| an AI "agent" even a good one? The recent incident with
| Copilot[0] made MS and AI look like a laughingstock. It's
| possible that trying to let AI autonomously do work just isn't
| very smart.
|
| As a recent analogy, we can look at blockchain and
| cryptocurrency. Love it or hate it, it's clear from the success
| of Coinbase and others that blockchain has found some real, if
| niche, use cases. But during peak crypto hype, you had people
| saying stuff like "we're going to track the coffee bean supply
| chain using blockchain". In 2025 that sounds like an
| exaggerated joke from Twitter, but in 2020 it was IBM
| legitimately trying to sell this stuff[1].
|
| It's possible we'll look back and see AI agents, or other
| current applications of generative AI, as the coffee blockchain
| of this bubble.
|
| [0]
| https://www.reddit.com/r/ExperiencedDevs/comments/1krttqo/my...
|
| [1]
| https://www.forbes.com/sites/robertanzalone/2020/07/15/big-c...
| parineum wrote:
| > In my experience, most people who swear LLMs make them 10x
| more productive are relatively junior front-end developers or
| serial startup devs who are constantly greenfielding new
| apps. These are totally valid use cases, to be clear, but it
| means a junior front-end dev and a senior embedded C dev tend
| to talk past each other when they're discussing AI
| productivity gains.
|
| I agree with this quite a lot. I also think that those
| greenfield apps quickly become unmanageable by AI as you need
| to start applying solutions that are unique/tailored for your
| objective or you want to start abstracting some functionality
| into building components and base classes that the AI hasn't
| seen before.
|
| I find AI very useful to get me from beginner to
| intermediate in codebases and domains that I'm not familiar
| with but, once I get the familiarity, the next steps I take
| mostly without AI because I want to do novel things it's
| never seen before.
| scelerat wrote:
| I didn't see the post as pissing into the wind so much as
| calling out several caveats of coding with LLMs, especially on
| teams, and ideas on how to mitigate them.
| conartist6 wrote:
| And here it is again. "More productive"
|
| But this doesn't mean that the model/human combo is more
| effective at serving the needs of users! It means "producing
| more code."
|
| There are no LLMs shipping changesets that delete 2000 lines of
| code -- that's how you know "making engineers more productive"
| is a way of talking about how much code is being created...
| eikenberry wrote:
| My wife's company recently hired some contractors and they
| were touting their productivity with AI by saying how it
| allowed them (one person) to write 150k lines of code in 3
| weeks. They said this without sarcasm. It was funny and scary
| at the same time that anyone might buy this as a good
| outcome. Classic lines-of-code metric rearing its ugly head
| again.
| FuckButtons wrote:
| I think you're arguing against something the author didn't
| actually say.
|
| You seem to be claiming that this is a binary, either we will
| or won't use llms, but the author is mostly talking about risk
| mitigation.
|
| By analogy it seems like you're saying the author is
| fundamentally against the development of the motor car because
| they've pointed out that some have exploded whereas before, we
| had horses which didn't explode, and maybe we should work on
| making them explode less before we fire up the glue factories.
| observationist wrote:
| There's no reason to think AI will stop improving, and the rate
| of improvement is increasing as well, and no reason to think that
| these tools won't vastly outperform us in the very near future.
| Putting aside AGI and ASI, simply improving the frameworks of
| instructions and context, breaking down problems into smaller
| problems, and improving the methodology of tools will result in
| a multiplication of quality.
|
| Making these sorts of blanket assessments of AI, as if it were a
| singular, static phenomenon, is bad thinking. You can say things
| like "AI Code bad!" about a particular model, or a particular
| model used in a particular context, and make sense. You cannot
| make generalized statements about LLMs as if they are uniform in
| their flaws and failure modes.
|
| They're as bad now as they're ever going to be again, and they're
| getting better faster, at a rate outpacing the expectations and
| predictions of all the experts.
|
| The best experts in the world, working on these systems, have a
| nearly universal sentiment of "holy shit" when working on and
| building better AI - we should probably pay attention to what
| they're seeing and saying.
|
| There's a huge swathe of performance gains to be made in fixing
| awful human code. There's a ton of low hanging fruit to be gotten
| by doing repetitive and tedious stuff humans won't or can't do.
| Those two things mean at least 20 or more years of impressive
| utility from AI code can be had.
|
| Things are just going to get faster, and weirder, and weirder
| faster.
| christhecaribou wrote:
| Sure, if we all collectively ignore model collapse.
| ayakaneko wrote:
| I think that, yes, sure, there's no reason to think AI will stop
| improving.
|
| But I think that everyone is losing trust not because LLMs have
| no potential to write good code; it's about trust in the user
| who uses LLMs to uncontrollably generate those patches without
| any knowledge, fact checking, or verification. (Many of them may
| not even know how to test it.)
|
| In other words, while an LLM is potentially capable of being a
| good SWE, the human behind it right now is spamming, doing
| nonsense work, and leaving the unpaid open source maintainers
| to review it and give feedback (most of the time, manually).
| klabb3 wrote:
| > There's no reason to think AI will stop improving
|
| No, and there's no reason to think cars will stop improving
| either, but that doesn't mean they will start flying.
|
| The first error is in thinking that AI is converging towards a
| human brain. To treat this as a null hypothesis is incongruent
| both wrt the functional differences between the two and
| crucially empirical observations of the current trajectory of
| LLMs. We have seen rapid increases in ability, yes, but those
| abilities are very asymmetrical by domain. Pattern matching and
| shitposting? Absolutely crushing humans already. Novel
| conceptual ideas and consistency checked reasoning? Not so
| much, eg all that hype around PhD-level novel math problems
| died down as quickly as it had been manufactured. _If they
| were_ converging on human brain function, why are the increases
| in ability so vastly uneven?
|
| The second error is to assume a superlinear ability improvement
| when the data has more or less run out and has to be slowly
| replenished over time, while avoiding the AI pollution in
| public sources. It's like assuming oil will accelerate if it
| had run out and we needed to wait for more bio-matter to
| decompose for every new drop of crude. Can we improve engine
| design and make ICEs more efficient? Yes, but it's a
| diminishing returns game. The scaling hypothesis was not
| exponential but sigmoid, which is in line with most paradigm
| shifts and novel discoveries.
|
| > Making these sort of blanket assessments of AI, as if it were
| a singular, static phenomena is bad thinking.
|
| I agree, but do you agree with yourself here? Ie:
|
| > no reason to think that these tools won't vastly outperform
| us in the very near future
|
| .. so back to single axis again? How is this different from
| saying calculators outperform humans?
| jrflowers wrote:
| > shitposting? Absolutely crushing humans already
|
| Where can I go to see language model shitposters that are
| better than human shitposters?
| klabb3 wrote:
| On LinkedIn.
|
| (Always remember to use eye protection.)
| jrflowers wrote:
| Oh. Well that's not super surprising. LinkedIn has the
| absolute worst posters. That's like saying robots can
| dance better than humans but the humans in question only
| know how to do The Robot and forgot most of the steps
| I_Lorem wrote:
| He's making a good point on trust, but, really, doesn't the trust
| flow in both directions? Should the Sr. Engineer rubber-stamp or
| just take a quick glance at Bob's implementation because he's
| earned his chops, or should the Sr. Engineer apply the same level
| of review regardless of whether it's Bob, Mary, or Rando
| Calrissian submitting their work for review?
| eikenberry wrote:
| The Sr. Engineer should definitely give Bob's code (Bob being,
| presumably, another Sr. Eng.) a quick review and approve it. If
| Mary or
| Rando are Sr. then they should get the same level as well. If
| anyone is a Jr. they should get a much more in-depth review as
| it's a teaching opportunity, whereas Sr. on Sr. reviews are
| done to enforce conventions and to be sure the PR has an
| audience (people take more care when they know other people
| will look at it).
| macawfish wrote:
| I bumped into this at work but not in the way you might expect.
| My colleague and I were under some pressure to show progress and
| decided to rush merging a pretty significant refactor I'd been
| working on. It was a draft PR but we merged it for momentum's
| sake. The next week some bugs popped up in an untested area of
| the code.
|
| As we were debugging, my colleague revealed his assumption that
| I'd used AI to write it, and expressed frustration at trying to
| understand something AI generated after the fact.
|
| But I hadn't used AI for this. Sure, yes I do use AI to write
| code. But this code I'd written by hand and with careful
| deliberate thought to the overall design. The bugs didn't stem
| from some fundamental flaw in the refactor, they were little
| oversights in adjusting existing code to a modified API.
|
| This actually ended up being a trust-building experience overall
| because my colleague and I got to talk about the tension
| explicitly. It ended up being a pretty gentle encounter with the
| power of what's happening right now. In hindsight I'm glad it
| worked out this way, I could imagine in a different work
| environment, something like this could have been more messy.
|
| Be careful out there.
| throwawayoldie wrote:
| IMHO, s/may/has/
| benreesman wrote:
| I'm currently standing up a C++ capability in an org that hasn't
| historically had one, so things like the style guide and examples
| folder require a lot of care to give a good start for new
| contributors.
|
| I have instructions for agents that are different in some details
| of convention, e.g. human contributors use AAA allocation style,
| agents are instructed to use type first. I convert code that
| "graduates" from agent product to review-ready as I review agent
| output, which keeps me honest that I don't myself submit code
| without scrutiny to the review of other humans: they are able to
| prompt an LLM without my involvement, and I'm able to ship LLM
| slop without making a demand on their time. It's an honor system,
| but a useful one if everyone acts in good faith.
|
| I get use from the agents, but I almost always make changes and
| reconcile contradictions.
| heisenbit wrote:
| > The reality is that LLMs enable an inexperienced engineer to
| punch far above their proverbial weight class. That is to say, it
| allows them to work with concepts immediately that might have
| taken days, months or even years otherwise to get to that level
| of output.
|
| At the moment LLMs allow me to punch far above my weight class in
| Python, where I'm doing a short-term job. But then I know all the
| concepts from decades dabbling in other ecosystems. Let's all
| admit there is a huge amount of accidental complexity (h/t
| Brooks's "No Silver Bullet") in our world. For better or worse there
| are skill silos that are now breaking down.
| mensetmanusman wrote:
| All this means is that the QC is going to be 10x more important.
| wg0 wrote:
| We have seen those 10x engineers churning out PRs and huge PRs
| before anyone can fathom and make sense of the whole damn thing.
|
| Wondering what they would be producing with LLMs?
| lawlessone wrote:
| One trust-breaking issue is we still can't know why the LLM makes
| specific choices.
|
| Sure we can ask it why it did something but any reason it gives
| is just something generated to sound plausible.
___________________________________________________________________
(page generated 2025-06-26 23:01 UTC)