[HN Gopher] LLM code generation may lead to an erosion of trust
       ___________________________________________________________________
        
       LLM code generation may lead to an erosion of trust
        
       Author : CoffeeOnWrite
       Score  : 206 points
       Date   : 2025-06-26 06:07 UTC (16 hours ago)
        
 (HTM) web link (jaysthoughts.com)
 (TXT) w3m dump (jaysthoughts.com)
        
       | gblargg wrote:
       | https://archive.is/5I9sB
       | 
       | (Works on older browsers and doesn't require JavaScript except to
       | get past CloudSnare).
        
       | cheriot wrote:
       | > promises that the contributed code is not the product of an LLM
       | but rather original and understood completely.
       | 
       | > require them to be majority hand written.
       | 
        | We should specify the outcome, not the process. Expecting the
       | contributor to understand the patch is a good idea.
       | 
       | > Juniors may be encouraged/required to elide LLM-assisted
       | tooling for a period of time during their onboarding.
       | 
       | This is a terrible idea. Onboarding is a lot of random
       | environment setup hitches that LLMs are often really good at.
       | It's also getting up to speed on code and docs and I've got some
       | great text search/summarizing tools to share.
        
         | bluefirebrand wrote:
         | > Onboarding is a lot of random environment setup hitches
         | 
         | Learning how to navigate these hitches is a _really important
         | process_
         | 
          | If we streamline every bit of difficulty or complexity out of
          | our lives, it seems trivially obvious that we will soon have no
          | idea what to do when we encounter difficulty or complexity. Or
          | is that just me?
        
           | kmoser wrote:
           | There will always be people who know how to handle the
           | complexity we're trying to automate away. If I can't figure
           | out some arcane tax law when filling out my taxes, I ask my
           | accountant, as it's literally their job to know these things.
        
             | bluefirebrand wrote:
             | > There will always be people who know how to handle the
             | complexity we're trying to automate away
             | 
             | This is not a given!
             | 
             | If we automated all accounting, why would anyone still take
             | the time to learn to become an accountant?
             | 
             | Yes, there are sometimes people who are just invested in
             | learning traditional stuff for the sake of it, but is that
             | really what we want to rely on as the fallback when AI
             | fails?
        
           | RunningDroid wrote:
            | > Onboarding is a lot of random environment setup hitches
            | 
            | > Learning how to navigate these hitches is a _really
            | important process_
           | 
           | To add to this, a barrier to contribution can reduce low
           | quality/spam contributions. The downside is that a barrier to
           | contribution that's too high reduces all contributions.
        
       | namenotrequired wrote:
       | > LLMs ... approximate correctness for varying amounts of time.
       | Once that time runs out there is a sharp drop off in model
       | accuracy, it simply cannot continue to offer you an output that
       | even approximates something workable. I have taken to calling
       | this phenomenon the "AI Cliff," as it is very sharp and very
       | sudden
       | 
       | I've never heard of this cliff before. Has anyone else
       | experienced this?
        
         | sandspar wrote:
         | I'm not sure. Is he talking about context poisoning?
        
         | Kuinox wrote:
         | I'm doing my own procedurally generated benchmark.
         | 
          | I can make the problem input as big as I want.
          | 
          | Each LLM has a different threshold for each problem; once it's
          | crossed, the LLM's performance collapses.
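
          A minimal sketch of the kind of scalable benchmark being
          described (purely illustrative: make_problem, accuracy_at_size,
          and the ask_llm callable are hypothetical stand-ins, not
          Kuinox's actual harness):

          import random

          def make_problem(size: int, seed: int = 0) -> tuple[str, int]:
              """Generate a prompt whose difficulty scales with `size`.

              The toy task is summing a list of integers; any task with a
              tunable input size and a checkable answer would do.
              """
              rng = random.Random(seed)
              numbers = [rng.randint(-999, 999) for _ in range(size)]
              prompt = "Compute the sum of: " + ", ".join(map(str, numbers))
              return prompt, sum(numbers)

          def accuracy_at_size(ask_llm, size: int, trials: int = 20) -> float:
              """Fraction of trials the model answers correctly at this size."""
              correct = 0
              for seed in range(trials):
                  prompt, expected = make_problem(size, seed)
                  reply = ask_llm(prompt)  # ask_llm: whatever client is in use
                  correct += reply.strip() == str(expected)
              return correct / trials

          # Sweeping `size` upward and watching accuracy_at_size() is how
          # you would locate the threshold where a given model collapses.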
        
         | Paradigma11 wrote:
          | If the context gets too big or otherwise poisoned, you have to
          | restart the chat/agent. A bit like Windows of old. This trains
         | you to document the current state of your work so the new agent
         | can get up to speed.
        
         | bubblyworld wrote:
         | I've only experienced this while vibe coding through chat
         | interfaces, i.e. in the complete absence of feedback loops.
         | This is _much_ less of a problem with agentic tools like claude
          | code/codex/gemini cli, where they manage their own context
         | windows and can run your dev tooling to sanity check themselves
         | as they go.
        
         | Syzygies wrote:
         | One can find opinions that Claude Code Opus 4 is worth the
         | monthly $200 I pay for Anthropic's Max plan. Opus 4 is smarter;
         | one either can't afford to use it, or can't afford not to use
         | it. I'm in the latter group.
         | 
         | One feature others have noted is that the Opus 4 context buffer
         | rarely "wears out" in a work session. It can, and one needs to
         | recognize this and start over. With other agents, it was my
         | routine experience that I'd be lucky to get an hour before
         | having to restart my agent. A reliable way to induce this
         | "cliff" is to let AI take on a much too hard problem in one
          | step, then watch it flail helplessly trying to fix its own mess:
          | vibe-coding an unsuitable problem. One can even kill Opus 4 this
          | way, but that's no way to run a racehorse.
         | 
         | Some "persistence of memory" harness is as important as one's
          | testing harness for effective AI coding. With the right care,
          | having the AI edit its own context prompts to orient new
          | sessions, this all matters less. AI is spectacularly bad at
         | breaking problems into small steps without our guidance, and
         | small steps done right can be different sessions. I'll
         | regularly start new sessions when I have a hunch that this will
         | get me better focus for the next step. So the cliff isn't so
         | important. But Opus 4 is smarter in other ways.
        
           | suddenlybananas wrote:
           | >can't afford not to use it. I'm in the latter group.
           | 
           | People love to justify big expenses as necessary.
        
             | Syzygies wrote:
             | $200 is a small expense and you don't know why I need AI.
             | 
             | The online dialog about AI is mostly noise, and even at HN
             | it is badly distorted by people who wince at $20 a month,
             | and complain AI isn't that smart.
        
           | fwip wrote:
           | Sometimes after it flails for a while, but I think it's on
           | the right path, I'll rewind the context to just before it
           | started trying to solve the problem (but keep the code
           | changes). And I'll tell it "I got this other guy to attempt
           | what we just talked about, but it still has some problems."
           | 
           | Snipping out the flailing in this way seems to help.
        
         | gwd wrote:
         | I experience it pretty regularly -- once the complexity of the
         | code passes a certain threshold, the LLM can't keep everything
         | in its head and starts thrashing around. Part of my job working
         | with the LLM is to manage the complexity it sees.
         | 
          | And one of the things with the current generation is that they
          | tend to make things more complex over time, rather than less.
          | It's always _me_ prompting the LLM to refactor things to make it
          | simpler, or doing the refactoring once it's gotten too complex
         | for the LLM to deal with.
         | 
         | So at least with the current generation of LLMs, it seems
         | rather inevitable that if you just "give LLMs their head" and
         | let them do what they want, eventually they'll create a giant
         | Rube Goldberg mess that you'll have to try to clean up.
         | 
         | ETA: And to the point of the article -- if you're an old salt,
         | you'll be able to recognize when the LLM is taking you out to
         | sea early, and be able to navigate your way back into shallower
         | waters even if you go out a bit too far. If you're a new hand,
         | you'll be out of your depth and lost at sea before you know
         | it's happened.
        
         | windward wrote:
         | I've seen it referred to as 'context drunk'.
         | 
         | Imagine that you have your input to the context, 10000 tokens
         | that are 99% correct. Each time the LLM replies it adds 1000
         | tokens that are 90% correct.
         | 
         | After some back-and-forth of you correcting the LLM, its
         | context window is mostly its own backwash^Woutput. Worse, the
         | error compounds because the 90% that is correct is just correct
         | extrapolation of an argument about incorrect code, and because
         | the LLM ranks more recent tokens as more important.
         | 
         | The same problem also shows up in prose.
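
          A back-of-the-envelope version of that compounding, using the
          99%/90% figures above (just the arithmetic; it ignores the
          recency weighting the comment mentions):

          total_tokens = 10_000           # initial prompt
          correct_tokens = 10_000 * 0.99  # 99% of it is correct

          for turn in range(1, 11):
              # Each 1,000-token reply is ~90% correct *relative to* the
              # context it was conditioned on, so errors compound.
              reply_accuracy = 0.90 * (correct_tokens / total_tokens)
              total_tokens += 1_000
              correct_tokens += 1_000 * reply_accuracy
              print(f"after reply {turn:2d}: "
                    f"{correct_tokens / total_tokens:.1%} of context correct")

          Even with these generous numbers the correct share only ratchets
          downward; lower per-reply accuracy or a longer session makes the
          slide steeper.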
        
         | Workaccount2 wrote:
         | I call it context rot. As the context fills up the quality of
         | output erodes with it. The rot gets even worse or progresses
         | faster the more spurious or tangential discussion is in
         | context.
         | 
          | This can also be made much worse by thinking models, as
          | their CoT is all in context, and if their thoughts really
          | wander it just plants seeds of poison that feed the rot. I
          | really wish they would implement some form of context pruning,
          | so you could nip irrelevant context in the bud as it forms.
          | 
          | In the meantime, I make summaries and carry them to a fresh
          | instance when I notice the rot forming.
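
          That "summarize, then start fresh" step is easy to script; a
          hedged sketch of the pattern (ask_llm again stands in for
          whatever client or agent API is actually in use):

          def rollover(ask_llm, transcript: list[str]) -> list[str]:
              """Distill a rotting transcript into a seed for a fresh session."""
              summary = ask_llm(
                  "Summarize the goal, key decisions, constraints, and current "
                  "state of the work below so a new session can continue it:\n\n"
                  + "\n".join(transcript)
              )
              # The new context starts from the distilled summary instead of
              # the accumulated back-and-forth and its tangents.
              return ["Context carried over from a previous session:\n" + summary]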
        
         | lubujackson wrote:
         | I definitely hit this vibe coding a large-ish backend. Well
         | defined data structures, good modularity, etc. But at a point,
         | Cursor started to lose the plot and rewrite or duplicate
          | functions, recreate or misuse data structures, etc.
         | 
          | The fix was to define several Cursor rules files for
          | different views of the codebase - here's the structure, here's
          | the validation logic, etc. That, plus using o3, has at least
          | gotten me to the next level.
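
          Cursor rules files are plain instruction files that the editor
          feeds to the model alongside your prompt (the exact file layout
          varies by Cursor version). The content below is a hypothetical
          example of the kind of "here's the structure" rule being
          described, not the commenter's actual rules:

          You are working in a modular backend. Key structure:
            - api/         HTTP handlers only; no business logic here.
            - services/    business logic, one service per domain object.
            - models/      data structures; treat them as the source of truth.
            - validation/  all input validation; never inline it in handlers.
          Before writing code, check whether a function or data structure
          already exists in models/ or services/. Do not duplicate or
          redefine them.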
        
         | impure wrote:
         | This sounds a lot like accuracy collapse as discussed in that
         | Apple paper. That paper clearly showed that there is some point
         | where AI accuracy collapses extremely quickly.
         | 
         | I suspect it has something more to do with the model producing
         | too many tokens and becoming fixated on what it said before.
         | You'll often see this in long conversations. The only way to
         | fix it is to start a new conversation.
        
         | npteljes wrote:
         | I reset "work" AI sessions quite frequently, so I didn't see
         | that there. I experienced it though with storytelling. In my
         | storytelling scenario, context and length was important. And
         | the AI at one late point forgot how my characters should behave
         | in the developing situation, and just had them react to it in a
         | very different way. And there was no going back from that. Very
         | weird experience.
        
       | beau_g wrote:
       | The article opens with a statement saying the author isn't going
       | to reword what others are writing, but the article reads as that
       | and only that.
       | 
       | That said, I do think it would be nice for people to note in pull
       | requests which files have AI gen code in the diff. It's still a
        | good idea to look at LLM-generated code vs human code with a
        | slightly different lens; the mistakes each make are often a bit
        | different
       | in flavor, and it would save time for me in a review to know
       | which is which. Has anyone seen this at a larger org and is it of
        | value to you as a reviewer? Maybe some toolsets can already do
        | this automatically (I suppose all these companies that report the
        | % of code that is LLM generated must have one, if they actually
        | have such granular metrics?)
        
         | acedTrex wrote:
         | Author here:
         | 
         | > The article opens with a statement saying the author isn't
         | going to reword what others are writing, but the article reads
         | as that and only that.
         | 
         | Hmm, I was just saying I hadn't seen much literature or
         | discussion on trust dynamics in teams with LLMs. Maybe I'm just
         | in the wrong spaces for such discussions but I haven't really
         | come across it.
        
       | DyslexicAtheist wrote:
        | It's really hard (though not impossible) to use AI to produce
        | meaningful offensive security work to improve defense, due to
        | there being way too many guardrails.
       | 
        | Real nation-state threat actors, on the other hand, would face no
        | such limitations.
       | 
       | On a more general level, what concerns me isn't whether people
       | use it to get utility out of it (that would be silly), but the
        | power imbalance in the hands of a few, and with new people pouring
        | their questions into it, this divide gets wider. But it's not
       | just the people using AI directly but also every post online that
       | eventually gets used for training. So to be against it would mean
       | to stop producing digital content.
        
       | davidthewatson wrote:
       | Well said. The death of trust in software is a well worn path
       | from the money that funds and founds it to the design and
       | engineering that builds it - at least the 2 guys-in-a-garage
       | startup work I was involved in for decades. HITL is key. Even
        | with a human in the loop, you wind up at Therac-25. That's
       | exactly where hybrid closed loop insulin pumps are right now.
       | Autonomy and insulin don't mix well. If there weren't a moat of
       | attorneys keeping the signal/noise ratio down, we'd already
       | realize that at scale - like the PR team at 3 letter technical
       | universities designed to protect parents from the exploding
       | pressure inside the halls there.
        
       | tomhow wrote:
       | [Stub for offtopicness, including but not limited to comments
       | replying to original title rather than article's content]
        
         | sandspar wrote:
         | It's interesting that AI proponents say stuff like, "Humans
         | will remain interested in other humans, even after AI can do
         | all our jobs." It really does seem to be true. Here for example
         | we have a guy who's using AI to make a status-seeking statement
         | i.e. "I'm playing a strong supporting role on the 'anti-AI
         | thinkers' team therefore I'm high status". Like, humans have an
         | amazing ability to repurpose anything into status markers. Even
         | AI. I think that if AI replaces all of our actual jobs then
         | we'll still spend our time doing status jobs. In a way this guy
         | is living in the future even more than most AI users.
        
           | michelsedgh wrote:
            | For now, yes, because humans are doing most jobs better
            | than AI. In 10 years' time, if the AIs are doing a better
            | job, people like the author will need to learn all the ropes
            | if they want to catch up. I don't think LLMs will destroy all
            | jobs; I think the professionals who learn them and use them
            | properly will outdo people who don't use these tools just for
            | the sake of saying "I'm high status, I don't use these
            | tools."
        
             | nextlevelwizard wrote:
             | If AI will do better job than humans what ropes are there
             | to learn? You just feed in the requirements and AI poops
             | out products.
             | 
              | It often gets brought up that if you don't use LLMs now to
              | produce so-so code, you will somehow magically fall
              | completely behind when LLMs all of a sudden start making
              | perfect code, as if developers haven't been constantly
              | learning new tools as the field has evolved. Yes, I use old
             | technology, but also yes I try new technology and pick and
             | choose what works for me and what does not. Just because
             | LLMs don't have a good place in my work flow does not mean
             | I am not using them at all or that I haven't tried to use
             | them.
        
               | michelsedgh wrote:
               | Good on you. You are using it and trying to keep up. Keep
               | doing that and try to push what you can do with it. I
               | love to hear that!
        
         | lynx97 wrote:
         | No worries, I also judge you for relying on JavaScript for your
         | "simple blog".
        
           | gblargg wrote:
           | Doesn't even work on older browsers either.
        
           | rvnx wrote:
            | Claude said to use Markdown, a text file, or HTML with
            | minimal CSS. So the author does not know how to prompt.
            | 
            | The blog itself is using Alpine JS, a human-written framework
            | from 6 years ago (https://github.com/alpinejs/alpine), and
            | you can see the result is not good.
        
           | mnmalst wrote:
            | Ha, I came here to make the same comment.
            | 
            | Two completely unnecessary requests to: jsdelivr.net and
           | net.cdn.cloudflare.net
        
           | acedTrex wrote:
           | I wrote it while playing with alpine.js for fun just messing
           | around with stuff.
           | 
           | Never actually expected it to be posted on HN. Working on
           | getting a static version up now.
        
         | MaxikCZ wrote:
         | Yes, I will judge you for requiring javascript to display a
         | page of such basic nature.
        
         | thereisnospork wrote:
         | In a few years people who don't/can't use AI will be looked at
         | like people who couldn't use a computer ~20 years ago.
         | 
          | It might not solve every problem, but it solves enough of them,
          | well enough, that it belongs in the toolkit.
        
           | tines wrote:
              | I think it will be the opposite. AI causes cognitive decline;
           | in the future only the people who don't use AI will retain
           | their ability to think. Same as smartphone usage, the less
           | the better.
        
             | thereisnospork wrote:
             | >Same as smartphone usage, the less the better.
             | 
              | That comparison kind of makes my point though. Sure, you can
              | bury your face in TikTok for 12 hours a day, and smartphones
              | do kind of suck at Excel, but they are massively useful
              | tools used by (approximately) everyone.
             | 
             | Someone not using a smartphone in this day and age is very
             | fairly a 'luddite'.
        
               | tines wrote:
                | I disagree; smartphones are very narrowly useful. Most of
               | the time they're used in ways that destroy the human
               | spirit. Someone not using a smartphone in this day and
               | age is a god among ants.
               | 
               | A computer is a bicycle for the mind; an LLM is an easy-
               | chair.
        
             | AnimalMuppet wrote:
             | One could argue (truthfully!) that cars cause the decline
             | of leg muscles. But in many situations, cars are enough
             | better than walking, so we don't care.
             | 
              | AI _may_ reach that point - that it's enough better than
             | us thinking that we don't think much anymore, and get worse
             | at thinking as a result. Well, is that a net win, or not?
             | If we get there for that reason, it's probably a net
             | win[1]. If we get there because the AI companies are really
             | good at PR, that's a definite net loss.
             | 
             | All that is for the future, though. I think that currently,
             | it's a net loss. Keep your ability to think; don't trust AI
             | any farther than you yourself understand.
             | 
             | [1] It could still not be a net win, if AI turns out to be
             | very useful but also either damaging or malicious, and lack
             | of thinking for ourselves causes us to miss that.
        
               | tines wrote:
               | You're really saying that getting worse at thinking may
               | be a net win, and comparing atrophied leg muscles to an
               | atrophied mind? I think humanity has lost the plot.
        
               | AnimalMuppet wrote:
               | Which took better thinking, assembly or Java? We've lost
               | our ability to think well in at least that specific area.
               | Are we worse off, or better?
        
               | tines wrote:
               | Java and Assembly are the same in the dimension of
               | cognitive burden. Trying to reason about this
               | fundamentally new thing with analogies like this will not
               | work.
        
         | j3th9n wrote:
         | Back in the day they would judge people for turning on a
         | lightbulb instead of lighting a candle.
        
         | djm_ wrote:
         | You could do with using an LLM to make your site work on
         | mobile.
        
         | Kuinox wrote:
         | 7 comments.
         | 
          | 3 have obviously only read the title, and 3 comment on how the
          | article requires JS.
         | 
         | Well played HN.
        
           | sandspar wrote:
           | That's typical for link sharing communities like HN and
           | Reddit. His title clearly struck a nerve. I assume many
           | people opened the link, saw that it was a wall of text,
           | scanned the first paragraph, categorized his point into some
           | slot that they understand, then came here to compete in HN's
           | side-market status game. Normal web browsing behavior, in
           | other words.
        
           | tomhow wrote:
            | This is exactly why the guideline about titles says:
           | 
           |  _Otherwise please use the original title, unless it is
           | misleading or linkbait_.
           | 
           | This title counts as linkbait so I've changed it. It turns
           | out the article is much better (for HN) than the title
           | suggests.
        
             | Kuinox wrote:
              | I did not post the article, but I know who wrote it.
             | 
             | Good change btw.
        
         | DocTomoe wrote:
         | You can judge all you want. You'll eventually appear much like
         | that old woman secretly judging you in church.
         | 
         | Most of the current discourse on AI coding assistants sounds
         | either breathlessly optimistic or catastrophically alarmist.
         | What's missing is a more surgical observation: the disruptive
         | effect of LLMs is not evenly distributed. In fact, the clash
         | between how open source and industry teams establish trust
         | reveals a fault line that's been papered over with hype and
         | metrics.
         | 
          | FOSS projects work on a trust basis - but the industry standard
          | is automated testing, pair programming, and development speed.
          | That CRUD app for finding out if a rental car is available? Not
          | exactly in need of a hand-crafted piece of code, and no-one
         | cares if Junior Dev #18493 is trusted within the software dev
         | organization.
         | 
         | If the LLM-generated code breaks, blame gets passed, retros are
         | held, Jira tickets multiply -- the world keeps spinning, and a
         | team fixes it. If a junior doesn't understand their own patch,
         | the senior rewrites it under deadline. It's not pretty, but it
         | works. And when it doesn't, nobody loses "reputation" - they
         | lose time, money, maybe sleep. But not identity.
         | 
         | LLMs challenge open source where it's most vulnerable - in its
         | culture. Meanwhile, industry just treats them like the next
         | Jenkins: mildly annoying at first, but soon part of the stack.
         | 
         | The author loves the old ways, for many valid reasons: Gabled
         | houses _are_ beautiful, but outside of architectural circles,
         | prefab is what scaled the suburbs, not timber joints and
         | romanticism.
        
         | extr wrote:
         | The author seems to be under the impression that AI is some
         | kind of new invention that has now "arrived" and we need to
         | "learn to work with". The old world is over. "Guaranteeing
         | patches are written by hand" is like the Tesla Gigafactory
         | wanting a guarantee that the nuts and bolts they purchase are
         | hand-lathed.
        
         | can16358p wrote:
          | Ironically, a blog post about judging people for a practice
          | uses terrible web practices: I'm on mobile and the layout is messed
         | up, and Safari's reader mode crashes on this page for whatever
         | reason.
        
           | rvnx wrote:
           | On Safari mobile you even get a white page, which is almost
           | poetic. It means it pushes your imagination to the max.
        
           | acedTrex wrote:
            | Mobile layout should be fixed now; I also just threw up a
            | quick static version here:
            | https://static.jaysthoughts.com/
        
         | EbNar wrote:
         | I'll surely care that a stranger on the internet judges me
          | for the tools I use (or don't).
        
       | stavros wrote:
       | I don't understand the premise. If I trust someone to write good
       | code, I learned to trust them because their code works well, not
       | because I have a theory of mind for them that "produces good
       | code" a priori.
       | 
       | If someone uses an LLM and produces bug-free code, I'll trust
       | them. If someone uses an LLM and produces buggy code, I won't
       | trust them. How is this different from when they were only using
       | their brain to produce the code?
        
         | moffkalast wrote:
         | It's easy to get overconfident and not test the LLM's code
          | enough when it has worked fine a handful of times in a row, and
         | then you miss something.
         | 
          | The problem is often really one of miscommunication: the task
          | may be clear to the person working on it, but with frequent
          | context resets it's hard to make sure the LLM also knows what
          | the whole picture is, and it tends to make dumb assumptions
          | when there's ambiguity.
         | 
          | The thing that 4o does with deep research, where it asks for
          | additional info before it does anything, should be standard for
          | any code generation too, tbh; it would prevent a mountain of
          | issues.
        
           | stavros wrote:
           | Sure, but you're still responsible for the quality of the
           | code you commit, LLM or no.
        
             | moffkalast wrote:
             | Of course you are, but it's sort of like how people are
              | responsible for their Tesla driving on Autopilot, which then
             | suddenly swerves into a wall and disengages two seconds
             | before impact. The process forces you to make mistakes you
             | wouldn't normally ever do or even consider a possibility.
        
               | JohnKemeny wrote:
               | To add to devs and Teslas, you have journalists using
               | LLMs writing summaries, lawyers using LLMs writing
                | depositions, doctors using LLMs writing their patient
               | entries, and law enforcement using LLMs writing their
               | forensics report.
               | 
               | All of these make mistakes (there are documented
               | incidents).
               | 
               | And yes, we can counter with "the journalists are dumb
               | for not verifying", "the lawyers are dumb for not
               | checking", etc., but we should also be open for the fact
               | that these are intelligent and professional people who
               | make mistakes because they were mislead by those who sell
               | LLMs.
        
               | bluefirebrand wrote:
               | I think it's analogous to physical labour
               | 
               | In the past someone might have been physically healthy
               | and strong enough to physically shovel dirt all day long
               | 
               | Nowadays this is rarer because we use an excavator
               | instead. Yes, a professional dirt mover is more
               | productive with an excavator than a shovel, but is likely
               | not as physically fit as someone spending their days
               | moving dirt with a shovel
               | 
               | I think it will be similar with AI. It is absolutely
               | going to offload a lot of people's thinking into the LLMs
               | and their "do it by hand" muscles will atrophy. For
               | knowledge workers, that's our brain
               | 
               | I know this was a similar concern with search engines and
               | Stack Overflow, so I am trying to temper my concern here
               | as best I can. But I can't shake the feeling that LLMs
               | provide a way for people to offload their thinking and go
               | on autopilot a lot more easily than Search ever did
               | 
               | I'm not saying that we were better off when we had to
               | move dirt by hand either. I'm just saying there was a
               | physical tradeoff when people moved out of the fields and
               | into offices. I suspect there will be a cognitive
               | tradeoff now that we are moving away from researching
               | solutions to problems and towards asking the AI to give
               | us solutions to problems
        
             | acedTrex wrote:
              | In an ideal world you would think everyone sees it this
              | way. But we are starting to see an uptick in "I don't know,
              | the LLM said to do that."
              | 
              | As if that were somehow an exonerating sentence.
        
               | NeutralCrane wrote:
               | It isn't, and that is a sign of a bad dev you shouldn't
               | trust.
               | 
               | LLMs are a tool, just like any number of tools that are
               | used by developers in modern software development. If a
               | dev doesn't use the tool properly, don't trust them. If
               | they do, trust them. The way to assess if they use it
               | properly is in the code they produce.
               | 
               | Your premise is just fundamentally flawed. Before LLMs,
               | the proof of a quality dev was in the pudding. After
               | LLMs, the proof of a quality dev remains in the pudding.
        
               | acedTrex wrote:
               | > Your premise is just fundamentally flawed. Before LLMs,
               | the proof of a quality dev was in the pudding. After
               | LLMs, the proof of a quality dev remains in the pudding.
               | 
                | Indeed it does; however, what the "proof" is has changed.
                | If we're talking about sitting down and doing a full, deep
                | review, tracing every path and validating every line, then
                | for sure, nothing has changed.
               | 
                | However, at least in my experience, pre-LLM those reviews
                | did not happen in EVERY case; there were many times I
                | elided parts of a deep review because I saw markers in the
                | code that showed me competency, care, etc. With those
                | markers, certain failure conditions can be deemed very
                | unlikely to exist, and therefore the checks can be
                | skipped. Is that ALWAYS the correct assumption?
                | Absolutely not, but the more experienced you are, the
                | fewer false positives you get.
               | 
               | LLMs make those markers MUCH harder to spot, so you have
                | to fall back to doing a FULL in-depth review no matter
                | what. You have to eat ALL the pudding, so to speak.
               | 
                | For people who relied on tasting a bit of the pudding and
                | then assuming, based on that taste, that the rest of the
                | pudding probably tastes the same, it's rather jarring and
                | exhausting to now have to eat all of it, all the time.
        
               | NeutralCrane wrote:
                | > However, at least in my experience, pre-LLM those
                | reviews did not happen in EVERY case; there were many
                | times I elided parts of a deep review because I saw
                | markers in the code that showed me competency, care, etc.
               | 
               | That was never proof in the first place.
               | 
               | If anything, someone basing their trust in a submission
               | on anything other than the code itself is far more
               | concerning and trust-damaging to me than if the submitter
               | has used an LLM.
        
               | acedTrex wrote:
               | > That was never proof in the first place.
               | 
               | I mean, it's not necessarily HARD proof but it has been a
               | reliable enough way to figure out which corners to cut.
                | You can of course say that no corners should ever be cut,
                | and while that is true in an ideal sense, in the real
                | world things always get fuzzy.
                | 
                | Maybe the death of cutting corners is a good thing
                | overall for output quality. It's certainly exhausting for
                | the people tasked with doing the reviews, however.
        
               | breuleux wrote:
               | I don't know about that. Cutting corners will never die.
               | 
               | Ultimately I don't think the heuristics would change all
               | that much, though. If every time you review a person's
               | PR, almost everything is great, they are either not using
               | AI or they are vetting what the AI writes themselves, so
               | you can trust them as you did before. It may just take
               | some more PRs until that's apparent. Those who submit
               | unvetted slop will have to fix a lot of things, and you
               | can crank up the heat on them until they do better, if
               | they can. (The "if they can" is what I'm most worried
               | about.)
        
         | taneq wrote:
          | If you have a long-standing, effective heuristic that "people
          | with excellent, professional writing are more accurate and
          | reliable than people with sloppy spelling and punctuation",
          | then when a semi-infinite group of 'people' appears, writing
          | well-presented, convincingly worded articles which nonetheless
          | are riddled with misinformation, hidden logical flaws, and
          | inconsistencies, you're gonna end up trusting everyone a lot
          | less.
         | 
         | It's like if someone started bricking up tunnel entrances and
         | painting ultra realistic versions of the classic Road Runner
         | tunnel painting on them, all over the place. You'd have to stop
         | and poke every underpass with a stick just to be sure.
        
           | stavros wrote:
           | Sure, your heuristic no longer works, and that's a bit
           | inconvenient. We'll just find new ones.
        
             | sebmellen wrote:
             | Yeah, now you need to be able to demonstrate verbal
             | fluency. The problem is, that inherently means a loss of
             | "trusted anonymous" communication, which is particularly
             | damaging to the fiber of the internet.
        
               | acedTrex wrote:
               | Author here:
               | 
                | Precisely. In an age where it is very difficult to
                | ascertain the type or quality of the skills you are
                | interacting with, say in a patch review, you frankly have
                | to "judge" someone and fall back to suspicion and full
                | verification.
        
               | taneq wrote:
               | Yeah I think "trust for a fluent, seemingly logically
               | coherent anonymous responder" pretty much captures it.
        
             | oasisaimlessly wrote:
             | "A bit inconvenient" might be the understatement of the
             | year. If information requires say, 2x the time to validate,
             | the utility of the internet is halved.
        
         | alganet wrote:
         | > I learned to trust them because their code works well
         | 
         | There's so much more than "works well". There are many cues
         | that exist close to code, but are not code:
         | 
         | I trust more if the contributor explains their change well.
         | 
         | I trust more if the contributor did great things in the past.
         | 
         | I trust more if the contributor manages granularity well
         | (reasonable commits, not huge changes).
         | 
         | I trust more if the contributor picks the right problems to
         | work on (fixing bugs before adding new features, etc).
         | 
         | I trust more if the contributor proves being able to maintain
         | existing code, not just add on top of it.
         | 
         | I trust more if the contributor makes regular contributions.
         | 
         | And so on...
        
           | acedTrex wrote:
           | Author here:
           | 
           | Spot on, there are so many little things that we as humans
           | use as subtle verification steps to decide how much scrutiny
           | various things require. LLMs are not necessarily the death of
           | that concept but they do make it far far harder.
        
         | somewhereoutth wrote:
         | Because when people use LLMs, they are getting the tool to do
         | the work for them, not using the tool to do the work. LLMs are
         | not calculators, nor are they the internet.
         | 
         | A good rule of thumb is to simply reject any work that has had
         | involvement of an LLM, and ignore any communication written by
         | an LLM (even for EFL speakers, I'd much rather have your "bad"
         | English than whatever ChatGPT says for you).
         | 
         | I suspect that as the serious problems with LLMs become ever
         | more apparent, this will become standard policy across the
         | board. Certainly I hope so.
        
           | stavros wrote:
           | Well, no, a good rule of thumb is to expect people to write
           | good code, no matter how they do it. Why would you mandate
           | what tool they can use to do it?
        
             | somewhereoutth wrote:
             | Because it pertains to the quality of the output - I can't
             | validate every line of code, or test every edge case. So if
             | I need a certain level of quality, I have to verify the
             | process of producing it.
             | 
             | This is standard for any activity where accuracy / safety
             | is paramount - you validate the _process_. Hence things
             | like maintenance logs for airplanes.
        
               | acedTrex wrote:
               | > So if I need a certain level of quality, I have to
               | verify the process of producing it
               | 
                | Precisely this, and this is hardly a requirement unique to
                | software. Process audits are everywhere in
                | engineering. Previously you could infer the process behind
                | some code by simply reading the patch, and that generally
                | would tell you quite a bit about the author. Using
                | advanced and niche concepts would imply a solid process
                | with experience backing it, which would then imply that
                | certain contextual bugs are unlikely, so you could skip
                | looking for them.
               | 
                | My premise in the blog is basically: "Well, now I have to
                | go do a full review no matter what the code itself tells
                | me about the author."
        
               | badsectoracula wrote:
                | > My premise in the blog is basically: "Well, now I have
                | to go do a full review no matter what the code itself
                | tells me about the author."
               | 
               | Which IMO is the correct approach - or alternatively, if
               | you do actually trust the author, you shouldn't care if
               | they used LLMs or not because you'd trust them to check
               | the LLM output too.
        
               | badsectoracula wrote:
               | The false assumption here is that humans will always
               | write better code than LLMs, which is certainly not the
               | case for all humans nor all LLMs.
        
           | mexicocitinluez wrote:
           | >Because when people use LLMs, they are getting the tool to
           | do the work for them, not using the tool to do the work.
           | 
           | What? How on god's green earth could you even pretend to know
           | how all people are using these tools?
           | 
           | > LLMs are not calculators, nor are they the internet.
           | 
           | Umm, okay? How does that make them less useful?
           | 
           | I'm going to give you a concrete example of something I just
           | did and let you try and do whatever mental gymnastics you
           | have to do to tell me it wasn't useful:
           | 
           | Medicare requires all new patients receiving home health
           | treatment go through a 100+ question long form. This form
           | changes yearly, and it's my job to implement the form into
           | our existing EMR. Well, part of that is creating a printable
           | version. Guess what I did? I uploaded the entire pdf to
           | Claude and asked it to create a print-friendly template using
           | Cottle as the templating language in C#. It generated the 30
           | page print preview in a minute. And it took me about 10 more
           | minutes to clean up.
           | 
           | > I suspect that as the serious problems with LLMs become
           | ever more apparent, this will become standard policy across
           | the board. Certainly I hope so.
           | 
           | The irony is that they're getting better by the day. That's
           | not to say people don't use them for the wrong applications,
           | but the idea that this tech is going to be banned is absurd.
           | 
           | > A good rule of thumb is to simply reject any work that has
           | had involvement of an LLM
           | 
           | Do you have any idea how ridiculous this sounds to people who
           | actually use the tools? Are you going to be able to hunt down
           | the single React component in which I asked it to convert the
           | MUI styles to tailwind? How could you possibly know? You
           | can't.
        
           | flir wrote:
           | > A good rule of thumb is to simply reject any work that has
           | had involvement of an LLM,
           | 
           | How are you going to _know_?
        
             | bluefirebrand wrote:
              | That's sort of the problem, isn't it? There is no real way
              | to know, so we sort of just have to assume every bit of work
              | involves LLMs now, and take a much closer look at
              | everything.
        
           | sebmellen wrote:
           | You're being unfairly downvoted. There is a plague of well-
           | groomed incoherency in half of the business emails I receive
           | today. You can often tell that the author, without wrestling
           | with the text to figure out what they want to say, is a kind
           | of stochastic parrot.
           | 
           | This is okay for platitudes, but for emails that really
           | matter, having this messy watercolor kind of writing totally
           | destroys the clarity of the text and confuses everyone.
           | 
           | To your point, I've asked everyone on my team to refrain from
           | writing words (not code) with ChatGPT or other tools, because
            | the LLM invariably produces more convoluted output than the
            | author would by just badly, but authentically, trying to
            | express themselves in the text.
        
             | acedTrex wrote:
              | Yep, I have come to really dislike LLMs for documentation,
              | as it just reads wrong to me and I find it so often misses
              | the point entirely. There is so much nuance tied up in
             | documentation and much of it is in what is NOT said as much
             | as what is said.
             | 
             | The LLMs struggle with both but REALLY struggle with
             | figuring out what NOT to say.
        
               | short_sells_poo wrote:
               | I wonder if this is to a large degree also because when
               | we communicate with humans, we take cues from more than
               | just the text. The personality of the author will project
               | into the text they write, and assuming you know this
               | person at least a little bit, these nuances will give you
               | extra information.
        
             | jimbokun wrote:
             | I find the idea of using LLMs for emails confusing.
             | 
             | Surely it's less work to put the words you want to say into
             | an email, rather than craft a prompt to get the LLM to say
             | what you want to say, and iterate until the LLM actually
             | says it?
        
               | fwip wrote:
               | My own opinion, which is admittedly too harsh, is that
               | they don't really know what they want to say. That is,
               | the prompt they write is very short, along the lines of
               | `ask when this will be done` or `schedule a followup`,
               | and give the LLM output a cursory review before copy-
               | pasting it.
        
               | jimbokun wrote:
               | Still funny to me.
               | 
               | `ask when this will be done` -> ChatGPT -> paste answer
               | into email
               | 
               | vs
               | 
               | type: "when will this be done?" Send.
        
           | breuleux wrote:
           | I think the main issue is people using LLMs to do things that
           | they don't know how to do themselves. There's actually a
           | similar problem with calculators, it's just a much smaller
           | one: if you never learn how to add or multiply numbers by
           | hand and use calculators for everything all the time, you may
           | sometimes make absurd mistakes like tapping 44 * 3 instead of
           | 44 * 37 and not bat an eye when your calculator tells you the
           | result is a whole order of magnitude less than what you
           | should have expected. Because you don't really understand how
           | it works. You haven't developed the intuition.
           | 
           | There's nothing wrong with using LLMs to save time doing
           | trivial stuff you know how to do yourself and can check very
           | easily. The problem is that (very lazy) people are using them
           | to do stuff they are themselves not competent at. They can't
           | check, they won't learn, and the LLM is essentially their
            | skill ceiling. This is very bad: what added value are you
           | supposed to bring over something you don't understand? AGI
           | won't have to improve from the current baseline to surpass
           | humans if we're just going to drag ourselves down to its
           | level.
        
           | tranchebald wrote:
           | I'm not seeing a lot of discussion about verification or a
           | stronger quality control process anywhere in the comments
           | here. Is that some kind of unsolvable problem for software? I
           | think if the standard of practice is to use author reputation
           | as a substitute for a robust quality control process, then I
           | wouldn't be confident that the current practice is much
           | better than AI code-babel.
        
           | badsectoracula wrote:
           | > Because when people use LLMs, they are getting the tool to
           | do the work for them, not using the tool to do the work.
           | 
           | You can say that for pretty much any sort of automation or
           | anything that makes things easier for humans. I'm pretty sure
           | people were saying that about doing math by hand around when
           | calculators became mainstream too.
        
         | mexicocitinluez wrote:
         | It's not.
         | 
          | What you're seeing now is people who once thought of and
          | proclaimed these tools as useless now having to start walking
          | back their claims with stuff like this.
         | 
         | It does amaze me that the people who don't use these tools seem
         | to have the most to say about them.
        
           | acedTrex wrote:
           | Author here:
           | 
           | For what it's worth I do actually use the tools albeit
           | incredibly intentionally and sparingly.
           | 
            | I see quite a few workflows and tasks where they can be a
            | value-add, mostly outside of the hot path of actual code
            | generation, but still quite enticing. So much so, in fact,
            | that I'm working on my own local agentic tool with some
            | self-hosted Ollama models. I like to think that I am at least
            | somewhat in
           | the know on the capabilities and failure points of the latest
           | LLM tooling.
           | 
            | That, however, doesn't change my thoughts on trying to
            | ascertain if code submitted to me deserves a full in-depth
            | review or if I can maybe cut a few corners here and there.
        
             | mexicocitinluez wrote:
              | > That, however, doesn't change my thoughts on trying to
              | ascertain if code submitted to me deserves a full in-depth
              | review or if I can maybe cut a few corners here and there.
             | 
              | How would you even know? Seriously, if I use ChatGPT to
             | generate a one-off function for a feature I'm working on
             | that searches all classes for one that inherits a specific
             | interface and attribute, are you saying you'd be able to
             | spot the difference?
             | 
              | And what does it even matter if it works?
             | 
              | What if I use Bolt to generate a quick screen for a PoC? Or
              | use Claude to create a print preview with CSS of a 30-page
              | Medicare form? Or convert a component's styles from MUI to
              | Tailwind? What if all these things are correct?
              | 
              | This whole "OSS repos will ban LLM-generated code" idea is a
              | bit absurd.
             | 
              | > For what it's worth I do actually use the tools albeit
             | incredibly intentionally and sparingly.
             | 
             | How sparingly? Enough to see how it's constantly improving?
        
               | acedTrex wrote:
                | > How would you even know? Seriously, if I use ChatGPT to
               | generate a one-off function for a feature I'm working on
               | that searches all classes for one that inherits a
               | specific interface and attribute, are you saying you'd be
               | able to spot the difference?
               | 
                | I don't know; that's the problem. As a result, because I
                | can't know, I now have to do full in-depth reviews no
                | matter what. Which is the "judging" I tongue-in-cheek
                | talk about in the blog.
               | 
               | > How sparingly? Enough to see how it's constantly
               | improving?
               | 
                | Nearly daily. To be honest, I have not noticed much
                | improvement year over year in regard to how they fail;
                | they still break in the exact same dumb ways now as they
                | did before. Sure, they might generate syntactically
                | correct code reliably now, and it might even work. But
                | they still
               | consistently fail to grok the underlying reasoning for
               | things existing.
               | 
               | But I am writing my own versions of these agentic systems
               | to use for some rote menial stuff.
        
               | mexicocitinluez wrote:
                | So you weren't doing in-depth reviews before? Are these
                | people you know? And now you just don't trust them
                | because they include a tool in their workflow?
        
           | globnomulous wrote:
           | > It does amaze me that the people who don't use these tools
           | seem to have the most to say about them.
           | 
           | You're kidding, right? Most people who don't use the tools
           | and write about it are responding to the ongoing hype train
           | -- a specific article, a specific claim, or an idea that
           | seems to be gaining acceptance or to have gone unquestioned
           | among LLM boosters.
           | 
            | I recently watched a talk by Andrej Karpathy. So much in it
           | begged for a response. Google Glass was "all the rage" in
           | 2013? Please. "Reading text is laborious and not fun. Looking
           | at images is fun." You can't be serious.
           | 
           | Someone recently shared on HN a blog post explaining why the
           | author doesn't use LLMs. The justification for the post?
           | "People keep asking me."
        
             | mexicocitinluez wrote:
             | Being asked if I'm kidding by the person comparing Google
              | Glass to machine learning algorithms is pretty funny, ngl.
             | 
             | And the "I don't use these tools and never will" sentiment
             | is rampant in the tech community right now. So yes, I am
             | serious.
             | 
              | You're not talking about the blog post that completely
              | ignored agentless uses, are you? The one that came to the
              | conclusion that LLMs aren't useful despite only using a
              | subset of their features?
        
               | bluefirebrand wrote:
               | > And the "I don't use these tools and never will"
               | sentiment is rampant in the tech community right now
               | 
               | So is the "These tools are game changers and are going to
               | make all work obsolete soon" sentiment
               | 
               | Don't start pretending that AI boosters aren't
               | _everywhere_ in tech right now
               | 
               | I think the major difference I'm noticing is that many of
               | the Boosters are not people who write any code. They are
               | executives, managers, product owners, team leads, etc.
               | Former Engineers maybe but very often not actively
               | writing software daily
        
               | globnomulous wrote:
               | > I think the major difference I'm noticing is that many
               | of the Boosters are not people who write any code.
               | 
               | Plenty of current, working engineers who frequent and
               | comment on Hacker News say they use LLMs and find them
               | useful/'game changers,' I think.
               | 
               | Regardless, I think I agree overall: the key distinction
               | I see is between people who _like_ to read and write
               | programs and people who just want to make some specific
               | product. The former group generally treat LLMs as an
               | unwelcome intrusion into the work they love and value.
               | The latter generally welcome LLMs because the people
               | selling them promise, in essence, that with LLMs you can
               | skip the engineering and just make the product.
               | 
               | I'm part of the former group. I love reading code,
               | thinking about it, and working with it. Meeting-based
               | programming (my term for LLM-assisted programming) sounds
               | like hell on earth to me. I'd rather blow my brains out
               | than continue to work as a software engineer in a world
               | where the LLM-booster dream comes true.
        
               | bluefirebrand wrote:
               | > I'd rather blow my brains out than continue to work as
               | a software engineer in a world where the LLM-booster
               | dream comes true.
               | 
               | I feel the same way
               | 
               | But please don't. I promise I won't either. There is
               | still a place for people like you and me in this world,
               | it's just gonna take a bit more work to find it
               | 
               | Deal? :)
        
               | mexicocitinluez wrote:
               | > So is the "These tools are game changers and are going
               | to make all work obsolete soon" sentiment
               | 
                | Except we aren't talking about those people, are we? The
                | blog post wasn't about that.
               | 
               | > Don't start pretending that AI boosters aren't
               | everywhere in tech right now
               | 
               | PLEASE tell me what I said that made you feel like you
               | need to put words in my mouth. Seriously.
               | 
               | > I think the major difference I'm noticing is that many
               | of the Boosters are not people who write any code
               | 
                | I write code every day. I just asked Claude to convert a
                | Medicare-mandated 30-page assessment to a printable
                | version with CSS using Cottle in C#, and it did it. I'd
                | love to know why that sort of thing isn't useful.
        
               | globnomulous wrote:
               | > Being asked if I'm kidding by the person comparing
               | Google glasses to machine learning algorithms is pretty
               | funny ngl.
               | 
               | I didn't draw the comparison. Karpathy, one of the most
               | prominent LLM proponents on the planet -- the guy who
               | invented the term 'vibe-coding' -- drew the
               | comparison.[1]
               | 
               | > And the "I don't use these tools and never will"
               | sentiment is rampant in the tech community right now. So
               | yes, I am serious.
               | 
               | I think you misunderstood my comment -- or my comment
               | just wasn't clear enough: I quoted the line "It does
               | amaze me that the people who don't use these tools seem
               | to have the most to say about them." and then I asked
               | "You're kidding, right?" In other words, "you can't
               | seriously believe that the nay-sayers 'always have the
               | most to say.'" It's a ridiculous claim. Just about every
               | naysayer 'think piece' -- whether or not it's garbage --
               | is responding to an overwhelming tidal wave of pro-LLM
               | commentary and press coverage.
               | 
               | > Youre not talking about the blog post that completely
               | ignored agentless uses are you? The one that came to the
               | conclusion LLMs arent useful despite only using a subset
               | of its features?
               | 
               | I'm referring to this one[2]. It's awful, smug, self-
               | important, sanctimonious nonsense.
               | 
               | [1] https://www.youtube.com/watch?si=xF5rqWueWDQsW3FC&v=L
               | CEmiRjP...
               | 
               | [2] https://news.ycombinator.com/item?id=44294633
        
               | mexicocitinluez wrote:
               | I'm so confused as to why you took that so literally. I
               | didn't literally mean that the nay-sayers are producing
               | more words than the evangelists. It was a hyperbolic
               | expression. And I wasn't JUST talking about the blog
               | posts. I'm talking about ALL comments about it.
        
         | acedTrex wrote:
         | Author here:
         | 
          | Essentially the premise is that in medium-trust environments,
          | like very large teams, or low-trust environments, like an open
          | source project, LLMs make it very difficult to make an
          | immediate snap judgement about the quality of the dev who
          | submitted the patch based solely on the code itself.
         | 
          | In the absence of being able to ascertain the type of person
          | you are dealing with, you have to fall back to "no trust" and
          | review everything with a very fine-toothed comb. Essentially
          | there are no longer any safe "review shortcuts", and that can
          | be painful in places that relied on those markers to grease the
          | wheels, so to speak.
         | 
         | Obviously if you are in an existing competent high trust team
         | then this problem does not apply and most likely seems
         | completely foreign as a concept.
        
           | lxgr wrote:
           | > LLMs make it very difficult to make an immediate snap
           | judgement about the quality [...]
           | 
           | That's the core of the issue. It's time to say goodbye to
           | heuristics like "the blog post is written in eloquent,
           | grammatical English, hence the point its author is trying to
           | make must be true" or "the code is idiomatic and following
           | all code styles, hence it must be modeling the world with
           | high fidelity".
           | 
           | Maybe that's not the worst thing in the world. I feel like it
           | often made people complacent.
        
             | acedTrex wrote:
             | > Maybe that's not the worst thing in the world. I feel
             | like it often made people complacent.
             | 
              | For sure, in some ways perhaps reverting to a low-trust
              | environment might improve quality in that it now forces
              | harsher, more in-depth reviews.
             | 
              | That, however, doesn't make the requirement less exhausting
              | for people who previously relied heavily on those markers
              | to speed things up.
             | 
              | It will be very interesting to see how the industry
              | standardizes around this. Right now it's a bit of the Wild
              | West. Maybe people in ten years will look back at this post
              | and think, "What do you mean you judged people based on the
              | code itself? That's ridiculous."
        
             | furyofantares wrote:
             | I think you're unfair to the heuristics people use in your
             | framing here.
             | 
             | You said "hence the point its author is trying to make must
             | be true" and "hence it must be modeling the world with high
             | fidelity".
             | 
             | But it's more like "hence the author is likely competent
             | and likely put in a reasonable effort."
             | 
             | When those assumptions hold, putting in a very deep review
             | is less likely to pay off. Maybe you are right that people
             | have been too complacent to begin with, I don't know, but I
             | don't think you've framed it fairly.
        
               | lxgr wrote:
               | > But it's more like "hence the author is likely
               | competent and likely put in a reasonable effort."
               | 
               | And isn't dyslexic, and is a native speaker etc. Some
               | will gain from this shift, some will lose.
        
               | furyofantares wrote:
                | Yes! This is part of why I bristle at such reductive
                | takes; we could use more nuance in thinking about what we
                | are gaining, what we are losing, and how to deal with it.
        
             | tempodox wrote:
             | Anyway, "following all code styles" is just a fancy way of
             | saying "adheres to fashion". What meaningful conclusions
             | can you draw from that?
        
               | rurp wrote:
               | It's not about fashion, it's about diligence and
               | consideration. Code formatting is totally different from
               | say clothing fashion. Social fashions are often about
               | being novel or surprising which is the opposite of how
               | good code is written. Code should be as standard, clear
               | and unsurprising as is reasonably possible. If someone is
               | writing code in a way that's deliberately unconventional
               | or overly fancy that's a strong signal that it isn't very
               | good.
               | 
               | When someone follows standard conventions it means that
               | they A) have a baseline level of knowledge to know about
               | them, and B) care to write the code in a clear and
               | approachable way for others.
        
               | tempodox wrote:
               | > If someone is writing code in a way that's deliberately
               | unconventional or overly fancy that's a strong signal
               | that it isn't very good.
               | 
               | "unconventional" or "fancy" is in the eye of the
               | beholder. Whose conventions are we talking about? Code is
               | bad when it doesn't look the way you want it to? How
               | convenient. I may find code hard to read because it's
               | formatted "conventionally", but I wouldn't be so entitled
               | as to call it bad just because of that.
        
               | kiitos wrote:
               | > "unconventional" or "fancy" is in the eye of the
               | beholder.
               | 
               | Literally not: a language defines its own conventions,
               | they're not defined in terms of individual
               | users/readers/maintainers subjective opinions.
               | 
               | > Whose conventions are we talking about?
               | 
               | The conventions defined by the language.
               | 
               | > Code is bad when it doesn't look the way you want it
               | to?
               | 
               | No -- when it doesn't satisfy the conventions established
               | by the language.
               | 
               | > I may find code hard to read because it's formatted
               | "conventionally",
               | 
               | If you did this then you'd be wrong, and that'd be a
               | problem with your personal evaluation process/criteria,
               | that you would need to fix.
        
               | Capricorn2481 wrote:
               | > a language defines its own conventions
               | 
               | Where are these mythical languages? I think the word
               | you're looking for is syntax, which is entirely
               | different. Conventions are how code is structured and
               | expected to be read. Very few languages actually enforce
               | or even suggest conventions, hence the many style guides.
               | It's a standout feature of Go to have a format style, and
               | people still don't agree with it.
               | 
               | And it's kinda moot when you can always override
               | conventions. It's more accurate to say a team decides on
               | the conventions of a language.
        
               | habinero wrote:
               | No, they're absolutely correct that it's critical in
               | professional and open source environments. Code is
               | written once but read hundreds or thousands of times.
               | 
               | If every rando hire goes in and has a completely
               | different style and formatting -- and then other people
               | come in and rewrite parts in their own style -- code
               | rapidly goes to shit.
               | 
               | It doesn't matter what the style is, as long as there is
               | one and it's enforced.
        
               | Capricorn2481 wrote:
               | > No, they're absolutely correct that it's critical in
               | professional and open source environments. Code is
               | written once but read hundreds or thousands of times
               | 
               | What you're saying is reasonable, but that's not what
               | they said at all. They said there's one way to write
               | cleanly and that's "Standard conventions", whatever that
               | means. Yes, conventions so standard that I've read 10
               | conflicting books on what they are.
               | 
               | There is no agreed upon definition of "readable code". A
               | team can have a style guide, which is great to follow,
               | but that is just formalizing the personal preference of
                | the people working on a project. It's no more divine
               | than the opinion of a "rando."
        
               | habinero wrote:
               | No, you misunderstood what they said. And I misspoke a
               | little, too.
               | 
                | While it's true that _in principle_ it doesn't matter
               | what style you choose as long as there is one, in
               | _practice_ languages are just communities of people, and
               | every community develops norms and standards. More recent
               | languages often just pick a style and bake it in.
               | 
               | This is a good thing, because again, code is read 1000x
               | more times than it's written. It saves everyone time and
               | effort to just develop a typical style.
               | 
                | And yeah, the code might run no matter how you indent it,
                | but it's not correct, any more than it's correct for you
                | to go to a restaurant and lick the plates.
        
             | o11c wrote:
             | That's not how heuristics work.
             | 
             | The heuristic is "this submission doesn't even follow the
             | basic laws of grammar, therefore I can safely assume
             | incompetence and ignore it entirely."
             | 
             | You still have to do verification for what passes the
             | heuristic, but it keeps 90% of the crap away.
        
           | sim7c00 wrote:
            | its about the quality of the code, not the quality of the
            | dev. you might think they're related, but they're not.
            | 
            | a dev can write a piece of good code and a piece of bad code.
            | so per piece of code, review the code, not the dev!
        
             | acedTrex wrote:
             | > you might think it's related, but it's not.
             | 
             | In my experience they very much are related. High quality
             | devs are far more likely to output high quality working
             | code. They test, they validate, they think, ultimately they
             | care.
             | 
              | In the case that you are reviewing a patch from someone
              | you have limited experience with, it previously was
              | feasible to infer the quality of the dev from the patch
              | itself and the surrounding context in which it was
              | submitted.
              | 
              | LLMs make that judgement far, far more difficult, and when
              | you cannot make a snap judgement you have to revert your
              | review style to a very low-trust, in-depth review.
             | 
             | No more greasing the wheels to expedite a process.
        
             | haswell wrote:
             | > _its about the quality of the code, not the quality of
             | the dev. you might think it 's related, but it's not._
             | 
             | I could not disagree more. The quality of the dev will
             | always matter, and has as much to do with what code makes
             | it into a project as the LLM that generated it.
             | 
             | An experienced dev will have more finely tuned evaluation
             | skills and will accept code from an LLM accordingly.
             | 
             | An inexperienced or "low quality" dev may not even know
             | what the ideal/correct solution looks like, and may be
             | submitting code that they do not fully understand. This is
             | especially tricky because they may still end up submitting
             | high quality code, but not because they were capable of
             | evaluating it as such.
             | 
             | You could make the argument that it shouldn't matter who
             | submits the code if the code is evaluated purely on its
             | quality/correctness, but I've never worked in a team that
             | doesn't account for who the person is behind the code. If
              | it's the grizzled veteran known for rarely making mistakes,
             | the review might look a bit different from a review for the
             | intern's code.
        
               | NeutralCrane wrote:
               | > An experienced dev will have more finely tuned
               | evaluation skills and will accept code from an LLM
               | accordingly. An inexperienced or "low quality" dev may
               | not even know what the ideal/correct solution looks like,
               | and may be submitting code that they do not fully
               | understand. This is especially tricky because they may
               | still end up submitting high quality code, but not
               | because they were capable of evaluating it as such.
               | 
               | That may be true, but the proxy for assessing the quality
               | of the dev is the code. No one is standing over you as
               | you code your contribution to ensure you are making the
               | correct, pragmatic decisions. They are assessing the code
               | you produce to determine the quality of your decisions,
               | and over time, your reputation as a dev is made up of the
               | assessments of the code you produced.
               | 
               | The point is that an LLM in no way changes this. If a dev
               | uses an LLM in a non-pragmatic way that produces bad
               | code, it will erode trust in them. The LLM is a tool, but
                | trust still factors into how the dev uses the tool.
        
               | haswell wrote:
               | > _That may be true, but the proxy for assessing the
               | quality of the dev is the code._
               | 
               | Yes, the quality of the dev is a measure of the quality
               | of the code they produce, but once a certain baseline has
               | been established, the quality of the dev is now known
               | independent of the code they may yet produce. i.e. if you
               | were to make a prediction about the quality of code
               | produced by a "high quality" dev vs. a "low quality" dev,
               | you'd likely find that the high quality dev tends to
               | produce high quality code more often.
               | 
               | So now you have a certain degree of knowledge even before
               | you've seen the code. In practice, this becomes a factor
               | on every dev team I've worked around.
               | 
               | Adding an LLM to the mix changes that assessment
               | fundamentally.
               | 
               | > _The point is that an LLM in no way changes this._
               | 
               | I think the LLM by definition changes this in numerous
               | ways that can't be avoided. i.e. the code that was
               | previously a proxy for "dev quality" could now fall into
               | multiple categories:
               | 
               | 1. Good code written by the dev (a good indicator of dev
               | quality if they're consistently good over time)
               | 
               | 2. Good code written by the LLM and accepted by the dev
               | because they are experienced and recognize the code to be
               | good
               | 
               | 3. Good code written by the LLM and accepted by the dev
               | because it works, but not necessarily because the dev
               | knew it was good (no longer a good indicator of dev
               | quality)
               | 
               | 4. Bad code written by the LLM
               | 
               | 5. Bad code written by the dev
               | 
               | #2 and #3 is where things get messy. Good code may now
               | come into existence without it being an indicator of dev
               | quality. It is now necessary to assess whether or not the
               | LLM code was accepted because the dev recognized it was
               | good code, or because the dev got things to work and
               | essentially got lucky.
               | 
               | It may be true that you're still evaluating the code at
               | the end of the day, but what you learn from that
               | evaluation has changed. You can no longer evaluate the
               | quality of a dev by the quality of the code they commit
               | unless you have other ways to independently assess them
               | beyond the code itself.
               | 
               | If you continued to assess dev quality without taking
               | this into consideration, it seems likely that those
               | assessments would become less accurate over time as more
               | "low quality" devs produce high quality code - not
               | because of their own skills, but because of the ongoing
               | improvements to LLMs. That high quality code is no longer
               | a trustworthy indicator of dev quality.
               | 
               | > _If a dev uses an LLM in a non-pragmatic way that
               | produces bad code, it will erode trust in them. The LLM
               | is a tool, but trust still factors in to how the dev uses
               | the tool._
               | 
               | Yes, of course. But the issue is not that a good dev
               | might erode trust by using the LLM poorly. The issue is
               | that inexperienced devs will make it increasingly
               | difficult to use the same heuristics to assess dev
               | quality across the board.
        
         | insane_dreamer wrote:
         | > If someone uses an LLM and produces bug-free code, I'll trust
         | them.
         | 
         | Only because you already trust them to know that the code is
         | indeed bug-free. Some cases are simple and straightforward --
         | this routine returns a desired value or it doesn't. Other
         | situations are much more complex in anticipating the ways in
         | which it might interact with other parts of the system, edge
         | cases that are not obvious, etc. Writing code that is "bug
         | free" in that situation requires the writer of the code to
         | understand the implications of the code, and if the dev doesn't
         | understand exactly what the code does because it was written by
         | an LLM, then they won't be able to understand the implications
         | of the code. It then falls to the reviewer to understand the
         | implications of the code -- increasing their workload. That was
         | the premise.
        
       | axegon_ wrote:
        | That is already the case for me. The number of times I've read
        | "apologies for the oversight, you are absolutely correct" is
        | staggering: 8 or 9 out of 10 times. Meanwhile I constantly see
        | people mindlessly copy-pasting LLM-generated code and then being
        | furious when it doesn't do what they expected it to do. Which,
        | btw, is the better option: I'd rather have something obviously
        | broken as opposed to something seemingly working.
        
         | autobodie wrote:
         | In my experience, LLMs are extremely inclined to modify code
         | just to pass tests instead of meeting requirements.
        
           | fwip wrote:
           | When they're not modifying the tests to match buggy behavior.
           | :P
        
         | devjab wrote:
          | Are you using the LLMs through a browser chatbot? Because the
         | AI-agents we use with direct code-access aren't very chatty.
         | I'd also argue that they are more capable than a lot of junior
         | programmers, at least around here. We're almost at a point
         | where you can feed the agents short specific tasks, and they
         | will perform them well enough to not really require anything
         | outside of a code review.
         | 
          | That being said, the prediction engine still can't do any real
          | engineering. If you don't specifically task them with using
          | things like Python generators, you're very likely to end up
          | with a piece of code that eats up a gazillion bytes of memory.
          | Which unfortunately doesn't set them apart from a lot of Python
          | programmers I know, but it is an example of how the LLMs are
          | exactly as bad as you mention. On the positive side, it helps
          | with people actually writing the specification tasks in more
          | detail than just "add feature".
         | 
          | Where AI-agents are the most useful for us is with legacy code
          | that nobody prioritises. We have a data extractor which was
          | written in the previous millennium. It basically uses around
          | two hundred hard-coded coordinates to extract data from a
          | specific type of document which arrives by fax. It's worked for
          | 30ish years because the documents haven't changed... but they
          | recently did, and it took Copilot like 30 seconds to correct
          | the coordinates. Something that would've likely taken a human a
          | full day of excruciating boredom.
         | 
         | I have no idea how our industry expect anyone to become experts
         | in the age of vibe coding though.
        
           | teeray wrote:
           | > Because the AI-agents we use with direct code-access aren't
           | very chatty.
           | 
           | So they're even more confident in their wrongness
        
           | furyofantares wrote:
           | > Because the AI-agents we use with direct code-access aren't
           | very chatty.
           | 
           | Every time I tell claude code something it did is wrong, or
           | might be wrong, or even just ask a leading question about a
           | potential bug it just wrote, it leads with "You're absolutely
           | correct!" before even invoking any tools.
           | 
           | Maybe you've just become used to ignoring this. I mostly
           | ignore it but it is a bit annoying when I'm trying to use the
           | agent to help me figure out if the code it wrote is correct,
           | so I ask it some question it should be capable of helping
           | with and it leads with "you're absolutely correct".
           | 
           | I didn't make a proposition that can be correct or not, and
            | it didn't do any work yet to investigate my question - it
           | feels like it has poisoned its own context by leading with
           | this.
        
           | gibspaulding wrote:
           | > Where AI-agents are the most useful for us is with legacy
           | code
           | 
           | I'd love to hear more about your workflow and the code base
           | you're working in. I have access to Amazon Q (which it looks
           | like is using Claude Sonnet 4 behind the scenes) through
           | work, and while I found it very useful for Greenfield
           | projects, I've really struggled using it to work on our older
           | code bases. These are all single file 20,000 to 100,000 line
           | C modules with lots of global variables and most of the logic
           | plus 25 years of changes dumped into a few long functions.
           | It's hard to navigate for a human, but seems to completely
           | overwhelm Q's context window.
           | 
           | Do other Agents handle this sort of scenario better, or are
           | there tricks to making things more manageable? Obviously re-
           | factoring to break everything up into smaller files and
           | smaller functions would be great, but that's just the sort of
           | project that I want to be able to use the AI for.
        
         | mexicocitinluez wrote:
         | > 8 or 9 out of 10 times.
         | 
          | No, they don't. This is 100% a made-up statistic.
        
           | bluefirebrand wrote:
            | It isn't even being presented as a statistic; it is someone
            | saying what they have experienced.
        
       | atemerev wrote:
        | I am a software engineer who writes 80-90% of my code with AI
        | (sorry, can't ignore the productivity boost), and I mostly agree
        | with this sentiment.
        | 
        | I found out very early that under no circumstances may you have
        | code you don't understand, anywhere. Well, you may, but not in
        | public, and you should commit to understanding it before anyone
        | else sees it. Particularly before the sales guys do.
       | 
       | However, AI can help you with learning too. You can run
       | experiments, test hypotheses and burn your fingers so fast. I
       | like it.
        
       | pfdietz wrote:
       | There was trust?
        
       | acedTrex wrote:
       | Hi everyone, author here.
       | 
        | Sorry about the JS stuff; I wrote this while fooling around
        | with alpine.js for fun. I never expected it to make it to HN.
        | I'll get a static version up and running.
       | 
       | Happy to answer any questions or hear other thoughts.
       | 
       | Edit: https://static.jaysthoughts.com/
       | 
       | Static version here with slightly wonky formatting, sorry for the
       | hassle.
       | 
        | Edit2: Should work well on mobile now; added a quick breakpoint.
        
         | konaraddi wrote:
         | Given the topic of your post, and high pagespeed results, I
         | think >99% of your intended audience can already read the
         | original. No need to apologize or please HN users.
        
       | pu_pe wrote:
       | > While the industry leaping abstractions that came before
       | focused on removing complexity, they did so with the fundamental
       | assertion that the abstraction they created was correct. That is
       | not to say they were perfect, or they never caused bugs or
       | failures. But those events were a failure of the given
        | implementation, a departure from what the abstraction was SUPPOSED
        | to do; every mistake, once patched, led to a safer, more robust
       | system. LLMs by their very fundamental design are a probabilistic
       | prediction engine, they merely approximate correctness for
       | varying amounts of time.
       | 
       | I think what the author misses here is that imperfect,
       | probabilistic agents can build reliable, deterministic systems.
       | No one would trust a garbage collection tool based on how
       | reliable the author was, but rather if it proves it can do what
       | it intends to do after extensive testing.
       | 
       | I can certainly see an erosion of trust in the future, with the
       | result being that test-driven development gains even more
       | momentum. Don't trust, and verify.
        
         | acedTrex wrote:
         | > I think what the author misses here is that imperfect,
         | probabilistic agents can build reliable, deterministic systems.
         | No one would trust a garbage collection tool based on how
         | reliable the author was, but rather if it proves it can do what
         | it intends to do after extensive testing.
         | 
         | > but rather if it proves it can do what it intends to do after
         | extensive testing.
         | 
         | Author here: Here I was less talking about the effectiveness of
         | the output of a given tool and more so about the tool itself.
         | 
         | To take your garbage collection example, sure perhaps an
         | agentic system at some point can spin some stuff up and beat it
         | into submission with test harnesses, bug fixes etc.
         | 
         | But, imagine you used the model AS the garbage collector/tool,
         | in that say every sweep you simply dumped the memory of the
         | program into the model and told it to release the unneeded
         | blocks. You would NEVER be able to trust that the model itself
         | correctly identifies the correct memory blocks and no amount of
         | "patching" or "fine tuning" would ever get you there.
         | 
          | With other historical abstractions like, say, the JVM: if the
          | deterministic output (in this case the assembly the JIT emits)
          | is incorrect, that bug is patched and the abstraction will
          | never have that same fault again. Not so with LLMs.
         | 
         | To me that distinction is very important when trying to point
         | out previous developer tooling that changed the entire nature
          | of the industry. That's not to say I don't think LLMs will have
         | a profound impact on the way things work in the future. But I
         | do think we are in completely uncharted territory with limited
          | historical precedent to guide us.
        
         | lbalazscs wrote:
         | It's naive to hope that automatic tests will find all problems.
         | There are several types of problems that are hard to detect
         | automatically: concurrency problems, resource management
         | errors, security vulnerabilities, etc.
         | 
         | An even more important question: who tests the tests
         | themselves? In traditional development, every piece of logic is
         | implemented twice: once in the code and once in the tests. The
          | tests check the code, and in turn, the code implicitly checks
         | the tests. It's quite common to find that a bug was actually in
         | the tests, not the app code. You can't just blindly trust the
         | tests, and wait until your agent finds a way to replicate a
         | test bug in the code.
        
         | bluefirebrand wrote:
         | > I think what the author misses here is that imperfect,
         | probabilistic agents can build reliable, deterministic systems
         | 
         | That is quite a statement! You're talking about systems that
         | are essentially entropy-machines somehow creating order?
         | 
         | > with the result being that test-driven development gains even
         | more momentum
         | 
          | Why is it that TDD is always put forward as the silver bullet
          | that fixes all issues with building software?
          | 
          | The number of times I've seen TDD build the wrong software
          | after starting with the wrong tests is actually embarrassing.
        
       | dirkc wrote:
       | I have a friend that always says "innovation happens at the speed
       | of trust". Ever since GPT3, that quote comes to mind over and
       | over.
       | 
       | Verification has a high cost and trust is the main way to lower
       | that cost. I don't see how one can build trust in LLMs. While
       | they are extremely articulate in both code and natural language,
       | they will also happily go down fractal rabbit holes and show
       | behavior I would consider malicious in a person.
        
         | acedTrex wrote:
         | Author here: I quite like that quote. A very succinct way of
         | saying what took me a few paragraphs.
         | 
         | This new world of having to verify every single thing at all
         | points is quite exhausting and frankly pretty slow.
        
           | Herring wrote:
           | So get another LLM to do it. Judging is considerably easier
           | [For LLMs] than writing something from scratch, so LLM judges
           | will always have that edge in accuracy. Equivalently, I also
           | like getting them to write tons of tests to build trust in
           | correct behavior.
        
             | acedTrex wrote:
             | > Judging is considerably easier than writing something
             | from scratch
             | 
              | I don't agree with this at all. Writing new code is
              | trivially easy; doing a full in-depth review takes
              | significantly more brain power. You have to fully ascertain
              | and insert yourself into someone else's thought process.
              | That's way more work than utilizing your own thought
              | process.
        
               | Herring wrote:
               | Sorry, I should have been more specific. I meant LLMs are
               | more reliable and accurate at judging than at generating
               | from scratch.
               | 
               | They basically achieve over 80% agreement with human
               | evaluators [1]. This level of agreement is similar to the
               | consensus rate between two human evaluators, making LLM-
               | as-a-judge a scalable and reliable proxy for human
               | judgment.
               | 
               | [1] https://arxiv.org/abs/2306.05685 (2023)
        
               | habinero wrote:
               | 80% is a pretty abysmal success rate and means it's very
               | _unreliable_.
               | 
               | It _sounds_ nice but it means at least 1 in 5 are bad.
                | That's worse odds than rolling 1 on a d6. You'll be
               | tripping over mistakes constantly.
        
               | malfist wrote:
               | LLMs will not have the context behind the lines of code
               | in the CR.
               | 
                | Sure, maybe there's no bug in how the logic is defined in
                | the CR, or even in the context of the project; maybe it
                | won't throw an exception.
               | 
               | But the LLM won't know that the query is iterating over
               | an unindexed field in the DB with the table in prod
               | having 10s of millions of rows. The LLM won't know that
               | even though the code says the button should be red and
               | the comments say the button should be red, the corporate
               | style guide says red should be a very specific hex code
               | that it isn't.
        
             | inetknght wrote:
             | > _So get another LLM to do it._
             | 
             | Oh goodness that's like trusting one kid to tell you
             | whether or not his friend lied.
             | 
             | In matters where trust _matters_ , it's a recipe for
             | disaster.
        
               | Herring wrote:
               | *shrug this kid is growing up _fast_
               | 
               | Give it another year and HN comments will be very
               | different.
               | 
               | Writing tests already works now. It's usually easier to
               | read tests than to read convoluted logic.
        
               | catlifeonmars wrote:
               | It's also easy to misread tests FWIW.
        
               | inetknght wrote:
               | > _shrug this kid is growing up fast_
               | 
               | Mmmhmm. And you think this "growing up" doesn't have
               | biases to lie in circumstances where it matters? Consider
               | politics. Politics _matter_. It 's _inconceivable_ that a
               | magic algorithm would _lie_ to us about various political
               | concerns, right? Right...?
               | 
               | A magic algorithm lying to us about anything would be
               | extremely valuable to _liars_. Do you think it 's
               | possible that liars are guiding the direction of these
               | magic algorithms?
        
               | dingnuts wrote:
               | they've been saying that for three years and the
               | performance improvement has been asymptotic (logarithmic)
               | for a decade, if you've been following the state of the
               | art that long.
        
               | habinero wrote:
               | Sure, and there was a lot of hype about the blockchain a
               | decade ago and how it would take over everything. YC
               | funded a ton of blockchain startups.
               | 
               | I notice a distinct lack of blockchain hegemony.
        
               | malfist wrote:
               | LLMs inspecting LLM code is like the police investigating
                | themselves for wrongdoing.
        
             | skim1420 wrote:
             | 0.9 * 0.9 == 0.81
        
               | kbelder wrote:
               | 0.1 * 0.1 == 0.01
        
           | tayo42 wrote:
            | We do this in professional environments already, with
            | documentation for designs upfront and code reviews, though.
        
           | JackFr wrote:
           | https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref.
           | ..
           | 
           | The classic on the subject.
        
           | EGreg wrote:
           | "Freedom of speech" in politics
        
         | whiplash451 wrote:
         | > "innovation happens at the speed of trust"
         | 
         | You'll have to elaborate on that. How much trust was there in
         | electricity, flight and radioactivity when we discovered them?
         | 
         | In science, you build trust as you go.
        
           | agent281 wrote:
           | Have you heard of the War of the Currents?
           | 
           | > As the use of AC spread rapidly with other companies
           | deploying their own systems, the Edison Electric Light
           | Company claimed in early 1888 that high voltages used in an
           | alternating current system were hazardous, and that the
           | design was inferior to, and infringed on the patents behind,
           | their direct current system.
           | 
           | > In the spring of 1888, a media furor arose over electrical
           | fatalities caused by pole-mounted high-voltage AC lines,
           | attributed to the greed and callousness of the arc lighting
           | companies that operated them.
           | 
           | https://en.wikipedia.org/wiki/War_of_the_currents
        
             | bori5 wrote:
             | Tesla is barely mentioned in that article which is somewhat
             | surprising
        
               | throw4847285 wrote:
               | Not surprising at all. He was a minor player in the
               | Current Wars compared to his primary benefactor, George
               | Westinghouse. His image was rehabilitated first by
               | Serbian-Americans and then by webcomic artists and
               | Redditors, who turned him into a secular saint.
               | 
               | Most of what people think they know about Tesla is not
               | actually true if you examine the historical record. But
               | software engineering as a discipline demands business
               | villains and craftsman heroes, and so Edison and Tesla
               | were warped to fit those roles even though in real life
               | there is only evidence of cordial interactions.
        
           | dirkc wrote:
           | I use it to mean that the more people trust each other, the
           | quicker things get done. Maybe the statement can be rephrased
           | as "progress happens at the speed of trust" to avoid the
           | specific scientific connotation.
        
             | whiplash451 wrote:
             | That's a pretty useless statement in the context of
             | innovation.
             | 
             | The moment a technology reaches trust at scale, it becomes
                | a non-innovation in people's minds.
             | 
             | Happened for TVs, electrical light in homes, AI for chess,
             | and Google. Will happen with LLM-based assistants.
        
               | jazzyjackson wrote:
               | You're not catching on. It's not the trust in the
               | technology, it's the trust between people. Consider
               | business dealings between entities that do not have high
               | trust - everything becomes mediated through lawyers and
               | nothing happens without a contract. Slow and expensive.
               | Handshake deals and promises kept move things along a lot
               | faster and without the expense of hammering out legal
               | arrangements.
               | 
                | LLMs lead to distrust between people. From TFA, _That
               | concept is Trust - It underpins everything about how a
               | group of engineers function and interact with each other
               | in all technical contexts. When you discuss a project
               | architecture you are trusting your team has experience
               | and viewpoints to back up their assertions._
        
             | perrygeo wrote:
             | Importantly, there are many business processes today that
             | are already limited by lack of trust. That's not
             | necessarily a bad thing either - checks and balances exist
             | for a reason. But it does strongly suggest that increasing
             | "productivity" by dumping more inputs into the process is
             | counter-productive to the throughput of the overall system.
        
             | reaperducer wrote:
             | _I use it to mean that the more people trust each other,
             | the quicker things get done._
             | 
             | True not only in innovation, but in business settings.
             | 
             | I don't think there's anyone who works in any business long
             | enough who doesn't have problems getting their job done
             | simply because someone else with a key part of the project
              | doesn't trust that they know what they're doing.
        
           | reaperducer wrote:
           | _How much trust was there in electricity, flight and
           | radioactivity when we discovered them?_
           | 
           | Not much.
           | 
           | Plenty of people were against electricity when it started
           | becoming common. They were terrified of lamps, doorbells,
           | telephones, or anything else with an electric wire. If they
           | were compelled to use these things (like for their job) they
           | would often wear heavy gloves to protect themselves. It is
           | very occasionally mentioned in novels from the late 1800's.
           | 
           | (Edit: If you'd like to see this played out visually, watch
           | the early episodes of Miss Fisher's Murder Mysteries on ABC
           | [.oz])
           | 
           | There are still people afraid of electricity today. There is
           | no shortage of information on the (ironically enough)
           | internet about how to shield your home from the harmful
           | effects of electrical wires, both in the house and utility
           | lines.
           | 
           | Flight? I dunno about back then, but today there's plenty of
           | people who are afraid to fly. If you live in Las Vegas for a
           | while, you start to notice private train cars occasionally
           | parked on the siding near the north outlet mall. These belong
           | to celebrities who are afraid to fly, but have to go to Vegas
           | for work.
           | 
           | Radioactivity? There was a plethora of radioactive hysteria
           | in books, magazines, comics, television, movies, and radio.
           | It's not hard to find.
        
             | whiplash451 wrote:
             | That's exactly my point
        
         | lubujackson wrote:
         | We never can have total trust in LLM output, but we can
          | certainly sanitize it and limit its destructive range. Just
         | like we sanitize user input and defend with pentests and hide
         | secrets in dot files, we will eventually resolve to "best
         | practices" and some "SOC-AI compliance" standard down the road.
         | 
         | It's just too useful to ignore, and trust is always built,
         | brick by brick. Let's not forget humans are far from reliable
         | anyway. Just like with driving cars, I imagine producing less
         | buggy code (along predefined roads) will soon outpace humans.
         | Then it is just blocking and tackling to improve complexity.
        
           | bluefirebrand wrote:
           | > We never can have total trust in LLM output, but we can
            | certainly sanitize it and limit its destructive range
           | 
           | Can we really do this reliably? LLMs are non-deterministic,
           | right, so how do we validate the output in a deterministic
           | way?
           | 
           | We can validate things like shape of data being returned, but
           | how do we validate correctness without an independent human
           | in the loop to verify?
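            | 
            | (For the "shape" part, a deterministic check is easy enough;
            | a tiny sketch, assuming JSON output and hypothetical key
            | names, in Python:)
            | 
            |     import json
            | 
            |     # Structure can be verified deterministically; semantic
            |     # correctness still needs a human (or tests) in the loop.
            |     def has_expected_shape(raw, required_keys=("summary", "severity")):
            |         try:
            |             data = json.loads(raw)
            |         except ValueError:
            |             return False
            |         return isinstance(data, dict) and all(k in data for k in required_keys)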
        
             | lovich wrote:
             | The same way we did it with humans in the loop?
             | 
             | I check AI output for hallucinations and issues as I don't
             | fully trust it to work, but we also do PRs with humans to
             | have another set of eyes check because humans also make
             | mistakes.
             | 
             | For the soft sciences and arts I'm not sure how to validate
             | anything from AI but for software and hard sciences I don't
             | see why test suites wouldn't continue serving their same
             | purpose
        
               | aDyslecticCrow wrote:
               | Famously, "it's easier to write code than to read it".
               | That goes for humans. So why did we automate the easy
               | part and move the effort over to the hard part?
               | 
                | If we need a human in the loop to check every line of
                | code for the deep logic errors... then we could just get
                | the human to write it, no?
        
       | geor9e wrote:
       | They changed the headline to "Yes, I will judge you for using
       | AI..." so I feel like I got the whole story already.
        
       | satisfice wrote:
       | LLMs make bad work-- of any kind-- look like plausibly good work.
       | That's why it is rational to automatically discount the products
       | of anyone who has used AI.
       | 
       | I once had a member of my extended family who turned out to be a
       | con artist. After she was caught, I cut off contact, saying I
       | didn't know her. She said "I am the same person you've known for
       | ten years." And I replied "I suppose so. And now I realized I
       | have never known who that is, and that I never can know."
       | 
       | We all assume the people in our lives are not actively trying to
       | hurt us. When that trust breaks, it breaks hard.
       | 
       | No one who uses AI can claim "this is my work." I don't know that
       | it is your work.
       | 
       | No one who uses AI can claim that it is good work, unless they
       | thoroughly understand it, which they probably don't.
       | 
       | A great many students of mine have claimed to have read and
        | understood articles I have written, yet I discovered they hadn't.
       | What if I were AI and they received my work and put their name on
       | it as author? They'd be unable to explain, defend, or follow up
       | on anything.
       | 
       | This kind of problem is not new to AI. But it has become ten
       | times worse.
        
         | bobjordan wrote:
         | I see where you're coming from, and I appreciate your
         | perspective. The "con artist" analogy is plausible, for the
         | fear of inauthenticity this technology creates. However, I'd
         | like to offer a different view from someone who has been deep
         | in the trenches of full-stack software development.
         | 
         | I'm someone who put in my "+10,000 hours" programming complex
         | applications, before useful LLMs were released. I spent years
         | diving into documentation and other people's source code every
         | night, completely focused on full-stack mastery. Eventually,
         | that commitment led to severe burnout. My health was bad, my
         | marriage was suffering. I released my application and then I
         | immediately had to walk away from it for three years just to
         | recover. I was convinced I'd never pick it up again.
         | 
         | It was hearing many reports that LLMs had gotten good at code
         | that cautiously brought me back to my computer. That's where my
         | experience diverges so strongly from your concerns. You say,
         | "No one who uses AI can claim 'this is my work.'" I have to
         | disagree. When I use an LLM, I am the architect and the final
         | inspector. I direct the vision, design the system, and use a
         | diff tool to review every single line of code it produces. Just
         | recently, I used it as a partner to build a complex
         | optimization model for my business's quote engine. Using a true
         | optimization model was always the "right" way to do it but
         | would have taken me months of grueling work before, learning
         | all details of the library, reading other people's code, etc.
         | We got it done in a week. Do I feel like it's my work?
         | Absolutely. I just had a tireless and brilliant, if sometimes
         | flawed, assistant.
         | 
         | You also claim the user won't "thoroughly understand it." I've
         | found the opposite. To use an LLM effectively for anything non-
         | trivial, you need a deeper understanding of the fundamentals to
         | guide it and to catch its frequent, subtle mistakes. Without my
         | years of experience, I would be unable to steer it for complex
         | multi-module development, debug its output, or know that the
         | "plausibly good work" it produced was actually wrong in some
         | ways (like N+1 problems).
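          | 
          | To illustrate the kind of thing I mean -- a generic sketch
          | with a hypothetical Db type, not code from my quote engine --
          | the N+1 shape looks roughly like this:
          | 
          |   #include <iostream>
          |   #include <string>
          |   #include <vector>
          | 
          |   struct Db {  // stand-in: each call is one round trip
          |     std::vector<int> order_ids() const { return {1, 2, 3}; }
          |     std::string customer_for(int id) const {
          |       return "customer-" + std::to_string(id);
          |     }
          |   };
          | 
          |   int main() {
          |     Db db;
          |     // 1 query for the list...
          |     const auto ids = db.order_ids();
          |     // ...plus N more, one per row. A batched query (a join
          |     // or WHERE id IN (...)) avoids the extra round trips.
          |     for (const auto id : ids) {
          |       std::cout << db.customer_for(id) << "\n";
          |     }
          |   }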
         | 
         | I can sympathize with your experience as a teacher. The problem
         | of students using these tools to fake comprehension is real and
         | difficult. In academia, the process of learning, getting some
         | real fraction of the +10,000hrs is the goal. But in the
         | professional world, the result is the goal, and this is a new,
         | powerful tool to achieve better results. I'm not sure how a
         | teacher should instruct students in this new reality, but
         | demonizing LLM use is probably not the best approach.
         | 
         | For me, it didn't make bad work look good. It made great work
         | possible again, all while allowing me to have my life back. It
         | brought the joy back to my software development craft without
         | killing me or my family to do it. My life is a lot more
         | balanced now and for that, I'm thankful.
        
           | satisfice wrote:
           | Here's the problem, friend: I also have put in my 10,000
           | hours. I've been coding as part of my job since 1983. I
           | switched to testing from production coding in 1987, but I ran
           | a team that tested developer tools, at Apple and Borland, for
           | eight years. I've been living and breathing testing for
           | decades as a consultant and expert witness.
           | 
           | I do not lightly say that I don't trust the work of someone
           | who uses AI. I'm required to practice with LLMs as part of my
           | job. I've developed things with the help of AI. Small things,
           | because the amount of vigilance necessary to do big things is
           | prohibitive.
           | 
           | Fools rush in, they say. I'm not a fool, and I'm not claiming
           | that you are either. What I know is that there is a huge
           | burden of proof on the shoulders of people who claim that AI
           | is NOT problematic-- given the substantial evidence that it
           | behaves recklessly. This burden is not satisfied by people
           | who say "well, I'm experienced and I trust it."
        
             | bobjordan wrote:
             | Thank you for sharing your deep experience. It's a valid
             | perspective, especially from an expert in the world of
             | testing.
             | 
             | You're right to call out the need for vigilance and to
             | place the burden of proof on those of us who advocate for
             | this tool. That burden is not met by simply trusting the
              | AI; you're right, that would be foolish. The burden is met
             | by changing our craft to incorporate the necessary
             | oversight to not be reckless in our use of this new tool.
             | 
             | Coming from the manufacturing world, I think of it like the
              | transition in the metalworking industry from hand tools to
             | advanced CNC machines and robotics. A master craftsman with
              | a set of metalworking files has total, intimate control.
             | When a CNC machine is introduced, it brings incredible
             | speed and capability, but also a new kind of danger. It has
              | no judgment. It will execute a flawed design with perfect
              | precision.
             | 
             | An amateur using the CNC machine will trust it blindly and
             | create "plausibly good" work that doesn't meet the
             | specifications. A master, however, learns a new set of
             | skills: CAD design, calibrating the machine, and, most
             | importantly, inspecting the output. Their vigilance is what
             | turns reckless use of a new tool into an asset that allows
             | them to create things they couldn't before. They don't
             | trust the tool, they trust their process for using it.
             | 
             | My experience with LLM use has been the same. The
             | "vigilance" I practice is my new craft. I spend less time
             | on the manual labor of coding and more time on
             | architecture, design, and critical review. That's the only
             | way to manage the risks.
             | 
             | So I agree with your premise, with one key distinction: I
             | don't believe tools themselves can be reckless, only their
             | users can. Ultimately, like any powerful tool, its value is
             | unlocked not by the tool itself, but by the disciplined,
             | expert process used to control it.
        
       | HardCodedBias wrote:
       | All of this fighting against LLMs is pissing in the wind.
       | 
       | It seems that LLMs, as they work today, make developers more
       | productive. It is possible that they benefit less experienced
       | developers even more than experienced developers.
       | 
       | More productivity, and perhaps very large multiples of
        | productivity, will not be abandoned due to roadblocks constructed
        | by those who oppose the technology for one reason or another.
       | 
        | Examples of the new productivity tool causing enormous harm
        | (e.g. a bug that brings down some large service for a
        | considerable amount of time) will not stop the technology if it
        | is delivering considerable productivity.
       | 
        | Working with the technology and mitigating its weaknesses is the
        | only rational path forward. And those mitigations can't be a set
        | of rules that completely strip the new technology of its
       | productivity gains. The mitigations have to work with the
       | technology to increase its adoption or they will be worked
       | around.
        
         | ge96 wrote:
          | It is funny (ego): I remember when React was new and I refused
          | to learn it; had I learned it earlier I probably would have
          | entered the market years earlier.
         | 
          | Even now I have this refusal to use GPT, whereas my coworkers
          | lately have been saying "ChatGPT says" or that this code was
          | created by ChatGPT. Idk, for me I take pride in writing code
          | myself/not using GPT, but I also still use
          | google/stackoverflow, which you could say is a slower version
          | of GPT.
        
           | anthonypasq wrote:
            | This mindset does not work in software. My dad would still be
            | programming with punchcards if he thought this way. Instead
            | he's using Copilot daily, writing microservices, and isn't
            | some annoying dinosaur.
        
             | ge96 wrote:
              | yeah, it's pro/con; I also hear my coworkers saying "I
              | don't know how it works", or that there are methods in the
              | code that don't exist
             | 
              | But anyway I'm at the point in my career where I am not
              | learning to code/can already do it. Sure, some languages
              | are new and it can help there with syntax.
             | 
              | edit: another thing I'll add: I can see the throughput
              | thing. It's like when a person has never used opensearch
              | before and it's a rabbithole; with anything new there's
              | that wall you have to overcome. But it's like, we'll get
              | the feature done, but did we really understand how it
              | works... do we need to? Idk. I know this person can barely
              | code, but because they use something like ChatGPT they're
              | able to crap out walls of code, and with tweaking it will
              | work eventually -- I am aware this sounds like gatekeeping
              | on my part
             | 
              | Ultimately, personally I don't want to do software
              | professionally; I'm trying to save/invest enough and then
              | get out, just because the job part sucks the fun out of
              | development. I've been in it for about 10 years now, which
              | should have been plenty of time to save, but I'm dumb/too
              | generous.
             | 
              | I think there is healthy skepticism too, vs. just jumping
              | on the bandwagon like everyone else, and really my problem
              | is just that I'm insecure/indecisive; I don't need everyone
              | to accept me, especially if I don't need money
             | 
              | Last rant: I will be experimenting with agentic stuff, as
              | I do like Jarvis -- making my own voice rec model that
              | runs locally.
        
         | mjr00 wrote:
         | > It seems that LLMs, as they work today, make developers more
         | productive.
         | 
         | Think this strongly depends on the developer and what they're
         | attempting to accomplish.
         | 
         | In my experience, most people who swear LLMs make them 10x more
         | productive are relatively junior front-end developers or serial
         | startup devs who are constantly greenfielding new apps. These
         | are totally valid use cases, to be clear, but it means a junior
         | front-end dev and a senior embedded C dev tend to talk past
         | each other when they're discussing AI productivity gains.
         | 
         | > Working with the technology and mitigating it's weaknesses is
         | the only rational path forward.
         | 
         | Or just using it more sensibly. As an example: is the idea of
         | an AI "agent" even a good one? The recent incident with
         | Copilot[0] made MS and AI look like a laughingstock. It's
         | possible that trying to let AI autonomously do work just isn't
         | very smart.
         | 
         | As a recent analogy, we can look at blockchain and
         | cryptocurrency. Love it or hate it, it's clear from the success
         | of Coinbase and others that blockchain has found some real, if
         | niche, use cases. But during peak crypto hype, you had people
         | saying stuff like "we're going to track the coffee bean supply
         | chain using blockchain". In 2025 that sounds like an
         | exaggerated joke from Twitter, but in 2020 it was IBM
         | legitimately trying to sell this stuff[1].
         | 
         | It's possible we'll look back and see AI agents, or other
         | current applications of generative AI, as the coffee blockchain
         | of this bubble.
         | 
         | [0]
         | https://www.reddit.com/r/ExperiencedDevs/comments/1krttqo/my...
         | 
         | [1]
         | https://www.forbes.com/sites/robertanzalone/2020/07/15/big-c...
        
           | parineum wrote:
           | > In my experience, most people who swear LLMs make them 10x
           | more productive are relatively junior front-end developers or
           | serial startup devs who are constantly greenfielding new
           | apps. These are totally valid use cases, to be clear, but it
           | means a junior front-end dev and a senior embedded C dev tend
           | to talk past each other when they're discussing AI
           | productivity gains.
           | 
           | I agree with this quite a lot. I also think that those
            | greenfield apps quickly become unmanageable by AI as you need
            | to start applying solutions that are unique/tailored to your
            | objective, or you want to start abstracting some functionality
           | into building components and base classes that the AI hasn't
           | seen before.
           | 
            | I find AI very useful to get me from beginner to
            | intermediate in codebases and domains that I'm not familiar
            | with, but once I get the familiarity, the next steps I take
           | mostly without AI because I want to do novel things it's
           | never seen before.
        
         | scelerat wrote:
         | I didn't see the post as pissing into the wind so much as
         | calling out several caveats of coding with LLMs, especially on
         | teams, and ideas on how to mitigate them.
        
         | conartist6 wrote:
         | And here it is again. "More productive"
         | 
         | But this doesn't mean that the model/human combo is more
         | effective at serving the needs of users! It means "producing
         | more code."
         | 
         | There are no LLMs shipping changesets that delete 2000 lines of
         | code -- that's how you know "making engineers more productive"
         | is a way of talking about how much code is being created...
        
           | eikenberry wrote:
           | My wife's company recently hired some contractors and they
           | were touting their productivity with AI by saying how it
           | allowed them (one person) to write 150k lines of code in 3
           | weeks. They said this without sarcasm. It was funny and scary
           | at the same time that anyone might buy this as a good
           | outcome. Classic lines-of-code metric rearing its ugly head
           | again.
        
         | FuckButtons wrote:
         | I think you're arguing against something the author didn't
         | actually say.
         | 
          | You seem to be claiming that this is a binary, either we will
          | or won't use LLMs, but the author is mostly talking about risk
         | mitigation.
         | 
         | By analogy it seems like you're saying the author is
         | fundamentally against the development of the motor car because
         | they've pointed out that some have exploded whereas before, we
         | had horses which didn't explode, and maybe we should work on
         | making them explode less before we fire up the glue factories.
        
       | observationist wrote:
       | There's no reason to think AI will stop improving, and the rate
       | of improvement is increasing as well, and no reason to think that
       | these tools won't vastly outperform us in the very near future.
        | Putting aside AGI and ASI, simply improving the frameworks of
        | instructions and context, breaking down problems into smaller
        | problems, and improving the methodology of tools will result in
        | quality multiplication.
       | 
        | Making these sorts of blanket assessments of AI, as if it were a
        | singular, static phenomenon, is bad thinking. You can say things
       | like "AI Code bad!" about a particular model, or a particular
       | model used in a particular context, and make sense. You cannot
       | make generalized statements about LLMs as if they are uniform in
       | their flaws and failure modes.
       | 
       | They're as bad now as they're ever going to be again, and they're
       | getting better faster, at a rate outpacing the expectations and
       | predictions of all the experts.
       | 
       | The best experts in the world, working on these systems, have a
       | nearly universal sentiment of "holy shit" when working on and
       | building better AI - we should probably pay attention to what
       | they're seeing and saying.
       | 
       | There's a huge swathe of performance gains to be made in fixing
        | awful human code. There's a ton of low-hanging fruit to be had
       | by doing repetitive and tedious stuff humans won't or can't do.
       | Those two things mean at least 20 or more years of impressive
       | utility from AI code can be had.
       | 
       | Things are just going to get faster, and weirder, and weirder
       | faster.
        
         | christhecaribou wrote:
         | Sure, if we all collectively ignore model collapse.
        
         | ayakaneko wrote:
         | I think that, yes sure, there's no reason to think AI will stop
         | improving.
         | 
          | But I think that everyone is losing trust not because there is
          | no potential for LLMs to write good code; it's the loss of
          | trust in the user who uses LLMs to uncontrollably generate
          | those patches without any knowledge, fact-checking, or
          | verification. (Many of them may not even know how to test it.)
         | 
          | In other words, while an LLM is potentially capable of being a
          | good SWE, the humans behind it right now are spamming, doing
          | nonsense work, and leaving it to unpaid open source maintainers
          | to review and give feedback (most of the time, manually).
        
         | klabb3 wrote:
         | > There's no reason to think AI will stop improving
         | 
         | No, and there's no reason to think cars will stop improving
         | either, but that doesn't mean they will start flying.
         | 
         | The first error is in thinking that AI is converging towards a
         | human brain. To treat this as a null hypothesis is incongruent
          | both with the functional differences between the two and,
          | crucially, with empirical observations of the current
          | trajectory of
         | LLMs. We have seen rapid increases in ability, yes, but those
         | abilities are very asymmetrical by domain. Pattern matching and
         | shitposting? Absolutely crushing humans already. Novel
         | conceptual ideas and consistency checked reasoning? Not so
         | much, eg all that hype around PhD-level novel math problems
          | died down as quickly as it had been manufactured. _If they
          | were_ converging on human brain function, why these vastly
          | uneven increases in ability?
         | 
         | The second error is to assume a superlinear ability improvement
         | when the data has more or less run out and has to be slowly
         | replenished over time, while avoiding the AI pollution in
         | public sources. It's like assuming oil will accelerate if it
         | had run out and we needed to wait for more bio-matter to
         | decompose for every new drop of crude. Can we improve engine
         | design and make ICEs more efficient? Yes, but it's a
         | diminishing returns game. The scaling hypothesis was not
         | exponential but sigmoid, which is in line with most paradigm
         | shifts and novel discoveries.
         | 
         | > Making these sort of blanket assessments of AI, as if it were
         | a singular, static phenomena is bad thinking.
         | 
         | I agree, but do you agree with yourself here? Ie:
         | 
         | > no reason to think that these tools won't vastly outperform
         | us in the very near future
         | 
         | .. so back to single axis again? How is this different from
         | saying calculators outperform humans?
        
           | jrflowers wrote:
           | > shitposting? Absolutely crushing humans already
           | 
           | Where can I go to see language model shitposters that are
           | better than human shitposters?
        
             | klabb3 wrote:
             | On LinkedIn.
             | 
             | (Always remember to use eye protection.)
        
               | jrflowers wrote:
               | Oh. Well that's not super surprising. LinkedIn has the
               | absolute worst posters. That's like saying robots can
               | dance better than humans but the humans in question only
               | know how to do The Robot and forgot most of the steps
        
       | I_Lorem wrote:
       | He's making a good point on trust, but, really, doesn't the trust
        | flow in both directions? Should the Sr. Engineer rubber-stamp or
       | just take a quick glance at Bob's implementation because he's
       | earned his chops, or should the Sr. Engineer apply the same level
       | of review regardless of whether it's Bob, Mary, or Rando
       | Calrissian submitting their work for review?
        
         | eikenberry wrote:
         | The Sr. Engineer should definitely give (presumably another Sr.
          | Eng.) Bob's code a quick review and approve it. If Mary or
         | Rando are Sr. then they should get the same level as well. If
         | anyone is a Jr. they should get a much more in-depth review as
         | it's a teaching opportunity, whereas Sr. on Sr. reviews are
         | done to enforce conventions and to be sure the PR has an
         | audience (people take more care when they know other people
         | will look at it).
        
       | macawfish wrote:
       | I bumped into this at work but not in the way you might expect.
       | My colleague and I were under some pressure to show progress and
       | decided to rush merging a pretty significant refactor I'd been
       | working on. It was a draft PR but we merged it for momentum's
       | sake. The next week some bugs popped up in an untested area of
       | the code.
       | 
       | As we were debugging, my colleague revealed his assumption that
       | I'd used AI to write it, and expressed frustration at trying to
       | understand something AI generated after the fact.
       | 
       | But I hadn't used AI for this. Sure, yes I do use AI to write
       | code. But this code I'd written by hand and with careful
       | deliberate thought to the overall design. The bugs didn't stem
       | from some fundamental flaw in the refactor, they were little
       | oversights in adjusting existing code to a modified API.
       | 
        | This actually ended up being a trust-building experience overall,
        | because my colleague and I got to talk about the tension
       | explicitly. It ended up being a pretty gentle encounter with the
       | power of what's happening right now. In hindsight I'm glad it
        | worked out this way; I could imagine that in a different work
        | environment something like this could have been messier.
       | 
       | Be careful out there.
        
       | throwawayoldie wrote:
       | IMHO, s/may/has/
        
       | benreesman wrote:
       | I'm currently standing up a C++ capability in an org that hasn't
       | historically had one, so things like the style guide and examples
       | folder require a lot of care to give a good start for new
       | contributors.
       | 
       | I have instructions for agents that are different in some details
       | of convention, e.g. human contributors use AAA allocation style,
        | agents are instructed to use type first. I convert code that
        | "graduates" from agent output to review-ready as I review it,
        | which keeps me honest that I don't submit code to other humans
        | for review without scrutinizing it myself: they are able to
        | prompt an LLM without my involvement, and I'm able to ship LLM
        | slop without making a demand on their time. It's an honor
        | system, but a useful one if everyone acts in good faith.
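        | 
        | To make the convention split concrete -- assuming "AAA" here
        | means the almost-always-auto declaration style, with made-up
        | names rather than anything from our actual style guide -- the
        | difference looks roughly like this:
        | 
        |   #include <string>
        |   #include <vector>
        | 
        |   std::vector<std::string> load_names() { return {"a", "b"}; }
        | 
        |   int main() {
        |     // Agent convention: type-first declarations.
        |     std::vector<std::string> names = load_names();
        |     int count = static_cast<int>(names.size());
        | 
        |     // Human convention, what agent code is rewritten to
        |     // when it "graduates": AAA (almost-always-auto).
        |     auto names2 = load_names();
        |     auto count2 = static_cast<int>(names2.size());
        | 
        |     return (count == count2) ? 0 : 1;
        |   }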
       | 
       | I get use from the agents, but I almost always make changes and
       | reconcile contradictions.
        
       | heisenbit wrote:
       | > The reality is that LLMs enable an inexperienced engineer to
       | punch far above their proverbial weight class. That is to say, it
       | allows them to work with concepts immediately that might have
       | taken days, months or even years otherwise to get to that level
       | of output.
       | 
       | At the moment LLMs allow me to punch far above my weight class in
       | Python where I do a short term job. But then I know all the
       | concepts from decades dabbling in other ecosystems. Let's all
        | admit there is a huge amount of accidental complexity (h/t
        | Brooks's "No Silver Bullet") in our world. For better or worse
        | there are skill silos that are now breaking down.
        
       | mensetmanusman wrote:
       | All this means is that the QC is going to be 10x more important.
        
       | wg0 wrote:
        | We have seen those 10x engineers churning out PRs, and huge PRs,
        | faster than anyone can fathom or make sense of the whole damn
        | thing.
       | 
       | Wondering what they would be producing with LLMs?
        
       | lawlessone wrote:
       | One trust breaking issue is we still can't know why the LLM makes
       | specific choices.
       | 
       | Sure we can ask it why it did something but any reason it gives
       | is just something generated to sound plausible.
        
       ___________________________________________________________________
       (page generated 2025-06-26 23:01 UTC)