[HN Gopher] Why your boss isn't worried about AI - "can't you just turn it off?"
___________________________________________________________________
Why your boss isn't worried about AI - "can't you just turn it
off?"
Author : beyarkay
Score : 144 points
Date : 2025-10-14 18:26 UTC (4 hours ago)
(HTM) web link (boydkane.com)
(TXT) w3m dump (boydkane.com)
| kazinator wrote:
| > _AIs will get more reliable over time, like old software is
| more reliable than new software._
|
| :)
|
| Was that a human Freudian slip, or an artificial one?
|
| Yes, old software is often more reliable than new.
| joomla199 wrote:
| Neither, you're reading it wrong. Think of it as codebases
| getting more reliable over time as they accumulate fixes and
| tests. (As opposed to, say, writing code in NodeJS versus C++)
| giancarlostoro wrote:
| Age of Code does not automatically equal quality of code,
| ever. Good code is maintained by good developers. A lot of
| bad code gets pushed out because of management pressure,
| other circumstances, or just bad devs. This is a can of
| worms you're talking your way into.
| 1313ed01 wrote:
| Old code that has been maintained (bugfixed), but not
| messed with too much (i.e. no major rewrites or new
| features), is almost certain to be better than most other
| code, though?
| eptcyka wrote:
| I've read parts of macOS' open source code that has
| surely been around for a while, is maintained, and is
| absolute rubbish.
| DSMan195276 wrote:
| "Bugfixes" doesn't mean the code actually got better, it
| just means someone attempted to fix a bug. I've seen
| plenty of people make code worse and more buggy by trying
| to fix a bug, and also plenty of old "maintained" code
| that still has tons of bugs because it started from the
| wrong foundation and everyone kept bolting on fixes
| around the bad part.
| prasadjoglekar wrote:
| It actually might. Older code running in production is
| almost automatically regression tested with each new fix.
| It might not be pretty, but it's definitely more reliable
| for solving real problems.
| shakna wrote:
| The list of bugs tagged regression at work certainly
| suggests it gets tested... But fixing those
| regressions...? That's a lot of dev time for things that
| don't really have time allocated for them.
| LeifCarrotson wrote:
| You're using different words - the top comment only
| mentioned the reliability of the software, which is only
| tangentially related to the quality, goodness, or badness
| of the code used to write it.
|
| Old software is typically more reliable, not because the
| developers were better or the software engineering targeted
| a higher reliability metric, but because it's been tested
| in the real world for years. Even more so if you consider a
| known bug to be "reliable" behavior: "Sure, it crashes when
| you enter an apostrophe in the name field, but everyone
| knows that, there's a sticky note taped to the
| receptionist's monitor so the new girl doesn't forget."
|
| Maybe the new software has a more comprehensive automated
| testing framework - maybe it simply _has_ tests, where the
| old software had none - but regardless of how accurate you
| make your mock objects, decades of end-to-end testing in
| the real world is hard to replace.
|
| As an industrial controls engineer, when I walk up to a
| machine that's 30 years old but isn't working anymore, I'm
| looking for failed mechanical components. Some switch is
| worn out, a cable got crushed, a bearing is failing...it's
| not the code's fault. It's not even the CMOS battery
| failing and dropping memory this time, because we've had
| that problem 4 times already, we recognize it and have a
| procedure to prevent it happening again. The code didn't
| change spontaneously, it's solved the business problem for
| decades... Conversely, when I walk up to a newly
| commissioned machine that's only been on the floor for a
| month, the problem is probably something that hasn't ever
| been tried before and was missed in the test procedure.
| freetime2 wrote:
| Yup, I have worked on several legacy codebases, and a
| pretty common occurrence is that a new team member will
| join and think they may have discovered a bug in the
| code. Sometimes they are even quite adamant that the code
| is complete garbage and could never have worked properly.
| Usually the conversation goes something like: "This code
| is heavily used in production, and hasn't been touched in
| 10 years. If it's broken, then why haven't we had any
| complaints from users?"
|
| And more often than not the issue is a local
| configuration issue, bad test data, a misunderstanding of
| what the code is supposed to do, not being aware of some
| alternate execution path or other pre/post processing
| that is running, some known issue that we've decided not
| to fix for some reason, etc. (And of course sometimes we
| do actually discover a completely new bug, but it's
| rare).
|
| To be clear, there are certainly code quality issues
| present that make _modifications_ to the code costly and
| risky. But the code itself is quite reliable, as most
| bugs have been found and fixed over the years. And a lot
| of the messy bits in the code are actually important
| usability enhancements that get bolted on after the fact
| in response to real-world user feedback.
| hatthew wrote:
| I think we all agree that the quality of the code itself
| goes down over time. I think the point that is being made
| is that the quality of the _final product_ goes up over
| time.
|
| E.g. you might fix a bug by adding a hacky workaround in
| the code; better product, worse code.
| kube-system wrote:
| The author didn't mean that an older commit date on a file
| makes code better.
|
| The author is talking about the maturity of a project.
| Likewise, as AI technologies become more mature we will
| have more tools to use them in a safer and more reliable
| way.
| izzydata wrote:
| Sounds more like survivorship bias. All the bad codebases
| were thrown out and only the good ones lasted a long time.
| wsc981 wrote:
| Basically the Lindy Effect:
| https://en.wikipedia.org/wiki/Lindy_effect
| wvenable wrote:
| In my experience actively maintained but not heavily
| modified applications tend towards stability over time. It
| doesn't even matter whether the codebase is good or bad --
| even bad code will become less buggy over time if someone
| is working on bug fixes.
|
| New code is the source of new bugs. Whether that's an
| entirely new product, a new feature on an existing project,
| or refactoring.
| kazinator wrote:
| You mean think of it as opposite to what is written in the
| remark, and then find it funny?
|
| Yes, I did that.
| james_marks wrote:
| I've always called this "Work Hardening", as in, the software
| has been improved over time by real work being done with it.
| glitchc wrote:
| Perhaps better rephrased as "software that's been running for a
| (long) while is more reliable than software that only started
| running recently."
| kstrauser wrote:
| Holy survivorship bias, Batman.
|
| If you think modern software is unreliable, let me introduce
| you to our friend, Rational Rose.
| kazinator wrote:
| At least that project was wise enough to use Lisp for storing
| its project files.
| noir_lord wrote:
| Agreed.
|
| Or debuggers that would take out the entire OS.
|
| Or a bad driver crashing everything multiple times a week.
|
| Or a misbehaving process not handing control back to the OS.
|
| I grew up in the era of 8 and 16 bit micros and early PCs,
| they were hilariously less stable than modern machines while
| doing far less. There wasn't some halcyon age of near perfect
| software; it's always been a case of things being good enough
| to be good enough, but at least operating systems _did_
| improve.
| malfist wrote:
| Remember BSODs? They used to be a regular occurrence; now
| they're so infrequent they're gone from Windows 11.
| kazinator wrote:
| I remember Linux being remarkably reliable throughout its
| entire life _in spite of being rabidly worked on_.
|
| Windows is only stabilizing because it's basically dead.
| All the activity is in the higher layers, where they are
| racking their brains on how to enshittify the experience,
| and extract value out of the remaining users.
| wlesieutre wrote:
| And the "cooperative multitasking" in old operating
| systems where one program locking up meant the whole
| system was locked up
| krior wrote:
| Gone? I had two last year, let's not overstate things.
| rkomorn wrote:
| My anecdata is that my current PC is four years old, with
| the same OS install, and I can't even recall if I've seen
| one BSoD.
| ClimaxGravely wrote:
| Still get them fairly regularly except now they come with
| a QR code.
| dist-epoch wrote:
| Mostly because Microsoft shut down kernel access, wrote
| its own generic drivers for "simple" devices (USBs,
| printers, sound cards, ...), and made "heavy" drivers
| pass their WHQL quality control to be signed and allowed
| to run.
| ponector wrote:
| I guess that is because you run it on old hardware. When
| I bought my expensive Asus ROG laptop, I had BSODs almost
| daily. A year later, with all updates, I had a BSOD about
| once a month on the same device and Windows installation.
| Podrod wrote:
| They're definitely not gone.
| Yoric wrote:
| I grew up in the same era and I recall crashes being less
| frequent.
|
| There were plenty of other issues, including the fact that
| you had to adjust the right IRQ and DMA for your Sound
| Blaster manually, both physically and in each game, or that
| you needed to "optimize" memory usage, enable XMS or EMS or
| whatever it was at the time, or that you spent hours
| looking at the nice defrag/diskopt playing with your files,
| etc.
|
| More generally, as you hint at, desktop operating systems
| were crap, but the software on top of them was much more
| comprehensively debugged. This was presumably a combination
| of two factors: you couldn't ship patches, so you had a
| strong incentive to debug it if you wanted to sell it, and
| software had way fewer features.
|
| Come to think of it, early browsers kept crashing and
| taking down the entire OS, so maybe I'm looking at it
| through rose-tinted glasses.
| binarymax wrote:
| You know, I had spent a good number of years without a
| single thought about Rational Rose, and now that's all
| over.
| cjbgkagh wrote:
| How much of that do you think would be attributable to IBM
| or Rational Software?
| kstrauser wrote:
| I do apologize. I couldn't bear this burden alone.
| fidotron wrote:
| But this is why using the AI in the production of (almost)
| deterministic systems makes so much sense, including saving on
| execution costs.
|
| ISTR someone else round here observing how much more effective it
| is to ask these things to write short scripts that perform a task
| than to have them do the task themselves, and this is my
| experience as well.
|
| If/when AI actually gets much better it will be the boss that has
| the problem. This is one of the things that baffles me about the
| managerial globalists - they don't seem to appreciate that a
| suitably advanced AI will point the finger at them for
| inefficiency much more so than at the plebs, for whom it will
| have a use for quite a while.
| pixl97 wrote:
| >that baffles me about the managerial globalists
|
| It's no different from those on HN who yell loudly that unions
| for programmers are the worst idea ever... "it will never be
| me" is all they can think. Then they are protesting in the
| streets when it is them, but only after the hypocrisy of
| mocking those in the streets protesting today.
| hn_acc1 wrote:
| Agreed. My dad was raised strongly fundamentalist, and in
| North America, that included (back then) strongly resisting
| unions. In hindsight, I've come to realize that my parents
| were maybe not even of average intelligence, and definitely of
| above-average gullibility.
|
| Unionized software engineers would solve a lot of the "we
| always work 80 hour weeks for 2 months at the end of a
| release cycle" problems, the "you're too old, you're fired"
| issues, the "new hires seems to always make more than the
| 5/10+ year veterans", etc. Sure, you wouldn't have a few
| getting super rich, but it would also make it a lot easier
| for "unionized" action against companies like Meta, Google,
| Oracle, etc. Right now, the employers hold like 100x the
| power of the employees in tech. Just look at how much any
| kind of resistance to fascism has dwindled after FAANG had
| another round of layoffs..
| fidotron wrote:
| Software "engineers" totally miss a key thing in other
| engineering professions as well, which is organizations to
| enforce some pretense of ethical standards to help push
| back against requests from product. Those orgs often look a
| lot like unions.
| hn_acc1 wrote:
| A bunch of short scripts doesn't easily lead to a large-scale
| robust software platform.
|
| I guess if managers get canned, it'll be just marketing types
| left?
| xutopia wrote:
| The most likely danger with AI is concentrated power, not that
| sentient AI will develop a dislike for us and use us as
| "batteries" like in the Matrix.
| preciousoo wrote:
| Seems like a self fulfilling prophecy
| yoyohello13 wrote:
| Definitely not 'self' fulfilling. There are plenty of people
| actively and vigorously working to fulfill that particular
| reality.
| fidotron wrote:
| I'm not so sure it will be that either; it would be multiple
| AIs essentially at war with each other over access to
| GPUs/energy or whatever materials are needed to grow, if/when
| that happens. We will end up as pawns in this conflict.
| ben_w wrote:
| Given that even fairly mediocre human intelligences can run
| countries into the ground and avoid being thrown out in the
| process, it's certainly _possible_ for an AI to be in the
| intelligence range where it's smart enough to win vs humans
| but also dumb enough to turn us into pawns rather than just go
| to space and blot out the sun with a Dyson swarm made from the
| planet Mercury.
|
| But don't count on it.
|
| I mean, apart from anything else, that's still a bad outcome.
| pcdevils wrote:
| For one thing, we'd make shit batteries.
| prometheus76 wrote:
| They farm you for attention, not electricity. Attention
| (engagement time) is how they quantify "quality" so that it
| can be gamed with an algorithm.
| noir_lord wrote:
| IIRC the original idea was that the machines used our brain
| capacity as a distributed array, but then they decided
| batteries were easier to understand while being sillier: just
| burn the carbon they're feeding us, it's more efficient.
| darth_avocado wrote:
| The reality is that the CEO/executive class already has
| developed a dislike for us and is trying to use us as
| "batteries" like in the Matrix.
| ljlolel wrote:
| CEOs (even most VCs) are labor too
| pavel_lishin wrote:
| Do they know it?
| toomuchtodo wrote:
| Labor competes for compensation, CEOs compete for status
| (above a certain enterprise size, admittedly). Show me a
| CEO willingly stepping down to be replaced by generative
| AI. Jamie Dimon will be so bold as to say AI will bring about
| a 3 day week (because it grabs headlines [1]) but he isn't
| going to give up the status of running JPMC; it's all he
| has besides the wealth, which does not appear to be enough.
| The feeling of importance and exceptionalism is baked into
| the identity.
|
| [1] https://fortune.com/article/jamie-dimon-jpmorgan-chase-
| ceo-a...
| Animats wrote:
| That's the market's job. Once AI CEOs start outperforming
| human CEOs, investment will flow to the winners. Give it
| 5-10 years.
|
| (Has anyone tried an LLM on an in-basket test? [1] That's
| a basic test for managers.)
|
| [1] https://en.wikipedia.org/wiki/In-basket_test
| conception wrote:
| Spoiler: there's no reason we couldn't work three days a
| week now. And 100 might be pushing it, but a life
| expectancy of 90 is well within our grasp today as well.
| We have just decided not to do that.
| darth_avocado wrote:
| Until shareholders treat them as such, they will remain in
| the ruling class
| icedchai wrote:
| Almost everyone is "labor" to some extent. There is always
| a huge customer or major investor that you are beholden to.
| If you are independently wealthy then you are the
| exception.
| vladms wrote:
| Do you personally know some CEOs? I know a couple and they
| generally seem less empathic than the general population, so
| I don't think that like/dislike even applies.
|
| On the other hand, trying to do something "new" is lots of
| headaches, so emotions are not always a plus. I could make a
| parallel to doctors: you don't want a doctor to start crying
| in the middle of an operation because he feels bad for you,
| but you can't let doctors do everything they want - there
| need to be some checks on them.
| darth_avocado wrote:
| I would say that the parallel is not at all accurate
| because the relationship between a doctor and a patient
| undergoing surgery is not the same as the one you and I
| have with CEOs. And a lot of good doctors have emotions and
| they use them to influence patient outcomes positively.
| nancyminusone wrote:
| To me, the greatest threat is information pollution. Primary
| sources will be diluted so heavily in an ocean of generated
| trash that you might as well not even bother to look through
| any of it.
| tobias3 wrote:
| And it imitates all the unimportant bits perfectly (like
| spelling, grammar, word choice) while failing at the hard to
| verify important bits (truth, consistency, novelty)
| worldsayshi wrote:
| > power resides where men believe it resides
|
| And also where people believe that others believe it resides.
| Etc...
|
| If we can find new ways to collectively renegotiate where we
| think power should reside we can break the cycle.
|
| But we only have until people are no longer a significant
| power factor to do this. That's still quite some time away,
| though.
| SkyBelow wrote:
| I agree.
|
| Our best technology currently requires teams of people to
| operate and entire legions to maintain. This leads to a sort of
| balance, one single person can never go too far down any path
| on their own unless they convince others to join/follow them.
| That doesn't make this a perfect guard; we've seen it go
| horribly wrong in the past, but, at least in theory, this
| provides a dampening factor. It requires a relatively large
| group to go far along any path, towards good or evil.
|
| AI reduces this. Whether it reduces it to only a handful of
| people, to a single person, or even to 0 people (putting
| itself in charge), the danger of this reduction seems much
| the same.
| mrob wrote:
| Why does an AI need the ability to "dislike" to calculate that
| its goals are best accomplished without any living humans
| around to interfere? Superintelligence doesn't need emotions or
| consciousness to be dangerous.
| Yoric wrote:
| It needs to optimize for something. Like/dislike is an
| anthropomorphization of the concept.
| mrob wrote:
| It's an unhelpful one because it implies the danger is
| somehow the result of irrational or impulsive thought, and
| making the AI smarter will avoid it.
| Yoric wrote:
| That's not how I read it.
|
| Perhaps because most of the smartest people I know are
| regularly irrational or impulsive :)
| ben_w wrote:
| I think most people don't get that; look at how often
| even Star Trek script writers write Straw Vulcans*.
|
| * https://tvtropes.org/pmwiki/pmwiki.php/Main/StrawVulcan
| surgical_fire wrote:
| "AI will take over the world".
|
| I hear that. Then I try to use AI for a simple code task: writing
| unit tests for a class, very similar to other unit tests. It
| fails miserably. It forgets to add an annotation and enters a
| death loop of bullshit code generation. Generates test classes
| that tests failed test classes that test failed test classes
| and so on. Fascinating to watch. I wonder how much CO2 it
| generated while frying some Nvidia GPU in an overpriced data
| center.
|
| AI singularity may happen, but the Mother Brain will be a
| complete moron anyway.
| alecbz wrote:
| Regularly trying to use LLMs to debug coding issues has
| convinced me that we're _nowhere_ close to the kind of AGI
| some are imagining is right around the corner.
| surgical_fire wrote:
| At least Mother Brain will praise your prompt to generate
| yet another image in the style of Studio Ghibli as proof
| that your mind is a _tour de force_ in creativity, and only
| a borderline genius would ask for such a thing.
| ben_w wrote:
| Sure, but also the METR study showed the rate of change: t
| doubles every 7 months, where t ~= the duration of human
| time needed to complete a task such that SOTA AI can
| complete the same task with 50% success:
| https://arxiv.org/pdf/2503.14499
|
| I don't know how long that exponential will continue for,
| and I have my suspicions that it stops before week-long
| tasks, but that's the trend-line we're on.
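|
| As a rough back-of-envelope (the 7-month doubling is the trend
| named above, but the ~1-hour starting horizon below is an
| assumption for illustration, not a figure from the paper):
|
|     # Extrapolate the METR-style trend: the 50%-success task
|     # horizon doubles every `doubling_months` months.
|     import math
|
|     def horizon_hours(months, start_hours=1.0, doubling_months=7.0):
|         return start_hours * 2 ** (months / doubling_months)
|
|     for m in (0, 7, 14, 21, 28, 35):
|         print(f"{m:>2} months out: ~{horizon_hours(m):.1f} h tasks")
|
|     # Reaching a week-long (~40 h) horizon from 1 h would take
|     # log2(40) * 7 ~= 37 months -- if the exponential holds.
|     print(math.log2(40) * 7)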
| ben_w wrote:
| Concentrated power is kind of a prerequisite for anything bad
| happening, so yes, it's more likely, in exactly the same way
| that, given this:
|
|     Linda is 31 years old, single, outspoken, and very bright.
|     She majored in philosophy. As a student, she was deeply
|     concerned with issues of discrimination and social justice,
|     and also participated in anti-nuclear demonstrations.
|
| "Linda is a bank teller" is strictly more likely than "Linda is
| a bank teller and is active in the feminist movement" -- all
| you have is P(a)>P(a&b), not what the probability of either
| statement is.
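|
| A minimal sketch of that inequality with made-up numbers (the
| probabilities below are illustrative assumptions, not estimates
| about Linda):
|
|     # P(a) >= P(a & b) holds for any events, since
|     # P(a & b) = P(a) * P(b | a) and P(b | a) <= 1.
|     p_teller = 0.05                 # assumed P(a)
|     p_feminist_given_teller = 0.60  # assumed P(b | a)
|
|     p_both = p_teller * p_feminist_given_teller
|     assert p_teller >= p_both
|     print(p_teller, p_both)         # 0.05 vs 0.03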
| navane wrote:
| The power concentration is already massive, and a huge problem
| indeed. The AI is just a cherry on top. The AI is not the
| problem.
| mmmore wrote:
| You can say that, and I might even agree, but many smart people
| disagree. Could you explain why you believe that? Have you read
| in detail the arguments of people who disagree with you?
| alganet wrote:
| > here are some example ideas that are perfectly true when
| applied to regular software
|
| Hm, I'm listening, let's see.
|
| > Software vulnerabilities are caused by mistakes in the code
|
| That's not exactly true. In regular software, the code can be
| fine and you can still end up with vulnerabilities. The platform
| in which the code is deployed could be vulnerable, or the way it
| is installed could make it vulnerable, and so on.
|
| > Bugs in the code can be found by carefully analysing the code
|
| Once again, not exactly true. Have you ever tried understanding
| concurrent code just by reading it? Some bugs in regular software
| hide in places that human minds cannot probe.
|
| > Once a bug is fixed, it won't come back again
|
| Ok, I'm starting to feel this is a troll post. This guy can't be
| serious.
|
| > If you give specifications beforehand, you can get software
| that meets those specifications
|
| Have you read The Mythical Man-Month?
| SalientBlue wrote:
| You should read the footnote marked [1] after "a note for
| technical folk" at the beginning of the article. He is very
| consciously making sweeping generalizations about how software
| works in order to make things intelligible to non-technical
| readers.
| dkersten wrote:
| Sure, but:
|
| > these claims mostly hold, but they break down when applied
| to distributed systems, parallel code, or complex
| interactions between software systems and human processes
|
| The claims the GP quoted DON'T mostly hold, they're just
| plain wrong. At least the last two, anyway.
| pavel_lishin wrote:
| But are those sweeping generalizations true?
|
| > _I'm also going to be making some sweeping statements about
| "how software works", these claims mostly hold, but they
| break down when applied to distributed systems, parallel
| code, or complex interactions between software systems and
| human processes._
|
| I'd argue that this describes most software written since,
| uh, I hesitate to even commit to a decade here.
| hedora wrote:
| At least the 1950's. That's when stuff like asynchrony and
| interrupts were worked out. Dijkstra wrote at length about
| this in reference to writing code that could drive a
| teletype (which had fundamentally non-deterministic
| timings).
|
| If you include analog computers, then there are some WWII
| targeting computers that definitely qualify (e.g., on
| aircraft carriers).
| SalientBlue wrote:
| For the purposes of the article, which is to demonstrate
| how developing an LLM is completely different from
| developing traditional software, I'd say they are true
| enough. It's a CS 101 understanding of the software
| development lifecycle, which for non-technical readers is
| enough to get the point across. An accurate depiction of
| software development would only obscure the actual point
| for the lay reader.
| alganet wrote:
| Does that really matter?
|
| He is trying to soften the general public's perception of AI's
| shortcomings. He's giving AI a break, at the expense of
| regular developers.
|
| This is wrong on two fronts:
|
| First, because many people foresaw the AI shortcomings and
| warned about them. This "we can't fix a bug like in regular
| software" theatre hides the fact that we can design better
| benchmarks, or accountability frameworks. Again, lots of
| people foresaw this, and they were ignored.
|
| Second, because it puts the strain on non-AI developers. It
| tarnishes the whole industry, lumping AI and non-AI together
| in the same bucket, as if AI companies stumbled on this new
| thing and were not prepared for its problems, when the
| reality is that many people were anxious about the AI
| companies' practices not being up to standard.
|
| I think it's a disgraceful take, one that only serves to sweep
| things under the carpet.
| SalientBlue wrote:
| I don't think he's doing that at all. The article is
| pointing out to non-technical people how AI is different
| than traditional software. I'm not sure how you think it's
| giving AI a break, as it's pointing out that it is
| essentially impossible to reason about. And it's not at the
| expense of regular developers because it's showing how
| regular software development is _different_ than this. It
| makes two buckets, and puts AI in one and non-AI in the
| other.
| alganet wrote:
| He is. Maybe he's just running with the pack, but that
| doesn't matter either.
|
| The fact is, we kind of know how to prevent problems in
| AI systems:
|
| - Good benchmarks. People said several times that LLMs
| display erratic behavior that could be prevented. Instead
| of adjusting the benchmarks (which would slow down
| development), they ignored the issues.
|
| - Accountability frameworks. Who is responsible when an
| AI fails? How is the company responsible for the model
| going to make up for it? That was a demand from the very
| beginning. There are no such accountability systems in
| place. It's a clown fiesta.
|
| - Slowing down. If you have a buggy product, you don't
| scale it. First, you try to understand the problem. This
| was the opposite of what happened, and at the time, they
| lied that scaling would solve the issues (when in fact
| many people knew for a fact that scaling wouldn't solve
| shit).
|
| Yes, it's kind of different. But it's a difference we
| already know. Stop pushing this idea that this stuff is
| completely new.
| SalientBlue wrote:
| >But it's a difference we already know
|
| 'we' is the operative word here. 'We', meaning technical
| people who have followed this stuff for years. The target
| audience of this article are not part of this 'we' and
| this stuff IS completely new _for them_. The target
| audience are people who, when confronted with a problem
| with an LLM, think it is perfectly reasonable to just
| tell someone to 'look at the code' and 'fix the bug'. You
| are not the target audience and you are arguing something
| entirely different.
| alganet wrote:
| Let's pretend I'm the audience, and imagine that in the
| past I said those things ("fix the bug" and "look at the
| code").
|
| What should I say now? "AI works in mysterious ways"?
| Doesn't sound very useful.
|
| Also, should I start parroting inaccurate, outdated
| generalizations about regular software?
|
| The post doesn't teach anything useful for a beginner
| audience. It's bamboozling them. I am amazed that you
| used the audience perspective as a defense of some kind.
| It only made it worse.
|
| Please, please, take a moment to digest my critique
| properly. Think about what you just said and what that
| implies. Re-read the thread if needed.
| drsupergud wrote:
| > bugs are usually caused by problems in the data used to train
| an AI
|
| This also is a misunderstanding.
|
| The LLM can be fine, the training and data can be fine, but
| because the LLMs we use are non-deterministic (at least in the
| sense that entropy is intentionally injected to avoid always
| failing certain scenarios the same way), current algorithms are
| by design not always going to answer every question correctly,
| even when they could have if the sampled values had happened to
| be the right ones for that scenario. You roll the dice on every
| answer.
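|
| Concretely, a toy sketch of what "rolling the dice" means here
| (the token names and probabilities are made up, not from any
| real model):
|
|     import random
|
|     # Toy next-token distribution for one and the same prompt.
|     next_token_probs = {"correct_fix": 0.55,
|                         "plausible_but_wrong": 0.35,
|                         "nonsense": 0.10}
|
|     def sample(probs):
|         # Categorical sampling, as done at nonzero temperature.
|         r, acc = random.random(), 0.0
|         for token, p in probs.items():
|             acc += p
|             if r < acc:
|                 return token
|         return token
|
|     print([sample(next_token_probs) for _ in range(5)])
|     # The model can be "fine" and still give a wrong answer on
|     # any given roll; greedy (temperature 0) decoding removes
|     # the randomness but not the underlying error rate.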
| coliveira wrote:
| This is not necessarily a problem. Any programming or
| mathematical question has several correct answers. The problem
| with LLMs is that they don't have a process to guarantee that a
| solution is correct. They will give a solution that seems
| correct under their heuristic reasoning, but they arrived at
| that result in a non-logical way. That's why LLMs generate so
| many bugs in software and in anything related to logical
| thinking.
| vladms wrote:
| > Any programming or mathematical question has several
| correct answers.
|
| Huh? If I need to sort the list of integers 3,1,2 in
| ascending order the only correct answer is 1,2,3. And there
| are multiple programming and mathematical questions with only
| one correct answer.
|
| If you want to say "some programming and mathematical
| questions have several correct answers" that might hold.
| redblacktree wrote:
| What about multiple notational variations?
|
| 1, 2, 3
|
| 1,2,3
|
| [1,2,3]
|
| 1 2 3
|
| etc.
| naasking wrote:
| I think more charitably, they meant either that 1. There is
| often more than one way to arrive at any given answer, or
| 2. Many questions are ambiguous and so may have many
| different answers.
| Yoric wrote:
| "1, 2, 3" is a correct answer
|
| "1 2 3" is another
|
| "After sorting, we get `1, 2, 3`" yet another
|
| etc.
|
| At least, that's how I understood GP's comment.
| naasking wrote:
| > The problem with LLMs is that they don't have a process to
| guarantee that a solution is correct
|
| Neither do we.
|
| > They will give a solution that seems correct under their
| heuristic reasoning, but they arrived at that result in a
| non-logical way.
|
| As do we, and so you can correctly reframe the issue as
| "there's a gap between the quality of AI heuristics and the
| quality of human heuristics". That the gap is still shrinking
| though.
| tyg13 wrote:
| I'll never doubt the ability of people like yourself to
| consistently mischaracterize human capabilities in order to
| make it seem like LLMs' flaws are just the same as (maybe
| even fewer than!) humans. There are still so many obvious
| errors (noticeable by just using Claude or ChatGPT to do
| some non-trivial task) that the average human would simply
| not make.
|
| And no, just because you can imagine a human stupid enough
| to make the same mistake, doesn't mean that LLMs are
| somehow human in their flaws.
|
| > the gap is still shrinking though
|
| I can tell this human is fond of extrapolation. If the gap
| is getting smaller, surely soon it will be zero, right?
| ben_w wrote:
| > doesn't mean that LLMs are somehow human in their
| flaws.
|
| I don't believe anyone is suggesting that LLMs flaws are
| perfectly 1:1 aligned with human flaws, just that both do
| have flaws.
|
| > If the gap is getting smaller, surely soon it will be
| zero, right?
|
| The gap between y=x^2 and y=-x^2-1 gets closer for a bit,
| fails to ever become zero, then gets bigger.
|
| The difference between any given human (or even all
| humans) and AI will never be zero: Some future AI that
| can _only_ do what one or all of us can do, can be
| trivially glued to any of that other stuff where AI can
| already do better, like chess and go (and stuff simple
| computers can do better, like arithmetic).
| naasking wrote:
| > I'll never doubt the ability of people like yourself to
| consistently mischaracterize human capabilities
|
| Ditto for your mischaracterizations of LLMs.
|
| > There are still so many obvious errors (noticeable by
| just using Claude or ChatGPT to do some non-trivial task)
| that the average human would simply not make.
|
| Firstly, so what? LLMs also do things no human could do.
|
| Secondly, they've learned from unimodal data sets which
| don't have the rich semantic content that humans are
| exposed to (not to mention born with due to evolution).
| Questions that cross modal boundaries are expected to be
| wrong.
|
| > If the gap is getting smaller, surely soon it will be
| zero, right?
|
| Quantify "soon".
| smallnix wrote:
| > bad behaviour isn't caused by any single bad piece of data, but
| by the combined effects of significant fractions of the dataset
|
| Related opposing data point to this statement:
| https://news.ycombinator.com/item?id=45529587
| buellerbueller wrote:
| "Signficiant fraction" does not imply (to this data scientist)
| a large fraction.
| themanmaran wrote:
| > Because eventually we'll iron out all the bugs so the AIs will
| get more reliable over time
|
| Honestly this feels like a true statement to me. It's obviously a
| new technology, but so much of the "non-deterministic ===
| unusable" HN sentiment seems to ignore the last two years where
| LLMs have become 10x as reliable as the initial models.
| criddell wrote:
| Right away my mind went to "well, are people more reliable than
| they used to be?" and I'm not sure they are.
|
| Of course LLMs aren't people, but an AGI might behave like a
| person.
| adastra22 wrote:
| Older people are generally more reliable than younger people.
| Yoric wrote:
| By the time a junior dev graduates to senior, I expect that
| they'll be more reliable. In fact, at the end of each
| project, I expect the junior dev to have grown more reliable.
|
| LLMs don't learn from a project. At best, you learn how to
| better use the LLM.
|
| They do have other benefits, of course, i.e. once you have
| trained one generation of Claude, you have as many instances
| as you need, something that isn't true with human beings.
| Whether that makes up for the lack of quality is an open
| question, which presumably depends on the projects.
| CobrastanJorji wrote:
| They have certainly gotten better, but it seems to me like the
| growth will be kind of logarithmic. I'd expect them to keep
| getting better quickly for a few more years and then kinda slow
| and eventually flatline as we reach the maximum for this sort
| of pattern matching kind of ML. And I expect that flat line
| will be well below the threshold needed for, say, a small
| software company to not require a programmer.
| Terr_ wrote:
| > kind of logarithmic
|
| https://en.wikipedia.org/wiki/Sigmoid_function
| CobrastanJorji wrote:
| Ironically, yes. :)
| freediver wrote:
| Lovely blog, RSS please.
| meonkeys wrote:
| There's... something at https://boydkane.com/index.xml
|
| I guessed the URL based on the Quartz docs. It seems to work
| but only has a few items from https://boydkane.com/essays/
| 5- wrote:
| the author (either of the blog or its software) would do well
| to consult https://www.petefreitag.com/blog/rss-autodiscovery/
| nlawalker wrote:
| Where did _" can't you just turn it off?"_ in the title come
| from? It doesn't appear anywhere in the actual title or the
| article, and I don't think it really aligns with its main
| assertions.
| meonkeys wrote:
| It shows up at https://boydkane.com under the link "Why your
| boss isn't worried about advanced AI". Must be some kind of
| sub-heading, but not part of the actual article / blog post.
|
| Presumably it's a phrase you might hear from a boss who sees AI
| as similar to (and as benign/known/deterministic as) most other
| software, per TFA
| nlawalker wrote:
| Ah, thanks for that!
|
| _> Presumably it's a phrase you might hear from a boss who
| sees AI as similar to (and as benign/known/deterministic as)
| most other software, per TFA_
|
| Yeah I get that, but I think that given the content of the
| article, _" can't you just fix the code?"_ or the like would
| have been a better fit.
| omnicognate wrote:
| It's a poor choice of phrase if the purpose is to illustrate
| a false equivalence. It applies to AI both as much (you can
| kill a process or stop a machine just the same regardless of
| whether it's running an LLM) and as little (you can't "turn
| off" Facebook any more than you can "turn off" ChatGPT) as it
| does to any other kind of software.
| Izkata wrote:
| It's a sci-fi thing, think of it along the lines of "What do
| you mean Skynet has gone rogue? Can't you just turn it off?"
|
| (I think something along these lines was actually in the
| _Terminator 3_ movie, the one where Skynet goes live for the
| first time).
|
| Agreed though, no relation to the actual post.
| wmf wrote:
| Turning AI off comes up a lot in existential risk discussions
| so I was surprised the article isn't about that.
| mikkupikku wrote:
| I don't understand the "your boss" framing of this article, or
| more accurately, the title of this article. The article contents
| don't actually seem to have anything to do with management
| specifically. Is the reader meant to believe that not being
| scared of AI is a characteristic of the managerial class? Is the
| unstated implication that there is some class warfare angle and
| anybody who isn't against AI is against laborers? Because what
| the article actually overtly argues, without any reading between
| the lines, is quite mundane.
| freetime2 wrote:
| > Is the unstated implication that there is some class warfare
| angle and anybody who isn't against AI is against laborers?
|
| I didn't read it that way. I read "your boss" as basically
| meaning any non-technical person who may not understand the
| challenges of harnessing LLMs compared to traditional, (more)
| deterministic software development.
| tptacek wrote:
| It would help if this piece was clearer about the context in
| which "AI bugs" reveal themselves. As an argument for why you
| shouldn't have LLMs making unsupervised real-time critical
| decisions, these points are all well taken. AI shouldn't be
| controlling the traffic lights in your town. _We may never reach
| a point where it can._ But among technologists, the major front
| on which these kinds of bugs are discussed is coding agents, and
| almost none of these points apply directly to coding agents:
| agent coding is (or should be) a supervised process.
| wrs wrote:
| My current method for trying to break through this misconception
| is informing people that nobody knows how AI works. Literally.
| Nobody knows. (Note that knowing how to make something is not the
| same as knowing how it works. Take humans as an obvious example.)
| generic92034 wrote:
| Nobody knows (full scope and on every level) how human brains
| work. Still bosses rely on their employees' brains all the
| time.
| candiddevmike wrote:
| I don't understand the point you're making. We know how LLMs
| work; predicting neuron activation, while an interesting thought
| exercise, doesn't really mean LLMs are some mythical black box.
| It's just really expensive math. We haven't invented AI so we
| don't know how it works?
| jongjong wrote:
| This article makes a solid case. The worst kinds of bugs in
| software are not the most obvious ones like syntax errors; they
| are the ones where the code appears to be working correctly,
| until some users do something slightly unusual a few weeks
| after a code change was deployed and it breaks spectacularly,
| but the bug only affects a small fraction of users so developers
| cannot reproduce the issue... And the code change happened so
| long ago that the guilty code isn't even suspected.
| Animats wrote:
| Aim bosses at this article in The Economist.[1] If your boss
| doesn't read The Economist, you need to escalate to a level that
| does.
|
| [1] https://www.economist.com/leaders/2025/09/25/how-to-stop-
| ais...
| Traubenfuchs wrote:
| https://archive.is/R0RJB
| Animats wrote:
| Management summary, from The Economist article:
|
| _" The worst effects of this flaw are reserved for those who
| create what is known as the "lethal trifecta". If a company,
| eager to offer a powerful AI assistant to its employees,
| gives an LLM access to un-trusted data, the ability to read
| valuable secrets and the ability to communicate with the
| outside world at the same time, then trouble is sure to
| follow. And avoiding this is not just a matter for AI
| engineers. Ordinary users, too, need to learn how to use AI
| safely, because installing the wrong combination of apps can
| generate the trifecta accidentally."_
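|
| A minimal sketch of the check that quote implies (the field
| names below are illustrative assumptions, not from any real
| framework or from the article):
|
|     # The "lethal trifecta": untrusted input + access to secrets
|     # + the ability to communicate externally, all in one agent.
|     def has_lethal_trifecta(agent: dict) -> bool:
|         return (agent.get("reads_untrusted_input", False)
|                 and agent.get("can_read_secrets", False)
|                 and agent.get("can_send_externally", False))
|
|     assistant = {"reads_untrusted_input": True,
|                  "can_read_secrets": True,
|                  "can_send_externally": True}
|     print(has_lethal_trifecta(assistant))  # True -> trouble ahead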
| CollinEMac wrote:
| > It's entirely possible that some dangerous capability is hidden
| in ChatGPT, but nobody's figured out the right prompt just yet.
|
| This sounds a little dramatic. The _capabilities_ of ChatGPT are
| known. It generates text and images. The qualities of the content
| of the generated text and images are not fully known.
| alephnerd wrote:
| Also, there's a reason AI Red Teaming is now an ask that is
| getting line item funding from C-Suites.
| luxuryballs wrote:
| Yeah, and to riff off the headline, if something dangerous is
| connected to and taking commands from ChatGPT then you better
| make sure there's a way to turn it off.
| kube-system wrote:
| And that sounds a little reductive. There's a lot that can be
| done with text and images. Some of the most influential people
| and organizations in the world wield their power with text and
| images.
| kelvinjps10 wrote:
| Think of the news about the kid whom ChatGPT encouraged toward
| suicide, or ChatGPT providing the user information on how to
| do illegal activities; these capabilities are the ones that the
| author is referring to.
| Nasrudith wrote:
| Plus there is the 'monkeys with typewriters' problem with both
| danger and hypothetical good. For instance, ChatGPT may
| technically reply to the right prompt with a universal cancer
| cure/vaccine, but pseudorandomly generating it wouldn't help, as
| you wouldn't be able to tell it apart from all the other answers
| we don't know to be true or false.
|
| Likewise, knowing what to ask it (how to make some sort of
| horrific toxic chemical, nuclear bomb, or similar) isn't much
| good if you cannot recognize the answer, and dangerous capability
| depends heavily on what you have available to you. Any idiot can
| be dangerous with C4 and a detonator, or bleach and ammonia. Even
| if ChatGPT could give entirely accurate instructions on how to
| build an atomic bomb, it wouldn't do much good because you
| wouldn't be able to source the tools and materials without
| setting off red flags.
| chasing0entropy wrote:
| 70 years ago we were fascinated by the concept of converting
| analog to a perfect digital copy. In reality, that goal was a
| pipe dream, and the closest we can ever get is a near-identical
| facsimile to which the data fits... But it's still quite easy to
| distinguish digital from true analog by rudimentary means.
|
| Human thought is analog. It is based on chemical reactions, time,
| and unpredictable, (effectively) random physical characteristics.
| AI is an attempt to turn that which is purely digital into a
| rational analog thought equivalent.
|
| No amount of effort, money, power, or rare-mineral-eating
| TPUs will - ever - produce true analog data.
| largbae wrote:
| This is all true. But digital audio and video media has
| captured essentially all economic value outside of live
| performance. So it seems likely that we will find a "good
| enough" in this domain too.
| bcoates wrote:
| It's been closer to 100 years since we figured out information
| theory and discredited this idea (that continuous/analog
| processes have more, or different, information in them than
| discrete/digital ones)
| excalibur wrote:
| > It's entirely possible that some dangerous capability is hidden
| in ChatGPT, but nobody's figured out the right prompt just yet.
|
| Or they have, but chose to exploit or stockpile it rather than
| expose it.
| bitwize wrote:
| Boss: You can just turn it off, can't you?
|
| Me: Ask me later.
| skywhopper wrote:
| Not the point, but I'm confused by the Geoguessr screenshot.
| Under the reasoning for its decision, it mentions "traffic keeps
| to the left" but that is not apparent from the photo.
|
| Then it says the shop sign looks like a "Latin alphabet business
| name rather than Spanish or Portuguese". Uhhh... what? Spanish
| and Portuguese use the Latin alphabet.
| freetime2 wrote:
| For a real world example of the challenges of harnessing LLMs,
| look at Apple. Over a year ago they had a big product launch
| focused on "Apple Intelligence" that was supposed to make heavy
| use of LLMs for agentic workflows. But all we've really gotten
| since then are a couple of minor tools for making emojis,
| summarizing notifications, and proof reading. And they even had
| to roll back the notification summaries for a while for being
| wildly "out of control". [1] And in this year's iPhone launch the
| AI marketing was toned down _significantly_.
|
| I think Apple execs genuinely underestimated how difficult it
| would be to get LLMs to perform up to Apple's typical standards
| of polish and control.
|
| [1] https://www.bbc.com/news/articles/cge93de21n0o
| __loam wrote:
| I'm happy they ate shit here because I like my mac not getting
| co-pilot bullshit forced into it, but apparently Apple had two
| separate teams competing against each other on this topic.
| Supposedly a lot of politics, combined with the general
| difficulty of building LLM products, got in the way of
| delivering a good product.
| andrewmutz wrote:
| Tremendous alpha right now in making scary posts about AI. Fear
| drives clicks. You don't even need to point to current problems,
| all you have to do is say we can't be sure they won't happen in
| the future.
| avalys wrote:
| All the same criticisms are true about hiring humans. You don't
| really know what they're thinking, you don't really know what
| their values and morals are, you can't trust that they'll never
| make a mistake, etc.
___________________________________________________________________
(page generated 2025-10-14 23:00 UTC)