[HN Gopher] Why your boss isn't worried about AI - "can't you just turn it off?"
___________________________________________________________________
Why your boss isn't worried about AI - "can't you just turn it
off?"
Author : beyarkay
Score : 144 points
Date : 2025-10-14 18:26 UTC (4 hours ago)
(HTM) web link (boydkane.com)
(TXT) w3m dump (boydkane.com)
| kazinator wrote:
| > _AIs will get more reliable over time, like old software is
| more reliable than new software._
|
| :)
|
| Was that a human Freudian slip, or an artificial one?
|
| Yes, old software is often more reliable than new.
| joomla199 wrote:
| Neither, you're reading it wrong. Think of it as codebases
| getting more reliable over time as they accumulate fixes and
| tests. (As opposed to, say, writing code in NodeJS versus C++)
| giancarlostoro wrote:
| Age of Code does not automatically equal quality of code,
| ever. Good code is maintained by good developers. A lot of
| bad code gets pushed out because of management pressure,
| other circumstances, or just bad devs. This is a can of
| worms you're talking your way into.
| 1313ed01 wrote:
| Old code that has been maintained (bugfixed), but not
| messed with too much (i.e. no major rewrites or new
| features), is almost certain to be better than most other
| code, though?
| eptcyka wrote:
| I've read parts of macOS' open source code that has
| surely been around for a while, is maintained, and is
| absolute rubbish.
| DSMan195276 wrote:
| "Bugfixes" doesn't mean the code actually got better, it
| just means someone attempted to fix a bug. I've seen
| plenty of people make code worse and more buggy by trying
| to fix a bug, and also plenty of old "maintained" code
| that still has tons of bugs because it started from the
| wrong foundation and everyone kept bolting on fixes
| around the bad part.
| prasadjoglekar wrote:
| It actually might. Older code running in production is
| almost automatically regression tested with each new fix.
| It might not be pretty, but it's definitely more reliable
| for solving real problems.
| shakna wrote:
| The list of bugs tagged regression at work certainly
| suggests it gets tested... But fixing those
| regressions...? That's a lot of dev time for things that
| don't really have time allocated for them.
| LeifCarrotson wrote:
| You're using different words - the top comment only
| mentioned the reliability of the software, which is only
| tangentially related to the quality, goodness, or badness
| of the code used to write it.
|
| Old software is typically more reliable, not because the
| developers were better or the software engineering targeted
| a higher reliability metric, but because it's been tested
| in the real world for years. Even more so if you consider a
| known bug to be "reliable" behavior: "Sure, it crashes when
| you enter an apostrophe in the name field, but everyone
| knows that, there's a sticky note taped to the
| receptionist's monitor so the new girl doesn't forget."
|
| Maybe the new software has a more comprehensive automated
| testing framework - maybe it simply _has_ tests, where the
| old software had none - but regardless of how accurate you
| make your mock objects, decades of end-to-end testing in
| the real world is hard to replace.
|
| As an industrial controls engineer, when I walk up to a
| machine that's 30 years old but isn't working anymore, I'm
| looking for failed mechanical components. Some switch is
| worn out, a cable got crushed, a bearing is failing...it's
| not the code's fault. It's not even the CMOS battery
| failing and dropping memory this time, because we've had
| that problem 4 times already, we recognize it and have a
| procedure to prevent it happening again. The code didn't
| change spontaneously, it's solved the business problem for
| decades... Conversely, when I walk up to a newly
| commissioned machine that's only been on the floor for a
| month, the problem is probably something that hasn't ever
| been tried before and was missed in the test procedure.
| freetime2 wrote:
| Yup, I have worked on several legacy codebases, and a
| pretty common occurrence is that a new team member will
| join and think they may have discovered a bug in the
| code. Sometimes they are even quite adamant that the code
| is complete garbage and could never have worked properly.
| Usually the conversation goes something like: "This code
| is heavily used in production, and hasn't been touched in
| 10 years. If it's broken, then why haven't we had any
| complaints from users?"
|
| And more often than not the issue is a local
| configuration issue, bad test data, a misunderstanding of
| what the code is supposed to do, not being aware of some
| alternate execution path or other pre/post processing
| that is running, some known issue that we've decided not
| to fix for some reason, etc. (And of course sometimes we
| do actually discover a completely new bug, but it's
| rare).
|
| To be clear, there are certainly code quality issues
| present that make _modifications_ to the code costly and
| risky. But the code itself is quite reliable, as most
| bugs have been found and fixed over the years. And a lot
| of the messy bits in the code are actually important
| usability enhancements that get bolted on after the fact
| in response to real-world user feedback.
| hatthew wrote:
| I think we all agree that the quality of the code itself
| goes down over time. I think the point that is being made
| is that the quality of the _final product_ goes up over
| time.
|
| E.g. you might fix a bug by adding a hacky workaround in
| the code; better product, worse code.
| kube-system wrote:
| The author didn't mean that an older commit date on a file
| makes code better.
|
| The author is talking about the maturity of a project.
| Likewise, as AI technologies become more mature we will
| have more tools to use them in a safer and more reliable
| way.
| izzydata wrote:
| Sounds more like survivorship bias. All the bad codebases
| were thrown out and only the good ones lasted a long time.
| wsc981 wrote:
| Basically the Lindy Effect:
| https://en.wikipedia.org/wiki/Lindy_effect
| wvenable wrote:
| In my experience actively maintained but not heavily
| modified applications tend towards stability over time. It
| doesn't even matter whether the codebase is good or bad --
| even bad code will become less buggy over time if someone
| is working on bug fixes.
|
| New code is the source of new bugs. Whether that's an
| entirely new product, a new feature on an existing project,
| or refactoring.
| kazinator wrote:
| You mean think of it as opposite to what is written in the
| remark, and then find it funny?
|
| Yes, I did that.
| james_marks wrote:
| I've always called this "Work Hardening", as in, the software
| has been improved over time by real work being done with it.
| glitchc wrote:
| Perhaps better rephrased as "software that's been running for a
| (long) while is more reliable than software that only started
| running recently."
| kstrauser wrote:
| Holy survivorship bias, Batman.
|
| If you think modern software is unreliable, let me introduce
| you to our friend, Rational Rose.
| kazinator wrote:
| At least that project was wise enough to use Lisp for storing
| its project files.
| noir_lord wrote:
| Agreed.
|
| Or debuggers that would take out the entire OS.
|
| Or a bad driver crashing everything multiple times a week.
|
| Or a misbehaving process not handing control back to the OS.
|
| I grew up in the era of 8 and 16 bit micros and early PCs,
| they were hilariously less stable than modern machines while
| doing far less. There wasn't some halcyon age of near perfect
| software; it's always been a case of things being good enough
| to be good enough, but at least operating systems _did_
| improve.
| malfist wrote:
| Remember BSODs? They used to be a regular occurrence; now
| they're so infrequent they're gone from Windows 11.
| kazinator wrote:
| I remember Linux being remarkably reliable throughout its
| entire life _in spite of being rabidly worked on_.
|
| Windows is only stabilizing because it's basically dead.
| All the activity is in the higher layers, where they are
| racking their brains on how to enshittify the experience,
| and extract value out of the remaining users.
| wlesieutre wrote:
| And the "cooperative multitasking" in old operating
| systems where one program locking up meant the whole
| system was locked up
| krior wrote:
| Gone? I had two last year, let's not overstate things.
| rkomorn wrote:
| My anecdata is that my current PC is four years old, with
| the same OS install, and I can't even recall if I've seen
| one BSoD.
| ClimaxGravely wrote:
| Still get them fairly regularly except now they come with
| a QR code.
| dist-epoch wrote:
| Mostly because Microsoft shut down kernel access, wrote
| its own generic drivers for "simple" devices (USBs,
| printers, sound cards, ...), and made "heavy" drivers
| pass their WHQL quality control to be signed and allowed
| to run.
| ponector wrote:
| I guess that is because you run it on old hardware. When
| I bought my expensive Asus ROG laptop, I had BSODs almost
| daily. A year later, with all updates, I had a BSOD about
| once a month on the same device and Windows installation.
| Podrod wrote:
| They're definitely not gone.
| Yoric wrote:
| I grew up in the same era and I recall crashes being less
| frequent.
|
| There were plenty of other issues, including the fact that
| you had to adjust the right IRQ and DMA for your Sound
| Blaster manually, both physically and in each game, or that
| you needed to "optimize" memory usage, enable XMS or EMS or
| whatever it was at the time, or that you spent hours
| looking at the nice defrag/diskopt playing with your files,
| etc.
|
| More generally, as you hint at, desktop operating systems
| were crap, but the software on top of them was much more
| comprehensively debugged. This was presumably a combination
| of two factors: you couldn't ship patches, so you had a
| strong incentive to debug it if you wanted to sell it, and
| software had way fewer features.
|
| Come to think of it, early browsers kept crashing and
| taking down the entire OS, so maybe I'm looking at it
| through rose-tinted glasses.
| binarymax wrote:
| You know, I had spent a good number of years without a
| single thought about Rational Rose, and now that's all
| over.
| cjbgkagh wrote:
| How much of that do you think would be attributable to IBM
| or Rational Software?
| kstrauser wrote:
| I do apologize. I couldn't bear this burden alone.
| fidotron wrote:
| But this is why using the AI in the production of (almost)
| deterministic systems makes so much sense, including saving on
| execution costs.
|
| ISTR someone else round here observing how much more effective it
| is to ask these things to write short scripts that perform a task
| than to have them do the task themselves, and this is my
| experience as well.
|
| If/when AI actually gets much better it will be the boss that has
| the problem. This is one of the things that baffles me about the
| managerial globalists - they don't seem to appreciate that a
| suitably advanced AI will point the finger at them for
| inefficiency much more so than at the plebs, for whom it will
| have a use for quite a while.
| pixl97 wrote:
| >that baffles me about the managerial globalists
|
| It's no different from those on HN who yell loudly that unions
| for programmers are the worst idea ever... "it will never be
| me" is all they can think. Then they are protesting in the
| streets when it is them, but only after the hypocrisy of
| mocking those in the streets protesting today.
| hn_acc1 wrote:
| Agreed. My dad was raised strongly fundamentalist, and in
| North America, that included (back then) strongly resisting
| unions. In hindsight, I've come to realize that my parents
| were maybe not even of average intelligence, and definitely of
| above-average gullibility.
|
| Unionized software engineers would solve a lot of the "we
| always work 80 hour weeks for 2 months at the end of a
| release cycle" problems, the "you're too old, you're fired"
| issues, the "new hires seems to always make more than the
| 5/10+ year veterans", etc. Sure, you wouldn't have a few
| getting super rich, but it would also make it a lot easier
| for "unionized" action against companies like Meta, Google,
| Oracle, etc. Right now, the employers hold like 100x the
| power of the employees in tech. Just look at how much any
| kind of resistance to fascism has dwindled after FAANG had
| another round of layoffs..
| fidotron wrote:
| Software "engineers" totally miss a key thing in other
| engineering professions as well, which is organizations to
| enforce some pretense of ethical standards to help push
| back against requests from product. Those orgs often look a
| lot like unions.
| hn_acc1 wrote:
| A bunch of short scripts doesn't easily lead to a large-scale
| robust software platform.
|
| I guess if managers get canned, it'll be just marketing types
| left?
| xutopia wrote:
| The most likely danger with AI is concentrated power, not that
| sentient AI will develop a dislike for us and use us as
| "batteries" like in the Matrix.
| preciousoo wrote:
| Seems like a self fulfilling prophecy
| yoyohello13 wrote:
| Definitely not 'self' fulfilling. There are plenty of people
| actively and vigorously working to fulfill that particular
| reality.
| fidotron wrote:
| I'm not so sure it will be that either; it would be multiple
| AIs essentially at war with each other over access to
| GPUs/energy or whatever materials are needed to grow, if/when
| that happens. We will end up as pawns in this conflict.
| ben_w wrote:
| Given that even fairly mediocre human intelligences can run
| countries into the ground and avoid being thrown out in the
| process, it's certainly _possible_ for an AI to be in the
| intelligence range where it's smart enough to win vs humans
| but also dumb enough to turn us into pawns rather than just go
| to space and blot out the sun with a Dyson swarm made from the
| planet Mercury.
|
| But don't count on it.
|
| I mean, apart from anything else, that's still a bad outcome.
| pcdevils wrote:
| For one thing, we'd make shit batteries.
| prometheus76 wrote:
| They farm you for attention, not electricity. Attention
| (engagement time) is how they quantify "quality" so that it
| can be gamed with an algorithm.
| noir_lord wrote:
| IIRC the original idea was that the machines used our brain
| capacity as a distributed array, but then they decided
| batteries were easier to understand while being sillier: just
| burn the carbon they're feeding us, it's more efficient.
| darth_avocado wrote:
| The reality is that the CEO/executive class already has
| developed a dislike for us and is trying to use us as
| "batteries" like in the Matrix.
| ljlolel wrote:
| CEOs (even most VCs) are labor too
| pavel_lishin wrote:
| Do they know it?
| toomuchtodo wrote:
| Labor competes for compensation, CEOs compete for status
| (above a certain enterprise size, admittedly). Show me a
| CEO willingly stepping down to be replaced by generative
| AI. Jamie Dimon will be so bold as to say AI will bring about
| a 3 day week (because it grabs headlines [1]) but he isn't
| going to give up the status of running JPMC; it's all he
| has besides the wealth, which does not appear to be enough.
| The feeling of importance and exceptionalism is baked into
| the identity.
|
| [1] https://fortune.com/article/jamie-dimon-jpmorgan-chase-
| ceo-a...
| Animats wrote:
| That's the market's job. Once AI CEOs start outperforming
| human CEOs, investment will flow to the winners. Give it
| 5-10 years.
|
| (Has anyone tried an LLM on an in-basket test? [1] That's
| a basic test for managers.)
|
| [1] https://en.wikipedia.org/wiki/In-basket_test
| conception wrote:
| Spoiler: there's no reason we couldn't work three days a
| week now. And 100 might be pushing it, but a life
| expectancy of 90 is well within our grasp today as well.
| We have just decided not to do that.
| darth_avocado wrote:
| Until shareholders treat them as such, they will remain in
| the ruling class
| icedchai wrote:
| Almost everyone is "labor" to some extent. There is always
| a huge customer or major investor that you are beholden to.
| If you are independently wealthy then you are the
| exception.
| vladms wrote:
| Do you personally know some CEOs? I know a couple and they
| generally seem less empathic than the general population, so
| I don't think that like/dislike even applies.
|
| On the other hand, trying to do something "new" is lots of
| headaches, so emotions are not always a plus. I could make a
| parallel to doctors: you don't want a doctor to start crying
| in the middle of an operation because he feels bad for you,
| but you can't let doctors do everything they want - there
| need to be some checks on them.
| darth_avocado wrote:
| I would say that the parallel is not at all accurate
| because the relationship between a doctor and a patient
| undergoing surgery is not the same as the one you and I
| have with CEOs. And a lot of good doctors have emotions and
| they use them to influence patient outcomes positively.
| nancyminusone wrote:
| To me, the greatest threat is information pollution. Primary
| sources will be diluted so heavily in an ocean of generated
| trash that you might as well not even bother to look through
| any of it.
| tobias3 wrote:
| And it imitates all the unimportant bits perfectly (like
| spelling, grammar, word choice) while failing at the hard to
| verify important bits (truth, consistency, novelty)
| worldsayshi wrote:
| > power resides where men believe it resides
|
| And also where people believe that others believe it resides.
| Etc...
|
| If we can find new ways to collectively renegotiate where we
| think power should reside we can break the cycle.
|
| But we only have until people are no longer a significant
| power factor to do this. That's still quite some time away,
| though.
| SkyBelow wrote:
| I agree.
|
| Our best technology currently requires teams of people to
| operate and entire legions to maintain. This leads to a sort of
| balance, one single person can never go too far down any path
| on their own unless they convince others to join/follow them.
| That doesn't make this a perfect guard; we've seen it go
| horribly wrong in the past, but, at least in theory, this
| provides a dampening factor. It requires a relatively large
| group to go far along any path, towards good or evil.
|
| AI reduces this. Whether it reduces it to only a handful of
| people, to a single person, or even to 0 people (putting
| itself in charge), the danger of this reduction seems much
| the same.
| mrob wrote:
| Why does an AI need the ability to "dislike" to calculate that
| its goals are best accomplished without any living humans
| around to interfere? Superintelligence doesn't need emotions or
| consciousness to be dangerous.
| Yoric wrote:
| It needs to optimize for something. Like/dislike is an
| anthropomorphization of the concept.
| mrob wrote:
| It's an unhelpful one because it implies the danger is
| somehow the result of irrational or impulsive thought, and
| making the AI smarter will avoid it.
| Yoric wrote:
| That's not how I read it.
|
| Perhaps because most of the smartest people I know are
| regularly irrational or impulsive :)
| ben_w wrote:
| I think most people don't get that; look at how often
| even Star Trek script writers write Straw Vulcans*.
|
| * https://tvtropes.org/pmwiki/pmwiki.php/Main/StrawVulcan
| surgical_fire wrote:
| "AI will take over the world".
|
| I hear that. Then I try to use AI for a simple code task: writing
| unit tests for a class, very similar to other unit tests. It
| fails miserably. It forgets to add an annotation and enters a
| death loop of bullshit code generation. Generates test classes
| that tests failed test classes that test failed test classes
| and so on. Fascinating to watch. I wonder how much CO2 it
| generated while frying some Nvidia GPU in an overpriced data
| center.
|
| AI singularity may happen, but the Mother Brain will be a
| complete moron anyway.
| alecbz wrote:
| Regularly trying to use LLMs to debug coding issues has
| convinced me that we're _nowhere_ close to the kind of AGI
| some are imagining is right around the corner.
| surgical_fire wrote:
| At least Mother Brain will praise your prompt to generate
| yet another image in the style of Studio Ghibli as proof
| that your mind is a _tour de force_ in creativity, and only
| a borderline genius would ask for such a thing.
| ben_w wrote:
| Sure, but also the METR study showed the rate of change: t
| doubles every 7 months, where t ~= the duration of human
| time needed to complete a task such that SOTA AI can
| complete the same task with 50% success:
| https://arxiv.org/pdf/2503.14499
|
| I don't know how long that exponential will continue for,
| and I have my suspicions that it stops before week-long
| tasks, but that's the trend-line we're on.
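|
| As a rough back-of-envelope (the 7-month doubling is the trend
| named above, but the ~1-hour starting horizon below is an
| assumption for illustration, not a figure from the paper):
|
|     # Extrapolate the METR-style trend: the 50%-success task
|     # horizon doubles every `doubling_months` months.
|     import math
|
|     def horizon_hours(months, start_hours=1.0, doubling_months=7.0):
|         return start_hours * 2 ** (months / doubling_months)
|
|     for m in (0, 7, 14, 21, 28, 35):
|         print(f"{m:>2} months out: ~{horizon_hours(m):.1f} h tasks")
|
|     # Reaching a week-long (~40 h) horizon from 1 h would take
|     # log2(40) * 7 ~= 37 months -- if the exponential holds.
|     print(math.log2(40) * 7)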
| ben_w wrote:
| Concentrated power is kind of a prerequisite for anything bad
| happening, so yes, it's more likely, in exactly the same way
| that, given this:
|
|     Linda is 31 years old, single, outspoken, and very bright.
|     She majored in philosophy. As a student, she was deeply
|     concerned with issues of discrimination and social justice,
|     and also participated in anti-nuclear demonstrations.
|
| "Linda is a bank teller" is strictly more likely than "Linda is
| a bank teller and is active in the feminist movement" -- all
| you have is P(a)>P(a&b), not what the probability of either
| statement is.
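|
| A minimal sketch of that inequality with made-up numbers (the
| probabilities below are illustrative assumptions, not estimates
| about Linda):
|
|     # P(a) >= P(a & b) holds for any events, since
|     # P(a & b) = P(a) * P(b | a) and P(b | a) <= 1.
|     p_teller = 0.05                 # assumed P(a)
|     p_feminist_given_teller = 0.60  # assumed P(b | a)
|
|     p_both = p_teller * p_feminist_given_teller
|     assert p_teller >= p_both
|     print(p_teller, p_both)         # 0.05 vs 0.03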
| navane wrote:
| The power concentration is already massive, and a huge problem
| indeed. The AI is just a cherry on top. The AI is not the
| problem.
| mmmore wrote:
| You can say that, and I might even agree, but many smart people
| disagree. Could you explain why you believe that? Have you read
| in detail the arguments of people who disagree with you?
| alganet wrote:
| > here are some example ideas that are perfectly true when
| applied to regular software
|
| Hm, I'm listening, let's see.
|
| > Software vulnerabilities are caused by mistakes in the code
|
| That's not exactly true. In regular software, the code can be
| fine and you can still end up with vulnerabilities. The platform
| in which the code is deployed could be vulnerable, or the way it
| is installed could make it vulnerable, and so on.
|
| > Bugs in the code can be found by carefully analysing the code
|
| Once again, not exactly true. Have you ever tried understanding
| concurrent code just by reading it? Some bugs in regular software
| hide in places that human minds cannot probe.
|
| > Once a bug is fixed, it won't come back again
|
| Ok, I'm starting to feel this is a troll post. This guy can't be
| serious.
|
| > If you give specifications beforehand, you can get software
| that meets those specifications
|
| Have you read The Mythical Man-Month?
| SalientBlue wrote:
| You should read the footnote marked [1] after "a note for
| technical folk" at the beginning of the article. He is very
| consciously making sweeping generalizations about how software
| works in order to make things intelligible to non-technical
| readers.
| dkersten wrote:
| Sure, but:
|
| > these claims mostly hold, but they break down when applied
| to distributed systems, parallel code, or complex
| interactions between software systems and human processes
|
| The claims the GP quoted DON'T mostly hold, they're just
| plain wrong. At least the last two, anyway.
| pavel_lishin wrote:
| But are those sweeping generalizations true?
|
| > _I'm also going to be making some sweeping statements about
| "how software works", these claims mostly hold, but they
| break down when applied to distributed systems, parallel
| code, or complex interactions between software systems and
| human processes._
|
| I'd argue that this describes most software written since,
| uh, I hesitate to even commit to a decade here.
| hedora wrote:
| At least the 1950's. That's when stuff like asynchrony and
| interrupts were worked out. Dijkstra wrote at length about
| this in reference to writing code that could drive a
| teletype (which had fundamentally non-deterministic
| timings).
|
| If you include analog computers, then there are some WWII
| targeting computers that definitely qualify (e.g., on
| aircraft carriers).
| SalientBlue wrote:
| For the purposes of the article, which is to demonstrate
| how developing an LLM is completely different from
| developing traditional software, I'd say they are true
| enough. It's a CS 101 understanding of the software
| development lifecycle, which for non-technical readers is
| enough to get the point across. An accurate depiction of
| software development would only obscure the actual point
| for the lay reader.
| alganet wrote:
| Does that really matter?
|
| He is trying to soften the general public's perception of AI's
| shortcomings. He's giving AI a break, at the expense of
| regular developers.
|
| This is wrong on two fronts:
|
| First, because many people foresaw the AI shortcomings and
| warned about them. This "we can't fix a bug like in regular
| software" theatre hides the fact that we can design better
| benchmarks, or accountability frameworks. Again, lots of
| people foresaw this, and they were ignored.
|
| Second, because it puts the strain on non-AI developers. It
| tarnishes the whole industry, lumping AI and non-AI together
| in the same bucket, as if AI companies stumbled on this new
| thing and were not prepared for its problems, when the
| reality is that many people were anxious about the AI
| companies' practices not being up to standard.
|
| I think it's a disgraceful take, one that only serves to sweep
| things under the carpet.
| SalientBlue wrote:
| I don't think he's doing that at all. The article is
| pointing out to non-technical people how AI is different
| than traditional software. I'm not sure how you think it's
| giving AI a break, as it's pointing out that it is
| essentially impossible to reason about. And it's not at the
| expense of regular developers because it's showing how
| regular software development is _different_ than this. It
| makes two buckets, and puts AI in one and non-AI in the
| other.
| alganet wrote:
| He is. Maybe he's just running with the pack, but that
| doesn't matter either.
|
| The fact is, we kind of know how to prevent problems in
| AI systems:
|
| - Good benchmarks. People said several times that LLMs
| display erratic behavior that could be prevented. Instead
| of adjusting the benchmarks (which would slow down
| development), they ignored the issues.
|
| - Accountability frameworks. Who is responsible when an
| AI fails? How is the company responsible for the model
| going to make up for it? That was a demand from the very
| beginning. There are no such accountability systems in
| place. It's a clown fiesta.
|
| - Slowing down. If you have a buggy product, you don't
| scale it. First, you try to understand the problem. This
| was the opposite of what happened, and at the time, they
| lied that scaling would solve the issues (when in fact
| many people knew for a fact that scaling wouldn't solve
| shit).
|
| Yes, it's kind of different. But it's a difference we
| already know. Stop pushing this idea that this stuff is
| completely new.
| SalientBlue wrote:
| >But it's a difference we already know
|
| 'we' is the operative word here. 'We', meaning technical
| people who have followed this stuff for years. The target
| audience of this article are not part of this 'we' and
| this stuff IS completely new _for them_. The target
| audience are people who, when confronted with a problem
| with an LLM, think it is perfectly reasonable to just
| tell someone to 'look at the code' and 'fix the bug'. You
| are not the target audience and you are arguing something
| entirely different.
| alganet wrote:
| Let's pretend I'm the audience, and imagine that in the
| past I said those things ("fix the bug" and "look at the
| code").
|
| What should I say now? "AI works in mysterious ways"?
| Doesn't sound very useful.
|
| Also, should I start parroting inaccurate, outdated
| generalizations about regular software?
|
| The post doesn't teach anything useful for a beginner
| audience. It's bamboozling them. I am amazed that you
| used the audience perspective as a defense of some kind.
| It only made it worse.
|
| Please, please, take a moment to digest my critique
| properly. Think about what you just said and what that
| implies. Re-read the thread if needed.
| drsupergud wrote:
| > bugs are usually caused by problems in the data used to train
| an AI
|
| This also is a misunderstanding.
|
| The LLM can be fine, the training and data can be fine, but
| because the LLMs we use are non-deterministic (at least in the
| sense that entropy is intentionally injected to avoid always
| failing certain scenarios the same way), current algorithms are
| by design not always going to answer every question correctly,
| even when they could have if the sampled values had happened to
| be the right ones for that scenario. You roll the dice on every
| answer.
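|
| Concretely, a toy sketch of what "rolling the dice" means here
| (the token names and probabilities are made up, not from any
| real model):
|
|     import random
|
|     # Toy next-token distribution for one and the same prompt.
|     next_token_probs = {"correct_fix": 0.55,
|                         "plausible_but_wrong": 0.35,
|                         "nonsense": 0.10}
|
|     def sample(probs):
|         # Categorical sampling, as done at nonzero temperature.
|         r, acc = random.random(), 0.0
|         for token, p in probs.items():
|             acc += p
|             if r < acc:
|                 return token
|         return token
|
|     print([sample(next_token_probs) for _ in range(5)])
|     # The model can be "fine" and still give a wrong answer on
|     # any given roll; greedy (temperature 0) decoding removes
|     # the randomness but not the underlying error rate.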
| coliveira wrote:
| This is not necessarily a problem. Any programming or
| mathematical question has several correct answers. The problem
| with LLMs is that they don't have a process to guarantee that a
| solution is correct. They will give a solution that seems
| correct under their heuristic reasoning, but they arrived at
| that result in a non-logical way. That's why LLMs generate so
| many bugs in software and in anything related to logical
| thinking.
| vladms wrote:
| > Any programming or mathematical question has several
| correct answers.
|
| Huh? If I need to sort the list of integers 3,1,2 in
| ascending order the only correct answer is 1,2,3. And there
| are multiple programming and mathematical questions with only
| one correct answer.
|
| If you want to say "some programming and mathematical
| questions have several correct answers" that might hold.
| redblacktree wrote:
| What about multiple notational variations?
|
| 1, 2, 3
|
| 1,2,3
|
| [1,2,3]
|
| 1 2 3
|
| etc.
| naasking wrote:
| I think more charitably, they meant either that 1. There is
| often more than one way to arrive at any given answer, or
| 2. Many questions are ambiguous and so may have many
| different answers.
| Yoric wrote:
| "1, 2, 3" is a correct answer
|
| "1 2 3" is another
|
| "After sorting, we get `1, 2, 3`" yet another
|
| etc.
|
| At least, that's how I understood GP's comment.
| naasking wrote:
| > The problem with LLMs is that they don't have a process to
| guarantee that a solution is correct
|
| Neither do we.
|
| > They will give a solution that seems correct under their
| heuristic reasoning, but they arrived at that result in a
| non-logical way.
|
| As do we, and so you can correctly reframe the issue as
| "there's a gap between the quality of AI heuristics and the
| quality of human heuristics". That the gap is still shrinking
| though.
| tyg13 wrote:
| I'll never doubt the ability of people like yourself to
| consistently mischaracterize human capabilities in order to
| make it seem like LLMs' flaws are just the same as (maybe
| even fewer than!) humans. There are still so many obvious
| errors (noticeable by just using Claude or ChatGPT to do
| some non-trivial task) that the average human would simply
| not make.
|
| And no, just because you can imagine a human stupid enough
| to make the same mistake, doesn't mean that LLMs are
| somehow human in their flaws.
|
| > the gap is still shrinking though
|
| I can tell this human is fond of extrapolation. If the gap
| is getting smaller, surely soon it will be zero, right?
| ben_w wrote:
| > doesn't mean that LLMs are somehow human in their
| flaws.
|
| I don't believe anyone is suggesting that LLMs flaws are
| perfectly 1:1 aligned with human flaws, just that both do
| have flaws.
|
| > If the gap is getting smaller, surely soon it will be
| zero, right?
|
| The gap between y=x^2 and y=-x^2-1 gets closer for a bit,
| fails to ever become zero, then gets bigger.
|
| The difference between any given human (or even all
| humans) and AI will never be zero: Some future AI that
| can _only_ do what one or all of us can do, can be
| trivially glued to any of that other stuff where AI can
| already do better, like chess and go (and stuff simple
| computers can do better, like arithmetic).
| naasking wrote:
| > I'll never doubt the ability of people like yourself to
| consistently mischaracterize human capabilities
|
| Ditto for your mischaracterizations of LLMs.
|
| > There are still so many obvious errors (noticeable by
| just using Claude or ChatGPT to do some non-trivial task)
| that the average human would simply not make.
|
| Firstly, so what? LLMs also do things no human could do.
|
| Secondly, they've learned from unimodal data sets which
| don't have the rich semantic content that humans are
| exposed to (not to mention born with due to evolution).
| Questions that cross modal boundaries are expected to be
| wrong.
|
| > If the gap is getting smaller, surely soon it will be
| zero, right?
|
| Quantify "soon".
| smallnix wrote:
| > bad behaviour isn't caused by any single bad piece of data, but
| by the combined effects of significant fractions of the dataset
|
| Related opposing data point to this statement:
| https://news.ycombinator.com/item?id=45529587
| buellerbueller wrote:
| "Signficiant fraction" does not imply (to this data scientist)
| a large fraction.
| themanmaran wrote:
| > Because eventually we'll iron out all the bugs so the AIs will
| get more reliable over time
|
| Honestly this feels like a true statement to me. It's obviously a
| new technology, but so much of the "non-deterministic ===
| unusable" HN sentiment seems to ignore the last two years where
| LLMs have become 10x as reliable as the initial models.
| criddell wrote:
| Right away my mind went to "well, are people more reliable than
| they used to be?" and I'm not sure they are.
|
| Of course LLMs aren't people, but an AGI might behave like a
| person.
| adastra22 wrote:
| Older people are generally more reliable than younger people.
| Yoric wrote:
| By the time a junior dev graduates to senior, I expect that
| they'll be more reliable. In fact, at the end of each
| project, I expect the junior dev to have grown more reliable.
|
| LLMs don't learn from a project. At best, you learn how to
| better use the LLM.
|
| They do have other benefits, of course, i.e. once you have
| trained one generation of Claude, you have as many instances
| as you need, something that isn't true with human beings.
| Whether that makes up for the lack of quality is an open
| question, which presumably depends on the projects.
| CobrastanJorji wrote:
| They have certainly gotten better, but it seems to me like the
| growth will be kind of logarithmic. I'd expect them to keep
| getting better quickly for a few more years and then kinda slow
| and eventually flatline as we reach the maximum for this sort
| of pattern matching kind of ML. And I expect that flat line
| will be well below the threshold needed for, say, a small
| software company to not require a programmer.
| Terr_ wrote:
| > kind of logarithmic
|
| https://en.wikipedia.org/wiki/Sigmoid_function
| CobrastanJorji wrote:
| Ironically, yes. :)
| freediver wrote:
| Lovely blog, RSS please.
| meonkeys wrote:
| There's... something at https://boydkane.com/index.xml
|
| I guessed the URL based on the Quartz docs. It seems to work
| but only has a few items from https://boydkane.com/essays/
| 5- wrote:
| the author (either of the blog or its software) would do well
| to consult https://www.petefreitag.com/blog/rss-autodiscovery/
| nlawalker wrote:
| Where did _" can't you just turn it off?"_ in the title come
| from? It doesn't appear anywhere in the actual title or the
| article, and I don't think it really aligns with its main
| assertions.
| meonkeys wrote:
| It shows up at https://boydkane.com under the link "Why your
| boss isn't worried about advanced AI". Must be some kind of
| sub-heading, but not part of the actual article / blog post.
|
| Presumably it's a phrase you might hear from a boss who sees AI
| as similar to (and as benign/known/deterministic as) most other
| software, per TFA
| nlawalker wrote:
| Ah, thanks for that!
|
| _> Presumably it's a phrase you might hear from a boss who
| sees AI as similar to (and as benign/known/deterministic as)
| most other software, per TFA_
|
| Yeah I get that, but I think that given the content of the
| article, _" can't you just fix the code?"_ or the like would
| have been a better fit.
| omnicognate wrote:
| It's a poor choice of phrase if the purpose is to illustrate
| a false equivalence. It applies to AI both as much (you can
| kill a process or stop a machine just the same regardless of
| whether it's running an LLM) and as little (you can't "turn
| off" Facebook any more than you can "turn off" ChatGPT) as it
| does to any other kind of software.
| Izkata wrote:
| It's a sci-fi thing, think of it along the lines of "What do
| you mean Skynet has gone rogue? Can't you just turn it off?"
|
| (I think something along these lines was actually in the
| _Terminator 3_ movie, the one where Skynet goes live for the
| first time).
|
| Agreed though, no relation to the actual post.
| wmf wrote:
| Turning AI off comes up a lot in existential risk discussions
| so I was surprised the article isn't about that.
| mikkupikku wrote:
| I don't understand the "your boss" framing of this article, or
| more accurately, the title of this article. The article contents
| don't actually seem to have anything to do with management
| specifically. Is the reader meant to believe that not being
| scared of AI is a characteristic of the managerial class? Is the
| unstated implication that there is some class warfare angle and
| anybody who isn't against AI is against laborers? Because what
| the article actually overtly argues, without any reading between
| the lines, is quite mundane.
| freetime2 wrote:
| > Is the unstated implication that there is some class warfare
| angle and anybody who isn't against AI is against laborers?
|
| I didn't read it that way. I read "your boss" as basically
| meaning any non-technical person who may not understand the
| challenges of harnessing LLMs compared to traditional, (more)
| deterministic software development.
| tptacek wrote:
| It would help if this piece was clearer about the context in
| which "AI bugs" reveal themselves. As an argument for why you
| shouldn't have LLMs making unsupervised real-time critical
| decisions, these points are all well taken. AI shouldn't be
| controlling the traffic lights in your town. _We may never reach
| a point where it can._ But among technologists, the major front
| on which these kinds of bugs are discussed is coding agents, and
| almost none of these points apply directly to coding agents:
| agent coding is (or should be) a supervised process.
| wrs wrote:
| My current method for trying to break through this misconception
| is informing people that nobody knows how AI works. Literally.
| Nobody knows. (Note that knowing how to make something is not the
| same as knowing how it works. Take humans as an obvious example.)
| generic92034 wrote:
| Nobody knows (full scope and on every level) how human brains
| work. Still bosses rely on their employees' brains all the
| time.
| candiddevmike wrote:
| I don't understand the point you're making. We know how LLMs
| work; predicting neuron activation, while an interesting thought
| exercise, doesn't really mean LLMs are some mythical black box.
| It's just really expensive math. We haven't invented AI so we
| don't know how it works?
| jongjong wrote:
| This article makes a solid case. The worst kinds of bugs in
| software are not the most obvious ones like syntax errors; they
| are the ones where the code appears to be working correctly,
| until some users do something slightly unusual a few weeks
| after a code change was deployed and it breaks spectacularly,
| but the bug only affects a small fraction of users so developers
| cannot reproduce the issue... And the code change happened so
| long ago that the guilty code isn't even suspected.
| Animats wrote:
| Aim bosses at this article in The Economist.[1] If your boss
| doesn't read The Economist, you need to escalate to a level that
| does.
|
| [1] https://www.economist.com/leaders/2025/09/25/how-to-stop-
| ais...
| Traubenfuchs wrote:
| https://archive.is/R0RJB
| Animats wrote:
| Management summary, from The Economist article:
|
| _" The worst effects of this flaw are reserved for those who
| create what is known as the "lethal trifecta". If a company,
| eager to offer a powerful AI assistant to its employees,
| gives an LLM access to un-trusted data, the ability to read
| valuable secrets and the ability to communicate with the
| outside world at the same time, then trouble is sure to
| follow. And avoiding this is not just a matter for AI
| engineers. Ordinary users, too, need to learn how to use AI
| safely, because installing the wrong combination of apps can
| generate the trifecta accidentally."_
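|
| A minimal sketch of the check that quote implies (the field
| names below are illustrative assumptions, not from any real
| framework or from the article):
|
|     # The "lethal trifecta": untrusted input + access to secrets
|     # + the ability to communicate externally, all in one agent.
|     def has_lethal_trifecta(agent: dict) -> bool:
|         return (agent.get("reads_untrusted_input", False)
|                 and agent.get("can_read_secrets", False)
|                 and agent.get("can_send_externally", False))
|
|     assistant = {"reads_untrusted_input": True,
|                  "can_read_secrets": True,
|                  "can_send_externally": True}
|     print(has_lethal_trifecta(assistant))  # True -> trouble ahead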
| CollinEMac wrote:
| > It's entirely possible that some dangerous capability is hidden
| in ChatGPT, but nobody's figured out the right prompt just yet.
|
| This sounds a little dramatic. The _capabilities_ of ChatGPT are
| known. It generates text and images. The qualities of the content
| of the generated text and images are not fully known.
| alephnerd wrote:
| Also, there's a reason AI Red Teaming is now an ask that is
| getting line item funding from C-Suites.
| luxuryballs wrote:
| Yeah, and to riff off the headline, if something dangerous is
| connected to and taking commands from ChatGPT then you better
| make sure there's a way to turn it off.
| kube-system wrote:
| And that sounds a little reductive. There's a lot that can be
| done with text and images. Some of the most influential people
| and organizations in the world wield their power with text and
| images.
| kelvinjps10 wrote:
| Think of the news about the kid whom ChatGPT encouraged toward
| suicide, or ChatGPT providing the user information on how to
| do illegal activities; these capabilities are the ones that the
| author is referring to.
| Nasrudith wrote:
| Plus there is the 'monkeys with typewriters' problem with both
| danger and hypothetical good. For instance, ChatGPT may
| technically reply to the right prompt with a universal cancer
| cure/vaccine, but pseudorandomly generating it wouldn't help, as
| you wouldn't be able to tell it apart from all the other answers
| we don't know to be true or false.
|
| Likewise, knowing what to ask it (how to make some sort of
| horrific toxic chemical, nuclear bomb, or similar) isn't much
| good if you cannot recognize the answer, and dangerous capability
| depends heavily on what you have available to you. Any idiot can
| be dangerous with C4 and a detonator, or bleach and ammonia. Even
| if ChatGPT could give entirely accurate instructions on how to
| build an atomic bomb, it wouldn't do much good because you
| wouldn't be able to source the tools and materials without
| setting off red flags.
| chasing0entropy wrote:
| 70 years ago we were fascinated by the concept of converting
| analog to a perfect digital copy. In reality, that goal was a
| pipe dream, and the closest we can ever get is a near-identical
| facsimile to which the data fits... But it's still quite easy to
| distinguish digital from true analog by rudimentary means.
|
| Human thought is analog. It is based on chemical reactions, time,
| and unpredictable, (effectively) random physical characteristics.
| AI is an attempt to turn that which is purely digital into a
| rational analog thought equivalent.
|
| No amount of effort, money, power, or rare-mineral-eating
| TPUs will - ever - produce true analog data.
| largbae wrote:
| This is all true. But digital audio and video media has
| captured essentially all economic value outside of live
| performance. So it seems likely that we will find a "good
| enough" in this domain too.
| bcoates wrote:
| It's been closer to 100 years since we figured out information
| theory and discredited this idea (that continuous/analog
| processes have more, or different, information in them than
| discrete/digital ones)
| excalibur wrote:
| > It's entirely possible that some dangerous capability is hidden
| in ChatGPT, but nobody's figured out the right prompt just yet.
|
| Or they have, but chose to exploit or stockpile it rather than
| expose it.
| bitwize wrote:
| Boss: You can just turn it off, can't you?
|
| Me: Ask me later.
| skywhopper wrote:
| Not the point, but I'm confused by the Geoguessr screenshot.
| Under the reasoning for its decision, it mentions "traffic keeps
| to the left" but that is not apparent from the photo.
|
| Then it says the shop sign looks like a "Latin alphabet business
| name rather than Spanish or Portuguese". Uhhh... what? Spanish
| and Portuguese use the Latin alphabet.
| freetime2 wrote:
| For a real world example of the challenges of harnessing LLMs,
| look at Apple. Over a year ago they had a big product launch
| focused on "Apple Intelligence" that was supposed to make heavy
| use of LLMs for agentic workflows. But all we've really gotten
| since then are a couple of minor tools for making emojis,
| summarizing notifications, and proof reading. And they even had
| to roll back the notification summaries for a while for being
| wildly "out of control". [1] And in this year's iPhone launch the
| AI marketing was toned down _significantly_.
|
| I think Apple execs genuinely underestimated how difficult it
| would be to get LLMs to perform up to Apple's typical standards
| of polish and control.
|
| [1] https://www.bbc.com/news/articles/cge93de21n0o
| __loam wrote:
| I'm happy they ate shit here because I like my mac not getting
| co-pilot bullshit forced into it, but apparently Apple had two
| separate teams competing against each other on this topic.
| Supposedly a lot of politics, combined with the general
| difficulty of building LLM products, got in the way of
| delivering a good product.
| andrewmutz wrote:
| Tremendous alpha right now in making scary posts about AI. Fear
| drives clicks. You don't even need to point to current problems,
| all you have to do is say we can't be sure they won't happen in
| the future.
| avalys wrote:
| All the same criticisms are true about hiring humans. You don't
| really know what they're thinking, you don't really know what
| their values and morals are, you can't trust that they'll never
| make a mistake, etc.
___________________________________________________________________
(page generated 2025-10-14 23:00 UTC)