[HN Gopher] How to stop AI's "lethal trifecta"
___________________________________________________________________
How to stop AI's "lethal trifecta"
Author : 1vuio0pswjnm7
Score : 90 points
Date : 2025-09-26 14:49 UTC (8 hours ago)
(HTM) web link (www.economist.com)
(TXT) w3m dump (www.economist.com)
| 1vuio0pswjnm7 wrote:
| "And that means AI engineers need to start thinking like
| engineers, who build things like bridges and therefore know that
| shoddy work costs lives."
|
| "AI engineers, inculcated in this way of thinking from their
| schooldays, therefore often act as if problems can be solved just
| with more training data and more astute system prompts."
| dpflan wrote:
| Sounds like suggesting some sort of software engineering board
| certification plus an ethics certification -- the "Von Neumann
| Oath"? Unethical-but-still-legal software is just extremely
| lucrative, so it seems hard to have this idea take flight.
| DaiPlusPlus wrote:
| > can be solved just with more training data
|
| Well, y'see - those deaths of innocent people *are* the
| training data.
| roughly wrote:
| > AI engineers need to start thinking like engineers
|
| By which they mean actual engineers, not software engineers,
| who should also probably start thinking like real engineers now
| that our code's going into both the bridges and the cars
| driving over them.
| bubblyworld wrote:
| What are the kinds of things real engineers do that we could
| learn from? I hear this a lot ("programmers aren't real
| engineers") and I'm sympathetic, honestly, but I don't know
| where to start improving in that regard.
| skydhash wrote:
| Act like creating a merge request to main can expose you to
| bankruptcy or put you in jail. AKA investigate the impact
| of a diff against all the failure modes of the software.
| Mistletoe wrote:
| What is the factor of safety on your code?
|
| https://en.wikipedia.org/wiki/Factor_of_safety
| roughly wrote:
| This is off the cuff, but comparing software & software
| systems to things like buildings, bridges, or real-world
| infrastructure, there are three broad gaps, I think:
|
| 1. We don't have a good sense of the "materials" we're
| working with - when you're putting up a building, you know
| the tensile strength of the materials you're working with,
| how many girders you need to support this much
| weight/stress, etc. We don't have the same for our systems
| - every large scale system is effectively designed clean-
| sheet. We may have prior experience and intuition, but we
| don't have models, and we can't "prove" our designs ahead
| of time.
|
| 2. Following on the above, we don't have professional
| standards or certifications. Anyone can call themselves a
| software engineer, and we don't have a good way of actually
| testing for competence or knowledge. We don't really do
| things like apprenticeships or any kind of formalized
| process of ensuring someone has the set of professional
| skills required to do something like write the software
| that's going to be controlling 3 tons of metal moving at
| 80MPH.
|
| 3. We rely too heavily on the ability to patch after the
| fact - when a bridge or a building requires an update after
| construction is complete, it's considered a severe fuckup.
| When a piece of software does, that's normal. By and large,
| this has historically been fine, because a website going
| down isn't a huge issue, but when we're talking about
| things like avionics suites - or even things like Facebook,
| which is the primary media channel for a large segment of
| the population - there are real-world effects to all the bugs
| we're fixing in 2.0.
|
| Again, by and large most of this has been fine, because
| the stakes were pretty low, but software's leaked
| into the real world now, and our "move fast and break
| things" attitude isn't really compatible with physical
| objects.
| macintux wrote:
| What concerns me the most is that a bridge, or road, or
| building has a limited number of environmental changes
| that can impact its stability. Software feels like it has
| an infinite number of dependencies (explicit and
| implicit) that are constantly changing: toolchains,
| libraries, operating systems, network availability,
| external services.
| taikahessu wrote:
| > 3. We rely too heavily on the ability to patch after
| the fact...
|
| I agree on all points and to build on the last: making a 2.0
| or a complete software rewrite is known to be even more
| hazardous. There are no guarantees the new version is better
| in any regard. Which makes the expertise resemble that of
| other highly complex systems, like medical care.
|
| Which is why we need to understand the patient, develop soft
| skills, empathy, the Agile manifesto and ... the list could
| go on. Not an easy task when you consider you are also
| likely to be fighting the shiny object syndrome of your
| execs and all the constant hype surrounding tech.
| bostik wrote:
| There's a corollary to the combination of 1 & 3. Software is
| by its nature _extremely_ mutable. That in turn means
| that it gets repurposed and shoehorned into things that
| were never part of the original design.
|
| You cannot build a bridge that could independently
| reassemble itself to an ocean liner or a cargo plane. And
| while civil engineering projects add significant margins
| for reliability and tolerance, there is no realistic way
| to re-engineer a physical construction to be able to
| suddenly sustain 100x its previously designed peak load.
|
| In successful software systems, similar requirement
| changes are the norm.
|
| I'd also like to point out that software and large-scale
| construction have one rather surprising thing in common:
| both require constant maintenance from the moment they
| are "ready". Or indeed, even earlier. To think that
| physical construction projects are somehow delivered
| complete is a romantic illusion.
| Exoristos wrote:
| > You cannot build a bridge that could independently
| reassemble itself to an ocean liner or a cargo plane.
|
| Unless you are building with a toy system of some kind.
| There are safety and many other reasons civil engineers
| do not use some equivalent of Lego bricks. It may be time
| for software engineering also to grow up.
| kevin_thibedeau wrote:
| Engineering uses repeatable processes to produce expected
| results. Margin is added to quantifiable elements of a system
| to reduce the likelihood of failures. You can't add margin on
| a black box generated by throwing spaghetti at the wall.
| recursive wrote:
| You can. We know the properties of materials based on
| experimentation. In the same way, we can statistically
| quantify the results that come out of any kind of spaghetti
| box, based on repeated trials. Just like it's done in many
| other fields. Science is based on repeated testing of
| hypotheses. You rarely get black and white answers, just
| results that suggest things. Like the tensile strength of
| some particular steel alloy or something.
| xboxnolifes wrote:
| Practically everything engineers have to interact with and
| consider are equivalent to a software black box. Rainfall,
| winds, tectonic shifts, material properties, etc. Humans
| don't have the source code to these things. We observe
| them, we quantify them, notice trends, model the
| observations, and we apply statistical analysis on them.
|
| And it's possible that a real engineer might do all this
| with an AI model and then determine it's not adequate and
| _choose to not use it_.
| chasd00 wrote:
| > Engineering uses repeatable processes to produce expected
| results
|
| this is the thing with LLMs, the response to a prompt is
| not guaranteed to be repeatable. Why would you use
| something like that in an automation where repeatability is
| required? That's the whole point of automation,
| repeatability. Would you use a while loop that you can
| expect to iterate the specified number of times _almost_
| every time?
| 1vuio0pswjnm7 wrote:
| In addition to software "engineers", don't forget about
| software "architects"
| throwup238 wrote:
| https://archive.ph/8O2aG
| lowbloodsugar wrote:
| Data breaches are hardly lethal. When we're talking about AI
| there are plenty of actually lethal failure modes.
| HPsquared wrote:
| Depends on the data.
| simonw wrote:
| If the breached data is API keys that can be used to rack up
| charges, it's going to cost you a bunch of money.
|
| If it's a crypto wallet then your crypto is irreversibly gone.
|
| If the breached data is "material" - i.e. gives someone an
| advantage in stock market decisions - you're going to get in a
| lot of trouble with the SEC.
|
| If the breached data is PII you're going to get in trouble with
| all kinds of government agencies.
|
| If it's PII for children you're in a world of pain.
|
| Update: I found one story about a company going bankrupt after
| a breach, which is the closest I can get to "lethal":
| https://www.securityweek.com/amca-files-bankruptcy-following...
|
| Also it turns out Mossack Fonseca shut down after the Panama
| papers: https://www.theguardian.com/world/2018/mar/14/mossack-
| fonsec...
| datadrivenangel wrote:
| A PII for children data breach at a Fortune 1000 sized
| company can easily cost 10s of millions of dollars in
| employee time to fully resolve.
| rvz wrote:
| ...and a massive fine in the millions on top of that if you
| have customers that are from the EU.
| crazygringo wrote:
| > _Data breaches are hardly lethal._
|
| They certainly can be when they come to classified military
| information around e.g. troop locations. There are lots more
| examples related to national security and terrorism that would
| be easy to think of.
|
| > _When we're talking about AI there are plenty of actually
| lethal failure modes._
|
| Are you trying to argue that because e.g. Tesla Autopilot
| crashes have killed people, we shouldn't even try to care about
| data breaches...?
| asadotzler wrote:
| Jamal Khashoggi having his smartphone data exfiltrated was
| hardly lethal?
| tedivm wrote:
| There are people who have had to move after data breaches
| exposed their addresses to their stalkers. There are also
| people who may be gay but live in authoritarian places where
| this knowledge could kill them. It's pretty easy to see a
| path to lethality from a data breach.
| fn-mote wrote:
| The trifecta:
|
| > LLM access to untrusted data, the ability to read valuable
| secrets and the ability to communicate with the outside world
|
| The suggestion is to reduce risk by setting boundaries.
|
| Seems like security 101.
| danenania wrote:
| It is, but there's a direct tension here between security and
| capabilities. It's hard to do useful things with private data
| without opening up prompt injection holes. And there's a huge
| demand for this kind of product.
|
| Agents also typically work better when you combine all the
| relevant context as much as possible rather than splitting out
| and isolating context. See: https://cognition.ai/blog/dont-
| build-multi-agents -- but this is at odds with isolating agents
| that read untrusted input.
| kccqzy wrote:
| The external communication part of the trifecta is an easy
| defense. Don't allow external communication. Any external
| information that's helpful for the AI agent should be
| available offline, present in its model (possibly fine-tuned).
| rvz wrote:
| It is security 101 as this is just setting basic access
| controls at the very least.
|
| The moment it has access to the internet, the risk is vastly
| increased.
|
| But with a very clever security researcher, it is possible to
| take over the entire machine with a single prompt injection
| attack reducing at least one of the requirements.
| mellosouls wrote:
| Original @simonw article here:
|
| https://simonw.substack.com/p/the-lethal-trifecta-for-ai-age...
|
| https://simonwillison.net/2025/Aug/9/bay-area-ai/
|
| Discussed:
|
| https://news.ycombinator.com/item?id=44846922
| cobbal wrote:
| Wait, the only ways they suggest solving the problem are rate
| limiting and using a better model?
|
| Software engineers figured out these things decades ago. As a
| field, we already know how to do security. It's just difficult
| and incompatible with the careless mindset of AI products.
| rvz wrote:
| > Software engineers figured out these things decades ago.
|
| Well this is what happens when a new industry attempts to
| reinvent poor standards and ignores security best practices
| just to rush out "AI products" for the sake of it.
|
| We have already seen how (flawed) standards like MCP were
| hacked immediately from the start, and the approaches
| developers took to "secure" them with somewhat "better
| prompting" are just laughable. The worst part of all of this
| was almost everyone in the AI industry not questioning the
| security ramifications of MCP servers having _direct_ access
| to databases, which is a disaster waiting to happen.
|
| Just because you _can_ doesn't mean you should, and we are
| seeing how hundreds of AI products are getting breached because
| of this carelessness in security, even before I mentioned if
| the product was "vibe coded" or not.
| crazygringo wrote:
| > _As a field, we already know how to do security._
|
| Well, AI is part of the field now, so... no, we don't anymore.
|
| There's nothing "careless" about AI. The fact that there's no
| foolproof way to distinguish instruction tokens from data
| tokens is not careless, it's a fundamental epistemological
| constraint that human communication suffers from as well.
|
| Saying that "software engineers figured out these things
| decades ago" is deep hubris based on false assumptions.
| NitpickLawyer wrote:
| > As a field, we already know how to do security
|
| Uhhh, no, we actually don't. Not when it comes to people
| anyway. The industry spends countless millions on trainings
| that more and more seem useless.
|
| We've even had extremely competent and highly trained people
| fall for basic phishing (some in recent weeks). There was
| even a highly credentialed security researcher who fell for
| one on YouTube.
| simonw wrote:
| I like using Troy Hunt as an example of how even the most
| security conscious among us can fall for a phishing attack if
| we are having a bad day (he blamed jet lag):
| https://www.troyhunt.com/a-sneaky-phish-just-grabbed-my-
| mail...
| simonw wrote:
| This is the second Economist article to mention the lethal
| trifecta in the past week - the first was
| https://www.economist.com/science-and-technology/2025/09/22/... -
| which was the clearest explanation I've seen anywhere in the
| mainstream media about what prompt injection is and why it's such
| a nasty threat.
|
| (And yeah I got some quotes in it so I may be biased there, but
| it genuinely is the source I would send executives to in order to
| understand this.)
|
| I like this new one a lot less. It talks about how LLMs are non-
| deterministic, making them harder to fix security holes in... but
| then argues that this puts them in the same category as bridges
| where the solution is to over-engineer them and plan for
| tolerances and unpredictability.
|
| While that's true for the general case of building against LLMs,
| I don't think it's the right answer for security flaws. If your
| system only falls victim to 1/100 prompt injection attacks...
| your system is fundamentally insecure, because an attacker will
| keep on trying variants of attacks until they find one that
| works.
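|
| A quick back-of-the-envelope illustration of why a 1% success
| rate is not a security property (a minimal sketch, assuming
| independent attempts):
|
|     # Chance an attacker lands at least one successful injection,
|     # assuming each attempt independently succeeds 1% of the time.
|     p = 0.01
|     for attempts in (10, 100, 500):
|         print(attempts, round(1 - (1 - p) ** attempts, 2))
|     # 10 -> 0.1, 100 -> 0.63, 500 -> 0.99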
|
| The way to protect against the lethal trifecta is to cut off one
| of the legs! If the system doesn't have _all three_ of access to
| private data, exposure to untrusted instructions and an
| exfiltration mechanism then the attack doesn't work.
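|
| A minimal sketch of that rule as a config check (hypothetical
| names, not any real framework's API):
|
|     # Refuse to wire up an agent that has all three legs at once.
|     def check_lethal_trifecta(agent_config: dict) -> None:
|         legs = {
|             "private_data": agent_config.get("reads_private_data", False),
|             "untrusted_input": agent_config.get("reads_untrusted_input", False),
|             "exfiltration": agent_config.get("can_send_externally", False),
|         }
|         if all(legs.values()):
|             raise ValueError(
|                 "lethal trifecta: disable at least one of "
|                 + ", ".join(legs)
|             )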
| skrebbel wrote:
| Must be pretty cool to blog something and post it to a nerd
| forum like HN and have it picked up by the Economist! Nicely
| done.
| simonw wrote:
| I got to have coffee with their AI/technology editor a few
| months ago. Having a blog is awesome!
| belter wrote:
| Love your work. Do you have an opinion on this?
|
| "Safeguard your generative AI workloads from prompt injections"
| - https://aws.amazon.com/blogs/security/safeguard-your-
| generat...
| simonw wrote:
| I don't like any of the solutions that propose guardrails or
| filters to detect and block potential attacks. I think
| they're making promises that they can't keep, and encouraging
| people to ship products that are inherently insecure.
| datadrivenangel wrote:
| The problem with cutting off one of the legs, is that the legs
| are related!
|
| Outside content like email may also count as private data. You
| don't want someone to be able to get arbitrary email from your
| inbox simply by sending you an email. Likewise, many tools like
| email and github are most useful if they can send and receive
| information, and having dedicated send and receive MCP servers
| for a single tool seems goofy.
| simonw wrote:
| The "exposure to untrusted data" one is the hardest to cut
| off, because you never know if a user might be tricked into
| uploading a PDF with hidden instructions, or copying and
| pasting in some long article that has instructions they
| didn't notice (or that used unicode tricks to hide
| themselves).
|
| The easiest leg to cut off is the exfiltration vectors.
| That's the solution most products take - make sure there's no
| tool for making arbitrary HTTP requests to other domains, and
| that the chat interface can't render an image that points to
| an external domain.
|
| If you let your agent send, receive and search email you're
| _doomed_. I think that's why there are very few products on
| the market that do that, despite the enormous demand for AI
| email assistants.
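|
| As a rough illustration of the image-rendering side of that (a
| toy sketch; the allowlisted domain is made up):
|
|     import re
|     from urllib.parse import urlparse
|
|     ALLOWED_HOSTS = {"example.internal"}  # hypothetical allowlist
|
|     def strip_external_images(markdown: str) -> str:
|         # Drop markdown images whose URL points outside the allowlist,
|         # since an image fetch like https://evil.example/?q=SECRET is a
|         # classic zero-click exfiltration channel.
|         def keep(match: re.Match) -> str:
|             host = urlparse(match.group(1)).hostname or ""
|             return match.group(0) if host in ALLOWED_HOSTS else ""
|         return re.sub(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)", keep, markdown)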
| datadrivenangel wrote:
| So the easiest solution is full human in the loop &
| approval for every external action...
|
| Agents are doomed :)
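|
| Roughly what that gate looks like in practice (a toy sketch,
| the tool names are invented):
|
|     def approve(action: str, args: dict) -> bool:
|         # Blocking human approval before any external side effect.
|         print(f"Agent wants to run {action} with {args}")
|         return input("Allow? [y/N] ").strip().lower() == "y"
|
|     def run_tool(action: str, args: dict):
|         external = {"send_email", "http_post"}  # hypothetical tools
|         if action in external and not approve(action, args):
|             return {"error": "denied by user"}
|         ...  # dispatch to the real tool implementation here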
| patapong wrote:
| I think stopping exfiltration will turn out to be hard as
| well, since the LLM can social engineer the user to help
| them exfiltrate the data.
|
| For example, an LLM could say "Go to this link to learn
| more about your problem", and then point them to a URL with
| encoded data, set up malicious scripts for e.g. deploy
| hooks, or just output HTML that sends requests when opened.
| simonw wrote:
| Yeah, one exfiltration vector that's really nasty is
| "here is a big base64 encoded string, to recover your
| data visit this website and paste it in".
|
| You can at least prevent LLM interfaces from providing
| clickable links to external domains, but it's a difficult
| hole to close completely.
| datadrivenangel wrote:
| Human fatigue and interface design are going to be brutal
| here.
|
| It's not obvious what counts as a tool in some of the
| major interfaces, especially as far as built in
| capabilities go.
|
| And as we've seen with conventional software and
| extensions, at a certain point, if a human thinks it
| should work, then they'll eventually just click okay or
| run something as root/admin... Or just hit enter nonstop
| until the AI is done with their email.
| sdenton4 wrote:
| Bridge builders mostly don't have to design for adversarial
| attacks.
|
| And the ones who do focus on portability and speed of
| redeployment, rather than armor - it's cheaper and faster to
| throw down another temporary bridge than to build something
| bombproof.
|
| https://en.wikipedia.org/wiki/Armoured_vehicle-launched_brid...
| InsideOutSanta wrote:
| This is exactly the problem. You can't build bridges if the
| threat model is thousands of attacks every second in
| thousands of different ways you can't even fully predict yet.
| nradov wrote:
| LLMs are non-deterministic just like humans and so security can
| be handled in much the same way. Use role-based access control
| to limit access to the minimum necessary to do their jobs and
| have an approval process for anything potentially risky or
| expensive. In any prominent organization dealing with
| technology, infrastructure, defense, or finance we have to
| assume that some of our co-workers are operatives working for
| foreign nation states like Russia / China / Israel / North
| Korea so it's the same basic threat model.
| Retric wrote:
| Humans and LLMs are non-deterministic in very different ways.
| We have thousands of years of history with trying to
| determine which humans are trustworthy and we've gotten quite
| good at it. Not only do we lack that experience with AI, but
| each generation can be very different in fundamental ways.
| nradov wrote:
| We're really not very good at determining which humans are
| trustworthy. Most people barely do better than a coin flip
| at detecting lies.
| cj wrote:
| Determining trustworthiness of LLM responses is like
| determining who's the most trustworthy person in a room
| full of sociopaths.
|
| I'd rather play "2 truths and a lie" with a human rather
| than a LLM any day of the week. So many more cues to look
| for with humans.
| bluefirebrand wrote:
| Big problem with LLMs is if you try and play 2 truths and
| a lie, you might just get 3 truths. Or 3 lies.
| card_zero wrote:
| Lies, or bullshit? I mean, a guessing game like "how many
| marbles" is a context that allows for easy lying, but "I
| wasn't even in town on the night of the murder" is harder
| work. It sounds like you're referring to some study of the
| marbles variety, and not a test of _smooth-talking,_ the
| LLM forte.
| simonw wrote:
| The biggest difference on this front between a human and
| an LLM is _accountability_.
|
| You can hold a human accountable for their actions. If
| they consistently fall for phishing attacks you can train
| or even fire them. You can apply peer pressure. You can
| grant them additional privileges once they prove
| themselves.
|
| You can't hold an AI system accountable for anything.
| Verdex wrote:
| Recently, I've kind of been wondering if this is going to
| turn out to be LLM codegen's Achilles heel.
|
| Imagine some sort of code component of critical
| infrastructure that costs the company millions per hour
| when it goes down and it turns out the entire team is
| just a thin wrapper for an LLM. Infra goes down in a way
| the LLM can't fix and now what would have been a few late
| nights is several months to spin up a new team.
|
| Sure you can hold the team accountable by firing them.
| However, this is only a threat to someone with actual
| technical know-how, because their reputation is damaged.
| They got fired doing such-and-such, so can we trust them to
| do it here?
|
| For the person who LLM faked it, they just need to find
| another domain where their reputation won't follow them
| to also fake their way through until the next
| catastrophe.
| Exoristos wrote:
| Your source must have been citing a very controlled
| environment. In actuality, lies almost always become
| apparent over time, and general mendaciousness is
| something most people can sense from face and body alone.
| InsideOutSanta wrote:
| Yeah, so many scammers exist because most people are
| susceptible to at least some of them some of the time.
|
| Also, pick your least favorite presidential candidate.
| They got about 50% of the vote.
| Exoristos wrote:
| I think most neutral, intelligent users rightly assume AI
| to be untrustworthy by its nature.
| hn_acc1 wrote:
| The problem is there aren't many of those in the wild.
| Only a subset are intelligent, and lots of those have
| hitched their wagons to the AI hype train.
| andy99 wrote:
| LLMs are deterministic*. They are unpredictable or maybe
| chaotic.
|
| If you say "What's the capital of France?" is might answer
| "Paris". But if you say "What is the capital of france" it
| might say "Prague".
|
| The fact that it gives a certain answer for some input
| doesn't guarantee it will behave the same for an input with
| some irrelevant (from ja human perspective) difference.
|
| This makes them challenging to train and validate robustly
| because it's hard to predict all the ways they break. It's a
| training & validation data issue though, as opposed to some
| idea of just random behavior that people tend to ascribe to
| AI.
|
| * I know various implementation details and nonzero
| temperature generally make their output nondeterministic, but
| that doesn't change my central point, nor is it what people
| are thinking of when they say LLMs are nondeterministic.
| Importantly, you could make LLM output deterministically
| reproducible and it wouldn't change the robustness issue that
| people are usually confusing with non-determinism.
| nradov wrote:
| You are _technically_ correct but that's irrelevant from a
| security perspective. For security as a practical matter we
| have to treat LLMs as non-deterministic. The same principle
| applies to any software that hasn't been formally verified
| but we usually just gloss over this and accept the risk.
| dooglius wrote:
| Non-determinism has nothing to do with security, you
| should use a different word if you want to talk about
| something else
| peanut_merchant wrote:
| This is pedantry: temperature introduces a degree of
| randomness (same input, different output) to an LLM, and
| even outside of that, "non-deterministic" in a security
| context is generally understood. Words have different
| meanings depending on the context in which they are used.
|
| Let's not reduce every discussion to semantics, and
| afford the poster a degree of understanding.
| dooglius wrote:
| If you're saying that "non-determinism" is a term of art
| in the field of security, meaning something different
| than the ordinary meaning, I wasn't aware of that at
| least. Do you have a source? I searched for uses and
| found https://crypto.stackexchange.com/questions/95890/ne
| cessity-o... and https://medium.com/p/641f061184f9 and
| these seem to both use the ordinary meaning of the term.
| Note that an LLM with temperature fixed to zero has the
| same security risks as one that isn't, so I don't
| understand what the poster is trying to say by "we have
| to treat LLMs as non-deterministic".
| peanut_merchant wrote:
| I understand the point that you are making, but the example
| is only valid with temperature=0.
|
| Altering the temperature parameter introduces randomness by
| sampling from the probability distribution of possible next
| tokens rather than always choosing the most likely one.
| This means the same input can produce different outputs
| across multiple runs.
|
| So no, not deterministic unless we are being pedantic.
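|
| For reference, roughly what that sampling step looks like (a
| toy sketch, not any particular implementation):
|
|     import math, random
|
|     def sample_next_token(logits, temperature):
|         # logits: dict mapping candidate token -> raw score
|         if temperature == 0:
|             # Greedy decoding: always pick the most likely token.
|             return max(logits, key=logits.get)
|         # Softmax over temperature-scaled scores, then sample from it.
|         scaled = {t: s / temperature for t, s in logits.items()}
|         m = max(scaled.values())
|         weights = {t: math.exp(s - m) for t, s in scaled.items()}
|         r = random.random() * sum(weights.values())
|         for token, w in weights.items():
|             r -= w
|             if r <= 0:
|                 return token
|         return token  # floating point fallback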
| blibble wrote:
| > So no, not deterministic unless we are being pedantic.
|
| and not even then as floating point arithmetic is non-
| associative
| abtinf wrote:
| When processing multiple prompts simultaneously (that is,
| the typical use case under load), LLMs are
| nondeterministic, even with a specific seed and zero
| temperature, due to floating point errors.
|
| See https://news.ycombinator.com/item?id=45200925
| mmoskal wrote:
| The previous article is in the same issue, in the science and
| technology section. This is how they typically do it - the
| leader article has a longer version in the paper. Leaders
| tend to be more opinionated.
| semiquaver wrote:
| > This is the second Economist article [...] I like this new
| one a lot less.
|
| They are actually in some sense the same article. The economist
| runs "Leaders", a series of articles at the front of the weekly
| issue that often condense more fleshed out stories appearing in
| the same issue. It's essentially a generalization of the
| Inverted Pyramid technique [1] to the entire newspaper.
|
| In this case the linked article is the leader for the better
| article in the same issue's Science and Technology section.
|
| [1]
| https://en.m.wikipedia.org/wiki/Inverted_pyramid_(journalism...
| pton_xd wrote:
| > The way to protect against the lethal trifecta is to cut off
| one of the legs! If the system doesn't have all three of access
| to private data, exposure to untrusted instructions and an
| exfiltration mechanism then the attack doesn't work.
|
| Don't you only need one leg, an exfiltration mechanism?
| Exposure to data IS exposure to untrusted instructions. Ie why
| can't you trick the user into storing malicious instructions in
| their private data?
|
| But actually you can't remove exfiltration and keep exposure to
| untrusted instructions either; an attack could still corrupt
| your private data.
|
| Seems like a secure system can't have any "legs." You need a
| limited set of vetted instructions.
| simonw wrote:
| If you have the exfiltration mechanism and exposure to
| untrusted content but there is no exposure to private data
| then the exfiltration does not matter.
|
| If you have exfiltration and private data but no exposure to
| untrusted instructions, it doesn't matter either... though
| this is actually a lot harder to achieve, because you don't
| have any control over whether your users will be tricked
| into pasting something bad in as part of their prompt.
|
| Cutting off the exfiltration vectors remains the best
| mitigation in most cases.
| hn_acc1 wrote:
| Untrusted content + exfiltration with no "private" data
| could still result in (off the top of my head):
|
| - use of exploits to gain access (i.e. privilege escalation)
|
| - DDoS to local or external systems using the exfiltration
| method
|
| You're essentially running untrusted code on a local
| system. Are you SURE you've locked away / closed EVERY
| access point, AND applied every patch and there aren't any
| zero-days lurking somewhere in your system?
| eikenberry wrote:
| Aren't LLMs non-deterministic by choice? That they regularly
| use random seeds, sampling and batching but that these sources
| of non-determinism can be removed, for instance, by running an
| LLM locally where you can control these parameters.
| simonw wrote:
| Until very recently that proved surprisingly difficult to
| achieve.
|
| Here's the paper that changed that:
| https://thinkingmachines.ai/blog/defeating-nondeterminism-
| in...
| trod1234 wrote:
| Doesn't this inherent problem just come down to classic
| computational limits, and problems that have been largely
| considered impossible to solve for quite a long time, between
| determinism and non-determinism?
|
| Can you ever expect a deterministic finite automaton to
| solve problems that are within the NFA domain? Halting,
| Incompleteness, Undecidability (between code portions and data
| portions). Most posts seem to neglect the looming giant
| problems instead pretending they don't exist at first, and then
| being shocked when the problems happen. Quite blind.
|
| Computation is just math. Probabilistic systems fail when those
| systems have a mixture of both chaos and regularity; without
| determinism and its related properties at the control level you
| have nothing bounding the system to constraints so it functions
| mathematically (i.e. determinism = mathematical relabeling),
| and thus it fails.
|
| People need to be a bit more rational, and risk manage, and
| realize that impossible problems exist, and just because the
| benefits seem so tantalizing doesn't mean you should put your
| entire economy behind a false promise. Unfortunately, when
| resources are held by the few this is more probabilistically
| likely and poor choices greatly impact larger swathes than
| necessary.
| rs186 wrote:
| I am not even convinced that we need three legs. It seems that
| just having two would be bad enough, e.g. an email agent
| deleting all files this computer has access to, or _maybe_ ,
| downloading the attachment in the email, unzipping it with a
| password, running that executable which encrypts everything and
| then asking for cryptocurrency. No communication with outside
| world needed.
| simonw wrote:
| That's a different issue from the lethal trifecta - if your
| agent has access to tools that can do things like delete
| emails or run commands then you have a prompt injection
| problem that's independent of data exfiltration risks.
|
| The general rule to consider here is that anyone who can get
| their tokens into your agent can trigger ANY of the tools
| your agent has access to.
| keeda wrote:
| An important caveat: an exfiltration vector is not necessary to
| cause show-stopping disruptions, c.f. https://xkcd.com/327/
|
| Even then, at least in the Bobby Tables scenario the disruption
| is immediately obvious. The solution is also straightforward:
| restore from backup (everyone has them, don't they?) Much, much
| worse is a prompt injection attack that introduces subtle,
| unnoticeable errors in the data over an extended period of
| time.
|
| At a minimum _all inputs_ that lead to any data mutation need
| to be logged pretty much indefinitely, so that it's at least
| in the realm of possibility to backtrack and fix once such an
| attack is detected. But even then you could imagine multiple
| compounding transactions on that corrupted data spreading
| through the rest of the database. I cannot picture how such
| data corruption could feasibly be recovered from.
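|
| A minimal sketch of the kind of logging I mean (hypothetical
| helper, not a real library):
|
|     import json, time
|
|     AUDIT_LOG = "tool_mutations.jsonl"  # assumed append-only storage
|
|     def log_mutation(tool: str, args: dict, prompt_context: str) -> None:
|         # Record every input that led to a data mutation, so a later
|         # injection can at least be traced back and unwound.
|         record = {
|             "ts": time.time(),
|             "tool": tool,
|             "args": args,
|             "prompt_context": prompt_context,
|         }
|         with open(AUDIT_LOG, "a") as f:
|             f.write(json.dumps(record) + "\n")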
| reissbaker wrote:
| I like to think of the security issues LLMs have as: what if
| your codebase was vulnerable to social engineering attacks?
|
| You have to treat LLMs as basically similar to human beings:
| they can be tricked, no matter how much training you give them.
| So if you give them root on all your boxes, while giving
| everyone in the world the ability to talk to them, you're going
| to get owned at some point.
|
| Ultimately the way we fix this with human beings is by not
| giving them unrestricted access. Similarly, your LLM shouldn't
| be able to view data that isn't related to the person they're
| talking to; or modify other user data; etc.
| dwohnitmok wrote:
| > You have to treat LLMs as basically similar to human beings
|
| Yes! Increasingly I think that software developers
| consistently _under_ anthropomorphize LLMs and get surprised
| by errors as a result.
|
| Thinking of (current) LLMs as eager, scatter-brained, "book-
| smart" interns leads directly to understanding the
| overwhelming majority of LLM failure modes.
|
| It is still possible to overanthropomorphize LLMs, but on the
| whole I see the industry consistently underanthropomorphizing
| them.
| collinmcnulty wrote:
| As a mechanical engineer by background, this article feels weak.
| Yes it is common to "throw more steel at it" to use a modern
| version of the sentiment, but that's still based on knowing in
| detail the many different ways a structure can fail. The lethal
| trifecta is a failure mode; you put your "steel" into making sure
| it doesn't occur. You would never say "this bridge vibrates
| violently, how can we make it safe to cross a vibrating bridge",
| you'd change the bridge to make it not vibrate out of control.
| switchbak wrote:
| When a byline starts with "coders need to" I immediately start
| to tune out.
|
| It felt like the analogy was a bit off, and it sounds like
| that's true to someone with knowledge in the actual domain.
|
| "If a company, eager to offer a powerful ai assistant to its
| employees, gives an LLM access to untrusted data, the ability
| to read valuable secrets and the ability to communicate with
| the outside world at the same time" - that's quite the "if",
| and therein lies the problem. If your company is so
| enthusiastic to offer functionality that it does so at the cost
| of security (often knowingly), then you're not taking the
| situation seriously. And this is a great many companies at
| present.
|
| "Unlike most software, LLMs are probabilistic ... A
| deterministic approach to safety is thus inadequate" - complete
| non-sequitur there. Why if a system is non-deterministic is a
| deterministic approach inadequate? That doesn't even pass the
| sniff test. That's like saying a virtual machine is inadequate
| to sandbox a process if the process does non-deterministic
| things - which is not a sensible argument.
|
| As usual, these contrived analogies are taken beyond any
| reasonable measure and end up making the whole article have
| very little value. Skipping the analogies and using terminology
| relevant to the domain would be a good start - but that's
| probably not as easy to sell to The Economist.
| semiquaver wrote:
| > When a byline starts with "coders need to"
|
| A byline lists the author of the article. The secondary
| summary line you're referring to that appears under the
| headline is called a "rubric".
|
| https://www.quora.com/Why-does-The-Economist-sometimes-
| have-...
| scuff3d wrote:
| Sometimes I feel like the entire world has lost its god damn
| mind. To use their bridge analogy, it would be like if hundreds
| of years ago we developed a technique for building bridges that
| technically worked, but occasionally and totally
| unpredictably, the bottom just dropped out and everyone on
| the bridge fell into the water. And instead of saying "hey,
| maybe there is something fundamentally wrong with this
| approach, maybe we should find a better way to build bridges"
| we just said "fuck it, just invest in nets and other mechanisms
| to catch the people who fall".
|
| We are spending billions to build infrastructure on top of
| technology that is inherently deeply unpredictable, and we're
| just slapping all the guard rails on it we can. It's fucking
| nuts.
| chasd00 wrote:
| no one wants to think about security when it stands in the
| way of the shiny thing in front of them. security is hard and
| boring, it always gets tossed aside until something major
| happens. When large, newsworthy security incidents start
| taking place that affect the stock price or lives and
| trigger lawsuits, it will get more attention.
|
| The issue that I find interesting is the answer isn't going
| to be as simple as "use prepared statements instead of sql
| strings and turn off services listening on ports you're not
| using", it's a lot harder than that with LLMs and may not
| even be possible.
| hn_acc1 wrote:
| If LLMs are as good at coding as half the AI companies
| claim, and you allow unvetted input, you're essentially
| trying to contain an elite hacker within your own network
| by turning off a few commonly used ports to the machine
| they're currently allowed to work from. Unless your entire
| internal network is locked down 100% tight (and that makes
| it REALLY annoying for your employees to get any work
| done), don't be surprised if they find the backdoor.
| SAI_Peregrinus wrote:
| LLMs don't make a distinction between prompt & data. There's no
| equivalent to an "NX bit", and AFAIK nobody has figured out how
| to create such an equivalent. And of course even that wouldn't
| stop all security issues, just as the NX bit being added to CPUs
| didn't stop all remote code execution attacks. So the best
| options we have right now tend to be based around using existing
| security mechanisms on the LLM agent process. If it runs as a
| special user then the regular filesystem permissions can restrict
| its access to various files, and various other mechanisms can be
| used to restrict access to other resources (outgoing network
| connections, various hardware, cgroups, etc.). But as long as
| untrusted data can contain instructions it'll be possible for the
| LLM output to contain secret data, and if the human using the LLM
| doesn't notice & copies that output somewhere public the
| exfiltration step returns.
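|
| A minimal sketch of the "run it as a restricted user" part (the
| username is hypothetical; needs Python 3.9+ on POSIX and the
| privileges to switch users):
|
|     import subprocess
|
|     def run_agent_tool(cmd):
|         # Execute the agent's tool process as a locked-down user so
|         # ordinary filesystem permissions bound what it can read.
|         return subprocess.run(
|             cmd,
|             user="llm-agent",      # unprivileged account (assumed to exist)
|             group="llm-agent",
|             capture_output=True,
|             timeout=60,
|         )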
| boothby wrote:
| > AFAIK nobody has figured out how to create such an
| equivalent.
|
| I'm curious if anybody has even attempted it; if there's even
| training data for this. Compartmentalization is a natural
| aspect of cognition in social creatures. I've even known dogs
| not to demonstrate knowledge of a food supply until they
| think they're not being observed. As a working professional
| with children, I need to compartmentalize: my social life,
| sensitive IP knowledge, my kid's private information, knowledge
| my kid isn't developmentally ready for, my internal thoughts,
| information I've gained from disreputable sources, and more.
| Intelligence may be important, but this is wisdom -- something
| that doesn't seem to be a first-class consideration if dogs and
| toddlers are in the lead.
| crazygringo wrote:
| There's an interesting quote from the associated longer article
| [1]:
|
| > _In March, researchers at Google proposed a system called CaMeL
| that uses two separate LLMs to get round some aspects of the
| lethal trifecta. One has access to untrusted data; the other has
| access to everything else. The trusted model turns verbal
| commands from a user into lines of code, with strict limits
| imposed on them. The untrusted model is restricted to filling in
| the blanks in the resulting order. This arrangement provides
| security guarantees, but at the cost of constraining the sorts of
| tasks the LLMs can perform._
|
| This is the first I've heard of it, and seems clever. I'm curious
| how effective it is. Does it actually provide absolute security
| guarantees? What sorts of constraints does it have? I'm wondering
| if this is a real path forward or not.
|
| [1] https://www.economist.com/science-and-
| technology/2025/09/22/...
| simonw wrote:
| I wrote at length about the CaMeL paper here - I think it's a
| solid approach but it's also very difficult to implement and
| greatly restricts what the resulting systems can do:
| https://simonwillison.net/2025/Apr/11/camel/
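|
| The rough shape of the dual-LLM pattern, for anyone curious (a
| toy sketch, not CaMeL's actual design - the real thing adds a
| capability-tracking interpreter on top):
|
|     def privileged_plan(user_request: str) -> list:
|         # Trusted model: sees only the user's request, never the
|         # untrusted content, and emits a fixed plan of tool calls.
|         # (Imagine an LLM call here; hard-coded for illustration.)
|         return [
|             {"tool": "fetch_email", "id": "email_1"},
|             {"tool": "summarize", "input_ref": "email_1", "id": "summary_1"},
|             {"tool": "show_user", "input_ref": "summary_1"},
|         ]
|
|     def quarantined_summarize(untrusted_text: str) -> str:
|         # Quarantined model: sees the untrusted content, but its output
|         # is treated as an opaque value and never fed back to the
|         # trusted model as instructions.
|         return "[summary of: " + untrusted_text[:40] + "...]"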
| crazygringo wrote:
| Thank you! That is very helpful.
|
| I'm very surprised I haven't come across it on HN before.
| Seems like CaMeL ought to be a front-page story here... seems
| like the paper got 16 comments 5 months ago, which isn't
| much:
|
| https://news.ycombinator.com/item?id=43733683
| jngiam1 wrote:
| I have been thinking that the appropriate solution here is to
| detect when one of the legs appears to be a risk and then
| cut it off if so.
|
| You don't want to have a blanket policy since that makes it no
| longer useful, but you want to know when something bad is
| happening.
| neallindsay wrote:
| In-band signaling can never be secure. Doesn't anyone remember
| the Captain Crunch whistle?
___________________________________________________________________
(page generated 2025-09-26 23:01 UTC)