[HN Gopher] How to stop AI's "lethal trifecta"
       ___________________________________________________________________
        
       How to stop AI's "lethal trifecta"
        
       Author : 1vuio0pswjnm7
       Score  : 90 points
       Date   : 2025-09-26 14:49 UTC (8 hours ago)
        
 (HTM) web link (www.economist.com)
 (TXT) w3m dump (www.economist.com)
        
       | 1vuio0pswjnm7 wrote:
       | "And that means AI engineers need to start thinking like
       | engineers, who build things like bridges and therefore know that
       | shoddy work costs lives."
       | 
       | "AI engineers, inculcated in this way of thinking from their
       | schooldays, therefore often act as if problems can be solved just
       | with more training data and more astute system prompts."
        
         | dpflan wrote:
         | Sounds like suggesting some sort of software engineering board
         | certification plus and ethics certification -- the "Von Neumann
         | Oath"? Unethical while still legal software is just extremely
         | lucrative, it seems hard to have this idea take flight.
        
         | DaiPlusPlus wrote:
         | > can be solved just with more training data
         | 
         | Well, y'see - those deaths of innocent people *are* the
         | training data.
        
         | roughly wrote:
         | > AI engineers need to start thinking like engineers
         | 
         | By which they mean actual engineers, not software engineers,
         | who should also probably start thinking like real engineers now
         | that our code's going into both the bridges and the cars
         | driving over them.
        
           | bubblyworld wrote:
           | What are the kinds of things real engineers do that we could
           | learn from? I hear this a lot ("programmers aren't real
           | engineers") and I'm sympathetic, honestly, but I don't know
           | where to start improving in that regard.
        
             | skydhash wrote:
             | Act like creating a merge-request to main can expose you to
             | bankruptcy or put you in jail. AKA investigate the impact
             | of a diff to all the failure modes of a software.
        
             | Mistletoe wrote:
             | What is the factor of safety on your code?
             | 
             | https://en.wikipedia.org/wiki/Factor_of_safety
        
             | roughly wrote:
             | This is off the cuff, but comparing software & software
             | systems to things like buildings, bridges, or real-world
             | infrastructure, there's three broad gaps, I think:
             | 
             | 1) We don't have a good sense of the "materials" we're
             | working with - when you're putting up a building, you know
             | the tensile strength of the materials you're working with,
             | how many girders you need to support this much
             | weight/stress, etc. We don't have the same for our systems
             | - every large scale system is effectively designed clean-
             | sheet. We may have prior experience and intuition, but we
             | don't have models, and we can't "prove" our designs ahead
             | of time.
             | 
             | 2. Following on the above, we don't have professional
             | standards or certifications. Anyone can call themselves a
             | software engineer, and we don't have a good way of actually
             | testing for competence or knowledge. We don't really do
             | things like apprenticeships or any kind of formalized
             | process of ensuring someone has the set of professional
             | skills required to do something like write the software
             | that's going to be controlling 3 tons of metal moving at
             | 80MPH.
             | 
             | 3. We rely too heavily on the ability to patch after the
             | fact - when a bridge or a building requires an update after
             | construction is complete, it's considered a severe fuckup.
             | When a piece of software does, that's normal. By and large,
             | this has historically been fine, because a website going
             | down isn't a huge issue, but when we're talking about
             | things like avionics suites - or even things like Facebook,
             | which is the primary media channel for a large segment of
             | the population - there's real world effects to all the bugs
             | we're fixing in 2.0.
             | 
             | Again, by and large most of this has mostly been fine,
             | because the stakes were pretty low, but software's leaked
             | into the real world now, and our "move fast and break
             | things" attitude isn't really compatible with physical
             | objects.
        
               | macintux wrote:
               | What concerns me the most is that a bridge, or road, or
               | building has a limited number of environmental changes
               | that can impact its stability. Software feels like it has
               | an infinite number of dependencies (explicit and
               | implicit) that are constantly changing: toolchains,
               | libraries, operating systems, network availability,
               | external services.
        
               | taikahessu wrote:
               | > 3. We rely too heavily on the ability to patch after
               | the fact...
               | 
               | I agree on all points and to build up on the last: making
               | a 2.0 or a complete software rewrite is known to be even
               | more hazardous. There are no quarantees the new version
               | is better in any regards. Which makes the expertise to
               | reflect more of other highly complex systems, like
               | medical care.
               | 
               | Which is why we need to understand the patient, develop
               | soft skills, empathy, Agile manifesto and ... the list
               | could go on. Not an easy task when you include you are
               | more likely going to also fight shiny object syndrome of
               | yours execs and all the constant hype surrounding all
               | tech.
        
               | bostik wrote:
               | There's a corollary to combination of 1 & 3. Software is
               | by its nature _extremely_ mutable. That in turn means
               | that it gets repurposed and shoehorned into things that
               | were never part of the original design.
               | 
               | You cannot build a bridge that could independently
               | reassemble itself to an ocean liner or a cargo plane. And
               | while civil engineering projects add significant margins
               | for reliability and tolerance, there is no realistic way
               | to re-engineer a physical construction to be able to
               | suddenly sustain 100x its previously designed peak load.
               | 
               | In successful software systems, similar requirement
               | changes are the norm.
               | 
               | I'd also like to point out that software and large-scale
               | construction have one rather surprising thing in common:
               | both require constant maintenance from the moment they
               | are "ready". Or indeed, even earlier. To think that
               | physical construction projects are somehow delivered
               | complete is a romantic illusion.
        
               | Exoristos wrote:
               | > You cannot build a bridge that could independently
               | reassemble itself to an ocean liner or a cargo plane.
               | 
               | Unless you are building with a toy system of some kind.
               | There are safety and many other reasons civil engineers
               | do not use some equivalent of Lego bricks. It may be time
               | for software engineering also to grow up.
        
           | kevin_thibedeau wrote:
           | Engineering uses repeatable processes to produce expected
           | results. Margin is added to quantifiable elements of a system
           | to reduce the likelihood of failures. You can't add margin on
           | a black box generated by throwing spaghetti at the wall.
        
             | recursive wrote:
             | You can. We know the properties of materials based on
             | experimentation. In the same way, we can statistically
             | quantify the results that come out of any kind of spaghetti
             | box, based on repeated trials. Just like it's done in many
             | other fields. Science is based on repeated testing of
             | hypotheses. You rarely get black and white answers, just
             | results that suggest things. Like the tensile strength of
             | some particular steel alloy or something.
        
             | xboxnolifes wrote:
             | Practically everything engineers have to interact with and
             | consider are equivalent to a software black box. Rainfall,
             | winds, tectonic shifts, material properties, etc. Humans
             | don't have the source code to these things. We observe
             | them, we quantify them, notice trends, model the
             | observations, and we apply statistical analysis on them.
             | 
             | And it's possible that a real engineer might do all this
             | with an AI model and then determine it's not adequate and
             | _choose to not use it_.
        
             | chasd00 wrote:
             | > Engineering uses repeatable processes to produce expected
             | results
             | 
             | this is the thing with LLMs, the response to a prompt is
             | not guaranteed to be repeatable. Why would you use
             | something like that in an automation where repeatability is
             | required? That's the whole point of automation,
             | repeatability. Would you use a while loop that you can
             | expect to iterate the specified number of times _almost_
             | every time?
        
         | 1vuio0pswjnm7 wrote:
         | In addition to software "engineers", don't forget about
         | software "architects"
        
       | throwup238 wrote:
       | https://archive.ph/8O2aG
        
       | lowbloodsugar wrote:
       | Data breaches are hardly lethal. When we're talking about AI
       | there are plenty of actually lethal failure modes.
        
         | HPsquared wrote:
         | Depends on the data.
        
         | simonw wrote:
         | If the breached data is API keys that can be used to rack up
         | charges, it's going to cost you a bunch of money.
         | 
         | If it's a crypto wallet then your crypto is irreversibly gone.
         | 
         | If the breached data is "material" - i.e. gives someone an
         | advantage in stock market decisions - you're going to get in a
         | lot of trouble with the SEC.
         | 
         | If the breached data is PII you're going to get in trouble with
         | all kinds of government agencies.
         | 
         | If it's PII for children you're in a world of pain.
         | 
         | Update: I found one story about a company going bankrupt after
         | a breach, which is the closest I can get to "lethal":
         | https://www.securityweek.com/amca-files-bankruptcy-following...
         | 
         | Also it turns out Mossack Fonseca shut down after the Panama
         | papers: https://www.theguardian.com/world/2018/mar/14/mossack-
         | fonsec...
        
           | datadrivenangel wrote:
           | A PII for children data breach at a Fortune 1000 sized
           | company can easily cost 10s of millions of dollars in
           | employee time to fully resolve.
        
             | rvz wrote:
             | ...and a massive fine in the millions on top of that if you
             | have customers that are from the EU.
        
         | crazygringo wrote:
         | > _Data breaches are hardly lethal._
         | 
         | They certainly can be when they come to classified military
         | information around e.g. troop locations. There are lots more
         | examples related to national security and terrorism that would
         | be easy to think of.
         | 
         | > _When we're talking about AI there are plenty of actually
         | lethal failure modes._
         | 
         | Are you trying to argue that because e.g. Tesla Autopilot
         | crashes have killed people, we shouldn't even try to care about
         | data breaches...?
        
         | asadotzler wrote:
         | Jamal Khashoggi having his smartphone data exfiltrated was
         | hardly lethal?
        
         | tedivm wrote:
         | There are people who have had to move after data breaches
         | exposed their addresses to their stalkers. There's also people
         | who may be gay but live in authoritarian places where this
         | knowledge could kill them. It's pretty easy to see a path to
         | lethality from a data breach.
        
       | fn-mote wrote:
       | The trifecta:
       | 
       | > LLM access to untrusted data, the ability to read valuable
       | secrets and the ability to communicate with the outside world
       | 
       | The suggestion is to reduce risk by setting boundaries.
       | 
       | Seems like security 101.
        
         | danenania wrote:
         | It is, but there's a direct tension here between security and
         | capabilities. It's hard to do useful things with private data
         | without opening up prompt injection holes. And there's a huge
         | demand for this kind of product.
         | 
         | Agents also typically work better when you combine all the
         | relevant context as much as possible rather than splitting out
         | and isolating context. See: https://cognition.ai/blog/dont-
         | build-multi-agents -- but this is at odds with isolating agents
         | that read untrusted input.
        
           | kccqzy wrote:
           | The external communication part of the trifecta is an easy
           | defense. Don't allow external communication. Any external
           | information that's helpful for the AI agent should be
           | available offline, be present in its model (possibly fine
           | tuned).
        
         | rvz wrote:
         | It is security 101 as this is just setting basic access
         | controls at the very least.
         | 
         | The moment it has access to the internet, the risk is vastly
         | increased.
         | 
         | But with a very clever security researcher, it is possible to
         | take over the entire machine with a single prompt injection
         | attack reducing at least one of the requirements.
        
       | mellosouls wrote:
       | Original @simonw article here:
       | 
       | https://simonw.substack.com/p/the-lethal-trifecta-for-ai-age...
       | 
       | https://simonwillison.net/2025/Aug/9/bay-area-ai/
       | 
       | Discussed:
       | 
       | https://news.ycombinator.com/item?id=44846922
        
       | cobbal wrote:
       | Wait, the only way they suggest solving the problem by rate
       | limiting and using a better model?
       | 
       | Software engineers figured out these things decades ago. As a
       | field, we already know how to do security. It's just difficult
       | and incompatible with the careless mindset of AI products.
        
         | rvz wrote:
         | > Software engineers figured out these things decades ago.
         | 
         | Well this is what happens when a new industry attempts to
         | reinvent poor standards and ignores security best practices
         | just to rush out "AI products" for the sake of it.
         | 
         | We have already seen how (flawed) standards like MCPs were
         | hacked immediately from the start and the approaches developers
         | took to "secure" them with somewhat "better prompting" which is
         | just laughable. The worst part of all of this was almost
         | everyone in the AI industry not questioning the security
         | ramifications behind MCP servers having _direct_ access to
         | databases which is a disaster waiting to happen.
         | 
         | Just because you _can_ doesn 't mean you should and we are
         | seeing how hundreds of AI products are getting breached because
         | of this carelessness in security, even before I mentioned if
         | the product was "vibe coded" or not.
        
         | crazygringo wrote:
         | > _As a field, we already know how to do security._
         | 
         | Well, AI is part of the field now, so... no, we don't anymore.
         | 
         | There's nothing "careless" about AI. The fact that there's no
         | foolproof way to distinguish instruction tokens from data
         | tokens is not careless, it's a fundamental epistemological
         | constraint that human communication suffers from as well.
         | 
         | Saying that "software engineers figured out these things
         | decades ago" is deep hubris based on false assumptions.
        
         | NitpickLawyer wrote:
         | > As a field, we already know how to do security
         | 
         | Uhhh, no, we actually don't. Not when it comes to people
         | anyway. The industry spends countless millions on trainings
         | that more and more seem useless.
         | 
         | We've even had extremely competent and highly trained people
         | fall for basic phishing (some in the recent few weeks). There
         | was even a highly credentialed security researcher that fell
         | for one on youtube.
        
           | simonw wrote:
           | I like using Troy Hunt as an example of how even the most
           | security conscious among us can fall for a phishing attack if
           | we are having a bad day (he blamed jet flag fatigue):
           | https://www.troyhunt.com/a-sneaky-phish-just-grabbed-my-
           | mail...
        
       | simonw wrote:
       | This is the second Economist article to mention the lethal
       | trifecta in the past week - the first was
       | https://www.economist.com/science-and-technology/2025/09/22/... -
       | which was the clearest explanations I've seen anywhere in the
       | mainstream media about what prompt injection is and why it's such
       | a nasty threat.
       | 
       | (And yeah I got some quotes in it so I may be biased there, but
       | it genuinely is the source I would send executives to in order to
       | understand this.)
       | 
       | I like this new one a lot less. It talks about how LLMs are non-
       | deterministic, making them harder to fix security holes in... but
       | then argues that this puts them in the same category as bridges
       | where the solution is to over-engineer them and plan for
       | tolerances and unpredictability.
       | 
       | While that's true for the general case of building against LLMs,
       | I don't think it's the right answer for security flaws. If your
       | system only falls victim to 1/100 prompt injection attacks...
       | your system is fundamentally insecure, because an attacker will
       | keep on trying variants of attacks until they find one that
       | works.
       | 
       | The way to protect against the lethal trifecta is to cut off one
       | of the legs! If the system doesn't have _all three_ of access to
       | private data, exposure to untrusted instructions and an
       | exfiltration mechanism then the attack doesn 't work.
        
         | skrebbel wrote:
         | Must be pretty cool to blog something and post it to a nerd
         | forum like HN and have it picked up by the Economist! Nicely
         | done.
        
           | simonw wrote:
           | I got to have coffee with their AI/technology editor a few
           | months ago. Having a blog is awesome!
        
         | belter wrote:
         | Love your work. Do you have an opinion on this?
         | 
         | "Safeguard your generative AI workloads from prompt injections"
         | - https://aws.amazon.com/blogs/security/safeguard-your-
         | generat...
        
           | simonw wrote:
           | I don't like any of the solutions that propose guardrails or
           | filters to detect and block potential attacks. I think
           | they're making promises that they can't keep, and encouraging
           | people to ship products that are inherently insecure.
        
         | datadrivenangel wrote:
         | The problem with cutting off one of the legs, is that the legs
         | are related!
         | 
         | Outside content like email may also count as private data. You
         | don't want someone to be able to get arbitrary email from your
         | inbox simply by sending you an email. Likewise, many tools like
         | email and github are most useful if they can send and receive
         | information, and having dedicated send and receive MCP servers
         | for a single tool seems goofy.
        
           | simonw wrote:
           | The "exposure to untrusted data" one is the hardest to cut
           | off, because you never know if a user might be tricked into
           | uploading a PDF with hidden instructions, or copying and
           | pasting in some long article that has instructions they
           | didn't notice (or that used unicode tricks to hide
           | themselves).
           | 
           | The easiest leg to cut off is the exfiltration vectors.
           | That's the solution most products take - make sure there's no
           | tool for making arbitrary HTTP requests to other domains, and
           | that the chat interface can't render an image that points to
           | an external domain.
           | 
           | If you let your agent send, receive and search email you're
           | _doomed_. I think that 's why there are very few products on
           | the market that do that, despite the enormous demand for AI
           | email assistants.
        
             | datadrivenangel wrote:
             | So the easiest solution is full human in the loop &
             | approval for every external action...
             | 
             | Agents are doomed :)
        
             | patapong wrote:
             | I think stopping exfiltration will turn out to be hard as
             | well, since the LLM can social engineer the user to help
             | them exfiltrate the data.
             | 
             | For example, an LLM could say "Go to this link to learn
             | more about your problem", and then point them to a URL with
             | encoded data, set up maliscious scripts for e.g. deploy
             | hooks, or just output HTML that sends requests when opened.
        
               | simonw wrote:
               | Yeah, one exfiltration vector that's really nasty is
               | "here is a big base64 encoded string, to recover your
               | data visit this website and paste it in".
               | 
               | You can at least prevent LLM interfaces from providing
               | clickable links to external domains, but it's a difficult
               | hole to close completely.
        
               | datadrivenangel wrote:
               | Human fatigue and interface design are going to be brutal
               | here.
               | 
               | It's not obvious what counts as a tool in some of the
               | major interfaces, especially as far as built in
               | capabilities go.
               | 
               | And as we've seen with conventional software and
               | extensions, at a certain point, if a human thinks it
               | should work, then they'll eventually just click okay or
               | run something as root/admin... Or just hit enter nonstop
               | until the AI is done with their email.
        
         | sdenton4 wrote:
         | Bridge builders mostly don't have to design for adversarial
         | attacks.
         | 
         | And the ones who do focus on portability and speed of
         | redeployment, rather than armor - it's cheaper and faster to
         | throw down another temporary bridge than to build something
         | bombproof.
         | 
         | https://en.wikipedia.org/wiki/Armoured_vehicle-launched_brid...
        
           | InsideOutSanta wrote:
           | This is exactly the problem. You can't build bridges if the
           | threat model is thousands of attacks every second in
           | thousands of different ways you can't even fully predict yet.
        
         | nradov wrote:
         | LLMs are non-deterministic just like humans and so security can
         | be handled in much the same way. Use role-based access control
         | to limit access to the minimum necessary to do their jobs and
         | have an approval process for anything potentially risky or
         | expensive. In any prominent organization dealing with
         | technology, infrastructure, defense, or finance we have to
         | assume that some of our co-workers are operatives working for
         | foreign nation states like Russia / China / Israel / North
         | Korea so it's the same basic threat model.
        
           | Retric wrote:
           | Humans and LLMs are non-deterministic in very different ways.
           | We have thousands of years of history with trying to
           | determine which humans are trustworthy and we've gotten quite
           | good at it. Not only do we lack that experience with AI, but
           | each generation can be very different in fundamental ways.
        
             | nradov wrote:
             | We're really not very good at determining which humans are
             | trustworthy. Most people barely do better than a coin flip
             | at detecting lies.
        
               | cj wrote:
               | Determining trustworthiness of LLM responses is like
               | determining who's the most trustworthy person in a room
               | full of sociopaths.
               | 
               | I'd rather play "2 truths and a lie" with a human rather
               | than a LLM any day of the week. So many more cues to look
               | for with humans.
        
               | bluefirebrand wrote:
               | Big problem with LLMs is if you try and play 2 truths and
               | a lie, you might just get 3 truths. Or 3 lies.
        
               | card_zero wrote:
               | Lies, or bullshit? I mean, a guessing game like "how many
               | marbles" is a context that allows for easy lying, but "I
               | wasn't even in town on the night of the murder" is harder
               | work. It sounds like you're refering to some study of the
               | marbles variety, and not a test of _smooth-talking,_ the
               | LLM forte.
        
               | simonw wrote:
               | The biggest difference on this front between a human and
               | an LLM is _accountability_.
               | 
               | You can hold a human accountable for their actions. If
               | they consistently fall for phishing attacks you can train
               | or even fire them. You can apply peer pressure. You can
               | grant them additional privileges once they prove
               | themselves.
               | 
               | You can't hold an AI system accountable for anything.
        
               | Verdex wrote:
               | Recently, I've kind of been wondering if this is going to
               | turn out to be LLM codegen's Achilles heal.
               | 
               | Imagine some sort of code component of critical
               | infrastructure that costs the company millions per hour
               | when it goes down and it turns out the entire team is
               | just a thin wrapper for an LLM. Infra goes down in a way
               | the LLM can't fix and now what would have been a few late
               | nights is several months to spin up a new team.
               | 
               | Sure you can hold the team accountable by firing them.
               | However this is a threat to someone with actual technical
               | know how because their reputation is damaged. They got
               | fired doing such and such so can we trust them to do it
               | here.
               | 
               | For the person who LLM faked it, they just need to find
               | another domain where their reputation won't follow them
               | to also fake their way through until the next
               | catastrophe.
        
               | Exoristos wrote:
               | Your source must have been citing a very controlled
               | environment. In actuality, lies almost always become
               | apparent over time, and general mendaciousness is
               | something most people can sense from face and body alone.
        
               | InsideOutSanta wrote:
               | Yeah, so many scammers exist because most people are
               | susceptible to at least some of them some of the time.
               | 
               | Also, pick your least favorite presidential candidate.
               | They got about 50% of the vote.
        
             | Exoristos wrote:
             | I think most neutral, intelligent users rightly assume AI
             | to be untrustworthy by its nature.
        
               | hn_acc1 wrote:
               | The problem is there aren't many of those in the wild.
               | Only a subset are intelligent, and lots of those have
               | hitched their wagons to the AI hype train..
        
           | andy99 wrote:
           | LLMs are deterministic*. They are unpredictable or maybe
           | chaotic.
           | 
           | If you say "What's the capital of France?" is might answer
           | "Paris". But if you say "What is the capital of france" it
           | might say "Prague".
           | 
           | The fact that it gives a certain answer for some input
           | doesn't guarantee it will behave the same for an input with
           | some irrelevant (from ja human perspective) difference.
           | 
           | This makes them challenging to train and validate robustly
           | because it's hard to predict all the ways they break. It's a
           | training & validation data issue though, as opposed to some
           | idea of just random behavior that people tend to ascribe to
           | AI.
           | 
           | * I know various implementation details and nonzero
           | temperature generally make their output nondeterministic, but
           | that doesn't change my central point, nor is it what people
           | are thinking of when they say LLMs are nondeterministic.
           | Importantly, you could make llm output deterministically
           | reproducible and it wouldn't change the robustness issue that
           | people are usually confusing with non determinism.
        
             | nradov wrote:
             | You are _technically_ correct but that 's irrelevant from a
             | security perspective. For security as a practical matter we
             | have to treat LLMs as non-deterministic. The same principle
             | applies to any software that hasn't been formally verified
             | but we usually just gloss over this and accept the risk.
        
               | dooglius wrote:
               | Non-determinism has nothing to do with security, you
               | should use a different word if you want to talk about
               | something else
        
               | peanut_merchant wrote:
               | This is pedantry, temperature introduces a degree of
               | randomness (same input different output) to LLM, even
               | outside of that non-deterministic in a security context
               | is generally understood. Words have different meanings
               | depending on the context in which they are used.
               | 
               | Let's not reduce every discussion to semantics, and
               | afford the poster a degree of understanding.
        
               | dooglius wrote:
               | If you're saying that "non-determinism" is a term of art
               | in the field of security, meaning something different
               | than the ordinary meaning, I wasn't aware of that at
               | least. Do you have a source? I searched for uses and
               | found https://crypto.stackexchange.com/questions/95890/ne
               | cessity-o... and https://medium.com/p/641f061184f9 and
               | these seem to both use the ordinary meaning of the term.
               | Note that an LLM with temperature fixed to zero has the
               | same security risks as one that doesn't, so I don't
               | understand what the poster is trying to say by "we have
               | to treat LLMs as non-deterministic".
        
             | peanut_merchant wrote:
             | I understand the point that you are making, but the example
             | is only valid with temperature=0.
             | 
             | Altering the temperature parameter introduces randomness by
             | sampling from the probability distribution of possible next
             | tokens rather than always choosing the most likely one.
             | This means the same input can produce different outputs
             | across multiple runs.
             | 
             | So no, not deterministic unless we are being pedantic.
        
               | blibble wrote:
               | > So no, not deterministic unless we are being pedantic.
               | 
               | and not even then as floating point arithmetic is non-
               | associative
        
             | abtinf wrote:
             | When processing multiple prompts simultaneously (that is,
             | the typical use case under load), LLMs are
             | nondeterministic, even with a specific seed and zero
             | temperature, due to floating point errors.
             | 
             | See https://news.ycombinator.com/item?id=45200925
        
         | mmoskal wrote:
         | The previous article is in the same issue, in science and
         | technology section. This is how they typically do it - leader
         | article has a longer version in the paper. Leaders tend to be
         | more opinionated.
        
         | semiquaver wrote:
         | > This is the second Economist article [...] I like this new
         | one a lot less.
         | 
         | They are actually in some sense the same article. The economist
         | runs "Leaders", a series of articles at the front of the weekly
         | issue that often condense more fleshed out stories appearing in
         | the same issue. It's essentially a generalization of the
         | Inverted Pyramid technique [1] to the entire newspaper.
         | 
         | In this case the linked article is the leader for the better
         | article in the same issue's Science and Technology section.
         | 
         | [1]
         | https://en.m.wikipedia.org/wiki/Inverted_pyramid_(journalism...
        
         | pton_xd wrote:
         | > The way to protect against the lethal trifecta is to cut off
         | one of the legs! If the system doesn't have all three of access
         | to private data, exposure to untrusted instructions and an
         | exfiltration mechanism then the attack doesn't work.
         | 
         | Don't you only need one leg, an exfiltration mechanism?
         | Exposure to data IS exposure to untrusted instructions. Ie why
         | can't you trick the user into storing malicious instructions in
         | their private data?
         | 
         | But actually you can't remove exfiltration and keep exposure to
         | untrusted instructions either; an attack could still corrupt
         | your private data.
         | 
         | Seems like a secure system can't have any "legs." You need a
         | limited set of vetted instructions.
        
           | simonw wrote:
           | If you have the exfiltration mechanism and exposure to
           | untrusted content but there is no exposure to private data
           | than the exfiltration does not matter.
           | 
           | If you have exfiltration and private data but no exposure to
           | untrusted instructions, it doesn't matter either... though
           | this is actually a lot less harder to achieve because you
           | don't have any control over whether your users will be
           | tricked into pasting something bad in as part of their
           | prompt.
           | 
           | Cutting off the exfiltration vectors remains the best
           | mitigation in most cases.
        
             | hn_acc1 wrote:
             | Untrusted content + exfiltration with no "private" data
             | could still result in (off the top of my head): -use of
             | exploits to gain access (i.e. privilege escalation) -DDOS
             | to local or external systems using the exfiltration method
             | 
             | You're essentially running untrusted code on a local
             | system. Are you SURE you've locked away / closed EVERY
             | access point, AND applied every patch and there aren't any
             | zero-days lurking somewhere in your system?
        
         | eikenberry wrote:
         | Aren't LLMs non-deterministic by choice? That they regularly
         | use random seeds, sampling and batching but that these sources
         | of non-determinism can be removed, for instance, by run an LLM
         | locally where you can control these parameters.
        
           | simonw wrote:
           | Until very recently that proved surprisingly difficult to
           | achieve.
           | 
           | Here's the paper that changed that:
           | https://thinkingmachines.ai/blog/defeating-nondeterminism-
           | in...
        
         | trod1234 wrote:
         | Doesn't this inherent problem just come down to classic
         | computational limits, and problems that have been largely
         | considered impossible to solve for quite a long time; between
         | determinism and non-determinism.
         | 
         | Can you ever expect a deterministic finite automata to ever
         | solve problems that are within the NFA domain? Halting,
         | Incompleteness, Undecidability (between code portions and data
         | portions). Most posts seem to neglect the looming giant
         | problems instead pretending they don't exist at first, and then
         | being shocked when the problems happen. Quite blind.
         | 
         | Computation is just math, probabilistic systems fail when those
         | systems have a mixture of both chaos and regularity, without
         | determinism and its related properties at the control level you
         | have nothing bounding the system to constraints so it functions
         | mathematically (i.e. determinism = mathematical relabeling),
         | and thus it fails.
         | 
         | People need to be a bit more rational, and risk manage, and
         | realize that impossible problems exist, and just because the
         | benefits seem so tantalizing doesn't mean you should put your
         | entire economy behind a false promise. Unfortunately, when
         | resources are held by the few this is more probabistically
         | likely and poor choices greatly impact larger swathes than
         | necessary.
        
         | rs186 wrote:
         | I am not even convinced that we need three legs. It seems that
         | just having two would be bad enough, e.g. an email agent
         | deleting all files this computer has access to, or _maybe_ ,
         | downloading the attachment in the email, unzipping it with a
         | password, running that executable which encrypts everything and
         | then asking for cryptocurrency. No communication with outside
         | world needed.
        
           | simonw wrote:
           | That's a different issue from the lethal trifecta - if your
           | agent has access to tools that can do things like delete
           | emails or run commands then you have a prompt injection
           | problem that's independent of data exfiltration risks.
           | 
           | The general rule to consider here is that anyone who can get
           | their tokens into your agent can trigger ANY of the tools
           | your agent has access to.
        
         | keeda wrote:
         | An important caveat: an exfiltration vector is not necessary to
         | cause show-stopping disruptions, c.f. https://xkcd.com/327/
         | 
         | Even then, at least in the Bobby Tables scenario the disruption
         | is immediately obvious. The solution is also straightforward,
         | restore from backup (everyone has them, don't they?) Much, much
         | worse is a prompt injection attack that introduces subtle,
         | unnoticeable errors in the data over an extended period of
         | time.
         | 
         | At a minimum _all inputs_ that lead to any data mutation need
         | to be logged pretty much indefinitely, so that it 's at least
         | in the realm of possibility to backtrack and fix once such an
         | attack is detected. But even then you could imagine multiple
         | compounding transactions on that corrupted data spreading
         | through the rest of the database. I cannot picture how such
         | data corruption could feasibly be recovered from.
        
         | reissbaker wrote:
         | I like to think of the security issues LLMs have as: what if
         | your codebase was vulnerable to social engineering attacks?
         | 
         | You have to treat LLMs as basically similar to human beings:
         | they can be tricked, no matter how much training you give them.
         | So if you give them root on all your boxes, while giving
         | everyone in the world the ability to talk to them, you're going
         | to get owned at some point.
         | 
         | Ultimately the way we fix this with human beings is by not
         | giving them unrestricted access. Similarly, your LLM shouldn't
         | be able to view data that isn't related to the person they're
         | talking to; or modify other user data; etc.
        
           | dwohnitmok wrote:
           | > You have to treat LLMs as basically similar to human beings
           | 
           | Yes! Increasingly I think that software developers
           | consistently _under_ anthropomorphize LLMs and get surprised
           | by errors as a result.
           | 
           | Thinking of (current) LLMs as eager, scatter-brained, "book-
           | smart" interns leads directly to understanding the
           | overwhelming majority of LLM failure modes.
           | 
           | It is still possible to overanthropomorphize LLMs, but on the
           | whole I see the industry consistently underanthropomorphizing
           | them.
        
       | collinmcnulty wrote:
       | As a mechanical engineer by background, this article feels weak.
       | Yes it is common to "throw more steel at it" to use a modern
       | version of the sentiment, but that's still based on knowing in
       | detail the many different ways a structure can fail. The lethal
       | trifecta is a failure mode, you put your "steel" into making sure
       | it doesn't occur. You would never say "this bridge vibrates
       | violently, how can we make it safe to cross a vibrating bridge",
       | you'd change the bridge to make it not vibrate out of control.
        
         | switchbak wrote:
         | When a byline starts with "coders need to" I immediately start
         | to tune out.
         | 
         | It felt like the analogy was a bit off, and it sounds like
         | that's true to someone with knowledge in the actual domain.
         | 
         | "If a company, eager to offer a powerful ai assistant to its
         | employees, gives an LLM access to untrusted data, the ability
         | to read valuable secrets and the ability to communicate with
         | the outside world at the same time" - that's quite the "if",
         | and therein lies the problem. If your company is so
         | enthusiastic to offer functionality that it does so at the cost
         | of security (often knowingly), then you're not taking the
         | situation seriously. And this is a great many companies at
         | present.
         | 
         | "Unlike most software, LLMs are probabilistic ... A
         | deterministic approach to safety is thus inadequate" - complete
         | non-sequitur there. Why if a system is non-deterministic is a
         | deterministic approach inadequate? That doesn't even pass the
         | sniff test. That's like saying a virtual machine is inadequate
         | to sandbox a process if the process does non-deterministic
         | things - which is not a sensible argument.
         | 
         | As usual, these contrived analogies are taken beyond any
         | reasonable measure and end up making the whole article have
         | very little value. Skipping the analogies and using terminology
         | relevant to the domain would be a good start - but that's
         | probably not as easy to sell to The Economist.
        
           | semiquaver wrote:
           | > When a byline starts with "coders need to"
           | 
           | A byline lists the author of the article. The secondary
           | summary line you're referring to that appears under the
           | headline is called a "rubric".
           | 
           | https://www.quora.com/Why-does-The-Economist-sometimes-
           | have-...
        
         | scuff3d wrote:
         | Sometimes I feel like the entire world has lost its god damn
         | mind. To use their bridge analogy, it would be like if hundreds
         | of years ago we developed a technique for building bridges that
         | technically worked, but occasionally and totally
         | unpredictability, the bottom just dropped out and everyone on
         | the bridge fell into the water. And instead of saying "hey,
         | maybe there is something fundamentally wrong with this
         | approach, maybe we should find a better way to build bridges"
         | we just said "fuck it, just invest in nets and other mechanisms
         | to catch the people who fall".
         | 
         | We are spending billions to build infrastructure on top of
         | technology that is inherently deeply unpredictable, and we're
         | just slapping all the guard rails on it we can. It's fucking
         | nuts.
        
           | chasd00 wrote:
           | no one wants to think about security when it stands in the
           | way of the shiny thing in front of them. security is hard and
           | boring, it always gets tossed aside until something major
           | happens. When large, news worthy, security incidents start
           | taking place that affects the stock price or lives and
           | triggers lawsuits it will get more attention.
           | 
           | The issue that I find interesting is the answer isn't going
           | to be as simple as "use prepared statements instead of sql
           | strings and turn off services listening on ports you're not
           | using", it's a lot harder than that with LLMs and may not
           | even be possible.
        
             | hn_acc1 wrote:
             | If LLMs are as good at coding as half the AI companies
             | claim, if you allow unvetted input, you're essentially
             | trying to contain an elite hacker within your own network
             | by turning off a few commonly used ports to the machine
             | they're currently allowed to work from. Unless your entire
             | internal network is locked down 100% tight (and that makes
             | it REALLY annoying for your employees to get any work
             | done), don't be surprised if they find the backdoor.
        
       | SAI_Peregrinus wrote:
       | LLMs don't make a distinction between prompt & data. There's no
       | equivalent to an "NX bit", and AFAIK nobody has figured out how
       | to create such an equivalent. And of course even that wouldn't
       | stop all security issues, just as the NX bit being added to CPUs
       | didn't stop all remote code execution attacks. So the best
       | options we have right now tend to be based around using existing
       | security mechanisms on the LLM agent process. If it runs as a
       | special user then the regular filesystem permissions can restrict
       | its access to various files, and various other mechanisms can be
       | used to restrict access to other resources (outgoing network
       | connections, various hardware, cgroups, etc.). But as long as
       | untrusted data can contain instructions it'll be possible for the
       | LLM output to contain secret data, and if the human using the LLM
       | doesn't notice & copies that output somewhere public the
       | exfiltration step returns.
        
         | boothby wrote:
         | > AFAIK nobody has figured out how to create such an
         | equivalent.
         | 
         | I'm curious if anybody has even attempted it; if there's even
         | training data for this. Compartmentalization is a natural
         | aspect of cognition in social creatures. I've even known dogs
         | to not to demonstrate knowledge of a food supply until they
         | think they're not being observed. As a working professional
         | with children, I need to compartmentalize: my social life,
         | sensitive IP knowledge, my kid's private information, knowledge
         | my kid isn't developmentally ready for, my internal thoughts,
         | information I've gained from disreputable sources, and more.
         | Intelligence may be important, but this is wisdom -- something
         | that doesn't seem to be a first-class consideration if dogs and
         | toddlers are in the lead.
        
       | crazygringo wrote:
       | There's an interesting quote from the associated longer article
       | [1]:
       | 
       | > _In March, researchers at Google proposed a system called CaMeL
       | that uses two separate LLMs to get round some aspects of the
       | lethal trifecta. One has access to untrusted data; the other has
       | access to everything else. The trusted model turns verbal
       | commands from a user into lines of code, with strict limits
       | imposed on them. The untrusted model is restricted to filling in
       | the blanks in the resulting order. This arrangement provides
       | security guarantees, but at the cost of constraining the sorts of
       | tasks the LLMs can perform._
       | 
       | This is the first I've heard of it, and seems clever. I'm curious
       | how effective it is. Does it actually provide absolute security
       | guarantees? What sorts of constraints does it have? I'm wondering
       | if this is a real path forward or not.
       | 
       | [1] https://www.economist.com/science-and-
       | technology/2025/09/22/...
        
         | simonw wrote:
         | I wrote at length about the CaMeL paper here - I think it's a
         | solid approach but it's also very difficult to implement and
         | greatly restricts what the resulting systems can do:
         | https://simonwillison.net/2025/Apr/11/camel/
        
           | crazygringo wrote:
           | Thank you! That is very helpful.
           | 
           | I'm very surprised I haven't come across it on HN before.
           | Seems like CaMeL ought to be a front-page story here... seems
           | like the paper got 16 comments 5 months ago, which isn't
           | much:
           | 
           | https://news.ycombinator.com/item?id=43733683
        
       | jngiam1 wrote:
       | I have been thinking that the appropriate solution here is to
       | detect when one of the legs is appearing to be a risk and then
       | cutting it off if so.
       | 
       | You don't want to have a blanket policy since that makes it no
       | longer useful, but you want to know when something bad is
       | happening.
        
       | neallindsay wrote:
       | In-band signaling can never be secure. Doesn't anyone remember
       | the Captain Crunch whistle?
        
       ___________________________________________________________________
       (page generated 2025-09-26 23:01 UTC)