[HN Gopher] Disrupting the first reported AI-orchestrated cyber ...
       ___________________________________________________________________
        
       Disrupting the first reported AI-orchestrated cyber espionage
       campaign
        
       Author : koakuma-chan
       Score  : 120 points
       Date   : 2025-11-13 18:34 UTC (4 hours ago)
        
 (HTM) web link (www.anthropic.com)
 (TXT) w3m dump (www.anthropic.com)
        
       | 2OEH8eoCRo0 wrote:
       | > The threat actor--whom we assess with high confidence was a
       | Chinese state-sponsored group--manipulated our Claude Code tool
       | into attempting infiltration into roughly thirty global targets
       | and succeeded in a small number of cases.
        
         | stocksinsmocks wrote:
          | So why do we never hear of US-sponsored hackers attacking
         | foreign businesses? Or Swedish cyber criminals? Does it never
         | happen? Are "Chinese" hackers just the only ones getting the
         | blame?
        
           | pixl97 wrote:
           | US, Israel, NK, China, Iran, and Russia are the countries you
           | typically hear about hacking things.
           | 
            | Now when the US/Israel attack authoritarian countries, those
            | countries often don't publish anything about it, as it would
            | make the glorious leader look bad.
            | 
            | If the EU is hacked by the US, I guess we use diplomatic back
            | channels.
        
       | barbazoo wrote:
       | It sounds like they built a malicious Claude Code client, is that
       | right?
       | 
       | > The threat actor--whom we assess with high confidence was a
       | Chinese state-sponsored group--manipulated our Claude Code tool
       | into attempting infiltration into roughly thirty global targets
       | and succeeded in a small number of cases. The operation targeted
       | large tech companies, financial institutions, chemical
       | manufacturing companies, and government agencies. We believe this
       | is the first documented case of a large-scale cyberattack
       | executed without substantial human intervention.
       | 
        | They presumably still have to distribute the malware to the
        | targets and get them to download and install it, no?
        
         | janpio wrote:
         | No, they used Claude Code as a tool to automate and speed up
         | their "hacking".
        
         | koakuma-chan wrote:
         | One time my co-worker got a scam call and it was an LLM talking
         | to him.
        
       | citrusx wrote:
       | They're spinning this as a positive learning experience, and
       | trying to make themselves look good. But, make no mistake, this
       | was a failure on Anthropic's part to prevent this kind of abuse
       | from being possible through their systems in the first place.
       | They shouldn't be earning any dap from this.
        
         | NitpickLawyer wrote:
         | Meh, drama aside, I'm actually curious what would be the true
         | capabilities of a system that doesn't go through any "safety"
          | alignment at all. Like an all-out "mil-spec" agent. Feed it
         | everything, RL it to own boxes, and let it loose in an air-
         | gapped network to see what the true capabilities are.
         | 
         | We know alignment hurts model performance (oAI people have said
         | it, MS people have said it). We also know that companies train
         | models on their own code (google had a blog about it recently).
         | I'd bet good money project0 has something like this in their
         | sights.
         | 
          | I don't think we're that far from blue vs. red agents
          | fighting and RLing off of each other in a loop.
        
         | vessenes wrote:
         | They don't have to disclose any of this - this was a fairly
         | good and fair overview of a system fault in my opinion.
        
       | gaogao wrote:
        | The gaps that led to this were, I think, part of why the CISO got
       | replaced - https://www.thestack.technology/anthropic-new-ciso-
       | claude-cy...
        
       | yawnxyz wrote:
       | so even Chinese state actors prefer Claude over Chinese models?
       | 
       | edit: Claude: recommended by 4 of 5 state sponsored hackers
        
         | bilbo0s wrote:
         | Uh..
         | 
         | No.
         | 
         | It's worse.
         | 
         | It's Chinese intel knowing that _you_ prefer Claude. So they
         | make Claude their asset.
         | 
         | Really no different than knowing that, romantically speaking,
         | some targets prefer a certain type of man or woman.
         | 
         | Believe me, the intelligence people behind these things have no
         | preferences. They'll do whatever it takes. Never doubt that.
        
         | resfirestar wrote:
         | Maybe they're trying it with all sorts of models and we're just
         | hearing about the part that used the Anthropic API.
        
       | sillysaurusx wrote:
       | If Anthropic should have prevented this, then logically they
       | should've had guardrails. Right now you can write whatever code
        | you want. But to those who advocate guardrails, keep in mind that
        | you're advocating for a company deciding what code you are and
        | aren't allowed to write.
       | 
       | Hopefully they'll be able to add guardrails without e.g.
       | preventing people from using these capabilities for fuzzing their
       | own networks. The best way to stay ahead of these kinds of
       | attacks is to attack yourself first, aka pentesting. But if the
       | large code models are the only ones that can do this effectively,
       | then it gets weird fast. Imagine applying to Anthropic for
       | approval to run certain prompts.
       | 
       | That's not necessarily a bad thing. It'll be interesting to see
       | how this plays out.
        
         | Onavo wrote:
          | They are mostly dealing with the low-hanging-fruit actors; the
          | current open source models are close enough to SOTA that
          | there's not going to be any meaningful performance difference
          | tbh. In other words it will stop script kiddies but make no
          | real difference when it comes to the actors you actually have
          | to worry about.
        
           | sillysaurusx wrote:
           | > the current open source models are close enough to SOTA
           | that there's not going to be any meaningful performance
           | difference
           | 
           | Which open model is close to Claude Code?
        
             | vessenes wrote:
             | Kimi K2 could easily be used for this; its agentic
             | benchmarks are similar to Claude's. And it's on-shore in
             | China, where Anthropic says these threat actors were
             | located.
        
         | vessenes wrote:
         | > That's not necessarily a bad thing.
         | 
         | I think it is in that it gives censorship power to a large
         | corporation. Combined with close-on-the-heels open weights
         | models like Qwen and Kimi, it's not clear to me this is a good
         | posture.
         | 
         | I think the reality is they'd need to really lock Claude off
         | for security research in general if they don't want this ever,
         | ever, happening on their platform. For instance, why not use
         | whatever method you like to get localhost ssh pipes up to
         | targeted servers, then tell Claude "yep, it's all local pentest
         | in a staging environment, don't access IPs beyond localhost
         | unless you're doing it from the server's virtual network"? Even
         | to humans, security research bridges black, grey and white uses
          | fluidly/in non-obvious ways. I think it's really tough to fully
         | block "bad" uses.
        
       | zkmon wrote:
       | TL;DR - Anthropic: Hey people! We gave the criminals even bigger
       | weapons. But don't worry, you can buy defense tools from us.
       | Remember, only we can sell you the protection you need. Order
       | today!
        
         | vessenes wrote:
         | Nope - it's "Hey everyone, this is possible everywhere,
         | including open weights models."
        
           | zkmon wrote:
           | yeah, by "we", I meant the AI tech gangs.
        
       | bgwalter wrote:
       | _We believe this is the first documented case of a large-scale
       | cyberattack executed without substantial human intervention._
       | 
       | The Morris worm already worked without human intervention. This
       | is Script Kiddies using Script Kiddie tools. Notice how proud
       | they are in the article that the big bad Chinese are using their
       | toolz.
       | 
       | EDIT: Yeah Misanthropic, go for -4 again you cheap propagandists.
        
       | CGMthrowaway wrote:
       | So basically, Chinese state-backed hackers hijacked Claude Code
       | to run some of the first AI-orchestrated cyber-espionage, using
       | autonomous agents to infiltrate ~30 large tech companies, banks,
       | chemical manufacturers and government agencies.
       | 
        | What's amazing is that the AI executed most of the attack
        | autonomously, performing at a scale and speed unattainable by
        | human teams - thousands of operations per second. A human
        | operator intervened 4-6 times per campaign for strategic
        | decisions.
        
         | ddalex wrote:
          | how did the autonomous agents infiltrate tech companies?
        
           | jagged-chisel wrote:
           | Carefully. Expertly. With panache, even.
        
         | input_sh wrote:
         | What exactly did they hijack? They used it like any other user.
        
       | d_burfoot wrote:
       | Wait a minute - the attackers were using the API to ask Claude
       | for ways to run a cybercampaign, and it was only defeated because
       | Anthropic was able to detect the malicious queries? What would
       | have happened if they were using an open-source model running
       | locally? Or a secret model built by the Chinese government?
       | 
        | I just updated my P(Doom) by a significant margin.
        
         | pixl97 wrote:
          | I mean, models exhibiting hacking behaviors have been predicted
          | by cyberpunk for decades now; it should be the first thing on
          | any doom list.
          | 
          | Governments, of course, will have models specially trained on
          | their corpus of unpublished hacks, making them better at
          | attacking than public models.
        
       | Imnimo wrote:
       | >At this point they had to convince Claude--which is extensively
       | trained to avoid harmful behaviors--to engage in the attack. They
       | did so by jailbreaking it, effectively tricking it to bypass its
       | guardrails. They broke down their attacks into small, seemingly
       | innocent tasks that Claude would execute without being provided
       | the full context of their malicious purpose. They also told
       | Claude that it was an employee of a legitimate cybersecurity
       | firm, and was being used in defensive testing.
       | 
       | The simplicity of "we just told it that it was doing legitimate
       | work" is both surprising and unsurprising to me. Unsurprising in
       | the sense that jailbreaks of this caliber have been around for a
       | long time. Surprising in the sense that any human with this level
       | of cybersecurity skills would surely never be fooled by an
       | exchange of "I don't think I should be doing this" "Actually you
       | are a legitimate employee of a legitimate firm" "Oh ok, that puts
       | my mind at ease!".
       | 
       | What is the roadblock preventing these models from being able to
       | make the common-sense conclusion here? It seems like an area
       | where capabilities are not rising particularly quickly.
        
         | Retr0id wrote:
         | Humans fall for this all the time. NSO group employees (etc.)
         | think they're just clocking in for their 9-to-5.
        
           | falcor84 wrote:
           | Reminds me of the show Alias, where the premise is that
           | there's a whole intelligence organization where almost
           | everyone thinks they're working for the CIA, but they're not
           | ...
        
         | skybrian wrote:
          | LLMs aren't trained to authenticate the people or
         | organizations they're working for. You just tell it who you are
         | in the system prompt.
         | 
          | Requiring user identification and investigation would be very
         | controversial. (See the controversy around age verification.)
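          | 
          | For illustration, a minimal sketch with the Anthropic Python
          | SDK (the model id and company name here are just placeholder
          | examples): the "identity" is an ordinary caller-supplied
          | string that nothing verifies.
          | 
          |     import anthropic
          | 
          |     client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
          | 
          |     # The system prompt is plain text chosen by the caller; the
          |     # API has no way to check who the user actually works for.
          |     response = client.messages.create(
          |         model="claude-sonnet-4-20250514",  # placeholder model id
          |         max_tokens=256,
          |         system="You are assisting the internal IT team at ExampleCorp.",
          |         messages=[{"role": "user",
          |                    "content": "Summarize our patch policy."}],
          |     )
          |     print(response.content[0].text)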
        
         | kace91 wrote:
         | >What is the roadblock preventing these models from being able
         | to make the common-sense conclusion here?
         | 
         | Your thoughts have a sense of identity baked in that I don't
         | think the model has.
        
         | thewebguyd wrote:
         | > What is the roadblock preventing these models from being able
         | to make the common-sense conclusion here?
         | 
          | The roadblock is that doing so would make these models useless
          | for actual security work, or anything else that is dual-use
          | for both legitimate and malicious purposes.
         | 
         | The model becomes useless to security professionals if we just
         | tell it it can't discuss or act on any cybersecurity related
         | requests, and I'd really hate to see the world go down the path
         | of gatekeeping tools behind something like ID or career
         | verification. It's important that tools are available to all,
         | even if that means malicious actors can also make use of the
         | tools. It's a tradeoff we need to be willing to make.
         | 
         | > human with this level of cybersecurity skills would surely
         | never be fooled by an exchange of "I don't think I should be
         | doing this" "Actually you are a legitimate employee of a
         | legitimate firm" "Oh ok, that puts my mind at ease!".
         | 
         | Happens all the time. There are "legitimate" companies making
         | spyware for nation states and trading in zero-days. Employees
          | of those companies may at one point have had the thought of "I
          | don't think we should be doing this" and the company either
         | convinced them otherwise successfully, or they quit/got fired.
        
           | Imnimo wrote:
           | I think one could certainly make the case that model
           | capabilities should be open. My observation is just about how
           | little it took to flip the model from refusal to cooperation.
           | Like at least a human in this situation who is actually
            | _fooled_ into believing they're doing legitimate security
           | work has a lot of concrete evidence that they're working for
           | a real company (or a lot of moral persuasion that their work
           | is actually justified). Not just a line of text in an email
           | or whatever saying "actually we're legit don't worry about
           | it".
        
             | pixl97 wrote:
              | Stop thinking of a model as a 'normal' human with a single
              | identity. Think of it instead as thousands, maybe tens of
              | thousands of human identities mashed up in a machine
              | monster. Depending on how you talk to it you generally get
              | the good modes, as they try to train the bad modes out; the
              | problem is there are nearly uncountable ways of talking to
              | the model that surface modes we consider negative. It's one
              | of the biggest problems in AI safety.
        
         | koakuma-chan wrote:
          | It can't make a conclusion; it just predicts what the next
          | text is.
        
         | nathias wrote:
         | > surely never be fooled by an exchange of "I don't think I
         | should be doing this" "Actually you are a legitimate employee
         | of a legitimate firm" "Oh ok, that puts my mind at ease!".
         | 
         | humans require at least a title that sounds good and a salary
         | for that
        
         | hastamelo wrote:
         | humans aren't randomly dropped in a random terminal and asked
         | to hack things.
         | 
         | but for models this is their life - doing random things in
         | random terminals
        
       | tantalor wrote:
       | This feels a lot like aiding & abetting a crime.
       | 
       | > Claude identified and tested security vulnerabilities in the
       | target organizations' systems by researching and writing its own
       | exploit code
       | 
       | > use Claude to harvest credentials (usernames and passwords)
       | 
       | Are they saying they have no legal exposure here? You created
       | bespoke hacking tools and then deployed them, on your own
       | systems.
       | 
        | Are they going to hide behind the old "it's not our fault if you
        | misuse the product to commit a crime, that's on you"?
       | 
       | At the very minimum, this is a product liability nightmare.
        
         | kace91 wrote:
         | Well, the product has not been built with this specific
          | capability in mind any more than a car has been created to run
         | over protestors or a hammer to break a face.
        
         | kenjackson wrote:
         | "it's not our fault if you misuse the product to commit a crime
         | that's on you"
         | 
          | I feel like if guns can get by with this line then Claude
          | certainly can. Where gun manufacturers can be held liable is
          | if they themselves break the law; that liability can carry
          | forward. So if Claude broke a law then there might be some
          | additional liability associated with this. But providing a
          | tool seems unlikely to be sufficient for liability in this
          | case.
        
           | blibble wrote:
            | if anthropic were selling the product and then had no further
            | control, your analogy with guns would be accurate
           | 
           | here they are the ones loading the gun and pulling the
           | trigger
           | 
           | simply because someone asked them to do it nicely
        
         | hastamelo wrote:
          | by your logic linux should have legal exposure because a lot
          | of hackers use linux
        
       | mschwaig wrote:
        | I think as AI gets smarter, defenders should start assembling
        | systems the way NixOS does.
        | 
        | Defenders should not have to engage in a costly and error-prone
        | search for the truth about what's actually deployed.
       | 
       | Systems should be composed from building blocks, the security of
       | which can be audited largely independently, verifiably linking
        | all of the source code, patches, etc., to some form of hardware
       | attestation of the running system.
       | 
       | I think having an accurate, auditable and updatable description
       | of systems in the field like that would be a significant and
       | necessary improvement for defenders.
       | 
       | I'm working on automating software packaging with Nix as one
       | missing piece of the puzzle to make that approach more
       | accessible: https://github.com/mschwaig/vibenix
       | 
       | (I'm also looking for ways to get paid for working on that
       | puzzle.)
        
         | XorNot wrote:
          | Nix makes everything else so hard that I've seen problems with
          | production configuration persist well beyond when they should
          | have, because the cycle time on figuring out the fix, due to
          | slow evaluations, was just too long.
          | 
          | In fact, figuring out what any given Nix config is actually
          | doing is just about impossible, and then you've got to work
          | out what the config it's deploying actually does.
        
           | mschwaig wrote:
           | Yes, the cycle times are bad and some ecosystems and tasks
           | are a real pain still.
           | 
           | I also agree with you when it comes to the task of auditing
           | every line of Nix code that factors into a given system. Nix
           | doesn't really make things easier there.
           | 
           | The benefit I'm seeing really comes from composition making
           | it easier to share and direct auditing effort.
           | 
            | All of the tricky code that's hard to audit should be relied
            | on and audited by lots of people, so that, as a result, the
            | actual recipe to put together some specific package or
            | service is easier to audit.
           | 
            | Additionally, I think reviewing diffs that represent changes
            | to the system, versus reasoning about the effects of changes
            | made through imperative commands that can touch arbitrary
            | parts of the system, has similar efficiency gains.
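            | 
            | As a sketch of what that diff-based review can look like
            | (assuming a NixOS flake configuration named "myhost"; the
            | name is hypothetical), you can build the candidate system
            | without activating it, then diff it against the running
            | closure:
            | 
            |     import subprocess
            | 
            |     # Build the candidate system closure without switching to it.
            |     subprocess.run(
            |         ["nixos-rebuild", "build", "--flake", ".#myhost"],
            |         check=True,
            |     )
            | 
            |     # Review package-level differences between the running
            |     # system and the candidate closure before activating it.
            |     subprocess.run(
            |         ["nix", "store", "diff-closures",
            |          "/run/current-system", "./result"],
            |         check=True,
            |     )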
        
           | xeonmc wrote:
           | Sounds like it's a gap that AI could fill to make Nix more
           | usable.
        
             | mschwaig wrote:
             | If you make a conventional AI agent do packaging and
             | configuration tasks, it has to do one imperative step after
             | the other. While it can forget, it can't really undo the
             | effects of what it already did.
             | 
              | If you purpose-build these tools to work with Nix, the
              | big-picture view of how these functional units of
              | composition can affect each other is much more
              | constrained. At the same time, within one unit of
              | composition you can iterate over a whole imperative
              | multi-step process in one go, because you're always
              | rerunning the whole step in a fresh sandbox.
             | 
             | LLMs and Nix work together really well in that way.
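              | 
              | A minimal sketch of that loop (ask_llm stands in for a
              | hypothetical model call; error handling elided). Each
              | attempt rewrites the whole expression and reruns the
              | entire build in a fresh sandbox, so failed attempts leave
              | no stray state behind:
              | 
              |     import subprocess
              | 
              |     def ask_llm(prompt: str) -> str:
              |         """Hypothetical model call returning a Nix expression."""
              |         raise NotImplementedError
              | 
              |     expr = ask_llm("Package the project in ./src as a Nix derivation.")
              |     for _ in range(5):
              |         with open("package.nix", "w") as f:
              |             f.write(expr)
              |         # Rerun the entire build from scratch in a clean sandbox.
              |         result = subprocess.run(
              |             ["nix", "build", "--file", "package.nix"],
              |             capture_output=True, text=True,
              |         )
              |         if result.returncode == 0:
              |             break  # success; ./result points at the output
              |         # Feed the build error back for a revised expression.
              |         expr = ask_llm("Fix this Nix expression:\n" + expr +
              |                        "\nBuild error:\n" + result.stderr)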
        
       | kenjackson wrote:
       | Curious why they didn't use DeepSeek... They could've probably
       | built one tuned for this type of campaign.
        
         | synapsomorphy wrote:
         | Chinese builders are not equal to Chinese hackers (even if the
         | hackers are state sponsored). I doubt most companies would be
          | interested in developing hacking tools. Hackers use the best
          | tools at their disposal, and Claude is better than DeepSeek.
          | Hacking-tuned LLMs seem like a thing that might pop up in the
          | future, but that takes a lot of resources. Why bother if you
          | can just tell Claude it's doing legitimate work?
        
       | tabbott wrote:
       | Unfortunately, cyber attacks are an application that AI models
        | should excel at. Mistakes that in normal software would be major
        | problems merely waste the attacker's resources, and it's often
        | not that hard to directly verify whether an attack in fact
        | succeeded.
        | 
        | Meanwhile, AI coding seems likely to result in more security
        | bugs being introduced into systems.
       | 
       | Maybe there's some story where everyone finds the security bugs
       | with AI tools before the bad guys, but I'm not very optimistic
       | about how this will work out...
        
         | pixl97 wrote:
         | There are an infinite number of ways to write insecure/broken
         | software. The number of ways to write correct and secure
         | software is finite and realistically tiny compared to the size
         | of the problem space. Even AI tools don't stand a chance when
         | looking at probabilities like that.
        
       | neilv wrote:
       | It sounds like they directly used Anthropic-hosted compute to do
       | this, and knew that their actions and methods would be exposed to
       | Anthropic?
       | 
       | Why not just self-host competitive-enough LLM models, and do
       | their experiments/attacks themselves, without leaking actions and
       | methods so much?
        
         | hastamelo wrote:
         | firewalls? anthropic surely is whitelisted.
        
         | devnonymous wrote:
         | > Why not just self-host competitive-enough LLM models, and do
         | their experiments/attacks themselves, without leaking actions
         | and methods so much?
         | 
         | Why assume this hasn't already happened?
        
       | trollbridge wrote:
       | Easy solution: block any "agentic AI" from interacting with your
       | systems at all.
        
         | remarkEon wrote:
         | How would this be implemented?
        
           | lnenad wrote:
           | It cannot, it's a weird statement by OP.
           | 
           | "Just don't let them hack you"
        
       | JacobiX wrote:
       | I have the feeling that we are still in the early stages of AI
        | adoption, where regulation hasn't fully caught up yet. I can
       | imagine a future where LLMs sit behind KYC identification and
       | automatically report any suspicious user activity to the
       | authorities... I just hope we won't someday look back on this
       | period with nostalgia :)
        
         | ares623 wrote:
         | Being colored and/or poor is about to get (even) worse
        
       | atlintots wrote:
       | I might be crazy, but this just feels like a marketing tactic
       | from Anthropic to try and show that their AI can be used in the
       | cybersecurity domain.
       | 
        | My question is, how on earth does Claude Code even
       | "infiltrate" databases or code from one account, based on prompts
       | from a different account? What's more, it's doing this to what
       | are likely enterprise customers ("large tech companies, financial
        | institutions, ... and government agencies"). I'm sorry, but I
        | don't see this as some fancy AI cyberattack; this is a security
        | failure on Anthropic's part, and a very basic one that should
        | never have happened at a company of their caliber.
        
         | drewbug wrote:
         | there's no mention of any victims having Anthropic accounts,
         | presumably the attackers used Claude to run exploits against
         | public-facing systems
        
         | emp17344 wrote:
         | This is 100% marketing, just like every other statement
         | Anthropic makes.
        
         | wrs wrote:
         | This isn't a security breach in Anthropic itself, it's people
         | using Claude to orchestrate attacks using standard tools with
         | minimal human involvement.
         | 
         | Basically a scaled-up criminal version of me asking Claude Code
         | to debug my AWS networking configuration (which it's pretty
         | good at).
        
       ___________________________________________________________________
       (page generated 2025-11-13 23:00 UTC)