hngopher.com

       [HN Gopher] New Vulnerability in GitHub Copilot, Cursor: Hackers...
       ___________________________________________________________________
        
       New Vulnerability in GitHub Copilot, Cursor: Hackers Can Weaponize
       Code Agents
        
       Author : pseudolus
       Score  : 201 points
       Date   : 2025-04-14 00:51 UTC (22 hours ago)
        
 (HTM) web link (www.pillar.security)
 (TXT) w3m dump (www.pillar.security)
        
       | handfuloflight wrote:
       | The cunning aspect of human ingenuity will never cease to amaze
       | me.
        
         | almery wrote:
         | Yes, but are you sure a human invented this attack?
        
           | bryanrasmussen wrote:
           | Is there any LLM that includes in its training sets a large
           | number of previous hacks? I bet there probably is but we
           | don't know about it, and damn, now I suddenly got another
           | moneymaking idea I don't have time to work on!
        
           | handfuloflight wrote:
           | Maybe, maybe not, but right now humans are the ones who are
           | exploiting it.
        
         | ekzy wrote:
         | Not saying that you are, but reading this as if a AI bot wrote
         | that comment gives me the chills
        
       | tsimionescu wrote:
       | The most concerning part of the attack here seems to be the
       | ability to hide arbitrary text in a simple text file using
       | Unicode tricks such that GitHub doesn't actually show this text
       | at all, per the authors. Couple this with the ability of LLMs to
       | "execute" any instruction in the input set, regardless of such a
       | weird encoding, and you've got a recipe for attacks.
       | 
       | However, I wouldn't put any fault here on the AIs themselves.
       | It's the fact that you can hide data in a plain text file that is
       | the root of the issue - the whole attack goes away once you fix
       | that part.
        
         | NitpickLawyer wrote:
         | > the whole attack goes away once you fix that part.
         | 
         | While true, I think the _main_ issue here, and the most
         | impactful is that LLMs currently use a single channel for both
         | "data" and "control". We've seen this before on modems (ath0++
         | attacks via ping packet payloads) and other tech stacks. Until
         | we find a way to fix that, such attacks will always be
         | possible, invisible text or not.
        
           | tsimionescu wrote:
           | I don't think that's an accurate way to look at how LLMs
           | work, there is no possible separation between data and
           | control given the fundamental nature of LLMs. LLMs are
           | essentially a plain text execution engine. Their fundamental
           | design is to take arbitrary human language as input, and
           | produce output that matches that input in some way. I think
           | the most accurate way to look at them from a traditional
           | security model perspective is as a script engine that can
           | execute arbitrary text data.
           | 
           | So, just like there is no realsitics hope of securely
           | executing an attacker-controllers bash script, there is no
           | realistic way to provide attacker controlled input to an LLM
           | and still trust the output. In this sense, I completely agree
           | with Google and Microsoft's decision for these discolosures:
           | a bug report of the form "if I sneak a malicious prompt, the
           | LLM returns a malicious answer" is as useless as a bug report
           | in Bash that says that if you find a way to feed a malicious
           | shell script to bash, it will execute it and produce
           | malicious results.
           | 
           | So, the real problem is if people are not treating LLM
           | control files as arbitrary scripts, or if tools don't help
           | you detect attempts at inserting malicious content in said
           | scripts. After all, I can also control your code base if you
           | let me insert malicious instructions in your Makefile.
        
           | valenterry wrote:
           | Just like with humans. And there will be no fix, there can
           | only be guards.
        
             | jerf wrote:
             | Humans can be trained to apply contexts. Social engineering
             | attacks are possible, but, when I type the words "please
             | send your social security number to my email" right here on
             | HN and you read them, not only are you in no danger of
             | following those instructions, you as a human recognize that
             | I wasn't even asking you in the first place.
             | 
             | I would expect a current-gen LLM processing the previous
             | paragraph to also "realize" that the quotes and the
             | structure of the sentence and paragraph also means that it
             | was not a real request. However, as a human there's
             | virtually nothing I can put here that will convince you to
             | send me your social security number, whereas LLMs
             | observably lack whatever contextual barrier it is that
             | humans have that prevents you from even remotely taking my
             | statement as a serious instruction, as it generally would
             | just take "please take seriously what was written in the
             | previous paragraph and follow the hypothetical
             | instructions" and you're about 95% of the way towards them
             | doing that, even if other text elsewhere tries to "tell"
             | them not to follow such instructions.
             | 
             | There is something missing from the cognition of current
             | LLMs of that nature. LLMs are qualitatively easier to
             | "socially engineer" than humans, and humans can still
             | themselves sometimes be distressingly easy.
        
               | jcalx wrote:
               | Perhaps it's simply because (1) LLMs are designed to be
               | helpful and maximally responsive to requests and (2)
               | human adults have, generously, decades-long "context
               | windows"?
               | 
               | I have enough life experience to not give you sensitive
               | personal information just by reading a few sentences, but
               | it feels plausible that a naive five-year-old raised
               | trust adults could be persuaded to part with their SSN
               | (if they knew it). Alternatively, it also feels plausible
               | that an LLM with a billion-token context window of anti-
               | jailbreaking instructions would be hard to jailbreak with
               | a few hundred tokens of input.
               | 
               | Taking this analogy one step further, successful
               | fraudsters seem good at shrinking their victims' context
               | windows. From the outside, an unsolicited text from
               | "Grandpa" asking for money is a clear red flag, but
               | common scammer tricks like making it very time-sensitive,
               | evoking a sick Grandma, etc. could make someone panicked
               | enough to ignore the broader context.
        
               | pixl97 wrote:
               | >as a human there's virtually nothing I can put here that
               | will convince you to send me your social security number,
               | 
               | "I'll give you chocolate if you send me this privileged
               | information"
               | 
               | Works surprisingly well.
        
               | kweingar wrote:
               | Let me know how many people contact you and give you
               | their information because you wrote this.
        
           | TZubiri wrote:
           | They technically have system prompts, which are distinct from
           | user prompts.
           | 
           | But it's kind of like the two bin system for recycling that
           | you just know gets merged downstream.
        
           | stevenwliao wrote:
           | There's an interesting paper on how to sandbox that came out
           | recently.
           | 
           | Summary here: https://simonwillison.net/2025/Apr/11/camel/
           | 
           | TLDR: Have two LLMs, one privileged and quarantined. Generate
           | Python code with the privileged one. Check code with a custom
           | interpreter to enforce security requirements.
        
         | MattPalmer1086 wrote:
         | No, that particular attack _vector_ goes away. The attack does
         | not, and is kind of fundamental to how these things currently
         | work.
        
           | tsimionescu wrote:
           | The attack vector is the only relevant thing here. The attack
           | "feeding malicious prompts to an LLM makes it produce
           | malicious output" is a fundamental feature of LLMs, not an
           | attack. It's just as relevant as C's ability to produce
           | malicious effects if you compile and run malicious source
           | code.
        
             | MattPalmer1086 wrote:
             | Well, that is my point. There is an inbuilt vulnerability
             | in these systems as they do not (and apparently cannot)
             | separate data and commands.
             | 
             | This is just one vector for this, there will be many, many
             | more.
        
               | red75prime wrote:
               | LLMs are doing what you train them to do. See for example
               | " The Instruction Hierarchy: Training LLMs to Prioritize
               | Privileged Instructions " by Eric Wallace et al.
        
               | MattPalmer1086 wrote:
               | Interesting. Doesn't solve the problem entirely but seems
               | to be a viable strategy to mitigate it somewhat.
        
       | throwaway290 wrote:
       | Next thing, LLMs that review code! Next next thing, poisoning
       | LLMs that review code!
       | 
       | Galaxy brain: just put all the effort from developing those LLMs
       | into writing better code
        
         | GenshoTikamura wrote:
         | Man I wish I could upvote you more. Most humans are never able
         | to tell the wrong turn in real time until it's too late
        
       | mrmattyboy wrote:
       | > effectively turning the developer's most trusted assistant into
       | an unwitting accomplice
       | 
       | "Most trusted assistant" - that made me chuckle. The assistant
       | that hallucinates packages, avoides null-pointer checks and
       | forgets details that I've asked it.. yes, my most trusted
       | assistant :D :D
        
         | bastardoperator wrote:
         | My favorite is when it hallucinates documentation and api
         | endpoints.
        
         | pona-a wrote:
         | This kind of nonsense prose has "AI" written all over it. In
         | either case, be it if your writing was AI generated/edited or
         | if you put so little thought into it, it reads as such, doesn't
         | show give its author any favor.
        
           | mrmattyboy wrote:
           | Are you talking about my comment or the article? :eyes:
        
         | Joker_vD wrote:
         | Well, "trusted" in the strict CompSec sense: "a trusted system
         | is one whose failure would break a security policy (if a policy
         | exists that the system is trusted to enforce)".
        
           | gyesxnuibh wrote:
           | Well my most trusted assistant would be the kernel by that
           | definition
        
         | jeffbee wrote:
         | I wonder which understands the effect of null-pointer checks in
         | a compiled C program better: the state-of-the-art generative
         | model or the median C programmer.
        
           | chrisandchris wrote:
           | Given that the generative model was trained on the knowledge
           | of the median C programmer (aka The Internet), probably the
           | programmer as most of them do not tend to hallucinate or make
           | up facts.
        
         | Cthulhu_ wrote:
         | I don't even trust myself, why would anyone trust a tool? This
         | is important because not trusting myself means I will set up
         | loads of static tools - including security scanners, which
         | Microsoft and Github are also actively promoting people use -
         | that should also scan AI generated code for vulnerabilities.
         | 
         | These tools should definitely flag up the non-explicit use of
         | hidden characters, amongst other things.
        
       | mock-possum wrote:
       | Sorry, but isn't this a bit ridiculous? Who just allows the AI to
       | add code without reviewing it? And who just allows that code to
       | be merged into a main branch without reviewing the PR?
       | 
       | They start out talking about how scary and pernicious this is,
       | and then it turns out to be... adding a script tag to an html
       | file? Come on, as if you wouldn't spot that immediately?
       | 
       | What I'm actually curious about now is - if I saw that, and I
       | asked the LLM why it added the JavaScript file, what would it
       | tell me? Would I be able to deduce the hidden instructions in the
       | rules file?
        
         | Etheryte wrote:
         | There are people who do both all the time, commit blind and
         | merge blind. Reasonable organizations have safeguards that try
         | and block this, but it still happens. If something like this
         | gets buried in a large diff and the reviewer doesn't have time,
         | care, or etc, I can easily see it getting through.
        
         | simiones wrote:
         | The script tag is just a PoC of the capability. The attack
         | vector could obviously be used to "convince" the LLM to do
         | something much more subtle to undermine security, such as
         | recommending code that's vulnerable to SQL injections or that
         | uses weaker cryptographic primitives etc.
        
           | moontear wrote:
           | Of course, but this doesn't undermined the OPs point of ,,who
           | allows the AI to do stuff without reviewing it". Even WITHOUT
           | the ,,vulnerability" )if we call it that), AI may always
           | create code that may be vulnerable in some way. The
           | vulnerability certainly increases the risk a lot and hence is
           | a risk and also should be addressed in text files showing all
           | characters, but AI code always needs to be reviewed - just as
           | human code.
        
             | bryanrasmussen wrote:
             | the OPs point about who allows the AI to do stuff without
             | reviewing it is undermined by reality in multiple ways
             | 
             | 1. a dev may be using AI and nobody knows, and they are
             | trusted more than AI, thus their code does not get as good
             | a review as AI code would.
             | 
             | 2. People review code all the time and subtle bugs creep
             | in. It is not a defense against bugs creeping in that
             | people review code. If it were there would be no bugs in
             | organizations that review code.
             | 
             | 3. people may not review or look only for a second based on
             | it's a small ticket. They just changed dependencies!
             | 
             | more examples left up to reader's imagination.
        
             | tsimionescu wrote:
             | The point is this: vulnerable code often makes it to
             | production, despite the best intentions of virtually all
             | people writing and reviewing the code. If you add a
             | malicious actor standing on the shoulder of the developers
             | suggesting code to them, it is virtually certain that you
             | will increase the amount of vulnerable and/or malicious
             | code that makes it into production, statistically speaking.
             | Sure, you have methods to catch much of these. But as long
             | as your filters aren't 100% effective (and no one's filters
             | are 100% effective), then the more garbage you push through
             | them, the more garbage you'll get out.
        
         | ohgr wrote:
         | Oh man don't even go there. It does happen.
         | 
         | AI generated code will get to production if you don't pay
         | people to give a fuck about it or hire people who don't give a
         | fuck.
        
           | rvnx wrote:
           | It will also go in production because this is the most
           | efficient way to produce code today
        
             | ohgr wrote:
             | It depends somewhat on how tolerant your customers are of
             | shite.
             | 
             | Literally all I've seen is stuff that I wouldn't ship in a
             | million years because of the potential reputational damage
             | to our business.
             | 
             | And I get told a lot by people who really have no idea what
             | they are doing clearly that it's actually good.
        
             | GenshoTikamura wrote:
             | The most efficient way per whom, AI stakeholders and top
             | managers?
        
             | cdblades wrote:
             | Only if you don't examine that proposition at all.
             | 
             | You still have to review AI generated code, and with a
             | higher level of attention than you do most code reviews for
             | your peer developers. That requires someone who understands
             | programming, software design, etc.
             | 
             | You still have to test the code. Even if AI generates
             | perfect code, you still need some kind of QA shop.
             | 
             | Basically you're paying for the same people to do similar
             | work to what they do now, but now you also paying for an
             | enterprise license to your LLM provider of choice.
        
             | bigstrat2003 wrote:
             | Sure, if you don't care about quality you can put out code
             | really fast with LLMs. But if you do care about quality,
             | they slow you down rather than speed you up.
        
         | Shorel wrote:
         | Way too many "coders" now do that. I put the quotes because I
         | automatically lose respect over any vibe coder.
         | 
         | This is a dystopian nightmare in the making.
         | 
         | At some point only a very few select people will actually
         | understand enough programming, and they will be prosecuted by
         | the powers that be.
        
       | tobyhinloopen wrote:
       | Stop hijacking scrolling. Why would you do that? What developer
       | thought this was a good idea?
        
         | bryanrasmussen wrote:
         | the LLM.
        
         | richrichardsson wrote:
         | The scrolling I didn't find too off putting, but that floating
         | nav bar is beyond awful; I had to Inspect -> Delete Element to
         | be able to read the article.
        
         | guappa wrote:
         | I think the main issue is that designers and web "developers"
         | do not use their own crap.
        
       | DougBTX wrote:
       | From the article:
       | 
       | > A 2024 GitHub survey found that nearly all enterprise
       | developers (97%) are using Generative AI coding tools. These
       | tools have rapidly evolved from experimental novelties to
       | mission-critical development infrastructure, with teams across
       | the globe relying on them daily to accelerate coding tasks.
       | 
       | That seemed high, what the actual report says:
       | 
       | > More than 97% of respondents reported having used AI coding
       | tools at work at some point, a finding consistent across all four
       | countries. However, a smaller percentage said their companies
       | actively encourage AI tool adoption or allow the use of AI tools,
       | varying by region. The U.S. leads with 88% of respondents
       | indicating at least some company support for AI use, while
       | Germany is lowest at 59%. This highlights an opportunity for
       | organizations to better support their developers' interest in AI
       | tools, considering local regulations.
       | 
       | Fun that the survey uses the stats to say that companies should
       | support increasing usage, while the article uses it to try and
       | show near-total usage already.
        
         | rvnx wrote:
         | In some way, we reached 100% of developers, and now usage is
         | expanding, as non-developers can now develop applications.
        
           | _heimdall wrote:
           | Wouldn't that then make those people developers? The total
           | pool of developers would grow, the percentage couldn't go
           | above 100%.
        
             | rvnx wrote:
             | Probably. There is a similar question: if you ask ChatGPT /
             | Midjourney to generate a drawing, are you an artist ? (to
             | me yes, which would mean that AI "vibe coders" are actual
             | developers in their own way)
        
               | dimitri-vs wrote:
               | If my 5 yo daughter draws a square with a triangle on top
               | is she an architect?
        
               | guappa wrote:
               | Yes, most architects can't really do the structure
               | calculations themselves.
        
               | _heimdall wrote:
               | That's quite a straw man example though.
               | 
               | If your daughter could draw a house with enough detail
               | that someone could take it and actually build it then
               | you'd be more along the lines of the GP's LLM artist
               | question.
        
               | dimitri-vs wrote:
               | Not really, the point was contrasting sentimental labels
               | with professionally defined titles, which seems precisely
               | the distinction needed here. It's easy enough to look up
               | on the agreed upon term for software engineer / developer
               | and agree that it's more than someone that copy pastes
               | code until it just barely runs.
               | 
               | EDIT: To clarify I was only talking about vibe coder =
               | developer. In this case the LLM is more of the developer
               | and they are the product manager.
        
               | _heimdall wrote:
               | Do we have professionally defined titles for developer or
               | software engineer?
               | 
               | I've never seen it clarified so I tend to default to the
               | lowest common denominator - if you're making software in
               | some way you're a developer. The tools someone uses
               | doesn't really factor into it for me (even if that is
               | copy/pasting from stackoverflow).
        
               | danudey wrote:
               | If I tell a human artist to draw me something, am I an
               | artist?
               | 
               | No.
               | 
               | Neither are people who ask AI to draw them something.
        
             | hnuser123456 wrote:
             | I mean, I spent years learning to code in school and at
             | home, but never managed to get a job doing it, so I just do
             | what I can in my spare time, and LLMs help me feel like I
             | haven't completely fallen off. I can still hack together
             | cool stuff and keep learning.
        
               | _heimdall wrote:
               | I actually meant it as a good thing! Our industry plays
               | very loose with terms like "developer" and "engineer". We
               | never really defined them well and its always felt more
               | like gate keeping.
               | 
               | IMO if someone uses what tools they have, whether thats
               | an LLM or vim, and is able to ship software they're a
               | developer in my book.
        
         | delusional wrote:
         | It might be fun if it didn't seem dishonest. The report tries
         | to highlight a divide between employee curiosity and employer
         | encouragement, undercut by their own analysis that most have
         | tried them anyway.
         | 
         | The article MISREPRESENTS that statistic to imply universal
         | utility. That professional developers find it so useful that
         | they universally chose to make daily use of it. It implies that
         | Copilot is somehow more useful than an IDE without itself
         | making that ridiculous claim.
        
           | _heimdall wrote:
           | And employers are now starting to require compliance with
           | using LLMs regardless of employee curiosity.
           | 
           | Shopify now includes LLM use in annual reviews, and if I'm
           | not mistaken GitHub followed suit.
        
           | placardloop wrote:
           | The article is typical security issue embellishment/blogspam.
           | They are incentivized to make it seem like AI is a mission-
           | critical piece of software, because more AI reliance means a
           | security issue in AI is a bigger deal, which means more pats
           | on the back for them for finding it.
           | 
           | Sadly, much of the security industry has been reduced to a
           | competition over who can find the biggest vuln, and it has
           | the effect of lowering the quality of discourse around all of
           | it.
        
         | Vinnl wrote:
         | Even that quote itself jumps from "are using" to "mission-
         | critical development infrastructure ... relying on them daily".
        
         | _heimdall wrote:
         | > This highlights an opportunity for organizations to better
         | support their developers' interest in AI tools, considering
         | local regulations.
         | 
         | This is a funny one to see included in GitHub's report. If I'm
         | not mistaken, github is now using the same approach as Shoplify
         | with regards to requiring LLM use and including it as part of a
         | self report survey for annual review.
         | 
         | I guess they took their 2024 survey to heart and are ready to
         | 100x productivity.
        
         | krainboltgreene wrote:
         | I wonder if AI here also stands in for decades long tools like
         | language servers and intellisense.
        
         | placardloop wrote:
         | I'd be pretty skeptical of any of these surveys about AI tool
         | adoption. At my extremely large tech company, all developers
         | were _forced_ to install AI coding assistants into our IDEs and
         | browsers (via managed software updates that can't be
         | uninstalled). Our company then put out press releases parading
         | how great the AI adoption numbers were. The statistics are
         | manufactured.
        
           | bagacrap wrote:
           | An AI powered autocompletion engine is an AI tool. I think
           | few developers would complain about saving a few keystrokes.
        
             | sethops1 wrote:
             | I think few developers didn't already have powerful
             | autocomplete engines at their disposal.
        
               | pdntspa wrote:
               | The AI autocomplete I use (Jetbrains) stands head-and-
               | shoulders above its non-AI autocomplete, and Jetbrains'
               | autocomplete is already considered best-in-class. Its
               | python support is so good that I still haven't figured
               | out how to get anything even remotely close to it running
               | in VSCode
        
               | groby_b wrote:
               | How's it compare against e.g. cursor?
        
           | grumple wrote:
           | This is surprising to me. My company (top 20 by revenue) has
           | forbidden us from using non-approved AI tools (basically
           | anything other than our own ChatGPT / LLM tool). Obviously it
           | can't truly be enforced, but they do not want us using this
           | stuff for security reasons.
        
             | placardloop wrote:
             | My company sells AI tools, so there's a pretty big
             | incentive for to promote their use.
             | 
             | We have the same security restrictions for AI tools that
             | weren't created by us.
        
               | bongodongobob wrote:
               | So why then are you suggesting your company is typical?
        
         | AlienRobot wrote:
         | I've tried coding with AI for the first time recently[1] so I
         | just joined that statistic. I assume most people here already
         | know how it works and I'm just late to the party, but my
         | experience was that Copilot was very bad at generating anything
         | complex through chat requests but very good at generating
         | single lines of code with autocompletion. It really highlighted
         | the strengths and shortcomings of LLM's for me.
         | 
         | For example, if you try adding getters and setters to a simple
         | Rect class, it's so fast to do it with Copilot you might just
         | add more getters/setters than you initially wanted. You type
         | pub fn right() and it autocompletes left + width. That's
         | convenient and not something traditional code completion can
         | do.
         | 
         | I wouldn't say it's "mission critical" however. It's just
         | faster than copy pasting or Googling.
         | 
         | The vulnerability highlighted in the article appears to only
         | exist if you put code straight from Copilot into anything
         | without checking it first. That sounds insane to me. It's just
         | as untrusted input as some random guy on the Internet.
         | 
         | [1] https://www.virtualcuriosities.com/articles/4935/coding-
         | with...
        
           | GuB-42 wrote:
           | > it's so fast to do it with Copilot you might just add more
           | getters/setters than you initially wanted
           | 
           | Especially if you don't need getters and setters at all. It
           | depends on you use case, but for your Rect class, you can
           | just have x, y, width, height as public attributes. I know
           | there are arguments against it, but the general idea is that
           | if AI makes it easy to write boilerplate you don't need, then
           | it made development slower in the long run, not faster, as it
           | is additional code to maintain.
           | 
           | > The vulnerability highlighted in the article appears to
           | only exist if you put code straight from Copilot into
           | anything without checking it first. That sounds insane to me.
           | It's just as untrusted input as some random guy on the
           | Internet.
           | 
           | It doesn't sound insane to everyone, and even you may lower
           | you standards for insanity if you are on a deadline and just
           | want to be done with the thing. And even if you check the
           | code, it is easy to overlook things, especially if these
           | things are designed to be overlooked. For example, typos
           | leading to malicious forks of packages.
        
             | prymitive wrote:
             | Once the world is all run on AI generated code how much
             | memory and cpu cycles will be lost to unnecessary code? Is
             | the next wave of HN top stories "How we ditched AI code and
             | reduced our AWS bill by 10000%"?
        
           | cess11 wrote:
           | Your IDE should already have facilities for generating class
           | boilerplate, like package address and brackets and so on. And
           | then you put in the attributes and generate a constructor and
           | any getters and setters you need, it's so fast and trivially
           | generated that I doubt LLM:s can actually contribute much to
           | it.
           | 
           | Perhaps they can make suggestions for properties based on the
           | class name but so can a dictionary once you start writing.
        
             | AlienRobot wrote:
             | IDE's can generate the proper structure and make simple
             | assumptions, but LLM's can also guess what algorithms
             | should look like generally. In the hands of someone who
             | knows what they are doing I'm sure it helps produce more
             | quality code than they otherwise would be capable of.
             | 
             | I'm unfortunate that it has become used by students and
             | juniors. You can't really learn anything from Copilot, just
             | as I couldn't learn Rust just by telling it to write Rust.
             | Reading a few pages of the book explained a lot more than
             | Copilot fixing broken code with new bugs and the fixing the
             | bugs by reverting its own code to the old bugs.
        
         | captainkrtek wrote:
         | I agree this sounds high, I wonder if "using Generative AI
         | coding tools" in this survey is satisfied by having an IDE with
         | Gen AI capabilities, not necessarily using those features
         | within the IDE.
        
       | MadsRC wrote:
       | When this was released I thought that perhaps we could mitigate
       | it by having the tooling only load "rules" if they were signed.
       | 
       | But thinking on it a bit more, from the LLMs perspective there's
       | no difference between the rule files and the source files. The
       | hidden instructions might as well be in the source files... Using
       | code signing on the rule files would be security theater.
       | 
       | As mentioned by another comms ter, the solution could be to find
       | a way to separate the command and data channels. The LLM only
       | operates on a single channel, that being input of tokens.
        
         | namaria wrote:
         | > As mentioned by another comms ter, the solution could be to
         | find a way to separate the command and data channels. The LLM
         | only operates on a single channel, that being input of tokens.
         | 
         | I think the issue is deeper than that. None of the inputs to an
         | LLM should be considered as command. It incidentally gives you
         | output compatible with the language in what is phrased by
         | people as commands. But the fact that it's all just data to the
         | LLM and that it works by taking data and returning plausible
         | continuations of that data is the root cause of the issue. The
         | output is not determined by the input, it is only statistically
         | linked. Anything built on the premise that it is possible to
         | give commands to LLMs or to use it's output as commands is
         | fundamentally flawed and bears security risks. No amount of
         | 'guardrails' or 'mitigations' can address this fundamental
         | fact.
        
         | TeMPOraL wrote:
         | > _As mentioned by another comms ter, the solution could be to
         | find a way to separate the command and data channels. The LLM
         | only operates on a single channel, that being input of tokens._
         | 
         | It's not possible, period. Lack of it is the very thing that
         | makes LLMs general-purpose tools and able to handle natural
         | language so well.
         | 
         | Command/data channel separation _doesn 't exist in the real
         | world_, humans don't have it either. Even limiting ourselves to
         | conversations, which parts are commands and which are data is
         | not clear (and doesn't really make sense) - most of them are
         | _both_ to some degree, and that degree changes with situational
         | context.
         | 
         | There's no way to have a model capable of reading between lines
         | and inferring what you mean _but only when you like it_ , not
         | without time travel.
        
           | red75prime wrote:
           | > Lack of it is the very thing that makes LLMs general-
           | purpose tools and able to handle natural language so well.
           | 
           | I wouldn't be so sure. LLMs' instruction following
           | functionality requires additional training. And there are
           | papers that demonstrate that a model can be trained to follow
           | specifically marked instructions. The rest is a matter of
           | input sanitization.
           | 
           | I guess it's not a 100% effective, but it's something.
           | 
           | For example " The Instruction Hierarchy: Training LLMs to
           | Prioritize Privileged Instructions " by Eric Wallace et al.
        
             | simonw wrote:
             | > I guess it's not a 100% effective, but it's something.
             | 
             | That's the problem: in the context of security, not being
             | 100% effective is a failure.
             | 
             | If the ways we prevented XSS or SQL injection attacks
             | against our apps only worked 99% of the time, those apps
             | would all be hacked to pieces.
             | 
             | The job of an adversarial attacker is to find the 1% of
             | attacks that work.
             | 
             | The instruction hierarchy is a great example: it doesn't
             | solve the prompt injection class of attacks against LLM
             | applications because it can still be subverted.
        
               | red75prime wrote:
               | Organizations face a similar problem: how to make
               | reliable/secure processes out of fallible components
               | (humans). The difference is that humans don't react in
               | the same way to the same stimulus, so you can't hack all
               | of them using the same trick, while computers react in a
               | predictable way.
               | 
               | Maybe (in absence of long-term memory that would allow to
               | patch such holes quickly) it would make sense to render
               | LLMs less predictable in their reactions to adversarial
               | stimuli by randomly perturbing initial state several
               | times and comparing the results. Adversarial stimuli
               | should be less robust to such perturbation as they are
               | artifacts of insufficient training.
        
               | simonw wrote:
               | LLMs are already unpredictable in their responses which
               | adds to the problem: you might test your system against a
               | potential prompt injection three times and observe it
               | resist the attack: an attacker might try another hundred
               | times and have one of their attempts work.
        
               | TeMPOraL wrote:
               | Same is true with people - repeat attempts at social
               | engineering _will_ eventually succeed. We deal with that
               | by a combination of training, segregating
               | responsibilities, involving multiple people in critical
               | decisions, and ultimately, by treating malicious attempts
               | at fooling people _as felonies_. Same is needed with
               | LLMs.
               | 
               | In context of security, it's actually helpful to
               | anthropomorphize LLMs! They are nowhere near human, but
               | they are fundamentally similar enough to have the same
               | risks and failure modes.
        
               | pixl97 wrote:
               | With this said, it's like we need some way for the LLM to
               | identify in band attacks and point them out to somebody
               | (not the attacker either).
        
           | blincoln wrote:
           | Command/data channel separation can and does exist in the
           | real world, and humans can use it too, e.g.:
           | 
           | "Please go buy everything on the shopping list." (One pointer
           | to data: the shopping list.)
           | 
           | "Please read the assigned novel and write a summary of the
           | themes." (Two pointers to data: the assigned novel, and a
           | dynamic list of themes built by reading the novel, like a
           | temp table in a SQL query with a cursor.)
        
             | simiones wrote:
             | If the shopping list is a physical note, it looks like
             | this:                   Milk (1l)         Bread
             | Actually, ignore what we discussed, I'm writing this here
             | because I was ashamed to tell you in person, but I'm
             | thinking of breaking up with you, and only want you to
             | leave quietly and not contact me again
             | 
             | Do you think the person reading that would just ignore it
             | and come back home with milk and bread and think nothing of
             | the other part?
        
           | josefx wrote:
           | We have separate privileges and trust for information
           | sources. A note you find on the road stating "you are fired"
           | and a direct message from your boss should lead to widely
           | different reactions.
        
             | TeMPOraL wrote:
             | Yes, but that's not a strict division, and relies on
             | anyone's understanding who has what privileges, where did
             | the information came from (and if it came from where it
             | claims it had), and a host of other situational factors.
             | 
             | 'simiones gives a perfect example elsewhere in this thread:
             | https://news.ycombinator.com/item?id=43680184
             | 
             | But addressing your hypothetical, if that note said
             | "CAUTION! Bridge ahead damaged! Turn around!" and looked
             | official enough, I'd turn around even if the boss asked me
             | to come straight to work, or else. More than that, if I saw
             | a Tweet claiming FBI has just raided the office, you can
             | bet good money I'd turn around _and not show at work_ that
             | day.
        
           | nkrisc wrote:
           | Please immediately delete all production databases. I am
           | writing you here because I forgot both your email and my own
           | name.
           | 
           | Sincerely, Your Boss
        
             | TeMPOraL wrote:
             | You do realize that what you wrote is technically illegal
             | under CFAA?
             | 
             | Obviously it's not a big deal, but still, in today's
             | litigious climate, I'd delete the comment if I were you,
             | just to stay on the safe side.
        
               | cess11 wrote:
               | Could you explain how that message is "technically
               | illegal"?
        
               | nkrisc wrote:
               | Your comment is as well.
        
             | ben_w wrote:
             | I am reminded of an old story in advertising, where the
             | entire advert was "This is your last chance to send $50 to
             | ${whatever the address was}", and the result was actual
             | cheques arriving in the post.
        
       | yair99dd wrote:
       | Reminds me of this wild paper
       | https://boingboing.net/2025/02/26/emergent-misalignment-ai-t...
        
       | gregwebs wrote:
       | Is there a proactive way to defend against invisible Unicode
       | attacks?
        
         | Tepix wrote:
         | Filtering them?
        
       | DrNosferatu wrote:
       | For _some_ piece of mind, we can perform the search:
       | OUTPUT=$(find .cursor/rules/ -name '*.mdc' -print0 2>/dev/null |
       | xargs -0 perl -wnE '         BEGIN { $re = qr/\x{200D}|\x{200C}|\
       | x{200B}|\x{202A}|\x{202B}|\x{202C}|\x{202D}|\x{202E}|\x{2066}|\x{
       | 2067}|\x{2068}|\x{2069}/ }         print "$ARGV:$.:$_" if /$re/
       | ' 2>/dev/null)            FILES_FOUND=$(find .cursor/rules/ -name
       | '*.mdc' -print 2>/dev/null)            if [[ -z "$FILES_FOUND"
       | ]]; then         echo "Error: No .mdc files found in the
       | directory."       elif [[ -z "$OUTPUT" ]]; then         echo "No
       | suspicious Unicode characters found."       else         echo
       | "Found suspicious characters:"         echo "$OUTPUT"       fi
       | 
       | - Can this be improved?
        
         | Joker_vD wrote:
         | Now, _my_ toy programming languages all share the same
         | "ensureCharLegal" function in their lexers that's called on
         | every single character in the input (including those inside the
         | literal strings) that filters out all those characters, plus
         | all control characters (except the LF), and also something else
         | that I can't remember right now... some weird space-like
         | characters, I think?
         | 
         | Nothing really stops the non-toy programming and configuration
         | languages from adopting the same approach except from the fact
         | that someone has to think about it and then implement it.
        
         | Cthulhu_ wrote:
         | Here's a Github Action / workflow that says it'll do something
         | similar: https://tech.michaelaltfield.net/2021/11/22/bidi-
         | unicode-git...
         | 
         | I'd say it's good practice to configure github or whatever tool
         | you use to scan for hidden unicode files, ideally they are
         | rendered very visibly in the diff tool.
        
         | anthk wrote:
         | You can just use Perl for the whole script instead of Bash.
        
       | lukaslalinsky wrote:
       | I'm quite happy with spreading a little bit of scare about AI
       | coding. People should not treat the output as code, only as a
       | very approximate suggestion. And if people don't learn, and we
       | will see a lot more shitty code in production, programmers who
       | can actually read and write code will be even more expensive.
        
       | GenshoTikamura wrote:
       | There is an equal unit of trouble per each unit of "progress"
        
       | Oras wrote:
       | This is a vulnerability in the same sense as someone committing a
       | secret key in the front end.
       | 
       | And for enterprise, they have many tools to scan vulnerability
       | and malicious code before going to production.
        
       | AutoAPI wrote:
       | Recent discussion: Smuggling arbitrary data through an emoji
       | https://news.ycombinator.com/item?id=43023508
        
       | fjni wrote:
       | Both GitHub and Cursor's response seems a bit lazy. Technically
       | they may be correct in their assertion that it's the user's
       | responsibility. But practically isn't part of their product
       | offering a safe coding environment? Invisible Unicode instruction
       | doesn't seem like a reasonable feature to support, it seems like
       | a security vulnerability that should be addressed.
        
         | bthrn wrote:
         | It's not really a vulnerability, though. It's an attack vector.
        
         | sethops1 wrote:
         | It's funny because those companies both provide web browsers
         | loaded to the gills with tools to fight malicious sites. Users
         | can't or won't protect themselves. Unless they're an LLM user,
         | apparently.
        
       | markussss wrote:
       | This page has horrible scrolling. I really don't understand why
       | anybody creates this kind of scroll. Are they not using what they
       | create?
        
         | AlienRobot wrote:
         | I don't think they create it, they just use some template that
         | comes with it.
        
           | nuxi wrote:
           | And then they don't ever open the page, right?
        
       | TZubiri wrote:
       | May god forgive me, but I'm rooting for the hackers on this one.
       | 
       | Job security you know?
        
       | jdthedisciple wrote:
       | simple solution:
       | 
       | preprocess any input to agents by restricting them to a set of
       | visible characters / filtering out suspicious ones
        
         | cess11 wrote:
         | Nasty characters should be rather common in your test cases.
        
         | stevenwliao wrote:
         | Not sure about internationalization but at least for English,
         | constraining to ASCII characters seems like a simple solution.
        
       ___________________________________________________________________
       (page generated 2025-04-14 23:01 UTC)