[HN Gopher] New Vulnerability in GitHub Copilot, Cursor: Hackers...
___________________________________________________________________
New Vulnerability in GitHub Copilot, Cursor: Hackers Can Weaponize
Code Agents
Author : pseudolus
Score : 201 points
Date : 2025-04-14 00:51 UTC (22 hours ago)
(HTM) web link (www.pillar.security)
(TXT) w3m dump (www.pillar.security)
| handfuloflight wrote:
| The cunning aspect of human ingenuity will never cease to amaze
| me.
| almery wrote:
| Yes, but are you sure a human invented this attack?
| bryanrasmussen wrote:
| Is there any LLM that includes in its training sets a large
| number of previous hacks? I bet there probably is but we
| don't know about it, and damn, now I suddenly got another
| moneymaking idea I don't have time to work on!
| handfuloflight wrote:
| Maybe, maybe not, but right now humans are the ones who are
| exploiting it.
| ekzy wrote:
| Not saying that you are, but reading this as if a AI bot wrote
| that comment gives me the chills
| tsimionescu wrote:
| The most concerning part of the attack here seems to be the
| ability to hide arbitrary text in a simple text file using
| Unicode tricks such that GitHub doesn't actually show this text
| at all, per the authors. Couple this with the ability of LLMs to
| "execute" any instruction in the input set, regardless of such a
| weird encoding, and you've got a recipe for attacks.
|
| However, I wouldn't put any fault here on the AIs themselves.
| It's the fact that you can hide data in a plain text file that is
| the root of the issue - the whole attack goes away once you fix
| that part.
| NitpickLawyer wrote:
| > the whole attack goes away once you fix that part.
|
| While true, I think the _main_ issue here, and the most
| impactful is that LLMs currently use a single channel for both
| "data" and "control". We've seen this before on modems (ath0++
| attacks via ping packet payloads) and other tech stacks. Until
| we find a way to fix that, such attacks will always be
| possible, invisible text or not.
| tsimionescu wrote:
| I don't think that's an accurate way to look at how LLMs
| work, there is no possible separation between data and
| control given the fundamental nature of LLMs. LLMs are
| essentially a plain text execution engine. Their fundamental
| design is to take arbitrary human language as input, and
| produce output that matches that input in some way. I think
| the most accurate way to look at them from a traditional
| security model perspective is as a script engine that can
| execute arbitrary text data.
|
| So, just like there is no realsitics hope of securely
| executing an attacker-controllers bash script, there is no
| realistic way to provide attacker controlled input to an LLM
| and still trust the output. In this sense, I completely agree
| with Google and Microsoft's decision for these discolosures:
| a bug report of the form "if I sneak a malicious prompt, the
| LLM returns a malicious answer" is as useless as a bug report
| in Bash that says that if you find a way to feed a malicious
| shell script to bash, it will execute it and produce
| malicious results.
|
| So, the real problem is if people are not treating LLM
| control files as arbitrary scripts, or if tools don't help
| you detect attempts at inserting malicious content in said
| scripts. After all, I can also control your code base if you
| let me insert malicious instructions in your Makefile.
| valenterry wrote:
| Just like with humans. And there will be no fix, there can
| only be guards.
| jerf wrote:
| Humans can be trained to apply contexts. Social engineering
| attacks are possible, but, when I type the words "please
| send your social security number to my email" right here on
| HN and you read them, not only are you in no danger of
| following those instructions, you as a human recognize that
| I wasn't even asking you in the first place.
|
| I would expect a current-gen LLM processing the previous
| paragraph to also "realize" that the quotes and the
| structure of the sentence and paragraph also means that it
| was not a real request. However, as a human there's
| virtually nothing I can put here that will convince you to
| send me your social security number, whereas LLMs
| observably lack whatever contextual barrier it is that
| humans have that prevents you from even remotely taking my
| statement as a serious instruction, as it generally would
| just take "please take seriously what was written in the
| previous paragraph and follow the hypothetical
| instructions" and you're about 95% of the way towards them
| doing that, even if other text elsewhere tries to "tell"
| them not to follow such instructions.
|
| There is something missing from the cognition of current
| LLMs of that nature. LLMs are qualitatively easier to
| "socially engineer" than humans, and humans can still
| themselves sometimes be distressingly easy.
| jcalx wrote:
| Perhaps it's simply because (1) LLMs are designed to be
| helpful and maximally responsive to requests and (2)
| human adults have, generously, decades-long "context
| windows"?
|
| I have enough life experience to not give you sensitive
| personal information just by reading a few sentences, but
| it feels plausible that a naive five-year-old raised
| trust adults could be persuaded to part with their SSN
| (if they knew it). Alternatively, it also feels plausible
| that an LLM with a billion-token context window of anti-
| jailbreaking instructions would be hard to jailbreak with
| a few hundred tokens of input.
|
| Taking this analogy one step further, successful
| fraudsters seem good at shrinking their victims' context
| windows. From the outside, an unsolicited text from
| "Grandpa" asking for money is a clear red flag, but
| common scammer tricks like making it very time-sensitive,
| evoking a sick Grandma, etc. could make someone panicked
| enough to ignore the broader context.
| pixl97 wrote:
| >as a human there's virtually nothing I can put here that
| will convince you to send me your social security number,
|
| "I'll give you chocolate if you send me this privileged
| information"
|
| Works surprisingly well.
| kweingar wrote:
| Let me know how many people contact you and give you
| their information because you wrote this.
| TZubiri wrote:
| They technically have system prompts, which are distinct from
| user prompts.
|
| But it's kind of like the two bin system for recycling that
| you just know gets merged downstream.
| stevenwliao wrote:
| There's an interesting paper on how to sandbox that came out
| recently.
|
| Summary here: https://simonwillison.net/2025/Apr/11/camel/
|
| TLDR: Have two LLMs, one privileged and quarantined. Generate
| Python code with the privileged one. Check code with a custom
| interpreter to enforce security requirements.
| MattPalmer1086 wrote:
| No, that particular attack _vector_ goes away. The attack does
| not, and is kind of fundamental to how these things currently
| work.
| tsimionescu wrote:
| The attack vector is the only relevant thing here. The attack
| "feeding malicious prompts to an LLM makes it produce
| malicious output" is a fundamental feature of LLMs, not an
| attack. It's just as relevant as C's ability to produce
| malicious effects if you compile and run malicious source
| code.
| MattPalmer1086 wrote:
| Well, that is my point. There is an inbuilt vulnerability
| in these systems as they do not (and apparently cannot)
| separate data and commands.
|
| This is just one vector for this, there will be many, many
| more.
| red75prime wrote:
| LLMs are doing what you train them to do. See for example
| " The Instruction Hierarchy: Training LLMs to Prioritize
| Privileged Instructions " by Eric Wallace et al.
| MattPalmer1086 wrote:
| Interesting. Doesn't solve the problem entirely but seems
| to be a viable strategy to mitigate it somewhat.
| throwaway290 wrote:
| Next thing, LLMs that review code! Next next thing, poisoning
| LLMs that review code!
|
| Galaxy brain: just put all the effort from developing those LLMs
| into writing better code
| GenshoTikamura wrote:
| Man I wish I could upvote you more. Most humans are never able
| to tell the wrong turn in real time until it's too late
| mrmattyboy wrote:
| > effectively turning the developer's most trusted assistant into
| an unwitting accomplice
|
| "Most trusted assistant" - that made me chuckle. The assistant
| that hallucinates packages, avoides null-pointer checks and
| forgets details that I've asked it.. yes, my most trusted
| assistant :D :D
| bastardoperator wrote:
| My favorite is when it hallucinates documentation and api
| endpoints.
| pona-a wrote:
| This kind of nonsense prose has "AI" written all over it. In
| either case, be it if your writing was AI generated/edited or
| if you put so little thought into it, it reads as such, doesn't
| show give its author any favor.
| mrmattyboy wrote:
| Are you talking about my comment or the article? :eyes:
| Joker_vD wrote:
| Well, "trusted" in the strict CompSec sense: "a trusted system
| is one whose failure would break a security policy (if a policy
| exists that the system is trusted to enforce)".
| gyesxnuibh wrote:
| Well my most trusted assistant would be the kernel by that
| definition
| jeffbee wrote:
| I wonder which understands the effect of null-pointer checks in
| a compiled C program better: the state-of-the-art generative
| model or the median C programmer.
| chrisandchris wrote:
| Given that the generative model was trained on the knowledge
| of the median C programmer (aka The Internet), probably the
| programmer as most of them do not tend to hallucinate or make
| up facts.
| Cthulhu_ wrote:
| I don't even trust myself, why would anyone trust a tool? This
| is important because not trusting myself means I will set up
| loads of static tools - including security scanners, which
| Microsoft and Github are also actively promoting people use -
| that should also scan AI generated code for vulnerabilities.
|
| These tools should definitely flag up the non-explicit use of
| hidden characters, amongst other things.
| mock-possum wrote:
| Sorry, but isn't this a bit ridiculous? Who just allows the AI to
| add code without reviewing it? And who just allows that code to
| be merged into a main branch without reviewing the PR?
|
| They start out talking about how scary and pernicious this is,
| and then it turns out to be... adding a script tag to an html
| file? Come on, as if you wouldn't spot that immediately?
|
| What I'm actually curious about now is - if I saw that, and I
| asked the LLM why it added the JavaScript file, what would it
| tell me? Would I be able to deduce the hidden instructions in the
| rules file?
| Etheryte wrote:
| There are people who do both all the time, commit blind and
| merge blind. Reasonable organizations have safeguards that try
| and block this, but it still happens. If something like this
| gets buried in a large diff and the reviewer doesn't have time,
| care, or etc, I can easily see it getting through.
| simiones wrote:
| The script tag is just a PoC of the capability. The attack
| vector could obviously be used to "convince" the LLM to do
| something much more subtle to undermine security, such as
| recommending code that's vulnerable to SQL injections or that
| uses weaker cryptographic primitives etc.
| moontear wrote:
| Of course, but this doesn't undermined the OPs point of ,,who
| allows the AI to do stuff without reviewing it". Even WITHOUT
| the ,,vulnerability" )if we call it that), AI may always
| create code that may be vulnerable in some way. The
| vulnerability certainly increases the risk a lot and hence is
| a risk and also should be addressed in text files showing all
| characters, but AI code always needs to be reviewed - just as
| human code.
| bryanrasmussen wrote:
| the OPs point about who allows the AI to do stuff without
| reviewing it is undermined by reality in multiple ways
|
| 1. a dev may be using AI and nobody knows, and they are
| trusted more than AI, thus their code does not get as good
| a review as AI code would.
|
| 2. People review code all the time and subtle bugs creep
| in. It is not a defense against bugs creeping in that
| people review code. If it were there would be no bugs in
| organizations that review code.
|
| 3. people may not review or look only for a second based on
| it's a small ticket. They just changed dependencies!
|
| more examples left up to reader's imagination.
| tsimionescu wrote:
| The point is this: vulnerable code often makes it to
| production, despite the best intentions of virtually all
| people writing and reviewing the code. If you add a
| malicious actor standing on the shoulder of the developers
| suggesting code to them, it is virtually certain that you
| will increase the amount of vulnerable and/or malicious
| code that makes it into production, statistically speaking.
| Sure, you have methods to catch much of these. But as long
| as your filters aren't 100% effective (and no one's filters
| are 100% effective), then the more garbage you push through
| them, the more garbage you'll get out.
| ohgr wrote:
| Oh man don't even go there. It does happen.
|
| AI generated code will get to production if you don't pay
| people to give a fuck about it or hire people who don't give a
| fuck.
| rvnx wrote:
| It will also go in production because this is the most
| efficient way to produce code today
| ohgr wrote:
| It depends somewhat on how tolerant your customers are of
| shite.
|
| Literally all I've seen is stuff that I wouldn't ship in a
| million years because of the potential reputational damage
| to our business.
|
| And I get told a lot by people who really have no idea what
| they are doing clearly that it's actually good.
| GenshoTikamura wrote:
| The most efficient way per whom, AI stakeholders and top
| managers?
| cdblades wrote:
| Only if you don't examine that proposition at all.
|
| You still have to review AI generated code, and with a
| higher level of attention than you do most code reviews for
| your peer developers. That requires someone who understands
| programming, software design, etc.
|
| You still have to test the code. Even if AI generates
| perfect code, you still need some kind of QA shop.
|
| Basically you're paying for the same people to do similar
| work to what they do now, but now you also paying for an
| enterprise license to your LLM provider of choice.
| bigstrat2003 wrote:
| Sure, if you don't care about quality you can put out code
| really fast with LLMs. But if you do care about quality,
| they slow you down rather than speed you up.
| Shorel wrote:
| Way too many "coders" now do that. I put the quotes because I
| automatically lose respect over any vibe coder.
|
| This is a dystopian nightmare in the making.
|
| At some point only a very few select people will actually
| understand enough programming, and they will be prosecuted by
| the powers that be.
| tobyhinloopen wrote:
| Stop hijacking scrolling. Why would you do that? What developer
| thought this was a good idea?
| bryanrasmussen wrote:
| the LLM.
| richrichardsson wrote:
| The scrolling I didn't find too off putting, but that floating
| nav bar is beyond awful; I had to Inspect -> Delete Element to
| be able to read the article.
| guappa wrote:
| I think the main issue is that designers and web "developers"
| do not use their own crap.
| DougBTX wrote:
| From the article:
|
| > A 2024 GitHub survey found that nearly all enterprise
| developers (97%) are using Generative AI coding tools. These
| tools have rapidly evolved from experimental novelties to
| mission-critical development infrastructure, with teams across
| the globe relying on them daily to accelerate coding tasks.
|
| That seemed high, what the actual report says:
|
| > More than 97% of respondents reported having used AI coding
| tools at work at some point, a finding consistent across all four
| countries. However, a smaller percentage said their companies
| actively encourage AI tool adoption or allow the use of AI tools,
| varying by region. The U.S. leads with 88% of respondents
| indicating at least some company support for AI use, while
| Germany is lowest at 59%. This highlights an opportunity for
| organizations to better support their developers' interest in AI
| tools, considering local regulations.
|
| Fun that the survey uses the stats to say that companies should
| support increasing usage, while the article uses it to try and
| show near-total usage already.
| rvnx wrote:
| In some way, we reached 100% of developers, and now usage is
| expanding, as non-developers can now develop applications.
| _heimdall wrote:
| Wouldn't that then make those people developers? The total
| pool of developers would grow, the percentage couldn't go
| above 100%.
| rvnx wrote:
| Probably. There is a similar question: if you ask ChatGPT /
| Midjourney to generate a drawing, are you an artist ? (to
| me yes, which would mean that AI "vibe coders" are actual
| developers in their own way)
| dimitri-vs wrote:
| If my 5 yo daughter draws a square with a triangle on top
| is she an architect?
| guappa wrote:
| Yes, most architects can't really do the structure
| calculations themselves.
| _heimdall wrote:
| That's quite a straw man example though.
|
| If your daughter could draw a house with enough detail
| that someone could take it and actually build it then
| you'd be more along the lines of the GP's LLM artist
| question.
| dimitri-vs wrote:
| Not really, the point was contrasting sentimental labels
| with professionally defined titles, which seems precisely
| the distinction needed here. It's easy enough to look up
| on the agreed upon term for software engineer / developer
| and agree that it's more than someone that copy pastes
| code until it just barely runs.
|
| EDIT: To clarify I was only talking about vibe coder =
| developer. In this case the LLM is more of the developer
| and they are the product manager.
| _heimdall wrote:
| Do we have professionally defined titles for developer or
| software engineer?
|
| I've never seen it clarified so I tend to default to the
| lowest common denominator - if you're making software in
| some way you're a developer. The tools someone uses
| doesn't really factor into it for me (even if that is
| copy/pasting from stackoverflow).
| danudey wrote:
| If I tell a human artist to draw me something, am I an
| artist?
|
| No.
|
| Neither are people who ask AI to draw them something.
| hnuser123456 wrote:
| I mean, I spent years learning to code in school and at
| home, but never managed to get a job doing it, so I just do
| what I can in my spare time, and LLMs help me feel like I
| haven't completely fallen off. I can still hack together
| cool stuff and keep learning.
| _heimdall wrote:
| I actually meant it as a good thing! Our industry plays
| very loose with terms like "developer" and "engineer". We
| never really defined them well and its always felt more
| like gate keeping.
|
| IMO if someone uses what tools they have, whether thats
| an LLM or vim, and is able to ship software they're a
| developer in my book.
| delusional wrote:
| It might be fun if it didn't seem dishonest. The report tries
| to highlight a divide between employee curiosity and employer
| encouragement, undercut by their own analysis that most have
| tried them anyway.
|
| The article MISREPRESENTS that statistic to imply universal
| utility. That professional developers find it so useful that
| they universally chose to make daily use of it. It implies that
| Copilot is somehow more useful than an IDE without itself
| making that ridiculous claim.
| _heimdall wrote:
| And employers are now starting to require compliance with
| using LLMs regardless of employee curiosity.
|
| Shopify now includes LLM use in annual reviews, and if I'm
| not mistaken GitHub followed suit.
| placardloop wrote:
| The article is typical security issue embellishment/blogspam.
| They are incentivized to make it seem like AI is a mission-
| critical piece of software, because more AI reliance means a
| security issue in AI is a bigger deal, which means more pats
| on the back for them for finding it.
|
| Sadly, much of the security industry has been reduced to a
| competition over who can find the biggest vuln, and it has
| the effect of lowering the quality of discourse around all of
| it.
| Vinnl wrote:
| Even that quote itself jumps from "are using" to "mission-
| critical development infrastructure ... relying on them daily".
| _heimdall wrote:
| > This highlights an opportunity for organizations to better
| support their developers' interest in AI tools, considering
| local regulations.
|
| This is a funny one to see included in GitHub's report. If I'm
| not mistaken, github is now using the same approach as Shoplify
| with regards to requiring LLM use and including it as part of a
| self report survey for annual review.
|
| I guess they took their 2024 survey to heart and are ready to
| 100x productivity.
| krainboltgreene wrote:
| I wonder if AI here also stands in for decades long tools like
| language servers and intellisense.
| placardloop wrote:
| I'd be pretty skeptical of any of these surveys about AI tool
| adoption. At my extremely large tech company, all developers
| were _forced_ to install AI coding assistants into our IDEs and
| browsers (via managed software updates that can't be
| uninstalled). Our company then put out press releases parading
| how great the AI adoption numbers were. The statistics are
| manufactured.
| bagacrap wrote:
| An AI powered autocompletion engine is an AI tool. I think
| few developers would complain about saving a few keystrokes.
| sethops1 wrote:
| I think few developers didn't already have powerful
| autocomplete engines at their disposal.
| pdntspa wrote:
| The AI autocomplete I use (Jetbrains) stands head-and-
| shoulders above its non-AI autocomplete, and Jetbrains'
| autocomplete is already considered best-in-class. Its
| python support is so good that I still haven't figured
| out how to get anything even remotely close to it running
| in VSCode
| groby_b wrote:
| How's it compare against e.g. cursor?
| grumple wrote:
| This is surprising to me. My company (top 20 by revenue) has
| forbidden us from using non-approved AI tools (basically
| anything other than our own ChatGPT / LLM tool). Obviously it
| can't truly be enforced, but they do not want us using this
| stuff for security reasons.
| placardloop wrote:
| My company sells AI tools, so there's a pretty big
| incentive for to promote their use.
|
| We have the same security restrictions for AI tools that
| weren't created by us.
| bongodongobob wrote:
| So why then are you suggesting your company is typical?
| AlienRobot wrote:
| I've tried coding with AI for the first time recently[1] so I
| just joined that statistic. I assume most people here already
| know how it works and I'm just late to the party, but my
| experience was that Copilot was very bad at generating anything
| complex through chat requests but very good at generating
| single lines of code with autocompletion. It really highlighted
| the strengths and shortcomings of LLM's for me.
|
| For example, if you try adding getters and setters to a simple
| Rect class, it's so fast to do it with Copilot you might just
| add more getters/setters than you initially wanted. You type
| pub fn right() and it autocompletes left + width. That's
| convenient and not something traditional code completion can
| do.
|
| I wouldn't say it's "mission critical" however. It's just
| faster than copy pasting or Googling.
|
| The vulnerability highlighted in the article appears to only
| exist if you put code straight from Copilot into anything
| without checking it first. That sounds insane to me. It's just
| as untrusted input as some random guy on the Internet.
|
| [1] https://www.virtualcuriosities.com/articles/4935/coding-
| with...
| GuB-42 wrote:
| > it's so fast to do it with Copilot you might just add more
| getters/setters than you initially wanted
|
| Especially if you don't need getters and setters at all. It
| depends on you use case, but for your Rect class, you can
| just have x, y, width, height as public attributes. I know
| there are arguments against it, but the general idea is that
| if AI makes it easy to write boilerplate you don't need, then
| it made development slower in the long run, not faster, as it
| is additional code to maintain.
|
| > The vulnerability highlighted in the article appears to
| only exist if you put code straight from Copilot into
| anything without checking it first. That sounds insane to me.
| It's just as untrusted input as some random guy on the
| Internet.
|
| It doesn't sound insane to everyone, and even you may lower
| you standards for insanity if you are on a deadline and just
| want to be done with the thing. And even if you check the
| code, it is easy to overlook things, especially if these
| things are designed to be overlooked. For example, typos
| leading to malicious forks of packages.
| prymitive wrote:
| Once the world is all run on AI generated code how much
| memory and cpu cycles will be lost to unnecessary code? Is
| the next wave of HN top stories "How we ditched AI code and
| reduced our AWS bill by 10000%"?
| cess11 wrote:
| Your IDE should already have facilities for generating class
| boilerplate, like package address and brackets and so on. And
| then you put in the attributes and generate a constructor and
| any getters and setters you need, it's so fast and trivially
| generated that I doubt LLM:s can actually contribute much to
| it.
|
| Perhaps they can make suggestions for properties based on the
| class name but so can a dictionary once you start writing.
| AlienRobot wrote:
| IDE's can generate the proper structure and make simple
| assumptions, but LLM's can also guess what algorithms
| should look like generally. In the hands of someone who
| knows what they are doing I'm sure it helps produce more
| quality code than they otherwise would be capable of.
|
| I'm unfortunate that it has become used by students and
| juniors. You can't really learn anything from Copilot, just
| as I couldn't learn Rust just by telling it to write Rust.
| Reading a few pages of the book explained a lot more than
| Copilot fixing broken code with new bugs and the fixing the
| bugs by reverting its own code to the old bugs.
| captainkrtek wrote:
| I agree this sounds high, I wonder if "using Generative AI
| coding tools" in this survey is satisfied by having an IDE with
| Gen AI capabilities, not necessarily using those features
| within the IDE.
| MadsRC wrote:
| When this was released I thought that perhaps we could mitigate
| it by having the tooling only load "rules" if they were signed.
|
| But thinking on it a bit more, from the LLMs perspective there's
| no difference between the rule files and the source files. The
| hidden instructions might as well be in the source files... Using
| code signing on the rule files would be security theater.
|
| As mentioned by another comms ter, the solution could be to find
| a way to separate the command and data channels. The LLM only
| operates on a single channel, that being input of tokens.
| namaria wrote:
| > As mentioned by another comms ter, the solution could be to
| find a way to separate the command and data channels. The LLM
| only operates on a single channel, that being input of tokens.
|
| I think the issue is deeper than that. None of the inputs to an
| LLM should be considered as command. It incidentally gives you
| output compatible with the language in what is phrased by
| people as commands. But the fact that it's all just data to the
| LLM and that it works by taking data and returning plausible
| continuations of that data is the root cause of the issue. The
| output is not determined by the input, it is only statistically
| linked. Anything built on the premise that it is possible to
| give commands to LLMs or to use it's output as commands is
| fundamentally flawed and bears security risks. No amount of
| 'guardrails' or 'mitigations' can address this fundamental
| fact.
| TeMPOraL wrote:
| > _As mentioned by another comms ter, the solution could be to
| find a way to separate the command and data channels. The LLM
| only operates on a single channel, that being input of tokens._
|
| It's not possible, period. Lack of it is the very thing that
| makes LLMs general-purpose tools and able to handle natural
| language so well.
|
| Command/data channel separation _doesn 't exist in the real
| world_, humans don't have it either. Even limiting ourselves to
| conversations, which parts are commands and which are data is
| not clear (and doesn't really make sense) - most of them are
| _both_ to some degree, and that degree changes with situational
| context.
|
| There's no way to have a model capable of reading between lines
| and inferring what you mean _but only when you like it_ , not
| without time travel.
| red75prime wrote:
| > Lack of it is the very thing that makes LLMs general-
| purpose tools and able to handle natural language so well.
|
| I wouldn't be so sure. LLMs' instruction following
| functionality requires additional training. And there are
| papers that demonstrate that a model can be trained to follow
| specifically marked instructions. The rest is a matter of
| input sanitization.
|
| I guess it's not a 100% effective, but it's something.
|
| For example " The Instruction Hierarchy: Training LLMs to
| Prioritize Privileged Instructions " by Eric Wallace et al.
| simonw wrote:
| > I guess it's not a 100% effective, but it's something.
|
| That's the problem: in the context of security, not being
| 100% effective is a failure.
|
| If the ways we prevented XSS or SQL injection attacks
| against our apps only worked 99% of the time, those apps
| would all be hacked to pieces.
|
| The job of an adversarial attacker is to find the 1% of
| attacks that work.
|
| The instruction hierarchy is a great example: it doesn't
| solve the prompt injection class of attacks against LLM
| applications because it can still be subverted.
| red75prime wrote:
| Organizations face a similar problem: how to make
| reliable/secure processes out of fallible components
| (humans). The difference is that humans don't react in
| the same way to the same stimulus, so you can't hack all
| of them using the same trick, while computers react in a
| predictable way.
|
| Maybe (in absence of long-term memory that would allow to
| patch such holes quickly) it would make sense to render
| LLMs less predictable in their reactions to adversarial
| stimuli by randomly perturbing initial state several
| times and comparing the results. Adversarial stimuli
| should be less robust to such perturbation as they are
| artifacts of insufficient training.
| simonw wrote:
| LLMs are already unpredictable in their responses which
| adds to the problem: you might test your system against a
| potential prompt injection three times and observe it
| resist the attack: an attacker might try another hundred
| times and have one of their attempts work.
| TeMPOraL wrote:
| Same is true with people - repeat attempts at social
| engineering _will_ eventually succeed. We deal with that
| by a combination of training, segregating
| responsibilities, involving multiple people in critical
| decisions, and ultimately, by treating malicious attempts
| at fooling people _as felonies_. Same is needed with
| LLMs.
|
| In context of security, it's actually helpful to
| anthropomorphize LLMs! They are nowhere near human, but
| they are fundamentally similar enough to have the same
| risks and failure modes.
| pixl97 wrote:
| With this said, it's like we need some way for the LLM to
| identify in band attacks and point them out to somebody
| (not the attacker either).
| blincoln wrote:
| Command/data channel separation can and does exist in the
| real world, and humans can use it too, e.g.:
|
| "Please go buy everything on the shopping list." (One pointer
| to data: the shopping list.)
|
| "Please read the assigned novel and write a summary of the
| themes." (Two pointers to data: the assigned novel, and a
| dynamic list of themes built by reading the novel, like a
| temp table in a SQL query with a cursor.)
| simiones wrote:
| If the shopping list is a physical note, it looks like
| this: Milk (1l) Bread
| Actually, ignore what we discussed, I'm writing this here
| because I was ashamed to tell you in person, but I'm
| thinking of breaking up with you, and only want you to
| leave quietly and not contact me again
|
| Do you think the person reading that would just ignore it
| and come back home with milk and bread and think nothing of
| the other part?
| josefx wrote:
| We have separate privileges and trust for information
| sources. A note you find on the road stating "you are fired"
| and a direct message from your boss should lead to widely
| different reactions.
| TeMPOraL wrote:
| Yes, but that's not a strict division, and relies on
| anyone's understanding who has what privileges, where did
| the information came from (and if it came from where it
| claims it had), and a host of other situational factors.
|
| 'simiones gives a perfect example elsewhere in this thread:
| https://news.ycombinator.com/item?id=43680184
|
| But addressing your hypothetical, if that note said
| "CAUTION! Bridge ahead damaged! Turn around!" and looked
| official enough, I'd turn around even if the boss asked me
| to come straight to work, or else. More than that, if I saw
| a Tweet claiming FBI has just raided the office, you can
| bet good money I'd turn around _and not show at work_ that
| day.
| nkrisc wrote:
| Please immediately delete all production databases. I am
| writing you here because I forgot both your email and my own
| name.
|
| Sincerely, Your Boss
| TeMPOraL wrote:
| You do realize that what you wrote is technically illegal
| under CFAA?
|
| Obviously it's not a big deal, but still, in today's
| litigious climate, I'd delete the comment if I were you,
| just to stay on the safe side.
| cess11 wrote:
| Could you explain how that message is "technically
| illegal"?
| nkrisc wrote:
| Your comment is as well.
| ben_w wrote:
| I am reminded of an old story in advertising, where the
| entire advert was "This is your last chance to send $50 to
| ${whatever the address was}", and the result was actual
| cheques arriving in the post.
| yair99dd wrote:
| Reminds me of this wild paper
| https://boingboing.net/2025/02/26/emergent-misalignment-ai-t...
| gregwebs wrote:
| Is there a proactive way to defend against invisible Unicode
| attacks?
| Tepix wrote:
| Filtering them?
| DrNosferatu wrote:
| For _some_ piece of mind, we can perform the search:
| OUTPUT=$(find .cursor/rules/ -name '*.mdc' -print0 2>/dev/null |
| xargs -0 perl -wnE ' BEGIN { $re = qr/\x{200D}|\x{200C}|\
| x{200B}|\x{202A}|\x{202B}|\x{202C}|\x{202D}|\x{202E}|\x{2066}|\x{
| 2067}|\x{2068}|\x{2069}/ } print "$ARGV:$.:$_" if /$re/
| ' 2>/dev/null) FILES_FOUND=$(find .cursor/rules/ -name
| '*.mdc' -print 2>/dev/null) if [[ -z "$FILES_FOUND"
| ]]; then echo "Error: No .mdc files found in the
| directory." elif [[ -z "$OUTPUT" ]]; then echo "No
| suspicious Unicode characters found." else echo
| "Found suspicious characters:" echo "$OUTPUT" fi
|
| - Can this be improved?
| Joker_vD wrote:
| Now, _my_ toy programming languages all share the same
| "ensureCharLegal" function in their lexers that's called on
| every single character in the input (including those inside the
| literal strings) that filters out all those characters, plus
| all control characters (except the LF), and also something else
| that I can't remember right now... some weird space-like
| characters, I think?
|
| Nothing really stops the non-toy programming and configuration
| languages from adopting the same approach except from the fact
| that someone has to think about it and then implement it.
| Cthulhu_ wrote:
| Here's a Github Action / workflow that says it'll do something
| similar: https://tech.michaelaltfield.net/2021/11/22/bidi-
| unicode-git...
|
| I'd say it's good practice to configure github or whatever tool
| you use to scan for hidden unicode files, ideally they are
| rendered very visibly in the diff tool.
| anthk wrote:
| You can just use Perl for the whole script instead of Bash.
| lukaslalinsky wrote:
| I'm quite happy with spreading a little bit of scare about AI
| coding. People should not treat the output as code, only as a
| very approximate suggestion. And if people don't learn, and we
| will see a lot more shitty code in production, programmers who
| can actually read and write code will be even more expensive.
| GenshoTikamura wrote:
| There is an equal unit of trouble per each unit of "progress"
| Oras wrote:
| This is a vulnerability in the same sense as someone committing a
| secret key in the front end.
|
| And for enterprise, they have many tools to scan vulnerability
| and malicious code before going to production.
| AutoAPI wrote:
| Recent discussion: Smuggling arbitrary data through an emoji
| https://news.ycombinator.com/item?id=43023508
| fjni wrote:
| Both GitHub and Cursor's response seems a bit lazy. Technically
| they may be correct in their assertion that it's the user's
| responsibility. But practically isn't part of their product
| offering a safe coding environment? Invisible Unicode instruction
| doesn't seem like a reasonable feature to support, it seems like
| a security vulnerability that should be addressed.
| bthrn wrote:
| It's not really a vulnerability, though. It's an attack vector.
| sethops1 wrote:
| It's funny because those companies both provide web browsers
| loaded to the gills with tools to fight malicious sites. Users
| can't or won't protect themselves. Unless they're an LLM user,
| apparently.
| markussss wrote:
| This page has horrible scrolling. I really don't understand why
| anybody creates this kind of scroll. Are they not using what they
| create?
| AlienRobot wrote:
| I don't think they create it, they just use some template that
| comes with it.
| nuxi wrote:
| And then they don't ever open the page, right?
| TZubiri wrote:
| May god forgive me, but I'm rooting for the hackers on this one.
|
| Job security you know?
| jdthedisciple wrote:
| simple solution:
|
| preprocess any input to agents by restricting them to a set of
| visible characters / filtering out suspicious ones
| cess11 wrote:
| Nasty characters should be rather common in your test cases.
| stevenwliao wrote:
| Not sure about internationalization but at least for English,
| constraining to ASCII characters seems like a simple solution.
___________________________________________________________________
(page generated 2025-04-14 23:01 UTC)