[HN Gopher] Sandboxing AI agents at the kernel level
___________________________________________________________________
Sandboxing AI agents at the kernel level
Author : dakshgupta
Score : 61 points
Date : 2025-09-29 16:40 UTC (6 hours ago)
(HTM) web link (www.greptile.com)
(TXT) w3m dump (www.greptile.com)
| CuriouslyC wrote:
| Just gonna toss this out there, using an agent for code review is
| a little weird. You can calculate a covering set for the PR
| deterministically and feed that into a long context model along
| with the diff and any relevant metadata and get a good review in
| one shot without the hassle.
| dakshgupta wrote:
| That used to be how we did it, but this method performed better
| on super large codebases. One of the reasons is that grepping
| is a highly effective way to trace function calls to understand
| the full impact of a change. It's also great for finding other
| examples of similar code (for example the same library being
| used) to ensure consistency of standards.
| arjvik wrote:
| If that's the case, isn't a grep tool a lot more tractable
| than a Linux agent that will end up mostly calling `grep`?
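A narrow grep tool of the kind suggested here could be as small as the
sketch below. It is only an illustration, not Greptile's implementation:
the function name `grep_repo`, its parameters, and the choice to skip
symlinked files are all assumptions made for this example.

```python
import re
from pathlib import Path


def grep_repo(root, pattern, suffixes=(".py",)):
    """Return (relative_path, line_number, stripped_line) for each match.

    Hypothetical helper: only regular files under `root` with one of
    the given suffixes are searched, and symlinked files are skipped.
    """
    regex = re.compile(pattern)
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_symlink() or not path.is_file():
            continue
        if path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                rel = path.relative_to(root).as_posix()
                hits.append((rel, lineno, line.strip()))
    return hits
```

Exposed to a model as a single tool call, something like this gives the
"trace the callers" behavior without handing over a shell.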
| lomase wrote:
|         But then you can't say it's powered by AI and get that VC
|         money.
| kjok wrote:
| Ah ha.
| CuriouslyC wrote:
| You shouldn't need the entire codebase, just a covering set
| for the modified files (you can derive this by parsing the
| files). If your PR is atomic, covering set + diff + business
| context is probably going to be less than 300k tokens, which
| Gemini can handle easily. Gemini is quite good even at 500k,
| and you can run it multiple times with KV cache for cheap to
| get a distribution (tell it to analyze the PR from different
| perspectives).
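One possible reading of "covering set" is the modified files plus every
repository file they import, transitively. The sketch below assumes
that reading and assumes a Python codebase; the names `local_imports`
and `covering_set` are made up for this example, and it only resolves
absolute imports of the form `import pkg.mod` or `from pkg.mod import x`
to `pkg/mod.py`.

```python
import ast
from pathlib import Path


def local_imports(path, root):
    """Return the set of files under `root` that `path` imports."""
    tree = ast.parse(path.read_text(encoding="utf-8"))
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (level > 0) are ignored in this sketch.
            if node.module and node.level == 0:
                names.add(node.module)
    found = set()
    for name in names:
        candidate = root / (name.replace(".", "/") + ".py")
        if candidate.is_file():
            found.add(candidate)
    return found


def covering_set(modified, root):
    """Transitive closure of local imports, starting from `modified`."""
    root = Path(root)
    seen = set()
    queue = [root / m for m in modified]
    while queue:
        path = queue.pop()
        if path in seen or not path.is_file():
            continue
        seen.add(path)
        queue.extend(local_imports(path, root))
    return sorted(p.relative_to(root).as_posix() for p in seen)
```

A real version would also need reverse dependencies (callers of the
changed code), package `__init__.py` files, and non-Python languages.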
| jt2190 wrote:
| OT: I wonder if WASM is ready to fulfill the sandboxing needs
| expressed in this article, i.e. can we put the AI agent into a
| WebAssembly sandbox and have it function as required?
| Yoric wrote:
| You'll probably need some kind of WebGPU bindings, but I think
| it sounds feasible.
| technocrat8080 wrote:
| A bit confused, all this to say you folks use standard
| containerization?
| whinvik wrote:
| Same. I didn't really understand what the difference is
|   compared to containerization.
| rvz wrote:
|     Fundamentally, there is no difference. Blocking syscalls in a
|     Docker container is nothing new: it is one of the standard ways
|     to achieve "sandboxing" and can already be done today.
|
| The only thing that caught people's attention was that it was
| applied to "AI Agents".
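For readers who have not seen syscall blocking in a container, a
seccomp profile is a JSON file passed to Docker or Podman with
`--security-opt seccomp=<file>`. The sketch below generates a minimal
denylist profile. The helper name `deny_profile` and the choice of
blocked syscalls are assumptions for illustration, not anything the
article describes.

```python
import json


def deny_profile(blocked):
    """Build a seccomp profile that allows everything except `blocked`.

    Denied syscalls fail with an error (SCMP_ACT_ERRNO) instead of
    killing the process.
    """
    return {
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": [
            {"names": sorted(blocked), "action": "SCMP_ACT_ERRNO"},
        ],
    }


# Hypothetical selection of syscalls an agent should never need.
EXAMPLE_PROFILE = deny_profile({"mount", "ptrace", "kexec_load"})
EXAMPLE_JSON = json.dumps(EXAMPLE_PROFILE, indent=2)
```

A denylist is shown only because it is short. Docker's own default
profile takes the opposite approach and allowlists syscalls, which is
the safer design.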
| kjok wrote:
| What is so fundamentally different for AI agents?
| rvz wrote:
| Other than the current popular thing which is "AI
| agents", like all programs, it changes absolutely
| nothing.
| Yoric wrote:
| The fact that the first thing people are going to do is
| punch holes in the sandbox with MCP servers?
| thundergolfer wrote:
| This is a good explanation of how standard filesystem sandboxing
| works, but it's hopefully not trying to be convincing to security
| engineers.
|
| > At Greptile, we run our agent process in a locked-down rootless
| podman container so that we have kernel guarantees that it sees
| only things it's supposed to.
|
| This sounds like a runc container because they've not said
| otherwise. runc has a long history of filesystem exploits[1] based
| on leaked file descriptors and `openat` without `O_NOFOLLOW`.
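To make the `openat` point concrete: in Python, `os.open` with a
`dir_fd` argument opens a path relative to a directory descriptor, and
the `O_NOFOLLOW` flag makes the open fail if the final path component
is a symlink. The sketch below is illustrative only; `open_in_dir` is a
made-up helper, not code from runc or the article.

```python
import os


def open_in_dir(dir_fd, name):
    """Open `name` inside the directory `dir_fd` without following it.

    O_NOFOLLOW only protects the final path component, so this sketch
    accepts a single component and rejects anything containing "/".
    O_CLOEXEC keeps the descriptor from leaking into exec'd children.
    """
    if "/" in name or name in ("", ".", ".."):
        raise ValueError("expected a single path component")
    flags = os.O_RDONLY | os.O_NOFOLLOW | os.O_CLOEXEC
    return os.open(name, flags, dir_fd=dir_fd)
```

Without `O_NOFOLLOW`, a symlink planted inside the directory can
redirect the open to a path outside it.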
|
| The agent ecosystem seems to have already settled on VMs or
| gVisor[2] being table-stakes. We use the latter.
|
| 1. https://github.com/opencontainers/runc/security/advisories/G...
|
| 2. https://gvisor.dev/docs/architecture_guide/security/
| IshKebab wrote:
| If you only care about filesystem sandboxing isn't Landlock the
| easiest solution?
| wmf wrote:
| "How can I sandbox a coding agent?"
|
| "Early civilizations had no concept of zero..."
| kketch wrote:
| They seem to be looking to let the agent access the source code
| for review. But in that case, the agent should only see the
| codebase and nothing else. For a code review agent, all it really
| needs are:
|
| - Access to files in the repository (or repositories)
|
| - Access to the patch/diff being reviewed
|
| - Ability to perform text/semantic search across the codebase
|
| That doesn't require running the agent inside a container on a
| system with sensitive data. Exposing an API to the agent that
| specifically gives it access to the above data would avoid the
| risk altogether.
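A rough sketch of what such an API could look like follows. The class
name `RepoReadAPI` and its methods are hypothetical, the search is a
plain substring match rather than semantic search, and
`Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


class RepoReadAPI:
    """Read-only view of one checkout plus the diff under review."""

    def __init__(self, root, diff_text):
        self._root = Path(root).resolve()
        self._diff = diff_text

    def _resolve(self, rel_path):
        # Resolve symlinks and ".." first, then check containment.
        target = (self._root / rel_path).resolve()
        if not target.is_relative_to(self._root):
            raise PermissionError(f"{rel_path!r} is outside the repository")
        return target

    def read_file(self, rel_path):
        return self._resolve(rel_path).read_text(encoding="utf-8")

    def get_diff(self):
        return self._diff

    def search(self, needle):
        """Return (relative_path, line_number) for each matching line."""
        hits = []
        for path in sorted(self._root.rglob("*")):
            if path.is_symlink() or not path.is_file():
                continue
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if needle in line:
                    rel = path.relative_to(self._root).as_posix()
                    hits.append((rel, lineno))
        return hits
```

The resolve-then-check step is not race-free if something else can
modify the checkout concurrently, so this works best on a private,
read-only copy of the repository.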
|
| If it's really important that the agent is able to use a shell,
| why not use something like codespaces and run it in there?
| warkdarrior wrote:
| It would also need:
|
| - Access to repo history
|
| - Access to CI/CD logs
|
| - Access to bug/issue tracking
___________________________________________________________________
(page generated 2025-09-29 23:00 UTC)