[HN Gopher] Sandboxing AI agents at the kernel level
       ___________________________________________________________________
        
       Sandboxing AI agents at the kernel level
        
       Author : dakshgupta
       Score  : 61 points
       Date   : 2025-09-29 16:40 UTC (6 hours ago)
        
 (HTM) web link (www.greptile.com)
 (TXT) w3m dump (www.greptile.com)
        
       | CuriouslyC wrote:
       | Just gonna toss this out there, using an agent for code review is
       | a little weird. You can calculate a covering set for the PR
       | deterministically and feed that into a long context model along
       | with the diff and any relevant metadata and get a good review in
       | one shot without the hassle.
        
         | dakshgupta wrote:
         | That used to be how we did it, but this method performed better
         | on super large codebases. One of the reasons is that grepping
         | is a highly effective way to trace function calls to understand
         | the full impact of a change. It's also great for finding other
         | examples of similar code (for example the same library being
         | used) to ensure consistency of standards.
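
        A minimal sketch of the kind of grep-style lookup such an agent
        runs to trace the call sites of a changed function. The symbol
        name and the use of `git grep` are illustrative assumptions, not
        Greptile's actual tooling.

          import subprocess

          def call_sites(symbol: str) -> list[str]:
              """Return path:line:match entries for every use of `symbol`."""
              result = subprocess.run(
                  ["git", "grep", "-n", "-e", symbol],
                  capture_output=True, text=True,
              )
              return result.stdout.splitlines()

          # e.g. trace everywhere a modified function is referenced
          for hit in call_sites("create_invoice"):
              print(hit)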
        
           | arjvik wrote:
           | If that's the case, isn't a grep tool a lot more tractable
           | than a Linux agent that will end up mostly calling `grep`?
        
             | lomase wrote:
              | But then you can't say it's powered by AI and get that VC
              | money.
        
               | kjok wrote:
               | Ah ha.
        
           | CuriouslyC wrote:
           | You shouldn't need the entire codebase, just a covering set
           | for the modified files (you can derive this by parsing the
           | files). If your PR is atomic, covering set + diff + business
           | context is probably going to be less than 300k tokens, which
           | Gemini can handle easily. Gemini is quite good even at 500k,
           | and you can run it multiple times with KV cache for cheap to
           | get a distribution (tell it to analyze the PR from different
           | perspectives).
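
        For concreteness, a rough sketch of the deterministic covering-set
        idea for a Python repository: collect the modified files, parse
        them to find the repo-local modules they import (one level, no
        transitive walk), and bundle those files with the diff into one
        long-context prompt. The helper names, the single-level import
        walk, and the prompt shape are assumptions for illustration, not
        CuriouslyC's tooling.

          import ast
          import subprocess
          from pathlib import Path

          REPO = Path(".")

          def modified_files(base: str = "origin/main") -> list[Path]:
              out = subprocess.run(
                  ["git", "diff", "--name-only", base, "--", "*.py"],
                  capture_output=True, text=True, check=True,
              ).stdout
              return [REPO / f for f in out.splitlines()
                      if f and (REPO / f).exists()]

          def local_imports(path: Path) -> set[Path]:
              # Repo-local modules imported by `path` (one level only).
              deps: set[Path] = set()
              for node in ast.walk(ast.parse(path.read_text())):
                  if isinstance(node, ast.Import):
                      names = [a.name for a in node.names]
                  elif isinstance(node, ast.ImportFrom) and node.module:
                      names = [node.module]
                  else:
                      continue
                  for name in names:
                      candidate = REPO / (name.replace(".", "/") + ".py")
                      if candidate.exists():
                          deps.add(candidate)
              return deps

          def covering_set(base: str = "origin/main") -> set[Path]:
              changed = modified_files(base)
              cover = set(changed)
              for f in changed:
                  cover |= local_imports(f)
              return cover

          diff = subprocess.run(["git", "diff", "origin/main"],
                                capture_output=True, text=True,
                                check=True).stdout
          context = "\n\n".join(p.read_text() for p in sorted(covering_set()))
          prompt = (f"Review this PR.\n\nDiff:\n{diff}\n\n"
                    f"Relevant files:\n{context}")
          # `prompt` then goes to a long-context model (e.g. Gemini) in one
          # shot; rerunning with different "perspective" instructions gives
          # the distribution of reviews mentioned above.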
        
       | jt2190 wrote:
        | OT: I wonder if WASM is ready to fulfill the sandboxing needs
        | expressed in this article, i.e. can we put the AI agent into a
        | WebAssembly sandbox and have it function as required?
        
         | Yoric wrote:
         | You'll probably need some kind of WebGPU bindings, but I think
         | it sounds feasible.
        
       | technocrat8080 wrote:
       | A bit confused, all this to say you folks use standard
       | containerization?
        
         | whinvik wrote:
          | Same. I didn't really understand what the difference is
          | compared to containerization.
        
           | rvz wrote:
            | Fundamentally, there is no difference. Blocking syscalls in a
            | Docker container is nothing new; it's one of the standard
            | ways to achieve "sandboxing" and can already be done today.
           | 
           | The only thing that caught people's attention was that it was
           | applied to "AI Agents".
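
        For reference, that kind of syscall blocking looks roughly like
        the sketch below. It uses the libseccomp Python bindings (the
        `seccomp` module), which is an assumption about available tooling;
        container runtimes apply the same mechanism declaratively via a
        JSON profile (e.g. `--security-opt seccomp=profile.json`). The
        denied syscalls are illustrative choices, not anyone's actual
        policy.

          import errno
          import subprocess
          from seccomp import SyscallFilter, ALLOW, ERRNO

          flt = SyscallFilter(defaction=ALLOW)      # allow by default...
          for sc in ("ptrace", "mount", "init_module"):
              flt.add_rule(ERRNO(errno.EPERM), sc)  # ...fail these, EPERM
          flt.load()                                # applies to this process
                                                    # and everything it execs

          # A child spawned by the sandboxed process inherits the filter:
          subprocess.run(["strace", "true"])        # strace's ptrace() fails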
        
             | kjok wrote:
             | What is so fundamentally different for AI agents?
        
               | rvz wrote:
                | Nothing, other than "AI agents" being the current
                | popular thing. Like all other programs, it changes
                | absolutely nothing.
        
               | Yoric wrote:
               | The fact that the first thing people are going to do is
               | punch holes in the sandbox with MCP servers?
        
       | thundergolfer wrote:
       | This is a good explanation of how standard filesystem sandboxing
       | works, but it's hopefully not trying to be convincing to security
       | engineers.
       | 
       | > At Greptile, we run our agent process in a locked-down rootless
       | podman container so that we have kernel guarantees that it sees
       | only things it's supposed to.
       | 
        | This sounds like a runc container, because they've not said
        | otherwise. runc has a long history of filesystem exploits [1]
        | based on leaked file descriptors and `openat` without O_NOFOLLOW.
       | 
       | The agent ecosystem seems to have already settled on VMs or
       | gVisor[2] being table-stakes. We use the latter.
       | 
       | 1.
       | https://github.com/opencontainers/runc/security/advisories/G...
       | 
       | 2. https://gvisor.dev/docs/architecture_guide/security/
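
        For anyone unfamiliar: switching from runc to gVisor is mostly a
        matter of selecting `runsc` as the OCI runtime when the container
        is launched. A rough sketch, assuming runsc is installed and
        registered as a runtime in containers.conf; the image name, mount,
        and flags are placeholders.

          import subprocess

          subprocess.run([
              "podman", "--runtime", "runsc",   # gVisor's user-space kernel
              "run", "--rm",                    # instead of runc
              "--network", "none",              # no network for the agent
              "-v", "/srv/checkout:/repo:ro",   # read-only view of the repo
              "agent-image:latest", "review", "/repo",
          ], check=True)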
        
       | IshKebab wrote:
        | If you only care about filesystem sandboxing, isn't Landlock the
        | easiest solution?
        
       | wmf wrote:
       | "How can I sandbox a coding agent?"
       | 
       | "Early civilizations had no concept of zero..."
        
       | kketch wrote:
        | They seem to be looking to let the agent access the source code
        | for review. But in that case, the agent should only see the
        | codebase and nothing else. For a code review agent, all it
        | really needs is:
       | 
        | - Access to files in the repository (or repositories)
       | 
       | - Access to the patch/diff being reviewed
       | 
       | - Ability to perform text/semantic search across the codebase
       | 
        | That doesn't require running the agent inside a container on a
        | system with sensitive data. Exposing an API that gives the agent
        | access to only the above data avoids the risk altogether.
       | 
        | If it's really important that the agent be able to use a shell,
        | why not use something like Codespaces and run it in there?
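
        A minimal sketch of the narrow tool surface described above:
        read-only file access scoped to the repository, the diff under
        review, and text search, and nothing else. The class and method
        names are illustrative, not an existing product's API.

          import subprocess
          from pathlib import Path

          class ReviewTools:
              """The only operations exposed to the review agent."""

              def __init__(self, repo: Path, base: str = "origin/main"):
                  self.repo = repo.resolve()
                  self.base = base

              def read_file(self, rel_path: str) -> str:
                  path = (self.repo / rel_path).resolve()
                  if not path.is_relative_to(self.repo):
                      raise PermissionError("path escapes the repository")
                  return path.read_text()

              def diff(self) -> str:
                  return subprocess.run(
                      ["git", "-C", str(self.repo), "diff", self.base],
                      capture_output=True, text=True, check=True,
                  ).stdout

              def search(self, pattern: str) -> list[str]:
                  hits = subprocess.run(
                      ["git", "-C", str(self.repo), "grep", "-n", "-e",
                       pattern],
                      capture_output=True, text=True,
                  )
                  return hits.stdout.splitlines()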
        
         | warkdarrior wrote:
         | It would also need:
         | 
         | - Access to repo history
         | 
         | - Access to CI/CD logs
         | 
         | - Access to bug/issue tracking
        
       ___________________________________________________________________
       (page generated 2025-09-29 23:00 UTC)