沙盒化AI程式碼代理：實用指南

Hacker News·4 個月前

本文提供一份實用指南，介紹如何為Claude Code、Codex和Gemini CLI等AI程式碼代理進行沙盒化，詳細說明了真實的密鑰洩露和提示注入風險，以及如何配置安全機制。

10 min read

January 6, 2026

Sandboxing AI Coding Agents

A practical guide to sandboxing AI coding agents like Claude Code, Codex, and Gemini CLI.

Sandboxing AI Coding Agents

If you're running Claude Code, Codex, or Gemini CLI, do you know what they can actually do on your machine? Can the agent exfiltrate your SSH keys? Send your environment variables to an external server? Modify your shell config to run something malicious next time you open a terminal?

This uncertainty bothered me enough to dig in. All three CLIs have sandboxing capabilities that provide safety mechanisms, but you may or may not have them enabled. The good news is that enabling sandboxing is straightforward and rarely slows you down. But you need to understand what it protects against and where the gaps are.

This post covers the real risks, how each CLI implements sandboxing, and what to configure before you trust them with your codebase.

The Risks Are Real

If you're worried about using AI agents for development, your concerns are legitimate.

Secret exposure. Environment variables containing API keys, database passwords, and cloud credentials are accessible to the model. Sandboxing doesn't automatically protect them. They live in memory and are inherited by child processes unless explicitly blocked. Don't assume sandboxing handles this.

Prompt injection. Malicious instructions can easily be embedded in code comments, README files, or package documentation. When the agent ingests this content, it might follow the instructions. This is OWASP's #1 risk for LLM applications, and it cannot be fully solved at the model level. I've reproduced jailbreaks myself on Claude Opus 4.5 running in Claude Code.

Permission fatigue. Reddit threads are full of engineers admitting they click "approve" reflexively, or use --dangerously-skip-permissions because the friction is unbearable. One user put it bluntly: "Format my hard drive if you want... JUST DON'T MAKE ME CONFIRM ANOTHER BASH COMMAND!!!"

Accidental damage. Engineers may approve a command that accidentally trashes their code changes or even their entire development system. Recovery depends entirely on git discipline and backup practices.

What Does Sandboxing Mean in this Context?

Sandboxing runs a process in an isolated environment with restricted capabilities, with the goal of constraining what actions it can take.

The big three coding agents (Anthropic's Claude Code, OpenAI's Codex, and Google's Gemini CLI) implement sandboxing similarly, but with different defaults and nuances.

Sandboxing in these CLIs implements two main types of boundaries:

Filesystem isolation. What files can the agent read and write? Can it read your private keys in ~/.ssh? Can it modify files outside your project directory? Can it write to shell config files like .bashrc?

Network isolation. What can the agent make network requests to? Can it make API calls to services you haven't approved? Even an HTTP GET request can exfiltrate secrets via the URL path or query parameters.

Sandboxing operates at a lower, more fundamental level than the permission prompts you often see when using these CLIs. The permission prompts depend on the user making the right choice in the moment, while sandboxing doesn't.

You may or may not have sandboxing enabled right now. Don't assume. Check.

How Each Tool Implements Sandboxing

All three tools use OS-level isolation. Here's how they compare:

As you can see, it's important to review your sandbox state in each tool to make
sure you've opted in to sandboxing.

In my testing, your shell environment variables are always available in commands
issued by the agents, regardless of sandboxing settings. You may want to customize
your shell environment accordingly if you want additional protections on this front.

All three CLIs on macOS rely on a tool called sandbox-exec that Apple has marked
deprecated. This is worth watching to see if it becomes problematic.

Quick Start

Claude Code (sandboxing docs) — Run /sandbox and choose a mode:

Codex (sandboxing docs) — Sandboxing is on by default. You can confirm by entering /status. The command line options include:

Gemini CLI (sandboxing docs) — Enable sandbox mode explicitly:

Not a Silver Bullet

Think of sandboxing as one layer of defense, not a complete solution. Here are some things to keep in mind, even when you have sandboxing enabled.

Domain allowlisting is coarse. If you allow github.com, the agent can push to any repo you have access to. If you allow npmjs.com, it could publish a package.

Trusted code can be compromised. If a dependency contains adversarial instructions in comments, the agent will see and potentially follow them. The sandbox limits response, but cannot prevent the agent from being influenced more generally.

Insecure code generation. Even without malicious intent, AI agents can generate code with vulnerabilities. Sandboxing doesn't help here. Code review does.

Escape hatches exist. Every tool provides --yolo or danger-full-access modes. Your actual security is only as strong as your team's discipline in avoiding these modes, or using them only in carefully constructed environments.

Security bugs happen. Security is hard to get right. All three tools have had vulnerabilities discovered and patched. For example: [1] [2] [3]

The good news: Codex and Gemini CLI are fully open source, and Claude Code's sandbox implementation is open source (though the rest of Claude Code is not). Security fixes happen in public. Keep your CLIs updated.

Recommendations

By risk profile:

Universal:

The Bottom Line

Sandboxing doesn't make AI coding assistants entirely safe, but it does give you guarantees where you'd otherwise have none.

Familiarize yourself with the sandbox settings and how to adjust them according to the risk profile associated with your active work.

Be sure to keep your CLIs updated, since security patches come out regularly.

References

Official Documentation:

Security Research & Incidents:

Open Source:

Curtis Myzie

Founder of Deep Noodle with 20 years of engineering and leadership experience as CTO, VP of Engineering, and Principal Engineer. Helping design and build next generation SaaS products.

Deep Noodle

— Hacker News

你的個人知識庫

沙盒化AI程式碼代理：實用指南

Sandboxing AI Coding Agents

Sandboxing AI Coding Agents

The Risks Are Real

What Does Sandboxing Mean in this Context?

How Each Tool Implements Sandboxing

Quick Start

Not a Silver Bullet

Recommendations

The Bottom Line

References