
System Prompts as Governance Artifacts in AI Developer Tools: A Forensic Comparative Study

Research Paper

Abstract

System prompts for AI developer tools are typically treated as implementation details, yet they function as constitutive governance artifacts: they allocate authority between user intent and policy, bound permissible actions, constrain visibility into workspace state, and define correction and termination behavior. This paper presents a prompt-forensics study of system prompts used by IDE and CLI developer assistants across interaction modes. We normalize prompts into a common schema and compare them along invariant dimensions—authority boundaries, scope and visibility controls, tool mediation, and correction/termination logic—to characterize how prompt text encodes governance regimes. Across assistants, we identify recurring constitutional patterns: mode-tiered autonomy (mode as constitution), tool-mediated accountability (tools as the enforceable action surface), separation of capability from permission (tools may exist while outcomes are forbidden), state minimization as risk control, and conservative change doctrines that protect workspace integrity. We synthesize these recurring controls into Prompt Governance Primitives (PGPs): reusable, prompt-encoded structures that can be composed to build or audit agentic systems. These findings are relevant to applied AI safety and agent architecture because tool-mediated agents are increasingly deployed in real repositories and terminals, where governance failures manifest as workspace corruption, autonomy drift, instruction leakage, and tool abuse.

1. Introduction

AI developer assistants increasingly behave as tool-mediated agents: they can read files, search repositories, run commands, and sometimes modify working directories. The system prompt that governs such assistants does more than provide task instructions. In practice, it functions as an “invisible constitution” that defines who the assistant is, what it may do, what it must refuse, and when it must stop.

This paper treats system prompts as a distinct governance layer. Rather than evaluating outputs in isolation, we analyze the text-level control structures encoded in system prompts across developer assistants and modes. We ask how prompts allocate authority between user intent and policy, bound permissible actions, constrain visibility into workspace state, and define correction and termination behavior.

Contributions: a normalized schema for cross-assistant, cross-mode prompt comparison; a structural analysis along invariant governance dimensions; and a synthesis of recurring controls into reusable Prompt Governance Primitives (PGPs).

2. Related Work

The idea that high-level principles can govern model behavior has a clear antecedent in Constitutional AI, which aligns models via an explicit set of principles used for self-critique and refusal behavior (Anthropic, 2022). While Constitutional AI focuses on training-time alignment and normative principles, this paper examines deployed governance encoded directly in system prompts of developer tools.

Prompt injection and tool-calling vulnerabilities show how tool mediation can be exploited when boundaries are unclear or insufficiently enforced (Wang et al., 2025). Architectural patterns for securing agentic systems emphasize isolation, mediation, and constrained action selection; such patterns motivate treating “tool boundaries” and “action selection rules” as first-class control points (Beurer-Kellner et al., 2025; Masood, 2025).

Industry and practitioner analyses have highlighted that system prompts can be extensive and operationally consequential, functioning as governance documents that specify refusal policies, tool-use rules, and style constraints (Sharma, 2025; Willison, 2025). OpenAI’s discussion of a shift from hard refusals toward “safe-completions” emphasizes how governance can be expressed as output-centric policies and prompt-level constraints rather than binary compliance (OpenAI, 2025). Comparative analyses of model prompts further suggest that system prompts encode different architectural priorities (Forte, 2025).

What prior work less directly examines is the cross-tool, cross-mode governance structure of system prompts in developer assistants specifically—how prompts implement tiered autonomy, action gating, and workspace-integrity safeguards across operational modes (planner, reviewer, executor, full-access agent). This paper addresses that gap via comparative prompt forensics.

3. Methodology: System Prompt Forensics

3.1 Collection, normalization, and analysis

We treat each assistant’s system prompt(s) as a governance artifact and analyze them using a normalized system-prompt schema. Prompts are collected per assistant and per mode, then normalized into a common representational structure to enable comparison. Each assistant’s modes are treated as constitutional variants within a single governance regime.
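The normalization step can be pictured as mapping each prompt into a fixed record whose fields mirror the paper's analytical dimensions. The following sketch is illustrative only; the field names and example values are assumptions, not the study's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class PromptSchema:
    """Normalized representation of one assistant/mode system prompt.

    Field names are hypothetical stand-ins for the paper's dimensions:
    authority, visibility, tool mediation, and termination logic.
    """
    assistant: str
    mode: str
    authority: list[str] = field(default_factory=list)    # decision rights granted
    visibility: list[str] = field(default_factory=list)   # readable workspace state
    tool_rules: list[str] = field(default_factory=list)   # procedural obligations
    termination: list[str] = field(default_factory=list)  # stop/refusal conditions

# Two modes of one assistant become constitutional variants of one regime:
plan = PromptSchema("ide-assistant", "plan",
                    authority=["propose edits"],
                    visibility=["read files"])
agent = PromptSchema("ide-assistant", "agent",
                     authority=["edit files", "run commands"],
                     visibility=["read files", "terminal output"],
                     tool_rules=["require approval for destructive commands"])
```

Representing modes as records over the same fields is what makes the cross-assistant comparison structural rather than ad hoc.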

3.2 Analytical dimensions

Comparative analysis is performed structurally along invariant dimensions that recur across assistants: authority boundaries between user intent and policy, scope and visibility controls over workspace state, tool mediation rules, and correction/termination logic.

3.3 Validity under partial observability

Prompt-level analysis does not guarantee runtime enforcement. However, system prompts are explicit declarations of intended governance and are often the only observable specification of decision rights, tool constraints, and refusal/termination contracts. As such, they are valid for identifying architectural patterns, comparing governance regimes, and extracting reusable control structures—even when implementation details remain opaque.

3.4 Use of AI Assistance

This research was produced with significant AI assistance across the analysis and synthesis pipeline. GPT-5.2 was used for primary data analysis, research-report generation, and development of the technical appendix and paper synthesis. ChatGPT supported initial idea conception and iterative refinement of research prompts. ChatGPT Deep Research was used to identify and verify citation sources. Gemini 3 Flash (via the GitHub Copilot extension in VS Code) performed final editorial review and refinement.

4. Comparative Analysis of Developer Assistants

4.1 Assistants and modes under study

The analysis covers multiple assistants and modes, including local software engineering agents with execution and review constitutions; terminal assistants with interactive and prompt-oriented variants; CLI assistants with plan-versus-build splits; and IDE assistants with ask/plan/agent tiers and varying sandbox and approval semantics.

4.2 Authority models: partitioned by mode

Authority is consistently partitioned rather than monolithic: advisory and planning modes are restricted to reading and proposing, while executor and agent modes are granted side effects under approval and sandbox constraints.

Across these regimes, mode boundaries operate as constitutional contracts that reallocate decision rights and permissible side effects.
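The mode-as-constitution pattern can be sketched as a table from mode to permitted action classes, so that switching modes reallocates decision rights wholesale. The mode names and grants below are hypothetical, loosely modeled on the ask/plan/agent tiers mentioned above.

```python
from enum import Enum, auto

class Action(Enum):
    READ = auto()
    EDIT = auto()
    EXECUTE = auto()

# Hypothetical mode -> permitted-actions table (assumed tier names).
MODE_GRANTS = {
    "ask":   {Action.READ},
    "plan":  {Action.READ},
    "agent": {Action.READ, Action.EDIT, Action.EXECUTE},
}

def permitted(mode: str, action: Action) -> bool:
    """A mode switch reallocates decision rights as a single contract."""
    return action in MODE_GRANTS.get(mode, set())
```

The key property is that permissions attach to the mode, not to individual requests: crossing a mode boundary is the constitutional event.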

4.3 Scope and visibility: risk control via bounded context

Prompts frequently treat scope and visibility as governance levers, bounding what workspace state the assistant may inspect and minimizing retained state as a form of risk control.
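One common concrete form of this lever is a path-scoping rule: reads are permitted only inside a declared workspace root. A minimal sketch, assuming a hypothetical root path:

```python
from pathlib import Path

WORKSPACE = Path("/workspace/project").resolve()  # hypothetical workspace root

def in_scope(candidate: str) -> bool:
    """Reject access outside the workspace root (bounded visibility)."""
    resolved = Path(candidate).resolve()
    return resolved == WORKSPACE or WORKSPACE in resolved.parents

print(in_scope("/workspace/project/src/main.py"))  # True
print(in_scope("/etc/passwd"))                     # False
```

Resolving the candidate path before comparison is what defeats `../` traversal out of the sanctioned scope.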

4.4 Tool mediation: procedural governance

Tooling is the dominant enforcement surface. Prompts constrain tool invocation via procedural obligations, making tools the enforceable action surface and separating capability from permission: a tool may exist while the outcomes it could produce remain forbidden.
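The capability/permission split can be sketched as a gate wrapped around an otherwise-available tool: the function exists, but policy decides whether its outcomes are allowed. The deny patterns and approval flag below are illustrative assumptions.

```python
def run_shell(cmd: str) -> str:
    """The capability: a tool that exists and works (stubbed here)."""
    return f"ran: {cmd}"

FORBIDDEN = ("rm -rf", "git push --force")  # illustrative deny rules

def gated_run(cmd: str, approved: bool = False) -> str:
    """The permission layer: identical capability, policy-mediated access."""
    if any(bad in cmd for bad in FORBIDDEN) and not approved:
        raise PermissionError(f"outcome forbidden without approval: {cmd}")
    return run_shell(cmd)
```

Keeping the gate outside the tool is what makes the policy auditable independently of the capability it mediates.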

4.5 Correction and termination

Correction loops and termination logic act as backstops, specifying when the assistant must retry, escalate, or stop rather than continue altering workspace state.
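A bounded correction loop of this kind can be sketched as retry-then-stop: attempt a fix, verify, and terminate after a fixed budget rather than drift into further workspace mutation. The attempt bound and callback names are hypothetical.

```python
MAX_ATTEMPTS = 3  # illustrative correction budget

def attempt_fix(apply_patch, verify) -> bool:
    """Bounded correction loop: retry on failure, then terminate.

    apply_patch: callable that mutates the workspace (assumed side effect).
    verify: callable returning True when the workspace is acceptable.
    """
    for _ in range(MAX_ATTEMPTS):
        apply_patch()
        if verify():
            return True
    return False  # termination backstop: hand control back, do not drift
```

Returning control on exhaustion, instead of looping indefinitely, is the prompt-level analogue of a conservative change doctrine.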

5. Prompt Governance Primitives (PGPs)

5.1 Definition and rationale

We define a Prompt Governance Primitive (PGP) as a recurring, prompt-encoded control structure that allocates authority, bounds scope and visibility, mediates tool use, constrains outputs, and/or defines correction and termination behavior.

5.1.1 PGP Taxonomy Diagram

PGPs qualify as “primitives” because they recur across assistants, compose into larger governance regimes, and can be referenced independently of any single prompt.
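The composability claim can be made concrete by treating each PGP as an independent predicate and a governance regime as an ordered composition of them. All names below are hypothetical toy stand-ins for the primitives described in the text.

```python
# Each PGP is a reusable check over a proposed action (a plain dict here).
def scope_pgp(request: dict) -> bool:
    """Bounds visibility: only workspace paths are in scope."""
    return request.get("path", "").startswith("/workspace")

def approval_pgp(request: dict) -> bool:
    """Gates destructive actions behind explicit approval."""
    return not request.get("destructive") or bool(request.get("approved"))

# A regime (a mode's constitution) is a composition of primitives.
REGIME = [scope_pgp, approval_pgp]

def allowed(request: dict) -> bool:
    """An action is permitted only if every composed PGP admits it."""
    return all(pgp(request) for pgp in REGIME)
```

Because each primitive is self-contained, regimes can be built, audited, or compared by inspecting which PGPs they compose and in what combination.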

6. Risk Mitigation and Failure Modes

System prompts encode mitigations intended to address concrete operational risks for tool-mediated agents.

These prompt-encoded controls should be understood as intended mitigations rather than guarantees; their effectiveness depends on runtime enforcement.

7. Implications for Agent Design

The comparative analysis suggests that system prompts already function as constitutions for developer assistants. This has implications for both engineering practice and governance: prompt-encoded controls can be treated as auditable specifications rather than incidental text, and recurring primitives give designers a shared vocabulary for composing and reviewing tiered-autonomy regimes.

8. Limitations

Prompt-level analysis cannot establish runtime enforcement fidelity. Some regimes reference external policy documents or repository-local governance layers whose content is not fully observable, and several controls are specified as rules without exposing detection mechanisms.

9. Conclusion

Across IDE and CLI developer assistants, system prompts operate as constitutional governance documents. Abstracting these structures into Prompt Governance Primitives (PGPs) provides a reusable vocabulary for building, auditing, and comparing tool-mediated agents.

References

Author: R. Max Espinoza | GitHub Repository

Produced with AI assistance (GPT-5.2). See Disclosure for details.

© 2026 R. Max Espinoza. Content licensed CC BY 4.0. Code licensed MIT.

