Cutting AI Coding Pattern Violations from 40% to 8% with Tiered Documentation

Hacker News · 3 months ago

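This article looks at how one project team moved beyond a plain README to a tiered documentation approach, working around the context memory limits of AI agents and significantly reducing the rate of AI coding pattern violations.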

Why AI Agents Need More Than a README

When we started building Cortex TMS with Claude Code, we did what everyone does: wrote a detailed README and expected the AI agent to remember it.

It didn’t work.

Not because the README was bad. Because AI agents don’t work like human developers.

This is what we learned building a CLI tool with heavy AI assistance over 6 months.

Where We Started: The README Approach

Context: Cortex TMS began as a small project (December 2025). Single maintainer, using Claude Code for 80%+ of implementation.

Our setup:

The assumption: If we write good documentation, AI agents will follow it.

What Went Wrong: The Forgetting Problem

After ~2 weeks of development (sprint v2.1), we started noticing patterns:

Observations (what we actually saw):

Example: We documented “use commander for CLI parsing” in README. Three sessions later, Claude Code suggested switching to yargs because it “has better TypeScript support.”

Metrics (from our first 50 commits):

Our interpretation: The AI wasn’t reading (or wasn’t retaining) the README effectively.

Observation vs Interpretation

What we observed:

What we think caused it:

Our hypothesis is that AI agents treat all documentation equally: recent chat context gets the same weight as long-term architectural decisions. When the context limit is reached, older decisions get pruned, even if they’re critical.

Think of it like this: a human developer maintains mental “tiers” of knowledge:

AI agents don’t naturally separate these tiers. Everything competes for the same context window.

Important: This is our interpretation based on observing Claude Code’s behavior. We don’t have access to how Claude processes context internally. This could be wrong.

The Experiment: Tiered Memory Architecture

We tried something different: organize documentation by access frequency, not content type.

The Three-Tier System

Tier 1: HOT (Read Every Session)

Purpose: What the AI needs RIGHT NOW for the current task.

Tier 2: WARM (Reference When Needed)

Purpose: What the AI needs SOMETIMES when implementing specific features.

Tier 3: COLD (Archive)

Purpose: What the AI almost NEVER needs, but humans might reference.
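As a rough illustration of the three tiers, here is a minimal Python sketch that maps a repo-relative path to a tier. It is not the Cortex TMS implementation. NEXT-TASKS.md and docs/core/ are named elsewhere in this article; the docs/archive/ location for the COLD tier and the names `Tier` and `classify` are assumptions made for the example.

```python
from enum import Enum
from pathlib import PurePosixPath
from typing import Optional


class Tier(Enum):
    HOT = "hot"    # read every session
    WARM = "warm"  # referenced when a task needs it
    COLD = "cold"  # archive, mostly for humans


# Hypothetical layout: docs/archive/ is an assumed COLD location.
HOT_FILES = {"NEXT-TASKS.md"}
WARM_DIR = PurePosixPath("docs/core")
COLD_DIR = PurePosixPath("docs/archive")


def classify(path: str) -> Optional[Tier]:
    """Return the tier for a repo-relative path, or None if it is untiered."""
    p = PurePosixPath(path)
    if str(p) in HOT_FILES:
        return Tier.HOT
    if WARM_DIR in p.parents:
        return Tier.WARM
    if COLD_DIR in p.parents:
        return Tier.COLD
    return None
```

Keeping the mapping in one place means a validator or an archiving script can ask which tier a file belongs to without hard-coding paths.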

Key Design Decisions

Strict Size Limits

Why: Force prioritization. If it’s not important enough for HOT tier, it goes to WARM or COLD.

Task-Oriented Structure

Explicit References

What Changed: Our Results

Scope: Internal development of Cortex TMS (6 months, ~380 commits, single maintainer using Claude Code)

Timeline: Before tiered system (Dec 2025), After tiered system (Jan-Jun 2026)

Before Tiered System (v2.1, ~50 commits)

What we measured:

After Tiered System (v2.2-v2.7, ~330 commits)

What we measured:

What improved:

What stayed the same:

What still hurts:

Real Example: Sprint v2.6 Migration

Here’s a concrete example from our development:

Task: Migrate 7 projects from various templates to Cortex TMS standard

Before tiered system (hypothetical, based on v2.1 experience):

What actually happened (with tiered system, v2.6):

NEXT-TASKS.md:

Result:

The difference: the task was in the HOT tier, and the pattern was in the WARM tier with an explicit reference.

When Tiered Memory Doesn’t Help

This system has real costs. Here’s when you shouldn’t use it:

1. Small Projects (< 50 commits)

The problem: Overhead of maintaining three tiers outweighs benefits.

Why it fails: For small projects, a single README works fine. The forgetting problem only appears when project complexity exceeds what fits in one document.

Our recommendation: Start with a README. Migrate to a tiered system when you notice the AI forgetting decisions.

2. Solo Developers Without AI Assistance

The problem: Human developers don’t need this structure.

Why it fails: Humans naturally tier knowledge in their heads. The tiered file structure optimizes for how AI agents process context, not how humans work.

Our recommendation: If you’re not using AI coding assistants heavily (50%+ of commits), stick with conventional documentation.

3. Documentation-Averse Teams

The problem: Maintaining three tiers requires discipline.

Why it fails: If your team doesn’t already document patterns and decisions, adding more structure won’t help. You’ll have three tiers of empty or outdated docs.

Our recommendation: Master conventional documentation first. Tiered memory is an optimization, not a foundation.

4. Projects With Unstable Architecture

The problem: Rapidly changing patterns make WARM tier thrash.

Why it fails: Every architectural change requires updating PATTERNS.md and DOMAIN-LOGIC.md. If you’re pivoting weekly, maintenance overhead becomes unbearable.

Our recommendation: Wait until the architecture stabilizes. A tiered system shines when patterns are established, not during exploration.

5. Teams Without Clear Ownership

The problem: A tiered system requires someone to enforce HOT tier limits.

Why it fails: Without ownership, NEXT-TASKS.md bloats to 400 lines and you’re back to the README problem.

Our recommendation: Assign clear ownership. One person (or rotating role) maintains HOT tier hygiene.

Prerequisites for Success

Only adopt tiered memory if:

The math: Maintaining the tiered system costs us ~30 min/week. If it doesn’t save you more time than that, don’t use it.
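The break-even check can be written down as a one-line calculation. This is only a sketch: the 30-minute maintenance figure comes from this article, while `violations_avoided_per_week` and `minutes_per_fix` are hypothetical inputs you would have to estimate for your own project.

```python
MAINTENANCE_MIN_PER_WEEK = 30  # the ~30 min/week maintenance cost quoted above


def net_minutes_saved(violations_avoided_per_week: float,
                      minutes_per_fix: float,
                      maintenance: float = MAINTENANCE_MIN_PER_WEEK) -> float:
    """Weekly minutes saved after maintenance (negative means a net loss)."""
    return violations_avoided_per_week * minutes_per_fix - maintenance


# Hypothetical inputs: 4 avoided violations at 10 minutes each.
# 4 * 10 - 30 = 10 minutes saved per week.
print(net_minutes_saved(4, 10))
```

If the result is at or below zero for realistic estimates, the overhead is not paying for itself.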

Implementation Notes

If you’re considering trying this approach, here’s what we learned about implementation:

Start Small

Don’t migrate everything at once. We evolved the system over 4 sprints:

Sprint v2.1: Just added NEXT-TASKS.md (HOT tier only)
Sprint v2.2: Added docs/core/PATTERNS.md (first WARM tier doc)
Sprint v2.3: Added archive system (COLD tier)
Sprint v2.4: Added validation to enforce 200-line limit

Enforce Limits Mechanically

Manual enforcement doesn’t work. We built validation into our CLI:

What it checks:

Why it matters: Without mechanical enforcement, limits drift. Validation makes it objective.
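A mechanical check of this kind can be very small. The sketch below is not the validation built into the Cortex TMS CLI, which is not shown in this article. It is a standalone Python illustration of a line-limit check, using the 200-line NEXT-TASKS.md limit mentioned here; the function names are made up for the example.

```python
from pathlib import Path

HOT_LINE_LIMIT = 200  # the NEXT-TASKS.md limit quoted in this article


def check_line_limit(text: str, limit: int = HOT_LINE_LIMIT) -> tuple[bool, int]:
    """Return (within_limit, line_count) for a document's text."""
    line_count = len(text.splitlines())
    return line_count <= limit, line_count


def validate_file(path: str, limit: int = HOT_LINE_LIMIT) -> bool:
    """Check one file on disk and print a one-line report."""
    ok, count = check_line_limit(Path(path).read_text(encoding="utf-8"), limit)
    print(f"{path}: {count}/{limit} lines [{'ok' if ok else 'over limit'}]")
    return ok
```

Calling `validate_file("NEXT-TASKS.md")` from a pre-commit hook or CI step, and failing the run when it returns `False`, is one way to make the limit objective rather than a matter of discipline.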

Create Clear Migration Triggers

We documented specific triggers for moving content between tiers:

HOT → COLD triggers:

Why it matters: Without triggers, the team debates “should this be archived?” endlessly.
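One trigger that is easy to automate is "the task is complete". The following Python sketch splits a task file into lines that stay in the HOT tier and lines that move to the archive. It assumes tasks are written as Markdown checkboxes (`- [x]` for done, `- [ ]` for open). That format is an assumption for illustration, not something this article specifies.

```python
import re

# Assumption: completed tasks are Markdown checkboxes such as "- [x] Ship it".
DONE = re.compile(r"^\s*[-*] \[[xX]\] ")


def split_completed(text: str) -> tuple[str, str]:
    """Split task-file text into (still_hot, to_archive)."""
    keep, archive = [], []
    for line in text.splitlines():
        (archive if DONE.match(line) else keep).append(line)
    return "\n".join(keep), "\n".join(archive)
```

A script would write the first string back to NEXT-TASKS.md and append the second to an archive file, so the decision to archive never needs a discussion.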

Reference, Don’t Duplicate

The HOT tier references the WARM tier; it doesn’t duplicate it:

❌ Bad (duplicating pattern in NEXT-TASKS.md):

✅ Good (referencing pattern):

Why it matters: Duplication causes drift. When a pattern is updated, the duplicates go stale.
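References only help if they point at something that exists. As a hedged sketch, the Python below extracts doc references from HOT tier text and reports any that are missing. It assumes references are written as repo-relative paths under docs/ ending in .md, optionally followed by a `#section` anchor. That convention and the function names are illustrative assumptions.

```python
import re

# Assumption: references look like "docs/core/PATTERNS.md" or
# "docs/core/PATTERNS.md#some-section".
DOC_REF = re.compile(r"\bdocs/[\w./-]+\.md(?:#[\w-]+)?")


def find_doc_refs(text: str) -> list[str]:
    """Return every docs/*.md reference found in the text, in order."""
    return DOC_REF.findall(text)


def missing_refs(text: str, existing: set[str]) -> list[str]:
    """Return referenced doc paths (anchor stripped) that are not in `existing`."""
    missing = []
    for ref in find_doc_refs(text):
        path = ref.split("#", 1)[0]
        if path not in existing and path not in missing:
            missing.append(path)
    return missing
```

Here `existing` would be the set of doc paths actually present in the repository, so a renamed or deleted WARM doc shows up as a broken reference instead of silently going stale.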

Trade-offs We’re Still Learning

After 6 months with tiered memory, here are open questions we don’t have good answers for:

1. Optimal HOT Tier Size

We enforce 200 lines for NEXT-TASKS.md, but is that right?

What we’ve observed:

Our guess: Somewhere between 150 and 200 lines is optimal, but this likely varies by AI model and task complexity.

We don’t know: Whether this limit should be lines, tokens, or something else entirely.
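To compare the candidate units, a file can be measured in several of them at once. This Python sketch reports lines, characters, and a rough token estimate. The default of four characters per token is only a crude rule of thumb for English text, not the output of any real tokenizer, and the function name is made up for the example.

```python
def size_report(text: str, chars_per_token: float = 4.0) -> dict[str, int]:
    """Measure a document in lines, characters, and a rough token estimate.

    chars_per_token is a crude heuristic, not a real tokenizer.
    """
    chars = len(text)
    return {
        "lines": len(text.splitlines()),
        "chars": chars,
        "approx_tokens": round(chars / chars_per_token),
    }
```

Two files with the same line count can differ widely in characters and estimated tokens, which is one reason a pure line limit may not be the right measure.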

2. When to Split WARM Tier Docs

We currently have 4 WARM tier docs (~400-500 lines each). Should we split them further?

Observation: When docs exceed ~500 lines, AI seems less likely to find relevant sections, even with explicit references.

Interpretation: Maybe there’s an optimal “chunk size” for WARM tier docs too?

We don’t know: What that size is, or if it even exists.

3. How Much to Archive

We aggressively archive completed tasks to COLD tier. But should we keep recent history in HOT?

Current approach: Move to archive immediately after sprint ends
Alternative: Keep last 1-2 sprints in HOT for context

We haven’t tested: Whether recent history helps or hurts AI performance.

Try It Yourself

The tiered memory system is built into Cortex TMS templates:

What you get:

Customization:

Our advice: Start with just NEXT-TASKS.md. Add WARM tier docs as you discover patterns worth documenting. Archive aggressively when tasks complete.

What We’re Still Learning

This is our experience with one project, one AI coding assistant (Claude Code), and one maintainer.

We don’t know:

We’re curious:

If you’re experimenting with AI-assisted development, we’d love to hear what you’re learning.

If This Sounds Familiar…

Have you noticed AI coding assistants forgetting architectural decisions? Found yourself re-teaching the same patterns across sessions?

We’d love to hear what you’re experiencing and what solutions you’re trying.

We dogfood everything. The tiered memory system emerged from real pain points during Cortex TMS development. We experienced the problems before building the solution.