
Subhadip Mitra | The MCP Maturity Model: Assessing Your Multi-Agent Context Strategy
This article introduces the MCP Maturity Model, a framework for assessing an organization's multi-agent context management strategy, and argues that assessment must go beyond merely having adopted an MCP server.
It’s been nearly a year since Anthropic introduced the Model Context Protocol (MCP) in November 2024, and the landscape has shifted faster than most of us anticipated. OpenAI adopted it in March 2025. Microsoft announced at Build 2025 that MCP would become “a foundational layer for secure, interoperable agentic computing” in Windows 11. The community has built thousands of MCP servers, with adoption accelerating across the ecosystem.
But here’s what nobody’s talking about: most organizations still have no idea where they actually stand with context management. Teams proudly declare they’re “using MCP” when they’re just wrapping plain JSON in protocol boilerplate. Others build sophisticated context optimization layers while still treating agents like stateless API endpoints.
After exploring MCP’s technical architecture and implementation patterns and analyzing how the ecosystem has evolved over the past year, I’ve identified six distinct maturity levels in how organizations handle context in their agent architectures. This isn’t about whether you’ve installed an MCP server - it’s about whether your context strategy will survive the next wave of agentic complexity.
Let’s figure out where you are and, more importantly, where you need to be.
Why Maturity Levels Matter Now
The agent ecosystem is fragmenting and consolidating simultaneously. LangGraph owns graph-based workflows. CrewAI dominates role-based orchestration. AutoGen leads in conversational multi-agent systems. Google’s ADK (launched April 2025) is pushing bidirectional streaming with no concept of “turns.” Each framework makes different assumptions about context.
Meanwhile, the problems everyone thought were solved keep resurfacing:
You can’t fix what you can’t measure. This maturity model gives you a vocabulary and assessment framework for your context architecture - whether you’re using MCP, a proprietary system, or (let’s be honest) a mess of duct tape and hope.
Before We Begin: Workflows vs Agents
Understanding what you’re actually building shapes how sophisticated your context strategy needs to be:
Workflows (predictable, predetermined paths):
Agents (dynamic, model-driven decision-making):
Anthropic’s guidance: “Many use cases that appear to require agents can be solved with simpler workflow patterns.” If you can map out the steps in advance, you probably want a workflow, not an agent. Keep this distinction in mind as we explore the maturity levels - workflows typically need less sophisticated context management than true agents.
The Six Levels of Context Maturity
I’m structuring this from Level 0 (where most projects start) to Level 5 (the theoretical limit of current approaches). Each level represents a fundamental shift in how you think about and implement context management.
Level 0: Ad-Hoc String Assembly
What it looks like:
You’re building prompts through string concatenation or f-strings. Context is whatever you manually stuffed into the system message. Agent-to-agent communication happens through return values or shared global state. You’re probably using a single LLM call per operation.
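To make this concrete, here's a minimal sketch of what Level 0 typically looks like - all names are hypothetical, but the shape will be familiar:

```python
# Level 0 in miniature: context assembled by hand, agents coupled
# through shared global state. All names here are hypothetical.

SHARED_STATE = {}  # agent-to-agent "communication"

def build_prompt(customer_name: str, ticket_history: str, question: str) -> str:
    # Context is whatever gets stuffed into the f-string. No schema,
    # no validation, no record of what the model actually saw.
    return (
        f"You are a support agent helping {customer_name}.\n"
        f"Recent tickets: {ticket_history}\n"
        f"Question: {question}"
    )

def triage_agent(question: str) -> str:
    # A second agent reads and mutates the same global dict.
    SHARED_STATE["category"] = "billing" if "invoice" in question else "general"
    return build_prompt("Alice", SHARED_STATE.get("tickets", "none"), question)
```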
Characteristics:
Why teams stay here:
It works for demos. Seriously - you can build impressive prototypes at Level 0. The pain only hits when you try to debug why your agent hallucinated customer data or when you need to add a third agent to the conversation.
Anti-patterns that emerge:
These problems compound rapidly as complexity grows. What worked for a demo becomes unmaintainable in production.
Migration blocker:
The realization that “just one more if statement” isn’t going to fix context coordination across three asynchronous agents hitting different data sources.
Level 1: Structured Context Objects
What it looks like:
You’ve graduated to using dictionaries, JSON objects, or dataclasses for context. There’s a schema - even if it’s just implied. You’re probably using Pydantic for validation. Agents pass structured data instead of strings.
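A minimal sketch of a Level 1 context object, assuming Pydantic v2 - field names are illustrative, not taken from any particular system:

```python
from datetime import datetime, timezone
from pydantic import BaseModel, Field

class AgentContext(BaseModel):
    # Explicit schema: every field is typed and validated on creation.
    agent_id: str
    task: str
    customer_id: str | None = None
    retrieved_documents: list[str] = Field(default_factory=list)
    created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))

ctx = AgentContext(agent_id="support-1", task="refund-request")
print(ctx.model_dump_json())  # queryable, loggable, unit-testable
```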
Characteristics:
Capabilities unlocked:
You can now log context in a queryable format. Debugging improves 10x because you can see what data was available. You can start building unit tests around context transformations.
Common pitfalls:
Assessment criteria:
When to level up:
When you’re building multi-agent systems and spending more time writing context transformation code than business logic. When debugging requires tracking context mutations across multiple service boundaries.
Level 2: MCP-Aware Integration
What it looks like:
You’ve adopted MCP (or an equivalent standardized protocol). You’re using the official SDKs. Context flows between agents using protocol-defined messages. You might be running MCP servers for your data sources.
This is where OpenAI, Microsoft, and thousands of other organizations landed in 2025. You’re following the standard, using the primitives (resources, prompts, tools), and getting benefits from ecosystem tooling.
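As a rough illustration, here's what a tiny Level 2 server can look like using the official Python SDK's FastMCP helper - the data source, URI scheme, and return values are made up for the example:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("customer-context")

# A resource: read-only context an agent can discover and fetch.
@mcp.resource("customers://{customer_id}/profile")
def customer_profile(customer_id: str) -> str:
    # In a real server this would query your CRM or database.
    return f'{{"customer_id": "{customer_id}", "tier": "standard"}}'

# A tool: an action the model can invoke with validated arguments.
@mcp.tool()
def lookup_order(order_id: str) -> str:
    """Look up the status of an order by its ID."""
    return f"Order {order_id}: shipped"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```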
Characteristics:
Capabilities unlocked:
This is where things get interesting. You can swap MCP servers without rewriting agent code. You get observability from MCP-aware tooling. Your agents can discover available context sources at runtime. You’re benefiting from community-built servers for common data sources (GitHub, Slack, Google Drive, Postgres, etc.).
Capabilities unlocked in practice:
Early MCP adopters report significant improvements in integration velocity - adding new data sources to agent systems in hours or days instead of weeks. The standardization pays off when you need to scale integrations.
Common mistakes:
Critical insight on tool design:
When Anthropic built their SWE-bench agent (December 2024), they discovered something surprising: they spent more time optimizing tools than the overall prompt. Small details matter enormously - for example, requiring absolute filepaths instead of relative paths prevented an entire class of model errors.
The takeaway: MCP server design is not a “just make it work” afterthought. Well-designed tools with clear interfaces, good error messages, and thoughtful constraints are what separate production-grade systems from prototypes. Budget serious time for this.
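A small sketch of that idea - constrain inputs and return actionable errors. The tool itself is hypothetical:

```python
import os

def read_file(path: str) -> str:
    """Read a file. `path` must be an absolute path."""
    if not os.path.isabs(path):
        # A clear, specific error gives the model something to
        # self-correct against, instead of a silent misresolution.
        raise ValueError(
            f"Expected an absolute path, got {path!r}. "
            "Resolve it against the repository root first."
        )
    with open(path, "r", encoding="utf-8") as f:
        return f.read()
```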
Assessment criteria:
Migration path from Level 1:
Start with MCP clients for context consumption before building servers. Wrap your existing structured context in MCP resource responses. Gradually migrate context sources to dedicated servers. The transition can be incremental.
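For instance, an existing Level 1 context object can be exposed as an MCP resource in a few lines - a sketch assuming the Pydantic-style schema from Level 1, with `load_context` as a hypothetical stand-in for your current lookup logic:

```python
from mcp.server.fastmcp import FastMCP
from pydantic import BaseModel

mcp = FastMCP("legacy-context")

class AgentContext(BaseModel):  # your existing Level 1 schema, unchanged
    agent_id: str
    task: str

def load_context(agent_id: str) -> AgentContext:
    # Stand-in for however you fetch structured context today.
    return AgentContext(agent_id=agent_id, task="refund-request")

@mcp.resource("context://agents/{agent_id}")
def agent_context(agent_id: str) -> str:
    # Wrap the structured object in an MCP resource response.
    return load_context(agent_id).model_dump_json()
```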
Level 3: Optimized Context Delivery
What it looks like:
You’re not just passing context - you’re actively optimizing what context gets passed and how. You’ve implemented semantic tagging, context compression, intelligent caching, and performance monitoring. You understand that not all context is created equal.
This is where production teams start actually measuring context costs and making data-driven optimization decisions.
The fundamental insight: Context Rot
Anthropic’s research (September 2025) on context engineering revealed something counterintuitive: model accuracy decreases as context window size increases. More context doesn’t mean better results - it means degraded performance.
In a transformer, every token attends to every other token, creating n² pairwise relationships - so as context grows, a finite attention budget gets spread ever thinner. Like human working memory, LLMs have limited capacity to effectively process information. The goal isn’t maximizing context - it’s finding “the smallest possible set of high-signal tokens that maximize the likelihood of the desired outcome.”
This principle drives everything at Level 3: aggressive filtering, compression, and prioritization aren’t optional optimizations - they’re fundamental to agent performance.
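One way to operationalize that principle is a hard token budget with relevance-ranked packing - a toy sketch, where the scores come from whatever relevance model you already trust:

```python
def pack_context(snippets: list[tuple[float, str]], budget_tokens: int) -> list[str]:
    """Greedily keep the highest-signal snippets under a fixed token budget."""
    chosen, used = [], 0
    for score, text in sorted(snippets, key=lambda s: -s[0]):
        cost = max(1, len(text) // 4)  # crude ~4-chars-per-token estimate
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return chosen
```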
Characteristics:
Capabilities unlocked:
You can now answer questions like “which context source contributes most to our LLM costs?” and “what’s the cache hit rate on customer profile lookups?” You’re making intelligent tradeoffs between context freshness and latency.
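To make the cache-hit-rate question answerable at all, the cache has to be instrumented from day one. A minimal TTL-cache sketch (names are illustrative):

```python
import time
from typing import Any, Callable

class ContextCache:
    """TTL cache that tracks its own hit rate."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, Any]] = {}
        self.hits = self.misses = 0

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = fetch()  # the expensive call to the context source
        self._store[key] = (now, value)
        return value

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```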
Techniques teams use at this level:
Advanced pattern: Code execution with MCP
For agents working with hundreds or thousands of tools, Anthropic’s engineering team (November 2025) demonstrated an advanced optimization: present MCP servers as code APIs instead of direct tool calls.
Traditional approach problem:
Code execution approach:
Impact: 150,000 tokens → 2,000 tokens (98.7% reduction)
Bonus benefits:
This pattern becomes essential when scaling to many tools (typically 50+ tools or when working with data-heavy operations). You’re essentially giving agents a programming environment rather than a function-calling interface. Note that this optimization technique remains valuable at Level 4 and beyond - it’s introduced at Level 3 because that’s when token costs become a critical concern that drives architectural decisions.
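A heavily simplified sketch of the pattern: instead of streaming every tool schema into the prompt, the agent imports thin generated wrappers and filters data locally, so only small results re-enter the context. Everything here (the wrapper, the bridge function, the field names) is hypothetical:

```python
# Hypothetical generated wrapper for one MCP tool, e.g.
# servers/crm/get_customer.py - created once from the tool's schema.
def get_customer(customer_id: str) -> dict:
    return _call_mcp_tool("crm", "get_customer", {"customer_id": customer_id})

def _call_mcp_tool(server: str, tool: str, args: dict) -> dict:
    # Stand-in for the sandbox's bridge into the MCP client.
    return {"customer_id": args["customer_id"],
            "orders": [{"id": i, "status": "open" if i % 3 else "closed"}
                       for i in range(1000)]}

# Code the agent writes and runs in the sandbox: the 1,000-row
# result never enters the model's context - only the summary does.
orders = get_customer("c-42")["orders"]
open_orders = [o for o in orders if o["status"] == "open"]
print(f"{len(open_orders)} open orders")  # a few tokens, not thousands
```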
Real challenges at this level:
Balancing context freshness vs. cost is tricky. Teams often cache aggressively to save on LLM costs only to have agents work with stale data. Or the opposite - fetching everything fresh and blowing their inference budget.
The optimization game changes based on your agent architecture. Streaming agents (like Google ADK’s turnless approach) need different strategies than request-response agents.
Assessment criteria:
When you know you’re ready for Level 4:
When optimization becomes reactive fire-fighting instead of systematic improvement. When your caching strategy can’t keep up with dynamic agent behavior. When you’re manually tuning context delivery for each new agent type.
Level 4: Adaptive Context Systems
What it looks like:
Your context system learns and adapts based on agent behavior. You’re using vector databases for semantic similarity. Context delivery adjusts dynamically based on agent performance. The system predicts what context an agent will need before it asks.
This is where AgentMaster (introduced July 2025) and similar frameworks are heading - using vector databases and context caches not just for storage but for intelligent retrieval.
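The retrieval core of such a system can be sketched in a few lines - here with brute-force cosine similarity standing in for a real vector database, and precomputed embeddings standing in for a real embedding model:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(task_embedding: np.ndarray,
             index: list[tuple[np.ndarray, str]],
             k: int = 3) -> list[str]:
    """Return the k context entries most similar to the agent's task."""
    ranked = sorted(index, key=lambda entry: -cosine(task_embedding, entry[0]))
    return [text for _, text in ranked[:k]]

# An adaptive layer would sit on top of this: logging which retrieved
# entries correlate with task success, then re-weighting or pre-fetching.
```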
Characteristics:
Capabilities unlocked:
Agents get better context over time without manual intervention. New agent types automatically benefit from learned context patterns. You can answer “which context combinations lead to highest task completion rates?”
Architectural patterns:
Real-world tradeoffs:
The infrastructure complexity jumps significantly. You need vector databases, analytics pipelines, and feedback loops. Based on the systems I’ve observed, teams typically invest 3-6 months building Level 4 capabilities from scratch.
The payoff comes at scale. If you’re handling thousands of agent sessions daily, adaptive systems justify their complexity. For lower-volume use cases, you’re better off perfecting Level 3.
Assessment criteria:
Common failure mode:
Over-optimization for historical patterns. Your adaptive system learns that “customer support agents always need recent tickets” and pre-fetches them, then breaks when you introduce a billing agent with different needs. Guard rails matter.
Level 5: Symbiotic Context Evolution
What it looks like (theoretically):
Context schemas evolve based on agent needs. The boundary between “agent” and “context system” blurs. Context sources coordinate with each other. The system exhibits emergent optimization behaviors that weren’t explicitly programmed.
I’m calling this theoretical because production systems haven’t fully achieved Level 5 yet, though elements appear in research systems and at the edges of advanced deployments.
Characteristics (aspirational):
What this might look like:
An agent working on customer onboarding discovers it needs an “account risk score” context type that doesn’t exist. Instead of failing, the system reasons about the gap and safely composes the new context type from existing sources.
This requires agents that can reason about their own context needs, a context system that can safely compose new context types, and coordination mechanisms that prevent chaos.
Why we’re not there yet:
Safety: Self-evolving schemas are terrifying in production. One bad evolution and your agent system is down.
Coherence: Maintaining semantic consistency across evolved schemas is an unsolved problem.
Debuggability: When context delivery is emergent behavior, root cause analysis becomes extremely difficult.
Cost: The meta-learning required to achieve this is expensive in LLM calls.
Current research directions:
Assessment:
If you can honestly answer yes to these, you’re at Level 5:
Most organizations shouldn’t aim for Level 5 yet. The juice isn’t worth the squeeze unless you’re operating at massive scale with research resources.
Where Should You Be?
Here’s my honest take based on what works in practice:
First principle: Start simple.
Anthropic’s engineering team (December 2024) emphasizes that “the most successful implementations use simple, composable patterns rather than complex frameworks.” Many teams over-engineer solutions when optimizing a single LLM call would suffice. Don’t jump to Level 4 adaptive systems when Level 2 MCP integration solves your actual problem.
The right level depends on your scale and complexity. Remember the workflows vs agents distinction from earlier - workflows typically need Levels 0-2, while true agents benefit from Levels 3-4:
Practical Assessment Framework
Here’s how to figure out where you actually are (be honest):
Ad-Hoc String Assembly
Answer these yes/no:
Result: If you answered yes to 3+, you’re at Level 0. That’s okay - it’s where everyone starts.
Next step: Define structured context schemas (move to Level 1)
Structured Context Objects
Result: 3+ yes → You’re at Level 1
Next step: Adopt MCP or standard protocol (move to Level 2)
MCP-Aware Integration
Result: 3+ yes → Level 2
Next step: Implement caching and optimization (move to Level 3)
Optimized Delivery
Result: 3+ yes → Level 3
Next step: Add adaptive systems with vector DBs (move to Level 4)
Adaptive Systems
Result: 3+ yes → Level 4
Next step: Research Level 5 approaches (experimental)
Symbiotic Evolution
Result: 4+ yes → Level 5 (Congratulations! You’re at the cutting edge)
Note: Most organizations shouldn’t aim for Level 5 yet. Focus on perfecting Level 4.
Migration Paths
The good news: you can level up incrementally. Here’s how.
Structured Context
Time investment: 1-2 weeks for typical multi-agent system
Steps:
What to watch out for:
Success criteria: Can serialize/deserialize context reliably, context is queryable
MCP Adoption
Time investment: 2-4 weeks
Steps:
What to watch out for:
Resource: The official MCP SDKs (Python, TypeScript, Go) are production-ready. Start with the Python SDK if you’re prototyping.
Success criteria: Agents discover context sources at runtime, ecosystem tooling works
Optimization
Time investment: 4-8 weeks
Steps:
What to watch out for:
Success criteria: 20-40% reduction in LLM costs, measurable cache hit rates
Adaptive Systems
Time investment: 3-6 months
Steps:
What to watch out for:
Success criteria: Context delivery improves based on data, predictive pre-fetching reduces latency
Symbiotic Evolution (Experimental)
Time investment: Research-level effort (6+ months)
Recommendation: Most organizations should not attempt this migration yet. Instead:
If you must proceed:
What to watch out for:
Success criteria: Context schemas evolve safely, measurable improvement in agent performance
The Hard Questions
Let me address what people actually want to know:
“Should I use MCP or build something custom?”
Use MCP unless you have a very specific reason not to. The ecosystem effects are real - community servers, tooling support, talent familiarity. Teams waste months building custom context protocols that are strictly worse than MCP.
Exception: If you’re deeply embedded in a vendor ecosystem (AWS Bedrock with their agent framework, Google Vertex with their approach), use what’s native to your platform. Fighting the platform is expensive.
“What about LangGraph/CrewAI/AutoGen’s context handling?”
These frameworks have their own context patterns. LangGraph uses graph state, CrewAI has crew context, AutoGen has conversational memory. They’re not incompatible with MCP - you can use MCP servers as data sources within these frameworks.
Think of it this way: MCP handles context retrieval and delivery. LangGraph/CrewAI/AutoGen handle context usage and orchestration. They’re different layers.
“What about A2A (Agent2Agent protocol)? Is that competing with MCP?”
No, they’re complementary. Google announced A2A in April 2025 (donated to Linux Foundation in June) to handle agent-to-agent communication, while MCP handles agent-to-data/tool communication.
Think of it as:
AgentMaster (July 2025) was the first framework to use both protocols together - A2A for agent coordination and MCP for unified tool/context management. This is likely the future pattern: A2A for inter-agent messaging, MCP for resource access.
From a maturity perspective, A2A becomes relevant at Level 3+ when you have multiple specialized agents that need to coordinate. Before that, you’re likely working with simpler orchestration patterns.
“Is vector database mandatory for production?”
No. Plenty of Level 3 systems run without vector databases and do fine at moderate scale. Vector databases become valuable when:
For transaction processing or structured data lookups, traditional databases work great.
“What’s the actual cost difference between levels?”
Hard to generalize, but based on patterns I’ve observed across teams at different maturity levels:
Your mileage will vary dramatically based on architecture.
What’s Next for Context Management?
Based on what I’m seeing in research and early production systems:
Formal verification of context transformations: We need mathematical guarantees that context hasn’t been corrupted or misused as it flows through agent systems. Category theory approaches are promising but not production-ready.
Context provenance tracking: Being able to trace where every piece of context came from and how it was transformed. Critical for debugging and compliance. MCP doesn’t have strong primitives for this yet.
Cross-modal context unification: Bridging text, structured data, images, and code into coherent context remains messy. Most systems treat these as separate context types.
Energy-aware context delivery: As agent systems scale, context retrieval and transmission energy costs become significant. We’ll need optimization strategies that balance quality vs. environmental impact.
Context security and isolation: Multi-tenant agent systems need strong isolation guarantees. Current approaches are ad-hoc. Expect to see formal security models emerge.
Final Thoughts
A year ago, most teams were at Level 0 wondering if they should even care about context management. Today, with OpenAI and Microsoft committed to MCP, thousands of production servers, and frameworks like AgentMaster pushing adaptive approaches, the question isn’t “if” but “how sophisticated does my context strategy need to be?”
The maturity model I’ve outlined isn’t prescriptive - it’s descriptive of emerging patterns in the ecosystem. Your path might look different. What matters is being intentional about your context architecture instead of letting it emerge accidentally.
Where are you today? Where do you need to be in six months? The gap between those answers is your roadmap.
If you’re building multi-agent systems and want to dig deeper into implementation details, I wrote about implementing MCP in production systems earlier this year. For broader architectural context, my series on SARP (Symbiotic Agent-Ready Platforms) explores how data platforms need to evolve for the agentic era.
For practical guidance from Anthropic’s engineering team, I highly recommend:
The context revolution is here. The question is whether you’re ready for it.
What level is your organization at? What challenges are you facing in your context architecture? I’m curious to hear from practitioners working on these problems. Find me on LinkedIn or drop a comment below.