Show HN: Chaos engineering for AI agents


A new open-source project called agent-chaos applies chaos-engineering principles to testing the resilience of AI agents. By simulating unreliable LLM APIs and tool responses, it aims to surface an agent's weaknesses before they cause failures in production.



deepankarm/agent-chaos


agent-chaos


Chaos engineering for AI agents.

"Introduce a little anarchy. Upset the established order, and everything becomes chaos. I'm an agent of chaos. Oh, and you know the thing about chaos? It's fair!"

Your agent works in demos. It passes evals. Then it hits production: the LLM sends a 500, the tool returns garbage, the stream cuts mid-response. The agent fails silently, returns wrong answers, or loops forever.

agent-chaos breaks your agent on purpose, before production does. For teams building agents for production, not demos.

Why does this exist?

LLM APIs are unreliable. They claim certain rate limits, then behave differently. They accept a stream request, then start sending tokens 10 seconds later. They reject mid-stream. They hang for 20 seconds before returning a 500. We've seen providers return "Sorry about that" as an error message.

Production agent backends run multiple LLMs with retry and fallback because things break randomly. What worked last week might not work today. You often don't realize it until production.
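The retry-and-fallback pattern described above can be sketched in a few lines. Everything below is an illustrative stand-in, not agent-chaos code: `call_llm` simulates a flaky provider, and `call_with_fallback` retries it before falling back to the next one.

```python
import random

class ProviderError(Exception):
    """Raised when a simulated LLM provider fails."""

def call_llm(provider: str, prompt: str, fail_rate: float) -> str:
    # Stand-in for a real LLM client; fails randomly to mimic flaky APIs.
    if random.random() < fail_rate:
        raise ProviderError(f"{provider}: 500 Internal Server Error")
    return f"{provider} answered: {prompt}"

def call_with_fallback(
    prompt: str,
    providers: list[str],
    retries: int = 2,
    fail_rate: float = 0.5,
) -> str:
    # Try each provider in order, retrying before falling back to the next.
    last_error: Exception | None = None
    for provider in providers:
        for _ in range(retries):
            try:
                return call_llm(provider, prompt, fail_rate)
            except ProviderError as exc:
                last_error = exc
    raise RuntimeError("all providers failed") from last_error

print(call_with_fallback("Where is my order?", ["primary", "backup"]))
```

agent-chaos exists to test exactly this kind of handling: inject the 500s and see whether the fallback path actually works.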

But the chaos isn't just at the transport layer. There's a semantic layer that's harder to catch.

Tools fail in obvious ways (timeouts, errors), but also in subtle ways: empty responses, partial data, wrong data types, malformed JSON, stale information, or data for the wrong entity entirely. A tool might return a 200 OK with an error message buried in the response body. An LLM-backed tool might hallucinate. With MCP, your agent calls tools you don't control, with schemas that can change without notice.
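To make the semantic failure modes above concrete, here is a minimal sketch of corrupting a tool result before the agent sees it. This is an illustration of the idea, not agent-chaos's actual `tool_mutate` implementation.

```python
import json
import random
from typing import Any, Callable

def mutate_result(result: dict) -> Any:
    """Apply one random semantic corruption to a tool result."""
    mutation = random.choice(["empty", "partial", "wrong_type", "malformed_json"])
    if mutation == "empty":
        return {}
    if mutation == "partial":
        # Drop half the keys to simulate incomplete data.
        keys = list(result)[: len(result) // 2]
        return {k: result[k] for k in keys}
    if mutation == "wrong_type":
        # Stringify every value so numeric fields come back as text.
        return {k: str(v) for k, v in result.items()}
    # Malformed JSON: truncate the serialized payload.
    return json.dumps(result)[:-5]

def with_chaos(tool: Callable[..., dict], rate: float = 1.0) -> Callable[..., Any]:
    # Wrap a tool so a fraction of calls return a corrupted result.
    def wrapper(*args, **kwargs):
        result = tool(*args, **kwargs)
        return mutate_result(result) if random.random() < rate else result
    return wrapper

def get_order(order_id: str) -> dict:
    # Hypothetical tool for an e-commerce support agent.
    return {"order_id": order_id, "status": "shipped", "total": 42.5}

random.seed(3)
chaotic_get_order = with_chaos(get_order)
print(chaotic_get_order("A-1001"))
```

Note that none of these mutations raise an exception: the agent receives a perfectly valid-looking response, which is exactly why this layer is harder to catch.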

Traditional chaos engineering tools (Chaos Monkey, Gremlin) operate at the infrastructure layer: network partitions, pod failures. They can't corrupt a tool result or cut an LLM stream mid-response.

agent-chaos injects these failures so you can test how your agent handles them before users find out. It integrates with evaluation frameworks like DeepEval, so you can inject chaos and judge the quality of your agent's response.

Core concepts

Scenarios: baseline + variants

A baseline scenario defines a conversation with your agent. A variant adds chaos:
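As a rough illustration of the baseline/variant split (the names and shapes below are assumptions for the sketch, not agent-chaos's actual API):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Scenario:
    """A named conversation to run against the agent, plus optional chaos."""
    name: str
    turns: tuple[str, ...]
    chaos: tuple[str, ...] = ()

# Baseline: the conversation with no failures injected.
baseline = Scenario(
    name="refund-request",
    turns=("Where is my order?", "I want a refund."),
)

# Variant: the same conversation, but with the LLM rate-limited.
rate_limited = replace(
    baseline,
    name="refund-request+llm-rate-limit",
    chaos=("llm_rate_limit",),
)
```

The key property is that a variant reuses the baseline conversation verbatim and only layers failures on top, so any behavioral difference is attributable to the injected chaos.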

Chaos and assertions

agent-chaos provides chaos injectors for LLM failures (llm_rate_limit, llm_server_error, llm_timeout), tool failures (tool_error, tool_timeout), data corruption (tool_mutate), and more. These are composable and support targeting specific tools, turns, or call counts.

Built-in assertions include MaxTotalLLMCalls, AllTurnsComplete, TokenBurstDetection, among others. For semantic evaluation, agent-chaos optionally integrates with DeepEval, letting you use any DeepEval metric (like GEval) as an assertion.

Both chaos and assertions can be applied per-scenario or per-turn using the at() helper:
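One way to picture per-turn targeting is a helper that pins a chaos injector or assertion to a specific turn. The `at()` signature and `Scoped` type here are assumed for illustration, not taken from agent-chaos:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scoped:
    """A chaos injector or assertion name scoped to a single turn."""
    name: str
    turn: int

def at(turn: int):
    # Return a function that pins a chaos/assertion name to one turn.
    def scope(name: str) -> Scoped:
        return Scoped(name=name, turn=turn)
    return scope

# Inject a tool error only on turn 2; check completion only on turn 3.
variant = [at(2)("tool_error"), at(3)("AllTurnsComplete")]
```

Scoping like this lets one scenario mix turn-level chaos with scenario-wide assertions instead of duplicating the whole conversation per failure.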

Fuzzing

It's difficult to define every failure mode upfront. fuzz_chaos generates random chaos combinations based on a ChaosSpace configuration, so you can explore how your agent behaves under varied conditions.
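The sampling idea behind fuzzing can be sketched as drawing random combinations from a space of failure modes. Here `ChaosSpace` is modeled as a plain dict of categories; the real configuration may look different:

```python
import random

def fuzz_chaos(space: dict[str, list[str]], runs: int, seed: int = 0) -> list[list[str]]:
    """Sample random chaos combinations from a space of failure modes."""
    rng = random.Random(seed)  # seeded so a failing run is reproducible
    combos = []
    for _ in range(runs):
        # Each run injects between one and three failure modes.
        picks = []
        for _ in range(rng.randint(1, 3)):
            category = rng.choice(sorted(space))
            picks.append(rng.choice(space[category]))
        combos.append(picks)
    return combos

chaos_space = {
    "llm": ["llm_rate_limit", "llm_server_error", "llm_timeout"],
    "tool": ["tool_error", "tool_timeout", "tool_mutate"],
}

for combo in fuzz_chaos(chaos_space, runs=3, seed=42):
    print(combo)
```

Seeding the sampler matters: when a random combination breaks the agent, you want to replay exactly that combination while debugging.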

Fuzzing is for exploration, not CI. See examples/ecommerce-support-agent/scenarios/fuzzing.py for more.

Examples

The examples/ecommerce-support-agent/ directory contains a complete example with an e-commerce support agent built with pydantic-ai, including:

Scenario overview showing baselines, chaos variants, and assertion results:


LLM rate limit injected on turn 1. Agent failed to respond, caught by turn-coherence assertion:


Tool error injected. Agent gracefully handles the failure and offers alternatives:


Status

Under active development.

Supported:

Planned:

Hacker News

Related articles

  1. Show HN: Flakestorm – Chaos engineering for AI agents (local-first, open source)

    4 months ago

  2. Chaos agents

    26 days ago

  3. Show HN: FailWatch – A fail-closed circuit breaker for AI agents

    4 months ago

  4. AI agents are essentially CI pipelines with an LLM built in

    4 months ago

  5. Securing the agentic AI foundation: a no-nonsense guide – Part 1

    4 months ago