Show HN: WatchLLM – Debug AI agents step by step with cost attribution
WatchLLM is a new tool that helps developers debug AI agents by providing a step-by-step timeline of decisions, tool calls, and responses, with cost attribution for each step. It also offers anomaly detection and semantic caching to cut LLM spend.
Debugging agents is painful - When your agent makes 20 tool calls and fails, good luck figuring out which decision was wrong. WatchLLM gives you a step-by-step timeline showing every decision, tool call, and model response with explanations for why the agent did what it did.
Agent costs spiral fast - Agents love getting stuck in loops or calling expensive tools repeatedly. WatchLLM tracks cost per step and flags anomalies like "loop detected - same action repeated 3x, wasted $0.012" or "high cost step - $0.08 exceeds threshold".
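As a rough illustration of the loop-detection idea (a toy sketch, not WatchLLM's actual implementation — the step schema and thresholds here are assumptions):

```python
# Toy loop detector over an agent trace: flag when the same
# (tool, args) pair repeats consecutively, and sum the wasted cost.
from itertools import groupby

def detect_loops(steps, min_repeats=3):
    """steps: list of dicts with 'tool', 'args', and 'cost' keys."""
    anomalies = []
    for _, group in groupby(steps, key=lambda s: (s["tool"], s["args"])):
        run = list(group)
        if len(run) >= min_repeats:
            # Everything after the first identical call is wasted spend.
            wasted = sum(s["cost"] for s in run[1:])
            anomalies.append(
                f"loop detected - same action repeated {len(run)}x, "
                f"wasted ${wasted:.3f}"
            )
    return anomalies

trace = [
    {"tool": "search", "args": "pricing page", "cost": 0.004},
    {"tool": "search", "args": "pricing page", "cost": 0.004},
    {"tool": "search", "args": "pricing page", "cost": 0.004},
]
print(detect_loops(trace))  # → ['loop detected - same action repeated 3x, wasted $0.008']
```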
The core features:
Timeline view of every agent decision with cost breakdown
Anomaly detection (loops, repeated tools, high-cost steps)
Semantic caching that cuts 40-70% off your LLM bill as a bonus
Works with OpenAI, Anthropic, Groq - just change your baseURL
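The baseURL swap looks roughly like this (assuming the OpenAI Python SDK; the proxy URL below is a placeholder, not WatchLLM's documented endpoint):

```python
# Minimal drop-in setup sketch: point the OpenAI SDK at a
# WatchLLM-style proxy instead of the provider directly.
from openai import OpenAI

client = OpenAI(
    base_url="https://proxy.watchllm.example/v1",  # placeholder proxy URL
    api_key="sk-...",                              # your usual provider key
)

# Requests now flow through the proxy, which can record each call for
# the timeline and cost attribution before forwarding to the provider:
# resp = client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "user", "content": "hello"}],
# )
```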
It's built on ClickHouse for real-time telemetry and uses vector similarity for the caching layer. The agent debugger explains decisions using LLM-generated summaries of why each step happened.
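The caching mechanics presumably work something like this toy sketch (assumed behavior, not WatchLLM's actual implementation): embed each prompt, and reuse a cached response when a new prompt's embedding is close enough to a previous one.

```python
# Toy semantic cache: cosine similarity over prompt embeddings,
# with a tunable hit threshold.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SemanticCache:
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, embedding):
        for cached_emb, response in self.entries:
            if cosine(embedding, cached_emb) >= self.threshold:
                return response  # cache hit: skip the LLM call entirely
        return None

    def put(self, embedding, response):
        self.entries.append((embedding, response))

cache = SemanticCache()
cache.put([1.0, 0.0, 0.2], "Paris is the capital of France.")
# A near-identical prompt embedding hits the cache:
print(cache.get([0.98, 0.05, 0.21]))  # → Paris is the capital of France.
```

In production you would swap the linear scan for an approximate nearest-neighbor index; the threshold trades hit rate against the risk of serving a stale or mismatched answer.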
Right now it's free for up to 50K requests/month. I'm looking for early users who are building agents and want better observability into what's actually happening (and what it's costing).
Try it: https://watchllm.dev
Would love feedback on what other debugging features would be useful. What do you wish you had when your agents misbehave?
