Show HN: WatchLLM – Debug AI agents step by step with cost attribution
WatchLLM is a new tool that helps developers debug AI agents by providing a step-by-step timeline of decisions, tool calls, and responses, with cost attribution for each step. It also offers anomaly detection and semantic caching to cut LLM spend.
Debugging agents is painful - When your agent makes 20 tool calls and fails, good luck figuring out which decision was wrong. WatchLLM gives you a step-by-step timeline showing every decision, tool call, and model response with explanations for why the agent did what it did.
Agent costs spiral fast - Agents love getting stuck in loops or calling expensive tools repeatedly. WatchLLM tracks cost per step and flags anomalies like "loop detected - same action repeated 3x, wasted $0.012" or "high cost step - $0.08 exceeds threshold".
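As a rough illustration of the loop-detection idea (a toy sketch, not WatchLLM's actual implementation — the step schema and thresholds here are assumptions):

```python
# Toy loop detector over an agent trace: flag when the same
# (tool, args) pair repeats consecutively, and sum the wasted cost.
from itertools import groupby

def detect_loops(steps, min_repeats=3):
    """steps: list of dicts with 'tool', 'args', and 'cost' keys."""
    anomalies = []
    for _, group in groupby(steps, key=lambda s: (s["tool"], s["args"])):
        run = list(group)
        if len(run) >= min_repeats:
            # Everything after the first identical call is wasted spend.
            wasted = sum(s["cost"] for s in run[1:])
            anomalies.append(
                f"loop detected - same action repeated {len(run)}x, "
                f"wasted ${wasted:.3f}"
            )
    return anomalies

trace = [
    {"tool": "search", "args": "pricing page", "cost": 0.004},
    {"tool": "search", "args": "pricing page", "cost": 0.004},
    {"tool": "search", "args": "pricing page", "cost": 0.004},
]
print(detect_loops(trace))  # → ['loop detected - same action repeated 3x, wasted $0.008']
```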
The core features:
Timeline view of every agent decision with cost breakdown
Anomaly detection (loops, repeated tools, high-cost steps)
Semantic caching that cuts 40-70% off your LLM bill as a bonus
Works with OpenAI, Anthropic, Groq - just change your baseURL
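The baseURL swap looks roughly like this (assuming the OpenAI Python SDK; the proxy URL below is a placeholder, not WatchLLM's documented endpoint):

```python
# Minimal drop-in setup sketch: point the OpenAI SDK at a
# WatchLLM-style proxy instead of the provider directly.
from openai import OpenAI

client = OpenAI(
    base_url="https://proxy.watchllm.example/v1",  # placeholder proxy URL
    api_key="sk-...",                              # your usual provider key
)

# Requests now flow through the proxy, which can record each call for
# the timeline and cost attribution before forwarding to the provider:
# resp = client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "user", "content": "hello"}],
# )
```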
It's built on ClickHouse for real-time telemetry and uses vector similarity for the caching layer. The agent debugger explains decisions using LLM-generated summaries of why each step happened.
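The caching mechanics presumably work something like this toy sketch (assumed behavior, not WatchLLM's actual implementation): embed each prompt, and reuse a cached response when a new prompt's embedding is close enough to a previous one.

```python
# Toy semantic cache: cosine similarity over prompt embeddings,
# with a tunable hit threshold.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SemanticCache:
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, embedding):
        for cached_emb, response in self.entries:
            if cosine(embedding, cached_emb) >= self.threshold:
                return response  # cache hit: skip the LLM call entirely
        return None

    def put(self, embedding, response):
        self.entries.append((embedding, response))

cache = SemanticCache()
cache.put([1.0, 0.0, 0.2], "Paris is the capital of France.")
# A near-identical prompt embedding hits the cache:
print(cache.get([0.98, 0.05, 0.21]))  # → Paris is the capital of France.
```

In production you would swap the linear scan for an approximate nearest-neighbor index; the threshold trades hit rate against the risk of serving a stale or mismatched answer.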
Right now it's free for up to 50K requests/month. I'm looking for early users who are building agents and want better observability into what's actually happening (and what it's costing).
Try it: https://watchllm.dev
Would love feedback on what other debugging features would be useful. What do you wish you had when your agents misbehave?
