A web app that records every AI agent run as a replayable trace so engineers can debug failures without re-running the agent

AI agents in production are black boxes: when a run fails or behaves unexpectedly, engineers have no structured trace to inspect, no way to replay the failing execution, and no mechanism to write a regression test against it. Existing OpenTelemetry-based tools capture spans but lack the per-run replay and branch-comparison workflows that make debugging fast. This tool records every agent run (tool calls, LLM turns, branching decisions, latency) as a structured, replayable object that engineers can step through, diff against passing runs, and convert directly into an eval test.
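As a rough illustration of what a "structured, replayable object" could look like, here is a minimal sketch. The names `Step`, `RunTrace`, and `replay`, and every field on them, are assumptions made for this example and not a schema taken from the product description. The sketch records steps, round-trips the run through JSON, and iterates over the stored steps without calling the agent again.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Iterator


@dataclass
class Step:
    """One recorded event in a run (hypothetical schema)."""
    kind: str          # e.g. "llm_turn", "tool_call", or "branch"
    name: str          # tool name, model name, or branch label
    input: Any
    output: Any
    latency_ms: float


@dataclass
class RunTrace:
    """A whole agent run, stored as an ordered list of steps."""
    run_id: str
    status: str = "running"
    steps: list[Step] = field(default_factory=list)

    def record(self, kind: str, name: str, input: Any, output: Any,
               latency_ms: float) -> None:
        self.steps.append(Step(kind, name, input, output, latency_ms))

    def to_json(self) -> str:
        # asdict() recurses into the nested Step dataclasses.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "RunTrace":
        data = json.loads(raw)
        steps = [Step(**s) for s in data.pop("steps")]
        return cls(steps=steps, **data)


def replay(trace: RunTrace) -> Iterator[tuple[int, Step]]:
    """Step through the recording in order; the agent is never re-run."""
    for index, step in enumerate(trace.steps):
        yield index, step


trace = RunTrace(run_id="run-001")
trace.record("llm_turn", "planner", {"prompt": "find order 42"},
             {"action": "lookup_order"}, 812.0)
trace.record("tool_call", "lookup_order", {"order_id": 42},
             {"error": "not found"}, 35.5)
trace.status = "failed"

restored = RunTrace.from_json(trace.to_json())
```

Because the restored object carries the recorded inputs and outputs, a debugger UI could walk `replay(restored)` one step at a time instead of invoking the model or tools again.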

Demand Breakdown

409

Social Proof (3 sources)

Launch HN: Traceloop (YC W23) - Detecting LLM Hallucinations with OpenTelemetry

@traceloop · 2024-07-15

173 HN

Launch HN: Lucidic (YC W25) - Debug, test, and evaluate AI agents in production

@lucidic · 2025-08-05

155 HN

Launch HN: Voker (YC S24) - Analytics for AI Agents

@voker · 2025-10-01

Gap Assessment

Competitive: Multiple tools exist, but differentiation opportunities remain

Four tools exist (Langfuse, Arize Phoenix, Lucidic, LangSmith), but gaps remain. Langfuse: no step-through replay of a specific failing run; no branch-diff between a passing and failing execution; no one-click conversion of a trace into an eval test case. Arize Phoenix: replay is analytics-oriented, not a step-through debugger; no native failing-run-to-regression-test workflow; enterprise tier required for production-scale diff views.
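To make the "branch-diff" gap concrete, the sketch below compares a passing and a failing run and reports the first step where they diverge. It assumes each run is a list of step dictionaries with `kind`, `name`, and `output` keys; that shape and the function name `first_divergence` are hypothetical, chosen only for this example.

```python
from itertools import zip_longest


def first_divergence(passing, failing, keys=("kind", "name", "output")):
    """Return (index, passing_step, failing_step) at the first mismatch.

    Returns None when both runs agree on every compared key. A missing
    step (one run is shorter than the other) counts as a divergence.
    """
    for index, (a, b) in enumerate(zip_longest(passing, failing)):
        if a is None or b is None:
            return index, a, b
        if any(a.get(k) != b.get(k) for k in keys):
            return index, a, b
    return None


passing_run = [
    {"kind": "llm_turn", "name": "planner", "output": "call search"},
    {"kind": "tool_call", "name": "search", "output": {"hits": 3}},
    {"kind": "llm_turn", "name": "answer", "output": "Here are 3 results."},
]
failing_run = [
    {"kind": "llm_turn", "name": "planner", "output": "call search"},
    {"kind": "tool_call", "name": "search", "output": {"hits": 0}},
    {"kind": "llm_turn", "name": "answer", "output": "I could not find anything."},
]

divergence = first_divergence(passing_run, failing_run)
```

Latency is left out of the compared keys on purpose, since timing differs between otherwise identical runs; a real diff view would likely let the engineer choose which fields count as a difference.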

Features (7 agent-ready prompts)

Structured run capture with full tool-call and LLM-turn recording

Step-through replay UI

Passing vs failing run diff view

One-click failing run to eval test conversion

Production failure alerting with automatic trace attachment

Agent version and regression tracking

MCP server interface for agent-side trace queries
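One way the failing-run-to-eval-test conversion listed above might work is sketched here. It assumes a trace is a dictionary with `run_id`, `input`, and `steps` keys, and that an agent can be called as `agent(user_input, call_tool)`; both conventions, along with the names `trace_to_eval_case` and `run_eval_case`, are hypothetical. Recorded tool outputs are served back from the trace, so the test exercises only the agent logic.

```python
import json


def trace_to_eval_case(trace, expected_output):
    """Turn a recorded failing run into a regression test case."""
    tool_outputs = [
        {"name": s["name"], "input": s["input"], "output": s["output"]}
        for s in trace["steps"]
        if s["kind"] == "tool_call"
    ]
    return {
        "case_id": f"regression-{trace['run_id']}",
        "input": trace["input"],
        "recorded_tool_outputs": tool_outputs,
        "expected_output": expected_output,
    }


def run_eval_case(case, agent):
    """Run the agent against recorded tool outputs; True means it passes."""
    recorded = {
        (t["name"], json.dumps(t["input"], sort_keys=True)): t["output"]
        for t in case["recorded_tool_outputs"]
    }

    def call_tool(name, tool_input):
        # Serve the captured output instead of calling the real tool.
        return recorded[(name, json.dumps(tool_input, sort_keys=True))]

    return agent(case["input"], call_tool) == case["expected_output"]


failing_trace = {
    "run_id": "run-007",
    "input": "refund policy",
    "steps": [
        {"kind": "tool_call", "name": "search",
         "input": {"q": "refund policy"}, "output": {"hits": 0, "top": None}},
        {"kind": "llm_turn", "name": "answer", "input": None, "output": None},
    ],
}


def fixed_agent(user_input, call_tool):
    """Stand-in for the agent after the bug fix (illustrative only)."""
    result = call_tool("search", {"q": user_input})
    return result["top"] if result["hits"] else "no results"


case = trace_to_eval_case(failing_trace, expected_output="no results")
passed = run_eval_case(case, fixed_agent)
```

The engineer supplies the expected output when converting the trace, since a failing run by definition does not contain the correct answer.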

Competitive Landscape

Product	Does	Missing
Langfuse	Open-source LLM observability with prompt management, span tracing via OpenTelemetry, and collaborative trace inspection. Acquired by ClickHouse in January 2026.	No step-through replay of a specific failing run; no branch-diff between a passing and failing execution; no one-click conversion of a trace into an eval test case.
Arize Phoenix	Open-source agent debugging and evaluation, backed by the Arize AI enterprise platform; captures spans, traces tool calls, and supports eval scoring.	Replay is analytics-oriented, not a step-through debugger; no native failing-run-to-regression-test workflow; enterprise tier required for production-scale diff views.
Lucidic	Maps every step of agent workflows and simulates performance at scale; YC W25-backed.	Simulation-first rather than replay-first; no deterministic step-through of a captured historical run; no branch-diff mode; early-stage, with limited production replay depth.
LangSmith	Native observability for LangChain/LangGraph agents; captures every step automatically; supports evals and prompt testing.	Tightly coupled to the LangChain ecosystem; replay is a trace view, not an interactive step-through; no cross-framework support; no branch-comparison workflow.