Production Analytics Finds Agent Failures That Standard Evals Miss
Scott Clark, co-founder and chief executive of Distributional, argues that teams running LLM agents need to look beyond pre-production evals and dashboards of known metrics. His case is that the most consequential failures often emerge only in production, where agents interact with users, tools and changing models in ways teams did not know to test. Clark proposes an observability stack in three layers: telemetry records what happened, monitoring tracks known signals, and analytics clusters agent traces by behavior to surface unknown failure modes, which can then become new evals, guardrails, prompts or system fixes.
The TWIML AI Podcast · May 7, 2026 · 20 min read
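The three layers Clark describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Distributional's product or API: the `Trace` schema, the latency threshold, and every helper name are hypothetical. Grouping traces by an exact signature (tool sequence plus user feedback) is a deliberately simple stand-in for a real clustering method; the point is only to show how a recurring failure pattern can slip past a known-metric monitor and still be surfaced by analytics.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trace:
    # Telemetry layer: a record of what one agent run did (hypothetical schema).
    trace_id: str
    tools_called: list[str]
    latency_s: float
    user_feedback: str  # assumed values: "positive", "negative" or "none"


def monitor(traces, max_latency_s=10.0):
    # Monitoring layer: check a known signal (latency) against a fixed threshold.
    return [t.trace_id for t in traces if t.latency_s > max_latency_s]


def behavior_signature(t):
    # Hypothetical featurization: ordered tool calls plus the feedback label.
    return (tuple(t.tools_called), t.user_feedback)


def cluster_traces(traces):
    # Stand-in for clustering: group traces that share an exact signature.
    clusters = defaultdict(list)
    for t in traces:
        clusters[behavior_signature(t)].append(t.trace_id)
    return clusters


def surface_unknown_failures(traces, min_size=2, max_latency_s=10.0):
    # Analytics layer: flag recurring negative-feedback clusters that the
    # monitor never alerted on; each is a candidate for a new eval or guardrail.
    alerted = set(monitor(traces, max_latency_s))
    candidates = []
    for signature, ids in cluster_traces(traces).items():
        _, feedback = signature
        missed_by_monitoring = not (set(ids) & alerted)
        if len(ids) >= min_size and feedback == "negative" and missed_by_monitoring:
            candidates.append((signature, ids))
    return candidates


traces = [
    Trace("t1", ["search", "answer"], 2.1, "positive"),
    Trace("t2", ["search", "search", "search", "answer"], 3.0, "negative"),
    Trace("t3", ["search", "search", "search", "answer"], 2.8, "negative"),
    Trace("t4", ["calendar", "answer"], 12.5, "negative"),
]
result = surface_unknown_failures(traces)
print(result)
```

In this toy data, the latency monitor catches only `t4`, while the repeated-search pattern in `t2` and `t3` stays under every known threshold and is surfaced only by the analytics step. A production system would likely replace the exact-match signature with richer features and a proper clustering algorithm.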