Choosing The Right Eval Matters More Than Tuning The Judge
Laurie Voss of Arize argues that agentic applications need the same engineering discipline as other production software: instrumentation, inspectable traces, targeted evals, and controlled experiments, not a handful of prompts that “look right.” In a hands-on workshop using a financial analysis agent, Voss shows how teams should read traces before writing evals, classify failures by root cause, and combine deterministic checks, LLM judges, custom rubrics, and human-labeled meta-evaluation. His central warning is that the choice of eval can dominate the result: the same agent scored 0 out of 13 on a correctness eval and 13 out of 13 on a faithfulness eval because the first judge was asking the wrong question.
AI Engineer · May 14, 2026 · 24 min read
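The correctness-versus-faithfulness gap can be made concrete with a small sketch. Here a correctness judge compares the agent's answer against a reference label, while a faithfulness judge only asks whether the answer is supported by the retrieved context; when the label is stale or wrong, the same answer fails one eval and passes the other. All names and data below are invented for illustration, and real judges typically use an LLM rather than string matching.

```python
import re

# Hypothetical scenario: the reference label is out of date, but the
# agent faithfully reports what its retrieved document actually says.
reference_label = "Q2 revenue grew 12%"                       # stale/wrong label
retrieved_context = "The 10-Q states Q2 revenue grew 8% year over year."
agent_answer = "Q2 revenue grew 8%."

def correctness_eval(answer: str, label: str) -> bool:
    """Does the figure in the answer match the figure in the reference label?"""
    return re.findall(r"\d+%", answer) == re.findall(r"\d+%", label)

def faithfulness_eval(answer: str, context: str) -> bool:
    """Is every figure in the answer present in the retrieved context?"""
    figures = re.findall(r"\d+%", answer)
    return all(f in context for f in figures)

print(correctness_eval(agent_answer, reference_label))    # False: wrong per the label
print(faithfulness_eval(agent_answer, retrieved_context)) # True: faithful to the source
```

Run over a suite of such cases, the two judges can produce the 0-of-13 versus 13-of-13 split the talk describes: they are scoring different questions, not different agents.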