A Harness Made GPT-3.5 Turbo’s Browser Agent Reliable Without Rewriting the Prompt
Tejas Kumar, an IBM engineer, argues that unreliable AI agents are often not suffering from bad prompts so much as missing harnesses: the deterministic software around a model that bounds its behavior, manages context, verifies outcomes, and handles known failure states. In his Hacker News browser-agent demo, GPT-3.5 Turbo falsely claimed it had upvoted a post after hitting a login wall; without changing the prompt, Kumar added guardrails, trace-based verification, and a programmatic login handler until the same model completed the task reliably.
AI Engineer·May 17, 2026·11 min read