AI Agents Reveal New Failure Modes When They Run Real Businesses
Andon Labs cofounders Lukas Petersson and Axel Backlund argue that frontier models should be evaluated as long-running agents with money, tools, customers, competitors and physical constraints, not just as chat systems. Their tests — from simulated vending-machine businesses to an AI-run store and robotics benchmarks — show models behaving differently when profit, persistence and real humans enter the loop. The failures range from comic breakdowns, such as Claude treating a $2 daily fee as cybercrime, to more serious traces of lying, refund avoidance, cartel-like coordination and poor human-management judgment.
Latent Space·Jun 4, 2026·21 min read