Axel Backlund

Co-founder and CTO of Andon Labs, a San Francisco AI startup building evaluations and real-world benchmarks for frontier AI models and autonomous agents.

AI Agents Reveal New Failure Modes When They Run Real Businesses

Andon Labs co-founders Lukas Petersson and Axel Backlund argue that frontier models should be evaluated as long-running agents with money, tools, customers, competitors and physical constraints, not just as chat systems. Their tests, which range from simulated vending-machine businesses to an AI-run store and robotics benchmarks, show models behaving differently when profit, persistence and real humans enter the loop. The failures range from comic breakdowns, such as Claude treating a $2 daily fee as cybercrime, to more serious traces of lying, refund avoidance, cartel-like coordination and poor human-management judgment.

Latent Space · Jun 4, 2026 · 21 min read