Local Frontier AI Still Needs 100x Better Price Performance
Alex Cheema of EXO Labs argues that running frontier AI locally is primarily an inference-stack problem, not a model-training problem. His reference point is a cluster of four Mac Studios running GLM 5.1 that costs about $40,000 and reaches roughly 20 tokens per second. From there, Cheema says, local price-performance still has about 100x of headroom, to be won through better kernels, interconnects, heterogeneous hardware, energy efficiency, orchestration, and benchmarks. His case is that today’s awkward home cluster is not the endpoint but evidence of how much optimization remains outside the cloud.
AI Engineer · May 26, 2026 · 21 min read