Alex Cheema

Researcher at EXO Labs, a London-based AI software company building open-source tools for running frontier AI locally across Macs, workstations, and heterogeneous hardware. His public work focuses on local AI inference, distributed training, and Apple Silicon AI systems.

Local Frontier AI Still Needs 100x Better Price Performance

Alex Cheema of EXO Labs argues that running frontier AI locally is primarily an inference-stack problem, not a model-training problem. His reference point is a setup of four Mac Studios running GLM 5.1 that costs about $40,000 and reaches roughly 20 tokens per second. From there, Cheema says local price-performance still has about 100x of room to improve through better kernels, interconnects, heterogeneous hardware, energy efficiency, orchestration, and benchmarks. His case is that today’s awkward home cluster is not the endpoint but evidence of how much optimization remains outside the cloud.

AI Engineer · May 26, 2026 · 21 min read