Multipath Reliable Connection Keeps Massive GPU Training Clusters in Sync
OpenAI’s Mark Handley and Greg Steinbrecher argue that frontier AI training has outgrown conventional data-center networking because synchronized GPU clusters are constrained by their worst congestion or failure, not average throughput. They present Multipath Reliable Connection, developed with major hardware and cloud partners, as OpenAI’s answer: a protocol that spreads traffic across many paths, detects loss quickly, routes around failures from the endpoints, and is being pushed as an open standard for the wider industry.
OpenAI·May 7, 2026·10 min read