Microsoft and NVIDIA Redesign PCs and Data Centers for Agentic AI

Satya Nadella · NVIDIA · Wednesday, June 3, 2026 · 6 min read

At Microsoft Build, NVIDIA chief executive Jensen Huang joined Microsoft chief executive Satya Nadella to frame their expanded partnership around a single premise: agents are becoming a primary computing workload. Huang argued that this shift requires redesigning PCs, data centers and software together, from RTX Spark devices that can run local autonomous assistants to Grace Blackwell and Vera Rubin systems built for large-scale reasoning and low-latency agent execution. Nadella positioned the work as an extension of Microsoft’s infrastructure and developer platform strategy across Windows, Azure, Fabric, Foundry and GitHub.

Agents are becoming the workload Microsoft and NVIDIA are designing around

Satya Nadella framed NVIDIA’s RTX Spark announcement as part of a renewed push toward “unmetered intelligence right at the edge”: AI capability that is not only accessed through cloud services, but present on the machine in front of the user. Jensen Huang tied that shift to a longer Microsoft-NVIDIA collaboration, saying the current systems grew out of a discussion the two companies began roughly three years earlier about creating a new class of PCs for designers, creators, and AI workloads.

The commercial premise running through the exchange was that agents are no longer treated merely as an application feature, but as a workload that forces coordinated redesign across client devices, AI factories, and the developer tools those agents will use.

Huang’s central claim was not simply that the PC is becoming more powerful. It was that the role of the PC changes when an autonomous agent can run locally, operate software on the user’s behalf, and continue working when the user is away from the keyboard. He described the arc of Microsoft and NVIDIA’s relationship as moving from DirectX to “this incredible computer that has autonomous systems running.”

The PC evolved from being an incredible tool, to now being a tool that's used autonomously by an AI assistant.

Jensen Huang

Huang gave a remote-work example: while traveling and on the phone, he could text his PC with a coding task or an idea. The PC would open the relevant tools, make design or code changes, and iterate with him remotely. The machine, in that account, stops being only an instrument directly manipulated by a person and becomes an assistant capable of using the instruments itself.

RTX Spark was presented as the local hardware expression of that shift. Huang said it delivers “a petaflops of AI performance” and “a petaflops of NVFP4,” referring to NVIDIA’s 4-bit floating-point format, which he said NVIDIA and Microsoft worked on together. Storing model weights in four bits instead of 16 cuts their memory footprint roughly fourfold, and that is the point of the format: to use the system’s 128 gigabytes of memory efficiently enough to fit “maybe a couple of hundred billion parameter model.” Huang called that scale “state of the art” and concluded that “the days of having a really smart assistant running autonomously on the PC is here.”
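The memory claim can be checked with back-of-envelope arithmetic. The sketch below is a minimal estimate under stated assumptions: NVFP4 is taken to store each weight as a 4-bit value plus one shared 8-bit scale per 16-value block (about 4.5 bits per weight, following NVIDIA's public description of the format; treat the exact overhead as an assumption), "a couple of hundred billion" is read as 200 billion parameters, and 128 GB is treated as decimal gigabytes. It counts weights only; the KV cache, activations and operating system also need room.

```python
# Back-of-envelope check: do a model's weights fit in a 128 GB memory budget?
# Assumption: NVFP4 ~ 4-bit values + one 8-bit scale per 16-value block (~4.5 bits/weight);
# the small per-tensor scale is ignored. Only weights are counted, not KV cache or OS.

MEMORY_BUDGET_BYTES = 128e9  # 128 GB, read as decimal gigabytes (assumption)

BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "FP8": 8.0,
    "NVFP4": 4.0 + 8.0 / 16,  # 4-bit element + shared 8-bit scale per 16 elements
}


def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits_per_weight / 8


def fits(num_params: float, fmt: str, budget: float = MEMORY_BUDGET_BYTES) -> bool:
    """True if the weights alone fit within the memory budget."""
    return weight_bytes(num_params, BITS_PER_WEIGHT[fmt]) <= budget


if __name__ == "__main__":
    params = 200e9  # "a couple of hundred billion" parameters, read as 200B (assumption)
    for fmt, bits in BITS_PER_WEIGHT.items():
        gb = weight_bytes(params, bits) / 1e9
        print(f"{fmt:6s} {gb:7.1f} GB  fits={fits(params, fmt)}")
```

At 200 billion parameters, FP16 weights need about 400 GB and FP8 about 200 GB, so neither fits; at roughly 4.5 bits per weight the weights come to about 112.5 GB, leaving only about 15.5 GB for everything else.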

128GB: memory Huang said RTX Spark can use to fit large local models

The data center is being matched to the AI system, not treated as generic capacity

Nadella connected the desktop discussion to the data center, saying Microsoft and NVIDIA’s AI infrastructure work began with the first supercomputer they built together to train GPT models. He pointed to Microsoft’s Fairwater design as a system “custom built” for the Grace Blackwell era, and said the companies were already validating Vera Rubin.

Jensen Huang described the infrastructure progression by AI workload. The first AI supercomputer the companies built together was based on Ampere. Hopper followed and was “an incredible success.” Those first two generations were focused on pre-training. Grace Blackwell then shifted the center of gravity toward post-training, reinforcement learning, reasoning models, mixture-of-experts systems, and inference.

That shift changed the physical architecture. Reasoning models were “incredibly intelligent” and energy efficient, Huang said, but required giant systems. NVIDIA’s answer was NVLink 72, with “the entire rack” becoming “one computer.” The evolution was from one node to one rack.
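The move from one node to one rack is easier to see with a capacity estimate. The sketch below counts how many GPUs are needed just to hold a large model's weights and KV cache, then checks whether that count fits in an 8-GPU server (the "node") or a 72-GPU NVLink domain (the "rack"). Every size here is a hypothetical placeholder rather than a published specification: the 1.5-trillion-parameter model, the 500 GB KV cache, the 192 GB per GPU and the 90% usable fraction are all assumptions. It also ignores interconnect bandwidth, which is what lets a model sharded across many GPUs behave like one computer.

```python
import math

# Hypothetical sizing inputs (assumptions for illustration, not product specs).
NODE_GPUS = 8    # GPUs in a typical single server ("one node")
RACK_GPUS = 72   # GPUs in one NVLink 72 domain ("one rack")


def gpus_needed(params: float, bytes_per_param: float, kv_cache_bytes: float,
                gpu_mem_bytes: float, usable_fraction: float = 0.9) -> int:
    """Minimum GPUs whose combined usable memory holds weights plus KV cache."""
    total_bytes = params * bytes_per_param + kv_cache_bytes
    usable_per_gpu = gpu_mem_bytes * usable_fraction
    return math.ceil(total_bytes / usable_per_gpu)


def smallest_domain(n_gpus: int) -> str:
    """Smallest scale-up domain that can hold n_gpus."""
    if n_gpus <= NODE_GPUS:
        return "node"
    if n_gpus <= RACK_GPUS:
        return "rack"
    return "multi-rack"


if __name__ == "__main__":
    # Hypothetical model: 1.5T parameters at 8-bit weights, 500 GB of KV cache.
    n = gpus_needed(params=1.5e12, bytes_per_param=1.0,
                    kv_cache_bytes=500e9, gpu_mem_bytes=192e9)
    print(n, smallest_domain(n))
```

With these placeholder numbers the model needs 12 GPUs: too many for one 8-GPU node, but well within a single 72-GPU rack.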

Huang said Microsoft has deployed “the largest number of Grace Blackwells in the world today,” adding that it has the “fastest and the largest number” of those systems. He described Fairwater as both a performance and engineering achievement, emphasizing that it is completely liquid cooled, closed loop, and “basically uses almost no water.” Grace Blackwell, he said, increased token generation rate while reducing token generation cost “by an order of magnitude,” specifying “some 30 times over Hopper.”
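Huang's two figures can be reconciled with a little arithmetic: a 30x reduction spans about 1.5 orders of magnitude (log10 30 ≈ 1.48), so it is consistent with "an order of magnitude" read loosely as "at least tenfold." The sketch below applies a reduction factor to a baseline cost; the $1.00 per million tokens baseline is a hypothetical placeholder, not a figure from the talk.

```python
import math


def cost_after(baseline_cost: float, reduction_factor: float) -> float:
    """Per-token cost after dividing the baseline by a reduction factor."""
    return baseline_cost / reduction_factor


def orders_of_magnitude(factor: float) -> float:
    """Base-10 orders of magnitude spanned by a multiplicative factor."""
    return math.log10(factor)


if __name__ == "__main__":
    hopper_cost = 1.00  # hypothetical $ per million tokens on Hopper (assumption)
    blackwell_cost = cost_after(hopper_cost, 30)  # Huang's stated ~30x
    print(f"${blackwell_cost:.4f} per million tokens")
    print(f"{orders_of_magnitude(30):.2f} orders of magnitude")
    # Equivalently, a fixed budget buys about 30x as many tokens.
    print(f"{hopper_cost / blackwell_cost:.0f}x tokens per dollar")
```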

~30x: Huang’s stated token-generation cost improvement for Grace Blackwell over Hopper

Nadella said Fairwater matched Microsoft’s data-center design to NVIDIA’s system design. Huang later said the companies’ teams aligned “long before the chips taped out” and before the systems were brought up. In his description, Microsoft’s data centers were designed for Vera Rubin, while Vera Rubin was designed to integrate into Microsoft’s “complete stack,” including networking and security.

Vera Rubin is presented as infrastructure for impatient agents

The move from Grace Blackwell to Vera Rubin was explained less as a generic chip upgrade than as a response to a new computing pattern. Jensen Huang said Vera Rubin was created “for a world where these AIs are now agentic.” Hopper, in his taxonomy, was created for pre-training; Grace Blackwell for training, post-training, and inference; Vera Rubin for running agents.

Huang explicitly linked the cloud system to the PC system. The agentic computing pattern for Vera Rubin is “exactly the same computing pattern” that will run on RTX Spark, only at far larger scale. In the data center, that means processing enormous numbers of agents simultaneously, many associated with different customers and partners.

That scale makes security part of the system design rather than an add-on. Huang described a data path in which storage serves as long-term memory and working memory is protected as well. The full path is encrypted: data is encrypted in transit and “also encrypted in use.” He positioned confidential computing as an area where the companies intend to innovate.

The CPU also changes when the primary user is no longer a human. Huang described Vera as “a revolutionary CPU designed for agents” and contrasted it with past CPUs “designed for humans.” Humans, he said, are more patient than agents. Agents require low latency, because their value depends on fast iteration.

Vera was designed for extremely low latency. And so Vera Rubin is just completely revolutionary.

Jensen Huang

Satya Nadella confirmed that Microsoft had already “stood it up,” and Huang said systems rolling off the line were being stood up at Microsoft because the two teams had already aligned across chips, systems, data centers, networking, and security. Nadella called that “speed of light execution between the teams.”

The partnership extends into the software agents will use

Nadella’s broader point was that the hardware and cloud work only matters if it expands what developers and organizations can build. He named several integration points: NVIDIA models and tooling in Microsoft Foundry, NVIDIA software accelerating Microsoft workloads such as the data warehouse, and NVIDIA capabilities in Windows. The published description of the Build segment also lists the NVIDIA OpenShell secure runtime in GitHub Copilot as part of the expanded partnership, though that element was not discussed on stage.

Jensen Huang answered by arguing that the companies had been preparing for the current moment for “a decade and a half.” What changed recently, he said, is that agentic systems and converging model quality made AI “useful” in a more concrete, productive sense. He pointed to GitHub as evidence, saying commits to GitHub had gone “completely parabolic” and had tripled over the last several months.

Huang used that claim to connect software productivity to compute demand. If agents are doing productive work, and if “tokens are now profitable,” then rising agent usage and the computation each agent requires both push compute demand upward. The response, he said, is to make sure the tools agents use are “fully accelerated.”

Fabric was his main example. Huang said Microsoft Fabric is now fully accelerated, including data processing, SQL, Spark, semantic vector-based workloads, and graph-based workloads. He broadened that to “all of the tools that are available on Azure,” saying the aim is to make them fully GPU accelerated.

The reason was not simply benchmark performance. Huang returned to the agentic workload pattern: agents are impatient. The faster tools return answers, the faster agents can iterate, the faster they can generate tokens. For developers and customers, he said, the desired output is “a lot of tokens” that are both profitable and highly intelligent.
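The iteration argument can be made concrete with a simple model of a single agent that alternates between generating tokens and waiting on a tool, such as a SQL query. The sketch assumes a strictly sequential loop, where each step waits for its tool call to return, so step time is generation time plus tool time. The `AgentStep` type and all timings and token counts are hypothetical placeholders, not measurements from Fabric or any other product.

```python
from dataclasses import dataclass


@dataclass
class AgentStep:
    generation_s: float  # seconds the model spends generating one step's tokens
    tool_s: float        # seconds waiting on the tool call that step issues
    tokens: int          # tokens generated per step


def steps_per_hour(step: AgentStep) -> float:
    # Sequential loop (assumption): the next step cannot start until the tool returns.
    return 3600.0 / (step.generation_s + step.tool_s)


def tokens_per_hour(step: AgentStep) -> float:
    return steps_per_hour(step) * step.tokens


if __name__ == "__main__":
    baseline = AgentStep(generation_s=2.0, tool_s=8.0, tokens=500)     # hypothetical
    accelerated = AgentStep(generation_s=2.0, tool_s=1.0, tokens=500)  # faster tool
    for name, step in [("baseline tool", baseline), ("accelerated tool", accelerated)]:
        print(f"{name}: {steps_per_hour(step):.0f} steps/h, "
              f"{tokens_per_hour(step):.0f} tokens/h")
```

With these placeholder numbers, cutting tool time from 8 s to 1 s raises throughput from 360 to 1,200 steps per hour, and from 180,000 to 600,000 tokens per hour. Once tools are fast, generation time dominates each step, so latency matters on both sides of the loop.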
