AI Infrastructure Is Shifting From Accelerator Racks to Distributed Agent Systems

Jensen Huang · NVIDIA · Friday, June 5, 2026 · 7 min read

At Dell Technologies World, Nvidia chief Jensen Huang and Dell CEO Michael Dell argued that enterprise AI is moving from experimental promise to operational infrastructure, with agentic systems driving a sharp increase in compute demand. Huang said agents change the workload from single prompt-response transactions to long-running loops of reasoning, planning and tool use, while Dell framed the response as a pragmatic push toward distributed, “unmetered” intelligence across PCs, data centers and cloud-scale systems.

Useful AI has changed the compute curve

Jensen Huang framed the current shift in AI as a move from novelty to utility. Generative AI, he said, did not stop at generating content; it began generating “content to think with,” which led to reasoning, planning, and agentic systems. The consequence is that AI workloads no longer look like simple query-response transactions. An agent has to understand a task, reason through it, make a plan, use tools, inspect the results, revise the plan, and continue iterating until the job is done.
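The loop Huang describes can be sketched in a few lines. This is a minimal illustration, not Nvidia's or Dell's harness. The planner, tools, and stopping rule (`plan`, `tools`, `is_done`) are hypothetical stand-ins for a model call, real tools, and a completion check, and `max_steps` is an assumed safety cap.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical signatures: a planner maps (task, observations) to a list of
# tool names; each tool maps (task, observations) to a text result.
Planner = Callable[[str, list[str]], list[str]]
Tool = Callable[[str, list[str]], str]


@dataclass
class AgentRun:
    task: str
    observations: list[str] = field(default_factory=list)
    done: bool = False


def run_agent(task: str, plan: Planner, tools: dict[str, Tool],
              is_done: Callable[[list[str]], bool], max_steps: int = 10) -> AgentRun:
    run = AgentRun(task)
    for _ in range(max_steps):
        # Reason and (re)plan using everything observed so far.
        steps = plan(task, run.observations)
        if not steps:
            break  # the planner has nothing left to try
        # Act: execute the next planned tool call.
        result = tools[steps[0]](task, run.observations)
        # Inspect: record the result so the next plan can revise around it.
        run.observations.append(result)
        if is_done(run.observations):
            run.done = True
            break
    return run


# Usage with toy functions: keep "running tests" until three attempts exist.
run = run_agent(
    "fix failing test",
    plan=lambda task, obs: ["run_tests"],
    tools={"run_tests": lambda task, obs: f"attempt {len(obs) + 1}"},
    is_done=lambda obs: len(obs) >= 3,
)
```

The step cap matters in practice: a loop that re-plans until the job is done can run for a long time, and that is where the extra compute goes.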

That loop is the basis for Huang’s compute claim. He said the computation required for these systems has grown by “a hundred x, a thousand x,” because the model may run autonomously for long periods rather than answer a single prompt. In one example, he said a software programming job might run for a week and complete work that would have taken a team a month. The productivity gain is significant, but it comes paired with a much larger demand for computation.
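One way to see how a long-running loop multiplies compute is to count tokens. The sketch below rests on a simplifying assumption that is not from the talk: every step re-reads the full, growing context before generating new output. The input numbers are illustrative only.

```python
def agent_tokens(steps: int, context_tokens: int, tokens_per_step: int) -> int:
    """Tokens processed by an agent loop that re-reads its growing context each step.

    Simplifying assumption: each step reads the whole context so far, then
    generates `tokens_per_step` new tokens that are appended to the context.
    """
    total = 0
    context = context_tokens
    for _ in range(steps):
        total += context + tokens_per_step  # read the context, then generate
        context += tokens_per_step          # the new output joins the context
    return total


def compute_multiplier(steps: int, context_tokens: int, tokens_per_step: int) -> float:
    """Agent-loop tokens relative to a single prompt-response exchange."""
    single_shot = agent_tokens(1, context_tokens, tokens_per_step)
    return agent_tokens(steps, context_tokens, tokens_per_step) / single_shot


# Illustrative inputs only; these are not figures from the talk.
print(round(compute_multiplier(steps=50, context_tokens=1_000, tokens_per_step=500), 1))  # 458.3
```

Because the context grows every step, total tokens grow roughly with the square of the step count, which is how a modest-looking loop reaches a multiplier in the hundreds. Prefix or KV caching would reduce the re-reading cost, so real systems may land lower.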

100x–1,000x

Huang’s estimate of the compute increase for agentic workloads versus earlier query-response AI

Huang described the demand curve as the product of two effects: each AI task now consumes far more computation, and many more people are using agents. He said Nvidia and Dell are both using agents broadly for software development, DevOps, site reliability engineering, CI/CD work, QA, and testing. The pattern he expects is not simply one engineer using one assistant. A strong engineer today may work with an agent, but “a really great engineer in the future,” he said, will orchestrate many agents, which themselves orchestrate sub-agents.
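The two effects Huang describes, more compute per task and more agents per person, multiply. The sketch below models his orchestration picture as a tree, with an engineer at the root, agents beneath, and sub-agents beneath those. The fan-out, depth, and per-agent compute values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list[Node] = field(default_factory=list)


def build_team(name: str, fanout: int, depth: int) -> Node:
    """An orchestrator with `fanout` children per level, `depth` levels deep."""
    node = Node(name)
    if depth > 0:
        node.children = [build_team(f"{name}.{i}", fanout, depth - 1) for i in range(fanout)]
    return node


def count_nodes(node: Node) -> int:
    return 1 + sum(count_nodes(child) for child in node.children)


def total_demand(users: int, agents_per_user: int, compute_per_agent: float) -> float:
    """Demand as the product of adoption (users x agents) and per-agent compute."""
    return users * agents_per_user * compute_per_agent


# The root is the human engineer; every node beneath it is an agent.
team = build_team("engineer", fanout=4, depth=2)
agents = count_nodes(team) - 1  # 4 agents + 16 sub-agents = 20
```

Doubling either the fan-out or the per-agent compute doubles demand at that level, which is why adoption and workload intensity compound rather than add.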

Michael Dell made the enterprise adoption point from Dell’s side. A couple of years earlier, he said, Dell did not have 5,000 enterprise customers for this category. Now, he said, some of the largest companies in the world are “piling into this,” including Lilly, Samsung, and Honeywell. But Dell also emphasized that the workflow transformation is still early. Companies are beginning to imagine what they can become as AI capabilities improve, but at scale, he said, “we haven’t seen that” except in a small number of cases.

Huang’s claim was that this is the moment enterprise AI moves from potential to operational impact. For the past couple of years, he said, many enterprises saw enormous promise but minimal actual use. “Now it’s taken off.” The effect, as he described it, is not merely faster execution of old plans. It changes what organizations attempt. Work that took months now takes weeks, weeks become days, days become hours, and an hour-long task increasingly feels as though it should be instant. Huang said this has changed his own ambition: “How high is up? Well it’s pretty high.”

The agent architecture splits the harness, local models, and frontier models

The Dell and Nvidia architecture, as Huang described it, is built around the idea that an agent is not just a model. The large language model is the “brain” and the most computationally intensive part of the system. He pointed to an NVLink 72 system as “the world’s largest scale-up single domain computer,” which operates as one giant system that can hold very large models: “1 terabyte, 10 terabyte of parameters, no problem.”

But the agent starts elsewhere: with a harness running inside a secure, governed container. Huang called that container a sandbox. He referred to an Nvidia open-source sandbox, transcribed as “Nvidia Open Shell,” as the security system “just about everybody in the industry’s using.” Inside that environment, he said Nvidia provides a reference harness, transcribed as “NeMo Claw,” which runs on a CPU. That CPU can sit in different parts of the system, depending on where the work needs to happen.
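The sandbox idea, a harness that mediates every tool call against a policy, can be illustrated with a small allowlist check. This is not the API of Nvidia's sandbox or reference harness: `SandboxPolicy`, `Harness`, and `PolicyViolation` are hypothetical names, every tool in this toy takes a single file path, and a production sandbox would enforce isolation at the container or OS level rather than in Python.

```python
from __future__ import annotations

import posixpath
from dataclasses import dataclass
from typing import Callable


class PolicyViolation(Exception):
    """Raised when the harness refuses a tool call."""


@dataclass(frozen=True)
class SandboxPolicy:
    allowed_tools: frozenset[str]
    allowed_roots: tuple[str, ...]  # directories the agent may touch


class Harness:
    """Mediates every tool call an agent makes and records an audit trail."""

    def __init__(self, policy: SandboxPolicy, tools: dict[str, Callable[[str], str]]):
        self.policy = policy
        self.tools = tools
        self.audit_log: list[tuple[str, str, str]] = []

    def _path_allowed(self, path: str) -> bool:
        norm = posixpath.normpath(path)  # collapses "..": "/ws/../etc" becomes "/etc"
        return any(norm == root or norm.startswith(root.rstrip("/") + "/")
                   for root in self.policy.allowed_roots)

    def call(self, tool: str, path: str) -> str:
        if tool not in self.policy.allowed_tools or not self._path_allowed(path):
            self.audit_log.append(("denied", tool, path))
            raise PolicyViolation(f"{tool}({path!r}) is outside the sandbox policy")
        self.audit_log.append(("allowed", tool, path))
        return self.tools[tool](path)
```

Normalizing the path before comparing it, and comparing against `root + "/"`, blocks both `..` escapes and look-alike directories such as `/workspace-other`.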

The way to think about agents is this: there’s a large language model, it’s gigantic, it’s the most computationally intensive piece of software the world’s ever known.

Jensen Huang

This is where Huang described the design as hybrid AI. Companies can run specialized agents and open-source models locally, trained for their own domains, data, or skills. Larger frontier models can run in the cloud or in a company’s own data center. Huang said Nvidia’s architecture is “the only architecture in the world” that runs every frontier AI model, and he said it supports every open-source model as well. He also said Anthropic had recently been “leaning into the Nvidia architecture,” which he presented as completing Nvidia’s coverage of frontier models.

The security claim was specific as well. If a company has sensitive models or data, Huang said Nvidia systems are built with confidential computing so the customer does not have to trust the operator running the data center with secure data. His framing was that the same architecture can support cloud AI, local AI, hybrid AI, and agentic systems without forcing a company to choose a single place for inference.
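Huang did not spell out routing rules for hybrid AI, so the following is a hypothetical policy that only illustrates the idea: domain work that a local specialist model covers stays local, and sensitive work goes to the public cloud only when confidential computing is available there. The `Target`, `Request`, and `route` names are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL = "local"             # PC or desk-side system running a specialized open model
    OWN_DATA_CENTER = "own_dc"  # frontier-scale model on company-owned infrastructure
    CLOUD = "cloud"             # frontier model hosted by a provider


@dataclass(frozen=True)
class Request:
    domain: str
    sensitive: bool
    needs_frontier: bool


def route(req: Request, local_domains: set[str], cloud_confidential: bool) -> Target:
    # Domain-specific work that a local specialist model covers stays local.
    if req.domain in local_domains and not req.needs_frontier:
        return Target.LOCAL
    # Sensitive data leaves the company only if the cloud operator cannot see it.
    if req.sensitive and not cloud_confidential:
        return Target.OWN_DATA_CENTER
    return Target.CLOUD
```

The `cloud_confidential` flag captures Huang's point: with confidential computing, sensitivity alone no longer forces inference into one place.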

Huang also tied the architecture to a change in compute economics. Earlier CPUs, he said, were built for hyperscale clouds, where the goal was to rent out as many CPU cores as possible. In the AI era, he argued, the relevant output is tokens. “You’re not renting CPU cores anymore, you’re generating tokens,” he said. Because tokens are the output of the intelligence, the whole system has to move through agent work quickly.

That is why Huang highlighted Nvidia’s Vera CPU. He said Vera has the highest single-threaded performance of any CPU in the world and three times the memory bandwidth of the fastest alternative CPU. The practical reason, in his explanation, is that agents pound on databases and tools. If the CPU is slow, the large AI system sits waiting while the agent’s surrounding work completes. He named Starburst and DuckDB as examples of data engines that run “incredibly fast” in this architecture.
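The CPU argument is about a sequential loop: while the agent waits on a database query or tool, the accelerator that generates tokens sits idle. A minimal sketch, assuming a strictly sequential generate-then-tool step with hypothetical timings, shows how shrinking CPU-side time raises delivered tokens per second.

```python
def loop_throughput(tokens_per_step: int, gen_seconds: float, tool_seconds: float) -> float:
    """Tokens per second for a sequential loop: generate, then wait on CPU-side tools."""
    return tokens_per_step / (gen_seconds + tool_seconds)


def accelerator_busy_fraction(gen_seconds: float, tool_seconds: float) -> float:
    """Share of each step during which the accelerator generates rather than waits."""
    return gen_seconds / (gen_seconds + tool_seconds)


# Hypothetical step: 1,000 tokens in 2 s of generation, then 2 s of database/tool work.
slow = loop_throughput(1_000, 2.0, 2.0)  # 250 tokens/s; accelerator busy 50% of the time
# The same step with the CPU-side work cut to 0.5 s.
fast = loop_throughput(1_000, 2.0, 0.5)  # 400 tokens/s; accelerator busy 80% of the time
```

The timings are illustrative, not Vera benchmarks; the point is that in a sequential loop, CPU time directly caps how many tokens the expensive accelerator can deliver.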

One architecture is meant to span local machines, data centers, and cloud-scale systems

Michael Dell connected the hardware story to “unmetered intelligence.” His point was that companies and employees should be able to use AI capacity in a PC or in their own data center, with their own data, without constant anxiety over token consumption or a surprise bill. In the older workflow, Dell said, humans did work and passed it to the next human. In the emerging workflow, individuals may manage hundreds or thousands of agents. The infrastructure proposition is meant to support that shift by placing inference and intelligence across local machines, enterprise data centers, and cloud-scale systems.
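The unmetered argument reduces to a break-even comparison between per-token billing and capacity paid for up front. In the sketch below, prices and the fixed cost are placeholders, not Dell or cloud pricing, and the model ignores that owned capacity has a throughput ceiling.

```python
def metered_cost(tokens: int, price_per_million: float) -> float:
    """Bill for a period under per-token pricing."""
    return tokens / 1_000_000 * price_per_million


def breakeven_tokens(fixed_cost: float, price_per_million: float) -> float:
    """Token volume at which owned capacity (a fixed cost for the period) matches the metered bill."""
    return fixed_cost / price_per_million * 1_000_000


# Placeholder numbers: 5.0 per million tokens versus a fixed 10,000 per period.
threshold = breakeven_tokens(10_000, 5.0)  # 2,000,000,000 tokens per period
```

Above the threshold, each additional agent run on owned hardware is effectively free at the margin, which is the property Dell's "unmetered" framing emphasizes; below it, metered use is cheaper.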

I love this idea of the unmetered intelligence, right. Where you’ve got the power in your own PC, in your own data center, and you can use it with your own data.

Michael Dell

Jensen Huang then described the onstage hardware as different scales of the same architecture, but his comparisons were tied to the physical machines in front of him: a rack, a desk-side “station,” and a smaller system he called GB10. After signing the rack with Dell, he said one system was “a hundred times” larger than “this one,” identifying the smaller machine as “the station.” He said the larger system used GB300, while the station was the “only computer, desk-side computer in the world” capable of running a one-terabyte, one-trillion-parameter AI model. That, he said, “would basically be a cloud just two days ago” and would have been unimaginable one or two years earlier.
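The pairing of “one-terabyte” with “one-trillion-parameter” is consistent with storing each weight in one byte (8-bit precision). The talk did not state a precision, so the sketch below shows the weight footprint at several assumed bit widths; it counts weights only, not the KV cache, activations, or runtime overhead.

```python
def weight_bytes(params: int, bits_per_param: int) -> int:
    """Storage for model weights alone (no KV cache, activations, or runtime overhead)."""
    return params * bits_per_param // 8


TRILLION = 10**12
for bits in (16, 8, 4):
    print(bits, "bits:", weight_bytes(TRILLION, bits) / 10**12, "TB")
# 16 bits: 2.0 TB
# 8 bits: 1.0 TB
# 4 bits: 0.5 TB
```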

Huang also pointed to another, smaller system that he called “the smallest GB10,” saying it shared the same architecture as the station and the larger GB300 system. His “five, six times larger” comparison was similarly made while referring to the machines onstage, not as a fully specified product hierarchy. The substantive claim was continuity: the same architecture is meant to scale from a small local system, to a desk-side station, to much larger data-center infrastructure.
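The continuity claim suggests a simple placement rule: run a model on the smallest system that can hold it. The sketch below is hypothetical. The tier names echo the onstage machines, but the capacities are placeholders, not product specifications, and real placement would also weigh throughput, latency, and where the data lives.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    memory_bytes: int  # usable memory for model weights (placeholder values below)


def smallest_fit(model_bytes: int, tiers: list[Tier]) -> Optional[Tier]:
    """Pick the smallest tier whose memory holds the model's weights, or None if none does."""
    for tier in sorted(tiers, key=lambda t: t.memory_bytes):
        if model_bytes <= tier.memory_bytes:
            return tier
    return None


GB = 10**9
# Placeholder capacities for illustration only; not product specifications.
tiers = [Tier("rack", 20_000 * GB), Tier("small", 100 * GB), Tier("station", 1_000 * GB)]
```

Because the tiers share one architecture in Huang's telling, the same model can move up or down this list without being rewritten; only the capacity check changes.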

Dell called the broader result distributed inference and intelligence. Huang called it hybrid AI. Both descriptions point to the same requirement: agents are not served by a single box or a single cloud endpoint, but by a system that coordinates CPUs, GPUs, models, tools, databases, and security boundaries across local and cloud environments.

The partnership framing stayed tied to that architecture. Dell thanked Nvidia for what the two companies have done together for customers, and Huang noted that he and Dell had “grown up together practically,” with Dell saying the relationship had run for 31 years.
