NVIDIA Says Vera Runs Agentic Tasks 80% Faster Than x86

NVIDIA | Monday, June 1, 2026 | 5 min read

NVIDIA is pitching Vera as a data center CPU built for the CPU-side work created by agentic AI, not as a conventional cloud processor optimized mainly for core count and virtualization. The company argues that as agents run Python code, tool calls, retrieval, sandboxed execution and data orchestration around GPUs, CPU delays become a constraint on GPU utilization, throughput and latency. Vera’s case rests on NVIDIA’s custom Olympus cores, LPDDR5X memory bandwidth, a coherent 88-core fabric and NVLink-C2C links into GPU systems, which together extend NVIDIA’s AI platform from acceleration into orchestration.

Vera is pitched as the CPU that keeps agent workloads from starving the GPU

Agentic AI changes what NVIDIA says the CPU has to do. The CPU is no longer treated mainly as a divisible cloud resource; it becomes the conductor for GPU work. In NVIDIA’s formulation, “the CPU is now the conductor, and the GPU is the orchestra.”

That framing sets Vera against the traditional CPU model: maximize cores per socket, slice the processor into virtualized shares, and rent those shares by the hour. NVIDIA’s visual comparison shows one traditional CPU partitioned among separate customers and workloads, including code, document, and search-like tasks.

The agentic model changes the design target. If the CPU is coordinating work around GPUs, CPU delays are not just server-side overhead: NVIDIA says they directly affect GPU utilization, token throughput, latency, and user experience. Vera is introduced as NVIDIA’s answer to that bottleneck, a CPU built for the agentic loop that combines a custom data center CPU core with a scalable coherency fabric to balance performance cores and bandwidth for AI factory output.
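A toy model makes that dependency concrete. The sketch below assumes a strictly serial agent loop in which each step does some CPU-side work (a tool call or sandbox run) and then one GPU inference call, with no overlap between the two. The function name and every timing are hypothetical and chosen only for illustration; none of them are NVIDIA figures.

```python
def gpu_utilization(cpu_ms: float, gpu_ms: float) -> float:
    """Fraction of wall-clock time the GPU is busy in a serial agent loop.

    Assumes each step runs CPU work, then GPU work, with no overlap.
    """
    if cpu_ms < 0 or gpu_ms <= 0:
        raise ValueError("cpu_ms must be >= 0 and gpu_ms must be > 0")
    return gpu_ms / (cpu_ms + gpu_ms)


# Hypothetical per-step timings (milliseconds), not measured values.
baseline = gpu_utilization(cpu_ms=60.0, gpu_ms=40.0)
faster_cpu = gpu_utilization(cpu_ms=30.0, gpu_ms=40.0)

print(f"baseline GPU utilization:    {baseline:.1%}")    # 40.0%
print(f"with CPU work twice as fast: {faster_cpu:.1%}")  # 57.1%
```

Under these assumptions, halving the CPU-side time lifts GPU utilization from 40% to about 57% without touching the GPU. Real systems overlap and batch work, so the effect would be smaller or larger depending on the pipeline.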

Olympus is aimed at the CPU-side work agents generate

At the center of Vera is the NVIDIA Olympus Core, a custom data center CPU core built for workloads such as branch-heavy Python runtimes, tool calls, and sandboxed code execution. The workload list is explicitly agent-shaped: Python code, tool calls, code compilation, scripting, debugging, data analysis, evaluation, decision making, web search, SQL queries, simulation, and sandbox environments.

Those labels matter because Vera is not being presented as a generic CPU refresh. The CPU-side work around agents includes interpreting code, calling tools, running evaluations, executing sandboxed tasks, querying data, searching the web, and moving through branching decision logic. Vera’s first system-level claim is that its cores are tuned for the throughput demands of those workloads.

Each Olympus core is described as throughput-oriented. The named features are a neural branch predictor that evaluates two taken branches per cycle; a 10-wide decode engine that brings in more work each cycle; a large out-of-order engine to keep instructions moving; and advanced prefetchers with what NVIDIA calls a novel graph engine for anticipating the next data path.
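To see why branch prediction features in a pitch about Python runtimes, consider what an interpreter's inner loop looks like. The sketch below is a hypothetical toy stack machine, not CPython's implementation and not anything NVIDIA has published; the opcode names and the `run` function are invented for illustration. Its point is that every instruction executed passes through a chain of branches whose outcome depends on the program being interpreted.

```python
def run(program: list[tuple[str, int]]) -> int:
    """Execute a tiny stack-machine program and return the top of the stack."""
    stack: list[int] = []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        # Each opcode test is a data-dependent branch: which arm is taken
        # depends on the interpreted program, not on the interpreter itself.
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "jump_if_zero":
            if stack.pop() == 0:
                pc = arg  # second, nested branch inside the dispatch
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack[-1]


# Computes (2 + 3) * 4; the unused argument is 0 for add and mul.
program = [("push", 2), ("push", 3), ("add", 0), ("push", 4), ("mul", 0)]
print(run(program))  # 20
```

Production interpreters use faster dispatch techniques than an if/elif chain, but the underlying pattern of frequent, hard-to-predict control transfers is the kind of workload the branch predictor claim is aimed at.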

The core argument immediately becomes a data-movement argument. NVIDIA’s line is that fast cores are not enough if data arrives late or incorrectly.

Fast cores only matter when data arrives correctly and on time.

Olympus is therefore positioned less as a standalone core design than as one part of a larger CPU architecture meant to keep agent workloads supplied through retrieval, analytics, and sandbox execution.

The memory claim is bandwidth without giving up latency or correction

Vera’s second system-level claim is memory. NVIDIA describes Vera as the first CPU to pair LPDDR5X memory with simultaneous correction of multiple errors, without compromising bandwidth. The purpose is to keep Olympus cores fed through retrieval, analytics, and sandbox execution.

Stated Vera memory bandwidth: 1.2 TB/s

The supporting memory figures are three times more bandwidth per core than x86 CPUs with DDR5, and 40% lower peak memory latency versus x86. The comparison is framed as both bandwidth and latency, not one at the expense of the other.

This is central to the agent-loop pitch. If the CPU is coordinating Python runtimes, tool calls, retrieval, analytics, and sandboxed execution around GPU-accelerated work, memory stalls become system-level stalls. Vera is presented as reducing those stalls by pairing Olympus cores with LPDDR5X bandwidth, lower peak latency, and simultaneous multi-error correction.
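The per-core arithmetic behind those figures can be worked through directly. The sketch below uses only NVIDIA's stated numbers (1.2 TB/s, 88 cores, a threefold per-core advantage) and makes two assumptions: decimal units (1 TB/s = 1,000 GB/s), and that "three times more" means a 3x ratio. The implied x86 figure is derived here from those claims, not something NVIDIA states.

```python
TOTAL_BANDWIDTH_GBPS = 1200.0  # 1.2 TB/s as stated; decimal units assumed
CORES = 88                     # stated Olympus core count
CLAIMED_RATIO = 3.0            # "three times" read as a 3x ratio (assumption)


def per_core_bandwidth(total_gbps: float, cores: int) -> float:
    """Memory bandwidth available per core if shared evenly."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total_gbps / cores


vera_per_core = per_core_bandwidth(TOTAL_BANDWIDTH_GBPS, CORES)
# Back out the x86 baseline that the 3x claim would imply.
implied_x86_per_core = vera_per_core / CLAIMED_RATIO

print(f"Vera, per core:        {vera_per_core:.1f} GB/s")         # 13.6 GB/s
print(f"implied x86, per core: {implied_x86_per_core:.1f} GB/s")  # 4.5 GB/s
```

An even split is itself a simplification: a single busy core can draw far more than its share when others are idle, so these are averages under full load, not per-core limits.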

The fabric keeps 88 cores on a monolithic mesh and links GPUs coherently

Vera’s third system-level claim is the coherency fabric. NVIDIA’s second-generation scalable coherency fabric unifies all 88 Olympus cores on a monolithic mesh, with separate dies for memory and I/O. The cores are not split across chiplets, which NVIDIA says enables 50% faster core-to-core communication than traditional CPUs.

NVIDIA custom Olympus cores on Vera: 88

The specification set attached to Vera includes 88 NVIDIA custom Olympus cores with spatial multithreading; PCIe Gen 6 and CXL 3.1; 164 MB of L3 cache; 3.4 TB/s of core-to-core bisection bandwidth; up to 1.5 TB of LPDDR5X memory; and an NVLink-C2C coherent CPU-CPU and CPU-GPU interface at 1.8 TB/s.

Vera element	Claim shown or stated
CPU cores	88 NVIDIA custom Olympus cores with spatial multithreading
Cache	164 MB L3 cache
Core fabric	3.4 TB/s core-to-core bisection bandwidth
Memory	Up to 1.5 TB LPDDR5X
I/O	PCIe Gen 6, CXL 3.1
Coherent interface	NVLink-C2C at 1.8 TB/s for CPU-CPU and CPU-GPU connections

NVIDIA’s stated Vera CPU specifications

NVLink-C2C is the bridge between the CPU architecture and the broader AI system. The memory-coherent NVLink chip-to-chip link connects GPUs directly to Vera’s fabric. Beyond GPUs, the same chip-to-chip technology can scale Vera to multiple sockets, enabling high bandwidth between CPUs.

The stated performance result is 1.8 times the “agentic sandbox performance” of x86 CPUs. That figure connects the fabric and multi-socket scaling claim back to the workload premise: the CPU has to run the sandboxes, tools, code, and data paths that agents depend on while keeping GPUs utilized.
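The 1.8x figure and the headline's "80% faster" are the same claim in different units, and neither means tasks finish in 80% less time. The short sketch below shows the conversion, assuming the multiplier describes throughput on a fixed workload; the function names are illustrative.

```python
def percent_faster(speedup: float) -> float:
    """Throughput gain, in percent, for a given speedup multiplier."""
    return (speedup - 1.0) * 100.0


def percent_time_saved(speedup: float) -> float:
    """Reduction in time per task, in percent, for the same multiplier."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 - 1.0 / speedup) * 100.0


claimed = 1.8  # NVIDIA's stated agentic sandbox multiplier versus x86
print(f"{percent_faster(claimed):.0f}% more throughput")         # 80%
print(f"{percent_time_saved(claimed):.1f}% less time per task")  # 44.4%
```

Read this way, a sandbox job that takes 10 seconds on the x86 baseline would take roughly 5.6 seconds at the claimed multiplier.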

Vera is positioned across compute, networking, and storage

NVIDIA places Vera across several parts of an AI factory: standalone Vera racks, Vera systems tightly coupled to Rubin GPUs, and Vera BlueField-4 STX for context memory and AI storage.

Standalone Vera racks are described as running agent sandboxes, tools, code, and data pipelines. When paired with Rubin GPUs, Vera is said to keep accelerated workflows moving. Vera BlueField-4 STX is positioned for context memory and AI storage. NVIDIA reduces the system view to three words: compute, networking, storage.

The positioning is that Vera extends NVIDIA’s AI platform from GPU acceleration into the CPU-side orchestration work behind agents. Core count and virtualization are treated as the priorities of the previous cloud era; Vera is framed around branch-heavy runtimes, tool calls, sandboxed execution, memory bandwidth, coherent CPU-GPU links, and fabric-level communication across cores and sockets.
