Sarvam and NVIDIA Build Full-Stack Sovereign AI Infrastructure for India

Pratyush Kumar | NVIDIA | Monday, June 1, 2026 | 5 min read

Sarvam co-founder Pratyush Kumar argues that India’s AI sovereignty cannot mean putting Indian-language interfaces on foreign-built systems. In an NVIDIA-backed account of Sarvam’s work, he describes a full-stack effort to build foundational models, data pipelines, inference systems, and developer APIs inside India, using NVIDIA H100 clusters and NeMo tooling to process Indian-language data at scale. The case is that voice-first AI for India’s population requires domestic capability across data, models, applications, and accelerated-compute expertise.

Sarvam’s sovereign AI thesis starts with the full stack

Pratyush Kumar frames Sarvam’s work around a premise: AI is too important for a country the size of India to treat only as an imported capability. India, in his view, should be building AI “grounds up” inside the country, with sovereignty extending across the layers he names: datasets, models, applications, foundational research, training, and inference.

Kumar’s argument is that Indian AI sovereignty requires more than Indian-language interfaces on top of externally built systems. Sarvam’s model is to curate India-relevant data, train foundational models, expose production APIs, optimize inference, and cultivate developers who understand accelerated compute well enough to build rather than merely consume AI.

India having such a large developer base, I think should be building AI, not just consuming AI.

Pratyush Kumar

The language problem is central to why Sarvam exists. Kumar says the company took up Indian languages as an open source problem, with attention to “the nuances of languages” and the “long tail of challenges.” The on-screen language grid covers major and long-tail Indian languages in native scripts with language codes, including Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Assamese, Urdu, Sanskrit, Nepali, Dogri, Bodo, Punjabi, Odia, Konkani, Maithili, Sindhi, Kashmiri, Manipuri, and Santali.
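One of the nuances is that script and language do not map one to one: Devanagari alone is used for Hindi, Marathi, Sanskrit, Nepali, Dogri, Bodo, Konkani, and Maithili. The sketch below is a toy illustration of that point, not Sarvam's method. It guesses the dominant script of a string from Unicode character names; the function name `dominant_script` and the prefix list are assumptions made for this example.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Unicode character-name prefixes for scripts used by the languages listed above.
# Illustrative and not exhaustive; Odia characters are named "ORIYA" in Unicode.
SCRIPT_PREFIXES = (
    "DEVANAGARI", "BENGALI", "TAMIL", "TELUGU", "GUJARATI", "KANNADA",
    "MALAYALAM", "GURMUKHI", "ORIYA", "ARABIC", "OL CHIKI", "MEETEI MAYEK",
    "LATIN",
)

def dominant_script(text: str) -> Optional[str]:
    """Return the most frequent known script among letters and marks, or None."""
    counts = Counter()
    for ch in text:
        # Count letters (L*) and combining marks (M*); Indic vowel signs are marks.
        if unicodedata.category(ch)[0] not in ("L", "M"):
            continue
        name = unicodedata.name(ch, "")
        for prefix in SCRIPT_PREFIXES:
            if name.startswith(prefix):
                counts[prefix] += 1
                break
    return counts.most_common(1)[0][0] if counts else None

print(dominant_script("नमस्ते"))    # DEVANAGARI
print(dominant_script("வணக்கம்"))  # TAMIL
```

A result of DEVANAGARI narrows the candidates but does not pick a language, which is why language identification needs more than script detection.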

The practical implication is that Sarvam is not presenting Indian-language support as a thin localization layer. Kumar describes language diversity as part of the founding technical problem. The models, data pipelines, and application interfaces are meant to be shaped around that diversity from the beginning.

The platform is already operating at production scale

Sarvam’s work is not described only as research infrastructure. Kumar says the company already runs an API platform serving more than 4 million API calls per day, which he calls “by far the largest AI APIs effort out of India.”

4M+

API calls served per day by Sarvam’s API platform, according to Kumar
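To put the daily figure in per-second terms, the arithmetic can be sketched as follows. The 4 million figure is Kumar's; the peak-to-average factor is a purely hypothetical parameter added for illustration, since the source gives no information about traffic shape.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def average_rps(calls_per_day: float) -> float:
    """Average requests per second, assuming traffic is spread evenly over a day."""
    return calls_per_day / SECONDS_PER_DAY

def peak_rps(calls_per_day: float, peak_to_average: float) -> float:
    """Peak requests per second for an assumed (hypothetical) peak-to-average ratio."""
    return average_rps(calls_per_day) * peak_to_average

calls = 4_000_000  # lower bound quoted by Kumar
print(f"average: {average_rps(calls):.1f} requests/second")  # average: 46.3 requests/second
print(f"peak at an assumed 3x: {peak_rps(calls, 3.0):.1f} requests/second")
```

Real traffic is rarely uniform, so the average is a floor on the capacity the platform must handle, not an estimate of its peak load.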

The on-screen Sarvam AI API Reference page lists available APIs for chat completion, text translation, speech-to-text, speech-to-text translation, text-to-speech, transliteration, and language identification. It also says Sarvam provides “Models & APIs across the stack” for developers building applications. That visible product surface is narrower than the whole sovereign-AI claim, but it matters because Kumar’s operating argument depends on connecting research, models, inference, and developer access rather than treating them as separate layers.

Kumar says Sarvam uses “the entire NVIDIA stack” for both training and inference, and that working across the platform, model, and application layers gives the company “a lot more levers of optimization and quality.” The technical claim is not that a single model checkpoint solves the Indian-language problem. It is that better results come from linking curated data, model training, inference, and developer-facing APIs.

NVIDIA’s description of the work adds a specific infrastructure claim: Sarvam is partnering with NVIDIA to architect India’s sovereign AI infrastructure using NVIDIA’s hardware and software stack. It says Sarvam is training foundational models from scratch on NVIDIA H100 GPU clusters using NVIDIA NeMo and Megatron-LM, and is natively processing more than 2 trillion authentic Indian-language tokens.

Kumar describes a broader data-processing pipeline in his remarks. He says Sarvam has trained large language models from scratch and that “tens of trillions of tokens,” “millions of hours of audio,” and “billions of images” have flowed through NeMo Curator. The two scale claims sit at different levels: NVIDIA’s description refers to more than 2 trillion authentic Indian-language tokens in the sovereign-AI infrastructure effort, while Kumar’s claim covers multimodal data passing through Sarvam’s curation systems.

Data curation is treated as infrastructure, not preprocessing

For Sarvam, Kumar says, “we start with data.” That means building curation pipelines to ensure data quality before the training process begins. He says Sarvam has been using NVIDIA NeMo Curator extensively for this work.

The scale he describes is large and multimodal: tens of trillions of tokens, millions of hours of audio, and billions of images. Kumar says all of that has flowed through NeMo Curator, and that Sarvam has come to understand both the tool’s scaling properties and the value it brings.
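As a rough picture of what "curation before training" involves, here is a minimal sketch of two common steps, a heuristic quality filter and exact deduplication. It is written in plain Python and does not use or represent the NeMo Curator API; the function names and thresholds are assumptions chosen for this example, and production pipelines run such steps in distributed form with many more filters.

```python
import hashlib
import unicodedata
from typing import Iterable, Iterator

def normalize(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())

def passes_quality(text: str, min_words: int = 5, min_letter_ratio: float = 0.6) -> bool:
    """Keep documents that are long enough and mostly letters or combining marks.

    Marks are counted because Indic vowel signs are combining characters.
    Both thresholds are illustrative assumptions.
    """
    if len(text.split()) < min_words:
        return False
    non_space = [ch for ch in text if not ch.isspace()]
    letters = sum(1 for ch in non_space if unicodedata.category(ch)[0] in ("L", "M"))
    return bool(non_space) and letters / len(non_space) >= min_letter_ratio

def curate(docs: Iterable[str]) -> Iterator[str]:
    """Yield normalized documents that pass the filter, dropping exact duplicates."""
    seen = set()
    for raw in docs:
        text = normalize(raw)
        if not passes_quality(text):
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        yield text

sample = [
    "नमस्ते दुनिया यह एक परीक्षण है",
    "नमस्ते  दुनिया यह एक परीक्षण है",   # duplicate after whitespace normalization
    "1234 5678 !!!! ???? ----",          # enough words, but no letters
    "hi",                                # too short
]
print(list(curate(sample)))
```

Only the first sample survives: the second collapses to the same text after normalization, and the last two fail the quality heuristic.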

A code-editor visual appears alongside the training discussion, showing a configuration file and terminal logs. The visible configuration includes a Llama-style causal language model architecture, 32 hidden layers, 32 attention heads, hidden size 4096, 8192 maximum position embeddings, and vocabulary size 128,256. The frame gives a concrete sense of the training environment being discussed, but it is not identified on screen as the specification for a named Sarvam model.
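Two quantities follow directly from the visible numbers, and the sketch below works them out. The field names follow the common Hugging Face Llama configuration naming and are an assumption about how the on-screen file labels them. A total parameter count is not derived, because the feed-forward width and key-value head count are not among the values reported here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VisibleConfig:
    """Values reported from the on-screen configuration; field names are assumed."""
    num_hidden_layers: int = 32
    num_attention_heads: int = 32
    hidden_size: int = 4096
    max_position_embeddings: int = 8192
    vocab_size: int = 128_256

    @property
    def head_dim(self) -> int:
        """Per-head dimension: the hidden size split evenly across attention heads."""
        if self.hidden_size % self.num_attention_heads:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        return self.hidden_size // self.num_attention_heads

    @property
    def embedding_params(self) -> int:
        """Parameters in one token-embedding matrix (vocab_size x hidden_size)."""
        return self.vocab_size * self.hidden_size

cfg = VisibleConfig()
print(cfg.head_dim)          # 128
print(cfg.embedding_params)  # 525336576
```

At this vocabulary size, a single embedding matrix accounts for roughly 525 million parameters.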

NVIDIA’s Nemotron and NeMo materials provide the surrounding tooling context. The Nemotron diagram describes “Open Models, Data, Libraries for Agentic AI,” with components including Nemotron models, NeMo Data, Megatron Core, NeMo RL & Gym, NeMo Evaluator, and NeMo Agent Toolkit. It also lists 10 trillion pretraining tokens, 40 million post-training samples, 100,000 reinforcement-learning tasks, and 15,000 safety traces in the Nemotron context. Kumar’s explicit Sarvam usage claims are narrower: Sarvam uses NeMo Curator, the NeMo framework, NeMo RL, and NVIDIA’s training and inference stacks.

Training, reinforcement learning, and inference are one accelerated-compute problem

Pratyush Kumar says Sarvam uses the NeMo framework for training: pre-training, fine-tuning, and reinforcement learning. He singles out reinforcement learning as producing “consistent dividends at scale,” and says Sarvam has been using the NeMo RL framework for that work.

Inference is also part of the same architecture. Kumar says Sarvam has been doing inference with its models “at some fair scale” and has used NVIDIA training and inference stacks extensively, primarily on Hopper-series GPUs. NVIDIA’s description specifies H100 GPU clusters; Kumar’s remarks refer more broadly to the Hopper series.

The software stack is presented as a layered architecture: hardware drivers at the bottom, then CUDA, deep learning and reinforcement-learning frameworks, data, recipes, model checkpoints, and inference-serving software at the top. Kumar’s point is that developers need to become fluent in this stack because generative AI models will increasingly sit in the path of what they build.
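The layering can be written down as an ordered list, which makes it easy to ask what sits beneath any given layer. This is only a restatement of the order described above; the labels are informal and `layers_beneath` is a hypothetical helper, not part of any NVIDIA tool.

```python
from typing import List

# Bottom-to-top order as described in the text; labels are informal.
STACK_LAYERS: List[str] = [
    "hardware drivers",
    "CUDA",
    "deep learning and reinforcement-learning frameworks",
    "data",
    "recipes",
    "model checkpoints",
    "inference-serving software",
]

def layers_beneath(layer: str) -> List[str]:
    """Return every layer below the given one, bottom first.

    Raises ValueError if the layer is not in STACK_LAYERS.
    """
    return STACK_LAYERS[: STACK_LAYERS.index(layer)]

print(layers_beneath("model checkpoints"))
```

Read this way, a developer who only calls an inference endpoint touches the top layer, while the fluency Kumar asks for reaches down through the six beneath it.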

That is also where the sovereignty argument meets a workforce argument. India’s developer base, in Kumar’s telling, should not be limited to consuming AI services. It should develop expertise in accelerated compute software. He describes the NVIDIA stack as an example of where developers can build that expertise and says he sees it as “the core of development going ahead.”

The ambition is population-scale AI for India’s diversity

With NVIDIA, Pratyush Kumar says, Sarvam wants to build models that “represent the diversity of India” and serve them at scale, so the effort is “population-scale” rather than something used by only a small group.

A Sarvam team photograph carries the line “Powering India’s AI Together.” It underscores the proposition running through Kumar’s remarks: sovereign AI is treated as a full-stack project spanning data, models, training frameworks, inference systems, APIs, applications, and developer capability.

