Retrofitting Sovereign AI Turns Compliance Rules Into Architecture Rework

Bilge YücelAI EngineerTuesday, May 19, 202612 min read

Bilge Yücel of deepset argues that AI sovereignty is an engineering constraint that has to be designed into a system, not a legal or procurement requirement applied after deployment. She frames sovereign AI around control of data, models, infrastructure, and operations, and shows how retrofits expose hidden dependencies: jurisdiction-crossing data flows, model APIs embedded in application logic, managed services that masked operational work, and systems that cannot be traced or audited.

Sovereignty is an engineering constraint, not a policy slogan

Bilge Yücel defines sovereign AI as the ability of an organization to “design, deploy, and operate AI systems on its own terms.” For engineers, she translates that into a more concrete requirement: explicit control over data flow, model choice, infrastructure, observability, and operations.

That framing matters because sovereignty is often discussed as if it were a legal or procurement issue that can be handled after a system works. Yücel’s point is that the technical assumptions are embedded much earlier: where data is sent, which model API the application depends on, where compute runs, whether traces exist, and who can respond when the system fails.

She organizes the problem into four pillars. Each pillar asks a different control question: where data is stored and processed; who controls the running model and what is known about the origin of its training data; where compute happens; and whether the system is traceable, updateable, and operationally owned.

Pillar	Control question
Data sovereignty	Where is data stored and processed, and are access permissions respected?
Model sovereignty	Who controls the running model, and what is known about training-data origin?
Infrastructure sovereignty	Where does compute happen?
Operational sovereignty	Is the system traceable, who can update it, and who owns incident response?

Yücel’s four-pillar framing of sovereign AI

The practical conclusion is not that every AI system must be maximally sovereign. Sovereignty is a spectrum. Finance, healthcare, and other high-stakes deployments may need fully air-gapped systems. A startup or enterprise in a less sensitive domain may not need to satisfy every sovereignty pillar at the highest level. But every team should know its actual level of control and its actual degree of vendor lock-in before sovereignty becomes a requirement imposed from above.

The important thing here is you need to know the level of control. So the level of vendor lock-in you have with your system.

Bilge Yücel · Source

Data sovereignty breaks when sensitive data leaves the trusted jurisdiction

Data sovereignty starts with the rule that data should be stored and processed within trusted jurisdictions to meet compliance requirements. In Yücel’s example, GDPR says European citizen data should stay within Europe. If that data is sent to an embedding API hosted in Virginia, she says the organization has already lost control of the data in the sense relevant to sovereignty.

The example is deliberately ordinary. The sovereignty problem does not require an elaborate agent or a visible frontier-model call. A single embedding request can cross the boundary. Yücel’s warning is that sovereignty constraints attach to data flows wherever they occur in the system.

Data sovereignty also includes access permissions inside the organization. Yücel separates this from jurisdictional storage and processing, but treats it as part of the same control problem. If users in an organization can access data they are not supposed to see, that is also a breach of data sovereignty. The issue is not only whether the data crossed a national or regional boundary; it is whether the system respects the organization’s rules about who may use which data.

That becomes more difficult when teams retrofit sovereignty into an existing system. Moving private data into the required jurisdiction may sound like a storage migration, but Yücel says it can leave teams managing multiple databases and instances. Search then becomes an architectural question. Should the application classify the query first and route it to the right database? Should it send the request to multiple stores and merge the results? The data-location decision cascades into retrieval design, query routing, and ongoing maintenance.

Infrastructure choices trade control against convenience

Bilge Yücel describes infrastructure sovereignty as the question of where compute happens. The application layer — RAG pipelines, ingestion, agents, tools, and related services — all run somewhere. That location and control model determine the sovereignty posture of the system.

She presents the options as a spectrum from maximum control to maximum convenience. At one end is an air-gapped environment: on-premises, no egress, and the highest degree of control. Her slide labels that option “EU AI Act safe.” A private cloud or dedicated VPC is labeled “GDPR safe.” Sovereign cloud depends on the provider. Hybrid architectures combine local systems with gated APIs and therefore require data gating. SaaS sits at the convenience end, but Yücel flags the risk that full vendor dependence can expose the system to CLOUD Act concerns when the provider is headquartered in the United States.

Infrastructure pattern	How Yücel characterizes it	Sovereignty tradeoff
Air-gapped	On-premises, no egress	Maximum control; slide labels it “EU AI Act safe”
Private cloud	VPC or dedicated environment	Slide labels it “GDPR safe”
Sovereign cloud	EU-operated infrastructure	Depends on provider
Hybrid	Local systems plus gated APIs	Requires data gating
SaaS	Full vendor model	Maximum convenience; CLOUD Act risk

The infrastructure sovereignty spectrum shown in Yücel’s slide

Her specific example is a US-headquartered company operating infrastructure in Europe. Even if the application and data are running in Europe, she says the provider may have an ability to access the running data, which conflicts with the organization’s sovereignty goals. The point is not that every SaaS system is unusable. It is that the control question cannot be answered only by asking where the data center is located.

The retrofit cost appears when a team replaces managed infrastructure with on-premises infrastructure. This is often the moment the team discovers how much vendor lock-in it had. Cloud providers and managed platforms may have been handling Kubernetes cluster management, networking, deployment details, and operational services that now become the team’s responsibility.

Hardware limitations also become visible. A system may have an application layer running on CPU while model inference runs on GPU. The team then has to manage the connection between them, including network and systems concerns that were previously hidden by managed infrastructure. Sovereignty can therefore move work from a vendor boundary into the engineering team’s backlog.

Model sovereignty is about swapability before it is about ideology

Model sovereignty asks who controls the model and what is known about the origin of the training data. The most immediate engineering requirement is freedom to choose and switch models.

If a system works only with one specific model provider, it is tightly coupled to that provider. If the API goes down, the application loses access. If the provider raises prices, the application inherits a cost issue. Yücel treats both as violations of the model-sovereignty idea because the organization cannot operate the AI system on its own terms.

Even nominal support for multiple models may not be enough. A team might not be contractually tied to one model provider, but if the codebase cannot swap models without architectural changes, the system is still practically constrained. The standard is not merely “could we use a different model eventually?” It is whether the application can switch models without changing application logic.

Training data origin is the least resolved part of the model-sovereignty pillar in Yücel’s account. She calls it controversial because teams often do not have a way to know exactly where a model was trained or which data was used to train it. Still, she says a European model provider may have an advantage over a US-based provider in this respect.

The cost of retrofitting model sovereignty is substantial. A likely first step for a team told to “make this working system sovereign” is to replace a frontier API with a self-hosted model. That swap is not a drop-in replacement in Yücel’s description. The team may need to translate API logic to the new model architecture, update prompts, evaluate system performance from scratch, and write a significant amount of code.

That phrase — evaluating performance from scratch — is central to her warning. Sovereignty is not just a hosting change. In Yücel’s account, changing the model can force teams back into the work of adapting prompts, application logic, and performance evaluation rather than preserving the assumptions of the original system.

Operational sovereignty requires traces, versioning, and human control

Bilge Yücel treats operational sovereignty as the work of monitoring, evaluating, and managing AI systems over time. Building the system is only one part of sovereignty; maintaining it in production is another.

The system must be monitored in production, including model inputs and outputs. In high-stakes environments such as HR or finance, Yücel says human-in-the-loop is incorporated. Updates to models and the application layer also need to be managed in a controlled, auditable way.

The retrofit problem is sharp for teams that did not build observability into the system from the beginning. If observability was absent until the moment sovereignty became a requirement, Yücel calls that an important issue already. Adding observability later can reveal that the AI application layer is effectively a black box. The team may not know what is happening inside the system, even though it now needs to log behavior somewhere compliant and make the system auditable.

Version control becomes part of the same problem. The question is not only whether ordinary source code is in version control, but how the AI application layer — prompts, model choices, pipelines, tools, guardrails, and other configuration — can be understood historically. If a system produced a problematic result, the team needs to reconstruct which version of the application ran, which model was used, what inputs and outputs occurred, and where those logs were stored.

Yücel’s checklist later condenses this into one question: do you have reproducible run logs, stored in a compliant way? Logs that exist but cannot be tied to the actual application version, model configuration, or jurisdictional requirements do not provide the operational control sovereignty demands.

Orchestration cannot solve every sovereignty problem, but it can reduce hidden coupling

Yücel uses Haystack, deepset’s open-source orchestration framework, as the practical answer to several of the breakages she identifies. She is explicit that a good orchestration framework cannot solve GPU limitations. But she argues that it can help with model swapability, explicit data flow, versioning, extensibility, and traceability.

The first claimed benefit is a consistent interface. If an application needs to move from a cloud model to a self-hosted model, Yücel says Haystack can make that transition possible by changing only a few lines of code, allowing the team to focus on hardware and connection issues rather than rewriting application logic.

The second is explicit data flow. In a Haystack application, she says every input and output is typed and declared. A team can read the pipeline definition or application and understand what data moves where. For less deterministic architectures such as agents, Yücel says tool calls and tool outputs are still traceable.

The third is serialization to YAML. Haystack applications can be turned into YAML and placed under version control. That means the team can return to a historical commit and inspect the application definition associated with it. In sovereignty terms, the pipeline itself becomes part of the auditable artifact, rather than something distributed across implicit runtime state or informal configuration.

The fourth is openness and extensibility. Yücel emphasizes that Haystack is open source, with “no black box” and “no hidden assumptions.” If a team needs to customize a component or extend code, it can do so because the underlying behavior is visible. In her framing, black-box dependencies are sovereignty liabilities: they are places where the organization does not know, or cannot change, what the system is doing.

A sovereign agent architecture makes control points explicit

Bilge Yücel demonstrates a high-level sovereign agent architecture built around guardrails, an agent, tools, observability, and storage. The design is not presented as a universal blueprint. It is an example of how sovereignty requirements change the shape of an AI system.

The first layer is an input guardrail. It checks user input for prompt injection and for organization-specific regulatory constraints. If the input is unsafe, it exits the application layer. If it is safe, it proceeds to the agent.

The agent itself is an LLM with a system prompt and a set of tools. Those tools can include API calls, database search, other agents in a multi-agent system, and MCP servers. The agent performs the requested work, then a final output guardrail runs compliance checks before anything is returned to the user. Yücel’s reason for this second guardrail is straightforward: the system should not leak sensitive information to the user.

The architecture allows different sovereignty choices for different parts of the system. A team might still use proprietary models for public data while requiring local or self-hosted models for guardrails, knowledge-base operations, or the main LLM. Yücel’s slide includes local and self-hosted options alongside proprietary APIs such as Gemini, OpenAI, and Claude. The point is not that one category is always acceptable and another is not; it is that the architecture should permit task-specific choices rather than forcing one provider across the system.

Traceability is another explicit layer. Yücel says Haystack can expose every input and output to a component, as well as traces inside the agent. Those spans can be connected to an LLM observability tool, and because Haystack has OpenTelemetry integration, teams can also implement their own observability. For storage, she highlights providers that offer both cloud and open-source versions, so teams can host them locally or on-premises when required.

Architecture element	Purpose in Yücel’s example
Input guardrail	Classifies prompt injection risk and organization-specific regulatory checks before the request reaches the agent
Agent	Combines an LLM, system prompt, and tools to perform the requested work
Tools	Can include API calls, database search, other agents, MCP servers, ingestion workflows, or RAG pipelines
Output guardrail	Runs compliance checks before the answer is returned to the user
Observability	Captures component inputs and outputs and agent traces
Storage	Uses providers that can support cloud or locally hosted deployments

Control points in the sovereign agent architecture Yücel presents

Her implementation details illustrate the same control pattern. The input guardrail uses a model and router to classify requests into “unsafe” and “safe” paths, with only the safe route continuing. Tools are scoped rather than handed wholesale to the agent: a locally hosted MCP server may expose many tools, but the agent is given only the ones it needs, such as knowledge-base search or PDF report generation. Other tools can come from Python functions, Haystack components, other agents, ingestion workflows, or RAG pipelines.

When the tool catalog grows, the design avoids stuffing every tool definition into the agent context. Yücel shows a searchable tool set using BM25 for dynamic tool search. The sovereignty point is not BM25 itself; it is that tool access can be selected, traced, and managed instead of treated as an unbounded capability.

Human-in-the-loop control appears through confirmation strategies. For a payment-submission tool, the agent should always ask for human approval. For a tool that lists payment requests, the agent may ask once and then reuse permission within the cycle. That is operational sovereignty expressed in application logic: the agent can act, but some actions require explicit human confirmation.

The final pipeline connects a tracer for LLM observability, the input guardrail, the agent, and the output guardrail. Yücel’s example request is to pull outstanding payment requests for Q3 and generate a PDF expenditure report for the Permanent Secretary. The system routes the request through the guardrails, agent, tools, and observability layer, then returns a saved PDF. The example is less about the report than about the architecture: control points are visible, auditable, and replaceable.

The sovereignty checklist is short because the hidden dependencies are not

Yücel closes the technical argument with three questions for evaluating whether an AI system is sovereign enough for its context.

First: can you swap models without changing the application logic? This tests whether model choice is abstracted as a controlled dependency or embedded throughout the application.

Second: do you have reproducible run logs stored in a compliant way? This tests whether operational behavior can be audited after the fact, and whether the audit trail itself respects the relevant jurisdiction and storage requirements.

Third: can your team respond to an incident without calling a vendor or hyperscaler? This tests whether the organization actually owns operations, or whether sovereignty depends on a third party being available and willing to intervene.

These questions map back to the failures Yücel identifies throughout the talk. A system that cannot swap models is exposed to provider downtime, pricing changes, and sovereignty demands it cannot quickly meet. A system without compliant, reproducible logs cannot be audited in the way high-stakes deployments require. A system that needs a hyperscaler for incident response may be convenient, but it is not fully under the organization’s operational control.

AI Application Architecture Inference and Deployment AI Security AI Governance and Regulation Agents and Autonomy AI Infrastructure and Compute