Network Identity Moves Agent Credentials Out of the Sandbox

Remy GuercioAI EngineerMonday, June 1, 202612 min read

Remy Guercio of Tailscale argues that many agent sandboxes protect the runtime while leaving the more dangerous object inside it: the credential. In his account, Aperture, Tailscale’s LLM gateway, separates execution isolation from access control by keeping provider keys at the network layer and giving the agent only a placeholder. Routed through Tailscale’s WireGuard-based identity network, each LLM call carries a verified user, group, or machine identity, giving Aperture a central point for policy, logging, cost controls, hooks, and visibility into tool use.

The weak point in many agent sandboxes is not execution isolation, but credentials

Remy Guercio framed the problem with agent sandboxing as a separation that current designs often blur. A sandbox has, at minimum, two parts: a boundary and a set of permissions. The boundary is the thing that distinguishes inside from outside. The permissions are what make the sandbox useful at all; without identity or authorization, he said, it is “a sandbox without any toys.”

For AI agents, Guercio’s concern is that teams often solve the boundary problem with a VM, container, GitHub Actions runner, or similar isolated runtime, but still put the permissioning material inside that runtime. The agent may be boxed in, but the thing that gives it access is in the box with it.

The usual mechanisms are API keys or OAuth/OIDC. Guercio described API keys as the approach major model labs would like developers to use, because it keeps usage tied to provider billing, but he argued that an API key does not really separate authentication from authorization. It is a bearer credential that grants access to some set of models or services. He described OAuth or OIDC as “maybe the more cost-effective way,” but emphasized that in the common agent setup the login, account, or credentialed session is still happening inside the sandbox.

That is the failure mode Guercio wanted to isolate: not that the container fails as a container, but that the agent is holding material it can misuse. Even synthetic credentials can be used creatively by models, particularly if an agent runs in a loop long enough. A fully isolated runtime can still contain a credential the agent can exfiltrate, share, misuse, or use while trying alternate endpoints.

You have a single key on Aperture. You can then write all your rules in Aperture and then on the other side there’s actually no key.

Remy Guercio

The alternative he demonstrated is to move identity and access control out of the agent runtime and into the network path. The agent still receives something that satisfies client software expecting a key — in the Claude Code configuration he showed, the value was just a dash — but the real provider key sits elsewhere. The sandbox has no provider key to leak.

Tailscale’s network identity becomes the authorization layer

Remy Guercio’s proposal depends on how Tailscale uses WireGuard. WireGuard provides keys for nodes on a network; Tailscale adds identity on top. In his description, a Tailscale connection is not just “this IP address connected to that IP address.” Each connection can carry information about the entity making it: a logged-in user, the groups associated with that user through synced identity data, or tags attached to non-human systems.

That means a laptop, phone, GPU server, container, GitHub Actions runner, or service can be treated as a node whose network connections carry identity. A human user may appear as a user identity with group membership. A PR review bot running in CI may appear as a tagged machine, such as a bot for a particular project or purpose.

The important property is that access can be governed before the application has to trust a credential presented from inside the sandbox. Guercio said Tailscale can decide whether a node is allowed to talk to something at all, based on identity attached to the connection. The service on the other side can also receive that identity information, rather than infer trust from an IP address and then perform a separate API-key check.

In the agent case, Guercio used a GitHub Actions runner as the example sandbox. When the runner starts, it can use GitHub’s federated OIDC to join the tailnet and receive a Tailscale tag. That tag becomes the basis for what the runner can do through Aperture, Tailscale’s AI gateway. The credential-bearing relationship is no longer “agent has API key, agent calls model provider.” It is “tagged network identity calls gateway, gateway decides what that identity can do.”

Aperture puts the provider key at the gateway, not inside the agent

Remy Guercio demonstrated Aperture as an LLM gateway deployed as a node on the Tailscale network. From the application side it looks like a conventional gateway: teams can configure providers such as Anthropic, OpenAI, Gemini, Vertex, or Bedrock, all of which he named in the presentation, and point client tools at an Aperture endpoint. From the security side, the gateway reads Tailscale identity from the network connection and applies grants, quotas, hooks, and policies.

The provider key lives on Aperture. The agent or sandboxed runner does not receive it. In the Claude Code setup Guercio showed, the configuration pointed Claude Code at an Aperture base URL and supplied "-" as the key so that the tool would still run in API-key mode without possessing a real key.

The on-screen configuration reduced the agent-side setup to an endpoint and a placeholder credential:

{
  "apiEndpoint": "aperture",
  "keys": {
    "aperture": "-"
  },
  "APERTURE_BASE_URL": "https://api.aperture.tailscale.com/v1"
}

He emphasized that this works with existing AI development tools by changing the endpoint rather than rewriting the agent harness. He named Claude Code, Codex, and Gemini CLI as examples of tools that can be pointed at the gateway.

That design changes the enforcement point. If access is revoked or a policy denies the request, the agent cannot fall back to the real provider key because it never had one. Guercio contrasted this with a situation where an agent sees that “the key no longer works” and tries another endpoint or workaround. In the Aperture design, he said, “it literally is like, oh, key no longer work, it’s just a dash.”

When LLM access is routed through Aperture, the gateway records that path

A large part of Remy Guercio’s demonstration was not just policy enforcement, but observability for the traffic that goes through the gateway. Aperture showed dashboards for requests, tokens, estimated cost, model usage, user agents, and individual request details. In his demo instance, he could see usage by his own user identity and by tagged automation such as a PR review bot.

He drilled into a Claude Code request and showed request headers, request body, response body, and the context Claude Code sends at the start of a session. He noted that even a small prompt carries the overhead of the tool’s initial context. One screen showed a request that cost about 20 cents for a simple short-story prompt because of the context Claude Code sent with it.

The operational point was not the prompt itself. If a pipeline depends on an agent producing a specific structure, or if a PR review bot silently makes a bad decision, a team needs to know what was sent, what came back, when it happened, and under which identity. When LLM access is configured to go through Aperture, that record is created at the gateway for the model interactions routed through it, rather than reconstructed only from inside the agent runtime.

Identity or view	Requests	Total tokens	Estimated cost
Demo dashboard, last 30 days	112	470,053	$2.25
Dogfood bot, last 30 days	8,025	408,057,998	$340.70

Aperture dashboard metrics shown during the demo

For the tagged dogfood bot, the dashboard showed 8,025 requests, 408,057,998 total tokens, and an estimated cost of $340.70 over the last 30 days. Another logs view showed a PR review bot identity with hundreds of Claude Code requests and tens of millions of total tokens. Guercio used these screens to show that Aperture can attribute routed agent activity to machine identities, not only human users.

Tool-call visibility is the reason Tailscale chose the LLM layer

Remy Guercio repeatedly returned to one claim: for agents configured to route their LLM requests through Aperture, the gateway can capture the tool-use metadata represented in those requests. In the demo, Aperture extracted bash commands, file reads, grep calls, and MCP calls from Claude Code sessions.

He showed a PR review bot session in which the gateway displayed an MCP call to update a code-review comment, bash commands, grep, and another comment update. In another view, the captured commands included git rev-parse HEAD, a git diff --name-only ... command, grep commands looking for TODO/FIXME/HACK/XXX markers, cat ./README.md, git log, and git checkout -b fix-todos.

Guercio described this as a consequence of working at the LLM layer while using the network as the path for configured LLM access. The capture does not depend on instrumentation inside the container or agent harness. If the tool call is represented in the model interaction that crosses the gateway, Aperture can extract it. His formulation was that he has “a sort of a guarantee” that he has seen every tool call that went through the instance.

The Q&A turned to cases where agents write or execute code rather than rely on structured MCP calls. An audience member asked how permissioning works when agents move away from MCP and tool calls toward executing code, which can make behavior harder to parse. Guercio acknowledged that this is more complicated. Even with skills or code, he said, agents are typically still running something, but an agent could write an obfuscated thing and then run it step by step.

His answer was pragmatic. Many organizations, he said, are not yet at the point of blocking specific tool use; they first want to know what tools people are using. He said that internally at Tailscale, bash dominates over other tool types. Aperture’s value, in that context, is to expose the commands represented in the agent’s LLM traffic when that traffic is routed through the gateway, and to create a place where guardrails can be added.

Internally, if you were to look at our actual instance, bash dominates everything else.

Remy Guercio · Source

He gave rm -rf / as the kind of command a guardrail might flag or block. But he did not present Aperture as already solving every possible obfuscation problem or observing arbitrary behavior outside its path. The design choice was to observe and control at the LLM gateway because MCP-only enforcement would miss too much of how agents actually behave.

Permissions can live in Aperture grants or in Tailscale policy

When asked how permissions are configured — whether users can be allowed to make certain tool calls and not others — Remy Guercio said Aperture supports configuration in two places.

The first is Aperture’s own grants interface. In the settings view he showed, grants could be attached to entities and used for model access, quotas, MCP access, hooks, roles, and related controls. He said groups were being added to that interface as well.

The second is Tailscale’s broader policy file. Tailscale already uses an access-control file to define who can access what on the network. Guercio said application capabilities can be placed into that main ACL or access-control file and sent along with identity. The same mechanism that sends user identity, tags, and group membership can also send arbitrary metadata guaranteed by the Tailscale control plane.

For organizations operating at scale, the JSON representation and API matter as much as the visual editor. Guercio said many users want to put this kind of configuration into a GitOps workflow rather than manually clicking through a dashboard.

That distinction fits the broader architecture. Aperture consumes network identity and policy metadata, then applies them to provider access, MCP access, hooks, quotas, and other controls exposed by the gateway.

Cost controls and hooks become cross-provider controls

Because Aperture sits between identities and multiple LLM providers, Remy Guercio said cost controls can be expressed across providers rather than provider by provider. A budget can be set once and spent across the configured providers, rather than allocating separate limits inside each vendor account.

In the dashboard screens, Aperture displayed costs by model, total requests, input and output tokens, cached tokens, reasoning tokens, and estimated spend. Guercio showed his own usage and the usage of the dogfood bot, then moved to organization-level adoption views with daily usage and top users.

He also showed integrations and webhooks. For each tool call or captured event, Aperture can send a request to a third party with information about what happened. Guercio’s claim was that, for traffic routed through Aperture, those hooks can run on the events the gateway observes because the LLM request path goes through the gateway.

That makes the gateway potentially useful for monitoring and operational workflows as well as for developer debugging. The visible integrations page included categories such as security, monitoring, webhooks, and developer tooling, with examples including Splunk and GitHub.

Quotas are another control surface. In one settings view, Aperture showed a quota with a capacity of 500 requests, an interval of one day, and an onExceed behavior of reject. Guercio described this more generally as the ability to say a user or agent gets a certain amount per day and have the gateway enforce it.

Tailscale chose explicit endpoint configuration over transparent interception

An audience member noticed that Claude Code had been configured with Aperture as the base URL and asked whether the gateway could instead transparently capture the traffic at the network layer and substitute itself in.

Remy Guercio said Tailscale had discussed that option, and that it was technically possible. But he said it was not really “the Tailscale way.” His argument was that the product should make the intended path easy and explicit rather than hide interception under the surface. Transparent behavior can cause things to “break and shift and move,” and can become confusing when the ground changes underneath the user.

That answer clarifies the intended user experience. Aperture is not presented as a stealth proxy that silently takes over model-provider traffic. It is meant to be the straightforward way to provide LLM access to a sandbox, a developer machine, or an automation environment without distributing API keys. Developers point tools at the gateway; security and IT administrators get centralized controls, logs, tool-call visibility, and policy enforcement for the traffic that uses it.

Guercio described this as trying to serve both sides: developers who want to build with AI and administrators who want usable controls over what those tools do.

The underlying primitive is available beyond Aperture

Remy Guercio separated Aperture from the underlying Tailscale identity primitives. Aperture is a product built on those primitives, and he said it is available on Tailscale’s free plan. But the same capabilities are available to developers through TSnet, an open-source Go library that lets a program place itself on a tailnet and read the same identity information.

That matters for internal tools. Guercio said a team building an internal MCP server, API endpoint, or organization-only service does not have to open it to everyone or build a separate OAuth flow from scratch. It can use Tailscale identity to ask who made the request and pass that identity into the internal service or proxy.

He also said Aperture itself was built under a constraint: it had to be built on top of Tailscale, not by relying on private internal APIs. In theory, he said, someone could build a similar gateway using the same primitives.

The larger design claim is therefore not only that Aperture solves one LLM gateway problem. It is that network identity can become the substrate for application authorization, especially where agents need access but should not hold durable credentials. The sandbox boundary remains useful, but the credential no longer has to live inside it.

AI Application Architecture Inference and Deployment AI Security Agents and Autonomy Coding Assistants