Enterprises Face a 100,000-Agent Governance Problem
Barndoor AI co-founder and CEO Oren Michaels argues that enterprises are approaching a governance problem created by AI agents that can act across Salesforce, Slack, email and other workplace systems. In a conversation with Craig Smith, Michaels says connectivity protocols such as MCP have made it easier for agents to reach enterprise tools, but have not solved the harder question of what a given agent should be allowed to do for a given task. His central claim is that companies will need a separate control layer to manage thousands of task-specific agents, because traditional identity systems assume human judgment that agents do not have.

Agents make the trust problem operational, not theoretical
Oren Michels defines an AI agent by whether it takes action. A chat interface can suggest that a human update Salesforce, send an email, summarize a Slack thread, or log a call. An agent, in his usage, does the work directly inside the same enterprise systems employees use: Salesforce, email, Slack, Snowflake, QuickBooks, calendars, documents, and internal tools.
That distinction is why the enterprise problem changes. To let an agent act, companies need two things at once: connectivity to tools and confidence that the agent will use those tools appropriately. Michaels says the first problem has become much easier. Model Context Protocol, or MCP, created a common way for models to connect to tools and APIs. But MCP, in his framing, is “merely a pipe.” It helps an AI reach a system. It does not decide whether a particular agent, on behalf of a particular person, for a particular task, should be allowed to perform a particular action on particular data.
That gap is where Michaels places Barndoor AI. The company’s premise is that enterprises will not trust autonomous agents unless those agents are governed at the level of the task. An agent should be given only the capabilities relevant to the job it is doing, so that if it hallucinates, misunderstands an instruction, or “goes off the rails,” its guardrails keep it inside a narrow operational lane.
The scale problem follows from that definition. A company will not have one agent. Each employee may use many agents, and those agents may change as new models and tools appear. Each task may require different rules. The same employee may need one agent that can read a transcript and log a sales call, another that can create contacts from a conference attendee list, another that can summarize unread Slack messages, and another that can draft but not send external communications. Multiply that across a large company and Michaels sees “tens, hundreds of thousands of agents” requiring governance.
The number is a warning that administering agents by hand will not scale. If every useful new agent requires reconnecting tools and rewriting rules, the enterprise stack becomes brittle. Michaels expects new agents to arrive continually, including powerful tools that were not anticipated months earlier. His argument is that companies need a governance layer independent of any one agent, model, or application, so they can swap capabilities in and out without rebuilding the trust architecture each time.
Venn, Barndoor’s product for individual users, is meant to expose that pattern at one-person scale. Michaels describes it as “everything I just described to you, but for one person.” A user can connect an AI to email, Slack, calendar, and other personal or work tools, subject to whatever corporate security policies allow, and experience what governed agentic AI feels like before trying to bring that pattern into an organization. His belief is that individual users who discover useful workflows will carry those lessons back to their companies, making enterprise adoption more concrete.
The old identity stack assumes human judgment
Existing identity and access management systems already decide who can enter a system, what database they can query, and what application permissions they hold. Deterministic software has long been controlled through identity, API permissions, and access rules. Craig Smith raises that existing stack as the natural comparison point for agentic AI.
Michaels’ answer is that identity systems assume the governed actor is a human. A person brings judgment, incentives, fear of consequences, and a history of behavior. Michaels gives a simple example: as Oren, he may be technically allowed to delete Salesforce opportunities, but he knows it would be stupid, and he knows that doing it repeatedly would get him fired. That knowledge is itself a governance layer, even if it is not expressed in access-control settings.
That is also why enterprise access is often intentionally over-provisioned. Companies do not want employees to file a ticket every time they need one additional Salesforce capability. If someone has been hired into a role and shown reasonable trustworthiness, the system gives broad latitude and relies on the person not to misuse it.
Agents break that assumption. A company may not want to reduce Michaels’ own permissions just because he is using AI. But the AI acting on his behalf does not have his judgment. It may misunderstand a prompt. It may infer a next step that seems plausible but is wrong. It can also act much faster than a person, causing damage at machine speed.
Michaels therefore argues that an AI acting for a human needs a smaller “blast radius” than the human. The right posture is closer to how a manager handles a new intern: assign low-risk work first, watch closely, expand responsibility as competence is demonstrated, and keep consequential actions constrained until trust has been earned.
“They will absolutely give you an answer. They will absolutely do something when you tell them to.”
His “enthusiastic intern” analogy is not a claim that agents are useless. It is a claim that usefulness and reliability diverge under uncertainty. An intern may be eager, fast, and occasionally impressive, but the manager still starts with limited authority. Michaels sees many people following that same path with AI, sometimes consciously and sometimes not: first using it as “glorified search,” then asking it to write or interpret, then letting it inspect connected systems for a bounded task.
One example comes from inside his company. A colleague receives so many Slack messages that, before logging off, she asks her AI to look at all unread Slack messages and identify the five important ones to answer before the end of the day. That requires the AI to be connected to Slack. It is also bounded: the agent is reading and prioritizing, not posting broadly, deleting records, or changing business systems.
Smith raises automation bias: when a system works correctly ten times in a row, people stop checking, even if it remains wrong 3% of the time. Michaels does not expect humans to remain vigilant forever. He expects checking mechanisms, including other agents, to become part of the architecture. He compares this to longstanding IT and software practices: nightly programs that validate data, engineers reviewing pull requests, bots running tests, editors and fact-checkers reviewing journalism. If agents do autonomous work, their work will also need QA.
| Governance question | Human IAM assumption | Agent governance requirement |
|---|---|---|
| Who is acting? | A known employee with a role and identity | An AI acting for a person, system, or task |
| Why broad access works | Humans bring judgment, incentives, and fear of consequences | Agents need narrower permissions because they lack that judgment |
| Where risk appears | Unauthorized people or insiders exceeding allowed permissions | Authorized agents doing allowed actions in the wrong context |
| How permissions should narrow | By role, system, and identity | By identity, task, tool, data, action, and acceptable failure mode |
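
Smith's automation-bias point can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that each agent run fails independently with the 3% error rate Smith mentions; real failures are unlikely to be independent, and the function names are hypothetical.

```python
def prob_all_correct(error_rate: float, runs: int) -> float:
    """Probability that `runs` independent runs all succeed."""
    return (1.0 - error_rate) ** runs


def prob_at_least_one_error(error_rate: float, runs: int) -> float:
    """Probability that at least one of `runs` independent runs fails."""
    return 1.0 - prob_all_correct(error_rate, runs)


if __name__ == "__main__":
    # 3% is the error rate from Smith's example; independence is an assumption.
    for runs in (10, 100):
        print(
            runs,
            round(prob_all_correct(0.03, runs), 3),
            round(prob_at_least_one_error(0.03, runs), 3),
        )
```

Under these assumptions, ten clean runs in a row occur about 74% of the time even at a 3% error rate, so a streak of successes says little about reliability. Across 100 runs, at least one error is about 95% likely, which is why Michaels expects checking to be built into the architecture rather than left to human vigilance.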
The dangerous case, in Michaels’ account, is authorized action with bad consequences
The new security vector Michaels describes does not replace traditional security problems. Security teams still need to keep outsiders out and prevent insiders from exceeding their permissions. Agentic AI adds a third case: someone inside the company, allowed to be there, using a tool they are allowed to use, taking an action that is technically permitted, and still causing harm.
That is the case Barndoor is designed to govern. Michaels says the company is not trying to “get inside” the AI’s metaphorical head and govern its intentions. He doubts that is feasible, in the same way it is not feasible to govern a person by directly inspecting their thoughts. Instead, he argues that governance should focus on what goes in, what comes out, and what actions the agent is permitted to take.
The practical policies he describes range from broad corporate prohibitions to narrow task-level rules. One customer’s first policy was that no AI could post to any Slack channel whose name begins with “#EXT,” because those channels include people from outside the company. The point was not to ban AI in Slack; it was to keep the early blast radius internal. Another broad rule: no AI should post to anything writable, such as email or Slack, if the content includes something resembling a Social Security number, phone number, address, or other personally identifiable information. In that case, the system should reject the attempt.
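
The two broad rules above can be sketched as a pre-write check. This is a minimal illustration, not Barndoor's implementation: `WriteRequest`, `check_write`, and the regular expressions are hypothetical, and real PII detection would need far more than two patterns.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately simple patterns (US-style SSN and phone number).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


@dataclass(frozen=True)
class WriteRequest:
    tool: str     # e.g. "slack" or "email"
    target: str   # channel name or recipient
    content: str  # text the agent wants to post or send


def check_write(request: WriteRequest) -> tuple[bool, str]:
    """Return (allowed, reason) for a write an agent is attempting."""
    # Rule 1: no agent posts to Slack channels whose name begins with "#EXT".
    if request.tool == "slack" and request.target.upper().startswith("#EXT"):
        return False, "external Slack channels are off-limits to agents"
    # Rule 2: reject any write whose content resembles PII.
    if SSN_PATTERN.search(request.content) or PHONE_PATTERN.search(request.content):
        return False, "content appears to contain PII"
    return True, "ok"
```

A call such as `check_write(WriteRequest("slack", "#EXT-partner", "hello"))` is rejected by the first rule regardless of content, while an internal channel is still subject to the PII check.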
The finer-grained examples show why simple role-based access is insufficient. Michaels describes an agent he uses after sales calls. It checks his calendar to identify attendees, retrieves the Zoom transcript, summarizes the call, then logs it in Salesforce against the relevant people. He allows that agent to create the call log. He does not allow it to add new Salesforce contacts.
The reason is not that adding contacts is inherently forbidden. It is that, for this workflow, the agent sometimes fails to find a person who is already in Salesforce and decides to add them again. Michaels would rather have the task fail and ask for human help than create “20 of the same person” in Salesforce.
A different task gets a different policy. After attending a Wall Street Journal conference, Michaels had a list of attendees and wanted to ensure they were in Salesforce. In that context, he expected many would not already exist, so he was willing to let the agent create contacts.
The permission is not simply “Oren can create contacts” or “AI can create contacts.” It depends on the task, the data, and the acceptable failure mode. For one workflow, duplicate contacts are more damaging than an incomplete automation. For another, contact creation is the point.
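
One way to picture the difference between the two workflows is as task-scoped allow-lists. The sketch below is an assumption-laden illustration: the task names, action names, `TaskPolicy`, `is_allowed`, and `authorize` are all hypothetical and do not reflect Barndoor's policy schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskPolicy:
    task: str
    allowed_actions: frozenset


class ActionDenied(Exception):
    """Raised so the task fails and asks for human help instead of improvising."""


# Hypothetical task and action names, modeled on the two workflows described above.
POLICIES = {
    "log_sales_call": TaskPolicy(
        "log_sales_call",
        frozenset({"calendar.read", "zoom.read_transcript", "salesforce.create_call_log"}),
    ),
    "import_conference_attendees": TaskPolicy(
        "import_conference_attendees",
        frozenset({"salesforce.search_contacts", "salesforce.create_contact"}),
    ),
}


def is_allowed(task: str, action: str) -> bool:
    """An action is allowed only if the task's policy lists it (deny by default)."""
    policy = POLICIES.get(task)
    return policy is not None and action in policy.allowed_actions


def authorize(task: str, action: str) -> None:
    """Raise ActionDenied when the action falls outside the task's lane."""
    if not is_allowed(task, action):
        raise ActionDenied(f"{action!r} is not permitted for task {task!r}")
```

Here `salesforce.create_contact` is denied for `log_sales_call` but allowed for `import_conference_attendees`, even though the same person launched both tasks. The deny-by-default choice matches the preference for a failed task over duplicate records.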
That distinction also underpins Michaels’ view of read and write actions. He says an agent is not truly useful as an agent unless it can write. In his account, most out-of-the-box MCP connections avoid writes because they lack governance. They may let a model read a calendar but not create calendar events. They may expose safe subsets of an underlying API because deletes, overwrites, drops, and other destructive actions are too risky without a control plane.
Barndoor and Venn, Michaels says, expose more capable MCPs because they wrap those capabilities with governance. On Venn, that looks like a page of toggles for each connected service. In Barndoor, the enterprise product, policies can be built through toggles, written in JSON, managed in the user interface, or controlled through an API. The API matters because Michaels does not believe humans will manually create and maintain all the policies required for thousands of agents. Some policy changes will themselves be driven by systems or AIs that adjust capabilities based on context.
Tool routing is also a performance problem
The agent governance problem is not only about permissions. Michaels identifies context window exhaustion as a separate operational constraint that can make agents worse and more expensive.
He explains it through MCP. An MCP is not just a tool connection; it is effectively a tool plus a user manual for how the model should use it. A Gmail MCP might expose 40, 50, or 100 tools, each with substantial instructions. When a model is told it may need that MCP, the available tools and their manuals consume tokens before the model has done the actual work. Add calendar, maps, mail, Slack, documents, and other services, and the agent may fill its available context with tool descriptions rather than task-relevant information.
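
A back-of-the-envelope calculation shows how quickly tool descriptions can crowd out the task. Every number below is an illustrative assumption rather than a measurement: the source gives only the rough tool counts for a Gmail MCP, and the tokens-per-tool figure, the other tool counts, and the context window size are made up for the example.

```python
def tool_overhead_tokens(tools_per_service: dict[str, int], tokens_per_tool: int) -> int:
    """Tokens consumed by tool descriptions before any task work begins."""
    return sum(tools_per_service.values()) * tokens_per_tool


def remaining_context(context_window: int, overhead: int) -> int:
    """Tokens left for the actual task, never below zero."""
    return max(context_window - overhead, 0)


if __name__ == "__main__":
    # Assumed tool counts per connected service (Gmail count is in the range Michaels cites).
    services = {"gmail": 50, "calendar": 20, "slack": 30, "docs": 40}
    overhead = tool_overhead_tokens(services, tokens_per_tool=300)  # assumed 300 tokens each
    window = 128_000  # assumed context window size
    print(overhead, remaining_context(window, overhead), f"{overhead / window:.0%}")
```

With these assumed figures, 140 tools at 300 tokens each consume 42,000 tokens, roughly a third of the assumed window, before the model has read a single email.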
The result is not merely cost. Too many tools can confuse the model. Michaels compares it to walking into Home Depot and seeing a sea of tools without knowing where the hammer is, or even what a hammer is. He says agents can respond to a request such as “go to Salesforce and give me information on these three opportunities” by trying to search Google Drive, because the tool universe overwhelms the task.
Barndoor’s answer is Tool IQ. Rather than exposing every connected tool directly to the model, Barndoor presents the model with a single MCP. Behind it, a tool router abstracts the connected services. The model says what it is trying to do; the router provides the small subset of tools needed for that task and the instructions for using them.
The analogy Michaels uses is a tray of tools for a chore at home. The worker does not bring all of Home Depot; they bring the few tools needed for the job. In AI terms, he says, that saves tokens, reduces processing waste, and improves the response by giving the model a smaller, more relevant set of actions.
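
The routing idea can be sketched with a deliberately naive keyword matcher. The source does not describe how Tool IQ selects tools, so this is not its algorithm: the `Tool` catalog, the `route` function, and the word-overlap scoring are hypothetical stand-ins meant only to show the shape of "state the intent, receive a small tray of tools."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    service: str
    description: str


# Hypothetical catalog of connected tools.
CATALOG = [
    Tool("salesforce.get_opportunity", "salesforce", "look up a salesforce opportunity by name"),
    Tool("salesforce.create_call_log", "salesforce", "log a sales call against a salesforce contact"),
    Tool("gdrive.search_files", "gdrive", "search google drive files by keyword"),
    Tool("slack.list_unread", "slack", "list unread slack messages"),
    Tool("gmail.send_message", "gmail", "send an email message"),
]

STOPWORDS = {"a", "an", "the", "by", "on", "me", "these", "to"}


def route(intent: str, catalog: list[Tool], limit: int = 3) -> list[Tool]:
    """Return up to `limit` tools whose descriptions share words with the intent."""
    intent_words = set(intent.lower().split()) - STOPWORDS
    scored = []
    for tool in catalog:
        overlap = len(intent_words & (set(tool.description.split()) - STOPWORDS))
        if overlap:
            scored.append((overlap, tool))
    # Highest overlap first; ties keep catalog order because the sort is stable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]
```

For an intent such as "give me information on these salesforce opportunity records", this router returns only the two Salesforce tools, so the Google Drive search tool never reaches the model. A production router would need something more robust than word overlap, but the effect on the context window is the same: the model sees a handful of relevant tools instead of the whole catalog.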
The agent ecosystem keeps moving underneath the enterprise stack
OpenClaw enters the discussion as an example of how quickly the agent landscape can change. Smith treats it as an inflection point because, in his view, it made the power of agents visible to a broader segment of users. Michaels agrees that it was a “huge” event for Barndoor. His point is not that OpenClaw itself solves enterprise governance. In his description, OpenClaw’s ethos is almost the opposite: it should not need MCP because it can write whatever program it needs and go do what it wants.
Still, Michaels says Barndoor and Venn can work with OpenClaw through an extension called MCP ops, or “micpops,” which he describes as letting OpenClaw access MCPs. In his account, that extension can access Barndoor’s or Venn’s Tool IQ MCP. Asked about MyClaw AI, a cloud-hosted way Smith says some users access OpenClaw, Michaels says he has not tried it personally but assumes it should work if it is a regular OpenClaw version capable of running micpops.
The larger point is substitution. Michaels expects new agents and new interfaces to appear continually. If OpenClaw, Claude, or another tool becomes the best option for a task, the enterprise should be able to connect it through the same governance layer, rather than rebuilding connections, policies, and permissions inside every new application.
That is where he sees limitations in both incumbents and developer-tool approaches. Incumbent API management, identity, and security companies will need to present something credible for agent governance, but Michaels expects them to bolt agent controls onto existing infrastructure. In proofs of concept, he says, those approaches do not go far enough for the fine-grained management agents require.
Developer tools have the opposite problem. They may let an AI application include its own connections and governance, but Michaels gives two objections. First, he argues that “no one in their right mind really trusts AI companies to govern themselves,” because their incentives to move quickly may not align with governance. Second, no CIO or CISO wants hundreds of AI systems each governing itself, each with separate connections and rules. That would recreate the same administrative burden at larger scale.
Integration platforms such as Boomi can provide the API management and APIs that Barndoor sits in front of, Michaels says. They may also expose MCP capabilities. But the fine-grained decisions Michaels describes (what a specific agent may do in Salesforce for a specific task, with a specific failure mode) are, in his view, outside traditional API management and security.
He puts it bluntly: a CISO does not necessarily know Salesforce well enough to decide which specific Salesforce operation should be available in which business context. The governance layer needs to combine identity, system-specific knowledge, task context, and fine-grained access control across the tools agents will use.
The adoption curve steepens when workers see their own problem solved
The case for a 2026 inflection point is behavioral rather than purely technical: humans are pattern matchers. The more people see AI solve problems they recognize, the more they want to use it themselves.
The first enterprise users who got real work out of AI were coders, Michaels says, because coding already resembles a conversation with highly technical experts. A developer whose code does not work can ask someone more expert to fix it. AI fit that pattern. Then developers used MCP and related approaches to make models not just advise on code but create new code, changing how software gets developed.
For many other workers, AI started as search or advice. It answered questions, summarized, drafted, and suggested. Michaels thinks that is now shifting. Marketers, salespeople, HR staff, attorneys, and other knowledge workers are beginning to use AI to get work done rather than merely talk about work. Once they see a workflow that solves a problem they actually have, adoption accelerates.
His example is mundane but important: emptying or triaging a massive inbox. Early Venn users are doing this kind of work, according to Michaels. Barndoor cannot see the underlying data users pass through the system, he says, but it can see usage patterns. The company wants to surface those patterns as examples, because showing someone what they can do with a tool is more effective than telling them to explore a blank interface.
That is also where Michaels locates the failure of many enterprise AI projects. He says “95% of AI projects fail” and argues that many fail because the “project” was little more than giving employees access to AI and expecting usage to materialize. A company signs a deal with OpenAI, tells everyone they can use it, and employees encounter something that “looked like a search box.” They do not automatically know how to transform business problems into agentic workflows.
Michaels argues that successful companies will need to find, promote, and celebrate employees who can connect business challenges to the new tools. Smith mentions a BCG executive’s term “agentic quotient,” meaning the ability to work effectively with agents. Michaels says he likes the term and would steal it.
The organizational owner is still unsettled. Michaels now attends Chief AI Officer conferences, which he says did not exist two years ago. Today, the leader may be whoever the CEO told to “make AI happen”: a CIO, CTO, HR leader, line-of-business executive, or even a retired confidant brought back into the company. Over time, he expects AI leadership to straddle functions in the way IT and HR do.
But he also argues that AI has to be represented at an executive level, not merely managed as a departmental tool. His distinction between managers and executives is that a manager optimizes a department, while an executive understands the company as a whole and may make their own area’s work harder if that helps the company succeed. AI governance requires that broader judgment because risk and reward vary by function. There is no one-size-fits-all policy for the whole company.
The more interesting workflows are not just faster versions of human chores
One way to decide where AI belongs is to start with the work people least want to pass on. Michaels cites a line from Quentin Hardy: AI should take on the tasks in your job that you would not want your children to have to do if they had the same job. That is a useful starting point: remove the work that keeps people from the work they value.
But Michaels does not want the ambition to stop there. Much early agentic AI, he says, is about making existing human tasks faster, more accurate, or less burdensome. That is useful, but it is the “faster horses” version of automation. The more interesting opportunity is to do things humans are not doing, and in some cases cannot do manually, because they cannot process the relevant data or maintain the required personalization at scale.
The business example he gives is a major hotel chain exploring agentic AI for guest services. In luxury hospitality, he argues, the differentiator is not only the physical property but personalization. Today, personalization may mean a thoughtful amenity, a note, or, in one hotel stay Michaels recalls, a picture of him and his wife placed in the room for their anniversary. Those gestures can be meaningful, but they are limited by what humans can gather and execute manually.
With returning guests, a hotel has data about how someone interacted with the property and its systems. Agents could help anticipate needs and deliver service in a way humans could not manage at scale. Michaels compares this to the way AI has been used effectively for years in advertising: systems infer what to show people based on data. He is dismissive of the claim that ad targeting creates a great experience, but he sees the same technical capacity applied to services and goods people actually want. Companies that unlock their data and empower agents to act on it, he argues, can win by giving customers better experiences.
The point is not that agents simply replace clerical work. Michaels is more interested in workflows that combine deterministic programming, probabilistic model judgment, data access, and governed action into something that was previously impractical. That same pattern explains why his examples range from Salesforce hygiene to hotel personalization: the agent becomes useful when it can safely act across systems, not merely summarize information inside one of them.
Broadway shows why data-rich operations are not just office work
Michaels’ work as a Broadway producer becomes relevant when the issue shifts from abstract knowledge work to data-rich operations. Theater is not usually framed as enterprise AI, but Michaels says technology is already deeply embedded in production. Broadway shows load a kind of data center under the stage, with racks of servers and computers controlling lighting, sound, set automation, and video. The technology evolves quickly because Broadway is expensive and risky, especially for musicals, and producers need tools to model, test, and synthesize information.
AI is already useful in theater marketing, audience discovery, design workflows, and analysis. Michaels describes feeding audience surveys and reviews from an out-of-town trial into an AI chat tool and receiving a summary that a longtime Broadway director said matched the conclusions the creative team had reached after three days of Post-it work.
Agentic AI in theater is earlier, but Michaels sees plausible uses in monitoring and safety. Broadway productions already test every light and movement before a show. About once every 15 or 20 shows, he says, he sees a performance stop because something is not moving correctly and the cast must leave the stage. Since those systems generate data, agents could help monitor equipment, schedule maintenance, replace components, and keep the show safe and consistent with the opening-night design.
The example ties back to the enterprise argument because safety monitoring is a governed-action problem. An agent that observes equipment data, recommends maintenance, or schedules a replacement would need permission to read operational systems, identify risk, and possibly trigger action while being constrained from making unsafe or inappropriate changes.
The control layer has to arrive before deployment scales
Michaels named Barndoor after the familiar mistake of closing the barn door after the horse is out. His view is that agentic AI is still early enough for enterprises to put the barn door in place first.
He is not arguing that enterprises lack interest. At a recent conference with roughly 50 enterprise CIOs and CISOs, Michaels says large companies were already experimenting with agents and MCPs at different levels. Some longstanding companies with capable IT teams are building MCPs both for SaaS products and internal systems. Compared with the same event six months earlier, he says, the topic had moved to the front of everyone’s mind.
The blocker is trust. Smith summarizes the proposition as: agents have promise, enterprises are scared of them, and Barndoor provides the control layer that enables trust. Michaels agrees.
That control layer, as Michaels describes it, covers the missing middle between broad identity and unmanaged autonomy. It should know which human or system an agent is acting for, which tool it is trying to use, what task it is performing, what data is involved, and what actions are safe in that context. It should support broad company rules, narrow workflow rules, tool routing, monitoring, and policy automation.
Michaels also presents Barndoor as a platform company with a services component. He compares the adoption work to Mashery, the API-as-a-service company he co-founded in 2006 and sold to Intel in 2013. In Mashery’s early days, many companies wanted APIs without knowing what business problems to solve with them. Mashery would run hackathons, bring together coders and non-coders, and help the company find real use cases. Barndoor has people who do similar work for AI: bringing ideas to customers, helping identify business problems, and translating agentic capability into governed workflows.
The underlying claim is that enterprise AI adoption is not waiting only on better models. Models can connect to tools. Agents can act. Workers can imagine use cases once they see examples. But companies still need a way to decide what those agents are allowed to do, to contain failures, and to manage the resulting population at a scale no human policy team can hand-administer.


