Agentic AI Projects Fail When Governance Cannot Move at Machine Speed

Jack Wang · AI Engineer · Thursday, May 28, 2026 · 11 min read

Accenture’s Jess Grogan-Avignon and Jack Wang argue that many enterprise agentic AI projects fail not because the agent cannot be built, but because the institution around it cannot move fast enough to ship and learn from it. Drawing on their experience building an agentic application in two weeks and spending another year getting it into production, they say enterprises must recode governance, fund AI as a portfolio of bets, deliver through hypothesis loops, grant autonomy only as evidence builds, and treat live customer feedback as the defensible asset.

The bottleneck is the enterprise operating system, not the agent

Jess Grogan-Avignon and Jack Wang describe large enterprises as organizations built, rationally, for control. Telecoms, utilities, government, healthcare, consumer products companies: at that scale, a bad deployment can affect critical national infrastructure or national-scale services. Over years, these firms built layers of process, repeatability, governance, and signoff because the consequences of failure justified them.

Their central claim is that the same scaffold is now becoming the drag. AI is not failing in these settings only because data is hard to access, APIs are missing, or the model cannot write useful code. It is failing because the enterprise was designed to move at human speed, while AI development and AI-assisted delivery are pushing toward machine speed.

Accenture’s framing reduces the problem to five enterprise tensions: speed, value, delivery, trust, and moat. The slide labels each tension in institutional terms: a “human OS for human pace,” finance “wired for certainty,” delivery that treats agents like features, trust defined by completion, and a moat that is “static without compounding.” Grogan-Avignon says understanding those five tensions can help predict whether an agentic project will succeed before it starts.

Wang’s example is deliberately mundane: an agentic application took about two weeks to build, then another 12 months to get into production. The delay was not because the application logic needed a year of engineering. It was because the infrastructure team, security team, centralized AI gateway team, data governance team, and application teams all had to align before the system could ship.

He compares the experience to a Google Search in which, before results appeared, three teams had to review them, legal had to sign off, and everyone then had to wait two weeks because of a quarter-end change freeze. That, he says, is how enterprise AI delivery often works today.

The issue becomes sharper once AI coding tools increase the supply of deployable code. Grogan-Avignon argues that handing AI coding agents to developers simply moves the bottleneck downstream unless review and deployment infrastructure changes too. Coding agents are also turning PMs, designers, and domain experts into builders, expanding the number of people able to produce software.

She points to GitHub commit volume as a proxy for the pressure: in 2025, GitHub had reported 1 billion commits so far, was averaging 275 million commits per week, and was on track for 14 billion by year-end. Whether every commit represents enterprise-ready software is not the point. The point is the one Accenture put on the slide: the supply of code is growing, and approval infrastructure has not kept up.

275M: GitHub commits per week, cited as the 2025 run rate
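
The relationship between the weekly figure and the year-end projection is simple run-rate arithmetic. A minimal sketch, assuming a flat 52-week year and a constant weekly rate (both simplifications introduced here, not claims from the talk):

```python
WEEKS_PER_YEAR = 52  # assumption: flat 52-week year, no growth in the weekly rate


def annualized(weekly_commits: int, weeks: int = WEEKS_PER_YEAR) -> int:
    """Project a constant weekly commit rate over a number of weeks."""
    return weekly_commits * weeks


weekly_rate = 275_000_000  # weekly figure cited in the talk
projected = annualized(weekly_rate)
print(f"{projected / 1e9:.1f} billion commits per year")  # prints "14.3 billion commits per year"
```

At that rate the annualized total is about 14.3 billion, which is consistent with the "on track for 14 billion" framing.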

Grogan-Avignon’s diagnosis is that the “real tech debt” is not only legacy application code. It is years of underinvestment in engineering automation, including CI/CD and the machinery that lets companies move faster while maintaining control.

Her prescription is not to remove governance. It is to turn governance into executable, adaptive code.

“Every single human process needs to become adaptable, executable code. Not another meeting, not a sign off chain, code.”

Jess Grogan-Avignon

That reframes governance speed as a technical problem. Wang later calls it the CTO’s top engineering problem and “the ultimate technical debt” to fix. In their view, enterprises do not need less control; they need control mechanisms that can operate at the speed of the systems they now want to deploy.
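
One way to picture governance as executable code is a set of policy checks that run automatically against a proposed release, in place of a sign-off meeting. The sketch below is illustrative only: the `ChangeRequest` fields, the rule functions, and the 0.95 threshold are hypothetical and do not come from Accenture or from any particular policy engine.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ChangeRequest:
    """Hypothetical description of an agent release awaiting approval."""
    eval_pass_rate: float        # share of evaluation cases passed, 0.0 to 1.0
    security_scan_clean: bool
    has_kill_switch: bool
    touches_customer_data: bool
    privacy_review_done: bool


# A rule returns None when satisfied, or a human-readable reason when violated.
Rule = Callable[[ChangeRequest], Optional[str]]

MIN_EVAL_PASS_RATE = 0.95  # illustrative threshold, not from the source


def eval_threshold(cr: ChangeRequest) -> Optional[str]:
    if cr.eval_pass_rate >= MIN_EVAL_PASS_RATE:
        return None
    return f"eval pass rate {cr.eval_pass_rate:.2f} is below {MIN_EVAL_PASS_RATE:.2f}"


def clean_scan(cr: ChangeRequest) -> Optional[str]:
    return None if cr.security_scan_clean else "security scan reported findings"


def kill_switch(cr: ChangeRequest) -> Optional[str]:
    return None if cr.has_kill_switch else "no kill switch configured"


def privacy_review(cr: ChangeRequest) -> Optional[str]:
    if not cr.touches_customer_data or cr.privacy_review_done:
        return None
    return "customer data touched without privacy review"


RULES: List[Rule] = [eval_threshold, clean_scan, kill_switch, privacy_review]


def evaluate(cr: ChangeRequest) -> List[str]:
    """Run every rule and collect violations; an empty list means approved."""
    violations = []
    for rule in RULES:
        reason = rule(cr)
        if reason is not None:
            violations.append(reason)
    return violations


release = ChangeRequest(0.97, True, True, False, False)
print("approved" if not evaluate(release) else evaluate(release))
```

The design point is that the rules live in version control and run on every change, so tightening or relaxing a control becomes a reviewed code change rather than a new meeting.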

Traditional business cases can kill the work before it begins

Grogan-Avignon does not dismiss business cases outright. They ask useful questions, create oversight, and force someone to think about return on investment. The problem is that standard enterprise business cases assume three things are knowable upfront: the scope and solution, the expected value, and the cost and time to deliver.

With agentic AI, she says, that order is often backwards. The solution and the business case are learned by doing the work. As prototyping and experimentation costs fall, AI does not merely make existing work cheaper. It makes previously uneconomic work possible: new products, new services, and customer experiences that were not worth attempting under older cost structures.

The effect, in her telling, is not limited to cost reduction. Accenture’s cited research says that “AI Achievers” represent 12% of companies, while 88% remain stuck, often piloting and spending heavily without much return. Grogan-Avignon says those AI Achievers see about 50% higher revenue growth than peers, and frames that growth as coming from doing new things rather than only cutting costs.

Measure	Figure cited
AI Achievers	12% of companies
Companies still stuck	88% of companies
Revenue growth associated with AI Achievers	About 50% higher than peers

Accenture figures used to frame the gap between AI leaders and the rest

Grogan-Avignon uses several examples to make the point that AI value can emerge rather than arrive fully specified in a business case. She says Cursor’s user base of “vibe coders” did not exist when the product was being built or released. She says Claude Code was not planned months in advance on a conventional product roadmap. On the enterprise side, she cites Walmart as having built a social media trend scanner and generative designer that allow it to compete in new ways with Shein and Temu. She also cites JP Morgan as having started with an internal productivity tool that it later productized into a new revenue stream.

These are illustrations rather than proof. Grogan-Avignon uses them to show the kind of emergent value that a certainty-driven funding model may miss. Her objection is to finance systems “wired for certainty.” In many enterprises, a project starts by justifying itself through committed benefits and predictable cost phasing. That framing asks whether a specific known thing can be justified in advance. Grogan-Avignon argues that AI work often demands a different question: what becomes possible now, and what is the cost of not doing it?

Wang extends that into a funding model. For agentic transformation, he says, the CFO needs to think like a venture capitalist. A VC does not back one project and demand a guaranteed three-year payback because the certainty implied by that business case is “a fantasy.” Instead, the investor backs a portfolio, knowing many bets may not pay off, while searching for the few that compound.

That is the model Wang wants enterprises to apply to AI. The question should not be whether one isolated project can be justified with a precise upfront return. It should be whether the company is placing enough bets across the portfolio to find the ones that can change the business.
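
The portfolio logic can be made concrete with a little probability. A minimal sketch, assuming independent bets that each have the same chance of paying off; the 10% success probability is an illustrative number, not a figure from the talk:

```python
def p_at_least_one_hit(n_bets: int, p_success: float) -> float:
    """Probability that at least one of n independent bets succeeds."""
    if n_bets < 0 or not 0.0 <= p_success <= 1.0:
        raise ValueError("n_bets must be >= 0 and p_success must be in [0, 1]")
    return 1.0 - (1.0 - p_success) ** n_bets


P_SUCCESS = 0.10  # hypothetical per-bet success probability

for n in (1, 5, 10, 20):
    print(f"{n:>2} bets: {p_at_least_one_hit(n, P_SUCCESS):.0%} chance of at least one hit")
```

Under these assumptions a single bet has a 10% chance of paying off, while twenty bets have roughly an 88% chance of producing at least one hit, which is the intuition behind judging the portfolio rather than each project in isolation.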

If finance cannot think that way, Wang says, transformation should start there, because everything else is downstream.

Agentic delivery needs scientific loops, not milestone programs

Jack Wang argues that data scientists and machine learning engineers have been working in the mode enterprise AI now needs: hypothesis, experiment, statistical confidence. In his view, many organizations treated that group like a modern “IT crowd,” useful but isolated, while the rest of the enterprise continued to run “real work” through Jira boards, PI planning, upfront design, and status reporting.

Agentic systems make that separation untenable. Models are non-deterministic. Agent behavior is emergent. A team cannot scope an agent like a conventional feature build or milestone it like a fixed program, because the work is not simply implementing a known requirement. It is discovering what the system can do reliably enough to be trusted in a specific context.

Wang says much of the energy in delivery trenches is spent not on building the system but on bridging the gap between how these systems actually work and what stakeholders expect. That gap produces utopian design sessions upfront, conversations about guaranteed performance, endless status updates, and decisions that never get made.

His alternative is hypothesis-driven delivery. The program should be reshaped around one goal: building statistical confidence. That means small loops of build, evaluate, iterate, and generate evidence quickly. Delivery should not be measured only by what was delivered, but by what the team learned and how much confidence it created.
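
“Statistical confidence” can be expressed as an interval around an agent’s observed success rate rather than a single pass or fail. A minimal sketch using the Wilson score interval; the choice of interval, the 95% level (z = 1.96), and the sample counts are assumptions made here for illustration, not something the speakers prescribe.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial success rate (z=1.96 is about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Same observed 90% success rate, very different amounts of evidence.
for successes, trials in ((18, 20), (1800, 2000)):
    low, high = wilson_interval(successes, trials)
    print(f"{successes}/{trials}: between {low:.3f} and {high:.3f}")
```

Both runs show a 90% success rate, but the interval from 2,000 trials is far narrower than the one from 20. In this framing, each build-evaluate-iterate loop is measured by how much it tightens that interval.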

This also changes the talent profile. Wang says teams need people comfortable with ambiguity, able to articulate learning rather than just outputs, and able to translate statistical results into stakeholder confidence. He names PMs, delivery leads, architects, and business analysts as roles that need to be upskilled into this way of working.

The proposed loop is simple in structure but demanding in practice: experiment, deploy in constrained ways, evaluate, and iterate. What matters is not that the loop exists on a slide, but that the organization accepts evidence accumulation as the unit of progress.

Trust is shipped through progressive autonomy

Jess Grogan-Avignon defines trust broadly: content quality, accuracy, security, responsible use, privacy, and the other conditions that let users and stakeholders rely on an AI system. Her claim is that, in agentic delivery, completed features may not be the most valuable thing a team ships. The more durable asset is trust in the system’s outputs and behavior.

She describes agentic delivery as a series of deposits and withdrawals in a trust account with stakeholders, leadership, and end customers. A feature can change or disappear. The trust built through evidence may survive.

That matters because many companies still treat agents like traditional automation: test the workflow, deploy it, let it run. Grogan-Avignon says that model is inadequate. Agents are not simply built and turned on. Their behavior is emergent, and teams cannot foresee every response or behavior upfront and test for it with a conventional pass-fail mindset.

Evaluation suites are important, she says, but the deployment model matters too. Accenture’s slide presents the deployment path as an “exposure ladder,” with autonomy increasing only as outcome evidence accumulates. The point is not merely to test an agent before release, but to control how much the system can affect real outcomes at each stage.

Stage	Agent role	How confidence is built
Shadow mode	Runs alongside human processes without affecting outcomes	Compare agent outputs with human decisions
Advisory mode	Recommends in live workflows while humans approve or reject	Use approvals, rejections, and corrections as feedback
Controlled autonomy	Acts in narrow, low-risk scenarios	Operate with limits, kill switches, and outcome evidence
Expanded autonomy	Handles broader scenarios as confidence increases	Advance only when target behaviors are evidenced
Autonomy	Runs with wider independence	Reached through accumulated trust, not project-plan completion

The progressive-autonomy ladder described for deploying agentic systems

In shadow mode, an agent runs alongside human processes but cannot affect outcomes. The team compares human decisions with the agent’s recommendations and uses the difference as a signal for iteration. In advisory mode, the agent runs live but only recommends; humans still approve or reject outcomes, producing another signal. In controlled autonomy, the agent can trigger actions, but only in narrow, low-risk scenarios with clear limits and kill switches. Over time, the system may move toward expanded autonomy and then broader autonomy.
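
The shadow-mode comparison amounts to logging paired decisions and measuring where they diverge. A minimal sketch; the `ShadowRecord` shape and the decision labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ShadowRecord:
    """One case handled by a human, with the agent's output logged alongside."""
    case_id: str
    human_decision: str
    agent_decision: str  # logged only; never acted on in shadow mode


def agreement_rate(records: Sequence[ShadowRecord]) -> float:
    """Share of cases where the agent matched the human decision."""
    if not records:
        raise ValueError("no shadow records to compare")
    matches = sum(r.human_decision == r.agent_decision for r in records)
    return matches / len(records)


def disagreements(records: Sequence[ShadowRecord]) -> List[ShadowRecord]:
    """Cases to review by hand: the raw material for the next iteration."""
    return [r for r in records if r.human_decision != r.agent_decision]


log = [
    ShadowRecord("c1", "approve", "approve"),
    ShadowRecord("c2", "reject", "reject"),
    ShadowRecord("c3", "approve", "reject"),
    ShadowRecord("c4", "escalate", "escalate"),
]
print(agreement_rate(log))                       # 0.75
print([r.case_id for r in disagreements(log)])   # ['c3']
```

A disagreement is not automatically an agent error. Reviewing those cases is what turns the difference into a signal.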

The gating principle is the key distinction. Each step is gated by evidence in outcomes, not by completion of activities in a project plan and not by a one-time pass-fail test. Teams should engineer for trust, not just for completion.
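
Evidence gating can be sketched as a small state machine in which promotion depends on accumulated outcomes rather than on dates. Everything numeric here is hypothetical, including the case counts and success-rate thresholds. The rule that an open incident demotes the agent one rung is also an assumption added for illustration, not part of the ladder as described.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    SHADOW = 0
    ADVISORY = 1
    CONTROLLED_AUTONOMY = 2
    EXPANDED_AUTONOMY = 3
    AUTONOMY = 4


@dataclass(frozen=True)
class Evidence:
    """Outcome evidence gathered at the current stage."""
    observed_cases: int
    success_rate: float   # 0.0 to 1.0
    open_incidents: int


# Hypothetical (minimum cases, minimum success rate) needed to leave each stage.
GATES = {
    Stage.SHADOW: (500, 0.90),
    Stage.ADVISORY: (1_000, 0.95),
    Stage.CONTROLLED_AUTONOMY: (5_000, 0.98),
    Stage.EXPANDED_AUTONOMY: (20_000, 0.99),
}


def next_stage(current: Stage, evidence: Evidence) -> Stage:
    """Promote only on evidence; step back one rung on any open incident."""
    if evidence.open_incidents > 0:
        return Stage(max(int(current) - 1, int(Stage.SHADOW)))
    if current == Stage.AUTONOMY:
        return current
    min_cases, min_rate = GATES[current]
    if evidence.observed_cases >= min_cases and evidence.success_rate >= min_rate:
        return Stage(int(current) + 1)
    return current


print(next_stage(Stage.SHADOW, Evidence(600, 0.93, 0)).name)   # ADVISORY
print(next_stage(Stage.SHADOW, Evidence(100, 0.99, 0)).name)   # SHADOW
```

The second call stays in shadow mode despite a 99% success rate because 100 cases is below the assumed evidence floor. A strong early result does not substitute for accumulated outcomes.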

The moat is living memory, not yesterday’s systems of record

Jack Wang argues that in a recursive world where AI can code AI, anything that ships and goes viral can be cloned quickly. That raises a harder question for enterprises: what is unique only to them?

His answer is not the usual enterprise data estate. CRM, ERP, SOPs, and existing enterprise knowledge matter, but he calls them “transactional memory.” They got the company to the table, but every competitor has some version of them. They are a floor, not a fortress.

The defensible asset is what he calls “living memory”: the signals generated when customers interact with the product in the company’s own context and at its own scale. Edge cases, corrections, emotional intent, and actual behavior become the material that competitors cannot simply copy.

That view changes the meaning of deployment. The day a system ships is not the finish line; it is when the race begins. The competitive question becomes how quickly the organization can turn signals into value and compound what it learns.

Wang’s engineering rule is blunt: every feature should either generate feedback signals or deliver on what previous signals have taught the organization. If it does neither, he says, the company is building something anyone can copy.
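
Wang’s rule can be read as a check that a team runs over its backlog. A minimal sketch, assuming each feature declares which feedback signals it emits and which earlier signals it acts on; the `Feature` fields and the example names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Feature:
    name: str
    signals_emitted: Tuple[str, ...] = ()   # feedback this feature generates
    signals_used: Tuple[str, ...] = ()      # earlier feedback it acts on


def passes_feedback_rule(feature: Feature) -> bool:
    """True if the feature generates signals or delivers on previous ones."""
    return bool(feature.signals_emitted) or bool(feature.signals_used)


def easily_copied(features: Sequence[Feature]) -> List[str]:
    """Names of features that do neither, which the rule flags as copyable."""
    return [f.name for f in features if not passes_feedback_rule(f)]


backlog = [
    Feature("draft_reply", signals_emitted=("user_edit", "thumbs_down")),
    Feature("tone_adjuster", signals_used=("user_edit",)),
    Feature("static_faq_page"),
]
print(easily_copied(backlog))  # ['static_faq_page']
```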

“Feedback is not an option. Feedback is the only moat.”

Jack Wang

That also changes what CEOs should value. Wang says the moat is not what the organization holds from yesterday; it is what it learns and compounds every day. The firms that thrive will not necessarily be the earliest adopters. They will be the ones that learn how to learn, build living memory through feedback loops, and cultivate trust with employees and customers. In his phrasing, that cannot be bought or copied.

The operating agenda is to change the institution before scaling the agent

Grogan-Avignon and Wang reduce the prescription to four operating shifts, each aimed at a different institutional constraint.

The first is delivery. Start now, but do not start with the usual project machinery. Shape the next agentic project around hypotheses rather than fixed requirements or precommitted features. Run small loops of experiment, evaluation, and iteration. Measure progress in confidence, not only in completed scope.

The second is finance. Make finance a transformation partner, not a gatekeeper. Instead of forcing every AI project to justify itself in isolation, build a portfolio of bets across the organization. Look beyond cost-out certainty toward new value that may only become visible through experimentation.

The third is governance. Make governance speed an engineering problem, not an escalation problem. Build governance-as-code and treat approval infrastructure as a form of technical debt. The goal is not to weaken control, but to move faster with control.

The fourth is learning. Redefine the moat as what the organization compounds from today. Build feedback loops into the product from the beginning, so deployment starts the learning process rather than ending the project.

Grogan-Avignon’s final summary is compact: “Bet like a VC, upgrade for machine speed, and engineer for trust with the feedback loop from day one.” Their operating test for enterprise agentic AI is therefore institutional before it is technical: whether the company can change how it funds, approves, delivers, measures, and learns from the work.
