Applied AI Moves From Capability To Controlled Deployment
Bloomberg Technology, Kevin Roose and Casey Newton, Kyndryl’s Kris Lovejoy, Tasklet’s Andrew Lee, Intercom’s Brian Scanlan, Wayve’s Alex Kendall, and Waabi’s Raquel Urtasun all pointed to the same shift: AI progress is increasingly limited by the systems around the model. Chips, energy, cyber review, enterprise context, workflow controls, validation, and liability are becoming central to whether AI can be deployed safely and economically.

The AI race is becoming a constraint race
AI competition is moving from the question of who can show the most impressive model or product to the question of who can operate under the hardest constraints. Bloomberg Technology framed the current race around advanced chips, cloud capital expenditure, energy, memory, data centers, export policy, semiconductor labor, and distribution. The common point was that software velocity now depends on an industrial base underneath it.
Goldman Sachs’ Eric Sheridan described the market as still being in an “infrastructure-led cycle.” Hyperscaler capital spending remains elevated, he said, but investors are more willing to tolerate it when revenue visibility improves. Sheridan cited more than $900 billion in combined future revenue backlog across Alphabet’s and Amazon’s cloud divisions, arguing that the backlog gives investors more confidence that today’s capex can become cloud revenue one, two, or three years later.
That framing changes the meaning of several company stories. Custom silicon is not merely a technical side bet; Sheridan treated it as a margin, performance, and distribution strategy. He called Alphabet’s TPUs and Amazon’s in-house silicon one of the market’s most underappreciated narratives because custom chips can pull workloads into each company’s cloud ecosystem and allow hyperscalers to keep more of the economics inside the stack. TPUs may not beat GPUs on absolute performance, in his account, but can compete on price-to-performance for the right workloads.
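Price-to-performance is simple arithmetic, but it is the quantity that decides where a cost-sensitive workload runs. A minimal worked example with invented numbers — neither vendor publishes figures in these terms — shows how an accelerator can lose on raw speed and still win the workload:

```python
# Hypothetical numbers illustrating Sheridan's price-to-performance point;
# these are invented for the example, not vendor benchmarks.

def tokens_per_dollar(throughput_tokens_per_s: float, cost_per_hour: float) -> float:
    """Throughput normalized by hourly cost: tokens processed per dollar."""
    return throughput_tokens_per_s * 3600 / cost_per_hour

gpu = tokens_per_dollar(throughput_tokens_per_s=1000.0, cost_per_hour=4.00)
tpu = tokens_per_dollar(throughput_tokens_per_s=700.0, cost_per_hour=2.00)  # slower, cheaper

print(f"GPU: {gpu:,.0f} tokens/$")  # 900,000
print(f"TPU: {tpu:,.0f} tokens/$")  # 1,260,000 -- loses on speed, wins on price-performance
```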
The same logic applies to model partnerships. Sheridan did not describe the relationships among Amazon, Alphabet, Anthropic, cloud distribution, and foundation-model competition as tidy bilateral alliances. He described an interdependent market: model companies need compute; hyperscalers can provide it; hyperscalers also provide enterprise distribution. Only a handful of companies, in his view, tend to earn excess returns during major computing shifts at the infrastructure and platform layers, and this cycle requires enough capital that only a few can build at the required scale.
The Trump-Xi discussion over Nvidia’s H200 chips added geopolitics to the same operating problem. Donald Trump said aboard Air Force One that Nvidia’s H200 came up in his discussion with Xi Jinping, praised Jensen Huang, and said China “needs” the chip. Bloomberg’s reporting did not treat that as an export agreement or purchase commitment. The market reaction reflected the ambiguity: Bloomberg showed the Philadelphia Semiconductor Index down 3.82% intraday and Nvidia down 4.08%, following a roughly 70% year-to-date rally in chip stocks through the prior close.
The policy detail matters because chip access has become part of product strategy. If a model company cannot get enough compute, if a cloud provider cannot secure enough power or memory, or if export policy changes the addressable market for advanced accelerators, software roadmaps change. OpenAI CFO Sarah Friar made that explicit from the company side, saying compute itself has been a bottleneck and naming energy and memory from Southeast Asia as supply-chain choke points. Bloomberg also reported strain in OpenAI’s Apple distribution relationship, a separate reminder that frontier capability does not automatically translate into consumer reach.
The industrial base underneath AI is also labor constrained. Shari Liss of SEMI said CHIPS-related investments are driving “incredible growth” in U.S. semiconductor manufacturing, but the country will need roughly another 150,000 workers in the chip environment. That shortage is not limited to PhDs and chip designers; she named fab technicians, operators, engineers, researchers, marketing, and finance talent. In her account, the United States has to make fabs “hum,” not just fund them.
| Constraint layer | What Bloomberg’s discussion emphasized | Why it changes AI strategy |
|---|---|---|
| Compute | OpenAI’s Sarah Friar said compute is a bottleneck. | Model demand can outrun even aggressive infrastructure planning. |
| Custom silicon | Sheridan highlighted Alphabet TPUs and Amazon silicon. | Cloud providers can tie workloads, margins, and performance economics to their own stacks. |
| Export policy | Trump discussed Nvidia H200 chips with Xi without a clear deal. | Access to advanced chips is now part of geopolitical bargaining and market expectations. |
| Energy and memory | Friar named energy and Southeast Asian memory as choke points. | AI scaling depends on supply chains beyond model engineering. |
| Labor | SEMI’s Shari Liss cited a need for about 150,000 more U.S. semiconductor workers. | Domestic chip strategy depends on training and operating capacity, not only fabs. |
Bloomberg’s infrastructure frame does not identify a single winner. It points to a narrower conclusion: software velocity increasingly depends on systems outside the model itself. Chips, cloud distribution, data-center capacity, energy procurement, supply chains, labor, and routes to customers are becoming part of the applied AI product stack.
Cyber made AI safety operational, not abstract
The cyber model story is the clearest case of AI capability forcing institutions to respond. Kevin Roose and Casey Newton described the Trump administration’s possible turn toward pre-release model review not as a broad ideological conversion to AI regulation, but as a reaction to frontier models that can find novel software vulnerabilities and help chain them into attacks.
The reported policy move would be notable because Trump canceled President Biden’s AI executive order on his first day back in office, including a review framework for frontier models. Newton said many Republicans had previously attacked such review as anti-innovation and as a way for the United States to lose ground to China. Roose said parts of the tech right and libertarian technology world had denounced pre-release testing and government submission of results as “communist” during earlier fights over AI regulation.
Anthropic’s Mythos preview model appears to have changed the political facts. Newton described it as unusually capable at finding novel vulnerabilities across many programs, and said people in the administration appear to have concluded, in his characterization, that a broadly released model of that kind could create “vast amounts of harm.” Roose called the federal posture “entirely confused and incoherent,” not because the risk is imaginary, but because the government is trying to do several incompatible things at once.
The policy tension is not whether cyber-capable models matter. It is whether a safety review system can reduce release risk without becoming political gatekeeping.
That incoherence runs through the details. The administration is reportedly considering model vetting while also weighing chip access for China. It is trying to keep dangerous models away from adversaries while also considering commercial deals that could help foreign competitors train similar models. Newton said the Pentagon has designated Anthropic as a supply-chain risk because the company would not amend its contract to allow any “lawful use” of its technology, while also deploying Mythos to scan for vulnerabilities during the very period when it is supposed to be unwinding Anthropic technology.
The agency question is unresolved too. Roose identified a turf fight between CASI, the Commerce Department body formerly known as the U.S. AI Safety Institute, and intelligence-community agencies such as the NSA. Newton argued CASI was built for exactly this problem: evaluating stronger models before release. But he also pointed to the harder enforcement question. If a company wants to release a commercially important model that evaluators consider too dangerous, what happens?
The operational cyber evidence makes the policy debate less theoretical. Palo Alto Networks CEO Nikesh Arora said the defender’s timeline is collapsing. Over the past seven years, he said, the time between breach and extraction of an organization’s “crown jewels” was measured in days. With AI and related technologies, he said, that timeframe is moving toward minutes. Defensive infrastructure built for days has to work in minutes, and some components must operate in seconds.
The vulnerability numbers sharpen the point. Newton cited Mozilla data showing Firefox security bug fixes rising to 423 in April 2026, after monthly counts mostly in the teens and twenties through 2025, then 61 in February 2026 and 76 in March. Palo Alto disclosed 26 critical CVEs representing 75 issues in a May “Patch Wednesday” advisory, compared with its usual volume of fewer than five CVEs in a month. The company said it was the first time most findings came from frontier AI models scanning its code, and that none were being exploited in the wild.
Arora did not present the models as magic. He said roughly 30% of Mythos findings were false positives at first and had to be tested. The model improved when Palo Alto gave it context: what the code was supposed to do, how the product worked, and what threat techniques had appeared in thousands of past attacks. He also said Mythos and OpenAI’s GPT-5.5 Cyber found different things, implying that one scan by one model is not a complete audit.
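Arora’s two observations — context cuts false positives, and different models surface different findings — imply a triage pipeline rather than a single scan. A minimal sketch of that workflow, with invented scanners and findings standing in for the frontier-model calls Palo Alto has not detailed publicly:

```python
# Hypothetical triage pipeline in the spirit of Arora's account: two models
# scan the same code with shared context, findings are merged, and every raw
# finding is verified before it counts. All names and data here are invented.
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    location: str
    weakness: str

# Stand-ins for per-model scans. In practice these would be frontier-model
# calls given the code plus context: intended behavior, product architecture,
# and techniques seen in thousands of past attacks.
def scan_model_a(context: str) -> set[Finding]:
    return {Finding("parser.c:88", "out-of-bounds write"),
            Finding("auth.c:12", "hardcoded credential")}

def scan_model_b(context: str) -> set[Finding]:
    return {Finding("parser.c:88", "out-of-bounds write"),
            Finding("net.c:301", "unchecked length field")}

def verify(f: Finding) -> bool:
    # Arora: roughly 30% of raw findings were false positives and had to be
    # tested; here one invented finding fails reproduction.
    return f.location != "auth.c:12"

raw = scan_model_a("ctx") | scan_model_b("ctx")   # union: the models differ
confirmed = {f for f in raw if verify(f)}
print(f"{len(raw)} raw findings, {len(confirmed)} confirmed")
# The two scans overlap on only one finding, which is why one scan by one
# model is not a complete audit, and why verification filters the rest.
```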
The asymmetry still favors attackers in important ways. Arora restated the old security rule: defenders have to be right 100% of the time; attackers have to be right once. If a model finds five vulnerabilities and an attacker can exploit one, the attacker wins. His worry is not that cybercrime becomes conceptually new. Ransomware, economic harm, and nation-state operations remain familiar. What changes is pace, volume, and the ability to “daisy chain” weaknesses.
That is why the AI-safety debate is becoming operational rather than philosophical. Pre-release review is no longer only about speculative existential risk or generic “responsible AI.” It is about whether a model can help discover and weaponize vulnerabilities faster than banks, hospitals, manufacturers, infrastructure operators, open-source maintainers, and software vendors can patch them. Roose and Newton did not resolve whether pre-release review is good policy. They surfaced the tension: some form of review may be needed, but review power can also become prior restraint, speech control, or partisan model scoring.
Enterprise agents are blocked by context, not imagination
Inside enterprises, the operational frontier looks less like export controls and more like old systems, missing records, compliance obligations, and fragile dependencies. Kris Lovejoy of Kyndryl drew a hard line between building an agentic AI demo and running an agent across production infrastructure. The demo can be fast. The enterprise system has to be secure, compliant, resilient, reliable, scalable, and aware of context accumulated over decades.
Lovejoy described agentic AI as widespread in pilots but rare at meaningful scale. Enterprises, in her view, are past pure experimentation but not yet in “the age of industrialization.” They are finding agentic systems costly, somewhat insecure, unreliable, and difficult to scale. In Europe, she added, sovereignty concerns create another constraint.
Her bullet-train metaphor connects directly to the infrastructure race outside the enterprise. A bullet train capable of 150 miles per hour cannot run safely on track built for 30 to 60. In the same way, an agent may be capable in isolation but constrained by a company’s legacy estate: multiple clouds, old systems, SaaS products, undocumented integrations, and brittle operational practices.
The most concrete failure mode is not science fiction. Lovejoy described a vulnerability-checking agent that finds an unpatched communication protocol and patches it. Technically, the agent did the reasonable thing. Operationally, it may have broken a 15-year-old dependency tied to a legacy system, taking down a critical service. The problem is not that the model cannot spot a vulnerability. It is that the model may not know why the system was configured that way.
CMDB records, ticket histories, incident reports, ServiceNow-style systems, compliance documents, and administrator knowledge all become part of the agent’s safety envelope. In mature organizations, some of that context is recorded. In less mature ones, it may be scattered across people’s memories or gone after turnover.
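Lovejoy’s failure mode suggests a concrete control: before acting, the agent must consult the records that explain why a system looks the way it does. A minimal sketch under that reading — the CMDB and ticket-history lookups are hypothetical stand-ins for whatever record systems a given enterprise actually runs:

```python
# Hypothetical pre-action guard for Lovejoy's scenario: an agent that wants
# to patch a component must first check recorded dependencies and ticket
# history, and escalate to a human if anything depends on the old behavior.
from dataclasses import dataclass

@dataclass
class PatchPlan:
    component: str
    action: str

def cmdb_dependents(component: str) -> list[str]:
    """Stand-in for a CMDB query: what depends on this component?"""
    return {"legacy-comm-protocol": ["billing-batch-v1"]}.get(component, [])

def ticket_history(component: str) -> list[str]:
    """Stand-in for an ITSM search: why is it configured this way?"""
    return {"legacy-comm-protocol":
            ["TICKET-2011-044: do not upgrade; billing batch pinned to v1"]
            }.get(component, [])

def safe_to_auto_patch(plan: PatchPlan) -> bool:
    deps, notes = cmdb_dependents(plan.component), ticket_history(plan.component)
    if deps or notes:  # recorded context found: a human decides, not the agent
        print(f"Escalating '{plan.action}' on {plan.component}: "
              f"dependents={deps}, history={notes}")
        return False
    return True

print(safe_to_auto_patch(PatchPlan("legacy-comm-protocol", "patch")))  # False
```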
Lovejoy’s near-term path is deliberately unglamorous: IT service management. She named incident management, problem management, configuration management, patching, provisioning, security health checks, compliance, and audit. These processes are attractive for agentic AI not because they are flashy, but because many are already governed by runbooks and control frameworks. She pointed to ITIL as the administrator’s “bible,” saying there are 34 ITIL processes and estimating that roughly 20 can be “agentified.”
The economic claim is bounded but important. Lovejoy said automating parts of IT service management can reduce costs, in some cases by as much as 90%, producing what she called a “modernization dividend.” The savings can then help fund the infrastructure modernization broader agentic AI requires. In that account, the first serious enterprise wins may be incident handling, patch workflows, provisioning, compliance evidence, and configuration hygiene — tasks that make the rest of the company more automatable.
Her distinction between humans “in the loop” and humans “over the loop” is also central. A human in the loop approves or intervenes inside the workflow before consequential actions happen. A human over the loop supervises agents from above. Highly regulated, brittle, or safety-critical environments will require more direct human checkpoints. Less critical workflows may tolerate more supervisory oversight.
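That distinction maps naturally onto an approval gate: consequential actions block on a human decision, while lower-risk ones execute and land in a supervisory review queue. A minimal sketch, with invented criticality tiers:

```python
# Hypothetical gate encoding Lovejoy's distinction. "In the loop": the agent
# blocks until a human approves. "Over the loop": the agent acts, and the
# action lands in a supervisory review queue. The tiers are invented.
from enum import Enum

class Criticality(Enum):
    LOW = 1    # e.g. open a ticket, collect diagnostic logs
    HIGH = 2   # e.g. patch production, change configuration

review_queue: list[str] = []

def human_approves(action: str) -> bool:
    """Stand-in for a real approval workflow (chat, ITSM, change board)."""
    return input(f"Approve '{action}'? [y/N] ").strip().lower() == "y"

def run_action(action: str, level: Criticality) -> None:
    if level is Criticality.HIGH:
        if not human_approves(action):      # human IN the loop: blocks first
            print(f"Blocked: {action}")
            return
    else:
        review_queue.append(action)         # human OVER the loop: audits later
    print(f"Executed: {action}")

run_action("collect diagnostic logs", Criticality.LOW)
run_action("patch production database", Criticality.HIGH)
```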
This is a different adoption story from the idea that agentic-native startups can simply outrun incumbents. Lovejoy said startup demos can be astonishing, but enterprise buyers quickly ask where the infrastructure runs, whether it is SOC-compliant, and how it connects to SAP, CRM, email, and finance systems. If the product does not touch sensitive enterprise data, adoption is easier. If it does, the buyer is accepting operational, security, compliance, and resilience risk.
Her five-year prediction is narrower than much AI hype: by about 2031, she expects agentic AI to perform roughly half of traditional line-one and line-two systems-administration tasks in the IT infrastructure services market, with humans either in or over the loop. That is not a claim that the entire enterprise becomes autonomous in a few years. It is a claim that the rails need to be built before the train can run faster.
The software winners may own context, skills, and workflows
The software question is no longer only which application gets an AI feature first. Andrew Lee of Tasklet and Brian Scanlan of Intercom both described a deeper shift: useful AI systems need organizational memory, tool access, repeatable procedures, model choice, permissions, telemetry, review, and reliability. The value moves toward whoever can make those pieces work together.
Lee offered the broader market taxonomy. He expects AI software to consolidate around three surviving categories: a small number of horizontal agent platforms that hold context and connections; headless or API-first companies that provide underlying capabilities; and solutions companies that sell outcomes while hiding software inside service delivery. In his account, the traditional SaaS interface becomes less durable as agents learn to generate UI, call APIs, write code, and operate across systems.
Tasklet’s own rebuild shows why the chat window is not enough. Lee said the company moved from a workflow-automation product toward a general synchronous agent because users who had connected their accounts and given Tasklet organizational context wanted to talk to it directly. That broke the naive model of sending the whole chat transcript to the LLM. Tasklet moved history into the file system, giving the model hints about what to read rather than feeding every past token into every run.
The architecture is a practical answer to the same context problem Lovejoy raised, but at the product-platform layer. Tasklet’s memory is not just a transcript. It combines durable files, recency-weighted summaries, provider-specific caching, agentic lookup, persistent virtual machines, browsers, shell access, databases, and integrations. Lee described the harness not as a thin wrapper around a model call, but as a “mecha suit”: storage, tools, compute, permissions, user interaction, and ways to affect the world.
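Lee’s description — durable files plus hints about what to read, rather than a full transcript per call — implies an index the agent consults before each run. A minimal sketch of that pattern; the file layout and the recency-weighted scoring are assumptions, not Tasklet’s implementation:

```python
# Hypothetical file-system-backed agent memory: history lives in files, and
# each run gets a short, recency-weighted hint list instead of a transcript.
import math, time
from pathlib import Path

MEMORY_DIR = Path("agent_memory")   # invented layout: one file per topic

def write_memory(topic: str, note: str) -> None:
    MEMORY_DIR.mkdir(exist_ok=True)
    path = MEMORY_DIR / f"{topic}.md"
    existing = path.read_text() if path.exists() else ""
    path.write_text(existing + note + "\n")

def hints(query_terms: set[str], limit: int = 3) -> list[str]:
    """Rank memory files by term overlap, decayed by age; return paths to read."""
    now = time.time()
    def score(p: Path) -> float:
        overlap = len(query_terms & set(p.read_text().lower().split()))
        age_days = (now - p.stat().st_mtime) / 86400
        return overlap * math.exp(-age_days / 30)   # 30-day recency decay
    ranked = sorted(MEMORY_DIR.glob("*.md"), key=score, reverse=True)
    return [str(p) for p in ranked[:limit] if score(p) > 0]

write_memory("billing", "customer prefers invoices on the 1st")
write_memory("infra", "staging db migrated to postgres 16")
print(hints({"invoices", "billing"}))   # model is told which files to read
# The model then reads only these files, keeping per-run context small.
```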
Tasklet also illustrates why model neutrality may matter. Lee said Anthropic remains an important supplier and partner, but also a major competitor: roughly 80% of Tasklet churners go to an Anthropic product, according to him, often because they already have a Claude Max plan. OpenAI’s GPT-5.5 has become good enough in Tasklet’s testing to make multi-provider support more credible. Tasklet’s pitch to businesses is not that it has picked the winning model lab, but that customers should not have to.
| Software category | Andrew Lee’s description | Operational requirement |
|---|---|---|
| Horizontal platforms | General agent platforms that hold context, connections, generated UI, and workflow execution. | Memory, permissions, tool access, rollback, model routing, and ergonomic oversight. |
| Headless/API companies | Infrastructure whose UI may matter less but whose capability remains essential. | Reliable APIs, compliance, and integration into agent-driven workflows. |
| Solutions companies | Businesses that sell outcomes rather than software seats. | AI embedded inside delivery, with accountability for the result. |
Intercom supplies the organizational version of the same argument. Scanlan said Intercom doubled engineering pull-request throughput in less than a year by treating AI coding as an internal platform strategy, not an individual productivity preference. The company standardized on Claude Code, made AI adoption an explicit expectation across R&D, created a dedicated “2x” team, encoded recurring work into skills, connected agents to internal systems under existing controls, and instrumented usage.
The reported metrics are company-specific, not universal proof. Intercom’s internal PR-flow dashboard showed 968 total merged pull requests, 95.9% with a Claude label, and a 17.6% Claude-approved rate. Scanlan said review became the bottleneck, so Intercom began moving some simple approvals to automatic agents after backtesting, human labeling, and auditor work around SOC 2, ISO 27001, and HIPAA compliance. He also said CI “melted,” showing that higher coding throughput pushes pressure downstream into testing, review, and deployment infrastructure.
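Scanlan’s sequencing is the transferable part: automatic approval came only after replaying the rule against past human decisions. A minimal sketch of such a backtest, with an invented rule and invented labels:

```python
# Hypothetical backtest for an auto-approval rule, in the spirit of Scanlan's
# description: before an agent may approve simple PRs, replay the rule against
# past human decisions and require near-zero false approvals.
def rule_says_auto_approve(pr: dict) -> bool:
    """Invented rule: tiny, test-only changes qualify."""
    return pr["lines_changed"] < 20 and pr["touches_only_tests"]

history = [  # past PRs with the human reviewer's actual decision
    {"lines_changed": 5,   "touches_only_tests": True,  "human_approved": True},
    {"lines_changed": 12,  "touches_only_tests": True,  "human_approved": True},
    {"lines_changed": 400, "touches_only_tests": False, "human_approved": False},
    {"lines_changed": 8,   "touches_only_tests": True,  "human_approved": False},
]

false_approvals = sum(1 for pr in history
                      if rule_says_auto_approve(pr) and not pr["human_approved"])
coverage = sum(map(rule_says_auto_approve, history)) / len(history)
print(f"false approvals: {false_approvals}, coverage: {coverage:.0%}")
# One false approval here means the rule is not safe to ship yet; the bar for
# removing a human reviewer is set by the backtest, not the demo.
```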
The most transferable idea is the “skill,” not the prompt. Intercom packaged recurring engineering work into durable procedures agents can invoke and improve. A flaky-test skill, for example, required CI-log access before diagnosis, warned against guessing from code alone, gathered context from S3 and Buildkite, classified failure modes, and followed a structured workflow. Scanlan said the value was not merely that an agent could fix one flaky test; the procedure became reusable institutional knowledge.
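A skill in this sense is closer to an executable runbook than a prompt. A hedged sketch of how the flaky-test procedure might be encoded — the structure is an assumption, since Intercom has not published its format:

```python
# Hypothetical encoding of a "skill" as a structured, reusable procedure,
# modeled on the flaky-test example: required inputs are checked before the
# agent may diagnose, and the steps are explicit and ordered.
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    required_context: list[str]         # inputs the agent must gather first
    steps: list[str]                    # the procedure itself
    warnings: list[str] = field(default_factory=list)

flaky_test_skill = Skill(
    name="fix-flaky-test",
    required_context=["ci_logs", "recent_failure_runs"],  # e.g. S3, Buildkite
    steps=[
        "classify the failure mode (timing, ordering, external dependency)",
        "reproduce the flake locally or in CI",
        "apply the fix pattern for that failure mode",
        "re-run the test repeatedly to confirm stability",
    ],
    warnings=["do not guess the cause from the code alone; read the CI logs"],
)

def invoke(skill: Skill, available_context: set[str]) -> None:
    missing = [c for c in skill.required_context if c not in available_context]
    if missing:
        raise ValueError(f"{skill.name}: gather {missing} before diagnosing")
    for i, step in enumerate(skill.steps, 1):
        print(f"{i}. {step}")           # each step becomes an agent task

invoke(flaky_test_skill, {"ci_logs", "recent_failure_runs"})
```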
Tasklet and Intercom approach the problem from different angles. Tasklet explains the market architecture: file-system memory, agentic search, generated UI, provider neutrality, and the shrinking durability of SaaS tabs. Intercom explains the operating architecture: standardization, adoption expectations, skills, plugins, telemetry, automated review, CI bottlenecks, and the conversion of internal know-how into agent-usable procedures.
The next layer of AI software may not be another chat window. It may be the system that remembers the organization, knows the rules, chooses the tools, invokes the right capabilities, and lets agents act safely enough to be useful.
Physical AI has the same pattern: deployment, validation, and liability
Self-driving shows the same operationalization pattern in a setting where the cost of error is more obvious. Wayve CEO Alex Kendall and Waabi CEO Raquel Urtasun both argued that autonomy has moved from basic science risk toward deployment risk, but they did not claim the problem is simply solved. The remaining work is integration, validation, regulation, OEM relationships, economics, liability, and commercialization.
Kendall said “self-driving in a way that economically scales the world” is not solved. His point was that the scientific question — whether end-to-end AI can learn a generalizable driving policy — has moved behind Wayve, while the harder commercial question remains: can that policy be integrated into production vehicles, validated across broad domains, accepted by regulators, and distributed through automakers and fleets?
Wayve’s strategy is to license an “intelligence layer” to automakers and fleets. Kendall contrasted that with Tesla, which builds its own cars, and Waymo, which builds city-by-city robotaxi fleets. Wayve’s model depends on a flexible AI driver that can work across consumer vehicles, robotaxis, vehicle types, geographies, and sensor configurations. He pointed to Nissan’s plans and supervised robotaxi trials with Uber in London, Tokyo, and 10 other cities as commercialization paths.
Waabi’s strategy overlaps but differs. Urtasun emphasized an L4-native architecture, trucking-first commercialization, scalable maps as an added safety layer, and a Driver-as-a-Service model priced primarily per mile. Waabi does not plan to own trucks or robotaxis, and does not want to become an OEM. In robotaxis, Uber provides the marketplace, an OEM supplies the redundant vehicle platform, and Waabi provides the autonomy technology. In trucking, Urtasun said the company has been conducting commercial operations with shippers and carriers in North America since 2023 and has an Uber Freight partnership for “billions of miles” of deployment.
| Dimension | Wayve | Waabi |
|---|---|---|
| Initial emphasis | Consumer vehicles and robotaxis through OEM and fleet licensing. | Trucking first, expanding into robotaxis. |
| Commercial model | Licenses an intelligence layer to automakers and fleets. | Driver-as-a-Service, primarily per-mile. |
| Autonomy stance | Works across L2, L3, and L4 programs for OEMs that want one partner. | Argues L4 must be built as L4-native, not as L2 with fewer interventions. |
| Maps and sensors | Emphasizes generalization and sensor flexibility across cameras, radar, and lidar. | Uses HD maps when they can be built efficiently as an added safety layer. |
| Distribution | Named Uber supervised trials and OEM partners such as Nissan. | Uses Uber Freight and Uber robotaxi marketplace partnerships, plus OEM platforms. |
World models are the shared technical substrate. Kendall described a world model as understanding the state of the world, the action taken, and how the world evolves afterward. It helps the system learn what matters for driving and supports simulation, replay, adversarial testing, and validation. Urtasun described Waabi’s world model as a controllable physical-AI simulator that can generate safety-critical scenarios “with no consequences.” Both treat simulation as a way to reduce the impossible burden of learning every rare case on public roads.
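A world model in this sense is, at minimum, a transition function: given the current state and an action, predict how the world evolves, then close the loop with a driving policy. A schematic sketch with toy dynamics — the interface is illustrative, not either company’s architecture:

```python
# Schematic world-model loop following Kendall's description: state, action,
# and how the world evolves. The dynamics are a toy stand-in for a learned
# model; the point is the closed loop used for simulation and validation.
from dataclasses import dataclass

@dataclass
class State:
    ego_speed_mps: float
    gap_to_lead_m: float

def world_model(s: State, brake: float) -> State:
    """Toy transition for one 0.1 s step; a real model is learned from data."""
    dt = 0.1
    speed = max(0.0, s.ego_speed_mps - 8.0 * brake * dt)  # braking decel
    return State(speed, s.gap_to_lead_m - speed * dt)     # lead car stopped

def policy(s: State) -> float:
    """Toy driving policy: brake harder as the gap closes."""
    return min(1.0, 30.0 / max(s.gap_to_lead_m, 1.0))

def rollout(s: State, steps: int) -> State:
    for _ in range(steps):
        s = world_model(s, policy(s))   # simulate without touching a road
    return s

# Adversarial scenario: fast approach to a stopped vehicle, "no consequences".
final = rollout(State(ego_speed_mps=20.0, gap_to_lead_m=40.0), steps=100)
print(f"final speed {final.ego_speed_mps:.1f} m/s, gap {final.gap_to_lead_m:.1f} m")
```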
The physical setting makes the operational constraint more unforgiving. A chatbot can hallucinate and be corrected; a self-driving car cannot afford the error. Kendall emphasized that embodied driving requires real-time onboard inference, uncertainty-aware decisions, high-dimensional sensor data, and safety-critical outputs. Urtasun said L2 and L4 are different safety problems: in L2, the human remains responsible; in L4, there is no human fallback, so the system must be designed for that safety case from the beginning.
Liability is part of the product. Kendall said that for hands-off systems, the driver should remain liable if the system is implemented correctly. For eyes-off or driverless systems, liability shifts to the manufacturer or operator, with insurance and contracts allocating risk across the ecosystem. Urtasun’s insistence on OEM redundant platforms reflects the same point. The commercial product is not just a model that drives; it is a validated, regulated, insured, integrated system with known responsibility when no human is driving.
The comparison with enterprise agents is direct. In offices, the hard problem is whether an agent has enough context, permissions, logging, rollback, and human oversight to touch production systems. On roads, the hard problem is whether an autonomy stack has enough validation, sensor coverage, simulation fidelity, regulatory approval, liability allocation, and OEM integration to touch the physical world. In both cases, capability is only the beginning.