Orply.

AI Advantage Moves Into The Systems Around The Model

Applied AI · Thursday, May 7, 2026 · 5h 3m to watch · 15 min to read

Across today’s sources, applied AI was framed less as a contest over standalone models and more as an operating problem: agents need source code, memory, monitoring, constraints, and secure access to do useful work. The same systems view appeared in infrastructure, where demand is spreading beyond GPUs into CPUs, memory, fiber, fabs, power, chip design, and platform control points.

Agents are becoming operating systems, not prompt boxes

The strongest applied-AI thread today was not a new model release. It was a set of engineering claims about what agents need when they are expected to do real work: source code they can inspect, memory at the right scope, production monitoring that can detect semantic failure, and guardrails that turn repeated mistakes into system constraints.

Michael Arnaldi’s coding-agent workflow for Effect made the most concrete version of this argument. His recommendation was deliberately practical: if a coding agent needs to use a library well, put the library’s source code inside the project where the agent can inspect and imitate it. In the workshop, that meant adding the Effect repository as a visible subtree under .repos/effect, asking the agent to extract local pattern files, and then steering implementation with strict TypeScript diagnostics, tests, lint rules, and persistent instructions in AGENTS.md. His operating model was less “write a better prompt” than “make the repository carry the knowledge the model needs.”
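To make the shape of that workflow concrete, here is a minimal TypeScript sketch of the "repository carries the knowledge" idea: vendored library source the agent can inspect, plus hard check gates on every change. The specific commands and script layout are illustrative assumptions, not Arnaldi's exact setup; only the .repos/effect path and the strict-diagnostics-plus-tests-plus-lint idea come from the source.

```ts
// Hypothetical gate script: every agent edit must pass the same strict
// checks a human change would face. Commands are illustrative assumptions.
import { execSync } from "node:child_process";
import { existsSync } from "node:fs";

// The vendored library source the agent is told to inspect and imitate.
if (!existsSync(".repos/effect")) {
  throw new Error("Vendored Effect source missing: run the subtree setup first.");
}

const checks = [
  "npx tsc --noEmit", // strict TypeScript diagnostics
  "npx eslint .",     // lint rules the agent must satisfy
  "npx vitest run",   // tests as hard constraints
];

for (const cmd of checks) {
  execSync(cmd, { stdio: "inherit" }); // throws on non-zero exit, halting the loop
}
console.log("All gates passed: the change meets the repository's constraints.");
```

The design choice this illustrates is that failure is mechanical: the agent cannot argue with a non-zero exit code, so repeated mistakes become constraints rather than prompt revisions.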

Google Cloud’s Annie Wang made a related argument at the application-architecture layer. Her “goldfish memory problem” was framed as an architecture failure, not a model failure. An agent that forgets a user’s Tokyo travel preferences within the same conversation needs session state. A workflow where a restaurant-finding agent hands a destination to a navigation agent needs shared workflow state. An assistant that remembers preferences after a restart needs persistent storage. In Wang’s framing, memory is not one feature; it is a set of scopes that have to match the work being done.
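Wang's scopes separate cleanly in code. A minimal sketch of the three, with invented names and shapes (this is not Google Cloud's actual API), might look like this:

```ts
// Three memory scopes as distinct interfaces. Names are illustrative.

// 1. Session state: survives turns within one conversation, then expires.
interface SessionState {
  conversationId: string;
  facts: Map<string, string>; // e.g. "destination" -> "Tokyo"
}

// 2. Shared workflow state: the handoff channel between cooperating agents,
//    e.g. restaurant-finder -> navigator.
interface WorkflowState {
  workflowId: string;
  handoffs: Array<{ from: string; to: string; payload: unknown }>;
}

// 3. Persistent memory: keyed to the user, survives restarts.
interface PersistentMemory {
  load(userId: string): Promise<Record<string, string>>;
  save(userId: string, prefs: Record<string, string>): Promise<void>;
}

// The point of Wang's framing: pick the scope that matches the work.
type Task = "same-chat recall" | "agent handoff" | "cross-session prefs";
function scopeFor(task: Task): "session" | "workflow" | "persistent" {
  const table = {
    "same-chat recall": "session",
    "agent handoff": "workflow",
    "cross-session prefs": "persistent",
  } as const;
  return table[task];
}
```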

Raindrop’s observability workshop extended that systems view into production. Zubin Kumar, Danny Gollapalli, and Ben Hylak argued that offline evals are necessary but not sufficient for agents whose inputs, outputs, tool use, and trajectories are open-ended. Their proposed monitoring stack tracks explicit telemetry such as tool errors, latency, cost, and regenerations, but also implicit semantic signals such as user frustration, refusals, task failure, laziness, jailbreak attempts, capability gaps, and unusual workarounds. A production agent can be failing before a clean exception appears; the first signal may be a cluster of frustrated user messages around a provider, tool, or intent.
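The two signal families can be sketched as types. The type names below are invented, and the keyword heuristic is a deliberately naive stand-in for the semantic classification Raindrop describes; only the signal lists themselves come from the source.

```ts
// Explicit telemetry: directly measurable, fires on its own.
type ExplicitSignal =
  | { kind: "tool_error"; tool: string; message: string }
  | { kind: "latency_ms"; value: number }
  | { kind: "cost_usd"; value: number }
  | { kind: "regeneration"; count: number };

// Implicit semantic signals: inferred from content, not exceptions.
type ImplicitSignal =
  | { kind: "frustration"; excerpt: string }
  | { kind: "refusal" }
  | { kind: "task_failure" }
  | { kind: "jailbreak_attempt" };

// Naive keyword heuristic standing in for a semantic classifier;
// a real system would label messages with a model, not string matching.
function flagFrustration(userMessage: string): ImplicitSignal | null {
  const markers = ["this is wrong", "you already", "that's not what", "useless"];
  const hit = markers.some((m) => userMessage.toLowerCase().includes(m));
  return hit ? { kind: "frustration", excerpt: userMessage.slice(0, 80) } : null;
}

// The alerting idea from the source: cluster implicit signals by provider,
// tool, or intent, so a spike surfaces before any clean exception does.
function spike(signals: ImplicitSignal[], threshold = 5): boolean {
  return signals.filter((s) => s.kind === "frustration").length >= threshold;
}
```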

Taken together, these sources point to the same operational shift from different layers. Arnaldi’s unit of control is the repository: source, patterns, diagnostics, lint rules, tests, and small tasks. Wang’s unit is memory scope: current conversation, multi-agent workflow, and cross-session persistence. Raindrop’s unit is production behavior: traces, semantic labels, alerts, experiments, and self-diagnostics. They are not making identical claims, but they converge on a practical conclusion: agent quality depends increasingly on state, tooling, feedback, and constraints around the model.

The commercial case for that shift came from Replit. Amjad Masad said Replit went from roughly $2.5 million to $250 million in revenue run-rate in one year after launching Replit Agent, with about $1 million of ARR on its first day. His explanation was that the agent did not merely write code; it handled enough of the surrounding software lifecycle — environment setup, packages, databases, debugging, deployment — for non-engineers to build and run applications. Replit’s product-market-fit moment, in Masad’s telling, came when a non-engineer inside the company could make an app.

That makes “AI coding” too broad a category to be useful on its own. Arnaldi described how to make an expert framework usable by an agent inside a real codebase. Replit described how to turn software creation into a platform for non-engineers. Descript’s Laura Burkhauser described the analogous constraint in media: an agentic editor is useful only if its actions are reliable, reversible, and fitted to the project’s internal edit structure. Her view was that creators do not reject all AI equally; they welcome narrow, predictable tools and resist unreliable or replacement-framed generation.

The AI infrastructure trade is widening beyond GPUs

The second major thread was the broadening of AI infrastructure demand. The sources did not argue that GPUs are becoming unimportant. They argued that AI demand is now pulling CPUs, memory, optical networking, power systems, fabrication capacity, chip-design tools, and even launch economics into the same bottleneck map.

Arm chief executive Rene Haas told Bloomberg that visibility for Arm’s AI CPU orders had doubled from $1 billion to $2 billion in five weeks, even as smartphone demand weakened. Haas tied that demand to agentic AI workloads that require CPU-heavy orchestration, scheduling, and management that GPUs do not handle. Bloomberg’s AMD coverage made a similar point from the market side. Ian King, Caroline Hyde, and RBC’s Srini Pajjuri framed AMD’s rally around Lisa Su’s CPU forecast and a changing CPU-to-GPU ratio as inference and agentic workloads grow. Pajjuri said historical training systems might use one CPU for four or eight GPUs, while inference and agentic AI could move closer to one CPU for two GPUs, with the longer-term ratio still uncertain.
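The ratio shift is easy to make concrete. In the back-of-envelope sketch below, the fleet size is a made-up illustration; only the 1:8, 1:4, and 1:2 ratios come from Pajjuri's comments.

```ts
// Hypothetical accelerator fleet; only the ratios are from the source.
const gpus = 10_000;

const cpusAt = (gpusPerCpu: number): number => Math.ceil(gpus / gpusPerCpu);

console.log(cpusAt(8)); // training-style 1:8   -> 1,250 CPUs
console.log(cpusAt(4)); // training-style 1:4   -> 2,500 CPUs
console.log(cpusAt(2)); // inference/agentic 1:2 -> 5,000 CPUs
```

Holding the GPU count fixed, moving from 1:8 to 1:2 multiplies CPU demand by four, which is the mechanical reason inference and agent growth pulls CPU orders along with it.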

That CPU story does not displace the accelerator story; it complicates it. Pajjuri still described the GPU and custom-chip opportunity as much larger than the CPU market. But Arm and AMD were both presented as evidence that inference and agents require systems, not just accelerators. Nvidia’s Corning deal added another part of the stack. Bloomberg described Nvidia putting up to $500 million into Corning with a warrant for up to 15 million shares, while analysts framed fiber and optical connectivity as emerging constraints as copper reaches limits and data has to move faster among servers and data centers.

Memory and fabrication pressure appeared through Samsung and the semiconductor supply-chain primer. Samsung crossing a $1 trillion valuation was presented by Sangmi Cha as an expression of AI infrastructure demand, especially tight memory-chip supply, not just a symbolic milestone. The broader semiconductor primer described a system that ships about a trillion devices a year but remains dependent on a few specialized chokepoints: ASML’s $400 million EUV machines, TSMC’s concentration of more than 90% of advanced-chip manufacturing in Taiwan, and fabs that take years and deep expertise to bring up.

Apple’s chip-supply discussions with Intel and Samsung fit the same pattern from a platform company’s perspective. Mark Gurman framed Apple’s early talks about using new U.S. fabs for future A-series and M-series chips as a redundancy question, not only a political one. Apple still relies heavily on TSMC, primarily in Taiwan, and Gurman described that concentration as one of Apple’s biggest risks because Apple’s products depend on its custom silicon.

Several AI Ascent talks pushed the infrastructure question further upstream and further out. Ricursive Intelligence’s Anna Goldie and Azalia Mirhoseini argued that chip design itself is a bottleneck and that AI should be used to design the chips that train and serve AI. Naveen Rao of Unconventional AI argued from a different starting point: conventional digital computing may hit an energy wall within a few years, and AI hardware needs to be rebuilt around physical dynamics, time-domain computation, and architectures that blur memory and processing. Philip Johnston of Starcloud made the most speculative case, arguing that orbital data centers become economically plausible if launch costs fall below about $500 per kilogram.

The maturity levels differ sharply. Arm, AMD, Samsung, TSMC, ASML, Apple, and Nvidia are immediate supply-chain and market questions. Ricursive is a venture-backed attempt to change chip-design tooling. Unconventional AI is a physics-first hardware thesis. Starcloud depends on major launch-cost reductions and orbital engineering. The useful synthesis is not that all are equally likely. It is that the AI buildout is no longer adequately described by who has the best GPU. The constraint map now includes CPUs, memory, lithography, foundries, fiber, power, chip-design cycle time, and alternative compute substrates.

$2B
Arm AI CPU order visibility that Haas said doubled in five weeks

Model competition is fragmenting across context, data, modality, and workflow

Model competition in the day’s sources looked less like a single leaderboard race than a set of different access problems: long context, data efficiency, modality, proprietary workflow data, and interface form.

DeepSeek V4 Preview was the clearest model-access story. Károly Zsolnai-Fehér presented it as consequential because it combines open weights, frontier-adjacent benchmark results, and a reported one-million-token text context window. His emphasis was not that DeepSeek wins every benchmark. It was that a freely self-hostable or low-cost model appears close enough to recent closed frontier systems to change what developers can afford. He also stressed limits: it is text-only, degrades near the edge of its context window, and still requires serious hardware at full scale.

That long-context story sits uneasily, but usefully, beside Arnaldi’s warning that larger context is not a substitute for repository structure. DeepSeek’s one-million-token window may let a model absorb large documentation sets. Arnaldi’s workflow suggests agents still need curated source, pattern extraction, narrow tasks, context resets, and hard constraints to avoid context pollution and shortcuts. The two claims do not contradict each other; they define different parts of the problem. Long context expands what can be supplied. It does not decide which knowledge should be durable, how tasks should be scoped, or how repeated failures should be boxed in.

Flapping Airplanes made a different bottleneck claim: data scarcity, not compute alone, may determine where AI creates value next. Ben and Asher Spector argued that search and coding have produced large gains partly because they are unusually data-rich, while robotics, trading, science, industrial workflows, and narrow supply-chain tasks lack comparable datasets. Their proposed answer is not more scraping alone but more data-efficient models, enabled partly by new GPU-level primitives that current frameworks such as PyTorch make hard to express.

Pinterest, Descript, ElevenLabs, and Luma each argued for a more specific version of the same theme: proprietary signals and domain workflows matter. Bill Ready said Pinterest’s advantage in AI shopping comes from taste and intent signals across more than 80 billion monthly searches, more than half commercial. Burkhauser said Descript builds in-house models where it has editing-workflow data and borrows models where it does not. Mati Staniszewski said ElevenLabs’ hard audio problems now include emotion, timing, authentication, workflow, and domain-specific annotation. Amit Jain argued that Luma’s future is not merely video generation but a unified multimodal transformer and a skills-and-tools stack for professional creative work.

The modality claims diverge in useful ways. ElevenLabs sees voice as the primary interface for agents, robots, services, education, healthcare, and government interactions once intelligence becomes abundant. Luma sees video and multimodal generation as staging points toward systems that can do end-to-end professional work across text, images, audio, video, code, and tools. Descript is more restrained: its bet is not infinite generation but reliable editing of recorded human media. These are different answers to where AI leaves the prompt box and enters work: speech, creative production, recorded media, shopping, and tool-using agents.

Platform companies are opening weak layers while defending control points

Apple appeared in two forms today, and the contrast was revealing. On chips, it is exploring more supplier redundancy because dependence on TSMC and Taiwan is a physical risk. On AI models, Gurman said Apple is moving toward outside model choice because Siri and Apple Intelligence lag rivals. In both cases, Apple is not giving up the platform. It is trying to preserve the customer relationship while reducing dependence on a weak or concentrated layer.

Gurman described Apple’s reported iOS 27 plan as a bounded form of model agnosticism. Users may be able to choose outside AI models such as Gemini, Claude, and ChatGPT for some features, while Apple rebuilds Siri partly using Google Gemini models. Gurman separated infrastructure use of Gemini from consumer-facing Gemini functionality: Siri may be improved using Google model work without becoming Google Gemini as a product. The near-term customer benefit is optionality. The long-term strategic issue, Gurman said, is that a hardware company with future AI-dependent devices cannot rely indefinitely on third parties for the underlying intelligence.

Thoma Bravo’s Seth Boro used similar language — “model agnostic” — but for enterprise AI deployment. His reason was different. Thoma Bravo wants relationships with OpenAI, Anthropic, Google, and others because enterprises are still sorting out model economics, governance, security, and use-case fit. Boro argued that many organizations do not yet understand the cost of bringing tokens into workflows, especially for higher-functioning roles, and that specific models for specific use cases may be more efficient than broad general-purpose deployment.
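Boro's "specific models for specific use cases" point implies a routing layer rather than a single default model. A minimal sketch of that idea, with an invented interface and routing table (not Thoma Bravo's or any vendor's actual design; the model names are placeholders):

```ts
// A provider behind a common interface, so governance and cost tracking
// live at the router rather than in each workflow.
interface ModelProvider {
  name: string;
  complete(prompt: string): Promise<string>;
  costPerMillionTokens: number;
}

type UseCase = "summarization" | "code-review" | "contract-analysis";

// One auditable table of which model handles which work.
const routes: Record<UseCase, string> = {
  summarization: "cheap-general-model",       // low stakes, cost-driven
  "code-review": "code-specialized-model",    // fit-driven
  "contract-analysis": "high-accuracy-model", // risk-driven
};

function pickModel(useCase: UseCase, fleet: ModelProvider[]): ModelProvider {
  const wanted = routes[useCase];
  const match = fleet.find((m) => m.name === wanted);
  if (!match) throw new Error(`No provider registered for ${wanted}`);
  return match;
}
```

The router is also where Boro's governance questions attach: it is the one place that records which model saw which data and took which action.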

Lauren Webster’s software-market comments translated that into investor language. She said the market is moving from broad AI panic to company-by-company sorting: which incumbents have embedded enterprise workflows and replacement friction, which companies can execute, and which are exposed to easier substitution by frontier labs. Palantir’s executives and supporters argued that its ontology and real-time enterprise data mapping are the layer that makes agentic AI useful; investors still questioned valuation and commercial concentration. The broader message from those sources was that generic model access is not the moat. Workflow depth, data architecture, services, deployment friction, and trust are where companies are trying to defend value.

Airbnb’s Brian Chesky made the platform-reinvention version of the same argument. He said Airbnb must move beyond homes toward identity, preferences, community, and a broader set of services, while adapting its operating model for AI. His view is that software itself may become less durable as AI lowers creation costs; what may last are community, brand, identity, mission, and authenticated trust. He wants Airbnb’s atomic unit to shift from a home to a person, including richer profiles, proof of personhood, preferences, and real-world relationships.

Razorpay’s Harshil Mathur supplied the founder-operator version. AI, he argued, lowers the cost of building and therefore weakens “building” as a moat. The remaining differentiation is speed and judgment: whether a company can decide what the market will become before the market forces it to respond. Razorpay’s own AI reset began with the question of how the company would be built if started today, followed by a decision to reinvent the platform before a startup did it to them.

Security and evaluation are becoming operating constraints

Several sources treated AI risk less as abstract safety discourse and more as a near-term operating problem.

The strongest security warning came from Oege de Moor of XBOW, who argued that autonomous AI hacking has already moved from assistance to exploitation. He said XBOW’s system found a remote code execution flaw in Bing Image Search with only a URL as input, and that the system reached the top of HackerOne under black-box conditions. His timeline was urgent: he estimated defenders have six to nine months before open-weight models catch up to Mythos-like cyber capability, making similar tools broadly available to attackers.

Boro’s enterprise-security comments were less dramatic but pointed in the same direction. He said enterprises should behave as if powerful cyber-capable models are already available or soon will be, and he emphasized layered defense, network visibility, identity governance, and monitoring of what agents are allowed to do. The governance questions he listed — what agents are doing, what information they have, where data comes from, and what actions they take — are becoming ordinary enterprise AI questions, not edge cases.

AI evaluation appeared in two different registers. Raindrop treated production observability as the practical evaluation surface for deployed agents. B Cavello of Aspen Digital treated evaluations as a governance and philanthropic lever: benchmarks and audits do not merely measure progress; they shape what developers optimize for, who can hold systems accountable, and whether users understand appropriate use. Cavello’s lifecycle map placed evaluation at goal definition, data gathering, model selection, model building, deployment, and monitoring. The implication is that measurement choices influence what gets built.

Washington’s role is widening too. Bloomberg reported that Alphabet, Microsoft, and xAI joined OpenAI and Anthropic in giving the Commerce Department’s AI evaluation center early access to models before public release. Maggie Eastland stressed that the center evaluates but does not regulate. Separately, Elizabeth Economy’s conversation with Randy Schriver and Mike Kuiken argued that U.S.-China policy now cuts across export controls, sanctions, supply chains, cyber infrastructure, Taiwan planning, pharmaceuticals, and investment screening, while the U.S. government still manages many of those tools through fragmented institutions built for an earlier era.

The national-security thread connects directly to the semiconductor thread. The chip primer described TSMC concentration in Taiwan, ASML lithography dependence, U.S. reshoring through the CHIPS Act, and Chinese state funding to close advanced-chip gaps. Economy, Schriver, and Kuiken extended that into a broader statecraft problem: the United States needs machinery to decide which economic ties with China are tolerable, which are dangerous, and which require national effort to replace. In applied AI, those questions are part of the operating environment for chips, models, data centers, cyber defense, and industrial deployment.

The human boundary is still a product boundary

Not every important AI deployment is an enterprise workflow or infrastructure bottleneck. Several sources focused on the human boundary: when convenience displaces care, when education can scale without filtering out talent, and when voice or editing tools mediate human expression.

Stephen Remedios’s “DaddyGPT” story was a caution about successful automation. He built an AI version of himself to answer his teenage sons’ routine permission requests while he worked. The problem was not that it failed. It worked so well that the children kept using the bot even when their parents were beside them, because it was always available, calm, and adaptive. Remedios’s conclusion was that some exchanges matter because a particular imperfect person is the one having them. The source is not a general anti-AI argument; it is a warning that convenience changes when it enters care relationships.

Andrew Thangaraj’s account of IIT Madras’s online data science degree addressed a different human constraint: talent systems that filter too early and teach too little usable skill. He argued that India’s AI capacity problem starts with a higher-education system where elite institutions serve too few students at too high a cost. IIT Madras’s online degree moves the filter from admission to completion: no JEE entry, low cost, modular exits, and a rigorous skills-heavy diploma stage with application and machine-learning projects. The model matters because AI talent at national scale cannot come only from scarce elite seats.

Voice and media tools sit between these poles. ElevenLabs argues that voice will become the primary interface for agents and robots as intelligence becomes more abundant, but Staniszewski emphasized that the next hard problems are emotional intelligence, timing, authentication, and workflow. Descript argues that creators will adopt AI when it removes the drudgery between them and the story they intended to tell, not when it floods platforms with content arbitrage. Both sources distinguish between automation that extends human intent and automation that replaces the meaningful part of an exchange.

The practical boundary is reliability. Voice agents may be useful for government services, education, healthcare, and sales, but they require trust and authentication. Editing agents may save creators time, but only if actions are reversible and reliable. Educational AI may help scale support to tens of thousands of students, but Thangaraj said IIT Madras saw 10% to 15% error rates in earlier AI code-feedback experiments, too high for high-stakes use at large student volumes. These sources are not rejecting AI assistance. They are specifying where reliability, disclosure, accountability, and human presence still matter.

The frontier, in your inbox tomorrow at 08:00.

Sign up free. Pick the industry Briefs you want. Tomorrow morning, they land. No credit card.
