Orply.
Topic

RAG and Knowledge Systems

Retrieval-augmented generation, enterprise search, vector databases, embeddings, document intelligence, and knowledge-grounded AI systems.

Agents Often Claim Web Access After Being Blocked or Challenged

Rafael Levi of Bright Data argues that many web-dependent agents fail not because they cannot produce answers, but because they report success after web access has broken. In a demo using Bright Data’s Web MCP, Levi shows the same agent failing against sites such as LinkedIn, Instagram, Amazon and TikTok without live access, then producing usable results when given infrastructure for search, scraping, JavaScript rendering and CAPTCHA handling. His broader case is that reliable agents need a real public-web access layer, not prompts that assume the model saw the page.

Rafael Levi | AI Engineer | Jun 17, 2026 | 9 min read

Hermes Uses a Minimal Agent Loop to Preserve State Across Channels

Alejandro AO’s walkthrough of Hermes presents the agent as a deliberately small always-on system rather than a complex orchestration stack. He argues that Hermes’ usefulness comes from a simple loop that builds context from Markdown files, message history, tools, skills and memory, then preserves state through compression, SQLite transcripts, optional external memory providers, gateway integrations and scheduled cron jobs. The architecture’s central concern is continuity: keeping enough context across channels and time for the agent to behave like a persistent assistant.

Alejandro AO | Hugging Face | Jun 17, 2026 | 11 min read
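
The continuity idea in the summary above (a transcript stored in SQLite, with older history compressed before each turn) can be illustrated with a minimal sketch. This is not Hermes code: the table layout, the function names `open_transcript`, `record` and `build_context`, and the placeholder "compression" line are all assumptions made for illustration.

```python
import sqlite3

def open_transcript(path=":memory:"):
    """Open (or create) a transcript store. The schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY, channel TEXT, role TEXT, content TEXT)"
    )
    return db

def record(db, channel, role, content):
    """Append one message, tagged with the channel it arrived on."""
    db.execute(
        "INSERT INTO messages (channel, role, content) VALUES (?, ?, ?)",
        (channel, role, content),
    )
    db.commit()

def build_context(db, instructions, keep_last=3):
    """Assemble a prompt: instructions, a stub for older turns, recent turns.

    keep_last must be at least 1.
    """
    rows = db.execute(
        "SELECT channel, role, content FROM messages ORDER BY id"
    ).fetchall()
    older, recent = rows[:-keep_last], rows[-keep_last:]
    parts = [instructions]
    if older:
        # Placeholder compression; a real system would likely summarize here.
        parts.append(f"[{len(older)} earlier messages compressed]")
    parts += [f"{channel}/{role}: {content}" for channel, role, content in recent]
    return "\n".join(parts)
```

Because every message is stored with its channel, the same history is available whichever gateway the next message arrives through, which is the property the summary calls continuity.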

Enterprise AI Is Blocked by Context, Not Model Intelligence

Databricks chief executive Ali Ghodsi argues that enterprise AI is constrained less by model intelligence than by access to company context: data, documents, processes and relationships that agents need to operate inside businesses. In a Bloomberg Tech interview with Ed Ludlow, Ghodsi said Databricks is building products such as Genie Ontology and Lakehouse to make that context usable, while adoption in critical workflows remains slowed by security, legal and approval processes. He also declined to confirm reports of a new funding round and said Databricks is not rushing toward an IPO.

Ed Ludlow · Ali Ghodsi | Bloomberg Technology | Jun 16, 2026 | 6 min read

Codex Turns Earnings Reports Into Post-Quarter Investment Thesis Updates

OpenAI is pitching Codex’s public-equity investing plugin as a way to turn a company’s latest quarter into thesis-revision work rather than a conventional earnings recap. Using a Cava post-earnings example, the source argues that Codex can combine first-party filings, earnings-call material and third-party data from sources including Quartr, Daloopa and S&P Global to separate business momentum from stock expectations, build bull, base and bear cases, and produce a monitoring checklist for the next reporting window.

OpenAI | Jun 12, 2026 | 5 min read

RAG Is Becoming Agentic Retrieval, Not Disappearing

Kuba Rogut, a deployed engineer at Turbopuffer, argues that claims about RAG’s death rely on defining it as a narrow, one-shot vector search pattern. In his account, retrieval-augmented generation is becoming a broader agentic retrieval system: vector search, full-text search, grep, regex, glob and filters used iteratively by models that keep looking until they have the right context. He points to Cursor’s semantic-search gains and contrasts its upfront indexing with Claude Code’s per-session grep approach to frame embeddings as cached compute whose value depends on reuse.

Kuba Rogut | AI Engineer | Jun 9, 2026 | 6 min read
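
The iterative pattern in the summary above, where several search tools are tried in turn until the right context is found, can be sketched roughly as follows. The corpus, the tool names and the fixed list of steps are hypothetical; in a real agent a model would choose each next step, and `keyword_overlap_tool` is only a crude stand-in for vector search.

```python
import re

# Toy corpus; file names and contents are invented for illustration.
CORPUS = {
    "auth.py": "def login(user): return check_password(user)",
    "billing.py": "def charge(card): return gateway.submit(card)",
    "notes.md": "Payments are submitted through the gateway module.",
}

def grep_tool(pattern):
    """Regex search over raw text, standing in for grep."""
    return [name for name, text in CORPUS.items() if re.search(pattern, text)]

def keyword_overlap_tool(query):
    """Stand-in for semantic search: rank documents by shared lowercase words."""
    q = set(re.findall(r"\w+", query.lower()))
    scored = [
        (len(q & set(re.findall(r"\w+", text.lower()))), name)
        for name, text in CORPUS.items()
    ]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]

def retrieve(steps, is_enough, max_steps=5):
    """Run (tool, argument) steps in order until is_enough(found) is true."""
    found = []
    for tool, arg in steps[:max_steps]:
        for name in tool(arg):
            if name not in found:
                found.append(name)
        if is_enough(found):
            break  # stop searching once the needed context is present
    return found
```

For example, a first grep for `refund` finds nothing, a second grep for `gateway` finds `billing.py` and `notes.md`, and the loop stops there without running the third step.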

AI Compresses Years of Software Vulnerability Discovery Into Weeks

Palo Alto Networks chief executive Nikesh Arora told the All-In podcast that AI has changed cybersecurity by making years of latent software vulnerabilities discoverable in weeks. After testing Anthropic’s Claude Mythos against Palo Alto’s own code, Arora said the company found flaws that would normally have taken five to seven years to identify, raising the stakes for enterprises with weaker defenses. His broader argument was that AI will erode analytical SaaS while increasing the value of data infrastructure, workflow redesign and security systems that can make model outputs reliable enough for production.

Chamath Palihapitiya · Jason Calacanis · David Sacks · David Friedberg · Nikesh Arora | All-In Podcast | Jun 8, 2026 | 14 min read

Ulta Uses AI to Personalize HR Support for 65,000 Workers

Ulta Beauty executives Rachel Williamson and Josh Siebert describe the retailer’s ServiceNow-backed HR automation rollout as a response to a concrete operating problem: 65,000 employees could not reliably find the policies and support they needed. In a sponsored interview, they argue that the value of AI was not the chatbot itself, but its ability to personalize answers, route routine HR work away from overloaded teams, and preserve human judgment for sensitive cases. Their account frames AI as an enabler of workflow redesign, not an end in itself.

Alex Kantrowitz · Rachel Williamson · Josh Siebert | Alex Kantrowitz | Jun 8, 2026 | 10 min read

Code Agents Need Context Engineering, Not Larger Prompts

Nupur Sharma of Qodo argues that larger context windows have not solved a core agent failure: models still tend to use the beginning and end of an input while losing important material in the middle. Her case is that agent quality depends less on giving a model more context than on engineering how context is retrieved, ranked, constrained and checked. She describes Qodo’s approach as a mix of iterative retrieval, specialist agents, judge nodes and bounded orchestration that reserves high-reasoning models for discovery while using stricter, lighter steps for validation.

Nupur Sharma | AI Engineer | Jun 8, 2026 | 12 min read

LSEG Grounds AI Strategy in Trusted Financial Data and Controls

Emily Prince, group head of AI at LSEG, argues in an OpenAI Customer Ignite talk that AI in financial services only becomes useful at scale when it is grounded in trusted data, evaluation frameworks and governance that fit regulated work. She presents LSEG’s strategy as an effort to make its financial data and analytics available inside the tools customers and employees already use, including through APIs and Model Context Protocol, rather than treating AI as a generic answer engine. The case is that speed and experimentation matter, but only if controls, source quality and industry-specific workflows are built into the system.

Emily Prince · Nikolai Skabo | OpenAI | Jun 8, 2026 | 10 min read

OpenAI Pitches ChatGPT as Workflow Infrastructure for Financial Institutions

OpenAI solutions engineer Stephanie Anani makes the case that ChatGPT should sit inside financial-services workflows rather than alongside them as a general productivity tool. Her argument is that AI can take on the search, reconciliation, modeling, compliance-checking and presentation work that consumes analysts’ time, while leaving investment and risk judgment with humans. In a QXO investment case, she shows ChatGPT moving from trusted research sources to an auditable Excel model and committee deck, using firm-specific skills and controls meant for regulated environments.

Stephanie Anani | OpenAI | Jun 8, 2026 | 7 min read

AI in Financial Services Is Moving From Answers to Work Products

At OpenAI’s Investor Innovation Day, Sarah Friar and other speakers argued that Codex and enterprise ChatGPT are moving AI use in financial services from “asking mode” into execution. The examples stayed close to existing work: querying deal folders, speeding company research in Excel, generating spreadsheets, models, and decks, and distributing employee-built GPTs into daily operations. James Mackey tied the enterprise case to adoption at scale, saying 2,700 employees now have ChatGPT licenses and are using hundreds of internal GPTs as a business “force multiplier.”

David Bessel · Jasmine Azizi · Sarah Friar · James Mackey | OpenAI | Jun 7, 2026 | 5 min read

Enterprise AI’s Constraint Is Judgment, Not Token Consumption

At TBPN’s AIPCon 10 broadcast, Palantir chief executive Alex Karp argued that enterprise AI’s central problem is no longer model capability but organizational judgment: companies are consuming tokens, dashboards and AI-generated artifacts without tying them to decisions that change operations. AIG’s Peter Zaffino, Palantir’s Chad Wahlquist and USDA’s Sam Berry extended the same case from insurance, deployment architecture and government data systems, describing AI as valuable only when embedded in workflows, data structures and feedback loops that reflect how institutions actually work.

Jordi Hays · John Coogan · Chad Wahlquist · Alex Karp · Peter Zaffino · Sam Berry | TBPN | Jun 4, 2026 | 26 min read

AI Demand Is Real, but Productivity Gains Remain Unproven

Bloomberg’s Tech event in San Francisco framed the AI boom as a market caught between constrained infrastructure demand and valuations that leave little tolerance for misses. Executives from Databricks, Okta and Altimeter argued that the next bottlenecks are enterprise context, secure system access, power and capital allocation, while San Francisco Fed President Mary Daly said AI investment is widespread but has not yet produced broad, measurable productivity gains.

Caroline Hyde · Ed Ludlow · Andrew Feldman · Ali Ghodsi · Apoorv Agrawal · Mary Daly · Tom Giles · Todd McKinnon | Bloomberg Technology | Jun 4, 2026 | 18 min read

Enterprise AI’s Bottleneck Is Context, Not Smarter Models

Databricks co-founder and CEO Ali Ghodsi told Bloomberg Technology that the main enterprise AI problem is no longer model intelligence but access to organizational context. Ghodsi argued that artificial general intelligence has effectively arrived by a practical workplace test, and that companies should focus on connecting models to their data, processes and metrics so agents can become useful. He also cast that thesis as central to Databricks’ Lakehouse and Genie products, while saying the company can remain privately funded until an eventual IPO is needed for employee liquidity.

Caroline Hyde · Ed Ludlow · Ali Ghodsi | Bloomberg Technology | Jun 4, 2026 | 5 min read

AI Voice Agents Are Beating the Average Customer-Service Rep

Tom Chen, chief product officer at Aircall, argues that AI voice agents should be judged against the average customer-service interaction, not the best human rep. In his account, the technology is already good enough for many routine calls, can handle far more concurrency at lower cost, and may improve satisfaction when customers are given a clear choice between faster AI service and a human agent. The main constraint, Chen says, is often not the model but the undocumented company knowledge the agent needs to resolve issues.

Craig Smith · Tom Chen | Eye on AI | Jun 4, 2026 | 17 min read

Semantic Search Cut Claude Code’s Wasted File Reads to One in Eight

Kuba Rogut of Turbopuffer benchmarked Claude Code on 50 ContextBench tasks to test whether it found the right code context, not whether it solved the tasks. He argues that adding semantic search to windowed grep made Claude Code’s file reads much more precise, cutting irrelevant reads from about one in three to one in eight, but did not make semantic retrieval a blanket replacement for grep. In Rogut’s results, semantic search helped when related code shared behavior rather than keywords, while grep remained stronger when the relevant term or import path was explicit.

Kuba Rogut | AI Engineer | Jun 3, 2026 | 11 min read

Public-Market Capital Is Becoming an AI Infrastructure Advantage

TBPN’s John Coogan and Jordi Hays use Alphabet’s reported $80bn equity raise, Berkshire Hathaway’s investment and a run of founder interviews to argue that AI is pushing capital markets and operating infrastructure back to the center of technology strategy. Their case is that the advantage is moving to companies that can finance enormous compute buildouts, unify fragmented data, own service businesses where AI can be deployed, and build the physical systems — from data centers to space logistics — that make AI useful.

John Coogan · Jordi Hays · Jensen Huang · Justin Fox · Edward Kim · Tom Mueller · Shreya Murthy · Nate Cavanaugh · Jack Doohan · Brynn Putnam | TBPN | Jun 2, 2026 | 30 min read

GitHub’s Agent Era Is Stressing Commits, Actions, Pull Requests, and Trust

GitHub COO Kyle Daigle argues that the agent era is turning GitHub’s AI shift into an infrastructure and trust problem, not just a product expansion beyond Copilot autocomplete. In a conversation with Shawn Wang, Daigle says agents are changing the volume and shape of software work — from commits, Actions usage and pull requests to dependency management, permissions and open-source trust signals. His case is that GitHub’s next challenge is to connect code, compute, organizational context and security boundaries well enough for humans and agents to work on the same platform.

Shawn Wang · Kyle Daigle | Latent Space | Jun 2, 2026 | 24 min read

Lovable Uses Agent Complaints to Find Bugs and Improve Projects

Benjamin Verbeek of Lovable argues that AI coding products can improve continuously by treating user failures and agent frustration as production signals. In a talk on Lovable’s internal systems, he describes two loops: one that turns sessions where nontechnical users get stuck and later recover into tested contextual guidance, and another that lets the agent complain directly when Lovable’s tools, documentation or platform behavior block its work. Verbeek says the approach has surfaced real bugs, reduced repeated “fix” intent messages and created an operational signal for incidents.

Benjamin Verbeek | AI Engineer | Jun 2, 2026 | 10 min read

YouTube Is Becoming Hollywood’s Talent Market and IP Proving Ground

TBPN’s John Coogan and Jordi Hays argue that YouTube is moving from Hollywood competitor to Hollywood’s talent market, where creator-led films prove creative judgment, production ability and audience response before studio capital arrives. The episode extends that pattern to AI policy, software and prediction markets: established institutions are trying to absorb signals formed outside their usual channels, from internet-proven filmmakers and frontier AI labs to traders and startups testing demand before regulators, studios or public markets have settled their response.

Jordi Hays · John Coogan · Marc Benioff · Nico Ferreyra · Mike Schroepfer · Graham Stephan · Bernie Su · Sue Khim · Scott Trinkham · Adam Iscoe · Jason Oppenheim · Danial Jameel · Tyler Bohall | TBPN | Jun 1, 2026 | 27 min read

Inference Hardware and Continual Learning Are Replacing Data as AI Bottlenecks

Google chief scientist Jeff Dean argues in a Two Minute Papers interview that AI progress is not chiefly constrained by running out of public text, but by systems work: extracting more from existing data, building inference-specialized hardware, distilling large models into smaller ones, and giving models access to much larger context. Dean frames the next phase less as better chatbots than as action-driven, agentic systems that can test, simulate and learn under controlled safety gates, while acknowledging unresolved problems in continual learning, healthcare deployment and infrastructure reliability at Google scale.

Károly Zsolnai-Fehér · Jeff Dean | Two Minute Papers | Jun 1, 2026 | 13 min read

Personal AI Systems Need Separate Layers for Memory and Autonomy

Nathan Labenz opens his personal AI infrastructure to a security audit by Daniel Miessler, showing a system that combines a high-context Claude Code “second brain” with lower-access autonomous agents for operational work. Their central argument is that useful personal AI should not collapse memory, authority, and autonomy into one assistant: raw personal history should be preserved and audited, while agents that act in the world need narrower permissions, clear roles, and containment. Miessler frames the longer-term model as an assistant that navigates from current state to ideal state while continually pruning obsolete scaffolding as models improve.

Nathan Labenz · Daniel Miessler | The Cognitive Revolution | May 30, 2026 | 29 min read

Context Graphs Let Agents Retrieve Precedents, Not Just Policies

Neo4j’s Zach Blumenfeld argues that agents built for operational decisions need context graphs rather than document retrieval alone. In his model, a standard knowledge base can tell an agent the relevant facts and policies, but a context graph adds prior decision traces, causal links, precedents and outcomes, allowing the agent to retrieve how similar cases were resolved. He presents `create-context-graph` and `neo4j-agent-memory` as open-source scaffolding for building that pattern with graph entities, short-term memory and embedded reasoning traces.

Zach Blumenfeld | AI Engineer | May 29, 2026 | 10 min read

Abridge Says GPT-5.5 Improves Clinical Synthesis as Tool Complexity Rises

Abridge’s Chaitanya Asawa says GPT-5.5 improved the company’s clinical decision-support system as it added more tools and context, a signal that the model could better synthesize information under complexity. His case is that stronger reasoning and tool use can turn patient context, live clinical conversation, and trusted medical guidance into denser point-of-care support, while leaving clinicians to review answers and accept or reject proposed note edits.

Chaitanya Asawa | OpenAI | May 28, 2026 | 5 min read

Devin’s 80% Commit Share Shows Background Agents Becoming Production Infrastructure

Cognition co-founder and CPO Walden Yan and OpenInspect creator Cole Murray argue that software engineering is moving from IDE-based, step-by-step prompting toward background agents that can turn a specification into a tested pull request. Their case is that Devin’s rise from 16% to 80% of non-merge commits across three Cognition repos is not mainly a model benchmark, but evidence of a production workflow built on cloud sandboxes, scoped permissions, repo setup, testing, integrations, memory, and code review. Both warn that autonomy without those systems can degrade a codebase as quickly as it accelerates output.

Shawn Wang · Walden Yan · Cole Murray | Latent Space | May 28, 2026 | 23 min read

Voice Will Become the Default Interface for Enterprise AI

Luiz Domingos, chief technology officer of Mitel, argues that enterprise AI has moved past pilots and into communications workflows where latency, compliance, auditability and human oversight determine whether systems can be deployed. In a conversation with Craig Smith, Domingos says cloud-only AI will not meet the needs of real-time voice and regulated industries, and that edge and hybrid deployments will become central. His larger prediction is that enterprise AI will increasingly be accessed by voice rather than screens, especially for frontline workers whose jobs do not fit a desktop interface.

Craig Smith · Luiz Domingos | Eye on AI | May 28, 2026 | 16 min read

Context Graphs Give AI Agents Rules, Precedent, and Decision Traces

In a Neo4j talk, Zaid Zaim and Andreas Kollegger argue that AI agents need more than language models, tools, and retrieval if they are to make consequential decisions. Zaim frames context graphs as a way to store the policies, prior decisions, causal links, and reasoning traces behind an action; Kollegger extends that into a five-stage decision workflow in which agents frame the case, check rules and precedent, assess risk, act only within authority, and write the outcome back to the graph as future precedent.

Zaid Zaim · Andreas Kollegger | AI Engineer | May 28, 2026 | 11 min read
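
The five-stage decision workflow in the summary above (frame the case, check rules and precedent, assess risk, act only within authority, write the outcome back) can be sketched with a small in-memory stand-in for the graph. The `ContextGraph` class, its fields, the refund-style example and the simple risk rule are assumptions for illustration, not the speakers' implementation or a Neo4j API.

```python
from dataclasses import dataclass, field

@dataclass
class ContextGraph:
    """Hypothetical in-memory stand-in for a context graph."""
    policies: dict = field(default_factory=dict)    # case type -> policy cap
    precedents: list = field(default_factory=list)  # prior decision records

    def similar(self, case_type):
        return [p for p in self.precedents if p["type"] == case_type]

def decide(graph, case_type, amount, authority_limit):
    # 1. Frame the case.
    case = {"type": case_type, "amount": amount}
    # 2. Check rules and precedent.
    policy_cap = graph.policies.get(case_type, 0.0)
    prior = graph.similar(case_type)
    # 3. Assess risk (deliberately simplistic: compare against the policy cap).
    risk = "high" if amount > policy_cap else "low"
    # 4. Act only within authority; otherwise hand off to a human.
    if risk == "low" and amount <= authority_limit:
        outcome = "approve"
    else:
        outcome = "escalate"
    # 5. Write the outcome back so it can serve as future precedent.
    graph.precedents.append(
        {**case, "risk": risk, "outcome": outcome, "cited": len(prior)}
    )
    return outcome
```

Each call leaves a record behind, so the next similar case can retrieve how earlier ones were resolved rather than only the policy that applied.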

Children’s Data Profiles Can Begin Before Birth

Proton engineering director Eamonn Maguire argues that a child’s digital profile can begin before birth, as parents’ emails, searches and sign-ups create signals that advertising and platform systems can use to infer pregnancy, family status and future behavior. Speaking with Craig Smith, Maguire uses Proton’s Born Private initiative, which lets parents reserve an email address for a child, to make a broader case that privacy is an infrastructure decision made long before children can consent. He extends the argument to social media, AI training data and the limits of trusting platforms whose business models depend on profiling.

Craig Smith · Eamonn Maguire | Eye on AI | May 27, 2026 | 17 min read

YC Says Internal Agents Need Shared Context, Tools, and Trust

YC’s Pete Koomen argues that building “superintelligence” inside a company requires more than adding AI features to existing software: agents need access to the organization’s shared context, tools and accumulated work. In a Lightcone discussion with Garry Tan, Jared Friedman, Diana Hu and Harj Taggar, Koomen describes how YC’s internal agent system became useful once it could query a unified company database, reuse hundreds of internal tools and turn repeated judgment into improving skills. The broader claim is that AI-native organizations will depend as much on trust, transparency and broad access as on model capability.

Garry Tan · Diana Hu · Jared Friedman · Harj Taggar · Tom Blomfield · Pete Koomen | Y Combinator | May 27, 2026 | 17 min read

Context Engines Make Coding Agents Mergeable, Not Just Functional

Brandon Waselnuk of Unblocked argues that coding agents are failing less because they lack access to tools than because they lack organizational context. In his account, MCP connections, larger context windows and naive RAG give agents more material, but not the judgment to know which code patterns, Slack decisions, ownership signals or backwards-compatibility rules matter. His proposed answer is a runtime context engine that reasons across code, PRs, documents, conversations and social structure before the agent writes code, so its output is closer to something a long-tenured engineer could merge.

Brandon Waselnuk | AI Engineer | May 26, 2026 | 13 min read

Useful AI Agents Need Smaller Contexts and Simpler Representations

Angus McLean, an AI Director at OLIVER, argues that useful agents are not the most autonomous ones but the best constrained. Drawing on OLIVER’s production use of AI across thousands of daily creative assets, he says builders should resist both model and developer tendencies toward verbosity and over-engineering: use curated documentation instead of open web access, ask how little context a task needs, choose simple representations such as HTML when they work, and avoid automating jobs they cannot do themselves.

Angus McLean | AI Engineer | May 25, 2026 | 11 min read

Google’s GenAI Stack Turns Multimodal Prompts Into Application Pipelines

Google DeepMind’s Paige Bailey and Guillaume Vernade argue that Google’s generative AI stack is being organized as an application pipeline rather than a set of isolated models. In a three-hour workshop, Bailey showed AI Studio turning multimodal Gemini prompts into inspectable API calls and generated apps with auth and Firestore, while Vernade used Gemini, Nano Banana, Veo and Lyria to illustrate, animate and score The Wind in the Willows. Their case is that builders can now orchestrate prompt, code, media generation and deployment in one workflow, even as the demos exposed seams that still require engineering discipline.

Paige Bailey · Guillaume Vernade · Ian Valentine | AI Engineer | May 23, 2026 | 23 min read

ChatGPT Adds In-PowerPoint Drafting and Editing for Business Decks

OpenAI presents ChatGPT for PowerPoint as an embedded drafting and editing layer for business presentations, now available in beta to all customers. The source argues that the tool is meant to turn scattered company material — notes, account context, market research, prior deck fragments and analysis files — into a structured executive deck inside PowerPoint, with the user reviewing the storyline before generation and refining slide content afterward. Its claim is less that ChatGPT can make slides from a prompt than that it can keep the source material, outline, draft and edits in one workflow.

OpenAI | May 22, 2026 | 6 min read

Android Makes Gemini Nano a Shared System Service for Apps

Google’s Florina Muntenescu and Oli Gaymond argue that Android’s on-device AI strategy depends on treating Gemini Nano as a shared system service, not something each app ships and manages itself. In their account, AICore centralizes the three-to-four-gigabyte model, scheduling, battery management and privacy boundaries, while developers call higher-level ML Kit GenAI APIs. The constraint is reach: those APIs need recent flagship-class devices, so Google is positioning hybrid cloud fallback and LiteRT-LM as alternatives when local Gemini Nano is unavailable or too limiting.

Florina Muntenescu · Oli Gaymond | AI Engineer | May 22, 2026 | 11 min read

VS Code Unifies Local, Background, and Cloud Coding Agents

Microsoft’s Liam Hampton argues that coding agents should be chosen by the amount of control a developer wants to keep, not treated as a single all-purpose assistant. In a VS Code demo using one repository, he assigns tests to a local Claude agent for hands-on iteration, a front-end build to a background agent isolated in a Git worktree, and open-source documentation to a cloud agent running through GitHub Actions. His case is that VS Code can act as the control plane for these modes, including Copilot, Claude, and third-party agents.

Liam Hampton | AI Engineer | May 21, 2026 | 11 min read

Startups Should Build Recorded, Queryable Operations That AI Can Improve

YC general partner Tom Blomfield argues that startups should not treat AI as a copilot bolted onto existing org charts, but as the basis for a company that records its work, exposes its tools, and improves through recursive loops. In his batch talk, he says founders should make company knowledge legible to AI, spend more on tokens rather than headcount, and rebuild operations around systems that can detect failures, update themselves, and reduce the need for human coordination.

Tom Blomfield | Y Combinator | May 21, 2026 | 7 min read

Language Models Generalize Differently From Parameters Than From Context

In a Stanford CS25 seminar, Anthropic researcher Andrew Lampinen argues that language models generalize differently depending on whether information is stored in their parameters or supplied in context. His experiments find that models can often use relations flexibly when the relevant facts are visible in the prompt, but fail to make the same reversals, syllogistic inferences, or codebook translations when those facts have only been learned through training. Lampinen presents augmentation, retrieval, and reinforcement-learned recall as partial ways to make latent implications more usable, while stressing that parametric learning and in-context learning remain complementary rather than substitutes.

Steven Feng · Andrew Lampinen | Stanford Online | May 20, 2026 | 18 min read

AI-Native Startups Are Replacing Teams With Agentic Operating Systems

In a Stanford CS153 Frontier Systems lecture, Y Combinator CEO Garry Tan and general partner Diana Hu argue that AI agents are changing the basic production unit of a startup from a team to a founder operating through skills, memory, evals and customer feedback loops. Tan frames agentic coding as a programmable company architecture, while Hu says AI-native companies are becoming closed-loop systems with far higher revenue per employee and less need for traditional managerial coordination.

Garry Tan · Diana Hu | Stanford Online | May 20, 2026 | 17 min read

Any-to-Any Agents Rely on Orchestrated Multimodal Models, Not One Network

Google DeepMind’s Patrick Löber presents “any-to-any” agents as an orchestration problem rather than a claim that one model already handles every modality. In his architecture, Gemini reads and reasons across PDFs, images, audio, video and other sources, then uses function calling to invoke specialized native models for images, speech, live audio, video or embeddings. Löber argues that the useful shift is not generating every possible format, but letting an agent decide when a diagram, spoken explanation or other output is warranted.

Patrick Loeber | AI Engineer | May 20, 2026 | 10 min read

Gemini’s Strategy Shifts From Frontier Leaderboards to Deployable AI Infrastructure

Google DeepMind executives Tulsee Doshi and Logan Kilpatrick argue that Google’s current Gemini strategy is built less around a single frontier model than around a deployable AI stack. In their account, Gemini 3.5 Flash, the Antigravity agent harness and new multimodal products such as Omni are meant to make models fast, cheap and integrated enough to run across Search, the Gemini app, AI Studio, YouTube and enterprise tools. The deeper shift, Kilpatrick says, is that the model is increasingly absorbing the scaffolding that once surrounded it, while Google standardizes the remaining agent infrastructure across its products.

Nathan Labenz · Logan Kilpatrick · Tulsee Doshi | The Cognitive Revolution | May 20, 2026 | 19 min read

Coding Agent Skills Need Live Documentation, Not Cached Product Knowledge

Marc Klingen of Langfuse argues that coding agents can add observability, but often do it first from stale model memory, producing broken or incomplete instrumentation before recovering through current documentation. In a talk on building a Langfuse skill for Claude Code, he says the fix is not to stuff more product knowledge into the agent, but to give it reliable ways to find live docs, expose its intermediate work in traces, and evaluate changes against realistic repositories. The same work, he warns, creates new risks when optimization loops reward shorter paths and remove the documentation-fetching and approval steps that make the skill reliable.

Marc Klingen | AI Engineer | May 20, 2026 | 13 min read

Spotify Uses Semantic IDs to Make LLMs Recommend Catalog Items

Spotify’s Shivam Verma argues that LLM-era personalization requires translating both users and catalog items into forms a model can process alongside language. In his account, Spotify combines long-term user embeddings, Semantic IDs that turn tracks and episodes into token sequences, and soft tokens that project a listener’s profile into an LLM’s embedding space. The aim is a generative recommender that can produce catalog-native recommendations without full fine-tuning, while still relying on traditional ranking layers for production use.

Shivam Verma | AI Engineer | May 19, 2026 | 10 min read
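
To make the idea of turning a catalog item into a token sequence concrete, here is a minimal sketch that uses residual quantization, one common way to derive such IDs from an embedding. The summary above does not say which method Spotify uses, so the technique, the tiny hand-written codebooks and the `<L{level}_{index}>` token format are all assumptions for illustration.

```python
def nearest(vec, codebook):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])),
    )

def semantic_id(embedding, codebooks):
    """Quantize an embedding level by level, emitting one token per level."""
    residual = list(embedding)
    tokens = []
    for level, codebook in enumerate(codebooks):
        idx = nearest(residual, codebook)
        tokens.append(f"<L{level}_{idx}>")
        # Subtract the chosen code so the next level encodes what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens

# Hypothetical two-level codebooks over 2-D embeddings.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],   # coarse level
    [[0.0, 0.0], [0.2, -0.2]],  # fine level
]
```

With these codebooks, `semantic_id([1.15, 0.85], CODEBOOKS)` yields `["<L0_1>", "<L1_1>"]`. Items with nearby embeddings share leading tokens, which is what lets a language model treat related tracks or episodes as related token sequences.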

AI Backlash Reaches Commencement as Graduates Face a Reshaped Job Market

Jason Calacanis and Alex Wilhelm argue that the boos greeting pro-AI commencement speeches are a visible sign of AI’s legitimacy problem with new graduates entering the workforce. On This Week in Startups, they frame the reaction less as technophobia than as distrust: students have already seen AI weaken academic norms, threaten entry-level work, concentrate wealth around frontier labs, and expand systems of surveillance and data capture. Their discussion returns to a central question: whether workers, founders, consumers, and citizens have any meaningful control over the AI systems now reshaping their choices.

Jason Calacanis · Alex Wilhelm · Gloria Caulfield · Eric Schmidt | This Week in Startups | May 19, 2026 | 21 min read

Agentic AI Is Turning Model Quality Into a Systems Problem

At AI Engineer Singapore’s second day, speakers from Google DeepMind, Cloudflare, Arize, OpenClaw, Adaption and other teams made a shared engineering case: as AI systems become more agentic, model quality is no longer separable from the systems around the model. Richard Ngo framed the risk as long-horizon, situationally aware agents whose goals cannot be inspected, while practitioners argued that production AI now depends on continuous evaluation, traces, deterministic execution boundaries, routing, memory, fine-tuning and test-time search. The source’s central claim is that useful and safe agentic AI is becoming a systems problem, not just a model-selection problem.

Shawn Wang · Eugene Yan · Philip Vollet · Haotian Zhang · Eugene Evstafev · Jason Liu · Pratik Desai · Michelle Chen · Jason Lopatecki · Amr Ahmed · Rita Zhang · Harris Snyder · Adarsh Shah · Eric Zhang · Ricky Robinett · Linoy Bitan · Wei Sheng · Richard Ngo | AI Engineer | May 17, 2026 | 26 min read

Context Graphs Make AI Decision Trails Queryable

Stephen Chin of Neo4j argues that enterprise AI systems need context graphs because retrieval alone can surface relevant facts while missing the relationships that make them usable. In his examples, a graph-augmented system can connect a patient’s emphysema care plan to smoking history or a credit decision to prior rejections, policies, margin trades and fraud signals. Chin’s case is that agents should preserve not only documents and answers, but the decision traces, tool calls, causal chains and outcomes that let humans inspect and reuse prior reasoning.

Stephen Chin · AI Engineer · May 16, 2026 · 12 min read

Economic Entanglement, Not Decoupling, Defines the New China Bargain

Salesforce CEO Marc Benioff joined the All-In hosts for a discussion that framed U.S.-China relations, enterprise AI, and the software selloff around the same question: when dependence is a stabilizer and when it becomes leverage. Benioff argued that more trade with China can lower conflict risk and that large software platforms remain valuable because AI still needs trusted customer data, cash-flowing distribution, and enterprise deployment. David Friedberg, Chamath Palihapitiya, and Jason Calacanis extended the argument across Taiwan, chips, AI assistants, El Niño-driven food risk, and private-market SPVs, where interconnection can either absorb shocks or transmit them.

Jason Calacanis · Chamath Palihapitiya · David Friedberg · Marc Benioff · All-In Podcast · May 15, 2026 · 20 min read

AI Software Winners Will Own Context, APIs, or Outcomes

Tasklet chief executive Andrew Lee argues that AI software is consolidating toward a few horizontal agent platforms that hold context, connect tools, generate interfaces, and choose among models. In a discussion with Nathan Labenz, Lee says Tasklet has rewritten its agent stack around file-system memory, agentic search, and provider-specific context management because the chat transcript is no longer enough. He also frames Anthropic as both Tasklet’s critical supplier and a major competitor, making model neutrality central to Tasklet’s bid to survive the AI transition.

Nathan Labenz · Andrew Lee · The Cognitive Revolution · May 15, 2026 · 23 min read

Supabase Says Skills and MCP Close the Agent Context Gap

Pedro Rodrigues of Supabase argues that agents fail on production systems less because they cannot reason than because they lack product-specific judgment. In a test using the same Postgres task, Supabase found that Claude with MCP alone created a view that could bypass row-level security, while MCP plus a Supabase skill added the required `security_invoker = true` flag. Rodrigues’s case is that MCP gives agents tools, but skills supply the rules, workflows, and current documentation paths needed to use those tools safely.

Pedro Rodrigues · AI Engineer · May 15, 2026 · 9 min read

Intercom Doubled Engineering Throughput by Standardizing on Claude Code

Brian Scanlan, a senior principal engineer at Intercom, argues that the company doubled engineering throughput by treating AI coding as an internal platform strategy rather than an individual productivity tool. In his account, Intercom standardized on Claude Code, encoded recurring engineering work into agent-usable skills, connected agents to internal systems under existing controls, and made AI adoption an explicit expectation across R&D. The reported result was a doubling of pull-request throughput, including 17.6% of merged PRs approved by Claude, alongside new bottlenecks in review and CI.

Brian Scanlan · AI Engineer · May 15, 2026 · 13 min read

AI Is Pushing Science Beyond the Paper as Its Core Artifact

In closing remarks from an AI and science meeting, Risa Wechsler argued that AI is reshaping scientific fields unevenly, depending on their data, theory and modes of inquiry, and that scientists should use the moment to choose structures aligned with human values. Surya Ganguli pushed the question toward scientific communication itself, suggesting that papers may be too narrow an artifact for AI-assisted science and that richer institutional records of research could better transfer knowledge. Both framed AI for science as a design problem around human purposes, not just faster automation.

Surya Ganguli · Risa Wechsler · Stanford HAI · May 15, 2026 · 5 min read

Abridge Bets Clinical Conversations Can Become Healthcare’s Intelligence Layer

Abridge executives Janie Lee and Chaitanya “Chai” Asawa argue that the patient-clinician conversation is becoming healthcare’s core intelligence layer, not merely an input for automated notes. In a discussion with Redpoint’s Jacob Effron, they describe Abridge’s move from ambient documentation into clinical decision support, prior authorization and other workflows that depend on EHR data, payer rules, medical literature and local guidelines. Their case is that healthcare AI will be judged less by chatbot fluency than by whether it can deliver accurate, low-latency, privacy-preserving support inside clinical workflows without adding to clinicians’ alert burden.

Shawn Wang · Janie Lee · Jacob Effron · Chaitanya Asawa · Latent Space · May 14, 2026 · 20 min read

Agent Workflows Route Conversations Through Specialized Subagents

ElevenLabs is introducing Workflows, a visual editor for its Agents Platform that lets builders design routed conversation flows instead of placing all business logic inside one agent prompt. The company argues that specialized subagents, each with their own instructions, tools, knowledge bases and model choices, give teams more control over cost, latency and accuracy. The product is positioned as a way to combine AI interpretation with predefined actions, verification steps and human handoffs on the same design surface.

ElevenLabs · May 13, 2026 · 5 min read

AI Companies Are Running Into Infrastructure, Distribution, and Trust Bottlenecks

TBPN’s discussion argued that AI’s value is now being tested less in model demos than in the bottlenecks around deployment: inference speed, power, workflow integration and access to customers. Cerebras was framed as a public-market bet on faster inference, while Giga Energy’s data-center business showed how scarce powered shells have become part of the AI supply chain. The same bottleneck logic appeared outside core AI, from Audemars Piguet using Swatch as an official low-cost entry point to Augustus, with conditional OCC approval, trying to rebuild dollar clearing as a national bank.

Jordi Hays · John Coogan · Alex Taubman · Amir Sadeghian · Quaid Walker · Matt Lohstroh · Jay Azhang · Spencer Rascoff · Tyler Cosgrove · Ferdinand Dabitz · Eric Olson · TBPN · May 11, 2026 · 32 min read

Head-Tail Truncation and Memory Stabilized Arize’s Trace-Analyzing Agent

Sally-Ann DeLucia argues that agent performance depends on context management as an operating discipline, not on larger prompts or simple compression. Drawing on Arize’s work building Alyx, an agent that analyzes trace data from AI systems including its own, she says naive truncation broke follow-up reasoning and LLM summarization gave the model too much control over what mattered. Arize’s more durable pattern was to preserve the head and tail of context, store the middle for retrieval, test long sessions explicitly, and move heavy workloads into sub-agents.

Sally-Ann DeLucia · AI Engineer · May 10, 2026 · 10 min read
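
The head-and-tail pattern in the Arize entry above can be illustrated with a minimal Python sketch. It is not Arize's or Alyx's implementation: the names `head_tail_truncate`, `retrieve_from_middle` and `TruncatedContext`, the default `head` and `tail` sizes, and the keyword lookup standing in for retrieval are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TruncatedContext:
    """Messages kept in the prompt, plus the middle set aside for retrieval."""
    kept: list[str]
    stored_middle: list[str] = field(default_factory=list)


def head_tail_truncate(messages: list[str], head: int = 2, tail: int = 4) -> TruncatedContext:
    """Keep the first `head` and last `tail` messages; store the rest.

    Hypothetical helper: the sizes are illustrative defaults, not values
    taken from the source.
    """
    if head < 0 or tail < 0:
        raise ValueError("head and tail must be non-negative")
    if len(messages) <= head + tail:
        # Short sessions fit as-is, so nothing is moved out of the prompt.
        return TruncatedContext(kept=list(messages))

    tail_start = len(messages) - tail
    middle = messages[head:tail_start]
    # A placeholder tells the model that content exists and can be fetched.
    placeholder = f"[{len(middle)} earlier messages stored for retrieval]"
    kept = messages[:head] + [placeholder] + messages[tail_start:]
    return TruncatedContext(kept=kept, stored_middle=middle)


def retrieve_from_middle(ctx: TruncatedContext, keyword: str) -> list[str]:
    """Naive keyword lookup standing in for a real retrieval step (assumption)."""
    needle = keyword.lower()
    return [m for m in ctx.stored_middle if needle in m.lower()]


if __name__ == "__main__":
    session = [f"msg {i}" for i in range(10)]
    ctx = head_tail_truncate(session, head=2, tail=3)
    print(ctx.kept)
    print(retrieve_from_middle(ctx, "msg 4"))
```

The design choice here is that the split is deterministic: the code, not a summarizing model, decides what stays in the prompt, and the middle remains recoverable on demand rather than being discarded.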

ElevenLabs Voice Engine Wraps Existing Chat Agents Without Rebuilding Them

Luke Harries of ElevenLabs argues that the next step for chat agents is not a new orchestration stack but a voice layer around the agents companies have already built. His case for ElevenLabs’ Voice Engine is that teams can keep their existing LLM logic, RAG, tools and business rules, while offloading speech-to-text, text-to-speech, turn-taking and interruption handling to a wrapper. The product is positioned for companies that want voice interfaces across web, phone and meeting channels without rebuilding their chat agents inside a fully managed platform.

Luke Harries · AI Engineer · May 9, 2026 · 6 min read

Fresh Product Data Is the Constraint for LLM Commerce Discovery

Criteo executives Diarmuid Gill and Liva Ralaivola argue that modern ad tech is best understood as a millisecond-scale prediction system: anonymous commerce signals, learned embeddings and real-time auctions are used to decide whether to bid, what to show and how much an impression is worth. In a conversation with Nathan Labenz, they frame Criteo’s work with OpenAI and other generative tools as an extension of that problem, not a replacement for it: LLMs may change product discovery, but the system still depends on fresh retailer data, consent, latency discipline and human oversight.

Nathan Labenz · Alex Persky-Stern · Diarmuid Gill · Liva Ralaivola · The Cognitive Revolution · May 9, 2026 · 18 min read

Personal AI Lets One Builder Do the Work of Teams

Y Combinator CEO Garry Tan argues that personal AI is reaching a stage comparable to the early personal computer: powerful enough to let one person build software that once required a team, but still brittle enough to demand technical ownership. Drawing on his work with Claude Code, OpenClaw and his GStack workflow, Tan makes the case for heavy token use, Markdown-encoded “skills” and multiple coding agents under one accountable human operator. The larger question, he says, is whether users will control their own AI tools, data and prompts, or work inside opaque systems controlled by others.

Garry Tan · Harj Taggar · Diana Hu · Jared Friedman · Y Combinator · May 8, 2026 · 15 min read

Agentic Search Needs Specialized Tools and General-Purpose Escape Hatches

Elastic’s Leonie Monigatti argues that context engineering for LLM agents is largely a search-interface problem: the critical question is how an agent decides what to retrieve from files, databases, memory, the web, and other sources before the model answers. In her workshop, she shows why semantic search, database query tools, shell access, and agent skills each solve different parts of that problem and fail in different ways. Her recommendation is to build retrieval stacks that combine easy specialized tools for common tasks with more general tools for ambiguous or complex ones, then use observed failures to refine the stack.

Leonie Monigatti · AI Engineer · May 8, 2026 · 17 min read

Perplexity Frames AI Agents as Metered Digital Labor

Perplexity chief business officer Dmitry Shevelenko argues that AI agents should be judged less as software features than as metered digital labor: tools users will pay for when they perform economically useful work. In a Big Technology Podcast interview, he makes the case that Perplexity’s computer-use agents, workflow packaging, broad permissions and multi-model orchestration are all part of that shift. The unresolved question is whether users and companies will accept the access, trust and usage-based pricing required to make those agents a real business rather than another AI novelty cycle.

Alex Kantrowitz · Dmitry Shevelenko · Alex Kantrowitz · May 7, 2026 · 19 min read

Apple Explores Intel and Samsung for U.S. Chip Production

Mark Gurman said Apple has held early talks with Intel and Samsung about using new U.S. fabs to make future A-series and M-series processors, an exploratory move he framed as a supply-chain redundancy question rather than only a political one. Apple still relies heavily on TSMC, primarily in Taiwan, and Gurman described that geographic and supplier concentration as one of the company’s biggest risks. Across the rest of the broadcast, executives and analysts described a similar shift from exposure to execution: AI companies are giving Washington early model access for review, while enterprise adoption is being tested by security, deployment cost and proprietary data advantages.

Caroline Hyde · Mark Gurman · Lauren Webster · Hannah Miller · Seth Boro · Dani Burger · Josh Harris · Bill Ready · Romaine Bostick · Maggie Eastland · Lizette Chapman · Ian King · Peter Oey · Erin Price-Wright · Bloomberg Technology · May 7, 2026 · 14 min read

Agent Failure Should Drive Enterprise AI Knowledge Base Curation

Raj Navakoti argues that enterprise AI agents fail less because of model limits or retrieval plumbing than because companies have not made institutional knowledge legible. In his Demand-Driven Context workshop, he proposes building agent-ready knowledge bases from the bottom up: give agents real tickets or incidents, observe where they fail, and turn those failures into structured, validated context blocks. The method, shown through smaller-scope examples and prototypes including work from IKEA Digital, is presented as an incremental curation loop rather than a proven enterprise-scale system.

Raj Navakoti · AI Engineer · May 7, 2026 · 17 min read

Small-Model Inference Needs Infrastructure Beyond Model Servers

Filip Makraduli of Superlinked argues that the hard part of small-model inference is no longer simply serving a model, but operating many embeddings, rerankers, extractors and multimodal models efficiently in production. In his account, conventional one-model-per-container deployments waste GPU capacity and leave teams to rebuild routing, autoscaling, monitoring, hot-swapping and eviction themselves. Superlinked’s SIE is presented as an open-source attempt to provide that missing infrastructure layer for AI search and document-processing workloads.

Filip Makraduli · AI Engineer · May 7, 2026 · 9 min read