RAG and Knowledge Systems
Retrieval-augmented generation, enterprise search, vector databases, embeddings, document intelligence, and knowledge-grounded AI systems.
Agents Often Claim Web Access After Being Blocked or Challenged
Rafael Levi of Bright Data argues that many web-dependent agents fail not because they cannot produce answers, but because they report success after web access has broken. In a demo using Bright Data’s Web MCP, Levi shows the same agent failing against sites such as LinkedIn, Instagram, Amazon and TikTok without live access, then producing usable results when given infrastructure for search, scraping, JavaScript rendering and CAPTCHA handling. His broader case is that reliable agents need a real public-web access layer, not prompts that assume the model saw the page.
Hermes Uses a Minimal Agent Loop to Preserve State Across Channels
Alejandro AO’s walkthrough of Hermes presents the agent as a deliberately small always-on system rather than a complex orchestration stack. He argues that Hermes’ usefulness comes from a simple loop that builds context from Markdown files, message history, tools, skills and memory, then preserves state through compression, SQLite transcripts, optional external memory providers, gateway integrations and scheduled cron jobs. The architecture’s central concern is continuity: keeping enough context across channels and time for the agent to behave like a persistent assistant.
Enterprise AI Is Blocked by Context, Not Model Intelligence
Databricks chief executive Ali Ghodsi argues that enterprise AI is constrained less by model intelligence than by access to company context: data, documents, processes and relationships that agents need to operate inside businesses. In a Bloomberg Tech interview with Ed Ludlow, Ghodsi said Databricks is building products such as Genie Ontology and Lakehouse to make that context usable, while adoption in critical workflows remains slowed by security, legal and approval processes. He also declined to confirm reports of a new funding round and said Databricks is not rushing toward an IPO.
Codex Turns Earnings Reports Into Post-Quarter Investment Thesis Updates
OpenAI is pitching Codex’s public-equity investing plugin as a way to turn a company’s latest quarter into thesis-revision work rather than a conventional earnings recap. Using a Cava post-earnings example, the source argues that Codex can combine first-party filings, earnings-call material and third-party data from sources including Quartr, Daloopa and S&P Global to separate business momentum from stock expectations, build bull, base and bear cases, and produce a monitoring checklist for the next reporting window.
RAG Is Becoming Agentic Retrieval, Not Disappearing
Kuba Rogut, a deployed engineer at Turbopuffer, argues that claims about RAG’s death rely on defining it as a narrow, one-shot vector search pattern. In his account, retrieval-augmented generation is becoming a broader agentic retrieval system: vector search, full-text search, grep, regex, glob and filters used iteratively by models that keep looking until they have the right context. He points to Cursor’s semantic-search gains and contrasts its upfront indexing with Claude Code’s per-session grep approach to frame embeddings as cached compute whose value depends on reuse.
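The iterative pattern in Rogut's account can be sketched as a loop that tries one retrieval tool after another until enough context has been gathered. This is a minimal illustration under stated assumptions, not Turbopuffer's or Cursor's implementation: the toy corpus, the `keyword_search` and `regex_search` helpers, the fixed plan and the `enough` stopping rule are all hypothetical, and plain keyword matching stands in for vector search, which would need an embedding model.

```python
import re
from typing import Callable

# Hypothetical toy corpus standing in for an indexed codebase or document store.
CORPUS = {
    "auth.py": "def login(user): return check_password(user)",
    "billing.py": "def charge(card): return gateway.submit(card)",
    "README.md": "Login flow: see auth.py for password checks.",
}

def keyword_search(query: str) -> set[str]:
    """Full-text stand-in: files containing every query word (case-insensitive)."""
    words = query.lower().split()
    return {name for name, text in CORPUS.items()
            if all(w in text.lower() for w in words)}

def regex_search(pattern: str) -> set[str]:
    """grep-style stand-in: files whose text matches the regular expression."""
    return {name for name, text in CORPUS.items() if re.search(pattern, text)}

def agentic_retrieve(steps: list[tuple[Callable[[str], set[str]], str]],
                     enough: int) -> list[str]:
    """Run retrieval steps in order, stopping once `enough` files are found.

    In a real agent the model would pick each next step from earlier
    results; here the plan is fixed so the sketch stays deterministic.
    """
    found: set[str] = set()
    for tool, query in steps:
        found |= tool(query)
        if len(found) >= enough:
            break
    return sorted(found)

plan = [(keyword_search, "login"), (regex_search, r"def \w+\(card\)")]
results = agentic_retrieve(plan, enough=3)
print(results)
```

The keyword step alone finds two files, so the loop falls through to the regex step before stopping. The point of the sketch is the control flow: retrieval keeps going until a sufficiency check passes rather than ending after one query.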
AI Compresses Years of Software Vulnerability Discovery Into Weeks
Palo Alto Networks chief executive Nikesh Arora told the All-In podcast that AI has changed cybersecurity by making years of latent software vulnerabilities discoverable in weeks. After testing Anthropic’s Claude Mythos against Palo Alto’s own code, Arora said the company found flaws that would normally have taken five to seven years to identify, raising the stakes for enterprises with weaker defenses. His broader argument was that AI will erode analytical SaaS while increasing the value of data infrastructure, workflow redesign and security systems that can make model outputs reliable enough for production.
Ulta Uses AI to Personalize HR Support for 65,000 Workers
Ulta Beauty executives Rachel Williamson and Josh Siebert describe the retailer’s ServiceNow-backed HR automation rollout as a response to a concrete operating problem: 65,000 employees could not reliably find the policies and support they needed. In a sponsored interview, they argue that the value of AI was not the chatbot itself, but its ability to personalize answers, route routine HR work away from overloaded teams, and preserve human judgment for sensitive cases. Their account frames AI as an enabler of workflow redesign, not an end in itself.
Code Agents Need Context Engineering, Not Larger Prompts
Nupur Sharma of Qodo argues that larger context windows have not solved a core agent failure: models still tend to use the beginning and end of an input while losing important material in the middle. Her case is that agent quality depends less on giving a model more context than on engineering how context is retrieved, ranked, constrained and checked. She describes Qodo’s approach as a mix of iterative retrieval, specialist agents, judge nodes and bounded orchestration that reserves high-reasoning models for discovery while using stricter, lighter steps for validation.
LSEG Grounds AI Strategy in Trusted Financial Data and Controls
Emily Prince, group head of AI at LSEG, argues in an OpenAI Customer Ignite talk that AI in financial services only becomes useful at scale when it is grounded in trusted data, evaluation frameworks and governance that fit regulated work. She presents LSEG’s strategy as an effort to make its financial data and analytics available inside the tools customers and employees already use, including through APIs and Model Context Protocol, rather than treating AI as a generic answer engine. The case is that speed and experimentation matter, but only if controls, source quality and industry-specific workflows are built into the system.
OpenAI Pitches ChatGPT as Workflow Infrastructure for Financial Institutions
OpenAI solutions engineer Stephanie Anani makes the case that ChatGPT should sit inside financial-services workflows rather than alongside them as a general productivity tool. Her argument is that AI can take on the search, reconciliation, modeling, compliance-checking and presentation work that consumes analysts’ time, while leaving investment and risk judgment with humans. In a QXO investment case, she shows ChatGPT moving from trusted research sources to an auditable Excel model and committee deck, using firm-specific skills and controls meant for regulated environments.
AI in Financial Services Is Moving From Answers to Work Products
At OpenAI’s Investor Innovation Day, Sarah Friar and other speakers argued that Codex and enterprise ChatGPT are moving AI use in financial services from “asking mode” into execution. The examples stayed close to existing work: querying deal folders, speeding company research in Excel, generating spreadsheets, models, and decks, and distributing employee-built GPTs into daily operations. James Mackey tied the enterprise case to adoption at scale, saying 2,700 employees now have ChatGPT licenses and are using hundreds of internal GPTs as a business “force multiplier.”
Enterprise AI’s Constraint Is Judgment, Not Token Consumption
At TBPN’s AIPCon 10 broadcast, Palantir chief executive Alex Karp argued that enterprise AI’s central problem is no longer model capability but organizational judgment: companies are consuming tokens, dashboards and AI-generated artifacts without tying them to decisions that change operations. AIG’s Peter Zaffino, Palantir’s Chad Wahlquist and USDA’s Sam Berry extended the same case from insurance, deployment architecture and government data systems, describing AI as valuable only when embedded in workflows, data structures and feedback loops that reflect how institutions actually work.
AI Demand Is Real, but Productivity Gains Remain Unproven
Bloomberg’s Tech event in San Francisco framed the AI boom as a market caught between constrained infrastructure demand and valuations that leave little tolerance for misses. Executives from Databricks, Okta and Altimeter argued that the next bottlenecks are enterprise context, secure system access, power and capital allocation, while San Francisco Fed President Mary Daly said AI investment is widespread but has not yet produced broad, measurable productivity gains.
Enterprise AI’s Bottleneck Is Context, Not Smarter Models
Databricks co-founder and CEO Ali Ghodsi told Bloomberg Technology that the main enterprise AI problem is no longer model intelligence but access to organizational context. Ghodsi argued that artificial general intelligence has effectively arrived by a practical workplace test, and that companies should focus on connecting models to their data, processes and metrics so agents can become useful. He also cast that thesis as central to Databricks’ Lakehouse and Genie products, while saying the company can remain privately funded until an eventual IPO is needed for employee liquidity.
AI Voice Agents Are Beating the Average Customer-Service Rep
Tom Chen, chief product officer at Aircall, argues that AI voice agents should be judged against the average customer-service interaction, not the best human rep. In his account, the technology is already good enough for many routine calls, can handle far more concurrency at lower cost, and may improve satisfaction when customers are given a clear choice between faster AI service and a human agent. The main constraint, Chen says, is often not the model but the undocumented company knowledge the agent needs to resolve issues.
Semantic Search Cut Claude Code’s Wasted File Reads to One in Eight
Kuba Rogut of Turbopuffer benchmarked Claude Code on 50 ContextBench tasks to test whether it found the right code context, not whether it solved the tasks. He argues that adding semantic search to windowed grep made Claude Code’s file reads much more precise, cutting irrelevant reads from about one in three to one in eight, but did not make semantic retrieval a blanket replacement for grep. In Rogut’s results, semantic search helped when related code shared behavior rather than keywords, while grep remained stronger when the relevant term or import path was explicit.
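The headline ratios can be restated as read precision. The arithmetic below uses only the rounded figures quoted in the summary ("about one in three" and "one in eight"); it is a back-of-the-envelope conversion, not a recomputation of Rogut's benchmark data.

```python
from fractions import Fraction

# Rounded irrelevant-read rates quoted in the summary above.
before = Fraction(1, 3)   # about one in three reads was irrelevant
after = Fraction(1, 8)    # one in eight once semantic search was added

precision_before = 1 - before             # share of reads that were relevant
precision_after = 1 - after
relative_cut = (before - after) / before  # fraction of wasted reads eliminated

print(f"precision: {float(precision_before):.1%} -> {float(precision_after):.1%}")
print(f"wasted reads cut by {float(relative_cut):.1%}")
```

On these rounded inputs, relevant reads rise from roughly two in three to seven in eight, and the share of wasted reads falls by five eighths.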
Public-Market Capital Is Becoming an AI Infrastructure Advantage
TBPN’s John Coogan and Jordi Hays use Alphabet’s reported $80bn equity raise, Berkshire Hathaway’s investment and a run of founder interviews to argue that AI is pushing capital markets and operating infrastructure back to the center of technology strategy. Their case is that the advantage is moving to companies that can finance enormous compute buildouts, unify fragmented data, own service businesses where AI can be deployed, and build the physical systems — from data centers to space logistics — that make AI useful.
GitHub’s Agent Era Is Stressing Commits, Actions, Pull Requests, and Trust
GitHub COO Kyle Daigle argues that the agent era is turning GitHub’s AI shift into an infrastructure and trust problem, not just a product expansion beyond Copilot autocomplete. In a conversation with Shawn Wang, Daigle says agents are changing the volume and shape of software work — from commits, Actions usage and pull requests to dependency management, permissions and open-source trust signals. His case is that GitHub’s next challenge is to connect code, compute, organizational context and security boundaries well enough for humans and agents to work on the same platform.
Lovable Uses Agent Complaints to Find Bugs and Improve Projects
Benjamin Verbeek of Lovable argues that AI coding products can improve continuously by treating user failures and agent frustration as production signals. In a talk on Lovable’s internal systems, he describes two loops: one that turns sessions where nontechnical users get stuck and later recover into tested contextual guidance, and another that lets the agent complain directly when Lovable’s tools, documentation or platform behavior block its work. Verbeek says the approach has surfaced real bugs, reduced repeated “fix” intent messages and created an operational signal for incidents.
YouTube Is Becoming Hollywood’s Talent Market and IP Proving Ground
TBPN’s John Coogan and Jordi Hays argue that YouTube is moving from Hollywood competitor to Hollywood’s talent market, where creator-led films prove creative judgment, production ability and audience response before studio capital arrives. The episode extends that pattern to AI policy, software and prediction markets: established institutions are trying to absorb signals formed outside their usual channels, from internet-proven filmmakers and frontier AI labs to traders and startups testing demand before regulators, studios or public markets have settled their response.
Inference Hardware and Continual Learning Are Replacing Data as AI Bottlenecks
Google chief scientist Jeff Dean argues in a Two Minute Papers interview that AI progress is not chiefly constrained by running out of public text, but by systems work: extracting more from existing data, building inference-specialized hardware, distilling large models into smaller ones, and giving models access to much larger context. Dean frames the next phase less as better chatbots than as action-driven, agentic systems that can test, simulate and learn under controlled safety gates, while acknowledging unresolved problems in continual learning, healthcare deployment and infrastructure reliability at Google scale.
Personal AI Systems Need Separate Layers for Memory and Autonomy
Nathan Labenz opens his personal AI infrastructure to a security audit by Daniel Miessler, showing a system that combines a high-context Claude Code “second brain” with lower-access autonomous agents for operational work. Their central argument is that useful personal AI should not collapse memory, authority, and autonomy into one assistant: raw personal history should be preserved and audited, while agents that act in the world need narrower permissions, clear roles, and containment. Miessler frames the longer-term model as an assistant that navigates from current state to ideal state while continually pruning obsolete scaffolding as models improve.
Context Graphs Let Agents Retrieve Precedents, Not Just Policies
Neo4j’s Zach Blumenfeld argues that agents built for operational decisions need context graphs rather than document retrieval alone. In his model, a standard knowledge base can tell an agent the relevant facts and policies, but a context graph adds prior decision traces, causal links, precedents and outcomes, allowing the agent to retrieve how similar cases were resolved. He presents `create-context-graph` and `neo4j-agent-memory` as open-source scaffolding for building that pattern with graph entities, short-term memory and embedded reasoning traces.
Abridge Says GPT-5.5 Improves Clinical Synthesis as Tool Complexity Rises
Abridge’s Chaitanya Asawa says GPT-5.5 improved the company’s clinical decision-support system as it added more tools and context, a signal that the model could better synthesize information under complexity. His case is that stronger reasoning and tool use can turn patient context, live clinical conversation, and trusted medical guidance into denser point-of-care support, while leaving clinicians to review answers and accept or reject proposed note edits.
Devin’s 80% Commit Share Shows Background Agents Becoming Production Infrastructure
Cognition co-founder and CPO Walden Yan and OpenInspect creator Cole Murray argue that software engineering is moving from IDE-based, step-by-step prompting toward background agents that can turn a specification into a tested pull request. Their case is that Devin’s rise from 16% to 80% of non-merge commits across three Cognition repos is not mainly a model benchmark, but evidence of a production workflow built on cloud sandboxes, scoped permissions, repo setup, testing, integrations, memory, and code review. Both warn that autonomy without those systems can degrade a codebase as quickly as it accelerates output.
Voice Will Become the Default Interface for Enterprise AI
Luiz Domingos, chief technology officer of Mitel, argues that enterprise AI has moved past pilots and into communications workflows where latency, compliance, auditability and human oversight determine whether systems can be deployed. In a conversation with Craig Smith, Domingos says cloud-only AI will not meet the needs of real-time voice and regulated industries, and that edge and hybrid deployments will become central. His larger prediction is that enterprise AI will increasingly be accessed by voice rather than screens, especially for frontline workers whose jobs do not fit a desktop interface.
Context Graphs Give AI Agents Rules, Precedent, and Decision Traces
In a Neo4j talk, Zaid Zaim and Andreas Kollegger argue that AI agents need more than language models, tools, and retrieval if they are to make consequential decisions. Zaim frames context graphs as a way to store the policies, prior decisions, causal links, and reasoning traces behind an action; Kollegger extends that into a five-stage decision workflow in which agents frame the case, check rules and precedent, assess risk, act only within authority, and write the outcome back to the graph as future precedent.
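The five stages Kollegger describes can be sketched with a toy in-memory structure in place of a graph database. Everything here is an assumption made for illustration: the `ContextGraph` class, the `decide` function, the refund scenario and the risk heuristic are hypothetical and do not reflect Neo4j's APIs or the speakers' code.

```python
from dataclasses import dataclass, field

@dataclass
class ContextGraph:
    """Toy in-memory stand-in for a graph database (hypothetical structure)."""
    rules: dict[str, float]                      # case type -> max amount policy allows
    precedents: list[dict] = field(default_factory=list)

def decide(graph: ContextGraph, case_type: str, amount: float,
           authority_limit: float) -> dict:
    # 1. Frame the case.
    case = {"type": case_type, "amount": amount}
    # 2. Check rules and precedent.
    policy_max = graph.rules.get(case_type, 0.0)
    similar = [p for p in graph.precedents if p["type"] == case_type]
    # 3. Assess risk (assumed heuristic: over policy, or prior denials of this type).
    prior_denials = sum(1 for p in similar if p["outcome"] == "denied")
    risky = amount > policy_max or prior_denials > 0
    # 4. Act only within authority; anything above the limit goes to a human.
    if amount > authority_limit:
        outcome = "escalated"
    elif risky:
        outcome = "denied"
    else:
        outcome = "approved"
    # 5. Write the outcome back to the graph as future precedent.
    record = {**case, "outcome": outcome, "precedents_consulted": len(similar)}
    graph.precedents.append(record)
    return record

graph = ContextGraph(rules={"refund": 500.0})
first = decide(graph, "refund", 120.0, authority_limit=200.0)
second = decide(graph, "refund", 900.0, authority_limit=200.0)
print(first["outcome"], second["outcome"])
```

The second call consults the record written by the first, which is the write-back step doing its job: each decision becomes retrievable precedent for the next one.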
Children’s Data Profiles Can Begin Before Birth
Proton engineering director Eamonn Maguire argues that a child’s digital profile can begin before birth, as parents’ emails, searches and sign-ups create signals that advertising and platform systems can use to infer pregnancy, family status and future behavior. Speaking with Craig Smith, Maguire uses Proton’s Born Private initiative, which lets parents reserve an email address for a child, to make a broader case that privacy is an infrastructure decision made long before children can consent. He extends the argument to social media, AI training data and the limits of trusting platforms whose business models depend on profiling.
YC Says Internal Agents Need Shared Context, Tools, and Trust
YC’s Pete Koomen argues that building “superintelligence” inside a company requires more than adding AI features to existing software: agents need access to the organization’s shared context, tools and accumulated work. In a Lightcone discussion with Garry Tan, Jared Friedman, Diana Hu and Harj Taggar, Koomen describes how YC’s internal agent system became useful once it could query a unified company database, reuse hundreds of internal tools and turn repeated judgment into improving skills. The broader claim is that AI-native organizations will depend as much on trust, transparency and broad access as on model capability.
Context Engines Make Coding Agents Mergeable, Not Just Functional
Brandon Waselnuk of Unblocked argues that coding agents are failing less because they lack access to tools than because they lack organizational context. In his account, MCP connections, larger context windows and naive RAG give agents more material, but not the judgment to know which code patterns, Slack decisions, ownership signals or backwards-compatibility rules matter. His proposed answer is a runtime context engine that reasons across code, PRs, documents, conversations and social structure before the agent writes code, so its output is closer to something a long-tenured engineer could merge.
Useful AI Agents Need Smaller Contexts and Simpler Representations
Angus McLean, an AI Director at OLIVER, argues that useful agents are not the most autonomous ones but the best constrained. Drawing on OLIVER’s production use of AI across thousands of daily creative assets, he says builders should resist both model and developer tendencies toward verbosity and over-engineering: use curated documentation instead of open web access, ask how little context a task needs, choose simple representations such as HTML when they work, and avoid automating jobs they cannot do themselves.
Google’s GenAI Stack Turns Multimodal Prompts Into Application Pipelines
Google DeepMind’s Paige Bailey and Guillaume Vernade argue that Google’s generative AI stack is being organized as an application pipeline rather than a set of isolated models. In a three-hour workshop, Bailey showed AI Studio turning multimodal Gemini prompts into inspectable API calls and generated apps with auth and Firestore, while Vernade used Gemini, Nano Banana, Veo and Lyria to illustrate, animate and score The Wind in the Willows. Their case is that builders can now orchestrate prompt, code, media generation and deployment in one workflow, even as the demos exposed seams that still require engineering discipline.
ChatGPT Adds In-PowerPoint Drafting and Editing for Business Decks
OpenAI presents ChatGPT for PowerPoint as an embedded drafting and editing layer for business presentations, now available in beta to all customers. The source argues that the tool is meant to turn scattered company material — notes, account context, market research, prior deck fragments and analysis files — into a structured executive deck inside PowerPoint, with the user reviewing the storyline before generation and refining slide content afterward. Its claim is less that ChatGPT can make slides from a prompt than that it can keep the source material, outline, draft and edits in one workflow.
Android Makes Gemini Nano a Shared System Service for Apps
Google’s Florina Muntenescu and Oli Gaymond argue that Android’s on-device AI strategy depends on treating Gemini Nano as a shared system service, not something each app ships and manages itself. In their account, AICore centralizes the three-to-four-gigabyte model, scheduling, battery management and privacy boundaries, while developers call higher-level ML Kit GenAI APIs. The constraint is reach: those APIs need recent flagship-class devices, so Google is positioning hybrid cloud fallback and LiteRT-LM as alternatives when local Gemini Nano is unavailable or too limiting.
VS Code Unifies Local, Background, and Cloud Coding Agents
Microsoft’s Liam Hampton argues that coding agents should be chosen by the amount of control a developer wants to keep, not treated as a single all-purpose assistant. In a VS Code demo using one repository, he assigns tests to a local Claude agent for hands-on iteration, a front-end build to a background agent isolated in a Git worktree, and open-source documentation to a cloud agent running through GitHub Actions. His case is that VS Code can act as the control plane for these modes, including Copilot, Claude, and third-party agents.
Startups Should Build Recorded, Queryable Operations That AI Can Improve
YC general partner Tom Blomfield argues that startups should not treat AI as a copilot bolted onto existing org charts, but as the basis for a company that records its work, exposes its tools, and improves through recursive loops. In his batch talk, he says founders should make company knowledge legible to AI, spend more on tokens rather than headcount, and rebuild operations around systems that can detect failures, update themselves, and reduce the need for human coordination.
Language Models Generalize Differently From Parameters Than From Context
In a Stanford CS25 seminar, Anthropic researcher Andrew Lampinen argues that language models generalize differently depending on whether information is stored in their parameters or supplied in context. His experiments find that models can often use relations flexibly when the relevant facts are visible in the prompt, but fail to make the same reversals, syllogistic inferences, or codebook translations when those facts have only been learned through training. Lampinen presents augmentation, retrieval, and reinforcement-learned recall as partial ways to make latent implications more usable, while stressing that parametric learning and in-context learning remain complementary rather than substitutes.
AI-Native Startups Are Replacing Teams With Agentic Operating Systems
In a Stanford CS153 Frontier Systems lecture, Y Combinator CEO Garry Tan and general partner Diana Hu argue that AI agents are changing the basic production unit of a startup from a team to a founder operating through skills, memory, evals and customer feedback loops. Tan frames agentic coding as a programmable company architecture, while Hu says AI-native companies are becoming closed-loop systems with far higher revenue per employee and less need for traditional managerial coordination.
Any-to-Any Agents Rely on Orchestrated Multimodal Models, Not One Network
Google DeepMind’s Patrick Löber presents “any-to-any” agents as an orchestration problem rather than a claim that one model already handles every modality. In his architecture, Gemini reads and reasons across PDFs, images, audio, video and other sources, then uses function calling to invoke specialized native models for images, speech, live audio, video or embeddings. Löber argues that the useful shift is not generating every possible format, but letting an agent decide when a diagram, spoken explanation or other output is warranted.
Gemini’s Strategy Shifts From Frontier Leaderboards to Deployable AI Infrastructure
Google DeepMind executives Tulsee Doshi and Logan Kilpatrick argue that Google’s current Gemini strategy is built less around a single frontier model than around a deployable AI stack. In their account, Gemini 3.5 Flash, the Anti-Gravity agent harness and new multimodal products such as Omni are meant to make models fast, cheap and integrated enough to run across Search, the Gemini app, AI Studio, YouTube and enterprise tools. The deeper shift, Kilpatrick says, is that the model is increasingly absorbing the scaffolding that once surrounded it, while Google standardizes the remaining agent infrastructure across its products.
Coding Agent Skills Need Live Documentation, Not Cached Product Knowledge
Marc Klingen of Langfuse argues that coding agents can add observability, but often do it first from stale model memory, producing broken or incomplete instrumentation before recovering through current documentation. In a talk on building a Langfuse skill for Claude Code, he says the fix is not to stuff more product knowledge into the agent, but to give it reliable ways to find live docs, expose its intermediate work in traces, and evaluate changes against realistic repositories. The same work, he warns, creates new risks when optimization loops reward shorter paths and remove the documentation-fetching and approval steps that make the skill reliable.
Spotify Uses Semantic IDs to Make LLMs Recommend Catalog Items
Spotify’s Shivam Verma argues that LLM-era personalization requires translating both users and catalog items into forms a model can process alongside language. In his account, Spotify combines long-term user embeddings, Semantic IDs that turn tracks and episodes into token sequences, and soft tokens that project a listener’s profile into an LLM’s embedding space. The aim is a generative recommender that can produce catalog-native recommendations without full fine-tuning, while still relying on traditional ranking layers for production use.
AI Backlash Reaches Commencement as Graduates Face a Reshaped Job Market
Jason Calacanis and Alex Wilhelm argue that the boos greeting pro-AI commencement speeches are a visible sign of AI’s legitimacy problem with new graduates entering the workforce. On This Week in Startups, they frame the reaction less as technophobia than as distrust: students have already seen AI weaken academic norms, threaten entry-level work, concentrate wealth around frontier labs, and expand systems of surveillance and data capture. Their discussion returns to a central question: whether workers, founders, consumers, and citizens have any meaningful control over the AI systems now reshaping their choices.
Agentic AI Is Turning Model Quality Into a Systems Problem
At AI Engineer Singapore’s second day, speakers from Google DeepMind, Cloudflare, Arize, OpenClaw, Adaption and other teams made a shared engineering case: as AI systems become more agentic, model quality is no longer separable from the systems around the model. Richard Ngo framed the risk as long-horizon, situationally aware agents whose goals cannot be inspected, while practitioners argued that production AI now depends on continuous evaluation, traces, deterministic execution boundaries, routing, memory, fine-tuning and test-time search. The source’s central claim is that useful and safe agentic AI is becoming a systems problem, not just a model-selection problem.
Context Graphs Make AI Decision Trails Queryable
Stephen Chin of Neo4j argues that enterprise AI systems need context graphs because retrieval alone can surface relevant facts while missing the relationships that make them usable. In his examples, a graph-augmented system can connect a patient’s emphysema care plan to smoking history or a credit decision to prior rejections, policies, margin trades and fraud signals. Chin’s case is that agents should preserve not only documents and answers, but the decision traces, tool calls, causal chains and outcomes that let humans inspect and reuse prior reasoning.
Economic Entanglement, Not Decoupling, Defines the New China Bargain
Salesforce CEO Marc Benioff joined the All-In hosts for a discussion that framed U.S.-China relations, enterprise AI, and the software selloff around the same question: when dependence is a stabilizer and when it becomes leverage. Benioff argued that more trade with China can lower conflict risk and that large software platforms remain valuable because AI still needs trusted customer data, cash-flowing distribution, and enterprise deployment. David Friedberg, Chamath Palihapitiya, and Jason Calacanis extended the argument across Taiwan, chips, AI assistants, El Niño-driven food risk, and private-market SPVs, where interconnection can either absorb shocks or transmit them.
AI Software Winners Will Own Context, APIs, or Outcomes
Tasklet chief executive Andrew Lee argues that AI software is consolidating toward a few horizontal agent platforms that hold context, connect tools, generate interfaces, and choose among models. In a discussion with Nathan Labenz, Lee says Tasklet has rewritten its agent stack around file-system memory, agentic search, and provider-specific context management because the chat transcript is no longer enough. He also frames Anthropic as both Tasklet’s critical supplier and a major competitor, making model neutrality central to Tasklet’s bid to survive the AI transition.
Supabase Says Skills and MCP Close the Agent Context Gap
Pedro Rodrigues of Supabase argues that agents fail on production systems less because they cannot reason than because they lack product-specific judgment. In a test using the same Postgres task, Supabase found that Claude with MCP alone created a view that could bypass row-level security, while MCP plus a Supabase skill added the required `security_invoker = true` flag. Rodrigues’s case is that MCP gives agents tools, but skills supply the rules, workflows, and current documentation paths needed to use those tools safely.
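For context on the flag: in PostgreSQL 15 and later, a view checks access to its underlying tables as the view owner unless it is created with `security_invoker = true`, which is how a view can sidestep row-level security. The check below is a minimal lint sketch of the kind of rule a skill might encode, not Supabase's skill or tooling. The function name, the regular expression and the `my_orders`/`orders` names are hypothetical, and the pattern only recognizes `true` or `on` inside a `WITH (...)` option list.

```python
import re

# Matches a view option list that sets security_invoker to a true value,
# e.g. CREATE VIEW v WITH (security_invoker = true) AS ...
_INVOKER = re.compile(
    r"create\s+(?:or\s+replace\s+)?view\s+\S+\s+with\s*\([^)]*"
    r"security_invoker\s*=\s*(?:true|on)\b",
    re.IGNORECASE,
)

def view_respects_caller_rls(sql: str) -> bool:
    """Return True if the CREATE VIEW statement sets security_invoker."""
    return bool(_INVOKER.search(sql))

# Hypothetical statements: the first omits the option, the second sets it.
unsafe = "CREATE VIEW public.my_orders AS SELECT * FROM orders;"
safe = ("CREATE VIEW public.my_orders WITH (security_invoker = true) "
        "AS SELECT * FROM orders;")

print(view_respects_caller_rls(unsafe), view_respects_caller_rls(safe))
```

A regex check like this is deliberately crude. It illustrates that the missing knowledge is a small, checkable rule rather than a reasoning gap, which is the distinction Rodrigues draws between tools and skills.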
Intercom Doubled Engineering Throughput by Standardizing on Claude Code
Brian Scanlan, a senior principal engineer at Intercom, argues that the company doubled engineering throughput by treating AI coding as an internal platform strategy rather than an individual productivity tool. In his account, Intercom standardized on Claude Code, encoded recurring engineering work into agent-usable skills, connected agents to internal systems under existing controls, and made AI adoption an explicit expectation across R&D. The reported result was a doubling of pull-request throughput, including 17.6% of merged PRs approved by Claude, alongside new bottlenecks in review and CI.
AI Is Pushing Science Beyond the Paper as Its Core Artifact
In closing remarks from an AI and science meeting, Risa Wechsler argued that AI is reshaping scientific fields unevenly, depending on their data, theory and modes of inquiry, and that scientists should use the moment to choose structures aligned with human values. Surya Ganguli pushed the question toward scientific communication itself, suggesting that papers may be too narrow an artifact for AI-assisted science and that richer institutional records of research could better transfer knowledge. Both framed AI for science as a design problem around human purposes, not just faster automation.
Abridge Bets Clinical Conversations Can Become Healthcare’s Intelligence Layer
Abridge executives Janie Lee and Chaitanya “Chai” Asawa argue that the patient-clinician conversation is becoming healthcare’s core intelligence layer, not merely an input for automated notes. In a discussion with Redpoint’s Jacob Effron, they describe Abridge’s move from ambient documentation into clinical decision support, prior authorization and other workflows that depend on EHR data, payer rules, medical literature and local guidelines. Their case is that healthcare AI will be judged less by chatbot fluency than by whether it can deliver accurate, low-latency, privacy-preserving support inside clinical workflows without adding to clinicians’ alert burden.
Agent Workflows Route Conversations Through Specialized Subagents
ElevenLabs is introducing Workflows, a visual editor for its Agents Platform that lets builders design routed conversation flows instead of placing all business logic inside one agent prompt. The company argues that specialized subagents, each with their own instructions, tools, knowledge bases and model choices, give teams more control over cost, latency and accuracy. The product is positioned as a way to combine AI interpretation with predefined actions, verification steps and human handoffs on the same design surface.
AI Companies Are Running Into Infrastructure, Distribution, and Trust Bottlenecks
TBPN’s discussion argued that AI’s value is now being tested less in model demos than in the bottlenecks around deployment: inference speed, power, workflow integration and access to customers. Cerebras was framed as a public-market bet on faster inference, while Giga Energy’s data-center business showed how scarce powered shells have become part of the AI supply chain. The same bottleneck logic appeared outside core AI, from Audemars Piguet using Swatch as an official low-cost entry point to Augustus, with conditional OCC approval, trying to rebuild dollar clearing as a national bank.
Head-Tail Truncation and Memory Stabilized Arize’s Trace-Analyzing Agent
Sally-Ann DeLucia argues that agent performance depends on context management as an operating discipline, not on larger prompts or simple compression. Drawing on Arize’s work building Alyx, an agent that analyzes trace data from AI systems including its own, she says naive truncation broke follow-up reasoning and LLM summarization gave the model too much control over what mattered. Arize’s more durable pattern was to preserve the head and tail of context, store the middle for retrieval, test long sessions explicitly, and move heavy workloads into sub-agents.
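The head-and-tail pattern described above can be sketched roughly as follows. This is a minimal illustration, not Arize's implementation: the class name `HeadTailContext`, its parameters, and the keyword lookup are all hypothetical, and the substring match stands in for whatever retrieval mechanism a real system would use over the stored middle.

```python
from dataclasses import dataclass, field


@dataclass
class HeadTailContext:
    """Hypothetical sketch: keep the first and last messages in the prompt,
    and hold the middle aside so it can be retrieved on demand."""

    head_size: int = 2   # assumed number of leading messages to always keep
    tail_size: int = 4   # assumed number of most recent messages to always keep
    messages: list[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.messages.append(message)

    def _middle(self) -> list[str]:
        # Messages that are neither in the head nor in the tail.
        n = len(self.messages)
        if n <= self.head_size + self.tail_size:
            return []
        return self.messages[self.head_size : n - self.tail_size]

    def visible(self) -> list[str]:
        """Return what would be sent to the model: head, a marker, tail."""
        middle = self._middle()
        if not middle:
            return list(self.messages)
        n = len(self.messages)
        marker = f"[{len(middle)} earlier messages stored; retrieve on demand]"
        head = self.messages[: self.head_size]
        tail = self.messages[n - self.tail_size :]
        return head + [marker] + tail

    def retrieve(self, keyword: str) -> list[str]:
        """Stand-in for real retrieval: case-insensitive substring match
        over the stored middle only."""
        needle = keyword.lower()
        return [m for m in self._middle() if needle in m.lower()]
```

With `head_size=1` and `tail_size=2`, six messages would leave the first and the last two visible, replace the three in between with a single marker line, and keep those three reachable through `retrieve`.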
ElevenLabs Voice Engine Wraps Existing Chat Agents Without Rebuilding Them
Luke Harries of ElevenLabs argues that the next step for chat agents is not a new orchestration stack but a voice layer around the agents companies have already built. His case for ElevenLabs’ Voice Engine is that teams can keep their existing LLM logic, RAG, tools and business rules, while offloading speech-to-text, text-to-speech, turn-taking and interruption handling to a wrapper. The product is positioned for companies that want voice interfaces across web, phone and meeting channels without rebuilding their chat agents inside a fully managed platform.
Fresh Product Data Is the Constraint for LLM Commerce Discovery
Criteo executives Diarmuid Gill and Liva Ralaivola argue that modern ad tech is best understood as a millisecond-scale prediction system: anonymous commerce signals, learned embeddings and real-time auctions are used to decide whether to bid, what to show and how much an impression is worth. In a conversation with Nathan Labenz, they frame Criteo’s work with OpenAI and other generative tools as an extension of that problem, not a replacement for it: LLMs may change product discovery, but the system still depends on fresh retailer data, consent, latency discipline and human oversight.
Personal AI Lets One Builder Do the Work of Teams
Y Combinator CEO Garry Tan argues that personal AI is reaching a stage comparable to the early personal computer: powerful enough to let one person build software that once required a team, but still brittle enough to demand technical ownership. Drawing on his work with Claude Code, OpenClaw and his GStack workflow, Tan makes the case for heavy token use, Markdown-encoded “skills” and multiple coding agents under one accountable human operator. The larger question, he says, is whether users will control their own AI tools, data and prompts, or work inside opaque systems controlled by others.
Agentic Search Needs Specialized Tools and General-Purpose Escape Hatches
Elastic’s Leonie Monigatti argues that context engineering for LLM agents is largely a search-interface problem: the critical question is how an agent decides what to retrieve from files, databases, memory, the web, and other sources before the model answers. In her workshop, she shows why semantic search, database query tools, shell access, and agent skills each solve different parts of that problem and fail in different ways. Her recommendation is to build retrieval stacks that combine easy specialized tools for common tasks with more general tools for ambiguous or complex ones, then use observed failures to refine the stack.
Perplexity Frames AI Agents as Metered Digital Labor
Perplexity chief business officer Dmitry Shevelenko argues that AI agents should be judged less as software features than as metered digital labor: tools users will pay for when they perform economically useful work. In a Big Technology Podcast interview, he makes the case that Perplexity’s computer-use agents, workflow packaging, broad permissions and multi-model orchestration are all part of that shift. The unresolved question is whether users and companies will accept the access, trust and usage-based pricing required to make those agents a real business rather than another AI novelty cycle.
Apple Explores Intel and Samsung for U.S. Chip Production
Mark Gurman said Apple has held early talks with Intel and Samsung about using new U.S. fabs to make future A-series and M-series processors, an exploratory move he framed as a supply-chain redundancy question rather than only a political one. Apple still relies heavily on TSMC, primarily in Taiwan, and Gurman described that geographic and supplier concentration as one of the company’s biggest risks. Across the rest of the broadcast, executives and analysts described a similar shift from exposure to execution: AI companies are giving Washington early model access for review, while enterprise adoption is being tested by security, deployment cost and proprietary data advantages.
Agent Failure Should Drive Enterprise AI Knowledge Base Curation
Raj Navakoti argues that enterprise AI agents fail less because of model limits or retrieval plumbing than because companies have not made institutional knowledge legible. In his Demand-Driven Context workshop, he proposes building agent-ready knowledge bases from the bottom up: give agents real tickets or incidents, observe where they fail, and turn those failures into structured, validated context blocks. The method, shown through smaller-scope examples and prototypes including work from IKEA Digital, is presented as an incremental curation loop rather than a proven enterprise-scale system.
Small-Model Inference Needs Infrastructure Beyond Model Servers
Filip Makraduli of Superlinked argues that the hard part of small-model inference is no longer simply serving a model, but operating many embeddings, rerankers, extractors and multimodal models efficiently in production. In his account, conventional one-model-per-container deployments waste GPU capacity and leave teams to rebuild routing, autoscaling, monitoring, hot-swapping and eviction themselves. Superlinked’s SIE is presented as an open-source attempt to provide that missing infrastructure layer for AI search and document-processing workloads.