YC Says Internal Agents Need Shared Context, Tools, and Trust

Garry Tan

Diana Hu

Jared Friedman

Harj Taggar

Tom Blomfield Pete KoomenY CombinatorWednesday, May 27, 202617 min read

YC’s Pete Koomen argues that building “superintelligence” inside a company requires more than adding AI features to existing software: agents need access to the organization’s shared context, tools and accumulated work. In a Lightcone discussion with Garry Tan, Jared Friedman, Diana Hu and Harj Taggar, Koomen describes how YC’s internal agent system became useful once it could query a unified company database, reuse hundreds of internal tools and turn repeated judgment into improving skills. The broader claim is that AI-native organizations will depend as much on trust, transparency and broad access as on model capability.

The unlock was not an AI feature. It was giving agents the company’s context.

Pete Koomen described YC’s internal AI effort as starting from a narrow operational problem and turning into “a whole infrastructure layer” for the organization. The original problem was finance. YC’s finance team had complicated workflows — booking journal entries, logging price rounds, and other work that helped YC run — and the old software-building loop was slow: finance would explain a workflow to engineers, engineers would encode it in purpose-built deterministic software, and the tool would be handed back for another round.

Koomen was seeing that loop at the same time agentic coding tools were becoming useful enough to change how he worked on his own machine. The contrast bothered him. In a local coding environment, an agent could help him do almost anything. At work, software still moved through a narrow translation chain from domain expert to engineer to product interface.

The first idea was not to replace finance with an agent, but to give the finance team more direct control over its own software. Instead of asking engineers to understand and encode every workflow in Ruby, YC could let finance encode workflows “as English with prompts.”

The first unexpectedly powerful primitive was simpler than that: SQL access. Garry Tan said the earliest versions he remembered were not agentic coding but LLMs writing SQL queries. They worked well enough that non-engineers in finance could ask real questions against YC’s data.

Koomen said the magical moment came when YC’s internal agent loop had a shared tool registry and a tool that could run read-only SQL queries against YC’s database. There was also a tool that let agents read model files. Jared Friedman said he had built those tools and felt as if he was “breaking the rules,” because YC had begun with narrowly scoped tools and he kept finding them too weak.

Friedman described the move in more aggressive terms than Koomen’s later “read-only” framing. He said he wondered what would happen if the agent had “complete access to the production database where it could just trample on anything,” and said he pushed the tool out “surreptitiously,” perhaps late at night. Koomen’s answer was terse: “And it worked.” The source preserves both facts: Koomen identifies the durable unlock as read-only SQL querying, while Friedman’s memory emphasizes the break from narrow permissions and the willingness to test a much broader access pattern than YC had started with.

What made it work was not only the model. It was YC’s internal architecture. Koomen emphasized that YC had long run mostly on its own software, and that software sat on one Postgres database containing the important objects of YC’s world: companies, founders, financial transactions, internal CRM notes, and more. Functions that many companies would have spread across third-party SaaS tools were inside one schema.

That meant an agent with schema context and SQL access could answer questions about the business that previously required a data request or a custom analysis. Koomen gave the example: “Show me all of the investors who invested in a space-related company in the last four batches.” In a fragmented system, that question would require finding the right data, joining it, and asking someone with database access to make time. In YC’s setup, the agent could query the unified context.

Friedman argued that the change did not merely make existing questions cheaper. It changed the number and ambition of questions people were willing to ask. Under the older BI-tool regime, a question like identifying investors in space-related companies might take hours of SQL. Unless it was critical, people would not bother. Koomen framed this as an instance of Jevons paradox: reduce the coordination and cost of asking questions, and the organization asks far more of them.

350+

YC-specific tools in the internal shared tool registry

The lesson Koomen drew for other organizations was concrete: common context matters. A data warehouse or equivalent shared context layer, where as much internal context as possible lives, makes agents more useful. He compared it to the advantage a coding agent has inside a monorepo. YC’s agents, operating against a single database with one schema, were much more effective than they would have been if the relevant context were scattered.

The source also raised a harder case: what should companies do if their context is not already unified? In an unattributed passage, the answer was not simply to wrap every existing system and hope agents can navigate the mess. The speaker compared the moment to earlier infrastructure shifts such as “big table” and argued that scattered data may need to be denormalized into forms optimized for agent retrieval and understanding. The example given was GBrain: data from many systems normalized into a schema relevant to the user and the questions being asked, with retrieval, RAG, graph RAG, hybrid retrieval, and reranking inside the system. The practical claim was that agents need data in a form they can retrieve, understand, and act on.

The multiplayer agent stack has different primitives than a personal coding assistant

Pete Koomen said the popular agent harnesses are still largely in a “single-player era.” Claude Code, Codex, Pi, Open Claude, and Hermes are designed around one human using one agent on one machine. In that setting, the agent can be extremely powerful because it can act broadly inside the user’s local environment.

The harder problem, in his view, is the multiplayer harness: how to make those superpowers work at the team or organizational level. YC’s internal system became an exploration of that problem. The primitives that mattered were not only models. They were shared context, a tool registry, skills, and loops that improve those skills over time.

The tool registry is where YC-specific usefulness accumulates. Koomen said the early system was simple: an agent loop, a simple registry, a model router underneath, and a few tools. At first there were about 20 tools, including SQL access. Over time, teams added more. By the time of the discussion, YC had more than 350 tools. Partners could manage office hours; finance could book journal entries; events could be managed; teams could add tools whenever they found a piece of work that could be improved with an agent.

The registry also lets tools be reused across different surfaces. Internal agents can use them, but so can Claude Code running on individual machines. That is part of why Koomen treated the registry as one of the main things he would build in any organization trying to make agents useful internally.

An unattributed passage connected YC’s tool registry to a broader pattern the speaker called “resolvers”: places where an agent can discover what capabilities exist and how to invoke them. The example was a meta-skill in Open Claude called “skillify,” which turns a useful interaction into a reusable skill and plugs it into a resolver such as an agents.md file. In that framing, Claude Code’s skill registry, YC’s tool registry, and similar systems are versions of the same emerging primitive.

The important operator lesson was not the name of the primitive. It was the maintenance discipline around it. The speaker described using a “check resolvable” meta-skill to inspect existing skills and tools after a new one is created, asking whether the set is DRY — “don’t repeat yourself” — and MECE — mutually exclusive, collectively exhaustive. The practical problem is easy to recognize: ten overlapping skills make the agent system harder to use and harder to improve; one well-parameterized capability is more useful. The source compared the moment to early programming history, when basic primitives were still being discovered and rediscovered in parallel.

Koomen said YC’s own use of skills followed a progression. First people wrote their own system prompts. Then skills emerged and people wrote their own skills. Then they began meta-prompting, asking agents to write or improve skills. YC now has autonomous self-improving loops: every night, a general agent reads through employee-agent conversations and looks for things that could have gone better or pieces of context that would have made the agent more efficient if provided up front.

An unattributed speaker called this a “dream cycle,” comparing it to auto-research-style loops and GBrain. Koomen’s version was specifically about skill improvement. The same pattern, the source suggested, could also read transcripts and write useful knowledge back into internal systems such as a CRM.

A small YC skill shows how organizational intelligence compounds

Pete Koomen’s clearest example of compounding came from YC’s two-sentence description skill. He described it as a shared skill partners use to help companies write concise descriptions of what they do and why it matters.

Garry Tan unpacked why that apparently small task matters. Founders often have perfect context in their own heads and fail to reproduce it in someone else’s. A good two-sentence pitch answers, first, “What is this?” and second, “Why is it interesting or valuable?” If the listener cannot identify what the company does, they cannot even ask a useful question. If they understand the category but not why the company is noteworthy, they tune out.

YC partners have practiced this skill hundreds of times. Koomen said Tom Blomfield wrote an initial skill teaching an agent to take context about a company and condense it into a two-sentence description. Then other partners ran a group office-hours meeting with founders in the spring batch, had each founder try a two-sentence description, and gave live feedback. That meeting transcript captured the tacit knowledge in the partners’ heads: the corrections, instincts, objections, and patterns that make a description work.

The transcript was then fed back to the agent with the instruction to improve the two-sentence description skill based on what it had learned. Koomen said the result was “noticeably better” and argued that “this thing is now better than I am” at writing those descriptions.

Tan treated that as the micro-mechanism for building what he called superintelligence inside an organization. A prompt is written. It is used. Other people use it. Artifacts are produced around the use of it, including transcripts. Those artifacts are then used to meta-prompt and improve the skill automatically. The improved skill becomes available to everyone, and it contains more than any one person’s original prompt; it contains the organization’s accumulated feedback.

An unattributed passage made the same point through a broader organizational analogy. The speaker referenced Jack Dorsey’s work at Block as an attempt to turn the company into a “mini AGI” around helping people make payments. The two-sentence pitch skill was presented as a small instance of the same mechanism: any organization is an aggregate of thousands of repeated operations, and each operation can become a target for capture, improvement, and reuse.

Diana Hu argued that this is what makes an AI-native organization different from merely using AI as a copilot. The copilot framing, in her view, was already dated. The more important move is using AI as “the building layer for everything” and recording the artifacts of work. Meeting recordings, in this framing, are not only for coaching participants after a meeting. They are raw material for improving emails, communication, planning, and other outputs because they preserve the context of how the work is actually done.

Harj Taggar connected this to social norms around recording. Two years earlier, recording meetings could feel intrusive or socially ambiguous. By now, he said, it is often assumed that Zoom meetings and many other meetings are being recorded. That social-cultural shift matters because some blockers to AI progress are not technical; they are about what people are willing to capture, share, and reuse.

Koomen acknowledged that recording everything can feel “a little scary,” but said the frame changes if it is understood as a way for everyone in the organization to improve using the collective skill and instinct of the people they work with. A canonical two-sentence description skill is not just a text generator for founders. It helps Koomen himself understand founder communication better because it lets him draw on what Diana, Harj, Tan, Friedman, and others have learned over years.

Jared Friedman called the result “a shared organizational brain,” the closest thing to connecting brains. Koomen agreed. Once knowledge is in a place where an agent can work with it, employees can practice against it, ask for critique, and learn from patterns that previously lived only in other people’s heads.

The cultural requirements are as important as the technical ones

Garry Tan said YC made a consequential internal choice: by default, agent conversations are globally viewable by any full-time employee. The decision was not obvious. It raised questions about everyone seeing everything and what would or would not be acceptable. Tan said he was glad YC kept it open because employees learned how to use the system by watching one another.

Pete Koomen said that transparency solved several problems at once. Every agent conversation was broadcast internally to a Slack channel. Anyone could join, watch, and learn. When Tan began using the system heavily and creatively, others saw use cases that had not occurred to them.

Transparency also became a kind of social control. If agents are most powerful with unrestricted access to context, the organization needs a way to keep that power from being misused. Koomen argued that broadcasting conversations by default allowed YC to be more lenient on internal security while still encouraging people to keep private information private. That only works, he emphasized, in a high-trust environment.

Tan drew a stronger conclusion: organizations that want the full benefit of agentic systems need to be relatively egalitarian and trust by default. Those are not the default properties of most organizations. Many companies are command-and-control by default; leadership may get access to tools while line-level employees do not. Tan argued that founders who want this kind of organization need those cultural properties at the core.

Koomen said that environment works best at startups: small groups of aligned people operating with high trust. Tan added a cost requirement: companies must be willing to spend tens or hundreds of thousands of dollars a year on tokens, at least while the capability is still expensive. In his view, that spending buys early access to a future default. What costs $100,000 or $1 million now may cost far less in a year or two. Companies that spend early and build skills openly can, he argued, “live in 2028” and leapfrog incumbents that wait.

Jared Friedman compared it to the 1990s, when some companies began buying computers for employees. The systems were expensive and flaky, but having them was a superpower if competitors did not. Diana Hu described the internal effect as “raising the floor.” A new employee who might previously have needed six months to ramp up can access much more company context immediately and learn how the strongest people in the organization do things.

That is not only a productivity story. It changes apprenticeship. Hu said a new employee can simulate what it is like to be Pete coaching founders on sales or Garry giving specific advice. Koomen compared it to his first experience using coding agents: he could ask all the “dumb questions” he was too embarrassed to ask a person. At an organizational level, a new employee can ask the agent instead of interrupting Harj or another busy colleague. More questions get asked and answered, and people ramp faster.

AI-native software puts the agent around the tools, not inside a feature box

Pete Koomen’s essay “AI Horseless Carriages” was a critique of AI software that imitates old software patterns. The screenshot shown from the essay framed the problem through Gmail’s AI assistant and the historical “horseless carriage” analogy: early cars that borrowed from horse-drawn carriage design in ways that later looked obviously broken.

The visible essay text stated the argument directly: Koomen enjoyed using AI to build software more than using most AI applications, because AI used as a building tool felt like “a power tool,” while many AI app features felt “tacked-on and useless.” The screenshot showed a Gmail AI assistant example that generated an email draft from a prompt, next to the horseless-carriage comparison and an image captioned “Trevithick’s London Steam Carriage of 1803.” The point was not the email use case alone. It was that early tools built with a new technology often mimic the old way of doing things and inherit constraints that later look unnecessary.

Koomen’s core critique was that many companies add “a little bit of AI inside of a lot of software.” His example was an email writer in Gmail: a narrow AI feature embedded inside a conventional product. His broader objection was that such products keep prompt context and control locked away from users because the developer assumes it is the developer’s job to decide how the AI should perform the task.

Garry Tan called that “safetyism.” Koomen’s more general claim was that AI’s potential is to shift control of software from developer to user. The tools that feel powerful are not deterministic applications with AI sprinkled inside. They are agents that can wrap deterministic tools and use them flexibly. He expects better AI-native software to look more like an agent wrapping software tools than deterministic software wrapping an AI model.

This led to a debate over interface. Harj Taggar pushed back against the idea that AI needs a radically new interface beyond chat. In his view, that argument often comes from people who have not really used strong agentic systems. Chat works because as users trust the agent more, they need less UI for reviewing every action. Occasionally the agent may need to present a specific view, but the base interface can remain conversational.

Tan called this “just-in-time software”: if a user needs a specific interface at a specific moment, the agent can build a single-page JavaScript app or a reusable skill file for that moment. Diana Hu said she had changed her mind on this. In 2023 she had thought chat might not be the right UI for AI applications. After using the tools more, she came to think chat is likely better because it is closest to human language, and human language and writing are close to expressions of thought. A narrow box constrains the intelligence too much.

Taggar added that modern chat is already multimodal. A user can provide text, voice, pictures, and files. Tan said voice memos, when he does not want to type, make the interaction feel like talking to someone.

Tan illustrated the shift with his own software work. He said he spent January and February building Garry’s List as a Rails app with roughly half a million lines of code, including an agentic framework for research and fact-checking. He built it the way he would have built software in 2013, the last time he had written code seriously: a Web 2.0-style application. Claude Code made that possible, but the result was still rigid.

He contrasted that with GBrain, which he described as Garry’s List 2.0: more open, more dynamic, and built around Open Claude, Telegram, his retrieval system, and MCP. The rewrite, in his telling, did not need half a million lines of Rails. It could be closer to 10,000 lines of TypeScript and 2,000 lines of markdown, with behavior controlled by skills and prompts that non-engineers could change. His editor in chief could adjust an evaluation skill on the fly without Tan touching Rails code or a Ruby file.

Koomen generalized the point: the best AI software he has used tends to be very small. It includes the minimum amount of code needed ahead of time so the model can do the rest. He mentioned Pi, an open-source harness that he described as “the smallest possible coding agent,” and noted that Pi can be used to modify and extend Pi. He is watching for more classic software to emerge in that form: a minimal starting point that an agent extends over time.

The open question is whether AI becomes personal computing or another mainframe era

Garry Tan framed AI as capable of being either centralizing or decentralizing. Gmail’s locked prompt was, for him, a small example of centralization: the user cannot change how the AI behaves.

He described a possible future with a small number of dominant AI providers controlling the most advanced systems, compute, power, and even space data centers. In that future, users do not run their own prompts; AI “happens to you.” He compared it to a world where personal computers never existed and computing remained controlled by mainframes, minicomputers, corporate policies, and a small technical priesthood.

Pete Koomen agreed with the historical contrast. The computing revolution accelerated when people had personal computers they could experiment on. Tan’s preferred analogy was the Homebrew Computer Club and the early Apple I moment: people soldering together boards, discovering primitives, and learning how to package and sell a new kind of computing.

Tan argued that current AI is at a similar primitive-discovery stage. The alternative to centralized AI is a “personal AI” moment: users can run their own software, change their own prompts, test behavior, maintain private repos, choose which model to use, and potentially use open-weight models. He pointed to GBrain, Hermes Agent, and Open Claude as examples of the direction he wants: AI as an extension of the user and what the user cares about, not only what Meta, Alphabet, OpenAI, or Anthropic decide to provide.

Jared Friedman said he resists framing AI as a way to replace people because it does not match his experience. He sees AI as empowering individuals, consistent with the arc from mainframes to PCs to the internet. The internet gave people a publishing platform; AI, in his view, will let people do more and eliminate drudgery that made work painful.

Tan’s closing warning was that empowerment is not automatic. Companies are not open by default. They are often command-and-control by default. Access to powerful tools may be reserved for leaders rather than staff. To get the decentralized, empowering version, organizations and builders have to make explicit choices about openness, trust, and who controls the computing environment.

AI Application Architecture RAG and Knowledge Systems AI Consumer Products Agents and Autonomy Human-AI Interaction AI Product Management Enterprise AI Adoption