Orply.

Tokens Can Now Substitute for 100-Person Startup Engineering Teams

Sam Altman · Stanford Online · Monday, June 15, 2026 · 15 min read

In a Stanford CS153 lecture, OpenAI chief executive Sam Altman argued that AI has already rewritten the startup playbook, allowing small teams to buy capabilities with tokens that once required large engineering organizations. He used OpenAI’s experience with ChatGPT, Codex and model scaling to make a broader case: scale keeps producing capabilities that experts underestimate, but the institutions around AI — from education and research pipelines to compute markets and governance — are not adapting as quickly. Altman said the central choice ahead is whether intelligence becomes a broadly available utility or remains concentrated in a few companies.

The startup playbook changed when tokens began substituting for teams

Sam Altman told the Stanford CS153 class that if he had “just a little more time,” he would update the startup course he taught at Stanford in 2014. Not because the old version was missing a lesson, but because “everything about starting a startup has changed so much” that he has not seen “a good version of how you're supposed to make a startup now.”

The central change, in his account, is not that startups have new tools around the edges. It is that an early company can now buy an amount of capability that previously implied a large, elite engineering organization.

with like an affordable amount of spend on tokens, you can do what a 100-person, incredibly great engineering team would do as a startup.

Sam Altman
100-person engineering team: the capability Altman said a startup can now approximate with tokens

That changes “the level of ambition you can have, the speed at which you can move, the amount of stuff you can do at once.” Altman did not turn that into a list of startup ideas. He argued almost the opposite: if he can think of a startup idea obvious enough to assign to a class, it is probably obvious to many other people too. The right target is more like OpenAI at its beginning, which he described as “one of maybe generously speaking, four AGI efforts in the world.”

The valuable thing to find now, he suggested, is whatever has become possible only after the automated-coding era but is still non-obvious: a market that may become multi-trillion-dollar and currently has only a few serious entrants. He said students are more likely than he is to know what that is, because his own “brain has been taken over by OpenAI.”

OpenAI itself, by his account, did not begin as an implementation of a refined startup theory. It began as an anomaly: “a research lab first that later had to bolt on a startup.” Altman called it “the strangest startup of the last maybe couple of decades in Silicon Valley” because the normal path is to start with a product company, grow, slow down, and then attach a research lab to search for the next thing. OpenAI was the reverse. He said he does not recommend that sequence as a general model.

The irony is that OpenAI had to follow pre-AI startup rules because it was trying to bring about the AI era without the AI tools now available to founders. The current environment, he argued, is different enough that even a recent startup canon is already dated.

Scale works more often than consensus expects, but it breaks systems faster

Altman’s most general claim was empirical and explicitly undertheorized: the most interesting things he has observed in his career have come from “emergent properties at scale” or from scale continuing to produce returns “far beyond what the consensus thinks will work.” He said he does not have a satisfying theory for why this is true, and that makes him “a little bit nervous” to recommend it. But he recommended it anyway because, in his view, the pattern has held across AI models, research organizations, companies, and startup networks.

The example he spent time on was Y Combinator. When YC grew, he said, many smart people argued that it had become too large and should shrink, fund fewer companies per batch, and return to the era when batches were closer to 10 companies. The theory was that the best companies were obvious and that funding many more added work without much value. Altman said the argument was tempting because shrinking would have made YC easier to run.

But the premise missed what scale created: “the network effects inside of the batch.” That, in his telling, was not visible at one-tenth or one-hundredth the scale. No one had funded startups at that scale in the same way, so no one had discovered that the batch itself could generate an emergent property.
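Altman gave no formula for these network effects. As a rough illustration only, assume a batch's network value tracks the number of possible company-to-company connections (a simplification, not his model). That count grows roughly with the square of batch size, which is one way an effect invisible at 10 companies could appear at much larger scale. The function name and the batch sizes below are hypothetical:

```python
def possible_pairs(batch_size: int) -> int:
    """Count distinct company-to-company pairs in a batch of n companies.

    Illustrative assumption: network value tracks possible connections.
    Each company can pair with every other company exactly once,
    giving n * (n - 1) / 2 pairs.
    """
    if batch_size < 0:
        raise ValueError("batch_size must be non-negative")
    return batch_size * (batch_size - 1) // 2


if __name__ == "__main__":
    # Compare a small batch with a hypothetical batch ten times larger.
    for n in (10, 100):
        print(f"{n:>4} companies -> {possible_pairs(n):>5} possible pairs")
```

Under this assumption, a batch ten times larger has 110 times as many possible pairs (4,950 versus 45), though real network effects depend on which connections actually form.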

He applies the same pattern to AI scaling. When OpenAI committed to scaling deep learning, he said many “geniuses in the field” dismissed the result as uninteresting: if models got better with scale, why keep doing the same thing? Altman’s answer was that the graph kept improving and the returns had not run out. More broadly, he said that when something is already working in a smaller way and can be pushed to a scale others have not tried, “more often than not that seems to be a good idea.”

That does not mean scale is clean. In systems terms, the reason people underexplore it is that “stuff breaks at an accelerating rate and in an unpredictable way.” A scaled system is “always like a little bit broken,” and smart people will always be available to argue for smaller, less ambitious versions.

In the case of scaling AI models, Altman divided the resistance into several categories. There was the technical question of whether enormous runs across 10,000 or 100,000 GPUs could be done at all. There was the capital question: how to justify billion-dollar and multi-billion-dollar computers before the business existed. There was also the cultural question inside a research organization: if the lab obtained that much compute, why put it into one large bet rather than distribute it across many experiments?

His systems advice was not a formal method. It was to break down the reasons not to scale into their constituent problems and address them one by one. The technical stack, the financing, and the internal research culture all had to be made compatible with the same decision: “we're gonna make a bet on scaling deep learning.”

The human part, he said, requires unusual clarity. People need “a clear goal, a clear plan to get there,” and a clear way to make decisions along the way. Humans, in his view, are not naturally good at reasoning about exponentials. They have trouble imagining that scaling laws, revenue, or organizational complexity can continue exponentially. Getting people to believe that requires repeatedly reasoning from first principles.

ChatGPT was not the planned business, but user behavior made the product legible

Altman described ChatGPT not as the result of OpenAI knowing that chat would be the consumer interface for general models, but as the product that emerged after a period of not knowing what to build around GPT-3.

OpenAI had built GPT-3 and needed revenue because it wanted to scale toward billion-dollar and multi-billion-dollar computers. GPT-3 was “kind of interesting” and “a cool demo,” but the company could not find a product that worked. Altman said the models were expected to improve, but OpenAI wanted a revenue engine sooner. So the company launched the GPT-3 API in the summer of 2020 on the theory that if OpenAI could not find the product, someone else might.

The API initially got little traction. Then, about a month later, it went viral on Twitter after several developers independently got it to do interesting things and posted examples. Usage increased, but Altman emphasized how weak the models were by current standards. “If you go back and use GPT-3 or 3.5,” he said, “you will be astonished at how bad the models were then, relative to the amount of excitement they generated at the time.”

The one business that worked in a significant way on GPT-3 was copywriting, which he described as “not that great and not that exciting.” But OpenAI noticed another behavior: developers who could not make the API work for their business were using their API keys simply to chat with the model. That was the signal. People wanted the interaction even when the commercial use case was unclear.

OpenAI had finished GPT-4 internally and had an intermediate model, GPT-3.5, ready to release. It had also developed a new form of post-training that improved instruction-following and made chat easier. The company built ChatGPT around that chatting behavior. Altman said it was meant as a research demo to persuade other companies to build chat-like products on the API, not as the main product itself.

Then it went “crazy viral.” The key YC lesson he applied was that when something grows rapidly while still being “not very good,” it is probably a hit. For about five days, traffic would spike, fall off, and trigger internal speculation that the hype had passed; the next day it would reach a higher peak. By the fourth or fifth day, Altman said he recognized the pattern.

this is an emergency, this is the good kind of emergency, but we have to build a company and a product all at once.

Sam Altman

That meant two months of scaling, followed by a pragmatic business decision: charge users so compute bills did not exhaust the company. Altman said that stopgap “also turned out just to work.” There was more utility in the system than users had activated on their own. ChatGPT lowered that activation energy.

Codex had a different origin. Altman said that before ChatGPT, OpenAI had planned to go all-in on code. The company believed models could write code and that coding would be valuable. More fundamentally, it saw code as one of the ways models would act on the world: coding would let them control computers, while robotics would let them control the physical world. If a model was smart enough and had the actuators of code and robots, it could “actually get this intelligence to do stuff for you in the world.”

That plan was interrupted by ChatGPT’s unexpected growth. But Altman said Codex “got really good” early in the year of the lecture, with a “real inflection point” at 5.5, where people began doing “incredible, incredible things with it.”

The current training pipeline is real, but Altman expects it to be rewritten

The interviewer described a capabilities pipeline that had become increasingly legible across research groups: pre-training, mid-training, post-training, then reinforcement learning and supervised feedback. Altman called that “definitely the current pipeline,” but not one he expects to remain stable.

He expects “a major rewrite,” though he does not know when it will happen or what form it will take. The current pipeline feels odd to him because it is so sequential, and “doesn't quite feel like the optimal solution.”

When asked what an optimal solution would look like, Altman said that is a research problem for AIs themselves. He framed OpenAI’s near-term research goals in compute-equivalent terms: by “September of this year,” use 500,000 A100-equivalent GPUs “as an AI research intern”; by March 2028, have a “full end-to-end very talented researcher” capable of figuring out completely new architectures.

His expectation is that current pipelines and architectures are likely enough to get “over the line” where AIs can do “incredible, incredible work.” Beyond that, he did not specify an architecture. He described a research process in which AI systems help discover what comes next.

Education has not adapted to the systems it is now teaching students to use

Altman’s comments on education sat next to his discussion of AI research assistants because both concerned institutions built around human cognitive work. If AI systems can be used as research interns and eventually as highly capable researchers, he argued, schools cannot keep teaching and evaluating as if they are in a pre-AGI world.

Altman said education “clearly has to super adapt,” and he is worried that it has not. When ChatGPT launched, he expected roughly one year of cheating and shallow learning before the educational system redesigned itself around AI. Instead, three and a half years later, he said he struggles to point to “any significant systemic change” in education at large.

The risk, in his view, is not that students use AI. It is that schools continue teaching and evaluating as if the tools do not exist, which he thinks will lead to “atrophy” in learning how to think. He expects some old skills to remain worth teaching even when machines perform them better, because they teach meta-skills. He used writing as his own example: he thinks by writing, often writing things he never shows anyone. Others say the same about programming. But much of teaching and evaluation, he argued, should change.

Intelligence may be a utility, but selling the utility requires a concrete use

Altman said he has been studying how new utilities become legible because that is what he thinks AI is becoming. Utilities of this kind, in his framing, are rare: electricity, the internet, water.

Electricity is his working analogy, though he emphasized that it is imperfect. Early electricity companies, he said, did not succeed by selling “electricity” as an abstract substance. Electricity sounded frightening: something entering the home that could kill people. What worked was selling “light at night.” Once people understood the immediate benefit, they could later come to understand that the same system might wash clothes or power other uses.

Altman suspects AI has a similar communications problem. Even if OpenAI is right that intelligence becomes a new utility used by every company, customer, and government, “selling intelligence” may not resonate. The eventual world he described is one in which a person or organization has an OpenAI token subscription plugged into everything, running continuously and doing useful work. But the abstraction is not yet enough.

I don't know what our equivalent of we're selling you light at night is going to be.

Sam Altman

The class had also discussed compute as a utility, including the idea that institutions might pool budgets to procure access. In this exchange, Altman emphasized cheap, abundant intelligence delivered through tokens as the user-facing utility he was trying to make legible, while the interviewer pressed him on how that framing relates to treating compute itself as a utility.

When asked what he would work on as a student building a “one-person frontier lab,” Altman pointed not to training ideas but to inference infrastructure. His reasoning was that many smart people are already working on training and that “we're gonna have incredible models” regardless. The underinvested problem is delivery: how to provide “huge amounts of cheap intelligence” at scale. He said frontier labs will have to become inference companies “to a significant degree.”

Altman rejects the claim that LLMs are a dead end

Asked about Yann LeCun’s view that LLMs are a dead end, Altman answered by separating capabilities. Current models have “far surpassed human intelligence in some ways” and are “wildly worse in others.” They remain much worse than people, he said, at very long-horizon, high-judgment tasks. But he argued they have already demonstrated forms of intelligence critics said they would not.

His example was a model that, the day before the lecture, had disproved a conjecture, one of the Erdős problems, that smart people had worked on for a long time. He said many scientists had recently claimed such a thing would not happen, and then “the model just did it.” His conclusion was that LLMs are clearly capable of discovering new knowledge and doing some intellectual tasks humans cannot.

He did not say world models are irrelevant. He said they are “clearly important,” especially for robotics. But he described betting against LLM scaling at this point as “quite misguided.” He connected the skepticism to the earlier point about exponentials: in his view, the field was held back by scientists who were too certain about what scaling would not produce, while others looked at the graphs and kept going.

Altman also drew a caution from the sociology of disagreement. If someone makes their identity depend on a particular technical claim — that something will work or will not — then empirical results can become hard to accept. He said this is an important reminder “in both directions.”

The biggest fork is democratization versus concentration

When asked for the most likely forks over the next 10 years, Altman put one above the others: whether AI becomes widely democratized or remains concentrated in a few companies.

He sees reasons the default could be concentration. The technology could sit inside a few firms that become “a significant fraction of the wealth on Earth.” He called that outcome “terrible,” despite OpenAI’s being one of the companies that could benefit from it. He also called it unstable, unfair, and an “alignment failure” because a world in which a few companies control the technology would be fragile and would not represent everyone’s values or agency.

His preferred path is to push the technology into the world through a utility model. That, in his account, is the best way to produce a future where “everybody winning” is possible. But he expects resistance under the banner of safety and stability, as well as from people seeking power.

Altman put the probability of the democratic path at roughly 80%, because the world has such a strong interest in that outcome. But he treated the remaining risk as serious and urged students to use their careers to push against concentration.

the risk of keeping this concentrated in a handful of companies, even though we would be one of those companies, is not something we should tolerate.

Sam Altman

A related economic question is ownership. Altman said there is a lot of discussion about universal basic income, universal ownership of company shares, unchanged capitalism, or communism. He said he has become “much less of an even short-term jobs doomer” than before: he remains optimistic that people will find new things to do, and he suggested the short-term disruption may be less severe than he originally thought.

Still, as leverage shifts from labor to capital, he prefers some form of ownership stake to a fixed monthly cash dividend. He cited his experience funding a large UBI study and watching people invest in startups, saying he knows which model better fits human psychology. What he would like to see is something like a citizens’ wealth fund at the national level, and eventually perhaps globally, where people “own a slice of capitalism.”

Compute shortage is already live, and demand may stay ahead of supply

The second major fork Altman identified was compute distribution. He said compute shortages are already visible and could become much worse. If compute becomes “the most important utility that people need,” then a large imbalance between supply, demand, and price raises the question of equitable distribution.

The interviewer described current market stress in terms of H100 and Blackwell pricing, saying that spreads between long-term reservations and spot had been around 5x earlier in the year, and that H100s were difficult to find. Altman did not confirm the exact spread, saying he was not sure it remained that high, but he agreed on the underlying condition: “there's a gigantic compute shortage.”

Why, then, are people not panicking? Altman offered two reasons: people expect large inference gains on existing hardware, and “a tsunami of hardware” is coming. But he added that the demand tsunami may be even larger, and that people “should be freaking out somewhat.”

He compared AI demand to electricity demand: there is no meaningful statement about global electricity demand without specifying price. If price falls by a factor of 10, demand looks very different than if price rises by a factor of 10. AI, in his view, works similarly. If models become sufficiently smart and sufficiently cheap, demand is “kind of uncapped.”
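A constant-elasticity demand curve is one standard way to make the price-dependence point concrete. This is a minimal sketch: Altman gave no demand model, and the elasticity of 1.5 and the `demand` function are hypothetical, chosen only for illustration. With elasticity above 1, cutting price by 10x raises quantity by more than 10x, so total spending rises as well:

```python
def demand(price: float, scale: float = 1.0, elasticity: float = 1.5) -> float:
    """Constant-elasticity demand: Q = scale * price ** (-elasticity).

    The elasticity value is a hypothetical assumption, not a measured
    figure for AI inference.
    """
    if price <= 0:
        raise ValueError("price must be positive")
    return scale * price ** (-elasticity)


if __name__ == "__main__":
    baseline = demand(1.0)
    # Relative prices: 10x more expensive, today's price, 10x cheaper.
    for price in (10.0, 1.0, 0.1):
        q = demand(price)
        relative_demand = q / baseline
        relative_spend = price * q / baseline
        print(f"price x{price:<4} -> demand x{relative_demand:.3g}, spend x{relative_spend:.3g}")
```

With the assumed elasticity, a 10x price cut multiplies demand by about 31.6 and total spend by about 3.16; the real figures for AI are unknown.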

That means the shortage may not be a temporary anomaly. As long as AI keeps improving, he said, there may be “a shortage forever” in the sense that new capability creates new demand. If personal agents become good, people may not want one agent; they may want 10 running continuously, or 100.

For students looking for underleveraged systems work, the scarce problem is not only how to train more capable models. It is how to make intelligence cheap, abundant, and deliverable enough that the utility model can exist.
