Modern AI Needs Inference and Incentives, Not AGI Framing

Michael Jordan | Machine Learning Street Talk | Wednesday, May 20, 2026 | 23 min read

Michael I. Jordan argues that modern AI is being framed around the wrong object: an isolated intelligent machine rather than the collective economic systems in which machine-learning components actually operate. In this conversation, the Berkeley statistician and computer scientist says AGI is mostly a PR term, and that the field’s harder problems lie in inference, uncertainty, incentives, markets, and mechanism design. His case is not that recent models are unimpressive, but that prediction and fluent language are only pieces of systems that must be engineered around human institutions.

The wrong object of analysis is the isolated intelligent machine

Michael Jordan does not accept the standard framing in which modern AI is a race to endow a machine with intelligence, autonomy, or eventually “AGI.” He calls AGI “a PR term,” and says the term distorts research agendas, business models, and the expectations of younger technologists. His objection is not that machine learning systems are useless or that recent systems are unimpressive. It is that the dominant language points attention toward the wrong object: a disembodied intelligent agent rather than a social, economic, and statistical system.

Jordan’s starting point is biographical as well as technical. He says he has “never actually thought” of himself as an AI researcher. He was trained as a statistician and cognitive scientist, and came up through the machine-learning tradition rather than through symbolic AI. The older AI program, as he describes it, was associated with John McCarthy-era aspirations and methods such as logical inference, which “didn’t really quite pan out.” Machine learning, by contrast, arose through decision trees, nearest neighbors, logistic regression, hidden Markov models, and related methods developed largely in statistics, operations research, and adjacent literatures. Those methods, he argues, were already producing industrial success long before the recent language-model wave: supply chains, commerce, transportation systems, and other large operational infrastructures used “vast amounts of machine learning” and still do.

The return of the AI label, in Jordan’s account, followed the rise of language data. Once the statistical box no longer merely predicted prices, logistics delays, or commercial quantities, but emitted fluent human language, people concluded that the old AI problem had been solved. If the target is defined narrowly as something like the Turing test, Jordan concedes, “yeah” — but he thinks that misses the larger engineering and social question.

The “collectivist, economic” alternative he proposes begins from the fact that these systems are already collective. Their inputs come from billions of people; their outputs are meant to serve billions of people; their consequences are mediated through platforms, markets, institutions, and incentives. Intelligence, in this view, is not only something inside an individual agent. Humans aggregate opinions, retain abstractions in cultures, coordinate through markets, and act in contexts shaped by other agents who may collaborate, compete, or exploit.

That is why Jordan wants economics in the foundation, not as an afterthought. He does not mean economics as a slogan about markets or capitalism. He means mathematical tools for strategic interaction: game theory, mechanism design, contract theory, incentives, equilibria, and social welfare. “I’m not a critiquer of AI,” he says. “I want to make it right.” To do that, he argues, the field needs formal, actionable mathematical ideas about the systems into which machine-learning components are being inserted.

The role of humans as producers and consumers in these emerging systems should be respected, amplified.

Michael Jordan

“A Collectivist, Economic Perspective on AI” frames the current discourse as excessively “individualist”: focused on endowing equipment with human-like intelligence or autonomy, while treating multi-agent systems and socio-technical interactions as secondary.^† The alternative is to design systems around the social mechanisms that coordinate thought and action. In economics, Jordan notes, algorithms such as pricing do not exist “in the minds of the producer nor the consumer.” They exist at the level of the system.

Fluent language did not make the assistant-on-your-shoulder model good engineering

Michael Jordan’s skepticism about current AI hype is partly a critique of business imagination. He sees the modern language model as an extension of the search engine: useful, powerful, and important, but not a sufficient foundation for the all-purpose assistant vision. Search engines were “major progress for humanity,” he says. But the next step — a secretary on your shoulder, whispering continuously into your life — strikes him as a “dumb business model.” Many people, he predicts, will not want constant interaction with such an entity. They may want a summary at the end of the day; they do not necessarily want a companion hovering over every thought.

That criticism is tied to his larger systems view. Healthcare, transportation, finance, commerce, and supply chains already contain enormous data flows among many agents. Jordan thinks these are better places to look for consequential machine learning: not because they are glamorous, but because they already involve prediction, uncertainty, incentives, cooperation, competition, and money. The question is not whether a chatbot can appear clever. It is what ecosystem a predictive component belongs to, who it interacts with, at what rate, under what constraints, and what value is created.

He repeatedly distinguishes a statistical input-output component from a system. A large neural net may be a sophisticated predictor, but that alone does not make it a system in the relevant sense. The real system includes the surrounding institutions, incentives, human decisions, data owners, downstream users, and regulatory pressures. When the interviewer raises real security and safety risks from autonomous software, Jordan answers through system-level examples: airplanes, autopilots, cars, roads, weather, human intervention, and mixed autonomy. His point is that the safety question has to be posed at the level of the whole arrangement, not as though “a super intelligence behind the wheel of a car” were the relevant design concept.

Jordan rejects the idea that one can get the economic layer “for free” by turning language models into multi-agent systems. He compares that attitude to a chemical engineer in the 1940s or 1950s saying, “we’re just going to throw a lot of stuff together and make it work.” You might make something, he says, but you would also get explosions, non-viable economics, and harm.

His complaint is unusually blunt: other engineering disciplines had builders, but also concepts — Maxwell’s equations, Newton’s equations, thermodynamics, and other bodies of theory that helped orient design. In current AI, he sees many smart coders, large resources, ad hoc architectures, and intuitions, but not enough “deeply intellectual” structure around the social systems being affected. He says it is possible to build these systems because earlier generations created the internet and the data flows now being harvested. It is possible, in his words, to “steal the data from wherever you want to,” run gradient descent on it, and raise huge sums of money from people “who aren’t thinking very deeply.” That is not, in his view, the same as mature engineering.

He is not opposed to building systems whose internals are not fully understood. In fact, he says explicitly that this is not inherently bad. Humans themselves are not explainable in mechanistic detail; no one can fully explain why they picked one Airbnb over another. What matters in many settings is predictable input-output behavior, constraints, and the ability to interact without being harmed. The problem is when the surrounding vocabulary becomes buzzword-heavy — “AI safety,” “understanding,” “AGI” — without the engineering concepts needed to make the interaction work.

For Jordan, useful explanation is actionable, not merely mechanistic. If a bank denies someone a loan using a large model, showing them an internal circuit is not the relevant explanation. A more useful system might show 50 people close to them under the model’s embedding, some of whom received the loan and some of whom did not, and then identify the differences that the applicant can act on. That kind of explanation requires systems built around the predictive model: comparisons, interfaces, and institutional procedures. It is not obtained by staring inside the neural network and hoping to find a human-readable thought.
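A minimal sketch of that neighbor-based explanation, assuming each applicant record carries an embedding vector, a decision, and a few features the applicant can change. The record layout and the names `nearest_neighbors` and `actionable_gaps` are hypothetical illustrations, not part of any system Jordan describes.

```python
import math

def nearest_neighbors(applicant, others, k=50):
    """Return the k records closest to the applicant in embedding space."""
    return sorted(
        others,
        key=lambda rec: math.dist(applicant["embedding"], rec["embedding"]),
    )[:k]

def actionable_gaps(applicant, neighbors, actionable):
    """Average gap between approved neighbors and the applicant, per feature.

    A positive gap means approved neighbors score higher on that feature.
    Returns an empty dict when no neighbor was approved.
    """
    approved = [rec for rec in neighbors if rec["approved"]]
    if not approved:
        return {}
    return {
        name: sum(rec["features"][name] for rec in approved) / len(approved)
        - applicant["features"][name]
        for name in actionable
    }

# Toy data (invented): two-dimensional embeddings, two changeable features.
applicant = {"embedding": (0.0, 0.0),
             "features": {"savings": 2.0, "on_time_rate": 0.80}}
others = [
    {"embedding": (0.1, 0.0), "approved": True,
     "features": {"savings": 5.0, "on_time_rate": 0.95}},
    {"embedding": (0.0, 0.2), "approved": False,
     "features": {"savings": 1.0, "on_time_rate": 0.70}},
    {"embedding": (0.3, 0.1), "approved": True,
     "features": {"savings": 7.0, "on_time_rate": 0.85}},
    {"embedding": (9.0, 9.0), "approved": True,
     "features": {"savings": 50.0, "on_time_rate": 0.99}},
]

neighbors = nearest_neighbors(applicant, others, k=3)
gaps = actionable_gaps(applicant, neighbors, ["savings", "on_time_rate"])
print(gaps)
```

The distant record is excluded from the comparison, so the reported gaps come only from applicants the model treats as similar. Which features count as actionable is a policy choice the surrounding institution would have to make.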

Prediction is useful, but science needs inference with error bars

Michael Jordan’s account of AlphaFold is his clearest example of what modern machine learning can do well and what it still lacks. He says he is “a big admirer” of AlphaFold and does not view it as analogous to a general-purpose LLM. It is targeted at a particular class of problems and performs strongly. But when scientific questions move to the edge of knowledge, high average accuracy is not enough.

The example he discusses concerns whether, in his spoken formulation, “quantum fluctuations” in proteins are associated with phosphorylation — whether proteins are active in the cell. The on-screen chart frames the empirical demonstration in terms of an odds ratio for intrinsic disorder conditioned on PTM: “P(intrinsic disorder | PTM) / P(intrinsic disorder | no PTM).” The statistical structure is a two-by-two test: phosphorylated or not; the relevant fluctuation, disorder, or structural property or not. Using only known proteins with crystal structures, Jordan says, there is not enough data to test the hypothesis with high power. Using roughly 200 million AlphaFold predictions, the hypothesis can be tested with high power, and the null hypothesis of no association can be rejected.
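As a worked illustration, the two-by-two table can be written with cell counts. The symbols a, b, c, and d are notation introduced here, not taken from the talk.

```latex
\begin{array}{l|cc}
 & \text{disordered} & \text{not disordered} \\ \hline
\text{PTM}    & a & b \\
\text{no PTM} & c & d
\end{array}
\qquad
\frac{P(\text{intrinsic disorder} \mid \text{PTM})}
     {P(\text{intrinsic disorder} \mid \text{no PTM})}
\approx \frac{a/(a+b)}{c/(c+d)}
```

Under the null hypothesis of no association this ratio equals 1. The classical odds ratio, ad/(bc), also equals 1 under the same null; the expression quoted from the chart is the ratio of conditional probabilities.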

But that apparent power is misleading if the model is biased for the specific query. Jordan says that in their empirical work, the confidence interval on the relevant statistic from that two-by-two table was “extremely narrow and way far from the truth,” as measured against the gold-standard value. This was not because AlphaFold was globally bad. It may have high overall accuracy. The problem is that for a specific scientific question — especially one involving phenomena underrepresented in the training set, such as hard-to-crystallize proteins with the property Jordan was asking about — the model may be biased and may not report the relevant uncertainty.

200 million: AlphaFold-predicted proteins Jordan cites as useful but insufficient without query-specific uncertainty

Jordan’s methodological answer is prediction-powered inference. The paper defines it as “a framework for performing valid statistical inference when an experimental dataset is supplemented with predictions from a machine-learning system.”^† In Jordan’s description, one adds a small amount of ground-truth data to the large set of predictions and uses it to correct the bias for the estimand of interest. The goal is to retain much of the power of the large prediction set while shifting the confidence interval so that it covers the truth, as in classical statistical inference.

The visual explanation of prediction-powered inference breaks the procedure into three steps: identify a rectifier, construct a confidence set for that rectifier using labeled data, and then build a prediction-powered confidence set by including rectified values. Jordan’s point is not that the predictor becomes unbiased in every respect. It is that inference can be made valid for the particular question being asked, even when the underlying prediction system is biased for that question.
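The three steps can be sketched for the simplest estimand, a population mean, using the normal-approximation form of the interval. This is an illustrative sketch, not the authors' code: the function names, the choice of estimand, and the synthetic data are assumptions made here.

```python
import math
import random
import statistics

def naive_ci(preds, z=1.96):
    """Interval that treats predictions as if they were ground truth."""
    center = statistics.fmean(preds)
    half = z * math.sqrt(statistics.variance(preds) / len(preds))
    return center - half, center + half

def ppi_mean_ci(preds_unlabeled, preds_labeled, labels, z=1.96):
    """Prediction-powered interval for a mean (normal approximation)."""
    n_unlabeled, n_labeled = len(preds_unlabeled), len(labels)
    # Step 1: the rectifier is the mean prediction error on labeled data.
    errors = [f - y for f, y in zip(preds_labeled, labels)]
    rectifier = statistics.fmean(errors)
    # Step 2: uncertainty about the rectifier comes from the small labeled set.
    var_rectifier = statistics.variance(errors) / n_labeled
    # Step 3: rectify the prediction-only estimate and combine both variances.
    var_preds = statistics.variance(preds_unlabeled) / n_unlabeled
    center = statistics.fmean(preds_unlabeled) - rectifier
    half = z * math.sqrt(var_preds + var_rectifier)
    return center - half, center + half

if __name__ == "__main__":
    rng = random.Random(0)
    truth = [rng.gauss(1.0, 1.0) for _ in range(20_500)]
    # Hypothetical predictor: accurate up to noise, but biased upward by 0.5.
    preds = [y + 0.5 + rng.gauss(0.0, 0.2) for y in truth]
    labeled_y, labeled_f, unlabeled_f = truth[:500], preds[:500], preds[500:]
    print("naive:", naive_ci(unlabeled_f))
    print("ppi:  ", ppi_mean_ci(unlabeled_f, labeled_f, labeled_y))
```

In this synthetic setup the naive interval is built around the biased prediction mean (roughly 1.5 by construction), whereas the rectified interval is shifted by the bias estimated from the 500 labeled points and widened to reflect the uncertainty in that estimate.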

The essential distinction is between prediction and inference. A model may be good at generating predictions, but science often asks a particular question that was not known when the model was trained. Error bars must be attached to that question, not merely to some global measure of model performance. Jordan expects this problem to recur in science because scientists usually do not want to restudy the past. They want to ask new questions at the boundary of knowledge — precisely where foundation models are most likely to be weak or biased.

He contrasts this with two inadequate responses to bias. One response says bias will go away with enough data. Another critiques architectures and outputs without offering a scientific method for moving forward. Prediction-powered inference is offered as neither hype nor critique: it is a statistical procedure around a powerful but biased predictor.

Jordan’s stance toward “understanding” follows from this. When asked whether AlphaFold understands, his response is: “Why should AlphaFold understand?” He thinks anthropomorphizing intelligence and understanding is “not necessary, not appropriate, and is a distraction for many, many problems.” AlphaFold predicts; researchers use those predictions, experiments, and statistical tools to derive scientific understanding. The machine need not perform the act of understanding for the system to be useful.

The same argument extends to older industrial machine learning. Around 2000, Jordan says, Amazon was using large quantities of data for supply-chain modeling with methods such as random forests. Those systems made predictions about delays, parts, and logistics across enormous flows of products and customers. No human “understood” the whole supply chain in a cognitive sense. The system mattered because it reduced uncertainty and enabled stockpiling, planning, and optimization. Asking whether the overall system “understands transport and logistics” is, for Jordan, a media question rather than an engineering one.

Drug approval is not just a statistical test; it is an incentive problem

Michael Jordan uses drug discovery and regulation to show why adding economics changes the problem. A pharmaceutical company tests candidate drugs, guided partly by biological knowledge and partly by empirical screening. A regulator then decides whether a drug goes to market. A purely statistical description might focus on false positives and false negatives: approve good drugs, reject bad drugs, control type-one and type-two errors.

But the data do not arrive from a neutral IID source. They come from self-interested pharmaceutical companies. The regulator does not know everything the firms know. The firms may be motivated by helping patients, but also by money. That turns the problem into one of information asymmetry and incentives.

A slide shown during the discussion illustrates Jordan’s point with a simple protocol. Bad drugs have a 5% chance of approval and good drugs have an 80% chance of approval. If running a trial costs $20 million and approval yields $200 million, the expected profit for a bad drug is negative (0.05 × $200 million − $20 million = −$10 million), so firms have no reason to submit bad drugs and every approval is a good drug. But if approval yields $2 billion, the expected profit for a bad drug becomes positive (0.05 × $2 billion − $20 million = $80 million). In that case, many bad drugs may be submitted and some approved, even under the same statistical testing protocol.

| Case | Trial cost | Value if approved | Expected profit for bad drug | System implication |
| --- | --- | --- | --- | --- |
| Small profit | $20 million | $200 million | -$10 million | All approvals are good drugs |
| Large profit | $20 million | $2 billion | $80 million | Many bad drugs are approved |

Jordan’s FDA-testing example shows why identical statistical error rates can produce different outcomes under different incentives.
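The arithmetic behind the slide's numbers can be checked in a few lines. The function names are introduced here for illustration; the probabilities, cost, and approval values are the ones quoted above, expressed in millions of dollars.

```python
def expected_profit(p_approval, value_if_approved, trial_cost):
    """Expected profit of submitting one drug, in millions of dollars."""
    return p_approval * value_if_approved - trial_cost

def break_even_value(p_approval, trial_cost):
    """Approval value at which submitting breaks even in expectation."""
    return trial_cost / p_approval

P_BAD, P_GOOD = 0.05, 0.80   # approval probabilities from the slide
TRIAL_COST = 20              # $20 million per trial

for label, value in [("small profit", 200), ("large profit", 2000)]:
    bad = expected_profit(P_BAD, value, TRIAL_COST)
    good = expected_profit(P_GOOD, value, TRIAL_COST)
    print(f"{label}: bad drug {bad:+.0f}M, good drug {good:+.0f}M")

print("bad-drug break-even value:", break_even_value(P_BAD, TRIAL_COST), "M")
```

Under these numbers a bad drug breaks even at an approval value of $400 million, so the same 5% false-positive rate deters bad submissions below that value and invites them above it.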

The lesson is not that statistical testing is irrelevant. It is that statistical guarantees can fail at the system level when strategic agents choose what data to present. If a drug could make enormous money even if it does not work, firms may have an incentive to throw many candidates at the regulator and profit from false positives. The overall system then fails to control error rates in the way the regulator intended.

Jordan connects this directly to AI deployment. As machine-learning systems spread through society, the relevant data will be local, valuable, and strategically held. Organizations will not simply give away data collected at expense. They will seek value from interactions. The questions become: What is the incentive to send data? What is the incentive to send truthful data? What prevents adversarial behavior? What mechanism aligns private behavior with the system’s stated goal?

That is why he says he cannot imagine a mature deployment of these technologies “without a deeply microeconomic perspective accompanying the machine learning.” Gradient descent on data is only one part of the problem. The rest is about agents, hidden information, incentives, and institutional design.

Data markets require equilibria, not just optimization

Michael Jordan’s three-layer data market is a minimal model meant to expose the structure of real data economies. “On Three-Layer Data Markets” studies a market comprising users as data owners, platforms, and a data buyer.^† Jordan uses the model to explain why a data economy cannot be understood as a single prediction problem.

The layers are users, platforms, and data buyers. A platform provides a service — for example, payments — and receives user data as a byproduct. The platform uses the data to improve its service, but may not make enough money from the service itself. It therefore sells data to third-party buyers, who may use it for market research or behavioral analysis.

The third layer changes the equilibrium. The user who supplied the data has lost some privacy when a third party receives information about them. The platform gains revenue. The buyer gains information. The user may not be able simply to walk away. The system is now under stress.

Jordan’s proposed economic framing allows privacy to become a tunable part of the market. One platform might offer one level of differential privacy, another platform a higher level. Privacy-sensitive users may choose the stronger privacy guarantee. That platform may then attract more users and improve its service. But the added privacy means more noise in the data, making it less valuable to the data buyer. The buyer may pay less. Incentives are partially aligned and partially in conflict.
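The privacy-versus-value tension can be illustrated with the standard Laplace mechanism for releasing a mean. This is a sketch under assumptions made here: a central (platform-side) mechanism, values bounded in [0, 1], and hypothetical function names. It is not the model from the three-layer paper.

```python
import random

def laplace_scale(n_users, epsilon):
    """Noise scale for the mean of n values in [0, 1] under epsilon-DP."""
    sensitivity = 1.0 / n_users  # one user can move the mean by at most 1/n
    return sensitivity / epsilon

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_mean(values, epsilon, rng):
    """Release the mean of `values` (each in [0, 1]) with Laplace noise."""
    true_mean = sum(values) / len(values)
    return true_mean + laplace_noise(laplace_scale(len(values), epsilon), rng)

def average_error(values, epsilon, rng, trials=2000):
    """Mean absolute error of the released statistic over repeated releases."""
    true_mean = sum(values) / len(values)
    total = sum(abs(private_mean(values, epsilon, rng) - true_mean)
                for _ in range(trials))
    return total / trials

rng = random.Random(0)
values = [rng.random() for _ in range(100)]
for epsilon in (0.1, 1.0, 10.0):
    # Smaller epsilon means stronger privacy and a noisier released mean.
    print(epsilon, average_error(values, epsilon, rng))
```

The noise scale is inversely proportional to epsilon, so a platform that tightens privacy by a factor of ten hands the buyer a statistic that is about ten times noisier on average.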

The mathematical problem is therefore not simply optimization. It is an equilibrium problem involving statistical assertions. The model asks how much can be predicted from noisy data, what payments flow, what utilities accrue to users, platforms, and buyers, and how social welfare changes under different policy constraints. Jordan says that in this case one need not merely simulate; one can write equations and calculate Stackelberg equilibria.
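To make "calculate Stackelberg equilibria" concrete, here is a textbook leader-follower quantity game solved by backward induction. It is not the model from the three-layer paper; the linear demand curve, the cost, and all function names are assumptions chosen for illustration.

```python
def follower_best_response(q_leader, a=10.0, cost=2.0):
    """Follower's profit-maximizing quantity given the leader's quantity.

    Follower profit is q * (a - q_leader - q - cost), maximized at
    q = (a - cost - q_leader) / 2, floored at zero.
    """
    return max(0.0, (a - cost - q_leader) / 2)

def leader_profit(q_leader, a=10.0, cost=2.0):
    """Leader's profit when it anticipates the follower's best response."""
    q_follower = follower_best_response(q_leader, a, cost)
    price = a - q_leader - q_follower
    return q_leader * (price - cost)

def stackelberg_equilibrium(a=10.0, cost=2.0, steps=800):
    """Grid-search the leader's commitment; the follower then responds."""
    grid = [(a - cost) * i / steps for i in range(steps + 1)]
    q_leader = max(grid, key=lambda q: leader_profit(q, a, cost))
    return q_leader, follower_best_response(q_leader, a, cost)

q_leader, q_follower = stackelberg_equilibrium()
print(q_leader, q_follower)
```

For this game the closed form is q_leader = (a - cost)/2 and q_follower = (a - cost)/4, which the grid search recovers for the default parameters. The structural point carries over: the leader optimizes against the follower's response function, not against a fixed dataset.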

This is where he sees a major gap between machine learning and economics. Machine-learning people are good at optimization. Economists have tools for fixed points, equilibria, Pareto frontiers, market size effects, and strategic response. The two traditions, in his view, have barely met. Economists historically lacked enough data to inform market design and often relied on rationality assumptions. Machine-learning researchers had data but did the obvious prediction task, such as next-word prediction, without thinking about equilibria. The future, Jordan argues, requires the two perspectives to merge.

He grants one point to the Silicon Valley intuition that data can encode behavioral information. Economists make rationality assumptions they should not have to make; data can sometimes replace those assumptions and incorporate behavioral economics. But doing this outside any economic structure is, in his view, naive. It produces data-rich systems without mechanisms for incentives, privacy, or welfare.

Social knowledge is ephemeral, and no dataset captures the moment

Michael Jordan distinguishes data from social knowledge. Social knowledge is local, fleeting, and contextual: what is available on a street in Copenhagen, at what price, to whom, in what mood, under what immediate constraints. No amount of historical data can fully determine whether a particular person walking down the street will buy a particular product in the next ten seconds.

That does not make system design impossible. It means designers need humility. A useful market does not require a godlike overseer with a universal human value function. It requires institutions that let bottom-up preferences be expressed in the moment, without cheating, while allowing exchange to occur. Some preferences may be retained; some may be ephemeral and disappear. The system should respect that rather than assume all relevant information can be absorbed top-down by a platform.

Jordan’s critique of Silicon Valley is sharp here. He says platforms often forget that their data came from bottom-up human activity in specific contexts. They then imagine that enough data will allow them to design everything top-down. He cites the shift after search engines as an example. Search was valuable because it expanded access. But subsequent technology became “prying”: glasses, cameras in the home, detailed life monitoring, and promises to improve life through pervasive observation. “That equation did not calculate for me,” he says.

Culture, in Jordan’s account, is itself a collective abstraction mechanism. Individuals create abstractions that work for them; useful ones can be communicated, promoted into culture, retained, modified, or discarded. Systems might help with this process, but he does not want to trust them to take over the burden. He wants the field to draw on social science, organizational behavior, economics, law, and related domains that study how groups form and sustain useful knowledge.

His examples of platform economics sharpen the point. Jordan argues that Spotify is “close to perhaps a monopoly” and is not strongly incentivized to pay artists well. He advises UnitedMasters, which he describes as an alternative in which musicians keep their work and are connected to brands and other opportunities. The point is not merely music-industry criticism; it is a model of market design. A better system, in his view, links producers and consumers, gives artists information about audiences, and creates opportunities beyond passive streaming revenue.

He makes a similar economic criticism of YouTube under Google. YouTube, he says, was more than a pointer to websites. It created a producer-consumer relationship in which creators made things people watched. A socially responsible platform, in Jordan’s view, would have recognized that it had created a market and made the economic connection between viewer and creator more valid. Instead, he says, the system routed attention through Google’s advertising model, with only modest incentives flowing back to creators. He calls that “a huge mistake,” and says Facebook made the pattern worse.

The superintelligence narrative is demoralizing and too narrow

Michael Jordan calls recursive self-improvement and imminent superintelligence “very science fiction.” He does not deny that science fiction matters socially, nor that autonomous software can pose real risks. His objection is to the public framing in which young builders are offered two grand narratives: exuberant AGI utopia or catastrophic extinction. He says this is “so demoralizing” for 20- and 25-year-olds who want to build useful technology for their families, communities, and countries.

He is especially critical of senior figures who, in his telling, developed algorithms under the banner of understanding intelligence and now tell younger people that the field is too dangerous or already finished because superintelligence is near. Jordan argues that those earlier systems did not understand intelligence either; they built gradient-descent algorithms and other machine-learning machinery. To tell the next generation that there is nothing left to do, or that building is likely to wipe out humanity, is in his view harmful.

His deeper concern is that this dialogue excludes economic thinking. The public debate becomes a fight between those with money who want to build for its own sake and those warning that the technology will destroy humanity. Missing are questions about labor and capital, institutions, education, mechanisms, local data, human improvement, and positive work at human scale.

Jordan’s own positive vision is not human replacement. He says humans are “shockingly beautiful” in creativity, emotion, love, and social life, and he does not want robots taking over from us. But humans also hurt one another, misunderstand intentions, make bad decisions under uncertainty, and have political institutions that often perform poorly. AI, for him, should help repair weaknesses in human information flow. It should help humans make the good decision they wanted to make but could not make because they lacked information or faced a broken system.

That is why he is more worried about labor-capital relationships than about a machine deciding to take over. He expects vertical systems to improve mathematicians, doctors, teachers, and other professionals, but not simply eliminate them. The central question should be how to design hybrid systems that improve human capability and institutional performance.

Airplanes are his example of productive automation. Modern flight is safer in part because of autopilots, with humans able to intervene as needed. This is not pure autonomy replacing humans; it is a blend of automation and human oversight in a system engineered over time. The same systems thinking applies to cars. “Just putting a super intelligence behind the wheel of a car is dumb,” he says. The important design questions involve traffic, roads, human behavior, autonomy, supervision, and the system as a whole.

Jordan also criticizes the culture around some Silicon Valley builders. He says Ilya Sutskever and others have built impressive systems that many people are using and that are changing how people think. But he is bothered by what he describes as a pattern in which figures such as Elon Musk or Sam Altman come in “taking the cream off the top” of work done by previous generations, without appreciating why that infrastructure was built. He describes a world where more outrageous, physics-inflected, biology-inflected, or neuroscience-inflected language makes someone sound like a guru and attracts more money. To Jordan, that posture can look less like mature engineering than “detachment from reality.”

Mechanism design is the inverse of game theory

Michael Jordan treats game theory not as a metaphor for conflict but as a predictive mathematical discipline. Like Newtonian mechanics, it offers a way to write down a model and derive predictions. In physics, one writes down forces and integrates a differential equation. In game theory, one writes down a game and calculates Nash equilibria, correlated equilibria, Stackelberg equilibria, sequential equilibria, regret measures, or social-welfare constructs. Sometimes those equilibria characterize real behavior; sometimes they do not; the theory improves.

The engineering move is to invert the problem. Science asks: given this setup, what will happen? Engineering asks: given the outcome I want, what design will produce it? In physics, the inverse problem is building a bridge that stands up. In strategic systems, the inverse of game theory is mechanism design.

Mechanism design asks what game to design so that a desired outcome is realized. The outcome might be fair allocation, revenue, market creation, payment to the right party, or a particular social-welfare objective. Contract theory, Jordan’s area within mechanism design, studies asymmetric interactions where one party has private information and another wants to incentivize actions based on that information. Auction theory is another branch, where a mechanism reveals how much bidders value an object and allocates it accordingly.
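A small example of a mechanism that elicits values is the second-price (Vickrey) auction, a standard textbook design chosen here as an illustration; it is not an example Jordan discusses. The sketch checks numerically that bidding one's true value does at least as well as any alternative bid on a grid, holding the other bids fixed. All names are hypothetical.

```python
def second_price_auction(bids):
    """Return (winner, price): top bidder wins and pays the second-highest bid.

    `bids` maps bidder name to bid. Ties go to the earlier bidder in dict order.
    """
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, price

def utility(bidder, value, bids):
    """Value minus price if the bidder wins, zero otherwise."""
    winner, price = second_price_auction(bids)
    return value - price if winner == bidder else 0.0

def truthful_is_best(bidder, value, other_bids, candidate_bids):
    """Check that bidding `value` is weakly better than every candidate bid."""
    truthful = utility(bidder, value, {bidder: value, **other_bids})
    return all(
        truthful >= utility(bidder, value, {bidder: b, **other_bids})
        for b in candidate_bids
    )

other_bids = {"b": 6.0, "c": 3.0}
grid = [x / 2 for x in range(0, 41)]  # candidate bids 0.0, 0.5, ..., 20.0
print(truthful_is_best("a", 8.0, other_bids, grid))
print(truthful_is_best("a", 5.0, other_bids, grid))
```

Because the winner's payment does not depend on the winner's own bid, shading or inflating a bid can only change whether the bidder wins, never the price paid on winning. That is the sense in which the designer chooses the game so that the desired behavior, here truthful reporting, is in each participant's interest.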

The examples Jordan returns to are concrete: the regulator who must induce pharmaceutical firms to submit genuinely promising drugs rather than profit from false positives; the platform that must offer privacy and data value in a three-layer market; the buyer and seller who need a menu of services and prices when one side has private information. The predictive model says what might happen under a given setup. The mechanism-design question is what setup produces the behavior the system actually needs.

Uncertainty must be contextual, sequential, and social

Michael Jordan’s third foundational pillar is uncertainty. He sees uncertainty quantification not as a decorative confidence score attached to a model answer, but as a discipline for decision-making, evidence gathering, and system design.

The discussion of conformal prediction and e-values shows the level at which he wants to reason. Classical p-values are one-shot tail probabilities: under a model, an observed outcome looks improbable, so perhaps the model is wrong. But if a researcher looks at many p-values and reports only the smallest, the result is p-hacking and the conclusions are invalid. E-values, as Jordan describes them, use nonnegative random variables or supermartingales whose expectation under the null is bounded by one. Evidence can accrue over time, and under Ville’s inequality one can control behavior over an entire path. This enables “anytime inference”: peeking, adapting, gathering more data, and stopping under valid guarantees.
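A minimal sketch of anytime-valid testing, assuming the simplest possible setting: testing whether a coin is fair by betting on heads with a running likelihood ratio. The alternative heads probability `q_alt` and the function names are assumptions made here for illustration.

```python
def e_process(flips, q_alt=0.75, p_null=0.5):
    """Running product of likelihood ratios for coin flips (1 = heads).

    Under the null (heads probability p_null) each factor has expectation 1,
    so the running product is a nonnegative martingale starting at 1.
    """
    wealth, path = 1.0, []
    for x in flips:
        wealth *= (q_alt if x else 1 - q_alt) / (p_null if x else 1 - p_null)
        path.append(wealth)
    return path

def first_rejection(path, alpha=0.05):
    """First time the e-process reaches 1/alpha, or None if it never does.

    By Ville's inequality, under the null the probability that the process
    ever reaches 1/alpha is at most alpha, so stopping here is valid.
    """
    for t, wealth in enumerate(path, start=1):
        if wealth >= 1 / alpha:
            return t
    return None

print(first_rejection(e_process([1] * 10)))      # ten heads in a row
print(first_rejection(e_process([1, 0] * 10)))   # alternating flips
```

With ten straight heads each flip multiplies the evidence by 1.5, so the threshold of 20 is first crossed at flip 8 (1.5^7 is about 17.1, 1.5^8 about 25.6). With alternating flips the evidence never exceeds 1.5 and the null is never rejected. The analyst may check after every flip without invalidating the guarantee.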

Jordan connects this to incentives. In statistical contract theory, he says, there is a tight connection between incentive-compatible contracts and e-values: a contract is incentive-aligned if and only if payoff functions are e-values. For him, this exemplifies why uncertainty quantification is not merely “here’s an error bar.” The meaning of uncertainty depends on context — a contract, an evidence-gathering process, a market, or an adaptive decision.

His duck example makes the point more intuitively. Suppose a duck knows that one side of a lake has two parts grain and the other has one part. A purely individual Bayesian maximizing expected value would always go to the richer side. But actual ducks, Jordan says, distribute themselves roughly two-thirds to one side and one-third to the other. In a population, that is not irrational noise; it is the equilibrium. If all ducks went to the same side, they would waste the other resource. The right use of uncertainty depends on the collective context.
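The two-thirds split can be recovered with a short equilibrium calculation, under the simplifying assumption (introduced here) that the ducks on each side share that side's grain equally. Let x be the fraction of N ducks on the richer side.

```latex
\underbrace{\frac{2}{xN}}_{\text{intake per duck, rich side}}
= \underbrace{\frac{1}{(1-x)N}}_{\text{intake per duck, poor side}}
\;\Longrightarrow\; 2(1-x) = x
\;\Longrightarrow\; x = \tfrac{2}{3}
```

At any other split, ducks on the side with lower intake per duck would gain by switching, so x = 2/3 is the only allocation at which no duck can improve by moving.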

He then distinguishes other forms of uncertainty. Information asymmetry is not sampling noise: another person may know things you do not, have expertise you lack, or strategically withhold information. Provenance is different again: a medical dataset collected ten years ago should affect confidence differently from fresh data, and that age should be tagged as metadata and quantitatively incorporated. Classical statistics can sometimes discuss such issues, especially in Bayesian terms, but current LLMs generally do not.

When a language model is asked how confident it is, Jordan says, it is largely mimicking patterns in text. Somewhere on the internet a person answered “I’m very sure” in a similar situation, and the model reproduces that style. That is not reasoning under uncertainty. It lacks the social context, provenance awareness, sampling model, experimental-design logic, and incentive structure that humans often combine informally.

Markets, for Jordan, are also uncertainty-reduction devices. A pizza restaurant does not need to forage for tomatoes every day because a market stabilizes supply. That lowers uncertainty enough for the restaurant owner to build a business on top. Markets do not reduce uncertainty by running a textbook optimal experiment. They do it through incentives, exploration, exploitation, and distributed activity.

The new core is computational, inferential, and economic thinking

Michael Jordan’s proposed educational and intellectual foundation is a triangle: computer science, statistics, and economics. The triangle appears under the title “Three Foundational Disciplines,” with statistics, economics, and computer science at the vertices, and econometrics, machine learning, and algorithmic game theory along the connecting edges. Jordan frames these less as departments than as thinking styles.

Computer science contributes computational thinking: modularity, abstraction, APIs, algorithms, and systems. Statistics contributes inferential thinking: uncertainty, data collection, error, hypothesis testing, confidence, experimental design, and prediction under limited information. Economics contributes strategic and institutional thinking: incentives, equilibria, contracts, markets, welfare, and asymmetric information.

He thinks modern LLMs largely come from one side of the triangle: computation, optimization, and scale. That is enough to produce impressive predictors, but not enough to supply context around them. Economic thinking explains the strategic systems into which they enter. Statistical thinking explains how to make claims, quantify uncertainty, and decide what data to gather next.

Jordan calls this triangle a kind of liberal arts for the era. He acknowledges that humanities colleagues might disagree with that framing, but his claim is that the core intellectual issues of the moment involve data, compute, incentives, and uncertainty in society. To handle them responsibly, the next generation needs more than coding skill and scale. It needs the ability to design mechanisms, reason about error, and build systems in which humans remain producers, consumers, decision-makers, and beneficiaries.
