Google’s AI Strategy Emphasizes Scale Over Frontier Model Leadership

Demis Hassabis | Hard Fork | Thursday, May 21, 2026 | 7 min read

Kevin Roose and Casey Newton read Google’s I/O announcements as evidence of a company that has regained operational confidence in AI without yet proving frontier leadership. Roose argues Google is leaning on speed, cost, distribution, and infrastructure, putting capable models across search, coding, video, and cloud tools at enormous scale. Newton is more skeptical: fast and cheap, he says, is not the same as best, and many of Google’s most important product claims remain untested until users can rely on them in real workflows.

Google looked stronger on operations than on frontier leadership

Google’s I/O pitch, as Kevin Roose read it, was less about having the undisputed best model than about making AI cheap, fast, multimodal, and broadly distributed. The company looked more confident in scale, speed, and product surface area than in any claim that its model quality had clearly surpassed the field. The unresolved competitive question was whether that operating strength is enough if Google still has not shown the best frontier model.

The company showed agentic search, coding tools, video-editing models and products, an interface prompting users to “Create a custom video,” and an “Omni” model that can take in video, images, and text. Onstage, Google framed the effort as a stack: “Products and Platforms,” “Models and Tooling,” “World-Class Research,” “Security,” and “AI Infrastructure.”

The scale argument was explicit. A keynote chart said Google was processing “3.2Q+” monthly tokens across its surfaces, up from “9.7T” in May 2024 and “~480T” in May 2025, with “7x Y/Y growth.” Roose took that as a sign of the company’s strategy: not necessarily to win every benchmark at the frontier, but to serve capable models quickly and cheaply to billions of users.
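The slide's figures can be sanity-checked with a few lines of arithmetic. The sketch below assumes that "T" means trillion and "Q" means quadrillion, and it treats "~480T" and "3.2Q+" as exact values, so the results are approximate. The dictionary keys and the helper name are illustrative and do not come from the keynote.

```python
# Assumption: "T" = trillion (10**12) and "Q" = quadrillion (10**15).
TRILLION = 10**12
QUADRILLION = 10**15

# Monthly token counts as quoted from the keynote chart.
monthly_tokens = {
    "May 2024": 9.7 * TRILLION,
    "May 2025": 480 * TRILLION,      # slide says "~480T"
    "latest": 3.2 * QUADRILLION,     # slide says "3.2Q+"
}


def growth_multiple(earlier: float, later: float) -> float:
    """Return how many times larger `later` is than `earlier`."""
    return later / earlier


growth_2024_to_2025 = growth_multiple(
    monthly_tokens["May 2024"], monthly_tokens["May 2025"]
)
growth_2025_to_latest = growth_multiple(
    monthly_tokens["May 2025"], monthly_tokens["latest"]
)

print(f"May 2024 -> May 2025: {growth_2024_to_2025:.1f}x")   # 49.5x
print(f"May 2025 -> latest:   {growth_2025_to_latest:.1f}x")  # 6.7x
```

Under these assumptions the most recent jump works out to roughly 6.7x, which is consistent with the slide's rounded "7x Y/Y growth" given that "3.2Q+" is a lower bound.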

3.2Q+ monthly tokens processed across Google surfaces, according to an I/O slide

The chart did not show model quality; it showed throughput. That distinction mattered to the hosts’ reading of Google’s position. A company processing tokens at that scale can make speed and serving cost into strategic advantages, even if the question of frontier leadership remains open.

The model at the center of that point was Gemini 3.5 Flash. Roose said Google described it as four times faster and much cheaper than other leading frontier models. After trying it in Antigravity, Google’s coding assistant, he found it very fast and said he did not quickly run into token limits, unlike with some other coding models he has used. But he did not find it transformative. For companies burning billions of tokens a day, the serving economics may matter. For an individual choosing a daily driver, he said, nothing about it had made him want to switch.

Casey Newton was more skeptical of the framing. “Fast and cheap is what you talk about when it isn’t the best,” he said. He did not argue that Google is uninterested in frontier performance; his point was that if Google had built the best model, that would be the message. Instead, the emphasis this year was speed, cost, and distribution.

The search overhaul is still more promise than proof

Google described its search changes as the biggest in 25 years, but Casey Newton said the demos did not yet establish that scale of change. What was visible, to him, was a search box expanding to accommodate longer queries. One mobile example asked whether wheel throwing or hand building was easier for learning pottery and requested available classes nearby on Tuesday nights or weekends. Another slide promised “Information agents in Search,” available in the summer to Google AI Pro and Ultra subscribers.

Newton’s caution was not that the concept is trivial. Google also said it would generate custom user interfaces based on a user’s query, which could make search feel less like a list of links and more like an adaptive task surface. His point was that many of the most important promises were “coming this summer” or coming first to trusted testers. Productivity tools do not become meaningful until people can use them in their own workflows.

Kevin Roose described the agentic search mode as a more powerful version of Google Alerts: users could ask it to monitor for events such as a new home listing or a relevant baseball score. Newton’s serious question was product-market fit. Which agent, if any, becomes the one with a billion users? Google showed tools in the broad genre of coding and autonomous task execution, but Newton said he did not yet know what he would ask such an agent to do overnight on his behalf.

Roose, by contrast, was interested in the new cloud-based agent setup because it could run work in a virtual machine rather than requiring a laptop to remain open. For him, moving that kind of workflow off a personal machine was one of the more practical improvements shown.

Gemini 3.5 Flash drew early criticism on price and quality

The immediate response from AI power users, as Kevin Roose and Casey Newton described it, was mixed to negative. On-screen social posts criticized Gemini 3.5 Flash from several angles. Ahmad, posting as @TheAhmadOsman, called it “disappointing” and asked why Google kept “fumbling their models.” Ethan Mollick, posting as @emollick, wrote that Gemini 3.5 Flash and Gemini 3.1 Pro were “excellent,” but also said they could not be used for serious purposes, especially enterprise work, because, unlike with Claude or ChatGPT, users could not understand what the model did or how to correct it. Theo of t3.gg wrote that he missed when Flash was “the underrated goat model” and called 3.5 “a useless model” that should not be used for anything “as far as I can tell.”

The sharpest pricing criticism shown on screen came from an AIBattle post, which compared Gemini 3.5 Flash with its Flash predecessor rather than with other frontier models. The post said Gemini 3.5 Flash was three times as expensive as Gemini 3 Flash: input tokens at $1.50 per million versus $0.50, and output tokens at $9.00 per million versus $3.00.

| Model | Input (USD per million tokens) | Output (USD per million tokens) |
| --- | --- | --- |
| Gemini 3.5 Flash | $1.50 | $9.00 |
| Gemini 3 Flash | $0.50 | $3.00 |

An AIBattle post compared Gemini Flash pricing and said 3.5 Flash costs three times as much as Gemini 3 Flash.
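To make the 3x figure concrete, here is a minimal sketch that applies the per-million-token prices quoted in the post to a workload. The workload size (1 billion input tokens and 200 million output tokens) is a made-up example, and the `Pricing` class and its `cost` method are illustrative names, not part of any Google API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD, as quoted in the AIBattle post."""
    input_per_million: float
    output_per_million: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Return the total USD cost for the given token counts."""
        total = (
            input_tokens * self.input_per_million
            + output_tokens * self.output_per_million
        )
        return total / 1_000_000


gemini_3_5_flash = Pricing(input_per_million=1.50, output_per_million=9.00)
gemini_3_flash = Pricing(input_per_million=0.50, output_per_million=3.00)

# Hypothetical workload, chosen only for illustration.
INPUT_TOKENS = 1_000_000_000
OUTPUT_TOKENS = 200_000_000

new_cost = gemini_3_5_flash.cost(INPUT_TOKENS, OUTPUT_TOKENS)
old_cost = gemini_3_flash.cost(INPUT_TOKENS, OUTPUT_TOKENS)

print(f"Gemini 3.5 Flash: ${new_cost:,.2f}")      # $3,300.00
print(f"Gemini 3 Flash:   ${old_cost:,.2f}")      # $1,100.00
print(f"Ratio:            {new_cost / old_cost:.1f}x")  # 3.0x
```

Because both the input and output prices are tripled, the ratio stays at 3x regardless of how a workload splits between input and output tokens.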

Newton said the pricing reaction mattered because, in his view, the model had been sold largely on the strength of its low price. For many ordinary uses, he allowed, Gemini 3.5 Flash may be perfectly adequate. If a typical Gemini user is “trying to pass sixth grade for free,” he said, the model could help. But the first people to test new models publicly are often AI power users, and that crowd appeared disappointed.

Roose added that some people in his feeds were saying the model might not be as good as Gemini 3.1 Pro on some coding evaluations. His own experience aligned with a moderate version of the criticism: fast, useful, not enough of an improvement to change his habits.

The keynote calendar may not match the model calendar

Kevin Roose raised a structural problem for Google: a once-a-year developer conference can make AI development look worse than it is if model readiness does not align with the keynote calendar. He said he had heard “whispers” from people at Google that a more powerful model was in development but may not have been ready for I/O. In that case, the company can appear behind simply because the deadline for the show arrived before the stronger model did.

Casey Newton identified the next test as Gemini 3.5 Pro, expected the following month. He described that as the model he would evaluate more seriously against current frontier-class systems and against his own daily workflows. Flash, for him, is not the class of model he naturally reaches for: he would rather wait longer and pay more for the best answer than optimize for near-instant output from a cheaper model.

Roose’s focus was the operating model: low-latency, high-scale AI that Google can put everywhere. Newton’s focus was whether the tools solve specific, high-value problems for demanding users. Google’s announcement looked stronger under the first standard than under the second.

The tone was confident, but the verdict was a base hit

The atmosphere at I/O was notably receptive to AI. Casey Newton said it was the only recent large gathering he had attended where mentions of AI did not produce widespread booing. The keynote included many AI-generated images and videos, and the crowd did not audibly turn against them. Kevin Roose interpreted Google as largely avoiding discussion of the AI backlash and instead betting that useful products will make objections fade.

Then Demis Hassabis closed with a line that changed the register from product launch to civilizational claim.

“When we look back at this time, I think we will realize that we were standing in the foothills of the singularity.” (Demis Hassabis)

Roose said that line served as a reminder that “none of this is normal,” even amid practical demos of search, coding, and video tools. Newton compared it to Google’s version of a Steve Jobs “one more thing”: “one more thing, the singularity.”

Newton called the event “a base hit”: if Google ships what it showed and the products work as advertised, there is useful material there, but no single product made him feel he had to use it immediately. Roose said that roughly matched his expectations. Google, in his view, is in a better position than a couple of years earlier, when it seemed to be scrambling to get Bard out the door, and it has found its footing. But the unresolved question remains whether it can produce a model that is truly state of the art.
