Suno Bets That Making Songs Can Become a Mass Consumer Medium

Mikey Shulman · Sequoia Capital · Wednesday, May 13, 2026 · 13 min read

Suno founder and CEO Mikey Shulman argues that AI music should not be understood as a cheaper substitute for streaming catalogs, but as a new form of active consumer entertainment. In a conversation with Sequoia’s Sonya Huang, he says Suno’s technical choices — modeling raw sound, prioritizing full songs, and using preference data rather than conventional benchmarks — support a product thesis that making music can be as much the point as listening to it. Shulman also frames partnerships with labels such as Warner as central to building new participatory music formats, not as a concession to incumbents.

Suno’s bet is that AI music is not a cheaper way to fill a streaming catalog. Mikey Shulman argues that it is a new consumer medium: active rather than passive, closer in some ways to gaming, cooking, and coding tools than to Spotify. The technical premise and the product premise are linked. Suno, which Shulman describes as a music company and creative entertainment platform, models music as raw sound rather than encoded music theory because it wants users to make things that existing categories would not have anticipated.

Suno’s central bet is that music should be modeled as sound, not as music theory

Mikey Shulman describes Suno’s technical premise as an attempt to remove inherited musical structure from the model rather than encode it. In his account, the important early decision was not to teach the system that Western music has 12 tones, or that a fixed inventory of instruments exists, but to treat music as raw sound: a sampled waveform, “48,000 times a second,” represented as continuous numbers.
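To make "raw sound" concrete, here is a minimal sketch of audio as a sequence of continuous numbers sampled 48,000 times a second. It is an illustration only, not Suno's representation or pipeline; the helper name `sine_wave`, the 440 Hz test tone, and the amplitude are assumptions chosen for the example.

```python
import math

SAMPLE_RATE = 48_000  # samples per second, the rate quoted above


def sine_wave(freq_hz: float, duration_s: float, amplitude: float = 0.5) -> list[float]:
    """Return a mono waveform as a list of floats in [-amplitude, amplitude]."""
    n_samples = round(SAMPLE_RATE * duration_s)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
        for n in range(n_samples)
    ]


# One second of a 440 Hz tone (hypothetical test signal) is 48,000 real numbers.
wave = sine_wave(440.0, 1.0)
print(len(wave))  # number of samples in one second of audio
print(wave[:3])   # first few samples: plain floats, no notes or instruments
```

Nothing in the list says "note," "key," or "instrument"; any sound at all can be written this way. The cost is length: at this rate, a three-and-a-half-minute mono recording is roughly 10 million numbers.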

That choice made the problem harder. Text has discrete tokens; audio is “unwieldy.” Shulman said Suno’s founders initially believed high-quality music generation was still “a couple of orders of magnitude” away in compute, model size, and capability. The company began by applying similar technologies to understand audio rather than generate music. The surprise was that breakthroughs in audio compression and modeling made music generation possible earlier than their back-of-the-envelope estimates suggested.

The payoff, in Shulman’s telling, is generality. If the model is told that music consists of 12 tones, it can only make those tones. If it is told that there are 200 instruments, it is confined to those sounds. Suno wanted a system that could make the next sound, not merely recombine a sanctioned vocabulary of musical concepts.
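The limitation of a fixed tone vocabulary can be sketched in a few lines. The example below assumes standard 12-tone equal temperament with A4 tuned to 440 Hz; the function `nearest_12_tone` is hypothetical and illustrates a symbolic representation in general, not any system Shulman described.

```python
import math

A4_HZ = 440.0  # common tuning reference; an assumption of this sketch


def nearest_12_tone(freq_hz: float) -> float:
    """Snap a frequency to the nearest pitch of 12-tone equal temperament."""
    semitones = round(12 * math.log2(freq_hz / A4_HZ))
    return A4_HZ * 2 ** (semitones / 12)


# A microtonal pitch 30 cents (0.3 semitones) above A4.
microtone = A4_HZ * 2 ** (0.3 / 12)
snapped = nearest_12_tone(microtone)

print(f"intended: {microtone:.2f} Hz, representable: {snapped:.2f} Hz")
```

A representation that only knows the 12 tones collapses the microtonal pitch back onto A4, so the in-between sound cannot be expressed at all. A raw waveform has no such grid.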

In Western music there are 12 tones. If you tell the model there are 12 tones, it will only ever produce those 12 tones. You will be forever limited.

Mikey Shulman

Sonya Huang pressed on whether learning music from first principles simply rediscovers existing genres and notes. Shulman said the system also produces things “you never would have thought of.” The most common version is genre blending: trap with sitar, country with 808s, and other combinations that “have no business going together.” He also pointed to microtonal music and to outputs that sound neither like familiar genres nor like mere noise, but “totally strange and bizarre and lovely.”

The model still has uneven strengths. Shulman said Suno is “very good at country” and “very good at pop music,” and suggested that more formulaic genres may be easier for the system to handle. But he resisted framing the difference only as good versus bad. In genres where Suno is weaker, he said, the company may not have “raised the floor,” so users encounter more bad outputs. Even there, he argued, the ceiling has risen: if a user is willing to search long enough, “you’ll find amazing stuff.”

Music is not an LLM-style scale problem

Mikey Shulman explicitly rejected the idea that music generation should be understood through the same scaling lens as large language models. The models, he said, are “pretty small,” and that matters both for research and for product experience. Smaller models help Suno return music quickly, which he described as important to user experience.

His reasoning is that language-model progress often has visible benchmarks. Even if people argue about which benchmarks are flawed, the benchmarks give model builders something to climb. In music, by contrast, there are “no right answers” and “no benchmarks.” Two listeners may disagree about whether a song is good; they may even disagree about what “good” should mean. The problem is therefore less about pushing a score upward and more about aligning a creative system with human taste.

That makes preference data central. Shulman said Suno can measure whether one model is preferred over another, but those internal preference gains do not map cleanly to user uptake. A model that is 10% or 15% preferred in testing does not necessarily produce proportionate product growth. Music, he said, is too messy for that kind of clean relationship.
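As a rough illustration of what a pairwise preference measurement can look like, the sketch below computes a win rate and a margin of error from a head-to-head listening test. The vote counts are invented, the function `win_rate` is hypothetical, and the article does not say how Suno actually runs or scores its comparisons.

```python
import math


def win_rate(votes: list[str], candidate: str = "B") -> tuple[float, float]:
    """Return (share of votes for candidate, 95% margin of error)."""
    n = len(votes)
    if n == 0:
        raise ValueError("need at least one vote")
    p = sum(v == candidate for v in votes) / n
    margin = 1.96 * math.sqrt(p * (1 - p) / n)  # normal approximation
    return p, margin


# Hypothetical A/B listening test: 575 of 1,000 listeners pick model B.
votes = ["B"] * 575 + ["A"] * 425
p, margin = win_rate(votes)
print(f"B preferred in {p:.1%} of comparisons, +/- {margin:.1%}")
```

Under one reading of "15% preferred," model B wins 57.5% of comparisons against 42.5% for model A. A number like this says which model listeners pick side by side; it does not, on its own, predict how much more people will use the product.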

Still, he emphasized that preference data is not merely a product metric. It feeds back into research. Suno’s scale of user interaction gives the company data that helps it develop and refine techniques; without that data, he said, the research itself would be harder to do. Huang noted that music preference feedback may be easier to use than text-model feedback because the usual worries about sycophancy apply differently. Shulman agreed, saying that much of Suno’s edge comes from its ability to understand preference, do research on it, and “RL that back” into its models.

Version releases are not clean scientific thresholds. Shulman said Suno has an aggressive research roadmap and already has a sense of what later versions are meant to do. But the cut between one release and the next is partly arbitrary: at some point, the company decides which improvements go into V5.5 and which are held for future models. He framed cadence as a discipline against waiting years to release an all-encompassing “music model to save humanity.”

The product is closer to creative entertainment than to streaming

Suno’s most consequential product claim is not that AI can make songs. It is that making songs can itself be entertainment. Mikey Shulman said that before Suno, nearly everyone was a consumer of music and only a small share of people made it. On Suno, he said, “on any given day, 90% of the users are going to create something.”

90% of Suno daily users create something, according to Shulman

The striking part, he argued, is that users are not primarily making songs in order to export them elsewhere. They are making music because the creative act is enjoyable and fulfilling. He described that as the “big step change”: creation is the entertaining part.

That is why Shulman and Sonya Huang placed Suno closer to gaming, cooking, and coding assistants than to Spotify. Huang suggested the platform feels like “active entertainment.” Shulman agreed, while acknowledging that comparing music to gaming can be taboo inside the music industry because music is treated as art and gaming often is not. But he said music has things to learn from games: how they command attention, pull users in, make people use their brains, and monetize.

The analogy to cooking is more precise. People cook even when they could get a better meal at a restaurant because the making is enjoyable and because it is satisfying to consume what they made. Shulman sees a similar dynamic in coding copilots such as Claude Code: even if a personal project is not meant to become a production service, it can be fun to build and fun to use. He predicted that in 10 or 20 years there will be many more “creative entertainment” products because AI makes it possible for many more people to create in many more domains.

The entertaining part is being creative. It’s not that you are being creative for the sake of bringing the piece of content somewhere else.

Mikey Shulman

This also shapes Shulman’s answer to criticism that AI output is “slop.” He said the term is often used without clear meaning. If the concern is streaming fraud, he said, the fraud is the bad part; the fact that a song was made with AI is, in his words, an implementation detail. He gave the example of making two songs with his five-year-old. Almost no one else on the planet may want to hear them, but they are meaningful to him. If that is called slop, he said, he is not sure he cares.

He also placed the concern in a longer pattern. When laptops made music production accessible to many more people, some feared a flood of low-quality music. Looking back, he argued, it is “obviously a good thing”: more bad music exists, but so does more great music, along with new kinds of music and new kinds of stars. He sees no reason why another expansion in who can make music should be different.

The ceiling matters as much as the floor

Mikey Shulman distinguished between low-stakes personal creation and public artistic success. He said Suno has had users make charting songs, sign record deals, and create tracks that reach audiences outside the platform. He framed those cases as new creators bringing new perspectives that resonate with listeners.

His favorite example was Ayumoni, the stage name of a poet who, according to Shulman, had been writing for about a decade and used Suno to turn that poetry into music. Shulman described the result as an artist finding “an entirely new voice” and an audience for work that had already been deeply personal. For him, the example answers a common objection to AI music by showing that the best music still requires human guidance, because listeners respond not only to the sound but to the messenger.

He also argued that the all-AI versus no-AI distinction will not hold. Professional music, in his view, will often contain “some AI” rather than be wholly generated. Shulman said he believes there are already charting tracks with “little bits of Suno in them,” because professionals can use the tool as one part of a workflow. He compared that to existing production techniques such as digital production and Auto-Tune: over time, a tool becomes part of how music is made rather than a separate category.

For Shulman, Suno is not meant to create a parallel AI-music world. If AI is going to be embedded in many songs and workflows, then dividing music into AI and non-AI categories is, in his framing, both practically awkward and bad for users.

Suno wants partnership with the music industry, not a separate AI music economy

Sonya Huang framed music as an unusually difficult industry for an AI founder to enter and referred to a “settlement, partnership” with Warner. Mikey Shulman used the question to push back against the assumption that Suno is hostile to labels. He said people expect him to declare the record labels “cooked,” but he called that “obviously wrong.” The labels, in his view, are among the most culturally important institutions in the world. They understand music and music culture, and they cultivate stars who resonate with billions of people.

The opportunity Shulman said he sees with Warner is to build products that “could never have existed before,” especially products that let fans interact with favorite artists through music. In Shulman’s framing, this can be positive-sum: artists deepen relationships with fans, fans feel they are engaging with artists musically, and rights holders have a monetizable product.

The digital music experience, Shulman said, has barely changed in 25 years. Streaming has dominated the experience, but he believes music is due for “a new innovation and a new format.” He does not think that means “AI-powered Spotify.” In fact, he called the idea that AI simply makes a better Spotify “obviously wrong.” His ambition is to make music less passive and less backgrounded, not merely more efficiently recommended.

That ambition extends to live music. Shulman said Suno is probably already present in some live contexts, in backing tracks or parts of songs. But the larger aspiration is a main-stage consumer participation experience: a “truly interactive concert” where the audience can participate and make music with the artist. He said he hopes to see that within the next year.

He compared the experience of making a song with hundreds or a thousand people during a demo to something “almost religious,” invoking group chanting and singing. His question was why that kind of collective sonic participation should be confined to religious settings when festivals already gather people who are excited to be together.

The product moat has to be experience, not just the model

Mikey Shulman was blunt that model quality alone may not be a durable moat. He said Google has begun building music models; Suno’s are “way better today,” in his view, but Google can outspend Suno and may catch up on the model side. That makes product, interface, and user experience central.

The constraint is especially severe in consumer products. Shulman argued that average consumers will not tolerate rough edges the way enterprise users sometimes do. They are using the product for fun, not work, and they are likely paying out of their own pockets rather than through an employer. The experience has to feel good.

Suno’s internal framing reflects that. Shulman said one company value is “we’re just a music company,” and that he often does not think of Suno as a technology company. The point is to prevent the company from building technology for its own sake. Technology exists to “delight people.”

Several product decisions followed from that orientation. One was leaving Discord earlier than Shulman expected. Suno had launched as a Discord bot, inspired partly by the ease of testing a creative tool there. Shulman thought the company might remain on Discord for a while. Instead, after Suno released a relatively thin web app at the end of 2023, 90% of traffic moved to the web within five days. He took that as overwhelming evidence that his expectation had been wrong.

Another decision was to focus on songs with lyrics rather than background music. Shulman said a song is a story and captures attention in a way that vocal-less background music does not. It was also much harder, which became a source of differentiation. In hindsight, he said, the important point was not only that Suno could do something difficult; it was that the human voice “touches people in a certain way” and makes the product more delightful.

A third decision was to prioritize full songs over short, high-fidelity clips. Early systems could make roughly 10- or 12-second snippets. Suno optimized instead for three- or three-and-a-half-minute songs, even though that meant accepting worse audio quality for a long time. Shulman said competitors had much crisper audio, and people could hear one second of a Suno song and recognize it as low-quality. But Suno chose storytelling over polish.

The technical embodiment of that product choice was autoregression rather than diffusion. Shulman did not frame that as an emotional attachment to a method. He framed it as a product decision: telling a complete musical story mattered more than generating a short piece of crisp audio.
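For readers unfamiliar with the term, autoregression means producing a sequence one step at a time, with each new step computed from what has already been produced. The toy sketch below shows only that loop: it uses a hand-written two-term rule that happens to generate a sine tone, whereas a real generative model would learn its next-step rule from data. The function name, the coefficients, and the 440 Hz tone are all assumptions for illustration, and nothing here reflects Suno's architecture.

```python
import math


def generate_autoregressive(seed: list[float], coeffs: list[float], n_steps: int) -> list[float]:
    """Extend seed one sample at a time; each new sample depends only on earlier ones."""
    out = list(seed)
    for _ in range(n_steps):
        recent = out[-len(coeffs):][::-1]  # most recent sample first
        out.append(sum(c * x for c, x in zip(coeffs, recent)))
    return out


SAMPLE_RATE = 48_000
omega = 2 * math.pi * 440.0 / SAMPLE_RATE  # phase step per sample for a 440 Hz tone

seed = [0.0, math.sin(omega)]          # the first two samples are given
coeffs = [2 * math.cos(omega), -1.0]   # x[n] = 2*cos(omega)*x[n-1] - x[n-2]
samples = generate_autoregressive(seed, coeffs, n_steps=998)

print(len(samples))  # seed plus generated samples
```

Because every sample is built on the ones before it, the loop can in principle keep going for as long as the piece needs to run, which is the property the product choice leaned on.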

Suno’s next year is about social creation and personal expression

Mikey Shulman said Suno remains early: most people still do not know about it, and the product is “still very crude.” Over the next 12 months, he said, the focus is to make music creation more social and more expressive of the person creating it.

Social creation, as he described it, is not only sharing finished songs. It could mean sharing templates that someone else riffs on and sends back, creating a back-and-forth musical exchange. It could also mean co-creating with a favorite artist, perhaps using old unreleased material. He described both synchronous and asynchronous versions of this kind of collaboration.

The other focus is letting users put more of themselves into the music. Suno’s recent voice feature is central to that. Shulman said hearing yourself in a song makes you more attached to it. Hearing someone else’s real voice in a song they send you can make it resonate more than a generic but polished voice, because the human ear is highly attuned to voices.

He also said Suno has a video product in beta. The distinction he drew was between music videos that heighten a song and tell its story, and background music placed behind other online content. He is more interested in the former because it pulls people deeper into music rather than leaving music as a background layer.

That preference returns to the larger disagreement beneath the discussion. Sonya Huang suggested there are fewer consumer AI founders because it is easier to see how AI automates business processes than to imagine how it changes play and creativity. Shulman agreed that Suno is more motivated by doing something that “wasn’t possible until today” than by speeding up something that already exists. The work, as he presents it, is not to automate music consumption. It is to make music creation ordinary, social, and active.
