ElevenLabs Launches Music v2 for Licensed Commercial AI Song Generation

ElevenLabs | Tuesday, May 26, 2026 | 4 min read

ElevenLabs is presenting Music v2 as a licensed-data AI music model built to generate vocal-led tracks from detailed natural-language prompts, not just loops or backing beds. The launch materials argue that the model can produce finished-sounding, one-shot outputs across styles and languages, while adding workflow features such as targeted inpainting, section-by-section composition, and deployment through ElevenMusic, ElevenCreative, and a forthcoming ElevenAPI.

Music v2 is framed around one-shot, vocal-led generation

ElevenLabs introduces Music v2 with a specific on-screen claim: “This track is an unedited, one-shot output generated by our music model.” The launch is therefore not framed only around generating loops, backing beds, or sketches. ElevenLabs presents the model as producing vocal-led music directly from natural-language musical direction.

The source description extends that claim beyond the heard examples. ElevenLabs says Music v2 improves vocals, instrumentation, multilingual generation, and arrangement “across every genre.” It also lists support for mid-track genre transitions, fast rap, dense lyrical delivery, and non-musical sound effects embedded directly within a track. In the video, the most visible evidence is a series of audio-player screens pairing short generated passages with detailed style prompts.

[Image: style prompts shown in the launch video]

Those prompts are not simple genre tags. They combine region, era, instrumentation, vocal color, beat design, and performance technique. The product claim is that Music v2 can respond to that kind of direction in generated music, including vocals.

The prompts specify arrangement, delivery, and vocal character

The four visible prompts show the level of musical instruction ElevenLabs is emphasizing. They move from grime and trap to K-pop, nuevo flamenco, and hard-bop jazz on a trap beat, while also naming production choices and vocal performance traits.

Visible prompt	Direction emphasized
London grime meets US trap, skippy eskibeat claps, booming 808s, dark underground anthemic mix.	Hybrid rap production, percussion feel, 808 low end, underground tone
2020s K-pop chorus, deep 808 sub-bass, smacking trap drums, sugary catchy hook.	Pop-chorus structure, trap drums, sub-bass, hook-driven repetition
Raw nuevo flamenco, two acoustic Spanish guitars with aggressive rasgueado passionate male cante hondo vocal in Spanish.	Spanish-language vocal, flamenco instrumentation, guitar technique, male cante hondo performance
1950s hard-bop jazz combo, smoky female jazz contralto phrasing on a trap beat.	Era-specific jazz ensemble, contralto vocal color, jazz-trap hybrid arrangement

The launch examples pair generated passages with detailed natural-language musical prompts.

The first two prompts both draw on trap production language, but they ask the model to do different musical jobs. The grime-meets-trap prompt emphasizes rhythmic density and weight: “skippy eskibeat claps,” “booming 808s,” and a “dark underground anthemic mix.” The vocal passage that follows is fast and branded around ElevenLabs: “Eleven on my wrist, Eleven on the rise. Eleven on a mic, heavy level intricate.”

The K-pop prompt asks for a different function: a chorus, a sugary hook, and repeated syllabic phrasing. The generated line — “Eleven, ele-leven, turn it up. E-E-Eleven” — is built around repetition rather than lyrical density. Together, those examples show ElevenLabs using prompts to position Music v2 as controllable across both sound palette and vocal role.

Multilingual and cross-style generation are central to the demonstration

The Spanish-language example carries the clearest visible support for ElevenLabs’ multilingual claim. The prompt asks for “raw nuevo flamenco,” “two acoustic Spanish guitars,” “aggressive rasgueado,” and a “passionate male cante hondo vocal in Spanish.” The heard lyric mixes Spanish and English: “Once campanas tocan en mi corazón. Olé. Let the rhythm take you to the sun.”

That prompt is doing several things at once. It specifies instrumentation, technique, vocal character, performance tradition, and language. ElevenLabs is not merely labeling the example as Spanish or flamenco; it is presenting Music v2 as responsive to a cluster of musical constraints.
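One way to picture that cluster of constraints is as separate fields joined into a single prompt string. The sketch below is illustrative only: the `StylePrompt` class and its field names are hypothetical, not part of any ElevenLabs product, and the rendered text paraphrases the on-screen prompt instead of reproducing it exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StylePrompt:
    """Hypothetical container for the kinds of constraints the flamenco prompt names."""
    style: str            # genre or performance tradition
    instrumentation: str  # instruments asked for
    technique: str        # playing technique
    vocal: str            # vocal character
    language: str         # sung language

    def render(self) -> str:
        # Join the non-empty constraints into one comma-separated prompt string.
        parts = [self.style, self.instrumentation, self.technique, self.vocal]
        text = ", ".join(p for p in parts if p)
        if self.language:
            text += f" in {self.language}"
        return text


flamenco = StylePrompt(
    style="Raw nuevo flamenco",
    instrumentation="two acoustic Spanish guitars",
    technique="aggressive rasgueado",
    vocal="passionate male cante hondo vocal",
    language="Spanish",
)
print(flamenco.render())
```

Keeping each constraint in its own field makes it easy to vary one dimension, such as the language or the vocal character, while holding the rest of the direction fixed.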

The fourth prompt pushes the model into a different hybrid: “1950s hard-bop jazz combo, smoky female jazz contralto phrasing on a trap beat.” The lyric is smoother and more melodic: “Eleven keys are turning. Eleven hearts on fire. Every note arriving, lifting us up higher. Eleven.” Here the instruction combines an era-specific jazz reference with a modern beat framework and a specified vocal register. That example supports the same broader message: the model is being introduced through vocal-led arrangements where genre, era, production, and singer characteristics are all part of the prompt.

ElevenLabs ties the model to editing workflows and commercial deployment

The source description also presents Music v2 as more than a prompt-to-song generator. ElevenLabs says it powers ElevenMusic, ElevenCreative, and ElevenAPI, with ElevenAPI described as “coming soon.” It maps those products to artists, developers, and brands.

Two workflow capabilities are highlighted. The first is improved inpainting: a user can select any section of a track and regenerate only that part, while leaving the rest untouched. The second is long-form composition: building a full song section by section.

Those claims matter because they describe Music v2 as part of a production workflow rather than only a launch demo. Inpainting addresses revision: keeping a track while changing a specific passage. Long-form composition addresses structure: assembling a full song across sections. The source description places those capabilities alongside the model’s improvements in vocals, instrumentation, multilingual generation, and arrangement.
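The two workflow ideas can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not ElevenLabs' implementation or API: the `Section` type, the `compose` and `inpaint` helpers, and the `fake_generate` stand-in are all hypothetical, and a track is simplified to a list of named sections when a real tool may let users select arbitrary time ranges.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Section:
    name: str      # e.g. "verse"
    prompt: str    # natural-language direction for this section
    audio_id: str  # placeholder handle standing in for rendered audio


# Hypothetical generator signature: (section name, prompt, take number) -> audio handle.
Generator = Callable[[str, str, int], str]


def fake_generate(name: str, prompt: str, take: int) -> str:
    """Stand-in for a real model call; returns a deterministic handle."""
    return f"{name}#take{take}"


def compose(plan: List[Tuple[str, str]], generate: Generator) -> List[Section]:
    """Section-by-section composition: render each planned section in order."""
    return [Section(name, prompt, generate(name, prompt, 1)) for name, prompt in plan]


def inpaint(song: List[Section], index: int, new_prompt: str,
            generate: Generator, take: int = 2) -> List[Section]:
    """Regenerate only the section at `index`; every other section is reused unchanged."""
    target = song[index]
    redone = replace(target, prompt=new_prompt,
                     audio_id=generate(target.name, new_prompt, take))
    return song[:index] + [redone] + song[index + 1:]


plan = [
    ("intro", "sparse acoustic guitar"),
    ("verse", "male vocal enters"),
    ("chorus", "full band, big hook"),
]
song = compose(plan, fake_generate)
revised = inpaint(song, 1, "female contralto vocal enters", fake_generate)
print([s.audio_id for s in revised])
```

The point of the sketch is the shape of the operation: the revised song shares its intro and chorus with the original, and only the selected section gets a new take.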

ElevenLabs also says Music v2 was trained only on licensed data and is cleared for commercial use. In the launch materials, that claim sits with the product’s intended use cases: artists publishing music, developers integrating generation through an API, and brands using purpose-built creative tools.

The core claim is detailed prompt control over finished-sounding musical outputs

All four heard passages are built around the word “Eleven,” and the final screen points to ElevenLabs and elevenmusic.io. The repeated branding makes the comparison easy to follow: the same name is carried through fast rap, a K-pop-style chorus, a Spanish flamenco-inflected vocal, and jazz-trap fusion.

“This track is an unedited, one-shot output generated by our music model.”

Within the source materials, Music v2 is presented as a model for generating vocal-led music from detailed natural-language direction, then extending that generation into editing and deployment through inpainting, section-by-section composition, ElevenMusic, ElevenCreative, and the forthcoming ElevenAPI. ElevenLabs’ commercial claim is explicit as well: the model is described as trained only on licensed data and cleared for commercial use.
