Two significant AI music updates landed this week, and neither came from Suno.
Music v2: One track, opera to heavy metal, no breakdown
Music v2 is ElevenLabs' second music model, arriving roughly 10 months after the first. The core pitch is coherence under pressure. According to ElevenLabs, a single track can shift from opera to heavy metal and back, hold together through fast rap, and embed non-musical sound effects, all without the composition coming apart.
Generative audio tends to fall apart exactly when prompts get complicated, so this is the thing worth watching, especially in longer compositions.
Inpainting is now actually useful: select a section, regenerate it, leave everything else untouched. Users can also build songs section by section—intro, verse, chorus—with the model maintaining continuity throughout instead of treating each clip as a standalone generation. Multilingual support has improved too, though ElevenLabs didn't publish specifics.
The model powers three platforms: ElevenMusic for creators, ElevenAPI for developers, and ElevenCreative for brands. It's live on ElevenMusic and ElevenCreative now; API access is early-entry via the sales team.
Stable Audio 3.0: Open weights, on-device, actually longer
The Small models run at 459 million parameters each and need no GPU. (Parameter count is a rough measure of a model's capacity.) Medium has 1.4 billion parameters and generates its 6:20 output in about 1.31 seconds on an H200 GPU. Large, at 2.7 billion, is API-only for organizations with over $1 million in revenue. Per-second generation granularity means you get exactly the track length you asked for, not an approximation.
It's also supported in ComfyUI for local setups.
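To put the Medium figure in perspective, here is a quick back-of-the-envelope calculation. It uses only the two numbers quoted above (a 6:20 track, about 1.31 seconds on an H200); the function names are illustrative and not part of any Stability tooling.

```python
def duration_seconds(clock):
    """Convert an 'M:SS' string such as '6:20' into whole seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def realtime_factor(audio_seconds, generation_seconds):
    """Seconds of audio produced per second of compute."""
    return audio_seconds / generation_seconds


audio = duration_seconds("6:20")        # 380 seconds of audio
factor = realtime_factor(audio, 1.31)   # figures quoted for Medium on an H200
print(f"{audio} s of audio in 1.31 s is about {factor:.0f}x real time")
```

In other words, the quoted numbers work out to roughly 290 seconds of music per second of compute on that hardware.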
The architecture is new: a semantic-acoustic autoencoder Stability calls SAME, designed to hold melodic coherence over longer outputs. LoRA fine-tuning is supported, so artists can adapt the models to their own catalogs. Inpainting is in too—single-segment, multi-segment, and causal continuation to extend a track past its original endpoint.
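One way to picture the three inpainting modes is as a per-second mask over the track: marked seconds get regenerated, everything else is kept. The sketch below is purely conceptual. `build_mask` and its arguments are hypothetical names chosen for illustration, and nothing here reflects how Stable Audio 3.0 actually represents edits internally.

```python
def build_mask(track_seconds, segments, extend_to=None):
    """Return a per-second list: True = regenerate, False = keep original audio.

    segments  -- list of (start, end) pairs in seconds, end exclusive
    extend_to -- optional new total length for causal continuation
    """
    total = track_seconds if extend_to is None else max(track_seconds, extend_to)
    mask = [False] * total
    for start, end in segments:
        if not 0 <= start < end <= track_seconds:
            raise ValueError(f"segment {(start, end)} is outside the track")
        for second in range(start, end):
            mask[second] = True
    # Causal continuation: everything past the original endpoint is new audio.
    for second in range(track_seconds, total):
        mask[second] = True
    return mask


single = build_mask(380, [(150, 155)])              # one five-second fix
multi = build_mask(380, [(30, 40), (150, 155)])     # two separate fixes
longer = build_mask(380, [], extend_to=400)         # extend past the ending
```

Single-segment and multi-segment edits differ only in how many ranges are marked; continuation marks the seconds that did not exist in the original track.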
For context, a LoRA (Low-Rank Adaptation) is like a tiny add-on model that conditions how the full model generates its outputs. If you train a LoRA on blues, the model will produce blues; if you train a LoRA on B.B. King's blues, the model will produce songs that sound like B.B. King.
Inpainting means a model can fix small errors in its own output. If the model hallucinates something at the 2:30 mark, for example, you can select a few seconds of the song and ask the model to change them into whatever you want. It will then generate a piece that fits that timeframe and blends with the rest of the song.
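For readers who want the mechanics, here is a minimal sketch of the general LoRA idea in plain Python: a frozen weight matrix plus a small trainable low-rank correction. This illustrates the technique in general, not Stability's implementation; the toy matrices, the `alpha` scaling convention, and the function names are assumptions made for the example.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha=1.0):
    """Frozen base layer plus a low-rank correction: W x + (alpha / r) * B (A x)."""
    rank = len(A)                      # A has shape (rank, d_in)
    base = matvec(W, x)                # output of the frozen base weights
    update = matvec(B, matvec(A, x))   # low-rank path; B has shape (d_out, rank)
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def trainable_params(d_in, d_out, rank):
    """Return (full fine-tuning count, LoRA count) for one weight matrix."""
    return d_in * d_out, rank * (d_in + d_out)


# Toy 2x2 layer with a rank-1 adapter (values chosen only for illustration).
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weights
A = [[1.0, 1.0]]               # trainable, shape (1, 2)
B = [[0.5], [0.0]]             # trainable, shape (2, 1)
y = lora_forward(W, A, B, [2.0, 3.0])

full, lora = trainable_params(1024, 1024, 8)
```

For a single 1024-by-1024 layer at rank 8, the adapter trains 16,384 values instead of 1,048,576, which is why a LoRA can be trained on a small catalog and shipped as a small file alongside the unchanged base model.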
The target: Suno, the AI music king
If ChatGPT is the king of AI text, Suno is the king of AI music. The company behind the model hit a $2.45 billion valuation in November 2025, crossed $300 million in annual recurring revenue, and has been used by roughly 100 million people.
It generates around 7 million songs per day. Warner Music settled its suit against Suno in November 2025; Sony and UMG are still in federal court.
To avoid these copyright wars, ElevenLabs has licensing deals with Believe, Kobalt, and Merlin. Stability has Warner and Universal. Udio settled with all three majors and is now a walled garden—nothing you generate can leave the platform.
Stable Audio 3.0 Small and Medium are available on Hugging Face now. Large is live via the Stability AI API. Music v2 is free for ElevenMusic users, with commercial tiers through ElevenCreative and ElevenAPI.