minimax/music-2.5

Generate full-length songs with vocals, lyrics, and rich instrumentation from a text prompt

Music 2.5

Music 2.5 is MiniMax’s latest music generation model. Give it lyrics and a style description, and it generates a full-length song with vocals and instrumentation.

What’s new in 2.5

Music 2.5 improves on previous versions in four areas:

  • Better vocals — more natural-sounding singing with realistic timbre, breathing, and pitch transitions
  • Better instrumentation — expanded sound library including orchestral and traditional instruments, with cleaner separation between vocals and accompaniment
  • Precise structure control — 14+ section tags let you control exactly how the song is arranged
  • Style-aware mixing — the model automatically adjusts mixing characteristics based on genre (rock distortion, jazz warmth, electronic transients, etc.)

Inputs

lyrics (required): The lyrics for your song, 1–3,500 characters. You can use structure tags to control the arrangement:

[Intro], [Verse], [Pre Chorus], [Chorus], [Hook], [Drop], [Bridge], [Solo], [Build Up], [Inst], [Interlude], [Break], [Transition], [Outro]

Use \n to separate lines and \n\n to add pauses between sections.

prompt (optional): A description of the music style, mood, and scenario, up to 2,000 characters. For example: Indie folk, melancholic, introspective, longing, solitary walk, coffee shop.

sample_rate (optional): Audio sample rate in Hz. Options: 16000, 24000, 32000, 44100 (default).

bitrate (optional): Audio bitrate in bits per second. Options: 32000, 64000, 128000, 256000 (default).

audio_format (optional): Output format. Options: mp3 (default), wav, pcm.
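The limits listed for these inputs can be checked locally before a request is sent. The sketch below is illustrative only: `build_input` is a hypothetical helper, not part of any MiniMax or Replicate library, and it assumes that Python's `len()` counts characters the same way the service does, which this page does not confirm.

```python
# Allowed values as listed in the Inputs section.
SAMPLE_RATES = (16000, 24000, 32000, 44100)
BITRATES = (32000, 64000, 128000, 256000)
AUDIO_FORMATS = ("mp3", "wav", "pcm")


def build_input(lyrics, prompt=None, sample_rate=44100,
                bitrate=256000, audio_format="mp3"):
    """Hypothetical helper: validate the values and return an input dict."""
    # Assumption: len() matches how the service counts characters.
    if not 1 <= len(lyrics) <= 3500:
        raise ValueError("lyrics must be 1 to 3,500 characters")
    if prompt is not None and len(prompt) > 2000:
        raise ValueError("prompt must be at most 2,000 characters")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"sample_rate must be one of {SAMPLE_RATES}")
    if bitrate not in BITRATES:
        raise ValueError(f"bitrate must be one of {BITRATES}")
    if audio_format not in AUDIO_FORMATS:
        raise ValueError(f"audio_format must be one of {AUDIO_FORMATS}")

    model_input = {
        "lyrics": lyrics,
        "sample_rate": sample_rate,
        "bitrate": bitrate,
        "audio_format": audio_format,
    }
    if prompt:  # the prompt is optional, so leave it out when empty
        model_input["prompt"] = prompt
    return model_input


example = build_input("[Verse]\nThe city lights are fading out",
                      prompt="Indie folk, melancholic")
```

Raising early on an unsupported value (for instance a sample rate of 48000) keeps the mistake on the client side instead of surfacing as a failed generation.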

Tips

  • Structure tags make a big difference. A song with [Verse], [Chorus], [Bridge], and [Outro] sections will sound much more like a real song than a wall of text.
  • The prompt is optional but helps steer the style. Be specific about genre, mood, tempo, and vocal style.
  • Lyrics can be up to 3,500 characters — enough for a full multi-verse song with choruses and bridges.

Prompt guide

The prompt field steers the overall sound of your track. Think of it as giving creative direction to a producer. Here’s how to get the most out of it.

Prompt structure

A good prompt follows this general pattern:

[Genre], [Mood/Emotion], [Vocal description], [Tempo], [Key instruments], [Era/Style reference], [Production style]

You don’t need every element every time — pick the ones that matter for your track.
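One way to keep prompts consistent across tracks is to assemble them from named parts in the order of the pattern above. This is a small sketch under stated assumptions: `build_prompt` and its parameter names are hypothetical, and joining elements with commas simply mirrors the example prompts on this page rather than any documented requirement.

```python
def build_prompt(genre=None, mood=None, vocals=None, tempo=None,
                 instruments=None, era=None, production=None):
    """Hypothetical helper: join the supplied elements in pattern order."""
    parts = [genre, mood, vocals, tempo, instruments, era, production]
    # Skip elements that were not supplied; not every prompt needs all seven.
    prompt = ", ".join(p.strip() for p in parts if p and p.strip())
    if len(prompt) > 2000:
        raise ValueError("prompt must be at most 2,000 characters")
    return prompt


blues = build_prompt(
    genre="Soulful Blues",
    mood="melancholy",
    vocals="male vocals",
    tempo="slow tempo",
    instruments="electric guitar",
)
# blues == "Soulful Blues, melancholy, male vocals, slow tempo, electric guitar"
```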

What to include

| Element | What it does | Examples |
| --- | --- | --- |
| Genre | Sets the foundational sound | Pop, Indie folk, Jazz, Blues, EDM, Hip-hop, Rock, Classical, Country, R&B |
| Mood / Emotion | Shapes the emotional tone | Melancholic, uplifting, aggressive, dreamy, hopeful, introspective, confident |
| Vocal style | Guides the singer’s delivery | Male vocals, female vocals, breathy, powerful, soulful, clear, operatic, raspy |
| Tempo | Controls the pace | Slow tempo, 80 BPM, driving 125 BPM, relaxed pace, uptempo |
| Instruments | Requests specific sounds | Acoustic guitar, piano, synth bass, 808 drums, orchestral strings, brass section |
| Era / Style reference | Evokes a specific sound | 1980s Minneapolis sound, vintage vinyl texture, classic Motown, 90s grunge |
| Production / Mixing | Shapes the sonic character | Lo-fi, warm reverb, wide soundstage, intimate studio feel, distorted, crisp |

Example prompts

Indie folk: Indie folk, melancholic, introspective, longing, solitary walk, coffee shop

Blues: Soulful Blues, Rainy Night, Melancholy, Male Vocals, Slow Tempo, Electric Guitar

Electronic / Dance: Pop-Dance/Progressive House, uplifting, anthemic, 125 BPM, four-on-the-floor kick, synth bass, atmospheric pads, processed vocal chops

R&B / Pop: Contemporary R&B/Pop with Trap influences, confident, assertive, bright female vocal, Auto-Tune, 808 bassline, 80 BPM, atmospheric synth pads

Jazz: Vocal Jazz and Swing, playful, flirtatious, relaxed 82 BPM, walking upright bass, brushed drums, piano chords, muted trumpet fills

Lo-fi: Lo-fi hip-hop, chill, study vibes, vinyl texture, warm midrange

Rock ballad: Pop-Rock ballad, hopeful, resilient, 77 BPM, piano intro building to full rock band, layered electric guitars, sweeping strings

Country folk: Classic country folk, reflective, melancholic, 107 BPM, male warm baritone, fingerpicked acoustic guitar, walking upright bass

Disco-funk: Disco-Funk, joyful, energetic, 118 BPM, four-on-the-floor disco beat, melodic funk bassline, rhythmic electric guitar, synth strings, male tenor with soulful grit

Prompt tips

  • Be specific about genre. “1980s Minneapolis sound” produces different synth textures than just “1980s synth-pop” — the model picks up on era-specific references.
  • Include BPM when tempo matters. The model responds well to specific tempos (e.g., “80 BPM” vs. just “slow”).
  • Describe vocal characteristics. Gender, timbre, delivery style, and effects all help: “bright, clear female vocal with a slightly sassy edge” is much more useful than “female singer.”
  • Name specific instruments. Instead of “guitar,” say “fingerpicked acoustic guitar” or “distorted electric guitar with heavy riffs.”
  • Mention production qualities. Terms like “wide soundstage,” “vinyl texture,” “warm reverb,” and “intimate studio feel” shape the final mix.

Structure tags

Structure tags let you design the emotional arc and arrangement of your song like a professional arranger.

All tags

| Tag | Purpose | When to use it |
| --- | --- | --- |
| [Intro] | Song opening | Setting mood before vocals kick in |
| [Verse] | Story / narrative sections | Main lyrical content |
| [Pre Chorus] | Build-up before chorus | Escalating tension |
| [Chorus] | Hook / memorable section | The repeating main idea |
| [Post Chorus] | After-hook section | Extended hook variation |
| [Hook] | Catchy standalone phrase | A memorable moment |
| [Drop] | Energy release (EDM) | After a build-up, the beat drops |
| [Bridge] | Contrast section | Breaking repetition, new perspective |
| [Solo] | Instrumental spotlight | Showcasing a specific instrument |
| [Inst] | Instrumental section | Music without vocals |
| [Build Up] | Intensity increase | Leading to a drop or climax |
| [Interlude] | Instrumental break | Breathing space between sections |
| [Break] | Rhythmic pause | Dynamic contrast |
| [Transition] | Section connector | Smooth flow between parts |
| [Outro] | Song ending | Graceful exit, fade-out |
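A misspelled tag is easy to miss in long lyrics, so it can help to scan for bracketed tags that are not in the table above. The sketch below is a hypothetical check, not part of the model: the name `unknown_tags` is made up for illustration, the match is case-sensitive by assumption, and this page does not say how the model treats tags it does not recognize.

```python
import re

# Tag names copied from the table above.
KNOWN_TAGS = {
    "Intro", "Verse", "Pre Chorus", "Chorus", "Post Chorus", "Hook",
    "Drop", "Bridge", "Solo", "Inst", "Build Up", "Interlude",
    "Break", "Transition", "Outro",
}

# Matches text in square brackets that stays on a single line.
TAG_PATTERN = re.compile(r"\[([^\[\]\n]+)\]")


def unknown_tags(lyrics):
    """Return bracketed tags that are not in KNOWN_TAGS, in order of appearance."""
    return [
        f"[{name}]"
        for name in TAG_PATTERN.findall(lyrics)
        if name.strip() not in KNOWN_TAGS
    ]


draft = "[Verse]\nThe city lights are fading out\n\n[Refrain]\nLike the northern lights"
suspicious = unknown_tags(draft)  # ["[Refrain]"]
```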

Example: Full song layout

[Intro]
(Soft piano, building slowly)
Oh, here we go again...

[Verse]
The city lights are fading out
I'm walking through the empty streets
Every shadow tells a story
Of the ones I'll never meet

[Pre Chorus]
And I can feel it rising
Something breaking through the night

[Chorus]
We're burning like the northern lights
Dancing on the edge of time
Nothing lasts but nothing dies
We're burning like the northern lights

[Verse]
The radio plays our favorite song
A melody from years ago
I turn it up and close my eyes
And let the memories overflow

[Bridge]
Maybe we were never meant to stay
Maybe that's what made it beautiful

[Chorus]
We're burning like the northern lights
Dancing on the edge of time
Nothing lasts but nothing dies
We're burning like the northern lights

[Solo]
(Electric guitar solo, soaring and emotional)

[Outro]
(Fading out)
Like the northern lights...
Like the northern lights...

Formatting tips

  • Keep each lyric section to 2–4 lines for cleaner melodies.
  • Parenthetical text like (Ooh, yeah) or (Guitar solo - slow, mournful) works for backing vocals, ad-libs, and performance directions.
  • Write your full lyrics in a text editor first, then paste them in.
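If lyrics are kept as structured data rather than one long string, the layout and the line-count tip above can be applied mechanically. This is a sketch only: `assemble_lyrics` and `long_sections` are hypothetical helpers, and treating a line wrapped entirely in parentheses as a direction rather than a sung line is an assumption made here for counting purposes.

```python
def assemble_lyrics(sections):
    """Join (tag, lines) pairs: one newline between lines, a blank line between sections."""
    blocks = []
    for tag, lines in sections:
        blocks.append("\n".join([f"[{tag}]", *lines]))
    return "\n\n".join(blocks)


def long_sections(sections, max_lines=4):
    """Return the tags of sections with more than max_lines sung lines."""
    flagged = []
    for tag, lines in sections:
        # Assumption: fully parenthesized lines are directions, not sung lines.
        sung = [ln for ln in lines
                if not (ln.startswith("(") and ln.endswith(")"))]
        if len(sung) > max_lines:
            flagged.append(tag)
    return flagged


song = [
    ("Intro", ["(Soft piano, building slowly)", "Oh, here we go again..."]),
    ("Verse", ["The city lights are fading out",
               "I'm walking through the empty streets"]),
    ("Chorus", ["We're burning like the northern lights",
                "Dancing on the edge of time"]),
]
lyrics = assemble_lyrics(song)
too_long = long_sections(song)
```

Keeping sections as a list also makes it simple to repeat a chorus by appending the same tuple twice.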

Vocal control

Music 2.5 gives you fine-grained control over vocals through both the prompt and the lyrics.

In the prompt, describe the voice you want:

  • Gender and range: “male vocalist,” “female soprano,” “male baritone”
  • Timbre: “warm,” “bright,” “breathy,” “raspy,” “clear,” “rich”
  • Delivery style: “confident,” “earnest,” “playful,” “soulful,” “theatrical”
  • Effects: “Auto-Tune,” “moderate reverb,” “classic reverb”
  • Backing vocals: “layered harmonies in choruses,” “call-and-response,” “backing vocal oohs and aahs”
  • Duets: “conversational duet between a male vocalist with a deep, gravelly voice, and a female vocalist with a powerful, clear timbre”

In the lyrics, use parenthetical directions:

  • (whispered), (belted) — vocal delivery cues
  • (Hear it on the roof), (Fallin' on me) — backing vocals and ad-libs
  • (Guitar solo), (Piano and strings building) — instrumental directions

Vocal emotion evolves across sections naturally. Tag your sections thoughtfully and the model will shift from intimate verses to powerful choruses.

Instrument control

The model has a library of 100+ instruments. Name them specifically for best results:

  • Strings: acoustic guitar, electric guitar (clean/distorted), bass guitar, upright bass, violin, cello, orchestral strings
  • Keys: piano, electric piano, synth pads, organ, harpsichord
  • Brass & Woodwinds: trumpet, muted trumpet, trombone, saxophone, flute, clarinet, brass section
  • Drums & Percussion: drum kit, brushed drums, electronic drums, 808 bass, hi-hats, claps
  • Electronic: synth bass, lead synths, atmospheric pads, arpeggiated synths, risers, sweeps

You can also call for instrument moments directly in the lyrics:

[Solo]
(Guitar solo - slow, mournful, bluesy)

[Inst]
(Instrumental break - piano and strings building intensity)

Style-aware mixing

Music 2.5 automatically adapts its mix based on genre — you don’t need to spell out mixing details for common styles:

  • Rock → distortion, power, dynamic range
  • Jazz → spatial depth, warm character, instrument separation
  • Lo-fi → vinyl grain texture, warm midrange, lo-fi compression
  • 1980s synth-pop → period-appropriate synth textures, era-specific reverb
  • Electronic/EDM → crisp transients, wide stereo image
  • Classical → concert hall reverb, natural dynamics, orchestral balance

For more control, add production terms to your prompt: “wide soundstage,” “intimate studio feel,” “vintage vinyl warmth,” “crisp modern production.”

Good to know

  • Max song length: Up to ~5 minutes per generation. Most songs land between 2:30 and 4:30.
  • Language support: English and Mandarin Chinese have the strongest support. Other languages work but with less consistent pronunciation.
  • Each generation is unique. The same prompt and lyrics produce different arrangements each time — vocal delivery stays consistent, but instrumental choices vary.
  • For instrumental tracks, use [Inst] tags and parenthetical instrument directions instead of lyrics.
  • Higher quality settings (44100 sample rate, 256000 bitrate) give the best audio quality. Use wav format for production work.
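Putting the quality settings above into a request might look like the sketch below. It assumes the `replicate` Python package is installed and a `REPLICATE_API_TOKEN` environment variable is set. The model reference is taken from the title of this page and the input keys from the Inputs section. `high_quality_input` and `generate` are hypothetical helper names, and the type of value returned by `replicate.run` depends on the client version, so it is returned here without further handling.

```python
def high_quality_input(lyrics, prompt=None):
    """Hypothetical helper: input dict using the highest listed quality settings."""
    model_input = {
        "lyrics": lyrics,
        "sample_rate": 44100,   # highest listed sample rate
        "bitrate": 256000,      # highest listed bitrate
        "audio_format": "wav",  # suggested above for production work
    }
    if prompt:
        model_input["prompt"] = prompt
    return model_input


def generate(model_input):
    """Send the request through the Replicate client (needs network access and a token)."""
    import replicate  # third-party package, imported lazily so the helper above works without it
    return replicate.run("minimax/music-2.5", input=model_input)


settings = high_quality_input(
    "[Verse]\nThe city lights are fading out",
    prompt="Indie folk, melancholic, introspective",
)
# output = generate(settings)  # uncomment to run against the hosted model
```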

Privacy policy

Data from this model is sent from Replicate to MiniMax.

Check their privacy policy for details:

https://www.minimax.io/platform/protocol/privacy-policy

Terms of service

https://www.minimax.io/platform/protocol/terms-of-service
