I’ve been experimenting with ACE-Step 1.5 lately and wanted to share a short summary of what actually helped me get more controllable and musical results, based on the official tutorial + hands-on testing.
The biggest realization: ACE-Step works best when you treat prompts as structured inputs, not a single sentence (much like prompting LLMs).
1. Separate “Tags” from “Lyrics”
Instead of writing one long prompt, think in two layers:
Tags = global control
Use comma-separated keywords to define:
- genre / vibe (funk, pop, disco)
- tempo (112 bpm, up-tempo)
- instruments (slap bass, drum machine)
- vocal type (male vocals, clean, rhythmic)
- era / production feel (80s style, punchy, dry mix)
Being specific here matters a lot more than being poetic.
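To make the tag layer concrete, here is a minimal sketch of how I keep the categories apart and only join them at the end. The `build_tags` helper and its parameter names are my own invention for illustration; it is plain string handling and does not call anything in ACE-Step itself.

```python
def build_tags(genre, tempo, instruments, vocals, production):
    """Join per-category keyword lists into one comma-separated tag string.

    Hypothetical helper: the category names mirror the list above,
    they are not ACE-Step parameters.
    """
    groups = [genre, tempo, instruments, vocals, production]
    # Flatten the groups, trimming whitespace and skipping empty entries.
    keywords = [kw.strip() for group in groups for kw in group if kw.strip()]
    # Drop duplicates while keeping first-seen order.
    return ", ".join(dict.fromkeys(keywords))


tags = build_tags(
    genre=["funk", "disco"],
    tempo=["112 bpm"],
    instruments=["slap bass", "drum machine"],
    vocals=["male vocals", "clean"],
    production=["80s style", "punchy", "dry mix"],
)
print(tags)
```

Keeping the categories as separate lists makes it easy to swap out just the tempo or just the instruments later, then paste the resulting string into the tags field.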
2. Use structured lyrics
Lyrics aren’t just text — section labels help a ton:
[intro]
[verse]
[chorus]
[bridge]
[outro]
Even very simple lines work better when the structure is clear. It pushes the model toward “song form” instead of a continuous loop.
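A small sketch of what "structured lyrics" looks like as text, assuming you assemble the lyrics field yourself before pasting it in. `build_lyrics` is a hypothetical helper and the lyric lines are made-up placeholders; only the bracketed section labels come from the list above.

```python
def build_lyrics(sections):
    """Render (label, lines) pairs as labelled lyric blocks.

    Each block starts with a bracketed section label such as [verse],
    followed by its lines; blocks are separated by a blank line.
    """
    blocks = []
    for label, lines in sections:
        blocks.append("\n".join([f"[{label}]", *lines]))
    return "\n\n".join(blocks)


lyrics = build_lyrics([
    ("intro", []),  # label only, no sung lines
    ("verse", ["Step in time", "feel the line"]),
    ("chorus", ["Move it, move it", "Move it, move it"]),
    ("outro", []),
])
print(lyrics)
```

The point is only that every section gets its own label, so the song form is visible in the text before any generation happens.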
3. Think rhythm, not prose
Short phrases, repetition, and percussive wording generate more stable results than long sentences. Treat vocals like part of the groove.
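If you want a quick sanity check for this before a run, something like the sketch below flags lyric lines that read more like prose. The `long_lines` helper and the six-word threshold are my own assumptions, not anything from the tutorial; tune the limit to taste.

```python
def long_lines(lyrics, max_words=6):
    """Return lyric lines with more than max_words words.

    Section labels like [verse] and blank lines are skipped.
    The default threshold is an arbitrary starting point.
    """
    flagged = []
    for line in lyrics.splitlines():
        text = line.strip()
        if not text or (text.startswith("[") and text.endswith("]")):
            continue
        if len(text.split()) > max_words:
            flagged.append(text)
    return flagged


draft = """[verse]
Slap that bass
I was walking down the avenue thinking about the night we met
[chorus]
Get down, get down"""

print(long_lines(draft))
```

Anything it flags is a candidate for splitting into two or three shorter phrases.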
4. Iterate with small changes
If something feels off:
- tweak tags first (tempo / mood / instruments)
- then adjust one lyric section
No need to rewrite everything each run.
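One way to stay disciplined about this is to keep each run's prompt as a small record and derive the next run from it by changing a single field. The sketch below uses a hypothetical `Prompt` dataclass of my own (tags plus two lyric sections); it only tracks what changed between runs and does not talk to ACE-Step.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Prompt:
    """Hypothetical record of one run's inputs."""
    tags: str
    verse: str
    chorus: str


def changed_fields(old, new):
    """List the field names whose values differ between two prompts."""
    return [f.name for f in fields(old) if getattr(old, f.name) != getattr(new, f.name)]


base = Prompt(
    tags="funk, 112 bpm, slap bass",
    verse="Slap that bass",
    chorus="Get down, get down",
)
# Run 2: tweak the tags first (here, only the tempo).
run2 = replace(base, tags="funk, 104 bpm, slap bass")
# Run 3: then adjust one lyric section.
run3 = replace(run2, chorus="Get down, stay down")

print(changed_fields(base, run2))
print(changed_fields(run2, run3))
```

If `changed_fields` ever reports more than one name between consecutive runs, it gets hard to tell which change caused the difference you hear.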
5. LoRA + prompt synergy
LoRAs help with style, but prompts still control:
- structure
- groove
- energy
Over-strong LoRA weights can easily push outputs into parody.
Overall, ACE-Step feels less like “text-to-music” and more like music-conditioned generation. Once you start thinking in tags + structure, results get much more predictable.
Curious how others here are prompting ACE-Step — especially for groove-based music.
Resource: https://github.com/ace-step/ACE-Step-1.5
I haven't tried what you suggested, but even setting aside the extra control that more structured prompts might give, I think the musicality of this system is well below what the commercial systems offer. I made the following two songs with it:
- http://rochus-keller.ch/Diverses/Ace-Step-v1.5_demo1.mp3
- http://rochus-keller.ch/Diverses/Ace-Step-v1.5_demo2.mp3
using the same prompt as for, e.g., this song https://rochus-keller.ch/?p=1428, which I made with Suno (though there I first uploaded my own audio, which I didn't try with ACE-Step). It doesn't take much to hear that these results are worlds apart.
Still, I think it's very good that university researchers are on it, and we can expect to have equivalent open-source systems one day.