https://stability-ai.github.io/stable-audio-demo/

stable-audio-demo

arXiv: Stable Audio's paper

stable-audio-tools: code to reproduce Stable Audio

stable-audio-metrics: code to evaluate Stable Audio

Our model can generate variable-length and long-form stereo music at
44.1kHz:

Generated Stereo Music   Prompt
[audio]                  Berlin techno, rave, drum machine, kick, ARP
                         synthesizer, dark, moody, hypnotic, evolving, 135
                         BPM. Loop.
[audio]                  Uplifting acoustic loop. 120 BPM.
[audio]                  Disco, Driving Drum Machine, Synthesizer, Bass,
                         Piano, Guitars, Instrumental, Clubby, Euphoric,
                         Chicago, New York, 115 BPM.
[audio]                  Calm meditation music to play in a spa lobby.
[audio]                  Drum solo.

Unlike previous state-of-the-art models, ours can generate stereo
sound effects at 44.1kHz:

Generated Stereo Sounds  Prompt
[audio]                  Door slam. High-quality, stereo.
[audio]                  Sports car passing by. High-quality, stereo.
[audio]                  Motorbike passing by. High-quality, stereo.
[audio]                  Fireworks. High-quality, stereo.
[audio]                  Reverberant footsteps inside a large rocky
                         cave. High-quality, stereo.

Note that all the examples on this website were generated with a
single model, which can generate both variable-length music and sound
effects in 44.1kHz stereo. We append "high-quality, stereo" to our
sound effects prompts because it generally helps.
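
The prompt convention described above can be sketched in a few lines of Python. This is only an illustration: the helper name `with_stereo_suffix` and the exact punctuation handling are assumptions, not part of stable-audio-tools. The suffix text follows the sound effects prompts listed above.

```python
# Suffix as it appears in the sound effects prompts on this page.
SUFFIX = "High-quality, stereo."


def with_stereo_suffix(prompt: str, suffix: str = SUFFIX) -> str:
    """Append the quality suffix to a sound effects prompt (hypothetical helper)."""
    text = prompt.strip()
    # Do not append the suffix twice if the prompt already ends with it.
    if text.lower().endswith(suffix.lower()):
        return text
    # End the prompt with a period before adding the suffix.
    if not text.endswith("."):
        text += "."
    return f"{text} {suffix}"


if __name__ == "__main__":
    print(with_stereo_suffix("Door slam"))
```

The resulting string would then be passed as the text prompt to whichever generation interface is in use.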

Long-form stereo music: comparison with the state of the art using
MusicCaps prompts

Prompt: This song contains someone strumming a melody on a mandolin
while more people are whistling along. Then a mandolin, an e-bass and
an acoustic guitar are playing a short melody in a lower key before
breaking into the next part along with flutes and percussions. This
song may be played outside by musicians performing.

   Our Model      MusicGen-large    MusicGen-stereo      AudioLDM2
    (stereo,       (mono, 32kHz)    (stereo, 32kHz)    (mono, 48kHz)
    44.1kHz)
    [audio]          [audio]            [audio]           [audio]

Prompt: The commercial music features a groovy piano melody played
over snare rolls in the first half of the loop. Right after, there is
a drop that consists of a punchy "4 on the floor" kick pattern,
shimmering hi hats, claps, groovy piano and wide synth lead melody.
It sounds happy, fun, euphoric and exciting.

   Our Model      MusicGen-large    MusicGen-stereo      AudioLDM2
    (stereo,       (mono, 32kHz)    (stereo, 32kHz)    (mono, 48kHz)
    44.1kHz)
    [audio]          [audio]            [audio]           [audio]

These prompts and audio samples were used in the qualitative study
reported in our paper.

Sound effects: comparison with the state of the art using AudioCaps
prompts

Prompt: Clicking and sputtering then eventual revving of an idling
engine.

      Our Model           AudioGen-medium           AudioLDM2
  (stereo, 44.1kHz)        (mono, 32kHz)          (mono, 48kHz)
       [audio]                [audio]                [audio]

Prompt: Birds chirping loudly.

      Our Model           AudioGen-medium           AudioLDM2
  (stereo, 44.1kHz)        (mono, 32kHz)          (mono, 48kHz)
       [audio]                [audio]                [audio]

These prompts and audio samples were used in the qualitative study
reported in our paper. Note that the (randomly) selected AudioCaps
prompts did not call for substantial stereo movement, so the resulting
renders are relatively non-spatial.

Autoencoder: reconstructions

This comparison is useful for evaluating the audio fidelity of the
autoencoder. On the left is the ground truth recording. On the right
is the same recording after passing it through the autoencoder. Note
that the autoencoder reconstruction is fairly transparent, very close
to the ground truth.

Ground truth             Autoencoder reconstruction
[audio]                  [audio]
[audio]                  [audio]
[audio]                  [audio]
[audio]                  [audio]
[audio]                  [audio]
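
The pairs above are meant to be compared by listening. As a rough numeric complement, the sketch below computes a waveform signal-to-noise ratio between a reference and its reconstruction. It is an assumption-laden illustration: `snr_db` is a hypothetical helper, the sine tone is synthetic test data rather than one of the recordings above, and waveform SNR is not claimed to be a metric used in the paper. It can also understate perceptual quality, since it penalizes inaudible phase differences.

```python
import math
from typing import Sequence


def snr_db(reference: Sequence[float], reconstruction: Sequence[float]) -> float:
    """Waveform signal-to-noise ratio in dB (hypothetical helper, one channel)."""
    if len(reference) != len(reconstruction):
        raise ValueError("signals must have the same number of samples")
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, reconstruction))
    if noise_power == 0.0:
        return math.inf  # the reconstruction is sample-identical
    if signal_power == 0.0:
        return -math.inf  # silent reference, non-silent error
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    sample_rate = 44100  # Hz, matching the 44.1kHz audio on this page
    # Synthetic stand-in for a recording: 0.1 s of a 440 Hz sine tone.
    reference = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(4410)]
    # Stand-in for a reconstruction: the same tone attenuated by 10%.
    reconstruction = [0.9 * x for x in reference]
    print(f"SNR: {snr_db(reference, reconstruction):.2f} dB")
```

For stereo files, one option is to apply the function to each channel separately and report both values.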