stable-audio-demo
https://stability-ai.github.io/stable-audio-demo/

arXiv: Stable Audio's paper
stable-audio-tools: code to reproduce Stable Audio
stable-audio-metrics: code to evaluate Stable Audio

Our model can generate variable-length and long-form stereo music at 44.1kHz:

Generated Stereo Music
Prompt: Berlin techno, rave, drum machine, kick, ARP synthesizer, dark, moody, hypnotic, evolving, 135 BPM. Loop.
Prompt: Uplifting acoustic loop. 120 BPM.
Prompt: Disco, Driving Drum Machine, Synthesizer, Bass, Piano, Guitars, Instrumental, Clubby, Euphoric, Chicago, New York, 115 BPM.
Prompt: Calm meditation music to play in a spa lobby.
Prompt: Drum solo.

Unlike previous state-of-the-art models, ours can generate stereo sound effects at 44.1kHz:

Generated Stereo Sounds
Prompt: Door slam. High-quality, stereo.
Prompt: Sports car passing by. High-quality, stereo.
Prompt: Motorbike passing by. High-quality, stereo.
Prompt: Fireworks. High-quality, stereo.
Prompt: Reverberant footsteps inside a large rocky cave. High-quality, stereo.

Note that all the examples on this website are generated with the same model, which can generate both variable-length music and sound effects at 44.1kHz stereo. We append "high-quality, stereo" to our sound effects prompts because it is generally helpful.

Long-form stereo music: comparison with state-of-the-art with MusicCaps prompts

Prompt: This song contains someone strumming a melody on a mandolin while more people are whistling along.
Then a mandolin, an e-bass and an acoustic guitar are playing a short melody in a lower key before breaking into the next part along with flutes and percussions. This song may be played outside by musicians performing.

Prompt: The commercial music features a groovy piano melody played over snare rolls in the first half of the loop. Right after, there is a drop that consists of a punchy "4 on the floor" kick pattern, shimmering hi hats, claps, groovy piano and wide synth lead melody. It sounds happy, fun, euphoric and exciting.

Models compared for each prompt above: Our Model (stereo, 44.1kHz), MusicGen-large (mono, 32kHz), MusicGen-stereo (stereo, 32kHz), AudioLDM2 (mono, 48kHz).

These prompts/audios were used for the qualitative study we report in our paper.

Sound effects: comparison with state-of-the-art with AudioCaps prompts

Prompt: Clicking and sputtering then eventual revving of an idling engine.

Prompt: Birds chirping loudly.

Models compared for each prompt above: Our Model (stereo, 44.1kHz), Audiogen-medium (mono, 32kHz), AudioLDM2 (mono, 48kHz).

These prompts/audios were used for the qualitative study we report in our paper. Note that the (randomly) selected prompts from AudioCaps did not require substantial stereo movement, resulting in renders that are relatively non-spatial.
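The sound-effect examples on this page end their prompts with "High-quality, stereo." because the authors find it generally helpful. A minimal sketch of applying that convention to a batch of prompts follows. The helper name `with_stereo_suffix` and its punctuation handling are assumptions made for illustration; they are not part of stable-audio-tools.

```python
# Hypothetical helper (not from stable-audio-tools): appends the
# "High-quality, stereo." suffix used for the sound-effect prompts.
SUFFIX = "High-quality, stereo."


def with_stereo_suffix(prompt: str, suffix: str = SUFFIX) -> str:
    """Return the prompt with the suffix appended exactly once."""
    text = prompt.strip()
    if not text:
        return suffix
    # Skip prompts that already end with the suffix (case-insensitive,
    # ignoring a trailing period).
    if text.lower().rstrip(".").endswith(suffix.lower().rstrip(".")):
        return text
    # Close the prompt with a period if it has no end punctuation.
    if text[-1] not in ".!?":
        text += "."
    return f"{text} {suffix}"


prompts = ["Door slam", "Fireworks.", "Birds chirping loudly."]
sfx_prompts = [with_stereo_suffix(p) for p in prompts]
```

The suffix is only meant for sound effects; the music prompts on this page do not use it.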
Autoencoder: reconstructions

This comparison is useful for evaluating the audio fidelity of the autoencoder. On the left, we have the ground truth recording. On the right, we take the ground truth recording and pass it through the autoencoder. Note that the autoencoder reconstruction is fairly transparent, very close to the ground truth.

Audio examples (five pairs): Ground truth | Autoencoder reconstruction
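The comparison above is a listening test. For a rough numerical check of how close a reconstruction is to its ground truth, one option is a sample-level signal-to-noise ratio. The sketch below assumes both signals are time-aligned sequences of float samples of equal length. It is an illustrative measure only: this page does not say SNR was used, it is not taken from stable-audio-metrics, and it does not capture perceptual quality.

```python
import math


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between a reference and an estimate.

    Assumes both inputs are equal-length, time-aligned sample sequences.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the reference signal.
    signal = sum(x * x for x in reference)
    # Energy of the reconstruction error.
    noise = sum((x - y) ** 2 for x, y in zip(reference, estimate))
    if noise == 0:
        return math.inf  # identical signals
    if signal == 0:
        return -math.inf  # silent reference, non-silent estimate
    return 10 * math.log10(signal / noise)


# Toy example: a reconstruction that scales every sample by 0.9.
ground_truth = [1.0, -1.0, 1.0, -1.0]
reconstruction = [0.9, -0.9, 0.9, -0.9]
score = snr_db(ground_truth, reconstruction)  # about 20 dB
```

Higher values mean the reconstruction is closer to the reference sample by sample. For stereo audio, the function could be applied per channel.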