[HN Gopher] Launch HN: Soundry AI (YC W24) - Music sample genera...
___________________________________________________________________
Launch HN: Soundry AI (YC W24) - Music sample generator for music
creators
Hi everyone! We're Mark, Justin, and Diandre of Soundry AI
(https://soundry.ai/). We provide generative AI tools for
musicians, including text-to-sound and infinite sample packs.

We (Mark and Justin) started writing music together a few years
ago but felt limited in our ability to create anything we were
proud of. Modern music production is highly technical and
requires knowledge of sound design, tracking, arrangement,
mixing, mastering, and digital signal processing. Even with our
technical backgrounds (in AI and cloud computing respectively),
we struggled to learn what we needed to know. The emergence of
latent diffusion models was a turning point for us, just as it
was for many others in tech. All of a sudden it was possible to
leverage AI to create beautiful art. After meeting our cofounder
Diandre (half of the DJ duo Bandlez and an expert music
producer), we formed a team to apply generative AI to music
production.

We began by focusing on generating music samples rather than
full songs. Focusing on samples gave us several advantages, but
the biggest one was the ability to build and train our custom
models very quickly due to the short length of the generated
audio (typically 2-10 seconds). Conveniently, our early
text-to-sample model also fit well within many existing music
producers' workflows, which often involve heavy use of music
samples.

We ran into several challenges when creating our text-to-sound
model. The first was that we began by training our latent
transformer (similar to OpenAI's Sora) using off-the-shelf audio
autoencoders (like Meta's Encodec) and text embedders (like
Google's T5). The domain gap between the data used to train
these off-the-shelf models and our sample data was much greater
than we expected, which caused us to misattribute blame for
issues among the three model components (latent transformer,
autoencoder, and embedder) during development. To see how
musicians can use our text-to-sound generator to write music,
check out our text-to-sound demo below:
https://www.youtube.com/watch?v=MT3k4VV5yrs&ab_channel=Sound...
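As a rough sketch of the three-component architecture described
above (text embedder, latent transformer, audio autoencoder), the
toy Python below wires the pieces together using random projections
as stand-ins for the real networks. Every dimension, function name,
and constant here is hypothetical; this is an illustration of the
data flow, not Soundry's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 512       # hypothetical text-embedding dimension
LATENT_DIM = 64     # hypothetical autoencoder latent channels
LATENT_STEPS = 150  # latent frames covering a ~2-10 s sample

def embed_text(prompt: str) -> np.ndarray:
    """Stand-in for a pretrained text embedder such as T5."""
    seed = abs(hash(prompt)) % (2**32)
    return np.random.default_rng(seed).standard_normal(EMB_DIM)

def latent_transformer(text_emb: np.ndarray) -> np.ndarray:
    """Stand-in for the conditional latent transformer: maps the
    conditioning embedding to a sequence of audio latents."""
    proj = rng.standard_normal((LATENT_STEPS, EMB_DIM, LATENT_DIM)) * 0.01
    return np.einsum("e,sed->sd", text_emb, proj)

def decode_audio(latents: np.ndarray) -> np.ndarray:
    """Stand-in for the autoencoder decoder (Encodec-like),
    upsampling each latent frame back to waveform samples."""
    upsample = 320  # audio samples per latent frame
    dec = rng.standard_normal((LATENT_DIM, upsample)) * 0.01
    return (latents @ dec).reshape(-1)

waveform = decode_audio(latent_transformer(embed_text("punchy kick drum")))
print(waveform.shape)  # mono waveform, LATENT_STEPS * 320 samples
```

The point of the sketch is the debugging problem mentioned above:
an artifact in `waveform` could originate in any of the three
stages, which is what makes blame attribution hard.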
The second issue we experienced was more on the product design
side. When we spoke with our users in depth, we learned that
novice music producers had no idea what to type into the prompt
box, and expert music producers felt that our model's output
wasn't always what they had in mind when they typed their
prompt. It turns out that text is much better at specifying the
contents of visual art than of music.

This particular issue is what led us to our new product: the
Infinite Sample Pack. The Infinite Sample Pack does something
rather unconventional: prompting with audio rather than text.
Rather than requiring you to type out a prompt and specify many
parameters, all you need to do is click a button to receive new
samples. Each time you select a sound, our system embeds "prompt
samples" as input to our model, which then creates endless
variations. By limiting the number of possible outputs, we're
able to hide inference latency by pre-computing lots of samples
ahead of time. This new approach has seen much wider adoption,
so this month we'll be opening the system up so that everyone
can create Infinite Sample Packs of their very own! To compare
the workflows of the two products, check out our new demo using
the Infinite Sample Pack:
https://www.youtube.com/watch?v=BqYhGipZCDY&ab_channel=Sound...
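The precomputation trick described above can be sketched as a
simple buffering scheme (hypothetical code, not Soundry's
implementation): because the set of prompt samples is finite,
variations can be generated offline and served instantly from a
per-prompt queue that is topped back up after each request. The
`generate_variation` function is a stand-in for the real (slow)
model call.

```python
import random
from collections import defaultdict, deque

def generate_variation(prompt_id: str) -> str:
    """Stand-in for the slow generative-model call."""
    return f"{prompt_id}-var-{random.randrange(10**6)}"

class InfinitePack:
    """Serve precomputed variations instantly, refilling offline."""

    def __init__(self, prompt_ids, buffer_size=8):
        self.buffer_size = buffer_size
        self.queues = defaultdict(deque)
        for pid in prompt_ids:
            self.refill(pid)  # precompute before any user clicks

    def refill(self, prompt_id):
        q = self.queues[prompt_id]
        while len(q) < self.buffer_size:
            q.append(generate_variation(prompt_id))  # slow path

    def next_sample(self, prompt_id):
        sample = self.queues[prompt_id].popleft()  # instant for the user
        self.refill(prompt_id)                     # top the buffer back up
        return sample

pack = InfinitePack(["kick", "snare"])
print(pack.next_sample("kick"))
```

In a real deployment the refill step would run asynchronously in
the background; here it is inline only to keep the sketch short.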
Overall, our founding principle is to start by asking the
question: "what do musicians actually want?" Meta's
open-sourcing of MusicGen has resulted in many interchangeable
text-to-music products, but ours is embraced by musicians. By
maintaining an open dialog with our users, we've been able to
satisfy many needs, including the ability to specify BPM and
key, one-shot instrument samples (so musicians can write their
own melodies), and drag-and-drop support for digital audio
workstations via our desktop app and VST.

To hear some of the awesome songs made with our product, take a
listen to our community showcases below!
https://soundcloud.com/soundry-ai/sets/community-showcases

We hope you enjoy our tool, and look forward to discussion in
the comments!
Author : kantthpel
Score : 74 points
Date : 2024-03-21 18:09 UTC (4 hours ago)
(HTM) web link (soundry.ai)
(TXT) w3m dump (soundry.ai)
| hnhg wrote:
| This looks really great, but I also expect there will be many such
| tools on the market soon. Are you going for the full VC backed
| route and if so, how would you meet their high growth
| expectations? I can see a really nice business here but the music
| software industry seems highly competitive and full of new
| entrants all the time.
| kantthpel wrote:
| It's quite true that there is a lot of excitement in this
| space, and that naturally comes with a lot of competitors. We
| were among the first movers with respect to AI tools for
| musicians, and we've seen that many others are having to move
| fast to catch up. Our unique insight has always been thinking
| about what musicians and music enthusiasts actually want,
| rather than blindly implementing a text-to-music model.
|
| With respect to growth, our goal is to continue lowering the
| barrier to entry for people to write amazing music that they
| are proud of. Our initial product is focused on music samples,
| but by increasing the level of abstraction to remixes and then
| full stem generation, we can enable more people to consider
| themselves "musicians." Adding music
| distribution to our platform will greatly improve virality and
| organic growth, since most people who make music want to share
| what they've made with their friends. We are confident that we
| can achieve our lofty goals by building a vertically integrated
| music creation and distribution platform.
| drngdds wrote:
| How much does it cost? Is it subscription-based? I can't find
| anything about it on the site.
| kantthpel wrote:
| $10/month or $100/year for unlimited downloads
| dpflan wrote:
| I'm watching https://youtu.be/MT3k4VV5yrs, and I see the current
| output, like a drum kit, is an ensemble of instruments. Can that
| be expanded into stems? Then you'd have the instruments as their
| own components, with the ensemble performance broken down into
| those pieces for full editing of the output. Rather than
| drag-and-drop ensembles, you could then modify all the
| instruments you're adding to your score.
| kantthpel wrote:
| Exactly! Our next product is focused on stem separation and
| generation. The ability to modify these generated stems is
| crucial since it will enable our users to craft the song to
| their needs.
| MadDemon wrote:
| You can make it editable with https://samplab.com
| cpursley wrote:
| If this can pull off drum tracks "in the style of", I'm in.
| kantthpel wrote:
| Absolutely! You can check out the many different styles in our
| Infinite Sample Packs. If you already have audio in the style
| that you want you can even upload that to our Pack Builder and
| get unlimited variations on it.
| posaune wrote:
| Very cool idea - I've been curious about text-sound approaches
| that allow for more granularity rather than just producing fully-
| formed tracks. Would be curious to know more details about the
| pitfalls of using Encodec with shorter samples. As a musician and
| software engineer, I could see lots of applications of this
| sample-first approach that don't rely on the familiar DAW
| timeline/track UI, too.
| kantthpel wrote:
| Thank you so much! The biggest issue with Encodec (especially
| the 48kHz version) is that it is very dependent on
| normalization. This wasn't an issue for their use case (full
| music) since music generally doesn't contain silent portions,
| but it is for samples. Many one-shots and loops have a great
| deal of
| silence or very quiet portions of the waveform, which when
| normalized become essentially pure noise. Training our custom
| autoencoder to handle this issue was one of the key factors
| which enabled us to get such good audio quality.
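The failure mode described above can be illustrated in a few lines
of NumPy (hypothetical numbers; this is not Encodec's actual
normalization code): peak-normalizing a one-shot that is mostly
silence boosts its tiny noise floor toward full scale.

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 48_000
signal = np.zeros(sr)  # 1-second buffer at 48 kHz

# A quiet transient followed by a near-silent tail (made-up levels).
signal[:2_000] = 0.001 * np.sin(np.linspace(0, 40 * np.pi, 2_000))
signal[2_000:] = rng.standard_normal(sr - 2_000) * 1e-6

# Peak normalization scales the whole waveform so its max is 1.0...
normalized = signal / np.max(np.abs(signal))

# ...which amplifies the "silent" tail by the same ~1000x gain.
tail_rms_before = np.sqrt(np.mean(signal[2_000:] ** 2))
tail_rms_after = np.sqrt(np.mean(normalized[2_000:] ** 2))
print(tail_rms_after / tail_rms_before)
```

For full-length music the gain would be modest, but for a one-shot
like this the inaudible tail becomes broadband noise, which matches
the "essentially pure noise" problem described in the comment.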
| vthallam wrote:
| Just wanted to say the video demo on the home page was very cool.
| Congrats and good luck!
| kantthpel wrote:
| Thank you so much! :)
| pea wrote:
| Awesome work! One thing I'm curious about in this space is why
| people generally generate the waveform directly. I always
| imagined you'd get better results teaching the model to output
| parameters which you could feed into synths
| (wavetable/fm/granular/VA), samplers, and effects, alongside
| MIDI.
|
| You'd imagine you could estimate most music with this with less
| compute and higher determinism and introspection. Is it because
| there isn't enough training data for the above?
| kantthpel wrote:
| Thank you! There has been a lot of work on MIDI generation in
| the past, and many people have gotten great results using the
| approach you describe. The reason modern music generators like
| ours create audio files directly is that MIDI can't represent
| all of the nuances of acoustic instruments (vocals, violin,
| trombone, etc.). The allure of modern
| generative AI (diffusion networks and autoregressive models) is
| that they are finally capable of generating high quality audio
| which sounds natural.
|
| If you're interested in really exciting work on applying AI to
| creating synthesizer patches, I recommend you check out
| synplant2: https://soniccharge.com/synplant2. Their tool can
| load in any audio and then create a synth patch which sounds
| nearly identical to the input audio.
| pea wrote:
| Thank you so much I'll check that out.
| chaosprint wrote:
| Congratulations on your launch!
|
| I started music DRL (https://github.com/chaosprint/RaveForce) a
| few years ago. At that time, SOTA was still the "traditional"
| method of GANSynth.
|
| Later, I mainly turned to Glicol (https://glicol.org) and tried
| to combine it with RaveForce.
|
| There are many kinds of music generation nowadays, such as Suno
| AI, but I think the biggest pain point is the lack of
| controllability. I mean, after generation, if you can't fine-tune
| the parameters, it's going to be really painful. As for pros,
| most of the generated results are still unusable. This is why I
| wanted to try DRL in the first place. Also worth checking:
|
| https://forum.ircam.fr/projects/detail/rave-vst/
|
| If this is your direction, I'm wondering if you have compared
| methods of generating MIDI? After all, generated MIDI and
| parameters can be adjusted quickly, it is also in the form of a
| loop, and it can be lossless.
|
| In addition, I saw that the demo on your official website was
| edited at 0:41, so how long does it take to generate the loop? Is
| this best quality or average quality?
|
| Anyway, I hope you succeed.
| kantthpel wrote:
| RaveForce is awesome, you have a new GitHub star! Glicol is
| fascinating, but to be honest I'm not sure what the use case
| for writing music with code is. Please follow up, since I'm
| curious to learn more. RAVE VST is also awesome; I played
| around with it a lot when it first came out.
|
| I 100% agree with your point about Suno AI. If you're an
| amateur you want the ability to control and change the
| output; otherwise how can you call the music your
| own? If you're a professional, without the ability to control
| you can never achieve your specific goals! This is why we feel
| confident in our musician-first approach.
|
| WRT MIDI generation, we are absolutely considering it, but we
| don't think we can really offer anything unique there. We
| believe our ability to create natural sounding instruments is
| key to enabling the creation of all genres of music. With that
| said though, the ability to generate MIDI is #1 on our Canny
| board so maybe that should be next :)
|
| Our text-to-sound model takes roughly 10 seconds to generate,
| and our Infinite Sample Packs are instant since we pre-compute
| output to hide latency.
|
| Thank you for your thoughtful questions!
| chaosprint wrote:
| There is a long history of using PureData, Csound or relevant
| languages for designing sounds or composition.
|
| https://en.wikipedia.org/wiki/MUSIC-N
|
| Later SuperCollider and TidalCycles led the way in live
| coding. For me, I just wanted a tool that could write code,
| compose music or design sounds directly in the browser, play
| with friends and have sample-level control. From the
| perspective of sample level control, it seems to be two
| extremes compared with the black box of AI.
| kantthpel wrote:
| Very cool, thanks for the extra context!
| 999900000999 wrote:
| How much does it cost ?
|
| I'm very interested in this (although I'd rather have an API),
| but it's a major red flag if I don't know the price.
| kantthpel wrote:
| $10/month or $100/year for unlimited downloads
| 999900000999 wrote:
| That's not bad. Thanks
| garyrob wrote:
| Sorry to ask this here, it's a bit off topic, but I'm looking
| for AI that can accept a chord progression that I write and
| make a track from it that someone can sing over. I'm a
| songwriter, but not that great a guitarist. It would be great
| if I could use the phenomenal AI we're seeing to make the
| instrumental track for new songs.
| kantthpel wrote:
| If you write your chord progression as midi then you can run it
| through a plugin like Heavier7Strings
| (https://www.threebodytech.com/en/products/heavier7strings) and
| it will be able to create relatively realistic sounding guitar.
| I hope that helps!
| trocado wrote:
| I don't know about phenomenal AI, but Band-in-a-box is a
| classic tool for this kind of thing.
| honkycat wrote:
| AI forge [0] was unable to parse my prompt "crunchy overdriven
| bass drum" due to a JSON parse error; replacing the spaces
| with commas fixed it.
|
| 0: https://app.soundry.ai/forge
| kantthpel wrote:
| Thanks for the heads up!
| justinparus wrote:
| Hey we had a rush of users earlier. It's up. Still seeing
| issues?
| honkycat wrote:
| Fixed!
| nkko wrote:
| While the generative AI tools for musicians seem promising, I'm
| skeptical about how much they can enhance creativity and
| originality in music production. There's a risk of over-reliance
| on AI, leading to more formulaic and homogeneous music. The human
| element and the "soul" of music creation shouldn't be lost in
| pursuing technological convenience.
| block_dagger wrote:
| I hear your argument but I'm not convinced that reliance on AI
| will lead to formulaic or homogenous music. Humans produce that
| stuff by themselves already.
| kantthpel wrote:
| Thank you for sharing! Our goal is to enable musicians to get
| their ideas out more quickly than they ever could before.
| Inspiration can be fleeting, and we hope that our tool and
| tools like it can enable everyone to seize those opportunities
| to write a piece of music that they are proud of.
| frankdenbow wrote:
| Love this. Used Jukedeck in the past and did a comp sci for
| music class at CMU way back in the day. After reading, I
| understand your focus may be on people who would already
| classify themselves as musicians, but I think there's
| definitely a world where you're making it easier for the
| amateur who makes music for recreation, or for musicians in
| training (the same market as Artiphon, who I have worked
| with). One element of the UX as you describe it is that text
| may be difficult; would you imagine having input described the
| way some artists do, with humming and audio descriptions?
| Something along the lines of this:
| https://www.youtube.com/watch?v=yhOsxMhe8eo
|
| I put together my initial thoughts here:
| https://www.youtube.com/watch?v=nAZAWBw7c7o
| kantthpel wrote:
| Wow, thank you so much for the in-depth landing page and
| product feedback! We absolutely agree that this technology can
| be used to enable people who haven't previously considered
| themselves to be musicians to write music. While we currently
| require that you use our tool with a DAW the next step is to
| enable full track creation all within our tool.
|
| Also, I love your suggestion for humming and then using that
| audio as input to the AI. Our text-to-sound product (called The
| Forge in-app) actually supports uploading your own audio,
| which you can then modify with text and other prompting! I
| wouldn't say that particular functionality is fully
| developed yet, but your point is super valid since many
| musicians communicate rhythms and melodies to each other by
| imitating instruments with their voice.
| kantthpel wrote:
| Also, you've got a new subscriber :)
| theresachen wrote:
| This is a must-have for creating music. They have a real
| artist on their founding team as well, and the rest are DJs
| and producers. It's better than those other fake AI companies
| making shitty music.
| kantthpel wrote:
| Hell yeah, thank you for your support!
| b20000 wrote:
| congrats!
|
| how did you determine market size? TAM etc?
| kantthpel wrote:
| Thank you so much! We computed TAM the best way: bottom-up! We
| estimated the number of current musicians, the number of
| people we could convert into musicians, and the annual
| subscription price, and then used these to compute our TAM. If
| you're interested in learning more and applying this kind of
| thing to your own work, I recommend this video from SlideBean:
| https://www.youtube.com/watch?v=M_RMTC2YmXY&ab_channel=Slide...
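The bottom-up arithmetic described above is simple enough to sketch.
The numbers below are made-up placeholders, not Soundry's actual
estimates; only the $100/year price comes from the thread.

```python
# Bottom-up TAM: (addressable people) x (annual price).
current_musicians = 50_000_000  # hypothetical: existing producers
convertible = 25_000_000        # hypothetical: people convertible into musicians
annual_price = 100              # $/year, the subscription price quoted above

tam = (current_musicians + convertible) * annual_price
print(f"${tam:,}")  # $7,500,000,000
```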
| serjester wrote:
| Seems like it has a lot of potential! I'd love to see some AI
| functionality for mastering tracks. Would love to be able to
| tune hi-hats to make them sound more full, etc.
| kantthpel wrote:
| Thank you so much! Yes, this is a great application of AI.
| We'll be adding that functionality soon, but in the meantime
| you might be interested to learn about the work that iZotope
| is doing with Ozone:
| https://www.izotope.com/en/learn/ai-mastering.html
___________________________________________________________________
(page generated 2024-03-21 23:00 UTC)