[HN Gopher] Show HN: Sonauto - a more controllable AI music creator
___________________________________________________________________
Show HN: Sonauto - a more controllable AI music creator
Hey HN, My cofounder and I trained an AI music generation model
and after a month of testing we're launching 1.0 today. Ours is
interesting because it's a latent diffusion model instead of a
language model, which makes it more controllable:
https://sonauto.ai/ Others do music generation by training a
Vector Quantized Variational Autoencoder like Descript Audio Codec
(https://github.com/descriptinc/descript-audio-codec) to turn music
into tokens, then training an LLM on those tokens. Instead, we
ripped the tokenization part off and replaced it with a normal
variational autoencoder bottleneck (along with some other important
changes to enable insane compression ratios). This gave us a nice,
normally distributed latent space on which to train a diffusion
transformer (like Sora). Our diffusion model is also particularly
interesting because it is the first audio diffusion model to
generate coherent lyrics! We like diffusion models for music
generation because they have some interesting properties that make
controlling them easier (so you can make _your own_ music instead
of just taking what the machine gives you). For example, we have a
rhythm control mode where you can upload your own percussion line
or set a BPM. Very soon you'll also be able to generate proper
variations of an uploaded or previously generated song (e.g., you
could even sing into Voice Memos for a minute and upload that!).
@Musicians of HN, try uploading your songs and using Rhythm
Control/let us know what you think! Our goal is to enable more of
you, not replace you. For example, we turned this drum line
(https://sonauto.ai/songs/uoTKycBghUBv7wA2YfNz) into this full song
(https://sonauto.ai/songs/KSK7WM1PJuz1euhq6lS7 skip to 1:05 if
impatient) or this other song I like better
(https://sonauto.ai/songs/qkn3KYv0ICT9kjWTmins - we accidentally
compressed it with AAC instead of Opus which hurt quality, though)
We also like diffusion models because while they're expensive to
train, they're cheap to serve. We built our own efficient inference
infrastructure instead of using those expensive
inference-as-a-service startups that are all the rage. That's why we're making
generations on our site free and unlimited for as long as possible.
We'd love to answer your questions. Let us know what you think of
our first model! https://sonauto.ai/
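
To illustrate the two kinds of bottleneck described in the post, here is a
minimal toy sketch in plain Python. It is not Sonauto's code: the tiny
codebook, the 2-dimensional vectors, and the function names are assumptions
made up for illustration. A vector-quantized bottleneck snaps an encoder
output to its nearest codebook entry and emits the index (a token an LLM can
model). A standard VAE bottleneck instead samples a continuous latent from a
predicted Gaussian, and a KL penalty pulls those latents toward a standard
normal, which is the kind of space a diffusion model can be trained on.

```python
import math
import random

def vq_bottleneck(z, codebook):
    """Return the index (token) of the codebook vector nearest to z."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(z, codebook[i]))

def vae_bottleneck(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

# Hypothetical 3-entry codebook and encoder output, for illustration only.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
token = vq_bottleneck([0.9, 0.8], codebook)           # discrete integer id
latent = vae_bottleneck([0.9, 0.8], [-2.0, -2.0],     # continuous vector
                        random.Random(0))
penalty = kl_to_standard_normal([0.9, 0.8], [-2.0, -2.0])
```

The discrete path yields an integer per frame, while the continuous path
yields real-valued vectors whose distribution the KL term keeps close to
normal.
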
Author : zaptrem
Score : 228 points
Date : 2024-04-10 16:48 UTC (6 hours ago)
(HTM) web link (sonauto.ai)
(TXT) w3m dump (sonauto.ai)
| Recursing wrote:
| Congratulations on the launch!
|
| I was recently really impressed by the state of AI-generated
| music, after listening to the April Fools LessWrong album
| https://www.lesswrong.com/posts/YMo5PuXnZDwRjhHhE/lesswrong-... .
| They claim it took them ~100 hours to generate 15 songs.
|
| Can't wait for the day I can instantly generate a song based on a
| random blog post or group chat history, this seems like a step in
| that direction
| disqard wrote:
| Perhaps not exactly "instantly generate a song based on a
| random blog post or group chat history", but more like
| "instantly generate a song based on an input prompt sentence"
| is suno.ai -- you should check it out!
| Recursing wrote:
| LessWrong used suno.ai , but the typical song quality is not
| there yet, so they had to generate 3,000-4,000 songs to get
| 15 good ones
| Etheryte wrote:
| The real endgame in this space would be a tool that first
| generates a song layout, think Fruityloops, then the
| corresponding instruments for it, then the vocals, and as
| the last step allows you to modify each of those layers
| without nuking the rest. Imagine something similar to what
| Suno does now, except you had the ability to add in an
| extra verse without altering the rest of the song, swap out
| a few passages of the lyrics with the rest staying intact,
| swapping out drums for a different drum set etc.
| blueboo wrote:
| If there's variance in output, it stands to reason you'd
| generate many X your desired output count and curate.
| Standard practice for creative output, from Midjourney to
| LLMs
| turnsout wrote:
| Wait, is there a Suno API? I've used the site, but it's
| manual
| ibdf wrote:
| I was just trying similar apps last week and I was so frustrated
| with the amount of options and menus to get through before I
| could generate anything. Not to mention the fact that half of
| these services ended up asking me to pay per setting. I have to
| say this was the least painful service to use thus far. Pretty
| impressive output for so little input.
| zaptrem wrote:
| Thanks! We have lots of fun dials for people who want them but
| they're all hidden by default and shouldn't be needed.
| 999900000999 wrote:
| What quality are you producing here ?
|
| Suno has this issue too, but everything sounds like it's washed
| out or something. As if you recorded it from a different room.
|
| Still I love this, ultimately I think it'll be a tool musicians
| use vs something for creating stand alone art
| zaptrem wrote:
| The audio is 44.1 kHz stereo, but all of us use autoencoders so
| the songs will fit in a transformer's context window, and huge
| compression will affect quality. We're definitely working on
| better ones, though!
| 999900000999 wrote:
| I'd definitely pay more for higher quality!
|
| Good work
| LouisvilleGeek wrote:
| Same here. Please consider a higher quality option.
| cchance wrote:
| Feels like this needs something like was done with stable
| diffusion when they fixed the contrast in images through the
| use of loras
| throw_m239339 wrote:
| > Still I love this, ultimately I think it'll be a tool
| musicians use vs something for creating stand alone art
|
| Spotify is getting flooded with AI generated music. It is
| absolutely something people will use to just generate the music
| they want to hear.
|
| Ultimately though, what would be the point of spotify? Anybody
| will be able to generate 24/7 of songs based on their mood or a
| few keywords.
|
| It will radically change the music landscape and how people
| "consume" music.
| zaptrem wrote:
| If this were the future that would be kinda depressing. I
| think the best, truly catchy songs and those that truly
| connect with people will continue having a significant human
| element. I see this as similar to the invention of Photoshop
| except even easier for normal people to start getting into.
| pksebben wrote:
| So long as there's something to miss about human-generated
| content, there will be a market for that content.
|
| Things are going to get truly weird when you can no longer
| tell the difference, on any level.
| 999900000999 wrote:
| At least for hip hop, AI is too sanitized to do anything too
| creative.
|
| I suspect record labels might train their own models. I know
| for sampling, being able to just create a royalty loop
| without worrying about clearing anything is cool.
| kposehn wrote:
| I've found that adding prompt elements such as "hi-fi", "sharp
| imaging" and "clear soundstage" have helped create a less
| compressed and generally cleaner sound.
| yanis_t wrote:
| Not quite sure if you're aware but another AI music generator just
| launched today https://udio.com/
| cpill wrote:
| haha, this track is hysterical
| https://www.udio.com/songs/jGjYfsRosZjYTkSBdFgEyF
| echelon wrote:
| This space is going to get very full, very fast. Udio just
| launched and improves upon "SOTA" Suno. This will just keep
| coming.
|
| Focus on product. Give actual music producers something they'll
| find useful. These fad, meme products will compete on edge model
| capability for 99% of users and ignore serving actual music
| producers.
|
| I'd like a product with more control, and it doesn't appear Suno
| or Udio are interested in this.
| mrnotcrazy wrote:
| I'm not sure it's that they aren't interested, I think it's just
| really hard.
| internet101010 wrote:
| Exactly. As of now, Suno can be used as template but you still
| need to go to DAW and make it from scratch. So... individual
| tracks for each instrument/vocals that can be exported and
| brought into DAW is what is needed. For me anyway.
| pachico wrote:
| Good luck! I just tried it and the interface was a bit confusing.
| It allowed me to only fill the last input in the form, which is
| usually a bit counterintuitive.
|
| I presented this prompt "Noir detective music from the 60s. Low
| tempo, trumpet and walking bass" and got back a one-note only
| song that has nothing to do with the prompt if not for some
| lyrics that were a bit ridiculous.
|
| This is just feedback, I'm passionately expecting something like
| this to surprise me but I know it's really hard!
|
| Happy to share the song/project/account, if you tell me how to :)
| zaptrem wrote:
| Weird. We pushed a BPM assist feature last night that may have
| unforeseen consequences for genres we didn't test (we tried
| pop, edm, classic rock). I'll turn it off by default for now.
| Try checking the instrumental box too.
| Redster wrote:
| Congrats on the launch! I had a similar issue as the comment
| above. I put in the prompt "Celtic symphonic rock" (which
| seems to work on Suno.ai) and some lyrics. The output ended
| up being just readings of the lyrics without any music,
| except some artifact-level whispering of music when the voice
| was silent. Would definitely love to see some demos of what
| it can produce!
| weatherlight wrote:
| yeah, Sounds pretty bloodless to me.
| bufferoverflow wrote:
| Check out what Udio can produce. It's so far ahead.
|
| https://twitter.com/nasescobar316/status/1777481957774872704
|
| https://twitter.com/apples_jimmy/status/1777905772384678149
|
| https://twitter.com/HalimAlrasihi/status/1778118063138673137
|
| https://twitter.com/AngryTomtweets/status/177811764524768059...
|
| https://twitter.com/AngryTomtweets/status/177811769943385715...
| klohto wrote:
| meh, Suno v3 still has better quality for me personally
| throwup238 wrote:
| In my experience with Suno ($40 spent so far) sound quality
| is worse than the cherry picked examples from Udio -
| especially the vocals - but everything I've heard from Udio
| could best be described as the elevator music equivalent of
| their respective genres so that's probably why it sounds so
| good. There seems to be a real quality vs originality trade
| off in the state of the art.
|
| That said, I've only had the chance to generate a few songs
| with Udio and they have all sounded like they were recorded
| by a prison band in an overcrowded cell (I create mainly
| instrumental/orchestra/sound track music).
| bufferoverflow wrote:
| Not for me. Suno voices sound distinctly robotic/metallic.
| jmacd wrote:
| Something about a discussion of the nuance/taste of
| different LLMs for different purposes is really interesting
| to see when it is related to something like music.
| froyolobro wrote:
| These are pretty incredible. More compressed, but way better
| 'songwriting' and 'performance'
| swalsh wrote:
| None of these "songs" have any emotion.. AI music just doesn't
| make me "feel" anything yet.
| Kiro wrote:
| Would you pass a blind test?
| postalrat wrote:
| Probably because you haven't heard them before.
| suyash wrote:
| I bet that's only because you know it's created by AI. If no
| one told you that and you heard someone sing that song and
| play along, I bet you would feel something. AI is only getting better, it
| will be just as good as any human and only way we will be
| able to tell is when it's disclosed if it's AI-generated or
| not.
| huac wrote:
| it's difficult to gauge from outside / as a consumer, but
| what's interesting is rarely where models are at a given point
| in time, but rather where the model/team will be with similar
| amounts of resources. it may very well still be Udio (who
| presumably have significantly more resources than Sonauto), but
| I would hesitate to say that a compute advantage counts as
| being 'far ahead.'
| rlp wrote:
| Wow. I just had it write a song about being sad about losing my
| keys in r&b/soul style, I'm totally blown away:
|
| https://www.udio.com/songs/bDY5CYdJZP93AdpgpfBJNX
| bogwog wrote:
| > Sign in with Google
|
| Why?
| jedisct1 wrote:
| Dealbreaker for me.
| ragnarok451 wrote:
| 99% of the population finds this easier than setting up a
| user/pass. If you care about this, understand that you will not
| be the target user for most new apps. Incredible that this
| comes up on so many new Show HNs.
| 4chandaily wrote:
| It is bizarre that creating an account on this service
| depends on me also already having an account on another,
| completely unrelated service. This unrelated service also
| requires me to provide it (and notably not Sonauto, the
| service I was actually interested in) my mobile phone number.
| This unrelated service also just recently admitted it
| collects data about you even when it says it doesn't.
|
| As a community made up largely of picky nerds and pedants, it
| doesn't seem incredible at all that this comes up so often.
| More like inevitable.
| suyash wrote:
| It's quite the opposite for professional audience, most
| people don't want to give away their Google credentials to a
| 3rd party website that can get hacked tomorrow.
| ragnarok451 wrote:
| lol tell that to all the (quite successful) B2B SaaS apps
| that started with Google login as their only option
| theshackleford wrote:
| > most people don't want to give away their Google
| credentials to a 3rd party website
|
| Good thing that's not how it works then I suppose.
| jedisct1 wrote:
| Please offer alternatives to Google to sign-in.
| WhitneyLand wrote:
| Can Sonauto (or any tool currently) take an instrumental track
| and lyrics as input and generate vocals?
| zaptrem wrote:
| Rhythm Control can do this for a drum line, and we have a
| variations feature that should be able to do this for
| instruments as well.
| zitterbewegung wrote:
| Is there a project that would do a sample instead of whole songs
| ?
| lta wrote:
| I've tried to look a little bit around but couldn't find
| anything, so I'll ask here.
|
| Any plans to release the model(s) under an open license ?
| zaptrem wrote:
| This would be so cool, but we need to think more about how we
| could do it and make enough money in the future to train more
| models with even cooler features.
| lta wrote:
| That's a very polite way to say no. Thanks for the answer.
|
| Personally not interested then. I'll stick with Bitwig and
| Ardour until an open model is available
| arisAlexis wrote:
| meta has billions. Other startups can't just donate their
| IP to the world and then raise money to do multimillion
| training runs
| pksebben wrote:
| Neither of those look like they have a generative AI
| component.
|
| We (as a society) desperately need a way to train these
| models in a federated, distributed manner. I would be more
| than happy to commit some of my own compute to training
| open audio / text / image / you-name-it models.
|
| But (if I understand correctly) the current architecture
| makes this if not impossible, nearly so.
| echelon wrote:
| All models for all types of content will eventually have open
| source equivalents. The game is to build a great product.
| adenta wrote:
| I can't tell, will this let me upload an instrumental track and
| change the genre/instrument makeup? When I tried, I might've
| overwritten the prompt.
| zaptrem wrote:
| Upload an instrumental track, select it, then click "Use as
| Rhythm Control." Once you do that, you can give the model any
| new prompt and it should use the same rhythm (you may need to
| adjust the control strength depending on genre.)
|
| Genre changes for melodies/etc are coming once we finish
| variations (partial renoising like SDEdit basically).
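
For readers unfamiliar with the "partial renoising" idea mentioned above,
here is a minimal sketch in plain Python. It is an illustration under stated
assumptions, not Sonauto's implementation: the simplified cosine schedule,
the `strength` parameter, and the `denoise_step` callable (standing in for
one reverse step of a trained model) are all hypothetical. The point is only
that an existing latent is noised part of the way and then denoised from
that intermediate timestep, rather than starting from pure noise.

```python
import math
import random

def alpha_bar(t, num_steps):
    """Simplified cosine schedule: signal level 1.0 at t=0, ~0.0 at t=num_steps."""
    return math.cos(0.5 * math.pi * t / num_steps) ** 2

def renoise(x0, t, num_steps, rng):
    """Forward-diffuse a clean latent x0 to timestep t."""
    a = alpha_bar(t, num_steps)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
            for x in x0]

def make_variation(x0, strength, num_steps, denoise_step, rng):
    """Renoise x0 part of the way, then run only the remaining reverse steps.

    strength in (0, 1]: 1.0 starts from (almost) pure noise and ignores the
    input; small values stay close to the original latent.
    """
    t_start = max(1, round(strength * num_steps))
    x = renoise(x0, t_start, num_steps, rng)
    for t in range(t_start, 0, -1):
        x = denoise_step(x, t)  # hypothetical: one reverse step of a trained model
    return x, t_start

# Placeholder "model" that returns its input unchanged, so the sketch runs.
def placeholder_step(x, t):
    return x

variation, t_start = make_variation([0.5, -0.5], strength=0.3, num_steps=50,
                                    denoise_step=placeholder_step,
                                    rng=random.Random(0))
```

With a real denoiser, a lower `strength` would preserve more of the uploaded
melody and a higher one would give the prompt more freedom.
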
| cchance wrote:
| Begs the question given this is diffusion based how much of the
| "ipadapter/faceid/controlnet" tech can be brought over, what
| would an audio-faceid or audio-ipadapter look like for something
| like this.
| zaptrem wrote:
| This is exactly what makes it so exciting for us!
| echelon wrote:
| IP-Adapter for music would be a game changer. Upload a
| reference sample, get something in that style.
| cchance wrote:
| Exactly, upload one or even multiple songs for influence, some
| lyrics ... tada! Holy shit that's gonna be powerful
| givinguflac wrote:
| Any plans for alternate login systems? Don't want to use a Google
| account personally. I'd love to try it though. Thanks!
| zaptrem wrote:
| Which providers would you prefer? We tried Twitter last night
| but it wasn't working for some reason (kept redirecting
| immediately with no oauth page).
| 4chandaily wrote:
| Basic username and password auth has worked for millions for
| decades. If you absolutely must collect user data for some
| reason, an email address can be used as the username. This
| isn't a hard problem to solve.
| zaptrem wrote:
| For us it had nothing to do with collecting user data,
| adding what you mentioned would have just required another
| few hours of dev time haha. You're right that it's not hard
| to solve, we just wanted to focus on the rest of the app
| since there's only two of us. We can definitely add this
| though!
| 4chandaily wrote:
| Well, what I could see from this side of the wall looked
| professional and well put together. Impressive for a team
| of two.
|
| Congrats on the launch, regardless. I will be sure to
| check it out when it becomes more accessible.
| postalrat wrote:
| The problem is 1 person creating 10,000 accounts. Solve
| that and you will be rich.
| 4chandaily wrote:
| Why solve it at all? 10,000 fake accounts for every human
| is working out great for Elon. =)
|
| Seriously, though - the solution isn't to prevent people
| from doing this, it is to remove the incentives that
| encourage it.
| postalrat wrote:
| How do you remove the incentive? Don't allow free
| accounts?
| 4chandaily wrote:
| Don't use accounts at all for non-paid features.
| ale42 wrote:
| Hacker news ;-) But I guess there's no OAuth or other similar
| function on HN...
|
| More seriously, personally none of them, I don't have
| accounts on any "usually used" login providers. Just allow
| local accounts.
| cchance wrote:
| Question: since you're now doing diffusion, couldn't you also train
| something akin to an "upscaler" to improve the overall quality of
| the output as that seems to be a big complaint, it feels like it
| should be possible to train an upscaling audio model by feeding
| it lower quality versions of songs and high quality FLAC for it
| to learn how to improve audio via diffusion upscaling
| zaptrem wrote:
| This can definitely be done. There are approaches that turn the
| decoder part of the autoencoder into another diffusion model.
| The drawback is that's much more expensive computationally. We
| think there's still a lot of room for better quality on the AE
| side and can't wait to show our improvements.
| ionwake wrote:
| I don't know about the scene but I thought this was great! I was
| given 3 tracks, I have to say one had no sort of beat to it, so
| it was like noise, but the other 2 were fantastic. great stuff!
| zaptrem wrote:
| Thanks! We have a BPM assist that can enforce rhythm as well,
| so you could try that, too!
| adrianh wrote:
| I'm interested to hear more about your statement of "Our goal is
| to enable more of you, not replace you."
|
| Speaking as a musician who plays real instruments (as opposed to
| electronic production): how does this help me? And how does this
| enable more of me?
|
| I am asking with an open mind, with no cynicism intended.
| zaptrem wrote:
| If the future of music was truly just typing some text into a
| box and taking or leaving what the machine gives you that would
| be kinda depressing.
|
| We want you to be able to upload recordings of your real
| instruments and do all sorts of cool things with them (e.g.,
| transform them, generate vocals for your guitar riff, use the
| melody as a jazz song, or just get some inspiration for what to
| add next).
|
| IMO AI alone will never be able to touch hearts like real
| people do, but people _using_ AI will be able to like never
| before.
| Version467 wrote:
| Just to clarify, when you say _never_. Do you actually mean
| never (or some practical equivalent like ~100 years), or do
| you mean not right now, but possibly in 5-10 years?
|
| I'm just asking to try to build some intuition on where people
| who actually train SOTA models think capabilities are
| heading.
|
| Either way, congrats on the launch :)
| zaptrem wrote:
| Never == "There will never be tears in my eyes as an AI
| sings ChatGPT-generated lyrics about the cycle of poverty a
| woman is stuck in (https://en.wikipedia.org/wiki/Fast_Car)
| because I know all of those experiences are made up."
| visarga wrote:
| The real value of AI is to be like a map, or like a
| mirror house, it reflects and recombines all our
| experiences. You can explore any mental space, travel the
| latent space of human culture. It is the distillation of
| all our intelligence, work and passion, you should show
| more respect and understand what it is. By treating it as
| if it were worthless you indirectly do the same for the
| training corpus, which is our heritage.
|
| If AI ever surpasses human level in art it will be more
| interesting to enjoy its creations than to ban it. But
| we're not there for now, it's just imitative, it has no
| experiences of its own yet. But it will start having
| experiences as it gets deployed and used by millions,
| when it starts interacting with artists and art lovers in
| longer sessions. With each generative art session the AI
| can collect precious feedback targeted to its own
| performance. A shared experience with a human bringing
| complementary capabilities to its own.
| digging wrote:
| Assume a song comes on the radio in 3 years and you like
| it. How do you know it's not entirely AI-generated?
| parpfish wrote:
| There's also the fact that a major component of music
| fandom is about the community and sense of personal
| identity that derives from an artist or a particular
| scene.
|
| Saying that you're a big fan of a band doesn't just mean
| "I like the audio they produce" but often means something
| much bigger about your fashion/style and personal values.
|
| How would any of that work with AI music? Is it possible
| to develop a community around music if everything is made
| on demand and nobody experiences the same songs? Will
| people find other like-minded music fans by recommending
| their favorite prompt engineers to each other?
| digging wrote:
| Personally I get very worried reading statements like "AI
| will never be able to do X", because they seem like
| obviously false statements. I think if one asserts AI will
| never be able to do a thing a human brain can do, that
| needs to be proven, rather than the other way around. For
| example, if we could reverse engineer the entire human
| neurology and build an artificial replica of it, why
| wouldn't we expect it to be able to do everything exactly
| as a human?
| shepherdjerred wrote:
| I don't understand those "AI will never be able to do X"
| statements.
|
| Surely AI will be able to do _anything_ in 1000 years. In
| 100 years it will almost definitely be able to replace
| most knowledge-based jobs.
|
| Even today it can take away many entry-level jobs, e.g. a
| small business no longer needs to hire someone to write a
| jingle, or create a logo.
|
| In 10 years, I would expect much of programming to either
| disappear or dramatically shift.
| anigbrowl wrote:
| But then why are you going down the dead-end route of
| generating complete songs? Nobody wants this except marketing
| people.
|
| I've said it before: there is no consumer market for an
| infinity jukebox because you can't sing along with songs you
| don't already know, there's already an overabundance of
| recorded music, and emotion in generative music (especially
| vocals) is fake. Nobody likes fakery for its own sake.
| Marketers like it because they want musical wallpaper, the
| same way commercials have it and it increasingly seeps into
| 'news' coverage. The market for fully-generated songs is
| background music in supermarkets, product launch videos, and
| in-group entertainment ('original songs for your company
| holiday party! Hilarious musical portraits of your favorite
| executives - us!').
|
| If you want to innovate in this area (and you should, your
| diffusion model sounds interesting), make an AI band that can
| accompany solo musicians. Prioritize note data rather than
| fully produced tracks (you can have an AI mix engineer as
| well as an AI bass player or drummer). Give people tools to
| build something _in stages_ and they'll get invested in it.
| People want interactivity, not a slot machine. Many musicians
| _love_ sequencers, arpeggiators, chord generators, and other
| musical automata; what they don't love is a magic 8-ball
| that leaves them with nothing to do and makes them feel
| uncreative.
|
| Otherwise your product will just end up on the cultural
| scrapheap, associated with lowest-common denominator fakers
| spamming social media as is already happening with imagery.
| bongodongobob wrote:
| I've essentially been running an infinity jukebox for the
| last week. I save the ones I like and relisten. Simple as
| that.
|
| Edit: It's been interesting watching non-musicians argue
| about emotion in music. I don't care who you are, the 300th
| time you perform a song, you're faking it to a large
| degree. People see musicians as these iconic, deep,
| geniuses, but most of us are just doing our job. You don't
| get excited about the 300th boilerplate getter and setter
| just like we aren't super excited about playing some song
| for the 300th time. It's a performance. It's pretend. A
| musician singing is like an actor performing. It's not as
| real as you think it is.
| parpfish wrote:
| But emotion was (most likely) involved when you wrote or
| first recorded the song, and that's what people connect
| with.
|
| If you go to a concert and you hear the headliner play a
| love ballad followed up by a breakup song, you don't
| expect them to actually be going through those emotions
| in real time.
| notahacker wrote:
| > Many musicians love sequencers, arpeggiators, chord
| generators, and other musical automata; what they don't
| love is a magic 8-ball that leaves them with nothing to do
| and makes them feel uncreative.
|
| I think this is the key bit. A lot of modern music is
| already created in the DAW (the original version of FL
| Studio picking a 140bpm default beat defined entire music
| scenes in the UK!) with copy/paste, samples, arpeggiators
| and other midi tools and pitch shifting. Asking a prompt to
| add four bars of accompaniment which have a
| $vaguetextinstruction relation to the underlying beat and
| then picking your favourite but asking them to
| $vaguetextinstruction the dynamics a bit can actually feel
| _more_ like part of the creative process than browsing a
| sample library for options or painstakingly moving notes
| around on a piano roll. Asking a prompt to create two
| minutes of produced sound incorporating your lyrics, not so
| much.
|
| And I think a DAW-lite option, ideally capable of both MIDI
| and produced sound output is the way forward here. Better
| still with i/o to existing DAWs
| chefandy wrote:
| > If the future of music was truly just typing some text into
| a box and taking or leaving what the machine gives you that
| would be kinda depressing.
|
| Hm... From my vantage point, it seems like a pretty weird
| choice of businesses if you think that.
|
| > IMO AI alone will never be able to touch hearts like real
| people do, but people using AI will be able to like never
| before.
|
| That's all very heartwarming but musicianship is also a
| profession, not just a human expression of creativity. Even
| if you're not charging yet, you're a business and plan on
| profiting from this, right? It seems to me that:
|
| 1) Generally, if people want music currently, they pay for
| musician-created music, even if it's wildly undervalued in
| venues like streaming services.
|
| 2) You took music, most of which people already paid
| musicians to create and they aren't getting paid any more
| because of this, and you used it to make an automated service
| that people will be able to pay for music instead of paying
| musicians.
|
| 3) Your service certainly doesn't hurt, and might even
| enhance people's ability to write and perform music without
| considering the economics of doing so. For example,
| hobbyists.
|
| 4) So you're not trying to replace musicians making music
| with people typing in prompts-- you're trying to replace
| musicians being paid to make music with you being paid to
| make music. Right? Your business isn't replacing musicianship
| as a human art form, but for it to succeed, it will have to
| replace it, in some amount, as a profession, right? Unless
| you are planning on creating an entirely new market for
| music, fundamentally, I'm not sure how it couldn't.
|
| Am I wrong on the facts, here? If so, well hey, this is
| capitalism and that's just how it works around here. If I'm
| mistaken, I'd like to hear how. Regardless, this is very
| consequential to a lot of people, and they deserve the people
| driving these changes to be upfront about it-- not gloss over
| it.
| LZ_Khan wrote:
| Inspiration? You can generate hundreds of ideas in a day. The
| tracks will not be perfect but that's where actual musicians
| can take the ideas/themes from the tracks and perfect it.
|
| In this way it is a tool only useful to expert musicians.
| jimmyjazz14 wrote:
| I mean if you want inspiration there are literally millions
| of amazing songs on Spotify by real musicians. I have yet to
| hear an AI composed song that was in the least bit musically
| inspiring.
| suyash wrote:
| That is just 'marketing speak' so as long you are their
| customers, they need to make money from users who will be using
| their service to make music.
| 93po wrote:
| When Suno came out I spent literally hours/days playing around
| with it to generate music, and came out with some that's really
| close to good, and good enough I've gone back to listen to a
| few. I'd love the tooling to take a premise and be able to
| tweak it to my liking without spending 1000 hours learning
| specific software and without thousands of hours learning to
| play an instrument or learning to sing.
| dwallin wrote:
| I think the problem here is the same one as the other current
| music generation services. Iteration is so important to
| creativity and right now you can't really properly iterate. In
| order to get the right song you just spray and pray and keep
| generating until one that is sufficient arrives or you give up. I
| know you hint at this being a future direction of development but
| in my opinion it's a key feature to take these services beyond
| toys.
|
| I think it's better to think of the process of finding the right
| song as a search algorithm through the space of all possible
| songs. The current approach just uses a "pick a random point in a
| general area". Once we find something that is roughly correct we
| need something that lets us iteratively tweak the aspects that
| are not quite right, decreasing the search space and allowing us
| to iteratively take smaller and smaller steps in defined
| directions.
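
The search framing in the comment above can be sketched as a toy hill climb
in plain Python. Everything here is an assumption for illustration: songs
are stood in for by short lists of numbers, and `taste` is a made-up scoring
function playing the role of a listener picking favourites. Each round
proposes a few nearby candidates, keeps the best one, and shrinks the step
size so later tweaks are smaller.

```python
import random

def refine(start, score, rounds=20, candidates=8, step=1.0, shrink=0.8, seed=0):
    """Keep the best of a few nearby candidates, shrinking the step each round."""
    rng = random.Random(seed)
    best, best_score = list(start), score(start)
    for _ in range(rounds):
        for _ in range(candidates):
            cand = [x + rng.gauss(0.0, step) for x in best]
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
        step *= shrink  # take smaller steps as the search closes in
    return best, best_score

# Hypothetical "taste" function: prefers points near a hidden target.
target = [0.3, -1.2, 2.0]

def taste(point):
    return -sum((a - b) ** 2 for a, b in zip(point, target))

start = [0.0, 0.0, 0.0]
best, best_score = refine(start, taste)
```

Contrast this with "spray and pray", which would correspond to drawing every
candidate independently instead of around the current best.
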
| zaptrem wrote:
| Our variations feature coming very soon is exactly this! Rhythm
| Control is an early version of this.
| SubiculumCode wrote:
| More strength does what? More or less similar?
| zaptrem wrote:
| More strength = force rhythm more. If you crank it to max
| it will probably result in just a drum line, so I prefer
| 3-4.
| SubiculumCode wrote:
| I uploaded a bit of a song that I recorded once (that I
| wrote, unpublished), and I am trying to get it to riff on it,
| generate something close to it, etc.
| dwallin wrote:
| I'll keep an eye out for that! The variations feature in Suno
| is a good example of what not to do here, as it effectively
| just makes another random iteration using existing settings.
|
| I think the other missing pieces I've found are upscaling and
| stem splitting. While tools for splitting stems already
| exist, my testing found that they didn't work well in
| practice (at least on Suno music), likely due to a
| combination of encoder-specific artifacts and the overall low
| sound quality. Existing upscaling approaches also faced
| similar issues.
|
| My naive guess is that these are things that will benefit
| from being closely intertwined with the generation process.
| E.g., when splitting up stems, you can use the diffusion
| model(s) to help jointly converge individual stems into
| reasonable standalone tracks.
|
| I'm excited about the potential of these tools. I've
| definitely personally found use cases for small independent
| game projects where paying for musicians is far out of
| budget, and the style of music is not one I can execute on my
| own. But I'm not willing to sacrifice on quality of results
| to do so.
| zaptrem wrote:
| Our variations feature will be nothing like Suno's (which
| just generates another song using the same prompt/lyrics).
| Since we use a diffusion model, we can actually restart the
| generation process from an early timestep (e.g., with a
| similar seed or even parts of the existing song) to get
| exactly what you're looking for.
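Restarting generation from an earlier timestep can be sketched roughly as follows. This is a toy illustration and not Sonauto's implementation: the latent is a plain list of floats, the noise schedule (signal weight sqrt(1 - strength), noise weight sqrt(strength)) is an assumption, and `partial_noise`, `make_variation`, and the `denoise` callback are hypothetical names.

```python
import math
import random

def partial_noise(latent, strength, seed=0):
    """Blend a clean latent with Gaussian noise.

    strength=0.0 returns the latent unchanged; strength=1.0 returns pure noise.
    The square-root weighting is an assumed toy schedule.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    rng = random.Random(seed)
    keep = math.sqrt(1.0 - strength)  # weight on the existing song
    add = math.sqrt(strength)         # weight on fresh noise
    return [keep * x + add * rng.gauss(0.0, 1.0) for x in latent]

def make_variation(latent, strength, denoise, seed=0):
    """Noise the latent part of the way, then hand it to a denoiser to finish."""
    noisy = partial_noise(latent, strength, seed)
    return denoise(noisy, strength)

# Usage with an identity function standing in for a real diffusion model.
variation = make_variation([0.5, -1.0, 2.0], 0.3, lambda z, s: z)
```

A low `strength` keeps most of the original song and yields a close variation, while a high value approaches a fresh generation.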
| throwup238 wrote:
| _> Our variations feature will be nothing like Suno's
| (which just generates another song using the same
| prompt/lyrics)._
|
| That's their "Remix" feature which just got renamed
| "Reuse prompt" or something.
|
| Their extend feature generates a new song starting from
| an arbitrary timestamp, with a new prompt. It doesn't
| always work for drastic style changes and it can be a bit
| repetitive with some songs but it doesn't completely
| reroll the entire song.
| Barneyhill wrote:
| Yep, I came to similar conclusions w/ text-to-audio models - in
| terms of creative work the ability to iterate is really lacking
| with the current interfaces. We've stopped working on text-to-
| audio models and are instead focusing on targeting a lower
| level of abstraction by directly exposing an Ableton
| environment to LLM agents.
|
| We just published a blog today discussing this -
| https://montyanderson.net/writing/synthesis
| ctrw wrote:
| Basically you need something like ComfyUI for music.
|
| Variation in small details is fine, but you need control over
| larger scale structure.
| boringg wrote:
| I want to say two things. One: congrats! I am sure your team
| has been working exceptionally hard to develop this, and the
| songs sound reasonably good for AI! Two: I am so completely
| unenthusiastic about AI music and it infiltrating the music world.
| All of it sounds like fingernails on a chalkboard. Just
| mainstream overproduced low quality radio music. I know it's a
| stepping stone but it kills me to listen to it right now.
| zaptrem wrote:
| Agreed. My thoughts on this are here;
| https://news.ycombinator.com/item?id=39992817#39994616
|
| Also, our model specifically excels at songs from the era
| before overproduction. Try asking for a Johnny Cash or Ella
| Fitzgerald-style country or swing/jazz song!
|
| Here's an example:
| https://sonauto.ai/songs/taJX3GrKZW7C5qOhjopr
| cowboylowrez wrote:
| how does the model know how to do a johnny cash style? did
| you feed it johnny cash tracks? if so, what were the
| licensing terms? are you interested in answering these
| questions about training data or would this be too dodgy to
| chat about on a tech website?
| visarga wrote:
| That's because you didn't listen to the MIT license song. Gen
| music has the potential to make even the driest texts sound
| good, I didn't realize that before. How about paper abstract
| music?
| https://suno.com/song/cb729eb6-4cc5-4c15-ab74-0cdbef779684
| _DeadFred_ wrote:
| 80% of music is familiarity, 20% novelty, yet the majority of
| people's time goes into getting the 80% down so that they can
| add their 20%.
|
| Look at current music production and compare it to past. Older
| music seems so much simpler. It was so much easier to come up
| with that 20% 'novel' when pop/recorded music was new.
| Ironically I think AI freeing people to focus on that 20% is
| going to add a lot of creativity to music, not reduce it.
|
| I say this as someone who hates the concept of AI music. I'm
| actually really excited to see what it enables/creates (but I
| don't want to use it, even though I really could use it for
| vocals that I currently pay others to do for me).
|
| I'll be here making my bad knockoffs of bad synth pop bands
| having fun and taking weeks to do 5% of what kids these days
| will start off as their entry point, with my 20% creativity
| ignored because my music sounds 'off' when I can't get the 80%
| familiar down.
|
| People thought synthesizers were the end of music, yet Switched
| on Bach begot Jean Michel Jarre begot Kate Bush and on and on.
| mewpmewp2 wrote:
| I would agree when AI gets to a point where it's possible to
| do that 20%. It is just not possible yet to combine it in
| such ways. Right now you basically get whatever music, but
| there's no way to add that 20%. Same with image/video
| generation. AI advancements have obviously been amazing and
| far beyond what I would've expected, but there's still a way
| to go.
| garyrob wrote:
| My hobby is songwriting. (Example:
| https://www.youtube.com/watch?v=Kjng3UoKkGk)
|
| I play guitar, but I'm not much of a guitarist or singer. I
| really like songwriting, not trying to be polished as a
| performer. So I intermittently look into the AI world to see
| whether it has tools I could use to generate a higher-quality
| song demo than I could do on my own.
|
| I've been looking for something that could take a chord
| progression and style instructions and create a decent backing
| track for a singer to sing over.
|
| But your saying "Very soon you'll also be able to generate proper
| variations of an uploaded or previously generated song (e.g., you
| could even sing into Voice Memos for a minute and upload that!)"
| is very intriguing. I mean, I can sing and play, it just isn't
| very professional. But if I could then have an AI take what I did
| and just... make it better... that would be kind of awesome.
|
| In fact, I believe you could have a very big market among
| songwriters if you could do that. What I would love to see is
| this:
|
| My guitar parts are typically not just strummed, but involve
| picking, sometimes fairly intricate. I'm just not that good at
| it. It would be fantastic to have an AI that would just take
| what I played and fix it so that it's more perfect.
|
| And then to have a tool where I could say, "OK, now add a bass
| part," and "OK, now add drums" would be awesome.
| rideontime wrote:
| Or you could put up some flyers and make some friends.
| zaptrem wrote:
| Awesome to hear this resonates with you! If you join our
| Discord server I'll ping @everyone when improvements are ready.
| mschulkind wrote:
| Check out this AI vocals plugin. It's pretty impressive
| already.
|
| https://youtu.be/PCYTqDSUbvU
| saaaaaam wrote:
| How worried are you about being sued? Seems like your training
| data probably includes quite a bit of copyright protected stuff.
| Just listened to the "blue scoobie doo" example and the
| influences are fairly obvious. With record companies getting
| super litigious about this, is that a concern? Or did you licence
| your training data?
| cush wrote:
| There's a lot of negative comments here, but these are the
| earliest days and generating entire songs is kind of the hello
| world of this tech.
|
| There's always going to be a balance between creating high level
| tools like this with no dials and low level tools with finer
| control, and while this touts itself as being "more
| controllable", it's clearly not there. But, the same way Adobe
| has integrated outpainting and generative fill into Photoshop,
| it's only a matter of time before products like this are built
| into Ableton and VSTs - where a creator can highlight a bar or
| two and ask your AI to make the snippet more ethereal, create
| a bridge between the verse and the sax solo, or help you with an
| outro.
|
| That said, similar to generating basic copy for a marketing site,
| these tools will be great for generating cheap background music
| but not much else. Any musician, marketing agency, or film-
| maker worth their salt is going to need very specifically branded
| music for their needs, and they're likely willing to pay for a
| real licence to something audiences will recognize, using
| generative AI and tools to remix the content to their specific
| need.
| cush wrote:
| Does it use a male voice by default? Just clicking on random
| songs, it took me 20+ tries to find a female voice
| rcarmo wrote:
| Nice, but Google login is a no-go for me (or any form of social
| login, really).
| CuriouslyC wrote:
| I don't feel like prompt understanding is very good. I don't
| think I really ever got close to what I wanted with any of the
| attempts I made. I imagine learning the model tags and building
| some intuition might help, but I wouldn't bother with that unless
| I was tinkering with a local model.
|
| Some things it made sounded ok, but I feel like the average
| generation quality wasn't fantastic. It did a folk guitar melody
| and a vocoded thrash metal voice that I thought sounded pretty
| legit, but mostly vocals had an ear-grating quality and
| everything had a bit of a low-bitrate vibe.
|
| To be honest though, I don't think you need to try and outcompete
| Suno. I think you want to get into DAWs and VSTs and become the
| tool all the best producers in the world use. Spit out stems, and
| train your model on less processed sounds because things like
| matching reverb/delay and pre-squashed dynamics are a pain in the
| ass to work around.
|
| Suno is trying to battle a large established industry that is
| actually very creator friendly and accessible. If you choose to
| instead serve that industry and enable it I think that's the
| winning play.
| zaptrem wrote:
| The vast majority of our time was spent figuring out the model
| architecture and large-scale distributed training, and step 2
| (starting now) is scaling everything up. Prompt understanding
| and audio quality will get significantly better once we swap in
| a larger text embedding model.
|
| Thanks for the feedback re: DAWs, though! That would be really
| cool. Maybe we can tag tracks based on the effects applied to
| them to allow this to be more controllable.
| herval wrote:
| What are your thoughts on copyright and how holders might react to
| a system like that?
|
| My understanding of the music industry is the incumbents are VERY
| lawsuit-happy, and plagiarism laws are substantially more
| far-reaching than with image or video (e.g., cases where someone
| gets sued for using the same chords as another song) - how do you
| plan to approach all that?
| cwillu wrote:
| A volume control is not optional, and titling the song usually
| comes last for me, which means I have to give a nonsense name in
| the app before I've started.
| giancarlostoro wrote:
| I was going to ask what it was coded in then noticed '/Home' in
| the URL bar. Is this by chance ASP.NET? :)
| zaptrem wrote:
| No, it's React (Native Web) Navigation... mistakes were made
| haha.
| digging wrote:
| > Sign in with Google
|
| Well, maybe I'll try out the next AI music creator posted on HN.
| giancarlostoro wrote:
| How does this compare to Suno?
|
| https://suno.com/
| browningstreet wrote:
| Hmm, I get "peppy cola commercial before movie starts" vibes off
| most of the vocals.
| ALittleLight wrote:
| This has to be bad for Spotify, right? Infinite low cost music
| generation from multiple competitors challenges Spotify's moat
| and forces them to develop a similar product and compete away
| profits from innumerable challengers - or else just go out of
| business.
| artur_makly wrote:
| This app made my day. I literally just created my dream CD of
| Weird-Al-inspired parody songs. Thank you.
| alexpogosyan wrote:
| Is there a music-generating AI that takes audio as input? I'm
| looking to upload simple guitar melodies or chord progressions
| I've doodled and receive an enhanced version back. Similar to how
| image generators turn doodles/sketches into polished drawings.
| zug_zug wrote:
| I love this. What this needs imo is the ability to generate X
| samples (I see you already have that) and then say "Now generate
| 3 more like this one, with the following change: ..." I think
| this was a killer feature for midjourney.
| e12e wrote:
| I'm somewhat positively surprised by my first attempt - simple
| prompt, no editing of the (admittedly flat) lyrics: song to a
| robot harvester, "Robot Friend":
|
| https://sonauto.ai/songs/avg5NT3qf9QYNfWAyeOn
|
| Look forward to playing with this.
| realfeel78 wrote:
| Some uses of AI can be net positive for society. Making fake
| music is not one of them.
| mewpmewp2 wrote:
| Until the music is just more beautiful than whatever people
| could generate. But you are correct. We are not there yet.
| dengsauve wrote:
| Had a blast playing around w/prompts and listening to the various
| results.
|
| I play piano, sax, guitar, and I can sing well enough. I'm
| garbage at songwriting and composing. I immediately see the value
| of using this tool to scaffold an idea out. I think being able to
| export lyrics and chord progressions would be an amazing paid
| feature to keep this as a freemium product.
| BonoboIO wrote:
| This works amazingly well even with German lyrics and a mashup of Till
| Lindemann from Rammstein and 1970s Rock
|
| https://sonauto.ai/song/JSmCpJssZeIS2C87pkQW
___________________________________________________________________
(page generated 2024-04-10 23:00 UTC)