[HN Gopher] Hybrid-Net: Real-time audio source separation, gener...
___________________________________________________________________
Hybrid-Net: Real-time audio source separation, generate lyrics,
chords, beat
Author : herogary
Score : 178 points
Date : 2024-03-26 12:42 UTC (10 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| tremarley wrote:
| This is incredible
| floathub wrote:
| Indeed.
|
| Only things I've seen it get wrong in a few minutes of testing
| are french lyrics (try a Serge Gainsbourg for example).
|
| But it is really, really amazing.
| yetihehe wrote:
| Presented some strange unscrollable tables, then when trying
| to press play, it threw some error. One video was probably
| tabulated, but inaccessible in player. Meh, maybe will work
| when HN effect wears off, I was searching for such service
| lately but everything required logging in or was "for
| business and education use, trust us, it works".
| jszymborski wrote:
| Here's a question for folks who work on DL for audio: what are
| folks using for vocoders these days?
|
| I feel like that's where a lot of artifacts are introduced (at
| least for TTS) and the best methods a while ago were slow and
| autoregressive.
| herogary wrote:
| In recent years, there has been substantial advancement in
| vocoders for DL audio applications. WaveGAN and MelGAN have
| emerged as promising solutions, harnessing the power of
| generative adversarial networks (GANs) to produce high-fidelity
| audio. Furthermore, parallel-waveGAN and HiFi-GAN have
| showcased improved efficiency with quicker inference times
| while maintaining exceptional audio quality.
| jszymborski wrote:
| Thanks!!
| timetraveller26 wrote:
| wow, I remember hearing about the Cocktail mix problem a few
| years back and I thought it may be near to impossible to solve,
| this may not be quite there yet but AI progress is impressive.
|
| You can try it on their website https://lamucal.ai/
| MaryCsb wrote:
| Great experience
| benob wrote:
| It doesn't make sense to show the progress of AI processing for
| featured songs which have been seen by dozens of visitors and
| surely are cached. Is this trick related to copyright issues?
| timetraveller26 wrote:
| Hopefully no DMCA or something like that affects this project,
| long time ago I wanted to build a lyrics translation website, but
| apparently you may get in trouble for using copyrighted lyrics.
| herogary wrote:
| The data sources come from YouTube or user-uploaded audio, and
| the lyrics are extracted from the audio using AI models.
| kej wrote:
| There appears to be an issue with the lyric matching, which
| I'm guessing is based on caching the previously generated
| lyrics for different songs with the same name. I noticed it
| specifically on the song "Sweet Pea" by the Tobasco Donkeys,
| where it shows tabs and lyrics but they don't match the song.
| sevagh wrote:
| Any details on the source separation performance (SDR and other
| BSS metrics)?
| herogary wrote:
| The SDR is 6.3
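|
| For context, SDR (signal-to-distortion ratio) is reported in dB,
| and higher is better. A minimal sketch of the basic definition,
| 10*log10(||s||^2 / ||s - s_hat||^2), follows. It assumes two
| equal-length lists of samples and is not the full BSS Eval
| procedure; the name sdr_db is made up for illustration and does
| not come from the Hybrid-Net repo.

```python
import math


def sdr_db(reference, estimate):
    """Basic signal-to-distortion ratio in dB for two equal-length signals."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    # Energy of the true source and of the estimation error.
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    if signal_energy == 0:
        return -math.inf  # silent reference, non-silent estimate
    return 10.0 * math.log10(signal_energy / error_energy)


# An estimate that is the reference scaled by 0.9 has an error of 10% of
# the amplitude, i.e. 1% of the energy.
reference = [1.0, -1.0, 0.5, 0.25]
estimate = [0.9 * s for s in reference]
score = sdr_db(reference, estimate)
```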
| tgkudelski wrote:
| Am I missing something? What license is this released under?
| MaryCsb wrote:
| It looks like the network structure of the model is provided,
| but the dataset is not.
| mkl wrote:
| No license, so the default "all rights reserved" applies. Can't
| really be used for anything.
| fallinditch wrote:
| Well done! Worked very well for a song I tried. This is a useful
| tool for learning music production and theory. Downloading the
| generated midi file it only gave me the basic chords - that's
| useful but it would be awesome to get midi of all the
| instruments. I will definitely be exploring this further, good
| stuff!
| iamjackg wrote:
| I'm a little confused, hopefully somebody can point me in the
| right direction. Can this be run locally, or are all the models
| proprietary and hidden somewhere? The repo seems to only contain
| the inference code.
| timlod wrote:
| Pretty cool! Tried it with RHCP's Dani California
| (https://lamucal.ai/songs/red-hot-chili-peppers/dani-californ...)
| and there's a lot of wrong chords still. Impressive nonetheless,
| and already quite useful in the song-part recognition (assuming
| it's all the ML)! Lyrics seem right too.
|
| The source separation only seems to be available when downloading
| their app, which I didn't do, so I can't comment on that.
| collinmehle wrote:
| I downloaded and tried their app, experiencing the audio source
| separation feature, and ended up with five tracks (piano,
| vocals, drums, bass, and others). It sounds pretty good, but
| unfortunately, there is no guitar track.
| peab wrote:
| So cool!
|
| I just tried it with a song with a fairly complicated chord
| progression - Yesterday by the Beatles. It did pretty well! But
| it got a couple parts wrong.
|
| Is there support for modifying the results of the chords/lyrics?
| I don't see it immediately.
| herogary wrote:
| We're about to roll out features for chord and lyric
| modifications, and we'll continue to optimize the model going
| forward.
| hipnoizz wrote:
| Well, from the PoV of someone who tries to learn how to play the
| guitar I must say that all this AI frenzy managed to produce some
| useful tools ;-)
|
| I checked https://lamucal.ai/ with some example MP3:
|
| - lyrics are OK (although I've seen tools that managed to do
| better),
|
| - chords recognition wasn't bad,
|
| - the UI is a bit rough around the edges (and I managed to get
| some Unity-related errors),
|
| - pitch-aware speed adjustment is always a great tool when
| someone tries to learn how to play the song,
|
| - transposing can be useful as well (although the web application
| does not support it).
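|
| Transposing chord symbols is simple to do by hand in the meantime.
| A minimal sketch, assuming plain chord names such as "Am7" or "Bb"
| (no slash chords) and always spelling the result with sharps. The
| function names are hypothetical and are not part of lamucal.ai.

```python
NOTES_SHARP = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Map flat and unusual spellings onto the sharp-based table above.
ENHARMONIC = {
    "Db": "C#", "Eb": "D#", "Gb": "F#", "Ab": "G#", "Bb": "A#",
    "Cb": "B", "Fb": "E", "E#": "F", "B#": "C",
}


def transpose_chord(chord, semitones):
    """Shift the root of a chord symbol by a number of semitones."""
    # The root is one letter plus an optional accidental; the rest
    # ("m", "7", "maj7", ...) is the chord quality and is kept unchanged.
    root_len = 2 if len(chord) > 1 and chord[1] in "#b" else 1
    root, quality = chord[:root_len], chord[root_len:]
    root = ENHARMONIC.get(root, root)
    index = (NOTES_SHARP.index(root) + semitones) % 12
    return NOTES_SHARP[index] + quality


def transpose_progression(chords, semitones):
    """Transpose every chord in a list by the same interval."""
    return [transpose_chord(chord, semitones) for chord in chords]


# Move a G major progression up a whole step (2 semitones).
up_a_step = transpose_progression(["G", "Em", "C", "D7"], 2)
```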
|
| I'm using (and paid for) some other similar application, although
| I primarily use that for tracks separation. Later I import tracks
| into Ardour and then record my own guitar lines. I use just a
| minuscule percentage of the DAW's features, so if someone could
| provide an application with all that AI goodies coupled with
| recording ability that would be wonderful.
|
| That said, personally I've found that one way or another I need
| to listen a lot to the song I'm trying to learn, make notes,
| break down the song structure (sections, strumming patterns,
| chords etc.). And a good video on YouTube that starts with a
| simple version of the song and then adds more and more features
| is often the best help to start with, at least at my current
| level.
| herogary wrote:
| Thank you for raising the issue. We are continuously optimizing
| our model, and we are also constantly gathering various UI and
| business-related bugs. We will continue to optimize and resolve
| them in the future.
| koiueo wrote:
| LOL https://lamucal.ai/songs/john-coltrane/john-coltrane-giant-s...
|
| It's definitely on to something. I wonder how it would perform if
| it were trained on jazz.
| benzible wrote:
| Ha, Giant Steps was the first thing I tried as well. Not bad,
| certainly has some way to go.
| prophesi wrote:
| Any chance of adding documentation for running it locally? venv /
| requirements.txt at the very least, or a docker image?
| eigenvalue wrote:
| Tried it on a couple of my wife's songs and she said it was quite
| inaccurate in terms of chords and tabs (the lyrics were pretty
| close though). This seems like one of those use cases where it's
| not particularly useful until it gets above some minimum accuracy
| threshold.
| herogary wrote:
| Thanks for sharing your experience. We appreciate the feedback.
| It's clear that improving accuracy, especially with chords and
| tabs, is a priority for us. We're committed to enhancing the
| accuracy of our tool to meet your expectations and provide a
| more valuable experience.
| CMLab wrote:
| At this point, I'm conflicted. The lyrics testing went well for
| me, but I'm not proficient in chords. Are the chord results
| really that poor? I wonder if I can use it to play my favorite
| songs in real life.
| CSSer wrote:
| I play guitar/piano, and I concur that they're not great. I
| tried "Vagabond (acoustic)" by Wolfmother. I figured it would
| have an easier time because it's just a vocal and an acoustic
| guitar. Some of the notes in the tabs are right, but the
| melody is too simplistic. All of the embellishments are also
| missing. It's interesting how the mile-high view isn't so bad
| though. If I sat down to figure out the notes for a track no
| tabs exist for, this might look like a rough approximation of
| what I'd start with.
| weinzierl wrote:
| I tried the open source one that Spotify published a while ago on
| jazz trio music (just piano, double bass and drums) but it was
| pretty useless. My experiments with the trials of some commercial
| services, where you select an instrument and it extracts just
| that were much better.
|
| If the piano is a Rhodes then extracting electric guitar works
| well, extracting a piano not at all.
| herogary wrote:
| The testing model for guitar separation is currently under
| development. The test results are somewhat unsatisfactory due
| to the significant variations in guitar instrument tones,
| especially for electric guitars. This adds to the difficulty of
| training.
| adrianh wrote:
| I tried it on one of my own tunes:
|
| https://lamucal.ai/songs/adrian-holovaty/adrian-holovaty-the...
|
| The beats/chords were consistently a full beat off, and the
| chords were probably only 50% right. I chose this tune because
| (to my ears) the harmony is pretty clear.
|
| Compare this to my own manually created transcription of the same
| tune, and it's night-and-day difference:
|
| https://www.soundslice.com/slices/tpbwc/
|
| Beat detection and chord detection are hard problems, likely due
| to a lack of diverse training data. Chordify (another site that
| does this, which has been around for ages) has roughly similar
| performance.
|
| Full disclosure: I run Soundslice, a website built around synced
| sheet music, in which there's no automatic transcription involved
| (maybe someday, but the tech isn't good enough yet!). I've been
| following these developments for 15+ years.
| herogary wrote:
| Thanks for your feedback. We're currently in the process of
| adjusting our dataset and model to address issues with chords
| and rhythm. We're looking forward to providing you with a
| better experience in the future.
| luckydata wrote:
| Sounds like your site could be a great training set. Someone
| should reach out to you; it sounds like a good business
| opportunity.
| 999900000999 wrote:
| Cool project, but it's not very accurate. I wouldn't charge for
| this yet
| herogary wrote:
| Thank you for providing feedback. We will continue to optimize
| and improve the model.
| CMLab wrote:
| I tried a few songs and uploaded one of my favorite songs. The
| lyrics recognition result was excellent, but there were a few
| instances where some words were missing or incorrect. It would be
| great if there could be an option to edit the lyrics.
| herogary wrote:
| We are in the process of adding the lyrics editing feature, as
| well as chord and rhythm types. It will be released soon.
| atum47 wrote:
| I've always wanted to implement an FFT from scratch and play with
| it to separate audio waves but then a full time job came along. I
| guess once you separate vocals from everything else you can just
| feed it to a speech to text?
|
| To be completely honest, as a human that does not speak English
| natively, I find some lyrics hard to understand. I've seen native
| English speakers also having this problem. I think it's only
| natural for a NN to make the same mistakes.
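|
| An FFT from scratch fits in a few lines. A minimal sketch of the
| recursive radix-2 Cooley-Tukey algorithm, assuming the input
| length is a power of two; it is written for clarity rather than
| speed and is unrelated to the code in the Hybrid-Net repo.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; length must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    if n == 0 or n % 2:
        raise ValueError("length must be a power of two")
    # Transform the even- and odd-indexed samples separately.
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor rotates the odd half before combining.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# A sine wave with exactly 3 cycles in 16 samples puts its energy in bin 3
# (and in the mirror bin 13).
tone = [math.sin(2 * math.pi * 3 * t / 16) for t in range(16)]
spectrum = fft(tone)
peak_bin = max(range(8), key=lambda k: abs(spectrum[k]))
```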
| herogary wrote:
| Source separation is commonly done by applying masks to the
| spectrogram. Deep learning is used to learn the mask parameters
| for the different instruments. As you mentioned, this is
| the approach we will follow in the subsequent steps.
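|
| A toy illustration of spectral masking, assuming a single frame,
| a naive DFT, and a hand-made binary mask. A real system would use
| a short-time Fourier transform over overlapping windows and a
| network that predicts the mask values; this is not Hybrid-Net's
| model, and every name below is made up for the example.

```python
import cmath
import math


def dft(frame):
    """Naive O(n^2) discrete Fourier transform of one frame."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); keeps the real part of each output sample."""
    n = len(spectrum)
    return [
        sum(c * cmath.exp(2j * cmath.pi * k * t / n) for k, c in enumerate(spectrum)).real / n
        for t in range(n)
    ]


def apply_mask(mixture_frame, mask):
    """Scale each frequency bin of the frame by its mask value in [0, 1]."""
    spectrum = dft(mixture_frame)
    return idft([m * c for m, c in zip(mask, spectrum)])


# Toy mixture: a "bass" tone in bin 2 plus a quieter "vocal" tone in bin 5.
N = 32
bass = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]
vocal = [0.5 * math.sin(2 * math.pi * 5 * t / N) for t in range(N)]
mixture = [b + v for b, v in zip(bass, vocal)]

# Binary mask that keeps bin 2 and its mirror bin N - 2, zeroing the rest.
bass_mask = [1.0 if k in (2, N - 2) else 0.0 for k in range(N)]
bass_estimate = apply_mask(mixture, bass_mask)
```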
| avallach wrote:
| In the Android app I consistently get "Downloading model file"
| stuck at exactly -60830200%. Tried clearing data and caches and
| changing the connection.
| herogary wrote:
| Thank you for your feedback. Could you please email us the
| information of your phone model and system version? We will
| investigate promptly. In the meantime, you can try exiting the
| program and re-entering to see if that helps. Please also check
| your network connection.
| may4m wrote:
| does anyone know if virtualdj.com uses the same technique?
| herogary wrote:
| It looks like a DAW, and I'm not very familiar with their
| source separation. The technology we use has been publicly
| released on GitHub.
| pseudocomposer wrote:
| I see a lot of "7M" chords in generated outputs, which isn't a
| type of chord I'm familiar with. Is this meant to be "M7" (major
| 7)?
| herogary wrote:
| Yes, it's a major 7th chord.
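|
| For reference, a major 7th chord stacks a major third, a perfect
| fifth and a major seventh on the root (0, 4, 7 and 11 semitones).
| A small sketch that spells the chord with sharps and rewrites a
| "7M" suffix as "M7"; the label format "C7M" is an assumption about
| the site's output, and both function names are hypothetical.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Semitone offsets: root, major third, perfect fifth, major seventh.
MAJOR_SEVENTH_INTERVALS = (0, 4, 7, 11)


def major_seventh(root):
    """Return the four note names of a major 7th chord, spelled with sharps."""
    start = NOTES.index(root)
    return [NOTES[(start + step) % 12] for step in MAJOR_SEVENTH_INTERVALS]


def normalize_label(label):
    """Rewrite a trailing '7M' (e.g. 'C7M') as the more common 'M7'."""
    return label[:-2] + "M7" if label.endswith("7M") else label


c_major_seventh = major_seventh("C")
```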
| johnmaguire wrote:
| My partner plays violin and was looking for a version of Lindsey
| Stirling's Crystallize without the violin part (or rather, with
| it turned down.)
|
| I found plenty of tools online to do this, but they were all
| credit-based and a bit annoying to use. I eventually found they
| were mostly using Demucs from Facebook:
| https://github.com/facebookresearch/demucs
|
| Really nice tool if you need simple splitting of things like
| drums, vocals, etc. It's not perfect, but it's a great start.
| herogary wrote:
| Yes, the Demucs mixing model has excellent SNR performance, but
| it is computationally intensive. It also incorporates a random
| mechanism, so each time it produces different spectrograms.
| sevagh wrote:
| You can run Demucs directly in your browser (through WASM) on
| my website: https://freemusicdemixer.com/
|
| No usage credits or cost, since it's all on your computer.
| joshuak wrote:
| I would think that by comparison to image models synthetic data
| would be relatively easy to generate for audio model training.
| I'm curious then why it continues to be so difficult to build a
| nearly flawless audio separation model. Is synthetic data being
| widely used? Is it just too hard of a problem to train even with
| this data? I don't have a good sense of what the most challenging
| aspects are of audio models.
| herogary wrote:
| Unlike images, audio signals are time-dependent and have
| complex temporal dynamics, making it more challenging to
| generate realistic synthetic data that captures the nuances of
| real-world audio. Meanwhile, the complex nature of audio
| signals, the scarcity of high-quality training data, and the
| subjective evaluation of audio quality collectively contribute
| to the ongoing challenges in building near-flawless audio
| separation models.
| crtified wrote:
| It doesn't seem very well pleased with The Dance of Eternity by
| Dream Theater. But it was quite impressive in some aspects with
| the likes of Nirvana.
|
| Future music students are going to be fortunate to have these
| kinds of tools. An instant split and full analysis of any song.
| Remixes and backing tracks on tap, etc.
|
| That said, we older students learned a lot from the process of
| doing this manually ourselves. Those lessons can still be learned
| and others besides, but the dynamics of the learning change with
| the tech.
| corn-cheese wrote:
| Finally, a new development in lyric transcription! I've been
| waiting to see this for ages!
|
|   def get_lyrics(waveform, sr, cfg):
|       # asr and wav2vec2
|       raise NotImplementedError()
___________________________________________________________________
(page generated 2024-03-26 23:00 UTC)