[HN Gopher] Hybrid-Net: Real-time audio source separation, gener...
       ___________________________________________________________________
        
       Hybrid-Net: Real-time audio source separation, generate lyrics,
       chords, beat
        
       Author : herogary
       Score  : 178 points
       Date   : 2024-03-26 12:42 UTC (10 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | tremarley wrote:
       | This is incredible
        
         | floathub wrote:
         | Indeed.
         | 
         | Only things I've seen it get wrong in a few minutes of testing
         | are French lyrics (try a Serge Gainsbourg song, for example).
         | 
         | But it is really, really amazing.
        
           | yetihehe wrote:
           | It presented some strange unscrollable tables, then when I
           | tried to press play, it threw an error. One video was probably
           | tabulated but inaccessible in the player. Meh, maybe it will
           | work when the HN effect wears off. I was searching for such a
           | service lately, but everything required logging in or was "for
           | business and education use, trust us, it works".
        
       | jszymborski wrote:
       | Here's a question for folks who work on DL for audio: what are
       | folks using for vocoders these days?
       | 
       | I feel like that's where a lot of artifacts are introduced (at
       | least for TTS) and the best methods a while ago were slow and
       | autoregressive.
        
         | herogary wrote:
         | In recent years, there has been substantial advancement in
         | vocoders for DL audio applications. WaveGAN and MelGAN have
         | emerged as promising solutions, harnessing the power of
         | generative adversarial networks (GANs) to produce high-fidelity
         | audio. Furthermore, Parallel WaveGAN and HiFi-GAN have
         | showcased improved efficiency with quicker inference times
         | while maintaining exceptional audio quality.
        
           | jszymborski wrote:
           | Thanks!!
        
       | timetraveller26 wrote:
       | Wow, I remember hearing about the cocktail party problem a few
       | years back and I thought it might be nearly impossible to solve.
       | This may not be quite there yet, but AI progress is impressive.
       | 
       | You can try it on their website https://lamucal.ai/
        
         | MaryCsb wrote:
         | Great experience
        
         | benob wrote:
         | It doesn't make sense to show the progress of AI processing for
         | featured songs which have been seen by dozens of visitors and
         | surely are cached. Is this trick related to copyright issues?
        
       | timetraveller26 wrote:
       | Hopefully no DMCA or something like that affects this project.
       | A long time ago I wanted to build a lyrics translation website,
       | but apparently you may get in trouble for using copyrighted
       | lyrics.
        
         | herogary wrote:
         | The data sources come from YouTube or user-uploaded audio, and
         | the lyrics are extracted from the audio using AI models.
        
           | kej wrote:
           | There appears to be an issue with the lyric matching, which
           | I'm guessing is based on caching the previously generated
           | lyrics for different songs with the same name. I noticed it
           | specifically on the song "Sweet Pea" by the Tobasco Donkeys,
           | where it shows tabs and lyrics but they don't match the song.
        
       | sevagh wrote:
       | Any details on the source separation performance (SDR and other
       | BSS metrics)?
        
         | herogary wrote:
         | The SDR is 6.3
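
For readers unfamiliar with the metric: SDR (signal-to-distortion ratio) is
usually reported in dB. The sketch below shows the simple energy-ratio form in
Python. The function name `sdr_db` and the toy signals are illustrative
assumptions, not the project's evaluation code, and BSS Eval toolkits use a
more elaborate decomposition than this.

```python
import math

def sdr_db(reference, estimate):
    """Signal-to-distortion ratio in dB: 10*log10(||s||^2 / ||s - s_hat||^2)."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_energy == 0.0:
        return math.inf   # perfect reconstruction
    if signal_energy == 0.0:
        return -math.inf  # silent reference, non-silent estimate
    return 10.0 * math.log10(signal_energy / error_energy)

# Toy example: one second of a 440 Hz tone, and an estimate that is 10% too quiet.
sr = 8000
reference = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
estimate = [0.9 * s for s in reference]
print(round(sdr_db(reference, estimate), 2))  # prints 20.0
```

Under this definition, 6.3 dB would mean the reference energy is roughly 4.3
times the error energy. The reply does not say which dataset or SDR variant
the figure refers to.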
        
       | tgkudelski wrote:
       | Am I missing something? What license is this released under?
        
         | MaryCsb wrote:
         | It looks like the network structure of the model is provided,
         | but the dataset is not.
        
         | mkl wrote:
         | No license, so the default "all rights reserved" applies. Can't
         | really be used for anything.
        
       | fallinditch wrote:
       | Well done! Worked very well for a song I tried. This is a useful
       | tool for learning music production and theory. The generated
       | MIDI file I downloaded only gave me the basic chords. That's
       | useful, but it would be awesome to get MIDI of all the
       | instruments. I will definitely be exploring this further, good
       | stuff!
        
       | iamjackg wrote:
       | I'm a little confused, hopefully somebody can point me in the
       | right direction. Can this be run locally, or are all the models
       | proprietary and hidden somewhere? The repo seems to only contain
       | the inference code.
        
       | timlod wrote:
       | Pretty cool! Tried it with RHCP's Dani California
       | (https://lamucal.ai/songs/red-hot-chili-peppers/dani-californ...)
       | and there's a lot of wrong chords still. Impressive nonetheless,
       | and already quite useful in the song-part recognition (assuming
       | it's all the ML)! Lyrics seem right too.
       | 
       | The source separation only seems to be available when downloading
       | their app, which I didn't do, so I can't comment on that.
        
         | collinmehle wrote:
         | I downloaded and tried their app, experiencing the audio source
         | separation feature, and ended up with five tracks (piano,
         | vocals, drums, bass, and others). It sounds pretty good, but
         | unfortunately, there is no guitar track.
        
       | peab wrote:
       | So cool!
       | 
       | I just tried it with a song with a fairly complicated chord
       | progression - Yesterday by the Beatles. It did pretty well! But
       | it got a couple parts wrong.
       | 
       | Is there support for modifying the results of the chords/lyrics?
       | I don't see it immediately.
        
         | herogary wrote:
         | We're about to roll out features for chord and lyric
         | modifications, and we'll continue to optimize the model going
         | forward.
        
       | hipnoizz wrote:
       | Well, from the PoV of someone who tries to learn how to play the
       | guitar I must say that all this AI frenzy managed to produce some
       | useful tools ;-)
       | 
       | I checked https://lamucal.ai/ with some example MP3:
       | 
       | - lyrics are OK (although I've seen tools that managed to do
       | better),
       | 
       | - chords recognition wasn't bad,
       | 
       | - the UI is a bit rough around the edges (and I managed to get
       | some Unity-related errors),
       | 
       | - pitch-aware speed adjustment is always a great tool when
       | someone tries to learn how to play the song,
       | 
       | - transposing can be useful as well (although the web application
       | does not support it).
       | 
       | I'm using (and paid for) some other similar application, although
       | I primarily use that for tracks separation. Later I import tracks
       | into Ardour and then record my own guitar lines. I use just a
       | minuscule percentage of the DAW's features, so if someone could
       | provide an application with all those AI goodies coupled with
       | recording ability, that would be wonderful.
       | 
       | That said, personally I've found that one way or another I need
       | to listen a lot to the song I'm trying to learn, make notes,
       | break down the song structure (sections, strumming patterns,
       | chords etc.). And a good video on YouTube that starts with a
       | simple version of the song and then adds more and more features
       | is often the best help to start with, at least at my current
       | level.
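
The transposing feature mentioned in the list above (reported as missing from
the web application) is straightforward to do on chord labels once they exist.
A minimal Python sketch follows. It assumes chord names of the form root plus
quality, optionally with a slash bass note; the function names are
hypothetical and not taken from the project.

```python
import re

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Map flats and unusual spellings onto the sharp-based names above.
ENHARMONIC = {"Db": "C#", "Eb": "D#", "Gb": "F#", "Ab": "G#", "Bb": "A#",
              "Cb": "B", "Fb": "E", "E#": "F", "B#": "C"}
NOTE_RE = re.compile(r"[A-G][#b]?")

def transpose_note(note, semitones):
    """Shift a note name by a number of semitones (result uses sharps)."""
    note = ENHARMONIC.get(note, note)
    return NOTES[(NOTES.index(note) + semitones) % 12]

def transpose_chord(chord, semitones):
    """Transpose the root (and slash bass note, if any), keeping the quality."""
    head, slash, bass = chord.partition("/")
    match = NOTE_RE.match(head)
    if not match:
        raise ValueError(f"unrecognized chord: {chord!r}")
    quality = head[match.end():]
    result = transpose_note(match.group(0), semitones) + quality
    if slash:
        # Only transpose the part after "/" when it is a note (not e.g. "6/9").
        if NOTE_RE.fullmatch(bass):
            bass = transpose_note(bass, semitones)
        result += "/" + bass
    return result

progression = ["G", "Em7", "Cmaj7", "D/F#"]
print([transpose_chord(c, 2) for c in progression])
# prints ['A', 'F#m7', 'Dmaj7', 'E/G#']
```

The sketch always spells results with sharps; choosing flats for flat keys
would need extra key information.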
        
         | herogary wrote:
         | Thank you for raising the issue. We are continuously optimizing
         | our model, and we are also constantly gathering various UI and
         | business-related bugs. We will continue to optimize and resolve
         | them in the future.
        
       | koiueo wrote:
       | LOL https://lamucal.ai/songs/john-coltrane/john-coltrane-giant-s...
       | 
       | It's definitely on to something. I wonder how it would perform if
       | it were trained on jazz.
        
         | benzible wrote:
         | Ha, Giant Steps was the first thing I tried as well. Not bad,
         | certainly has some way to go.
        
       | prophesi wrote:
       | Any chance of adding documentation for running it locally? venv /
       | requirements.txt at the very least, or a docker image?
        
       | eigenvalue wrote:
       | Tried it on a couple of my wife's songs and she said it was quite
       | inaccurate in terms of chords and tabs (the lyrics were pretty
       | close though). This seems like one of those use cases where it's
       | not particularly useful until it gets above some minimum accuracy
       | threshold.
        
         | herogary wrote:
         | Thanks for sharing your experience. We appreciate the feedback.
         | It's clear that improving accuracy, especially with chords and
         | tabs, is a priority for us. We're committed to enhancing the
         | accuracy of our tool to meet your expectations and provide a
         | more valuable experience.
        
         | CMLab wrote:
         | At this point, I'm conflicted. The lyrics testing went well for
         | me, but I'm not proficient in chords. Are the chord results
         | really that poor? I wonder if I can use it to play my favorite
         | songs in real life.
        
           | CSSer wrote:
           | I play guitar/piano, and I concur that they're not great. I
           | tried "Vagabond (acoustic)" by Wolfmother. I figured it would
           | have an easier time because it's just a vocal and an acoustic
           | guitar. Some of the notes in the tabs are right, but the
           | melody is too simplistic. All of the embellishments are also
           | missing. It's interesting how the mile-high view isn't so bad
           | though. If I sat down to figure out the notes for a track no
           | tabs exist for, this might look like a rough approximation of
           | what I'd start with.
        
       | weinzierl wrote:
       | I tried the open source one that Spotify published a while ago on
       | jazz trio music (just piano, double bass and drums) but it was
       | pretty useless. My experiments with the trials of some commercial
       | services, where you select an instrument and it extracts just
       | that, were much better.
       | 
       | If the piano is a Rhodes then extracting electric guitar works
       | well, extracting a piano not at all.
        
         | herogary wrote:
         | The testing model for guitar separation is currently under
         | development. The test results are somewhat unsatisfactory due
         | to the significant variations in guitar instrument tones,
         | especially for electric guitars. This adds to the difficulty of
         | training.
        
       | adrianh wrote:
       | I tried it on one of my own tunes:
       | 
       | https://lamucal.ai/songs/adrian-holovaty/adrian-holovaty-the...
       | 
       | The beats/chords were consistently a full beat off, and the
       | chords were probably only 50% right. I chose this tune because
       | (to my ears) the harmony is pretty clear.
       | 
       | Compare this to my own manually created transcription of the same
       | tune, and it's night-and-day difference:
       | 
       | https://www.soundslice.com/slices/tpbwc/
       | 
       | Beat detection and chord detection are hard problems, likely due
       | to a lack of diverse training data. Chordify (another site that
       | does this, which has been around for ages) has roughly similar
       | performance.
       | 
       | Full disclosure: I run Soundslice, a website built around synced
       | sheet music, in which there's no automatic transcription involved
       | (maybe someday, but the tech isn't good enough yet!). I've been
       | following these developments for 15+ years.
        
         | herogary wrote:
         | Thanks for your feedback. We're currently in the process of
         | adjusting our dataset and model to address issues with chords
         | and rhythm. We're looking forward to providing you with a
         | better experience in the future.
        
         | luckydata wrote:
         | Sounds like your site could be a great training set. Someone
         | should reach out to you; it sounds like a good business
         | opportunity.
        
       | 999900000999 wrote:
       | Cool project, but it's not very accurate. I wouldn't charge for
       | this yet.
        
         | herogary wrote:
         | Thank you for providing feedback. We will continue to optimize
         | and improve the model.
        
       | CMLab wrote:
       | I tried a few songs and uploaded one of my favorite songs. The
       | lyrics recognition result was excellent, but there were a few
       | instances where some words were missing or incorrect. It would be
       | great if there could be an option to edit the lyrics.
        
         | herogary wrote:
         | We are in the process of adding the lyrics editing feature, as
         | well as chord and rhythm types. It will be released soon.
        
       | atum47 wrote:
       | I've always wanted to implement an FFT from scratch and play with
       | it to separate audio waves, but then a full-time job came along. I
       | guess once you separate vocals from everything else you can just
       | feed them to a speech-to-text model?
       | 
       | To be completely honest, as a human who does not speak English
       | natively, I find some lyrics hard to understand. I've seen native
       | English speakers also having this problem. I think it's only
       | natural for a NN to make the same mistakes.
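
For anyone who wants to try the "FFT from scratch" idea from this comment, a
recursive radix-2 Cooley-Tukey version fits in a few lines of Python. This is
a teaching sketch that assumes the input length is a power of two; it is not
how the project computes its spectrograms.

```python
import cmath
import math

def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])  # transform of even-indexed samples
    odd = fft(samples[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

# A 64-sample frame holding exactly 5 cycles of a sine wave.
n = 64
frame = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
magnitudes = [abs(x) for x in fft(frame)]
peak_bin = max(range(n // 2), key=lambda k: magnitudes[k])
print(peak_bin)  # prints 5
```

Each bin's magnitude says how much of that frequency is present in the frame,
which is the representation that separation masks are applied to.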
        
         | herogary wrote:
         | Source separation is commonly done by applying masks to the
         | spectrogram. Deep learning is used to learn the mask parameters
         | for the different instruments. As you mentioned, this is the
         | approach we will follow in the subsequent steps.
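
The masking idea in this reply can be illustrated on a single frame. The
Python sketch below is a toy under stated assumptions: it uses a naive DFT on
one 64-sample frame instead of a full STFT, and an "oracle" ratio mask
computed from the known sources, standing in for the mask a trained network
would predict from the mixture alone. All names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform of one frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def ratio_mask(target_spec, other_spec, eps=1e-12):
    """Soft mask in [0, 1]: share of each bin's magnitude owned by the target."""
    return [abs(a) / (abs(a) + abs(b) + eps)
            for a, b in zip(target_spec, other_spec)]

n = 64
bass = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]          # bin 3
vocal = [0.5 * math.sin(2 * math.pi * 12 * t / n) for t in range(n)]  # bin 12
mixture = [b + v for b, v in zip(bass, vocal)]

# Oracle mask from the known sources (a model would have to estimate this).
mask = ratio_mask(dft(vocal), dft(bass))
# Apply the mask bin by bin to the mixture spectrum, then go back to samples.
estimate = idft([m * x for m, x in zip(mask, dft(mixture))])
max_error = max(abs(e - v) for e, v in zip(estimate, vocal))
print(max_error)
```

Because the two toy sources occupy different bins, the masked mixture
recovers the "vocal" almost exactly; real instruments overlap in frequency,
which is where the learned mask quality matters.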
        
       | avallach wrote:
       | In the Android app I consistently get "Downloading model file"
       | stuck at exactly -60830200% . Tried clearing data and caches and
       | changing the connection.
        
         | herogary wrote:
         | Thank you for your feedback. Could you please email us the
         | information of your phone model and system version? We will
         | investigate promptly. In the meantime, you can try exiting the
         | program and re-entering to see if that helps. Please also check
         | your network connection.
        
       | may4m wrote:
       | does anyone know if virtualdj.com uses the same technique?
        
         | herogary wrote:
         | It looks like a DAW, and I'm not very familiar with their
         | source separation. The technology we use has been publicly
         | released on GitHub.
        
       | pseudocomposer wrote:
       | I see a lot of "7M" chords in generated outputs, which isn't a
       | type of chord I'm familiar with. Is this meant to be "M7" (major
       | 7)?
        
         | herogary wrote:
         | Yes, it's a major 7th chord.
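
Until the labels change, the "7M" spelling can be rewritten on the reader's
side. A small Python sketch, assuming the generated labels have the form root
plus "7M" (the thread does not show the exact output format, and the function
name is hypothetical):

```python
import re

def normalize_major_seventh(chord):
    """Rewrite a trailing-M major-seventh label such as 'C7M' as 'Cmaj7'."""
    match = re.fullmatch(r"([A-G][#b]?)7M", chord)
    if match:
        return match.group(1) + "maj7"
    return chord  # anything else is left untouched

print([normalize_major_seventh(c) for c in ["C7M", "F#7M", "G7", "Am7"]])
# prints ['Cmaj7', 'F#maj7', 'G7', 'Am7']
```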
        
       | johnmaguire wrote:
       | My partner plays violin and was looking for a version of Lindsey
       | Stirling's Crystallize without the violin part (or rather, with
       | it turned down.)
       | 
       | I found plenty of tools online to do this, but they were all
       | credit-based and a bit annoying to use. I eventually found they
       | were mostly using Demucs from Facebook:
       | https://github.com/facebookresearch/demucs
       | 
       | Really nice tool if you need simple splitting of things like
       | drums, vocals, etc. It's not perfect, but it's a great start.
        
         | herogary wrote:
         | Yes, the Demucs mixing model has excellent SNR performance, but
         | it is computationally intensive. It also incorporates a random
         | mechanism, so each time it produces different spectrograms.
        
         | sevagh wrote:
         | You can run Demucs directly in your browser (through WASM) on
         | my website: https://freemusicdemixer.com/
         | 
         | No usage credits or cost, since it's all on your computer.
        
       | joshuak wrote:
       | I would think that by comparison to image models synthetic data
       | would be relatively easy to generate for audio model training.
       | I'm curious then why it continues to be so difficult to build a
       | nearly flawless audio separation model. Is synthetic data being
       | widely used? Is it just too hard of a problem to train even with
       | this data? I don't have a good sense of what the most challenging
       | aspects are of audio models.
        
         | herogary wrote:
         | Unlike images, audio signals are time-dependent and have
         | complex temporal dynamics, making it more challenging to
         | generate realistic synthetic data that captures the nuances of
         | real-world audio. Meanwhile, the complex nature of audio
         | signals, the scarcity of high-quality training data, and the
         | subjective evaluation of audio quality collectively contribute
         | to the ongoing challenges in building near-flawless audio
         | separation models.
        
       | crtified wrote:
       | It doesn't seem very well pleased with The Dance of Eternity by
       | Dream Theater. But it was quite impressive in some aspects with
       | the likes of Nirvana.
       | 
       | Future music students are going to be fortunate to have these
       | kinds of tools. An instant split and full analysis of any song.
       | Remixes and backing tracks on tap, etc.
       | 
       | That said, we older students learned a lot from the process of
       | doing this manually ourselves. Those lessons can still be learned
       | and others besides, but the dynamics of the learning change with
       | the tech.
        
       | corn-cheese wrote:
       | Finally, a new development in lyric transcription! I've been
       | waiting to see this for ages!
       | 
       |     def get_lyrics(waveform, sr, cfg):
       |         # asr and wav2vec2
       |         raise NotImplementedError()
        
       ___________________________________________________________________
       (page generated 2024-03-26 23:00 UTC)