[HN Gopher] Audio Decomposition - open-source separation of musi...
       ___________________________________________________________________
        
        Audio Decomposition - open-source separation of music to
       constituent instruments
        
       Author : thunderbong
       Score  : 255 points
       Date   : 2024-11-10 03:57 UTC (19 hours ago)
        
 (HTM) web link (matthew-bird.com)
 (TXT) w3m dump (matthew-bird.com)
        
       | DidYaWipe wrote:
       | Some of those videos don't have audio, as far as I can tell...
        
         | tjoff wrote:
          | The YouTube links explain why: "No audio as a result of
          | copyright." They also link to the audio so that you can play
          | it alongside.
        
       | bottom999mottob wrote:
        | This is really cool, but there's real-world instrument physics
        | that might not be captured by simple Fourier transform templates:
        | a trumpet playing softly can have a significantly different
        | harmonic spectrum than the same trumpet playing loudly, even at
        | the same pitch.
       | 
        | Trumpets produce a rich harmonic series with strong overtones,
        | meaning their Fourier transform would show prominent peaks at
        | integer multiples of the fundamental frequency. Instruments like
        | flutes have purer tones, but brass instruments typically have
        | stronger higher harmonics, which would lead to more complex
        | partial derivatives in the matrix equation shown in the article.
       | 
        | So this script uses bandpass filtering and cross-correlation of
        | attack/release envelopes to identify note timing. Given that
        | brass instruments can exhibit non-linear behavior where the
        | harmonic content changes significantly with playing intensity
        | (think of the brightness difference between pp and ff passages),
        | I'm not sure how this algorithm would handle intensity-dependent
        | timbral variations. I'd consider adding intensity-dependent
        | Fourier templates for each instrument to improve accuracy.
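        | 
        | A minimal sketch of that idea (my own illustration, not the
        | article's code; the synthetic notes and rolloff values are
        | stand-ins for real recordings of each instrument at several
        | dynamics):
        | 
        |     import numpy as np
        | 
        |     SR = 44100
        | 
        |     def synth_note(f0, rolloff, dur=1.0):
        |         """Toy stand-in for a recorded note: a harmonic
        |         series whose upper partials decay at the given
        |         rate (a slower rolloff sounds brighter)."""
        |         t = np.arange(int(SR * dur)) / SR
        |         return sum(np.exp(-rolloff * k)
        |                    * np.sin(2 * np.pi * f0 * k * t)
        |                    for k in range(1, 17))
        | 
        |     def template(note):
        |         """Normalized magnitude spectrum for matching."""
        |         spec = np.abs(np.fft.rfft(note))
        |         return spec / (np.linalg.norm(spec) + 1e-12)
        | 
        |     # One template per (instrument, dynamic) pair instead
        |     # of a single spectrum per instrument.
        |     templates = {
        |         ("trumpet", "pp"):
        |             template(synth_note(440, rolloff=0.8)),
        |         ("trumpet", "ff"):
        |             template(synth_note(440, rolloff=0.2)),
        |     }
        | 
        |     def classify(frame):
        |         """Pick the (instrument, dynamic) whose template
        |         is closest in cosine similarity to the frame."""
        |         spec = template(frame)
        |         return max(templates,
        |                    key=lambda k: spec @ templates[k])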
        
         | atoav wrote:
          | As someone who uses source separation twice a week for mixing
          | purposes, I can say the number of other instruments that can
          | produce sounds of "vocal" quality is high. These models all
          | stop functioning well when you have bands where the
          | instruments don't sound typical and aren't played and/or mixed
          | in a way that achieves maximum separation between them -- e.g.
          | an electric guitar with a distorted harmonic hitting the same
          | note as your singer while the drummer plays only shrieking
          | noises on their cymbals and the bass player simulates a
          | punching kick drum on their instrument.
         | 
          | In these situations (experimental music) source separation will
          | produce completely unpredictable results that may or may not
          | be useful for musical rebalancing.
        
           | fnordlord wrote:
           | What tool do you use for the source separation? Everything
           | I've used so far is great for learning or transcribing to
           | MIDI but the separated tracks always have a strange phasing
           | sound to them. Are you doing something to clean that up
           | before mixing back in or are the results already good enough?
        
             | atoav wrote:
              | iZotope RX with Music Rebalance; it's great for reducing
              | drum spill from vocal mics.
        
       | ipsum2 wrote:
       | I must be dumb, but none of the YouTube video demos are
       | demonstrating source separation?
       | 
        | Edit: to clarify, source separation in audio research means
        | separating the audio into separate clips, one per source.
        
         | atoav wrote:
          | I think decomposition is the right word; "source separation"
          | in this case (misleadingly) refers to the fact that the
          | decomposed notes can be separated into different sources.
        
         | wkjagt wrote:
         | The "source" here goes with "open source".
        
       | timlod wrote:
       | The title is a bit confusing as open-source separation of ...
        | reads like source separation, which this is not. Rather, it is a
        | pitch detection algorithm which also identifies the instrument
        | the pitch came from.
       | 
        | I think it's really neat, but the results look like it could take
        | more time to fix the output than a manual approach would (if
        | really accurate results are required).
        
         | earthnail wrote:
         | Thanks for clarifying.
         | 
          | In fairness to the author, he is still in high school:
         | https://matthew-bird.com/about.html
         | 
         | Amazing work for that age.
        
           | timlod wrote:
           | Wow, I didn't see that. Great to see this level of interest
           | early on!
        
           | veunes wrote:
           | He's definitely a talent to watch!
        
         | emptiestplace wrote:
         | No, it doesn't read like that. The hyphen completely eliminates
         | any possible ambiguity.
        
           | croes wrote:
           | Maybe added later by OP? Because there is no hyphen in the
           | article's subtitle.
           | 
           | >Open source seperation of music into constituent
           | instruments.
        
             | emptiestplace wrote:
             | The complaint:
             | 
             | > The title is a bit confusing as open-source separation of
             | ... reads like source separation, which this is not.
        
           | ipsum2 wrote:
            | The title of the submission was modified. If you read the
           | article it says:
           | 
           | Audio Decomposition [Blind Source Seperation]
        
         | TazeTSchnitzel wrote:
         | Is "source separation" better known as "stem separation" or is
         | that something else? I think the latter term is the one I
         | usually hear from musicians who are interested in taking a
         | single audio file and recovering (something approximating) the
         | original tracks prior to mixing (i.e. the "stems").
        
           | timlod wrote:
           | Audio Source Separation I think is the general term used in
           | research. It is often applied to musical audio though, where
           | you want to do stem separation - that's source separation
           | where you want to isolate audio stems, a term referring to
           | audio from related groups of signals, e.g. drums (which can
           | contain multiple individual signals, like one for each
           | drum/cymbal).
        
           | Earw0rm wrote:
            | Stem separation refers to doing it with audio playback
            | fidelity (or an attempt at that). So it should pull the bass
            | part out at high enough fidelity to be reused as a bass part.
            | 
            | This is a partly solved problem right now. Some tracks and
            | signal types can be unmixed more easily than others; it
            | depends on what the sources are and how much post-processing
            | (reverb, side-chaining, heavy brick-wall limiting and so on)
            | has been applied.
        
             | dylan604 wrote:
             | > This is a partly solved problem right now.
             | 
              | I'd agree with the partly. I have yet to find one that
              | either isolates an instrument as a separate file or
              | removes one from the rest of the mix without negatively
              | impacting the sound. The common issues I hear are similar
              | to early low-bitrate internet audio compression. The new
              | "AI" versions are really bad at this, but even the ones
              | available before the AI craze were still susceptible.
        
       | baq wrote:
        | Got a flashback of playing Audiosurf 15 or so years ago. Time
       | flies.
       | 
       | https://en.wikipedia.org/wiki/Audiosurf
        
       | fxj wrote:
        | If you are interested in audio (or stem) separation, have a look
        | at RipX
       | 
       | https://hitnmix.com/ripx-daw-pro/
       | 
        | It can even export the separated tracks as MIDI files. It still
        | has some problems but works very well. Stem separation is now
        | standard in music software, and almost every DAW provides it.
        
         | makz wrote:
         | Thanks for the information. I'm a long time Logic Pro user and
         | I wasn't aware of this feature.
        
           | Sporktacular wrote:
           | On an M1/2/3/4 processor. Not Intel.
        
         | sbarre wrote:
          | Stemroller[0] has been around for a while too; it's free and
          | based on Meta's models:
         | 
         | 0: https://www.stemroller.com/
        
           | cloudking wrote:
            | I've heard Meta's Demucs is SOTA; has anything better come
            | out since?
        
         | oidar wrote:
         | > almost every DAW provides it.
         | 
          | It's an up-and-coming feature that nearly every DAW should
          | have, but most don't yet.
         | 
         | Ableton Live - No
         | 
          | Bitwig - No
         | 
         | Cubase - No
         | 
         | FL - Yes
         | 
         | Logic - Yes
         | 
         | Pro Tools - No
         | 
         | Reason - No
         | 
         | Reaper - No
         | 
         | Studio One - Yes
        
           | fxj wrote:
           | MPC3 - Yes
           | 
           | Mixcraft - Yes
           | 
           | Maschine3 - Yes
        
         | tasty_freeze wrote:
         | RipX can do stem separation and allows repitching notes in the
         | mix. If that is what you want to do it is great.
         | 
         | I find moises (https://moises.ai/) to be easy to use for the
         | tasks I need to do. It allows transposing or time scaling the
         | entire song. It does stem separation and has a simple interface
         | for muting and changing the volume on a per-track basis. It
         | auto-detects the beat and chords.
         | 
          | I'm not affiliated, just a happy nearly-daily user for learning
          | and practicing songs. I boost the original bass part and put
          | everything else at < 10% volume to hear the bass part clearly
          | (which often shows how bad online transcriptions are, even
          | paid ones). Once I know the part, I mute the bass part and
          | play along with the original song as if I were the bass
          | player.
        
         | antback wrote:
         | It appears to be related to Polymath.
         | 
         | https://github.com/samim23/polymath
         | 
          | Polymath is effective at isolating and extracting individual
          | instrument tracks from MP3s, and it works very well.
        
       | ekianjo wrote:
        | Looks like this may be the work of Joshua Bird's little brother
        | (?). Joshua Bird has already done some impressive projects that
        | were featured on HN before: https://www.youtube.com/@joshuabird333
        
       | generalizations wrote:
       | No one else is going to mention that "separation" was misspelled
       | four times?
        
         | orbitingpluto wrote:
         | If we can all hear the tiny violin, who cares?
        
           | generalizations wrote:
            | Degradation of the environment.
            | https://en.wikipedia.org/wiki/Broken_windows_theory#Theoreti...
        
       | bastloing wrote:
       | I can't find the source code, but the project looks interesting.
        
         | ssttoo wrote:
         | There's a GitHub link right below the videos
         | https://github.com/mbird1258/Audio-Decomposition
        
           | bastloing wrote:
           | Thanks! Nice! This kid is pretty sharp, can't wait to see
           | what else he does!
        
       | loubbrad wrote:
       | I didn't see it referenced directly anywhere in this post.
       | However, for those interested, automatic music transcription
       | (i.e., audio->MIDI) is actually a decently sized subfield of deep
       | learning and music information retrieval.
       | 
       | There have been several successful models for multi-track music
       | transcription - see Google's MT3 project
        | (https://research.google/pubs/mt3-multi-task-multitrack-music...).
        | In the case of piano transcription, accuracy is nearly flawless
        | at this point, even for very low-quality audio:
       | 
       | https://github.com/EleutherAI/aria-amt
       | 
       | Full disclaimer: I am the author of the above repo.
        
         | WiSaGaN wrote:
         | How does the problem simplify when it's restricted to piano?
        
           | loubbrad wrote:
           | Essentially, the leading way to do automatic music
           | transcription is to train a neural network on supervised
           | data, i.e., paired audio-MIDI data. In the case of piano
           | recordings, there is a very good dataset for this task which
           | was released by Google in 2018:
           | 
           | https://magenta.tensorflow.org/datasets/maestro
           | 
            | Most current research involves refining deep-learning-based
            | approaches to this task. When I worked on this problem
           | earlier this year, I was interested in adding robustness to
           | these models by training a sort of musical awareness into
           | them. You can see a good example of it in this tweet:
           | 
           | https://x.com/loubbrad/status/1794747652191777049
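            | 
            | For a concrete picture of what "paired audio-MIDI data"
            | means, a minimal sketch (my own, not MAESTRO's tooling; the
            | file paths are hypothetical, and 229 mel bins is just a
            | common choice in transcription work):
            | 
            |     import librosa
            |     import numpy as np
            |     import pretty_midi
            | 
            |     SR, HOP = 16000, 512   # ~31 frames per second
            | 
            |     def training_pair(wav_path, midi_path):
            |         """One supervised example: log-mel frames
            |         paired with an 88-key piano roll."""
            |         audio, _ = librosa.load(wav_path, sr=SR)
            |         mel = librosa.feature.melspectrogram(
            |             y=audio, sr=SR, hop_length=HOP,
            |             n_mels=229)
            |         x = np.log1p(mel).T        # (frames, mels)
            | 
            |         midi = pretty_midi.PrettyMIDI(midi_path)
            |         roll = midi.get_piano_roll(fs=SR / HOP)
            |         y = (roll[21:109] > 0).T   # (frames, 88 keys)
            | 
            |         n = min(len(x), len(y))    # align the clocks
            |         return x[:n], y[:n]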
        
         | bravura wrote:
         | I know the reported scores of MT3 are very good, but have you
         | had success with using it yourself?
         | 
         | https://replicate.com/turian/multi-task-music-transcription
         | 
          | I ported their Colab to Replicate so I could use it more
          | easily.
         | 
         | The MIDI output is... puzzling?
         | 
         | I've tried feeding it even simple stems and found the output
         | unusable for some tracks, i.e. the MIDI output and audio were
         | not well aligned and there were timing issues. On other audio
         | it seemed to work fine.
        
           | loubbrad wrote:
            | Multi-track transcription has a long way to go before it's
            | seriously useful for real-world applications. Ultimately I
            | think that converting audio into MIDI makes a lot more sense
            | for piano/guitar transcription than it does for complex
            | multi-instrument works with sound effects, etc.
           | 
           | Luckily for me, audio-to-seq approaches do work very well for
           | piano, which turns out to be an amazing way of getting
           | expressive MIDI data for training generative models.
        
           | air217 wrote:
            | I developed https://pyaar.ai; it uses MT3 under the hood. I
            | realized that continuous string instruments (guitar) with
            | things like slides and bends are quite difficult to capture
            | in MIDI. Piano works much better because it's more discrete
            | (the keys abstract away the strings), and so the MIDI file
            | has a better representation.
        
             | duped wrote:
              | > I realized that continuous string instruments (guitar)
              | with things like slides and bends are quite difficult to
              | capture in MIDI.
             | 
             | It's just pitch bend?
             | 
              | I think trying to transcribe as MIDI is a fundamentally
              | flawed approach that has too many (well-known) pitfalls to
              | be useful.
             | 
             | A trained human can listen to a piece and transcribe it in
             | seconds, but programming it as MIDI could take
             | minutes/hours. If you're not trying to replicate how humans
             | learn by ear, you're probably approaching this wrong.
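              | 
              | To make that concrete: a single bend is indeed easy to
              | write (a sketch with mido; it assumes the common +/-2
              | semitone bend range). The catch is that pitchwheel
              | applies to the whole channel, so notes bent
              | independently need one channel per string, which is
              | where plain MIDI gets awkward for guitar:
              | 
              |     import mido
              | 
              |     mid = mido.MidiFile()
              |     track = mido.MidiTrack()
              |     mid.tracks.append(track)
              | 
              |     # Strike a note, then ramp pitchwheel up to
              |     # +8191: a whole step at a +/-2 semitone
              |     # bend range.
              |     track.append(mido.Message('note_on', note=52,
              |                               velocity=90, time=0))
              |     for i in range(1, 9):
              |         track.append(mido.Message('pitchwheel',
              |                                   pitch=8191 * i // 8,
              |                                   time=30))
              |     track.append(mido.Message('note_off', note=52,
              |                               time=240))
              |     mid.save('bend.mid')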
        
         | Earw0rm wrote:
          | He's trying to solve a second (also hard-ish) problem as well:
          | deriving an accurate musical score from MIDI data. It's a
          | "sounds easy but isn't" problem, especially since audio-to-
          | MIDI transcribers are great at pitch and onset times, but
          | rather less reliable at duration and velocity.
        
           | loubbrad wrote:
            | I agree that the audio->score and MIDI->score problems are
            | quite hard. There has been research in this area too;
            | however, it is far less developed than audio->MIDI.
        
             | Earw0rm wrote:
             | That's because MIDI doesn't contain all the information
             | that was in a score.
             | 
             | Scores are interpreted by musicians to create a
             | performance, and MIDI is a capture of (some of) the data
             | about that performance. Music engraving is full of implicit
             | and explicit cultural rules, and getting it _right_ has
             | parallels with handwritten kanji script in terms of both
              | the importance of correctness to the reader and the number
              | of traps for the unwary or uncultured.
             | 
             | All of which can be taken to mean "classical musicians are
             | incredibly picky and anal about this stuff", or, "well-
             | formed music notation conveys all sorts of useful
             | contextual information beyond simply 'what note to play
             | when'".
        
               | pclmulqdq wrote:
               | A lot of modern scores are written with MIDI in mind
               | (whether or not the composer knows it - that's how they
               | hear it the first 50 or so times). That should make it
               | somewhat easier to go MIDI -> score for similar pieces.
               | Current attempts I have seen still make a lot of stupid
               | errors like making note durations too precise and
               | spelling accidentals badly. There's probably still a lot
               | of low-hanging fruit.
               | 
               | This is absolutely not easy, though, given all the
               | cultural context. Things like picking up a "legato" or
               | "cantabile" marking and choosing an accent vs a dagger or
               | a marcato mark are going to be very difficult no matter
               | what.
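                | 
                | The duration problem at least has an obvious
                | first step. A minimal sketch of grid quantization
                | (my own illustration; real engraving needs far
                | more, e.g. tuplets and voice separation):
                | 
                |     from fractions import Fraction
                | 
                |     def quantize(notes, tpb,
                |                  grid=Fraction(1, 4)):
                |         """Snap (onset, duration, pitch)
                |         tuples, in MIDI ticks, to a
                |         sixteenth-note grid (1/4 beat)."""
                |         step = int(tpb * grid)
                |         out = []
                |         for on, dur, pitch in notes:
                |             q_on = round(on / step) * step
                |             q_dur = max(step,
                |                         round(dur / step) * step)
                |             out.append((q_on, q_dur, pitch))
                |         return out
                | 
                |     # Played slightly early, held slightly long:
                |     print(quantize([(118, 470, 60)], tpb=480))
                |     # -> [(120, 480, 60)]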
        
       | kasajian wrote:
       | dude can't spell
        
         | berbec wrote:
          | He's in high school and pulls off a project like this. I
          | thought I was slick convincing the 7-11 guy to give me my
          | Twist-a-Pepper soda without charging me bottle deposit or tax.
        
       ___________________________________________________________________
       (page generated 2024-11-10 23:00 UTC)