[HN Gopher] Audio Decomposition - open-source seperation of musi...
___________________________________________________________________
Audio Decomposition - open-source seperation of music to
constituent instruments
Author : thunderbong
Score : 255 points
Date : 2024-11-10 03:57 UTC (19 hours ago)
(HTM) web link (matthew-bird.com)
(TXT) w3m dump (matthew-bird.com)
| DidYaWipe wrote:
| Some of those videos don't have audio, as far as I can tell...
| tjoff wrote:
| The YouTube links explain why: "No audio as a result of
| copyright." They also link to the audio, which you can play
| alongside.
| bottom999mottob wrote:
| This is really cool, but there's real-world instrument physics
| that might not be captured by simple Fourier transform templates.
| For example, a trumpet playing softly can have a significantly
| different harmonic spectrum than the same trumpet playing loudly,
| even at the same pitch
|
| Trumpets produce a rich harmonic series with strong overtones,
| meaning their Fourier transform would show prominent peaks at
| integer multiples of the fundamental frequency. Instruments like
| flutes have more pure tones, but brass instruments typically have
| stronger higher harmonics, which would lead to more complex
| partial derivatives in the matrix equation shown in the article
|
| So this script uses bandpass filtering and cross-correlation of
| attack/release envelopes to identify note timing. Given that
| brass instruments can exhibit non-linear behavior where the
| harmonic content changes significantly with playing intensity
| (think of the brightness difference between pp and ff passages),
| I'm not sure how this algorithm could handle intensity-
| dependent timbral variations. I'd consider adding intensity-
| dependent Fourier templates for each instrument to improve
| accuracy
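The intensity-dependent templates proposed above can be sketched in a few lines. This is a toy illustration, not the article's code: `spectral_template` and `best_template_match` are hypothetical names, and the "pp"/"ff" tones are synthesized with made-up harmonic weights purely to show a cosine-similarity match preferring the brighter template.

```python
import numpy as np

def spectral_template(signal, n_fft=4096):
    """Magnitude spectrum of a note recording, normalized to unit energy."""
    mag = np.abs(np.fft.rfft(signal, n=n_fft))
    return mag / (np.linalg.norm(mag) + 1e-12)

def best_template_match(note, templates):
    """Pick the (instrument, intensity) template most similar to `note`.

    `templates` maps (instrument, intensity) -> unit-norm spectrum,
    e.g. {("trumpet", "pp"): ..., ("trumpet", "ff"): ...}.
    """
    query = spectral_template(note)
    scores = {key: float(query @ tmpl) for key, tmpl in templates.items()}
    return max(scores, key=scores.get), scores

# Toy demo: a "soft" trumpet with weak upper harmonics vs. a "loud"
# one with strong upper harmonics, then match a loud-ish note.
sr, f0 = 44100, 440.0
t = np.arange(sr) / sr

def tone(harmonic_weights):
    # Sum of sinusoids at integer multiples of the fundamental.
    return sum(w * np.sin(2 * np.pi * f0 * (k + 1) * t)
               for k, w in enumerate(harmonic_weights))

templates = {
    ("trumpet", "pp"): spectral_template(tone([1.0, 0.3, 0.1])),
    ("trumpet", "ff"): spectral_template(tone([1.0, 0.9, 0.8, 0.6])),
}
best, scores = best_template_match(tone([1.0, 0.8, 0.7, 0.5]), templates)
# `best` is ("trumpet", "ff"): the bright harmonics match the loud template.
```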
| atoav wrote:
| As someone who uses source separation twice a week for mixing
| purposes the number of other instruments that can produce
| sounds of "vocal" quality is high. These models all stop
| functioning well when you have bands where the instruments
| don't sound typical and aren't played and/or mixed in a way
| that achieves maximum separation between them -- e.g. an
| electric guitar with a distorted harmonic hitting the same
| note as your singer while the drummer plays only shrieking
| noises on their cymbals and the bass player simulates a
| punching kick drum on their instrument.
|
| In these situations (experimental music) source separation will
| produce completely unpredictable results, that may or may not
| be useful for musical rebalancing.
| fnordlord wrote:
| What tool do you use for the source separation? Everything
| I've used so far is great for learning or transcribing to
| MIDI but the separated tracks always have a strange phasing
| sound to them. Are you doing something to clean that up
| before mixing back in or are the results already good enough?
| atoav wrote:
| iZotope RX with musical rebalance, great to reduce drum
| spill from vocal mics
| ipsum2 wrote:
| I must be dumb, but none of the YouTube video demos are
| demonstrating source separation?
|
| Edit: to clarify, source separation in audio research means
| separating out the audio into separate clips.
| atoav wrote:
| I think decomposition is the word, source separation in this
| case (misleadingly) refers to the fact that the decomposed
| notes can be separated into different sources.
| wkjagt wrote:
| The "source" here goes with "open source".
| timlod wrote:
| The title is a bit confusing as open-source separation of ...
| reads like source separation, which this is not. Rather, it is a
| pitch detection algorithm which also classifies the instrument
| the pitch originated with.
|
| I think it's really neat, but from the results it looks like it
| could take more time to fix the output than to use a manual
| approach (if really accurate results are required).
| earthnail wrote:
| Thanks for clarifying.
|
| In fairness to the author, he is still at high school:
| https://matthew-bird.com/about.html
|
| Amazing work for that age.
| timlod wrote:
| Wow, I didn't see that. Great to see this level of interest
| early on!
| veunes wrote:
| He's definitely a talent to watch!
| emptiestplace wrote:
| No, it doesn't read like that. The hyphen completely eliminates
| any possible ambiguity.
| croes wrote:
| Maybe added later by OP? Because there is no hyphen in the
| article's subtitle.
|
| >Open source seperation of music into constituent
| instruments.
| emptiestplace wrote:
| The complaint:
|
| > The title is a bit confusing as open-source separation of
| ... reads like source separation, which this is not.
| ipsum2 wrote:
| The title of the submission was modified. If you read the
| article, it says:
|
| Audio Decomposition [Blind Source Seperation]
| TazeTSchnitzel wrote:
| Is "source separation" better known as "stem separation" or is
| that something else? I think the latter term is the one I
| usually hear from musicians who are interested in taking a
| single audio file and recovering (something approximating) the
| original tracks prior to mixing (i.e. the "stems").
| timlod wrote:
| Audio Source Separation I think is the general term used in
| research. It is often applied to musical audio though, where
| you want to do stem separation - that's source separation
| where you want to isolate audio stems, a term referring to
| audio from related groups of signals, e.g. drums (which can
| contain multiple individual signals, like one for each
| drum/cymbal).
| Earw0rm wrote:
| Stem separation refers to doing it with audio playback
| fidelity (or an attempt at that). So it should pull the bass
| part out at high enough fidelity to be reused as a bass part.
|
| This is a partly solved problem right now. Some tracks and
| signal types can be unmixed more easily than others; it depends
| on what the sources are and how much post-processing (reverb,
| side-chaining, heavy brick-wall limiting and so on) has been
| applied.
| dylan604 wrote:
| > This is a partly solved problem right now.
|
| I'd agree with the partly. I have yet to find one that
| either isolates an instrument as a separate file or removes
| one from the rest of the mix that does not negatively
| impact the sound. The common issues I hear are similar to
| the early internet low bit rate compression. The new "AI"
| versions are really bad at this, but even the ones
| available before the AI craze were still susceptible to it.
| baq wrote:
| Got a flashback of playing audiosurf 15 or so years ago. Time
| flies.
|
| https://en.wikipedia.org/wiki/Audiosurf
| fxj wrote:
| If you are interested in audio (or stem) separation have a look
| at RipX
|
| https://hitnmix.com/ripx-daw-pro/
|
| It can even export the separated tracks as midi files. It still
| has some problems but works very well. Stem separation is now
| standard in music software and almost every DAW provides
| it.
| makz wrote:
| Thanks for the information. I'm a long time Logic Pro user and
| I wasn't aware of this feature.
| Sporktacular wrote:
| On an M1/2/3/4 processor. Not Intel.
| sbarre wrote:
| Stemroller[0] has been around for a while too, it's free and
| based on Meta's models:
|
| 0: https://www.stemroller.com/
| cloudking wrote:
| I've heard Meta's Demucs is SOTA, has anything else better
| come out since?
| oidar wrote:
| > almost every DAW provides it.
|
| It's an up-and-coming feature that nearly every DAW should
| have, but most don't yet.
|
| Ableton Live - No
|
| Bitwig - No
|
| Cubase - No
|
| FL - Yes
|
| Logic - Yes
|
| Pro Tools - No
|
| Reason - No
|
| Reaper - No
|
| Studio One - Yes
| fxj wrote:
| MPC3 - Yes
|
| Mixcraft - Yes
|
| Maschine3 - Yes
| tasty_freeze wrote:
| RipX can do stem separation and allows repitching notes in the
| mix. If that is what you want to do it is great.
|
| I find moises (https://moises.ai/) to be easy to use for the
| tasks I need to do. It allows transposing or time scaling the
| entire song. It does stem separation and has a simple interface
| for muting and changing the volume on a per-track basis. It
| auto-detects the beat and chords.
|
| I'm not affiliated, just a happy nearly-daily user for learning
| and practicing songs. I boost the original bass part and put
| everything else at < 10% volume to hear the bass part clearly
| (which often shows how bad online transcriptions are, even paid
| ones). Once I know the part, I mute the bass part and play
| along with the original song as if I were the bass player.
| antback wrote:
| It appears to be related to Polymath.
|
| https://github.com/samim23/polymath
|
| Polymath is effective at isolating and extracting individual
| instrument tracks from MP3s. It works very well.
| ekianjo wrote:
| Looks like this may be the work of Joshua Bird's little brother
| (?). Joshua Bird did some impressive projects already, which were
| featured on HN before: https://www.youtube.com/@joshuabird333
| generalizations wrote:
| No one else is going to mention that "separation" was misspelled
| four times?
| orbitingpluto wrote:
| If we can all hear the tiny violin, who cares?
| generalizations wrote:
| Degradation of the environment.
| https://en.wikipedia.org/wiki/Broken_windows_theory#Theoreti...
| bastloing wrote:
| I can't find the source code, but the project looks interesting.
| ssttoo wrote:
| There's a GitHub link right below the videos
| https://github.com/mbird1258/Audio-Decomposition
| bastloing wrote:
| Thanks! Nice! This kid is pretty sharp, can't wait to see
| what else he does!
| loubbrad wrote:
| I didn't see it referenced directly anywhere in this post.
| However, for those interested, automatic music transcription
| (i.e., audio->MIDI) is actually a decently sized subfield of deep
| learning and music information retrieval.
|
| There have been several successful models for multi-track music
| transcription - see Google's MT3 project
| (https://research.google/pubs/mt3-multi-task-multitrack-music...).
| In the case of piano transcription, accuracy is nearly
| flawless at this point, even for very low-quality audio:
|
| https://github.com/EleutherAI/aria-amt
|
| Full disclaimer: I am the author of the above repo.
| WiSaGaN wrote:
| How does the problem simplify when it's restricted to piano?
| loubbrad wrote:
| Essentially, the leading way to do automatic music
| transcription is to train a neural network on supervised
| data, i.e., paired audio-MIDI data. In the case of piano
| recordings, there is a very good dataset for this task which
| was released by Google in 2018:
|
| https://magenta.tensorflow.org/datasets/maestro
|
| Most current research involves refining deep learning based
| approaches to this task. When I worked on this problem
| earlier this year, I was interested in adding robustness to
| these models by training a sort of musical awareness into
| them. You can see a good example of it in this tweet:
|
| https://x.com/loubbrad/status/1794747652191777049
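The paired audio-MIDI supervision described above usually means turning MIDI note events into frame-level targets that the network learns to predict from spectrogram frames. A minimal sketch of that label-building step (the 100 frames/sec rate and the function name are illustrative, not taken from MAESTRO or the linked repo):

```python
def notes_to_piano_roll(notes, n_frames, fps=100, n_pitches=128):
    """Convert note events into a frame-level piano-roll target.

    `notes` is a list of (pitch, onset_sec, offset_sec) tuples; the
    result is an n_frames x n_pitches grid of 0/1 labels, one row per
    audio frame -- the kind of supervised target a transcription
    network is trained to predict.
    """
    roll = [[0] * n_pitches for _ in range(n_frames)]
    for pitch, onset, offset in notes:
        start = max(int(round(onset * fps)), 0)
        end = min(int(round(offset * fps)), n_frames)
        for frame in range(start, end):
            roll[frame][pitch] = 1
    return roll

# A middle C (MIDI note 60) held from 0.10 s to 0.25 s, labeled at
# 100 frames/sec over a half-second clip:
roll = notes_to_piano_roll([(60, 0.10, 0.25)], n_frames=50)
```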
| bravura wrote:
| I know the reported scores of MT3 are very good, but have you
| had success with using it yourself?
|
| https://replicate.com/turian/multi-task-music-transcription
|
| I ported their Colab to Replicate so I could use it more easily.
|
| The MIDI output is... puzzling?
|
| I've tried feeding it even simple stems and found the output
| unusable for some tracks, i.e. the MIDI output and audio were
| not well aligned and there were timing issues. On other audio
| it seemed to work fine.
| loubbrad wrote:
| Multi-track transcription has a long way to go before it's
| seriously useful for real-world applications. Ultimately I
| think that converting audio into MIDI makes a lot more sense
| for piano/guitar transcription than it does for complex
| multi-instrument works with sound effects etc.
|
| Luckily for me, audio-to-seq approaches do work very well for
| piano, which turns out to be an amazing way of getting
| expressive MIDI data for training generative models.
| air217 wrote:
| I developed https://pyaar.ai; it uses MT3 under the hood. I
| realized that continuous string instruments (guitar) that
| have things like slides, bends are quite difficult to capture
| in MIDI. Piano works much better because it's more discrete
| (the keys abstract away the strings), so the MIDI file is a
| better representation
| duped wrote:
| > I realized that continuous string instruments (guitar)
| that have things like slides, bends are quite difficult to
| capture in MIDI.
|
| It's just pitch bend?
|
| I think trying to transcribe as MIDI is just a
| fundamentally flawed approach that has too many (well
| known) pitfalls to be useful.
|
| A trained human can listen to a piece and transcribe it in
| seconds, but programming it as MIDI could take
| minutes/hours. If you're not trying to replicate how humans
| learn by ear, you're probably approaching this wrong.
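For what it's worth, the "just pitch bend" encoding can be sketched as below; the function name is hypothetical, but the 14-bit, centre-8192 encoding and the per-channel scope are standard MIDI 1.0. The per-channel scope is the real catch for guitar: two strings bending by different amounts need separate channels (the problem MPE addresses).

```python
def semitones_to_pitch_bend(semitones, bend_range=2.0):
    """Map a bend in semitones to a 14-bit MIDI pitch-bend value.

    MIDI pitch bend is a single 14-bit number per *channel*
    (0..16383, centre 8192), scaled by the synth's bend range
    (commonly +/- 2 semitones), so the value is clamped at the ends.
    """
    value = int(round(8192 + (semitones / bend_range) * 8192))
    return max(0, min(16383, value))

# A whole-step (2 semitone) guitar bend pegs the value at the top of
# the range, while no bend sits at the centre:
assert semitones_to_pitch_bend(0.0) == 8192   # no bend: centre
assert semitones_to_pitch_bend(2.0) == 16383  # clamped to 14-bit max
assert semitones_to_pitch_bend(1.0) == 12288  # halfway up the range
```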
| Earw0rm wrote:
| He's trying to solve a second (also hard-ish) problem as well,
| deriving an accurate musical score from MIDI data. It's a
| "sounds easy but isn't" problem, especially when audio to MIDI
| transcribers are great at pitch and onset times, but rather
| less reliable at duration and velocity.
| loubbrad wrote:
| I agree that the audio->score and MIDI->score problems are
| quite hard. There has been research in this area too; however,
| it is far less developed than audio->MIDI.
| Earw0rm wrote:
| That's because MIDI doesn't contain all the information
| that was in a score.
|
| Scores are interpreted by musicians to create a
| performance, and MIDI is a capture of (some of) the data
| about that performance. Music engraving is full of implicit
| and explicit cultural rules, and getting it _right_ has
| parallels with handwritten kanji script in terms of both
| the importance of correctness to the reader, and the number
| of traps for the unwary or uncultured.
|
| All of which can be taken to mean "classical musicians are
| incredibly picky and anal about this stuff", or, "well-
| formed music notation conveys all sorts of useful
| contextual information beyond simply 'what note to play
| when'".
| pclmulqdq wrote:
| A lot of modern scores are written with MIDI in mind
| (whether or not the composer knows it - that's how they
| hear it the first 50 or so times). That should make it
| somewhat easier to go MIDI -> score for similar pieces.
| Current attempts I have seen still make a lot of stupid
| errors like making note durations too precise and
| spelling accidentals badly. There's probably still a lot
| of low-hanging fruit.
|
| This is absolutely not easy, though, given all the
| cultural context. Things like picking up a "legato" or
| "cantabile" marking and choosing an accent vs a dagger or
| a marcato mark are going to be very difficult no matter
| what.
| testoveride wrote:
| Ff
| kasajian wrote:
| dude can't spell
| berbec wrote:
| He's in high school and pulls off a project like this. I thought
| I was slick convincing the 7-11 guy to give me my Twist-a-
| Pepper soda without charging me bottle deposit or tax.
___________________________________________________________________
(page generated 2024-11-10 23:00 UTC)