[HN Gopher] AudioFlux: A C/C++ library for audio and music analysis
       ___________________________________________________________________
        
       AudioFlux: A C/C++ library for audio and music analysis
        
       Author : CMLab
       Score  : 152 points
       Date   : 2024-08-13 13:51 UTC (9 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | bravura wrote:
       | If this is supposed to be used for deep-learning, shouldn't all
       | the transforms be GPU-accelerated torch functions?
        
         | herogary wrote:
         | Maybe for the convenience of mobile usage?
        
         | tgv wrote:
         | By the looks of it, those functions extract features (like
         | frequency peaks). You do that once for a sound. The output
         | could function as input for an NN, in which case it would be a
         | tokenizer for sound.
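
A minimal sketch of the "extract a feature once, then feed it to a model" idea described above, in dependency-free Python. The function names and the 1 kHz test tone are illustrative assumptions, not AudioFlux's API, and the naive DFT stands in for a real FFT.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive O(n^2) DFT magnitudes for bins 0..n//2 of a real-valued frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def peak_frequency(frame, sample_rate):
    """Return the frequency in Hz of the strongest non-DC bin."""
    mags = magnitude_spectrum(frame)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return k * sample_rate / len(frame)


# Hypothetical input: a 1 kHz sine sampled at 8 kHz, 256 samples long.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(256)]
feature = peak_frequency(tone, sample_rate)
```

The single number computed here is the kind of compact feature that could be cached once per sound and reused as model input.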
        
           | bravura wrote:
           | Given what I've seen in audio ML research:
           | 
           | 1) Tuning hyperparameters of your audio preprocessing is a
           | pain if it's a preprocessed CPU step. You have to redo
           | preprocessing every time you want to tune your audio feature
           | hyperparams
           | 
           | 2) It's quite common to use torchaudio spectrograms, etc.
           | purely because they are faster (I can link to a handful of
           | recent high-impact audio ML github repos if you like)
           | 
           | 3) If you use nnAudio, you can actually backprop the STFT or
           | mel filters and tune them if you like. With that said, this
           | is not so commonplace.
           | 
           | 4) Sometimes the audio is GENERATED by a GPU. For example, in
           | a neural vocoder, you decode the audio from a mel to a
           | waveform. Then, you compute the loss over the true versus
           | predicted audio mel spectrograms. You can't do this with these
           | C++ features. (Again, I can link a handful of recent high-
           | impact audio ML github repos if you like.)
           | 
           | Again, I just don't get it.
        
             | codetrotter wrote:
             | > I can link to a handful of recent high-impact audio ML
             | github repos if you like
             | 
             | Yes please :D
        
               | bravura wrote:
               | For instance:
               | 
               | https://github.com/descriptinc/descript-audio-codec/blob/mai...
               | 
               | https://github.com/NVIDIA/BigVGAN/blob/main/loss.py#L23
               | 
               | https://arxiv.org/pdf/2210.13438 (the github repo doesn't
               | include training, just inference)
               | 
               | It is INCREDIBLY common to use multi-scale spectral loss
               | as the audio distance / objective measure in audio
               | generation. They have some issues (i.e. they aren't
               | always well correlated with human perception) but they
               | are the known-current-best.
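
A rough sketch of what a multi-scale spectral loss computes, assuming a log-magnitude L1 distance averaged over a few FFT sizes. It is plain CPU Python with a naive DFT, so it only illustrates the arithmetic; the linked repos do this with differentiable GPU ops. The function names, FFT sizes, hop ratio, and test signals are all illustrative assumptions.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_mag(signal, n_fft, hop):
    """Magnitude STFT without padding: one list of n_fft//2+1 bins per frame."""
    window = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(n_fft)]
        frames.append([
            abs(sum(x * cmath.exp(-2j * math.pi * k * i / n_fft)
                    for i, x in enumerate(frame)))
            for k in range(n_fft // 2 + 1)
        ])
    return frames


def spectral_l1(a, b, n_fft, hop, eps=1e-5):
    """Mean absolute difference of log-magnitude spectrograms at one resolution."""
    total, count = 0.0, 0
    for fa, fb in zip(stft_mag(a, n_fft, hop), stft_mag(b, n_fft, hop)):
        for ma, mb in zip(fa, fb):
            total += abs(math.log(ma + eps) - math.log(mb + eps))
            count += 1
    return total / count


def multi_scale_spectral_loss(a, b, fft_sizes=(32, 64, 128)):
    """Average the single-resolution loss over several FFT sizes (hop = n_fft // 4)."""
    return sum(spectral_l1(a, b, n, n // 4) for n in fft_sizes) / len(fft_sizes)


# Hypothetical signals: a reference tone, a quieter copy, and a different pitch.
sr = 4000
target = [math.sin(2 * math.pi * 440 * i / sr) for i in range(512)]
quieter = [0.9 * x for x in target]
other_pitch = [math.sin(2 * math.pi * 1200 * i / sr) for i in range(512)]

loss_same = multi_scale_spectral_loss(target, target)
loss_quieter = multi_scale_spectral_loss(target, quieter)
loss_other = multi_scale_spectral_loss(target, other_pitch)
```

Small FFT sizes give fine time resolution and large ones give fine frequency resolution, which is the usual motivation for averaging over several scales.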
        
             | tgv wrote:
             | Backpropping filter coefficients sounds clever, but can't
             | you just do that on any layer that takes a spectrum as
             | input?
        
               | bravura wrote:
               | Backpropping filter coefficients is clever, but it hasn't
               | really caught on much. Google also tried with LEAF
               | (https://github.com/google-research/leaf-audio) to have a
               | learnable audio filterbank.
               | 
               | Anyway, in audio ML what is very common is:
               | 
               | a) Futzing with the way you do feature extraction on the
               | input. (Oh, maybe I want CQT for this task or a different
               | scale Mel etc)
               | 
               | b) Doing feature extraction on generated audio output,
               | and constructing loss functions from generated audio
               | features.
               | 
               | So, as I said, I don't exactly see the utility of this
               | library for deep learning.
               | 
               | With that said, it is definitely nice to have really high
               | speed low latency audio algorithms in C++. I just
               | wouldn't market it as "useful for deep learning" because
               | 
               | a) during training, you need more flexibility than non-
               | GPU methods without backprop
               | 
               | b) if you are doing "deep learning" then your inferred
               | model will presumably be quite large, and there will be a
               | million other things you'll need to optimize to get real-
               | time inference or inference on CPUs to work well.
               | 
               | That's just my gut reaction. It seems like a solid project, I
               | just question the one selling point of "useful for deep
               | learning" that's all.
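
One concrete example of the feature-extraction knobs mentioned in the comment above is the mel scale itself. This is a minimal sketch, assuming the HTK-style formula (other variants exist, such as the Slaney one); the function names and parameter values are illustrative.

```python
import math


def hz_to_mel(freq_hz):
    """HTK-style mel scale (an assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_mels, f_min, f_max):
    """Center frequencies in Hz of n_mels bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


# Changing n_mels or the frequency range changes every downstream feature,
# which is why a fixed, precomputed CPU step is awkward to tune.
coarse = mel_center_frequencies(n_mels=8, f_min=0.0, f_max=8000.0)
fine = mel_center_frequencies(n_mels=64, f_min=0.0, f_max=8000.0)
```

Each choice of `n_mels`, `f_min`, and `f_max` yields a different feature layout, so any cached features have to be regenerated whenever one of them changes.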
        
               | Severian wrote:
               | Are there resources you would recommend reading regarding
               | ML and audio?
        
               | bravura wrote:
               | This is a really broad topic. I began studying it about 5
               | years ago.
               | 
               | Can you start by suggesting what task you want to do?
               | I'll throw out some suggestions, but you can say
               | something different. Also you are welcome to email me
               | (email in HN profile):
               | 
               | * Voice conversion / singing voice conversion
               | 
               | * Transcription of audio to MIDI
               | 
               | * Classification / tagging of audio scene
               | 
               | * Applying some effect / cleanup to audio
               | 
               | * Separating audio into different instruments
               | 
               | etc
               | 
               | The really quick summary of audio ML as a topic is:
               | 
               | * Often people treat audio ML as vision ML, by using
               | spectrogram representations of audio. Nonetheless, 1D
               | models are sometimes just as good if not better, but they
               | require very specific familiarity with the audio domain.
               | 
               | * Audio distance measures (loss functions) are pretty
               | crappy and not well-correlated with human perception. You
               | can say the same thing about vision distance measures,
               | but a lot more research has gone into vision models so we
               | have better heuristics around vision stuff. With that
               | said, multi-scale log mel spectrogram isn't that
               | terrible.
               | 
               | * Audio has a handful of little gotchas around padding,
               | windowing, etc.
               | 
               | * DSP is a black art and DSP knowledge has high ROI
               | versus just being dumb and black boxy about everything.
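
Two of the padding and windowing gotchas mentioned above, as a small sketch. It assumes the common "center" convention of padding `n_fft // 2` samples on each side (the default in librosa and `torch.stft`) and contrasts periodic with symmetric Hann windows; the helper names and example sizes are illustrative.

```python
import math


def num_frames(n_samples, n_fft, hop, center):
    """Number of STFT frames for a signal of n_samples.

    center=True assumes n_fft // 2 samples of padding on each side.
    """
    if center:
        n_samples += 2 * (n_fft // 2)
    if n_samples < n_fft:
        return 0
    return 1 + (n_samples - n_fft) // hop


def hann(n, periodic):
    """Hann window; periodic divides by n, symmetric divides by n - 1."""
    denom = n if periodic else n - 1
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / denom) for i in range(n)]


# Same audio, same n_fft and hop, different frame counts depending on padding.
frames_centered = num_frames(16000, 1024, 256, center=True)
frames_unpadded = num_frames(16000, 1024, 256, center=False)

# The last sample differs: symmetric windows end at zero, periodic ones do not.
last_symmetric = hann(8, periodic=False)[-1]
last_periodic = hann(8, periodic=True)[-1]
```

With these example sizes the centered variant yields 63 frames and the unpadded one 59, so features computed under different conventions will not line up frame for frame.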
        
       | nesarkvechnep wrote:
       | What's this C/C++ language?
        
         | n4r9 wrote:
         | Some preliminary analysis suggests that if C is an integer
         | greater than 1, C/C++ will always evaluate to 1 [0].
         | 
         | [0] https://www.programiz.com/online-compiler/9fkHTct0Mybpu
        
         | troymc wrote:
         | 1/++
        
           | rossant wrote:
           | or 1++ if you got your C operator precedence wrong
        
         | iExploder wrote:
         | It means you are using C++ but only the features that don't
         | suck, so probably 10% of the language; the rest is plain C
        
       | dsego wrote:
       | It's also for Python, I just discovered it a few days ago. This
       | is the website https://audioflux.top/
        
       | jcelerier wrote:
       | It would be nice to have a comparison with any of the many C++
       | MIR (music information retrieval) libraries in the wild:
       | 
       | - https://essentia.upf.edu/
       | 
       | - https://github.com/marsyas/marsyas
       | 
       | - https://github.com/ircam-ismm/pipo
       | 
       | - https://github.com/flucoma/flucoma-core/tree/main/include/al...
        
         | BrannonKing wrote:
         | If a person wanted to transcribe sheet music from recorded
         | audio, do you know which library and features would be the best
         | starting point?
        
           | bckr wrote:
           | Start with source separation using demucs
        
           | bravura wrote:
           | I have had mixed luck with this model, which is supposedly
           | state-of-the-art: https://github.com/magenta/mt3
           | 
           | What kind of music are you trying to transcribe?
           | 
           | Feel free to email me.
        
           | cpdomina wrote:
           | For MIDI:
           | 
           | https://github.com/Music-and-Culture-Technology-Lab/omnizart
           | and https://basicpitch.spotify.com/
           | 
           | They work better if you apply some source separation before
           | (e.g., https://github.com/sigsep/open-unmix-pytorch,
           | https://github.com/facebookresearch/demucs, or
           | https://mvsep.com)
           | 
           | Still, I think the best results are from proprietary models
           | (specifically
           | https://www.ableton.com/en/manual/converting-audio-to-midi/ and
           | https://www.celemony.com/en/melodyne/what-is-melodyne)
        
       | dekken_ wrote:
       | It's C and Python, not C++
        
         | dsego wrote:
         | C can be used in C++ code, no?
        
           | Galanwe wrote:
           | Something very close, but that's not what you would expect
           | for something that markets itself as a C++ library IMHO.
           | Especially in 2024, most people would hope (or assume) that
           | "C++" means "C++ 11" at least.
           | 
           | Definitely doesn't count as _lying_, but still underwhelming.
        
           | dekken_ wrote:
           | Depends, not all C is C++, e.g., there is no `restrict` keyword
           | in C++ yet (even if lots of C++ compilers support
           | __restrict__, it's not in the spec)
        
           | bregma wrote:
           | Yes. And C can also be used with Python and Rust. That does
           | not make this a Rust library.
        
             | dsego wrote:
             | Right, but C++ started as an extension of C and is mostly
             | compatible and historically you could compile C with the
             | C++ compiler. I don't think it's a good comparison.
        
               | codetrotter wrote:
               | Zig can compile C. That makes this C/C++/Zig library.
               | Right? :^)
        
               | jcelerier wrote:
               | > historically you could compile C with the C++ compiler.
               | 
               | not any C, only the C++-compatible subset.
               | int* foo = malloc(sizeof(int));
               | 
               | has never worked in C++ for instance while it's valid C.
               | Code that worked is code that people actually made an effort
               | to express in a way compatible with a C++ compiler.
        
           | epcoa wrote:
           | It is true that there is C code that is conforming C++ code.
           | However I would say if you're using a _C_ compiler with
           | "extern C" in the headers for C++ linker compatibility (as
           | this library does) then saying C++ is about as misleading as
           | saying a Rust library is C++ as you can link to that too.
           | 
           | As far as compatibility and "history" the languages are
           | different enough now. There are both: features in C that do
           | not exist in C++, and code that is conforming C that would be
           | UB in C++. Saying C/C++ (for real) is usually a dumb target
           | when it's better to pick one and settle with that.
           | 
           | If it's C, just say so. Everyone knows what extern C is, you
           | don't need to confuse.
        
             | leonardohn wrote:
             | Even Pascal is closer to C than C++ is, yet historically
             | people use this term implying they are very close.
        
         | textlapse wrote:
         | All squares are rectangles I guess.
        
       | morning-coffee wrote:
       | Do you have one in safe Rust? See, we've only just met, and I
       | don't know how you handle your ptr/len arguments in C just yet.
       | ;)
        
       | gosub100 wrote:
       | Can this be used for audio fingerprinting?
        
       | BrannonKing wrote:
       | So are they going for feature parity with librosa? I think that
       | would be great.
        
       ___________________________________________________________________
       (page generated 2024-08-13 23:00 UTC)