[HN Gopher] AudioFlux: A C/C++ library for audio and music analysis
___________________________________________________________________
AudioFlux: A C/C++ library for audio and music analysis
Author : CMLab
Score : 152 points
Date : 2024-08-13 13:51 UTC (9 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| bravura wrote:
| If this is supposed to be used for deep-learning, shouldn't all
| the transforms be GPU-accelerated torch functions?
| herogary wrote:
| Maybe for the convenience of mobile usage?
| tgv wrote:
| By the looks of it, those functions extract features (like
| frequency peaks). You do that once for a sound. The output
| could function as input for an NN, in which case it would be a
| tokenizer for sound.
| bravura wrote:
| Given what I've seen in audio ML research:
|
| 1) Tuning the hyperparameters of your audio preprocessing is
| a pain if it's a separate CPU step: you have to redo the
| preprocessing every time you want to tune your audio feature
| hyperparameters.
|
| 2) It's quite common to use torchaudio spectrograms, etc.
| purely because they are faster (I can link to a handful of
| recent high-impact audio ML github repos if you like)
|
| 3) If you use nnAudio, you can actually backprop the STFT or
| mel filters and tune them if you like. With that said, this
| is not so commonplace.
|
| 4) Sometimes the audio is GENERATED by a GPU. For example, in
| a neural vocoder, you decode the audio from a mel to a
| waveform. Then you compute the loss over the true versus
| predicted audio mel spectrograms (see the sketch at the end
| of this comment). You can't do this with these C++ features.
| (Again, I can link a handful of recent high-impact audio ML
| github repos if you like.)
|
| Again, I just don't get it.
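|
| To make points 2 and 4 concrete, here is a rough sketch of
| the kind of thing I mean with torchaudio (parameter values
| are placeholders, and it assumes a CUDA device):
|
|     import torch
|     import torchaudio
|
|     # Placeholder settings; real vocoders pick these per dataset.
|     mel = torchaudio.transforms.MelSpectrogram(
|         sample_rate=22050, n_fft=1024, hop_length=256, n_mels=80
|     ).to("cuda")  # the transform lives on the GPU next to the model
|
|     def mel_loss(pred_wav, true_wav, eps=1e-5):
|         # Differentiable log-mel L1 distance: gradients flow back
|         # through the vocoder that generated pred_wav.
|         return torch.nn.functional.l1_loss(
|             torch.log(mel(pred_wav) + eps),
|             torch.log(mel(true_wav) + eps))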
| codetrotter wrote:
| > I can link to a handful of recent high-impact audio ML
| github repos if you like
|
| Yes please :D
| bravura wrote:
| For instance:
|
| https://github.com/descriptinc/descript-audio-codec/blob/mai...
|
| https://github.com/NVIDIA/BigVGAN/blob/main/loss.py#L23
|
| https://arxiv.org/pdf/2210.13438 (the github repo doesn't
| include training, just inference)
|
| It is INCREDIBLY common to use multi-scale spectral loss
| as the audio distance / objective measure in audio
| generation. These losses have some issues (e.g. they aren't
| always well correlated with human perception) but they
| are the current known best.
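|
| For reference, a multi-scale spectral loss is roughly the
| following (a simplified sketch; the repos above have the
| production versions, and the FFT sizes here are arbitrary):
|
|     import torch
|
|     def multiscale_spectral_loss(pred, target,
|                                  fft_sizes=(512, 1024, 2048)):
|         # Compare log-magnitude STFTs at several resolutions.
|         loss = 0.0
|         for n_fft in fft_sizes:
|             win = torch.hann_window(n_fft, device=pred.device)
|             p = torch.stft(pred, n_fft, hop_length=n_fft // 4,
|                            window=win, return_complex=True).abs()
|             t = torch.stft(target, n_fft, hop_length=n_fft // 4,
|                            window=win, return_complex=True).abs()
|             loss = loss + torch.nn.functional.l1_loss(
|                 torch.log1p(p), torch.log1p(t))
|         return loss / len(fft_sizes)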
| tgv wrote:
| Backpropping filter coefficients sounds clever, but can't
| you just do that on any layer that takes a spectrum as
| input?
| bravura wrote:
| Backpropping filter coefficients is clever, but it hasn't
| really caught on much. Google also tried with LEAF
| (https://github.com/google-research/leaf-audio) to have a
| learnable audio filterbank.
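|
| The general idea (just a sketch, not LEAF's or nnAudio's
| actual implementation) is to initialize a filterbank from
| mel filters and register it as a trainable parameter:
|
|     import torch
|     import torchaudio
|
|     class LearnableMel(torch.nn.Module):
|         # Sketch only: start from mel filters and let the
|         # optimizer adjust them along with the rest of the net.
|         def __init__(self, n_fft=1024, n_mels=80, sr=22050):
|             super().__init__()
|             fb = torchaudio.functional.melscale_fbanks(
|                 n_fft // 2 + 1, 0.0, sr / 2, n_mels, sr)
|             self.fbank = torch.nn.Parameter(fb)  # trainable
|             self.n_fft = n_fft
|
|         def forward(self, wav):
|             win = torch.hann_window(self.n_fft, device=wav.device)
|             spec = torch.stft(wav, self.n_fft,
|                               hop_length=self.n_fft // 4,
|                               window=win,
|                               return_complex=True).abs()
|             return torch.log1p(spec.transpose(-1, -2) @ self.fbank)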
|
| Anyway, in audio ML what is very common is:
|
| a) Futzing with the way you do feature extraction on the
| input. (Oh, maybe I want CQT for this task, or a different
| mel scale, etc.)
|
| b) Doing feature extraction on generated audio output,
| and constructing loss functions from generated audio
| features.
|
| So, as I said, I don't exactly see the utility of this
| library for deep learning.
|
| With that said, it is definitely nice to have really high
| speed low latency audio algorithms in C++. I just
| wouldn't market it as "useful for deep learning" because
|
| a) during training, you need more flexibility than non-GPU
| methods without backprop can offer
|
| b) if you are doing "deep learning" then your inferred
| model will presumably be quite large, and there will be a
| million other things you'll need to optimize to get real-
| time inference or inference on CPUs to work well.
|
| It's just my gut reaction. It seems like a solid project; I
| just question the one selling point of "useful for deep
| learning", that's all.
| Severian wrote:
| Are there resources you would recommend reading regarding
| ML and audio?
| bravura wrote:
| This is a really broad topic. I began studying it about 5
| years ago.
|
| Can you start by suggesting what task you want to do?
| I'll throw out some suggestions, but you can say
| something different. Also you are welcome to email me
| (email in HN profile):
|
| * Voice conversion / singing voice conversion
|
| * Transcription of audio to MIDI
|
| * Classification / tagging of audio scene
|
| * Applying some effect / cleanup to audio
|
| * Separating audio into different instruments
|
| etc
|
| The really quick summary of audio ML as a topic is:
|
| * Often people treat audio ML as vision ML, by using
| spectrogram representations of audio. Nonetheless, 1D
| models are sometimes just as good if not better, but they
| require very specific familiarity with the audio domain.
|
| * Audio distance measures (loss functions) are pretty
| crappy and not well-correlated with human perception. You
| can say the same thing about vision distance measures,
| but a lot more research has gone into vision models so we
| have better heuristics around vision stuff. With that
| said, a multi-scale log-mel-spectrogram loss isn't that
| terrible.
|
| * Audio has a handful of little gotchas around padding,
| windowing, etc. (see the sketch at the end of this comment).
|
| * DSP is a black art, and DSP knowledge has a high ROI
| versus just being dumb and black-boxy about everything.
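|
| For instance, here is the kind of padding gotcha I mean
| (just an illustration; numbers assume 22.05 kHz audio and
| torch defaults):
|
|     import torch
|
|     wav = torch.randn(22050)  # one second of noise
|     # center=True (the default) pads n_fft // 2 samples on
|     # each side, so you get extra frames and frame 0 no
|     # longer starts at sample 0.
|     for center in (True, False):
|         n = torch.stft(wav, n_fft=1024, hop_length=256,
|                        window=torch.hann_window(1024),
|                        center=center,
|                        return_complex=True).shape[-1]
|         print(center, n)  # True -> 87 frames, False -> 83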
| nesarkvechnep wrote:
| What's this C/C++ language?
| n4r9 wrote:
| Some preliminary analysis suggests that if C is an integer
| greater than 1, C/C++ will always evaluate to 1 [0].
|
| [0] https://www.programiz.com/online-compiler/9fkHTct0Mybpu
| troymc wrote:
| 1/++
| rossant wrote:
| or 1++ if you got your C operator precedence wrong
| iExploder wrote:
| It means you are using C++ but only the features that don't
| suck, so probably 10% of the language; the rest is plain C.
| dsego wrote:
| It's also for Python; I just discovered it a few days ago.
| This is the website: https://audioflux.top/
| jcelerier wrote:
| It would be nice to have a comparison with any of the many C++
| MIR (music information retrieval) libraries in the wild:
|
| - https://essentia.upf.edu/
|
| - https://github.com/marsyas/marsyas
|
| - https://github.com/ircam-ismm/pipo
|
| - https://github.com/flucoma/flucoma-core/tree/main/include/al...
| BrannonKing wrote:
| If a person wanted to transcribe sheet music from recorded
| audio, do you know which library and features would be the best
| starting point?
| bckr wrote:
| Start with source separation using demucs
| bravura wrote:
| I have had mixed luck with this model, which is supposedly
| state-of-the-art: https://github.com/magenta/mt3
|
| What kind of music are you trying to transcribe?
|
| Feel free to email me.
| cpdomina wrote:
| For MIDI:
|
| https://github.com/Music-and-Culture-Technology-Lab/omnizart
| and https://basicpitch.spotify.com/
|
| They work better if you apply some source separation before
| (e.g., https://github.com/sigsep/open-unmix-pytorch,
| https://github.com/facebookresearch/demucs, or
| https://mvsep.com)
|
| Still, I think the best results are from proprietary models
| (specifically https://www.ableton.com/en/manual/converting-audio-to-midi/
| and https://www.celemony.com/en/melodyne/what-is-melodyne)
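|
| If it helps, a rough sketch of that pipeline with the demucs
| CLI and the basic-pitch Python package (the paths and model
| name below are just examples; check both READMEs for current
| usage):
|
|     # pip install demucs basic-pitch
|     # demucs --two-stems=vocals song.wav   # writes separated stems
|     from basic_pitch.inference import predict
|
|     # Transcribe one separated stem to MIDI; the path below is
|     # just an example of where demucs writes its output.
|     model_output, midi_data, note_events = predict(
|         "separated/htdemucs/song/no_vocals.wav")
|     midi_data.write("song.mid")  # midi_data is a PrettyMIDI object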
| atoav wrote:
| [delayed]
| dekken_ wrote:
| It's C and Python, not C++
| dsego wrote:
| C can be used in C++ code, no?
| Galanwe wrote:
| Something very close, but that's not what you would expect
| for something that markets itself as a C++ library IMHO.
| Especially in 2024, most people would hope (or assume) that
| "C++" means "C++ 11" at least.
|
| Definitely doesn't count as _lying_, but still underwhelming.
| dekken_ wrote:
| Depends; not all C is C++. E.g., there is no `restrict`
| keyword in C++ yet (even if lots of C++ compilers support
| __restrict__, it's not in the spec).
| bregma wrote:
| Yes. And C can also be used with Python and Rust. That does
| not make this a Rust library.
| dsego wrote:
| Right, but C++ started as an extension of C and is mostly
| compatible and historically you could compile C with the
| C++ compiler. I don't think it's a good comparison.
| codetrotter wrote:
| Zig can compile C. That makes this C/C++/Zig library.
| Right? :^)
| jcelerier wrote:
| > historically you could compile C with the C++ compiler.
|
| not any C, only the C++-compatible subset.
| int* foo = malloc(sizeof(int));
|
| has never worked in C++, for instance, while it's valid C.
| Code that worked is code that people actually made the effort
| to express in a way compatible with a C++ compiler.
| epcoa wrote:
| It is true that there is C code that is conforming C++ code.
| However, I would say that if you're using a _C_ compiler with
| extern "C" in the headers for C++ linker compatibility (as
| this library does), then saying C++ is about as misleading as
| saying a Rust library is C++, since you can link to that too.
|
| As far as compatibility and "history" go, the languages are
| different enough now. There are both features in C that do
| not exist in C++ and code that is conforming C that would be
| UB in C++. Saying C/C++ (for real) is usually a dumb target;
| it's better to pick one and settle on that.
|
| If it's C, just say so. Everyone knows what extern "C" is;
| you don't need to confuse people.
| leonardohn wrote:
| Even Pascal is closer to C than C++ is, yet historically
| people have used the term "C/C++" as if the two were very close.
| textlapse wrote:
| All squares are rectangles I guess.
| morning-coffee wrote:
| Do you have one in safe Rust? See, we've only just met, and I
| don't know how you handle your ptr/len arguments in C just yet.
| ;)
| gosub100 wrote:
| Can this be used for audio fingerprinting?
| BrannonKing wrote:
| So are they going for feature parity with librosa? I think that
| would be great.
___________________________________________________________________
(page generated 2024-08-13 23:00 UTC)