[HN Gopher] Show HN: Alsa_rnnoise - RNNoise-based noise removal ...
       ___________________________________________________________________
        
       Show HN: Alsa_rnnoise - RNNoise-based noise removal plugin for ALSA
        
       Author : ArsenArsen
       Score  : 120 points
       Date   : 2021-01-31 11:44 UTC (11 hours ago)
        
 (HTM) web link (sr.ht)
 (TXT) w3m dump (sr.ht)
        
       | ArsenArsen wrote:
       | Ever since I got a new microphone I've been having issues with
       | large quantities of background noise, primarily typing sounds,
       | leaking in through my microphone. A friend of mine had told me
       | about rnnoise, describing it as "very good" at removing noise, so
       | I decided to test it. My initial testing started by me piping raw
       | PCM from arecord through rnnoise's denoiser demo into aplay. The
       | results frankly shocked me. When not speaking, there was no noise
       | (or sound) at all (this is due to rnnoise's voice detection
       | system, which essentially mutes the microphone when there's no
       | voice), and when I was talking, the sound of my keyboard was a
       | lot quieter without it affecting my voice. This led me to decide
       | to develop alsa_rnnoise to have good real time noise
       | cancellation.
       | 
       | alsa_rnnoise is a very simple ALSA filter plugin that runs its
       | input through rnnoise before outputting it back to ALSA. It
       | operates in real time, adds very little latency (the amount
       | depends on what size frames ALSA delivers, but is nominally less
       | than 10ms). After enabling it, any annoying background sounds
       | were gone. My intended use cases for this were VoIP,
       | screencasting and streaming, and, as far as I can tell, it works
       | great for all three.
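
The latency remark above can be made concrete. RNNoise's C API works on fixed frames of 480 samples, which is 10 ms at 48 kHz, while ALSA may deliver periods of any size, so some buffering is needed in between. The Python sketch below only illustrates that buffering idea: `FrameAccumulator` and `denoise_frame` are hypothetical names, the denoiser is a pass-through stand-in, and none of this is taken from the alsa_rnnoise source.

```python
from typing import Callable, List

FRAME_SIZE = 480      # samples per RNNoise frame
SAMPLE_RATE = 48_000  # Hz, the rate RNNoise expects


def frame_duration_ms(frame_size: int = FRAME_SIZE, rate: int = SAMPLE_RATE) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_size / rate


class FrameAccumulator:
    """Collects arbitrarily sized chunks and emits fixed-size frames."""

    def __init__(self, process: Callable[[List[float]], List[float]],
                 frame_size: int = FRAME_SIZE) -> None:
        self.process = process
        self.frame_size = frame_size
        self.pending: List[float] = []

    def push(self, chunk: List[float]) -> List[float]:
        """Add samples; return the output of every frame completed so far."""
        self.pending.extend(chunk)
        out: List[float] = []
        while len(self.pending) >= self.frame_size:
            frame = self.pending[:self.frame_size]
            del self.pending[:self.frame_size]
            out.extend(self.process(frame))
        return out


def denoise_frame(frame: List[float]) -> List[float]:
    """Hypothetical stand-in for the real denoiser: returns input unchanged."""
    return list(frame)


acc = FrameAccumulator(denoise_frame)
first = acc.push([0.0] * 300)   # not enough samples for a frame yet
second = acc.push([0.0] * 300)  # 600 buffered: one frame out, 120 left over
print(frame_duration_ms(), len(first), len(second), len(acc.pending))
```

Samples left in `pending` are what adds delay: the closer the delivered chunk size is to a multiple of 480, the less audio waits for the next frame.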
        
         | spion wrote:
         | I'm intrigued by this. I've never been really satisfied with
         | any software noise reduction system, but this sounds like a
         | phenomenal improvement.
         | 
         | Tried installing it on Ubuntu LTS via latest pulseeffects, but
         | it seems like it switched to pipewire 0.3 which is not
         | available yet so I can't really run it.
         | 
         | My solution so far has been a headset with a boom microphone,
         | like the CoolerMaster MH630 (one of the better boom mics, see
         | https://www.rtings.com/headphones/reviews/cooler-master/mh63...
         | for a sound demo at the "Recording quality" section). When it
         | comes to noise reduction, bringing the microphone as close to
         | the mouth as possible is a really good way to get an amazing
         | SNR boost immediately (even for omnidirectional mics).
         | Unfortunately that's a pain with large capsule condenser
         | microphones (unless you're ready to have a large boom arm
         | hanging about on your desk and accept view obstruction, and you
         | add postprocessing to remove the very bassy proximity effect)
         | 
         | Another benefit of headsets with boom mics is consistency
         | (without DSP). No matter how you move, the distance and angle
         | to the microphone is always identical, and therefore the sound
         | is very consistent (save for your own personal loudness)
         | 
         | You can of course add DSP (limiter, noise reduction etc) to
         | that, but the better input you provide, the better output you
         | get.
        
         | drblah wrote:
         | A few weeks ago, I tried doing this with the LADSPA version of
         | RNNoise and ALSA, but couldn't get it to work reliably.
         | 
         | I have also experimented with NoiseTorch which routes the
         | microphone through LADSPA using Pulseaudio but it didn't work
         | reliably either. The biggest problem with this is that
         | Pulseaudio will load one CPU thread 100% even when there is no
         | audio input. This makes it a deal breaker for laptops.
         | 
         | I will definitely check this out. RNNoise is truly amazing
         | tech, but it is not as accessible as I would like. The best use
         | of it is in the Mumble client where it is an optional setting.
         | 
         | It is a shame Nvidia has taken over this space completely with
         | RTX voice. RNNoise does a comparable job without the need for
         | an Nvidia GPU. But I guess it is because RNNoise is just not as
         | easy to set up.
        
           | ArsenArsen wrote:
           | I believe Pulse also has a LADSPA module that you could try.
        
         | onli wrote:
         | Hey, thanks for building this! There are multiple options like
         | this for Pulseaudio, but last I researched it nothing for pure
         | ALSA. On a system without Pulseaudio this is obviously better,
         | great to have.
         | 
         | The pulseaudio plugins like noisetorch have the issue of
         | significant system load even without current sound input
         | (something about how the loopback works iirc), will this alsa
         | plugin share that issue or will the system load be lower when
         | currently there is no sound input?
        
           | darkwater wrote:
           | > There are multiple options like this for Pulseaudio,
           | 
           | Mind sharing? I didn't manage to find any that was easy to
           | install/configure, so it would help me a lot. Thanks!
        
             | onli wrote:
             | NoiseTorch is the one I use on my Pulseaudio-enabled
             | laptop, https://github.com/lawl/NoiseTorch. It has a GUI
             | and is very easy to install, seems to work well.
             | Significant cpu usage when active, so I only load it when
             | it's needed, but that's okay for me.
        
           | ArsenArsen wrote:
           | The plugin uses very little CPU, and is entirely inactive
           | when not in use (i.e. when data isn't being pulled => the
           | transfer function isn't being called) due to how ALSA works
           | 
           | EDIT: Do note, though, that each process pulling audio will
           | be denoising independently, so the usage scales linearly with
           | the number of clients. This is due to how ALSA plugins work,
           | but regardless of that, on a Ryzen 5 1600x (the only CPU I
           | can test on), the plugin uses 2.5% of a single core when
           | recording mono 48k
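
As a rough worked example of the linear scaling described above, the sketch below multiplies the quoted 2.5% single-core figure by the number of capturing processes. The function name is hypothetical, and the figure comes from a single Ryzen 5 1600X measurement, so the result is an estimate only.

```python
def estimated_core_usage(clients: int, per_client_percent: float = 2.5) -> float:
    """Estimated share of one core, in percent, for `clients` capture streams.

    Assumes each client denoises independently, so the cost simply adds up.
    """
    if clients < 0:
        raise ValueError("clients must be non-negative")
    return clients * per_client_percent


for n in (1, 2, 4):
    print(n, "client(s):", estimated_core_usage(n), "% of one core")
```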
        
             | onli wrote:
             | Excellent.
             | 
             | I'm testing this right now and am noticing that some more
             | info about the installation could be helpful. Specifically,
             | when installing rnnoise as shown in the readme it of course
             | goes to /usr/local/lib, but /usr/local/lib/pkgconfig was
             | not in the PKG_CONFIG_PATH of my distro. Maybe there could
             | be a hint to set that when calling `meson build` if rnnoise
             | can't be found?
             | 
             | Packaging software is always annoying, sorry for dragging
             | you into that mud. Ideally distros will pick it up and make
             | compiling manually unnecessary. I would have left this as
             | an issue but saw no issue tracker on the project page.
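
The PKG_CONFIG_PATH workaround described above can be sketched as follows. This is an illustration under assumptions: `with_pkgconfig_dir` is a hypothetical helper, and `/usr/local/lib/pkgconfig` is the directory mentioned in the comment, which may differ on other distros.

```python
import os
from typing import Dict, Mapping, Optional


def with_pkgconfig_dir(extra_dir: str,
                       env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `env` with `extra_dir` prepended to PKG_CONFIG_PATH.

    If `env` is None, the current process environment is copied.
    """
    new_env = dict(os.environ if env is None else env)
    current = new_env.get("PKG_CONFIG_PATH", "")
    parts = [p for p in current.split(os.pathsep) if p]
    if extra_dir not in parts:
        parts.insert(0, extra_dir)
    new_env["PKG_CONFIG_PATH"] = os.pathsep.join(parts)
    return new_env


env = with_pkgconfig_dir("/usr/local/lib/pkgconfig",
                         {"PKG_CONFIG_PATH": "/usr/lib/pkgconfig"})
print(env["PKG_CONFIG_PATH"])
```

Called without the second argument, the helper copies the current environment, and the result could then be passed as `env=` to `subprocess.run(["meson", "build"], env=...)`.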
        
               | ArsenArsen wrote:
               | There's an issue tracker on that page, under tickets, but
               | I'd prefer if you took a discussion to the attached
               | mailing list first before it hits the official tracker.
               | 
               | As for packaging, that's my field of work for some
               | projects I'm working on so it's not unfamiliar to me, the
               | only problem is that the RNNoise upstream lacks releases,
               | although there's discussion about something happening
               | about that.
        
               | onli wrote:
               | Okay. To also mention the result: Installation worked,
               | alsa plugin worked and the filter does work. Nice, thanks
               | again.
               | 
               | With extreme sounds (vacuum) in the background the voice
               | gets a bit more distorted than ideal, but something like
               | a keyboard gets filtered nicely to be less noisy. I
               | assumed that's just how RNNoise behaves, I'm just
               | mentioning it because of the sound quality discussion
               | above. Maybe also to that: Just activating the
               | alsa_rnnoise filter does not significantly lower
               | recording quality, at least not that I can notice.
        
           | [deleted]
        
       | the_real_sparky wrote:
       | rnnoise is fantastic. I use it in an Equalizer APO filter chain
       | on my gaming machine along with an EQ and compressor which are
       | fed from a dynamic mic. I consistently get comments about the
       | quality of my mic setup in-game and on Discord.
       | 
       | The best part is that it has almost no impact on voice quality,
       | unlike Krisp and some other options I have tried. Singing into
       | the filter chain even sounds good, with the exception of when my
       | 5 year old daughter joins in. rnnoise seems to think that her
       | voice is noise and tries to intermittently filter it out, which
       | causes a volume warble while we sing together. To be fair, 99.9%
       | of the time her voice should definitely be considered noise I
       | want filtered out. ;)
        
       | cristyansv wrote:
       | looks promising.
       | 
       | but I've always wondered how Krisp.ai achieves such good results,
       | considering that it works on the local device, plus the size is
       | quite small (a few hundred MB). it really impresses me.
       | disclaimer: I'm not affiliated in any way with Krisp.ai, just a
       | happy user.
        
       | methyl wrote:
       | For PulseAudio, there is https://github.com/lawl/NoiseTorch
        
       | PostThisTooFast wrote:
       | What's "ALSA?"
        
       | dgellow wrote:
       | If you're using Windows, I recently found this small tool to
       | reduce background and keyboard/mouse noises:
       | https://closedlooplabs.com. It's not open source as far as I'm
       | aware but way cheaper than krisp.ai's subscription model.
        
         | ArsenArsen wrote:
         | It is possible to use VST2 on Windows. This way you get RNNoise
         | and the advantages of Free software.
         | 
         | https://github.com/werman/noise-suppression-for-voice
        
       | syntaxing wrote:
       | Is there any RNNoise based alternative for MacOS? I managed to
       | install the plug-in but find it hard to pipeline the audio into
       | it.
        
       | pabs3 wrote:
       | I noticed that RNNoise doesn't appear to be an open model, you
       | can't re-train it from scratch from the source data, which isn't
       | publicly documented (or doesn't exist?), even if you had enough
       | hardware.
        
         | ArsenArsen wrote:
         | The documentation is a bit poor. The original data is available
         | for download (with more info about the entire process, most of
         | which is outside of my grasp as I am not an ML person) in the
         | demo blog post: https://jmvalin.ca/demo/rnnoise/ (towards the
         | bottom of the page)
        
           | ArsenArsen wrote:
           | Coming back with information from #xiph on freenode:
           | 16:57 <ArsenArsen> where and under what license is the
           |   training data used for RNNoise?
           | 18:38 <rillian> ArsenArsen: There's a copy of what I believe is
           |   the training data on the xiph server, but afaik it's never
           |   been published
           | 18:39 <rillian> the original submission page has an EULA
           |   waiving copyright and liability claims, and agreeing that it
           |   _may_ be released CC0.
           | 18:40 <rillian> it looks like that didn't actually happen.
           | 18:41 <rillian> there may have been concerns about auditing it
           |   for privacy issues, but there's a lot of audio to listen to,
           |   6.5G compressed
           | 18:41 <rillian> jmspeex, TD-Linux: what's the status of
           |   publishing the rnnoise training data?
           | 18:43 <jmspeex> Are you talking about the data that was used to
           |   train the default RNNoise model or the noise that got
           |   collected with the demo?
           | 18:43 <rillian> jmspeex: I think debian just cares about the
           |   training data for the default model.
           | 18:44 <jmspeex> There was never plan to release that -- it
           |   includes data from databases we cannot release
           | 18:44 <jmspeex> but I don't see what the issue is. Distributing
           |   the model is not the same as distributing the data
           | 18:45 <rillian> ah, I see. I didn't realize you'd used
           |   proprietary sources as well.
        
           | pabs3 wrote:
           | Any idea about the license for the original data?
        
             | pabs3 wrote:
             | The paper links to the McGill TSP speech database (English
             | & French) as one of the sources of the data, which claims
             | to be BSD licensed:
             | 
             | http://www-mmsp.ece.mcgill.ca/Documents/Data/
        
             | pabs3 wrote:
             | The other source of data mentioned in the paper is the NTT
             | Multi-Lingual Speech Database for Telephonometry, which
             | seems to be commercial, so presumably under a proprietary
             | license.
             | 
             | https://www.ntt-at.com/product/multilingual/
             | https://www.ntt-at.com/product/speech2002/
        
             | the-dude wrote:
             | So far we have 3 ideas!
        
             | pabs3 wrote:
             | Hmm, OTOH, the 6.4GB data tarball says that it is from
             | contributors who responded to the demo and is licensed
             | under CC0.
        
               | ArsenArsen wrote:
               | +1, that data is CC0, and I believe that's all the data
               | that was used for training.
        
               | jmvalin wrote:
               | No, exactly _none_ of that data was used for training.
               | The training was done before the demo that was asking for
               | noise contributions. The contributions are CC0, but were
               | never used (i.e. totally unknown dataset quality).
        
           | pabs3 wrote:
           | Also any idea if the training required nvidia GPUs or was it
           | done on CPUs or GPUs with non-proprietary drivers?
        
             | ArsenArsen wrote:
             | There are training instructions in the repository. The
             | training scripts appear to be using some pretty standard ML
             | libraries (I'm seeing keras and mentions of tensorflow), so
             | I imagine that the requirements are the same as those.
             | 
             | I don't feel I'm qualified to elaborate on this
             | specifically, again, I'm no ML person. For more info look
             | here: https://github.com/xiph/rnnoise/tree/master/training
             | https://github.com/xiph/rnnoise/blob/master/TRAINING-README
        
       | ZoomZoomZoom wrote:
       | Sound engineer here.
       | 
       | RNNoise is an amazing feat, but please, don't overdo it. Most of
       | the time, you don't really want complete ambient noise
       | elimination, as human speech appearing from dead silence sounds
       | unnatural. Moreover, most noise reduction software is
       | considerably less effective in reducing noise _during_ a person
       | speaking, either removing too much, producing degraded speech
       | sound (worst case) or too little. If it's possible, always start
       | adding your noise reduction gradually, stop when it sounds good
       | to your ear and then back up a bit.
       | 
       | If you're doing voice recording/streaming, please, get to know
       | Expanding and Compression first, and only after configuring your
       | sound processing chain add noise reduction in.
       | 
       | One of the serious offenders is OBS Studio, which recently added
       | an RNNoise filter but provides no means of mixing processed sound
       | with the dry one (in other words, the filter is always 100% on).
       | A wet/dry mix knob is heavily needed for most filters there.
       | 
       | I'm very saddened by the state of sound quality in lots of
       | amazing videos people have been producing lately and now I'm
       | considering writing a guide for voice processing for
       | streams/conferences/etc for the techy people, if anyone's
       | interested.
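
For readers unfamiliar with the expanding and compression mentioned above, the sketch below shows the static gain curves of a compressor and a downward expander, working in decibels. The thresholds and ratios are arbitrary illustrative values, the function names are hypothetical, and real processors add attack and release smoothing, which is omitted here.

```python
def compressor_gain_db(level_db: float, threshold_db: float = -18.0,
                       ratio: float = 4.0) -> float:
    """Gain (dB) a compressor applies: levels above the threshold are
    reduced so that every `ratio` dB of input yields 1 dB of output."""
    if level_db <= threshold_db:
        return 0.0
    target = threshold_db + (level_db - threshold_db) / ratio
    return target - level_db


def expander_gain_db(level_db: float, threshold_db: float = -50.0,
                     ratio: float = 2.0) -> float:
    """Gain (dB) a downward expander applies: levels below the threshold
    are pushed further down, which softens noise between phrases."""
    if level_db >= threshold_db:
        return 0.0
    target = threshold_db + (level_db - threshold_db) * ratio
    return target - level_db


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10.0 ** (gain_db / 20.0)


for level in (-60.0, -30.0, -6.0):
    print(level, compressor_gain_db(level), expander_gain_db(level))
```

Unlike a hard gate, the expander attenuates quiet passages gradually, which fits the advice above about not cutting to dead silence.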
        
         | zamadatix wrote:
         | I wouldn't be too worried about it unless you're working on
         | something at the level you know why to be worried about it
         | (i.e. you're mixing audio as part of what you're doing, not
         | because you just need the audio output to work). For instance
         | I'd take missing comfort noise 10 times before everyone hearing
         | my water heater kick up once on a conference call or while
         | playing a team shooter.
         | 
         | That being said RNNoise isn't that great at actually filtering
         | background noise as much as guessing when to drop the levels
         | and as you mention it really doesn't block much when it detects
         | you're speaking rather just lets most everything through until
         | you stop.
         | 
         | RTX voice made the gold standard in filtering IMO though and as
         | amazing a feat RNNoise is (I certainly couldn't do better) it's
         | just not that good in comparison. I'm not sure what they did to
         | make their model so good but I can use a boom mic set to omni,
         | run a fan at high speed into the mic, bang on the desk
         | repeatedly with one hand, have the water heater making noise,
         | my phone vibrating on the table, a car alarm going in the
         | background, the cat scratching a post, and so on and as long as
         | I remember to talk at a normal volume it's damn near
         | indistinguishable from talking in a quiet room. It may sound
         | preposterous or like I'm exaggerating for effect but I'll be
         | damned it actually filters that well. I didn't believe it until
         | I tried. It finally gets "bad" when the noise is so bad and
         | loud on the microphone your voice starts to sound a bit
         | distorted but it's still isolated. Does let cat meows through,
         | though that is technically voice and I'm not sure how you could
         | identify it was a meow without massive latency to hear the
         | whole thing first.
         | 
         | That being said they seem to have completely fucked something
         | up porting it to Nvidia Broadcast as the mic filtering in that
         | leaks to the point it was like it wasn't even on.
        
         | im_dario wrote:
         | Your guide would be a blessing for techies looking to improve
         | their audio quality. Please, do it!
        
         | gsich wrote:
         | > RNNoise is an amazing feat, but please, don't overdo it. Most
         | of the time, you don't really want complete ambient noise
         | elimination, as human speech appearing from dead silence sounds
         | unnatural.
         | 
         | No. Most sane programs don't do comfort noise because it is
         | everything but comfort. Iff you speak data should be
         | transmitted.
        
         | StavrosK wrote:
         | I think pretty much everyone who does A/V production (and some
         | people who don't, like me) would be interested in such a guide.
         | Please do write it!
        
         | ArsenArsen wrote:
         | I'd be quite interested in such an article, again, my goal
         | (besides VoIP) is screencasting and/or streaming, so any bit of
         | advice someone with experience might have is greatly useful.
         | 
         | I'll look into expansion and compression, and I could implement
         | a wet/dry setting that multiplies the source samples and then
         | mixes them into the result, if I understood the concept right.
         | 
         | EDIT: RNNoise seems to be alright when it comes to canceling
         | noise during speech too, I didn't notice it overdoing it.
        
           | ZoomZoomZoom wrote:
           | > I could implement a wet/dry setting that multiplies the
           | source samples and then mixes them into the result, if I
           | understood the concept right.
           | 
           | Haven't tested your version yet, but werman/noise-
           | suppression-for-voice plugin introduces some delay and dumb
           | wet/dry control (or mixing with original sound source in some
           | other way) doesn't work, so it might turn out to be not so
           | simple.
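
One reading of the problem above is that the processed (wet) signal lags the dry one, so summing them directly produces comb filtering. A possible workaround, sketched here under that assumption, is to delay the dry signal by the same number of samples before mixing. `DelayLine` and `delay_samples` are hypothetical; the actual delay of any given plugin would have to be measured or taken from its documentation.

```python
from collections import deque
from typing import Deque, List


class DelayLine:
    """Delays a stream of samples by a fixed number of samples."""

    def __init__(self, delay_samples: int) -> None:
        if delay_samples < 0:
            raise ValueError("delay_samples must be non-negative")
        # Pre-fill with silence so the first outputs are zeros.
        self.buf: Deque[float] = deque([0.0] * delay_samples)

    def process(self, chunk: List[float]) -> List[float]:
        """Push a chunk in and get the equally long delayed chunk out."""
        out: List[float] = []
        for sample in chunk:
            self.buf.append(sample)
            out.append(self.buf.popleft())
        return out


dry_delay = DelayLine(delay_samples=2)
print(dry_delay.process([1.0, 2.0, 3.0, 4.0]))
print(dry_delay.process([5.0, 6.0]))
```

State is kept between calls, so the delay stays consistent across consecutive buffers.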
        
             | ArsenArsen wrote:
             | Right now there's no such feature in place, but I imagine
             | keeping the buffer from before denoising and mixing it into
             | the denoised result (plus the multiplication) will do what
             | you're describing? It may increase volume, I might need to
             | reduce the volume of the denoised audio first. I'll play
             | around with it, and am open to hearing what you've got to
             | say about it.
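
A minimal sketch of the wet/dry idea discussed above, assuming the dry and denoised buffers are already time-aligned and of equal length. Using complementary gains (`1 - wet` and `wet`) instead of simply adding the two signals keeps each mixed sample between the dry and the denoised value, which addresses the volume concern raised in the comment. `mix_wet_dry` is a hypothetical name, not part of alsa_rnnoise.

```python
from typing import List, Sequence


def mix_wet_dry(dry: Sequence[float], denoised: Sequence[float],
                wet: float = 0.8) -> List[float]:
    """Crossfade between the unprocessed and the denoised signal.

    wet = 0.0 returns the dry signal, wet = 1.0 the fully denoised one.
    """
    if not 0.0 <= wet <= 1.0:
        raise ValueError("wet must be between 0.0 and 1.0")
    if len(dry) != len(denoised):
        raise ValueError("buffers must have the same length")
    return [(1.0 - wet) * d + wet * w for d, w in zip(dry, denoised)]


dry = [1.0, 0.5, -0.25]
denoised = [0.0, 0.5, -0.25]
print(mix_wet_dry(dry, denoised, wet=0.5))
```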
        
         | AndrewUnmuted wrote:
         | Great post.
         | 
         | I'm also an audio engineer. This is the truth.
         | 
         | In an audio recording featuring spoken voice, there are two
         | sounds present in every recording: the spoken voice, and the
         | room ambiance in the background. We typically will refer to the
         | latter as "room tone."
         | 
         | Even though we don't usually explicitly realize this, our
         | ears/brain implicitly do. So, when people overdo noise removal,
         | we implicitly hear the difference since half of the sounds that
         | compose your filtered output are now gone. We tend to associate
         | such recognizable "noise gating" with lower production quality
         | and we find that generally such processing leads to lower
         | intelligibility of the human voice.
        
           | NovemberWhiskey wrote:
           | The addition of an artificial ambient background is known as
           | "comfort noise" for those who are interested to look further
           | into it; usually it's done on the receiver end.
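
As a rough illustration of comfort noise on the receiving side, the sketch below fills frames flagged as silent with low-level white noise instead of digital silence. The -60 dBFS peak level, the `is_silent` flag and the function names are assumptions made for the example; real systems may shape the noise to resemble the sender's background rather than use plain white noise.

```python
import random
from typing import List, Optional, Sequence


def dbfs_to_amplitude(dbfs: float) -> float:
    """Convert a level in dBFS to a linear amplitude (1.0 = full scale)."""
    return 10.0 ** (dbfs / 20.0)


def comfort_noise(n: int, peak_dbfs: float = -60.0,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Generate `n` samples of white noise bounded by the given peak level."""
    rng = rng or random.Random()
    peak = dbfs_to_amplitude(peak_dbfs)
    return [rng.uniform(-peak, peak) for _ in range(n)]


def render_frame(frame: Sequence[float], is_silent: bool,
                 peak_dbfs: float = -60.0,
                 rng: Optional[random.Random] = None) -> List[float]:
    """Return the frame unchanged, or comfort noise if it was flagged silent."""
    if is_silent:
        return comfort_noise(len(frame), peak_dbfs, rng)
    return list(frame)


out = render_frame([0.0] * 480, is_silent=True, rng=random.Random(0))
print(len(out), max(abs(s) for s in out) <= dbfs_to_amplitude(-60.0))
```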
        
       ___________________________________________________________________
       (page generated 2021-01-31 23:01 UTC)