[HN Gopher] Show HN: Alsa_rnnoise - RNNoise-based noise removal ...
___________________________________________________________________
Show HN: Alsa_rnnoise - RNNoise-based noise removal plugin for ALSA
Author : ArsenArsen
Score : 120 points
Date : 2021-01-31 11:44 UTC (11 hours ago)
(HTM) web link (sr.ht)
(TXT) w3m dump (sr.ht)
| ArsenArsen wrote:
| Ever since I got a new microphone I've been having issues with
| large quantities of background noise, primarily typing sounds,
| leaking in through my microphone. A friend of mine had told me
| about rnnoise, describing it as "very good" at removing noise, so
| I decided to test it. My initial testing consisted of piping raw
| PCM from arecord through rnnoise's denoiser demo into aplay. The
| results frankly shocked me. When not speaking, there was no noise
| (or sound) at all (this is due to rnnoise's voice detection
| system, which essentially mutes the microphone when there's no
| voice), and when I was talking, the sound of my keyboard was a
| lot quieter without it affecting my voice. This led me to
| develop alsa_rnnoise to have good real-time noise cancellation.
|
| alsa_rnnoise is a very simple ALSA filter plugin that runs its
| input through rnnoise before outputting it back to ALSA. It
| operates in real time, adds very little latency (the amount
| depends on what size frames ALSA delivers, but is nominally less
| than 10ms). After enabling it, any annoying background sounds
| were gone. My intended use cases for this were VoIP,
| screencasting and streaming, and, as far as I can tell, it works
| great for all three.
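The frame-wise flow described above (buffer whatever ALSA delivers, run whole frames through the denoiser, hand the result back) can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: it assumes RNNoise's frame size of 480 samples at 48 kHz, and `denoise_frame` is a hypothetical pass-through stand-in for the real denoiser.

```python
from typing import Callable, List

FRAME_SIZE = 480      # samples per frame (assumed: 10 ms at 48 kHz)
SAMPLE_RATE = 48_000  # Hz


def denoise_frame(frame: List[float]) -> List[float]:
    """Hypothetical stand-in for the real denoiser; returns the frame unchanged."""
    return list(frame)


class FrameFilter:
    """Buffers arbitrary-sized input chunks and emits whole processed frames."""

    def __init__(self, process: Callable[[List[float]], List[float]] = denoise_frame):
        self.process = process
        self.pending: List[float] = []

    def push(self, chunk: List[float]) -> List[float]:
        """Accept a chunk of samples and return every sample that is ready."""
        self.pending.extend(chunk)
        out: List[float] = []
        # Only complete frames are processed; the remainder waits for more input.
        while len(self.pending) >= FRAME_SIZE:
            frame = self.pending[:FRAME_SIZE]
            self.pending = self.pending[FRAME_SIZE:]
            out.extend(self.process(frame))
        return out


def frame_latency_ms() -> float:
    """Time needed to collect one full frame, in milliseconds."""
    return 1000.0 * FRAME_SIZE / SAMPLE_RATE
```

Under these assumptions, samples that do not fill a whole frame wait for the next chunk, which is why the added delay depends on the chunk sizes being delivered.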
| spion wrote:
| I'm intrigued by this. I've never been really satisfied with
| any software noise reduction system, but this sounds like a
| phenomenal improvement.
|
| Tried installing it on Ubuntu LTS via latest pulseeffects, but
| it seems like it switched to pipewire 0.3 which is not
| available yet so I can't really run it.
|
| My solution so far has been a headset with a boom microphone,
| like the CoolerMaster MH630 (one of the better boom mics, see
| https://www.rtings.com/headphones/reviews/cooler-master/mh63...
| for a sound demo at the "Recording quality" section). When it
| comes to noise reduction, bringing the microphone as close to
| the mouth as possible is a really good way to get an amazing
| SNR boost immediately (even for omnidirectional mics).
| Unfortunately that's a pain with large capsule condenser
| microphones (unless you're ready to have a large boom arm
| hanging about on your desk and accept view obstruction, and you
| add postprocessing to remove the very bassy proximity effect)
|
| Another benefit of headsets with boom mics is consistency
| (without DSP). No matter how you move, the distance and angle
| to the microphone is always identical, and therefore the sound
| is very consistent (save for your own personal loudness)
|
| You can of course add DSP (limiter, noise reduction etc) to
| that, but the better input you provide, the better output you
| get.
| drblah wrote:
| A few weeks ago, I tried doing this with the LADSPA version of
| RNNoise and ALSA, but couldn't get it to work reliably.
|
| I have also experimented with NoiseTorch which routes the
| microphone through LADSPA using Pulseaudio but it didn't work
| reliably either. The biggest problem with this is that
| Pulseaudio will load one CPU thread to 100% even when there is no
| audio input. This makes it a deal breaker for laptops.
|
| I will definitely check this out. RNNoise is truly amazing
| tech, but it is not as accessible as I would like. The best use
| of it is in the Mumble client, where it is an optional setting.
|
| It is a shame Nvidia has taken over this space completely with
| RTX voice. RNNoise does a comparable job without the need for
| an Nvidia GPU. But I guess it is because RNNoise is just not as
| easy to set up.
| ArsenArsen wrote:
| I believe Pulse also has a LADSPA module that you could try.
| onli wrote:
| Hey, thanks for building this! There are multiple options like
| this for Pulseaudio, but last I researched it nothing for pure
| ALSA. On a system without Pulseaudio this is obviously better,
| great to have.
|
| The pulseaudio plugins like noisetorch have the issue of
| significant system load even without current sound input
| (something about how the loopback works iirc), will this alsa
| plugin share that issue or will the system load be lower when
| currently there is no sound input?
| darkwater wrote:
| > There are multiple options like this for Pulseaudio,
|
| Mind sharing? I didn't manage to find any that was easy to
| install/configure, so it would help me a lot. Thanks!
| onli wrote:
| NoiseTorch is the one I use on my Pulseaudio-enabled
| laptop, https://github.com/lawl/NoiseTorch. It has a GUI
| and is very easy to install, seems to work well.
| Significant cpu usage when active, so I only load it when
| it's needed, but that's okay for me.
| ArsenArsen wrote:
| The plugin uses very little CPU, and is entirely inactive
| when not in use (i.e. when data isn't being pulled => the
| transfer function isn't being called) due to how ALSA works
|
| EDIT: Do note, though, that each process pulling audio will
| be denoising independently, so the usage scales linearly with
| the number of clients. This is due to how ALSA plugins work,
| but regardless of that, on a Ryzen 5 1600x (the only CPU I
| can test on), the plugin uses 2.5% of a single core when
| recording mono 48k
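As a back-of-the-envelope reading of the numbers above, the linear scaling can be written out like this. The 2.5% default is the single measurement quoted for a Ryzen 5 1600X recording mono at 48 kHz; the function name is hypothetical and other CPUs will give different figures.

```python
def estimated_core_usage(clients: int, per_client_percent: float = 2.5) -> float:
    """Percent of one core used when each client denoises independently."""
    if clients < 0:
        raise ValueError("clients must be non-negative")
    # Every process pulling audio runs its own denoiser, so usage adds up.
    return clients * per_client_percent
```

For example, four simultaneous capture clients would come to roughly 10% of one core under this assumption.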
| onli wrote:
| Excellent.
|
| I'm testing this right now and am noticing that some more
| info about the installation could be helpful. Specifically,
| when installing rnnoise as shown in the readme it of course
| goes to /usr/local/lib, but /usr/local/lib/pkgconfig was
| not in the PKG_CONFIG_PATH of my distro. Maybe there could
| be a hint to set that when calling `meson build` if rnnoise
| can't be found?
|
| Packaging software is always annoying, sorry for dragging
| you into that mud. Ideally distros will pick it up and
| compiling manually will be unnecessary. I would have left this as
| an issue but saw no issue tracker on the project page.
| ArsenArsen wrote:
| There's an issue tracker on that page, under tickets, but
| I'd prefer if you took a discussion to the attached
| mailing list first before it hits the official tracker.
|
| As for packaging, that's my field of work for some
| projects I'm working on so it's not unfamiliar to me, the
| only problem is that the RNNoise upstream lacks releases,
| although there's discussion about something happening
| about that.
| onli wrote:
| Okay. To also mention the result: Installation worked,
| alsa plugin worked and the filter does work. Nice, thanks
| again.
|
| With extreme sounds (vacuum) in the background the voice
| gets a bit more distorted than ideal, but something like
| a keyboard gets filtered nicely to be less noisy. I
| assumed that's just how RNNoise behaves, I'm just
| mentioning it because of the sound quality discussion
| above. Maybe also to that: Just activating the
| alsa_rnnoise filter does not significantly lower
| recording quality, at least not that I can notice.
| the_real_sparky wrote:
| rnnoise is fantastic. I use it in an Equalizer APO filter chain
| on my gaming machine along with an EQ and compressor which are
| fed from a dynamic mic. I consistently get comments about the
| quality of my mic setup in-game and on Discord.
|
| The best part is that it has almost no impact on voice quality,
| unlike Krisp and some other options I have tried. Singing into
| the filter chain even sounds good, with the exception of when my
| 5 year old daughter joins in. rnnoise seems to think that her
| voice is noise and tries to intermittently filter it out, which
| causes a volume warble while we sing together. To be fair, 99.9%
| of the time her voice should definitely be considered noise I
| want filtered out. ;)
| cristyansv wrote:
| looks promising.
|
| but I've always wondered how Krisp.ai achieves such good results,
| considering that it works on the local device, plus the size is
| quite small (a few hundred MB). it really impresses me.
| disclaimer: I'm not affiliated in any way with Krisp.ai, just a
| happy user.
| methyl wrote:
| For PulseAudio, there is https://github.com/lawl/NoiseTorch
| PostThisTooFast wrote:
| What's "ALSA?"
| dgellow wrote:
| If you're using Windows, I recently found this small tool to
| reduce background and keyboard/mouse noises:
| https://closedlooplabs.com. It's not open source as far as I'm
| aware but way cheaper than krisp.ai's subscription model.
| ArsenArsen wrote:
| It is possible to use VST2 on Windows. This way you get RNNoise
| and the advantages of Free software.
|
| https://github.com/werman/noise-suppression-for-voice
| syntaxing wrote:
| Is there any RNNoise based alternative for MacOS? I managed to
| install the plug-in but find it hard to pipeline the audio into
| it.
| pabs3 wrote:
| I noticed that RNNoise doesn't appear to be an open model, you
| can't re-train it from scratch from the source data, which isn't
| publicly documented (or doesn't exist?), even if you had enough
| hardware.
| ArsenArsen wrote:
| The documentation is a bit poor. The original data is available
| for download (with more info about the entire process, most of
| which is outside of my grasp as I am not an ML person) in the
| demo blog post: https://jmvalin.ca/demo/rnnoise/ (towards the
| bottom of the page)
| ArsenArsen wrote:
| Coming back with information from #xiph on freenode:
|
| 16:57 <ArsenArsen> where and under what license is the training
|       data used for RNNoise?
| 18:38 <rillian> ArsenArsen: There's a copy of what I believe is
|       the training data on the xiph server, but afaik it's never
|       been published
| 18:39 <rillian> the original submission page has an EULA waiving
|       copyright and liability claims, and agreeing that it _may_
|       be released CC0.
| 18:40 <rillian> it looks like that didn't actually happen.
| 18:41 <rillian> there may have been concerns about auditing it
|       for privacy issues, but there's a lot of audio to listen
|       to, 6.5G compressed
| 18:41 <rillian> jmspeex, TD-Linux: what's the status of
|       publishing the rnnoise training data?
| 18:43 <jmspeex> Are you talking about the data that was used to
|       train the default RNNoise model or the noise that got
|       collected with the demo?
| 18:43 <rillian> jmspeex: I think debian just cares about the
|       training data for the default model.
| 18:44 <jmspeex> There was never plan to release that -- it
|       includes data from databases we cannot release
| 18:44 <jmspeex> but I don't see what the issue is. Distributing
|       the model is not the same as distributing the data
| 18:45 <rillian> ah, I see. I didn't realize you'd used
|       proprietary sources as well.
| pabs3 wrote:
| Any idea about the license for the original data?
| pabs3 wrote:
| The paper links to the McGill TSP speech database (English
| & French) as one of the sources of the data, which claims
| to be BSD licensed:
|
| http://www-mmsp.ece.mcgill.ca/Documents/Data/
| pabs3 wrote:
| The other source of data mentioned in the paper is the NTT
| Multi-Lingual Speech Database for Telephonometry, which
| seems to be commercial, so presumably under a proprietary
| license.
|
| https://www.ntt-at.com/product/multilingual/
| https://www.ntt-at.com/product/speech2002/
| the-dude wrote:
| So far we have 3 ideas!
| pabs3 wrote:
| Hmm, OTOH, the 6.4GB data tarball says that it is from
| contributors who responded to the demo and is licensed
| under CC0.
| ArsenArsen wrote:
| +1, that data is CC0, and I believe that's all the data
| that was used for training.
| jmvalin wrote:
| No, exactly _none_ of that data was used for training.
| The training was done before the demo that was asking for
| noise contributions. The contributions are CC0, but were
| never used (i.e. totally unknown dataset quality).
| pabs3 wrote:
| Also any idea if the training required nvidia GPUs or was it
| done on CPUs or GPUs with non-proprietary drivers?
| ArsenArsen wrote:
| There are training instructions in the repository. The
| training scripts appear to be using some pretty standard ML
| libraries (I'm seeing keras and mentions of tensorflow), so
| I imagine that the requirements are the same as those.
|
| I don't feel I'm qualified to elaborate on this
| specifically, again, I'm no ML person. For more info look
| here: https://github.com/xiph/rnnoise/tree/master/training
| https://github.com/xiph/rnnoise/blob/master/TRAINING-README
| ZoomZoomZoom wrote:
| Sound engineer here.
|
| RNNoise is an amazing feat, but please, don't overdo it. Most of
| the time, you don't really want complete ambient noise
| elimination, as human speech appearing from dead silence sounds
| unnatural. Moreover, most noise reduction software is
| considerably less effective in reducing noise _during_ a person
| speaking, either removing too much, producing degraded speech
| sound (worst case) or too little. If it's possible, always start
| adding your noise reduction gradually, stop when it sounds good
| to your ear and then back up a bit.
|
| If you're doing voice recording/streaming, please, get to know
| Expanding and Compression first, and only after configuring your
| sound processing chain add noise reduction in.
|
| One of the serious offenders is OBS Studio, which recently added
| an RNNoise filter but provides no means of mixing the processed
| sound with the dry one (in other words, the filter is always 100%
| on). A wet/dry mix knob is badly needed for most filters there.
|
| I'm very saddened by the state of sound quality in lots of
| amazing videos people have been producing lately and now I'm
| considering writing a guide for voice processing for
| streams/conferences/etc for the techy people, if anyone's
| interested.
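For readers unfamiliar with the expansion and compression mentioned above, here is a minimal sketch of the static gain curves, working in decibels. The thresholds and ratios are illustrative assumptions, not recommended settings, and real processors add attack/release smoothing that is left out here.

```python
def compressor_gain_db(level_db: float, threshold_db: float = -18.0,
                       ratio: float = 4.0) -> float:
    """Gain (dB) a compressor applies: levels above the threshold are reduced."""
    if level_db <= threshold_db:
        return 0.0
    # Above the threshold, the overshoot is divided by the ratio.
    return (threshold_db - level_db) * (1.0 - 1.0 / ratio)


def expander_gain_db(level_db: float, threshold_db: float = -50.0,
                     ratio: float = 2.0) -> float:
    """Gain (dB) a downward expander applies: quiet levels are pushed lower."""
    if level_db >= threshold_db:
        return 0.0
    # Below the threshold, the shortfall is multiplied by the ratio.
    return (level_db - threshold_db) * (ratio - 1.0)


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10.0 ** (gain_db / 20.0)
```

With these example settings, a -6 dB peak is turned down by 9 dB, while -60 dB background hiss is turned down by a further 10 dB. Unlike a hard gate, the expander attenuates quiet passages gradually rather than cutting them to silence.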
| zamadatix wrote:
| I wouldn't be too worried about it unless you're working on
| something at the level you know why to be worried about it
| (i.e. you're mixing audio as part of what you're doing, not
| because you just need the audio output to work). For instance
| I'd take missing comfort noise 10 times before everyone hearing
| my water heater kick up once on a conference call or while
| playing a team shooter.
|
| That being said RNNoise isn't that great at actually filtering
| background noise as much as guessing when to drop the levels
| and as you mention it really doesn't block much when it detects
| you're speaking rather just lets most everything through until
| you stop.
|
| RTX voice made the gold standard in filtering IMO though and as
| amazing a feat RNNoise is (I certainly couldn't do better) it's
| just not that good in comparison. I'm not sure what they did to
| make their model so good but I can use a boom mic set to omni,
| run a fan at high speed into the mic, bang on the desk
| repeatedly with one hand, have the water heater making noise,
| my phone vibrating on the table, a car alarm going in the
| background, the cat scratching a post, and so on and as long as
| I remember to talk at a normal volume it's damn near
| indistinguishable from talking in a quiet room. It may sound
| preposterous or like I'm exaggerating for effect but I'll be
| damned it actually filters that well. I didn't believe it until
| I tried. It finally gets "bad" when the noise is so bad and
| loud on the microphone your voice starts to sound a bit
| distorted but it's still isolated. Does let cat meows through,
| though that is technically voice and I'm not sure how you could
| identify it was a meow without massive latency to hear the
| whole thing first.
|
| That being said they seem to have completely fucked something
| up porting it to Nvidia Broadcast as the mic filtering in that
| leaks to the point it was like it wasn't even on.
| im_dario wrote:
| Your guide would be a blessing for techies looking to improve
| their audio quality. Please, do it!
| gsich wrote:
| > RNNoise is an amazing feat, but please, don't overdo it. Most of
| the time, you don't really want complete ambient noise
| elimination, as human speech appearing from dead silence sounds
| unnatural.
|
| No. Most sane programs don't do comfort noise because it is
| everything but comfort. Iff you speak data should be
| transmitted.
| StavrosK wrote:
| I think pretty much everyone who does A/V production (and some
| people who don't, like me) would be interested in such a guide.
| Please do write it!
| ArsenArsen wrote:
| I'd be quite interested in such an article, again, my goal
| (besides VoIP) is screencasting and/or streaming, so any bit of
| advice someone with experience might have is greatly useful.
|
| I'll look into expansion and compression, and I could implement
| a wet/dry setting that multiplies the source samples and then
| mixes them into the result, if I understood the concept right.
|
| EDIT: RNNoise seems to be alright when it comes to canceling
| noise during speech too, I didn't notice it overdoing it.
| ZoomZoomZoom wrote:
| > I could implement a wet/dry setting that multiplies the
| source samples and then mixes them into the result, if I
| understood the concept right.
|
| Haven't tested your version yet, but the
| werman/noise-suppression-for-voice plugin introduces some delay,
| and a dumb wet/dry control (or mixing with the original sound
| source in some other way) doesn't work, so it might turn out to
| be not so simple.
| ArsenArsen wrote:
| Right now there's no such feature in place, but I imagine
| keeping the buffer from before denoising and mixing it into
| the denoised result (plus the multiplication) will do what
| you're describing? It may increase volume, I might need to
| reduce the volume of the denoised audio first. I'll play
| around with it, and am open to hearing what you've got to
| say about it.
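The wet/dry idea discussed in this subthread can be sketched as below. It is only an illustration under stated assumptions, not a feature of alsa_rnnoise: the function and parameter names are hypothetical, the weights are chosen to sum to 1 so the blend does not get louder, and `wet_delay` models the processing delay mentioned above by delaying the dry signal the same number of samples before mixing.

```python
from typing import List


def wet_dry_mix(dry: List[float], wet: List[float], mix: float,
                wet_delay: int = 0) -> List[float]:
    """Blend processed (wet) and unprocessed (dry) samples.

    mix = 0.0 gives only the dry signal, mix = 1.0 only the wet one.
    wet_delay is how many samples the wet signal lags behind the dry one.
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must be between 0.0 and 1.0")
    if wet_delay < 0:
        raise ValueError("wet_delay must be non-negative")
    # Delay the dry signal so both copies of the voice line up in time.
    delayed_dry = [0.0] * wet_delay + list(dry)
    n = min(len(delayed_dry), len(wet))
    return [(1.0 - mix) * delayed_dry[i] + mix * wet[i] for i in range(n)]
```

If the two signals are mixed without aligning them first, the slightly offset copies of the voice interfere with each other, which is one plausible reason a naive wet/dry control sounds wrong.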
| AndrewUnmuted wrote:
| Great post.
|
| I'm also an audio engineer. This is the truth.
|
| In an audio recording featuring spoken voice, there are two
| sounds present in every recording: the spoken voice, and the
| room ambiance in the background. We typically will refer to the
| latter as "room tone."
|
| Even though we don't usually explicitly realize this, our
| ears/brain implicitly do. So, when people overdo noise removal,
| we implicitly hear the difference since half of the sounds that
| compose your filtered output are now gone. We tend to associate
| such recognizable "noise gating" with lower production quality
| and we find that generally such processing leads to lower
| intelligibility of the human voice.
| NovemberWhiskey wrote:
| The addition of an artificial ambient background is known as
| "comfort noise" for those who are interested to look further
| into it; usually it's done on the receiver end.
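A very simplified sketch of comfort noise on the receiving side: wherever the incoming signal is effectively silent, low-level noise is substituted. The function name, the threshold and the noise level are illustrative assumptions, and plain white noise is used here, whereas real implementations typically shape the noise to resemble the original background.

```python
import random
from typing import List, Optional


def add_comfort_noise(samples: List[float], noise_level: float = 0.001,
                      silence_threshold: float = 1e-4,
                      seed: Optional[int] = None) -> List[float]:
    """Replace near-silent samples with quiet white noise; keep the rest."""
    rng = random.Random(seed)
    out: List[float] = []
    for s in samples:
        if abs(s) < silence_threshold:
            # Gated-out sample: substitute a small random value.
            out.append(rng.uniform(-noise_level, noise_level))
        else:
            out.append(s)
    return out
```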
___________________________________________________________________
(page generated 2021-01-31 23:01 UTC)