[HN Gopher] Enhanced noise suppression in Jitsi Meet
___________________________________________________________________
Enhanced noise suppression in Jitsi Meet
Author : jlpcsl
Score : 254 points
Date : 2022-10-01 11:28 UTC (11 hours ago)
(HTM) web link (jitsi.org)
(TXT) w3m dump (jitsi.org)
| quickthrower2 wrote:
| Aside, Jitsi is pretty awesome for creating a video app idea
| quickly. The API is very easy to use.
| mcluck wrote:
| It does seem to do a good job of eliminating noise but it seems
| like it gets rid of a lot of the signal too. It's much easier to
| understand the noisy sample than the processed one
| josteink wrote:
| I'm using RNNoise as a pipewire input filter on my Linux
| machines, but that's very Linux-specific and a bit "hardcore" to
| set up.
|
| Nice to see it getting integrated into video meeting solutions,
| so more people can take advantage of this awesome library.
| Doman wrote:
| Awesome! Could you please elaborate how to do it or post some
| good/not outdated links?
| josteink wrote:
| Don't remember exactly which guide I followed, but I used the
| build from this repo, and the instructions look plausible:
|
| https://github.com/werman/noise-suppression-for-voice#pipewi...
| asicsp wrote:
| > _but that's very Linux-specific and a bit "hardcore" to
| setup_
|
| Have you tried https://github.com/noisetorch/NoiseTorch/?
| nicolaslem wrote:
| Or https://github.com/wwmm/easyeffects for noise reduction
| and other effects like compression and EQ for a real crooner
| voice in any video call application.
| kevincox wrote:
| Definitely recommend easyeffects over noisetorch. No root,
| high-quality GUI, and it can run automatically at startup. I
| only use the noise suppression 99% of the time but having
| the other effects available can also be fun.
| pen2l wrote:
| Created a few years ago by Jean-Marc Valin of xiph/mozilla (who
| by the way is also the author of the Opus codec, among other things):
| https://gitlab.xiph.org/xiph/rnnoise/
|
| Overview of RNNoise from the horse's mouth is here:
| https://jmvalin.ca/demo/rnnoise/
|
| Used as a Wasm module! In some ways the web is becoming more
| opaque. Is this the future then, a hodgepodge of binaries doing
| things behind the scenes? Though in this case it happens to be
| OSS, and it may well be a moot point -- backend is already a
| blackbox to the enduser, now parts of frontend are blackboxes.
| The practical implication is probably just that some measure of
| customizability is gone.
| saghul wrote:
| What a weird take.
|
| How else would we have implemented this? WASM has facilitated
| introducing these technologies into web applications, it
| literally wasn't possible before.
|
| Thanks to emscripten it wasn't even that hard to get rnnoise
| working on WASM: https://github.com/jitsi/rnnoise-wasm
|
| I concede WASM does open the possibility of adding opaque stuff
| to web apps but IMHO the benefits outweigh the drawbacks at
| this point.
| pen2l wrote:
| Oh no you're absolutely right, my general frustration was
| ill-placed for this thread. Wasm is no doubt the right and
| only way to have done this.
| danuker wrote:
| Short of reproducible builds, you can't even check that what
| you're being served is, in fact, the OSS version.
| naillo wrote:
| I feel like I have about as much chance reading disassembled
| wasm as I would have reading unminified javascript so I don't
| think it changes much.
|
| (You could technically turn the wasm into JS and unminify that
| too, which I doubt is much harder or easier to decipher than the
| same thing written in JS and minified/unminified.)
| sabjut wrote:
| Is this a troll comment? Yes, wasm works based on a compiled
| binary, just like any other program written in a compiled
| language in the past 50 years. You try to suggest that everyday
| users of the web are just going into the JS sources of webpages
| and understanding what's going on. With the plethora of libraries,
| frameworks and static optimization used in today's websites,
| normal people can't really dissect the inner workings of a
| website just by looking at the code. That's why we have tools
| like request analyzers etc which all would still work with
| compiled libraries.
|
| Compiled code has existed for half a century and we know how to
| work with it.
|
| Suggesting that the web is doomed because people of the future
| prefer rust instead of javascript is beyond any rationale.
| salawat wrote:
| ...I still dissect website code, thank you very much.
| Basically have to do it just to figure out quirks I'm
| constantly running into.
| troyvit wrote:
| They didn't suggest the web is doomed, just that more aspects
| of it are opaque. I don't think they're talking about
| everyday users of the web either, but rather nascent developers.
|
| The early web was a great equalizer. Anybody could study a
| little html, download an ftp manager, jump through a few
| procedural hoops and have a web page. After some studying and
| trial and error they could even build an interactive site.[1]
|
| It's easy to miss all the potential of wasm when that's what
| you remember of the web. To me the amazing thing is that
| browsers will still work with the methods described above[2]
| but we're on the cusp of being able to do almost everything a
| full application environment can do.
|
| That said, even though there will be plenty of OSS wasm tech,
| it'll still be more opaque to those of us who don't do
| compiled languages. It'll be a lot tougher to just fork the
| code and do something more creative with it.
|
| [1] PHP used to stand for "Personal Home Page" and, as one of
| its founders put it, was created so that "any idiot" could
| make an interactive site.
|
| [2] https://t.mkws.sh/58bytes/
| fragmede wrote:
| JavaScript minifiers to obfuscate the code have been around
| pretty much since the language got popular, so that version
| of the web's been gone since about when Myspace lost to
| Facebook. Places like Glitch.com are trying to bring that
| back though.
| maven29 wrote:
| Are modern-day "no code" tools like Webflow not an
| acceptable equivalent?
|
| We already lost any semblance of building from scratch in
| the mid-2000s with the emergence of gargantuan HTML
| templates and Wordpress/Drupal/phpBB deployments with
| plugins and themes.
|
| This is a direct result of people being held to higher
| standards and thus spending a lot more effort overriding
| the compositional and behaviour defaults of the user agent.
|
| The modern-day iteration just optimizes for scaling up to
| tens of thousands of concurrent end-users on anemic
| hardware.
|
| We have to accept the fact that personal webpages gave way
| to social network profile pages. This didn't happen
| overnight and there is zero demand for a hand-crafted
| presence on the web anymore.
| kragen wrote:
| No, an environment for writing _new_ code is not any kind
| of equivalent for the ability to reverse-engineer
| _existing_ code. Firebug and its clones are a much closer
| equivalent than anything like Webflow.
| rektide wrote:
| Build from scratch is out of favor, but not necessarily
| that far off. Folks like Github & Youtube have very
| simple bottom-up webcomponent systems they use, rather
| than top-down frameworks. Existing concerns about
| bundling might be met by bundled HTTP exchanges
| (webpackage).
|
| I don't think "no code" is an aid. If anything it's
| pushing in the opposite direction: rather than a
| transparent approachable web medium, it suggests we need
| hyperadvanced tools that we really won't understand or
| have control over to synthesize web code. It's a simpler
| user experience, but a push away from notepad.exe webdev.
|
| I wouldn't rush to make any conclusions about who or what
| has won, as a settled fact & case for all time. We haven't
| had good ways to run online systems ourselves, versus
| hosted for us, and there's still lightyears to go but
| we're doing good things & finally maturing well. We're
| only a couple years into ActivityPub as an interchange
| format & growing many of the capabilities & tools &
| systems, around all kinds of use cases, that will make
| throwing together a fair, interactable competitive
| offering possible. Social media has had huge huge
| investment poured into it, but we are in decent preteen
| years of growing up & owning the libre equivalents. We
| can assess demand only after there is a visualizable
| state people can imagine; just having an isolated blog is
| not the equivalent to the well connected social media
| site, but these capabilities slowly arise. Follow the
| alpha geeks; this currently long phase will not be
| forever.
| Uehreka wrote:
| Sure "everyday users" aren't clicking "View Source", but
| that's not really what the issue is about.
|
| When I was a kid, every piece of software I used was pre-
| compiled, and therefore opaque. This made it difficult for me
| to figure out how people made certain things, and after a
| while I lost interest in programming.
|
| When I got back into it later, one thing that made a huge
| difference was being able to see how various cool JS sites
| were built. The ability to "View Source" like that was
| revolutionary, and also allowed me to build some early fun
| projects, like a Cookie Clicker "AI" that could play the game
| automatically by calling the functions I could see in the
| game's source.
|
| I'm far from the only person with experiences like these.
| Yes, there was programming before View Source and there will
| be programming after. And for those of us with the right
| tools or reverse engineering skills, View Source isn't
| particularly relevant. What we're losing is a pipeline that
| helped people become/stay interested in programming, which
| makes it likely that future programmers who would've followed
| a path like mine will do something else instead.
| est31 wrote:
| On the other hand, it's never been as easy to contribute to
| OSS projects as it is now. Github has severely lowered the
| requirements compared to earlier settings where you had to
| get an e-mail client, configure it in just the right way,
| etc. You have live coding youtubers, there are discord
| communities for all types of technology, and knowledge
| about programming and technology is extremely available
| through Google, way more than it was 20 years ago. I think
| young people still have tons of opportunities to start out.
| SergeAx wrote:
| Today's JavaScript "View source" is 90% useless because of
| Webpack et al. The original program is effectively compiled
| into obscure and obfuscated lowest-common-denominator JS.
| Weatherweathe wrote:
| Aren't wasm modules still sandboxed? Reverse engineering binaries
| should be about as complex as reverse engineering
| uglified JS, not sure how they are more opaque
| pen2l wrote:
| You probably have a point but I'm thinking unuglified js code
| (http://www.nice2predict.org/) is not as impenetrable as code
| from reverse engineered wasm binaries? The element of
| plausible deniability is more potent though for the nefarious
| actor on the other side in the case of wasm binaries.
| robalni wrote:
| I don't think it makes much of a difference whether you can
| read the code because even if you can read the javascript, it's
| automatic so it can be different on the next request. If we
| want to be able to trust the web, we have to get rid of the
| automatic download and execution of arbitrary script code.
| api wrote:
| Most JavaScript these days is basically compiled binary. Rarely
| is it very human readable.
| elcomet wrote:
| Wasm is about distribution of binaries, not about open source.
| Those are two different subjects.
|
| When I install a program on my debian machine with apt-get, I
| also get binaries. But this doesn't mean that it is opaque,
| right?
| KMnO4 wrote:
| I think we've been lulled into some false sense of expectation
| that the web exists as a place for "open source code" to be
| run. As if the fact that you can view the source of any page is
| any purveyor of that.
|
| If that's your definition of transparency, then perhaps
| learning to read assembly would give you the same comfort. In
| fact, there are a lot more binaries distributed with symbols
| intact than unminified JS.
|
| Or, to put it another way, if you could right click -> view
| disassembly of any binary on your computer, how would that be
| any different than today's web?
| geiser wrote:
| Sorry, but at least on my smartphone, I can understand the
| unprocessed audio showcased further down the web page better than the
| noise-suppressed audio. How is that?
| CharlesW wrote:
| The original audio is significantly easier to understand. This
| may be technically interesting, but the noise suppression is
| aggressive to the point that it's eating critical signal with
| the noise.
| SergeAx wrote:
| This is the default for online conferencing. Everyone is way
| better off asking the other party to repeat a couple of words than
| listening to all that noise during the whole call.
| ComputerGuru wrote:
| > Everyone is way better off asking other party to repeat
| couple of words than listening for all that noise during
| the whole call.
|
| I didn't understand the first three words, for Alice it was
| the next two, and for Bob it was the last four. How many
| people are going to ask to repeat?
|
| Evolution taught us to understand over the sound of waves,
| crickets, rain, thunder, and more. It didn't teach us to
| comprehend with half the signals masked.
| leni536 wrote:
| But this might be better served with a simplistic voice
| activity detection, like in mumble.
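
A minimal sketch of what a "simplistic" voice activity gate could look like, assuming a plain amplitude threshold with a short hold time. This is an illustration only, not Mumble's or Jitsi's actual implementation; the names `frame_rms` and `voice_gate` and the default `threshold` and `hold_frames` values are hypothetical.

```python
import math
from typing import List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def voice_gate(
    frames: Sequence[Sequence[float]],
    threshold: float = 0.02,  # hypothetical default, tune per microphone
    hold_frames: int = 5,     # hypothetical default
) -> List[bool]:
    """Return one transmit / do-not-transmit decision per frame.

    The gate opens when a frame's RMS reaches `threshold` and stays open
    for `hold_frames` further frames after the level drops, so that word
    endings are not clipped.
    """
    decisions: List[bool] = []
    hold = 0
    for frame in frames:
        if frame_rms(frame) >= threshold:
            hold = hold_frames
            decisions.append(True)
        elif hold > 0:
            hold -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions
```

A gate like this only decides whether to transmit a frame at all; unlike RNNoise it does nothing about noise that overlaps with speech.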
| atty wrote:
| Somewhat tangential, but at my work we have found WebEx's
| background noise removal to be absolutely amazing. So many times
| we've had someone in a meeting say "sorry about X/Y/Z, it's so
| noisy", and the rest of us won't hear a thing. This sorta tech
| has gotten so good, and is a really nice quality of life
| improvement for remote work. (Or for meetings with people in
| noisy offices of course)
| naillo wrote:
| Rare to find creative real time small-weight uses of ML but I
| love when it's done and this has an impressive and well written
| explanation with it as well. Great stuff.
| haunter wrote:
| This is one of the filters OBS uses too (the other is Speex, which
| is obsolete to some extent)
| eis wrote:
| Bummer, reading the title I thought Jitsi had a new de-noiser
| because they had RNNoise for some time. Unfortunately RNNoise has
| not received much advancement for a couple years. It's by now
| half a decade old tech. I've worked with the WASM version in the
| past but it can be hit or miss. Sometimes it makes the audio you
| want a bit weird. It also added something like 10% CPU usage and
| in the end we disabled it again.
|
| I'd love to see a more state-of-the-art solution that works
| with WASM. Something that one could train on their own
| voice to filter out everything else would be awesome. Because all
| the noise cancellation tech does not help if you sit in an
| environment with other people talking next to you and the AI
| doesn't filter it because it's voices. Sometimes coworkers use
| Krisp but even that proprietary paid solution is so-so.
| saghul wrote:
| While we've had rnnoise integration for a while it was for
| "noisy environment" notifications, this is the first time we
| use it to actually filter audio.
|
| Also audio worklets weren't a thing when we first introduced
| it.
|
| I'm not aware of any other open source (and better) models, but
| if any come up, we'll certainly check them out!
| pen2l wrote:
| If you have any involvement with Jigasi or might be in the
| know -- are there plans to use Whisper, for instance, instead
| of Google's API for transcription? If I recall correctly,
| Jigasi is using Google's API, and local transcription aligns well
| with the rest of Jitsi's missions.
| nikvaes wrote:
| The problem for Jigasi's speech-to-text feature with
| Whisper, or any recent SOTA speech-to-text neural
| network, is that they are transformer-based. One of the
| key features of transformers is that they are very good at
| processing a sequence with the attention mechanism. But
| attention inherently needs to see the whole input sequence.
| So it's difficult to adapt these architectures to perform
| well in real-time scenarios like captioning meetings.
| pen2l wrote:
| Yes! But a part of the Jitsi ecosystem enables recordings
| and whisper is a good candidate to use for these recorded
| sessions.
|
| On that topic -- they record sessions in an interesting
| way, basically an instance of Chrome is started and
| captured... I think with OBS. That always made me raise
| an eyebrow but I also can't think up a better way.
| saghul wrote:
| We do have VOSK support already. I haven't heard of
| whisper, but it does sound like a good GSoC project for
| next year!
| pen2l wrote:
| If I have time I'll try to help you guys out. I'm a big
| fan of what you're doing. :)
| eis wrote:
| Thanks for the clarification. We also experimented with audio
| worklets + RNNoise about 1.5 years or so ago but had very
| mixed results. The potential upside with processing in
| another thread is clear but some browser and OS combinations
| just didn't work well and resulted in micro stutters in the
| audio. I remember Chromium on Linux for example being
| finicky. Some browsers worked better with smaller buffers,
| some needed bigger ones. We spent too much time debugging and
| tuning for different systems and the audio quality
| improvement was not deemed good enough so we shelved the
| effort. I guess audio worklets have improved since then and
| are probably more usable by now. Do you guys have some kind
| of performance monitoring for the noise cancellation or audio
| in general?
|
| At the time I also spent a few days looking for something
| better but didn't really find anything. Unfortunately RNNoise
| is the best we have :( The only other noise cancellation
| software that actually impressed me was the one from Nvidia
| but that's not something that one could integrate via WASM
| and of course wouldn't work on most devices anyways.
|
| Oh what a day it will be where we have energy efficient
| hardware encoders for AV1 in every device plus some really
| good noise cancellation. Oh and then we just need internet
| connections without packet loss :P
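
One buffering detail behind the worklet discussion above: an AudioWorklet hands audio to its processor in blocks of 128 samples, while RNNoise works on 480-sample frames (10 ms at 48 kHz), so some re-chunking is needed between the two. A rough sketch of that re-chunking, written in Python for brevity rather than as the JavaScript a real worklet would need. The `FrameAdapter` class is hypothetical and is not taken from the Jitsi code base.

```python
from typing import List

RENDER_QUANTUM = 128  # samples delivered per AudioWorklet process() call
RNNOISE_FRAME = 480   # samples per RNNoise frame (10 ms at 48 kHz)


class FrameAdapter:
    """Collects fixed-size input blocks and emits whole denoiser frames."""

    def __init__(self, frame_size: int = RNNOISE_FRAME) -> None:
        self.frame_size = frame_size
        self._pending: List[float] = []

    def push(self, block: List[float]) -> List[List[float]]:
        """Add one input block; return every complete frame now available."""
        self._pending.extend(block)
        frames: List[List[float]] = []
        while len(self._pending) >= self.frame_size:
            frames.append(self._pending[: self.frame_size])
            del self._pending[: self.frame_size]
        return frames

    @property
    def buffered(self) -> int:
        """Samples waiting for the next complete frame."""
        return len(self._pending)
```

Because 480 is not a multiple of 128, frames become available at an uneven cadence (after the 4th, 8th, 12th and 15th block), which is one reason buffer sizing can matter for glitch-free output.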
| [deleted]
| gnicholas wrote:
| Anyone have tips for using Jitsi? I've been thinking about moving
| off Zoom now that they're enforcing a 40 min limit even for one-
| on-one calls.
|
| Does it create friction for folks who haven't used it before? Any
| suggested instructions to send with a meeting invite?
| e12e wrote:
| We've been using jitsi via zulip chat at work. It should be
| drop-in for at least small groups (one-on-one, handful of
| people - I have yet to investigate "conference" or "class room"
| size).
|
| We do unfortunately see semi-regular lock-up/freezes where one
| end of the stream stops for ~30 seconds. Maybe this is worse in
| safari vs chrome/Firefox - we have not yet experimented much
| with different browsers. Or maybe there's a difference between
| x86_64 and arm/m1/m2.
| dividedbyzero wrote:
| As someone invited to a Jitsi meeting a while ago, not having
| any video background removal, a lot less audio processing and
| what looked like no video processing at all meant everyone was
| harder to understand, harder to see and any activity or clutter
| in the background was fully visible of course. I guess buying
| quality microphones and cameras for everyone involved would
| help. Detailed instructions are a good idea as well, I
| struggled a bit with the unfamiliar interface.
|
| Personally, I'd stick with the big names, long remote meetings
| are strenuous enough even with all the quality of life features
| those offer.
| _joel wrote:
| I prefer the sample with the noise. Seems clearer to understand
| SergeAx wrote:
| Would you prefer to listen to this noise for half an hour? :)
| mcluck wrote:
| Or just have them mute and unmute at appropriate times. I do
| this even in non-noisy environments
| dsr_ wrote:
| Assuming the demo samples aren't rigged, that's a very
| substantial improvement.
| hawski wrote:
| Is there video conferencing software that does spatial audio for
| conferences? What I have in mind is that it is often problematic
| to understand each other while multiple people are talking. It is
| much easier in person. I guess it all comes down to the ability to
| focus on directional cues of an audio source. Currently everyone
| is placed inside one's head so they interfere much more this
| way.
| gnicholas wrote:
| Apparently FaceTime offers this. [1] Presumably Apple will
| allow other companies to do it as well, since they let them
| offer spatial audio in other contexts.
|
| 1: https://support.apple.com/guide/iphone/change-the-audio-sett...
| d110af5ccf wrote:
| Why would Apple need to allow it? It's simply a matter of a
| given program postprocessing the various audio streams
| appropriately prior to muxing them for output.
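
As a rough illustration of "postprocessing the various audio streams appropriately prior to muxing", here is a minimal sketch that gives each participant a fixed stereo position with constant-power panning and then sums the result. This is plain panning, not full HRTF-based spatial audio, and the function names `pan_gains` and `mix_spatial` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pan_gains(pan: float) -> Tuple[float, float]:
    """Constant-power (left, right) gains for pan in [-1.0, 1.0].

    -1.0 is fully left, 0.0 is centre, 1.0 is fully right.
    """
    angle = (pan + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)


def mix_spatial(
    streams: Sequence[Sequence[float]],
    pans: Sequence[float],
) -> List[Tuple[float, float]]:
    """Mix mono streams into one stereo stream, one pan position each."""
    if len(streams) != len(pans):
        raise ValueError("need exactly one pan position per stream")
    if not streams:
        return []
    gains = [pan_gains(p) for p in pans]
    length = min(len(s) for s in streams)
    mixed: List[Tuple[float, float]] = []
    for i in range(length):
        left = sum(s[i] * g[0] for s, g in zip(streams, gains))
        right = sum(s[i] * g[1] for s, g in zip(streams, gains))
        mixed.append((left, right))
    return mixed
```

Spreading speakers across the stereo field this way gives the listener at least a coarse directional cue per voice.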
| rasz wrote:
| You could give up on the audio portion of your current video
| conferencing setup and just install Teamspeak with a spatial
| plugin
| https://www.myteamspeak.com/addons/9ddfa0b2-25c2-4302-8a43-0...
| tbalsam wrote:
| Very very good, a little bit of stuttering during the honking I
| think but I like it overall! :D :)
|
| Jitsi Meet has been a great alternative to other meeting apps in
| these crazy times.
| Kwpolska wrote:
| My experience with Jitsi Meet has been quite bad. My previous
| employer was a cheapskate, and they self-hosted Jitsi Meet.
| Random disconnections and instability were pretty much a daily
| occurrence, some people were disconnected every few seconds.
| While I suppose the self-hosting by Cheapskate Inc. was the
| main culprit, Jitsi's screen sharing wasn't looking very good.
| andrepd wrote:
| So someone hosted $software on a shitty server, and you blame
| $software for the shitty performance? To draw any conclusions
| you should look at meet.jit.si (hosted by Jitsi), no?
| saghul wrote:
| We've made significant tweaks to screen-sharing in the past
| 2-3 stable releases, in case you feel inclined to check us
| out again :-)
| spockz wrote:
| Aside from consuming a ton of resources when screen sharing,
| my experience with Jitsi meet has been very good. It consumes
| two cores of my 5900X (1 for the Firefox process, and another
| for some system process I don't recall exactly) but it works.
| This was with sharing a 4K screen.
|
| I have run jitsi on cheap VMs and it worked decently. But you
| need quite some cores to serve all the traffic. Ultimately I
| ended up having as many 2-4core VMs as I had concurrent
| calls.
| 2Gkashmiri wrote:
| How is meet.jit.si hosted? I assume that with lots and lots
| of random users, the bandwidth and processing costs must be
| astronomical.
| troyvit wrote:
| My last employers were cheapskates too (I love 'em for it)
| and they just used meet.jit.si for calls. It was a lot more
| stable than self-hosted jitsi. That said there were almost
| always microphone or video issues using it, just because
| people weren't used to it I guess. It made job interviews
| fun. It was a nice live test to show how a potential employee
| handled adversity.
| TingPing wrote:
| My company self-hosts an instance and it's excellent.
| wrp wrote:
| I've been using Jitsi Meet regularly for about a year. It's
| usually fine, but on some days I experience disconnections
| every several minutes.
| shaan7 wrote:
| Indeed! I recently used a locally hosted Jitsi to talk to my
| family in the other room while in COVID isolation. It was a
| life saver, and extremely easy to set up with docker-compose
| with only a handful of steps that I could complete even with
| fever+headache https://jitsi.github.io/handbook/docs/devops-guide/devops-gu...
___________________________________________________________________
(page generated 2022-10-01 23:00 UTC)