[HN Gopher] Show HN: Cleanvoice - Automated Podcast Editing
___________________________________________________________________
Show HN: Cleanvoice - Automated Podcast Editing
Author : autoencoders
Score : 160 points
Date : 2021-11-20 14:58 UTC (8 hours ago)
(HTM) web link (cleanvoice.ai)
(TXT) w3m dump (cleanvoice.ai)
| tokamak-teapot wrote:
| "The algorithm can also work with accents from other countries,
| such as Australian ones or Irish."
|
| Other than which country, though? Presumably an English speaking
| one - UK? New Zealand? Canada? US?
| hs86 wrote:
| Reminds me of https://auphonic.com/
|
| Their pricing is also similar, but Auphonic allows both
| subscription and prepaid "credits".
| autoencoders wrote:
|   Yes, the idea is to also bring prepaid credits soon.
|
| Auphonic and Cleanvoice go well together.
|
| I guess the idea is to have your podcast edited by Cleanvoice
| and then the audio post-processing with Auphonic.
| ghaff wrote:
| Auphonic's volume equalization is almost a must-have for
| podcasts. I used to spend a lot of time getting volumes
| right. With Auphonic it's quick and easy.
|
| I definitely prefer pre-paid credits to a subscription given
| my podcast production varies a lot.
| xipho wrote:
| Can anyone recommend similar for removing ums etc. in videos?
| IIRC there is a workflow in some professional software, but being
| able to train and throw the algorithm right at the video itself
| (especially locally) would be useful.
| autoencoders wrote:
| For now, Descript would be the best option. You can still make
| it work with the integrations, but it is a lot of effort.
|
| That will change in Q2, when I add support for video.
| mijustin wrote:
| Yes; Descript.com does this.
| nickjj wrote:
| > Can anyone recommend similar for removing ums etc. in videos?
|
| For single camera floating head style videos where you're
| continuously talking about 1 topic it's going to be very
| jarring if you start cutting out filler words. You'll end up
| with a bunch of jump cuts where it looks like video frames are
| dropped.
| nickjj wrote:
| As someone who has personally edited over a hundred 1-2 hour
| podcasts with a new guest every time, removing umms, ahhs, dead
| air and filler words is soul crushing. It has gotten to the point
| where after 2 years of running my podcast[0] I'm seriously
| considering stopping the show because I'm getting burnt out from
| editing and without sponsors it's not feasible to hire an editor,
| but even with the show making no money I would happily pay triple
| your asking price if I could click a button and have the problem
| solved in a way that matched a human's ability to edit out filler
| words.
|
| It really is the difference between being able to edit a 1 hour
| episode in 1 real life hour (editing at 2x speed) vs literally
| spending 5 hours to edit 1 hour when there's a lot of filler
| words or ums. That's due to having to stop every few seconds,
| think about when to cut it and perform the cut. This is using a
| heavily optimized keyboard shortcut focused workflow too.
|
| I hope you don't mind constructive criticism but in my opinion
| your "after" version doesn't sound natural. This isn't an attack
| on your service specifically, because the outcome is the same
| with all of the automated tools I've tried. I haven't tried them
| all but I did play with a few of them.
|
| For example in your case the pause between "Removing" and
| "filler" doesn't match the pace of the rest of the sentence and
| the transition from "very" to "time" has a very hard cut. This is
| also a 10 word clip that's about 6 seconds. If you listened to a
| 1 hour podcast episode that was edited like this it would be much
| more noticeable.
|
| There's so many intricate and subtle details around when and what
| to cut to remove these things in a way where it's not noticeable.
| Are there any paths moving forward in AI / ML that can lead to
| this being indistinguishable from being humanly edited?
|
| I debated deleting this comment before posting it because it's a
| combination of feedback but also saying the service isn't
| something I would buy in its current state but I'd like to think
| it's more beneficial to post this to show there is a real demand
| for this service if it can be executed flawlessly.
|
| [0]: https://runninginproduction.com/
| moritonal wrote:
| Meta, but your comment was (IMHO) a great example of
| constructive criticism. Show HN is about that, not just staying
|   silent and letting the user's work die.
| dannyeei wrote:
| Funnily enough I was about to start building this then found
| descript[1]. It transcribes the text and allows you to edit the
| transcription then export it as audio.
|
| [1] https://www.descript.com/
| [deleted]
| autoencoders wrote:
|   The edit on the page is not the best, I agree! Mainly, if your
|   recording is unnatural (like that one), the edit is also
|   unnatural. However, the tool works better on an interview
|   podcast. I would strongly recommend just uploading a sample;
|   you would see a big difference.
|
|   As for whether ML edits will become indistinguishable from
|   human edits: hard to tell. I think it will be like self-driving
|   cars in the future: 98% good edits, 2% bad edits.
| qmmmur wrote:
| What post-processing do you do already to catch the low hanging
| fruit? Izotope? I reckon putting in 100 hours of editing and
| not being able to get an hour down to sub an hour means there
| is something which could be optimised out quite quickly.
| nickjj wrote:
| > What post-processing do you do already to catch the low
| hanging fruit?
|
| None, everything is manual.
|
| I use DaVinci Resolve to do the editing where both the guest
| and myself have separate tracks. Then I line up the tracks
| (only takes a few seconds) and start playing things from the
| beginning at 2x speed. I stop to make cuts mostly to remove
| filler content.
|
|     Throughout this process of editing I'm also creating show
| notes as I go. An example of the end result is here
| https://runninginproduction.com/podcast/103-great-
| question-m.... Basically every few minutes I recap what was
| said into a 1 sentence bullet point with a timestamp. Along
| the way I list out techs used as tags and list out reference
| links / libraries into a Markdown document. Then once I'm
| done editing the show I write a few paragraphs which is a
| TL;DR of the episode.
|
| All in all if the guest uses minimal filler words or noises
| it takes about 1 real life hour per 1 hour of recorded
| content to do all of the above. For context, the episode I
| linked has someone who I would bucket into a category of
| speaking very fluently with minimal filler content. I was
| able to blaze through that one.
|
| I also have a 2560x1440 display and use the "always on top"
| feature of most window managers to layer the Markdown
| document and a preview of the page just above the waveform in
| DaVinci Resolve so I can quickly make cuts and update the
| notes with minimal mouse movement. Almost everything is
| keyboard driven.
|
| What tools can be used to speed up that process?
| simonbarker87 wrote:
| I've not edited anywhere near as much as you have but I agree,
| it's so tedious and by the end of an editing session you can
|   really start to resent the guest and all their verbal tics. I
|   find I get a good idea for what the waveforms look like for
|   some noises and can see them coming and preemptively split the
|   track at the start with a decent success rate.
|
|   Using RiversideFM to get two local recordings is also a big
| help.
|
| I was sat next to an audio editor and producer at a wedding
| recently and we got on to this topic and he said "your number
| one job when editing an interview is to make the host sound
| good and then just do the minimum on the guest, otherwise
| you'll waste too much time".
|
|   Doing that kind of editing 8 hours a day I can see why he says
| that.
| nickjj wrote:
| Yeah it's weird. I have these in depth technical
| conversations with every guest where it's great, I love this
|     part. The frequency of verbal tics and filler content really
| takes an edit from "this isn't too bad" to "what the fuck am
| I doing with my life?" all based on how many times you need
| to remove filler content within the first 5 minutes of
| editing a 90 minute show.
|
| I'm kind of surprised that wedding producer openly said that.
| My philosophy has always been the opposite. One of my main
| goals of the show is to make the guest walk away thinking
| this was the best podcast experience they ever had from start
| to finish as well as do everything I can to make them come
| off as good as possible.
|
| I rarely cut content but most episodes have hundreds of
| manual edits to remove filler content and create a more
| concise flow by removing long pauses because my 2nd main goal
| is to optimize for the listener. I keep the edits organic at
| the same time by leaving in some filler content and subtle
| things like a deep inhale or a sigh because there's a lot of
| meaning around that when it comes to sentiment and tone, the
| same can be said for sometimes leaving in an extra 500ms
| pause to amplify the meaning behind something. At the same
| time, sometimes filler content gets left in because it flowed
| too quickly into the next word so cutting it sounds too
| unnatural as if it clipped.
|
| This is why I think it's a crazy hard problem to get a
| machine to be able to make decisions like this.
|
| I do use separate recordings (we each record our track
| locally), it definitely helps eliminate the few cases where
| we talk over each other or being able to lower the volume of
| a laugh so it doesn't overpower what the other person said
| while still keeping it in because it's a good part of a
| conversation and a snort or laugh can easily be the
| difference between a listener wondering if the guest was
| offended or happily agreeing with something.
| [deleted]
| mijustin wrote:
| Hey! Justin (from Transistor.fm) here. This looks really
| interesting. Two questions:
|
| 1. Any plans for an API and bulk pricing?
|
| 2. Any plans to add loudness normalization, balancing, etc to the
| processing?
| autoencoders wrote:
| Hey Justin! Love your podcast.
|
| 1) API Access will come end of Q1.
|
| 2) In the next 6 months, No. However, Auphonic would be a good
| fit for you.
| abdik wrote:
| The logo is similar to ours https://www.lovo.ai/
| bryans wrote:
| While turning it into a heart may be clever branding, you've
| only slightly modified a ubiquitous icon representing audio,
| and countless startups used that before you.
| eganist wrote:
| This is awesome.
|
| Can I suggest the ability to export as project files for popular
| editors for your roadmap? It'd cut professional workflows down
| substantially, which would be worth an (even higher) upcharge.
|
| (It wasn't immediately obvious to me if you already did this)
|
| Edit: https://cleanvoice.ai/integrations seems pretty close. I'd
| honestly charge more for integrations and provide a base tier for
| just exporting sound. I imagine most indie users would benefit
| from finished exports enough to pay, while project files would
| command a higher fee from editors looking to speed up their
| workflow to take more clients. That's where I'm coming from on
| pricing tiers and upcharging for professional features.
| autoencoders wrote:
|   ADL support will come around Q2, so you can import it into a lot
|   of audio and video editors. For now, we have the export files
|   that you mentioned.
|
| Regarding Pricing, that's a good point. I will definitely
| consider it, thank you!
| daenney wrote:
| The Terms of service seem worrisome.
|
| > By posting your Contributions to any part of the Site or making
| Contributions accessible to the Site by linking your account from
| the Site to any of your social networking accounts, you
| automatically grant, and you represent and warrant that you have
| the right to grant, to us an unrestricted, unlimited,
| irrevocable, perpetual, non-exclusive, transferable, royalty-
| free, fully-paid, worldwide right, and license to host, use,
| copy, reproduce, disclose, sell, resell, publish, broadcast,
| retitle, archive, store, cache, publicly perform, publicly
| display, reformat, translate, transmit, excerpt (in whole or in
| part), and distribute such Contributions (including, without
| limitation, your image and voice) for any purpose, commercial,
| advertising, or otherwise, and to prepare derivative works of, or
| incorporate into other works, such Contributions, and grant and
| authorize sublicenses of the foregoing.
|
| It sounds an awful lot like "we are allowed to do anything and
| everything we want with the content you upload to us". Maybe I'm
| misunderstanding something, but I'd be extremely hesitant to
| upload any content I create to a service with those kinds of
| terms.
| [deleted]
| [deleted]
| stevenicr wrote:
| also.. " Your Contributions are not obscene, lewd, lascivious,
| filthy, violent, harassing, libelous, slanderous, or otherwise
| objectionable (as determined by us). 7. Your Contributions do
| not ridicule, mock, disparage, intimidate, or abuse anyone."
|
| Cancel culture coming.. main reason I would not invest time
|   into using Anchor.fm ..
|
| So.. is "Us" progressive or conservative?
|
| Bill Maher breaks these every night on both sides pretty much,
| so I try to think, if they won't protect Bill Maher or Larry
| Flynt's words, they are not going to protect mine.
|
| "Contributions are not false, inaccurate, or misleading." - So
| mainstream news can't use it either - that's a bonus.
|
| I'd add more, but I see you mentioned you will be changing and
| this is just a boilerplate to save time.
| 1-6 wrote:
| Thanks for the heads up. I'm a little hesitant to upload
| something now. On the flip-side, I think devs just want total
| protection while they navigate the landscape of machine
| learning. I agree that they could have worded things better but
| someone who worked on writing this probably didn't understand
| the nuances of machine learning or the countries that people
| would be signing up from. Plus they'll need to constantly use
| datasets for their internal purposes to train.
| autoencoders wrote:
|     Yes, that's exactly the case. As I previously commented, I
|     used a terms generator until I can get a lawyer who can
|     write specifically what I do with the data.
| autoencoders wrote:
|   I agree. The terms will be changed. I used an automated terms
|   generator for now (termly.io).
|
| I would like to rewrite it.
|
| What I do is just keep your files on the server for a week. In
| case you have an issue, I will look into your file to fix your
| issue. And if you want, you can give consent for me to further
|   improve the service. (Say you have an accent that the AI handles
|   badly; I can use your audio file to understand why it failed.)
| throwthere wrote:
| With this statement you've now shown that your site doesn't
| take contracts seriously and opened the door to people
|     arguing future contracts are also invalid. I'd delete this
| response asap.
| stavros wrote:
| What? This person made something, we pointed out an
| improvement and they said they'd change it. You're
| literally complaining that it wasn't perfect already, and
| thus they somehow don't "respect stuff".
| giansegato wrote:
| Why? They can change the policy and ask for a confirmation,
| as every service out there is already doing.
| simtel20 wrote:
| How and when have you seen it happen that a contract was
| invalidated by one party indicating that they would prefer
| a more appropriate contract?
| pfortuny wrote:
| Thank you.
|
| I would pay for a piece of software that does that job on my
| computer with no Internet.
|
| This way? I may even end up in court for saying something
| "improper"...
|
|   Edit. OK: I've just read the developer's reply below.
|
| Honestly: you need to fix this because right now it is more
| scary than not.
|
| Congratulations for the project but please do fix this.
| autoencoders wrote:
| I agree. More and more AI applications are exploiting our
| data in negative ways.
|
|     I will get proper terms as soon as possible, especially since
|     people have now mentioned it.
| axhl wrote:
| Congratulations on launching. How are you finding using termly.io
| for the legal side of things?
| autoencoders wrote:
| It's not ideal. See the comment talking about the terms. I have
|   a meeting with a lawyer soon. But I guess it is better than no
| terms.
| throwaway1777 wrote:
| Overcast has features to do some of this on the listener side. I
| prefer having the AI on the listener side so I can go back to the
| raw version if the AI messes up for some reason.
| fareesh wrote:
| What's the high level approach required to build something like
| this yourself?
|
| Does it involve relying on speech to text with timestamps and
| then a series of cuts based on that?
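|
| A minimal sketch of the transcript-based approach asked about
| here, assuming a speech-to-text step has already produced
| per-word timestamps. The Word type, the FILLERS set and the
| margin parameter are hypothetical names for illustration. This
| is not Cleanvoice's method; elsewhere in the thread the author
| says the tool works at the phonetic level rather than on text.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording


# Hypothetical filler list; a real tool would need far more than this.
FILLERS = {"um", "umm", "uh", "er", "ah"}


def keep_segments(words, total_duration, fillers=FILLERS, margin=0.02):
    """Return (start, end) ranges of audio to keep after cutting fillers.

    `margin` widens each cut slightly on both sides, in seconds.
    """
    segments = []
    cursor = 0.0  # start of the next range we intend to keep
    for w in words:
        if w.text.lower().strip(".,!?") in fillers:
            cut_start = max(cursor, w.start - margin)
            cut_end = min(total_duration, w.end + margin)
            if cut_start > cursor:
                segments.append((cursor, cut_start))
            cursor = max(cursor, cut_end)
    if cursor < total_duration:
        segments.append((cursor, total_duration))
    return segments


words = [Word("So", 0.0, 0.3), Word("um", 0.4, 0.8), Word("hello", 0.9, 1.4)]
print(keep_segments(words, total_duration=1.5, margin=0.0))
```

| The kept ranges would then be concatenated by an audio editor or
| library. Deciding where exactly to cut, and how long the
| remaining pause should be, is the hard part discussed elsewhere
| in this thread and is not addressed by this sketch.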
| monroewalker wrote:
| Sounds similar to Descript https://www.descript.com/
| spicybright wrote:
| I'm going to sound like a negative Nancy, but I wish
| podcasters/youtubers would just practice their speaking skills
| instead of relying on a series of really quick jump cuts. Worst
| offenders are those that can't get through a sentence without
| splicing it 2+ times...
|
| Perhaps you could have a mode to detect how much one stutters,
| and parts worth redoing without spending as much time combing the
| whole thing.
| pfortuny wrote:
| Classically professionals learnt their discourses by heart.
| That stands out when you see it.
|
| I remember fondly a student of mine who seemed unable to
| express himself properly. I told him to memorize his final
| project dissertation because otherwise it would be a wreck (OK,
| I did not say this last part, it was more of a suggestion).
|
| BOY: did he memorize it. He got an honors and I did think "this
| guy has really done it, and it sounds like music!"
|
| When you do it well, it tells.
| intrasight wrote:
| Some podcasts I listen to are over-edited. I'd always assumed
| that a) it was done manually and b) it was done to keep the
| length below some threshold. Now I'm curious if they are using
| software to automate the editing.
|
| I find the cadence very unnatural when all the spaces between
| phonemes are removed.
| ghaff wrote:
| >I find the cadence very unnatural when all the spaces
| between phonemes are removed.
|
| Any editing can be overdone and, while I do a modicum of
|     editing out umms, you knows, and other verbal tics when I'm
| putting together a podcast interview, I'm not fanatical about
| it.
|
| You do occasionally get someone who just speaks quite slowly
| and it is sort of annoying to listen to as audio. So I've
|     done some automated gap reduction in a couple of cases.
| intrasight wrote:
| What software do you use to automate?
| ghaff wrote:
| Audacity.
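|
|         A toy sketch of what automated gap reduction can look
|         like, assuming audio is a plain list of samples in the
|         range -1.0 to 1.0. The function name, threshold and
|         max_gap are hypothetical, and this is not how Audacity
|         implements its Truncate Silence effect; real tools
|         usually measure loudness over short windows instead of
|         single samples.

```python
def truncate_silence(samples, threshold=0.01, max_gap=3):
    """Shorten every run of quiet samples to at most `max_gap` samples."""
    out = []
    quiet_run = 0  # length of the current run of quiet samples
    for s in samples:
        if abs(s) < threshold:
            quiet_run += 1
            if quiet_run > max_gap:
                continue  # drop the excess silence
        else:
            quiet_run = 0
        out.append(s)
    return out


audio = [0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.3]
print(truncate_silence(audio, max_gap=2))
```

|         Keeping max_gap above zero leaves a short pause in
|         place, which relates to the point above about cadence
|         sounding unnatural when every gap is removed.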
| [deleted]
| cube00 wrote:
| Especially ones who won't set their background LED lights to a
| stable color. The smooth flowing gradient becomes very
| distracting when you jump cut the heck out of it.
| intrasight wrote:
| synesthesia?
| curiousgal wrote:
| Once I started noticing jumpcuts it ruined every single YouTube
| video with a person talking into the camera. The worst offender
| being Phillip DeFranco.
| ghaff wrote:
| I find talking into a camera really tough. If you're doing it
| by yourself you almost need to imagine you're talking to a
| person. I even know of people who put cutouts or pictures of
| someone by the camera so they can talk to a person.
|
| I haven't had a lot of luck using teleprompters but maybe I
|     just haven't hit on the right setup.
|
| Something else someone told me recently was to try to work in
| short segments that you redo until you get right and then do
| a cut to the next segment somewhere that it's natural.
| unholiness wrote:
| Interesting take. I saw Phillip DeFranco as more of a pioneer
| of that style. He really leaned into the cuts. At the time it
| was something no one else was doing so it was very
| noticeable, and he had a very crisp cadence with them where
| the jarring cuts were part of the presentation. It was clear
| his process was: Write a script, mark cuts everywhere it
| could make sense, go through the script repeating every
| phrase until you're happy with the sound, and when editing,
| always make the cuts where they're marked, even if it could
| be skipped.
|
| The result feels something like pixel art: Clearly not the
| closest possible imitation of conversational speaking, but
| something else. A style in its own right with different
| considerations.
|
| Now that it's par for the course to have jump cuts, I see
| them used more sloppily everywhere, where it's clear the
| narrator decided where to do the cuts after the fact. Cutting
| off the beginning or end of a phoneme, missing or repeating
|     bits of a thought because they liked one phrasing in
| recording but opted for another one in post, misordered cuts
| where something which moved in the background moves back to
| its old place, etc. Phillip's style looked lazy but it can't
| really be imitated with actual laziness.
|
| These days I look back and really cringe at the substance of
| his show. But I still see the style as professional.
| mikepechadotcom wrote:
| Really cool project, I wish you great success! Could be useful
| for my (german) podcast agency!
|
| Out of curiosity: Which ai-technology did you use? OpenAI? Google
| API? Or did you train the models yourself with Python (sth. like
| Tensorflow)?
|
| Cheers, Mike
| autoencoders wrote:
|   Hello Mike, nice to meet you!
|
| I trained my own models. No OpenAI/Google API.
|
|   Best regards, Adrian
| notafraudster wrote:
| "Free 30 Minutes Trial" is not native English. "Free 30 Minute
| Trial" would be better; but I think the sentence is a little
| confusing. I presume you mean you can convert 30 minutes of audio
| for free, not that the trial account is only valid for 30 minutes
| from creation. I would do "Clean 30 minutes of audio for free. No
| Credit Card needed." or similar. The sale page which says "Get 30
| minutes credit to try the service out." is better, and "30
| minutes" does sound correct on that page.
|
| In your FAQ, you say: "Currently we remove lip smacks, saliva
| crackle, mouth clicks and harsh parts of breathing (not the whole
| breath). If you want to remove a particular mouth sound (ex.
| Chewing), write us in the chat as a feature request." I don't
| think most English speakers would understand what "harsh parts of
| breathing" are. Typically a parenthetical example in English
| would be written "(e.g. chewing)" not "(ex. Chewing)".
|
| Your question "What filetype and sizes do you support?" doesn't
| answer what filetypes you support, and I suspect the singular
| "filetype" was a grammar error. You also write "We have an audio
| file size limit of 1.5G per file or in case you are uploading
| multi-track and a total file size of 2 GB. ". The part that says
| "or in case you are uploading multi-track and" doesn't make any
| sense in English. I think you mean "We support file sizes up to
| 1.5GB per file for single-track files, or 2GB if you are
| uploading a multi-track file as separate files." but I'm not
| sure.
|
| In general I don't understand why each selling point has a
| separate FAQ page but the FAQs are often not related to the
| selling point. I don't think people think the "Mouth Sound
| Remover" page is the one that lists file size support, while the
| "Stutter Remover" page is the one that lists the maximum number
| of tracks per project.
|
| Your integrations page lowercases "cleanvoice" whereas other
| pages write it as "Cleanvoice".
|
| Under integrations, you have a section called "Markers Export".
| This should probably be "Export Markers" or "Marker Export".
|
| Under "How to Export Edits", you probably don't want to
| capitalize "Results" or "Editor" unless these are supposed to be
| title cased, in which case you probably want to title case all of
| them.
|
| Under your pricing FAQ you have "Does my credit expire at end of
| the month? Your credit will reset every billing month. Unused
| credit will be lost." This is needlessly confusing. You use the
| verbs "expire", "reset", and "be lost" to describe the same
| thing, and you don't actually answer the question. Also you don't
| want "at end of the month", you want "at month's end" or "at the
| end of the month". I would rewrite as "Does my credit expire at
| the end of each month? Yes. Credit resets every month and cannot
| be carried over to future months. Unused credit will be lost."
| This is a terrible business model, though, and so I suggest you
| not do this. Either sell as a subscription or sell as a credit
| model, not both, this is gross.
|
| In general I think you want to pay someone who is a professional
| English copywriter to fix your website. Cheers.
|
| Edit: I just noticed your changelog is powered by a service
| called Headway. I am not sure if you also made Headway, but
| Headway's website is also in need of English copyediting.
| [deleted]
| [deleted]
| autoencoders wrote:
|   Wow! Thank you so much! You are right, I need to get a
|   copywriter ASAP.
|
| I'm curious why the Subscription + Onetime Credit is bad. But I
| agree it is confusing.
|
| My understanding is that not every customer wants or needs a
| subscription, since they upload podcasts irregularly.
|
| This business model is seen in other AI products:
|
| https://www.remove.bg/pricing https://auphonic.com/pricing
|
|   I am very grateful you took the time to help out. Really
| appreciate it!
| sdoering wrote:
| Maybe you can get away for a quick fix with something like
| deepl.com.
|
| They are great. As a German native speaker I came a long way
| with using them when I needed valid translations.
| arendtio wrote:
| That logo is very similar to the Cisco logo:
|
| https://www.cisco.com
| stavros wrote:
| This is excellent, well done! I'd be curious to know how it's
| done, as I don't know much about deep learning and this looks
| like magic to me.
| autoencoders wrote:
| Hey HN!
|
| I like podcasting, but I hate editing them. I tend to stutter and
| have a lot of filler words in my podcast. That's why I created
| Cleanvoice, in order to spend less time editing them. Cleanvoice
| is an ML tool which removes filler words, mouth sounds,
| stuttering and dead air from your podcast. To use it, just upload
| your podcast - wait some minutes - download the cleaned audio.
|
| It's still not perfect, but it's at a stage where I can blindly
| use it on every single one of my podcasts.
|
| I would love to hear your feedback!
| wpietri wrote:
| Neat! I love products that come out of a personal need.
|
| Is it possible for you to do a live, personal demo? No logins
| or anything. I'm thinking something where you tell people to
| start up their audio and then give them a quick prompt like
| "Describe your breakfast yesterday." Record for 30 seconds, and
| then let them play back the original and cleaned versions. You
| could limit them to, say, 5 goes, with a different prompt each
| time.
|
| I suggest it because a) a little personal investment makes it
| more likely they'll give you their email address for signing
| up, and b) many potential customers underestimate how much they
| need something like this.
| autoencoders wrote:
| I like your idea, makes sense.
|
| My biggest fear is that without login, people will start
| abusing it in ways that I don't expect. Definitely
|     considering it. Thank you!
| wpietri wrote:
| That's a good fear to have. That's the kind of thing I
| would set up some monitoring for and then wait to see. You
| might get a few jerks. But those same jerks might also be
| the sort of people who would sign up with a bunch of fake
| emails, so gating on an email address may not be much
| better than gating on a fresh-issued cookie.
|
| Thanks for listening, and good luck with your project!
| telesilla wrote:
| Have you compared this to other commercial options such as
| Descript? Looks really great at a glance, thanks for sharing!
| autoencoders wrote:
| I tried to use Descript for my podcast, but it has some
| issues.
|
|     1) It doesn't work well if you have a strong accent. As a
|     non-native speaker, I found the transcriptions quite bad,
|     which made the editing quite bad.
|
|     2) Cleanvoice works with multiple languages, Descript
|     doesn't.
|
| 3) Cleanvoice can remove stutters (not always, but it tries)
| and mouth sounds like lip smacking, teeth clicking. Descript
| can't. This is not a big deal for most, but since I stutter
|     a lot this was essential.
|
| My approach is different from Descript. They use a
| transcription service, and then they edit the audio based on
|     the text. I work directly at the phonetic level, which gives
|     me more control over the audio.
|
| Depending on the needs, either one is better. I guess you
| should try it for yourself and compare.
| ckdarby wrote:
| I use Descript and it is absolutely lovely. There are a bunch
|     in this space that I would not be surprised to see merged or
| acquired. Would love to see Descript & GetWelder merging
| together.
|
| While Cleanvoice has some niche features that Descript
| doesn't offer I would not be surprised to find them rolling
| these features out in the next major release they're doing.
| IMO the founder of Cleanvoice should sell/join Descript.
| qmmmur wrote:
| Without giving away your secret sauce, what are your approaches
| to the cleaning process? Is it a combination of different
| passes of algos or is it something more generic and "sausage
| machine-like" like a neural network?
| jwuphysics wrote:
| Based on the OP's username, surely one of the deep learning
| algorithms is a denoising autoencoder, right?
| autoencoders wrote:
| The audio is edited in several phases. It uses different
| algorithms, but most of them are deep learning based. It is
| surely overengineered, but as a Data Scientist, ML is the
| most fun part for me.
| nmstoker wrote:
| How is the latency and, if it's sufficiently low, could
| this realistically be applied to "nearly live" content?
|
| That scenario seems really appealing for conferences, even
|       if it just quietens down the verbal tics, but I'm guessing
|       if the lag is too great it would get like a bad lip sync
|       issue.
| pokot0 wrote:
|         How does real time make sense in the first place for an
| algorithm that gets 1 minute of audio and gives you back
| 50s? You are gonna have to fill the gaps anyway with
| something not meaningful.
| staticautomatic wrote:
| Silence is meaningful, but pretty awkward when not
| deliberate!
| laumars wrote:
| Tools like this are designed to remove awkward silences.
|
| What it sounds like the GP is after is something more
|           like hiss and pop removal (to use an old vinyl analogy)
| and that's a different and also simpler problem to solve.
| I'd wager there are already tools on the market for that.
| pokot0 wrote:
| Very insightful :). Now I need an AI to tell me when
| silence is deliberate or not. :)
| autoencoders wrote:
| It would be a huge engineering endeavour, which I
| wouldn't be capable of doing. That said, things like
| background noise and some sounds can be removed. See
| Krisp.ai
| qmmmur wrote:
| Izotope plugins already do some of these things but not
| all. In particular their de-clicking algorithm is pretty
| good but definitely not automatic or low latency.
| Fogest wrote:
| Nvidia RTX voice does similar. It's pretty similar to
| other technology though where it focuses more on removing
| background noise. It actually works very well. It would
| definitely be interesting to see it also filter speech
| itself. But I feel like this would be hard to do without
| introducing extra latency. If someone is saying "umm" or
| some other filler before a word you kinda need to know
| what that word will be to determine if it's filler or
| not. So it almost can't be done without introducing
| latency as it would need some future speech to determine
| if filler or not.
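
The lookahead argument above can be sketched in a few lines. This is a toy illustration only, not how Cleanvoice or RTX Voice work: the word lists, `is_filler`, and `stream_without_fillers` are hypothetical names, and the "rule" is a stand-in for a real classifier. The point is that a word can only be emitted after `lookahead` later words have arrived, so the output always lags the input by at least that much.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List

# Hypothetical word lists, chosen only for this illustration.
ALWAYS_FILLER = {"um", "umm", "uh"}
OBJECT_WORDS = {"this", "that", "it"}


def is_filler(word: str, following: List[str]) -> bool:
    """Toy rule: 'like' is kept only when the next word looks like its object."""
    w = word.lower()
    if w in ALWAYS_FILLER:
        return True
    if w == "like" and following:
        return following[0].lower() not in OBJECT_WORDS
    # With no future context we cannot decide, so the word is kept.
    return False


def stream_without_fillers(words: Iterable[str], lookahead: int = 1) -> Iterator[str]:
    """Yield non-filler words, each delayed until `lookahead` later words arrive."""
    buffer: Deque[str] = deque()
    for word in words:
        buffer.append(word)
        if len(buffer) > lookahead:
            candidate = buffer.popleft()
            if not is_filler(candidate, list(buffer)):
                yield candidate
    # End of stream: flush what is left with whatever context remains.
    while buffer:
        candidate = buffer.popleft()
        if not is_filler(candidate, list(buffer)):
            yield candidate


spoken = ["I", "um", "like", "this", "like", "really"]
cleaned = list(stream_without_fillers(spoken, lookahead=1))
```

With `lookahead=0` the toy rule can never tell the two uses of "like" apart, which is the trade-off described above: less latency means less context for the decision.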
| qmmmur wrote:
| Do you do any audio segmentation to remove the filler words
| and such?
| undoware wrote:
| I literally just bought your product, thank you very much, I
| needed this and wondered why no one had made it yet.
| autoencoders wrote:
| I appreciate it! If you have any issues or need help, feel
| free to reach out. (You can use the chat in the app.)
| gus_massa wrote:
| Is the example in the page really made by the computer? In my
| opinion the pauses where the filler words were are slightly
| too long. Is it possible to configure this?
|
| Is it possible to keep some filler words? I make something
| similar (but not professionally), and sometimes I like to keep a
| few of them.
| autoencoders wrote:
| > Is the example in the page really made by the computer?
|
| Yes.
|
| > In my opinion the pauses where the filler words were are
| > slightly too long. Is it possible to configure this?
|
| I agree; however, if you use it in an interview, the edits sound
| better. In an unnatural setting, you get unnatural results.
|
| Currently, there is no way to set it, but customization is
| planned for Q2 next year.
|
| > Is it possible to keep some filler words?
|
| Not for now, but keeping some filler sounds for authenticity is
| something I plan to add.
| gus_massa wrote:
| I agree that the correct length of the pause after the word
| is removed is very tricky. Perhaps your configuration is
| better than my imaginary magical edition.
|
| In another comment, eganist posted a link to
| https://cleanvoice.ai/integrations It looks interesting
| because I can choose which to keep and even use it to sync
| with video [with some additional work]. I didn't see it on
| the page the first time.
| autoencoders wrote:
| ADL support is also planned for around Q2, so you could just
| import it in your audio/video editor without issue. Thank you
| for pointing that out. I'll put Integrations on the homepage
| as well.
| pwned1 wrote:
| I suspected something like this was happening with podcasts. I've
| noticed lately that some podcasters have unnaturally short pauses
| between speakers (question and answer) or between sentences. It
| really annoys me. It makes it almost unlistenable.
| carols10cents wrote:
| Yes, the worst is when so much silence is removed that it
| sounds like someone is laughing over themselves.
| autoencoders wrote:
| I agree, as if they don't breathe!
|
| This is not the case with my app. I keep the edits longer rather
| than shorter, since I also find that unlistenable.
| nateweiss wrote:
| Looks cool! Would this also work for "explainer" type videos,
| showing how to use a software product or similar?
|
| If yes, you might consider a page or callout about that use-case,
| as it might attract some additional users. Just a thought.
| tyingq wrote:
| That seems like it would be tricky, as the video and audio
| would get out of sync. You would have to remove, then "fill" to
| keep the timing. Though this product does mention it works with
| multiple speakers on different tracks...so they are already
| somewhat in that space.
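
The "remove, then fill" idea above can be illustrated on raw samples. This is a minimal sketch under stated assumptions, not the product's method: `cut_segments` and `silence_segments` are hypothetical helpers, audio is a plain list of floats, and segments are half-open `[start, end)` sample index ranges. Cutting shortens the track and so drifts against the video; filling with silence keeps the length, and therefore the sync, unchanged.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open [start, end) range of sample indices


def cut_segments(samples: Sequence[float], segments: Sequence[Segment]) -> List[float]:
    """Remove the segments entirely; the result is shorter than the input."""
    out: List[float] = []
    pos = 0
    for start, end in sorted(segments):
        out.extend(samples[pos:start])  # empty if this segment overlaps the last
        pos = max(pos, end)
    out.extend(samples[pos:])
    return out


def silence_segments(
    samples: Sequence[float], segments: Sequence[Segment], fill: float = 0.0
) -> List[float]:
    """Overwrite the segments with `fill`; the result keeps the input length."""
    out = list(samples)
    for start, end in segments:
        for i in range(max(start, 0), min(end, len(out))):
            out[i] = fill
    return out


audio = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
filler = [(2, 4)]  # pretend samples 2 and 3 are an "um"

shortened = cut_segments(audio, filler)
same_length = silence_segments(audio, filler)
```

Only the second variant leaves every later sample at its original timestamp, which is what a video track cut to the original audio would need.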
| autoencoders wrote:
| For video it is quite tricky. One thing with video is that you
| don't want to over-edit the audio, since it's then very hard
| to keep the video synced. That said, for an explainer video it
| should work OK, but for a video podcast it would be horrible.
| I have an idea how to deal with this, but it is not available
| yet.
| sdoering wrote:
| Not sure where you are located, but if you are giving access to
| people protected by the GDPR your cookie notice does not fulfill
| the requirements set by European Regulations.
|
| Additionally, if you are located in a country that (like Germany
| for example) has regulations on the necessity of an imprint, this
| might also be missing.
| autoencoders wrote:
| It should be ok, since I use strictly essential cookies, which
| don't require consent. (But users need to be informed)
|
| Or do I misunderstand the law?
|
| [1] Strictly necessary cookies -- These cookies are essential
| for you to browse the website and use its features, such as
| accessing secure areas of the site. Cookies that allow web
| shops to hold your items in your cart while you are shopping
| online are an example of strictly necessary cookies. These
| cookies will generally be first-party session cookies. While it
| is not required to obtain consent for these cookies, what they
| do and why they are necessary should be explained to the user.
|
| [1] - https://gdpr.eu/cookies/
| geuis wrote:
| Your demos don't play on iOS safari.
| autoencoders wrote:
| Oops! Thank you for pointing that out. I'll check it.
___________________________________________________________________
(page generated 2021-11-20 23:00 UTC)