Post Ak6UgmuHiumeAUIiTw by soothspider@gigaohm.bio
(DIR) Post #Ajys6VpmZ69MQdkcJE by peertube@framapiaf.org
2024-07-16T07:16:09Z
1 likes, 3 repeats
:peertube: #PeerTube 6.2 is out 🎉 This release brings automatic video #transcription using #Whisper (also supported in PeerTube runners), a new comment policy "requires approval first", auto-tagging/labelling of videos and comments based on specific rules, and a comment moderation page for video publishers 🎉 https://joinpeertube.org/news/release-6.2 You can support our work by making a donation to #Framasoft, the non-profit that develops #PeerTube 🥰 https://support.joinpeertube.org
(DIR) Post #AjysLcwwONcoqBJ43U by collectifission@greennuclear.online
2024-07-16T07:18:51Z
0 likes, 0 repeats
@peertube Awesome! Love how this gets better and better! Congrats on this release 😊
(DIR) Post #Ajz6pt7AFr0z1MPbyi by RiQuY@mastodon.online
2024-07-16T10:01:15Z
0 likes, 0 repeats
@peertube Isn't Whisper an OpenAI product? https://openai.com/index/whisper/ Does that mean that every video uploaded to PeerTube is feeding data to OpenAI products/datasets? Please clarify this; OpenAI's data handling is a privacy nightmare.
(DIR) Post #Ak0EdEEtOcOHZ4SfCq by gabriel@mk.gabe.rocks
2024-07-16T23:02:10.993Z
0 likes, 0 repeats
👀 @tedb@gigaohm.bio @soothspider@gigaohm.bio
(DIR) Post #Ak0FxyFI9erlzs6Q3E by fcayre@mamot.fr
2024-07-16T23:18:14Z
0 likes, 0 repeats
@peertube Spreading OpenAI/Microsoft software usage in libre software is a very bad move IMHO. Your historic links with @Framasoft should remind you that technological choices like this are not neutral. Do you endorse OpenAI's way of building #whisper? Integrating whisper in peertube is like selling organic avocados from 20000km away in an organic store.
(DIR) Post #Ak0LNUAF7R0o07j7Ds by soothspider@gigaohm.bio
2024-07-16T23:05:57.022843Z
1 likes, 0 repeats
Nice! @jjcouey @jazzilla
(DIR) Post #Ak0g8LzdJZyDLuOEl6 by tedb@gigaohm.bio
2024-07-17T04:11:31.819646Z
1 likes, 0 repeats
Awesome
(DIR) Post #Ak0iXEFNS2ruTak8kC by tedb@gigaohm.bio
2024-07-17T04:38:26.757500Z
0 likes, 0 repeats
@jjcouey assume you would want to enable this:
🎉 Add automatic transcription of videos to generate subtitles 🎉 #6303
- Uses Whisper engines and models to create the subtitle and guess the video language
- Has to be enabled by admins in the configuration web interface: PeerTube will automatically download and install Whisper binaries/models
- Transcription can also be performed by PeerTube runners, as it can consume a lot of CPU
- Transcription generation can also be run manually by administrators
(DIR) Post #Ak1ndfXFbJl7m9xB7g by alonefree@mastodon.social
2024-07-17T17:10:15Z
0 likes, 0 repeats
@peertube hello, how can I access this site?? I want to disconnect from Meta's crap, thanks 🙏✊🫶🫂♾️💯
(DIR) Post #Ak3Z895XB6gV7jz2Xo by tedb@gigaohm.bio
2024-07-18T13:37:11.666544Z
1 likes, 0 repeats
Version 6.2 is live and auto transcription is turned on
(DIR) Post #Ak4eeTs48alHmAsj2W by soothspider@gigaohm.bio
2024-07-18T21:59:22.747504Z
1 likes, 0 repeats
Nice, you're using whisper-ctranslate2 (i.e. "faster-whisper"). Not sure how fast it is compared to whisperx, but at least it should be faster than plain whisper.
We can try the timings with the "small" model for now and also "large-v2", depending on how accurate it is.
https://github.com/gigaohm/fediverse-servers/blob/c13723790d128f0df59252fc6a33d2555b09c61d/config/peertube-production.yaml.j2#L735
I wonder if we can prompt it beforehand. There have been some suggestions that if I prompted whisper with spellings, it would spell them correctly (e.g. "Some of the people who are being discussed here have names and spellings like Jonathan Jay Couey, John Beaudoin, ...."). I haven't tried this myself yet, but if it works, it could "fix" a lot of the spelling errors ahead of time. It seems to get almost everything right except names of people, companies and things.
(And I heard that large-v3 hallucinates a lot.)
Depending on the output, we could maybe also add a post-transcription step to push the subtitles to a repo.
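A minimal sketch of the prompting idea above, in Python. It only builds the prompt text from a list of known spellings; the function name `build_initial_prompt` is made up for illustration. Both openai-whisper and faster-whisper accept an `initial_prompt` argument to `transcribe`, but whether PeerTube's transcription config exposes it is an open question here, and the thread has not tested whether prompting fixes the spellings.

```python
def build_initial_prompt(names):
    """Join known spellings into one sentence to hand to Whisper as a prompt."""
    cleaned = [n.strip() for n in names if n.strip()]
    if not cleaned:
        return ""  # nothing to hint, so send no prompt at all
    return (
        "Some of the people who are being discussed here have names "
        "and spellings like " + ", ".join(cleaned) + "."
    )


# Hypothetical usage: this string would be passed as `initial_prompt`.
prompt = build_initial_prompt(["Jonathan Jay Couey", "John Beaudoin"])
print(prompt)
```

Keeping the spellings in a plain list means the same list could later live in the captions repo and be edited by anyone.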
(DIR) Post #Ak6HYPA4pDx8npMN3A by soothspider@gigaohm.bio
2024-07-19T20:54:32.858266Z
0 likes, 0 repeats
It's looking good. I jumped around here and there and the captions and the timings look pretty good. I guess we'll just have to wait until more people see it to point out any issues.
(DIR) Post #Ak6HYQE0s4hC6KD1pA by soothspider@gigaohm.bio
2024-07-19T20:58:59.904225Z
0 likes, 0 repeats
@tedb Where are the captions stored? Or rather, would we be able to pull captions from https://github.com/gigaohm/Stream.Transcripts/tree/main/twitch and backfill? Of course many of them are timed to Twitch so they'll be maybe 4-5s too early. Or we can always just use the ones that I captioned off of peertube.
Jun 20 17:47 2177185950 - Throwback Thursday 18-20 Apr 2020-- (20 June 2024) -- Brief [GigaohmBiological - 2024-06-20].peertube.composed.mp4*
Jun 21 17:26 2178054343 - The Kaufman Kirsch Crossover Event -- (21 June 2024) -- Brief [GigaohmBiological - 2024-06-21].peertube.composed.mp4*
Jun 24 17:38 2180736879 - The Johnson Staff Meeting 2024 -- (24 June 2024) -- Brief [GigaohmBiological - 2024-06-24].peertube.composed.mp4*
Jun 26 13:46 2182348678 - All the Ways THEY LIED about PCR -- (26 June 2024) -- Brief [GigaohmBiological - 2024-06-26].peertube.composed.mp4*
Jul 9 11:57 2192753404 - Malone on Vejon June 2020 II -- (8 July 2024) -- Brief [GigaohmBiological - 2024-07-08].peertube.composed.mp4*
Jul 9 12:18 2193249179 - Who the Hell is Jon A Wolff__ -- (9 July 2024) -- Brief [GigaohmBiological - 2024-07-09].peertube.composed.mp4*
Jul 11 12:05 2194912309 - Matthew Ehret LIVE -- (11 July 2024) -- Brief [GigaohmBiological - 2024-07-11].peertube.composed.mp4*
Jul 13 15:21 2196659396 - Elon Musk is NOT a Visionary Part I -- (13 July 2024) -- Brief [GigaohmBiological - 2024-07-13].peertube.composed.mp4*
Jul 15 13:00 2198362510 - What is Alex Jones REALLY_ -- (15 July 2024) -- Brief [GigaohmBiological - 2024-07-15].peertube.composed.mp4*
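If the Twitch-timed captions were backfilled, the 4-5s lead could in principle be corrected by shifting every cue. A rough Python sketch, assuming plain WebVTT cue lines (`start --> end`) and a single constant offset for the whole file; the function names and the 4500 ms value are illustrative only, and the real offset would need to be measured per stream.

```python
import re

# Matches WebVTT timestamps with or without the hours field.
_TIMESTAMP = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})")


def _shift(match, offset_ms):
    hours, minutes, seconds, millis = match.groups()
    total = ((int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)) * 1000
    total = max(total + int(millis) + offset_ms, 0)  # never go below zero
    h, rest = divmod(total, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def shift_vtt(text, offset_ms):
    """Shift the timestamps on every cue timing line by offset_ms."""
    shifted = []
    for line in text.splitlines():
        if "-->" in line:  # only touch cue timing lines, not caption text
            line = _TIMESTAMP.sub(lambda mt: _shift(mt, offset_ms), line)
        shifted.append(line)
    return "\n".join(shifted)


sample = "WEBVTT\n\n00:00:01.500 --> 00:00:04.000\nHello"
print(shift_vtt(sample, 4500))
```

A positive offset moves captions later, which is the direction needed when they appear too early.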
(DIR) Post #Ak6HYQm2pXuxns8Crw by tedb@gigaohm.bio
2024-07-19T21:04:22.850409Z
0 likes, 0 repeats
Not sure where they’re stored yet, as I didn’t even know it was a feature. Guessing on the server or in the DB. There should be an API endpoint to upload, I hope. Will check the peertube docs and the PR for the feature.
(DIR) Post #Ak6IDKcqavPKbjKfcO by tedb@gigaohm.bio
2024-07-19T21:11:47.471821Z
0 likes, 0 repeats
Current config is OOTB and looks like: https://github.com/Chocobozzz/PeerTube/blob/fbee171a0bea725e74ddc9ca4a00875c2250afd4/config/production.yaml.example#L729-L755
(DIR) Post #Ak6Kg6syxQPWIhjIoK by soothspider@gigaohm.bio
2024-07-19T21:14:57.683853Z
0 likes, 0 repeats
Sounds good. When I have more time I'll poke around as well.
(DIR) Post #Ak6Kg7vV5Y1FWnupNI by soothspider@gigaohm.bio
2024-07-19T21:37:41.205150Z
0 likes, 0 repeats
Looking at my network console, it pulls the .vtt from /lazy-static/video-captions/. So it stores it locally on the VM, and there is probably a DB entry that connects the VOD to the .vtt location (or id).
https://stream.gigaohm.bio/lazy-static/video-captions/7d766071-ce4a-4718-8177-f7549dd9de95-en.vtt
(DIR) Post #Ak6Kg8jq4OHk2vT1pw by tedb@gigaohm.bio
2024-07-19T21:39:23.482562Z
0 likes, 0 repeats
Check this https://docs.joinpeertube.org/use/create-upload-video#manual-transcription
(DIR) Post #Ak6KviyRd3wslxjKiW by tedb@gigaohm.bio
2024-07-19T21:42:13.449048Z
0 likes, 0 repeats
Also API https://docs.joinpeertube.org/api-rest-reference.html#tag/Video-Captions/operation/addVideoCaption
(DIR) Post #Ak6LCFgpl9tmsANF4a by soothspider@gigaohm.bio
2024-07-19T21:43:57.244876Z
0 likes, 0 repeats
That is nice! It even allows for multiple languages. @Teo might want to send you Norwegian subtitle translations to get added.
Is there an API call? This post implies that we can do it: https://beeldengeluid.github.io/extending-peertube/subtitles/2021/08/09/adding-subtitles-with-the-api.html
(DIR) Post #Ak6LCGl7mgvQBlOBOq by tedb@gigaohm.bio
2024-07-19T21:45:12.395187Z
0 likes, 0 repeats
Yep, API calls to upload or even trigger a new generation using the whisper AI: https://docs.joinpeertube.org/api-rest-reference.html#tag/Video-Captions
(DIR) Post #Ak6LN7cC7cfP0xO3Ps by tedb@gigaohm.bio
2024-07-19T21:47:10.449806Z
0 likes, 0 repeats
I would have to speak to @jjcouey to see how he wants to handle the uploads (as in he uploads or we create a user for others to upload)
(DIR) Post #Ak6LW4rdidnD5Zsj2m by soothspider@gigaohm.bio
2024-07-19T21:47:03.419489Z
0 likes, 0 repeats
Okay, I guess in the future/near-future, if we wanted, we can try something like:
- 1st pass English transcriptions are checked into GitHub.
- Any additional corrections or language additions are checked into GitHub.
- Upon merge/PR (or whatever), it will push/publish the changed (or all??) captions for a given video.
So basically if anyone wants to generate subtitles for other languages (e.g. @Teo ), then they can also submit changes to GitHub. Same goes for fixing any spelling errors or whatever.
(DIR) Post #Ak6LW5X7EQyBADHqgy by tedb@gigaohm.bio
2024-07-19T21:48:47.325525Z
0 likes, 0 repeats
Yeah that’s cool. If we have it on the gigaohm org we can run actions from there with the API secrets. Merge to main after PR approval would publish updated transcriptions.
(DIR) Post #Ak6LZ80MUy4PhMPrNY by jazzilla@noauthority.social
2024-07-19T21:49:20Z
0 likes, 0 repeats
@tedb @jjcouey @soothspider Could probably be automated from the GitHub workflow... (I'm a broken record 😂) @Teo @gabriel
(DIR) Post #Ak6LaSvcBuaesfpk0G by tedb@gigaohm.bio
2024-07-19T21:49:35.436205Z
0 likes, 0 repeats
Totally.
(DIR) Post #Ak6Ugm2OxFgLTN5gUi by soothspider@gigaohm.bio
2024-07-19T22:06:10.536948Z
0 likes, 0 repeats
Yes, we should tap into Jazzilla's experience here and use GitHub workflows as much as possible. It can store secrets in the repo so we don't expose all our keys.
I think that's the way to do it. Automate it from GitHub and then the only thing we need is to manage PRs in GitHub. Members of the community can be empowered to look at that as well.
And we can probably tool up some JS hack for them to preview their own subtitle timings (e.g. https://jsfiddle.net/fnoL8drj/).
I suspect most of the changes can just be done in their own forks using only the Web UI (e.g. fork, branch, edit, preview w/ their own raw file url, submit PR, merge, update their fork; hence why we should teach them to branch first...).
(DIR) Post #Ak6UgmuHiumeAUIiTw by soothspider@gigaohm.bio
2024-07-19T22:07:57.729968Z
0 likes, 0 repeats
Okay, now that I've thought about this more... let's see if we can do it this way.
Then what I'll do is submit the PRs for the previous timed-to-peertube captions to test it out. I'll just use the Web UI and work through it going backwards. When it gets submitted, we can then test to see if it worked as expected and we can iron out any bugs.
If it works as expected, it should then publish those captions upon merge.
(DIR) Post #Ak6UgndJ1WnQQ7Mfei by tedb@gigaohm.bio
2024-07-19T23:31:33.704477Z
0 likes, 0 repeats
I’m already using GitHub actions heavily for the deployment of these servers, plus my day job is as an engineer in a team owning GitHub Enterprise and CI/CD, so I’m super comfortable implementing this.
(DIR) Post #Ak6UuPFkSrRUrSpm76 by tedb@gigaohm.bio
2024-07-19T23:34:02.173624Z
0 likes, 0 repeats
Are you up for moving the current transcriptions repo into the gigaohm org and then me giving you write access to it? We can build it out there.
(DIR) Post #Ak6V3QAmPicGJvoCTg by tedb@gigaohm.bio
2024-07-19T23:35:39.866023Z
0 likes, 0 repeats
Then you can fork back to yours. Also we can allow the forks to trigger workflows on branches as well so it can all be tested/dry-run.
(DIR) Post #Ak6VAaSAacLMERMME4 by tedb@gigaohm.bio
2024-07-19T23:36:57.638555Z
0 likes, 0 repeats
Also, you never store secrets in the repo. I mean you can if they're encrypted, but the actions workflow can reference org-level (or repo-level) secrets, so they are never in the code.
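To illustrate the point about secrets staying out of the code: a publish script would typically read the token from an environment variable that the workflow fills in from an org-level or repo-level secret. A small Python sketch; the variable name `PEERTUBE_TOKEN` and the function name are hypothetical, not something the thread or PeerTube defines.

```python
import os


def get_api_token(env=None, name="PEERTUBE_TOKEN"):
    """Fetch the API token from the environment, never from a file in the repo."""
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        # Fail loudly so a misconfigured workflow cannot publish anonymously.
        raise RuntimeError(f"{name} is not set; refusing to publish")
    return token


# Hypothetical usage with a stand-in environment mapping.
print(len(get_api_token({"PEERTUBE_TOKEN": "example-token"})))
```

Passing the environment in as an argument keeps the function easy to dry-run from a fork that has no access to the real secret.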
(DIR) Post #Ak6VKzs0wfGRxNPbYe by tedb@gigaohm.bio
2024-07-19T23:38:50.503193Z
0 likes, 0 repeats
The publishing doesn’t need to be complex. Can be bash scripts using curl iterating over files in the repo.
(DIR) Post #Ak6VPFRYzAyDPn4Cyu by soothspider@gigaohm.bio
2024-07-19T22:00:16.911132Z
1 likes, 0 repeats
Sounds good.
Since the API call is something like: PUT /api/v1/videos/{id}/captions/{captionLanguage} (e.g. PUT /api/v1/videos/fdbe3702-a6d9-46ec-a7e3-1cc5f2ca66cb/captions/en)
We can probably arrange the repo something like:
- streams/
  - 2024-07-19 - Study Hall: LiMengYan, Healy, and Baltimore/
    - README.md
    - id
    - en.vtt
    - no.vtt
- tools/
So each caption file will just be named after its 2-letter language code. The id can just be a text file there. An automated README.md could be generated (e.g. linking back to the actual published video URL).
It runs into the problem of 2 streams on the same day not being listed in the order they played (due to being sorted alphanumerically), but I don't think that poses a big problem necessarily. I guess we can also put a timestamp or HH:MM on it. Or we can add it to the auto-gen README.md.
Just trying to make things easier for the future (e.g. it's much easier to use scripts to do mass changes... so what kinds of info do we want/care about?).
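The layout above maps fairly directly onto the PUT route. A Python sketch, assuming each stream folder holds an `id` text file and `<lang>.vtt` files as described; it only works out which URL each caption file would go to and leaves the actual upload (auth header, multipart body) out. The function name is illustrative, the demo folder name is shortened, and the host is the one seen earlier in the thread.

```python
import tempfile
from pathlib import Path


def plan_caption_uploads(repo_root, base_url):
    """Return (url, vtt_path) pairs for every <lang>.vtt under streams/*/."""
    jobs = []
    for stream_dir in sorted(Path(repo_root, "streams").iterdir()):
        id_file = stream_dir / "id"
        if not id_file.is_file():
            continue  # no video id recorded yet, nothing to publish
        video_id = id_file.read_text(encoding="utf-8").strip()
        for vtt in sorted(stream_dir.glob("*.vtt")):
            lang = vtt.stem  # "en.vtt" -> "en"
            url = f"{base_url.rstrip('/')}/api/v1/videos/{video_id}/captions/{lang}"
            jobs.append((url, vtt))
    return jobs


# Demo against a throwaway copy of the layout.
with tempfile.TemporaryDirectory() as tmp:
    stream = Path(tmp, "streams", "2024-07-19 - Study Hall")
    stream.mkdir(parents=True)
    (stream / "id").write_text(
        "fdbe3702-a6d9-46ec-a7e3-1cc5f2ca66cb\n", encoding="utf-8"
    )
    for lang in ("en", "no"):
        (stream / f"{lang}.vtt").write_text("WEBVTT\n", encoding="utf-8")
    demo_urls = [
        url for url, _ in plan_caption_uploads(tmp, "https://stream.gigaohm.bio")
    ]

for url in demo_urls:
    print("PUT", url)
```

Separating the planning step from the upload makes a dry run trivial: a workflow on a fork can print the plan without holding any credentials.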
(DIR) Post #Ak6lQDE4JaQY5qEjAm by soothspider@gigaohm.bio
2024-07-20T02:38:19.942495Z
0 likes, 0 repeats
Re: secrets. Yeah, I meant storing them in Git or the org via GitHub secrets.
For the transcription repo, we already have a fork of my repo on Gigaohm that I have write access to, but I'm thinking it would be better just to have a new repo that's structured to just serve this peertube channel.
I have several reasons for this setup:
- It would be a cinch to set up other channels this way (it's probably also fine to have multiple channels per repo). So for example we can enable it for @Housatonic channels as well (instead of Mark poor-manning YT transcriptions). If more channels come to Gigaohm, then they can all get their own repos (if they wanted) and different people with write access.
- My intention with my repo was that I was also going to do captions for non-1st-party Gigaohm streams and have the freedom to mess it up without worrying whether it still works in some sort of production workflow. I might also sunset commits to my repo and the fork if this works out well.
- I want to test the idea that non-developer/sysops people can just fork the repo, branch, make changes, and submit a PR, all using just the Web UI. This will help with contributions overall. So my first task will be to try doing it with the captions I already have for Peertube streams (e.g. timed to Peertube instead of twitch or rumble).
- Putting in the hooks for this kind of thing in this way should make it reusable for later deployments. There are others (e.g. @super_spreaders ) who have thought about standing up their own Peertube as well, so a turn-key setup that supports community edits might be something that no one else really has (and is useful).
BTW, do you have timing logs? Do you know how long the last stream took to transcribe on small?
(DIR) Post #Ak9P4OGmxTpUSOZtkO by fosserytech@social.linux.pizza
2024-07-21T09:12:37Z
0 likes, 0 repeats
@peertube Pretty useful things. I'm not a big fan of AI but if it runs locally on instances it's kinda ok, it's actually a good use case.
Nice to see PT improve, there are only 2 features I still miss:
1. comment reactions (like, dislike, heart)
2. Markdown support in video descriptions (I know other Fediverse platforms probably couldn't handle this well, so it wouldn't be too practical)
(DIR) Post #AkCEU3Q2aU8vITouiu by soothspider@gigaohm.bio
2024-07-22T17:50:10.619652Z
0 likes, 0 repeats
@tedb For example, all the batcave videos can go in the same repo. Having subtitles in the repo also makes it useful for searching.
So maybe the structure should be /<channel>/<specific-stream>...
https://gigaohm.bio/@soothspider/posts/Ak6MXb6I04pmlyxA9Y
Might need a config file then as well (e.g. instead of hard coding which folders it looks in).
e.g.
/batcave
/gigaohm
(Then it'll watch /batcave/* and /gigaohm/*.)
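One way the config-file idea could work, sketched in Python: read one channel folder per line, then keep only the changed caption files that sit at <channel>/<specific-stream>/<lang>.vtt under a watched channel. The config format (one path per line, `#` for comments) and both function names are assumptions for illustration.

```python
from pathlib import PurePosixPath


def parse_watch_config(text):
    """Return the watched channel folder names, one per non-empty config line."""
    roots = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            roots.append(line.strip("/"))
    return roots


def watched_captions(changed_paths, roots):
    """Keep only .vtt files located at <channel>/<stream>/<file> in a watched channel."""
    hits = []
    for path in changed_paths:
        parts = PurePosixPath(path.lstrip("/")).parts
        if len(parts) == 3 and parts[0] in roots and parts[2].endswith(".vtt"):
            hits.append(path)
    return hits


roots = parse_watch_config("/batcave\n/gigaohm\n")
changed = [
    "gigaohm/2024-07-19 - Study Hall/en.vtt",
    "tools/publish.py",
    "batcave/some-stream/README.md",
]
print(watched_captions(changed, roots))
```

Filtering on the list of changed files (rather than rescanning everything) fits the "publish the changed captions upon merge" idea discussed earlier.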
(DIR) Post #AkCEU4cU7hgv1MeMl6 by jazzilla@noauthority.social
2024-07-22T17:58:15Z
0 likes, 0 repeats
@soothspider If you're talking about storing large files in a repo, you're going to run into the limitations and workarounds we previously discussed. @Housatonic @jjcouey @super_spreaders @tedb @Teo @gabriel
(DIR) Post #AlwF5t3uWhiLFfBIfo by biloti@ursal.zone
2024-09-12T20:33:44Z
0 likes, 1 repeats
@peertube this is great. The transcription of my last video to Brazilian Portuguese was just perfect.