[HN Gopher] Automatic Video Editing
___________________________________________________________________
Automatic Video Editing
Author : todsacerdoti
Score : 90 points
Date : 2021-03-23 12:41 UTC (10 hours ago)
(HTM) web link (tratt.net)
(TXT) w3m dump (tratt.net)
| johnx123-up wrote:
| Just curious.. how is this better than
| https://davidbieber.com/snippets/2020-02-21-jump-cut-program... ?
| netik wrote:
| A lot of the issues raised in this post are well solved by
| commercial video editing software.
|
| For instance, markers and scene detection / edit lists in
| DaVinci, or word by word editing with recognition in Descript.
|
| I feel like the OP should give them a try before trying to apply
| the universal script hammer to this problem. The ffmpeg line
| alone is frightening.
|
| Use of unix time instead of timecode is another problem here.
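
A minimal sketch of what "timecode" means here, assuming non-drop-frame HH:MM:SS:FF timecode at an integer frame rate. The function names and the default of 25 fps are illustrative and not taken from the post. Fractional rates such as 29.97 need drop-frame handling, which this does not attempt.

```python
def frames_to_timecode(frame_count, fps=25):
    """Format a frame count as non-drop-frame HH:MM:SS:FF timecode."""
    ff = frame_count % fps                # frames within the current second
    total_seconds = frame_count // fps
    ss = total_seconds % 60
    mm = (total_seconds // 60) % 60
    hh = total_seconds // 3600
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def seconds_to_timecode(seconds, fps=25):
    """Convert elapsed seconds (e.g. a unix-time difference) to timecode."""
    return frames_to_timecode(round(seconds * fps), fps)
```

The point of counting frames rather than seconds is that every cut lands on a frame boundary that an editor can reproduce exactly.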
|
| It's cool that he wrote this but it doesn't conform to any video
| standards in use.
| moyix wrote:
| The author seems to be using OpenBSD, so I imagine that rather
| limits their options.
| jjice wrote:
| My guess is that it was written because the author much prefers
| writing code and automating tasks to video editing, and it was
| worth the extra time for them to do that.
|
| Out of curiosity, do any commercial video editing programs
| offer automation/scripting? That seems like a great place for
| scripting to be available to help alleviate common tedious
| tasks, if it's not already there. I think that could lead to a
| best of both worlds.
| kfarr wrote:
| Yes, there are many existing automation options for video
| production applications. For example, the concept of a
| portable EDL (edit decision list) file has been well
| established in the industry for decades:
| https://en.wikipedia.org/wiki/Edit_decision_list
|
| Scripting languages are also available for specific post
| production applications such as Final Cut:
| https://en.wikipedia.org/wiki/FXScript
|
| Then there grew a whole crop of cloud-based scripted content
| generators like stupeflix, sundaysky, idomoo.com.
|
| I think the author did a great job creating a fun project and
| explaining the ffmpeg workflow, but a video professional has
| many off-the-shelf options already.
| netik wrote:
| descript is the only thing I've seen that turns video editing
| into something resembling word processing - it gets very jump
| cut-y and jerky but most people are used to it in the
| TikTok/YT era.
|
| It's terrifically buggy right now but as a PoC it's amazing.
| netik wrote:
| oh, and scripting wise you can fully automate davinci with
| python.
| dceddia wrote:
| I've been thinking a lot about this problem myself! Making
| screencasts, I realized most of my editing time was spent cutting
| out silence. If I recorded with that in mind (staying silent
| between re-takes), the editing became pretty straightforward but
| really tedious.
|
| I made my own automatic editor tool [0] recently, and the
| approach I started with is simply cutting out the silent parts.
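
A rough sketch of the "cut out the silent parts" idea, not the app's actual implementation. It assumes audio arrives as a list of floats in [-1.0, 1.0]. The threshold and minimum silence length are made-up defaults, and `find_loud_spans` is a hypothetical name.

```python
def find_loud_spans(samples, sample_rate, threshold=0.02, min_silence_s=0.5):
    """Return (start_s, end_s) spans worth keeping.

    Silences shorter than min_silence_s stay inside a span, so short
    pauses between words are not cut.
    """
    min_gap = int(min_silence_s * sample_rate)
    spans = []
    start = None      # index where the current span began
    last_loud = None  # index of the most recent loud sample
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            elif i - last_loud > min_gap:
                # The silence was long enough: close the span and start anew.
                spans.append((start, last_loud + 1))
                start = i
            last_loud = i
    if start is not None:
        spans.append((start, last_loud + 1))
    return [(a / sample_rate, b / sample_rate) for a, b in spans]
```

The returned spans are in seconds, so they could feed an export step (an EDL, for instance) rather than cutting the file directly.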
|
| I looked at other options like jumpcutter.py and auto-editor.py
| before building it, and they helped prove out the idea, but I
| really wanted an interactive UI, so I built an app in Swift.
|
| I figured the automated approach is probably not gonna be
| perfect, so I built it to export XML/EDL/ScreenFlow files that
| can be imported to other editors for fine-tuning. It works pretty
| well and people seem happy with it so far.
|
| Someone else mentioned timestamps vs. timecode. Maaan, timing has
| been a real thorn in my side with this project. The "even"
| framerates aren't too bad (30fps/60fps/etc) but the uneven ones
| are a huge pain (29.97, 59.94). One fun problem recently was
| figuring out how to bin audio samples at 48khz into frames at
| 29.97. Because each frame holds an uneven number (1601.6), I had
| to alternate between assigning 1602/1601 to each frame, or else
| my idea of time would slowly but steadily skew out of sync. As
| someone who'd never worked with video before, this has been a fun
| adventure.
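
The binning problem above can be sketched with integer arithmetic, assuming "29.97" means exactly 30000/1001 fps (the NTSC rate). The function name is hypothetical. Each frame boundary is computed from the frame index rather than accumulated, so rounding error cannot build up.

```python
def samples_per_frame(num_frames, sample_rate=48000, fps_num=30000, fps_den=1001):
    """Return how many audio samples fall into each of the first num_frames frames."""
    counts = []
    for n in range(num_frames):
        # Frame n covers samples [floor(n * rate / fps), floor((n + 1) * rate / fps)).
        start = (n * sample_rate * fps_den) // fps_num
        end = ((n + 1) * sample_rate * fps_den) // fps_num
        counts.append(end - start)
    return counts


print(samples_per_frame(5))  # [1601, 1602, 1601, 1602, 1602]
```

Five frames hold exactly 8008 samples (5 x 1601.6), so the pattern is two frames of 1601 and three of 1602, repeating, rather than a strict alternation.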
|
| Right now I'm working on adding more manual control over the
| cuts, and this kind of stuff is what I'd like to tackle next!
| Automatic scene detection, ability to leave markers during
| recording, more control over transitions, stuff like that would
| be really cool. Feels to me like automatic editing might get more
| popular as more people realize it's even possible.
|
| 0: https://getrecut.com
| jjnoakes wrote:
| By "Automatic scene detection" do you mean somehow detecting in
| the video if interesting things are going on and avoiding cuts
| around those spots? Because a lot of videos I record have
| silence in the audio when important things might be happening
| in the video, and having to manually go find those places is a
| bit of a pain.
|
| Of course, automatically detecting which parts of the video are
| interesting or not is probably impossible, but it sure feels
| like an interesting problem to try to tackle.
| dceddia wrote:
| Something like that. I could base it on frame-to-frame
| changes, or if my app was doing the screen recording, I could
| look at keyboard/mouse input as another signal of "non-
| silence".
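
One way to read "frame-to-frame changes" is a mean absolute pixel difference between consecutive frames. This sketch assumes each frame is a flat list of grayscale values of equal length. The threshold is arbitrary and `active_frames` is a hypothetical name.

```python
def active_frames(frames, threshold=2.0):
    """Return indices of frames that differ noticeably from the previous frame."""
    active = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        # Mean absolute difference across all pixels.
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            active.append(i)
    return active
```

Frames flagged here could be treated as "non-silence" even when the audio is quiet, which addresses the case of important on-screen activity during a pause in narration.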
|
| I'm pretty much looking at silence removal as a good starting
| point. My overall goal is to cut down on the manual editing
| required, so just looking for repetitive processes that I
| could add automations for.
| clawoo wrote:
| Your software is such a great idea. The UI could use some work
| to be more attractive, but the core functionality is top notch.
|
| Here's a suggestion- using Speech.framework[0] you could
| probably quite easily transcribe the audio and identify filler
| words ("umm", "hmm", etc) and add an option to automatically
| exclude those as well.
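
Assuming some transcription step yields per-word timings, the filler-word pass could be as small as the sketch below. The `(word, start_s, end_s)` tuple format, the filler list, and the function name are all assumptions for illustration. This is not the Speech framework's API.

```python
FILLERS = {"um", "umm", "uh", "hmm", "er"}


def filler_cuts(words, fillers=FILLERS):
    """Return (start_s, end_s) spans covering filler words.

    words is an iterable of (text, start_s, end_s) tuples.
    """
    cuts = []
    for text, start, end in words:
        # Normalize case and trailing punctuation before matching.
        if text.lower().strip(".,!?") in fillers:
            cuts.append((start, end))
    return cuts
```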
|
| [0] https://developer.apple.com/documentation/speech
| dceddia wrote:
| Thanks! I like the Speech framework idea. I've toyed with it
| a bit and the results were not so great, especially offline, and
| the online one has some limits. I think if I want to properly
| add transcription I'll need to integrate with some SaaS
| solution, but I need to do a little more experimenting first.
|
| Do you have any specific suggestions or critiques for the UI?
| I definitely agree it could be more attractive but I've had
| trouble figuring out what to do besides "make it look like
| Final Cut" or whatever. (or maybe I should actually do just
| that!)
| nate wrote:
| Love stuff like this. It feels like we're close to some really
| interesting things here but I haven't quite seen it yet.
| Facebook/Apple have their "auto movies" but they're largely just
| montages over music. Any interesting/useful audio captured in
| those clips just seems ignored.
|
| My brain cycles on thoughts of what GPT-3-like things could
| possibly enable here. Could there be some interesting algorithm,
| an AI trained on which clip should come next: use these 10s, or
| skip and check again?
|
| Self promotion, but I did fool with a way to try and automate
| making stop motion movies: https://www.trylocomotion.com
|
| Rudimentary process of letting people take a video and doing a
| reverse motion detection algorithm. "When there's nothing moving
| in frame, use that frame for the stop motion movie." But that was
| a fun dive into this world.
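
A minimal sketch of the "reverse motion detection" idea, assuming a per-frame motion score is already available (for example, a mean pixel difference against the previous frame). The thresholds and the name `pick_still_frames` are hypothetical. It keeps the last frame of each sufficiently long still run.

```python
def pick_still_frames(motion, threshold=1.0, min_still=3):
    """Pick one frame index per run of still frames.

    motion[i] is the motion score between frame i-1 and frame i;
    motion[0] is unused.
    """
    picks = []
    run = 0  # length of the current run of still frames
    for i in range(1, len(motion)):
        if motion[i] <= threshold:
            run += 1
        else:
            if run >= min_still:
                picks.append(i - 1)  # last still frame before movement resumed
            run = 0
    if run >= min_still:
        picks.append(len(motion) - 1)
    return picks
```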
| bluetwo wrote:
| "Aeschylus is the worst written bit of software I've put my name
| to since I was about 15 years old and it will probably never be
| usable for anyone other than me."
|
| Hilarious and honest.
| PeterisP wrote:
| This reminds me of a cool tech demo about "enhanced tool" for
| video editing I saw in January -
| https://www.youtube.com/watch?v=Bl9wqNe5J8U from descript.com (no
| affiliation).
| nickjj wrote:
| After having recorded close to 600 screencast videos, I automated
| a number of setup and teardown processes too.
|
| Such as using the Sizer[0] tool to move windows to specific
| 1920x1080 coordinates of the screen where I configured OBS to
| record from. This way my desktop resolution never needs to
| change. Using Sizer requires right clicking a title bar and
| choosing a pre-created menu item and it auto-resizes and
| positions the window correctly. Very painless.
|
| But I also have these little shell scripts that are responsible
| for setting up font sizes and making sure my history is clear.
| Not showing history is so important if you're using CTRL+r and
| FZF frequently because having to blur stuff later is time
| consuming and error prone (ie, missing 1 frame of blur by
| accident). The stop record script reverts everything back to
| normal.
|
|     record-start () {
|         mv ~/.bash_history ~/.bash_history.bak && history -c
|         rm /tmp/%*
|         change_terminal_font 9 18
|
|         if [[ "${1}" = "--obs" ]]; then
|             cd "/c/Program Files/obs-studio/bin/64bit"
|             wslview obs64.exe
|             cd -
|         fi
|     }
|
|     record-stop () {
|         mv ~/.bash_history.bak ~/.bash_history && history -r
|         change_terminal_font 18 9
|     }
|
|     change_terminal_font () {
|         [[ -z "${1}" || -z "${2}" ]] && echo "Usage: change_terminal_font FROM_SIZE TO_SIZE"
|
|         from="${1}"
|         to="${2}"
|         windows_user="$(powershell.exe '$env:UserName' | sed -e 's/\r//g')"
|         terminal_config="/c/Users/${windows_user}/AppData/Local/Packages/Microsoft.WindowsTerminal_8wekyb3d8bbwe/LocalState/settings.json"
|
|         perl -i -pe "s/\"fontSize\": ${from}/\"fontSize\": ${to}/g" "${terminal_config}"
|     }
|
| I don't think I'll ever automate the editing process because
| editing is where you can throw in a lot of human nice touches,
| like zooming into a specific area of the screen for emphasis or
| adding an overlay picture for context.
|
| But I do try to make things as live as possible, such as using
| OBS scenes to cut down on post processing editing. That and
| automating your audio processing so you don't need to edit your
| audio afterwards has given me the biggest bang for my buck in
| terms of how fast I can go from an idea in my head to a video
| ready for YouTube.
|
| A complete list of tools that I use for dev + recording + editing
| can be found here: https://nickjanetakis.com/blog/the-tools-i-use
|
| [0]: http://www.brianapps.net/sizer4/
| stevenicr wrote:
| checked your tools page for "zoom" with ctrl-f - wondering what
| you are using for zooming in.
|
| some years ago I had a microsoft mouse that included a third
| button and a driver-addon (I think) - that gave a great
| magnified box you could move around the screen until you
| unclicked the third button.
|
| I would love to find a way to do this magnified box again -
| when finding zoom in your comments, should I assume that you
| are using camtasia and doing it in post?
|
| I'd love to have this zoom ability for making videos but also
| when screen sharing live.
| nickjj wrote:
| Live zooming is something I tried but ultimately stopped
| doing because it's too difficult to live code + narrate my
| thought process + zoom in on demand. Maybe if I had a foot
| pedal or something to control it heh.
|
| I zoom in post production during the editing process. Once
| you get used to your tools it's fast. It takes about a minute
| to zoom into a specific area of the screen, position it in
| the exact spot I want and then eventually zoom out back to
| normal. I like this process because it lets you adjust the
| zoom transition speed as needed and sometimes I also offset
| the X / Y coords to center it, etc..
| SeanFerree wrote:
| I agree. I could never have editing automated. It takes a
| while, but like you said, it adds a personal touch
| EricE wrote:
| I dunno - tools like this can take care of the 80% drudgery,
| freeing you to really focus on that 20% that provides the human
| touch :)
|
| Unless you are also recording all your streams before you make
| your on-the-fly director's cut with OBS, if you make an "ooops"
| you're done. With his approach if you don't like the automatic
| edits, the underlying source files are still there and you can
| override the automation.
|
| It would be easier if he did the automation within a
| traditional NLE workflow; overriding the automatic editing
| would be a lot easier. Since, ya know, editors were designed to
| make and keep track of changes (ha!)
| nickjj wrote:
| I think it comes down to recording styles too.
|
| My work flow is to start recording with OBS. Since I use a
| webcam in the corner while recording my screen I'm aiming for
| as few cuts as possible because with a webcam unless your
| face is positioned exactly how it was before watchers will
| see the jump cut (even if it's subtle). Editing where to cut
| manually to produce the least visible cut is an art form and
| takes a human touch.
|
| But I'll press record and do my best. If I get let's say 5
| minutes out of a 20 minute video down solid but screw up then
| I'll stop the recording. Then I'll start recording another
| file with OBS and resume where I left off trying to place my
| mouse cursor exactly where it was and lead off by saying what
| I was saying before so it flows.
|
| In the end I might have 2-5 relatively good videos that I
| then edit together using an NLE tool. Knowing where the cuts
| are is easy because it's pretty much the beginning of the
| file to the end.
|
| This also helps reduce massive file sizes where you end up
| with like a single 45 minute source video that gets edited
| down to 20 minutes because you made a ton of little mistakes.
| At a decent scale this matters because disk space while cheap
| isn't free for someone who is just a solo developer and does
| everything from 1 dev box. Usually by the end of a recording
| session I'll have like 15 source videos that I delete because
| I know they came out bad (often times getting into the first
| few minutes is the hardest for me), here's a screenshot of
| what I mean haha
| https://twitter.com/nickjanetakis/status/1347574482714685441.
|
| Normally I edit my stuff at 2x speed. If I'm not adding a lot
| of extra effects (zooming, tooltips, highlights, blurs,
| overlays, etc.) it goes by pretty fast. With work flows that
| you're used to editing really isn't that bad. It's creating
| the content / material and executing the human part
| (delivering the video -- execution basically) that takes most
| of the time.
|
| Then there's also editing things like an audio only group
| podcast. There's no way an automated editing process is going
| to be able to intelligently remove ums and ahs while leaving in
| a few at key places to make things sound more natural, or
| maybe removing a bit of stuttering from someone's line in a
| way that no one would ever notice. Or perhaps cutting out 2
| minutes altogether because it doesn't add much to the
| conversation and isn't referenced later so it's safe to cut
| and no one would ever know.
___________________________________________________________________
(page generated 2021-03-23 23:01 UTC)